pyschema_extensions package

Submodules

pyschema_extensions.avro module

Extension for generating Avro schemas from PySchema Record classes

Usage:

>>> import pyschema
>>> import pyschema_extensions.avro
>>> from pyschema.types import Text, Integer
>>> class MyRecord(pyschema.Record):
...     foo = Text()
...     bar = Integer()
...
>>> pyschema_extensions.avro.get_schema_string(MyRecord)
'{"fields": [{"type": "string", "name": "foo"}, {"type": "long", "name": "bar"}], "type": "record", "name": "MyRecord"}'

class pyschema_extensions.avro.EnumMixin
simplified_avro_type_schema(state)
class pyschema_extensions.avro.FieldMixin
avro_default_value()
avro_dump(o)
avro_load(o)
avro_type_schema(state)

Full type specification for the field

That is, the same value that would go into the “type” field. For most fields, only simplified_avro_type_schema has to be implemented.

simplified_avro_type_schema(state)

The basic avro type for this field

Not including nullability.
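
The split between the two methods can be illustrated with a small standalone sketch. It is not the library's implementation: `TextFieldSketch` and its `nullable` attribute are hypothetical names, and the sketch assumes that the full type of a nullable field is the Avro union of "null" with the basic type.

```python
class TextFieldSketch(object):
    """Hypothetical stand-in for a field class; not part of pyschema."""

    def __init__(self, nullable=True):
        self.nullable = nullable

    def simplified_avro_type_schema(self, state):
        # The basic Avro type, ignoring nullability.
        return "string"

    def avro_type_schema(self, state):
        # Full type: wrap the basic type in a union with "null"
        # when the field is nullable (an assumption of this sketch).
        simple = self.simplified_avro_type_schema(state)
        if self.nullable:
            return ["null", simple]
        return simple


nullable_type = TextFieldSketch(nullable=True).avro_type_schema(state=None)
required_type = TextFieldSketch(nullable=False).avro_type_schema(state=None)
```

Under this arrangement a new field type only needs to say what its basic type is, and the shared full-type method handles the rest.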

class pyschema_extensions.avro.FloatMixin
avro_type_name
class pyschema_extensions.avro.IntegerMixin
avro_type_name
class pyschema_extensions.avro.ListMixin
avro_dump(obj)
avro_load(obj)
simplified_avro_type_schema(state)
class pyschema_extensions.avro.MapMixin
avro_dump(obj)
avro_load(obj)
simplified_avro_type_schema(state)
class pyschema_extensions.avro.SchemaGeneratorState

Bases: object

class pyschema_extensions.avro.SubRecordMixin
avro_dump(obj)
avro_load(obj)
avro_type_name
simplified_avro_type_schema(state)
pyschema_extensions.avro.dumps(record)
pyschema_extensions.avro.from_json_compatible(schema, dct)

Load a record from a JSON-encodable structure

pyschema_extensions.avro.get_schema_dict(record, state=None)
pyschema_extensions.avro.get_schema_string(record)
pyschema_extensions.avro.loads(s, record_store=None, schema=None)
pyschema_extensions.avro.to_json_compatible(record)

pyschema_extensions.avro_to_pyschema module

Helper functions for converting an Avro schema definition (JSON) to a PySchema Python source definition.

TODO: Another idea is to read the Avro schema and create Python classes dynamically, without generating Python source code.

pyschema_extensions.avro_to_pyschema.get_field_definition(field, sub_records)
pyschema_extensions.avro_to_pyschema.get_field_type_name(field_type)
pyschema_extensions.avro_to_pyschema.get_first_type(field_type)
pyschema_extensions.avro_to_pyschema.get_name(field)
pyschema_extensions.avro_to_pyschema.get_pyschema_record(schema, sub_records)
pyschema_extensions.avro_to_pyschema.get_sub_field(field)
pyschema_extensions.avro_to_pyschema.get_sub_fields_name(sub_type)
pyschema_extensions.avro_to_pyschema.is_nullable(field_type)
pyschema_extensions.avro_to_pyschema.nullable_str(field_type)
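
The general idea of turning an Avro schema into PySchema source can be sketched without the module itself. Everything below is illustrative: `AVRO_TO_PYSCHEMA`, `split_nullable` and `record_source` are hypothetical names, the type mapping is an assumption, and so is the `nullable=` keyword in the generated text. Only flat records with primitive fields are handled.

```python
import json

# Assumed mapping from Avro primitive types to PySchema field class names.
AVRO_TO_PYSCHEMA = {
    "string": "Text",
    "long": "Integer",
    "int": "Integer",
    "double": "Float",
    "float": "Float",
    "boolean": "Boolean",
    "bytes": "Bytes",
}


def split_nullable(field_type):
    """Return (basic_type, nullable) for a primitive or a ["null", T] union."""
    if isinstance(field_type, list):
        non_null = [t for t in field_type if t != "null"]
        return non_null[0], "null" in field_type
    return field_type, False


def record_source(schema):
    """Render Python source text for a flat Avro record schema."""
    lines = ["class {0}(pyschema.Record):".format(schema["name"])]
    for field in schema["fields"]:
        basic, nullable = split_nullable(field["type"])
        lines.append("    {0} = {1}(nullable={2})".format(
            field["name"], AVRO_TO_PYSCHEMA[basic], nullable))
    return "\n".join(lines)


schema = json.loads(
    '{"type": "record", "name": "MyRecord", "fields": ['
    '{"name": "foo", "type": "string"}, '
    '{"name": "bar", "type": ["null", "long"]}]}'
)
source = record_source(schema)
```

Here `source` is plain text. Sub-records, lists, maps and enums would each need their own handling, which this sketch leaves out.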

pyschema_extensions.jsonschema module

Extension for generating JSON Schema documents from PySchema classes

JSON schema: http://json-schema.org/

When dumping to JSON schema, all fields in a record are mandatory; a list or map can be empty but must be present. Records with missing fields can still be dumped, but they will not validate against the schema.

Usage:

>>> import pyschema
>>> import pyschema_extensions.jsonschema
>>> from pyschema.types import Text, Integer
>>> class MyRecord(pyschema.Record):
...     foo = Text()
...     bar = Integer()
...
>>> pyschema_extensions.jsonschema.get_root_schema_string(MyRecord)
'{"additionalProperties": false, "required": ["bar", "foo"], "type": "object", "id": "MyRecord", "properties": {"foo": {"type": "string"}, "bar": {"type": "integer"}}}'
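
To make the "all fields are mandatory" rule concrete, the sketch below checks a document against the "required" and "additionalProperties" entries of the schema from the usage example above. It is a hand-rolled structural check for illustration only, not a JSON Schema validator, and `structural_problems` is a hypothetical helper name.

```python
import json

# The schema string from the usage example above.
SCHEMA = json.loads(
    '{"additionalProperties": false, "required": ["bar", "foo"], '
    '"type": "object", "id": "MyRecord", '
    '"properties": {"foo": {"type": "string"}, "bar": {"type": "integer"}}}'
)


def structural_problems(schema, document):
    """List missing required keys and unexpected keys in `document`."""
    problems = []
    for name in schema.get("required", []):
        if name not in document:
            problems.append("missing required property: " + name)
    if schema.get("additionalProperties") is False:
        for name in document:
            if name not in schema.get("properties", {}):
                problems.append("unexpected property: " + name)
    return problems


complete = structural_problems(SCHEMA, {"foo": "x", "bar": 1})
incomplete = structural_problems(SCHEMA, {"foo": "x"})
```

A record dumped without a value for bar corresponds to the second case: it serialises, but it does not satisfy the schema.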

class pyschema_extensions.jsonschema.EnumMixin
jsonschema_type_schema(state)
class pyschema_extensions.jsonschema.FieldMixin
jsonschema_type_schema(state)
class pyschema_extensions.jsonschema.ListMixin
jsonschema_type_schema(state)
class pyschema_extensions.jsonschema.MapMixin
jsonschema_type_schema(state)
class pyschema_extensions.jsonschema.SchemaGeneratorState

Bases: object

class pyschema_extensions.jsonschema.SubRecordMixin
jsonschema_type_name
jsonschema_type_schema(state)
pyschema_extensions.jsonschema.dumps(record)
pyschema_extensions.jsonschema.get_root_schema_dict(record)

Return a root jsonschema for a given record

A root schema includes the $schema attribute and all sub-record schemas and definitions.

pyschema_extensions.jsonschema.get_root_schema_string(record)
pyschema_extensions.jsonschema.get_schema_dict(record, state=None)

Return a python dict representing the jsonschema of a record

Any references to sub-schemas will be URI fragments that won’t be resolvable without a root schema, available from get_root_schema_dict.
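
Why a fragment reference needs the root schema can be shown with a short sketch. The schema layout below is hypothetical (the exact structure pyschema emits may differ), and `resolve_fragment` is an illustrative helper that follows a same-document reference using JSON Pointer rules.

```python
def resolve_fragment(root_schema, ref):
    """Resolve a same-document reference such as '#/definitions/Child'."""
    if not ref.startswith("#/"):
        raise ValueError("only same-document references are handled: " + ref)
    node = root_schema
    for part in ref[2:].split("/"):
        # Undo JSON Pointer escaping: "~1" means "/" and "~0" means "~".
        part = part.replace("~1", "/").replace("~0", "~")
        node = node[part]
    return node


# Hypothetical root schema; the layout pyschema produces may differ.
root = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "id": "Parent",
    "type": "object",
    "properties": {"child": {"$ref": "#/definitions/Child"}},
    "definitions": {
        "Child": {"type": "object", "properties": {"x": {"type": "integer"}}},
    },
}

child_ref = root["properties"]["child"]["$ref"]
child_schema = resolve_fragment(root, child_ref)
```

The same lookup against a schema dict that lacks the "definitions" entry fails with a KeyError, which is the sense in which such a reference is not resolvable on its own.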

pyschema_extensions.jsonschema.loads(s, record_store=None, schema=None)

pyschema_extensions.luigi module

Basic utilities for using PySchema in Luigi’s Python MapReduce library

pyschema_extensions.luigi.mr_reader(job, input_stream, loads=loads)

Converts a file object with JSON-serialised pyschema records to a stream of pyschema objects

Can be used as job.reader in luigi.hadoop.JobTask

pyschema_extensions.luigi.mr_writer(job, outputs, output_stream, stderr=sys.stderr, dumps=dumps)

Writes a stream of JSON-serialised pyschema Records to a file object

Can be used as job.writer in luigi.hadoop.JobTask
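
The reader/writer pairing can be sketched with the standard library alone. This is not the module's code: plain `json.loads` and `json.dumps` stand in for the pyschema serialisers, the `job` argument is omitted, and the one-record-per-line framing is an assumption. `read_json_lines` and `write_json_lines` are hypothetical names.

```python
import io
import json


def read_json_lines(input_stream, loads=json.loads):
    """Yield one decoded object per non-empty line of `input_stream`."""
    for line in input_stream:
        line = line.strip()
        if line:
            yield loads(line)


def write_json_lines(outputs, output_stream, dumps=json.dumps):
    """Write each object in `outputs` as one line of JSON."""
    for obj in outputs:
        output_stream.write(dumps(obj) + "\n")


# Round trip through an in-memory text buffer.
buf = io.StringIO()
write_json_lines([{"foo": "a", "bar": 1}, {"foo": "b", "bar": 2}], buf)
buf.seek(0)
records = list(read_json_lines(buf))
```

Passing the serialiser as a keyword argument, as the real signatures do with loads and dumps, keeps the stream handling independent of the record format.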

pyschema_extensions.postgres module

Postgres-style SQL generation based on PySchema records

Quite incomplete and still a work in progress

pyschema_extensions.postgres.camel_case_to_underscore(name)
pyschema_extensions.postgres.create_statement(schema, table_name=None)
pyschema_extensions.postgres.types(schema)
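
As a rough illustration of what these helpers are for, the sketch below converts a CamelCase record name to an underscore name and builds a CREATE TABLE statement from explicit column pairs. It is a guess at the general shape, not the module's behaviour: `camel_to_snake` and `create_table_sketch` are hypothetical names, the default table name is an assumption, and the SQL column types are supplied by the caller rather than derived from a schema.

```python
import re


def camel_to_snake(name):
    """Convert e.g. 'MyRecord' to 'my_record'."""
    # Insert "_" before an uppercase letter that starts a new word.
    step = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Insert "_" between a lowercase letter or digit and an uppercase letter.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step).lower()


def create_table_sketch(record_name, columns, table_name=None):
    """Build a CREATE TABLE statement from (column, sql_type) pairs."""
    # Assumption: fall back to the underscore form of the record name.
    table = table_name or camel_to_snake(record_name)
    body = ", ".join("{0} {1}".format(col, sql_type)
                     for col, sql_type in columns)
    return "CREATE TABLE {0} ({1})".format(table, body)


statement = create_table_sketch("MyRecord", [("foo", "text"), ("bar", "bigint")])
```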

Module contents