dae.parquet.schema2 package
Submodules
dae.parquet.schema2.parquet_io module
- class dae.parquet.schema2.parquet_io.ContinuousParquetFileWriter(filepath: str, annotation_schema: list[dae.annotation.annotation_pipeline.AttributeInfo], filesystem: AbstractFileSystem | None = None, row_group_size: int = 50000, schema: str = 'schema', blob_column: str | None = None)
Bases: object
A continuous parquet writer.
Writes data to the given parquet file automatically once enough of it has been supplied, and flushes any leftover data into the file when closed.
- BATCH_ROWS = 1000
- DEFAULT_COMPRESSION = 'SNAPPY'
- append_family_allele(allele: FamilyAllele, json_data: str) → None
Append the data for an entire variant to the correct file.
- append_summary_allele(allele: SummaryAllele, json_data: str) → None
Append the data for an entire variant to the correct file.
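The buffering behavior described for `ContinuousParquetFileWriter` (write once enough data has accumulated, flush the remainder on close) can be illustrated with a stdlib-only sketch. This is not the class's implementation: `BufferedBatchWriter`, its `flush_fn` callback and the `batch_rows` threshold are hypothetical stand-ins, and a real writer would hand each batch to a parquet writer rather than to a Python callback.

```python
from typing import Any, Callable


class BufferedBatchWriter:
    """Hypothetical sketch: buffer rows and flush them in fixed-size batches."""

    def __init__(self, flush_fn: Callable[[list[dict[str, Any]]], None],
                 batch_rows: int = 1000) -> None:
        self._flush_fn = flush_fn      # stand-in for writing a batch to a file
        self._batch_rows = batch_rows  # flush threshold (assumed semantics)
        self._buffer: list[dict[str, Any]] = []

    def append(self, row: dict[str, Any]) -> None:
        self._buffer.append(row)
        # Flush automatically once enough rows have accumulated.
        if len(self._buffer) >= self._batch_rows:
            self._flush()

    def _flush(self) -> None:
        if self._buffer:
            self._flush_fn(self._buffer)
            self._buffer = []  # start a fresh buffer; the flushed list is kept intact

    def close(self) -> None:
        # Dump any leftover rows that did not fill a complete batch.
        self._flush()


batches: list[list[dict[str, Any]]] = []
writer = BufferedBatchWriter(batches.append, batch_rows=3)
for i in range(7):
    writer.append({"summary_index": i})
writer.close()
print([len(b) for b in batches])  # [3, 3, 1]
```

Without the final `close()` call, the seventh row would stay in the buffer and never be written, which is why the leftover flush on close matters.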
- class dae.parquet.schema2.parquet_io.VariantsParquetWriter(out_dir: str, annotation_schema: list[dae.annotation.annotation_pipeline.AttributeInfo], partition_descriptor: PartitionDescriptor, bucket_index: int = 1, row_group_size: int = 50000, include_reference: bool = True, filesystem: AbstractFileSystem | None = None)
Bases: object
Provides methods for storing variants in a partitioned parquet dataset.
- write_dataset(full_variants_iterator: Iterator[tuple[dae.variants.variant.SummaryVariant, list[dae.variants.family_variant.FamilyVariant]]]) → list[str]
Write the variants supplied by the iterator to the partitioned parquet dataset.
- write_summary_variant(summary_variant: SummaryVariant, attributes: dict[str, Any] | None = None, sj_base_index: int | None = None) → None
Write a single summary variant to the correct parquet file.
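`write_dataset` consumes an iterator of `(SummaryVariant, list[FamilyVariant])` tuples and writes them into a partitioned dataset. The following sketch only illustrates the shape of that input and the general idea of routing each variant to a partition directory. The `Summary` and `Family` dataclasses, the `region_length` parameter, the `region_bin=<chrom>_<bin>` directory naming and the helper functions are all hypothetical; the real partitioning is governed by `PartitionDescriptor`, whose rules are not described here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Summary:  # hypothetical stand-in for SummaryVariant
    chromosome: str
    position: int


@dataclass
class Family:  # hypothetical stand-in for FamilyVariant
    family_id: str


def partition_dir(summary: Summary, region_length: int) -> str:
    """Hypothetical hive-style partition directory for a summary variant."""
    region_bin = summary.position // region_length
    return f"region_bin={summary.chromosome}_{region_bin}"


def group_by_partition(
    full_variants: Iterator[tuple[Summary, list[Family]]],
    region_length: int = 1_000_000,
) -> dict[str, list[tuple[Summary, list[Family]]]]:
    """Group (summary, family variants) tuples by their partition directory."""
    groups: dict[str, list[tuple[Summary, list[Family]]]] = defaultdict(list)
    for summary, families in full_variants:
        groups[partition_dir(summary, region_length)].append((summary, families))
    return dict(groups)


def variants() -> Iterator[tuple[Summary, list[Family]]]:
    # Each item pairs one summary variant with the family variants that carry it.
    yield Summary("chr1", 123), [Family("f1"), Family("f2")]
    yield Summary("chr1", 2_500_000), [Family("f1")]
    yield Summary("chr2", 10), []


groups = group_by_partition(variants())
print(sorted(groups))
# ['region_bin=chr1_0', 'region_bin=chr1_2', 'region_bin=chr2_0']
```

Because the input is an iterator, a writer built this way can stream variants without materializing the whole dataset in memory; the grouping into a dict here is only for demonstration.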
dae.parquet.schema2.serializers module
- class dae.parquet.schema2.serializers.AlleleParquetSerializer(annotation_schema: List[AttributeInfo], extra_attributes: List[str] | None = None)
Bases: object
Serialize alleles for storage in parquet files.
- ENUM_PROPERTIES = {'allele_in_roles': <enum 'Role'>, 'allele_in_sexes': <enum 'Sex'>, 'allele_in_statuses': <enum 'Status'>, 'inheritance_in_members': <enum 'Inheritance'>, 'transmission_type': <enum 'TransmissionType'>, 'variant_type': <enum 'Type'>}
- FAMILY_ALLELE_BASE_SCHEMA = {'allele_in_members': ListType(list<item: string>), 'allele_in_roles': DataType(int32), 'allele_in_sexes': DataType(int8), 'allele_in_statuses': DataType(int8), 'allele_index': DataType(int32), 'bucket_index': DataType(int32), 'family_id': DataType(string), 'family_index': DataType(int32), 'inheritance_in_members': DataType(int16), 'is_denovo': DataType(int8), 'sj_index': DataType(int64), 'summary_index': DataType(int32)}
- SUMMARY_ALLELE_BASE_SCHEMA = {'af_allele_count': DataType(int32), 'af_allele_freq': DataType(float), 'af_parents_called_count': DataType(int32), 'af_parents_called_percent': DataType(float), 'allele_index': DataType(int32), 'bucket_index': DataType(int32), 'chromosome': DataType(string), 'effect_gene': ListType(list<item: struct<effect_gene_symbols: string, effect_types: string>>), 'end_position': DataType(int32), 'family_alleles_count': DataType(int32), 'family_variants_count': DataType(int32), 'position': DataType(int32), 'reference': DataType(string), 'seen_as_denovo': DataType(bool), 'seen_in_status': DataType(int8), 'sj_index': DataType(int64), 'summary_index': DataType(int32), 'transmission_type': DataType(int8), 'variant_type': DataType(int8)}
- build_family_allele_batch_dict(allele: FamilyAllele, family_variant_data: str) → dict[str, list[Any]]
Build a batch of family allele data in the form of a dict.
- build_summary_allele_batch_dict(allele: SummaryAllele, summary_variant_data: str) → dict[str, Any]
Build a batch of summary allele data in the form of a dict.
- classmethod build_summary_schema(annotation_schema: list[dae.annotation.annotation_pipeline.AttributeInfo]) → Schema
Build the schema for the summary alleles.
- property schema_family: Schema
Lazily construct and return the schema for the family alleles.
- property schema_summary: Schema
Lazily construct and return the schema for the summary alleles.
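`ENUM_PROPERTIES` together with the base schemas suggests that enum-valued properties are stored as small integer columns (for example, `allele_in_roles` is `int32`), and `build_family_allele_batch_dict` returns a `dict[str, list[Any]]`, i.e. a column-oriented batch. The sketch below illustrates both ideas using only the standard library. The `Role` members and their values, the bit-mask encoding, and the helper names `encode_roles` and `build_batch_dict` are assumptions for illustration, not the serializer's actual code.

```python
from enum import IntFlag
from functools import reduce
from operator import or_
from typing import Any


class Role(IntFlag):  # hypothetical members and values, not dae's Role enum
    MOM = 1
    DAD = 2
    PROB = 4
    SIB = 8


def encode_roles(roles: list[Role]) -> int:
    """Pack a list of roles into one integer bit mask (assumed encoding)."""
    return int(reduce(or_, roles, Role(0)))


def build_batch_dict(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Turn row-oriented records into a column-oriented batch dict.

    Assumes every row has the same set of columns.
    """
    batch: dict[str, list[Any]] = {}
    for row in rows:
        for column, value in row.items():
            batch.setdefault(column, []).append(value)
    return batch


rows = [
    {"family_id": "f1", "allele_index": 1,
     "allele_in_roles": encode_roles([Role.MOM, Role.PROB])},
    {"family_id": "f2", "allele_index": 1,
     "allele_in_roles": encode_roles([Role.DAD])},
]
batch = build_batch_dict(rows)
print(batch["allele_in_roles"])  # [5, 2]
```

A bit mask lets a single integer column record several roles at once, and a column-oriented dict maps directly onto the per-column layout that parquet stores on disk.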