dae.backends.schema2 package
Submodules
dae.backends.schema2.base_query_builder module
- class dae.backends.schema2.base_query_builder.BaseQueryBuilder(dialect: dae.backends.schema2.base_query_builder.Dialect, db: str, family_variant_table: str, summary_allele_table: str, pedigree_table: str, family_variant_schema: dict[str, str], summary_allele_schema: dict[str, str], table_properties: Optional[dict], pedigree_schema: dict[str, str], pedigree_df: pandas.core.frame.DataFrame, gene_models: Optional[dae.genomic_resources.gene_models.GeneModels] = None)[source]
Bases:
abc.ABC
Class that abstracts away the process of building a query.
- GENE_REGIONS_HEURISTIC_CUTOFF = 20
- GENE_REGIONS_HEURISTIC_EXTEND = 20000
- MAX_CHILD_NUMBER = 9999
- QUOTE = "'"
- WHERE = '\n WHERE\n {where}\n '
- build_query(regions: Optional[list[dae.utils.regions.Region]] = None, genes=None, effect_types=None, family_ids=None, person_ids=None, inheritance=None, roles=None, sexes=None, variant_type=None, real_attr_filter=None, ultra_rare=None, frequency_filter=None, return_reference=None, return_unknown=None, limit=None, pedigree_fields=None)[source]
Build an SQL query in the correct order.
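As a rough illustration of how the `WHERE` and `QUOTE` class attributes might be used when assembling a query clause by clause, here is a minimal, self-contained sketch. It does not reproduce the actual `BaseQueryBuilder` logic: the functions `build_where` and `build_query` below, the `SELECT *` projection, and the two filters shown are assumptions made for illustration only.

```python
# Values copied from the class attributes documented above.
WHERE = "\n WHERE\n {where}\n "
QUOTE = "'"


def build_where(chromosome=None, family_ids=None):
    """Collect filter conditions and render them with the WHERE template.

    Hypothetical helper: the real builder supports many more filters.
    """
    conditions = []
    if chromosome is not None:
        conditions.append(f"chromosome = {QUOTE}{chromosome}{QUOTE}")
    if family_ids:
        quoted = ", ".join(
            f"{QUOTE}{fid}{QUOTE}" for fid in sorted(family_ids)
        )
        conditions.append(f"family_id IN ({quoted})")
    if not conditions:
        # No filters requested: emit no WHERE clause at all.
        return ""
    return WHERE.format(where=" AND\n ".join(conditions))


def build_query(table, chromosome=None, family_ids=None, limit=None):
    """Assemble the clauses in a fixed order: SELECT/FROM, WHERE, LIMIT."""
    query = f"SELECT * FROM {table}"
    query += build_where(chromosome, family_ids)
    if limit is not None:
        query += f"\n LIMIT {int(limit)}"
    return query


print(build_query("family_variant", chromosome="chr1",
                  family_ids={"f2", "f1"}, limit=5))
```

Values are interpolated directly into the SQL text here to keep the sketch short; code handling untrusted input should validate or escape them.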
dae.backends.schema2.bigquery_variants module
- class dae.backends.schema2.bigquery_variants.BigQueryDialect(ns: Optional[str] = None)[source]
Bases:
dae.backends.schema2.base_query_builder.Dialect
Abstracts away details related to BigQuery.
- class dae.backends.schema2.bigquery_variants.BigQueryVariants(gcp_project_id, db, summary_allele_table, family_variant_table, pedigree_table, meta_table, gene_models=None)[source]
Bases:
object
Backend for BigQuery.
- query_summary_variants(regions=None, genes=None, effect_types=None, family_ids=None, person_ids=None, inheritance=None, roles=None, sexes=None, variant_type=None, real_attr_filter=None, ultra_rare=None, frequency_filter=None, return_reference=None, return_unknown=None, limit=None)[source]
Query summary variants.
- query_variants(regions=None, genes=None, effect_types=None, family_ids=None, person_ids=None, inheritance=None, roles=None, sexes=None, variant_type=None, real_attr_filter=None, ultra_rare=None, frequency_filter=None, return_reference=None, return_unknown=None, limit=None, affected_status=None)[source]
Query family variants.
dae.backends.schema2.family_builder module
- class dae.backends.schema2.family_builder.FamilyQueryBuilder(dialect: dae.backends.schema2.base_query_builder.Dialect, db: str, family_variant_table: str, summary_allele_table: str, pedigree_table: str, family_variant_schema: dict[str, str], summary_allele_schema: dict[str, str], table_properties: Optional[dict], pedigree_schema: dict[str, str], pedigree_df: pandas.core.frame.DataFrame, gene_models=None, do_join_pedigree=False)[source]
Bases:
dae.backends.schema2.base_query_builder.BaseQueryBuilder
Build queries related to family variants.
dae.backends.schema2.impala_variants module
- class dae.backends.schema2.impala_variants.ImpalaVariants(impala_helpers, db, family_variant_table, summary_allele_table, pedigree_table, meta_table, gene_models=None)[source]
Bases:
object
Backend for Impala.
- build_family_variants_query_runner(regions=None, genes=None, effect_types=None, family_ids=None, person_ids=None, inheritance=None, roles=None, sexes=None, variant_type=None, real_attr_filter=None, ultra_rare=None, frequency_filter=None, return_reference=None, return_unknown=None, limit=None, pedigree_fields=None)[source]
Build a query selecting the appropriate family variants.
- static build_person_set_collection_query(person_set_collection: dae.person_sets.PersonSetCollection, person_set_collection_query: Tuple[str, Set[str]])[source]
Undocumented: the source docstring does not describe what this method does.
- build_summary_variants_query_runner(regions=None, genes=None, effect_types=None, family_ids=None, person_ids=None, inheritance=None, roles=None, sexes=None, variant_type=None, real_attr_filter=None, ultra_rare=None, frequency_filter=None, return_reference=None, return_unknown=None, limit=None) dae.backends.query_runners.QueryRunner [source]
Build a query selecting the appropriate summary variants.
- query_summary_variants(regions=None, genes=None, effect_types=None, family_ids=None, person_ids=None, inheritance=None, roles=None, sexes=None, variant_type=None, real_attr_filter=None, ultra_rare=None, frequency_filter=None, return_reference=None, return_unknown=None, limit=None)[source]
Query summary variants.
- query_variants(regions=None, genes=None, effect_types=None, family_ids=None, person_ids=None, inheritance=None, roles=None, sexes=None, variant_type=None, real_attr_filter=None, ultra_rare=None, frequency_filter=None, return_reference=None, return_unknown=None, limit=None, pedigree_fields=None)[source]
Query family variants.
dae.backends.schema2.parquet_io module
- class dae.backends.schema2.parquet_io.ContinuousParquetFileWriter(filepath, variant_loader, filesystem=None, rows=100000, schema='schema')[source]
Bases:
object
A continuous parquet writer.
Class that automatically writes to a given parquet file once it has been supplied with enough data. Any leftover data is written to the file when the writer is closed.
- append_family_allele(allele, json_data)[source]
Append the data for an entire variant to the correct file.
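The buffering behavior described for `ContinuousParquetFileWriter` (write once enough rows have accumulated, write the remainder on close) can be sketched as follows. This is a simplified stand-in, not the dae class: `BufferedRowWriter`, its `append` and `close` methods, and the in-memory list used in place of a parquet file are all assumptions made so the example runs without extra dependencies.

```python
from collections import defaultdict


class BufferedRowWriter:
    """Hypothetical stand-in for a continuous writer."""

    def __init__(self, sink, rows=100000):
        self.sink = sink      # a list standing in for the parquet file
        self.rows = rows      # flush threshold, like the `rows` argument
        self._buffer = defaultdict(list)   # column name -> buffered values
        self._size = 0

    def append(self, record):
        """Buffer one record; flush once the threshold is reached."""
        for column, value in record.items():
            self._buffer[column].append(value)
        self._size += 1
        if self._size >= self.rows:
            self._flush()

    def _flush(self):
        """Write the buffered columns as one batch and reset the buffer."""
        if self._size == 0:
            return
        self.sink.append(dict(self._buffer))
        self._buffer = defaultdict(list)
        self._size = 0

    def close(self):
        """Write whatever is left over in the buffer."""
        self._flush()


sink = []
writer = BufferedRowWriter(sink, rows=2)
for index in range(5):
    writer.append({"summary_index": index, "allele_index": 1})
writer.close()
print(len(sink))  # number of batches written
```

With five records and a threshold of two, the sketch writes two full batches during appending and one partial batch on close.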
- class dae.backends.schema2.parquet_io.NoPartitionDescriptor(root_dirname='')[source]
Bases:
dae.backends.schema2.parquet_io.PartitionDescriptor
Partition descriptor used when no partition description is available.
- property chromosomes
- static generate_file_access_glob()[source]
Return a glob for accessing every parquet file in the partition.
- property region_length
- class dae.backends.schema2.parquet_io.ParquetManager[source]
Bases:
object
Provide functions for producing variants and pedigree parquet files.
- class dae.backends.schema2.parquet_io.ParquetPartitionDescriptor(chromosomes, region_length, family_bin_size=0, coding_effect_types=None, rare_boundary=0, root_dirname='')[source]
Bases:
dae.backends.schema2.parquet_io.PartitionDescriptor
Defines partition description used for parquet datasets.
- property chromosomes
- property coding_effect_types
- property family_bin_size
- static from_config(config_path, root_dirname='')[source]
Create a partition description from the provided config file.
- static from_dict(config, root_dirname='')[source]
Create a partition description from the provided dictionary.
- generate_file_access_glob()[source]
Return a glob for accessing every parquet file in the partition.
- property rare_boundary
- property region_length
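To make the constructor parameters of `ParquetPartitionDescriptor` more concrete, the sketch below shows one plausible way `chromosomes`, `region_length` and `family_bin_size` could map a variant onto partition bins, and how a file-access glob could follow from the number of partition levels. The bin naming, the use of SHA-256 for family binning, and the one-directory-per-level layout are assumptions, and the helper functions are hypothetical; consult the source for the actual rules.

```python
import hashlib


def region_bin(chrom, position, chromosomes, region_length):
    """Name the region bin: listed chromosomes keep their own prefix."""
    pos_bin = position // region_length
    prefix = chrom if chrom in chromosomes else "other"
    return f"{prefix}_{pos_bin}"


def family_bin(family_id, family_bin_size):
    """Assign a family to one of `family_bin_size` bins, stably."""
    digest = hashlib.sha256(family_id.encode()).hexdigest()
    return int(digest, 16) % family_bin_size


def file_access_glob(family_bin_size=0, coding_effect_types=None,
                     rare_boundary=0):
    """Build a glob with one wildcard per assumed partition level."""
    levels = 1  # the region bin level is assumed to always be present
    if family_bin_size > 0:
        levels += 1
    if coding_effect_types:
        levels += 1
    if rare_boundary > 0:
        levels += 1
    return "/".join(["*"] * levels + ["*.parquet"])


print(region_bin("chr1", 65_000_000, ["chr1", "chr2"], 30_000_000))
print(file_access_glob(family_bin_size=10, rare_boundary=5))
```

Hashing the family ID rather than using Python's built-in `hash` keeps the bin assignment stable across interpreter runs.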
- class dae.backends.schema2.parquet_io.PartitionDescriptor[source]
Bases:
object
Abstract class for partition description.
- property chromosomes
- family_alleles_dirname: str = 'family'
- property region_length
- summary_alleles_dirname: str = 'summary'
- class dae.backends.schema2.parquet_io.VariantsParquetWriter(variants_loader, partition_descriptor, bucket_index=1, rows=100000, include_reference=True, filesystem=None)[source]
Bases:
object
Provide functions for storing variants into a parquet dataset.
- dae.backends.schema2.parquet_io.add_missing_parquet_fields(pps, ped_df)[source]
Add missing parquet fields.
dae.backends.schema2.serializers module
- class dae.backends.schema2.serializers.AlleleParquetSerializer(annotation_schema, extra_attributes=None)[source]
Bases:
object
Serialize a bunch of alleles.
- BASE_SEARCHABLE_PROPERTIES_TYPES = {'allele_in_members': DataType(string), 'allele_in_roles': DataType(int32), 'allele_in_sexes': DataType(int8), 'allele_in_statuses': DataType(int8), 'allele_index': DataType(int32), 'bucket_index': DataType(int32), 'chromosome': DataType(string), 'effect_gene': ListType(list<item: struct<effect_gene_symbols: string, effect_types: string>>), 'end_position': DataType(int32), 'family_id': DataType(string), 'family_index': DataType(int32), 'inheritance_in_members': DataType(int16), 'is_denovo': DataType(int8), 'position': DataType(int32), 'reference': DataType(string), 'summary_index': DataType(int32), 'transmission_type': DataType(int8), 'variant_type': DataType(int8)}
- ENUM_PROPERTIES = {'allele_in_roles': <enum 'Role'>, 'allele_in_sexes': <enum 'Sex'>, 'allele_in_statuses': <enum 'Status'>, 'inheritance_in_members': <enum 'Inheritance'>, 'transmission_type': <enum 'TransmissionType'>, 'variant_type': <enum 'Type'>}
- FAMILY_ALLELE_BASE_SCHEMA = {'allele_in_members': ListType(list<item: string>), 'allele_in_roles': DataType(int32), 'allele_in_sexes': DataType(int8), 'allele_in_statuses': DataType(int8), 'allele_index': DataType(int32), 'bucket_index': DataType(int32), 'family_id': DataType(string), 'family_index': DataType(int32), 'inheritance_in_members': DataType(int16), 'is_denovo': DataType(int8), 'summary_index': DataType(int32)}
- SUMMARY_ALLELE_BASE_SCHEMA = {'af_allele_count': DataType(int32), 'af_allele_freq': DataType(float), 'af_parents_called': DataType(int32), 'af_parents_freq': DataType(float), 'allele_index': DataType(int32), 'bucket_index': DataType(int32), 'chromosome': DataType(string), 'effect_gene': ListType(list<element: struct<effect_gene_symbols: string, effect_types: string>>), 'end_position': DataType(int32), 'position': DataType(int32), 'reference': DataType(string), 'summary_index': DataType(int32), 'transmission_type': DataType(int8), 'variant_type': DataType(int8)}
- build_family_allele_batch_dict(allele, family_variant_data) dict[str, list[Any]] [source]
Build a batch of family allele data in the form of a dict.
- build_summary_allele_batch_dict(allele, summary_variant_data) dict[str, list[Any]] [source]
Build a batch of summary allele data in the form of a dict.
- property schema_family
Lazily construct and return the schema for the family alleles.
- property schema_summary
Lazily construct and return the schema for the summary alleles.
- property searchable_properties
- property searchable_properties_family
- property searchable_properties_summary
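The lazy construction mentioned for `schema_family` and `schema_summary` can be pictured with the sketch below: the schema is built on first access and cached afterwards. The class `LazySchemaHolder`, its trimmed-down `BASE_SCHEMA`, and the use of plain type-name strings instead of pyarrow types are assumptions for illustration; how the real serializer merges the annotation schema is not shown in this reference.

```python
class LazySchemaHolder:
    """Hypothetical example of a lazily built, cached schema property."""

    # Two fields borrowed from SUMMARY_ALLELE_BASE_SCHEMA, as plain strings.
    BASE_SCHEMA = {"summary_index": "int32", "allele_index": "int32"}

    def __init__(self, annotation_schema):
        self.annotation_schema = annotation_schema
        self._schema_summary = None   # built on first access
        self.build_count = 0          # only here to show the caching

    @property
    def schema_summary(self):
        if self._schema_summary is None:
            self.build_count += 1
            schema = dict(self.BASE_SCHEMA)
            # Assumed merge rule: annotation fields never override base ones.
            for name, type_name in self.annotation_schema.items():
                schema.setdefault(name, type_name)
            self._schema_summary = schema
        return self._schema_summary


holder = LazySchemaHolder({"score": "float"})
first = holder.schema_summary
second = holder.schema_summary
print(first is second, holder.build_count)
```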
dae.backends.schema2.summary_builder module
- class dae.backends.schema2.summary_builder.SummaryQueryBuilder(dialect: dae.backends.schema2.base_query_builder.Dialect, db, family_variant_table, summary_allele_table, pedigree_table, family_variant_schema, summary_allele_schema, table_properties, pedigree_schema, pedigree_df, gene_models=None, do_join_affected=False)[source]
Bases:
dae.backends.schema2.base_query_builder.BaseQueryBuilder
Build queries related to summary variants.
dae.backends.schema2.vcf2schema2 module
Import script similar to vcf2parquet.py.
TODO: when complete, add to setup.py; do not inherit, create a new tool; retrace the steps of the Variants2ParquetTool class.
- class dae.backends.schema2.vcf2schema2.MakefilePartitionHelper(partition_descriptor, genome, add_chrom_prefix=None, del_chrom_prefix=None)[source]
Bases:
object
- class dae.backends.schema2.vcf2schema2.Variants2Schema2[source]
Bases:
object
- BUCKET_INDEX_DEFAULT = 1000
- VARIANTS_FREQUENCIES: bool = True
- VARIANTS_LOADER_CLASS
alias of
dae.backends.vcf.loader.VcfLoader
- VARIANTS_TOOL: Optional[str] = 'vcf2schema2.py'
- dae.backends.schema2.vcf2schema2.construct_import_annotation_pipeline(gpf_instance, annotation_configfile=None)[source]
Module contents
Implementation for the next version (v2) of the DB schema.
The variants schema is split into two separate tables: summary allele and family variant.
Schema2 is supported on BigQuery and Impala (specified via Dialect).
Parquet generation outputs two separate parquet files.
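The two-table layout can be pictured with a small in-memory example. The sketch below uses SQLite purely as a stand-in for BigQuery or Impala and joins the tables on `bucket_index`, `summary_index` and `allele_index`, the columns that appear in both `FAMILY_ALLELE_BASE_SCHEMA` and `SUMMARY_ALLELE_BASE_SCHEMA`. Treating those three columns as the join key is an assumption, as are the table names and the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE summary_allele (
    bucket_index INTEGER, summary_index INTEGER, allele_index INTEGER,
    chromosome TEXT, position INTEGER
);
CREATE TABLE family_variant (
    bucket_index INTEGER, summary_index INTEGER, allele_index INTEGER,
    family_id TEXT
);
-- Made-up sample data: one allele seen in two families, one in a single family.
INSERT INTO summary_allele VALUES
    (1, 0, 1, 'chr1', 1000),
    (1, 1, 1, 'chr2', 2000);
INSERT INTO family_variant VALUES
    (1, 0, 1, 'f1'),
    (1, 0, 1, 'f2'),
    (1, 1, 1, 'f1');
""")

# Summary-level attributes are stored once; family rows reference them.
rows = conn.execute("""
SELECT fa.family_id, sa.chromosome, sa.position
FROM family_variant AS fa
JOIN summary_allele AS sa
  ON fa.bucket_index = sa.bucket_index
 AND fa.summary_index = sa.summary_index
 AND fa.allele_index = sa.allele_index
WHERE sa.chromosome = 'chr1'
ORDER BY fa.family_id
""").fetchall()

print(rows)
conn.close()
```

Keeping summary-level fields in their own table means they are not repeated for every family that carries the allele.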