Genomic Variants
Apache Parquet variants schema
Summary Variants/Alleles flat schema
- chrom (string) -
chromosome where variant is located
- position (int64) -
1-based position of the start of the variant
- reference (string) -
reference DNA string
- alternative (string) -
alternative DNA string (None for reference allele)
- summary_index (int64) -
index of the summary variant
- allele_index (int16) -
index of the allele inside given summary variant
- variant_type (int8) -
variant type in CSHL notation
- cshl_variant (string) -
variant description in CSHL notation
- cshl_position (int64) -
variant position in CSHL notation
- cshl_length (int32) -
variant length in CSHL notation
- effect_type (string) -
worst effect of the variant (None for reference allele)
- effect_gene_genes (list_(string)) -
list of all genes affected by the variant allele (None for reference allele)
- effect_gene_types (list_(string)) -
list of all effect types corresponding to the effect_gene_genes (None for reference allele)
- effect_details_transcript_ids (list_(string)) -
list of all transcript ids affected by the variant allele (None for reference allele)
- effect_details_details (list_(string)) -
list of all effect details corresponding to the effect_details_transcript_ids (None for reference allele)
- af_parents_called_count (int32) -
count of independent parents that have a well-specified genotype for this allele
- af_parents_called_percent (float64) -
percent of independent parents corresponding to af_parents_called_count
- af_allele_count (int32) -
count of this allele in the independent parents
- af_allele_freq (float64) -
allele frequency
Family Variants schema
chrom (string)
position (int64)
- family_index (int64) -
index of the family variant
- summary_index (int64) -
index of the summary variant
- family_id (string) -
family ID
- genotype (list_(int8)) -
genotype of the variant for the specified family
- inheritance (int32) -
inheritance type of the variant
Family Alleles schema
family_index (int64)
summary_index (int64)
allele_index (int16)
- variant_in_members (list_(string)) -
list of members of the family that have this allele
- variant_in_roles (list_(int32)) -
list of family members’ roles that have this allele
- variant_in_sexes (list_(int8)) -
list of family members’ sexes that have this allele
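The schemas above share index columns: summary_index and allele_index appear in both the summary and the family allele tables, and family_index links family alleles to family variants. The following is a minimal sketch, assuming those shared columns act as join keys (the schemas list the columns but do not spell out the join). Rows are modelled as plain dictionaries, all values are made up, and join_family_to_summary is a hypothetical helper:

```python
# Hypothetical rows; only the column names come from the schemas above.
summary_alleles = [
    {"summary_index": 0, "allele_index": 0, "chrom": "1", "position": 11539,
     "reference": "T", "alternative": None},
    {"summary_index": 0, "allele_index": 1, "chrom": "1", "position": 11539,
     "reference": "T", "alternative": "G"},
]
family_alleles = [
    {"family_index": 7, "summary_index": 0, "allele_index": 1,
     "variant_in_members": ["ch1"]},
]


def join_family_to_summary(family_rows, summary_rows):
    """Attach the matching summary allele columns to each family allele row."""
    # Assumed join key: (summary_index, allele_index).
    by_key = {(r["summary_index"], r["allele_index"]): r for r in summary_rows}
    joined = []
    for row in family_rows:
        summary = by_key.get((row["summary_index"], row["allele_index"]))
        if summary is not None:
            # Family columns are merged on top of the summary columns.
            joined.append({**summary, **row})
    return joined


joined_rows = join_family_to_summary(family_alleles, summary_alleles)
```

Each joined row carries both the allele description (chrom, position, reference, alternative) and the family-specific columns.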
Variant Scores schema
summary_index (int64)
allele_index (int16)
score_id (string or int64)
score_value (float64)
Pedigree file schema
familyId (string)
personId (string)
dadId (string)
momId (string)
sex (int8)
status (int8)
role (int32)
sampleId (string)
order (int32)
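As an illustration of the pedigree columns, the sketch below parses a small pedigree table with Python's csv module. The tab-separated layout, the sample rows, and the numeric codes used for sex, status and role are assumptions made for the example; the schema above only gives the column names and types. read_pedigree is a hypothetical helper.

```python
import csv
import io

# Hypothetical pedigree content: the values, the numeric codes and the
# tab-separated layout are assumed for illustration only.
PEDIGREE_TEXT = (
    "familyId\tpersonId\tdadId\tmomId\tsex\tstatus\trole\tsampleId\torder\n"
    "f1\tdad1\t0\t0\t1\t1\t10\tdad1\t0\n"
    "f1\tmom1\t0\t0\t2\t1\t20\tmom1\t1\n"
    "f1\tch1\tdad1\tmom1\t1\t2\t30\tch1\t2\n"
)

# Columns that the schema declares as integers.
INT_COLUMNS = ("sex", "status", "role", "order")


def read_pedigree(text):
    """Parse pedigree text into a list of dicts with integer columns converted."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    people = []
    for row in reader:
        for column in INT_COLUMNS:
            row[column] = int(row[column])
        people.append(row)
    return people


people = read_pedigree(PEDIGREE_TEXT)

# Group person IDs by family, preserving file order.
families = {}
for person in people:
    families.setdefault(person["familyId"], []).append(person["personId"])
```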
Family variants query interface
Once you have created a family variants interface, you can use it to search for the variants you are interested in. The interface supports queries by various attributes of the family variants:
query by genome regions
query by genes and variant effect types
query by inheritance types
query by family IDs
query by person IDs
query by sexes
query by family roles
query by variant types
query by real value variant attributes (scores)
query using general purpose filter function
In the following examples we will assume that fvars is an instance of the family variants query interface, which allows searching for variants by various criteria.
Query by regions
The query interface supports searching for variants in a given genomic region or list of regions.
- Example:
The following example will return variants located at the single position 878109 on chromosome 1:
from dae.utils.regions import Region
vs = fvars.query_variants(regions=[Region("1", 878109, 878109)])
You can specify a list of regions in the query:
from dae.utils.regions import Region
vs = fvars.query_variants(
    regions=[Region("1", 11539, 11539), Region("1", 11550, 11550)])
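To make the region semantics concrete, here is a small stand-in that does not use dae at all. It assumes that a region is a closed interval (both start and end positions are included), which is consistent with the single-position query Region("1", 878109, 878109) above. SimpleRegion and in_any_region are hypothetical names, not part of the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleRegion:
    """Hypothetical stand-in for a genomic region; not the dae Region class."""
    chrom: str
    start: int
    stop: int

    def contains(self, chrom, position):
        # Assumed semantics: closed interval on a single chromosome.
        return chrom == self.chrom and self.start <= position <= self.stop


def in_any_region(chrom, position, regions):
    """A position matches when it falls in at least one of the listed regions."""
    return any(region.contains(chrom, position) for region in regions)


regions = [SimpleRegion("1", 11539, 11539), SimpleRegion("1", 11550, 11550)]
positions = [("1", 11539), ("1", 11540), ("1", 11550), ("2", 11539)]
matches = [p for p in positions if in_any_region(p[0], p[1], regions)]
```

With a list of regions, a position is kept if it lies in any one of them, so the list behaves as a union.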
Query by genes and effect types
- Example:
The following example will return only variants with effect type frame-shift:
vs = fvars.query_variants(effects=["frame-shift"])
You can specify multiple effects in the query. The following example will return variants with effect type frame-shift or missense:
vs = fvars.query_variants(effects=["frame-shift", "missense"])
You can search for variants in a specific gene:
vs = fvars.query_variants(genes=["PLEKHN1"])
or in a list of genes:
vs = fvars.query_variants(genes=["PLEKHN1", "SAMD11"])
You can specify a combination of effect types and genes, in which case the query will return only variants that match both criteria:
vs = fvars.query_variants(effect_types=["synonymous", "frame-shift"], genes=["PLEKHN1"])
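The matching rule described above (any of the listed effect types, any of the listed genes, and both criteria at once when both are given) can be sketched in plain Python. The sketch assumes that an allele matches when a single gene/effect pair satisfies both criteria, pairing the parallel effect_gene_genes and effect_gene_types lists from the schema; that pairing is an assumption, and the helper allele_matches is hypothetical rather than the way the query interface is implemented.

```python
def allele_matches(allele, effect_types=None, genes=None):
    """Return True when some (gene, effect) pair satisfies every given criterion.

    `allele` is a hypothetical dict holding the parallel lists
    effect_gene_genes and effect_gene_types (None for a reference allele).
    """
    gene_list = allele["effect_gene_genes"] or []
    type_list = allele["effect_gene_types"] or []
    for gene, effect in zip(gene_list, type_list):
        gene_ok = genes is None or gene in genes
        effect_ok = effect_types is None or effect in effect_types
        if gene_ok and effect_ok:
            return True
    return False


# Made-up alleles for illustration.
alleles = [
    {"id": "a", "effect_gene_genes": ["PLEKHN1"], "effect_gene_types": ["frame-shift"]},
    {"id": "b", "effect_gene_genes": ["PLEKHN1"], "effect_gene_types": ["missense"]},
    {"id": "c", "effect_gene_genes": ["SAMD11"], "effect_gene_types": ["synonymous"]},
    {"id": "ref", "effect_gene_genes": None, "effect_gene_types": None},
]

selected = [
    a["id"] for a in alleles
    if allele_matches(a, effect_types=["synonymous", "frame-shift"], genes=["PLEKHN1"])
]
in_gene = [a["id"] for a in alleles if allele_matches(a, genes=["PLEKHN1"])]
```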
Query by inheritance
- Example:
The following example will return only variants that have inheritance type denovo:
vs = fvars.query_variants(inheritance="denovo")
You can combine inheritance types using or:
vs = fvars.query_variants(inheritance="denovo or omission")
You can use not to get all family variants that have a non-reference inheritance type:
vs = fvars.query_variants(inheritance="not reference")
Query by family IDs
- Example:
The following example will return only variants that affect specified families:
vs = fvars.query_variants(family_ids=['f1', 'f2'])
where f1 and f2 are family IDs.
Query by person IDs
- Example:
The following example will return only variants that affect specified individuals:
vs = fvars.query_variants(person_ids=['mom2', 'ch2'])
where mom2 and ch2 are person (individual) IDs.
Query by sexes
- Example:
The following example will return only variants that affect male individuals:
vs = fvars.query_variants(sexes="male")
You can use or, and and not to combine sexes. For example:
vs = fvars.query_variants(sexes="male and not female")
will return only family variants that affect male individuals in the family, but not female ones.
Query by roles
- Example:
The following example will return only variants that affect probands in families:
vs = fvars.query_variants(roles="prb")
You can use or, and and not to combine roles. For example:
vs = fvars.query_variants(roles="prb and not sib")
will return only family variants that affect probands in the family, but not siblings.
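The or, and and not expressions used in the inheritance, sexes and roles queries above can be illustrated with a tiny evaluator. This is a sketch under stated assumptions, not the library's parser: it assumes that a bare name such as prb is true when it appears in the set of values carried by the family variant (for example, the roles of the members that have the allele), and that precedence from tightest to loosest is not, and, or. The function evaluate is hypothetical.

```python
import re

# Tokens are parentheses or names; any other character (e.g. spaces) is skipped.
TOKEN_RE = re.compile(r"\(|\)|[A-Za-z_][A-Za-z0-9_-]*")


def evaluate(expression, present):
    """Evaluate a query such as "prb and not sib" against a set of names."""
    tokens = TOKEN_RE.findall(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "or":
            advance()
            right = parse_and()  # parse first so tokens are always consumed
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "and":
            advance()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        token = advance()
        if token == "not":
            return not parse_not()
        if token == "(":
            value = parse_or()
            if advance() != ")":
                raise ValueError("expected ')'")
            return value
        if token in ("and", "or", ")"):
            raise ValueError(f"unexpected token {token!r}")
        # Assumed semantics: a name is true when it is present in the set.
        return token in present

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result


proband_only = evaluate("prb and not sib", {"prb"})
proband_and_sibling = evaluate("prb and not sib", {"prb", "sib"})
```

Under these assumptions, a variant carried only by the proband satisfies "prb and not sib", while one carried by both the proband and a sibling does not.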
Query by variant types
- Example:
The following example will return only variants that are of type sub:
vs = fvars.query_variants(variant_types="sub")
You can use or, and and not to combine variant types. For example:
vs = fvars.query_variants(variant_types="sub or del")
will return only family variants that are of type sub or del.
Query with real value variant attributes (scores)
Not fully implemented yet
Query with filter function
Not fully implemented yet