variants package¶
Variants in families¶
Example usage of variants¶
Example usage of the variants package:
import os
from utils.variant_utils import mat2str
from variants.builder import variants_builder as VB

# Alternative dataset prefixes:
# prefix = "ivan-tiny/a"
# prefix = "spark/nspark"
prefix = "fixtures/effects_trio"

# The DAE_DB_DIR environment variable must point to the data directory;
# indexing os.environ raises a KeyError if it is not set.
dae_db_dir = os.environ["DAE_DB_DIR"]

genome_file = os.path.join(
    dae_db_dir,
    "genomes/GATK_ResourceBundle_5777_b37_phiX174",
    "chrAll.fa")
print(genome_file)

gene_models_file = os.path.join(
    dae_db_dir,
    "genomes/GATK_ResourceBundle_5777_b37_phiX174",
    "refGene-201309.gz")
print(gene_models_file)

fvars = VB(prefix=prefix, genome_file=genome_file,
           gene_models_file=gene_models_file)

# Iterate over all family variants and print their alternative alleles.
vs = fvars.query_variants()
for c, v in enumerate(vs):
    print(c, v, v.family_id, mat2str(v.best_st), sep='\t')
    for aa in v.alt_alleles:
        print(aa.effect.worst, aa.effect.genes)
        print(aa['af_allele_count'], aa['af_allele_freq'])
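The example above reads the data directory from the DAE_DB_DIR environment variable. The following is a minimal sketch of a defensive path lookup that fails with a clear message when the variable is missing. The helper name resolve_dae_path and its env parameter are hypothetical and not part of the variants package.

```python
import os


def resolve_dae_path(*parts, env=None):
    """Join path parts under the directory named by DAE_DB_DIR.

    Hypothetical helper for illustration; not part of the variants package.
    The env argument defaults to os.environ and exists only to ease testing.
    """
    env = os.environ if env is None else env
    base = env.get("DAE_DB_DIR")
    if not base:
        # Passing None to os.path.join would raise a less helpful TypeError.
        raise RuntimeError("DAE_DB_DIR environment variable is not set")
    return os.path.join(base, *parts)
```

With such a helper, genome_file could be built as resolve_dae_path("genomes/GATK_ResourceBundle_5777_b37_phiX174", "chrAll.fa").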
Family variants query interface¶
Once you have created a family variants interface, you can use it to search for the variants you are interested in. The interface supports queries by various attributes of family variants:
query by genome regions
query by genes and variant effect types
query by inheritance types
query by family IDs
query by person IDs
query by sexes
query by family roles
query by variant types
query by real value variant attributes (scores)
query using a general-purpose filter function
In the following examples we assume that fvars is an instance of the family variants query interface.
Query by regions¶
The query interface supports searching for variants in a given genome region or list of regions.
- Example
The following example will return variants located at a single position, 1:878109 (chromosome 1, position 878109):
from dae.utils.regions import Region
vs = fvars.query_variants(regions=[Region("1", 878109, 878109)])
You can also specify a list of regions in the query:
from dae.utils.regions import Region
vs = fvars.query_variants(
    regions=[Region("1", 11539, 11539), Region("1", 11550, 11550)])
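To illustrate what matching against a list of regions means, here is a minimal sketch. It assumes that region boundaries are inclusive on both ends, as the single-position examples suggest, and it only tests a variant's start position. SimpleRegion and in_any_region are hypothetical stand-ins and are not the dae.utils.regions.Region implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleRegion:
    """Closed interval [start, stop] on a chromosome (hypothetical stand-in)."""
    chrom: str
    start: int
    stop: int

    def contains(self, chrom, position):
        # Assumption: both boundaries are inclusive.
        return chrom == self.chrom and self.start <= position <= self.stop


def in_any_region(chrom, position, regions):
    """True when the position falls inside at least one of the regions."""
    return any(r.contains(chrom, position) for r in regions)


regions = [SimpleRegion("1", 11539, 11539), SimpleRegion("1", 11550, 11550)]
```

A variant matches the query when it falls in any of the listed regions, so the list acts as a union of intervals.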
Query by genes and effect types¶
- Example
The following example will return only variants with effect type frame-shift:
vs = fvars.query_variants(effect_types=["frame-shift"])
You can specify multiple effect types in the query. The following example will return variants with effect type frame-shift or missense:
vs = fvars.query_variants(effect_types=["frame-shift", "missense"])
You can search for variants in a specific gene:
vs = fvars.query_variants(genes=["PLEKHN1"])
or in a list of genes:
vs = fvars.query_variants(genes=["PLEKHN1", "SAMD11"])
You can specify a combination of effect types and genes, in which case the query will return only variants that match both criteria:
vs = fvars.query_variants(
    effect_types=["synonymous", "frame-shift"], genes=["PLEKHN1"])
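The way the two criteria combine can be sketched as follows: values inside one list are alternatives, while the effect type list and the gene list must both match. This is a simplified illustration on plain dictionaries. The matches function and the sample records are hypothetical, and the real interface may match effects per gene rather than per allele.

```python
def matches(allele, effect_types=None, genes=None):
    """Check a plain-dict allele against optional effect type and gene lists.

    Hypothetical illustration; a criterion set to None is not applied.
    """
    if effect_types is not None and allele["effect_type"] not in effect_types:
        return False
    if genes is not None and not set(genes) & set(allele["genes"]):
        return False
    return True


# Made-up sample records, used only to show the filtering logic.
alleles = [
    {"effect_type": "frame-shift", "genes": ["PLEKHN1"]},
    {"effect_type": "missense", "genes": ["PLEKHN1"]},
    {"effect_type": "synonymous", "genes": ["SAMD11"]},
]

selected = [
    a for a in alleles
    if matches(a, effect_types=["synonymous", "frame-shift"], genes=["PLEKHN1"])
]
```

Only the first record is selected: the second has the right gene but the wrong effect type, and the third has an accepted effect type but the wrong gene.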
Query by inheritance¶
- Example
The following example will return only variants that have inheritance type denovo:
vs = fvars.query_variants(inheritance="denovo")
You can combine inheritance types using or:
vs = fvars.query_variants(inheritance="denovo or omission")
You can use not to get all family variants that have a non-reference inheritance type:
vs = fvars.query_variants(inheritance="not reference")
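The source does not describe how these query strings are interpreted internally. As an illustration of the intended semantics only, the sketch below evaluates such an expression against the set of labels attached to a variant. The evaluate function is hypothetical; it assumes the usual precedence (not binds tighter than and, which binds tighter than or) and allows parentheses.

```python
import re


def evaluate(query, labels):
    """Evaluate a query such as "denovo or omission" against a set of labels.

    Hypothetical illustration; a bare name is true when it is in labels.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "or":
            take()
            right = parse_and()  # parse first so tokens are always consumed
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "and":
            take()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        if peek() == "not":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token is None or token in (")", "and", "or"):
            raise ValueError(f"unexpected token: {token!r}")
        return token in labels

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return result
```

For example, evaluate("not reference", {"denovo"}) is true, while evaluate("not reference", {"reference"}) is false. The same sketch also handles and, which the sexes, roles and variant type queries described later use.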
Query by family IDs¶
- Example
The following example will return only variants that affect specified families:
vs = fvars.query_variants(family_ids=['f1', 'f2'])
where f1 and f2 are family IDs.
Query by person IDs¶
- Example
The following example will return only variants that affect specified individuals:
vs = fvars.query_variants(person_ids=['mom2', 'ch2'])
where mom2 and ch2 are person (individual) IDs.
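A person ID query can be thought of as an intersection test between the requested IDs and the members affected by a variant, which the variant_in_members property documented later on this page exposes. The sketch below uses SimpleNamespace objects as stand-ins for family variants; affects_any and the sample values are hypothetical.

```python
from types import SimpleNamespace


def affects_any(variant, person_ids):
    """True when at least one requested person carries the variant.

    Hypothetical illustration; None entries are skipped in case the
    member list marks unaffected members that way.
    """
    affected = {m for m in variant.variant_in_members if m is not None}
    return bool(affected & set(person_ids))


# Made-up stand-ins for family variants.
variants = [
    SimpleNamespace(family_id="f1", variant_in_members=["mom1", "ch1"]),
    SimpleNamespace(family_id="f2", variant_in_members=["mom2"]),
    SimpleNamespace(family_id="f2", variant_in_members=["dad2", None]),
]

selected = [v for v in variants if affects_any(v, ["mom2", "ch2"])]
```

Here only the second stand-in is selected, because mom2 is the only requested person among the affected members.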
Query by sexes¶
- Example
The following example will return only variants that affect male individuals:
vs = fvars.query_variants(sexes="male")
You can use or, and and not to combine sexes. For example:
vs = fvars.query_variants(sexes="male and not female")
will return only family variants that affect male individuals in the family, but not female individuals.
Query by roles¶
- Example
The following example will return only variants that affect probands in families:
vs = fvars.query_variants(roles="prb")
You can use or, and and not to combine roles. For example:
vs = fvars.query_variants(roles="prb and not sib")
will return only family variants that affect probands in the family, but not siblings.
Query by variant types¶
- Example
The following example will return only variants that are of type sub:
vs = fvars.query_variants(variant_types="sub")
You can use or, and and not to combine variant types. For example:
vs = fvars.query_variants(variant_types="sub or del")
will return only family variants that are of type sub or del.
Query with real value variant attributes (scores)¶
Not fully implemented yet
Query with filter function¶
Not fully implemented yet
VariantBase - a base class for variants¶
SummaryAllele - a base class for representing alleles¶
-
class
dae.variants.variant.
SummaryAllele
(chromosome: str, position: int, reference: str, alternative: Optional[str] = None, end_position: Optional[int] = None, summary_index: int = -1, allele_index: int = 0, transmission_type: dae.variants.attributes.TransmissionType = <TransmissionType.transmitted: 1>, variant_type=None, attributes: Optional[Dict[str, Any]] = None)[source]¶ SummaryAllele represents a single allele at a given position.
-
__init__
(chromosome: str, position: int, reference: str, alternative: Optional[str] = None, end_position: Optional[int] = None, summary_index: int = -1, allele_index: int = 0, transmission_type: dae.variants.attributes.TransmissionType = <TransmissionType.transmitted: 1>, variant_type=None, attributes: Optional[Dict[str, Any]] = None)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
property
allele_index
¶ index of the allele in summary variant
-
property
alternative
¶
-
property
attributes
¶ additional attributes of the allele
-
property
chrom
¶
-
property
chromosome
¶
-
property
cshl_location
¶
-
property
cshl_position
¶
-
property
cshl_variant
¶
-
property
cshl_variant_full
¶
-
property
details
¶
-
property
effect
¶ effects of the allele; None for the reference allele.
-
property
effect_gene_symbols
¶
-
property
effect_genes
¶
-
property
effect_types
¶
-
property
effects
¶
-
property
end_position
¶
-
property
frequency
¶
-
get_attribute
(item: str, default=None)¶ looks up values matching key item in additional attributes passed on creation of the variant.
-
has_attribute
(item: str) → bool¶ checks if additional variant attributes contain values for key item.
-
property
is_reference_allele
¶
-
property
position
¶
-
property
reference
¶
-
property
summary_index
¶ index of the summary variant this allele belongs to
-
property
transmission_type
¶
-
update_attributes
(atts) → None¶ updates additional attributes of variant using dictionary atts.
-
property
variant_type
¶
-
SummaryVariant - representation of summary variants¶
-
class
dae.variants.variant.
SummaryVariant
(alleles)[source]¶ -
__contains__
(item: Any) → bool¶
-
__getitem__
(item: Any) → List[Any]¶
-
property
allele_count
¶
-
property
alleles
¶ list of all alleles of the variant
-
property
alt_alleles
¶ list of all alternative alleles
-
property
alternative
¶
-
property
chrom
¶
-
property
chromosome
¶
-
property
cshl_location
¶
-
property
cshl_variant
¶
-
property
cshl_variant_full
¶
-
property
details
¶ list of VariantDetails objects that describe each alternative allele.
-
property
effect_gene_symbols
¶
-
property
effect_types
¶
-
property
effects
¶ 1-based list of Effect objects that describe the variant effects.
-
property
end_position
¶
-
property
frequencies
¶ 0-based list of frequencies for the variant.
-
get_attribute
(item: Any, default: Optional[Any] = None) → List[Any]¶
-
has_attribute
(item: Any) → bool¶
-
property
location
¶
-
property
matched_alleles
¶
-
property
matched_alleles_indexes
¶
-
property
matched_gene_effects
¶
-
property
position
¶
-
property
ref_allele
¶ the reference allele
-
property
reference
¶
-
set_matched_alleles
(alleles_indexes)¶
-
property
summary_index
¶
-
property
svuid
¶
-
property
transmission_type
¶
-
update_attributes
(atts: Dict[str, Any]) → None¶
-
property
variant_types
¶ returns set of variant types.
-
FamilyDelegate - common inheritance methods¶
-
class
dae.variants.family_variant.
FamilyDelegate
(family)[source]¶ -
property
family_id
¶ Returns the family ID.
-
property
members_ids
¶ Returns list of family members IDs.
-
property
members_in_order
¶ Returns the list of family members in the order specified in the pedigree file. Each element of the returned list is an object of type variants.family.Person.
-
FamilyAllele - representation of family allele¶
-
class
dae.variants.family_variant.
FamilyAllele
(summary_allele: dae.variants.variant.SummaryAllele, family: dae.pedigrees.family.Family, genotype, best_state, genetic_model=None, inheritance_in_members=None)[source]¶ -
property
allele_index
¶ index of the allele in summary variant
-
property
alternative
¶
-
property
attributes
¶ additional attributes of the allele
-
property
best_st
¶ Deprecated: use best_state instead of best_st.
-
property
best_state
¶
-
classmethod
calc_inheritance_trio
(p1, p2, ch, allele_index)[source]¶ Calculates the inheritance type of a trio family.
- Parameters
p1 – genotype of the first parent (pair of allele indexes).
p2 – genotype of the second parent.
ch – genotype of the child.
- Returns
inheritance type of the trio family as variants.attributes.Inheritance.
-
static
check_denovo_trio
(p1, p2, ch, allele_index)[source]¶ Checks if the inheritance type for a trio family is denovo.
- Parameters
p1 – genotype of the first parent (pair of allele indexes).
p2 – genotype of the second parent.
ch – genotype of the child.
- Returns
True, when the inheritance is denovo.
-
static
check_mendelian_trio
(p1, p2, ch, allele_index)[source]¶ Checks if the inheritance type for a trio family is mendelian.
- Parameters
p1 – genotype of the first parent (pair of allele indexes).
p2 – genotype of the second parent.
ch – genotype of the child.
- Returns
True, when the inheritance is mendelian.
-
static
check_omission_trio
(p1, p2, ch, allele_index)[source]¶ Checks if the inheritance type for a trio family is omission.
- Parameters
p1 – genotype of the first parent (pair of allele indexes).
p2 – genotype of the second parent.
ch – genotype of the child.
- Returns
True, when the inheritance is omission.
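The three trio checks above can be illustrated with a simplified sketch. It assumes diploid genotypes given as pairs of allele indexes, as the parameter descriptions state, and uses common textbook definitions of the three inheritance types. The function names are hypothetical and the library's actual rules may differ in edge cases.

```python
def is_denovo(p1, p2, ch, allele_index):
    """Child carries allele_index although neither parent does."""
    return (allele_index in ch
            and allele_index not in p1
            and allele_index not in p2)


def is_omission(p1, p2, ch, allele_index):
    """A parent is homozygous for allele_index, yet the child lacks it."""
    homozygous_parent = any(
        g[0] == g[1] == allele_index for g in (p1, p2))
    return homozygous_parent and allele_index not in ch


def is_mendelian(p1, p2, ch, allele_index):
    """Child carries allele_index and got one allele from each parent."""
    if allele_index not in ch:
        return False
    a, b = ch
    return (a in p1 and b in p2) or (a in p2 and b in p1)
```

For example, with parents (0, 0) and (0, 0) and a child (0, 1), allele 1 is denovo; with parents (0, 1) and (0, 0) the same child genotype is mendelian; and with parents (1, 1) and (0, 0) a child (0, 0) shows an omission of allele 1.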
-
property
chrom
¶
-
property
chromosome
¶
-
property
cshl_location
¶
-
property
cshl_position
¶
-
property
cshl_variant
¶
-
property
cshl_variant_full
¶
-
property
details
¶
-
property
effect
¶ effects of the allele; None for the reference allele.
-
property
effect_gene_symbols
¶
-
property
effect_genes
¶
-
property
effect_types
¶
-
property
effects
¶
-
property
end_position
¶
-
property
family_id
¶ Returns the family ID.
-
property
family_index
¶
-
property
frequency
¶
-
property
genetic_model
¶
-
property
genotype
¶ Returns genotype of the family.
-
get_attribute
(item: str, default=None)¶ looks up values matching key item in additional attributes passed on creation of the variant.
-
has_attribute
(item: str) → bool¶ checks if additional variant attributes contain values for key item.
-
property
inheritance_in_members
¶
-
property
is_reference_allele
¶
-
property
members_ids
¶ Returns list of family members IDs.
-
property
members_in_order
¶ Returns the list of family members in the order specified in the pedigree file. Each element of the returned list is an object of type variants.family.Person.
-
property
position
¶
-
property
reference
¶
-
summary_allele
: dae.variants.variant.SummaryAllele¶ summary allele that corresponds to this allele in family variant
-
property
summary_index
¶ index of the summary variant this allele belongs to
-
property
transmission_type
¶
-
update_attributes
(atts) → None¶ updates additional attributes of variant using dictionary atts.
-
property
variant_in_members
¶ Returns set of members IDs of the family that are affected by this family variant.
-
property
variant_in_members_objects
¶
-
property
variant_in_roles
¶ Returns list of roles (or ‘None’) of the members of the family that are affected by this family variant.
-
property
variant_in_sexes
¶ Returns list of sexes (or ‘None’) of the members of the family that are affected by this family variant.
-
property
variant_type
¶
-
FamilyVariant - representation of family variants¶
-
class
dae.variants.family_variant.
FamilyVariant
(summary_variant: dae.variants.variant.SummaryVariant, family: dae.pedigrees.family.Family, genotype: Any, best_state: Any, inheritance_in_members=None)[source]¶ -
property
allele_count
¶
-
property
allele_indexes
¶
-
property
alleles
¶ list of all alleles of the variant
-
property
alt_alleles
¶ list of all alternative alleles
-
property
alternative
¶
-
property
best_st
¶ Deprecated: use best_state instead of best_st.
-
property
best_state
¶
-
static
calc_alleles
(gt)[source]¶ Returns allele indexes that are relevant for the given genotype.
- Parameters
gt – genotype as np.array.
- Returns
list of all allele indexes present in the given genotype.
-
static
calc_alt_alleles
(gt)[source]¶ Returns alternative allele indexes that are relevant for the given genotype.
- Parameters
gt – genotype as np.array.
- Returns
list of all alternative allele indexes present in the given genotype.
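What calc_alleles and calc_alt_alleles compute can be sketched without the library. The sketch assumes that a genotype is a two-row matrix with one column per family member, that index 0 is the reference allele, and that -1 marks an unknown call; none of this is stated in the source. Nested lists are used here instead of np.array to keep the example free of dependencies, and the function names are hypothetical.

```python
def allele_indexes(gt):
    """Sorted allele indexes present in a genotype, skipping unknown (-1)."""
    return sorted({a for row in gt for a in row if a >= 0})


def alt_allele_indexes(gt):
    """Like allele_indexes, but without the reference allele (index 0)."""
    return [a for a in allele_indexes(gt) if a > 0]


# Made-up trio genotype: columns are mother, father, child.
gt = [
    [0, 0, 0],
    [0, 1, 2],
]
```

For this genotype the allele indexes are 0, 1 and 2, and the alternative allele indexes are 1 and 2.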
-
property
chrom
¶
-
property
chromosome
¶
-
property
cshl_location
¶
-
property
cshl_variant
¶
-
property
cshl_variant_full
¶
-
property
details
¶ list of VariantDetails objects that describe each alternative allele.
-
property
effect_gene_symbols
¶
-
property
effect_types
¶
-
property
effects
¶ 1-based list of Effect objects that describe the variant effects.
-
property
end_position
¶
-
property
family_allele_indexes
¶
-
property
family_best_state
¶
-
property
family_genotype
¶ Returns family genotype using family variant indexes.
-
property
family_id
¶ Returns the family ID.
-
property
family_index
¶
-
property
frequencies
¶ 0-based list of frequencies for the variant.
-
property
fvuid
¶
-
property
genetic_model
¶
-
property
genotype
¶ Returns genotype using summary variant allele indexes.
-
get_attribute
(item: Any, default: Optional[Any] = None) → List[Any]¶
-
has_attribute
(item: Any) → bool¶
-
property
location
¶
-
property
matched_alleles
¶
-
property
matched_alleles_indexes
¶
-
property
matched_gene_effects
¶
-
property
members_ids
¶ Returns list of family members IDs.
-
property
members_in_order
¶ Returns the list of family members in the order specified in the pedigree file. Each element of the returned list is an object of type variants.family.Person.
-
property
position
¶
-
property
ref_allele
¶ the reference allele
-
property
reference
¶
-
set_matched_alleles
(alleles_indexes)¶
-
property
summary_index
¶
-
property
transmission_type
¶
-
update_attributes
(atts: Dict[str, Any]) → None¶
-
property
variant_in_members
¶
-
property
variant_types
¶ returns set of variant types.
-
RawVcfVariants - query interface for VCF variants¶
Apache Parquet variants schema¶
Summary Variants/Alleles flat schema¶
- chrom (string) -
chromosome where variant is located
- position (int64) -
1-based position of the start of the variant
- reference (string) -
reference DNA string
- alternative (string) -
alternative DNA string (None for reference allele)
- summary_index (int64) -
index of the summary variant
- allele_index (int16) -
index of the allele inside given summary variant
- variant_type (int8) -
variant type in CSHL notation
- cshl_variant (string) -
variant description in CSHL notation
- cshl_position (int64) -
variant position in CSHL notation
- cshl_length (int32) -
variant length in CSHL notation
- effect_type (string) -
worst effect of the variant (None for reference allele)
- effect_gene_genes (list_(string)) -
list of all genes affected by the variant allele (None for reference allele)
- effect_gene_types (list_(string)) -
list of all effect types corresponding to the effect_gene_genes (None for reference allele)
- effect_details_transcript_ids (list_(string)) -
list of all transcript ids affected by the variant allele (None for reference allele)
- effect_details_details (list_(string)) -
list of all effect details corresponding to the effect_details_transcript_ids (None for reference allele)
- af_parents_called_count (int32) -
count of independent parents that have a well-specified genotype for this allele
- af_parents_called_percent (float64) -
percent of independent parents corresponding to af_parents_called_count
- af_allele_count (int32) -
count of this allele in the independent parents
- af_allele_freq (float64) -
allele frequency
Family Variants schema¶
chrom (string)
position (int64)
- family_index (int64) -
index of the family variant
- summary_index (int64) -
index of the summary variant
- family_id (string) -
family ID
- genotype (list_(int8)) -
genotype of the variant for the specified family
- inheritance (int32) -
inheritance type of the variant
Family Alleles schema¶
family_index (int64)
summary_index (int64)
allele_index (int16)
- variant_in_members (list_(string)) -
list of members of the family that have this allele
- variant_in_roles (list_(int32)) -
list of family members’ roles that have this allele
- variant_in_sexes (list_(int8)) -
list of family members’ sexes that have this allele
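The family allele rows share the summary_index and allele_index columns with the summary allele rows, which suggests that the two tables are joined on that pair. The sketch below shows such a join on plain dictionaries. The join key is an assumption drawn from the shared column names, and join_alleles and the sample rows are hypothetical.

```python
def join_alleles(family_alleles, summary_alleles):
    """Attach summary allele columns to each family allele row.

    Assumption: (summary_index, allele_index) identifies a summary allele.
    """
    by_key = {
        (s["summary_index"], s["allele_index"]): s for s in summary_alleles
    }
    joined = []
    for fa in family_alleles:
        summary = by_key.get((fa["summary_index"], fa["allele_index"]))
        if summary is not None:
            # Family columns are merged on top of the summary columns.
            joined.append({**summary, **fa})
    return joined


# Made-up sample rows using a subset of the schema columns.
summary_alleles = [
    {"summary_index": 0, "allele_index": 0, "chrom": "1",
     "position": 11539, "effect_type": None},
    {"summary_index": 0, "allele_index": 1, "chrom": "1",
     "position": 11539, "effect_type": "missense"},
]
family_alleles = [
    {"family_index": 0, "summary_index": 0, "allele_index": 1,
     "variant_in_members": ["ch1"]},
]

rows = join_alleles(family_alleles, summary_alleles)
```

Each resulting row carries both the family-level columns (such as variant_in_members) and the summary-level columns (such as effect_type).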
Variant Scores schema¶
summary_index (int64)
allele_index (int16)
score_id (string or int64)
score_value (float64)
Pedigree file schema¶
familyId (string)
personId (string)
dadId (string)
momId (string)
sex (int8)
status (int8)
role (int32)
sampleId (string)
order (int32)
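As an illustration of the pedigree schema, the sketch below maps the listed columns to a typed Python record and converts a row of strings into it. The PedigreeRow class, the parse_row function and the sample values are hypothetical. The source does not describe how sex, status and role are encoded as integers, so the numbers in the example carry no meaning.

```python
from dataclasses import dataclass


@dataclass
class PedigreeRow:
    """One pedigree record; field names and types follow the schema above."""
    familyId: str
    personId: str
    dadId: str
    momId: str
    sex: int
    status: int
    role: int
    sampleId: str
    order: int


# Columns stored as integers in the schema.
INT_FIELDS = ("sex", "status", "role", "order")


def parse_row(raw):
    """Convert a mapping of column name to string into a PedigreeRow."""
    values = dict(raw)
    for name in INT_FIELDS:
        values[name] = int(values[name])
    return PedigreeRow(**values)


# Made-up row; the integer codes are placeholders.
row = parse_row({
    "familyId": "f1", "personId": "ch1", "dadId": "dad1", "momId": "mom1",
    "sex": "1", "status": "2", "role": "3", "sampleId": "ch1", "order": "2",
})
```

A missing or extra column makes the PedigreeRow constructor raise a TypeError, which gives a basic schema check for free.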