variants package

Variants in families

Example usage of variants

The following example builds a family variants interface and prints the variants it returns:

import os
from utils.variant_utils import mat2str
from variants.builder import variants_builder as VB

# Alternative data sets:
# prefix = "ivan-tiny/a"
# prefix = "spark/nspark"
prefix = "fixtures/effects_trio"

# DAE_DB_DIR must point to the data directory; indexing os.environ fails
# early with a KeyError if the variable is not set.
genomes_dir = os.path.join(
    os.environ["DAE_DB_DIR"],
    "genomes/GATK_ResourceBundle_5777_b37_phiX174")

genome_file = os.path.join(genomes_dir, "chrAll.fa")
print(genome_file)

gene_models_file = os.path.join(genomes_dir, "refGene-201309.gz")
print(gene_models_file)


fvars = VB(prefix=prefix, genome_file=genome_file,
           gene_models_file=gene_models_file)

vs = fvars.query_variants()


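for c, v in enumerate(vs):
    # best_state replaces the deprecated best_st property
    print(c, v, v.family_id, mat2str(v.best_state), sep='\t')
    for aa in v.alt_alleles:
        # worst effect and affected genes of each alternative allele
        print(aa.effect.worst, aa.effect.genes)
        # allele count and allele frequency attributes
        print(aa['af_allele_count'], aa['af_allele_freq'])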

Family variants query interface

Once you have created a family variants interface, you can use it to search for the variants you are interested in. The interface supports queries by various attributes of family variants:

  • query by genome regions

  • query by genes and variant effect types

  • query by inheritance types

  • query by family IDs

  • query by person IDs

  • query by sexes

  • query by family roles

  • query by variant types

  • query by real-valued variant attributes (scores)

  • query using a general-purpose filter function

In the following examples we assume that fvars is an instance of the family variants query interface.

Query by regions

The query interface supports searching for variants in a given genome region or list of regions.

Example

The following example will return the variants located at a single position, 878109, on chromosome 1:

from dae.utils.regions import Region

vs = fvars.query_variants(regions=[Region("1", 878109, 878109)])

You can also specify a list of regions in the query:

from dae.utils.regions import Region

vs = fvars.query_variants(
    regions=[Region("1", 11539, 11539), Region("1", 11550, 11550)])
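
To make the region semantics concrete, here is a minimal sketch. It assumes that a region is a closed interval on one chromosome (so equal start and stop select a single position, as in the examples above) and that a variant matches when its position falls inside any of the listed regions. SimpleRegion and in_any_region are hypothetical names used only for illustration; they are not part of dae.utils.regions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SimpleRegion:
    """Hypothetical stand-in for a genome region (closed interval)."""
    chrom: str
    start: int
    stop: int

    def contains(self, chrom: str, position: int) -> bool:
        # Both ends are assumed inclusive, so start == stop selects one position.
        return chrom == self.chrom and self.start <= position <= self.stop


def in_any_region(chrom: str, position: int,
                  regions: Iterable[SimpleRegion]) -> bool:
    """True when the position lies in at least one of the regions."""
    return any(r.contains(chrom, position) for r in regions)


regions = [SimpleRegion("1", 11539, 11539), SimpleRegion("1", 11550, 11550)]
hits = [pos for pos in (11539, 11540, 11550)
        if in_any_region("1", pos, regions)]
print(hits)
```

Under these assumptions only the two listed positions are selected; position 11540 lies between the regions and is skipped.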

Query by genes and effect types

Example

The following example will return only variants with effect type frame-shift:

vs = fvars.query_variants(
    effect_types=["frame-shift"])

You can specify multiple effect types in the query. The following example will return variants with effect type frame-shift or missense:

vs = fvars.query_variants(
    effect_types=["frame-shift", "missense"])

You can search for variants in a specific gene:

vs = fvars.query_variants(
    genes=["PLEKHN1"])

or in a list of genes:

vs = fvars.query_variants(
    genes=["PLEKHN1", "SAMD11"])

You can specify a combination of effect types and genes, in which case the query will return only variants that match both criteria:

vs = fvars.query_variants(
    effect_types=["synonymous", "frame-shift"],
    genes=["PLEKHN1"])
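
The way the two criteria combine can be sketched as follows. This is a simplified illustration, not the library's implementation: it assumes each record carries exactly one gene and one effect type, that values inside one list are alternatives (or), and that different criteria must all hold (and). The matches function and the record layout are hypothetical.

```python
from typing import Optional, Sequence


def matches(record: dict,
            effect_types: Optional[Sequence[str]] = None,
            genes: Optional[Sequence[str]] = None) -> bool:
    """Return True when the record satisfies every criterion that was given."""
    # A criterion that is None places no restriction on the record.
    if effect_types is not None and record["effect_type"] not in effect_types:
        return False
    if genes is not None and record["gene"] not in genes:
        return False
    return True


# Invented records, for illustration only.
records = [
    {"gene": "PLEKHN1", "effect_type": "frame-shift"},
    {"gene": "PLEKHN1", "effect_type": "missense"},
    {"gene": "SAMD11", "effect_type": "synonymous"},
]

selected = [r for r in records
            if matches(r, effect_types=["synonymous", "frame-shift"],
                       genes=["PLEKHN1"])]
print(selected)
```

Only the first record is selected: the second has the right gene but the wrong effect type, and the third has an accepted effect type but the wrong gene.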

Query by inheritance

Example

The following example will return only variants that have inheritance type denovo:

vs = fvars.query_variants(
    inheritance="denovo")

You can combine inheritance types using or:

vs = fvars.query_variants(
    inheritance="denovo or omission")

You can use not to get all family variants that have a non-reference inheritance type:

vs = fvars.query_variants(inheritance="not reference")
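
One way to read such query strings is as boolean expressions over labels. The sketch below is an assumption-laden illustration, not the library's parser: it treats a variant as a plain set of labels, supports only the keywords and, or and not, and assumes the precedence not > and > or. The real interface may apply the expression per allele or per family member. The evaluate function is a hypothetical helper.

```python
from typing import List, Set


def evaluate(query: str, labels: Set[str]) -> bool:
    """Evaluate a query such as "denovo or omission" against a set of labels."""
    tokens: List[str] = query.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        return token

    def parse_or() -> bool:
        value = parse_and()
        while peek() == "or":
            advance()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_not()
        while peek() == "and":
            advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:
        if peek() == "not":
            advance()
            return not parse_not()
        # Any other token is a label that is looked up in the set.
        return advance() in labels

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result


print(evaluate("denovo or omission", {"omission"}))
print(evaluate("not reference", {"reference"}))
```

The same kind of expression appears in the sexes, roles and variant types queries, so the sketch applies to strings such as "male and not female" as well.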

Query by family IDs

Example

The following example will return only variants that affect specified families:

vs = fvars.query_variants(family_ids=['f1', 'f2'])

where f1 and f2 are family IDs.

Query by person IDs

Example

The following example will return only variants that affect specified individuals:

vs = fvars.query_variants(person_ids=['mom2', 'ch2'])

where mom2 and ch2 are person (individual) IDs.

Query by sexes

Example

The following example will return only variants that affect male individuals:

vs = fvars.query_variants(sexes="male")

You can use or, and and not to combine sexes. For example:

vs = fvars.query_variants(sexes="male and not female")

will return only family variants that affect male individuals in the family, but not female individuals.

Query by roles

Example

The following example will return only variants that affect probands in families:

vs = fvars.query_variants(roles="prb")

You can use or, and and not to combine roles. For example:

vs = fvars.query_variants(roles="prb and not sib")

will return only family variants that affect probands in the family, but not siblings.

Query by variant types

Example

The following example will return only variants that are of type sub:

vs = fvars.query_variants(variant_types="sub")

You can use or, and and not to combine variant types. For example:

vs = fvars.query_variants(variant_types="sub or del")

will return only family variants that are of type sub or del.

Query with real value variant attributes (scores)

Not fully implemented yet

Query with filter function

Not fully implemented yet

VariantBase - a base class for variants

SummaryAllele - a base class for representing alleles

class dae.variants.variant.SummaryAllele(chromosome: str, position: int, reference: str, alternative: Optional[str] = None, end_position: int = None, summary_index: int = -1, allele_index: int = 0, transmission_type: dae.variants.attributes.TransmissionType = <TransmissionType.transmitted: 1>, variant_type=None, attributes: Dict[str, Any] = None)

SummaryAllele represents a single allele for given position.

__init__(chromosome: str, position: int, reference: str, alternative: Optional[str] = None, end_position: int = None, summary_index: int = -1, allele_index: int = 0, transmission_type: dae.variants.attributes.TransmissionType = <TransmissionType.transmitted: 1>, variant_type=None, attributes: Dict[str, Any] = None)


property allele_index

index of the allele in summary variant

property alternative
property attributes

additional attributes of the allele

property chrom
property chromosome
static create_reference_allele(allele) → dae.variants.variant.Allele
property cshl_location
property cshl_position
property cshl_variant
property details
property effect

effects of the allele; None for the reference allele.

property effect_gene_symbols
property effect_genes
property effect_types
property effects
property end_position
property frequency
get_attribute(item: str, default=None)

looks up values matching key item in additional attributes passed on creation of the variant.

has_attribute(item: str) → bool

checks if additional variant attributes contain values for key item.

property is_reference_allele
property position
property reference
property summary_index

index of the summary variant this allele belongs to

property transmission_type
update_attributes(atts) → None

updates additional attributes of variant using dictionary atts.

property variant_type

SummaryVariant - representation of summary variants

class dae.variants.variant.SummaryVariant(alleles)
__contains__(item: Any) → bool
__getitem__(item: Any) → List[Any]
__init__(alleles)


property allele_count
property alleles

list of all alleles of the variant

property alt_alleles

list of all alternative alleles

property alternative
property chrom
property chromosome
property details

1-based list of VariantDetails objects that describe each alternative allele.

property effects

1-based list of Effect objects that describe the variant effects.

property end_position
property frequencies

0-based list of frequencies for the variant.

get_attribute(item: Any, default: Optional[Any] = None) → List[Any]
has_attribute(item: Any) → bool
property location
property position
property ref_allele

the reference allele

property reference
property summary_index
update_attributes(atts: Dict[str, Any]) → None
property variant_types

returns set of variant types.

FamilyDelegate - common inheritance methods

class dae.variants.family_variant.FamilyDelegate(family)
property family_id

Returns the family ID.

property members_ids

Returns list of family members IDs.

property members_in_order

Returns the list of family members in the order specified in the pedigree file. Each element of the returned list is an object of type variants.family.Person.

FamilyAllele - representation of family allele

class dae.variants.family_variant.FamilyAllele(summary_allele: dae.variants.variant.SummaryAllele, family: dae.pedigrees.family.Family, genotype, best_state, genetic_model=None, inheritance_in_members=None)
property allele_index

index of the allele in summary variant

property alternative
property attributes

additional attributes of the allele

property best_st

Deprecated: use best_state instead of best_st.

property best_state
classmethod calc_inheritance_trio(p1, p2, ch, allele_index)

Calculates the inheritance type of a trio family.

Parameters
  • p1 – genotype of the first parent (pair of allele indexes).

  • p2 – genotype of the second parent.

  • ch – genotype of the child.

Returns

inheritance type as variants.attributes.Inheritance of the trio family.

static check_denovo_trio(p1, p2, ch, allele_index)

Checks if the inheritance type for a trio family is denovo.

Parameters
  • p1 – genotype of the first parent (pair of allele indexes).

  • p2 – genotype of the second parent.

  • ch – genotype of the child.

Returns

True, when the inheritance is denovo.

static check_mendelian_trio(p1, p2, ch, allele_index)

Checks if the inheritance type for a trio family is mendelian.

Parameters
  • p1 – genotype of the first parent (pair of allele indexes).

  • p2 – genotype of the second parent.

  • ch – genotype of the child.

Returns

True, when the inheritance is mendelian.

static check_omission_trio(p1, p2, ch, allele_index)

Checks if the inheritance type for a trio family is omission.

Parameters
  • p1 – genotype of the first parent (pair of allele indexes).

  • p2 – genotype of the second parent.

  • ch – genotype of the child.

Returns

True, when the inheritance is omission.
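
The three trio checks can be illustrated with simplified textbook definitions. The sketch below is not the library's code: it assumes diploid genotypes given as pairs of allele indexes, ignores unknown alleles and sex chromosomes, and uses the hypothetical helper names is_denovo, is_mendelian and is_omission. Unlike the documented method, the mendelian helper here looks at the whole child genotype and takes no allele_index.

```python
from typing import Tuple

Genotype = Tuple[int, int]  # pair of allele indexes, 0 is the reference


def is_denovo(p1: Genotype, p2: Genotype, ch: Genotype,
              allele_index: int) -> bool:
    """The child carries the allele although neither parent does."""
    return (allele_index in ch
            and allele_index not in p1
            and allele_index not in p2)


def is_mendelian(p1: Genotype, p2: Genotype, ch: Genotype) -> bool:
    """The child genotype can be formed from one allele of each parent."""
    a, b = ch
    return (a in p1 and b in p2) or (a in p2 and b in p1)


def is_omission(p1: Genotype, p2: Genotype, ch: Genotype,
                allele_index: int) -> bool:
    """A parent is homozygous for the allele, yet the child lacks it."""
    homozygous_parent = any(p[0] == p[1] == allele_index for p in (p1, p2))
    return homozygous_parent and allele_index not in ch


# Both parents are reference, the child has a new alternative allele 1.
print(is_denovo((0, 0), (0, 0), (0, 1), allele_index=1))
# The child inherits allele 1 from the first parent.
print(is_mendelian((0, 1), (0, 0), (0, 1)))
# The first parent is homozygous for allele 1, the child has none.
print(is_omission((1, 1), (0, 0), (0, 0), allele_index=1))
```

Each of the three calls describes a trio that fits exactly one of the categories under these simplified definitions.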

property chrom
property chromosome
property cshl_location
property cshl_position
property cshl_variant
property details
property effect

effects of the allele; None for the reference allele.

property effect_gene_symbols
property effect_genes
property effect_types
property effects
property end_position
property family_id

Returns the family ID.

property family_index
property frequency
property genetic_model
property genotype

Returns genotype of the family.

get_attribute(item: str, default=None)

looks up values matching key item in additional attributes passed on creation of the variant.

gt_flatten()

Return genotype of the family variant flattened to 1-dimensional array.

has_attribute(item: str) → bool

checks if additional variant attributes contain values for key item.

property inheritance_in_members
property is_reference_allele
property members_ids

Returns list of family members IDs.

property members_in_order

Returns the list of family members in the order specified in the pedigree file. Each element of the returned list is an object of type variants.family.Person.

property position
property reference
summary_allele: SummaryAllele = None

summary allele that corresponds to this allele in family variant

property summary_index

index of the summary variant this allele belongs to

property transmission_type
update_attributes(atts) → None

updates additional attributes of variant using dictionary atts.

property variant_in_members

Returns set of members IDs of the family that are affected by this family variant.

property variant_in_members_objects
property variant_in_roles

Returns list of roles (or ‘None’) of the members of the family that are affected by this family variant.

property variant_in_sexes

Returns list of sexes (or ‘None’) of the members of the family that are affected by this family variant.

property variant_type

FamilyVariant - representation of family variants

class dae.variants.family_variant.FamilyVariant(summary_variant: dae.variants.variant.SummaryVariant, family: dae.pedigrees.family.Family, genotype: Any, best_state: Any)
property allele_count
property alleles

list of all alleles of the variant

property alt_alleles

list of all alternative alleles

property alternative
property best_st

Deprecated: use best_state instead of best_st.

property best_state
static calc_alleles(gt)

Returns allele indexes that are relevant for the given genotype.

Parameters

gt – genotype as np.array.

Returns

list of all allele indexes present in the given genotype.

static calc_alt_alleles(gt)

Returns alternative allele indexes that are relevant for the given genotype.

Parameters

gt – genotype as np.array.

Returns

list of all alternative allele indexes present in the given genotype.
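
What "allele indexes relevant for the genotype" means can be sketched with plain Python. The example assumes that the genotype is a two-dimensional array of allele indexes, that 0 is the reference allele and that -1 marks an unknown allele; these encodings and the helper names allele_indexes and alt_allele_indexes are assumptions of this sketch, which uses nested lists in place of np.array.

```python
from typing import Iterable, List


def allele_indexes(gt: Iterable[Iterable[int]]) -> List[int]:
    """Sorted allele indexes present in the genotype, unknowns skipped."""
    return sorted({int(a) for row in gt for a in row if a >= 0})


def alt_allele_indexes(gt: Iterable[Iterable[int]]) -> List[int]:
    """Like allele_indexes, but without the reference allele (index 0)."""
    return [a for a in allele_indexes(gt) if a > 0]


# Invented genotype: each column is one family member, each row one copy.
gt = [[0, 0, 0],
      [0, 1, 2]]

print(allele_indexes(gt))
print(alt_allele_indexes(gt))
```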

property chrom
property chromosome
property details

1-based list of VariantDetails objects that describe each alternative allele.

property effects

1-based list of Effect objects that describe the variant effects.

property end_position
property family_id

Returns the family ID.

property family_index
property frequencies

0-based list of frequencies for the variant.

property fvuid
property genetic_model
property genotype

Returns genotype of the family.

get_attribute(item: Any, default: Optional[Any] = None) → List[Any]
gt_flatten()

Return genotype of the family variant flattened to 1-dimensional array.

has_attribute(item: Any) → bool
is_reference()

Returns True if all known alleles in the family variant are reference.

is_unknown()

Returns True if all alleles in the family variant are unknown.
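
The two predicates can be pictured as simple checks over the flattened genotype. The sketch below is illustrative only and assumes that 0 denotes the reference allele and -1 an unknown allele; flatten, all_reference and all_unknown are hypothetical helpers, not methods of FamilyVariant.

```python
from typing import Iterable, List

UNKNOWN = -1    # assumed marker for an unknown allele
REFERENCE = 0   # assumed index of the reference allele


def flatten(gt: Iterable[Iterable[int]]) -> List[int]:
    """Flatten a two-dimensional genotype into a one-dimensional list."""
    return [int(a) for row in gt for a in row]


def all_reference(gt: Iterable[Iterable[int]]) -> bool:
    """True when every known allele is the reference allele."""
    known = [a for a in flatten(gt) if a != UNKNOWN]
    # Note: with no known alleles at all this is vacuously True.
    return all(a == REFERENCE for a in known)


def all_unknown(gt: Iterable[Iterable[int]]) -> bool:
    """True when every allele in the genotype is unknown."""
    return all(a == UNKNOWN for a in flatten(gt))


print(all_reference([[0, 0], [0, -1]]))
print(all_unknown([[-1, -1], [-1, -1]]))
```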

property location
property matched_alleles
property matched_alleles_indexes
property matched_gene_effects
property members_ids

Returns list of family members IDs.

property members_in_order

Returns the list of family members in the order specified in the pedigree file. Each element of the returned list is an object of type variants.family.Person.

property position
property ref_allele

the reference allele

property reference
set_matched_alleles(alleles_indexes)
property summary_index
update_attributes(atts: Dict[str, Any]) → None
property variant_types

returns set of variant types.

RawVcfVariants - query interface for VCF variants

Apache Parquet variants schema

Summary Variants/Alleles flat schema

  • chrom (string) -

    chromosome where variant is located

  • position (int64) -

    1-based position of the start of the variant

  • reference (string) -

    reference DNA string

  • alternative (string) -

    alternative DNA string (None for reference allele)

  • summary_index (int64) -

    index of the summary variant

  • allele_index (int16) -

    index of the allele inside given summary variant

  • variant_type (int8) -

    variant type in CSHL notation

  • cshl_variant (string) -

    variant description in CSHL notation

  • cshl_position (int64) -

    variant position in CSHL notation

  • cshl_length (int32) -

    variant length in CSHL notation

  • effect_type (string) -

    worst effect of the variant (None for reference allele)

  • effect_gene_genes (list_(string)) -

    list of all genes affected by the variant allele (None for reference allele)

  • effect_gene_types (list_(string)) -

    list of all effect types corresponding to the effect_gene_genes (None for reference allele)

  • effect_details_transcript_ids (list_(string)) -

    list of all transcript ids affected by the variant allele (None for reference allele)

  • effect_details_details (list_(string)) -

    list of all effect details corresponding to the effect_details_transcript_ids (None for reference allele)

  • af_parents_called_count (int32) -

    count of independent parents that have a well-specified genotype for this allele

  • af_parents_called_percent (float64) -

    percent of independent parents corresponding to af_parents_called_count

  • af_allele_count (int32) -

    count of this allele in the independent parents

  • af_allele_freq (float64) -

    allele frequency

Family Variants schema

  • chrom (string)

  • position (int64)

  • family_index (int64) -

    index of the family variant

  • summary_index (int64) -

    index of the summary variant

  • family_id (string) -

    family ID

  • genotype (list_(int8)) -

    genotype of the variant for the specified family

  • inheritance (int32) -

    inheritance type of the variant

Family Alleles schema

  • family_index (int64)

  • summary_index (int64)

  • allele_index (int16)

  • variant_in_members (list_(string)) -

    list of members of the family that have this allele

  • variant_in_roles (list_(int32)) -

    list of family members’ roles that have this allele

  • variant_in_sexes (list_(int8)) -

    list of family members’ sexes that have this allele

Variant Scores schema

  • summary_index (int64)

  • allele_index (int16)

  • score_id (string or int64)

  • score_value (float64)
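
The Family Alleles and Variant Scores schemas both contain summary_index and allele_index, which suggests that the pair can serve as a key for relating the two tables. The sketch below illustrates that reading with plain dictionaries; whether the library joins the tables this way is an assumption, attach_scores is a hypothetical helper, and all rows are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

AlleleKey = Tuple[int, int]  # (summary_index, allele_index)

# Invented rows that follow the column names of the schemas above.
family_alleles = [
    {"family_index": 0, "summary_index": 10, "allele_index": 1,
     "variant_in_members": ["ch1"]},
    {"family_index": 1, "summary_index": 11, "allele_index": 1,
     "variant_in_members": ["mom2", "ch2"]},
]
scores = [
    {"summary_index": 10, "allele_index": 1,
     "score_id": "score_a", "score_value": 0.25},
    {"summary_index": 10, "allele_index": 1,
     "score_id": "score_b", "score_value": 3.5},
]


def attach_scores(alleles: List[dict], score_rows: List[dict]) -> List[dict]:
    """Attach a {score_id: score_value} mapping to every family allele."""
    by_allele: Dict[AlleleKey, Dict[str, float]] = defaultdict(dict)
    for row in score_rows:
        key = (row["summary_index"], row["allele_index"])
        by_allele[key][row["score_id"]] = row["score_value"]
    return [
        {**a, "scores": dict(
            by_allele.get((a["summary_index"], a["allele_index"]), {}))}
        for a in alleles
    ]


joined = attach_scores(family_alleles, scores)
for row in joined:
    print(row["summary_index"], row["allele_index"], row["scores"])
```

Alleles without any score rows simply receive an empty mapping.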

Pedigree file schema

  • familyId (string)

  • personId (string)

  • dadId (string)

  • momId (string)

  • sex (int8)

  • status (int8)

  • role (int32)

  • sampleId (string)

  • order (int32)
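
As an illustration of the column types listed above, the following sketch reads pedigree rows into typed records. It assumes, purely for the example, that the pedigree is available as tab-separated text with a header row; the actual storage format may differ. PedigreeRow and read_pedigree are hypothetical names, and the sample rows, including the integer codes for sex, status and role, are invented.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class PedigreeRow:
    familyId: str
    personId: str
    dadId: str
    momId: str
    sex: int
    status: int
    role: int
    sampleId: str
    order: int


# Columns that the schema lists with integer types.
INT_COLUMNS = ("sex", "status", "role", "order")


def read_pedigree(text: str) -> List[PedigreeRow]:
    """Parse tab-separated pedigree text into typed rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for raw in reader:
        for name in INT_COLUMNS:
            raw[name] = int(raw[name])
        rows.append(PedigreeRow(**raw))
    return rows


sample = (
    "familyId\tpersonId\tdadId\tmomId\tsex\tstatus\trole\tsampleId\torder\n"
    "f1\tdad1\t0\t0\t1\t1\t10\tdad1\t0\n"
    "f1\tch1\tdad1\tmom1\t2\t2\t20\tch1\t2\n"
)

pedigree = read_pedigree(sample)
print([row.personId for row in pedigree])
```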

Functions from parquet_io module