dae.annotation package

Submodules

dae.annotation.annotatable module

class dae.annotation.annotatable.Annotatable(chrom: str, pos: int, pos_end: int, annotatable_type: Type)[source]

Bases: object

Base class for annotatables used in the annotation pipeline.

class Type(value)[source]

Bases: Enum

Defines annotatable types.

COMPLEX = 5
LARGE_DELETION = 7
LARGE_DUPLICATION = 6
POSITION = 0
REGION = 1
SMALL_DELETION = 4
SMALL_INSERTION = 3
SUBSTITUTION = 2
static from_string(variant: str) Type[source]

Construct annotatable type from string argument.
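A minimal sketch of how a string could be mapped onto the documented `Type` values, assuming `from_string` looks members up by name, ignoring case and surrounding whitespace. `AnnotatableType` is a hypothetical stand-in; only the member names and values come from this reference, and the real method may accept other spellings.

```python
from enum import Enum


class AnnotatableType(Enum):
    """Hypothetical stand-in mirroring the documented Annotatable.Type values."""

    POSITION = 0
    REGION = 1
    SUBSTITUTION = 2
    SMALL_INSERTION = 3
    SMALL_DELETION = 4
    COMPLEX = 5
    LARGE_DUPLICATION = 6
    LARGE_DELETION = 7

    @staticmethod
    def from_string(variant: str) -> "AnnotatableType":
        # Assumption: lookup by member name, case-insensitive.
        key = variant.strip().upper()
        try:
            return AnnotatableType[key]
        except KeyError:
            raise ValueError(f"unexpected annotatable type: {variant!r}") from None
```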

property chrom: str
property chromosome: str
property end_position: int
static from_string(value: str) Annotatable[source]

Deserialize an Annotatable instance from a string value.

property pos: int
property pos_end: int
property position: int
static tokenize(value: str) tuple[str, list[str]][source]
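The `tuple[str, list[str]]` return type of `tokenize` suggests that a serialized annotatable is split into a type name and a list of arguments. The sketch below assumes a `TypeName(arg1, arg2, ...)` form; that format, and the helper `position_from_string`, are hypothetical and not confirmed by this reference.

```python
from __future__ import annotations


def tokenize(value: str) -> tuple[str, list[str]]:
    """Split 'Name(a, b, ...)' into ('Name', ['a', 'b', ...]) (assumed format)."""
    value = value.strip()
    open_idx = value.find("(")
    if open_idx <= 0 or not value.endswith(")"):
        raise ValueError(f"malformed annotatable string: {value!r}")
    type_name = value[:open_idx].strip()
    body = value[open_idx + 1:-1]
    # Split arguments on commas and drop surrounding whitespace.
    args = [token.strip() for token in body.split(",") if token.strip()]
    return type_name, args


def position_from_string(value: str) -> tuple[str, int]:
    """Hypothetical from_string for a position; returns (chrom, pos)."""
    type_name, args = tokenize(value)
    if type_name != "Position" or len(args) != 2:
        raise ValueError(f"not a position: {value!r}")
    return args[0], int(args[1])
```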
class dae.annotation.annotatable.CNVAllele(chrom: str, pos_begin: int, pos_end: int, cnv_type: Type)[source]

Bases: Annotatable

Defines the copy number variant (CNV) annotatable.

static from_string(value: str) CNVAllele[source]

Deserialize an Annotatable instance from a string value.

class dae.annotation.annotatable.Position(chrom: str, pos: int)[source]

Bases: Annotatable

Annotatable class representing a single position in a chromosome.

static from_string(value: str) Position[source]

Deserialize an Annotatable instance from a string value.

class dae.annotation.annotatable.Region(chrom: str, pos_begin: int, pos_end: int)[source]

Bases: Annotatable

Annotatable class representing a region in a chromosome.

static from_string(value: str) Region[source]

Deserialize an Annotatable instance from a string value.

class dae.annotation.annotatable.VCFAllele(chrom: str, pos: int, ref: str, alt: str)[source]

Bases: Annotatable

Defines the small variant annotatable.

property alt: str
property alternative: str
static from_string(value: str) VCFAllele[source]

Deserialize an Annotatable instance from a string value.

property ref: str
property reference: str
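A sketch of how a small variant's end position and type might be derived from REF and ALT. Both rules are assumptions for illustration (the end position spans every REF base; insertions and deletions are recognized by a shared anchor base); `SimpleVCFAllele` is a hypothetical class, and only the type names come from this reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleVCFAllele:
    """Hypothetical small-variant record with a 1-based position."""

    chrom: str
    pos: int
    ref: str
    alt: str

    @property
    def pos_end(self) -> int:
        # Assumption: the allele covers every base of REF.
        return self.pos + len(self.ref) - 1

    def allele_type(self) -> str:
        # Hypothetical classification mapped onto the documented Type names.
        if len(self.ref) == 1 and len(self.alt) == 1:
            return "SUBSTITUTION"
        if len(self.ref) == 1 and self.alt.startswith(self.ref):
            return "SMALL_INSERTION"
        if len(self.alt) == 1 and self.ref.startswith(self.alt):
            return "SMALL_DELETION"
        return "COMPLEX"
```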

dae.annotation.annotate_columns module

class dae.annotation.annotate_columns.AnnotateColumnsTool(raw_args: list[str] | None = None, gpf_instance: GPFInstance | None = None)[source]

Bases: AnnotationTool

Annotation tool for TSV-style text files.

static annotate(args: Namespace, pipeline_config: str, grr_definition: dict | None, ref_genome_id: str | None, out_file_path: str, region: tuple = (), compress_output: bool = False) None[source]

Annotate a variants file with a given pipeline configuration.

get_argument_parser() ArgumentParser[source]

Configure argument parser.

work() None[source]
dae.annotation.annotate_columns.cli(raw_args: list[str] | None = None) None[source]
dae.annotation.annotate_columns.combine(args: Any, partfile_paths: list[str], out_file_path: str) None[source]

Combine annotated region parts into a single output file.
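A minimal sketch of combining per-region part files for a tab-separated output, assuming each part starts with one identical header line and the parts are passed in genomic order. The names `merge_part_lines` and `combine_parts` are hypothetical; the real `combine` may also compress and index the result.

```python
from __future__ import annotations


def merge_part_lines(parts: list[list[str]]) -> list[str]:
    """Keep the header from the first non-empty part and the body of every part."""
    merged: list[str] = []
    for lines in parts:
        if not lines:
            continue
        if not merged:
            merged.append(lines[0])  # header line, written once
        merged.extend(lines[1:])
    return merged


def combine_parts(partfile_paths: list[str], out_file_path: str) -> None:
    """Read the part files in order and write the merged lines."""
    parts = []
    for path in partfile_paths:
        with open(path, encoding="utf-8") as inp:
            parts.append(inp.readlines())
    with open(out_file_path, "w", encoding="utf-8") as out:
        out.writelines(merge_part_lines(parts))
```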

dae.annotation.annotate_columns.produce_tabix_index(filepath: str, args: Any, header: list[str], ref_genome: ReferenceGenome | None) None[source]

Produce a tabix index file for the given variants file.

dae.annotation.annotate_columns.read_input(args: Any, region: tuple = ()) tuple[Any, Any, list[str]][source]

Return a file object, line iterator and list of header columns.

Handles differences between tabixed and non-tabixed input files.

dae.annotation.annotate_vcf module

class dae.annotation.annotate_vcf.AnnotateVCFTool(raw_args: list[str] | None = None, gpf_instance: GPFInstance | None = None)[source]

Bases: AnnotationTool

Annotation tool for the VCF file format.

static annotate(input_file: str, region: tuple[str, int, int] | None, pipeline_config: str, grr_definition: dict | None, out_file_path: str, allow_repeated_attributes: bool, pipeline_config_old: str | None = None) None[source]

Annotate a region from a given input VCF file using a pipeline.

get_argument_parser() ArgumentParser[source]

Construct and configure argument parser.

work() None[source]
dae.annotation.annotate_vcf.cli(raw_args: list[str] | None = None) None[source]
dae.annotation.annotate_vcf.combine(input_file_path: str, pipeline_config: list[dae.annotation.annotation_pipeline.AnnotatorInfo] | None, grr_definition: dict | None, partfile_paths: List[str], output_file_path: str) None[source]

Combine annotated region parts into a single VCF file.

dae.annotation.annotate_vcf.update_header(variant_file: VariantFile, pipeline: AnnotationPipeline | ReannotationPipeline) None[source]

Update a variant file’s header with annotation pipeline scores.

dae.annotation.annotation_factory module

Factory for creating annotation pipelines.

class dae.annotation.annotation_factory.AnnotationConfigParser[source]

Bases: object

Parser for annotation configuration.

static has_wildcard(string: str) bool[source]

Ascertain whether a string contains a valid wildcard.

static match_labels_query(query: dict[str, str], resource_labels: dict[str, str]) bool[source]

Check if the labels query for a wildcard matches.
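A sketch of wildcard detection and label matching, assuming glob-style patterns as understood by Python's `fnmatch` and a query that must match every listed label. The helper `filter_resource_ids` is hypothetical and stands in for `query_resources`, which in this reference queries a repository rather than a plain list.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

WILDCARD_CHARS = "*?["  # assumption: glob-style wildcard characters


def has_wildcard(string: str) -> bool:
    return any(ch in string for ch in WILDCARD_CHARS)


def match_labels_query(query: dict[str, str], resource_labels: dict[str, str]) -> bool:
    # Every queried label must exist on the resource and match its pattern.
    return all(
        key in resource_labels and fnmatchcase(resource_labels[key], pattern)
        for key, pattern in query.items()
    )


def filter_resource_ids(wildcard: str, resource_ids: list[str]) -> list[str]:
    """Hypothetical helper: keep resource ids matching the wildcard."""
    return [rid for rid in resource_ids if fnmatchcase(rid, wildcard)]
```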

static normalize(pipeline_config: List[Any]) List[Dict][source]

Return a normalized annotation pipeline configuration.

static parse_complete(raw: dict[str, Any], idx: int) AnnotatorInfo[source]

Parse a full-form annotation config.

static parse_config_file(filename: str, grr: GenomicResourceRepo | None) List[AnnotatorInfo][source]

Parse annotation pipeline configuration file.

static parse_minimal(raw: str, idx: int) AnnotatorInfo[source]

Parse a minimal-form annotation config.

static parse_raw(pipeline_raw_config: list[dict[str, Any]] | None, grr: GenomicResourceRepo | None = None) list[dae.annotation.annotation_pipeline.AnnotatorInfo][source]

Parse raw dictionary annotation pipeline configuration.

static parse_raw_attribute_config(raw_attribute_config: dict[str, Any]) AttributeInfo[source]

Parse annotation attribute raw configuration.

static parse_raw_attributes(raw_attributes_config: Any) list[dae.annotation.annotation_pipeline.AttributeInfo][source]

Parse annotator pipeline attribute configuration.

static parse_short(raw: dict[str, Any], idx: int, grr: GenomicResourceRepo | None = None) list[dae.annotation.annotation_pipeline.AnnotatorInfo][source]

Parse a short-form annotation config.
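The parser distinguishes minimal, short, and complete configuration forms. The sketch below assumes a minimal entry is a bare annotator type string, a short entry maps the type to a resource id, and a complete entry maps the type to a parameter dictionary. `normalize_entry` is hypothetical, and unlike `parse_short` it does not expand wildcards into several annotators.

```python
from __future__ import annotations

from typing import Any


def normalize_entry(raw: Any) -> dict[str, Any]:
    if isinstance(raw, str):
        # Minimal form: just the annotator type.
        return {"type": raw}
    if isinstance(raw, dict) and len(raw) == 1:
        ((ann_type, body),) = raw.items()
        if isinstance(body, str):
            # Short form (assumed): annotator type mapped to a resource id.
            return {"type": ann_type, "resource_id": body}
        if isinstance(body, dict):
            # Complete form: annotator type mapped to its parameters.
            return {"type": ann_type, **body}
    raise ValueError(f"unrecognized annotator configuration: {raw!r}")


def normalize(pipeline_config: list[Any]) -> list[dict[str, Any]]:
    return [normalize_entry(raw) for raw in pipeline_config]
```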

static parse_str(content: str, source_file_name: str | None = None, grr: GenomicResourceRepo | None = None) list[dae.annotation.annotation_pipeline.AnnotatorInfo][source]

Parse annotation pipeline configuration string.

static query_resources(annotator_type: str, wildcard: str, grr: GenomicResourceRepo) list[str][source]

Collect resources matching a given query.

exception dae.annotation.annotation_factory.AnnotationConfigurationError[source]

Bases: ValueError

dae.annotation.annotation_factory.build_annotation_pipeline(pipeline_config: list[dae.annotation.annotation_pipeline.AnnotatorInfo] | None = None, pipeline_config_raw: list[dict] | None = None, pipeline_config_file: str | None = None, pipeline_config_str: str | None = None, grr_repository: GenomicResourceRepo | None = None, grr_repository_file: str | None = None, grr_repository_definition: dict | None = None, allow_repeated_attributes: bool = False) AnnotationPipeline[source]

Build an annotation pipeline.

dae.annotation.annotation_factory.check_for_repeated_attributes_in_annotator(annotator_config: AnnotatorInfo) None[source]

Check for repeated attributes in annotator configuration.

dae.annotation.annotation_factory.check_for_repeated_attributes_in_pipeline(pipeline: AnnotationPipeline, allow_repeated_attributes: bool = False) None[source]

Check for repeated attributes in pipeline configuration.

dae.annotation.annotation_factory.check_for_unused_parameters(info: AnnotatorInfo) None[source]

Check annotator configuration for unused parameters.

dae.annotation.annotation_factory.copy_annotation_pipeline(pipeline: AnnotationPipeline) AnnotationPipeline[source]

Copy an annotation pipeline instance.

dae.annotation.annotation_factory.copy_reannotation_pipeline(pipeline: ReannotationPipeline) ReannotationPipeline[source]

Copy a reannotation pipeline instance.

dae.annotation.annotation_factory.get_annotator_factory(annotator_type: str) Callable[[AnnotationPipeline, AnnotatorInfo], Annotator][source]

Find and return the factory function for creating annotators of the specified type.

If the specified annotator type is not found, this function raises a ValueError exception.

Returns:

the annotator factory for the specified annotator type.

Raises:

ValueError – when no annotator factory can be found for the specified annotator type.

dae.annotation.annotation_factory.get_available_annotator_types() List[str][source]

Return the list of all registered annotator factory types.

dae.annotation.annotation_factory.register_annotator_factory(annotator_type: str, factory: Callable[[AnnotationPipeline, AnnotatorInfo], Annotator]) None[source]

Register additional annotator factory.

By default, all annotator factories should be registered at an extension point, and all registered factories are loaded automatically. Use this function to bypass the extension point mechanism and register an additional annotator factory programmatically.
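A minimal sketch of the registry behaviour these three functions describe: a mapping from annotator type names to factories, with a `ValueError` for unknown types. The module-level dictionary, the factory alias, and replacing an existing registration are assumptions; the extension point loading is not shown.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical factory signature: (pipeline, annotator_info) -> annotator.
AnnotatorFactory = Callable[[Any, Any], Any]

_FACTORIES: dict[str, AnnotatorFactory] = {}


def register_annotator_factory(annotator_type: str, factory: AnnotatorFactory) -> None:
    # In this sketch a later registration replaces an earlier one.
    _FACTORIES[annotator_type] = factory


def get_annotator_factory(annotator_type: str) -> AnnotatorFactory:
    if annotator_type not in _FACTORIES:
        raise ValueError(f"unsupported annotator type: {annotator_type}")
    return _FACTORIES[annotator_type]


def get_available_annotator_types() -> list[str]:
    return sorted(_FACTORIES)
```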

dae.annotation.annotation_factory.resolve_repeated_attributes(pipeline: AnnotationPipeline, repeated_attributes: set[str]) None[source]

Resolve repeated attributes in pipeline configuration via renaming.

dae.annotation.annotation_pipeline module

Provides annotation pipeline class.

class dae.annotation.annotation_pipeline.AnnotationPipeline(repository: GenomicResourceRepo)[source]

Bases: object

Provides annotation pipeline abstraction.

add_annotator(annotator: Annotator) None[source]
annotate(annotatable: Annotatable, context: dict | None = None) dict[source]

Apply all annotators to an annotatable.
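A sketch of applying all annotators in order, assuming each annotator returns new attributes that are merged into a shared context so later annotators can read earlier results (the `annotate(annotatable, context)` signatures suggest this, but the details are assumptions). `SketchAnnotator`, `SketchPipeline`, and the two example annotators are hypothetical; annotatables are plain `(chrom, pos, ref, alt)` tuples here.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class SketchAnnotator(ABC):
    @abstractmethod
    def annotate(self, annotatable: Any, context: dict[str, Any]) -> dict[str, Any]:
        """Return new attributes for the annotatable."""


class SketchPipeline:
    def __init__(self) -> None:
        self.annotators: list[SketchAnnotator] = []

    def add_annotator(self, annotator: SketchAnnotator) -> None:
        self.annotators.append(annotator)

    def annotate(self, annotatable: Any, context: dict | None = None) -> dict:
        result = dict(context or {})  # copy so the caller's dict is not mutated
        for annotator in self.annotators:
            # Later annotators can read attributes produced by earlier ones.
            result.update(annotator.annotate(annotatable, result))
        return result


class RefLengthAnnotator(SketchAnnotator):
    def annotate(self, annotatable, context):
        return {"ref_length": len(annotatable[2])}


class IsIndelAnnotator(SketchAnnotator):
    def annotate(self, annotatable, context):
        # Depends on "ref_length" from the previous annotator.
        return {"is_indel": context["ref_length"] != len(annotatable[3])}
```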

close() None[source]

Close the annotation pipeline.

get_annotator_by_attribute_info(attribute_info: AttributeInfo) Annotator | None[source]
get_attribute_info(attribute_name: str) AttributeInfo | None[source]
get_attributes() list[dae.annotation.annotation_pipeline.AttributeInfo][source]
get_info() list[dae.annotation.annotation_pipeline.AnnotatorInfo][source]
get_resource_ids() set[str][source]
open() AnnotationPipeline[source]

Open all annotators in the pipeline and mark it as open.

class dae.annotation.annotation_pipeline.Annotator(pipeline: AnnotationPipeline | None, info: AnnotatorInfo)[source]

Bases: ABC

Annotator provides a set of attributes for a given Annotatable.

abstract annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any][source]

Produce annotation attributes for an annotatable.

property attributes: list[dae.annotation.annotation_pipeline.AttributeInfo]
close() None[source]
get_info() AnnotatorInfo[source]
is_open() bool[source]
open() Annotator[source]
property resource_ids: set[str]
property resources: list[dae.genomic_resources.repository.GenomicResource]
property used_context_attributes: tuple[str, ...]
class dae.annotation.annotation_pipeline.AnnotatorDecorator(child: Annotator)[source]

Bases: Annotator

Defines annotator decorator base class.

close() None[source]
is_open() bool[source]
open() Annotator[source]
class dae.annotation.annotation_pipeline.AnnotatorInfo(_type: str, attributes: list[dae.annotation.annotation_pipeline.AttributeInfo], parameters: ParamsUsageMonitor | dict[str, Any], documentation: str = '', resources: list[dae.genomic_resources.repository.GenomicResource] | None = None, annotator_id: str = 'N/A')[source]

Bases: object

Defines annotator configuration.

annotator_id: str
attributes: list[dae.annotation.annotation_pipeline.AttributeInfo]
documentation: str = ''
parameters: ParamsUsageMonitor
resources: list[dae.genomic_resources.repository.GenomicResource]
type: str
class dae.annotation.annotation_pipeline.AttributeInfo(name: str, source: str, internal: bool, parameters: ParamsUsageMonitor | dict[str, Any], _type: str = 'str', description: str = '', documentation: str | None = None)[source]

Bases: object

Defines annotation attribute configuration.

description: str = ''
property documentation: str
internal: bool
name: str
parameters: ParamsUsageMonitor
source: str
type: str = 'str'
class dae.annotation.annotation_pipeline.InputAnnotableAnnotatorDecorator(child: Annotator)[source]

Bases: AnnotatorDecorator

Defines an annotator decorator that uses the input annotatable, if one is defined.

annotate(_: Annotatable | None, context: dict[str, Any]) dict[str, Any][source]

Produce annotation attributes for an annotatable.

static decorate(child: Annotator) Annotator[source]
property used_context_attributes: tuple[str, ...]
class dae.annotation.annotation_pipeline.ParamsUsageMonitor(data: dict[str, Any])[source]

Bases: Mapping

Class to monitor usage of annotator parameters.

get_unused_keys() set[str][source]
get_used_keys() set[str][source]
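A sketch of a read-tracking mapping in the spirit of `ParamsUsageMonitor`, which lets a check such as `check_for_unused_parameters` report configuration keys that no annotator read. The class name and the choice that membership tests do not count as a use are assumptions.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any, Iterator


class ParamsUsageMonitorSketch(Mapping):
    def __init__(self, data: dict[str, Any]) -> None:
        self._data = dict(data)
        self._used: set[str] = set()

    def __getitem__(self, key: str) -> Any:
        value = self._data[key]  # raises KeyError before marking missing keys
        self._used.add(key)
        return value

    def __contains__(self, key: object) -> bool:
        # Override so that `key in params` does not count as a use.
        return key in self._data

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def get_used_keys(self) -> set[str]:
        return set(self._used)

    def get_unused_keys(self) -> set[str]:
        return set(self._data) - self._used
```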
class dae.annotation.annotation_pipeline.ReannotationPipeline(pipeline_new: AnnotationPipeline, pipeline_old: AnnotationPipeline)[source]

Bases: AnnotationPipeline

Special pipeline that handles reannotation of a previous pipeline.

AnnotationDependencyGraph

alias of dict[AnnotatorInfo, list[tuple[AnnotatorInfo, AttributeInfo]]]

annotate(annotatable: Annotatable, record: dict) dict[source]

Apply all annotators to an annotatable.

annotate_summary_allele(allele: SummaryAllele) dict[source]
static build_dependency_graph(pipeline: AnnotationPipeline) AnnotationDependencyGraph[source]

Make dependency graph for an annotation pipeline.

get_attributes() list[dae.annotation.annotation_pipeline.AttributeInfo][source]
get_dependencies_for(info: AnnotatorInfo) set[dae.annotation.annotation_pipeline.AnnotatorInfo][source]

Get all dependencies for a given annotator.

get_dependents_for(info: AnnotatorInfo) set[dae.annotation.annotation_pipeline.AnnotatorInfo][source]

Get all dependents for a given annotator.
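A sketch of transitive dependency and dependent lookups over a simplified dependency graph. Annotators are plain string ids here, and each maps to `(dependency id, attribute name)` pairs, loosely following the `AnnotationDependencyGraph` alias; the edge direction is an assumption. Both directions are plausible needs during reannotation, since a changed annotator affects everything that uses its attributes.

```python
from __future__ import annotations

from collections import deque

# Hypothetical simplified graph: annotator id -> [(dependency id, attribute used)].
Graph = dict[str, list[tuple[str, str]]]


def get_dependencies_for(graph: Graph, annotator: str) -> set[str]:
    result: set[str] = set()
    queue = deque([annotator])
    while queue:
        current = queue.popleft()
        for dep, _attr in graph.get(current, []):
            if dep not in result:
                result.add(dep)
                queue.append(dep)
    return result


def get_dependents_for(graph: Graph, annotator: str) -> set[str]:
    # Invert the edges: which annotators use attributes produced by a node?
    reverse: dict[str, set[str]] = {}
    for node, deps in graph.items():
        for dep, _attr in deps:
            reverse.setdefault(dep, set()).add(node)
    result: set[str] = set()
    queue = deque([annotator])
    while queue:
        current = queue.popleft()
        for user in reverse.get(current, set()):
            if user not in result:
                result.add(user)
                queue.append(user)
    return result
```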

class dae.annotation.annotation_pipeline.ValueTransformAnnotatorDecorator(child: Annotator, value_transformers: dict[str, Callable[[Any], Any]])[source]

Bases: AnnotatorDecorator

Defines a value transform annotator decorator.

annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any][source]

Produce annotation attributes for an annotatable.

static decorate(child: Annotator) Annotator[source]

Apply value transform decorator to an annotator.
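A sketch of the decorator pattern used here: a wrapper delegates `annotate` to its child annotator and then applies a per-attribute transform, keyed by attribute name as in the `value_transformers: dict[str, Callable[[Any], Any]]` constructor argument. Both classes are hypothetical stand-ins.

```python
from __future__ import annotations

from typing import Any, Callable


class ConstantAnnotator:
    """Hypothetical child annotator that always returns the same attributes."""

    def __init__(self, values: dict[str, Any]) -> None:
        self.values = values

    def annotate(self, annotatable: Any, context: dict[str, Any]) -> dict[str, Any]:
        return dict(self.values)


class ValueTransformDecoratorSketch:
    def __init__(self, child: Any, value_transformers: dict[str, Callable[[Any], Any]]) -> None:
        self.child = child
        self.value_transformers = value_transformers

    def annotate(self, annotatable: Any, context: dict[str, Any]) -> dict[str, Any]:
        result = self.child.annotate(annotatable, context)
        # Post-process only the attributes that have a transformer.
        for name, transform in self.value_transformers.items():
            if name in result:
                result[name] = transform(result[name])
        return result
```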

dae.annotation.annotator_base module

Provides base class for annotators.

class dae.annotation.annotator_base.AnnotatorBase(pipeline: AnnotationPipeline | None, info: AnnotatorInfo, source_type_desc: dict[str, tuple[str, str]])[source]

Bases: Annotator

Base implementation of the Annotator class.

annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any][source]

Produce annotation attributes for an annotatable.

dae.annotation.clinvar_annotator module

dae.annotation.context module

class dae.annotation.context.CLIAnnotationContext(context_objects: Dict[str, Any], source: tuple[str, ...])[source]

Bases: CLIGenomicContext

Defines the annotation pipeline genomic context.

static context_builder(args: Namespace) CLIAnnotationContext[source]

Build a CLI genomic context.

static get_pipeline(context: GenomicContext) AnnotationPipeline[source]

Construct an annotation pipeline.

static register(args: Namespace) None[source]

Build a CLI annotation context from the parsed arguments and register it.

dae.annotation.effect_annotator module

class dae.annotation.effect_annotator.EffectAnnotatorAdapter(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]

Bases: AnnotatorBase

Adapts effect annotator to be used in annotation infrastructure.

close() None[source]
open() Annotator[source]
dae.annotation.effect_annotator.build_effect_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator[source]

dae.annotation.gene_score_annotator module

Module containing the gene score annotator.

class dae.annotation.gene_score_annotator.GeneScoreAnnotator(pipeline: AnnotationPipeline | None, info: AnnotatorInfo, gene_score_resource: GenomicResource, input_gene_list: str)[source]

Bases: Annotator

Gene score annotator class.

DEFAULT_AGGREGATOR_TYPE = 'dict'
aggregate_gene_values(score_id: str, gene_symbols: list[str], aggregator_type: str) Any[source]

Aggregate gene score values.
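A sketch of aggregating one score over several genes, assuming the default `'dict'` aggregator (see `DEFAULT_AGGREGATOR_TYPE`) returns a gene-to-value mapping and other aggregators reduce the values to one number. The plain `scores` dictionary and the `max`/`min`/`mean` names are assumptions; the real method reads values from a gene score resource by `score_id`.

```python
from __future__ import annotations

from statistics import mean
from typing import Any

AGGREGATORS = {"max": max, "min": min, "mean": mean}  # assumed aggregator names


def aggregate_gene_values(
    scores: dict[str, float], gene_symbols: list[str], aggregator_type: str = "dict"
) -> Any:
    # Genes without a score are skipped.
    values = {gene: scores[gene] for gene in gene_symbols if gene in scores}
    if aggregator_type == "dict":
        return values
    if not values:
        return None
    try:
        aggregator = AGGREGATORS[aggregator_type]
    except KeyError:
        raise ValueError(f"unknown aggregator: {aggregator_type}") from None
    return aggregator(list(values.values()))
```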

annotate(_: Annotatable | None, context: dict[str, Any]) dict[str, Any][source]

Produce annotation attributes for an annotatable.

property used_context_attributes: tuple[str, ...]
dae.annotation.gene_score_annotator.build_gene_score_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator[source]

Create a gene score annotator.

dae.annotation.liftover_annotator module

Provides a liftover annotator and helpers.

class dae.annotation.liftover_annotator.LiftOverAnnotator(pipeline: AnnotationPipeline | None, info: AnnotatorInfo, chain: LiftoverChain, target_genome: ReferenceGenome)[source]

Bases: AnnotatorBase

Liftover annotator class.

close() None[source]
liftover_allele(allele: VCFAllele) VCFAllele | None[source]

Liftover an allele.

liftover_cnv(cnv_allele: Annotatable) Annotatable | None[source]

Liftover CNV allele annotatable.

liftover_position(position: Annotatable) Annotatable | None[source]

Liftover position annotatable.

liftover_region(region: Annotatable) Annotatable | None[source]

Liftover region annotatable.

open() Annotator[source]
dae.annotation.liftover_annotator.build_liftover_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator[source]

Create a liftover annotator.

dae.annotation.normalize_allele_annotator module

Provides normalize allele annotator and helpers.

class dae.annotation.normalize_allele_annotator.NormalizeAlleleAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]

Bases: AnnotatorBase

Annotator to normalize VCF alleles.

close() None[source]
open() Annotator[source]
dae.annotation.normalize_allele_annotator.build_normalize_allele_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator[source]
dae.annotation.normalize_allele_annotator.normalize_allele(allele: VCFAllele, genome: ReferenceGenome) VCFAllele[source]

Normalize an allele.

Uses the algorithm described at https://genome.sph.umich.edu/wiki/Variant_Normalization.
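The normalization algorithm on that page repeatedly trims a shared trailing base and extends empty alleles one base to the left, then trims shared leading bases while both alleles are at least two bases long. A self-contained sketch follows; the `Allele` class and the chromosome-as-string `genome` dictionary are hypothetical stand-ins for `VCFAllele` and `ReferenceGenome`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    chrom: str
    pos: int  # 1-based position of the first REF base
    ref: str
    alt: str


def normalize_allele(allele: Allele, genome: dict[str, str]) -> Allele:
    if allele.ref == allele.alt:
        raise ValueError("REF and ALT are identical; nothing to normalize")
    seq = genome[allele.chrom]
    pos, ref, alt = allele.pos, allele.ref, allele.alt

    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos <= 1:
                raise ValueError("cannot extend past the chromosome start")
            pos -= 1
            base = seq[pos - 1]  # 1-based position -> 0-based index
            ref, alt = base + ref, base + alt
            changed = True

    # Drop shared leading bases while both alleles are at least two bases long.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return Allele(allele.chrom, pos, ref, alt)
```

For example, deleting the last `CA` of the repeat in `GCACACAT` normalizes to the left-most representation anchored on the leading `G`.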

dae.annotation.record_to_annotatable module

class dae.annotation.record_to_annotatable.CSHLAlleleRecordToAnnotatable(columns: tuple, ref_genome: ReferenceGenome | None)[source]

Bases: RecordToAnnotable

Transform a CSHL variant record into a VCF allele annotatable.

build(record: dict[str, str]) Annotatable[source]
class dae.annotation.record_to_annotatable.RecordToAnnotable(columns: tuple, ref_genome: ReferenceGenome | None)[source]

Bases: ABC

abstract build(record: dict[str, str]) Annotatable[source]
class dae.annotation.record_to_annotatable.RecordToCNVAllele(columns: tuple, ref_genome: ReferenceGenome | None)[source]

Bases: RecordToAnnotable

Transform a columns record into a CNV allele annotatable.

build(record: dict[str, str]) Annotatable[source]
class dae.annotation.record_to_annotatable.RecordToPosition(columns: tuple, ref_genome: ReferenceGenome | None)[source]

Bases: RecordToAnnotable

build(record: dict[str, str]) Annotatable[source]
class dae.annotation.record_to_annotatable.RecordToRegion(columns: tuple, ref_genome: ReferenceGenome | None)[source]

Bases: RecordToAnnotable

build(record: dict[str, str]) Annotatable[source]
class dae.annotation.record_to_annotatable.RecordToVcfAllele(columns: tuple, ref_genome: ReferenceGenome | None)[source]

Bases: RecordToAnnotable

build(record: dict[str, str]) Annotatable[source]
class dae.annotation.record_to_annotatable.VcfLikeRecordToVcfAllele(columns: tuple, ref_genome: ReferenceGenome | None)[source]

Bases: RecordToAnnotable

Transform a columns record into a VCF allele annotatable.

build(record: dict[str, str]) Annotatable[source]
dae.annotation.record_to_annotatable.add_record_to_annotable_arguments(parser: ArgumentParser) None[source]
dae.annotation.record_to_annotatable.build_record_to_annotatable(parameters: dict[str, str], available_columns: set[str], ref_genome: ReferenceGenome | None = None) RecordToAnnotable[source]

Build a record-to-annotatable transformer that converts variant records into annotatables.
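A sketch of choosing a record transformer from the columns available in the input, checking more specific column sets first. The column names and the priority order are hypothetical; only the returned class names come from this module's reference.

```python
from __future__ import annotations

# Hypothetical required-column sets, most specific first.
RECORD_BUILDERS: list[tuple[frozenset[str], str]] = [
    (frozenset({"chrom", "pos", "ref", "alt"}), "RecordToVcfAllele"),
    (frozenset({"vcf_like"}), "VcfLikeRecordToVcfAllele"),
    (frozenset({"location", "variant"}), "CSHLAlleleRecordToAnnotatable"),
    (frozenset({"chrom", "pos_beg", "pos_end", "cnv_type"}), "RecordToCNVAllele"),
    (frozenset({"chrom", "pos_beg", "pos_end"}), "RecordToRegion"),
    (frozenset({"chrom", "pos"}), "RecordToPosition"),
]


def select_record_builder(available_columns: set[str]) -> str:
    """Return the first builder whose required columns are all present."""
    for required, builder_name in RECORD_BUILDERS:
        if required <= available_columns:
            return builder_name
    raise ValueError(f"no matching builder for columns: {sorted(available_columns)}")
```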

dae.annotation.schema module

dae.annotation.score_annotator module

This module contains the implementation of the three score annotators.

The genomic score annotators defined here are position_score, np_score, and allele_score.

class dae.annotation.score_annotator.AlleleScoreAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]

Bases: GenomicScoreAnnotatorBase

This class implements allele_score annotator.

annotate(annotatable: Annotatable | None, _: dict[str, Any]) dict[str, Any][source]

Produce annotation attributes for an annotatable.

build_score_aggregator_documentation(attr_info: AttributeInfo) list[str][source]

Collect score aggregator documentation.

class dae.annotation.score_annotator.GenomicScoreAnnotatorBase(pipeline: AnnotationPipeline, info: AnnotatorInfo, score: GenomicScore)[source]

Bases: Annotator

Genomic score base annotator.

add_score_aggregator_documentation(attribute_info: AttributeInfo, aggregator: str, attribute_conf_agg: str | None) None[source]

Collect score aggregator documentation.

abstract build_score_aggregator_documentation(attr_info: AttributeInfo) list[str][source]

Construct score aggregator documentation.

close() None[source]
is_open() bool[source]
open() Annotator[source]
class dae.annotation.score_annotator.NPScoreAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]

Bases: PositionScoreAnnotatorBase

This class implements np_score annotator.

build_score_aggregator_documentation(attr_info: AttributeInfo) list[str][source]

Collect score aggregator documentation.

class dae.annotation.score_annotator.PositionScoreAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]

Bases: PositionScoreAnnotatorBase

This class implements the position_score annotator.

The position_score annotator requires the resource_id parameter, whose value must be the id of a genomic resource of type position_score.

The position_score resource provides a set of scores (see …) that the position_score annotator uses as attributes to assign to the annotatable.

The position_score annotator recognizes one attribute-level parameter, position_aggregator, which controls how the position scores are aggregated for annotatables that refer to a region of the reference genome.
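A sketch of what `position_aggregator` controls: the per-position scores inside a region are collected and reduced to one value. The aggregator names, the `scores` dictionary keyed by 1-based position, and returning `None` for regions without scores are assumptions for illustration.

```python
from __future__ import annotations

from statistics import mean, median

POSITION_AGGREGATORS = {"mean": mean, "median": median, "max": max, "min": min}


def aggregate_region(
    scores: dict[int, float], pos_begin: int, pos_end: int, aggregator: str = "mean"
) -> float | None:
    # Collect scores for every covered position in the closed interval.
    values = [scores[p] for p in range(pos_begin, pos_end + 1) if p in scores]
    if not values:
        return None
    try:
        return POSITION_AGGREGATORS[aggregator](values)
    except KeyError:
        raise ValueError(f"unknown position aggregator: {aggregator}") from None
```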

build_score_aggregator_documentation(attr_info: AttributeInfo) list[str][source]

Collect score aggregator documentation.

class dae.annotation.score_annotator.PositionScoreAnnotatorBase(pipeline: AnnotationPipeline, info: AnnotatorInfo, score: GenomicScore)[source]

Bases: GenomicScoreAnnotatorBase

Defines position score base annotator class.

annotate(annotatable: Annotatable | None, _: dict[str, Any]) dict[str, Any][source]

Produce annotation attributes for an annotatable.

dae.annotation.score_annotator.build_allele_score_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator[source]
dae.annotation.score_annotator.build_np_score_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator[source]
dae.annotation.score_annotator.build_position_score_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator[source]
dae.annotation.score_annotator.get_genomic_resource(pipeline: AnnotationPipeline, info: AnnotatorInfo, resource_type: str) GenomicResource[source]

Return the genomic score resource used by the given genomic score annotator.

dae.annotation.utils module

Module contents