dae.annotation package
Submodules
dae.annotation.annotatable module
- class dae.annotation.annotatable.Annotatable(chrom: str, pos: int, pos_end: int, annotatable_type: Type)[source]
Bases:
object
Base class for annotatables used in annotation pipeline.
- class Type(value)[source]
Bases:
Enum
Defines annotatable types.
- COMPLEX = 5
- LARGE_DELETION = 7
- LARGE_DUPLICATION = 6
- POSITION = 0
- REGION = 1
- SMALL_DELETION = 4
- SMALL_INSERTION = 3
- SUBSTITUTION = 2
- property chrom: str
- property chromosome: str
- property end_position: int
- static from_string(value: str) Annotatable [source]
Deserialize an Annotatable instance from a string value.
- property pos: int
- property pos_end: int
- property position: int
- class dae.annotation.annotatable.CNVAllele(chrom: str, pos_begin: int, pos_end: int, cnv_type: Type)[source]
Bases:
Annotatable
Defines copy number variants annotatable.
- class dae.annotation.annotatable.Position(chrom: str, pos: int)[source]
Bases:
Annotatable
Annotatable class representing a single position in a chromosome.
- class dae.annotation.annotatable.Region(chrom: str, pos_begin: int, pos_end: int)[source]
Bases:
Annotatable
Annotatable class representing a region in a chromosome.
- class dae.annotation.annotatable.VCFAllele(chrom: str, pos: int, ref: str, alt: str)[source]
Bases:
Annotatable
Defines small variants annotatable.
- property alt: str
- property alternative: str
- static from_string(value: str) VCFAllele [source]
Deserialize an Annotatable instance from a string value.
- property ref: str
- property reference: str
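As a rough illustration of how the Type values listed above could relate to small variants, the following self-contained sketch mirrors the documented enum values and classifies a ref/alt pair by allele length. The AnnotatableType stand-in and the classify_small_variant helper are hypothetical names, and the classification rules are an assumption made for this example; they are not taken from the VCFAllele implementation.

```python
from enum import Enum


class AnnotatableType(Enum):
    """Stand-in that mirrors the values documented for Annotatable.Type."""

    POSITION = 0
    REGION = 1
    SUBSTITUTION = 2
    SMALL_INSERTION = 3
    SMALL_DELETION = 4
    COMPLEX = 5
    LARGE_DUPLICATION = 6
    LARGE_DELETION = 7


def classify_small_variant(ref: str, alt: str) -> AnnotatableType:
    """Classify a ref/alt pair (assumed rules, for illustration only)."""
    if len(ref) == 1 and len(alt) == 1:
        return AnnotatableType.SUBSTITUTION
    # An insertion keeps the reference base as an anchor and adds bases.
    if len(ref) == 1 and len(alt) > 1 and alt.startswith(ref):
        return AnnotatableType.SMALL_INSERTION
    # A deletion keeps the first base as an anchor and drops the rest.
    if len(alt) == 1 and len(ref) > 1 and ref.startswith(alt):
        return AnnotatableType.SMALL_DELETION
    return AnnotatableType.COMPLEX
```

Under these assumed rules, a pair such as ("A", "ACC") would be treated as a small insertion and ("AC", "GT") as complex.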
dae.annotation.annotate_columns module
- class dae.annotation.annotate_columns.AnnotateColumnsTool(raw_args: list[str] | None = None, gpf_instance: GPFInstance | None = None)[source]
Bases:
AnnotationTool
Annotation tool for TSV-style text files.
- dae.annotation.annotate_columns.combine(args: Any, partfile_paths: list[str], out_file_path: str) None [source]
Combine annotated region parts into a single output file.
- dae.annotation.annotate_columns.produce_tabix_index(filepath: str, args: Any, header: list[str], ref_genome: ReferenceGenome | None) None [source]
Produce a tabix index file for the given variants file.
dae.annotation.annotate_vcf module
- class dae.annotation.annotate_vcf.AnnotateVCFTool(raw_args: list[str] | None = None, gpf_instance: GPFInstance | None = None)[source]
Bases:
AnnotationTool
Annotation tool for the VCF file format.
- dae.annotation.annotate_vcf.combine(input_file_path: str, pipeline_config: list[dae.annotation.annotation_pipeline.AnnotatorInfo] | None, grr_definition: dict | None, partfile_paths: List[str], output_file_path: str) None [source]
Combine annotated region parts into a single VCF file.
- dae.annotation.annotate_vcf.update_header(variant_file: VariantFile, pipeline: AnnotationPipeline | ReannotationPipeline) None [source]
Update a variant file’s header with annotation pipeline scores.
dae.annotation.annotation_factory module
Factory for creation of annotation pipeline.
- class dae.annotation.annotation_factory.AnnotationConfigParser[source]
Bases:
object
Parser for annotation configuration.
- static has_wildcard(string: str) bool [source]
Ascertain whether a string contains a valid wildcard.
- static match_labels_query(query: dict[str, str], resource_labels: dict[str, str]) bool [source]
Check if the labels query for a wildcard matches.
- static normalize(pipeline_config: List[Any]) List[Dict] [source]
Return a normalized annotation pipeline configuration.
- static parse_complete(raw: dict[str, Any], idx: int) AnnotatorInfo [source]
Parse a full-form annotation config.
- static parse_config_file(filename: str, grr: GenomicResourceRepo | None) List[AnnotatorInfo] [source]
Parse annotation pipeline configuration file.
- static parse_minimal(raw: str, idx: int) AnnotatorInfo [source]
Parse a minimal-form annotation config.
- static parse_raw(pipeline_raw_config: list[dict[str, Any]] | None, grr: GenomicResourceRepo | None = None) list[dae.annotation.annotation_pipeline.AnnotatorInfo] [source]
Parse raw dictionary annotation pipeline configuration.
- static parse_raw_attribute_config(raw_attribute_config: dict[str, Any]) AttributeInfo [source]
Parse annotation attribute raw configuration.
- static parse_raw_attributes(raw_attributes_config: Any) list[dae.annotation.annotation_pipeline.AttributeInfo] [source]
Parse annotator pipeline attribute configuration.
- static parse_short(raw: dict[str, Any], idx: int, grr: GenomicResourceRepo | None = None) list[dae.annotation.annotation_pipeline.AnnotatorInfo] [source]
Parse a short-form annotation config.
- static parse_str(content: str, source_file_name: str | None = None, grr: GenomicResourceRepo | None = None) list[dae.annotation.annotation_pipeline.AnnotatorInfo] [source]
Parse annotation pipeline configuration string.
- static query_resources(annotator_type: str, wildcard: str, grr: GenomicResourceRepo) list[str] [source]
Collect resources matching a given query.
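The wildcard and label-query helpers above can be pictured with a small standalone sketch. It assumes that a wildcard is a shell-style `*` pattern over resource ids and that a labels query matches when every queried label has the same value on the resource. Both rules, and all function names below, are assumptions for illustration; the actual syntax accepted by AnnotationConfigParser may differ.

```python
from fnmatch import fnmatchcase


def has_wildcard_sketch(text: str) -> bool:
    """Assume a wildcard is any shell-style '*' in the string."""
    return "*" in text


def match_labels_sketch(query: dict[str, str], labels: dict[str, str]) -> bool:
    """Assume every queried label must be present with the same value."""
    return all(labels.get(key) == value for key, value in query.items())


def query_resources_sketch(
    pattern: str,
    labels_query: dict[str, str],
    resources: dict[str, dict[str, str]],
) -> list[str]:
    """Return sorted ids that match the pattern and the labels query.

    `resources` maps a resource id to its labels; it stands in for a
    genomic resource repository in this sketch.
    """
    return sorted(
        resource_id
        for resource_id, labels in resources.items()
        if fnmatchcase(resource_id, pattern)
        and match_labels_sketch(labels_query, labels)
    )
```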
- dae.annotation.annotation_factory.build_annotation_pipeline(pipeline_config: list[dae.annotation.annotation_pipeline.AnnotatorInfo] | None = None, pipeline_config_raw: list[dict] | None = None, pipeline_config_file: str | None = None, pipeline_config_str: str | None = None, grr_repository: GenomicResourceRepo | None = None, grr_repository_file: str | None = None, grr_repository_definition: dict | None = None, allow_repeated_attributes: bool = False) AnnotationPipeline [source]
Build an annotation pipeline.
- dae.annotation.annotation_factory.check_for_repeated_attributes_in_annotator(annotator_config: AnnotatorInfo) None [source]
Check for repeated attributes in annotator configuration.
- dae.annotation.annotation_factory.check_for_repeated_attributes_in_pipeline(pipeline: AnnotationPipeline, allow_repeated_attributes: bool = False) None [source]
Check for repeated attributes in pipeline configuration.
- dae.annotation.annotation_factory.check_for_unused_parameters(info: AnnotatorInfo) None [source]
Check annotator configuration for unused parameters.
- dae.annotation.annotation_factory.copy_annotation_pipeline(pipeline: AnnotationPipeline) AnnotationPipeline [source]
Copy an annotation pipeline instance.
- dae.annotation.annotation_factory.copy_reannotation_pipeline(pipeline: ReannotationPipeline) ReannotationPipeline [source]
Copy a reannotation pipeline instance.
- dae.annotation.annotation_factory.get_annotator_factory(annotator_type: str) Callable[[AnnotationPipeline, AnnotatorInfo], Annotator] [source]
Find and return a factory function for creation of an annotator type.
If the specified annotator type is not found, this function raises ValueError exception.
- Returns:
the annotator factory for the specified annotator type.
- Raises:
ValueError – if no annotator factory is found for the specified annotator type.
- dae.annotation.annotation_factory.get_available_annotator_types() List[str] [source]
Return the list of all registered annotator factory types.
- dae.annotation.annotation_factory.register_annotator_factory(annotator_type: str, factory: Callable[[AnnotationPipeline, AnnotatorInfo], Annotator]) None [source]
Register additional annotator factory.
By default, all genotype storage factories should be registered at the [dae.genotype_storage.factories] extension point. All registered factories are loaded automatically. This function should be used if you want to bypass the extension point mechanism and register an additional genotype storage factory programmatically.
- dae.annotation.annotation_factory.resolve_repeated_attributes(pipeline: AnnotationPipeline, repeated_attributes: set[str]) None [source]
Resolve repeated attributes in pipeline configuration via renaming.
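The register/get/list trio of factory functions above follows a familiar registry pattern. The sketch below shows that pattern in isolation, including the documented behaviour of raising ValueError for an unknown annotator type. The module-level dictionary and the `_sketch` function names are hypothetical and do not reflect how the dae package stores its factories.

```python
from typing import Any, Callable

# Hypothetical in-memory registry: annotator type name -> factory callable.
_FACTORIES: dict[str, Callable[..., Any]] = {}


def register_factory_sketch(annotator_type: str, factory: Callable[..., Any]) -> None:
    """Register (or replace) the factory for an annotator type."""
    _FACTORIES[annotator_type] = factory


def get_factory_sketch(annotator_type: str) -> Callable[..., Any]:
    """Return the registered factory or raise ValueError if missing."""
    try:
        return _FACTORIES[annotator_type]
    except KeyError:
        raise ValueError(
            f"unsupported annotator type: {annotator_type}"
        ) from None


def available_types_sketch() -> list[str]:
    """Return the sorted names of all registered annotator types."""
    return sorted(_FACTORIES)
```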
dae.annotation.annotation_pipeline module
Provides annotation pipeline class.
- class dae.annotation.annotation_pipeline.AnnotationPipeline(repository: GenomicResourceRepo)[source]
Bases:
object
Provides annotation pipeline abstraction.
- annotate(annotatable: Annotatable, context: dict | None = None) dict [source]
Apply all annotators to an annotatable.
- get_annotator_by_attribute_info(attribute_info: AttributeInfo) Annotator | None [source]
- get_attribute_info(attribute_name: str) AttributeInfo | None [source]
- get_attributes() list[dae.annotation.annotation_pipeline.AttributeInfo] [source]
- get_info() list[dae.annotation.annotation_pipeline.AnnotatorInfo] [source]
- open() AnnotationPipeline [source]
Open all annotators in the pipeline and mark it as open.
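To make "apply all annotators to an annotatable" concrete, here is a minimal standalone sketch. It models each annotator as a plain callable and assumes that attributes produced by earlier annotators are visible to later ones through the context dictionary. That ordering assumption, and every name in the block, is illustrative rather than taken from AnnotationPipeline.

```python
from typing import Any, Callable, Optional

# In this sketch an annotator is a callable taking (annotatable, context)
# and returning a dict of new attributes.
AnnotatorFn = Callable[[str, dict[str, Any]], dict[str, Any]]


def annotate_sketch(
    annotators: list[AnnotatorFn],
    annotatable: str,
    context: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Run annotators in order, accumulating their attributes."""
    result = dict(context or {})
    for annotator in annotators:
        # Each annotator sees everything produced so far.
        result.update(annotator(annotatable, result))
    return result


def length_annotator(annotatable: str, context: dict[str, Any]) -> dict[str, Any]:
    """Toy annotator: report the length of the annotatable string."""
    return {"length": len(annotatable)}


def is_long_annotator(annotatable: str, context: dict[str, Any]) -> dict[str, Any]:
    """Toy annotator that depends on an attribute from the context."""
    return {"is_long": context["length"] > 5}
```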
- class dae.annotation.annotation_pipeline.Annotator(pipeline: AnnotationPipeline | None, info: AnnotatorInfo)[source]
Bases:
ABC
Annotator provides a set of attributes for a given Annotatable.
- abstract annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
- property attributes: list[dae.annotation.annotation_pipeline.AttributeInfo]
- get_info() AnnotatorInfo [source]
- property resource_ids: set[str]
- property resources: list[dae.genomic_resources.repository.GenomicResource]
- property used_context_attributes: tuple[str, ...]
- class dae.annotation.annotation_pipeline.AnnotatorDecorator(child: Annotator)[source]
Bases:
Annotator
Defines annotator decorator base class.
- class dae.annotation.annotation_pipeline.AnnotatorInfo(_type: str, attributes: list[dae.annotation.annotation_pipeline.AttributeInfo], parameters: ParamsUsageMonitor | dict[str, Any], documentation: str = '', resources: list[dae.genomic_resources.repository.GenomicResource] | None = None, annotator_id: str = 'N/A')[source]
Bases:
object
Defines annotator configuration.
- annotator_id: str
- attributes: list[dae.annotation.annotation_pipeline.AttributeInfo]
- documentation: str = ''
- parameters: ParamsUsageMonitor
- resources: list[dae.genomic_resources.repository.GenomicResource]
- type: str
- class dae.annotation.annotation_pipeline.AttributeInfo(name: str, source: str, internal: bool, parameters: ParamsUsageMonitor | dict[str, Any], _type: str = 'str', description: str = '', documentation: str | None = None)[source]
Bases:
object
Defines annotation attribute configuration.
- description: str = ''
- property documentation: str
- internal: bool
- name: str
- parameters: ParamsUsageMonitor
- source: str
- type: str = 'str'
- class dae.annotation.annotation_pipeline.InputAnnotableAnnotatorDecorator(child: Annotator)[source]
Bases:
AnnotatorDecorator
Defines annotator decorator to use input annotatable if defined.
- annotate(_: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
- property used_context_attributes: tuple[str, ...]
- class dae.annotation.annotation_pipeline.ParamsUsageMonitor(data: dict[str, Any])[source]
Bases:
Mapping
Class to monitor usage of annotator parameters.
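One way to monitor parameter usage with a Mapping is to record every key that is read, so that unread keys can later be reported (compare check_for_unused_parameters in the factory module). The sketch below shows the idea; the UsageMonitorSketch class and its unused_keys method are hypothetical and are not the ParamsUsageMonitor API.

```python
from collections.abc import Iterator, Mapping
from typing import Any


class UsageMonitorSketch(Mapping):
    """Read-only mapping that remembers which keys were looked up."""

    def __init__(self, data: dict[str, Any]) -> None:
        self._data = dict(data)
        self._used: set[str] = set()

    def __getitem__(self, key: str) -> Any:
        value = self._data[key]  # raises KeyError for unknown keys
        self._used.add(key)
        return value

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def unused_keys(self) -> set[str]:
        """Return the keys that were never read."""
        return set(self._data) - self._used
```

Because Mapping derives `get` and `in` from `__getitem__`, those lookups also count as usage in this sketch.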
- class dae.annotation.annotation_pipeline.ReannotationPipeline(pipeline_new: AnnotationPipeline, pipeline_old: AnnotationPipeline)[source]
Bases:
AnnotationPipeline
Special pipeline that handles reannotation of a previous pipeline.
- AnnotationDependencyGraph
alias of dict[AnnotatorInfo, list[tuple[AnnotatorInfo, AttributeInfo]]]
- annotate(annotatable: Annotatable, record: dict) dict [source]
Apply all annotators to an annotatable.
- annotate_summary_allele(allele: SummaryAllele) dict [source]
- static build_dependency_graph(pipeline: AnnotationPipeline) AnnotationDependencyGraph [source]
Make dependency graph for an annotation pipeline.
- get_attributes() list[dae.annotation.annotation_pipeline.AttributeInfo] [source]
- get_dependencies_for(info: AnnotatorInfo) set[dae.annotation.annotation_pipeline.AnnotatorInfo] [source]
Get all dependencies for a given annotator.
- get_dependents_for(info: AnnotatorInfo) set[dae.annotation.annotation_pipeline.AnnotatorInfo] [source]
Get all dependents for a given annotator.
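The dependency graph alias above maps each annotator to a list of (upstream annotator, attribute) pairs. The following standalone sketch uses the same shape, with plain strings standing in for AnnotatorInfo and AttributeInfo, and walks it in both directions. Whether the real get_dependencies_for and get_dependents_for are transitive is an assumption here; the function names are hypothetical.

```python
# Annotator name -> list of (upstream annotator name, attribute name).
Graph = dict[str, list[tuple[str, str]]]


def dependencies_for(graph: Graph, node: str) -> set[str]:
    """Collect every annotator that `node` depends on, transitively."""
    result: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for upstream, _attribute in graph.get(current, []):
            if upstream not in result:
                result.add(upstream)
                stack.append(upstream)
    return result


def dependents_for(graph: Graph, node: str) -> set[str]:
    """Collect every annotator that depends on `node`, transitively."""
    result: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for candidate, deps in graph.items():
            if candidate in result:
                continue
            if any(upstream == current for upstream, _attr in deps):
                result.add(candidate)
                stack.append(candidate)
    return result
```

In a reannotation setting, a walk like dependents_for would identify which annotators need to be rerun when an upstream annotator changes.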
- class dae.annotation.annotation_pipeline.ValueTransformAnnotatorDecorator(child: Annotator, value_transformers: dict[str, Callable[[Any], Any]])[source]
Bases:
AnnotatorDecorator
Define value transformer annotator decorator.
- annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
dae.annotation.annotator_base module
Provides base class for annotators.
- class dae.annotation.annotator_base.AnnotatorBase(pipeline: AnnotationPipeline | None, info: AnnotatorInfo, source_type_desc: dict[str, tuple[str, str]])[source]
Bases:
Annotator
Base implementation of the Annotator class.
- annotate(annotatable: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
dae.annotation.clinvar_annotator module
dae.annotation.context module
- class dae.annotation.context.CLIAnnotationContext(context_objects: Dict[str, Any], source: tuple[str, ...])[source]
Bases:
CLIGenomicContext
Defines annotation pipeline genomic context.
- static context_builder(args: Namespace) CLIAnnotationContext [source]
Build a CLI genomic context.
- static get_pipeline(context: GenomicContext) AnnotationPipeline [source]
Construct an annotation pipeline.
dae.annotation.effect_annotator module
- class dae.annotation.effect_annotator.EffectAnnotatorAdapter(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]
Bases:
AnnotatorBase
Adapts effect annotator to be used in annotation infrastructure.
- dae.annotation.effect_annotator.build_effect_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
dae.annotation.gene_score_annotator module
Module containing the gene score annotator.
- class dae.annotation.gene_score_annotator.GeneScoreAnnotator(pipeline: AnnotationPipeline | None, info: AnnotatorInfo, gene_score_resource: GenomicResource, input_gene_list: str)[source]
Bases:
Annotator
Gene score annotator class.
- DEFAULT_AGGREGATOR_TYPE = 'dict'
- aggregate_gene_values(score_id: str, gene_symbols: list[str], aggregator_type: str) Any [source]
Aggregate gene score values.
- annotate(_: Annotatable | None, context: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
- property used_context_attributes: tuple[str, ...]
- dae.annotation.gene_score_annotator.build_gene_score_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
Create a gene score annotator.
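The aggregation of gene score values over a gene list can be sketched as follows. The "dict" aggregator name comes from DEFAULT_AGGREGATOR_TYPE above; the "max" and "mean" names, the handling of unknown genes, and the function itself are assumptions made for this example and may not match GeneScoreAnnotator.

```python
from statistics import mean
from typing import Any


def aggregate_gene_values_sketch(
    scores: dict[str, float],
    gene_symbols: list[str],
    aggregator_type: str = "dict",
) -> Any:
    """Aggregate the scores of the listed genes.

    Genes without a score are skipped (an assumption of this sketch).
    """
    found = {gene: scores[gene] for gene in gene_symbols if gene in scores}
    if aggregator_type == "dict":
        # Keep one value per gene instead of collapsing to a number.
        return found
    if not found:
        return None
    if aggregator_type == "max":
        return max(found.values())
    if aggregator_type == "mean":
        return mean(found.values())
    raise ValueError(f"unknown aggregator type: {aggregator_type}")
```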
dae.annotation.liftover_annotator module
Provides a lift over annotator and helpers.
- class dae.annotation.liftover_annotator.LiftOverAnnotator(pipeline: AnnotationPipeline | None, info: AnnotatorInfo, chain: LiftoverChain, target_genome: ReferenceGenome)[source]
Bases:
AnnotatorBase
Liftover annotator class.
- liftover_cnv(cnv_allele: Annotatable) Annotatable | None [source]
Liftover CNV allele annotatable.
- liftover_position(position: Annotatable) Annotatable | None [source]
Liftover position annotatable.
- liftover_region(region: Annotatable) Annotatable | None [source]
Liftover region annotatable.
- dae.annotation.liftover_annotator.build_liftover_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
Create a liftover annotator.
dae.annotation.normalize_allele_annotator module
Provides normalize allele annotator and helpers.
- class dae.annotation.normalize_allele_annotator.NormalizeAlleleAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]
Bases:
AnnotatorBase
Annotator to normalize VCF alleles.
- dae.annotation.normalize_allele_annotator.build_normalize_allele_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
- dae.annotation.normalize_allele_annotator.normalize_allele(allele: VCFAllele, genome: ReferenceGenome) VCFAllele [source]
Normalize an allele.
Uses the algorithm defined at https://genome.sph.umich.edu/wiki/Variant_Normalization
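The normalization procedure described on the referenced page (trim shared trailing bases, extend to the left when an allele becomes empty, then trim shared leading bases) can be sketched for a single ref/alt pair as shown below. This is an independent illustration that uses a plain string in place of ReferenceGenome and 1-based positions; it is not the body of normalize_allele, and the function name is hypothetical.

```python
def normalize_sketch(ref_seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Left-align and trim a biallelic variant.

    `ref_seq` is the chromosome sequence and `pos` is the 1-based
    position of the first base of `ref` within it.
    """
    if ref == alt:
        return pos, ref, alt  # nothing to normalize

    changed = True
    while changed:
        changed = False
        # Drop the rightmost base while both alleles end with it.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend past the sequence start")
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True

    # Drop shared leading bases while both alleles keep one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

For example, on the sequence GGGCACACACAGGG, a CA deletion written as CAC to C at position 8 shifts left to GCA to G at position 3.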
dae.annotation.record_to_annotatable module
- class dae.annotation.record_to_annotatable.CSHLAlleleRecordToAnnotatable(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
RecordToAnnotable
Transform a CSHL variant record into a VCF allele annotatable.
- build(record: dict[str, str]) Annotatable [source]
- class dae.annotation.record_to_annotatable.RecordToAnnotable(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
ABC
- abstract build(record: dict[str, str]) Annotatable [source]
- class dae.annotation.record_to_annotatable.RecordToCNVAllele(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
RecordToAnnotable
Transform a columns record into a CNV allele annotatable.
- build(record: dict[str, str]) Annotatable [source]
- class dae.annotation.record_to_annotatable.RecordToPosition(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
RecordToAnnotable
- build(record: dict[str, str]) Annotatable [source]
- class dae.annotation.record_to_annotatable.RecordToRegion(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
RecordToAnnotable
- build(record: dict[str, str]) Annotatable [source]
- class dae.annotation.record_to_annotatable.RecordToVcfAllele(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
RecordToAnnotable
- build(record: dict[str, str]) Annotatable [source]
- class dae.annotation.record_to_annotatable.VcfLikeRecordToVcfAllele(columns: tuple, ref_genome: ReferenceGenome | None)[source]
Bases:
RecordToAnnotable
Transform a columns record into VCF allele annotatable.
- build(record: dict[str, str]) Annotatable [source]
- dae.annotation.record_to_annotatable.add_record_to_annotable_arguments(parser: ArgumentParser) None [source]
- dae.annotation.record_to_annotatable.build_record_to_annotatable(parameters: dict[str, str], available_columns: set[str], ref_genome: ReferenceGenome | None = None) RecordToAnnotable [source]
Transform a variant record into an annotatable.
dae.annotation.schema module
dae.annotation.score_annotator module
This module contains the implementation of the three score annotators.
The genomic score annotators defined are position_score, np_score, and allele_score.
- class dae.annotation.score_annotator.AlleleScoreAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]
Bases:
GenomicScoreAnnotatorBase
This class implements allele_score annotator.
- annotate(annotatable: Annotatable | None, _: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
- build_score_aggregator_documentation(attr_info: AttributeInfo) list[str] [source]
Collect score aggregator documentation.
- class dae.annotation.score_annotator.GenomicScoreAnnotatorBase(pipeline: AnnotationPipeline, info: AnnotatorInfo, score: GenomicScore)[source]
Bases:
Annotator
Genomic score base annotator.
- add_score_aggregator_documentation(attribute_info: AttributeInfo, aggregator: str, attribute_conf_agg: str | None) None [source]
Collect score aggregator documentation.
- abstract build_score_aggregator_documentation(attr_info: AttributeInfo) list[str] [source]
Construct score aggregator documentation.
- class dae.annotation.score_annotator.NPScoreAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]
Bases:
PositionScoreAnnotatorBase
This class implements np_score annotator.
- build_score_aggregator_documentation(attr_info: AttributeInfo) list[str] [source]
Collect score aggregator documentation.
- class dae.annotation.score_annotator.PositionScoreAnnotator(pipeline: AnnotationPipeline, info: AnnotatorInfo)[source]
Bases:
PositionScoreAnnotatorBase
This class implements the position_score annotator.
The position_score annotator requires the resource_id parameter, whose value must be the id of a genomic resource of type position_score.
The position_score resource provides a set of scores (see …) that the position_score annotator uses as attributes to assign to the annotatable.
The position_score annotator recognizes one attribute-level parameter, position_aggregator, which controls how the position scores are aggregated for annotatables that refer to a region of the reference genome.
- build_score_aggregator_documentation(attr_info: AttributeInfo) list[str] [source]
Collect score aggregator documentation.
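The role of a position aggregator for region annotatables can be illustrated with a small standalone sketch: collect the per-position scores that fall inside the region and reduce them to one value. The aggregator names "mean", "max", and "min", the dictionary-based score lookup, and the function name are assumptions for this example and are not taken from the position_score annotator.

```python
from statistics import mean
from typing import Optional


def aggregate_region_scores(
    scores: dict[int, float],
    start: int,
    end: int,
    aggregator: str = "mean",
) -> Optional[float]:
    """Aggregate per-position scores over the closed interval [start, end].

    Positions without a score are ignored; an empty region yields None.
    """
    values = [scores[pos] for pos in range(start, end + 1) if pos in scores]
    if not values:
        return None
    if aggregator == "mean":
        return mean(values)
    if aggregator == "max":
        return max(values)
    if aggregator == "min":
        return min(values)
    raise ValueError(f"unknown aggregator: {aggregator}")
```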
- class dae.annotation.score_annotator.PositionScoreAnnotatorBase(pipeline: AnnotationPipeline, info: AnnotatorInfo, score: GenomicScore)[source]
Bases:
GenomicScoreAnnotatorBase
Defines position score base annotator class.
- annotate(annotatable: Annotatable | None, _: dict[str, Any]) dict[str, Any] [source]
Produce annotation attributes for an annotatable.
- dae.annotation.score_annotator.build_allele_score_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
- dae.annotation.score_annotator.build_np_score_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
- dae.annotation.score_annotator.build_position_score_annotator(pipeline: AnnotationPipeline, info: AnnotatorInfo) Annotator [source]
- dae.annotation.score_annotator.get_genomic_resource(pipeline: AnnotationPipeline, info: AnnotatorInfo, resource_type: str) GenomicResource [source]
Return genomic score resource used for given genomic score annotator.