dae.genomic_resources package

Submodules

dae.genomic_resources.aggregators module

class dae.genomic_resources.aggregators.Aggregator[source]

Bases: ABC

Base class for score aggregators.

add(value: Any, **kwargs: Any) None[source]
clear() None[source]
get_final() Any[source]
get_total_count() int[source]
get_used_count() int[source]
class dae.genomic_resources.aggregators.ConcatAggregator[source]

Bases: Aggregator

Aggregator that concatenates all passed values.

get_final() Any[source]
class dae.genomic_resources.aggregators.DictAggregator[source]

Bases: Aggregator

Aggregator that builds a dictionary of all passed values.

get_final() Any[source]
class dae.genomic_resources.aggregators.JoinAggregator(separator: str)[source]

Bases: Aggregator

Aggregator that joins all passed values using a separator.

get_final() Any[source]
class dae.genomic_resources.aggregators.ListAggregator[source]

Bases: Aggregator

Aggregator that builds a list of all passed values.

get_final() Any[source]
class dae.genomic_resources.aggregators.MaxAggregator[source]

Bases: Aggregator

Maximum value aggregator for genomic scores.

get_final() Any[source]
class dae.genomic_resources.aggregators.MeanAggregator[source]

Bases: Aggregator

Aggregator for genomic scores that calculates the mean value.

get_final() Any[source]
class dae.genomic_resources.aggregators.MedianAggregator[source]

Bases: Aggregator

Aggregator for genomic scores that calculates the median value.

get_final() Any[source]
class dae.genomic_resources.aggregators.MinAggregator[source]

Bases: Aggregator

Minimum value aggregator for genomic scores.

get_final() Any[source]
class dae.genomic_resources.aggregators.ModeAggregator[source]

Bases: Aggregator

Aggregator for genomic scores that calculates the mode value.

get_final() Any[source]
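To make the add/clear/get_final contract above concrete, here is a minimal sketch of how an aggregator of this shape could work. It is a hypothetical re-implementation, not the dae code: the private `_add` hook and the rule that `None` values count toward `get_total_count()` but not toward `get_used_count()` are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class Aggregator(ABC):
    """Hypothetical re-implementation of the aggregator interface."""

    def __init__(self) -> None:
        self.total_count = 0  # every value passed to add()
        self.used_count = 0  # values that contributed to the result

    def add(self, value: Any, **kwargs: Any) -> None:
        self.total_count += 1
        if value is None:
            return  # assumption: missing values are counted but not aggregated
        self.used_count += 1
        self._add(value)

    @abstractmethod
    def _add(self, value: Any) -> None:
        """Fold one non-missing value into the running state."""

    @abstractmethod
    def get_final(self) -> Any:
        """Return the aggregated result."""

    def clear(self) -> None:
        self.total_count = 0
        self.used_count = 0

    def get_total_count(self) -> int:
        return self.total_count

    def get_used_count(self) -> int:
        return self.used_count


class MaxAggregator(Aggregator):
    def __init__(self) -> None:
        super().__init__()
        self.current: Optional[Any] = None

    def _add(self, value: Any) -> None:
        if self.current is None or value > self.current:
            self.current = value

    def clear(self) -> None:
        super().clear()
        self.current = None

    def get_final(self) -> Any:
        return self.current


class MeanAggregator(Aggregator):
    def __init__(self) -> None:
        super().__init__()
        self.running_sum = 0.0

    def _add(self, value: Any) -> None:
        self.running_sum += value

    def clear(self) -> None:
        super().clear()
        self.running_sum = 0.0

    def get_final(self) -> Optional[float]:
        # Average over used (non-missing) values only.
        return self.running_sum / self.used_count if self.used_count else None
```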
dae.genomic_resources.aggregators.build_aggregator(aggregator_type: str) Aggregator[source]
dae.genomic_resources.aggregators.create_aggregator(aggregator_def: dict[str, Any]) Aggregator[source]

Create an aggregator by aggregator definition.

dae.genomic_resources.aggregators.create_aggregator_definition(aggregator_type: str) dict[str, Any][source]

Parse an aggregator definition string.

dae.genomic_resources.aggregators.get_aggregator_class(aggregator: str) Callable[[], Aggregator][source]
dae.genomic_resources.aggregators.validate_aggregator(aggregator_type: str) None[source]
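A sketch of how an aggregator definition string might be parsed into a dictionary. The grammar (`name` or `name(argument)`, e.g. `join(;)` to pass a separator to `JoinAggregator`) and the `name`/`args` keys are assumptions for illustration; `parse_aggregator_definition` is a hypothetical helper, not the library's `create_aggregator_definition`.

```python
import re
from typing import Any

# Hypothetical grammar: "<name>" or "<name>(<argument>)", e.g. "max" or "join(;)".
_AGGREGATOR_RE = re.compile(r"^(?P<name>[a-z_]+)(?:\((?P<arg>.*)\))?$")


def parse_aggregator_definition(aggregator_type: str) -> dict[str, Any]:
    """Split an aggregator string into its name and optional argument."""
    match = _AGGREGATOR_RE.match(aggregator_type.strip())
    if match is None:
        raise ValueError(f"invalid aggregator definition: {aggregator_type!r}")
    definition: dict[str, Any] = {"name": match.group("name")}
    if match.group("arg") is not None:
        definition["args"] = [match.group("arg")]
    return definition
```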

dae.genomic_resources.cached_repository module

Provides caching genomic resources.

class dae.genomic_resources.cached_repository.CacheResource(resource: GenomicResource, protocol: CachingProtocol)[source]

Bases: GenomicResource

Represents resources stored in cache.

class dae.genomic_resources.cached_repository.CachingProtocol(remote_protocol: ReadOnlyRepositoryProtocol, local_protocol: FsspecReadWriteProtocol)[source]

Bases: ReadOnlyRepositoryProtocol

Defines caching GRR repository protocol.

file_exists(resource: GenomicResource, filename: str) bool[source]

Check if the given file exists in the given resource.

get_all_resources() Generator[GenomicResource, None, None][source]

Return generator for all resources in the repository.

get_url() str[source]

Return the repository URL.

invalidate() None[source]

Invalidate internal cache of repository protocol.

load_manifest(resource: GenomicResource) Manifest[source]

Load resource manifest.

open_raw_file(resource: GenomicResource, filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO[source]

Open a file in a resource and return a file-like object.

open_tabix_file(resource: GenomicResource, filename: str, index_filename: str | None = None) TabixFile[source]

Open a tabix file in a resource and return a pysam tabix file.

Not all repositories support this method. Repositories that do not support this method raise an exception.

open_vcf_file(resource: GenomicResource, filename: str, index_filename: str | None = None) VariantFile[source]

Open a vcf file in a resource and return a pysam VariantFile.

Not all repositories support this method. Repositories that do not support this method raise an exception.

refresh_cached_resource(resource: GenomicResource) None[source]

Refresh all resource files in the cache if necessary.

refresh_cached_resource_file(resource: GenomicResource, filename: str) tuple[str, str][source]

Refresh a resource file in the cache if necessary.

class dae.genomic_resources.cached_repository.GenomicResourceCachedRepo(child: GenomicResourceRepo, cache_url: str, **kwargs: str | None)[source]

Bases: GenomicResourceRepo

Defines caching genomic resources repository.

find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None[source]

Return requested resource or None if not found.

get_all_resources() Generator[GenomicResource, None, None][source]

Return a generator over all resources in the repository.

get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource[source]

Return one resource with an id equal to resource_id.

If resource is not found, exception is raised.

get_resource_cached_files(resource_id: str) set[str][source]

Get a set of filenames of cached files for a given resource.

invalidate() None[source]

Invalidate internal state of the repository.

dae.genomic_resources.cached_repository.cache_resources(repository: GenomicResourceRepo, resource_ids: Iterable[str] | None, workers: int | None = None) None[source]

Cache resources from a list of remote resource IDs.
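`cache_resources` takes a `workers` argument, which suggests that resources are cached concurrently. A minimal sketch of that pattern with a thread pool follows; `cache_all` and its `cache_one` callback are hypothetical, and the sketch takes an explicit list of IDs rather than handling `resource_ids=None`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def cache_all(
    resource_ids: Iterable[str],
    cache_one: Callable[[str], None],
    workers: Optional[int] = None,
) -> list[str]:
    """Cache each resource once, using up to `workers` threads."""
    ids = sorted(set(resource_ids))  # de-duplicate so each resource is fetched once
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # list() waits for all tasks and re-raises the first worker exception, if any
        list(executor.map(cache_one, ids))
    return ids
```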

dae.genomic_resources.cli module

Provides CLI for management of genomic resources repositories.

dae.genomic_resources.cli.cli_browse(cli_args: list[str] | None = None) None[source]

Provide CLI for repository browsing.

dae.genomic_resources.cli.cli_manage(cli_args: list[str] | None = None) None[source]

Provide CLI for repository management.

dae.genomic_resources.cli.collect_dvc_entries(proto: ReadWriteRepositoryProtocol, res: GenomicResource) dict[str, dae.genomic_resources.repository.ManifestEntry][source]

Collect manifest entries defined by .dvc files.

dae.genomic_resources.clinvar module

dae.genomic_resources.fsspec_protocol module

Provides GRR protocols based on fsspec library.

class dae.genomic_resources.fsspec_protocol.FsspecReadOnlyProtocol(proto_id: str, url: str, filesystem: AbstractFileSystem)[source]

Bases: ReadOnlyRepositoryProtocol

Provides fsspec genomic resources repository protocol.

file_exists(resource: GenomicResource, filename: str) bool[source]

Check if the given file exists in the given resource.

get_all_resources() Generator[GenomicResource, None, None][source]

Return generator over all resources in the repository.

get_resource_file_url(resource: GenomicResource, filename: str) str[source]

Return url of a file in the resource.

get_resource_url(resource: GenomicResource) str[source]

Return url of the specified resources.

get_url() str[source]

Return the repository URL.

invalidate() None[source]

Invalidate internal cache of repository protocol.

load_manifest(resource: GenomicResource) Manifest[source]

Load resource manifest.

open_raw_file(resource: GenomicResource, filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO[source]

Open a file in a resource and return a file-like object.

open_tabix_file(resource: GenomicResource, filename: str, index_filename: str | None = None) TabixFile[source]

Open a tabix file in a resource and return a pysam tabix file.

Not all repositories support this method. Repositories that do not support this method raise an exception.

open_vcf_file(resource: GenomicResource, filename: str, index_filename: str | None = None) VariantFile[source]

Open a vcf file in a resource and return a pysam VariantFile.

Not all repositories support this method. Repositories that do not support this method raise an exception.

class dae.genomic_resources.fsspec_protocol.FsspecReadWriteProtocol(proto_id: str, url: str, filesystem: AbstractFileSystem)[source]

Bases: FsspecReadOnlyProtocol, ReadWriteRepositoryProtocol

Provides fsspec genomic resources repository protocol.

build_content_file() list[dict[str, Any]][source]

Build the content of the repository (i.e. the '.CONTENTS' file).

build_index_info(repository_template: Template) dict[source]

Build info dict for the repository.

collect_all_resources() Generator[GenomicResource, None, None][source]

Return generator over all resources managed by this protocol.

collect_resource_entries(resource: GenomicResource) Manifest[source]

Scan the resource and return a manifest.

copy_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None[source]

Copy a resource file into repository.

delete_resource_file(resource: GenomicResource, filename: str) None[source]

Delete a resource file and its internal state.

get_all_resources() Generator[GenomicResource, None, None][source]

Return generator over all resources in the repository.

get_resource_file_size(resource: GenomicResource, filename: str) int[source]

Return the size of a resource file.

get_resource_file_timestamp(resource: GenomicResource, filename: str) float[source]

Return the timestamp (ISO formatted) of a resource file.

load_resource_file_state(resource: GenomicResource, filename: str) ResourceFileState | None[source]

Load resource file state from internal GRR state.

If the specified resource file has no internal state, returns None.

obtain_resource_file_lock(resource: GenomicResource, filename: str) ContextManager[source]

Lock a resource’s file.

save_resource_file_state(resource: GenomicResource, state: ResourceFileState) None[source]

Save resource file state into internal GRR state.

update_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None[source]

Update a resource file into repository if needed.

dae.genomic_resources.fsspec_protocol.build_fsspec_protocol(proto_id: str, root_url: str, **kwargs: str | None) FsspecReadOnlyProtocol | FsspecReadWriteProtocol[source]

Create fsspec GRR protocol based on the root url.
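`build_fsspec_protocol` chooses between a read-only and a read-write protocol based on the root URL. The sketch below shows one way such a dispatch on the URL scheme could look; which schemes map to which protocol kind is an assumption, not taken from the dae source, and `protocol_kind` is a hypothetical helper.

```python
from urllib.parse import urlparse

# Assumed mapping, for illustration only.
READ_WRITE_SCHEMES = {"", "file", "s3", "memory"}
READ_ONLY_SCHEMES = {"http", "https"}


def protocol_kind(root_url: str) -> str:
    """Classify a repository root URL by its scheme."""
    scheme = urlparse(root_url).scheme  # "" for plain local paths
    if scheme in READ_WRITE_SCHEMES:
        return "read-write"
    if scheme in READ_ONLY_SCHEMES:
        return "read-only"
    raise NotImplementedError(f"unsupported scheme: {scheme!r}")
```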

dae.genomic_resources.fsspec_protocol.build_inmemory_protocol(proto_id: str, root_path: str, content: Dict[str, Any]) FsspecReadWriteProtocol[source]

Build and return an embedded fsspec protocol for testing.

dae.genomic_resources.fsspec_protocol.build_local_resource(dirname: str, config: Dict[str, Any]) GenomicResource[source]

Build a resource from a local filesystem directory.

dae.genomic_resources.gene_models module

class dae.genomic_resources.gene_models.Exon(start: int, stop: int, frame: int | None = None, number: int | None = None, cds_start: int | None = None, cds_stop: int | None = None)[source]

Bases: object

Provides exon model.

class dae.genomic_resources.gene_models.GeneModels(resource: GenomicResource)[source]

Bases: GenomicResourceImplementation, ResourceConfigValidationMixin, InfoImplementationMixin

Provides class for gene models.

SUPPORTED_GENE_MODELS_FILE_FORMATS = {'ccds', 'default', 'gtf', 'knowngene', 'refflat', 'refseq', 'ucscgenepred'}
add_statistics_build_tasks(task_graph: TaskGraph, **kwargs: Any) list[dae.task_graph.graph.Task][source]

Add tasks for calculating resource statistics to a task graph.

calc_info_hash() bytes[source]

Compute and return the info hash.

calc_statistics_hash() bytes[source]

Compute the statistics hash.

This hash is used to decide whether the resource statistics should be recomputed.

property files: set[str]

Return the set of resource files the implementation uses.

gene_models_by_gene_name(name: str) list[dae.genomic_resources.gene_models.TranscriptModel] | None[source]
gene_models_by_location(chrom: str, pos1: int, pos2: int | None = None) list[dae.genomic_resources.gene_models.TranscriptModel][source]

Retrieve TranscriptModel objects based on genomic position(s).

Args:

chrom (str): The chromosome name.
pos1 (int): The starting genomic position.
pos2 (Optional[int]): The ending genomic position. If not provided, only models that contain pos1 will be returned.

Returns:

list[TranscriptModel]: A list of TranscriptModel objects that match the given location criteria.
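The location query can be sketched as a filter over transcripts: without `pos2`, keep transcripts whose span contains `pos1`; with `pos2`, keep transcripts that overlap the interval. The `Transcript` stand-in, the use of the `tx` span, and the 1-based closed coordinates are assumptions; the real method presumably uses an index rather than a linear scan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:  # hypothetical stand-in for TranscriptModel
    tr_id: str
    chrom: str
    tx: tuple[int, int]  # transcript span, assumed 1-based and closed


def models_by_location(
    models: list[Transcript], chrom: str, pos1: int, pos2: Optional[int] = None
) -> list[Transcript]:
    if pos2 is None:
        # Only transcripts that contain pos1.
        return [m for m in models
                if m.chrom == chrom and m.tx[0] <= pos1 <= m.tx[1]]
    begin, end = min(pos1, pos2), max(pos1, pos2)
    # Transcripts overlapping the closed interval [begin, end].
    return [m for m in models
            if m.chrom == chrom and m.tx[0] <= end and begin <= m.tx[1]]
```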

gene_names() list[str][source]
get_info() str[source]

Construct the contents of the implementation’s HTML info page.

static get_schema() dict[str, Any][source]

Return schema to be used for config validation.

get_statistics() dict[str, int] | None[source]

Try to load resource statistics.

get_template() Template[source]
is_loaded() bool[source]
load() GeneModels[source]

Load gene models.

relabel_chromosomes(relabel: dict[str, str] | None = None, map_file: str | None = None) None[source]

Relabel chromosomes in gene model.

property resource_id: str
save(output_filename: str, gzipped: bool = True) None[source]

Save gene models in a file in default file format.

update_indexes() None[source]
class dae.genomic_resources.gene_models.GeneModelsParser(*args, **kwargs)[source]

Bases: Protocol

Gene models parser function type.

class dae.genomic_resources.gene_models.TranscriptModel(gene: str, tr_id: str, tr_name: str, chrom: str, strand: str, tx: tuple[int, int], cds: tuple[int, int], exons: list[dae.genomic_resources.gene_models.Exon] | None = None, attributes: dict[str, Any] | None = None)[source]

Bases: object

Provides transcript model.

all_regions(ss_extend: int = 0, prom: int = 0) list[dae.utils.regions.BedRegion][source]

Build and return list of regions.

calc_frames() list[int][source]

Calculate codon frames.

cds_len() int[source]
cds_regions(ss_extend: int = 0) list[dae.utils.regions.BedRegion][source]

Compute CDS regions.

is_coding() bool[source]
test_frames() bool[source]
total_len() int[source]
update_frames() None[source]

Update codon frames.

utr3_len() int[source]
utr3_regions() list[dae.utils.regions.BedRegion][source]

Build and return list of UTR3 regions.

utr5_len() int[source]
utr5_regions() list[dae.utils.regions.BedRegion][source]

Build list of UTR5 regions.
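The exonic bases of a coding transcript split into the CDS and the two UTRs, which is what `cds_len`, `utr5_len` and `utr3_len` report. A sketch, assuming 1-based closed `(start, stop)` exons and a `cds` tuple as in the `TranscriptModel` constructor; the helper names are hypothetical. On the minus strand the 5' UTR lies to the right of the CDS.

```python
def _overlap(start: int, stop: int, lo: int, hi: int) -> int:
    """Length of the intersection of two closed intervals (0 if disjoint)."""
    return max(0, min(stop, hi) - max(start, lo) + 1)


def cds_len(exons: list[tuple[int, int]], cds: tuple[int, int]) -> int:
    cds_start, cds_stop = cds
    return sum(_overlap(s, e, cds_start, cds_stop) for s, e in exons)


def utr_lens(
    exons: list[tuple[int, int]], cds: tuple[int, int], strand: str
) -> tuple[int, int]:
    """Return (utr5_len, utr3_len)."""
    cds_start, cds_stop = cds
    left = sum(_overlap(s, e, s, cds_start - 1) for s, e in exons)  # exonic bases before CDS
    right = sum(_overlap(s, e, cds_stop + 1, e) for s, e in exons)  # exonic bases after CDS
    return (left, right) if strand == "+" else (right, left)
```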

dae.genomic_resources.gene_models.build_gene_models_from_file(file_name: str, file_format: str | None = None, gene_mapping_file_name: str | None = None) GeneModels[source]

Load gene models from local filesystem.

dae.genomic_resources.gene_models.build_gene_models_from_resource(resource: GenomicResource | None) GeneModels[source]

Load gene models from a genomic resource.

dae.genomic_resources.gene_models.join_gene_models(*gene_models: GeneModels) GeneModels[source]

Join multiple gene models into a single gene models object.

dae.genomic_resources.genomic_position_table module

class dae.genomic_resources.genomic_position_table.Line(raw_line: tuple, chrom_key: str | int = 0, pos_begin_key: str | int = 1, pos_end_key: str | int = 2, ref_key: str | int | None = None, alt_key: str | int | None = None, header: tuple[str, ...] | None = None)[source]

Bases: LineBase

Represents a line read from a genomic position table.

Provides attribute access to a number of important columns - chromosome, start position, end position, reference allele and alternative allele.

get(key: str | int) str[source]

Return score value.

row() tuple[source]

Return row as tuple.

class dae.genomic_resources.genomic_position_table.LineBuffer[source]

Bases: object

Represent a line buffer for Tabix genome position table.

append(line: LineBase) None[source]
clear() None[source]
contains(chrom: str, pos: int) bool[source]
fetch(chrom: str, pos_begin: int, pos_end: int) Generator[LineBase, None, None][source]

Return a generator of rows matching the region.

find_index(chrom: str, pos: int) int[source]

Find index in line buffer that contains the passed position.

peek_first() LineBase[source]
peek_last() LineBase[source]
pop_first() LineBase[source]
prune(chrom: str, pos: int) None[source]

Prune the buffer if needed.

region() tuple[Optional[str], Optional[int], Optional[int]][source]

Return region stored in the buffer.
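A simplified model of the line buffer: a deque of sorted lines from one chromosome, queried by region and pruned as reading moves forward. `SimpleLine` and `SimpleLineBuffer` are hypothetical stand-ins. In particular, `find_index` here returns the last line starting at or before `pos`, and `fetch` scans linearly.

```python
from bisect import bisect_right
from collections import deque
from typing import Iterator, NamedTuple


class SimpleLine(NamedTuple):  # hypothetical stand-in for LineBase
    chrom: str
    pos_begin: int
    pos_end: int


class SimpleLineBuffer:
    """Sorted lines from a single chromosome (assumption)."""

    def __init__(self) -> None:
        self.lines: deque = deque()

    def append(self, line: SimpleLine) -> None:
        self.lines.append(line)

    def contains(self, chrom: str, pos: int) -> bool:
        if not self.lines or self.lines[0].chrom != chrom:
            return False
        return self.lines[0].pos_begin <= pos <= self.lines[-1].pos_end

    def find_index(self, chrom: str, pos: int) -> int:
        # Index of the last line starting at or before pos, or -1.
        if not self.contains(chrom, pos):
            return -1
        begins = [line.pos_begin for line in self.lines]
        return bisect_right(begins, pos) - 1

    def fetch(self, chrom: str, pos_begin: int, pos_end: int) -> Iterator[SimpleLine]:
        if not self.lines or self.lines[0].chrom != chrom:
            return
        for line in self.lines:
            if line.pos_begin > pos_end:
                break  # lines are sorted, nothing further can overlap
            if line.pos_end >= pos_begin:
                yield line

    def prune(self, chrom: str, pos: int) -> None:
        # Drop everything on another chromosome or ending before pos.
        if self.lines and self.lines[0].chrom != chrom:
            self.lines.clear()
            return
        while self.lines and self.lines[0].pos_end < pos:
            self.lines.popleft()
```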

class dae.genomic_resources.genomic_position_table.TabixGenomicPositionTable(genomic_resource: GenomicResource, table_definition: dict)[source]

Bases: GenomicPositionTable

Represents Tabix file genome position table.

BUFFER_MAXSIZE = 20000
close() None[source]

Close the resource.

get_all_records() Generator[LineBase | None, None, None][source]

Return generator of all records in the table.

get_chromosome_length(chrom: str, step: int = 100000000) int[source]

Return the length of a chromosome (or contig).

The returned value is guaranteed to be larger than the actual contig length.
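One way to obtain a length that is guaranteed to exceed the contig length is to probe forward in fixed steps until no records remain. The sketch assumes that this is what the `step` parameter controls and uses a hypothetical `has_records_from(pos)` callback in place of a tabix query.

```python
from typing import Callable


def estimate_chromosome_length(
    has_records_from: Callable[[int], bool], step: int = 100_000_000
) -> int:
    """Return the first multiple of `step` with no records at or after it."""
    pos = step
    while has_records_from(pos):  # records exist at or beyond pos, keep probing
        pos += step
    return pos  # strictly greater than the last record position
```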

get_chromosomes() list[str][source]

Return list of contigs in the genomic position table.

get_file_chromosomes() list[str][source]

Return chromosomes in a genomic table file.

This is to be overridden by the subclass. It should return a list of the chromosomes in the file in the order determined by the file.

get_line_iterator(chrom: str | None = None, pos_begin: int | None = None) Generator[LineBase | None, None, None][source]

Extract raw lines and wrap them in our Line adapter.

get_records_in_region(chrom: str, pos_begin: int | None = None, pos_end: int | None = None) Generator[LineBase, None, None][source]

Return an iterable over the records in the specified range.

The interval is closed on both sides and 1-based.

open() TabixGenomicPositionTable[source]
class dae.genomic_resources.genomic_position_table.VCFGenomicPositionTable(genomic_resource: GenomicResource, table_definition: dict)[source]

Bases: TabixGenomicPositionTable

Represents a VCF file genome position table.

CHROM = 'CHROM'
POS_BEGIN = 'POS'
POS_END = 'POS'
get_file_chromosomes() list[str][source]

Return chromosomes in a genomic table file.

This is to be overridden by the subclass. It should return a list of the chromosomes in the file in the order determined by the file.

get_line_iterator(chrom: str | None = None, pos_begin: int | None = None) Generator[VCFLine | None, None, None][source]

Extract raw lines and wrap them in our Line adapter.

open() VCFGenomicPositionTable[source]
class dae.genomic_resources.genomic_position_table.VCFLine(raw_line: VariantRecord, allele_index: int | None)[source]

Bases: LineBase

Line adapter for lines derived from a VCF file.

Implements functionality for handling multi-allelic variants and INFO fields.

get(key: str | int) Any[source]

Get a value from the INFO field of the VCF line.

row() tuple[source]

Return row as tuple.
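A sketch of splitting a multi-allelic VCF record into one line per alternative allele, mirroring the `allele_index` argument of `VCFLine`. Records are plain dicts here (an assumption), and per-allele INFO fields are detected by tuple length; a robust version would consult the header's `Number=A` declarations instead. `pos_end` equals `POS`, matching `VCFGenomicPositionTable.POS_END`.

```python
from typing import Any


def split_multiallelic(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Produce one line-like dict per ALT allele of a VCF record."""
    alts = record["ALT"]
    lines = []
    for allele_index, alt in enumerate(alts):
        info = {}
        for key, value in record["INFO"].items():
            # Heuristic: a tuple with one entry per ALT allele is per-allele data.
            if isinstance(value, tuple) and len(value) == len(alts):
                info[key] = value[allele_index]
            else:
                info[key] = value
        lines.append({
            "chrom": record["CHROM"],
            "pos_begin": record["POS"],
            "pos_end": record["POS"],
            "ref": record["REF"],
            "alt": alt,
            "allele_index": allele_index,
            "info": info,
        })
    return lines
```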

dae.genomic_resources.genomic_position_table.build_genomic_position_table(resource: GenomicResource, table_definition: dict) GenomicPositionTable[source]

Instantiate a genome position table from a genomic resource.

dae.genomic_resources.genomic_context module

class dae.genomic_resources.genomic_context.CLIGenomicContext(context_objects: Dict[str, Any], source: tuple[str, ...])[source]

Bases: SimpleGenomicContext

Defines CLI genomic context.

static add_context_arguments(parser: ArgumentParser) None[source]

Add command line arguments to the argument parser.

static context_builder(args: Namespace) CLIGenomicContext[source]

Build a CLI genomic context.

static register(args: Namespace) None[source]

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

class dae.genomic_resources.genomic_context.DefaultRepositoryContextProvider[source]

Bases: SimpleGenomicContextProvider

Genomic context provider for default GRR.

static context_builder() GenomicContext[source]
static register() None[source]

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

class dae.genomic_resources.genomic_context.GenomicContext[source]

Bases: ABC

Abstract base class for genomic context.

abstract get_context_keys() set[str][source]

Return set of all keys that could be found in the context.

abstract get_context_object(key: str) Any | None[source]

Return a genomic context object corresponding to the passed key.

If there is no such object, returns None.

get_gene_models() GeneModels | None[source]

Return gene models from context.

get_genomic_resources_repository() GenomicResourceRepo | None[source]

Return genomic resources repository from context.

get_reference_genome() ReferenceGenome | None[source]

Return reference genome from context.

abstract get_source() tuple[str, ...][source]

Return a tuple of strings that identifies the genomic context.

class dae.genomic_resources.genomic_context.GenomicContextProvider[source]

Bases: ABC

Abstract base class for genomic contexts provider.

abstract get_context_provider_priority() int[source]
abstract get_context_provider_type() str[source]
abstract get_contexts() Iterable[GenomicContext][source]
class dae.genomic_resources.genomic_context.PriorityGenomicContext(contexts: Iterable[GenomicContext])[source]

Bases: GenomicContext

Defines a priority genomic context.

get_context_keys() set[str][source]

Return set of all keys that could be found in the context.

get_context_object(key: str) Any | None[source]

Return a genomic context object corresponding to the passed key.

If there is no such object, returns None.

get_source() tuple[str, ...][source]

Return a tuple of strings that identifies the genomic context.
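A priority context can be read as an ordered list of contexts in which the first one that has an object for a key wins. This sketch assumes exactly that, plus a union of keys and a concatenation of sources; `DictContext`, `PriorityContext`, and the key names used with them are hypothetical.

```python
from typing import Any, Iterable, Optional


class DictContext:  # hypothetical stand-in for SimpleGenomicContext
    def __init__(self, objects: dict[str, Any], source: tuple[str, ...]):
        self.objects = objects
        self.source = source

    def get_context_keys(self) -> set[str]:
        return set(self.objects)

    def get_context_object(self, key: str) -> Optional[Any]:
        return self.objects.get(key)

    def get_source(self) -> tuple[str, ...]:
        return self.source


class PriorityContext:
    def __init__(self, contexts: Iterable[DictContext]):
        self.contexts = list(contexts)  # highest priority first

    def get_context_keys(self) -> set[str]:
        keys: set[str] = set()
        for ctx in self.contexts:
            keys |= ctx.get_context_keys()
        return keys

    def get_context_object(self, key: str) -> Optional[Any]:
        for ctx in self.contexts:
            obj = ctx.get_context_object(key)
            if obj is not None:
                return obj  # first context that knows the key wins
        return None

    def get_source(self) -> tuple[str, ...]:
        return tuple(s for ctx in self.contexts for s in ctx.get_source())
```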

class dae.genomic_resources.genomic_context.SimpleGenomicContext(context_objects: Dict[str, Any], source: tuple[str, ...])[source]

Bases: GenomicContext

Simple implementation of genomic context.

get_all_context_objects() Dict[str, Any][source]
get_context_keys() set[str][source]

Return set of all keys that could be found in the context.

get_context_object(key: str) Any | None[source]

Return a genomic context object corresponding to the passed key.

If there is no such object, returns None.

get_source() tuple[str, ...][source]

Return a tuple of strings that identifies the genomic context.

class dae.genomic_resources.genomic_context.SimpleGenomicContextProvider(context_builder: Callable[[], GenomicContext | None], provider_type: str, priority: int)[source]

Bases: GenomicContextProvider

Simple implementation of genomic contexts provider.

get_context_provider_priority() int[source]
get_context_provider_type() str[source]
get_contexts() Iterable[GenomicContext][source]
dae.genomic_resources.genomic_context.get_genomic_context() GenomicContext[source]
dae.genomic_resources.genomic_context.register_context(context: GenomicContext) None[source]
dae.genomic_resources.genomic_context.register_context_provider(context_provider: GenomicContextProvider) None[source]

Register genomic context provider.

dae.genomic_resources.genomic_scores module

class dae.genomic_resources.genomic_scores.AlleleScore(resource: GenomicResource)[source]

Bases: GenomicScore

Defines allele genomic scores.

fetch_scores(chrom: str, position: int, reference: str, alternative: str, scores: list[str] | None = None) list[Any] | None[source]

Fetch scores values for specific allele.

fetch_scores_agg(chrom: str, pos_begin: int, pos_end: int, scores: list[dae.genomic_resources.genomic_scores.AlleleScoreQuery] | None = None) list[dae.genomic_resources.aggregators.Aggregator][source]

Fetch score values in a region and aggregate them.

static get_schema() dict[str, Any][source]

Return schema to be used for config validation.

open() AlleleScore[source]

Open the genomic score resource and return it.

class dae.genomic_resources.genomic_scores.AlleleScoreAggr(score: 'str', position_aggregator: 'Aggregator', allele_aggregator: 'Aggregator')[source]

Bases: object

allele_aggregator: Aggregator
position_aggregator: Aggregator
score: str
class dae.genomic_resources.genomic_scores.AlleleScoreQuery(score: 'str', position_aggregator: 'Optional[str]' = None, allele_aggregator: 'Optional[str]' = None)[source]

Bases: object

allele_aggregator: str | None = None
position_aggregator: str | None = None
score: str
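Allele scores are aggregated on two levels, as the `allele_aggregator` and `position_aggregator` fields of `AlleleScoreQuery` suggest. A sketch assuming that alleles are aggregated first within each position and the per-position results are then aggregated across the region; plain callables such as `max` and `statistics.mean` stand in for `Aggregator` objects, and `aggregate_allele_scores` is hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Iterable, Optional

Aggregate = Callable[[list[float]], float]


def aggregate_allele_scores(
    records: Iterable[tuple[int, str, Optional[float]]],
    allele_aggregate: Aggregate,
    position_aggregate: Aggregate,
) -> Optional[float]:
    """records: (position, alt allele, score value) triples in a region."""
    by_position: dict[int, list[float]] = defaultdict(list)
    for pos, _alt, value in records:
        if value is not None:
            by_position[pos].append(value)
    # First level: combine alleles at each position.
    per_position = [allele_aggregate(v) for _, v in sorted(by_position.items())]
    # Second level: combine positions across the region.
    return position_aggregate(per_position) if per_position else None


# Example: max over alleles at each position, then mean over positions.
example = aggregate_allele_scores(
    [(10, "C", 1.0), (10, "G", 3.0), (11, "T", 2.0)], max, mean)
```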
class dae.genomic_resources.genomic_scores.GenomicScore(resource: GenomicResource)[source]

Bases: ResourceConfigValidationMixin

Genomic scores base class.

PositionScore, NPScore and AlleleScore inherit from this class. The statistics builder uses only the GenomicScore interface to build all defined statistics.

close() None[source]
fetch_region(chrom: str, pos_begin: int | None, pos_end: int | None, scores: Iterable[str]) Iterator[dict[str, Union[str, int, float, bool, NoneType]]][source]

Return score values in a region.

get_all_chromosomes() list[str][source]
get_all_scores() list[str][source]
get_config() dict[str, Any][source]
get_default_annotation_attribute(score_id: str) str | None[source]

Return default annotation attribute for a score.

Returns None if the score is not included in the default annotation. Otherwise, returns the name of the attribute if one is configured, or the score ID if not.
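The rule for `get_default_annotation_attribute` can be expressed as a short lookup. The layout of the default-annotation entries (`source` and optional `name` keys) and the helper `default_annotation_attribute` are assumptions made for illustration.

```python
from typing import Any, Optional


def default_annotation_attribute(
    score_id: str, default_annotation: list[dict[str, Any]]
) -> Optional[str]:
    for entry in default_annotation:
        if entry.get("source") == score_id:
            # Use the configured attribute name, falling back to the score ID.
            return entry.get("name") or score_id
    return None  # score is not part of the default annotation
```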

get_default_annotation_attributes() list[Any][source]

Collect default annotation attributes.

get_histogram_filename(score_id: str) str[source]
get_histogram_image_filename(score_id: str) str[source]
get_histogram_image_url(score_id: str) str | None[source]
get_number_range(score_id: str) tuple[float, float] | None[source]

Return the value range for a number score.

static get_schema() dict[str, Any][source]

Return schema to be used for config validation.

get_score_definition(score_id: str) _ScoreDef | None[source]
get_score_histogram(score_id: str) NullHistogram | CategoricalHistogram | NumberHistogram[source]

Return defined histogram for a score.

is_open() bool[source]
open() GenomicScore[source]

Open the genomic score resource and return it.

class dae.genomic_resources.genomic_scores.NPScore(resource: GenomicResource)[source]

Bases: GenomicScore

Defines nucleotide-position genomic score.

fetch_scores(chrom: str, position: int, reference: str, alternative: str, scores: list[str] | None = None) list[Any] | None[source]

Fetch score values at specified genomic position and nucleotide.

fetch_scores_agg(chrom: str, pos_begin: int, pos_end: int, scores: list[dae.genomic_resources.genomic_scores.NPScoreQuery] | None = None) list[dae.genomic_resources.aggregators.Aggregator][source]

Fetch score values in a region and aggregate them.

static get_schema() dict[str, Any][source]

Return schema to be used for config validation.

open() NPScore[source]

Open the genomic score resource and return it.

class dae.genomic_resources.genomic_scores.NPScoreAggr(score: 'str', position_aggregator: 'Aggregator', nucleotide_aggregator: 'Aggregator')[source]

Bases: object

nucleotide_aggregator: Aggregator
position_aggregator: Aggregator
score: str
class dae.genomic_resources.genomic_scores.NPScoreQuery(score: 'str', position_aggregator: 'Optional[str]' = None, nucleotide_aggregator: 'Optional[str]' = None)[source]

Bases: object

nucleotide_aggregator: str | None = None
position_aggregator: str | None = None
score: str
class dae.genomic_resources.genomic_scores.PositionScore(resource: GenomicResource)[source]

Bases: GenomicScore

Defines position genomic score.

fetch_scores(chrom: str, position: int, scores: list[str] | None = None) list[Any] | None[source]

Fetch score values at specific genomic position.

fetch_scores_agg(chrom: str, pos_begin: int, pos_end: int, scores: list[dae.genomic_resources.genomic_scores.PositionScoreQuery] | None = None) list[dae.genomic_resources.aggregators.Aggregator][source]

Fetch score values in a region and aggregate them.

Case 1: res.fetch_scores_agg("1", 10, 20) returns all scores with their default aggregators.

Case 2: res.fetch_scores_agg("1", 10, 20, non_default_aggregators={"bla": "max"}) returns all scores with their default aggregators, except that the score "bla" is aggregated with "max".
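Both cases amount to choosing, per score, either its default aggregator or an override. A sketch using plain dicts, where `overrides` plays the role that the docstring gives `non_default_aggregators`; the aggregator names, `aggregate_region`, and the `AGGREGATORS` table are hypothetical.

```python
from statistics import mean
from typing import Callable, Optional

# Hypothetical name-to-function table standing in for the aggregator registry.
AGGREGATORS: dict[str, Callable[[list[float]], float]] = {
    "mean": mean, "max": max, "min": min,
}


def aggregate_region(
    region_values: dict[str, list[Optional[float]]],
    default_aggregators: dict[str, str],
    overrides: Optional[dict[str, str]] = None,
) -> dict[str, Optional[float]]:
    overrides = overrides or {}
    result: dict[str, Optional[float]] = {}
    for score_id, values in region_values.items():
        # An override wins; otherwise use the score's default aggregator.
        aggregator_name = overrides.get(score_id, default_aggregators[score_id])
        present = [v for v in values if v is not None]
        result[score_id] = AGGREGATORS[aggregator_name](present) if present else None
    return result
```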

static get_schema() dict[str, Any][source]

Return schema to be used for config validation.

open() PositionScore[source]

Open the genomic score resource and return it.

class dae.genomic_resources.genomic_scores.PositionScoreAggr(score: 'str', position_aggregator: 'Aggregator')[source]

Bases: object

position_aggregator: Aggregator
score: str
class dae.genomic_resources.genomic_scores.PositionScoreQuery(score: 'str', position_aggregator: 'Optional[str]' = None)[source]

Bases: object

position_aggregator: str | None = None
score: str
class dae.genomic_resources.genomic_scores.ScoreDef(score_id: str, desc: str, value_type: str, pos_aggregator: str | None, nuc_aggregator: str | None, allele_aggregator: str | None, small_values_desc: str | None, large_values_desc: str | None, hist_conf: NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None)[source]

Bases: object

Score configuration definition.

allele_aggregator: str | None
desc: str
hist_conf: NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None
large_values_desc: str | None
nuc_aggregator: str | None
pos_aggregator: str | None
score_id: str
small_values_desc: str | None
value_type: str
class dae.genomic_resources.genomic_scores.ScoreLine(line: LineBase, score_defs: dict[str, dae.genomic_resources.genomic_scores._ScoreDef])[source]

Bases: object

Abstraction for a genomic score line. Wraps the line adapter.

property alt: str | None
property chrom: str
get_available_scores() tuple[Any, ...][source]
get_score(score_id: str) Any | None[source]

Get and parse configured score from line.

property pos_begin: int
property pos_end: int
property ref: str | None
dae.genomic_resources.genomic_scores.build_score_from_resource(resource: GenomicResource) GenomicScore[source]

Build the genomic score corresponding to a genomic resource and return it.

dae.genomic_resources.group_repository module

Provides group genomic resources repository.

class dae.genomic_resources.group_repository.GenomicResourceGroupRepo(children: list[dae.genomic_resources.repository.GenomicResourceRepo], repo_id: str | None = None)[source]

Bases: GenomicResourceRepo

Defines group genomic resources repository.

find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None[source]

Return one resource with an id equal to resource_id.

If resource is not found, None is returned.

get_all_resources() Generator[GenomicResource, None, None][source]

Return a generator over all resources in the repository.

get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource[source]

Return one resource with an id equal to resource_id.

If resource is not found, exception is raised.

invalidate() None[source]

Invalidate internal state of the repository.
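A group repository can delegate lookups to its children in order. The sketch assumes that the first child that knows a resource wins and ignores `version_constraint` and `repository_id`; `DictRepo`, `GroupRepo`, and the `ValueError` raised by `get_resource` are hypothetical.

```python
from typing import Any, Iterable, Optional


class DictRepo:  # hypothetical child repository backed by a dict
    def __init__(self, resources: dict[str, Any]):
        self.resources = resources

    def find_resource(self, resource_id: str) -> Optional[Any]:
        return self.resources.get(resource_id)


class GroupRepo:
    def __init__(self, children: Iterable[DictRepo]):
        self.children = list(children)  # searched in this order

    def find_resource(self, resource_id: str) -> Optional[Any]:
        for child in self.children:
            resource = child.find_resource(resource_id)
            if resource is not None:
                return resource
        return None

    def get_resource(self, resource_id: str) -> Any:
        resource = self.find_resource(resource_id)
        if resource is None:
            raise ValueError(f"resource not found: {resource_id}")
        return resource
```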

dae.genomic_resources.histogram module

Handling of genomic scores statistics.

Currently only genomic score histograms are supported.

class dae.genomic_resources.histogram.CategoricalHistogram(config: CategoricalHistogramConfig, values: dict[str, int] | None = None)[source]

Bases: Statistic

Class for categorical data histograms.

VALUES_LIMIT = 100
add_value(value: str | None) None[source]

Add a value to the categorical histogram.

Returns true if successfully added and false if failed. Will fail if too many values are accumulated.

property bars: dict[str, int]

Return categorical histogram bars in order.

static deserialize(content: str) CategoricalHistogram[source]

Create a statistic from serialized data.

static from_dict(data: dict[str, Any]) CategoricalHistogram[source]
merge(other: Statistic) None[source]

Merge with other histogram.

plot(outfile: IO, score_id: str) None[source]

Plot histogram and save it into outfile.

serialize() str[source]

Return a serialized version of this statistic.

to_dict() dict[str, Any][source]
type = 'categorical_histogram'
values_domain() str[source]
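`VALUES_LIMIT` caps the number of distinct categories. The docstring of `add_value` mentions a true/false result while the signature returns `None`, so how failure is signalled is unclear; this hypothetical sketch raises an exception, analogous to `HistogramError`, when a new category would exceed the limit. `SimpleCategoricalHistogram` and `TooManyValuesError` are illustrative names.

```python
from collections import Counter
from typing import Optional


class TooManyValuesError(Exception):
    """Hypothetical stand-in for HistogramError."""


class SimpleCategoricalHistogram:
    VALUES_LIMIT = 100

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def add_value(self, value: Optional[str]) -> None:
        if value is None:
            return  # assumption: missing values are skipped
        if value not in self.counts and len(self.counts) >= self.VALUES_LIMIT:
            raise TooManyValuesError(
                f"more than {self.VALUES_LIMIT} distinct values")
        self.counts[value] += 1

    def merge(self, other: "SimpleCategoricalHistogram") -> None:
        merged = self.counts + other.counts  # adds counts per category
        if len(merged) > self.VALUES_LIMIT:
            raise TooManyValuesError(
                f"more than {self.VALUES_LIMIT} distinct values")
        self.counts = merged
```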
class dae.genomic_resources.histogram.CategoricalHistogramConfig(value_order: list[str] | None = None, y_log_scale: bool = False)[source]

Bases: object

Configuration class for categorical histograms.

static default_config() CategoricalHistogramConfig[source]
static from_dict(parsed: dict[str, Any]) CategoricalHistogramConfig[source]

Create a categorical histogram config from a configuration dict.

to_dict() dict[str, Any][source]
value_order: list[str] | None = None
y_log_scale: bool = False
exception dae.genomic_resources.histogram.HistogramError[source]

Bases: BaseException

Class used for histogram specific errors.

Histograms should be nullified when a HistogramError occurs.

class dae.genomic_resources.histogram.HistogramStatisticMixin[source]

Bases: object

Mixin for creating statistics classes with histograms.

static get_histogram_file(score_id: str) str[source]
static get_histogram_image_file(score_id: str) str[source]
class dae.genomic_resources.histogram.NullHistogram(config: NullHistogramConfig | None)[source]

Bases: Statistic

Class for annulled histograms.

add_value(value: Any) None[source]

Add a value to the statistic.

static deserialize(content: str) NullHistogram[source]

Create a statistic from serialized data.

static from_dict(data: dict[str, Any]) NullHistogram[source]

Build a null histogram from a dict.

merge(other: Any) None[source]

Merge the values from another statistic in place.

plot(outfile: IO, score_id: str) None[source]
serialize() str[source]

Return a serialized version of this statistic.

to_dict() dict[str, Any][source]
type = 'null_histogram'
values_domain() str[source]
class dae.genomic_resources.histogram.NullHistogramConfig(reason: str)[source]

Bases: object

Configuration class for null histograms.

static default_config() NullHistogramConfig[source]
static from_dict(parsed: dict[str, Any]) NullHistogramConfig[source]

Create a null histogram config from a configuration dict.

reason: str
to_dict() dict[str, Any][source]
class dae.genomic_resources.histogram.NumberHistogram(config: NumberHistogramConfig, bins: ndarray | None = None, bars: ndarray | None = None)[source]

Bases: Statistic

Class to represent a histogram.

add_value(value: float | None) None[source]

Add value to the histogram.

choose_bin_lin(value: float) int[source]

Compute bin index for a passed value for linear x-scale.

choose_bin_log(value: float) int[source]

Compute bin index for a passed value for log x-scale.
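The two bin-selection methods can be sketched as plain functions. Linear binning maps a value to one of number_of_bins equal-width bins over the view range. For the log scale, this sketch assumes that bin 0 collects every value below x_min_log and that the remaining bins are equal-width in log10 space between x_min_log and view_max; the library's actual edge handling may differ. The parameter names mirror NumberHistogramConfig fields, but the functions themselves are hypothetical.

```python
import math


def choose_bin_lin(value, view_min, view_max, number_of_bins):
    """Index of the equal-width bin that contains value."""
    if value < view_min or value > view_max:
        raise ValueError("value outside of the view range")
    fraction = (value - view_min) / (view_max - view_min)
    # value == view_max gives fraction 1.0; clamp it into the last bin.
    return min(int(fraction * number_of_bins), number_of_bins - 1)


def choose_bin_log(value, view_max, number_of_bins, x_min_log):
    """Index of a bin when bins are equal-width in log10 space.

    Assumption: bin 0 holds everything below x_min_log (x_min_log > 0),
    and bins 1..number_of_bins-1 are log-spaced up to view_max.
    """
    if value < x_min_log:
        return 0
    if value >= view_max:
        return number_of_bins - 1
    log_bins = number_of_bins - 1
    span = math.log10(view_max) - math.log10(x_min_log)
    position = (math.log10(value) - math.log10(x_min_log)) / span
    return 1 + min(int(position * log_bins), log_bins - 1)
```

Computing the fraction of the range first, instead of dividing by a precomputed bin width, avoids off-by-one bins from floating-point rounding (for example, 0.3 // 0.1 evaluates to 2.0).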

static deserialize(content: str) NumberHistogram[source]

Create a statistic from serialized data.

static from_dict(data: dict[str, Any]) NumberHistogram[source]

Build a number histogram from a dict.

merge(other: Statistic) None[source]

Merge two histograms.
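Merging two number histograms amounts to adding their bar counts bin by bin, which is only meaningful when both were built with the same bin edges. A minimal sketch, assuming plain lists instead of NumPy arrays and a ValueError on mismatch (the library may compare whole configs or raise its own HistogramError instead); ToyNumberHistogram is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNumberHistogram:
    bins: list                                 # edges: len(bins) == len(bars) + 1
    bars: list = field(default_factory=list)   # count per bin

    def merge(self, other):
        """Add the other histogram's counts into this one, in place."""
        if self.bins != other.bins:
            raise ValueError("cannot merge histograms with different bins")
        self.bars = [a + b for a, b in zip(self.bars, other.bars)]
```

Because merging only sums counts, histograms computed in parallel over separate genomic regions can be combined into one result afterwards.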

plot(outfile: IO, score_id: str) None[source]

Plot histogram and save it into outfile.

serialize() str[source]

Return a serialized version of this statistic.

to_dict() dict[str, Any][source]
type = 'number_histogram'
values_domain() str[source]
view_max() float[source]
view_min() float[source]
property view_range: tuple[Optional[float], Optional[float]]
class dae.genomic_resources.histogram.NumberHistogramConfig(view_range: tuple[Optional[float], Optional[float]], number_of_bins: int = 30, x_log_scale: bool = False, y_log_scale: bool = False, x_min_log: float | None = None)[source]

Bases: object

Configuration class for number histograms.

static default_config(min_max: MinMaxValue | None) NumberHistogramConfig[source]

Build a default number histogram config from the passed min_max value.

static from_dict(parsed: dict[str, Any]) NumberHistogramConfig[source]

Build a number histogram config from a parsed yaml file.

has_view_range() bool[source]
number_of_bins: int = 30
to_dict() dict[str, Any][source]
view_range: tuple[Optional[float], Optional[float]]
x_log_scale: bool = False
x_min_log: float | None = None
y_log_scale: bool = False
dae.genomic_resources.histogram.build_default_histogram_conf(value_type: str, **kwargs: Any) NumberHistogramConfig | CategoricalHistogramConfig | NullHistogramConfig[source]

Build default histogram config for given value type.

dae.genomic_resources.histogram.build_empty_histogram(config: NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig) NumberHistogram | CategoricalHistogram | NullHistogram[source]

Create an empty histogram from a histogram config.

dae.genomic_resources.histogram.build_histogram_config(config: dict[str, Any] | None) NullHistogramConfig | CategoricalHistogramConfig | NumberHistogramConfig | None[source]

Create a histogram config from a configuration dict.

dae.genomic_resources.histogram.load_histogram(resource: GenomicResource, filename: str) NullHistogram | CategoricalHistogram | NumberHistogram[source]

Load and return a histogram from a resource.

If an error occurs or the histogram is missing, an appropriate NullHistogram is returned.

dae.genomic_resources.liftover_resource module

dae.genomic_resources.reference_genome module

class dae.genomic_resources.reference_genome.ReferenceGenome(resource: GenomicResource)[source]

Bases: ResourceConfigValidationMixin

Provides an interface for querying a reference genome.

property chrom_prefix: str

Return a prefix of all chromosomes of the reference genome.

property chromosomes: list[str]

Return a list of all chromosomes of the reference genome.

close() None[source]

Close reference genome sequence file-like objects.

fetch(chrom: str, start: int, stop: int | None, buffer_size: int = 512) Generator[str, None, None][source]

Yield the nucleotides in a specific region.

Line feeds in the sequence file add extra characters that have to be read. Because a fetch does not necessarily start at the beginning of a line, the number of line feeds to read can only be estimated; the output, however, is always limited to the number of nucleotides expected for the region.
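The line-feed issue can be illustrated with the FASTA index layout used by samtools faidx, where each record has a byte offset, a number of bases per line, and a line width in bytes including the line feed. This is a hedged sketch over an in-memory string, not ReferenceGenome's code; fasta_byte_offset and fetch_region are hypothetical names, and 1-based inclusive coordinates are assumed.

```python
def fasta_byte_offset(seq_offset, line_bases, line_width, pos):
    """Byte offset of 0-based position pos inside a FASTA record.

    seq_offset: offset of the record's first base
    line_bases: bases per full line
    line_width: bytes per full line, including the line feed
    """
    return seq_offset + (pos // line_bases) * line_width + pos % line_bases


def fetch_region(data, seq_offset, line_bases, line_width, start, stop):
    """Return bases start..stop (1-based, inclusive) from FASTA text."""
    begin = fasta_byte_offset(seq_offset, line_bases, line_width, start - 1)
    expected = stop - start + 1
    # Upper bound on the line feeds crossed; depending on the starting
    # column this may overshoot, so the result is trimmed afterwards.
    n_feeds = expected // line_bases + 1
    chunk = data[begin:begin + expected + n_feeds * (line_width - line_bases)]
    return "".join(chunk.split())[:expected]


fasta = ">chr1\nACGTA\nCCGGT\nTTA\n"
print(fetch_region(fasta, 6, 5, 6, 4, 8))
```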

property files: list[str]
get_all_chrom_lengths() list[tuple[str, int]][source]

Return list of all chromosomes lengths.

get_chrom_length(chrom: str) int[source]

Return the length of a specified chromosome.

static get_schema() dict[str, Any][source]

Return schema to be used for config validation.

get_sequence(chrom: str, start: int, stop: int) str[source]

Return sequence of nucleotides from specified chromosome region.

is_open() bool[source]
is_pseudoautosomal(chrom: str, pos: int) bool[source]

Return True if the specified position lies in a pseudoautosomal region.
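A pseudoautosomal check reduces to an interval lookup. The sketch below hard-codes the GRCh38 pseudoautosomal regions (PAR1 and PAR2 on chrX and chrY, 1-based inclusive); how the library actually obtains PAR coordinates for a given genome is not documented here, so the table and the function are assumptions for illustration.

```python
# GRCh38 pseudoautosomal regions, 1-based inclusive coordinates.
GRCH38_PARS = {
    "chrX": [(10_001, 2_781_479), (155_701_383, 156_030_895)],
    "chrY": [(10_001, 2_781_479), (56_887_903, 57_217_415)],
}


def is_pseudoautosomal(chrom, pos, pars=GRCH38_PARS):
    """Return True if chrom:pos falls inside any listed PAR interval."""
    return any(start <= pos <= stop for start, stop in pars.get(chrom, []))
```

Passing the table as a parameter keeps the check usable for other assemblies, whose PAR coordinates differ.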

open() ReferenceGenome[source]

Open reference genome resources.

property resource_id: str
split_into_regions(region_size: int, chromosome: str | None = None) Generator[Region, None, None][source]

Split the reference genome into regions and yield them.

A chromosome can be specified to limit the regions to that chromosome only.
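Splitting a genome into fixed-size regions can be sketched from the (chromosome, length) pairs that get_all_chrom_lengths() returns. The Region tuple here is a hypothetical stand-in for the library's Region class, and 1-based inclusive coordinates are an assumption.

```python
from typing import Iterator, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (assumption)
    stop: int   # 1-based, inclusive (assumption)


def split_into_regions(chrom_lengths, region_size, chromosome=None) -> Iterator[Region]:
    """Yield consecutive regions of at most region_size bases."""
    for chrom, length in chrom_lengths:
        if chromosome is not None and chrom != chromosome:
            continue
        for start in range(1, length + 1, region_size):
            # The last region of a chromosome is clipped to its length.
            yield Region(chrom, start, min(start + region_size - 1, length))
```

Fixed-size regions like these are a natural unit for distributing per-region work (for example, histogram computation) across workers.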

dae.genomic_resources.reference_genome.build_reference_genome_from_file(filename: str) ReferenceGenome[source]

Open a reference genome from a file.

dae.genomic_resources.reference_genome.build_reference_genome_from_resource(resource: GenomicResource) ReferenceGenome[source]

Open a reference genome from resource.

dae.genomic_resources.repository module

Provides basic classes for genomic resources and repositories.

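        +---------------------+                    +-----------------+
    +---| GenomicResourceRepo |--------------------| GenomicResource |
    |   +---------------------+                    +-----------------+
    |              ^                                        ^
    |              |                                        |
    |              |                                        |
    |   +-----------------------------+       +----------------------------+
    |   | GenomicResourceProtocolRepo |-------| ReadOnlyRepositoryProtocol |
    |   +-----------------------------+       +----------------------------+
    |                                                       ^
    |                                                       |
    |   +--------------------------+          +-----------------------------+
    +---| GenomicResourceGroupRepo |          | ReadWriteRepositoryProtocol |
        +--------------------------+          +-----------------------------+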

class dae.genomic_resources.repository.GenomicResource(resource_id: str, version: tuple[int, ...], protocol: ReadOnlyRepositoryProtocol | ReadWriteRepositoryProtocol, config: dict[str, Any] | None = None, manifest: Manifest | None = None)[source]

Bases: object

Base class for genomic resources.

file_exists(filename: str) bool[source]

Check if filename exists in this resource.

get_config() dict[str, Any][source]

Return the resource configuration.

get_description() str[source]

Return resource description.

get_file_content(filename: str, *, uncompress: bool = True, mode: str = 't') Any[source]

Return the content of file in a resource.

get_genomic_resource_id_version() str[source]

Return a string combining resource ID and version.

Returns a string of the form aa/bb/cc[3.2] for a genomic resource with id aa/bb/cc and version 3.2. If the version is 0 the string will be aa/bb/cc.

get_id() str[source]

Return genomic resource ID.

get_labels() dict[str, Any][source]

Return resource labels.

get_manifest() Manifest[source]

Load the resource manifest if it exists; otherwise, build it.

get_summary() str | None[source]

Return resource summary.

get_type() str[source]

Return resource type as defined in ‘genomic_resource.yaml’.

get_url() str[source]
get_version_str() str[source]

Return version string of the form ‘3.1’.

invalidate() None[source]

Clean up cached attributes like manifest, etc.

open_raw_file(filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO[source]

Open a file in the resource and return a file-like object.

open_tabix_file(filename: str, index_filename: str | None = None) TabixFile[source]

Open a tabix file and return a pysam.TabixFile.

open_vcf_file(filename: str, index_filename: str | None = None) VariantFile[source]

Open a VCF file and return a pysam.VariantFile.

class dae.genomic_resources.repository.GenomicResourceProtocolRepo(proto: ReadOnlyRepositoryProtocol | ReadWriteRepositoryProtocol)[source]

Bases: GenomicResourceRepo

Base class for real genomic resources repositories.

find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None[source]

Return one resource with ID equal to resource_id.

If resource is not found, None is returned.

get_all_resources() Generator[GenomicResource, None, None][source]

Return a generator over all resources in the repository.

get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource[source]

Return one resource with ID equal to resource_id.

If the resource is not found, an exception is raised.

invalidate() None[source]

Invalidate internal state of the repository.

class dae.genomic_resources.repository.GenomicResourceRepo(repo_id: str)[source]

Bases: ABC

Base class for genomic resources repositories.

property definition: dict[str, Any] | None
abstract find_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource | None[source]

Return one resource with ID equal to resource_id.

If resource is not found, None is returned.

abstract get_all_resources() Generator[GenomicResource, None, None][source]

Return a generator over all resources in the repository.

abstract get_resource(resource_id: str, version_constraint: str | None = None, repository_id: str | None = None) GenomicResource[source]

Return one resource with ID equal to resource_id.

If the resource is not found, an exception is raised.

abstract invalidate() None[source]

Invalidate internal state of the repository.

property repo_id: str
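The find_resource / get_resource contract (return None versus raise) and the idea of a group repository over child repositories can be sketched with in-memory toy classes. ToyRepo and ToyGroupRepo are hypothetical; the "first child that has the resource wins" rule, the use of FileNotFoundError, and ignoring version_constraint are simplifying assumptions, not documented behavior of GenomicResourceGroupRepo.

```python
from typing import Optional


class ToyRepo:
    """Hypothetical in-memory repository keyed by resource ID."""

    def __init__(self, repo_id, resources):
        self.repo_id = repo_id
        self._resources = dict(resources)

    def find_resource(self, resource_id) -> Optional[str]:
        # Lookup that signals "not found" with None.
        return self._resources.get(resource_id)

    def get_resource(self, resource_id) -> str:
        # Same lookup, but a missing resource is an error.
        resource = self.find_resource(resource_id)
        if resource is None:
            raise FileNotFoundError(f"{resource_id} not found in {self.repo_id}")
        return resource


class ToyGroupRepo(ToyRepo):
    """Searches child repositories in order; the first match wins."""

    def __init__(self, repo_id, children):
        super().__init__(repo_id, {})
        self.children = children

    def find_resource(self, resource_id) -> Optional[str]:
        for child in self.children:
            resource = child.find_resource(resource_id)
            if resource is not None:
                return resource
        return None
```

Implementing get_resource on top of find_resource keeps the two methods consistent: subclasses only override the lookup.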
class dae.genomic_resources.repository.Manifest[source]

Bases: object

Provides genomic resource manifest object.

add(entry: ManifestEntry) None[source]

Add a manifest entry to the manifest.

static from_file_content(file_content: str) Manifest[source]

Produce a manifest from manifest file content.

static from_manifest_entries(manifest_entries: list[dict[str, Any]]) Manifest[source]

Produce a manifest from parsed manifest file content.

get_files() list[tuple[str, int]][source]
names() set[str][source]

Return set of all file names from the manifest.

to_manifest_entries() list[dict[str, Any]][source]

Transform manifest to list of dictionaries.

Helpful when storing the manifest.

update(entries: dict[str, dae.genomic_resources.repository.ManifestEntry]) None[source]
class dae.genomic_resources.repository.ManifestEntry(name: str, size: int, md5: str | None)[source]

Bases: object

Provides an entry into manifest object.

md5: str | None
name: str
size: int
class dae.genomic_resources.repository.ManifestUpdate(manifest: Manifest, entries_to_delete: set[str], entries_to_update: set[str])[source]

Bases: object

Provides a manifest update object.

entries_to_delete: set[str]
entries_to_update: set[str]
manifest: Manifest
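The ManifestUpdate fields suggest that check_update_manifest compares the stored manifest with the files currently in the resource. Below is one plausible comparison, keyed by file name and comparing size and MD5; Entry and diff_manifest are hypothetical, and the library may additionally use ResourceFileState timestamps to avoid recomputing checksums.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for ManifestEntry."""
    name: str
    size: int
    md5: Optional[str]


def diff_manifest(stored, current):
    """Return (entries_to_delete, entries_to_update) as sets of names.

    stored:  entries recorded in the existing manifest, keyed by name
    current: entries built from the files now in the resource, keyed by name
    """
    # Files listed in the manifest but no longer present.
    entries_to_delete = set(stored) - set(current)
    # Files that are new or whose size/checksum changed.
    entries_to_update = {
        name for name, entry in current.items()
        if name not in stored or stored[name] != entry
    }
    return entries_to_delete, entries_to_update
```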
class dae.genomic_resources.repository.Mode(value)[source]

Bases: Enum

Protocol mode.

READONLY = 1
READWRITE = 2
class dae.genomic_resources.repository.ReadOnlyRepositoryProtocol(proto_id: str)[source]

Bases: ABC

Defines read only genomic resources repository protocol.

CHUNK_SIZE = 32768
build_genomic_resource(resource_id: str, version: tuple[int, ...], config: dict | None = None, manifest: Manifest | None = None) GenomicResource[source]

Build a genomic resource based on this protocol.

compute_md5_sum(resource: GenomicResource, filename: str) str[source]

Compute an MD5 hash for a file in the resource.
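Given CHUNK_SIZE = 32768 on the protocol class, a natural reading is that files are hashed incrementally rather than loaded whole. A minimal sketch with hashlib over any binary stream; md5_of_stream is a hypothetical helper, not the library's method.

```python
import hashlib
import io

CHUNK_SIZE = 32768  # same value as ReadOnlyRepositoryProtocol.CHUNK_SIZE


def md5_of_stream(stream, chunk_size=CHUNK_SIZE):
    """Compute the hex MD5 digest of a binary stream, chunk by chunk."""
    md5 = hashlib.md5()
    while chunk := stream.read(chunk_size):
        md5.update(chunk)
    return md5.hexdigest()


print(md5_of_stream(io.BytesIO(b"ACGT\n")))
```

Reading fixed-size chunks keeps memory use constant, which matters for multi-gigabyte genome and score files.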

abstract file_exists(resource: GenomicResource, filename: str) bool[source]

Check if the given file exists in the given resource.

find_resource(resource_id: str, version_constraint: str | None = None) GenomicResource | None[source]

Return requested resource or None if not found.

abstract get_all_resources() Generator[GenomicResource, None, None][source]

Return generator for all resources in the repository.

get_file_content(resource: GenomicResource, filename: str, *, uncompress: bool = True, mode: str = 't') Any[source]

Return content of a file in given resource.

get_id() str[source]

Return the repository ID.

get_manifest(resource: GenomicResource) Manifest[source]

Load and return a resource manifest.

get_resource(resource_id: str, version_constraint: str | None = None) GenomicResource[source]

Return the requested resource or raise an exception if it is not found.

In case resource is not found a FileNotFoundError exception is raised.

abstract get_url() str[source]

Return the repository URL.

abstract invalidate() None[source]

Invalidate internal cache of repository protocol.

abstract load_manifest(resource: GenomicResource) Manifest[source]

Load resource manifest.

load_yaml(resource: GenomicResource, filename: str) Any[source]

Return parsed YAML file.

mode() Mode[source]

Return repository protocol mode - READONLY or READWRITE.

abstract open_raw_file(resource: GenomicResource, filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO[source]

Open a file in a resource and return a file-like object.

abstract open_tabix_file(resource: GenomicResource, filename: str, index_filename: str | None = None) TabixFile[source]

Open a tabix file in a resource and return a pysam tabix file.

Not all repositories support this method. Repositories that do not support this method raise an exception.

abstract open_vcf_file(resource: GenomicResource, filename: str, index_filename: str | None = None) VariantFile[source]

Open a VCF file in a resource and return a pysam VariantFile.

Not all repositories support this method. Repositories that do not support this method raise an exception.

class dae.genomic_resources.repository.ReadWriteRepositoryProtocol(proto_id: str)[source]

Bases: ReadOnlyRepositoryProtocol

Defines read write genomic resources repository protocol.

abstract build_content_file() list[dict[str, Any]][source]

Build the content of the repository (i.e., the ‘.CONTENTS’ file).

build_manifest(resource: GenomicResource, prebuild_entries: dict[str, dae.genomic_resources.repository.ManifestEntry] | None = None) Manifest[source]

Build full manifest for the resource.

build_resource_file_state(resource: GenomicResource, filename: str, **kwargs: str | float | int | None) ResourceFileState[source]

Build resource file state.

check_update_manifest(resource: GenomicResource, prebuild_entries: dict[str, dae.genomic_resources.repository.ManifestEntry] | None = None) ManifestUpdate[source]

Check if the resource manifest needs update.

abstract collect_all_resources() Generator[GenomicResource, None, None][source]

Return generator for all resources managed by this protocol.

abstract collect_resource_entries(resource: GenomicResource) Manifest[source]

Scan the resource and return a manifest with all files.

copy_resource(remote_resource: GenomicResource) GenomicResource[source]

Copy a remote resource into repository.

abstract copy_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None[source]

Copy a remote resource file into local repository.

abstract delete_resource_file(resource: GenomicResource, filename: str) None[source]

Delete a resource file and its internal state.

get_manifest(resource: GenomicResource) Manifest[source]

Load or build a resource manifest.

get_or_create_resource(resource_id: str, version: tuple[int, ...]) GenomicResource[source]

Return a resource with specified ID and version.

If the resource is not found, create an empty resource.

abstract get_resource_file_size(resource: GenomicResource, filename: str) int[source]

Return the size of a resource file.

abstract get_resource_file_timestamp(resource: GenomicResource, filename: str) float[source]

Return the timestamp (ISO formatted) of a resource file.

abstract load_resource_file_state(resource: GenomicResource, filename: str) ResourceFileState | None[source]

Load resource file state from internal GRR state.

If the specified resource file has no internal state, return None.

mode() Mode[source]

Return repository protocol mode - READONLY or READWRITE.

save_index(resource: GenomicResource, contents: str) None[source]

Save an index HTML file into the genomic resource’s directory.

save_manifest(resource: GenomicResource, manifest: Manifest) None[source]

Save manifest into genomic resource’s directory.

abstract save_resource_file_state(resource: GenomicResource, state: ResourceFileState) None[source]

Save resource file state into internal GRR state.

update_manifest(resource: GenomicResource, prebuild_entries: dict[str, dae.genomic_resources.repository.ManifestEntry] | None = None) Manifest[source]

Update or create full manifest for the resource.

update_resource(remote_resource: GenomicResource, files_to_copy: set[str] | None = None) GenomicResource[source]

Copy a remote resource into repository.

Allows copying of a subset of files from the resource via files_to_copy. If files_to_copy is None, copies all files.

abstract update_resource_file(remote_resource: GenomicResource, dest_resource: GenomicResource, filename: str) ResourceFileState | None[source]

Update a resource file in the repository, if needed.

class dae.genomic_resources.repository.ResourceFileState(filename: str, size: int, timestamp: float, md5: str)[source]

Bases: object

Defines resource file state saved into internal GRR state.

filename: str
md5: str
size: int
timestamp: float
dae.genomic_resources.repository.is_gr_id_token(token: str) bool[source]

Check if token can be used as a genomic resource ID.

A Genomic Resource Id Token is a string of one or more letters, digits, ‘.’, ‘_’, or ‘-’ characters. The function checks whether the parameter token is a Genomic Resource Id Token.
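The token rule translates directly into a regular expression. This sketch assumes "letters" means ASCII letters; a full resource ID such as aa/bb/cc presumably consists of several such tokens separated by ‘/’, so the slash itself is rejected here.

```python
import re

# One or more ASCII letters, digits, '.', '_' or '-' (hyphen last = literal).
_GR_ID_TOKEN = re.compile(r"[A-Za-z0-9._-]+")


def is_gr_id_token(token: str) -> bool:
    """Return True if the whole token matches the allowed characters."""
    return _GR_ID_TOKEN.fullmatch(token) is not None
```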

dae.genomic_resources.repository.is_version_constraint_satisfied(version_constraint: str | None, version: tuple[int, ...]) bool[source]

Check if a version matches a version constraint.
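The constraint syntax is not described in this reference, so the sketch below assumes a simple grammar: None accepts any version, "=X.Y" requires an exact match, and ">=X.Y" sets a minimum. Versions are compared as tuples of integers, lexicographically, so (1, 10) > (1, 9) as expected; note that (1,) and (1, 0) are not equal under this comparison. Both functions are hypothetical.

```python
def parse_version(text):
    """Turn '1.2.3' into (1, 2, 3)."""
    return tuple(int(part) for part in text.split("."))


def is_version_constraint_satisfied(version_constraint, version):
    """Check version (a tuple of ints) against an optional constraint.

    Assumed syntax: None (any version), "=X.Y" (exact) or ">=X.Y" (minimum).
    """
    if version_constraint is None:
        return True
    constraint = version_constraint.strip()
    if constraint.startswith(">="):
        return version >= parse_version(constraint[2:])
    if constraint.startswith("="):
        return version == parse_version(constraint[1:])
    raise ValueError(f"unsupported version constraint: {version_constraint!r}")
```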

dae.genomic_resources.repository.parse_gr_id_version_token(token: str) tuple[str, tuple[int, ...]][source]

Parse genomic resource ID with version.

A Genomic Resource Id Version Token is a Genomic Resource Id Token with an optional version appended. If present, the version suffix has the form “(3.3.2)”. The default version is (0,). Returns None if token is not a Genomic Resource Id Version Token; otherwise, returns a (token, version) tuple.

dae.genomic_resources.repository.parse_resource_id_version(resource_path: str) tuple[str, tuple[int, ...]][source]

Parse genomic resource id and version path into Id, Version tuple.

The version suffix is optional; if present, it has the form “(3.3.2)”, and if absent, the default version (0,) is used. Returns the tuple (None, None) if the path does not match the resource ID/version requirements; otherwise, returns the tuple (resource_id, version).
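Parsing the “(3.3.2)” suffix and producing it again can be sketched with one regular expression. This mirrors the format described above but is not the library's implementation: parse_id_version is a hypothetical name, and restricting the ID part to ‘/’-separated Genomic Resource Id Tokens is an assumption.

```python
import re

_TOKEN = r"[A-Za-z0-9._-]+"
_VERSIONED = re.compile(
    rf"(?P<id>{_TOKEN}(?:/{_TOKEN})*)(?:\((?P<version>\d+(?:\.\d+)*)\))?"
)


def parse_id_version(token):
    """Split 'aa/bb/cc(3.3.2)' into ('aa/bb/cc', (3, 3, 2)).

    Without a suffix the version defaults to (0,); a token that does not
    match gives (None, None).
    """
    match = _VERSIONED.fullmatch(token)
    if match is None:
        return None, None
    version = match.group("version")
    if version is None:
        return match.group("id"), (0,)
    return match.group("id"), tuple(int(v) for v in version.split("."))


def version_tuple_to_suffix(version):
    """Inverse direction: (0,) gives no suffix, (3, 2) gives '(3.2)'."""
    if version == (0,):
        return ""
    return "(" + ".".join(str(v) for v in version) + ")"
```

Treating (0,) as "no suffix" lets unversioned resource paths and explicitly versioned ones share one naming scheme.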

dae.genomic_resources.repository.version_string_to_suffix(version: str) str[source]

Transform version string into resource ID version suffix.

dae.genomic_resources.repository.version_tuple_to_string(version: tuple[int, ...]) str[source]
dae.genomic_resources.repository.version_tuple_to_suffix(version: tuple[int, ...]) str[source]

Transform version tuple into resource ID version suffix.

dae.genomic_resources.repository_factory module

Provides a factory for building genomic resources repositories.

dae.genomic_resources.repository_factory.build_genomic_resource_group_repository(repo_id: str, children: list[dae.genomic_resources.repository.GenomicResourceRepo]) GenomicResourceRepo[source]
dae.genomic_resources.repository_factory.build_genomic_resource_repository(definition: dict | None = None, file_name: str | None = None) GenomicResourceRepo[source]

Build a GRR using a definition dict or yaml file.

dae.genomic_resources.repository_factory.build_resource_implementation(res: GenomicResource) GenomicResourceImplementation[source]

Build a resource implementation from a resource.

dae.genomic_resources.repository_factory.get_default_grr_definition() dict[str, Any][source]

Return default genomic resources repository definition.

dae.genomic_resources.repository_factory.get_default_grr_definition_path() str | None[source]

Return a path to default genomic resources repository definition.

dae.genomic_resources.repository_factory.load_definition_file(filename: str) Any[source]

Load GRR definition from a YAML file.

dae.genomic_resources.testing module

Provides tools useful for testing.

dae.genomic_resources.testing.build_filesystem_test_protocol(root_path: Path, repair: bool = True) FsspecReadWriteProtocol[source]

Build and return a filesystem fsspec protocol for testing.

The root_path is expected to point to a directory structure with all the resources.

dae.genomic_resources.testing.build_filesystem_test_repository(root_path: Path) GenomicResourceProtocolRepo[source]

Build and return a filesystem fsspec repository for testing.

The root_path is expected to point to a directory structure with all the resources.

dae.genomic_resources.testing.build_filesystem_test_resource(root_path: Path) GenomicResource[source]
dae.genomic_resources.testing.build_http_test_protocol(root_path: Path, repair: bool = True) Generator[FsspecReadOnlyProtocol, None, None][source]

Run an HTTP range server and construct genomic resource protocol.

The HTTP range server serves the directory pointed to by root_path. This directory should be a valid filesystem genomic resource repository.

dae.genomic_resources.testing.build_inmemory_test_protocol(content: dict[str, Any]) FsspecReadWriteProtocol[source]

Build and return an embedded fsspec protocol for testing.

dae.genomic_resources.testing.build_inmemory_test_repository(content: dict[str, Any]) GenomicResourceProtocolRepo[source]

Create an embedded GRR repository using passed content.

dae.genomic_resources.testing.build_inmemory_test_resource(content: dict[str, Any]) GenomicResource[source]

Create a test resource based on content passed.

The passed content should be appropriate for a single resource. Example content:

    {
        "genomic_resource.yaml": textwrap.dedent('''
            type: position_score
            table:
              filename: data.txt
            scores:
              - id: aaaa
                type: float
                desc: ""
                name: sc
        '''),
        "data.txt": convert_to_tab_separated('''
            #chrom  start  end  sc
            1       10     12   1.1
            2       13     14   1.2
        ''')
    }

dae.genomic_resources.testing.build_s3_test_bucket(s3filesystem: S3FileSystem | None = None) str[source]

Create an S3 test bucket.

dae.genomic_resources.testing.build_s3_test_filesystem(endpoint_url: str | None = None) S3FileSystem[source]

Create an S3 fsspec filesystem connected to the S3 server.

dae.genomic_resources.testing.build_s3_test_protocol(root_path: Path) Generator[FsspecReadWriteProtocol, None, None][source]

Run an S3 moto server and construct fsspec genomic resource protocol.

The S3 moto server is populated with the resources from the filesystem GRR pointed to by root_path.

dae.genomic_resources.testing.convert_to_tab_separated(content: str) str[source]

Convert a string into tab separated file content.

Useful for testing purposes. To include a space within the file content, use ‘||’.
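The conversion can be sketched as "split each line on whitespace, join the fields with tabs, and turn ‘||’ into a literal space". Dropping blank lines and ending the output with a newline are assumptions of this stand-alone sketch, which is not the library's function.

```python
def convert_to_tab_separated(content: str) -> str:
    """Turn whitespace-aligned text into tab-separated lines.

    Runs of whitespace become one tab, '||' becomes a literal space,
    and blank lines are dropped (assumption).
    """
    lines = []
    for line in content.splitlines():
        fields = line.split()
        if not fields:
            continue
        lines.append("\t".join(item.replace("||", " ") for item in fields))
    return "\n".join(lines) + "\n"
```

Writing test tables with aligned spaces keeps them readable in source code, while the tool still receives real tab-separated data.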

dae.genomic_resources.testing.copy_proto_genomic_resources(dest_proto: FsspecReadWriteProtocol, src_proto: FsspecReadOnlyProtocol) None[source]
dae.genomic_resources.testing.http_process_test_server(path: Path) Generator[str, None, None][source]
dae.genomic_resources.testing.http_threaded_test_server(path: Path) Generator[str, None, None][source]

Run a range HTTP threaded server.

The HTTP range server serves the directory pointed to by path.

dae.genomic_resources.testing.proto_builder(scheme: str, content: dict) Generator[FsspecReadOnlyProtocol | FsspecReadWriteProtocol, None, None][source]

Build a test genomic resource protocol with specified content.

dae.genomic_resources.testing.resource_builder(scheme: str, content: dict) Generator[GenomicResource, None, None][source]
dae.genomic_resources.testing.s3_test_protocol() FsspecReadWriteProtocol[source]

Build an S3 fsspec testing protocol on top of existing S3 server.

dae.genomic_resources.testing.s3_test_server_endpoint() str[source]
dae.genomic_resources.testing.setup_dae_transmitted(root_path: Path, summary_content: str, toomany_content: str) tuple[str, str][source]

Set up a DAE transmitted variants file using passed content.

dae.genomic_resources.testing.setup_denovo(denovo_path: Path, content: str) Path[source]
dae.genomic_resources.testing.setup_directories(root_dir: Path, content: str | dict[str, Any]) None[source]

Set up directory and subdirectory structures using the content.
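A nested dict is a convenient way to describe a directory tree. The sketch below assumes that dict values become subdirectories, str values become text files, and a top-level str is written to root_dir itself; these rules are inferred from the signature, not confirmed by the library.

```python
import tempfile
from pathlib import Path


def setup_directories(root_dir, content):
    """Materialize nested dict content as directories and files."""
    root_dir = Path(root_dir)
    if isinstance(content, str):
        # A string is file content: write it to root_dir as a file.
        root_dir.parent.mkdir(parents=True, exist_ok=True)
        root_dir.write_text(content)
        return
    # A dict is a directory: create it and recurse into its children.
    root_dir.mkdir(parents=True, exist_ok=True)
    for name, child in content.items():
        setup_directories(root_dir / name, child)


with tempfile.TemporaryDirectory() as tmp:
    setup_directories(Path(tmp) / "grr", {
        "one": {"genomic_resource.yaml": "type: position_score\n"},
        "two": {"sub": {"data.txt": "1\t2\n"}},
    })
    created = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file()
    )
print(created)
```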

dae.genomic_resources.testing.setup_empty_gene_models(out_path: Path) GeneModels[source]

Set up empty gene models.

dae.genomic_resources.testing.setup_gene_models(out_path: Path, content: str, fileformat: str | None = None) GeneModels[source]

Set up gene models in refflat format using the passed content.

dae.genomic_resources.testing.setup_genome(out_path: Path, content: str) ReferenceGenome[source]

Set up reference genome using the content.

dae.genomic_resources.testing.setup_gzip(gzip_path: Path, gzip_content: str) Path[source]

Set up a gzipped TSV file.

dae.genomic_resources.testing.setup_pedigree(ped_path: Path, content: str) Path[source]
dae.genomic_resources.testing.setup_tabix(tabix_path: Path, tabix_content: str, **kwargs: bool | str | int) tuple[str, str][source]

Set up a tabix file.

dae.genomic_resources.testing.setup_vcf(out_path: Path, content: str, csi: bool = False) Path[source]

Set up a VCF file using the content.

Module contents

class dae.genomic_resources.GenomicResource(resource_id: str, version: tuple[int, ...], protocol: ReadOnlyRepositoryProtocol | ReadWriteRepositoryProtocol, config: dict[str, Any] | None = None, manifest: Manifest | None = None)[source]

Bases: object

Base class for genomic resources.

file_exists(filename: str) bool[source]

Check if filename exists in this resource.

get_config() dict[str, Any][source]

Return the resource configuration.

get_description() str[source]

Return resource description.

get_file_content(filename: str, *, uncompress: bool = True, mode: str = 't') Any[source]

Return the content of file in a resource.

get_genomic_resource_id_version() str[source]

Return a string combining resource ID and version.

Returns a string of the form aa/bb/cc[3.2] for a genomic resource with id aa/bb/cc and version 3.2. If the version is 0 the string will be aa/bb/cc.

get_id() str[source]

Return genomic resource ID.

get_labels() dict[str, Any][source]

Return resource labels.

get_manifest() Manifest[source]

Load the resource manifest if it exists; otherwise, build it.

get_summary() str | None[source]

Return resource summary.

get_type() str[source]

Return resource type as defined in ‘genomic_resource.yaml’.

get_url() str[source]
get_version_str() str[source]

Return version string of the form ‘3.1’.

invalidate() None[source]

Clean up cached attributes like manifest, etc.

open_raw_file(filename: str, mode: str = 'rt', **kwargs: str | bool | None) IO[source]

Open a file in the resource and return a file-like object.

open_tabix_file(filename: str, index_filename: str | None = None) TabixFile[source]

Open a tabix file and return a pysam.TabixFile.

open_vcf_file(filename: str, index_filename: str | None = None) VariantFile[source]

Open a VCF file and return a pysam.VariantFile.

dae.genomic_resources.build_genomic_resource_repository(definition: dict | None = None, file_name: str | None = None) GenomicResourceRepo[source]

Build a GRR using a definition dict or yaml file.

dae.genomic_resources.get_resource_implementation_builder(resource_type: str) Callable[[GenomicResource], GenomicResourceImplementation] | None[source]

Return an implementation builder for a certain resource type.

If the builder is not registered, the function searches the discovered implementations for a matching entry point. If one is found, it is loaded, registered, and returned.