pheno package

Pheno DB access

Example usage of PhenoDB

To access a phenotype database, import the factory object named pheno from DAE:

In [1]: from DAE import pheno

In [2]: pheno.get_pheno_db_names()
Out[2]: ['ssc', 'vip', 'spark', 'agre']

In [3]: phdb = pheno.get_pheno_db('agre')

The result of get_pheno_db is an instance of the PhenoDB class. This is the main class that provides access to the phenotype database.

To access the values of a given measure, use:

In [8]: df = phdb.get_measure_values_df('ADOS21.CSB9')
In [9]: df.head()
Out[9]:
  person_id  ADOS21.CSB9
0  AU011105          1.0
1  AU014005          2.0
2  AU015904          2.0
3  AU024704          2.0
4  AU025005          1.0

You can get a data frame with values for multiple measures by using:

In [12]: df = phdb.get_values_df(['ADOS21.CSB9', 'Raven1.B12'])
In [13]: df.head()
Out[13]:
  person_id  ADOS21.CSB9  Raven1.B12
0  AU011105          1.0         5.0
1  AU014005          2.0        -1.0
2  AU015904          2.0         NaN
3  AU024704          2.0         NaN
4  AU025005          1.0        -1.0

To access data for individuals in the database use:

In [10]: psdf = phdb.get_persons_df()
In [11]: psdf.head()
Out[11]:
    person_id family_id      role    gender             status
 0  AU2275201    AU2275  Role.dad  Gender.M  Status.unaffected
 1  AU2275202    AU2275  Role.mom  Gender.F  Status.unaffected
 2  AU2275301    AU2275  Role.prb  Gender.M    Status.affected
 3  AU2275302    AU2275  Role.sib  Gender.M    Status.affected
 4  AU0966201    AU0966  Role.dad  Gender.M  Status.unaffected

You can get individuals and their measure values as a single joined data frame by using get_persons_values_df:

In [17]: df = phdb.get_persons_values_df(['ADIR1.EHFMAN', 'Raven1.B12'])
In [18]: df.head()
Out[18]:
    person_id family_id      role    gender           status  \
2   AU2275301    AU2275  Role.prb  Gender.M  Status.affected
3   AU2275302    AU2275  Role.sib  Gender.M  Status.affected
6   AU0966301    AU0966  Role.prb  Gender.M  Status.affected
7   AU0966302    AU0966  Role.sib  Gender.M  Status.affected
10  AU0965301    AU0965  Role.prb  Gender.M  Status.affected

    ADIR1.EHFMAN  Raven1.B12
2            3.0         5.0
3            0.0         5.0
6            2.0        -1.0
7            2.0        -1.0
10           1.0        -1.0

dae.pheno.pheno_db module

class dae.pheno.pheno_db.Instrument(name)[source]

Bases: object

Instrument objects represent phenotype instruments.

Common fields are:

  • instrument_name

  • measures – dictionary of all measures in the instrument

class dae.pheno.pheno_db.Measure(name)[source]

Bases: object

Measure objects represent phenotype measures.

Common fields are:

  • instrument_name

  • measure_name

  • measure_id - formed as instrument_name.measure_name

  • measure_type - one of ‘continuous’, ‘ordinal’, ‘categorical’

  • description

  • min_value - for ‘continuous’ and ‘ordinal’ measures

  • max_value - for ‘continuous’ and ‘ordinal’ measures

  • value_domain - string that represents the values
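As a minimal sketch of the measure_id convention listed above, the helpers below join and split the two parts of an ID. Both function names are hypothetical and not part of DAE, and the sketch assumes that the instrument name itself contains no dot.

```python
def make_measure_id(instrument_name, measure_name):
    """Build a measure ID of the form instrument_name.measure_name."""
    return "{}.{}".format(instrument_name, measure_name)


def split_measure_id(measure_id):
    """Split a measure ID at the first dot into (instrument_name, measure_name)."""
    instrument_name, _, measure_name = measure_id.partition(".")
    if not instrument_name or not measure_name:
        raise ValueError("expected instrument_name.measure_name, got %r" % measure_id)
    return instrument_name, measure_name


print(make_measure_id("ADOS21", "CSB9"))
print(split_measure_id("Raven1.B12"))
```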

class dae.pheno.pheno_db.PhenoDB(dbfile)[source]

Bases: object

Main class for accessing a phenotype database in DAE.

To access the phenotype database create an instance of this class and call the method load().

Common fields of this class are:

  • families – list of all families in the database

  • persons – list of all individuals in the database

  • instruments – dictionary of all instruments

  • measures – dictionary of all measures

get_instrument_measures(instrument_name)[source]

Returns the measures of the given instrument.

get_instrument_values(instrument_id, person_ids=None, family_ids=None, role=None)[source]

Returns a dictionary with values for all measures in the given instrument (see get_values()).

get_instrument_values_df(instrument_id, person_ids=None, family_ids=None, role=None)[source]

Returns a data frame with values for all measures in the given instrument (see get_values_df()).

get_measure(measure_id)[source]

Returns a measure by measure_id.

get_measure_values(measure_id, person_ids=None, family_ids=None, roles=None, default_filter='apply')[source]

Returns a dictionary with values for the specified measure_id.

measure_id – ID of the measure whose values should be returned.

person_ids – list of person IDs to filter result. Only data for individuals with person_id in the list person_ids are returned.

family_ids – list of family IDs to filter result. Only data for individuals that are members of any of the specified family_ids are returned.

roles – list of roles of the individuals to select measure values for. If not specified, values for individuals in all roles are returned.

default_filter – one of (‘skip’, ‘apply’, ‘invert’). When the measure has a default_filter this argument specifies whether the filter should be applied or skipped or inverted.

The returned dictionary contains values of the measure for each individual. The person_id is used as key in the dictionary.
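The filtering arguments and the shape of the result can be illustrated with a small stand-in that does not touch DAE at all. The records, the function name and the plain-string roles below are hypothetical, and the sketch assumes that the filters combine with a logical AND and that a filter left as None is not applied.

```python
# Hypothetical in-memory records standing in for one measure in the database.
RECORDS = [
    {"person_id": "p1", "family_id": "f1", "role": "prb", "value": 1.0},
    {"person_id": "p2", "family_id": "f1", "role": "sib", "value": 2.0},
    {"person_id": "p3", "family_id": "f2", "role": "prb", "value": 3.0},
]


def measure_values(records, person_ids=None, family_ids=None, roles=None):
    """Return {person_id: value} for the records that pass every given filter."""
    result = {}
    for rec in records:
        if person_ids is not None and rec["person_id"] not in person_ids:
            continue
        if family_ids is not None and rec["family_id"] not in family_ids:
            continue
        if roles is not None and rec["role"] not in roles:
            continue
        result[rec["person_id"]] = rec["value"]
    return result


print(measure_values(RECORDS, family_ids=["f1"]))
print(measure_values(RECORDS, family_ids=["f1"], roles=["prb"]))
```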

get_measure_values_df(measure_id, person_ids=None, family_ids=None, roles=None, default_filter='apply')[source]

Returns a data frame with values for the specified measure_id.

measure_id – ID of the measure whose values should be returned.

person_ids – list of person IDs to filter result. Only data for individuals with person_id in the list person_ids are returned.

family_ids – list of family IDs to filter result. Only data for individuals that are members of any of the specified family_ids are returned.

roles – list of roles of the individuals to select measure values for. If not specified, values for individuals in all roles are returned.

default_filter – one of (‘skip’, ‘apply’, ‘invert’). When the measure has a default_filter this argument specifies whether the filter should be applied or skipped or inverted.

The returned data frame contains two columns: person_id, with the IDs of the individuals, and a column named after measure_id, with the values of the measure.
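The three default_filter modes can be pictured as follows. This is only a sketch of the described behaviour: how a measure's default filter is actually stored and evaluated is not covered here, so the filter is modelled as a plain Python predicate, and all names and values are hypothetical.

```python
def filter_values(values, predicate, default_filter="apply"):
    """Filter {person_id: value} according to the chosen default_filter mode."""
    if default_filter not in ("skip", "apply", "invert"):
        raise ValueError("default_filter must be 'skip', 'apply' or 'invert'")
    if default_filter == "skip":
        # Ignore the default filter and keep every value.
        return dict(values)
    # 'apply' keeps values that satisfy the filter, 'invert' keeps the rest.
    keep_matching = default_filter == "apply"
    return {
        pid: value
        for pid, value in values.items()
        if bool(predicate(value)) == keep_matching
    }


def is_non_negative(value):
    """Hypothetical default filter for a measure."""
    return value >= 0


VALUES = {"p1": 5.0, "p2": -1.0, "p3": 2.0}
for mode in ("skip", "apply", "invert"):
    print(mode, filter_values(VALUES, is_non_negative, mode))
```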

get_measures(instrument=None, measure_type=None)[source]

Returns a dictionary of measure objects.

instrument – name of the instrument whose measures should be returned. If not specified, measures from all instruments are returned.

measure_type – the type (‘continuous’, ‘ordinal’ or ‘categorical’) of the measures that should be returned. If not specified, measures of all types are returned.
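The effect of the two optional arguments can be sketched over a plain dictionary. MeasureStub, the sample measure IDs and select_measures are hypothetical stand-ins rather than DAE objects; the stub carries only the fields needed for the selection.

```python
from collections import namedtuple

# Hypothetical stand-in carrying a few of the Measure fields.
MeasureStub = namedtuple(
    "MeasureStub", ["measure_id", "instrument_name", "measure_type"]
)

MEASURES = {
    m.measure_id: m
    for m in [
        MeasureStub("i1.age", "i1", "continuous"),
        MeasureStub("i1.grade", "i1", "ordinal"),
        MeasureStub("i2.score", "i2", "continuous"),
    ]
}


def select_measures(measures, instrument=None, measure_type=None):
    """Return the measures that match every criterion that is not None."""
    return {
        measure_id: measure
        for measure_id, measure in measures.items()
        if (instrument is None or measure.instrument_name == instrument)
        and (measure_type is None or measure.measure_type == measure_type)
    }


print(sorted(select_measures(MEASURES, instrument="i1")))
print(sorted(select_measures(MEASURES, measure_type="continuous")))
```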

get_persons(roles=None, person_ids=None, family_ids=None)[source]

Returns data for individuals from the phenotype database.

roles – specifies the roles of the individuals to return. If not specified, all individuals from the phenotype database are returned.

person_ids – list of person IDs to filter result. Only data for individuals with person_id in the list person_ids are returned.

family_ids – list of family IDs to filter result. Only data for individuals that are members of any of the specified family_ids are returned.

Returns a dictionary that maps each person ID to a Person object, where the Person object is the same object used in VariantDB families.

get_persons_df(roles=None, person_ids=None, family_ids=None)[source]

Returns information about individuals from the phenotype database as a data frame.

roles – specifies the roles of the individuals to return. If not specified, all individuals from the phenotype database are returned.

person_ids – list of person IDs to filter result. Only data for individuals with person_id in the list person_ids are returned.

family_ids – list of family IDs to filter result. Only data for individuals that are members of any of the specified family_ids are returned.

Each row of the returned data frame represents a person from the phenotype database.

Columns returned are: person_id, family_id, role, sex.

get_persons_values_df(measure_ids, person_ids=None, family_ids=None, roles=None)[source]

Returns a data frame with values for all measures in measure_ids, joined with the data frame returned by get_persons_df.

get_values(measure_ids, person_ids=None, family_ids=None, roles=None)[source]

Returns a dictionary of dictionaries with values for all measure_ids.

The returned dictionary uses person_id as key. The value for each key is a dictionary of measurement values for each ID in measure_ids, keyed by measure_id.

measure_ids – list of measure IDs whose values should be returned.

person_ids – list of person IDs to filter result. Only data for individuals with person_id in the list person_ids are returned.

family_ids – list of family IDs to filter result. Only data for individuals that are members of any of the specified family_ids are returned.

roles – list of roles of the individuals to select measure values for. If not specified, values for individuals in all roles are returned.
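A common way to consume the nested dictionary described above is to keep only the individuals that have a value for every requested measure. The helper and sample data below are hypothetical; whether the library omits a missing measure from the inner dictionary or stores None for it is not stated here, so the sketch treats both cases as missing.

```python
def complete_cases(values, measure_ids):
    """Keep persons that have a non-None value for every ID in measure_ids."""
    return {
        person_id: by_measure
        for person_id, by_measure in values.items()
        if all(by_measure.get(measure_id) is not None for measure_id in measure_ids)
    }


# Hypothetical result shaped like {person_id: {measure_id: value}}.
NESTED = {
    "p1": {"i1.m1": 1.0, "i2.m1": 5.0},
    "p2": {"i1.m1": 2.0},
    "p3": {"i1.m1": 2.0, "i2.m1": None},
}

print(complete_cases(NESTED, ["i1.m1", "i2.m1"]))
```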

get_values_df(measure_ids, person_ids=None, family_ids=None, roles=None, default_filter='apply')[source]

Returns a data frame with values for the given list of measures.

Values are loaded using consecutive calls to the get_measure_values_df() method, one for each measure in measure_ids. The resulting data frames are then joined and returned.

measure_ids – list of measure IDs whose values should be returned.

person_ids – list of person IDs to filter result. Only data for individuals with person_id in the list person_ids are returned.

family_ids – list of family IDs to filter result. Only data for individuals that are members of any of the specified family_ids are returned.

roles – list of roles of the individuals to select measure values for. If not specified, values for individuals in all roles are returned.
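The join step described above can be sketched without pandas by treating each per-measure result as a {person_id: value} mapping. The kind of join the library performs is not stated here; this sketch assumes an outer join on person_id and fills missing values with None, where a data frame would show NaN. The function name and sample data are hypothetical.

```python
def join_columns(columns, measure_ids):
    """Outer-join per-measure {person_id: value} mappings into one row per person."""
    person_ids = set()
    for measure_id in measure_ids:
        person_ids.update(columns[measure_id])

    rows = []
    for person_id in sorted(person_ids):
        row = {"person_id": person_id}
        for measure_id in measure_ids:
            # A person without a value for this measure gets None.
            row[measure_id] = columns[measure_id].get(person_id)
        rows.append(row)
    return rows


# Hypothetical per-measure results, one mapping per measure ID.
COLUMNS = {
    "i1.m1": {"p1": 1.0, "p2": 2.0},
    "i2.m1": {"p1": 5.0, "p3": -1.0},
}

for row in join_columns(COLUMNS, ["i1.m1", "i2.m1"]):
    print(row)
```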

has_measure(measure_id)[source]

Checks whether measure_id is a valid ID for a measure in the phenotype DB.
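One use of has_measure is to validate a list of IDs before requesting values. The sketch below relies only on the has_measure(measure_id) call documented above; StubPhenoDB and partition_measure_ids are hypothetical names, and the stub exists only so that the example runs without a real database.

```python
class StubPhenoDB(object):
    """Hypothetical stand-in that exposes only has_measure()."""

    def __init__(self, measure_ids):
        self._measure_ids = set(measure_ids)

    def has_measure(self, measure_id):
        return measure_id in self._measure_ids


def partition_measure_ids(phdb, measure_ids):
    """Split measure_ids into (known, unknown) lists, preserving input order."""
    known = [mid for mid in measure_ids if phdb.has_measure(mid)]
    unknown = [mid for mid in measure_ids if not phdb.has_measure(mid)]
    return known, unknown


phdb = StubPhenoDB(["i1.m1", "i2.m1"])
known, unknown = partition_measure_ids(phdb, ["i1.m1", "i9.typo", "i2.m1"])
print(known)
print(unknown)
```

With a real PhenoDB instance in place of the stub, the known list could then be passed on to get_values_df.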

load()[source]

Loads basic families, instruments and measures data from the phenotype database.