Blood chemistry

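This tutorial is a brief guide to using the pyaging implementation of PhenoAge, introduced by Levine et al. (2018). The full citation is retrieved from the clock metadata at the end of this tutorial.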

We just need two packages for this tutorial.

[1]:
import pandas as pd
import pyaging as pya

Download and load example data

Let’s download some example human blood data.

[2]:
pya.data.download_example_data('blood_chemistry_example')
|-----> 🏗️ Starting download_example_data function
|-----------> Data found in pyaging_data/blood_chemistry_example.pkl
|-----> 🎉 Done! [0.0014s]
[3]:
df = pd.read_pickle('pyaging_data/blood_chemistry_example.pkl')
[4]:
df.head()
[4]:
          albumin  creatinine  glucose  log_crp  lymphocyte_percent  mean_cell_volume  red_cell_distribution_width  alkaline_phosphatase  white_blood_cell_count   age
patient1     51.8        87.2      4.5     -0.2                27.9              92.4                         13.9                 123.5                0.006037  70.2
patient2     53.1        57.3      6.1     -0.2                27.8              80.9                         12.0                  81.5                0.004135  76.5
patient3     37.4       114.7      5.6     -0.2                23.6              83.2                         12.4                 124.4                0.007382  66.4
patient4     45.9        88.1      5.4     -0.2                38.6              92.5                         11.4                 113.4                0.006537  46.5
patient5     40.7        45.4      4.7     -0.2                38.3              88.8                         13.5                 107.8                0.004695  42.3
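To run the clock on your own measurements, a DataFrame with the same layout as the example should be all that is needed: one row per sample and one column per feature, named exactly as above. The following is a minimal sketch under that assumption. PHENOAGE_COLUMNS and my_df are illustrative names, and the two rows are simply copied from the example table to stand in for your own samples.

```python
import pandas as pd

# Feature names copied from the example data. Assumption: the PhenoAge clock
# expects exactly these ten columns (the prediction log reports all of them present).
PHENOAGE_COLUMNS = [
    "albumin", "creatinine", "glucose", "log_crp", "lymphocyte_percent",
    "mean_cell_volume", "red_cell_distribution_width",
    "alkaline_phosphatase", "white_blood_cell_count", "age",
]

# Two rows copied from the example table, standing in for your own samples.
# Values should already be in the units and transformations the clock expects
# (e.g. log_crp is already log-transformed), since the clock applies no preprocessing.
my_df = pd.DataFrame(
    [
        [51.8, 87.2, 4.5, -0.2, 27.9, 92.4, 13.9, 123.5, 0.006037, 70.2],
        [53.1, 57.3, 6.1, -0.2, 27.8, 80.9, 12.0, 81.5, 0.004135, 76.5],
    ],
    index=["patient1", "patient2"],  # one row per sample
    columns=PHENOAGE_COLUMNS,        # one column per feature
)

# Fail early if a required feature is missing or misspelled.
missing = [col for col in PHENOAGE_COLUMNS if col not in my_df.columns]
if missing:
    raise ValueError(f"Missing features: {missing}")
```

The resulting my_df could then be passed to pya.preprocess.df_to_adata in place of df.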

Convert data to AnnData object

AnnData objects are highly flexible and are thus our preferred method of organizing data for age prediction.

[5]:
adata = pya.preprocess.df_to_adata(df)
|-----> 🏗️ Starting df_to_adata function
|-----> ⚙️ Create anndata object started
|-----> ✅ Create anndata object finished [0.0036s]
|-----> ⚙️ Add metadata to anndata started
|-----------? No metadata provided. Leaving adata.obs empty
|-----> ⚠️ Add metadata to anndata finished [0.0005s]
|-----> ⚙️ Log data statistics started
|-----------> There are 30 observations
|-----------> There are 10 features
|-----------> Total missing values: 0
|-----------> Percentage of missing values: 0.00%
|-----> ✅ Log data statistics finished [0.0010s]
|-----> ⚙️ Impute missing values started
|-----------> No missing values found. No imputation necessary
|-----> ✅ Impute missing values finished [0.0008s]
|-----> 🎉 Done! [0.0089s]
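The data statistics logged by df_to_adata can be reproduced with plain pandas, which is handy for checking your own data before conversion. This is a sketch, not pyaging's code: summarize_missing and the toy table are hypothetical, the per-feature column is only assumed to correspond to adata.var['percent_na'] (whether pyaging stores a fraction or a percentage is not shown here), and the imputation step is not reproduced.

```python
import numpy as np
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> dict:
    """Hypothetical helper: recompute observation, feature and missing-value counts."""
    n_missing = int(df.isna().sum().sum())
    return {
        "n_observations": df.shape[0],
        "n_features": df.shape[1],
        "total_missing": n_missing,
        # Share of missing cells over the whole table, as a percentage.
        "percent_missing": 100 * n_missing / df.size,
        # Per-feature percentage of missing values (assumed analogue of var['percent_na']).
        "percent_na_per_feature": df.isna().mean() * 100,
    }


# Toy table with one missing albumin value, for illustration only.
toy = pd.DataFrame(
    {"albumin": [51.8, np.nan, 37.4], "age": [70.2, 76.5, 66.4]},
    index=["patient1", "patient2", "patient3"],
)
stats = summarize_missing(toy)
print(stats["total_missing"], round(stats["percent_missing"], 2))
```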

Note that the original data are stored in the X_original layer (adata.layers['X_original']). This is what the adata object looks like:

[6]:
adata
[6]:
AnnData object with n_obs × n_vars = 30 × 10
    var: 'percent_na'
    layers: 'X_original'

Predict age

We can predict ages with either one clock at a time or several clocks at once. Since this tutorial has only one clock of interest, we pass just its name. Clock names are case-insensitive, so 'PhenoAge' and 'phenoage' are equivalent.
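Both calling styles appear in this tutorial (a single string in the next cell and a list of names later on), and the log reports the clock as phenoage in lowercase. A minimal sketch of what case-insensitive name handling could look like, using a hypothetical normalize_clock_names helper rather than pyaging's actual code:

```python
from typing import List, Union


def normalize_clock_names(clocks: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: accept one clock name or a list, ignoring case."""
    if isinstance(clocks, str):
        clocks = [clocks]  # a single name is treated as a one-element list
    # Lowercasing makes 'PhenoAge', 'phenoage' and 'PHENOAGE' equivalent.
    return [name.lower() for name in clocks]


print(normalize_clock_names("PhenoAge"))
print(normalize_clock_names(["PhenoAge"]))
```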

[7]:
pya.pred.predict_age(adata, 'PhenoAge')
|-----> 🏗️ Starting predict_age function
|-----> ⚙️ Set PyTorch device started
|-----------> Using device: cpu
|-----> ✅ Set PyTorch device finished [0.0008s]
|-----> 🕒 Processing clock: phenoage
|-----------> ⚙️ Load clock started
|-----------------> Data found in pyaging_data/phenoage.pt
|-----------> ✅ Load clock finished [0.0148s]
|-----------> ⚙️ Check features in adata started
|-----------------> All features are present in adata.var_names.
|-----------> ✅ Check features in adata finished [0.0006s]
|-----------> ⚙️ Predict ages with model started
|-----------------> There is no preprocessing necessary
|-----------------> The postprocessing method is mortality_to_phenoage
|-----------------> in progress: 100.0000%
|-----------> ✅ Predict ages with model finished [0.0345s]
|-----------> ⚙️ Add predicted ages and clock metadata to adata started
|-----------> ✅ Add predicted ages and clock metadata to adata finished [0.0006s]
|-----> 🎉 Done! [0.1068s]
[8]:
adata.obs.head()
[8]:
           phenoage
patient1  70.643137
patient2  64.834061
patient3  70.258559
patient4  42.979385
patient5  41.677749
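Since the input also contains chronological age, a natural next step is to compare the two. The sketch below, assuming only pandas, subtracts chronological age from the predicted PhenoAge for the five patients shown, with both sets of values copied from the printed outputs; a positive difference means the predicted age exceeds the chronological age.

```python
import pandas as pd

patients = [f"patient{i}" for i in range(1, 6)]

# Chronological age, copied from the df.head() output.
chronological = pd.Series([70.2, 76.5, 66.4, 46.5, 42.3], index=patients)
# Predicted PhenoAge, copied from the adata.obs.head() output.
phenoage = pd.Series(
    [70.643137, 64.834061, 70.258559, 42.979385, 41.677749], index=patients
)

# Difference between predicted and chronological age, in years.
age_gap = (phenoage - chronological).round(3)
print(age_gap)
```

With the objects from this tutorial, the same difference could presumably be computed directly as adata.obs['phenoage'] - df['age'], provided adata.obs keeps the patient IDs from df as its index, as the output above suggests.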

Having so much information printed can be overwhelming, particularly when running several clocks at once. In such cases, just set verbose to False.

[9]:
pya.data.download_example_data('blood_chemistry_example', verbose=False)
df = pd.read_pickle('pyaging_data/blood_chemistry_example.pkl')
adata = pya.preprocess.df_to_adata(df, verbose=False)
pya.pred.predict_age(adata, ['PhenoAge'], verbose=False)
[10]:
adata.obs.head()
[10]:
           phenoage
patient1  70.643137
patient2  64.834061
patient3  70.258559
patient4  42.979385
patient5  41.677749

After age prediction, the predicted ages from each clock are added as a column of adata.obs. In addition, adata.uns stores, for each clock, the percentage of missing values, the list of missing features, and other metadata.
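The phenoage_percent_na and phenoage_missing_features entries suggest that each clock records which of its input features were absent. A hypothetical sketch of that bookkeeping follows; check_clock_features is an illustrative name, the feature list is assumed to be the ten example columns, and how pyaging fills in missing features before prediction is not covered here.

```python
from typing import List, Tuple


def check_clock_features(required: List[str], available: List[str]) -> Tuple[float, List[str]]:
    """Hypothetical helper: list the required features that are absent and
    express their share of all required features as a percentage."""
    available_set = set(available)
    missing = [feature for feature in required if feature not in available_set]
    percent_na = 100 * len(missing) / len(required)
    return percent_na, missing


# Assumption: the clock's features are the ten columns of the example data.
required = [
    "albumin", "creatinine", "glucose", "log_crp", "lymphocyte_percent",
    "mean_cell_volume", "red_cell_distribution_width",
    "alkaline_phosphatase", "white_blood_cell_count", "age",
]

# Complete input: nothing is missing.
print(check_clock_features(required, required))
# Input without log_crp: one of the ten features is missing.
print(check_clock_features(required, [f for f in required if f != "log_crp"]))
```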

[11]:
adata
[11]:
AnnData object with n_obs × n_vars = 30 × 10
    obs: 'phenoage'
    var: 'percent_na'
    uns: 'phenoage_percent_na', 'phenoage_missing_features', 'phenoage_metadata'
    layers: 'X_original'

Get citation

The DOI, citation, and other metadata are automatically added to the AnnData object under adata.uns[CLOCKNAME_metadata], which for this clock is adata.uns['phenoage_metadata'].

[12]:
adata.uns['phenoage_metadata']
[12]:
{'clock_name': 'phenoage',
 'data_type': 'blood chemistry',
 'species': 'Homo sapiens',
 'year': 2018,
 'approved_by_author': '⌛',
 'citation': 'Levine, Morgan E., et al. "An epigenetic biomarker of aging for lifespan and healthspan." Aging (albany NY) 10.4 (2018): 573.',
 'doi': 'https://doi.org/10.18632%2Faging.101414',
 'notes': None,
 'research_only': None,
 'version': None}
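Because the metadata key is built from the clock name, retrieving a citation programmatically is straightforward. In this sketch, get_citation is a hypothetical helper, metadata_store stands in for adata.uns using values copied from the output above, and the lowercase key is inferred from 'phenoage_metadata'. The stored DOI URL encodes the slash as %2F; the standard-library function urllib.parse.unquote turns it back into a plain DOI link.

```python
from typing import Tuple
from urllib.parse import unquote

# Stand-in for adata.uns, with a subset of the metadata printed above.
metadata_store = {
    "phenoage_metadata": {
        "clock_name": "phenoage",
        "citation": 'Levine, Morgan E., et al. "An epigenetic biomarker of aging for '
                    'lifespan and healthspan." Aging (albany NY) 10.4 (2018): 573.',
        "doi": "https://doi.org/10.18632%2Faging.101414",
    }
}


def get_citation(uns: dict, clock_name: str) -> Tuple[str, str]:
    """Hypothetical helper: look up '<clock>_metadata' and return (citation, DOI URL)."""
    meta = uns[f"{clock_name.lower()}_metadata"]
    # unquote decodes percent-escapes such as %2F back into '/'.
    return meta["citation"], unquote(meta["doi"])


citation, doi = get_citation(metadata_store, "PhenoAge")
print(citation)
print(doi)  # https://doi.org/10.18632/aging.101414
```

With the tutorial's objects, get_citation(adata.uns, 'PhenoAge') should behave the same way, since adata.uns supports dictionary-style lookup.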