Predict

Please note that most functions on this page are internal helper functions and are not meant to be called directly.

pyaging.predict._pred

pyaging.predict._pred.predict_age(adata, clock_names='horvath2013', dir='pyaging_data', batch_size=1024, clean=True, verbose=True)

Predicts biological age using specified aging clocks.

This function takes an AnnData object and applies one or more aging clocks to predict the biological age of each sample. It handles the entire pipeline: model loading, data preprocessing, prediction, and postprocessing. The predicted ages and the relevant clock metadata are written back into the input AnnData object.

Parameters:

adata AnnData: An AnnData object with the data matrix in .X and the feature names in .var_names.
clock_names str or list of str (default: 'horvath2013'): Name of the aging clock to apply, or a list of clock names.
dir str (default: 'pyaging_data'): Retained for backward compatibility; files downloaded from Hugging Face are stored in the standard Hugging Face cache instead.
batch_size int (default: 1024): The batch size used for age inference.
clean bool (default: True): Whether to delete the matrix created for each clock in adata.obsm[X_clock].
verbose bool (default: True): Whether to log progress to the console.

Return type:

AnnData

Returns:

AnnData The input AnnData object enriched with the predicted ages and clock metadata in the .obs and .uns attributes, respectively.

Notes

The function handles both single and multiple clock predictions. The predicted ages are added to .obs as one column per clock, named after the clock, and the metadata of each clock is stored in .uns. Reduce batch_size if you run into memory constraints.

The .X matrix of the input AnnData object must contain the type of data each clock was built for (for example, DNA methylation values for horvath2013).

The function automatically handles the transfer of data and models to the appropriate compute device (CPU or GPU) based on system configuration.
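To make the bookkeeping above concrete, here is a minimal sketch in plain NumPy. It is not pyaging's implementation: dicts stand in for .obs and .uns, and `TOY_CLOCKS` and `predict_age_sketch` are hypothetical names for linear toy clocks. It shows how a single name or a list of names can be handled and where each clock's output lands, reusing the `<clock>_metadata` key seen in the add_pred_ages_and_clock_metadata_adata example.

```python
import numpy as np

# Hypothetical toy clocks: name -> (feature weights, intercept, metadata).
TOY_CLOCKS = {
    "clock_a": (np.array([1.0, 2.0, 0.0]), 10.0, {"data_type": "toy"}),
    "clock_b": (np.array([0.5, 0.5, 0.5]), 0.0, {"data_type": "toy"}),
}


def predict_age_sketch(x, clock_names, obs, uns):
    """Apply one or more toy clocks to x (samples x features), updating obs and uns in place."""
    # Accept a single clock name or a list of names.
    if isinstance(clock_names, str):
        clock_names = [clock_names]
    for name in clock_names:
        weights, intercept, metadata = TOY_CLOCKS[name]
        obs[name] = x @ weights + intercept        # one entry per clock, one value per sample
        uns[f"{name}_metadata"] = dict(metadata)   # metadata stored under "<clock>_metadata"


x = np.array([[1.0, 1.0, 1.0], [2.0, 0.0, 4.0]])
obs, uns = {}, {}
predict_age_sketch(x, ["clock_a", "clock_b"], obs, uns)
```

Calling it again with a single string such as "clock_b" adds only that clock's column and metadata entry.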

Examples

>>> import anndata
>>> from pyaging.predict._pred import predict_age
>>> adata = anndata.read_h5ad("sample_data.h5ad")
>>> adata = predict_age(adata, clock_names=["horvath2013", "hannum"])
>>> adata.obs["horvath2013"]  # Access predicted ages by clock name

pyaging.predict._pred_utils

pyaging.predict._pred_utils.add_pred_ages_and_clock_metadata_adata(adata, model, predicted_ages, dir, logger, indent_level=2)

Adds predicted ages to an AnnData object as a new column of the observation (.obs) attribute and stores the clock's metadata in the .uns attribute.

This function appends the predicted ages, obtained from a biological aging clock or similar model, to the AnnData object’s obs attribute. The predicted ages are added as a new column, named after the clock used to generate these predictions.

Parameters:

adata AnnData: The AnnData object to which the predicted ages will be added. It’s a data structure for handling large-scale biological data, like gene expression matrices, commonly used in bioinformatics.
model pyagingModel: The aging clock from which to get the metadata.
predicted_ages tensor: A torch tensor of predicted ages corresponding to the samples in the AnnData object. Its length must match the number of samples in adata.
dir str: Retained for backward compatibility; files downloaded from Hugging Face are stored in the standard Hugging Face cache instead.
logger Logger: A logger object for logging the progress or relevant information during the operation.
indent_level int (default: 2): The indentation level for logging messages, by default 2.

Return type:

None

Returns:

None This function modifies the AnnData object in-place and does not return any value.

Notes

It is essential to ensure that the length of predicted_ages matches the number of samples in the adata object. Mismatch in lengths will lead to errors or misaligned data.

This function is part of a pipeline that integrates aging clock predictions with the standard data structures used in bioinformatics, facilitating downstream analyses like visualization or statistical testing.
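The length requirement from the notes can be enforced explicitly before anything is written. The sketch below is hypothetical (`add_predictions_sketch` is not part of pyaging), uses dicts in place of .obs and .uns, and assumes, without confirmation from the source, that model output of shape (n, 1) should be flattened to one value per sample. A mismatch raises ValueError instead of silently misaligning rows.

```python
import numpy as np


def add_predictions_sketch(obs, uns, clock_name, metadata, predicted_ages, n_obs):
    """Store predictions as a column of obs and the clock metadata in uns (in place)."""
    # Models often return shape (n, 1); flatten to one value per sample.
    ages = np.asarray(predicted_ages, dtype=float).reshape(-1)
    # Refuse to write predictions that do not line up with the samples.
    if ages.shape[0] != n_obs:
        raise ValueError(
            f"{ages.shape[0]} predictions for {n_obs} samples (clock {clock_name!r})"
        )
    obs[clock_name] = ages
    uns[f"{clock_name}_metadata"] = metadata
```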

Examples

>>> import anndata
>>> import numpy as np
>>> import torch
>>> adata = anndata.AnnData(np.random.rand(5, 10))
>>> predicted_ages = torch.tensor([25, 30, 35, 40, 45])
>>> # `clock` is the pyagingModel returned by load_clock for "horvath2013"; `logger` is an existing logger
>>> add_pred_ages_and_clock_metadata_adata(adata, clock, predicted_ages, "pyaging_data", logger)
>>> adata.obs["horvath2013"]
0    25
1    30
2    35
3    40
4    45
Name: horvath2013, dtype: int64
>>> adata.uns["horvath2013_metadata"]
{'species': 'Homo sapiens', 'data_type': 'methylation', 'citation': 'Horvath, S. (2013)'}

pyaging.predict._pred_utils.check_features_in_adata(adata, model, logger, indent_level=2)

Verifies that all features required by a model are present in an AnnData object and adds any that are missing.

This function checks an AnnData object (commonly used in single-cell analysis) to ensure that it contains every feature listed in the model's features. Missing features are added to the AnnData object with a default value of 0, or with the model's reference value when one is provided. This matters for downstream steps, which assume that all features are present.

Parameters:

adata AnnData: The AnnData object to be checked. It is a commonly used data structure in single-cell genomics containing high-dimensional data.
model pyagingModel: The pyagingModel of the aging clock of interest. Must contain defined features.
logger Logger: A logger object used for logging information about the process, such as the number of missing features.
indent_level int (default: 2): The indentation level for the logger, by default 2. It controls the formatting of the log messages.

Return type:

AnnData

Returns:

anndata.AnnData The updated AnnData object, which includes any missing features added with a default value of 0 (or reference value if provided).

Notes

This function is particularly useful in preprocessing steps where the consistency of data structure across different datasets is crucial. The function modifies the AnnData object if there are missing features and logs detailed information about these modifications.

Missing features are filled with zeros, or with reference values when the model provides them. Zero-filling keeps the feature set complete but can bias predictions, so the logged number of missing features is worth checking before interpreting results.
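The fill-in behaviour can be sketched with NumPy. This is an illustration under assumptions, not the library code: `align_features_sketch` is a hypothetical name, the model's features are taken as a plain list of names, and the result is reordered to the model's feature order, whereas the real function adds the missing features to the AnnData object itself.

```python
import numpy as np


def align_features_sketch(x, var_names, model_features, reference_values=None):
    """Return (matrix ordered by model_features, list of missing feature names).

    x: (n_samples, n_vars) array whose columns are named by var_names.
    reference_values: optional array aligned with model_features; when given,
    missing features take these values instead of 0.
    """
    column_of = {name: j for j, name in enumerate(var_names)}
    out = np.zeros((x.shape[0], len(model_features)), dtype=float)
    missing = []
    for k, feature in enumerate(model_features):
        if feature in column_of:
            out[:, k] = x[:, column_of[feature]]   # copy the existing column
        else:
            missing.append(feature)                # stays 0 unless a reference exists
            if reference_values is not None:
                out[:, k] = reference_values[k]
    return out, missing
```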

Examples

>>> # `bitage` is a loaded pyagingModel whose features include 'gene1' and 'gene2'
>>> updated_adata = check_features_in_adata(adata, bitage, logger)
>>> updated_adata.var_names
Index(['gene1', 'gene2', ...], dtype='object')

pyaging.predict._pred_utils.cleanup_clock_memory(model=None, clock_name=None, dir=None, **kwargs)

Explicitly clean up memory and disk space from loaded clock models.

This function performs aggressive memory and disk cleanup to prevent out-of-memory and out-of-disk-space issues during testing or when processing multiple clocks sequentially. It deletes specified objects, removes downloaded .pt files, and forces garbage collection.

Parameters:

model pyagingModel, optional: The loaded clock model to delete from memory.
clock_name str, optional: The name of the clock whose .pt file should be deleted from disk.
dir str, optional: The directory containing the .pt file to delete. Required if clock_name is provided.
**kwargs dict: Additional objects to delete from memory. Each key-value pair represents an object name and the object itself to be deleted.

Return type:

None

Notes

This function is particularly useful during testing when multiple clocks are loaded sequentially, as it prevents memory accumulation and disk space consumption that can lead to “No space left on device” errors in CI environments.

The function performs the following cleanup steps:

1. Deletes the provided model object, if given.
2. Deletes any additional objects passed via kwargs.
3. Removes the downloaded .pt file from disk if clock_name and dir are provided.
4. Forces Python garbage collection.
5. Clears the PyTorch CUDA cache, if CUDA is available.
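Steps 3 to 5 can be sketched with the standard library, assuming the weights live at a known local path; `cleanup_sketch` is a hypothetical name. Steps 1 and 2 are left out because `del` inside a function only removes the function's own reference: the caller must also drop its references (for example `del model`) before garbage collection can free the object. The CUDA step runs only if PyTorch is installed.

```python
import gc
import os


def cleanup_sketch(weights_path=None):
    """Delete a downloaded weights file, run garbage collection, and free cached GPU memory."""
    # Step 3: remove the downloaded file, if it exists.
    if weights_path is not None and os.path.exists(weights_path):
        os.remove(weights_path)
    # Step 4: collect objects that are no longer referenced anywhere.
    gc.collect()
    # Step 5: release PyTorch's cached CUDA memory, if PyTorch and a GPU are present.
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```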

Examples

>>> model = load_clock("horvath2013", "cpu", "pyaging_data", logger)
>>> # ... use model ...
>>> cleanup_clock_memory(model=model, clock_name="horvath2013", dir="pyaging_data")

pyaging.predict._pred_utils.load_clock(clock_name, device, dir, logger, indent_level=2)

Loads the specified aging clock from Hugging Face and returns it as a pyagingModel.

This function downloads the weights and configuration of a specified aging clock from Hugging Face. This allows users to instantiate and use the clock in their analyses.

Parameters:

clock_name str: The name of the aging clock to be loaded. This name identifies the clock’s weights and configuration on Hugging Face.
device str: Device to move the clock to. Either ‘cpu’ or ‘cuda’.
dir str: Retained for backward compatibility; files downloaded from Hugging Face are stored in the standard Hugging Face cache instead.
logger Logger: A logger object used for logging information during the function execution.
indent_level int (default: 2): The indentation level for the logger, by default 2. It controls the formatting of the log messages.

Return type:

pyagingModel

Returns:

pyagingModel A clock model

Notes

The clock’s weights and configuration are stored in a .pt (PyTorch) file on Hugging Face. If the requested clock is unavailable, the function raises a NameError.

The logger reports progress and other information while the clock is loaded.

Examples

>>> clock = load_clock("horvath2013", "cpu", "pyaging_data", logger)

pyaging.predict._pred_utils.predict_ages_with_model(adata, model, device, batch_size, logger, indent_level=2)

Predict biological ages using a trained model and input data.

This function applies a trained model to the input data and returns the model's predictions. It is primarily used to estimate biological ages from biological markers, and it assumes that the model is already trained. Data are passed to the model in batches through a data loader, so large datasets do not have to be processed in a single pass.

Parameters:

adata AnnData: The AnnData object containing the dataset. Its .X attribute is expected to be a matrix where rows correspond to samples and columns correspond to features.
model pyagingModel: The pyagingModel of the aging clock of interest.
device str: Device to move the AnnData data to during inference. Either ‘cpu’ or ‘cuda’.
batch_size int: Batch size for the AnnLoader object to predict age.
logger Logger: A logger object for logging the progress or any relevant information during the prediction process.
indent_level int (default: 2): The indentation level for logging messages, by default 2.

Return type:

Tensor

Returns:

predictions : torch.Tensor A tensor of predicted ages (or other biological markers), as returned by the model.

Notes

Ensure that the data is preprocessed (e.g., scaled, normalized) as required by the model before passing it to this function. The model should be in evaluation mode if it’s a type that has different behavior during training and inference (e.g., PyTorch models).

The exact nature of the predictions (e.g., age, biological markers) depends on the model being used.
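The batched inference described for predict_ages_with_model can be illustrated without PyTorch. The sketch below assumes a linear clock (weights plus intercept) and uses NumPy slices in place of AnnLoader; `predict_in_batches` is a hypothetical name. Because such a model scores each sample independently, the batch size changes only peak memory use, not the predictions.

```python
import numpy as np


def predict_in_batches(x, weights, intercept, batch_size):
    """Apply a linear clock to x (samples x features) one batch of rows at a time."""
    predictions = []
    for start in range(0, x.shape[0], batch_size):
        batch = x[start:start + batch_size]          # at most batch_size rows in memory
        predictions.append(batch @ weights + intercept)
    return np.concatenate(predictions)               # restore the original sample order
```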

Examples

>>> model = load_clock("horvath2013", "cpu", "pyaging_data", logger)
>>> predictions = predict_ages_with_model(adata, model, "cpu", 1024, logger)
>>> predictions[:5]  # first five predicted ages, as a torch.Tensor

pyaging.predict._pred_utils.set_torch_device(logger, indent_level=1)

Set and return the PyTorch device based on the availability of CUDA.

This function checks if CUDA is available in the system and accordingly sets the PyTorch device to either ‘cuda’ or ‘cpu’. If CUDA is available, it utilizes GPU acceleration for PyTorch operations, significantly enhancing computation speed for large datasets. The chosen device is logged for user reference.

Parameters:

logger Logger: A logger object for logging the selected device.
indent_level int (default: 1): The indentation level for logging messages, by default 1.

Return type:

torch.device

Returns:

torch.device The PyTorch device object set to ‘cuda’ if CUDA is available, or ‘cpu’ otherwise.

Notes

The function automatically detects the availability of CUDA and makes a decision without user input. This makes it convenient for deploying code on different machines without the need for manual configuration.

It is important to use the returned device for all PyTorch operations to ensure that they are executed on the correct hardware (CPU or GPU).

Examples

>>> import pyaging
>>> logger = pyaging.logger.LoggerManager.gen_logger("example")
>>> device = set_torch_device(logger)
>>> device
device(type='cuda')  # or device(type='cpu') if CUDA is not available

pyaging.predict._preprocessing

pyaging.predict._preprocessing.binarize(x): Binarizes an array based on the median of each row, excluding zeros.

pyaging.predict._preprocessing.quantile_normalize_with_gold_standard(x, gold_standard_means): Applies quantile normalization to x using gold-standard means.

pyaging.predict._preprocessing.scale(x, scaler): Scales the input data using the provided scaler.

pyaging.predict._preprocessing.scale_row(x, x_overlap): Scales each row of the input data to mean 0 and standard deviation 1.

pyaging.predict._preprocessing.scale_with_gold_standard(x, column_means, column_stds): Scales each column of the input data using the given means and standard deviations.

pyaging.predict._preprocessing.tpm_norm_log1p(x, lengths): Normalizes an array of counts to TPM (transcripts per million) and then applies log1p.
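The one-line descriptions above leave the details open. Below are NumPy sketches of three of them under stated assumptions; the `_sketch` functions are hypothetical and may differ from pyaging in edge cases such as ties, all-zero rows, or whether values equal to the median count as 1. Rows are samples and columns are features throughout.

```python
import numpy as np


def binarize_sketch(x):
    """1 where a value exceeds its row's median of non-zero entries, else 0 (assumed rule)."""
    out = np.zeros_like(x, dtype=float)
    for i, row in enumerate(x):
        nonzero = row[row != 0]
        if nonzero.size:                             # all-zero rows stay 0
            out[i] = (row > np.median(nonzero)).astype(float)
    return out


def quantile_normalize_sketch(x, gold_standard_means):
    """Give each row the gold-standard value distribution while keeping its ranks."""
    reference = np.sort(np.asarray(gold_standard_means, dtype=float))
    ranks = np.argsort(np.argsort(x, axis=1), axis=1)  # rank of each value within its row
    return reference[ranks]                            # k-th smallest -> k-th smallest reference


def tpm_norm_log1p_sketch(x, lengths):
    """Counts -> TPM -> log1p; lengths holds one feature length per column."""
    rate = x / lengths                                 # counts per unit length
    tpm = rate / rate.sum(axis=1, keepdims=True) * 1e6  # each row sums to one million
    return np.log1p(tpm)
```

A side effect of the TPM definition is that the unit of `lengths` (bases or kilobases) cancels out.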

pyaging.predict._postprocessing

pyaging.predict._postprocessing.anti_log(x): Applies a simple anti-logarithmic transformation.

pyaging.predict._postprocessing.anti_log_linear(x, adult_age=20): Applies an anti-logarithmic linear transformation to a value.

pyaging.predict._postprocessing.anti_log_log(x): Applies a double transformation: logarithmic followed by anti-logarithmic.

pyaging.predict._postprocessing.anti_logp2(x): Applies an anti-logarithmic transformation with an offset of -2.

pyaging.predict._postprocessing.mortality_to_phenoage(x): Converts a mortality score, expressed as the CDF of a Gompertz distribution, into phenotypic age.

pyaging.predict._postprocessing.petkovichblood(x): Converts the output of an ElasticNet model into mouse age in months.

pyaging.predict._postprocessing.stubbsmultitissue(x): Converts the output of an ElasticNet model into mouse age in months.
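The name and the default adult_age=20 suggest that anti_log_linear inverts the age transformation F from Horvath (2013), which is logarithmic below the adult age and linear above it. Assuming that, a NumPy sketch with a round-trip check looks like this; both function names are hypothetical and the forward transform is included only for the check.

```python
import numpy as np


def anti_log_linear_sketch(x, adult_age=20):
    """Inverse of the assumed age transform F: log branch below adult_age, linear above."""
    x = np.asarray(x, dtype=float)
    return np.where(
        x < 0,
        (1 + adult_age) * np.exp(x) - 1,   # ages below adult_age
        (1 + adult_age) * x + adult_age,   # ages from adult_age upward
    )


def forward_transform_sketch(age, adult_age=20):
    """Assumed forward transform F(age), used only to verify the inverse."""
    age = np.asarray(age, dtype=float)
    return np.where(
        age <= adult_age,
        np.log(age + 1) - np.log(adult_age + 1),
        (age - adult_age) / (adult_age + 1),
    )
```

Both branches meet at x = 0, which maps back to exactly adult_age.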