Predict
Please note that most functions are helper functions and are not meant to be used directly.
pyaging.predict._pred
- pyaging.predict._pred.predict_age(adata, clock_names='horvath2013', dir='pyaging_data', batch_size=1024, clean=True, verbose=True)
Predicts biological age using specified aging clocks.
This function takes an AnnData object and applies one or more specified aging clock models to predict the biological age of the samples. It handles the entire pipeline: data preprocessing, model loading, prediction, and postprocessing. It also enriches the input AnnData object with the predicted ages and relevant clock metadata.
- Parameters:
- adata AnnData
An AnnData object. The object should have .X attribute for the data matrix and .var_names for feature names.
- clock_names str or list of str, optional
Names of the aging clocks to be applied. It can be a single clock name as a string or a list of clock names, by default “horvath2013”.
- dir str
The directory to deposit the downloaded file. Defaults to “pyaging_data”.
- batch_size int
The batch size for age inference. Defaults to 1024.
- clean bool
Whether to delete the matrix data created for each clock in adata.obsm[X_clock]. Defaults to True.
- verbose bool
Whether to log the output to console with the logger. Defaults to True.
- Return type:
AnnData
- Returns:
AnnData The input AnnData object enriched with the predicted ages and clock metadata in the .obs and .uns attributes, respectively.
Notes
The function is designed to be flexible and can handle both single and multiple clock predictions. The predicted ages are appended to the .obs attribute of the AnnData object with the clock name as the key. The metadata of each clock used in the prediction is stored in the .uns attribute. Change batch size depending on memory constraints.
It is important that the input AnnData object’s .X attribute contains data suitable for age prediction.
The function automatically handles the transfer of data and models to the appropriate compute device (CPU or GPU) based on system configuration.
Examples
>>> adata = anndata.read_h5ad("sample_data.h5ad")
>>> adata = predict_age(adata, clock_names=["horvath2013", "hannum"])
>>> adata.obs["horvath2013"]  # Access predicted ages by clock name
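The bookkeeping described in the Notes (one or several clock names, predictions keyed by clock name, metadata stored alongside) can be sketched without pyaging installed. The following is a minimal illustration that uses plain dictionaries in place of the .obs and .uns attributes. The names normalize_clock_names, run_clocks, and fake_predict are hypothetical and are not part of pyaging.

```python
from typing import Callable, Dict, List, Sequence, Union

Rows = List[List[float]]


def normalize_clock_names(clock_names: Union[str, Sequence[str]]) -> List[str]:
    """Accept one clock name or several and always return a list."""
    if isinstance(clock_names, str):
        return [clock_names]
    return list(clock_names)


def run_clocks(
    samples: Rows,
    clock_names: Union[str, Sequence[str]],
    predict_fn: Callable[[str, Rows], List[float]],
) -> Dict[str, dict]:
    """Store one column of predictions per clock, keyed by clock name."""
    obs: Dict[str, List[float]] = {}  # stand-in for adata.obs
    uns: Dict[str, dict] = {}         # stand-in for adata.uns
    for name in normalize_clock_names(clock_names):
        obs[name] = predict_fn(name, samples)
        uns[f"{name}_metadata"] = {"clock_name": name}
    return {"obs": obs, "uns": uns}


def fake_predict(name: str, rows: Rows) -> List[float]:
    """Stand-in predictor (hypothetical): the row mean, for demonstration only."""
    return [sum(r) / len(r) for r in rows]


result = run_clocks([[1.0, 3.0], [2.0, 6.0]], "horvath2013", fake_predict)
```

Passing a list such as ["horvath2013", "hannum"] instead of a string yields one entry per clock in both dictionaries.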
pyaging.predict._pred_utils
- pyaging.predict._pred_utils.add_pred_ages_and_clock_metadata_adata(adata, model, predicted_ages, dir, logger, indent_level=2)
Add predicted ages to an AnnData object as a new column in the observation (obs) attribute. Also adds the specific clock metadata to the uns attribute of an AnnData object.
This function appends the predicted ages, obtained from a biological aging clock or similar model, to the AnnData object’s obs attribute. The predicted ages are added as a new column, named after the clock used to generate these predictions.
- Parameters:
- adata anndata.AnnData
The AnnData object to which the predicted ages will be added. It’s a data structure for handling large-scale biological data, like gene expression matrices, commonly used in bioinformatics.
- model pyagingModel
The aging clock from which to get the metadata.
- predicted_ages torch.tensor
A torch tensor of predicted ages corresponding to the samples in the AnnData object. The length of this tensor should match the number of samples in adata.
- dir str
The directory to deposit the downloaded file.
- logger Logger
A logger object for logging the progress or relevant information during the operation.
- indent_level int, optional
The indentation level for logging messages, by default 2.
- Return type:
None
- Returns:
None This function modifies the AnnData object in-place and does not return any value.
Notes
It is essential to ensure that the length of predicted_ages matches the number of samples in the adata object. Mismatch in lengths will lead to errors or misaligned data.
This function is part of a pipeline that integrates aging clock predictions with the standard data structures used in bioinformatics, facilitating downstream analyses like visualization or statistical testing.
Examples
>>> adata = anndata.AnnData(np.random.rand(5, 10))
>>> predicted_ages = torch.tensor([25, 30, 35, 40, 45])
>>> add_pred_ages_and_clock_metadata_adata(adata, clock, predicted_ages, "pyaging_data", logger)
>>> adata.obs["horvath2013"]
0    25
1    30
2    35
3    40
4    45
Name: horvath2013, dtype: int64
>>> adata.uns["horvath2013_metadata"]
{'species': 'Homo sapiens', 'data_type': 'methylation', 'citation': 'Horvath, S. (2013)'}
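The Notes warn that a length mismatch between predicted_ages and the samples leads to errors or misaligned data. A minimal sketch of a guard against that, assuming a plain dictionary as a stand-in for the .obs attribute, is shown here. The helper add_predictions is hypothetical and does not reflect how pyaging performs this check internally.

```python
from typing import Dict, List, Sequence


def add_predictions(
    obs: Dict[str, List[float]],
    n_samples: int,
    clock_name: str,
    predicted_ages: Sequence[float],
) -> None:
    """Attach predictions as a new column, refusing mismatched lengths."""
    if len(predicted_ages) != n_samples:
        raise ValueError(
            f"got {len(predicted_ages)} predictions for {n_samples} samples"
        )
    # Modifies obs in place and returns nothing, mirroring the documented behavior.
    obs[clock_name] = list(predicted_ages)


obs: Dict[str, List[float]] = {}
add_predictions(obs, 5, "horvath2013", [25, 30, 35, 40, 45])
```

Failing early with an explicit error is one design choice. It avoids silently attaching ages to the wrong samples.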
- pyaging.predict._pred_utils.check_features_in_adata(adata, model, logger, indent_level=2)
Verifies if all required features are present in an AnnData object and adds missing features.
This function checks an AnnData object (commonly used in single-cell analysis) to ensure that it contains all the necessary features specified in the ‘features’ list inside the model. If any features are missing, they are added to the AnnData object with a default value of 0 or with a reference value if given. This is crucial for downstream analyses where the presence of all specified features is assumed.
- Parameters:
- adata anndata.AnnData
The AnnData object to be checked. It is a commonly used data structure in single-cell genomics containing high-dimensional data.
- model pyagingModel
The pyagingModel of the aging clock of interest. Must contain defined features.
- logger Logger
A logger object used for logging information about the process, such as the number of missing features.
- indent_level int, optional
The indentation level for the logger, by default 2. It controls the formatting of the log messages.
- Return type:
AnnData
- Returns:
anndata.AnnData The updated AnnData object, which includes any missing features added with a default value of 0 (or reference value if provided).
Notes
This function is particularly useful in preprocessing steps where the consistency of data structure across different datasets is crucial. The function modifies the AnnData object if there are missing features and logs detailed information about these modifications.
The added features are initialized with zeros. This approach, while providing completeness, may introduce biases if not accounted for in downstream analyses. If reference values are provided, then they are used instead of zeros.
Examples
>>> updated_adata = check_features_in_adata(adata, bitage, logger)
>>> updated_adata.var_names
Index(['gene1', 'gene2', ...], dtype='object')
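The filling rule described above (use a reference value when one is given, otherwise 0) can be illustrated on a single sample. This is a minimal sketch with plain Python containers. The function fill_missing_features and its arguments are hypothetical names, not pyaging's implementation.

```python
from typing import Dict, List, Optional


def fill_missing_features(
    sample: Dict[str, float],
    required: List[str],
    reference: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Return values in the order of `required`, filling any gaps."""
    reference = reference or {}
    row = []
    for feature in required:
        if feature in sample:
            row.append(sample[feature])
        else:
            # Missing feature: use the reference value if one exists, else 0.
            row.append(reference.get(feature, 0.0))
    return row


sample = {"gene1": 0.8, "gene3": 0.1}
required = ["gene1", "gene2", "gene3"]
zero_filled = fill_missing_features(sample, required)
ref_filled = fill_missing_features(sample, required, {"gene2": 0.5})
```

Here zero_filled places 0.0 in the gene2 slot, while ref_filled places 0.5 there. This is the difference the Notes flag as a possible source of bias.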
- pyaging.predict._pred_utils.cleanup_clock_memory(model=None, clock_name=None, dir=None, **kwargs)
Explicitly clean up memory and disk space from loaded clock models.
This function performs aggressive memory and disk cleanup to prevent out-of-memory and out-of-disk-space issues during testing or when processing multiple clocks sequentially. It deletes specified objects, removes downloaded .pt files, and forces garbage collection.
- Parameters:
- model pyagingModel, optional
The loaded clock model to delete from memory.
- clock_name str, optional
The name of the clock whose .pt file should be deleted from disk.
- dir str, optional
The directory containing the .pt file to delete. Required if clock_name is provided.
- **kwargs dict
Additional objects to delete from memory. Each key-value pair represents an object name and the object itself to be deleted.
- Return type:
None
Notes
This function is particularly useful during testing when multiple clocks are loaded sequentially, as it prevents memory accumulation and disk space consumption that can lead to “No space left on device” errors in CI environments.
The function performs the following cleanup steps:
1. Deletes the provided model object if given
2. Deletes any additional objects passed via kwargs
3. Removes the downloaded .pt file from disk if clock_name and dir are provided
4. Forces Python garbage collection
5. Clears PyTorch CUDA cache if available
Examples
>>> model = load_clock("horvath2013", "cpu", "pyaging_data", logger)
>>> # ... use model ...
>>> cleanup_clock_memory(model=model, clock_name="horvath2013", dir="pyaging_data")
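The cleanup steps listed in the Notes can be sketched as follows. This is an illustration and not pyaging's source: the function name cleanup is hypothetical, and the file name pattern `<clock_name>.pt` is an assumption. The CUDA step runs only when PyTorch is installed and reports a usable GPU.

```python
import gc
import importlib.util
import os


def cleanup(clock_name=None, dir=None, **objects):
    """Drop references, remove the clock file, and force garbage collection."""
    # Release this function's references. The caller must also drop its own
    # (for example with `del model`) before the memory can be reclaimed.
    objects.clear()

    removed = False
    if clock_name is not None and dir is not None:
        path = os.path.join(dir, f"{clock_name}.pt")  # assumed file naming
        if os.path.exists(path):
            os.remove(path)
            removed = True

    gc.collect()

    # Clear the CUDA cache only when PyTorch is present and CUDA is usable.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    return removed
```

Returning whether a file was removed is a convenience added for this sketch. The documented function returns None.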
- pyaging.predict._pred_utils.load_clock(clock_name, device, dir, logger, indent_level=2)
Loads the specified aging clock from a remote source and returns its components.
This function downloads the weights and configuration of a specified aging clock from a remote server. This allows users to instantiate and use the clock in their analyses.
- Parameters:
- clock_name str
The name of the aging clock to be loaded. This name is used to construct the URL for downloading the clock’s weights and configuration.
- device str
Device to move clock to. Either ‘cpu’ or ‘cuda’.
- dir str
The directory to deposit the downloaded file.
- logger Logger
A logger object used for logging information during the function execution.
- indent_level int, optional
The indentation level for the logger, by default 2. It controls the formatting of the log messages.
- Return type:
Tuple
- Returns:
pyagingModel A clock model
Notes
The clock’s weights and configuration are assumed to be stored in a .pt (PyTorch) file on a remote server. The URL for the clock is constructed based on the clock’s name. The function uses the download utility to retrieve the file. If the clock or its components are not found, the function may fail or return incomplete information.
The logger is used extensively for progress tracking and information logging, enhancing transparency and user experience.
Examples
>>> clock = load_clock("clock1", "cpu", "pyaging_data", logger)
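The Notes state that the download URL is built from the clock's name and that the file lands in dir. A minimal sketch of that naming logic follows. The base URL is a placeholder, the `<clock_name>.pt` pattern is an assumption, and skipping files that already exist locally is also an assumption, since this page does not confirm it. Both helper names are hypothetical.

```python
import os
from typing import Tuple

# Placeholder only: the real server address is not given on this page.
BASE_URL = "https://example.org/clocks"


def clock_location(clock_name: str, dir: str) -> Tuple[str, str]:
    """Return the (url, local_path) pair for a clock's .pt file."""
    file_name = f"{clock_name}.pt"  # assumed naming pattern
    return f"{BASE_URL}/{file_name}", os.path.join(dir, file_name)


def needs_download(local_path: str) -> bool:
    """Assumed caching rule: download only when the file is not present."""
    return not os.path.exists(local_path)


url, path = clock_location("horvath2013", "pyaging_data")
```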
- pyaging.predict._pred_utils.predict_ages_with_model(adata, model, device, batch_size, logger, indent_level=2)
Predict biological ages using a trained model and input data.
This function takes a machine learning model and input data, and returns predictions made by the model. It’s primarily used for estimating biological ages based on various biological markers. The function assumes that the model is already trained. A dataloader is used because of possible memory constraints for large datasets.
- Parameters:
- adata anndata.AnnData
The AnnData object containing the dataset. Its .X attribute is expected to be a matrix where rows correspond to samples and columns correspond to features.
- model pyagingModel
The pyagingModel of the aging clock of interest.
- device str
Device to move AnnData to during inference. Either ‘cpu’ or ‘cuda’.
- batch_size int
Batch size for the AnnLoader object to predict age.
- logger Logger
A logger object for logging the progress or any relevant information during the prediction process.
- indent_level int, optional
The indentation level for logging messages, by default 2.
- Return type:
Tensor
- Returns:
predictions : torch.Tensor An array of predicted ages or biological markers, as returned by the model.
Notes
Ensure that the data is preprocessed (e.g., scaled, normalized) as required by the model before passing it to this function. The model should be in evaluation mode if it’s a type that has different behavior during training and inference (e.g., PyTorch models).
The exact nature of the predictions (e.g., age, biological markers) depends on the model being used.
Examples
>>> model = load_pretrained_model()
>>> predictions = predict_ages_with_model(adata, model, "cpu", 1024, logger)
>>> print(predictions[:5])
[34.5, 29.3, 47.8, 50.1, 42.6]
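The description mentions that a dataloader is used because large datasets may not fit in memory at once. The idea of batch-wise prediction can be sketched in plain Python. The names iter_batches, predict_in_batches, and toy_model are hypothetical, and the toy linear model merely stands in for a trained clock.

```python
from typing import Callable, Iterator, List

Row = List[float]


def iter_batches(rows: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def predict_in_batches(
    rows: List[Row],
    model: Callable[[List[Row]], List[float]],
    batch_size: int,
) -> List[float]:
    """Run the model batch by batch and concatenate the outputs in order."""
    predictions: List[float] = []
    for batch in iter_batches(rows, batch_size):
        predictions.extend(model(batch))
    return predictions


def toy_model(batch: List[Row]) -> List[float]:
    """Stand-in linear model: intercept 1.0 plus two weighted features."""
    return [1.0 + 2.0 * r[0] + 3.0 * r[1] for r in batch]


rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
preds = predict_in_batches(rows, toy_model, batch_size=2)
```

Only one batch is held by the model at a time, so a smaller batch_size lowers peak memory use at the cost of more iterations.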
- pyaging.predict._pred_utils.set_torch_device(logger, indent_level=1)
Set and return the PyTorch device based on the availability of CUDA.
This function checks whether CUDA is available on the system and sets the PyTorch device to ‘cuda’ or ‘cpu’ accordingly. When CUDA is available, PyTorch operations run on the GPU, which can substantially speed up computation on large datasets. The chosen device is logged for user reference.
- Parameters:
- logger Logger
A logger object for logging the selected device.
- indent_level int, optional
The indentation level for logging messages, by default 1.
- Return type:
torch.device
- Returns:
torch.device The PyTorch device object set to ‘cuda’ if CUDA is available, or ‘cpu’ otherwise.
Notes
The function automatically detects the availability of CUDA and makes a decision without user input. This makes it convenient for deploying code on different machines without the need for manual configuration.
It is important to use the returned device for all PyTorch operations to ensure that they are executed on the correct hardware (CPU or GPU).
Examples
>>> logger = pyaging.logger.LoggerManager.gen_logger("example")
>>> device = set_torch_device(logger)
>>> device  # device(type='cpu') if CUDA is not available
device(type='cuda')
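The selection rule itself is short. The sketch below returns a device name instead of a torch.device object so that it also runs where PyTorch is not installed. That fallback and the name pick_device_name are additions for this illustration only.

```python
import importlib.util


def pick_device_name() -> str:
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch not installed; fallback for this sketch only
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device_name = pick_device_name()
```

With PyTorch installed, torch.device(device_name) gives the object to pass to later operations.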
pyaging.predict._preprocessing
- pyaging.predict._preprocessing.binarize(x)
Binarizes an array based on the median of each row, excluding zeros.
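A minimal sketch of row-wise binarization follows. It assumes that values strictly above the median of a row's non-zero entries become 1 and everything else becomes 0. The one-line description does not state the comparison direction or how all-zero rows are handled, so both are assumptions here.

```python
from statistics import median
from typing import List


def binarize_row(row: List[float]) -> List[int]:
    """Binarize one row against the median of its non-zero entries."""
    non_zero = [v for v in row if v != 0]
    if not non_zero:
        # Assumption: a row with no non-zero entries stays all zeros.
        return [0 for _ in row]
    threshold = median(non_zero)
    return [1 if v > threshold else 0 for v in row]


def binarize(x: List[List[float]]) -> List[List[int]]:
    """Apply binarize_row to every row of a 2D list."""
    return [binarize_row(row) for row in x]


result = binarize([[0, 1, 2, 3], [0, 0, 0, 0]])
```

In the first row the non-zero entries are 1, 2, and 3, so the threshold is 2 and only the value 3 maps to 1.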
- pyaging.predict._preprocessing.quantile_normalize_with_gold_standard(x, gold_standard_means)
Apply quantile normalization on x using gold standard means.
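Quantile normalization against a fixed reference replaces each value with the reference value of the same rank. The sketch below shows this for one row. It assumes the reference has the same length as the row and that ties are broken by position. pyaging's handling of ties and length mismatches may differ, and the function name is hypothetical.

```python
from typing import List


def quantile_normalize_row(
    row: List[float], gold_standard_means: List[float]
) -> List[float]:
    """Map the k-th smallest value of row to the k-th smallest reference value."""
    if len(row) != len(gold_standard_means):
        raise ValueError("row and gold_standard_means must have equal length")
    targets = sorted(gold_standard_means)
    # Indices of row ordered from smallest to largest value.
    order = sorted(range(len(row)), key=lambda i: row[i])
    normalized = [0.0] * len(row)
    for rank, index in enumerate(order):
        normalized[index] = targets[rank]
    return normalized


normalized = quantile_normalize_row([5.0, 1.0, 3.0], [10.0, 20.0, 30.0])
```

The largest input (5.0) receives the largest reference value (30.0), so the row's ordering is preserved while its distribution matches the reference.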
- pyaging.predict._preprocessing.scale(x, scaler)
Scales the input data using the provided scaler.
- pyaging.predict._preprocessing.scale_row(x, x_overlap)
Scales the input data per row with mean 0 and std 1.
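Per-row standardization subtracts the row mean and divides by the row standard deviation. The sketch below covers only that step. It ignores the x_overlap argument, whose role is not described on this page, uses the population standard deviation, and maps constant rows to zeros. All three choices are assumptions, and standardize_row is a hypothetical name.

```python
from statistics import mean, pstdev
from typing import List


def standardize_row(row: List[float]) -> List[float]:
    """Scale one row to mean 0 and standard deviation 1."""
    mu = mean(row)
    sigma = pstdev(row)  # population standard deviation (assumption)
    if sigma == 0:
        # Assumption: a constant row carries no spread, so return zeros.
        return [0.0 for _ in row]
    return [(v - mu) / sigma for v in row]


scaled = standardize_row([1.0, 2.0, 3.0])
```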
pyaging.predict._postprocessing
- pyaging.predict._postprocessing.anti_log(x)
Applies a simple anti-logarithmic transformation.
- pyaging.predict._postprocessing.anti_log_linear(x, adult_age=20)
Applies an anti-logarithmic linear transformation to a value.
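The signature, with adult_age=20, matches the piecewise inverse transformation published with the Horvath (2013) clock: exponential below zero and linear from zero upward. The sketch below implements that published formula for a single value. Whether pyaging's function follows it term for term is an assumption.

```python
import math


def anti_log_linear(x: float, adult_age: float = 20) -> float:
    """Invert a log-linear age transformation for one value."""
    if x < 0:
        # Exponential branch, used for ages below adult_age.
        return (1 + adult_age) * math.exp(x) - 1
    # Linear branch, used for ages at or above adult_age.
    return (1 + adult_age) * x + adult_age


age_at_zero = anti_log_linear(0.0)
age_at_one = anti_log_linear(1.0)
```

Both branches give adult_age at x = 0, so the transformation is continuous there.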
- pyaging.predict._postprocessing.anti_log_log(x)
Applies a double transformation: logarithmic followed by anti-logarithmic.
- pyaging.predict._postprocessing.anti_logp2(x)
Applies an anti-logarithmic transformation with an offset of -2.
- pyaging.predict._postprocessing.mortality_to_phenoage(x)
Applies a conversion from a CDF of the mortality score from a Gompertz distribution to phenotypic age.