Hi,
First of all, thank you for this fantastic tool. I've been using the Tasccoda module for differential abundance analysis across multiple datasets (different regions) in a loop. I store the resulting MuData objects, which contain the MCMC results, in a dictionary for later analysis and plotting.
The Problem
I've encountered an issue when trying to generate ArviZ plots in a separate step after the modeling is complete. The coda.make_arviz(data) function fails with a matrix dimension mismatch error (TypeError: dot_general requires contracting dimensions...).
After investigating, I realized that make_arviz does not solely rely on the MuData object passed to it. It also uses the internal state of the Tasccoda instance (e.g., self.mcmc) which was configured by the last call to coda.prepare().
This leads to a "desynchronization" problem: if I try to extract arviz data from a MuData object from a previous iteration in the loop, make_arviz will use the model configuration from the last iteration, causing a shape mismatch if the number of cell types (which usually happens as different regions have different cell populations).
Example workflow to illustrate:
coda = pt.tl.Tasccoda()
processed_models = {}
datasets = {"region_A": data_A, "region_B": data_B} # Example datasets
for name, mdata in datasets.items():
coda.prepare(mdata, ...)
coda.run_nuts(mdata, ...)
processed_models[name] = mdata
# This call fails if region_A has a different number of cell types than region_B
arviz_data = coda.make_arviz(data=processed_models["region_A"])
Workaround
The current workaround is to "resynchronize" the Tasccoda instance by calling coda.prepare() again with the specific MuData object right before calling make_arviz.
# Workaround
coda.prepare(processed_models["region_A"], ...) # Re-run prepare
arviz_data = coda.make_arviz(data=processed_models["region_A"]) # Now it works
While this works, it's not very intuitive and makes the code more error-prone.
Suggestion for Improvement
Would it be technically feasible to make make_arviz a stateless function, or at least have it derive all necessary model information directly from the provided MuData object? The MuData object already stores the MCMC results and the parameters in uns['scCODA_params']. If the model structure could also be inferred or retrieved from the MuData object, it would make the function much more robust and self-contained.
This would align with the expectation that the MuData object is the single source of truth for a given analysis, making batch processing and later analysis much more straightforward.
Thank you for considering this suggestion!
Hi,
First of all, thank you for this fantastic tool. I've been using the
Tasccodamodule for differential abundance analysis across multiple datasets (different regions) in a loop. I store the resultingMuDataobjects, which contain the MCMC results, in a dictionary for later analysis and plotting.The Problem
I've encountered an issue when trying to generate ArviZ plots in a separate step after the modeling is complete. The
coda.make_arviz(data)function fails with a matrix dimension mismatch error (TypeError: dot_general requires contracting dimensions...).After investigating, I realized that
make_arvizdoes not solely rely on theMuDataobject passed to it. It also uses the internal state of theTasccodainstance (e.g.,self.mcmc) which was configured by the last call tocoda.prepare().This leads to a "desynchronization" problem: if I try to extract
arvizdata from aMuDataobject from a previous iteration in the loop,make_arvizwill use the model configuration from the last iteration, causing a shape mismatch if the number of cell types (which usually happens as different regions have different cell populations).Example workflow to illustrate:
Workaround
The current workaround is to "resynchronize" the
Tasccodainstance by callingcoda.prepare()again with the specificMuDataobject right before callingmake_arviz.While this works, it's not very intuitive and makes the code more error-prone.
Suggestion for Improvement
Would it be technically feasible to make
make_arviza stateless function, or at least have it derive all necessary model information directly from the providedMuDataobject? TheMuDataobject already stores the MCMC results and the parameters inuns['scCODA_params']. If the model structure could also be inferred or retrieved from theMuDataobject, it would make the function much more robust and self-contained.This would align with the expectation that the
MuDataobject is the single source of truth for a given analysis, making batch processing and later analysis much more straightforward.Thank you for considering this suggestion!