2 changes: 1 addition & 1 deletion docs/source/api/distributions/transforms.rst
@@ -2,7 +2,7 @@
Transformations
***************

- .. currentmodule:: pymc.distributions.transforms
+ .. module:: pymc.distributions.transforms
Author:
Just to clarify: `.. module::` must be specified at least once (and probably only once) for Sphinx to create an "index entry". Without it, references such as :mod:`...` would lead nowhere. It so happens that `.. module:: pymc.distributions.transforms` was absent even though `.. currentmodule::` was present, and Sphinx does not check for this.

Member:
Using the module directive here is probably fine; I am just not sure how relevant it is. Is pymc.distributions.transforms as a module something relevant to users that we'll want to reference? If so, we need to switch; otherwise using currentmodule is perfectly fine, though using module won't hurt either.


Full context:

`.. currentmodule::` is basically syntactic sugar: its only role is defining the module name to prepend to all the autodoc/autosummary entries in that file. That is, we can use `circular` or `ordered` in the autosummary directive below instead of needing to spell out `pymc.distributions.transforms.circular`...

`.. module::` does the same but also generates a target that can be referenced from other places using the :mod:`pymc.distributions.transforms` role; `.. currentmodule::` neither references a module directive of the same name nor needs one to exist in order to work properly.

The module directive can also take a `:synopsis:` option to manually add the module docstring. In our case, however, using this optional argument wouldn't make much sense; it would be much better to use `.. automodule::` instead, which defines a module directive that pulls the docstring directly from the module in question (as we do for classes and functions).

Sphinx does not check this because module/automodule should indeed be used at most once, but it isn't really necessary to use it at least once; that depends on how the package maintainers decide to define and document the public API.
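As a sketch of the difference (the autosummary entries below are illustrative, not the full file), declaring the module once makes it a referenceable target, while shortened entry names work either way:

```rst
.. module:: pymc.distributions.transforms

.. autosummary::
   :toctree: generated

   circular
   ordered
```

With `.. currentmodule::` in place of `.. module::`, the shortened entries above would still resolve, but a cross-reference like :mod:`pymc.distributions.transforms` elsewhere in the docs would have no target.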

Author (@avm19, Mar 24, 2026):

> Using the module directive here is probably fine, I am not sure how relevant it is though.

Without it references did not work, which is why I made the change.

P.S. I don't remember whether Sphinx threw an error or just produced no hyperlink without the module directive. I don't know much about this, but I guess you were not using automodule because this module is accompanied by a lot of text, which was placed in transform.rst (better to have it there than in a huge module-level docstring). Some other entries, e.g. shape_utils.rst, also contain text in addition to Sphinx directives.

> Is pymc.distributions.transforms as a module something relevant to users we'll want to reference?

Yes. The module page explains how PyMC deals with transformations, and I reference it as a good source for further details in a method's docstring.


While many distributions are defined on constrained spaces (e.g. intervals), MCMC samplers typically perform best when sampling on the unconstrained real line; this is especially true of HMC samplers. PyMC balances this through the use of transforms. A transform instance can be passed to the constructor of a random variable to tell the sampler how to move between the underlying unconstrained space where the samples are actually drawn and the transformed space constituting the support of the random variable. Transforms are not currently implemented for discrete random variables.
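The mechanism described above can be sketched with the interval transform PyMC applies to bounded variables. This is a minimal standalone sketch; the function names are illustrative and not the PyMC API:

```python
import math

# Interval transform: maps x in (a, b) to an unconstrained z on the real line.
# The sampler works with z, then maps it back to x when evaluating the model.
def forward(x, a, b):
    # log-odds of the position of x within (a, b)
    return math.log((x - a) / (b - x))

def backward(z, a, b):
    # inverse map: a + (b - a) * sigmoid(z), always lands inside (a, b)
    return a + (b - a) / (1.0 + math.exp(-z))

x = 0.3
z = forward(x, 0.0, 1.0)
assert abs(backward(z, 0.0, 1.0) - x) < 1e-12  # round trip recovers x
assert 0.0 < backward(12.3, 0.0, 1.0) < 1.0    # any real z maps into (0, 1)
```

Because `backward` maps every real number into the open interval, the sampler never proposes a point outside the support of the variable.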

1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -61,6 +61,7 @@
}
# fmt: on
numpydoc_xref_aliases = {
"Variable": ":class:`~pytensor.graph.basic.Variable`",
"TensorVariable": ":class:`~pytensor.tensor.TensorVariable`",
"RandomVariable": ":class:`~pytensor.tensor.random.RandomVariable`",
"ndarray": ":class:`~numpy.ndarray`",
204 changes: 152 additions & 52 deletions pymc/model/core.py
@@ -582,20 +582,37 @@ def compile_logp(
sum: bool = True,
**compile_kwargs,
) -> PointFunc:
"""Compiled log probability density function.

The function expects as input a dictionary with the same structure as self.initial_point()
"""Compiled joint log-probability density of the model or joint log-probability contributions.

Parameters
----------
vars : list of random variables or potential terms, optional
Compute the gradient with respect to those variables. If None, use all
free and observed random variables, as well as potential terms in model.
jacobian : bool
Whether to include jacobian terms in logprob graph. Defaults to True.
sum : bool
Whether to sum all logp terms or return elemwise logp for each variable.
Defaults to True.
vars : Variable, sequence of Variable or None, default None
Random variables or potential terms whose contribution to logp is to be included.
If None, use all basic (free or observed) variables and potentials defined in the model.
jacobian : bool, default True
If True, add Jacobian contributions associated with automatic variable transformations,
Member:
Needs some rewording.

Author:
Anything specific?

Member:
Include a Jacobian correction term for transformed variables?

Member:
I don't think we should explain why, especially because the why is context-specific. Maybe you don't want it because you want to do optimization on the constrained space, using unconstrained variables.

Author:
It is the goal of this PR to give a concise and clear description of what `jacobian=True` does, so this bit is central. I am trying to share my perspective as an outsider to make the docstrings friendlier to users who are not used to PyMC parlance.

> I don't think we should explain why

Here I am explaining not so much "why" as "what". The phrase "Jacobian correction term" might sound precise and unquestionable to people working on PyMC or specialising in Bayesian modelling, but to someone coming from a slightly different field it is very ambiguous and uninformative. Which Jacobian? Why correction? There are so many Jacobians, it is not clear which one is meant. Any gradient, such as Model.dlogp, is also a Jacobian (neglecting the differential-geometry subtleties of co- and contra-variance), and one would naturally think that the Jacobian refers to dlogp, because d2logp is called a Hessian in the same file. Further, what is meant by "correction"? Who made the error that needs to be corrected? Or is it like the predictor-corrector optimisation methods used in engineering? They use first and second derivatives, so it sounds about right... Frankly, I have never seen the change-of-variables Jacobian determinant being referred to as a "correction", so it is the last connection I would make.

I am trying to explain what jacobian does in a way that does not allow misinterpretations. I am not trying to convince the reader why they should use it.

I thought about using "which" here:

jacobian : bool, default True
    If True, add Jacobian contributions associated with automatic variable transformations,
    which make the result the true density of transformed random variables.
    See :py:mod:`pymc.distributions.transforms` for details.

but then it is not clear if "which" refers to "transformations" (incorrect) or to "contributions" (correct). This is why I chose "so that" as a conjunction -- it refers to "add", i.e. the purpose of the parameter jacobian.

Author (@avm19, Mar 25, 2026):

> They may not be automatic, the user can override behavior with `transform=x` when defining a variable.

They are still automatic in the sense that you don't have to change variables in the model definition. Maybe it would be correct to say that all these transformations are automatic, which includes both the default transformations and the user-specified transformations? Here is a relevant quote:

If a random variable has a ``default_transform`` and an additional transform
is provided through the ``transform`` parameter, PyMC will automatically
create an instance of the :class:`Chain` transform that applies the
user-provided transform on top of the default one.

> true thing still, doesn't sound quite right.

Oh, I misunderstood your previous message, I thought you were referring to somewhat redundant "If True ...". Let me think if there are ways to re-phrase "true density".

Author:

> I'm not sure about the true thing still, doesn't sound quite right

You are right, "true density" doesn't sound right, neither does "proper", and I haven't found anything better. How about:

jacobian : bool, default True
If True, add Jacobian contributions associated with automatic variable transformations,
so that the result is the log-probability density of transformed random variables.
    See :py:mod:`pymc.distributions.transforms` for details.

That is, if I convinced you that user-specified and default transforms are all "automatic". Otherwise, remove the word "automatic".

Member:

Agreed that "true" is not right; it makes a subjective judgement where none exists. You can see an example here of a case where applying it leads to the "false" logp. It applies a change-of-variables correction implied by the model's transformation, if one exists. I would write a docstring that says something like this:

jacobian : bool, default True
     If True, include the change-of-variables correction implied by each random variable's transformation, if applicable. This correction accounts for the distortion of probability mass induced by the application of a non-linear function to a random variable. See :py:mod:`pymc.distributions.transforms` for details.

Author:

> You can see an example here of a case where applying it leads to the "false" logp.

After a cursory reading, I tend to think that the confusion there was between $\ln p_{X_t, Y_t, Z_t, ...}(x_t,y_t,z_t,...)$ and $\ln p_{X,Y,Z,...}(x(x_t),y(y_t),z(z_t),...)$. Both are functions of transformed variables, and both are "true" logps. But the former is the log density of the transformed RVs, and the latter is the log density of the original RVs. To sample transformed variables one would use $p_{X_t, Y_t, Z_t}(...)$, and to optimise one would use $p_{X,Y,Z}(...)$. I alluded to this in the conversation and in my suggested version. "True" was a bad choice (it may also suggest ground truth vs approximation). But I think "density of transformed random variables" is a good part, which pushes the reader in the right direction.

Maybe we should also add that with jacobian=False the result is "density of untransformed variables"? (I am like 95% sure, but I can check the maths and get a toy example)

Member (@ricardoV94, Mar 26, 2026):

(I tend to think of transformed/untransformed as unconstrained/constrained, but that's giving an interpretation of why the transform was applied)

> Maybe we should also add that with jacobian=False the result is "density of untransformed variables"? (I am like 95% sure, but I can check the maths and get a toy example)

I would maybe try to emphasize that this function always takes a point in transformed space and "untransforms" it to evaluate the logprob of each random variable in its "natural"/original space. What's optional is whether we add the terms corresponding to this transformation.

This is not a specific suggestion, just sharing how I think mechanistically about it.
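The mechanistic view above can be put into a toy numeric sketch (standalone code, not the PyMC API): evaluate an Exponential(1) logp at the untransformed point, optionally adding the log-det-Jacobian of the log transform.

```python
import math

def logp_x(x):
    # Exponential(1) log-density on (0, inf)
    return -x

def logp_y(y, jacobian=True):
    # y lives in the transformed (unconstrained) space; untransform first
    x = math.exp(y)
    lp = logp_x(x)
    if jacobian:
        lp += y  # log|dx/dy| = log(exp(y)) = y
    return lp

# With the Jacobian term, exp(logp_y) is a proper density of the
# transformed variable Y = log X: it integrates to ~1 over the real line.
step = 0.01
total = sum(math.exp(logp_y(-10.0 + i * step)) * step for i in range(2000))
assert abs(total - 1.0) < 1e-2
```

Without the term, exp(logp_y(y, jacobian=False)) = exp(-e^y) tends to 1 as y goes to minus infinity, so it is not a density of Y at all; it is the density of X merely evaluated at the untransformed point, which matches the distinction drawn above.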

so that the result is the true density of transformed random variables.
See :py:mod:`pymc.distributions.transforms` for details.
sum : bool, default True
If True, return the sum of the relevant logp terms as a single Variable.
If False, return a list of logp terms corresponding to `vars`.
**compile_kwargs : dict
Extra arguments passed to :meth:`self.compile_fn() <Model.compile_fn>`.

Returns
-------
PointFunc
The function expects as input a dictionary with the same structure as
:meth:`self.initial_point() <Model.initial_point>`.

See Also
--------
:py:meth:`logp` :
log-probability density as a Variable (in a symbolic form).
:py:meth:`compile_dlogp` :
gradient of log-probability density as a compiled function.
:py:meth:`compile_d2logp` :
Hessian of log-probability density as a compiled function.
"""
compile_kwargs.setdefault("on_unused_input", "ignore")
return self.compile_fn(
@@ -610,18 +627,34 @@ def compile_dlogp(
jacobian: bool = True,
**compile_kwargs,
) -> PointFunc:
"""Compiled log probability density gradient function.

The function expects as input a dictionary with the same structure as self.initial_point()

"""Compiled gradient of the joint log-probability density of the model.

Parameters
----------
vars : list of random variables or potential terms, optional
Compute the gradient with respect to those variables. If None, use all
free and observed random variables, as well as potential terms in model.
jacobian : bool
Whether to include jacobian terms in logprob graph. Defaults to True.
vars : Variable, sequence of Variable or None, default None
Compute the gradient with respect to values of these variables.
If None, use all continuous free (unobserved) variables defined in the model.
jacobian : bool, default True
If True, add Jacobian contributions associated with automatic variable transformations,
so that the result is the true density of transformed random variables.
See :py:mod:`pymc.distributions.transforms` for details.
**compile_kwargs : dict
Extra arguments passed to :meth:`self.compile_fn() <Model.compile_fn>`.

Returns
-------
PointFunc
The function expects as input a dictionary with the same structure as
:meth:`self.initial_point() <Model.initial_point>`.

See Also
--------
:py:meth:`dlogp` :
gradient of log-probability density as a Variable (in a symbolic form).
:py:meth:`compile_logp` :
log-probability density as a compiled function.
:py:meth:`compile_d2logp` :
Hessian of log-probability density as a compiled function.
"""
compile_kwargs.setdefault("on_unused_input", "ignore")
return self.compile_fn(
@@ -637,17 +670,36 @@ def compile_d2logp(
negate_output=True,
**compile_kwargs,
) -> PointFunc:
"""Compiled log probability density hessian function.

The function expects as input a dictionary with the same structure as self.initial_point()
"""Compiled Hessian of the joint log-probability density of the model.

Parameters
----------
vars : list of random variables or potential terms, optional
Compute the gradient with respect to those variables. If None, use all
free and observed random variables, as well as potential terms in model.
jacobian : bool
Whether to include jacobian terms in logprob graph. Defaults to True.
vars : Variable, sequence of Variable or None, default None
Compute the gradient with respect to values of these variables.
If None, use all continuous free (unobserved) variables defined in the model.
jacobian : bool, default True
If True, add Jacobian contributions associated with automatic variable transformations,
so that the result is the true density of transformed random variables.
See :py:mod:`pymc.distributions.transforms` for details.
negate_output : bool, default True
If True, change the sign of the output and return the opposite of the Hessian.
**compile_kwargs : dict
Extra arguments passed to :meth:`self.compile_fn() <Model.compile_fn>`.

Returns
-------
PointFunc
The function expects as input a dictionary with the same structure as
:meth:`self.initial_point() <Model.initial_point>`.

See Also
--------
:py:meth:`d2logp` :
Hessian of log-probability density as a Variable (in a symbolic form).
:py:meth:`compile_logp` :
log-probability density as a compiled function.
:py:meth:`compile_dlogp` :
gradient of log-probability density as a compiled function.
"""
compile_kwargs.setdefault("on_unused_input", "ignore")
return self.compile_fn(
@@ -662,22 +714,46 @@ def logp(
jacobian: bool = True,
sum: bool = True,
) -> Variable | list[Variable]:
"""Elemwise log-probability of the model.
"""Joint log-probability density of the model or joint log-probability contributions.

Parameters
----------
vars : list of random variables or potential terms, optional
Compute the gradient with respect to those variables. If None, use all
free and observed random variables, as well as potential terms in model.
jacobian : bool
Whether to include jacobian terms in logprob graph. Defaults to True.
sum : bool
Whether to sum all logp terms or return elemwise logp for each variable.
Defaults to True.
vars : Variable, sequence of Variable or None, default None
Random variables or potential terms whose contribution to logp is to be included.
If None, use all basic (free or observed) variables and potentials defined in the model.
jacobian : bool, default True
If True, add Jacobian contributions associated with automatic variable transformations,
so that the result is the true density of transformed random variables.
See :py:mod:`pymc.distributions.transforms` for details.
sum : bool, default True
If True, return the sum of the relevant logp terms as a single Variable.
If False, return a list of logp terms corresponding to `vars`.

Returns
-------
Logp graph(s)
Variable or list of Variable

See Also
--------
:py:meth:`compile_logp` :
log-probability density as a compiled function.
:py:meth:`dlogp` :
gradient of log-probability density as a Variable (in a symbolic form).
:py:meth:`d2logp` :
Hessian of log-probability density as a Variable (in a symbolic form).
:py:meth:`logp_dlogp_function` :
compile logp and its gradient as a single function.
:py:attr:`varlogp` :
convenience property for logp of all free (unobserved) RVs.
:py:attr:`varlogp_nojac` :
convenience property for logp of all free (unobserved) RVs without transformation
corrections.
:py:attr:`observedlogp` :
convenience property for logp of all observed RVs.
:py:attr:`potentiallogp` :
convenience property for all additional logp terms (potentials).
:py:attr:`point_logps` :
convenience property for numerical evaluation of local logps at a point.
"""
varlist: list[TensorVariable]
if vars is None:
@@ -742,19 +818,30 @@ def dlogp(
vars: Variable | Sequence[Variable] | None = None,
jacobian: bool = True,
) -> Variable:
"""Gradient of the models log-probability w.r.t. ``vars``.
"""Gradient of the joint log-probability density of the model.
Member:

"Density" is a strong word; the model may be discrete or a mix.

Author:

Technically, a probability mass function is a probability density (i.e. a Radon-Nikodym derivative) with respect to the counting measure, in contrast to a "regular" PDF of an absolutely continuous variable, which is wrt the standard Lebesgue-Borel measure; and so the mixed case should also be a density wrt a certain product measure.

I agree that people might find this confusing, and some may even assume that discrete variables are marginalised out or not allowed at all in this context. On the other hand, I used the word "density" because Jacobians arise only in (absolutely) continuous variables; if it were not a density, there would be no Jacobians.

Member:

We do consistently use the term log-probability, though. Yes, Jacobian corrections don't exist for discrete variables; I don't think that's what's going to make the jacobian kwarg confusing to understand here.

Author:

Actually, the word "density" was there before me in compile_logp, compile_dlogp, compile_d2logp only. I only put it everywhere else for consistency.

"""Compiled log probability density gradient function.

Should I make it "log-probability" (hyphenated, noun) and remove "density" from everywhere (except jacobian parameter where "density" is relevant)?


Parameters
----------
vars : list of random variables or potential terms, optional
Compute the gradient with respect to those variables. If None, use all
free and observed random variables, as well as potential terms in model.
jacobian : bool
Whether to include jacobian terms in logprob graph. Defaults to True.
vars : Variable, sequence of Variable or None, default None
Compute the gradient with respect to values of these variables.
If None, use all continuous free (unobserved) variables defined in the model.
jacobian : bool, default True
If True, add Jacobian contributions associated with automatic variable transformations,
so that the result is the true density of transformed random variables.
See :py:mod:`pymc.distributions.transforms` for details.

Returns
-------
dlogp graph
Variable

See Also
--------
:py:meth:`compile_dlogp` :
gradient of log-probability density as a compiled function.
:py:meth:`logp` :
log-probability density as a Variable (in a symbolic form).
:py:meth:`d2logp` :
Hessian of log-probability density as a Variable (in a symbolic form).
"""
if vars is None:
value_vars = self.continuous_value_vars
@@ -782,19 +869,32 @@ def d2logp(
jacobian: bool = True,
negate_output=True,
) -> Variable:
"""Hessian of the models log-probability w.r.t. ``vars``.
"""Hessian of the joint log-probability density of the model.

Parameters
----------
vars : list of random variables or potential terms, optional
Compute the gradient with respect to those variables. If None, use all
free and observed random variables, as well as potential terms in model.
jacobian : bool
Whether to include jacobian terms in logprob graph. Defaults to True.
vars : Variable, sequence of Variable or None, default None
Compute the gradient with respect to values of these variables.
If None, use all continuous free (unobserved) variables defined in the model.
jacobian : bool, default True
If True, add Jacobian contributions associated with automatic variable transformations,
so that the result is the true density of transformed random variables.
See :py:mod:`pymc.distributions.transforms` for details.
negate_output : bool, default True
If True, change the sign of the output and return the opposite of the Hessian.

Returns
-------
d²logp graph
Variable

See Also
--------
:py:meth:`compile_d2logp` :
Hessian of log-probability density as a compiled function.
:py:meth:`logp` :
log-probability density as a Variable (in a symbolic form).
:py:meth:`dlogp` :
gradient of log-probability density as a Variable (in a symbolic form).
"""
if vars is None:
value_vars = self.continuous_value_vars