Multidimensional PDFs

This tutorial is about handling multiple dimensions when creating a custom PDF.

The differences from the one-dimensional case are marginal, since the ordering is handled automatically. It is, however, crucial to understand the concept of a Space, most notably obs and axes.

A user (someone who instantiates the PDF) only knows and handles observables. Their relative order does not matter: if a dataset has observables a and b and a pdf has observables b and a, the data will be reordered automatically. Inside a PDF, on the other hand, we do not care about observables at all, only about the ordering of the data, the axes. So any data tensor, and any limits for integration, normalization etc. inside the PDF, is order based and uses axes.
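
The automatic reordering can be pictured with a small plain-Python sketch. It does not use zfit and is not how zfit implements it; the helper `reorder_columns` and the sample values are hypothetical and only show how columns labelled by observables end up in the order a PDF expects.

```python
def reorder_columns(rows, data_obs, pdf_obs):
    """Return rows with columns rearranged from data_obs order to pdf_obs order."""
    if sorted(data_obs) != sorted(pdf_obs):
        raise ValueError("data and pdf must share the same observables")
    # Position of each pdf observable inside the data columns.
    index = [data_obs.index(name) for name in pdf_obs]
    return [tuple(row[i] for i in index) for row in rows]


# Data was recorded with observables ("a", "b"), the pdf was built with ("b", "a").
rows = [(1.0, 10.0), (2.0, 20.0)]
reordered = reorder_columns(rows, data_obs=("a", "b"), pdf_obs=("b", "a"))
print(reordered)  # [(10.0, 1.0), (20.0, 2.0)]
```

After this step, the code inside the PDF can rely on column 0 always being the pdf's first observable, whatever order the data arrived in.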

When passing the observables to the init of the PDF (as a user), each observable is automatically assigned to an axis corresponding to the order of the observables. The crucial point is therefore to communicate to the user which axis corresponds to what. The naming of the observables is completely up to the user, but the order of the observables depends on the pdf. Therefore, the correspondence of each axis to its meaning has to be stated in the docs.
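
As a rough illustration of that assignment (plain Python, not zfit's implementation), the axis of an observable is simply its position in the sequence passed at instantiation. The observable names, the helper `assign_axes` and the `AXIS_MEANING` table are hypothetical; the table stands in for what a PDF author would write in the docs.

```python
# What the PDF author documents: the meaning of each axis (hypothetical example).
AXIS_MEANING = {0: "x component", 1: "y component", 2: "z component"}


def assign_axes(obs):
    """Map each user-chosen observable name to the axis given by its position."""
    return {name: axis for axis, name in enumerate(obs)}


# The user is free to pick any names; only their order carries meaning.
mapping = assign_axes(("px", "py", "pz"))
for name, axis in mapping.items():
    print(f"{name} -> axis {axis} ({AXIS_MEANING[axis]})")
```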

import numpy as np
import zfit
import zfit.z.numpy as znp
from zfit import z

Axes, not obs

Since we create a pdf here, we can now completely forget about observables. We can assume that all the data is axes based (order based). We simply need to write down what each axis means.

An example pdf is implemented below. It calculates the length of a vector shifted by some number (dummy example).

class AbsVectorShifted(zfit.pdf.ZPDF):
    _N_OBS = 3  # dimension, can be omitted
    _PARAMS = ['xshift', 'yshift']  # the name of the parameters

    @zfit.supports()
    def _unnormalized_pdf(self, x, params):
        x0 = x[0]
        x1 = x[1]
        x2 = x[2]
        # alternatively, unstack all columns in one line to get the same result
        # x0, x1, x2 = z.unstack_x(x)  # returns a list with one entry per column
        xshift = params['xshift']
        yshift = params['yshift']
        x0 = x0 + xshift
        x1 = x1 + yshift
        return znp.sqrt(znp.square(x0) + x1 ** 2 + znp.power(x2, 2))  # dummy calculations, all are equivalent
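
To make the formula concrete, the same expression can be evaluated by hand for a single point using only the standard library. This is just a check of the arithmetic with made-up input values; the function name `abs_vector_shifted` is hypothetical and zfit is not involved.

```python
import math


def abs_vector_shifted(x0, x1, x2, xshift, yshift):
    """Length of the vector (x0 + xshift, x1 + yshift, x2)."""
    return math.sqrt((x0 + xshift) ** 2 + (x1 + yshift) ** 2 + x2 ** 2)


# Point (1, 0, 2) with shifts 1 and 2 becomes the vector (2, 2, 2).
value = abs_vector_shifted(1.0, 0.0, 2.0, xshift=1.0, yshift=2.0)
print(value)  # sqrt(12), about 3.4641
```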

Done. Now we can use our pdf already!

xobs = zfit.Space('xobs', (-3, 3))
yobs = zfit.Space('yobs', (-2, 2))
zobs = zfit.Space('z', (-1, 1))
obs = xobs * yobs * zobs

data_np = np.random.random(size=(1000, 3))
data = zfit.data.Data.from_numpy(array=data_np, obs=obs)  # obs is automatically used as limits here.

Create two parameters and an instance of your own pdf:

xshift = zfit.Parameter("xshift", 1.)
yshift = zfit.Parameter("yshift", 2.)
abs_vector = AbsVectorShifted(obs=obs, xshift=xshift, yshift=yshift)

probs = abs_vector.pdf(data)
print(probs[:20])

Estimated integral error ( 9.2569686084903219e-05 ) larger than tolerance ( 3e-06 ), which is maybe not enough (but maybe it's also fine). You can (best solution) implement an anatytical integral (see examples in repo) or manually set a higher number on the PDF with 'update_integration_options' and increase the 'max_draws' (or adjust 'tol'). If partial integration is chosen, this can lead to large memory consumption.This is a new warning checking the integral accuracy. It may warns too often as it is Work In Progress. If you have any observation on it, please tell us about it: https://github.com/zfit/zfit/issues/new/chooseTo suppress this warning, use zfit.settings.set_verbosity(-1).

tf.Tensor(
[0.02356894 0.02467707 0.02311619 0.02353638 0.02005047 0.01834578
 0.02087984 0.02198275 0.02095349 0.02367019 0.02114007 0.01970138
 0.01891328 0.02075711 0.02149213 0.01940187 0.01740966 0.02298603
 0.02279457 0.02171138], shape=(20,), dtype=float64)
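
The integration warning printed before these values appears because, without an analytic integral, the normalization has to be estimated numerically. The following plain-Python sketch mimics that idea with a simple uniform Monte Carlo estimate. It is not zfit's integrator; the function names, sample size and seed are arbitrary assumptions.

```python
import math
import random

LIMITS = [(-3.0, 3.0), (-2.0, 2.0), (-1.0, 1.0)]  # same box as xobs * yobs * zobs


def unnormalized(point, xshift=1.0, yshift=2.0):
    """Same dummy shape as AbsVectorShifted, evaluated for one point."""
    x0, x1, x2 = point
    return math.sqrt((x0 + xshift) ** 2 + (x1 + yshift) ** 2 + x2 ** 2)


def mc_integral(func, limits, n, rng):
    """Estimate the integral of func over the box as volume * mean(func)."""
    volume = math.prod(upper - lower for lower, upper in limits)
    total = 0.0
    for _ in range(n):
        point = [rng.uniform(lower, upper) for lower, upper in limits]
        total += func(point)
    return volume * total / n


rng = random.Random(0)
norm = mc_integral(unnormalized, LIMITS, n=20_000, rng=rng)


def pdf(point):
    """Normalized density: unnormalized value divided by the estimated integral."""
    return unnormalized(point) / norm


print(pdf((0.5, 0.5, 0.5)))
```

Because `norm` is itself an estimate, every pdf value inherits its statistical error, which is the kind of uncertainty the warning reports.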

We could improve our PDF by registering an integral. This requires a few steps:

Define our integral as a function in Python.
Define in which space our integral is valid, e.g. whether it is an integral over all axes or only a partial one, and whether any limit is valid or only special ones (e.g. from -inf to inf).
Register the integral and say whether it supports additional things (e.g. norm_range).

Let’s start defining the function. This takes, for an integral over all axes, three parameters:

limits: the actual limits the integral is over
params: the parameters of the model (which may be needed)
model: the model (pdf/func) itself

We need to calculate the integral and return (currently) a scalar.

def abs_vector_integral_from_any_to_any(limits, params, model):
    lower, upper = limits.v1.limits
    # write your integral here
    return 42.  # dummy integral, must be a scalar!
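
As an illustration of what could replace the dummy value, consider the simplest possible case: a model that is constant and equal to 1, whose integral over a box is just the volume of the box. The sketch below is plain Python with a hypothetical function name and takes the bounds as plain tuples rather than a zfit Space; it is not the integral of AbsVectorShifted.

```python
import math


def constant_integral(lower, upper):
    """Analytic integral of f(x) = 1 over the box [lower, upper]: its volume."""
    return math.prod(up - low for low, up in zip(lower, upper))


# Box matching the limits used earlier: (-3, 3) x (-2, 2) x (-1, 1).
integral = constant_integral(lower=(-3.0, -2.0, -1.0), upper=(3.0, 2.0, 1.0))
print(integral)  # 48.0
```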

Now let’s define the limits. We want to allow an integral over the whole space in three dimensions. This may look cumbersome but is straightforward (and done only once):

limit0 = zfit.Space(axes=0, lower=zfit.Space.ANY_LOWER, upper=zfit.Space.ANY_UPPER)
limit1 = zfit.Space(axes=1, lower=zfit.Space.ANY_LOWER, upper=zfit.Space.ANY_UPPER)
limit2 = zfit.Space(axes=2, lower=zfit.Space.ANY_LOWER, upper=zfit.Space.ANY_UPPER)
limits = limit0 * limit1 * limit2  # creates the 3D limits
print(limits)

<zfit Space obs=None, axes=(0, 1, 2), limits=rectangular, binned=False>

Now that we have created our space, we register the integral. To change the precedence of integrals (e.g. because some are very simple and return a single number, so these special cases should be considered first), a priority argument can be given. If the integral supports multiple limits or norm range calculation, this can also be specified here. Otherwise, this is handled automatically and the integral never receives multiple limits or a norm range (which is why they do not appear in the signature of the integral function).

AbsVectorShifted.register_analytic_integral(func=abs_vector_integral_from_any_to_any, limits=limits,
                                            priority=51,
                                            # both flags are False by default but could be set to True;
                                            # False -> handled automatically
                                            supports_norm_range=False,
                                            supports_multiple_limits=False)
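
The interplay of limits and priority can be pictured with a toy registry. This is a minimal sketch in plain Python, assuming that a larger priority number takes precedence; the names `ANY`, `register_integral` and `find_integral` are hypothetical and do not reflect zfit's internals.

```python
ANY = object()  # hypothetical sentinel: "this bound may take any value"

_registry = []  # list of (priority, limits, func)


def register_integral(func, limits, priority=50):
    """Store an integral together with the limits it is valid for."""
    _registry.append((priority, limits, func))


def _matches(registered, requested):
    """A registered bound matches if it is ANY or equals the requested bound."""
    return len(registered) == len(requested) and all(
        reg is ANY or reg == req
        for reg_pair, req_pair in zip(registered, requested)
        for reg, req in zip(reg_pair, req_pair)
    )


def find_integral(requested):
    """Return the matching integral with the highest priority, or None."""
    candidates = [entry for entry in _registry if _matches(entry[1], requested)]
    if not candidates:
        return None  # a caller would fall back to numerical integration
    return max(candidates, key=lambda entry: entry[0])[2]


def general(limits):
    return "general"


def full_range(limits):
    return "special case"


inf = float("inf")
register_integral(general, limits=[(ANY, ANY)], priority=50)
register_integral(full_range, limits=[(-inf, inf)], priority=51)

print(find_integral([(-inf, inf)])([(-inf, inf)]))  # special case
print(find_integral([(0.0, 1.0)])([(0.0, 1.0)]))    # general
```

Both integrals are valid for the full range, so the priority decides; for any other limits only the general one matches.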

n-dimensional

Advanced Custom PDF

Subclass BasePDF. The _unnormalized_pdf has to be overridden and, in addition, the __init__.

Any of the public main methods (pdf, integrate, partial_integrate etc.) can always be overridden by implementing the method with a leading underscore, e.g. implement _pdf to directly control pdf. The API is the same as that of the public method, without the name argument. If, during execution of your own method, overriding the default turns out to be a bad idea, raising a NotImplementedError will restore the default behavior.
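
The override-and-fall-back pattern can be sketched in plain Python. The classes below are hypothetical and much simpler than BasePDF; they only show how a public method can try the underscore hook first and return to the default path when the hook raises NotImplementedError.

```python
class BaseModel:
    """Hypothetical base class: public methods delegate to underscore hooks."""

    def pdf(self, x):
        try:
            return self._pdf(x)  # subclass override, if any
        except NotImplementedError:
            return self._fallback_pdf(x)  # default behaviour

    def _pdf(self, x):
        raise NotImplementedError  # no override provided

    def _fallback_pdf(self, x):
        return self._unnormalized_pdf(x) / self._norm()

    def _norm(self):
        return 2.0  # placeholder normalization constant

    def _unnormalized_pdf(self, x):
        raise NotImplementedError("subclasses must implement this")


class Custom(BaseModel):
    def _unnormalized_pdf(self, x):
        return abs(x)

    def _pdf(self, x):
        if x < 0:
            # Decide at run time that the shortcut does not apply.
            raise NotImplementedError
        return 0.25  # hypothetical shortcut value


model = Custom()
print(model.pdf(1.0))   # 0.25, from the override
print(model.pdf(-1.0))  # 0.5, from the default path
```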

# TOBEDONE