# Custom models

All elements of zfit are built to be easily customized. Models in particular offer many possibilities for user implementations; in the end, regardless of how many models a library provides and how many use-cases were considered, there is always one that was not thought of. High flexibility is therefore a crucial aspect.

This has disadvantages: the more freedom a model takes for itself, the fewer optimizations are potentially available. In practice, this is usually not noticeable.

## Creating a model

Following the philosophy of zfit, there are different levels of customization. For the simplest use-case, all we need to do is provide a function describing the shape and the names of the parameters. This can be done by overriding `_unnormalized_pdf`.

To implement a mathematical function in zfit, TensorFlow or `z` should be used. The latter is a subset of TensorFlow that improves it in some aspects, such as automatic dtype casting, and is therefore the preferred choice.
(*There are other ways to use arbitrary Python functions; they will be discussed later on*).

```
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import zfit
from zfit import z
```

We can start with a simple model and implement a custom second order polynomial. To do so, we need to inherit from the right base class; the simpler one is `ZPDF`.

For a minimal example, we need to override only `_unnormalized_pdf` and specify a list of parameters.

`_unnormalized_pdf` gets (currently) one argument, `x`. This is a zfit `Data` object and should first be unstacked. If it is one dimensional, such as here, unstacking returns a single Tensor; otherwise it returns a list of Tensors that can directly be unpacked.

```
class SecondOrderPoly(zfit.pdf.ZPDF):
    """Second order polynomial `1 + b * x + c * x^2`"""
    _PARAMS = ['b', 'c']  # specify which parameters to take

    def _unnormalized_pdf(self, x):  # implement function, unnormalized
        data = z.unstack_x(x)
        b = self.params['b']
        c = self.params['c']
        return 1 + b * data + c * data ** 2
```

Note that we *deliberately* omitted any attempt to normalize the function, as this is usually done over a specific range. Also, no analytic sampling or integration has to be provided. The model handles all of this internally and we have the full functionality available.

First, we can instantiate the model:

```
obs = zfit.Space("obs1", limits=(-4, 4))
b = zfit.Parameter('b', 0.2, 0.1, 10)
custom_poly = SecondOrderPoly(obs=obs, b=b, c=1.4)
```

which lets us now fully access all the main methods of a model:

```
integral = custom_poly.integrate(limits=(-1, 2))
sample = custom_poly.sample(n=1000)
prob = custom_poly.pdf(sample)
print(f"integral={integral}, sample={sample}, prob={prob[:10]}")
```

```
integral=[0.11072835], sample=<zfit.Data: Data obs=('obs1',)>, prob=[0.19685551 0.03987646 0.28011154 0.03947442 0.22590297 0.31998154
0.20846688 0.05501554 0.06118631 0.25485279]
```

### What happened?

The model tries to use analytical functions for integration and sampling *if available*, otherwise (as happened above), it falls back to the numerical methods. To improve our model, we can add an analytic integral, a common use case. This has to be the *integral over the _unnormalized_pdf*.

```
# define the integral function
def cdf_poly(limit, b, c):
    return limit + 0.5 * b * limit ** 2 + 1 / 3 * c * limit ** 3


def integral_func(limits, norm_range, params, model):
    b = params['b']
    c = params['c']
    lower, upper = limits.limit1d
    lower = z.convert_to_tensor(lower)  # the limits are now 1-D, for axis 1
    upper = z.convert_to_tensor(upper)
    # calculate the integral
    integral = cdf_poly(upper, b, c) - cdf_poly(lower, b, c)
    print("Integral called")
    return integral


# define the space over which it is defined. Here, we use the axes
integral_limits = zfit.Space(axes=(0,), limits=(zfit.Space.ANY, zfit.Space.ANY))
SecondOrderPoly.register_analytic_integral(func=integral_func, limits=integral_limits)
```
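As a quick sanity check of the antiderivative used in `cdf_poly`, the following plain-Python sketch compares it against a simple numerical integration. It involves neither zfit nor TensorFlow, and `midpoint_integral` and `unnormalized_poly` are hypothetical helpers introduced only for this illustration. Assuming the values `b=0.2` and `c=1.4` and the range `(-4, 4)` from the model instantiated earlier, dividing by the integral over the full range should reproduce the normalized integral of roughly `0.1107` printed before.

```python
def unnormalized_poly(x, b, c):
    """Same shape as SecondOrderPoly._unnormalized_pdf, but for plain floats."""
    return 1 + b * x + c * x ** 2


def cdf_poly(limit, b, c):
    """Antiderivative of the unnormalized polynomial (same formula as above)."""
    return limit + 0.5 * b * limit ** 2 + 1 / 3 * c * limit ** 3


def midpoint_integral(func, lower, upper, n=100_000):
    """Hypothetical helper: midpoint-rule integration, for illustration only."""
    width = (upper - lower) / n
    return sum(func(lower + (i + 0.5) * width) for i in range(n)) * width


b, c = 0.2, 1.4
# integral of the unnormalized shape over (-1, 2), analytic and numerical
analytic = cdf_poly(2, b, c) - cdf_poly(-1, b, c)
numeric = midpoint_integral(lambda x: unnormalized_poly(x, b, c), -1, 2)
# the normalization is the integral over the full observable range (-4, 4)
normalization = cdf_poly(4, b, c) - cdf_poly(-4, b, c)
normalized_integral = analytic / normalization
print(analytic, numeric, normalized_integral)
```

The analytic and the numerical value agree closely, which is the property a registered analytic integral has to fulfil: it must integrate the *unnormalized* shape, while the division by the normalization is left to the model.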

```
poly2 = SecondOrderPoly(obs=obs, b=b, c=1.2)
```

```
integral_analytic = custom_poly.integrate(limits=(-1, 2))
sample = custom_poly.sample(n=1000)
prob_analytic = custom_poly.pdf(sample)
print(f"integral={integral_analytic}, sample={sample}, prob={prob_analytic[:10]}")
```

```
Integral called
Integral called
Integral called
```

## Multiple dimensions and parameters with angular observables

So far, we used rather simple examples, and many basic shapes, such as polynomials, already have an efficient implementation within zfit. Therefore, we will now create a three dimensional PDF measuring the angular observables of a \(B^+ \rightarrow K^* l l\) decay.

The implementation is not “special” or complicated at all, it rather shows how to deal with multiple dimensions and how to manage several parameters. It was created using the equation of the angular observables (taken from a paper).

*Many thanks to Rafael Silva Coutinho for the implementation!*

```
class AngularPDF(zfit.pdf.ZPDF):
    """Full d4Gamma/dq2dOmega for Bd -> Kst ll (l=e,mu)

    Angular distribution obtained in the total PDF (using LHCb convention JHEP 02 (2016) 104)
    i.e. the valid range of the angles is given for
        - phi: [-pi, pi]
        - theta_K: [0, pi]
        - theta_l: [0, pi]

    The function is normalized over a finite range and therefore a PDF.

    Args:
        FL (`zfit.Parameter`): Fraction of longitudinal polarisation of the Kst
        S3 (`zfit.Parameter`): A_perp^2 - A_para^2 / A_zero^2 + A_para^2 + A_perp^2 (L, R)
        S4 (`zfit.Parameter`): RE(A_zero*^2 * A_para^2) / A_zero^2 + A_para^2 + A_perp^2 (L, R)
        S5 (`zfit.Parameter`): RE(A_zero*^2 * A_perp^2) / A_zero^2 + A_para^2 + A_perp^2 (L, R)
        AFB (`zfit.Parameter`): Forward-backward asymmetry of the di-lepton system (also i.e. 3/4 * S6s)
        S7 (`zfit.Parameter`): IM(A_zero*^2 * A_para^2) / A_zero^2 + A_para^2 + A_perp^2 (L, R)
        S8 (`zfit.Parameter`): IM(A_zero*^2 * A_perp^2) / A_zero^2 + A_para^2 + A_perp^2 (L, R)
        S9 (`zfit.Parameter`): IM(A_perp*^2 * A_para^2) / A_zero^2 + A_para^2 + A_perp^2 (L, R)
        obs (`zfit.Space`):
        name (str):
        dtype (tf.DType):
    """
    _PARAMS = ['FL', 'S3', 'S4', 'S5', 'AFB', 'S7', 'S8', 'S9']
    _N_OBS = 3

    def _unnormalized_pdf(self, x):
        FL = self.params['FL']
        S3 = self.params['S3']
        S4 = self.params['S4']
        S5 = self.params['S5']
        AFB = self.params['AFB']
        S7 = self.params['S7']
        S8 = self.params['S8']
        S9 = self.params['S9']

        # one Tensor per observable, in the order of the observables of the `obs` space
        costheta_l, costheta_k, phi = z.unstack_x(x)

        sintheta_k = tf.sqrt(1.0 - costheta_k * costheta_k)
        sintheta_l = tf.sqrt(1.0 - costheta_l * costheta_l)

        sintheta_2k = (1.0 - costheta_k * costheta_k)
        sintheta_2l = (1.0 - costheta_l * costheta_l)

        sin2theta_k = (2.0 * sintheta_k * costheta_k)
        cos2theta_l = (2.0 * costheta_l * costheta_l - 1.0)
        sin2theta_l = (2.0 * sintheta_l * costheta_l)

        pdf = ((3.0 / 4.0) * (1.0 - FL) * sintheta_2k +
               FL * costheta_k * costheta_k +
               (1.0 / 4.0) * (1.0 - FL) * sintheta_2k * cos2theta_l +
               -1.0 * FL * costheta_k * costheta_k * cos2theta_l +
               S3 * sintheta_2k * sintheta_2l * tf.cos(2.0 * phi) +
               S4 * sin2theta_k * sin2theta_l * tf.cos(phi) +
               S5 * sin2theta_k * sintheta_l * tf.cos(phi) +
               (4.0 / 3.0) * AFB * sintheta_2k * costheta_l +
               S7 * sin2theta_k * sintheta_l * tf.sin(phi) +
               S8 * sin2theta_k * sin2theta_l * tf.sin(phi) +
               S9 * sintheta_2k * sintheta_2l * tf.sin(2.0 * phi))

        return pdf
```
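The PDF receives the *cosines* of the polar angles as observables and rebuilds every other trigonometric term from them. The following plain-Python sketch (standard library only, with an arbitrarily chosen test angle) checks the identities that the helper variables above rely on. It assumes, as stated in the docstring, that the polar angle lies in `[0, pi]`, so that the sine is non-negative and the square root picks the right sign.

```python
import math

theta = 0.7  # arbitrary test angle in radians, inside [0, pi]
cos_t = math.cos(theta)

# same constructions as in `_unnormalized_pdf`, but with plain floats
sin_t = math.sqrt(1.0 - cos_t * cos_t)  # sin(theta), valid since sin >= 0 on [0, pi]
sin_squared_t = 1.0 - cos_t * cos_t     # sin^2(theta)
sin_2t = 2.0 * sin_t * cos_t            # sin(2 * theta)
cos_2t = 2.0 * cos_t * cos_t - 1.0      # cos(2 * theta)

print(math.isclose(sin_t, math.sin(theta)),
      math.isclose(sin_squared_t, math.sin(theta) ** 2),
      math.isclose(sin_2t, math.sin(2 * theta)),
      math.isclose(cos_2t, math.cos(2 * theta)))
```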

### Multidimensional Spaces

This PDF now expects multidimensional data. Therefore, we need to provide a `Space` in multiple dimensions. The preferred way is to use the product operation to build this space from one dimensional `Space`s:

```
costheta_k = zfit.Space('costheta_k', (-1, 1))
costheta_l = zfit.Space('costheta_l', (-1, 1))
phi = zfit.Space('phi', (-np.pi, np.pi))
angular_obs = costheta_k * costheta_l * phi
```

### Managing parameters

Luckily, we’re in Python, which provides many tools out-of-the-box. Handling parameters in a `dict` can make things very easy, even for several parameters as here.

```
params_init = {'FL': 0.43, 'S3': -0.1, 'S4': -0.2, 'S5': -0.4, 'AFB': 0.343, 'S7': 0.001, 'S8': 0.003, 'S9': 0.002}
params = {name: zfit.Parameter(name, val, -1, 1) for name, val in params_init.items()}
angular_pdf = AngularPDF(obs=angular_obs, **params)
```

```
integral_analytic = angular_pdf.integrate(limits=angular_obs)  # this should be one
sample = angular_pdf.sample(n=1000)
prob_analytic = angular_pdf.pdf(sample)
print(f"integral={integral_analytic}, sample={sample}, prob={prob_analytic[:10]}")
```

```
Estimated integral error ( 2.4615430669631983e-05 ) larger than tolerance ( 3e-06 ), which is maybe not enough (but maybe it's also fine). You can (best solution) implement an anatytical integral (see examples in repo) or manually set a higher number on the PDF with 'update_integration_options' and increase the 'max_draws' (or adjust 'tol'). If partial integration is chosen, this can lead to large memory consumption.This is a new warning checking the integral accuracy. It may warns too often as it is Work In Progress. If you have any observation on it, please tell us about it: https://github.com/zfit/zfit/issues/new/chooseTo suppress this warning, use zfit.settings.set_verbosity(-1).
```

### Including another observable

We built our angular PDF successfully and can use this 3 dimensional PDF now. If we want, we could also include another observable, for example the polynomial that we created above, and make it 4 dimensional. Because it’s so simple, let’s do that!

```
full_pdf = angular_pdf * poly2
# equivalently
# full_pdf = zfit.pdf.ProductPDF([angular_pdf, poly2])
```

Done! This PDF is now 4 dimensional, which *had to be* the case, given that the observable of `poly2` is different from the observables of `angular_pdf`. If they coincided, e.g. if `poly2` had the observable `phi`, this would now be a 3 dimensional PDF.

```
print(f"obs angular: {angular_pdf.obs} obs poly:{poly2.obs} obs product: {full_pdf.obs}")
```

```
obs angular: ('costheta_k', 'costheta_l', 'phi') obs poly:('obs1',) obs product: ('costheta_k', 'costheta_l', 'phi', 'obs1')
```
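How the dimensionality of the product follows from the observables can be sketched in plain Python. This is only an illustration of the idea, not zfit's actual implementation, and `combine_obs` is a hypothetical helper: observables are collected in order, and a name that already occurred does not add a new dimension.

```python
def combine_obs(*obs_tuples):
    """Hypothetical helper: merge observable names, keeping order, dropping repeats."""
    combined = []
    for obs in obs_tuples:
        for name in obs:
            if name not in combined:
                combined.append(name)
    return tuple(combined)


angular = ('costheta_k', 'costheta_l', 'phi')

# different observable: the product gains a dimension
four_dim = combine_obs(angular, ('obs1',))
# coinciding observable: the product stays three dimensional
three_dim = combine_obs(angular, ('phi',))
print(len(four_dim), four_dim)
print(len(three_dim), three_dim)
```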

## What happened *exactly*?

The model tries to be as smart as possible and calls the most explicit function first. Then it falls back to alternatives and uses, whenever possible, the analytic version (if available), otherwise a numerical one.

The rule, simplified: public (sanitizes input and) calls […] private. So e.g. `pdf` calls `_pdf` and, if this is not provided, it uses a fallback that may not be optimized but is general enough to work.

The rule, extended (in its current implementation): public calls a series of well defined methods and hooks before it calls the private method. These intermediate methods *can* be used; they mostly catch certain cases automatically and handle them for us.

**To remember**: in order to have full control over a public function such as `integrate`, `pdf`, `sample` or `normalization`, the private method, e.g. `_integrate`, can be overridden and is *guaranteed* to be called before other possibilities.

In the case above, `pdf` first called `_pdf` (which is not implemented), so it called `_unnormalized_pdf` and divided the result by the `normalization`. The latter does not have an explicit implementation (`_normalization`) either, so it used the fallback and called `integrate` over the `norm_range`. Since `_integrate` is not provided, the fallback tried to perform an analytic integral, which was not available. Therefore, it integrated the `_unnormalized_pdf` numerically. In all of these calls, we can hook in by overriding the mentioned methods.
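The public-calls-private rule with fallbacks can be illustrated with a toy class in plain Python. This is a deliberately simplified sketch and not how zfit is implemented: `ToyModel`, `ToyPoly` and `ToyPolyAnalytic` are hypothetical names, and the real chain contains more hooks. The public method tries the most explicit private method first and only falls back when that one is not provided.

```python
class ToyModel:
    """Hypothetical base class mimicking the public -> private fallback idea."""

    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper

    # private hooks, meant to be overridden by subclasses
    def _pdf(self, x):
        raise NotImplementedError

    def _unnormalized_pdf(self, x):
        raise NotImplementedError

    def _normalization(self):
        raise NotImplementedError

    # public methods: try the most explicit hook first, then fall back
    def pdf(self, x):
        try:
            return self._pdf(x)
        except NotImplementedError:
            return self._unnormalized_pdf(x) / self.normalization()

    def normalization(self, n=100_000):
        try:
            return self._normalization()
        except NotImplementedError:
            # numerical fallback: midpoint rule over the normalization range
            width = (self.upper - self.lower) / n
            return sum(self._unnormalized_pdf(self.lower + (i + 0.5) * width)
                       for i in range(n)) * width


class ToyPoly(ToyModel):
    """Only the shape is given, everything else uses the fallbacks."""

    def _unnormalized_pdf(self, x):
        return 1 + x ** 2


class ToyPolyAnalytic(ToyPoly):
    """Additionally provides an explicit normalization, which takes precedence."""

    def _normalization(self):
        def antiderivative(t):
            return t + t ** 3 / 3
        return antiderivative(self.upper) - antiderivative(self.lower)


numeric_model = ToyPoly(0.0, 1.0)
analytic_model = ToyPolyAnalytic(0.0, 1.0)
print(numeric_model.pdf(0.5), analytic_model.pdf(0.5))
```

Both models return practically the same value; the second one simply never reaches the numerical integration because its explicit `_normalization` is found first.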

What we did not mention: `ZPDF` is just a wrapper around the actual `BasePDF` that should be preferred in general; it simply provides a convenient `__init__`. For the next example, we will implement a multidimensional PDF and use the custom `__init__`.

### Overriding `pdf`

Before, we used `_unnormalized_pdf`, which is the common use-case. Even if we want to add an analytic integral, we can register it, or do fancier things like overriding the `_normalization`. We can, however, also get full control over what our model outputs by directly overriding `_pdf`. The signature contains not only `x` but additionally `norm_range` (named `norm` in the example below). This can have no limits (`norm_range.has_limits` is False), in which case the “unnormalized pdf” is requested. Otherwise, `norm_range` can have different limits and we have to take care of the proper normalization.

This is usually not needed and inside zfit, all PDFs are implemented using `_unnormalized_pdf`.

Therefore, it mostly provides a possibility to implement *whatever* is wanted: any unforeseen use-case, any kind of hack to “just try out something”.

```
class CustomPDF(zfit.pdf.BasePDF):
    """My custom pdf with three parameters.
    """

    def __init__(self, param1, param2, param3, obs, name="CustomPDF", ):
        # we can now do complicated stuff here if needed
        # only thing: we have to specify explicitly here what is which parameter
        params = {'super_param': param1,  # we can change/compose etc parameters
                  'param2': param2, 'param3': param3}
        super().__init__(obs, params, name=name)

    @zfit.supports(norm=True)
    def _pdf(self, x, norm):
        data = z.unstack_x(x)
        param1 = self.params['super_param']
        param2 = self.params['param2']
        param3 = self.params['param3']

        # just an arbitrary function
        probs = 42 * param1 + (data * param3) ** param2
        return probs
```

In a similar manner, other methods can be overridden as well. We won’t go into further details here, as this is a rather advanced task. Furthermore, if stability is a large concern or such special cases need to be implemented, it is recommended to get in contact with the developers and share the idea.

### Composed PDFs

So far, we only looked at creating a model that depends on parameters and data but did not include other models. This is crucial to create, for example, sums or products of PDFs. Instead of inheriting from `BasePDF`, we can use the `BaseFunctor`, which contains a mixin that handles daughter PDFs correctly.

The main difference is that we can now provide a list of PDFs that our model depends on. There can still be parameters (as for example the `fracs` for the sum) that describe the behavior of the models, but they can also be omitted (e.g. for the product). *Sidenote: technically, a normal `BasePDF` can of course also have no parameters; however, since this is a constant function without dependencies, this will rarely be used in practice.*

```
class SquarePDF(zfit.pdf.BaseFunctor):
    """Example of a functor pdf that takes the square of a single PDF.

    DEMONSTRATION PURPOSE ONLY, DO **NOT** USE IN REAL CASE.
    """

    def __init__(self, pdf1, name="SquarePDF"):
        pdfs = [pdf1]  # we could have more of course, e.g. for sums
        # no need for parameters here, so we can omit it
        obs = pdf1.space
        super().__init__(pdfs=pdfs, obs=obs, name=name)

    def _unnormalized_pdf(self, x):
        # we do not need to unstack x here as we want to feed it directly to the pdf1
        pdf1 = self.pdfs[0]
        return pdf1.pdf(x) ** 2
```

```
squarepdf = SquarePDF(pdf1=poly2)
```

```
squarepdf.integrate(limits=(-2, 3.2))
```

```
Integral called
Integral called
```

```
<tf.Tensor: shape=(1,), dtype=float64, numpy=array([0.22241748])>
```

```
sample_square = squarepdf.sample(n=1000)
sample_square
```

```
Integral called
```

```
<zfit.core.data.SampleData at 0x7fdf29cc1e20>
```

```
squarepdf.pdf(sample_square)[:10]
```

```
Integral called
Integral called
```

```
<tf.Tensor: shape=(10,), dtype=float64, numpy=
array([0.18764492, 0.03313204, 0.37926908, 0.45790753, 0.28057677,
0.22386374, 0.04256745, 0.02233529, 0.07492064, 0.24992454])>
```

## …and now?

We’ve implemented a custom PDF, maybe spent quite some time fine-tuning and debugging it, and added an integral. And now? Time to make it available to others: zfit-physics. This repository is meant for community contributions. It has fewer requirements for contributions than the zfit core and a low threshold. Core devs can provide you with help and you can provide the community with a PDF.

Make an issue or a PR, everything is welcome!

### Mixing with pure Python

Whenever possible, it is preferable to write everything in TensorFlow. It is possible to mix it with pure Python, but this loses many of the benefits that TensorFlow provides. To do so:

- try to use `z.py_function` or `tf.py_function` to wrap pure Python code
- if you write something and want to make sure it is run in eager mode, use `zfit.run.assert_executing_eagerly()`. This way, your function won’t be compiled silently; an error is raised instead.
- set the graph mode and numerical gradient accordingly

```
x_tf = z.constant(42.)


def sqrt(x):
    return np.sqrt(x)


y = z.py_function(func=sqrt, inp=[x_tf], Tout=tf.float64)
```

```
/home/docs/checkouts/readthedocs.org/user_builds/zfit/envs/latest/lib/python3.8/site-packages/zfit/z/zextension.py:351: AdvancedFeatureWarning: Either you're using an advanced feature OR causing unwanted behavior. To turn this warning off, use `zfit.settings.advanced_warnings['py_func_autograd'] = False` or 'all' (use with care) with `zfit.settings.advanced_warnings['all'] = False
Using py_function without numerical gradients. If the Python code does not contain any parametrization by `zfit.Parameter` or similar, this can work out. Otherwise, in case it depends on those, you may want to set `zfit.run.set_autograd_mode(=False)`.
warn_advanced_feature(
```

This raises a warning: since we do not use pure TensorFlow anymore, the automatic gradient (potentially) fails, as it cannot be traced through Python operations. Depending on the use-case, this is not a problem. That’s why the warning is an `AdvancedFeatureWarning`: it doesn’t say that what we’re doing is wrong, it simply warns that we should know what we’re doing; it can also be switched off as explained in the warning.

Changing the global gradient setting is technically not always required: if we e.g. use the internal, numerical gradient of a minimizer such as Minuit, the global setting does not really matter anyway.

This strongly follows the zfit philosophy that there *must* not be any bounds in terms of flexibility and even hackability of the framework; this should be an inherent part of it. However, the user should be made aware when leaving “the safe path”.

To do what the above warning told us to do, we can use `zfit.run.set_autograd_mode(False)`.

This is needed whenever we want to use non-traceable Python calls in the dynamic calculations, be it by using `py_function` or be it by switching off the gradient mode as shown below.
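Why a numerical gradient copes with arbitrary Python code can be seen from a generic central finite-difference sketch: it only *evaluates* the function and never needs to trace through it. The code below is plain Python and purely illustrative; `numerical_gradient` and `loss` are hypothetical names, and this is not meant to reflect what zfit or Minuit do internally.

```python
import math


def numerical_gradient(func, point, step=1e-6):
    """Central finite differences: only function evaluations are needed."""
    gradient = []
    for i in range(len(point)):
        shifted_up = list(point)
        shifted_down = list(point)
        shifted_up[i] += step
        shifted_down[i] -= step
        gradient.append((func(shifted_up) - func(shifted_down)) / (2 * step))
    return gradient


def loss(params):
    """Arbitrary pure Python function of two parameters (illustration only)."""
    mu, sigma = params
    return math.log(sigma) + 0.5 * (1.0 - mu) ** 2 / sigma ** 2


grad = numerical_gradient(loss, [0.5, 2.0])
# analytic derivatives for comparison
expected = [-(1.0 - 0.5) / 2.0 ** 2, 1 / 2.0 - (1.0 - 0.5) ** 2 / 2.0 ** 3]
print(grad, expected)
```

The price is one pair of function evaluations per parameter and a limited precision, which is why the automatic gradient is the default whenever the computation can be traced.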

#### Sidestep: What is ‘z’?

This is a subset of TensorFlow, wrapped to improve dtype handling and sometimes even to provide additional functionality, such as the `z.function` decorator.

### Full Python compatibility

To operate in a fully Python-compatible, yet (way) less efficient mode, we can switch off the automatic gradient, as discussed before, and the graph compilation, leaving us with a Numpy-like TensorFlow.

```
zfit.run.set_graph_mode(False)
zfit.run.set_autograd_mode(False)
```

```
<zfit.util.temporary.TemporarilySet at 0x7fdf2821a3a0>
```

We can now build a Gaussian purely based on Numpy. As we have seen when building graphs with TensorFlow: anything Python-like will be converted to a static value in the graph. So we have to make sure that our code is never run in graph mode but only executed eagerly.

This can be done by calling `zfit.run.assert_executing_eagerly()`, which raises an error if this code is run in graph mode.

Note that omitting the graph mode means losing many optimizations: not only do we lose the whole TensorFlow speedup from the graph, we also perform redundant tasks that are not cached, since zfit itself is optimized to be run in graph mode. However, in practice this mode should be needed rather rarely, and its performance is still of the same order of magnitude as that of alternatives.

```
class NumpyGauss(zfit.pdf.ZPDF):
    _PARAMS = ['mu', 'sigma']

    def _unnormalized_pdf(self, x):
        zfit.run.assert_executing_eagerly()  # make sure we're eager
        data = z.unstack_x(x)
        mu = self.params['mu']
        sigma = self.params['sigma']
        return z.convert_to_tensor(np.exp(- 0.5 * (data - mu) ** 2 / sigma ** 2))
```

Make sure to return a Tensor again, otherwise there will be an error.

```
obs = zfit.Space('obs1', (-3, 3))
mu = zfit.Parameter('mu', 0., -1, 1)
sigma = zfit.Parameter('sigma', 1., 0.1, 10)
```

```
gauss_np = NumpyGauss(obs=obs, mu=mu, sigma=sigma)
gauss = zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sigma)
```

```
integral_np = gauss_np.integrate((-1, 0))
integral = gauss.integrate((-1, 0))
print(integral_np, integral)
```

```
tf.Tensor([0.3422688], shape=(1,), dtype=float64)
tf.Tensor([0.3422688], shape=(1,), dtype=float64)
```
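The printed value can be cross-checked by hand with the standard library alone. The sketch below assumes the values used above (`mu=0`, `sigma=1`, observable range `(-3, 3)`, integration limits `(-1, 0)`); `gauss_cdf` is a hypothetical helper built on `math.erf`. Since the PDF is normalized over the observable range, the integral is the Gaussian probability between the limits divided by the probability inside `(-3, 3)`, which comes out at approximately `0.34227`.

```python
import math


def gauss_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# probability between the integration limits (-1, 0)
integral_part = gauss_cdf(0.0) - gauss_cdf(-1.0)
# probability inside the observable range (-3, 3), used as normalization
normalization = gauss_cdf(3.0) - gauss_cdf(-3.0)
expected = integral_part / normalization
print(expected)
```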