Upgrade guide to 0.20#
With version 0.20, zfit prepares for a more stable and user-friendly interface. This guide will help you to upgrade your code to the new version and demonstrate the most significant changes. It is meant for people who are already familiar with zfit and want to upgrade their code.
Not all changes are everywhere reflected in the docs, help with the docs is welcome as well as adding a more polished PDF. (See all issues with contributions wanted or reach out to us on Mattermost, Gitter, GitHub, or e-mail if you’re interested in becoming a contributor, from beginners to advanced).
import numpy as np
# standard imports
import zfit
import zfit.z.numpy as znp # use this "numpy-like" for mathematical operations
import zfit_physics as zphys # physics module, with new physics-inspired PDFs
from zfit import z
/home/docs/checkouts/readthedocs.org/user_builds/zfit/checkouts/latest/.venv/lib/python3.12/site-packages/zfit/__init__.py:60: UserWarning: TensorFlow warnings are by default suppressed by zfit. In order to show them, set the environment variable ZFIT_DISABLE_TF_WARNINGS=0. In order to suppress the TensorFlow warnings AND this warning, set ZFIT_DISABLE_TF_WARNINGS=1.
warnings.warn(
2024-11-30 18:42:05.113219: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1732992125.132994 2310 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1732992125.138749 2310 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
@z.function
def maximum(x, y):
return znp.maximum(x, y)
# example usage of the numpy-like backend, use it if possible
print(f"sqrt with np:{np.sqrt(maximum(1., 2.))}, with z:{znp.sqrt(maximum(1., 2.))}")
print(f"vectorized: {np.sqrt(maximum(znp.array([1., 2.]), znp.array([3., 4.])))}")
sqrt with np:1.4142135623730951, with z:1.4142135623730951
vectorized: [1.73205081 2. ]
2024-11-30 18:42:09.777294: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:152] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Parameters#
The largest news comes from parameters: the NameAlreadyTakenError
is gone (!). Multiple parameters with the same name can now be created and co-exist. The only limit is: they must not exist within the same PDF/loss etc., as this would lead to ambiguities.
param1 = zfit.Parameter("param1", 1, 0, 10)
param1too = zfit.Parameter("param1", 2, 0, 10)
# no error!
Labels#
Many objects, including parameters, can now have a label. This label is purely human-readable and can be used for plotting, printing, etc. It can contain arbitrary characters.
The name
of objects still exists and will in a future version probably be used for identification purpose (in conjunction with serialization). Therefore, use label
for human-readable names and avoid name
for that purpose.
param1again = zfit.Parameter("param1", 3, 0, 10, label=r"$param_1$ [GeV$^2$]")
Space#
As explained in the github discussion thread there are major improvements and changes.
multispaces (adding two spaces, having disjoint observables) are now deprecated and will be removed. The new
TruncatedPDF
supports multiple spaces as limits and achieves a similar, if not better, functionality.the return type of
Space.limit
will be changed in the future. For forwards compatibility, useSpace.v1.limit
(or similar methods) instead ofSpace.limit
. The old one is still available viaSpace.v0.limit
.new ways of creating spaces
More examples on how to use the new spaces.
obs1 = zfit.Space("obs1", -1, 1) # no tuple needed anymore
obs2 = zfit.Space("obs2", lower=-1, upper=1, label="observable two")
# create a space with multiple observables
obs12 = obs1 * obs2
# limits are now as one would naively expect, and area has been renamed to volume (some are tensors, but that doesn't matter: they behave like numpy arrays)
print(f"old: limits={obs12.v0.limits}")
print(f"new: limits={obs12.v1.limits}") # we have just 1D arrays
old: limits=(array([[-1., -1.]]), array([[1., 1.]]))
new: limits=(<tf.Tensor: shape=(2,), dtype=float64, numpy=array([-1., -1.])>, <tf.Tensor: shape=(2,), dtype=float64, numpy=array([1., 1.])>)
# this allows, for example, for a more intuitive way
np.linspace(*obs12.v1.limits, num=7)
array([[-1. , -1. ],
[-0.66666667, -0.66666667],
[-0.33333333, -0.33333333],
[ 0. , 0. ],
[ 0.33333333, 0.33333333],
[ 0.66666667, 0.66666667],
[ 1. , 1. ]])
Data#
Data handling has been significantly simplified and streamlined.
all places (or most) take directly
numpy
arrays, tensors orpandas DataFrame
as input, using aData
object is not necessary anymore (but convenient, as it cuts the data to the space)Data
objects can now be created without the specific constructors (e.g.zfit.Data.from_pandas
), but directly with the data. The constructors are still available for convenience and for more options.Data
objects are now stateless and offer insteadwith
-methods to change the data. For example,with_obs
,with_weights
(can beNone
to have without weights) etc.The
SamplerData
has been overhauld and has now a more public API including aupdate_data
method that allows to change the data without creating a new object and without relying on acreate_sampler
method from a PDF.zfit.data.concat
has been added to concatenate data objects, similar topd.concat
.
data1_obs1 = zfit.Data(np.random.uniform(0, 1, 1000), obs=obs1)
data1_obs2 = zfit.Data(np.random.uniform(0, 1, 1000), obs=obs2, label="My favourite $x$")
data2_obs1 = zfit.Data(np.random.normal(0, 1, 1000), obs=obs1)
# similar like pd.concat
data_obs12 = zfit.data.concat([data1_obs1, data1_obs2], axis="columns")
data_obs1 = zfit.data.concat([data1_obs1, data2_obs1], axis="index")
# data can be accessed with "obs" directly
data_obs12["obs1"] # returns a numpy array shape (n,)
data_obs12[["obs2", "obs1"]] # returns a numpy array shape (n, 2)
<tf.Tensor: shape=(1000, 2), dtype=float64, numpy=
array([[0.33590007, 0.9542226 ],
[0.39612457, 0.78468156],
[0.15094887, 0.09979714],
...,
[0.03894517, 0.95923108],
[0.24036664, 0.12277354],
[0.31331769, 0.47680337]])>
Binning#
Using a binned space has now the effect of creating a binned version. This happens for Data
and PDF
objects.
obs1_binned = obs1.with_binning(12)
data_binned = zfit.Data(np.random.normal(0, 0.2, 1000), obs=obs1_binned)
PDFs#
there are a plethora of new PDFs, mostly covering physics inspired use-cases. Amongst the interesting ones are a
GeneralizedCB
, a more general version of theDoubleCB
that should be preferred in the future. A Voigt profile is available, Bernstein polynomials, QGauss, GaussExpTail, etc. and in zfit-physics HEP specific PDFS , fromCMSShape
,Cruijff
,Novosibirsk
and more.the
TruncatedPDF
has been added to allow for a more flexible way of truncating a PDF. Any PDF can be converted to a truncated version usingto_truncated
(which, by default, truncates to the limits of the space).PDFs have a new
plot
method that allows for a quick plotting of the PDF (it takes an “obs” argument that allows to simply project it!). This is still experimental and may changes, the main purpose is to allow for a quick check of the PDF in interactive environments. The function is fully compatible with matplotlib and takes anax
argument, it also allows to pass through any keyword arguments to the plotting function.
# all the new PDFs
print(zfit.pdf.__all__)
['BasePDF', 'BaseFunctor', 'Exponential', 'Voigt', 'CrystalBall', 'DoubleCB', 'GeneralizedCB', 'GaussExpTail', 'GeneralizedGaussExpTail', 'Gauss', 'GeneralizedGauss', 'BifurGauss', 'Uniform', 'TruncatedGauss', 'WrapDistribution', 'Cauchy', 'Poisson', 'QGauss', 'ChiSquared', 'StudentT', 'Gamma', 'JohnsonSU', 'Bernstein', 'Chebyshev', 'Legendre', 'Chebyshev2', 'Hermite', 'Laguerre', 'RecursivePolynomial', 'ProductPDF', 'SumPDF', 'GaussianKDE1DimV1', 'KDE1DimGrid', 'KDE1DimExact', 'KDE1DimFFT', 'KDE1DimISJ', 'FFTConvPDFV1', 'ConditionalPDFV1', 'ZPDF', 'SimplePDF', 'SimpleFunctorPDF', 'UnbinnedFromBinnedPDF', 'BinnedFromUnbinnedPDF', 'HistogramPDF', 'SplineMorphingPDF', 'BinwiseScaleModifier', 'BinnedSumPDF', 'BaseBinnedFunctorPDF', 'BaseBinnedPDF', 'SplinePDF', 'TruncatedPDF', 'LogNormal', 'CachedPDF']
print(zphys.pdf.__all__)
['Argus', 'RelativisticBreitWigner', 'CMSShape', 'Cruijff', 'ErfExp', 'Novosibirsk', 'Tsallis']
# create a PDF
pdf = zfit.pdf.Gauss(
mu=zfit.Parameter("mu", 1.2), sigma=param1again, obs=obs1, extended=1000
) # using an extended PDF, the truncated pdf automatically rescales
pdf.plot.plotpdf() # quick plot
# truncate it
pdf_truncated = pdf.to_truncated(limits=(-0.7, 0.1))
pdf_truncated.plot.plotpdf()
/tmp/ipykernel_2310/1765407134.py:5: ExperimentalFeatureWarning: The function <function assert_initialized.<locals>.wrapper at 0x7f8e82ebd300> is EXPERIMENTAL, potentially unstable and likely to change in the future! Use it with caution and feedback (Gitter, Mattermost, e-mail or https://github.com/zfit/zfit/issues) is very welcome!
pdf.plot.plotpdf() # quick plot
<Axes: xlabel='obs1', ylabel='Extended probability density'>
# a truncated PDF can also have multiple limits, replacing the MultiSpace concept
range1 = zfit.Space('obs1', -0.9, -0.1, label="lower range") # an excellent use-case of a label
range2 = zfit.Space('obs1', 0.3, 0.5, label="upper range")
pdf_disjoint = pdf.to_truncated(limits=(range1, range2))
pdf_disjoint.plot.plotpdf()
<Axes: xlabel='obs1', ylabel='Extended probability density'>
# binned pdfs from space work like the data
gauss_binned = zfit.pdf.Gauss(mu=zfit.Parameter("mu", 2.5), sigma=param1again, obs=obs1_binned, extended=1000)
/home/docs/checkouts/readthedocs.org/user_builds/zfit/checkouts/latest/.venv/lib/python3.12/site-packages/zfit/models/tobinned.py:78: AdvancedFeatureWarning: Either you're using an advanced feature OR causing unwanted behavior. To turn this warning off, use `zfit.settings.advanced_warnings['extend_wrapped_extended'] = False` or 'all' (use with care) with `zfit.settings.advanced_warnings['all'] = False
PDF Gauss Gauss is already extended, but extended also given 1000. Will use the given yield.
warn_advanced_feature(
# as mentioned before, PDFs can be evaluated directly on numpy arrays or pandas DataFrames
pdf.pdf(data_obs1.to_pandas())
<tf.Tensor: shape=(1667,), dtype=float64, numpy=
array([0.54824178, 0.54483858, 0.51430953, ..., 0.44047785, 0.51147179,
0.49626985])>
Loss and minimizer#
They stay mostly the same (apart from improvements behind the scenes and bugfixes).
Losses take now directly the data and the model, without the need of a Data
object (if the data is already cut to the space).
To use the automatic gradient, set gradient="zfit"
in the minimizer. This can speed up the minimization for larger fits.
Updated params#
The minimizer currently updates the parameter default values after each minimization. To keep this behavior, add update_params()
call after the minimization.
(experimentally, the update can be disabled with zfit.run.experimental_disable_param_update(True)
, this will probably be the default in the future. Pay attention that using this experimental features most likely breaks current scripts. Feedback on this new feature is highly welcome!)
loss = zfit.loss.ExtendedUnbinnedNLL(model=pdf, data=data2_obs1.to_pandas())
minimizer = zfit.minimize.Minuit(
gradient="zfit"
) # to use the automatic gradient -> can fail, but can speed up the minimization
result = minimizer.minimize(loss).update_params()
pdf.plot.plotpdf(full=False) # plot only the pdf, no labels
<Axes: >
Result#
The result has more usage for setting parameters, the update_params
method is now available and can be used to set the parameters to the minimum as seen above.
It can also be used in a context manager to temporarily set the parameters to the minimum and restore them afterwards.
param1again.set_value(1.5)
with result:
print(f"param1 set temporarily to {param1again}")
print(f"param1 is now {param1again} again")
param1 set temporarily to <zfit.Parameter 'param1' floating=True value=0.9023>
param1 is now <zfit.Parameter 'param1' floating=True value=1.5> again
# or to set the parameters to the minimum
zfit.param.set_values(result) # supports also a dict of {param: value}!
print(param1again)
<zfit.Parameter 'param1' floating=True value=0.9023>
Serialization#
The result can now be pickled like any object in zfit!
(this was not possible before, only after calling freeze
on the result)
This works directly using dill
(a library that extends pickle
), but can fail if the garbage collector is not run. Therefore, zfit provides a slightly modified dill
that can work as a drop-in replacement.
The objects can be saved and loaded again and used as before. Ideal to store the minimum of a fit and use it later for statistical treatments, plotting, etc.
result_serialized = zfit.dill.dumps(result)
result_deserialized = zfit.dill.loads(result_serialized)
# the result can be used as before
result_deserialized.hesse() # the default name is now "hesse" and not "minuit_hesse"
result_deserialized.errors() # the default name is now "errors" and not "minuit_minos"
print(result_deserialized)
FitResult
of
<ExtendedUnbinnedNLL model=[<zfit.<class 'zfit.models.dist_tfp.Gauss'> params=[mu, param1]] data=[<zfit.Data: Data obs=('obs1',) shape=(667, 1)>] constraints=[]>
with
<Minuit Minuit tol=0.001>
╒═════════╤═════════════╤══════════════════╤═════════╤══════════════════════════════╕
│ valid │ converged │ param at limit │ edm │ approx. fmin (full | opt.) │
╞═════════╪═════════════╪══════════════════╪═════════╪══════════════════════════════╡
│
True
│ True
│ False
│ 2.1e-05 │ 519.15 | 10009.44 │
╘═════════╧═════════════╧══════════════════╧═════════╧══════════════════════════════╛
Parameters
name value (rounded) hesse errors at limit
------ ------------------ ----------- ------------------- ----------
mu 0.00324798 +/- 0.059 - 0.06 + 0.061 False
param1 0.902278 +/- 0.1 - 0.088 + 0.12 False
Parameters as arguments#
The values of the parameters can now be directly used as arguments in functions of PDFs/loss. Methods in the pdf also take the parameters as arguments.
The name of the argument has to match the name of the parameter given in initialization (or can also be the parameter itself).
from matplotlib import pyplot as plt
x = znp.linspace(*obs1.v1.limits, 1000)
plt.plot(x, pdf.pdf(x, params={"param1": 1.5}), label="param1=1.5")
plt.plot(x, pdf.pdf(x, params={param1again: 2.5}), label="param1=2.5")
plt.legend()
<matplotlib.legend.Legend at 0x7f8e9fad3da0>
import tqdm
# similar for the loss
param1dict = result_deserialized.params[param1again]
param1min = param1dict["value"]
lower, upper = param1dict["errors"]["lower"], param1dict["errors"]["upper"]
x = np.linspace(param1min + 2 * lower, param1min + 2 * upper, 50)
y = []
param1again.floating = False # not minimized
for x_i in tqdm.tqdm(x):
param1again.set_value(x_i)
minimizer.minimize(loss).update_params() # set nuisance parameters to minimum
y.append(loss.value())
plt.plot(x, y)
0%| | 0/50 [00:00<?, ?it/s]
2%|▏ | 1/50 [00:00<00:16, 3.02it/s]
8%|▊ | 4/50 [00:00<00:04, 10.41it/s]
14%|█▍ | 7/50 [00:00<00:02, 15.06it/s]
20%|██ | 10/50 [00:00<00:02, 18.05it/s]
26%|██▌ | 13/50 [00:00<00:01, 20.03it/s]
32%|███▏ | 16/50 [00:00<00:01, 21.29it/s]
38%|███▊ | 19/50 [00:01<00:01, 22.27it/s]
44%|████▍ | 22/50 [00:01<00:01, 22.97it/s]
50%|█████ | 25/50 [00:01<00:01, 23.47it/s]
56%|█████▌ | 28/50 [00:01<00:00, 23.83it/s]
62%|██████▏ | 31/50 [00:01<00:00, 24.08it/s]
68%|██████▊ | 34/50 [00:01<00:00, 24.02it/s]
74%|███████▍ | 37/50 [00:01<00:00, 24.16it/s]
80%|████████ | 40/50 [00:01<00:00, 24.29it/s]
86%|████████▌ | 43/50 [00:02<00:00, 24.35it/s]
92%|█████████▏| 46/50 [00:02<00:00, 24.33it/s]
98%|█████████▊| 49/50 [00:02<00:00, 24.34it/s]
100%|██████████| 50/50 [00:02<00:00, 21.38it/s]
[<matplotlib.lines.Line2D at 0x7f8e79722ae0>]
# a result can also be used as argument for PDFs or, here, for losses
loss_before = loss.value()
loss_min = loss.value(params=result_deserialized) # evaluate at minimum
print(f"loss before (random from before): {loss_before:.7} minimum value: {loss_min:.7} vs {result_deserialized.fmin:.7}")
loss before (random from before): 520.5985 minimum value: 519.1466 vs 519.1466
# creating a PDF looks also different, but here we use the name of the parametrization and the axis (integers)
class MyGauss2D(zfit.pdf.ZPDF):
_PARAMS = ("mu", "sigma")
_N_OBS = 2
@zfit.supports() # this allows the params magic
def _unnormalized_pdf(self, x, params):
x0 = x[0] # this means "axis 0"
x1 = x[1] # this means "axis 1"
mu = params["mu"]
sigma = params["sigma"]
return znp.exp(-0.5 * ((x0 - mu) / sigma) ** 2) * x1 # fake, just for demonstration
gauss2D = MyGauss2D(mu=0.8, sigma=param1again, obs=obs12, label="2D Gaussian$^2$")
gauss2D.plot.plotpdf(obs="obs1") # we can project the 2D pdf to 1D
<Axes: xlabel='obs1', ylabel='Probability density'>