Toy studies and inference#

While a single fit is useful, it does not say a lot about the uncertainty of the result and whether the fit is biased in any way or not.

Many statistical methods, such as obtaining sWeights, (Feldman and Cousins) Confidence Interval, setting limits and more are all covered it the hepstats library, which work directly with zfit parts.

For other toy studies, models offer a sampler function that can be used for repeated sampling.

Playing with toys: Multiple samplings#

The method create_sampler() returns a sampler that can be used like a Data object (e.g. for building a ZfitLoss). The sampling itself is not yet done but only when resample() is invoked. The sample generated depends on the original pdf at this point, e.g. parameters have the value they have when the resample() is invoked. To have certain parameters fixed, they have to be specified either on create_sampler() via fixed_params, on resample() by specifying which parameter will take which value via param_values or by changing the attribute of Sampler.

Reusing the model, obs and parameters from Building a model, this is typically how toys look like:

import os
os.environ["ZFIT_DISABLE_TF_WARNINGS"] = "1"

import zfit
from zfit import z
import numpy as np

obs = zfit.Space('x', limits=(4800, 6000))

mu1 = zfit.Parameter("mu1", 1.)
sigma1 = zfit.Parameter("sigma1", 1.)
gauss1 = zfit.pdf.Gauss(obs=obs, mu=mu1, sigma=sigma1)

mu2 = zfit.Parameter("mu2", 1.)
sigma2 = zfit.Parameter("sigma2", 1.)
# using the previous gaussians and obs to create a model

gauss3 = zfit.pdf.Gauss(obs=obs, mu=mu2, sigma=sigma2)
model = zfit.pdf.SumPDF([gauss1, gauss3], fracs=0.4)

sampler = model.create_sampler(n=1000,fixed_params=True)
nll = zfit.loss.UnbinnedNLL(model=model, data=sampler)

minimizer = zfit.minimize.Minuit()

results = []
nruns = 5
for run_number in range(nruns):
    # initialize the parameters randomly
    sampler.resample()  # now the resampling gets executed

    sigma1.set_value(abs(np.random.normal()) + 0.5)

    result = minimizer.minimize(nll)

    # safe the result, collect the values, calculate errors...

Here we fixed all parameters as they have been initialized and then sample. If we do not provide any arguments to resample, this will always sample now from the distribution with the parameters set to the values when the sampler was created.

To give another, though not very useful example:

# create a model depending on mu1, sigma1, mu2, sigma2

sampler = model.create_sampler(n=1000, fixed_params=[mu1, mu2])
nll = zfit.loss.UnbinnedNLL(model=model, data=sampler)

sampler.resample()  # now it sampled

# do something with nll
minimizer.minimize(nll)  # minimize

# note that the nll, being dependent on ``sampler``, also changed!

The sample is now resampled with the current values (minimized values) of sigma1, sigma2 and with the initial values of mu1, mu2 (because they have been fixed).

We can also specify the parameter values explicitly by using the following argument. Reusing the example above

sampler.resample(param_values={sigma1: 5})

The sample (and therefore also the sample the nll depends on) is now sampled with sigma1 set to 5.

If some parameters are constrained to values observed from external measurements, usually Gaussian constraints, then sampling of the observed values might be needed to obtain an unbiased sample from the model. Example:

# same model depending on mu1, sigma1, mu2, sigma2

constraint = zfit.constraint.GaussianConstraint(params=[sigma1, sigma2],
                                                observation=[1.0, 0.5],
                                                uncertainty=[0.1, 0.05])

n_samples = 5

sampler = model.create_sampler(n=n_samples, fixed_params=[mu1, mu2])
nll = zfit.loss.UnbinnedNLL(model=model, data=sampler, constraints=constraint)

constr_values = constraint.sample(n=n_samples)

for constr_params, constr_vals in constr_values.items():
    # do something with nll, temporarily assigning values to the parameters
    with zfit.param.set_values(constr_params, constr_vals):
        minimizer.minimize(nll)  # minimize