Toy studies and inference¶
While a single fit is useful, it does not say a lot about the uncertainty of the result and whether the fit is biased in any way or not.
Many statistical methods, such as obtaining sWeights, (Feldman and Cousins) Confidence Interval, setting limits and more are all covered it the hepstats library, which work directly with zfit parts.
For other toy studies, models offer a sampler function that can be used for repeated sampling.
Playing with toys: Multiple samplings¶
The method create_sampler()
returns a sampler that can be used
like a Data
object (e.g. for building a ZfitLoss
).
The sampling itself is not yet done but only when resample()
is
invoked. The sample generated depends on the original pdf at this point, e.g. parameters have the
value they have when the resample()
is invoked. To have certain
parameters fixed, they have to be specified either on create_sampler()
via fixed_params
, on resample()
by specifying which parameter
will take which value via param_values
or by changing the attribute of Sampler
.
Reusing the model, obs and parameters from Building a model, this is how typically toys look like:
import os
os.environ["ZFIT_DISABLE_TF_WARNINGS"] = "1"
import zfit
from zfit import z
import numpy as np
obs = zfit.Space('x', limits=(4800, 6000))
mu1 = zfit.Parameter("mu1", 1.)
sigma1 = zfit.Parameter("sigma1", 1.)
gauss1 = zfit.pdf.Gauss(obs=obs, mu=mu1, sigma=sigma1)
mu2 = zfit.Parameter("mu2", 1.)
sigma2 = zfit.Parameter("sigma2", 1.)
# using the previous gaussians and obs to create a model
gauss3 = zfit.pdf.Gauss(obs=obs, mu=mu2, sigma=sigma2)
model = zfit.pdf.SumPDF([gauss1, gauss3], fracs=0.4)
sampler = model.create_sampler(n=1000,fixed_params=True)
nll = zfit.loss.UnbinnedNLL(model=model, data=sampler)
minimizer = zfit.minimize.Minuit()
results = []
nruns = 5
for run_number in range(nruns):
# initialize the parameters randomly
sampler.resample() # now the resampling gets executed
mu1.set_value(np.random.normal())
sigma1.set_value(abs(np.random.normal()) + 0.5)
result = minimizer.minimize(nll)
results.append(result)
# safe the result, collect the values, calculate errors...
Here we fixed all parameters as they have been initialized and then sample. If we do not provide any
arguments to resample
, this will always sample now from the distribution with the parameters set to the values when
the sampler was created.
To give another, though not very useful example:
# create a model depending on mu1, sigma1, mu2, sigma2
sampler = model.create_sampler(n=1000, fixed_params=[mu1, mu2])
nll = zfit.loss.UnbinnedNLL(model=model, data=sampler)
sampler.resample() # now it sampled
# do something with nll
minimizer.minimize(nll) # minimize
sampler.resample()
# note that the nll, being dependent on ``sampler``, also changed!
The sample is now resampled with the current values (minimized values) of sigma1
, sigma2
and with
the initial values of mu1
, mu2
(because they have been fixed).
We can also specify the parameter values explicitly by using the following argument. Reusing the example above
sigma1.set_value(np.random.normal())
sampler.resample(param_values={sigma1: 5})
The sample (and therefore also the sample the nll
depends on) is now sampled with sigma1
set to 5.
If some parameters are constrained to values observed from external measurements, usually Gaussian constraints, then sampling of the observed values might be needed to obtain an unbiased sample from the model. Example:
# same model depending on mu1, sigma1, mu2, sigma2
constraint = zfit.constraint.GaussianConstraint(params=[sigma1, sigma2],
observation=[1.0, 0.5],
uncertainty=[0.1, 0.05])
n_samples = 5
sampler = model.create_sampler(n=n_samples, fixed_params=[mu1, mu2])
nll = zfit.loss.UnbinnedNLL(model=model, data=sampler, constraints=constraint)
constr_values = constraint.sample(n=n_samples)
for constr_params, constr_vals in constr_values.items():
sampler.resample()
# do something with nll, temporarily assigning values to the parameters
with zfit.param.set_values(constr_params, constr_vals):
minimizer.minimize(nll) # minimize