KDE1DimGrid
- class zfit.pdf.KDE1DimGrid(data, *, bandwidth=None, kernel=None, padding=None, num_grid_points=None, binning_method=None, obs=None, weights=None, name='GridKDE1DimV1')
Bases: zfit.models.kde.KDEHelper, zfit.models.dist_tfp.WrapDistribution
Kernel Density Estimation is a non-parametric method to approximate the density of given points.
For a more in-depth explanation, see the section about Kernel Density Estimation, Grid KDEs.
\[f_h(x) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big)\]
- Parameters
data (ztyping.XTypeInput) –
Data sample to approximate the density from. The points represent the positions of the kernels, the \(x_i\). This is preferably a ZfitData, but can also be an array-like object.
If the data has weights, they will be taken into account. This will change the count of the events, whereas the weight \(w_i\) of \(x_i\) will scale the value of \(K_i(x_i)\), resulting in a factor of \(\frac{w_i}{\sum_j w_j}\).
If no weights are given, each kernel will be scaled by the same constant \(\frac{1}{n_{data}}\).
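The weighting described above can be illustrated with a minimal pure-Python sketch of the KDE sum. It assumes a Gaussian kernel; the helper names `gaussian_kernel` and `kde` are made up for this illustration and are not part of zfit.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, bandwidth, weights=None):
    """Evaluate sum_i (w_i / sum_j w_j) * K((x - x_i) / h) / h."""
    if weights is None:
        weights = [1.0] * len(data)  # equal weights reduce to the 1/n factor
    total = sum(weights)
    return sum(
        (w / total) * gaussian_kernel((x - xi) / bandwidth) / bandwidth
        for xi, w in zip(data, weights)
    )

data = [0.5, 1.0, 1.2, 2.5]
unweighted = kde(1.0, data, bandwidth=0.4)
# Giving a point weight 2 has the same effect as listing it twice.
weighted = kde(1.0, data, bandwidth=0.4, weights=[1.0, 2.0, 1.0, 1.0])
duplicated = kde(1.0, data + [1.0], bandwidth=0.4)
```

In this sketch a weight of 2 on a point is indistinguishable from a duplicated event, which is one way to read "this will change the count of the events".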
bandwidth (ztyping.ParamTypeInput | str | Callable | None) –
Valid pre-defined options are {‘silverman’, ‘scott’, ‘adaptive_zfit’, ‘adaptive_geom’}.
Bandwidth of the kernel, often also denoted as \(h\). For a Gaussian kernel, this corresponds to sigma. This can be calculated using pre-defined options or by specifying a numerical value that is broadcastable to data: a scalar or an array-like object with the same size as data.
A scalar value is usually referred to as a global bandwidth, while an array holds local bandwidths.
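As a rough illustration of global versus local bandwidths, the sketch below computes a rule-of-thumb bandwidth and evaluates a Gaussian KDE with either a scalar or a per-point list. The formula is the common textbook form of Silverman's rule; whether zfit's ‘silverman’ option uses exactly this expression is an assumption, and both function names are hypothetical.

```python
import math
import statistics

def silverman_bandwidth(data):
    """Textbook rule of thumb: 0.9 * min(std, IQR / 1.34) * n**(-1/5)."""
    n = len(data)
    sigma = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(sigma, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)

def kde(x, data, bandwidth):
    """bandwidth is a scalar (global) or a list with one value per point (local)."""
    if isinstance(bandwidth, (int, float)):
        bandwidth = [float(bandwidth)] * len(data)  # broadcast the global value
    norm = math.sqrt(2.0 * math.pi)
    return sum(
        math.exp(-0.5 * ((x - xi) / h) ** 2) / (h * norm)
        for xi, h in zip(data, bandwidth)
    ) / len(data)

data = [0.1, 0.4, 0.5, 0.9, 1.1, 1.3, 1.8, 2.6]
h_global = silverman_bandwidth(data)
density_global = kde(1.0, data, h_global)
# Local bandwidths: here simply wider kernels for the two outermost points.
h_local = [2.0 * h_global] + [h_global] * 6 + [2.0 * h_global]
density_local = kde(1.0, data, h_local)
```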
kernel (tfd.Distribution) –
The kernel is the heart of the Kernel Density Estimation, which consists of the sum of kernels around each sample point. Therefore, a kernel should represent the probability distribution of a single data point as closely as possible.
The most widespread kernel is a Gaussian, or Normal, distribution. Due to the central limit theorem, the sum of many (arbitrary) random variables results in a Gaussian distribution; this is the case for most real-world observables, as they are the result of multiple consecutive random effects. However, there are many cases where this assumption does not hold. In these cases an alternative kernel may offer a better choice.
Valid choices are callables that return a Distribution, such as all distributions that belong to the loc-scale family.
padding (callable | str | bool | None) –
KDEs have a peculiar weakness: the boundaries, as the outside has a zero density. This makes the KDE go down at the boundary as well, as the density approaches zero, no matter what the density inside the boundary was.
There are two ways to circumvent this problem:
- the best solution: providing a larger dataset than the default space the PDF is used in
- mirroring the existing data at the boundaries, which is equivalent to a boundary condition with a zero derivative. This is a padding technique and can improve the boundaries. However, one important drawback to keep in mind is that this will actually alter the PDF to look mirrored. If the PDF is plotted in a larger range, this becomes clear.
Possible options are a number (default 0.1) giving the fraction of the overall space whose data is mirrored on each side. For example, for a space from 0 to 5, a value of 0.3 means that all data in the region of 0 to 1.5 is taken and mirrored around 0, and all data from 3.5 to 5 is taken and mirrored at 5. The new data will go from -1.5 to 6.5, so the KDE also has a shape outside the desired range. Using it only for the range 0 to 5 hides this. Using a dict, each side separately (or only a single one) can be mirrored, like {'lowermirror': 0.1} or {'lowermirror': 0.2, 'uppermirror': 0.1}. For more control, a callable that takes data and weights can also be used.
num_grid_points (int | None) –
Number of points in the binning grid.
The data will be binned using the binning_method in num_grid_points and this histogram grid will then be used as kernel points. This has the advantage that the computational complexity is constant, independent of the data size.
A number from 32 upwards can already yield good results, while the default is set to 1024, creating a fine grid. Lowering the number increases the performance at the cost of accuracy.
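The idea of using the histogram grid as kernel points can be sketched as follows. This is an illustrative pure-Python version with a Gaussian kernel and nearest-point binning, not zfit's implementation; all function names are made up for the example.

```python
import math

def bin_to_grid(data, lower, upper, num_grid_points):
    """Histogram the data onto equally spaced grid points (nearest point)."""
    step = (upper - lower) / (num_grid_points - 1)
    grid = [lower + i * step for i in range(num_grid_points)]
    counts = [0.0] * num_grid_points
    for x in data:
        counts[int((x - lower) / step + 0.5)] += 1.0
    return grid, counts

def gauss(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def exact_kde(x, data, h):
    """One kernel per data point: the cost grows with len(data)."""
    return sum(gauss((x - xi) / h) for xi in data) / (len(data) * h)

def grid_kde(x, grid, counts, h):
    """One kernel per grid point, weighted by its count: the cost is set by the grid."""
    total = sum(counts)
    return sum(c * gauss((x - g) / h) for g, c in zip(grid, counts)) / (total * h)

# Deterministic toy data inside the space [0, 5].
data = [2.5 + math.sin(k) for k in range(5000)]
grid, counts = bin_to_grid(data, 0.0, 5.0, 1024)
points = (1.0, 2.0, 2.5, 3.0, 4.0)
max_diff = max(
    abs(grid_kde(x, grid, counts, 0.3) - exact_kde(x, data, 0.3)) for x in points
)
```

However many events go into `data`, `grid_kde` always sums over the same 1024 grid points, and `max_diff` stays small because each event moves by at most half a grid step.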
binning_method (str | None) –
Method to be used for binning the data. Options are ‘linear’, ‘simple’.
The data can be binned in the usual way (‘simple’), but this is less precise for KDEs, where we are interested in the shape of the histogram and smoothing it. Therefore, a better suited method, ‘linear’, is available.
In normal binning, each event (or weight) falls into the bin within the bin edges, while the neighbouring bins get zero counts from this event. In linear binning, the event is split between two bins, proportional to its closeness to each bin.
The ‘linear’ method provides superior performance, most notably in small (~32) grids.
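The difference between the two binning methods can be made concrete with a small pure-Python sketch. It assumes equally spaced grid points that include both edges of the space; how zfit places its grid is not specified here, and both function names are hypothetical.

```python
def simple_binning(data, lower, upper, num_grid_points):
    """Each event contributes a full count to its nearest grid point."""
    step = (upper - lower) / (num_grid_points - 1)
    counts = [0.0] * num_grid_points
    for x in data:
        counts[int((x - lower) / step + 0.5)] += 1.0
    return counts

def linear_binning(data, lower, upper, num_grid_points):
    """Each event is shared between its two neighbouring grid points."""
    step = (upper - lower) / (num_grid_points - 1)
    counts = [0.0] * num_grid_points
    for x in data:
        pos = (x - lower) / step
        left = min(int(pos), num_grid_points - 2)  # keep left + 1 inside the grid
        frac = pos - left                          # closeness to the right neighbour
        counts[left] += 1.0 - frac
        counts[left + 1] += frac
    return counts

data = [0.0, 1.3, 1.3, 2.5, 4.0]
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
simple = simple_binning(data, 0.0, 4.0, 5)
linear = linear_binning(data, 0.0, 4.0, 5)

true_mean = sum(data) / len(data)
simple_mean = sum(g * c for g, c in zip(grid, simple)) / len(data)
linear_mean = sum(g * c for g, c in zip(grid, linear)) / len(data)
```

Both methods conserve the total count, but only the linear split reproduces the mean of the data on the grid, which is one reason it keeps more shape information on coarse grids.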
obs (ztyping.ObsTypeInput | None) – Observable space of the KDE. As with any other PDF, this will be used as the default norm, but does not define the domain of the PDF. Namely, this can be a smaller space than data, as long as the names of the observables match. Using a larger dataset is actually good practice to avoid boundary biases, see also Boundary bias and padding.
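The mirroring that the padding option applies against boundary bias can be sketched in a few lines. The function below is a hypothetical illustration of the numeric `padding` case (same fraction on both sides, no weights); zfit's own padding may treat weights and edge cases differently.

```python
def mirror_padding(data, lower, upper, fraction=0.1):
    """Reflect the events near each boundary to the outside of the space."""
    width = fraction * (upper - lower)
    mirrored_low = [2 * lower - x for x in data if lower <= x <= lower + width]
    mirrored_high = [2 * upper - x for x in data if upper - width <= x <= upper]
    return mirrored_low + list(data) + mirrored_high

# Space from 0 to 5 with a fraction of 0.3, as in the padding description:
# events in [0, 1.5] are mirrored at 0, events in [3.5, 5] are mirrored at 5.
data = [0.2, 1.0, 1.6, 2.5, 3.4, 4.0, 4.9]
padded = mirror_padding(data, 0.0, 5.0, fraction=0.3)
```

The padded sample can extend from -1.5 to 6.5, so a KDE built on it has a shape outside the space that is only hidden when the PDF is used within 0 to 5.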
weights (np.ndarray | tf.Tensor | None) –
Weights of each event in data, can be None or Tensor-like with shape compatible with data. Instead of using this parameter, it is preferred to use a ZfitData as data that contains weights. This will change the count of the events, whereas the weight \(w_i\) of \(x_i\) will scale the value of \(K_i(x_i)\), resulting in a factor of \(\frac{w_i}{\sum_j w_j}\).
If no weights are given, each kernel will be scaled by the same constant \(\frac{1}{n_{data}}\).
name (str) – Human-readable name or label of the PDF for better identification. It has no programmatic function beyond identification.
- add_cache_deps(cache_deps, allow_non_cachable=True)
Add dependencies that render the cache invalid if they change.
- Parameters
cache_deps (Union[zfit.core.interfaces.ZfitGraphCachable, Iterable[zfit.core.interfaces.ZfitGraphCachable]]) –
allow_non_cachable (bool) – If True, allow cache_dependents to be non-cachables. If False, any cache_dependents that is not a ZfitGraphCachable will raise an error.
- Raises
TypeError – if one of the cache_dependents is not a ZfitGraphCachable and allow_non_cachable is False.
- analytic_integrate(limits, norm=None, *, norm_range=None)
Analytically integrate the function over limits; raises an error if this is not possible.
- Parameters
- Return type
Union[float, Tensor]
- Returns
The integral value
- Raises
AnalyticIntegralNotImplementedError – If no analytical integral is available (for these limits).
NormRangeNotImplementedError – if the norm argument is not supported. This means that no analytical normalization is available, explicitly: the analytical integral over the limits = norm is not available.
- apply_yield(value, norm=False, log=False)
If a norm_range is given, the value will be multiplied by the yield.
- as_func(norm=False, *, norm_range=None)
Return a Function with the function model(x, norm=norm).
- Parameters
norm (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) –
- copy(**override_parameters)
Creates a copy of the model.
Note: the copy model may continue to depend on the original initialization arguments.
- Parameters
**override_parameters – String/value dictionary of initialization arguments to override with new value.
- Return type
- Returns
A new instance of type(self) initialized from the union of self.parameters and override_parameters, i.e., dict(self.parameters, **override_parameters).
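The union expression `dict(self.parameters, **override_parameters)` quoted above can be tried on a toy class. `ToyModel` is a hypothetical stand-in used only to show the merge semantics; it is not how zfit models are constructed.

```python
class ToyModel:
    """Hypothetical stand-in that only stores its initialization arguments."""

    def __init__(self, **parameters):
        self.parameters = dict(parameters)

    def copy(self, **override_parameters):
        # Overrides win over the stored arguments; everything else is kept.
        merged = dict(self.parameters, **override_parameters)
        return type(self)(**merged)

original = ToyModel(bandwidth=0.3, name="kde")
clone = original.copy(name="kde_copy")
```

The original keeps its own arguments, while the clone shares every value that was not overridden.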
- create_extended(yield_, name_addition='_extended')
Return an extended version of this pdf with yield yield_. The parameters are shared.
- Parameters
yield_ –
name_addition –
- Return type
ZfitPDF
- Returns
ZfitPDF
- create_projection_pdf(limits, *, options=None, limits_to_integrate=None)
Create a PDF projection by integrating out some of the dimensions. (deprecated arguments)
Warning: SOME ARGUMENTS ARE DEPRECATED: (limits_to_integrate). They will be removed in a future version. Instructions for updating: Use limits instead.
The new projection pdf is still fully dependent on the pdf it was created with.
- Parameters
options –
limits (Union[ZfitLimit, Tensor, ndarray, Iterable[float], float, Tuple[float], List[float], bool, None]) –
- Return type
ZfitPDF
- Returns
A pdf without the dimensions from limits_to_integrate.
- create_sampler(n=None, limits=None, fixed_params=True)
Create a Sampler that acts as Data but can be resampled, also with changed parameters and n.
If limits is not specified, space is used (if the space contains limits). If n is None and the model is an extended pdf, ‘extended’ is used by default.
- Parameters
n –
The number of samples to be generated. Can be a Tensor or a valid string. Currently implemented:
‘extended’: samples poisson(yield) from each pdf that is extended.
limits – From which space to sample.
fixed_params – A list of Parameters that will be fixed during several resample calls. If True, all are fixed; if False, all are floating. If a Parameter is not fixed and its value gets updated (e.g. by a Parameter.set_value() call), this will be reflected in resample. If fixed, the Parameter will still have the same value as when the Sampler was created whenever it resamples.
- Returns
Sampler
- Raises
NotExtendedPDFError – if ‘extended’ is chosen (implicitly by default or explicitly) as an option for n but the pdf itself is not extended.
ValueError – if n is an invalid string option.
InvalidArgumentError – if n is not specified and pdf is not extended.
- property dtype: tensorflow.python.framework.dtypes.DType
The dtype of the object.
- Return type
DType
- get_cache_deps(only_floating=True)
Return a set of all independent Parameter that this object depends on.
- Parameters
only_floating – If True, only return floating Parameter
- get_dependencies(only_floating=True)
DEPRECATED FUNCTION
Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use get_params instead if you want to retrieve the independent parameters or get_cache_deps in case you need the numerical cache dependents (advanced).
- get_params(floating=True, is_yield=None, extract_independent=True, only_floating=<class 'zfit.util.checks.NotSpecified'>)
Recursively collect parameters that this object depends on according to the filter criteria.
Which parameters should be included can be steered using the arguments as a filter.
- None: do not filter on this. E.g. floating=None will return parameters that are floating as well as parameters that are fixed.
- True: only return parameters that fulfil this criterion.
- False: only return parameters that do not fulfil this criterion. E.g. floating=False will return only parameters that are not floating.
- Parameters
floating (bool | None) – if a parameter is floating, e.g. if floating() returns True
is_yield (bool | None) – if a parameter is a yield of the _current_ model. This won’t be applied recursively, but may include yields if they do also represent a parameter parametrizing the shape. So if the yield of the current model depends on other yields (or also non-yields), this will be included. If, however, just submodels depend on a yield (as their yield) and it is not correlated to the output of our model, they won’t be included.
extract_independent (bool | None) – If the parameter is an independent parameter, i.e. if it is a ZfitIndependentParameter.
- Return type
set[ZfitParameter]
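The three-valued filter logic (None, True, False) can be illustrated with a small stand-alone sketch. `Param`, `matches` and `select_params` are hypothetical names used for illustration only; the real method additionally walks the model's dependencies recursively.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Param:
    """Hypothetical stand-in for a parameter, not a zfit class."""
    name: str
    floating: bool
    is_yield: bool

def matches(flag, value):
    """None: do not filter; True or False: the attribute must equal the flag."""
    return flag is None or value == flag

def select_params(params, floating=True, is_yield=None):
    return {
        p for p in params
        if matches(floating, p.floating) and matches(is_yield, p.is_yield)
    }

params = {
    Param("mu", floating=True, is_yield=False),
    Param("sigma", floating=False, is_yield=False),
    Param("n_sig", floating=True, is_yield=True),
}
default_names = {p.name for p in select_params(params)}
all_names = {p.name for p in select_params(params, floating=None)}
fixed_names = {p.name for p in select_params(params, floating=False)}
```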
- get_yield()
Return the yield (only for extended models).
- Return type
Parameter | None
- Returns
The yield of the current model or None
- property is_extended: bool
Flag to tell whether the model is extended or not.
- Return type
bool
- Returns
A boolean.
- log_pdf(x, norm=None, *, norm_range=None)
Log probability density function normalized over norm_range.
- Parameters
x (Union[float, Tensor]) – float or double Tensor.
norm (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space, None]) – Space to normalize over
- Return type
Union[float, Tensor]
- Returns
A Tensor of type self.dtype.
- property name: str
The name of the object.
- Return type
str
- property norm: Space | None | bool
Return the current normalization range. If None and the obs have limits, they are returned.
- Return type
Space | None | bool
- Returns
The current normalization range.
- property norm_range: Space | None | bool
Return the current normalization range. If None and the obs have limits, they are returned. (deprecated)
Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use the norm attribute instead.
- Return type
Space | None | bool
- Returns
The current normalization range.
- normalization(limits, *, options=None)
Return the normalization of the function (usually the integral over limits).
- Parameters
options –
limits (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – The limits on where to normalize over
- Return type
Union[float, Tensor]
- Returns
The normalization value
- numeric_integrate(limits, norm=None, *, options=None, norm_range=None)
Numerical integration over the model.
- classmethod register_additional_repr(**kwargs)
Register an additional attribute to add to the repr.
- Parameters
**kwargs – any keyword argument. The value has to be gettable from the instance (it has to be an attribute or callable method of self).
- classmethod register_analytic_integral(cls, func, limits=None, priority=50, *, supports_norm=None, supports_norm_range=None, supports_multiple_limits=None)
Register an analytic integral with the class. (deprecated arguments)
Warning: SOME ARGUMENTS ARE DEPRECATED: (supports_norm_range). They will be removed in a future version. Instructions for updating: Use supports_norm instead.
- Parameters
func (Callable) –
A function that calculates the (partial) integral over the axes limits. The signature has to be the following:
- x (ZfitData, None): the data for the remaining axes in a partial integral. If it is not a partial integral, this will be None.
- limits (ZfitSpace): the limits to integrate over.
- norm_range (ZfitSpace, None): Normalization range of the integral. If not supports_norm_range, this will be None.
- params (Dict[param_name, zfit.Parameters]): The parameters of the model.
- model (ZfitModel): The model that is being integrated.
limits (ztyping.LimitsType) – If a Space is given, it is used as limits. Otherwise, arguments to instantiate a Range class can be given.
priority (int | float) – Priority of the function. If multiple functions cover the same space, the one with the highest priority will be used.
supports_multiple_limits (bool) – If True, the limits given to the integration function can have multiple limits. If False, only simple limits will pass through and multiple limits will be auto-handled.
supports_norm (bool) – If True, the norm argument to the function may not be None. If False, norm will always be None and the normalization is taken care of automatically.
- Return type
None
- register_cacher(cacher)
Register a cacher that caches values produced by this instance; a dependent.
- Parameters
cacher (Union[zfit.core.interfaces.ZfitGraphCachable, Iterable[zfit.core.interfaces.ZfitGraphCachable]]) –
- classmethod register_inverse_analytic_integral(func)
Register an inverse analytical integral, the inverse (unnormalized) cdf.
- Parameters
func (Callable) – A function with the signature func(x, params), where x is a Data object and params is a dict.
- Return type
None
- reset_cache_self()
Clear the cache of self and all dependent cachers.
- sample(n=None, limits=None, x=None)
Sample n points within limits from the model.
If limits is not specified, space is used (if the space contains limits). If n is None and the model is an extended pdf, ‘extended’ is used by default.
- Parameters
n (ztyping.nSamplingTypeIn) –
The number of samples to be generated. Can be a Tensor or a valid string. Currently implemented:
‘extended’: samples poisson(yield) from each pdf that is extended.
limits (ztyping.LimitsType) – The region to sample in
- Return type
SampleData
- Returns
SampleData(n_obs, n_samples)
- Raises
NotExtendedPDFError – if ‘extended’ is (implicitly by default or explicitly) chosen as an option for n but the pdf itself is not extended.
ValueError – if n is an invalid string option.
InvalidArgumentError – if n is not specified and pdf is not extended.
- set_norm_range(norm)
Set the normalization range (temporarily if used with contextmanager).
- Parameters
norm (Union[ZfitLimit, Tensor, ndarray, Iterable[float], float, Tuple[float], List[float], bool, None]) –
- set_yield(value)
Make the model extended by setting a yield. If possible, prefer to use create_extended.
This does not alter the general behavior of the PDF. The pdf and integrate and similar methods will continue to return the same, normalized to 1, values. However, not only can this parameter be accessed via get_yield, the methods ext_pdf and ext_integral provide a version of pdf and integrate respectively that is multiplied by the yield.
These can be useful for plotting and for binned likelihoods.
- Parameters
value –
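The relation between the normalized and the extended quantities can be checked numerically with a toy density. The sketch below uses a plain Gaussian in place of a zfit model; `pdf`, `ext_pdf` and `integrate` are local helper functions written for this illustration, not the zfit methods of the same name.

```python
import math

def pdf(x, mu=0.0, sigma=1.0):
    """Normalized Gaussian density, standing in for a model's pdf."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def ext_pdf(x, yield_):
    """Extended density: the normalized pdf multiplied by the yield."""
    return yield_ * pdf(x)

def integrate(func, lower, upper, steps=10_000):
    """Midpoint-rule integral, accurate enough for this illustration."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (k + 0.5) * width) for k in range(steps))

yield_ = 250.0
norm_integral = integrate(pdf, -8.0, 8.0)
ext_integral = integrate(lambda x: ext_pdf(x, yield_), -8.0, 8.0)
```

The normalized density still integrates to 1, while the extended version integrates to the yield, which is the quantity to compare against event counts in plots or binned likelihoods.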
- unnormalized_pdf(x)
PDF “unnormalized”. Use functions for unnormalized pdfs. This is only for performance in special cases. (deprecated)
Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use pdf(norm=False) instead.
- Parameters
x (Union[float, Tensor]) – The value, has to be convertible to a Tensor
- Return type
Union[float, Tensor]
- Returns
1-dimensional tf.Tensor containing the unnormalized pdf.
- update_integration_options(draws_per_dim=None, mc_sampler=None, tol=None, max_draws=None, draws_simpson=None)
Set the integration options.
- Parameters
max_draws (default ~1'000'000) – Maximum number of draws when integrating. Typically 500’000 - 5’000’000.
tol – Tolerance on the error of the integral. Typically 1e-4 to 1e-8.
draws_per_dim – The draws for MC integration to do per iteration. Can be set to 'auto'.
draws_simpson – Number of points in one dimensional Simpson integration. Can be set to 'auto'.
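To give a feeling for what the number of Simpson points trades off, here is a generic composite Simpson rule in pure Python. It is a textbook sketch, not zfit's integrator, and how zfit uses draws_simpson internally is not shown here; the function names are made up for the example.

```python
import math

def simpson(func, lower, upper, num_points):
    """Composite Simpson rule on num_points equally spaced points (odd count)."""
    if num_points < 3 or num_points % 2 == 0:
        raise ValueError("num_points must be odd and at least 3")
    step = (upper - lower) / (num_points - 1)
    total = func(lower) + func(upper)
    for k in range(1, num_points - 1):
        # Interior points alternate between weight 4 (odd k) and 2 (even k).
        total += (4.0 if k % 2 else 2.0) * func(lower + k * step)
    return total * step / 3.0

def gauss(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

exact = math.erf(5.0 / math.sqrt(2.0))  # integral of gauss over [-5, 5]
coarse = simpson(gauss, -5.0, 5.0, 11)
fine = simpson(gauss, -5.0, 5.0, 1001)
```

More points cost more function evaluations but bring the result closer to the exact value, which is the same trade-off the tolerance and draw settings control.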