KDE1DimISJ#

class zfit.pdf.KDE1DimISJ(data, *, obs=None, padding=None, num_grid_points=None, binning_method=None, weights=None, extended=None, norm=None, name='KDE1DimISJ', label=None)[source]#

Bases: KDEHelper, BasePDF, SerializableMixin

Kernel Density Estimation is a non-parametric method to approximate the density of given points.

For a more in-depth explanation, see the ISJ KDEs part of the Kernel Density Estimation section.

\[f_h(x) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big)\]

The bandwidth is computed using a technique described in a paper by Botev et al., which exploits the fact that a kernel density estimate with a Gaussian kernel is a solution to the heat equation.
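The formula above can be sketched in plain Python as follows. This is only an illustration of the estimator itself, not the zfit implementation: the function names, the sample values and the hand-picked bandwidth are assumptions, and the ISJ bandwidth selection is not reproduced here.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_density(x, points, h):
    """Evaluate f_h(x) = 1/(n h) * sum_i K((x - x_i) / h)."""
    n = len(points)
    return sum(gaussian_kernel((x - xi) / h) for xi in points) / (n * h)


points = [1.0, 1.5, 2.2, 3.0, 3.1]  # illustrative sample, not real data
bandwidth = 0.4  # fixed by hand here; the ISJ method would estimate it from the data
print(kde_density(2.0, points, bandwidth))
```

Because every kernel integrates to one and is scaled by 1/n, the resulting estimate integrates to one as well.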

Parameters:

data (ztyping.XTypeInput) –
Data sample to approximate the density from. The points represent positions of the kernel, the \(x_i\). This is preferably a ZfitData, but can also be an array-like object.

If the data has weights, they will be taken into account. This will change the count of the events, whereas weight \(w_i\) of \(x_i\) will scale the value of \(K_i( x_i)\), resulting in a factor of \(\frac{w_i}{\sum w_i}\).

If no weights are given, each kernel will be scaled by the same constant \(\frac{1}{n_{data}}\).
padding (callable | str | bool | None) –
KDEs have a peculiar weakness: the boundaries. Outside the data range the density is zero, which pulls the KDE down at the boundary as well, no matter what the density inside the boundary was.

There are two ways to circumvent this problem:
- the best solution: providing a larger dataset than the default space the PDF is used in
- mirroring the existing data at the boundaries, which is equivalent to a boundary condition with a zero derivative. This is a padding technique and can improve the boundaries. However, an important drawback is that it actually alters the PDF to look mirrored; this becomes clear if the PDF is plotted over a larger range.
Possible options are a number (default 0.1) that specifies the fraction of the overall space within which data is mirrored on both sides. For example, for a space from 0 to 5, a value of 0.3 means that all data in the region from 0 to 1.5 is mirrored around 0, and all data from 3.5 to 5 is mirrored around 5. The new data will range from -1.5 to 6.5, so the KDE also has a shape outside the desired range; using it only for the range 0 to 5 hides this. Using a dict, each side can be mirrored separately (or only a single one), like {'lowermirror': 0.1} or {'lowermirror': 0.2, 'uppermirror': 0.1}. For more control, a callable that takes data and weights can also be used.
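The mirroring described for a numeric padding value can be sketched as below. This is a minimal illustration of the idea, assuming a hypothetical helper mirror_pad and made-up data; it is not how zfit implements padding internally.

```python
def mirror_pad(data, lower, upper, fraction=0.1):
    """Reflect points near each edge of [lower, upper] across that edge."""
    width = fraction * (upper - lower)
    # Points within `width` of the lower edge, reflected: x -> 2 * lower - x
    left = [2.0 * lower - x for x in data if lower <= x <= lower + width]
    # Points within `width` of the upper edge, reflected: x -> 2 * upper - x
    right = [2.0 * upper - x for x in data if upper - width <= x <= upper]
    return left + list(data) + right


data = [0.2, 1.0, 2.5, 4.0, 4.9]  # illustrative values in the space 0 to 5
padded = mirror_pad(data, lower=0.0, upper=5.0, fraction=0.3)
print(padded)
```

With a fraction of 0.3 on the space 0 to 5, the padded sample can extend from -1.5 to 6.5, matching the example in the text.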
num_grid_points (int | None) –
Number of points in the binning grid.

The data will be binned using the binning_method into num_grid_points bins, and this histogram grid will then be used as kernel points. This has the advantage of a constant computational complexity, independent of the data size.

A number from 32 on can already yield good results, while the default is set to 1024, creating a fine grid. Lowering the number increases the performance at the cost of accuracy.
binning_method (str | None) –
Method to be used for binning the data. Options are ‘linear’, ‘simple’.

The data can be binned in the usual way (‘simple’), but this is less precise for KDEs, where we are interested in the shape of the histogram and smoothing it. Therefore, a better suited method, ‘linear’, is available.

In normal binning, each event (or weight) falls into the bin within the bin edges, while the neighbouring bins get zero counts from this event. In linear binning, the event is split between two bins, proportional to its closeness to each bin.

The ‘linear’ method provides superior performance, most notably in small (~32) grids.
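A minimal sketch of linear binning on an equally spaced grid is given below. The function name and the assumption that all data lie inside [lower, upper] are illustrative choices, not part of the zfit API.

```python
def linear_binning(data, lower, upper, num_grid_points, weights=None):
    """Split each event's weight between its two nearest grid points."""
    if weights is None:
        weights = [1.0] * len(data)
    spacing = (upper - lower) / (num_grid_points - 1)
    counts = [0.0] * num_grid_points
    for x, w in zip(data, weights):
        pos = (x - lower) / spacing  # fractional grid index of the event
        left = min(int(pos), num_grid_points - 2)  # clamp so left + 1 stays valid
        frac = pos - left  # 0 at the left grid point, 1 at the right one
        counts[left] += w * (1.0 - frac)
        counts[left + 1] += w * frac
    return counts


# Grid points at 0, 1, 2, 3, 4: the event at 0.25 is shared 75% / 25%.
counts = linear_binning([0.25, 4.0], lower=0.0, upper=4.0, num_grid_points=5)
print(counts)
```

The total weight is preserved, while the position of each event within its grid cell still influences the result. This is the information that simple binning discards.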
obs (ztyping.ObsTypeInput | None) – Observable space of the KDE. As with any other PDF, this will be used as the default norm, but does not define the domain of the PDF. Namely, this can be a smaller space than data, as long as the names of the observables match. Using a larger dataset is actually good practice to avoid boundary biases; see also Boundary bias and padding.
weights (np.ndarray | tf.Tensor | None) –
Weights of each event in data, can be None or Tensor-like with shape compatible with data. Instead of using this parameter, it is preferred to use a ZfitData as data that contains weights. This will change the count of the events, whereas weight \(w_i\) of \(x_i\) will scale the value of \(K_i( x_i)\), resulting in a factor of \(\frac{w_i}{\sum w_i}\).

If no weights are given, each kernel will be scaled by the same constant \(\frac{1}{n_{data}}\).
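The effect of the weights on the kernel sum can be sketched as follows, assuming a Gaussian kernel and a hypothetical helper weighted_kde. Each kernel is scaled by \(\frac{w_i}{\sum w_i}\), which reduces to \(\frac{1}{n_{data}}\) when all weights are equal.

```python
import math


def weighted_kde(x, points, h, weights=None):
    """Gaussian KDE where kernel i is scaled by w_i / sum(w)."""
    if weights is None:
        weights = [1.0] * len(points)  # equal weights give the 1 / n_data factor
    total = sum(weights)
    norm = math.sqrt(2.0 * math.pi) * h
    return sum(
        (w / total) * math.exp(-0.5 * ((x - xi) / h) ** 2) / norm
        for xi, w in zip(points, weights)
    )


points = [0.0, 1.0]
print(weighted_kde(0.5, points, 0.3))              # unweighted
print(weighted_kde(0.5, points, 0.3, [3.0, 1.0]))  # first point counts three times as much
```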
norm (NormInputType) – Normalization of the PDF. By default, this is the same as the default space of the PDF.
extended (ExtendedInputType) – The overall yield of the PDF. If this is parameter-like, it will be used as the yield, the expected number of events, and the PDF will be extended. An extended PDF has additional functionality, such as the ext_* methods and the counts (for binned PDFs).
name (str) – Name of the PDF. May have implications on the serialization and deserialization of the PDF. For a human-readable name, use the label.
label (str | None) – Human-readable name or label of the PDF for a better description, to be used with plots etc. Has no programmatic or functional purpose as identification.

add_cache_deps(cache_deps, allow_non_cachable=True)#

Add dependencies that render the cache invalid if they change.

Parameters:

cache_deps (ztyping.CacherOrCachersType)
allow_non_cachable (bool) – If True, allow cache_dependents to be non-cachables. If False, any cache_dependents that is not a ZfitGraphCachable will raise an error.

Raises:

TypeError – if one of the cache_dependents is not a ZfitGraphCachable _and_ allow_non_cachable is False.

analytic_integrate(limits, norm=None, *, params=None)#

Analytically integrate the function; raise an error if this is not possible.

Parameters:

limits (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Limits of the integration.
norm (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Normalization of the integration. By default, this is the same as the default space of the PDF. False means no normalization and returns the unnormed integral.
params (TypeVar(ParamTypeInput, zfit.core.interfaces.ZfitParameter, Union[int, float, complex, Tensor, zfit.core.interfaces.ZfitParameter])) – Mapping of the parameter names to the actual values. The parameter names refer to the names of the parameters, typically Parameter, that the model was _initialized_ with, not the name of the model's parametrization.

Return type:

Union[float, Tensor]

Returns:

The integral value

Raises:

AnalyticIntegralNotImplementedError – If no analytical integral is available (for these limits).
NormRangeNotImplementedError – if the norm argument is not supported. This means that no analytical normalization is available, explicitly: the analytical integral over the limits = norm is not available.
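The role of norm in an integral can be illustrated with a toy model whose analytic integral is known. The sketch below is an assumption-laden stand-in (an exponential shape and a hypothetical toy_integrate function), not the zfit code path: with a norm, the integral over limits is divided by the integral over norm; with norm=False, the unnormed integral is returned.

```python
import math


def unnormalized_integral(lower, upper, lam=1.0):
    """Analytic integral of exp(-lam * x) over [lower, upper]."""
    return (math.exp(-lam * lower) - math.exp(-lam * upper)) / lam


def toy_integrate(limits, norm):
    """Integral over `limits`, divided by the integral over `norm` unless norm is False."""
    raw = unnormalized_integral(*limits)
    if norm is False:
        return raw  # unnormed integral
    return raw / unnormalized_integral(*norm)


print(toy_integrate((0.0, 1.0), norm=(0.0, 5.0)))  # fraction of the shape inside 0 to 1
print(toy_integrate((0.0, 1.0), norm=False))       # raw area under exp(-x)
```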

as_func(norm=False)#

Return a Function with the function model(x, norm=norm).

Parameters:: norm (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – If not False or a ZfitSpace, this will be used to call the pdf function.

copy(**override_parameters)#

Creates a copy of the model.

Note: the copy model may continue to depend on the original initialization arguments.

Parameters:

**override_parameters – String/value dictionary of initialization arguments to override with new value.

Return type:

BasePDF

Returns:

A new instance of type(self) initialized from the union of self.parameters and override_parameters, i.e., dict(self.parameters, **override_parameters).

create_extended(yield_, name=None, *, name_addition=None)#

Return an extended version of this pdf with yield yield_. The parameters are shared.

Parameters:

yield_ – Yield (expected number of events) of the PDF. If this is parameter-like, it will be used as the yield and the PDF will be extended. An extended PDF has additional functionality, such as the ext_* methods and the counts (for binned PDFs).
name (str | None) – New name of the PDF. If None, the name of the PDF with a trailing “_ext” is used.

Returns:

a new PDF that is extended

Return type:

ZfitPDF

create_projection_pdf(*, limits=None, obs=None, options=None, name=None, label=None, extended=None, norm=None)#

Create a PDF projection by integrating out some dimensions.

The new projection pdf is still fully dependent on the pdf it was created with.

Parameters:

limits (Union[ZfitLimit, Tensor, ndarray, Iterable[float], float, Tuple[float], List[float], bool, None]) – Limits of the integration to project out. If not given, all observables that are not in obs are projected on using the default limits of the observables.
obs (Union[ZfitLimit, Tensor, ndarray, Iterable[float], float, Tuple[float], List[float], bool, None]) – Observables to project on. If not given, all observables that are not in limits are projected on.
options –
Options for the integration. Currently supported options are:
- type: one of (bins)

This hints that bins are integrated. A method that is vectorizable, non-dynamic and therefore less suitable for complicated functions is chosen.

Return type:

ZfitPDF

Returns:

A pdf without the dimensions from limits.

create_sampler(n=None, limits=None, *, fixed_params=None, params=None)#

Create a SamplerData that acts as Data but can be resampled, also with changed parameters. (deprecated arguments)

Deprecated: SOME ARGUMENTS ARE DEPRECATED: (fixed_params). They will be removed in a future version. Instructions for updating: Use params instead.

If limits is not specified, space is used (if the space contains limits). If n is None and the model is an extended pdf, ‘extended’ is used by default.

Parameters:

n (Union[int, Tensor, str]) –
The number of samples to be generated. Can be a Tensor or a valid string. Currently implemented:
- ’extended’: samples poisson(yield) from each pdf that is extended.
limits (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – From which space to sample.
fixed_params (Union[bool, list[ZfitParameter], tuple[ZfitParameter], None]) – A list of Parameters that will be fixed during several resample calls. If True, all are fixed, if False, all are floating. If a Parameter is not fixed and its value gets updated (e.g. by a Parameter.set_value() call), this will be reflected in resample. If fixed, the Parameter will still have the same value as the SamplerData has been created with when it resamples.
params (TypeVar(ParamTypeInput, zfit.core.interfaces.ZfitParameter, Union[int, float, complex, Tensor, zfit.core.interfaces.ZfitParameter])) – Mapping of the parameter names to the actual values. The parameter names refer to the names of the parameters, typically Parameter, that the model was _initialized_ with, not the name of the model's parametrization.

Return type:

SamplerData

Returns:

SamplerData

Raises:

NotExtendedPDFError – if ‘extended’ is chosen (implicitly by default or explicitly) as an option for n but the pdf itself is not extended.
ValueError – if n is an invalid string option.
InvalidArgumentError – if n is not specified and pdf is not extended.

property dtype: DType#: The dtype of the object.

property extended: Parameter | None#

Return the yield (only for extended models).

Returns:: The yield of the current model or None

classmethod from_asdf(asdf_obj, *, reuse_params=None)#: Load an object from an asdf file.

Parameters:

asdf_obj – Object
reuse_params – If parameters, the parameters will be reused if they are given. If a parameter is given, it will be used as the parameter with the same name. If a parameter is not given, a new parameter will be created.

classmethod from_dict(dict_, *, reuse_params=None)#

Creates an object from a dictionary structure as generated by to_dict.

Parameters:

dict_ – Dictionary structure.
reuse_params – If parameters, the parameters will be reused if they are given. If a parameter is given, it will be used as the parameter with the same name. If a parameter is not given, a new parameter will be created.

Returns:

The deserialized object.

classmethod from_json(cls, json, *, reuse_params=None)#

Load an object from a json string.

Parameters:

json (str) – Serialized object in a JSON string.
reuse_params – If parameters, the parameters will be reused if they are given. If a parameter is given, it will be used as the parameter with the same name. If a parameter is not given, a new parameter will be created.

Return type:

object

Returns:

The deserialized object.

get_cache_deps(only_floating=True)#

Return a set of all independent Parameter that this object depends on.

Parameters:: only_floating (bool) – If True, only return floating Parameter
Return type:: OrderedSet

get_dependencies(only_floating: bool = True) → ztyping.DependentsType#

DEPRECATED FUNCTION

Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use get_params instead if you want to retrieve the independent parameters or get_cache_deps in case you need the numerical cache dependents (advanced).

Return type:: OrderedSet

get_params(floating=True, is_yield=None, extract_independent=True, only_floating=<class 'zfit.util.checks.NotSpecified'>)#

Recursively collect parameters that this object depends on according to the filter criteria.

Which parameters should be included can be steered using the arguments as a filter.

- None: do not filter on this. E.g. floating=None will return parameters that are floating as well as parameters that are fixed.
- True: only return parameters that fulfil this criterion.
- False: only return parameters that do not fulfil this criterion. E.g. floating=False will return only parameters that are not floating.
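The three-valued filter logic can be sketched with a small stand-in. ToyParam and filter_params are hypothetical names used only for illustration; they are not zfit classes or functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyParam:
    """Hypothetical stand-in for a parameter; not a zfit class."""
    name: str
    floating: bool


def matches(flag, criterion):
    """Tri-state filter: None accepts everything, True/False must match the flag."""
    return criterion is None or flag == criterion


def filter_params(params, floating=True):
    return [p for p in params if matches(p.floating, floating)]


params = [ToyParam("mu", True), ToyParam("sigma", False)]
print([p.name for p in filter_params(params, floating=None)])   # ['mu', 'sigma']
print([p.name for p in filter_params(params, floating=True)])   # ['mu']
print([p.name for p in filter_params(params, floating=False)])  # ['sigma']
```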

Parameters:

floating (bool | None) – if a parameter is floating, e.g. if floating() returns True
is_yield (bool | None) – if a parameter is a yield of the _current_ model. This won’t be applied recursively, but may include yields if they do also represent a parameter parametrizing the shape. So if the yield of the current model depends on other yields (or also non-yields), this will be included. If, however, just submodels depend on a yield (as their yield) and it is not correlated to the output of our model, they won’t be included.
extract_independent (bool | None) – If the parameter is an independent parameter, i.e. if it is a ZfitIndependentParameter.

Return type:

set[ZfitParameter]

classmethod get_repr()#

Abstract representation of the object for serialization.

This object knows how to serialize and deserialize the object and is used by the to_json, from_json, to_dict and from_dict methods.

Returns:: The representation of the object.
Return type:: pydantic.BaseModel

get_yield()#

Return the yield (only for extended models).

Return type:: Parameter | None
Returns:: The yield of the current model or None

property is_extended: bool#

Flag to tell whether the model is extended or not.

Returns:: A boolean.

log_normalization(norm, *, options=None, params=None)#

Return the normalization of the function (usually the integral over norm).

Parameters:

norm (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Normalization of the function. By default, this is the norm of the PDF (which by default is the same as the space of the PDF). Should be ZfitSpace to define the space to normalize over.
options
params (Optional[Iterable[ZfitParameter]]) – Mapping of the parameter names to the actual values. The parameter names refer to the names of the parameters, typically Parameter, that the model was _initialized_ with, not the name of the model's parametrization.

Return type:

Union[float, Tensor]

Returns:

The normalization value

log_pdf(x, norm=None, *, params=None)#

Log probability density function normalized over norm.

Parameters:

x (Union[float, Tensor]) – Data to evaluate the method on. Should be ZfitData or a mapping of obs to numpy-like arrays. If an array is given, the first dimension is interpreted as the events while the second is meant to be the dimensionality of a single event.
norm (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Normalization of the function. By default, this is the norm of the PDF (which by default is the same as the space of the PDF). Should be ZfitSpace to define the space to normalize over.
params (Optional[Iterable[ZfitParameter]]) – Mapping of the parameter names to the actual values. The parameter names refer to the names of the parameters, typically Parameter, that the model was _initialized_ with, not the name of the model's parametrization.

Return type:

Union[float, Tensor]

Returns:

A Tensor of type self.dtype.

property name: str#: The name of the object.

property norm: Space | None | bool#

Return the current normalization range. If None and the obs have limits, they are returned.

Returns:: The current normalization range.

property norm_range: Space | None | bool#

Return the current normalization range. If None and the obs have limits, they are returned. (deprecated)

Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use the norm attribute instead.

Returns:: The current normalization range.

normalization(norm=None, *, options=None, limits=None, params=None)#

Return the normalization of the function (usually the integral over norm). (deprecated arguments)

Deprecated: SOME ARGUMENTS ARE DEPRECATED: (limits). They will be removed in a future version. Instructions for updating: Use norm instead.

Parameters:

norm (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Normalization of the function. By default, this is the norm of the PDF (which by default is the same as the space of the PDF). Should be ZfitSpace to define the space to normalize over.
options
params (Optional[Iterable[ZfitParameter]]) – Mapping of the parameter names to the actual values. The parameter names refer to the names of the parameters, typically Parameter, that the model was _initialized_ with, not the name of the model's parametrization.

Return type:

Union[float, Tensor]

Returns:

The normalization value

numeric_integrate(limits, norm=None, *, options=None, params=None)#

Numerical integration over the model.

Parameters:

limits (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Limits of the integration.
norm (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Normalization of the integration. By default, this is the same as the default space of the PDF. False means no normalization and returns the unnormed integral.
options –
Options for the integration. Currently supported options are:
- type: one of (bins)

This hints that bins are integrated. A method that is vectorizable, non-dynamic and therefore less suitable for complicated functions is chosen.
params (TypeVar(ParamTypeInput, zfit.core.interfaces.ZfitParameter, Union[int, float, complex, Tensor, zfit.core.interfaces.ZfitParameter])) – Mapping of the parameter names to the actual values. The parameter names refer to the names of the parameters, typically Parameter, that the model was _initialized_ with, not the name of the model's parametrization.

Return type:

Union[float, Tensor]

Returns:

The integral value

classmethod register_additional_repr(**kwargs)#

Register an additional attribute to add to the repr.

Parameters:

**kwargs – Any keyword argument. The value has to be gettable from the instance (has to be an attribute or callable method of self).

classmethod register_analytic_integral(cls, func, limits=None, priority=50, *, supports_norm=None, supports_norm_range=None, supports_multiple_limits=None)#

Register an analytic integral with the class. (deprecated arguments)

Deprecated: SOME ARGUMENTS ARE DEPRECATED: (supports_norm_range). They will be removed in a future version. Instructions for updating: Use supports_norm instead.

Parameters:

func (Callable) –
A function that calculates the (partial) integral over the axes limits. The signature has to be the following:
- x (ZfitData, None): the data for the remaining axes in a partial
  integral. If it is not a partial integral, this will be None.
- limits (ZfitSpace): the limits to integrate over.
- norm_range (ZfitSpace, None): Normalization range of the integral.
  If not supports_norm_range, this will be None.
- params (Dict[param_name, zfit.Parameters]): The parameters of the model.
- model (ZfitModel):The model that is being integrated.
limits (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – If a Space is given, it is used as limits. Otherwise, arguments to instantiate a Range class can be given.
priority (int | float) – Priority of the function. If multiple functions cover the same space, the one with the highest priority will be used.
supports_multiple_limits (bool | None) – If True, the limits given to the integration function can have multiple limits. If False, only simple limits will pass through and multiple limits will be auto-handled.
supports_norm (bool | None) – If True, norm argument to the function may not be None. If False, norm will always be None and care is taken of the normalization automatically.

Return type:

None

register_cacher(cacher)#

Register a cacher that caches values produced by this instance; a dependent.

Parameters:: cacher (ztyping.CacherOrCachersType)

classmethod register_inverse_analytic_integral(func)#

Register an inverse analytical integral, the inverse (unnormalized) cdf.

Parameters:: func (Callable) – A function with the signature func(x, params), where x is a Data object and params is a dict.
Return type:: None

reset_cache_self()#: Clear the cache of self and all dependent cachers.

sample(n=None, limits=None, *, x=None, params=None)#

Sample n points within limits from the model.

If limits is not specified, space is used (if the space contains limits). If n is None and the model is an extended pdf, ‘extended’ is used by default.

Parameters:

n (Union[int, Tensor, str]) –
The number of samples to be generated. Can be a Tensor or a valid string. Currently implemented:
- ’extended’: samples poisson(yield) from each pdf that is extended.
limits (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Region in which to sample.
params (TypeVar(ParamTypeInput, zfit.core.interfaces.ZfitParameter, Union[int, float, complex, Tensor, zfit.core.interfaces.ZfitParameter])) – Mapping of the parameter names to the actual values. The parameter names refer to the names of the parameters, typically Parameter, that the model was _initialized_ with, not the name of the model's parametrization.

Returns:

The observables are the limits

Return type:

Data(n_obs, n_samples)

Raises:

NotExtendedPDFError – if ‘extended’ is (implicitly by default or explicitly) chosen as an option for n but the pdf itself is not extended.
ValueError – if n is an invalid string option.
InvalidArgumentError – if n is not specified and pdf is not extended.

set_norm_range(norm: ztyping.LimitsTypeInput)#

Set the normalization range (temporarily if used with contextmanager). (deprecated)

Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Prefer to create a new PDF with norm set or wrap the existing in a TruncatedPDF.

Parameters:: norm (Union[ZfitLimit, Tensor, ndarray, Iterable[float], float, Tuple[float], List[float], bool, None])

set_yield(value)#

Make the model extended inplace by setting a yield. If possible, prefer to use create_extended.

This does not alter the general behavior of the PDF. The pdf and integrate and similar methods will continue to return the same values, normalized to 1. However, the yield can then be accessed via get_yield, and the methods ext_pdf and ext_integral provide versions of pdf and integrate, respectively, that are multiplied by the yield.

These can be useful for plotting and for binned likelihoods.

Parameters:: value – Yield (expected number of events) of the PDF. If this is parameter-like, it will be used as the yield and the PDF will be extended. An extended PDF has additional functionality, such as the ext_* methods and the counts (for binned PDFs).
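The relation between the normalized and the extended quantities can be sketched with a toy density. The functions below are hypothetical stand-ins that only mirror the names pdf and ext_pdf; they are not the zfit methods, and the yield value is made up.

```python
import math


def pdf(x):
    """Toy density normalized to 1: a standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def ext_pdf(x, yield_):
    """Extended density: the normalized density multiplied by the yield."""
    return yield_ * pdf(x)


def midpoint_integral(func, lower, upper, steps=10_000):
    """Simple midpoint-rule integral, sufficient for this smooth toy function."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width


yield_ = 250.0  # illustrative expected number of events
print(midpoint_integral(pdf, -8.0, 8.0))                          # close to 1
print(midpoint_integral(lambda x: ext_pdf(x, yield_), -8.0, 8.0))  # close to the yield
```

The shape is unchanged; only the overall scale differs, which is why the extended version is convenient for plotting on top of histograms of event counts.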

to_asdf()#: Convert the object to an asdf file.

to_binned(space, extended=None, norm=None, name=None, label=None)#: Convert to binned pdf, returns self if already binned.

to_dict()#

Convert the object to a nested dictionary structure.

Returns:: The dictionary structure.
Return type:: dict

to_json()#

Convert the object to a json string.

Returns:: The json string.
Return type:: str

to_truncated(limits=None, *, obs=None, extended=None, norm=None, name=None, label=None)#

Convert the PDF to a truncated version with possibly different and multiple limits.

The arguments are the same as for TruncatedPDF, the only difference being that if no limits are given, the limit of the PDF is used, thereby truncating the PDF to its original limits.

Parameters:

pdf – The PDF to be truncated.
limits (Union[ZfitSpace, Iterable[ZfitSpace], None]) – The limits to truncate the PDF. Can be a single limit or multiple limits.
obs –
Observables of the model. This will be used as the default space of the PDF and, if not given explicitly, as the normalization range.

The default space is used for example in the sample method: if no sampling limits are given, the default space is used.

If the observables are binned and the model is unbinned, the model will be a binned model, by wrapping the model in a BinnedFromUnbinnedPDF, equivalent to calling to_binned().

The observables are not equal to the domain as it does not restrict or truncate the model outside this range.
extended – The overall yield of the PDF. If this is parameter-like, it will be used as the yield, the expected number of events, and the PDF will be extended. An extended PDF has additional functionality, such as the ext_* methods and the counts (for binned PDFs). If None, the PDF will be extended if the original PDF is extended. If True and the original PDF is extended, the yield will be scaled to the fraction of the total integral that is within the limits. Therefore, the overall yield is comparable, i.e. the pdfs can be plotted “on top of each other”.
norm – Normalization of the PDF. By default, this is the same as the default space of the PDF.
name (str | None) – Name of the PDF. May have implications on the serialization and deserialization of the PDF. For a human-readable name, use the label.
label (str | None) – Human-readable name or label of the PDF for a better description, to be used with plots etc. Has no programmatic or functional purpose as identification.

to_unbinned()#: Convert to unbinned pdf, returns self if already unbinned.

to_yaml()#

Convert the object to a yaml string.

Returns:: The yaml string.
Return type:: str

unnormalized_pdf(x: ztyping.XType) → ztyping.XType#

PDF “unnormalized”. Use functions for unnormalized pdfs. This is only for performance in special cases. (deprecated)

Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use pdf(norm=False) instead

Parameters:: x (Union[float, Tensor]) – Data to evaluate the method on. Should be ZfitData or a mapping of obs to numpy-like arrays. If an array is given, the first dimension is interpreted as the events while the second is meant to be the dimensionality of a single event.
Return type:: Union[float, Tensor]
Returns:: 1-dimensional tf.Tensor containing the unnormalized pdf.

update_integration_options(draws_per_dim=None, mc_sampler=None, tol=None, max_draws=None, draws_simpson=None)#

Set the integration options.

Parameters:

max_draws (default ~1'000'000) – Maximum number of draws when integrating. Typically 500'000 - 5'000'000.
tol – Tolerance on the error of the integral. Typically 1e-4 to 1e-8.
draws_per_dim – The draws for MC integration to do per iteration. Can be set to 'auto'.
draws_simpson – Number of points in one dimensional Simpson integration. Can be set to 'auto'.
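To give a feeling for what the number of points in a one-dimensional Simpson integration controls, here is a generic composite Simpson's rule. It is a textbook sketch with a hypothetical function name, not the integration routine zfit uses internally.

```python
def simpson(func, lower, upper, num_points):
    """Composite Simpson's rule on an odd number of equally spaced points."""
    if num_points < 3 or num_points % 2 == 0:
        raise ValueError("num_points must be odd and at least 3")
    h = (upper - lower) / (num_points - 1)
    total = func(lower) + func(upper)
    for i in range(1, num_points - 1):
        # Interior points alternate between weight 4 (odd index) and 2 (even index).
        total += (4 if i % 2 else 2) * func(lower + i * h)
    return total * h / 3.0


print(simpson(lambda x: x ** 2, 0.0, 3.0, num_points=3))
print(simpson(lambda x: x ** 4, 0.0, 1.0, num_points=101))
```

More points generally reduce the integration error at the cost of more function evaluations, which is the trade-off a setting such as draws_simpson exposes.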
