GaussianKDE1DimV1#
- class zfit.pdf.GaussianKDE1DimV1(obs, data, bandwidth=None, weights=None, truncate=False, *, extended=None, norm=None, name='GaussianKDE1DimV1')[source]#
Bases: KDEHelper, WrapDistribution
EXPERIMENTAL, FEEDBACK WELCOME (zfit/zfit#new).
Exact, one-dimensional, (truncated) Kernel Density Estimation with a Gaussian kernel.
This implementation is exact and is best suited to smaller data sets (up to a few thousand points). For larger data sets, methods that bin the data, such as
KDE1DimGrid
, may be more efficient.
Kernel Density Estimation is a non-parametric method to approximate the density of given points.
\[f_h(x) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big)\]
where the kernel in this case is a (truncated) Gaussian
\[K(u) = \frac{1}{\sqrt{2\pi}} \exp \Big(-\frac{u^2}{2}\Big), \qquad u = \frac{x - x_i}{h}\]
so that the bandwidth \(h\) plays the role of the Gaussian \(\sigma\). The bandwidth of the kernel can be estimated in different ways. It can either be a global bandwidth, corresponding to a single value, or a local bandwidth, with one value per data point.
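To make the estimator concrete, the following is a minimal pure-Python sketch of an exact KDE with a global bandwidth, using the standard normal density as the kernel. It is not zfit's implementation; the function names `gaussian_kernel` and `kde_pdf` and the sample values are illustrative assumptions.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_pdf(x, data, bandwidth):
    """Evaluate f_h(x) = 1/(n h) * sum_i K((x - x_i) / h) for a global h."""
    n = len(data)
    kernel_sum = sum(gaussian_kernel((x - xi) / bandwidth) for xi in data)
    return kernel_sum / (n * bandwidth)


# Illustrative data: every point is the centre of one kernel.
data = [1.2, 1.9, 2.4, 3.1, 4.0]
h = 0.5
density_at_2 = kde_pdf(2.0, data, h)

# Crude Riemann sum over [-5, 10) to check that the estimate integrates to ~1.
step = 0.01
grid = [-5.0 + step * k for k in range(1500)]
total = sum(kde_pdf(x, data, h) for x in grid) * step
```

Every evaluation loops over all data points, which is why an exact KDE becomes expensive for large samples.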
Usage
The KDE can be instantiated from a numpy-like data sample, preferably a zfit.Data object. The data points are used as the means of the kernels. The bandwidth can either be given as a parameter (with length 1 for a global bandwidth or length equal to the data size for local bandwidths), a rather advanced concept intended for methods such as cross-validation, or be determined from the data automatically, either through a simple method such as Scott's or Silverman's rule of thumb or through an iterative, adaptive method.
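As an illustration of the rule-of-thumb options, the sketch below computes global bandwidths using the commonly quoted textbook forms of Scott's and Silverman's rules. The constants (1.06, 0.9, 1.34) are assumptions taken from those textbook forms; the exact expressions zfit uses may differ. The helper names are hypothetical.

```python
import random
import statistics


def scott_bandwidth(data):
    """Textbook normal-reference rule: h = 1.06 * sigma * n^(-1/5)."""
    n = len(data)
    return 1.06 * statistics.stdev(data) * n ** (-1 / 5)


def silverman_bandwidth(data):
    """Textbook Silverman rule: h = 0.9 * min(sigma, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sigma = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    spread = min(sigma, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-1 / 5)


# Illustrative sample, loosely mirroring a Gaussian with loc=2, scale=3.
rng = random.Random(0)
data = [rng.gauss(2.0, 3.0) for _ in range(150)]

h_scott = scott_bandwidth(data)
h_silverman = silverman_bandwidth(data)
```

Both rules return a single number, i.e. a global bandwidth; an adaptive method would instead produce one bandwidth per data point.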
Examples#
import numpy as np
import zfit

# generate some example kernels
size = 150
data = np.random.normal(size=size, loc=2, scale=3)
limits = (-15, 5)
obs = zfit.Space("obs1", limits=limits)
kde_silverman = zfit.pdf.GaussianKDE1DimV1(data=data, obs=obs, bandwidth='silverman')

# for a better smoothing of the kernels, use an adaptive approach
kde = zfit.pdf.GaussianKDE1DimV1(data=data, obs=obs, bandwidth='adaptive')
- type data:
TypeVar(ParamTypeInput, zfit.core.interfaces.ZfitParameter, Union[int, float, complex, Tensor, zfit.core.interfaces.ZfitParameter])
- param data:
Data sample to approximate the density from. The points represent the positions of the kernels, the \(x_i\). This is preferably a
ZfitData
, but can also be an array-like object. If the data has weights, they will be taken into account. This will change the count of the events, whereas weight \(w_i\) of \(x_i\) will scale the value of \(K_i( x_i)\), resulting in a factor of \(\frac{w_i}{\sum w_i}\).
If no weights are given, each kernel will be scaled by the same constant \(\frac{1}{n_{data}}\).
- type obs:
- param obs:
Observable space of the KDE. As with any other PDF, this will be used as the default norm, but does not define the domain of the PDF. Namely, this can be a smaller space than the range covered by data, as long as the names of the observables match. Using a data sample that extends beyond obs is actually good practice, as it avoids boundary biases; see also Boundary bias and padding.
- type bandwidth:
Union[TypeVar(ParamTypeInput, zfit.core.interfaces.ZfitParameter, Union[int, float, complex, Tensor, zfit.core.interfaces.ZfitParameter]), str]
- param bandwidth:
Bandwidth of the kernel, often also denoted as \(h\). For a Gaussian kernel, this corresponds to sigma. This can be calculated using pre-defined options or by specifying a numerical value that is broadcastable to data: a scalar or an array-like object with the same size as data. Valid pre-defined options are {‘silverman’, ‘scott’, ‘adaptive’}.
A scalar value is usually referred to as a global bandwidth, while an array holds local bandwidths.
The bandwidth can also be a parameter, which should be used with caution. However, this allows it to be used in cross-validation with a likelihood method.
- type weights:
UnionType[None, ndarray, Tensor]
- param weights:
Weights of each event in data, can be None or Tensor-like with shape compatible with data. Instead of using this parameter, it is preferred to use a
ZfitData
as data that contains weights. This will change the count of the events, whereas weight \(w_i\) of \(x_i\) will scale the value of \(K_i( x_i)\), resulting in a factor of \(\frac{w_i}{\sum w_i}\). If no weights are given, each kernel will be scaled by the same constant \(\frac{1}{n_{data}}\).
- type truncate:
- param truncate:
Whether to use a truncated Gaussian kernel, with the truncation limits given by the lower and upper limits of obs. This can cause NaNs if data points lie outside the limits.
- type extended:
Union[bool, TypeVar(ParamTypeInput, zfit.core.interfaces.ZfitParameter, Union[int, float, complex, Tensor, zfit.core.interfaces.ZfitParameter]), None]
- param extended:
The overall yield of the PDF. If this is parameter-like, it will be used as the yield, the expected number of events, and the PDF will be extended. An extended PDF has additional functionality, such as the ext_* methods and the counts (for binned PDFs).
- type norm:
- param norm:
Normalization of the PDF. By default, this is the same as the default space of the PDF.
- type name:
- param name:
Human-readable name or label of the PDF for better identification. It serves no programmatic purpose as an identifier.
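The weights and truncate descriptions above can be illustrated together. The sketch below is a pure-Python approximation, not zfit's code: each kernel is scaled by \(w_i / \sum w_i\) and, as an assumption about what truncation means here, each Gaussian is renormalized to the mass it has inside the limits. All names and values are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and width sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of the same Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def weighted_truncated_kde(x, data, weights, h, low, high):
    """Weighted KDE whose kernels are truncated to [low, high]."""
    if not low <= x <= high:
        return 0.0
    weight_sum = sum(weights)
    total = 0.0
    for xi, wi in zip(data, weights):
        # Mass of the untruncated kernel inside the limits. It tends to zero
        # for points far outside, which is where numerical trouble starts.
        mass_inside = normal_cdf(high, xi, h) - normal_cdf(low, xi, h)
        total += (wi / weight_sum) * normal_pdf(x, xi, h) / mass_inside
    return total


data = [0.2, 0.5, 1.5, 2.8]
weights = [1.0, 2.0, 1.0, 0.5]
low, high = 0.0, 3.0

# Midpoint rule: the truncated estimate should integrate to ~1 inside the limits.
steps = 3000
width = (high - low) / steps
integral = sum(
    weighted_truncated_kde(low + (k + 0.5) * width, data, weights, 0.4, low, high)
    for k in range(steps)
) * width
```

Because every kernel is renormalized inside the limits, no probability leaks across the boundary; the division by `mass_inside` also suggests why data points far outside the limits can be numerically problematic.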
- add_cache_deps(cache_deps, allow_non_cachable=True)#
Add dependencies that render the cache invalid if they change.
- Parameters:
cache_deps (ztyping.CacherOrCachersType) –
allow_non_cachable (bool) – If True, allow cache_dependents to be non-cachables. If False, any cache_dependents that are not a ZfitGraphCachable will raise an error.
- Raises:
TypeError – if one of the cache_dependents is not a ZfitGraphCachable and allow_non_cachable is False.
- analytic_integrate(limits, norm=None, *, norm_range=None)#
Integrate the function analytically; raises an error if this is not possible.
- Parameters:
limits (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Limits of the integration.
norm (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Normalization of the integration. By default, this is the same as the default space of the PDF. False means no normalization and returns the unnormed integral.
- Return type:
- Returns:
The integral value
- Raises:
AnalyticIntegralNotImplementedError – If no analytical integral is available (for these limits).
NormRangeNotImplementedError – if the norm argument is not supported. This means that no analytical normalization is available, explicitly: the analytical integral over the limits = norm is not available.
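For a Gaussian KDE, an integral over limits can be written in closed form as a sum of Gaussian CDF differences. The sketch below illustrates this and the effect of a norm: the result is divided by the integral over the norm range, while `norm=False` returns the unnormed integral. This is a hypothetical stand-alone helper, not the zfit method; here `norm=False` is the default, unlike in the method documented above.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def kde_integral(limits, data, h, norm=False):
    """Closed-form integral of a Gaussian KDE over limits = (lower, upper)."""

    def unnormed(lower, upper):
        per_kernel = (
            normal_cdf(upper, xi, h) - normal_cdf(lower, xi, h) for xi in data
        )
        return sum(per_kernel) / len(data)

    value = unnormed(*limits)
    if norm is False:
        return value  # no normalization requested
    return value / unnormed(*norm)  # normalize to the integral over norm


data = [1.2, 1.9, 2.4, 3.1, 4.0]
unnormed_value = kde_integral((2.0, 3.0), data, h=0.5)
normed_value = kde_integral((2.0, 3.0), data, h=0.5, norm=(0.0, 5.0))
```

The normalized value is slightly larger than the unnormed one because part of the kernel mass lies outside the norm range (0, 5).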
- as_func(norm=False, *, norm_range=None)#
Return a Function with the function model(x, norm=norm).
- copy(**override_parameters)#
Creates a copy of the model.
Note: the copy model may continue to depend on the original initialization arguments.
- Parameters:
**override_parameters – String/value dictionary of initialization arguments to override with new value.
- Return type:
- Returns:
- A new instance of type(self) initialized from the union
of self.parameters and override_parameters, i.e., dict(self.parameters, **override_parameters).
- create_extended(yield_, name=None, *, name_addition=None)#
Return an extended version of this pdf with yield yield_. The parameters are shared.
- Parameters:
yield_ – Yield (expected number of events) of the PDF. If this is parameter-like, it will be used as the yield, the expected number of events, and the PDF will be extended. An extended PDF has additional functionality, such as the ext_* methods and the counts (for binned PDFs).
name (str) – New name of the PDF. If None, the name of the PDF with a trailing “_ext” is used.
- Returns:
a new PDF that is extended
- Return type:
ZfitPDF
- create_projection_pdf(limits, *, options=None, limits_to_integrate=None)#
Create a PDF projection by integrating out some dimensions.
The new projection pdf is still fully dependent on the pdf it was created with.
- Parameters:
limits (Union[ZfitLimit, Tensor, ndarray, Iterable[float], float, Tuple[float], List[float], bool, None]) – Limits of the dimensions to integrate out.
options – Options for the integration. Currently supported options are:
- type: one of (bins). This hints that bins are integrated. A method that is vectorizable, non-dynamic and therefore less suitable for complicated functions is chosen.
- Return type:
ZfitPDF
- Returns:
A pdf without the dimensions from limits.
- create_sampler(n=None, limits=None, fixed_params=True)#
Create a Sampler that acts as Data but can be resampled, also with changed parameters and n.
If limits is not specified, space is used (if the space contains limits). If n is None and the model is an extended pdf, ‘extended’ is used by default.
- Parameters:
n – The number of samples to be generated. Can be a Tensor or a valid string. Currently implemented:
’extended’: samples poisson(yield) from each pdf that is extended.
limits (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – From which space to sample.
fixed_params (bool | list[ZfitParameter] | tuple[ZfitParameter]) – A list of Parameters that will be fixed during several resample calls. If True, all are fixed, if False, all are floating. If a Parameter is not fixed and its value gets updated (e.g. by a Parameter.set_value() call), this will be reflected in resample. If fixed, the Parameter will still have the same value as the Sampler has been created with when it resamples.
- Return type:
Sampler
- Returns:
Sampler
- Raises:
NotExtendedPDFError – if ‘extended’ is chosen (implicitly by default or explicitly) as an option for n but the pdf itself is not extended.
ValueError – if n is an invalid string option.
InvalidArgumentError – if n is not specified and pdf is not extended.
- property dtype: DType#
The dtype of the object.
- property extended: Parameter | None#
Return the yield (only for extended models).
- Returns:
The yield of the current model or None
- get_cache_deps(only_floating=True)#
Return a set of all independent Parameter that this object depends on.
- Parameters:
only_floating (bool) – If True, only return floating Parameter
- Return type:
OrderedSet
- get_dependencies(only_floating: bool = True) -> ztyping.DependentsType#
DEPRECATED FUNCTION
Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use get_params instead if you want to retrieve the independent parameters or get_cache_deps in case you need the numerical cache dependents (advanced).
- Return type:
OrderedSet
- get_params(floating=True, is_yield=None, extract_independent=True, only_floating=<class 'zfit.util.checks.NotSpecified'>)#
Recursively collect parameters that this object depends on according to the filter criteria.
Which parameters should be included can be steered using the arguments as a filter.
- None: do not filter on this. E.g. floating=None will return parameters that are floating as well as parameters that are fixed.
- True: only return parameters that fulfil this criterion.
- False: only return parameters that do not fulfil this criterion. E.g. floating=False will return only parameters that are not floating.
- Parameters:
floating (Optional[bool]) – if a parameter is floating, e.g. if floating() returns True
is_yield (Optional[bool]) – if a parameter is a yield of the _current_ model. This won’t be applied recursively, but may include yields if they do also represent a parameter parametrizing the shape. So if the yield of the current model depends on other yields (or also non-yields), this will be included. If, however, just submodels depend on a yield (as their yield) and it is not correlated to the output of our model, they won’t be included.
extract_independent (Optional[bool]) – If the parameter is an independent parameter, i.e. if it is a ZfitIndependentParameter.
- Return type:
set[ZfitParameter]
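The None/True/False filter semantics can be illustrated with a small stand-alone sketch. `ToyParam`, `matches` and `select_params` are hypothetical names used only for illustration; they are not zfit classes or functions, and recursion into dependent parameters is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyParam:
    """Hypothetical stand-in for a parameter with two boolean properties."""
    name: str
    floating: bool
    is_yield: bool


def matches(value, wanted):
    """Tri-state filter: None keeps everything, True/False must match."""
    return wanted is None or value == wanted


def select_params(params, floating=True, is_yield=None):
    """Return the names of the parameters that pass both filters."""
    return {
        p.name
        for p in params
        if matches(p.floating, floating) and matches(p.is_yield, is_yield)
    }


params = [
    ToyParam("mu", floating=True, is_yield=False),
    ToyParam("sigma", floating=False, is_yield=False),
    ToyParam("n_sig", floating=True, is_yield=True),
]

default_selection = select_params(params)               # floating only
everything = select_params(params, floating=None)       # no filter on floating
fixed_only = select_params(params, floating=False)      # not floating
yields_only = select_params(params, is_yield=True)      # floating yields
```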
- get_yield()#
Return the yield (only for extended models).
- log_pdf(x, norm=None, *, norm_range=None)#
Log probability density function normalized over norm.
- Parameters:
x (Union[float, Tensor]) – Data to evaluate the method on. Should be ZfitData or a mapping of obs to numpy-like arrays. If an array is given, the first dimension is interpreted as the events while the second is meant to be the dimensionality of a single event.
norm (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Normalization of the function. By default, this is the norm of the PDF (which by default is the same as the space of the PDF). Should be ZfitSpace to define the space to normalize over.
- Return type:
- Returns:
A Tensor of type self.dtype.
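Working with the logarithm of a KDE directly is numerically safer than taking the log of the density, which can underflow to zero far from the data. The sketch below shows one common way to do this, the log-sum-exp trick, for an untruncated Gaussian KDE normalized over the whole real line. It is an illustration under these assumptions and not a description of how zfit evaluates log_pdf.

```python
import math


def kde_log_pdf(x, data, h):
    """Log of a Gaussian KDE with global bandwidth h, via log-sum-exp."""
    # Log of each unnormalized kernel value.
    log_terms = [-0.5 * ((x - xi) / h) ** 2 for xi in data]
    peak = max(log_terms)
    # Factor out the largest term so the exponentials cannot all underflow.
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return log_sum - math.log(len(data) * h * math.sqrt(2.0 * math.pi))


def kde_pdf(x, data, h):
    """Direct evaluation, used only to compare against the log version."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


data = [0.0, 1.0, 2.0]
near = kde_log_pdf(1.5, data, 0.5)
far = kde_log_pdf(1000.0, data, 0.5)  # direct density underflows to 0.0 here
```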
- property norm: Space | None | bool#
Return the current normalization range. If None and the obs have limits, they are returned.
- Returns:
The current normalization range.
- property norm_range: Space | None | bool#
Return the current normalization range. If None and the obs have limits, they are returned. (deprecated)
Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use the norm attribute instead.
- Returns:
The current normalization range.
- normalization(norm, *, options=None, limits=None)#
Return the normalization of the function (usually the integral over norm). (deprecated arguments)
Deprecated: SOME ARGUMENTS ARE DEPRECATED: (limits). They will be removed in a future version. Instructions for updating: Use norm instead.
- Parameters:
norm (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Normalization of the function. By default, this is the norm of the PDF (which by default is the same as the space of the PDF). Should be ZfitSpace to define the space to normalize over.
options –
- Return type:
- Returns:
The normalization value
- numeric_integrate(limits, norm=None, *, options=None, norm_range=None)#
Numerical integration over the model.
- Parameters:
limits (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Limits of the integration.
norm (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – Normalization of the integration. By default, this is the same as the default space of the PDF. False means no normalization and returns the unnormed integral.
options – Options for the integration. Currently supported options are:
- type: one of (bins). This hints that bins are integrated. A method that is vectorizable, non-dynamic and therefore less suitable for complicated functions is chosen.
- Return type:
- Returns:
The integral value
- classmethod register_additional_repr(**kwargs)#
Register an additional attribute to add to the repr.
- Parameters:
**kwargs – any keyword argument. The value has to be gettable from the instance (has to be an attribute or callable method of self).
- classmethod register_analytic_integral(cls, func, limits=None, priority=50, *, supports_norm=None, supports_norm_range=None, supports_multiple_limits=None)#
Register an analytic integral with the class. (deprecated arguments)
Deprecated: SOME ARGUMENTS ARE DEPRECATED: (supports_norm_range). They will be removed in a future version. Instructions for updating: Use supports_norm instead.
- Parameters:
func (Callable) – A function that calculates the (partial) integral over the axes limits. The signature has to be the following:
- x (ZfitData, None): the data for the remaining axes in a partial integral. If it is not a partial integral, this will be None.
- limits (ZfitSpace): the limits to integrate over.
- norm_range (ZfitSpace, None): Normalization range of the integral. If not supports_norm_range, this will be None.
- params (Dict[param_name, zfit.Parameters]): The parameters of the model.
- model (ZfitModel): The model that is being integrated.
limits (Union[Tuple[Tuple[float, ...]], Tuple[float, ...], bool, Space]) – If a Space is given, it is used as limits. Otherwise, arguments to instantiate a Range class can be given.
priority (int | float) – Priority of the function. If multiple functions cover the same space, the one with the highest priority will be used.
supports_multiple_limits (bool) – If True, the limits given to the integration function can have multiple limits. If False, only simple limits will pass through and multiple limits will be auto-handled.
supports_norm (bool) – If True, norm argument to the function may not be None. If False, norm will always be None and care is taken of the normalization automatically.
- Return type:
- register_cacher(cacher)#
Register a cacher that caches values produced by this instance; a dependent.
- Parameters:
cacher (ztyping.CacherOrCachersType) –
- classmethod register_inverse_analytic_integral(func)#
Register an inverse analytical integral, the inverse (unnormalized) cdf.
- reset_cache_self()#
Clear the cache of self and all dependent cachers.
- sample(n=None, limits=None, x=None)#
Sample n points within limits from the model.
If limits is not specified, space is used (if the space contains limits). If n is None and the model is an extended pdf, ‘extended’ is used by default.
- Parameters:
- Return type:
SampleData
- Returns:
SampleData(n_obs, n_samples)
- Raises:
NotExtendedPDFError – if ‘extended’ is (implicitly by default or explicitly) chosen as an option for n but the pdf itself is not extended.
ValueError – if n is an invalid string option.
InvalidArgumentError – if n is not specified and pdf is not extended.
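Conceptually, drawing from a Gaussian KDE within limits can be done by picking a data point at random, smearing it with a Gaussian of width equal to the bandwidth, and rejecting draws outside the limits. The sketch below illustrates that idea, together with a Poisson-fluctuated sample size as described for ‘extended’. It is not zfit's sampling code; the helper names, the yield value and the simple Poisson generator (Knuth's multiplication method) are assumptions for illustration.

```python
import math
import random


def sample_kde(n, data, h, limits, rng):
    """Draw n points from a Gaussian KDE restricted to limits = (low, high)."""
    low, high = limits
    samples = []
    while len(samples) < n:
        centre = rng.choice(data)   # pick one kernel with equal probability
        x = rng.gauss(centre, h)    # smear with the kernel width
        if low <= x <= high:        # accept only draws inside the limits
            samples.append(x)
    return samples


def poisson(lam, rng):
    """Poisson-distributed integer (Knuth's method, fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


rng = random.Random(1)
data = [1.2, 1.9, 2.4, 3.1, 4.0]

fixed_size = sample_kde(200, data, h=0.3, limits=(0.0, 5.0), rng=rng)

# 'extended'-like behaviour: the sample size fluctuates around the yield.
n_extended = poisson(50.0, rng)
extended_sample = sample_kde(n_extended, data, h=0.3, limits=(0.0, 5.0), rng=rng)
```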
- set_norm_range(norm: ztyping.LimitsTypeInput)#
Set the normalization range (temporarily if used with contextmanager). (deprecated)
Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Prefer to create a new PDF with norm set.
- set_yield(value)#
Make the model extended in place by setting a yield. If possible, prefer to use create_extended.
This does not alter the general behavior of the PDF. The pdf and integrate and similar methods will continue to return the same values, normalized to 1. However, not only can this parameter be accessed via get_yield, but the methods ext_pdf and ext_integral also provide versions of pdf and integrate, respectively, that are multiplied by the yield.
These can be useful for plotting and for binned likelihoods.
- Parameters:
value – Yield (expected number of events) of the PDF. If this is parameter-like, it will be used as the yield, the expected number of events, and the PDF will be extended. An extended PDF has additional functionality, such as the ext_* methods and the counts (for binned PDFs).
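The relation between a normalized PDF and its extended counterpart can be sketched as follows: the integral over a bin, multiplied by the yield, gives the expected number of events in that bin. The code is a stand-alone illustration for an untruncated Gaussian KDE; the function names, bin edges and yield value are hypothetical and this is not how zfit computes counts.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def kde_integral(lower, upper, data, h):
    """Integral of a Gaussian KDE (normalized to 1) over [lower, upper]."""
    per_kernel = (normal_cdf(upper, xi, h) - normal_cdf(lower, xi, h) for xi in data)
    return sum(per_kernel) / len(data)


def expected_counts(edges, data, h, yield_):
    """Expected events per bin: yield times the normalized bin integral."""
    return [
        yield_ * kde_integral(lo, hi, data, h)
        for lo, hi in zip(edges[:-1], edges[1:])
    ]


data = [1.2, 1.9, 2.4, 3.1, 4.0]
edges = [-10.0, 0.0, 2.0, 4.0, 6.0, 20.0]
counts = expected_counts(edges, data, h=0.5, yield_=250.0)
```

The normalized integrals are unchanged by the yield; only the extended quantities scale with it, so the counts sum to the yield when the bins cover essentially all of the density.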
- to_binned(space, *, extended=None, norm=None)#
Convert to binned pdf, returns self if already binned.
- to_unbinned()#
Convert to unbinned pdf, returns self if already unbinned.
- unnormalized_pdf(x: ztyping.XType) -> ztyping.XType#
PDF “unnormalized”. Use functions for unnormalized pdfs; this is only for performance in special cases. (deprecated)
Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use pdf(norm=False) instead.
- Parameters:
x (Union[float, Tensor]) – Data to evaluate the method on. Should be ZfitData or a mapping of obs to numpy-like arrays. If an array is given, the first dimension is interpreted as the events while the second is meant to be the dimensionality of a single event.
- Return type:
- Returns:
1-dimensional tf.Tensor containing the unnormalized pdf.
- update_integration_options(draws_per_dim=None, mc_sampler=None, tol=None, max_draws=None, draws_simpson=None)#
Set the integration options.
- Parameters:
max_draws (default ~1'000'000) – Maximum number of draws when integrating. Typically 500'000 - 5'000'000.
tol – Tolerance on the error of the integral. Typically 1e-4 to 1e-8.
draws_per_dim – The draws for MC integration to do per iteration. Can be set to 'auto'.
draws_simpson – Number of points in one dimensional Simpson integration. Can be set to 'auto'.
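To give a feeling for what the number of points in a one-dimensional Simpson integration controls, the sketch below implements the composite Simpson rule in plain Python and applies it to a Gaussian KDE. This is a generic textbook version with hypothetical names, not the integrator zfit uses internally.

```python
import math


def simpson(func, lower, upper, n_points):
    """Composite Simpson rule on n_points equally spaced points."""
    if n_points < 3:
        raise ValueError("Simpson's rule needs at least 3 points")
    if n_points % 2 == 0:
        n_points += 1  # the rule needs an even number of intervals
    n_intervals = n_points - 1
    step = (upper - lower) / n_intervals
    total = func(lower) + func(upper)
    for k in range(1, n_intervals):
        weight = 4.0 if k % 2 == 1 else 2.0  # 1, 4, 2, 4, ..., 4, 1 pattern
        total += weight * func(lower + k * step)
    return total * step / 3.0


def kde_pdf(x, data, h):
    """Gaussian KDE with a global bandwidth h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


data = [1.2, 1.9, 2.4, 3.1, 4.0]
area = simpson(lambda x: kde_pdf(x, data, 0.5), -2.0, 8.0, n_points=201)
cubic = simpson(lambda x: x ** 3, 0.0, 2.0, n_points=5)  # exact for cubics
```

More points reduce the integration error at the cost of more function evaluations, which is the trade-off such an option exposes.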