SamplerData#

class zfit.data.SamplerData(data, *, sample_and_weights_func, sample_holder, n, weights=None, weights_holder=None, params=None, obs=None, name=None, label=None, dtype=tf.float64, use_hash=None, guarantee_limits=False)[source]#

Bases: Data

Create a SamplerData object.

Use constructor from_sampler instead.

property fixed_params#

DEPRECATED FUNCTION

Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use params instead.

classmethod from_sample(sample_func: Callable, n: ztyping.NumericalScalarType, obs: ztyping.ObsTypeInput, fixed_params=None, name: str | None = None, weights=None, dtype=None, use_hash: bool | None = None)[source]#

DEPRECATED FUNCTION

Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use from_sampler instead (with an ‘r’ at the end).

classmethod from_sampler(cls, *, sample_func=None, sample_and_weights_func=None, n, obs, params=None, fixed_params=None, name=None, label=None, dtype=None, use_hash=None, guarantee_limits=False)[source]#

Create a SamplerData from a sampler function. (deprecated arguments)

Deprecated: SOME ARGUMENTS ARE DEPRECATED: (fixed_params). They will be removed in a future version. Instructions for updating: Use params instead.

This is a more flexible way to create a SamplerData. Instead of providing a fixed sample, a sampler function is provided that will be called to sample the data. If the data is used in the loss, the sampler function will update the value in the compiled version.

Note

If any method of the SamplerData is used to create a new data object, such as with_obs, the resulting data will be a Data object and not a SamplerData object; the data will be fixed and not resampled.

Parameters:
  • sample_func (Optional[Callable]) – A callable that takes as argument n and returns a sample of the data. The sample has to have the same number of observables as the obs of the SamplerData. If None, sample_and_weights_func has to be given.

  • sample_and_weights_func (Optional[Callable]) – A callable that takes as argument n and returns a tuple of the sample and the weights of the data. The sample has to have the same number of observables as the obs of the SamplerData. If None, sample_func has to be given.

  • n (Union[int, float, complex, Tensor, ZfitParameter]) – The number of samples to produce initially. This is used to have a first sample that can be used for compilation.

  • obs (Union[str, Iterable[str], Space]) – Observables of the data. If the space has limits, the data will be cut to the limits.

  • params (Optional[Mapping[Union[str, ZfitParameter], Union[int, float, complex, Tensor, ZfitParameter]]]) – A mapping from Parameter or a string to a numerical value. This is used as the default values for the parameters in the sample_func or sample_and_weights_func and needs to fully specify the parameters.

  • name (str | None) – ​Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters. (currently not used for a special purpose) For a human-readable name or description, use the label.​

  • label (str | None) – ​Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatical functional purpose as identification.​

  • dtype – The dtype of the data.

  • use_hash (bool | None) – ​If true, store a hash for caching. If a PDF can cache values, this option needs to be enabled for the PDF to be able to cache values.​

  • guarantee_limits (bool) – ​Guarantee that the data is within the limits. If True, the data will not be checked and _is assumed_ to be within the limits, possibly because it was already cut before. This can lead to a performance improvement as the data does not have to be checked.​
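
As a rough illustration of the two callable contracts described above, the sketch below defines a sample_func and a sample_and_weights_func with NumPy. The Gaussian shape, the two observables, the uniform weights, and the fixed seed are assumptions made for illustration; only the single argument n and the returned shapes follow the parameter descriptions.

```python
import numpy as np

N_OBS = 2  # assumed number of observables, e.g. obs=["x", "y"]
rng = np.random.default_rng(seed=0)  # fixed seed, for reproducibility only


def sample_func(n):
    """Return a sample of shape (n, N_OBS): one column per observable."""
    return rng.normal(loc=0.0, scale=1.0, size=(int(n), N_OBS))


def sample_and_weights_func(n):
    """Return a tuple (sample, weights); weights are 1-D with length n."""
    sample = sample_func(n)
    weights = rng.uniform(0.5, 1.5, size=int(n))
    return sample, weights


sample = sample_func(100)
sample_w, weights = sample_and_weights_func(100)
print(sample.shape, sample_w.shape, weights.shape)
```

One of the two callables would then be passed by keyword to from_sampler, together with n and obs.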

update_data(sample, weights=None, guarantee_limits=False)[source]#

Load a new sample into the dataset, presumably similar to the previous one.

Parameters:
  • sample (Union[Tensor, TensorProtocol, int, float, bool, str, bytes, complex, tuple, list, ndarray, generic]) – The new sample to load. Has to have the same number of observables as the obs of the SamplerData but can have a different number of events.

  • weights (Union[Tensor, TensorProtocol, int, float, bool, str, bytes, complex, tuple, list, ndarray, generic, None]) – The weights of the new sample. If None, the weights are not changed. If the SamplerData was initialized with weights, this has to be given. If the SamplerData was initialized without weights, this cannot be given.

  • guarantee_limits (bool) – ​Guarantee that the data is within the limits. If True, the data will not be checked and _is assumed_ to be within the limits, possibly because it was already cut before. This can lead to a performance improvement as the data does not have to be checked.​
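
The constraints on sample and weights can be sketched as a small validation helper. This is not zfit's implementation: check_update, n_obs, and has_weights are hypothetical names standing in for the state of an existing SamplerData, and the weights rule is read here as "weights are required exactly when the data was created with weights".

```python
import numpy as np


def check_update(sample, weights, *, n_obs, has_weights):
    """Validate a new sample against the rules stated for update_data."""
    sample = np.asarray(sample, dtype=float)
    if sample.ndim == 1:  # assumption: a 1-D input means a single observable
        sample = sample[:, np.newaxis]
    if sample.shape[1] != n_obs:
        raise ValueError(f"expected {n_obs} observables, got {sample.shape[1]}")
    if has_weights and weights is None:
        raise ValueError("data was created with weights: weights are required")
    if not has_weights and weights is not None:
        raise ValueError("data was created without weights: weights are not allowed")
    if weights is not None:
        weights = np.asarray(weights, dtype=float)
        if weights.shape != (sample.shape[0],):
            raise ValueError("weights must be 1-D with one entry per event")
    return sample, weights


# The number of events may change (here 3); the number of observables may not.
new_sample, new_weights = check_update(
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], None, n_obs=2, has_weights=False
)
print(new_sample.shape)
```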

resample(params=None, *, n=None, param_values=None)[source]#

Update the sample by newly sampling. This affects any object that used this data already internally. (deprecated arguments)

Deprecated: SOME ARGUMENTS ARE DEPRECATED: (param_values). They will be removed in a future version. Instructions for updating: Use params instead.

All params that are not in the attribute params will use their current value for the creation of the new sample. A value can also be overridden for a single sampling by providing a mapping from Parameter to the temporary value via params (param_values is the deprecated name of this argument).
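
The precedence described above can be sketched with plain dictionaries. The helper resolve_params and the parameter names mu and sigma are hypothetical and only illustrate the order in which values are picked; they are not part of zfit.

```python
def resolve_params(current, defaults, override=None):
    """Pick the parameter values used for one sampling.

    current:  current values of all parameters (name -> value)
    defaults: values stored in the `params` attribute (name -> value)
    override: temporary values for this one sampling only
    """
    values = dict(current)         # not in `params`: use the current value
    values.update(defaults)        # in `params`: use the stored value
    values.update(override or {})  # per-call mapping wins, for this call only
    return values


current = {"mu": 1.0, "sigma": 2.0}
defaults = {"sigma": 0.5}
once = resolve_params(current, defaults, {"mu": 3.0})
again = resolve_params(current, defaults)
print(once, again)
```

The second call shows that the override does not persist: mu falls back to its current value.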

add_cache_deps(cache_deps, allow_non_cachable=True)#

Add dependencies that render the cache invalid if they change.

Parameters:
  • cache_deps (ztyping.CacherOrCachersType)

  • allow_non_cachable (bool) – If True, allow cache_deps to contain non-cachables. If False, any entry of cache_deps that is not a ZfitGraphCachable will raise an error.

Raises:

TypeError – if one of the cache_deps is not a ZfitGraphCachable _and_ allow_non_cachable is False.

property data_range#

DEPRECATED FUNCTION

Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use space instead.

enable_hashing()#

Enable hashing for this data object if it was disabled.

A hash allows some objects to be cached and reused. If hashing is enabled, the data object will be hashed and the hash _can_ be used for caching. This can speed up various objects; however, it may have no effect at all. For example, if an object was already called with the data object before, the hash will probably not be used, as the object is already compiled.

classmethod from_asdf(asdf_obj, *, reuse_params=None)#

Load an object from an asdf file.

Parameters:
  • asdf_obj – Object

  • reuse_params – If parameters, the parameters will be reused if they are given. If a parameter is given, it will be used as the parameter with the same name. If a parameter is not given, a new parameter will be created.

classmethod from_dict(dict_, *, reuse_params=None)#

Creates an object from a dictionary structure as generated by to_dict.

Parameters:
  • dict_ – Dictionary structure.

  • reuse_params – ​If parameters, the parameters will be reused if they are given. If a parameter is given, it will be used as the parameter with the same name. If a parameter is not given, a new parameter will be created.​

Returns:

The deserialized object.

classmethod from_json(cls, json, *, reuse_params=None)#

Load an object from a json string.

Parameters:
  • json (str) – Serialized object in a JSON string.

  • reuse_params – ​If parameters, the parameters will be reused if they are given. If a parameter is given, it will be used as the parameter with the same name. If a parameter is not given, a new parameter will be created.​

Return type:

object

Returns:

The deserialized object.

classmethod from_mapping(mapping, obs=None, *, weights=None, label=None, name=None, dtype=None, use_hash=None, guarantee_limits=False)#

Create a Data from a mapping of observables to arrays.

Parameters:
  • mapping (Mapping[str, Union[Tensor, TensorProtocol, int, float, bool, str, bytes, complex, tuple, list, ndarray, generic]]) – A mapping from the observables to the data, with the observables as keys and the data as values.

  • obs (Union[str, Iterable[str], Space]) –

    ​Space of the data. The space is used to define the observables and the limits of the data. If the Space has limits, these will be used to cut the data. If the data is already cut, use guarantee_limits for a possible performance improvement.​

    They will be matched to the data in the same order. Can be omitted, in which case the keys of the mapping are used as observables.

  • weights (Union[Tensor, TensorProtocol, int, float, bool, str, bytes, complex, tuple, list, ndarray, generic, None]) –

    ​Weights of the data. Has to be 1-D and match the shape of the data (nevents). Note that a weighted dataset may not be supported by all methods or need additional approximations to correct for the weights, taking more time.​

    Can also be a string that is a column in the dataframe. By default, look for a column "", i.e., an empty string.

  • name (str | None) – ​Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters. (currently not used for a special purpose) For a human-readable name or description, use the label.​

  • label (str | None) – ​Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatical functional purpose as identification.​

  • dtype (DType) – dtype of the data

  • use_hash (bool | None) – ​If true, store a hash for caching. If a PDF can cache values, this option needs to be enabled for the PDF to be able to cache values.​

  • guarantee_limits (bool | None) – ​Guarantee that the data is within the limits. If True, the data will not be checked and _is assumed_ to be within the limits, possibly because it was already cut before. This can lead to a performance improvement as the data does not have to be checked.​

Returns:

A Data object containing the unbinned data or a BinnedData if the obs is binned.

Return type:

zfit.Data or zfit.BinnedData

Raises:

ValueError – If the observables are not in the mapping.
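
A minimal sketch of how a mapping of observables to arrays corresponds to an array of shape (nevents, nobs) in obs order. The helper stack_mapping is hypothetical and does not reproduce zfit's handling of spaces, limits, or weights.

```python
import numpy as np


def stack_mapping(mapping, obs=None):
    """Stack 1-D columns from a mapping into an array of shape (nevents, nobs)."""
    if obs is None:
        obs = list(mapping)  # fall back to the keys, in insertion order
    missing = [ob for ob in obs if ob not in mapping]
    if missing:
        raise ValueError(f"observables not in the mapping: {missing}")
    columns = [np.asarray(mapping[ob], dtype=float) for ob in obs]
    return np.stack(columns, axis=-1), list(obs)


mapping = {"x": [1, 2, 3], "y": [10, 20, 30]}
array, names = stack_mapping(mapping, obs=["y", "x"])  # columns follow obs order
print(names, array.shape)
```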

classmethod from_numpy(obs, array, *, weights=None, name=None, label=None, dtype=None, use_hash=None, guarantee_limits=False)#

Create Data from a np.array.

Parameters:
  • obs (Union[str, Iterable[str], Space]) – ​Space of the data. The space is used to define the observables and the limits of the data. If the Space has limits, these will be used to cut the data. If the data is already cut, use guarantee_limits for a possible performance improvement.​

  • array (ndarray) – Numpy array containing the data. Has to be of shape (nevents, nobs) or, if only one observable, (nevents) is also possible.

  • weights (Union[Tensor, None, ndarray]) – ​Weights of the data. Has to be 1-D and match the shape of the data (nevents). Note that a weighted dataset may not be supported by all methods or need additional approximations to correct for the weights, taking more time.​

  • name (str | None) – ​Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters. (currently not used for a special purpose) For a human-readable name or description, use the label.​

  • label (str | None) – ​Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatical functional purpose as identification.​

  • dtype (DType) – dtype of the data.

  • use_hash – ​If true, store a hash for caching. If a PDF can cache values, this option needs to be enabled for the PDF to be able to cache values.​

  • guarantee_limits (bool) – ​Guarantee that the data is within the limits. If True, the data will not be checked and _is assumed_ to be within the limits, possibly because it was already cut before. This can lead to a performance improvement as the data does not have to be checked.​

Returns:

A Data object containing the unbinned data or a BinnedData if the obs is binned.

Return type:

zfit.Data or zfit.BinnedData

Raises:

TypeError – If the array is not a numpy array.
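
The accepted array shapes and the effect of guarantee_limits can be sketched with NumPy. The helper prepare_array, the inclusive bounds, and the example limits are assumptions for illustration; zfit's actual cut is defined by the Space passed as obs.

```python
import numpy as np


def prepare_array(array, lower, upper, guarantee_limits=False):
    """Reshape to (nevents, nobs) and keep only events inside [lower, upper]."""
    array = np.asarray(array, dtype=float)
    if array.ndim == 1:  # a single observable may be given as shape (nevents,)
        array = array[:, np.newaxis]
    if guarantee_limits:  # caller promises the data is already cut: skip the check
        return array
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = np.all((array >= lower) & (array <= upper), axis=1)
    return array[inside]


raw = [-2.0, 0.5, 3.0, 1.0]
cut = prepare_array(raw, lower=[-1.0], upper=[2.0])
unchecked = prepare_array(raw, lower=[-1.0], upper=[2.0], guarantee_limits=True)
print(cut.shape, unchecked.shape)
```

With guarantee_limits=True the out-of-range events are kept, which is why the flag should only be set when the data has already been cut.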

classmethod from_pandas(df, obs=None, *, weights=None, name=None, label=None, dtype=None, use_hash=None, guarantee_limits=False)#

Create a Data from a pandas DataFrame. If obs is None, columns are used as obs.

Parameters:
  • df (DataFrame) – pandas DataFrame that contains the data. If obs is None, columns are used as obs. Can be a superset of obs.

  • obs (Union[str, Iterable[str], Space]) –

    ​Space of the data. The space is used to define the observables and the limits of the data. If the Space has limits, these will be used to cut the data. If the data is already cut, use guarantee_limits for a possible performance improvement.​

    If None, columns are used as obs.

  • weights (Union[Tensor, None, ndarray, str]) – ​Weights of the data. Has to be 1-D and match the shape of the data (nevents). Note that a weighted dataset may not be supported by all methods or need additional approximations to correct for the weights, taking more time.​

  • name (str | None) – ​Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters. (currently not used for a special purpose) For a human-readable name or description, use the label.​

  • label (str | None) – ​Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatical functional purpose as identification.​

  • guarantee_limits (bool) – Guarantee that the data is within the limits. If True, the data will not be checked and _is assumed_ to be within the limits, possibly because it was already cut before. This can lead to a performance improvement as the data does not have to be checked. For example, if the data is a pd.DataFrame and the limits of obs have already been enforced through a query on the DataFrame, the limits are guaranteed to be correct and the data will not be checked again. Possible speedup, should not have any effect on the result.

  • dtype (DType) – The DType of the return value. Defaults to the zfit default (usually float64).

  • use_hash (bool | None) – ​If true, store a hash for caching. If a PDF can cache values, this option needs to be enabled for the PDF to be able to cache values.​

Returns:

A Data object containing the unbinned data or a BinnedData if the obs is binned.

Return type:

zfit.Data or zfit.BinnedData

Raises:

ValueError – If the observables are not in the dataframe.

classmethod from_root(path, treepath, obs=None, *, weights=None, obs_alias=None, name=None, label=None, dtype=None, root_dir_options=None, use_hash=None, branches=None, branches_alias=None)#

Create a Data from a ROOT file. The arguments are passed to uproot directly.

Parameters:
  • path (str) – Path to the root file.

  • treepath (str) – Name of the tree in the root file.

  • obs (ZfitSpace) – Observables of the data. These will also be the columns of the data if no obs_alias is given.

  • weights (Union[Tensor, None, ndarray, str]) – Weights of the data. Has to be 1-D and match the shape of the data (nevents). Can be a column of the ROOT file by using a string corresponding to a column.

  • obs_alias (Mapping[str, str] | None) – A mapping from the obs (as keys) to the actual branches (as values) in the root file. This allows the observable names to differ from the branch names in the file.

  • name (str | None) – ​Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters. (currently not used for a special purpose) For a human-readable name or description, use the label.​

  • label (str | None) – ​Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatical functional purpose as identification.​

  • dtype (DType) – dtype of the data.

  • root_dir_options – Options passed to uproot.

  • use_hash (bool | None) – If True, a hash of the data is created and is used to identify it in caching.

Returns:

A Data object containing the unbinned data.

Return type:

zfit.Data
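
The role of obs_alias can be sketched as a lookup from observable names to branch names. The helper branches_to_read and the branch name B_M are hypothetical; the sketch only shows the mapping direction (obs as keys, branches as values) and does not read a file.

```python
def branches_to_read(obs, obs_alias=None):
    """Return the branch name to read for each observable, in obs order."""
    obs_alias = obs_alias or {}
    # Observables without an alias are assumed to share the branch name.
    return [obs_alias.get(ob, ob) for ob in obs]


obs = ["mass", "pt"]
obs_alias = {"mass": "B_M"}  # hypothetical branch name in the file
branches = branches_to_read(obs, obs_alias)
print(dict(zip(obs, branches)))
```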

classmethod from_tensor(obs, tensor, *, weights=None, name=None, label=None, dtype=None, use_hash=None, guarantee_limits=False)#

Create a Data from a tf.Tensor.

Parameters:
  • obs (Union[str, Iterable[str], Space]) – ​Space of the data. The space is used to define the observables and the limits of the data. If the Space has limits, these will be used to cut the data. If the data is already cut, use guarantee_limits for a possible performance improvement.​

  • tensor (Tensor) – Tensor containing the data. Has to be of shape (nevents, nobs) or, if only one observable, (nevents) is also possible.

  • weights (Union[Tensor, None, ndarray]) – ​Weights of the data. Has to be 1-D and match the shape of the data (nevents). Note that a weighted dataset may not be supported by all methods or need additional approximations to correct for the weights, taking more time.​

  • name (str | None) – ​Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters. (currently not used for a special purpose) For a human-readable name or description, use the label.​

  • label (str | None) – ​Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatical functional purpose as identification.​

  • dtype (DType) – dtype of the data.

  • use_hash – ​If true, store a hash for caching. If a PDF can cache values, this option needs to be enabled for the PDF to be able to cache values.​

  • guarantee_limits (bool) – ​Guarantee that the data is within the limits. If True, the data will not be checked and _is assumed_ to be within the limits, possibly because it was already cut before. This can lead to a performance improvement as the data does not have to be checked.​

Returns:

A Data object containing the unbinned data or a BinnedData if the obs is binned.

Return type:

zfit.Data or zfit.BinnedData

Raises:

TypeError – If the tensor is not a tensorflow tensor.

property name: str#

The name of the object.

register_cacher(cacher)#

Register a cacher that caches values produced by this instance; a dependent.

Parameters:

cacher (ztyping.CacherOrCachersType)

reset_cache_self()#

Clear the cache of self and all dependent cachers.

set_data_range(data_range)#

DEPRECATED FUNCTION

Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Do not change the range, preferably use pandas or similar, or use with_obs instead.

set_weights(weights: ztyping.WeightsInputType)#

Set (temporarily) the weights of the dataset. (deprecated)

Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use with_weights instead.

Parameters:

weights (Union[Tensor, None, ndarray])

sort_by_axes(axes: ztyping.AxesTypeInput, allow_superset: bool = True)#

DEPRECATED FUNCTION

Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use with_obs instead.

sort_by_obs(obs: ztyping.ObsTypeInput, allow_superset: bool = False)#

DEPRECATED FUNCTION

Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use with_obs instead.

to_asdf()#

Convert the object to an asdf file.

to_binned(space, *, name=None, label=None, use_hash=None)#

Bins the data using space and returns a BinnedData object.

Parameters:
  • space (Space) – The space to bin the data in.

  • name (str | None) – ​Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters. (currently not used for a special purpose) For a human-readable name or description, use the label.​

  • label (str | None) – ​Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatical functional purpose as identification.​

  • use_hash (bool | None) – ​If true, store a hash for caching. If a PDF can cache values, this option needs to be enabled for the PDF to be able to cache values.​

Returns:

A new BinnedData object containing the binned data.

Return type:

zfit.BinnedData
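
Conceptually, binning unbinned (optionally weighted) data amounts to a histogram over the bin edges of the space. The NumPy sketch below illustrates this for one observable; the values, weights, and edges are made up, and it does not use or reproduce zfit's binning objects.

```python
import numpy as np

values = np.array([0.1, 0.4, 0.6, 0.7, 1.5, 1.9])  # one observable, unbinned
weights = np.array([1.0, 1.0, 2.0, 1.0, 0.5, 1.0])
edges = np.array([0.0, 0.5, 1.0, 2.0])  # assumed bin edges of the binned space

# Sum of weights per bin; without weights this would be the event count.
counts, _ = np.histogram(values, bins=edges, weights=weights)
print(counts)
```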

to_dict()#

Convert the object to a nested dictionary structure.

Returns:

The dictionary structure.

Return type:

dict

to_json()#

Convert the object to a json string.

Returns:

The json string.

Return type:

str

to_numpy()#

Return the data as a numpy array.

Pandas DataFrame equivalent method.

Returns:

The data as a numpy array.

Return type:

np.ndarray

to_pandas(obs=None, weightsname=None)#

Create a pd.DataFrame from obs as columns and return it.

Parameters:
  • obs (Union[str, Iterable[str], Space]) – The observables to use as columns. If None, all observables are used.

  • weightsname (str | None) – The name of the weights column if the data has weights. If None, defaults to "", an empty string.

Returns:

A pd.DataFrame containing the data and the weights (if present).

Return type:

pd.DataFrame

to_yaml()#

Convert the object to a yaml string.

Returns:

The yaml string.

Return type:

str

unstack_x(obs=None, always_list=None)#

Return the unstacked data: a list of tensors or a single Tensor.

Parameters:
  • obs (Union[str, Iterable[str], Space]) – Observables to return. If None, all observables are returned. Can be a subset of the original observables.

  • always_list – If True, always return a list, even if only one observable is requested.

Returns:

List(tf.Tensor)

value(obs=None, axis=None)#

Return the data as a numpy-like object in obs order.

Parameters:
  • obs (Union[str, Iterable[str], Space]) – Observables to return. If None, all observables are returned. Can be a subset of the original observables. If a string is given, a 1-D array is returned with shape (nevents,). If a list of strings or a zfit.Space is given, a 2-D array is returned with shape (nevents, nobs).

  • axis (int | None) – If given, the axis to return instead of the full data. If obs is a string, this has to be None.

Return type:

Tensor
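
The shape rules for obs can be illustrated with a NumPy stand-in. The module-level OBS and DATA and this toy value function are hypothetical; they only mirror the described behaviour (a string gives a 1-D result, a list gives a 2-D result in the requested order).

```python
import numpy as np

OBS = ["x", "y", "z"]
DATA = np.arange(12, dtype=float).reshape(4, 3)  # shape (nevents=4, nobs=3)


def value(obs=None):
    """Return columns of DATA in the requested obs order."""
    if obs is None:
        return DATA
    if isinstance(obs, str):  # single name -> 1-D array of shape (nevents,)
        return DATA[:, OBS.index(obs)]
    indices = [OBS.index(ob) for ob in obs]  # list -> 2-D (nevents, len(obs))
    return DATA[:, indices]


one = value("y")
two = value(["z", "x"])
print(one.shape, two.shape)
```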

property weights#

Get the weights of the data.

with_obs(obs, *, guarantee_limits=False)#

Create a new Data with a subset of the data using the obs.

Parameters:
  • obs (Union[str, Iterable[str], Space]) – Observables to return. Has to be a subset of the original observables.

  • guarantee_limits (bool) – ​Guarantee that the data is within the limits. If True, the data will not be checked and _is assumed_ to be within the limits, possibly because it was already cut before. This can lead to a performance improvement as the data does not have to be checked.​

Returns:

A new Data object containing the subset of the data.

Return type:

zfit.Data

with_weights(weights)#

Create a new Data with a different set of weights.

Parameters:

weights (Union[Tensor, None, ndarray]) – The new weights to use. Has to be 1-D and match the shape of the data (nevents).

Returns:

A new Data object containing the new weights.

Return type:

zfit.Data
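
The "returns a new object" pattern used by with_weights can be sketched with a frozen dataclass. ToyData is a hypothetical stand-in, not zfit's Data class; it only shows that the original object is left unchanged and that the weights must match the number of events.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyData:
    """Hypothetical immutable stand-in for a data object."""

    values: Tuple[float, ...]
    weights: Optional[Tuple[float, ...]] = None

    def with_weights(self, weights):
        """Return a copy with new weights; self is not modified."""
        if weights is not None and len(weights) != len(self.values):
            raise ValueError("weights must have one entry per event")
        new = None if weights is None else tuple(weights)
        return replace(self, weights=new)


original = ToyData(values=(0.1, 0.2, 0.3))
weighted = original.with_weights([1.0, 2.0, 0.5])
print(original.weights, weighted.weights)
```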