Data#

class zfit.data.Data(dataset, obs=None, name=None, weights=None, dtype=None, use_hash=None)[source]#

Bases: ZfitUnbinnedData, BaseDimensional, BaseObject, GraphCachable, SerializableMixin, ZfitSerializable

Create a data holder from a dataset used to feed into models.

Parameters:
  • dataset (DatasetV2 | LightDataset) – A dataset storing the actual values

  • obs (Union[str, Iterable[str], Space]) – Observables in which the data is defined

  • name (str | None) – Name of the Data

  • weights – Weights of the data

  • dtype (DType) – The DType of the return value. Defaults to the zfit default (usually float64).

  • use_hash (bool | None) – Whether to use a hash for caching

property weights#

Get the weights of the data.

set_weights(weights: ztyping.WeightsInputType)[source]#

Set (temporarily) the weights of the dataset. (deprecated)

Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Do not set the weights on a data set, create a new one instead.

Parameters:

weights (Union[Tensor, None, ndarray]) –

classmethod from_pandas(df, obs=None, weights=None, name=None, dtype=None, use_hash=None)[source]#

Create a Data from a pandas DataFrame. If obs is None, columns are used as obs.

Parameters:
  • df (DataFrame) – pandas DataFrame that contains the data. If obs is None, columns are used as obs. Can be a superset of obs.

  • obs (Union[str, Iterable[str], Space]) – obs to use for the data. obs have to be the columns in the data frame. If None, columns are used as obs.

  • weights (Union[Tensor, None, ndarray, str]) – Weights of the data. Has to be 1-D and match the shape of the data (nevents), or be a string naming a column in the dataframe. By default, looks for a column named "", i.e. an empty string.

  • name (str | None) – Name of the data.

  • dtype (DType) – dtype of the data

  • use_hash (bool | None) – If True, a hash of the data is created and is used to identify it in caching.

classmethod from_root(cls, path, treepath, obs=None, *, weights=None, obs_alias=None, name=None, dtype=None, root_dir_options=None, use_hash=None, branches=None, branches_alias=None)[source]#

Create a Data from a ROOT file. (deprecated arguments)

Deprecated: SOME ARGUMENTS ARE DEPRECATED: (branches). They will be removed in a future version. Instructions for updating: Use obs instead.

Deprecated: SOME ARGUMENTS ARE DEPRECATED: (branches_alias). They will be removed in a future version. Instructions for updating: Use obs_alias instead and make sure to invert the logic! I.e. it’s a mapping from the observable name to the actual branch name.

The arguments are passed to uproot directly.

Parameters:
  • path (str) – Path to the root file.

  • treepath (str) – Name of the tree in the root file.

  • obs (ZfitSpace) – Observables of the data. These are also used as the columns of the data if no obs_alias is given.

  • weights (Union[Tensor, None, ndarray, str]) – Weights of the data. Has to be 1-D and match the shape of the data (nevents). Can be a column of the ROOT file by using a string corresponding to a column.

  • obs_alias (Mapping[str, str] | None) – A mapping from the obs (as keys) to the actual branches (as values) in the root file. This allows the observable names to differ from the branch names in the file.

  • name (str | None) – Name of the data.

  • root_dir_options

Returns:

A Data object containing the unbinned data.

Return type:

zfit.Data
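The deprecation note for branches_alias asks for the mapping logic to be inverted. A minimal sketch of that conversion, assuming the old mapping went from branch name to observable name; the branch and observable names here are hypothetical:

```python
# Old style (deprecated), assumed to map branch name -> observable name.
branches_alias = {"B_M": "mass", "B_PT": "pt"}

# New style for obs_alias: observable name -> branch name.
obs_alias = {obs: branch for branch, obs in branches_alias.items()}

print(obs_alias)  # {'mass': 'B_M', 'pt': 'B_PT'}
```

The resulting dictionary would then be passed as the obs_alias argument, with obs listing the observable names (the keys of the mapping).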

classmethod from_numpy(obs, array, weights=None, name=None, dtype=None, use_hash=None)[source]#

Create Data from a np.array.

Parameters:
  • obs (Union[str, Iterable[str], Space]) – Observables of the data. They will be matched to the data in the same order.

  • array (ndarray) – Numpy array containing the data.

  • weights (Union[Tensor, None, ndarray]) – Weights of the data. Has to be 1-D and match the shape of the data (nevents).

  • name (str | None) – Name of the data.

  • dtype (DType) – dtype of the data.

  • use_hash (bool | None) – If True, a hash of the data is created and is used to identify it in caching.

Returns:

A Data object containing the unbinned data.

Return type:

zfit.Data

classmethod from_tensor(obs, tensor, weights=None, name=None, dtype=None, use_hash=None)[source]#

Create a Data from a tf.Tensor. Value simply returns the tensor (in the right order).

Parameters:
  • obs (Union[str, Iterable[str], Space]) – Observables of the data. They will be matched to the data in the same order.

  • tensor (Tensor) – Tensor containing the data.

  • weights (Union[Tensor, None, ndarray]) – Weights of the data. Has to be 1-D and match the shape of the data (nevents).

  • name (str | None) – Name of the data.

Returns:

A Data object containing the unbinned data.

Return type:

zfit.Data

with_obs(obs)[source]#

Create a new Data with a subset of the data using the obs.

Parameters:

obs – Observables to return. Has to be a subset of the original observables.

Returns:

A new Data object containing the subset of the data.

Return type:

zfit.Data

to_pandas(obs=None, weightsname=None)[source]#

Create a pd.DataFrame from obs as columns and return it.

Parameters:
  • obs (Union[str, Iterable[str], Space]) – The observables to use as columns. If None, all observables are used.

  • weightsname (str | None) – The name of the weights column if the data has weights. If None, defaults to "", an empty string.

Returns:

A pd.DataFrame containing the data and the weights (if present).

Return type:

pd.DataFrame

unstack_x(obs=None, always_list=None)[source]#

Return the unstacked data: a list of tensors or a single Tensor.

Parameters:
  • obs (Union[str, Iterable[str], Space]) – Observables to return. If None, all observables are returned. Can be a subset of the original observables.

  • always_list – If True, always return a list, even if only one observable is requested.

Returns:

List(tf.Tensor)

value(obs=None)[source]#

Return the data as a numpy-like object in obs order.

Parameters:

obs (Union[str, Iterable[str], Space]) – Observables to return. If None, all observables are returned. Can be a subset of the original observables. If a string is given, a 1-D array is returned with shape (nevents,). If a list of strings or a zfit.Space is given, a 2-D array is returned with shape (nevents, nobs).

Returns:

add_cache_deps(cache_deps, allow_non_cachable=True)#

Add dependencies that render the cache invalid if they change.

Parameters:
  • cache_deps (ztyping.CacherOrCachersType) –

  • allow_non_cachable (bool) – If True, allow cache_dependents to be non-cachables. If False, any cache dependent that is not a ZfitGraphCachable will raise an error.

Raises:

TypeError – if one of the cache_dependents is not a ZfitGraphCachable and allow_non_cachable is False.

classmethod from_asdf(asdf_obj, *, reuse_params=None)#

Load an object from an asdf file.

Parameters:
  • asdf_obj – Object

  • reuse_params – Parameters to reuse, if given. A given parameter is used in place of the parameter with the same name; for any parameter that is not given, a new parameter is created.

classmethod from_dict(dict_, *, reuse_params=None)#

Creates an object from a dictionary structure as generated by to_dict.

Parameters:
  • dict_ – Dictionary structure.

  • reuse_params – Parameters to reuse, if given. A given parameter is used in place of the parameter with the same name; for any parameter that is not given, a new parameter is created.

Returns:

The deserialized object.

classmethod from_json(cls, json, *, reuse_params=None)#

Load an object from a json string.

Parameters:
  • json (str) – Serialized object in a JSON string.

  • reuse_params – Parameters to reuse, if given. A given parameter is used in place of the parameter with the same name; for any parameter that is not given, a new parameter is created.

Returns:

The deserialized object.

Return type:

object

classmethod get_repr()#

Abstract representation of the object for serialization.

This object knows how to serialize and deserialize the object and is used by the to_json, from_json, to_dict and from_dict methods.

Returns:

The representation of the object.

Return type:

pydantic.BaseModel

property name: str#

The name of the object.

register_cacher(cacher)#

Register a cacher that caches values produced by this instance; a dependent.

Parameters:

cacher (ztyping.CacherOrCachersType) –

reset_cache_self()#

Clear the cache of self and all dependent cachers.

to_asdf()#

Convert the object to an asdf file.

to_dict()#

Convert the object to a nested dictionary structure.

Returns:

The dictionary structure.

Return type:

dict

to_json()#

Convert the object to a json string.

Returns:

The json string.

Return type:

str

to_yaml()#

Convert the object to a yaml string.

Returns:

The yaml string.

Return type:

str