Data
- class zfit.data.Data(data, *, obs=None, weights=None, name=None, label=None, dtype=None, use_hash=None, guarantee_limits=False)
Bases: ZfitUnbinnedData, BaseDimensional, BaseObject, GraphCachable, SerializableMixin, ZfitSerializable
Create data, a thin wrapper around an array-like structure that supports weights and limits.
Instead of creating a Data object directly, the from_* constructors, such as from_pandas, from_mapping, from_tensor, and from_numpy, can be used for more fine-grained control of some arguments and for more extensive documentation on the allowed arguments.
The data is unbinned, i.e. it is a collection of events. The data can be weighted and is defined in a space, which is a set of observables, whose limits are enforced.
- Parameters:
data (LightDataset | DataFrame | Mapping[str, ndarray] | Tensor | ndarray | Data) – A dataset storing the actual values. A variety of data types are possible, as long as they are array-like.
obs (Union[str, Iterable[str], Space]) – Space of the data. The space is used to define the observables and the limits of the data. If the Space has limits, these will be used to cut the data. If the data is already cut, use guarantee_limits for a possible performance improvement. Some data types, such as pd.DataFrame, already have observables defined implicitly. If obs is None, the observables are inferred from the data. If the obs is binned, the unbinned data will be binned according to the binning of the obs and a BinnedData will be returned.
weights (Union[Tensor, TensorProtocol, int, float, bool, str, bytes, complex, tuple, list, ndarray, generic]) – Weights of the data. Has to be 1-D and match the shape of the data (nevents). Note that a weighted dataset may not be supported by all methods, or may need additional approximations to correct for the weights, taking more time.
name (str | None) – Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters (currently not used for a special purpose). For a human-readable name or description, use the label.
label (str | None) – Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatic purpose such as identification.
guarantee_limits (bool) – Guarantee that the data is within the limits. If True, the data will not be checked and _is assumed_ to be within the limits, possibly because it was already cut before. This can lead to a performance improvement as the data does not have to be checked. For example, if the data is a pd.DataFrame and the limits of obs have already been enforced through a query on the DataFrame, the limits are guaranteed to be correct and the data will not be checked again. This is a possible speedup and should not have any effect on the result.
dtype (DType) – The DType of the return value. Defaults to the zfit default (usually float64).
use_hash (bool | None) – If true, store a hash for caching. If a PDF can cache values, this option needs to be enabled for the PDF to be able to cache values.
- Returns:
zfit.Data or zfit.BinnedData: A Data object containing the unbinned data or a BinnedData if the obs is binned.
- Return type:
zfit.Data or zfit.BinnedData
- Raises:
ShapeIncompatibleError – If the shape of the data is incompatible with the observables.
ValueError – If the data is not a recognized type.
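As a rough illustration of what cutting to the limits means, the sketch below mimics the behaviour with plain NumPy: events outside the limits are dropped, and the 1-D weights are filtered with the same mask so that they keep matching nevents. The helper cut_to_limits is hypothetical and is not how zfit implements the cut. The build_data function assumes a one-dimensional zfit.Space created with the limits keyword and imports zfit lazily, so the NumPy part runs on its own.

```python
import numpy as np


def cut_to_limits(values, weights, lower, upper):
    """Hypothetical helper: keep only events inside [lower, upper]."""
    mask = (values >= lower) & (values <= upper)
    # The same mask is applied to the weights so both stay aligned per event.
    return values[mask], weights[mask]


values = np.array([-7.0, -1.0, 0.5, 3.0, 9.0])
weights = np.array([1.0, 0.5, 2.0, 1.5, 1.0])  # 1-D, one weight per event
cut_values, cut_weights = cut_to_limits(values, weights, -5.0, 5.0)


def build_data(values, weights):
    """Let zfit apply the cut itself (requires zfit to be installed)."""
    import zfit  # imported lazily so the NumPy part above runs without zfit

    # Assumption: a 1-D space named "x" with limits (-5, 5).
    obs = zfit.Space("x", limits=(-5, 5))
    return zfit.Data.from_numpy(obs=obs, array=values, weights=weights)
```

If the data has already been cut as in cut_to_limits, guarantee_limits=True tells zfit to skip its own check.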
- enable_hashing()
Enable hashing for this data object if it was disabled.
A hash allows some objects to be cached and reused. If a hash is enabled, the data object will be hashed and the hash _can_ be used for caching. This can speed up various objects; however, it may have no effect at all. For example, if an object was already called before with the data object, the hash will probably not be used, as the object is already compiled.
- property data_range
DEPRECATED FUNCTION
Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use space instead.
- set_data_range(data_range)
DEPRECATED FUNCTION
Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Do not change the range, preferably use pandas or similar, or use with_obs instead.
- property weights
Get the weights of the data.
- set_weights(weights: ztyping.WeightsInputType)
Set (temporarily) the weights of the dataset. (deprecated)
Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use with_weights instead.
- classmethod from_pandas(df, obs=None, *, weights=None, name=None, label=None, dtype=None, use_hash=None, guarantee_limits=False)
Create a Data from a pandas DataFrame. If obs is None, columns are used as obs.
- Parameters:
df (DataFrame) – pandas DataFrame that contains the data. If obs is None, columns are used as obs. Can be a superset of obs.
obs (Union[str, Iterable[str], Space]) – Space of the data. The space is used to define the observables and the limits of the data. If the Space has limits, these will be used to cut the data. If the data is already cut, use guarantee_limits for a possible performance improvement. If None, columns are used as obs.
weights (Union[Tensor, None, ndarray, str]) – Weights of the data. Has to be 1-D and match the shape of the data (nevents). Note that a weighted dataset may not be supported by all methods, or may need additional approximations to correct for the weights, taking more time.
name (str | None) – Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters (currently not used for a special purpose). For a human-readable name or description, use the label.
label (str | None) – Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatic purpose such as identification.
guarantee_limits (bool) – Guarantee that the data is within the limits. If True, the data will not be checked and _is assumed_ to be within the limits, possibly because it was already cut before. This can lead to a performance improvement as the data does not have to be checked. For example, if the data is a pd.DataFrame and the limits of obs have already been enforced through a query on the DataFrame, the limits are guaranteed to be correct and the data will not be checked again. This is a possible speedup and should not have any effect on the result.
dtype (DType) – The DType of the return value. Defaults to the zfit default (usually float64).
use_hash (bool | None) – If true, store a hash for caching. If a PDF can cache values, this option needs to be enabled for the PDF to be able to cache values.
- Returns:
zfit.Data or zfit.BinnedData: A Data object containing the unbinned data or a BinnedData if the obs is binned.
- Return type:
zfit.Data or zfit.BinnedData
- Raises:
ValueError – If the observables are not in the dataframe.
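The guarantee_limits use case described above, where the limits are enforced through a query on the DataFrame, can be sketched as follows. The column names, the limits (-5, 5) and the helper to_zfit are illustrative assumptions. zfit is imported inside the helper so that the pandas part runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [-7.0, -1.0, 0.5, 3.0, 9.0],
        "y": [0.1, 0.2, 0.3, 0.4, 0.5],  # extra column: df is a superset of obs
    }
)

# Enforce the limits of the observable "x" up front with a DataFrame query.
df_cut = df.query("x >= -5 and x <= 5")


def to_zfit(df_cut):
    """Hand the pre-cut frame to zfit (requires zfit to be installed)."""
    import zfit  # imported lazily so the pandas part above runs without zfit

    # Assumption: the same limits that were used in the query.
    obs = zfit.Space("x", limits=(-5, 5))
    # The data was already cut, so zfit is told not to check the limits again.
    return zfit.Data.from_pandas(df_cut, obs=obs, guarantee_limits=True)
```

Since guarantee_limits=True skips the check, the query and the limits of obs have to agree. Otherwise events outside the space would go unnoticed.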
- classmethod from_mapping(mapping, obs=None, *, weights=None, label=None, name=None, dtype=None, use_hash=None, guarantee_limits=False)
Create a Data from a mapping of observables to arrays.
- Parameters:
mapping (Mapping[str, Union[Tensor, TensorProtocol, int, float, bool, str, bytes, complex, tuple, list, ndarray, generic]]) – A mapping from the observables to the data, with the observables as keys and the data as values.
obs (Union[str, Iterable[str], Space]) – Space of the data. The space is used to define the observables and the limits of the data. If the Space has limits, these will be used to cut the data. If the data is already cut, use guarantee_limits for a possible performance improvement. They will be matched to the data in the same order. Can be omitted, in which case the keys of the mapping are used as observables.
weights (Union[Tensor, TensorProtocol, int, float, bool, str, bytes, complex, tuple, list, ndarray, generic, None]) – Weights of the data. Has to be 1-D and match the shape of the data (nevents). Note that a weighted dataset may not be supported by all methods, or may need additional approximations to correct for the weights, taking more time. Can also be a string that is a column in the dataframe. By default, look for a column "", i.e., an empty string.
name (str | None) – Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters (currently not used for a special purpose). For a human-readable name or description, use the label.
label (str | None) – Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatic purpose such as identification.
dtype (DType) – dtype of the data.
use_hash (bool | None) – If true, store a hash for caching. If a PDF can cache values, this option needs to be enabled for the PDF to be able to cache values.
guarantee_limits (bool | None) – Guarantee that the data is within the limits. If True, the data will not be checked and _is assumed_ to be within the limits, possibly because it was already cut before. This can lead to a performance improvement as the data does not have to be checked.
- Returns:
zfit.Data or zfit.BinnedData: A Data object containing the unbinned data or a BinnedData if the obs is binned.
- Return type:
zfit.Data or zfit.BinnedData
- Raises:
ValueError – If the observables are not in the mapping.
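A minimal sketch of a mapping suitable for from_mapping: the keys name the observables and each value holds one entry per event. The names x and y and the helper to_zfit are illustrative, and the equal-length check is an assumption about sensible input rather than something the text above states. zfit is imported lazily so that the preparation runs without it.

```python
import numpy as np

mapping = {
    "x": np.array([0.1, 0.2, 0.3]),
    "y": np.array([1.0, 2.0, 3.0]),
}

# With obs omitted, the keys of the mapping are used as observables.
inferred_obs = list(mapping)

# Assumption: every array holds one value per event, so lengths must agree.
lengths = {len(column) for column in mapping.values()}


def to_zfit(mapping):
    """Build the Data object (requires zfit to be installed)."""
    import zfit  # imported lazily so the NumPy part above runs without zfit

    return zfit.Data.from_mapping(mapping)
```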
- classmethod from_root(path, treepath, obs=None, *, weights=None, obs_alias=None, name=None, label=None, dtype=None, root_dir_options=None, use_hash=None, branches=None, branches_alias=None)
Create a Data from a ROOT file. The arguments are passed to uproot directly.
- Parameters:
path (str) – Path to the root file.
treepath (str) – Name of the tree in the root file.
obs (ZfitSpace) – Observables of the data. These will also be the columns of the data if no obs_alias is given.
weights (Union[Tensor, None, ndarray, str]) – Weights of the data. Has to be 1-D and match the shape of the data (nevents). Can be a column of the ROOT file by using a string corresponding to a column.
obs_alias (Mapping[str, str] | None) – A mapping from the obs (as keys) to the actual branches (as values) in the root file. This allows observable names that differ from the branch names in the file.
name (str | None) – Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters (currently not used for a special purpose). For a human-readable name or description, use the label.
label (str | None) – Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatic purpose such as identification.
dtype (DType) – dtype of the data.
root_dir_options – Options passed to uproot.
use_hash (bool | None) – If True, a hash of the data is created and is used to identify it in caching.
- Returns:
A Data object containing the unbinned data.
- Return type:
zfit.Data
- classmethod from_numpy(obs, array, *, weights=None, name=None, label=None, dtype=None, use_hash=None, guarantee_limits=False)
Create Data from a np.array.
- Parameters:
obs (Union[str, Iterable[str], Space]) – Space of the data. The space is used to define the observables and the limits of the data. If the Space has limits, these will be used to cut the data. If the data is already cut, use guarantee_limits for a possible performance improvement.
array (ndarray) – Numpy array containing the data. Has to be of shape (nevents, nobs) or, if only one observable, (nevents) is also possible.
weights (Union[Tensor, None, ndarray]) – Weights of the data. Has to be 1-D and match the shape of the data (nevents). Note that a weighted dataset may not be supported by all methods, or may need additional approximations to correct for the weights, taking more time.
name (str | None) – Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters (currently not used for a special purpose). For a human-readable name or description, use the label.
label (str | None) – Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatic purpose such as identification.
dtype (DType) – dtype of the data.
use_hash – If true, store a hash for caching. If a PDF can cache values, this option needs to be enabled for the PDF to be able to cache values.
guarantee_limits (bool) – Guarantee that the data is within the limits. If True, the data will not be checked and _is assumed_ to be within the limits, possibly because it was already cut before. This can lead to a performance improvement as the data does not have to be checked.
- Returns:
zfit.Data or zfit.BinnedData: A Data object containing the unbinned data or a BinnedData if the obs is binned.
- Return type:
zfit.Data or zfit.BinnedData
- Raises:
TypeError – If the array is not a numpy array.
- classmethod from_tensor(obs, tensor, *, weights=None, name=None, label=None, dtype=None, use_hash=None, guarantee_limits=False)
Create a Data from a tf.Tensor.
- Parameters:
obs (Union[str, Iterable[str], Space]) – Space of the data. The space is used to define the observables and the limits of the data. If the Space has limits, these will be used to cut the data. If the data is already cut, use guarantee_limits for a possible performance improvement.
tensor (Tensor) – Tensor containing the data. Has to be of shape (nevents, nobs) or, if only one observable, (nevents) is also possible.
weights (Union[Tensor, None, ndarray]) – Weights of the data. Has to be 1-D and match the shape of the data (nevents). Note that a weighted dataset may not be supported by all methods, or may need additional approximations to correct for the weights, taking more time.
name (str | None) – Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters (currently not used for a special purpose). For a human-readable name or description, use the label.
label (str | None) – Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatic purpose such as identification.
dtype (DType) – dtype of the data.
use_hash – If true, store a hash for caching. If a PDF can cache values, this option needs to be enabled for the PDF to be able to cache values.
guarantee_limits (bool) – Guarantee that the data is within the limits. If True, the data will not be checked and _is assumed_ to be within the limits, possibly because it was already cut before. This can lead to a performance improvement as the data does not have to be checked.
- Returns:
zfit.Data or zfit.BinnedData: A Data object containing the unbinned data or a BinnedData if the obs is binned.
- Return type:
zfit.Data or zfit.BinnedData
- Raises:
TypeError – If the tensor is not a tensorflow tensor.
- with_obs(obs, *, guarantee_limits=False)
Create a new Data with a subset of the data using the obs.
- Parameters:
obs (Union[str, Iterable[str], Space]) – Observables to return. Has to be a subset of the original observables.
guarantee_limits (bool) – Guarantee that the data is within the limits. If True, the data will not be checked and _is assumed_ to be within the limits, possibly because it was already cut before. This can lead to a performance improvement as the data does not have to be checked.
- Returns:
A new Data object containing the subset of the data.
- Return type:
zfit.Data
- to_pandas(obs=None, weightsname=None)
Create a pd.DataFrame from obs as columns and return it.
- Returns:
A pd.DataFrame containing the data and the weights (if present).
- Return type:
pd.DataFrame
- unstack_x(obs=None, always_list=None)
Return the unstacked data: a list of tensors or a single Tensor.
- value(obs=None, axis=None)
Return the data as a numpy-like object in obs order.
- Parameters:
obs (Union[str, Iterable[str], Space]) – Observables to return. If None, all observables are returned. Can be a subset of the original observables. If a string is given, a 1-D array is returned with shape (nevents,). If a list of strings or a zfit.Space is given, a 2-D array is returned with shape (nevents, nobs).
axis (int | None) – If given, the axis to return instead of the full data. If obs is a string, this has to be None.
- Return type:
Tensor
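The shape rules for obs can be illustrated with a small NumPy stand-in. The function value_like is hypothetical and only mirrors the documented shapes and column ordering; it is not zfit's implementation. The zfit_values helper shows the corresponding calls, assumes observables named x and y, and imports zfit lazily.

```python
import numpy as np

array = np.array([[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]])  # shape (nevents, nobs)
obs_names = ["x", "y"]


def value_like(array, obs_names, obs=None):
    """Hypothetical NumPy stand-in for the shape rules of Data.value."""
    if obs is None:
        return array  # all observables, shape (nevents, nobs)
    if isinstance(obs, str):
        return array[:, obs_names.index(obs)]  # 1-D, shape (nevents,)
    # A list of names: columns are returned in the requested obs order.
    return array[:, [obs_names.index(name) for name in obs]]


def zfit_values(array):
    """The same selections on a zfit Data (requires zfit to be installed)."""
    import zfit  # imported lazily so the NumPy part above runs without zfit

    data = zfit.Data.from_numpy(obs=["x", "y"], array=array)
    return data.value("y"), data.value(["y", "x"])
```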
- to_numpy()
Return the data as a numpy array.
Equivalent to the pandas DataFrame method.
- Returns:
The data as a numpy array.
- Return type:
np.ndarray
- sort_by_axes(axes: ztyping.AxesTypeInput, allow_superset: bool = True)
DEPRECATED FUNCTION
Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use with_obs instead.
- sort_by_obs(obs: ztyping.ObsTypeInput, allow_superset: bool = False)
DEPRECATED FUNCTION
Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use with_obs instead.
- to_binned(space, *, name=None, label=None, use_hash=None)
Bins the data using space and returns a BinnedData object.
- Parameters:
space (Space) – The space to bin the data in.
name (str | None) – Name of the data. This can possibly be used for future identification, with possible implications on the serialization and deserialization of the data. The name should therefore be “machine-readable” and not contain special characters (currently not used for a special purpose). For a human-readable name or description, use the label.
label (str | None) – Human-readable name or label of the data for a better description, to be used with plots etc. Can contain arbitrary characters. Has no programmatic purpose such as identification.
use_hash (bool | None) – If true, store a hash for caching. If a PDF can cache values, this option needs to be enabled for the PDF to be able to cache values.
- Returns:
A new BinnedData object containing the binned data.
- Return type:
zfit.BinnedData
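Conceptually, binning turns the collection of events into per-bin counts. The sketch below shows this with np.histogram as a stand-in, followed by a helper that performs the binning through to_binned. The five regular bins on (-5, 5), the observable name x and the helper bin_with_zfit are assumptions made for illustration. zfit is imported lazily so that the NumPy part runs on its own.

```python
import numpy as np

values = np.array([-4.5, -2.0, 0.5, 0.7, 3.5, 4.9])

# NumPy stand-in: five regular bins with edges -5, -3, -1, 1, 3, 5.
counts, edges = np.histogram(values, bins=5, range=(-5.0, 5.0))


def bin_with_zfit(values):
    """Bin the same events with zfit (requires zfit to be installed)."""
    import zfit  # imported lazily so the NumPy part above runs without zfit

    # Assumption: a regular binning that matches the NumPy histogram above.
    binning = zfit.binned.RegularBinning(5, -5, 5, name="x")
    space = zfit.Space("x", binning=binning)
    data = zfit.Data.from_numpy(obs="x", array=values)
    return data.to_binned(space)
```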
- add_cache_deps(cache_deps, allow_non_cachable=True)
Add dependencies that render the cache invalid if they change.
- classmethod from_asdf(asdf_obj, *, reuse_params=None)
Load an object from an asdf file.
- Parameters:
asdf_obj – Object.
reuse_params – If parameters, the parameters will be reused if they are given. If a parameter is given, it will be used as the parameter with the same name. If a parameter is not given, a new parameter will be created.
- classmethod from_dict(dict_, *, reuse_params=None)
Creates an object from a dictionary structure as generated by to_dict.
- Parameters:
dict_ – Dictionary structure.
reuse_params – If parameters, the parameters will be reused if they are given. If a parameter is given, it will be used as the parameter with the same name. If a parameter is not given, a new parameter will be created.
- Returns:
The deserialized object.
- classmethod from_json(cls, json, *, reuse_params=None)
Load an object from a json string.
- Parameters:
json (str) – Serialized object in a JSON string.
reuse_params – If parameters, the parameters will be reused if they are given. If a parameter is given, it will be used as the parameter with the same name. If a parameter is not given, a new parameter will be created.
- Returns:
The deserialized object.
- classmethod get_repr()
Abstract representation of the object for serialization.
This object knows how to serialize and deserialize the object and is used by the to_json, from_json, to_dict and from_dict methods.
- Returns:
The representation of the object.
- Return type:
pydantic.BaseModel
- register_cacher(cacher)
Register a cacher that caches values produced by this instance; a dependent.
- Parameters:
cacher (ztyping.CacherOrCachersType)
- reset_cache_self()
Clear the cache of self and all dependent cachers.
- to_asdf()
Convert the object to an asdf file.
- to_dict()
Convert the object to a nested dictionary structure.
- Returns:
The dictionary structure.