Bayesian Inference

Introduction

zfit provides a Bayesian inference framework that allows you to perform parameter estimation using MCMC (Markov Chain Monte Carlo) sampling. This functionality complements the frequentist approach of maximum likelihood estimation by incorporating prior knowledge and providing full posterior distributions for parameters.
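In symbols, the posterior combines the likelihood with the prior through Bayes' theorem. The notation below (parameters \theta, data x, likelihood \mathcal{L}, prior \pi) is introduced here for illustration and is not taken from zfit itself:

```latex
p(\theta \mid x)
  = \frac{\mathcal{L}(x \mid \theta)\,\pi(\theta)}
         {\int \mathcal{L}(x \mid \theta')\,\pi(\theta')\,\mathrm{d}\theta'}
\quad\Longrightarrow\quad
\log p(\theta \mid x)
  = \log \mathcal{L}(x \mid \theta) + \log \pi(\theta) + \text{const}
```

MCMC only needs the right-hand side up to the constant, which is why a sampler can work from a negative log-likelihood and the log-priors alone.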

Key Components

  • Priors: Define prior distributions for parameters

  • EmceeSampler: MCMC sampler based on the emcee ensemble sampler

  • PosteriorSamples: Result object for analyzing posterior distributions

  • ArviZ Integration: Advanced diagnostics and visualization through ArviZ

Priors

zfit provides several built-in prior distributions that can be attached to parameters:

import zfit
from zfit import prior

# Create parameters with priors
mu = zfit.Parameter("mu", 0.0, -5.0, 10.0, prior=prior.Normal(mu=0.0, sigma=2.0))
sigma = zfit.Parameter("sigma", 1.0, 0.1, 5.0, prior=prior.HalfNormal(sigma=2.0))
frac = zfit.Parameter("frac", 0.5, 0.0, 1.0, prior=prior.Uniform(lower=0.0, upper=1.0))
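To make concrete what these priors contribute, the sketch below evaluates the log-densities of a normal, a half-normal, and a uniform distribution at the initial values used above. It is written in plain Python and does not call zfit; the helper names are hypothetical, and any truncation of a prior to the parameter limits is ignored here.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log-density of a normal distribution with mean mu and width sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def halfnormal_logpdf(x, sigma):
    """Log-density of a half-normal distribution (support x >= 0)."""
    if x < 0:
        return -math.inf
    return math.log(2.0) + normal_logpdf(x, 0.0, sigma)


def uniform_logpdf(x, lower, upper):
    """Log-density of a uniform distribution on [lower, upper]."""
    if not (lower <= x <= upper):
        return -math.inf
    return -math.log(upper - lower)


# Sum of log-priors at the initial values mu=0.0, sigma=1.0, frac=0.5.
# Illustrative only: zfit's own prior objects are not used here.
log_prior = (
    normal_logpdf(0.0, mu=0.0, sigma=2.0)
    + halfnormal_logpdf(1.0, sigma=2.0)
    + uniform_logpdf(0.5, lower=0.0, upper=1.0)
)
print(f"log prior at initial values: {log_prior:.4f}")
```

A value outside a prior's support gives a log-density of minus infinity, so such a point can never be accepted by a sampler.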

Available Prior Distributions

import zfit
from zfit import prior

# List all public callables in the prior module
# (prior distributions plus helper classes such as transforms and constraints)
available_priors = [name for name in dir(prior) if not name.startswith('_') and callable(getattr(prior, name))]
print("Available prior distributions:")
for p in sorted(available_priors):
    print(f"  - {p}")
Available prior distributions:
  - AffineTransform
  - Beta
  - Cauchy
  - ConstraintType
  - Exponential
  - Gamma
  - HalfNormal
  - IdentityTransform
  - KDE
  - LogNormal
  - LogTransform
  - LowerBoundTransform
  - Normal
  - Poisson
  - PriorConstraint
  - SigmoidTransform
  - StudentT
  - Uniform
  - UpperBoundTransform

MCMC Sampling

We can sample from the posterior distribution using MCMC methods. zfit provides the EmceeSampler, which is based on the popular emcee library.

from zfit.mcmc import EmceeSampler

# Create sampler with custom settings
sampler = EmceeSampler(
    nwalkers=32,      # Number of walkers (default: 2 × n_params)
    verbosity=0,      # Verbosity level (0-6: no progress, 7: phases, 8+: progress bars)
)

print("EmceeSampler created with:")
print(f"  - nwalkers: {sampler.nwalkers}")
EmceeSampler created with:
  - nwalkers: 32
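The walkers are what makes emcee an ensemble sampler: its default proposal is the affine-invariant stretch move of Goodman and Weare, in which each walker is moved along the line through another randomly chosen walker. The following is a minimal pure-Python sketch of that move, written for illustration only. It is not emcee's or zfit's implementation, it updates walkers one at a time rather than in two parallel halves, and the function name `stretch_move_sampler` and its arguments are hypothetical.

```python
import math
import random


def stretch_move_sampler(log_prob, start, n_steps, a=2.0, seed=0):
    """Sample with the Goodman & Weare stretch move (serial variant).

    log_prob: function mapping a list of floats to an unnormalized log-density
    start:    list of walker positions, each a list of floats
    Returns snapshots indexed as chain[step][walker][dim].
    """
    rng = random.Random(seed)
    walkers = [list(w) for w in start]
    n_walkers, n_dim = len(walkers), len(walkers[0])
    log_p = [log_prob(w) for w in walkers]
    chain = []
    for _ in range(n_steps):
        for k in range(n_walkers):
            # Pick a different walker j to define the stretch direction.
            j = rng.randrange(n_walkers - 1)
            if j >= k:
                j += 1
            # Draw z from g(z) proportional to 1/sqrt(z) on [1/a, a].
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            proposal = [
                walkers[j][d] + z * (walkers[k][d] - walkers[j][d])
                for d in range(n_dim)
            ]
            log_p_new = log_prob(proposal)
            # Accept with probability min(1, z^(n_dim - 1) * p(new) / p(old)).
            log_accept = (n_dim - 1) * math.log(z) + log_p_new - log_p[k]
            if math.log(1.0 - rng.random()) < log_accept:
                walkers[k], log_p[k] = proposal, log_p_new
        chain.append([list(w) for w in walkers])
    return chain


def log_std_normal(x):
    """Unnormalized log-density of a standard normal in any dimension."""
    return -0.5 * sum(v * v for v in x)


# Toy target: 2D standard normal, 16 walkers started from random positions.
init_rng = random.Random(1)
start = [[init_rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(16)]
chain = stretch_move_sampler(log_std_normal, start, n_steps=2000)

# Drop the first 500 steps as warm-up and flatten walkers into one sample list.
samples = [w for step in chain[500:] for w in step]
mean0 = sum(s[0] for s in samples) / len(samples)
var0 = sum((s[0] - mean0) ** 2 for s in samples) / len(samples)
print(f"{len(samples)} samples, mean={mean0:.3f}, std={math.sqrt(var0):.3f}")
```

Because every proposal is built from the current positions of the other walkers, the move adapts to the scale and correlation of the target without a hand-tuned step size, which is the main reason for running many walkers at once.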

Basic Usage Example

Here’s a complete example of Bayesian inference with zfit:

import zfit
from zfit.mcmc import EmceeSampler
import numpy as np

# Set seed for reproducible results
zfit.settings.set_seed(42)

# Create parameters with priors
mu = zfit.Parameter("mu", 5.0, 4.5, 5.5,
                    prior=zfit.prior.Uniform(lower=4.8, upper=5.2))
sigma = zfit.Parameter("sigma", 0.1, 0.05, 0.3,
                      prior=zfit.prior.HalfNormal(sigma=0.1))

# Create a model
obs = zfit.Space("x", -10, 10)
gauss = zfit.pdf.Gauss(mu=mu, sigma=sigma, obs=obs)

# Create some data
data = zfit.Data.from_numpy(obs=obs, array=np.random.normal(5.0, 0.12, 1000))

# Create negative log-likelihood loss
nll = zfit.loss.UnbinnedNLL(model=gauss, data=data)

# Sample from the posterior (small sample for docs)
n_samples = 100
sampler = EmceeSampler(nwalkers=16, verbosity=0)
posterior = sampler.sample(nll, n_samples=n_samples, n_warmup=50)

# Display results
print("Posterior sampling completed:")
print(f"  - Parameters: {posterior.param_names}")
print(f"  - Samples shape: {posterior.samples.shape}")
print(f"  - Total samples: {len(posterior.samples)} ({sampler.nwalkers} walkers × {n_samples} steps)")
Posterior sampling completed:
  - Parameters: ['mu', 'sigma']
  - Samples shape: (1600, 2)
  - Total samples: 1600 (16 walkers × 100 steps)
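As a rough cross-check on what the sampler should report for mu, the posterior can be evaluated on a grid in a simplified version of this problem. The sketch below is independent of zfit and makes two assumptions that are not in the example above: sigma is fixed at its true value 0.12, and the data are drawn with Python's `random` module, so the numbers will not match the zfit run exactly.

```python
import math
import random

rng = random.Random(42)
true_sigma = 0.12  # assumed known, to keep the problem one-dimensional
data = [rng.gauss(5.0, true_sigma) for _ in range(1000)]
n = len(data)
xbar = sum(data) / n

# Flat prior on [4.8, 5.2]: inside the interval the log-posterior equals the
# log-likelihood up to a constant. Terms that do not depend on mu are dropped.
grid = [4.8 + 0.4 * i / 4000 for i in range(4001)]
log_post = [-0.5 * n * ((mu - xbar) / true_sigma) ** 2 for mu in grid]

# Normalize on the grid (subtract the maximum for numerical stability).
peak = max(log_post)
weights = [math.exp(lp - peak) for lp in log_post]
total = sum(weights)
post_mean = sum(w * mu for w, mu in zip(weights, grid)) / total
post_var = sum(w * (mu - post_mean) ** 2 for w, mu in zip(weights, grid)) / total
post_std = math.sqrt(post_var)

print(f"posterior mean {post_mean:.4f}, std {post_std:.4f}")
print(f"sigma / sqrt(n) = {true_sigma / math.sqrt(n):.4f}")
```

With a flat prior, the posterior width for mu is close to σ/√n = 0.12/√1000 ≈ 0.0038, so a posterior standard deviation of that order is what a well-behaved MCMC run on this model should show.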

Posterior Analysis

The PosteriorSamples object provides methods for analyzing the posterior:

# Get posterior statistics
mu_mean = posterior.mean("mu")
mu_std = posterior.std("mu")

print("Parameter 'mu':")
print(f"  - Mean: {mu_mean:.4f}")
print(f"  - Std:  {mu_std:.4f}")

# Get credible intervals
lower, upper = posterior.credible_interval("mu", alpha=0.05)  # 95% CI
print(f"  - 95% CI: [{lower:.4f}, {upper:.4f}]")

# Check convergence
print("\nConvergence:")
print(f"  - Converged: {posterior.converged}")
print(f"  - R̂: {posterior.rhat}")
print(f"  - ESS: {posterior.ess}")
Parameter 'mu':
  - Mean: 4.9967
  - Std:  0.0038
  - 95% CI: [4.9883, 5.0041]

Convergence:
  - Converged: True
  - R̂: [1.01587368 1.00995116]
  - ESS: [1899.44063636 1873.02121012]
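For readers who want to see what such summaries involve, the sketch below computes an equal-tailed credible interval and the classic Gelman-Rubin R̂ from raw chains using only the standard library. It is an illustration under stated assumptions, not a description of how `posterior.credible_interval` or `posterior.rhat` are implemented: zfit and ArviZ may use different variants (for example a rank-normalized split R̂), and the function names and synthetic chains here are hypothetical.

```python
import math
import random
import statistics


def credible_interval(samples, alpha=0.05):
    """Equal-tailed interval from empirical quantiles of a flat sample list."""
    ordered = sorted(samples)
    lo_idx = int(math.floor((alpha / 2.0) * (len(ordered) - 1)))
    hi_idx = int(math.ceil((1.0 - alpha / 2.0) * (len(ordered) - 1)))
    return ordered[lo_idx], ordered[hi_idx]


def gelman_rubin(chains):
    """Classic (non-split) R-hat for a list of equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: mean within-chain variance; B: between-chain variance scaled by n.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


# Synthetic stand-in for 16 walkers x 100 steps (independent draws, not MCMC).
rng = random.Random(0)
chains = [[rng.gauss(5.0, 0.004) for _ in range(100)] for _ in range(16)]
flat = [x for c in chains for x in c]

lower, upper = credible_interval(flat, alpha=0.05)
rhat = gelman_rubin(chains)
print(f"mean={statistics.fmean(flat):.4f}, std={statistics.stdev(flat):.4f}")
print(f"95% interval=[{lower:.4f}, {upper:.4f}], R-hat={rhat:.3f}")

# Chains that disagree about the mean push R-hat well above 1.
shifted = chains[:8] + [[x + 0.02 for x in c] for c in chains[8:]]
rhat_shifted = gelman_rubin(shifted)
print(f"R-hat with half the chains shifted: {rhat_shifted:.2f}")
```

R̂ compares the spread between chains with the spread within them, so values close to 1 indicate that the chains agree, while clearly larger values suggest that more warm-up or more samples are needed.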

Posterior Integration

The posterior samples can be written back to the zfit parameters they were sampled for. Calling update_params() sets each parameter to its posterior mean:

print("Original parameter values:")
print(f"  - mu: {mu.value():.4f}")
print(f"  - sigma: {sigma.value():.4f}")

# Set parameters to posterior means
posterior.update_params()

print("\nAfter updating with posterior means:")
print(f"  - mu: {mu.value():.4f}")
print(f"  - sigma: {sigma.value():.4f}")
Original parameter values:
  - mu: 5.0000
  - sigma: 0.1000

After updating with posterior means:
  - mu: 4.9967
  - sigma: 0.1187

For more advanced diagnostics and visualization, convert the posterior samples to an ArviZ InferenceData object. This gives access to ArviZ tools such as trace plots and pair plots.

import arviz as az

# Convert posterior samples to ArviZ InferenceData
inference_data = posterior.to_arviz()

# Plot trace and pair plots
az.plot_trace(inference_data)
az.plot_pair(inference_data)
<Axes: xlabel='mu', ylabel='sigma'>
[Figures: trace plot and pair plot of the posterior samples for mu and sigma]