Histogram#

The Histogram object is the core of boost-histogram.

Filling#

Call .fill to fill a histogram. You must supply one 1D array (or scalar value) per dimension. For maximum performance, numeric arrays should be contiguous in memory and hold either 64-bit floats or ints. If any other layout or numeric datatype is supplied, a temporary copy is made internally before filling.

All storages support a weight= parameter, and some storages support a sample= parameter. If supplied, they must be a scalar (applies to all items equally) or an iterable of scalars/1D arrays that matches the number of dimensions of the histogram.

The summing accumulators (not Mean() and WeightedMean()) support threaded filling. Pass threads=N to .fill to fill with N threads (threads=0 selects the number of virtual cores on your system). This helps only if you have a large number of entries compared to your number of bins, because all non-atomic storages make a copy for each thread and then recombine the copies after the fill is complete.

Data#

The primary values from a histogram are always available as .values(). The variances are available as .variances(), unless you fill an unweighted histogram with weights, which will cause this to return None, since the variances are no longer computable (use a weighted storage instead if you need the variances). The counts are available as .counts(). If the histogram is weighted, .counts() returns the effective counts; see UHI for details.

Views#

While Histograms do conform to the Python buffer protocol, the best way to get access to the raw contents of a histogram as a NumPy array is with .view(). This way you can optionally pass flow=True to get the flow bins, and if you have an accumulator storage, you will get a View, which is a slightly augmented ndarray subclass (see Accumulators). Views also support setting non-computed properties; you can use an expression like this to set the values of an accumulator storage:

h.view().value = values

You can also use stacked arrays (N+1 dimensional) to set a histogram’s contents. This is especially useful if you need to set a computed value, like variance on a Mean/WeightedMean storage, which cannot be set using the above method:

h[...] = np.stack([values, variances], axis=-1)

If you leave endpoints off (such as with ... above), then you can match the size with or without flow bins.

Operations#

h.ndim: The number of dimensions
h.size or len(h): The number of bins
+: Add two histograms, or add a scalar or array (storages must match types currently)
*=: Multiply by a scalar, array, or histogram (not all storages) (hist * scalar and scalar * hist supported too)
/=: Divide by a scalar, array, or histogram (not all storages) (hist / scalar supported too)
[...]: Access a bin or a range of bins (get or set) (see Indexing)
.sum(flow=False): The total count of all bins
.project(ax1, ax2, ..., flow=True): Project down to the listed axes (given as numbers); pass flow=False to drop the flow bins of the removed axes instead of summing them in
.to_numpy(flow=False, view=False): Convert to a NumPy-style tuple, with or without under/overflow bins; returns either the values (the default) or the entire view for accumulator storages
.view(flow=False): Get a view on the bin contents (with or without under/overflow bins)
.values(flow=False): Get a view on the values (counts or means, depending on storage)
.variances(flow=False): Get the variances if available
.counts(flow=False): Get the effective counts for all storage types
.reset(): Set counters to 0
.empty(flow=False): Check to see if the histogram is empty (can check flow bins too if asked)
.copy(deep=False): Make a copy of a histogram
.axes: Get the axes as a tuple-like (all properties of axes are available too)
- .axes[0]: Get the 0th axis
- .axes.edges: The bin edges as a broadcasting-ready array
- .axes.centers: The bin centers as a broadcasting-ready array
- .axes.widths: The bin widths as a broadcasting-ready array
- .axes.metadata: A tuple of the axes metadata
- .axes.traits: A tuple of the axes traits
- .axes.size: A tuple of the axes sizes (size without flow)
- .axes.extent: A tuple of the axes extents (size with flow)
- .axes.bin(*args): Returns the bin edges as a tuple of pairs (continuous axis) or values (discrete axis)
- .axes.index(*args): Returns the bin index at a value for each axis
- .axes.value(*args): Returns the bin value at an index for each axis

Saving a Histogram#

You can save a histogram using pickle:

import pickle

with open("file.pkl", "wb") as f:
    pickle.dump(h, f)

with open("file.pkl", "rb") as f:
    h2 = pickle.load(f)

assert h == h2

Special care was taken to ensure that this is fast and efficient. Please use the latest version of the Pickle protocol you feel comfortable using; you cannot use version 0, the version that used to be default on Python 2. The most recent versions provide performance benefits.

You can nest this in other Python structures, like dictionaries, and save those instead.

UHI dictionary serialization#

You can also convert a histogram to and from a plain dictionary following the UHI serialization schema. This is useful for sharing histograms as JSON (or any other format that supports nested dictionaries and arrays):

from boost_histogram.serialization import to_uhi, from_uhi

data = to_uhi(h)
h2 = from_uhi(data)

assert h == h2

If you only want to share the structure of a histogram – its axes, metadata, and storage type – without the (potentially large) bin contents, pass keep_storage=False. The resulting storage entry contains only its type:

data = to_uhi(h, keep_storage=False)
assert data["storage"] == {"type": "double"}

# Loading restores an empty histogram with the same axes and storage type
h_empty = from_uhi(data)

This is handy for transmitting just the “initialization” of a histogram (for example, to set up a matching histogram on a server) without serializing all of the zeros. from_uhi accepts both full and metadata-only dictionaries.
