boost_histogram#

class boost_histogram._internal.hist.Histogram(*axes: Axis | CppAxis | Histogram | Any, storage: Storage = boost_histogram._core.storage.double, metadata: Any = None)#

Bases: object

axes: AxesTuple#
copy(*, deep: bool = True) H#

Make a copy of the histogram. Defaults to making a deep copy (axis metadata copied); use deep=False to avoid making a copy of axis metadata.

counts(flow: bool = False) ndarray[Any, dtype[Any]]#

Returns the number of entries in each bin for an unweighted histogram or profile and an effective number of entries (defined below) for a weighted histogram or profile. An exotic generalized histogram could have no sensible .counts, so this is Optional and should be checked by Consumers.

If kind == “MEAN”, counts (effective or not) can and should be used to determine whether the mean value and its variance should be displayed (see documentation of values and variances, respectively). The counts should also be used to compute the error on the mean (see documentation of variances).

For a weighted histogram, counts is defined as sum_of_weights ** 2 / sum_of_weights_squared. It is less than or equal to the number of times the bin was filled; equality holds when all filled weights are equal. The larger the spread in weights, the smaller it is, but it is always 0 if filled 0 times, 1 if filled once, and more than 1 otherwise.

Returns:

np.typing.NDArray[Any]
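The effective-count formula above can be checked by hand. The following is a minimal sketch in plain Python; `effective_counts` is a hypothetical helper written for illustration and is not part of boost-histogram.

```python
def effective_counts(weights):
    """Effective number of entries: (sum w)**2 / (sum w**2), or 0.0 if never filled."""
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return sum_w**2 / sum_w2 if sum_w2 > 0 else 0.0


equal = effective_counts([2.0, 2.0, 2.0])    # equal weights: same as the fill count
spread = effective_counts([1.0, 1.0, 10.0])  # uneven weights: fewer effective entries
single = effective_counts([5.0])             # filled once
empty = effective_counts([])                 # never filled
```

With equal weights the result matches the number of fills (3); with one dominant weight it drops toward 1 while staying above it.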

empty(flow: bool = False) bool#

Check to see if the histogram has any non-default values. You can use flow=True to check flow bins too.

fill(*args: Any | str, weight: Any | None = None, sample: Any | None = None, threads: int | None = None) H#

Insert data into the histogram.

Parameters:
  • *args (Union[Array[float], Array[int], Array[str], float, int, str]) – Provide one value or array per dimension.

  • weight (List[Union[Array[float], Array[int], float, int, str]]) – Provide weights (only if the histogram storage supports it)

  • sample (List[Union[Array[float], Array[int], Array[str], float, int, str]]) – Provide samples (only if the histogram storage supports it)

  • threads (Optional[int]) – Fill with threads. Defaults to None, which does not activate threaded filling. Using 0 will automatically pick the number of available threads (usually two per core).

property kind: Kind#

Returns Kind.COUNT if this is a normal summing histogram, and Kind.MEAN if this is a mean histogram.

Returns:

Kind

property ndim: int#

Number of axes (dimensions) of the histogram.

project(*args: int) H | float | Any#

Project to a single axis or several axes on a multidimensional histogram. Provided a list of axis numbers, this will produce the histogram over those axes only. Flow bins are used if available.

reset() H#

Clear the bin counters.

property shape: tuple[int, ...]#

Tuple of axis sizes (not including underflow/overflow).

property size: int#

Total number of bins in the histogram (including underflow/overflow).

property storage_type: type[Storage]#
sum(flow: bool = False) float | Any#

Compute the sum over the histogram bins (optionally including the flow bins).

to_numpy(flow: bool = False, *, dd: bool = False, view: bool = False) tuple[ndarray[Any, dtype[Any]], ...] | tuple[ndarray[Any, dtype[Any]], tuple[ndarray[Any, dtype[Any]], ...]]#

Convert to a NumPy style tuple of return arrays. Edges are converted to match NumPy standards, with upper edge inclusive, unlike boost-histogram, where upper edge is exclusive.

Parameters:
  • flow (bool = False) – Include the flow bins.

  • dd (bool = False) – Use the histogramdd return syntax, where the edges are in a tuple. Otherwise, this is the histogram/histogram2d return style.

  • view (bool = False) – The behavior for the return value. By default, this will return an array of the values only, regardless of the storage (which is all NumPy’s histogram function can do). view=True will return the boost-histogram view of the storage.

Returns:

  • contents (Array[Any]) – The bin contents

  • *edges (Array[float]) – The edges for each dimension

values(flow: bool = False) ndarray[Any, dtype[Any]]#

Returns the accumulated values. The counts for simple histograms, the sum of weights for weighted histograms, the mean for profiles, etc.

If counts is equal to 0, the value in that cell is undefined if kind == “MEAN”.

Parameters:

  • flow – Enable flow bins. Not part of PlottableHistogram, but included for consistency with other methods and flexibility.

Returns:

np.typing.NDArray[Any]

variances(flow: bool = False) ndarray[Any, dtype[Any]] | None#

Returns the estimated variance of the accumulated values: the sum of squared weights for weighted histograms, the variance of samples for profiles, etc. For an unweighted histogram where kind == “COUNT”, this should return the same as values if the histogram was not filled with weights, and None otherwise. If counts is equal to 1 or less, the variance in that cell is undefined if kind == “MEAN”. This must be written <= 1, and not < 2; when counts is an effective count (weighted mean), it could be less than 2 but more than 1.

If kind == “MEAN”, the counts can be used to compute the error on the mean as sqrt(variances / counts). This works whether or not the entries are weighted, provided the weight variance was tracked by the implementation.

Currently, this always returns a value, but in the future it will return None if a weighted fill is made on an unweighted storage.

Parameters:

  • flow – Enable flow bins. Not part of PlottableHistogram, but included for consistency with other methods and flexibility.

Returns:

np.typing.NDArray[Any]
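The error-on-the-mean formula, sqrt(variances / counts), can be worked through for a single unweighted bin. This is a plain-Python sketch of the arithmetic only; it assumes the sample variance uses n - 1 in the denominator (consistent with the variance being undefined for counts <= 1) and does not call boost-histogram.

```python
import math

samples = [1.0, 2.0, 3.0]   # entries in one bin, unweighted
counts = len(samples)
mean = sum(samples) / counts

# Sample variance with n - 1 in the denominator; undefined when counts <= 1.
variance = sum((s - mean) ** 2 for s in samples) / (counts - 1)

error_on_mean = math.sqrt(variance / counts)
```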

view(flow: bool = False) ndarray[Any, dtype[Any]] | WeightedSumView | WeightedMeanView | MeanView#

Return a view into the data, optionally with overflow turned on.

boost_histogram.axis#

class boost_histogram.axis.ArrayTuple(iterable=(), /)#

Bases: tuple

broadcast() A#

The arrays in this tuple will be compressed if possible to save memory. Use this method to broadcast them out into their full memory representation.

class boost_histogram.axis.AxesTuple(_AxesTuple__iterable: Iterable[Axis])#

Bases: tuple

bin(*indexes: float) tuple[float, ...]#

Return the edges of the bins as a tuple for a continuous axis or the bin value for a non-continuous axis, when given an index.

property centers: ArrayTuple#
property edges: ArrayTuple#
property extent: tuple[int, ...]#
index(*values: float) tuple[float, ...]#

Return the fractional index(es) given a value (or values) on the axis.

property size: tuple[int, ...]#
value(*indexes: float) tuple[float, ...]#

Return the value(s) given a (fractional) index (or indices).

property widths: ArrayTuple#
class boost_histogram.axis.Axis(ax: Any, metadata: dict[str, Any] | None, __dict__: dict[str, Any] | None)#

Bases: object

bin(index: float) int | str | tuple[float, float]#

Return the edges of the bins as a tuple for a continuous axis or the bin value for a non-continuous axis, when given an index.

property centers: ndarray[Any, dtype[Any]]#

An array of bin centers.

property edges: ndarray[Any, dtype[Any]]#
property extent: int#

Return number of bins including under- and overflow.

index(value: float | str) int#

Return the fractional index(es) given a value (or values) on the axis.

property size: int#

Return number of bins excluding under- and overflow.

property traits: Traits#

Get traits for the axis - read only properties of a specific axis.

value(index: float) float#

Return the value(s) given a (fractional) index (or indices).

property widths: ndarray[Any, dtype[Any]]#

An array of bin widths.

class boost_histogram.axis.Boolean(*, metadata: Any = None, __dict__: dict[str, Any] | None = None)#

Bases: Axis

class boost_histogram.axis.IntCategory(categories: Iterable[int], *, metadata: Any = None, growth: bool = False, overflow: bool = True, __dict__: dict[str, Any] | None = None)#

Bases: BaseCategory

class boost_histogram.axis.Integer(start: int, stop: int, *, metadata: Any = None, underflow: bool = True, overflow: bool = True, growth: bool = False, circular: bool = False, __dict__: dict[str, Any] | None = None)#

Bases: Axis

class boost_histogram.axis.Regular(bins: int, start: float, stop: float, *, metadata: Any = None, underflow: bool = True, overflow: bool = True, growth: bool = False, circular: bool = False, transform: AxisTransform | None = None, __dict__: dict[str, Any] | None = None)#

Bases: Axis

property transform: AxisTransform | None#
class boost_histogram.axis.StrCategory(categories: Iterable[str], *, metadata: Any = None, growth: bool = False, overflow: bool = True, __dict__: dict[str, Any] | None = None)#

Bases: BaseCategory

index(value: float | str) int#

Return the fractional index(es) given a value (or values) on the axis.

class boost_histogram.axis.Traits(underflow: 'bool' = False, overflow: 'bool' = False, circular: 'bool' = False, growth: 'bool' = False, continuous: 'bool' = False, ordered: 'bool' = False)#

Bases: object

circular: bool = False#
continuous: bool = False#
property discrete: bool#

True if axis is not continuous

growth: bool = False#
ordered: bool = False#
overflow: bool = False#
underflow: bool = False#
class boost_histogram.axis.Variable(edges: Iterable[float], *, metadata: Any = None, underflow: bool = True, overflow: bool = True, growth: bool = False, circular: bool = False, __dict__: dict[str, Any] | None = None)#

Bases: Axis

boost_histogram.axis.transform#

class boost_histogram.axis.transform.AxisTransform#

Bases: object

forward(value: float) float#

Compute the forward transform

inverse(value: float) float#

Compute the inverse transform

class boost_histogram.axis.transform.Function(forward: Any, inverse: Any, *, convert: Any = None, name: str = '')#

Bases: AxisTransform

class boost_histogram.axis.transform.Pow(power: float)#

Bases: AxisTransform

property power: float#

The power of the transform

boost_histogram.accumulators#

boost_histogram.accumulators.Accumulator#

alias of Any

boost_histogram.numpy#

boost_histogram.numpy.histogram(a: object, bins: int | str | ndarray[Any, dtype[Any]] = 10, range: tuple[float, float] | None = None, normed: None = None, weights: object | None = None, density: bool = False, *, histogram: None | type[Histogram] = None, storage: Storage | None = None, threads: int | None = None) Any#

Return a boost-histogram object using the same arguments as numpy’s histogram. This does not support the deprecated normed=True argument. Three extra arguments are added: histogram=bh.Histogram will enable object based output, storage=bh.storage.* lets you set the storage used, and threads=int lets you set the number of threads to fill with (0 for auto, None for 1).

Compute the histogram of a dataset.

Parameters:
  • a (array_like) – Input data. The histogram is computed over the flattened array.

  • bins (int or sequence of scalars or str, optional) –

    If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths.

    Added in version 1.11.0.

    If bins is a string, it defines the method used to calculate the optimal bin width, as defined by histogram_bin_edges.

  • range ((float, float), optional) – The lower and upper range of the bins. If not provided, range is simply (a.min(), a.max()). Values outside the range are ignored. The first element of the range must be less than or equal to the second. range affects the automatic bin computation as well. While bin width is computed to be optimal based on the actual data within range, the bin count will fill the entire range including portions containing no data.

  • weights (array_like, optional) – An array of weights, of the same shape as a. Each value in a only contributes its associated weight towards the bin count (instead of 1). If density is True, the weights are normalized, so that the integral of the density over the range remains 1. Please note that the dtype of weights will also become the dtype of the returned accumulator (hist), so it must be large enough to hold accumulated values as well.

  • density (bool, optional) – If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function.

Returns:

  • hist (array) – The values of the histogram. See density and weights for a description of the possible semantics. If weights are given, hist.dtype will be taken from weights.

  • bin_edges (array of dtype float) – Return the bin edges (length(hist)+1).

See also

histogramdd, bincount, searchsorted, digitize, histogram_bin_edges

Notes

All but the last (righthand-most) bin is half-open. In other words, if bins is:

[1, 2, 3, 4]

then the first bin is [1, 2) (including 1, but excluding 2) and the second [2, 3). The last bin, however, is [3, 4], which includes 4.

Examples

>>> import numpy as np
>>> np.histogram([1, 2, 1], bins=[0, 1, 2, 3])
(array([0, 2, 1]), array([0, 1, 2, 3]))
>>> np.histogram(np.arange(4), bins=np.arange(5), density=True)
(array([0.25, 0.25, 0.25, 0.25]), array([0, 1, 2, 3, 4]))
>>> np.histogram([[1, 2, 1], [1, 0, 1]], bins=[0,1,2,3])
(array([1, 4, 1]), array([0, 1, 2, 3]))
>>> a = np.arange(5)
>>> hist, bin_edges = np.histogram(a, density=True)
>>> hist
array([0.5, 0. , 0.5, 0. , 0. , 0.5, 0. , 0.5, 0. , 0.5])
>>> hist.sum()
2.4999999999999996
>>> np.sum(hist * np.diff(bin_edges))
1.0

Added in version 1.11.0.

Automated Bin Selection Methods example, using 2 peak random data with 2000 points.

boost_histogram.numpy.histogram2d(x: object, y: object, bins: int | tuple[int, int] = 10, range: None | Sequence[None | tuple[float, float]] = None, normed: None = None, weights: object | None = None, density: bool = False, *, histogram: None | type[Histogram] = None, storage: Storage = boost_histogram._core.storage.double, threads: int | None = None) Any#

Return a boost-histogram object using the same arguments as numpy’s histogram2d. This does not support the deprecated normed=True argument. Three extra arguments are added: histogram=bh.Histogram will enable object based output, storage=bh.storage.* lets you set the storage used, and threads=int lets you set the number of threads to fill with (0 for auto, None for 1).

Compute the bi-dimensional histogram of two data samples.

Parameters:
  • x (array_like, shape (N,)) – An array containing the x coordinates of the points to be histogrammed.

  • y (array_like, shape (N,)) – An array containing the y coordinates of the points to be histogrammed.

  • bins (int or array_like or [int, int] or [array, array], optional) –

    The bin specification:

    • If int, the number of bins for the two dimensions (nx=ny=bins).

    • If array_like, the bin edges for the two dimensions (x_edges=y_edges=bins).

    • If [int, int], the number of bins in each dimension (nx, ny = bins).

    • If [array, array], the bin edges in each dimension (x_edges, y_edges = bins).

    • A combination [int, array] or [array, int], where int is the number of bins and array is the bin edges.

  • range (array_like, shape(2,2), optional) – The leftmost and rightmost edges of the bins along each dimension (if not specified explicitly in the bins parameters): [[xmin, xmax], [ymin, ymax]]. All values outside of this range will be considered outliers and not tallied in the histogram.

  • density (bool, optional) – If False, the default, returns the number of samples in each bin. If True, returns the probability density function at the bin, bin_count / sample_count / bin_area.

  • weights (array_like, shape(N,), optional) – An array of values w_i weighing each sample (x_i, y_i). Weights are normalized to 1 if density is True. If density is False, the values of the returned histogram are equal to the sum of the weights belonging to the samples falling into each bin.

Returns:

  • H (ndarray, shape(nx, ny)) – The bi-dimensional histogram of samples x and y. Values in x are histogrammed along the first dimension and values in y are histogrammed along the second dimension.

  • xedges (ndarray, shape(nx+1,)) – The bin edges along the first dimension.

  • yedges (ndarray, shape(ny+1,)) – The bin edges along the second dimension.

See also

histogram

1D histogram

histogramdd

Multidimensional histogram

Notes

When density is True, then the returned histogram is the sample density, defined such that the sum over bins of the product bin_value * bin_area is 1.

Please note that the histogram does not follow the Cartesian convention where x values are on the abscissa and y values on the ordinate axis. Rather, x is histogrammed along the first dimension of the array (vertical), and y along the second dimension of the array (horizontal). This ensures compatibility with histogramdd.

Examples

>>> import numpy as np
>>> from matplotlib.image import NonUniformImage
>>> import matplotlib.pyplot as plt

Construct a 2-D histogram with variable bin width. First define the bin edges:

>>> xedges = [0, 1, 3, 5]
>>> yedges = [0, 2, 3, 4, 6]

Next we create a histogram H with random bin content:

>>> x = np.random.normal(2, 1, 100)
>>> y = np.random.normal(1, 1, 100)
>>> H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))
>>> # Histogram does not follow Cartesian convention (see Notes),
>>> # therefore transpose H for visualization purposes.
>>> H = H.T

imshow can only display square bins:

>>> fig = plt.figure(figsize=(7, 3))
>>> ax = fig.add_subplot(131, title='imshow: square bins')
>>> plt.imshow(H, interpolation='nearest', origin='lower',
...         extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
<matplotlib.image.AxesImage object at 0x...>

pcolormesh can display actual edges:

>>> ax = fig.add_subplot(132, title='pcolormesh: actual edges',
...         aspect='equal')
>>> X, Y = np.meshgrid(xedges, yedges)
>>> ax.pcolormesh(X, Y, H)
<matplotlib.collections.QuadMesh object at 0x...>

NonUniformImage can be used to display actual bin edges with interpolation:

>>> ax = fig.add_subplot(133, title='NonUniformImage: interpolated',
...         aspect='equal', xlim=xedges[[0, -1]], ylim=yedges[[0, -1]])
>>> im = NonUniformImage(ax, interpolation='bilinear')
>>> xcenters = (xedges[:-1] + xedges[1:]) / 2
>>> ycenters = (yedges[:-1] + yedges[1:]) / 2
>>> im.set_data(xcenters, ycenters, H)
>>> ax.add_image(im)
>>> plt.show()

It is also possible to construct a 2-D histogram without specifying bin edges:

>>> # Generate non-symmetric test data
>>> n = 10000
>>> x = np.linspace(1, 100, n)
>>> y = 2*np.log(x) + np.random.rand(n) - 0.5
>>> # Compute 2d histogram. Note the order of x/y and xedges/yedges
>>> H, yedges, xedges = np.histogram2d(y, x, bins=20)

Now we can plot the histogram using pcolormesh, and a hexbin for comparison.

>>> # Plot histogram using pcolormesh
>>> fig, (ax1, ax2) = plt.subplots(ncols=2, sharey=True)
>>> ax1.pcolormesh(xedges, yedges, H, cmap='rainbow')
>>> ax1.plot(x, 2*np.log(x), 'k-')
>>> ax1.set_xlim(x.min(), x.max())
>>> ax1.set_ylim(y.min(), y.max())
>>> ax1.set_xlabel('x')
>>> ax1.set_ylabel('y')
>>> ax1.set_title('histogram2d')
>>> ax1.grid()
>>> # Create hexbin plot for comparison
>>> ax2.hexbin(x, y, gridsize=20, cmap='rainbow')
>>> ax2.plot(x, 2*np.log(x), 'k-')
>>> ax2.set_title('hexbin')
>>> ax2.set_xlim(x.min(), x.max())
>>> ax2.set_xlabel('x')
>>> ax2.grid()
>>> plt.show()
boost_histogram.numpy.histogramdd(a: tuple[object, ...], bins: int | tuple[int, ...] | tuple[ndarray[Any, dtype[Any]], ...] = 10, range: None | Sequence[None | tuple[float, float]] = None, normed: None = None, weights: object | None = None, density: bool = False, *, histogram: None | type[Histogram] = None, storage: Storage = boost_histogram._core.storage.double, threads: int | None = None) Any#

Return a boost-histogram object using the same arguments as numpy’s histogramdd. This does not support the deprecated normed=True argument. Three extra arguments are added: histogram=bh.Histogram will enable object based output, storage=bh.storage.* lets you set the storage used, and threads=int lets you set the number of threads to fill with (0 for auto, None for 1).

Compute the multidimensional histogram of some data.

Parameters:
  • sample ((N, D) array, or (N, D) array_like) –

    The data to be histogrammed.

    Note the unusual interpretation of sample when an array_like:

    • When an array, each row is a coordinate in a D-dimensional space - such as histogramdd(np.array([p1, p2, p3])).

    • When an array_like, each element is the list of values for single coordinate - such as histogramdd((X, Y, Z)).

    The first form should be preferred.

  • bins (sequence or int, optional) –

    The bin specification:

    • A sequence of arrays describing the monotonically increasing bin edges along each dimension.

    • The number of bins for each dimension (nx, ny, … =bins)

    • The number of bins for all dimensions (nx=ny=…=bins).

  • range (sequence, optional) – A sequence of length D, each an optional (lower, upper) tuple giving the outer bin edges to be used if the edges are not given explicitly in bins. An entry of None in the sequence results in the minimum and maximum values being used for the corresponding dimension. The default, None, is equivalent to passing a tuple of D None values.

  • density (bool, optional) – If False, the default, returns the number of samples in each bin. If True, returns the probability density function at the bin, bin_count / sample_count / bin_volume.

  • weights ((N,) array_like, optional) – An array of values w_i weighing each sample (x_i, y_i, z_i, …). Weights are normalized to 1 if density is True. If density is False, the values of the returned histogram are equal to the sum of the weights belonging to the samples falling into each bin.

Returns:

  • H (ndarray) – The multidimensional histogram of sample x. See density and weights for the different possible semantics.

  • edges (tuple of ndarrays) – A tuple of D arrays describing the bin edges for each dimension.

See also

histogram

1-D histogram

histogram2d

2-D histogram

Examples

>>> import numpy as np
>>> rng = np.random.default_rng()
>>> r = rng.normal(size=(100,3))
>>> H, edges = np.histogramdd(r, bins = (5, 8, 4))
>>> H.shape, edges[0].size, edges[1].size, edges[2].size
((5, 8, 4), 6, 9, 5)

boost_histogram.storage#

class boost_histogram.storage.AtomicInt64(*args: Any, **kwargs: Any)#

Bases: atomic_int64, Storage

accumulator#

alias of int

class boost_histogram.storage.Double(*args: Any, **kwargs: Any)#

Bases: double, Storage

accumulator#

alias of float

class boost_histogram.storage.Int64(*args: Any, **kwargs: Any)#

Bases: int64, Storage

accumulator#

alias of int

class boost_histogram.storage.Mean(*args: Any, **kwargs: Any)#

Bases: mean, Storage

class boost_histogram.storage.Storage#

Bases: object

accumulator: ClassVar[type[int] | type[float] | type[boost_histogram._core.accumulators.WeightedMean] | type[boost_histogram._core.accumulators.WeightedSum] | type[boost_histogram._core.accumulators.Mean]]#
class boost_histogram.storage.Unlimited(*args: Any, **kwargs: Any)#

Bases: unlimited, Storage

accumulator#

alias of float

class boost_histogram.storage.Weight(*args: Any, **kwargs: Any)#

Bases: weight, Storage

class boost_histogram.storage.WeightedMean(*args: Any, **kwargs: Any)#

Bases: weighted_mean, Storage

boost_histogram.tag#

class boost_histogram.tag.Locator(offset: int = 0)#

Bases: object

NAME = ''#
offset#
class boost_histogram.tag.Slicer#

Bases: object

This is a simple class to make slicing inside dictionaries simpler. This is how it should be used:

s = bh.tag.Slicer()

h[{0: s[::bh.rebin(2)]}] # rebin axis 0 by two

class boost_histogram.tag.at(value: int)#

Bases: object

value#
class boost_histogram.tag.loc(value: str | float, offset: int = 0)#

Bases: Locator

value#
class boost_histogram.tag.rebin(factor: int | None = None, *, groups: Sequence[int] | None = None)#

Bases: object

factor#
group_mapping(axis: PlottableAxis) Sequence[int]#
groups#
boost_histogram.tag.sum(iterable, /, start=0)#

Return the sum of a ‘start’ value (default: 0) plus an iterable of numbers

When the iterable is empty, return the start value. This function is intended specifically for use with numeric values and may reject non-numeric types.

boost_histogram.version#