Using boost-histogram

This workbook was designed with Python 3 in mind.

[1]:
import boost_histogram as bh
import numpy as np
import matplotlib.pyplot as plt
import functools
import operator

1: Basic 1D histogram

Let’s start with the basics. We will create a histogram using boost-histogram and fill it.

1.1: Data

Let’s make a 1d dataset to run on.

[2]:
data1 = np.random.normal(3.5, 2.5, size=1_000_000)

Now, let’s make a histogram

[3]:
hist1 = bh.Histogram(bh.axis.Regular(40, -2, 10))
[4]:
hist1.fill(data1)
[4]:
Histogram(Regular(40, -2, 10), storage=Double()) # Sum: 981297.0 (1000000.0 with flow)

You can see that the histogram has been filled. Let’s explicitly check to see how many entries are in the histogram:

[5]:
hist1.sum()
[5]:
981297.0

What happened to the missing items? They are in the underflow and overflow bins:

[6]:
hist1.sum(flow=True)
[6]:
1000000.0

Like ROOT, we have overflow bins by default. We can turn them off, but they enable some powerful things like projections.

Let’s plot this (Hist should make this easier):

[7]:
plt.bar(hist1.axes[0].centers, hist1.view(), width=hist1.axes[0].widths);
../_images/notebooks_BoostHistogramHandsOn_13_0.png

Note: You can leave off the .view() if you want to - histograms conform to the buffer protocol. Also, you can select the axes before or after calling .centers; this is very useful for ND histograms.

From now on, let’s be lazy

[8]:
plothist = lambda h: plt.bar(*h.axes.centers, h, width=h.axes.widths[0]);

Aside: here’s step. It’s quite ugly for us, just like it is for numpy. Or anyone.

[9]:
plt.step(hist1.axes[0].edges[:-1], hist1, where='post');
../_images/notebooks_BoostHistogramHandsOn_18_0.png

No plotting is built in, but the data is easy to access.

2: Drop-in replacement for Numpy

To start using this yourself, you don’t even need to change your code. Let’s try the numpy adapters.

[10]:
bins2, edges2 = bh.numpy.histogram(data1, bins=10)
[11]:
b2, e2 = np.histogram(data1, bins=10)
[12]:
bins2 - b2
[12]:
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
[13]:
e2 - edges2
[13]:
array([ 0.00000000e+00,  8.88178420e-16,  0.00000000e+00, -8.88178420e-16,
       -8.88178420e-16,  0.00000000e+00, -1.77635684e-15,  1.77635684e-15,
       -1.77635684e-15, -3.55271368e-15,  1.77635684e-15])

Not bad! Let’s start moving to the boost-histogram API, so we can use our little plotting function:

[14]:
hist2 = bh.numpy.histogram(data1, bins='auto', histogram=bh.Histogram)
plothist(hist2);
../_images/notebooks_BoostHistogramHandsOn_26_0.png

Now we can move over to boost-histogram one step at a time! Just to be complete, we can also go back to a Numpy tuple from a Histogram object:

[15]:
b2p, e2p = bh.numpy.histogram(data1, bins=10, histogram=bh.Histogram).to_numpy()
b2p == b2
[15]:
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True])

3: More dimensions

The same API works for multiple dimensions.

[16]:
hist3 = bh.Histogram(
    bh.axis.Regular(150, -1.5, 1.5),
    bh.axis.Regular(100, -1, 1)
)
[17]:
def make_2D_data(*, mean=(0, 0), widths=(1, 1), size=1_000_000):
    cov = np.asarray(widths) * np.eye(2)
    return np.random.multivariate_normal(mean, cov, size=size).T
[18]:
data3x = make_2D_data(mean=[-.75, .5], widths=[.2, 0.02])
data3y = make_2D_data(mean=[.75, .5], widths=[.2, 0.02])

From here on out, I will be using .reset() before a .fill(), just to make sure each cell in the notebook can be rerun.

[19]:
hist3.reset()
hist3.fill(*data3x)
hist3.fill(*data3y)
[19]:
Histogram(
  Regular(150, -1.5, 1.5),
  Regular(100, -1, 1),
  storage=Double()) # Sum: 1905708.0 (2000000.0 with flow)

Again, let’s make plotting a little function:

[20]:
def plothist2d(h):
    return plt.pcolormesh(*h.axes.edges.T, h.view().T)

This is transposed because `pcolormesh expects it <https://matplotlib.org/3.2.2/api/_as_gen/matplotlib.pyplot.pcolormesh.html#axes-pcolormesh-grid-orientation>`__.

[21]:
plothist2d(hist3);
../_images/notebooks_BoostHistogramHandsOn_39_0.png

Let’s try a 3D histogram

[22]:
data3d = [np.random.normal(size=1_000_000) for _ in range(3)]

hist3d = bh.Histogram(
    bh.axis.Regular(150, -5, 5),
    bh.axis.Regular(100, -5, 5),
    bh.axis.Regular(100, -5, 5)
)

hist3d.fill(*data3d)
[22]:
Histogram(
  Regular(150, -5, 5),
  Regular(100, -5, 5),
  Regular(100, -5, 5),
  storage=Double()) # Sum: 999997.0 (1000000.0 with flow)

Let’s project to the first two axes:

[23]:
plothist2d(hist3d.project(0, 1));
../_images/notebooks_BoostHistogramHandsOn_43_0.png

4: UHI

Let’s explore the boost-histogram UHI syntax. We will reuse the previous 2D histogram from part 3:

[24]:
plothist2d(hist3);
../_images/notebooks_BoostHistogramHandsOn_45_0.png

I can see that I want y from 0.25 to 0.75, in data coordinates:

[25]:
plothist2d(hist3[:, bh.loc(.25):bh.loc(.75)]);
../_images/notebooks_BoostHistogramHandsOn_47_0.png

What’s the contents of a bin?

[26]:
hist3[100, 87]
[26]:
192.0

How about in data coordinates?

[27]:
hist3[bh.loc(.5), bh.loc(.75)]
[27]:
192.0

Note: to get the coordinates manually:

hist3.axes[0].index(.5) == 100
hist3.axes[1].index(.75) == 87

How about a 1d histogram?

[28]:
plothist(hist3[:, ::bh.sum]);
plothist(hist3[::bh.sum, :]);
../_images/notebooks_BoostHistogramHandsOn_54_0.png

Let’s look at one part and rebin:

[29]:
plothist2d(hist3[:50:bh.rebin(2), 50::bh.rebin(2)]);
../_images/notebooks_BoostHistogramHandsOn_56_0.png

What is the value at (-.75, .5)?

[30]:
hist3[bh.loc(-.75), bh.loc(.5)]
[30]:
1049.0

5: Understanding accumulators

Boost-histogram has several different storages; storages store accumulators. Let’s try making a profile.

[31]:
mean = bh.accumulators.Mean()
mean.fill([.3, .4, .5])
[31]:
Mean(count=3, value=0.4, variance=0.01)

Here’s a quick example accessing the values:

[32]:
print(f"mean.count={mean.count} mean.value={mean.value:g} mean.variance={mean.variance:g}")

# Python 3.8:
# print(f"{mean.count=} {mean.value=} {mean.variance=}")
mean.count=3.0 mean.value=0.4 mean.variance=0.01

6: Changing the storage

[33]:
hist6 = bh.Histogram(bh.axis.Regular(10, 0, 10), storage=bh.storage.Mean())
[34]:
hist6.fill([0.5]*3, sample=[.3, .4, .5])
[34]:
Histogram(Regular(10, 0, 10), storage=Mean()) # Sum: Mean(count=3, value=0.4, variance=0.01)
[35]:
hist6[0]
[35]:
Mean(count=3, value=0.4, variance=0.01)
[36]:
hist6.view()
[36]:
MeanView(
      [(3., 0.4, 0.02), (0., 0. , 0.  ), (0., 0. , 0.  ), (0., 0. , 0.  ),
       (0., 0. , 0.  ), (0., 0. , 0.  ), (0., 0. , 0.  ), (0., 0. , 0.  ),
       (0., 0. , 0.  ), (0., 0. , 0.  )],
      dtype=[('count', '<f8'), ('value', '<f8'), ('sum_of_deltas_squared', '<f8')])
[37]:
hist6.view().value
[37]:
array([0.4, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ])
[38]:
hist6.view().variance
[38]:
array([ 0.01, -0.  , -0.  , -0.  , -0.  , -0.  , -0.  , -0.  , -0.  ,
       -0.  ])

7: Making a density histogram

Let’s try to make a density histogram like Numpy’s.

[39]:
bins = [-10, -7, -4, -3, -2, -1, -.75, -.5, -.25, 0, .25, .5, .75, 1, 2, 3, 4, 7, 10]
d7, e7 = np.histogram(data1 - 3.5, bins=bins, density=True)
plt.hist(data1 - 3.5, bins=bins, density=True);
../_images/notebooks_BoostHistogramHandsOn_72_0.png

Yes, it’s ugly. Don’t judge.

We don’t have a .density! What do we do? (note: density=True is supported if you do not return a bh object)

[40]:
hist7 = bh.numpy.histogram(data1 - 3.5, bins=bins, histogram=bh.Histogram)

widths = hist7.axes.widths
area = functools.reduce(operator.mul, hist7.axes.widths)

area
[40]:
array([3.  , 3.  , 1.  , 1.  , 1.  , 0.25, 0.25, 0.25, 0.25, 0.25, 0.25,
       0.25, 0.25, 1.  , 1.  , 1.  , 3.  , 3.  ])

Yes, that does not need to be so complicated for 1D, but it’s general.

[41]:
factor = np.sum(hist7.view())
view = hist7.view() / (factor * area)
[42]:
plt.bar(hist7.axes[0].centers, view, width=hist7.axes[0].widths);
../_images/notebooks_BoostHistogramHandsOn_77_0.png

8: Axis types

There are more axes types, and they all provide the same API in histograms, so they all just work without changes:

[43]:
hist8 = bh.Histogram(
    bh.axis.Regular(30, 1, 10, transform=bh.axis.transform.log),
    bh.axis.Regular(30, 1, 10, transform=bh.axis.transform.sqrt)
)
[44]:
hist8.reset()
hist8.fill(*make_2D_data(mean=(5, 5), widths=(5, 5)))
[44]:
Histogram(
  Regular(30, 1, 10, transform=log),
  Regular(30, 1, 10, transform=sqrt),
  storage=Double()) # Sum: 903458.0 (1000000.0 with flow)
[45]:
plothist2d(hist8);
../_images/notebooks_BoostHistogramHandsOn_81_0.png

9: And, circular, too!

[46]:
hist9 = bh.Histogram(bh.axis.Regular(30, 0, 2*np.pi, circular=True))
hist9.fill(np.random.uniform(0, np.pi*4, size=300))
[46]:
Histogram(Regular(30, 0, 6.28319, circular=True), storage=Double()) # Sum: 300.0

Now, the really complicated part, making a circular histogram:

[47]:
ax = plt.subplot(111, polar=True)
plothist(hist9);
../_images/notebooks_BoostHistogramHandsOn_85_0.png