Performance Comparison

This page compares the fill performance of boost-histogram with that of NumPy's histogram functions.

[1]:
import numpy as np
from numpy.testing import assert_allclose

import boost_histogram as bh
[2]:
import os

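# Use half of the logical CPUs; os.cpu_count() can return None, so fall back to one thread.
threads = max(1, (os.cpu_count() or 2) // 2)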
print(f"threads: {threads}")
threads: 8

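Testing setup

This is a simple 1D and 2D dataset used for the performance runs: 10 million normally distributed float32 values per dimension, histogrammed into 100 bins per axis over the range (-3, 3). The testing setup is the same as “MBP” in this post, a dual-core MacBook Pro 2015.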

[3]:
bins = (100, 100)
ranges = ((-3, 3), (-3, 3))
bins = np.asarray(bins).astype(np.int64)
ranges = np.asarray(ranges).astype(np.float64)

edges = (
    np.linspace(*ranges[0, :], bins[0] + 1),
    np.linspace(*ranges[1, :], bins[1] + 1),
)
[4]:
np.random.seed(42)
vals = np.random.normal(size=[2, 10_000_000]).astype(np.float32)
vals1d = np.random.normal(size=[10_000_000]).astype(np.float32)
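As a quick sanity check on this setup, the sketch below rebuilds the same binning on a much smaller sample and confirms that the manually computed edges agree with the ones `np.histogram` returns. The names `small`, `edges_x`, and `inside`, and the sample size, are illustrative assumptions and not part of the original notebook.

```python
import numpy as np

bins = (100, 100)
ranges = ((-3.0, 3.0), (-3.0, 3.0))

# Same edge construction as in the setup cell: n bins need n + 1 edges.
edges_x = np.linspace(*ranges[0], bins[0] + 1)

# A much smaller sample than the 10 million values used for the timings.
rng = np.random.default_rng(42)
small = rng.normal(size=(2, 10_000)).astype(np.float32)

# np.histogram returns its own edges; they should match the linspace edges.
# A loose tolerance is used because the returned edges may be float32.
_, np_edges = np.histogram(small[0], bins=bins[0], range=ranges[0])
np.testing.assert_allclose(np_edges, edges_x, rtol=1e-6, atol=1e-6)

# Fraction of the sample that falls inside the histogram range.
inside = float(np.mean(np.abs(small[0]) < 3))
print(f"{len(edges_x)} edges, fraction inside range: {inside:.4f}")
```

For standard-normal data, roughly 99.7% of the values fall inside (-3, 3); the rest land outside the binned range.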

Traditional 1D NumPy histogram

NumPy's 1D histogram is reasonably well optimized, so it provides a good baseline for comparison.

[5]:
answer, e = np.histogram(vals1d, bins=bins[0], range=ranges[0])
[6]:
%%timeit
h, _ = np.histogram(vals1d, bins=bins[0], range=ranges[0])
assert_allclose(h, answer, atol=1)
74.5 ms ± 2.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
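`%%timeit` is an IPython cell magic, so it only works inside a notebook or IPython session. A minimal sketch of a comparable measurement in a plain Python script, using the standard-library `timeit` module, could look like the following. The sample size, the helper name `fill_numpy`, and the repeat counts are assumptions chosen to keep the run short; the reported figure is the best run and not the mean ± std. dev. that `%%timeit` prints.

```python
import timeit

import numpy as np

# Smaller sample than the notebook's 10 million values, to keep this quick.
rng = np.random.default_rng(42)
sample = rng.normal(size=1_000_000).astype(np.float32)


def fill_numpy():
    """Histogram the sample with the same binning as the notebook."""
    return np.histogram(sample, bins=100, range=(-3, 3))


number = 5  # calls per timing run
times = timeit.repeat(fill_numpy, number=number, repeat=3)

# Report the fastest run, converted to milliseconds per call.
best_ms = min(times) / number * 1e3
print(f"np.histogram: {best_ms:.2f} ms per call (best of {len(times)} runs)")
```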

Boost histogram 1D

[7]:
%%timeit
hist = bh.Histogram(bh.axis.Regular(bins[0], *ranges[0]), storage=bh.storage.Int64())
hist.fill(vals1d)
assert_allclose(hist, answer, atol=1)
41.6 ms ± 712 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Boost histogram 1D NumPy clone

[8]:
%%timeit
h, _ = bh.numpy.histogram(vals1d, bins=bins[0], range=ranges[0])
assert_allclose(h, answer, atol=1)
43.1 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Boost histogram in 1D, threaded

[9]:
%%timeit
hist = bh.Histogram(bh.axis.Regular(bins[0], *ranges[0]), storage=bh.storage.Int64())

hist.fill(vals1d, threads=threads)
assert_allclose(hist, answer, atol=1)
13.3 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
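One way to picture why a threaded fill can give the same answer as a single-threaded one: histogram counts are additive, so the data can be split into chunks, each chunk histogrammed independently, and the partial counts summed. The sketch below illustrates that idea with NumPy and Python's `ThreadPoolExecutor` only; it is not a description of boost-histogram's internals, and `chunked_histogram` is a hypothetical helper written for this example.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunked_histogram(data, bins, range_, n_chunks):
    """Histogram `data` in `n_chunks` pieces and sum the partial counts."""
    chunks = np.array_split(data, n_chunks)

    def fill(chunk):
        counts, _ = np.histogram(chunk, bins=bins, range=range_)
        return counts

    # Each chunk is filled in its own worker thread.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(fill, chunks))

    # Counts are additive, so the partial histograms can simply be summed.
    return np.sum(partials, axis=0)


rng = np.random.default_rng(0)
data = rng.normal(size=200_000)

total = chunked_histogram(data, bins=100, range_=(-3, 3), n_chunks=4)
reference, _ = np.histogram(data, bins=100, range=(-3, 3))
print("chunked result matches single pass:", np.array_equal(total, reference))
```

Whether this yields a speedup in pure NumPy depends on how much of the work releases the GIL; the point here is only the correctness of the split-and-sum approach.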

Boost histogram 1D NumPy clone, threaded

[10]:
%%timeit
h, _ = bh.numpy.histogram(vals1d, bins=bins[0], range=ranges[0], threads=threads)
assert_allclose(h, answer, atol=1)
13.8 ms ± 238 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Traditional 2D NumPy histogram

NumPy's 2D histogram is not as well optimized for regularly spaced bins as the 1D version.

[11]:
answer2, *ledges = np.histogram2d(*vals, bins=bins, range=ranges)
[12]:
%%timeit
H, *ledges = np.histogram2d(*vals, bins=bins, range=ranges)
assert_allclose(H, answer2, atol=1)
874 ms ± 22.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Boost histogram in 2D

[13]:
%%timeit
hist = bh.Histogram(
    bh.axis.Regular(bins[0], *ranges[0]), bh.axis.Regular(bins[1], *ranges[1])
)
hist.fill(*vals)
assert_allclose(hist, answer2, atol=1)
77.6 ms ± 615 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Boost histogram 2D NumPy clone

[14]:
%%timeit
H, *ledges = bh.numpy.histogram2d(*vals, bins=bins, range=ranges)
assert_allclose(H, answer2, atol=1)
84.7 ms ± 2.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Boost histogram in 2D, threaded

[15]:
%%timeit
hist = bh.Histogram(
    bh.axis.Regular(bins[0], *ranges[0]), bh.axis.Regular(bins[1], *ranges[1])
)

hist.fill(*vals, threads=threads)
assert_allclose(hist, answer2, atol=1)
28.7 ms ± 708 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Boost histogram 2D NumPy clone, threaded

[16]:
%%timeit
H, *ledges = bh.numpy.histogram2d(*vals, bins=bins, range=ranges, threads=threads)
assert_allclose(H, answer2, atol=1)
29.6 ms ± 503 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
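To put the timings on this page side by side, the short sketch below computes each result's speedup relative to the NumPy baseline in the same dimension. The numbers are copied from the cell outputs above and are specific to the machine they were recorded on; the dictionary layout and labels are just one possible way to organize them.

```python
# Timings in milliseconds, copied from the cell outputs on this page.
timings_ms = {
    "1D": {
        "np.histogram": 74.5,
        "bh.Histogram": 41.6,
        "bh.numpy.histogram": 43.1,
        "bh.Histogram, threaded": 13.3,
        "bh.numpy.histogram, threaded": 13.8,
    },
    "2D": {
        "np.histogram2d": 874.0,
        "bh.Histogram": 77.6,
        "bh.numpy.histogram2d": 84.7,
        "bh.Histogram, threaded": 28.7,
        "bh.numpy.histogram2d, threaded": 29.6,
    },
}

speedups = {}
for dim, results in timings_ms.items():
    # The first entry in each group is the NumPy baseline.
    baseline = next(iter(results.values()))
    speedups[dim] = {name: baseline / ms for name, ms in results.items()}
    for name, factor in speedups[dim].items():
        print(f"{dim}  {name:32s} {factor:5.1f}x")
```

With these numbers, the single-threaded `bh.Histogram` fill comes out at about 1.8x NumPy in 1D and about 11x in 2D, while the threaded fills reach about 5.6x in 1D and about 30x in 2D.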