Best practice to share histograms between C++ and Python #400
I usually create large histograms in C++ that I want to analyze and visualize in Python. One possibility I could think of would be to recreate the histogram in Python from two files, one containing the axis information and the other containing the histogram values:

```cpp
#include <boost/archive/text_oarchive.hpp>
#include <boost/histogram.hpp>
#include <boost/histogram/ostream.hpp> // needed to stream the axes with operator<<
#include <boost/histogram/serialization.hpp>
#include <cmath> // M_PI (may require _USE_MATH_DEFINES on MSVC)
#include <fstream>
#include <iostream>
#include <sstream>

namespace bh = boost::histogram;

int main()
{
    auto hist = bh::make_histogram(bh::axis::regular<>(10, 0.0, 1.0, "radius"),
                                   bh::axis::circular<>(10, 0.f, 2.f * M_PI, "phi"),
                                   bh::axis::circular<>(10, 0.f, 1.f * M_PI, "theta"));

    // weighted fills; values outside a circular axis range wrap around
    hist(bh::weight(0.1), 0.1, 0.f * M_PI, 1 * M_PI);
    hist(bh::weight(0.2), 0.5, 3.f * M_PI, 2 * M_PI);
    hist(bh::weight(0.3), 0.9, 2.f * M_PI, 3 * M_PI);

    {
        // one row per inner bin: the three bin indices, then the bin content
        std::ofstream ofs("histogram.csv");
        for (auto&& x : indexed(hist))
            ofs << x.index(0) << "," << x.index(1) << "," << x.index(2) << ","
                << *x << "\n";
    }
    {
        // one line per axis, using the axis stream output
        std::ofstream ofs("metadata.txt");
        for (int i = 0; i < 3; ++i)
            ofs << hist.axis(i) << "\n";
    }
    return 0;
}
```

I can imagine that there are much better solutions, but I haven't found them yet.
Replies: 1 comment 18 replies
Hi, this use case is not covered in the docs, sorry.

If you have access to pybind11 and numpy on the machine where you generate your file in C++, you can call the C++ serialization code from boost-histogram directly to generate a pickle file, which you can then unpickle in Python as usual. The C++ code needed for Python's pickle protocol is here: You need to include this file and run this code (beware, this code has not actually been tested):

```cpp
py::tuple tup;
tuple_oarchive oa{tup};
oa << hist; // corresponds to `hist` in your example code

// call into Python libs to pickle `tup`
auto open = py::module_::import("gzip").attr("open");
auto dump = py::module_::import("pickle").attr("dump");
auto f = open("myfile.pkl.gz", "w");
dump(tup, f);
f.attr("close")();
```

This should generate a gzipped file with the binary contents of your histogram. Saving the data this way is much more efficient than generating an ASCII file, and the file will be quite small, too. I think we should add an example to this repo which demonstrates this.

Apart from this solution, I discussed with @henryiii and others at some point making a separate repo with shared serialization code for Boost.Histogram (this lib) and boost-histogram (the Python wrapper), but we never got around to it. It shouldn't be difficult, however. We could use the portable_binary archive from https://github.com/USCiLab/cereal to generate a common binary representation of the histogram. cereal is source compatible with Boost.Serialization, and Boost.Histogram already supports Boost.Serialization. We could also use Boost.Serialization directly, but it does not support portable binary.

That being said, the only caveat of the solution I show here is that you need pybind11 and the Python libs installed on the machine where you write the file in C++. If that's not an issue, it is a great solution.
You need to initialize the Python interpreter before you can call the Python API. See
https://pybind11.readthedocs.io/en/stable/advanced/embedding.html
You need to add this line:
The rest looks good.