I want to read a chunk of data which is just one frame of many frames stored in one dataset. The shape of the whole dataset is (10, 11214,3), 10 frames each frame has 11214 rows and 4 columns. Here is the file. The chunk I want to read would have the shape (11214,3). I can print the predefined array using, but I'm not sure how can I read data from a hdf5 file. Here is my code,
#include <h5xx/h5xx.hpp>
#include <boost/multi_array.hpp>
#include <iostream>
#include <vector>
#include <cstdio>
typedef boost::multi_array<int, 2> array_2d_t;
const int NI=10;
const int NJ=NI;
void print_array(array_2d_t const& array)
{
for (unsigned int j = 0; j < array.shape()[1]; j++)
{
for (unsigned int i = 0; i < array.shape()[0]; i++)
{
printf("%2d ", array[j][i]);
}
printf("\n");
}
}
void write_int_data(std::string const& filename, array_2d_t const& array)
{
h5xx::file file(filename, h5xx::file::trunc);
std::string name;
{
// --- create dataset and fill it with the default array data (positive values)
name = "integer array";
h5xx::create_dataset(file, name, array);
h5xx::write_dataset(file, name, array);
// --- create a slice object (aka hyperslab) to specify the location in the dataset to be overwritten
std::vector<int> offset; int offset_raw[2] = {4,4}; offset.assign(offset_raw, offset_raw + 2);
std::vector<int> count; int count_raw[2] = {2,2}; count.assign(count_raw, count_raw + 2);
h5xx::slice slice(offset, count);
}
}
void read_int_data(std::string const& filename)
{
h5xx::file file(filename, h5xx::file::in);
std::string name = "integer array";
// read and print the full dataset
{
array_2d_t array;
// --- read the complete dataset into array, the array is resized and overwritten internally
h5xx::read_dataset(file, name, array);
printf("original integer array read from file, negative number patch was written using a slice\n");
print_array(array);
printf("\n");
}
}
int main(int argc, char** argv)
{
std::string filename = argv[0];
filename.append(".h5");
// --- do a few demos/tests using integers
{
array_2d_t array(boost::extents[NJ][NI]);
{
const int nelem = NI*NJ;
int data[nelem];
for (int i = 0; i < nelem; i++)
data[i] = i;
array.assign(data, data + nelem);
}
write_int_data(filename, array);
read_int_data(filename);
}
return 0;
}
I'm using the h5xx — a template-based C++ wrapper for the HDF5 library link and boost library.
The datasets are stored in particles/lipids/box/positions path. The dataset name value holds the frames.
argv[0] is not what you want (arguments start at 1, 0 is the program name). Consider bounds checking as well:
std::vector<std::string> const args(argv, argv + argc);
std::string const filename = args.at(1) + ".h5";
the initialization can be done directly, without a temporary array (what is multi_array for, otherwise?)
for (size_t i = 0; i < array.num_elements(); i++)
array.data()[i] = i;
Or indeed, make it an algorithm:
std::iota(array.data(), array.data() + array.num_elements(), 0);
same with vectors:
std::vector<int> offset; int offset_raw[2] = {4,4}; offset.assign(offset_raw, offset_raw + 2);
std::vector<int> count; int count_raw[2] = {2,2}; count.assign(count_raw, count_raw + 2);
besides being a formatting mess can be simply
std::vector offset{4,4}, count{2,2};
h5xx::slice slice(offset, count);
On To The Real Question
The code has no relevance to the file. At all. I created some debug/tracing code to dump the file contents:
void dump(h5xx::group const& g, std::string indent = "") {
auto dd = g.datasets();
auto gg = g.groups();
for (auto it = dd.begin(); it != dd.end(); ++it) {
std::cout << indent << " ds:" << it.get_name() << "\n";
}
for (auto it = gg.begin(); it != gg.end(); ++it) {
dump(*it, indent + "/" + it.get_name());
}
}
int main()
{
h5xx::file xaa("xaa.h5", h5xx::file::mode::in);
dump(xaa);
}
Prints
/particles/lipids/box/edges ds:box_size
/particles/lipids/box/edges ds:step
/particles/lipids/box/edges ds:time
/particles/lipids/box/edges ds:value
/particles/lipids/box/positions ds:step
/particles/lipids/box/positions ds:time
/particles/lipids/box/positions ds:value
Now we can drill down to the dataset. Let's see whether we can figure out the correct type. It certainly is NOT array_2d_t:
h5xx::dataset ds(xaa, "particles/lipids/box/positions/value");
array_2d_t a;
h5xx::datatype detect(a);
std::cout << "type: " << std::hex << ds.get_type() << std::dec << "\n";
std::cout << "detect: " << std::hex << detect.get_type_id() << std::dec << "\n";
Prints
type: 30000000000013b
detect: 30000000000000c
That's a type mismatch. I guess I'll have to learn to read that gibberish as well...
Let's add some diagnostics:
void diag_type(hid_t type)
{
std::cout << " Class " << ::H5Tget_class(type) << std::endl;
std::cout << " Size " << ::H5Tget_size(type) << std::endl;
std::cout << " Sign " << ::H5Tget_sign(type) << std::endl;
std::cout << " Order " << ::H5Tget_order(type) << std::endl;
std::cout << " Precision " << ::H5Tget_precision(type) << std::endl;
std::cout << " NDims " << ::H5Tget_array_ndims(type) << std::endl;
std::cout << " NMembers " << ::H5Tget_nmembers(type) << std::endl;
}
int main()
{
h5xx::file xaa("xaa.h5", h5xx::file::mode::in);
// dump(xaa);
{
h5xx::group g(xaa, "particles/lipids/box/positions");
h5xx::dataset ds(g, "value");
std::cout << "dataset: " << std::hex << ds.get_type() << std::dec << std::endl;
diag_type(ds.get_type());
}
{
array_2d_t a(boost::extents[NJ][NI]);
h5xx::datatype detect(a);
std::cout << "detect: " << std::hex << detect.get_type_id() << std::dec << std::endl;
diag_type(detect.get_type_id());
}
}
Prints
dataset: 30000000000013b
Class 1
Size 4
Sign -1
Order 0
Precision 32
NDims -1
NMembers -1
detect: 30000000000000c
Class 0
Size 4
Sign 1
Order 0
Precision 32
NDims -1
NMembers -1
At least we know that HST_FLOAT (class 1) is required. Let's modify array_2d_t:
using array_2d_t = boost::multi_array<float, 2>;
array_2d_t a(boost::extents[11214][3]);
This at least makes the data appear similarly. Let's ... naively try to read:
h5xx::read_dataset(ds, a);
Oops, that predictably throws
terminate called after throwing an instance of 'h5xx::error'
what(): /home/sehe/Projects/stackoverflow/deps/h5xx/h5xx/dataset/boost_multi_array.hpp:176:read_dataset(): dataset "/particles/lipi
ds/box/positions/value" and target array have mismatching dimensions
No worries, we can guess:
using array_3d_t = boost::multi_array<float, 3>;
array_3d_t a(boost::extents[10][11214][3]);
h5xx::read_dataset(ds, a);
At least this does work. Adapting the print function:
template <typename T> void print_array(T const& array) {
for (auto const& row : array) {
for (auto v : row) printf("%5f ", v);
printf("\n");
}
}
Now we can print the first frame:
h5xx::read_dataset(ds, a);
print_array(*a.begin()); // print the first frame
This prints:
80.480003 35.360001 4.250000
37.450001 3.920000 3.960000
18.530001 -9.690000 4.680000
55.389999 74.339996 4.600000
22.110001 68.709999 3.850000
-4.130000 24.040001 3.730000
40.160000 6.390000 4.730000
-5.400000 35.730000 4.850000
36.669998 22.450001 4.080000
-3.680000 -10.660000 4.180000
(...)
That checks out with h5ls -r -d xaa.h5/particles/lipids/box/positions/value:
particles/lipids/box/positions/value Dataset {75/Inf, 11214, 3}
Data:
(0,0,0) 80.48, 35.36, 4.25, 37.45, 3.92, 3.96, 18.53, -9.69, 4.68,
(0,3,0) 55.39, 74.34, 4.6, 22.11, 68.71, 3.85, -4.13, 24.04, 3.73,
(0,6,0) 40.16, 6.39, 4.73, -5.4, 35.73, 4.85, 36.67, 22.45, 4.08, -3.68,
(0,9,1) -10.66, 4.18, 35.95, 36.43, 5.15, 57.17, 3.88, 5.08, -23.64,
(0,12,1) 50.44, 4.32, 6.78, 8.24, 4.36, 21.34, 50.63, 5.21, 16.29,
(0,15,1) -1.34, 5.28, 22.26, 71.25, 5.4, 19.76, 10.38, 5.34, 78.62,
(0,18,1) 11.13, 5.69, 22.14, 59.7, 4.92, 15.65, 47.28, 5.22, 82.41,
(0,21,1) 2.09, 5.24, 16.87, -11.68, 5.35, 15.54, -0.63, 5.2, 81.25,
(...)
The Home Stretch: Adding The Slice
array_2d_t read_frame(int frame_no) {
h5xx::file xaa("xaa.h5", h5xx::file::mode::in);
h5xx::group g(xaa, "particles/lipids/box/positions");
h5xx::dataset ds(g, "value");
array_2d_t a(boost::extents[11214][3]);
std::vector offsets{frame_no, 0, 0}, counts{1, 11214, 3};
h5xx::slice slice(offsets, counts);
h5xx::read_dataset(ds, a, slice);
return a;
}
There you have it. Now we can print any frame:
print_array(read_frame(0));
Printing the same as before. Let's try the last frame:
print_array(read_frame(9));
Prints
79.040001 36.349998 3.990000
37.250000 3.470000 4.140000
18.600000 -9.270000 4.900000
55.669998 75.070000 5.370000
21.920000 67.709999 3.790000
-4.670000 24.770000 3.690000
40.000000 6.060000 5.240000
-5.340000 36.320000 5.410000
36.369999 22.490000 4.130000
-3.520000 -10.430000 4.280000
(...)
Checking again with h5ls -r -d xaa.h5/particles/lipids/box/positions/value |& grep '(9' | head confirms:
(9,0,0) 79.04, 36.35, 3.99, 37.25, 3.47, 4.14, 18.6, -9.27, 4.9, 55.67,
(9,3,1) 75.07, 5.37, 21.92, 67.71, 3.79, -4.67, 24.77, 3.69, 40, 6.06,
(9,6,2) 5.24, -5.34, 36.32, 5.41, 36.37, 22.49, 4.13, -3.52, -10.43,
(9,9,2) 4.28, 35.8, 36.43, 4.99, 56.6, 4.09, 5.04, -23.37, 49.42, 3.81,
(9,13,0) 6.31, 8.83, 4.56, 22.01, 50.38, 5.43, 16.3, -2.92, 5.4, 22.02,
(9,16,1) 70.09, 5.36, 20.23, 11.12, 5.66, 78.48, 11.34, 6.09, 20.26,
(9,19,1) 61.45, 5.35, 14.25, 48.32, 5.35, 79.95, 1.71, 5.38, 17.56,
(9,22,1) -11.61, 5.39, 15.64, -0.19, 5.06, 80.43, 71.77, 5.29, 75.54,
(9,25,1) 35.14, 5.26, 22.45, 56.86, 5.56, 16.47, 52.97, 6.16, 20.62,
(9,28,1) 65.12, 5.26, 19.68, 71.2, 5.52, 23.39, 49.84, 5.28, 22.7,
Full Listing
#include <boost/multi_array.hpp>
#include <h5xx/h5xx.hpp>
#include <iostream>
using array_2d_t = boost::multi_array<float, 2>;
template <typename T> void print_array(T const& array)
{
for (auto const& row : array) { for (auto v : row)
printf("%5f ", v);
printf("\n");
}
}
void dump(h5xx::group const& g, std::string indent = "") {
auto dd = g.datasets();
auto gg = g.groups();
for (auto it = dd.begin(); it != dd.end(); ++it) {
std::cout << indent << " ds:" << it.get_name() << std::endl;
}
for (auto it = gg.begin(); it != gg.end(); ++it) {
dump(*it, indent + "/" + it.get_name());
}
}
array_2d_t read_frame(int frame_no) {
h5xx::file xaa("xaa.h5", h5xx::file::mode::in);
h5xx::group g(xaa, "particles/lipids/box/positions");
h5xx::dataset ds(g, "value");
array_2d_t arr(boost::extents[11214][3]);
std::vector offsets{frame_no, 0, 0}, counts{1, 11214, 3};
h5xx::slice slice(offsets, counts);
h5xx::read_dataset(ds, arr, slice);
return arr;
}
int main()
{
print_array(read_frame(9));
}
I have a struct that looks like this.
typedef struct superCellBoxStruct {
float_tt cmx,cmy,cmz; /* fractional center of mass coordinates */
float_tt ax,by,cz;
boost::shared_ptr<std::vector<atom>> atoms; /* contains all the atoms within the super cell */
} superCellBox;
now when I want to access atoms[i] I get
error: invalid use of ‘boost::detail::sp_array_access >::type {aka void}’
What is the proper way of passing around a shared vector in my application, or what is the correct way to access its operator[]?
Pick one:
(*atoms)[i]
atoms->operator[](i);
I usually go with the first, but they are all equivalent.
As a side note, in my experience a shared_ptr<vector> like that is usually a symptom of a bad design, maybe you want to put the entire superCellBox in a shared_ptr?
Also, this is not C, use struct name {}; instead typedef struct tagName {} name;
Prefer unique_ptr<T[]> if you can, because you get operator[] for free (§ 20.7.1.3.3):
Quick demo:
Live On Coliru
#include <memory>
#include <iostream>
int main() {
std::unique_ptr<int[]> p(new int[3] { 1,2,3 });
std::cout << "Before: " << p[0] << ", " << p[1] << ", " << p[2] << ";\n";
p[1] = 42;
std::cout << "After: " << p[0] << ", " << p[1] << ", " << p[2] << ";\n";
}
Prints:
Before: 1, 2, 3;
After: 1, 42, 3;
UPDATE
In response to the comment, just make a small wrapper:
Live On Coliru
#include <memory>
template <typename RAContainer> struct shared_randomaccess_container
{
template <typename... A> shared_randomaccess_container(A&&... args)
: _ptr(new RAContainer{ std::forward<A>(args)... })
{ }
template <typename T> shared_randomaccess_container(std::initializer_list<T> init)
: _ptr(std::make_shared<RAContainer>(init))
{ }
auto begin() const -> typename RAContainer::const_iterator { return _ptr->begin(); }
auto end () const -> typename RAContainer::const_iterator { return _ptr->end (); }
auto begin() -> typename RAContainer::iterator { return _ptr->begin(); }
auto end () -> typename RAContainer::iterator { return _ptr->end (); }
template <typename Idx>
typename RAContainer::value_type const& operator[](Idx i) const { return (*_ptr)[i]; }
template <typename Idx>
typename RAContainer::value_type& operator[](Idx i) { return (*_ptr)[i]; }
template <typename Idx>
typename RAContainer::value_type const& at(Idx i) const { return _ptr->at(i); }
template <typename Idx>
typename RAContainer::value_type& at(Idx i) { return _ptr->at(i); }
protected:
using Ptr = std::shared_ptr<RAContainer>;
Ptr _ptr;
};
////////////////////////////////////////////////////
// demo intances
#include <vector>
template <typename... Ts> using shared_vector = shared_randomaccess_container<std::vector<Ts...> >;
You can use it like:
shared_vector<int> sv {1,2,3};
std::cout << "Before: ";
for (auto i : sv) std::cout << i << " ";
sv[1] = 42;
std::cout << "\nAfter: ";
for (auto i : sv) std::cout << i << " ";
Prints:
Before: 1 2 3
After: 1 42 3
Bonus
Let's also support aggregate initializing containers with the same technique
Live On Coliru
Output:
void test() [with With = std::vector<int>]
Before: 1 2 3
After: 1 42 3
void test() [with With = std::array<int, 3ul>]
Before: 1 2 3
After: 1 42 3
void test() [with With = shared_randomaccess_container<std::vector<int>, false>]
Before: 1 2 3
After: 1 42 3
void test() [with With = shared_randomaccess_container<std::array<int, 3ul>, true>]
Before: 1 2 3
After: 1 42 3
I am probably making some elementary mistake here but given:
std::array<int, 3> arr = { 1, 2, 3 };
std::vector<int> vecint;
vecint.push_back(1);
vecint.push_back(2);
This is one obvious way to compare the elements in arr with the ones in vecint.
std::for_each(vecint.begin(), vecint.end(), [&arr](int vecvalue) {
for (auto arritr = arr.begin(); arritr != arr.end(); ++arritr) {
if (vecvalue == *arritr) {
std::cout << "found!!! " << vecvalue << "\n";
}
}
});
However, should I be able to do it like this too?
std::for_each(vecint.begin(), vecint.end(), [&arr](int vecvalue) {
if (std::find(arr.begin(), arr.end(), [=](int arrval) { return vecvalue == arrval; }) != arr.end()) {
std::cout << "found!!! " << vecvalue << "\n";
}
});
The latter fails to compile in VC11 with the following error:
1>c:\program files (x86)\microsoft visual studio 11.0\vc\include\xutility(3186): error C2678: binary '==' : no operator found which takes a left-hand operand of type 'int' (or there is no acceptable conversion)
What am I missing?
cppreference on std::find and std::find_if
std::find takes a value to compare with as the third parameter, whereas std::find_if takes a UnaryPredicate (a function object taking one parameter). You probably just had a typo / wanted to use std::find_if.
Using std::find_if works for me. Live example.
#include <array>
#include <vector>
#include <iostream>
#include <algorithm>
int main()
{
std::array<int, 3> arr = {{ 1, 2, 3 }};
std::vector<int> vecint;
vecint.push_back(1);
vecint.push_back(2);
std::for_each
(
vecint.begin(), vecint.end(),
[&arr](int vecvalue)
{
if (std::find_if(arr.begin(), arr.end(),
[=](int arrval) { return vecvalue == arrval; })
!= arr.end())
{
std::cout << "found!!! " << vecvalue << "\n";
}
}
);
}
A simpler version is of course to use std::find (correctly):
std::for_each
(
vecint.begin(), vecint.end(),
[&arr](int vecvalue)
{
if (std::find(arr.begin(), arr.end(), vecvalue) != arr.end())
{
std::cout << "found!!! " << vecvalue << "\n";
}
}
);
Then, there's of course the range-based-for-loop variant, if your compiler supports it:
for(auto const& ve : vecint)
{
for(auto const& ae : arr)
{
if(ve == ae)
{
std::cout << "found!!! " << ve << "\n";
}
}
}
If your ranges are sorted, there are faster algorithms to get the intersection. Either you write your own loop to invoke an action for each element in the intersection, or you let the Standard Library copy the intersection into a new container:
#include <iterator> // additionally
std::vector<int> result;
std::set_intersection(arr.begin(), arr.end(), vecint.begin(), vecint.end(),
std::back_inserter(result));
for(auto const& e : result)
{
std::cout << e << std::endl;
}
What happens under the hood - why you get that error:
std::find is defined as (from cppreference):
template< class InputIt, class T >
InputIt find( InputIt first, InputIt last, const T& value );
That is, the type of value and the type of the iterators are independent. However, in the implementation of std::find, there has to be a comparison like:
if(*first == value) { return first; }
And at this point, you're comparing an int (type of the expression *first) with a lambda (type of value) more precisely: with a closure type. This is ill-formed (luckily), as there's no conversion from a lambda to an int, and there's no comparison operator declared that's applicable here.