intrusive_ptr, shared_ptr performance tests - boost

#include <boost/intrusive_ptr.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>
#include <memory>
#include <string>
#include <iostream>
#include <ctime>
#include <tchar.h>
#include <conio.h>
using namespace std;
using boost::intrusive_ptr;

class X {
public:
    std::string name;
    int age;
    long references;
    X(std::string n, int a) : name(n), age(a), references(0) {}
};
inline void intrusive_ptr_add_ref(X* x){
    ++x->references;
}
inline void intrusive_ptr_release(X* x){
    if(--x->references == 0)
        delete x;
}
int _tmain(int argc, _TCHAR* argv[])
{
    time_t t=clock();
    size_t rounds=1000000;
    for(size_t i=0; i<rounds; i++)
    {
        intrusive_ptr<X> myX(new X("Michael",40));
        myX->age++;
    }
    cout << "Time taken to generate " << rounds << " of intrusive_ptr is "
         << clock()-t << endl;
    t=clock();
    for(size_t i=0; i<rounds; i++)
    {
        boost::shared_ptr<X> myX(new X("Michael",40));
        myX->age++;
    }
    cout << "Time taken to generate " << rounds << " of shared_ptr is "
         << clock()-t << endl;
    t=clock();
    for(size_t i=0; i<rounds; i++)
    {
        std::shared_ptr<X> myX(new X("Michael",40));
        myX->age++;
    }
    cout << "Time taken to generate " << rounds << " of Microsoft shared_ptr is "
         << clock()-t << endl;
    t=clock();
    for(size_t i=0; i<rounds; i++)
    {
        boost::shared_ptr<X> myX=boost::make_shared<X>("Michael",40);
        myX->age++;
    }
    cout << "Time taken to generate " << rounds << " of shared_ptr using make_shared is "
         << clock()-t << endl;
    t=clock();
    for(size_t i=0; i<rounds; i++)
    {
        std::shared_ptr<X> myX=std::make_shared<X>("Michael",40);
        myX->age++;
    }
    cout << "Time taken to generate " << rounds << " of Microsoft shared_ptr using make_shared is "
         << clock()-t << endl;
    _getche();
    return 0;
}
I got the results below using VS2010 in release mode.
Time taken to generate 1000000 of intrusive_ptr is 116
Time taken to generate 1000000 of shared_ptr is 175
Time taken to generate 1000000 of Microsoft shared_ptr is 182
Time taken to generate 1000000 of shared_ptr using make_shared is 176
Time taken to generate 1000000 of Microsoft shared_ptr using make_shared is 120
intrusive_ptr seems to be the fastest, but MS also seems to be doing well with shared_ptr via the make_shared function. But why is boost's make_shared not performing as well as the MS version? Has anybody done a similar test? Is something wrong with my test, or is there something I didn't consider?
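One thing worth checking before comparing compilers is how many heap allocations each variant actually performs: shared_ptr(new X(...)) allocates the object and a separate control block, make_shared is supposed to fuse the two into one allocation, and intrusive_ptr keeps the count inside X itself, so it never needs a second allocation. A rough, single-threaded sketch for counting allocations in a test like this (the counter and the simplified operator new are mine, not part of the benchmark above):

#include <cstdlib>
#include <new>

static size_t g_allocations = 0;            // illustrative global counter (not thread-safe)

void* operator new(std::size_t size)        // simplified replacement: no new_handler loop
{
    ++g_allocations;
    if (void* p = std::malloc(size))
        return p;
    throw std::bad_alloc();
}

void operator delete(void* p) throw()       // use noexcept on post-C++03 compilers
{
    std::free(p);
}

Resetting g_allocations before each timing loop and printing it afterwards shows directly whether boost::make_shared on this particular Boost/VS2010 combination really performs a single combined allocation the way std::make_shared does; if it does not, that alone would explain most of the gap.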

How to read chunk of the data from a hdf5 file in c++?

I want to read a chunk of data which is just one frame out of the many frames stored in one dataset. The shape of the whole dataset is (10, 11214, 3): 10 frames, where each frame has 11214 rows and 3 columns. Here is the file. The chunk I want to read would have the shape (11214, 3). I can print a predefined array using the code below, but I'm not sure how to read data from an HDF5 file. Here is my code,
#include <h5xx/h5xx.hpp>
#include <boost/multi_array.hpp>
#include <iostream>
#include <vector>
#include <cstdio>

typedef boost::multi_array<int, 2> array_2d_t;

const int NI=10;
const int NJ=NI;

void print_array(array_2d_t const& array)
{
    for (unsigned int j = 0; j < array.shape()[1]; j++)
    {
        for (unsigned int i = 0; i < array.shape()[0]; i++)
        {
            printf("%2d ", array[j][i]);
        }
        printf("\n");
    }
}

void write_int_data(std::string const& filename, array_2d_t const& array)
{
    h5xx::file file(filename, h5xx::file::trunc);
    std::string name;
    {
        // --- create dataset and fill it with the default array data (positive values)
        name = "integer array";
        h5xx::create_dataset(file, name, array);
        h5xx::write_dataset(file, name, array);
        // --- create a slice object (aka hyperslab) to specify the location in the dataset to be overwritten
        std::vector<int> offset; int offset_raw[2] = {4,4}; offset.assign(offset_raw, offset_raw + 2);
        std::vector<int> count; int count_raw[2] = {2,2}; count.assign(count_raw, count_raw + 2);
        h5xx::slice slice(offset, count);
    }
}

void read_int_data(std::string const& filename)
{
    h5xx::file file(filename, h5xx::file::in);
    std::string name = "integer array";
    // read and print the full dataset
    {
        array_2d_t array;
        // --- read the complete dataset into array, the array is resized and overwritten internally
        h5xx::read_dataset(file, name, array);
        printf("original integer array read from file, negative number patch was written using a slice\n");
        print_array(array);
        printf("\n");
    }
}

int main(int argc, char** argv)
{
    std::string filename = argv[0];
    filename.append(".h5");
    // --- do a few demos/tests using integers
    {
        array_2d_t array(boost::extents[NJ][NI]);
        {
            const int nelem = NI*NJ;
            int data[nelem];
            for (int i = 0; i < nelem; i++)
                data[i] = i;
            array.assign(data, data + nelem);
        }
        write_int_data(filename, array);
        read_int_data(filename);
    }
    return 0;
}
I'm using h5xx, a template-based C++ wrapper for the HDF5 library (link), and the Boost library.
The datasets are stored under the particles/lipids/box/positions path; the dataset named value holds the frames.
argv[0] is not what you want (arguments start at 1, 0 is the program name). Consider bounds checking as well:
std::vector<std::string> const args(argv, argv + argc);
std::string const filename = args.at(1) + ".h5";
The initialization can be done directly, without a temporary array (what is multi_array for, otherwise?):
for (size_t i = 0; i < array.num_elements(); i++)
    array.data()[i] = i;
Or indeed, make it an algorithm:
std::iota(array.data(), array.data() + array.num_elements(), 0);
The same goes for the vectors:
std::vector<int> offset; int offset_raw[2] = {4,4}; offset.assign(offset_raw, offset_raw + 2);
std::vector<int> count; int count_raw[2] = {2,2}; count.assign(count_raw, count_raw + 2);
which, besides being a formatting mess, can simply be:
std::vector offset{4,4}, count{2,2};
h5xx::slice slice(offset, count);
On To The Real Question
The code has no relevance to the file. At all. I created some debug/tracing code to dump the file contents:
void dump(h5xx::group const& g, std::string indent = "") {
    auto dd = g.datasets();
    auto gg = g.groups();
    for (auto it = dd.begin(); it != dd.end(); ++it) {
        std::cout << indent << " ds:" << it.get_name() << "\n";
    }
    for (auto it = gg.begin(); it != gg.end(); ++it) {
        dump(*it, indent + "/" + it.get_name());
    }
}

int main()
{
    h5xx::file xaa("xaa.h5", h5xx::file::mode::in);
    dump(xaa);
}
Prints
/particles/lipids/box/edges ds:box_size
/particles/lipids/box/edges ds:step
/particles/lipids/box/edges ds:time
/particles/lipids/box/edges ds:value
/particles/lipids/box/positions ds:step
/particles/lipids/box/positions ds:time
/particles/lipids/box/positions ds:value
Now we can drill down to the dataset. Let's see whether we can figure out the correct type. It certainly is NOT array_2d_t:
h5xx::dataset ds(xaa, "particles/lipids/box/positions/value");
array_2d_t a;
h5xx::datatype detect(a);
std::cout << "type: " << std::hex << ds.get_type() << std::dec << "\n";
std::cout << "detect: " << std::hex << detect.get_type_id() << std::dec << "\n";
Prints
type: 30000000000013b
detect: 30000000000000c
That's a type mismatch. I guess I'll have to learn to read that gibberish as well...
Let's add some diagnostics:
void diag_type(hid_t type)
{
    std::cout << " Class " << ::H5Tget_class(type) << std::endl;
    std::cout << " Size " << ::H5Tget_size(type) << std::endl;
    std::cout << " Sign " << ::H5Tget_sign(type) << std::endl;
    std::cout << " Order " << ::H5Tget_order(type) << std::endl;
    std::cout << " Precision " << ::H5Tget_precision(type) << std::endl;
    std::cout << " NDims " << ::H5Tget_array_ndims(type) << std::endl;
    std::cout << " NMembers " << ::H5Tget_nmembers(type) << std::endl;
}

int main()
{
    h5xx::file xaa("xaa.h5", h5xx::file::mode::in);
    // dump(xaa);
    {
        h5xx::group g(xaa, "particles/lipids/box/positions");
        h5xx::dataset ds(g, "value");
        std::cout << "dataset: " << std::hex << ds.get_type() << std::dec << std::endl;
        diag_type(ds.get_type());
    }
    {
        array_2d_t a(boost::extents[NJ][NI]);
        h5xx::datatype detect(a);
        std::cout << "detect: " << std::hex << detect.get_type_id() << std::dec << std::endl;
        diag_type(detect.get_type_id());
    }
}
Prints
dataset: 30000000000013b
Class 1
Size 4
Sign -1
Order 0
Precision 32
NDims -1
NMembers -1
detect: 30000000000000c
Class 0
Size 4
Sign 1
Order 0
Precision 32
NDims -1
NMembers -1
At least we know that H5T_FLOAT (class 1) is required. Let's modify array_2d_t:
using array_2d_t = boost::multi_array<float, 2>;
array_2d_t a(boost::extents[11214][3]);
This at least makes the element type match. Let's ... naively try to read:
h5xx::read_dataset(ds, a);
Oops, that predictably throws
terminate called after throwing an instance of 'h5xx::error'
what(): /home/sehe/Projects/stackoverflow/deps/h5xx/h5xx/dataset/boost_multi_array.hpp:176:read_dataset(): dataset "/particles/lipids/box/positions/value" and target array have mismatching dimensions
No worries, we can guess:
using array_3d_t = boost::multi_array<float, 3>;
array_3d_t a(boost::extents[10][11214][3]);
h5xx::read_dataset(ds, a);
At least this does work. Adapting the print function:
template <typename T> void print_array(T const& array) {
    for (auto const& row : array) {
        for (auto v : row) printf("%5f ", v);
        printf("\n");
    }
}
Now we can print the first frame:
h5xx::read_dataset(ds, a);
print_array(*a.begin()); // print the first frame
This prints:
80.480003 35.360001 4.250000
37.450001 3.920000 3.960000
18.530001 -9.690000 4.680000
55.389999 74.339996 4.600000
22.110001 68.709999 3.850000
-4.130000 24.040001 3.730000
40.160000 6.390000 4.730000
-5.400000 35.730000 4.850000
36.669998 22.450001 4.080000
-3.680000 -10.660000 4.180000
(...)
That checks out with h5ls -r -d xaa.h5/particles/lipids/box/positions/value:
particles/lipids/box/positions/value Dataset {75/Inf, 11214, 3}
Data:
(0,0,0) 80.48, 35.36, 4.25, 37.45, 3.92, 3.96, 18.53, -9.69, 4.68,
(0,3,0) 55.39, 74.34, 4.6, 22.11, 68.71, 3.85, -4.13, 24.04, 3.73,
(0,6,0) 40.16, 6.39, 4.73, -5.4, 35.73, 4.85, 36.67, 22.45, 4.08, -3.68,
(0,9,1) -10.66, 4.18, 35.95, 36.43, 5.15, 57.17, 3.88, 5.08, -23.64,
(0,12,1) 50.44, 4.32, 6.78, 8.24, 4.36, 21.34, 50.63, 5.21, 16.29,
(0,15,1) -1.34, 5.28, 22.26, 71.25, 5.4, 19.76, 10.38, 5.34, 78.62,
(0,18,1) 11.13, 5.69, 22.14, 59.7, 4.92, 15.65, 47.28, 5.22, 82.41,
(0,21,1) 2.09, 5.24, 16.87, -11.68, 5.35, 15.54, -0.63, 5.2, 81.25,
(...)
The Home Stretch: Adding The Slice
array_2d_t read_frame(int frame_no) {
    h5xx::file xaa("xaa.h5", h5xx::file::mode::in);
    h5xx::group g(xaa, "particles/lipids/box/positions");
    h5xx::dataset ds(g, "value");
    array_2d_t a(boost::extents[11214][3]);
    std::vector offsets{frame_no, 0, 0}, counts{1, 11214, 3};
    h5xx::slice slice(offsets, counts);
    h5xx::read_dataset(ds, a, slice);
    return a;
}
There you have it. Now we can print any frame:
print_array(read_frame(0));
Printing the same as before. Let's try the last frame:
print_array(read_frame(9));
Prints
79.040001 36.349998 3.990000
37.250000 3.470000 4.140000
18.600000 -9.270000 4.900000
55.669998 75.070000 5.370000
21.920000 67.709999 3.790000
-4.670000 24.770000 3.690000
40.000000 6.060000 5.240000
-5.340000 36.320000 5.410000
36.369999 22.490000 4.130000
-3.520000 -10.430000 4.280000
(...)
Checking again with h5ls -r -d xaa.h5/particles/lipids/box/positions/value |& grep '(9' | head confirms:
(9,0,0) 79.04, 36.35, 3.99, 37.25, 3.47, 4.14, 18.6, -9.27, 4.9, 55.67,
(9,3,1) 75.07, 5.37, 21.92, 67.71, 3.79, -4.67, 24.77, 3.69, 40, 6.06,
(9,6,2) 5.24, -5.34, 36.32, 5.41, 36.37, 22.49, 4.13, -3.52, -10.43,
(9,9,2) 4.28, 35.8, 36.43, 4.99, 56.6, 4.09, 5.04, -23.37, 49.42, 3.81,
(9,13,0) 6.31, 8.83, 4.56, 22.01, 50.38, 5.43, 16.3, -2.92, 5.4, 22.02,
(9,16,1) 70.09, 5.36, 20.23, 11.12, 5.66, 78.48, 11.34, 6.09, 20.26,
(9,19,1) 61.45, 5.35, 14.25, 48.32, 5.35, 79.95, 1.71, 5.38, 17.56,
(9,22,1) -11.61, 5.39, 15.64, -0.19, 5.06, 80.43, 71.77, 5.29, 75.54,
(9,25,1) 35.14, 5.26, 22.45, 56.86, 5.56, 16.47, 52.97, 6.16, 20.62,
(9,28,1) 65.12, 5.26, 19.68, 71.2, 5.52, 23.39, 49.84, 5.28, 22.7,
Full Listing
#include <boost/multi_array.hpp>
#include <h5xx/h5xx.hpp>
#include <cstdio>
#include <iostream>
#include <string>

using array_2d_t = boost::multi_array<float, 2>;

template <typename T> void print_array(T const& array)
{
    for (auto const& row : array) {
        for (auto v : row)
            printf("%5f ", v);
        printf("\n");
    }
}

void dump(h5xx::group const& g, std::string indent = "") {
    auto dd = g.datasets();
    auto gg = g.groups();
    for (auto it = dd.begin(); it != dd.end(); ++it) {
        std::cout << indent << " ds:" << it.get_name() << std::endl;
    }
    for (auto it = gg.begin(); it != gg.end(); ++it) {
        dump(*it, indent + "/" + it.get_name());
    }
}

array_2d_t read_frame(int frame_no) {
    h5xx::file xaa("xaa.h5", h5xx::file::mode::in);
    h5xx::group g(xaa, "particles/lipids/box/positions");
    h5xx::dataset ds(g, "value");
    array_2d_t arr(boost::extents[11214][3]);
    std::vector offsets{frame_no, 0, 0}, counts{1, 11214, 3};
    h5xx::slice slice(offsets, counts);
    h5xx::read_dataset(ds, arr, slice);
    return arr;
}

int main()
{
    print_array(read_frame(9));
}

CUDA: which is faster? Memory coalescing vs caching?

I have encountered an exercise which asks which of the following two pieces of code is faster.
First code.
int sum = 0;
for(int i = 0; i < n; i++) {
    sum += array[i*n + thread_id];
}
Second code.
int sum = 0;
for(int i = 0; i < n; i++) {
    sum += array[n*thread_id + i];
}
I would try the code myself, but I will not have an Nvidia GPU in the coming days.
I think that the first code takes advantage of memory coalescing (see here), while the second one would take advantage of caching.
Many thanks to @RobertCrovella for clarifying the issues regarding memory coalescing. This is my attempt to benchmark the two codes, as asked for. It can be clearly seen from the output (run on an NVS5400M laptop GPU) that the first code is about twice as fast as the second one. This is because of the memory coalescing taking place in the first one (kernel1).
#include <cuda.h>
#include <ctime>
#include <iostream>
#include <stdio.h>
using namespace std;

#define BLOCK_SIZE 1024
#define GRID_SIZE 1024

// Error Handling
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

//kernel1<<<8,8>>>(d_array,d_sum1,n);
__global__ void kernel1(int *array, long *sum, int n) {
    long result=0;
    int thread_id=threadIdx.x+blockIdx.x*blockDim.x;
    for(int i=0;i<n;i++) {
        result += array[i*n + thread_id];
    }
    //__syncthreads();
    sum[thread_id]=result;
}

__global__ void kernel2(int *array, long *sum, int n) {
    long result=0;
    int thread_id=threadIdx.x+blockIdx.x*blockDim.x;
    for(int i=0;i<n;i++) {
        result += array[n*thread_id+i];
    }
    __syncthreads();
    sum[thread_id]=result;
}

int main() {
    srand((unsigned)time(0));
    long *h_sum1,*d_sum1;
    long *h_sum2,*d_sum2;
    int n=10;
    int size1=n*BLOCK_SIZE*GRID_SIZE+n;
    int *h_array;
    h_array=new int[size1];
    h_sum1=new long[size1];
    h_sum2=new long[size1];

    //random number range
    int min =1, max =10;
    for(int i=0;i<size1;i++) {
        h_array[i]= min + (rand() % static_cast<int>(max - min + 1));
        h_sum1[i]=0;
        h_sum2[i]=0;
    }

    int *d_array;
    gpuErrchk(cudaMalloc((void**)&d_array,size1*sizeof(int)));
    gpuErrchk(cudaMalloc((void**)&d_sum1,size1*sizeof(long)));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    gpuErrchk(cudaMemcpy(d_array,h_array,size1*sizeof(int),cudaMemcpyHostToDevice));
    gpuErrchk(cudaMemcpy(d_sum1,h_sum1,size1*sizeof(long),cudaMemcpyHostToDevice));

    cudaEventRecord(start);
    kernel1<<<GRID_SIZE,BLOCK_SIZE>>>(d_array,d_sum1,n);
    cudaEventRecord(stop);

    gpuErrchk(cudaMemcpy(h_sum1,d_sum1,size1*sizeof(long),cudaMemcpyDeviceToHost));
    float milliSeconds1=0;
    cudaEventElapsedTime(&milliSeconds1,start,stop);

    gpuErrchk(cudaMalloc((void**)&d_sum2,size1*sizeof(long)));
    gpuErrchk(cudaMemcpy(d_sum2,h_sum2,size1*sizeof(long),cudaMemcpyHostToDevice));

    cudaEventRecord(start);
    kernel2<<<GRID_SIZE,BLOCK_SIZE>>>(d_array,d_sum2,10);
    cudaEventRecord(stop);

    gpuErrchk(cudaMemcpy(h_sum2,d_sum2,size1*sizeof(long),cudaMemcpyDeviceToHost));
    float milliSeconds2=0;
    cudaEventElapsedTime(&milliSeconds2,start,stop);

    long result_device1=0,result_host1=0;
    long result_device2=0,result_host2=0;
    for(int i=0;i<size1;i++) {
        result_device1 += h_sum1[i];
        result_device2 += h_sum2[i];
    }
    for(int thread_id=0;thread_id<GRID_SIZE*BLOCK_SIZE;thread_id++)
        for(int i=0;i<10;i++) {
            result_host1 += h_array[i*10+thread_id];
            result_host2 += h_array[10*thread_id+i];
        }

    cout << "Device result1 = " << result_device1 << endl;
    cout << "Host result1 = " << result_host1 << endl;
    cout << "Time1 (ms) = " << milliSeconds1 << endl;
    cout << "Device result2 = " << result_device2 << endl;
    cout << "Host result2 = " << result_host2 << endl;
    cout << "Time2 (ms) = " << milliSeconds2 << endl;

    gpuErrchk(cudaFree(d_array));
    gpuErrchk(cudaFree(d_sum1));
    gpuErrchk(cudaFree(d_sum2));
    return 0;
}
The CUDA event timer output is as follows:
Device result1 = 57659226
Host result1 = 57659226
Time1 (ms) = 5.21952
Device result2 = 57674257
Host result2 = 57674257
Time2 (ms) = 11.8356
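To see concretely why kernel1's accesses coalesce while kernel2's do not, it helps to write out which elements a single 32-thread warp reads in one iteration. The short host-side sketch below is only an illustration of the two indexing schemes (warp size 32 is the usual CUDA assumption); it is not part of the benchmark above:

#include <cstdio>

int main() {
    // Hypothetical illustration: indices read by the 32 threads of one warp
    // in iteration i, using the same n as in the exercise.
    const int n = 10, i = 0;
    printf("kernel1, iteration %d (coalesced): ", i);
    for (int t = 0; t < 32; ++t) printf("%d ", i * n + t);   // consecutive -> few wide transactions
    printf("\nkernel2, iteration %d (strided):   ", i);
    for (int t = 0; t < 32; ++t) printf("%d ", n * t + i);   // stride of n -> many scattered transactions
    printf("\n");
    return 0;
}

In the first scheme neighbouring threads read neighbouring ints, so the warp's loads are served by a couple of wide memory transactions; in the second scheme neighbouring threads are n elements apart, so the warp touches many separate cache lines, and only cache hits can soften the cost.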

How to use OpenMP to deal with two for loops

I am new to OpenMP... Please help me with this dumb question. Thank you :)
Basically, I want to use OpenMP to speed up two for loops. But I do not know why it keeps saying: invalid controlling predicate for the for loop.
By the way, my GCC version is gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005, and OS I am using is Ubuntu 16.10.
Basically, I generate toy data in a typical key-value style, like this:
Data = {
"0": ["100","99","98","97",..."1"];
"1": ["100","99","98","97",..."1"];
...
"999":["100","99","98","97",..."1"];
}
Then, for each key, I want to compare its value with those of the rest of the keys. Here, I simply sum up the sizes via "user1_list.size()+user2_list.size();". Since the sum-up process for each key is totally independent of the other keys, this should be suitable for parallelism.
Here is my toy example code.
#include <map>
#include <vector>
#include <string>
#include <iostream>
#include "omp.h"
using namespace std;

int main(){
    // Create Data
    map<string, vector<string>> data;
    for(int i=0; i != 1000; i++){
        vector<string> list;
        for (int j=100; j!=0; j--){
            list.push_back(to_string(j));
        }
        data[to_string(i)]=list;
    }
    cout << "Data Total size: " << data.size() << endl;

    int count = 1;
    #pragma omp parallel for private(count)
    for (auto it=data.begin(); it!=data.end(); it++){
        //cout << "Evoke Thread: " << omp_get_thread_num();
        cout << " count: " << count << " / " << data.size() << endl;
        count ++;
        string user1 = it->first;
        vector<string> user1_list = it->second;
        for (auto it2=data.begin(); it2!=data.end(); it2++){
            string user2 = it2->first;
            vector<string> user2_list = it2->second;
            cout << "u1:" << user1 << " u2:" << user2;
            int total_size = user1_list.size()+user2_list.size();
            cout << " total size: " << total_size << endl;
        }
    }
    return 0;
}
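The error comes from the loop header: with the OpenMP support in GCC 6, a loop controlled by "#pragma omp parallel for" must be in canonical form (an integer or random-access-iterator counter tested with <, <=, > or >=), while std::map only offers bidirectional iterators and the code tests with !=. One common workaround, sketched below under those assumptions (the helper names are mine, not from the question), is to copy the keys into a vector once and parallelize a plain index loop; a reduction also replaces the racy shared count:

#include <map>
#include <vector>
#include <string>
#include <iostream>
#include <omp.h>
using namespace std;

int main(){
    // Build the same toy data as in the question.
    map<string, vector<string>> data;
    for(int i = 0; i != 1000; i++){
        vector<string> list;
        for (int j = 100; j != 0; j--) list.push_back(to_string(j));
        data[to_string(i)] = list;
    }

    // Copy the keys once so the parallel loop can run over a plain integer index.
    vector<string> keys;
    keys.reserve(data.size());
    for (auto const& kv : data) keys.push_back(kv.first);

    long long grand_total = 0;
    #pragma omp parallel for reduction(+:grand_total)
    for (size_t k = 0; k < keys.size(); k++) {     // canonical form: index counter, '<' test
        vector<string> const& user1_list = data.at(keys[k]);
        for (auto const& kv : data) {              // inner comparison loop stays sequential per thread
            grand_total += user1_list.size() + kv.second.size();
        }
    }
    cout << "grand total: " << grand_total << endl;
    return 0;
}

Compile with g++ -fopenmp. Reading from data concurrently is fine here because no thread modifies the map; only writes would need protection.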

C++: My for loop doesn't work when I run it in the terminal. Any ideas?

When I run it in the terminal everything works fine except the loop. The for loop just doesn't do anything at all. I'm learning C++, so I don't know much.
#include <iostream>
#include <cstring>
using namespace std;

int main( int argc, char *argv[] ) {
    if (argc == 2) {
        cout << "The first argument is " << argv[0] << endl;
        cout << "The second argument is " << argv[1] << endl;
    } else if (argc > 2) {
        cout << "Too many arguments" << endl;
        exit(0);
    } else {
        cout << "Only one argument" << endl;
        cout << "The argument is " << argv[0] << endl;
        exit(0);
    }
    if (atoi(argv[1]) < 0) {
        cout << "Error negative number" << endl;
        exit(0);
    }
    // this loop does not work, everything else does.
    for (int i = 1; i >= atoi(argv[1]); i++){
        int count = atoi(argv[1]--);
        cout << count << endl;
        int sum = sum + i;
    }
    cout << "The sum is: " << endl;
    return(0);
}
I think it could be the if statements that are messing around with the loop.
I think you made a mistake in the for loop.
You should use "<=" instead of ">=" in the for loop.
Hope this helps you.
I guess your code is not reaching the for loop, as you have exit() calls in the other branches of the if. Your code only reaches the loop if you pass 2 arguments in the terminal when you run it.
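For reference, here is a minimal corrected version of the summing program (assuming the goal is to count from 1 up to the number given on the command line and print the total; the structure follows the question, and the fixes are the "<=" test, reading the argument once instead of modifying argv, and initialising sum before the loop):

#include <iostream>
#include <cstdlib>   // atoi
using namespace std;

int main(int argc, char* argv[]) {
    if (argc != 2 || atoi(argv[1]) < 0) {
        cout << "Usage: pass one non-negative number" << endl;
        return 0;
    }
    int limit = atoi(argv[1]);           // read the argument once, don't touch argv
    int sum = 0;                         // initialise sum before the loop
    for (int i = 1; i <= limit; i++) {   // '<=' so the loop actually runs
        cout << i << endl;
        sum = sum + i;                   // accumulate into the outer variable
    }
    cout << "The sum is: " << sum << endl;
    return 0;
}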

performance of operator>>(istream&, double) VC10/VC11

Switching from VC10 to VC11, I observe a performance drop of a factor of 10 when reading a file of double numbers:
#include <iostream>
int main() {
    double sum = 0, x;
    for(int i=0; i<1000000; i++){
        std::cin >> x;
        sum += x;
    }
    std::cerr << sum << std::endl;
    return 0;
}
I built the executable in Developer Studio, letting the environment choose the best release-mode options.
Can anybody confirm this?
What could be the problem? Might it be related to locale?
thanks in advance,
andreas
(For some reason my previous answer was deleted; I do admit the first sentence was a bit confusing, due to a clumsy edit after it got better results.)
Actually, for me the performance is about the same.
VC11 writing/reading 1M doubles -> 6.600/3.562 seconds
VC10 writing/reading 1M doubles -> 6.266/3.606 seconds
So in my experiment, reading doubles from a file in VC11 has approximately the same performance as with VC10.
Codesample:
int _tmain(int argc, _TCHAR* argv[])
{
    auto x = 0.0;
    auto numberofdoubles = 1000000;
    auto filename = "C:\\double.txt";
    {
        std::ofstream filestr(filename);
        auto starttime = clock();
        for(int i=0; i<numberofdoubles; i++)
            filestr << (double)i << " ";
        auto endtime = clock();
        auto elapsed = (double)(endtime - starttime)/CLOCKS_PER_SEC;
        std::cout << "writing: " << elapsed << std::endl;
    }
    {
        std::ifstream filestr (filename);
        auto starttime = clock();
        for(int i=0; i<numberofdoubles; i++)
            filestr >> x;
        auto endtime = clock();
        auto elapsed = (double)(endtime - starttime)/CLOCKS_PER_SEC;
        std::cout << "reading: " << elapsed << std::endl;
    }
    return 0;
}
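On the locale question: one way to narrow it down (a sketch of my own, not taken from the answer above) is to parse the same file twice, once through operator>> and once with strtod on whitespace-separated tokens. If the strtod path runs at about the same speed under both compilers while the operator>> path is only slow under VC11, the extra cost sits in the iostream numeric parsing (num_get, where the locale is consulted) rather than in the file I/O itself:

#include <cstdlib>   // strtod
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>

int main() {
    const char* filename = "C:\\double.txt";   // same test file as above

    {   // parse with operator>> (goes through num_get and the stream's locale)
        std::ifstream in(filename);
        double sum = 0, x;
        clock_t t0 = clock();
        while (in >> x) sum += x;
        std::cout << "operator>>: " << double(clock() - t0) / CLOCKS_PER_SEC
                  << "s, sum=" << sum << "\n";
    }
    {   // parse with strtod on string tokens (bypasses num_get's numeric parsing)
        std::ifstream in(filename);
        std::string tok;
        double sum = 0;
        clock_t t0 = clock();
        while (in >> tok) sum += std::strtod(tok.c_str(), nullptr);
        std::cout << "strtod:     " << double(clock() - t0) / CLOCKS_PER_SEC
                  << "s, sum=" << sum << "\n";
    }
    return 0;
}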
