dividing a string into array after comma c++ - c++11

I have the following string:
CoursesExams =
HUM001,Technical Writing,28/4/2016,HallA;CSE121,Computer Programming,3/5/2016,HallB]
 
I want to split it after each ; into an array. How can I do that using c++?

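Use std::getline with a std::istringstream. The third argument of getline is the delimiter; pass ';' instead of ',' to split on semicolons:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
std::string s = "HUM001,Technical Writing,28/4/2016,HallA;CSE121,Computer Programming,3/5/2016,HallB]";
std::vector<std::string> arr;
std::istringstream str(s);
std::string elem;
// getline reads from the stream until the delimiter is found and stores the text before it in elem
while (std::getline(str, elem, ',')) arr.push_back(elem);
for (const auto& item : arr) std::cout << item << "\n";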
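Since the question asks for a split on ';' while each record also contains comma-separated fields, a two-level split may be what is wanted. The following is only a sketch: the helper names split and split_records are made up for illustration, and it assumes single-character delimiters with no quoting or escaping.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on a single-character delimiter.
std::vector<std::string> split(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string piece;
    while (std::getline(stream, piece, delim)) {
        parts.push_back(piece);
    }
    return parts;
}

// Hypothetical helper: split into records on ';',
// then split each record into fields on ','.
std::vector<std::vector<std::string>> split_records(const std::string& text)
{
    std::vector<std::vector<std::string>> records;
    for (const std::string& record : split(text, ';')) {
        records.push_back(split(record, ','));
    }
    return records;
}
```

Calling split_records on the string from the question (without the trailing bracket) should give two records of four fields each, so records[1][0] would be the second course code.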

Related

Error using Max_Element with String Vector

I'm implementing an algorithm that returns a vector of strings containing only the longest strings of the input vector:
vector<string> solution(vector<string> inputArray) {
vector<string> s;
auto m = *max_element(inputArray.begin(),inputArray.end());
for(int i=0;i<inputArray.size();i++){
if(inputArray[i].size() == m.size())
{
s.push_back(inputArray[i]);
}
}
return s;
}
It works for every test case except when the input vector is {"enyky", "benyky","yely","varennyky"}. m should be "varennyky", but it is "yely" instead.
I dug into the documentation for max_element, but can't find what I'm doing wrong. Can anybody help me?
Your function is comparing the strings lexicographically, which is the default comparison in case of strings.
To illustrate, consider the following example:
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
// Print a vector of strings
void print_vec(const std::vector<std::string>& vec)
{
for (const auto& el : vec) {
std::cout << el << " ";
}
std::cout << std::endl;
}
// Compares strings by length
bool less_length(const std::string& s1, const std::string& s2)
{
return s1.length() < s2.length();
}
int main()
{
std::vector<std::string> test_0 = {"enyky", "benyky","yely","varennyky"};
// Default sort and max element
std::sort(test_0.begin(), test_0.end());
print_vec(test_0);
const auto largest_0 = *std::max_element(test_0.begin(), test_0.end());
std::cout << "Largest member (lexicographically): " << largest_0 << '\n' << std::endl;
// Sort and max element by string size
std::sort(test_0.begin(), test_0.end(), less_length);
print_vec(test_0);
const auto largest_1 = *std::max_element(test_0.begin(), test_0.end(), less_length);
std::cout << "Largest member (by string length): " << largest_1 << std::endl;
}
The first part of the program does what your function does: it finds the maximum element based on lexicographic ordering. According to that ordering, the largest string is yely, as you can see from the output of sort.
The second part uses a custom comparison function, borrowed directly from this book. It uses string length to determine the order in the max_element call, and the result is what you were looking for. Again, the sorted vector is also printed for clarity.
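Applied to the function from the question, the fix could look roughly like this. It is a sketch, not the only way to do it: the name all_longest is made up here, and a lambda stands in for the less_length function.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Return every string whose length equals the maximum length in the input.
std::vector<std::string> all_longest(const std::vector<std::string>& input)
{
    std::vector<std::string> result;
    if (input.empty()) {
        // max_element on an empty range returns end(), which must not be dereferenced.
        return result;
    }
    // Compare by length instead of the default lexicographic operator<.
    const auto longest = std::max_element(
        input.begin(), input.end(),
        [](const std::string& a, const std::string& b) { return a.size() < b.size(); });
    const std::size_t max_len = longest->size();
    for (const std::string& s : input) {
        if (s.size() == max_len) {
            result.push_back(s);
        }
    }
    return result;
}
```

The empty-input check matters because the original code dereferences the result of max_element unconditionally.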

How to read chunk of the data from a hdf5 file in c++?

I want to read a chunk of data which is just one frame of many frames stored in one dataset. The shape of the whole dataset is (10, 11214, 3): 10 frames, each with 11214 rows and 3 columns. Here is the file. The chunk I want to read would have the shape (11214, 3). I can print a predefined array, but I'm not sure how to read data from an HDF5 file. Here is my code:
#include <h5xx/h5xx.hpp>
#include <boost/multi_array.hpp>
#include <iostream>
#include <vector>
#include <cstdio>
typedef boost::multi_array<int, 2> array_2d_t;
const int NI=10;
const int NJ=NI;
void print_array(array_2d_t const& array)
{
for (unsigned int j = 0; j < array.shape()[1]; j++)
{
for (unsigned int i = 0; i < array.shape()[0]; i++)
{
printf("%2d ", array[j][i]);
}
printf("\n");
}
}
void write_int_data(std::string const& filename, array_2d_t const& array)
{
h5xx::file file(filename, h5xx::file::trunc);
std::string name;
{
// --- create dataset and fill it with the default array data (positive values)
name = "integer array";
h5xx::create_dataset(file, name, array);
h5xx::write_dataset(file, name, array);
// --- create a slice object (aka hyperslab) to specify the location in the dataset to be overwritten
std::vector<int> offset; int offset_raw[2] = {4,4}; offset.assign(offset_raw, offset_raw + 2);
std::vector<int> count; int count_raw[2] = {2,2}; count.assign(count_raw, count_raw + 2);
h5xx::slice slice(offset, count);
}
}
void read_int_data(std::string const& filename)
{
h5xx::file file(filename, h5xx::file::in);
std::string name = "integer array";
// read and print the full dataset
{
array_2d_t array;
// --- read the complete dataset into array, the array is resized and overwritten internally
h5xx::read_dataset(file, name, array);
printf("original integer array read from file, negative number patch was written using a slice\n");
print_array(array);
printf("\n");
}
}
int main(int argc, char** argv)
{
std::string filename = argv[0];
filename.append(".h5");
// --- do a few demos/tests using integers
{
array_2d_t array(boost::extents[NJ][NI]);
{
const int nelem = NI*NJ;
int data[nelem];
for (int i = 0; i < nelem; i++)
data[i] = i;
array.assign(data, data + nelem);
}
write_int_data(filename, array);
read_int_data(filename);
}
return 0;
}
I'm using h5xx, a template-based C++ wrapper for the HDF5 library, and the Boost library.
The datasets are stored under the path particles/lipids/box/positions. The dataset named value holds the frames.
argv[0] is not what you want (arguments start at 1, 0 is the program name). Consider bounds checking as well:
std::vector<std::string> const args(argv, argv + argc);
std::string const filename = args.at(1) + ".h5";
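To show what the bounds checking buys you: vector::at throws std::out_of_range when the index does not exist, instead of reading past the end of argv. A small sketch, with the function name h5_filename invented for this example:

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: build "<first argument>.h5".
// Throws std::out_of_range (from vector::at) if no argument was passed.
std::string h5_filename(int argc, const char* const* argv)
{
    const std::vector<std::string> args(argv, argv + argc);
    return args.at(1) + ".h5";
}
```

In main you would call h5_filename(argc, argv) and catch std::out_of_range to print a usage message.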
The initialization can be done directly, without a temporary array (what is multi_array for, otherwise?):
for (size_t i = 0; i < array.num_elements(); i++)
array.data()[i] = i;
Or indeed, make it an algorithm (std::iota lives in <numeric>):
std::iota(array.data(), array.data() + array.num_elements(), 0);
Same with the vectors:
std::vector<int> offset; int offset_raw[2] = {4,4}; offset.assign(offset_raw, offset_raw + 2);
std::vector<int> count; int count_raw[2] = {2,2}; count.assign(count_raw, count_raw + 2);
besides being a formatting mess can be simply
std::vector offset{4,4}, count{2,2};
h5xx::slice slice(offset, count);
On To The Real Question
The code has no relevance to the file. At all. I created some debug/tracing code to dump the file contents:
void dump(h5xx::group const& g, std::string indent = "") {
auto dd = g.datasets();
auto gg = g.groups();
for (auto it = dd.begin(); it != dd.end(); ++it) {
std::cout << indent << " ds:" << it.get_name() << "\n";
}
for (auto it = gg.begin(); it != gg.end(); ++it) {
dump(*it, indent + "/" + it.get_name());
}
}
int main()
{
h5xx::file xaa("xaa.h5", h5xx::file::mode::in);
dump(xaa);
}
Prints
/particles/lipids/box/edges ds:box_size
/particles/lipids/box/edges ds:step
/particles/lipids/box/edges ds:time
/particles/lipids/box/edges ds:value
/particles/lipids/box/positions ds:step
/particles/lipids/box/positions ds:time
/particles/lipids/box/positions ds:value
Now we can drill down to the dataset. Let's see whether we can figure out the correct type. It certainly is NOT array_2d_t:
h5xx::dataset ds(xaa, "particles/lipids/box/positions/value");
array_2d_t a;
h5xx::datatype detect(a);
std::cout << "type: " << std::hex << ds.get_type() << std::dec << "\n";
std::cout << "detect: " << std::hex << detect.get_type_id() << std::dec << "\n";
Prints
type: 30000000000013b
detect: 30000000000000c
That's a type mismatch. I guess I'll have to learn to read that gibberish as well...
Let's add some diagnostics:
void diag_type(hid_t type)
{
std::cout << " Class " << ::H5Tget_class(type) << std::endl;
std::cout << " Size " << ::H5Tget_size(type) << std::endl;
std::cout << " Sign " << ::H5Tget_sign(type) << std::endl;
std::cout << " Order " << ::H5Tget_order(type) << std::endl;
std::cout << " Precision " << ::H5Tget_precision(type) << std::endl;
std::cout << " NDims " << ::H5Tget_array_ndims(type) << std::endl;
std::cout << " NMembers " << ::H5Tget_nmembers(type) << std::endl;
}
int main()
{
h5xx::file xaa("xaa.h5", h5xx::file::mode::in);
// dump(xaa);
{
h5xx::group g(xaa, "particles/lipids/box/positions");
h5xx::dataset ds(g, "value");
std::cout << "dataset: " << std::hex << ds.get_type() << std::dec << std::endl;
diag_type(ds.get_type());
}
{
array_2d_t a(boost::extents[NJ][NI]);
h5xx::datatype detect(a);
std::cout << "detect: " << std::hex << detect.get_type_id() << std::dec << std::endl;
diag_type(detect.get_type_id());
}
}
Prints
dataset: 30000000000013b
Class 1
Size 4
Sign -1
Order 0
Precision 32
NDims -1
NMembers -1
detect: 30000000000000c
Class 0
Size 4
Sign 1
Order 0
Precision 32
NDims -1
NMembers -1
At least we know that H5T_FLOAT (class 1) is required. Let's modify array_2d_t:
using array_2d_t = boost::multi_array<float, 2>;
array_2d_t a(boost::extents[11214][3]);
This at least makes the data appear similarly. Let's ... naively try to read:
h5xx::read_dataset(ds, a);
Oops, that predictably throws
terminate called after throwing an instance of 'h5xx::error'
what(): /home/sehe/Projects/stackoverflow/deps/h5xx/h5xx/dataset/boost_multi_array.hpp:176:read_dataset(): dataset "/particles/lipids/box/positions/value" and target array have mismatching dimensions
No worries, we can guess:
using array_3d_t = boost::multi_array<float, 3>;
array_3d_t a(boost::extents[10][11214][3]);
h5xx::read_dataset(ds, a);
At least this does work. Adapting the print function:
template <typename T> void print_array(T const& array) {
for (auto const& row : array) {
for (auto v : row) printf("%5f ", v);
printf("\n");
}
}
Now we can print the first frame:
h5xx::read_dataset(ds, a);
print_array(*a.begin()); // print the first frame
This prints:
80.480003 35.360001 4.250000
37.450001 3.920000 3.960000
18.530001 -9.690000 4.680000
55.389999 74.339996 4.600000
22.110001 68.709999 3.850000
-4.130000 24.040001 3.730000
40.160000 6.390000 4.730000
-5.400000 35.730000 4.850000
36.669998 22.450001 4.080000
-3.680000 -10.660000 4.180000
(...)
That checks out with h5ls -r -d xaa.h5/particles/lipids/box/positions/value:
particles/lipids/box/positions/value Dataset {75/Inf, 11214, 3}
Data:
(0,0,0) 80.48, 35.36, 4.25, 37.45, 3.92, 3.96, 18.53, -9.69, 4.68,
(0,3,0) 55.39, 74.34, 4.6, 22.11, 68.71, 3.85, -4.13, 24.04, 3.73,
(0,6,0) 40.16, 6.39, 4.73, -5.4, 35.73, 4.85, 36.67, 22.45, 4.08, -3.68,
(0,9,1) -10.66, 4.18, 35.95, 36.43, 5.15, 57.17, 3.88, 5.08, -23.64,
(0,12,1) 50.44, 4.32, 6.78, 8.24, 4.36, 21.34, 50.63, 5.21, 16.29,
(0,15,1) -1.34, 5.28, 22.26, 71.25, 5.4, 19.76, 10.38, 5.34, 78.62,
(0,18,1) 11.13, 5.69, 22.14, 59.7, 4.92, 15.65, 47.28, 5.22, 82.41,
(0,21,1) 2.09, 5.24, 16.87, -11.68, 5.35, 15.54, -0.63, 5.2, 81.25,
(...)
The Home Stretch: Adding The Slice
array_2d_t read_frame(int frame_no) {
h5xx::file xaa("xaa.h5", h5xx::file::mode::in);
h5xx::group g(xaa, "particles/lipids/box/positions");
h5xx::dataset ds(g, "value");
array_2d_t a(boost::extents[11214][3]);
std::vector offsets{frame_no, 0, 0}, counts{1, 11214, 3};
h5xx::slice slice(offsets, counts);
h5xx::read_dataset(ds, a, slice);
return a;
}
There you have it. Now we can print any frame:
print_array(read_frame(0));
Printing the same as before. Let's try the last frame:
print_array(read_frame(9));
Prints
79.040001 36.349998 3.990000
37.250000 3.470000 4.140000
18.600000 -9.270000 4.900000
55.669998 75.070000 5.370000
21.920000 67.709999 3.790000
-4.670000 24.770000 3.690000
40.000000 6.060000 5.240000
-5.340000 36.320000 5.410000
36.369999 22.490000 4.130000
-3.520000 -10.430000 4.280000
(...)
Checking again with h5ls -r -d xaa.h5/particles/lipids/box/positions/value |& grep '(9' | head confirms:
(9,0,0) 79.04, 36.35, 3.99, 37.25, 3.47, 4.14, 18.6, -9.27, 4.9, 55.67,
(9,3,1) 75.07, 5.37, 21.92, 67.71, 3.79, -4.67, 24.77, 3.69, 40, 6.06,
(9,6,2) 5.24, -5.34, 36.32, 5.41, 36.37, 22.49, 4.13, -3.52, -10.43,
(9,9,2) 4.28, 35.8, 36.43, 4.99, 56.6, 4.09, 5.04, -23.37, 49.42, 3.81,
(9,13,0) 6.31, 8.83, 4.56, 22.01, 50.38, 5.43, 16.3, -2.92, 5.4, 22.02,
(9,16,1) 70.09, 5.36, 20.23, 11.12, 5.66, 78.48, 11.34, 6.09, 20.26,
(9,19,1) 61.45, 5.35, 14.25, 48.32, 5.35, 79.95, 1.71, 5.38, 17.56,
(9,22,1) -11.61, 5.39, 15.64, -0.19, 5.06, 80.43, 71.77, 5.29, 75.54,
(9,25,1) 35.14, 5.26, 22.45, 56.86, 5.56, 16.47, 52.97, 6.16, 20.62,
(9,28,1) 65.12, 5.26, 19.68, 71.2, 5.52, 23.39, 49.84, 5.28, 22.7,
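To make the offsets and counts passed to h5xx::slice in read_frame more concrete, here is a sketch that does the equivalent selection by hand on a plain row-major buffer. It does not use h5xx or HDF5 at all: select_block is a made-up function, and it assumes offset + count stays within shape in every dimension (no bounds checks).

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Copy the block selected by (offset, count) out of a row-major 3-D buffer
// with extents `shape`. This only mimics what a slice selects.
// Assumption: offset[d] + count[d] <= shape[d] for every dimension d.
std::vector<float> select_block(const std::vector<float>& data,
                                const std::array<std::size_t, 3>& shape,
                                const std::array<std::size_t, 3>& offset,
                                const std::array<std::size_t, 3>& count)
{
    std::vector<float> out;
    out.reserve(count[0] * count[1] * count[2]);
    for (std::size_t i = offset[0]; i < offset[0] + count[0]; ++i) {
        for (std::size_t j = offset[1]; j < offset[1] + count[1]; ++j) {
            for (std::size_t k = offset[2]; k < offset[2] + count[2]; ++k) {
                // Row-major flat index of element (i, j, k).
                out.push_back(data[(i * shape[1] + j) * shape[2] + k]);
            }
        }
    }
    return out;
}
```

With shape {10, 11214, 3}, offset {frame_no, 0, 0} and count {1, 11214, 3}, this picks exactly one frame, which is the selection read_frame asks for.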
Full Listing
#include <boost/multi_array.hpp>
#include <h5xx/h5xx.hpp>
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>
using array_2d_t = boost::multi_array<float, 2>;
template <typename T> void print_array(T const& array)
{
for (auto const& row : array) {
for (auto v : row)
printf("%5f ", v);
printf("\n");
}
}
void dump(h5xx::group const& g, std::string indent = "") {
auto dd = g.datasets();
auto gg = g.groups();
for (auto it = dd.begin(); it != dd.end(); ++it) {
std::cout << indent << " ds:" << it.get_name() << std::endl;
}
for (auto it = gg.begin(); it != gg.end(); ++it) {
dump(*it, indent + "/" + it.get_name());
}
}
array_2d_t read_frame(int frame_no) {
h5xx::file xaa("xaa.h5", h5xx::file::mode::in);
h5xx::group g(xaa, "particles/lipids/box/positions");
h5xx::dataset ds(g, "value");
array_2d_t arr(boost::extents[11214][3]);
std::vector offsets{frame_no, 0, 0}, counts{1, 11214, 3};
h5xx::slice slice(offsets, counts);
h5xx::read_dataset(ds, arr, slice);
return arr;
}
int main()
{
print_array(read_frame(9));
}

Reading in from file with modern c++ - data is not stored

Maybe I'm getting something wrong with shared pointers, or there is some basic shortcoming of mine, but I couldn't get this right. I want to read some data from a file. Each line of the data file holds position and momentum data, and the first line stores the number of data points.
I need to read this into my data structure, and for some reason my graph does not get filled, although the data reads in correctly.
const int dim = 3; // dimension of problem
template <typename T, typename G>
// T is the type of the inputted locations and G is the type of the
// distance between them
// for example: int point with float/double distance
struct Node{
std::pair< std::array<T, dim>,std::pair< std::array<T, dim>, G > > pos; // position
std::pair< std::array<T, dim>,std::pair< std::array<T, dim>, G > > mom; // momentum
// a pair indexed by a position in space and has a pair of position
// and the distance between these points
};
template <typename T, typename G>
struct Graph{
int numOfNodes;
std::vector< Node<T,G> > nodes;
};
This is the data structure and here's my read function (std::cout-s are only for testing):
template <typename T, typename G>
std::istream& operator>>(std::istream& is, std::shared_ptr< Graph<T,G> >& graph){
is >> graph->numOfNodes; // there's the number of nodes on the first line of the data file
std::cout << graph->numOfNodes << "\n";
for(int k=0; k<graph->numOfNodes; k++){
Node<T,G> temp;
for(auto i : temp.pos.first){
is >> i;
std::cout << i << "\t";
}
std::cout << "\t";
for(auto i : temp.mom.first){
is >> i;
std::cout << i << "\t";
}
std::cout << "\n";
graph->nodes.push_back(temp);
}
return is;
}
I have an output function as well. When I output the graph that I intended to fill during read-in, it is zeroed out. The number of nodes is correct, but the positions and momenta are all zero. What did I do wrong? Thanks in advance.
for(auto i : temp.pos.first){
is >> i;
std::cout << i << "\t";
}
Think of this as similar to a function. If you have something like:
void doX(int i) { i = 42; }
int main() {
int j=5;
doX(j);
return j;
}
Running this code, you'll see the program returns the value 5. This is because the function doX takes i by value; it basically takes a copy of the variable.
If you replace doX's signature with
void doX(int &i)
and run the code, you'll see it returns 42. This is because the function is now taking the argument by reference, and so can modify it.
Your loops behave similarly. As you have them now, i is a copy of each array element in turn, not a reference to it, so the value you read is stored in the copy and then discarded.
As with the function, you can change your loops to look like
for(auto &i : temp.pos.first){
is >> i;
std::cout << i << "\t";
}
This should then let you change the values stored in the arrays.

Parsing through Vectors

I am new and learning C++ using the Programming Principles ... book by Bjarne Stroustrup. I am working on one problem and can't figure out how to make my code work. I know the issue is with if (words[i]==bad[0, bad.size() - 1]), in particular the bad.size() - 1 part.
I am trying to output all words in the words vector, but display a bleep instead of any word that matches one of the words in the bad vector. So I need to know whether words[i] matches any of the values in the bad vector.
#include "../std_lib_facilities.h"
using namespace std;
int main()
{
vector<string> words; //declare Vector
vector<string> bad = {"idiot", "stupid"};
//Read words into Vector
for(string temp; cin >> temp;)
words.push_back(temp);
cout << "Number of words currently entered "
<< words.size() << '\n';
//sort the words
sort(words);
//read out words
for(int i = 0; i < words.size(); ++i)
if (i==0 || words[i-1]!= words[i])
if (words[i]==bad[0, bad.size() - 1])
cout << "Bleep!\n";
else
cout << words[i] << '\n';
return 0;
}
You need to go through all of the entries in the bad vector for each entry in the words vector. Something like this:
for(const string& word : words)
{
bool foundBadWord = false;
for(const string& badWord : bad)
{
if(0 == word.compare(badWord))
{
foundBadWord = true;
break;
}
}
if(foundBadWord)
{
cout << "Bleep!\n";
}
else
{
cout << word << "\n";
}
}
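The inner loop is a linear search, so std::find from <algorithm> can do the same job. A sketch of that variant follows; is_bad and censor are names invented here, and the result is collected into a vector instead of being printed directly so it is easy to check.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// True if `word` equals any entry of `bad`.
bool is_bad(const std::string& word, const std::vector<std::string>& bad)
{
    return std::find(bad.begin(), bad.end(), word) != bad.end();
}

// Replace every bad word with "Bleep!", leaving the other words untouched.
std::vector<std::string> censor(const std::vector<std::string>& words,
                                const std::vector<std::string>& bad)
{
    std::vector<std::string> out;
    out.reserve(words.size());
    for (const std::string& word : words) {
        out.push_back(is_bad(word, bad) ? std::string("Bleep!") : word);
    }
    return out;
}
```

Printing the result is then a plain loop over the returned vector.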

How to perform a range-based c++11 for loop on char* argv[]?

I would like to try out the C++11 range-based for loop on char* argv[], but I am getting errors. My current approach is:
for( char* c : argv )
{
printf("param: %s \n", c);
}
and in my makefile I have the following line :
g++ -c -g -std=c++11 -O2 file.cc
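argv is a pointer to the first element of an array of C-string pointers (as a function parameter, char* argv[] is really char**), so the array length is not part of its type and you can't obtain a range from it directly.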
With C++17 you can use std::string_view to avoid allocating strings:
for (auto && str : std::vector<std::string_view> { argv, argv + argc })
{
std::printf("%s\n", str.data()); // Be careful!
std::cout << str << std::endl; // Always fine
fmt::print("{}\n", str); // <3
}
Take caution when using string_view with printf because:
Unlike std::basic_string::data() and string literals, data() may return a pointer to a buffer that is not null-terminated.
argv always contains null-terminated strings so you're fine here though.
Without C++17 you can simply use std::string:
for (auto && str : std::vector<std::string> { argv, argv + argc })
std::printf("param: %s\n", str.c_str());
Starting from C++20 you can use std::span:
for (auto && str : std::span(argv, argc))
std::printf("param: %s\n", str);
You can't use the range-based loop since you don't have a range.
You can either write your own range wrapper (a kind of "array view"), or just use a normal loop:
for (char ** p = argv, ** e = argv + argc; p != e; ++p)
{
// use *p
}
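One detail to watch when declaring both pointers in the for-init statement: the ** belongs to each declarator, not to the type, so the end pointer needs its own **. A minimal sketch that collects the arguments with such a loop (collect_args is a made-up name):

```cpp
#include <string>
#include <vector>

// Copy all argv entries into a vector using a plain pointer loop.
std::vector<std::string> collect_args(int argc, char** argv)
{
    std::vector<std::string> out;
    // Note the second "**": writing "char** p = argv, e = ..." would
    // declare e as a plain char, which does not compile here.
    for (char** p = argv, ** e = argv + argc; p != e; ++p) {
        out.emplace_back(*p);
    }
    return out;
}
```

The returned vector can then be used with a range-based for loop in C++11 without any wrapper.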
With a wrapper:
#include <cstddef>
template <typename T>
struct array_view
{
T * first, * last;
array_view(T * a, std::size_t n) : first(a), last(a + n) {}
T * begin() const { return first; }
T * end() const { return last; }
};
template <typename T>
array_view<T> view_array(T * a, std::size_t n)
{
return array_view<T>(a, n);
}
Usage:
for (auto s : view_array(argv, argc))
{
std::cout << s << "\n";
}
