regex pattern to extract empty field from CSV file - c++11

I have a CSV file that needs to be read into a matrix.
Right now I have this regex pattern:
regex pat { R"(("[^"]+")|([^,]+))" }
I found similar topics on Stack Overflow, but they either used a different regex pattern or a language other than C++.
Right now the pattern chooses between sequences enclosed in quotes and anything that is not a comma. The file contains data from a survey that has questions with yes/no answers. If you answer "no", you do not need to answer some related questions.
Therefore I get some sequences in the file like this: ":,,,,,,,,", where each pair of adjacent commas means an empty field. But I would like every row to remain an array with the same number of elements, since that would make it easier to navigate the matrix later. So I would have to extract these empty fields between the commas.
I could not find a regex pattern for an empty sequence. Is a regex pattern a proper way to solve this issue?

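A pattern that uses * instead of + can also match empty fields, provided each field is tied to the comma (or end of line) that follows it. This code illustrates sample usage of such a pattern: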
#include <iostream>
#include <iterator>
#include <string>
#include <regex>
int main()
{
std::regex field_regex("(\"([^\"]*)\"|([^,]*))(,|$)");
for (const std::string s : {
"a,,hello,,o",
"\"a\",,\"hello\",,\"o\"",
",,,,"})
{
std::cout << "parsing: " << s << "\n";
std::cout << "======================================" << "\n";
auto i = 0;
for (auto it = std::sregex_iterator(s.begin(), s.end(), field_regex);
it != std::sregex_iterator();
++it, ++i)
{
auto match = *it;
auto extracted = match[2].length() ? match[2].str() : match[3].str();
std::cout << "column[" << i << "]: " << extracted << "\n";
if (match[4].length() == 0)
{
break;
}
}
std::cout << "\n";
}
}
Output:
parsing: a,,hello,,o
======================================
column[0]: a
column[1]:
column[2]: hello
column[3]:
column[4]: o
parsing: "a",,"hello",,"o"
======================================
column[0]: a
column[1]:
column[2]: hello
column[3]:
column[4]: o
parsing: ,,,,
======================================
column[0]:
column[1]:
column[2]:
column[3]:
column[4]:
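
To build the matrix rows the question asks about, the loop above could be wrapped in a small helper that returns one row per line. This is only a minimal sketch, assuming C++11 and the same pattern as in the answer. The name split_csv_line is hypothetical, and doubled quotes inside quoted fields are not handled.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split one CSV line into fields, keeping empty ones.
// Group 2 is the content of a quoted field, group 3 an unquoted field,
// group 4 the separator (a comma, or empty at the end of the line).
std::vector<std::string> split_csv_line(const std::string& line)
{
    static const std::regex field_regex(R"rx(("([^"]*)"|([^,]*))(,|$))rx");
    std::vector<std::string> fields;
    for (auto it = std::sregex_iterator(line.begin(), line.end(), field_regex);
         it != std::sregex_iterator(); ++it)
    {
        const std::smatch& m = *it;
        fields.push_back(m[2].matched ? m[2].str() : m[3].str());
        if (m[4].length() == 0)
        {
            break; // no trailing comma: this was the last field
        }
    }
    return fields;
}
```

Calling split_csv_line on every line read with std::getline would give rows with the same number of columns, as long as the file itself has the same number of commas per line.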

Related

Continue appending [to stream] in for-loop without local variable access

Imagine you have the following code where logDebug() is expensive or is not appropriate to call more than once:
QDebug d = logDebug();
d << __FUNCTION__ << ":";
d << "positions separated with \" --- \":";
for (const auto& str : positions)
{
d << "---" << str;
}
A macro (just to replace the function name correctly) already exists which replaces the first 2 lines:
#define LOG_FUNCTION this->logDebug() << __FUNCTION__ << ":"
It creates a temporary object by calling logDebug(). Once it has been expanded, you can only apply operator<< to the macro's result.
The problem is that you can't attach the for loop body to the logger.
Q: Is there a way I could use the macro for pasting all the positions (without calling logDebug again)?
I would guess this should be possible using lambdas, but I don't quite know how.
Please help, the shortest answer wins!

Q: Is there a way I could use the macro for pasting all the positions (without calling logDebug again)? I would guess this should be possible using lambdas, but I don't quite know how.
I suppose it's possible with something like the following (using std::cout instead of logDebug()):
#include <iostream>
#define LOG_FUNCTION std::cout << __FUNCTION__ << ": "
#define LOG_DEB(ps) \
[](auto & s, auto const & _ps) { for ( auto const & p : _ps ) s << p; } \
(LOG_FUNCTION, ps)
int main ()
{
int a[] { 0, 1, 2, 3, 4 };
LOG_DEB(a);
}
I've used auto for the types of the lambda parameters, which works only from C++14 onward.
In C++11 you have to replace them with the correct types.

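Well, the macro can be made to yield your debug object:
#define LOG_FUNCTION() this->logDebug() << __FUNCTION__ << ":"
Then use it like this. Note that d is a copy and not a reference: assuming logDebug() returns a QDebug by value, operator<< returns a reference to that temporary, which is destroyed at the end of the statement, so binding auto& to it would leave a dangling reference.
auto d = LOG_FUNCTION();
d << "positions separated with \" --- \":";
for (const auto& str : positions)
{
    d << "---" << str;
}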

C++: How can I make an integer filter with only <iostream> library?

(My English vocabulary is limited, so sorry for any awkward phrasing.)
Hi guys! I'm new to C++ and I need to know how to write filter code that accepts only integer numbers. The code must use only the 'iostream' library, because my teacher doesn't let us use any other library (we are new to C++ coding).
Here is an example of what I have at the moment:
#include <iostream>
#include <limits> // I shouldn't use this header
using namespace std;
int main() {
    int value = 0;
    cout << "Enter an integer value: ";
    while(!(cin >> value)) {
        cin.clear();
        cin.ignore(numeric_limits<streamsize>::max(), '\n'); // This line needs <limits>
        cout << endl << "Value must be an integer" << endl << endl;
        cout << "Enter another integer value: ";
    }
}
But this code has some drawbacks:
I'm using the <limits> header, which I shouldn't use.
If you enter "1asd", it takes the value 1 and accepts it as correct, which it isn't.
Do you have any solution for this situation? Thanks in advance for your time.

You just have to check whether the characters the user entered are numerals, as below. If every character of the entered string is a numeral (i.e. between the characters '0' and '9'), then the entire string is an integer. The exception is the first character of the string, which can be a '+', a '-', a space/tab, or simply the first numeral of the number. (Thanks Zett42.)
std::cout << "Enter an integer value: ";
std::string res1;
std::cin >> res1;
// Inspect every character; a numeral lies between '0' and '9'.
for (std::string::iterator it = res1.begin(); it != res1.end(); ++it)
{
    std::cout << "checking " << *it << ' ';
    if ('0' <= *it && *it <= '9') {
        std::cout << "this is a numeral\n";
    } else {
        std::cout << "you entered: " << *it << " -- this is *not* a numeral\n";
    }
}
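
The per-character check described above could be folded into a single yes/no helper. The following is a minimal sketch, not a definitive implementation: is_integer_text is a hypothetical name, it works on a C string so that no header beyond <iostream> is needed, and unlike the description above it does not accept a leading space or tab.

```cpp
#include <iostream>

// Hypothetical helper: true if text is an optional sign followed by one or
// more digits and nothing else. Overflow is not checked here.
bool is_integer_text(const char* text)
{
    int i = 0;
    if (text[i] == '+' || text[i] == '-')
    {
        ++i; // a sign is allowed only in the first position
    }
    if (text[i] == '\0')
    {
        return false; // empty input, or a sign with no digits
    }
    for (; text[i] != '\0'; ++i)
    {
        if (text[i] < '0' || text[i] > '9')
        {
            return false; // any non-digit rejects the whole input
        }
    }
    return true;
}
```

Reading a whole line into a char buffer with std::cin.getline and passing it to this helper would reject input such as 1asd instead of silently accepting the leading 1.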

Boost spirit skip parser with at least one whitespace

In the grammar I'm implementing, there are elements separated by whitespace. With a skip parser, the spaces between the elements are skipped automatically, but this also allows no space at all, which is not what I want. Sure, I could explicitly write a grammar that includes these spaces, but it seems to me (given the complexity and flexibility offered by Spirit) that there should be a better way to do this. Is there?
Here is an example:
#include <cstdlib>
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main(int argc, char** argv)
{
if(argc != 2)
{
std::exit(1);
}
std::string str = argv[1];
auto iter = str.begin();
bool r = qi::phrase_parse(iter, str.end(), qi::char_ >> qi::char_, qi::blank);
if (r && iter == str.end())
{
std::cout << "parse succeeded\n";
}
else
{
std::cout << "parse failed. Remaining unparsed: " << std::string(iter, str.end()) << '\n';
}
}
This allows ab as well as a b. I want only the latter to be allowed.
Related to this: how do skip parsers work, exactly? One supplies something like qi::blank; is the Kleene star then applied to form the skip parser? I would appreciate some insight here, as it may also help solve this problem.
Additional information: my real parser looks something like this:
one = char_("X") >> repeat(2)[omit[+blank] >> +alnum] >> qi::omit[+qi::blank] >> +alnum;
two = char_("Y") >> repeat(3)[omit[+blank] >> +alnum];
three = char_("Z") >> repeat(4)[omit[+blank] >> +alnum] >> qi::omit[+qi::blank] >> +alnum;
main = one | two | three;
This makes the grammar quite noisy, which I would like to avoid.

First off, the grammar specs in which I usually see this kind of requirement are (always?) RFCs. In 99% of cases there is no issue. Consider e.g.:
myrule = skip(space) [ uint_ >> uint_ ];
This already implicitly requires at least one whitespace character between the numbers, for the simple reason that otherwise there would be only one number. The same simplification occurs in surprisingly many cases (see e.g. the simplifications made around the ubiquitous WSP productions in last week's answer to "Boost.Spirit qi value sequence vector").
With that out of the way: skippers apply zero or more times by definition, so no, there is no way to get what you want with an existing stateful directive like skip(). See also http://stackoverflow.com/questions/17072987/boost-spirit-skipper-issues/17073965#17073965 or the docs (under lexeme, [no_]skip and skip_flag::dont_postskip).
Looking at your specific grammar, I'd do this:
bool r = qi::phrase_parse(iter, end, token >> token, qi::blank);
Here, you can add a negative lookahead assertion inside a lexeme to assert that "the end of the token was reached" - which in your parser would be mandated as !qi::graph:
auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);
See a demo:
#include <iostream>
#include <iomanip>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main() {
for (std::string const str : { "ab", " ab ", " a b ", "a b" }) {
auto iter = str.begin(), end = str.end();
auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);
bool r = qi::phrase_parse(iter, end, token >> token, qi::blank);
std::cout << " --- " << std::quoted(str) << " --- ";
if (r) {
std::cout << "parse succeeded.";
} else {
std::cout << "parse failed.";
}
if (iter != end) {
std::cout << " Remaining unparsed: " << std::string(iter, str.end());
}
std::cout << std::endl;
}
}
Prints
--- "ab" --- parse failed. Remaining unparsed: ab
--- " ab " --- parse failed. Remaining unparsed: ab
--- " a b " --- parse succeeded.
--- "a b" --- parse succeeded.
BONUS Review notes
My guidelines would be:
Your skipper should be the grammar's responsibility. It's a pity that all Qi samples lead people to believe you need to let the caller decide that.
End-iterator checking does not equal error checking. It's entirely possible to parse things correctly without consuming all input, which is why reporting the "remaining input" should not happen only when parsing failed.
If trailing unparsed input is an error, spell it out:
#include <iostream>
#include <iomanip>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main() {
for (std::string const str : { "ab", " ab ", " a b ", "a b happy trees are trailing" }) {
auto iter = str.begin(), end = str.end();
auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);
bool r = qi::parse(iter, end, qi::skip(qi::space) [ token >> token >> qi::eoi ]);
std::cout << " --- " << std::quoted(str) << " --- ";
if (r) {
std::cout << "parse succeeded.";
} else {
std::cout << "parse failed.";
}
if (iter != end) {
std::cout << " Remaining unparsed: " << std::quoted(std::string(iter, str.end()));
}
std::cout << std::endl;
}
}
Prints
--- "ab" --- parse failed. Remaining unparsed: "ab"
--- " ab " --- parse failed. Remaining unparsed: " ab "
--- " a b " --- parse succeeded.
--- "a b happy trees are trailing" --- parse failed. Remaining unparsed: "a b happy trees are trailing"

C++11 set range based for using structs as elements

Let's say I have a struct like this:
struct Something{
string name;
int code;
};
And a set of Something type:
set<Something> myset;
myset.insert({"aaa",123,});
myset.insert({"bbb",321});
myset.insert({"ccc",213});
What's wrong with this?
for (auto sth : myset){
cout << sth.name;
cout << sth.code;
}
Along the same lines... why can't I modify an element (even when the set contains plain int items) using something like this?
for (auto &sth : myset){
sth=[some value];
}
I know I can do this with vectors and maps. Why not sets?
Thanks!

Modifying an element of a set implies that its position in the set's order can change. Your compiler cannot know what exactly a particular set uses to determine the order of its elements, so the elements are exposed as const. Well, it could know, theoretically, but even then it would be nearly impossible to keep track of the rearrangements while iterating through the container. It would make no sense.
What you can do, if you want to modify the elements of a set in a way that you know will not change their order in the set, is make the non-ordering members of your struct mutable. Note that if you make a mistake and the set's order is disturbed, any other operations on the set (like a binary search) will give incorrect results after that faulty modification. If you don't want to make members mutable, const_cast is an option, with the same caveats.
To elaborate on my answer above, here is an example:
#include <iostream>
#include <set>
#include <string>
struct bla
{
std::string name;
int index;
};
bool operator<(const bla& left, const bla& right) { return left.index < right.index; }
int main()
{
std::set<bla> example{{"har", 1}, {"diehar", 2}};
// perfectly fine
for(auto b : example)
std::cout << b.index << ' ' << b.name << '\n';
// perfectly fine - name doesn't influence set order
for(auto& b : example) // decltype(b) == const bla&
const_cast<std::string&>(b.name) = "something";
// better than first loop: no temporary copies
for(const auto& b : example)
std::cout << b.index << ' ' << b.name << '\n';
// using a "universal reference auto&&", mostly useful in template contexts
for(auto&& b : example) // decltype(b) == const bla&
std::cout << b.index << ' ' << b.name << '\n';
// destroying order of the set here:
for(auto& b : example)
const_cast<int&>(b.index) = -b.index;
// anything here relying on an ordered collection will fail
// This includes std::set::find, all the algorithms that depend on uniqueness and/or ordering
// This is pretty much all that will still work, although it may not even be guaranteed
for(auto&& b : example)
std::cout << b.index << ' ' << b.name << '\n';
}
Live code on Coliru.
Note the first const_cast is only ok because the underlying example isn't const in the first place.
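
The mutable option mentioned above, which the example does not show, might look like the following. This is a sketch under assumptions: Item, rename_all and reindex are hypothetical names, and reindex illustrates the usual erase-and-reinsert approach for changes that do affect ordering, which is a different technique from the const_cast used above.

```cpp
#include <set>
#include <string>

struct Item
{
    int index;                // takes part in ordering, must stay const in the set
    mutable std::string name; // not used by operator<, so safe to change in place
};

bool operator<(const Item& left, const Item& right) { return left.index < right.index; }

// Hypothetical helper: change the non-ordering member of every element.
void rename_all(const std::set<Item>& items, const std::string& new_name)
{
    for (const auto& item : items)
    {
        item.name = new_name; // allowed because name is mutable
    }
}

// Hypothetical helper: change the ordering member by removing the element,
// modifying a copy, and inserting it again so the set can re-sort it.
// If new_index is already present, the insert is a no-op and the element is lost.
void reindex(std::set<Item>& items, int old_index, int new_index)
{
    auto it = items.find(Item{old_index, ""});
    if (it == items.end())
    {
        return;
    }
    Item copy = *it;
    items.erase(it);
    copy.index = new_index;
    items.insert(copy);
}
```

Neither helper needs a const_cast, and the set's ordering invariant is never broken.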

Boost Spirit - Parser Capturing Unwanted Text

I have a simple struct
// in namespace client
struct UnaryExpression
{
std::string key;
SomeEnums::CompareType op;
};
SomeEnums::CompareType is an enum where I define a symbol table as such:
struct UnaryOps : bsq::symbols<char, SomeEnums::CompareType>
{
UnaryOps() : bsq::symbols<char, SomeEnums::CompareType>(std::string("UnaryOps"))
{
add("exists", SomeEnums::Exists)
("nexists", SomeEnums::NotExists);
}
};
I have two different ways I want to parse the struct, which I asked about in another thread and got to work (mostly).
My grammar looks as follows:
template<typename Iterator>
struct test_parser : bsq::grammar<Iterator, client::UnaryExpression(), bsq::ascii::space_type>
{
test_parser()
: test_parser::base_type(unaryExp, std::string("Test"))
{
using bsq::no_case;
key %= bsq::lexeme[bsq::alnum >> +(bsq::alnum | bsq::char_('.'))];
unaryExp %= unaryE | unaryF;
unaryE %= key >> no_case[unaryOps];
unaryF %= no_case[unaryOps] >> '(' >> key >> ')';
};
UnaryOps unaryOps;
bsq::rule<Iterator, std::string(), bsq::ascii::space_type> key;
bsq::rule<Iterator, client::UnaryExpression(), bsq::ascii::space_type> unaryExp;
bsq::rule<Iterator, client::UnaryExpression(), bsq::ascii::space_type> unaryE;
bsq::rule<Iterator, client::UnaryFunction(), bsq::ascii::space_type> unaryF;
};
And I'm parsing the code using the following logic:
bool r = phrase_parse(iter, end, parser, bsq::ascii::space, exp);
if (r && iter == end)
{
std::cout << "-------------------------\n";
std::cout << "Parsing succeeded\n";
std::cout << "key: " << exp.key << "\n";
std::cout << "op : " << exp.op << "\n";
std::cout << "-------------------------\n";
}
This all works fine if I give it input like foo exists: exp.key equals "foo" and exp.op equals the corresponding enum value (in this case 0). Something like foo1 nexists also works.
However, the second rule doesn't work as I expect. If I give it the input nexists(foo), I get the following output:
-------------------------
Parsing succeeded
key: nexistsfoo
op : 1
-------------------------
It seems that the enum value is being set appropriately, but I can't figure out why "nexists" is being prepended to the key string. Can someone please tell me how I can fix my rule so that the key equals just 'foo' with the second rule?
I have posted a copy of the stripped down code that illustrates my problem here: http://pastebin.com/402M9iTS
