Boost spirit skip parser with at least one whitespace - boost

In the grammar I'm implementing, elements are separated by whitespace. With a skip parser, the spaces between the elements are skipped automatically, but this also allows no space at all, which is not what I want. Sure, I could explicitly write a grammar that includes these spaces, but it seems to me (given the complexity and flexibility offered by Spirit) that there should be a better way to do this. Is there?
Here is an example:
#include <cstdlib>
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main(int argc, char** argv)
{
if(argc != 2)
{
std::exit(1);
}
std::string str = argv[1];
auto iter = str.begin();
bool r = qi::phrase_parse(iter, str.end(), qi::char_ >> qi::char_, qi::blank);
if (r && iter == str.end())
{
std::cout << "parse succeeded\n";
}
else
{
std::cout << "parse failed. Remaining unparsed: " << std::string(iter, str.end()) << '\n';
}
}
This accepts ab as well as a b. I want only the latter to be accepted.
Related to this: how exactly do skip parsers work? One supplies something like qi::blank; is the Kleene star then applied to form the skip parser? I would like some enlightenment here; maybe it also helps in solving this problem.
Additional information: My real parser looks something like this:
one = char_("X") >> repeat(2)[omit[+blank] >> +alnum] >> qi::omit[+qi::blank] >> +alnum;
two = char_("Y") >> repeat(3)[omit[+blank] >> +alnum];
three = char_("Z") >> repeat(4)[omit[+blank] >> +alnum] >> qi::omit[+qi::blank] >> +alnum;
main = one | two | three;
This makes the grammar quite noisy, which I would like to avoid.

First off, the grammar specs I usually see this kind of requirement in are (always?) RFCs. In 99% of cases there is no issue, consider e.g.:
myrule = skip(space) [ uint_ >> uint_ ];
This already implicitly requires at least one whitespace character between the numbers, for the simple reason that otherwise there would be only one number. The same simplification occurs in surprisingly many cases (see e.g. the simplifications made around the ubiquitous WSP productions in last week's answer to "Boost.Spirit qi value sequence vector").
With that out of the way: skippers apply zero or more times, by definition, so no, there is no way to get what you want with an existing stateful directive like skip(). See also http://stackoverflow.com/questions/17072987/boost-spirit-skipper-issues/17073965#17073965 or the docs (under lexeme, [no_]skip and skip_flag::dont_postskip).
Looking at your specific grammar, I'd do this:
bool r = qi::phrase_parse(iter, end, token >> token, qi::blank);
Here, you can add a negative lookahead assertion inside a lexeme to assert that "the end of the token was reached" - which in your parser would be mandated as !qi::graph:
auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);
See a demo:
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main() {
for (std::string const str : { "ab", " ab ", " a b ", "a b" }) {
auto iter = str.begin(), end = str.end();
auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);
bool r = qi::phrase_parse(iter, end, token >> token, qi::blank);
std::cout << " --- " << std::quoted(str) << " --- ";
if (r) {
std::cout << "parse succeeded.";
} else {
std::cout << "parse failed.";
}
if (iter != end) {
std::cout << " Remaining unparsed: " << std::string(iter, str.end());
}
std::cout << std::endl;
}
}
Prints
--- "ab" --- parse failed. Remaining unparsed: ab
--- " ab " --- parse failed. Remaining unparsed: ab
--- " a b " --- parse succeeded.
--- "a b" --- parse succeeded.
BONUS Review notes
My guidelines would be:
Your skipper should be the grammar's responsibility. It's sad that all Qi samples lead people to believe you need to let the caller decide that.
End-iterator checking does not equal error-checking. It's very possible to parse things correctly without consuming all input, which is why reporting the "remaining input" should not be limited to the case where parsing failed.
If trailing unparsed input is an error, spell it out:
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main() {
for (std::string const str : { "ab", " ab ", " a b ", "a b happy trees are trailing" }) {
auto iter = str.begin(), end = str.end();
auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);
bool r = qi::parse(iter, end, qi::skip(qi::space) [ token >> token >> qi::eoi ]);
std::cout << " --- " << std::quoted(str) << " --- ";
if (r) {
std::cout << "parse succeeded.";
} else {
std::cout << "parse failed.";
}
if (iter != end) {
std::cout << " Remaining unparsed: " << std::quoted(std::string(iter, str.end()));
}
std::cout << std::endl;
}
}
Prints
--- "ab" --- parse failed. Remaining unparsed: "ab"
--- " ab " --- parse failed. Remaining unparsed: " ab "
--- " a b " --- parse succeeded.
--- "a b happy trees are trailing" --- parse failed. Remaining unparsed: "a b happy trees are trailing"


Continue appending [to stream] in for-loop without local variable access

Imagine you have the following code where logDebug() is expensive or is not appropriate to call more than once:
QDebug d = logDebug();
d << __FUNCTION__ << ":";
d << "positions separated with \" --- \":";
for (const auto& str : positions)
{
d << "---" << str;
}
A macro (just to replace the function name correctly) already exists which replaces the first 2 lines:
#define LOG_FUNCTION this->logDebug() << __FUNCTION__ << ":"
It creates the debug object by calling logDebug(). Once it has been invoked, you can only apply operator<< to the macro's result.
The problem is that you can't attach the for loop body to the logger.
Q: Is there a way I could use the macro to output all the positions (without calling logDebug again)?
I would guess this should be possible using lambdas, but I don't quite know how.
Please help, the shortest answer wins!
Q: Is there a way I could use the macro to output all the positions (without calling logDebug again)? I would guess this should be possible using lambdas, but I don't quite know how.
I suppose it's possible with something like the following (using std::cout instead of logDebug()):
#include <iostream>
#define LOG_FUNCTION std::cout << __FUNCTION__ << ": "
#define LOG_DEB(ps) \
[](auto & s, auto const & _ps) { for ( auto const & p : _ps ) s << p; } \
(LOG_FUNCTION, ps)
int main ()
{
int a[] { 0, 1, 2, 3, 4 };
LOG_DEB(a);
}
I've used auto for the types of the lambda parameters, which only works from C++14 onwards.
In C++11 you have to replace them with the correct types.
Well, the macro can be coerced to return your debug object:
#define LOG_FUNCTION() this->logDebug() << __FUNCTION__ << ":"
Then use it like this:
auto d = LOG_FUNCTION(); // copy the debug object; auto& would dangle if logDebug() returns by value
d << "positions separated with \" --- \":";
for (const auto& str : positions)
{
d << "---" << str;
}

C++: How can I make an integer filter with only <iostream> library?

(My English vocabulary is limited, so sorry for the awkward English.)
Hi guys! I'm new to C++ and I need to know how to write a filter that accepts only integer numbers. The code must use only the 'iostream' library, because my teacher doesn't let us use any other library (we are new to C++ coding).
Here is an example of what I have at the moment:
# include <iostream>
# include <limits> //I shouldn't use this library
using namespace std;
int main() {
int value = 0;
cout << "Enter an integer value: ";
while(!(cin >> value)) {
cin.clear();
cin.ignore(numeric_limits<streamsize>::max(), '\n'); //This line needs <limits>
cout << endl <<"Value must be an integer"<< endl << endl;
cout << "Enter another integer value: " ;
}
}
But this code has some drawbacks:
It uses the <limits> header, which I shouldn't use.
If you enter "1asd", it takes the value "1" and treats the input as correct, which it isn't.
Do you guys have any solution for this situation? Thanks in advance for your time.
You just have to check whether the characters the user entered are numerals, as below. If every character of the entered string is a numeral (i.e. between the characters '0' and '9'), then the entire string is an integer. The exception is the first character, which can be a '+', a '-', a space/tab or simply the first numeral of the number. (Thanks Zett42.)
std::cout << "Enter an integer value: ";
std::string res1;
std::cin >> res1;
// Inspect each character of the input separately.
std::string::iterator it;
for ( it = res1.begin() ; it != res1.end(); ++it)
{ std::cout << "checking " << *it << ' ';
if ( '0' <= *it && *it <= '9' ) {
std::cout << "this is a numeral\n";
} else {
std::cout << "you entered: " << *it << " -- this is *not* a numeral\n";
}
}

rules working on 1.46 boost::spirit and stopped working on boost spirit 1.55

constant_double_quotation_string %= char_( '"' ) >>
*( spirit::qi::string( "\\\"" )[ _val += _1 ] |
( char_ - '"' ) ) >> char_( '"' );
constant_single_quotation_string %= char_( '\'' ) >>
*( spirit::qi::string( "\\\'" )[ _val += _1 ] |
( char_ - '\'' ) ) >> char_( '\'' );
Now GCC 4.7.2 says "char is not a class or structure or union type". Why?
Elaborating on my earlier answer
In case you actually do want to expose the unescaped value, I'd suggest:
not using raw (obviously, because we don't wish to mirror the exact input sequence in the presence of escaped characters)
still not using semantic actions
instead playing clever with lit('\\') to match the escape character without adding it to the output sequence.
Here I chose to use a single rule definition for both the double- and single-quoted literal parsers, passing in the expected quote character as an inherited attribute:
qi::rule<It, std::string(char)>
q_literal;
q_literal = lit(_r1) >> *('\\' >> char_ | (char_ - lit(_r1))) >> lit(_r1);
start = q_literal('"') | q_literal('\'');
Demo
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
template <typename It, typename Skipper = qi::space_type>
struct my_grammar : qi::grammar<It, std::string(), Skipper> {
my_grammar() : my_grammar::base_type(start) {
using namespace qi;
start = q_literal('"') | q_literal('\'');
q_literal = lit(_r1) >> *('\\' >> char_ | (char_ - lit(_r1))) >> lit(_r1);
BOOST_SPIRIT_DEBUG_NODES( (start)(q_literal) )
}
private:
qi::rule<It, std::string(), Skipper> start;
// drop skipper to make these rules implicitly 'lexeme'
// see: https://stackoverflow.com/questions/17072987/boost-spirit-skipper-issues/17073965#17073965
qi::rule<It, std::string(char)> q_literal;
};
int main() {
using It = std::string::const_iterator;
my_grammar<It> g;
for (std::string const& input : {
"\"hello world\"",
"\"hello \\\"world\\\"\"",
"'bye world'",
"'bye \"\\'world\\'\"'",
"bogus" })
{
std::cout << "\n------- Parsing: " << input << '\n';
It f = input.begin(), l = input.end();
std::string result;
bool ok = qi::phrase_parse(f, l, g, qi::space, result);
if (ok)
std::cout << "Parse success: " << result << "\n";
else
std::cout << "Parse failed\n";
if (f!=l)
std::cout << "Remaining unparsed input '" << std::string(f,l) << "'\n";
}
}
Printing the unescaped literals:
------- Parsing: "hello world"
Parse success: hello world
------- Parsing: "hello \"world\""
Parse success: hello "world"
------- Parsing: 'bye world'
Parse success: bye world
------- Parsing: 'bye "\'world\'"'
Parse success: bye "'world'"
------- Parsing: bogus
Parse failed
Remaining unparsed input 'bogus'
You don't even specify the declared type of the constant_single_quotation_string rule.
Here are some observations and a working approach:
Since you apparently do not want the synthesized attribute value to be the unescaped input sequence, you can simply use the qi::raw[] directive to mirror the input sequence directly. This way you can simplify the rule itself.
You don't need %= (auto rule assignment) or semantic actions ([_val+=_1]) at all. ¹
Instead, if you e.g. didn't want the opening/closing quotes as part of the attribute, just replace qi::char_('"') with qi::lit('"') (or indeed, just '"').
Simplified:
qi::rule<It, std::string()>
dq_literal,
sq_literal;
dq_literal = raw [ '"' >> *("\\\"" | ~char_('"')) >> '"' ];
sq_literal = raw [ "'" >> *("\\'" | ~char_("'")) >> "'" ];
Full Demo
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
template <typename It, typename Skipper = qi::space_type>
struct my_grammar : qi::grammar<It, std::string(), Skipper> {
my_grammar() : my_grammar::base_type(start) {
using namespace qi;
start = dq_literal
| sq_literal;
dq_literal = raw [ '"' >> *("\\\"" | ~char_('"')) >> '"' ];
sq_literal = raw [ "'" >> *("\\'" | ~char_("'")) >> "'" ];
BOOST_SPIRIT_DEBUG_NODES(
(start)(dq_literal)(sq_literal)
)
}
private:
qi::rule<It, std::string(), Skipper> start;
// drop skipper to make these rules implicitly 'lexeme'
// see: https://stackoverflow.com/questions/17072987/boost-spirit-skipper-issues/17073965#17073965
qi::rule<It, std::string()>
dq_literal,
sq_literal;
};
int main() {
using It = std::string::const_iterator;
my_grammar<It> g;
for (std::string const& input : {
"\"hello world\"",
"\"hello \\\"world\\\"\"",
"'bye world'",
"'bye \"\\'world\\'\"'",
"bogus" })
{
std::cout << "\n------- Parsing: " << input << '\n';
It f = input.begin(), l = input.end();
std::string result;
bool ok = qi::phrase_parse(f, l, g, qi::space, result);
if (ok)
std::cout << "Parse success: " << result << "\n";
else
std::cout << "Parse failed\n";
if (f!=l)
std::cout << "Remaining unparsed input '" << std::string(f,l) << "'\n";
}
}
Printing:
------- Parsing: "hello world"
Parse success: "hello world"
------- Parsing: "hello \"world\""
Parse success: "hello \"world\""
------- Parsing: 'bye world'
Parse success: 'bye world'
------- Parsing: 'bye "\'world\'"'
Parse success: 'bye "\'world\'"'
------- Parsing: bogus
Parse failed
Remaining unparsed input 'bogus'
¹ see also Boost Spirit: "Semantic actions are evil"?

Boost Spirit - Parser Capturing Unwanted Text

I have a simple struct
// in namespace client
struct UnaryExpression
{
std::string key;
SomeEnums::CompareType op;
};
SomeEnums::CompareType is an enum where I define a symbol table as such:
struct UnaryOps : bsq::symbols<char, SomeEnums::CompareType>
{
UnaryOps() : bsq::symbols<char, SomeEnums::CompareType>(std::string("UnaryOps"))
{
add("exists", SomeEnums::Exists)
("nexists", SomeEnums::NotExists);
}
};
I have two different ways I want to parse the struct, which I asked about in another thread and got to work (mostly).
My grammar looks as follows:
template<typename Iterator>
struct test_parser : bsq::grammar<Iterator, client::UnaryExpression(), bsq::ascii::space_type>
{
test_parser()
: test_parser::base_type(unaryExp, std::string("Test"))
{
using bsq::no_case;
key %= bsq::lexeme[bsq::alnum >> +(bsq::alnum | bsq::char_('.'))];
unaryExp %= unaryE | unaryF;
unaryE %= key >> no_case[unaryOps];
unaryF %= no_case[unaryOps] >> '(' >> key >> ')';
};
UnaryOps unaryOps;
bsq::rule<Iterator, std::string(), bsq::ascii::space_type> key;
bsq::rule<Iterator, client::UnaryExpression(), bsq::ascii::space_type> unaryExp;
bsq::rule<Iterator, client::UnaryExpression(), bsq::ascii::space_type> unaryE;
bsq::rule<Iterator, client::UnaryFunction(), bsq::ascii::space_type> unaryF;
};
And I'm parsing the code using the following logic:
bool r = phrase_parse(iter, end, parser, bsq::ascii::space, exp);
if (r && iter == end)
{
std::cout << "-------------------------\n";
std::cout << "Parsing succeeded\n";
std::cout << "key: " << exp.key << "\n";
std::cout << "op : " << exp.op << "\n";
std::cout << "-------------------------\n";
}
This all works fine for input like foo exists: exp.key equals "foo" and exp.op equals the corresponding enum value (in this case 0). Something like foo1 nexists also works.
However, that second rule doesn't work like I expect. If I give it input of nexists(foo) then I get the following output:
-------------------------
Parsing succeeded
key: nexistsfoo
op : 1
-------------------------
It seems that the enum value is being set appropriately, but I can't figure out why "nexists" is being prepended to the key string. Can someone please tell me how I can fix my rule so that the key equals just 'foo' with the second rule?
I have posted a copy of the stripped down code that illustrates my problem here: http://pastebin.com/402M9iTS

regex pattern to extract empty field from CSV file

I have a CSV file that needs to be read into a matrix.
Right now I have this regex pattern:
regex pat { R"(("[^"]+")|([^,]+))" };
I found similar topics on Stack Overflow, but they either used a different regex pattern or a language other than C++.
Right now it chooses between sequences that are between quotes and anything that is not a comma. The file contains data from a survey that has questions with yes/no answers. If you answer "no", you do not need to answer some related questions.
Therefore I get some sequences in the file like this: ":,,,,,,,,", where each pair of consecutive commas means an empty field. But I would like every row to keep the same number of fields, since that seems to make it easier to navigate the matrix later. So I need to extract these empty fields between the commas.
I could not find a regex pattern for an empty sequence. Is a regex pattern a proper way to solve this issue?
This code illustrates sample usage of a pattern that also matches empty fields:
#include <iostream>
#include <iterator>
#include <string>
#include <regex>
int main()
{
std::regex field_regex("(\"([^\"]*)\"|([^,]*))(,|$)");
for (const std::string s : {
"a,,hello,,o",
"\"a\",,\"hello\",,\"o\"",
",,,,"})
{
std::cout << "parsing: " << s << "\n";
std::cout << "======================================" << "\n";
auto i = 0;
for (auto it = std::sregex_iterator(s.begin(), s.end(), field_regex);
it != std::sregex_iterator();
++it, ++i)
{
auto match = *it;
auto extracted = match[2].length() ? match[2].str() : match[3].str();
std::cout << "column[" << i << "]: " << extracted << "\n";
if (match[4].length() == 0)
{
break;
}
}
std::cout << "\n";
}
}
Output:
parsing: a,,hello,,o
======================================
column[0]: a
column[1]:
column[2]: hello
column[3]:
column[4]: o
parsing: "a",,"hello",,"o"
======================================
column[0]: a
column[1]:
column[2]: hello
column[3]:
column[4]: o
parsing: ,,,,
======================================
column[0]:
column[1]:
column[2]:
column[3]:
column[4]:
