Boost Spirit rule with custom attribute parsing - boost

I am writing a Boost Spirit grammar to parse text into a vector of these structs:
struct Pair
{
double a;
double b;
};
BOOST_FUSION_ADAPT_STRUCT(
Pair,
(double, a)
(double, b)
)
This grammar has a rule like this:
qi::rule<Iterator, Pair()> pairSequence;
However, the actual grammar of pairSequence is this:
double_ % separator
I want this grammar to produce a Pair with a equal to the double and b equal to some constant. I want to do something like this:
pairSequence = double_[_val = Pair(_1, DEFAULT_B)] % separator;
The above does not compile, of course. I tried adding a constructor to Pair, but I still get compile errors (no matching function for call to 'Pair::Pair(const boost::phoenix::actor >&, double)').

First of all, the signature of pairSequence needs to be:
qi::rule<Iterator, std::vector<Pair>()> pairSequence;
as the list operator exposes a std::vector<Pair> as its attribute.
All functions called from inside a semantic action have to be 'lazy', so you need to utilize phoenix:
namespace phx = boost::phoenix;
pairSequence =
double_[
phx::push_back(_val,
phx::construct<Pair>(_1, phx::val(DEFAULT_B))
)
] % separator
;
Another possibility would be to add a (non-explicit) constructor to Pair:
struct Pair
{
Pair(double a) : a(a), b(DEFAULT_B) {}
double a;
double b;
};
which allows you to simplify the grammar:
pairSequence = double_ % separator;
and completely relies on Spirit's built-in attribute propagation rules.
BTW, for any of this to work, you don't need to adapt Pair as a Fusion sequence.


Why is `ref` used instead of an asterisk in pattern matching?

I am having trouble trying to understand pattern matching rules in Rust. I originally thought that the idea behind patterns are to match the left-hand side and right-hand side like so:
struct S {
x: i32,
y: (i32, i32)
}
let S { x: a, y: (b, c) } = S { x: 1, y: (2, 3) };
// `a` matches `1`, `(b, c)` matches `(2, 3)`
However, when we want to bind a reference to a value on the right-hand side, we need to use the ref keyword.
let &(ref a, ref b) = &(3, 4);
This feels rather inconsistent.
Why can't we use the dereferencing operator * to match the left-hand side and right-hand side like this?
let &(*a, *b) = &(3, 4);
// `*a` matches `3`, `*b` matches `4`
Why isn't this the way patterns work in Rust? Is there a reason why this isn't the case, or have I totally misunderstood something?
Using the dereferencing operator would be very confusing in this case. ref effectively takes a reference to the value. These are more-or-less equivalent:
let bar1 = &42;
let ref bar2 = 42;
Note that in let &(ref a, ref b) = &(3, 4), a and b both have the type &i32 — they are references. Also note that since match ergonomics, let (a, b) = &(3, 4) is the same and shorter.
Furthermore, the ampersand (&) and asterisk (*) symbols are used for types. As you mention, pattern matching wants to "line up" the value with the pattern. The ampersand is already used to match and remove one layer of references in patterns:
let foo: &i32 = &42;
match foo {
&v => println!("{}", v),
}
By analogy, it's possible that some variant of this syntax might be supported in the future for raw pointers:
let foo: *const i32 = std::ptr::null();
match foo {
*v => println!("{}", v),
}
Since both ampersand and asterisk could be used to remove one layer of reference/pointer, they cannot be used to add one layer. Thus some new keyword was needed and ref was chosen.
See also:
Meaning of '&variable' in arguments/patterns
What is the syntax to match on a reference to an enum?
How can the ref keyword be avoided when pattern matching in a function taking &self or &mut self?
How does Rust pattern matching determine if the bound variable will be a reference or a value?
Why does pattern matching on &Option<T> yield something of type Some(&T)?
In this specific case, you can achieve the same with neither ref nor asterisk:
fn main() {
let (a, b) = &(3, 4);
show_type_name(a);
show_type_name(b);
}
fn show_type_name<T>(_: T) {
println!("{}", std::any::type_name::<T>()); // rust 1.38.0 and above
}
It shows both a and b to be of type &i32. This ergonomics feature is called binding modes.
But it still doesn't answer the question of why the ref pattern exists in the first place. I don't think there is a definite answer to that. The syntax for identifier patterns simply settled on what it is now.

c++ pointer assignment confusion

Consider this MCV example
class A
{
class B
{
public:
B();
~B();
};
public:
B* a, b, c;
A();
~A();
void foo();
};
void A::foo()
{
a = b = c;
}
yields the following compilation error in Visual Studio 2015
Severity Code Description Project File Line Suppression State
Error C2679 binary '=': no operator found which takes a right-hand operand of type 'A::B *' (or there is no acceptable conversion)
Strangely if I declare a, b, and c as follows
B* a; B* b; B* c;
There is no compilation issue. Because the pointers are class type, am I required to provide an appropriate B operator=(B& poo) for the original declaration to work? Certainly I can do the following int x, y, z so why is the above generating a compiler error?
The correct answer here is, don't declare multiple variables on one line. It's a pointless character saving that saves nothing on semantics whatsoever and merely leads to confusion. Don't use the std::add_pointer_t thing and don't just add more stars.
This is an anachronism from C; the pointer asterisk (*) binds to the name, and not the type, yielding:
B* a;
B b;
B c;
A better, less error prone way to declare multiple raw pointers is this:
std::add_pointer_t<B> a, b, c;
If you only have access to C++11, you need to use the more verbose std::add_pointer<B>::type instead.
In some codebases you might also find named typedefs for commonly used pointer types, like so:
typedef B* BPtr;
BPtr a, b, c;
Which yields what you'd expect. You can still use that, mixing and matching with using and std::add_pointer.
Alternatively, you can put a star in front of every name. That's why some people write this as:
B *a, *b, *c;
I'd personally discourage that. As mentioned before, it's not really readable and quite error-prone.
However, this assumes that your variables are actually meant to be of the same type, rather than sharing a type by coincidence. An example of such a coincidence would be two numeric values that both happen to be ints but have no relationship to each other. This is more of a design decision, though, and I assume that if you're asking about a single-type, multiple-name declaration, you understand what it entails.
In the original declaration, a is of type B*, but b and c are of type B.
To make it work as a single declaration it should be
B* a, *b, *c;
IMHO I'd leave it as separate declarations if only to avoid the entire issue.
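To see how the compiler reads the two spellings, here is a minimal sketch that checks the declared types at compile time. The struct name Declarations and the members p, q and r are made up for illustration; only B, a, b and c come from the question.

```cpp
#include <type_traits>

struct B {};

struct Declarations {
    B* a, b, c;     // the star binds to `a` only: a is B*, b and c are plain B
    B *p, *q, *r;   // one star per declarator: p, q and r are all B*
};

// These assertions compile, which confirms the types described above.
static_assert(std::is_same<decltype(Declarations::a), B*>::value, "a is a pointer");
static_assert(std::is_same<decltype(Declarations::b), B>::value, "b is an object");
static_assert(std::is_same<decltype(Declarations::c), B>::value, "c is an object");
static_assert(std::is_same<decltype(Declarations::r), B*>::value, "r is a pointer");
```

With the first spelling, a = b = c tries to assign a B to a B*, which is where the C2679 error comes from.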

Why can't std::is_permutation act between two different types of data?

Suppose I have a vector of integers and of strings, and I want to compare whether they have equivalent elements, without consideration of order. Ultimately, I'm asking if the integer vector is a permutation of the string vector (or vice versa). I'd like to be able to just call is_permutation, specify a binary predicate that allows me to compare the two, and move on with my life. eg:
bool checkIntStringComparison( const std::vector<int>& intVec,
const std::vector<std::string>& stringVec,
const std::map<int, std::string>& intStringMap){
return std::is_permutation<std::vector<int>::const_iterator, std::vector<std::string>::const_iterator>(
intVec.cbegin(), intVec.cend(), stringVec.cbegin(), [&intStringMap](const int& i, const std::string& string){
return string == intStringMap.at(i);
});
}
But trying to compile this (in gcc) returns an error message that boils down to:
no match for call to stuff::< lambda(const int&, const string& >)(const std::_cxx11::basic_string&, const int&)
See how it switches the calling signature from the lambda's? If I switch them around, the signature switches itself the other way.
Digging around about this error, it seems that the standard specifies for std::is_permutation that ForwardIterator1 and 2 must be the same type. So I understand the compiler error in that regard. But why should it be this way? If I provide a binary predicate that allows me to compare the two (or if we had previously defined some equality operator between the two?), isn't the real core of the algorithm just searching through container 1 to make sure all its elements are in container 2 uniquely?
The problem is that an element can occur more than once. That means that the predicate needs to be able to not only compare the elements of the first range to the elements of the second range, but to compare the elements of the first range to themselves:
if (size(range1) != size(range2))
return false;
for (auto const& x1 : range1)
if (count_if(range1, [&](auto const& y1) { return pred(x1, y1); }) !=
count_if(range2, [&](auto const& y2) { return pred(x1, y2); }))
return false;
return true;
Since it's relatively tricky to create a function object that takes two distinct signatures, and passing two predicates would be confusing, the easiest option was to specify that both ranges must have the same value type.
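For illustration, here is a runnable sketch of the naive algorithm above together with a function object that provides both signatures it needs. The names naiveIsPermutation and IntStringEq are hypothetical, and this is not how any standard library actually implements std::is_permutation; it only shows why the predicate gets called with two elements of the first range as well as with one element from each range.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical predicate: one overload per signature the algorithm calls.
struct IntStringEq {
    explicit IntStringEq(const std::map<int, std::string>& n) : names(n) {}
    // first range against itself
    bool operator()(int lhs, int rhs) const { return lhs == rhs; }
    // first range against second range (throws std::out_of_range for unknown ints)
    bool operator()(int lhs, const std::string& rhs) const { return names.at(lhs) == rhs; }
    const std::map<int, std::string>& names;
};

// Quadratic sketch of the counting algorithm from the answer.
template <typename Range1, typename Range2, typename Pred>
bool naiveIsPermutation(const Range1& range1, const Range2& range2, Pred pred) {
    typedef typename Range1::value_type Value1;
    typedef typename Range2::value_type Value2;
    if (range1.size() != range2.size())
        return false;
    for (const Value1& x1 : range1) {
        // How often does x1 occur in its own range...
        const auto inFirst = std::count_if(range1.begin(), range1.end(),
            [&](const Value1& y1) { return pred(x1, y1); });
        // ...and how often does its counterpart occur in the other range?
        const auto inSecond = std::count_if(range2.begin(), range2.end(),
            [&](const Value2& y2) { return pred(x1, y2); });
        if (inFirst != inSecond)
            return false;
    }
    return true;
}
```

A lambda taking (const int&, const std::string&) only covers the second overload, which is why a single lambda is not enough here.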
Your options are:
Wrap one range (or both) in a transform that gives the same value type (e.g. use Boost.Adaptors.Transformed);
Write your own implementation of std::is_permutation (e.g. copying the example implementation on cppreference);
Actually, note that the gcc (i.e. libstdc++) implementation does not enforce that the value types are the same; it just requires several signatures which you'd have to provide anyway, so write a polymorphic predicate as e.g. a function object or a polymorphic lambda, or with parameter types convertible from both range value types (e.g. in your case boost::variant<int, string> - ugly, but probably not that bad). This is non-portable, as another implementation might choose to enforce that requirement.
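A minimal sketch of the first option using only the standard library (instead of Boost.Adaptors) could look like the following. It copies the translated strings into a temporary vector, which is an assumption made for simplicity; a lazy transforming adaptor would avoid the copy. The function name and parameters are taken from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <vector>

bool checkIntStringComparison(const std::vector<int>& intVec,
                              const std::vector<std::string>& stringVec,
                              const std::map<int, std::string>& intStringMap)
{
    // Translate every int to its string so both ranges share one value type.
    // Note: at() throws std::out_of_range if an int has no entry in the map.
    std::vector<std::string> translated;
    translated.reserve(intVec.size());
    std::transform(intVec.begin(), intVec.end(), std::back_inserter(translated),
                   [&intStringMap](int i) { return intStringMap.at(i); });

    // The three-iterator overload assumes equal lengths, so check sizes first.
    return translated.size() == stringVec.size() &&
           std::is_permutation(translated.begin(), translated.end(), stringVec.begin());
}
```

Since both ranges now hold std::string, the default operator== is enough and no custom predicate is needed.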

C++ Boost qi recursive rule construction

[It seems my explanations and expectations are not clear at all, so I added precision on how I'd like to use the feature at the end of the post]
I'm currently working on grammars using boost qi. I had a loop construction for a rule because I needed to build it from the elements of a vector. I have re-written it with simple types, and it looks like:
#include <string>
// using boost 1.43.0
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_eps.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace bqi = boost::spirit::qi;
typedef const char* Iterator;
// function that you can find [here][1]
template<typename P> void test_phrase_parser(char const* input, P const& p, bool full_match = true);
int main()
{
// my working rule type:
bqi::rule<Iterator, std::string()> myLoopBuiltRule;
std::vector<std::string> v;
std::vector<std::string>::const_iterator iv;
v.push_back("abc");
v.push_back("def");
v.push_back("ghi");
v.push_back("jkl");
myLoopBuiltRule = (! bqi::eps);
for(iv = v.begin() ; iv != v.end() ; iv++)
{
myLoopBuiltRule =
myLoopBuiltRule.copy() [ bqi::_val = bqi::_1 ]
| bqi::string(*iv) [ bqi::_val = bqi::_1 ]
;
}
debug(myLoopBuiltRule);
char s[] = " abc ";
test_phrase_parser(s, myLoopBuiltRule);
}
(Looks like here does not want to be replaced by corresponding hyperlink, so here is the address to find function test_phrase_parser(): http://www.boost.org/doc/libs/1_43_0/libs/spirit/doc/html/spirit/qi/reference/basics.html)
All was for the best in the best of all worlds... until I had to pass an argument to this rule. Here is the new rule type:
// my not-anymore-working rule type:
bqi::rule<Iterator, std::string(int*)> myLoopBuiltRule;
'int*' type is for example purpose only, my real pointer is addressing a much more complex class... but still a mere pointer.
I changed my 'for' loop accordingly, i.e.:
for(iv = v.begin() ; iv != v.end() ; iv++)
{
myLoopBuiltRule =
myLoopBuiltRule.copy()(bqi::_r1) [ bqi::_val = bqi::_1 ]
| bqi::string(*iv) [ bqi::_val = bqi::_1 ]
;
}
I had to add a new rule because test_phrase_parser() cannot guess which value is to be given to the int pointer:
bqi::rule<Iterator> myInitialRule;
And change everything that followed the for loop:
myInitialRule = myLoopBuiltRule((int*)NULL);
debug(myLoopBuiltRule);
char s[] = " abc ";
test_phrase_parser(s, myInitialRule);
Then everything crashed:
/home/sylvain.darras/software/repository/software/external/include/boost/boost_1_43_0/boost/spirit/home/qi/nonterminal/rule.hpp:199: error: no matching function for call to ‘assertion_failed(mpl_::failed************ (boost::spirit::qi::rule<Iterator, T1, T2, T3, T4>::operator=(const Expr&)
Then I got crazy and tried:
myLoopBuiltRule =
myLoopBuiltRule.copy(bqi::_r1) [ bqi::_val = bqi::_1 ]
| bqi::string(*iv) [ bqi::_val = bqi::_1 ]
-->
error: no matching function for call to ‘boost::spirit::qi::rule<const char*, std::string(int*), boost::fusion::unused_type, boost::fusion::unused_type, boost::fusion::unused_type>::copy(const boost::phoenix::actor<boost::spirit::attribute<1> >&)’
Then I got mad and wrote:
myLoopBuiltRule =
myLoopBuiltRule(bqi::_r1) [ bqi::_val = bqi::_1 ]
| bqi::string(*iv) [ bqi::_val = bqi::_1 ]
Which compiles since it is perfectly syntactically correct, but which magnificently stack overflows coz it happily, nicely, recursively, calls itself to death...
Then I lost my mind and typed:
myLoopBuiltRule =
jf jhsgf jshdg fjsdgh fjsg jhsdg jhg sjfg jsgh df
Which, as you probably expect, has failed to compile.
As you can imagine, before writing the above novel I searched the web, but didn't find anything about using copy() and argument passing at the same time. Has anyone already experienced this problem? Have I missed something?
Be assured that any help will be really really appreciated.
PS: Great thanks to hkaiser who has, without knowing it, answered a lot of my boost::qi problems through google (but this one).
Further information:
The purpose of my parser is to read files written in a given language L. The purpose of my post is to propagate my "context" (i.e.: variable definitions and especially constant values, so I can compute expressions).
The number of variable types I handle is small, but it's bound to grow, so I keep these types in a container class. I can loop on these managed types.
So, let's consider a pseudo-algorithm of what I would like to achieve:
LTypeList myTypes;
LTypeList::const_iterator iTypes;
bqi::rule<Iterator, LType(LContext*)> myLoopBuiltRule;
myLoopBuiltRule = (! bqi::eps);
for(iTypes = myTypes.begin() ; iTypes != myTypes.end() ; iTypes++)
{
myLoopBuiltRule =
myLoopBuiltRule.copy()(bqi::_r1) [ bqi::_val = bqi::_1 ]
| iTypes->getRule()(bqi::_r1) [ bqi::_val = bqi::_1 ]
}
This is done during initialization and then myLoopBuiltRule is used and reused with different LContext*, parsing multiple types. And since some L types can have bounds, which are integer expressions, and that these integer expressions can exhibit constants, I (think that I) need my inherited attribute to take my LContext around and be able to compute expression value.
Hope I've been clearer in my intentions.
Note I just extended my answer with a few more informational links. In this particular case I have a hunch that you could just get away with the Nabialek trick and replacing the inherited attribute with a corresponding qi::locals<> instead. If I have enough time, I might work out a demonstration later.
Caveats, expositioning the problem
Please be advised that there are issues when copying proto expression trees and spirit parser expressions in particular - it will create dangling references as the internals are not supposed to live past the end of the containing full expressions. See BOOST_SPIRIT_AUTO on Zero to 60 MPH in 2 seconds!
Also see these answers, which also concern themselves with building/composing rules on the fly (at runtime):
Generating Spirit parser expressions from a variadic list of alternative parser expressions
Can Boost Spirit Rules be parameterized which demonstrates how to return rules from a function using boost::proto::deepcopy (like BOOST_SPIRIT_AUTO does, actually)
Nabialek Trick
In general, I'd very strongly advise against combining rules at runtime. Instead, if you're looking to 'add alternatives' to a rule at runtime, you can always use qi::symbols<> instead. The trick is to store a rule in the symbol-table and use qi::lazy to call the rule. In particular, this is known as the Nabialek Trick.
I have a toy command-line arguments parser here that demonstrates how you could use this idiom to match a runtime-defined set of command line arguments:
https://gist.github.com/sehe/2a556a8231606406fe36
Limitations of qi::lazy, what's next?
Unfortunately, qi::lazy does not support inherited arguments; see e.g.
http://boost.2283326.n4.nabble.com/pass-inhertited-attributes-to-nabialek-trick-td2679066.html
You might be better off writing a custom parser component, as documented here:
http://boost-spirit.com/home/articles/qi-example/creating-your-own-parser-component-for-spirit-qi/
I'll try to find some time to work out a sample that replaces inherited arguments by qi::locals later.

HowTo sort std::map by first and second params?

Here is my std::map example, like std::map< string, string > my_map;
// ABC | aaa ABC | aaa
// DEF | def ABC | dcd
// BCD | def -> ABC | zzz
// DEF | bcd BCD | def
// ABC | dcd DEF | bcd
// ABC | zzz DEF | def
As you can see, I'm trying to sort left std::map and get the right one.
And here is my code (I used not strings, but my custom types. any way, in final, I'm sorting strings):
template < typename T1, typename T2 >
struct less_second
{
typedef std::pair< T1, T2 > type;
bool operator ()( type const& _left, type const& _right ) const
{
return ( (*_left.first).name() < (*_right.first).name() ) &&
( (*_left.second).name() < (*_right.second).name() );
}
};
Problem: when I use only in less_second
return (*_left.first).name() < (*_right.first).name();
All data from the first column is sorted, but the second column is not (of course, because we use only the first!)
The mirrored situation, when I use only
return (*_left.second).name() < (*_right.second).name();
The second column sorted.
BUT I need to sort both the first and the second columns at once. How do I code this? What am I doing wrong?
Thanks for help!
Sorry, I forgot this code:
std::vector< std::pair< CompanyPtr, ContractorPtr > > n_map_( buddies_ccm_.begin(), buddies_ccm_.end() );
std::sort( n_map_.begin(), n_map_.end(), less_second< CompanyPtr, ContractorPtr >() );
Your comparison function is wrong. It does not work when both first.name() are equal. Try something like this:
bool operator ()( type const& _left, type const& _right ) const
{
if ((*_left.first).name() > (*_right.first).name())
return false;
if ((*_left.first).name() < (*_right.first).name())
return true;
return ( (*_left.second).name() < (*_right.second).name() );
}
I'm a little bit confused by the term "columns". Are you talking about keys and values?
A std::map is always ordered by key. You can specify a compare object at construction time of your map to define that order. But this compare object does not compare std::pairs, but objects of the key type of your map.
Moreover, a key in a map is unique. Thus, there cannot be two entries with the key "ABC" in the map.
I suppose you try to sort the map with std::sort from <algorithm>. I'm not sure what happens in this case, but I think it is not what you expect to happen.
You only want to take the second column into account if _left->first and _right->first are equal. Put another way, you want to make your code compare the first fields first, and fall back to comparing the second fields only if the first comparison is inconclusive (which happens when both firsts are equal).
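One compact way to write "compare the firsts, then fall back to the seconds" is std::tie, which builds tuples of references that compare lexicographically. The sketch below assumes stand-in types: Named, NamedPtr, Row, makeRow and sortRows are hypothetical and only mimic the CompanyPtr/ContractorPtr pairs from the question, whose real definitions were not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's Company/Contractor classes.
struct Named {
    explicit Named(const std::string& n) : value(n) {}
    const std::string& name() const { return value; }
    std::string value;
};
typedef std::shared_ptr<Named> NamedPtr;
typedef std::pair<NamedPtr, NamedPtr> Row;

// Hypothetical helper to build one row from two names.
Row makeRow(const std::string& first, const std::string& second) {
    return Row(std::make_shared<Named>(first), std::make_shared<Named>(second));
}

struct LessFirstThenSecond {
    bool operator()(const Row& left, const Row& right) const {
        // Tuples compare element by element: first names decide,
        // second names are only consulted when the first names are equal.
        return std::tie(left.first->name(), left.second->name()) <
               std::tie(right.first->name(), right.second->name());
    }
};

void sortRows(std::vector<Row>& rows) {
    std::sort(rows.begin(), rows.end(), LessFirstThenSecond());
}
```

Note that std::tie needs lvalues, so this only works as written because name() returns a reference; if it returned by value, std::make_tuple would be the safer choice.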
You cannot achieve what you want with a std::map as it has been already explained in another answer. If your pairs don't have repetitions you could use a set of pairs:
std::set<std::pair<std::string, std::string> > pair_set;
You don't need to provide a comparator as std::pair already supports the kind of comparison you require.
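As a small sketch of that idea, assuming plain strings rather than the custom pointer types from the question, the set can be filled from any sequence of pairs and read back in sorted order. The names StringPair and sortedUnique are made up for this example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> StringPair;

// Returns the pairs ordered by first, then by second, with duplicates dropped.
std::vector<StringPair> sortedUnique(const std::vector<StringPair>& input) {
    // std::set orders its elements with std::less<StringPair>, i.e. the
    // built-in lexicographic operator< of std::pair.
    std::set<StringPair> pairSet(input.begin(), input.end());
    return std::vector<StringPair>(pairSet.begin(), pairSet.end());
}
```

If repeated pairs must be kept, a std::multiset or a sorted std::vector would be needed instead.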
