I'm trying to parse a language in which a unary minus is distinguished from a binary minus by the whitespace around the sign. Below are some pseudo-rules defining how the minus sign is interpreted in this language:
-x // unary
x - y // binary
x-y // binary
x -y // unary
x- y // binary
(- y ... // unary
Note: The open paren in the last rule can be replaced by any token in the language except 'identifier', 'number' and 'close_paren'.
Note: In the 4th case, x is an identifier. An identifier can constitute a statement of its own, and -y is a separate statement.
Since the type of the minus sign depends on whitespace, I thought I'd have the lexer return two different tokens, one for unary minus and one for binary minus. Any ideas on how I can do this?
Code: Here's some code that works for me, but I'm not quite sure if it's robust enough. I tried to make it simple by removing all the irrelevant lexer rules:
#ifndef LEXER_H
#define LEXER_H
#include <iostream>
#include <algorithm>
#include <string>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix_function.hpp>
#include <boost/spirit/include/phoenix_algorithm.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/spirit/include/phoenix_statement.hpp>
#define BOOST_SPIRIT_LEXERTL_DEBUG 1
using std::string;
using std::cerr;
namespace skill {
namespace lex = boost::spirit::lex;
namespace phoenix = boost::phoenix;
// base iterator type
typedef string::iterator BaseIteratorT;
// token type
typedef lex::lexertl::token<BaseIteratorT, boost::mpl::vector<int, string> > TokenT;
// lexer type
typedef lex::lexertl::actor_lexer<TokenT> LexerT;
template <typename LexerT>
struct Tokens: public lex::lexer<LexerT>
{
Tokens(const string& input):
lineNo_(1)
{
using lex::_start;
using lex::_end;
using lex::_pass;
using lex::_state;
using lex::_tokenid;
using lex::_val;
using lex::omit;
using lex::pass_flags;
using lex::token_def;
using phoenix::ref;
using phoenix::count;
using phoenix::construct;
// macros
this->self.add_pattern
("EXP", "(e|E)(\\+|-)?\\d+")
("SUFFIX", "[yzafpnumkKMGTPEZY]")
("INTEGER", "-?\\d+")
("FLOAT", "-?(((\\d+)|(\\d*\\.\\d+)|(\\d+\\.\\d*))({EXP}|{SUFFIX})?)")
("SYMBOL", "[a-zA-Z_?#](\\w|\\?|#)*")
("STRING", "\\\"([^\\\"]|\\\\\\\")*\\\"");
// whitespaces and comments
whitespaces_ = "\\s+";
comments_ = "(;[^\\n]*\\n)|(\\/\\*[^*]*\\*+([^/*][^*]*\\*+)*\\/)";
// literals
float_ = "{FLOAT}";
integer_ = "{INTEGER}";
string_ = "{STRING}";
symbol_ = "{SYMBOL}";
// operators
plus_ = '+';
difference_ = '-';
minus_ = "-({SYMBOL}|\\()";
// ... more operators
// whitespace
this->self += whitespaces_
[
ref(lineNo_) += count(construct<string>(_start, _end), '\n'),
_pass = pass_flags::pass_ignore
];
// a minus between two identifiers, numbers or close-open parens is a binary minus, so add spaces around it
this->self += token_def<omit>("[)a-zA-Z?_0-9]-[(a-zA-Z?_0-9]")
[
unput(_start, _end, *_start + construct<string>(" ") + *(_start + 1) + " " + *(_start + 2)),
_pass = pass_flags::pass_ignore
];
// operators (except for close-brackets) cannot be followed by a binary minus
this->self += token_def<omit>("['`.+*<>/!~&|({\\[=,:#](\\s+-\\s*|\\s*-\\s+)")
[
unput(_start, _end, *_start + construct<string>("-")),
_pass = pass_flags::pass_ignore
];
// a minus directly preceding a symbol or an open paren is a unary minus
this->self += minus_
[
unput(_start, _end, construct<string>(_start + 1, _end)),
_val = construct<string>("-")
];
// literal rules
this->self += float_ | integer_ | string_ | symbol_;
// ... other rules
}
~Tokens() {}
size_t lineNo() { return lineNo_; }
// ignored tokens
token_def<omit> whitespaces_, comments_;
// literal tokens
token_def<int> integer_;
token_def<string> float_, symbol_, string_;
// operator tokens
token_def<> plus_, difference_, minus_; // minus_ is a unary minus
// ... other tokens
// current line number
size_t lineNo_;
};
}
#endif // LEXER_H
Basically, I defined a binary minus (called difference in the code) to be any minus sign that has whitespaces on both sides and used unput to ensure this rule. I also defined a unary minus as a minus sign that directly precedes a symbol or an open paren and again used unput to ensure this rule is maintained (for numbers, the minus sign is part of the token).
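For comparison, the pseudo rules listed at the top can also be written down as a plain function, independent of Spirit.Lex. The following is only a minimal sketch: classifyMinus and MinusKind are hypothetical names, and the previous token is approximated by the last non-space character before the sign (a symbol character, a digit or a close paren counts as the end of an operand), which is an assumption and not how the lexer above works.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical names, used only for this illustration.
enum class MinusKind { Unary, Binary };

// Classify the '-' at src[pos] using the whitespace rules from the question.
MinusKind classifyMinus(const std::string& src, std::size_t pos) {
    // Skip whitespace to the left of the sign, remembering whether there was any.
    std::size_t i = pos;
    bool spaceBefore = false;
    while (i > 0 && std::isspace(static_cast<unsigned char>(src[i - 1]))) {
        --i;
        spaceBefore = true;
    }
    // Nothing precedes the sign: "-x" is unary.
    if (i == 0) return MinusKind::Unary;

    // Assumption: the last character of the previous token tells us whether
    // that token was an identifier, a number or a close paren.
    unsigned char prev = static_cast<unsigned char>(src[i - 1]);
    bool prevEndsOperand = std::isalnum(prev) || prev == '_' || prev == '?' ||
                           prev == '#' || prev == ')';
    // After any other token ("(- y") the sign is unary.
    if (!prevEndsOperand) return MinusKind::Unary;

    bool spaceAfter = pos + 1 < src.size() &&
                      std::isspace(static_cast<unsigned char>(src[pos + 1]));
    // "x -y" is unary; "x - y", "x-y" and "x- y" are binary.
    return (spaceBefore && !spaceAfter) ? MinusKind::Unary : MinusKind::Binary;
}
```

A real lexer would consult the kind of the previously emitted token instead of a single character, but the decision table stays the same.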
I'm learning C with GCC, and I was trying various ways to check whether a certain word has been entered, along the lines of IF Word = Word {do something;}
It seems that this cannot be done directly in C, so I tried this solution, which seems to work:
#include <stdio.h>
#include <string.h>
int main(){
int CClose = 0;
int VerifyS = 0;
char PWord[30] ={'\0'};
do {
printf("\n Type a word: ");
scanf(" %29s", PWord); /* pass the array itself, and limit the input to the buffer size */
VerifyS = strncmp(PWord, "exit", 4);
if (!VerifyS){ CClose = 1;}else{ printf("\n The Word is:%s", PWord);}
}while (CClose != 1);
return 0;
}
I wanted to know if there is another way to do the same thing.
Thank you.
What you've written is essentially the most common way to do this. There is indeed no way in C to compare two strings in a single expression without calling a function.
You can cut out the temporary variable VerifyS if you like, by writing
if (!strncmp(PWord, "exit", 4)) { /* ... */ }
or, perhaps slightly clearer,
if (strncmp(PWord, "exit", 4) == 0) { /* ... */ }
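One detail worth knowing: strncmp with a length of 4 only compares the first four characters, so a word such as "exited" is also accepted. If only the exact word should match, strcmp compares the whole string. A small sketch of the difference follows; it is written in C++ with <cstring>, the same functions are available in C through <string.h>, and the two helper names are made up for this illustration.

```cpp
#include <cstring>

// Hypothetical helper: true only when word is exactly "exit".
bool isExactlyExit(const char* word) {
    return std::strcmp(word, "exit") == 0;
}

// Hypothetical helper: true when word merely starts with "exit",
// which is what the strncmp(..., 4) test checks.
bool startsWithExit(const char* word) {
    return std::strncmp(word, "exit", 4) == 0;
}
```

With these, "exit" satisfies both checks, while "exited" satisfies only startsWithExit.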
I have a problem with a string in Arduino. I know that I cannot concatenate different types like that. I have tried several conversions but can't get it working.
The following line is where I get the message "invalid operands of types 'const char [35]' and 'double' to binary 'operator +'"
sendString("Time: " + (micros () - PingTimer) * 0.001, 3 + " ms");
Disclaimer: This question is pretty similar but on a different stack exchange site (and the answer is questionable).
The problem can be reduced to the following snippet:
void setup() {
"hello" + 3.0;
}
It produces the following error message:
error: invalid operands of types 'const char [6]' and 'double' to binary 'operator+'
Many programming languages support "adding" character sequences together, but C++ has no built-in + operator for string literals. This means that you need to use a class which represents a character sequence and implements the + operator.
Luckily there is already such a class which you can use: String. Example:
void setup() {
String("hello") + 3.0;
}
The expression is evaluated from left to right, which means that the leftmost operand has to be a String. In other words:
String("a") + 1 + 2 + 3
Is understood as:
((String("a") + 1) + 2) + 3
Where String("a") + 1 is a String and therefore (String("a") + 1) + 2 is, and so on...
To visit a variant using a lambda-based visitor, I turned to Boost.Preprocessor to generate the required boilerplate:
#include <boost/preprocessor.hpp>
#define MY_OVERLOAD(r, data, elem) \
[](elem const& t) { return false; },
#define MY_OVERLOAD_SEQ_MEMBER(typeSeq) \
BOOST_PP_SEQ_FOR_EACH(MY_OVERLOAD, ~, typeSeq)
#define MY_OVERLOAD_MEMBER(typeSeq) \
MY_OVERLOAD_SEQ_MEMBER(typeSeq)
int main()
{
auto visitor = hana::overload(
#if 0 // like to have:
[](int t) { return false; },
[](double t) { return false; },
[](std::string const& t) { return false; }
#else
MY_OVERLOAD_MEMBER((int)(double)(std::string))
#endif
);
...
}
This expands as expected so far, but it fails at the last element: there is a trailing comma, which does not compile. I know about BOOST_PP_COMMA_IF and BOOST_PP_ENUM..., which require the number of elements to be known. That doesn't work for my use case, since the type list is of course different for each kind of visitor. Also, I don't need to list the arguments in the manner shown here; a comma-separated list as the macro argument would be fine as well...
Also note that this code only shows the concept I want to use. In the real code I don't want to take PODs by reference.
BTW, is the kind of expansion required here horizontal or vertical in Boost.Preprocessor terms? It feels horizontal to me, doesn't it?
I have a program that is supposed to take in a paragraph like
Testing#the hash#tag
#program!#when #beginning? a line
or #also a #comma,
and output something like
#the
#tag
#program
#when
#beginning
#also
#comma,
I feel like the logic makes sense, but obviously not because the program never seems to get into the line of input. The problem is almost definitely in the last source file below.
Here is the main source program
#include "HashTagger.h"
#include <string>
#include <iostream>
using namespace hw02;
using namespace std;
int main() {
// Construct an object for extracting the
// hashtags.
HashTagger hashTagger;
// Read the standard input and extract the
// hashtags.
while (true) {
// Read one line from the standard input.
string line;
getline(cin, line);
if (!cin) {
break;
}
// Get all of the hashtags on the line.
hashTagger.getTags(line);
}
// Print the hashtags.
hashTagger.printTags();
// Return the status.
return 0;
}
my header file
#ifndef HASHTAGGER_H
#define HASHTAGGER_H
#include <string>
namespace hw02 {
class HashTagger {
public:
void getTags(std::string line);
void printTags();
private:
std::string hashtags_;
};
}
#endif
and a source file
The test in the source file seems to show that the program only gets to the second line and then stops before grabbing the last 2 hashtags.
#include "HashTagger.h"
#include <iostream>
using namespace std;
using namespace hw02;
void HashTagger::getTags(string line) {
// Loop over all characters in a line that can begin a hashtag
int b = 0;
string hashtags_ = "";
for (unsigned int j = 0; j < line.length(); ++j) {
char c = line.at(j);
// if "#" is found assign beginning of capture to b
if (c == '#') {
b = j;
// if the beginning is less than the end space, newline, ".", "?", or "!" found, add substring of the hashtag to hashtags_
}
if (b < j && (c == ' ' || c == '\n' || c == '.' || c == '?' || c == '!' )) {
hashtags_ = hashtags_ + "\n" + line.substr(b, j - b + 1);
b = 0;
//Test// cout << b << "/" << j << "/" << c << "/" << hashtags_ << "/" << endl;
}
}
}
void HashTagger::printTags() {
// print out hashtags_ to the console
cout << hashtags_ << endl;
}
You are redeclaring hashtags_ inside your getTags function. Therefore, all string modifications operate on a local variable instead of the class member variable.
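Remove the line
string hashtags_ = "";
from getTags, so that the function appends to the class member variable that printTags prints later on. Note that merely dropping the type, as in
hashtags_ = "";
would also stop the shadowing, but it would clear the tags collected from earlier lines on every call.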
Also, make sure that your input is terminated with two newline characters (\n\n), to avoid breaking out of the main loop too early, or move your check and break statement after the getTags call:
while (true) {
// Read one line from the standard input.
string line;
getline(cin, line);
// Get all of the hashtags on the line.
hashTagger.getTags(line);
if (!cin) {
break;
}
}
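Another thing to keep in mind is that getline strips the trailing newline, so a hashtag at the very end of a line is never followed by one of the delimiter characters the loop in getTags looks for. Below is a minimal sketch of an extraction step that treats the end of the line as the end of a tag. The free function extractTags is a hypothetical helper, not part of the original class, and the delimiter set (space, '.', '?', '!' and a following '#') is an assumption based on the sample input and output in the question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns every hashtag found in one line of text.
std::vector<std::string> extractTags(const std::string& line) {
    std::vector<std::string> tags;
    // Assumed delimiters; a '#' also ends the tag before it.
    const std::string delimiters = " .?!#";

    std::size_t pos = line.find('#');
    while (pos != std::string::npos) {
        // The tag runs up to the next delimiter or to the end of the line.
        std::size_t end = line.find_first_of(delimiters, pos + 1);
        if (end == std::string::npos) {
            end = line.size();
        }
        // Skip a lone '#' that has no characters after it.
        if (end > pos + 1) {
            tags.push_back(line.substr(pos, end - pos));
        }
        pos = line.find('#', end);
    }
    return tags;
}
```

getTags could then append each returned tag, followed by a newline, to hashtags_.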
I have a problem extracting a signed int from a string in C++.
Assuming that I have the string images1234, how can I extract the 1234 from it without knowing the position of the last non-numeric character?
FYI, I have tried stringstream as well as lexical_cast, as suggested in other posts, but stringstream returns 0 while lexical_cast stops working.
int main()
{
string virtuallive("Images1234");
//stringstream output(virtuallive.c_str());
//int i = stoi(virtuallive);
//stringstream output(virtuallive);
int i;
i = boost::lexical_cast<int>(virtuallive.c_str());
//output >> i;
cout << i << endl;
return 0;
}
How can I extract the 1234 from the string without knowing the position of the last non-numeric character in C++?
You can't. But the position is not hard to find:
auto last_non_numeric = input.find_last_not_of("1234567890");
char* endp = &input[0];
if (last_non_numeric != std::string::npos)
endp += last_non_numeric + 1;
if (!*endp) { /* FAILURE, no number on the end */ }
auto i = strtol(endp, &endp, 10);
if (*endp) {/* weird FAILURE, maybe the number was really HUGE and couldn't convert */}
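The same idea can be wrapped in a small function. This is only a sketch: trailingNumber is a made-up name, it uses std::stoi instead of strtol (so an out-of-range value is reported through an exception), and it does not treat a preceding '-' as a sign.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: stores the number at the end of input in out.
// Returns false if input does not end in digits or the value does not fit in an int.
bool trailingNumber(const std::string& input, int& out) {
    // Position of the last character that is not a digit, if any.
    std::size_t lastNonDigit = input.find_last_not_of("0123456789");
    std::size_t start =
        (lastNonDigit == std::string::npos) ? 0 : lastNonDigit + 1;

    // No digits at the end (this also covers the empty string).
    if (start == input.size()) {
        return false;
    }
    try {
        out = std::stoi(input.substr(start));
    } catch (const std::out_of_range&) {
        return false;  // too many digits for an int
    }
    return true;
}
```

For example, calling it with "Images1234" stores 1234 in out and returns true, while "Images" returns false.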
Another possibility would be to put the string into a stringstream, then read the number from the stream (after imbuing the stream with a locale that classifies everything except digits as white space).
// First the desired facet:
struct digits_only: std::ctype<char> {
digits_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
// everything is white-space:
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
// except digits, which are digits
std::fill(&rc['0'], &rc['9'] + 1, std::ctype_base::digit); // end of range is exclusive, so go one past '9'
// and '.', which we'll call punctuation:
rc['.'] = std::ctype_base::punct;
return &rc[0];
}
};
Then the code to read the data:
std::istringstream virtuallive("Images1234");
virtuallive.imbue(std::locale(std::locale(), new digits_only));
int number;
// Since we classify the letters as white space, the stream will ignore them.
// We can just read the number as if nothing else were there:
virtuallive >> number;
This technique is useful primarily when the stream contains a substantial amount of data, and you want all the data in that stream to be interpreted in the same way (e.g., only read numbers, regardless of what else it might contain).