How to use boost::spirit to modify a string like regex does? - boost

I'm writing a little Domain Specific Language for my program, using JUCE::JavascriptEngine as the scripting engine. This takes a string as input and then parses it, but I need to do some pre-processing on the string to adapt it from my DSL to JavaScript. The pre-processing mainly consists of wrapping some terms inside functions, and placing object names in front of functions. So, for instance, I want to do something like this:
take some special string input "~/1/2"...
wrap it inside a function: "find("~/1/2")"...
and then attach an object to it: "someObject.find("~/1/2")" (the object name has to be a variable).
I've been using regex for this (now I have two problems...). The regexes are getting complicated and unreadable, and it's missing a lot of special cases. Since what I'm doing is grammatical, I thought I'd upgrade from regex to a proper parser (now I have three problems...). After quite a lot of research, I chose Boost.Spirit. I've been going through the documentation, but it's not taking me in the right direction. Can someone suggest how I might use this library to manipulate strings in the way I am looking for? Given that I am only trying to manipulate a string and am not interested in storing the parsed data, do I need to use karma for the output, or can I output the string with qi or x3, during the parsing process?
If I'm headed down the wrong path here, please feel free to re-direct me.

This seems too broad to answer.
What you're doing is parsing input, and transforming it to something else. What you're not doing is find/replace (otherwise you'd be fine using regular expressions).
Of course you can do what regular expressions do, but I'm not sure it buys you anything:
template <typename It, typename Out>
Out preprocess(It f, It l, Out out) {
namespace qi = boost::spirit::qi;
using namespace std::string_literals; // needed for the "..."s literals below
using boost::spirit::repository::qi::seek;
auto passthrough = [&out](boost::iterator_range<It> ignored, auto&&...) {
for (auto ch : ignored) *out++ = ch;
};
auto transform = [&out](std::string const& literal, auto&&...) {
for (auto ch : "someObject.find(\"~"s) *out++ = ch;
for (auto ch : literal) *out++ = ch;
for (auto ch : "\")"s) *out++ = ch;
};
auto pattern = qi::copy("\"~" >> (*~qi::char_('"')) >> '"');
qi::rule<It> ignore = qi::raw[+(!pattern >> qi::char_)] [passthrough];
qi::parse(f, l, -qi::as_string[pattern][transform] % ignore);
return out;
}
The nice thing about this way of writing it, is that it will work with any source iterator:
for (std::string const input : {
R"(function foo(a, b) { var path = "~/1/2"; })",
})
{
std::cout << "Input: " << input << "\n";
std::string result;
preprocess(begin(input), end(input), back_inserter(result));
std::cout << "Result: " << result << "\n";
}
std::cout << "\n -- Or directly transformed stdin to stdout:\n";
preprocess(
boost::spirit::istream_iterator(std::cin >> std::noskipws), {},
std::ostreambuf_iterator<char>(std::cout));
See it Live On Coliru, printing the output:
Input: function foo(a, b) { var path = "~/1/2"; }
Result: function foo(a, b) { var path = someObject.find("~/1/2"); }
-- Or directly transformed stdin to stdout:
function bar(c, d) { var path = someObject.find("~/1/42"); }
But this is very limited since it will not even do the right thing if such things are parts of comments or multiline strings etc.
So instead you probably want a dedicated library that knows how to parse javascript and use it to do your transformation, such as (one of the first hits when googling tooling library preprocess javascript transform): https://clojurescript.org/reference/javascript-library-preprocessing

Related

Passing a temporary stream object to a lambda function as part of an extraction expression

I have a function which needs to parse some arguments and several if clauses inside it need to perform similar actions. In order to reduce typing and help keep the code readable, I thought I'd use a lambda to encapsulate the recurring actions, but I'm having trouble finding sufficient info to determine whether I'm mistakenly invoking undefined behavior or what I need to do to actualize my approach.
Below is a simplified code snippet of what I have currently:
int foo(int argc, char* argv[])
{
using ss = std::istringstream;
auto sf = [&](ss&& stream) -> ss& {
stream.exceptions(ss::failbit);
return stream;
};
int retVal = 0;
bool valA = false;
bool valB = false;
try
{
for(int i=1; i < argc; i++)
{
std::string arg( argv[i] );
if( !valA )
{
valA = true;
sf( ss(arg) ) >> myInt;
}
else
if( !valB )
{
valB = true;
sf( ss(arg) ) >> std::hex >> myOtherInt;
}
}
}
catch( std::exception& err )
{
retVal = -1;
std::cerr << err.what() << std::endl;
}
return retVal;
}
First, based on what I've read, I don't think that specifying the lambda argument as an rvalue reference (ss&&) is doing quite what I want it to do, however, trying to compile with it declared as a normal reference (ss&) failed with the error cannot bind non-const lvalue reference of type 'ss&'. Changing ss& to ss&& got rid of the error and did not produce any warnings, but I'm not convinced that I'm using that construct correctly.
I've tried reading up on the various definitions for each, but the wording is a bit confusing.
I guess ultimately my questions are:
Can I expect the lifetime of my temporary ss(arg) object to extend through the entire extraction expression?
What is the correct way to define a lambda such that I can use the lambda in the way I demonstrate above, assuming that such a thing is actually possible?
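On the lifetime question: a temporary lives until the end of the full-expression that creates it, so a temporary passed to a function is still alive while the chained extractions on the returned reference run. A minimal sketch under that rule follows; the names `throwing` and `parse_int` are hypothetical and not part of the code above.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: enable exceptions on a temporary stream and hand it
// back as an lvalue so that operator>> can be chained on the result.
// The temporary bound to `stream` lives until the end of the full-expression
// containing the call, so the returned reference is valid only that long.
inline std::istringstream& throwing(std::istringstream&& stream) {
    stream.exceptions(std::ios::failbit);
    return stream;  // a named rvalue reference is itself an lvalue
}

// Hypothetical wrapper: parse an int, optionally as hexadecimal.
inline int parse_int(const std::string& text, bool hex = false) {
    int value = 0;
    if (hex)
        throwing(std::istringstream(text)) >> std::hex >> value;
    else
        throwing(std::istringstream(text)) >> value;
    return value;
}
```

The reference returned by `throwing` must not be stored in a variable and used in a later statement, because the temporary is destroyed at the end of the statement that created it.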

Extract trailing int from string containing other characters

I have a problem extracting a signed int from a string in C++.
Assuming that I have the string Images1234, how can I extract the 1234 from it without knowing the position of the last non-numeric character?
FYI, I have tried stringstream as well as lexical_cast, as suggested in other posts, but stringstream returns 0 while lexical_cast stopped working.
int main()
{
string virtuallive("Images1234");
//stringstream output(virtuallive.c_str());
//int i = stoi(virtuallive);
//stringstream output(virtuallive);
int i;
i = boost::lexical_cast<int>(virtuallive.c_str());
//output >> i;
cout << i << endl;
return 0;
}
How can I extract the 1234 from the string without knowing the position of the last non-numeric character in C++?
You can't. But the position is not hard to find:
auto last_non_numeric = input.find_last_not_of("1234567890");
char* endp = &input[0];
if (last_non_numeric != std::string::npos)
    endp += last_non_numeric + 1;
// endp now points at the first trailing digit, or at the terminating '\0'
if (!*endp) { /* FAILURE, no number on the end */ }
auto i = strtol(endp, &endp, 10);
if (*endp) {/* weird FAILURE, maybe the number was really HUGE and couldn't convert */}
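The same idea can be wrapped into a small self-contained function. This is only a sketch: the name `trailing_number` and its bool-plus-out-parameter interface are assumptions for illustration, a leading sign is not handled, and overflow is left to `strtol` (which clamps the result).

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: store the run of decimal digits at the end of `input`
// in `out` and return true, or return false if `input` does not end in a digit.
inline bool trailing_number(const std::string& input, long& out) {
    const std::size_t last_non_digit = input.find_last_not_of("0123456789");
    // npos means the whole string is digits, so the number starts at index 0.
    const std::size_t start =
        (last_non_digit == std::string::npos) ? 0 : last_non_digit + 1;
    if (start == input.size())
        return false;  // empty string, or nothing but non-digits at the end
    out = std::strtol(input.c_str() + start, nullptr, 10);
    return true;
}
```

Called with "Images1234" it stores 1234; called with "Images" it returns false and leaves `out` untouched.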
Another possibility would be to put the string into a stringstream, then read the number from the stream (after imbuing the stream with a locale that classifies everything except digits as white space).
// First the desired facet:
struct digits_only: std::ctype<char> {
digits_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
// everything is white-space:
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
// except digits, which are digits
std::fill(&rc['0'], &rc['9'] + 1, std::ctype_base::digit); // end is exclusive, so +1 to include '9'
// and '.', which we'll call punctuation:
rc['.'] = std::ctype_base::punct;
return &rc[0];
}
};
Then the code to read the data:
std::istringstream virtuallive("Images1234");
virtuallive.imbue(std::locale(std::locale(), new digits_only));
int number;
// Since we classify the letters as white space, the stream will ignore them.
// We can just read the number as if nothing else were there:
virtuallive >> number;
This technique is useful primarily when the stream contains a substantial amount of data, and you want all the data in that stream to be interpreted in the same way (e.g., only read numbers, regardless of what else it might contain).

std::ostream to file or standard output

I would like to write my output to a file if a file name is available, or to the screen (stdout) otherwise. So I've read posts on this forum and found some code, which I wrapped into a method below:
std::shared_ptr<std::ostream> out_stream(const std::string & fname) {
std::streambuf * buf;
std::ofstream of;
if (fname.length() > 0) {
of.open(fname);
buf = of.rdbuf();
} else
buf = std::cout.rdbuf();
std::shared_ptr<std::ostream> p(new std::ostream(buf));
return p;
}
The code works perfectly when used in-place. Unfortunately, it behaves oddly when wrapped into a separate method (as given above). Is it because the objects defined within the method (of, buf) are destroyed once the call is finished?
I am using this part of code in several places and it really should be extracted as a separate non-repeating fragment: a method or a class. How can I achieve this?
You're correct that the problems you're having come from the destruction of of. Wouldn't something like this (untested) work?
std::shared_ptr<std::ostream>
out_stream(const std::string &fname) {
    if (fname.length() > 0)
        // the returned pointer owns the file stream, so it outlives this call
        return std::make_shared<std::ofstream>(fname);
    // otherwise a fresh ostream that shares std::cout's buffer
    return std::make_shared<std::ostream>(std::cout.rdbuf());
}
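A variation on the same idea, offered as a sketch rather than as the answer's method: hand out `std::cout` itself through a no-op deleter, and return a null pointer when the file cannot be opened. The name `open_output` and the choice to signal failure with a null pointer are assumptions for illustration.

```cpp
#include <fstream>
#include <iostream>
#include <memory>
#include <string>

// Hypothetical helper: a file stream owned by the returned pointer when
// `fname` is non-empty, otherwise a non-owning pointer to std::cout.
inline std::shared_ptr<std::ostream> open_output(const std::string& fname) {
    if (fname.empty()) {
        // The no-op deleter makes sure std::cout is never deleted.
        return std::shared_ptr<std::ostream>(&std::cout, [](std::ostream*) {});
    }
    auto file = std::make_shared<std::ofstream>(fname);
    if (!file->is_open())
        return nullptr;  // the caller decides how to report the failure
    return file;         // converts to shared_ptr<std::ostream>
}
```

A caller would write `auto out = open_output(name); if (out) *out << "text";`. The file is flushed and closed when the last copy of the pointer goes away.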

possible to share work when parsing multiple files with libclang?

If I have multiple files in a large project, all of which share a large number of included header files, is there any way to share the work of parsing the header files? I had hoped that creating one Index and then adding multiple translationUnits to it could cause some work to be shared - however even code along the lines of (pseudocode)
index = clang_createIndex();
clang_parseTranslationUnit(index, "myfile");
clang_parseTranslationUnit(index, "myfile");
seems to take the full amount of time for each call to parseTranslationUnit, performing no better than
index1 = clang_createIndex();
clang_parseTranslationUnit(index1, "myfile");
index2 = clang_createIndex();
clang_parseTranslationUnit(index2, "myfile");
I am aware that there are specialized functions for reparsing the exact same file; however what I really want is that parsing "myfile1" and "myfile2" can share the work of parsing "myheader.h", and reparsing-specific functions won't help there.
As a sub-question, is there any meaningful difference between reusing an index and creating a new index for each translation unit?
One way of doing this consists in creating Precompiled Headers (PCH file) from the shared header in your project.
Something along these lines seems to work (you can see the whole example here):
auto Idx = clang_createIndex (0, 0);
CXTranslationUnit TU;
Timer t;
{
char const *args[] = { "-xc++", "foo.hxx" };
int nargs = 2;
t.reset();
TU = clang_parseTranslationUnit(Idx, 0, args, nargs, 0, 0, CXTranslationUnit_ForSerialization);
std::cerr << "PCH parse time: " << t.get() << std::endl;
displayDiagnostics (TU);
clang_saveTranslationUnit (TU, "foo.pch", clang_defaultSaveOptions(TU));
clang_disposeTranslationUnit (TU);
}
{
char const *args[] = { "-include-pch", "foo.pch", "foo.cxx" };
int nargs = 3;
t.reset();
TU = clang_createTranslationUnitFromSourceFile(Idx, 0, nargs, args, 0, 0);
std::cerr << "foo.cxx parse time: " << t.get() << std::endl;
displayDiagnostics (TU);
clang_disposeTranslationUnit (TU);
}
{
char const *args[] = { "-include-pch", "foo.pch", "foo2.cxx" };
int nargs = 3;
t.reset();
TU = clang_createTranslationUnitFromSourceFile(Idx, 0, nargs, args, 0, 0);
std::cerr << "foo2.cxx parse time: " << t.get() << std::endl;
displayDiagnostics (TU);
clang_disposeTranslationUnit (TU);
}
yielding the following output:
PCH parse time: 5.35074
0 diagnostics
foo1.cxx parse time: 0.158232
0 diagnostics
foo2.cxx parse time: 0.143654
0 diagnostics
I did not find much information about libclang and precompiled headers in the API documentation, but here are a few pages where the keyword appears: CINDEX and TRANSLATION_UNIT
Please note that this solution is not optimal by any means. I'm looking forward to seeing better answers. In particular:
each source file can have at most one precompiled header
nothing here is libclang-specific ; this is the exact same strategy that is used for build time optimization using the standard clang command lines.
it is not really automated, in that you have to explicitly create the precompiled header (and must thus know the name of the shared header file)
I don't think using different CXIndex objects would have made any difference here

Boost serialization end of file

I serialize multiple objects into a binary archive with Boost.
When reading back those objects from a binary_iarchive, is there a way to know how many objects are in the archive, or simply a way to detect the end of the archive?
The only way I found is to use a try-catch to detect the stream exception.
Thanks in advance.
I can think of a number of approaches:
Serialize STL containers to/from your archive (see documentation). The archive will automatically keep track of how many objects there are in the containers.
Serialize a count variable before serializing your objects. When reading back your objects, you'll know beforehand how many objects you expect to read back.
You could have the last object have a special value that acts as a kind of sentinel that indicates the end of the list of objects. Perhaps you could add an isLast member function to the object.
This is not very pretty, but you could have a separate "index file" alongside your archive that stores the number of objects in the archive.
Use the tellg position of the underlying input stream object to detect if you're at the end of file:
Example (just a sketch, not tested):
std::streampos archiveOffset = stream.tellg();
std::streampos streamEnd = stream.seekg(0, std::ios_base::end).tellg();
stream.seekg(archiveOffset);
while (stream.tellg() < streamEnd)
{
// Deserialize objects
}
This might not work with XML archives.
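The count-first approach from the list above can be sketched without Boost. In the code below a plain text stream stands in for the archive, and `write_counted` and `read_counted` are hypothetical names; with Boost.Serialization the same pattern would store the count through the archive before the objects.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Write the number of values first, then the values themselves.
inline void write_counted(std::ostream& out, const std::vector<int>& values) {
    out << values.size() << '\n';
    for (int v : values) out << v << '\n';
}

// Read the count, then at most that many values. The reader never has to
// probe for the end of the stream, and anything after the counted values
// is left unread.
inline std::vector<int> read_counted(std::istream& in) {
    std::size_t count = 0;
    in >> count;
    std::vector<int> values;
    for (std::size_t i = 0; i < count && in; ++i) {
        int v = 0;
        if (in >> v) values.push_back(v);
    }
    return values;
}
```

The cost of this design is that the writer must know the number of objects before it starts writing them.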
Do you have all your objects when you begin serializing? If not, you are "abusing" boost serialization - it is not meant to be used that way. However, I am using it that way, using try catch to find the end of the file, and it works for me. Just hide it away somewhere in the implementation. Beware though, if using it this way, you need to either not serialize pointers, or disable pointer tracking.
If you do have all the objects already, see Emile's answer. They are all valid approaches.
std::istream* stream_;
boost::iostreams::filtering_streambuf<boost::iostreams::input>* filtering_streambuf_;
...
stream_ = new std::istream(memoryBuffer_);
if (stream_) {
filtering_streambuf_ = new boost::iostreams::filtering_streambuf<boost::iostreams::input>();
if (filtering_streambuf_) {
filtering_streambuf_->push(boost::iostreams::gzip_decompressor());
filtering_streambuf_->push(*stream_);
archive_ = new eos::portable_iarchive(*filtering_streambuf_);
}
}
I use gzip decompression when reading data from the archives, and filtering_streambuf provides this method:
std::streamsize std::streambuf::in_avail()
Get number of characters available to read
so I check for the end of the archive like this:
bool IArchiveContainer::eof() const {
if (filtering_streambuf_) {
return filtering_streambuf_->in_avail() == 0;
}
return false;
}
This does not tell you how many objects are left in the archive, but it does help to detect the end of them.
(I use the eof test only in the unit tests for serializing/deserializing my classes and structures, to make sure that I read back everything I wrote.)
Sample code which I used to debug the similar issue
(based on Emile's answer) :
#include <fstream>
#include <iostream>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
struct A{
int a,b;
template <typename T>
void serialize(T &ar, int ){
ar & a;
ar & b;
}
};
int main(){
{
std::ofstream ofs( "ff.ar", std::ios::binary ); // binary archives need a binary-mode stream
boost::archive::binary_oarchive ar( ofs );
for(int i=0;i<3;++i){
A a {2,3};
ar << a;
}
ofs.close();
}
{
std::ifstream ifs( "ff.ar", std::ios::binary ); // binary archives need a binary-mode stream
ifs.seekg (0, ifs.end);
int length = ifs.tellg();
ifs.seekg (0, ifs.beg);
boost::archive::binary_iarchive ar( ifs );
while(ifs.tellg() < length){
A a;
ar >> a;
std::cout << "a.a-> "<< a.a << " and a.b->"<< a.b << "\n";
}
}
return 0;
}
Just read one byte from the file.
If you have not reached the end,
step back one byte.
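With standard streams, `peek()` does the read-and-step-back in one call, because it looks at the next character without consuming it. A minimal sketch on a plain `std::istream`, assuming nothing follows the last record in the stream; `at_end` and `count_records` are hypothetical names, and fixed-size records stand in for serialized objects.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// True when no more characters can be read from `in`.
inline bool at_end(std::istream& in) {
    return in.peek() == std::istream::traits_type::eof();
}

// Hypothetical reader: count fixed-size records until the stream is exhausted.
// `record_size` is assumed to be greater than zero.
inline int count_records(std::istream& in, std::size_t record_size) {
    std::string record(record_size, '\0');
    int n = 0;
    while (!at_end(in) &&
           in.read(&record[0], static_cast<std::streamsize>(record.size())))
        ++n;
    return n;
}
```

Note that `peek()` sets eofbit when it hits the end, so the stream needs `clear()` before any further seeking.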
