Using the STL to run-length encode a string with std::adjacent_find

I am trying to perform run-length compression on a string for a special protocol that I am using. A run is considered efficient when the run size of a particular character in the string is >= 3. Can someone help me achieve this? I have a live demo on coliru. I am pretty sure this is possible with the standard library's std::adjacent_find, using std::not_equal_to<> as the binary predicate to search for run boundaries and probably std::equal_to<> once I find a boundary. Here is what I have so far, but I am having trouble with the results:
Given the following input text string containing runs of spaces and other characters (in this case runs of the letter 's'):
"---thisssss---is-a---tesst--"
I am trying to convert the above text string into a vector containing elements that are either pure runs of > 2 characters or mixed characters. The results are almost correct but not quite and I cannot spot the error.
g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
expected the following
======================
---,thi,sssss,---,is-a,---,tesst--,
actual results
==============
---,thi,sssss,---,is-a,---,te,ss,--,
EDIT: I fixed up the previous code to make this version closer to the final solution. Specifically, I added explicit tests so that only runs of size > 2 are included. I still seem to have boundary-case problems, though: the all-spaces case and the case where the string ends in several spaces:
#include <iterator>
#include <iostream>
#include <memory>
#include <string>
#include <vector>
#include <algorithm>
#include <functional>
int main()
{
    // I want to convert this string containing adjacent runs of characters
    std::string testString("---thisssss---is-a---tesst--");
    // to the following
    std::vector<std::string> idealResults = {
        "---", "thi", "sssss",
        "---", "is-a",
        "---", "tesst--"
    };
    std::vector<std::string> tokenizedStrings;
    auto adjIter = testString.begin();
    auto lastIter = adjIter;
    // temporary string used to accumulate characters that
    // are not part of a run.
    std::unique_ptr<std::string> stringWithoutRun;
    while ((adjIter = std::adjacent_find(
                adjIter, testString.end(), std::not_equal_to<>())) !=
           testString.end()) {
        auto next = std::string(lastIter, adjIter + 1);
        // append to foo if < run threshold
        if (next.length() < 2) {
            if (!stringWithoutRun) {
                stringWithoutRun = std::make_unique<std::string>();
            }
            *stringWithoutRun += next;
        } else {
            // if we have encountered non run characters, save them first
            if (stringWithoutRun) {
                tokenizedStrings.push_back(*stringWithoutRun);
                stringWithoutRun.reset();
            }
            tokenizedStrings.push_back(next);
        }
        lastIter = adjIter + 1;
        adjIter = adjIter + 1;
    }
    tokenizedStrings.push_back(std::string(lastIter, adjIter));
    std::cout << "expected the following" << std::endl;
    std::cout << "======================" << std::endl;
    std::copy(idealResults.begin(), idealResults.end(), std::ostream_iterator<std::string>(std::cout, ","));
    std::cout << std::endl;
    std::cout << "actual results" << std::endl;
    std::cout << "==============" << std::endl;
    std::copy(tokenizedStrings.begin(), tokenizedStrings.end(), std::ostream_iterator<std::string>(std::cout, ","));
    std::cout << std::endl;
}

if (next.length() < 2) {
    if (!stringWithoutRun) {
        stringWithoutRun = std::make_unique<std::string>();
    }
    *stringWithoutRun += next;
}
This should be if (next.length() <= 2). You need to add a run of identical characters to the current token if its length is either 1 or 2.
I seem to be having boundary case problems though - the all spaces
case and the case where the end of the strings ends in several spaces
When stringWithoutRun is not empty after the loop finishes, the characters accumulated in it are not added to the array of tokens. You can fix it like this:
// The loop has finished
if (stringWithoutRun)
    tokenizedStrings.push_back(*stringWithoutRun);
tokenizedStrings.push_back(std::string(lastIter, adjIter));
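One way to avoid special-casing the end of the string is to treat the final run like every other run, so that a short trailing run (such as the "--" in the example) is merged into the pending mixed token instead of being pushed on its own. The following is only a sketch under that assumption, not the asker's or answerer's code; splitRuns, pending and minRun are names invented for illustration, and the threshold of 3 is taken from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: splits text into "pure" runs of at least minRun
// identical characters and "mixed" tokens made of everything in between.
std::vector<std::string> splitRuns(const std::string& text, std::size_t minRun = 3)
{
    std::vector<std::string> tokens;
    std::string pending;  // short runs waiting to be emitted as one mixed token
    auto first = text.begin();
    while (first != text.end()) {
        // adjacent_find with not_equal_to points at the last character of the
        // current run, or returns end() when the run reaches the end of text.
        auto boundary = std::adjacent_find(first, text.end(), std::not_equal_to<>());
        auto last = (boundary == text.end()) ? boundary : std::next(boundary);
        if (static_cast<std::size_t>(std::distance(first, last)) >= minRun) {
            // flush any accumulated mixed characters before the long run
            if (!pending.empty()) {
                tokens.push_back(pending);
                pending.clear();
            }
            tokens.emplace_back(first, last);
        } else {
            pending.append(first, last);
        }
        first = last;
    }
    // whatever is still pending becomes the final mixed token
    if (!pending.empty()) {
        tokens.push_back(pending);
    }
    return tokens;
}
```

Calling splitRuns("---thisssss---is-a---tesst--") yields the seven tokens listed as expected in the question, with "tesst--" as the last one. std::not_equal_to<> requires C++14, which matches the compiler flags shown above.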

Related

How would I write a program that reads from a standard input and outputs only 6 characters to a line?

For example if the input was:
My name is Alex and
I also love coding
The correct output should be:
1:My nam
1:e is A
1:lex an
1:d
2:I also
2: love
2:coding
So far I have this
int main () {
    string i;
    i.substr(0,6);
    while (getline(cin, i)) {
        cout << i << endl;
    }
}
Using ranges, what you ask is almost as easy as
auto result = view | split('\n') | transform(chunk(6));
where view represents somehow the input, | split('\n') splits that input in several lines, and | transform(chunk(6)) transforms each line by splitting it in chunks of 6 chars. The result is therefore a "range of ranges of chunks", on which you can loop with a double nested for.
Here's a full example:
#include <iostream>
#include <sstream>
#include <string>
#include <fstream>
#include <range/v3/range/conversion.hpp>
#include <range/v3/view/chunk.hpp>
#include <range/v3/view/istream.hpp>
#include <range/v3/view/split.hpp>
#include <range/v3/view/transform.hpp>
// Comment/uncomment the line below
//#define FROM_FILE
using namespace ranges;
using namespace ranges::views;
int main() {
    // prepare a path-to-file or string buffer
#ifdef FROM_FILE
    std::string path_to_file{"/path/to/file"};
#else
    std::basic_stringbuf<char> strbuf{"My name is Alex and\nI also love coding"};
#endif
    // generate an input stream from the file or the string buffer
#ifdef FROM_FILE
    std::ifstream is(path_to_file);
#else
    std::istream is(&strbuf);
#endif
    // prevent the stream from skipping whitespaces
    is >> std::noskipws;
    // generate a range view on the stream
    ranges::istream_view<char> view(is);
    // manipulate the view
    auto out_lines = view | split('\n')           // split at line breaks
                          | transform(chunk(6));  // split each in chunks of 6
    // output
    int index{};
    for (auto line : out_lines) {
        ++index;
        for (auto chunk_of_6 : line) {
            std::cout << index << ':'
                      << (chunk_of_6 | to<std::string>)
                      << std::endl;
        }
    }
}
First I suggest that you give your variables meaningful names. i isn't good for a variable you use to read lines from std::cin. I've changed that name to line in my example below.
You are on the right track with i.substr(0,6); but you've placed it outside of the loop where i is empty - and you don't print it.
You are also supposed to prepend each line with the line number but that part is completely missing.
You have also missed that you should print the next 6 characters of the read line on the next line until you've printed everything that you read.
Here's an example how that could be fixed:
#include <iostream>
#include <string>
int main() {
    unsigned max_len = 6;
    std::string line;
    for(unsigned line_number = 1; std::getline(std::cin, line); ++line_number) {
        // loop until the read line is empty:
        while(!line.empty()) {
            // print max `max_len` characters and prepend it with the line number:
            std::cout << line_number << ':' << line.substr(0, max_len) << '\n';
            // if the line was longer than `max_len` chars, remove the first
            // `max_len` chars:
            if(line.size() > max_len) {
                line = line.substr(max_len);
            } else { // otherwise, make it empty
                line.clear();
            }
        }
    }
}
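As a variation on the loop above, the chunking can be separated from the printing by walking the line with an offset instead of repeatedly shortening it. This is only a sketch; chunk_line is a hypothetical helper name, and it assumes that an empty line should produce no chunks, as in the example above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` into pieces of at most `max_len`
// characters, preserving order. The last piece may be shorter.
std::vector<std::string> chunk_line(const std::string& line, std::size_t max_len = 6)
{
    std::vector<std::string> chunks;
    if (max_len == 0) {
        return chunks;  // guard: a zero chunk size would never advance
    }
    for (std::size_t pos = 0; pos < line.size(); pos += max_len) {
        // substr clamps the count, so the final chunk is simply shorter
        chunks.push_back(line.substr(pos, max_len));
    }
    return chunks;
}
```

The caller would still read lines with std::getline, keep the line counter, and print each returned chunk prefixed with the line number and a colon.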

VSCode adds random percentage

Every time I use the terminal to print a string or any kind of character, it automatically prints a "%" at the end of the output. This happens whenever I print something from C++ or PHP; I haven't tried other languages yet. I think it might have something to do with VSCode, but I have no idea where it comes from or how to fix it.
#include <iostream>
using namespace std;
int test = 2;
int main()
{
    if(test < 9999){
        test = 1;
    }
    cout << test;
}
Output:
musti#my-mbp clus % g++ main.cpp -o tests && ./tests
1%
Also, changing the cout from cout << test; to cout << test << endl; removes the % from the output.
Are you using zsh? Output that does not end with a newline (for example, when endl is omitted) is considered a "partial line", so zsh shows a color-inverted % and then moves to the next line.
When a partial line is preserved, by default you will see an inverse+bold character at the end of the partial line: a ‘%’ for a normal user or a ‘#’ for root. If set, the shell parameter PROMPT_EOL_MARK can be used to customize how the end of partial lines are shown.
More information is available in their docs.

C++: No viable overloaded '=' data.end() -1 = '\0'

I'm trying to create a program that filters through speech text, removes any unwanted characters (",", "?", etc.) and then produces a new speech where the words are jumbled based on what words follow or precede them. So, for example, if you had the Gettysburg Address:
Four score and seven years ago our fathers brought forth, on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
My program would take that text and put it into a set of strings, i.e. ["Four","score","and","seven",..."continent,"..."Liberty,"..."equal."]. Then it would remove any unwanted characters, like "," or ".", from each string using the string's .erase together with std::remove, and convert capitals to lowercase. Afterwards, you'd have a filtered set like ["four","score","and","seven",..."continent"..."liberty"..."equal"].
After that, the words would be rearranged into a new coherent, funnier speech, like:
"Seven years ago our fathers conceived on men...", etc.
That was just so you know the scope of this project. My trouble at the moment has to do with either using my iterator properly or null terminators.
#include <iostream>
#include <fstream>
#include <iomanip>
#include <string>
#include <set>
#include <iterator> //iterates through sets
#include <algorithm>
using namespace std;
int main() {
    set <string> speechSet;
    set <string> ::iterator itr; //forgot what :: means. Declares iterator as set
    int sum = 0;
    int x;
    string data;
    ofstream out;
    string setString;
    ifstream speechFile; //declare output file stream object. Unknown type name
    speechFile.open("./MySpeech");
    if (!speechFile) {
        cerr << "Unable to open file " << endl;
        exit(1);
    }
    char unwantedCharacters[] = ".";
    while (!speechFile.eof()) {
        speechFile >> data; //speechFile input into data
        for (unsigned int i = 0; i < strlen(unwantedCharacters); ++i) {
            data.erase((remove(data.begin(), data.end(),
                unwantedCharacters[i]), data.end())); //remove doesn't delete.
            data.end() - 1 = '\0'; //Reorganizes
            cout << data << endl;
        }
        speechSet.insert(string(data));
    }
    //Go through each string (word) one at a time and remove "",?, etc.
    /*for(itr = speechSet.begin(); itr != speechSet.end(); ++itr){
        if(*itr == ".")//if value pointed to by *itr is equal to '.'
            itr = speechSet.erase(itr);//erase the value in the set and leave blank
            cout << " " << *itr;//print out the blank
        else{
            cout << " " << *itr;
        }
    }*/
    speechFile.close();
    return (0);
}
I keep getting an error that says error: no viable overloaded '='. At first I thought it might be due to .end() not being a member of a C++ string, but I checked the documentation and it shouldn't be an issue of mismatched data types. Then I thought I might have to set the iterator itr equal to the end of the data,
iterator itr = data.end() - 1;
and then dereference that pointer and set it equal to the null terminator
itr* = '\0';
That removed the overload error, but I still had another error use of class template 'iterator' requires template arguments. Let me know if any more clarification is needed.
In the for loop, use auto for the iterator so you don't have to spell out its type:
for(auto itr = speechSet.begin(); itr != speechSet.end(); ++itr){
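Regarding the original error: data.end() - 1 is a temporary iterator, not a character, so assigning '\0' to it has no matching operator=. A std::string also keeps track of its own length, so no terminator has to be written after erasing characters. A minimal sketch of the filtering step under those assumptions follows; clean_word and the default list of unwanted characters are invented for illustration and are not part of the question's code.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: removes every character listed in `unwanted` from
// `word` and converts the remaining characters to lowercase.
std::string clean_word(std::string word, const std::string& unwanted = ".,?!;:\"")
{
    // erase-remove idiom: remove_if shifts the kept characters to the front
    // and returns the new logical end; erase then trims the leftover tail.
    word.erase(std::remove_if(word.begin(), word.end(),
                              [&unwanted](char c) {
                                  return unwanted.find(c) != std::string::npos;
                              }),
               word.end());
    // std::tolower expects a value representable as unsigned char
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return word;
}
```

Each word read from the file could be passed through clean_word before being inserted into the set, for example speechSet.insert(clean_word(data)).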

String length changes suddenly

In this code, the string length changes unexpectedly. Before I introduced the char *file variable, strlen(str) was correct. After I introduced it, the strlen value of str changed.
#include <unistd.h>
#include <iostream>
#include <stdio.h>
#include <string.h>
using namespace std;
int main(){
    char buf[BUFSIZ];
    if(!getcwd(buf,BUFSIZ)){
        perror("ERROR!");
    }
    cout << buf << endl;
    char *str;
    str = new char[strlen(buf)];
    strcpy(str,buf);
    strcat(str,"/");
    strcat(str,"input/abcdefghijklmnop");
    cout << str << endl;
    cout << strlen(str) << endl;
    char *file;
    file = new char[strlen(str)];
    cout << strlen(file) << endl;
    strcpy(file,str);
    cout << file << endl;
}
Your code has undefined behavior because of buffer overflow. You should be scared.
You should consider using std::string.
std::string sbuf;
{
    char cwdbuf[BUFSIZ];
    if (getcwd(cwdbuf, sizeof(cwdbuf)))
        sbuf = cwdbuf;
    else {
        perror("getcwd");
        exit(EXIT_FAILURE);
    }
}
sbuf += "/input/abcdefghijklmnop";
You should compile with all warnings & debug info (e.g. g++ -Wall -Wextra -g) then use the debugger gdb. Don't forget that strings are zero-byte terminated. Your str is much too short. If you insist on avoiding std::string (which IMHO you should not), you need to allocate more space (and remember the extra zero byte).
str = new char[strlen(buf)+sizeof("/input/abcdefghijklmnop")];
strcpy(str, buf);
strcat(str, "/input/abcdefghijklmnop");
Remember that the sizeof some literal string is one byte more than its length (as measured by strlen). For instance sizeof("abc") is 4.
Likewise your file variable is one byte too short (missing space for the terminating zero byte).
file = new char[strlen(str)+1];
BTW on GNU systems (such as Linux) you could use asprintf(3) or strdup(3) (and use free not delete to release the memory) and consider using valgrind.
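To make the size arithmetic above concrete, here is a small sketch of the manual allocation wrapped in a function. join_path is a hypothetical name, and std::unique_ptr<char[]> is used only so the buffer is released automatically; it is not something the question or answer prescribes.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// sizeof on a string literal counts the terminating zero byte; strlen does not.
static_assert(sizeof("abc") == 4, "three characters plus the terminator");

// Hypothetical helper: returns dir + "/" + rel in a freshly allocated buffer.
std::unique_ptr<char[]> join_path(const char* dir, const char* rel)
{
    // visible characters: both strings plus one separator
    const std::size_t len = std::strlen(dir) + 1 + std::strlen(rel);
    // one extra byte for the terminating zero, which strlen never counts
    std::unique_ptr<char[]> out(new char[len + 1]);
    std::strcpy(out.get(), dir);
    std::strcat(out.get(), "/");
    std::strcat(out.get(), rel);
    return out;
}
```

With this helper, join_path(buf, "input/abcdefghijklmnop") would produce the same string the question builds by hand, without writing past the end of the allocation.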

Different behavior of boost::serialization of strings on text archive

I'm having an issue serializing a std::string with boost::serialization on a text_oarchive. AFAICT, I have two identical pieces of code that behave differently in two different programs.
This is the program that I believe is behaving correctly:
#include <iostream>
#include <string>
#include <sstream>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
template <typename T>
void serialize_deserialize(const T & src, T & dst)
{
    std::string serialized_data_str;
    std::cout << "original data: " << src << std::endl;
    std::ostringstream archive_ostream;
    boost::archive::text_oarchive oarchive(archive_ostream);
    oarchive << src;
    serialized_data_str = archive_ostream.str();
    std::cout << "serialized data: " << serialized_data_str << std::endl;
    std::istringstream archive_istream(serialized_data_str);
    boost::archive::text_iarchive iarchive(archive_istream);
    iarchive >> dst;
}
int main()
{
    std::string archived_data_str = "abcd";
    std::string restored_data_str;
    serialize_deserialize<std::string>(archived_data_str, restored_data_str);
    std::cout << "restored data: " << restored_data_str << std::endl;
    return 0;
}
And this is its output:
original data: abcd
serialized data: 22 serialization::archive 10 4 abcd
restored data: abcd
(You can compile it with: g++ boost-serialization-string.cpp -o boost-serialization-string -lboost_serialization)
This one, on the other hand, is an excerpt of the program I'm writing (derived from boost_asio/example/serialization/connection.hpp), which serializes std::string data by converting each character to its numeric (decimal) representation:
/// Asynchronously write a data structure to the socket.
template <typename T, typename Handler>
void async_write(const T& t, Handler handler)
{
    // Serialize the data first so we know how large it is.
    std::cout << "original data: " << t << std::endl;
    std::ostringstream archive_stream;
    boost::archive::text_oarchive archive(archive_stream);
    archive << t;
    outbound_data_ = archive_stream.str();
    std::cout << "serialized data: " << outbound_data_ << std::endl;
    [...]
And this is an excerpt of its output:
original data: abcd
serialized data: 22 serialization::archive 10 5 97 98 99 100 0
The version (10) is the same, right? So that should be the proof that I'm using the same serialization library in both programs.
However, I really can't figure out what's going on here. I've been trying to solve this puzzle for almost an entire work day now, and I'm out of ideas.
For anyone that may want to reproduce this result, it should be sufficient to download the Boost serialization example, add the following line
connection_.async_write("abcd", boost::bind(&client::handle_write, this, boost::asio::placeholders::error));
at line 50 of client.cpp, add the following member function in client.cpp
/// Handle completion of a write operation.
void handle_write(const boost::system::error_code& e)
{
    // Nothing to do. The socket will be closed automatically when the last
    // reference to the connection object goes away.
}
add this cout:
std::cout << "serialized data: " << outbound_data_ << std::endl;
at connection.hpp:59
and compile with:
g++ -O0 -g3 client.cpp -o client -lboost_serialization -lboost_system
g++ -O0 -g3 server.cpp -o server -lboost_serialization -lboost_system
I'm using g++ 4.8.1 under Ubuntu 13.04 64bit with Boost 1.53
Any help would be greatly appreciated.
P.s. I'm posting this because the deserialization of the std::strings isn't working at all! :)
I see two possible causes of this behavior.
The compiler does not implicitly convert "abcd" to std::string. The literal has type const char[5], so T is deduced as a char array and the archive handles it as an array of five numeric elements (the four characters plus the terminating zero, hence 5 97 98 99 100 0) and not as an ASCII string. Changing the call to connection_.async_write(std::string("abcd"), boost::bind(&client::handle_write, this, boost::asio::placeholders::error)); should fix the problem.
Alternatively, the string type passed as the t argument of the async_write template method may not be std::string but std::wstring, in which case it is serialized not as an ASCII string ("abcd") but as a vector of wide characters; 97 98 99 100 is the decimal representation of the ASCII characters a, b, c and d.
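The first explanation can be checked without Boost, because it depends only on template argument deduction. The sketch below uses two hypothetical helpers, deduces_std_string and array_extent, to show what a template with a const T& parameter sees when it is handed a string literal versus a std::string.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical probe: true only when T is deduced as std::string,
// mirroring the `const T& t` parameter of async_write.
template <typename T>
constexpr bool deduces_std_string(const T&)
{
    return std::is_same<T, std::string>::value;
}

// Hypothetical probe: reports the element count of a built-in array argument.
template <typename T, std::size_t N>
constexpr std::size_t array_extent(const T (&)[N])
{
    return N;
}

// A string literal is an array of const char that includes the terminator.
static_assert(!deduces_std_string("abcd"), "a literal is not a std::string");
static_assert(array_extent("abcd") == 5, "four characters plus the zero byte");
```

An argument of std::string("abcd") makes deduces_std_string return true, which is consistent with the suggested fix of constructing the std::string explicitly at the call site.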

Resources