Need help reading a file that has a book format - c++11

I have been struggling to read a file that has a book format. The file is broken into pages by a separator line that looks like this: "---------------------------------------". What I'm trying to do is read all the words and keep track of the page number and the word number of every word. The file looks like this:
my file
For example, if the word "hello" appears as the first word on the first page, the output would be "hello 1,1"; if it appears as the first word on the second page, the output would be "hello 2,1".
This is the code I have so far
ifstream inFile;
inFile.open("GreatExpectations.txt");
if(!inFile.is_open()) {
cout << "Error, can't open the file....."<<endl;
return 1;
}
string word;
string separator;
separator = "----------------------------------------";
int pageNum = 0, wordNum = 0;
IndexMap myMap(200000);
string title;
for(int i = 0; i < 2; i++) {
getline(inFile, title);
cout << title <<endl;
}
while(!inFile.eof())
{
inFile >> word;
//cout << word << " ";
wordNum++;
if(word == separator)
pageNum++;
}

If I understood your question correctly, here is my approach to the problem:
#include <iostream>
#include <fstream>
#include <vector>
#include <sstream>
#include <string>
using namespace std;
struct WordInfo {
string word;
int pageNum;
int wordNum;
};
int main() {
ifstream inFile;
inFile.open("GreatExpectations.txt");
if(!inFile.is_open()) {
cout << "Error, can't open the file....."<<endl;
return 1;
}
int pageNum = 1, wordNum = 0;
vector<WordInfo> words; // container for the words and their positions
// read the file line-by-line
for(string line; getline(inFile, line);) {
// detect the page separator, which is a non-empty line of hyphens only
if(!line.empty() && line.find_first_not_of("-") == string::npos) {
pageNum++;
wordNum = 0;
continue;
}
// process the line word-by-word; operator>> skips repeated whitespace
stringstream ss(line);
for(string word; ss >> word;) {
wordNum++;
words.push_back({ word, pageNum, wordNum });
}
}
return 0;
}
The WordInfo structure holds the information about each word that you asked for. Reading the file line-by-line is not optimal, but it is simpler, so there are two loops: the first reads a line and the second reads the words from that line. Each word that is read is pushed into the words vector for later use. That's all.
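To get from the stored entries to the "hello 1,1" style output in the question, the same loop can be pointed at any std::istream and each entry formatted afterwards. This is only a minimal sketch: readWords and formatEntry are hypothetical helper names, and it assumes a page separator is a non-empty line made only of hyphens and that words are separated by whitespace.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct WordInfo {
    std::string word;
    int pageNum;
    int wordNum;
};

// Reads words from any input stream, tracking page and word position.
// Assumption: a page separator is a non-empty line consisting only of '-'.
std::vector<WordInfo> readWords(std::istream& in) {
    std::vector<WordInfo> words;
    int pageNum = 1, wordNum = 0;
    for (std::string line; std::getline(in, line);) {
        if (!line.empty() && line.find_first_not_of('-') == std::string::npos) {
            ++pageNum;      // a new page starts after the separator
            wordNum = 0;    // word numbering restarts on every page
            continue;
        }
        std::istringstream ss(line);
        for (std::string word; ss >> word;) {
            ++wordNum;
            words.push_back({word, pageNum, wordNum});
        }
    }
    return words;
}

// Formats one entry as "word page,position", e.g. "hello 1,1".
std::string formatEntry(const WordInfo& w) {
    return w.word + " " + std::to_string(w.pageNum) + "," + std::to_string(w.wordNum);
}
```

Passing the opened ifstream to readWords and printing formatEntry for each element gives one line per word. Taking a std::istream& instead of a file name also lets the function be tried on a std::istringstream.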

Related

feeding a stringstream to a class member function

I am a newbie trying to learn by doing. I want to feed a stringstream into a class member function called "print()" but I get errors. Once this works, I can proceed to write more class member functions that work with the data I feed them.
For now I have created a class that has a member function 'print'.
class Month
{
public:
string m_month;
void print()
{
cout << m_month << endl;
}
};
Next, I initialized 12 months:
Month month1 = { "January" };
Month month2 = { "February" };
Month month3 = { "March" };
etc.
When I call "month1.print();" it prints January which is correct.
I used stringstream and a for loop to concatenate month + 1 to 12 and I want to feed the stringstream to the print function.
stringstream os;
string mValue = "month";
int iValue = 1;
for(int i = 0; i < 12; ++i)
{
os << mValue << "" << iValue << "\n";
iValue += 1;
}
However, the stringstream can't be combined with the print function.
os.print(); and os.str().print();
result in "error: ‘std::stringstream {aka class std::__cxx11::basic_stringstream<char>}’ has no member named ‘print’"
Converting the stringstream to char and then feeding it into the print function results in "error: request for member ‘print’ in ‘cstr’, which is of non-class type ‘const char*’"
const string tmp = os.str();
const char* cstr = tmp.c_str();
cstr.print();
Long story short: What I am trying to do is concatenate month + 1 to 12 and feed that to the class member function "print". This seems trivial but I can't get it to work. Any suggestions?
Edit: Full code:
#include <iostream>
#include <string>
#include <sstream>
using namespace std;
class Month
{
public:
string m_month;
void print()
{
cout << m_month << endl;
}
};
int main()
{
Month month1 = { "January" };
Month month2 = { "February" };
Month month3 = { "March" };
Month month4 = { "April" };
Month month5 = { "May" };
Month month6 = { "June" };
Month month7 = { "July" };
Month month8 = { "August" };
Month month9 = { "September" };
Month month10 = { "October" };
Month month11 = { "November" };
Month month12 = { "December" };
stringstream os; // Initialize stringstream "os"
string mValue = "month"; // Initialize mValue "month"
int iValue = 1; // Initialize iValue "1"
for(int i = 0; i < 12; ++i)
{
os << mValue << "" << iValue << "\n"; // Glue mValue and iValue
// together
iValue += 1; // Increment iValue by one
}
// Mock code: send stringstream "os" to the print function. Here I want to call month1.print(); month2.print(); etc. The output should be January, February, etc.
return 0;
}
This doesn't do what you think it does:
for(int i = 0; i < 12; ++i)
{
// iValue is actually unnecessary. You could have just used (i + 1)
os << mValue << "" << iValue << "\n";
iValue += 1;
}
All this does is fill the stringstream with the string:
"month1\nmonth2\nmonth3\nmonth4\nmonth5\nmonth6\nmonth7\nmonth8\nmonth9\nmonth10\nmonth11\nmonth12\n"
Your intent seemed to be to concat a number to the end of a "month" string, and have them act as the month1, month2... variables that you defined above. That's not how it works. You can't (and shouldn't) try to "dynamically" reference variables like that. In os.print();, the stringstream doesn't act as Month simply because it contains a string with the same name as a Month variable.
Instead, add the variables to some kind of container (like a std::vector), and loop over it:
std::vector<Month> months{ month1, month2, month3, ..., month12 };
for (unsigned int i = 0; i < months.size(); i++)
{
months[i].print();
}
A stringstream should be thought of as a stream like any other, except that it happens to be text and held in memory. So it's cheap to convert it to a string, and in fact they are often used for building strings.
But a "Print" method of a class has no business knowing that a stream is a stringstream. All it should care about is that it gets an input stream that contains text. In fact the "text" part is a bit hard to enforce due to historical weaknesses stretching back a long way. If you just read the stream byte by byte, pass each byte to std::cout, and terminate on EOF, then that's probably OK.

Pointer not printing char[] array

I'm writing some code to take in a string, turn it into a char array and then print back to the user (before passing to another function).
Currently the code works up to dat.toCharArray(DatTim,datsize); however, the pointer does not seem to be working, as the while loop never fires
String input = "Test String for Foo";
InputParse(input);
void InputParse (String dat)
//Write Data
datsize = dat.length()+1;
const char DatTim[datsize];
dat.toCharArray(DatTim,datsize);
//Debug print back
for(int i=0;i<datsize;i++)
{
Serial.write(DatTim[i]);
}
Serial.println();
//Debug pointer print back
const char *b;
b=*DatTim;
while (*b)
{
Serial.print(*b);
b++;
}
Foo(*DatTim);
I can't figure out the difference between what I have above vs the template code provided by Majenko
void PrintString(const char *str)
{
const char *p;
p = str;
while (*p)
{
Serial.print(*p);
p++;
}
}
The expression *DatTim is the same as DatTim[0], i.e. it gets the first character in the array and then assigns it to the pointer b (something the compiler should have warned you about).
Arrays naturally decay to pointers to their first element; that is, DatTim is equivalent to &DatTim[0].
The simple solution is to write
const char *b = DatTim;
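The decay rule can be checked in plain C++ without any Arduino types. In this sketch the function names are hypothetical, and the characters are collected into a std::string instead of being sent to Serial.print.

```cpp
#include <string>

// Walks a null-terminated character array through a pointer,
// collecting the characters in the order the while loop would print them.
std::string collect(const char* str) {
    std::string result;
    for (const char* p = str; *p; ++p) {
        result += *p;
    }
    return result;
}

// Returns true: an array name used as a value is a pointer to its first element.
bool decaysToFirstElement() {
    const char DatTim[] = "Test String for Foo";
    const char* b = DatTim;   // array-to-pointer decay, no '*' needed
    return b == &DatTim[0];
}
```

The same reasoning applies to the call Foo(*DatTim): if Foo expects a const char*, it should receive DatTim, not the first character.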

What could cause my program to only read 2 lines of a 3 line input

I have a program that is supposed to take in a paragraph like
Testing#the hash#tag
#program!#when #beginning? a line
or #also a #comma,
and output something like
#the
#tag
#program
#when
#beginning
#also
#comma,
I feel like the logic makes sense, but obviously not because the program never seems to get into the line of input. The problem is almost definitely in the last source file below.
Here is the main source program
#include "HashTagger.h"
#include <string>
#include <iostream>
using namespace hw02;
using namespace std;
int main() {
// Construct an object for extracting the
// hashtags.
HashTagger hashTagger;
// Read the standard input and extract the
// hashtags.
while (true) {
// Read one line from the standard input.
string line;
getline(cin, line);
if (!cin) {
break;
}
// Get all of the hashtags on the line.
hashTagger.getTags(line);
}
// Print the hashtags.
hashTagger.printTags();
// Return the status.
return 0;
}
my header file
#ifndef HASHTAGGER_H
#define HASHTAGGER_H
#include <string>
namespace hw02 {
class HashTagger {
public:
void getTags(std::string line);
void printTags();
private:
std::string hashtags_;
};
}
#endif
and a source file
the test in the source file seems to show that the program only gets to the second line and then stops before grabbing the last 2 hashtags
#include "HashTagger.h"
#include <iostream>
using namespace std;
using namespace hw02;
void HashTagger::getTags(string line) {
// Loop over all characters in a line that can begin a hashtag
int b = 0;
string hashtags_ = "";
for (unsigned int j = 0; j < line.length(); ++j) {
char c = line.at(j);
// if "#" is found assign beginning of capture to b
if (c == '#') {
b = j;
// if the beginning is less than the end space, newline, ".", "?", or "!" found, add substring of the hashtag to hashtags_
}
if (b < j && (c == ' ' || c == '\n' || c == '.' || c == '?' || c == '!' )) {
hashtags_ = hashtags_ + "\n" + line.substr(b, j - b + 1);
b = 0;
//Test// cout << b << "/" << j << "/" << c << "/" << hashtags_ << "/" << endl;
}
}
}
void HashTagger::printTags() {
// print out hashtags_ to the console
cout << hashtags_ << endl;
}
You are redeclaring hashtags_ inside your getTags function. Therefore, all string modifications operate on a local variable instead of the class member variable.
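Remove the line
string hashtags_ = "";
from getTags so that the function appends to the class member variable that printTags uses later on. Note that replacing it with
hashtags_ = "";
would avoid the redeclaration but would also reset the member on every call, so only the tags from the last line of input would survive.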
Also, make sure that your input is terminated with two newline characters (\n\n), to avoid breaking out of the main loop too early, or move your check and break statement after the getTags call:
while (true) {
// Read one line from the standard input.
string line;
getline(cin, line);
// Get all of the hashtags on the line.
hashTagger.getTags(line);
if (!cin) {
break;
}
}
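One detail worth keeping in mind is that getline strips the newline, so a hashtag at the very end of a line is never followed by one of the terminating characters. The sketch below is one possible way to handle that, not the original class: collectTags and endsTag are hypothetical names, the tags are kept in a std::vector that the caller owns so nothing is reset between lines, and a tag is assumed to end at a space, '.', '?', '!', the next '#', or the end of the line.

```cpp
#include <string>
#include <vector>

// Returns true for the characters that end a hashtag in this sketch.
bool endsTag(char c) {
    return c == ' ' || c == '\n' || c == '.' || c == '?' || c == '!';
}

// Appends every hashtag found in `line` to `tags`.
// Assumption: a tag also ends at the end of the line or at the next '#'.
void collectTags(const std::string& line, std::vector<std::string>& tags) {
    std::string::size_type begin = std::string::npos;  // npos: not inside a tag
    for (std::string::size_type j = 0; j <= line.size(); ++j) {
        const bool atEnd = (j == line.size());
        const bool closes = atEnd || endsTag(line[j]) || line[j] == '#';
        if (begin != std::string::npos && closes) {
            // A lone "#" with nothing after it is not stored.
            if (j - begin > 1) {
                tags.push_back(line.substr(begin, j - begin));
            }
            begin = std::string::npos;
        }
        if (!atEnd && line[j] == '#') {
            begin = j;  // remember where the new tag starts
        }
    }
}
```

Calling collectTags once per input line with the same vector accumulates the tags across lines, and the terminating character itself is left out of each stored tag.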

Extract trailing int from string containing other characters

I have a problem extracting a signed int from a string in C++.
Assuming that I have the string Images1234, how can I extract the 1234 from it without knowing the position of the last non-numeric character?
FYI, I have tried stringstream as well as lexical_cast, as suggested by others in other posts, but stringstream returns 0 while lexical_cast stopped working.
int main()
{
string virtuallive("Images1234");
//stringstream output(virtuallive.c_str());
//int i = stoi(virtuallive);
//stringstream output(virtuallive);
int i;
i = boost::lexical_cast<int>(virtuallive.c_str());
//output >> i;
cout << i << endl;
return 0;
}
How can i extract the 1234 from the string without knowing the position of the last non numeric character in C++?
You can't. But the position is not hard to find:
auto last_non_numeric = input.find_last_not_of("1234567890");
char* endp = &input[0];
if (last_non_numeric != std::string::npos)
endp += last_non_numeric + 1;
if (*endp == '\0') { /* FAILURE, no number on the end */ }
errno = 0; // requires <cerrno>
auto i = strtol(endp, &endp, 10);
if (errno == ERANGE) {/* weird FAILURE, the number was really HUGE and couldn't convert */}
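Wrapped into a function, the same idea might look like the sketch below. The name trailingNumber and the bool-plus-out-parameter interface are assumptions made for illustration. It handles unsigned digit runs only, so a leading minus sign would need an extra check.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Extracts the run of digits at the end of `input` into `value`.
// Returns false if the string does not end in a digit or the number overflows long.
bool trailingNumber(const std::string& input, long& value) {
    const std::string::size_type lastNonDigit = input.find_last_not_of("0123456789");
    const std::string::size_type start =
        (lastNonDigit == std::string::npos) ? 0 : lastNonDigit + 1;
    if (start == input.size()) {
        return false;  // no digits at the end (also covers the empty string)
    }
    errno = 0;
    char* end = nullptr;
    const long parsed = std::strtol(input.c_str() + start, &end, 10);
    if (errno == ERANGE || *end != '\0') {
        return false;  // out of range for long, or unexpected trailing characters
    }
    value = parsed;
    return true;
}
```

For the string in the question, trailingNumber("Images1234", n) returns true and stores 1234 in n.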
Another possibility would be to put the string into a stringstream, then read the number from the stream (after imbuing the stream with a locale that classifies everything except digits as white space).
// First the desired facet:
struct digits_only: std::ctype<char> {
digits_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
// everything is white-space:
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
// except digits, which are digits (the end of the range is one past '9')
std::fill(&rc['0'], &rc['9'] + 1, std::ctype_base::digit);
// and '.', which we'll call punctuation:
rc['.'] = std::ctype_base::punct;
return &rc[0];
}
};
Then the code to read the data:
std::istringstream virtuallive("Images1234");
virtuallive.imbue(std::locale(std::locale(), new digits_only));
int number;
// Since we classify the letters as white space, the stream will ignore them.
// We can just read the number as if nothing else were there:
virtuallive >> number;
This technique is useful primarily when the stream contains a substantial amount of data, and you want all the data in that stream to be interpreted in the same way (e.g., only read numbers, regardless of what else it might contain).
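Put together as a compilable unit, the facet approach might look like this. It is a sketch under a few assumptions: firstNumber is a hypothetical wrapper, the '.' classification is left out because only integers are read, and the fill range ends one past '9' so that '9' itself counts as a digit.

```cpp
#include <algorithm>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// ctype facet that classifies every character as white space except digits.
struct digits_only : std::ctype<char> {
    digits_only() : std::ctype<char>(get_table()) {}

    static const std::ctype_base::mask* get_table() {
        static std::vector<std::ctype_base::mask> rc(
            std::ctype<char>::table_size, std::ctype_base::space);
        // '9' + 1 so that the half-open range includes '9' itself
        std::fill(&rc['0'], &rc['9' + 1], std::ctype_base::digit);
        return &rc[0];
    }
};

// Reads the first integer found in `text`, skipping all non-digit characters.
// Returns false if no number could be read.
bool firstNumber(const std::string& text, int& number) {
    std::istringstream stream(text);
    // The locale takes ownership of the facet and deletes it when done.
    stream.imbue(std::locale(std::locale(), new digits_only));
    return static_cast<bool>(stream >> number);
}
```

Because letters are treated as white space, the extraction skips "Images" and reads 1234. Repeated extractions on the same stream would pick up further numbers in the text.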

What Time Is This Returning

Deep in the sauce here. I haven't worked with time too much, so I'm a little confused. I know there is FILETIME and SYSTEMTIME. What I am trying to get at this point (because it might change) are files that are less than 20 seconds old. This is returning the files and their size and something in seconds. What I'd like to know is where it is filtering by time, if it is, and how I can adjust it to suit my needs. Thank you.
using namespace std;
typedef vector<WIN32_FIND_DATA> tFoundFilesVector;
std::wstring LastWriteTime;
int getFileList(wstring filespec, tFoundFilesVector &foundFiles)
{
WIN32_FIND_DATA findData;
HANDLE h;
int validResult=true;
int numFoundFiles = 0;
h = FindFirstFile(filespec.c_str(), &findData);
if (h == INVALID_HANDLE_VALUE)
return 0;
while (validResult)
{
numFoundFiles++;
foundFiles.push_back(findData);
validResult = FindNextFile(h, &findData);
}
return numFoundFiles;
}
void showFileAge(tFoundFilesVector &fileList)
{
unsigned _int64 fileTime, curTime, age;
tFoundFilesVector::iterator iter;
FILETIME ftNow;
//__int64 nFileSize;
//LARGE_INTEGER li;
//li.LowPart = ftNow.dwLowDateTime;
//li.HighPart = ftNow.dwHighDateTime;
CoFileTimeNow(&ftNow);
curTime = ((_int64) ftNow.dwHighDateTime << 32) + ftNow.dwLowDateTime;
for (iter=fileList.begin(); iter<fileList.end(); iter++)
{
fileTime = ((_int64)iter->ftLastWriteTime.dwHighDateTime << 32) + iter->ftLastWriteTime.dwLowDateTime;
age = curTime - fileTime;
cout << "FILE: '" << iter->cFileName << "', AGE: " << (_int64)age/10000000UL << " seconds" << endl;
}
}
int main()
{
string fileSpec = "*.*";
tFoundFilesVector foundFiles;
tFoundFilesVector::iterator iter;
int foundCount = 0;
getFileList(L"c:\\Mapper\\*.txt", foundFiles);
getFileList(L"c:\\Mapper\\*.jpg", foundFiles);
foundCount = foundFiles.size();
if (foundCount)
{
cout << "Found "<<foundCount<<" matching files.\n";
showFileAge(foundFiles);
}
system("pause");
return 0;
}
I don't know what you've done to try to debug this, but your code doesn't work at all. The reason is that you're passing getFileList() a wstring but then passing that to the ANSI version of FindFirstFile(). Unless you #define UNICODE or use the appropriate compiler option, the Win32 function names resolve to their ANSI versions, which expect char *, not wide-character strings.
The easiest fix is to simply change the declaration of getFileList() to this:
int getFileList(const char * filespec, tFoundFilesVector &foundFiles)
Change the call to FindFirstFile() to this:
h = FindFirstFile((LPCSTR)filespec, &findData);
And then change the calls to it to this:
getFileList("c:\\Mapper\\*.txt", foundFiles);
getFileList("c:\\Mapper\\*.jpg", foundFiles);
Your other option is to switch all char strings to wide chars, but either way you need to be consistent throughout. Once you do that the program works as expected.
As for your final question, your program is not filtering by time at all.
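If filtering by age is what you want, the comparison itself is simple arithmetic on the 64-bit values the program already computes. The sketch below uses plain fixed-width integers instead of FILETIME so that it stands alone. toTicks and isNewerThan are hypothetical helper names, and the 20-second limit comes from the question.

```cpp
#include <cstdint>

// FILETIME counts 100-nanosecond intervals, split into two 32-bit halves.
// Combines the halves into one 64-bit tick count.
std::uint64_t toTicks(std::uint32_t high, std::uint32_t low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Returns true if the file was written less than `maxAgeSeconds` before `nowTicks`.
// Files with a timestamp in the future are treated as age zero.
bool isNewerThan(std::uint64_t fileTicks, std::uint64_t nowTicks,
                 std::uint64_t maxAgeSeconds) {
    const std::uint64_t ticksPerSecond = 10000000ULL;  // 1 s = 10^7 intervals of 100 ns
    const std::uint64_t age = (nowTicks > fileTicks) ? nowTicks - fileTicks : 0;
    return age < maxAgeSeconds * ticksPerSecond;
}
```

Inside the loop in showFileAge, the existing fileTime and curTime values could be passed as isNewerThan(fileTime, curTime, 20), skipping any file for which it returns false.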
Not quite an answer, but you might want to read about file system tunneling.
In some situations it may prevent you from doing what you're trying to do.
