How to test if std::ifstream reached EOL - c++11

I have an std::ifstream of a file containing space separated values on multiple lines.
Is there any way to know if I have reached the end of a line when reading from the std::ifstream?

No. If I get to guess how you are currently reading it,
ins >> stuff;
discards whitespace, including the newlines. You want to do a
std::getline(ins, line_string);
and process the input line by line.

Related

Parse CSV data between two strings and include string from line below

I have files containing data sampled at 20Hz. Appended to each line are data packets from an IMU that are not synchronised with the 20Hz data. The IMU data packets have a start marker (255,90) and an end marker (51). I am using the term packet for brevity; they are just comma-separated values. Packet1 is not the same length as packet2, and so on.
"2019-12-08 21:29:11.90",3390323,..[ CSV data ]..,1,"1 1025.357 ",[ incomplete packet from line above ],51,255,90,[ packet1 ],51,255,90,[ packet2 ],51,255,90,[ packet3 ],51,255,90,[ First part of packet4 ]
"2019-12-08 21:29:11.95",3390324,.............,1,"1 1025.367 ",[ Second part of packet4 ],51,255,90,[ packet5 ],51,255,90,[ packet6 ],51,255,90,[ packet7 ],51,255,90,[ First part of packet8 ]
I would like to parse the file so that I extract the time stamp with the IMU packets from the first start marker to after the last start marker and take the partial packet from the next line and append it to the end of the line so the output is in the form:
"2019-12-08 21:29:11.90",255,90,[ packet1 ],51,255,90,[ packet2 ],51,255,90,[ packet3 ],51,255,90,[ First part of packet4 ][ Second part of packet4 ],51
"2019-12-08 21:29:11.95",255,90,[ packet5 ],51,255,90,[ packet6 ],51,255,90,[ packet7 ],51,255,90,[ First part of packet8 ][ Second part of packet8 ],51
As requested I have included my real-world example. This is just five lines. The last line would be deleted as it would remain incomplete.
"2019-08-28 10:43:46.2",10802890,32,22.1991,-64,"1 1015.400 ",0,0,0,0,67,149,115,57,11,0,63,24,51,255,90,12,110,51,255,90,177,109,51,255,90,4,193,141,125,51,255,90,114,51,255,90,8,0,250,63,51,255,90,9,0,46,0,136,251,232,66,0,0,160,64,0,0,0,0,0,0,0,0,233,124,139,56,0,0,0,0,0,0,0,0,195,80,152,184,0,0,0,0
"2019-08-28 10:43:46.25",10802891,32,22.1991,-64,"1 1015.400 ",0,0,0,0,118,76,101,57,11,0,32,249,51,255,90,230,252,51,255,90,53,221,51,255,90,4,193,33,60,51,255,90,104,51,255,90,8,0,23,192,51,255,90,9,0,46,0,200,151,233,66,0,0,160,64,0,0,0,0,0,0,0,0,2,117,157,56,0,0,0,0,0,0,0,0,31,182,140,57,0,0,0,0
"2019-08-28 10:43:46.3",10802892,32,22.1991,-64,"1 1015.400 ",0,0,0,0,151,113,95,57,11,0,72,194,51,255,90,105,41,51,255,90,12,15,51,255,90,4,193,70,8,51,255,90,89,51,255,90,8,0,46,210,51,255,90,9,0,46,0,40,130,234,66,0,0,160,64,0,0,0,0,0,0,0,0,132,206,183,56,0,0,0,0,0,0,0,0,97,191,197,56,0,0,0,0
"2019-08-28 10:43:46.35",10802893,32,22.1991,-64,"1 1015.400 ",0,0,0,0,110,51,95,57,11,0,9,37,51,255,90,78,13,51,255,90,255,246,51,255,90,4,193,52,161,51,255,90,152,51,255,90,8,0,163,85,51,255,90,9,0,46,0,104,30,235,66,0,0,160,64,0,0,0,0,0,0,0,0,49,42,201,56,0,0,0,0,0,0,0,0,82,125,132,57,0,0,0,0
"2019-08-28 10:43:46.4",10802894,32,22.1991,-64,"1 1015.400 ",0,0,0,0,173,103,97,57,11,0,185,229,51,255,90,177,130,51,255,90,57,236,51,255,90,4,193,213,77,51,255,90,252,51,255,90,8,0,9,201,51,255,90,9,0,46,0,200,8,236,66,0,0,160,64,0,0,0,0,0,0,0,0,83,67,227,56,0,0,0,0,0,0,0,0,58,205,192,184,0,0,0,0
I would like to parse the data to the following format:
"2019-08-28 10:43:46.2",255,90,12,110,51,255,90,177,109,51,255,90,4,193,141,125,51,255,90,114,51,255,90,8,0,250,63,51,255,90,9,0,46,0,136,251,232,66,0,0,160,64,0,0,0,0,0,0,0,0,233,124,139,56,0,0,0,0,0,0,0,0,195,80,152,184,0,0,0,0,0,0,0,0,118,76,101,57,11,0,32,249,51
"2019-08-28 10:43:46.25",255,90,230,252,51,255,90,53,221,51,255,90,4,193,33,60,51,255,90,104,51,255,90,8,0,23,192,51,255,90,9,0,46,0,200,151,233,66,0,0,160,64,0,0,0,0,0,0,0,0,2,117,157,56,0,0,0,0,0,0,0,0,31,182,140,57,0,0,0,0,0,0,0,0,151,113,95,57,11,0,72,194,51
"2019-08-28 10:43:46.3",255,90,105,41,51,255,90,12,15,51,255,90,4,193,70,8,51,255,90,89,51,255,90,8,0,46,210,51,255,90,9,0,46,0,40,130,234,66,0,0,160,64,0,0,0,0,0,0,0,0,132,206,183,56,0,0,0,0,0,0,0,0,97,191,197,56,0,0,0,0,0,0,0,0,110,51,95,57,11,0,9,37,51
"2019-08-28 10:43:46.35",255,90,78,13,51,255,90,255,246,51,255,90,4,193,52,161,51,255,90,152,51,255,90,8,0,163,85,51,255,90,9,0,46,0,104,30,235,66,0,0,160,64,0,0,0,0,0,0,0,0,49,42,201,56,0,0,0,0,0,0,0,0,82,125,132,57,0,0,0,0,0,0,0,0,173,103,97,57,11,0,185,229,51
"2019-08-28 10:43:46.4",255,90,177,130,51,255,90,57,236,51,255,90,4,193,213,77,51,255,90,252,51,255,90,8,0,9,201,51,255,90,9,0,46,0,200,8,236,66,0,0,160,64,0,0,0,0,0,0,0,0,83,67,227,56,0,0,0,0,0,0,0,0,58,205,192,184,0,0,0,0
This last line would remain incomplete as there is no next line.
When you are dealing with fields, you should be thinking awk. In this case awk provides a simple solution -- so long as your record format does not change. While generally that wouldn't matter, here it does...
Why? Because your wanted output does not match your problem description.
Why? Because in all records other than the fourth, the first 51 ending the data to append to the previous line is located in field 19 (with ',' as the field separator), while in the fourth record it is found in field 12.
So normally you would just scan forward through your fields to find the first 51, eliminating the need to know which field the first 51 is in -- but using that method with your data does not produce your wanted results. (The 3rd output line would get a short remainder from the 4th input line, reducing its length and instead forcing the additional packet data onto the fourth line of output.)
However, sacrificing that flexibility and considering fields 7-19 to be packets belonging with the previous line allows your wanted output to be matched exactly. (It also simplifies the script, but at the cost of flexibility in record format.)
A short awk script taking the file to process as its first argument can be written as follows:
#!/usr/bin/awk -f
BEGIN { FS=","; dtfield=""; packets=""; pkbeg=7; pkend=19 }
NF > 1 {
    if (length(packets) > 0) {          # handle 1st part of next line
        for (i=pkbeg; i<=pkend; i++)    # append packet data through field 19
            packets=packets "," $i
        print dtfield packets "\n"      # output the date and packet data
        packets=""                      # reset packet data empty
    }
    dtfield=$1                          # for every line, store date field
    for (i=pkend+1; i<=NF; i++)         # loop from 20 to end saving data
        packets=packets "," $i
}
END {
    print dtfield packets "\n"          # output final line
}
Don't forget to chmod +x scriptname to make the script executable.
Example Use/Output
(non-fixed width due to output line length -- as was done in the question)
$ ./imupackets.awk imu
"2019-08-28
10:43:46.2",255,90,12,110,51,255,90,177,109,51,255,90,4,193,141,125,51,255,90,114,51,255,90,8,0,250,63,51,255,90,9,0,46,0,136,251,232,66,0,0,160,64,0,0,0,0,0,0,0,0,233,124,139,56,0,0,0,0,0,0,0,0,195,80,152,184,0,0,0,0,0,0,0,0,118,76,101,57,11,0,32,249,51
"2019-08-28
10:43:46.25",255,90,230,252,51,255,90,53,221,51,255,90,4,193,33,60,51,255,90,104,51,255,90,8,0,23,192,51,255,90,9,0,46,0,200,151,233,66,0,0,160,64,0,0,0,0,0,0,0,0,2,117,157,56,0,0,0,0,0,0,0,0,31,182,140,57,0,0,0,0,0,0,0,0,151,113,95,57,11,0,72,194,51
"2019-08-28
10:43:46.3",255,90,105,41,51,255,90,12,15,51,255,90,4,193,70,8,51,255,90,89,51,255,90,8,0,46,210,51,255,90,9,0,46,0,40,130,234,66,0,0,160,64,0,0,0,0,0,0,0,0,132,206,183,56,0,0,0,0,0,0,0,0,97,191,197,56,0,0,0,0,0,0,0,0,110,51,95,57,11,0,9,37,51
"2019-08-28
10:43:46.35",255,90,78,13,51,255,90,255,246,51,255,90,4,193,52,161,51,255,90,152,51,255,90,8,0,163,85,51,255,90,9,0,46,0,104,30,235,66,0,0,160,64,0,0,0,0,0,0,0,0,49,42,201,56,0,0,0,0,0,0,0,0,82,125,132,57,0,0,0,0,0,0,0,0,173,103,97,57,11,0,185,229,51
"2019-08-28
10:43:46.4",255,90,177,130,51,255,90,57,236,51,255,90,4,193,213,77,51,255,90,252,51,255,90,8,0,9,201,51,255,90,9,0,46,0,200,8,236,66,0,0,160,64,0,0,0,0,0,0,0,0,83,67,227,56,0,0,0,0,0,0,0,0,58,205,192,184,0,0,0,0
Look things over and let me know if you have questions.
The following command pipes your_input_file into a sed command (GNU sed 4.8) that accomplishes the task. At least it works for me with the files you provided (as they were at the time of writing, empty lines included).
cat your_input_file | sed '
s/,51,\(255,90,.*,51\),255,90,/,51\n,\1,255,90,/
s/\("[^"]*"\).*",\(.*\),51\n/\2,51\n\1/
$!N
H
$!d
${
x
s/^[^"]*//
s/\n\n\([^\n]*\)/,\1\n/g
}'
Clearly you can save the sed script in a file (named, for instance, myscript.sed):
#!/usr/bin/sed -f
s/,51,\(255,90,.*,51\),255,90,/,51\n,\1,255,90,/
s/\("[^"]*"\).*",\(.*\),51\n/\2,51\n\1/
$!N
H
$!d
${
x
s/^[^"]*//
s/\n\n\([^\n]*\)/,\1\n/g
}
and use it like this: ./myscript.sed your_input_file.
Note that if the first ,51, on each line is guaranteed to be followed by 255,90, (something which your fourth example violates, ",0,0,0,0,110,51,95,), then the first substitution command reduces to s/,51,/,51\n,/.
Please, test it and let me know if I have correctly interpreted your question. I have not explained how the script works for the simple reason that it will take considerable time for me to write down an explanation (I tend to be fairly meticulous when walking through a script, as you can see here, where I created another analogous sed script), and I want to be sure it does represent a solution for you.
Maybe shorter solutions are possible (even with sed itself). I'm not sure awk would allow a shorter solution; it would certainly offer infinitely more readability than sed, but (I think) at the price of length. Indeed, as you can see from another answer, the awk script is more readable but longer (369 characters/bytes vs the sed script's 160 bytes).
Actually, even in the world of sed scripts, the one above is fairly inefficient, I guess, as it basically preprocesses each line and keeps appending each one to all the preceding ones, then does some processing on the resulting long multiline pattern space and prints it to screen.

sort -o appends newline to end of file - why?

I'm working on a small text file with a list of words in it that I want to add a new word to, and then sort. The file doesn't have a newline at the end when I start, but does after the sort. Why? Can I avoid this behavior or is there a way to strip the newline back out?
Example:
words.txt looks like
apple
cookie
salmon
I then run printf "\norange" >> words.txt; sort words.txt -o words.txt
I use printf rather than echo, figuring that'll avoid the newline, but the file then reads
apple
cookie
orange
salmon
#newline here
If I just run printf "\norange" >> words.txt, orange appears at the bottom of the file with no newline, i.e.:
apple
cookie
salmon
orange
This behavior is explicitly defined in the POSIX specification for sort:
The input files shall be text files, except that the sort utility shall add a newline to the end of a file ending with an incomplete last line.
A UNIX "text file" is only valid if all lines end in newlines, as also defined in the POSIX standard:
Text file - A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the newline character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
Think about what you are asking sort to do.
You are asking it "take all the lines, and sort them in order."
You've given it a file containing three lines, which it splits into the following strings:
"salmon\n"
"cookie\n"
"orange"
It sorts these for you dutifully:
"cookie\n"
"orange"
"salmon\n"
And it then outputs them as a single string:
"cookie
orangesalmon
"
That is almost certainly exactly what you do not want.
So instead, if your file is missing the terminating newline that it should have had, the sort program understands that, most likely, you still intended that last line to be a line, rather than just a fragment of a line. It appends a \n to the string "orange", making it "orange\n". Then it can be sorted properly, without "orange" getting concatenated with whatever line happens to come immediately after it:
"cookie\n"
"orange\n"
"salmon\n"
So when it then outputs them as a single string, it looks a lot better:
"cookie
orange
salmon
"
You could strip the last character off the file, the one from the end of "salmon\n", using a range of handy tools such as awk, sed, perl, php, or even raw bash. This is covered elsewhere, in places like:
How can I remove the last character of a file in unix?
But please don't do that. You'll just cause problems for all other utilities that have to handle your files, like sort. And if you assume that there is no terminating newline in your files, then you will make your code brittle: any part of the toolchain which "fixes" your error (as sort kinda does here) will "break" your code.
Instead, treat text files the way they are meant to be treated in unix: a sequence of "lines" (strings of zero or more non-newline bytes), each followed by a newline.
So newlines are line-terminators, not line-separators.
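You can watch this happen directly (a small shell sketch; od -c shows the raw bytes, and the file name is illustrative):

```shell
# build a file whose last line is missing its terminating newline
printf 'salmon\ncookie\norange' > words.txt
od -c words.txt | tail -n 2     # the last byte is "e", not "\n"

sort -o words.txt words.txt
od -c words.txt | tail -n 2     # sort has appended the missing "\n"
```

After the sort, the file is the well-formed "cookie\norange\nsalmon\n".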
There is a coding style where prints and echoes are done with a leading newline. This is wrong for many reasons, including creating malformed text files and causing the output of the program to be concatenated with the command prompt. printf "orange\n" is the correct style, and also more readable: at a glance someone maintaining your code can tell you're printing the word "orange" and a newline, whereas printf "\norange" looks at first glance like it's printing a backslash and the phrase "no range" with a missing space.

How can I get only special strings (by condition) from a file?

I have a huge text file with strings in a special format. How can I quickly create another file containing only the strings matching my condition?
for example, file contents:
[2/Nov/2015][rule="myRule"]"GET
http://uselesssotialnetwork.com/picturewithcat.jpg"
[2/Nov/2015][rule="mySecondRule"]"GET
http://anotheruselesssotialnetwork.com/picturewithdog.jpg"
[2/Nov/2015][rule="myRule"]"GET
http://uselesssotialnetwork.com/picturewithzombie.jpg"
and I only need the strings with "myRule" and "cat"?
I think it should be perl or bash, but it doesn't matter.
Thanks a lot, sorry for the noob question.
Is it correct that each entry is two lines long? Then you can use sed:
sed -n '/myRule/ {N }; /myRule.*cat/ {p}'
The first rule appends the next line to the pattern space when myRule matches.
The second rule tries to match myRule followed by cat in the pattern space; if found, it prints the pattern space.
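Here is a self-contained run of that command against the sample data (the file name log.txt is just for illustration):

```shell
# recreate the sample log, two lines per entry
cat > log.txt <<'EOF'
[2/Nov/2015][rule="myRule"]"GET
http://uselesssotialnetwork.com/picturewithcat.jpg"
[2/Nov/2015][rule="mySecondRule"]"GET
http://anotheruselesssotialnetwork.com/picturewithdog.jpg"
[2/Nov/2015][rule="myRule"]"GET
http://uselesssotialnetwork.com/picturewithzombie.jpg"
EOF

# print only the two-line entries containing both myRule and cat
sed -n '/myRule/ {N }; /myRule.*cat/ {p}' log.txt
```

Only the first entry (the one with picturewithcat.jpg) is printed. Note that /myRule/ also matches inside "mySecondRule"; that entry is still filtered out here because its second line contains no "cat".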
If your file is truly huge, to the extent that it won't fit in memory (although files up to a few gigabytes are fine on modern computer systems), then the only way is either to change the record separator or to read the lines in pairs.
This shows the first way, and assumes that the second line of every pair ends with a double quote followed by a newline:
perl -ne'BEGIN{$/ = qq{"\n}} print if /myRule/ and /cat/' huge_file.txt
and this is the second
perl -ne'$_ .= <>; print if /myRule/ and /cat/' huge_file.txt
When given your sample data as input, both methods produce this output
[2/Nov/2015][rule="myRule"]"GET
http://uselesssotialnetwork.com/picturewithcat.jpg"

Ruby scan/gets until EOF

I want to read an unknown number of lines until all the lines have been read. How do I do that in Ruby?
For ex:
put returns between paragraphs
for linebreak add 2 spaces at end
_italic_ or **bold**
The input is not from a 'file' but through STDIN.
There are many ways to do that in Ruby.
Most usually, you're gonna wanna process one line at a time, which you can do, for example, with
while line=gets
end
or
STDIN.each_line do |line|
end
or by running ruby with the -n switch, which implies one of the above loops (the line is saved into $_ on each iteration, and you can add BEGIN{} and END{} blocks, just like in awk—this is really good for one-liners).
I wouldn't do STDIN.read, though, as that will read the whole file into memory at once (which may be bad if the file is really big).
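A small sketch of the line-at-a-time pattern; StringIO stands in for STDIN here just so the example is self-contained:

```ruby
require 'stringio'

# pretend this is STDIN; each_line yields one line per iteration
input = StringIO.new(<<~TEXT)
  put returns between paragraphs
  for linebreak add 2 spaces at end
  _italic_ or **bold**
TEXT

lines = []
input.each_line do |line|
  lines << line.chomp   # chomp strips the trailing newline
end

puts lines.length   # => 3
```

With real input you would write STDIN.each_line (or while line = gets) and the loop body stays the same.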
Use IO#read (without a length argument, it reads until EOF):
lines = STDIN.read
or use gets with nil as the argument:
lines = gets(nil)
To denote EOF, type Ctrl + D (Unix) or Ctrl + Z (Windows).

What changes when a file is saved in Kedit for windows that the unix2dos command doesn't do?

So I have a strange question. I have written a script that re-formats data files: I basically create new files with the right column order, spacing, and such, and then unix2dos these files (the program I am formatting them for is DIPS for Windows, and I assume the files should be ANSI). When I go to open the files in the DIPS program, however, an error occurs and the file won't open.
When I create the same kind of data file through the DIPS program and open it in Notepad, it matches exactly the data files I have created with my script.
On the other hand, if I open the data files that I have created with my script in Kedit first, save them, and then open them in the DIPS program, everything works.
My question is: what could saving in Kedit possibly do that unix2dos does not?
(Also, if I try using Notepad or WordPad to save instead of Kedit, the file doesn't open in DIPS.)
Here is what was created using the diff command in unix
"
1,16c1,16
< * This file is generated by Dips for Windows.
< * The following 2 lines are the Title of this file.
< Cobre Panama
< Drill Hole B11106-GT
< Number of Traverses: 0
< Global Orientation is:
< DIP/DIPDIRECTION
< 0.000000 (Declination)
< NO QUANTITY
< Number of extra columns are: 0
---
> * This file is generated by Dips for Windows.
> * The following 2 lines are the Title of this file.
> Cobre Panama
> Drill Hole B11106-GT
> Number of Traverses: 0
> Global Orientation is:
> DIP/DIPDIRECTION
> 0.000000 (Declination)
> NO QUANTITY
> Number of extra columns are: 0
18c18
---
440c440
---
442c442
< -1
---
> -1
"
Any help would be appreciated! Thanks!
Okay! Figured it out.
Simply, when you unix2dos your file, you do not strip any space characters between the last letter on a line and the line-break character. When you save in Kedit, those trailing spaces are stripped.
In my script I had a poor programming practice in which I was writing a string like this:
echo "This is an example string " >> outfile.txt
The character count is 32, and if you could see the line-break character (chr(10)) the line would read:
This is an example string
If you unix2dos outfile.txt, the line looks the same as above but with a different line-break character. However, when you load the file into Kedit and save it, the character count is now 25 and the line looks like this:
This is an example string
This occurs because Kedit does not preserve spaces at the end of a line. It places the return or line-break character directly after the last letter or "non-space" character on a line.
So programs that read literal input, like DIPS (I'm guessing) or the more widely used AutoCAD scripting, will have a real problem with extra spaces before the return character. In AutoCAD scripting, a space in a line is treated as a return character, so if you have ten extra spaces at the end of a line, it's treated the same as ten returns instead of the one you probably intended.
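A quick sketch of the fix (the file name is illustrative, and sed -i here is the GNU form): strip trailing whitespace before converting the line endings.

```shell
# reproduce the problem: a line with seven trailing spaces, 32 characters + newline
printf 'This is an example string       \n' > outfile.txt
wc -c outfile.txt        # 33 bytes: 32 characters plus the newline

# strip trailing spaces and tabs from every line
sed -i 's/[ \t]*$//' outfile.txt
wc -c outfile.txt        # 26 bytes: the seven trailing spaces are gone
```

Running unix2dos on the stripped file then gives CR/LF line endings with no spaces before them, matching what Kedit saves.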
unix2dos converts the line-break characters at the end of each line from unix line breaks (10) to dos line breaks (13, 10).
Kedit could possibly change the encoding of the file (for example from ANSI to UTF-8).
You can change the encoding of a file with the iconv utility (on a Linux box).
