Read files line by line with \r, \n or \r\n as line separator - ruby

I want to process files line by line. However, these files have different line separators: "\r", "\n" or "\r\n". I don't know which one they use or which kind of OS they come from.
I have two solutions:
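Use shell commands to translate these separators to "\n" before Ruby reads the input. (tr maps single characters, so tr '\r\n' '\n' would turn every "\r\n" into two newlines; a substitution on the two-character sequence avoids that.)
perl -pe 's/\r\n?/\n/g' file |
ruby process.rb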
Read the whole file and gsub these separators:
text = File.read('xxx.txt')
text.gsub!(/\r\n?/, "\n")
text.each_line do |line|
  # do something with line
end
But the second solution is not good when the file is huge, because it reads the whole file into memory. Is there any other idiomatic and efficient solution in Ruby?

I suggest you first determine the line separator. I've assumed that you can do that by reading characters until you encounter "\n" or "\r" (or reach the end of the file, in which case we can regard "\n" as the line separator). If the character "\n" is found, I assume that to be the separator; if "\r" is found I attempt to read the next character. If I can do so and it is "\n", I return "\r\n" as the separator. If "\r" is the last character in the file or is followed by a character other than "\n", I return "\r" as the separator.
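def separator(fname)
  File.open(fname) do |f|          # block form closes the file afterwards
    enum = f.each_char
    c = ""                         # stays "" if the file is empty
    loop do                        # loop rescues the StopIteration raised at end of file
      c = enum.next
      case c
      when "\n" then break
      when "\r"
        c += "\n" if enum.peek == "\n"  # peek raises StopIteration if "\r" is the last char
        break
      end
    end
    c.start_with?("\r", "\n") ? c : "\n"
  end
end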
Then process the file line by line:
def process(fname)
  sep = separator(fname)
  IO.foreach(fname, sep) { |line| puts line }
end
I haven't converted "\r" or "\r\n" to "\n", but that is easy to do: open a file for writing and, in process, write each line to it with its separator removed (line.chomp(sep)) using puts, which ends every line with "\n".
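A minimal sketch of that conversion, assuming the separator has already been detected (for example with separator above) and is passed in; the method name normalize_newlines is hypothetical. It streams the input with IO.foreach, so only one line is held in memory at a time. It assumes the file uses a single separator throughout.

```ruby
# Hypothetical helper: copy in_name to out_name using "\n" line endings.
# sep is the input's separator ("\n", "\r" or "\r\n"), detected beforehand.
def normalize_newlines(in_name, out_name, sep)
  File.open(out_name, "w") do |out|
    # Read one sep-terminated line at a time instead of the whole file.
    IO.foreach(in_name, sep) do |line|
      out.puts line.chomp(sep)   # drop the old separator; puts appends "\n"
    end
  end
end
```

Usage: normalize_newlines("temp", "temp.out", separator("temp")).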
Let's try it (for clarity I show the value returned by separator):
fname = "temp"
IO.write(fname, "slash n line 1\nslash n line 2\n")
#=> 30
separator(fname)
#=> "\n"
process(fname)
# slash n line 1
# slash n line 2
IO.write(fname, "slash r line 1\rslash r line 2\r")
#=> 30
separator(fname)
#=> "\r"
process(fname)
# slash r line 1
# slash r line 2
IO.write(fname, "slash r slash n line 1\r\nslash r slash n line 2\r\n")
#=> 48
separator(fname)
#=> "\r\n"
process(fname)
# slash r slash n line 1
# slash r slash n line 2

Related

Ruby - How to get rid of escape character and "\n" while converting string to hash value?

In Ruby, I'm trying to convert a string to a hash value. It shows up with escape characters and "\n" in the string.
For example:
hashex = { keyex: 'example "test" line 1
line 2 "test2"'}
puts hashex
It is printing the result as
{:keyex=>"example \"test\" line 1\n line 2 \"test2\""}
I need to get the result as
{ keyex: 'example "test" line 1
line 2 "test2"'}
preserving the actual newline (not "\n") and the double quotes. Kindly help.
Note
{:keyex=>"example \"test\" line 1\n line 2 \"test2\""}
is just the way Ruby represents the hash. It is exactly the same hash as:
{ keyex: 'example "test" line 1
line 2 "test2"'}
even though it might look different.
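A quick way to check this, sketched below: compare the hash with one written in the escaped, double-quoted form that inspect shows. Printing the string value itself (rather than the whole hash) shows the real newline and the unescaped quotes.

```ruby
# Single-quoted literal spanning two lines (the second line starts with a space).
hashex = { keyex: 'example "test" line 1
 line 2 "test2"' }

# The same value written in the escaped, double-quoted form that inspect shows.
escaped = { keyex: "example \"test\" line 1\n line 2 \"test2\"" }

puts hashex == escaped   # prints true: both literals build equal hashes
puts hashex[:keyex]      # printing the string itself shows the real newline
```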
Code
You could post-process the output of inspect: replace each \n escape with a real newline, then every " with ', and finally each \' (originally \") with ":
hashex = { keyex: 'example "test" line 1
line 2 "test2"'}
puts hashex.inspect.gsub("\\n", "\n").gsub('"', "'").gsub("\\'",'"')
# {:keyex=>'example "test" line 1
# line 2 "test2"'}

Ruby - Reading a file causes an extra line while printing

How can I avoid a new line when I use puts line + "test"?
Example code:
File.open("test.txt", "r") do |f|
f.each_line do |line|
puts line + "test" #=>line1\ntest
#puts "test" + line #=> testline1
end
end
When I use:
puts "test" + line
It shows:
testline1
(line1 being the only thing in test.txt)
However,
puts line + "test"
looks like:
line1
test
Is there anyway of stopping it from producing the extra line?
If you want to strip out the newline, use String#chomp to take care of it.
http://apidock.com/ruby/v1_9_3_392/String/chomp
puts line.chomp + "test"
Use String#strip to strip out all the leading and trailing whitespace characters (including new line):
puts line.strip + "test"
# => line1test
To delete only the trailing whitespaces, you can use String#rstrip:
puts line.rstrip + "test"
# => line1test
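Another option, not taken from the answers above: since Ruby 2.4, IO.foreach and File.foreach accept chomp: true, which removes the trailing newline from each line as it is read. A minimal sketch; the method name lines_with_suffix is hypothetical.

```ruby
# Hypothetical helper: read path line by line, appending suffix to each line.
def lines_with_suffix(path, suffix)
  result = []
  # chomp: true (Ruby 2.4+) strips each line's trailing newline while reading.
  File.foreach(path, chomp: true) { |line| result << line + suffix }
  result
end

# Usage with the question's file:
# puts lines_with_suffix("test.txt", "test")
```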

Check if string1 is before string2 on the same line

I am trying to match comment lines in C#/SQL code. CREATE may come before or after /*. They can be on the same line.
line6 = " CREATE /* this is ACTIVE line 6"
line5 = " charlie /* CREATE inside this is comment 5"
In the first case, it will be an active line; in the second, it will be a comment. I could probably do something with charindex, but maybe there is a simpler way.
regex1 = /\/\*|\-\-/
if (line6 =~ regex1) then puts "Match comment___" + line6 else puts '____' end
if (line5 =~ regex1) then puts "Match comment___" + line5 else puts '____' end
With the regex
r = /
\/ # match forward slash
\* # match asterisk
\s+ # match > 0 whitespace chars
CREATE # match chars
\b # match word break (to avoid matching CREATED)
/x # x flag: extended (free-spacing) mode, which allows these comments
you can return an array of the comment lines thus:
[line6, line5].select { |l| l =~ r }
#=> [" charlie /* CREATE inside this is comment 5"]
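The question also mentions comparing positions (charindex). A minimal sketch of that idea in Ruby, using a hypothetical active_create? helper: it finds the first CREATE and the first comment opener (/* or --) with String#index and checks which comes first. It looks at a single line only, so it ignores comments opened on earlier lines and /* or -- inside string literals.

```ruby
# Hypothetical helper: true if CREATE appears before any /* or -- on the line.
def active_create?(line)
  create_at  = line.index(/\bCREATE\b/)   # nil if the line has no CREATE
  comment_at = line.index(%r{/\*|--})      # nil if the line opens no comment
  return false if create_at.nil?
  comment_at.nil? || create_at < comment_at
end

line6 = " CREATE /* this is ACTIVE line 6"
line5 = " charlie /* CREATE inside this is comment 5"
p active_create?(line6)  # => true
p active_create?(line5)  # => false
```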

Why does my IO.write insert a "%" sign at the end of output?

I use this line to read from temp.dat, which contains "100"
fahrenheit = IO.read("temp.dat").to_i * 9 / 5 + 32
Now, to write this result to another file:
Method 1
f = File.new("temp.out", "w")
f.puts fahrenheit
f.close
cat temp.out
212
Method 2
IO.write("temp.out", fahrenheit)
cat temp.out
212%
It doesn't. The file contains just 212: unlike puts, IO.write does not append a newline. That % comes from your shell, not from the file; some shells, such as zsh, print a % marker when the previous output did not end with a newline. POSIX text files should end every line, including the last, with an end-of-line character.
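If you want IO.write to produce a properly terminated text file, append the newline yourself. A short sketch (the value is computed inline instead of being read from temp.dat):

```ruby
fahrenheit = 100 * 9 / 5 + 32   # 212, as in the question

# Append "\n" so the file ends with an end-of-line character.
IO.write("temp.out", "#{fahrenheit}\n")

p File.binread("temp.out")   # => "212\n"
```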

How to add string "\n" literally at the end of each line in Ruby?

Here is a string str:
str = "line1
line2
line3"
We would like to add the literal string "\n" (a backslash followed by n) to the end of each line:
str = "line1 \n
line2 \n
line3 \n"
A method is defined:
def mod_line(str)
s = ""
str.each_line do |l|
s += l + '\\n'
end
end
The problem is that '\n' is a line feed and was not added to the end of the str even with escape \. What's the right way to add '\n' literally to each line?
String#gsub/String#gsub! plus a very simple regular expression can be used to achieve that:
str = "line1
line2
line3"
str.gsub!(/$/, ' \n')
puts str
Output:
line1 \n
line2 \n
line3 \n
The platform-independent solution:
str.gsub(/\R/) { " \\n#{$~}" }
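It searches for line breaks (\R matches "\n", "\r\n", "\r" and a few other line terminators) and replaces each with a space and a literal \n followed by the original line break. Because line3 has no line break after it, this version leaves the last line unchanged.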
If you want \n to be interpreted as a newline character, you need to put it in double quotes:
"\n"
Your attempt:
'\\n'
only escapes the backslash, which is actually redundant. With or without escaping on the backslash, it gives you a backslash followed by the letter n.
Also, your method mod_line returns the result of str.each_line, which is the original string str. You need to return the modified string s:
def mod_line(str)
...
s
end
And by the way, be aware that every line of the original string except the last already ends with "\n", so appending a double-quoted "\n" adds a second line break to each of those lines, producing an empty line after each one.
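Putting those points together, a sketch of a mod_line that returns a new string in which each line ends with a space and a literal backslash-n, followed by that line's original line break (the last line has none). It follows the intent stated in the question rather than any single answer above.

```ruby
def mod_line(str)
  s = +""                          # mutable accumulator
  str.each_line do |line|
    body = line.chomp              # line without its "\n", "\r\n" or "\r"
    s << body << " \\n"            # " \\n" is a space, a backslash and the letter n
    s << line[body.length..-1]     # re-append the original line break ("" on the last line)
  end
  s                                # return the new string, not str.each_line's result
end

str = "line1
line2
line3"
puts mod_line(str)
# line1 \n
# line2 \n
# line3 \n
```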
This is the closest I got to it.
def mod_line(str)
s = ""
str.each_line do |l|
s += l
end
p s
end
Using p instead of puts prints the inspect form of the string, so each newline is shown as \n.
