Ruby. NUL chars after reading simple file - ruby

I'm reading simple text files using Ruby for further regex processing and suddenly I see that str NUL after each printable character. Totally lost, where it comes from, I tested typing simple text in Notepad, saving as txt file and still getting those. I'm on W machine, didn't have this before.
How I can process it, probably replace them, not sure how to refer to them.
My regex doesn't work with them, tried several ways, using SciTE for run.
e.g. use presented as uNULsNULeNUL and not equal to use
puts File.read(file_name)
puts '____________________'
File.open(file_name, "r") do |f|
f.each_line do |line|
puts 'Line.....' + line
end
end
---------------------- below pic on content of file and output:

This file is probably in UTF-16 format. You'll need to read it in that way:
File.open(file_name, "r:UTF-16LE") do |f|
# ...
end
That format is the default in Windows.
You can always fix this by re-saving the file as UTF-8.

Related

ruby match string or space/tab at the beginning of a line and insert uniq lines to a file

This is my code:
File.open(file_name) do |file|
file.each_line do |line|;
if line =~ (/SAPK/) || (line =~ /^\t/ and tabs == true) || (line =~ /^ / and spaces == true)
file = File.open("./1.log", "a"); puts "found a line #{line}"; file.write("#{line}".lstrip!)
end
end
end
File.open("./2.log", "a") { |file| file.puts File.readlines("./1.log").uniq }
I want to to insert all the lines that match a specific string, start with tab or start with space to a file 1.log, all lines should be with space/tab at the beginning so I removed them.
I want get the unique lines in 1.log and write them to 2.log
It will be great if some can go over the code and tell me if something is not correct.
When using files in Ruby, what is the difference between the w+ and a modes?
I know:
w+ - Create an empty file for both reading and writing.
a - Append to a file.The file is created if it does not exist.
But both options append to the file, I though w+ should behave like >, instead of >> ,so I guess w+ also like >> ?
Thanks !!
There's a lot of confusion in this code and it isn't helped by your habit of jamming things on a single line for no reason. Do try and keep your code clean, as the functionality should be obvious. There's also a lot of quirky anti-patterns like stringifying strings and testing booleans vs booleans which you should really avoid doing.
One thing you'll want to do is employ Tempfile for those situations where you need an intermediate file.
Here's a reworked version that's cleaned up:
Tempfile.open do |temp|
File.open(file_name) do |input|
input.each_line do |line|
if line.match(/SAPK/) || (line.match(/^\t/) and tabs) || (line.match(/^ /) and spaces)
puts "found a line #{line}"
temp.write(line.lstrip!)
end
end
end
File.open("./2.log", "w+") do |file|
# Rewind the temporary file to read data back
temp.rewind
file.write(temp.readlines.uniq)
end
end
Now a and w+ are largely similar, it's just two ways that are offered for people familiar with whatever notation. It's like how Array has both length and size which do the same thing. Pick one and use it consistently or your code will be confusing.
My criticism over things like x == true is because something that narrowly specific usually means that x could take on a multitude of values and true is one particular case we're trying to handle, something that implies that we should be aware it might be false and many other things. It's a red herring and will only invite questions.

Ruby file output issue

I have written some ruby to automate batch file creation, the problem lies with the resulting output in the GUI;
The files are outputted, but the formatting looks very strange indeed. Also the filenames are all ending in '.txt' but MacOS does not see it this way. i.e. You cannot click to open in Textedit.
Code is as follows;
puts "Please enter amount of files to create: "
file_count = gets.to_i
puts "Thanks! Enter a filename header: "
file_head = gets
puts "And a suffix?"
suffix = gets
puts "Please input your target directory:"
Dir.chdir(gets.chomp)
while file_count != 0
filename = "#{file_head}_#{file_count}#{suffix}"
File.open(filename, "w") {|x| x.write("This is #{filename}.")}
file_count -= 1
end
Tips on shortening length or refactoring are always welcome.
The Kernel#gets documentation contains:
The separator is included with the contents of each record.
By default the separator is a newline (see $/). So both file_head and suffix end with a newline character. filename also does, of course. Thus the extension of your files isn't .txt as it's actually ".txt\n" (in Ruby string notation). The application takes the newline character literally and continues writing the filename on a new line. That's why it looks so strange!
You already know a way to fix it: call String#chomp to get rid of the trailing newline (the separator). See the line in your code that contains Dir.chdir for an example.

Decode base64 string and write to file

I'm trying to read file which contains encoded base64 string and write decoded output into another file. My Input.txt contains a base64 string, something like:
PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz48cmV2aWV3LWNhc2UgY3JlYXRl\r\nZGF0ZT0iMTMvTWFyLzIwMTQgMDk6MDQ6NTEiIHN5c3RlbT0iVHJhZmlndXJhX1RlbXBsYXRlX01h\r\nbmFnZW1lbnRfdjUuMSIgYmF0Y2hpZD0iMCIgdHJhbnNhY3Rpb25ubz0iMSIgYmF0Y2huYW1lPSJH\r\nVUlEKGY1NWRmYjgwODQ4ZDQ3YzliZmVhYTg3YzMyZDQyNDQyKS1HTE9CQUxfSU5WT0lDRS1FTkdM\r\nSVNIIiB2ZXJzaW9uPSI1LjEuMi44ICBidWlsZCA1MjUzOSI+PHRyYW5zYWN0aW9uPjxvYmplY3Rz\r\nPjxvYmplY3QgY2xhc3M9IlRoXzE5NTQwMDk3OTRfNl9tb2RlbCIgbmFtZT0ibW9kZWwiPjxwcm9w\r\nZXJ0eSBuYW1lPSJUaXRsZSIgdmFsdWU9IlByb3Zpc2lvbmFsIEludm9pY2UiLz48cHJvcGVydHkg\r\nbmFtZT0iR3JvdXBDb21wYW55Ij48b2JqZWN0IGNsYXNzPSJUaF8xOTU0MDA5Nzk0XzZfR3JvdXBD\r\nb21wYW55IiBuYW1lPSJHcm91cENvbXBhbnkiPjxwcm9wZXJ0eSBuYW1lPSJOYW1lIiB2YWx1ZT0i\r\nVHJhZmlndXJhIEJlaGVlciBCLlYuIEFNU1RFUkRBTSwgQlJBTkNIIE9GRklDRSBMVUNFUk5FIi8+\r\nPHByb3BlcnR5IG5hbWU9IkFkZHJlc3MiIHZhbHVlPSJaPz9yaWNoc3RyYXNzZSAzMSIgaW5kZXg9\r\nIjAiLz48cHJvcGVydHkgbmFtZT0iQWRkcmVzcyIgdmFsdWU9Ikx1Y2VybmUiIGluZGV4PSIxIi8+\r\nPHByb3BlcnR5IG5hbWU9IkFkZHJlc3MiIHZhbHVlPSI2MDAyIiBpbmRleD0iMiIvPjxwcm9wZXJ0\r\neSBuYW1lPSJBZGRyZXNzIiB2YWx1ZT0iU3dpdHplcmxhbmQiIGluZGV4PSIzIi8+PHByb3BlcnR5\r\nIG5hbWU9IlBob25lTnVtYmVyIiB2YWx1
This string is created on server side with Java apache codec.binary.Base64 library. This string is captured with Fiddler when two different web services communicates with each other. Sometimes I have no access to the another web-service, that is why I sniff messages between services. In addition I use Ruby to automate some routine tasks and decided this time to use Ruby again. For encoding captured base64 string I use next snippet of code:
require "base64"
content = File.read('Input.txt')
decode_base64_content = Base64.decode64(content)
File.open("Output.txt", "wb") do |f|
f.write(decode_base64_content)
end
But output looks malformed, like <?xml version="1.0" encoding="UTF-8"?><review-case create®vFFSТ#2фЦ"у#B“ЈCЈS"7—7FVУТ%G&f–wW&хFVЧЖFUфЦзnagement_v5.1" ba and so on. Can you please advise on what I'm doing wrong? I use Ruby 1.9.3 on Windows 7 and Ubuntu 12.04.
I do not know how you manage to do this, but the line endings \r\n in your string seem to be there as 4-byte character sequences, not as 2-byte escaped CRLF. If I copy your file into a ruby string with single ticks:
unescaped='PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz48cmV2aWV3LWNhc2UgY3JlYXRl\r\nZGF0ZT0iMTMvTWFyLzIwMTQgMDk6MDQ6NTEiIHN5c3RlbT0iVHJhZmlndXJhX1RlbXBsYXRlX01h\r\nbmFnZW1lbnRfdjUuMSIgYmF0Y2hpZD0iMCIgdHJhbnNhY3Rpb25ubz0iMSIgYmF0Y2huYW1lPSJH'
Base64.decode64(unescaped)
#=> garbled text for every second line
if I do the same with double quotes (which respect the escape sequences):
escaped="PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz48cmV2aWV3LWNhc2UgY3JlYXRl\r\nZGF0ZT0iMTMvTWFyLzIwMTQgMDk6MDQ6NTEiIHN5c3RlbT0iVHJhZmlndXJhX1RlbXBsYXRlX01h\r\nbmFnZW1lbnRfdjUuMSIgYmF0Y2hpZD0iMCIgdHJhbnNhY3Rpb25ubz0iMSIgYmF0Y2huYW1lPSJH"
Base64.decode64(escaped)
#=> all is well that ends well
Therefore the problem seems to occur when you write the file. It can be amended in Ruby though:
unescaped='PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz48cmV2aWV3LWNhc2UgY3JlYXRl\r\nZGF0ZT0iMTMvTWFyLzIwMTQgMDk6MDQ6NTEiIHN5c3RlbT0iVHJhZmlndXJhX1RlbXBsYXRlX01h\r\nbmFnZW1lbnRfdjUuMSIgYmF0Y2hpZD0iMCIgdHJhbnNhY3Rpb25ubz0iMSIgYmF0Y2huYW1lPSJH'
Base64.decode64(unescaped)
escaped=unescaped.gsub('\\r', "\r").gsub('\\n', "\n")
Base64.decode64(escaped)
#=> now you should be fine again
but of course the correct solution would be to store the file correctly.
Given your current file the following should work:
require "base64"
content = File.read('Input.txt')
content.gsub!('\\r', "\r")
content.gsub!('\\n', "\n")
decode_base64_content = Base64.decode64(content)
File.open("Output.txt", "wb") do |f|
f.write(decode_base64_content)
end
Please do post some output if it does not.

How to work with multibyte strings read from a file in Ruby? See inside

I'm a newbie to Ruby with Perl background. And I got some problems with .reverse of a multibyte string read from an utf-8 encoded file.
Code:
#!C:\Ruby200-x64\bin\ruby
puts "Content-Type:text/plain;charset=utf8\n\n" #I execute it via CGI
$: << "."
puts "А это строка".reverse #mb-string output is pretty fine
#but when I do the following, it fails;
file = File.open('test_rb_file.txt','r')
file.each_line {|line| puts line.reverse} #puts line works good, but not puts line.reverse
The script itself is in utf-8. The test_rb_file.txt is in utf-8. So, when I try to output a multibyte string - all ok, but when I try to read it from a file and reverse - it fails.
I think, specifying the encoding of the file I read from (test_rb_file.txt) would do the trick, but I don't know how to do that so far. And I maybe wrong about that.
Any ideas to fix the problem? Thanks in advance
UPD All fixed, thanks everyone. Following thing sets the encoding of input file and fixes the problem:
file = File.open('test_rb_file.txt','r:UTF-8')
File.open('test_rb_file.txt','r:UTF-8')
To check encoding of a String "YourString".encoding

How to detect and handle different EOL in Ruby?

I am trying to process a CSV file that can either be generated with CF or LF as an EOL marker. When I try to read the file with
infile = File.open('my.csv','r')
while line = infile.gets
...
The entire 20MB file is read in as one line.
How can I detect and handle properly?
TIA
I would slurp the file, normalize the input, and then feed it to CSV:
raw = File.open('my.csv','rb',&:read).gsub("\r\n","\n")
CSV.parse(raw) do |row|
# use row here...
end
The above uses File.open instead of IO.read due to slow file reads on Windows Ruby.
When in doubt, use a regex.
> "how\r\nnow\nbrown\r\ncow\n".split /[\r\n]+/
=> ["how", "now", "brown", "cow"]
So, something like
infile.read.split(/[\r\n]+/).each do |line|
. . .
end
Now, it turns out that the standard library CSV already understands mixed line endings, so you could just do:
CSV.parse(infile.read).each do |line|
. . .

Resources