Ruby - detecting the end of the read file - ruby

I upload a file through a form, and the controller reads it. My problem is that I don't know how to detect the end of the file (i.e. when to stop the loop). This part of the code looks like this:
dat = params[:data]
while(d = dat.read)
puts d
break if d.eof #this doesn't work
end
The result (apart from the error about eof) is an infinite while loop.

From http://ruby-doc.org/core-1.9.3/IO.html#method-i-read:
If length is omitted or is nil, it reads until EOF and the encoding conversion is applied. It returns a string even if EOF is met at beginning.
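So a single call to dat.read, without a loop, returns the whole file. Once the file is at EOF, read returns an empty string, which is still truthy in Ruby, so the while loop never ends; d.eof fails because d is a String, not the file object.
Edit: if you want all the lines of the file, use dat.readlines, which returns an Array of Strings.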
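A minimal sketch of the difference, assuming the uploaded object behaves like an IO. A StringIO with made-up contents stands in for params[:data] here; the variable names are only for illustration:

```ruby
require 'stringio'

# StringIO stands in for the uploaded file; the contents are made up.
dat = StringIO.new("first line\nsecond line\n")

whole  = dat.read    # one call reads everything up to EOF
again  = dat.read    # at EOF, read without a length returns "" (truthy)
at_end = dat.eof?    # eof? belongs to the IO object, not to the String

dat.rewind
lines = dat.readlines  # the same data as an Array of lines

dat.rewind
count = 0
dat.each_line { |_line| count += 1 }  # stops by itself at EOF

puts whole
```

If the file has to be processed piece by piece, each_line (or read with an explicit length, which returns nil at EOF) gives a loop that terminates on its own.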

Related

Ruby script which can replace a string in a binary file to a different, but same length string?

I would like to write a Ruby script (repl.rb) which can replace a string in a binary file (string is defined by a regex) to a different, but same length string.
It works like a filter, outputs to STDOUT, which can be redirected (ruby repl.rb data.bin > data2.bin), regex and replacement can be hardcoded. My approach is:
#!/usr/bin/ruby
fn = ARGV[0]
regex = /\-\-[0-9a-z]{32,32}\-\-/
replacement = "--0ca2765b4fd186d6fc7c0ce385f0e9d9--"
blk_size = 1024
File.open(fn, "rb") {|f|
while not f.eof?
data = f.read(blk_size)
data.gsub!(regex, replacement)
print data
end
}
My problem is that a match is missed when the string straddles a block boundary. For example, when blk_size=1024 and the first occurrence of the string begins at byte position 1000, I will not find it in the "data" variable. The same happens in the next read cycle. Should I process the whole file twice with different block sizes to avoid this worst-case scenario, or is there another approach?
I would posit that a tool like sed might be a better choice for this. That said, here's an idea: Read block 1 and block 2 and join them into a single string, then perform the replacement on the combined string. Split them apart again and print block 1. Then read block 3 and join block 2 and 3 and perform the replacement as above. Split them again and print block 2. Repeat until the end of the file. I haven't tested it, but it ought to look something like this:
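File.open(fn, "rb") do |f|
  # Prime the buffer with block 1 so that every pass below joins two blocks.
  this_block = f.read(blk_size).to_s.gsub(regex, replacement)
  until f.eof?
    # Join the pending block with the next one and replace across the seam.
    data = (this_block + f.read(blk_size)).gsub(regex, replacement)
    # The replacement has the same length, so block sizes stay stable:
    # emit the first block and keep the rest for the next pass.
    last_block, this_block = data.slice!(0, blk_size), data
    print last_block
  end
  print this_block
end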
There's probably a nontrivial performance penalty for doing it this way, but it could be acceptable depending on your use case.
Maybe a cheeky
f.pos = f.pos - replacement.size
at the end of the while loop, just before reading the next chunk.
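The overlap idea can also be sketched by carrying only a short tail from one read to the next instead of a whole block. This is a sketch under stated assumptions, not a tested production filter: replace_fixed_length and its parameters are hypothetical names, and it assumes every match is exactly match_len bytes long and the replacement has the same length.

```ruby
require 'stringio'

# Hypothetical helper: copies input to output, replacing each match of regex
# (assumed to be exactly match_len bytes) with the same-length replacement.
def replace_fixed_length(input, output, regex, replacement, match_len, blk_size = 1024)
  keep  = match_len - 1  # longest part of a match that a read can cut off
  carry = ''.b           # unwritten tail of the previous buffer
  while (chunk = input.read(blk_size))
    buffer = (carry + chunk).gsub(regex, replacement)
    if buffer.bytesize > keep
      # Everything except the last keep bytes can no longer start a cut-off match.
      output.write(buffer.byteslice(0, buffer.bytesize - keep))
      carry = buffer.byteslice(buffer.bytesize - keep, keep)
    else
      carry = buffer
    end
  end
  output.write(carry)
end

# Made-up data: the 36-byte match starts at byte 1000 and straddles the 1024 boundary.
replacement = "--0ca2765b4fd186d6fc7c0ce385f0e9d9--"
data = ("x" * 1000) + "--" + ("a" * 32) + "--" + ("y" * 100)
out  = StringIO.new(String.new)
replace_fixed_length(StringIO.new(data), out, /--[0-9a-z]{32}--/, replacement, 36)
```

Holding back match_len - 1 bytes is enough because a match that is cut off by a read can have at most that many bytes in the earlier buffer.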

Ruby: How do you search for a substring, and increment a value within it?

I am trying to change a file by finding this string:
<aspect name=\"lineNumber\"><![CDATA[{CLONEINCR}]]>
and replacing {CLONEINCR} with an incrementing number. Here's what I have so far:
file = File.open('input3400.txt' , 'rb')
contents = file.read.lines.to_a
contents.each_index do |i|contents.join["<aspect name=\"lineNumber\"><![CDATA[{CLONEINCR}]]></aspect>"] = "<aspect name=\"lineNumber\"><![CDATA[#{i}]]></aspect>" end
file.close
But this seems to go on forever - do I have an infinite loop somewhere?
Note: my text file is 533,952 lines long.
You are repeatedly concatenating all the elements of contents, making a substitution, and throwing away the result. This is happening once for each line, so no wonder it is taking a long time.
The easiest solution would be to read the entire file into a single string and use gsub on that to modify the contents. In your example you are inserting the (zero-based) file line numbers into the CDATA. I suspect this is a mistake.
This code replaces all occurrences of <![CDATA[{CLONEINCR}]]> with <![CDATA[1]]>, <![CDATA[2]]> etc. with the number incrementing for each matching CDATA found. The modified file is sent to STDOUT. Hopefully that is what you need.
File.open('input3400.txt' , 'r') do |f|
i = 0
contents = f.read.gsub('<![CDATA[{CLONEINCR}]]>') { |m|
m.sub('{CLONEINCR}', (i += 1).to_s)
}
puts contents
end
If what you want is to replace CLONEINCR with the line number, which is what your above code looks like it's trying to do, then this will work. Otherwise see Borodin's answer.
output = File.readlines('input3400.txt').map.with_index do |line, i|
line.gsub "<aspect name=\"lineNumber\"><![CDATA[{CLONEINCR}]]></aspect>",
"<aspect name=\"lineNumber\"><![CDATA[#{i}]]></aspect>"
end
File.write('input3400.txt', output.join(''))
Also, you should be aware that when you read the lines into contents, you are creating a String distinct from the file. You can't operate on the file directly. Instead you have to create a new String that contains what you want and then overwrite the original file.
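To illustrate the read, modify, write-back cycle described above, here is a small sketch on a made-up three-line sample (a Tempfile stands in for input3400.txt). Note that, as a simplification, it replaces every {CLONEINCR} it finds, not only those inside the CDATA wrapper:

```ruby
require 'tempfile'

# Hypothetical three-line sample standing in for input3400.txt.
file = Tempfile.new('clone')
file.write("<![CDATA[{CLONEINCR}]]>\nplain line\n<![CDATA[{CLONEINCR}]]>\n")
file.close

counter = 0
# The block runs once per match, so the counter increments per occurrence.
updated = File.read(file.path).gsub('{CLONEINCR}') { (counter += 1).to_s }
# The file on disk only changes once the new String is written back.
File.write(file.path, updated)

result = File.read(file.path)
puts result
file.unlink
```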

Extract a single line string having "foo: XXXX"

I have a file with one or more key:value lines, and I want to pull a key:value out if key=foo. How can I do this?
I can get as far as this:
if File.exist?('/file_name')
content = open('/file_name').grep(/foo:??/)
I am unsure about the grep portion, and also once I get the content, how do I extract the value?
People like to slurp the files into memory, which, if the file will always be small, is a reasonable solution. However, slurping isn't scalable, and the practice can lead to excessive CPU and I/O waits as content is read.
Instead, because you could have multiple hits in a file, and you're comparing the content line-by-line, read it line-by-line. Line I/O is very fast and avoids the scalability problems. Ruby's File.foreach is the way to go:
File.foreach('path/to/file') do |li|
puts $1 if li[/foo:\s*(\w+)/]
end
Because there are no samples of actual key/value pairs, we're shooting in the dark for valid regex patterns, but this is the basis for how I'd solve the problem.
Try this:
IO.readlines('key_values.txt').find_all{|line| line.match('key1')}
I would recommend reading the file into an array and selecting only the lines you need:
regex = /\A\s?key\s?:/
results = File.readlines('file').inject([]) do |f,l|
l =~ regex ? f << "key = %s" % l.sub(regex, '') : f
end
This detects lines starting with key: and adds them to results in the form key = value,
where value is the portion that follows key:
So if you have a file like this:
key:1
foo
key:2
bar
key:3
You'll get results like this:
key = 1
key = 2
key = 3
Does that make sense?
value = File.open('/file_name').read.match("key:(.*)").captures[0] rescue nil
File.read('file_name')[/foo: (.*)/, 1]
#=> XXXX
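Putting the line-by-line approach and the capture-group approach together, a sketch might look like this. The sample data is an assumption, since the question shows no actual key/value lines; a Tempfile stands in for the real file:

```ruby
require 'tempfile'

# Hypothetical sample data; the real file format was not shown in the question.
sample = Tempfile.new('key_values')
sample.write("bar: 1\nfoo: XXXX\nbaz: 3\n")
sample.close

value = nil
File.foreach(sample.path) do |line|
  # Anchor at the start of the line so that "xfoo: ..." does not match.
  if (m = line.match(/\Afoo:\s*(.*)/))
    value = m[1].chomp
    break  # stop reading as soon as the first hit is found
  end
end

puts value
sample.unlink
```

If no line matches, value stays nil, so there is no need for a rescue.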

Using Ruby to automate a large directory system

So I have the following little script to make a file setup for organizing reports that we get.
#This script is to create a file structure for our survey data
require 'fileutils'
f = File.open('CustomerList.txt') or die "Unable to open file..."
a = f.readlines
x = 0
while a[x] != nil
Customer = a[x]
FileUtils.mkdir_p(Customer + "/foo/bar/orders")
FileUtils.mkdir_p(Customer + "/foo/bar/employees")
FileUtils.mkdir_p(Customer + "/foo/bar/comments")
x += 1
end
Everything seems to work before the while, but I keep getting:
'mkdir': Invalid argument - Cust001_JohnJacobSmith(JJS) (Errno::EINVAL)
Which would be the first line from the CustomerList.txt. Do I need to do something to the array entry to be considered a string? Am I mismatching variable types or something?
Thanks in advance.
The following worked for me:
IO.foreach('CustomerList.txt') do |customer|
customer.chomp!
["orders", "employees", "comments"].each do |dir|
FileUtils.mkdir_p("#{customer}/foo/bar/#{dir}")
end
end
with data like so:
$ cat CustomerList.txt
Cust001_JohnJacobSmith(JJS)
Cust003_JohnJacobSmith(JJS)
Cust002_JohnJacobSmith(JJS)
A few things to make it more idiomatic Ruby:
Use blocks when opening a file or iterating through arrays; that way you don't need to worry about closing the file or indexing into the array yourself.
As noted by @inger, local variable names start with a lowercase letter: customer. A capitalized name such as Customer is a constant.
When you want the value of a variable in a string, interpolating with #{} is more idiomatic than concatenating with +.
Also note that we took off the trailing newline using chomp! (which changes the variable in place, as signaled by the trailing ! on the method name). That newline is most likely what made mkdir reject the name.
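The points above can be tried out safely in a throwaway directory. This is only a sketch: the customer names are copied from the sample data, and Dir.mktmpdir is used so nothing is created in the real working directory.

```ruby
require 'fileutils'
require 'tmpdir'

created = []
Dir.mktmpdir do |root|
  list = File.join(root, 'CustomerList.txt')
  File.write(list, "Cust001_JohnJacobSmith(JJS)\nCust002_JohnJacobSmith(JJS)\n")

  File.foreach(list) do |line|
    customer = line.chomp     # strip the trailing newline before building paths
    next if customer.empty?   # skip blank lines
    %w[orders employees comments].each do |dir|
      path = File.join(root, customer, 'foo', 'bar', dir)
      FileUtils.mkdir_p(path)
      created << path if File.directory?(path)
    end
  end
end

puts created.size
```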

Ruby: UTF-8 incorrect input

I have a .rb file that when run takes a string input for UTF-8, but for some reason the input is modified automatically. Here is an example of what my code looks like:
# encoding: UTF-8
.
.
.
print "Enter a UTF-8 input: "
text = gets.chomp
p text
So, if I input "\n\u001C\u0018\t\u001C", it prints out "\\n\\u001C\\u0018\\t\\u001C" which is not what I inputted!
Curious as I was, I compared the lengths, and it is the same 22. But, I know it is modified because when I run the text through a function in the same file, it reads it as the second one. I know this because when I ran my actual code through irb, it works as intended, but when I run it from the file, it doesn't do what I want.
EDIT: Sean answered the question I had about the printing, but it doesn't explain why when I use the value in text for a function within the same ruby file, it does not see it as it should. In other words, the function works perfectly on irb when I physically input the UTF string. So, if I input "\t\u001C\u001C".xor "key" to the function below, the result should be "bye". Once again, this works in irb, but it doesn't work when I run it from a file! When I run it from the file, it gives me a "'*': negative argument (ArgumentError)" when I don't get any errors running it from irb! Below is the function:
class String
def xor(key)
text = dup
b1 = text.unpack("U*")
b2 = key.unpack("U*")
longest = key.length #[b1.length,b2.length].max
b1 = [0]*(longest-b1.length) + b1
b2 = [0]*(longest-b2.length) + b2
result = b1.zip(b2).map{ |a,b| a^b }
result.pack("U*")
end
end
The reason this is happening is because you are using:
p text
vs
puts text
When you use p, ruby outputs the result of:
puts text.inspect
Which will show you the extra \'s in there that are being used as escape characters. If you just used puts you will see the expected result!
Cheers!
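As for the follow-up in the EDIT, one likely explanation (an assumption, since the calling code is not shown) is that text typed at a gets prompt is never escape-processed: typing \n gives a backslash followed by n, whereas the double-quoted literal entered in irb is turned into real control characters. The sketch below uses a single-quoted literal to stand in for the typed input:

```ruby
# What gets returns when the characters \n\u001C... are typed at the prompt:
# single quotes keep every backslash as a literal character.
typed = '\n\u001C\u0018\t\u001C'

# The same text as a double-quoted literal, as entered in irb:
# the escapes are interpreted, giving five control characters.
literal = "\n\u001C\u0018\t\u001C"

puts typed.length    # 22
puts literal.length  # 5
p typed              # inspect doubles each literal backslash
puts typed           # shows the text exactly as it was typed

# With a 3-character key, longest - b1.length becomes 3 - 22, a negative count.
error_message = nil
begin
  [0] * ('key'.length - typed.length)
rescue ArgumentError => e
  error_message = e.message
end
puts error_message
```

This would account for both observations: the length of 22, and the negative argument error raised by [0]*(longest-b1.length) when the input is longer than the key.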
