How do I split a byte sequence in Ruby and keep the delimiter? - ruby

I am reading a database file with dynamically sized columns in HEX in Ruby. I can successfully split the file into records by using this script:
open_file = IO.binread(path + file_name)
record_delimiters = ['FAFA', 'FEFE', 'FDFD']
# Regex with bytes is kinda finicky, so I went this route to avoid the
# pitfalls of escape characters and gsub. If anyone knows a better way to
# do this part, I am open to suggestions as well.
final_reg = '['
record_delimiters.each_with_index do |delim, index|
  standard_string = '\xFA-\xFA'
  standard_string[2, 2] = delim[0, 2]
  standard_string[7, 2] = delim[2, 2]
  final_reg += '|' unless index == 0
  final_reg += standard_string
end
final_reg += ']+'
reg = Regexp.new(final_reg.encode('UTF-8'), Regexp::IGNORECASE | Regexp::MULTILINE, 'n')
records = open_file.split(reg); nil
However, I would like to keep my delimiters for reference, because the delimiter denotes the 'type' contents of the record, i.e. 'uint, int, word, etc.'.
Ultimately I want the records to look like this:
["\xFE\xFE\x00\xF4\x35...", "\xFA\xFA\x03\x4F\x7A...", ...]
OR this:
["\xFE\xFE", "\x00\xF4\x35...", "\xFA\xFA", "\x03\x4F\x7A...", ...]
BUT DEFINITELY NOT THIS (which is what I have):
["\x00\xF4\x35...", "\x03\x4F\x7A...", ...]

Related

Ruby script which can replace a string in a binary file with a different, but same-length string?

I would like to write a Ruby script (repl.rb) which can replace a string in a binary file (the string is defined by a regex) with a different, but same-length string.
It works like a filter and outputs to STDOUT, which can be redirected (ruby repl.rb data.bin > data2.bin); the regex and replacement can be hardcoded. My approach is:
#!/usr/bin/ruby
fn = ARGV[0]
regex = /--[0-9a-z]{32}--/
replacement = "--0ca2765b4fd186d6fc7c0ce385f0e9d9--"
blk_size = 1024
File.open(fn, "rb") do |f|
  until f.eof?
    data = f.read(blk_size)
    data.gsub!(regex, replacement)
    print data
  end
end
My problem is that a match can straddle the boundary between two blocks. For example, when blk_size=1024 and the first occurrence of the string begins at byte position 1000, I will not find it in the "data" variable, and the same happens on the next read cycle. Should I process the whole file twice with different block sizes to avoid this worst-case scenario, or is there another approach?
I would posit that a tool like sed might be a better choice for this. That said, here's an idea: Read block 1 and block 2 and join them into a single string, then perform the replacement on the combined string. Split them apart again and print block 1. Then read block 3 and join block 2 and 3 and perform the replacement as above. Split them again and print block 2. Repeat until the end of the file. I haven't tested it, but it ought to look something like this:
File.open(fn, "rb") do |f|
last_block, this_block = nil
while not f.eof?
last_block, this_block = this_block, f.read(blk_size)
data = "#{last_block}#{this_block}".gsub(regex, str)
last_block, this_block = data.slice!(0, blk_size), data
print last_block
end
print this_block
end
There's probably a nontrivial performance penalty for doing it this way, but it could be acceptable depending on your use case.
Maybe a cheeky
f.pos = f.pos - replacement.size
at the end of the while loop, just before reading the next chunk.

Read a file into an associative array

I want to be able to read the file into an associative array where I can access the elements by the column head name.
My file is formatted as follows:
KeyName Val1Name Val2Name ... ValMName
Key1 Val1-1 Val2-1 ... ValM-1
Key2 Val1-2 Val2-2 ... ValM-2
Key3 Val1-3 Val2-3 ... ValM-3
.. .. .. .. ..
KeyN Val1-N Val2-N ... ValM-N
The only problem is I don't have a clue how to do it. So far I have:
scores = File.read("scores.txt")
lines = scores.split("\n")
lines.each { |x|
  y = x.to_s.split(' ')
}
This gets close to what I want, but I am still unable to get it into a format that is usable for me.
f = File.open("scores.txt") #get an instance of the file
first_line = f.gets.chomp #get the first line in the file (header)
first_line_array = first_line.split(/\s+/) #split the first line in the file via whitespace(s)
array_of_hash_maps = f.readlines.map do |line|
Hash[first_line_array.zip(line.split(/\s+/))]
end
#read the remaining lines of the file via `IO#readlines` into an array, split each read line by whitespace(s) into an array, and zip the first line with them, then convert it into a `Hash` object, and return a collection of the `Hash` objects
f.close #close the file
puts array_of_hash_maps #print the collection of the Hash objects to stdout
Can be done in 3 lines (This is why I love Ruby)
scores = File.readlines('/scripts/test.txt').map{|l| l.split(/\s+/)}
headers = scores.shift
scores.map!{|score|Hash[headers.zip(score)]}
Now scores contains your array of hashes.
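For example, with the sample layout above, a row can then be looked up by its column header (the values here are taken from the sample data):
row = scores.find { |h| h["KeyName"] == "Key1" }
row["Val2Name"]   #=> "Val2-1"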
Here is a verbose explanation
#open the file and read
#then split on new line
#then create an array of each line by splitting on space and stripping additional whitespace
scores = File.open('scores.txt', &:read).split("\n").map{|l| l.split(" ").map(&:strip)}
#shift the array to capture the header row
headers = scores.shift
#initialize an array to hold the score hashes
scores_hash_array = []
#loop through each line
scores.each do |score|
  #map the header value based on index with the line value
  scores_hash_array << Hash[score.map.with_index { |l, i| [headers[i], l] }]
end
#=>[{"KeyName"=>"Key1", "Val1Name"=>"Val1-1", "Val2Name"=>"Val2-1", "..."=>"...", "ValMName"=>"ValM-1"},
{"KeyName"=>"Key2", "Val1Name"=>"Val1-2", "Val2Name"=>"Val2-2", "..."=>"...", "ValMName"=>"ValM-2"},
{"KeyName"=>"Key3", "Val1Name"=>"Val1-3", "Val2Name"=>"Val2-3", "..."=>"...", "ValMName"=>"ValM-3"},
{"KeyName"=>"..", "Val1Name"=>"..", "Val2Name"=>"..", "..."=>"..", "ValMName"=>".."},
{"KeyName"=>"KeyN", "Val1Name"=>"Val1-N", "Val2Name"=>"Val2-N", "..."=>"...", "ValMName"=>"ValM-N"}]
scores_hash_array now has a hash for each row in the sheet.
You can try something like this:
fh = File.open("scores.txt","r")
rh={} #result Hash
fh.readlines.each{|line|
kv=line.split(/\s+/)
puts kv.length
rh[kv[0]] = kv[1..kv.length-1].join(",") #***store the values joined by ","
}
puts rh.inspect
fh.close
If you want an array of values instead, replace the last line in the loop with
rh[kv[0]] = kv[1..-1]

How do I get Ruby to search for a pattern on the tail of a local file?

Say I have a file blah.rb which is constantly being written to somehow and has patterns like:
bagtagrag" " hellobello " blah0 blah1 " trag kljesgjpgeagiafw blah2 " gneo" whatttjtjtbvnblah3
Basically, it's garbage. But I want to check for the blah that keeps coming up and find the latest value, i.e. the number following the last blah.
Hence, something like :
grep "blah"{$1} | tail var/test/log
My file is at var/test/log and, as you can see, I need to get the number following the blah.
def get_last_blah("filename")
// Code to get the number after the last blah in the less of the filename
end
def display_the_last_blah()
puts get_last_blah("var/test/log")
end
Now, I could just keep reading the file and performing something akin to a string pattern search on the entire file again and again; after obtaining the last match, I can then get the number. But what if I only want to look at the newly added text at the tail and not the entire file?
Moreover, is there a quick one-liner or smart command to get this?
Use open to read the file and Enumerable#grep to search for the desired text with a regular expression, as the following code does:
def get_last_blah(filename)
  open(filename) { |f| f.grep(/.*blah(\d).*$/) { $1 }.last.to_i }
end

puts get_last_blah('var/test/log')
# => 3
The method returns the number following the last "blah" in the file. It reads the entire file, but the result is the same as if it were done with tail.
If you want to use a proper tail, take a look at the File::Tail gem.
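If the goal is only to scan text appended since the last check, one low-tech option (a sketch using plain Ruby IO; the helper name and state hash are made up for illustration) is to remember the byte offset reached on the previous read:
def last_blah_since(filename, state)
  File.open(filename) do |f|
    f.seek(state[:pos])                                  # skip what was already scanned
    f.read.scan(/blah(\d+)/) { state[:last] = $1.to_i }  # keep the newest number seen
    state[:pos] = f.pos                                  # remember where we stopped
  end
  state[:last]
end

state = { pos: 0, last: nil }
last_blah_since('var/test/log', state)   # first call scans the whole file
last_blah_since('var/test/log', state)   # later calls scan only the new bytes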
I presume you wish to avoid reading the entire file each time; rather, you want to start at the end and work backward until you find the last string of interest. Here's a way to do that.
Code
BLOCK_SIZE = 30
MAX_BLAH_NBR = 123

def doit(fname, blah_text)
  @f = File.new(fname)
  @blah_text = blah_text
  @chars_to_read = BLOCK_SIZE + @blah_text.size + MAX_BLAH_NBR.to_s.size
  ptr = @f.size
  block_size = BLOCK_SIZE
  loop do
    return nil if ptr.zero?
    ptr -= block_size
    if ptr < 0
      block_size += ptr
      ptr = 0
    end
    blah_nbr = read_block(ptr)
    (@f.close; return blah_nbr.to_i) if blah_nbr
  end
end

def read_block(ptr)
  @f.seek(ptr)
  @f.read(@chars_to_read)[/.*#{@blah_text}(\d+)/, 1]
end
Demo
Let's first write something interesting to a file.
MY_FILE = 'my_file.txt'
text =<<_
Now is the time
for all blah2 to
come to the aid of
their blah3, blah4 enemy or
perhaps do blagh5 something
else like wash the dishes.
_
File.write(MY_FILE, text)
Now run the program:
p doit(MY_FILE, "blah") #=> 4
We expected it to return 4 and it did.
Explanation
doit first instructs read_block to read up to 37 characters, beginning BLOCK_SIZE (30) characters from the end of the file. That's at the beginning of the string
"ng\nelse like wash the dishes.\n"
which is 30 characters long. (I'll explain the "37" in a moment.) read_block finds no text matching the regex (like "blah3"), so returns nil.
As nil was returned, doit makes the same request of read_block, but this time starting BLOCK_SIZE characters closer to the beginning of the file. This time read_block reads the 37 character string:
"y or\nperhaps do blagh5 something\nelse"
but, again, does not match the regex, so returns nil to doit. Notice that it read the seven characters, "ng\nelse", that it read previously. This overlap is necessary in case one 30-character block ended, "...bla" and the next one began "h3...". Hence the need to read more characters (here 37) than the block size.
read_block next reads the string:
"aid of\ntheir blah3, blah4 enemy or\npe"
and finds that "blah4" matches the regex (not "blah3", because the regex is being "greedy" with .*), so it returns "4" to doit, which converts that to the number 4, which it returns.
doit would return nil if the regex did not match any text in the file.

Ruby Truncate Words + Long Text

I have the following function which accepts text and a word count; if the number of words in the text exceeds the word count, it gets truncated with an ellipsis.
# Truncate the passed text. Used for headlines and such.
def snippet(thought, wordcount)
  thought.split[0..(wordcount - 1)].join(" ") + (thought.split.size > wordcount ? "..." : "")
end
However what this function doesn't take into account is extremely long words, for instance...
"Helloooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
world!"
I was wondering if there's a better way to approach what I'm trying to do so it takes both word count and text size into consideration in an efficient way.
Is this a Rails project?
Why not use the following helper:
truncate("Once upon a time in a world far far away", :length => 17)
If not, just reuse the code.
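If word boundaries matter, the helper (backed by ActiveSupport's String#truncate) also takes :separator and :omission options, for example:
truncate("Once upon a time in a world far far away", :length => 17, :separator => ' ')
# => "Once upon a..."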
This is probably a two-step process:
Truncate the string to a max length (no need for a regex for this).
Using a regex, take at most the desired number of words from the truncated string.
Edit:
Another approach is to split the string into words and loop through the array adding up the lengths. When you find the overrun, join elements 0 .. index just before the overrun (sketched below).
Hint: the regex ^(\s*.+?\b){5} will match the first 5 "words".
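A minimal sketch of that word-length accumulation idea (the method name, argument names, and ellipsis handling are illustrative, not the poster's original code):
def snippet(text, max_words, max_chars)
  kept = []
  length = 0
  text.split.each do |word|
    extra = word.size + (kept.empty? ? 0 : 1)   # +1 for the joining space
    break if kept.size == max_words || length + extra > max_chars
    kept << word
    length += extra
  end
  out = kept.join(" ")
  # note: a first word longer than max_chars yields just the ellipsis here
  out.size < text.size ? out + "..." : out
end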
The logic for checking both word and char limits becomes too convoluted to clearly express as one expression. I would suggest something like this:
def snippet(str, max_words, max_chars, omission = '...')
  # need at least one character plus the ellipsis
  max_chars = 1 + omission.size if max_chars <= omission.size
  words = str.split
  omit = words.size > max_words || str.length > max_chars ? omission : ''
  snip = words[0...max_words].join(' ')
  snip = snip[0...(max_chars - omission.size)] if snip.length > max_chars
  snip + omit
end
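For example (the string and limits here are illustrative):
snippet("The quick brown fox jumps over the lazy dog", 5, 30)
#=> "The quick brown fox jumps..."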
As others have pointed out, Rails' String#truncate offers almost the functionality you want (truncating to fit a length at a natural boundary), but it doesn't let you independently state a max character length and a word count.
First 20 characters:
>> "hello world this is the world".gsub(/.+/) { |m| m[0..20] + (m.size > 20 ? '...' : '') }
=> "hello world this is t..."
First 5 words:
>> "hello world this is the world".gsub(/.+/) { |m| m.split[0..5].join(' ') + (m.split.size > 5 ? '...' : '') }
=> "hello world this is the world..."

Parse a particular number of lines

I'm trying to read through a file, find a certain pattern, and then grab a set number of lines of text after the line that contains that pattern. I'm not really sure how to approach this.
If you want the n lines after the line matching pattern in the file filename:
lines = File.open(filename) do |file|
  line = file.readline until line =~ /pattern/ || file.eof
  file.eof ? nil : (1..n).map { file.eof ? nil : file.readline }.compact
end
This should handle all cases, like the pattern not being present in the file (returns nil) or there being fewer than n lines after the matching line (the resulting array contains the remaining lines of the file).
First parse the file into lines. Open, read, split on the line break
lines = File.open(file_name).read.split("\n")
Then get index
index = lines.index { |x| x.match(/regex_pattern/) }
Where regex_pattern is the pattern that you are looking for. Use the index as a starting point and then the second argument is the number of lines (in this case 5)
lines[index, 5]
It will return an array of 'lines', starting with the matching line itself; use lines[index + 1, 5] if you want only the lines after it.
You could combine this a bit more to reduce the number of lines, but I was attempting to keep it readable.
If you're not tied to Ruby, grep -A 12 trivet will show the 12 lines after any line with trivet in it. Any regex will work in place of "trivet"
matched = false
num = 0
res = ""
File.foreach(filename) do |line|
  if matched
    res += line
    num += 1
    break if num == num_lines_desired
  elsif line.match(/regex/)
    matched = true
  end
end
This has the advantage of not needing to read the whole file in the event of a match.
When done, res will hold the desired lines.
In Rails (the only difference is how I generate the file object):
file = File.open(File.join(Rails.root, 'lib', 'file.json'))
# convert the file into an array of strings, with \n as the separator
line_ary = file.readlines
line_count = line_ary.count
i = 0
# or however far up the document you want to start...you can get very fancy
# with this or just do it manually
hsh = {}
# each pass consumes two lines: a child id line and a parent array line
(line_count / 2).times do
  child_id = JSON.parse(line_ary[i])
  i += 1
  parent_ary = JSON.parse(line_ary[i])
  i += 1
  hsh[child_id] = parent_ary
end
Ha, I've probably said too much, but that should definitely get you started.
