Parse a particular number of lines - ruby

I'm trying to read through a file, find a certain pattern, and then grab a set number of lines of text after the line that contains that pattern. I'm not really sure how to approach this.

If you want the n lines after the first line matching pattern in the file filename:
lines = File.open(filename) do |file|
  # read until a line matches or the file runs out
  line = file.readline until line =~ /pattern/ || file.eof?
  # re-check the match so that a hit on the very last line yields [] rather than nil
  line =~ /pattern/ ? (1..n).map { file.eof? ? nil : file.readline }.compact : nil
end
This should handle all cases, like the pattern not being present in the file (returns nil) or there being fewer than n lines after the matching line (the resulting array contains the remaining lines of the file).

First parse the file into lines: read it and split on the line break.
lines = File.read(file_name).split("\n")
Then get the index of the first matching line:
index = lines.index { |x| x.match(/regex_pattern/) }
Where regex_pattern is the pattern that you are looking for. index is nil when nothing matches, so check it before slicing. Use the index as a starting point; the second argument is the number of lines (in this case 5):
lines[index, 5]
It will return an array of lines, starting with the matching line itself. Use lines[index + 1, 5] to start on the line after the match.
You could combine it a bit more to reduce the number of lines, but I was attempting to keep it readable.
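The index-based approach above can be wrapped in a small helper that also covers the no-match case. This is only a minimal sketch: the method name lines_after_match and the sample data are made up for illustration.

```ruby
# Hypothetical helper: returns up to n lines that follow the first line
# matching pattern, or [] when nothing matches.
def lines_after_match(lines, pattern, n)
  index = lines.index { |line| line.match?(pattern) }
  return [] if index.nil?

  # index + 1 skips the matching line itself; the slice is simply shorter
  # than n when the input ends early.
  lines[index + 1, n]
end

sample = ["alpha", "start here", "one", "two", "three"]
p lines_after_match(sample, /start/, 2)   # => ["one", "two"]
p lines_after_match(sample, /missing/, 2) # => []
```

Returning an empty array rather than nil for "no match" is a design choice of this sketch; return nil instead if the caller needs to tell the two cases apart.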

If you're not tied to Ruby, grep -A 12 trivet will show the 12 lines after any line with trivet in it. Any regex will work in place of "trivet"

matched = false
num = 0
res = ""
File.foreach(filename) do |line|
  if matched
    res << line  # each line already ends with "\n"
    num += 1
    break if num == num_lines_desired
  elsif line.match?(/regex/)
    matched = true
  end
end
This has the advantage of not needing to read the whole file in the event of a match.
When done, res will hold the desired lines.
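The same streaming idea can also be written with a lazy enumerator, which stops reading as soon as enough lines have been collected. A minimal sketch, assuming the first match is the one of interest; the method name stream_lines_after and the temporary sample file are invented for the example.

```ruby
require "tempfile"

# Hypothetical helper: streams the file line by line and stops reading
# once n lines after the first matching line have been collected.
def stream_lines_after(path, pattern, n)
  File.foreach(path)
      .lazy
      .drop_while { |line| !line.match?(pattern) } # skip everything before the match
      .drop(1)                                     # skip the matching line itself
      .first(n)                                    # take at most n lines, then stop
      .map(&:chomp)
end

Tempfile.create("sample") do |f|
  f.write("a\nmatch me\nb\nc\nd\n")
  f.flush
  p stream_lines_after(f.path, /match/, 2) # => ["b", "c"]
end
```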

In Rails (the only difference is how I generate the file object):
require 'json'
file = File.open(File.join(Rails.root, 'lib', 'file.json'))
# convert the file into an array of strings, with \n as the separator
line_ary = file.readlines
file.close
line_count = line_ary.count
i = 0
# or however far up the document you want to be...you can get very fancy with this or just do it manually
hsh = {}
# each record uses two lines, so loop once per pair rather than once per line
((line_count - i) / 2).times do
  child_id = JSON.parse(line_ary[i])
  i += 1
  parent_ary = JSON.parse(line_ary[i])
  i += 1
  hsh[child_id] = parent_ary
end
Haha, I've said too much; that should definitely get you started.

Related

Insert multiple characters in string at once

Whereas str[] will replace a character, str.insert will insert a character at a position. But it requires two lines of code:
str = "COSO17123456"
str.insert 4, "-"
str.insert 7, "-"
=> "COSO-17-123456"
I was wondering how to do this in one line of code. I came up with the following solution:
str = "COSO17123456"
str.each_char.with_index.reduce("") { |acc,(c,i)| acc += c + ( (i == 3 || i == 5) ? "-" : "" ) }
=> "COSO-17-123456"
Is there a built-in Ruby helper for this task? If not, should I stick with the insert option rather than combining several iterators?
Use each to iterate over an array of indices:
str = "COSO17123456"
[4, 7].each { |i| str.insert i, '-' }
str #=> "COSO-17-123456"
You can use slices and .join:
> [str[0..3], str[4..5],str[6..-1]].join("-")
=> "COSO-17-123456"
Note that the indices after the first one differ from the ones used with insert (6 rather than 7 for the second dash), because no earlier insertion has shifted the string. You work with the absolute indices of the original string, not the moving relative indices that result as insertions are made, which is more natural (to me anyway...).
If you want to insert at specific absolute index values, you can also use .each_with_index and control the behavior character by character:
str2 = ""
tgts=[3,5]
str.split("").each_with_index { |c,idx| str2+=c; str2+='-' if tgts.include? idx }
Both of the above create a new string.
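The absolute-versus-moving index point can also be handled with String#insert itself by inserting from right to left, so earlier insertions never shift the positions still to come. A minimal sketch; the method name insert_at_original_indices is hypothetical.

```ruby
# Hypothetical helper: inserts sep before each of the given indices, where
# every index refers to a position in the ORIGINAL string.
def insert_at_original_indices(str, indices, sep = "-")
  result = str.dup # leave the caller's string untouched
  # Work from the highest index down so that positions to the left are not shifted.
  indices.sort.reverse_each { |i| result.insert(i, sep) }
  result
end

p insert_at_original_indices("COSO17123456", [4, 6]) # => "COSO-17-123456"
```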
String#insert returns the string itself.
This means you can chain the method calls, which can be prettier and more efficient if you only have to do it a couple of times, as in your example:
str = "COSO17123456".insert(4, "-").insert(7, "-")
puts str
COSO-17-123456
Your reduce version can therefore be written more concisely as:
[4,7].reduce(str) { |s, idx| s.insert(idx, '-') }
I'll bring one more variation to the table, String#unpack:
new_str = str.unpack("A4A2A*").join('-')
# or with String#%
new_str = "%s-%s-%s" % str.unpack("A4A2A*")

How do I split a byte sequence in Ruby and keep the delimiter?

I am reading a database file with dynamically sized columns in HEX in Ruby. I can successfully split the file into records by using this script:
open_file = IO.binread(path + file_name)
record_delimeters = ['FAFA', 'FEFE', 'FDFD']
# regex with bytes is kinda finicky.. So I went this route to avoid the pitfalls of escape characters and gsub... If anyone knows a better way to do this part, I am up for suggestions as well..
final_reg = '['
record_delimeters.each_with_index do |delim, index|
standard_string = '\xFA-\xFA'
standard_string[2,2] = delim[0,2]
standard_string[7,2] = delim[2,2]
unless index == 0
final_reg += '|'
end
final_reg += standard_string
end
final_reg += ']+'
reg = Regexp.new final_reg.encode('UTF-8'), Regexp::IGNORECASE | Regexp::MULTILINE, 'n'
records = open_file.split(reg);nil
However, I would like to keep my delimiters as reference because the delimiter denotes all of the 'type' contents of the record. ie: 'uint, int, word, etc...'.
Ultimately I want the records to look like this:
["\xFE\xFE\x00\xF4\x35...", "\xFA\xFA\x03\x4F\x7A...", ...]
OR this:
["\xFE\xFE", "\x00\xF4\x35...", "\xFA\xFA", "\x03\x4F\x7A...", ...]
BUT DEFINITELY NOT THIS(Which is what I have):
["\x00\xF4\x35...", "\x03\x4F\x7A...", ...]
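One way to get either of the two desired shapes is to let String#split keep the delimiters, using a zero-width lookahead or a capture group. The following is only a sketch under assumptions: the sample bytes are invented, the delimiters are taken to be the exact byte pairs FA FA, FE FE and FD FD, and those pairs are assumed never to occur inside a record's payload.

```ruby
# Sample bytes invented for illustration; .b marks the string as binary (ASCII-8BIT).
data = "\xFE\xFE\x00\xF4\x35\xFA\xFA\x03\x4F\x7A".b

# Zero-width lookahead: split *before* each delimiter, so nothing is consumed
# and every record keeps its leading delimiter.
# The /n flag makes the regexp binary so that \xNN escapes match raw bytes.
records = data.split(/(?=\xFA\xFA|\xFE\xFE|\xFD\xFD)/n)

# Capture group: split returns the delimiters as separate elements.
# The leading empty string (the data starts with a delimiter) is dropped.
parts = data.split(/(\xFA\xFA|\xFE\xFE|\xFD\xFD)/n).reject(&:empty?)

p records
p parts
```

Here records holds two strings that each begin with their delimiter, and parts alternates delimiter and payload.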

Parse file, find a string and store next values

I need to parse a file according to different rules.
The file contains several lines.
I go through the file line by line. When I find a specific string, I have to store the data present in the next lines until a specific character is found.
Example of file:
start {
/* add comment */
first_step {
sub_first_step {
};
sub_second_step {
code = 50,
post = xxx (aaaaaa,
bbbbbb,
cccccc,
eeeeee),
number = yyyy (fffffff,
gggggg,
jjjjjjj,
ppppppp),
};
So, in this case:
File.open(@file_to_convert, "r").each_line do |line|
In "line" I have my current line. I need to:
1) find when the line contains the string "xxx"
if line.include?("union") then
Correct?
2) store the next values (e.g.: aaaa, bbbb, ccccc,eeee) in an array until I find the character ")". This highlights that the section is finished.
I think that when I reach the line with the string "xxx" I have to iterate over the next lines inside the "if" block.
Try this:
file_contents = File.read(@file_to_convert)
lines = file_contents[/xxx \(([^)]+)\)/, 1].split
# => ["aaaaaa,", "bbbbbb,", "cccccc,", "eeeeee"]
The regex (xxx \(([^)]+)\)) takes all the text after xxx ( until the next ), and split splits it into its items.
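If the trailing commas are not wanted, the same regex can be combined with a split on comma plus whitespace. A minimal sketch; the method name values_after and the trimmed-down sample text are made up for the example, and it assumes the values themselves contain no commas or closing parentheses.

```ruby
# Trimmed-down sample in the shape of the file from the question.
text = <<~SAMPLE
  sub_second_step {
  code = 50,
  post = xxx (aaaaaa,
  bbbbbb,
  cccccc,
  eeeeee),
  };
SAMPLE

# Hypothetical helper: returns the values between "keyword (" and the next ")".
def values_after(text, keyword)
  inner = text[/#{Regexp.escape(keyword)} \(([^)]+)\)/, 1]
  # Splitting on a comma plus any following whitespace also swallows the line breaks.
  inner ? inner.split(/,\s*/) : []
end

p values_after(text, "xxx") # => ["aaaaaa", "bbbbbb", "cccccc", "eeeeee"]
```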
I think this is what you are looking for:
looking = true
results = []
File.open(@file_to_convert, "r").each_line do |line|
  if looking
    if line.include?("xxx")
      looking = false
      # the first value sits between "(" and the first comma
      results << line[/\(([^,]*)/, 1]
    end
  elsif line.include?(")")
    # last value: strip the closing parenthesis and comma, then stop
    results << line.strip.delete('),')
    break
  else
    results << line.strip.delete(',')
  end
end
puts results

Read a file into an associative array

I want to be able to read the file into an associative array where I can access the elements by the column head name.
My file is formatted as follows:
KeyName Val1Name Val2Name ... ValMName
Key1 Val1-1 Val2-1 ... ValM-1
Key2 Val1-2 Val2-2 ... ValM-2
Key3 Val1-3 Val2-3 ... ValM-3
.. .. .. .. ..
KeyN Val1-N Val2-N ... ValM-N
The only problem is I don't have a clue how to do it. So far I have:
scores = File.read("scores.txt")
lines = scores.split("\n")
lines.each { |x|
y = x.to_s.split(' ')
}
Which gets close to what I want, but still am unable to get it into the format that is usable for me.
f = File.open("scores.txt") #get an instance of the file
first_line = f.gets.chomp #get the first line in the file (header)
first_line_array = first_line.split(/\s+/) #split the first line in the file via whitespace(s)
array_of_hash_maps = f.readlines.map do |line|
Hash[first_line_array.zip(line.split(/\s+/))]
end
#read the remaining lines of the file via `IO#readlines` into an array, split each read line by whitespace(s) into an array, and zip the first line with them, then convert it into a `Hash` object, and return a collection of the `Hash` objects
f.close #close the file
puts array_of_hash_maps #print the collection of the Hash objects to stdout
Can be done in 3 lines (This is why I love Ruby)
scores = File.readlines('/scripts/test.txt').map{|l| l.split(/\s+/)}
headers = scores.shift
scores.map!{|score|Hash[headers.zip(score)]}
now scores contains your hash array
Here is a verbose explanation
#open the file and read
#then split on new line
#then create an array of each line by splitting on space and stripping additional whitespace
scores = File.open('scores.txt', &:read).split("\n").map{|l| l.split(" ").map(&:strip)}
#shift the array to capture the header row
headers = scores.shift
#initialize an Array to hold the score hashes
scores_hash_array = []
#loop through each line
scores.each do |score|
#map the header value based on index with the line value
scores_hash_array << Hash[score.map.with_index{|l,i| [headers[i],l]}]
end
#=>[{"KeyName"=>"Key1", "Val1Name"=>"Val1-1", "Val2Name"=>"Val2-1", "..."=>"...", "ValMName"=>"ValM-1"},
{"KeyName"=>"Key2", "Val1Name"=>"Val1-2", "Val2Name"=>"Val2-2", "..."=>"...", "ValMName"=>"ValM-2"},
{"KeyName"=>"Key3", "Val1Name"=>"Val1-3", "Val2Name"=>"Val2-3", "..."=>"...", "ValMName"=>"ValM-3"},
{"KeyName"=>"..", "Val1Name"=>"..", "Val2Name"=>"..", "..."=>"..", "ValMName"=>".."},
{"KeyName"=>"KeyN", "Val1Name"=>"Val1-N", "Val2Name"=>"Val2-N", "..."=>"...", "ValMName"=>"ValM-N"}]
scores_hash_array now has a hash for each row in the sheet.
You can try something like this:-
fh = File.open("scores.txt","r")
rh={} #result Hash
fh.readlines.each{|line|
kv=line.split(/\s+/)
puts kv.length
rh[kv[0]] = kv[1..kv.length-1].join(",") #***store the values joined by ","
}
puts rh.inspect
fh.close
If you want to get an array of values,replace the last line in loop by
rh[kv[0]] = kv[1..kv.length-1]
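The answers above can be combined into a hash keyed by the first column whose values are themselves hashes keyed by column name, which allows lookups such as scores["Key1"]["Val1Name"]. A minimal sketch, assuming whitespace-separated columns with no spaces inside values; the method name table_to_hash and the sample data are invented for illustration.

```ruby
require "stringio"

# Hypothetical helper: reads a whitespace-separated table from any IO-like
# object and returns { row_key => { column_name => value } }.
def table_to_hash(io)
  headers = io.gets.split # first line holds the column names
  io.each_line.with_object({}) do |line, table|
    key, *values = line.split
    next if key.nil? # skip blank lines
    # Pair every column name after the key column with this row's value.
    table[key] = headers.drop(1).zip(values).to_h
  end
end

input = StringIO.new("Name Math Art\nann 90 85\nbob 70 95\n")
scores = table_to_hash(input)
p scores["ann"]["Math"] # => "90"
```

All values stay strings; convert them with to_i or to_f where numbers are needed.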

How do I get Ruby to search for a pattern on the tail of a local file?

Say I have a file blah.rb which is constantly written to somehow and has patterns like :
bagtagrag" " hellobello " blah0 blah1 " trag kljesgjpgeagiafw blah2 " gneo" whatttjtjtbvnblah3
Basically, it's garbage. But I want to check for the blah that keeps on coming up and find the latest value i.e. number in front of the blah.
Hence, something like :
grep "blah"{$1} | tail var/test/log
My file is at location var/test/log and as you can see, I need to get the number in front of the blah.
def get_last_blah(filename)
  # Code to get the number after the last blah in the tail of the file
end
def display_the_last_blah()
puts get_last_blah("var/test/log")
end
Now, I could just keep on reading the file and performing something akin to string pattern search on the entire file again and again. Obtaining the last value, I can then get the number. But what if I only want to look at the added text in the less and not the entire text.
Moreover, is there a quick one-liner or smart command to get this?
Use File.open to read the file and Enumerable#grep to search for the desired text with a regular expression, as in the following code:
def get_last_blah(filename)
  File.open(filename) { |f| f.grep(/.*blah(\d+)/) { $1 }.last.to_i }
end
puts get_last_blah('var/test/log')
# => 3
The method returns the number that follows the last "blah" in the file. It reads the entire file, but the result is the same as if it were done with tail.
If you want to use a proper tail, take a look at the File::Tail gem.
I presume you wish to avoid reading the entire file each time; rather, you want to start at the end and work backward until you find the last string of interest. Here's a way to do that.
Code
BLOCK_SIZE = 30
MAX_BLAH_NBR = 123
def doit(fname, blah_text)
  @f = File.new(fname)
  @blah_text = blah_text
  # read past the end of each block so a match straddling two blocks is not missed
  @chars_to_read = BLOCK_SIZE + @blah_text.size + MAX_BLAH_NBR.to_s.size
  ptr = @f.size
  loop do
    break if ptr.zero?  # reached the start of the file without a match
    ptr = [ptr - BLOCK_SIZE, 0].max
    blah_nbr = read_block(ptr)
    return blah_nbr.to_i if blah_nbr
  end
  nil
ensure
  @f.close if @f
end
def read_block(ptr)
  @f.seek(ptr)
  # /m lets .* span newlines, so the last match in the block wins
  @f.read(@chars_to_read)[/.*#{Regexp.escape(@blah_text)}(\d+)/m, 1]
end
Demo
Let's first write something interesting to a file.
MY_FILE = 'my_file.txt'
text =<<_
Now is the time
for all blah2 to
come to the aid of
their blah3, blah4 enemy or
perhaps do blagh5 something
else like wash the dishes.
_
File.write(MY_FILE, text)
Now run the program:
p doit(MY_FILE, "blah") #=> 4
We expected it to return 4 and it did.
Explanation
doit first instructs read_block to read up to 37 characters, beginning BLOCK_SIZE (30) characters from the end of the file. That's at the beginning of the string
"ng\nelse like wash the dishes.\n"
which is 30 characters long. (I'll explain the "37" in a moment.) read_block finds no text matching the regex (like "blah3"), so returns nil.
As nil was returned, doit makes the same request of read_block, but this time starting BLOCK_SIZE characters closer to the beginning of the file. This time read_block reads the 37 character string:
"y or\nperhaps do blagh5 something\nelse"
but, again, does not match the regex, so returns nil to doit. Notice that it read the seven characters, "ng\nelse", that it read previously. This overlap is necessary in case one 30-character block ended, "...bla" and the next one began "h3...". Hence the need to read more characters (here 37) than the block size.
read_block next reads the string:
"aid of\ntheir blah3, blah4 enemy or\npe"
and finds that "blah4" matches the regex (not "blah3", because the regex is being "greedy" with .*), so it returns "4" to doit, which converts that to the number 4, which it returns.
doit would return nil if the regex did not match any text in the file.
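The question also asks about looking only at text added since the last check. One way to do that is to remember the byte offset reached so far and scan only what has been appended. The following is a sketch under assumptions: the class name BlahWatcher is hypothetical, and each "blah<number>" token is assumed to be written in one piece, so a token split across two polls would be missed or read short.

```ruby
require "tempfile"

# Hypothetical helper: remembers how far it has read, so each poll scans
# only the bytes appended since the previous poll.
class BlahWatcher
  attr_reader :last_value

  def initialize(path, word = "blah")
    @path = path
    @pattern = /#{Regexp.escape(word)}(\d+)/
    @offset = 0
    @last_value = nil
  end

  # Returns the most recent number seen so far (nil if none yet).
  def poll
    File.open(@path, "rb") do |f|
      @offset = 0 if f.size < @offset # file was truncated or replaced: start over
      f.seek(@offset)
      chunk = f.read
      @offset = f.pos
      found = chunk.scan(@pattern).last # last [digits] capture in the new text
      @last_value = found.first.to_i if found
    end
    @last_value
  end
end

Tempfile.create("log") do |f|
  f.write('bagtagrag" blah0 blah1 " trag')
  f.flush
  watcher = BlahWatcher.new(f.path)
  p watcher.poll # => 1
  f.write(' gneo" whatttjtjtbvnblah3')
  f.flush
  p watcher.poll # => 3
end
```

Calling poll in a loop with a short sleep gives a simple polling tail; a dedicated gem such as File::Tail is likely the better choice for anything long-running.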

Resources