How to get a block at an offset in the IO.foreach loop in Ruby?

I'm using the IO.foreach loop to find a string using regular expressions. I want to append the next block (next line) to the file_names list. How can I do that?
file_names = [""]
IO.foreach("a.txt") { |block|
if block =~ /^file_names*/
dir = # get the next block
file_names.append(dir)
end
}
Actually my input looks like this:
file_names[174]:
name: "vector"
dir_index: 1
mod_time: 0x00000000
length: 0x00000000
file_names[175]:
name: "stl_bvector.h"
dir_index: 2
mod_time: 0x00000000
length: 0x00000000
I have a list of file_names entries, and I want to capture the name, dir_index, mod_time and length properties of each one and store them in the file_names array at the index given in the text (for example 174 and 175).

You can use #each_cons to get the value of the next 4 rows from the text file:
files = IO.foreach("text.txt").each_cons(5).with_object([]) do |block, o|
if block[0] =~ /file_names.*/
o << block[1..4].map{|e| e.split(':')[1]}
end
end
puts files
#=> "vector"
# 1
# 0x00000000
# 0x00000000
# "stl_bvector.h"
# 2
# 0x00000000
# 0x00000000
Keep in mind that the files array contains subarrays of 4 elements. If a value can itself contain the : symbol, split(':')[1] would drop everything after that second colon, so you could replace the third line of my code with this:
o << block[1..4].map{ |e| e.partition(':').last.strip}
I also added #strip to remove the whitespace around the values. With this line changed, the resulting array looks like this:
p files
#=>[["\"vector\"", "1", "0x00000000", "0x00000000"], ["\"stl_bvector.h\"", "2", "0x00000000", "0x00000000"]]
(the values don't contain the \ escape character, that's just the way #p shows it).
Another option: if you know that the pattern of 1 filename followed by 4 values holds throughout the entire text file, and the file always starts with a filename, you can replace #each_cons with #each_slice and drop the regex completely. This will also speed up the whole process:
files = IO.foreach("text.txt").each_slice(5).with_object([]) do |block, o|
o << block[1..4].map{ |e| e.partition(':').last.strip }
end
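To answer the original question literally (pulling the next line from inside the loop), one option is an external enumerator: File.foreach without a block returns an Enumerator, and #next gives you the following line on demand. A minimal sketch, assuming header lines start with "file_names"; names_after_headers is a hypothetical helper name:

```ruby
# Hypothetical helper: returns the line that directly follows each
# "file_names[...]" header, pulled on demand with Enumerator#next.
def names_after_headers(path)
  lines = File.foreach(path)   # no block given, so this returns an Enumerator
  found = []
  loop do                      # Kernel#loop ends quietly when #next raises StopIteration
    line = lines.next
    # Assumption: a header is any line starting with "file_names".
    found << lines.next.strip if line.start_with?("file_names")
  end
  found
end
```

For the sample input, names_after_headers("a.txt") would collect the two name: lines. A header on the very last line simply ends the loop, because the second #next raises StopIteration.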

It's actually pretty easy to carve up a series of lines based on a pattern using slice_before:
File.readlines("data.txt").slice_before(/\Afile_names/).to_a
Now you have an array of arrays that looks like:
[
[
"file_names[174]:\n",
" name: \"vector\"\n",
" dir_index: 1\n",
" mod_time: 0x00000000\n",
" length: 0x00000000\n"
],
[
"file_names[175]:\n",
" name: \"stl_bvector.h\"\n",
" dir_index: 2\n",
" mod_time: 0x00000000\n",
" length: 0x00000000"
]
]
Each of these groups can then be transformed further, for example into a Ruby Hash keyed by name, dir_index, mod_time and length.
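As a sketch of that transformation, each group can be turned into a Hash and stored at the index from its header, which is what the question asked for. This assumes every property line has the form key: value; parse_file_names is a hypothetical name:

```ruby
# Hypothetical helper: turns "file_names[N]:" groups into hashes stored at index N.
# Assumption: each property line looks like "  key: value".
def parse_file_names(lines)
  file_names = []
  lines.slice_before(/\Afile_names/).each do |group|
    index = group.first[/\[(\d+)\]/, 1]   # "file_names[174]:" -> "174"
    next unless index                     # skip any lines before the first header
    entry = {}
    group.drop(1).each do |line|
      key, value = line.split(":", 2).map(&:strip)  # split on the first colon only
      entry[key.to_sym] = value if value            # ignore blank or malformed lines
    end
    file_names[index.to_i] = entry
  end
  file_names
end
```

Usage: parse_file_names(File.readlines("data.txt")). Note that the quotes around the names stay part of the value, and every index below the first one (0 to 173 here) is nil.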

Related

n-way file merge in Ruby

I have several files from Java application (Gigaspaces logs) from multiple hosts which I need to merge based on date/time value.
Since every log file is already sorted, I need to read the first record from every file into an array, decide which one has the smallest key, write it to the result file, read the next record from the same file and repeat.
Record definition: the first line has a key and all following lines have no key, for example:
2015-04-05 02:33:42,135 GSC SEVERE [com.gigaspaces.lrmi] - LRMI Transport Protocol caught server exception caused by [/10.0.1.2:46949] client.; Caused by: java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:311)
at com.gigaspaces.lrmi.SmartByteBufferCache.get(SmartByteBufferCache.java:50)
at com.gigaspaces.lrmi.nio.Reader.readBytesFromChannelNoneBlocking(Reader.java:410)
at com.gigaspaces.lrmi.nio.Reader.readBytesNonBlocking(Reader.java:644)
at com.gigaspaces.lrmi.nio.Reader.bytesToStream(Reader.java:509)
at com.gigaspaces.lrmi.nio.Reader.readRequest(Reader.java:112)
at com.gigaspaces.lrmi.nio.ChannelEntry.readRequest(ChannelEntry.java:121)
at com.gigaspaces.lrmi.nio.Pivot.handleReadRequest(Pivot.java:445)
at com.gigaspaces.lrmi.nio.selector.handler.ReadSelectorThread.handleRead(ReadSelectorThread.java:81)
at com.gigaspaces.lrmi.nio.selector.handler.ReadSelectorThread.handleConnection(ReadSelectorThread.java:45)
at com.gigaspaces.lrmi.nio.selector.handler.AbstractSelectorThread.doSelect(AbstractSelectorThread.java:74)
at com.gigaspaces.lrmi.nio.selector.handler.AbstractSelectorThread.run(AbstractSelectorThread.java:50)
at java.lang.Thread.run(Thread.java:662)
Ideally, the result file should contain the key, the directory/filename.log and the rest of the record.
Questions:
How to get a record from file in Ruby?
How to open multiple files and iterate through them using algorithm described above?
Code
Read all lines from all files that begin with a date-time string into an array, then sort the array by the date-time strings:
require 'date'
def get_key_rows(*fnames)
fnames.flat_map do |fname|
IO.foreach(fname).with_object([]) do |s, arr|
dt = DateTime.strptime(s[0, 19], '%Y-%m-%d %H:%M:%S') rescue nil
arr << [s[0, 19], fname, s[19..-1].rstrip] if dt
end
end.sort_by(&:first)
end
This method returns an array of three-element arrays. Each three-element array corresponds to a key line in one of the files and consists of the date/time string, the filename and the remainder of the line after the date/time string. Note that key lines need not be ordered within each file. The method uses:
DateTime#strptime to identify key rows;
Enumerable#flat_map, rather than Enumerable#map followed by Array#flatten; and
Enumerable#sort_by to sort the key rows by date/time.
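Regarding sort_by, note that the rows can be sorted by the date/time strings rather than by the corresponding DateTime objects, because the strings have the form 'yyyy-mm-dd hh:mm:ss': every field is zero-padded and ordered from most to least significant, so lexicographic order matches chronological order.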
Examples
Let's create some files:
IO.write("f0", "2015-04-05 02:33:42,135 more stuff in f0\n" +
"more in f0\n" +
"2015-04-05 04:33:42,135 more stuff in f0\n" +
"even more in f0")
#=> 108
IO.write("f1", "2015-04-04 02:33:42,135 more stuff in f1\n" +
"2015-04-06 02:33:42,135 more stuff in f1\n" +
"more in f1")
#=> 92
IO.write("f2", "something in f2\n" +
"2015-04-05 02:33:43,135 more stuff in f2\n" +
"even more in f2\n" +
"2015-04-04 02:23:42,135 more stuff in f2")
#=> 113
get_key_rows('f0', 'f1', 'f2')
#=> [["2015-04-04 02:23:42", "f2", ",135 more stuff in f2"],
# ["2015-04-04 02:33:42", "f1", ",135 more stuff in f1"],
# ["2015-04-05 02:33:42", "f0", ",135 more stuff in f0"],
# ["2015-04-05 02:33:43", "f2", ",135 more stuff in f2"],
# ["2015-04-05 04:33:42", "f0", ",135 more stuff in f0"],
# ["2015-04-06 02:33:42", "f1", ",135 more stuff in f1"]]
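The method above only collects the key lines. If you need the n-way merge described in the question, with whole records kept together, here is a minimal sketch. It assumes each file is already sorted by key and that a key line starts with a yyyy-mm-dd hh:mm:ss timestamp; KEY_RE, each_record, next_or_nil and merge_records are my own names, not part of any library:

```ruby
# Assumption: a key line starts with "yyyy-mm-dd hh:mm:ss".
KEY_RE = /\A\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/

# Hypothetical helper: an Enumerator of [key, fname, lines] records for one file.
# Each record is a key line plus the non-key lines that follow it; lines before
# the first key line (if any) are dropped.
def each_record(fname)
  Enumerator.new do |y|
    File.foreach(fname).slice_before(KEY_RE).each do |lines|
      key = lines.first[KEY_RE]
      y << [key, fname, lines] if key
    end
  end
end

# Fetch the next record, or nil once the enumerator is exhausted.
def next_or_nil(enum)
  enum.next
rescue StopIteration
  nil
end

# n-way merge: keep one "head" record per file, repeatedly emit the smallest,
# then refill from the file it came from. Assumes each file is sorted by key;
# the key strings compare chronologically (see the note on sort_by above).
def merge_records(*fnames)
  sources = fnames.map { |f| each_record(f) }
  heads = sources.map { |e| next_or_nil(e) }
  merged = []
  loop do
    live = heads.each_index.select { |i| heads[i] }
    break if live.empty?
    i = live.min_by { |j| heads[j][0] }
    merged << heads[i]
    heads[i] = next_or_nil(sources[i])
  end
  merged
end
```

Each merged [key, fname, lines] entry can then be written to the result file together with its file name. Keep in mind that the example file f2 above is not sorted internally, so this merge would not order its records correctly without sorting that file first.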

Ruby replace array list

I have two strings:
packages="linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common dnsutils mysql-server-5.5"
exclusion="dnsutils mysql-server-5.5"
I need a string pkgs that has the content of packages without exclusion like this:
pkgs="linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common"
I tried the following code:
pkgs = packages.gsub!( /(?<!^|,)#{exclusion}(?!,|$)/, '\1')
which does not seem to be working. What would be the best working solution in this case?
packages="linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common dnsutils mysql-server-5.5"
exclusion="dnsutils mysql-server-5.5"
(packages.split - exclusion.split).join(" ") # => "linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common"
It's easiest if your variables are arrays rather than strings. Then you can just use the - operator to "subtract" the items in exclusion from packages:
packages = [ "linux-image-3.2.0-4-amd64",
"linux-libc-dev",
"linux-headers-3.2.0-4-amd64",
"linux-headers-3.2.0-4-common",
"dnsutils",
"mysql-server-5.5" ]
exclusion = [ "dnsutils", "mysql-server-5.5" ]
remaining = packages - exclusion
# => [ "linux-image-3.2.0-4-amd64",
# "linux-libc-dev",
# "linux-headers-3.2.0-4-amd64",
# "linux-headers-3.2.0-4-common" ]
If you then need the values in a single string, join them together with the join method:
remaining_str = remaining.join(" ")
# => "linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common"
If you want to keep it simple, you can always split these strings into arrays, and join the difference.
(packages.split - exclusion.split).join ' '
String#split with no argument splits on whitespace. This gives you two arrays, and subtracting the second from the first removes every value that appears in both. You then join the resulting array with spaces.
Longer example:
packages="linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common dnsutils mysql-server-5.5"
exclusion="dnsutils mysql-server-5.5"
one = packages.split
# >> ["linux-image-3.2.0-4-amd64", "linux-libc-dev", "linux-headers-3.2.0-4-amd64", "linux-headers-3.2.0-4-common", "dnsutils", "mysql-server-5.5"]
two = exclusion.split
# >> ["dnsutils", "mysql-server-5.5"]
difference = one - two
# >> ["linux-image-3.2.0-4-amd64", "linux-libc-dev", "linux-headers-3.2.0-4-amd64", "linux-headers-3.2.0-4-common"]
finished = difference.join ' '
# >> "linux-image-3.2.0-4-amd64 linux-libc-dev linux-headers-3.2.0-4-amd64 linux-headers-3.2.0-4-common"
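For completeness, here is why the regex in the question returns nothing useful: the pattern looks for the whole exclusion string as a single phrase, and since that phrase sits at the end of packages, the negative lookahead (?!,|$) rejects the only possible match. With no substitution made, gsub! returns nil, so pkgs ends up nil. If you want to keep a regex approach, a sketch (remove_words is a hypothetical name) that matches each excluded word separately:

```ruby
# Hypothetical helper: remove whole words listed in `exclusion` from `packages`.
def remove_words(packages, exclusion)
  # Escape each word so characters such as "." in "5.5" match literally.
  alternatives = exclusion.split.map { |w| Regexp.escape(w) }.join("|")
  # (?<!\S) and (?!\S) allow only complete space-separated words to match,
  # so "dnsutils" does not match inside a longer name like "dnsutils-extra".
  pattern = /(?<!\S)(?:#{alternatives})(?!\S)/
  # gsub (without !) always returns a new string; split/join tidies the
  # spaces left behind by the removed words.
  packages.gsub(pattern, "").split.join(" ")
end
```

The split-and-subtract answers above are simpler; the regex version is mainly useful if you need to keep other formatting decisions in one gsub call.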

Recovering hex data from a large log-file using Ruby and RegEx

I'm trying to filter/append lines of hex data from a large log-file, using Ruby and RegEx.
The lines of the log-file that I need look like this:
Data: 10 55 61 (+ lots more hex data)
I want to collect all of the hex data for further processing. The regex /^\sData:(.+)/ should do the trick.
My Ruby-program looks like this:
puts "Start"
fileIn = File.read("inputfile.txt")
fileOut = File.new("outputfile.txt", "w+")
fileOut.puts "Start of regex data\n"
fileIn.each_line do
dataLine = fileIn.match(/^\sData:(.+)/).captures
fileOut.write dataLine
end
fileOut.puts "\nEOF"
fileOut.close
puts "End"
It works - sort of - but the lines in the output file are all the same, just repeating the result of the first regex match.
What am I doing wrong?
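Inside the loop you call match on fileIn, which is the whole file, so every iteration returns the same first match. Use the block parameter to match each line instead. Note that match returns nil for lines without "Data:", so check for a match before reading the capture:
fileIn.each_line do |line|
match = line.match(/^\sData:(.+)/)
fileOut.puts match[1] if match # write only the captured hex data
end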
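Since the log file is large, File.read loads all of it into memory. A streaming sketch that reads one line at a time with File.foreach and lets File.open close the output file automatically; extract_hex is a hypothetical name, and the regex is loosened to /^\s*Data:\s*(.+)/ on the assumption that the leading whitespace is optional:

```ruby
# Assumption: leading whitespace before "Data:" is optional.
DATA_RE = /^\s*Data:\s*(.+)/

# Hypothetical helper: copies the hex payload of every "Data:" line from
# in_path to out_path and returns how many lines were written.
def extract_hex(in_path, out_path)
  count = 0
  File.open(out_path, "w") do |out|   # the block form closes the file for us
    out.puts "Start of regex data"
    File.foreach(in_path) do |line|   # streams the input one line at a time
      m = DATA_RE.match(line)
      next unless m                   # skip lines without hex data
      out.puts m[1]
      count += 1
    end
    out.puts "EOF"
  end
  count
end
```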

Join array of strings into 1 or more strings each within a certain char limit (+ prepend and append texts)

Let's say I have an array of Twitter account names:
twitter_accounts = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
And a prepend and append variable:
prepend = 'Check out these cool people: '
append = ' #FollowFriday'
How can I turn this into an array of as few strings as possible each with a maximum length of 140 characters, starting with the prepend text, ending with the append text, and in between the Twitter account names all starting with an #-sign and separated with a space. Like this:
tweets = ['Check out these cool people: #example1 #example2 #example3 #example4 #example5 #example6 #example7 #example8 #example9 #FollowFriday', 'Check out these cool people: #example10 #example11 #example12 #example13 #example14 #example15 #example16 #example17 #FollowFriday', 'Check out these cool people: #example18 #example19 #example20 #FollowFriday']
(The order of the accounts isn't important so theoretically you could try and find the best order to make the most use of the available space, but that's not required.)
Any suggestions? I'm thinking I should use the scan method, but haven't figured out the right way yet.
It's pretty easy using a bunch of loops, but I'm guessing that won't be necessary when using the right Ruby methods. Here's what I came up with so far:
# Create one long string of #usernames separated by a space
tmp = twitter_accounts.map!{|a| a.insert(0, '#')}.join(' ')
# alternative: tmp = '#' + twitter_accounts.join(' #')
# Number of characters left for mentioning the Twitter accounts
length = 140 - (prepend + append).length
# This method would split a string into multiple strings
# each with a maximum length of 'length' and it will only split on empty spaces (' ')
# ideally strip that space as well (although .map(&:strip) could be used too)
tweets = tmp.some_method(' ', length)
# Prepend and append
tweets.map!{|t| prepend + t + append}
P.S.
If anyone has a suggestion for a better title let me know. I had a difficult time summarizing my question.
The String rindex method has an optional parameter where you can specify where to start searching backwards in a string:
arr = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
str = arr.map{|name|"##{name}"}.join(' ')
prepend = 'Check out these cool people: '
append = ' #FollowFriday'
max_chars = 140 - prepend.size - append.size
until str.size <= max_chars do
p str.slice!(0, str.rindex(" ", max_chars))
str.lstrip! #get rid of the leading space
end
p str unless str.empty?
I'd make use of reduce for this:
string = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
prepend = 'Check out these cool people:'
append = '#FollowFriday'
# Extra -1 is for the space before `append`
max_content_length = 140 - prepend.length - append.length - 1
content_strings = string.reduce([""]) { |result, target|
result.push("") if result[-1].length + target.length + 2 > max_content_length
result[-1] += " ##{target}"
result
}
tweets = content_strings.map { |s| "#{prepend}#{s} #{append}" }
Which would yield:
"Check out these cool people: #example1 #example2 #example3 #example4 #example5 #example6 #example7 #example8 #example9 #FollowFriday"
"Check out these cool people: #example10 #example11 #example12 #example13 #example14 #example15 #example16 #example17 #FollowFriday"
"Check out these cool people: #example18 #example19 #example20 #FollowFriday"

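Since the question mentions scan, here is a sketch that does the splitting with a single String#scan call. It assumes no single #name is longer than the room left for names (such a name would be silently truncated); build_tweets is a hypothetical name:

```ruby
# Hypothetical helper: packs "#name" mentions into as few tweets as possible,
# greedily, in the original order.
# Assumption: no single "#name" is longer than the room left for names.
def build_tweets(names, prepend, append, limit = 140)
  room = limit - prepend.length - append.length
  body = names.map { |n| "##{n}" }.join(" ")
  # \S               : each chunk starts at a non-space character
  # .{0,room - 1}    : greedily take up to room - 1 more characters
  # (?!\S)           : but only end right before a space or at end of string
  body.scan(/\S.{0,#{room - 1}}(?!\S)/).map { |chunk| prepend + chunk + append }
end
```

With the names and prepend/append strings from the question, this gives the same three tweets as the answers above.
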
How to write some value to a text file in ruby based on position

I need some help with a specific problem. I have a text file in which I have to replace some values based on their position. The file is small and always contains 5 lines, each with a fixed length. I only need to replace text at specific positions. I could also put placeholder text at the required positions and replace it with the required value each time. I am not sure how to implement this; an example is below.
Line 1 - 00000 This Is Me 12345 trying
Line 2 - 23456 This is line 2 987654
Line 3 - This is 345678 line 3 67890
Consider the above to be the file in which I have to replace values. For example, in line 1 I have to replace '00000' with '11111', and in line 2 I have to replace 'This' with 'Line' or any other required four-character text. The positions always remain the same in the text file.
I have a solution that works for reading the file based on position, but not for writing. Can someone suggest a similar solution for writing based on position?
Solution for reading the file based on position:
def read_var file, line_nr, vbegin, vend
IO.readlines(file)[line_nr][vbegin..vend]
end
puts read_var("read_var_from_file.txt", 0, 1, 3) #line 0, beginning at 1, ending at 3
#=>308
puts read_var("read_var_from_file.txt", 1, 3, 6)
#=>8522
I have also tried this solution for writing. This works but I need it to work based on position or based on text present in the specific line.
Solution I explored for writing to the file:
open(Dir.pwd + '/Files/Try.txt', 'w') { |f|
f << "Four score\n"
f << "and seven\n"
f << "years ago\n"
}
I made you a working sample anagraj.
in_file = "in.txt"
out_file = "out.txt"
=begin
=>contents of file in.txt
00000 This Is Me 12345 trying
23456 This is line 2 987654
This is 345678 line 3 67890
=end
def replace_in_file in_file, out_file, shreds
File.open(out_file,"wb") do |file|
File.read(in_file).each_line.with_index do |line, index|
shreds.each do |shred|
if shred[:index]==index
line[shred[:begin]..shred[:end]]=shred[:replace]
end
end
file << line
end
end
end
shreds = [
{index:0, begin:0, end:4, replace:"11111"},
{index:1, begin:6, end:9, replace:"Line"}
]
replace_in_file in_file, out_file, shreds
=begin
=>contents of file out.txt
11111 This Is Me 12345 trying
23456 Line is line 2 987654
This is 345678 line 3 67890
=end
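If you'd rather change the file in place instead of writing a second file, and the replacement is always the same length as the text it overwrites (as with the question's fixed-width lines), you can seek to the position and overwrite bytes there. A minimal sketch, assuming ASCII text so that characters and bytes line up; write_at is a hypothetical helper:

```ruby
# Hypothetical helper: overwrite text in place at (line_nr, column), both 0-based.
# Assumptions: ASCII content, and a replacement of the same length as the text
# it replaces; a longer or shorter one would break the fixed-width layout.
def write_at(path, line_nr, column, text)
  File.open(path, "r+b") do |f|   # read/write, binary so no newline translation
    line_nr.times { f.gets }      # skip the lines before the target line
    f.seek(f.pos + column)        # absolute offset of the target column
    f.write(text)                 # overwrite the bytes in place
  end
end
```

For the question's example that would be write_at("in.txt", 0, 0, "11111") followed by write_at("in.txt", 1, 6, "Line").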

Resources