Create array from csv using readlines ruby - ruby

I can't seem to get this to work.
I know I can do this with the csv gem, but I'm trying out new stuff and I want to do it this way. All I'm trying to do is read lines in from a CSV and then create one array from each line. I then want to puts the second element of each array.
So far I have:
filed = "/Users/me/Documents/Workbook3.csv"
if File.exist?(filed)
  File.readlines(filed).map { |d| puts d.split(",").to_a }
else
  puts "No file here"
end
The problem is that this creates one array which has all the lines in it whereas I want a separate array for each line (perhaps an array of arrays?)
Test data
Trade date,Settle date,Reference,Description,Unit cost (p),Quantity,Value (pounds)
04/09/2014,09/09/2014,S5411,Plus500 Ltd ILS0.01 152 # 419,419,152,624.93
02/09/2014,05/09/2014,B5406,Biomarin Pharmaceutical Com Stk USD0.001 150 # 4284.75,4284.75,150,-6439.08
29/08/2014,03/09/2014,S5398,Hargreaves Lansdown plc Ordinary 0.4p 520 # 1116.84,1116.84,520,5795.62
What I would like
S5411
B5406
S5398

Let's write your data to a file:
s =<<THE_BITTER_END
Trade date,Settle date,Reference,Description,Unit cost (p),Quantity,Value (pounds)
04/09/2014,09/09/2014,S5411,Plus500 Ltd ILS0.01 152 # 419,419,152,624.93
02/09/2014,05/09/2014,B5406,Biomarin Pharmaceutical Com Stk USD0.001 150 # 4284.75,4284.75,150,-6439.08
29/08/2014,03/09/2014,S5398,Hargreaves Lansdown plc Ordinary 0.4p 520 # 1116.84,1116.84,520,5795.62
THE_BITTER_END
IO.write('temp',s)
#=> 363
We can then do this:
arr = File.readlines('temp').map { |s| s.split(',') }
#=> [["Trade date", "Settle date", "Reference", "Description", "Unit cost (p)",
"Quantity", "Value (pounds)\n"],
["04/09/2014", "09/09/2014", "S5411",
"Plus500 Ltd ILS0.01 152 # 419", "419", "152", "624.93\n"],
["02/09/2014", "05/09/2014", "B5406",
"Biomarin Pharmaceutical Com Stk USD0.001 150 # 4284.75",
"4284.75", "150", "-6439.08\n"],
["29/08/2014", "03/09/2014", "S5398",
"Hargreaves Lansdown plc Ordinary 0.4p 520 # 1116.84", "1116.84",
"520", "5795.62\n"]]
The values you want start at the second element of arr (the first element is the header row) and are the third element of each of those arrays. Therefore, you can pluck them out as follows:
arr[1..-1].map { |a| a[2] }
#=> ["S5411", "B5406", "S5398"]
Adopting @Stefan's suggestion of putting [2] within the block containing split, we can write this more compactly as follows:
File.readlines('temp')[1..-1].map { |s| s.split(',')[2] }
#=> ["S5411", "B5406", "S5398"]
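One detail visible in the arr output above is that the last field of every row keeps its trailing "\n". A minimal sketch of one way around that, assuming Ruby 2.4 or later (where File.readlines accepts chomp: true). The helper name column_values is made up for illustration, and, like the split-based code above, it does not handle quoted fields that contain commas:

```ruby
# Hypothetical helper (not from the answers above): returns one column
# from every data row of a simple, unquoted CSV file.
def column_values(path, index)
  File.readlines(path, chomp: true) # chomp: true strips the line terminator
      .drop(1)                      # skip the header row
      .map { |line| line.split(",")[index] }
end
```

With the 'temp' file written earlier, column_values('temp', 2) should give the Reference column, and the last column comes back without the newline.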

You can also use the standard library's CSV class to do this very easily.
require "csv"
s =<<THE_BITTER_END
Trade date,Settle date,Reference,Description,Unit cost (p),Quantity,Value (pounds)
04/09/2014,09/09/2014,S5411,Plus500 Ltd ILS0.01 152 # 419,419,152,624.93
02/09/2014,05/09/2014,B5406,Biomarin Pharmaceutical Com Stk USD0.001 150 # 4284.75,4284.75,150,-6439.08
29/08/2014,03/09/2014,S5398,Hargreaves Lansdown plc Ordinary 0.4p 520 # 1116.84,1116.84,520,5795.62
THE_BITTER_END
arr = CSV.parse(s, :headers=>true).collect { |row| row["Reference"] }
p arr
#=> ["S5411", "B5406", "S5398"]
PS: I have borrowed the string from @Cary's answer.

Related

How to get a block at an offset in the IO.foreach loop in ruby?

I'm using the IO.foreach loop to find a string using regular expressions. I want to append the next block (next line) to the file_names list. How can I do that?
file_names = [""]
IO.foreach("a.txt") { |block|
if block =~ /^file_names*/
dir = # get the next block
file_names.append(dir)
end
}
Actually my input looks like this:
file_names[174]:
name: "vector"
dir_index: 1
mod_time: 0x00000000
length: 0x00000000
file_names[175]:
name: "stl_bvector.h"
dir_index: 2
mod_time: 0x00000000
length: 0x00000000
I have a list of file_names, and I want to capture each of the name, dir_index, mod_time and length properties and put them into the file_names array at the index given by the file_names index in the text.
You can use #each_cons to get the value of the next 4 rows from the text file:
files = IO.foreach("text.txt").each_cons(5).with_object([]) do |block, o|
if block[0] =~ /file_names.*/
o << block[1..4].map{|e| e.split(':')[1]}
end
end
puts files
#=> "vector"
# 1
# 0x00000000
# 0x00000000
# "stl_bvector.h"
# 2
# 0x00000000
# 0x00000000
Keep in mind that the files array contains subarrays of 4 elements. If the : symbol occurs later in the lines, you could replace the third line of my code with this:
o << block[1..4].map{ |e| e.partition(':').last.strip}
I also added #strip in case you want to remove the whitespace around the values. With this line changed, the actual array will look something like this:
p files
#=>[["\"vector\"", "1", "0x00000000", "0x00000000"], ["\"stl_bvector.h\"", "2", "0x00000000", "0x00000000"]]
(the values don't contain the \ escape character, that's just the way #p shows it).
Another option: if you know that the pattern (one file_names line followed by four value lines) holds throughout the entire text file, and the file always starts with a file_names line, you can replace #each_cons with #each_slice and drop the regex completely. This also speeds up the whole process:
IO.foreach("text.txt").each_slice(5).with_object([]) do |block, o|
o << block[1..4].map{ |e| e.partition(':').last.strip }
end
It's actually pretty easy to carve up a series of lines based on a pattern using slice_before:
File.readlines("data.txt").slice_before(/\Afile_names/)
Now you have an array of arrays that looks like:
[
[
"file_names[174]:\n",
" name: \"vector\"\n",
" dir_index: 1\n",
" mod_time: 0x00000000\n",
" length: 0x00000000\n"
],
[
"file_names[175]:\n",
" name: \"stl_bvector.h\"\n",
" dir_index: 2\n",
" mod_time: 0x00000000\n",
" length: 0x00000000"
]
]
Each of these groups could be transformed further, for example into a Ruby Hash using those keys.
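As a rough sketch of that further transformation (the method name parse_groups and the choice of symbol keys are assumptions for illustration, and Array#to_h with a block needs Ruby 2.6 or later), each group can be turned into a Hash keyed by the index in the file_names[...] header. It assumes the input starts with a file_names line, as in the sample:

```ruby
# Hypothetical helper: groups lines by "file_names[N]:" headers and
# returns { N => { name: ..., dir_index: ..., ... } }.
def parse_groups(lines)
  lines.slice_before(/\Afile_names/).map do |header, *fields|
    # Pull the number out of "file_names[174]:"
    index = header[/\[(\d+)\]/, 1].to_i
    # Split each "key: value" line at the first colon only
    attrs = fields.to_h do |line|
      key, _, value = line.partition(":")
      [key.strip.to_sym, value.strip]
    end
    [index, attrs]
  end.to_h
end
```

Called as parse_groups(File.readlines("data.txt")), this keeps all values as strings; converting dir_index to an Integer or stripping the quotes from name would be a separate step.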

n-way file merge in Ruby

I have several log files from a Java application (Gigaspaces logs) on multiple hosts, which I need to merge based on the date/time value.
Since every log file is already sorted, I need to read the first record from every file into an array, decide which one has the key with the minimum value, write it to the result file, read the next record from that same file, and repeat.
Record definition: the first line has a key and all following lines have no key. Example:
2015-04-05 02:33:42,135 GSC SEVERE [com.gigaspaces.lrmi] - LRMI Transport Protocol caught server exception caused by [/10.0.1.2:46949] client.; Caused by: java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:311)
at com.gigaspaces.lrmi.SmartByteBufferCache.get(SmartByteBufferCache.java:50)
at com.gigaspaces.lrmi.nio.Reader.readBytesFromChannelNoneBlocking(Reader.java:410)
at com.gigaspaces.lrmi.nio.Reader.readBytesNonBlocking(Reader.java:644)
at com.gigaspaces.lrmi.nio.Reader.bytesToStream(Reader.java:509)
at com.gigaspaces.lrmi.nio.Reader.readRequest(Reader.java:112)
at com.gigaspaces.lrmi.nio.ChannelEntry.readRequest(ChannelEntry.java:121)
at com.gigaspaces.lrmi.nio.Pivot.handleReadRequest(Pivot.java:445)
at com.gigaspaces.lrmi.nio.selector.handler.ReadSelectorThread.handleRead(ReadSelectorThread.java:81)
at com.gigaspaces.lrmi.nio.selector.handler.ReadSelectorThread.handleConnection(ReadSelectorThread.java:45)
at com.gigaspaces.lrmi.nio.selector.handler.AbstractSelectorThread.doSelect(AbstractSelectorThread.java:74)
at com.gigaspaces.lrmi.nio.selector.handler.AbstractSelectorThread.run(AbstractSelectorThread.java:50)
at java.lang.Thread.run(Thread.java:662)
Ideally, the result file should contain the key, directory/filename.log and the rest of the record.
Questions:
How do I get a record from a file in Ruby?
How do I open multiple files and iterate through them using the algorithm described above?
Code
From all files, read every line that begins with a date-time string into an array, then sort the array by the date-time strings:
require 'date'
def get_key_rows(*fnames)
fnames.flat_map do |fname|
IO.foreach(fname).with_object([]) do |s, arr|
dt = DateTime.strptime(s[0, 19], '%Y-%m-%d %H:%M:%S') rescue nil
arr << [s[0, 19], fname, s[19..-1].rstrip] if dt
end
end.sort_by(&:first)
end
This method returns an array of three-element arrays. Each three-element array corresponds to a key line in one of the files and consists of the date/time string, the filename and the remainder of the line that follows the date/time string. Note that key lines do not need to be ordered within each file. The method uses:
DateTime.strptime to identify key rows;
Enumerable#flat_map, rather than Enumerable#map followed by Array#flatten; and
Enumerable#sort_by to sort the key rows by date/time.
Regarding sort_by, note that the rows can be sorted by the date/time strings, rather than by corresponding DateTime objects, because the strings have the form 'yyyy-mm-dd hh:mm:ss', whose lexicographic order is also chronological order.
Examples
Let's create some files:
IO.write("f0", "2015-04-05 02:33:42,135 more stuff in f0\n" +
"more in f0\n" +
"2015-04-05 04:33:42,135 more stuff in f0\n" +
"even more in f0")
#=> 108
IO.write("f1", "2015-04-04 02:33:42,135 more stuff in f1\n" +
"2015-04-06 02:33:42,135 more stuff in f1\n" +
"more in f1")
#=> 92
IO.write("f2", "something in f2\n" +
"2015-04-05 02:33:43,135 more stuff in f2\n" +
"even more in f2\n" +
"2015-04-04 02:23:42,135 more stuff in f2")
#=> 113
get_key_rows('f0', 'f1', 'f2')
#=> [["2015-04-04 02:23:42", "f2", ",135 more stuff in f2"],
# ["2015-04-04 02:33:42", "f1", ",135 more stuff in f1"],
# ["2015-04-05 02:33:42", "f0", ",135 more stuff in f0"],
# ["2015-04-05 02:33:43", "f2", ",135 more stuff in f2"],
# ["2015-04-05 04:33:42", "f0", ",135 more stuff in f0"],
# ["2015-04-06 02:33:42", "f1", ",135 more stuff in f1"]]
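The method above collects only the key lines and sorts them in memory. For the streaming merge the question describes, here is a rough sketch under stated assumptions: every file is already sorted, every record starts with a 'yyyy-mm-dd hh:mm:ss' key, and lines before the first key in a file are ignored. The names KEY, records, exhausted? and merge_logs are made up for illustration:

```ruby
# Assumed key format: a timestamp at the very start of a record's first line.
KEY = /\A\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/

# Lazily yields [key, fname, record_text] for each record in one file.
# A record is a key line plus all following lines that have no key.
def records(fname)
  Enumerator.new do |y|
    key = nil
    lines = []
    File.foreach(fname) do |line|
      if (m = line[KEY])
        y << [key, fname, lines.join] if key  # emit the previous record
        key = m
        lines = [line]
      elsif key
        lines << line                         # continuation line
      end
    end
    y << [key, fname, lines.join] if key      # emit the last record
  end
end

# True when the enumerator has no more records.
def exhausted?(enum)
  enum.peek
  false
rescue StopIteration
  true
end

# Repeatedly takes the record with the smallest key among the file heads.
def merge_logs(*fnames)
  return enum_for(:merge_logs, *fnames) unless block_given?
  streams = fnames.map { |f| records(f) }
  loop do
    streams.reject! { |s| exhausted?(s) }
    break if streams.empty?
    yield streams.min_by { |s| s.peek.first }.next
  end
end
```

Each yielded element holds the key, the filename and the full record text, so a caller could write them to the result file one at a time. With many files, a priority queue would avoid rescanning all heads on every step; min_by keeps the sketch short. Keys are compared to the second only, so records within the same second keep file order.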

How to extract a number using regular expression in ruby

I am new to regular expressions and Ruby. Below is the example I started working with:
words= "apple[12345]: {123123} boy 1233 6F74 2AC 28458 1594 6532 1500 D242g
apple[13123]: {123123123} girl Aui817E 9AD453 91321SDF 3423FS 1213FDAS 110FADA4 43ADAC0 1AADS4D8 BASAA24 "
I want to extract boy 1233 6F74 .. to .. D242g into an array.
Similarly, I want to extract girl Aui817E 9AD453 .. to .. 43ADAC0 1AADS4D8 BASAA24 into an array.
I tried to do this but could not. Can someone please help me with this simple exercise?
Thanks in advance.
begin
pattern = /apple\[\d+\]: \{\d+\} (\w) (\d+) (\d+) /
f = pattern.match(words)
puts " #{f}"
end
words.scan(/apple\[\d+\]: \{\d+\}(.+)/).map{|a| a.first.scan(/\S+/)}
or
words.each_line.map{|s| s.split.drop(2)}
Output:
[
["boy", "1233", "6F74", "2AC", "28458", "1594", "6532", "1500", "D242g"],
["girl", "Aui817E", "9AD453", "91321SDF", "3423FS", "1213FDAS", "110FADA4", "43ADAC0", "1AADS4D8", "BASAA24"]
]
array = words.scan(/apple\[\d+\]: {\d+}(.+)/).flatten.map { |line| line.scan(/\w+/) }
(Here { and } do not need to be escaped in the regex.)
This returns:
[
["boy", "1233", "6F74", "2AC", "28458", "1594", "6532", "1500", "D242g"],
["girl", "Aui817E", "9AD453", "91321SDF", "3423FS", "1213FDAS", "110FADA4", "43ADAC0", "1AADS4D8", "BASAA24"]
]
array[0] gives the array starting with "boy", and array[1] gives the array starting with "girl".

Match Multiple Patterns in a String and Return Matches as Hash

I'm working with some log files, trying to extract pieces of data.
Here's an example of a file which, for the purposes of testing, I'm loading into a variable named sample. NOTE: The column layout of the log files is not guaranteed to be consistent from one file to the next.
sample = "test script result
Load for five secs: 70%/50%; one minute: 53%; five minutes: 49%
Time source is NTP, 23:25:12.829 UTC Wed Jun 11 2014
D
MAC Address IP Address MAC RxPwr Timing I
State (dBmv) Offset P
0000.955c.5a50 192.168.0.1 online(pt) 0.00 5522 N
338c.4f90.2794 10.10.0.1 online(pt) 0.00 3661 N
990a.cb24.71dc 127.0.0.1 online(pt) -0.50 4645 N
778c.4fc8.7307 192.168.1.1 online(pt) 0.00 3960 N
"
Right now, I'm just looking for IPv4 and MAC addresses; eventually the search will need to include more patterns. To accomplish this, I'm using two regular expressions and passing them to Regexp.union:
patterns = Regexp.union(/(?<mac_address>\h{4}\.\h{4}\.\h{4})/, /(?<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/)
As you can see, I'm using named groups to identify the matches.
The result I'm trying to achieve is a Hash. The key should equal the capture group name, and the value should equal what was matched by the regular expression.
Example:
{"mac_address"=>"0000.955c.5a50", "ip_address"=>"192.168.0.1"}
{"mac_address"=>"338c.4f90.2794", "ip_address"=>"10.10.0.1"}
{"mac_address"=>"990a.cb24.71dc", "ip_address"=>"127.0.0.1"}
{"mac_address"=>"778c.4fc8.7307", "ip_address"=>"192.168.1.1"}
Here's what I've come up with so far:
sample.split(/\r?\n/).each do |line|
hashes = []
line.split(/\s+/).each do |val|
match = val.match(patterns)
if match
hashes << Hash[match.names.zip(match.captures)].delete_if { |k,v| v.nil? }
end
end
results = hashes.reduce({}) { |r,h| h.each {|k,v| r[k] = v}; r }
puts results if results.length > 0
end
I feel like there should be a more "elegant" way to do this. My chief concern, though, is performance.
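One possible simplification, offered as a sketch rather than a benchmarked answer: keep the patterns in a Hash keyed by the desired name and apply each one to the whole line, which avoids splitting every line into words and merging partial hashes. The names PATTERNS and extract_addresses are assumptions for illustration, the IP pattern is a slightly tightened variant of the one above, and filter_map needs Ruby 2.7 or later:

```ruby
# Hypothetical pattern table: result key => regex for that field.
PATTERNS = {
  "mac_address" => /\h{4}\.\h{4}\.\h{4}/,
  "ip_address"  => /\b\d{1,3}(?:\.\d{1,3}){3}\b/
}.freeze

# Returns one Hash per line that contains at least one match.
def extract_addresses(text)
  text.each_line.filter_map do |line|
    found = PATTERNS.each_with_object({}) do |(name, regex), hash|
      value = line[regex]            # first match in the line, or nil
      hash[name] = value if value
    end
    found unless found.empty?
  end
end
```

Adding another pattern is then a matter of adding one entry to the table. Only the first match per pattern per line is kept, which fits the sample but may not fit every log layout.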

How to save hash values into a CSV

I have a CSV with one column in which I would like to save all my hash values. I am using Nokogiri SAX to parse an XML document and then save it to a CSV. I am getting the XML value like this: @infodata[:academic] = @content.inspect. The hash has the following keys:
@infodata = {}
@infodata[:titles] = Array.new([])
@infodata[:identifier]
@infodata[:typeOfLevel]
@infodata[:typeOfResponsibleBody]
@infodata[:type]
@infodata[:exact]
@infodata[:degree]
@infodata[:academic]
@infodata[:code]
@infodata[:text]
When I use this code right now to loop through the keys and save it to CSV:
def end_document
  CSV.open("info.csv", "wb") do |row|
    for key, val in @infodata
      row << [val,]
    end
  end
  puts "Finished..."
end
The output that I get is:
"""avancerad"""
"""Ingen examen"""
"""uh"""
"""Arkivvetenskap""""Archival science"""
"""HIA80D"""
"""10.300"""
"""uoh"""
"""Arkivvetenskap rör villkoren för befintliga arkiv och modern arkivbildning med fokus på arkivarieyrkets arbetsuppgifter: bevara, tillgängliggöra och styra information. Under ett år behandlas bl a informations- och dokumenthantering, arkivredovisning, gallring, lagstiftning och arkivteori. I kursen ingår praktik, där man under handledning får arbeta med olika arkivarieuppgifter."""
"""statlig"""
"""60"""
How do I get the output like this:
"avancerad", "Ingen examen", "uh", "Arkivvetenskap", "Archival science", "HIA80D", 10.300,"uoh", "Arkivvetenskap rör villkoren för befintliga arkiv och modern arkivbildning med fokus på arkivarieyrkets arbetsuppgifter: bevara, tillgängliggöra och styra information. Under ett år behandlas bl a informations- och dokumenthantering, arkivredovisning, gallring, lagstiftning och arkivteori. I kursen ingår praktik, där man under handledning får arbeta med olika arkivarieuppgifter.", "statlig", 60
I think I understand your general question, so perhaps this can help you:
# Flatten the titles Array into one String
@infodata[:titles] = @infodata[:titles].join(", ")
# Open the CSV for writing
CSV.open("info.csv", "wb") do |csv|
  # Write the entire row all at once
  csv << @infodata.values
end
The join method that @joelparkerhenderson talks about just takes the two array values and joins them together into one string.
You can use flatten instead to keep them as separate elements in a new array, like this:
# Open the CSV for writing
CSV.open("info.csv", "wb") do |csv|
  # Write the entire row all at once
  csv << @infodata.values.flatten
end
Read more at: http://www.ruby-doc.org/core-1.9.3/Hash.html#method-i-flatten
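A small sketch of the same idea using CSV.generate_line, so the result can be inspected without writing a file. The helper name csv_row and the sample hash are assumptions for illustration. Note that values returns an Array, so the flatten called here is Array#flatten:

```ruby
require "csv"

# Hypothetical helper: turns the hash's values into a single CSV line,
# expanding nested arrays (such as :titles) into separate fields.
def csv_row(infodata)
  CSV.generate_line(infodata.values.flatten)
end

# Sample data for illustration only.
sample = { titles: ["Arkivvetenskap", "Archival science"], code: "HIA80D" }
puts csv_row(sample)
```

The tripled quotes in the question's output most likely come from calling inspect on the values: inspect wraps a string in literal double quotes, and CSV then quotes the field and doubles each embedded quote. Storing the plain string instead of its inspect form avoids that.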

Resources