How do I read a gzip file line by line? - ruby

I have a gzip file and currently I read it like this:
infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)
output = gz.read
puts result
I think this converts the file to a string, but I would like to read it line by line.
What I want to accomplish is that the file has some warning messages with some garbage, I want to grep those warning messages and then write them to another file. But, some warning messages are repeated so I have to make sure that i only grep them once. Hence line by line reading would help me.

You should be able to simply loop over the gzip reader like you do with regular streams (according to the docs)
infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)
gz.each_line do |line|
puts line
end

Try this:
infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)
while output = gz.gets
puts output
end

Other answers show how to read the file line by line, but not how to only capture the errors once. Building on #Tigraine's answer:
require 'set'
infile = open("file.log.gz")
gz = Zlib::GzipReader.new(infile)
errors = Set.new
# or ...
# errors = [].to_set
gz.each_line do |line|
errors << line if (line[/^Error:/])
# or ...
# errors << line if (line['Error:'])
end
puts errors
Set acts like Array, but is built using Hash, so it's like a Hash but we're only concerned with the keys, i.e. only unique values are stored. If you try to add duplicates they will be thrown away, leaving you with only the unique values. You could use an Array, and afterwards use uniq, on it, but a Set will manage it for you up-front.
>> require 'set'
=> true
>> errors = Set.new
=> #<Set: {}>
>> errors << 'a'
=> #<Set: {"a"}>
>> errors << 'b'
=> #<Set: {"a", "b"}>
>> errors << 'a'
=> #<Set: {"a", "b"}>

Related

Change Headers for Certain Columns in CSV File

I have a CSV file that I want to change the headers only for certain columns (about 20 of them in my actual file). Here's a sample CSV file:
CSV File
"name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
I have been trying this with a File/IO class, but when it reads the file to do the gsub it removes all of the quotation marks around each string separated by commas. Here's the code I'm using:
Ruby Code
file = 'file.csv'
replacements = {
'blah_01_blah' => 'newblah1',
'foo_01_foo' => 'coolfoo1',
'bacon_01_bacon' => 'goodpig1',
'bacon_01_bacon' => 'goodpig2'
}
matcher = /#{replacements.keys.join('|')}/
outdata = File.read(file).gsub(matcher, replacements)
File.open(file, 'w') do |out|
out << outdata
end
What I end up with is this in the CSV file:
New CSV File
name,blah_01_blah,foo_1_01_foo,bacon_01_bacon,bacon_02_bacon
John,yucky,summer,yum,food
Mary,"","",cool,sundae
It's keeping the quotation marks in fields that are blank, but taking them out around the strings elsewhere. I want to retain those quotation marks in case for some reason a rogue comma ends up in a string somewhere so it doesn't get thrown off. How can I change the headers without losing my quotation marks around the strings?
EDIT - This is what I want the file to look like at the end.
Expected Result CSV File
"name","newblah1","coolfoo1","goodpig1","goodpig2"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
Thanks!
You don’t need to handle CSV at all:
File.write(
file,
File.readlines(file).tap do |lines|
lines.first.gsub!(matcher, replacements)
end.join
)
File#readlines.
The trick here is we actually deal with the first line only, as with plain text.
Let's first create the input CSV file.
text =<<_
"name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
_
file_in = 'file_in.csv'
file_out = 'file_out.csv'
File.write(file_in, text)
#=> 137
Here is the replacements hash, which I simplified slightly.
replacements = {'blah_01_blah'=>'newblah1', 'foo_01_foo'=>'coolfoo1',
'bacon_01_bacon'=>'goodpig1'}
The first task is to modify this hash so that if it has no key k, replacements[k] will return k. For this we use the method Hash#default_proc=.
replacements.default_proc = ->(_,k) { k }
Here are two examples of how this hash is used.
replacements['bacon_01_bacon']
#=> "goodpig1"
replacements['name']
#=> "name"`
The latter follows because replacements has no key 'name'.
The code is as follows.
require 'csv'
f_in = CSV.read(file_in, headers:true)
CSV.open(file_out, 'w') do |csv_out|
csv_out << replacements.values_at(*f_in.headers)
f_in.each { |row| csv_out << row }
end
#=> #<CSV::Table mode:col_or_row row_count:3>
Note that
f_in.headers
#=> ["name", "blah_01_blah", "foo_1_01_foo", "bacon_01_bacon", "bacon_02_bacon"]
Let's look at the output file.
puts File.read(file_out)
prints
name,newblah1,foo_1_01_foo,goodpig1,bacon_02_bacon
John,yucky,summer,yum,food
Mary,"","",cool,sundae

How to map and edit a CSV file with Ruby

Is there a way to edit a CSV file using the map method in Ruby? I know I can open a file using:
CSV.open("file.csv", "a+")
and add content to it, but I have to edit some specific lines.
The foreach method is only useful to read a file (correct me if I'm wrong).
I checked the Ruby CSV documentation but I can't find any useful info.
My CSV file has less than 1500 lines so I don't mind reading all the lines.
Another answer using each.with_index():
rows_array = CSV.read('sample.csv')
desired_indices = [3, 4, 5].sort # these are rows you would like to modify
rows_array.each.with_index(desired_indices[0]) do |row, index|
if desired_indices.include?(index)
# modify over here
rows_array[index][target_column] = 'modification'
end
end
# now update the file
CSV.open('sample3.csv', 'wb') { |csv| rows_array.each{|row| csv << row}}
You can also use each_with_index {} insead of each.with_index {}
Is there a way to edit a CSV file using the map method in Ruby?
Yes:
rows = CSV.open('sample.csv')
rows_array = rows.to_a
or
rows_array = CSV.read('sample.csv')
desired_indices = [3, 4, 5] # these are rows you would like to modify
edited_rows = rows_array.each_with_index.map do |row, index|
if desired_indices.include?(index)
# simply return the row
# or modify over here
row[3] = 'shiva'
# store index in each edited rows to keep track of the rows
[index, row]
end
end.compact
# update the main row_array with updated data
edited_rows.each{|row| rows_array[row[0]] = row[1]}
# now update the file
CSV.open('sample2.csv', 'wb') { |csv| rows_array.each{|row| csv << row}}
This is little messier. Is not it? I suggest you to use each_with_index with out map to do this. See my another answer
Here is a little script I wrote as an example on how read CSV data, do something to data, and then write out the edited text to a new file:
read_write_csv.rb:
#!/usr/bin/env ruby
require 'csv'
src_dir = "/home/user/Desktop/csvfile/FL_insurance_sample.csv"
dst_dir = "/home/user/Desktop/csvfile/FL_insurance_sample_out.csv"
puts " Reading data from : #{src_dir}"
puts " Writing data to : #{dst_dir}"
#create a new file
csv_out = File.open(dst_dir, 'wb')
#read from existing file
CSV.foreach(src_dir , :headers => false) do |row|
#then you can do this
# newrow = row.each_with_index { |rowcontent , row_num| puts "# {rowcontent} #{row_num}" }
# OR array to hash .. just saying .. maybe hash of arrays..
#h = Hash[*row]
#csv_out << h
# OR use map
#newrow = row.map(&:capitalize)
#csv_out << h
#OR use each ... Add and end
#newrow.each do |k,v| puts "#{k} is #{v}"
#Lastly, write back the edited , regexed data ..etc to an out file.
#csv_out << newrow
end
# close the file
csv_out.close
The output file has the desired data:
USER#USER-SVE1411EGXB:~/Desktop/csvfile$ ls
FL_insurance_sample.csv FL_insurance_sample_out.csv read_write_csv.rb
The input file data looked like this:
policyID,statecode,county,eq_site_limit,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
119736,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0,0,0,30.063936,-81.707664,Residential,Masonry,3
206893,FL,CLAY COUNTY,190724.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0,0,0,30.089579,-81.700455,Residential,Wood,1
333743,FL,CLAY COUNTY,0,79520.76,0,0,79520.76,86854.48,0,0,0,0,30.063236,-81.707703,Residential,Wood,3
172534,FL,CLAY COUNTY,0,254281.5,0,254281.5,254281.5,246144.49,0,0,0,0,30.060614,-81.702675,Residential,Wood,1

Output many arrays to CSV-files in Ruby

I have a question about Ruby. What I want to do is first to sort my items ascending and then write them out to a CSV-file. Now, the problem is further complicated by the fact that I want to iterate over a lot of CSV-files. I found this thread and the answer looks fine, but I am not able to get more than the last line written to my output file.
How can I get the whole data sorted and written to different CSV-files?
My code:
require 'date'
require 'csv'
class Daily <
# Daily has a open
Struct.new(:open)
# a method to print out a csv record for the current Daily.
def print_csv_record
printf("%s,", open)
printf("\n")
end
end
#------#
# MAIN #
#------#
# This is where I iterate over my csv-files:
foobar = ['foo', 'bar']
foobar.each do |foobar|
# get the input filename from the command line
input_file = "#{foobar}.csv"
# define an array to hold the Daily records
arr = Array.new
# loop through each record in the csv file, adding
# each record to my array while overlooking the header.
f = File.open(input_file, "r")
f.each_with_index { |row, i|
next if i == 0
words = row.split(',')
p = Daily.new
# do a little work here to convert my numbers
p.open = words[1].to_f
arr.push(p)
}
# sort the data by ascending opens
arr.sort! { |a,b| a.open <=> b.open }
# print out all the sorted records (just print to stdout)
arr.each { |p|
CSV.open("#{foobar}_new.csv", "w") do |csv|
csv << p.print_csv_record
end
}
end
My input CSV-file:
Open
52.23
52.45
52.36
52.07
52.69
52.38
51.2
50.99
51.41
51.89
51.38
50.94
49.55
50.21
50.13
50.14
49.49
48.5
47.92
My output CSV-file:
47.92
You need to put the iteration inside the open CSV file:
CSV.open("#{foobar}_new.csv", "w") do |csv|
arr.each { |p|
csv << p.print_csv_record
}
end

CSV.generate and converters?

I'm trying to create a converter to remove newline characters from CSV output.
I've got:
nonewline=lambda do |s|
s.gsub(/(\r?\n)+/,' ')
end
I've verified that this works properly IF I load a variable and then run something like:
csv=CSV(variable,:converters=>[nonewline])
However, I'm attempting to use this code to update a bunch of preexisting code using CSV.generate, and it does not appear to work at all.
CSV.generate(:converters=>[nonewline]) do |csv|
csv << ["hello\ngoodbye"]
end
returns:
"\"hello\ngoodbye\"\n"
I've tried quite a few things as well as trying other examples I've found online, and it appears as though :converters has no effect when used with CSV.generate.
Is this correct, or is there something I'm missing?
You need to write your converter as as below :
CSV::Converters[:nonewline] = lambda do |s|
s.gsub(/(\r?\n)+/,' ')
end
Then do :
CSV.generate(:converters => [:nonewline]) do |csv|
csv << ["hello\ngoodbye"]
end
Read the documentation Converters .
Okay, above part I didn't remove, as to show you how to write the custom CSV converters. The way you wrote it is incorrect.
Read the documentation of CSV::generate
This method wraps a String you provide, or an empty default String, in a CSV object which is passed to the provided block. You can use the block to append CSV rows to the String and when the block exits, the final String will be returned.
After reading the docs, it is quite clear that this method is for writing to a csv file, not for reading. Now all the converters options ( like :converters, :header_converters) is applied, when you are reading a CSV file, but not applied when you are writing into a CSV file.
Let me show you 2 examples to illustrate this more clearly.
require 'csv'
string = <<_
foo,bar
baz,quack
_
File.write('a',string)
CSV::Converters[:upcase] = lambda do |s|
s.upcase
end
I am reading from a CSV file, so :converters option is applied to it.
CSV.open('a','r',:converters => :upcase) do |csv|
puts csv.read
end
output
# >> FOO
# >> BAR
# >> BAZ
# >> QUACK
Now I am writing into the CSV file, converters option is not applied.
CSV.open('a','w',:converters => :upcase) do |csv|
csv << ['dog','cat']
end
CSV.read('a') # => [["dog", "cat"]]
Attempting to remove newlines using :converters did not work.
I had to override the << method from csv.rb adding the following code to it:
# Change all CR/NL's into one space
row.map! { |element|
if element.is_a?(String)
element.gsub(/(\r?\n)+/,' ')
else
element
end
}
Placed right before
output = row.map(&#quote).join(#col_sep) + #row_sep # quote and separate
at line 21.
I would think this would be a good patch to CSV, as newlines will always produce bad CSV output.

How do I make an array of arrays out of a CSV?

I have a CSV file that looks like this:
Jenny, jenny#example.com ,
Ricky, ricky#example.com ,
Josefina josefina#example.com ,
I'm trying to get this output:
users_array = [
['Jenny', 'jenny#example.com'], ['Ricky', 'ricky#example.com'], ['Josefina', 'josefina#example.com']
]
I've tried this:
users_array = Array.new
file = File.new('csv_file.csv', 'r')
file.each_line("\n") do |row|
puts row + "\n"
columns = row.split(",")
users_array.push columns
puts users_array
end
Unfortunately, in Terminal, this returns:
Jenny
jenny#example.com
Ricky
ricky#example.com
Josefina
josefina#example.com
Which I don't think will work for this:
users_array.each_with_index do |user|
add_page.form_with(:id => 'new_user') do |f|
f.field_with(:id => "user_email").value = user[0]
f.field_with(:id => "user_name").value = user[1]
end.click_button
end
What do I need to change? Or is there a better way to solve this problem?
Ruby's standard library has a CSV class with a similar api to File but contains a number of useful methods for working with tabular data. To get the output you want, all you need to do is this:
require 'csv'
users_array = CSV.read('csv_file.csv')
PS - I think you are getting the output you expected with your file parsing as well, but maybe you're thrown off by how it is printing to the terminal. puts behaves differently with arrays, printing each member object on a new line instead of as a single array. If you want to view it as an array, use puts my_array.inspect.
Assuming that your CSV file actually has a comma between the name and email address on the third line:
require 'csv'
users_array = []
CSV.foreach('csv_file.csv') do |row|
users_array.push row.delete_if(&:nil?).map(&:strip)
end
users_array
# => [["Jenny", "jenny#example.com"],
# ["Ricky", "ricky#example.com"],
# ["Josefina", "josefina#example.com"]]
There may be a simpler way, but what I'm doing there is discarding the nil field created by the trailing comma and stripping the spaces around the email addresses.

Resources