How to remove the headers and the second column from a CSV in Ruby?

I have a CSV that looks like this:
user_id,is_user_unsubscribed
131072,1
7077888,1
11010048,1
12386304,1
327936,1
2228480,1
6553856,1
9830656,1
10158336,1
10486016,1
10617088,1
11010304,1
11272448,1
393728,1
7012864,1
8782336,1
11338240,1
11928064,1
4326144,1
8127232,1
11862784,1
but I want the data to look like this:
131072
7077888
11010048
12386304
327936
...
any ideas on what to do? I have 330,000 rows...

You can read your file as an array of rows and skip the first row like this:
data = CSV.read("dataset.csv")[1 .. -1]
This way you can remove the header.
To drop a column, read the file with headers: true so that you get a CSV::Table, then delete the column by name:
require 'csv'
table = CSV.read("dataset.csv", headers: true)
table.delete("is_user_unsubscribed")
table.to_csv(write_headers: false) # => the remaining column as a CSV string, without the header row
Check this for more info: http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV/Table.html
http://ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html

My recommendation would be to read the file line by line as strings, then split each line on commas (a comma separates your columns).
Splitting a Ruby String:
https://code-maven.com/ruby-split
line_num = 0
text = File.read('myfile.csv')
text.each_line do |line|
  text_array = line.chomp.split(',')     # split on the column separator
  text_i_want = text_array[0]            # first column
  puts text_i_want unless line_num == 0  # skip the header row
  line_num += 1
end
In this code we read a text file and go through it line by line. We split each line on commas and take the text from the first column (the zeroth item in the array), then print it.
The unless line_num == 0 guard is what leaves out the header row.
Then just write a new file with your new data.
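A minimal sketch of that last step, writing the first column to a new file. The method name extract_first_column and its arguments are made up for illustration, and the naive split assumes that no field contains a quoted comma:

```ruby
# Copy the first column of a comma-separated file into a new file,
# skipping the header row. The method name and paths are illustrative.
def extract_first_column(src_path, dst_path)
  File.open(dst_path, 'w') do |out|
    # File.foreach streams the file, so the whole input is never held in memory
    File.foreach(src_path).with_index do |line, line_num|
      next if line_num.zero?                   # skip the header row
      out.puts line.chomp.split(',', 2).first  # text before the first comma
    end
  end
end
```

It could be called as extract_first_column('myfile.csv', 'user_ids.csv'), where both file names are placeholders.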

I wound up doing this. Is this kosher?
require 'csv'
user_ids = []
CSV.foreach("eds_users_sept15.csv", headers: true) do |row|
  user_ids << row['user_id']
end
user_ids.count # => 322101
CSV.open('some_new_file.csv', 'w') do |c|
  user_ids.each do |id|
    c << [id]
  end
end

I have 330,000 rows...
So I guess speed matters, right?
I took your method and the other two that were proposed, tested them on a 330,000-row CSV file, and made a benchmark to show you something interesting.
require 'csv'
require 'benchmark'
Benchmark.bm(10) do |bm|
bm.report("Method 1:") {
data = Array.new
CSV.foreach("input.csv", headers:true) do |row|
data << row['user_id']
end
}
bm.report("Method 2:") {
data = CSV.read("input.csv")[1 .. -1]
data.delete("is_user_unsubscribed")
}
bm.report("Method 3:") {
data = Array.new
File.open('input.csv').read.each_line do |line|
data << line.split(',')[0]
end
data.shift # => remove headers
}
end
The output:
                user     system      total        real
Method 1:   3.110000   0.010000   3.120000 (  3.129409)
Method 2:   1.990000   0.010000   2.000000 (  2.004016)
Method 3:   0.380000   0.010000   0.390000 (  0.383700)
As you can see, handling the CSV file as a simple text file, splitting the lines, and pushing the first field into the array is roughly 5 to 8 times faster than using the CSV module. Of course it has some disadvantages too; e.g., if you ever add columns to the input file you'll have to review the code.
It's up to you whether you prefer raw speed or code that is easier to extend.
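One further trade-off worth keeping in mind: a plain split on commas does not understand CSV quoting. The sketch below uses a made-up line (not taken from the file in the question) to show the difference; it is only an illustration, not a claim about the asker's data:

```ruby
require 'csv'

# A made-up line whose first field contains a comma inside quotes
line = %("Smith, John",42\n)

naive  = line.chomp.split(',')  # splits inside the quoted field
parsed = CSV.parse_line(line)   # honours the quotes

p naive   # the quoted name is broken into two pieces
p parsed  # the name stays in one field
```

For a file of plain integers like the one in the question, the two approaches give the same fields.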

I'm guessing that you plan to convert the string that precedes the comma on each line to an integer. If so,
File.readlines("dataset.csv").drop(1).map(&:to_i)
is all you need. (For example, "131072,1".to_i #=> 131072.)
If you want strings, you could write
File.readlines("dataset.csv").drop(1).map { |s| s[/\d+/] }

Related

Change Headers for Certain Columns in CSV File

I have a CSV file that I want to change the headers only for certain columns (about 20 of them in my actual file). Here's a sample CSV file:
CSV File
"name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
I have been trying this with a File/IO class, but when it reads the file to do the gsub it removes all of the quotation marks around each string separated by commas. Here's the code I'm using:
Ruby Code
file = 'file.csv'
replacements = {
  'blah_01_blah' => 'newblah1',
  'foo_1_01_foo' => 'coolfoo1',
  'bacon_01_bacon' => 'goodpig1',
  'bacon_02_bacon' => 'goodpig2'
}
matcher = /#{replacements.keys.join('|')}/
outdata = File.read(file).gsub(matcher, replacements)
File.open(file, 'w') do |out|
out << outdata
end
What I end up with is this in the CSV file:
New CSV File
name,blah_01_blah,foo_1_01_foo,bacon_01_bacon,bacon_02_bacon
John,yucky,summer,yum,food
Mary,"","",cool,sundae
It's keeping the quotation marks in fields that are blank, but taking them out around the strings elsewhere. I want to retain those quotation marks in case for some reason a rogue comma ends up in a string somewhere so it doesn't get thrown off. How can I change the headers without losing my quotation marks around the strings?
EDIT - This is what I want the file to look like at the end.
Expected Result CSV File
"name","newblah1","coolfoo1","goodpig1","goodpig2"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
Thanks!
You don’t need to handle CSV at all:
File.write(
file,
File.readlines(file).tap do |lines|
lines.first.gsub!(matcher, replacements)
end.join
)
See File.readlines.
The trick here is that we only touch the first line, and we treat it as plain text, so the quoting in the rest of the file is left alone.
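The same idea can be tried on an in-memory string. This is only a sketch: rename_headers is a hypothetical helper name, and the two-column sample is cut down from the file in the question.

```ruby
# Rename headers by editing only the first line as plain text, so the
# quoting of every other line is untouched. The helper name is illustrative.
def rename_headers(text, replacements)
  matcher = Regexp.union(replacements.keys)  # escapes each key for us
  header, *rest = text.lines
  [header.gsub(matcher, replacements), *rest].join
end

csv_text = <<~DATA
  "name","blah_01_blah","bacon_01_bacon"
  "John","yucky","yum"
DATA

renamed = rename_headers(csv_text, { 'blah_01_blah' => 'newblah1', 'bacon_01_bacon' => 'goodpig1' })
puts renamed
```

Because the data rows are never parsed, whatever quoting they had is written back unchanged.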
Let's first create the input CSV file.
text =<<_
"name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
_
file_in = 'file_in.csv'
file_out = 'file_out.csv'
File.write(file_in, text)
#=> 137
Here is the replacements hash, which I simplified slightly.
replacements = {'blah_01_blah'=>'newblah1', 'foo_01_foo'=>'coolfoo1',
'bacon_01_bacon'=>'goodpig1'}
The first task is to modify this hash so that if it has no key k, replacements[k] will return k. For this we use the method Hash#default_proc=.
replacements.default_proc = ->(_,k) { k }
Here are two examples of how this hash is used.
replacements['bacon_01_bacon']
#=> "goodpig1"
replacements['name']
#=> "name"
The latter follows because replacements has no key 'name'.
The code is as follows.
require 'csv'
f_in = CSV.read(file_in, headers:true)
CSV.open(file_out, 'w') do |csv_out|
csv_out << replacements.values_at(*f_in.headers)
f_in.each { |row| csv_out << row }
end
#=> #<CSV::Table mode:col_or_row row_count:3>
Note that
f_in.headers
#=> ["name", "blah_01_blah", "foo_1_01_foo", "bacon_01_bacon", "bacon_02_bacon"]
Let's look at the output file.
puts File.read(file_out)
prints
name,newblah1,foo_1_01_foo,goodpig1,bacon_02_bacon
John,yucky,summer,yum,food
Mary,"","",cool,sundae
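Note that this output no longer has quotation marks around every field. If they need to be kept, one option is the CSV library's force_quotes option. The following is a sketch on an in-memory string, with a cut-down sample and a replacements hash chosen for illustration:

```ruby
require 'csv'

input = <<~DATA
  "name","blah_01_blah","foo_1_01_foo"
  "John","yucky","summer"
  "Mary","",""
DATA

# Illustrative mapping from old header names to new ones
replacements = { 'blah_01_blah' => 'newblah1', 'foo_1_01_foo' => 'coolfoo1' }

table = CSV.parse(input, headers: true)

# force_quotes: true wraps every written field in quotation marks
output = CSV.generate(force_quotes: true) do |csv|
  csv << table.headers.map { |h| replacements.fetch(h, h) }  # unknown headers pass through
  table.each { |row| csv << row.fields }
end

puts output
```

The same option can be passed to CSV.open when writing directly to a file.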

Write an array to multi column CSV format using Ruby

I have an array of arrays in Ruby that I'm trying to output to a CSV (or text) file, so that I can then easily transfer it over to another XML file for graphing.
I can't seem to get the output (in text format) like so. Instead I get one line of data which is just a large array.
0,2
0,3
0,4
0,5
I originally tried something along the lines of this
File.open('02.3.gyro_trends.text' , 'w') { |file| trend_array.each { |x,y| file.puts(x,y)}}
And it outputs
0.2
46558
0
46560
0
....etc etc.
Can anyone point me in the "write" direction for getting either:
(i) .text file that can put my data like so.
trend_array[0][0], trend_array[0][1]
trend_array[1][0], trend_array[1][1]
trend_array[2][0], trend_array[2][1]
trend_array[3][0], trend_array[3][1]
(ii) .csv file that would put this data in separate columns.
Edit: I recently added more than two values to my array; check out my answer, which combines Cameck's solution.
This is currently what I have at the moment.
trend_array=[]
j=1
# cycle through array and find change in gyro data.
while j < gyro_array.length-2
if gyro_array[j+1][1] < 0.025 && gyro_array[j+1][1] > -0.025
trend_array << [0, gyro_array[j][0]]
j+=1
elsif gyro_array[j+1][1] > -0.025 # if the next value is increasing by x1.2 the value of the previous amount. Log it as +1
trend_array << [0.2, gyro_array[j][0]]
j+=1
elsif gyro_array[j+1][1] < 0.025 # if the next value is decreasing by x1.2 the value of the previous amount. Log it as -1
trend_array << [-0.2, gyro_array[j][0]]
j+=1
end
end
#for graphing and analysis purposes (wanted to print it all as a csv in two columns)
File.open('02.3test.gyro_trends.text' , 'w') { |file| trend_array.each { |x,y| file.puts(x,y)}}
File.open('02.3test.gyro_trends_count.text' , 'w') { |file| trend_array.each {|x,y| file.puts(y)}}
I know it's something really easy, but for some reason I'm missing it. It's something to do with concatenation, but I found that if I try to concatenate a \n in my last line of code, it doesn't end up in the file. It is output in my console the way I want it, but not when I write it to a file.
Thanks for taking the time to read this all.
File.open('02.3test.gyro_trends.text' , 'w') { |file| trend_array.each { |a| file.puts(a.join(","))}}
Alternately using the CSV Class:
def write_to_csv(row)
  if csv_exists?
    CSV.open(@csv_name, 'a+') { |csv| csv << row }
  else
    # create and add headers if doesn't exist already
    CSV.open(@csv_name, 'wb') do |csv|
      csv << CSV_HEADER
      csv << row
    end
  end
end
def csv_exists?
  @exists ||= File.file?(@csv_name)
end
Call write_to_csv with an array [col_1, col_2, col_3]
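If all rows are already in memory, they can also be written in one pass. The sketch below uses CSV.generate with a few sample pairs (made-up values in the shape of the question's data) so that it runs without touching the file system:

```ruby
require 'csv'

# Sample [trend, reading] pairs; the values are illustrative only
rows = [[0, 46558], [0.2, 46560], [-0.2, 46562]]

# CSV.generate builds the CSV text in memory; CSV.open(path, 'w') accepts
# the same kind of block when the rows should go straight to a file.
csv_text = CSV.generate do |csv|
  rows.each { |row| csv << row }
end

puts csv_text
```

Each inner array becomes one line, with its elements separated by commas.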
Thank you both @cameck & @tkupari, both answers were what I was looking for. Went with Cameck's answer in the end, because it "cut out" cutting and pasting text => xml. Here's what I did to get an array of arrays into their proper places.
require 'csv'
CSV_HEADER = [
"Apples",
"Oranges",
"Pears"
]
@csv_name = "Test_file.csv"
def write_to_csv(row)
  if csv_exists?
    CSV.open(@csv_name, 'a+') { |csv| csv << row }
  else
    # create and add headers if doesn't exist already
    CSV.open(@csv_name, 'wb') do |csv|
      csv << CSV_HEADER
      csv << row
    end
  end
end
def csv_exists?
  @exists ||= File.file?(@csv_name)
end
array = [ [1,2,3] , ['a','b','c'] , ['dog', 'cat' , 'poop'] ]
array.each { |row| write_to_csv(row) }

How to map and edit a CSV file with Ruby

Is there a way to edit a CSV file using the map method in Ruby? I know I can open a file using:
CSV.open("file.csv", "a+")
and add content to it, but I have to edit some specific lines.
The foreach method is only useful to read a file (correct me if I'm wrong).
I checked the Ruby CSV documentation but I can't find any useful info.
My CSV file has less than 1500 lines so I don't mind reading all the lines.
Another answer, using each.with_index:
rows_array = CSV.read('sample.csv')
desired_indices = [3, 4, 5] # zero-based indices of the rows you would like to modify
target_column = 3 # index of the column to change (example value)
rows_array.each.with_index do |row, index|
  if desired_indices.include?(index)
    # modify over here
    row[target_column] = 'modification'
  end
end
# now update the file
CSV.open('sample3.csv', 'wb') { |csv| rows_array.each { |row| csv << row } }
You can also use each_with_index {} instead of each.with_index {}.
Is there a way to edit a CSV file using the map method in Ruby?
Yes:
rows = CSV.open('sample.csv')
rows_array = rows.to_a
or
rows_array = CSV.read('sample.csv')
desired_indices = [3, 4, 5] # these are rows you would like to modify
edited_rows = rows_array.each_with_index.map do |row, index|
if desired_indices.include?(index)
# simply return the row
# or modify over here
row[3] = 'shiva'
# store index in each edited rows to keep track of the rows
[index, row]
end
end.compact
# update the main row_array with updated data
edited_rows.each{|row| rows_array[row[0]] = row[1]}
# now update the file
CSV.open('sample2.csv', 'wb') { |csv| rows_array.each{|row| csv << row}}
This is a little messier, isn't it? I suggest using each_with_index without map to do this. See my other answer.
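For comparison, a shorter map-based variant is sketched below. It returns every row, edited or not, so no index bookkeeping is needed. The sample data and the upcase edit are placeholders for whatever change is actually required:

```ruby
require 'csv'

# Parse a small in-memory sample instead of reading 'sample.csv'
rows = CSV.parse("a,b\nc,d\ne,f\n")
desired_indices = [0, 2]  # zero-based rows to modify

# map returns a new array: edited rows where wanted, untouched rows elsewhere
edited = rows.each_with_index.map do |row, index|
  desired_indices.include?(index) ? row.map(&:upcase) : row
end

p edited
```

The resulting array can then be written back with CSV.open in the same way as above.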
Here is a little script I wrote as an example of how to read CSV data, do something to the data, and then write the edited rows out to a new file:
read_write_csv.rb:
#!/usr/bin/env ruby
require 'csv'
src_dir = "/home/user/Desktop/csvfile/FL_insurance_sample.csv"
dst_dir = "/home/user/Desktop/csvfile/FL_insurance_sample_out.csv"
puts " Reading data from : #{src_dir}"
puts " Writing data to : #{dst_dir}"
# create a new file and write CSV rows to it
CSV.open(dst_dir, 'wb') do |csv_out|
  # read from the existing file, one row (an array of fields) at a time
  CSV.foreach(src_dir, :headers => false) do |row|
    # Option: look at each field together with its position
    # row.each_with_index { |rowcontent, col_num| puts "#{rowcontent} #{col_num}" }
    # Option: turn the row into a hash (needs an even number of fields)
    # h = Hash[*row]
    # Here we use map to edit every field; to_s guards against empty (nil) fields
    newrow = row.map { |field| field.to_s.capitalize }
    # Lastly, write the edited row to the output file
    csv_out << newrow
  end
end
The output file has the desired data:
USER@USER-SVE1411EGXB:~/Desktop/csvfile$ ls
FL_insurance_sample.csv FL_insurance_sample_out.csv read_write_csv.rb
The input file data looked like this:
policyID,statecode,county,eq_site_limit,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
119736,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0,0,0,30.063936,-81.707664,Residential,Masonry,3
206893,FL,CLAY COUNTY,190724.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0,0,0,30.089579,-81.700455,Residential,Wood,1
333743,FL,CLAY COUNTY,0,79520.76,0,0,79520.76,86854.48,0,0,0,0,30.063236,-81.707703,Residential,Wood,3
172534,FL,CLAY COUNTY,0,254281.5,0,254281.5,254281.5,246144.49,0,0,0,0,30.060614,-81.702675,Residential,Wood,1

Ignoring multiple header lines in a CSV

I've worked a bit with Ruby's CSV module, but am having some problems getting it to ignore multiple header lines.
Specifically, here are the first twenty lines of a file I want to parse:
USGS Digital Spectral Library splib06a
Clark and others 2007, USGS, Data Series 231.
For further information on spectrsocopy, see: http://speclab.cr.usgs.gov
ASCII Spectral Data file contents:
line 15 title
line 16 history
line 17 to end: 3-columns of data:
wavelength reflectance standard deviation
(standard deviation of 0.000000 means not measured)
( -1.23e34 indicates a deleted number)
----------------------------------------------------
Olivine GDS70.a Fo89 165um W1R1Bb AREF
copy of splib05a r 5038
0.205100 -1.23e34 0.090781
0.213100 -1.23e34 0.018820
0.221100 -1.23e34 0.005416
0.229100 -1.23e34 0.002928
The actual headers are given on the tenth line, and the seventeenth line is where the actual data start.
Here's my code:
require "nyaplot"
# Note that DataFrame basically just inherits from Ruby's CSV module.
class SpectraHelper < Nyaplot::DataFrame
class << self
def from_csv filename
df = super(filename, col_sep: ' ') do |csv|
csv.convert do |field, info|
STDERR.puts "Field is #{field}"
end
end
end
end
def csv_headers
[:wavelength, :reflectance, :standard_deviation]
end
end
def read_asc filename
f = File.open(filename, "r")
16.times do
line = f.gets
puts "Ignoring #{line}"
end
d = SpectraHelper.from_csv(f)
end
The output suggests that my calls to f.gets are not actually ignoring those lines, and I can't understand why. Here are the first few lines of output:
Field is Clark
Field is and
Field is others
Field is 2007,
Field is USGS,
I tried looking for a tutorial or example which shows processing of more complicated CSV files, but haven't had much luck. If someone could point me towards a resource which answers this question, I would be grateful (and would prefer to mark that as accepted over a solution to my specific problem — but both would be appreciated).
Using Ruby 2.1.
I believe that you are using ::open, which uses IO.open. This method will open the file again.
I modified the script a bit
require 'csv'
class SpectraHelper < CSV
def self.from_csv(filename)
df = open(filename, 'r' , col_sep: ' ') do |csv|
csv.drop(16).each {|c| p c}
end
end
end
def read_asc(filename)
SpectraHelper.from_csv(filename)
end
read_asc "data/csv1.csv"
It turns out the problem here was not with my understanding of CSV, but rather with how Nyaplot::DataFrame handles CSV files.
Basically, Nyaplot doesn't actually store things as CSVs; CSV is just an intermediate format. So a simple way to handle the files makes use of @khelli's suggestion:
def read_asc filename
Nyaplot::DataFrame.new(CSV.open(filename, 'r',
col_sep: ' ',
headers: [:wavelength, :reflectance, :standard_deviation],
converters: :numeric).
drop(16).
map do |csv_row|
csv_row.to_h.delete_if { |k,v| k.nil? }
end)
end
Thanks, everyone, for the suggestions.
I wouldn't use the CSV module, since your file is not well-formed CSV. The following code reads the file and gives you an array of your records:
lines = File.readlines(filename)
lines.slice!(0, 16)
records = lines.map { |line| line.chomp.split }
The records output:
[["0.205100", "-1.23e34", "0.090781"], ["0.213100", "-1.23e34", "0.018820"], ["0.221100", "-1.23e34", "0.005416"], ["0.229100", "-1.23e34", "0.002928"]]
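If numbers are wanted rather than strings, the same approach extends naturally. The sketch below is an assumption-laden illustration: parse_spectrum is a made-up helper name, and the sample uses a two-line preamble instead of the real sixteen.

```ruby
# Skip a fixed number of preamble lines, then parse the remaining
# whitespace-separated fields as floats. The helper name is illustrative.
def parse_spectrum(lines, skip:)
  lines.drop(skip).map { |line| line.split.map { |field| Float(field) } }
end

sample = <<~DATA
  title line
  history line
  0.205100 -1.23e34 0.090781
  0.213100 -1.23e34 0.018820
DATA

records = parse_spectrum(sample.lines, skip: 2)
p records
```

For the real file, something like parse_spectrum(File.readlines(filename), skip: 16) would be the equivalent call. Float raises on malformed input, which makes stray header lines easy to spot.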

How do I make an array of arrays out of a CSV?

I have a CSV file that looks like this:
Jenny, jenny@example.com ,
Ricky, ricky@example.com ,
Josefina josefina@example.com ,
I'm trying to get this output:
users_array = [
  ['Jenny', 'jenny@example.com'], ['Ricky', 'ricky@example.com'], ['Josefina', 'josefina@example.com']
]
I've tried this:
users_array = Array.new
file = File.new('csv_file.csv', 'r')
file.each_line("\n") do |row|
puts row + "\n"
columns = row.split(",")
users_array.push columns
puts users_array
end
Unfortunately, in Terminal, this returns:
Jenny
jenny@example.com
Ricky
ricky@example.com
Josefina
josefina@example.com
Which I don't think will work for this:
users_array.each_with_index do |user|
add_page.form_with(:id => 'new_user') do |f|
f.field_with(:id => "user_email").value = user[0]
f.field_with(:id => "user_name").value = user[1]
end.click_button
end
What do I need to change? Or is there a better way to solve this problem?
Ruby's standard library has a CSV class with an API similar to File's, plus a number of useful methods for working with tabular data. To get the output you want, all you need to do is this:
require 'csv'
users_array = CSV.read('csv_file.csv')
PS - I think you are getting the output you expected with your file parsing as well, but maybe you're thrown off by how it is printing to the terminal. puts behaves differently with arrays, printing each member object on a new line instead of as a single array. If you want to view it as an array, use puts my_array.inspect.
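The difference in how puts prints arrays can be seen with a small sketch. The two-user array here is a hand-written stand-in for the parsed file:

```ruby
# Hand-written stand-in for the parsed rows
users = [['Jenny', 'jenny@example.com'], ['Ricky', 'ricky@example.com']]

puts users          # flattens the nesting: one element per line
puts users.inspect  # prints the nested array on a single line
p users             # p prints the inspect form directly
```

So the data structure is the same in both cases; only the printed form differs.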
Assuming that your CSV file actually has a comma between the name and email address on the third line:
require 'csv'
users_array = []
CSV.foreach('csv_file.csv') do |row|
users_array.push row.delete_if(&:nil?).map(&:strip)
end
users_array
# => [["Jenny", "jenny@example.com"],
#     ["Ricky", "ricky@example.com"],
#     ["Josefina", "josefina@example.com"]]
There may be a simpler way, but what I'm doing there is discarding the nil field created by the trailing comma and stripping the spaces around the email addresses.
