How can I convert this CSV to JSON with Ruby?

I am trying to convert a CSV file to JSON using Ruby. I am very, very green when it comes to working with Ruby (or any language, for that matter), so the answers may need to be dumbed down for me. Putting the data in JSON seems like the most reasonable approach to me because I understand how to work with JSON when assigning variables from the attributes that come back in a response. If there is a better way to do it, feel free to teach me.
My CSV is in the following format:
Header1,Header,Header3
ValueX,ValueY,ValueZ
I would like to be able to use the data to say something along the lines of this:
For each row after the headers, check if ValueZ is greater than ValueY. If yes, do this; if not, do that. I understand how to write the if statement, just not how to parse my information into variables/arrays.
Any ideas here?

require 'csv'
require 'json'

rows = []
# headers: true treats the first line as column names;
# converters: :all turns numeric- and date-looking fields into Ruby objects.
CSV.foreach('a.csv', headers: true, converters: :all) do |row|
  rows << row.to_hash
end
puts rows.to_json
# => [{"Header1":"ValueX","Header":"ValueY","Header3":"ValueZ"}]

Here is a first pointer:
require 'csv'
data = CSV.read('your_file.csv', { :col_sep => ',' })
Now you should have the data in data; you can test in irb.
I don't entirely understand the question:
if z > y
# do this
else
# do that
end
To produce JSON, you should be able to call to_json on a Hash or an Array (JSON.parse goes the other way and turns a JSON string back into Ruby objects).
I am not sure exactly what shape your target JSON needs, but a Hash is the usual starting point.
You can populate your hash with the dataset from the CSV:
hash = Hash.new
hash[key_goes_here] = value_here
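For example, a rough sketch of that idea (the key/value layout here is just illustrative; the column names come from the question's sample):

require 'csv'

hash = Hash.new
CSV.foreach('your_file.csv', headers: true) do |row|
  # key each row by its first column and keep the other two columns as the value
  hash[row['Header1']] = { 'y' => row['Header'], 'z' => row['Header3'] }
end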

Related

JSON to CSV File Ruby

I am trying to convert the following JSON to CSV via Ruby, but am having trouble with my code. I am learning as I go, so any help is appreciated.
require 'json'
require 'net/http'
require 'uri'
require 'csv'
uri = 'https://www.mapquestapi.com/search/v2/radius?key=Imjtd%7Clu6t200zn0,bw=o5-layg1&radius=3000&callback=processPOIs&maxMatches=4000&origin=40.7686973%2C-73.9918181&hostedData=mqap.33882_stores_prod%7Copen_status%20=%20?%20OR%20open_status%20=%20?%20OR%20open_status%20=%20?%7CExisting,Coming%20Soon,New%7C'
response = Net::HTTP.get_response(URI.parse(uri))
struct = JSON.parse(response.body.scan(/processPOIs\((.*)\);/).first.first)
CSV.open("output.csv", "w") do |csv|
JSON.parse(struct).read.each do |hash|
csv << hash.values
end
end
The error I receive is:
from c:/RailsInstaller/Ruby2.2.0/lib/ruby/gems/2.2.0/gems/json-1.8.3/lib/json/common.rb:155:in `new'
from c:/RailsInstaller/Ruby2.2.0/lib/ruby/gems/2.2.0/gems/json-1.8.3/lib/json/common.rb:155:in `parse'
from test.rb:14:in `block in <main>'
from c:/RailsInstaller/Ruby2.2.0/lib/ruby/2.2.0/csv.rb:1273:in `open'
from test.rb:13:in `<main>'
I am trying to get all the data off of the following link and put it into a CSV file that I can analyse later. https://www.mapquestapi.com/search/v2/radius?key=Imjtd%7Clu6t200zn0,bw=o5-layg1&radius=3000&callback=processPOIs&maxMatches=4000&origin=40.7686973%2C-73.9918181&hostedData=mqap.33882_stores_prod%7Copen_status%20=%20?%20OR%20open_status%20=%20?%20OR%20open_status%20=%20?%7CExisting,Coming%20Soon,New%7C
You have several problems here, the most significant of which is that you're calling JSON.parse twice. The second time you call it on struct, which was the result of calling JSON.parse the first time. You're basically doing JSON.parse(JSON.parse(string)). Oops.
There's another problem on the line where you call JSON.parse a second time: You call read on the value it returns. As far as I know JSON.parse does not ordinarily return anything that responds to read.
Fixing those two errors, your code looks something like this:
struct = JSON.parse(response.body.scan(/processPOIs\((.*)\);/).first.first)
CSV.open("output.csv", "w") do |csv|
  struct.each do |hash|
    csv << hash.values
  end
end
This ought to work if struct is an object that responds to each (like an array) and the values yielded by each all respond to values (like a hash). In other words, this code assumes that JSON.parse will return an array of hashes, or something similar. If it doesn't, well, that's beyond the scope of this question.
As an aside, this is not great:
response.body.scan(/processPOIs\((.*)\);/).first.first
The purpose of String#scan is to find every substring in a string that matches a regular expression. But you're only concerned with the first match, so scan is the wrong choice.
An alternative is to use String#match:
matches = response.body.match(/processPOIs\((.*)\)/)
json = matches[1]
struct = JSON.parse(json)
However, that's overkill. Since this is a JSONP response, we know that it will look like this:
processPOIs(...);
...give or take a trailing semicolon or newline. We don't need a regular expression to find the part inside the parentheses, because we already know where it is: it starts right after the 12-character prefix processPOIs( (i.e. at index 12) and ends two characters before the end of the string (index -3). That makes it easy to work with String#slice, a.k.a. String#[]:
json = response.body[12..-3]
struct = JSON.parse(json)
Like I said, "give or take a trailing semicolon or newline," so you might need to tweak that ending index depending on what the API returns. And with that, no more ugly .first.first, and it's faster, too.
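If hard-coding the indices feels brittle, another option (just a sketch; it assumes the callback name is always processPOIs) is to strip the wrapper explicitly:

require 'json'

body = response.body.strip                                # tolerate a trailing newline
json = body.sub(/\AprocessPOIs\(/, '').sub(/\);?\z/, '')  # drop the JSONP wrapper
struct = JSON.parse(json)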
Thank you everybody for the help. I was able to get everything into a CSV and then just used some VBA to organize it the way I wanted.
require 'json'
require 'net/http'
require 'uri'
require 'csv'

uri = 'https://www.mapquestapi.com/search/v2/radius?key=Imjtd%7Clu6t200zn0,bw=o5-layg1&radius=3000&callback=processPOIs&maxMatches=4000&origin=40.7686973%2C-73.9918181&hostedData=mqap.33882_stores_prod%7Copen_status%20=%20?%20OR%20open_status%20=%20?%20OR%20open_status%20=%20?%7CExisting,Coming%20Soon,New%7C'
response = Net::HTTP.get_response(URI.parse(uri))
# Extract the JSON payload from the JSONP wrapper.
# (The match on the next line is left over and not actually used; the slice below does the work.)
matches = response.body.match(/processPOIs\((.*)\)/)
json = response.body[12..-3]
struct = JSON.parse(json)
CSV.open("output.csv", "w") do |csv|
  csv << struct['searchResults'].map { |result| result['fields'] }
end
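If you ever want to skip the VBA step, a small follow-up sketch (it assumes each result['fields'] is a hash and that all results share the same keys) writes one row per store plus a header row:

require 'csv'

results = struct['searchResults']
CSV.open("output.csv", "w") do |csv|
  csv << results.first['fields'].keys                       # header row
  results.each { |result| csv << result['fields'].values }  # one row per result
end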

Ruby CSV converter, remove all converters?

I have some data I am writing from one CSV to another CSV because I need to do some data manipulation.
I noticed the CSV library has some default converters that are taking my values that look like dates and parsing them into new date strings.
I was wondering if I could remove all converters. I tried using my own custom converter, but no matter what I do the dates keep getting parsed.
Here is my code simplified:
require 'csv'

CSV::Converters[:my_converter] = lambda do |value|
  value
end

CSV.open('new-data.csv', 'w') do |csv|
  data = CSV.read('original-data.csv', :converters => [:my_converter]).each do |row|
    csv << row
  end
end
The value 9/30/14 0:00 is getting changed to 9/30/2014 0:00, for example.
Are you sure that your CSV file doesn't actually contain the 4-digit year? Try looking at puts File.read('original-data.csv')
When I tried this on Ruby 2.1.8, it didn't change the value:
require 'csv'

my_csv_data = 'hello,"9/30/14 0:00",world'
CSV.new(my_csv_data).each do |row|
  puts row.inspect # prints ["hello", "9/30/14 0:00", "world"], as expected
end
CSV does not parse fields and convert them into objects; the data in each field is returned as a string, always, unless you explicitly ask for converters. This behavior is different from YAML or JSON, which do convert values back to their base types.
Consider this:
require 'csv'
CSV.parse("1,10/1/14,foo") # => [["1", "10/1/14", "foo"]]
All values are strings.
csv = ["foo", 'bar', 1, Date.new(2014, 10, 1)].to_csv # => "foo,bar,1,2014-10-01\n"
Converting an array containing native Ruby objects results in a string of comma-delimited values.
CSV.parse(csv) # => [["foo", "bar", "1", "2014-10-01"]]
Reparsing that string returns the string versions but doesn't attempt to return them to their original types as CSV doesn't have a way of knowing what those were. The developer (you) has to know and do that.
The end-result of all that is that CSV won't change a year from '14' to '2014'. It doesn't know that it's a date, and, because it's not CSV's place to convert to objects, it only splits the fields appropriately and passes the information on to be massaged by the developer.
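To see that nothing gets converted unless you ask for it, here is a quick sketch (the file name and columns are made up for the example):

require 'csv'

File.write('original-data.csv', "id,when\n1,9/30/14 0:00\n")

# No :converters option at all: every field comes back as a plain String.
CSV.read('original-data.csv', headers: true).each do |row|
  puts row['when']        # => 9/30/14 0:00
  puts row['when'].class  # => String
end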

How to add/read rows from Ruby CSV instance

Though it seems far more common for people to use the Ruby CSV class methods, I have an occasion to use a CSV instance, but it seems completely uncooperative.
What I'd like to do is create a CSV instance, add some rows to it, then be able to retrieve all those rows and write them to a file. Sadly, the following code doesn't work as I would like at all.
require 'csv'
csv = CSV.new('', headers: ['name', 'age'])
csv.read # Apparently I need to do this so that the headers are actually read in.
csv.add_row(['john', '22'])
csv.add_row(['jane', '24'])
csv.read
csv.to_a
csv.to_s
All I want is to be able to retrieve the information I put into the csv and then write it to a file, but I can't seem to do that :/
What am I doing wrong?
You need to use CSV#rewind
Here is the sample:
require 'csv'
csv = CSV.new(File.new("data1.csv", "r+"), headers: ['name', 'age'], write_headers: true)
csv.add_row(['john', '22'])
csv.add_row(['jane', '24'])
p csv.to_a # Empty array
csv.rewind
p csv.to_a # Array with three CSV::Row objects (including header)
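The reason the rewind is needed: the CSV instance wraps an IO object, and after writing, the IO's position is at the end, so reading yields nothing until you rewind. If you only need to build rows in memory and then dump them to a file, a sketch like this (using CSV.generate to buffer into a string; the file name is just an example) sidesteps the issue:

require 'csv'

buffer = CSV.generate(headers: ['name', 'age'], write_headers: true) do |csv|
  csv.add_row(['john', '22'])
  csv.add_row(['jane', '24'])
end

File.write('data1.csv', buffer)
puts buffer
# name,age
# john,22
# jane,24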

Merging CSV tables with Ruby

I'm trying to join CSV files containing stock indexes with Ruby, and having a surprisingly hard time understanding the documentation out there. It's late, and I could use a friend, so go easy on me:
I have several files, with identical headers:
["Date", "Open", "High", "Low", "Close", "Volume"]
I would like my ruby script to read each "Date" column, and write to a new CSV compiling an all encompassing date range from the earliest date to the latest.
Bonus:
Ideally, I would like to add all of the other column data ("Open", "High", etc.) into this new CSV file, split by a column simply containing the following CSV's filename for reference.
Thanks for any consideration given to this. What I'd really like to do is sit down with a Ruby sensei to help me make sense of the documentation. How can I use the CSV.read() or CSV.foreach() do |x| methods to create arrays / hashes to perform upon?
(Theoretical and intelligent responses welcomed)
hypothetical:
CSV.read("data/DOW.csv") do |output|
puts output
end
returns:
[["Date", "Open", "High", "Low", "Close", "Volume"], ["2014-07-14", "71.35", "71.52", "70.82", "71.28", "823063.0"], ["2014-07-15", "71.32", "71.76", "71.0", "71.28", "813861.0"], ["2014-07-16", "71.34", "71.58", "70.68", "71.02", "843347.0"], ["2014-07-17", "70.54", "71.46", "70.54", "71.13", "1303839.0"], ["2014-07-18", "71.46", "72.95", "71.09", "72.46", "1375922.0"], ["2014-07-21", "72.21", "73.46", "71.88", "73.38", "1603854.0"], ["2014-07-22", "73.46", "74.76", "73.46", "74.57", "1335305.0"], ["2014-07-23", "74.54", "75.1", "73.77", "74.88", "1834953.0"]]
How can I identify rows, columns, etc? I'm looking for methods or ways to transform this array into hashes etc. Honestly, an overarching theoretical approach would suit my needs.
I've been playing with Ruby and CSV most of the day, so I might be able to help (even though I am a beginner myself), but I don't understand what you want as output (a little example would help).
This example would load only columns "Date", "High" and "Volume" into "my_array".
require 'csv'

my_array = []
CSV.foreach("data.csv") do |row|
  my_array.push([row[0], row[2], row[5]])
end
If you want every column try:
my_array = []
CSV.foreach("data.csv") do |row|
  my_array.push(row)
end
If you want to access element of array inside array:
puts my_array[0][0].inspect #=> "Date"
puts my_array[1][0].inspect #=> "2014-07-14"
When you finally get what you want as output, if you are on Windows you can save it from the command prompt like this:
ruby my_file.rb > output_in_text_form.txt
You can do something like this:
#!/usr/bin/env ruby
require 'csv'
input = ARGV.shift
output = ARGV.shift
File.open(output, 'w') do |o|
  csv_string = File.read(input)
  CSV.parse(csv_string).each do |r|
    # r is an array of columns. Do something with it.
    ...
    # Generate string version.
    new_csv_row = CSV.generate_line(r, {:force_quotes => true})
    # Write to file
    o.puts new_csv_row
  end
end
Using files is optional. You can use shell redirection and directly read from STDIN and/or directly write to STDOUT.
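To tie this back to the original question (merge several files, cover the full date range, and note which file each row came from), here is a rough sketch; it assumes every input has the headers listed above, and the output name and the Source column are my own choices:

require 'csv'
require 'date'

rows = []
Dir.glob('data/*.csv').each do |path|
  CSV.foreach(path, headers: true) do |row|
    # keep the original columns and tag each row with its source file
    rows << row.to_h.merge('Source' => File.basename(path))
  end
end

# order the combined data from the earliest date to the latest
rows.sort_by! { |r| Date.parse(r['Date']) }

CSV.open('combined.csv', 'w') do |csv|
  csv << rows.first.keys
  rows.each { |r| csv << r.values }
end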

Convert CSV file into array of hashes

I have a csv file, some hockey stats, for example:
09.09.2008,1,HC Vitkovice Steel,BK Mlada Boleslav,1:0 (PP)
09.09.2008,1,HC Lasselsberger Plzen,RI OKNA ZLIN,6:2
09.09.2008,1,HC Litvinov,HC Sparta Praha,3:5
I want to save them in an array of hashes. I don't have any headers and I would like to add keys to each value, like "time" => "09.09.2008" and so on. Each line should be accessible like arr[i], and each value like arr[i]["time"]. I prefer the CSV class rather than FasterCSV or split. Can you show the way, or redirect me to some thread where a similar problem was solved?
Just pass headers: true
CSV.foreach(data_file, headers: true) do |row|
  puts row.inspect # hash
end
From there, you can manipulate the hash however you like.
(Tested with Ruby 2.0, but I think this has worked for quite a while.)
Edit
You say you don't have any headers - could you add a header line to the beginning of the file contents after reading them?
You can use the Ruby CSV parser to parse it, and then use Hash[ keys.zip(values) ] to make it a hash.
Example:
require 'csv'

test = <<-CSV_DATA.strip
09.09.2008,1,HC Vitkovice Steel,BK Mlada Boleslav,1:0 (PP)
09.09.2008,1,HC Lasselsberger Plzen,RI OKNA ZLIN,6:2
09.09.2008,1,HC Litvinov,HC Sparta Praha,3:5
CSV_DATA
keys = ['time', etc... ]
CSV.parse(test).map {|a| Hash[ keys.zip(a) ] }
This is a fantastic post by Josh Nichols which explains how to do what you're asking.
To summarize, here is his code (note that :blank_to_nil is a custom converter he defines in that post; it is not one of CSV's built-in converters):
csv = CSV.new(body, :headers => true, :header_converters => :symbol, :converters => [:all, :blank_to_nil])
csv.to_a.map {|row| row.to_hash }
=> [{:year=>1997, :make=>"Ford", :model=>"E350", :description=>"ac, abs, moon", :price=>3000.0}, {:year=>1999, :make=>"Chevy", :model=>"Venture \"Extended Edition\"", :description=>nil, :price=>4900.0}, {:year=>1999, :make=>"Chevy", :model=>"Venture \"Extended Edition, Very Large\"", :description=>nil, :price=>5000.0}, {:year=>1996, :make=>"Jeep", :model=>"Grand Cherokee", :description=>"MUST SELL!\nair, moon roof, loaded", :price=>4799.0}]
So, you could save the body of your CSV file into a string called body.
body = "09.09.2008,1,HC Vitkovice Steel,BK Mlada Boleslav,1:0 (PP)
09.09.2008,1,HC Lasselsberger Plzen,RI OKNA ZLIN,6:2
09.09.2008,1,HC Litvinov,HC Sparta Praha,3:5"
And then run his code as listed above on it.
A little shorter solution
Parse string:
CSV.parse(content, headers: :first_row).map(&:to_h)
Parse file:
CSV.open(filename, headers: :first_row).map(&:to_h)
Slight variation on Nathan Long's answer
data_file = './sheet.csv'
data = CSV.foreach(data_file, headers: true).map(&:to_h)
Now data is an array of hashes to do your bidding with!
The headers option to the CSV module accepts an array of strings to be used as the headers, when they're not present as the first row in the CSV content.
CSV.parse(content, headers: %w(time number team_1 team_2 score))
This will generate an enumerable of hashes using the given headers as keys.
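For example, with one of the hockey rows from the question (a quick sketch using those header names):

require 'csv'

content = "09.09.2008,1,HC Litvinov,HC Sparta Praha,3:5\n"
rows = CSV.parse(content, headers: %w(time number team_1 team_2 score))
rows.each { |row| puts row.to_h.inspect }
# {"time"=>"09.09.2008", "number"=>"1", "team_1"=>"HC Litvinov", "team_2"=>"HC Sparta Praha", "score"=>"3:5"}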
You can also try the csv_hasher gem:
require 'csv_hasher'
arr_of_hashes = CSVHasher.hashify('/path/to/csv/file')
The keys of the returned hashes will be the header values of the csv file.
If you want to pass your own keys then
keys = [:key1, :key2, ... ]
arr_of_hashes = CSVHasher.hashify('/path/to/csv/file', { keys: keys })
I guess this is the shortest version:
keys = ["time", ...]
CSV.parse(content, headers: keys).map(&:to_h)
You could also use the SmarterCSV gem, which returns the data from CSV files as an array of Ruby hashes by default.
It has a lot of features, including processing the data in chunks, which is very beneficial for huge data files.
require 'smarter_csv'
options = {} # see GitHub README
data = SmarterCSV.process(your_file_name, options)
