How to save hash values into a CSV - ruby

I have a CSV file that I'd like to save all my hash values to. I am using Nokogiri's SAX parser to parse an XML document and then save the data to a CSV. I get each XML value like this: @infodata[:academic] = @content.inspect. The hash has the following keys:
@infodata = {}
@infodata[:titles] = Array.new([])
@infodata[:identifier]
@infodata[:typeOfLevel]
@infodata[:typeOfResponsibleBody]
@infodata[:type]
@infodata[:exact]
@infodata[:degree]
@infodata[:academic]
@infodata[:code]
@infodata[:text]
This is the code I currently use to loop through the keys and save them to the CSV:
def end_document
  CSV.open("info.csv", "wb") do |row|
    for key, val in @infodata
      row << [val,]
    end
  end
  puts "Finished..."
end
The output that I get is:
"""avancerad"""
"""Ingen examen"""
"""uh"""
"""Arkivvetenskap""""Archival science"""
"""HIA80D"""
"""10.300"""
"""uoh"""
"""Arkivvetenskap rör villkoren för befintliga arkiv och modern arkivbildning med fokus på arkivarieyrkets arbetsuppgifter: bevara, tillgängliggöra och styra information. Under ett år behandlas bl a informations- och dokumenthantering, arkivredovisning, gallring, lagstiftning och arkivteori. I kursen ingår praktik, där man under handledning får arbeta med olika arkivarieuppgifter."""
"""statlig"""
"""60"""
How do I get output like this instead, with all the values on one row:
"avancerad", "Ingen examen", "uh", "Arkivvetenskap", "Archival science", "HIA80D", 10.300,"uoh", "Arkivvetenskap rör villkoren för befintliga arkiv och modern arkivbildning med fokus på arkivarieyrkets arbetsuppgifter: bevara, tillgängliggöra och styra information. Under ett år behandlas bl a informations- och dokumenthantering, arkivredovisning, gallring, lagstiftning och arkivteori. I kursen ingår praktik, där man under handledning får arbeta med olika arkivarieuppgifter.", "statlig", 60

I think I understand your general question, so perhaps this can help you:
require 'csv'

# Flatten the titles Array into one String
@infodata[:titles] = @infodata[:titles].join(", ")
# Open the CSV for writing
CSV.open("info.csv", "wb") do |csv|
  # Write the entire row all at once
  csv << @infodata.values
end

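The join method that @joelparkerhenderson mentions just takes the two array values and joins them together into one string.
If you want each title in its own column instead, you can use flatten, which returns a new array with the nested titles array spread out into separate elements: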
# Open the CSV for writing
CSV.open("info.csv", "wb") do |csv|
  # Write the entire row all at once
  csv << @infodata.values.flatten
end
Read more at: http://www.ruby-doc.org/core-1.9.3/Hash.html#method-i-flatten
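For reference, here is a minimal self-contained sketch of both fixes together. It assumes the values are stored as plain strings (for example @content instead of @content.inspect, assuming @content already holds a String): inspect wraps a string in literal quote characters, and CSV escapes those by doubling them, which is where fields like """avancerad""" come from. The sample hash is made up from a few values in the question, and the local variable infodata stands in for @infodata.

```ruby
require 'csv'

# Hypothetical sample data standing in for @infodata. The values are plain
# strings (no #inspect), so no extra quote characters are stored.
infodata = {
  typeOfLevel: 'avancerad',
  degree: 'Ingen examen',
  titles: ['Arkivvetenskap', 'Archival science'],
  code: 'HIA80D',
  exact: '60'
}

# values keeps insertion order; flatten spreads the titles array into
# separate columns. A single << call writes everything as one row.
line = CSV.generate { |csv| csv << infodata.values.flatten }
puts line
```

If you want every field quoted, as in the desired output, CSV.open and CSV.generate also accept the force_quotes: true option.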

Related

join 3 csv files with Ruby, all columns and files

I want to merge three csv files into one.
This is what my .csv files look like:
q1.csv
Play,Num_Question
Roblox,146
Big Farm,135
q2.csv
Topic,Num_Responses,Num_Views,Year
f,0,0,2023
"Gente, estava yo en roblox y esto me asustó :(",5,23,2021
Configuracion del juego,1,12,2019
q3.csv
Month,year
01,2023
03,2021
Expected_Result.csv
Play,Num_Question,Topic,Num_Responses,Num_Views,Year,Month
Roblox,146,f,0,0,2023,01
Big Farm,135,"Gente, estava yo en roblox y esto me asustó :(",5,23,2021 Configuracion del juego,1,12,2019,03
From the q3.csv file I only need to join the month, since the year is repeated in q2.csv.
I have tried the following code, but it does not give me the result I want.
hs = %w{Play,Num_Question,Topic,Num_Responses,Num_Views,Year,Month}
CSV.open('result.csv','w') do |csv|
  csv << hs
  CSV.foreach('q1.csv', headers: true) {|row| csv << row.values_at(*hs) }
  CSV.foreach('q2.csv', headers: true) {|row| csv << row.values_at(*hs) }
  CSV.foreach('q3.csv', headers: true) do |row|
    csv << row.values_at('Play', 'Num_Question')+row.values_at( 'Topic','Num_Responses','Num_Views','Year')+row.values_at('Mes')
  end
end
Additional information:
q1.csv has 393 rows, while q2.csv and q3.csv have 21 rows each, including the header row with the column names.
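A minimal sketch of one way to do this, assuming rows in the three files should be matched up by position (row 1 with row 1, and so on), since the files share no key column. Two things in the attempt above get in the way: %w splits on whitespace, so %w{Play,Num_Question,...} is a one-element array, and row.values_at('Mes') asks for a column that q3.csv does not have (its header is Month). The helper merge_by_position is made up for illustration; it takes only the Month column from q3.csv and fills missing cells with nil when the files have different lengths (393 versus 21 rows here). The sample data is parsed from strings so the sketch runs as-is.

```ruby
require 'csv'

# Column-wise join: row i of every table goes into output row i.
# Each source is [table, columns_to_take]; shorter tables yield nil cells.
def merge_by_position(sources)
  header = sources.flat_map { |_table, cols| cols }
  row_count = sources.map { |table, _cols| table.size }.max
  rows = (0...row_count).map do |i|
    sources.flat_map do |table, cols|
      row = table[i] # Integer index returns a CSV::Row, or nil past the end
      row ? row.values_at(*cols) : Array.new(cols.size)
    end
  end
  [header] + rows
end

q1 = CSV.parse(<<~CSV1, headers: true)
  Play,Num_Question
  Roblox,146
  Big Farm,135
CSV1
q2 = CSV.parse(<<~CSV2, headers: true)
  Topic,Num_Responses,Num_Views,Year
  f,0,0,2023
  "Gente, estava yo en roblox y esto me asustó :(",5,23,2021
  Configuracion del juego,1,12,2019
CSV2
q3 = CSV.parse(<<~CSV3, headers: true)
  Month,year
  01,2023
  03,2021
CSV3

merged = merge_by_position([
  [q1, %w[Play Num_Question]],
  [q2, %w[Topic Num_Responses Num_Views Year]],
  [q3, %w[Month]]
])
output = CSV.generate { |csv| merged.each { |r| csv << r } }
puts output
```

For the real files, each CSV.parse(...) would become CSV.read('q1.csv', headers: true) and so on, and the result could be written with CSV.open('result.csv', 'w') { |csv| merged.each { |r| csv << r } }.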

Rails Not Reading a Specific Column from CSV

I'm trying to read my CSV file in Rails 5.2.3 (Ruby 2.6.3) and it's not reading a specific column.
CSV file:
barcode,hardware,title,price
2900007868390,PS4,title1,300
3499550362923,PS4,title2,1800
3499550370973,Nintendo Switch,title3,5000
and so on...
Code:
csv = CSV.read('path/to/my_csv.csv', headers: true)
csv.each do |row|
  puts "#{row['barcode']}, #{row['hardware']}, #{row['title']}, #{row['price']}"
end
Result:
, PS4, title1, 300
, PS4, title2, 1800
, Nintendo Switch, title3, 5000
and so on...
As you can see above, it's not reading the barcode column for some reason.
I can get the barcode value if I write row[0] instead of row['barcode'].
Any ideas why this is happening?
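This happened because my CSV file began with a byte order mark (BOM), so the first header name was read with an invisible extra character in front of "barcode" and row['barcode'] did not match it.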
https://en.wikipedia.org/wiki/Byte_order_mark
The code below solved the problem:
csv = CSV.read("resources/games/original_#{date}.csv", 'r:BOM|UTF-8', headers: true)
Thank you @dimitry_n for your help!
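To see the effect without the original file, here is a small sketch that writes a throwaway file beginning with a UTF-8 BOM and reads it back through the 'r:BOM|UTF-8' mode. The helper name read_csv_skipping_bom is made up for illustration; it opens the file itself and hands the IO to CSV.new, which sidesteps any differences in how CSV.read accepts a mode argument across csv versions.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: open the file with 'r:BOM|UTF-8' so Ruby skips a
# leading UTF-8 byte order mark (if any) before CSV sees the first header.
def read_csv_skipping_bom(path)
  File.open(path, 'r:BOM|UTF-8') { |io| CSV.new(io, headers: true).read }
end

# Demo: write a small file that starts with a BOM, then read it back.
content = "\uFEFFbarcode,hardware,title,price\n" \
          "2900007868390,PS4,title1,300\n"

table = nil
Tempfile.create(['games', '.csv']) do |f|
  f.write(content)
  f.flush
  table = read_csv_skipping_bom(f.path)
end

table.each do |row|
  puts "#{row['barcode']}, #{row['hardware']}, #{row['title']}, #{row['price']}"
end
```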

How to export pdf table data into csv?

I am using Rails 4.2, Ruby 2.2, Gem: 'pdf-reader'.
My application reads a PDF file that contains table data and exports it to CSV, which I have already done. However, when I compare the result with the table header and table content, the values end up in the wrong positions. This is because a PDF table is not an actual table, so some extra logic is needed to reconstruct it, and that logic is what I am asking about.
marks.pdf has content similar to the following:
School Name: ABC
Program: MicroBiology Year: Second
| Roll No | Math |
|----------- |-------- |
1000001 | 65
|----------- |-------- |
Any help would be appreciated.
Here is my working code, which reads the PDF and exports it to CSV:
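require 'csv'
require 'pdf-reader'

class ExportToCsv
  # Read the PDF text and export it to a tab-separated file
  def convert_to_csv
    pdf_reader = PDF::Reader.new("public/marks.pdf")
    # Block form closes (and flushes) the file when done; options are keyword arguments
    CSV.open("output100.tsv", "wb", col_sep: "\t") do |csv|
      data_header = ""
      pdf_reader.pages.each do |page|
        page.text.each_line do |line|
          if /^[a-z|\s]*$/i =~ line
            # Line with only letters, pipes and whitespace: treat it as a header
            data_header = line.strip
          else
            # Any other line: the text before the first digit is the row label
            # (to_s guards against nil when split returns an empty array)
            data_row = line.split(/[0-9]/).first.to_s
            csv_line = line.sub(data_row, '').strip.split(/[\(|\)]/)
            csv_line.unshift(data_row).unshift(data_header)
            csv << csv_line
          end
        end
      end
    end
  end
end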
I can't attach the original PDF here for security reasons, sorry. You can generate a similar PDF from the screenshot below.
Screenshot of the PDF:
Screenshot of the generated CSV:
The desired result should look like the image below:
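One possible direction, sketched under a loud assumption: that the extracted page text contains the | separators exactly as shown in the sample above. The helper table_text_to_rows is hypothetical; it skips lines without a pipe (such as "School Name: ABC") and border lines made only of pipes and dashes, and splits the remaining lines into cells, so headers and values land in matching columns. If the real PDF draws its borders as graphics instead, pdf-reader's text output usually separates columns with runs of spaces, and splitting on /\s{2,}/ might work better.

```ruby
require 'csv'

# Hypothetical helper: turn the plain text of one PDF page into table rows,
# assuming '|' separates the columns as in the sample text.
def table_text_to_rows(text)
  rows = []
  text.each_line do |line|
    stripped = line.strip
    # Skip lines without a column separator, e.g. "School Name: ABC"
    next unless stripped.include?('|')
    # Skip border lines made only of pipes, dashes and spaces
    next if stripped.delete('|-').strip.empty?
    # Drop one leading/trailing pipe, then split the remaining cells
    cells = stripped.sub(/\A\|/, '').sub(/\|\z/, '').split('|', -1).map(&:strip)
    rows << cells
  end
  rows
end

page_text = <<~TEXT
  School Name: ABC
  Program: MicroBiology Year: Second
  | Roll No | Math |
  |----------- |-------- |
  1000001 | 65
  |----------- |-------- |
TEXT

rows = table_text_to_rows(page_text)
csv_text = CSV.generate { |csv| rows.each { |r| csv << r } }
puts csv_text
```

With pdf-reader, the same helper could be applied per page, e.g. PDF::Reader.new("public/marks.pdf").pages.flat_map { |page| table_text_to_rows(page.text) }.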

Using a CSV file to insert values using Ruby

I have some sample code I can execute for our Nexpose server and I need to do some mass asset tagging. Here is an example of the code.
nsc = Nexpose::Connection.new('your_nexpose_instance', 'username', 'password', 3780)
nsc.login
criterion = Nexpose::Tag::Criterion.new('IP_RANGE', 'IN', ['ip1', 'ip2'])
criteria = Nexpose::Tag::Criteria.new(criterion)
tag = Nexpose::Tag.new("tagname", Nexpose::Tag::Type::Generic::CUSTOM)
tag.search_criteria = criteria
tag.save(nsc)
I have a file with the following data:
ip1,ip2,tagname
192.168.1.1,192.168.1.255,Workstations
How would I loop over this CSV and run the code above for each row? I have no experience with Ruby and tried to follow some examples, but I'm confused at this point.
Ruby's standard library includes a CSV library that you can use.
Here is a basic example based on your code and data (not tested):
require 'csv'
require 'nexpose'

nsc = Nexpose::Connection.new('your_nexpose_instance', 'username', 'password', 3780)
nsc.login
# headers: true skips the header line and lets us read columns by name
CSV.foreach("path/to/file.csv", headers: true) do |row|
  criterion = Nexpose::Tag::Criterion.new('IP_RANGE', 'IN', [row['ip1'], row['ip2']])
  criteria = Nexpose::Tag::Criteria.new(criterion)
  tag = Nexpose::Tag.new(row['tagname'], Nexpose::Tag::Type::Generic::CUSTOM)
  tag.search_criteria = criteria
  tag.save(nsc)
end
I made a directory with input.csv and main.rb
input.csv
ip1,ip2,tagname
192.168.1.1,192.168.1.255,Workstations
main.rb
require "csv"
CSV.foreach("input.csv", headers: true) do |row|
  puts "ip1: #{row['ip1']}"
  puts "ip2: #{row['ip2']}"
  puts "tagname: #{row['tagname']}"
end
The output is:
ip1: 192.168.1.1
ip2: 192.168.1.255
tagname: Workstations
I hope this can help. If you have questions I'm here :)
If you just need to loop through each line of the file and fire that chunk of code for each line, you could do something like this:
require 'net/http'
require 'nexpose'

file = Net::HTTP.get(URI(<whatever_your_file_name_is>))
# Connect and log in once, then reuse the connection for every row
nsc = Nexpose::Connection.new('your_nexpose_instance', 'username', 'password', 3780)
nsc.login
file.each_line.with_index do |line, index|
  next if index == 0
  # chomp drops the trailing newline so it doesn't end up in the tag name
  ip1, ip2, tagname = line.chomp.split(',')
  criterion = Nexpose::Tag::Criterion.new('IP_RANGE', 'IN', [ip1, ip2])
  criteria = Nexpose::Tag::Criteria.new(criterion)
  tag = Nexpose::Tag.new(tagname, Nexpose::Tag::Type::Generic::CUSTOM)
  tag.search_criteria = criteria
  tag.save(nsc)
end
NOTE: This example assumes that the CSV file is stored remotely, not locally.
ALSO: In case you're wondering, the next if index == 0 is there to skip your header record.
UPDATE
To use this approach for a local file, you can use File.read() instead of Net::HTTP.get(), like so:
file = File.read(<whatever_your_file_name_is>)
Two things to note:
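Make sure the file path is correct: a bare filename.csv is resolved relative to the current working directory, so a full path is safer. Note that File.read does not expand ~, so if you want to start from your home directory, wrap the path in File.expand_path, e.g. File.expand_path("~/folder/folder/filename.csv").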
If the files you're going to be loading are enormous, this might not be an ideal approach because it's actually reading the whole file into memory. But considering your file only has 3 columns, you'd have to have an extreme number of rows in the file for this to be an issue.
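If memory does become a concern, a minimal sketch of the streaming alternative is to let CSV.foreach read the file one row at a time. In the sketch below, tag_ip_range and tag_from_csv are hypothetical names; tag_ip_range stands in for the Nexpose calls from the question (criterion, criteria, tag, save) and only records its arguments, so the loop can be run without a Nexpose server. Swap its body for those calls.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical stand-in for the Nexpose tagging calls in the question.
# Here it only records what would have been tagged.
def tag_ip_range(ip1, ip2, tagname, log)
  log << [ip1, ip2, tagname]
end

# CSV.foreach yields one row at a time instead of loading the whole file
# into memory; headers: true skips the header record and enables row['ip1'].
def tag_from_csv(path, log)
  CSV.foreach(path, headers: true) do |row|
    tag_ip_range(row['ip1'], row['ip2'], row['tagname'], log)
  end
end

tagged = []
Tempfile.create(['input', '.csv']) do |f|
  f.write("ip1,ip2,tagname\n192.168.1.1,192.168.1.255,Workstations\n")
  f.flush
  tag_from_csv(f.path, tagged)
end
p tagged
```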

How do I use Ruby to combine several CSV files into one big CSV file?

I have been using SmarterCSV to convert BED-format files to CSV files and to rename the columns.
Now I have collected several CSV files, and want to combine them into one big CSV file.
In test3.csv, three columns (chromosome, start_site and end_site) will be used, and the other three (binding_site_pattern, score and strand) will be removed.
Three new columns also need to be added for the test3.csv rows, each with the same value in every row: Cmyc in the transcription_factor column, PWM in the cell_type column, and JASPAR in the project_name column.
Anyone have any ideas on this one?
test1.csv
transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,GM12878,11,809296,809827,ENCODE
Cmyc,GM12878,11,6704236,6704683,ENCODE
test2.csv
transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,H1ESC,19,9710417,9710587,ENCODE
Cmyc,H1ESC,11,541754,542137,ENCODE
test3.csv
chromosome,start_site,end_site,binding_site_pattern,score,strand
chr1,21942,21953,AAGCACGTGGT,1752,+
chr1,21943,21954,AACCACGTGCT,1335,-
Desired combined result:
transcription_factor,cell_type,chromosome,start_site,end_site,project_name
Cmyc,GM12878,11,809296,809827,ENCODE
Cmyc,GM12878,11,6704236,6704683,ENCODE
Cmyc,H1ESC,19,9710417,9710587,ENCODE
Cmyc,H1ESC,11,541754,542137,ENCODE
Cmyc,PWM,1,21942,21953,JASPAR
Cmyc,PWM,1,21943,21954,JASPAR
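require 'csv'

# Target columns, in output order
hs = %w{ transcription_factor cell_type chromosome start_site end_site project_name }
CSV.open('result.csv', 'w') do |csv|
  csv << hs
  # test1.csv and test2.csv already have these columns, so copy them by name
  CSV.foreach('test1.csv', headers: true) { |row| csv << row.values_at(*hs) }
  CSV.foreach('test2.csv', headers: true) { |row| csv << row.values_at(*hs) }
  # test3.csv: add the constant columns, keep only the digits of "chr1",
  # and drop binding_site_pattern, score and strand
  CSV.foreach('test3.csv', headers: true) do |row|
    csv << ['Cmyc', 'PWM', row['chromosome'].match(/\d+/).to_s] + row.values_at('start_site', 'end_site') + ['JASPAR']
  end
end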
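The same idea can be sketched in a slightly more general form, where each input table is mapped onto the target header, with a hash of constant values for the columns it lacks and optional per-column transforms. The names TARGET and combine_rows are made up for illustration, and the sample data is parsed from strings so the sketch runs without the files; for real files, use CSV.read('test1.csv', headers: true) and write the result with CSV.open('result.csv', 'w'). Using sub(/\Achr/, '') for the chromosome is a swapped-in choice that also keeps names like chrX as "X", whereas match(/\d+/) would drop them.

```ruby
require 'csv'

TARGET = %w[transcription_factor cell_type chromosome start_site end_site project_name]

# Map one parsed table onto TARGET: take columns the table has, fill the
# rest from `constants`, and apply any per-column transform lambdas.
def combine_rows(table, constants = {}, transforms = {})
  table.map do |row|
    TARGET.map do |col|
      value = constants.fetch(col) { row[col] }
      transforms.key?(col) ? transforms[col].call(value) : value
    end
  end
end

test1 = CSV.parse(<<~T1, headers: true)
  transcription_factor,cell_type,chromosome,start_site,end_site,project_name
  Cmyc,GM12878,11,809296,809827,ENCODE
T1
test3 = CSV.parse(<<~T3, headers: true)
  chromosome,start_site,end_site,binding_site_pattern,score,strand
  chr1,21942,21953,AAGCACGTGGT,1752,+
T3

rows = combine_rows(test1) +
       combine_rows(test3,
                    { 'transcription_factor' => 'Cmyc', 'cell_type' => 'PWM', 'project_name' => 'JASPAR' },
                    { 'chromosome' => ->(c) { c.sub(/\Achr/, '') } })
result = CSV.generate do |csv|
  csv << TARGET
  rows.each { |r| csv << r }
end
puts result
```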
