I have a CSV in the following format:
name,contacts.0.phone_no,contacts.1.phone_no,codes.0,codes.1
YK,1234,4567,AB001,AK002
As you can see, this is a nested structure. The CSV may contain multiple rows. I would like to convert this into an array of hashes like this:
[
  {
    name: 'YK',
    contacts: [
      {
        phone_no: '1234'
      },
      {
        phone_no: '4567'
      }
    ],
    codes: ['AB001', 'AK002']
  }
]
The structure uses numbers in the given format to represent arrays. There can be hashes inside arrays. Is there a simple way to do that in Ruby?
The CSV headers are dynamic; they can change, so I will have to create the hash on the fly based on the CSV file.
There is a similar Node library, csvtojson, that does this for JavaScript.
Just read and parse it line by line. The arr variable in the code below will hold the array of hashes you need:
arr = []
File.readlines('data.csv').drop(1).each do |line|  # 'data.csv' is the question's CSV file
  fields = line.split(',').map(&:strip)
  hash = {
    name: fields[0],
    contacts: [{ phone_no: fields[1] }, { phone_no: fields[2] }],
    codes: [fields[3], fields[4]]
  }
  arr.push(hash)
end
Let's first construct a CSV file.
str = <<~END
name,contacts.0.phone_no,contacts.1.phone_no,codes.0,IQ,codes.1
YK,1234,4567,AB001,173,AK002
ER,4321,7654,BA001,81,KA002
END
FName = 't.csv'
File.write(FName, str)
#=> 121
I have written a helper method that constructs a pattern which will be used to convert each row of the CSV file (after the first, which contains the headers) into an element (hash) of the desired array.
require 'csv'
def construct_pattern(csv)
  csv.headers.group_by { |col| col[/[^.]+/] }.
              transform_values do |arr|
    case arr.first.count('.')
    when 0
      arr.first
    when 1
      arr
    else
      key = arr.first[/(?<=\d\.).*/]
      arr.map { |v| { key=>v } }
    end
  end
end
In the code below, for the example being considered:
construct_pattern(csv)
#=> {"name"=>"name",
# "contacts"=>[{"phone_no"=>"contacts.0.phone_no"},
# {"phone_no"=>"contacts.1.phone_no"}],
# "codes"=>["codes.0", "codes.1"],
# "IQ"=>"IQ"}
By tacking if pattern.empty? onto the above expression we ensure the pattern is constructed only once.
We may now construct the desired array.
pattern = {}
CSV.foreach(FName, headers: true).map do |csv|
  pattern = construct_pattern(csv) if pattern.empty?
  pattern.each_with_object({}) do |(k,v),h|
    h[k] =
      case v
      when Array
        case v.first
        when Hash
          v.map { |g| g.transform_values { |s| csv[s] } }
        else
          v.map { |s| csv[s] }
        end
      else
        csv[v]
      end
  end
end
#=> [{"name"=>"YK",
# "contacts"=>[{"phone_no"=>"1234"}, {"phone_no"=>"4567"}],
# "codes"=>["AB001", "AK002"],
# "IQ"=>"173"},
# {"name"=>"ER",
# "contacts"=>[{"phone_no"=>"4321"}, {"phone_no"=>"7654"}],
# "codes"=>["BA001", "KA002"],
# "IQ"=>"81"}]
The CSV methods I've used are documented in CSV. See also Enumerable#group_by and Hash#transform_values.
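For reference, the intermediate result of the group_by step in construct_pattern, before transform_values is applied, looks like this for the example headers:
csv.headers.group_by { |col| col[/[^.]+/] }
#=> {"name"=>["name"],
#    "contacts"=>["contacts.0.phone_no", "contacts.1.phone_no"],
#    "codes"=>["codes.0", "codes.1"],
#    "IQ"=>["IQ"]}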
I have two CSV files
file1.csv
username;userid;full_name;follower_count;following_count;media_count;email;category
helloworld;1234;data3;data4;data5;data6;data7;data8
file2.csv
username;owner_id;owner_profile_pic_url;media_url;tagged_brand_username
helloworld;1234;data3b;data4b;data5b
I need the following output file, generated with Ruby, with blank fields when a file1.csv username is not found in file2.csv (e.g. row 2).
output.csv
username;userid;full_name;follower_count;following_count;media_count;email;category;owner_profile_pic_url;media_url;tagged_brand_username
helloworld;1234;data3;data4;data5;data6;data7;data8;data3b;data4b;data5b
helloworld;1234;data3;data4;data5;data6;data7;data8;;;
Currently I'm doing it using an Excel VLOOKUP function.
Thanks
There's a lot to unpack in this script. Essentially you need to read both CSV files into a hash, merge file2 into file1, and write it back to a CSV.
require "csv"
dict = Hash.new
options = { col_sep: ";", headers: true }

# read file1
CSV.foreach("file1.csv", options) do |row|
  row = row.to_h
  user = row['username'] + row['userid']
  dict[user] = row
end

# read file2
CSV.foreach("file2.csv", options) do |row|
  row = row.to_h
  user = row['username'] + row['owner_id']
  row.delete('owner_id')
  dict[user] = row.merge(dict[user]) if dict[user]
end

# turn hash into rows
rows = [['username','userid','full_name','follower_count','following_count','media_count','email','category','owner_profile_pic_url','media_url','tagged_brand_username']]
dict.each do |key, value|
  row = rows[0].map { |h| value[h] || "" }
  rows.push(row)
end

# write to csv
File.write("output.csv", rows.map { |r| r.to_csv(col_sep: ";") }.join)
This covers both cases: when a file1 username has a match in file2 and when it does not.
# file1.csv
username;userid;full_name;follower_count;following_count;media_count;email;category
helloworld;1234;data3;data4;data5;data6;data7;data8
goodbyeworld;5678;data3;data4;data5;data6;data7;data8
# file2.csv
username;owner_id;owner_profile_pic_url;media_url;tagged_brand_username
helloworld;1234;data3b;data4b;data5b
# output.csv
username;userid;full_name;follower_count;following_count;media_count;email;category;owner_profile_pic_url;media_url;tagged_brand_username
helloworld;1234;data3;data4;data5;data6;data7;data8;data3b;data4b;data5b
goodbyeworld;5678;data3;data4;data5;data6;data7;data8;"";"";""
As mentioned, the fact that there are two lines with the same ID in output.csv is very confusing. Next time, just add an extra row showing what happens if there's no match. While this is a good question, we have guidelines on how to write an excellent question.
There are two existing CSV input files and we wish to create one CSV output file:
FNAME1 = 'file1.csv'
FNAME2 = 'file2.csv'
FILE_OUT = 'output.csv'
Let's first create the two input files.
File.write(FNAME1, "username;userid;full_name;follower_count;following_count;media_count;email;category\nhelloworld;1234;data3;data4;data5;data6;data7;data8\n")
#=> 136
File.write(FNAME2, "username;owner_id;owner_profile_pic_url;media_url;tagged_brand_username\nhelloworld;1234;data3b;data4b;data5b\n")
#=> 109
Now go through the steps to read those files, manipulate their contents and write the output file.
require 'csv'
First read both input files and save their contents in variables.
def read_csv(fname)
CSV.read(fname, col_sep: ';', headers: true)
end
csv1 = read_csv(FNAME1)
#=> #<CSV::Table mode:col_or_row row_count:2>
csv2 = read_csv(FNAME2)
#=> #<CSV::Table mode:col_or_row row_count:2>
Note:
csv1.to_a
#=> [["username", "userid", "full_name", "follower_count", "following_count",
# "media_count", "email", "category"],
# ["helloworld", "1234", "data3", "data4", "data5",
# "data6", "data7", "data8"]]
csv2.to_a
#=> [["username", "owner_id", "owner_profile_pic_url", "media_url", "tagged_brand_username"],
# ["helloworld", "1234", "data3b", "data4b", "data5b"]]
As you see, these are ordinary arrays, so if we wished we could at this point forget they came from CSV files and use standard Ruby methods to create the desired output file.
Now see if the values of "username" are the same in both files:
username1 = csv1['username'].first
#=> "helloworld"
username2 = csv2['username'].first
#=> "helloworld"
csv1['username'] returns an array of all the values in the "username" column. Here that is simply ["helloworld"]; hence .first. Same for csv2, of course.
If username1 == username2 were false we would perform some action (the OP has not said what), then quit. Henceforth, I assume the two usernames are equal.
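Here is a minimal sketch of that check; the action taken on a mismatch is hypothetical (a plain abort), since the OP hasn't said what should happen:
# Hypothetical handling of a mismatch; substitute whatever action is actually required.
unless username1 == username2
  abort "username mismatch: #{username1.inspect} vs #{username2.inspect}"
end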
Read the headers of both files into arrays.
headers1 = csv1.headers
#=> ["username", "userid", "full_name", "follower_count", "following_count",
# "media_count", "email", "category"]
headers2 = csv2.headers
#=> ["username", "owner_id", "owner_profile_pic_url", "media_url",
# "tagged_brand_username"]
The output file is to contain all the columns in headers1 and all the columns in headers2 with the exception of "username" and "owner_id" in headers2, so let's next get rid of those headers in headers2:
headers2 -= ["username", "owner_id"]
#=> ["owner_profile_pic_url", "media_url", "tagged_brand_username"]
Next retrieve the values of the headers in the first file:
values1 = headers1.flat_map { |h| csv1[h] }
#=> ["helloworld", "1234", "data3", "data4", "data5", "data6", "data7", "data8"]
and the values of the remaining headers in the second file:
values2 = headers2.flat_map { |h| csv2[h] }
#=> ["data3b", "data4b", "data5b"]
We will modify values2 below so we need to save its current size:
values2_size = values2.size
#=> 3
The first line in the output file after the header line is to contain the values:
values1 += values2
#=> ["helloworld", "1234", "data3", "data4", "data5", "data6", "data7", "data8",
# "data3b", "data4b", "data5b"]
and the second line is to contain:
values2 = values1 - values2
#=> ["helloworld", "1234", "data3", "data4", "data5", "data6", "data7", "data8",
plus values2_size #=> 3 empty fields.
We could use CSV methods to write this to file, but there is really no advantage in doing so over using regular file methods. We can simply write the following string to file.
str = [(headers1 + headers2).join(';'),
values1.join(';'),
values2.join(';') + ';' * values2_size
].join("\n")
puts str
username;userid;full_name;follower_count;following_count;media_count;email;category;owner_profile_pic_url;media_url;tagged_brand_username
helloworld;1234;data3;data4;data5;data6;data7;data8;data3b;data4b;data5b
helloworld;1234;data3;data4;data5;data6;data7;data8;;;
Let's do it.
File.write(FILE_OUT, str)
#=> 265
Note that, if a and b are arrays, a += b and a -= b expand to a = a + b and a = a - b, respectively. The CSV methods I've used are documented here.
I will leave it to the OP to combine the operations I've discussed into a method.
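That said, here is a minimal sketch of such a method (the name build_output is mine), reusing the read_csv helper above and resting on the same assumptions: one data row per file and equal usernames.
def build_output(fname1, fname2, file_out)
  csv1 = read_csv(fname1)
  csv2 = read_csv(fname2)

  headers1 = csv1.headers
  headers2 = csv2.headers - ["username", "owner_id"]

  values1 = headers1.flat_map { |h| csv1[h] }  # values from file1
  values2 = headers2.flat_map { |h| csv2[h] }  # values from the remaining file2 columns

  str = [(headers1 + headers2).join(';'),
         (values1 + values2).join(';'),
         values1.join(';') + ';' * values2.size].join("\n")
  File.write(file_out, str)
end

build_output(FNAME1, FNAME2, FILE_OUT)
#=> 265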
My file content is
blablabla
Name : 'XYZ'
Age : '30'
Place : 'ABCD'
blablabla
How can I grep for "Name", "Age", "Place" and store name "XYZ", age "30" and place "ABCD" in a hash?
What should be the '?' in this code to get those?
data = {}
name = /Name/
age = /Age/
place = /Place/
read_lines(file) { |l|
  case l
  when name
    data[:name] = ?
  when age
    data[:age] = ?
  when place
    data[:place] = ?
  end
}
You can use something like this.
data = {}
keys = {:name => "Name", :age => "Age", :place => "Place"}
File.open("test.txt", "r") do |f|
f.each_line do |line|
line.chomp!
keys.each do |hash_key, string|
if line[/#{string}/]
data[hash_key] = line.strip.split(" : ")[-1].gsub("'", "")
break
end
end
end
end
output
p data
# => {:name=>"XYZ", :age=>"30", :place=>"ABCD"}
Strange code, but in this case:
when name
  data[:name] = l.split(':')[1] if l.match(name)
when age
  data[:age] = l.split(':')[1] if l.match(age)
when place
  data[:place] = l.split(':')[1] if l.match(place)
Are you interested in refactoring?
One option is to:
mapping =
  [
    { name: :name, pattern: /Name/ },
    { name: :age, pattern: /Age/ },
    { name: :place, pattern: /Place/ }
  ]

data = str.split(/\r?\n|\r/).map do |line|
  mapping.map { |pair|
    { pair[:name] => line.split(' : ')[1].gsub("'", "") } if line.match(pair[:pattern])
  }.compact.reduce({}, :merge)
end.reduce({}, :merge)
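Assuming str holds the file content shown in the question, data ends up as:
data
#=> {:name=>"XYZ", :age=>"30", :place=>"ABCD"}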
Suppose we first read the file into a string:
str = File.read('fname')
which is:
str =<<_
blablabla
Name : 'XYZ'
Age : '30'
Place : 'ABCD'
blablabla
_
#=> "blablabla\nName : 'XYZ'\nAge : '30'\nPlace : 'ABCD'\nblablabla\n"
Then use the regex
r = /
    ^                    # match beginning of line
    Name\s*:\s*'(.*)'\n  # match 'Name', ':' possibly surrounded by spaces, any number
                         # of any characters in capture group 1, end of line
    Age\s*:\s*'(.*)'\n   # match 'Age', ':' possibly surrounded by spaces, any number
                         # of any characters in capture group 2, end of line
    Place\s*:\s*'(.*)'\n # match 'Place', ':' possibly surrounded by spaces, any number
                         # of any characters in capture group 3, end of line
    /x                   # free-spacing regex definition mode
with String#scan to form the hash:
[:name, :age, :place].zip(str.scan(r).first).to_h
#=> {:name=>"XYZ", :age=>"30", :place=>"ABCD"}
I'd do something like this:
str = <<EOT
blablabla
Name : 'XYZ'
Age : '30'
Place : 'ABCD'
blablabla
EOT
str.scan(/(Name|Age|Place)\s+:\s'([^']+)/).to_h # => {"Name"=>"XYZ", "Age"=>"30", "Place"=>"ABCD"}
scan will create sub-arrays if it sees pattern groups in the regular expression. Those make it easy to turn the returned array of arrays into a hash.
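To make that concrete, the raw scan result, before to_h, is an array of [key, value] pairs:
str.scan(/(Name|Age|Place)\s+:\s'([^']+)/)
# => [["Name", "XYZ"], ["Age", "30"], ["Place", "ABCD"]]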
If you need to fold the keys to lower-case, or convert them to symbols:
str.scan(/(Name|Age|Place)\s+:\s'([^']+)/)
.map{ |k, v| [k.downcase, v] } # => [["name", "XYZ"], ["age", "30"], ["place", "ABCD"]]
.to_h # => {"name"=>"XYZ", "age"=>"30", "place"=>"ABCD"}
Or:
str.scan(/(Name|Age|Place)\s+:\s'([^']+)/)
.map{ |k, v| [k.downcase.to_sym, v] } # => [[:name, "XYZ"], [:age, "30"], [:place, "ABCD"]]
.to_h # => {:name=>"XYZ", :age=>"30", :place=>"ABCD"}
Or some variation on:
str.scan(/(Name|Age|Place)\s+:\s'([^']+)/)
.each_with_object({}){ |(k,v), h| h[k.downcase.to_sym] = v}
# => {:name=>"XYZ", :age=>"30", :place=>"ABCD"}
If the example string truly is the complete file, and there won't be any other occurrence of the key/value pairs, then this will work. If there could be more than one occurrence, the resulting hash will not be correct, because the subsequent pairs will stomp on the first ones. If the file is as you described, it'll work fine.
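If the file could contain several such blocks, one possible tweak (a sketch, assuming every block always contains all three keys so the pairs come in complete groups of three) is to slice the scanned pairs into one hash per block:
str.scan(/(Name|Age|Place)\s+:\s'([^']+)/)
  .each_slice(3)                                                   # one slice per Name/Age/Place block
  .map { |pairs| pairs.map { |k, v| [k.downcase.to_sym, v] }.to_h }
# => [{:name=>"XYZ", :age=>"30", :place=>"ABCD"}]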
Sorry for my bad English, I'm new. I have this document.txt:
paul gordon,jin kazama,1277,1268,21-12,21-19
yoshimistu,the rock,2020,2092,21-9,21-23,25-27
... lot more
How can I split each line on the comma separator and turn it into a hash like this?
result = {
  line_num: { name1: "paul gordon", name2: "jin kazama", m1: 1277, m2: 1268, sc1: 21, sc2: 12, sc3: 21, sc4: 19 }
}
I tried the code below (I'm using text2re for the regex here):
doc = File.read("doc.txt")
lines = doc.split("\n")
counts = 0
example = {}
player1 = '((?:[a-z][a-z]+))(.)((?:[a-z][a-z]+))'
player2 = '((?:[a-z][a-z]+))(.)((?:[a-z][a-z]+))'
re = (player1 + player2 )
m = Regexp.new(re, Regexp::IGNORECASE)
lines.each do |line|
  re1 = '((?:[a-z][a-z]+))' # Word 1
  re2 = '(.)'               # Any Single Character 1
  re3 = '((?:[a-z][a-z]+))' # Word 2
  re4 = '(.)'               # Any Single Character 2
  re5 = '((?:[a-z][a-z]+))' # Word 3
  re6 = '(.)'               # Any Single Character 3
  re7 = '((?:[a-z][a-z]+))' # Word 4
  re = (re1 + re2 + re3 + re4 + re5 + re6 + re7)
  m = Regexp.new(re, Regexp::IGNORECASE)
  if m.match(line)
    word1 = m.match(line)[1]
    c1 = m.match(line)[2]
    word2 = m.match(line)[3]
    c2 = m.match(line)[4]
    word3 = m.match(line)[5]
    c3 = m.match(line)[6]
    word4 = m.match(line)[7]
    counts += 1
    example[counts] = word1 + word2
    puts example
  end
end
# (/[a-z].?/)
But the output does not match my expectation:
1=>"", 2=>"indahdelika", 3=>"masam",
..more
Your data is comma-separated, so use the CSV class instead of trying to roll your own parser. There are dragons waiting for you if you try to split simply using commas.
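A quick illustration of one such dragon, using a made-up line with a quoted field that contains a comma:
require 'csv'

line = 'jin kazama,"the rock, jr",2020'
line.split(',')      # => ["jin kazama", "\"the rock", " jr\"", "2020"]  (mangled)
CSV.parse_line(line) # => ["jin kazama", "the rock, jr", "2020"]         (correct)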
I'd use:
require 'csv'
data = "paul gordon,jin kazama,1277,1268,21-12,21-19
yoshimistu,the rock,2020,2092,21-9,21-23,25-27
"
hash = {}
CSV.parse(data).each_with_index do |row, i|
  name1, name2, m1, m2, sc1_2, sc3_4 = row
  sc1, sc2 = sc1_2.split('-')
  sc3, sc4 = sc3_4.split('-')
  hash[i] = {
    name1: name1,
    name2: name2,
    m1: m1,
    m2: m2,
    sc1: sc1,
    sc2: sc2,
    sc3: sc3,
    sc4: sc4,
  }
end
Which results in:
hash
# => {0=>
# {:name1=>"paul gordon",
# :name2=>"jin kazama",
# :m1=>"1277",
# :m2=>"1268",
# :sc1=>"21",
# :sc2=>"12",
# :sc3=>"21",
# :sc4=>"19"},
# 1=>
# {:name1=>"yoshimistu",
# :name2=>"the rock",
# :m1=>"2020",
# :m2=>"2092",
# :sc1=>"21",
# :sc2=>"9",
# :sc3=>"21",
# :sc4=>"23"}}
Since you're reading from a file, modify the above a bit using the "Reading from a file a line at a time" example in the documentation.
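A sketch of that adaptation, assuming the data lives in document.txt as in the question:
require 'csv'

hash = {}
CSV.foreach('document.txt').with_index do |row, i|
  name1, name2, m1, m2, sc1_2, sc3_4 = row
  sc1, sc2 = sc1_2.split('-')
  sc3, sc4 = sc3_4.split('-')
  hash[i] = { name1: name1, name2: name2, m1: m1, m2: m2,
              sc1: sc1, sc2: sc2, sc3: sc3, sc4: sc4 }
end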
If the numerics need to be integers, tweak the hash definition to:
hash[i] = {
  name1: name1,
  name2: name2,
  m1: m1.to_i,
  m2: m2.to_i,
  sc1: sc1.to_i,
  sc2: sc2.to_i,
  sc3: sc3.to_i,
  sc4: sc4.to_i,
}
Which results in:
# => {0=>
# {:name1=>"paul gordon",
# :name2=>"jin kazama",
# :m1=>1277,
# :m2=>1268,
# :sc1=>21,
# :sc2=>12,
# :sc3=>21,
# :sc4=>19},
# 1=>
# {:name1=>"yoshimistu",
# :name2=>"the rock",
# :m1=>2020,
# :m2=>2092,
# :sc1=>21,
# :sc2=>9,
# :sc3=>21,
# :sc4=>23}}
# :sc4=>"23"}}
This is another way you could do it. I have made no assumptions about the number of items per line which are to be the values of :namex, :scx or :mx, or the order of those items.
Code
def hashify(str)
  str.lines.each_with_index.with_object({}) { |(s,i),h| h[i] = inner_hash(s) }
end

def inner_hash(s)
  n = m = sc = 0
  s.split(',').each_with_object({}) do |f,g|
    case f
    when /[a-zA-Z].*/
      g["name#{n += 1}".to_sym] = f
    when /\-/
      g["sc#{sc += 1}".to_sym], g["sc#{sc += 1}".to_sym] = f.split('-').map(&:to_i)
    else
      g["m#{m += 1}".to_sym] = f.to_i
    end
  end
end
Example
str = "paul gordon,jin kazama,1277,1268,21-12,21-19
yoshimistu,the rock,2020,2092,21-9,21-23,25-27"
hashify(str)
#=> {0=>{:name1=>"paul gordon", :name2=>"jin kazama",
# :m1=>1277, :m2=>1268,
# :sc1=>21, :sc2=>12, :sc3=>21, :sc4=>19},
# 1=>{:name1=>"yoshimistu", :name2=>"the rock",
# :m1=>2020, :m2=>2092,
# :sc1=>21, :sc2=>9, :sc3=>21, :sc4=>23, :sc5=>25, :sc6=>27}
# }
I am generating a script that outputs information to the console. The information is a set of statistics, each with a name and a value, much like a hash.
One value's name may be 8 characters long and another only 3, so when I loop through and output the information with two \t characters, some of the columns aren't aligned correctly.
So for example the output might be as such:
long value name 14
short 12
little 13
tiny 123421
long name again 912421
I want all the values lined up correctly. Right now I am doing this:
puts "#{value_name} - \t\t #{value}"
How could I tell it to use only one tab for long names? Or is there another solution?
Provided you know the maximum length to be no more than 20 characters:
printf "%-20s %s\n", value_name, value
If you want to make it more dynamic, something like this should work nicely:
longest_key = data_hash.keys.max_by(&:length)
data_hash.each do |key, value|
printf "%-#{longest_key.length}s %s\n", key, value
end
There is usually a %10s kind of printf scheme that formats nicely. However, I have not used Ruby at all, so you need to check that.
Yes, there is printf with formatting. The %10s example above should right-align the string in a space of 10 characters. You can size the field based on your widest value in the column.
printf([port, ]format, arg...)
Prints the arguments formatted according to format, like sprintf. If the first argument is an instance of IO or one of its subclasses, output is redirected to that object; the default is the value of $stdout.
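In Ruby that looks like the following minimal sketch; %10s right-aligns the name in a field 10 characters wide, so names longer than 10 characters still overflow the field:
stats = { "short" => 12, "little" => 13, "long value name" => 14 }
stats.each { |name, value| printf("%10s %s\n", name, value) }
#      short 12
#     little 13
# long value name 14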
String has a built-in ljust for exactly this:
x = {"foo"=>37, "something long"=>42, "between"=>99}
x.each { |k, v| puts "#{k.ljust(20)} #{v}" }
# Outputs:
# foo                  37
# something long       42
# between              99
Or, if you want tabs, you can do a little math (assuming tab display width of 8) and write a short display function:
def tab_pad(label, tab_stop = 4)
label_tabs = label.length / 8
label.ljust(label.length + tab_stop - label_tabs, "\t")
end
x.each { |k, v| puts "#{tab_pad(k)}#{v}" }
# Outputs:
# foo                             37
# something long                  42
# between                         99
There were a few bugs in it before, but now you can use most of the printf syntax with the % operator:
1.9.3-p194 :025 > " %-20s %05d" % ['hello', 12]
=> " hello 00012"
Of course you can use a precalculated width too:
1.9.3-p194 :030 > "%-#{width}s %05x" % ['hello', 12]
=> "hello 0000c"
I wrote a thing
Automatically detects column widths
Pads with spaces
Array of arrays [[],[],...] or array of hashes [{},{},...]
Does not detect columns too wide for console window
lists = [
[ 123, "SDLKFJSLDKFJSLDKFJLSDKJF" ],
[ 123456, "ffff" ],
]
array_maxes
def array_maxes(lists)
  lists.reduce([]) do |maxes, list|
    list.each_with_index do |value, index|
      maxes[index] = [(maxes[index] || 0), value.to_s.length].max
    end
    maxes
  end
end
array_maxes(lists)
# => [6, 24]
puts_arrays_columns
def puts_arrays_columns(lists)
  maxes = array_maxes(lists)
  lists.each do |list|
    list.each_with_index do |value, index|
      print " #{value.to_s.rjust(maxes[index])},"
    end
    puts
  end
end
puts_arrays_columns(lists)
# Output:
#     123, SDLKFJSLDKFJSLDKFJLSDKJF,
#  123456,                     ffff,
and another thing
hashes = [
{ "id" => 123, "name" => "SDLKFJSLDKFJSLDKFJLSDKJF" },
{ "id" => 123456, "name" => "ffff" },
]
hash_maxes
def hash_maxes(hashes)
  hashes.reduce({}) do |maxes, hash|
    hash.keys.each do |key|
      maxes[key] = [(maxes[key] || 0), key.to_s.length].max
      maxes[key] = [(maxes[key] || 0), hash[key].to_s.length].max
    end
    maxes
  end
end
hash_maxes(hashes)
# => {"id"=>6, "name"=>24}
puts_hashes_columns
def puts_hashes_columns(hashes)
  return if hashes.empty?
  maxes = hash_maxes(hashes)

  # Headers
  hashes.first.each do |key, value|
    print " #{key.to_s.rjust(maxes[key])},"
  end
  puts

  hashes.each do |hash|
    hash.each do |key, value|
      print " #{value.to_s.rjust(maxes[key])},"
    end
    puts
  end
end
puts_hashes_columns(hashes)
# Output:
#      id,                     name,
#     123, SDLKFJSLDKFJSLDKFJLSDKJF,
#  123456,                     ffff,
Edit: the hash keys are now also taken into account when computing the column widths.
hashes = [
{ id: 123, name: "DLKFJSDLKFJSLDKFJSDF", asdfasdf: :a },
{ id: 123456, name: "ffff", asdfasdf: :ab },
]
hash_maxes(hashes)
# => {:id=>6, :name=>20, :asdfasdf=>8}
Want to whitelist columns?
hashes.map{ |h| h.slice(:id, :name) }
# => [
# { id: 123, name: "DLKFJSDLKFJSLDKFJSDF" },
# { id: 123456, name: "ffff" },
#]
For future reference, and for anyone who finds this later: use a gem. I suggest https://github.com/wbailey/command_line_reporter
You typically don't want to use tabs; you want to use spaces and essentially set up your "columns" yourself, or else you run into these types of problems.