Process a file using hash - ruby

Below is an input file that I want to store in a hash table, sort, and output in the format shown below.
Input File
Name=Ashok, Email=ashok85#gmail.com, Country=India, Comments=9898984512
Email=raju#hotmail.com, Country=Sri Lanka, Name=Raju
Country=India, Comments=45535878, Email=vijay#gmail.com, Name=Vijay
Name=Ashok, Country=India, Email=ashok37#live.com, Comments=8898788987
Output File (Sorted by Name)
Name Email Country Comments
-------------------------------------------------------
Ashok ashok37#live.com India 8898788987
Ashok ashok85#gmail.com India 9898984512
Raju raju#hotmail.com Sri Lanka
Vijay vijay#gmail.com India 45535878
So far, I have read the data from the file and stored every line in an array, but I am stuck at hash[key]=>value:
file_data = {}
File.open('input.txt', 'r') do |file|
  file.each_line do |line|
    line_data = line.split('=')
    file_data[line_data[0]] = line_data[1]
  end
end
puts file_data

Given that each line in your input file is a list of key=value pairs separated by commas, you need to split each line first on the commas and then on the equals sign. Here is a corrected version of your code:
# Need to collect parsed data from each line into an array
array_of_file_data = []
File.open('input.txt', 'r') do |file|
  file.each_line do |line|
    # create a hash to collect data from each line
    file_data = {}
    # First split by comma
    pairs = line.chomp.split(", ")
    pairs.each do |p|
      # Split by = to separate out key and value
      key_value = p.split('=')
      file_data[key_value[0]] = key_value[1]
    end
    array_of_file_data << file_data
  end
end
puts array_of_file_data
The above code prints:
{"Name"=>"Ashok", "Email"=>"ashok85#gmail.com", "Country"=>"India", "Comments"=>"9898984512"}
{"Email"=>"raju#hotmail.com", "Country"=>"Sri Lanka", "Name"=>"Raju"}
{"Country"=>"India", "Comments"=>"45535878", "Email"=>"vijay#gmail.com", "Name"=>"Vijay"}
{"Name"=>"Ashok", "Country"=>"India", "Email"=>"ashok37#live.com", "Comments"=>"8898788987"}
A more complete version of the program is given below.
hash_array = []
# Parse the lines and store them in an array of hashes
File.open("sample.txt", "r") do |f|
  f.each_line do |line|
    # Split around , and = with any amount of surrounding whitespace
    splits = line.chomp.split(/\s*,\s*/).map { |p| p.split(/\s*=\s*/) }
    # to_h converts an array of [key, value] pairs into a hash
    hash_array << splits.to_h
  end
end
# Sort the array of hashes by name
hash_array = hash_array.sort { |i, j| i["Name"] <=> j["Name"] }
# Print the output; more tricks are needed to align the columns
header = ["Name", "Email", "Country", "Comments"]
puts header.join(" ")
hash_array.each do |h|
  puts h.values_at(*header).join(" ")
end
The above program outputs:
Name Email Country Comments
Ashok ashok85#gmail.com India 9898984512
Ashok ashok37#live.com India 8898788987
Raju raju#hotmail.com Sri Lanka
Vijay vijay#gmail.com India 45535878
You may want to refer to "Padding printed output of tabular data" for better-formatted tabular output.
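Two things are left open by the program above: `sort` compares names only, so the relative order of the two Ashok rows is not guaranteed, and the columns are not aligned. A minimal sketch addressing both, assuming the sample data is inlined as a string instead of read from sample.txt: `sort_by` on `[Name, Email]` breaks the tie (which gives the order shown in the question), and `String#ljust` pads each column to its widest value. The `format_table` helper is a hypothetical name, not part of the original answer.

```ruby
# Sample input inlined for the sketch (assumption: same content as the file)
INPUT = <<~TEXT
  Name=Ashok, Email=ashok85#gmail.com, Country=India, Comments=9898984512
  Email=raju#hotmail.com, Country=Sri Lanka, Name=Raju
  Country=India, Comments=45535878, Email=vijay#gmail.com, Name=Vijay
  Name=Ashok, Country=India, Email=ashok37#live.com, Comments=8898788987
TEXT

HEADER = ["Name", "Email", "Country", "Comments"]

# Parse each line into a hash, as in the answer above
rows = INPUT.each_line.map do |line|
  line.chomp.split(/\s*,\s*/).map { |pair| pair.split(/\s*=\s*/) }.to_h
end

# Sort by name, then by email, so equal names get a deterministic order
rows = rows.sort_by { |h| [h["Name"], h["Email"]] }

# Hypothetical helper: pad every column to the width of its widest cell
def format_table(header, rows)
  # Missing keys (Raju has no Comments) become empty strings
  cells = [header] + rows.map { |h| h.values_at(*header).map(&:to_s) }
  widths = header.each_index.map { |i| cells.map { |r| r[i].length }.max }
  lines = cells.map do |r|
    r.each_with_index.map { |c, i| c.ljust(widths[i]) }.join("  ").rstrip
  end
  # Dashed rule under the header, as long as the header line
  lines.insert(1, "-" * lines.first.length)
  lines.join("\n")
end

puts format_table(HEADER, rows)
```

Sorting on an array key is a common Ruby idiom for multi-level ordering: arrays compare element by element, so the email is only consulted when the names are equal.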

Related

Ruby substring from a file

I need to get a word from a text file. For example, I have this text file:
AWD,SUV,0km,auto
and I need to get the km value or the drivetrain (AWD). What do I do after reading the file?
Here's how I'm reading the file:
def getWord(fileName)
  file = fileName
  File.readlines(file).each do |line|
    puts line
  end
end
This is a CSV (Comma Separated Value) file. You can just split on the comma and take the fields.
# chomp: true removes the trailing newline
File.readlines(fileName, chomp: true).each do |line|
  drivetrain, type, mileage, transmission = line.split(',')
  puts drivetrain
  puts mileage
end
But CSV files can get more complex. For example, a value that contains a comma may be quoted: 1997,Ford,E350,"Super, luxurious truck". To handle all the possibilities, use the CSV class to parse each line:
require 'csv'
headers = [:drivetrain, :type, :mileage, :transmission]
CSV.foreach(fileName, headers: headers) do |row|
  puts row[:drivetrain]
  puts row[:mileage]
end
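To see the quoting rule in action, here is a small sketch that parses the sample line from above both ways. A plain `split(',')` breaks the quoted value apart, while `CSV.parse_line` keeps it as one field and removes the quotes.

```ruby
require 'csv'

line = '1997,Ford,E350,"Super, luxurious truck"'

# Naive split: the comma inside the quotes is treated as a separator
naive = line.split(',')
# => ["1997", "Ford", "E350", "\"Super", " luxurious truck\""]

# CSV respects the quotes and strips them from the value
fields = CSV.parse_line(line)
# => ["1997", "Ford", "E350", "Super, luxurious truck"]

puts naive.length  # 5
puts fields.length # 4
```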

Ruby: script to convert structured text to a csv

I have a text file with structured text which I wish to convert to a csv file.
The file looks something like:
name: Seamus
address: 123 Strand Avenue

name: Seana
address: 126 Strand Avenue
I would like it to look like:
|name | address
______________________________
|Seamus | 123 Strand Avenue
______________________________
|Seana | 126 Strand Avenue
So I understand that I need to do something like:
create a csv file
create the column names
read the text file
for each row of the text file starting with 'name', assign the following text to the 'name' column; for each row starting with 'address', assign the value to the 'address' column, etc.
But I don't know how to do so.
I would appreciate any pointers people could provide.
The solution starts with working out how to parse the text file. In this specific case, the "records" in the text file are separated by an empty line.
The first step is to import the file contents:
string_content = File.read("path/to/my_file.txt")
# => "name: Seamus\naddress: 123 Strand Avenue\n\nname: Seana\naddress: 126 Strand Avenue\n"
Then you need to separate the records. An empty line is a line that contains only \n, so the \n at the end of the line above plus the one on the empty line make \n\n. That is what you need to look for to separate the records:
string_records = string_content.split("\n\n")
# => ["name: Seamus\naddress: 123 Strand Avenue", "name: Seana\naddress: 126 Strand Avenue\n"]
Once you have the records as strings, it is just a matter of splitting each one on \n to separate the fields:
records_by_field = string_records.map do |string_record|
  string_record.split("\n")
end
# => [["name: Seamus", "address: 123 Strand Avenue"], ["name: Seana", "address: 126 Strand Avenue"]]
Once that is done, split each field on : to separate the field name from its value:
data = records_by_field.map do |record|
  record.each_with_object({}) do |field, new_record|
    field_name, field_value = field.split(":")
    new_record[field_name] = field_value.strip # don't forget to get rid of the initial space with String#strip
  end
end
# => [{"name"=>"Seamus", "address"=>"123 Strand Avenue"}, {"name"=>"Seana", "address"=>"126 Strand Avenue"}]
And there you have it! An array of hashes with the correct key-value pairs.
From there you can create a CSV, or convert the data into any other format you want.
To resolve your specific CSV question:
require 'csv'
# first you need your column headers, which are the keys of any of the hashes; the first will do
column_names = data.first.keys
CSV.open("output_file.csv", "wb") do |csv|
  # first we add the headers
  csv << column_names
  # for each data row we create an array with values ordered as the column_names
  data.each do |data_hash|
    csv << [data_hash[column_names[0]], data_hash[column_names[1]]]
  end
end
That will create an output_file.csv in the same directory where you run your ruby script.
And that's it!
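The loop above hard-codes two columns. A sketch that works for any number of columns, assuming the same `data` array of hashes, uses `Hash#values_at` with the header list. `CSV.generate` is used here so the result is a string that is easy to inspect; the same block body should work inside `CSV.open`.

```ruby
require 'csv'

# Same shape as the `data` array built above
data = [
  { "name" => "Seamus", "address" => "123 Strand Avenue" },
  { "name" => "Seana",  "address" => "126 Strand Avenue" }
]

column_names = data.first.keys

csv_text = CSV.generate do |csv|
  csv << column_names
  # values_at returns the values in header order, whatever the column count
  data.each { |row| csv << row.values_at(*column_names) }
end

puts csv_text
# name,address
# Seamus,123 Strand Avenue
# Seana,126 Strand Avenue
```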
Let's construct the file.
str =<<~END
name: Seamus
address: 123 Strand Avenue

name: Seana
address: 126 Strand Avenue


address: 221B Baker Street
name: Sherlock
END
Notice that I've added a third record that has the order of the "name" and "address" lines reversed, and it is preceded by an extra blank line.
in_file = 'temp.txt'
File.write(in_file, str)
#=> 124
The first step is to obtain the headers for the CSV file:
headers = []
f = File.open(in_file)
loop do
  header = f.gets[/[^:]+(?=:)/]
  break if header.nil?
  headers << header
end
f.close
headers
#=> ["name", "address"]
Notice that the number of headers (two in the example) is arbitrary.
See IO#gets. The regular expression reads, "match one or more characters other than a colon, immediately followed by a colon" ((?=:) is a positive lookahead, so the colon itself is not part of the match).
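A quick way to see what the pattern extracts is to apply it with `String#[]` to the kinds of line the loop above will meet. The blank line yields nil, which is what ends the loop.

```ruby
# The header-extracting pattern from the loop above
HEADER_RE = /[^:]+(?=:)/

first  = "name: Seamus\n"[HEADER_RE]               # => "name"
second = "address: 123 Strand Avenue\n"[HEADER_RE] # => "address"
blank  = "\n"[HEADER_RE]                           # => nil, so the loop breaks

p [first, second, blank] # => ["name", "address", nil]
```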
If in_file is not exceedingly large, it's easiest to first read it into an array of hashes. The first step is to read the file into a string and then split the string on runs of lines that contain nothing but whitespace:
arr = File.read(in_file).chomp.split(/\n\s*\n/)
#=> ["name: Seamus\naddress: 123 Strand Avenue",
# "name: Seana\naddress: 126 Strand Avenue",
# "address: 221B Baker Street\nname: Sherlock"]
We can now convert each element of this array to a hash:
arr = File.read(in_file).split(/\n\s*\n/).
  map do |s|
    s.split("\n").
      each_with_object({}) do |p,h|
        key, value = p.split(/: +/)
        h[key] = value
      end
  end
#=> [{"name"=>"Seamus", "address"=>"123 Strand Avenue"},
# {"name"=>"Seana", "address"=>"126 Strand Avenue"},
# {"address"=>"221B Baker Street", "name"=>"Sherlock"}]
We are now ready to construct the CSV file:
out_file = 'temp.csv'
require 'csv'
CSV.open(out_file, 'w') do |csv|
  csv << headers
  arr.each { |h| csv << h.values_at(*headers) }
end
Let's see what was written:
puts File.read(out_file)
name,address
Seamus,123 Strand Avenue
Seana,126 Strand Avenue
Sherlock,221B Baker Street
See CSV::open and Hash#values_at.
This is not the format specified in the question. In fact, a file in that format would not be a valid CSV file, because there is no consistent column separator: the pipes are padded with varying numbers of spaces to line the columns up, so the separator in a line such as '|name | address' is not the same string as the one in a line such as '|Seamus | 123 Strand Avenue'. Moreover, even if the separators were the same, the pipe at the beginning of each line would become the first character of the name.
We could change the column separator to a pipe (rather than a comma, the default) by writing CSV.open(out_file, 'w', col_sep: '|'). A common mistake when constructing CSV files is to surround the column separator with one or more spaces. That invariably leads to boo-boos, because the spaces become part of the adjacent values.
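A sketch of the pipe-separated variant, using `CSV.generate` instead of `CSV.open` so the result is a string that can be inspected. The rows are taken from the example above, and the separator is a bare '|' with no surrounding spaces.

```ruby
require 'csv'

headers = ["name", "address"]
rows = [["Seamus", "123 Strand Avenue"], ["Sherlock", "221B Baker Street"]]

# Options such as col_sep are keyword arguments and come after any positional ones
piped = CSV.generate(col_sep: '|') do |csv|
  csv << headers
  rows.each { |r| csv << r }
end

puts piped
# name|address
# Seamus|123 Strand Avenue
# Sherlock|221B Baker Street

# Reading it back with the same separator recovers the fields exactly
CSV.parse(piped, col_sep: '|', headers: true).each do |row|
  puts row["address"]
end
```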

How to separate two data in one cell on csv by ruby

I want to change CSV file content:
itemId,url,name,type
1|urlA|nameA|typeA
2|urlB|nameB|typeB
3|urlC,urlD|nameC|typeC
4|urlE|nameE|typeE
into an array:
[itemId,url,name,type]
[1,urlA,nameA,typeA]
[2,urlB,nameB,typeB]
[**3**,**urlC**,nameC,typeC]
[**3**,**urlD**,nameC,typeC]
[4,urlE,nameE,typeE]
Could anybody teach me how to do it?
Ultimately, I'm going to download the URL files (.jpg).
The header row has a different separator than the data. That's a problem. You need to change the header row to use | instead of ,. Then:
require 'csv'
require 'pp'
array = Array.new
CSV.foreach("test.csv", col_sep: '|', headers: true) do |row|
  if row['url'][/,/]
    row['url'].split(',').each do |url|
      row['url'] = url
      array.push row.to_h.values
    end
  else
    array.push row.to_h.values
  end
end
pp array
=> [["1", "urlA", "nameA", "typeA"],
["2", "urlB", "nameB", "typeB"],
["3", "urlC", "nameC", "typeC"],
["3", "urlD", "nameC", "typeC"],
["4", "urlE", "nameE", "typeE"]]
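If editing the header row is not an option, a sketch like the following reads it separately: the first line is split on commas, the remaining lines are parsed with `col_sep: '|'`, and `flat_map` emits one row per URL. The data is inlined as a string here; with a real file you would start from `File.read("test.csv")`. The `expand_urls` name is hypothetical.

```ruby
require 'csv'

raw = <<~DATA
  itemId,url,name,type
  1|urlA|nameA|typeA
  2|urlB|nameB|typeB
  3|urlC,urlD|nameC|typeC
  4|urlE|nameE|typeE
DATA

# Hypothetical helper: the header uses ',' while the data rows use '|'
def expand_urls(text)
  header_line, body = text.split("\n", 2)
  header = header_line.split(',')
  url_idx = header.index('url')
  rows = CSV.parse(body, col_sep: '|')
  # One output row per comma-separated URL in the url column
  expanded = rows.flat_map do |row|
    row[url_idx].split(',').map do |url|
      copy = row.dup
      copy[url_idx] = url
      copy
    end
  end
  [header] + expanded
end

expand_urls(raw).each { |r| p r }
```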
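You'll need to test for a fifth column to see how the line should be parsed. If there is a fifth element (row[4]), output the line twice, replacing the url column. Note that this approach assumes every field is separated by commas (e.g. 3,urlC,urlD,nameC,typeC); with the pipe-separated rows shown in the question, row[4] is never set.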
require 'csv'
array = Array.new
CSV.foreach("test.csv") do |row|
  if row[4]
    array << [row[0..1], row[3..4]].flatten
    array << [[row[0]], row[2..4]].flatten
  else
    array << row
  end
end
p array
In your example you had asterisks but I'm assuming that was just to emphasise the lines for which you want special handling. If you do want asterisks, you can modify the two array shovel commands appropriately.

How to enter values, save them to a new file and search for the values in the file

I have a school assignment that I need help with.
This is the description of the assignment:
Ruby program that can hold these values:
Artifacts
Values
Assumptions
It shall be possible for a user:
To enter these three types of values.
Search for a value of type Artifacts, Values or Assumptions.
The program shall use loops and at least one class definition.
The only part that doesn't work for me is these lines:
f = File.new("Artefacts", "r")
puts "Search for information regarding cultural information"
userinput = gets.chomp
if File.readlines("Artefacts").include?('userinput')
  puts "We have found your input."
else
  puts "We have not found your input."
end
f.close
No matter what the user inserts, it only displays "We have not found your input".
Part A: get user input and write to file
def write_to_file(path, string)
  # 'a' means append;
  # it will create the file if it doesn't exist
  File.open(path, 'a') do |file|
    file.write string + "\n"
  end
end
path = "Artefacts"
num_inputs = 3
num_inputs.times do |i|
  puts "enter input (#{i + 1} / #{num_inputs}):"
  write_to_file path, gets.chomp
end
puts `cat #{path}`
# if you entered "foo" for each input,
# this will show:
# foo
# foo
# foo
Part B: read a file and check if it contains a string:
path = "./Artefacts"
query = "foo"
text = File.read path
# this will be a string with all the text
lines = File.readlines path
# this will be an array of strings (one for each line)
is_text_found = text.include? query
# or
is_text_found = lines.any? do |line|
  line.include? query
end
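Applied to the snippet in the question, there are two problems: 'userinput' in quotes is a literal string rather than the variable, and File.readlines keeps the trailing newline on each line, so an exact include? never matches. A sketch of the corrected check, assuming one value per line in the Artefacts file; the `value_stored?` name is hypothetical, and the file is written to a temporary directory so the example runs on its own.

```ruby
require 'tmpdir'

# Hypothetical helper: exact match against whole lines, newline removed
def value_stored?(path, query)
  File.readlines(path, chomp: true).include?(query)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "Artefacts")
  File.write(path, "pottery\nspearhead\n")

  puts value_stored?(path, "spearhead") # true
  puts value_stored?(path, "spear")     # false: only whole lines match
end
```

In the assignment this would be called as `value_stored?("Artefacts", gets.chomp)`. If partial matches should count, the `lines.any?` form above is the better fit.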

Search through text word by word

I'd like to search through a txt file for a particular word. If I find that word, I'd like to retrieve the word that immediately follows it in the file. If my text file contained:
"My name is Jay and I want to go to the store"
I'd be searching for the word "want", and would want to add the word "to" to my array. I'll be looking through a very big text file, so any notes on performance would be great too.
The most literal way to read that might look like this:
a = []
str = "My name is Jack and I want to go to the store"
str.scan(/\w+/).each_cons(2) {|x, y| a << y if x == 'to'}
a
#=> ["go", "the"]
To read the file into a string use File.read.
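Putting the two together for the word from the question gives a minimal sketch like this one. The temporary file name `sentence.txt` is an assumption made for the example.

```ruby
require 'tmpdir'

# Sample sentence written to a temporary file so the example runs standalone
path = File.join(Dir.tmpdir, "sentence.txt")
File.write(path, "My name is Jay and I want to go to the store\n")

target = "want"
following = []

# Read the whole file, split it into words, and inspect each adjacent pair
File.read(path).scan(/\w+/).each_cons(2) do |word, next_word|
  following << next_word if word == target
end

p following # => ["to"]
```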
This is one way:
Code
def find_next(fname, word)
  enum = IO.foreach(fname)
  loop do
    e = (enum.next).scan(/\w+/)
    ndx = e.index(word)
    if ndx
      return e[ndx+1] if ndx < e.size-1
      loop do
        e = enum.next
        break if e =~ /\w+/
      end
      return e[/\w+/]
    end
  end
  nil
end
Example
text =<<_
It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
. . . . .
it was the epoch of belief, it was the epoch of incredulity,
it was the season of light, it was the season of darkness,
it was the spring of hope, it was the winter of despair…
_
FName = "two_cities"
File.write(FName, text)
find_next(FName, "worst")
#=> "of"
find_next(FName, "wisdom")
#=> "it"
find_next(FName, "foolishness")
#=> "it"
find_next(FName, "dispair")
#=> nil
find_next(FName, "magpie")
#=> nil
Shorter, but less efficient, and problematic with large files:
File.read(FName)[/(?<=\b#{word}\b)\W+(\w+)/,1]
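As written, that one-liner assumes a `word` variable is in scope. Wrapped in a method (the name `find_next_in_string` is hypothetical), with `Regexp.escape` added so that a search term containing regex metacharacters is matched literally, it might look like this:

```ruby
def find_next_in_string(text, word)
  # The lookbehind anchors just after the whole word;
  # \W+ then skips punctuation, spaces and newlines before the next word
  text[/(?<=\b#{Regexp.escape(word)}\b)\W+(\w+)/, 1]
end

text = "it was the age of wisdom, it was the age of foolishness,\n. . . . .\nit was the epoch of belief,\n"

p find_next_in_string(text, "wisdom")      # => "it"
p find_next_in_string(text, "foolishness") # => "it"
p find_next_in_string(text, "belief")      # => nil
```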
This is probably not the fastest way to do it, but something along these lines should work:
filename = "/path/to/filename"
target_word = "weasel"
next_word = ""
# File.foreach reads line by line and closes the file when done
File.foreach(filename) do |line|
  words = line.split
  words.each_with_index do |word, index|
    if word == target_word
      next_word = words[index + 1]
    end
  end
end
Given a File, String, or StringIO stored in file:
pattern, match = 'want', nil
catch :found do
  file.each_line do |line|
    line.split.each_cons(2) do |words|
      if words[0] == pattern
        match = words.pop
        throw :found
      end
    end
  end
end
match
#=> "to"
Note that this answer will find at most one match per file for speed, and linewise operation will save memory. If you want to find multiple matches per file, or find matches across line breaks, then this other answer is probably the way to go. YMMV.
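If you need every match, including pairs that straddle a line break, while still reading one line at a time, one option is to carry the previous word over from line to line. A sketch, using a hypothetical `words_after` helper that accepts anything responding to `each_line` (a File, String, or StringIO):

```ruby
require 'stringio'

# Hypothetical helper: collect every word that follows `target`,
# reading line by line so memory use stays flat for large files
def words_after(io, target)
  results = []
  previous = nil
  io.each_line do |line|
    line.scan(/\w+/) do |word|
      results << word if previous == target
      previous = word # carried across line boundaries
    end
  end
  results
end

input = StringIO.new("I want to go\nand I want\na candy\n")
p words_after(input, "want") # => ["to", "a"]
```

For a real file, `File.open(path) { |f| words_after(f, "want") }` keeps the same line-by-line behaviour.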
This is the fastest I could come up with, assuming your file is loaded in a string:
word = 'want'
array = []
string.scan(/\b#{word}\b\s(\w+)/) do
  array << $1
end
This will find ALL words that follow your particular word. So for example:
word = 'want'
string = 'My name is Jay and I want to go and I want a candy'
array = []
string.scan(/\b#{word}\b\s(\w+)/) do
  array << $1
end
p array #=> ["to", "a"]
Testing this on my machine with the string duplicated 500,000 times, it ran in about 0.6 seconds. I also tried other approaches, such as splitting the string, but this was the fastest:
require 'benchmark'
Benchmark.bm do |bm|
  bm.report do
    word = 'want'
    string = 'My name is Jay and I want to go and I want a candy' * 500_000
    array = []
    string.scan(/\b#{word}\b\s(\w+)/) do
      array << $1
    end
  end
end
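Called without a block, `scan` with a single capture group returns an array of one-element arrays, one per match, so the same result can be had with `flatten`:

```ruby
word = 'want'
string = 'My name is Jay and I want to go and I want a candy'

# Each match is [captured_word]; flatten turns [["to"], ["a"]] into ["to", "a"]
array = string.scan(/\b#{word}\b\s(\w+)/).flatten
p array # => ["to", "a"]
```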
