Ruby substring from a file - ruby

How do I get a single word from a text file? For example, I have this text file:
AWD,SUV,0km,auto
and I need to get the km number or the drivetrain, which is AWD. What do I do after reading the file?
Here's how I'm reading the file:
def getWord(fileName)
file=fileName
File.readlines(file).each do |line|
puts line
end
end

This is a CSV (Comma Separated Value) file. You can just split on the comma and take the fields.
# chomp: true removes the trailing newline
File.readlines(fileName, chomp: true).each do |line|
(drivetrain,type,mileage,transmission) = line.split(',')
puts drivetrain
puts mileage
end
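If you need the fields by name rather than by position, a small helper can zip the split values with a list of column names. This is a minimal sketch: `parse_vehicle` and `FIELD_NAMES` are hypothetical names, and the column order is assumed from the example line.

```ruby
# Assumed column order, taken from the example line "AWD,SUV,0km,auto"
FIELD_NAMES = %i[drivetrain type mileage transmission].freeze

# Hypothetical helper: turns one line of the file into a Hash keyed by column name
def parse_vehicle(line)
  FIELD_NAMES.zip(line.chomp.split(',')).to_h
end

car = parse_vehicle("AWD,SUV,0km,auto\n")
puts car[:drivetrain]    # AWD
# String#to_i reads leading digits and stops at the first non-digit, so "0km" becomes 0
puts car[:mileage].to_i  # 0
```

With the file, this would be `File.readlines(fileName).map { |line| parse_vehicle(line) }`.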
But CSV files can get more complex. For example, a value that contains a comma is wrapped in quotes: 1997,Ford,E350,"Super, luxurious truck". To handle all the possibilities, use the CSV class to parse each line:
require 'csv'
headers = [:drivetrain, :type, :mileage, :transmission]
CSV.foreach(fileName, headers: headers) do |row|
puts row[:drivetrain]
puts row[:mileage]
end
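To see the quoting rule in action without a file, `CSV.parse` also accepts a string. The sketch below reuses the same symbol headers; the two sample rows are made up for illustration, the second one containing the quoted comma from the example above.

```ruby
require 'csv'

# Made-up sample data; the second row has a quoted field containing a comma
data = <<~ROWS
  AWD,SUV,0km,auto
  1997,Ford,E350,"Super, luxurious truck"
ROWS

headers = [:drivetrain, :type, :mileage, :transmission]
# With an Array for headers:, every line is treated as data (no header row is consumed)
rows = CSV.parse(data, headers: headers)

rows.each { |row| puts row[:transmission] }
# auto
# Super, luxurious truck
```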

Related

Remove whitespace from CSV row

I need to remove or strip whitespace from CSV rows.
test.csv
60 500, # want 60500
8 100, # want 8100
5 400, # want 5400
480, # want 480
remove_space.rb
require 'csv'
CSV.foreach('test.csv') do |row|
row = row[0]
row = row.strip
row = row.gsub(" ","")
puts row
end
I don't understand why it doesn't work. The result is the same as test.csv.
Any idea?
Your test.csv file contains narrow no-break space (U+202F) Unicode characters. This is not the regular space character (U+0020), so neither strip nor gsub(" ","") removes it.
You can see the different possible Unicode spaces here: http://jkorpela.fi/chars/spaces.html
Here is a more generic script, using the POSIX bracket expression [[:space:]], to remove all "space-like" characters:
require 'csv'
CSV.foreach('test.csv') do |row|
row = row[0]
row = row.gsub(/[[:space:]]/,"")
puts row
end
The thing is, those are not normal spaces but Narrow No-Break Spaces (U+202F), so you can delete that character explicitly:
require 'csv'
CSV.foreach('/tmp/test.csv') do |row|
puts row[0].delete "\u202f"
end
#⇒ 60500
# 8100
# 5400
# 480
You can strip out all spaces, including Unicode ones, by using the \p{Space} matcher.
require 'csv'
CSV.foreach('/tmp/test.csv') do |row|
puts row[0].gsub(/\p{Space}/, '')
end
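The difference between these approaches can be checked on a single string. This sketch builds a value containing U+202F directly (a stand-in for one cell of test.csv, not the actual file) and compares `strip`, `gsub(" ", "")`, `delete` and the two Unicode-aware regexes.

```ruby
# Stand-in for one cell of test.csv: "60", NARROW NO-BREAK SPACE (U+202F), "500"
value = "60\u202f500"

# String#strip only removes ASCII whitespace (space, tab, newline, ...) from the ends
stripped = value.strip
# gsub(" ", "") only targets U+0020, the regular space
no_ascii_spaces = value.gsub(" ", "")
# Deleting the specific character works when you know which one it is
deleted = value.delete("\u202f")
# Unicode-aware classes match any character with the White_Space property
posix = value.gsub(/[[:space:]]/, "")
property = value.gsub(/\p{Space}/, "")

p stripped, no_ascii_spaces, deleted, posix, property
```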

Reading specific line into an array - ruby

I have a .txt file with the following content:
Anders Hansen;87442355;11;87
Jens Hansen;22338843;23;11
Nanna Kvist;25233255;24;84
I would like to search the file for a specific name taken from user input, then save that line into an array, split on ";". I can't get it to work, though. This is my code:
user1 = []
puts "Start by entering the full name of user 1: "
input = gets.chomp
File.open("userregister.txt") do |f|
f.each_line do |line|
if line =~ input
user1 << line.split(';')
end
end
end
=~ in Ruby matches a string against a regex (or vice versa). Here, you use it with two strings, which raises an error:
'foo' =~ 'bar' # => TypeError: type mismatch: String given
There are more appropriate String methods for this. In your case, #start_with? does the job. If you want to check whether the input is contained somewhere in the line (not necessarily at the beginning), you can use #include?.
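A quick comparison of the two suggested methods on one line of the register (the sample line comes from the question). Note that `start_with?` also accepts a partial name; comparing the whole first field, as in the last check, is an extra suggestion rather than part of this answer.

```ruby
line = "Anders Hansen;87442355;11;87\n"

puts line.start_with?("Anders Hansen")  # true: the line begins with the name
puts line.start_with?("Anders")         # true as well: a partial name also matches
puts line.include?("Hansen")            # true: substring found anywhere in the line

# Stricter alternative: compare the whole first field
name = line.chomp.split(';').first
puts name == "Anders"                   # false
puts name == "Anders Hansen"            # true
```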
In case you actually wanted to take a regex as a user input (generally a bad idea), you can convert it from string to regex:
line =~ /#{input}/
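Interpolating raw input into a regex means any metacharacters in it are interpreted (and input such as an unbalanced `(` raises a RegexpError). The standard `Regexp.escape` method makes the input match literally; the input below is a hypothetical example containing a `.`.

```ruby
line  = "Jens Hansen;22338843;23;11"
input = "Jens Hansen."  # hypothetical input: "." was meant literally

# Unescaped, "." matches any character, so this matches "Jens Hansen;" at index 0
raw_match = line =~ /#{input}/
# Escaped, the pattern requires a literal ".", which this line does not contain
escaped_match = line =~ /#{Regexp.escape(input)}/

p raw_match      # => 0
p escaped_match  # => nil
```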
Looking at the file format, I would actually use Ruby's CSV class. By setting the column separator to ;, you get an array for each row.
require 'csv'
input = gets.chomp
CSV.foreach('userregister.txt', col_sep: ';') do |row|
if row[0].downcase == input.downcase
# Do stuff with row[1..-1]
end
end
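The same lookup can be written as a method over a string, so it can be tried without the file (to use the file, you could pass `File.read('userregister.txt')`). `find_user` is a hypothetical name, and `casecmp?` is used here as an alternative to comparing two `downcase` results.

```ruby
require 'csv'

# Sample data in the same format as userregister.txt
data = <<~TXT
  Anders Hansen;87442355;11;87
  Jens Hansen;22338843;23;11
  Nanna Kvist;25233255;24;84
TXT

# Hypothetical helper: returns the first row (an Array of fields) whose name
# matches +name+ case-insensitively, or nil if there is none
def find_user(text, name)
  CSV.parse(text, col_sep: ';').find { |row| row[0].casecmp?(name) }
end

user1 = find_user(data, "jens hansen")
p user1  # => ["Jens Hansen", "22338843", "23", "11"]
```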

Process a file using hash

Below is the input file that I want to store in a hash table, sort, and output in the format shown below.
Input File
Name=Ashok, Email=ashok85#gmail.com, Country=India, Comments=9898984512
Email=raju#hotmail.com, Country=Sri Lanka, Name=Raju
Country=India, Comments=45535878, Email=vijay#gmail.com, Name=Vijay
Name=Ashok, Country=India, Email=ashok37#live.com, Comments=8898788987
Output File (Sorted by Name)
Name   Email               Country    Comments
-------------------------------------------------------
Ashok  ashok37#live.com    India      8898788987
Ashok  ashok85#gmail.com   India      9898984512
Raju   raju#hotmail.com    Sri Lanka
Vijay  vijay#gmail.com     India      45535878
So far, I have read the data from the file and stored every line into an array, but I am stuck at hash[key]=>value
file_data = {}
File.open('input.txt', 'r') do |file|
file.each_line do |line|
line_data = line.split('=')
file_data[line_data[0]] = line_data[1]
end
end
puts file_data
Given that each line in your input file is a series of key=value strings separated by commas, you need to split the line first on the commas, and then each piece on the equals sign. Here is a corrected version of the code:
# Need to collect parsed data from each line into an array
array_of_file_data = []
File.open('input.txt', 'r') do |file|
file.each_line do |line|
#create a hash to collect data from each line
file_data = {}
# First split by comma
pairs = line.chomp.split(", ")
pairs.each do |p|
#Split by = to separate out key and value
key_value = p.split('=')
file_data[key_value[0]] = key_value[1]
end
array_of_file_data << file_data
end
end
puts array_of_file_data
The above code prints:
{"Name"=>"Ashok", "Email"=>"ashok85#gmail.com", "Country"=>"India", "Comments"=>"9898984512"}
{"Email"=>"raju#hotmail.com", "Country"=>"Sri Lanka", "Name"=>"Raju"}
{"Country"=>"India", "Comments"=>"45535878", "Email"=>"vijay#gmail.com", "Name"=>"Vijay"}
{"Name"=>"Ashok", "Country"=>"India", "Email"=>"ashok37#live.com", "Comments"=>"8898788987"}
A more complete version of the program is given below.
hash_array = []
# Parse the lines and store it in hash array
File.open("sample.txt", "r") do |f|
f.each_line do |line|
# Splits are done around , and = preceded or followed
# by any number of white spaces
splits = line.chomp.split(/\s*,\s*/).map{|p| p.split(/\s*=\s*/)}
# to_h converts an array of [key, value] pairs
# into a hash
hash_array << splits.to_h
end
end
# Sort the array of hashes
hash_array = hash_array.sort {|i, j| i["Name"] <=> j["Name"]}
# Print the output, more tricks needed to get it better formatted
header = ["Name", "Email", "Country", "Comments"]
puts header.join(" ")
hash_array.each do |h|
puts h.values_at(*header).join(" ")
end
The above program outputs:
Name Email Country Comments
Ashok ashok85#gmail.com India 9898984512
Ashok ashok37#live.com India 8898788987
Raju raju#hotmail.com Sri Lanka
Vijay vijay#gmail.com India 45535878
You may want to refer to the question "Padding printed output of tabular data" for better-formatted tabular output.
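One way to pad the columns is to compute each column's width from its longest value and left-justify every cell with `String#ljust`. This is a sketch with two hypothetical rows shaped like the parsed hashes; `column_widths` and `format_row` are illustrative names, and the two-space gap between columns is an arbitrary choice.

```ruby
# Hypothetical rows shaped like the hashes produced by the program above
rows = [
  { "Name" => "Ashok", "Email" => "ashok37#live.com", "Country" => "India", "Comments" => "8898788987" },
  { "Name" => "Raju", "Email" => "raju#hotmail.com", "Country" => "Sri Lanka" }
]
header = ["Name", "Email", "Country", "Comments"]

# Width of each column: the longest value in it, header included (missing values count as "")
def column_widths(header, rows)
  header.map do |key|
    ([key] + rows.map { |h| h[key].to_s }).map(&:length).max
  end
end

# Left-justify each cell to its column width; rstrip drops padding after the last cell
def format_row(values, widths)
  values.zip(widths).map { |value, width| value.to_s.ljust(width) }.join("  ").rstrip
end

widths = column_widths(header, rows)
lines = [format_row(header, widths), widths.map { |w| "-" * w }.join("  ")]
rows.each { |h| lines << format_row(h.values_at(*header), widths) }
puts lines
```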

Force the file opened with "r+" to end

There is a file with some marker word in it:
qwerty
I am the marker!
zxcvbn
123456
I want to overwrite the rest of the file after the marker with an unknown number of lines instead:
qwerty
I am the marker!
inserted line #1
inserted line #2
inserted line #3
But if there are too few lines to insert, the old tail remains, which I do not want:
qwerty
I am the marker!
inserted line #1
123456
Here is my code (simplified):
File.open("file.txt", "r+") do |file|
file.gets "marker"
file.gets
lines_to_insert.each do |line|
file.puts line
end
# I wish I could do file.put_EOF here
end
File.open("file.txt", "r+") do |file|
file.gets "marker"
file.gets
lines_to_insert.each do |line|
file.puts line
end
# EOF here
file.truncate(file.pos)
end
This uses File#pos to tell File#truncate where the new end of the file is.
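Here is a self-contained run of the truncate approach, using a throwaway file in the system temp directory. `replace_after_marker` is a hypothetical helper name; like the code above, it assumes the marker is present (if it is not, `gets(marker)` reads to the end of the file and the new lines are simply appended).

```ruby
require 'tmpdir'

# Hypothetical helper: keep the file up to the end of the marker line,
# write new_lines after it, then cut off whatever old content remains.
def replace_after_marker(path, marker, new_lines)
  File.open(path, "r+") do |file|
    file.gets(marker)           # read up to and including the marker text
    file.gets                   # read the rest of the marker line, including "\n"
    new_lines.each { |line| file.puts line }
    file.truncate(file.pos)     # everything after the current position is removed
  end
end

path = File.join(Dir.tmpdir, "marker_demo.txt")
File.write(path, "qwerty\nI am the marker!\nzxcvbn\n123456\n")
replace_after_marker(path, "marker", ["inserted line #1"])
result = File.read(path)
print result
# qwerty
# I am the marker!
# inserted line #1
```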
How about using a temp file?
require 'fileutils'
File.open("file.tmp", "w") do |tmp_file|
File.open("file.txt", "r") do |file|
file.readlines.each do |line|
# add each line of the original file up to and including marker line
tmp_file.puts line
if line.include? "marker" #or however you're indicating marker
break
end
end
# add new lines
lines_to_insert.each do |line|
tmp_file.puts line
end
end
end
FileUtils.mv 'file.tmp', 'file.txt'
This will guarantee a file with a proper EOF line and not a hacky set of lines at the end that are nothing but newline characters or spaces.
Why not fill an array with the file's lines, using something like this:
array = File.read("file.txt").split("\n")
Then you can find the index of the line that contains the word marker:
marker_index = array.index { |line| line.include?('marker') }
Then replace every element after marker_index with the lines you want to insert.
Finally, join the strings in your array (don't forget to add the \n back in) and write the result back to your file.
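The array approach can be sketched as a function over the file's text, which keeps it easy to test; you would call it as `File.write("file.txt", rewrite_after_marker(File.read("file.txt"), "marker", lines_to_insert))`. The helper name, and returning the text unchanged when no marker is found, are assumptions.

```ruby
# Hypothetical helper: returns new file text in which every line after the
# first line containing +marker+ is replaced by +new_lines+.
def rewrite_after_marker(text, marker, new_lines)
  lines = text.split("\n")
  marker_index = lines.index { |line| line.include?(marker) }
  return text if marker_index.nil?  # assumption: no marker means no change

  kept = lines[0..marker_index]     # everything up to and including the marker line
  (kept + new_lines).join("\n") + "\n"  # add the newlines back when joining
end

original = "qwerty\nI am the marker!\nzxcvbn\n123456\n"
updated = rewrite_after_marker(original, "marker", ["inserted line #1", "inserted line #2"])
puts updated
```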

What's the best way to parse a tab-delimited file in Ruby?

What's the best (most efficient) way to parse a tab-delimited file in Ruby?
The Ruby CSV library lets you specify the field delimiter. Since Ruby 1.9, the standard CSV library is the former FasterCSV. Something like this would work:
require "csv"
parsed_file = CSV.read("path-to-file.csv", col_sep: "\t")
The rules for TSV are actually a bit different from CSV. The main difference is that CSV has provisions for sticking a comma inside a field and then using quotation characters and escaping quotes inside a field. I wrote a quick example to show how the simple response fails:
require 'csv'
line = "boogie\ttime\tis \"now\""
begin
fields = CSV.parse_line(line, col_sep: "\t")
puts "parsed correctly"
rescue CSV::MalformedCSVError
puts "failed to parse line"
end
begin
fields = CSV.parse_line(line, col_sep: "\t", quote_char: "Ƃ")
puts "parsed correctly with random quote char"
rescue CSV::MalformedCSVError
puts "failed to parse line with random quote char"
end
#Output:
# failed to parse line
# parsed correctly with random quote char
If you want to use the CSV library, you could use a quote character that you don't expect to see in your file (the example shows this), but you could also use a simpler approach like the StrictTsv class shown below to get the same effect without having to worry about field quotation.
# The main parse method is mostly borrowed from a tweet by @JEG2
class StrictTsv
attr_reader :filepath
def initialize(filepath)
@filepath = filepath
end
def parse
File.open(filepath) do |f|
headers = f.gets.strip.split("\t")
f.each do |line|
fields = Hash[headers.zip(line.split("\t"))]
yield fields
end
end
end
end
# Example Usage
tsv = StrictTsv.new("your_file.tsv")
tsv.parse do |row|
puts row['named field']
end
The choice of using the CSV library or something more strict just depends on who is sending you the file and whether they are expecting to adhere to the strict TSV standard.
Details about the TSV standard can be found at http://en.wikipedia.org/wiki/Tab-separated_values
There are actually two different kinds of TSV files.
TSV files that are actually CSV files with a delimiter set to Tab. This is something you'll get when you e.g. save an Excel spreadsheet as "UTF-16 Unicode Text". Such files use CSV quoting rules, which means that fields may contain tabs and newlines, as long as they are quoted, and literal double quotes are written twice. The easiest way to parse everything correctly is to use the csv gem:
require 'csv'
parsed = CSV.read("file.tsv", col_sep: "\t")
TSV files conforming to the IANA standard. Tabs and newlines are not allowed as field values, and there is no quoting whatsoever. This is something you will get when you e.g. select a whole Excel spreadsheet and paste it into a text file (beware: it will get messed up if some cells do contain tabs or newlines). Such TSV files can be easily parsed line by line with a simple line.chomp.split("\t", -1) (note the -1, which prevents split from removing empty trailing fields; use chomp rather than rstrip, since rstrip would also strip trailing tabs and lose those fields). If you want to use the csv gem, simply set quote_char to nil:
require 'csv'
parsed = CSV.read("file.tsv", col_sep: "\t", quote_char: nil)
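A small check of both kinds on in-memory strings (the sample values are made up): a quoted field in CSV-style TSV keeps its embedded tab, and for IANA-style lines `split` needs the -1 limit to keep trailing empty fields. The last line shows why `rstrip` is a poor choice there.

```ruby
require 'csv'

# Kind 1: CSV-style TSV; the quoted middle field contains a tab
csv_style = "a\t\"b\tc\"\td"
fields = CSV.parse_line(csv_style, col_sep: "\t")
p fields  # => ["a", "b\tc", "d"]

# Kind 2: IANA-style TSV; no quoting, so a plain split is enough
iana_line = "a\tb\t\t\n"
p iana_line.chomp.split("\t")       # => ["a", "b"]          trailing empty fields dropped
p iana_line.chomp.split("\t", -1)   # => ["a", "b", "", ""]  -1 keeps them
p iana_line.rstrip.split("\t", -1)  # => ["a", "b"]          rstrip already removed the trailing tabs
```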
I like mmmries' answer. However, I dislike the way Ruby's split drops empty values from the end of the result, and that version doesn't strip the newline at the end of each line either.
Also, I had a file with potential newlines within a field. So, I rewrote his parse method as follows:
def parse
File.open(filepath) do |f|
headers = f.gets.strip.split("\t")
f.each do |line|
myline = line
# Append physical lines until the record has all its tabs (stop at EOF)
while myline.count("\t") < headers.count - 1 && (next_line = f.gets)
myline += next_line
end
fields = Hash[headers.zip(myline.chomp.split("\t", headers.count))]
yield fields
end
end
end
This concatenates any lines as necessary to get a full line of data, and always returns the full set of data (without potential nil entries at the end).
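To try the line-joining idea without a file, here is a sketch of the same loop over any IO object (a `StringIO` with made-up data). `each_tsv_record` is a hypothetical name; it stops appending at end of input, so a truncated last record ends the loop instead of raising.

```ruby
require 'stringio'

# Hypothetical helper: yields one Hash per record, joining physical lines
# until a record contains headers.count - 1 tabs.
def each_tsv_record(io)
  headers = io.gets.chomp.split("\t")
  expected_tabs = headers.count - 1
  while (line = io.gets)
    record = line
    # Too few tabs means a field contained a newline: append the next line,
    # but stop if the input ends first
    while record.count("\t") < expected_tabs && (next_line = io.gets)
      record += next_line
    end
    # A positive limit keeps trailing empty fields
    yield headers.zip(record.chomp.split("\t", headers.count)).to_h
  end
end

# Made-up sample: Ann's note spans two physical lines, Bob has two empty fields
input = StringIO.new("name\tnote\tcity\nAnn\tline one\nline two\tOslo\nBob\t\t\n")
records = []
each_tsv_record(input) { |row| records << row }
records.each { |row| p row }
```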
