Split text while reading line by line - ruby

I am trying to read a text file with contents like this
ABC = Thefirststep
XYZ = Secondstep
ABC_XYZ = Finalstep=345ijk!r4+
I am able to read the file line by line using this
#!/usr/bin/ruby
text = '/tmp/data'
f = File.open(text, "r")
f.each_line { |line|
  puts line
}
f.close
What I want to do is have the values Thefirststep, Secondstep and Finalstep assigned to separate variables, preferably using split().

You could use something like this:
#!/usr/bin/ruby
text = '/tmp/data'
data = []
f = File.open(text, "r")
f.each_line { |line|
  data.push(line.split("=").last)
}
f.close

You said you want to "have the values 'Thefirststep', 'Secondstep' and 'Finalstep' assigned to separate variables".
You cannot create local variables dynamically (not since Ruby v1.8 anyway). That leaves two choices: assign those values to instance variables or use a different data structure, specifically, a hash.
First let's create a data file.
data = <<~END
ABC = Thefirststep
XYZ = Secondstep
ABC_XYZ = Finalstep=345ijk!r4+
END
FName = 'test'
File.write(FName, data)
#=> 67
Assign values to instance variables
File.foreach(FName) do |line|
  var, value, * = line.chomp.split(/\s*=\s*/)
  instance_variable_set("@#{var.downcase}", value)
end
@abc
#=> "Thefirststep"
@xyz
#=> "Secondstep"
@abc_xyz
#=> "Finalstep"
The convention for the names of instance variables (after the "@") is to use snake-case, which is why I downcased them.
Store the values in a hash
File.foreach(FName).with_object({}) do |line, h|
  var, value, * = line.chomp.split(/\s*=\s*/)
  h[var] = value
end
#=> {"ABC"=>"Thefirststep", "XYZ"=>"Secondstep", "ABC_XYZ"=>"Finalstep"}
As easy as this was to do, it's not generally helpful to generate instance variables dynamically or hashes with dynamically created keys. That's because they are only useful if their values can be obtained and possibly changed, which is problematic.
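That said, dynamically created instance variables can be read back, provided you already know (or enumerate) their names, which is exactly the awkward part. A small sketch using the variables created above:
instance_variable_get("@abc")      #=> "Thefirststep"
instance_variable_get("@abc_xyz")  #=> "Finalstep"
instance_variables                 # the full list of names; it includes :@abc, :@xyz and :@abc_xyz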
Note that in
var, value, * = line.chomp.split(/\s*=\s*/)
var equals the first element of the array returned by the split operation, value is the second value and * discards the remaining elements, if any.
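For example, the third line of the data file splits into three pieces, and the splat silently drops the trailing one:
line = "ABC_XYZ = Finalstep=345ijk!r4+"
line.chomp.split(/\s*=\s*/)   #=> ["ABC_XYZ", "Finalstep", "345ijk!r4+"]
var, value, * = line.chomp.split(/\s*=\s*/)
var    #=> "ABC_XYZ"
value  #=> "Finalstep"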

Related

Why am I getting the error `stack level too deep (SystemStackError)` when hashing 2 columns in a CSV?

This is my code, which is supposed to hash the 2 columns in fotoFd.csv and then save the hashed columns in a separate file, T4Friendship.csv:
require "csv"
arrayUser=[]
arrayUserUnique=[]
arrayFriends=[]
fileLink = "fotoFd.csv"
f = File.open(fileLink, "r")
f.each_line { |line|
row = line.split(",");
arrayUser<<row[0]
arrayFriends<<row[1]
}
arrayUserUnique = arrayUser.uniq
arrayHash = []
for i in 0..arrayUser.size-1
arrayHash<<arrayUser[i]
arrayHash<<i
end
hash = Hash[arrayHash.each_slice(2).to_a]
array1 =hash.values_at *arrayUser
array2 =hash.values_at *arrayFriends
fileLink = "T4Friendship.csv"
for i in 0..array1.size-1
logfile = File.new(fileLink,"a")
logfile.print("#{array1[i]},#{array2[i]}\n")
logfile.close
end
The first column contains users, and the second column contains their friends. So, I want it to produce something like this in T4Friendship.csv:
1 2
1 4
1 10
1 35
2 1
2 8
2 11
3 28
3 31
...
The problem is caused by the splat expansion of a large array. The splat * can be used to expand an array as a parameter list. The parameters are passed on the stack. If there are too many parameters, you'll exhaust stack space and get the mentioned error.
Here's a quick demo of the problem in irb that tries to splat an array of one million elements when calling puts:
irb
irb(main):001:0> a = [0] * 1000000; nil # Use nil to suppress statement output
=> nil
irb(main):002:0> puts *a
SystemStackError: stack level too deep
from /usr/lib/ruby/1.9.1/irb/workspace.rb:80
Maybe IRB bug!
irb(main):003:0>
You seem to be processing large CSV files, and so your arrayUser array is quite large. Expanding the large array with the splat causes the problem on the line:
array1 = hash.values_at *arrayUser
You can avoid the splat by calling map on arrayUser, and converting each value in a block:
array1 = arrayUser.map{ |user| hash[user] }
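As a rough counterpart to the irb demo above, here is a sketch with the same million-element array and a made-up lookup hash, showing that map never builds a giant argument list:
a = [0] * 1000000
h = { 0 => "zero" }        # hypothetical lookup hash
b = a.map { |x| h[x] }     # one lookup at a time, nothing pushed onto the stack
b.size    #=> 1000000
b.first   #=> "zero"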
Suggested Code
Your code appears to map names to unique ID numbers. The output appears to be in the same format as the input, except with the names translated to ID numbers. You can do this without keeping any arrays around eating up memory: just use a single hash that is built up as you read and used to translate the names to numbers on the fly. The code would look like this:
def convertCsvNamesToNums(inputFileName, outputFileName)
  # Create unique ID number hash.
  # When an unknown key is looked up, it is added with a new unique ID number.
  # Produces a 0-based index.
  nameHash = Hash.new { |hash, key| hash[key] = hash.size }

  # Convert input CSV with names to output CSV with ID numbers
  File.open(inputFileName, "r") do |inputFile|
    File.open(outputFileName, 'w') do |outputFile|
      inputFile.each_line do |line|
        # Parse names from input CSV (chomp so trailing newlines don't end up in the keys)
        userName, friendName = line.chomp.split(",")

        # Map names to unique ID numbers
        userNum = nameHash[userName]
        friendNum = nameHash[friendName]

        # Write unique ID numbers to output CSV
        outputFile.puts "#{userNum}, #{friendNum}"
      end
    end
  end
end
convertCsvNamesToNums("fotoFd.csv", "T4Friendship.csv")
Note: This code assigns ID numbers to users and friends as they are encountered. Your previous code assigned ID numbers to users only, and then looked up the friends afterwards. The code I suggested will ensure friends are assigned ID numbers even if they never appear in the user list. The numerical ordering will differ slightly from what you supplied, but I assume that is not important.
You can also shorten the body of the inner loop to:
# Parse names from input, map to ID numbers, and write to output
outputFile.puts line.chomp.split(",").map { |name| nameHash[name] }.join(',')
I thought I'd include this change separately for readability.
Updated Code
As per your request in the comments, here is code that gives priority to the user column for ID numbers. Only once the first column is completely processed will ID numbers be assigned to entries in the second column. It does this by first passing over the input once, adding the first column to the hash, and then passing over the input a second time to process it as before, using the pre-prepared hash from the first pass. New entries can still be added in the second pass in the case where the friend column contains a new entry that doesn't exist anywhere in the user column.
def convertCsvNamesToNums(inputFileName, outputFileName)
  # Create unique ID number hash.
  # When an unknown key is looked up, it is added with a new unique ID number.
  # Produces a 0-based index.
  nameHash = Hash.new { |hash, key| hash[key] = hash.size }

  # Pass over the data once to give priority to the user column for ID numbers
  File.open(inputFileName, "r") do |inputFile|
    inputFile.each_line do |line|
      name, = line.split(",") # Parse the name from the line, ignore the rest
      nameHash[name]          # Add the name to the unique ID number hash (if it doesn't already exist)
    end
  end

  # Convert input CSV with names to output CSV with ID numbers
  File.open(inputFileName, "r") do |inputFile|
    File.open(outputFileName, 'w') do |outputFile|
      inputFile.each_line do |line|
        # Parse names from input, map to ID numbers, and write to output
        outputFile.puts line.chomp.split(",").map { |name| nameHash[name] }.join(',')
      end
    end
  end
end
convertCsvNamesToNums("fotoFd.csv", "T4Friendship.csv")

Best way of Parsing 2 CSV files and printing the common values in a third file

I am new to Ruby, and I have been struggling with a problem that I suspect has a simple answer. I have two CSV files, one with two columns, and one with a single column. The single column is a subset of values that exist in one column of my first file. Example:
file1.csv:
abc,123
def,456
ghi,789
jkl,012
file2.csv:
def
jkl
All I need to do is look up the column 2 value in file1 for each value in file2 and output the results to a separate file. So in this case, my output file should consist of:
456
012
I’ve got it working this way:
pairs=IO.readlines("file1.csv").map { |columns| columns.split(',') }
f1 =[]
pairs.each do |x| f1.push(x[0]) end
f2 = IO.readlines("file2.csv").map(&:chomp)
collection={}
pairs.each do |x| collection[x[0]]=x[1] end
f=File.open("outputfile.txt","w")
f2.each do |col1,col2| f.puts collection[col1] end
f.close
...but there has to be a better way. If anyone has a more elegant solution, I'd be very appreciative! (I should also note that I will eventually need to run this on files with millions of lines, so speed will be an issue.)
To be as memory efficient as possible, I'd suggest only reading the full file2 (which I gather would be the smaller of the two input files) into memory. I'm using a hash for fast lookups and to store the resulting values, so as you read through file1 you only store the values for those keys you need. You could go one step further and write the outputfile while reading file2.
require 'csv'

# Read file 2, the smaller file, and store keys in result Hash
result = {}
CSV.foreach("file2.csv") do |row|
  result[row[0]] = false
end

# Read file 1, the larger file, and look for keys in result Hash to set values
CSV.foreach("file1.csv") do |row|
  result[row[0]] = row[1] if result.key? row[0]
end

# Write the results
File.open("outputfile.txt", "w") do |f|
  result.each do |key, value|
    f.puts value if value
  end
end
Tested with Ruby 1.9.3
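As hinted at above, you could go one step further and stream the output instead of storing the matched values. A sketch of that variant (same file names; note the output ends up in file1 order), keeping only file2's keys in memory:
require 'csv'

# Keys from the smaller file
wanted = {}
CSV.foreach("file2.csv") { |row| wanted[row[0]] = true }

# Stream the larger file and write each match as it is found
File.open("outputfile.txt", "w") do |out|
  CSV.foreach("file1.csv") do |row|
    out.puts row[1] if wanted.key?(row[0])
  end
end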
Parsing For File 1
require 'csv'

data_csv_file1 = File.read("file1.csv")
data_csv1 = CSV.parse(data_csv_file1, :headers => true)
Parsing For File 2
data_csv_file2 = File.read("file2.csv")
data_csv2 = CSV.parse(data_csv_file2, :headers => true)
Collection of names
names_from_sheet1 = data_csv1.collect {|data| data[0]} #returns an array of names
names_from_sheet2 = data_csv2.collect {|data| data[0]} #returns an array of names
common_names = names_from_sheet1 & names_from_sheet2 #array with common names
Collecting results to be printed
results = [] #this will store the values to be printed
data_csv1.each {|data| results << data[1] if common_names.include?(data[0]) }
Final output
f = File.open("outputfile.txt","w")
results.each {|result| f.puts result }
f.close

loop, array and file problem in ruby

I'm currently learning Ruby and here is what I'm trying to do:
A script which opens a file, makes a substitution, then compares every line to every other one to see if it exists multiple times.
I tried to work directly with the string, but I couldn't find how to do it, so I put every line in an array and compared the rows.
But I ran into a first problem.
Here is my code:
#!/usr/bin/env ruby

DOC = "test.txt"
FIND = /,,^M/
SEP = "\n"

# make substitution
puts File.read(DOC).gsub(FIND, SEP)

# open the file and put every line in an array
openFile = File.open(DOC, "r+")
fileArray = openFile.each { |line| line.split(SEP) }
# print fileArray #--> give the name of the object

# Cross the array to compare every items to every others
fileArray.each do |items|
  items.chomp
  fileArray.each do |items2|
    items2.chomp
    # Delete if the item already exist
    if items = items2
      fileArray.delete(items2)
    end
  end
end

# Save the result in a new file
File.open("test2.txt", "w") do |f|
  f.puts fileArray
end
At the end, I only get the name of the object "fileArray". I printed the object right after the split and got the same thing, so I guess the problem is there. A little help is required (if you know how to do this without an array, just with the lines in the file, that answer is appreciated too).
Thanks !
EDIT:
So, here's my code now
#!/usr/bin/env ruby
DOC = "test.txt"
FIND = /,,^M/
SEP = "\n"
#make substitution
File.read(DOC).gsub(FIND, SEP)
unique_lines = File.readlines(DOC).uniq
#Save the result in a new file
File.open('test2.txt', 'w') { |f| f.puts(unique_lines) }
Can't figure out how to chomp this.
Deleting duplicate lines in a file:
no_duplicate_lines = File.readlines("filename").uniq
No need to write so much code :)
Modify your code like this:
f.puts fileArray.join("\n")
Alternate way:
unique_lines = File.readlines("filename").uniq
# puts(unique_lines.join("\n")) # Uncomment this line and see if the variable holds the result you want...
File.open('filename', 'w') {|f| f.puts(unique_lines.join("\n"))}
Just a couple of points about the original code:
fileArray = openFile.each { |line| line.split(SEP) }
sets fileArray to a File object, which I suspect wasn't your intention. File#each (the # notation is Ruby convention to describe a particular method on an object of the supplied class) executes your supplied block for each line (it's also available with a synonym: each_line), where a line is defined by default as your OS's end-line character(s).
If you were looking to build an array of lines, then you could just have written
fileArray = openFile.readlines
and if you wanted those lines to be chomped (often a good idea) then that could be achieved by something like
fileArray = openFile.readlines.collect { |line| line.chomp }
or even (since File mixes in Enumerable)
fileArray = openFile.collect { |line| line.chomp }
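Applied to the edited code above, a chomped, de-duplicated version could look like this (same DOC constant, still writing to test2.txt):
unique_lines = File.readlines(DOC).collect { |line| line.chomp }.uniq
File.open('test2.txt', 'w') { |f| f.puts(unique_lines) }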
And one other tiny thing: Ruby tests for equality with ==, = is only for assignment, so
if items = items2
will set items to the value of items2 (and, because that value is truthy here, the condition will always evaluate as true).
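A tiny illustration with made-up values:
items  = "abc"
items2 = "def"

items == items2    #=> false (comparison)
if items = items2  # assignment: items becomes "def" and the condition is truthy
  puts "this branch always runs"
end
items              #=> "def" -- the original value has been overwritten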

Ruby array object find

I am trying out Ruby by making a program I need. I have a custom class, and I need an array of objects of that class. This custom class has some attributes that change in the course of the program.
How can I find a specific object in my array, so I can access it and change it?
class Mathima
  attr_accessor :id, :tmimata

  def initialize(id)
    @id = id
    @tmimata = []
  end
end

# main
mathimata = []
previd = id = ""

File.read("./leit/sortedinput0.txt").lines do |line|
  array = line.split(' ') # i am reading a sorted file
  id = array.delete_at(0) # i get the first two words as the id and tmima
  tmima = array.delete_at(0)
  if previd != id
    mathimata.push(Mathima.new(id)) # if it's a new id, add it
  end
  # here is the part I have to go in mathimata array and add something in the tmimata array in an object.
  previd = id
end
Use a Hash for mathimata as Greg pointed out:
mathimata = {}

File.read("./leit/sortedinput0.txt").lines do |line|
  id, tmima, rest = line.split(' ', 3)
  mathimata[id] ||= Mathima.new(id)
end
# With a hash you can then look up an object directly by its id:
mathima = mathimata[id]
# update your object - mathima
Array.find() lets you search sequentially through an array, but that doesn't scale well.
I'd recommend that if you are dealing with a lot of objects or elements, and they're unique, then a Hash will be much better. Hashes allow indexed lookup based on their key.
Because you are only keeping unique IDs, either a Set or a Hash would be a good choice:
mathimata.push(Mathima.new(id)) # if it's a new id, add it
A Set sits between an Array and a Hash: it only allows unique entries in the collection, so it's like an exclusive Array, but it doesn't allow lookups/accesses by a key the way a Hash does.
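A quick sketch of the Set variant (the ids here are made up):
require 'set'

ids = Set.new
ids << "MATH101"   # added
ids << "PHYS201"   # added
ids << "MATH101"   # duplicate, silently ignored

ids.include?("MATH101")  #=> true
ids.size                 #=> 2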
Also, you can get your first two words in a more Ruby-like way:
array = line.split(' ') # i am reading a sorted file
id = array.delete_at(0) # i get the first two words as the id and tmima
tmima = array.delete_at(0)
would normally be written:
id, tmima = line.split(' ')[0, 2]
or:
id, tmima = line.split(' ')[0 .. 1]

How do you assign new variable names when it's already assigned to something? Ruby

The title really doesn't explain things. My situation is that I would like to read a file and put the contents into a hash. Now, I want to make it clever: I want to create a loop that opens every file in a directory and puts each one into a hash. The problem is I don't know how to assign a name relative to the file name, e.g.:
hash = {}

Dir.glob(path + "*") do |datafile|
  file = File.open(datafile)
  file.each do |line|
    key, value = line.chomp("\t")
    # Problem here is that I wish to have a different
    # hash name for every file I loop through
    hash[key] = value
  end
  file.close
end
Is this possible?
Why don't you use a hash whose keys are the file names (in your case "datafile") and whose values are hashes in which you insert your data?
hash = Hash.new { |h, key| h[key] = Hash.new }

Dir.glob(path + '*') do |datafile|
  next unless File.stat(datafile).file?
  File.open(datafile) do |file|
    file.each do |line|
      key, value = line.split("\t")
      puts key, value
      # Different hash name for every file is now hash[datafile]
      hash[datafile][key] = value
    end
  end
end
You want to dynamically create variables with the names of the files you process?
try this:
Dir.glob(path + "*") do |fileName|
File.open(fileName) {
# the variable `hash` and a variable named fileName will be
# pointing to the same object...
hash = eval("#{fileName} = Hash.new")
file.each do |line|
key, value = line.chomp("\t")
hash[key]=value
end
}
end
Of course you would have to make sure you rubify the filename first. A variable named "bla.txt" wouldn't be valid in Ruby, and neither would "path/to/bla.csv".
If you want to create a dynamic variable, you can also use #instance_variable_set (assuming that instance variables are also OK).
Dir.glob(path + "*") do |datafile|
file = File.open(datafile)
hash = {}
file.each do |line|
key, value = line.chomp("\t")
hash[key] = value
end
instance_variable_set("#file_#{File.basename(datafile)}", hash)
end
This only works when the filename is a valid Ruby variable name. Otherwise you would need some transformation.
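One possible transformation (purely illustrative, not from the original answer): strip the directory and replace every character that isn't valid in an identifier with an underscore before building the instance variable name.
def ivar_name_for(path)
  # e.g. "/tmp/bla.csv" => "@file_bla_csv"
  "@file_" + File.basename(path).gsub(/\W/, "_")
end

instance_variable_set(ivar_name_for("/tmp/bla.csv"), {})
instance_variable_get("@file_bla_csv")  #=> {}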
Can't you just do the following?
filehash = {} # after the File.open line
...
# instead of hash[key] = value, next two lines
hash[datafile] = filehash
filehash[key] = value
You may want to use something like this:
hash[file] = {}
hash[file][key] = value
Two hashes are enough now:
fileHash -> lineHash -> content.
