Reading text file, parsing it, and storing it into a hash - ruby

I want to open a text file named test.txt and build a hash from it that contains only the entries whose value is 1111:
Instance Id:xxxxx, value: 123
Instance Id:xxxxx, value: 1111
Instance Id:xxxxx, value: 1111
Can anyone please help me?
This is my sample code:
File.open('test.txt').each_line do |line|
puts line if line.match(/1111/)
end

# define an array in the outer scope so you can access it after the loop
array = []
# read the file line by line
File.foreach('test.txt') do |line|
  # split each line into two parts: the instance ID and the value
  instance_id, value = line.split(',')
  # only add to the array if the value is the one you're looking for
  # also, split instance_id to keep only the ID itself
  array << instance_id.split(':')[1] if value.match(/1111/)
end
puts array
# => ["xxxxx", "xxxxx"]
EDIT: updated the suggestion to better suit the updated request in the comments
It is also worth noting that a hash keyed by value serves little purpose here: several different IDs share the same value, so an array is the better fit.
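If a hash is still wanted, one option is to map each value to the list of IDs that carry it. The following is only a sketch: the helper name `ids_by_value` and the sample IDs are made up for illustration, and it assumes every line looks like `Instance Id:xxxxx, value: 1111`.

```ruby
require "stringio"

# Hypothetical helper: group instance IDs by their value.
# Expects lines shaped like "Instance Id:xxxxx, value: 1111".
def ids_by_value(io)
  groups = Hash.new { |hash, key| hash[key] = [] }
  io.each_line do |line|
    id_part, value_part = line.chomp.split(",", 2)
    next if value_part.nil? # skip blank or malformed lines
    id = id_part.split(":", 2)[1].to_s.strip
    value = value_part.split(":", 2)[1].to_s.strip
    groups[value] << id
  end
  groups
end

# Sample data with made-up IDs; a real run would pass an open File instead.
sample = StringIO.new(<<~TEXT)
  Instance Id:a1, value: 123
  Instance Id:b2, value: 1111
  Instance Id:c3, value: 1111
TEXT

p ids_by_value(sample)["1111"] # => ["b2", "c3"]
```

With a real file, the same helper could be called as `File.open('test.txt') { |f| ids_by_value(f) }`.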

Related

reading only the second row of an xlsx file using roo

I am using roo to read an xlsx file.
book = Roo::Spreadsheet.open("codes.xlsx")
sheet = book.sheet('Sheet1')
This reads all the rows in the file.
I can iterate through this sheet using
plans_sheet.each(column_1: 'column_1', column_2: 'column_2') do |hash|
But this iteration happens from the first row which has all the column names as well.
I need to take only the second row data.
Update -
Calling row(2) returns an array, whereas iterating with .each yields a hash that has the column names as its keys.
How can I do that?
The Roo::Excelx#each method returns a standard Ruby Enumerator:
enum = sheet.each(column_1: 'column_1', column_2: 'column_2')
enum.class # => Enumerator
So there are two ways to achieve your goal:
1) Use drop method:
enum.drop(1).each do |hash|
# do something with the hash
end
If you need only the second row:
hash = enum.drop(1).first
2) Move the internal position of the iterator:
enum.next # move 1 step forward to skip the first row
# continue moving
loop do
hash = enum.next
# do something with the hash
end
If you need only the second row:
enum.next # skip the first row
hash = enum.next # second row
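Both approaches can be tried without roo, since they rely only on standard Enumerator behaviour. In the sketch below, `fake_rows` is a hypothetical stand-in for `sheet.each(column_1: ..., column_2: ...)`, and the row contents are invented.

```ruby
# Hypothetical stand-in for sheet.each(column_1: ..., column_2: ...):
# the first element plays the role of the header row.
def fake_rows
  [
    { column_1: "column_1", column_2: "column_2" }, # header row
    { column_1: "a", column_2: 1 },
    { column_1: "b", column_2: 2 }
  ].each
end

# Approach 1: drop returns an Array without the first element.
second_via_drop = fake_rows.drop(1).first

# Approach 2: advance the enumerator's internal position.
enum = fake_rows
enum.next                    # skip the header row
second_via_next = enum.next  # second row

p second_via_drop == second_via_next # => true
```

Note that drop builds an array of all remaining rows, while next only reads as far as it is asked to, which may matter for large sheets.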
Also take into account:
There is also a Roo::Excelx::Sheet class, which represents a worksheet and has an each_row method that accepts an :offset parameter. Unfortunately, it has no option to transform a row into a hash with given keys.
book = Roo::Spreadsheet.open("codes.xlsx")
sheet = book.sheet_for('Sheet1')
sheet.each_row(offset: 1) do |row| # skip first row
# do something with the row
end
If you want to iterate over rows starting from the second one, just use Enumerable#drop. Note that drop returns an array and ignores any block, so chain each onto it:
plans_sheet.each(column_1: 'column_1', column_2: 'column_2').drop(1).each do |hash|
  # ...
end
Just do
whatever_sheet.row(2)
It is always worth checking the documentation; this is explained in the gem's README.md, see https://github.com/roo-rb/roo#accessing-rows-and-columns

Split text while reading line by line

I am trying to read a text file with contents like this
ABC = Thefirststep
XYZ = Secondstep
ABC_XYZ = Finalstep=345ijk!r4+
I am able to read the file line by line using this
#!/usr/bin/ruby
text = '/tmp/data'
f = File.open(text , "r")
f.each_line { |line|
puts line
}
f.close
What I want to do is assign the values TheFirststep, Secondstep and Finalstep to separate variables, preferably using split().
You could use something like this:
#!/usr/bin/ruby
text = '/tmp/data'
data = []
f = File.open(text, "r")
f.each_line { |line|
  # take the field after the first "=", without surrounding whitespace
  data.push(line.split("=")[1].strip)
}
f.close
You said you want to "have the values 'TheFirststep', 'Secondstep' and 'Finalstep' assigned to separate variables".
You cannot create local variables dynamically (not since Ruby v1.8 anyway). That leaves two choices: assign those values to instance variables or use a different data structure, specifically, a hash.
First let's create a data file.
data = <<-END
ABC = Thefirststep
XYZ = Secondstep
ABC_XYZ = Finalstep=345ijk!r4+
END
FName = 'test'
File.write(FName, data)
#=> 67
Assign values to instance variables
File.foreach(FName) do |line|
  var, value, * = line.chomp.split(/\s*=\s*/)
  instance_variable_set("@#{var.downcase}", value)
end
@abc
#=> "Thefirststep"
@xyz
#=> "Secondstep"
@abc_xyz
#=> "Finalstep"
The convention for the names of instance variables (after the "@") is to use snake case, which is why I downcased them.
Store the values in a hash
File.foreach(FName).with_object({}) do |line,h|
var, value, * = line.chomp.split(/\s*=\s*/)
h[var] = value
end
#=> {"ABC"=>"Thefirststep", "XYZ"=>"Secondstep", "ABC_XYZ"=>"Finalstep"}
As easy as this was to do, it's not generally helpful to generate instance variables dynamically or hashes with dynamically created keys. That's because they are only useful if the rest of the code can later read, and possibly change, their values, which is awkward when the names are not known in advance.
Note that in
var, value, * = line.chomp.split(/\s*=\s*/)
var equals the first element of the array returned by the split operation, value is the second value and * discards the remaining elements, if any.
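If the text after the first "=" should be kept whole instead of discarded, String#split accepts a limit as its second argument. This is only a sketch of that variation; `parse_assignments` is a hypothetical name, and it assumes each non-blank line has the form `KEY = value`.

```ruby
# Hypothetical helper: parse "KEY = value" lines into a hash,
# splitting on the first "=" only so later "=" signs stay in the value.
def parse_assignments(text)
  text.each_line.with_object({}) do |line, h|
    next if line.strip.empty? # ignore blank lines
    key, value = line.chomp.split(/\s*=\s*/, 2)
    h[key] = value
  end
end

sample = "ABC = Thefirststep\nABC_XYZ = Finalstep=345ijk!r4+\n"
p parse_assignments(sample)
# => {"ABC"=>"Thefirststep", "ABC_XYZ"=>"Finalstep=345ijk!r4+"}
```

Whether the truncated "Finalstep" or the full "Finalstep=345ijk!r4+" is wanted depends on the data, so pick the variant that matches it.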

Store Ruby YAML results into an array

So I have an empty array and a .yml file. I have managed to output the results of that file with this code
puts YAML.load_file('some_yml_file.yml').inspect
I was wondering, how can I pull out each entry and store it in the empty array?
Is it
emptyarray = []
YAML.load_file('some_yml_file.yml').inspect do |entry|
emptyarray << entry
end
Any help would be appreciated! Thanks!
YAML.load_file returns a Ruby object corresponding to the type of data structure the YAML represents. If the YAML contains a sequence, YAML.load_file will return a Ruby array. You don't need to do anything further to put the data into an array, because it's already an array:
require 'yaml'

yaml = <<END
---
- I am
- a YAML
- sequence
END
data = YAML.load(yaml)
puts data.class
# => Array
puts data == ["I am", "a YAML", "sequence"]
# => true
(You'll notice that I used YAML.load to load the data from a string rather than a file, but the result is the same as using YAML.load_file on a file with the same contents.)
If the top-level structure in the YAML is not a sequence (e.g. if it's a mapping, analogous to a Ruby hash), then you will have to do additional work to turn it into an array, but we can't tell you what that code would look like without seeing your YAML.
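As a rough illustration only, since the real YAML has not been shown, one way to always end up with an array is to branch on the class of the loaded object. The helper name `yaml_entries` and the sample documents are assumptions made for this sketch.

```ruby
require "yaml"

# Hypothetical helper: always return an Array, whatever the top-level
# YAML structure turns out to be.
def yaml_entries(source)
  data = YAML.load(source)
  case data
  when Array then data        # sequence: already an array
  when Hash  then data.to_a   # mapping: array of [key, value] pairs
  else [data]                 # scalar: wrap it in an array
  end
end

p yaml_entries("- one\n- two\n")          # => ["one", "two"]
p yaml_entries("name: demo\ncount: 2\n")  # => [["name", "demo"], ["count", 2]]
```

For a mapping, `data.values` or `data.keys` may be closer to what is needed than `to_a`, depending on which part of each entry matters.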
Replace YAML.load_file('some_yml_file.yml').inspect do |entry| with YAML.load_file('some_yml_file.yml').each do |entry| and it should work as you expect (assuming the loaded data is not a string).
If you post a sample of your data structure inside the YAML file and what you wish to extract and put in an array then that would help.

Why am I getting the error `stack level too deep (SystemStackError)` when hashing 2 columns in a CSV?

This is my code, which is supposed to hash the 2 columns in fotoFd.csv and then save the hashed columns in a separate file, T4Friendship.csv:
require "csv"
arrayUser=[]
arrayUserUnique=[]
arrayFriends=[]
fileLink = "fotoFd.csv"
f = File.open(fileLink, "r")
f.each_line { |line|
row = line.split(",");
arrayUser<<row[0]
arrayFriends<<row[1]
}
arrayUserUnique = arrayUser.uniq
arrayHash = []
for i in 0..arrayUser.size-1
arrayHash<<arrayUser[i]
arrayHash<<i
end
hash = Hash[arrayHash.each_slice(2).to_a]
array1 =hash.values_at *arrayUser
array2 =hash.values_at *arrayFriends
fileLink = "T4Friendship.csv"
for i in 0..array1.size-1
logfile = File.new(fileLink,"a")
logfile.print("#{array1[i]},#{array2[i]}\n")
logfile.close
end
The first column contains users, and the second column contains their friends. So, I want it to produce something like this in T4Friendship.csv:
1 2
1 4
1 10
1 35
2 1
2 8
2 11
3 28
3 31
...
The problem is caused by the splat expansion of a large array. The splat * can be used to expand an array as a parameter list. The parameters are passed on the stack. If there are too many parameters, you'll exhaust stack space and get the mentioned error.
Here's a quick demo of the problem in irb that tries to splat an array of one million elements when calling puts:
irb
irb(main):001:0> a = [0] * 1000000; nil # Use nil to suppress statement output
=> nil
irb(main):002:0> puts *a
SystemStackError: stack level too deep
from /usr/lib/ruby/1.9.1/irb/workspace.rb:80
Maybe IRB bug!
irb(main):003:0>
You seem to be processing large CSV files, and so your arrayUser array is quite large. Expanding the large array with the splat causes the problem on the line:
array1 =hash.values_at *arrayUser
You can avoid the splat by calling map on arrayUser, and converting each value in a block:
array1 = arrayUser.map{ |user| hash[user] }
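On a small input the two forms can be checked against each other. The sketch below uses a hypothetical helper `lookup_all` and made-up names; it only illustrates that map returns the same values as values_at without passing every key as a separate argument.

```ruby
# Hypothetical helper mirroring hash.values_at(*keys) without a splat.
def lookup_all(hash, keys)
  keys.map { |key| hash[key] }
end

ids = { "ann" => 0, "bob" => 1 }
keys = %w[bob ann zed]

p lookup_all(ids, keys)                           # => [1, 0, nil]
p lookup_all(ids, keys) == ids.values_at(*keys)   # => true
```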
Suggested Code
Your code appears to map names to unique ID numbers. The output appears to be the same format as the input, except with the names translated to ID numbers. You can do this without keeping any arrays around eating up memory, and just use a single hash built up during read, and used to translate the names to numbers on the fly. The code would look like this:
def convertCsvNamesToNums(inputFileName, outputFileName)
  # Create unique ID number hash
  # When an unknown key is looked up, it is added with a new unique ID number
  # Produces a 0-based index
  nameHash = Hash.new { |hash, key| hash[key] = hash.size }
  # Convert input CSV with names to output CSV with ID numbers
  File.open(inputFileName, "r") do |inputFile|
    File.open(outputFileName, 'w') do |outputFile|
      inputFile.each_line do |line|
        # Parse names from input CSV (chomp first, so the trailing
        # newline does not become part of the friend's name)
        userName, friendName = line.chomp.split(",")
        # Map names to unique ID numbers
        userNum = nameHash[userName]
        friendNum = nameHash[friendName]
        # Write unique ID numbers to output CSV
        outputFile.puts "#{userNum}, #{friendNum}"
      end
    end
  end
end
convertCsvNamesToNums("fotoFd.csv", "T4Friendship.csv")
Note: This code assigns ID numbers to users and friends as they are encountered. Your previous code assigned ID numbers to users only, and then looked up the friends afterwards. The code I suggested will ensure friends are assigned ID numbers, even if they never appeared in the user list. The numerical ordering will differ slightly from what you supplied, but I assume that is not important.
You can also shorten the body of the inner loop to:
# Parse names from input, map to ID numbers, and write to output
# (chomp first, so the trailing newline is not part of the last name)
outputFile.puts line.chomp.split(",").map{|name| nameHash[name]}.join(',')
I thought I'd include this change separately for readability.
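The ID assignment relies on the block form of Hash.new, which runs only when a missing key is looked up. Here is a minimal sketch of that behaviour in isolation; the helper name `assign_ids` and the sample names are made up for illustration.

```ruby
# Hypothetical helper: assign 0-based IDs to names in order of first appearance.
def assign_ids(names)
  # The block runs only for unknown keys; hash.size is read before the
  # new key is stored, so the first name gets 0, the next new name 1, etc.
  ids = Hash.new { |hash, key| hash[key] = hash.size }
  names.map { |name| ids[name] }
end

p assign_ids(%w[ann bob ann cid bob]) # => [0, 1, 0, 2, 1]
```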
Updated Code
As per your request in the comments, here is code that gives priority to the user column for ID numbers. Only once the first column is completely processed will ID numbers be assigned to entries in the second column. It does this by first passing over the input once, adding the first column to the hash, and then passing over the input a second time to process it as before, using the pre-prepared hash from the first pass. New entries can still be added in the second pass in the case where the friend column contains a new entry that doesn't exist anywhere in the user column.
def convertCsvNamesToNums(inputFileName, outputFileName)
  # Create unique ID number hash
  # When an unknown key is looked up, it is added with a new unique ID number
  # Produces a 0-based index
  nameHash = Hash.new { |hash, key| hash[key] = hash.size }
  # Pass over the data once to give priority to user column for ID numbers
  File.open(inputFileName, "r") do |inputFile|
    inputFile.each_line do |line|
      name, = line.chomp.split(",") # Parse name from line, ignore the rest
      nameHash[name] # Add name to unique ID number hash (if it doesn't already exist)
    end
  end
  # Convert input CSV with names to output CSV with ID numbers
  File.open(inputFileName, "r") do |inputFile|
    File.open(outputFileName, 'w') do |outputFile|
      inputFile.each_line do |line|
        # Parse names from input, map to ID numbers, and write to output
        outputFile.puts line.chomp.split(",").map{|name| nameHash[name]}.join(',')
      end
    end
  end
end
convertCsvNamesToNums("fotoFd.csv", "T4Friendship.csv")

Comparing values in two hashes

I am having trouble in comparing values in two hashes, getting the error "Can't convert String into Integer".
First hash has values captured from a web page using the method "capture_page_data(browser)" and the second hash has data parsed from a report.
Code looks like below:
# Open the web application
# Navigate to a specific page and capture page data
loan_data = Hash.new
loan_data = capture_page_data(browser)
Second hash has values captured from a report generated from the web application.
Code looks like below:
@report_data[page] = Hash.new
# we have written some logic to parse the data from the report into this hash
Now I am trying to compare the values in these two hashes, to ensure the data in the report matches the data in the application, using the code below, which gives me the error "Can't convert String into Integer".
loan_data.map{|ld| ld['MainContent_cphContent_LoanOverViewGeneralInfoCtrl_lblRelName']} &
@report_data.map{|rd| rd['Relationship']}
Please help me out in resolving this issue.
Regards,
Veera.
Hash#map iterates through the hash as if it were an array of key/value pairs.
{a:1,b:2}.map{|x| puts x.inspect }
# prints
# [:a, 1]
# [:b, 2]
{a:1,b:2}.map{|k,v| puts "#{k} => #{v}" }
# prints
# a => 1
# b => 2
It applies the block you provide to each pair and collects the results into a new array.
result = {a:1,b:2}.map{|k,v| "#{k} => #{v}" }
puts result.inspect
# prints
# ["a => 1", "b => 2"]
I would guess what you are trying to do is compare a single key from each array... in which case...
if loan_data[:id][:span]['MainContent_cphContent_LoanOverViewGeneralInfoCtrl_lblRelName'] == @report_data[1]['Relationship']
log_message("pass")
else
log_message("fail")
end
might be what you are trying to do.. but I am only guessing.
It all depends on the shape of your data.
If you inspect the ld variable inside your block, you will find that it is an array. You can get an element of it with ld[0] or ld[1], but ld[string] does not make sense and results in the exception you are seeing. The ld array will actually be an array with two elements: key and value.
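The exception can be reproduced on a small hash. The data below is invented, since the real shape of loan_data is unknown, and the helper names are hypothetical; the sketch only shows why indexing a [key, value] pair with a string fails and how destructuring the pair avoids it.

```ruby
# Hypothetical data standing in for loan_data; the real shape is unknown.
loan = { "RelName" => "Spouse", "Amount" => "1000" }

# Returns the class of the error raised when a [key, value] pair
# is indexed with a string, or nil if nothing is raised.
def string_index_error(hash, key)
  hash.map { |pair| pair[key] } # pair is an Array, so pair["..."] is invalid
  nil
rescue TypeError => e
  e.class
end

p string_index_error(loan, "RelName") # => TypeError

# Destructuring the pair gives access to keys and values separately.
def values_of(hash)
  hash.map { |_key, value| value }
end

p values_of(loan) # => ["Spouse", "1000"]
```

To compare a single entry, a direct lookup such as `loan["RelName"]` is usually simpler than going through map.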
Thanks for your suggestions.. but I found a different solution to compare a single key from two hashes/Arrays using the below code which worked fine.
string_equals?(loan_data[:id][:span]['MainContent_cphContent_LoanOverViewGeneralInfoCtrl_lblRelName'], @report_data[1]['Relationship'] )
Thanks,
Veera.
It's best to inspect the contents of loan_data and @report_data directly, but you can try .to_sym to convert the key into a symbol.
loan_data.map{|ld| ld['MainContent_cphContent_LoanOverViewGeneralInfoCtrl_lblRelName'.to_sym]} &
@report_data.map{|rd| rd['Relationship'.to_sym]}

Resources