Ruby - comparing adjacent entries in one column of a csv file - ruby

I'm new to Ruby, so apologies if this is dead easy :-)
I have a .csv file with 5 columns. The first column has a record identifier (in this case a driver number) and the other 4 columns in each row have data relating to that record. For each record there are around 50 rows of data (just under 2,000 rows in total). The .csv file has a header row.
I need to read the .csv file and identify the last entry for each user, so I can move on to the next user. I've tried to get it to compare the first column and the entry in the next row.
This is what I have so far. It returns incorrect row numbers, anywhere between 1 and 5 rows out.
require 'csv-mapper'
Given(/^I compare the driver numbers from rows "(.*?)" to "(.*?)"$/) do |firstrow, lastrow|
  data = CsvMapper.import('C:/auto_test_data/Courts code example csv.csv', headers: true) do
    [dln]
  end
  row = firstrow.to_i
  while row <= lastrow.to_i
    @licnum1 = data.at(row).dln
    @licnum2 = data.at(row+1).dln
    if @licnum2 == @licnum1
      $newrecord = "same"
    else
      $newrecord = @licnum2
    end
    if $newrecord != "same"
      puts "Last row for #{@licnum1} is #{row}\n"
    end
    row = row + 1
  end
end
This is the layout for the .csv file:
recordidentifier1 dataitem1 dataitem2 code descriptionforcomparison
recordidentifier1 dataitem1 dataitem2 code descriptionforcomparison
recordidentifier2 dataitem1 dataitem2 code descriptionforcomparison
recordidentifier2 dataitem1 dataitem2 code descriptionforcomparison
All help will be greatly appreciated.
Thanks,
Peter

Here's one way to do it
current_identifier = nil
(firstrow.to_i..lastrow.to_i).each do |row|
  if current_identifier != data.at(row).dln # current row has a new identifier
    if current_identifier # this is not the first row
      puts "Last row for #{current_identifier} is #{row-1}\n"
    end
    current_identifier = data.at(row).dln # remember the current identifier
  end
end
# the final row in the range is always the last row for the current identifier
puts "Last row for #{current_identifier} is #{lastrow.to_i}\n"
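As a variation on the loop above, here is a minimal sketch that uses only Ruby's standard csv library rather than csv-mapper. It records the most recent row number seen for each identifier, so the final value per key is that identifier's last row. The sample data, the `SAMPLE_CSV` constant and the `last_rows_by_identifier` helper are made up for illustration; only the `dln` column name comes from the question.

```ruby
require 'csv'

# Hypothetical sample data standing in for the real file.
SAMPLE_CSV = <<~DATA
  dln,item1,item2,code,description
  D001,a,b,10,first
  D001,a,b,11,second
  D002,a,b,10,first
  D002,a,b,12,second
  D002,a,b,13,third
  D003,a,b,10,first
DATA

# Returns a hash mapping each identifier to the 1-based data row number
# (header row excluded) of its last occurrence.
def last_rows_by_identifier(csv_text, column)
  last_rows = {}
  CSV.parse(csv_text, headers: true).each_with_index do |row, index|
    # Later rows overwrite earlier ones, so the final value is the last row.
    last_rows[row[column]] = index + 1
  end
  last_rows
end

last_rows_by_identifier(SAMPLE_CSV, 'dln').each do |id, row_number|
  puts "Last row for #{id} is #{row_number}"
end
```

Because each identifier is simply overwritten, this also copes with a file where the rows for one identifier are not contiguous, which the adjacent-row comparison does not.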

Related

Ruby - How to filter SQLite rows based on column conditions

I have a variable session[:pet_profile_tags] that returns an array like ["adult", "activity_low", "overweight"].
I then have an SQLite database, called balto.db, which contains the table pet_tips. That table contains 2 columns, "ID" (Integer) and "C1_inclusion" (VARCHAR).
For each row of the pet_tips table, I need to check if the value contained in the C1_inclusion column contains one of the values of the session[:pet_profile_tags] array variable. If that is the case, I need to check the row's ID and store it inside another array variable, named pet_tips.
I tried the below code but I am getting the following error: TypeError - no implicit conversion of String into Integer: index.rb:428:in '[]' , line 428 being if (session[:pet_profile_tags].include?(row["C1_inclusion"].to_s))
# Assign pet tips
pet_tips = []
# Query the pet_tips table
db = SQLite3::Database.open "balto.db"
rows = db.execute("SELECT * FROM pet_tips")
# Iterate through each row of the table
rows.each do |row|
# Check if the row matches the C1_inclusion column
if (session[:pet_profile_tags].include?(row["C1_inclusion"].to_s))
# If the row matches, add the ID to the pet_tips array
pet_tips << row["ID"]
end
end
session[:pet_tips] = pet_tips
db.close
I have been stuck for hours, any help would be really appreciated!
First I tried returning the value of the session[:pet_profile_tags] variable to make sure I was getting the correct array. Then I checked that the different column and variable names were correctly referenced in my code.
Error
Your error is here: row["C1_inclusion"]
row is an Array of values in column order e.g. [1,"adult"].
Array#[] expects an Integer (index) but you are calling it with a String ("C1_inclusion")
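The error can be reproduced without a database. The sketch below is plain Ruby: the `row` values, the `columns` order and both helper methods are assumptions for illustration, not part of the sqlite3 gem.

```ruby
# Returns the error message raised by indexing `row` with `key`,
# or nil when the lookup succeeds.
def lookup_error(row, key)
  row[key]
  nil
rescue TypeError => e
  e.message
end

# Pairs column names with the values of one row.
def row_to_hash(columns, row)
  columns.zip(row).to_h
end

row = [1, "adult"]                 # a row in column order
columns = ["ID", "C1_inclusion"]   # assumed column order

puts lookup_error(row, "C1_inclusion") # the TypeError from the question
puts row[1]                            # an Integer index works
puts row_to_hash(columns, row)["C1_inclusion"]
```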
Solutions
To solve this you can
Option 1:
Use Integer indexes based on column order
if (session[:pet_profile_tags].include?(row[1]))
pet_tips << row[0]
end
Option 2: Convert the row to a Hash
Always:
db = SQLite3::Database.open "balto.db"
db.results_as_hash = true
rows = db.execute("SELECT * FROM pet_tips")
rows.each do |row|
  if session[:pet_profile_tags].include?(row["C1_inclusion"])
    pet_tips << row["ID"]
  end
end
Just for this loop:
db.query("SELECT * FROM pet_tips").each_hash do |row|
  if session[:pet_profile_tags].include?(row["C1_inclusion"])
    pet_tips << row["ID"]
  end
end
Option 3: Query just the data you want.
# Query the pet_tips table
db = SQLite3::Database.open "balto.db"
tags = session[:pet_profile_tags]
# One "?" placeholder per tag, so that each value is bound separately
placeholders = (["?"] * tags.size).join(", ")
session[:pet_tips] = db.execute("SELECT ID FROM pet_tips WHERE C1_inclusion IN (#{placeholders})", tags).flatten
db.close
This uses session[:pet_profile_tags] as a WHERE condition against pet_tips.C1_inclusion. A single ? cannot stand in for a whole list, so the query needs one placeholder per tag. Because every tag is passed as a bind parameter, the values need no manual quoting or escaping.

reading specific columns from excel file and writing to another excel file

My objective is to read columns from an Excel file and write them to a new file.
So far I am able to create a new file with specified columns, and I am able to read an Excel file by row and column index. But my objective is different.
I have to pick specific columns from the Excel file and write all their data to another file under the same columns.
How can I achieve this?
require 'spreadsheet'
#Step 1 : create an excel sheet
book = Spreadsheet::Workbook.new
sheet = book.create_worksheet
sheet.row(0).concat %w[id column_1 column_2 column_3]
book.write 'Data/write_excel.xls'
#step 2 : read the data excel file
book1 = Spreadsheet.open('Data/read_excel.xls')
sheet1 = book1.worksheet('')
val = sheet1[0, 1]
puts val
This is one option, if you know in advance the index of the source column and the index of the destination column:
# Step 3: copy the column
col_num = 3 #=> the destination column
row_num = 1 #=> skip the headers and start from the second row
sheet1.each row_num do |row|
  sheet[row_num, col_num] = row[0] # the index in row[0] is the column to copy from in the source file
  row_num += 1
end
Then save the file: book.write 'filename'
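If the columns are known by header name rather than by index, the lookup can be done once on the header row. The following is a minimal sketch on plain arrays, so it runs without the spreadsheet gem; `pick_columns` and the sample rows are hypothetical. Rows read through the gem behave like arrays, so the same idea should carry over, but treat that as an assumption to verify.

```ruby
# Selects the named columns from a table given as an array of rows,
# where the first row holds the headers.
def pick_columns(rows, wanted)
  header, *body = rows
  indexes = wanted.map do |name|
    header.index(name) || raise(ArgumentError, "no column named #{name}")
  end
  # Keep the header row, then pull the same positions out of every data row.
  [wanted] + body.map { |row| row.values_at(*indexes) }
end

source = [
  %w[id column_1 column_2 column_3],
  [1, "a", "b", "c"],
  [2, "d", "e", "f"]
]

pick_columns(source, %w[id column_2]).each { |row| puts row.inspect }
```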

Why does the other function take longer to run

I have two functions:
def construct_heirarchy(csv_file):
    heirarchy = defaultdict(dict)
    for row in read_CSV(csv_file):
        row = edit_csv_row_data(
            row,
            translate_ttypes=translate_ttypes,
            include_countries=settings.GEO_COUNTRY.itervalues(),
            skip_headers=SKIP_HEADERS,
            translate_header=translate_header)
        if not row:
            continue
        pid, _id, ttype = map(row.get, ('pid', 'id', 'type'))
        if pid:
            heirarchy[pid].setdefault('target', []).append(_id)
        heirarchy[_id]['type'] = ttype
    return heirarchy
and
def extract_csv_data(csv_file):
    csv_data = dict()
    for row in read_CSV(csv_file):
        row = edit_csv_row_data(
            row,
            translate_ttypes=translate_ttypes,
            include_countries=settings.GEO_COUNTRY.itervalues(),
            skip_headers=SKIP_HEADERS,
            translate_header=translate_header)
        if not row:
            continue
        # yield row['id'], row
        csv_data[row['id']] = row
    return csv_data
I track time of these functions using time.time
Heirarchy -2.90870666504e-05
Extracting 1.49716997147
I don't understand why there is such a huge time difference. If I use second function as generator, time is
Extracting 1.90734863281e-06
But then I can't use csv_data.get
Could someone help me understand what is going wrong here and what is the optimized way to do this?
PS: CSV is 6.1 MB with 85263x7 length.

VLOOKUP using batch scripting

I am new to scripting and do a lot of my work as an analyst in Excel sheets.
I have two files with list of items in it.
File1 contains 1 column
File2 contains 2 columns.
I want to check whether each item in column1 of file2 is also present in column1 of file1. If yes, it should print column1File1, column1File2 and column2File2 in file3; otherwise it should print "NA", column1File2, column2File2 in file3.
Please help; it will simplify my work a lot.
I made this program a while ago. Although it iterates through the sheets of one workbook and compares cell by cell, it may set you in the right direction. It takes a cell in one "master" sheet and then iterates through each sheet to find that value in a particular column. When it finds a match, the counter is incremented; it then takes the next cell in the master sheet, and so on. You could alter it to use multiple workbooks and compare whichever cells you want.
Sub Open_Excel()
    'Use worksheetNum as sheet to read/write data
    Set currentWorkSheet = objExcel.ActiveWorkbook.Worksheets(worksheetNum)
    'How many rows are used in the current worksheet
    usedRowsCount = currentWorkSheet.UsedRange.Rows.Count
    'Use current worksheet cells for values
    Set Cells = currentWorksheet.Cells
    'Loop through each row in the worksheet
    For curRow = startRow to (usedRowsCount)
        'Get computer name to ping to
        strEmailAddressSource = Cells(curRow,colEmailAddressSource).Value
        strServerSource = Cells(curRow,colHostserverSource).Value
        strLocationSource = Cells(curRow,colLocationSource).Value
        'make the values unique
        strconcatenation = strServerSource & strLocationSource
        Call Comparison()
    Next
End Sub
'********************************************************************************************
'**** Comparison
'********************************************************************************************
'Comparison test
Sub Comparison()
    'choose the worksheets to go through
    For worksheetCounter = 6 to 9 'workSheetCount
        Set currentWorkSheetComparison = objExcel.ActiveWorkbook.Worksheets(worksheetCounter)
        usedRowsCountNew = currentWorkSheetComparison.UsedRange.Rows.Count
        'First row to start the comparison from
        For rowCompare = 2 to (usedRowsCountNew)
            strEmailLot = currentWorkSheetComparison.Cells(rowCompare,colEmailAddressLot).Value
            comp1 = StrComp(strEmailAddressSource,strEmailLot,0)
            comp2 = StrComp(strconcatenation,reportConcat,0)
            'check if the values match
            If ((comp1 = 0) AND (comp2 = 0)) THEN
                countvalue = countvalue + 1
            End If
        Next
    Next
End Sub
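The lookup described in the question is independent of the scripting language. Here is a minimal sketch in Ruby (not batch), assuming both files have already been read into arrays; `vlookup_rows` and the sample values are hypothetical. It emits one output row per row of file2, with "NA" in the first position when the key is absent from file1.

```ruby
require 'set'

# file1_keys: the single column of file1.
# file2_rows: [column1, column2] pairs from file2.
# Returns rows of [column1File1 or "NA", column1File2, column2File2].
def vlookup_rows(file1_keys, file2_rows)
  known = file1_keys.to_set # set membership keeps each lookup cheap
  file2_rows.map do |key, value|
    [known.include?(key) ? key : 'NA', key, value]
  end
end

file1 = %w[apple pear]
file2 = [%w[apple 1], %w[kiwi 2]]

vlookup_rows(file1, file2).each { |row| puts row.join(',') }
```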

Ruby Watir: Selecting a specific row

Consider the following html
http://www.carbide-red.com/prog/test_table.html
I have worked out that I can move left to right on the columns using
browser.td(:text => "Equipment").parent.td(:index => "2").flash
to flash the 3rd column over on the line containing "Equipment".
But how can I move down a certain number of rows? I am having terrible luck using .tr and .rows; no matter how I try, it just crashes out when using those. Even something as simple as
browser.tr(:text => "Equipment").flash
Am I just misunderstanding how tr/row works?
Specific Row/Column
It sounds like you have already calculated which row/column you want. You can get the cell at a specific row/column index by simply doing:
browser.table[row_index][column_index]
Where row_index and column_index are integers for the row and column you want (note that it is zero-based index).
Specific Row
You can also do the following to select rows based on an index:
browser.table.tr(:index, 1).flash
browser.table.row(:index, 2).flash
Note that .tr includes nested tables while .row ignores nested tables.
Update - Find Rows After Specific Row
To find a row after a specific row containing a certain text, determine the index of the specific row first. Then you can locate the other rows in relation to it. Here are some examples:
#Get the 3rd row down from the row containing the text 'Equipment'
starting_row_index = browser.table.rows.to_a.index{ |row| row.text =~ /Equipment/ }
offset = 3
row = browser.table.row(:index, starting_row_index + offset)
puts row.text
# => CAT03 ...
#Get the 3rd row down from the row containing a cell with yellow background colour
starting_row_index = browser.table.rows.to_a.index{ |row| row.td(:css => "td[bgcolor=yellow]").present? }
offset = 3
row = browser.table.row(:index, starting_row_index + offset)
puts row.text
# => ETS36401 ...
#Output the first column text of each row after the row containing a cell with yellow background colour
starting_row_index = browser.table.rows.to_a.index{ |row| row.td(:css => "td[bgcolor=yellow]").present? }
(starting_row_index + 1).upto(browser.table.rows.length - 1){ |x| puts browser.table[x][0].text }
# => CAT03, CAT08, ..., INTEGRA10, INTEGRA11
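One caveat with the index-plus-offset approach: `index` returns nil when no row matches, and adding an offset to nil raises an error. The sketch below shows the guard on plain arrays so it runs without a browser; `TABLE` and `row_after` are hypothetical stand-ins for the Watir rows, not Watir API.

```ruby
# Hypothetical table contents: each inner array is one row of cell texts.
TABLE = [
  ["Name", "Qty"],
  ["Equipment", ""],
  ["CAT01", "4"],
  ["CAT02", "1"],
  ["CAT03", "7"]
].freeze

# Returns the row `offset` rows below the first row matching `pattern`,
# or nil when no row matches or the offset runs past the end of the table.
def row_after(rows, pattern, offset)
  start = rows.index { |row| row.any? { |cell| cell =~ pattern } }
  return nil if start.nil?
  rows[start + offset]
end

puts row_after(TABLE, /Equipment/, 3).inspect
puts row_after(TABLE, /Missing/, 1).inspect
```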
Let me know if that helps or if you have a specific example you want.
