reading specific columns from excel file and writing to another excel file - ruby

My objective is to read specific columns from an Excel file and write them to a new file.
So far I can create a new file with the specified columns, and I can read a cell from an Excel file by row and column index. But that is not quite what I need.
I have to pick specific columns from the source file and write all of their data to the other file under the matching column.
How can I achieve this?
require 'spreadsheet'
#Step 1 : create an excel sheet
book = Spreadsheet::Workbook.new
sheet = book.create_worksheet
sheet.row(0).concat %w[id column_1 column_2 column_3]
book.write 'Data/write_excel.xls'
#step 2 : read the data excel file
book1 = Spreadsheet.open('Data/read_excel.xls')
sheet1 = book1.worksheet('')
val = sheet1[0, 1]
puts val

This is one option, assuming you know in advance the index of the source column and the index of the destination column:
# Step 3: copy the column
col_num = 3 #=> the destination column
row_num = 1 #=> skip the header and start from the second row
sheet1.each row_num do |row|
  sheet[row_num, col_num] = row[0] # row[0]: the index selects the column to copy from the source file
  row_num += 1
end
Then save the file: book.write 'filename'
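If the columns should be picked by header name rather than by a hard-coded index, the same idea can be sketched in plain Ruby. This is only an illustration under stated assumptions: rows are modelled as ordinary arrays with the header row first, and `copy_columns`, `wanted` and the sample data are hypothetical names, not part of the spreadsheet gem.

```ruby
# Sketch only: rows are modelled as plain arrays (header row first).

# Copy the columns named in `wanted` from `source_rows` into new rows laid
# out according to `dest_headers`. Destination columns with no match stay nil.
def copy_columns(source_rows, dest_headers, wanted)
  src_headers = source_rows.first
  # Pair each wanted header with its [source index, destination index].
  pairs = wanted.map do |name|
    [src_headers.index(name), dest_headers.index(name)]
  end
  # Ignore headers that are missing on either side.
  pairs = pairs.reject { |src, dst| src.nil? || dst.nil? }

  data = source_rows.drop(1).map do |row|
    out = Array.new(dest_headers.size)
    pairs.each { |src, dst| out[dst] = row[src] }
    out
  end
  [dest_headers] + data
end

source = [
  %w[id name column_2 extra column_1],
  [1, 'a', 'x2', 'e', 'x1'],
  [2, 'b', 'y2', 'f', 'y1']
]
dest_headers = %w[id column_1 column_2 column_3]

copy_columns(source, dest_headers, %w[id column_1 column_2]).each { |row| p row }
```

Each resulting row could then be written to the destination sheet cell by cell, in the same way as the loop above.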

Related

Is there a way to copy only unique rows in an Excel worksheet column to another sheet?

I use a CSV file as $AgencyMaster with two columns, AgencyID and AgencyName. I currently manually input these from another file, $Excel_File_Path, but I would like to automatically generate $AgencyMaster if possible.
$Excel_File_Path has three worksheets: Sheet1, Sheet2 and Template. Sheet1 and Sheet2 are full of data, while Template is used as a graphical representation of said data which populates based on the AgencyID. I have a script that opens $Excel_File_Path, inputs AgencyID into a specific cell, saves it, then converts it to a PDF. It does this for each AgencyID in $AgencyMaster, which is currently over 200.
In $Excel_File_Path, columns B and C in Sheet1 and Sheet2 contain all of the AgencyIDs and AgencyNames, but there are a bunch of duplicates. I can't delete any of the rows because while they are duplicates in column B and C, columns D, E, F, etc have different data used in Template. So I need to be able to take each unique AgencyID and AgencyName which may appear in Sheet1 or Sheet2 and export them to a CSV to use as $AgencyMaster.
Example:
(https://i.imgur.com/j8UIZqp.jpg)
Column B contains the AgencyID and Column C contains the AgencyName. I'd like to export unique values of each from Sheet1 and Sheet2 to CSV $AgencyMaster.
I've found how to export it to a different worksheet within the same workbook, just not a separate workbook altogether. I'd also like to save it as a .CSV with leading 0's in column A.
# Checking that $AgencyMaster exists, and importing the data if it does
If (Test-Path $AgencyMaster) {
    $AgencyData = Import-CSV -Path $AgencyMaster
    # Taking data from $AgencyMaster and assigning it to each variable
    ForEach ($Agency in $AgencyData) {
        $AgencyID = $Agency.AgencyID
        $AgencyName = $Agency.AgencyName
        # Insert agency code into cell D9 on Template worksheet
        $ExcelWS.Cells.Item(9,4) = $AgencyID
        $ExcelWB.Save()
        # Copy-Item Properties
        $Destination_File_Path = "$Xlsx_Destination\$AgencyID - $AgencyName - $company $month $year.xlsx"
        $CI_Props = @{
            'Path' = $Excel_File_Path;
            'Destination' = $Destination_File_Path;
            'PassThru' = $true;
        } # Close $CI_Props
        # Copy & Rename file
        Copy-Item @CI_Props
    } # Close ForEach
} # Close IF
I would recommend using either Sort-Object -Unique or Group-Object.

How to access the column/row info of existing content of a xlsx file

I want to get the info of the last column and row of an existing xlsx file so that I can append new contents right below the existing content. How do I do so with RubyXL? If that's not possible, what alternative gem would you recommend?
As I wrote in my comment, I don't know if this is exactly what you are looking for:
require 'rubyXL'
workbook = RubyXL::Parser.parse("Workbook1.xlsx")
worksheet = workbook[0]
rows = worksheet.map {|row| row && row.cells.each { |cell| cell && cell.value != nil}}
p last_row = rows.size
p last_column = rows.compact.max_by{|row| row.size}.size
Let worksheet be the first sheet (e.g. worksheet = workbook[0]).
You could use:
last_row = worksheet.count
last_col = worksheet.map { |row| row ? row.cells.count : 0 }.max # guard against nil rows
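To make clear what those two expressions compute, here is a small plain-Ruby sketch. It assumes the worksheet can be treated as an array of rows, where a row is either an array of cells or nil for a row that does not exist; `used_bounds` is a hypothetical helper, not a RubyXL method.

```ruby
# Hypothetical helper: `rows` stands in for the worksheet's rows, each row
# being an array of cell values, or nil for a missing row.
def used_bounds(rows)
  last_row = rows.size                                       # number of rows present
  last_col = rows.map { |row| row ? row.size : 0 }.max || 0  # widest row, 0 if empty
  [last_row, last_col]
end

rows = [%w[a b c], nil, %w[d e]]
last_row, last_col = used_bounds(rows)

# Row indices are zero-based, so the row count is also the index of the
# first free row, i.e. the place where appended content would start.
puts "append at row index #{last_row}"
puts "widest row has #{last_col} cells"
```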

Ruby - comparing adjacent entries in one column of a csv file

I'm new to Ruby, so apologies if this is dead easy :-)
I have a .csv file with 5 columns. The first column has a record identifier (in this case a driver number) and the other 4 columns in each row have data relating to that record. For each record there are around 50 rows of data (just under 2,000 rows in total). The .csv file has a header row.
I need to read the .csv file and identify the last entry for each user, so I can move on to the next user. I've tried to get it to compare the first column and the entry in the next row.
I have this so far, it returns incorrect row numbers and they're anywhere between 1 and 5 rows out...?!?!
require 'csv-mapper'
Given(/^I compare the driver numbers from rows "(.*?)" to "(.*?)"$/) do |firstrow, lastrow|
  data = CsvMapper.import('C:/auto_test_data/Courts code example csv.csv', headers: true) do
    [dln]
  end
  row = firstrow.to_i
  while row <= lastrow.to_i
    @licnum1 = data.at(row).dln
    @licnum2 = data.at(row+1).dln
    if @licnum2 == @licnum1
      $newrecord = "same"
    else
      $newrecord = @licnum2
    end
    if $newrecord != "same"
      puts "Last row for #{@licnum1} is #{row}\n"
    end
    row = row + 1
  end
end
This is the layout for the .csv file:
recordidentifier1 dataitem1 dataitem2 code descriptionforcomparison
recordidentifier1 dataitem1 dataitem2 code descriptionforcomparison
recordidentifier2 dataitem1 dataitem2 code descriptionforcomparison
recordidentifier2 dataitem1 dataitem2 code descriptionforcomparison
All help will be greatly appreciated.
Thanks,
Peter
Here's one way to do it:
current_identifier = nil
(firstrow.to_i..lastrow.to_i).each do |row|
  if current_identifier != data.at(row).dln # current row is a new identifier
    if current_identifier # this is not the first row
      puts "Last row for #{current_identifier} is #{row-1}\n"
    end
    current_identifier = data.at(row).dln # remember current identifier
  end
end
# the final row in the range is the last row for the current identifier
puts "Last row for #{current_identifier} is #{lastrow.to_i}\n"
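The same comparison can also be sketched without any CSV library, which makes the logic easier to check in isolation. The sketch assumes the identifiers from the first column have already been read into an array (header row removed) and that rows for one identifier are adjacent; `last_rows` and the sample identifiers are hypothetical.

```ruby
# Sketch: `ids` stands in for the first CSV column, header row already removed.
# Returns a hash of identifier => zero-based index of its last row.
def last_rows(ids)
  result = {}
  ids.each_with_index do |id, index|
    # A row is the last one for its identifier when it is the final row
    # overall or when the next row carries a different identifier.
    last = index == ids.size - 1 || ids[index + 1] != id
    result[id] = index if last
  end
  result
end

ids = %w[D1 D1 D1 D2 D2 D3]
last_rows(ids).each { |id, row| puts "Last row for #{id} is #{row}" }
```

Comparing each row with the one after it, and treating the final row as a special case, avoids reading past the end of the data.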

VLOOKUP using batch scripting

I am new to scripting and do a lot of my work as an analyst in Excel sheets.
I have two files with a list of items in each.
File1 contains 1 column.
File2 contains 2 columns.
I want to check whether each item in column 1 of File2 also appears in column 1 of File1. If it does, it should print column1File1, column1File2 and column2File2 in file3; otherwise it should print "NA", column1File2, column2File2 in file3.
Please help; it will simplify my work a lot.
I made this program a while ago. It iterates through the sheets in one workbook and compares cell by cell, but it may set you in the right direction. It takes a cell in one "master" sheet and then iterates through each sheet to find that value in a particular column. When it finds a match, the counter increments; then it takes the next cell in the master sheet, and so on. You could alter it to use multiple workbooks and compare whichever cells you want.
Sub Open_Excel()
    'Use worksheetNum as sheet to read/write data
    Set currentWorkSheet = objExcel.ActiveWorkbook.Worksheets(worksheetNum)
    'How many rows are used in the current worksheet
    usedRowsCount = currentWorkSheet.UsedRange.Rows.Count
    'Use current worksheet cells for values
    Set Cells = currentWorksheet.Cells
    'Loop through each row in the worksheet
    For curRow = startRow to (usedRowsCount)
        'Read the source values for this row
        strEmailAddressSource = Cells(curRow,colEmailAddressSource).Value
        strServerSource = Cells(curRow,colHostserverSource).Value
        strLocationSource = Cells(curRow,colLocationSource).Value
        'make the values unique
        strconcatenation = strServerSource & strLocationSource
        Call Comparison()
    Next
End Sub
'********************************************************************************************
'**** Comparison
'********************************************************************************************
'Comparison test
Sub Comparison()
    'choose the worksheets to go through
    For worksheetCounter = 6 to 9 'workSheetCount
        Set currentWorkSheetComparison = objExcel.ActiveWorkbook.Worksheets(worksheetCounter)
        usedRowsCountNew = currentWorkSheetComparison.UsedRange.Rows.Count
        'First row to start the comparison from
        For rowCompare = 2 to (usedRowsCountNew)
            strEmailLot = currentWorkSheetComparison.Cells(rowCompare,colEmailAddressLot).Value
            comp1 = StrComp(strEmailAddressSource,strEmailLot,0)
            comp2 = StrComp(strconcatenation,reportConcat,0)
            'check if the values match
            If ((comp1 = 0) AND (comp2 = 0)) THEN
                countvalue = countvalue + 1
            End If
        Next
    Next
End Sub
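For comparison, the lookup itself can be sketched in a few lines of Ruby rather than batch or VBScript. This assumes one reading of the question: every row of File2 is kept, and its key is looked up among the File1 items, with "NA" written when there is no match. `lookup_rows` and the sample data are hypothetical, and reading and writing the actual files is left out.

```ruby
require 'set'

# Sketch: `file1_keys` is the single column of File1, `file2_rows` is a list
# of [key, value] pairs from File2. Returns the rows intended for file3.
def lookup_rows(file1_keys, file2_rows)
  known = Set.new(file1_keys) # set membership keeps each lookup cheap
  file2_rows.map do |key, value|
    first = known.include?(key) ? key : 'NA'
    [first, key, value]
  end
end

file1 = %w[apple pear]
file2 = [%w[apple 10], %w[kiwi 20], %w[pear 30]]

lookup_rows(file1, file2).each { |row| puts row.join(',') }
```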

Ruby Watir: Selecting a specific row

Consider the following html
http://www.carbide-red.com/prog/test_table.html
I have worked out that I can move left to right on the columns using
browser.td(:text => "Equipment").parent.td(:index => "2").flash
to flash the 3rd column over on the line containing "Equipment"
But how can I move down a certain number of rows? I am having terrible luck using .tr & .rows, no matter how I try it just crashes out when using those. Even something as simple as
browser.tr(:text => "Equipment").flash
Am I just misunderstanding how tr/row works?
Specific Row/Column
It sounds like you have already calculated which row/column you want. You can get the cell at a specific row/column index by simply doing:
browser.table[row_index][column_index]
Where row_index and column_index are integers for the row and column you want (note that the indexes are zero-based).
Specific Row
You can also do the following to select rows based on an index:
browser.table.tr(:index, 1).flash
browser.table.row(:index, 2).flash
Note that .tr includes nested tables while .row ignores nested tables.
Update - Find Rows After Specific Row
To find a row after a specific row containing a certain text, determine the index of the specific row first. Then you can locate the other rows in relation to it. Here are some examples:
#Get the 3rd row down from the row containing the text 'Equipment'
starting_row_index = browser.table.rows.to_a.index{ |row| row.text =~ /Equipment/ }
offset = 3
row = browser.table.row(:index, starting_row_index + offset)
puts row.text
# => CAT03 ...
#Get the 3rd row down from the row containing a cell with yellow background colour
starting_row_index = browser.table.rows.to_a.index{ |row| row.td(:css => "td[bgcolor=yellow]").present? }
offset = 3
row = browser.table.row(:index, starting_row_index + offset)
puts row.text
# => ETS36401 ...
#Output the first column text of each row after the row containing a cell with yellow background colour
starting_row_index = browser.table.rows.to_a.index{ |row| row.td(:css => "td[bgcolor=yellow]").present? }
(starting_row_index + 1).upto(browser.table.rows.length - 1){ |x| puts browser.table[x][0].text }
# => CAT03, CAT08, ..., INTEGRA10, INTEGRA11
Let me know if that helps or if you have a specific example you want.

Resources