How to separate tables contained in an Excel file into different CSV files? - ruby

I have an Excel file with multiple tables separated by blank rows, and I want to save each table to a separate CSV file with a script. How could I do it?
Thanks for the help
UPDATE:
INPUT EXAMPLE:
Excel Example
OUTPUT EXAMPLE:
I want each of those columns in a file like this one:
276.1722 54.318 50.6335
276.373 52.573 51.4047
277.0097 50.864 51.9912
277.9329 49.4127 52.8294
279.0832 47.9623 53.3041
280.3554 46.5477 53.5295
281.3679 44.9695 53.8862
282.4689 43.4235 54.1254
283.4763 41.8019 54.0885
284.5859 40.3595 53.5828
285.7263 38.941 52.988
286.8929 37.5684 52.3438
288.0729 36.2914 51.5373
289.0561 35.1335 50.4119
289.7246 34.2113 48.8901
290.0624 33.3207 47.2446
290.1395 32.2516 45.6541
290.0895 31.2818 44.0091
289.7804 30.5224 42.2812
289.211 29.8383 40.5862

Alternate way of storing the data:
You could instead store them in one Excel file on separate sheets and use one of these gems, depending on whether you're working with the old (.xls) or new (.xlsx) Excel format:
'roo', 'roo-xls', 'spreadsheet', 'write_xlsx'
Loop through the sheets and apply the same logic to each one, instead of placing the tables throughout a single sheet.
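For the original single-sheet layout, a minimal sketch of the splitting step might look like the following. It assumes the rows have already been read (for example with one of the gems above) into an array of arrays in which a blank row is empty or contains only nil or whitespace cells. The names split_tables and write_tables, and the "table_N.csv" file naming, are made up for this illustration.

```ruby
require "csv"
require "tmpdir"

# Split an array of rows into tables, treating every fully blank row
# as a separator. A row is blank when all of its cells are nil or
# whitespace-only (an empty row also counts as blank).
def split_tables(rows)
  blank = ->(row) { row.nil? || row.all? { |cell| cell.to_s.strip.empty? } }
  # Enumerable#chunk drops elements whose key is :_separator and
  # starts a new chunk after them.
  rows.chunk { |row| blank.call(row) ? :_separator : true }
      .map { |_, table| table }
end

# Write each table to its own CSV file inside dir and return the paths.
# The file naming scheme is an assumption.
def write_tables(tables, dir, prefix: "table")
  tables.each_with_index.map do |table, i|
    path = File.join(dir, "#{prefix}_#{i + 1}.csv")
    CSV.open(path, "w") { |csv| table.each { |row| csv << row } }
    path
  end
end

# Small demo with values from the question, written to a temp directory.
rows = [
  [276.1722, 54.318, 50.6335],
  [276.373, 52.573, 51.4047],
  [nil, nil, nil],
  [277.0097, 50.864, 51.9912]
]

Dir.mktmpdir do |dir|
  write_tables(split_tables(rows), dir).each { |path| puts path }
end
```

If the tables sit side by side in columns rather than stacked in rows, the same idea should apply after transposing the rows, provided every row has the same length.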

Related

Keras - using predefined training / validation split

I'm working with Tensorflow/Keras. I have two text files (train_{modality_name}.txt and val_{modality_name}.txt). They contain the split I want to use for the images I'm processing.
The format of these files is the following:
example_0_path category_id
example_1_path category_id
...
example_N_path category_id
and my folder structure is like this:
/labels
train_X.txt
val_X.txt
/data
/modality_1
...
/modality_M
(e.g. data/sketch/abbey/id)
How can I make use of the files?
'flow_from_dataframe' did the job; additionally, it was necessary to preprocess the txt files with pandas. This tutorial was very helpful: https://medium.com/@vijayabhaskar96/tutorial-on-keras-imagedatagenerator-with-flow-from-dataframe-8bd5776e45c1
I am still having problems matching the target size of the arrays (the labels seem to have the wrong format).

Automate downloading of multiple xml files from web service with power query

I want to download multiple XML files from a web service API. I have a query that gets a JSON document:
= Json.Document(Web.Contents("http://reports.sem-o.com/api/v1/documents/static-reports?DPuG_ID=BM-086&page_size=100"))
and manipulates it to get a list of file names, such as PUB_DailyMeterDataD1_201812041627.xml, in a column on an Excel spreadsheet.
I hoped to get a function to run against this list of names to get all the data, so first I worked on one file, PUB_DailyMeterDataD1_201812041627:
= Xml.Tables(Web.Contents("https://reports.sem-o.com/documents/PUB_DailyMeterDataD1_201812041627.xml"))
This gets an XML table, which I manipulate to get the data I want (the half-hourly metered MWh for generator GU_401970).
Now I want to change the query into a function to automate the process across all XML files available from the service. The function requires a variable to be substituted for the filename. I tried this as preparation for the function:
let
Filename="PUB_DailyMeterDataD1_201812041627.xml",
Source = (Web.Contents("https://reports.sem-o.com/documents/Filename")),
(followed by the manipulating Mcode)
This doesn't work.
Then I tried this:
let
Filename="PUB_DailyMeterDataD1_201812041627.xml",
Source = Xml.Tables(Web.Contents("https://reports.sem-o.com/documents/[Filename]")),
I get:
DataFormat.Error: Xml processing failed. Either the input is invalid or it isn't supported. (Internal error: Data at the root level is invalid. Line 1, position 1.)
Details:
Binary
So I'm stuck here. Can you help?
thanks
Conor
You append strings with the "&" symbol in Power Query. [Somename] is the format for referencing a field within a table; a normal variable is just referenced by its name. So in your example
let Filename="PUB_DailyMeterDataD1_201812041627.xml",
Source = Xml.Tables(Web.Contents("https://reports.sem-o.com/documents/" & Filename)),
would work.
It sounds, though, like you already have a query that drills down to a list of filenames and you are trying to use it to import the files from the URL. Assuming the column holding the filenames is called "Filename", you could add a custom column with this in it:
Xml.Tables(Web.Contents("https://reports.sem-o.com/documents/" & [Filename]))
It will load each file's table onto the row for that filename.

Update existing excel file template formulas using ruby

I had been using the spreadsheet gem to read in a template Excel file, modify it, and output a new file for the end user.
As far as I can tell from the documentation, spreadsheet provides no way to input or edit formulas in the produced document.
However, the purpose of my script is to read an undefined number of items from a site and enter them into the spreadsheet, then calculate totals and subtotals.
The end user (using Excel, LibreOffice, etc.) is then able to make slight modifications to the quantity of items whilst the totals update (due to formulas), as they are accustomed to.
I have looked into the writeexcel gem, which claims to be able to input formulas, but I can't see how to take an existing template file and modify it to produce my output. I can only create fresh workbooks.
Any tips please? I do not want to use Win32OLE.
This is surprisingly difficult; apparently all gems for handling Excel files are missing some crucial functionality.
I can think of two approaches for this problem:
1. use a combination of spreadsheet (to read the Excel file) and writeexcel (to write the output file)
2. use an input file that already contains the required formulas on a separate "formula" sheet and copy the formulas to the "real" sheet
Here's a simplistic version of the second approach:
require 'rubygems'
require 'spreadsheet'
Dir.chdir(File.dirname(__FILE__))
# input file, contains this data
# Sheet0: headers + data (for this simple demo, we will generate the data on-the-fly)
# Sheet1: Formula '=SUM(Worksheet1.A2:A255)' in cell A1
book = Spreadsheet.open 'in.xls'
sheet = book.worksheet 0
formulasheet = book.worksheet 1
# insert some input data (in a real application,
# this data would already be present in the input sheet)
rows = rand(20) + 1
(1..rows).each do |i|
sheet[i,0] = i
end
# add total at bottom of column C
sheet[rows+1,2] = formulasheet[0,0]
# write output file
book.write 'out.xls'
However, this will fail if
you're using the same column for your input data and your totals (since then, the total will try to include itself in the calculation)

How to attach document to an Excel column using Ruby?

I have a requirement as follows:
I have 100 folders with names like 74555Attachment and 55874Attachment. Each folder contains at most 5-6 files (.pdf, .csv, etc.). I want to list the files from each folder in the columns of an Excel sheet, like this:
TicketNumber file1 file2
74555 abc.pdf tt.csv
55874 ab.pdf tt.docx
Can it be done using Ruby?
Thanks,
You can easily do this using Ruby file operations
and gems like "spreadsheet" or the CSV library to handle the Excel file. You need to use a regex to separate the number from the folder name.
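As a rough sketch of that suggestion, the following uses only Ruby's standard library. It assumes every ticket folder is named with digits followed by "Attachment" (as in the question) and writes the file names to a CSV file that Excel can open; it does not embed the files themselves in a workbook. The method names collect_attachments and write_index and the output file name index.csv are hypothetical.

```ruby
require "csv"
require "fileutils"
require "tmpdir"

# Map each ticket number to the sorted file names in its folder.
# Folders that do not match "<digits>Attachment" are ignored.
def collect_attachments(root)
  Dir.children(root).sort.each_with_object({}) do |name, result|
    path = File.join(root, name)
    match = name.match(/\A(\d+)Attachment\z/)
    next unless match && File.directory?(path)

    files = Dir.children(path).select { |f| File.file?(File.join(path, f)) }
    result[match[1]] = files.sort
  end
end

# Write one row per ticket: TicketNumber, file1, file2, ...
# Shorter rows are padded with empty cells so every row has the same width.
def write_index(attachments, csv_path)
  width = attachments.values.map(&:length).max || 0
  CSV.open(csv_path, "w") do |csv|
    csv << ["TicketNumber"] + (1..width).map { |i| "file#{i}" }
    attachments.each do |ticket, files|
      csv << [ticket] + files + [nil] * (width - files.length)
    end
  end
end

# Demo on a throwaway directory that mimics the layout in the question.
Dir.mktmpdir do |root|
  folder = File.join(root, "74555Attachment")
  FileUtils.mkdir_p(folder)
  FileUtils.touch(File.join(folder, "abc.pdf"))
  FileUtils.touch(File.join(folder, "tt.csv"))

  index = File.join(root, "index.csv")
  write_index(collect_attachments(root), index)
  puts File.read(index)
end
```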

How do I create a copy of some columns of a CSV file in Ruby with different data in one column?

I have a CSV file called "A.csv". I need to generate a new CSV file called "B.csv" with data from "A.csv".
I will be using a subset of columns from "A.csv" and will have to update one column's values to new values in "B.csv". Ultimately, I will use this data from B.csv to validate against a database.
How do I create a new CSV file?
How do I copy the required columns' data from A.csv to "B.csv"?
How do I append values for a particular column?
I am new to Ruby, but I am able to read CSV to get an array or hash.
As mikeb pointed out, there are the docs - http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html - Or you can follow along with the examples below (all are tested and working):
To create a new file:
In this file we'll have two rows, a header row and data row, very simple CSV:
require "csv"
CSV.open("file.csv", "wb") do |csv|
csv << ["animal", "count", "price"]
csv << ["fox", "1", "$90.00"]
end
The result is a file called "file.csv" with the following:
animal,count,price
fox,1,$90.00
How to append data to a CSV
This is almost the same formula as above, only instead of "wb" mode we'll use "a+" mode. For more information on these modes, see this Stack Overflow answer: What are the Ruby File.open modes and options?
CSV.open("file.csv", "a+") do |csv|
csv << ["cow", "3","2500"]
end
Now when we open our file.csv we have:
animal,count,price
fox,1,$90.00
cow,3,2500
Read from our CSV file
Now you know how to create and append to a file. To read a CSV, and therefore grab the data for manipulation, you just do:
CSV.foreach("file.csv") do |row|
puts row #first row would be ["animal", "count", "price"] - etc.
end
Of course, this is just one of many ways you can pull info from a CSV using this library. For more info, I suggest visiting the docs now that you have a primer: http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html
Have you seen Ruby's CSV class? It seems pretty comprehensive. Check it out here:
http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html
You will probably want to use CSV::parse to help Ruby understand your CSV as the table of data that it is and enable easy access to values by header.
Unfortunately, the available documentation on the CSV::parse method doesn't make it very clear how to actually use it for this purpose.
I had a similar task and was helped much more by How to Read & Parse CSV Files With Ruby on rubyguides.com than by the CSV class documentation or by the answers pointing to it from here.
I recommend reading that page in its entirety. The crucial part is about transforming a given CSV into a CSV::Table object using:
table = CSV.parse(File.read("cats.csv"), headers: true)
Now there's documentation on the CSV::Table class, but again you might be helped more by the clear examples on the rubyguides.com page. One thing I'll highlight is that when you tell .parse to expect headers, the resulting table will treat the first row of data as row [0].
You will probably be especially interested in the .by_col method available for your new Table object. This will allow you to iterate through different column index positions in the input and/or output and either copy from one to the other or add a new value to the output. If I get it working, I'll come back and post an example.
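In the meantime, here is one possible sketch of the A.csv to B.csv task. It iterates row by row over a CSV::Table instead of using .by_col, and it works on CSV text so that it can run without any files on disk. The method name copy_columns is hypothetical, and the sample data and the rule for updating the price column are assumptions borrowed from the earlier examples.

```ruby
require "csv"

# Build the text of the new CSV from the text of the old one:
# keep only the listed columns, and pass every value of update_column
# through the given block to compute its replacement.
def copy_columns(csv_text, columns, update_column)
  table = CSV.parse(csv_text, headers: true)
  CSV.generate do |out|
    out << columns # header row of the new file
    table.each do |row|
      new_row = columns.map do |name|
        value = row[name] # access by header name
        name == update_column ? yield(value) : value
      end
      out << new_row
    end
  end
end

a_csv = <<~TEXT
  animal,count,price
  fox,1,$90.00
  cow,3,2500
TEXT

# Keep "animal" and "price", and strip the dollar sign from the prices.
b_csv = copy_columns(a_csv, %w[animal price], "price") { |price| price.delete("$") }
puts b_csv
```

With real files, the same call could be wrapped as File.write("B.csv", copy_columns(File.read("A.csv"), columns, column) { ... }).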
