How to attach documents to an Excel column using Ruby?

I have a requirement as follows:
I have 100 folders with names like 74555Attachment, 55874Attachment, and so on. Each folder contains at most 5-6 files (.pdf, .csv, etc.). I want to attach the files from each folder to the columns of an Excel sheet, like this:
TicketNumber file1 file2
74555 abc.pdf tt.csv
55874 ab.pdf tt.docx
Can it be done using Ruby?
Thanks,

You can do this with Ruby's file operations plus a gem such as "spreadsheet" (or the CSV library) to write the Excel file. You need a regex to separate the ticket number from the folder name.
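A minimal sketch of that approach, using only Ruby's standard library. It assumes the folders sit directly under one root directory and are named <digits>Attachment as in the question; the helper names attachment_rows and write_attachment_csv are made up for illustration. Note that it writes the file names into a CSV that Excel can open, rather than embedding the files themselves:

```ruby
require 'csv'

# Collect one row per "<digits>Attachment" folder: [ticket_number, file1, file2, ...].
# The folder-name pattern is an assumption based on the examples in the question.
def attachment_rows(root)
  Dir.children(root).sort.filter_map do |name|
    path = File.join(root, name)
    match = name.match(/\A(\d+)Attachment\z/)
    next unless match && File.directory?(path)

    files = Dir.children(path).select { |f| File.file?(File.join(path, f)) }.sort
    [match[1], *files]
  end
end

# Write a header plus one row per folder to a CSV file.
# max_files only controls how many "fileN" header cells are written;
# a folder with more files simply produces a longer row.
def write_attachment_csv(root, out_path, max_files: 6)
  headers = ['TicketNumber'] + (1..max_files).map { |i| "file#{i}" }
  CSV.open(out_path, 'w') do |csv|
    csv << headers
    attachment_rows(root).each { |row| csv << row }
  end
end
```

Calling write_attachment_csv('/path/to/folders', 'attachments.csv') would then produce one row per ticket. If a real .xls file is needed, the same rows could be written with the "spreadsheet" gem instead of CSV.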


Keras - using predefined training / validation split

I'm working with Tensorflow/Keras. I have two text files (train_{modality_name}.txt and val_{modality_name}.txt). They contain the split I want to use for the images I'm processing.
The format of these files is the following:
example_0_path category_id
example_1_path category_id
...
example_N_path category_id
and my folder structure is like this:
/labels
    train_X.txt
    val_X.txt
/data
    /modality_1
    ...
    /modality_M
(e.g. data/sketch/abbey/id)
How can I make use of the files?
'flow_from_dataframe' did the job; it was also necessary to preprocess the txt files with pandas first. This tutorial was very helpful: https://medium.com/@vijayabhaskar96/tutorial-on-keras-imagedatagenerator-with-flow-from-dataframe-8bd5776e45c1
I'm still having problems matching the target size of the arrays (the labels seem to have the wrong format).

Automate downloading of multiple xml files from web service with power query

I want to download multiple XML files from a web service API. I have a query that gets a JSON document:
= Json.Document(Web.Contents("http://reports.sem-o.com/api/v1/documents/static-reports?DPuG_ID=BM-086&page_size=100"))
and manipulates it to get list of file names such as: PUB_DailyMeterDataD1_201812041627.xml in a column on an excel spreadsheet.
I hoped to get a function to run against this list of names to get all the data, so first I worked on one file: PUB_DailyMeterDataD1_201812041627
= Xml.Tables(Web.Contents("https://reports.sem-o.com/documents/PUB_DailyMeterDataD1_201812041627.xml"))
This gets an XML table which I manipulate to get the data I want (the half-hourly metered MWh for generator GU_401970).
Now I want to change the query into a function to automate the process across all XML files available from the service. The function requires a variable to be substituted for the filename. I tried this as preparation for the function:
let
Filename="PUB_DailyMeterDataD1_201812041627.xml",
Source = (Web.Contents("https://reports.sem-o.com/documents/Filename")),
(followed by the manipulating Mcode)
This doesn't work.
Then I tried this:
let
Filename="PUB_DailyMeterDataD1_201812041627.xml",
Source = Xml.Tables(Web.Contents("https://reports.sem-o.com/documents/[Filename]")),
I get:
DataFormat.Error: Xml processing failed. Either the input is invalid or it isn't supported. (Internal error: Data at the root level is invalid. Line 1, position 1.)
Details:
Binary
So I'm stuck here. Can you help?
thanks
Conor
You concatenate strings with the "&" operator in Power Query. [Somename] is the syntax for referencing a field within a table; a normal variable is referenced just by its name. So in your example:
let Filename="PUB_DailyMeterDataD1_201812041627.xml",
Source = Xml.Tables(Web.Contents("https://reports.sem-o.com/documents/" & Filename)),
would work.
It sounds, though, as if you already have a query that drills down to a list of filenames and you want to use it to import the files from the URL. Assuming the column holding the filenames is called "Filename", you could add a custom column containing:
Xml.Tables(Web.Contents("https://reports.sem-o.com/documents/" & [Filename]))
This loads the corresponding table into the row of each filename.

How to separate tables contained in an excel file in different CSV?

I have an Excel file with multiple tables separated by blank rows, and I want to save each table to a separate CSV file with a script. How could I do it?
Thanks for the help
UPDATE:
INPUT EXAMPLE:
Excel Example
OUTPUT EXAMPLE:
I want every one of those columns in a file like this one:
276.1722 54.318 50.6335
276.373 52.573 51.4047
277.0097 50.864 51.9912
277.9329 49.4127 52.8294
279.0832 47.9623 53.3041
280.3554 46.5477 53.5295
281.3679 44.9695 53.8862
282.4689 43.4235 54.1254
283.4763 41.8019 54.0885
284.5859 40.3595 53.5828
285.7263 38.941 52.988
286.8929 37.5684 52.3438
288.0729 36.2914 51.5373
289.0561 35.1335 50.4119
289.7246 34.2113 48.8901
290.0624 33.3207 47.2446
290.1395 32.2516 45.6541
290.0895 31.2818 44.0091
289.7804 30.5224 42.2812
289.211 29.8383 40.5862
An alternative way of storing the data:
You could store the tables in a single Excel file on different sheets, and use one of these gems depending on whether you're working with the old (.xls) or new (.xlsx) Excel format:
'Roo', 'roo-xls', 'spreadsheet', 'write_xlsx'
Then loop through the sheets and apply the same logic to each one, instead of placing all the tables on a single sheet.
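If the tables have to stay on one sheet, the splitting itself is straightforward once the sheet has been read into an array of rows. The sketch below assumes each row is an array of cell values (for example, as obtained from one of the gems above; the reading step is not shown) and that a row counts as blank when every cell is nil or whitespace. The helper names are illustrative, not part of any gem:

```ruby
require 'csv'

# True when the row has no cells or every cell is nil/whitespace-only.
def blank_row?(row)
  row.nil? || row.all? { |cell| cell.to_s.strip.empty? }
end

# Split an array of rows into tables, treating runs of blank rows as separators.
def split_tables(rows)
  tables = [[]]
  rows.each do |row|
    if blank_row?(row)
      tables << [] unless tables.last.empty?
    else
      tables.last << row
    end
  end
  tables.pop if tables.last.empty?
  tables
end

# Write each table to "<prefix>_<n>.csv" and return the list of paths written.
def write_tables(rows, prefix)
  split_tables(rows).each_with_index.map do |table, i|
    path = "#{prefix}_#{i + 1}.csv"
    CSV.open(path, 'w') { |csv| table.each { |r| csv << r } }
    path
  end
end
```

For example, write_tables(rows, 'table') would create table_1.csv, table_2.csv, and so on, one per block of non-blank rows.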

Update existing excel file template formulas using ruby

I had been using spreadsheet to read in a template excel file, modify it and output a new file for the end-user.
As far as I can identify from the documentation spreadsheet provides no way to input or edit formulas in the produced document.
However, the purpose of my script is to read an undefined number of items from a site and enter them into the spreadsheet, then calculate totals and subtotals.
The end user (using excel or libreoffice etc) is then able to make slight modifications to the quantity of items whilst the totals update (due to formulas) as they are accustomed.
I have looked into the writeexcel gem which claims to be able to input formulas, but I can't see how to take an existing template file and modify it to produce my output. I can only create fresh workbooks.
Any tips please? I do not want to use Win32OLE.
This is surprisingly difficult; apparently all Gems for handling Excel files are missing some crucial functionality.
I can think of two approaches for this problem:
1. use a combination of spreadsheet (to read the Excel file) and writeexcel (to write the output file)
2. use an input file that already contains the required formulas on a separate "formula" sheet, and copy the formulas to the "real" sheet
Here's a simplistic version of the second approach:
require 'rubygems'
require 'spreadsheet'
Dir.chdir(File.dirname(__FILE__))
# input file, contains this data
# Sheet0: headers + data (for this simple demo, we will generate the data on-the-fly)
# Sheet1: Formula '=SUM(Worksheet1.A2:A255)' in cell A1
book = Spreadsheet.open 'in.xls'
sheet = book.worksheet 0
formulasheet = book.worksheet 1
# insert some input data (in a real application,
# this data would already be present in the input sheet)
rows = rand(20) + 1
(1..rows).each do |i|
  sheet[i, 0] = i
end
# add total at bottom of column C
sheet[rows + 1, 2] = formulasheet[0, 0]
# write output file
book.write 'out.xls'
However, this will fail if you're using the same column for your input data and your totals (since then the total will try to include itself in the calculation).

How to fetch all records using NCBI Batch Entrez

I have over 200,000 accessions in a flat file, and I need to retrieve the relevant entry for each of them from NCBI.
I use Batch Entrez (http://www.ncbi.nlm.nih.gov/sites/batchentrez) to do the job, but I ran into a problem:
The initial file was split into multiple sub-files, each containing 4000 lines. But it seems Batch Entrez has a size limit on the returned file. For example, if the first 1000 accessions each have tens of thousands of lines and together reach the size limit, then the remaining 3000 accessions are rejected and won't be searched.
One possible solution is to split the file into more sub-files and search them individually. However, this requires too much manual effort.
So I am wondering if there is any other solution, or any code that could be used.
Thanks in advance
Your problem sounds like a good fit for a Bio-star toolkit. This is a solution using BioSmalltalk:
| giList gbReader |
giList := (BioObject openFullFileNamed: 'd:\Batch_entrez_1.txt') contents lines.
gbReader := BioNCBIGenBankReader new.
gbReader
    genBankRecordsFrom: 'nuccore'
    format: #setModeXML
    uids: giList.
(BioGBSeqCollection newFromXMLCollection: gbReader searchResults)
    collect: [: e | BioParser
        tokenizeNcbiXmlBlast: e contents
        nodes: #('GBAuthor' 'GBSeq_definition') ]
To execute/debug the script, just select it; a right-click will open the Smalltalk world menu.
The API automatically splits and fetches your accession list (in the script, contained in Batch_entrez_1.txt), respecting the NCBI Entrez post limits to avoid penalties.
The result format is XML (which is an "easy" format to parse or to filter for specific fields), although it could be any of the retrieval modes supported by Entrez; for example, setting #setModeText will answer an ASN.1 representation. Replace 'nuccore' with the database you want to query. Finally, choose the fields you are interested in: in the script I have chosen 'GBAuthor' and 'GBSeq_definition', but you are free to choose any of the available nodes.
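If you would rather stay with the Batch Entrez web form, the manual splitting described in the question can at least be scripted. The following is a minimal Ruby sketch, assuming one accession per line in the flat file; the function name, the output file naming and the default chunk size of 500 are arbitrary choices for illustration, not limits documented by NCBI:

```ruby
# Split a flat file of accessions (one per line) into numbered sub-files
# holding at most chunk_size accessions each. Blank lines are skipped.
# Returns the list of paths written.
def split_accessions(input_path, output_dir, chunk_size: 500)
  accessions = File.readlines(input_path, chomp: true).map(&:strip).reject(&:empty?)
  accessions.each_slice(chunk_size).each_with_index.map do |chunk, i|
    path = File.join(output_dir, format('accessions_%04d.txt', i + 1))
    File.write(path, chunk.join("\n") + "\n")
    path
  end
end
```

A smaller chunk_size makes it less likely that a few very large records exhaust the size limit before the rest of a sub-file is processed, at the cost of more uploads.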
