As noted in other posts here, the roo gem has some convoluted documentation, and I can't figure out how to iterate through an xlsx workbook's sheets and then through each sheet's rows.
Iterating through the sheets only yields each sheet's name, not a reference to the sheet itself, so I have to look the sheet up explicitly. Even then, I can't find a way to iterate through all of its rows.
require 'roo'

path = '/Users/user3/Desktop/departments.xlsx'
workbook = Roo::Spreadsheet.open(path)

workbook.sheets.reverse.each do |sheet_name|
  sheet = workbook.sheet(sheet_name)  # look each sheet up by name
  sheet.each_row do |row|             # NoMethodError: Roo sheets don't define each_row
    puts row
  end
end
Other examples in the roo documentation assume there is a default sheet, and a single sheet per workbook file. That's not my case, so it gets confusing quickly.
How can I iterate through every sheet, then every row, for testing and inclusion in my analysis? The spreadsheet I'm working with has a blank column, many rows, and some artwork... typical office stuff. These people are not data people. Getting around their 'creativity' in spreadsheets means iterating over and testing every row, then stripping blank cells.
Any insight would be truly appreciated.
Solved: use the Creek gem. Kudos to @Rajagopalan for this.
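For reference, a minimal sketch of the Creek approach (the file path is reused from above; Creek's simple_rows yields each row as a hash keyed by column letter, so blank cells are easy to strip):

require 'creek'

creek = Creek::Book.new('/Users/user3/Desktop/departments.xlsx')

creek.sheets.each do |sheet|
  puts sheet.name
  sheet.simple_rows.each do |row|
    # row is a Hash like {"A" => "Sales", "C" => "West"}; drop blank cells
    values = row.values.reject { |v| v.to_s.strip.empty? }
    puts values.inspect
  end
end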
I have a formula that fetches names of books from goodreads.com:
=IMPORTXML("https://www.goodreads.com/book/show/" & gr_id; "//*[#id='bookTitle']")
where gr_id is a column containing ids of the books. For example when gr_id=23848607, it fetches from URL https://www.goodreads.com/book/show/23848607 and the result is "Warheart".
The formula worked fine some time ago. I did not change anything, and now I've noticed it has stopped working for some of the books (it still works for others). Instead of the name of the book, it now gives N/A with an "Import Internal Error" hint. The ids that do not work are:
48332548
35906922
How to make it work for all books?
There were many questions posted about "Import Internal Error" problems. I tried some of those solutions, including copying the formula to a fresh sheet, but it did not work.
Update: I tried the following different XPath formulas instead of "//*[@id='bookTitle']".
"//h1[#id='bookTitle']"
"//h1"
Those alternative XPath formulas behaved exactly like the original: they worked correctly for the same ids and produced N/As for the same ids.
Update: I just re-checked, and all my formulas worked correctly for all gr_ids (I had not changed anything since the time when they did not work). Maybe someone knows how to prevent them from stopping working in the future.
Update: I re-checked once again. Of all the gr_ids, only this one showed N/A this time: 35906922. I created an example spreadsheet, because my working spreadsheet contains too many unrelated details, but the problem did not appear in the example spreadsheet. I went back to my working spreadsheet and reloaded it, and the problem disappeared there too. Then I added more test data to the example spreadsheet, and the following new example gr_ids showed N/A:
48213012
48213092
I tried making a copy of the example spreadsheet to see if that would fix the problem. The behavior in the copy was identical to the original example spreadsheet: the problem appeared only with the two gr_ids specified above.
If you run a full IMPORTXML on those two IDs, you can see it won't return anything at all:
=IMPORTXML("https://www.goodreads.com/book/show/48213012-fathers-and-sons", "//*")
which means that Google Sheets can't reach the XML content for some reason (it could be something similar to https://stackoverflow.com/a/24891676/5632629).
Therefore, we can try to read the source code directly with IMPORTDATA, where we can find around 70 elements carrying the same information, so we pick one, isolate it, and remove the HTML tags. Then we just wrap the prior formula in IFERROR to force the formula to take a second look if it fails the first time. The result looks like this:
=IFERROR(IMPORTXML("https://www.goodreads.com/book/show/"&A:A, "//*[@id='bookTitle']"),
REGEXEXTRACT(QUERY(ARRAY_CONSTRAIN(
IMPORTDATA("https://www.goodreads.com/book/show/"&A:A), 100, 1),
"select Col1 where Col1 contains '</title>'"), ">(.*) by"))
IMPORTXML() seems to be unreliable. I decided not to use it, because I did not find an acceptable solution to my problem. Instead of using IMPORTXML(), I exported my books from goodreads.com to a csv file (goodreads.com has such a feature) and then imported the csv file into my spreadsheet. This is not a perfect solution, because I need to re-import every time I want to update the books, but at least it works.
I need to fetch some data but I'm completely stumped after trying a few things.
I want to access Airlines & Destinations from the Albuquerque_International_Sunport's wiki page - keep in mind, I'll be going through a prepopulated list of airports with this data.
There are multiple "types" of airlines: Passenger, Cargo, and sometimes other (sub?)sections; other times there are none.
Articles for multiple airports will be accessed automatically - including some lesser-known airports. This means I need to:
Check whether the "Airlines & Destinations" section exists
If it does, take all the data inside any of its tables and scrape it
Otherwise, do nothing
I've tried using the Ruby wikipedia-client gem; however, the .raw_data method isn't even returning the section data.
Next, I went to Wikipedia's API: unless I am mistaken, it doesn't return "section" names! This doesn't seem right, but I wasn't able to get it working.
So I suppose that leaves Nokogiri. I can grab and parse the pages fine, but how would I go about detecting the presence of the "Airlines & Destinations" section and getting all the table data before the end of that section? I have a suspicion I'll need some tricky XPath for this.
Seems to be the only viable solution.
Any thoughts welcome. Putting a bounty on this question when I can.
Edit: Perhaps it's better to simply grab a list of all the airlines in the world somehow and match them against the HTML? That seems like it could be computationally expensive.
Well, I'm not an expert user of Nokogiri but maybe this can give you some idea.
require 'nokogiri'
require 'open-uri'

page = Nokogiri::HTML(URI.open("https://en.wikipedia.org/wiki/Albuquerque_International_Sunport"))

# this is the passenger table
page.xpath('//*[@id="mw-content-text"]/div/table[2]/tr').each do |tr|
  p tr.text
  puts "-" * 50
end

# this is the cargo table
page.xpath('//*[@id="mw-content-text"]/div/table[3]/tr').each do |tr|
  p tr.text
  puts "-" * 50
end
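Hard-coded table indexes are brittle across articles, though. As a sketch of the section-detection part of the question (the anchor id "Airlines_and_destinations" is an assumption derived from the heading text, so verify it against the actual article HTML), you could locate the heading and collect every table until the next top-level heading closes the section:

require 'nokogiri'
require 'open-uri'

url = "https://en.wikipedia.org/wiki/Albuquerque_International_Sunport"
page = Nokogiri::HTML(URI.open(url))

# Wikipedia anchors each section heading with an id based on its text
anchor = page.at_css('#Airlines_and_destinations')

if anchor
  heading = anchor.ancestors('h2').first || anchor  # the h2 that owns the anchor
  tables = []
  node = heading.next_element
  # walk forward through siblings until the next h2 ends the section
  while node && node.name != 'h2'
    tables.concat(node.xpath('descendant-or-self::table').to_a)
    node = node.next_element
  end
  tables.each do |table|
    table.css('tr').each { |tr| puts tr.text.strip }
  end
end
# if the anchor is missing, the section doesn't exist and we do nothing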
I am fairly new to Selenium Ruby/RSpec scripting, and I have come across a use case that requires data to be pulled from a csv or xlsx file in the test. Any help or suggestions on how to approach this would be greatly appreciated.
The test would read the csv file and use the data from each row to complete the same set of actions. This particular file contains a single column of ids, and the same action needs to be repeated until all values from the column have been used. Here are the basic steps...
User logs in
User pulls first value (id) from file to search in text field
User completes action against this id
User returns to text field and pulls value from second row of same file
User completes action against this id
This would repeat until all rows are completed
Is it possible to run the same method repeatedly while feeding it successive values from the file?
How would you pull the csv file into the script and specifically grab the first value (and subsequent values) throughout?
I know this is pretty vague, but I have not seen any examples like this on SO or in my research online. Any suggestions or examples would be very much appreciated.
I'm not sure I fully understand your question but I can try to point you in the right direction.
Ruby has a CSV parser built in, which you can read about here: CSV class
One of the first examples would seem to provide the functionality you are looking for. I think in your example you would want to do something like:
require 'csv'

CSV.foreach("path/to/file.csv") do |row|
  # I'm assuming the id is the first element in each row
  id = row.first
  method_that_does_stuff(id)
  # assert that stuff has happened ...
end
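To tie that into the Selenium/RSpec flow you described, a rough sketch might look like this (the login helper, field locator, and file name are all hypothetical placeholders for your own):

require 'csv'
require 'selenium-webdriver'

RSpec.describe 'repeating an action for each id' do
  before(:all) do
    @driver = Selenium::WebDriver.for :chrome
    # log_in(@driver)  # hypothetical helper that performs your login steps
  end

  after(:all) { @driver.quit }

  it 'completes the action for every id in the file' do
    CSV.foreach('ids.csv') do |row|
      id = row.first
      field = @driver.find_element(name: 'search')  # hypothetical locator
      field.clear
      field.send_keys(id, :return)
      # complete and verify the action for this id here
    end
  end
end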
does that help get you started?
I'm using the "spreadsheet" Ruby gem to generate xls files.
I already have an xls file, "MyFile.xls", which contains many sheets: sh_01, sh_02, sh_03 ...
I want to read the name of the last sheet (sh_last_number) and add a new sheet called "sh_last_number+1" to this file (MyFile.xls), then write some data to it.
In other words, I have to open it (read data) and write to it at the same time.
If this idea can't be realized with Spreadsheet, is there another gem that handles this better?
Thanks in advance.
You can definitely do this with the spreadsheet gem. Since you are working with excel files, you may need to require the excel component of the gem if you are using an older version:
require 'spreadsheet' # You may need to require 'spreadsheet/excel'
Then working with and writing pages is simple. To open the workbook (xls file with multiple pages) you do something like:
@workbook = Spreadsheet.open("MyFile.xls")
And then to add a sheet to the workbook you've opened, you simply:
new_sheet = "sh_#{#workbook.worksheets.size + 1}"
#worksheet = #workbook.create_worksheet(:name => new_sheet)
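One caveat: counting sheets won't reproduce the "last number + 1" naming if the existing numbers are zero-padded or non-contiguous, and the spreadsheet gem also requires you to write the workbook back to disk. A sketch of both (writing to a new file name, which tends to be safer than overwriting the file you opened):

last_number = @workbook.worksheets.last.name[/\d+/].to_i  # e.g. "sh_03" -> 3
@worksheet = @workbook.create_worksheet(:name => format('sh_%02d', last_number + 1))
@worksheet.row(0).concat ['some', 'data']
@workbook.write('MyFile_updated.xls')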
Hope this helps.
Cheers,
Sean
I have an AppleScript program which creates XML tags and elements within an Adobe InDesign document. The data is in tables, and tagging each cell takes .5 seconds. The entire script takes several hours to complete.
I can post the inner loop code, but I'm not sure if SO is supposed to be generic or specific. I'll let the mob decide.
[edit]
The code builds a list (prior to this loop) which contains one item per row in the table. There is also a list containing one string for each column in the table. For each cell, the program creates an XML element and an XML tag by concatenating the items in the [row]/[column] positions of the two lists. It also associates the text in that cell to the newly-created element.
I'm completely new to AppleScript so some of this code is crudely modified from Adobe's samples. If the code is atrocious I won't be offended.
Here's the code:
repeat with columnNumber from COL_START to COL_END
    select text of cell ((columnNumber as string) & ":" & (rowNumber as string)) of ThisTable
    tell activeDocument
        set thisXmlTag to make XML tag with properties {name:item rowNumber of symbolList & "_" & item columnNumber of my histLabelList}
        tell rootXmlElement
            set thisXmlElement to make XML element with properties {markup tag:thisXmlTag}
        end tell
        set contents of thisXmlElement to (selection as string)
    end tell
end repeat
EDIT: I've rephrased the question to better reflect the correct answer.
The problem is almost certainly the select. Is there any way you could extract all the text at once and then iterate over internal variables?
I figured this one out.
The document contains a bunch of data tables. In all, there are about 7,000 data points that need to be exported. I was creating one root element with 7,000 children.
Don't do that. Adding each child to the root element got slower and slower until at about 5,000 children AppleScript timed out and the program aborted.
The solution was to make my code more brittle by creating ~480 children off the root, with each child having about 16 grandchildren. Same number of nodes, but the code now runs fast enough. (It still takes about 40 minutes to process the document, but that's infinitely less time than infinity.)
Incidentally, the original 7,000 children plan wasn't as stupid or as lazy as it appears. The new solution is forcing me to link the two tables together using data in the tables that I don't control. The program will now break if there's so much as a space where there shouldn't be one. (But it works.)
"I can post the inner loop code, but I'm not sure if SO is supposed to be generic or specific. I'll let the mob decide."
The code you post as an example can be as specific as you (or your boss) are comfortable with - more often than not, it's easier to help when there are more specific details.
If the inner loop code is a reasonable length, I don't see any reason you can't post it. I think Stack Overflow is intended to encompass both general and specific questions.
Are you using InDesign or InDesign Server? How many pages is your document (or what other information can you tell us about your document/ID setup)?
I do a lot of InDesign Server development. You could be seeing slow-downs for a couple of reasons that aren't necessarily code related.
Right now, I'm generating 100-300 page documents almost completely from script/xml in about 100 seconds (you may be doing something much larger).