The docs indicate that the 'url' parameter of the /http/import robot accepts an array of URLs, so a single step can import several files. In the docs, the /s3/import robot's 'path' parameter only seems to accept a single path per step. Is there any way to import multiple S3 documents in one step?
(speaking as a Transloadit co-founder) Currently that's not possible, but a feature request has been filed and we expect to ship this before the end of the month (October 2014).
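For reference, a minimal sketch of the asymmetry described above, written as plain Python data; the URLs and path are made up, and credentials are omitted:
# Hypothetical assembly steps illustrating the behaviour the docs describe.
steps = {
    "imported_http": {
        "robot": "/http/import",
        "url": [  # the docs allow an array of URLs here
            "https://example.com/a.jpg",
            "https://example.com/b.jpg",
        ],
    },
    "imported_s3": {
        "robot": "/s3/import",
        "path": "documents/report.pdf",  # a single path per step, as of October 2014
    },
}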
I am trying to crawl different websites (e-commerce websites) and extract specific information from the pages of each website (e.g. product price, quantity, date of publication).
My question is: how do I configure the parsing, given that each website has a different HTML layout and therefore needs different XPaths for the same item? Can we add multiple parser bolts to the topology, one per website? If so, how can we assign a different parsefilters.json file to each parser bolt?
You need #586. At the moment the only way to do it is to put all your XPath expressions in parsefilters.json, regardless of which site you want to use them on.
You can't assign a different parsefilters.json to each instance of a bolt.
UPDATE: you could, however, have multiple XPathFilter sections within parsefilters.json. Each could cover a specific source, but there is currently no way of constraining which source a parse filter gets applied to. You could extend XPathFilter so that it takes some extra config, e.g. a regular expression a URL must match for the filter to be applied; see the sketch after this answer. That would work quite nicely, I think.
I've recently added JsoupFilters, which will be in the next release. These should be useful for your use case, but they still don't solve the issue: you need an implementation of the filter that organizes the resources per host. It shouldn't be too hard to implement, taking the URL filter one as an example, and it would also make a very nice contribution to the project.
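To make the suggested extension concrete, here is a rough sketch of the routing idea in Python rather than StormCrawler's Java API; the site names, URL patterns, and XPaths are invented for illustration.
import re
from lxml import html  # assumed available; any XPath-capable parser would do

# Hypothetical per-site rules: a URL pattern guarding the XPaths to apply.
SITE_RULES = [
    {"pattern": re.compile(r"https?://(www\.)?shop-a\.example/"),
     "xpaths": {"price": "//span[@class='price']/text()"}},
    {"pattern": re.compile(r"https?://(www\.)?shop-b\.example/"),
     "xpaths": {"price": "//div[@id='product-price']/text()"}},
]

def extract(url, page_source):
    # Apply only the XPath expressions whose URL pattern matches this page.
    tree = html.fromstring(page_source)
    for rule in SITE_RULES:
        if rule["pattern"].search(url):
            return {name: tree.xpath(xp) for name, xp in rule["xpaths"].items()}
    return {}  # no site-specific rules matched this URL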
I am trying to pull a number from the Morningstar "Cash Flow" page for an arbitrary stock ticker using XPath. I have tested the XPath on the Morningstar website with an XPath tester and it returned the desired values. However, when I use this XPath in a Google Sheet, it returns #N/A (Imported content is empty.).
=IMPORTXML("http://financials.morningstar.com/cash-flow/cf.html?t=fb&region=usa&culture=en-US", "//div[@id='data_tts1']/div")
I did a bit of research and found out that data on such websites is generated dynamically and the content is downloaded in stages; therefore, the page needs to be fully loaded before any data can be pulled out of it.
I'm wondering if there is any solution to this issue?
Your help would be much appreciated.
It's empty, as it should be, because the content you are trying to scrape is generated by JavaScript, and Google Sheets does not support importing JS-generated elements. You can always test this by disabling JS for a given site: only what's left afterwards can be scraped.
It might be possible, but you have to prepare a custom sheet to extract the data. Use IMPORTDATA to parse the .json which contains the data:
http://financials.morningstar.com/ajax/ReportProcess4HtmlAjax.html?&t=XNAS:FB&region=usa&culture=en-US&cur=&reportType=cf&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=672024&callback=jsonp1585016592836&_=1585016593002
AFAIK, you can't directly import the .csv version (specific headers are needed, so curl or other specific tools would be required):
http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:FB&region=usa&culture=en-US&cur=&reportType=cf&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=764423&denominatorView=raw&number=3
Since this .json is very special (it contains HTML tags), I don't think a custom Google Sheets script could import it correctly. So once the .json is loaded into Google Sheets, TRANSPOSE the rows to columns and use formulas to locate your data (target the cells which contain data_s1 and data_s2, for example). Use CONCAT to merge the cells of interest, then split the result into columns (with a custom separator). SEARCH for the data you want and clean the results with SUBSTITUTE. The method is dirty, but I think the whole process could be automated.
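If the spreadsheet gymnastics get too unwieldy, a small script can do the fetch instead. Here is a hedged Python sketch that pulls the JSONP endpoint quoted above and strips the callback wrapper so the payload parses as plain JSON; the URL and its query parameters are taken from this answer and may change on Morningstar's side at any time.
import json
import re
import urllib.request

# The JSONP endpoint quoted in the answer above (parameters may change).
URL = ("http://financials.morningstar.com/ajax/ReportProcess4HtmlAjax.html"
       "?&t=XNAS:FB&region=usa&culture=en-US&cur=&reportType=cf&period=12"
       "&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3"
       "&view=raw&r=672024&callback=jsonp1&_=1")

raw = urllib.request.urlopen(URL).read().decode("utf-8")

# The response looks like jsonp1({...}); keep only the {...} part.
payload = json.loads(re.search(r"\((.*)\)", raw, re.S).group(1))

# Per the answer, the payload embeds HTML fragments (cells with data_s1 /
# data_s2 ids), so those still need to be parsed after this step.
print(list(payload.keys()))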
I have a situation where I have to keep an array in sync with my language files, so every time it changes I have to regenerate and translate them.
I was looking at a package like laravel-langman, since it has a sync option. But now that I look more closely, it doesn't let me create a key with a value directly from an artisan command, without asking for input.
Any help will be appreciated.
You should check out this page; it mentions multiple packages that solve your problem. We currently use a combination of two packages, and I think the first one has what you want.
We use two packages to solve this issue. One is for the basic translations that don't get added dynamically; for this we used waavi/translation.
You still need something that works for dynamically created or removed translations, which you need if you want your models to contain multi-language descriptions or something similar. For this we used dimsav/laravel-translatable.
With both of those you are all set, but you can also see if you like another package over the ones I listed.
I need to modify the body of an existing GitHub issue in a project. All I'll be passed is the title of the issue and a word (the word exists in the body, and I'll just need to fill in the checkbox next to it).
It looks like I'll need to use the GET API to fetch the body of the issue, modify it, and then use the EDIT API to swap in the new body. However, the GET API can only be called with the issue number. I need to do all this as quickly as possible. Is there some way to search via an API call?
Thoughts much appreciated!
Edit: All my issues are in the same project (and issue titles will be unique there). I've also recently discovered GitHub's GraphQL API, which may be applicable here.
You can use the issue search endpoint with the in and repo¹ keywords:
GET /search/issues?q=text+to+search+in:title+repo:some/repo
Of course, issue titles aren't guaranteed to be unique. You'll have to request each of the issues that comes back and see if its body contains the word you're looking for. Even in that case you could get multiple positive results.
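As a rough sketch of that flow in Python (using the requests library; the repository, title, word, and token are placeholders, and it assumes the checkbox is written as "- [ ] word" in the Markdown body):
import requests  # third-party: pip install requests

REPO = "some/repo"        # placeholder repository
TITLE = "text to search"  # the issue title you were handed
WORD = "needle"           # hypothetical word whose checkbox should be ticked
HEADERS = {"Authorization": "token <YOUR_TOKEN>"}  # required for the edit

# 1. Search for issues whose title contains the text, scoped to the repo.
resp = requests.get("https://api.github.com/search/issues",
                    params={"q": f"{TITLE} in:title repo:{REPO}"},
                    headers=HEADERS)
resp.raise_for_status()

# 2. Keep only candidates whose body actually contains the word.
candidates = [i for i in resp.json()["items"] if WORD in (i["body"] or "")]

# 3. Tick the checkbox next to the word and PATCH the new body back.
for issue in candidates:
    new_body = issue["body"].replace(f"- [ ] {WORD}", f"- [x] {WORD}")
    requests.patch(issue["url"], json={"body": new_body},
                   headers=HEADERS).raise_for_status()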
It would be much better if you could search by issue number.
¹I've assumed that you really mean "repository" when you say "project". But if you're actually talking about GitHub Project Boards you can use the project keyword as well or instead.
Does anybody have experience exporting data as a FITS file with custom metadata (FITS header) information? So far I have only been able to generate FITS files with the standard Mathematica FITS header template. The documentation gives no hint on whether custom metadata export is supported or how it might be done.
The following suggestions from comp.soft-sys.math.mathematica do not work:
header = Import[<some FITS file>, "Metadata"];
Export["test.fits", data, "Metadata" -> header]
or
Export["test.fits", {"Data" -> data, "Metadata" -> header}]
What is the proper way to export my own metadata to a FITS file?
Cheers,
Markus
Update: response from Wolfram Support:
"Mathematica does not yet support Export of metadata for FITS file. The
example are referring to importing of this data. We do plan to support
this in the future..."
"There are also plans to include binary tables into FITS import
functionality."
I will try to come up with some workaround.
According to the documentation for v.7 and v.8, there are a couple of ways of accomplishing what you want, and you almost have the rule form correct:
Export["test.fits", {"Data" -> data, "Metadata" -> header}, "Rules"]
The other ways are
Export["test.fits", header, "Metadata"]
Export["test.fits", {data, header}, {{"Data", "Metadata"}}]
Note the double brackets around the element labels in the second method.
Edit: After some testing, prompted by prodding from @belisarius: whenever I include the "Metadata" element, I get an error stating that it is not a valid export element. Also, you can't export a "RawData" element, either. So I'd submit a bug report, for two reasons. First, the metadata isn't user-settable, which is vitally important for any serious application; at a minimum, the user should be able to augment the default Mathematica metadata. Second, the documentation is woefully inadequate in describing what is a "valid" export element vs. an import element. Of course, I'd describe all of the documentation for v.6 and beyond as woefully inadequate, so this is par for the course.
Mathematica 9 now allows export of metadata (header) entries, which are additive to the standard required entries. In the Help browser, search for "FITS"; there is an example that shows this (an Export followed by an Import to verify).