Can I use the result of a Fetch Page to do another Fetch Page in Yahoo! Pipes? - yahoo-pipes

I scrape a page with Yahoo! Pipes and want to use the result to scrape further pages.
For example:
[FetchPage] - [Regex]
and then, based on that result,
[URLBuilder]
I want to feed the result into the URL Builder's Path input. Is something like that possible?

You can use a Number Input module as an input to Fetch Page (if you want to page through different parts of a page) or to URL Builder (if the site supports server-side paging).

Related

FHIR Page number support

For search interactions, while there is support for specifying the number of items expected in the response using the _count parameter, we are not able to find any reference to a parameter for specifying the page number.
The _query parameter can be used for custom queries, but is that the right option, or is there a better alternative?
For example, what is the standard way to request the second page of a Patient result set, with each page having 10 records?
GET Patient?_count=10&[pagenumber?]=2
There's no mechanism to navigate to a specific page. You use the URLs provided in Bundle.link (e.g. previous, next, first, last) to navigate through the search result set.
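In other words, paging state lives in the server-supplied links, not in a client-computed parameter. Here is a minimal Ruby sketch of that loop, assuming a hypothetical FHIR server at example.org that returns JSON Bundles:

require "net/http"
require "json"
require "uri"

# Hypothetical base URL; any FHIR endpoint returning JSON Bundles works the same way.
url = URI("https://example.org/fhir/Patient?_count=10")

while url
  bundle = JSON.parse(Net::HTTP.get(url))

  # Process this page of results (here we just print resource ids).
  bundle.fetch("entry", []).each do |entry|
    puts entry["resource"]["id"]
  end

  # Follow the server-supplied "next" link instead of computing a page number.
  next_link = bundle.fetch("link", []).find { |l| l["relation"] == "next" }
  url = next_link && URI(next_link["url"])
end

This also keeps the client correct when the server encodes its paging state as an opaque cursor rather than a numeric offset, which is exactly why the spec exposes Bundle.link instead of a page-number parameter.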

Yahoo Pipes to loop through all pages

I am looking to pull job postings from a site that has multiple pages of postings. I can pull the content from one page.
On a simple example site I can get it to iterate and grab the page content.
However, when I take the first example and try to clean the data, I can't use the XPath filter to grab the HTML id, and I can't seem to find a way to limit the scope elsewhere. Here is what I am trying (Regex, Rename, ...):
http://pipes.yahoo.com/pipes/pipe.edit?_id=3619ea93d66e47442659a1976746ba6c
Any thoughts?

Use of Mechanize

I want to get responses from websites that take a simple input, which is also reflected in a parameter of the URL. Is it better to simply get the result using conventional methods, for example OpenURI.open_uri(...) with some parameters set, or is it better to use Mechanize, extract the form, and get the result through submit?
The Mechanize page gives an example of extracting a form and submitting it to get the search results from Google. However, that much can be done simply as OpenURI.open_uri("http://www.google.com/search?q=...").read. Is there any reason I should prefer one way or the other?
There are lots of sites where it turns out to be easiest to use Mechanize. If you need to log in and set a cookie before accessing the data, then Mechanize is a simple way of doing this. Similarly, if there are lots of hidden fields that need to be matched (such as a CSRF token), then fetching the page with Mechanize and submitting it with the data filled out is often a more foolproof method than crafting the URL yourself.
If it is a simple URI, like Google's search pages, then constructing it manually may be simpler.
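To make the trade-off concrete, here is a sketch of both approaches against Google's search form, following the classic Mechanize example mentioned above; the form name "f" is carried over from that example and is an assumption, so inspect the live page before relying on it:

require "open-uri"
require "cgi"
require "mechanize"

query = "ruby mechanize"

# Option 1: plain OpenURI. Fine when the whole request fits in the URL.
html = OpenURI.open_uri("http://www.google.com/search?q=#{CGI.escape(query)}").read

# Option 2: Mechanize. It fetches the page first, so cookies, redirects
# and hidden form fields are carried along automatically on submit.
agent = Mechanize.new
page  = agent.get("http://www.google.com/")
form  = page.form_with(name: "f")   # form name taken from the Mechanize
form.q = query                      # example; verify against the live page
results = agent.submit(form)
puts results.title

For a simple case like this, both end up in the same place; the Mechanize version only pays off once a login, cookie, or hidden token enters the picture.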

Spring MVC AbstractWizardFormController Question

I am using a controller implementation that extends the Spring MVC AbstractWizardFormController.
This wizard controller will consist of 4 pages. The first 2 pages are used to collect information. The third page will show results based on what information is submitted on page 1 and 2.
So, to be a little more specific:
Page 1: the user selects a state and some other information.
Page 2: the user enters more information, such as contact information.
Page 3: displays information dependent on what was collected in the first two pages.
There are more pages after this, but they do not pertain; so if the first thing you are thinking of is onSubmit(), that won't work, because this is not the end of the controller's lifecycle.
I need to collect all the data from the first two pages, then run a DB query and return the results to the third page. Where and how is the best way to do this? Do I run the query in referenceData() when returning to the third page?
You can use the postProcessPage method. Its API documentation is clear:
Post-process the given page after binding and validation, potentially updating its command object. The passed-in request might contain special parameters sent by the page.

Scraping pages with asynchronous responses with Hpricot

I'm trying to scrape a page, but the initial response has nothing in the body, as the content is loaded asynchronously; e.g. the results from a search on the Apple website: http://www.apple.com/uk/search/?q=searching+for+something&sec=global
Any ideas on how I can successfully grab the search results with Hpricot?
Thanks.
When the search page you refer to is loaded, it makes a request via JavaScript/AJAX to some other location, then populates the search results. That is what you're seeing in the page. Hpricot itself can't help you here, because it has no way to interpret the JavaScript that comes with the page in order to fetch the actual search results list.
Now, if what you're interested in is the search results, you'll need to analyze a bit what happens when you enter that page and type a search query. Some JavaScript in the page takes your query and calls (via XMLHttpRequest or similar AJAX techniques) some other script on Apple's server. That script is the one that actually searches a database and returns the results.
I suggest you install Firefox with the Firebug plugin, or some other way of seeing the actual requests a page and its JavaScript components send and/or receive. You'll see that, for the search page you referred to, it fetches two parts. First, the "featured" results come from this URL:
http://www.apple.com/global/scripts/search_featured.php?q=mac+mini&section=global&geo=uk
Notice the search string is in the "q" parameter.
Second, a long results list comes from here:
http://www.apple.com/search/service/nph-search10?site=uk_www&filter=1&snum=50&q=mac+mini
Both of these are XML documents; you might have better luck fetching and parsing those URLs with Hpricot.
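As a concrete illustration, here is a minimal Hpricot sketch against the featured-results endpoint quoted above; the "result" element name is an assumption, so inspect the actual XML first to see what the feed calls each entry:

require "open-uri"
require "cgi"
require "hpricot"

query = CGI.escape("mac mini")

# Hit the AJAX endpoint directly instead of the JavaScript-driven page.
featured_url = "http://www.apple.com/global/scripts/search_featured.php" \
               "?q=#{query}&section=global&geo=uk"

doc = Hpricot.XML(OpenURI.open_uri(featured_url).read)

# "result" is a guessed element name; check the real XML structure first.
(doc / "result").each do |node|
  puts node.inner_text
end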
