I use Google Chrome code inspect utility at the page http://7pay.in/to_phone?phone=9101235577 when I scroll down to Bitcoin icon, something happening in ajax: I don't know how ajax works at all, but I really want to parse the Bitcoin address which is not represented as HTML code when you fetch the link given above, until you press Bitcoin button (Its not represented when I curl or wget the page)
Could you please give me a hint on how to work with that kind of ajax elements. I understand logically that by default some parts of page are hidden by Javascript, but I can't quite understand what I should do in order to work with it using PHP Simple DOM parser.
Related
So I would go to an instagram account, say, https://www.instagram.com/foodie/ to copy its xpath that gives me the number of posts, number of followers, and number of following.
I would then run the command this command on a scrapy shell:
response.xpath('//*[#id="react-root"]/section/main/article/header/section/ul')
to grab the elements on that list but scrapy keeps returning an empty list. Any thoughts on what I'm doing wrong here? Thanks in advance!
This site is a Single Page Application (SPA) so it's javascript that render DOM is not rendered yet at the time your downloader working.
When you use view(response) the javascript that your downloader collected can continue render by your browser, so you can see the page with DOM rendered (but can't interacting with Site API). You can look at your downloaded content via response.text and saw that!
In this case, you can apply selenium + phantomjs to making a rendered page for your spider!
Another trick: You can use regular expression to select the JSON part of Script, parse it to JSON obj and select your correspond attribute value (number of post, following, ...) from script!
I am trying to scrape some data from the following website: https://xrpcharts.ripple.com/
The data I am interested in is Total XRP which you can see immediately below or to the side (depending on your browser) of the circle diagram. So what I first did was inspect the element I am interested in. So I see that it is inside <div class="stat" inside span ng-bind="totalXRP | number:2" class="ng-binding">99,993,056,930.18</span>.
The number 99,993,056,930.18 is what I am interested in.
So I started in a scrapy shell and wrote:
fetch("https://xrpcharts.ripple.com")
I then used chrome to copy the Xpath by right clicking on that place of HTML code, the result chrome gave me was:
/html/body/div[5]/div[3]/div/div/div[2]/div[3]/ul/li[1]/div/span
Then I used the Xpath command to extract the text:
response.xpath('/html/body/div[5]/div[3]/div/div/div[2]/div[3]/ul/li[1]/div/span/text()').extract()
but this gave me an empty list []. I really do not understand what I am doing wrong here. I think I am making an obvious mistake but I dont see it. Thanks in advance!
The bottom line is: you cannot expect the page you see in the browser to be the same page Scrapy would download and have available to work with. Scrapy is not a browser.
This page is quite dynamic and complex and is constructed with the help of multiple asynchronous requests bringing in both the logic and the data. There is also JavaScript executed in the browser that plays an important role in forming and supporting the HTML document object tree.
Scrapy does not have all these things, the thing you get when you do fetch() is just the very first initial "bare bones" HTML page without all the "dynamic content".
I want to click a link with Mechanize that I select with xpath (nokogiri).
How is that possible?
next_page = page.search "//div[#class='grid-dataset-pager']/span[#class='currentPage']/following-sibling::a[starts-with(#class, 'page')][1]"
next_page.click
The problem is that nokogiri element doesn't have click function.
I can't read the href (URL) and send get request because the link has onclick function defined (no href attribute).
If that's not possible, what are the alternatives?
Use page.at instead of page.search when you're trying to find only one element.
You can make your selector simpler (shorter) by using CSS selector syntax:
next_page = page.at('div.grid-dataset-pager > span.currentPage + a[class^="page"]')
You can construct your own Link instance if you have the Nokogiri element, page, and mechanize object to feed the constructor:
next_link = Mechanize::Page::Link.new( next_page, mech, page )
next_link.click
However, you might not need that, because Mechanize#click lets you supply a string with the text of the anchor/button to click on.
# Assuming this link text is unique on the page, which I suspect it is
mech.click next_page.text
Edit after re-reading the question completely: However, none of this is going to help you, because Mechanize is not a web browser! It does not have a JavaScript engine, and thus won't (can't) execute your onclick for you. For this you'll need to use Ruby to control a real web browser, e.g. using Watir or Selenium or Celerity or the like.
In general you would do:
page.link_with(:node => next_link).click
However like Phrogz says, this won't really do what you want.
Why don't you use a hpricot element instead? Mechanize can click on a hpricot element as long as the link has a 'src' or 'href' attribute. Try something along these lines:
page = agent.get("http://www.example.com")
next_page = agent.click((page/"//your/xpath/a"))
Edit After reading Phrogz answer I also realized that this won't really do it. Mechanize doesn't support Javascript yet. With this in mind you have 3 options.
Use a library that controls a real web browser. See #Phrogz answer.
Use Capybara which is an integration testing library but can also be used as a stand alone crawler. I've done this successfully with HTMLUnit which is a also an integration testing library in Java. Capybara comes with Selenium support by default though it also supports Webkit via an external gem. Capybara interprets Javascript out of the box. This blog post might help.
Grok the page that you intend to crawl and use something like HTTPFox to monitor what the onclick Javascript function does and replicate this in your Mechanize script.
Good luck.
I have an NS Window with a WebView.
My program takes in a search query and executes a Google search with it, the results being displayed in the WebView, like a browser.
Instead of displaying the search results in the WebView, I'd like to automatically open the first link and display the contents of that result instead.
As a better example, how do I display the contents of the first result of Google in a WebView?
Is this even possible?
Any help greatly appreciated. Thanks!
You could use the direct Google Search API. That would be more convinient.
https://developers.google.com/custom-search/v1/cse/list?hl=de-DE
Also you could also try to make a google request like the "I'm feeling lucky" button, which will direct you automatically to the first search result.
If you have to parse the HTML, you need to have a look at the HTML structure of the google result page. Look for specific id and class css properties in the div and a tags. If you found the ones, where the actual results are you can start parsing that content. Also i guess it would be easier to put some javascript together, that will find the first result and open it. (More easier than parsing the HTML using obj-c). You can evaluate javascript in the webview using [myWebView stringByEvaluatingJavaScriptFromString: #"put your js code here"].
Sure it is possible.
The first way to accomplish that that goes through my head is to parse the HTML response from Google, then launch a WebView with the first link you extracted.
Take a look at regular expressions to make it easy.
I have a project that require me to do a auto complete search using CI and YUI.
User will enter search keywords in textfield 1 and the result will shown in textfield 2..
YUI is client-side JavaScript. What you should do is read some documentation on autocomplete first, and on CodeIgniter second and then you could start building your solution.
Basically, you have to make a few things.
For one, you should create a client-side page with what is basically an <input> or <textarea> tag and a div for your results and hook a YUI autocomplete handler to that field. To be able to use YUI, you first need to include the YUI script:
<script src="http://yui.yahooapis.com/3.4.1/build/yui/yui-min.js"></script>
And you also need to load the autocomplete module and configure it:
YUI().use('autocomplete', function (Y) {
// enter code here
}
Second, you need to build a server-side script with CI that will listen for queries, do database or whatever searches and return JSON or some other format of results.
You'll then give the URL of this CI query web script to YUI AutoComplete configuration and YUI will do the AJAX and stuff for you.
Those are very general steps, but your question is also pretty generic. If you ask more specific questions, you'll get better answers (or better, just search and read the documentation).