I would like to extract data from this web page using R:
I guess the web page is loaded via server using Ajax or something similar. Moreover I would like to save also the data available on the following pages, the ones that I can see when I press the NEXT button at the bottom of the data table.
Thanks a lot for any tips.
domenico
It appears to be generated using a JavaScript blob. You could look at rvest, which is an R library for just this problem (web scraping). If that doesn't work, RCUrl's GetURL function definitely grabs the script contents (although it's ugly as sin and you'll be wanting to grep it. Mind you, all automatically generated HTML is ugly).
Related
I'm writing a crawler to get the content from a website which uses AJAX.
There is a "show more" button at the bottom of the page, and my origin approach is to use Selenium.PhantomJS to pretend a web browser but it works in some website and some don't.
I'm wondering if there is some way i can directly get the underly JSON file of the AJAX action. Please give me some details, thanks.
By the way, I'm using Python.
I understand this is less of a python than a scraping problem in general (and I understand you meant "scraping" instead of "crawling" as a scraper reads/parses/processes one page whereas a crawler processes multiple pages and they're relation to each other).
You can get the JSON file immediately given you know it's URL. If you don't (for example because the URL changes from time to time), you might need to search through javascript files on the page manually to find out how the URL is generated.
Once you know the JSON file's URL, it's quite simple. As you already seem to know how to get the HTML of the "main" page, you can use your existing code to get the JSON file.
I'm not familiar with PhantomJS, but I reckon it's easier to get the JSON file immediately instead of simulating an AJAX request (if that's even possible with Phantom).
I have a site that I want to load links in the centre column, so why would I use AJAX over an iframe?
AJAX is a system which allows you to get data - any data - from a server. This is in contrast to iframes which simply load in a HTML document or fragment based on a given URL.
THe strength of AJAX's flexibility over iframes is their ability to give you back raw data with which you then use to build or update your view. Consider this example: you have a dataset that needs to be displayed in tabular form, so you could set up a page that contains the table with your data in it and load that in via iframe.
A better option however, is to get the raw data from the server with AJAX and send it to your application uncluttered by markup and extra tags. Your page then builds the table in whatever part of the page it needs, according to how your application needs it to appear right at that point.
Where iframes can load HTML content that must be presented as it is built, AJAX allows the developer to separate the data from the way it is presented, giving them the freedom to manipulate it as much as they like and use it in as many ways, pages and applications as they need.
More Information
AJAX allows the client or web page to send data back to the server to do things with (such as changing passwords, updating settings, or transaction processing, etc.) so if your page or application needs two-way communication without refreshing the page, AJAX is probably the technology you'd use.
Also, using iframes restricts your ability to work with the data inside. An iframe creates a new context which you can think of as a container of sorts - the data inside is safe from the outside and vice versa. This often means your CSS won't flow down into your iframe giving you unexpected results. On some browsers, clicking a link in an iframe can actually cause the link's destination to load inside the iframe instead of changing the whole page - usually not what you want.
I have a requirement of having the same URL throughout the application navigations. Like below
http://www.[Site Name]:com. (Here User should not have the idea of chaning the URL from one page to another page)
I am using ASP.NET MVC3 with latest Razor View Engine,
Can some body give suggestions on this?
Advanced Thanks,
Satish Kotha
This may make it very difficult for users to access your site - they won't be able to bookmark a specific page, for example.
It sounds like you want a single-page-app (e.g. like Google Mail or Reader). In this case, you have one page and make heavy use of AJAX. You can query the server via javascript, and send back Partial Views, or raw data in JSON format which can be rendered on the client, possibly via some kind of templating engine.
As Graham Clark has already mentioned, this functionality will most likely be frustrating for the user; however, achieving it depends on the complexity of your project. You may want to look into using jquery to load partial views into the main content area of your site.
When you click on your navigation you can use jquery's load() to replace the main content on the page. Jquery-load-with-mvc-2-partialviews is an interesting blog post that may give you more insight into what you would want to do. Your code to load content could look something like this:
$("#mainContent").load("/Controller/Action",
{parameter}, function () {
// perform javascript actions on load complete
});
I am building a webpage that needs to Query the database and display events in your area based on the users GeoLocation. I've seen this done so many times now on a lot of websites. But I have never built this functionality before.
I have the GeoLocation part functioning. When you land on the page it displays your zip code. I'm not sure how to grab that Zipcode from the div and query the database immediatley.? This webpage being is built using Orchard MVC/NHibernate.
Any suggestion would be helpful.
It sounds like you want the entire solution, but I'm not going to go into that kind of detail.
So I'll give you tips:
Use jQuery to grab the zip code from the DOM
Use jQuery to do an AJAX call to a action method in one of your controllers, passing the zip
Have the controller call the DB and hydrate the results into JSON
Have the controller return a JsonResult
Have jQuery handle the response of the AJAX call and do something meaningful
Hi Masters Of Web Development.
I have a not that simple question this time. I have got a simple external HTML page, that I want to include in my site. The HTML page contains a submit form or something like that, and I wish to send this data from the form, without to reload the whole page. So I decided to put HTML page inside iframe. But, some people said that this is older technology, google doesn't like iframes, etc. So I want to use something like AJAX or JQuery to load that external HTML page, and to send submit form without reloading the whole page with it. :)
Any suggestions on how to make this?
Thanks in advance :)
Do you really need Google to index that iframe form? If that's the only iframe you have throughout the site, it aint going to be a problem in terms of google indexing.
About using the Iframe, if you are not comfortable learning and building ajax-type form, you'll still be fine (like what Frankie commented). Just make sure the form works, usable and compatible with popular browsers.
You want to use the jQuery Forms Plugin. Its very straightforward and easy to turn any normal HTML form into an AJAX form.