I want to find out which file is being called to render a web page.
For example, I am visiting this URL: http://local-servername/abc/0/86#Create
The page loads fine, but I need to know which code is being called and how the page is loaded.
I am using Fiddler to get that information, but I am unable to get the file names and locations.
Please suggest how to use Fiddler for this, or whether there are other tools that can do it.
Thanks in advance.
Related
I'm updating some old CasperJS code that downloads a CSV report. The web interface recently changed. The old version had a link tag I could grab and then use casper.download() to retrieve the file.
However, the new version appears to be an Angular app and the download button triggers a handleDownload() function that does something under the hood, which results in a popup dialog in my browser.
Is there some way to intercept this dialog or otherwise extract the URL of the actual file?
A few options:
You can see what URL is requested (F12 > Network in Chrome). You could then try to deduce the URL.
Look at what handleDownload does - the logic should be available to you. You may be able to pull the data from there.
Hard to help without seeing the code.
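For the first option, the Network tab in Chrome can export the recorded traffic as a HAR file (a JSON document), which makes it easier to search for the download request than reading the panel by eye. Below is a minimal sketch, assuming such an export is available; the function names are hypothetical and only the standard `log.entries[].request` and `response.content.mimeType` fields of the HAR format are read.

```python
import json


def list_requests(har_text):
    """Return (method, url, mime_type) for every request recorded in a HAR export."""
    har = json.loads(har_text)
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        content = entry.get("response", {}).get("content", {})
        rows.append((
            request.get("method", ""),
            request.get("url", ""),
            content.get("mimeType", ""),
        ))
    return rows


def find_by_mime(har_text, fragment):
    """Return the URLs whose response MIME type contains `fragment`, e.g. 'csv'."""
    return [url for _, url, mime in list_requests(har_text) if fragment in mime]
```

Reading the exported file and calling `find_by_mime(text, "csv")` would narrow the list to candidate report URLs. If the app builds the file in the browser (for example from a blob), no such request will appear, and looking at handleDownload is the better route.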
Background
I am reworking the remote-browser plugin for ettercap.
I capture the URL and extract it.
I have set a filter on the "Accept: ..." header so that not everything gets opened. But some ads and images still manage to pass this filter and open in my browser.
I was hoping to find a solution in the Firefox source code. I have already looked through various files and researched a lot.
I need to know how Firefox decides whether an incoming HTTP request belongs to an image or ad on the page it is currently rendering, or to a new page that should open in a new tab.
Question
So the question is:
In which file or method does Firefox load external content like ads or images?
Thanks in advance
I'm writing a crawler to get the content from a website which uses AJAX.
There is a "show more" button at the bottom of the page, and my original approach was to use Selenium with PhantomJS to emulate a web browser, but it works on some websites and not on others.
I'm wondering if there is some way I can directly get the underlying JSON file of the AJAX action. Please give me some details, thanks.
By the way, I'm using Python.
I understand this is less a Python problem than a scraping problem in general (and I assume you meant "scraping" rather than "crawling": a scraper reads, parses, and processes one page, whereas a crawler processes multiple pages and their relation to each other).
You can fetch the JSON file directly, provided you know its URL. If you don't (for example, because the URL changes from time to time), you may need to search through the JavaScript files on the page manually to find out how the URL is generated.
Once you know the JSON file's URL, it's quite simple. As you already seem to know how to get the HTML of the "main" page, you can use your existing code to get the JSON file.
I'm not familiar with PhantomJS, but I reckon it's easier to get the JSON file immediately instead of simulating an AJAX request (if that's even possible with Phantom).
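As a rough illustration of fetching the JSON directly, here is a minimal sketch using only the Python 3 standard library. The function names are made up for this example, and it assumes the endpoint returns plain JSON and needs no cookies or tokens; the URL is whatever the browser's network panel shows for the "show more" request.

```python
import json
import urllib.request


def parse_json_body(body, charset="utf-8"):
    """Decode a raw response body and parse it as JSON."""
    return json.loads(body.decode(charset))


def fetch_json(url, timeout=10.0):
    """Request `url` directly and return the parsed JSON payload."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return parse_json_body(response.read(), charset)
```

Calling `fetch_json` with the Ajax URL returns ordinary Python dicts and lists, so there is no HTML to parse. If the site pages its results, the same call can usually be repeated with a different page or offset parameter in the URL.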
I am attempting to copy a tutorial example found here: http://html5.gingerhost.com/ but whenever I try to refresh the page it takes me to a "500 Internal Server Error". Also, when I click a link and the page loads the other content, viewing the source code only shows me another "500 Internal Server Error".
Please help!
Thanks a lot!
This is because the server needs to be able to understand the URLs too. For example, your page is at http://example.com. You use a link's click event to change the address to http://example.com/more-info. But if http://example.com/more-info doesn't exist on the server, refreshing the page won't work properly: the browser requests /more-info from the server, and the server has no way of knowing that /more-info is actually part of the index page. So pushState only works if the client and the server both recognise the new URLs.
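One common way to make the server recognise such URLs is to serve the index page for any path that is not a real file. The sketch below shows only that routing decision, not a full server; `resolve` and the file names are hypothetical, and a real setup would normally do this with a rewrite rule in the web server configuration.

```python
import posixpath
from urllib.parse import urlsplit


def resolve(request_path, static_files, index="/index.html"):
    """Map a requested path to the file the server should send back.

    Real files are served as-is; every other path falls back to the index
    page, so client-side code can render the right view for that URL.
    """
    # Drop the query string and normalise things like '/a/../b'.
    path = posixpath.normpath(urlsplit(request_path).path)
    if path in static_files:
        return path
    return index
```

With `static_files = {"/index.html", "/app.js"}`, a refresh on `/more-info` is answered with `/index.html` instead of an error, while `/app.js?v=2` still resolves to the script itself. Note that this simple version also returns the index page for a mistyped asset URL.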
I have been trying to scrape information from a site where the relevant information is continuously updated using Ajax calls. In spite of repeated attempts, I haven't been able to determine the URL from which the Ajax calls receive their data. So I decided to change tack and use Firebug to get the data, since it shows me the data returned by the Ajax call. Does Firebug save the page source anywhere, and can we access it?
This is the link and I am trying to get the song + movie name from the bottom of the page, which is updated using Ajax calls. I have tried going through the Firebug source code to find where it stores this, but that has not yielded any result either.
Actually, I did find the solution to my problem. Looking under the Net tab in Firebug, I was able to find the Ajax call that was updating that part of the page. Using the response, I finally managed to use urllib and urllib2 to get the required data :).
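For anyone following the same route, here is a minimal sketch of reproducing such an Ajax call from Python. It uses Python 3's urllib.request and urllib.parse (the successors of urllib and urllib2), and the function names, endpoint, and parameters are placeholders: the real ones are whatever the Net tab shows. The X-Requested-With header is an assumption; some sites check it, many do not.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_ajax_request(endpoint, params):
    """Build a GET request that resembles the call made by the page's script."""
    query = urlencode(params)
    url = f"{endpoint}?{query}" if query else endpoint
    headers = {
        # Assumption: the site expects the header most Ajax libraries send.
        "X-Requested-With": "XMLHttpRequest",
        "Accept": "application/json, text/plain, */*",
    }
    return Request(url, headers=headers)


def fetch_text(request, timeout=10.0):
    """Send the request and return the decoded response body."""
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Since the data on the page is updated continuously, `fetch_text(build_ajax_request(...))` can be called on a timer to pick up each new value.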
As for how Firebug gets the source code, a few links I read said that Firebug sees the exact source at that instant because it is a browser extension and so has more access to the browser's variables and the server response. I am not sure how accurate that information is, though.
Cheers