Where does Firefox load page content - firefox

Background
I am reworking the remote-browser plugin for ettercap.
I catch the URL and extract it.
I have set a filter on the "Accept: ..." header so that not everything is opened, but some ads and images still pass this filter and open in my browser.
I was hoping to find a solution in the Firefox source code. I have already looked through various files and researched a lot.
I need to know how Firefox decides whether an incoming link in an HTTP packet is an image or ad for the page it is working on, or a new page that must open in a new tab.
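To make the filter concrete, here is a minimal sketch of an "Accept:"-based check. It is only a heuristic under stated assumptions: browsers typically list text/html first when requesting a page and image/* or */* when requesting subresources, but requests for iframes (a common vehicle for ads) generally carry the same Accept value as a top-level page, which may be one reason some ads slip through. The name looksLikePageNavigation is made up for illustration and is not part of ettercap or Firefox.

```javascript
// Heuristic only: decide whether a sniffed request looks like a page
// navigation, based on the first media type listed in its Accept header.
// looksLikePageNavigation is a hypothetical helper, not an ettercap API.
function looksLikePageNavigation(acceptHeader) {
  if (typeof acceptHeader !== "string") return false;
  // Take the first entry and drop parameters such as ";q=0.9".
  const first = acceptHeader.split(",")[0].split(";")[0].trim().toLowerCase();
  return first === "text/html" || first === "application/xhtml+xml";
}

// Example Accept values (typical shapes, assumed for illustration):
// looksLikePageNavigation("text/html,application/xhtml+xml,*/*;q=0.8") returns true
// looksLikePageNavigation("image/png,image/*;q=0.8,*/*;q=0.5")         returns false
```

A check like this cannot tell a top-level page from an embedded frame, so it narrows the set of opened URLs rather than solving the problem.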
Question
So the question is:
In which file or method does Firefox load external content like ads or images?
Thanks in advance

Related

Download a file without URL

I'm updating some old CasperJS code that downloads a CSV report. The web interface recently changed. The old version had a link tag I could grab and then use casper.download() to retrieve the file.
However, the new version appears to be an Angular app and the download button triggers a handleDownload() function that does something under the hood, which results in a popup dialog in my browser.
Is there some way to intercept this dialog or otherwise extract the URL of the actual file?
A few options:
You can see what URL is requested (F12 > Network in Chrome). You could then try to deduce the URL.
Look at what handleDownload does - the logic should be available to you. You may be able to pull data there.
Hard to help without seeing the code.

how does usatoday display URI for news docs?

I am developing a web app/message board in AJAX. I've come to the part where I need to decide how to display threads.
Should I load a completely new page for each thread, or load it via AJAX? Obviously, I want each thread to be crawlable, linkable, and saveable as a favorite in your browser.
Then I saw USAToday's website (www.usatoday.com/news). It's very interesting how they load the page through a popup window, change the URI, and keep the data in the background.
This is exactly what I want, but I don't know what they are doing.
Can anyone else decipher this or lead me down the right path?
My impeccable googling skills have led me to believe that the answer lies in pushState.
http://www.seomoz.org/blog/create-crawlable-link-friendly-ajax-websites-using-pushstate
Essentially, it appears they are...
using the HREF of the provided link to change the URI via pushState.
using AJAX to load the contents of the page accessed via the link.
on close, they most likely use data from the newly loaded page to figure out which section it was under (sports, entertainment, etc.), and reload that page.
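The steps above can be sketched roughly as follows. This is a guess at the general technique, not USAToday's actual code. The helper name createAjaxNavigator and its three parameters are invented for illustration; the history object, the loader and the render callback are passed in so the logic does not depend on a particular page.

```javascript
// Sketch of pushState-based navigation (hypothetical helper).
// history:     an object with pushState(state, unused, url), e.g. window.history
// loadContent: async function that fetches the content for a URL
// render:      callback that displays the loaded content
function createAjaxNavigator(history, loadContent, render) {
  return {
    // Called when a link is clicked: change the URI via pushState,
    // then load the linked page's content in the background.
    async open(href) {
      history.pushState({ href }, "", href);
      render(await loadContent(href));
    },
    // Called from a popstate handler (back/forward buttons): the state
    // object saved by pushState says which content to show again.
    async restore(state) {
      if (state && state.href) render(await loadContent(state.href));
    },
  };
}

// In a browser, the wiring might look like this (assumed, untested markup):
// const nav = createAjaxNavigator(
//   window.history,
//   (href) => fetch(href).then((r) => r.text()),
//   (html) => { document.querySelector("#content").innerHTML = html; });
// window.addEventListener("popstate", (e) => nav.restore(e.state));
```

Because every thread keeps a real URL, the server should also be able to render that URL directly, which is what keeps the pages crawlable and bookmarkable.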

Scraping content from an AJAX/Javascript web page

I need to do some screen scraping on a web page where the content I need is generated by AJAX. On the initial page there is a table with 4 tabs. When you click on any of the tabs the content of the table changes. I need the content from the 3rd tab only.
I have used the Google Chrome 'Inspect Element' tool to see what the requests and POST data were, and I can get the information I need when I put the information (session id and a lot of other cookie data as well as POST data) from the inspect element result into a PHP curl request. But this only works for the 30 minutes that the session lasts. Does anyone know of a way I can get to this information?
I won't reproduce the code here but I will point you to the answer.
It's in this book:
http://www.amazon.com/Webbots-Spiders-Screen-Scrapers-Developing/dp/1593273975/ref=dp_ob_image_bk
A must-buy for someone doing what you're doing.
In the end I used htmlunit to get the content I needed. I also found the HTMLUnit Scripter very useful to help generate the Java code required.

Making content accessible on Addon SDK

I am developing an addon using Firefox's Addon SDK (v. 1.11). My extension dynamically creates an iframe on each website and then loads an html file which includes other resources such as images, font files, etc. from the add on's local directory.
Problem
When loading any such local resource (i.e. one using the "resource://" scheme), the iframe fails to display it and a message is thrown:
Security Error: Content at http://www.XXX may not load or link to resource://XXX
This is a security measure introduced in Firefox 3. When developing without the Addon SDK, the way around it is declaring a directory with "contentaccessible=yes", making the directory's contents accessible to anyone, including my add-on. However, I have not been able to find similar functionality using the Addon SDK. Is there a better way of using local data in an iframe that my add-on creates and inserts into a page?
I don't think you can directly load an iframe that points to a resource inside your URL. The browser complains because it's breaking either the same-origin policy or the cross-site scripting one. I can't remember which one right now.
If it is HTML content you want to load, you can always inject it into the DOM and then send a message to the document object using the events API to display your custom HTML. I've done this in the past and it works.
So, from main.js, send a message to the content script, which will then inject your iframe HTML into the DOM, and then you can send the document object a message to display it.
I hope this helps.
Not sure if this was the case when you posted the question, but it appears that "resource://" should no longer be used with the Addon SDK.
If you're using the resource inside of an HTML file in the extension, you can reference it locally, otherwise you should use data.url('whatever.jpg') and pass around that value as needed.
Full info is here: http://blog.mozilla.org/addons/2012/01/11/sdk-1-4-known-issue-with-hard-coding-resource-uris/

Ajax generated pages with different URLs

I couldn't really word the title very well, but here's my problem: I've got a webpage that reads from a database each time the user clicks a button; the content of part of the page is then replaced.
Because it is an AJAX load, everything is done in the background, and so the URL stays the same. This wasn't a problem at all until I realised that I will want to have a different Facebook comments box for each set of content that is loaded - so if someone comments, it is posted to their Facebook profile, people click on the link and are then taken to different content.
So... what I need is some way of referencing each set of content, and I've found a site that does exactly that (I'm sure there are a lot of them).
Here's the link.
Each set of content has a different 'hash code' (because I don't know the actual name for it) which is appended to the URL - in this case the code is "#1922934". This allows people to post links to that specific set of content on Facebook etc. - and also allows a different Facebook comment box for each set of content.
Does anyone know how such a set-up can be achieved or how these 'hash codes' work?
Here's a document from Wikipedia on it.
http://en.wikipedia.org/wiki/Fragment_identifier
The main idea is that URI fragments are used because changing them doesn't cause a page reload. They can also be used to refer to anchors on a web page.
What I would do is, on page load, use JavaScript to read the URI fragment (location.hash), then make a request to your server to load the comments etc. The URI fragment is never sent to the server; only the client (browser) can read it.
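A minimal sketch of that approach, assuming the fragment holds a purely numeric id like the "#1922934" in the question. Both contentIdFromHash and loadContentById are hypothetical names; the second stands for whatever request your page makes to fetch the content and comments for an id.

```javascript
// Extract a content id from a URL fragment such as "#1922934".
// Returns the id as a string, or null when the fragment is empty
// or is not made up of digits only (assumed id format).
function contentIdFromHash(hash) {
  const match = /^#(\d+)$/.exec(hash || "");
  return match ? match[1] : null;
}

// In a browser, the wiring might look like this (assumed):
// function showFromHash() {
//   const id = contentIdFromHash(window.location.hash);
//   if (id !== null) loadContentById(id); // hypothetical loader
// }
// window.addEventListener("load", showFromHash);
// window.addEventListener("hashchange", showFromHash);
```

Listening for hashchange as well as load means a link with a different fragment works even when the page is already open.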
Sounds like you want something like SammyJS.

Resources