How to extract website information using XPath inside firefox extension? - firefox

I have made a firefox extension which loads a web page using xmlhttprequest.
My extension has its own window, opened alongside the main Firefox window.
The idea of my extension is to load a webpage in memory, modify it, and publish it in a newly opened tab in Firefox.
The webpage has a div with id "Content", and that's the div I want to modify. I have been using XPath a lot in Greasemonkey scripts, so I wanted to use it in my extension as well; however, I have a problem. It doesn't work as I would expect: I always get a result of 0 nodes.
var pageContents = result.responseText; // webpage which was loaded via XMLHttpRequest
var localDiv = document.createElement("div"); // div to keep webpage data
localDiv.innerHTML = pageContents;
// trying to evaluate and get the div I need
var rList = document.evaluate('//div[@id="content"]', localDiv, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
The result is always 0, as I said. I created the local div to store the website data because I cannot run XPath against plain text. And document in this case is my extension's XUL document/window.
I did expect it to work, but I was wrong.
I know how to extract the div using string.indexOf(str) and then slice(..), but that's very slow and not handy, because I need to modify the contents: change the background and borders of the many forms inside this div. And for this job I have not seen a better method than evaluating XPath to get all the nodes I need.
So the main question is: how do I use XPath to parse a loaded web page in a Firefox extension?
Thank you

Why not load the page in a tab, then modify it in place, like Greasemonkey does?
As for your code, you don't say where it executes (i.e., what document.location is), but assuming it runs in a XUL window, it won't work: document.createElement will not create an HTML element there (it creates a XUL div element, which has no special meaning), innerHTML won't work on such an element, and so on.
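Independent of the XUL-document issue, the XPath expression itself has two traps worth checking: the attribute axis must be written `@id` (not `#id`), and XPath id matching is case-sensitive, so `content` will never match a div whose id is `Content`. A minimal sketch outside the browser, using Python's standard library just to demonstrate the expression (the markup is a stand-in for the real responseText):

```python
import xml.etree.ElementTree as ET

# Stand-in for result.responseText; well-formed markup is enough
# to demonstrate the XPath behavior.
page = '<html><body><div id="Content"><p>hello</p></div></body></html>'
root = ET.fromstring(page)

# Note @id, not #id -- and id matching is case-sensitive:
hits = root.findall(".//div[@id='Content']")
misses = root.findall(".//div[@id='content']")
print(len(hits), len(misses))  # 1 0
```

The same case-sensitivity applies to document.evaluate in the browser, so the lowercase "content" in the question's query would return 0 matches even in an HTML document.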

Related

I need to scrape that number xpath + aspx

I'm trying to get the total number of the page (or the page value) of this URL:
http://scorelibrary.fabermusic.com/String-Quartet-No-2-300-Weihnachtslieder-23032.aspx
1/58
I think that I can't because the values are inside the ASPX frame.
I've tried a lot of things. This is the line:
<label id="page_count">1/58</label>
using the following XPath
//label[@id='page_count']/text()
How can I use XPath inside the ASPX frame to get the page value?
You are right, you cannot get that value directly because the element is in an <iframe> and therefore lives in a different context. You need to switch to the URL context of the iframe. There are JavaScript approaches like postMessage, but I think the easiest way is to load the URL of the iframe directly and access the DOM from there.
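Once you load the iframe's own URL (whatever its src attribute points at), extracting the value is straightforward. A small sketch in Python using only the standard library, applied to the `<label>` fragment from the question:

```python
import xml.etree.ElementTree as ET

# In practice you would fetch the iframe's src URL first; here we use
# the fragment from the question directly.
fragment = '<div><label id="page_count">1/58</label></div>'
root = ET.fromstring(fragment)

label = root.find(".//label[@id='page_count']")
current, total = label.text.split('/')
print(current, total)  # 1 58
```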

Copying the xpath from Instagram inspect (using chrome) returns an empty list

So I would go to an instagram account, say, https://www.instagram.com/foodie/ to copy its xpath that gives me the number of posts, number of followers, and number of following.
I would then run this command in a scrapy shell:
response.xpath('//*[@id="react-root"]/section/main/article/header/section/ul')
to grab the elements on that list but scrapy keeps returning an empty list. Any thoughts on what I'm doing wrong here? Thanks in advance!
This site is a Single Page Application (SPA), so the DOM is rendered by JavaScript that has not yet run at the time your downloader fetches the page.
When you use view(response), the JavaScript in the content your downloader collected can continue rendering in your browser, so you see the page with the DOM rendered (though without access to the site's API). Look at your downloaded content via response.text and you'll see this.
In this case, you can use selenium + phantomjs to produce a rendered page for your spider.
Another trick: you can use a regular expression to select the JSON part of the script, parse it into a JSON object, and read the attribute values you need (number of posts, following, ...) from it.
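The regular-expression trick can be sketched like this in Python; the `window._sharedData` name and the JSON shape below are only illustrative stand-ins (the embedded script Instagram ships changes over time), the point is pulling the JSON object out of the script text and parsing it:

```python
import json
import re

# Illustrative script content; the real page embeds a much larger object.
script = 'window._sharedData = {"user": {"posts": 42, "followers": 1234}};'

# Capture everything between the assignment and the closing semicolon.
match = re.search(r'window\._sharedData\s*=\s*(\{.*\});', script)
data = json.loads(match.group(1))
print(data["user"]["posts"])  # 42
```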

Get computed font-size in Rails

I'm using Rails 3 to scrape a website, and doing a query like so:
agent = Mechanize.new
doc = agent.get(url)
I'm then doing
doc.search("//div")
Which returns a list of all divs on the page. I'd like to select the div that has the largest font size. Is there any way to use Mechanize, Nokogiri, or any other Ruby gem to find the computed font-size of a div, and from there choose the one with the largest font size?
Thanks
You can't do this with Mechanize or Nokogiri, because they simply read the static HTML. Yet font size isn't usually defined in HTML anymore; it is generally defined in CSS or added programmatically using JavaScript.
The only solution is to be able to execute JavaScript and use JavaScript's getComputedStyle method which can get the font size that has been applied to an element (via either CSS or JS). So you need a way to inject JS into your pages and get a result. This may be possible using watir-webdriver, because Selenium has hooks to do this. See the very end of this page for instructions on how to inject JS and return a result back to the caller in Selenium. Another option is PhantomJS which is a headless browser with a JS API.
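Once getComputedStyle has given you a font size per div (however you inject the JS), picking the winner is ordinary data wrangling. A sketch of just that last step in Python, assuming you have already collected hypothetical `(selector, computed_size)` pairs from the browser:

```python
# Hypothetical results collected from the browser, e.g. via injected
# getComputedStyle calls: (css_selector, computed font-size string).
computed = [
    ("div.sidebar", "14px"),
    ("div.headline", "32px"),
    ("div.body", "16px"),
]

# Strip the "px" unit and compare numerically.
biggest = max(computed, key=lambda pair: float(pair[1].removesuffix("px")))
print(biggest[0])  # div.headline
```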

Programmatically Open Link in WebView

I have an NSWindow with a WebView.
My program takes in a search query and executes a Google search with it, the results being displayed in the WebView, like a browser.
Instead of displaying the search results in the WebView, I'd like to automatically open the first link and display the contents of that result instead.
As a better example, how do I display the contents of the first result of Google in a WebView?
Is this even possible?
Any help greatly appreciated. Thanks!
You could use the Google Search API directly. That would be more convenient.
https://developers.google.com/custom-search/v1/cse/list?hl=de-DE
You could also try to make a Google request like the "I'm Feeling Lucky" button, which will redirect you automatically to the first search result.
If you have to parse the HTML, you need to look at the HTML structure of the Google result page. Look for specific id and class CSS properties in the div and a tags. Once you've found the ones where the actual results are, you can start parsing that content. I also guess it would be easier to put some JavaScript together that finds the first result and opens it (easier than parsing the HTML in Objective-C). You can evaluate JavaScript in the web view using [myWebView stringByEvaluatingJavaScriptFromString: @"put your js code here"].
Sure it is possible.
The first approach that comes to mind is to parse the HTML response from Google, then load the first link you extracted in the WebView.
Take a look at regular expressions to make this easier.
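The "I'm Feeling Lucky" variant mentioned above comes down to an extra query parameter. A sketch of building such a request URL in Python (btnI is the parameter the Lucky button submits; whether Google actually redirects can depend on the query, so treat this as an assumption to verify):

```python
from urllib.parse import urlencode

def lucky_url(query):
    # btnI is the "I'm Feeling Lucky" parameter; Google is expected to
    # answer with a redirect to the first result.
    params = {"q": query, "btnI": "1"}
    return "https://www.google.com/search?" + urlencode(params)

print(lucky_url("stack overflow"))
# https://www.google.com/search?q=stack+overflow&btnI=1
```

You could load the resulting URL in the WebView and let it follow the redirect to the first result.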

How do I target the search bar from a Firefox extension?

I'm new to building extensions and would like some help with targeting the standard Google search bar that comes with Firefox.
I am thinking I have to find out which menu ID it is and assign it somehow within the .xul file.
From chrome (normally an overlay of chrome://browser/content/browser.xul) you can get access to the search bar with document.getElementById('searchbar'). The best way to find the IDs you need is the DOM Inspector: https://addons.mozilla.org/en-US/firefox/addon/dom-inspector-6622/
Anyhow, if you want to access inner DOM elements of the search bar, you'll need to use getAnonymousElementByAttribute, because that's anonymous content (from an XBL binding). So if you need to get the input element itself (where you type your search terms), you can do something like this from chrome:
var searchbarElement = document.getElementById('searchbar');
var input = document.getAnonymousElementByAttribute(searchbarElement, 'anonid', 'input');
You'll need to use the DOM Inspector to figure out which element you need and how to access it.
