I need to scrape that number xpath + aspx

I'm trying to get the total number of pages (or the current page value) from this URL:
http://scorelibrary.fabermusic.com/String-Quartet-No-2-300-Weihnachtslieder-23032.aspx
The value shown on the page is 1/58.
I don't think I can, because the value is inside the ASPX frame.
I've tried a lot of things. This is the line:
<label id="page_count">1/58</label>
I'm using the following XPath:
//label[@id='page_count']/text()
How can I use XPath inside the ASPX frame to get the page value?

You are right: you cannot get that value directly, because the element is inside an <iframe> and therefore lives in a different document context. You need to switch to the context of the iframe's URL. There are JavaScript approaches such as postMessage, but I think the easiest way is to load the iframe's URL directly and access the DOM from there.
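A minimal sketch of that approach in Python, using only the standard library. The sample markup, the example.com address, and the viewer.aspx path are hypothetical stand-ins (the real iframe URL is not given in the question), and the framed document is assumed to be well-formed enough for xml.etree.ElementTree; real-world HTML usually needs a more tolerant parser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class IframeSrcFinder(HTMLParser):
    """Collects the src attribute of every <iframe> in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_html, page_url):
    """Return the absolute URL of each iframe found in page_html."""
    finder = IframeSrcFinder()
    finder.feed(page_html)
    return [urljoin(page_url, src) for src in finder.sources]


def page_count(frame_markup):
    """Apply the XPath to the framed document, not to the outer page."""
    root = ET.fromstring(frame_markup)
    label = root.find(".//label[@id='page_count']")
    return label.text if label is not None else None


# Hypothetical stand-ins for the outer page and the framed document.
OUTER = '<html><body><iframe src="/viewer.aspx?id=23032"></iframe></body></html>'
FRAME = '<html><body><label id="page_count">1/58</label></body></html>'

urls = iframe_urls(OUTER, "http://example.com/score.aspx")
count = page_count(FRAME)
```

In a real scraper you would download each URL returned by iframe_urls and pass that response body to page_count, instead of using the inline FRAME string.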

Related

Copying the xpath from Instagram inspect (using chrome) returns an empty list

I go to an Instagram account, say https://www.instagram.com/foodie/, and copy the XPath of the element that shows the number of posts, followers, and following.
I then run this command in a Scrapy shell:
response.xpath('//*[@id="react-root"]/section/main/article/header/section/ul')
to grab the elements in that list, but Scrapy keeps returning an empty list. Any thoughts on what I'm doing wrong here? Thanks in advance!

This site is a single-page application (SPA): the DOM is built by JavaScript, so it has not been rendered yet when your downloader fetches the page.
When you use view(response), your browser runs the JavaScript that the downloader collected, so you see the page with the DOM rendered (although it can't interact with the site's API). Look at the downloaded content via response.text and you will see the difference.
In this case, you can use Selenium with PhantomJS to produce a rendered page for your spider.
Another trick: use a regular expression to select the JSON part of the script, parse it into a JSON object, and read the corresponding attribute values (number of posts, following, ...) from it.
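The regular-expression trick could look roughly like the sketch below. The variable name window.__PAGE_DATA__ and the JSON layout are hypothetical, chosen only for illustration; inspect response.text to find the actual script and keys the site uses.

```python
import json
import re

# Hypothetical page source: the rendered DOM is empty, but the data the
# JavaScript would render is embedded as JSON inside a <script> tag.
SAMPLE = """
<html><body>
<div id="react-root"></div>
<script>window.__PAGE_DATA__ = {"user": {"posts": 120, "followers": 4500, "following": 310}};</script>
</body></html>
"""

# Capture everything from the first "{" after the assignment up to the
# "};" that closes the statement at the end of the script tag.
PATTERN = re.compile(
    r"window\.__PAGE_DATA__\s*=\s*(\{.*?\});\s*</script>",
    re.DOTALL,
)


def extract_page_data(html):
    """Return the embedded JSON as a dict, or None if it is not present."""
    match = PATTERN.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


data = extract_page_data(SAMPLE)
followers = data["user"]["followers"]
```

Inside a spider, the same function would be called as extract_page_data(response.text).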

Why can I not locate a button using XPath?

I have the following HTML:
<input type="button" value="Close List" class="tiny round success button" id="btnSaveCloseListPanel">
The following code does not work:
# browser.button(:value => "Close List").click # does not work - timeout
browser.button(:xpath => "/html/body/center/div/div[9]/div[2]/input[2]").when_present.click
The error is:
Watir::Wait::TimeoutError:
timed out after 60 seconds
when_present(300) does not work.
I found the XPath using Firefox Developer Tools. I used the complete path to avoid any silly errors. I can find the same path manually in IE.
The component is a .NET MVC popup. I think it's called a "panel". The panel is a grandchild of the Internet Explorer tab.
The panel contains a datepicker, a dropdown, a text box, and 3 buttons. I can't find any of these using Watir. I can find anything in the panel's parent (obviously).
The underlying code does not seem to be aware that something actually doesn't exist. To prove that, I tested the following XPath, which is simply the above XPath with the middle bit removed:
browser.button(:xpath => "/html/body/center/div/input[2]").when_present.click
The error is "timeout", rather than "doesn't exist".
So, the code seems to be unaware that:
input[1] does not exist, therefore input[2] cannot exist.
div[2] does not exist.
Therefore there's nothing left to search.
Added:
I'm changing the specific element that I want to find.
Reason: The button in my OP was at the foot of the panel. I was going cross-eyed trying to step upwards through hundreds of lines of HTML. Instead, I'm now using the first field in the panel. All the previous info is still the same.
The first field is a text field with datepicker.
The HTML is:
<input type="text" value="" style="width:82px!important;" readonly="readonly" name="ListDateClosed" id="ListDateClosed" class="hasDatepicker">
Using F12 in Firefox, the XPath is:
/html/body/center/div/div[1]/div[2]/input
But now, with far fewer lines of HTML, I can clearly see that this html tag is not the topmost html tag in the file. The parent of html is an iframe.
I've never used an iframe before. Maybe this is what t0mppa was referring to in his comment on the first question.
As an experiment, I modified my XPath to:
browser.text_field(:xpath, '//iframe/html/body/center/div/div[1]/div[2]/input').when_present.set("01-Aug-2014")
But this times out, even with a 3-minute timeout.

Given that the elements are in an iframe, there are two things to note:
Unlike other element types, you must always tell Watir when an element is in an iframe.
XPaths (in the context of Watir) cannot be used to cross into frames.
Assuming that there is only 1 iframe on the page, you can explicitly tell Watir to search the first iframe by using the iframe method:
browser.iframe.text_field(:xpath, '//body/center/div/div[1]/div[2]/input').when_present.set("01-Aug-2014")
If there are multiple iframes, you can use the usual locators to be more specific about which iframe. For example, if the iframe had an id:
browser.iframe(id: 'iframe_id')
.text_field(xpath: '//body/center/div/div[1]/div[2]/input')
.when_present
.set("01-Aug-2014")

Inline navigation with hashbang pages

In the past, I used to rely on hash for inline navigation, for example:
http://url?Category=a&item=3#Paragrah1
(Pointing to Paragraph1 within the http://url?Category=a&item=3 page)
With the widespread use of ajax, the hash tag has switched to a different purpose, allowing page refresh without full page reload. For example:
http://url#!Category=a&item=3
http://url#!Category=a&item=4 (the page switches to item 4, no full page reload)
My question: how can I make inline navigation work in such pages? To take the above example, how can I point to Paragraph1 in the http://url#!Category=a&item=4 page?

Use the HTML5 history API instead. Then you can use the hash for scrolling again.

If you need to use the hash # for both webapp page navigation and also for moving to a specific element on a specific page, then you'll need to handle the scrolling yourself.
JavaScript provides window.scroll(x, y) for this.
In your example, when you handle the URL http://url#!Category=a&item=4, you'll need to do a window.scroll using coordinates that cause Paragraph1 to move to the right place on the page, i.e. the top. You'll need to adjust these coordinates anytime the page layout changes.

Is it possible to load different HTML into the same div, based on which anchor is clicked?

I mean, without refreshing the whole page, is it possible to load a piece of HTML through Ajax into the same div?
For example, if the user clicks the PROFILE anchor (which has no href), the anchor should create a request object and load the content for PROFILE into the div below it.
Similarly, if the user clicks SETTINGS, it should load the settings details into the same div that was used for the profile.

Yes, you can do this. After you make the Ajax call and get the response, you can bind it to the div. To do so, give the div an id and set document.getElementById('id').innerHTML = 'Your html response'.
Try exploring jQuery.
Hope this helps.

How to extract website information using XPath inside firefox extension?

I have made a Firefox extension which loads a web page using XMLHttpRequest.
My extension has its own window, opened alongside the main Firefox window.
The idea of my extension is to load a web page in memory, modify it, and publish it in a newly opened tab in Firefox.
The web page has a div with id "Content", and that's the div I want to modify. I have been using XPath a lot in Greasemonkey scripts, so I wanted to use it in my extension. However, it doesn't work the way I want: I always get a result of 0.
var pageContents = result.responseText; //webpage which was loaded via xmlhttprequest
var localDiv = document.createElement("div"); //div to keep webpage data
localDiv.innerHTML = pageContents;
// trying to evaluate and get the div i need
var rList = document.evaluate('//div[@id="content"]', localDiv, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
As I said, the result is always 0. I created the local div to hold the page data because I cannot apply XPath to plain text. The document in this case is my extension's XUL document/window.
I did expect it to work, but I was wrong.
I know how to extract the div using string.indexOf(str) and then slice(..). However, that's very slow and not handy, because I need to modify the contents: change the background and borders of the many forms inside this div. For this job, I have not seen a better method than evaluating XPath to get all the nodes I need.
So the main question is: how can I use XPath to parse a loaded web page in a Firefox extension?
Thank you

Why not load the page in a tab, then modify it in place, like Greasemonkey does?
As for your code, you don't say where it executes (i.e. what is document.location?), but assuming it runs in a XUL window, it makes no sense: document.createElement will not create an HTML element (it creates a XUL div element, which has no special meaning), innerHTML shouldn't work for such an element, and so on.
