How to access page URL with XPath? - xpath

Is there any possibility to address with XPath a page URL, which appears only in the browser address bar (not in the head or body)?

You can access with XPath the nodes in the document. Nothing else.
Determining a URL depends on what tools you're using and what you want to achieve, XPath alone won't help you.

Related

Setting the correct xpath

I'm trying to set the right xpath for using RSelenium, but I'm not very experienced in this area, so any help would be much appreciated.
Since I'm not allowed to post pictures yet I have tried to add a link to a screenshot of the html:
The html
I need R to scrape the dates (28-10-2020 - 13-11-2020), but so far I have not been able to set the correct xpath when using html.nodes.
I'm trying to scrape from sites like this one: https://www.boligsiden.dk/adresse/topperne-9-3-33-2620-albertslund-01650532___9__3__33
I usually do this on python rather than R
As you can see in this image when you right-click on the element concerned. You get a drop-down menu with an x-path to the element.
Other than that, the site orientation and x-path might change and a full x-path might be a good option in the short-run, so I rather prefer driver.find_element_by_xpath('//button[contains(text(),"Login")]')\ .click()
In your case which would be find_element_by_xpath('//*[contains(#class, 'u-pb-4 u-block')]')
I hope this helps and it is mostly the same across different languages

Confused about scrapy and Xpath

I am trying to scrape some data from the following website: https://xrpcharts.ripple.com/
The data I am interested in is Total XRP which you can see immediately below or to the side (depending on your browser) of the circle diagram. So what I first did was inspect the element I am interested in. So I see that it is inside <div class="stat" inside span ng-bind="totalXRP | number:2" class="ng-binding">99,993,056,930.18</span>.
The number 99,993,056,930.18 is what I am interested in.
So I started in a scrapy shell and wrote:
fetch("https://xrpcharts.ripple.com")
I then used chrome to copy the Xpath by right clicking on that place of HTML code, the result chrome gave me was:
/html/body/div[5]/div[3]/div/div/div[2]/div[3]/ul/li[1]/div/span
Then I used the Xpath command to extract the text:
response.xpath('/html/body/div[5]/div[3]/div/div/div[2]/div[3]/ul/li[1]/div/span/text()').extract()
but this gave me an empty list []. I really do not understand what I am doing wrong here. I think I am making an obvious mistake but I dont see it. Thanks in advance!
The bottom line is: you cannot expect the page you see in the browser to be the same page Scrapy would download and have available to work with. Scrapy is not a browser.
This page is quite dynamic and complex and is constructed with the help of multiple asynchronous requests bringing in both the logic and the data. There is also JavaScript executed in the browser that plays an important role in forming and supporting the HTML document object tree.
Scrapy does not have all these things, the thing you get when you do fetch() is just the very first initial "bare bones" HTML page without all the "dynamic content".

How to get href with Watir using Ruby

I'm trying to use Watir to grab a specific link on a page:
Screenshot: Here is the href I am trying to grab.
My guess is I need to specify the ancestor element biz-website(?) then traverse down to the a tag and grab its href somehow, but I'm not sure what the syntax of my code would need to be do that.
Any ideas or tips?
You should be able to get the value of the href with
browser.span(:class, 'biz-website').a.href
If the class 'biz-website' is not unique for spans on your page, you can also use 'biz-website js-add-url-tagging'. If that is still not unique, you could also try
browser.span(:text, 'Business website').parent.a.href

XPath Next Page navigation

I'm using Chrome Data Miner, and so far, failing to extract the data from my query: http://www.allinlondon.co.uk/restaurants.php?type=name&rest=gluten+free
How to code the Next Element Xpath for this website? I tried all the possible web sources, nothing worked.
Thanks in advance!
You could look for a tags (//a) whose descendant::text() starts with "Next" and then get the href attribute of that a element.
% xpquery -p HTML '//a[starts-with(descendant::text(), "Next")]/#href' 'http://www.allinlondon.co.uk/restaurants.php?type=name&rest=gluten+free'
href="http://www.allinlondon.co.uk/restaurants.php?type=name&tube=0&rest=glutenfree&region=0&cuisine=0&start=30&ordering=&expand="

extract xpath

I want to retrieve the xpath of an attribute (example "brand" of a product from a retailer website).
One way of doing it is using addons like xpather or xpath checker to firefox, opening up the website using firefox and right clicking the desired attrbute I am interested in. This is ok. But I want to capture this information for many attributes and right clicking each and every attribute maybe time consuming. Also, the other problem I have is that attributes I maybe interested in will be there for one product. The other attributes maybe for some other product. So, I will have to go that product & then do it manually again.
Is there an automated or programatic way of retrieving the xpath of the desired attributes from a website rather than having to do this manually?
You must notice that not all websites use valid XML that you can use xpath on...
That said, you should check out some HTML parsers that will allow you to use xpath on HTML even if it is not a valid XML.
Since you did not specify the technology you are working with - I'll suggest the .NET HTML Agility Pack, if you need others, search for questions dealing with this here on SO.
The solution I use for this kind of thing is to write an xpath something like this:
//*[text()="Brand"]/following-sibling::*
//*[text()="Color"]/following-sibling::*
//*[text()="Size"]/following-sibling::*
//*[text()="Material"]/following-sibling::*
It works by finding all elements (labels) with the text you want and then looking to the next sibling in the HTML. Without a specific URL to see I can't help any further.
This is a generalised version you can make more specific versions by replacing the asterisks is tag types, and you can navigate differently by replacing the axis following sibling with something else.
I use xPaths in import.io to make APIs for this kind of thing all the time, It's just a matter of finding a xPath that's generic enough to find the HTML no matter where it is on the page, but being specific enough to get the right data.

Resources