How to get all the DOM hierarchy of webpage DOM - xpath

I'd like to create a hierarchical view of the DOM of a webpage.
For example, given this webpage:
<body>
<div class="A"> </div>
<div class="B">
<div class="C"> </div>
</div>
</body>
I would like a result such as this:
node             parent_node
body
div class="A"    body
div class="B"    body
div class="C"    div class="B"
I tried to use Scrapy with XPath:
for mytable in response.xpath("//*"):
    yield {
        'node': mytable,
        'parent': mytable.xpath("//parent::*")
    }
But it didn't work; it seemed to loop forever.
Thanks for your help. I'd like this to work for any webpage (so I have no information specific to one webpage).
Anderson's terrific answer resolved it.
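A likely cause of the slowdown is that "//parent::*" starts from the document root rather than from the current node, so every iteration selects every element that has children. A relative step such as ".." (or "parent::*") asks only for the current node's parent. The sketch below is not taken from any answer in this thread; it shows the node/parent pairing with Python's standard library instead of Scrapy, and the names HTML, label and node_parent_pairs are made up for illustration. It assumes well-formed markup, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Sample page from the question, with the closing tags written out (assumption).
HTML = """<body>
<div class="A"> </div>
<div class="B">
<div class="C"> </div>
</div>
</body>"""


def label(element):
    """Render an element as: tag attr="value" ..."""
    attrs = "".join(f' {key}="{value}"' for key, value in element.attrib.items())
    return element.tag + attrs


def node_parent_pairs(markup):
    """Return (node, parent) labels for every element, in document order."""
    root = ET.fromstring(markup)
    # ElementTree keeps no parent pointers, so build a child -> parent map once.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return [
        (label(el), label(parent_of[el]) if el in parent_of else "")
        for el in root.iter()
    ]


pairs = node_parent_pairs(HTML)
for node, parent in pairs:
    print(node, "|", parent)
```

Building the parent map once keeps the work linear in the number of elements, instead of re-querying the whole document for each node.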
This is pretty good but there's still a slight issue.
Scrapy returns some weird stuff
<Selector xpath='//*' data=u'<span class="price">69,00\xa0\u20ac</span>'>
I'd prefer to get a cleaner result
<span class="price">69,00 €</span>
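The "weird stuff" is the printed representation of a Selector object, not the markup itself. In Scrapy, calling .get() (or the older .extract()) on a selector returns the markup as a plain string; the \xa0 and \u20ac escapes only show up when that string is displayed as a repr. The sketch below uses no Scrapy at all: raw is assumed to be what .get() would hand back for the span above, and clean is a hypothetical helper that swaps the non-breaking space for an ordinary one.

```python
# Assumed to be the string a selector's .get() would return for this node.
raw = '<span class="price">69,00\xa0\u20ac</span>'


def clean(markup):
    """Replace non-breaking spaces (U+00A0) with ordinary spaces."""
    return markup.replace("\xa0", " ")


cleaned = clean(raw)

print(ascii(raw))  # escaped form, similar to what the Selector repr shows
print(cleaned)     # readable form with the euro sign and a normal space
```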

Related

Get XPath for data that comes from external JS

Here is a simple HTML example that demonstrates my issue:
<!DOCTYPE html>
<html>
<body>
<div>
<label>This value comes from internal</label>
<div>
<div name='internal'>$11.11</div>
</div>
</div>
<div>
<label>This value comes from external</label>
<div>
<input type ='text' name='internal' readonly='true'>
</div>
</div>
</body>
</html>
// Display in web browser
This value comes from internal
$11.11
This value comes from external
$55.55
I want to get $55.55. I searched for "//*[text()='$55.55']" in the inspector, but I could not find any $55.55.
I figured out that this $55.55 value comes from external JS, which changes the DOM and displays it.
The value is displayed in the browser, but I could not get an XPath for the input's value.
How can I get an XPath that returns the value "$55.55"?
Thank you
Well, I don't know if I understand your question correctly, but if your goal is to get the input's value, you can do something as easy as this, without the need for XPath:
var myInputValue = document.querySelector('input[name=internal]').value;
console.log(myInputValue);
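The reason the text() search finds nothing is that an input's value is never a text node: it lives on the element as a property that the script sets at run time. A minimal sketch of that point, using only Python's standard library on a shortened copy of the markup above (PAGE and Collector are made-up names), shows that the static source contains the text $11.11 but no value for the input at all:

```python
from html.parser import HTMLParser

# Shortened copy of the page source from the question (assumption: only these parts matter).
PAGE = """<div><div name='internal'>$11.11</div></div>
<div><input type='text' name='internal' readonly='true'></div>"""


class Collector(HTMLParser):
    """Record every non-empty text node and the attributes of the <input>."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self.input_attrs = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.input_attrs = dict(attrs)

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())


collector = Collector()
collector.feed(PAGE)
collector.close()

print(collector.texts)        # text nodes found in the source
print(collector.input_attrs)  # attributes of the input; no "value" among them
```

Reading the live value therefore needs something that runs the page's scripts, such as the browser console snippet above.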

onmouseenter/onmouseleave animations won't work

I've been trying hard for the past few hours, but I can't make it work.
My objective is to put an animation on element1's ::before and another one on element1. I tried the simple CSS trick with the ::before and :hover thingy.. but it won't work.
HTML
<div id="fc3"><div class="f_bb">Hover this</div>
<div class="f_bloc">This should appear with an animation</div></div>
CSS
/can't put it here without bugging it, dunno why/
JS
/same/
https://jsfiddle.net/5Lf76r7h/
Any idea ?
Thanks. :)
UPDATE
Full CSS attempt : https://jsfiddle.net/5Lf76r7h/3/
UPDATE 2
https://jsfiddle.net/5Lf76r7h/4/
I have used jQuery animate with mouseenter and mouseleave to achieve the same. Please see this fiddle:
https://jsfiddle.net/rawatdeepesh/2f41k6m5/
Here is the code for your design:
<div id="fc3">
<div class="f_bb">Hover this
<div class="progress" style="border:1px blue solid;width:10px;height:5px;background-color:yellow;display:none">
</div>
<div class="f_bloc" style="display:none">This should appear with an animation</div>
</div>
</div>
In case you need more on animations: http://www.w3schools.com/jquery/jquery_animate.asp

XPath Getting child elements from html

I am trying to find the XPath for only the children of a navigation bar. The path which I am trying at the moment is //div[@class='navCol subMenus'] from this piece of HTML.
<div class="PrimaryNavigationContainer">
<div class="PrimaryNavigation">
<div class="Menu">
<div>
<span>Brands</span>
<div class="navCol">
<div>
<a class="NoLink unselectable"><span>Shop by Brand</span></a>
<div class="navCol subMenus">
<div>
<span>blah</span>
I have tried a number of XPath expressions but none seem to bring up just the subcategories. Thank you for any help which you can provide.
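One way to read "just the children" is to append /* to the expression, so that it returns the child elements of the matched div rather than the div itself. The following is a rough sketch using Python's standard library, assuming the attribute test is written with @class and that the class attribute is exactly "navCol subMenus". The closing tags in SNIPPET are guesses, since the markup in the question is cut off.

```python
import xml.etree.ElementTree as ET

# Markup from the question; every closing tag after "blah" is an assumption.
SNIPPET = """<div class="Menu"><div>
<span>Brands</span>
<div class="navCol"><div>
<a class="NoLink unselectable"><span>Shop by Brand</span></a>
<div class="navCol subMenus">
<div><span>blah</span></div>
</div>
</div></div>
</div></div>"""

root = ET.fromstring(SNIPPET)

# "/*" selects the direct child elements of the matched div.
children = root.findall(".//div[@class='navCol subMenus']/*")
labels = [child.findtext("span") for child in children]

print(labels)
```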

Phantom <span> element using ImportXML with XPath in Google Spreadsheet

I am trying to get the value of an element attribute from this site via IMPORTXML in Google Spreadsheet using XPath.
The attribute value I seek is content, found in the <span> with itemprop="price".
<div class="left" style="margin-top: 10px;">
<meta itemprop="currency" content="RON">
<span class="pret" itemprop="price" content="698,31 RON">
<p class="pret">Pretul tau:</p>
698,31 RON
</span>
...
</div>
I can access <div class="left"> but I can't get to the <span> element.
Tried using:
//span[@class='pret']/@content I get #N/A;
//span[@itemprop='price']/@content I get #N/A;
//div[@class='left']/span[@class='pret' and @itemprop='price']/@content I get #N/A;
//div[@class='left']/span[1]/@content I get #N/A;
//div[@class='left']/span/text() to get the text node of <span> I get #N/A;
//div[@class='left']//span/text() I get the text node of a <span> lower in div.left.
To get the text node of <span> I have to use //div[@class='left']/text(). But I can't use that text node because the layout of the span changes if a product is on sale, so I need the attribute.
It's like the span I'm looking for does not exist, although it appears in the developer view of Chrome and in the page source, and all the XPath expressions work in the console using $x("").
I tried to generate the XPath directly from the developer tools by right-clicking, and I get //*[@id='produs']/div[4]/div[4]/div[1]/span, which does not work. I also tried to generate the XPath with Firefox and plugins for FF and Chrome, to no avail. The XPath generated in these ways did not even work on sites I managed to scrape with "hand coded XPath".
Now, the strangest thing is that on this other site with apparently similar code structure the XPath //span[@itemprop='price']/@content works.
I have struggled with this for 4 days now. I'm starting to think it has something to do with the unclosed meta tag, but why doesn't this happen on the other site?
Perhaps the following formulas can help you:
=ImportXML("http://...","//div[@class='product-info-price']//div[@class='left']/text()")
Or
=INDEX(ImportXML("http://...","//div[@class='product-info-price']//div[@class='left']"), 1, 2)
UPDATE
It seems that IMPORTXML fails because it cannot properly parse the entire document. An extract of the document, something like:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<div class="product-info-price">
<div class="left" style="margin-top: 10px;">
<meta itemprop="currency" content="RON">
<span class="pret" itemprop="price" content="698,31 RON">
<p class="pret">Pretul tau:</p>
698,31 RON
</span>
<div class="resealed-info">
» Vezi 1 resigilat din aceasta categorie
</div>
<ul style="margin-left: auto;margin-right: auto;width: 200px;text-align: center;margin-top: 20px;">
<li style="color: #000000; font-size: 11px;">Rata de la <b>28,18 RON</b> prin BRD</li>
<li style="color: #5F5F5F;text-align: center;">Pretul include TVA</li>
<li style="color: #5F5F5F;">Cod produs: <span style="margin-left: 0;text-align: center;font-weight: bold;" itemprop="identifier" content="mol:GA-Z87X-UD3H">GA-Z87X-UD3H</span> </li>
</ul>
</div>
<div class="right" style="height: 103px;line-height: 103px;">
<form action="/?a=shopping&sa=addtocart" method="post" id="add_to_cart_form">
<input type="hidden" name="product-183641" value="on"/>
<img src="/templates/marketonline/images/pag-prod/buton_cumpara.jpg"/>
</form>
</div>
</div>
</html>
works with the following XPath query:
"//div[@class='product-info-price']//div[@class='left']//span[@itemprop='price']/@content"
UPDATE
It occurs to me that one option is that you can use Apps Script to create your own ImportXML function, something like:
/* CODE FOR DEMONSTRATION PURPOSES */
function MyImportXML(url) {
  var content = '';
  var response = UrlFetchApp.fetch(url);
  if (response) {
    var html = response.getContentText();
    // Capture the content attribute of the price span; [^"]* stops at the closing quote.
    var match = html && html.match(/<span class="pret" itemprop="price" content="([^"]*)">/i);
    // Leave content empty instead of throwing when the span is not found.
    if (match) content = match[1];
  }
  return content;
}
Then you can use as follows:
=MyImportXML("http://...")
At this time, the referred web page in the first link doesn't include a span tag with itemprop="price", but the following XPath returns 639
//b[@itemprop='price']
Looks to me that the problem was that the meta tag was not XHTML compliant but now all the meta tags are properly closed.
Before:
<meta itemprop="currency" content="RON">
Now
<meta itemprop="priceCurrency" content="RON" />
For web pages that are not XHTML compliant, another solution should be used instead of IMPORTXML, such as IMPORTDATA with REGEXEXTRACT, or Google Apps Script with the UrlFetch Service and the JavaScript match function, among other alternatives.
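The effect of the unclosed meta tag can be reproduced with any strict XML parser. The sketch below uses Python's xml.etree.ElementTree as a stand-in; whether IMPORTXML parses pages the same way internally is an assumption, not something this thread confirms. BEFORE, AFTER and price are made-up names, and the markup is a trimmed version of the snippet above.

```python
import xml.etree.ElementTree as ET

# Trimmed markup with the meta tag left open, as on the original page.
BEFORE = (
    '<div class="left">'
    '<meta itemprop="currency" content="RON">'
    '<span itemprop="price" content="698,31 RON">698,31 RON</span>'
    "</div>"
)
# Same markup with the meta tag self-closed.
AFTER = BEFORE.replace('content="RON">', 'content="RON" />')


def price(markup):
    """Return the price span's content attribute, or None if parsing or lookup fails."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None  # a strict parser rejects the unclosed <meta>
    span = root.find(".//span[@itemprop='price']")
    return span.get("content") if span is not None else None


print(price(BEFORE))
print(price(AFTER))
```

With the meta tag left open, the strict parser gives up before any XPath can run; once it is closed, the same query finds the attribute.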
Try something like this:
# Assumes `tree` is an lxml-style tree already parsed from the page.
print('content by key', tree.xpath('//*[@itemprop="price"]')[0].get('content'))
or
nodes = tree.xpath('//div/meta/span')
for node in nodes:
    print('content =', node.get('content'))
But I haven't tried that.
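For comparison, here is an untested-against-the-real-site sketch of the same "content by key" idea that needs nothing beyond Python's standard library. html.parser does not require the meta tag to be closed, so the unclosed tag is not an obstacle. PAGE is copied from the question, and ItempropFinder is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Markup from the question, including the unclosed <meta> tag.
PAGE = """<div class="left" style="margin-top: 10px;">
<meta itemprop="currency" content="RON">
<span class="pret" itemprop="price" content="698,31 RON">
<p class="pret">Pretul tau:</p>
698,31 RON
</span>
</div>"""


class ItempropFinder(HTMLParser):
    """Collect the content attribute of every tag whose itemprop matches."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.contents = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("itemprop") == self.wanted:
            self.contents.append(attributes.get("content"))


finder = ItempropFinder("price")
finder.feed(PAGE)
finder.close()

print(finder.contents)
```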

loading div content from external with jQuery.load into own div

Let's say that I have two HTML pages that are identically designed but have different content. I have the same div with the same id on both pages. How do I use jQuery.load (or what else do I use) so that the div#content does not get added into the div#content of the first page?
I've tried this:
$(document).ready(function(){
    $("a#linkHome").click(function(){$("div#content").load('index.htm #content');});
    $("a#linkPage2").click(function(){$("div#content").load('page2.htm #content');});
});
... but it ends up adding another div inside the already existing div!
<div id="content">
<div id="content">
Blah Blah Blah
</div>
</div>
Try with:
$(document).ready(function(){
    $("a#linkHome").click(function(){$("div#content").load('index.htm #content *');});
    $("a#linkPage2").click(function(){$("div#content").load('page2.htm #content *');});
});
This way you load the elements inside div#content but not the div itself. Note that "#content *" matches every descendant element; "#content > *" limits the match to direct children.
Or you can try the opposite approach. Just add a wrapper div into your target page.

Resources