Obtaining a partial value from XPath - xpath

I have the current HTML code:
<div class="group">
<ul class="smallList">
<li><strong>Date</strong>
13.06.2019
</li>
<li>...</li>
<li>...</li>
</ul>
</div>
and here is my "wrong" XPath:
//div[#class='group']/ul/li[1]
and I would like to extract the date with XPath without the text in the strong tag, but I'm not sure how NOT is used in XPath or could it even be used in here?
Keep in mind that the date is dynamic.

Use substring-after() to get the date value.
substring-after(//div[#class='group']/ul/li[1],'Date')
Output:

The easiest way to get the date is by using the XPath-1.0 expression
//div[#class='group']/ul/li[1]/text()[normalize-space(.)][1]
The result does include the spaces.
If you want to get rid of them, too, use the following expression:
normalize-space(//div[#class='group']/ul/li[1]/text()[normalize-space(.)][1])
Unfortunately this only works for one result in XPath-1.0.
If you'd have XPath-2.0 available, you could append the normalize-space() to the end of the expression which also enables the processing of multiple results:
//div[#class='group']/ul/li[1]/text()[normalize-space(.)][1]/normalize-space()

Here is the python method that will read the data directly from the parent in your case the data is associated with ul/li.
Python:
def get_text_exclude_children(element):
return driver.execute_script(
"""
var parent = arguments[0];
var child = parent.firstChild;
var textValue = "";
while(child) {
if (child.nodeType === Node.TEXT_NODE)
textValue += child.textContent;
child = child.nextSibling;
}
return textValue;""",
element).strip()
This is how to call this in your case.
ulEle = driver.find_element_by_xpath("//div[#class='group']/ul/li[1]")
datePart = get_text_exclude_children(ulEle)
print(datePart)
Please feel free to convert to the language that you are using, if it's not python.

Related

xPath - Why is this exact text selector not working with the data test id?

I have a block of code like so:
<ul class="open-menu">
<span>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text Here</strong>
<small>...</small>
</div>
</li>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text</strong>
<small>...</small>
</div>
</li>
</span>
</ul>
I'm trying to select a menu item based on exact text like so in the dev tools:
$x('.//*[contains(#data-testid, "menu-item") and normalize-space() = "Text"]');
But this doesn't seem to be selecting the element. However, when I do:
$x('.//*[contains(#data-testid, "menu-item")]');
I can see both of the menu items.
UPDATE:
It seems that this works:
$x('.//*[contains(#class, "menu-item") and normalize-space() = "Text"]');
Not sure why using a class in this context works and not a data-testid. How can I get my xpath selector to work with my data-testid?
Why is this exact text selector not working
The fact that both li elements are matched by the XPath expression
if omitting the condition normalize-space() = "Text" is a clue.
normalize-space() returns ... Text Here ... for the first li
in the posted XML and ... Text ... for the second (or some other
content in place of ... from div/svg or div/small) causing
normalize-space() = "Text" to fail.
In an update you say the same condition succeeds. This has nothing to
do with using #class instead of #data-testid; it must be triggered
by some content change.
How can I get my xpath selector to work with my data-testid?
By testing for an exact text match in the li's descendant strong
element,
.//*[#data-testid = "menu-item" and div/strong = "Text"]
which matches the second li. Making the test more robust is usually
in order, e.g.
.//*[contains(#data-testid,"menu-item") and normalize-space(div/strong) = "Text"]
Append /div/small or /descendant::small, for example, to the XPath
expression to extract just the small text.
data-testid="menu-item" is matching both the outer li elements while text content you are looking for is inside the inner strong element.
So, to locate the outer li element based on it's data-testid attribute value and it's inner strong element text value you can use XPath expression like this:
//*[contains(#data-testid, "menu-item") and .//normalize-space() = "Text"]
Or
.//*[contains(#data-testid, "menu-item") and .//*[normalize-space() = "Text"]]
I have tested, both expressions are working correctly

how to get the data using XPATH from div with display:none?

I want to extract data from a div element with the attribute 'display:none'.
<div class='test' style='display:none;'>
<div id='test2'>data</div>
</div>
Here is what I tried:
//div[#class = "test"]//div[contains(#style, \'display:none\')';
Please help.
Try several changes:
1) Just put normal quotes around "display:none", like you did for your class attribute and close with ]
2) Then your div with class test and your style attribute is one and the same, so you need to call contains also for the same div:
'//div[#class = "test" and contains(#style, "display:none")]'
or the quotes the other way around, important is, that you are using differnt quotes around the expression than inside the expression
"//div[#class = 'test' and contains(#style, 'display:none')]"
if this still does not work, pls post an error message

scrapy xpath : selector with many <tr> <td>

Hello I want to ask a question
I scrape a website with xpath ,and the result is like this:
[u'<tr>\r\n
<td>address1</td>\r\n
<td>phone1</td>\r\n
<td>map1</td>\r\n
</tr>',
u'<tr>\r\n
<td>address1</td>\r\n
<td>telephone1</td>\r\n
<td>map1</td>\r\n
</tr>'...
u'<tr>\r\n
<td>address100</td>\r\n
<td>telephone100</td>\r\n
<td>map100</td>\r\n
</tr>']
now I need to use xpath to analyze this results again.
I want to save the first to address,the second to telephone,and the last one to map
But I can't get it.
Please guide me.Thank you!
Here is code,it's wrong. it will catch another thing.
store = sel.xpath("")
for s in store:
address = s.xpath("//tr/td[1]/text()").extract()
tel = s.xpath("//tr/td[2]/text()").extract()
map = s.xpath("//tr/td[3]/text()").extract()
As you can see in scrappy documentation to work with relative XPaths you have to use .// notation to extract the elements relative to the previous XPath, if not you're getting again all elements from the whole document. You can see this sample in the scrappy documentation that I referenced above:
For example, suppose you want to extract all <p> elements inside <div> elements. First, you would get all <div> elements:
divs = response.xpath('//div')
At first, you may be tempted to use the following approach, which is wrong, as it actually extracts all <p> elements from the document, not only those inside <div> elements:
for p in divs.xpath('//p'): # this is wrong - gets all <p> from the whole document
This is the proper way to do it (note the dot prefixing the .//p XPath):
for p in divs.xpath('.//p'): # extracts all <p> inside
So I think in your case you code must be something like:
for s in store:
address = s.xpath(".//tr/td[1]/text()").extract()
tel = s.xpath(".//tr/td[2]/text()").extract()
map = s.xpath(".//tr/td[3]/text()").extract()
Hope this helps,

angular filtering decode characters

in angular, if
$scope.myStr = '™';
{{myStr}} yields '$trade;' instead of the TM mark, how would I solve this issue using a filter?
and in some cases, $amp;trade; also appears, so I would absolutely need a filter to run the procedures, and eventually I want to be able to {{}} the result without dom manipulation.
You can use ngBindUnsafeHtml: http://jsfiddle.net/Xnp3J/
<div ng-app ng-controller="x">
<span ng-bind-html-unsafe="myStr"></span>
</div>
-
function x($scope) {
$scope.myStr = '™';
}

Use Xpath To Retrieve Elements

HTML Portion:
<div class="abc">
<div style="text-align:left; itemscopr itemtype="xyz">
<h1 itemtype="mno"> I want this text </h1>
</div>
</div>
I am using
$text = $xpath->query('//div[class="abc"]/div/h1]
but I am getting no value. Please help me as I am new to it.
You should try
//div[#class="abc"]/div/h1
The difference is in the # sign before class, because the attribute axis is accessed this way. When you omit the # sign, it looks for node names (tag names).
This returns you the whole h1 node (or, rather, a node-set containing all the matching h1 nodes).
If you only wanted the text from the element, try the evaluate function instead:
$text = $xpath->evaluate("//div[#class='abc']/div/h1/text()")

Resources