XPath expression is not finding text inside HTML

I have the following XML:
<div>
<ul>
<li>
<a>
Logout 1
</a>
</li>
<li>
<a>
Logout 2
</a>
</li>
<li>
<a>
Logout 3
</a>
</li>
<li>
<a>
Logout 4
</a>
</li>
</ul>
</div>
I want to check whether an a tag with the text Logout 4 exists. I do this with the following expression:
/div/ul/li/a[text() = 'Logout 4']
It doesn't seem to work. Can anyone tell me what I am doing wrong?
I am testing my XPath on this site: http://www.xpathtester.com/xpath

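Your XPath didn't return any results because the text inside the a element has leading and trailing whitespace (the newlines around Logout 4). You can strip it with normalize-space(), which trims both ends and collapses internal runs of whitespace into single spaces: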
/div/ul/li/a[normalize-space() = 'Logout 4']
Or, if you really want to evaluate only the first child text node within a:
/div/ul/li/a[normalize-space(text()) = 'Logout 4']
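Python's standard-library ElementTree supports only a small XPath subset (no text() or normalize-space()), but the whitespace problem is easy to reproduce with it. The sketch below rebuilds the question's markup and compares the raw text with a normalized version. normalize_space is a hypothetical helper that approximates the XPath function; Python's str.split() treats a few more characters as whitespace than XPath 1.0 does.

```python
import xml.etree.ElementTree as ET

# Rebuild the question's markup: each <a> holds its label between newlines.
XML = "<div><ul>" + "".join(
    f"<li>\n<a>\nLogout {i}\n</a>\n</li>" for i in range(1, 5)
) + "</ul></div>"


def normalize_space(text):
    """Approximate XPath normalize-space(): trim the ends, collapse inner whitespace runs."""
    return " ".join((text or "").split())


root = ET.fromstring(XML)
anchors = root.findall("./ul/li/a")

# Like a[text() = 'Logout 4']: compares the raw text, newlines included.
exact = [a for a in anchors if a.text == "Logout 4"]

# Like a[normalize-space() = 'Logout 4']: normalizes the element's full text first.
normalized = [a for a in anchors if normalize_space("".join(a.itertext())) == "Logout 4"]

print(len(exact), len(normalized))  # 0 1
```

The exact comparison finds nothing because a.text is "\nLogout 4\n"; only the normalized comparison matches.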


After parsing a valid expression, there is still more data in the expression pageCount

I'm making an e-commerce website, and in the products section (inside admin) I was trying to display only 10 products per page. I'm new to Spring, and while writing the code I encountered the error in the title when trying to add the Next button. The code works fine with the Previous button and all the page numbers. Here's my code for the pagination section:
<nav class="mt-3" th:if="${count > perPage}">
<ul class="pagination">
<li class="page-item" th:if="${page > 0}">
<a th:href="@{${#httpServletRequest.requestURI}} + '?page=__${page-1}__'" class="page-link">Previous</a>
</li>
<li class="page-item" th:each="number: ${#numbers.sequence(0, pageCount-1)}" th:classappend="${page==number} ? 'active' : ''">
<a th:href="@{${#httpServletRequest.requestURI}} + '?page=__${number}__'" class="page-link" th:text="${number+1}"></a>
</li>
<li class="page-item" th:if="${page pageCount-1}">
<a th:href="@{${#httpServletRequest.requestURI}} + '?page=__${page+1}__'" class="page-link">Next</a>
</li>
</ul>
</nav>
The first two li elements work fine: I get the list of pages and the Previous button. But when I add the Next button, I get the error mentioned above.
First of all, please always provide the actual error message. Otherwise we are just guessing.
My guess is that th:if expects a boolean expression, and what you have doesn't look like a boolean to me: th:if="${page pageCount-1}"
The expression is missing a comparison operator between page and pageCount-1. For a Next link that should only appear before the last page, that would be th:if="${page lt pageCount-1}" (lt is the textual form of <, which avoids escaping inside an HTML attribute), but again, it depends on what you want to display there.
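In plain Python, the template's visibility rules come down to a few comparisons. This is a sketch of the logic only, not Thymeleaf code: pagination is a hypothetical helper, page is assumed to be zero-based (as the ${page-1} and ${number+1} expressions suggest), and computing pageCount by ceiling division is an assumption about the controller.

```python
def pagination(count, per_page, page):
    """Hypothetical helper mirroring the template's conditions for a zero-based page."""
    # Assumption: pageCount is the ceiling of count / per_page.
    page_count = (count + per_page - 1) // per_page
    return {
        "show_nav": count > per_page,        # th:if="${count > perPage}"
        "show_previous": page > 0,           # th:if="${page > 0}"
        "numbers": list(range(page_count)),  # #numbers.sequence(0, pageCount-1)
        "show_next": page < page_count - 1,  # the comparison missing from ${page pageCount-1}
    }


first = pagination(25, 10, 0)
last = pagination(25, 10, 2)
print(first["show_previous"], first["show_next"])  # False True
print(last["show_previous"], last["show_next"])    # True False
```

With 25 products and 10 per page there are three pages (0, 1, 2), so Next is hidden only on page 2 and Previous only on page 0.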

Proper xpath Syntax for Extracting Two Text Values

I am trying to scrape a web page for the NAME OF COMPANY and the CITY AND STATE OF COMPANY values shown below.
I have an XPath snippet that matches both text elements at the same time:
//span[starts-with(@class,"text-align")]/text()[2]
This XPath pulls only the first text value (COMPANY NAME). How do I get the second text value (CITY, STATE)?
A snippet of the page's HTML looks like this:
<div>
<ul class="pv-top-card-v3--experience-list">
<li>
<a class="pv-top-card-v3--experience-list-item" href="#" data-control-name="position_see_more" data-ember-action="" data-ember-action-172="172">
<img src="https://media.licdn.com/dms/image/C4E0BAQFhA8h46hvabA/company-logo_100_100/0?e=1582761600&v=beta&t=VAeZqaGu3Lu6Ol_n5kiiI74FSRuSOZA1ggAI5qTVRjE" id="ember173" class="EntityPhoto-square-1 flex-shrink-zero ember-view">
<span id="ember174" class="text-align-left ml2 t-14 t-black t-bold full-width lt-line-clamp lt-line-clamp--multi-line ember-view" style="-webkit-line-clamp: 2"> THIS IS THE NAME OF A COMPANY
<!----></span>
</a>
</li>
<li>
<a class="pv-top-card-v3--experience-list-item" href="#" data-control-name="education_see_more" data-ember-action="" data-ember-action-176="176">
<img src="https://media.licdn.com/dms/image/C560BAQEr2uQX-x2EwQ/company-logo_100_100/0?e=1582761600&v=beta&t=aDbYLUDMvlS4DpwOLjOaQj3Dj60C_cYLC5UUvGoyld0" id="ember177" class="EntityPhoto-square-1 flex-shrink-zero ember-view">
<span id="ember178" class="text-align-left ml2 t-14 t-black t-bold full-width lt-line-clamp lt-line-clamp--multi-line ember-view" style="-webkit-line-clamp: 2"> THIS IS THE CITY AND STATE OF COMPANY
<!----></span>
</a>
</li>
</ul>
</div>
The XPath picks up the two span elements by their class. I can't use the span id attributes because they are dynamic and change with each page (one page per company).
Can someone advise how to extract the desired text?
Thanks.
Point to the li level and select the second li:
//ul/li[2]/a/span[starts-with(@class,"text-align")]
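The same idea can be tried with Python's standard library. ElementTree understands positional predicates such as li[2] but not starts-with(), so the class check is done in Python instead. The markup below is a trimmed, well-formed copy of the question's snippet (the img tags and long attributes are dropped so the XML parser accepts it), and span_text is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's markup.
HTML = """<div>
<ul class="pv-top-card-v3--experience-list">
<li><a href="#"><span class="text-align-left ml2 t-14"> THIS IS THE NAME OF A COMPANY
<!----></span></a></li>
<li><a href="#"><span class="text-align-left ml2 t-14"> THIS IS THE CITY AND STATE OF COMPANY
<!----></span></a></li>
</ul>
</div>"""


def span_text(root, position):
    """Text of the text-align span inside the li at `position` (1-based, as in XPath)."""
    for span in root.findall(f"./ul/li[{position}]/a/span"):
        # ElementTree has no starts-with(), so filter on the class attribute here.
        if span.get("class", "").startswith("text-align"):
            # Join all text and collapse whitespace, similar to normalize-space().
            return " ".join("".join(span.itertext()).split())
    return None


root = ET.fromstring(HTML)
print(span_text(root, 1))  # THIS IS THE NAME OF A COMPANY
print(span_text(root, 2))  # THIS IS THE CITY AND STATE OF COMPANY
```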

How to style one of the li elements generated with repeat.for in Aurelia?

<ul>
<li repeat.for="row of router.navigation" > ${row.title}
</li>
</ul>
In fact, I want to style one of the router buttons generated with repeat.for.
I want to give the left end of the navigation bar a border radius, like the one on its right end.
As a one-liner, you could bind your class attribute with a conditional expression:
<li repeat.for="row of router.navigation" class="${myBool ? 'a-class' : 'another-class'}">${row.title}</li>
You can bind class with string interpolation or with .bind syntax. See https://aurelia.io/docs/binding/class-and-style#class
UPDATE: Sorry, I should have read your other comment further down. If it's just for the first <li>, why not just use CSS?
#myUl > li:first-child {
   /* my CSS here */
}
You can identify the first repeated element with the $index context variable.
This would lead to something like this:
<ul>
<li repeat.for="row of router.navigation" >
<a if.bind="$index === 0" class="blah"> ${row.title} </a>
<a else class="another-class"> ${row.title} </a>
</li>
</ul>
If you need to do the styling on the <li> tag, the solution could be like this:
<ul>
<template repeat.for="row of router.navigation" >
<li if.bind="$index === 0" class="blah"> ${row.title} </li>
<li else class="another-class"> ${row.title} </li>
</template>
</ul>

Extract text inside anchor tag using xpath

I am trying to find out how many pages there are for a given search result on a site, so that I can scrape data from all the pages using lxml and XPath.
There is a pagination tab with the following structure:
Page: 1 2 3 ... 7 next
The HTML for it is something like:
<ul class="ulclass">
<li></li>
<li>
<span> You are on the first page</span>
"1"
</li>
<li>
<a href="link to second page">
<span></span>
"2"
</a>
</li>
<li>
</li>
...
<li>
<a href="link to last page">
<span></span>
"7"
</a>
</li>
</ul>
My approach is to extract the page numbers (1, 2, 3, ..., 7) so that I can repeat the scrape 7 times, once per page; otherwise it only scrapes the first page of results.
I have written the following XPath, but it does not return the correct page numbers:
xpath('//ul[@class="ulclass"]/li/a/text()')
If I expand your example into this,
<ul class="ulclass">
<li><span>You are on the first page</span>"1"</li>
<li><a><span></span>"2"</a></li>
<li><a><span></span>"3"</a></li>
<li><a><span></span>"4"</a></li>
<li><a><span></span>"5"</a></li>
<li><a><span></span>"6"</a></li>
<li><a><span></span>"7"</a></li>
</ul>
then, using Scrapy in Python, I get this:
>>> from scrapy.selector import Selector
>>> selector = Selector(text=open('temp.htm').read())
>>> selector.xpath('//ul[@class="ulclass"]/li/a/text()').extract()
['"2"', '"3"', '"4"', '"5"', '"6"', '"7"']
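"1" is missing from that output because the current page is not wrapped in an a element. If the goal is only to know how many pages to loop over, a sketch with Python's standard library can collect the number that follows each span and take the maximum. It assumes the markup looks like the question's (the number is the text node right after the span, quotes included), and page_numbers is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified pagination markup in the shape of the question's "Page: 1 2 3 ... 7".
HTML = """<ul class="ulclass">
<li><span>You are on the first page</span>"1"</li>
<li><a href="page2"><span></span>"2"</a></li>
<li><a href="page3"><span></span>"3"</a></li>
<li><a href="page7"><span></span>"7"</a></li>
</ul>"""


def page_numbers(markup):
    """Collect the page numbers shown in the pagination list, linked or not."""
    root = ET.fromstring(markup)
    numbers = []
    for span in root.findall("./li//span"):
        # The number is the text node right after the <span>, i.e. its .tail.
        label = (span.tail or "").strip().strip('"')
        if label.isdigit():
            numbers.append(int(label))
    return numbers


nums = page_numbers(HTML)
print(nums, max(nums))  # [1, 2, 3, 7] 7
```

The last page number then bounds the scraping loop, e.g. for page in range(1, max(nums) + 1).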

How to get an XPath to a picture link?

How do I write an XPath that extracts an image link? I want to extract this specific link:
http://insales.ru/images/bigpic.jpeg
I don't know how to specify the XPath. Do I have to include all the parent tags to get it, or can I go directly to the tag?
<div class="tango">
<div class="container-horizontal">
<div class="clip-horizontal">
<ul id="carousel" class="pagination jcarousel">
<li class="jcarousel-item">
<a rev=http://insales.ru/images/bigpic.jpeg href="http://insales.ru/images/pic2.jpeg">
<img src="http://insales.ru/images/thumb.jpeg">
</a>
</li>
</ul>
</div>
</div>
</div>
So my XPath would be:
//li[contains(@class, 'jcarousel-item jcarousel-item-horizontal jcarouse')]/a
Or do I have to include all the parent div tags?
//div[@class="tango"]//div[@class="container-horizontal"]//li[contains(@class, 'jcarousel-item jcarousel-item-horizontal jcarouse')]/a
Either way, neither of these XPaths works.
How do I specify an XPath that extracts this link: http://insales.ru/images/bigpic.jpeg
There are several options, but the shortest one is probably:
.//*[@id='carousel']/li/a/img
That expression selects the img element (the thumbnail). The bigpic.jpeg URL itself is stored in the rev attribute of the enclosing a, so to get it directly use .//*[@id='carousel']/li/a/@rev
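To check this with Python's standard library: ElementTree can follow the answer's path but cannot select an attribute node such as @rev, so the sketch finds the a element and reads rev with .get(). The markup is the question's snippet made well-formed (rev quoted, img self-closed); that adjustment is for the XML parser only and is not part of the original page.

```python
import xml.etree.ElementTree as ET

# The question's markup, made well-formed: rev is quoted and <img> is self-closed.
HTML = """<div class="tango">
<div class="container-horizontal">
<div class="clip-horizontal">
<ul id="carousel" class="pagination jcarousel">
<li class="jcarousel-item">
<a rev="http://insales.ru/images/bigpic.jpeg" href="http://insales.ru/images/pic2.jpeg">
<img src="http://insales.ru/images/thumb.jpeg"/>
</a>
</li>
</ul>
</div>
</div>
</div>"""

root = ET.fromstring(HTML)

# ElementTree's XPath subset has no attribute steps (no /@rev),
# so locate the <a> element and read the attribute from it.
link = root.find(".//*[@id='carousel']/li/a")
print(link.get("rev"))  # http://insales.ru/images/bigpic.jpeg

# The answer's path ending in /img lands on the thumbnail instead.
img = root.find(".//*[@id='carousel']/li/a/img")
print(img.get("src"))  # http://insales.ru/images/thumb.jpeg
```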
