XPath for extracting image src with style attribute with the help of Screaming Frog

My goal is to extract image URLs together with their style attribute (width and height) using Screaming Frog.
<p style="text-align: center;"><img alt="Scary games are all about submerging into unknown territories" src="//cdn01.x-plarium.com/browser/content/blog/images/2022/scary-games-2.webp" style="width: 640px; height: 426px;"></p>
I am adding the following XPath for custom extraction: //img[contains(@style)]/@src
But I am getting errors for this.
I would really appreciate any help.

Your XPath below contains an error:
//img[contains(@style)]/@src
The contains function expects two parameters, and you have passed only one (@style). Both parameters are strings; if the second string is a substring of the first, the function returns true, otherwise it returns false.
If you just want to check that the style attribute is present (with any value), the following will work:
//img[@style]/@src
If you want to check that the style attribute contains some particular string (e.g. 'width'), you want something like this:
//img[contains(@style, 'width')]/@src
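To try the same selection outside Screaming Frog, here is a minimal Python sketch. It uses only the standard library, whose ElementTree supports just a subset of XPath (there is no contains()), so the substring test is done in Python instead. The sample markup is the snippet from the question with the img tag self-closed so that it parses as XML, and the function names images_with_style and style_dimensions are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Snippet from the question; <img> is self-closed here (an assumption) so it parses as XML.
HTML = (
    '<p style="text-align: center;">'
    '<img alt="Scary games are all about submerging into unknown territories" '
    'src="//cdn01.x-plarium.com/browser/content/blog/images/2022/scary-games-2.webp" '
    'style="width: 640px; height: 426px;" />'
    '</p>'
)


def images_with_style(markup, needle="width"):
    """Rough equivalent of //img[contains(@style, needle)], returning (src, style) pairs."""
    root = ET.fromstring(markup)
    results = []
    for img in root.iter("img"):
        style = img.get("style", "")
        if needle in style:
            results.append((img.get("src"), style))
    return results


def style_dimensions(style):
    """Pull width/height pixel values out of an inline style string."""
    pairs = re.findall(r"(?<![-\w])(width|height)\s*:\s*(\d+)px", style)
    return {name: int(value) for name, value in pairs}


for src, style in images_with_style(HTML):
    print(src, style_dimensions(style))
```

With the XPath itself, Screaming Frog would need a second extractor such as //img[@style]/@style to return the style value next to the src.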

Related

Xpath find a div without children with specific text

I need to retrieve a div without children with given text. I have this html
<h1>Rest Object</h1>
<div style="background-color: transparent;">
<div>Title: Rest object</div>
<div>ID: 2</div>
<div>Title: Rest object Copy</div>
<div>Full text: This is the full text. ID: 2</div>
<div>Value: 0.564</div>
<div>Timestamp: 2017-06-14 11:35:40</div>
</div>
I want to find <div>ID: 2</div>. How? I tried
xpath=(//div)
and it returns the first div. I tried to use
xpath=(//div[not(div)])
and it returns
<div>Title: Rest object</div>.
UPDATE: Now I know I could use an index:
xpath=(//div[not(div)][2])
which returns
<div>ID: 2</div>.
What if I don't know the index?
One way to get the needed div is to use the starts-with() function:
//div[starts-with(.,'ID:')]
boolean starts-with(string, string) - Returns true if the first argument string starts with the second argument string; otherwise returns false
To restrict the search to div elements that have no child elements, you may use the count(*) function:
//div[starts-with(.,'ID:')][count(*)=0]
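The same two conditions can be checked in plain Python. This is only a sketch under assumptions: it uses the standard library's ElementTree, which has no starts-with() or count(), so both tests are written by hand, and the markup from the question is wrapped in a body element to give it a single root. The helper name leaf_divs_starting_with is hypothetical.

```python
import xml.etree.ElementTree as ET

# Markup from the question, wrapped in <body> (an assumption) so it has one root element.
HTML = """<body>
<h1>Rest Object</h1>
<div style="background-color: transparent;">
<div>Title: Rest object</div>
<div>ID: 2</div>
<div>Title: Rest object Copy</div>
<div>Full text: This is the full text. ID: 2</div>
<div>Value: 0.564</div>
<div>Timestamp: 2017-06-14 11:35:40</div>
</div>
</body>"""


def leaf_divs_starting_with(markup, prefix):
    """Rough equivalent of //div[starts-with(., prefix)][count(*)=0]."""
    root = ET.fromstring(markup)
    matches = []
    for div in root.iter("div"):
        has_no_children = len(div) == 0           # count(*) = 0
        string_value = "".join(div.itertext())    # the XPath string value of the node
        if has_no_children and string_value.startswith(prefix):
            matches.append(div)
    return matches


for div in leaf_divs_starting_with(HTML, "ID:"):
    print(div.text)
```

The div whose text merely contains "ID: 2" further in is skipped, because the test is on the start of the string.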

Xpath get element above

suppose I have this structure:
<div class="a" attribute="foo">
<div class="b">
<span>Text Example</span>
</div>
</div>
In XPath, I would like to retrieve the value of the attribute "attribute", given that I have the text inside: Text Example
If I use this XPath:
.//*[@class='a']//*[text()='Text Example']
it returns the span element, but I need div.a, because I need to get the value of the attribute through Selenium WebDriver.
Hey, there are a lot of ways to figure this out.
Let's say Text Example is given; you can identify the div using this text:
//span[text()='Text Example']/../.. --> if you know it is 2 levels up
OR
//span[text()='Text Example']/ancestor::div[@class='a'] --> if you don't know how many levels up this div is
The two XPaths above can be used if you want to identify the element through Text Example. If you don't need to go through this text, there is a simpler way to identify it directly:
//div[@class='a']
Your question itself already contains the answer:
but I need the div.a,
Try this:
driver.findElement(By.cssSelector("div.a")).getAttribute("attribute");
Use cssSelector for best results.
Or else try the following XPath (note that it also matches any class value that merely contains the letter a):
//div[contains(@class, 'a')]
If you want the attribute of the div.a whose descendant span contains some text, try the following:
driver.findElement(By.xpath("//div[@class = 'a' and descendant::span[text() = 'Text Example']]")).getAttribute("attribute");
Hope it helps :)
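For reference, the "div with class a that has a descendant span with this text" condition can be sketched without Selenium. This is an illustration only: it uses Python's standard-library ElementTree instead of WebDriver, the markup is wrapped in a body element so it has a single root, and attribute_of_ancestor is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Structure from the question, wrapped in <body> (an assumption) to form one XML document.
HTML = """<body>
<div class="a" attribute="foo">
<div class="b">
<span>Text Example</span>
</div>
</div>
</body>"""


def attribute_of_ancestor(markup, text, cls="a", attr="attribute"):
    """Rough equivalent of //div[@class=cls and descendant::span[text()=text]]/@attr."""
    root = ET.fromstring(markup)
    for div in root.iter("div"):
        if div.get("class") != cls:
            continue
        # descendant::span[text() = text]
        if any(span.text == text for span in div.iter("span")):
            return div.get(attr)
    return None


print(attribute_of_ancestor(HTML, "Text Example"))
```

Searching from the outer div down to the span avoids walking upwards, which ElementTree cannot do because its elements keep no reference to their parent.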

Not able to fetch data besides <Strong> tag in Robot Framework

I am trying to fetch the numeric value after the strong tag. As it is not a web element, I am not able to get the value 123456789 into a variable:
If I use Get Text xpath=//*[@id='referral-or-navinet-reference-number'] then the result is "Referral #: 123456789"
Please help me get only the numeric value into a variable.
HTML Code:
<td class="normal-text" id="referral-or-navinet-reference-number" align="right">
<strong>Referral #:</strong> 123456789
</td>
You can directly use Python's split method and take the part after the colon.
Like this:
x.split(":")[1].strip()  # x is the string returned by Get Text
http://www.tutorialspoint.com/python/string_split.htm
http://www.pythonforbeginners.com/dictionary/python-split
Hope it will help you :)
If the wanted value is the only text directly inside your td, you may use the following XPath:
//*[@id='referral-or-navinet-reference-number']/text()
This should return 123456789 (perhaps with some whitespace).
You can also use this XPath, which skips whitespace-only text nodes:
//td[@id="referral-or-navinet-reference-number"]/text()[normalize-space()]
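Both suggestions can be tried in a few lines of Python. This is a sketch under assumptions, not Robot Framework code: it parses the td from the question with the standard library's ElementTree, where the text that follows an element is exposed as that element's tail, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The <td> from the question, parsed on its own as a small XML fragment.
HTML = """<td class="normal-text" id="referral-or-navinet-reference-number" align="right">
<strong>Referral #:</strong> 123456789
</td>"""


def number_after_strong(markup):
    """Return the text node that follows <strong>, stripped of whitespace."""
    td = ET.fromstring(markup)
    strong = td.find("strong")
    # ElementTree stores the text after an element in its .tail attribute.
    return (strong.tail or "").strip()


def number_from_full_text(full_text):
    """The split approach: keep what follows the first colon."""
    return full_text.split(":", 1)[1].strip()


print(number_after_strong(HTML))
print(number_from_full_text("Referral #: 123456789"))
```

In a Robot Framework suite, the split variant could be applied to the result of Get Text, for example through an inline Python expression or a small custom keyword.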

What XPath do I need to extract the text inside a SPAN that is preceded by a specific label inside a STRONG, both inside a P?

What XPath do I need to extract the text inside a SPAN that is preceded by a specific label inside a STRONG, both inside a P?
For example, to extract website and email addresses from a page that looks like this:
<p>
<strong>Website:</strong>
<span>www.example.com</span>
</p>
<p>
<strong>Contact email:</strong>
<span>email@example.com</span>
</p>
This shall do:
//p/span[preceding::*[1][self::strong and . = 'Contact email:']]
Here, you are selecting all p/span elements whose first preceding element is a strong with the label Contact email:
Website:
//p/span[preceding::strong[1]/text()='Website:']
Email:
//p/span[preceding::strong[1]/text()='Contact email:']
It is also important to note that, by using the preceding axis as shown in the other two answers, the XPath will mistakenly return a span element from markup structured like the following:
<strong>Website:</strong>
<p>
<span>www.example.com</span>
</p>
You can use the preceding-sibling axis instead to avoid the mistake mentioned above:
//p/span[preceding-sibling::*[1][self::strong and . = 'Website:']]
The preceding-sibling axis only considers elements that are located before the context element (the span in this case) and are siblings of it (share the same parent).
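The difference between the two axes can be checked with a short Python sketch. It is an illustration under assumptions: the standard library's ElementTree does not support the preceding or preceding-sibling axes, so the "immediately preceding sibling is a strong with this label" test is written by hand, the markup is wrapped in a body element, and span_after_label is a made-up name.

```python
import xml.etree.ElementTree as ET

# Markup from the question, wrapped in <body> (an assumption) to form one XML document.
GOOD = """<body>
<p>
<strong>Website:</strong>
<span>www.example.com</span>
</p>
<p>
<strong>Contact email:</strong>
<span>email@example.com</span>
</p>
</body>"""

# The problematic shape: <strong> sits outside the <p>, so it is not a sibling of the span.
BAD = """<body>
<strong>Website:</strong>
<p>
<span>www.example.com</span>
</p>
</body>"""


def span_after_label(markup, label):
    """Rough equivalent of //p/span[preceding-sibling::*[1][self::strong and . = label]]."""
    root = ET.fromstring(markup)
    for p in root.iter("p"):
        children = list(p)
        # Walk consecutive sibling pairs: (previous element, current element).
        for prev, cur in zip(children, children[1:]):
            if cur.tag == "span" and prev.tag == "strong" and prev.text == label:
                return cur.text
    return None


print(span_after_label(GOOD, "Website:"))
print(span_after_label(GOOD, "Contact email:"))
print(span_after_label(BAD, "Website:"))
```

For the second document nothing is returned, which is the behaviour of the preceding-sibling version; an expression built on the preceding axis would still have matched that span.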

scrapy xpath : selector with many <tr> <td>

Hello, I want to ask a question.
I scrape a website with XPath, and the result looks like this:
[u'<tr>\r\n
<td>address1</td>\r\n
<td>phone1</td>\r\n
<td>map1</td>\r\n
</tr>',
u'<tr>\r\n
<td>address1</td>\r\n
<td>telephone1</td>\r\n
<td>map1</td>\r\n
</tr>'...
u'<tr>\r\n
<td>address100</td>\r\n
<td>telephone100</td>\r\n
<td>map100</td>\r\n
</tr>']
Now I need to use XPath to process these results again.
I want to save the first cell to address, the second to telephone, and the last one to map,
but I can't get it to work.
Please guide me. Thank you!
Here is my code. It is wrong: it catches other things as well.
store = sel.xpath("")
for s in store:
    address = s.xpath("//tr/td[1]/text()").extract()
    tel = s.xpath("//tr/td[2]/text()").extract()
    map = s.xpath("//tr/td[3]/text()").extract()
As you can see in the Scrapy documentation, to work with relative XPaths you have to prefix the expression with a dot (.//) so that elements are extracted relative to the previous selection; otherwise you get all matching elements from the whole document again. The Scrapy documentation gives this example:
For example, suppose you want to extract all <p> elements inside <div> elements. First, you would get all <div> elements:
divs = response.xpath('//div')
At first, you may be tempted to use the following approach, which is wrong, as it actually extracts all <p> elements from the document, not only those inside <div> elements:
for p in divs.xpath('//p'): # this is wrong - gets all <p> from the whole document
    print(p.extract())
This is the proper way to do it (note the dot prefixing the .//p XPath):
for p in divs.xpath('.//p'): # extracts all <p> inside
    print(p.extract())
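So I think in your case your code must be something like the following. This assumes each item in store is itself a <tr>, as your printed result suggests; in that case the relative path starts at its td children, because .//tr/td[1] would look for a <tr> nested inside the <tr> and match nothing:
for s in store:
    address = s.xpath("./td[1]/text()").extract()
    tel = s.xpath("./td[2]/text()").extract()
    map = s.xpath("./td[3]/text()").extract()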
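The row-by-row idea can also be sketched without Scrapy. This is only a stand-in under assumptions: it uses Python's standard-library ElementTree, where a search started from an element is always relative to that element, the two-row table is invented sample data in the shape of the question's output, and rows_to_records is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Invented sample rows in the same shape as the scraped result in the question.
TABLE = """<table>
<tr>
<td>address1</td>
<td>telephone1</td>
<td>map1</td>
</tr>
<tr>
<td>address2</td>
<td>telephone2</td>
<td>map2</td>
</tr>
</table>"""


def rows_to_records(markup):
    """Turn each <tr> into a dict built from its own first three <td> cells."""
    root = ET.fromstring(markup)
    records = []
    for row in root.iter("tr"):
        # findall on the row only sees this row's cells, never the whole document.
        cells = [(td.text or "").strip() for td in row.findall("td")]
        if len(cells) >= 3:
            records.append({"address": cells[0], "tel": cells[1], "map": cells[2]})
    return records


for record in rows_to_records(TABLE):
    print(record)
```

Collecting one dict per row keeps each address with its own telephone and map values, which is what goes wrong when every query runs against the whole document.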
Hope this helps,
