Xpath: getting data by comparing attributes - xpath

I need to assign an XPath expression to a reference tag so that it generates automated text next to my reference. The generated text should be taken from the title of the target element (a figure).
This is how it looks.
Reference construction (could be located anywhere):
<internalRef internalRefId="fig1"></internalRef>
Figure construction (may be anywhere):
<figure id="fig1">
<title>The TEXT I TRY TO GET
</title>
...
</graphic>
</figure>
I guess I should take the content of the title tag inside the figure if the figure's id attribute matches the link's target attribute.
One of my failed expression variants, which prints nothing:
//figure[self/@internalRefId=@id]/title
Thanks for ideas...

You're searching for @internalRefId attributes inside some non-existent <self/> element. As you write that the <internalRef/> element "could be located anywhere", this should be fine:
//figure[//@internalRefId=@id]/title
This will return all title elements for figures that have an @id equal to any @internalRefId attribute anywhere in the document.
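To make the matching concrete, here is a rough Ruby sketch using REXML. It does not evaluate the XPath above; it spells out the same join in two plain steps (collect every internalRefId, then keep the figures whose id is in that set). The sample document, its <doc> and <para> wrappers, and the helper name referenced_figure_titles are assumptions made for illustration.

```ruby
require "rexml/document"

# Hypothetical sample; only the internalRef/figure shapes come from the question.
SAMPLE_XML = <<~XML
  <doc>
    <para>See <internalRef internalRefId="fig1"></internalRef></para>
    <figure id="fig1">
      <title>The TEXT I TRY TO GET
      </title>
    </figure>
    <figure id="fig2">
      <title>Unreferenced figure</title>
    </figure>
  </doc>
XML

# Return the titles of all figures whose id is the target of some internalRef.
def referenced_figure_titles(xml)
  doc = REXML::Document.new(xml)
  # Step 1: every internalRefId found anywhere in the document.
  ref_ids = REXML::XPath.match(doc, "//internalRef")
                        .map { |ref| ref.attributes["internalRefId"] }
  # Step 2: figures whose id is one of those values, reduced to their title text.
  REXML::XPath.match(doc, "//figure")
              .select { |fig| ref_ids.include?(fig.attributes["id"]) }
              .map { |fig| fig.elements["title"].text.strip }
end

p referenced_figure_titles(SAMPLE_XML)
```

Only the figure with id fig1 is referenced in this sample, so only its title is returned.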

Related

Parsing Meta Tag using xPath

How can I parse a Meta Tag such as
<meta itemprop="email" content="email@example.com" class="">
..and extract the email out of it.
When I copy the XPath of this tag, I get the following, which doesn't work:
//*[@id="businessDetailsPrimary"]/div[2]/div/meta
Please advise.
Many thanks
The likelihood is that the itemprop="email" attribute will be unique across the webpage. In this case, you can select the email by accessing the content attribute via its XPath as follows:
//meta[@itemprop="email"]/@content
In case itemprop="email" is not unique, you can make your XPath more specific by selecting the element with id equal to businessDetailsPrimary first:
//*[@id="businessDetailsPrimary"]//meta[@itemprop="email"]/@content

How do I retrieve innerhtml using watir webdriver

I have the following HTML, and I need to get the text that is outside of the bold tags. For instance, for 'Submitted At:' I need to get the timestamp that follows. You will see that 'Submitted At:' is surrounded by bold tags and the timestamp follows it, and I cannot retrieve it.
<body>
<h2> … </h2>
<b> … </b>
jenkins
<br></br>
<b> … </b>
<br></br>
<b> … </b>
…
<br></br>
<b> … </b>
<br></br>
<b>
Submitted At:
</b>
29-Jan-2016 17:12:24
Things I have tried:
@browser.body.text.split("\n")
@browser.body.split("\n")
body_html = Nokogiri::HTML.parse(@browser.body.html)
body_html.xpath("//body//b").text
returned: "User: JobName: JobConf: Job-ACLs: All users are allowedSubmitted At: Launched At: Finished At: Status: Analyse This Job"
I have tried several things such as xpath, plain old text retrieval, but I am not able to get what I need. I have also done several searches and can't find what I need.
To start with, html bereft of classes and ids is always going to provide a challenge. It is going to be even worse when you want to access text that is merely in the body tag.
In this specific instance, this should work:
browser.b(index: 4)
InnerHTML is literally what the name says: the content between an element's start and end tags. So the text you want is actually part of the inner HTML of the outer tag, <body>.
The .text of the <body> tag will give you the entire text. If the tags are dynamic, an index is not going to work. So if you know the timestamp length is always the same, get the entire text and take the substring that starts right after 'Submitted At:' and runs for the length of the timestamp. This is more stable than a hardcoded index value that may change.
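A minimal sketch of that substring approach in plain Ruby follows. The sample text stands in for what browser.body.text might return, and the helper name value_after is hypothetical; it assumes the value always has a fixed length, as described above.

```ruby
# Hypothetical page text, modeled on the HTML in the question.
BODY_TEXT = "User: jenkins\nSubmitted At: 29-Jan-2016 17:12:24\nLaunched At: 29-Jan-2016 17:12:30"

# Return the fixed-length value that follows `label`, or nil when the label is absent.
def value_after(text, label, length)
  start = text.index(label)
  return nil if start.nil?
  # Skip past the label, drop leading whitespace, then cut to the known length.
  text[(start + label.length)..-1].lstrip[0, length]
end

puts value_after(BODY_TEXT, "Submitted At:", "29-Jan-2016 17:12:24".length)
```

This breaks as soon as the timestamp format changes length, which is the trade-off against parsing the individual text nodes.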
The HTML appears to have a structure of:
a <b> tag that is the field description and
a following text node that is the field value.
Watir can only return the concatenation of all an element's text nodes. As a result, it does not deal well with this structure, which needs the text nodes separated. While you could parse the concatenated String, it could be error prone depending on the possible field descriptions/values.
I would therefore suggest parsing the HTML with Nokogiri as it can return individual text nodes. This would look like:
html = browser.html
doc = Nokogiri::HTML(html)
p doc.at_xpath('//b[normalize-space(text()) = "Submitted At:"]
/following-sibling::text()[1]').text.strip
#=> "29-Jan-2016 17:12:24"
Here we are using an XPath to find the <b> tag that contains the relevant field description, "Submitted At:". From that node, we find the text node, ie the "29-Jan-2016 17:12:24", that comes right after it.

Trouble accessing a text with XPath query

I have this html snippet
<div id="overview">
<strong>some text</strong>
<br/>
some other text
<strong>more text</strong>
TEXT I NEED IS HERE
<div id="sub">...</div>
</div>
How can I get the text I am looking for (shown in caps)?
I tried this, but I get an error message saying it is not able to locate the element:
"//div[@id='overview']/strong[position()=2]/following-sibling"
I tried this and I get the div with id=sub, but not the text (correctly so):
"//div[@id='overview']/*[preceding-sibling::strong[position()=2]]"
Is there any way to get the text, other than doing some string matching or regex on the contents of the overview div?
Thanks.
following-sibling is the axis, you still need to specify the actual node (in your example the XPath processor is searching for an element named following-sibling). You separate the axis from the node with ::.
Try this:
//div[@id='overview']/strong[position()=2]/following-sibling::text()[1]
This specifies the first text node after the second strong in the div.
If you always want the text immediately preceding the <div id="sub"> then you could try
//div[@id='sub']/preceding-sibling::text()[1]
That would give you everything between the </strong> and the opening <div ..., i.e. the upper case text plus its leading and trailing new lines and whitespace.

How can i solve this "The name attribute is obsolete. Consider putting an id attribute on the nearest container instead."

I have found a plugin for one-page scrolling where I have to add an attribute like the one below so that the scroll takes effect when clicking on the menu, BUT the W3C validator reports an error there.
First, this is the format the code requires:
<a name="aboutus"></a>
I have tried this way too:
<a name="http://www.domain.com/newcopy/responsive/index.html#aboutus"></a>
but with no success. Please help.
Also, one more error: "Element img is missing required attribute src."
<img width="300" height="200" data-original="img/port9.jpg" alt="Portfolio 4" class="lazy imp-responsive">
I have added this code so that lazy loading works.
In HTML5 (CR):
The a element must not have a name attribute (however, its use is under some circumstances obsolete but conforming). Instead, use the global id attribute.
The img element must have the src attribute.

How do I add an image to an item in RSS 2.0?

Is there a way to send only an Image with a link and some alt text for each item in an RSS feed?
I looked at the enclosure tag but this is only for videos and music.
The enclosure element can be used to transmit pictures. The RSS 2.0 spec is quite clear about that, saying that the type is a MIME type. It does not say it is restricted to audio or video.
Here's an example: a set of photo feeds from Agence France Presse
One solution is to use CDATA in the description:
<![CDATA[
Image inside RSS
<img src="http://example.com/img/smiley.gif" alt="Smiley face">
]]>
Note that you may have a problem with sites that prevent hotlinking.
This is possible in RSS 2; see
http://cyber.law.harvard.edu/rss/rss.html#ltenclosuregtSubelementOfLtitemgt
So you have to use the enclosure tag to add media.
You should use the enclosure tag within item to include the image. You can use it for images by setting the correct Mime Type (for example: image/jpeg) and including the image size as the "length" attribute. The length attribute doesn't need to be completely accurate but it's required for the RSS to be considered valid.
Here's a helpful article that discusses this and other options.
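As a rough illustration of the enclosure approach, the Ruby sketch below builds one <item> with an image enclosure. The helper name image_item, the title and link values, and the byte length of 1024 are assumptions for illustration; only the url, length and type attributes come from the answers above.

```ruby
# Build one RSS 2.0 <item> that carries an image as an enclosure.
# String#encode(xml: :text) escapes &, < and >; xml: :attr also escapes
# double quotes and wraps the value in double quotes.
def image_item(title:, link:, image_url:, mime_type:, byte_length:)
  <<~XML
    <item>
      <title>#{title.encode(xml: :text)}</title>
      <link>#{link.encode(xml: :text)}</link>
      <enclosure url=#{image_url.encode(xml: :attr)} length="#{byte_length}" type="#{mime_type}" />
    </item>
  XML
end

puts image_item(
  title: "Smiley face",
  link: "http://example.com/",
  image_url: "http://example.com/img/smiley.gif",
  mime_type: "image/gif",
  byte_length: 1024 # hypothetical size in bytes
)
```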
To work with the Mailchimp RSS to email feature, they expect the image to be specified in a <media:content> element inside <item>. This is their source for the feed item's image macro in their templates.
Thus, you need to add the following to the declarations:
xmlns:media="http://search.yahoo.com/mrss/"
Then inside the <item> element add
<media:content medium="image" url="http://whatever/foo.jpg" width="300" height="201" />
Without the extra declaration, the feed is invalid since media:content is not a known element.
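To show where the declaration and the element sit relative to each other, here is a small Ruby sketch that holds a minimal feed as a string and parses it with REXML to confirm that the media prefix resolves. The channel and item titles, links and description are hypothetical filler, and media_namespace is a made-up helper name; the media:content attributes are the placeholder values from above.

```ruby
require "rexml/document"

# Hypothetical minimal feed: the namespace is declared on <rss>,
# and <media:content> appears inside <item>.
FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
      <title>Example feed</title>
      <link>http://whatever/</link>
      <description>Feed with one image item</description>
      <item>
        <title>Foo</title>
        <link>http://whatever/foo</link>
        <media:content medium="image" url="http://whatever/foo.jpg" width="300" height="201" />
      </item>
    </channel>
  </rss>
XML

# Return the namespace URI bound to the "media" prefix on the root element.
def media_namespace(feed_xml)
  REXML::Document.new(feed_xml).root.namespace("media")
end

puts media_namespace(FEED)
```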
Inside the <item> tag:
<image:image xmlns:image="http://web.resource.org/rss/1.0/modules/image/">
http://domain.com/image.jpg
</image:image>
Inside the <description> tag:
<![CDATA[
Some Text..
<br/><img src='http://domain.com/image.jpg' ><br/>
More Text
]]>
Regarding the <p> tag issue, you need to encode the HTML within the XML.
Your code would look something like this:
<description>&lt;p&gt; Text in the tag &lt;/p&gt;</description>
Since you are using PHP, you can use htmlentities() to encode the HTML tags. They look horrible in the XML, but RSS readers know what to do with them.
http://php.net/manual/en/function.htmlentities.php
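For comparison, here is the same idea as a short Ruby sketch. It is an analog of the PHP call rather than the same function: String#encode with xml: :text only escapes &, < and >, whereas htmlentities() converts many more characters. The helper name description_element is hypothetical.

```ruby
# Escape HTML markup so it can sit inside an RSS <description> as character data.
def description_element(html)
  "<description>#{html.encode(xml: :text)}</description>"
end

puts description_element("<p> Text in the tag </p>")
# prints: <description>&lt;p&gt; Text in the tag &lt;/p&gt;</description>
```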

Resources