Parsing Meta Tag using XPath

How can I parse a meta tag such as
<meta itemprop="email" content="email@example.com" class="">
and extract the email address from it?
When I copy the XPath of this tag, I get the following, which doesn't work:
//*[@id="businessDetailsPrimary"]/div[2]/div/meta
Please advise.
Many thanks

Most likely, the itemprop="email" attribute is unique across the webpage. In that case, you can select the email address by reading the content attribute with the following XPath:
//meta[@itemprop="email"]/@content
If itemprop="email" is not unique, you can make the XPath more specific by first selecting the element whose id is businessDetailsPrimary:
//*[@id="businessDetailsPrimary"]//meta[@itemprop="email"]/@content
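As a rough offline check of the two expressions above, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. Its XPath subset has no attribute step like /@content, so the sketch selects the meta elements and reads content with .get(). The PAGE markup and the meta_emails helper are hypothetical: the markup is a well-formed stand-in for the real page (a real HTML page would normally need an HTML parser), with a second, unrelated meta added to show why scoping by id can matter.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup modelled on the question: one meta inside
# #businessDetailsPrimary and a second, unrelated one elsewhere on the page.
PAGE = """
<html>
  <body>
    <div id="businessDetailsPrimary">
      <div>Business name</div>
      <div>
        <div><meta itemprop="email" content="email@example.com" class=""/></div>
      </div>
    </div>
    <div id="footer">
      <meta itemprop="email" content="other@example.com"/>
    </div>
  </body>
</html>
"""


def meta_emails(root, scoped=False):
    """Return the content attribute of every meta[@itemprop='email'].

    ElementTree's limited XPath cannot select attributes (no /@content),
    so the attribute is read with Element.get() instead.
    """
    path = ".//meta[@itemprop='email']"
    if scoped:
        # Equivalent of //*[@id="businessDetailsPrimary"]//meta[@itemprop="email"]
        path = ".//*[@id='businessDetailsPrimary']//meta[@itemprop='email']"
    return [meta.get("content") for meta in root.iterfind(path)]


root = ET.fromstring(PAGE)
print(meta_emails(root))               # ['email@example.com', 'other@example.com']
print(meta_emails(root, scoped=True))  # ['email@example.com']
```

The unscoped expression matches every email meta in document order, while the scoped one only looks inside the businessDetailsPrimary element.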

Related

Unable to extract the data through XPath or CSS selector

When I use Inspect Element or View Page Source, the required data is present on the page, but when I extract it using XPath or CSS selectors, I get an empty list. I even tried extracting all the nodes and their content, but the data shown in View Page Source is still not extracted. What could be the reason?
Below is the example markup; I need to extract the href value from the <a> tag:
<div class="url-link">
<a data-id="abc" class="abc xyz" data-is-avod="" href="/ab/extract/xyz/3&t=25">Title</a>
<span>title</span>
</div>
I used response.xpath('//div/a/@href').extract(), but I am unable to extract the desired content.
After some analysis, I found that Inspect Element and View Page Source show this <a> tag only when I am logged in to the website. So I think that to get the href value I need to submit the login form, but I don't know how to submit a form or how to find out the form's details.
Please help.
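One way to confirm the diagnosis above is to run the question's own selector against the quoted markup offline. The minimal sketch below uses Python's standard-library xml.etree.ElementTree; both pages and the extract_hrefs helper are hypothetical, with the quoted <div> wrapped in a minimal page and a second copy that lacks the <a> tag, standing in for what an anonymous request might receive. It shows the selector finds the href when the tag is present, so an empty list from the live response suggests the tag is missing from what was downloaded. Logging in itself is outside this sketch; in Scrapy that is commonly done with FormRequest.from_response.

```python
import xml.etree.ElementTree as ET

# The <div> from the question, wrapped in a minimal page. "&" is escaped as
# "&amp;" only because ElementTree is an XML (not HTML) parser.
LOGGED_IN_PAGE = """
<html><body>
  <div class="url-link">
    <a data-id="abc" class="abc xyz" data-is-avod="" href="/ab/extract/xyz/3&amp;t=25">Title</a>
    <span>title</span>
  </div>
</body></html>
"""

# Hypothetical anonymous version of the same page: the <a> tag is absent.
ANONYMOUS_PAGE = """
<html><body>
  <div class="url-link">
    <span>title</span>
  </div>
</body></html>
"""


def extract_hrefs(markup):
    """Rough equivalent of the question's //div/a/@href."""
    root = ET.fromstring(markup)
    return [a.get("href") for a in root.iterfind(".//div/a")]


print(extract_hrefs(LOGGED_IN_PAGE))  # ['/ab/extract/xyz/3&t=25']
print(extract_hrefs(ANONYMOUS_PAGE))  # []
```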

How to get meta tag content value in Watir?

I'm not able to get the content value of a meta tag from a site in Ruby using the watir-webdriver gem.
For example:
<meta property="og:title" content="【楽天市場】ダヴ メンプラスケア クリーンコンフォート泡洗顔 つめかえ用(110mL)【unili3e102】【ダヴ(Dove)】[ダヴ 洗顔]:爽快ドラッグ">
The problem with browser.meta(:property, 'og:title').content is that "property" is not a valid attribute for meta tags. As a result, Watir does not allow it as a locator method.
To locate elements via unsupported attributes, you will need to use a CSS selector:
browser.meta(css: 'meta[property="og:title"]').content
Or use XPath:
browser.meta(xpath: '//meta[@property="og:title"]').content

XPath - Selecting attributes using starts-with

I am trying to write an XPath expression that selects all div elements whose id attribute starts with CompanyCalendar. Below is a snippet of the HTML I am looking at:
<td class="some class" align="center" onclick="Calendar_DayClicked(this,'EventCont','Event');">
<span class="Text"></span>
<div id="CompanyCalendar02.21" class="Pop CalendarClick" style="right: 200px; top: 235px;"></div>
There are multiple divs with an id like CompanyCalendar02.21, but the id changes for each new month in the calendar; for example, the next month would be CompanyCalendar02.22. I would like to select all of the divs whose id matches CompanyCalendar*.
I am rather new at this, so I tried adapting some examples from the web to get my XPath expression to work, but without success. Any help would be greatly appreciated.
The following expression is perhaps what you are looking for:
//div[starts-with(@id,'CompanyCalendar')]
What it does, in plain English, is:
Return all div elements in the document that have an id attribute whose value starts with "CompanyCalendar".
When checking this in the browser console with $x(), it worked only after flipping the quotes, i.e. using double quotes inside the starts-with() call, because the whole XPath is already wrapped in single quotes:
$x('//div[starts-with(@id,"CompanyCalendar")]')
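Python's standard-library xml.etree.ElementTree supports only a small XPath subset and has no starts-with(), so the minimal sketch below emulates //div[starts-with(@id,'CompanyCalendar')] with a plain Python filter. The calendar markup and the calendar_divs helper are hypothetical, built from the id pattern in the question plus one non-matching div for contrast.

```python
import xml.etree.ElementTree as ET

# Hypothetical calendar fragment using the id pattern from the question.
CALENDAR = """
<table><tr>
  <td class="some class" align="center">
    <span class="Text"></span>
    <div id="CompanyCalendar02.21" class="Pop CalendarClick"></div>
  </td>
  <td class="some class" align="center">
    <div id="CompanyCalendar02.22" class="Pop CalendarClick"></div>
    <div id="OtherWidget" class="Pop"></div>
  </td>
</tr></table>
"""


def calendar_divs(root, prefix="CompanyCalendar"):
    """Emulate //div[starts-with(@id, prefix)].

    A div without an id is skipped, just as starts-with() on a missing
    attribute compares against the empty string and fails for a non-empty prefix.
    """
    return [div for div in root.iter("div") if div.get("id", "").startswith(prefix)]


root = ET.fromstring(CALENDAR)
print([div.get("id") for div in calendar_divs(root)])
# ['CompanyCalendar02.21', 'CompanyCalendar02.22']
```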

XPath: getting data by comparing attributes

I need to assign an XPath expression to a reference tag, which will generate automatic text next to my reference. The generated text should be taken from the title of the target element (figure).
This is how it looks:
Reference construct (could be located anywhere):
<internalRef internalRefId="fig1"></internalRef>
Figure construct (may be anywhere):
<figure id="fig1">
<title>The TEXT I TRY TO GET
</title>
...
</graphic>
</figure>
I guess I should take the content of the title inside the figure if the figure's id attribute matches the link's target attribute (internalRefId).
One of my failed expression variants, which returns nothing:
//figure[self/@internalRefId=@id]/title
Thanks for ideas...
You're searching for @internalRefId attributes inside a non-existent <self/> element. Since you write that the <internalRef/> element "could be located anywhere", this should be fine:
//figure[//@internalRefId=@id]/title
This returns all title elements of figures whose @id equals any @internalRefId attribute anywhere in the document.
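The expression above relies on XPath's node-set comparison: //@internalRefId=@id is true when the figure's id equals at least one internalRefId value anywhere in the document. Python's xml.etree.ElementTree cannot express that comparison, so the minimal sketch below emulates it in two steps: collect the referenced ids, then keep the titles of figures whose id is in that set. The document markup (including the unreferenced fig2) and the referenced_titles helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: fig1 is referenced, fig2 is not.
DOC = """
<doc>
  <para>See <internalRef internalRefId="fig1"></internalRef>.</para>
  <figure id="fig1">
    <title>The TEXT I TRY TO GET</title>
  </figure>
  <figure id="fig2">
    <title>Unreferenced figure</title>
  </figure>
</doc>
"""


def referenced_titles(root):
    """Emulate //figure[//@internalRefId=@id]/title and return the title texts."""
    # Step 1: every internalRefId value on any element in the document.
    ref_ids = {el.get("internalRefId") for el in root.iter() if "internalRefId" in el.attrib}
    # Step 2: titles of figures whose id matches any of those values.
    return [
        (title.text or "").strip()
        for figure in root.iter("figure")
        if figure.get("id") in ref_ids
        for title in figure.findall("title")
    ]


root = ET.fromstring(DOC)
print(referenced_titles(root))  # ['The TEXT I TRY TO GET']
```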

How can HtmlAgilityPack extract text from an HTML node whose class attribute is appended dynamically?

Dear friends, I want to extract the text 平均3.6 星 ("average 3.6 stars") from this code segment excerpted from amazon.cn.
<div class="content"><ul>
<li><b>用户评分:</b>
<span class="crAvgStars" style="white-space:no-wrap;">
<span class="asinReviewsSummary" ref="dp_db_cm_cr_acr_pop_" name="B004GUSIKO">
<a>
<span class="swSprite s_star_3_5 " title="平均3.6 星">
<span>平均3.6 星</span>
</span>
</a>
My problem is that the span class value "s_star_3_5 " varies with each customer's rating level and is appended dynamically. So I tried doc.DocumentNode.SelectSingleNode("//span[@class='swSprite']").InnerText and //span[@class='swSprite s_star_3_5 '], but the result is either an error or not what I want!
Any suggestions?
First of all, I suggest saving the value of doc.DocumentNode.OuterHtml to a local .html file and checking whether the markup you are receiving is actually this code. Sometimes you start parsing a website with HtmlAgilityPack, but the very first problem is that you are not receiving the HTML you expect: maybe you are getting a 404 error, a redirect, etc.
I'm suggesting this because I tested //span[@class='swSprite s_star_3_5 '] and it worked correctly.
That was the issue in the following questions:
Selecting nodes that have an attribute with spaces using HTMLAgilityPack
XPath Query Problem using HTML Agility Pack
If that doesn't help, post the HTML code and I'll help you ;)
This works for me:
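using System;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(myHtml); // myHtml holds the page markup as a string; doc.Load(path) reads a file instead
HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[starts-with(@class, 'swSprite')]");
Console.WriteLine("Text=" + node.InnerText.Trim());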
and outputs
Text=平均3.6 星
Note that I use the XPath starts-with() function, so the extra, dynamically appended s_star_* class does not affect the match.
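For readers outside .NET, here is a minimal sketch of the same idea with Python's standard-library xml.etree.ElementTree. Since that library's XPath subset has no starts-with(), the sketch emulates //span[starts-with(@class, 'swSprite')] with a loop and, like SelectSingleNode, uses only the first match. The markup is the question's fragment, closed so that it is well-formed XML; the star_rating_text helper is hypothetical, and itertext() is only an approximation of HtmlAgilityPack's InnerText.

```python
import xml.etree.ElementTree as ET

# The rating fragment from the question, closed so that it is well-formed XML.
RATING = """
<div class="content"><ul>
  <li><b>用户评分:</b>
    <span class="crAvgStars" style="white-space:no-wrap;">
      <span class="asinReviewsSummary" ref="dp_db_cm_cr_acr_pop_" name="B004GUSIKO">
        <a>
          <span class="swSprite s_star_3_5 " title="平均3.6 星">
            <span>平均3.6 星</span>
          </span>
        </a>
      </span>
    </span>
  </li>
</ul></div>
"""


def star_rating_text(root):
    """Emulate //span[starts-with(@class, 'swSprite')] and return its trimmed text.

    Only the first matching span is used; None means nothing matched.
    """
    for span in root.iter("span"):
        if span.get("class", "").startswith("swSprite"):
            # itertext() concatenates the text of the span and all its descendants.
            return "".join(span.itertext()).strip()
    return None


root = ET.fromstring(RATING)
print(star_rating_text(root))  # 平均3.6 星
```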
