IMPORTXML and XPath in Google Sheets - xpath

How can I get the page count from the HTML below?
One problem is that I never know how many spans each book page will contain. Here there are just three, and the "pages" span is the second in the list, but it can be at any position, so I can't simply use //p[@class='book']//text()[2].
I need to extract "300" using the Google Sheets IMPORTXML function:
<p class="book">
<span>condition: <b>good</b></span>
<br>
<span>pages: <b>300</b></span>
<br>
<span>color: <b>red</b></span>
<br>
</p>
I tried adding
[contains('pages: ')]
but without success.
Any suggestions?
P.S. //p[@class='book']//text() by itself
returns
condition:
good
pages:
300
color:
red

So you look for a span that starts with 'pages:' and then take the value from it:
//p[@class='book']/span[starts-with(., 'pages:')]/b/text()
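The selection logic of that expression can be checked offline. The following is a minimal Python sketch, not part of the original answer. It uses the standard-library ElementTree module, which supports only a subset of XPath (no starts-with()), so the prefix test is done in Python. The helper name field_value is made up for illustration, and <br> is written as <br/> so the XML parser accepts the snippet.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, with <br> written as <br/> so that the
# strict XML parser accepts it (an adjustment made for this sketch).
BOOK_HTML = """
<p class="book">
<span>condition: <b>good</b></span>
<br/>
<span>pages: <b>300</b></span>
<br/>
<span>color: <b>red</b></span>
<br/>
</p>
"""


def field_value(p, label):
    """Return the <b> text of the first child span whose text starts with label."""
    for span in p.findall("span"):
        if (span.text or "").startswith(label):
            b = span.find("b")
            return b.text if b is not None else None
    return None  # no span with that label


book = ET.fromstring(BOOK_HTML)
pages = field_value(book, "pages:")
print(pages)  # 300
```

Because the span is found by its label rather than its position, the result stays the same if spans are added or reordered.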

Related

ImportXML function in Google Dynamic XML path

I am trying to import the headlines and landing page URL's from "New + Updated" section of this page:
https://www.nytimes.com/wirecutter/
The issue is that the class "_988f698c" keeps changing as the headline is being replaced with a new headline/topic.
I need a workaround so that IMPORTXML captures the object in that position whatever its class currently is. The current formula is:
=IMPORTXML("https://www.nytimes.com/wirecutter/","//*[@class='_988f698c']")
Here is the HTML, for example. The class "_988f698c" changes every hour or so as new headlines come in.
<li class="e9a6bea7">
<a class="_988f698c" href="https://www.nytimes.com/wirecutter/reviews/gir-spatula-review/">Why We Love GIR Spatulas</a>
<p class="_9d1f22a9">today
</p>
</li>
Is there a way I can do this?
Step back up the tree and look for an alternative path that does not depend on the randomly generated class names.
For the title, use:
=IMPORTXML(
"https://www.nytimes.com/wirecutter/",
"//ul[@data-testid='new-and-updated']/li/a"
)
For the URL attached to the title:
=IMPORTXML(
"https://www.nytimes.com/wirecutter/",
"//ul[@data-testid='new-and-updated']/li/a/@href"
)
For the text indicating the day of publication:
=IMPORTXML(
"https://www.nytimes.com/wirecutter/",
"//ul[@data-testid='new-and-updated']/li/p"
)
If you want to collect everything together, use | to join the paths:
=IMPORTXML(
"https://www.nytimes.com/wirecutter/",
"//ul[@data-testid='new-and-updated']/li/a |
//ul[@data-testid='new-and-updated']/li/a/@href |
//ul[@data-testid='new-and-updated']/li/p"
)
Only use this if you are absolutely sure that the values will always exist. If one is missing, the remaining values shift position in the sheet rows, which breaks any formulas that expect a fixed value in each cell.
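To illustrate the alignment problem outside of Sheets, here is a small Python sketch using the standard-library ElementTree module. It is only an illustration of the idea, not something IMPORTXML does: it walks each <li> under the stable data-testid attribute and fills in an empty string when a value is missing, so every row keeps three columns. The first <li> is the one from the question; the wrapping <div> and <ul> and the second <li> (which has no <p>) are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# First <li> copied from the question. The wrapper elements and the
# second <li> (without a <p>) are hypothetical test data.
PAGE = """
<div>
<ul data-testid="new-and-updated">
<li class="e9a6bea7">
<a class="_988f698c" href="https://www.nytimes.com/wirecutter/reviews/gir-spatula-review/">Why We Love GIR Spatulas</a>
<p class="_9d1f22a9">today
</p>
</li>
<li class="e9a6bea7">
<a class="_1a2b3c4d" href="https://example.com/hypothetical-item/">Hypothetical headline</a>
</li>
</ul>
</div>
"""

root = ET.fromstring(PAGE)
rows = []
# Select by the stable data-testid attribute, never by the random classes.
for li in root.findall(".//ul[@data-testid='new-and-updated']/li"):
    a = li.find("a")
    p = li.find("p")
    rows.append((
        a.text.strip() if a is not None and a.text else "",
        a.get("href", "") if a is not None else "",
        p.text.strip() if p is not None and p.text else "",  # "" when <p> is missing
    ))

for row in rows:
    print(row)
```

Each tuple holds title, URL, and date for one list item, so a missing date leaves a gap in its own row instead of pulling values up from the next item.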

How to get intext info between tags with Xpath?

I am trying to retrieve an article number and some other data with XPath, where the ID sits inside a div tag surrounded by other HTML tags and text:
<div class="description">
<span class="product-name"></span><br>
details<br>
company<br>
Art.-Nr. (article): 1686382
<div class="product-icons"></div>
</div>
My Xpath looks like this
>>> response.xpath('//div[@id="product-list"]/div[1]/form/div[2]/div[2]').extract_first()
response:
'<div class="description">\n<span class="product-name"><b>Salviathymol N Madaus</b></span><br>\nTropfen, 100 Milliliter, N3<br>\nMEDA Pharma GmbH & Co. KG<br>\nArt.-Nr. (PZN): 11548439\n<div class="product-icons">\n<div class="rating"><span>(13)</span></div>\n</div>\n</div>'
How can I retrieve the three lines of data (details, company, article no)?
Your current code returns the element node rather than the text. To get the text, you have to point to the text node using text().
That is why the line below, which ends in a text() step, extracts text:
response.xpath('//div[@id="product-list"]/div[1]/form/div[2]/div[2]/br[3]/following-sibling::text()').extract_first()
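For all three lines at once, the same idea (take the text node that follows each <br>) can be sketched in plain Python. This is an illustration with the standard-library ElementTree module rather than the Scrapy selector used in the question. ElementTree has no text() step; it stores the text that follows an element in that element's .tail attribute. The markup is the placeholder snippet from the question with <br> written as <br/> so the XML parser accepts it.

```python
import xml.etree.ElementTree as ET

# Placeholder snippet from the question, made well-formed for the XML parser.
DESCRIPTION = """
<div class="description">
<span class="product-name"></span><br/>
details<br/>
company<br/>
Art.-Nr. (article): 1686382
<div class="product-icons"></div>
</div>
"""

div = ET.fromstring(DESCRIPTION)

# The text node directly after each <br> is that element's .tail.
lines = [(br.tail or "").strip() for br in div.findall("br")]
print(lines)
```

With Scrapy, a comparable approach would be to select the text nodes after every br and strip each result, but that variant is not tested here.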

Xpath Google Sheets counting class icon from html

Relatively new to Xpath using google sheets. I am trying to get scores from a movie website where the score is out of five stars with images used for the stars, so I need to count the class icon-star-full from the HTML below
<span class="rating "><i class="icon-star-full"></i> <i class="icon-star-full"></i> <i class="icon-star-full"></i> <i class="icon-star-full"></i> <i class="icon-star"></i></span>
In Google Sheets, the count function seems to work fine for every class I try except icon-star-full. For example, count(//*[@class='rating']) works fine and gives me a count of every element with the class rating. However, count(//*[@class='icon-star-full']) returns 0 on every page. For example, for the HTML above I should get 4 for my count, but it's 0.
Is there a different way I should be doing the count for icons?
try:
=COUNTA(IFERROR(QUERY(ARRAY_CONSTRAIN(IMPORTDATA(A1), 1000, 1),
"where Col1 contains 'icon-star-full'")))&"/5"
where A1 is your URL
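As a cross-check of what the count should be for the snippet in the question, here is a short Python sketch with the standard-library ElementTree module. It counts matching elements, whereas the formula above counts rows of raw page source that contain the string. It only covers the static HTML shown in the question and does not explain why IMPORTXML returns 0 on the live page.

```python
import xml.etree.ElementTree as ET

# The rating markup from the question.
RATING = (
    '<span class="rating ">'
    '<i class="icon-star-full"></i> <i class="icon-star-full"></i> '
    '<i class="icon-star-full"></i> <i class="icon-star-full"></i> '
    '<i class="icon-star"></i></span>'
)

span = ET.fromstring(RATING)
full = len(span.findall("i[@class='icon-star-full']"))  # filled stars
total = len(span.findall("i"))                          # all star icons
score = f"{full}/{total}"
print(score)  # 4/5
```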

Need a little push to build an Xpath expression

The XPath expressions below work properly when tested individually. However, when I apply them
to each item of a stored selection, as in the structure below, the results come out
disorganized. Please excuse any language mistakes.
Storage=xpath('//div[@class="info"]')
for item in Storage:
    Name=item.xpath('//span[@itemprop="name"]/text()')
    Address=item.xpath('//span[@itemprop="streetAddress" and @class="street-address"]/text()')
    Phone=item.xpath('//div[@itemprop="telephone" and @class="phones phone primary"]/text()')
My question is: how do I build the XPath expressions for "Name", "Address", and "Phone" so that they are evaluated
against each item taken from "Storage", as I tried to do above? Thanks.
Here is the HTML element for that expression, if needed.
<div class="info"><h2 class="n">36. <span itemprop="name">The Coffee Table Eagle Rock</span></h2><div data-tripadvisor="{"rating":"4.0","count":"11"}" data-israteable="true" class="info-section info-primary"><div class="result-rating three half "><span class="count">(5)</span></div><div class="ta-rating extra-rating ta-4-0"></div><span class="ta-count">(11)</span><p itemscope="" itemtype="http://schema.org/PostalAddress" itemprop="address" class="adr"><span itemprop="streetAddress" class="street-address">1958 Colorado Blvd</span><span itemprop="addressLocality" class="locality">Los Angeles, </span><span itemprop="addressRegion">CA</span> <span itemprop="postalCode">90041</span></p><div itemprop="telephone" class="phones phone primary">(323) 255-2200</div></div><div class="info-section info-secondary"><div class="categories">Coffee & Espresso RestaurantsBars</div><div class="links">WebsiteMenu</div><a data-analytics="{"adclick":true,"events":"event7,event6","category":"8004238","impression_id":"fbd98612-6b8a-43c2-b31e-fd579de20126","listing_id":"11287432","item_id":-1,"listing_type":"free","ypid":"11287432","content_provider":"MDM","srid":"L-webyp-1c6db222-cc63-48d8-90d1-2d5dc8754cca-11287432","item_type":"PUP","lhc":"8004238","ldir":"LA","rate":3.5,"hasTripAdvisor":true,"mip_claimed_staus":"mip_unclaimed","mip_ypid":"11287432","click_id":523,"listing_features":"orderonline"}" href="https://yellowpages.pingup.com/Bkm3xG?ypid=11287432&uvid=t3pfPllxtLYkH2dlkSbiCC1marvZprsz1YhqhycO80NYrDv0OMX3uTJ3ryFG464RywmpWCrB&source=web-prod" rel="nofollow" target="_blank" class="action order-online" data-impressed="1">Order Online</a></div><div class="preferred-listing-features"></div><div class="snippet"><figure class="avatar-1 color-1"></figure><p class="body with-avatar">I went here recently with my 2 year old for breakfast. I got the Silverlake omelet and the breakfast sandwich for my son. The food was great (especi…</p></div></div>
To get child or descendant elements of an already selected item, start the path with .// so that it is evaluated relative to the current element ("item"). A leading // starts from the document root instead, so every iteration matches the whole page. Try the following:
Storage=xpath('//div[@class="info"]')
for item in Storage:
    Name=item.xpath('.//span[@itemprop="name"]/text()')
    Address=item.xpath('.//span[@itemprop="streetAddress" and @class="street-address"]/text()')
    Phone=item.xpath('.//div[@itemprop="telephone" and @class="phones phone primary"]/text()')
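The effect of the relative path can be seen in a small self-contained Python sketch. It uses the standard-library ElementTree module instead of the library from the question, so the calls are find and findall rather than xpath, and the predicates are reduced to a single attribute. The first listing is trimmed from the question's HTML; the second listing and the wrapping <div> are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Listing 36 is trimmed from the question. Listing 37 is hypothetical.
PAGE = """
<div>
<div class="info"><h2 class="n">36. <span itemprop="name">The Coffee Table Eagle Rock</span></h2>
<span itemprop="streetAddress" class="street-address">1958 Colorado Blvd</span>
<div itemprop="telephone" class="phones phone primary">(323) 255-2200</div></div>
<div class="info"><h2 class="n">37. <span itemprop="name">Hypothetical Cafe</span></h2>
<span itemprop="streetAddress" class="street-address">1 Example St</span>
<div itemprop="telephone" class="phones phone primary">(000) 000-0000</div></div>
</div>
"""

root = ET.fromstring(PAGE)
records = []
for item in root.findall(".//div[@class='info']"):
    # The leading "." keeps every lookup inside the current item.
    name = item.find(".//span[@itemprop='name']").text
    address = item.find(".//span[@itemprop='streetAddress']").text
    phone = item.find(".//div[@itemprop='telephone']").text
    records.append((name, address, phone))

for record in records:
    print(record)
```

Each tuple contains only the values that belong to its own listing, which is the organized output the question was after.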

XPath for Google Results: <em> and description without date

I have 3 questions:
1) How can I use XPath to get the bold (<em>) text in the Google results? If there is no <em>, nothing should be shown.
2) =XPathOnUrl("https://www.google.de/search?q=KEYWORD&num=10";"//span[@class='st']") gives me the Google description, but how can I get the description without the <span class="f"> date?
3) The description shows � in place of "ä, ö, ü". How can these letters be displayed correctly?
HTML DOM CODE:-
<span class="st">
<span class="f">18.11.2009 - </span>
This Thursday 19th November
<em>Moonshine</em>
turns 4 years old. I'm proud to say that's 4 years of Malaysian acts pretty much every month. We've ...
</span>
The code I used for this issue:
driver.get("https://www.google.de/?gws_rd=ssl#q=moonshine+site:blogspot.com&nu%E2%80%8C%E2%80%8Bm=10");
List<WebElement> ele = driver.findElements(By.xpath("//span[@class='f']/following-sibling::text()"));
ele.toString();
for(int i=0;i<ele.size();i++)
{
    System.out.println(ele.get(i).getText());
}
This code throws an InvalidSelectorException:
The result of the xpath expression "//span[@class='f']/following-sibling::text()" is: [object Text]. It should be an element.
The following XPath selects only the text, i.e. the description:
//span[@class='f']/following-sibling::text()
However, you can't capture that text with Selenium, because the expression returns a text node rather than an element. This is an open Selenium issue:
[selenium-developer-activity] Issue 5459 in selenium: InvalidSelectorError: The result of the xpath expression is: [object Text]
The issue details are at the link below:
http://grokbase.com/t/gg/selenium-developer-activity/13475y4cgj/issue-5459-in-selenium-invalidselectorerror-the-result-of-the-xpath-expression-is-object-text
Use the XPath below instead. It returns all the dates present on the page:
//span[@class='f']/text()
If you just want the text, use the XPath below:
//span[@class='st' and not(@class='f')]/text()
Hope it will help you :)
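Another way to look at the first two questions is to select the whole result element and assemble the text in code, skipping the date span. The following Python sketch shows that idea on the HTML from the question using the standard-library ElementTree module. It is an illustration only, not Selenium or XPathOnUrl code, and the helper name description_without_date is made up.

```python
import xml.etree.ElementTree as ET

# The result snippet from the question.
RESULT = """
<span class="st">
<span class="f">18.11.2009 - </span>
This Thursday 19th November
<em>Moonshine</em>
turns 4 years old. I'm proud to say that's 4 years of Malaysian acts pretty much every month. We've ...
</span>
"""


def description_without_date(st):
    """Join all text inside st except the text of the class="f" date span."""
    parts = [st.text or ""]
    for child in st:
        if child.get("class") != "f":
            parts.append("".join(child.itertext()))  # keep <em> text
        parts.append(child.tail or "")  # text that follows the child
    return " ".join("".join(parts).split())  # collapse whitespace


st = ET.fromstring(RESULT)
description = description_without_date(st)
highlighted = [em.text for em in st.findall("em")]  # empty list if no <em>

print(description)
print(highlighted)
```

The highlighted list is empty when a result has no <em>, which matches the "nothing should be shown" requirement in the first question.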
