The XPath expressions defined below work properly when tested individually. However, when I call
them on each item taken from a storage object, structured as shown underneath, I get
disorganized results.
Storage=xpath('//div[@class="info"]')
for item in Storage:
    Name=item.xpath('//span[@itemprop="name"]/text()')
    Address=item.xpath('//span[@itemprop="streetAddress" and @class="street-address"]/text()')
    Phone=item.xpath('//div[@itemprop="telephone" and @class="phones phone primary"]/text()')
My question is: how do I build the XPath expressions for "Name", "Address", and "Phone" so that they are evaluated
against each item taken from "Storage", as I tried to do above? Thanks.
Here is the HTML of one such element, if needed.
<div class="info"><h2 class="n">36. <span itemprop="name">The Coffee Table Eagle Rock</span></h2><div data-tripadvisor="{"rating":"4.0","count":"11"}" data-israteable="true" class="info-section info-primary"><div class="result-rating three half "><span class="count">(5)</span></div><div class="ta-rating extra-rating ta-4-0"></div><span class="ta-count">(11)</span><p itemscope="" itemtype="http://schema.org/PostalAddress" itemprop="address" class="adr"><span itemprop="streetAddress" class="street-address">1958 Colorado Blvd</span><span itemprop="addressLocality" class="locality">Los Angeles, </span><span itemprop="addressRegion">CA</span> <span itemprop="postalCode">90041</span></p><div itemprop="telephone" class="phones phone primary">(323) 255-2200</div></div><div class="info-section info-secondary"><div class="categories">Coffee & Espresso RestaurantsBars</div><div class="links">WebsiteMenu</div><a data-analytics="{"adclick":true,"events":"event7,event6","category":"8004238","impression_id":"fbd98612-6b8a-43c2-b31e-fd579de20126","listing_id":"11287432","item_id":-1,"listing_type":"free","ypid":"11287432","content_provider":"MDM","srid":"L-webyp-1c6db222-cc63-48d8-90d1-2d5dc8754cca-11287432","item_type":"PUP","lhc":"8004238","ldir":"LA","rate":3.5,"hasTripAdvisor":true,"mip_claimed_staus":"mip_unclaimed","mip_ypid":"11287432","click_id":523,"listing_features":"orderonline"}" href="https://yellowpages.pingup.com/Bkm3xG?ypid=11287432&uvid=t3pfPllxtLYkH2dlkSbiCC1marvZprsz1YhqhycO80NYrDv0OMX3uTJ3ryFG464RywmpWCrB&source=web-prod" rel="nofollow" target="_blank" class="action order-online" data-impressed="1">Order Online</a></div><div class="preferred-listing-features"></div><div class="snippet"><figure class="avatar-1 color-1"></figure><p class="body with-avatar">I went here recently with my 2 year old for breakfast. I got the Silverlake omelet and the breakfast sandwich for my son. The food was great (especi…</p></div></div>
If you want to get child/descendant elements of an already selected item, you need to use .// which is relative to the current ("item") element, not // which searches from the root of the document. Try the following:
Storage=xpath('//div[@class="info"]')
for item in Storage:
    Name=item.xpath('.//span[@itemprop="name"]/text()')
    Address=item.xpath('.//span[@itemprop="streetAddress" and @class="street-address"]/text()')
    Phone=item.xpath('.//div[@itemprop="telephone" and @class="phones phone primary"]/text()')
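To see the per-item scoping in action, here is a minimal sketch using Python's standard-library xml.etree.ElementTree as a stand-in for whatever parser produced Storage (an assumption; the question does not name the library). ElementTree supports only a subset of XPath, so the "and" conditions are written as chained predicates and findtext() replaces /text(). The two-listing markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down markup with two listings (not the real page).
SAMPLE = """
<body>
  <div class="info">
    <span itemprop="name">The Coffee Table Eagle Rock</span>
    <span itemprop="streetAddress" class="street-address">1958 Colorado Blvd</span>
    <div itemprop="telephone" class="phones phone primary">(323) 255-2200</div>
  </div>
  <div class="info">
    <span itemprop="name">Second Listing</span>
    <span itemprop="streetAddress" class="street-address">1 Example St</span>
    <div itemprop="telephone" class="phones phone primary">(000) 000-0000</div>
  </div>
</body>
"""

root = ET.fromstring(SAMPLE)
rows = []
for item in root.findall(".//div[@class='info']"):
    # The leading "." keeps every lookup inside the current item.
    name = item.findtext(".//span[@itemprop='name']")
    address = item.findtext(".//span[@itemprop='streetAddress'][@class='street-address']")
    phone = item.findtext(".//div[@itemprop='telephone'][@class='phones phone primary']")
    rows.append((name, address, phone))

for row in rows:
    print(row)
```

Each tuple in rows holds the values of exactly one listing, which is the grouping the original loop was after.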
I am scraping through real estate listings from a certain site that contains multiple pages.
Below is a summary of a structure nested deep in the DOM. I want to select all list items whose descendants do not carry a certain attribute value, such as the id on <div id="nav-ad-container">:
<ul class="photo-cards photo-cards_wow photo-cards_short photo-cards_extra-attribution">
<li>..</li>
<li>..</li>
<li>
<div id="nav-ad-container" class="zsg-aspect-ratio"></div>
</li>
<li>..</li>
<li>..</li>
<li>..</li>
</ul>
However, the attribute and its value change in the DOM from page to page.
For example:
@id = 'nav-ad-container' or @class = 'nav-ad-empty'
In general, I want to retrieve the list items that do not contain the name pattern 'nav-ad'.
Things that I've tried with no success (still selects every list item)
xpath + //li[not(contains(@class, 'nav-ad'))]
xpath + //li[not((contains(@class,'nav-ad')) or contains(@id,'nav-ad'))]
Can anyone guide me toward a solution? I feel like I'm pretty close but missing something.
Filter by the class name of the list items or their descendants:
//li[not(contains(descendant-or-self::node()/@class,'nav-ad'))]
(not tested)
Try
//li[not(descendant-or-self::node()/@class[contains(.,'nav-ad')])]
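If the XPath engine at hand is limited, the same filter can also be done in host code. Here is a minimal Python sketch, assuming a well-formed fragment and the standard-library ElementTree (which has no contains() function). It checks both class and id on each li and on all of its descendants. The sample markup and the helper name mentions_nav_ad are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mirroring the structure in the question.
SAMPLE = """
<ul class="photo-cards">
  <li>a</li>
  <li><div id="nav-ad-container" class="zsg-aspect-ratio"></div></li>
  <li>b</li>
  <li><div class="nav-ad-empty"></div></li>
  <li>c</li>
</ul>
"""

def mentions_nav_ad(element):
    """True if the element or any descendant has 'nav-ad' in its class or id."""
    return any(
        "nav-ad" in node.get(attr, "")
        for node in element.iter()  # iter() yields the element itself, then its descendants
        for attr in ("class", "id")
    )

root = ET.fromstring(SAMPLE)
kept = [li for li in root.findall("li") if not mentions_nav_ad(li)]
print([li.text for li in kept])
```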
I have a list of items that I iterate over as cards with Thymeleaf:
<div th:each="show,iter : ${shows}" class="col-sm-6 col-xl-4 mb-5">
    <div class="card">
        ...
    </div>
</div>
After every nth card I want to show an ad as its own card, but NOT skip any item in the list. I can't find a way to add the ad code as its own card without it skipping one of the items OR just messing up the UI.
My best thought is to add "dummy" items to the list itself, but that feels wrong.
Any ideas?
The main answer to this is that you want to do the manipulation on the server-side, NOT on the view. Then you can unit-test, cache, and simply display the results without a UI designer caring about any complicated code. All you're really doing is inserting the ad at every nth position and then displaying the full list, one-by-one.
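The server-side step amounts to a small, unit-testable function. Here is a language-neutral sketch, written in Python for brevity (the thread's own stack is Java, and the same loop ports directly to a controller or service method). The function name with_ads and its parameters are hypothetical.

```python
def with_ads(items, ad, n):
    """Return a new list with `ad` inserted after every n items (never at the very end)."""
    result = []
    for index, item in enumerate(items):
        # Before every item whose index is a non-zero multiple of n, add the ad.
        if index != 0 and index % n == 0:
            result.append(ad)
        result.append(item)
    return result

shows = ["Game of Thrones", "Daniel Tiger's Neighborhood", "The Mandalorian",
         "Breaking Bad", "RugRats"]
cards = with_ads(shows, "StackOverflow", 2)
print(cards)
```

The view then only has to iterate over the returned list, and the original list of shows is left untouched.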
If that is somehow not an option, you can do something like the following. Let's say you have:
List<String> shows = new ArrayList<>(Arrays.asList("Game of Thrones",
        "Daniel Tiger's Neighborhood",
        "The Mandalorian",
        "Breaking Bad",
        "RugRats",
        "Big Bang Theory",
        "Knight Rider",
        "Quantum Leap",
        "Friends",
        "Gilligan's Island"));
model.addAttribute("shows", shows);
model.addAttribute("ad", "StackOverflow");
model.addAttribute("cardsToDisplay", new ArrayList<>()); // initial capacity omitted for simplicity
Then you can do:
<th:block th:with="cardsToDisplay = ${cardsToDisplay}">
    <th:block th:each="show : ${shows}">
        <!-- after every 2 shows, add the ad before adding the next show -->
        <th:block th:if="${showStat.index % 2 == 0 && showStat.index != 0}">
            <th:block th:text="${cardsToDisplay.add(ad)}" th:remove="all"></th:block><!-- or however you are getting your ad -->
        </th:block>
        <th:block th:text="${cardsToDisplay.add(show)}" th:remove="all"></th:block>
    </th:block>
    <!-- display the manipulated list -->
    <div th:each="theCard : ${cardsToDisplay}" class="col-sm-6 col-xl-4 mb-5">
        <div th:text="${theCard}" class="card"></div>
    </div>
</th:block>
Then your output would be:
Game of Thrones
Daniel Tiger's Neighborhood
StackOverflow
The Mandalorian
Breaking Bad
StackOverflow
RugRats
Big Bang Theory
StackOverflow
Knight Rider
Quantum Leap
StackOverflow
Friends
Gilligan's Island
Thymeleaf implicitly provides the iteration status variable showStat because the iteration variable is named show. You need th:remove="all" to hide the output of the add operation.
Change the number 2 as needed to represent n.
You can alternatively do this work in JavaScript, but doing so introduces another skill that someone on your team would have to maintain.
How to get the whole title:
Iphone case :) #phonecases#xmas#iphone#case
When the title does not include hashtags, I can get the whole title with this XPath:
((//*[@class='pinWrapper'])[2]//span)[1]/text()
This line:
((//*[@class='pinWrapper'])[2]//span)[1]//text()[normalize-space()]
returns only the first part: Iphone case :).
And this:
((//*[@class='pinWrapper'])[2]//span)[1][string()]
returns the whole XML:
<span>Iphone case :) <span class="pinHashtag">#phonecases</span> <span class="pinHashtag">#xmas</span> <span class="pinHashtag">#iphone</span> <span class="pinHashtag">#case</span></span>
If ((//*[@class='pinWrapper'])[2]//span)[1]/text() returns only the first text node, try
string(((//*[@class='pinWrapper'])[2]//span)[1])
to get the complete string.
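The difference between the first text node and the element's full string value can be sketched with Python's standard-library ElementTree, used here as an assumed stand-in for the asker's tool. ElementTree has no string() function, so joining itertext() plays that role: it concatenates every descendant text node in document order, which is how XPath defines an element's string value.

```python
import xml.etree.ElementTree as ET

# The span from the question, as a standalone fragment.
SPAN = (
    '<span>Iphone case :) <span class="pinHashtag">#phonecases</span> '
    '<span class="pinHashtag">#xmas</span> <span class="pinHashtag">#iphone</span> '
    '<span class="pinHashtag">#case</span></span>'
)

span = ET.fromstring(SPAN)

# Only the text before the first child element (what /text() with [1] returns).
first_text_node = span.text

# All descendant text nodes joined together (what string(...) returns).
full_title = "".join(span.itertext())

print(repr(first_text_node))
print(repr(full_title))
```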
I'm looking to get the output:
50ml milk
From the following code:
<ul class="ingredients-list__group">
<li>50ml <a href="/glossary/milk" class="tooltip-processed">milk
<div class="tooltip">
<h2
class="node-title">Milk</h2> <span class="fonetic">mill-k</span>
<p>One of the most widely used ingredients, milk is often referred to as a complete food. While cow…</p>
</div>
</a>
</li>
</ul>
Currently I'm using the XPATH:
//ul[@class="ingredients-list__group"]/li
But getting:
50ml milk Milk mill-kOne of the most widely used ingredients, milk is often referred to as a complete food. While cow…
How do I exclude the stuff within the div/tooltip?
With XPath 2.0:
//ul[@class="ingredients-list__group"]/li/concat(./text()[1], ./a/text()[1])
With XPath 1.0:
concat(//ul[@class="ingredients-list__group"]/li/text()[1], //ul[@class="ingredients-list__group"]/li/a/text()[1])
You can select the relevant text nodes using
//ul[@class="ingredients-list__group"]//text()[not(ancestor::div[@class='tooltip'])]
If you're in XPath 2.0 you can then put this in a call of string-join() to join these into a single string. If you're stuck with 1.0, you'll have to return multiple text nodes to the calling application and concatenate them together in the host language code.
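For the XPath 1.0 case, the host-language concatenation might look like the following minimal Python sketch. It assumes the standard-library ElementTree, which has no ancestor axis, so instead of filtering text nodes by ancestor it walks the tree and skips the tooltip subtree. The helper name visible_text is hypothetical and the tooltip paragraph is shortened.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the markup from the question.
SAMPLE = """
<ul class="ingredients-list__group">
  <li>50ml <a href="/glossary/milk" class="tooltip-processed">milk
    <div class="tooltip">
      <h2 class="node-title">Milk</h2> <span class="fonetic">mill-k</span>
      <p>One of the most widely used ingredients</p>
    </div>
  </a>
  </li>
</ul>
"""

def visible_text(element):
    """Concatenate the element's text nodes, skipping any div.tooltip subtree."""
    parts = [element.text or ""]
    for child in element:
        if not (child.tag == "div" and child.get("class") == "tooltip"):
            parts.append(visible_text(child))
        # Text after the child's closing tag belongs to the parent, so always keep it.
        parts.append(child.tail or "")
    return "".join(parts)

root = ET.fromstring(SAMPLE)
ingredients = [" ".join(visible_text(li).split()) for li in root.findall("li")]
print(ingredients)
```

The split/join step collapses the leftover newlines and indentation into single spaces.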
I am a newbie here. Please advise: how do I select the checkbox in my case?
<ul class="phrases-list" style="">
<li>
<input type="checkbox" class="select-phrase">
<span class="prase-title"> Dog - Wikipedia, the free encyclopedia </span>
(en.wikipedia.org)
<div class="prase-desc hidden">The domestic dog (Canis lupus familiaris or Canis familiaris) is a domesticated...</div>
</li>
The following doesn't work for me:
When /I check box "([^\"]+)"$/ do |label|
page.check(label)
end
step: And I check box "Dog - Wikipedia, the free encyclopedia"
If you can change the html, wrap the input and span in a label element
<ul class="phrases-list" style="">
<li>
<label>
<input type="checkbox" class="select-phrase">
<span class="prase-title"> Dog - Wikipedia, the free encyclopedia </span>
</label>
(en.wikipedia.org)
<div class="prase-desc hidden">The domestic dog (Canis lupus familiaris or Canis familiaris) is a domesticated...</div>
</li>
which has the added benefit of clicks on the "Dog - Wikipedia ..." text triggering the checkbox too. With that change your step should work as written. If you can't modify the html then things get more difficult.
Something like
find('span', text: label).find(:xpath, './preceding-sibling::input').set(true)
should work, although I'm curious how you're using these checkboxes from JS with nothing tying them to any specific value
Let's assume that you are prevented from changing the HTML. In this case, it would probably be easiest to query for the element via XPath. For example:
# Here's the XPath query
q = "//span[contains(text(), 'Dog - Wikipedia')]/preceding-sibling::input"
# Use the query to find the checkbox. Then, check the checkbox.
page.find(:xpath, q).set(true)
Okay - it's not as bad as it looks! Let's analyze this XPath so we can understand what it's doing:
//span
This first part says "Search the entire HTML document and find all 'span' elements." Of course, there are probably a LOT of "span" elements in the HTML document, so we'll need to restrict this:
//span[contains(text(), 'Dog - Wikipedia')]
Now we're only searching for the "span" elements that contain the text "Dog - Wikipedia". Presumably, this text will uniquely identify the desired "span" element on the page (if not, then just search for more of the text).
At this point, we have the "span" element that is adjacent to the desired "input" element. So, we can query for the "input" element using the "preceding-sibling::" XPath Axis:
//span[contains(text(), 'Dog - Wikipedia')]/preceding-sibling::input