Sort list items alphabetically with JavaScript - sorting

I have this code:
<h2 class="title">A-Z Categories</h2>
<div id="cat-abc">
<div class="cat-top">
<ul>
<li>January</li>
<li>May</li>
<li>November</li>
<li>August</li>
</ul>
</div>
<div class="cat-top">
<ul>
<li>March</li>
<li>June</li>
<li>September</li>
<li>December</li>
</ul>
</div>
</div>
I would like to sort these items alphabetically. How can I order the list items alphabetically with JavaScript? The result should be:
August
December
January
June
March
May
November
September
Thanks.

Related

XPATH Matching Sale Price or Regular Price

I need to match either the sale price (if on sale) or the regular price using one expression (I hope that's the right term). Here are the two example HTML structures:
On Sale
<span class="price">
<del>
<span class="woocommerce-Price-amount amount">
<span class="woocommerce-Price-currencySymbol">$</span>
14.99
</span>
</del>
<ins>
<span class="woocommerce-Price-amount amount">
<span class="woocommerce-Price-currencySymbol">$</span>
12.99
</span>
</ins>
Regular Price
<p class="price">
<span class="woocommerce-Price-amount amount">
<span class="woocommerce-Price-currencySymbol">$</span>
25.00
</span>
</p>
The expression I have so far is:
//*[@class="woocommerce-Price-amount amount"][last()]
It matches on both scenarios but returns both regular and sale prices for the "On Sale" scenario. Do I need some conditional to only return the sale price?
I thought I could possibly return only the last [@class="woocommerce-Price-amount amount"]. I tried last-child but didn't fully understand it.
First, note that your "on sale" snippet is missing a closing </span> tag. Once that is fixed, this somewhat convoluted expression should do the trick.
It's formatted for easier reading:
//span[@class="woocommerce-Price-amount amount"]
[parent::p
or
parent::ins
]
Try it on your actual code and see if it works.
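To check the "parent is p or ins" rule outside a browser, here is a minimal Python sketch that uses only the standard library. It is an illustration rather than the answer's own method: xml.etree.ElementTree implements only a subset of XPath and has no parent:: axis, so the sketch builds a child-to-parent map and applies the same condition in plain Python. The two snippets are trimmed copies of the question's HTML with the missing closing tag added, and the names ON_SALE, REGULAR, and current_prices are made up for the example.

```python
import xml.etree.ElementTree as ET

PRICE_CLASS = "woocommerce-Price-amount amount"

# Trimmed copies of the two structures from the question (assumed well-formed).
ON_SALE = """
<span class="price">
  <del><span class="woocommerce-Price-amount amount"><span class="woocommerce-Price-currencySymbol">$</span>14.99</span></del>
  <ins><span class="woocommerce-Price-amount amount"><span class="woocommerce-Price-currencySymbol">$</span>12.99</span></ins>
</span>
"""

REGULAR = """
<p class="price">
  <span class="woocommerce-Price-amount amount"><span class="woocommerce-Price-currencySymbol">$</span>25.00</span>
</p>
"""


def current_prices(markup):
    """Return the text of price spans whose parent is <p> or <ins>."""
    root = ET.fromstring(markup)
    # ElementTree has no parent:: axis, so record each element's parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    prices = []
    for span in root.iter("span"):
        if span.get("class") != PRICE_CLASS:
            continue
        parent = parent_of.get(span)
        if parent is not None and parent.tag in ("p", "ins"):
            # Join the currency symbol and the amount, dropping whitespace.
            prices.append("".join("".join(span.itertext()).split()))
    return prices


print(current_prices(ON_SALE))   # the <del> price is skipped
print(current_prices(REGULAR))
```

With a full XPath 1.0 engine the expression from the answer can be used directly; the parent map here only stands in for the parent:: axis.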

Getting two lists using Xpath that are both contained in the same container

In the code sample below, I want to use XPath inside Scrapy to extract the contents of List 1 and then of List 2. Some items may be links, while others are plain text in the list. What I need is two strings (or lists): one for List 1 and one for List 2.
<div class="row">
<div class="col-xs-12 no-padding-xs">
<h3 class="text-primary gutter-xs">List 1</h3>
<div class="well well-sm">
Miniature, Mustang, Paint Pony, Pinto, Pony, POA, Quarter Pony, Shetland Pony, Spanish Mustang
</div>
<h3 class="text-primary gutter-xs">List 2</h3>
<div class="well well-sm">
All Around, Driving, Halter, Lesson, Natural Horsemanship, Show, Trail Riding, Western Pleasure, Western Riding, Youth, Champion Trainer, POA Ponies for Sale, Newaygo County, Horse Boarding, Equestrian Coaching, Michigan, Riding Lessons, Horse Lea
</div>
</div>
</div>
Not sure that I understood you properly, but you can try:
from w3lib.html import remove_tags
for list_text in ['List 1', 'List 2']:
    # Take the first <div> that follows the matching <h3> heading.
    div_data = response.xpath('//h3[text()="{}"]/following-sibling::div[1]'.format(list_text)).get()
    if not div_data:
        continue
    # Strip the markup from each comma-separated item.
    print([remove_tags(i).strip() for i in div_data.split(',')])
Or if you want just strings:
for list_text in ['List 1', 'List 2']:
    div_data = response.xpath('//h3[text()="{}"]/following-sibling::div[1]'.format(list_text)).get()
    if not div_data:
        continue
    # Print the whole block as one string with the tags removed.
    print(remove_tags(div_data))

ImportXML on Google Sheets - how to get a user variable (kind of)?

After managing to import the filmography for any actor on rateyourmusic.com via
=importxml("https://rateyourmusic.com/films/cary_grant/","//li")
I couldn't figure out how to retrieve my own user rating for certain titles (which would also tell me which titles in the list I've already seen).
As I'm still learning the ropes of the importxml command, all I've found out is that the ratings sit under the 'film_cat_catalog_msg_1050' XPath identifier(?). Fiddling with the command, all I could get into a separate column of my spreadsheet so far was the standard word 'rate', but no personal rating.
Could anyone help me with that, please?
<li><span onclick="RYMartistPage.openFilmCataloger(1050);" class="disco_cat_inner"><span class="disco_cat_catalog_msg"><i class="fa fa-caret-left"></i> </span> <span id="film_cat_catalog_msg_1050">4.5</span></span><div id="film_cataloger_1050" class="film_cataloger"><div class="film_cataloger_close" onclick="RYMartistPage.collapseFilmCataloger(1050);"><i class="fa fa-caret-right"></i> </div> <div id="film_cataloger_content_1050" class="film_cataloger_content"></div></div>
<div class="has_tip film_rel_img delayed_discography_img" data-delayloadurl="url('//e.snmc.io/lk/m/l/45956edc922ce07e2b84a6ff23da3452/6152891.jpg')" data-delayloadurl2x="url('//e.snmc.io/lk/t/l/48b945e1a503ab7a9dce538a50fa9b99/6152891.jpg')" style="background: rgba(0, 0, 0, 0) url("//e.snmc.io/lk/t/l/48b945e1a503ab7a9dce538a50fa9b99/6152891.jpg") repeat scroll 0% 0%;"></div><div class="disco_avg_rating">3.81</div><div class="disco_ratings">1,063</div><div class="disco_reviews">25</div> <div class="film_info">
<div class="film_mainline recommended">
<a title="[Film1050]" href="/film/his_girl_friday/" class="film">His Girl Friday</a>
</div>
<div class="film_subline">
<span title="18 January 1940 " class="disco_year_ymd">1940</span> • Walter Burns
</div>
</div></li>
As you have to be logged in in order to see said ratings, here's a screenshot for those who aren't members:
[Screenshot: rateyourmusic.com filmography]
Try it with this XPath query:
//span[@id="film_cat_catalog_msg_1050"]
As you have already guessed, we need something like starts-with, since the numeric part is actually variable:
//span[starts-with(@id, "film_cat_catalog_msg_")]
And putting it all together:
=importxml("https://rateyourmusic.com/films/cary_grant/","//span[starts-with(@id, 'film_cat_catalog_msg_')]")

Xpath to get main paragraph text omitting child nodes

I would like to match the main paragraph content of the following code, omitting the child nodes p, div, h3.
<div class="content">
sunday, monday, tuesday,
<br>
<br>
wednesday, thursday,
<br>
friday, saturday
<div class ="tags">sunday</div>
<h3>Days</h3>
<p>....</p>
<div class="style">monday to friday</div>
</div>
I tried XPaths like //div[@class="content"]/*[not(self::p)] and //div[@class="content"]/*[not(name()="p")], but neither works. Then I tried //div[@class="content"]/node()[not(div)] and //div[@class="content"]/node()[not(h3)], but they only matched the first text.
I need the text below:
sunday, monday, tuesday,
<br>
<br>
wednesday, thursday,
<br>
friday, saturday
omitting the children div class="tags", h3, p, and div class="style".
This should do the trick:
//div[@class="content"]/*[not(self::p) and not(self::h3) and not(self::div)]|//div[@class="content"]/text()
Explanation:
//div[@class="content"] selects the node in question.
*[not(self::p) and not(self::h3) and not(self::div)] omits the child elements h3, p, and div
(or, instead of excluding every div, use and not(self::div[@class="style"]) and not(self::div[@class="tags"]) if you only need to filter out div class="tags" and div class="style").
|//div[@class="content"]/text() then forms a union with the node's own text() children.
Actually, this is a bit complicated. You may be better off just selecting the text, or doing some DOM manipulation on the node.
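The "just select the text" route can be sketched in Python with the standard library. This is a hedged illustration, not a drop-in replacement for the XPath above: xml.etree.ElementTree does not support text() in its path syntax, so the sketch reads the element's own text through the .text and .tail attributes instead. It assumes the markup is well-formed XML, which is why <br> is written as <br/> here, and direct_text is a name invented for the example.

```python
import xml.etree.ElementTree as ET

# The question's markup, with <br> written as <br/> so it parses as XML (assumption).
MARKUP = """
<div class="content">
sunday, monday, tuesday,
<br/>
<br/>
wednesday, thursday,
<br/>
friday, saturday
<div class="tags">sunday</div>
<h3>Days</h3>
<p>....</p>
<div class="style">monday to friday</div>
</div>
"""


def direct_text(element):
    """Collect the element's own text nodes, skipping text inside child elements."""
    # In ElementTree, text before the first child is .text, and text that
    # follows a child (still inside the parent) is that child's .tail.
    chunks = [element.text] + [child.tail for child in element]
    # Drop missing and whitespace-only chunks, and trim the rest.
    return [chunk.strip() for chunk in chunks if chunk and chunk.strip()]


content = ET.fromstring(MARKUP)
print(direct_text(content))
```

The text inside the tags, h3, p, and style children never appears, because only the parent's own text and the tails of its children are read.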

How to define an xpath expression that only retrieves hyphenated elements from the first of two similar divs?

The divs below appear in that order in the HTML I am parsing.
//div[contains(@class,'top-container')]//font/text()
I'm using the xpath expression above to try to get any data in the first div below in which a hyphen is used to delimit the data:
Wednesday - Chess at Higgins Stadium
Thursday - Cook-off
The problem is I am getting data from the second div below such as:
Monday 10:00 - 11:00
Tuesday 10:00 - 11:00
How do I retrieve only the data from the first div? (I also want to exclude any elements in the first div that do not contain this hyphenated data.)
<div class="top-container">
<div dir="ltr">
<div dir="ltr"><font face="Arial" color="#000000" size="2">Wednesday - Chess at Higgins Stadium</font></div>
<div dir="ltr"><font face="Arial" size="2">Thursday - Cook-off</font></div>
<div dir="ltr"><font face="Arial" size="2"></font> </div>
<div dir="ltr"> </div>
<div dir="ltr"><font face="Arial" color="#000000" size="2"></font> </div>
</div>
<div dir="ltr">
<div RE><font face="Arial">
<div dir="ltr">
<div RE><font face="Arial" size="2"><strong>Alex Dawkin </strong></font></div>
<div RE><font face="Arial" size="2">Monday 10:00 - 11:00 </font></div>
<div RE><font size="2">Tuesday 10:00 - 11:00 </font></div>
<div RE>
<div RE><font face="Arial" size="2"></font></div><font face="Arial" size="2"></font></div>
<div RE> </div>
<div RE> </div>
Your XPath matches any font element that is a descendant of <div class="top-container">.
div[1] addresses the first div child element of the "top-container" element. If you add that to your XPath, it will return the desired results.
//div[contains(concat(' ',@class,' '),' top-container ')]/div[1]//font/text()
If you want to ensure that only text() nodes that contain "-" are addressed, then you should also add a predicate filter to the text().
//div[contains(concat(' ',@class,' '),' top-container ')]/div[1]//font/text()[contains(.,'-')]
Instead of checking only for nodes that contain "-", how would you modify the last expression to just check for non-empty strings?
If you want to return any text() node with a value, the predicate filter on text() is not necessary. A text node without content does not exist as a text node, so it won't be selected.
However, if you only want to select text() nodes that contain text other than whitespace, you could use this expression:
//div[contains(concat(' ',@class,' '),' top-container ')]/div[1]//font/text()[normalize-space()]
normalize-space() strips leading and trailing whitespace and collapses internal runs of whitespace into a single space. So if a text() node contains only whitespace, the result is an empty string, which evaluates to false() in the predicate filter. Only text() nodes containing something other than whitespace are selected.
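The filters above can be mirrored in a short Python sketch using only the standard library. It is an approximation under stated assumptions: xml.etree.ElementTree cannot evaluate contains(), text(), or normalize-space(), so the same conditions are written as Python checks. The markup is a simplified, well-formed version of the question's HTML with one made-up line ("Bring snacks") added to show the difference between the two filters, and font_texts is a hypothetical helper name. Note also that Python's strip() treats more characters as whitespace (for example the non-breaking space) than XPath's normalize-space() does.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed version of the question's markup (assumption:
# the real page is messier and would need an HTML parser).
MARKUP = """
<body>
  <div class="top-container">
    <div dir="ltr">
      <div dir="ltr"><font size="2">Wednesday - Chess at Higgins Stadium</font></div>
      <div dir="ltr"><font size="2">Thursday - Cook-off</font></div>
      <div dir="ltr"><font size="2">Bring snacks</font></div><!-- made-up line -->
      <div dir="ltr"><font size="2">   </font></div>
    </div>
    <div dir="ltr">
      <div><font size="2">Monday 10:00 - 11:00</font></div>
      <div><font size="2">Tuesday 10:00 - 11:00</font></div>
    </div>
  </div>
</body>
"""


def font_texts(markup, keep):
    """Text nodes of <font> elements under the first child <div> of the container."""
    root = ET.fromstring(markup)
    results = []
    for div in root.iter("div"):
        # Token match on class, like contains(concat(' ',@class,' '),' top-container ').
        if "top-container" not in div.get("class", "").split():
            continue
        first = div.find("div")  # first child <div>, i.e. div[1]
        if first is None:
            continue
        for font in first.iter("font"):
            # font/text(): text before the first child plus text after each child.
            for text in [font.text] + [child.tail for child in font]:
                if text and keep(text):
                    results.append(text)
    return results


hyphenated = font_texts(MARKUP, lambda t: "-" in t)        # like [contains(.,'-')]
non_blank = font_texts(MARKUP, lambda t: bool(t.strip()))  # roughly [normalize-space()]
print(hyphenated)
print(non_blank)
```

Neither list contains the Monday and Tuesday entries, because only the first child div of the container is searched.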
