sort results from external website

I'm displaying sporting fixtures from an external site using cURL and rewriting its relative links to absolute ones, which works well. The problem is that the external site doesn't sort the results correctly. Could I sort them myself, perhaps by referring to the tags of the external page?
This is my code:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.sportingpulse.com/mobile/mobile.cgi?a=CL&aID=2307");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$result = curl_exec($ch);
curl_close($ch);
$result = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"'>]+)#",'$1http://www.sportingpulse.com/mobile/$2$3', $result);
echo $result
?>
These are the results I'd like sorted by age group (maybe using class="list-name"); any other suggestions are welcome:
<ul class="options" data-role="listview">
<li role="heading" data-role="list-divider">Please choose your Competition</li>
<li><div class="list-name">2012 Winter 23 Girls A</div></li>
<li><div class="list-name">2012 Winter 18 Boys A</div></li>
<li><div class="list-name">2012 Winter 23 Girls AR</div></li>
<li><div class="list-name">2012 Winter 18 Boys AR</div></li>
<li><div class="list-name">2012 Winter 18 Boys B</div></li>
<li><div class="list-name">2012 Winter 23 Girls B</div></li>
<li><div class="list-name">2012 Winter 18 Boys BR</div></li>
<li><div class="list-name">2012 Winter 18 Girls BR</div></li>
<li><div class="list-name">2012 Winter 18 Boys C</div></li>
<li><div class="list-name">2012 Winter 23 Girls BR</div></li>
<li><div class="list-name">2012 Winter 23 Girls C</div></li>
<li><div class="list-name">2012 Winter 20 Boys A</div></li>
<li><div class="list-name">2012 Winter 20 Boys AR</div></li>
<li><div class="list-name">2012 Winter 23 Boys A</div></li>
<li><div class="list-name">2012 Winter 20 Boys B</div></li>
<li><div class="list-name">2012 Winter 23 Boys AR</div></li>
<li><div class="list-name">2012 Winter 23 Boys B</div></li>
<li><div class="list-name">2012 Winter 23 Boys BR</div></li>
</ul>

This sorts the list items of class="options" alphabetically by the text of the link each one contains, which in turn sorts the age groups:
sort('ul.options>li', 'a');
function sort(list, key) {
    $($(list).get().reverse()).each(function(outer) {
        var sorting = this;
        $($(list).get().reverse()).each(function(inner) {
            if ($(key, this).text().localeCompare($(key, sorting).text()) > 0) {
                this.parentNode.insertBefore(sorting.parentNode.removeChild(sorting), this);
            }
        });
    });
}
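If plain alphabetical order is not enough, the age group can be compared as a number. The following is only a sketch: parseCompetition and compareCompetitions are hypothetical helper names, and the "year season age Boys/Girls grade" layout is an assumption taken from the sample list in the question.

```javascript
// Hypothetical helper: split a name such as "2012 Winter 23 Girls A"
// into the parts to sort on. The field layout is assumed from the sample list.
function parseCompetition(name) {
  const match = /^(\d{4})\s+(\S+)\s+(\d+)\s+(\S+)\s+(\S+)$/.exec(name.trim());
  if (!match) return null;
  return { age: Number(match[3]), group: match[4], grade: match[5] };
}

// Compare by age group first, then Boys/Girls, then grade (A, AR, B, ...).
function compareCompetitions(a, b) {
  const pa = parseCompetition(a);
  const pb = parseCompetition(b);
  // Names that do not fit the pattern sort after those that do.
  if (!pa || !pb) {
    if (!pa && !pb) return a.localeCompare(b);
    return pa ? -1 : 1;
  }
  return pa.age - pb.age
    || pa.group.localeCompare(pb.group)
    || pa.grade.localeCompare(pb.grade);
}

const names = [
  '2012 Winter 23 Girls A',
  '2012 Winter 18 Boys B',
  '2012 Winter 20 Boys A',
  '2012 Winter 18 Girls BR',
  '2012 Winter 18 Boys A',
];
const sorted = names.slice().sort(compareCompetitions);
console.log(sorted);
```

On the page itself, the same comparator could be applied to the text of each li, leaving out the list-divider item so that the heading stays at the top.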

Related

XPath to exclude only a specific amount of html tags

<div class="page-main-content content-style">
<h1 class="title home-title">What’s On This Week <span class="">in the Best Sports Bar:</span></h1>
<div class="page-main-content-inner">
<p class="">Now open and serving food from 7.00am till 1.00am every day</p>
<p class="">Our kitchen stays open outside of these hours for special events.</p>
<h2>Here’s the sport showing this week at biggest family friendly sports bar & restaurant:</h2>
<h2>BOXING</h2><p class="">Vergil Ortiz Jr v Michael McKinson, Sunday, 7th August # 8.00am</p>
<h2>UFC on ESPN: Santos vs. Hill</h2><p class="">Sunday, 7th August # 9.00am (prelims # 7.00am)<br class=""> Full main card replay # 3.00pm on Sunday</p>
<h2>MOTO GP</h2><p class="">Practice: Friday, 5th August # 3.00pm and Saturday, 6th August # 3.00pm<br class=""> Qualifying: Saturday, 6th August # 6.00pm<br class=""> Races: Sunday, 7th August # 4.30pm</p>
<h2>CRICKET – WEST INDIES v INDIA</h2><p class="">2nd T20I: Monday, 1st August # 9.30pm<br class=""> 3rd T20I: Tuesday, 2nd August # 9.30pm<br class=""> 4th T20I: Saturday, 6th August # 9.30pm<br class=""> 5th T20I: Sunday, 7th August # 9.30pm</p>
</div>
</div>
I am trying to use XPath to scrape all of this content, but I don't need the first two p tags or the first h2 tag ("Here’s the sport showing this week...").
So effectively I need to start scraping at BOXING, which is the second h2 tag, and then grab ALL content from there.
I've tried dozens of variations to exclude these, such as:
//div[@class='page-main-content-inner']/*[not(self::p)]
But I cannot get this to work: if I exclude p tags, it excludes them all. I tried to limit this with predicates like [position()>1], but that did not work either.

Try this:
//div[@class='page-main-content-inner']/h2[1]/following-sibling::*
It finds your first h2 and returns every sibling element that follows it:
<h2>BOXING</h2>
<p class="">Vergil Ortiz Jr v Michael McKinson, Sunday, 7th August # 8.00am</p>
<h2>UFC on ESPN: Santos vs. Hill</h2>
<p class="">Sunday, 7th August # 9.00am (prelims # 7.00am)<br class=""/> Full main card replay # 3.00pm on Sunday</p>
<h2>MOTO GP</h2>
<p class="">Practice: Friday, 5th August # 3.00pm and Saturday, 6th August # 3.00pm<br class=""/> Qualifying: Saturday, 6th August # 6.00pm<br class=""/> Races: Sunday, 7th August # 4.30pm</p>
<h2>CRICKET – WEST INDIES v INDIA</h2>
<p class="">2nd T20I: Monday, 1st August # 9.30pm<br class=""/> 3rd T20I: Tuesday, 2nd August # 9.30pm<br class=""/> 4th T20I: Saturday, 6th August # 9.30pm<br class=""/> 5th T20I: Sunday, 7th August # 9.30pm</p>

How to get this value with only one xpath?

I want a single XPath expression that selects the date and time (like June 19, 2020 at 08:59 PM) in all of these cases:
<span class="post_date"><span title="June 21, 2020 at 08:18 AM" currentmouseover="12">1 hour ago</span> <span class="post_edit" id="edited_by_2462600"> </span></span>
<span class="post_date" currentmouseover="62">June 19, 2020 at 08:56 PM <span class="post_edit" id="edited_by_2454907"> </span></span>
<span class="post_date" currentmouseover="157"><span title="June 20, 2020" currentmouseover="168">Yesterday</span> at 10:41 AM <span class="post_edit" id="edited_by_2457722"> </span></span>
I can get the second one easily with //*[@class="post_date"]/text(), but is there any way to get the two others and have one XPath for all cases? Or am I better off writing a function for this?
Thank you

A working XPath expression that selects all the dates at once:
(//@title|//text())[contains(.,", ") or contains(.," at ")]
Output : 4 nodes
EDIT: If you need something stricter (assuming all messages were posted after the year 2000):
//span[@class='post_date']/span[contains(@title,', 20')]/@title|//span/text()[contains(.,' at ') and contains(.,':')][ancestor::*[1][self::span][@class='post_date']]
Or:
(//span[@class='post_date']/span[@title]/@title|//span/text()[ancestor::*[1][self::span][@class='post_date']])[contains(.,', 20') or contains(.,' at ')]
Output : 4 nodes
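Because the expressions above return the "Yesterday" case as two separate nodes (the title attribute and the trailing " at 10:41 AM" text), a small function can join the pieces per post. This is a rough sketch, not part of the answer: postDate is a hypothetical helper, and it assumes you have already read the inner span's title attribute and the text sitting directly inside span.post_date.

```javascript
// Hypothetical helper.
// title: the inner span's title attribute, or null when there is no inner span.
// text:  the text found directly inside span.post_date.
function postDate(title, text) {
  const ownText = (text || '').replace(/\s+/g, ' ').trim();
  if (!title) return ownText;                      // plain "June 19, 2020 at 08:56 PM"
  if (title.includes(' at ')) return title;        // title already holds date and time
  return ownText ? `${title} ${ownText}` : title;  // "June 20, 2020" + "at 10:41 AM"
}

console.log(postDate('June 21, 2020 at 08:18 AM', ' '));
console.log(postDate(null, 'June 19, 2020 at 08:56 PM '));
console.log(postDate('June 20, 2020', ' at 10:41 AM '));
```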

tinymce removes empty element on hitting backspace in editor

TinyMCE removes an empty element when backspace is hit in the editor. The element is used for styling purposes. The problem occurs in Firefox but not in Chrome.
Steps to reproduce:
1. go to http://fiddle.tinymce.com/P0eaab (config adapted such that <i> with class info is not cleaned)
2. click in the menu bar on "tools" and then on "Source code"
3. paste the following html code
<div class="eight columns row">
<h2>Tarieven seizoensopening 2014</h2>
Prijs per persoon per nacht met inbegrip van een uitgebreid ontbijt en 's avonds een culinair driegangenmenu (apero-wijn-water-koffie):
<ul class="list unstyled-list">
<li><i class="small-arrows"></i>Kamer 1: 125 € ppppn*</li>
<li><i class="small-arrows"></i>Kamer 2: 115 € ppppn*</li>
<li><i class="small-arrows"></i>Kamer 3: 115 € ppppn*</li>
<li><i class="small-arrows"></i>Kamer 4: 100 € ppppn*</li>
</ul>
</div>
4. Hit the "OK" button
5. Put the cursor in the editor window right behind the "5" in "125"
6. Hit backspace such that "125" is changed to "12"
7. Click in the menu bar on "Tools" and then on "Source code"
Expected result (only snippet with problem shown):
<li><i class="small-arrows"></i>Kamer 1: 12 € ppppn*</li>
<li><i class="small-arrows"></i>Kamer 2: 115 € ppppn*</li>
Actual result (only snippet with problem shown):
<li>Kamer 1: 12 € ppppn*</li>
<li><i class="small-arrows"></i>Kamer 2: 115 € ppppn*</li>
Additional Notes:
- somehow the backspace removed the empty <i class="small-arrows"></i> from the first li element
- performing the same experiment in Chrome does not yield the error
- all plugins in Firefox were disabled in this test to avoid any interference
- is there a way to figure out what causes this removal of the i element? Can you use a debugger that shows the JavaScript "live" in action? I have Firebug but don't see how to watch the JavaScript as it runs

This is an old question, but maybe this will be useful for someone.
You need to add the following setting to the TinyMCE configuration:
extended_valid_elements:'i[*]'
It will help only for the "i" tag, so if you have other empty tags you need to add them here as well.
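For context, here is a sketch of where that setting might sit in an editor configuration. The selector value is a placeholder and the editorConfig name is hypothetical; only extended_valid_elements comes from the answer.

```javascript
// Sketch of an editor configuration object. The selector is a placeholder;
// extended_valid_elements is the setting named in the answer above.
const editorConfig = {
  selector: 'textarea',
  // Allow <i> with any attribute. Further tags can be added as a
  // comma-separated list, for example 'i[*],span[*]'.
  extended_valid_elements: 'i[*]',
};

// In the page that loads TinyMCE, this object would be passed to the editor:
// tinymce.init(editorConfig);
console.log(editorConfig.extended_valid_elements);
```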

Scrape data of a div after one or two html tags using xpath

Here is the code:
<div id="content">
<div class="datebar">
<span style="float:right">some text1</span>
<b>some text2</b>
Thursday, September 8, 2011 - 1:17 pm EDT
</div>
</div>
I just want to extract date and time Thursday, September 8, 2011 - 1:17 pm EDT.
Any suggestions? Thanks.

div[@id = 'content']/div[@class = 'datebar']/text()
or
div[@id = 'content']/div[@class = 'datebar']/b/following-sibling::text()
The result still contains the surrounding whitespace, so it should be normalized afterwards.
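One possible way to do that normalization once the text nodes are in hand, as a sketch only: normalizeText is a hypothetical helper, and the sample array is an assumed result of the first expression (whitespace-only nodes around the span and b elements, then the date line).

```javascript
// Join the text nodes returned by a text() step and collapse whitespace,
// similar in spirit to XPath's normalize-space().
function normalizeText(nodes) {
  return nodes.join(' ').replace(/\s+/g, ' ').trim();
}

// Assumed result for the sample markup: two whitespace-only nodes,
// then the node holding the date line.
const textNodes = ['\n  ', '\n  ', '\n  Thursday, September 8, 2011 - 1:17 pm EDT\n'];
const dateLine = normalizeText(textNodes);
console.log(dateLine);
```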

Retrieving an element from XPath using text as basis

Terrible title, I know, but is there a way in XPath to get to a desired link by only knowing that the link is second going back from the last ellipsis?
In this instance, the desired link is /2
<div class="page">
1
2
3
...
50
51
52
</div>
In this case, it is /3.
<div class="page">
1
2
3
4
...
50
51
52
</div>
And, just to throw a spanner in the works, this one is 21:
<div class="page">
1
2
3
...
18
19
20
21
22
...
50
51
52
</div>
I've tried all sorts of ways to get at it, from writing out counts to throwing magic beans out of my window, but nothing works. And now I'm out of magic beans. :(
Any suggestions for this problem (XPath, not the magic beans!) are welcome!

/div[@class='page']/text()[normalize-space()='…'][last()]/preceding-sibling::a[2]
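The selection logic of that expression can be checked against the three examples with a plain array standing in for the children of div.page. This is a sketch under that assumption: secondLinkBeforeLastEllipsis is a hypothetical name, and the ellipsis is written as three dots to match the sample markup.

```javascript
// items: the children of div.page in document order, with each link given
// by its label and the ellipsis as '...'.
function secondLinkBeforeLastEllipsis(items, ellipsis = '...') {
  const last = items.lastIndexOf(ellipsis);
  if (last === -1) return null; // no ellipsis at all
  // Like preceding-sibling::a, count links only and skip earlier ellipses.
  const links = items.slice(0, last).filter((item) => item !== ellipsis);
  return links.length >= 2 ? links[links.length - 2] : null;
}

const first = ['1', '2', '3', '...', '50', '51', '52'];
const second = ['1', '2', '3', '4', '...', '50', '51', '52'];
const third = ['1', '2', '3', '...', '18', '19', '20', '21', '22', '...', '50', '51', '52'];

console.log(secondLinkBeforeLastEllipsis(first));
console.log(secondLinkBeforeLastEllipsis(second));
console.log(secondLinkBeforeLastEllipsis(third));
```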
