I want an XPath expression that selects the date and time (like June 19 2020 at 08:59 PM) in all of these cases:
<span class="post_date"><span title="June 21, 2020 at 08:18 AM" currentmouseover="12">1 hour ago</span> <span class="post_edit" id="edited_by_2462600"> </span></span>
<span class="post_date" currentmouseover="62">June 19, 2020 at 08:56 PM <span class="post_edit" id="edited_by_2454907"> </span></span>
<span class="post_date" currentmouseover="157"><span title="June 20, 2020" currentmouseover="168">Yesterday</span> at 10:41 AM <span class="post_edit" id="edited_by_2457722"> </span></span>
I can get the second one easily with //*[@class="post_date"]/text(), but is there any way to get the other two as well and have one XPath for all cases? Or am I better off writing a function for this?
Thank you
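A working XPath expression that selects all the dates with one expression:
(//@title|//text())[contains(.,", ") or contains(.," at ")]
Output: 4 nodes (the third date comes back as two nodes, the title "June 20, 2020" and the text node " at 10:41 AM").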
EDIT: If you need something stricter (assuming all messages were posted after the year 2000):
//span[@class='post_date']/span[contains(@title,', 20')]/@title|//span/text()[contains(.,' at ') and contains(.,':')][ancestor::*[1][self::span][@class='post_date']]
Or:
(//span[@class='post_date']/span[@title]/@title|//span/text()[ancestor::*[1][self::span][@class='post_date']])[contains(.,', 20') or contains(.,' at ')]
Output: 4 nodes
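Because the third layout splits the date between a title attribute and a text node, the function mentioned in the question can be easier to maintain than a single XPath. Below is a minimal sketch using Ruby's bundled REXML library. It assumes the three snippets are wrapped in a hypothetical <root> element so they parse as XML, and post_date_text is an illustrative name, not part of any library:

```ruby
require 'rexml/document'

# The three post_date variants from the question, wrapped in a root element
# (the currentmouseover attributes are omitted; they play no role here).
xml = <<~XML
  <root>
  <span class="post_date"><span title="June 21, 2020 at 08:18 AM">1 hour ago</span> <span class="post_edit" id="edited_by_2462600"> </span></span>
  <span class="post_date">June 19, 2020 at 08:56 PM <span class="post_edit" id="edited_by_2454907"> </span></span>
  <span class="post_date"><span title="June 20, 2020">Yesterday</span> at 10:41 AM <span class="post_edit" id="edited_by_2457722"> </span></span>
  </root>
XML

# Build one date string per post_date span, whichever layout it uses.
def post_date_text(span)
  titled = span.elements["span[@title]"]
  # Direct text children only (skips "1 hour ago" / "Yesterday"), whitespace collapsed
  own_text = span.texts.map(&:value).join.gsub(/\s+/, " ").strip
  return own_text if titled.nil?            # date is plain text (case 2)
  title = titled.attributes["title"]
  return title if title.include?(" at ")    # full date and time in the title (case 1)
  "#{title} #{own_text}"                    # date in the title, time in the text (case 3)
end

doc = REXML::Document.new(xml)
dates = REXML::XPath.match(doc, "//span[@class='post_date']").map { |s| post_date_text(s) }
puts dates
```

Each post then yields one complete string such as "June 20, 2020 at 10:41 AM", so no post-processing of partial nodes is needed.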
<div class="page-main-content content-style">
<h1 class="title home-title">What’s On This Week <span class="">in the Best Sports Bar:</span></h1>
<div class="page-main-content-inner">
<p class="">Now open and serving food from 7.00am till 1.00am every day</p>
<p class="">Our kitchen stays open outside of these hours for special events.</p>
<h2>Here’s the sport showing this week at biggest family friendly sports bar & restaurant:</h2>
<h2>BOXING</h2><p class="">Vergil Ortiz Jr v Michael McKinson, Sunday, 7th August # 8.00am</p>
<h2>UFC on ESPN: Santos vs. Hill</h2><p class="">Sunday, 7th August # 9.00am (prelims # 7.00am)<br class=""> Full main card replay # 3.00pm on Sunday</p>
<h2>MOTO GP</h2><p class="">Practice: Friday, 5th August # 3.00pm and Saturday, 6th August # 3.00pm<br class=""> Qualifying: Saturday, 6th August # 6.00pm<br class=""> Races: Sunday, 7th August # 4.30pm</p>
<h2>CRICKET – WEST INDIES v INDIA</h2><p class="">2nd T20I: Monday, 1st August # 9.30pm<br class=""> 3rd T20I: Tuesday, 2nd August # 9.30pm<br class=""> 4th T20I: Saturday, 6th August # 9.30pm<br class=""> 5th T20I: Sunday, 7th August # 9.30pm</p>
</div>
</div>
I am trying to use XPath to scrape all of this content, but I don't need the first two p tags, and I don't need the first h2 tag either (Here’s the sport showing this week…).
So effectively I need to start scraping at BOXING, which is the second h2 tag, and then grab ALL content from there.
I've tried dozens of variations to exclude these:
//div[@class='page-main-content-inner']/*[not(self::p)]
But I cannot get this to work: if I exclude p tags, it excludes all of them. I tried limiting it with things like [position()>1], but still could not do it.
Try this:
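//div[@class='page-main-content-inner']/h2[1]/following-sibling::*
It finds the first h2 inside page-main-content-inner (the "Here’s the sport showing this week…" heading) and returns every sibling element that follows it, so the result starts at BOXING: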
<h2>BOXING</h2>
<p class="">Vergil Ortiz Jr v Michael McKinson, Sunday, 7th August # 8.00am</p>
<h2>UFC on ESPN: Santos vs. Hill</h2>
<p class="">Sunday, 7th August # 9.00am (prelims # 7.00am)<br class=""/> Full main card replay # 3.00pm on Sunday</p>
<h2>MOTO GP</h2>
<p class="">Practice: Friday, 5th August # 3.00pm and Saturday, 6th August # 3.00pm<br class=""/> Qualifying: Saturday, 6th August # 6.00pm<br class=""/> Races: Sunday, 7th August # 4.30pm</p>
<h2>CRICKET – WEST INDIES v INDIA</h2>
<p class="">2nd T20I: Monday, 1st August # 9.30pm<br class=""/> 3rd T20I: Tuesday, 2nd August # 9.30pm<br class=""/> 4th T20I: Saturday, 6th August # 9.30pm<br class=""/> 5th T20I: Sunday, 7th August # 9.30pm</p>
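To check the expression outside a browser, here is a minimal sketch with Ruby's bundled REXML library. It assumes a shortened copy of the markup: REXML only accepts well-formed XML, so the unescaped & and the unclosed <br> tags of the real page are left out, and the full page would need an HTML parser instead.

```ruby
require 'rexml/document'

# Shortened, well-formed excerpt of the page markup.
xml = <<~XML
  <div class="page-main-content-inner">
    <p class="">Now open and serving food from 7.00am till 1.00am every day</p>
    <p class="">Our kitchen stays open outside of these hours for special events.</p>
    <h2>Here's the sport showing this week</h2>
    <h2>BOXING</h2><p class="">Vergil Ortiz Jr v Michael McKinson</p>
    <h2>MOTO GP</h2><p class="">Races: Sunday, 7th August</p>
  </div>
XML

doc = REXML::Document.new(xml)
# h2[1] is the intro heading; following-sibling::* returns every element after it,
# so the two leading p tags and the intro h2 itself are never selected
nodes = REXML::XPath.match(doc, "//div[@class='page-main-content-inner']/h2[1]/following-sibling::*")
nodes.each { |n| puts "#{n.name}: #{n.text}" }
```

The design choice is to anchor on the element just before the wanted content rather than to exclude unwanted tags, which is why no position() filter on p is needed.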
My data looks like this:
JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
22.60 24.60 30.60 34.60 36.20 35.70 32.10 30.20 31.40 31.60 28.00 24.80
25.40 27.60 32.40 34.60 36.50 38.10 31.70 31.40 30.30 30.20 27.00 23.90
and there are hundreds of rows! I want to find the maximum value in each row and write it, together with its month, in a separate column next to the data,
so my output will be:
36.20 MAY
38.10 JUN
.
.
I want to use the maxloc function, but I have no idea how to use it!
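Try
integer :: idx
idx = maxloc(myTable(3,:), dim=1)
print *, myTable((/1,3/), idx)
It should find the column holding the highest value in the third row and print the values of the first and third rows in that column. The dim=1 argument makes maxloc return a scalar index; without it, maxloc returns a one-element array, which cannot be assigned to a scalar integer.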
I need to check a checkbox (an input inside a label) by the text on its label.
Ideally I would check the checkboxes by the values Mon, Tue, Wed that appear in the HTML. Is that even possible?
The labels Mon, Tue, Wed, ... are not displayed as checkboxes but look like buttons, so basically I need to click on the label or something similar.
My HTML looks like this:
<div id="ck-button"><label><input tabindex="-1" id="checkbox_aghdfklg"
name="checkbox_fdhadfadf" type="checkbox"><span>Mon</span></label></div>
<div id="ck-button"><label><input tabindex="-1" id="checkbox_0_aghdfklg"
name="checkbox_0_fdhadfadf" type="checkbox"><span>Tue</span></label></div>
<div id="ck-button"><label><input tabindex="-1" id="checkbox_1_aghdfklg"
name="checkbox_1_fdhadfadf" type="checkbox"><span>Wed</span></label></div>
...
...
<div id="ck-button"><label><input tabindex="-1" id="checkbox_5_aghdfklg"
name="checkbox_7_fdhadfadf" type="checkbox"><span>Sun</span></label></div>
I would be satisfied if I could check checkbox_0_blabla, but the blabla part changes every time I edit or change something on my webpage.
I have already tried a few approaches:
find(:xpath, '//*[contains(@id, "checkbox_1_")]').click()
find(:xpath, '//*[contains(@id, "checkbox_1_")]').set(true)
but I keep getting errors like:
Unable to find xpath...
or
invalid selector
and so on.
Maybe someone has some ideas?
Thank you!
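Selecting by label text should be possible, because each input sits inside the label whose span holds the day name. Below is a minimal sketch of such XPath expressions, checked with Ruby's bundled REXML against an excerpt of the markup (inputs self-closed so it parses as XML); checkbox_xpath is a hypothetical helper. In Capybara the same string could be passed to find(:xpath, ...) as in the attempts above. If the inputs are hidden by CSS, Capybara ignores non-visible elements by default, which could explain the "Unable to find" errors; in that case clicking the label, for example find(:xpath, "//label[span[text()='Mon']]").click, may work where the input cannot be found. That last point is an assumption about the page's styling.

```ruby
require 'rexml/document'

# Excerpt of the checkbox markup, with the <input> tags self-closed so it is valid XML.
xml = <<~XML
  <form>
    <div id="ck-button"><label><input tabindex="-1" id="checkbox_aghdfklg" name="checkbox_fdhadfadf" type="checkbox"/><span>Mon</span></label></div>
    <div id="ck-button"><label><input tabindex="-1" id="checkbox_0_aghdfklg" name="checkbox_0_fdhadfadf" type="checkbox"/><span>Tue</span></label></div>
    <div id="ck-button"><label><input tabindex="-1" id="checkbox_1_aghdfklg" name="checkbox_1_fdhadfadf" type="checkbox"/><span>Wed</span></label></div>
  </form>
XML
doc = REXML::Document.new(xml)

# XPath that picks the checkbox whose label shows the given day name;
# the changing id suffix is never referenced.
def checkbox_xpath(day)
  "//label[span[text()='#{day}']]/input[@type='checkbox']"
end

%w[Mon Tue Wed].each do |day|
  input = REXML::XPath.first(doc, checkbox_xpath(day))
  puts "#{day} -> #{input.attributes['id']}"
end

# Alternatively, match only the stable prefix of the id
wed = REXML::XPath.first(doc, "//input[starts-with(@id, 'checkbox_1_')]")
puts wed.attributes["id"]
```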
Using Ruby: ruby 1.9.3dev (2011-09-23 revision 33323) [i686-linux]
I have the following string:
str = 'Message relates to activity <a href="/activities/35">TU4 Sep 5 Activity 1</a> <img src="/images/layout/placeholder.png" width="222" height="149"/><br/><br/>First question from Manager on TU4 Sep 5 Activity 1.'
I want to match the following:
35 (a number that is part of the href attribute value)
TU4 Sep 5 Activity 1 (the text of the <a> tag)
First question from Manager on TU4 Sep 5 Activity 1. (the remaining text after the last <br/><br/> tags)
To achieve this, I have written the following regex:
result = str.match(/<a href="\/activities\/(?<activity_id>\d+)">(?<activity_title>.*)<\/a>.*<br\/><br\/>(?<message>.*)/)
This produces the following result:
#<MatchData "<a href=\"/activities/35\">TU4 Sep 5 Activity 1</a> <img src=\"/images/layout/placeholder.png\" width=\"222\" height=\"149\"/><br/><br/>First question from Manager on TU4 Sep 5 Activity 1."
activity_id:"35"
activity_title:"TU4 Sep 5 Activity 1"
message:"First question from Manager on TU4 Sep 5 Activity 1.">
But I guess this is not efficient.
Is it possible to have only the required values (listed above under what I want to match) returned in the match result, and to have the following value excluded from it:
"<a href=\"/activities/35\">TU4 Sep 5 Activity 1</a> <img src=\"/images/layout/placeholder.png\" width=\"222\" height=\"149\"/><br/><br/>First question from Manager on TU4 Sep 5 Activity 1."
Thanks,
Jignesh
The appropriate way to do this is NOT to use regular expressions. Instead, use the Nokogiri library to parse your HTML:
require 'nokogiri'
doc = Nokogiri::HTML.parse(str)
link = doc.at_css('a[href^="/activities/"]')   # first link to an activity
activity_id = link['href'][/\d+$/]              # trailing digits of the href
activity_title = link.text                      # text inside the <a> tag
message = doc.xpath('//text()').last.text       # last text node, after the <br/> tags
This does what your regex was attempting, with a much lower chance of breaking when the markup changes slightly.
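For the narrower question about the MatchData: the full match is always available as m[0] and cannot be left out, but you never have to read it, because captures returns only the groups. Below is a minimal sketch. It assumes the original string contained the <a href="/activities/35"> link that the regex expects, reconstructed here from the regex and its result. Using [^<]* keeps the title group from running past the closing </a>:

```ruby
# The string with the link the regex expects (the <a> tag is an assumption
# reconstructed from the regex and the MatchData shown in the question).
str = 'Message relates to activity <a href="/activities/35">TU4 Sep 5 Activity 1</a> <img src="/images/layout/placeholder.png" width="222" height="149"/><br/><br/>First question from Manager on TU4 Sep 5 Activity 1.'

# %r{} avoids escaping every "/"; [^<]* stops the title at the next "<"
re = %r{<a href="/activities/(?<activity_id>\d+)">(?<activity_title>[^<]*)</a>.*<br/><br/>(?<message>.*)}

m = str.match(re)
# m[0] always holds the whole matched text; captures holds only the groups
activity_id, activity_title, message = m.captures
puts activity_id, activity_title, message
puts m[:activity_id] # named groups can also be read one at a time
```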
Here is the HTML:
<div id="content">
<div class="datebar">
<span style="float:right">some text1</span>
<b>some text2</b>
Thursday, September 8, 2011 - 1:17 pm EDT
</div>
</div>
I just want to extract the date and time: Thursday, September 8, 2011 - 1:17 pm EDT.
Any suggestions? Thanks.
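//div[@id = 'content']/div[@class = 'datebar']/text()
or
//div[@id = 'content']/div[@class = 'datebar']/b/following-sibling::text()
Either way, the result should be normalized afterwards, because the text nodes include the surrounding line breaks and indentation (the first expression also returns the whitespace-only nodes between the child elements).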
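Both expressions can be tried with Ruby's bundled REXML library. In this minimal sketch, normalize_space is a hypothetical helper that mimics XPath's normalize-space() by collapsing whitespace and trimming:

```ruby
require 'rexml/document'

xml = <<~XML
  <div id="content">
    <div class="datebar">
      <span style="float:right">some text1</span>
      <b>some text2</b>
      Thursday, September 8, 2011 - 1:17 pm EDT
    </div>
  </div>
XML
doc = REXML::Document.new(xml)

# Collapse runs of whitespace and trim, like XPath's normalize-space()
def normalize_space(s)
  s.gsub(/\s+/, " ").strip
end

# Variant 1: all direct text children of the datebar (mostly whitespace), joined
all_text = REXML::XPath.match(doc, "//div[@id='content']/div[@class='datebar']/text()")
puts normalize_space(all_text.map(&:value).join)

# Variant 2: only the text node(s) after the <b> element
after_b = REXML::XPath.match(doc, "//div[@id='content']/div[@class='datebar']/b/following-sibling::text()")
puts normalize_space(after_b.map(&:value).join)
```

Both variants print the same string here; the second is less likely to pick up extra text if more content is added before the <b> element.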