How to get selective content using XPath

<li>
Chemistry In-House Symposium 2014 (CiHS-2014)
<br>
In 2011, Chemistry In-House Symposium (CiHS) was initiated as department yearly event on the occasion of the International Year of Chemistry and Golden Jubilee Year of the Department. The theme of the CiHS is to provide the platform for the deliberation of exciting research findings and mutual exchange of ideas within the Department. This yearly event, in addition, is expected to provide an opportunity for further enhancing the collaborative research within the department. This year, the CiHS will be organized on the Wednesday, August 13, 2014 at IC & SR auditorium, IIT Madras.
<br>
<strong>Venue:</strong>
IC & SR auditorium, IIT Madras
<br>
<strong>Dates:</strong>
Aug 13, 2014 to Aug 13, 2014
<br>
<strong>Coordinator(s):</strong>
Ramesh Gardas
</li>
I need to write an XPath for the markup above that returns only the text of the 'a' tag, the dates, and the venue. I do not want the entire description. Is there a way to get only selective text? My current XPath is:
//div[@class='block-inner clearfix']/ul/li//text()

To get the text within the a element, you can simply execute:
//div[@class='block-inner clearfix']/ul/li/a/text()
To get the text that immediately follows a specific <strong/> element, you can use the following XPath:
//div[@class='block-inner clearfix']/ul/li/text()[preceding-sibling::strong = "Dates:"][1]
To get the venue, use the same pattern:
//div[@class='block-inner clearfix']/ul/li/text()[preceding-sibling::strong = "Venue:"][1]
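The same idea can be tried out without an XPath engine. The following is a minimal Python sketch, assuming a simplified, well-formed copy of the list item (the `<a>` wrapper around the title, the `href="#"`, and the shortened description are assumptions, and `text_after_label` is a hypothetical helper). In the standard library's ElementTree, the text node that follows an element is exposed as that element's `tail`, which plays the role of `text()[preceding-sibling::strong = "..."][1]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified copy of the <li> from the question, made
# well-formed (<br/> closed, & escaped) so the XML parser accepts it.
SNIPPET = """
<li>
<a href="#">Chemistry In-House Symposium 2014 (CiHS-2014)</a>
<br/>
Long description that should be skipped.
<br/>
<strong>Venue:</strong>
IC &amp; SR auditorium, IIT Madras
<br/>
<strong>Dates:</strong>
Aug 13, 2014 to Aug 13, 2014
<br/>
<strong>Coordinator(s):</strong>
Ramesh Gardas
</li>
"""


def text_after_label(li, label):
    """Return the text that directly follows <strong>label</strong>, or None."""
    for strong in li.iter("strong"):
        if (strong.text or "").strip() == label:
            # .tail holds the text between </strong> and the next element
            return (strong.tail or "").strip()
    return None


li = ET.fromstring(SNIPPET.strip())
title = li.findtext("a").strip()
venue = text_after_label(li, "Venue:")
dates = text_after_label(li, "Dates:")

print(title)
print(venue)
print(dates)
```

Real-world HTML is rarely well-formed XML, so an HTML-aware parser would be needed for the actual page; the sketch only illustrates the selection logic.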

Related

Xpath between 2 texts for IMPORTXML formula

I have worked out XPaths that give very close to what I need, but they need some small refining.
https://www.punters.com.au/form-guide/
I want all URLs from the website for races held today, and only in Australia.
These are the XPaths I have now.
This one returns all the races on the page, including every country racing today:
//*[@class='component-wrapper form-guide-index']/table[1]/tbody/tr//td/a/@href
This one returns all races in Australia, but includes races today, tomorrow, or any other day on the page:
//tr[@class="upcoming-race__row"][preceding::tr[@class='upcoming-race__row upcoming-race__row--country'][1][*/.="Australia"]]/td[position()>=2]/a/@href
OK. So this is the related topic:
xpath to obtain texts between 2 tags in IMPORTXML formula
To get the links of all races in Australia today (replace " with ' in Google Sheets):
//tr[@class="upcoming-race__row"][preceding::td[@class="upcoming-race__country-title"][1][.="Australia"]][preceding::h2[1][.="Today"]]/td[position()>=2]/a/@href
Alternative XPaths:
//h2[.="Today"]/following::table[1]//tr[@class="upcoming-race__row"][preceding::td[@class='upcoming-race__country-title'][1][.="Australia"]]/td[position()>=2]/a/@href
//div[@class="component-wrapper form-guide-index"]/table[1]//tr[@class="upcoming-race__row"][preceding::td[@class='upcoming-race__country-title'][1][.="Australia"]]/td[position()>=2]/a/@href
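To see what the "nearest preceding heading" predicates are doing, here is a rough Python sketch of the same filtering logic written as a single pass in document order. The markup is invented for illustration: only the class names come from the XPaths above, while the URLs, venue names, and table layout are assumptions, and `race_links` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified page. Class names are taken from the
# XPaths above; URLs, venues and layout are invented for illustration.
PAGE = """
<div class="component-wrapper form-guide-index">
  <h2>Today</h2>
  <table>
    <tr class="upcoming-race__row upcoming-race__row--country">
      <td class="upcoming-race__country-title">Australia</td>
    </tr>
    <tr class="upcoming-race__row">
      <td>Venue A</td>
      <td><a href="/race-a1">R1</a></td>
      <td><a href="/race-a2">R2</a></td>
    </tr>
    <tr class="upcoming-race__row upcoming-race__row--country">
      <td class="upcoming-race__country-title">New Zealand</td>
    </tr>
    <tr class="upcoming-race__row">
      <td>Venue B</td>
      <td><a href="/race-b1">R1</a></td>
    </tr>
  </table>
  <h2>Tomorrow</h2>
  <table>
    <tr class="upcoming-race__row upcoming-race__row--country">
      <td class="upcoming-race__country-title">Australia</td>
    </tr>
    <tr class="upcoming-race__row">
      <td>Venue C</td>
      <td><a href="/race-c1">R1</a></td>
    </tr>
  </table>
</div>
"""


def race_links(root, day, country):
    """Collect hrefs from race rows whose nearest preceding <h2> is `day`
    and whose nearest preceding country title is `country`."""
    links = []
    current_day = None
    current_country = None
    for el in root.iter():  # iter() walks the tree in document order
        css_class = el.get("class")
        if el.tag == "h2":
            current_day = (el.text or "").strip()
        elif el.tag == "td" and css_class == "upcoming-race__country-title":
            current_country = (el.text or "").strip()
        elif el.tag == "tr" and css_class == "upcoming-race__row":
            if current_day == day and current_country == country:
                # Skip the first cell, like td[position()>=2]
                for td in el.findall("td")[1:]:
                    links.extend(a.get("href") for a in td.findall("a"))
    return links


root = ET.fromstring(PAGE.strip())
print(race_links(root, "Today", "Australia"))  # ['/race-a1', '/race-a2']
```

Remembering the most recent heading and country title while walking the rows corresponds to `preceding::h2[1]` and `preceding::td[...][1]` in the XPath.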

Render HTML to stdout as formatted text using Ruby

While building a CLI Google Card viewer, I stumbled on the problem of rendering HTML in command line like the browsers w3m or lynx. The closest I have come is using the text spit out from Nokogiri:
Nokogiri::HTML::parse(card_snippet).text
But it prints out as follows:
"Albert EinsteinTheoretical PhysicistAlbert Einstein was a German-born theoretical physicist. He developed the general theory of relativity, one of the two pillars of modern physics. Einstein's work is also known for its influence on the philosophy of science. WikipediaBorn: March 14, 1879, Ulm, GermanyDied: April 18, 1955, Princeton, New Jersey, United StatesInfluenced: Satyendra Nath Bose, Wolfgang Pauli, Leo Szilard, moreInfluenced by: Isaac Newton, Mahatma Gandhi, moreBooksThe World as I See It1949Relativity: The Special a...1916Ideas and Opinions2000Out of My Later Years2006The Meaning of Relativity1922The Evolution of Physics1938People also search forIsaac NewtonEduard EinsteinSonStephen HawkingElsa EinsteinSpouseMileva MarićFormer spouseThomas Edison"
But using lynx:
cat card_snippet.html | lynx -dump -stdin
Albert Einstein
Theoretical Physicist
Albert Einstein was a German-born theoretical physicist. He
developed the general theory of relativity, one of the two pillars
of modern physics. Einstein's work is also known for its influence
on the philosophy of science. Wikipedia
Born: March 14, 1879, Ulm, Germany
Died: April 18, 1955, Princeton, New Jersey, United States
Influenced: Satyendra Nath Bose, Wolfgang Pauli, Leo
Szilard,
Note: After stripping off some noise. But nonetheless the line endings are proper.
Any ideas for a similar solution in Ruby? The html snippet: Pastebin Link.
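This works for me:
require 'nokogiri'
# Fetch the raw HTML snippet (shells out to curl)
html = `curl http://pastebin.com/raw/pYKwACBp`
doc = Nokogiri::HTML(html)
# Collapse runs of line breaks into single newlines and trim the ends
puts doc.text.gsub(/[\r\n]+/,"\n").strip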

How to Remove Trailing Zeros in XPath?

I have some not very clean 3rd party data that included numbers like:
PRICE
118.0000
99.0000
etc etc
Normally I would just use:
{price[1]}
but I just get a price like $18,000,000.00 where it should be $118.
I tried this (just guessing) ...
number{('price[1]')}
but nothing showed up for the price.
I also tried
format-number{(., 'price[1]')}
but that did not work.
Then I read that I can use
translate(@Price, ',.', '.')
I tried that as
translate(@price, ',.', '.')
but no price showed.
I then tried several variations using the [1] part (I'm only guessing, as I'm not really a coder):
{translate(@price[1], ',.', '.')}
{translate(@price[1],',.','.')} (with the spaces cleaned out)
then this one
translate(@price[1], ',.', '.')
and it finally showed a price, but only as $1.00 where it should have been $1055, or $1145 for another item. They all showed $1.00.
What can I do? It must all be on one line, as it goes into a web-based form that is submitted to import the data.
UPDATED:
Here is what I tried to write in the comments:
I tried the expressions below, with the results shown next to each. I used your exact examples, including the literal price as in the first two, then I tried with the "price" code, but that produces a $1 price again.
substring-before(118.00, '.') $11,800.00
substring-before('118.00','.') $11,800.00
substring-before('price[1]','.') $1.00
substring-before(price[1], '.') $1.00
I also tried using the brackets as I normally would, but that produces no price...
{substring-before(price[1], '.')}
{substring-before('price[1]','.')}
{substring-before('118.00','.')}
{substring-before(118.00, '.')}
I have tried to upload a much smaller copy of some of the input document and just changed some private details with "privatedomain" but I have no permission to include the links so they were deleted.
PROGRAMNAME PROGRAMURL CATALOGNAME LASTUPDATED NAME KEYWORDS DESCRIPTION SKU MANUFACTURER MANUFACTURERID UPC ISBN CURRENCY SALEPRICE PRICE RETAILPRICE FROMPRICE BUYURL IMPRESSIONURL IMAGEURL ADVERTISERCATEGORY THIRDPARTYID THIRDPARTYCATEGORY AUTHOR ARTIST TITLE PUBLISHER LABEL FORMAT SPECIAL GIFT PROMOTIONALTEXT STARTDATE ENDDATE OFFLINE ONLINE INSTOCK CONDITION WARRANTY STANDARDSHIPPINGCOST
PrivateName (deleted link) PrivateName - Product Catalog 2015-03-21 23:06:21.558 Ainsley Cuff, Gold $100-$299, cuff, gold, Open Cuff Captivatingly colorful, Kendra Scott’s collection will spruce up a basic sweater and can simultaneously fancy up a dressier cocktail frock. Her pieces have a southern influence, which brings fun, festivity, and charm to the collection’s aesthetic. 14k Gold Plated 2 Inches Wide Malleable kens-00005B Kendra Scott USD 120.0000 (deleted link) (deleted link)/jpeg_1.jpg Bracelets All Jewelry ,Designers,Shop All,All Jewelry,A-Z Designers,Shop by Occasion,Best of PrivateName,Kendra Scott,Office,Everyday,Vacation,Classic,Casual,Byzantine,Black and White,Gold,Cuff,Destination: Morocco,Back in Stock,Kendra Scott,Bracelets yes
PrivateName (deleted link) PrivateName - Product Catalog 2015-03-21 23:06:21.559 Crystal Deco Brooch $100-$299, crystal "Part of the Ben-Amun Evening Collection. Antique silver-plated over brass Clear Swarovski crystals Length 2.5"" NOTES: This product is made-to-order. Please allow up to 2-3 weeks for delivery. Expedited shipping is not available." BAMU-00037P Ben-Amun Bridal USD 195.0000 (deleted link) (deleted link)/jpg_2.jpg Brooches Collections,Shop By,Designers,Brooches,All Jewelry,Bridal,A-Z Designers,Jewelry Trends,Evening,Bridal,Deco,Crystal,Estate,Ben-Amun Bridal,Shop All yes
PrivateName (deleted link) PrivateName - Product Catalog 2015-03-21 23:06:21.559 Gold Teardrop Cutout Earrings $0-$100, gold "Wendy Mink’s jewelry mixes aspects of traditional Eastern jewelry with classic European design principles. Her pieces are carefully handmade with simple yet unexpected combinations of colors, materials, and shapes. Her collection draws inspiration from textiles created by women in India, Nepal, and Tibet—three regions she spent a great deal of time in while holding a position at the World Bank prior to reinventing herself as a jewelry designer. Gold-plated, 18kt Length 2.5"" Width 1.75"" French wire hook NOTES: This item is made to order and may take up to 3 weeks for delivery." wndm-00107E Wendy Mink USD 73.0000
Please let me know if you need anything else.
If you are using XPath 1.0, you might be able to use substring-before(string, delimiter):
substring-before(118.00, '.')
should give 118 if I'm not mistaken.
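As a quick check of what `substring-before` is expected to return for a string argument, here is a small Python sketch that mimics the XPath 1.0 definition (the part of the first string before the first occurrence of the second, or an empty string if it does not occur). The function name `substring_before` is a hypothetical stand-in for illustration, not a library call.

```python
def substring_before(text, sep):
    """Mimic XPath 1.0 substring-before(): the part of `text` before the
    first occurrence of `sep`, or '' when `sep` does not occur."""
    head, found, _tail = text.partition(sep)
    return head if found else ""


print(substring_before("118.0000", "."))  # 118
print(substring_before("99.0000", "."))   # 99
```

Note that the function works on text, so the price has to reach it as a string such as '118.0000'; how the import form evaluates the expression is outside what this sketch can show.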

schema.org: Multiple opening hours on same day

I'm building a website for a small store and want to implement schema.org microdata markup. The "problem": the store is open from Tuesday till Friday, from 10:00 till 14:00 AND from 16:30 till 23:00 on these days. So I implemented the opening hours like this…
<time itemprop="openingHours" datetime="Tu-Fr 10:00-14:00, 16:00-23:00">XYZ</time>
But this way, the HTML-validator says…
Bad value Tu-Fr 10:00-14:00, 16:00-23:00 for attribute datetime on element time: The literal did not satisfy the time-datetime format.
How can I implement these multiple opening hours a day? Or is it impossible to do this with the <time>-tag and I have to change it to <meta>-tags? Thanks for your help! :-)
What if you used 2 entries for openingHours?
<time itemprop="openingHours" datetime="Tu-Fr 10:00-14:00">XYZ</time>
<time itemprop="openingHours" datetime="Tu-Fr 16:00-23:00">XYZ</time>
The LocalBusiness example has been updated to use <meta> elements:
<div itemscope itemtype="http://schema.org/Restaurant">
<span itemprop="name">GreatFood</span>
...
Hours:
<meta itemprop="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am - 2:30pm
<meta itemprop="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm - 9:30pm
<meta itemprop="openingHours" content="Fr-Sa 17:00-22:00">Fri-Sat 5pm - 10:00pm
</div>
The problem is that the notation schema.org uses for openingHours is just not (yet) valid for the time element according to the HTML5 spec.
You can copy all the examples from LocalBusiness into the validator and they will all fail validation.
Until the spec contains a definition for writing openingHours in a time element, you will have to ignore the HTML validators, I'm afraid.
BTW, the text on the schema.org site implies you could define multiple times in one value:
The opening hours for a business. Opening hours can be specified as a weekly time range, starting with days, then times per day.
I recently ran into the same issue and found that the second string of hours needs to be on another line, like this:
<p><time itemprop="openingHours" datetime="Mo,Tu,We,Th, 08:30-13:30">M-Th 8:30am-12:30pm & 1:30pm-6:00pm</time>
<time itemprop="openingHours" datetime="Mo,Tu,We,Th, 14:30-18:00"></time></p>
It's also important to keep this code inside the closing tag, otherwise it will not register as being formatted correctly.

New Rules to validate US social security number(SSN)

I need to validate US Social Security numbers (SSNs). Currently I have the rules below:
Should be 9 digits long.
Not allowed are SSNs with all zeros in any digit group (000-xx-####, ###-00-####, ###-xx-0000).
Not allowed are SSNs with Area Numbers (first 3 digits) 000, 666 and 900-999.
Not allowed are SSNs from 987-65-4320 to 987-65-4329.
There are also a few rules for validating the Group Code (-xx-). I checked the site below, but I couldn't understand the logic of the "Group Code":
http://www.codeproject.com/KB/validation/ssnvalidator.aspx
The SSA changed the rules for issuance of SSNs effective June 25, 2011. See http://www.ssa.gov/employer/randomization.html.
The rules for SSNs issued until the day before are outlined here: http://www.ssa.gov/employer/ssnweb.htm
I suppose that for SSNs to be accurately validated, you would need to know their issuance dates. Prior to June 25, 2011, use the old rules. On June 25, 2011 or after, use the new rules.
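The format rules listed in the question can be sketched in Python as follows. This is only an illustration of those four rules, assuming the input may contain hyphens or spaces as separators; it does not implement the pre-2011 Group Code checks, and it cannot tell whether a number was actually issued. The function name `is_plausible_ssn` is hypothetical.

```python
import re


def is_plausible_ssn(ssn):
    """Check an SSN string against the format rules from the question.

    Passing this check does not mean the number was ever issued.
    """
    # Assumption: hyphens and whitespace are the only separators allowed.
    digits = re.sub(r"[-\s]", "", ssn)

    # Rule 1: exactly nine ASCII digits.
    if not re.fullmatch(r"[0-9]{9}", digits):
        return False

    area, group, serial = digits[:3], digits[3:5], digits[5:]

    # Rule 4: 987-65-4320 through 987-65-4329 are not allowed.
    # (Also covered by the 900-999 area rule below; kept for clarity.)
    if area == "987" and group == "65" and "4320" <= serial <= "4329":
        return False

    # Rule 2: no digit group may be all zeros.
    if area == "000" or group == "00" or serial == "0000":
        return False

    # Rule 3: area numbers 666 and 900-999 are not allowed.
    # Comparing equal-length digit strings gives numeric ordering.
    if area == "666" or area >= "900":
        return False

    return True


print(is_plausible_ssn("123-45-6789"))  # True
print(is_plausible_ssn("666-12-3456"))  # False
```

Keeping each rule as its own check makes it easy to add or drop date-dependent rules once the issuance date is known.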

Resources