Querying data in a CDATA section of an XML document with XPath

Can I have the query (xpath) for getting the ISBN value ( isbn ...>
<description>
<![CDATA[
<img alt="Japon et fait colonial" src="http://cipango.revues.org/docannexe/file/1514/cip18-small200.jpeg" />
<div class="information"> <div class="isbn">ISBN 978-2-85831-195-8</div> </div>
<div class="introduction" lang="fr"> <p>La colonisation moderne, qui débute au
]]>
<![CDATA[...]]>
</description>

Try this (it needs XPath 2.0, because a function call is used as the final path step):
/description/substring-before(substring-after(text(), 'class="isbn">'), '<')
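The same substring-after / substring-before idea can be sketched outside XPath. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree and a shortened copy of the sample document; the function name extract_isbn and the shortened src value are made up for illustration. The parser exposes the CDATA content as the plain text of the description element, so ordinary string operations are enough.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample; the src value is a placeholder.
xml_doc = """<description>
<![CDATA[
<img alt="Japon et fait colonial" src="cover.jpeg" />
<div class="information"> <div class="isbn">ISBN 978-2-85831-195-8</div> </div>
]]>
</description>"""

def extract_isbn(xml_text):
    """Return the text between 'class="isbn">' and the next '<', or None."""
    root = ET.fromstring(xml_text)
    text = root.text or ""  # CDATA content arrives as ordinary character data
    # partition() plays the role of substring-after ...
    _, marker, rest = text.partition('class="isbn">')
    if not marker:
        return None
    # ... and a second partition() plays the role of substring-before.
    return rest.partition("<")[0]

isbn = extract_isbn(xml_doc)
print(isbn)
```

If the marker is missing, the function returns None instead of an empty string, which is a design choice of this sketch rather than XPath behaviour.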

Fullpage.js not working when adding a new section

I was using fullpage.js without problems until something strange happened: I added a section to my page (it had 4 slides, and I added a new one) and fullpage no longer seems to work.
I tried this code https://codepen.io/alvarotrigo/pen/eNLBXo in a new page and even that doesn't work. This is the code I used to set up my page, and online I can add a new section without problems, so I think the problem must be somewhere else.
<div id="enlabs">
<div class="section testo">
LUISS Enlabs è un ecosistema animato da una community in continua evoluzione, è il luogo ideale per trasformare ogni startup digitale in una realtà d’impresa, crea opportunità uniche per
gli investitori. La capacità di avere una visione è il focus del nostro progetto. Con un approccio retrofuturistico abbiamo attinto da un repertorio di immagini che richiama gli anni ’50
e ’60 del secolo scorso, un momento storico in cui grazie al boom economico e ai primi viaggi nello spazio si guardava con grande ottimismo alla tecnologia e al futuro.
</div>
<div class="section">
<div class="slide fullscreen">
<picture><source data-srcset="https://www.spazio.studio/wp-content/uploads/2019/img_nuove/enlabs/SPST_enl_01_low.jpg" media="(max-width: 500px)"> <source data-srcset=
"https://www.spazio.studio/wp-content/uploads/2019/img_nuove/enlabs/SPST_enl_01_low.jpg" media="(max-width: 800px)"><img class="full-bleed-img lazyload" data-sizes="auto" data-src=
"https://www.spazio.studio/wp-content/uploads/2019/img_nuove/enlabs/SPST_enl_01_low.jpg"></picture>
</div>
</div>
<div class="section">
<div class="bordered" style="background-image:url(http://www.spazio.studio/wp-content/uploads/2019/img_nuove/cmp_img_017.jpg)">
</div>
</div>
<div class="section">
Four
</div>
<div class="section">
Five
</div>
</div>
<script async src='https://www.spazio.studio/wp-content/uploads/lazysizes.min.js'>
</script>
<script src='https://www.spazio.studio/wp-content/uploads/easings.min.js'>
</script>
<script src='https://www.spazio.studio/wp-content/uploads/scrolloverflow.min.js'>
</script>
<script src='https://www.spazio.studio/wp-content/uploads/fullpage.js'>
</script>
<script src='https://www.spazio.studio/wp-content/uploads/jquery-3.2.1.js'>
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.7/umd/popper.min.js">
</script>
</body>
</html>
<script>
new fullpage('#enlabs', {
anchors: ['page1', 'page2', 'page3', 'page4', 'page5'],
sectionsColor: ['white', 'white', '#FFF', '#ADD8E6', '#ADD8E6'],
autoScrolling: true,
css3: true,
lazyLoading: true,
scrollOverflow: false,
lockAnchors: true,
});
</script>
I can see the page progress in the navbar, but the browser window is blank (white).

Google rich snippet - Missing required field "entry-title"

I was checking whether the microdata on my website is configured correctly, using the Google Structured Data Testing Tool, and it reports some errors. In this question, I'm asking about the first one: Missing required field "entry-title".
I have already given the post title the entry-title class, but it is not working. This is the part of my page's code related to the error:
<header itemscope itemtype="http://schema.org/Article">
<figure>
<img width="674" height="250" src="http://www.primapaginaonline.it/wp-content/uploads/2014/07/tagliodiverso-674x250.jpeg?3ef148" class="attachment-topimage wp-post-image" alt="tagliodiverso" itemprop="thumbnailUrl" /></figure>
<div itemprop="articleSection" class="single-page-category">Cultura</div>
<h1 itemprop="headline" class="entry-title single-entry-title"> Tagliodiverso, gli appuntamenti della Pietraia dei Poeti</h1>
<div itemprop="description" class="meta-description">Con Tagliodiverso il museo a cielo aperto Pietraia dei Poeti stila un cartellone di incontri culturali incentrati sulla disabilità e l'accessibilità.</div>
<div class="single-post-meta"></div>
</header>
Solved: I had to wrap the whole article in a <div class="hentry"> element, and now everything works!

How can I get several similar tags data with HtmlAgilityPack?

Before explaining, I am using VB.net and HtmlAgilityPack.
I have the HTML below; all three sections have the same format. I am using HtmlAgilityPack to extract the title and the date. My code extracts the title correctly, but the date is only extracted from the first instance and repeated three times:
HtmlAgilityPack code:
For Each h4 As HtmlNode In docnews.DocumentNode.SelectNodes("//h4[(@class='title')]")
Dim date1 As HtmlNode = docnews.DocumentNode.SelectSingleNode("//span[starts-with(@class, 'date ')]")
Dim newsdate As String = date1.InnerText
MessageBox.Show(h4.InnerText)
MessageBox.Show(newsdate)
Next
I thought that, being inside the loop over each h4, I would get its associated date accordingly...
HTML code:
<div class="article-header" style="" data-itemid="920729" data-source="ABC" data-preview="Text 1">
<h4 class="title">Text for Mr. A</h4>
<div class="byline">
<span class="date timestamp"><span title="29 November 2013">29-11-2013</span></span>
<span class="source" title="AGE">18</span>
</div>
<div class="preview">Text 1 Preview</div>
</div>
<div class="article-header" style="" data-itemid="920720" data-source="ABC" data-preview="Text 2">
<h4 class="title">Text for Mr. B</h4>
<div class="byline">
<span class="date timestamp"><span title="27 November 2013">27-11-2013</span></span>
<span class="source" title="AGE">25</span>
</div>
<div class="preview">Text 2 Preview</div>
</div>
<div class="article-header" style="" data-itemid="920719" data-source="ABC" data-preview="Text 3">
<h4 class="title">Text for Mr. C</h4>
<div class="byline">
<span class="date timestamp"><span title="22 October 2013">22-10-2013</span></span>
<span class="source" title="AGE">20</span>
</div>
<div class="preview">Text 3 Preview</div>
</div>
Final Output should be:
Text for Mr. A
29-11-2013
Text for Mr. B
27-11-2013
Text for Mr. C
22-10-2013
What I am getting with my code:
Text for Mr. A
29-11-2013
Text for Mr. B
29-11-2013
Text for Mr. C
29-11-2013
Any help is much appreciated.
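You need to anchor your second XPath so that it looks 'below' the parent of the current h4:
Dim date1 As HtmlNode = h4.Parent.SelectSingleNode(".//span[starts-with(@class, 'date ')]")
                        ^^^^^^^^^                   ^^^
An XPath that starts with // is evaluated from the document root, no matter which node you call it on, so your original query always returned the first date in the whole document. The leading .// tells XPath to look under the node the XPath is executed on. Thus, by calling SelectSingleNode on h4.Parent, you get the date below the parent div of that h4.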
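The difference between a document-wide search and a search anchored at the current node can be sketched in Python as well. This is only an illustration, assuming the standard xml.etree.ElementTree module and a trimmed, well-formed copy of the HTML wrapped in a made-up <root> element; the function name titles_and_dates is hypothetical. ElementTree has no starts-with(), so the class attribute is matched exactly here.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's HTML; <root> is added for parsing.
html = """<root>
<div class="article-header">
<h4 class="title">Text for Mr. A</h4>
<div class="byline"><span class="date timestamp"><span title="29 November 2013">29-11-2013</span></span></div>
</div>
<div class="article-header">
<h4 class="title">Text for Mr. B</h4>
<div class="byline"><span class="date timestamp"><span title="27 November 2013">27-11-2013</span></span></div>
</div>
</root>"""

root = ET.fromstring(html)

def titles_and_dates(root):
    """Pair every title with the date found below the same article element."""
    results = []
    for article in root.iter("div"):
        h4 = article.find("h4[@class='title']")
        if h4 is None:
            continue  # skip divs without a title, e.g. the nested byline div
        # Anchored search: only descendants of this article are considered.
        date = article.find(".//span[@class='date timestamp']")
        if date is None:
            continue
        results.append((h4.text, "".join(date.itertext())))
    return results

# Document-wide search: always yields the first date, as in the question.
first_date = "".join(root.find(".//span[@class='date timestamp']").itertext())

pairs = titles_and_dates(root)
print(pairs)
```

Searching from root inside the loop would reproduce the repeated-first-date symptom; searching from article gives one date per title.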

metadata in webshop with multiple product on page

We have a webshop in Magento that has a lot of grouped products. A grouped product page has the basic info, and then a table with all the products in it. This table contains for each row the SKU, some attributes and the price. I want to add metadata (from schema.org) to it, but I'm not sure how to do this.
I tried it by adding an itemtype product for each and every row in that table, but that doesn't link to the product name in any way. I have also tried to make the whole page a product, but that doesn't give the desired result.
Has anyone come across this before and solved it? Any input is welcome!
The page I'm working on: clickie
In fact, each row contains a slightly different product (it differs by diameter, length, etc.). Ideally you should indicate this using a schema.org/Product nested in a schema.org/Offer and linked to the general product information using itemref. Something like this:
<div id="product_general">
<h1 itemprop="name" >Induweb spiraalboor, HSS, Rolgewalst, DIN 338, type N</h1>
</div>
<div itemscope itemtype="http://schema.org/Offer">
<div itemprop="itemOffered" itemscope itemtype="http://schema.org/Product" itemref="product_general">
<span itemprop="model">Diameter: 1.0</span>
</div>
<span itemprop="price">€ 0,13</span>
</div>
The issue here is that you're using a table for the specific product and offer information. With your current design, there seems to be no way to build the structure above with valid HTML. However, this is not a big problem if you care more about Rich Snippets than about perfectly correct markup.
So your remaining issue with Rich Snippets is that the highest price is not correct.
You can easily fix this using schema.org/AggregateOffer. In your current code (light version):
<div class="wrapper product-view" itemscope itemtype="http://schema.org/Product">
<h1 itemprop="name" id="product_name">Induweb spiraalboor, HSS, Rolgewalst, DIN 338, type N</h1>
<img itemprop="image" src="http://induweb.nl/media/catalog/product/cache/1/image/185x/5e06319eda06f020e43594a9c230972d/import/Verspanen/Boren/Cylindrische schacht/100000002-induweb-spiraalboor-hss-rolgewalst-din-338-type-n_0/induweb.nl--100000002-30.jpg" alt="Induweb spiraalboor, HSS, Rolgewalst, DIN 338, type N" title="Induweb spiraalboor, HSS, Rolgewalst, DIN 338, type N" />
<table><tr><td itemprop="brand">InduWeb</td></tr></table>
<div itemprop="description">
<p>· Rolgewalst <br />· Cilinderschacht <br />· Rechtssnijdend <br />· Kegelmantelgeslepen 118° <br />· Zwarte uitvoering</p> </div>
<!-- Put AggregateOffer here with highPrice and lowPrice properties -->
<div itemprop="offers" itemscope itemtype="http://schema.org/AggregateOffer">
<meta itemprop="priceCurrency" content="EUR">
<meta itemprop="lowPrice" content="0.13">
<meta itemprop="highPrice" content="1.75">
<meta itemprop="offerCount" content="98">
</div>
<!-- End of AggregateOffer-->
<table>
<tr itemscope itemtype="http://schema.org/Offer" itemprop="offers">
<td itemprop="sku">
<div class="shipping shipping-176" itemprop="availability" content="in_stock"></div>
100010006
</td>
<!-- Start sub attributen -->
<!-- -->
<td class="a-center">1.0</td>
<!-- -->
<td class="a-center">34</td>
<!-- -->
<td class="a-center">12</td>
<!-- Einde sub attributen -->
<td class="a-center" style="width: 25px;"><p>10</p></td>
<td>
<span itemprop="price">
<span class="price">€ 0,13</span>
</span>
</td>
</tr>
</table>
</div>
Although this is not fully correct semantically, it gives a pretty good result.
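The lowPrice, highPrice and offerCount values do not have to be maintained by hand; they can be derived from the prices shown in the table rows. Below is a rough sketch, assuming prices are displayed like "€ 0,13" with a comma as the decimal separator and no thousands separator. The function names and the third sample price are made up for illustration, and in a Magento shop this logic would live in the template rather than in Python.

```python
from decimal import Decimal

def parse_price(text):
    """Turn a displayed price such as '€ 0,13' into a Decimal.

    Assumes a comma decimal separator and no thousands separator.
    """
    number = text.replace("€", "").strip().replace(",", ".")
    return Decimal(number)

def aggregate_offer(displayed_prices):
    """Compute the values for the AggregateOffer meta tags."""
    prices = [parse_price(p) for p in displayed_prices]
    return {
        "lowPrice": str(min(prices)),
        "highPrice": str(max(prices)),
        "offerCount": len(prices),
    }

# Sample rows; "€ 0,42" is an invented value for the example.
summary = aggregate_offer(["€ 0,13", "€ 1,75", "€ 0,42"])
print(summary)
```

Decimal is used instead of float so that the values written into the markup are exactly the ones displayed.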

How to get text between two strings with special characters in ruby?

I have a string (@description) that contains HTML code and I want to extract the content between two elements. It looks something like this:
<b>Content title</b><br/>
*All the content I want to extract*
<a href="javascript:print()">
I've managed to do something like this
@want = @description.match(/Content title(.*?)javascript:print()/m)[1].strip
But obviously this solution is far from perfect as I get some unwanted characters in my @want string.
Thanks for your help
Edit:
As requested in the comments, here is the full code:
I'm already parsing an HTML document, and the following code:
@description = @doc.at_css(".entry-content").to_s
puts @description
returns:
<div class="post-body entry-content">
<img alt="Photo title" height="333" src="http://photourl.com" width="500"><br><br><div style="text-align: justify;">
Some text</div>
<b>More text</b><br><b>More text</b><br><br><ul>
<li>Numered item</li>
<li>Numered item</li>
<li>Numered item</li>
</ul>
<br><b>Content Title</b><br>
Some text<br><br>
Some text(with links and images)<br>
Some text(with links and images)<br>
Some text(with links and images)<br>
<br><br><img src="http://url.com/photo.jpg">
<div style="clear: both;"></div>
</div>
The text can include more paragraphs, links, images, etc. but it always starts with the "Content Title" part and ends with the javascript reference.
This XPath expression selects all (sibling) nodes between the nodes $vStart and $vEnd:
$vStart/following-sibling::node()
[count(.|$vEnd/preceding-sibling::node())
=
count($vEnd/preceding-sibling::node())
]
To obtain the full XPath expression to use in your specific case, simply substitute $vStart with:
/*/b[. = 'Content Title']
and substitute $vEnd with:
/*/a[@href = 'javascript:print()']
The final XPath expression after the substitutions is:
/*/b[. = 'Content Title']/following-sibling::node()
[count(.|/*/a[@href = 'javascript:print()']/preceding-sibling::node())
=
count(/*/a[@href = 'javascript:print()']/preceding-sibling::node())
]
Explanation:
This is a simple corollary of the Kayessian formula for the intersection of two nodesets $ns1 and $ns2:
$ns1[count(.|$ns2) = count($ns2)]
In our case, the set of all nodes between the nodes $vStart and $vEnd is the intersection of two node-sets: all following siblings of $vStart and all preceding siblings of $vEnd.
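The counting trick can be illustrated outside XPath. The sketch below assumes Python and models sibling nodes as unique strings, since it only needs node identity; the labels and function names are invented for the example. A node belongs to $ns2 exactly when adding it to $ns2 does not make $ns2 any larger, which is what count(.|$ns2) = count($ns2) tests.

```python
def intersect(ns1, ns2):
    """Mirror $ns1[count(.|$ns2) = count($ns2)] on Python collections."""
    ns2_set = set(ns2)
    # The union with a single node keeps its size only if the node is already in ns2.
    return [node for node in ns1 if len(ns2_set | {node}) == len(ns2_set)]

def nodes_between(siblings, start, end):
    """Return the siblings strictly between start and end, in document order."""
    i, j = siblings.index(start), siblings.index(end)
    following = siblings[i + 1:]  # plays the role of $vStart/following-sibling::node()
    preceding = siblings[:j]      # plays the role of $vEnd/preceding-sibling::node()
    return intersect(following, preceding)

# Hypothetical sibling labels standing in for the children of the outer div.
siblings = ["ul", "br#1", "b", "br#2", "text", "br#3", "a", "div"]
between = nodes_between(siblings, "b", "a")
print(between)
```

The result keeps the order of the first node-set, which matches the document order that XPath would return.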
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vStart" select="/*/b[. = 'Content Title']"/>
<xsl:variable name="vEnd" select="/*/a[@href = 'javascript:print()']"/>
<xsl:template match="/">
<xsl:copy-of select=
"$vStart/following-sibling::node()
[count(.|$vEnd/preceding-sibling::node())
=
count($vEnd/preceding-sibling::node())
]
"/>
==============
<xsl:copy-of select=
"/*/b[. = 'Content Title']/following-sibling::node()
[count(.|/*/a[@href = 'javascript:print()']/preceding-sibling::node())
=
count(/*/a[@href = 'javascript:print()']/preceding-sibling::node())
]
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document (converted to a well-formed XML document):
<div class="post-body entry-content">
<a href="http://www.photourl">
<img alt="Photo title" height="333" src="http://photourl.com" width="500"/>
</a>
<br />
<br />
<div style="text-align: justify;">
Some text</div>
<b>More text</b>
<br />
<b>More text</b>
<br />
<br />
<ul>
<li>Numered item</li>
<li>Numered item</li>
<li>Numered item</li>
</ul>
<br />
<b>Content Title</b>
<br />
Some text
<br />
<br />
Some text(with links and images)
<br />
Some text(with links and images)
<br />
Some text(with links and images)
<br />
<br />
<br />
<a href="javascript:print()">
<img src="http://url.com/photo.jpg"/>
</a>
<div style="clear: both;"></div>
</div>
the two XPath expressions (with and without variable references) are evaluated and the nodes selected in each case, conveniently delimited, are copied to the output:
<br/>
Some text
<br/>
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
<br/>
<br/>
==============
<br/>
Some text
<br/>
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
<br/>
<br/>
To test your HTML, I added tags around your code and pasted it into a file:
xmllint --html --xpath '/html/body/div/text()' /tmp/l.html
Output:
Some text
Some text
Some text
Some text
Now you can use an XPath module in Ruby and reuse the XPath expression.
You will find many examples by searching Stack Overflow.
