How to exclude content from a rich snippet element? - microdata

I'm trying to apply rich snippet data to my web page, following the http://schema.org/Article standard. One of the properties is articleBody, which I expect to contain the entire body text of the article.
Unfortunately, the article's HTML is dotted with occasional buttons, ads and other hints, whose text should not go into articleBody.
For example:
<div itemscope itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<p>1st Paragraph</p>
<p>2nd paragraph</p>
<a>A few useful links for my users</a>
<p>3rd paragraph</p>
<div>A few text ads</div>
<p>4th paragraph</p>
</div>
</div>
Is there a way to exclude the texts in the ads/links from the article itself?

No, Microdata doesn’t offer a way to exclude content.
articleBody’s value will be the textContent of the element.
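A minimal illustration with made-up content. Because the value is the element's descendant text, any link or ad text inside the articleBody element ends up in the value:

```html
<div itemscope itemtype="http://schema.org/Article">
  <!-- Children written on one line, so no whitespace text nodes sit between them -->
  <div itemprop="articleBody"><p>First paragraph.</p><a href="/links">Useful links</a><p>Second paragraph.</p></div>
</div>
<!-- Resulting articleBody value: "First paragraph.Useful linksSecond paragraph." -->
```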
An ugly "hack" would be to specify several articleBody properties for this item:
<div itemscope itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<p>1st Paragraph</p>
<p>2nd paragraph</p>
</div>
<a>A few useful links for my users</a>
<p itemprop="articleBody">3rd paragraph</p>
<div>A few text ads</div>
<p itemprop="articleBody">4th paragraph</p>
</div>
But note that Microdata does not define how those values should be interpreted, so it’s up to the consumers.
Another ugly method is to duplicate the content in a meta element:
<div itemscope itemtype="http://schema.org/Article">
<div>
<p>1st Paragraph</p>
<p>2nd paragraph</p>
<a>A few useful links for my users</a>
<p>3rd paragraph</p>
<div>A few text ads</div>
<p>4th paragraph</p>
</div>
<meta itemprop="articleBody" content="1st Paragraph. 2nd paragraph. 3rd paragraph. 4th paragraph." />
</div>

Related

How to get elements between tags with XPATH

I need to get each subtitle of an article and its text. Each subheading is inside an <h3>, so I need to get everything between the first and the second <h3>, then everything between the second and the third, and so on until the end.
The structure is similar to this:
<article>
<p> introduction </p>
<h3>1. Subtitle </h3>
<p> text text </p>
<div> <p>other text</p> </div>
<h3>2. Subtitle </h3>
<p> text text </p>
<div> <p>other text</p> </div>
<h3>3. Subtitle </h3>
<p> text text </p>
<div> <p>other text</p> </div>
</article>
Currently I can get to the first subtitle like this: //h3[1]
But how can I get everything between the first and the second?
This XPath expression gets the nodes between //h3[1] and //h3[2], inclusive:
//article/*[position()>= count(//h3[1]/preceding-sibling::*)+1 and position()<= count(//h3[2]/preceding-sibling::*)+1]
Result in the browser console:
$x('//article/*[position()>= count(//h3[1]/preceding-sibling::*)+1 and position()<= count(//h3[2]/preceding-sibling::*)+1]')
Array(4) [ h3, p, div, h3]
0: <h3>​
1: <p>​
2: <div>​
3: <h3>
length: 4

Best way to markup "mainContentOfPage"?

For other areas of a web page, such as the navigation, header, footer and sidebar, the markup is simple.
Not so with mainContentOfPage. I've seen a number of different ways to implement it, most recently (and I found this one the strangest) on schema.org itself:
<div itemscope itemtype="http://schema.org/Table">
<meta itemprop="mainContentOfPage" content="true"/>
<h2 itemprop="about">list of presidents</h2>
<table>
<tr><th>President</th><th>Party</th></tr>
<tr>
<td>George Washington (1789-1797)</td>
<td>no party</td>
</tr>
<tr>
<td>John Adams (1797-1801)</td>
<td>Federalist</td>
</tr>
...
</table>
</div>
I could use some examples. In this case the main content of my page is a search results page, but I plan to use this on other pages too (homepage, product page, etc.).
Edit: I found some more examples.
Would this be valid? I found it on a blog:
<div id="main" itemscope itemtype="http://schema.org/WebPageElement" itemprop="mainContentOfPage">
<p>The content</p>
</div>
I also found this even simpler example on another blog (might be too simple?):
<div id="content" itemprop="mainContentOfPage">
<p>The content</p>
</div>
The mainContentOfPage property can be used on WebPage and expects a WebPageElement as value.
But Table is not a subtype of WebPage, and true is not an expected value. So this example is indeed strange, as it doesn't follow the specification.
Instead, a parent WebPage item should use the Table item as the value of mainContentOfPage:
<body itemscope itemtype="http://schema.org/WebPage">
<div itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/Table">
</div>
</body>
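Here is a sketch of how this might look for the search results page from the question. It assumes schema.org's SearchResultsPage type (a subtype of WebPage) fits your page. The heading and result links are hypothetical placeholders:

```html
<!-- Sketch: assumes SearchResultsPage (a WebPage subtype) describes the page -->
<body itemscope itemtype="http://schema.org/SearchResultsPage">
  <header><!-- site header, navigation --></header>
  <main itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/WebPageElement">
    <h1>Search results</h1>
    <!-- Hypothetical result links -->
    <ol>
      <li><a href="/product/1">First result</a></li>
      <li><a href="/product/2">Second result</a></li>
    </ol>
  </main>
  <footer><!-- site footer --></footer>
</body>
```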
EDIT: Update
Your second example is the same as mine; it just uses the more general WebPageElement instead of Table. (Of course you'd still need a parent WebPage item, as in my example.)
Your third example is not in line with schema.org’s definition, as the value is Text and not the expected WebPageElement (or child) item.
A valid option would be:
<body itemscope itemtype="http://schema.org/WebPage">
<main itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/WebPageElement">
<div itemprop="about" itemscope="" itemtype="http://schema.org/Thing">
<h1 itemprop="name">whatever</h1>
</div>
</main>
</body>
Of course you may add related properties to top-level or nested elements, and change Thing into any other item type listed in the Full Hierarchy. I also recommend using mainEntity. The documentation still doesn't clarify whether it's really necessary, but according to the first example here, when using WebPage you may want to specify a mainEntity:
<body itemscope itemtype="http://schema.org/WebPage">
<header><h1 itemscope itemprop="mainEntity" itemtype="http://schema.org/Thing">whatever</h1></header>
<main itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/WebPageElement">
<div itemprop="about" itemscope="" itemtype="http://schema.org/Thing">
<h2 itemprop="name">whatever</h2>
</div>
</main>
</body>
I can't tell whether this would also be valid:
<body itemscope itemtype="http://schema.org/WebPage">
<main itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/WebPageElement">
<div itemprop="mainEntity" itemscope="" itemtype="http://schema.org/Thing">
<h1 itemprop="name">whatever</h1>
</div>
</main>
</body>
The documentation doesn't say anything about setting mainEntity on nested items.
In any case, consider that "[...] Every web page is implicitly assumed to be declared to be of type WebPage [...]", as stated in the WebPage description. Also, HTML elements such as <main>, <footer> or <header> already tell consumers what kind of content each part of the page holds. So if you don't actually need to add information to those elements or to the page itself, proper use of HTML elements lets you do without mainContentOfPage, or even WebPage.

Google Structured Data Testing Tool doesn't validate GoodRelations extension

<div
itemscope="itemscope"
itemtype="http://schema.org/Product"
itemid="urn:mpn:123456789">
<link
itemprop="additionalType"
href="http://www.productontology.org/id/Lawn_mower">
<span
itemprop="http://purl.org/goodrelations/v1#category"
content="Lawn mower">
Lawn mower
</span>
</div>
Above is a fragment of my markup. When I run it through the Google Structured Data Testing Tool, I get this error:
'Error: Page contains property "http://purl.org/goodrelations/v1#category" which is not part of the schema.'.
I was thinking about removing the microdata from the span tag and keeping only the link tag above, so that the markup validates.
On [http://www.productontology.org/doc/Lawn_mower] there is the statement: "Breaking news: schema.org has just implemented our proposal to define an additionalType property with the use of this service in mind!" I take this to mean it is compatible.
Can this error affect my SEO? Any advice? I have searched a lot and can't find anything related.
The final markup after @daviddeering's help:
<div itemscope="itemscope" itemtype="http://schema.org/Product" itemid="urn:mpn:123456789">
<a href="http://127.0.0.1/jkr/123456789" itemprop="url">
<img itemprop="image" alt="Partnumber:123456789" src="http://127.0.0.1/jkr/img/123456789.jpg" content="http://127.0.0.1/jkr/img/123456789.jpg">
<span itemprop="name">123456789 - Bosh lawn mower</span>
</a>
<span>PartNumber: </span>
<span itemprop="mpn">123456789</span>
<span>Line: </span>
<span>Lawn mower</span><link itemprop="additionalType" href="http://www.productontology.org/id/Lawn_Mower">
<span>Manuf.: </span>
<div itemscope="itemscope" itemprop="manufacturer"
itemtype="http://schema.org/Organization"><span itemprop="name">Bosh</span>
</div>
<div itemprop="offers" itemscope="itemscope" itemtype="http://schema.org/Offer">
<meta itemprop="availabilityStarts" content="2013-10-20T05:27:36"><span itemprop="priceCurrency" content="USD">USS</span><span itemprop="price" content="565.29">565,29*</span>
<link itemprop="availability" href="http://schema.org/OutOfStock"><span itemprop="inventoryLevel" content="0">Ask for it</span>
</div>
</div>
Well, the Product schema must always include a name, and the structure of your last itemprop line was incorrect. The following code tested fine in Google's testing tool:
<div
itemscope="itemscope"
itemtype="http://schema.org/Product"
itemid="urn:mpn:123456789">
<span itemprop="name">Name of Lawn Mower</span>
<link
itemprop="additionalType"
href="http://www.productontology.org/id/Lawn_mower">
<span rel="gr:hasBusinessFunction" resource="http://purl.org/goodrelations/v1#sell"
content="Lawn mower">
Lawn mower
</span>
</div>
In your case, though, I'm not sure it's necessary to combine the Product schema and the GoodRelations markup. You could create the entire markup using just GoodRelations. Alternatively, you could use schema.org: keep the <link itemprop="additionalType" href="http://www.productontology.org/id/Lawn_mower"> tag where it currently is in the code, then continue using schema.org to mark up the rest.
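Here is a minimal sketch of the schema.org-only option, reusing the values from the question's final markup. Whether Google then shows a rich snippet for it is not guaranteed:

```html
<!-- Sketch: schema.org vocabulary only, values taken from the question -->
<div itemscope itemtype="http://schema.org/Product" itemid="urn:mpn:123456789">
  <span itemprop="name">123456789 - Bosh lawn mower</span>
  <!-- additionalType expects a URL, so it goes in a link element's href -->
  <link itemprop="additionalType" href="http://www.productontology.org/id/Lawn_mower">
  <span itemprop="mpn">123456789</span>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <span>565,29</span>
    <!-- Machine-readable price uses a dot as the decimal separator -->
    <meta itemprop="price" content="565.29">
    <meta itemprop="priceCurrency" content="USD">
    <link itemprop="availability" href="http://schema.org/OutOfStock">
  </div>
</div>
```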

Rich Snippets: Microdata itemprop outside the itemtype?

I've recently decided to update a website by adding rich snippets (microdata).
The thing is, I'm a newbie to this kind of thing and I have a small question about it.
I'm trying to define the Organization as you can see from the code below:
<div class="block-content" itemscope itemtype="http://schema.org/Organization">
<p itemprop="name">SOME ORGANIZATION</p>
<p itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">Manufacture Street no 4</span>,
<span itemprop="postalCode">4556210</span><br />
<span itemprop="addressLocality">CityVille</span>,
<span itemprop="addressCountry">SnippetsLand</span></p>
<hr>
<p itemprop="telephone">0444 330 226</p>
<hr>
<p>info@snippets.com</p>
</div>
Now, my problem is this: I'd also like to tag the logo to make the Organization profile complete. However, the logo sits in the header of my page, while the div I've posted above sits in the footer, and the page's style/layout doesn't let me add a visible logo there.
So, how can I solve this thing? What's the best solution?
Thanks.
You can use the itemref attribute.
Give your logo in the header an id and add the corresponding itemprop:
<img src="acme-logo.png" alt="ACME Inc." itemprop="logo" id="logo" />
Now add itemref="logo" to your div in the footer:
<div class="block-content" itemscope itemtype="http://schema.org/Organization" itemref="logo">
…
</div>
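Put together, a page skeleton might look like the sketch below (only the logo, name and telephone are shown). Keep the img outside any other element with itemscope; otherwise it would also become a logo property of that item:

```html
<body>
  <header>
    <!-- Not inside any itemscope; the id lets the footer item reference it -->
    <img src="acme-logo.png" alt="ACME Inc." itemprop="logo" id="logo" />
  </header>
  <main><!-- page content --></main>
  <footer>
    <!-- itemref="logo" adds the header's logo property to this Organization -->
    <div class="block-content" itemscope itemtype="http://schema.org/Organization" itemref="logo">
      <p itemprop="name">SOME ORGANIZATION</p>
      <p itemprop="telephone">0444 330 226</p>
    </div>
  </footer>
</body>
```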
If this is not possible in your case, you could "duplicate" the logo so that it's included in your div, but not visible. Microdata allows meta and link elements in the body for this case. You should use the link element, as http://schema.org/Organization expects a URL for the logo property. (Alternatively, you could provide it as a separate, nested ImageObject item.)
<div class="block-content" itemscope itemtype="http://schema.org/Organization">
…
<link itemprop="logo" href="logo.png" />
…
</div>
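Here is a sketch of the ImageObject alternative mentioned above. The nested item has no visible content, and its URL goes in a link element because it is a URL value:

```html
<div class="block-content" itemscope itemtype="http://schema.org/Organization">
  <p itemprop="name">SOME ORGANIZATION</p>
  <!-- Nested ImageObject with no visible content; its URL is a link href -->
  <div itemprop="logo" itemscope itemtype="http://schema.org/ImageObject">
    <link itemprop="url" href="logo.png" />
  </div>
</div>
```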
Side note: I don’t think that you are using the hr element correctly in your example. If you simply want to display a horizontal line, you should use CSS (e.g. border-top on the p) instead.
Dan, you could simply add in the logo schema with this code:
<img itemprop="logo" src="http://www.example.com/logo.png" />
So in your example, you could simply tag it as:
<div class="block-content" itemscope itemtype="http://schema.org/Organization">
<p itemprop="name">SOME ORGANIZATION</p>
<img itemprop="logo" src="http://www.example.com/logo.png" />
<p itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">Manufacture Street no 4</span>,
<span itemprop="postalCode">4556210</span><br />
<span itemprop="addressLocality">CityVille</span>,
<span itemprop="addressCountry">SnippetsLand</span></p>
<hr>
<p itemprop="telephone">0444 330 226</p>
<hr>
<p>info@snippets.com</p>
</div>
I believe that should work for your particular case, and you wouldn't have to mark up the logo separately. Note, however, that the img element is rendered, so the logo will appear in the footer unless you hide it (e.g. with CSS). Hope that helps.

Metadata in a webshop with multiple products on a page

We have a webshop in Magento that has a lot of grouped products. A grouped product page has the basic info, and then a table with all the products in it. This table contains for each row the SKU, some attributes and the price. I want to add metadata (from schema.org) to it, but I'm not sure how to do this.
I tried it by adding an itemtype product for each and every row in that table, but that doesn't link to the product name in any way. I have also tried to make the whole page a product, but that doesn't give the desired result.
Has anyone come across this before and has solved it? Any input is welcome!
The page I'm working on: clickie
In fact, every row is a slightly different product (it differs by diameter, length, etc.). Ideally you should indicate this using a schema.org/Product nested in a schema.org/Offer and linked to the general product information using itemref. Something like this:
<div id="product_general">
<h1 itemprop="name" >Induweb spiraalboor, HSS, Rolgewalst, DIN 338, type N</h1>
</div>
<div itemscope itemtype="http://schema.org/Offer">
<div itemprop="itemOffered" itemscope itemtype="http://schema.org/Product" itemref="product_general">
<span itemprop="model">Diameter: 1.0</span>
</div>
<span>€ 0,13</span>
<meta itemprop="price" content="0.13">
<meta itemprop="priceCurrency" content="EUR">
</div>
The issue here is that you're using a table for the specific product and offer information. It seems there is no way to build the construction above with valid HTML in your current design. However, this is not a big problem if you care more about Rich Snippets than about perfectly correct markup.
So your issue with Rich Snippets right now is that the highest price is not correct.
You can easily fix this using schema.org/AggregateOffer. In your current code (simplified version):
<div class="wrapper product-view" itemscope itemtype="http://schema.org/Product">
<h1 itemprop="name" id="product_name">Induweb spiraalboor, HSS, Rolgewalst, DIN 338, type N</h1>
<img itemprop="image" src="http://induweb.nl/media/catalog/product/cache/1/image/185x/5e06319eda06f020e43594a9c230972d/import/Verspanen/Boren/Cylindrische schacht/100000002-induweb-spiraalboor-hss-rolgewalst-din-338-type-n_0/induweb.nl--100000002-30.jpg" alt="Induweb spiraalboor, HSS, Rolgewalst, DIN 338, type N" title="Induweb spiraalboor, HSS, Rolgewalst, DIN 338, type N" />
<table><tr><td itemprop="brand">InduWeb</td></tr></table>
<div itemprop="description">
<p>· Rolgewalst <br />· Cilinderschacht <br />· Rechtssnijdend <br />· Kegelmantelgeslepen 118° <br />· Zwarte uitvoering</p> </div>
<!-- Put the AggregateOffer here, with lowPrice and highPrice properties -->
<div itemprop="offers" itemscope itemtype="http://schema.org/AggregateOffer">
<meta itemprop="lowPrice" content="0.13">
<meta itemprop="highPrice" content="1.75">
<meta itemprop="offerCount" content="98">
<meta itemprop="priceCurrency" content="EUR">
</div>
<!-- End of AggregateOffer-->
<table>
<tr itemscope itemtype="http://schema.org/Offer" itemprop="offers">
<td itemprop="sku">
<div class="shipping shipping-176"></div>
<link itemprop="availability" href="http://schema.org/InStock">
100010006
</td>
<!-- Start sub attributen -->
<!-- -->
<td class="a-center">1.0</td>
<!-- -->
<td class="a-center">34</td>
<!-- -->
<td class="a-center">12</td>
<!-- Einde sub attributen -->
<td class="a-center" style="width: 25px;"><p>10</p></td>
<td>
<span class="price">€ 0,13</span>
<meta itemprop="price" content="0.13">
<meta itemprop="priceCurrency" content="EUR">
</td>
</tr>
</table>
</div>
Although it's not semantically perfect, it should give a pretty good result.
