Properties outside of 'itemscope' are assumed to belong to 'WebPage', but this creates invalid Microdata - microdata

Microdata allows elements with itemprop but without parent itemscope, as long as they are referenced by an itemref somewhere on the page. (See my question Is 'itemprop' without parent 'itemscope' valid? Does it create an item?).
So this example should be valid:
<body>
<div itemprop="email" id="orphan">
alice@example.com
</div>
<div itemscope itemtype="http://example.org/Person" itemref="orphan">
<span itemprop="name">Alice</span>
</div>
</body>
Now, when someone is using the Schema.org vocabulary instead (replacing "example" with "schema" in the itemtype value), it’s my understanding that this example would no longer be valid, because on http://schema.org/WebPage it says:
Every web page is implicitly assumed to be declared to be of type WebPage, so the various properties about that webpage, such as breadcrumb may be used. We recommend explicit declaration if these properties are specified, but if they are found outside of an itemscope, they will be assumed to be about the page
So this would mean that the following items and name-value pairs would be created:
Item <http://schema.org/Person>
name: Alice
email: alice@example.com
Item <http://schema.org/WebPage>
email: alice@example.com
But http://schema.org/WebPage can’t have an email property, so this is invalid Microdata, as in this case the itemprop value has to be
[…] a defined property name allowed in this situation according to the specification that defines the relevant types for the item
So this statement on http://schema.org/WebPage, if respected by consumers and implementors, would result in invalid Microdata in cases where an element with itemprop has no itemscope parent and the property is not allowed on WebPage.
Is this correct or am I missing something?
How should I deal with this? Ignore this statement? AFAIK Microdata doesn’t require following those "informal rules", right?

It's not often discussed in these terms, but validity is a relative concept. A document is valid or invalid only with respect to the requirements of a particular specification.
In this case, the microdata spec says nothing about WebPage, so the inclusion of the itemprop in an implicit WebPage item has no effect on the validity of your document with respect to the Microdata spec.
On the other hand, with respect to the schema.org WebPage spec, your document, or at least the WebPage item within it, would be invalid.
Whether this matters to you or not is your choice. There are really only two practical outcomes. A consumer of the microdata of your page can either create a WebPage item, or not create a WebPage item. It is highly unlikely that a consumer would refuse to create a WebPage item because of the presence of an additional out-of-schema itemprop, where it would have created the WebPage item otherwise.
And that's ultimately what validity is all about. It's there to establish a common language that producers and consumers of documents both understand. Provided that consumers understand the information that producers provide, technical violations of the validity rules of any particular specification are of little consequence.
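To make "valid with respect to which specification" concrete, here is a minimal Python sketch. The `ALLOWED` table is a hand-written, partial subset of Schema.org properties chosen for this example (an assumption, not the real vocabulary), and `out_of_schema` is a hypothetical helper. It checks the two items from the question against that table only; the Microdata spec itself has no such table for WebPage.

```python
# Hand-written, partial subset of Schema.org properties. This is an
# assumption for illustration; the real vocabulary defines many more.
ALLOWED = {
    "http://schema.org/Person": {"name", "email", "telephone"},
    "http://schema.org/WebPage": {"name", "breadcrumb"},
}


def out_of_schema(item_type, properties):
    """Return the property names the table does not allow on item_type."""
    allowed = ALLOWED.get(item_type, set())
    return sorted(p for p in properties if p not in allowed)


# The two items from the question, as a consumer honoring the implicit
# WebPage rule might build them (property names only).
items = [
    ("http://schema.org/Person", ["name", "email"]),
    ("http://schema.org/WebPage", ["email"]),
]

report = {item_type: out_of_schema(item_type, props) for item_type, props in items}
print(report)
```

Only the WebPage item is flagged, and only because the check was run against the vocabulary table. A tolerant consumer could simply drop the flagged property and keep the item.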

Related

Schema markup: WebPage vs Article

I'm providing a website about my health-related services, with a few pages describing my practice and services, my approach of work, and many articles about specific topics related to my field of work (imagine what a doctor or therapist is doing, that should give the idea).
But I'm confused whether to define my pages as "WebPage" or "Article"?
I defined them as 'Article' now, which in turn disallows me from tagging my phone number with
<span itemprop="telephone">
though, according to Google's Structured Data testing tool.
In the typical case, you would use both. You could provide a WebPage item on every page, and if the web page contains an article, you could provide an Article item in addition (or multiple, of course).
For a page dedicated to an article, you could use the mainEntity property to denote that the Article is the primary thing on that page:
<body itemscope itemtype="http://schema.org/WebPage">
<article itemprop="mainEntity" itemscope itemtype="http://schema.org/Article">
</article>
</body>
Neither a web page nor an article can have a telephone number (at least not in typical cases, which is why Schema.org doesn’t define the telephone property for WebPage/Article). A telephone number typically belongs to a person or an organization, which are among the types that can have the telephone property.
So you need an item that represents your business: in your case probably LocalBusiness. Then you can provide this item as author of the WebPage and/or the Article etc.
PS: Whenever you use a type, check if a more specific child type applies in your case. So in your case maybe something like MedicalWebPage, NewsArticle, HealthAndBeautyBusiness, etc.

Joomla component "attachments" allow html in input

This question might be a bit special. I am using this Joomla 2.5 extension to give authors the ability to add attachments to articles: Joomla Attachments
The extension renders an input field called "description" in a backend form to insert a file description for the provided file. Unfortunately it's not accepting HTML tags, which I need. When the form is saved, it seems a strip_tags() or preg_replace() or something similar cleans the input. I combed through the code of the attachments extension but couldn't find a place where the input is cleaned or saved.
To hopefully stay within the question-and-answer rule of Stack Overflow:
Is there a class which extensions inherit from the Joomla Core to save form data to a DB-table ( which also could be responsible to clean and validate user input )?
thanks for any idea,
tony
You should see how the field is defined first:
1. Form definition
look into the
administrator/components/yourcomponent/models/forms/somename.xml
There you may find a form definition. If so, it will also specify the field type, and depending on the type there are several available filters. For example, the default textarea will strip HTML, and you need to set
filter="raw"
in order to allow it. See http://docs.joomla.org/Standard_form_field_and_parameter_types for a list of fields; click on a field type to find its available format options.
2. model
If the model inherits from JModelAdmin or JModelForm or another JModel* class, it will automatically handle binding of the form's data to the database. Look for the save function, which should receive the form $data.
3. more
There are at least another dozen possibilities. If the above didn't help, try finding the form: possibly you could find it just by looking at the markup. Once you have the form, check the following fields:
option
task
view
This should help you find the php code that is invoked based on the form:
If view is set, you may find the saving logic in ./views/someview/view.html.php.
If task is set, look for a function with the same name in ./controller.php.
If task contains a ".", look for the controller in the ./controllers/ folder.
If option is not the name of your component, your component is sending the data to another component for saving, and most likely sets a return URL.
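The lookup heuristics above can be summarized in a short Python sketch. This is not Joomla code, only the search procedure written down. The helper name `where_to_look` and the task names are hypothetical, and mapping a dotted task to `./controllers/<name>.php` is an assumption based on the usual Joomla file naming.

```python
def where_to_look(task=None, view=None):
    """Return candidate locations for the code that handles a form submit."""
    hints = []
    if view:
        # A view was posted: the saving logic may live in the view class.
        hints.append(f"./views/{view}/view.html.php")
    if task and "." in task:
        # "controller.method" style task: look in the controllers folder.
        controller, method = task.split(".", 1)
        hints.append(f"./controllers/{controller}.php -> {method}()")
    elif task:
        # Plain task: a function of that name in the main controller.
        hints.append(f"./controller.php -> {task}()")
    return hints


print(where_to_look(task="attachment.save"))
print(where_to_look(task="save", view="attachments"))
```

Read the values of task and view from the hidden fields of the rendered form, then open the files the sketch points to.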

rel="canonical" html5 unable to validate

Should the link attribute rel="canonical" validate against html5?
It is the first time I am using this and I am getting the following validation errors:
"Bad value canonical for attribute rel on element link: Keyword canonical is not registered."
It kind of suggests it shouldn't, although I can't find any concrete documentation on this.
Edit - Here is the line that is throwing the validator off:
<link rel="canonical" href="http://dev.local/" />
I have tried it with and without the closing slash.
The validator marks canonical as invalid because canonical is a recent addition (as Gutmann pointed out) and the validation tool is not updated in real time. The W3C explains why canonical is listed on the Microformats wiki but does not validate: the validator's copy of the wiki data is updated manually.
You will see this in the validation reporting:
"A whitespace-separated list of link types listed as allowed on <link> in the HTML specification or listed as an allowed on <link> on the Microformats wiki without duplicate keywords in the list. Note that updates of the wiki data in the validator are manual and do not happen in real time"
This will validate on their next manual update.
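The rule quoted in that message can be sketched in Python to show why the same markup fails before the manual update and passes after it. Both keyword sets are partial and hand-written for illustration, and `check_rel` is a hypothetical helper, not the validator's actual code.

```python
# Partial list of link types the HTML spec allows on <link> (assumption:
# a hand-picked subset, not the full list).
HTML_SPEC_LINK_TYPES = {
    "alternate", "author", "help", "icon", "license",
    "next", "prev", "search", "stylesheet",
}


def check_rel(value, wiki_snapshot):
    """Return a list of problems with a rel value on a <link> element.

    wiki_snapshot is the validator's (manually updated) copy of the
    Microformats wiki list.
    """
    problems = []
    seen = set()
    for token in value.split():
        keyword = token.lower()  # link types are compared case-insensitively
        if keyword in seen:
            problems.append(f"duplicate keyword {keyword}")
        seen.add(keyword)
        if keyword not in HTML_SPEC_LINK_TYPES and keyword not in wiki_snapshot:
            problems.append(f"keyword {keyword} is not registered")
    return problems


print(check_rel("canonical", wiki_snapshot=set()))          # stale snapshot
print(check_rel("canonical", wiki_snapshot={"canonical"}))  # after the update
```

The document does not change between the two calls. Only the validator's snapshot of the wiki does.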
There is no definition for the canonical rel in the HTML5 spec, but it does also say that:
The rel attribute has no default value. If the attribute is omitted or if none of the values in the attribute are recognized by the user agent, then the document has no particular relationship with the destination resource other than there being a hyperlink between the two.
So, it's not technically conforming HTML5, but it will simply be ignored by UAs which don't understand it.
Use this data-rel
<link data-rel="canonical" href="http://dev.local/" />
The error description references the Microformats wiki as a list of valid link types, and that list contains the "canonical" link type.
I believe that this is a temporary bug in the validator because it reported an error for rel="canonical" only on my HTML5 website, but not on my other XHTML website.
From what I can gather from the output of the validator and the part of the spec that defines the link's rel attribute I'd say the validator is marking the document as invalid due to the "canonical" type being only a proposal and not part of the official linkTypes right now.
At the same time that page also says ...
Types defined as extensions in the Microformats wiki existing-rel-values page with the status "proposed" or "ratified" may be used with the rel attribute on link, a, and area elements in accordance to the "Effect on..." field. [MFREL]
The validation message refers to this list of currently valid extensions to the "official catalog".
Up until June 2 this list did not contain the canonical link type so the validator was IMO correct in marking the document as invalid.
But now that the canonical type is in the list of proposed types, I think it is just a matter of time before the validator will recognize it too :-)

How to get rid of this w3 validation error?

I developed a web page and am now validating it as HTML 4.0 with the W3C validator. I got one error, which says:
Error Line 30, Column 57: there is no attribute "DATA-FLEXMENU"
href="about.php" class="mainlink" data-flexmenu="flexmenu1">About Us</a></div>
You have used the attribute named above in your document, but the document type you are using does not support that attribute for this element. This error is often caused by incorrect use of the "Strict" document type with a document that uses frames (e.g. you must use the "Transitional" document type to get the "target" attribute), or by using vendor proprietary extensions such as "marginheight" (this is usually fixed by using CSS to achieve the desired effect instead).
Is there any way of getting rid of this error? Any suggestions?
Data attributes are part of HTML5.
See HTML 5 data- Attributes
Either you can change the doctype to html5 or remove the data attribute.
You can specify HTML 5 doctype like
<!DOCTYPE html>
As unhelpful as it will sound, either remove the attribute "data-flexmenu" from your markup, or accept a non-valid result.
The results are accurate: the A element in the DTD does not contain an attribute definition for the attribute data-flexmenu.
Alternatively you could define your own DTD, host it on a central server, and reference that instead of the W3C one.
Or (as pointed out) use the HTML5 doctype instead of the HTML 4 one.

Is the concept of a link inseparable from its html markup?

I'm looking for a strategy for managing links within articles. The body of the article is saved in a database and pulled during page assembly. What all should be saved in the database to easily define and manage links?
Some purists believe that markup should NEVER be stored in the database. Some believe it's OK in moderation. But to me, the notion of a link is almost inseparable from its HTML markup.
Is there a better, more succinct way of representing a link in an article (in a database) than simply embedding an <a> element around the anchor text?
One idea I've kicked around involves embedding just enough markup to semantically describe areas of interest, and in a different table, map those notions to actual URLs. All encounters of a particular notion get wrapped with the link.
<p>Here is an example of a
<span class="external-reference semantic-web">semantic</span>
approach to link management.</p>
A table then might associate the URL of the article and the key class of 'semantic-web' to a URL like http://en.wikipedia.org/wiki/Semantic_Web, so that the rendered output becomes:
<p>Here is an example of a <span class="external-reference semantic-web">
<a href="http://en.wikipedia.org/wiki/Semantic_Web">semantic</a></span>
approach to link management.</p>
What I like about this approach is that all my URLs are in one location in the database. I could technically change or remove links without touching the body of the article. I have very good class names for CSS.
I don't like having another table to maintain, and another step/phase in render time. It could slow down response time.
Are there any other strategies out there that provide superior link management?
You may want to look at templating (such as Smarty for PHP).
I agree that markup shouldn't normally be held in the database.
However, you might also consider implementing a "pointer" concept, where at each link, you break your storage of the page, add a pointer in the table to the link, then a pointer in the link table to the next segment of content for the page. (I have no idea how complicated that would be - just an idea.)
Or look at how various CMS tools handle the idea. Some just put everything in the database as one big block of text, while others rely on templating, and others may do something else entirely (like object-oriented environments such as Plone).
There are a few attempts to do this that I have seen.
One way to do this is through URL redirects. You can implement a logic component on the server that interprets what the URL is requesting, rather than treating it as a path to the content.
Another approach is to set the links initially to a reference value (which can be looked up in a database) and resolve it at runtime or generation time.
Regardless, you will have to reference the material that you wish to link to with some sort of identifier.
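The reference-value idea can be sketched in Python as a render-time pass over the stored article body. The `LINKS` dictionary is a hypothetical stand-in for the link table, `resolve_links` is an illustrative helper name, and the regular expression assumes the exact span markup from the question. A real implementation would more likely use an HTML parser, since regex matching breaks on nested spans or reordered attributes.

```python
import re
from html import escape

# Hypothetical stand-in for the link table: reference key -> URL.
LINKS = {"semantic-web": "http://en.wikipedia.org/wiki/Semantic_Web"}

# Matches the span markup used in the question, capturing key and text.
SPAN = re.compile(
    r'<span class="external-reference ([\w-]+)">(.*?)</span>', re.DOTALL
)


def resolve_links(body, links):
    """Wrap the text of each known reference span in an anchor."""
    def wrap(match):
        key, text = match.group(1), match.group(2)
        url = links.get(key)
        if url is None:
            return match.group(0)  # unknown key: leave the span untouched
        return (
            f'<span class="external-reference {key}">'
            f'<a href="{escape(url, quote=True)}">{text}</a></span>'
        )
    return SPAN.sub(wrap, body)


article = (
    '<p>Here is an example of a '
    '<span class="external-reference semantic-web">semantic</span> '
    'approach to link management.</p>'
)
print(resolve_links(article, LINKS))
```

Because the stored body never contains a URL, changing or removing an entry in the table changes every article that uses the key. The cost is the extra pass at render time, which could be offset by caching the rendered output.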

Resources