html-agility-pack avoid parsing nodes within TextArea

Html-agility-pack seems to build nodes from elements within TextArea, which are not real nodes.
For example:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1255">
<title>Sample</title>
</head>
<body>
<TEXTAREA>Text in the <div>hello</div>area</TEXTAREA>
</body>
</html>
This will yield a child-node of "div" under the "textarea".
Browsers will treat everything as text.
Is there a way to compel html-agility-pack to behave the same way?
Clarification
I don't want the node to be created in the first place. If I run doc.DocumentNode.SelectNodes("//div") I want this to yield nothing. Right now I have to use doc.DocumentNode.SelectNodes("//div[not(ancestor::textarea)]"), and I have to do this for every select I perform to avoid phantom nodes.
Any ideas?

Use the InnerText property to get just the text of a node. This also gets the text of any child nodes (in this case the div).
var textArea = doc.DocumentNode.SelectSingleNode("//textarea");
string text = textArea.InnerText;

Issue has been fixed by the kind folks at zzzprojects.
Fix available and tested on version 1.8.2.
You can see the ticket here: Issue 183
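For builds that predate that fix, one possible workaround is to escape the markup inside each textarea before handing the document to the parser, so there is nothing left to turn into phantom nodes. Below is a minimal sketch of the idea, written in Python only for illustration. The helper name escape_textarea_bodies is hypothetical and not part of HTML Agility Pack; the same regex approach should port to C#. It assumes textareas are never nested, always have a closing tag, and have no ">" inside attribute values:

```python
import re

# Matches an opening <textarea ...> tag, its body, and the closing tag.
# Assumption: textareas are never nested and always have a closing tag.
_TEXTAREA = re.compile(
    r"(<textarea\b[^>]*>)(.*?)(</textarea\s*>)",
    re.IGNORECASE | re.DOTALL,
)


def escape_textarea_bodies(markup: str) -> str:
    """Escape angle brackets inside textarea bodies; leave the rest untouched."""

    def _escape(match):
        opening, body, closing = match.groups()
        # Only < and > are escaped, so entities already present in the body
        # (such as &amp;) are not double-escaped.
        body = body.replace("<", "&lt;").replace(">", "&gt;")
        return opening + body + closing

    return _TEXTAREA.sub(_escape, markup)


sample = "<TEXTAREA>Text in the <div>hello</div>area</TEXTAREA>"
print(escape_textarea_bodies(sample))
```

After this step the parser sees only text inside the textarea. The escaped entities may need decoding again when the text is read back.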

Related

Can aria-label be used on the title element

I have a page where the <title> tag contains some text (specifically: the department name) that screen readers do not pronounce very well (the department's name is ‘AskHR’ -- it’s the HR department’s helpdesk).
I want to provide screen readers with a more pronounceable version (‘Ask H R’) whilst keeping the more stylised version for visual display. I was thinking of using aria-label to achieve this, but I’m uncertain whether it can be applied to the <title> element in the <head>.
Can anyone confirm whether or not this is valid?
I don't think this is valid.
First not all screen readers are made equal!
What you're trying to do may work in some but not in others. For example VoiceOver reads out "AskHR" as you would expect. (And ignores the aria-label attribute.)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title aria-label="xxx">AskHR</title>
</head>
<body>
<button aria-label="close">X</button>
</body>
</html>
I think this is perhaps closer to what you're trying to do but support is limited:
.label {
speak-as: spell-out
}
See https://developer.mozilla.org/en-US/docs/Web/CSS/@counter-style/speak-as
If we inspect the example above in Chrome, you see this for the <button> element:
The aria-label attribute takes over the button content. VoiceOver reads out "close" instead of "x".
However this is what we see for <title>:

W3C Validation error: there is no attribute X

I have edited the post, and after many changes I still have one error: there is no attribute X
You have used the attribute named above in your document, but the
document type you are using does not support that attribute for this
element. This error is often caused by incorrect use of the "Strict"
document type with a document that uses frames (e.g. you must use the
"Transitional" document type to get the "target" attribute), or by
using vendor proprietary extensions such as "marginheight" (this is
usually fixed by using CSS to achieve the desired effect instead).
This error may also result if the element itself is not supported in
the document type you are using, as an undefined element will have no
supported attributes; in this case, see the element-undefined error
message for further information.
How to fix: check the spelling and case of the element and attribute,
(Remember XHTML is all lower-case) and/or check that they are both
allowed in the chosen document type, and/or use CSS instead of this
attribute. If you received this error when using the <embed> element
to incorporate flash media in a Web page, see the FAQ item on valid
flash.
Line 71, column 16: there is no attribute "property"
<meta property='og:locale' content='en_US'/>
How can I fix this?
Thanks in advance.
Update 1:
I replaced
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
with:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
That error no longer appears, but now I get other errors.
Update 2:
In header.php I replaced
<meta http-equiv="content-language" content="en_US" />
to:
<meta http-equiv="content-language" content="en_us" />
The second thing I did:
In opengraph.php (Yoast plugin) I replaced
if ( $echo )
echo "<meta property='og:locale' content='" . esc_attr( $locale ) . "'/>\n";
else
return $locale;
to:
if ( $echo )
echo "<meta property='og:locale' content='en_us'/>\n";
else
return $locale;
But the result is the same: one error.
In the XHTML 1.0 Transitional document type you are validating against, the <meta> element has no attribute called "property". You appear to be validating Open Graph protocol tags using the W3C's HTML validator. This is pretty much guaranteed not to work. It might be more useful to look at Facebook's debugger tool, which should provide feedback on OG markup.
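If the goal is only to confirm which Open Graph tags a page emits, a small script can list them without involving the DTD validator. Here is a rough sketch using Python's standard html.parser module. The class name OpenGraphCollector is made up for this example, and it only collects the tags; it does not check them against the Open Graph protocol:

```python
from html.parser import HTMLParser


class OpenGraphCollector(HTMLParser):
    """Collect <meta property="og:..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here
        # by the default handle_startendtag implementation.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:"):
            self.tags[prop] = attributes.get("content")


collector = OpenGraphCollector()
collector.feed("<head><meta property='og:locale' content='en_US'/></head>")
print(collector.tags)
```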

Is it valid to give a style element an ID?

It says here that it is not within HTML4, though I don't really see where that's spelled out in the text.
From what I can tell, based on this, it is ok to do so in HTML5 but I'm not entirely sure (assuming style is an HTML element?)
I am using this to rotate out a stylesheet and want it to be as valid as possible according to the HTML5 specs, so I'm wondering if I should rewrite it with a data-* attribute.
+1 Interesting question!
Instead of using a style block, you should consider linking (link) to your stylesheets and then switch them out by referencing an id or a class.
That said, title is perfectly acceptable for a style tag in HTML5. You can use this as a hook for your stylesheet switching.
http://www.w3.org/TR/html5/semantics.html#the-style-element
Fyi... this validates
<!DOCTYPE html>
<html>
<head>
<title>Title</title>
<style title="whatever"></style>
</head>
<body>
Test body
</body>
</html>
http://validator.w3.org/#validate_by_input+with_options
I've just put the following code into the W3C validator and it has no errors :)
<!DOCTYPE html>
<html>
<head>
<title>Title</title>
<style id="test"></style>
</head>
<body>
Test body
</body>
</html>
I think the W3C Validator is a good resource for this type of thing. It is marked as experimental, but that's because the standard is yet to be finalised.
It is not valid in HTML4 (as per the spec) and data-* attributes are not either. That is, the document will not validate against the Doctype spec if you use these attributes.
Regardless of whether the document validates or not, browsers will ignore elements that they do not recognize.
Style tags are DOM elements like any other tag, so you can add any attributes you want.

How to load static files from view HTML in web2py?

Given a view with layout, how can I load static files (CSS and JS, essentially) into the <head> from the view file?
layout.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="{{=T.accepted_language or 'en'}}">
<head>
<title>{{=response.title or request.application}}</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<!-- include requires CSS files
{{response.files.append(URL(request.application,'static','base.css'))}}
{{response.files.append(URL(request.application,'static','ez-plug-min.css'))}}
-->
{{include 'web2py_ajax.html'}}
</head>
<body>
{{include}}
</body>
</html>
myview.html
{{extend 'layout.html'}}
{{response.files.append(URL(r=request,c='static',f='myview.css'))}}
<h1>Some header</h1>
<div>
some content
</div>
In the above example, the "myview.css" file is either ignored by web2py or stripped out by the browser.
So what is the best way to load page-specific files like this CSS file? I'd rather not stuff all my static files into my layout.
In myview.html reverse the first two lines
{{response.files.append(URL(r=request,c='static',f='myview.css'))}}
{{extend 'layout.html'}}
Mind that 1.78.1 and 1.78.2 had a bug that did not allow this to work. It was fixed in 1.78.3 on the same day. The response.files.append(...) call can also be moved into the controller action that needs it. You are not supposed to put logic before extend, but you can define variables to be passed to the extended view.
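For the controller variant, a minimal sketch might look like the following. Inside a real web2py controller the names request, response and URL are supplied by the framework. The stand-ins at the top are assumptions added only so the snippet runs on its own, and the URL stub is far simpler than web2py's real helper. The action name myview is taken from the view file name in the question.

```python
from types import SimpleNamespace

# Stand-ins so the sketch runs outside web2py. In a real controller file,
# delete these three definitions: web2py injects the real objects.
request = SimpleNamespace(application="myapp")  # hypothetical app name
response = SimpleNamespace(files=[])


def URL(r=None, c=None, f=None):
    # Simplified stub: builds /<application>/<controller>/<function>.
    return "/%s/%s/%s" % (r.application, c, f)


def myview():
    # Register the page-specific stylesheet before the view is rendered,
    # so it is already in response.files when the layout builds <head>.
    response.files.append(URL(r=request, c='static', f='myview.css'))
    return dict()


myview()
print(response.files)
```

With the append done in the action, myview.html can start with the extend line as usual.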

Why does the use of the Frameset DTD cause a validation failure?

The project I work on takes random HTML files, converts them to XHTML as best as it can, and wraps them with some XML metadata. The DOCTYPE is stripped out, as the resulting XML file is not an XHTML document. However, when retrieving the wrapped XHTML from the XML file, the DOCTYPE should be reinserted.
Because these are random HTML files they could contain any content, but I would prefer not to have to store or determine the original DTD. I figured that I should use the Frameset DTD, as I assumed it was just a superset of the Transitional DTD and would be valid for all content. However, when using the W3C XHTML Validator with the same document, using the Transitional DTD passes but using the Frameset DTD fails.
I've stripped down the document to the minimum with which I can reproduce the problem. Here is the Frameset version:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Make The Move</title>
</head>
<body style="background: none;">
<h3 id="why">Why should I move to Linux?</h3>
</body>
</html>
And here is the Transitional version:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Make The Move</title>
</head>
<body style="background: none;">
<h3 id="why">Why should I move to Linux?</h3>
</body>
</html>
Please explain why this is happening, and how I should proceed.
Frameset DTD is not a 'superset' of transitional. It is a special DTD only used for laying out frames, not content (except inside <noframes> tag). It allows only <head> and <frameset> as the children of <html> tag.
Here is the specification.
Unless you know your page could have frames, stick to transitional or strict DTDs.
As Chetan pointed out, the Frameset DTD should only be used when you need frames, and even then I would recommend using Transitional instead. If you don't rely on frames, Strict is the way to go.
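Following that advice, the DOCTYPE to reinsert could be chosen from the content itself rather than stored. Below is a minimal sketch, assuming a frameset element only ever appears in documents that really are frameset pages. The function name choose_doctype is hypothetical, and the regex check is a heuristic (it would also match the tag inside a comment or CDATA section):

```python
import re

TRANSITIONAL = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
)
FRAMESET = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">'
)

# Matches <frameset ...> with or without a namespace prefix (e.g. <html:frameset>).
_FRAMESET_TAG = re.compile(r"<(?:\w+:)?frameset\b", re.IGNORECASE)


def choose_doctype(xhtml: str) -> str:
    """Return the Frameset DOCTYPE only when a frameset element is present."""
    return FRAMESET if _FRAMESET_TAG.search(xhtml) else TRANSITIONAL


page = '<html xmlns="http://www.w3.org/1999/xhtml"><body><h3 id="why">Why?</h3></body></html>'
print(choose_doctype(page))
```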
