Why is my html encoding causing an error? - validation

I am working on a website, and when I try to validate the page I get the following error:
The character encoding specified in the HTTP header (iso-8859-1) is different from the value in the <meta> element (utf-8). I will use the value from the HTTP header (iso-8859-1) for this validation.
Here is the code in my header:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8"/>
I don't see where the iso-8859-1 is coming from. Any suggestions?
Thanks.

The web server specifies the encoding in the HTTP header, and it has set it to iso-8859-1. But in your page, you wrote:
<meta http-equiv="Content-type" content="text/html;charset=UTF-8"/>
These two values conflict. I can only suppose that the web server is right (it sent the data, after all), and the validator makes the same assumption.
If you want to send UTF-8 encoded files, check that the content really is UTF-8 encoded, and check the header information. Ultimately the behavior depends on the web server configuration and on how the page is generated.
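One way to check whether the content really is UTF-8 is to try decoding the raw bytes strictly. The following is a minimal Python sketch; the function name is_valid_utf8 and the sample text are illustrative and not taken from the question.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict decoding raises on invalid sequences
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample text, saved in two different encodings.
as_utf8 = "Grüße".encode("utf-8")
as_latin1 = "Grüße".encode("iso-8859-1")

print(is_valid_utf8(as_utf8))    # True
print(is_valid_utf8(as_latin1))  # False
```

Note that a page containing only ASCII characters passes this check either way, because ASCII bytes are identical in UTF-8 and iso-8859-1.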

That's the head of your HTML file, not the HTTP headers the server is sending. The meta element defines equivalents to HTTP headers. If an HTTP header is sent and a meta element with the equivalent also exists, the user agent must decide which to use. It might work in your browser, but it seems the validator you are using gives precedence to the actual HTTP header.
So you have to figure out how to make your server send the correct Content-type header. If your page is generated by a PHP script, you can call header('Content-type:text/html;charset=UTF-8'); at the beginning of your script, before any output is sent, to fix it.

Check the default HTTP headers that are sent (you can see them in Firebug's Net tab, if you use it).
There is probably a Content-Type header with its charset set to iso-8859-1.
HTTP headers are different from the HTML head (which is part of the body of the HTTP message), where your META tag specifies UTF-8 as the charset.
Since the two values conflict, you are getting an error.
Solution:
Make both charsets identical (either UTF-8 or iso-8859-1), matching the encoding the file is actually saved in.
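The comparison the validator performs can be approximated in a few lines. This is only a rough Python sketch under stated assumptions: the helper names are made up for illustration, the parsing is regex-based rather than a real HTML parser, and only the http-equiv form of the meta tag is handled.

```python
import re


def charset_from_content_type(value):
    """Extract the charset parameter from a Content-Type value, lowercased."""
    match = re.search(r"charset\s*=\s*[\"']?([\w.:-]+)", value, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_meta(html):
    """Return the charset declared by a <meta http-equiv="Content-type"> tag."""
    for tag in re.findall(r"<meta\b[^>]*>", html, re.IGNORECASE):
        if re.search(r"http-equiv\s*=\s*[\"']?content-type", tag, re.IGNORECASE):
            content = re.search(r"content\s*=\s*[\"']([^\"']*)[\"']", tag, re.IGNORECASE)
            if content:
                return charset_from_content_type(content.group(1))
    return None


def find_mismatch(header_value, html):
    """Return (header_charset, meta_charset) if both exist and differ, else None."""
    header_charset = charset_from_content_type(header_value)
    meta_charset = charset_from_meta(html)
    if header_charset and meta_charset and header_charset != meta_charset:
        return header_charset, meta_charset
    return None


page = '<head><meta http-equiv="Content-type" content="text/html;charset=UTF-8"/></head>'
print(find_mismatch("text/html; charset=iso-8859-1", page))  # ('iso-8859-1', 'utf-8')
print(find_mismatch("text/html; charset=UTF-8", page))       # None
```

The header value would normally come from the server's actual response; here it is passed in as a plain string so the sketch runs without network access.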

Related

"Stray doctype" error in firefox source code viewer

Since I learned to serve XHTML pages as XML, I have started noticing something odd: whenever I view an XHTML page in the Firefox source code viewer, the DOCTYPE is always marked as an error. According to the tooltip I get from mousing over it, the error in question is a "stray doctype". From what I understand, a "stray doctype" means that there is an extra DOCTYPE in the middle of the document where it doesn't belong, which is certainly not the case here.
Here's an example - this markup will pass validation, and display correctly in all modern browsers:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!--FF source viewer will mark the preceding two lines as an error.-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="content-type"
content="application/xhtml+xml; charset=utf-8" />
<title>Sample XHTML Page</title>
</head>
<body>
<p>This is an example.</p>
</body>
</html>
This error message is especially odd, considering that these pages pass validation perfectly, and that a single parsing error would normally break the page.
I am the developer of this feature. You have found a bug. (Filed just now.) Thanks.
View Source syntax highlighting is based on the HTML parser, because our XML parser is not suited for the purpose and XML is rare enough that it doesn't make sense to put resources into implementing proper XML View Source. Hence, the XML View Source feature is a hack on the HTML parser and this aspect doesn't work quite right.
The error appears because the file is saved as UTF-8 with a BOM instead of plain UTF-8. Open the file in Notepad and change its encoding.
In addition to @Public Sphere's answer:
This warning can also occur when using <!DOCTYPE html>.
Probably the same warning is then also shown for the <html>, <head> and <body> tags (stray start tag "html").
To check if UTF-8 BOM is the problem:
Click the 'network' tab
Click the first request
On right details panel, click 'Response' tab and expand 'Response Payload'
You'll see the raw response now.
A red dot is in front of the doctype line,
and on hover it displays "\ufeff"
To easily find the files that could cause the problem, you can, in Linux, use this grep to find all files with BOM:
grep -rl $'\xEF\xBB\xBF' .
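The same check can be made on the bytes of a single file or response. Below is a small Python sketch; the function names and the sample bytes are illustrative assumptions, not part of the answers above.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if there is one."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


# Hypothetical file content saved as "UTF-8 with BOM".
sample = codecs.BOM_UTF8 + b"<!DOCTYPE html>"

print(has_utf8_bom(sample))                        # True
print(strip_utf8_bom(sample))                      # b'<!DOCTYPE html>'
print(sample.decode("utf-8").startswith("\ufeff")) # True
```

When reading text in Python, decoding with the "utf-8-sig" codec drops the BOM automatically, which avoids the stray "\ufeff" character in front of the doctype.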

Opera and Safari aren't displaying latin1 characters

I'm having trouble displaying Latin-1 characters such as "ç", "ã" or "À" in the latest versions of Safari and Opera. I receive JSON data (Latin-1 charset) from a RoR backend using Ajax and jQuery, and the webpage itself is served as Latin-1, thanks to:
<?php header('Content-Type: text/html; charset=ISO-8859-1');?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:og="http://ogp.me/ns#"
xmlns:fb="http://www.facebook.com/2008/fbml"
lang="pt">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
The custom JavaScript library I made also explicitly states ISO-8859-1 when I include it some ten lines later:
<script type="text/javascript" src="js/lib.js" charset="ISO-8859-1"></script>
Nevertheless, both browsers fail to display the characters afterwards. Safari shows the infamous black diamond, while Opera simply shows a blank space.
Any ideas? Thanks in advance
Most likely the wrong charset is being sent in the Content-type HTTP header for the JSON data. In your post you show the headers and META tags for the page itself and for the included SCRIPT, but assuming the JSON data is sent separately, it is labelled separately. It would help to have a link to a page with this problem, but if you don't want to post one, you can use a tool like the Microsoft Fiddler HTTP debugger to inspect the headers being sent back and forth between the browser and the web site. If the web server sends
Content-type: text/html;charset=UTF-8
for a file whose content is in Latin-1 (iso-8859-1), or vice versa, that's your problem. Fix the HTTP header and you'll be fine.
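The effect of a mislabelled response can be reproduced without a browser. In this Python sketch the JSON payload and the helper name decode_as are made up for illustration; the sketch only shows what happens when Latin-1 bytes are decoded as UTF-8, which is one plausible source of the black diamond (U+FFFD, the replacement character).

```python
def decode_as(raw: bytes, charset: str) -> str:
    """Decode bytes with the given charset, replacing undecodable bytes with U+FFFD."""
    return raw.decode(charset, errors="replace")


# Hypothetical JSON body, encoded as Latin-1 by the backend.
payload = '{"nome": "ação"}'.encode("iso-8859-1")

wrong = decode_as(payload, "utf-8")       # label says UTF-8, bytes are Latin-1
right = decode_as(payload, "iso-8859-1")  # label matches the bytes

print("\ufffd" in wrong)              # True: replacement characters appear
print(right == '{"nome": "ação"}')    # True: the text round-trips intact
```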

Setting the character encoding in Day CQ

I've got some markup that I'm adding to a page component in Day CQ that was UTF-8 encoded by the author. Initially I couldn't save it in CRXDE, because the editor was set to save in ISO-8859-1. I found the setting to change this, but now when the page using this component is rendered to the browser, some of the characters appear to be using a different encoding. Is there a setting for the CQ web server or servlet engine that I need to change? I'm running CQ 5.3 on Windows 7.
Edit: The HTTP headers have Content-Type: text/html;charset=UTF-8 and there is a meta tag that specifies <meta http-equiv="Content-type" content="text/html; charset=utf-8">
I believe the solution was to add pageEncoding="UTF-8" to all JSPs that are part of rendering this page. I also modified the web.xml file per this link: http://www.coderanch.com/t/87264/Tomcat/Character-Encoding-Tomcat, and restarted the server a number of times.

Japanese characters are not displayed in IE6

I am trying to display Japanese characters in my page. The page works in all browsers except IE6. I noticed that some sites, such as http://translation.babylon.com/english/to-japanese/, display Japanese characters as boxes. As I said, the page works in all browsers except IE6.
The header I am using in the page is
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
and UTF-8 encoding.
Could you please help me find out what the issue is?
Thank you
Usually the content developer has to write the right meta tag for correct character decoding, like this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
If there is no meta tag in the content, the browser has to decode the page using its own auto-detection method. But auto-detection is not perfect: sometimes it works, sometimes it doesn't.

Trouble with http header specifying character encoding iso-8859-1 rather than utf-8?

I have recently designed a website that contains German and Dutch characters and I would like the page to use character encoding utf-8.
I have added the xml declaration:
<?xml version="1.0" encoding="UTF-8"?>
and the meta tag:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
When I viewed the website on-line, the special characters found in the German text were not displaying correctly. When I tried validating the page with the w3c validator, I got the following warning:
The character encoding specified in the HTTP header (iso-8859-1) is different from the value in the XML declaration (utf-8). I will use the value from the HTTP header (iso-8859-1).
Is this a server issue? It's just that I have uploaded the same files to a different server of mine and the pages display correctly there using utf-8.
Any help or advice regarding how I would go about getting the page to encode as utf-8 would be greatly appreciated.
I'm stumped!
Thanks to jason, I found a file named mod_mime-defaults.conf.
This file contains the following:
# AddDefaultCharset UTF-8
AddDefaultCharset ISO-8859-1
If I remove the # from before AddDefaultCharset UTF-8, do you think this will help? Or maybe add a # before AddDefaultCharset ISO-8859-1.
I tried editing this file, but I don't think I have permission. Hmmm...?
This could be a server issue.
If you are using Apache, check the Apache config file (usually located at /etc/httpd/conf/httpd.conf on a *nix server) for the value of AddDefaultCharset.
This setting specifies the default charset for all content served. If it is commented out, the charset is determined by the browser or by the META settings.
The HTML meta tag is not the same as the HTTP response header. You need to set the character encoding in the HTTP response header. Judging by your question history you're using PHP, or are at least familiar with it, so here's a PHP-targeted example of how to do it.
Put the following line in the PHP file before you echo any character.
header('Content-Type: text/html;charset=UTF-8');
See also:
PHP UTF-8 cheatsheet
Why do we need the meta content type tag?
Unrelated to the problem: you shouldn't put an XML declaration on an HTML page. It is a recipe for other sorts of trouble.
I changed charset=UTF-8 to charset=iso-8859-1, and the warning went away.
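Changing the declared charset makes the two labels agree, but it does not re-encode the file. If the pages are actually saved as UTF-8 (an assumption here; the thread does not confirm it), a browser told to read them as iso-8859-1 will still garble the special characters. A short Python sketch with made-up sample text:

```python
# Hypothetical German text, saved as UTF-8 on disk.
text = "für Köln"
utf8_bytes = text.encode("utf-8")

# What a client sees when the same bytes are labelled iso-8859-1:
# each two-byte UTF-8 sequence is read as two separate Latin-1 characters.
mislabelled = utf8_bytes.decode("iso-8859-1")

# What it sees when the label matches the bytes.
correct = utf8_bytes.decode("utf-8")

print(mislabelled == "fÃ¼r KÃ¶ln")  # True
print(correct == text)              # True
```

So the warning going away only means the header and the declaration now agree; whether the characters display correctly still depends on the encoding the files are saved in.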

Resources