Language characters showing as question mark boxes - utf-8

I have the following text (used for testing):
TÄSTåÄ
It's showing on the page as:
T�ST��
I have utf-8 as my content-type:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I'm using font-family:
"lucida grande",tahoma,verdana,arial,sans-serif
Any idea why these characters won't show up properly?
Thank you!

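You need to be sure that the file itself is saved as UTF-8. Declaring utf-8 in the meta tag does not convert the file: if the editor saved it in another encoding (for example ANSI/Windows-1252), the browser will misread the non-ASCII bytes.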

Please make sure you saved the file as 'UTF-8' or 'UTF-8 without BOM'. If the file is already saved as UTF-8, also check that the DOCTYPE is valid.
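The symptom in the question can be reproduced outside the browser. The following is a minimal Python sketch, assuming (this is not stated in the question) that the editor saved the file as Windows-1252 while the page declares utf-8:

```python
# The test string from the question.
text = "T\u00c4ST\u00e5\u00c4"  # TÄSTåÄ

# Assumption: the editor saved the file as Windows-1252, not UTF-8.
saved_bytes = text.encode("cp1252")

# The browser trusts the charset=utf-8 declaration and decodes accordingly.
# Bytes that do not form valid UTF-8 become U+FFFD, the replacement character.
shown = saved_bytes.decode("utf-8", errors="replace")
print(shown)  # T�ST��

# When the file really is saved as UTF-8, the bytes match the declaration.
round_trip = text.encode("utf-8").decode("utf-8")
print(round_trip == text)  # True
```

Each accented letter is a single byte in Windows-1252, and none of those bytes is a complete UTF-8 sequence on its own, which is why every one of them turns into a question-mark box.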

Related

Why are my search results not in the same charset as my page encoding?

I am using UTF-8 encoding for an html page.
<head>
<meta charset="utf-8">
In the debugger console, document.characterSet returns "UTF-8".
On the page, I have metadata (keywords, description, title) with a valid UTF-8 character: '®', which is UTF-8: 'c2ae'
The character displays correctly in the view source, and on the page title.
But Google and Bing search results are showing it as 'Â®'. That is, during the web crawl, it appears to be getting converted to ISO-8859-1 or Windows-1252, displaying both bytes: 'c2' and 'ae'.
If I replace the character with the HTML entity &reg; (\u00ae) it shows correctly.
Short of converting my meta data to ISO-8859-1, is there a best practice I should be using for this?
The issue was on the back end: the data was not being transcoded to UTF-8 properly when read from cache. So I feel the best practice is to use the native UTF-8 BMP character with the proper page encoding, rather than being required to use HTML entity values.
Look at the page's meta tags and confirm that it is not using this:
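The two-bytes-shown-as-two-characters effect described in the question can be sketched in a few lines of Python. This only illustrates the decoding mismatch; it is not the back-end code from the post:

```python
registered = "\u00ae"  # the registered-trademark sign

utf8_bytes = registered.encode("utf-8")
print(utf8_bytes.hex())  # c2ae

# Reading the same two bytes as ISO-8859-1 yields one character per byte.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)  # Â®

# ISO-8859-1 maps every byte value, so this particular misread is reversible.
repaired = misread.encode("iso-8859-1").decode("utf-8")
print(repaired == registered)  # True
```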
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
For HTML5 Google recommends:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
Also note this:
Note:
<meta charset="">
Another Note:
Some characters are reserved in HTML. "Html Entities"
These reserved characters in HTML must be replaced with character entities.
e.g.
& ampersand &amp; &#38;
® registered trademark &reg; &#174;
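For anyone generating the markup programmatically, the named and numeric forms can be produced and decoded with Python's standard `html` module. A small sketch (the sample strings are made up for illustration):

```python
import html

# Escape a reserved character for use in HTML text.
print(html.escape("AT&T"))      # AT&amp;T

# Named and numeric entities decode to the same character.
print(html.unescape("&reg;"))   # ®
print(html.unescape("&#174;"))  # ®

# Force a non-ASCII character into a numeric reference when the output
# must stay pure ASCII.
ascii_only = "\u00ae".encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ascii_only)               # &#174;
```

Note that only characters such as & and < strictly need escaping. A character like ® can be written directly as long as the page encoding is declared and served correctly.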

UTF-8 Special Characters not working in header file

I am facing a strange issue with wkhtmltopdf. While in the footer and content all special characters are shown as expected, in the header file they don't show up or get replaced by blocks with a question mark. All three files are built the same way:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
</body>
</html>
Is there some kind of trick to tell the header to use utf-8 too?
Btw, I am already telling wkhtmltopdf to use utf-8 not only as meta but also in the script call:
--encoding utf-8
EDIT: I am using an HTML header (as you can see in the code I posted). All three HTML files are built the same way. But while it works in the content, the header doesn't like my special chars. Maybe the problem is that the content in the header comes from a $_POST variable, while in the content the text is built from the db?
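The hypothesis in the edit, that the header text arrives in a different encoding than the database text, can be checked by testing whether the raw bytes are valid UTF-8 before writing them into the header file. A minimal sketch in Python (the question's stack is PHP; the function name `to_utf8_text` and the Windows-1252 fallback are assumptions for illustration, not taken from the post):

```python
def to_utf8_text(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode raw bytes as UTF-8, or as a fallback encoding if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Hypothetical fallback: treat the input as Windows-1252 form data.
        return raw.decode(fallback, errors="replace")


from_db = "\u00e9".encode("utf-8")     # é as stored in a UTF-8 database
from_form = "\u00e9".encode("cp1252")  # é as submitted by a non-UTF-8 form

print(to_utf8_text(from_db) == to_utf8_text(from_form))  # True
```

If the form data turns out not to be UTF-8, the header file would need transcoding before wkhtmltopdf reads it, since the declared charset alone does not change the bytes.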

UTF8 -- still showing weird characters?

My webpage's response headers show this:
Content-Type:text/html; charset=UTF-8
However, I still get a black diamond with white question mark for characters like é. What am I supposed to do exactly? It's my .htaccess that's setting UTF-8.
If it's a script or HTML file, check the encoding of the file itself, which should be saved as UTF-8.
In Zend, it's something like: Edit->Set encoding->Other: UTF-8,
If you are serving an HTML page you need to indicate in the HTML file that the content is UTF-8.
You can do this by adding a meta html tag to your header section:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
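If the editor offers no encoding option, the file can also be re-saved by a script. A minimal Python sketch, assuming the file is currently ISO-8859-1 (the helper name `resave_as_utf8` and the sample file are hypothetical):

```python
import tempfile
from pathlib import Path


def resave_as_utf8(path: Path, source_encoding: str) -> None:
    """Rewrite a file in place so its bytes are UTF-8."""
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))


# Demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "page.html"
    page.write_bytes("<p>caf\u00e9</p>".encode("iso-8859-1"))
    before = page.read_bytes()
    resave_as_utf8(page, "iso-8859-1")
    after = page.read_bytes()

print(before)  # b'<p>caf\xe9</p>'
print(after)   # b'<p>caf\xc3\xa9</p>'
```

The source encoding has to be known or guessed correctly. Running this on a file that is already UTF-8 would double-encode it.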

HTML unit displaying wrong characters

I'm using HtmlUnit. I can access the pages; however, special (Maltese) characters are being displayed wrongly. For example, ġuvni is displayed as ?uvni
HtmlPage page = submit_button.click();
System.out.println(page.asText());
I suspect it's an encoding problem, though I can't find any page.setPageEncoding or similar method... Has anyone had such a problem before?
Thanks!
Make sure your page is in UTF-8 by putting this meta tag in your <head>:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
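A plain ? (as opposed to the � box) usually appears when text is written out in a charset that has no slot for the character, and the encoder substitutes a question mark. Whether that is what happens here, for example at the console that System.out writes to, is an assumption and not confirmed by the thread. The effect itself can be sketched in Python:

```python
word = "\u0121uvni"  # ġuvni; the first letter is U+0121

# ISO-8859-1 has no ġ, so an encoder told to substitute emits '?'.
as_latin1 = word.encode("iso-8859-1", errors="replace")
print(as_latin1)  # b'?uvni'

# UTF-8 can represent the letter, using two bytes.
as_utf8 = word.encode("utf-8")
print(as_utf8)    # b'\xc4\xa1uvni'
```

If the page text is already correct in memory, the fix would be on the output side rather than in the page's meta tag.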

Firefox and UTF-16 encoding

I'm building a website with the encoding UTF-16. It means that every files (html,jsp) is encoded in UTF-18 and I set in the head of every HTML page :
<meta http-equiv="content-type" content="text/html; charset=UTF-16">
My index page is correctly displayed by Chrome and IE. However, Firefox doesn't render the index. It displays 2 strange characters and the full index page code:
��<!DOCTYPE html> <html> <head> <meta http-equiv="content-type" content="text/html; charset=UTF-16"> ...
Do you know the reason? It should be a problem of encoding, but I don't know where it's located...
Thanks
(Disclosure: I’m the developer responsible for the relevant code in Firefox.)
I'm building a website with the encoding UTF-16.
Please don’t. The short rules are:
1. Never use UTF-16 for interchange.
2. Always use UTF-8 for interchange.
3. If you break rules 1 & 2 and still use UTF-16, at least use the BOM (the right one).
But seriously, don’t break rules 1 and 2.
If you include user-provided content on your pages, using UTF-16 means that your site is vulnerable to socially engineered XSS at least in older browsers. Try this demo in an old version of Firefox (version 20 or older) or in a Presto-based version of Opera.
To avoid the vulnerability, use UTF-8.
It means that every files (html,jsp) is encoded in UTF-18
Uh oh. :-)
and I set in the head of every HTML page :
<meta http-equiv="content-type" content="text/html; charset=UTF-16">
A meta tag works as an internal encoding declaration only when the encoding being used maps the bytes of the meta tag to the same bytes ASCII would. That’s not the case for UTF-16.
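The point about the meta tag's bytes can be seen directly. A minimal Python sketch, using the tag from the question and assuming little-endian UTF-16:

```python
import codecs

meta = '<meta http-equiv="content-type" content="text/html; charset=UTF-16">'

ascii_bytes = meta.encode("ascii")
utf8_bytes = meta.encode("utf-8")
utf16_bytes = meta.encode("utf-16-le")

# UTF-8 is ASCII-compatible, so a byte-level scan for "<meta" succeeds.
print(utf8_bytes == ascii_bytes)  # True

# In UTF-16 every ASCII character is followed by a zero byte.
print(utf16_bytes[:10])           # b'<\x00m\x00e\x00t\x00a\x00'
print(b"<meta" in utf16_bytes)    # False

# A UTF-16 BOM read as UTF-8 is two invalid bytes.
bom_as_utf8 = codecs.BOM_UTF16_LE.decode("utf-8", errors="replace")
print(bom_as_utf8)                # ��
```

The last line is consistent with the two strange characters in the question, although the answer is clear that the actual cause cannot be confirmed without the response headers and raw body.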
Do you know the reason?
Not without full response headers and the original response body in a hex editor. The general solution, as noted above, is to always use UTF-8 and never use UTF-16 over HTTP.
If your content is in a language for which UTF-16 is more compact than UTF-8, two things:
All the HTML, JS and CSS on the page is more compact in UTF-8.
gzip makes the difference go away.
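Both points can be measured rather than taken on faith. A rough Python sketch, using a made-up page of Japanese text wrapped in ASCII markup (the sample strings and repetition count are arbitrary assumptions, and the compressed sizes will vary with the content):

```python
import gzip

cjk = "\u65e5\u672c\u8a9e"  # 日本語
markup = '<p class="entry"></p>'


def sizes(s: str) -> tuple:
    """Return (UTF-8 byte length, UTF-16 byte length without BOM)."""
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))


cjk_sizes = sizes(cjk)
markup_sizes = sizes(markup)
print(cjk_sizes)     # (9, 6): the text alone is smaller in UTF-16
print(markup_sizes)  # ASCII markup is exactly twice as large in UTF-16

# Compare a whole page, raw and gzipped, in both encodings.
page = ('<p class="entry">' + cjk * 5 + "</p>\n") * 200
for encoding in ("utf-8", "utf-16-le"):
    raw = page.encode(encoding)
    print(encoding, len(raw), len(gzip.compress(raw)))
```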
Check that the server sends a Content-Type header with the correct encoding.
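Once the response headers have been captured (for example from the browser's network tab), the declared charset can be pulled out of the Content-Type value. A minimal Python sketch using the standard library's header parser; the function name `charset_from_content_type` is made up for illustration:

```python
from email.message import Message


def charset_from_content_type(header_value: str):
    """Return the lower-cased charset parameter, or None if absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None
```

A missing charset parameter means the browser has to fall back on other signals, such as a BOM or a meta tag.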

Resources