UTF-8 Special Characters not working in header file - wkhtmltopdf

I am facing a strange issue with wkhtmltopdf. In the footer and the content, all special characters are shown as they are supposed to be. In the header file, however, they either don't show up or are replaced by blocks with a question mark. All three files are built the same way:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
</body>
</html>
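If the header file is generated by a script, it is worth making sure its bytes are actually written as UTF-8, and not only declared as UTF-8 in the meta tag. Below is a minimal sketch in Java. The original setup is PHP, so treat this only as an illustration of the idea. The class and method names (`HeaderWriter`, `buildHeader`, `encode`, `write`) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class HeaderWriter {
    // Same skeleton as the three files in the question. The text is inserted as-is,
    // so it is assumed to be HTML-escaped already.
    static String buildHeader(String text) {
        return "<!DOCTYPE html>\n<html>\n<head>\n"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n"
                + "</head>\n<body>\n" + text + "\n</body>\n</html>\n";
    }

    // Encodes the markup explicitly as UTF-8 so the bytes match the declared charset,
    // independent of the platform's default encoding.
    static byte[] encode(String text) {
        return buildHeader(text).getBytes(StandardCharsets.UTF_8);
    }

    // Writes the header file that is later handed to wkhtmltopdf.
    static void write(Path target, String text) {
        try {
            Files.write(target, encode(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A file written this way would then be passed with `--header-html` as before. The point is only that the declared charset and the real byte encoding agree.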
Is there some trick to tell the header to use UTF-8 too?
By the way, I am already telling wkhtmltopdf to use UTF-8, not only in the meta tag but also in the command-line call:
--encoding utf-8
EDIT: I am using an HTML header (as you can see in the code I posted). All three HTML files are built the same way. But while it works in the content, the header doesn't like my special characters. Could the problem be that the header text comes from a $_POST variable, while the content text is built from the database?
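One way to test the $_POST hypothesis is to check whether the raw bytes of the posted header text are valid UTF-8 before they go into the header file. If they are not, the form data probably arrived in another encoding. Below is a small Java sketch using the standard `CharsetDecoder` API. The class name `Utf8Check` is hypothetical. In PHP, the corresponding check would be `mb_check_encoding($text, 'UTF-8')`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    // Returns true only if the bytes are well-formed UTF-8.
    // REPORT makes the decoder throw instead of silently inserting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Text saved as ISO-8859-1 (for example "Ümlaut", where Ü is the single byte 0xDC) fails this check. The same text encoded as UTF-8 passes.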

Related

Why are my search results not in the same charset as my page encoding?

I am using UTF-8 encoding for an html page.
<head>
<meta charset="utf-8">
In the debugger console, document.characterSet returns "UTF-8".
On the page, I have metadata (keywords, description, title) containing a valid UTF-8 character, '®', whose UTF-8 encoding is the two bytes 'c2 ae'.
The character displays correctly in the view source, and on the page title.
But Google and Bing search results show it as 'Â®'. That is, during the crawl it appears to be decoded as ISO-8859-1 or Windows-1252, so both bytes, 'c2' and 'ae', are displayed as separate characters.
If I replace ® with its HTML entity (\u00ae), it shows correctly.
Short of converting my meta data to ISO-8859-1, is there a best practice I should be using for this?
The issue was on the back end: the data was not being transcoded to UTF-8 properly when read from the cache. So I feel the best practice is to use the native UTF-8 (BMP) character with the proper page encoding, rather than having to use HTML entity values.
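Below is a short Java sketch of the symptom and its reversal. The UTF-8 bytes of ® are c2 ae. Decoding them with a single-byte charset such as ISO-8859-1 yields two characters, "Â®". Re-encoding with that charset and decoding as UTF-8 undoes the damage. The class name `Mojibake` is hypothetical, and `repair` is only lossless for charsets like ISO-8859-1 that map every byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Mojibake {
    // Renders bytes as lowercase hex without separators, e.g. "c2ae".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Simulates the bug: text stored as UTF-8 but read back with the wrong charset.
    static String misread(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    // Reverses misread(): recover the original bytes, then decode them as UTF-8.
    static String repair(String garbled, Charset wrong) {
        return new String(garbled.getBytes(wrong), StandardCharsets.UTF_8);
    }
}
```

In the asker's case the real fix was on the back end: decode the cached bytes as UTF-8 in the first place, so no repair step is needed.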
Look at the page's meta tags and confirm that it is not using this:
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
For HTML5 Google recommends:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
Also note this:
Note:
<meta charset="">
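Another note: some characters are reserved in HTML ("HTML entities").
These reserved characters must be replaced with character entities. For example (® is not reserved, but it also has a named entity):
Result  Description           Entity name  Entity number
&       ampersand             &amp;        &#38;
®       registered trademark  &reg;        &#174;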
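Only a handful of characters actually have to be escaped. On a page correctly declared as UTF-8, characters such as ® can stay literal. Below is a minimal Java sketch that escapes just the reserved ones. The class name `HtmlEscape` is hypothetical, and `&#39;` for the apostrophe is one common choice among several.

```java
final class HtmlEscape {
    // Replaces characters that have special meaning in HTML markup.
    // Everything else, including non-ASCII such as U+00AE, is left unchanged.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```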

How to escape to htmlentities except for html tags in smarty

Example:
$smarty->assign('string', '<p>Germans use "Ümlauts" and pay in €uro</p>');
{$string|escape|unescape:"html"}
results in:
<p>Germans use 'Ümlauts' and pay in €uro</p>
What am I doing wrong?
You should also pass UTF-8 as the character-set parameter of the escape modifier (for example {$string|escape:"html":"UTF-8"}), as described in the documentation: http://www.smarty.net/docsv2/en/language.modifier.escape
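If the goal in the title is to turn non-ASCII characters into entities while leaving the tags alone, one approach is to encode only code points above ASCII as numeric entities. Markup characters such as `<`, `>` and `"` are all ASCII, so they pass through untouched. This Java sketch illustrates the idea and is not a Smarty modifier. The class name `NonAsciiEntities` is hypothetical, and the sketch assumes the input string was already decoded correctly from UTF-8.

```java
final class NonAsciiEntities {
    // Converts every code point above U+007F into a decimal entity (&#NNN;).
    // ASCII is copied as-is, so HTML tags and attributes survive unchanged.
    static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }
}
```

With the example from the question, `<p>Germans use "Ümlauts" and pay in €uro</p>` would become `<p>Germans use "&#220;mlauts" and pay in &#8364;uro</p>`.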
There is more than one reason why this can occur.
Check the encoding of:
your PHP files,
your template files, and
your HTML output (doctype and meta tags).
Usually one of these is the cause.
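To check the third item, you can look for the charset the generated HTML actually declares. Below is a rough Java sketch. It uses a regular expression, which is a heuristic and not an HTML parser. It matches both the HTML 4.01 and the HTML5 form of the meta tag. The class name `DeclaredCharset` is hypothetical.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DeclaredCharset {
    // Matches <meta charset="..."> as well as
    // <meta http-equiv="Content-Type" content="text/html; charset=...">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9_:.-]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the first declared charset, upper-cased, or empty if none is found.
    static Optional<String> find(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? Optional.of(m.group(1).toUpperCase(Locale.ROOT)) : Optional.empty();
    }
}
```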
To avoid this kind of issue, the best approach in many cases is to use UTF-8 throughout your project: save your Smarty templates and PHP files as UTF-8, and declare UTF-8 with the proper meta tag in your HTML head.
HTML 4.01:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
HTML5:
<meta charset="UTF-8">

UTF8 -- still showing weird characters?

My webpage's response headers show this:
Content-Type:text/html; charset=UTF-8
However, I still get a black diamond with a white question mark for characters like é. What exactly am I supposed to do? The UTF-8 header is set by my .htaccess.
If it's a script or HTML file, check the encoding of the file itself; it should be saved as UTF-8.
In Zend, it's something like Edit > Set Encoding > Other: UTF-8.
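The diamond is U+FFFD, the replacement character a browser shows when the bytes are not valid UTF-8. For example, é saved as ISO-8859-1 is the single byte 0xE9, which is not a complete UTF-8 sequence. Re-saving the file in the editor fixes this. Below is a sketch of the same conversion in Java. It assumes the file really is ISO-8859-1 today; converting a file that is already UTF-8 this way would garble it. The names `Resave`, `toUtf8` and `resaveAsUtf8` are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Resave {
    // Decodes the bytes with the charset they were actually saved in,
    // then re-encodes the resulting text as UTF-8.
    static byte[] toUtf8(byte[] raw, Charset actual) {
        return new String(raw, actual).getBytes(StandardCharsets.UTF_8);
    }

    // File-level version: overwrites the file in place with its UTF-8 form.
    static void resaveAsUtf8(Path file, Charset actual) {
        try {
            Files.write(file, toUtf8(Files.readAllBytes(file), actual));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After conversion, é is stored as the two bytes C3 A9, which matches the `charset=UTF-8` the server already sends.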
If you are serving an HTML page, you need to indicate in the HTML file that the content is UTF-8.
You can do this by adding a meta tag to the head section of your page:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

HTML unit displaying wrong characters

I'm using HTMLUnit. I can access the pages, but special (Maltese) characters are displayed incorrectly. For example, ġuvni is displayed as ?uvni.
HtmlPage page = submit_button.click();
System.out.println(page.asText());
I suspect it's an encoding problem, but I can't find a page.setPageEncoding or similar method. Has anyone had this problem before?
Thanks!
Make sure your page is in UTF-8 by putting this meta tag in your <head>:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
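A separate cause is also worth ruling out. `System.out.println` encodes text with the JVM's default or console charset, and characters that charset cannot represent are written as `?`. That matches ġuvni becoming ?uvni, since ġ (U+0121) is not in ISO-8859-1. Below is a Java sketch to reproduce and sidestep this. It needs Java 10+ for the `PrintStream(OutputStream, boolean, Charset)` constructor, and the class name `ConsoleCheck` is hypothetical.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ConsoleCheck {
    // Passes text through a charset and back. String.getBytes replaces
    // anything the charset cannot encode with that charset's replacement, '?'.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // A stdout stream that always encodes as UTF-8, whatever the platform default is.
    static PrintStream utf8Stdout() {
        return new PrintStream(new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8);
    }
}
```

The text from the question could then be printed with `ConsoleCheck.utf8Stdout().println(page.asText());`. This assumes the terminal itself is set to display UTF-8.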

Language characters showing as question mark boxes

I have the following text (used for testing):
TÄSTåÄ
It's showing on the page as:
T�ST��
I have utf-8 as my content-type:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I'm using this font-family:
"lucida grande", tahoma, verdana, arial, sans-serif
Any idea why these characters won't show up properly?
Thank you!
You need to make sure the file is saved as a UTF-8 encoded file, not in a legacy encoding such as ISO-8859-1 or Windows-1252.
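The exact pattern in the question, TÄSTåÄ becoming T�ST��, is what happens when text saved as ISO-8859-1 is decoded as UTF-8. Each Ä or å is a single byte that is not a valid UTF-8 sequence, so each one turns into U+FFFD while the ASCII letters survive. Below is a small Java sketch that reproduces it. The class name `ReplacementDemo` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class ReplacementDemo {
    // Encodes text as ISO-8859-1 (as an editor with a legacy default might save it),
    // then decodes those bytes as UTF-8 (as the browser does when told charset=utf-8).
    static String savedAsLatin1ReadAsUtf8(String text) {
        byte[] onDisk = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(onDisk, StandardCharsets.UTF_8);
    }
}
```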
Please make sure you saved the file as 'UTF-8' or 'UTF-8 without BOM'. If you did use UTF-8 for your HTML page, please check that the DOCTYPE is valid.
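"UTF-8 without BOM" refers to the optional byte sequence EF BB BF that some editors write at the start of a UTF-8 file. Below is a minimal Java sketch to check whether a file's bytes start with that prefix and to strip it. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class Bom {
    // The UTF-8 byte order mark: EF BB BF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    // Returns the content without a leading BOM, or the same array if there is none.
    static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }
}
```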
