How to Properly Define the UTF-8 Charset in the <head> Section of a Web Document

If my doctype is <!DOCTYPE html>, is it better (or more correct) to use
<meta charset="utf-8" />
or
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
to define utf-8?
Thanks.

The first one is only valid in HTML5.
The second one is also valid in older (X)HTML versions.
With this doctype (which indicates HTML5) both are valid; I prefer the first because it is shorter. :)
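To see that both forms carry the same information, here is a minimal Python sketch that reads the declared charset from either one. `CharsetFinder` and `detect_charset` are hypothetical helpers, not part of any library, and the sketch is deliberately simplified: it does not implement the full HTML5 encoding-sniffing algorithm (BOM, HTTP header, byte-level prescan).

```python
from email.message import Message
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Records the charset declared by the first matching <meta> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; it also calls this for <meta ... />.
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if "charset" in attrs:
            # HTML5 form: <meta charset="utf-8" />
            self.charset = (attrs["charset"] or "").strip().lower() or None
        elif (attrs.get("http-equiv") or "").lower() == "content-type":
            # Legacy form: the charset is a parameter inside content="..."
            msg = Message()
            msg["Content-Type"] = attrs.get("content") or ""
            self.charset = msg.get_content_charset()  # lowercased, or None


def detect_charset(html):
    finder = CharsetFinder()
    finder.feed(html)
    finder.close()
    return finder.charset


print(detect_charset('<meta charset="utf-8" />'))  # utf-8
print(detect_charset('<meta http-equiv="content-type" content="text/html;charset=utf-8" />'))  # utf-8
```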

Related

W3C validation - end tag for "meta" omitted, but OMITTAG NO was specified

I am getting the following error:
Error Line 6, Column 467: end tag for "meta" omitted, but OMITTAG NO was specified
…ta,Manufacturing_Industrial,Educational_Training,Teacher,Engineering_Projects">
You may have neglected to close an element, or perhaps you meant to "self-close" an element, that is, ending it with "/>" instead of ">".
How can I solve this?
Here is the source code of the page:
<!doctype html>
<html>
<head>
<title>Jobslamp-free online resume creation and sharing,fresher jobs,experienced jobs,India jobs,Kerala jobs</title>
<meta name="keywords" content="Karnataka,Bangalore_Rural,Healthcare,Office_Assistant,Kerala,Ernakulam,IT_Hardware_Networking,Engineer,Sales___Marketing,Executive,Maharashtra,Mumbai_City,Retailing,Manager,Kollam,CRM_CallCentres_BPO_ITES_Med.Trans,Customer_Care,Hotel_Travel_Tourism_Airlines_Hospitality,Front_Office_Staff,Andhra_Pradesh,Hyderabad,IT_Software,Java_Developer,Pathanamthitta,Manufacturing_Industrial,Educational_Training,Teacher,Engineering_Projects">
<meta name="description" content="The best job oriented resume sharing system. Create and Publish your online resumes for FREE. Search and apply your dream jobs for FREE. Post your jobs for FREE.">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Thanks in advance
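The error message points out the solution: replace the ending > with />. OMITTAG NO means the validator is applying XML rules, under which every element, including empty ones such as <meta>, must be explicitly closed.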
<meta name="keywords" content="all your keywords" />
You'll need to fix the other meta tags the same way.
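If there are many such tags, the same fix can be applied mechanically. A minimal Python sketch using a regular expression; `self_close_meta` is a hypothetical helper, and it assumes no attribute value contains a literal >, so an HTML-aware serializer is the more robust choice for arbitrary pages.

```python
import re

# A <meta ...> start tag whose ">" is not already preceded by "/".
# Assumption: no attribute value contains a literal ">".
UNCLOSED_META = re.compile(r"<meta\b([^>]*?)\s*(?<!/)>", re.IGNORECASE)


def self_close_meta(html):
    """Rewrite <meta ...> as <meta ... /> and leave already-closed tags alone.

    The replacement emits a lowercase "meta"; XHTML requires lowercase element names anyway.
    """
    return UNCLOSED_META.sub(r"<meta\1 />", html)


print(self_close_meta('<meta name="keywords" content="all your keywords">'))
# <meta name="keywords" content="all your keywords" />
```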

Importing HTML table into OO Calc as UTF8 without converting to entities

I have a problem when opening an HTML table in OpenOffice or LibreOffice if it contains non-ASCII characters such as ÅÄÖåäö.
When I open the table in Microsoft Excel it works as intended, but I can't make OO do the same thing.
Converting all extended characters to their HTML entity equivalents (e.g. &Aring; for Å) works, but it would be nice to get the correct characters directly.
Does anyone know what I should do?
I have the following content in a file called excelsample.xls, and when I open it with OO Calc the extended characters are not displayed correctly.
<!DOCTYPE html>
<html>
<head>
<title></title>
<meta http-equiv="content-type" content="application/vnd.ms-excel" charset="UTF-8">
<meta charset="UTF-8">
</head>
<body>
<table>
<tr>
<td>Prawn sandwich</td><td>Räksmörgås</td>
</tr>
</table>
</body>
</html>
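Your http-equiv meta tag is malformed: the charset must be a parameter inside the content attribute, not a separate attribute. OO also probably doesn't recognize the HTML5 <meta charset> tag.
So fix it with: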
<meta http-equiv="content-type" content="application/vnd.ms-excel; charset=UTF-8">
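Why the original tag fails: the content value is read like an HTTP Content-Type header, so the charset has to be a parameter of that value; a separate charset="UTF-8" attribute is simply not part of it. A minimal Python sketch using the standard library's header parser (`charset_from_content` is a hypothetical helper; how OO Calc itself parses the tag is not shown here and may differ):

```python
from email.message import Message


def charset_from_content(content_value):
    """Return the charset parameter of a Content-Type-style value, or None."""
    msg = Message()
    msg["Content-Type"] = content_value
    return msg.get_content_charset()  # lowercased, or None if absent


# Original tag: content="application/vnd.ms-excel" charset="UTF-8"
print(charset_from_content("application/vnd.ms-excel"))  # None
# Fixed tag: content="application/vnd.ms-excel; charset=UTF-8"
print(charset_from_content("application/vnd.ms-excel; charset=UTF-8"))  # utf-8
```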

Character encoding in ruby

I am parsing some data from a Dutch site using Nokogiri and saving it to CSV, but the data is not displayed correctly. For example, the Einddatum1 field on the form is blank, but when I print it to the console before saving, it shows as "\u00A0". Other strings are also displayed incorrectly, for example "Univ\u00E9 Zorg Geregeld Polis".
{:Bsn=>"112511111",
:Verzekerde=>"VerzekerdeAHM Andes-Faasse",
:Pakketnaam1=>"Univ\u00E9 Zorg Geregeld Polis",
:Verzekerdennummer1=>"1234987654",
:Begindatum1=>"01 jan 2012",
:Einddatum1=>"\u00A0",
}
The header of the HTML page may be relevant:
<!doctype html>
<!-- paulirish.com/2008/conditional-stylesheets-vs-css-hacks-answer-neither/ -->
<!--[if lt IE 7 ]> <html class="no-js ie6" lang="en"> <![endif]-->
<!--[if IE 7 ]> <html class="no-js ie7" lang="en"> <![endif]-->
<!--[if IE 8 ]> <html class="no-js ie8" lang="en"> <![endif]-->
<!--[if (gte IE 9)|!(IE)]><!--> <html class="no-js" lang="en"> <!--<![endif]-->
<head id="Head1"><meta charset="utf-8" />
<!-- Always force latest IE rendering engine (even in intranet)
Remove this if you use the .htaccess -->
<meta http-equiv="X-UA-Compatible" content="IE=edge" /><title>
Verzekeringsrecht controleren
</title><meta http-equiv="cache-control" content="no-cache" /><meta http-equiv="content-language" content="nl-NL" />
It seems to be UTF-8, but there is a problem with these characters. How do I encode them properly?
"\u00E9" is just the escaped form of é (and "\u00A0" is a non-breaking space, which is why Einddatum1 looks empty), so the line would read :Pakketnaam1=>"Univé Zorg Geregeld Polis",
Is that what is supposed to be there? If so, your console encoding is probably just not defined, so Ruby doesn't know how to display the Unicode characters when printing them. Or should there be more text?
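A minimal sketch of cleaning that data before writing the CSV, in Python rather than Ruby purely for illustration (the hash keys are copied from the question; `clean` is a hypothetical helper). It shows that the escapes are ordinary characters: é survives as-is, and the non-breaking space only needs to be normalized.

```python
import csv
import io

row = {
    "Pakketnaam1": "Univ\u00E9 Zorg Geregeld Polis",  # \u00E9 is just é
    "Einddatum1": "\u00A0",                           # U+00A0 NO-BREAK SPACE (&nbsp;)
}


def clean(value):
    # Turn non-breaking spaces (including ones between words) into normal spaces, then trim.
    return value.replace("\u00A0", " ").strip()


cleaned = {key: clean(value) for key, value in row.items()}

# StringIO keeps the sketch self-contained; for a real file use
# open("out.csv", "w", encoding="utf-8", newline="").
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(cleaned))
writer.writeheader()
writer.writerow(cleaned)
print(buf.getvalue())
```

In Ruby, note that String#strip does not remove U+00A0, so an explicit gsub("\u00A0", " ") would be needed before stripping.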

HtmlUnit displaying wrong characters

I'm using HtmlUnit. I can access the pages, but special (Maltese) characters are displayed incorrectly. For example, ġuvni is displayed as ?uvni.
HtmlPage page = submit_button.click();
System.out.println(page.asText());
I suspect it's an encoding problem, but I can't find a page.setPageEncoding or similar method... Has anyone had this problem before?
Thanks!
Make sure your page declares UTF-8 by putting this meta tag in its <head>:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
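The ?uvni symptom is what you typically get when text is encoded into a charset that has no ġ (U+0121), for example ISO-8859-1, or a non-UTF-8 default encoding on the console that System.out writes to. That this is the cause here is an assumption, not something the answer establishes. A minimal Python sketch of that substitution:

```python
word = "\u0121uvni"  # "ġuvni"; U+0121 is LATIN SMALL LETTER G WITH DOT ABOVE

# ISO-8859-1 (Latin-1) has no ġ, so a lossy encoder substitutes "?".
latin1 = word.encode("iso-8859-1", errors="replace")
print(latin1)  # b'?uvni'

# UTF-8 can represent ġ (as two bytes), so the round trip is lossless.
utf8 = word.encode("utf-8")
print(utf8)  # b'\xc4\xa1uvni'
print(utf8.decode("utf-8") == word)  # True
```

If that is the cause, the analogous fix in the Java program would be on the output side, for example printing through new PrintStream(System.out, true, "UTF-8"), provided the console itself displays UTF-8.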

Why does the use of the Frameset DTD cause a validation failure?

The project I work on takes random HTML files, converts them to XHTML as best it can, and wraps them with some XML metadata. The DOCTYPE is stripped out, as the resulting XML file is not an XHTML document. However, when the wrapped XHTML is retrieved from the XML file, the DOCTYPE should be reinserted.
Because these are random HTML files, they could contain any content, but I would prefer not to have to store or determine the original DTD. I figured I should use the Frameset DTD, as it is just a superset of the Transitional DTD and would be valid for all content. However, when I run the same document through the W3C XHTML Validator, the Transitional DTD passes but the Frameset DTD fails.
I've stripped down the document to the minimum with which I can reproduce the problem. Here is the Frameset version:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Make The Move</title>
</head>
<body style="background: none;">
<h3 id="why">Why should I move to Linux?</h3>
</body>
</html>
And here is the Transitional version:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Make The Move</title>
</head>
<body style="background: none;">
<h3 id="why">Why should I move to Linux?</h3>
</body>
</html>
Please explain why this is happening, and how I should proceed.
The Frameset DTD is not a 'superset' of Transitional. It is a special DTD used only for laying out frames, not content (except inside the <noframes> element). It allows only <head> and <frameset> as children of the <html> element, so a document with a <body> fails validation against it.
Here is the specification.
Unless you know your page could have frames, stick to transitional or strict DTDs.
As Chetan pointed out, the Frameset DTD should only be used if you need frames, and even then I would recommend using Transitional instead. If you don't rely on frames, Strict is the way to go.
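Since the DOCTYPE is reinserted on retrieval anyway, one option (an assumption, not something either answer prescribes) is to pick it from the content: Frameset only when the stripped XHTML actually has a <frameset>, Transitional otherwise. A minimal Python sketch with the standard library's ElementTree; `disallowed_children` and `choose_doctype` are hypothetical helpers, and the table only covers the direct children of <html>, not full DTD validation.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Elements allowed directly under <html> in XHTML 1.0.
ALLOWED_HTML_CHILDREN = {
    "frameset": {"head", "frameset"},
    "transitional": {"head", "body"},
    "strict": {"head", "body"},
}

FRAMESET = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" '
            '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">')
TRANSITIONAL = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
                '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')


def disallowed_children(xhtml, dtd):
    """List the <html> children that the given DTD does not permit there."""
    root = ET.fromstring(xhtml)
    allowed = ALLOWED_HTML_CHILDREN[dtd]
    names = [child.tag.replace(XHTML_NS, "") for child in root]
    return [name for name in names if name not in allowed]


def choose_doctype(xhtml):
    """Heuristic: use Frameset only if the document really has a <frameset>."""
    root = ET.fromstring(xhtml)
    if root.find(XHTML_NS + "frameset") is not None:
        return FRAMESET
    return TRANSITIONAL


doc = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Make The Move</title></head>'
       '<body><h3 id="why">Why should I move to Linux?</h3></body></html>')
print(disallowed_children(doc, "frameset"))      # ['body']
print(disallowed_children(doc, "transitional"))  # []
```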
