Character encoding in Ruby

I am parsing some data from a Dutch site using Nokogiri and saving it to CSV, but the data is not displayed correctly. For example, on the form the Einddatum1 field is an empty space, but when I print it to the console before saving, it is shown as "\u00A0". Other strings are not displayed correctly either, for example "Univ\u00E9 Zorg Geregeld Polis".
{:Bsn=>"112511111",
:Verzekerde=>"VerzekerdeAHM Andes-Faasse",
:Pakketnaam1=>"Univ\u00E9 Zorg Geregeld Polis",
:Verzekerdennummer1=>"1234987654",
:Begindatum1=>"01 jan 2012",
:Einddatum1=>"\u00A0",
}
The header of the HTML page may be relevant:
<!doctype html>
<!-- paulirish.com/2008/conditional-stylesheets-vs-css-hacks-answer-neither/ -->
<!--[if lt IE 7 ]> <html class="no-js ie6" lang="en"> <![endif]-->
<!--[if IE 7 ]> <html class="no-js ie7" lang="en"> <![endif]-->
<!--[if IE 8 ]> <html class="no-js ie8" lang="en"> <![endif]-->
<!--[if (gte IE 9)|!(IE)]><!--> <html class="no-js" lang="en"> <!--<![endif]-->
<head id="Head1"><meta charset="utf-8" />
<!-- Always force latest IE rendering engine (even in intranet)
Remove this if you use the .htaccess -->
<meta http-equiv="X-UA-Compatible" content="IE=edge" /><title>
Verzekeringsrecht controleren
</title><meta http-equiv="cache-control" content="no-cache" /><meta http-equiv="content-language" content="nl-NL" />
It seems to be UTF-8, but there is a problem with these characters. How do I encode them properly?

\u00E9 is the escape for é and \u00A0 is a non-breaking space (&nbsp;), so the line would read :Pakketnaam1=>"Univé Zorg Geregeld Polis",
Is that what is supposed to be there? If so, the data is fine and your console encoding is probably just not defined, so Ruby doesn't know how to display the Unicode characters when printing them. Or should there be more text?
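A minimal Ruby sketch of this point, assuming the scraped values are already correct UTF-8 strings (Nokogiri decodes &nbsp; to U+00A0) and only their printed form is misleading. The file name verzekering.csv is hypothetical. It shows that the escaped string equals "Univé Zorg Geregeld Polis", replaces the non-breaking space (which String#strip does not remove), and writes the CSV explicitly as UTF-8:

```ruby
require "csv"

# Values as they appear in the question's console output. "\u00E9" and "\u00A0"
# are only escape sequences: the strings really contain é and a non-breaking space.
row = {
  Pakketnaam1: "Univ\u00E9 Zorg Geregeld Polis",
  Einddatum1: "\u00A0"
}

# The text is already correct UTF-8; only the way it was printed differed.
puts row[:Pakketnaam1].encoding                        # UTF-8
puts row[:Pakketnaam1] == "Univé Zorg Geregeld Polis"  # true

# &nbsp; in the page becomes U+00A0, which String#strip does not remove,
# so turn it into a regular space first and then strip.
clean = row.transform_values { |v| v.tr("\u00A0", " ").strip }

# Hypothetical output file, opened explicitly as UTF-8 so é survives.
CSV.open("verzekering.csv", "w:UTF-8") do |csv|
  csv << clean.keys.map(&:to_s)
  csv << clean.values
end
```

If p still shows escapes such as "\u00E9", running Ruby under a UTF-8 locale (for example LANG=en_US.UTF-8) should make it print the characters directly; the data itself does not change.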

Related

Canvas not outputting the expected £ symbol; the output contains an additional character

When I try to write a £ sign onto the canvas with
context.fillText("£ ",600,165);
the output on the canvas contains an additional character in front of the £. Does anyone have ideas on what to do? I tried
context.fillText("&pound; ",600,165);
but that only writes the literal text &pound; to the canvas.
It likely won't work if the page's encoding isn't defined. Try putting this at the very top of the HTML page:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"> <!-- THIS ONE !! -->
blabla...
The example below shows that it works in an HTML5 page with UTF-8:
document.getElementById("myCanvas").getContext("2d").fillText("£ ",10,10);
<canvas id="myCanvas" width="300px" height="50px">no html5 support</canvas>
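The extra character is typical mojibake. A short Ruby illustration, assuming the page is saved as UTF-8 but the browser falls back to ISO-8859-1 (or windows-1252, which maps these two bytes the same way) because no charset is declared:

```ruby
# "£" (U+00A3) is stored in a UTF-8 file as two bytes.
pound = "£"
p pound.bytes   # [194, 163], i.e. 0xC2 0xA3

# Assumption: the browser reads the page as a single-byte encoding,
# so each of the two bytes is shown as its own character.
misread = pound.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts misread    # Â£

# Undoing the wrong decoding recovers the original character.
fixed = misread.encode("ISO-8859-1").force_encoding("UTF-8")
puts fixed      # £
```

Declaring the charset, as the answer suggests, removes the need for any guessing on the browser's side.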

How to Properly Define the UTF-8 Charset in the <head> Section of a Web Document

If my doctype is <!DOCTYPE html>, is it better or more correct to use
<meta charset="utf-8" />
or
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
to declare UTF-8?
Thanks.
The first one is only valid with HTML5.
The second one is also valid for older (X)HTML versions.
With this doctype (indicating HTML5) both are valid, I prefer the first as it is shorter. :)
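Both forms declare the same thing. As a rough illustration, here is a hypothetical Ruby helper, declared_charset, that pulls the charset out of either tag with a simplified regex (not the full HTML5 encoding-sniffing algorithm):

```ruby
# Hypothetical helper: simplified regex, not the HTML5 prescan algorithm.
# Finds "charset=" anywhere inside a <meta ...> tag and returns its value.
def declared_charset(html)
  m = html.match(/<meta[^>]+charset\s*=\s*["']?([\w-]+)/i)
  m && m[1].downcase
end

short = '<meta charset="utf-8" />'
long  = '<meta http-equiv="content-type" content="text/html;charset=utf-8" />'

puts declared_charset(short)  # utf-8
puts declared_charset(long)   # utf-8
```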

Importing HTML table into OO Calc as UTF8 without converting to entities

I have a problem when opening an HTML table in OpenOffice or LibreOffice if it contains UTF-8 extended characters like ÅÄÖåäö.
When I open the table in Microsoft Excel it works as intended, but I can't make OO do the same thing.
Converting all extended characters to their HTML entity equivalents (&Aring; etc.) works, but it would be nice to get the correct characters directly.
Does anyone know what I should do?
The following content is in a file called excelsample.xls, and if I open it with OO Calc it doesn't display correctly:
<!DOCTYPE html>
<html>
<head>
<title></title>
<meta http-equiv="content-type" content="application/vnd.ms-excel" charset="UTF-8">
<meta charset="UTF-8">
</head>
<body>
<table>
<tr>
<td>Prawn sandwich</td><td>Räksmörgås</td>
</tr>
</table>
</body>
</html>
Your meta tag is malformed (the charset has to go inside the content value, not in a separate attribute), and OO probably doesn't recognize the HTML5 charset tag.
So fix it with:
<meta http-equiv="content-type" content="application/vnd.ms-excel; charset=UTF-8">
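A minimal Ruby sketch that generates excelsample.xls with the corrected meta tag and writes it as UTF-8. build_table_html and its entities: option are hypothetical; the option reproduces the entity workaround from the question by using String#encode with xml: :text, which escapes & < > and turns non-ASCII characters into hexadecimal character references:

```ruby
require "erb"

# Hypothetical helper: builds the HTML table that Calc (or Excel) opens as a sheet.
# The charset is declared inside the content value of the http-equiv meta tag.
def build_table_html(rows, entities: false)
  body = rows.map do |cells|
    tds = cells.map do |cell|
      text = if entities
               # Workaround: escape & < > and replace every non-ASCII
               # character with a hex character reference (ä -> &#x...;)
               cell.encode("US-ASCII", xml: :text)
             else
               # Keep the real UTF-8 characters, escaping only markup
               ERB::Util.html_escape(cell)
             end
      "<td>#{text}</td>"
    end
    "<tr>#{tds.join}</tr>"
  end

  <<~HTML
    <!DOCTYPE html>
    <html>
    <head>
    <title></title>
    <meta http-equiv="content-type" content="application/vnd.ms-excel; charset=UTF-8">
    </head>
    <body>
    <table>
    #{body.join("\n")}
    </table>
    </body>
    </html>
  HTML
end

# Write the file with an explicit UTF-8 encoding so the bytes match the declaration.
File.open("excelsample.xls", "w:UTF-8") do |f|
  f.write(build_table_html([["Prawn sandwich", "Räksmörgås"]]))
end
```

Passing entities: true gives a pure-ASCII file, which is a fallback if a particular Calc version still ignores the declared charset.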

Inserting images on a webpage in notepad using html5

I'm building a webpage in Notepad, using HTML5 for the first time. I believe I wrote the correct code to insert these images, but they don't show up on the page. Here is the code; I could use some help, please. Thank you.
<html>
<head>
<title>My practice website</title>
<meta charset="utf-8">
<html lang="en">
<meta name="keywords" content="html, css, javascript, history, poems, poetry"/>
<meta name="description" content="This site is about my personal life, poems, poetry, images of family, myself"/>
<meta name="author" content="schweidel tyson">
<meta http-equiv="refresh" content="30" />
<link rel="stylesheet" type="text/css" href="mainstyesheet.css"/>
<body style="background-color: #ccffff;">
</head>
<body/>
<h1>Welcome to my website</h1>
<img src="http://www.html.net/logo.png"/>
<p>This is basically a personal website build to showcase my fledging talent in webdesign to put up pictures of my family and friends. I also like poetry, so there will be some poems.</p><b/>
This is a link to a good html tutorial<br/><br/>
This is another great tutorial link<br/><br/>
A tutorial for styles link
<img src=My practice website/My Website/images/high yellow.jpg" width="192 height="256"/><alt="African Amereican light-skined woman"><br/><br/>
<img src="http://www.zimbio.com/My website/images/trendy.jpg" width="352" width="400"/><alt="African American Woman">
<img src="My practice website/My website/my new pic.jpg" width="104" height="104"/> <alt="me at the domiciliary">
</body>
</html>
Your HTML is a bit off:
<img src="..." width="104" height="104"/> <alt="me at the domiciliary">
alt is just the alternate text for the image. It's an attribute just like width, src and height:
<img src="..." width="104" height="104" alt="me at the domiciliary" />
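Also, make sure your URLs are correct: the first image's src is missing its opening quote, width="192 is missing its closing quote, and the second image sets width twice instead of width and height.
Also, without a DOCTYPE your markup is invalid. Include a DOCTYPE (here's an HTML5 one):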
<!DOCTYPE html>
<html>
...
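To see the corrected attribute syntax in one place, here is a hypothetical Ruby helper, img_tag, that always emits alt as an attribute and quotes every value, which avoids the missing-quote mistakes in the question's markup. The image path is only illustrative:

```ruby
require "erb"

# Hypothetical helper: alt is an attribute like src, width and height,
# and every value is quoted and HTML-escaped.
def img_tag(src, alt:, width: nil, height: nil)
  attrs = { src: src, width: width, height: height, alt: alt }
  pairs = attrs.compact.map do |name, value|
    %(#{name}="#{ERB::Util.html_escape(value.to_s)}")
  end
  "<img #{pairs.join(' ')} />"
end

# Illustrative path; use the real location of your image.
puts img_tag("images/my new pic.jpg", alt: "me at the domiciliary", width: 104, height: 104)
# <img src="images/my new pic.jpg" width="104" height="104" alt="me at the domiciliary" />
```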

Why does the use of the Frameset DTD cause a validation failure?

The project I work on takes random HTML files, converts them to XHTML as best it can, and wraps them in some XML metadata. The DOCTYPE is stripped out because the resulting XML file is not an XHTML document. However, when the wrapped XHTML is retrieved from the XML file, the DOCTYPE should be reinserted.
Because these are random HTML files, they could contain any content, but I would prefer not to have to store or determine the original DTD. I figured I should use the Frameset DTD, since I thought it was just a superset of the Transitional DTD and would be valid for all content. However, when I run the same document through the W3C XHTML Validator, it passes with the Transitional DTD but fails with the Frameset DTD.
I've stripped down the document to the minimum with which I can reproduce the problem. Here is the Frameset version:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Make The Move</title>
</head>
<body style="background: none;">
<h3 id="why">Why should I move to Linux?</h3>
</body>
</html>
And here is the Transitional version:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Make The Move</title>
</head>
<body style="background: none;">
<h3 id="why">Why should I move to Linux?</h3>
</body>
</html>
Please explain why this is happening, and how I should proceed.
The Frameset DTD is not a 'superset' of Transitional. It is a special DTD used only for laying out frames, not content (except inside the <noframes> element). It allows only <head> and <frameset> as children of the <html> element, so a document with a <body> directly inside <html> fails validation.
Here is the specification.
Unless you know your page could have frames, stick to transitional or strict DTDs.
As Chetan pointed out, the Frameset DTD should only be used when you need frames, and even then I would recommend using Transitional instead. If you don't rely on frames, Strict is the way to go.
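For the question's wrap/unwrap pipeline, one way to avoid storing the original DTD is to choose it from the markup when reinserting it. A minimal Ruby sketch, assuming a simple regex check is good enough (it is not a real parser and ignores prefixed names such as html:frameset); doctype_for is hypothetical:

```ruby
# XHTML 1.0 DOCTYPEs taken from the question.
DOCTYPES = {
  frameset: '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">',
  transitional: '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
}.freeze

# Hypothetical helper: use Frameset only when the document actually
# contains a <frameset> element; everything else gets Transitional.
def doctype_for(xhtml)
  xhtml.match?(/<frameset[\s>]/i) ? DOCTYPES[:frameset] : DOCTYPES[:transitional]
end

page = '<html xmlns="http://www.w3.org/1999/xhtml"><body><h3 id="why">Why should I move to Linux?</h3></body></html>'
puts doctype_for(page) + "\n" + page
```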
