PHP wrong UTF-8 characters from MySQL using Twig and ActiveRecord

I'm building a web app in PHP with my own MVC pattern, including ActiveRecord and Twig templates.
I have a charset problem. Here are the details of my encoding setup:
I'm using Polish characters
The MySQL collation is set to utf8_unicode_ci (I also tried utf8_general_ci)
The Twig template has a standard HTML5 header with UTF-8 encoding
I'm not sure about the file encoding (I'm using NetBeans), but in the Sublime Text 2 console view.encoding() returns u'Undefined'; I haven't tried changing it yet.
Problem description:
When I use Polish characters like ółąćź directly in the Twig template file, everything looks good. I also tried:
echo $twig->render('hello.tpl', array('locations'=>"óóśąłłąś"));
That works without problems too.
But when I get my data from the database, the Polish characters show up as "�".
I tried fetching the data with plain procedural PHP MySQL calls and through ActiveRecord, e.g. Model::all().
Characters from the database always come out wrong in the Twig template.
And yes, I set my ActiveRecord config like this: dbname?charset=utf8

The answer is funny.
I tried the procedural approach again and ran this query right after connecting:
mysql_query("SET NAMES 'utf8'", $dbLink);
It works, all characters are visible now.
With ActiveRecord the problem still appeared, so I updated ActiveRecord to the nightly build, and everything works now!
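
For anyone wondering where the "�" comes from, here is a minimal Ruby sketch of the likely mechanism. It assumes the connection defaulted to latin1 before SET NAMES was issued, which the thread does not confirm. The helper name as_seen_through_latin1 is made up for illustration. Characters that exist in latin1 (ó) arrive as single bytes that are invalid in a UTF-8 page, and characters that do not exist in latin1 (Ł, ź) are lost as "?".

```ruby
# Hypothetical helper: simulate text travelling over a latin1 connection
# and then being displayed on a page that declares UTF-8.
def as_seen_through_latin1(utf8_text)
  # Characters with no latin1 equivalent are replaced by "?",
  # roughly what a lossy server-side conversion would do.
  latin1_bytes = utf8_text.encode("ISO-8859-1", undef: :replace, replace: "?")
  # The page says UTF-8, so the same bytes are now read as UTF-8.
  latin1_bytes.force_encoding("UTF-8")
end

seen = as_seen_through_latin1("Łódź")
puts seen.valid_encoding?   # false: the lone byte 0xF3 is not valid UTF-8
puts seen.scrub("\uFFFD")   # invalid bytes rendered as the replacement character
```

If the connection charset matches the page charset, no conversion happens and the bytes stay valid, which is what SET NAMES 'utf8' achieves.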


CodeIgniter black diamond characters

This is more of a curiosity than an actual problem, as there is an easy and probably preferable workaround. When displaying error messages with CodeIgniter's form validation, the CI user guide gives two ways to set your own validation messages: through the set_message method, or by editing the language file located in the system folder.
However, when I edit the language file to contain error messages in my native language (which contains special characters like 'Ä' and 'Ö'), the special characters are replaced with a black diamond. When I use the set_message method from form_validation, it works without a problem and the characters are encoded properly as UTF-8.
Where does the problem lie when using the file instead of the method, and how can I solve it?
It sounds like the file is not saved by your editor as UTF-8. Make sure that it is.
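
A quick way to check this without trusting the editor's status bar is to test whether the file's raw bytes are valid UTF-8. The sketch below is in Ruby purely for illustration; the helper names utf8_bytes? and utf8_file? are hypothetical.

```ruby
# Hypothetical helper: true when the given raw bytes form valid UTF-8.
def utf8_bytes?(bytes)
  # dup so the caller's string keeps its original encoding tag
  bytes.dup.force_encoding("UTF-8").valid_encoding?
end

# Hypothetical helper: apply the same check to a file on disk.
def utf8_file?(path)
  utf8_bytes?(File.binread(path))
end

latin1_saved = "Ä".encode("ISO-8859-1")   # what a Latin-1 editor would write
puts utf8_bytes?(latin1_saved)            # false
puts utf8_bytes?("Ä")                     # true
```

A file that fails this check was saved in some other encoding and needs to be re-saved as UTF-8. Note that pure ASCII files always pass, so the check only says something once the file contains characters like 'Ä'.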

Encoding utf-8 string with MySQL and Rails 3 not working

Yet another encoding problem with MySQL, UTF-8 and a Rails 3 application.
We recently migrated our code from Rails 2 to Rails 3. We use MySQL and the mysql2 gem. The thing is, in our old database we had content that included some UTF-8 characters written as escaped bytes instead of their corresponding HTML entities, such as \xC3\x9F for ß.
These strings are stored as YAML-serialized text that has to go onto the website. The problem is that when the records are loaded from the database into ActiveRecord objects, they come out with strange characters, which look really nasty on the web. For example, ß is shown as Ã, and so on.
I played a bit with the new encoding magic of Rails 3, trying various combinations of the force_encoding and encode methods, with no luck.
For the record, MySQL is started with these two lines:
Any idea what we are doing wrong, why YAML is not reading those escaped characters correctly, and what we could do to solve the issue?
OK, I just found out what the problem was: the YAML text was generated with Syck, and Psych now does not handle it well.
I found the answer here
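
For illustration, a minimal sketch of what probably happens, assuming the stored YAML contains byte escapes like "\xC3\x9F" inside double-quoted scalars (as described in the question). As far as I can tell, Psych reads each \xNN escape as one code point rather than one byte, so ß (bytes C3 9F) comes back as the two characters U+00C3 and U+009F. The helper name repair_byte_per_codepoint is hypothetical, and this is a workaround sketch, not necessarily what the linked answer suggested.

```ruby
require "yaml"

# Hypothetical helper: undo "each UTF-8 byte became its own code point".
def repair_byte_per_codepoint(text)
  # Map every code point U+0000..U+00FF back to a single byte,
  # then reinterpret the resulting bytes as UTF-8.
  candidate = text.encode("ISO-8859-1").force_encoding("UTF-8")
  # If the bytes are not valid UTF-8, the text was not damaged this way.
  candidate.valid_encoding? ? candidate : text
rescue Encoding::UndefinedConversionError
  # Characters above U+00FF cannot come from this kind of damage.
  text
end

legacy_yaml = '"Stra\xC3\x9Fe"'   # single quotes keep the backslashes literal
loaded = YAML.load(legacy_yaml)
puts loaded.codepoints.inspect
puts repair_byte_per_codepoint(loaded)
```

Strings that are already correct are returned unchanged, so the helper can be applied to every loaded value, but test it on a copy of the data first.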

Norwegian special characters not supported in CodeIgniter 2.0.2

I am trying to save some special Norwegian characters like æøå ÆØÅ, but they are not saved properly in the database. Sometimes the characters get trimmed, and sometimes they are shown like æøå Ã
I used htmlentities to support such characters in CodeIgniter 1.7, and it worked well.
So the problem came with the new version of CodeIgniter.
Any ideas?
I had a similar issue with one of our native languages in India (which has accented characters), and I resolved it by changing the charset values to utf8_unicode_ci in the database (table and field collation) and in the files related to data capture and display.
Let me know if that helps.

convert ascii characters to ruby encoding

I'm testing a feature with Watir and running into an issue with validating ASCII characters in the HTML.
I'm grabbing the product description from a database, like 'Company® Some Product', and using it as the string that I'm validating against,
and it shows up that way in the HTML. However, Ruby is looking for Company\u00AE Some Product, so my test is failing.
Does anyone have a solution for getting around these special characters when they turn up?
The HTML Entities gem may help.
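
A dependency-free sketch of the same idea, assuming the stored description holds an HTML entity such as &reg; or &#174; where the page text has the actual ® character. The decode_entities helper and its tiny NAMED_ENTITIES table are made up for illustration; the gem ships a complete table and is the better choice in a real test suite.

```ruby
# Illustrative subset only; a complete table has hundreds of entries.
NAMED_ENTITIES = {
  "reg" => "\u00AE", "copy" => "\u00A9",
  "amp" => "&", "lt" => "<", "gt" => ">", "quot" => '"'
}.freeze

# Hypothetical helper: replace named and numeric entities with characters.
def decode_entities(html)
  html.gsub(/&(#x[0-9a-fA-F]+|#\d+|[a-zA-Z]+);/) do |match|
    ref = Regexp.last_match(1)
    if ref.start_with?("#x")
      [ref[2..-1].to_i(16)].pack("U")   # hexadecimal reference, e.g. &#xAE;
    elsif ref.start_with?("#")
      [ref[1..-1].to_i].pack("U")       # decimal reference, e.g. &#174;
    else
      NAMED_ENTITIES.fetch(ref, match)  # unknown names are left as they are
    end
  end
end

expected = decode_entities("Company&reg; Some Product")
puts expected == "Company\u00AE Some Product"   # true
```

Decoding the database string before comparing means both sides of the assertion are plain Unicode text.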

clean up strange encoding in ruby

I'm currently playing a bit with couchdb.
I'm trying to migrate some blog data from redis (key value store) to couchdb (key value store).
Seeing as I probably migrated this data a gazillion times from and to different blogging engines (everybody has got to have a hobby :) ), there seem to be some encoding snafus.
I'm using CouchREST to access CouchDB from ruby and I'm getting this:
<JSON::GeneratorError: source sequence is illegal/malformed>
the problem seems to be the body_html part of the object:
<Post:0x00000000e9ee18 #body_html="[.....]Wie Sie bereits wissen, m\xF6chte EUserv k\xFCnftig seine [...]
Those are supposed to be Umlauts ("möchte" and "künftig").
Any idea how to get rid of those problems? I tried some conversions using the Ruby 1.9 encoding features or iconv before inserting, but haven't had any luck yet :(
If I try to e.g. convert that stuff to ISO-8859-1 using the .encode() method of ruby 1.9, this is what happens (different text, same problem):
#<Encoding::UndefinedConversionError: "\xC6\x92" from UTF-8 to ISO-8859-1>
"I try to e.g. convert that stuff to ISO-8859-1"
Close. You actually want to do it the other way around: you've got ISO-8859-1(*), you want UTF-8(**). So str.encode('utf-8', 'iso-8859-1') would be more likely to do the trick.
*: actually you might well have Windows code page 1252, which is like ISO-8859-1, but with extra smart-quotes and things in the range 0x80-0x9F which ISO-8859-1 uses for control codes. If so, use 'cp1252' instead.
**: well, you probably do. Working with UTF-8 is the best way forward so you can store all possible characters. If you really want to keep working in ISO-8859-1/cp1252, then presumably the problem is just that Ruby has mis-guessed the character set in use and you can fix it by calling str.force_encoding('iso-8859-1').
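
A minimal sketch of the conversion described above, using the byte sequences from the question. The helper name to_utf8 and the smart-quote sample are illustrative assumptions; whether the real data is ISO-8859-1 or cp1252 has to be checked against the actual records.

```ruby
# Hypothetical helper: reinterpret legacy single-byte text and convert it to UTF-8.
# source_encoding is what the bytes really are, not what Ruby guessed.
def to_utf8(bytes, source_encoding = "ISO-8859-1")
  bytes.encode("UTF-8", source_encoding)
end

puts to_utf8("m\xF6chte")                            # möchte
puts to_utf8("k\xFCnftig")                           # künftig
# Bytes 0x93/0x94 are curly quotes in cp1252 but control codes in ISO-8859-1.
puts to_utf8("\x93k\xFCnftig\x94", "Windows-1252")
```

The converted strings are valid UTF-8, so JSON generation should no longer fail on them.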
