Copying from file: read psql message on windows with UTF8

Copying from file: read psql message on windows with UTF8 - windows

I'm trying to import csv file to postgres with COPY command.
As I've received well known 'ERROR: character with byte sequence 0xd0 0x9f in encoding "UTF8" has no equivalent in encoding "WIN1252"' I changed my client_encoding to utf8.
Now I'm getting completely unreadable message
ÐŸÐžÐœÐ˜Ð›ÐšÐ: Ð²Ñ–Ð´Ð½Ð¾ÑˆÐµÐ½Ð½Ñ "mytab" Ð½Ðµ Ñ–ÑÐ½ÑƒÑ”
I tried to change console codepage by chcp 65001 but with no luck.
Can anybody help me with that extraordinary rare and complex task - to import csv to database?

Solution:
I would suggest, that problem is due to UA or RU localization of installed DB.
Switching DB lang should help (at least hepled me):
SET lc_messages TO 'en_US.UTF-8';
Please try on your PC and let me know if that helps.
My investigation:
In the powerShell I all the time getting an error but with:
ERROR: character with byte sequence 0xd0 0x9f in encoding "UTF8" has no equivalent in encoding "WIN1252"
When I swith encoding to UTF-8 with comand:
SET client_encoding TO 'UTF8';
I'm starting to get the same not readable symbols, but if I going to pgAdmin4 and run the same command it gives me well explained error in UA lang:
ERROR: ПОМИЛКА: insert або update в таблиці "exam_results" порушує обмеження зовнішнього ключа "exam_results_subject_id_fkey"
DETAIL: Ключ (subject_id)=(0) не присутній в таблиці "subjects".
CONTEXT: SQL-оператор "insert into exam_results (student_id, subject_id, mark)
values ((random()*100000)::int,
(random()*1000)::int,
(random()*5)::int)"
Функція PL/pgSQL inline_code_block рядок 4 в SQL-оператор

Related

psql on windows: ERROR: invalid byte sequence for encoding "UTF8": 0xc8 0x20

on database1:
show LC_CTYPE; shows C
show LC_COLLATE; shows C
show SERVER_ENCODING; shows UTF8
but set "PGPASSWORD=password1" & set "PGCLIENTENCODING=UTF8" & psql.exe -h 127.0.0.1 -p 5432 -U postgres -d database1 -c "INSERT INTO table1 (column1) VALUES ('mise à jour 1');"
shows: ERROR: invalid byte sequence for encoding "UTF8": 0xc8 0x20
the error disappears if PGCLIENTENCODING is set to ISO_8859_5 for example
how to fix this issue?

There is nothing much to fix. Your Windows shell uses a different encoding than UTF-8, so you have to set the client encoding to that encoding to make it work. To find out which client encoding to use, you must figure out which encoding your shell uses. That in turn depends on which shell you are using and how the Windows system was configured.

How to read a file in utf8 encoding and output in Windows 10?

What is proper procedure to read and output utf8 encoded data in Windows 10?
My attempt to read utf8 encoded file in Windows 10 and output lines into terminal does not reproduce symbols of some languages.
OS: Windows 10
Native codepage: 437
Switched codepage: 65001
In cmd window issued command chcp 65001. Following ruby code reads utf8 encoded file and outputs lines with puts.
fname = 'hello_world.dat'
File.open(fname,'r:UTF-8') do |f|
puts f.read
end
hello_world.dat content
Afrikaans: Hello Wêreld!
Albanian: Përshendetje Botë!
Amharic: ሰላም ልዑል!
Arabic: مرحبا بالعالم!
Armenian: Բարեւ աշխարհ!
Basque: Kaixo Mundua!
Belarussian: Прывітанне Сусвет!
Bengali: ওহে বিশ্ব!
Bulgarian: Здравей свят!
Catalan: Hola món!
Chichewa: Moni Dziko Lapansi!
Chinese: 你好世界！
Croatian: Pozdrav svijete!
Czech: Ahoj světe!
Danish: Hej Verden!
Dutch: Hallo Wereld!
English: Hello World!
Estonian: Tere maailm!
Finnish: Hei maailma!
French: Bonjour monde!
Frisian: Hallo wrâld!
Georgian: გამარჯობა მსოფლიო!
German: Hallo Welt!
Greek: Γειά σου Κόσμε!
Hausa: Sannu Duniya!
Hebrew: שלום עולם!
Hindi: नमस्ते दुनिया!
Hungarian: Helló Világ!
Icelandic: Halló heimur!
Igbo: Ndewo Ụwa!
Indonesian: Halo Dunia!
Italian: Ciao mondo!
Japanese: こんにちは世界！
Kazakh: Сәлем Әлем!
Khmer: សួស្តីពិភពលោក!
Kyrgyz: Салам дүйнө!
Lao: ສະບາຍດີຊາວໂລກ!
Latvian: Sveika pasaule!
Lithuanian: Labas pasauli!
Luxemburgish: Moien Welt!
Macedonian: Здраво свету!
Malay: Hai dunia!
Malayalam: ഹലോ വേൾഡ്!
Mongolian: Сайн уу дэлхий!
Myanmar: မင်္ဂလာပါကမ္ဘာလောက!
Nepali: नमस्कार संसार!
Norwegian: Hei Verden!
Pashto: سلام نړی!
Persian: سلام دنیا!
Polish: Witaj świecie!
Portuguese: Olá Mundo!
Punjabi: ਸਤਿ ਸ੍ਰੀ ਅਕਾਲ ਦੁਨਿਆ!
Romanian: Salut Lume!
Russian: Привет мир!
Scots Gaelic: Hàlo a Shaoghail!
Serbian: Здраво Свете!
Sesotho: Lefatše Lumela!
Sinhala: හෙලෝ වර්ල්ඩ්!
Slovenian: Pozdravljen svet!
Spanish: ¡Hola Mundo!
Sundanese: Halo Dunya!
Swahili: Salamu Dunia!
Swedish: Hej världen!
Tajik: Салом Ҷаҳон!
Thai: สวัสดีชาวโลก!
Turkish: Selam Dünya!
Ukrainian: Привіт Світ!
Uzbek: Salom Dunyo!
Vietnamese: Chào thế giới!
Welsh: Helo Byd!
Xhosa: Molo Lizwe!
Yiddish: העלא וועלט!
Yoruba: Mo ki O Ile Aiye!
Zulu: Sawubona Mhlaba!
Steven Penny suggested to use PowerShell and do not change code page. Following picture demonstrates that the issue persists.
Windows Terminal installer (which is not a part of Windows distribution) solves utf8 output issue, please see included screen capture.

The problem is, you are using a some methods and tools that are really old. First:
Native codepage: 437
Switched codepage: 65001
You don't need to mess with the codepage any more, just leave it as the default. Also, from you picture I see you are also using Console Host, which is also really old. Windows Terminal [1] has been available since 2019, and has built in UTF-8 support. Using Windows Terminal, I can run your script, even without specifying UTF-8:
fname = 'hello_world.dat'
File.open(fname,'r') do |f|
puts f.read
end
and I get perfect result:
To use Windows Terminal, download the msixbundle file [2], then install it. Or, as it's essentially just a Zip file, you can rename it to file.zip and extract it with Windows, then run WindowsTerminal.exe. Or, since you are really having trouble with this process, you can use a portable version I just created
[3] (at your own risk).
https://github.com/microsoft/terminal
https://github.com/microsoft/terminal/releases/tag/v1.8.1444.0
https://github.com/microsoft/terminal/files/6563899/CascadiaPackage_1.8.1444.0_x64.zip

strange character in migration from postgres to oracle (Ansi)

I'm migrating a db from postgres to oracle.I create csv files with this command:
\copy ttt to 'C:\test\ttt.csv' CSV DELIMITER ',' HEADER encoding 'UTF8' quote as '"'; then with oracle sql loader I put data in oracle tables.
It's all ok but I have in some description this character Â that wasnt in the original DB.
The encoding of db postgres is UTF8 and I'm on a window machine.
Thanks to all.
Gian Piero

Before you start sqlloader run
chcp 65001
set NLS_LANG=.AL32UTF8
chcp 65001 sets codepage of your cmd.exe to UTF-8 (which is inherited by sqlloader and sqlplus)
With set NLS_LANG=.AL32UTF8 you tell the Oracle database "The client uses UTF-8"
Without these commands you would have this situation (due to defaults)
chcp 850
set NLS_LANG=AMERICAN_AMERICA.US7ASCII
Maybe on your PC you have codepage 437 instead of 850, it depends whether your PC is U.S. or Europe, see National Language Support (NLS) API Reference, column OEM codepage
You can set NLS_LANG also as Environment Variable in PC settings or you can define it in Registry at HKLM\SOFTWARE\Wow6432Node\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for 32 bit), resp. HKLM\SOFTWARE\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG
You can also change codepage of your cmd.ext persistent, see https://stackoverflow.com/a/33475373/3027266
For details about NLS_LANG see https://stackoverflow.com/a/33790600/3027266

Import / export oracle scheme with correct character set

I have exported a scheme successfully. On the import however the log says that the character sets don't match. The strange thing is that on the server the export was done the character set is the same as on the target database.
This is from the source:
SQL> select * from v$NLS_PARAMETERS
2 ;
**NLS_CHARACTERSET
WE8MSWIN1252**
**NLS_NCHAR_CHARACTERSET
AL16UTF16**
And this is from the log of the import:
Importvorgang mit Zeichensatz WE8MSWIN1252 und Zeichensatz AL16UTF16 NCHAR durchgeführt
Export-Client verwendet Zeichensatz US7ASCII (mögliche Zeichensatzkonvertierung)
Why is the dump recognized as US7ASCII set? The source and target both are non-US machines.
Thank you

Yes, Looks like issue with char set of client session. Set it to globally supported and recommended UTF8 format.
Pls take the export again and try importing. (Do the following before export):
In Windows
set NLS_LANG=AMERICAN_AMERICA.UTF8
In Unix
export NLS_LANG=AMERICAN_AMERICA.UTF8
These days DB char set is also recommended to be 'AL32UTF8'.

DB2: How to set encoding for db2clp under Windows?

I have a DB2 that was created with encoding set to UTF-8
db2 create database mydb using codeset UTF-8
My data insert scripts are also stored in encoding UTF-8.
The problem now is that the command line processor seems to work with a different encoding as the Windows installation doesn't use UTF-8:
C:\Users\Administrator>chcp
Active code page: 850
This leads to the problem that my data (which contains special characters) is not stored correctly to the database.
Under Linux/AIX I could change the command line encoding by setting
export LC_ALL=en_US.UTF-8
How do I achieve this under Windows? I already tried
chcp 65001
UPDATE:
But that won't have any effect? It seems like the db2clp can't deal with the UTF-8 encoded file because it will print out junk:
D:\Program Files\ibm_db2\SQLLIB\BIN>chcp 65001
Active code page: 65001
D:\Program Files\ibm_db2\SQLLIB\BIN>type d:\tmp\encoding.sql
INSERT INTO MY_TABLE (ID, TXT) VALUES (99, 'äöü');
D:\Program Files\ibm_db2\SQLLIB\BIN>db2 connect to mydb
Datenbankverbindungsinformationen
Datenbank-Server = DB2/NT64 9.5.0
SQL-Berechtigungs-ID = MYUSER
Aliasname der lokalen Datenbank = MYDB
D:\Program Files\ibm_db2\SQLLIB\BIN>db2 -tvf d:\tmp\encoding.sql
INSERT INTO MY_TABLE (ID, TXT) VALUES (99, 'Ã¤Ã¶Ã¼')
DB20000I Der Befehl SQL wurde erfolgreich ausgeführt.

You need to set both:
CHCP 65001
SET DB2CODEPAGE=1208
on the db2cmd command line, before running db2 -tvf. This works for databases that have CODESET set to UTF-8. To check the CODESET setup for database run:
db2 get db cfg for <your database>
and look for "Database code page" and "Database code set" they should be 1208 and UTF-8 respectively.

when dealing with encodings, you have to take a careful look into your envirnoments, and where you are currently. So in your case:
the Server stores its data in encoding A (like UTF-8)
the client resides in an environment which has encoding B (like windows-1252)
in your client, you have to have to use the encoding of your client (or tell the client you intentionally use another encoding on client side (like UTF-8-encoded file inside a windows-1251 environment)!). The connection between the Client and the server is doing the work for you to change encoding B into encoding A for storing the data into the database.

It's work for me by setting db2codepage, thanks to Mr. Zoran Regvart.
by the way, after setting, you need to execute "db2 terminate" to reset client, and then reconnect.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Copying from file: read psql message on windows with UTF8 - windows

Related

psql on windows: ERROR: invalid byte sequence for encoding "UTF8": 0xc8 0x20

How to read a file in utf8 encoding and output in Windows 10?

strange character in migration from postgres to oracle (Ansi)

Import / export oracle scheme with correct character set

DB2: How to set encoding for db2clp under Windows?

Categories

Resources