I'm trying to migrate Oracle data with 'WE8MSWIN1252' encoding to my PostgreSQL database, which has 'UTF8' encoding.
I'm using foreign data wrapper.
I'm getting an 'invalid byte sequence' error. What should I do?
Such errors can be caused by two things:
There may be zero bytes in your Oracle strings. That is allowed in Oracle (even though it is problematic), but forbidden in PostgreSQL.
It is easy to get data corruption in Oracle, because it is pretty sloppy with encoding checks and allows you to insert arbitrary illegal byte sequences when client encoding and server encoding are the same.
There are two approaches to dealing with this problem:
The correct way: fix the data on the Oracle side. oracle_fdw will help you by telling you which row in the result set caused the error. A query to track down offending rows is sketched after this list.
The sloppy way: use a PostgreSQL database with the database encoding SQL_ASCII, which will let you store any byte sequence in a string (except zero bytes).
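For the first approach, a minimal sketch of how you might locate problem rows on the Oracle side, assuming a hypothetical table MY_TABLE with key column ID and text column TEXT_COL:

-- Rows containing zero bytes (allowed by Oracle, rejected by PostgreSQL):
SELECT id, DUMP(text_col, 16) AS raw_bytes
FROM   my_table
WHERE  INSTR(text_col, CHR(0)) > 0;

Corrupted byte sequences are harder to find automatically; inspecting the rows that oracle_fdw reports with DUMP, as above, shows the stored bytes so you can fix them.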
Related
On Oracle 11, I dumped my data using exp/imp to be migrated to another DB.
I tested to import the dump file on my local database, with no problem at all.
But then my colleague tried the same on his own machine and some tables couldn't get imported due to the error:
can bind a LONG value only for insert into a LONG column.
I don't have any LONG columns, but I read that this error can also be thrown when the size limit of a VARCHAR2 column is exceeded. So I checked the character sets of the databases: I have the default Windows charset and he has a UTF8 charset. Do you think the same data might be represented with more bytes there, leading to this kind of error?
Do I have to change my database charset and create another dump? I'm looking for a better solution, because this also needs to be imported into a customer's database, which I know has a completely different charset.
A Western Windows character set such as WE8MSWIN1252 is single-byte. When you import into a multi-byte (UTF8) database, every character may be converted to 1-3 bytes, so before the import you have to widen every string column to three times its byte length. If that would exceed the 4000-byte VARCHAR2 limit, use a CLOB instead.
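A minimal sketch of that widening, assuming a hypothetical table MY_TABLE with a 100-byte VARCHAR2 column MY_COL:

-- Triple the byte length before importing into the UTF8 database:
ALTER TABLE my_table MODIFY (my_col VARCHAR2(300));

-- Or, often cleaner, switch to character length semantics so the limit
-- is counted in characters rather than bytes:
ALTER TABLE my_table MODIFY (my_col VARCHAR2(100 CHAR));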
I have data exported with the exp command from a full Oracle 11gR2 database that uses the AR8MSWIN1256 charset. However, when I import the data into an 11gR2 XE database, I get the error:
row rejected due to ORACLE error 12899
Could the problem be the mismatch in charsets (AL32UTF8 vs AR8MSWIN1256)? If so, is there a solution?
The table almost certainly has length semantics BYTE for the character columns. imp creates the table with the same length semantics as in the source database, so if you want to migrate to a multibyte character set you need to make sure the length semantics of those columns are changed to CHAR.
The easiest approach is to pre-create the tables and make sure your column definitions specify their length in characters rather than bytes.
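For example (hypothetical table and column names), pre-creating a table with character length semantics before running imp:

CREATE TABLE customers (
  customer_name VARCHAR2(100 CHAR)  -- 100 characters, however many bytes each one needs
);

-- Or set the default for the session that creates the tables:
ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;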
I have Oracle SQL Developer (3.1.07) and I'm trying to work with a database that uses WE8ISO8859P1 encoding:
SELECT * FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
I have problems saving packages that contain Unicode symbols. When I open a previously saved package, all Unicode symbols have been turned into '¿'.
What settings do I have to change to make SQL Developer keep those symbols?
I've tried setting the environment encoding to 'ISO-8859-15' and some other encodings, but it doesn't help.
If your database encodes text in a non-Unicode single-byte encoding (e.g. ISO-8859), any symbol not present in the character table is treated as invalid and replaced by a placeholder. You can't go back from that; the information is lost.
That can usually be worked around when storing data, but for source code you cannot control how Oracle encodes your strings.
If your database is configured to use such encoding scheme you're probably not supposed to write code that violates its rules.
You may need a character set migration; see the Oracle documentation:
http://docs.oracle.com/cd/B10501_01/server.920/a96529/ch10.htm#1656
At least to open the package in SQL Developer, you can do a quick test and see if it works:
Change SQL Developer's encoding to 'UTF-8', which is the default in later versions.
Eventually you would need to migrate the database character set to 'AL32UTF8' to avoid other issues (such as data problems) caused by the current character set.
If you look at USER_SOURCE you'll see that the source code, as stored/interpreted by the database, is kept in a VARCHAR2 column and therefore uses the database character set. As such, your source code will need to be in WE8ISO8859P1.
In theory, if the client and database are using the same character set, the database won't try to do any character set translation, and you may be able to sneak in a sequence of bytes that the database thinks are WE8ISO8859P1 but that make sense in Unicode. However, at some point someone will use the wrong client and it will break.
You don't need Unicode for identifiers etc. in the code, so I assume it is in string literals. You are better off storing these in a table (an NVARCHAR2 column) and selecting them into the code rather than hard-coding them. If that isn't possible, you could use UNISTR and hard-code the relevant hex values.
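A minimal sketch of the UNISTR approach, with a hypothetical literal: the source stays plain ASCII, and the non-ASCII characters are spelled out as Unicode code points.

DECLARE
  v_msg NVARCHAR2(50);
BEGIN
  -- \00E9 is é; the package source itself contains only ASCII characters.
  v_msg := UNISTR('caf\00E9');
  DBMS_OUTPUT.PUT_LINE(v_msg);
END;
/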
The client has asked for a number of tables to be extracted into CSVs, which is all done, no problem. They've just asked that we make sure the files are always in UTF-8 format.
How do I check that this is actually the case, or, even better, force it to be so? Is it something I can set in a procedure before running a query, perhaps?
The data is extracted from an Oracle 10g database.
What should I be checking?
Thanks
You can check the database character set with the following query:
select value from nls_database_parameters
where parameter='NLS_CHARACTERSET'
If it says AL32UTF8 then your database is in the format you need, and if the export does not mangle it then you are done.
You can read more about Oracle globalization support and NLS parameters like the one above in the Oracle documentation.
How, exactly, are you generating the CSV files? Depending on the exact architecture, there will be different answers.
If you are, for example, using SQL*Plus to extract the data, you would need to set NLS_LANG on the client machine to something appropriate (e.g. AMERICAN_AMERICA.AL32UTF8) to force the data to be sent to the client machine in UTF-8. If you are using other approaches, NLS_LANG may or may not be important.
What you have to look for is whether the eight-bit characters in the input (if any) are translated into two-byte UTF-8 sequences.
This is highly dependent on your local code page, but typically:
the character "£", which is x'A3' in Windows-1252/ISO-8859-1, becomes x'C2A3' in UTF-8.
OK, it wasn't as simple as I first hoped. The query above returns AL32UTF8.
I am using a stored proc compiled on the database to loop through a list of table names held in an array inside the stored procedure.
I use the DBMS_SQL package to build the SQL and UTL_FILE.PUT_NCHAR to write the data to a text file.
I believed my resulting output would then be in UTF-8; however, opening it in TextPad says it's in ANSI, and the data is garbled in places :)
Cheers
It might be relevant that NLS_CHARACTERSET is AL32UTF8 while NLS_NCHAR_CHARACTERSET is AL16UTF16.
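If the file ends up in the wrong encoding, one thing worth trying (a minimal sketch, assuming a hypothetical directory object EXPORT_DIR and table MY_TABLE) is to open the file with UTL_FILE.FOPEN_NCHAR and write with PUT_LINE_NCHAR; the NCHAR variants of UTL_FILE are documented to read and write the file contents in UTF-8:

DECLARE
  f UTL_FILE.FILE_TYPE;
BEGIN
  f := UTL_FILE.FOPEN_NCHAR('EXPORT_DIR', 'my_table.csv', 'w', 32767);
  FOR r IN (SELECT col1 || ',' || col2 AS line FROM my_table) LOOP
    UTL_FILE.PUT_LINE_NCHAR(f, r.line);
  END LOOP;
  UTL_FILE.FCLOSE(f);
END;
/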
I'm using an Oracle database with a collation different from my OS language. I'm accessing the database using the ODBC driver. When I prepare a statement (e.g. a "select * from x where col = ?") involving special non-ASCII characters supported by the DB's collation, the data row with those characters is found. When I execute the select directly with the argument embedded in the SQL string, the data row isn't found.
Pure guess on my part, but it may be because your client computer isn't correctly encoding the SQL string with the argument written into it. I think that if your client is set to a different regional setting than the DB collation, the character array containing the select statement that gets sent to Oracle would contain "incorrect" bytes where the original funky characters were located; Oracle would interpret these as characters other than the ones you originally sent, causing the row not to be found.
Is there any reason you can't just use the parameterized approach (since it is working correctly)?