I'm querying from a 10g database using a dblink to an 8i database.
select col1, col2 ... from table#my_dblink_to_8i
8i charset is IW8ISO8859P8
10g charset is WE8MSWIN1252
the data is coming out as gibrish. I've tried all of variations I can think of
to_char(col1)
cast(col1 as nchar(4))
cast(col1 as nvarchar2(4))
cast(col1 as char(4))
cast(col1 as varchar2(4))
convert(col1, 'WE8MSWIN1252', 'IW8ISO8859P8')
convert(convert(col1,'UTF8','IW8ISO8859P8'),'WE8MSWIN1252','UTF8')
all returning with either gibrish or
ORA-12704: character set mismatch
ORA-02063: preceding line from OTHERDB
any suggestions ?
Is there an intermediate charset I can convert to ?
Yes, this is a known problem that sometimes occurs. I remember the first time experiencing it using a database link between two identical Oracle 7 version and then seeing it back in 9 when using Oracle 8.1.5.
It can not always be solved. Oracle development does not seem to test as intensively with non-US characters as with US characters.
The first thing you can try is to check the EXACT versions of Oracle 8i in use. Check that the server version is 8.1.7 or newer (such as 8.1.7.4). With 8.1.5 there are known problems, I think to recall that that is the first version to do AL32UTF8.
Also check the version of the SQL*Net client installation (if you are using a separate installation, I don't think so). It must be 8.1.7 or newer also.
Also check that the characters are available in BOTH character sets. They are largely identical, but not completely. I think the 8859P8 is an international without Europ-support, whereas MSWIN1252 is something of Microsoft.
Check the NLS_LANG on all nodes in between and that the database character set is correctly configured. Make sure they are correct. The interim nodes you can change to AL32UTF8. SQL*net does no character conversion but also no checks when the client and server talk the same characterset, so bugs in the characterset setup can slumber for years.
After testing those, you might want to try convert to AL32UTF instead of UTF8 (I think it was already available by 8, don't know sure, but maybe only mainstream supported on 9i).
As a last resort, do the character conversion yourself. Use a procedure to transport it binary to the caller and do the conversion on the receiving 10g database.
Or use an ETL tool like Kettle, spooling to text files as interim or alike.
I hope this answers your question. If not, please help me with some samples of the gibberish (transporting us7ascii texts, more advanced texts, and the results of out varchar2 parameters called across dblink). If yes, please let me know too. You have a intriguing question!
Related
I recently wrote a tool that extracts certain data from our DBs. It runs as PL/SQL script running in SQLDeveloper (either in a worksheet or as extension plugin) and produces its output to the SQLDeveloper log-window.
This worked all fine on my system, but now I encountered an issue when users are on systems with a different language or more specific with different default time/date/timestamp formats than on the machine on which I had been developing and testing this.
Now - I am not sure: is the format of dates, times and timestamps controlled by the DB or by SLQ-Developer? In my understanding these PL/SQL scripts are sent to the DB for execution and their output is sent back to SQL-Developer. That would mean for me, that the format of the output depends solely on the DB (or the system on which the DB executes). Or are the NLS setting of the client (SQL-Developer) somehow involved here?
To make my tool auto-adjust to these settings I will need to be able to query these formats - either from the DB in use (Oracle 12.2 or Oracle XE 18/19 in our case) or from SQLDeveloper.
Assuming, it's the DB: Is there a table that contains the default format strings that are being used for select results?
Note: The point is NOT how to format dates etc. as strings, but the other way round:
I get the the query results as strings in the log-window. These contain dates and timestamps. I now need a hint from the DB-system to figure out how to interpret these. E.g. when I get a date such as '10-11-12', is this meant to be Nov. 10th, 2012 or is meant to be Nov. 12th, 2010?
Hope I could make myself clear...
I am a greenhorn in Oracle Database and IBM Integration Bus and I'm trying to use the INSERT INTO function of ESQL in the IBM Integration Bus to insert data of a CSV file.
I'm using a DFDL with ISO-8859-1 encoding to read the file. When using the debugger, the message is fine and readable in SQLPLUS and SQL Developer.
I already tried to change the NLS_CHARARCTERSET setting in my Oracle Database although im not really sure which encoding I need. On default it was AL32UTF8 and I tried UTF8, WE8ISO8859P1.
What I also did was changing the encoding of the DFDL and changing the ODBC driver's settings to Use Oracle NLS Settings (Default), Use Microsoft Regional Settings and Use US settings.
If I try to use the INSERT INTO command the Database returns inverted question marks or Chinese characters, which is obviously not what I want.
EDIT:
If I hardcode the INSERT INTO values it also returns the question mark. The CSV's encoding doesn't matter. What I also found out is that the CSV file's data is displayed as null. When I hardcode the values in the INSERT INTO I get inverted question marks.
Its a simple option in the IIB ODBC Driver for Oracle. Just make sure that "Enable SQL Describe Param" in the advanced tab is checked.
If you mean the encoding of the CSV file. It is Windows 1250 since the input file has characters like 'ß'
I don't understand that statement. The use of the character 'ß' does not necessarily imply Windows 1250. Do you have any other supporting evidence for that claim?
If your claim is correct then your DFDL schema is incorrect. You cannot parse a multi-byte encoding using a single-byte decoder. So the first thing you should do is to change the 'encoding' property in the format block of your DFDL schema to "5346" (according to https://www.ibm.com/support/knowledgecenter/en/SSRH46_3.0.0_SWS/dni_ccsids_and_char_set_names.html ).
But (and I'm sorry for repeating this, but it really matters...) CHECK that your assumption about Windows 1250 is correct. Then make sure that the encoding in the DFDL schema matches the encoding of the CSV file.
We are looking to use WSO2 API manager at our current client, and are required to use the provided Oracle DDL to set up the necessary tables for Carbon, API manager and message broker. The Client's dba's are coming back asking why the script uses varchar instead of Varchar2 for the relevant fields, which is a good question as the standard approach for oracle is "Varchar2 is the industry standard, don't use varchar."
Is there a really good reason why the oracle scripts use Varchar instead of Varchar2? When implemented on top of Oracle, what is WSO2 doing that required the ability to differentiate null from empty?
This was probably a simple typo that doesn't matter.
It looks like only the column AM_API.INDEXER uses VARCHAR instead of VARCHAR2. The program also contains scripts for H2, MS SQL, MySQL, and PostGreSQL. All the other databases use VARCHAR. This is a pretty common mistake for products that support multiple databases.
Keep in mind that there is no meaningful difference between VARCHAR and VARCHAR2 in Oracle. The documentation claims that someday there will be a difference but I highly doubt it. This issue has existed for a long time and there's a ton of legacy code that depends on the old behavior. I would bet good money that Oracle will never make VARCHAR use different NULL comparison semantics.
I tried to change the script and created a pull request. I don't understand this project and almost certainly did something wrong. So don't be surprised if the request is rejected. But perhaps it will in some way help to lead toward a fix.
We are going to migrate an application to have it support Unicode and have to choose between unicode character set for the whole database, or unicode columns stored in N[VAR]CHAR2.
We know that we will no more have the possibility of indexing column contents with Oracle Text if we choose NVARCHAR2, because Oracle Text can only index columns based on the CHAR type.
Apart that, is it likely that other major differences arise when harvesting from Oracle possibilities?
Also, is it likely that some new features are added in newer versions of Oracle, but only supporting either CHAR columns or NCHAR columns but not both?
Thank you for your answers.
Note following Justin's answer:
Thank you for your answer. I will discuss your points, applied to our case:
Our application is usually alone on the Oracle database and takes care of the
data itself. Other software that connect to the database are limited to Toad,
Tora or SQL developer.
We also use SQL*Loader and SQL*Plus to communicate with the database for basic
statements or to upgrade between versions of the product. We have
not heard of any specific problem with all those software regarding NVARCHAR2.
We are also not aware that database administrators among our customers would
like to use other tools on the database that could not support data on
NVARCHAR2 and we are not really concerned whether their tools might disrupt,
after all they are skilled in their job and may find other tools if necessary.
Your last two points are more insightful for our case. We do not use many
built-in packages from Oracle but it still happens. We will explore that
problem.
Could we also expect performance breakage if our application (that is compiled under Visual C++), that uses wchar_t to
store UTF-16, has to perform encoding conversions on all processed data?
If you have anything close to a choice, use a Unicode character set for the entire database. Life in general is just blindingly easier that way.
There are plenty of third party utilities and libraries that simply don't support NCHAR/ NVARCHAR2 columns or that don't make working with NCHAR/ NVARCHAR2 columns pleasant. It's extremely annoying, for example, when your shiny new reporting tool can't report on your NVARCHAR2 data.
For custom applications, working with NCHAR/ NVARCHAR2 columns requires jumping through some hoops that working with CHAR/ VARCHAR2 Unicode encoded columns does not. In JDBC code, for example, you'd constantly be calling the Statement.setFormOfUse method. Other languages and frameworks will have other gotchas; some will be relatively well documented and minor others will be relatively obscure.
Many built-in packages will only accept (or return) a VARCHAR2 rather than a NVARCHAR2. You'll still be able to call them because of implicit conversion but you may end up with character set conversion issues.
In general, being able to avoid character set conversion issues within the database and relegating those issues to the edge where the database is actually sending or receiving data from a client makes the job of developing an application much easier. It's enough work to debug character set conversion issues that result from network transmission-- figuring out that some data got corrupted when a stored procedure concatenated data from a VARCHAR2 and a NVARCHAR2 and stored the result in a VARCHAR2 before it was sent over the network can be excruciating.
Oracle designed the NCHAR/ NVARCHAR2 data types for cases where you are trying to support legacy applications that don't support Unicode in the same database as new applications that are using Unicode and for cases where it is beneficial to store some Unicode data with a different encoding (i.e. you have a large amount of Japanese data that you would prefer to store using the UTF-16 encoding in a NVARCHAR2 rather than the UTF-8 encoding). If you are not in one of those two situations, and it doesn't sound like you are, I would avoid NCHAR/ NVARCHAR2 at all costs.
Responding to your followups
Our application is usually alone on
the Oracle database and takes care of
the data itself. Other software that
connect to the database are limited to
Toad, Tora or SQL developer.
What do you mean "takes care of the data itself"? I'm hoping you're not saying that you've configured your application to bypass Oracle's character set conversion routines and that you do all the character set conversion yourself.
I'm also assuming that you are using some sort of API/ library to access the database even if that is OCI. Have you looked into what changes you'll need to make to your application to support NCHAR/ NVARCHAR2 and whether the API you're using supports NCHAR/ NVARCHAR2? The fact that you're getting Unicode data in C++ doesn't actually indicate that you won't need to make (potentially significant) changes to support NCHAR/ NVARCHAR2 columns.
We also use SQL*Loader and SQL*Plus to
communicate with the database for
basic statements or to upgrade between
versions of the product. We have not
heard of any specific problem with all
those software regarding NVARCHAR2.
Those applications all work with NCHAR/ NVARCHAR2. NCHAR/ NVARCHAR2 introduce some additional complexities into scripts particularly if you are trying to encode string constants that are not representable in the database character set. You can certainly work around the issues, though.
We are also not aware that database
administrators among our customers
would like to use other tools on the
database that could not support data
on NVARCHAR2 and we are not really
concerned whether their tools might
disrupt, after all they are skilled in
their job and may find other tools if
necessary.
While I'm sure that your customers can find alternate ways of working with your data, if your application doesn't play nicely with their enterprise reporting tool or their enterprise ETL tool or whatever desktop tools they happen to be experienced with, it's very likely that the customer will blame your application rather than their tools. It probably won't be a show stopper, but there is also no benefit to causing customers grief unnecessarily. That may not drive them to use a competitor's product, but it won't make them eager to embrace your product.
Could we also expect performance
breakage if our application (that is
compiled under Visual C++), that uses
wchar_t to store UTF-16, has to
perform encoding conversions on all
processed data?
I'm not sure what "conversions" you're talking about. This may get back to my initial question about whether you're stating that you are bypassing Oracle's NLS layer to do character set conversion on your own.
My bottom line, though, is that I don't see any advantages to using NCHAR/ NVARCHAR2 given what you're describing. There are plenty of potential downsides to using them. Even if you can eliminate 99% of the downsides as irrelevant to your particular needs, however, you're still facing a situation where at best it's a wash between the two approaches. Given that, I'd much rather go with the approach that maximizes flexibility going forward, and that's converting the entire database to Unicode (AL32UTF8 presumably) and just using that.
We have a requirement to store char data of different language in the same db schema. Oracle 10g is our DB. I am hoping that someone who have already done this would give me more specific instructions on how to i18n enable a oracle 10g db. We just need to store the data from multiple locales as well as collation (hoping all major db's support this) support at the db level. We doesn't need formatting of dates, datetime, numbers, currency etc.
I read some documentation on oracle's i18n support but somewhat confused about their many nls_* properties. Should I be using nls_lang or nls_language or NLS_CHARACTERSET.....
Assuming that you are building the database from scratch, not trying to retrofit an existing database which introduces other problems.
In the database, you need to ensure that the database character set supports all the characters you want to store. Presumably, that means setting the NLS_CHARACTERSET of the database to AL32UTF8. Personally, I prefer to set NLS_LENGTH_SEMANTICS to CHAR as well. That changes the default behavior of a VARCHAR2(n) to allocate n characters of storage rather than n bytes. Since AL32UTF8 is a variable-length character set, using byte semantics is generally problematic because you either have to declare fields that are 3 times as long and end up with different users being able to enter a different number of characters in the same field.
NLS_LANG is a client setting. That identifies the character set that the client is going to request the data be converted into. That generally depends on the code page of the operating system.