I am a greenhorn in Oracle Database and IBM Integration Bus and I'm trying to use the INSERT INTO function of ESQL in the IBM Integration Bus to insert data of a CSV file.
I'm using a DFDL with ISO-8859-1 encoding to read the file. When using the debugger, the message is fine and readable in SQLPLUS and SQL Developer.
I already tried to change the NLS_CHARARCTERSET setting in my Oracle Database although im not really sure which encoding I need. On default it was AL32UTF8 and I tried UTF8, WE8ISO8859P1.
What I also did was changing the encoding of the DFDL and changing the ODBC driver's settings to Use Oracle NLS Settings (Default), Use Microsoft Regional Settings and Use US settings.
If I try to use the INSERT INTO command the Database returns inverted question marks or Chinese characters, which is obviously not what I want.
EDIT:
If I hardcode the INSERT INTO values it also returns the question mark. The CSV's encoding doesn't matter. What I also found out is that the CSV file's data is displayed as null. When I hardcode the values in the INSERT INTO I get inverted question marks.
Its a simple option in the IIB ODBC Driver for Oracle. Just make sure that "Enable SQL Describe Param" in the advanced tab is checked.
If you mean the encoding of the CSV file. It is Windows 1250 since the input file has characters like 'ß'
I don't understand that statement. The use of the character 'ß' does not necessarily imply Windows 1250. Do you have any other supporting evidence for that claim?
If your claim is correct then your DFDL schema is incorrect. You cannot parse a multi-byte encoding using a single-byte decoder. So the first thing you should do is to change the 'encoding' property in the format block of your DFDL schema to "5346" (according to https://www.ibm.com/support/knowledgecenter/en/SSRH46_3.0.0_SWS/dni_ccsids_and_char_set_names.html ).
But (and I'm sorry for repeating this, but it really matters...) CHECK that your assumption about Windows 1250 is correct. Then make sure that the encoding in the DFDL schema matches the encoding of the CSV file.
Related
I have about 1200 databases in SQL 2016 Enterprise in which documents are stored as BLOB's in image fields in the database. I migrated all the documents to our Document Management System and now I want to replace the files in the database with shortcuts to the corresponding files in the DMS. I tried it a few months ago and that went well. Now I run into an issue. When I use this command
convert(varbinary(max), <string value of shortcut>)
It get's written to the field. When I try to open the shortcut I get an error. If I create the same shortcut from our DMS it is exactly the same except for the encoding. My binary = UTF-16 little endian and the DMS shortcut is encoded in UTF-8. The filesize has also doubled, which is logical since UTF-16 uses two bytes for every character. When I change the encoding in Notepad++ my shortcut works.
I need my blob to be encoded in UTF-8. That is possible, I can upload a shortcut to the system that the database uses and then it's stored correctly. I can't change the collation of the table or field because this is a vendor database. It's a pretty old-fashioned system. Who uses Blob's in the first place and if you do, why image and not varbinary?
I'm not much of a programmer so any help would be greatly appreciated.
I tried updating the database to the latest client version (not SQL just the application). That didn't change anything.
I tried nvarchar but that doesn't work on image fields.
I'm querying from a 10g database using a dblink to an 8i database.
select col1, col2 ... from table#my_dblink_to_8i
8i charset is IW8ISO8859P8
10g charset is WE8MSWIN1252
the data is coming out as gibrish. I've tried all of variations I can think of
to_char(col1)
cast(col1 as nchar(4))
cast(col1 as nvarchar2(4))
cast(col1 as char(4))
cast(col1 as varchar2(4))
convert(col1, 'WE8MSWIN1252', 'IW8ISO8859P8')
convert(convert(col1,'UTF8','IW8ISO8859P8'),'WE8MSWIN1252','UTF8')
all returning with either gibrish or
ORA-12704: character set mismatch
ORA-02063: preceding line from OTHERDB
any suggestions ?
Is there an intermediate charset I can convert to ?
Yes, this is a known problem that sometimes occurs. I remember the first time experiencing it using a database link between two identical Oracle 7 version and then seeing it back in 9 when using Oracle 8.1.5.
It can not always be solved. Oracle development does not seem to test as intensively with non-US characters as with US characters.
The first thing you can try is to check the EXACT versions of Oracle 8i in use. Check that the server version is 8.1.7 or newer (such as 8.1.7.4). With 8.1.5 there are known problems, I think to recall that that is the first version to do AL32UTF8.
Also check the version of the SQL*Net client installation (if you are using a separate installation, I don't think so). It must be 8.1.7 or newer also.
Also check that the characters are available in BOTH character sets. They are largely identical, but not completely. I think the 8859P8 is an international without Europ-support, whereas MSWIN1252 is something of Microsoft.
Check the NLS_LANG on all nodes in between and that the database character set is correctly configured. Make sure they are correct. The interim nodes you can change to AL32UTF8. SQL*net does no character conversion but also no checks when the client and server talk the same characterset, so bugs in the characterset setup can slumber for years.
After testing those, you might want to try convert to AL32UTF instead of UTF8 (I think it was already available by 8, don't know sure, but maybe only mainstream supported on 9i).
As a last resort, do the character conversion yourself. Use a procedure to transport it binary to the caller and do the conversion on the receiving 10g database.
Or use an ETL tool like Kettle, spooling to text files as interim or alike.
I hope this answers your question. If not, please help me with some samples of the gibberish (transporting us7ascii texts, more advanced texts, and the results of out varchar2 parameters called across dblink). If yes, please let me know too. You have a intriguing question!
We have a portal written in php/mysql and an enterprise application based on Java EE and Oracle. Recently we found out that a certain Unicode character (0643 to be precise) is invalid (due to improper data entry by end users) in text columns and must be changed to another character (06A9).
In MySQL I simply changed the export file using a text editor's find and replace tool. But in oracle, the dmp file is a binary file and i have no idea about how to edit the dmp file.
How can I change the invalid character?
Is there an alternative to iterating through all text columns in all tables?
(I have saved that as a last resort!)
Editing an Oracle dump file may be possible but isn't practical; even if you could get in and change something you'd risk corrupting it, and I doubt Oracle support would be impressed. (See this AskTom question for example).
If you're using data pump and you know which column(s) the data is in you might be able to use the REMAP_DATA parameter to change it on the fly, or the QUERY parameter to skip the data, but it doesn't sound like you're in that situation. You could potentially add temporary constraints to the relevant column(s) to block the value, so import would reject (and log) the affected rows, but that's painful and messy.
If you do have to check all columns on all tables, this link may be helpful.
We have a requirement to store char data of different language in the same db schema. Oracle 10g is our DB. I am hoping that someone who have already done this would give me more specific instructions on how to i18n enable a oracle 10g db. We just need to store the data from multiple locales as well as collation (hoping all major db's support this) support at the db level. We doesn't need formatting of dates, datetime, numbers, currency etc.
I read some documentation on oracle's i18n support but somewhat confused about their many nls_* properties. Should I be using nls_lang or nls_language or NLS_CHARACTERSET.....
Assuming that you are building the database from scratch, not trying to retrofit an existing database which introduces other problems.
In the database, you need to ensure that the database character set supports all the characters you want to store. Presumably, that means setting the NLS_CHARACTERSET of the database to AL32UTF8. Personally, I prefer to set NLS_LENGTH_SEMANTICS to CHAR as well. That changes the default behavior of a VARCHAR2(n) to allocate n characters of storage rather than n bytes. Since AL32UTF8 is a variable-length character set, using byte semantics is generally problematic because you either have to declare fields that are 3 times as long and end up with different users being able to enter a different number of characters in the same field.
NLS_LANG is a client setting. That identifies the character set that the client is going to request the data be converted into. That generally depends on the code page of the operating system.
I've been developing a DotNet project on oracle (Ver 10.2) for the last couple of months and was using Varchar2 for my string data fields. This was fine and when navigating the project page refreshes were never more than a half second if even (it's quiet a data intensive project). The data is referenced from 2 different schemas, one a centralised store of data and one of which is my own. Now the centralised schema will be changing to be unicode compliant (but hasn't yet) so all Varchar2 fields will become NVarchar2, in preparation for this I changed all the fields in my schema to be NVarchar2 and since then performance has been horrible .. up to 30/40 second page refreshes.
Could this be because Varchar2 fields in the centralised schema will be joined against NVarchar2 fields in my schema on some stored procedures. I know NVarchar2 is twice the size of Varchar2 but that wouldn't explain the sudden massive change. As I said any tips for what to look for to improve would be great, if I haven't explained the scenario well enough do ask for more information.
Firstly, do a
select * from v$nls_parameters where parameter like '%SET%';
Character sets can be complicated. You can have single-byte charactersets, fixed-size multibyte character set sand variable-sized multi-byte character sets. See the unicode descriptions here
Secondly, if you are joining a string in a single-byte characterset to a string in a two-byte characters set, you have a choice. You can do a binary/byte comparison (which generally won't match anything if you compare between a single-byte character set and a two-byte characterset). Or you can do a linguistic comparison, which will generally mean some CPU cost, as one value is converted into another, and often the failure to use an index.
Indexes are ordered, A,B,C etc. But a character like Ä may fall in different places depending on the Linguistic order. Say the index structure puts Ä between A and B. But then you do a linguistic comparison. The language of that comparison may put Ä after Z, in which case the index can't be used. (Remember your condition could be a BETWEEN rather than an = ).
In short, you'll need a lot of preparation, both in your schema and the central store, to enable efficient joins between different charactersets.
It is difficult to say anything based on what you have provided. Did you manage to check if the estimated cardinalities and/or explain plan changed when you changed the datatype to NVARCHAR2? You may want to read the following blog post to see if you can find a lead
http://joze-senegacnik.blogspot.com/2009/12/cbo-oddities-in-determing-selectivity.html
It is likely no longer able to use indexes that it previously could. As Narendra suggests check the explain plan to see what changed. It is possible that once the centeralized store is changed the indexes will again be usable. I suggest testing that path.
Setting the NLS_LANG initialization parameter properly is essential to proper data conversion. The character set that is specified by the NLS_LANG initialization parameter should reflect the setting for the client operating system. Setting NLS_LANG correctly enables proper conversion from the client operating system code page to the database character set. When these settings are the same, Oracle assumes that the data being sent or received is encoded in the same character set as the database character set, so no validation or conversion is performed. This can lead to corrupt data if conversions are necessary.