How to import Chinese characters from Excel into Oracle

How do I import Chinese characters from Excel into Oracle, and also extract Chinese characters from Oracle into Excel?

You simply need to ensure that Oracle is set to use a Unicode-compatible character encoding (I recommend UTF-8) and that every tool you use for the transfer is Unicode-safe.
Without knowing what you are using to do the transfer, it is hard to say more specifically where things are going wrong.

Make sure the database is set up with a character set that can store Chinese characters properly. If the database cannot store them, there is no point importing them. If the database supports an Asian character set, set the client character set to a compatible one. Once that is done, the underlying OCI/ODBC layer will handle the translation from one character set to the other, as long as they are compatible.

Check the following:
The database character set (e.g. ZHS16GBK).
Set the local environment variable NLS_LANG=SIMPLIFIED CHINESE_CHINA.UTF8.
Open the Excel file and save it as Unicode text.
Open the text file and save it in ANSI format.
In PL/SQL Developer there is a tool called Text Importer. Using this tool we can import the data successfully.
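The "save as Unicode text, then re-encode" steps above can also be scripted. As a hedged sketch (the function name, file paths, and GBK target are assumptions for illustration, not part of the original workflow):

```python
def reencode(src_path: str, dst_path: str,
             src_encoding: str = "utf-16", dst_encoding: str = "gbk") -> None:
    """Re-encode a text file, e.g. an Excel 'Unicode Text' export
    (UTF-16 with BOM) into GBK so a GBK-configured loader can read it."""
    with open(src_path, encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
```

Excel's "Unicode Text" export is tab-separated UTF-16 with a BOM, which Python's `utf-16` codec detects and strips automatically.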

Related

Oracle server encoding and file encoding

I have an Oracle server with a DAD defined with PlsqlNLSLanguage DANISH_DENMARK.WE8ISO8859P1.
I also have a JavaScript file that is loaded in the browser. The JavaScript file contains the Danish letters æøå. When the .js file is saved as UTF-8, the Danish letters are mis-encoded. When I save the file as UTF-8 with BOM, or as ANSI, the letters are shown correctly.
I am not sure what is wrong.
Try to set your DAD
PlsqlNLSLanguage DANISH_DENMARK.UTF8
or even better
PlsqlNLSLanguage DANISH_DENMARK.AL32UTF8
When you save your file as ANSI, on Western Windows that typically means "Windows Codepage 1252"; see the column "ANSI codepage" in the National Language Support (NLS) API Reference. CP1252 is very similar to ISO-8859-1; see ISO 8859-1 vs. Windows-1252 (it is the German Wikipedia, but its table shows the differences much better than the English one). Hence, for a 100% correct setting you would have to set PlsqlNLSLanguage DANISH_DENMARK.WE8MSWIN1252.
Now, why do you get correct characters when you save your file as UTF-8 with BOM, although there is a mismatch with .WE8ISO8859P1?
When the browser opens the file, it first reads the BOM 0xEF,0xBB,0xBF and assumes the file is encoded as UTF-8. However, this may fail in some circumstances, e.g. when you insert text from an input field into the database.
With PlsqlNLSLanguage DANISH_DENMARK.AL32UTF8 you tell the Oracle database: "The web server uses UTF-8." No more, no less (in terms of character set encoding). So when your database uses character set WE8ISO8859P1, the Oracle driver knows it has to convert ISO-8859-1 characters coming from the database to UTF-8 for the browser, and vice versa.
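The mismatch described above is easy to reproduce outside of Oracle. A minimal sketch of why æøå break when UTF-8 bytes are interpreted as ISO-8859-1:

```python
s = "æøå"

# ISO-8859-1 (and CP1252) use one byte per character here:
assert s.encode("iso-8859-1") == b"\xe6\xf8\xe5"

# UTF-8 uses two bytes for each of these letters:
assert s.encode("utf-8") == b"\xc3\xa6\xc3\xb8\xc3\xa5"

# The classic mismatch: UTF-8 bytes decoded as ISO-8859-1 show mojibake,
# one Latin character per byte instead of per letter.
assert s.encode("utf-8").decode("iso-8859-1") == "Ã¦Ã¸Ã¥"
```

This is exactly the symptom of a server labelled WE8ISO8859P1 serving a file that was actually saved as UTF-8.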

FTP batch report with Chinese characters to Excel

We have a requirement to FTP a batch report to an Excel sheet in .csv format. The report contains both single-byte and double-byte characters, for example English and Chinese. The data on the mainframe is in Base64 format, and when it is FTP'ed in either binary or ASCII mode, the resulting .csv spreadsheet shows only junk characters. We need a way to FTP the batch report file so that the transferred report is readable.
Request your help in resolving this issue.
I'm not familiar with Chinese character sets, but if you're not restricted to CSV, you might try formatting an XML document for Excel, where you can specify fonts as part of the spreadsheet definition.
Assuming that isn't an option, I would think the Base64 data might need to be translated from EBCDIC to ASCII before transmission and then delivered in binary mode. Otherwise you risk having the data translated into something you didn't expect.
Another way to see what is really happening is to send the data as ASCII, retrieve it as binary, and then compare the before and after results to see which characters were changed en route. I recall having to do something similar once to resolve differences between European and U.S. code sets.
I'm not sure any of these suggestions would represent a "solution" to your problem, but these would be ideas that I would explore. I would be interested in hearing how you resolve this.
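To see why the transfer mode matters for Base64 data, here is a hedged sketch (the sample report text is made up, and cp037 stands in for whatever EBCDIC code page the mainframe actually uses):

```python
import base64

report = "Total 合计 42"

# Base64 text as produced on the sending side:
payload = base64.b64encode(report.encode("utf-8")).decode("ascii")

# The mainframe stores that Base64 text in EBCDIC (cp037 as a stand-in):
ebcdic = payload.encode("cp037")

# ASCII-mode FTP translates EBCDIC to ASCII, so the Base64 survives
# and decodes cleanly on the PC side:
translated = ebcdic.decode("cp037").encode("ascii")
assert base64.b64decode(translated).decode("utf-8") == report

# Binary-mode FTP would deliver the raw EBCDIC bytes, which are not a
# valid Base64 alphabet on the PC side, so decoding fails or garbles.
```

The key point: Base64 is itself *text*, so it must be character-set translated like any other text; only the decoded payload should be treated as binary.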

When I export Chinese characters from Oracle forms to Excel, they are not Chinese anymore

I have a problem with Chinese characters when I export them from Oracle Forms 10g to Excel on Windows 7. They look like Chinese, but they are not Chinese characters. Note that I have already changed the language of my computer to Chinese and restarted it. I use the owa_sylk utility and call the Excel report like this:
v_url := 'http://....../excel_reports.rep?sqlString=' ||
         v_last_query ||
         '&font_name=' || 'Arial Unicode MS' ||
         '&show_null_as=' || ' ';
web.show_document(v_url, '_self');
Interestingly, when I change the language of my computer to English, this column is empty. I also realized that if I open the file with a text editor it shows the right Chinese words, but when we open it with Excel we have the problem.
Does anyone have a clue?
Thanks
Yes, the problem comes from different encodings. If the database uses UTF-8 and you need to send GB-encoded text to Excel, you can convert the data right inside owa_sylk, using the CONVERT function.
For example, in the function owa_sylk.print_rows, change
p( line );
to
p( convert(line, 'ZHS32GB18030', 'AL32UTF8') );
where 'ZHS32GB18030' is one of the Chinese character sets and 'AL32UTF8' is UTF-8.
To choose encoding parameters, see Appendix A.
You can also run
SELECT * FROM V$NLS_VALID_VALUES WHERE parameter = 'CHARACTERSET'
to see all the supported encodings.
This is a character encoding issue. You need to make sure that all tools in the whole chain (database, web service, Excel, text editor and web browser) use the same character encoding.
Changing your language can help here, but a better approach is to nail down the encoding for each part of the chain.
The web browser, for example, will prefer the encoding supplied by the web server over the OS's language settings.
See this question on how to set UTF-8 encoding (which can properly display Chinese in any form) for Oracle: export utf-8 data to text file with oracle sql developer
I'm not sure how to set the encoding for owa_sylk; you will have to check the documentation (I couldn't find any, though). If you can't find anything, ask a question here or use a different tool.
So you need to find out what executes excel_reports.rep and configure that correctly. Use your web browser's developer tools and check the "charset" or "encoding" of the page.
The problems in Excel come from the file format you feed into it. Excel files (.xls and .xlsx) are Unicode-safe; .csv isn't. So if you can read the file in your text editor, chances are this is a non-Excel file format which Excel can parse but which doesn't contain the necessary encoding information.
If you were able to generate a UTF-8 encoded file with the steps above, you can load it by choosing "65001: Unicode (UTF-8)" from the drop-down list next to "File origin" in the "Text Import Wizard" (source).
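If you control the file generation, one common workaround is to write the CSV as UTF-8 with a BOM, which Excel's heuristics use to pick the right encoding when a .csv is opened directly. A sketch (the function name and sample rows are made up for the demo):

```python
import csv

def write_excel_csv(path: str, rows: list) -> None:
    """Write rows as CSV in UTF-8 with a BOM ("utf-8-sig"); the BOM
    lets Excel detect the encoding when opening the .csv directly."""
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)

# Example (hypothetical data): a header row plus one record with Chinese text.
# write_excel_csv("chinese.csv", [["名称", "数量"], ["雀巢", "3"]])
```

This sidesteps the Text Import Wizard entirely, at the cost of three extra bytes at the start of the file that some non-Excel consumers may not expect.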

A leading question mark in Oracle when using DataStage to import from text to Oracle

A question mark "?" appears only at the front of the first field of the first row inserted.
Once, I changed the FTP upload file type to text/ASCII (rather than binary) and it seemed to resolve the problem, but later it came back.
The server OS is AIX 5.3.
DataStage is 7.5x2.
Oracle is 11g.
I used UE to save the file as UTF-8, with Unix line endings.
Has anyone seen this before?
The question mark itself doesn't mean much, as it could be only a "mask" for some special character that is not recognized by the database. You didn't provide many details about your environment, so my opinions here are only a guess; I hope they shed a little light.
How is the text file created? If it's a file created in a Windows environment, you're very likely to get a character like this due to the line-break {CR}{LF} characters.
What is the datatype of the Oracle column?
A CHAR datatype will pad every position up to the declared size of the field; I'd recommend using VARCHAR2 instead in this case.
If that's not the case, I would edit the file in hex mode, check the ASCII code of this specific character, and then use a Trim (if parallel) or Convert (if server) to replace it.
The convert function would be something like this:
Convert(Char([ascii_char_number]),'',[your_string])
Alternatively you can use the Trim function if your job is a parallel job
Trim([your_string],[ascii_char_number],'L')
The option "L" removes all leading occurrences of the character. You might need to adapt this function to suit your needs; if you're not familiar with the Trim function, you can find more details in the DataStage online documentation.
The only warning I'd give is that you'll be deleting data from your original source, so make sure you're not removing any valid information when manipulating a file like this, as this is not a recommended practice among the ETL gurus out there.
Any questions, give me a shout. Happy to help if I can.
Cheers
I had a similar issue where unprintable characters were displayed as '?' and DataStage threw a warning when processing these records. It was OK for me not to display those unprintable characters, so I used the ICONV function, which converts them into printable ones. There are multiple options; I chose the one that converts them to '.', which worked for me. More details are available in the IBM pages below:
https://www-01.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.ds.parjob.dev.doc/topics/r_deeref_String_Functions.html
http://docs.intersystems.com/ens201317/csp/docbook/DocBook.UI.Page.cls?KEY=RVBS_foconv
The conversion I used:
ICONV(column_name,"MCP")
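A leading "?" confined to the very first field of the very first row is often a UTF-8 byte order mark (EF BB BF) that a non-BOM-aware loader renders as one garbage character. As a hedged sketch of a pre-load cleanup step (the function name is made up; whether a BOM is actually the culprit should be confirmed in hex mode, as suggested above):

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM (EF BB BF), which many loaders render
    as '?' or 'ï»¿' stuck to the first field of the first row."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Saving the file from the editor as "UTF-8 without BOM" would achieve the same result without any scripting.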

Mac Excel 2011 mangling "NESTLÉ" when importing a text file

If I have a text file consisting solely the word "NESTLÉ", how do I open this in Excel without mangling the accent?
This question isn't quite covered by other questions on the site, as far as I can tell. I don't see any relevant difference in any import option. I try to tell Excel it's UTF-8 when I import it, and the best that happens is that the É becomes _.
If I create a Google Docs spreadsheet with just that word and save it out to Excel format, then open in Excel, I get the data un-mangled, so that's good, it's possible to represent the data.
I've never seen Excel 2011 do anything smart with a UTF8 BOM indicator at the start of a file.
Does anyone else have different experience there, or know how to get this data from a text file to Excel without any intermediate translation tools?
I saved a file with that word in multiple formats. The results when opened with Excel 2010 by simply dragging and dropping the appropriate .txt file on it:
Correct
ANSI¹ (Windows-1252 encoding on my system, which is US Windows)
UTF-8 with BOM
UTF-16BE without BOM
UTF-16LE without BOM
UTF-16LE with BOM
Incorrect
UTF-8 without BOM (result NESTLÃ‰)
UTF-16BE with BOM (result þÿNESTLÉ)
Do you know the encoding of your text file? It's interesting that UTF-16BE with BOM failed; Excel is probably using a heuristic function such as IsTextUnicode.
¹ The so-called ANSI mode on Windows is a locale-specific encoding.
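The UTF-8-without-BOM failure above is reproducible in a few lines. A minimal sketch of both the mojibake and the BOM fix:

```python
import codecs

word = "NESTLÉ"
utf8 = word.encode("utf-8")

# Without a BOM, a CP1252 reader turns the two UTF-8 bytes of É into
# two Latin-1 characters -- the classic mojibake from the table above:
assert utf8.decode("cp1252") == "NESTLÃ‰"

# Prepending the UTF-8 BOM gives heuristic readers such as Excel a
# clear signal that the file is UTF-8:
with_bom = codecs.BOM_UTF8 + utf8
assert with_bom[:3] == b"\xef\xbb\xbf"
```

So when producing the text file yourself, saving it as UTF-8 *with* BOM (or as one of the BOM-less UTF-16 variants listed as working above) is the practical fix.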
