Displaying whole ORACLE 8-bit CHARSETS in UNICODE - oracle

I maintain a Java EE web application that runs against an Oracle database with an 8-bit character set.
The application will be used from abroad, and I want to be able to check strings (for example with Unicode regexps, from both Java and JavaScript) to see whether they fit into the database charset.
One function in the GDK (Globalization Development Kit) gives the equivalent Java name of the Oracle charset; I think it was ISO-8859-15. But I'm not certain the correspondence is exact.
What I want is to display the whole charset (not the ISO one, but the Oracle one) character by character, for use from both Java and JavaScript, showing the Unicode code points and telling the control characters apart from the printable ones.
Is there a function in Oracle's GDK for that?
Thank you.

I think I've found it! (Eureka!)
A little Java JDBC program produced exactly the characters in ISO-8859-15 that differ from ISO-8859-1 (by the way, I've learned that ISO-8859-1 maps directly onto Unicode code points 0x00 to 0xFF).
Program output:
CHR: 164 UNICODE: 8364 euro sign
CHR: 166 UNICODE: 352
CHR: 168 UNICODE: 353
CHR: 180 UNICODE: 381
CHR: 184 UNICODE: 382
CHR: 188 UNICODE: 338
CHR: 189 UNICODE: 339
CHR: 190 UNICODE: 376
Program code (not using GDK at all):
NOTE: the statement "SELECT CHR(i using nchar_cs) FROM DUAL" just gave back the same numbers... WHY?
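// Assumes con is an open java.sql.Connection to the Oracle database.
try (Statement select = con.createStatement()) {
    for (int i = 0; i < 256; i++) {
        // CHR(i) returns the character with code i in the database charset;
        // the JDBC driver converts it to a Java (UTF-16) String.
        try (ResultSet result = select.executeQuery("select CHR(" + i + ") from DUAL")) {
            while (result.next()) {
                String s = result.getString(1);
                if (s == null || s.isEmpty())
                    continue; // nothing to decode for this code
                int unicodePoint = s.codePointAt(0);
                // Report only the codes that do not map to the same Unicode point
                if (unicodePoint != i)
                    System.out.println("CHR: " + i + "\tUNICODE: " + unicodePoint);
            }
        }
    }
}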
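For comparison, the same table can be built offline without a database. The sketch below is in Python and assumes that Python's "iso8859_15" codec behaves like the Oracle charset in question, which is exactly the correspondence the question is unsure about, so treat it as a cross-check rather than a replacement for the JDBC query. It also separates control characters from printable ones using the Unicode general category.

```python
import unicodedata

# Assumption: Python's "iso8859_15" codec stands in for the Oracle charset.
CODEC = "iso8859_15"

# Map every byte value 0..255 to the Unicode code point it decodes to.
table = {i: ord(bytes([i]).decode(CODEC)) for i in range(256)}

# Byte values whose code point differs from the byte value itself,
# i.e. the positions where the charset departs from ISO-8859-1.
diffs = {i: cp for i, cp in table.items() if cp != i}

# Control characters have the Unicode general category "Cc".
controls = [i for i, cp in table.items()
            if unicodedata.category(chr(cp)) == "Cc"]

for i, cp in sorted(diffs.items()):
    print(f"CHR: {i}\tUNICODE: {cp}\t{unicodedata.name(chr(cp))}")
```

If the list printed here matches the list returned by the database, that is some evidence (not proof) that the two mappings agree.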

Related

Different Code 128 barcode symbols representing the same data

I'm currently using software called LineView. It generates downtime reason codes for our factory lines. An operator scans the barcodes with an RS232 scanner and it goes into our XL board system.
The software itself generates the barcodes within an internet browser, but I am trying to make it so our own labeling machine can also print out the barcodes. However, the barcodes that are produced by the labeler (and the many online barcode generators I've tried) look longer and do not work.
The data for the example 128 barcode that I am trying to replicate is [SOH]1[STX]65;1067[ETX].
According to the manual:
[SOH] - The Start of Header character (ASCII 0x01) starts the XL Command packet.
1 - The Serial Address of the XL device (the default is 1).
[STX] - The Start of Transmission character (ASCII 0x02) marks the start of the actual command.
65; - The ID of the Production State > Set Reason Code command.
The Reason Code ID (which can range from 1 to 999 for system reasons or 1000 to 1999 for user-defined reasons). In my case it is 1067.
[ETX] - The End of Transmission character (ASCII 0x03) ends the XL Command packet.
I have attached pictures of what LineView produces (which is what I want it to look like) and what our labeller currently prints.
When I scan them they both come up with the [SOH]1[STX]65;1067[ETX] code despite them looking different.
Any help with this would be very much appreciated.
Your intended barcode is constructed internally using the following series of Code 128 codewords which correctly represent the ASCII control characters:
103 Start-in-Mode-A (Upper-case and control characters)
65 [SOH] (ASCII 1)
17 1
66 [STX] (ASCII 2)
22 6
21 5
27 ;
99 Switch-to-Mode-C (Double-density numeric)
10 10
67 67
101 Switch-to-Mode-A
67 [ETX] (ASCII 3)
67 Check-digit
106 Stop
Your label printer is printing a barcode representing the literal string [SOH]1[STX]65;1067[ETX] with no ASCII control characters (i.e. left-bracket, S, O, H, right-bracket, ...) using the following internal codewords:
104 Start-in-Mode-B (Mixed-case)
59 [
51 S
47 O
40 H
61 ]
17 1
59 [
51 S
52 T
56 X
61 ]
22 6
21 5
27 ;
99 Switch-to-Mode-C (Double-density numeric)
10 10
67 67
100 Switch-to-Mode-B
59 [
37 E
52 T
56 X
61 ]
57 Check-digit
106 Stop
So you need to work out how to correctly specify ASCII control characters in the input to your labelling machine.
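The two codeword sequences above can be checked by recomputing the Code 128 check character: the start codeword plus each following codeword multiplied by its position, modulo 103. The Python sketch below does that and also shows the difference between the real 11-character payload (with control characters) and the 23-character literal string. The function and variable names are made up for illustration; how your labelling machine accepts control characters is not covered here.

```python
def code128_check(codewords):
    """Check character for a start codeword followed by data codewords
    (the list must not include the check character or the stop)."""
    start, data = codewords[0], codewords[1:]
    total = start + sum(pos * value for pos, value in enumerate(data, start=1))
    return total % 103

# Codewords listed above for the intended barcode (starts in set A).
intended = [103, 65, 17, 66, 22, 21, 27, 99, 10, 67, 101, 67]
# Codewords listed above for the barcode the label printer produced (set B).
literal = [104, 59, 51, 47, 40, 61, 17, 59, 51, 52, 56, 61, 22, 21, 27,
           99, 10, 67, 100, 59, 37, 52, 56, 61]

print(code128_check(intended))  # 67
print(code128_check(literal))   # 57

# The payload the scanner should deliver: real control characters.
payload = "\x01" + "1" + "\x02" + "65;1067" + "\x03"
# What the label printer actually encoded: brackets and letters.
as_text = "[SOH]1[STX]65;1067[ETX]"
print(len(payload), len(as_text))  # 11 23
```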

Encrypt and Decrypt AES with Golang and Ruby

I'm working on making two secure systems talk via a common encryption scheme. I picked AES as it seems a secure standard, but I'm not married to it, so long as I have two way encryption.
Here is the Go source and Ruby source, simplified down to a really clear example that you can run from the command line to see the differences. I'm outputting the raw byte values for easier literal comparison.
I'm using 128-bit CFB in both, and neither of them appears to use padding. Any help is greatly appreciated!
You passed the wrong key size in the Ruby code. It should be 192, because key.size is 24 bytes == 192 bits.
cipher = OpenSSL::Cipher::AES.new(192, :CFB)
cipher.encrypt
cipher.key = key
cipher.iv = iv
encrypted = cipher.update(input) + cipher.final()
puts "Output: [" + encrypted.bytes.join(" ") + "]"
output:
Output: [155 79 127 80 31 163 142 111 13 211 221 163 219 248]
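The AES variant follows from the key length: 16 bytes means AES-128, 24 bytes AES-192 and 32 bytes AES-256. A small Python sketch of that check is below. The helper name and the placeholder key are hypothetical, since the actual key from the question is not shown here.

```python
def aes_bits_for_key(key: bytes) -> int:
    """Return the AES key size in bits implied by the key length."""
    bits = len(key) * 8
    if bits not in (128, 192, 256):
        raise ValueError(f"{len(key)}-byte key is not a valid AES key")
    return bits

# Placeholder 24-byte key, standing in for the key used in the question.
example_key = b"0123456789abcdef01234567"
print(aes_bits_for_key(example_key))  # 192
```

Running a check like this on both sides before creating the cipher makes a mismatch such as 128 versus 192 visible immediately.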

ruby YAML parse bug with number

I have encountered what appears to be a bug with the YAML parser. Take this simple yaml file for example:
new account:
- FLEETBOSTON
- 011001742
If you parse it using this ruby line of code:
INPUT_DATA = YAML.load_file("test.yml")
Then I get this back:
{"new account"=>["FLEETBOSTON", 2360290]}
Am I doing something wrong? Because I'm pretty sure this is never supposed to happen.
It is supposed to happen. Numbers starting with 0 are in octal notation, unless the next character is x, in which case they're hexadecimal.
07 == 7
010 == 8
011 == 9
0x9 == 9
0xA == 10
0xF == 15
0x10 == 16
0x11 == 17
Go into irb and just type in 011001742.
1.9.2-p290 :001 > 011001742
=> 2360290
PEBKAC. :)
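The same arithmetic can be reproduced outside Ruby. This Python sketch only illustrates the base conversion; it does not use or imitate any YAML library.

```python
# Interpret the digits of the account number as base 8.
as_octal = int("011001742", 8)
print(as_octal)  # 2360290

# The smaller examples from the list above.
print(int("011", 8))    # 9
print(int("0x11", 16))  # 17
```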
Your value looks like a number, so it's treated as a number. If you want to make it explicitly a string, enclose it in quotes so that YAML will not try to make it a number.
new account:
- FLEETBOSTON
- '011001742'

Explain what those escaped numbers mean in unicode encoding in ruby 1.8.7

0186 is the Unicode "code". Where do 198 and 134 come from? How can I go the other way around, from these byte codes to Unicode strings?
>> c = JSON '["\\u0186"]'
[
[0] "Ɔ"
]
>> c[0][0]
198
>> c[0][1]
134
>> c[0][2]
nil
Another confusing thing is unpack. Another seemingly arbitrary number. Where does that come from? Is it even correct? From the 1.8.7 String#unpack documentation:
U | Integer | UTF-8 characters as unsigned integers
>> c[0].unpack('U')
[
[0] 390
]
>
You can find your answers here: Unicode Character 'LATIN CAPITAL LETTER OPEN O' (U+0186):
Note that 186 (hexadecimal) === 390 (decimal)
C/C++/Java source code : "\u0186"
UTF-32 (decimal) : 390
UTF-8 (hex) : 0xC6 0x86 (i.e. 198 134)
You can read more about UTF-8 encoding on Wikipedia's article on UTF-8.
UTF-8 (UCS Transformation Format, 8-bit) is a variable-width encoding that can represent every character in the Unicode character set. It was designed for backward compatibility with ASCII and to avoid the complications of endianness and byte order marks in UTF-16 and UTF-32.
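To see where 198 and 134 come from, here is a short Python sketch (not Ruby 1.8.7) of the two-byte UTF-8 form: the code point's top five bits go into a 110xxxxx lead byte and its low six bits into a 10xxxxxx continuation byte. The helper name is made up for illustration and handles only code points from U+0080 to U+07FF.

```python
def utf8_two_byte(code_point):
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    assert 0x80 <= code_point <= 0x7FF
    lead = 0b11000000 | (code_point >> 6)           # 110xxxxx: top 5 bits
    trail = 0b10000000 | (code_point & 0b00111111)  # 10xxxxxx: low 6 bits
    return [lead, trail]

manual = utf8_two_byte(0x0186)
builtin = list("\u0186".encode("utf-8"))
print(manual, builtin)  # [198, 134] [198, 134]

# Going the other way: from the bytes back to the code point.
back = ord(bytes([198, 134]).decode("utf-8"))
print(back)  # 390
```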

MATLAB: how to display UTF-8-encoded text read from file?

The gist of my question is this:
How can I display Unicode characters in Matlab's GUI (OS X) so that they are properly rendered?
Details:
I have a table of strings stored in a file, and some of these strings contain UTF-8-encoded Unicode characters. I have tried many different ways (too many to list here) to display the contents of this file in the MATLAB GUI, without success. For example:
>> fid = fopen('/Users/kj/mytable.txt', 'r', 'n', 'UTF-8');
>> [x, x, x, enc] = fopen(fid); enc
enc =
UTF-8
>> tbl = textscan(fid, '%s', 35, 'delimiter', ',');
>> tbl{1}{1}
ans =
ÎÎÎÎÎΠΣΦΩαβγδεζηθικλμνξÏÏÏÏÏÏÏÏÏÏ
>>
As it happens, if I paste the string directly into the MATLAB GUI, the pasted string is displayed properly, which shows that the GUI is not fundamentally incapable of displaying these characters. But once MATLAB reads the string in from the file, it no longer displays it correctly. For example:
>> pasted = 'ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω'
pasted =
>>
Thanks!
I present below my findings after doing some digging... Consider these test files:
a.txt
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
b.txt
தமிழ்
First, we read files:
%# open file in binary mode, and read a list of bytes
fid = fopen('a.txt', 'rb');
b = fread(fid, '*uint8')'; %'# read bytes
fclose(fid);
%# decode as unicode string
str = native2unicode(b,'UTF-8');
If you try to print the string, you get a bunch of nonsense:
>> str
str =
Nonetheless, str does hold the correct string. We can check the Unicode code point of each character; as you can see, they are outside the ASCII range (the last two are the non-printable CR-LF line ending):
>> double(str)
ans =
Columns 1 through 13
915 916 920 923 926 928 931 934 937 945 946 947 948
Columns 14 through 26
949 950 951 952 953 954 955 956 957 958 960 961 962
Columns 27 through 35
963 964 965 966 967 968 969 13 10
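The same decode-and-inspect step can be reproduced outside MATLAB. The Python sketch below is an illustration only, not MATLAB code. It shows that the UTF-8 bytes decode to the code points listed above, and that reading the same bytes as ISO-8859-1 gives two wrong characters per letter, which is one plausible source of the garbage seen earlier.

```python
text = "ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω"

raw = text.encode("utf-8")   # the bytes as they would be stored in the file
decoded = raw.decode("utf-8")
code_points = [ord(c) for c in decoded]
print(code_points[:3], code_points[-1])  # [915, 916, 920] 969

# Decoding the same bytes with the wrong charset: every two-byte letter
# turns into two unrelated ISO-8859-1 characters.
misread = raw.decode("iso8859_1")
print(len(text), len(misread))  # 33 66
```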
Unfortunately, MATLAB seems unable to display this Unicode string in a GUI on its own. For example, all these fail:
figure
text(0.1, 0.5, str, 'FontName','Arial Unicode MS')
title(str)
xlabel(str)
One trick I found is to use the embedded Java capability:
%# Java Swing
label = javax.swing.JLabel();
label.setFont( java.awt.Font('Arial Unicode MS',java.awt.Font.PLAIN, 30) );
label.setText(str);
f = javax.swing.JFrame('frame');
f.getContentPane().add(label);
f.pack();
f.setVisible(true);
As I was preparing to write the above, I found an alternative solution. We can use the DefaultCharacterSet undocumented feature and set the charset to UTF-8 (on my machine, it is ISO-8859-1 by default):
feature('DefaultCharacterSet','UTF-8');
Now with a proper font (you can change the font used in the Command Window from Preferences > Font), we can print the string in the prompt (note that DISP is still incapable of printing Unicode):
>> str
str =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
>> disp(str)
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπÏςστυφχψω
And to display it in a GUI, UICONTROL should work (under the hood, I think it is really a Java Swing component):
uicontrol('Style','text', 'String',str, ...
'Units','normalized', 'Position',[0 0 1 1], ...
'FontName','Arial Unicode MS', 'FontSize',30)
Unfortunately, TEXT, TITLE, XLABEL, etc. are still showing garbage.
As a side note: It is difficult to work with m-file sources containing Unicode characters in the MATLAB editor. I was using Notepad++, with files encoded as UTF-8 without BOM.
