PL/SQL Apply Same Functions More Than Once - oracle

There is an encoding problem in an existing Oracle database. From the Java side, I apply these replacements to fix it:
textToEscape = textToEscape.replace(/Ã¶/g, 'ö');
textToEscape = textToEscape.replace(/Ã§/g, 'ç');
textToEscape = textToEscape.replace(/Ã¼/g, 'ü');
textToEscape = textToEscape.replace(/ÅŸ/g, 'ş');
textToEscape = textToEscape.replace(/ÄŸ/g, 'ğ');
There is a procedure which retrieves data from the database. I want to write a function and apply that replace sequence inside it. I found this link:
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions134.htm
However, I want to apply consecutive replaces. How can I chain them?

You can use Oracle's CONVERT function to convert the data to the correct character set (one compatible with your Java charset) inside the database procedure itself.
That should handle all cases for you.

Assuming your database character set is AL32UTF8, the malformed characters that you see stem from converting an 8-bit character set encoding (presumably ISO-8859-9 [Turkish]) to Unicode in its UTF-8 representation twice. The second of these conversions was applied erroneously to the byte sequence that already constituted the valid UTF-8 representation of your data.
You can reverse this within the database using the utl_raw package. Say tab.col contains your data; the following statement rectifies it.
update tab set col = utl_raw.cast_to_varchar2 ( utl_raw.convert ( utl_raw.cast_to_raw ( col ), 'WE8ISO8859P9', 'AL32UTF8' ) );
The casts retag the type of the character data, which effectively allows operating on the underlying octet (byte) sequence. On this level, the erroneous UTF-8 mapping is inverted. Since the result is still a valid representation in the database character set, a simple re-cast delivers the result.
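The byte-level mechanics described above can be reproduced outside the database. The following is only a Python sketch, not the Oracle statement itself. The function names garble and repair are hypothetical, and the legacy code page is assumed to be Windows-1254, because the ÅŸ pattern in the question contains Ÿ, which exists in Windows-1254 but not in ISO-8859-9.

```python
# Assumption: the erroneous second conversion read the UTF-8 bytes as
# Windows-1254 (Turkish). Swap in another 8-bit codec if your data differs.
LEGACY = "cp1254"


def garble(text: str) -> str:
    """Reproduce the fault: valid UTF-8 bytes misread as 8-bit characters."""
    return text.encode("utf-8").decode(LEGACY)


def repair(garbled: str) -> str:
    """Invert the fault: recover the original bytes, then decode as UTF-8."""
    return garbled.encode(LEGACY).decode("utf-8")


if __name__ == "__main__":
    for sample in ("ö", "ç", "ü", "ş", "ğ"):
        broken = garble(sample)
        print(repr(broken), "->", repair(broken))
```

Like the utl_raw statement, repair handles every affected character in one pass, so no per-character replace list is needed.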

Related

How does GitHub encode their graphQL cursors?

GitHub's GraphQL cursors are intentionally opaque, so they shouldn't ever be decoded by a client. However, I'd like to know their approach to pagination, especially when combined with sorting.
GitHub's pagination cursors use multiple layers of encoding. I will list them in order from the perspective of a decoder:
The cursor string is encoded using URL safe base64 meaning it uses - and _ instead of + and /. This might be to have consistency with their REST based API.
Decoding the base64 string gives us another string in the format of cursor:v2:[something] so the next step is decoding the something.
The 'something' is a binary encoded piece of data containing the actual cursor properties. The first byte defines the cursor type:
0x91 => We don't use any sorting, the cursor contains the length of the id field and the id itself. 0xcd seems to indicate a two-byte id, 0xce a four-byte id. This is followed by the id itself, which can be verified by decoding the base64 id graphql field.
0x92 => A composite cursor containing the sorted property and the id. This is either a length-prefixed ordinal number or two bytes plus a string or ISO date string followed by the length-prefixed id.
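The layers listed above can be sketched in Python for the unsorted 0x91 case only. This is an illustration of the description, not GitHub's implementation. The function names are hypothetical, big-endian byte order for the id is an assumption, and the encoder exists only to build a synthetic cursor for a round trip.

```python
import base64

PREFIX = b"cursor:v2:"
# Width markers as described above: 0xcd -> two-byte id, 0xce -> four-byte id.
ID_WIDTHS = {0xCD: 2, 0xCE: 4}


def decode_simple_cursor(cursor: str) -> int:
    """Decode an unsorted (0x91) cursor and return the numeric id."""
    # Restore any stripped base64 padding before decoding.
    padded = cursor + "=" * (-len(cursor) % 4)
    raw = base64.urlsafe_b64decode(padded)
    if not raw.startswith(PREFIX):
        raise ValueError("unexpected cursor prefix")
    payload = raw[len(PREFIX):]
    if len(payload) < 2 or payload[0] != 0x91:
        raise ValueError("only the unsorted 0x91 cursor type is handled")
    width = ID_WIDTHS.get(payload[1])
    if width is None or len(payload) < 2 + width:
        raise ValueError("unsupported or truncated id field")
    # Assumption: the id bytes are big-endian.
    return int.from_bytes(payload[2:2 + width], "big")


def encode_simple_cursor(item_id: int) -> str:
    """Build a synthetic cursor in the same layout (for testing only)."""
    marker, width = (0xCD, 2) if item_id < 0x10000 else (0xCE, 4)
    payload = bytes([0x91, marker]) + item_id.to_bytes(width, "big")
    return base64.urlsafe_b64encode(PREFIX + payload).decode("ascii")


if __name__ == "__main__":
    synthetic = encode_simple_cursor(1234)
    print(synthetic, "->", decode_simple_cursor(synthetic))
```

The composite 0x92 type is left out because its layout is only loosely described above.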

How do I add unprintable ascii control characters to a VARCHAR2 in Oracle PL/SQL?

I am supporting a label printing system which uses PL/SQL and an ORACLE database to fill in values that are then encoded in various barcodes (among other things). For a specific 2 dimensional data matrix barcode I'm attempting to create, I need to include some unprintable characters as control characters for the system that is scanning the barcode.
The problem is that I have no idea how to encode those characters.
Example:
v_string VARCHAR2(1000);
...
v_string := '[)><RS>06<GS>12SC<GS>16S2'||'<GS>V'||:(IN:-SPLR-)||'<GS>3S'||v_serial||'<GS>P'||:(IN:-CUSTITEM-)||'<GS>Q'||v_boxqty||'<GS>1T'||:(IN:-LOT-)||'<GS>15K123456789123'||:(IN:-PRODSEQ-)||'<RS><EOT>';
Then v_string is passed as a parameter to a function that actually populates the barcode on the printed label. The problem is that every <GS>, <RS>, and <EOT> in the string are supposed to be control characters. I have the ascii decimal and hex values for those control characters, but no idea how to add them into the above string instead of the placeholders.
Any help would be appreciated.
You can use the chr() function to supply individual ASCII characters:
CHR returns the character having the binary equivalent to n as a VARCHAR2 value in either the database character set or, if you specify USING NCHAR_CS, the national character set.
(You can also use unistr() for Unicode characters, but that doesn't seem to be necessary in this case; but note the ASCII/EBCDIC message in the chr() document...)
For those control characters you can use chr(4), chr(29) and chr(30):
v_string := '[)>'||chr(30)||'06'||chr(29)||'12SC'||chr(29)||'16S2'||chr(29)||'V'||:(IN:-SPLR-)||chr(29)||'3S'||v_serial||chr(29)||'P'||:(IN:-CUSTITEM-)||chr(29)||'Q'||v_boxqty||chr(29)||'1T'||:(IN:-LOT-)||chr(29)||'15K123456789123'||:(IN:-PRODSEQ-)||chr(30)||chr(4);
db<>fiddle showing the generated string - the printable parts, anyway - and its dumped value, so you can see the 4/29/30 characters are actually there.
You could also build your string as you have it, then pass it through replace() to replace the <GS> etc. placeholders with the chr() values.
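The placeholder-substitution idea can be illustrated outside PL/SQL as well. The Python sketch below mirrors it under the assumption that the placeholders are exactly <GS>, <RS> and <EOT>. The names CONTROL_CHARS and expand_placeholders are hypothetical.

```python
# ASCII control characters named in the question: GS = 29, RS = 30, EOT = 4.
CONTROL_CHARS = {"<GS>": chr(29), "<RS>": chr(30), "<EOT>": chr(4)}


def expand_placeholders(template: str) -> str:
    """Replace each textual placeholder with its control character."""
    for placeholder, char in CONTROL_CHARS.items():
        template = template.replace(placeholder, char)
    return template


if __name__ == "__main__":
    label = expand_placeholders("[)><RS>06<GS>12SC<RS><EOT>")
    # repr() makes the otherwise invisible control characters visible.
    print(repr(label))
```

In PL/SQL the same effect comes from nesting replace() calls, one per placeholder.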
If you have the ASCII value, you can use the CHR function to construct your string, as in:
v_string:='xx'||chr(10).....

Oracle convert from utf-16 hex to utf-8 character

My database character set is AL32UTF8 and the national character set is AL16UTF16. I need to store in a table the numeric values of some characters according to the database character set, and later display a specific character using its numeric value. I had some problems understanding how this encoding works (the differences between the unistr, chr and ascii functions, and so on), but eventually I found a website where the following code was used:
chr(ascii(convert(unistr(hex), AL32UTF8)))
It works fine when the hex code is smaller than 1000, but when I use, for example:
chr(ascii(convert(unistr('\1555'), AL32UTF8)))
chr(ascii(convert(unistr('\1556'), AL32UTF8)))
it returns the same ascii value (ascii(convert(unistr('\hex >= 1000'), AL32UTF8))). Could anyone look at this and try to explain what's the reason? I really thought I understood how it works, but now I'm confused a bit.

characters from notepad file getting converted into special characters while reading using utl_file.get_line procedure

I have written a program to read data from a text file and load it into a table using UTL_FILE package in oracle. While reading a few lines some characters are getting converted into special characters, for example:
string in file = 63268982_GHC –EXH PALOMARES EVA
value entered into database = 63268982_GHC âEXH PALOMARES EVA
I tried using Convert function but it did not achieve anything.
My Oracle version is 11gR2 and it's using the nls charset WE8ISO8859P1. Because these strings represent physical file names I get a mismatch when I try to match with the filename.
I tried re-converting the value stored in Oracle in WE charset back to ascii like below:
convert('63268989_GHC âEXH PALOMARES','us7ascii','WE8ISO8859P1')
but the outcome is different from what was there in text file while reading. Can anyone please suggest how this problem can be overcome.
The – character in the file is not a regular hyphen (-, chr(45)) but an en dash (U+2013) stored as three bytes: decimal 226, 128, 147, or hex e2, 80, 93. Interpreted individually rather than as a single multibyte character, these bytes correspond in WE8ISO8859P1 to â followed by two non-printing control characters, which matches the value stored in the database.
Try opening the file with utl_file.fopen_nchar and reading lines with utl_file.get_line_nchar.
Oracle 11gR2 Database Globalization Support Guide: Programming with Unicode.
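The byte values quoted in this answer can be checked with a short Python sketch. It assumes the file is UTF-8 encoded and uses Python's latin-1 codec as a stand-in for WE8ISO8859P1. The function names are hypothetical.

```python
EN_DASH = "\u2013"


def misread_utf8_as_latin1(text: str) -> str:
    """Simulate reading UTF-8 bytes one byte per character."""
    return text.encode("utf-8").decode("latin-1")


def undo_misread(garbled: str) -> str:
    """Recover the original text from the byte-per-character reading."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    print(list(EN_DASH.encode("utf-8")))          # the three UTF-8 bytes
    print(repr(misread_utf8_as_latin1(EN_DASH)))  # 'â' plus two control chars
```

Only the first of the three misread characters is printable, which is why the stored value appears to contain a lone â.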

PLSQL - convert UTF-8 NVARCHAR2 to VARCHAR2

I have a table with a column configured as NVARCHAR2, and I'm able to save the string in UTF-8 without any issues.
But the application that reads the value does not fully support UTF-8.
This means that the string is passed to the database and back only after it has been converted into HTML letter codes. Each letter in the string is converted to such an HTML code.
I'm looking for an easier solution.
I've considered converting it to BASE64, but it contains various characters which are considered illegal in the application.
In addition, I tried using HEXTORAW & RAWTOHEX.
None of the above helped.
If the column contains 'κόσμε' I need to find a way to convert/encode it to something else, but the decode should be possible to do from the HTML running the application.
Try the ASCIISTR function. It converts the string to something similar to how JSON encodes Unicode strings (effectively the same, except that "\" is used instead of "\u"). When you receive the value back from the front end, use UNISTR to convert it back to Unicode.
ASCIISTR: https://docs.oracle.com/cd/B28359_01/server.111/b28286/functions006.htm
UNISTR: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions204.htm
SELECT ASCIISTR(N'κόσμε') FROM DUAL;
SELECT UNISTR('\03BA\1F79\03C3\03BC\03B5') FROM DUAL;
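If the front end has to decode the escaped form itself, the \xxxx format is easy to handle. The Python sketch below is a rough analog of the two functions, not Oracle's implementation. The names asciistr_like and unistr_like are hypothetical, and escaping the backslash as \005C is an assumption about how the round trip should stay unambiguous.

```python
import re


def asciistr_like(text: str) -> str:
    """Escape every non-ASCII UTF-16 code unit (and the backslash) as \\XXXX."""
    data = text.encode("utf-16-be")
    out = []
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        if unit < 0x80 and unit != 0x5C:
            out.append(chr(unit))
        else:
            out.append("\\%04X" % unit)
    return "".join(out)


def unistr_like(escaped: str) -> str:
    """Turn \\XXXX escapes back into characters; other text passes through."""
    data = bytearray()
    for match in re.finditer(r"\\([0-9A-Fa-f]{4})|(.)", escaped, re.DOTALL):
        if match.group(1):
            data += int(match.group(1), 16).to_bytes(2, "big")
        else:
            data += match.group(2).encode("utf-16-be")
    return data.decode("utf-16-be")


if __name__ == "__main__":
    original = "\u03ba\u1f79\u03c3\u03bc\u03b5"
    escaped = asciistr_like(original)
    print(escaped)
    print(unistr_like(escaped) == original)
```

Working on UTF-16 code units keeps characters outside the Basic Multilingual Plane intact, since each one is written as two escapes.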

Resources