How to convert String to BLOB in ESQL? - ibm-integration-bus

It should be as simple as
SET OutputRoot.BLOB.BLOB = CAST(MYSTRING AS BLOB);
But when I do that, IIB throws an error:
An attempt was made to cast the character string ''ABC'' to a byte string, but the string was of the wrong format. There must be an even number of hexadecimal digits (0-9, a-f, A-F).

As you figured out, the syntax of the CAST function you need here is
CAST( <source_expression> AS <DataType> CCSID <expression> )
so in your code it is
CAST( MYSTRING AS BLOB CCSID 1208 )
The CCSID parameter is used only for conversions to or from one of the string data types. Use the CCSID parameter to specify the code page of the source or target string. [Source]
So with the coded character set identifier (CCSID) you define the code page. For example, 1208 is the CCSID for UTF-8 with IBM PUA. You can see a list of IBM's CCSIDs here.
If you want more detailed information on this topic, you can check the IIB documentation for Version 9.0.0 or Version 10.0.0.
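As an analogy only (this is Python, not ESQL), the difference between the two casts can be sketched like this. The function names are made up for illustration: one parses the text as hexadecimal digits, which is what the error message above suggests the cast without CCSID does, and the other encodes the characters in a code page, with UTF-8 standing in for CCSID 1208.

```python
def cast_as_hex(text: str) -> bytes:
    """Interpret the text as pairs of hex digits (hypothetical helper)."""
    return bytes.fromhex(text)


def cast_with_encoding(text: str, encoding: str = "utf-8") -> bytes:
    """Encode the characters themselves in the given code page (hypothetical helper)."""
    return text.encode(encoding)


# Encoding the characters works for any text.
print(cast_with_encoding("ABC"))

# Hex parsing only works when the text is an even number of hex digits.
print(cast_as_hex("414243"))

# 'ABC' has an odd number of hex digits, so hex parsing is rejected.
try:
    cast_as_hex("ABC")
except ValueError as err:
    print("hex cast failed:", err)
```

Both successful calls produce the same three bytes, which is why the two forms are easy to confuse.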

In my case I needed to change it to AS BLOB CCSID 1208
I need to read up on what CCSID means now.

Related

AWS SAM throws UnicodeEncodeError when invoking NodeJS 12.x lambda function [duplicate]

What could be causing this error when I try to insert a foreign character into the database?
>>UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)
And how do I resolve it?
Thanks!
I ran into this same issue when using the Python MySQLdb module. Since MySQL will let you store just about any binary data you want in a text field regardless of character set, I found my solution here:
Using UTF8 with Python MySQLdb
Edit: Quote from the above URL to satisfy the request in the first comment...
"UnicodeEncodeError:'latin-1' codec can't encode character ..."
This is because MySQLdb normally tries to encode everything to latin-1.
This can be fixed by executing the following commands right after
you've established the connection:
db.set_character_set('utf8')
dbc.execute('SET NAMES utf8;')
dbc.execute('SET CHARACTER SET utf8;')
dbc.execute('SET character_set_connection=utf8;')
"db" is the result of MySQLdb.connect(), and "dbc" is the result of
db.cursor().
Character U+201C Left Double Quotation Mark is not present in the Latin-1 (ISO-8859-1) encoding.
It is present in code page 1252 (Western European). This is a Windows-specific encoding that is based on ISO-8859-1 but which puts extra characters into the range 0x80-0x9F. Code page 1252 is often confused with ISO-8859-1, and it's an annoying but now-standard web browser behaviour that if you serve your pages as ISO-8859-1, the browser will treat them as cp1252 instead. However, they really are two distinct encodings:
>>> u'He said \u201CHello\u201D'.encode('iso-8859-1')
UnicodeEncodeError
>>> u'He said \u201CHello\u201D'.encode('cp1252')
'He said \x93Hello\x94'
If you are using your database only as a byte store, you can use cp1252 to encode “ and other characters present in the Windows Western code page. But still other Unicode characters which are not present in cp1252 will cause errors.
You can use encode(..., 'ignore') to suppress the errors by getting rid of the characters, but really in this century you should be using UTF-8 in both your database and your pages. This encoding allows any character to be used. You should also ideally tell MySQL you are using UTF-8 strings (by setting the database connection and the collation on string columns), so it can get case-insensitive comparison and sorting right.
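For readers on Python 3, here is a minimal sketch of the same comparison. The helper name try_encode is made up for illustration; it only wraps str.encode so that the three encodings can be compared side by side.

```python
def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


quote = "He said \u201cHello\u201d"

# ISO-8859-1 has no curly quotes, cp1252 and UTF-8 do.
for name in ("iso-8859-1", "cp1252", "utf-8"):
    print(name, try_encode(quote, name))

# The 'ignore' error handler silently drops what it cannot encode.
print(quote.encode("iso-8859-1", "ignore"))
```

Note that 'ignore' loses the quotation marks entirely, which is why UTF-8 is the safer choice.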
The best solution is to:
set MySQL's charset to UTF-8
follow this comment (add use_unicode=True and charset="utf8"):
db = MySQLdb.connect(host="localhost", user="root", passwd="", db="testdb", use_unicode=True, charset="utf8")
(comment by KyungHoon Kim, Mar 13 '14 at 17:04)
For details, see the docstring of the MySQLdb Connection class:
class Connection(_mysql.connection):
    """MySQL Database Connection Object"""
    default_cursor = cursors.Cursor
    def __init__(self, *args, **kwargs):
        """
        Create a connection to the database. It is strongly recommended
        that you only use keyword parameters. Consult the MySQL C API
        documentation for more information.
        host
          string, host to connect
        user
          string, user to connect as
        passwd
          string, password to use
        db
          string, database to use
        port
          integer, TCP/IP port to connect to
        unix_socket
          string, location of unix_socket to use
        conv
          conversion dictionary, see MySQLdb.converters
        connect_timeout
          number of seconds to wait before the connection attempt
          fails.
        compress
          if set, compression is enabled
        named_pipe
          if set, a named pipe is used to connect (Windows only)
        init_command
          command which is run once the connection is created
        read_default_file
          file from which default client values are read
        read_default_group
          configuration group to use from the default file
        cursorclass
          class object, used to create cursors (keyword only)
        use_unicode
          If True, text-like columns are returned as unicode objects
          using the connection's character set. Otherwise, text-like
          columns are returned as strings. columns are returned as
          normal strings. Unicode objects will always be encoded to
          the connection's character set regardless of this setting.
        charset
          If supplied, the connection character set will be changed
          to this character set (MySQL-4.1 and newer). This implies
          use_unicode=True.
        sql_mode
          If supplied, the session SQL mode will be changed to this
          setting (MySQL-4.1 and newer). For more details and legal
          values, see the MySQL documentation.
        client_flag
          integer, flags to use or 0
          (see MySQL docs or constants/CLIENTS.py)
        ssl
          dictionary or mapping, contains SSL connection parameters;
          see the MySQL documentation for more details
          (mysql_ssl_set()). If this is set, and the client does not
          support SSL, NotSupportedError will be raised.
        local_infile
          integer, non-zero enables LOAD LOCAL INFILE; zero disables
        autocommit
          If False (default), autocommit is disabled.
          If True, autocommit is enabled.
          If None, autocommit isn't set and server default is used.
        There are a number of undocumented, non-standard methods. See the
        documentation for the MySQL C API for some hints on what they do.
        """
I hope your database is at least UTF-8. Then you will need to run yourstring.encode('utf-8') before you try putting it into the database.
Use the snippet below to strip accents (combining marks) from the text, leaving only the base letters:
import unicodedata

def strip_accents(text):
    # Decompose each character, then drop the combining marks (category 'Mn').
    return "".join(char for char in
                   unicodedata.normalize('NFKD', text)
                   if unicodedata.category(char) != 'Mn')

strip_accents('áéíñóúü')
Output:
'aeinouu'
You are trying to store a Unicode codepoint \u201c using an encoding ISO-8859-1 / Latin-1 that can't describe that codepoint. Either you might need to alter the database to use utf-8, and store the string data using an appropriate encoding, or you might want to sanitise your inputs prior to storing the content; i.e. using something like Sam Ruby's excellent i18n guide. That talks about the issues that windows-1252 can cause, and suggests how to process it, plus links to sample code!
SQLAlchemy users can simply specify their field as convert_unicode=True.
Example:
sqlalchemy.String(1000, convert_unicode=True)
SQLAlchemy will simply accept unicode objects and return them back, handling the encoding itself.
Docs
Latin-1 (aka ISO 8859-1) is a single octet character encoding scheme, and you can't fit \u201c (“) into a byte.
Did you mean to use UTF-8 encoding?
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013' in position 106: ordinal not in range(256)
Solution 1:
\u2013 - look up the character to identify which one is actually causing this error (U+2013 is an en dash). Then you can replace that specific character in the string with another character that is part of the encoding you are using.
Solution 2:
Change the string's encoding to one that includes all the characters in your string. Then you can print the string and it will work fine.
The code below changes the encoding of the string (borrowed from @bobince):
u'He said \u201CHello\u201D'.encode('cp1252')
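Solution 1 could be sketched roughly as follows. The replacement table and the function name are assumptions chosen for illustration; extend the table with whatever characters actually show up in your data.

```python
# Hypothetical fallback table: typographic punctuation mapped to Latin-1-safe ASCII.
PUNCTUATION_FALLBACKS = str.maketrans({
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def to_latin1_safe(text: str) -> str:
    """Replace the listed characters so the text can be encoded as latin-1."""
    return text.translate(PUNCTUATION_FALLBACKS)


cleaned = to_latin1_safe("pages 10\u201320, \u201cquoted\u201d")
print(cleaned.encode("latin-1"))
```

Characters that are not in the table are left untouched, so encoding can still fail for other non-Latin-1 input.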
The latest version of mysql.connector has only
db.set_charset_collation('utf8', 'utf8_general_ci')
and NOT
db.set_character_set('utf8')  # this method is not available
I ran into the same problem when I was using PyMySQL. I checked the package version; it was 0.7.9.
I then uninstalled it and installed PyMySQL 1.0.2, and the issue was solved.
pip uninstall PyMySQL
pip install PyMySQL
Python: You will need to add
# -*- coding: UTF-8 -*-
to the first line of the Python file, and then apply the following to the text you want to encode: .encode('ascii', 'xmlcharrefreplace'). This replaces every non-ASCII character with its XML numeric character reference (for example, &#8220; for U+201C).
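A short sketch of what that error handler produces in Python 3 (the wrapper function name is made up for illustration):

```python
def to_ascii_with_char_refs(text: str) -> bytes:
    """Encode to ASCII, turning non-ASCII characters into &#NNNN; references."""
    return text.encode("ascii", "xmlcharrefreplace")


print(to_ascii_with_char_refs("He said \u201cHello\u201d"))
# prints b'He said &#8220;Hello&#8221;'
```

The result is only meaningful where those references are interpreted, such as HTML or XML output.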

MSG Clarification on PidTagInternetCodePage, PidTagMessageCodepage, PidTagStoreSupportMask

The official documentation for the MSG format states
PidTagStoreSupportMask
indicates whether string properties within the .msg file are Unicode-encoded or not. STORE_UNICODE_OK Set if the string properties are Unicode-encoded.
PidTagMessageCodepage
specifies the code page used to encode the non-Unicode string properties on this Message object
PidTagInternetCodepage
indicates the code page used for the PidTagBody property or the PidTagBodyHtml property
Based on the above, my understanding is that if the Unicode mask is set, then all string properties are Unicode-encoded, i.e. UTF-16LE.
If the mask is not set, then PidTagMessageCodepage is used to decode all string properties in the message.
Based on the documentation, non-Unicode and Unicode properties cannot exist together.
So, what is the purpose of PidTagInternetCodepage? It is used to decode the body or the HTML body, which have type PtypString.
If a message has the unicode storemask then
Q1. Do we decode the PidTagBody/PidTagBodyHtml using unicode or PidTagInternetCodepage ?
If a message is non-unicode then
Q2. Do we decode PidTagBody/PidTagBodyHtml using PidTagMessageCodepage or PidTagInternetCodepage ?
Q3. Do we use Unicode when the store mask is set, and when it is not, first attempt PidTagInternetCodepage and then PidTagMessageCodepage for PidTagBody/PidTagBodyHtml?
Q4. What do we do if none are present? Default to 1252?
PR_BODY is not different from any other string property (such as PR_SUBJECT) - it comes in both PT_STRING8 and PT_UNICODE flavors.
PR_HTML, on the other hand, is PT_BINARY and it stores the data in a binary byte blob. Most HTML bodies includes the charset as a part of the HTML headers, but if it is not present, you will need to use PR_INTERNET_CODEPAGE.
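One way to read this answer is sketched below in Python. All function names, the small code page table, and the charset regex are assumptions made for illustration; this is not an MSG parser, and it does not settle what to do when no code page property is present.

```python
import re

# Hypothetical mapping for code pages whose Python codec name is not "cp<N>".
_SPECIAL_CODEPAGES = {65001: "utf-8", 20127: "ascii", 28591: "iso-8859-1"}

# Rough pattern for a charset declaration inside the HTML bytes.
_CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)


def codepage_to_codec(codepage: int) -> str:
    """Translate a Windows code page number into a Python codec name."""
    return _SPECIAL_CODEPAGES.get(codepage, f"cp{codepage}")


def decode_string_property(raw: bytes, is_unicode: bool, message_codepage: int) -> str:
    """PT_UNICODE values are UTF-16LE; PT_STRING8 values use the message code page."""
    if is_unicode:
        return raw.decode("utf-16-le")
    return raw.decode(codepage_to_codec(message_codepage))


def decode_html_body(raw: bytes, internet_codepage: int) -> str:
    """Prefer the charset declared in the HTML, else fall back to the internet code page."""
    match = _CHARSET_RE.search(raw)
    codec = match.group(1).decode("ascii") if match else codepage_to_codec(internet_codepage)
    return raw.decode(codec)


print(decode_string_property("Subject".encode("utf-16-le"), True, 1252))
print(decode_html_body(b"<p>\x93Hi\x94</p>", 1252))
```

A declared charset that Python does not know raises LookupError here; real code would need to catch that and fall back.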

PLSQL - convert UTF-8 NVARCHAR2 to VARCHAR2

I have a table with a column configured as NVARCHAR2, I'm able save the string in UTF-8 without any issues.
But the application that calls the value does not fully support UTF-8.
This means that the string is converted into HTML character codes before it is passed to the database and back. Each letter in the string is converted to such an HTML code.
I'm looking for an easier solution.
I've considered converting it to BASE64, but it contains various characters which are considered illegal in the application.
In addition, I tried using HEXTORAW and RAWTOHEX.
None of the above helped.
If the column contains 'κόσμε' I need to find a way to convert/encode it to something else, but the decode should be possible to do from the HTML running the application.
Try the ASCIISTR function. It converts the string into something similar to how JSON encodes Unicode strings (it's actually the same, except that "\" is used instead of "\u"). Then, when you receive it back from the front end, use UNISTR to convert it back to Unicode.
ASCIISTR: https://docs.oracle.com/cd/B28359_01/server.111/b28286/functions006.htm
UNISTR: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions204.htm
SELECT ASCIISTR(N'κόσμε') FROM DUAL;
SELECT UNISTR('\03BA\1F79\03C3\03BC\03B5') FROM DUAL;
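If the application side needs to produce or read the same escaped form, the round trip can be approximated in Python. This is only a sketch of the \XXXX format shown above, not Oracle's implementation; the function names mirror the SQL functions for readability, and escaping the backslash itself is this sketch's choice to keep the round trip unambiguous.

```python
import re


def asciistr(text: str) -> str:
    """Escape non-ASCII characters (and backslash) as \\XXXX UTF-16 code units."""
    pieces = []
    for ch in text:
        if ch != "\\" and ord(ch) < 128:
            pieces.append(ch)
            continue
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            pieces.append("\\%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(pieces)


def unistr(escaped: str) -> str:
    """Turn \\XXXX escapes back into characters, rejoining surrogate pairs."""
    raw = re.sub(r"\\([0-9A-Fa-f]{4})", lambda m: chr(int(m.group(1), 16)), escaped)
    return raw.encode("utf-16-be", "surrogatepass").decode("utf-16-be")


word = "\u03ba\u1f79\u03c3\u03bc\u03b5"
escaped = asciistr(word)
print(escaped)
print(unistr(escaped) == word)
```

The escaped string contains only ASCII characters, so it can pass through an application that does not handle UTF-8.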

PL/SQL Apply Same Functions More Than Once

There is an encoding problem at existing Oracle database. From Java side, I apply these and fix it:
textToEscape = textToEscape.replace(/Ã¶/g, 'ö');
textToEscape = textToEscape.replace(/Ã§/g, 'ç');
textToEscape = textToEscape.replace(/Ã¼/g, 'ü');
textToEscape = textToEscape.replace(/ÅŸ/g, 'ş');
textToEscape = textToEscape.replace(/ÄŸ/g, 'ğ');
There is a procedure which retrieves data from database. I want to write a function and apply that replace sequence inside it. I found that link:
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions134.htm
However, I want to apply consecutive replaces. How can I chain them?
You can use the Oracle CONVERT function to convert the data into the correct character set (compatible with your Java charset) inside the database procedure itself.
That should handle all cases for you.
Assuming your database character set is AL32UTF8, the malformed characters that you see stem from a repeated conversion of an 8-bit character set encoding (presumably ISO-8859-9 [Turkish]) to Unicode in the UTF-8 representation. The second of these conversions, of course, has been applied erroneously to the byte sequence that constituted the valid UTF-8 representation of your data.
You can reverse this within the database using the utl_raw package. Say tab.col contains your data, the following statement rectifies it.
update tab set col = utl_raw.cast_to_varchar2 ( utl_raw.convert ( utl_raw.cast_to_raw ( col ), 'WE8ISO8859P9', 'AL32UTF8' ) );
The casts retag the type of the character data, which effectively allows operating on the underlying octet (byte) sequence. On this level, the erroneous UTF-8 mapping is inverted. Since the result is still a valid representation in the database character set, a simple re-cast delivers the result.
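The same inversion can be tried outside the database to check the assumption about the legacy character set. This Python sketch uses made-up function names and assumes ISO-8859-9 was the 8-bit encoding involved; if the data went through a different code page, the repair step will fail or give wrong letters.

```python
def double_encode(text: str, legacy: str = "iso-8859-9") -> str:
    """Reproduce the damage: UTF-8 bytes misread as a legacy 8-bit encoding."""
    return text.encode("utf-8").decode(legacy)


def repair(text: str, legacy: str = "iso-8859-9") -> str:
    """Invert the damage: recover the original bytes, then read them as UTF-8."""
    return text.encode(legacy).decode("utf-8")


original = "\u00f6\u00e7\u00fc\u015f\u011f"  # the letters ö ç ü ş ğ
damaged = double_encode(original)
print(damaged)
print(repair(damaged) == original)
```

Unlike a chain of single-character replacements, this handles every character that went through the same double conversion.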

SNMP4J - OID Output Options - Hex-STRING as STRING

I'm using SNMP4J to read info of devices with SNMP. Now I found some devices which represent the system name (OID iso.3.6.1.2.1.1.5.0) as a Hex-STRING instead of a STRING.
To show the system name I use the following code:
Variable var = response.getVariable(new OID(".1.3.6.1.2.1.1.5.0"));
System.out.println(var.toString());
Where response is a PDU object.
If the system name is represented as a STRING value, this goes as I expected. When it is represented as a Hex-STRING, it just prints the Hex value.
Example:
Take the name of the system as "SYSTEM NAME".
With STRING it prints "SYSTEM NAME".
With Hex-STRING it prints "53:59:53:54:45:4d:20:4e:41:4d:45"
Now with snmpwalk in command line I can just use the -Oa flag. This makes all Hex-STRING values show as STRING. Is it possible to use this flag in SNMP4J or is there a similar option?
I'm not sure where you're getting the term "Hex-STRING" from. SNMP does not define such a data type. I suggest you read through the relevant RFC documents, they are publicly available from IETF. The wikipedia article for SNMP (http://en.wikipedia.org/wiki/Simple_Network_Management_Protocol#References) has an excellent reference list, you can start with browsing the ones marked as "STD".
In SNMP, all strings are subtypes (or in a different word, "restrictions") of OCTET-STRING, a byte string of indeterminate length. It may contain any data, even non-printable stuff, representing a jpeg image or whatever.
Some textual-conventions have been defined, which restrict the data to some specific byte range, or length. A DisplayString is defined to only contain bytes from the NVT ASCII character set, so the user may trust it to be printable.
In fact, sysName is defined to be a DisplayString with a max length of 255 characters.
sysName OBJECT-TYPE
SYNTAX DisplayString (SIZE (0..255))
Since a good SNMP manager is aware of RFC1213-MIB, which defines both sysName and DisplayString, the manager should assume that the data received is printable ASCII characters.
When you say "When it is represented as a Hex-STRING", what do you mean? "Represented" where, on the agent or in your Java code or when using the net-snmp "snmpwalk" command?
The var.toString() call should convert the contents of the variable into something that could be safely printed in a terminal, so it's possible that SNMP4J is converting any binary string to a hex string.
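If all you have is the colon-separated hex text, it can be turned back into a printable string after the fact. The Python sketch below is a post-processing heuristic with a made-up function name, not an SNMP4J option: it only converts values that look like colon-separated byte pairs and decode to printable ASCII.

```python
import re

# Matches values such as "53:59:53" (at least two colon-separated hex bytes).
_HEX_PAIRS = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2})+")


def hex_string_to_text(value: str) -> str:
    """Decode a colon-separated hex string to ASCII text, else return it unchanged."""
    if not _HEX_PAIRS.fullmatch(value):
        return value
    try:
        text = bytes.fromhex(value.replace(":", "")).decode("ascii")
    except ValueError:
        # Covers non-ASCII bytes as well (UnicodeDecodeError is a ValueError).
        return value
    return text if text.isprintable() else value


print(hex_string_to_text("53:59:53:54:45:4d:20:4e:41:4d:45"))
# prints SYSTEM NAME
```

Values that contain non-printable bytes are returned as they came in, so genuinely binary data stays in hex form.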

Resources