I am using ADODB with VB6 to select data from Excel. The "Book_Title" column in Excel contains extended ASCII characters (such as Ă, A with breve).
But when using the following code, I only get "A" instead of Ă.
sConn = "DRIVER=Microsoft Excel Driver (*.xls);" & "DBQ=D:\sheik\metadata.xls"
rs.Open "SELECT [Book_Title], [Author_Title] FROM [Sheet1$], sConn
The problem here is that the Excel ODBC driver converts the strings to ANSI for some reason, and some "clever" code in that conversion maps the Ă character (code 258) to A (65).
If you have the JET drivers with the Excel ISAM driver installed, then the following connection string will use them instead:
sConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\sheik\metadata.xls;Extended Properties=""Excel 8.0;"""
You will now get back the unconverted strings. However, you probably will not be able to see them correctly in any of the built-in VB controls, or in the IDE for that matter, because it is unlikely that this character exists in your current code page.
But you can confirm that the first character is correct by using the AscW() function to look at the character codes in the string, extracting individual characters with Mid$().
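For example, a quick check along these lines (a minimal sketch, assuming rs is the Recordset opened above) should print 258 for the first character once the value comes through unconverted:

Dim sTitle As String
Dim i As Long

sTitle = rs.Fields("Book_Title").Value

' Print the Unicode code point of each character; the Ă should show up as 258, not 65
For i = 1 To Len(sTitle)
    Debug.Print i, AscW(Mid$(sTitle, i, 1))
Next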
Related
I have some product names that include Unicode characters, for example:
⚠️📷PLEASE READ! WORKING KODAK DC215 ZOOM 1.0MP DIGITAL CAMERA - UK SELLER
A query in HeidiSQL shows it fine.
I set up MariaDB fresh this morning, having moved from MySQL, but when records are retrieved through a ColdFusion query using the MariaDB JDBC driver I get:
java.lang.StringIndexOutOfBoundsException: begin 0, end 80, length 74
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3410)
at java.base/java.lang.String.substring(String.java:1883)
at org.mariadb.jdbc.internal.com.read.resultset.rowprotocol.TextRowProtocol.getInternalString(TextRowProtocol.java:238)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.getString(SelectResultSet.java:948)
The productname field collation is utf8mb4_unicode_520_ci; I've tried a few options, and I've set this at table and database level where it let me.
The JDBC connection string in ColdFusion admin is jdbc:mysql://localhost:3307/usedlens?useUnicode=true&characterEncoding=UTF-8
I note that on the live production database, where MariaDB was used from the beginning, I don't have this trouble, but the default charset there is latin1, and the same record is stored in the database as:
????PLEASE READ! WORKING KODAK DC215 ZOOM 1.0MP DIGITAL CAMERA - UK SELLER
Here's how we've been stripping high ASCII characters while retaining any characters that may be salvaged:
string function ASCIINormalize(string inputString=""){
    // Decompose accented characters (NFD), strip the combining diacritical marks, then drop anything left that is not ASCII
    return createObject( 'java', 'java.text.Normalizer' )
        .normalize( javacast("string", arguments.inputString), createObject( 'java', 'java.text.Normalizer$Form' ).valueOf('NFD') )
        .replaceAll( '\p{InCombiningDiacriticalMarks}+', '' )
        .replaceAll( '[^\p{ASCII}]+', '' );
}
productname = ASCIINormalize(productname);
/*
Comparisons using java UDF versus reReplace regex:
"ABC Café ’test" (note: High ASCII non-normal whitespace characters used.)
ASCIINormalize = "ABC Cafe test"
reReplace = "ABC Caf test"
"čeština"
ASCIINormalize = "cestina"
reReplace = "etina"
"Häuser Bäume Höfe Gärten"
ASCIINormalize = "Hauser Baume Hofe Garten"
reReplace = "Huser Bume Hfe Grten"
*/
This is due to a sequence of high ASCII characters that form emojis. I encountered similar issues when exporting MSSQL data to a UTF-8 file to be converted to Excel using a 3rd party tool. In this case, the database and file were correct, but the 3rd party tool would crash when encountering emoji characters.
Our approach to this was to convert emojis to their aliases so that information wasn't lost in the process. (If you strip high ASCII characters, you may lose some context.) To sanitize emojis to use aliases, I wrote this ColdFusion cf-emoji-java (CFC) to leverage emoji-java (JAR file) to convert emojis to their ASCII7-safe aliases.
emojijava = new emojijava();
emojijava.parseToAliases('I like 🍕'); // I like :pizza:
Since...
I'm not really in the business of supporting emojis
My data is just product names targeted at UK, Europe and the United States for the foreseeable future
I don't want to have to go through the same trouble with production (already defaulted to latin1_swedish_ci)
I decided to:
Match production, so I set the database, table, and fields to latin1_swedish_ci (sketched below) with help from
How to change the CHARACTER SET (and COLLATION) throughout a database?
and strip non-ASCII characters in the product name.
== edit: don't do this, it takes out too many useful characters ==
<cfset productname = reReplace(productname, "[^\x20-\x7E]", "", "ALL")>
I have a table with a column defined as NVARCHAR2, and I'm able to save the string in UTF-8 without any issues.
But the application that reads the value does not fully support UTF-8.
This means that, at the moment, the string is converted into HTML character codes before it is passed to the database and back; each letter in the string is converted to such a code.
I'm looking for an easier solution.
I've considered converting it to Base64, but it contains various characters which are considered illegal in the application.
I also tried using HEXTORAW and RAWTOHEX.
None of the above helped.
If the column contains 'κόσμε', I need a way to convert/encode it to something else, but it must be possible to decode it back from the HTML running the application.
Try the ASCIISTR function: it converts the value into something similar to the way JSON encodes Unicode strings (it's actually the same, except "\" is used instead of "\u"). Then, when you receive the value back from the front end, use UNISTR to convert it back to Unicode.
ASCIISTR: https://docs.oracle.com/cd/B28359_01/server.111/b28286/functions006.htm
UNISTR: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions204.htm
SELECT ASCIISTR(N'κόσμε') FROM DUAL;
-- returns '\03BA\1F79\03C3\03BC\03B5'

SELECT UNISTR('\03BA\1F79\03C3\03BC\03B5') FROM DUAL;
-- returns the original 'κόσμε'
I'm trying to import a CSV file (UTF-8 encoding) in Ruby (2.0.0) into my database (MSSQL 2008 R2, collation French_CI_AS), but the special characters (French accents on vowels) are not stored properly: éèçôü becomes something like Ã©Ã¨Ã§Ã´Ã¼ (or other similar gibberish).
I use this piece of code to read the file:
require 'csv'

CSV.foreach(file, col_sep: ';', encoding: "utf-8") do |row|
  # ...
end
I tried various encodings in the CSV options (utf-8, iso-8859-1, windows-1252), but none of them stored the special characters correctly.
Before you ask: my database collation supports those characters, since we have successfully imported data containing them using PHP importers. If I dump the data using puts or a file logger, everything is correct.
Is something wrong with my code, or do I need to specify something else (like the Ruby source file encoding, for example)?
Thanks
EDIT: The data saving is done by a PHP REST API that works fine with accented characters. It stores data as it is received.
In Ruby, I parse my data, store it in an object, and then send the JSON-encoded object in the body of my PUT request. But if I run an SQL query directly from Ruby, the problem remains:
query = <<-SQL
UPDATE MyTable SET MyTable_title = '#{row_data['title']}' WHERE MyTable_id = '#{row_data['id']}'
SQL
res = db.execute query
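For what it's worth, a minimal diagnostic sketch like this (reusing row_data from above) shows what Ruby is actually holding before the value ever reaches the database; for a well-formed UTF-8 string, é should appear in the byte dump as the pair 195, 169:

title = row_data['title']
puts title.encoding           # expected: UTF-8
puts title.valid_encoding?    # should be true for well-formed UTF-8
puts title.bytes.inspect      # é should appear as 195, 169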
I was thinking that this had something to do with the encoding of your CSV file, so I started digging around on that. I did find that windows-1252 encoding will insert control characters.
You can read more about it here: Converting special characters such as ü and à back to their original, latin alphabet counterparts in C#
While converting from Oracle to Sybase ASE, I encountered the following issue: the ASCII function doesn't return the code for multi-byte characters properly; it looks like it only gets the first byte.
For example, the following statement returns 34655
select ASCII('㍉') from dual
while in Sybase it returns 63
select ASCII('㍉')
Adaptive Server has the following language settings
Language: Japanese
Character Set: eucjis
Even if I use the Sybase uscalar function
select uscalar('㍉')
it returns 63
Only passing the hex equivalent of this Japanese character to the uscalar function gives a different result, though still not the same as in Oracle:
select uscalar(0x875F)
returns 24455
But this raises another issue: I'm not able to cast this character to hex, as
select convert(varbinary,'㍉')
returns only the first byte again (0x3f)
Please help me find the appropriate way of getting the correct character code for multi-byte Japanese characters in Adaptive Server Enterprise.
I'd like to create a .properties file, to be used in a Java program, from a VBScript. I'm going to use some strings in languages that use characters outside the ASCII range, so I need to replace these characters with their Unicode escape codes. This would be \u0061 for a, \u0062 for b, and so on.
Is there a way to get the Unicode code point for a character in VBScript?
VBScript has the AscW function that returns the Unicode (wide) code of the first character in the specified string.
Note that AscW returns the character code as a decimal number, so if you need it in a specific format, you'll have to write some additional code for that (and the problem is, VBScript doesn't have decent string formatting functions). For example, if you need the code formatted as \unnnn, you could use a function like this:
Function ToUnicodeChar(Char)
    ' Get the code of the first character, convert it to hex, and pad it to four digits
    str = Hex(AscW(Char))
    ToUnicodeChar = "\u" & String(4 - Len(str), "0") & str
End Function

WScript.Echo ToUnicodeChar("✈")  ' prints \u2708