Firebird UTF-8 German phonebook sorting

Server: Windows 2008 R2 x64 with Firebird 2.5 x64
I need a query to sort the way the German phonebook does, e.g.
SELECT Name, Forename
FROM Address
ORDER BY Name, Forename;
The result I want is:
Name, Forename
Assemann, Simon
Aßmann, Erika
Assmann, Frank
Astmann, Manfred
Hacker, Simon
Hackmann, Gustav
Häcker, Emil
Haecker, Manfred
Häcker, Xaver
Hafermann, Ulrich
In German phonebook sorting, the German special characters are handled as follows:
Ä/ä = Ae/ae
Ö/ö = Oe/oe
Ü/ü = Ue/ue
ß = ss
Other special characters, such as the French ones (á, à…), are handled like the plain characters (a, a…).
I tried to upgrade the ICU*.dll files from version 3.0 to 5.0 or 5.6, but it doesn’t work.
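
The substitution rules above can be sketched outside the database as a sort key. This is a minimal Python sketch, not a Firebird collation: the function name `phonebook_key` is hypothetical, and it assumes the rows are sorted in the client after they have been fetched.

```python
import unicodedata

# German phonebook expansions listed in the question.
GERMAN_EXPANSIONS = str.maketrans({
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss",
})


def phonebook_key(text):
    """Return a sort key that follows the phonebook rules from the question."""
    # Compose first so that "a" + combining diaeresis is seen as "ä".
    composed = unicodedata.normalize("NFC", text)
    expanded = composed.translate(GERMAN_EXPANSIONS)
    # Decompose and drop the remaining accents (á, à, ... sort like a).
    decomposed = unicodedata.normalize("NFD", expanded)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return plain.casefold()


# Sample rows from the question, deliberately out of order.
rows = [
    ("Hafermann", "Ulrich"), ("Häcker", "Xaver"), ("Assmann", "Frank"),
    ("Hacker", "Simon"), ("Aßmann", "Erika"), ("Haecker", "Manfred"),
    ("Astmann", "Manfred"), ("Häcker", "Emil"), ("Hackmann", "Gustav"),
    ("Assemann", "Simon"),
]
rows.sort(key=lambda row: (phonebook_key(row[0]), phonebook_key(row[1])))
for name, forename in rows:
    print(f"{name}, {forename}")
```

With this key the sample rows come out in the order requested in the question. Inside Firebird itself the same effect would need a suitable collation rather than client-side code.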

Related

How to differentiate Chinese variants with GetLocaleInfo?

I want to get an ISO 639-1 language string from an LCID. The problem is that 2052 (Simplified Chinese) and 1028 (Traditional Chinese) both return zh (Chinese) instead of zh-CN and zh-TW.
The code I use is
WCHAR locale[8];
GetLocaleInfoW(lcid, LOCALE_SISO639LANGNAME, locale, 8);
Is there a way to get the right code?

ISO 639-1 specifies 2-letter language names, so GetLocaleInfo() correctly returns "zh" for both Simplified and Traditional Chinese - they are not differentiated in the ISO 639-1 spec.
A call with LOCALE_SNAME instead always returns a string that also contains the sub-tag, e.g. "de-DE" or "de-AT".
Everything else, for example a 2-letter tag for "most" languages and 4-letter one (xx-YY) for some "exceptions" (like Chinese - and which other ones?), is something custom and would therefore require custom code.
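
The custom mapping described above might look like the following sketch. It assumes the full name (for example the string returned for LOCALE_SNAME) is already available as text. The function name `short_language_tag` and the exception set are hypothetical, and which languages count as exceptions is an application decision.

```python
# Languages whose region must be kept; this set is an assumption for illustration.
KEEP_REGION = {"zh"}


def short_language_tag(locale_name, keep_region=KEEP_REGION):
    """Reduce a name such as "de-DE" to "de", but keep "zh-CN" and "zh-TW"."""
    parts = locale_name.split("-")
    language = parts[0].lower()
    if language in keep_region and len(parts) > 1:
        # Assumes the last sub-tag is the region, as in "zh-CN".
        return f"{language}-{parts[-1].upper()}"
    return language


print(short_language_tag("de-AT"))
print(short_language_tag("zh-CN"))
print(short_language_tag("zh-TW"))
```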

UTF8mb4 unicode breaking MariaDB JDBC driver

I have some product names that include unicode characters
⚠️📷PLEASE READ! WORKING KODAK DC215 ZOOM 1.0MP DIGITAL CAMERA - UK SELLER
A query in heidiSQL shows it fine
I set up MariaDB fresh this morning, having moved from MySQL, but when records are retrieved through a ColdFusion query using the MariaDB JDBC driver I get
java.lang.StringIndexOutOfBoundsException: begin 0, end 80, length 74
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3410)
at java.base/java.lang.String.substring(String.java:1883)
at org.mariadb.jdbc.internal.com.read.resultset.rowprotocol.TextRowProtocol.getInternalString(TextRowProtocol.java:238)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.getString(SelectResultSet.java:948)
The productname field collation is utf8mb4_unicode_520_ci, I've tried a few options. I've tried to set this at table and database level where it let me.
The JDBC connection string in ColdFusion admin is jdbc:mysql://localhost:3307/usedlens?useUnicode=true&characterEncoding=UTF-8
I note that on the live production database, where MariaDB was used from the beginning, I don't have this trouble, but the default charset there is latin1 and the same record is stored in the database as
????PLEASE READ! WORKING KODAK DC215 ZOOM 1.0MP DIGITAL CAMERA - UK SELLER
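
The numbers in the exception line up with two ways of measuring this product name, which may help narrow down the cause. The sketch below assumes the warning sign is followed by the variation selector U+FE0F and that a single space separates "UK" and "SELLER". It only checks the arithmetic and is not a confirmed diagnosis of the driver.

```python
# Product name from the question, with the emoji written as escapes.
# Assumption: U+26A0 is followed by the variation selector U+FE0F.
name = (
    "\u26a0\ufe0f\U0001F4F7PLEASE READ! WORKING KODAK DC215 ZOOM "
    "1.0MP DIGITAL CAMERA - UK SELLER"
)

code_points = len(name)
utf16_units = len(name.encode("utf-16-le")) // 2  # what Java's String.length() counts
utf8_bytes = len(name.encode("utf-8"))            # size of the UTF-8 encoded value

print(code_points, utf16_units, utf8_bytes)  # 73 74 80
```

74 and 80 are the `length` and `end` values in the exception, which is consistent with a UTF-8 byte count being used as a character index into the Java string.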

Here's how we've been stripping high ASCII characters while retaining any characters that may be salvaged:
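string function ASCIINormalize(string inputString=""){
    // Decompose accented characters (NFD) so that each accent becomes a separate combining mark.
    var nfd = createObject( 'java', 'java.text.Normalizer$Form' ).valueOf('NFD');
    var decomposed = createObject( 'java', 'java.text.Normalizer' ).normalize( javacast("string", arguments.inputString), nfd );
    // Drop the combining marks, then drop anything that is still not ASCII.
    return decomposed.replaceAll('\p{InCombiningDiacriticalMarks}+','').replaceAll('[^\p{ASCII}]+','');
}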
productname = ASCIINormalize(productname);
/*
Comparisons using java UDF versus reReplace regex:
"ABC Café ’test" (note: High ASCII non-normal whitespace characters used.)
ASCIINormalize = "ABC Cafe test"
reReplace = "ABC Caf test"
"čeština"
ASCIINormalize = "cestina"
reReplace = "etina"
"Häuser Bäume Höfe Gärten"
ASCIINormalize = "Hauser Baume Hofe Garten"
reReplace = "Huser Bume Hfe Grten"
*/

This is due to the multi-byte (non-ASCII) characters that form the emojis. I encountered similar issues when exporting MSSQL data to a UTF-8 file to be converted to Excel using a 3rd party tool. In that case, the database and file were correct, but the 3rd party tool would crash when it encountered emoji characters.
Our approach was to convert emojis to their aliases so that information wasn't lost in the process. (If you strip non-ASCII characters, you may lose some context.) To sanitize emojis into aliases, I wrote the ColdFusion component cf-emoji-java (CFC), which uses emoji-java (JAR file) to convert emojis to their 7-bit-ASCII-safe aliases.
emojijava = new emojijava();
emojijava.parseToAliases('I like 🍕'); // I like :pizza:

Since...
I'm not really in the business of supporting emojis
My data is just product names targeted at UK, Europe and the United States for the foreseeable future
I don't want to have to go through the same trouble with production (already defaulted to latin1_swedish_ci)
I decided to...
Match production, so I set the database, table, and fields to latin1_swedish_ci with help from
How to change the CHARACTER SET (and COLLATION) throughout a database?
and strip non ASCII characters in the product name
== edit don't do this, it takes out too many useful characters ==
<cfset productname = reReplace(productname, "[^\x20-\x7E]", "", "ALL")>

How to remove special characters in XML through ESQL

I am having a problem with special characters in the input XML.
How can we use ESQL code in the broker toolkit to remove bad characters, which can appear anywhere in an XML field?
In the XML below, the Description field contains the bad character sequence â€”:
<notificationsRequest>
<BillingCity>Troutdale</BillingCity>
<BillingCountry>United States</BillingCountry>
<BillingPostalCode>97060</BillingPostalCode>
<BillingState>Oregon</BillingState>
<BillingStreet>450 NW 257th Way, Suite 400</BillingStreet>
<CreatedById>005w0000003QlXtAAK</CreatedById>
<Type>Prospect</Type>
<Tyco_Operating_Co__c>Tyco IS - Commercial</Tyco_Operating_Co__c>
<Doing_Business_As_DBA__c>Columbia Gorge Outlets</Doing_Business_As_DBA__c>
<Description>As of January 2016â€”the property title should read Austell Columbia Gorge Equities, LLC-dba Columbia Gorge Outlets---so the title should be Austell Columbia Gorge Equities, LLC.</Description>
</notificationsRequest>

Your file seems to have arrived with the wrong encoding, or was corrupted during conversion from one encoding to another. If you are an MS Windows user, you can open it in Notepad++ and try converting its encoding to UTF-8, or to any other candidate encoding, to check the issue.
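
The sequence in the Description field is what a UTF-8 encoded em dash typically looks like after it has been decoded as Windows-1252. A minimal Python sketch of that round trip follows. It assumes the three bad characters are U+00E2, U+20AC and U+201D, and it illustrates the likely cause rather than an ESQL fix.

```python
# The three "bad" characters, written as escapes (assumed code points).
garbled = "As of January 2016\u00e2\u20ac\u201dthe property"

# Undo the wrong decoding: back to the original bytes, then decode as UTF-8.
repaired = garbled.encode("cp1252").decode("utf-8")

# The three characters collapse into a single em dash (U+2014).
print(repaired == "As of January 2016\u2014the property")  # True
```

If this is the cause, fixing the encoding where the message is read preserves the dash, whereas stripping the characters afterwards loses it.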

Can Unicode code points vary between platforms (Windows, Unix, Mac os)?

Today I read a book about Java in which the author states the following about Unicode characters (translated):
Codes of characters are part of extensions that differ from one country or working environment to another. In those extensions, the characters are not always defined at the same position in the Unicode table.
The character "é" is defined at position 234 in the Unix Unicode table, but at position 200 in Mac OS Unicode table. The special characters, consequently accented characters, don't always have the same Unicode code from one environment to another.
For instance, characters é, è and ê have respectively the following Unicode codes:
Unix: \u00e9 \u00e8 \u00ea
Dos: \u0082 \u008a \u0088
Windows: \u00e9 \u00e8 \u00ea
MAC OS: \u00c8 \u00cb \u00cd
But from my understanding of Unicode, the same character always has the same code point in the Unicode table, and there's no such thing as different Unicode tables for different operating systems. For instance, the character é is always \u00e9, be it on Windows, Mac OS or Unix.
So either I still don't grasp the concept of Unicode, or the author is wrong. But still, he couldn't have made it up, so perhaps this was true in the infancy of Unicode?

The author is wrong. You're right, a given character has the same Unicode code point for any correct implementation of Unicode. I seriously doubt that there were multiple representations even at the infancy of Unicode; that would have defeated the whole purpose.
She may be describing non-Unicode character sets such as the various ISO-8859 standards and the Windows code pages such as 1252. Unicode code points in the range 0x80 to 0x9F (decimal 128 to 159) are control characters; some 8-bit character sets have used those codes for accented letters and other symbols.
The character 'é' has the Unicode code point 233 (0xe9). That is invariant. (Are you sure the book said it's 234 in "the Unix Unicode table"?)
There are alternate ways of representing certain characters; for example, 'é' can also be represented as a combination of e (0x65) with a combining acute accent (0x301), but that's not what the author is talking about.
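
Both points can be checked in a few lines. Python is used here only for illustration, as a small sketch:

```python
import unicodedata

composed = "\u00e9"  # é as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # e + combining acute accent

print(ord(composed))                                         # 233
print([hex(ord(ch)) for ch in decomposed])                   # ['0x65', '0x301']
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```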
Copying information from comments, the book is in French, and is titled "Le livre de Java premier langage", by Anne Tasso; the cited version is the 3rd edition, published in 2005. It's available in PDF format here. (The web site name matches the name of the publisher and copyright holder on the first page, so it appears to be a legitimate copy.)
In the original French:
Le caractère é est défini en position 234 dans la table Unicode d’Unix,
alors qu’il est en position 200 dans la table Unicode du système Mac
OS. Les caractères spéciaux et, par conséquent, les caractères
accentués ne sont pas traités de la même façon d’un environnement à
l’autre : un même code Unicode ne correspond pas au même caractère
which, as far as I can tell from my somewhat limited ability to read French, is simply nonsense.
In the quoted table, the representations shown for Unix and Windows are identical, and are consistent with actual Unicode (which makes me think the "234" in the text above that is a typo in the book).
There is an 8-bit extended ASCII representation called Mac OS Roman, but it's inconsistent with what's shown in the table (for example 'é' is 0x8E, not 0xC8), and it's clearly not Unicode.
Windows-1252 is a common 8-bit encoding for Windows, and perhaps also for MS-DOS, but it's also inconsistent with anything shown in that table; 'é' is 0xE9, just as it is in Unicode.
I have no idea where the DOS and MacOS entries came from, or where the author got the idea that Unicode code points vary across operating systems.
I wonder if it's possible that some old implementations of Java have implemented Unicode incorrectly (though character display would be handled by the OS, not by Java). Even if that were the case, I'd expect that any modern Java implementation would get this right. Java might have problems with characters outside the Basic Multilingual Plane, but that's not relevant for characters like 'é'.
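
For comparison, here is what the three letters look like in a few legacy 8-bit encodings, using Python's built-in codecs. The choice of code page 437 to stand in for "DOS" is an assumption. With it, the output matches the book's DOS row, while the Mac OS Roman values do not match the book's Mac OS row.

```python
letters = "\u00e9\u00e8\u00ea"  # é, è, ê

# Unicode code points: the same on every platform.
print([hex(ord(ch)) for ch in letters])  # ['0xe9', '0xe8', '0xea']

# Byte values in some 8-bit encodings (these are not Unicode code points).
encoded = {
    codec: letters.encode(codec).hex(" ")
    for codec in ("latin-1", "cp1252", "cp437", "mac_roman")
}
for codec, hex_bytes in encoded.items():
    print(codec, hex_bytes)
```

Code page 437 gives 82 8a 88 and Mac OS Roman gives 8e 8f 90, while Latin-1 and Windows-1252 give e9 e8 ea, the same values as the Unicode code points.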

How to display Chinese characters in the output file of a concurrent program in Oracle Apps

I'm working for a Taiwan operating unit where the supplier name is stored in Traditional Chinese. I have written a package that displays these Chinese supplier names in the FND output file of a concurrent program, but in the output file I see ? (question mark) instead of the Chinese characters.
Can anyone tell me how I can output Chinese characters?
Regards,
Pradvin

Change profile Option at user level as below
Profile Option Name : FND: NATIVE CLIENT ENCODING
Value : ZHS16GBK
Regards,
Ashish Vishwakarma
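
Question marks of this kind usually appear when text is converted to a character set that cannot represent it. The Python sketch below illustrates only that general mechanism with a sample string. It is not Oracle code, and the codec names are Python's, not Oracle character set names.

```python
sample = "\u53f0\u7063"  # "Taiwan" written in Traditional Chinese

# A character set without these characters substitutes "?" for each one.
replaced = sample.encode("ascii", errors="replace")
print(replaced)  # b'??'

# Character sets that include the characters round-trip without loss.
round_trips = {
    codec: sample.encode(codec).decode(codec) == sample
    for codec in ("big5", "gbk", "utf-8")
}
print(round_trips)
```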
