Sinch SMS API and characters with accent marks

When I try to send an SMS with the "ó" character, I get a blank character instead.
I have read in the docs that:
the default alphabet is the GSM 7-bit, but characters in languages such as Arabic, Chinese, Korean, Japanese, or Cyrillic alphabet languages (e.g., Ukrainian, Serbian, Bulgarian, etc.) must be encoded using the 16-bit UCS-2 character encoding.
But if I encode the message with UTF-16 (I have read that UCS-2 is UTF-16), I get a 40001 error. So, is it possible to send special characters with Sinch?

GSM-7 and UCS-2 are encodings used by the Sinch backend to send the message over SMPP. Currently Latin-1 (ISO-8859-1) is also used, and this is probably why you're seeing the missing character: some SMS providers do not support it and therefore decode the message with a different decoder. Sinch is removing Latin-1 support (Latin-1 yields a shorter encoded short message than UCS-2) and will use UCS-2 instead for messages that cannot be encoded with GSM-7 or ASCII.
I'm interested in the 40001 error you're getting. If you're setting the charset to UTF-16 on the HTTP request, don't do that. If you're doing something else, please post your code (without the appKey and secret) so I can see more clearly how you trigger that error.
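To make the GSM-7/UCS-2 split concrete, here is a minimal Go sketch (illustrative only, not Sinch's code, and with the character table abbreviated) that checks whether a message fits the GSM 03.38 basic set and falls back to UCS-2 code units otherwise:

package main

import (
	"fmt"
	"unicode/utf16"
)

// Abbreviated subset of the GSM 03.38 basic character set; a real
// implementation needs the full table plus the extension table.
var gsm7Basic = make(map[rune]bool)

func init() {
	const basic = " !\"#$%&'()*+,-./0123456789:;<=>?" +
		"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" +
		"@£¥èéùìòÇØøÅåÄÖÑÜäöñüà\n\r"
	for _, r := range basic {
		gsm7Basic[r] = true
	}
}

// fitsGSM7 reports whether every rune of msg is in the basic set.
func fitsGSM7(msg string) bool {
	for _, r := range msg {
		if !gsm7Basic[r] {
			return false
		}
	}
	return true
}

func main() {
	msg := "Adiós" // 'ó' is not in the GSM-7 basic set (unlike 'ò')
	if fitsGSM7(msg) {
		fmt.Println("send as GSM-7")
	} else {
		// UCS-2 is UTF-16 without surrogate pairs; for characters in
		// the Basic Multilingual Plane the code units are identical.
		fmt.Printf("send as UCS-2 code units: %X\n", utf16.Encode([]rune(msg)))
	}
}

The check matters for message length as well as correctness: a single SMS part holds 160 GSM-7 characters but only 70 UCS-2 characters, which is why the shorter Latin-1 encoding is mentioned above.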

Related

Encoding Special Characters for ISO-8859-1 API

I'm writing a Go package for communicating with a 3rd-party vendor's API. Their documentation states roughly this:
Our API uses the ISO-8859-1 encoding. If you fail to use ISO-8859-1 for encoding special characters, this will result in unexpected errors or malformed strings.
I've been doing research on the subject of charsets and encodings, trying to figure out how to "encode special characters" in ISO-8859-1, but based on what I've found this seems to be a red herring.
From StackOverflow, emphasis mine:
UTF-8 is a multibyte encoding that can represent any Unicode character. ISO 8859-1 is a single-byte encoding that can represent the first 256 Unicode characters. Both encode ASCII exactly the same way.
ISO-8859-1 is a binary encoding where each possible value of a single byte maps to a specific character. It's certainly within my power to have my HTTP POST body encoded this way, but it cannot carry any characters beyond the 256 defined in the spec.
I gather that, to encode a special character (such as the Euro symbol) in ISO-8859-1, it would first need to be escaped in some way.
Is there some kind of standard ISO-8859-1 escaping? Would it suffice to URL-encode any special characters and then encode my POST body in ISO-8859-1?
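For what it's worth, a short Go sketch of what encoding to ISO-8859-1 looks like in practice, using the golang.org/x/text/encoding/charmap package (this illustrates only the charset mechanics, not the vendor's API):

package main

import (
	"fmt"
	"net/url"

	"golang.org/x/text/encoding/charmap"
)

func main() {
	enc := charmap.ISO8859_1.NewEncoder()

	// 'ø' is within Latin-1 and becomes the single byte 0xF8, so
	// URL-encoding the Latin-1 bytes yields %F8, not UTF-8's %C3%B8.
	latin1, err := enc.String("København")
	if err != nil {
		panic(err)
	}
	fmt.Println(url.QueryEscape(latin1)) // K%F8benhavn

	// Characters beyond Latin-1's 256 code points cannot be encoded
	// at all; the encoder reports an error instead of escaping them.
	_, err = enc.String("€")
	fmt.Println(err)
}

Note that ISO-8859-1 itself defines no escape mechanism: anything outside its 256 characters has to be handled by whatever escaping the vendor's API defines (URL-encoding, HTML entities, and so on).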

UPS/FedEx shipment request special characters

Recently I've been working on implementing label generation for FedEx and UPS couriers using their external services. I have a problem with special characters printed on the label. In the response I'm getting the correct text, but on the label all special characters are replaced by dummy signs. According to the UPS and FedEx docs, they fully support such characters on labels as long as they are passed as UTF-8 and the encoding node in the XML is present (pointing to UTF-8).
Has anyone faced a similar problem? Maybe there is an official note from them, which I'm not aware of, saying they don't support this case.
It turned out the UPS and FedEx APIs support only Latin-1 characters. The dummy characters came from an automatic UTF-8 cast in one of our internal methods (dicttoxml), which resulted in double UTF-8 encoding.
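Double UTF-8 encoding is easy to reproduce and recognize. A small Go sketch of the effect (illustrative only, not the courier or dicttoxml code): take a UTF-8 string, misread its bytes as Latin-1, and store the result as a string again.

package main

import (
	"fmt"

	"golang.org/x/text/encoding/charmap"
)

func main() {
	original := "Adiós"

	// Misread the UTF-8 bytes as Latin-1: each byte becomes one rune,
	// and storing those runes in a Go string re-encodes them as UTF-8.
	// This is the "double encoding" that produces dummy signs.
	misread, _ := charmap.ISO8859_1.NewDecoder().String(original)
	fmt.Println(misread) // AdiÃ³s

	// The inverse (encoding back to Latin-1 bytes) recovers the text.
	fixed, _ := charmap.ISO8859_1.NewEncoder().String(misread)
	fmt.Println(fixed) // Adiós
}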

WriteConsoleW, wprintf and Unicode

AllocConsole();
consoleHandle = GetStdHandle(STD_OUTPUT_HANDLE);
// Writing the UTF-16 string straight to the console works:
WriteConsoleW(consoleHandle, L"qweąęėšų\n", 9, NULL, NULL);
// Going through the CRT's stdout stream stops after "qwe":
_wfreopen(L"CONOUT$", L"w", stdout);
wprintf(L"qweąęėšų\n");
Output is:
qweąęėšų
qwe
Why does wprintf stop after printing qwe? A \0 byte encountered in ą should terminate the wide-char string, AFAIK.
At first I accepted Hans Passant's answer, but the root cause of wprintf not printing to UTF-8 streams is that wprintf behaves as though it uses the function wcrtomb, which encodes a wide character (wchar_t) into a multibyte sequence depending on the current locale (link).
Windows does not have a UTF-8-capable locale (that is, a locale that supports the UTF-8 code page, 65001).
Quote from MSDN:
The set of available locale names, languages, country/region codes, and code pages includes all those supported by the Windows NLS API except code pages that require more than two bytes per character, such as UTF-7 and UTF-8.
The stdout stream can be redirected and therefore always operates in 8-bit mode. The Unicode string you pass to wprintf() gets converted from utf-16 to the 8-bit code page that's selected for the console. By default that's the olden 437 OEM code page. That's where the buck stops, that code page doesn't support the character.
You'll need to switch to another 8-bit code page, one that does support that character. A good choice is 65001, the code page for utf-8. Fix:
SetConsoleOutputCP(CP_UTF8);
Or use SetConsoleCP() if you want stdin to use utf-8 as well.

Different querystring urlencoding based on codepage. ASP classic

We are currently converting our webapp from ISO-8859-1 to UTF-8. Everything works great except for reading GET/POST variables submitted from other sites (signup forms).
Some of the sites that post to our site use ISO-8859-1 encoding and some use UTF-8.
The problem is that special characters get URL-encoded differently depending on the site's charset.
For example:
ø = %F8 in ISO-8859-1
ø = %C3%B8 in UTF-8
I can't decode %F8 correctly when my charset is UTF-8; I only get the Unicode character 'REPLACEMENT CHARACTER' (U+FFFD).
Any tips on how to fix this would be much appreciated:)
Torbjørn
You can specify the encoding explicitly using <form accept-charset="UTF-8">.
If you don't want to do that, the browser has to guess the encoding you want. For that it usually takes the encoding of the page in which the form is. So if you serve the HTML files as UTF-8 your forms will be sent back as UTF-8, too.
I'd suggest doing a pre-analysis of the inputs before converting them. Essentially, scan for the ISO-8859-1 codes for Æ, Ø and Å (upper and lower case). If you find any, do a search/replace across the entire request, swapping the ISO-8859-1 character codes for the UTF-8 character codes.
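A more general form of that heuristic, sketched here in Go for clarity rather than classic ASP: if the percent-decoded bytes are not valid UTF-8, assume ISO-8859-1 and transcode. (Caveat: some Latin-1 byte sequences happen to be valid UTF-8, so the guess can occasionally be wrong.)

package main

import (
	"fmt"
	"unicode/utf8"

	"golang.org/x/text/encoding/charmap"
)

// normalize assumes raw holds a percent-decoded query value in either
// UTF-8 or ISO-8859-1 and returns it as UTF-8.
func normalize(raw []byte) string {
	if utf8.Valid(raw) {
		return string(raw) // the %C3%B8 case: already UTF-8
	}
	// The %F8 case: not valid UTF-8, so treat it as ISO-8859-1.
	decoded, _ := charmap.ISO8859_1.NewDecoder().Bytes(raw)
	return string(decoded)
}

func main() {
	fmt.Println(normalize([]byte{0xC3, 0xB8})) // ø sent by a UTF-8 site
	fmt.Println(normalize([]byte{0xF8}))       // ø sent by an ISO-8859-1 site
}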

Special characters in email from Oracle pl/sql

I'm trying to send an email from Oracle using utl_smtp, including Norwegian characters (å æ ø). The characters are stored and displayed correctly in the database otherwise, but show up as question marks in the email.
My database character set is WE8MSWIN1252.
I have tried different Content-Type MIME headers in the email, including 'text/plain; charset="win-1252"', but this does not seem to help.
By default, SMTP is 7-bit ASCII (kinda old tech :). You must be using UTL_SMTP.write_data, and from the documentation:
Text (VARCHAR2) data sent using WRITE_DATA is converted to US7ASCII before it is sent. If the text contains multibyte characters, each multibyte character in the text that cannot be converted to US7ASCII is replaced by a '?' character. If the 8BITMIME extension is negotiated with the SMTP server using the EHLO subprogram, multibyte VARCHAR2 data can be sent by first converting the text to RAW using the UTL_RAW package, and then sending the RAW data using WRITE_RAW_DATA.
There is a sample demo package on OTN that shows how to send multibyte emails.
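The underlying MIME principle is the same on any platform. As a hedged illustration (in Go, not the OTN demo package): declare the charset in the headers and send the body as encoded bytes, so a 7-bit transport cannot down-convert it to '?':

package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// buildMessage assembles a minimal MIME message whose UTF-8 body is
// base64-encoded so it survives a 7-bit SMTP path unchanged.
// Addresses and subject are placeholders.
func buildMessage(from, to, subject, body string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "From: %s\r\n", from)
	fmt.Fprintf(&b, "To: %s\r\n", to)
	fmt.Fprintf(&b, "Subject: %s\r\n", subject)
	b.WriteString("MIME-Version: 1.0\r\n")
	b.WriteString("Content-Type: text/plain; charset=\"UTF-8\"\r\n")
	b.WriteString("Content-Transfer-Encoding: base64\r\n\r\n")
	b.WriteString(base64.StdEncoding.EncodeToString([]byte(body)))
	b.WriteString("\r\n")
	return b.String()
}

func main() {
	fmt.Print(buildMessage("a@example.com", "b@example.com",
		"Test", "Norske tegn: å æ ø"))
}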
