GSM encoding for SMS with special characters (Twilio)


I hope you can help me with this question.
I need to send messages containing special characters without splitting them into too many segments. A GSM message can hold up to 160 characters, but if I write a message shorter than that limit and it contains even one special character, the message is switched to UCS2.
Is there a way to avoid this, so that the message is encoded with GSM only, regardless of the special characters, and does not cost more?
Thank you in advance and greetings.
Example (Text):
Encoded: GSM
Message: Hola Señor Cliente le informamos que ya está disponible su crédito, acuda a las oficinas de Compañia o marque al 00110011001.
Length: 125
Segments: 1
Encoded: UCS2
Message: Hola Señor Cliente le informamos que ya está disponible su crédito, acuda a las oficinas de Compañia o marque al 00110011001.
Length: 125
Segments: 2

Twilio developer evangelist here.
You cannot send special characters as a GSM encoded message because those characters do not exist within the GSM character set. This is why Twilio encodes those messages as UCS2 (well, really as UTF-16 big endian). However, when a message is encoded in UCS2, you can only fit 70 characters in a single segment.
The only way to ensure that your messages are not encoded as UCS2 is to avoid any characters outside of the GSM character set.
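To check a message before sending, you can test each character against the GSM 03.38 alphabet. The following is a minimal sketch, not Twilio's implementation: the names `non_gsm_chars` and `count_segments` are made up for illustration, and the segment sizes (160/153 septets for GSM, 70/67 characters for UCS2) are the usual limits for single and concatenated messages. In the example from the question, ñ and é are in the GSM alphabet, but á is not, and that one character is what forces UCS2.

```python
from math import ceil

# GSM 03.38 basic character set (each character costs one septet).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table characters (each costs two septets: escape + character).
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def non_gsm_chars(text):
    """Return the characters that would force the message into UCS2."""
    return sorted({c for c in text if c not in GSM7_BASIC and c not in GSM7_EXTENDED})


def count_segments(text):
    """Estimate (encoding, segments) for a message body.

    Approximation: ignores the rule that a two-septet character
    must not be split across a segment boundary.
    """
    if not non_gsm_chars(text):
        septets = sum(2 if c in GSM7_EXTENDED else 1 for c in text)
        return "GSM-7", 1 if septets <= 160 else ceil(septets / 153)
    units = len(text.encode("utf-16-be")) // 2  # 16-bit code units
    return "UCS-2", 1 if units <= 70 else ceil(units / 67)


message = ("Hola Señor Cliente le informamos que ya está disponible su crédito, "
           "acuda a las oficinas de Compañia o marque al 00110011001.")
print(non_gsm_chars(message))                     # ['á']
print(count_segments(message))                    # ('UCS-2', 2)
print(count_segments(message.replace("á", "a")))  # ('GSM-7', 1)
```

Replacing the offending characters with their closest GSM equivalents (here "esta" instead of "está") is a trade-off between readability and cost that only you can decide on.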
You can read more about how Twilio handles special characters in the API in the first part of this blog post on adventures in unicode SMS.
Let me know if that helps at all.

Related

Sending a UNICODE string to A16 COMS Mainframe via TCP/IP

I need to send a Unicode string message to A16 COMS (Mainframe) via TCP/IP. What algorithm or string transformation do I need? The string can contain one or more Unicode characters.
When sending an ASCII-only string, I convert (map) it to EBCDIC and send it over the TCP/IP connection. I know that EBCDIC doesn't handle Unicode characters. Besides, I can only send a byte array over TCP/IP; for an ASCII string, one character maps to one array cell, whereas a Unicode character can occupy from 1 to 4 byte array cells.
The question is how to send a string containing Unicode to the A16 Mainframe.
Further clarification:
When I run the code, the TCP client does not receive any response. It times out and gives an error. Increasing the timeout does not help. C# can convert a Unicode string to UTF-8 either using System.Text.Encoding or even with an algorithm, almost manually. Those are not a problem. The problem is that A16 COMS expects "one character = one byte" (mapped to EBCDIC), and with UTF-8 one character may occupy 2, 3 or 4 cells of an array. The EBCDIC mapping itself does not help, because EBCDIC is designed to work with non-Unicode (ASCII-based) strings.
I hope that someone who has done this at some point in their career will read my post, because there is not much I can work out on my own. Can it be done with TcpClient and its NetworkStream? The Send method only takes an array of bytes in its signature, but with UTF-8 the array of bytes can be much longer than the limit.
This is a question asking to share experience, not knowledge.

Decode incoming SMS numbers

I use a SIMCOM GSM module to receive incoming messages. When I send an SMS from my mobile phone, I see my normal number:
+CMT: "+38012345678", ...
But when an SMS comes from my cell operator, or from a named SMS service such as Google, I see some garbage like this one from Google:
+CMT: "16p6p6w237562767963656", ...
one more:
+CMT: "w49511#495946535451425", ...
and more:
+CMT: "#497966737471627", ...
According to the module documentation, this parameter is named <oa> and is the GSM 03.40 TP-Originating-Address Address-Value string field.
Is it possible to decode it in any programming language, e.g. Python? What can it be? If I switch to UCS2 and decode from that, the result is exactly the same.

According to SIM800 Series AT Command Manual v1.10, page 114:
GSM 03.40 TP-Destination-Address Address-Value field in string
format; BCD numbers (or GSM default alphabet characters) are converted
to characters of the currently selected TE character set (refer
Command +CSCS in 3GPP TS 27.007); type of address given by
If the phone number in the +CMT message does not start with a "+" sign, it is encoded as BCD digits.
I compared those numbers with the ASCII table. It is not exactly BCD encoding, but it looks very similar.
To decode "16p6p6w237562767963656", split it into pairs: 16 p6 p6 w2 37 56 27 67 96 36 56
then reverse each pair: 61 6p 6p 2w 73 65 72 76 69 63 65
Now look up the hex codes in the ASCII table and you get the result: all services. You may wonder how to read 6p 6p 2w. So did I!
After looking at other examples of encoded numbers, I assumed that the hex digits 0 and A-F are shown as different characters:
0 - w
A
B - #
C - p
D
E - +
F - #
I have no idea why the hex digits were replaced by seemingly random letters.
"w49511#495946535451425" stands for "#Y?KYIVSTAR". The code "11" is unprintable and is replaced by "?".
"#497966737471627" stands for "Kyivstar".

Are you sure your module is set to text format (AT+CMGF=1) when receiving those SMS? If you switched your module off and on again, it is probably set to "PDU" mode, which is better suited to computers than to humans.
See the SIMCOM AT Command manual for details, it's very extensive (380 pages pdf).

SMS PDU Concatenation

I am working with SMS concatenation. My GSM modem supports PDU mode. My UDH works fine when I use IEI 05 to address a certain port, but then I tried IEI 00, which is for concatenation. The two parts arrive combined as a single message without any problem, but the text is unreadable, full of weird characters. Below is my PDU for the first part.
0041000B819062972624F60000A0050003A1020154741914AFA7C76B9058FEBEBB41E6371EA4AEB7E173D0DB5E983E8E832881DD6E741E4F7D905A2A2CBA0783D3D5E83C4F2F7DD0D32BFF12075BD0D9F83DEF6B21C44479741ECB03E0F22BFCF2E10155D06C5EBE9F11A2496BFEF6E90F98D07A9EB6DF81CF4B697E5203ABA0C6287F57910F97D7681A8E832285E4F8FD720B1FC7D7783CC6F
and this one is for the second part:
0041000B819062972624F600007B050003A102027890BADE86CF416F7B590EA2A3CB2076589F0791DF6717888A2E83E2F5F4780D12CBDF7737C8FCC683D4F5367C0E7ADBCB72101D5D06B1C3FA3C88FC3EBB4054741914AFA7C76B9058FEBEBB41E6371EA4AEB7E173D0DB5E9683E8E832881DD6E741E4F7D905
Thanks a lot in advance for any help.

Did you remember to pad your UDH with additional bits so that your UD septets that follow start on a septet boundary?
If for example you have 6 octets in your UDH (most common), which equals 48 bits, then you have to add 1 more bit so that the GSM-7 encoded characters start on a septet boundary (49 bits is 7 septets).
Read http://mobiletidings.com/2009/02/18/combining-sms-messages/ for more information.
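The padding rule can be sketched as follows. This is a minimal illustration under stated assumptions, not a complete PDU encoder: `fill_bits` and `pack_septets` are hypothetical helper names, and the input is assumed to be already mapped to GSM-7 septet values (for plain ASCII letters these equal the ASCII codes).

```python
def fill_bits(udh_octets):
    """Zero bits needed after the UDH so the text starts on a septet boundary."""
    return (7 - (udh_octets * 8) % 7) % 7


def pack_septets(septets, fill=0):
    """Pack 7-bit values into octets, leaving `fill` zero bits at the start."""
    acc = 0
    nbits = fill  # the fill bits occupy the lowest bits of the first octet
    for s in septets:
        acc |= (s & 0x7F) << nbits
        nbits += 7
    # GSM packing fills each octet from its least significant bit upwards,
    # which is a little-endian layout of the accumulated bit string.
    return acc.to_bytes((nbits + 7) // 8, "little")


udh = bytes.fromhex("050003A10201")   # the 6-octet UDH from the first PDU above
text = list(b"hello")                 # septet values of an example text

print(fill_bits(len(udh)))                                  # 1
print(pack_septets(text).hex())                             # e8329bfd06 (no UDH)
print((udh + pack_septets(text, fill_bits(len(udh)))).hex())
```

Remember that with a UDH present the user data length field counts septets for the whole user data, so the 6 header octets plus 1 fill bit count as 7 septets on top of the text.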

sending sms in hebrew

I'm using sms1.cardboardfish.com to send SMS messages through the web. I have these data coding schemes to work with:
0: Flash
1: Normal
2: Binary
4: UCS2
5: Flash UCS2
6: Flash GSM
7: Normal GSM
and I want to send a message in Hebrew. Right now I'm sending it as 7: Normal GSM and it comes out scrambled. Any ideas?

Send it in UCS2, which for these characters is the same as UTF-16 (big endian) encoding.
I think this should do the trick (Python 3):
>>> a = "שלום"
>>> [hex(ord(c)) for c in a]
['0x5e9', '0x5dc', '0x5d5', '0x5dd']
>>> a.encode("utf_16_be").hex()
'05e905dc05d505dd'

Note that when using a multi-byte character set (such as UCS2) the maximum number of characters per message will be significantly reduced. The well known 160 character limit is based on a 7 bit character set, with a 16 bit character set you'll be limited to 70 characters.

Embedding GSM cellids in Short Messages

I'm using the WML function "providelocalinfo" to put location information into Short Messages sent via a WIB menu on a GSM handset.
I'm using the WIG WML v.4 Spec from SmartTrust. The relevant section is "9.4 providelocalinfo Element"
I use the code as in the example, and then transmit the variable via SMS, and use Kannel to retrieve the message from the SMSC.
Here's the code that I'm using, with the exception of [myservicecentre] being my actual service centre:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE wml PUBLIC "-//SmartTrust//DTD WIG-WML 4.0//EN"
  "http://www.smarttrust.com/DTD/WIG-WML4.0.dtd">
<wml wibletenc="UCS2">
  <card id="s">
    <p>
      <providelocalinfo cmdqualifier="location" destvar="LOC"/>
      <setvar name="X" value="loc=" class="binary"/>
      <sendsm>
        <destaddress value="367"/>
        <userdata docudenc="hex-binary" dcs="245">
          $(X)$(LOC)
        </userdata>
        <servicecentreaddress value="[myservicecentre]"/>
      </sendsm>
    </p>
  </card>
</wml>
What I see in my received messages is "loc=" followed by 7 bytes (octets) of binary data. I have tried to find documentation explaining how to decode this data, but found nothing that explains this clearly.
Of the decoded 7 octets,
the first 3 octets are always the same,
The next 2 octets tend to vary between three unique values,
the last 2 octets appear to be the cellid.
So I have coded the receiver to pull the last two octets and construct a 16-bit GSM cellid. Most of the time it matches known cellids from the network. But quite often, the value does not match.
So I'm trying to find information on the following:
How to properly transmit the location information in a safe manner (encodings, casts, etc)
How to decode the information properly
How to configure Kannel to honor binary location data
I've examined the following documents in my vain searching, but not found the relevant data:
GSM 03.38, GSM 04.07, GSM 04.08, GSM 11.15, as well as the WIG WML Spec V .4
Any insight into what I might be doing wrong would be appreciated!

To decode the location info, you need to look in GSM 11.14 page 48
1.19 LOCATION INFORMATION
Byte(s)  Description                                  Length
1        Location Information tag                     1
2        Length (X) of bytes following                1
3-5      Mobile Country & Network Codes (MCC & MNC)   3
6-7      Location Area Code (LAC)                     2
8-9      Cell Identity Value (Cell ID)                2
The mobile country code (MCC), the mobile network code (MNC), the location area code (LAC) and the
cell ID are coded as in TS GSM 04.08 [8].
From personal experience, the first octet mentioned here is usually left off, so your first three unchanging bytes are the length and the country. The next 2 are the network operator code.
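As a rough sketch of how the fields in the table could be unpacked: the value part (bytes 3-9) is 3 + 2 + 2 = 7 octets, which matches the 7 octets seen in the question if both the tag and the length are absent. That reading is an assumption and differs from the experience described above, so check it against your own data. The function name `parse_location_info` and the sample bytes are made up; the MCC/MNC digits are unpacked using the swapped-nibble BCD layout of the location area identification in GSM 04.08.

```python
def parse_location_info(value):
    """Split 7 octets into MCC, MNC, LAC and Cell ID.

    Assumes `value` holds only bytes 3-9 of the table (no tag, no length).
    """
    if len(value) != 7:
        raise ValueError("expected 7 octets: MCC/MNC (3) + LAC (2) + Cell ID (2)")

    def digit(nibble):
        return format(nibble, "X")

    # Octet 1: MCC digit 2 | MCC digit 1; octet 2: MNC digit 3 | MCC digit 3;
    # octet 3: MNC digit 2 | MNC digit 1 (high nibble | low nibble).
    mcc = digit(value[0] & 0x0F) + digit(value[0] >> 4) + digit(value[1] & 0x0F)
    mnc = digit(value[2] & 0x0F) + digit(value[2] >> 4)
    if value[1] >> 4 != 0x0F:          # 0xF is the filler for a 2-digit MNC
        mnc += digit(value[1] >> 4)

    return {
        "mcc": mcc,
        "mnc": mnc,
        "lac": int.from_bytes(value[3:5], "big"),
        "cell_id": int.from_bytes(value[5:7], "big"),
    }


# Made-up sample: MCC 255, MNC 01, LAC 0x1234, Cell ID 0xABCD.
sample = bytes.fromhex("52F510 1234 ABCD")
print(parse_location_info(sample))
# {'mcc': '255', 'mnc': '01', 'lac': 4660, 'cell_id': 43981}
```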

Not too many bites on this question! I wanted to summarize my findings in case others find them useful:
You need to send messages with a dcs setting other than 0. dcs="0" sends the data packed (keeping only the lower 7 bits of each octet; this is what allows 160-character SMS messages when the maximum message size is actually 140 octets).
You need to parse the data in a binary-safe manner: regular expressions that stop searching when 0x0A is encountered will fail, because the binary data itself can contain that value.
I found no need to change Kannel's default configuration.
Cheers
Disclaimer: Safe transmission of 16-bit GSM Cell-Ids requires dealing with a few settings that I understand only because they weren't configured by default. There are probably other defaults that I've depended on without being aware that they can vary.
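The 0x0A pitfall can be illustrated with a small Python sketch. It is not the receiver code used here: the pattern and the sample payload are made up, and it assumes the message body is available as raw bytes with "loc=" followed by exactly 7 octets.

```python
import re

# re.DOTALL lets "." match every byte value, including 0x0A (newline).
LOC_PATTERN = re.compile(rb"loc=(.{7})", re.DOTALL)


def extract_location(payload):
    """Return the 7 raw octets after 'loc=', or None if they are missing."""
    match = LOC_PATTERN.search(payload)
    return match.group(1) if match else None


# Made-up payload whose location data happens to contain 0x0A bytes.
payload = b"loc=" + bytes.fromhex("52F5100A0A1234")

print(extract_location(payload).hex())        # 52f5100a0a1234
print(re.search(rb"loc=(.{7})", payload))     # None: "." stops at 0x0A
```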
