sending sms in hebrew - sms

I'm using sms1.cardboardfish.com to sens smses through the web. I have these datacoding schemes to work with:
0: Flash
1: Normal
2: Binary
4: UCS2
5: Flash UCS2
6: Flash GSM
7: Normal GSM
and I want to send it in hebrew. right now I'm sending it in 7: Normal GSM and it comes out scrambled.. Ideas anyone?

Send it in UCS2, which is normal UTF-16 encoding.
I think this should do the trick:
>>> a=u"שלום"
>>> a
u'\u05e9\u05dc\u05d5\u05dd'
>>> a.encode("utf_16_be").encode("hex")
'05e905dc05d505dd'

Note that when using a multi-byte character set (such as UCS2) the maximum number of characters per message will be significantly reduced. The well known 160 character limit is based on a 7 bit character set, with a 16 bit character set you'll be limited to 70 characters.

Related

Concurrent DOS + QEMU losing data through parallel port emulation

I have a piece of software running on concurrent DOS 3.1, which I emulate with QEMU 5.1.
In this program, there are several options to print data. The problem is that the data arriving to my host does not correspond to the data sent.
the command to start qemu:
qemu-system-i386 -chardev file,id=imp0,path=/path/to/file -parallel chardev:imp0 -hda DISK.Raw
So the output sent on parallel port of my guest is redirected to /path/to/file.
When I send the charactère 'é' from CDOS:
echo é>>PRN
The code page used on CDOS is Code Page 437, and in this charactere set, the charactère é is represented by 0x82, but on my host, instead, I receive the following:
cp437 é -> 0x82 ---------> host -> x1b52 x017b x1b52 x00
So I tried something else. I wrote the charactère 'é' in a file, and sent the file with nc.exe (from brutman and libmtcp), and with nc, the value stays 0x82.
So my question, what happen when I send my data to virtual parallel port? When does my data get transformed? Is it the parallel port on Concurrent DOS? Is it the QEMU? I can't figure out how to send my data through LPT1 properly.
I also tried this:
qemu-system-i386 -chardev socket,id=imp0,host=127.0.0.1,port=2222,server,nowait -parallel chardev:imp0 -hda DISK.Raw
I can read the socket fine, but same output as when I write in a file, the é get transformed to x1b52 x017b x1b52 x00.
The byte sequence "1B 52 01 7B 1B 52 00" is using Epson FX-style printer escape sequences (there's a reference here). Specifically, "1B 52 nn" is ESC R n which selects the international character set, where character set 0 is USA and 1 is France. So the sequence as a whole is "Select French character set; print byte 0x7b; select US character set". In the French character set for this printer standard, 0x7B is the e-acute é.
This is almost certainly the CDOS printer driver assuming that the thing on the end of PRN: is an Epson printer and emitting the appropriate escape sequences to output your text on that kind of printer.
OK, so i finally figured it out... After hours of searching where this printer.sys driver could be, or how to remove it, no Concurrent DOS, the setup command is "n". And of course, it is not listed in the "help" command...
Anyway, in there you can setup your printer, and select "no conversion" for the port you want. And it was actually on Epson MX-80/MX-100.
So thanks to Peter's answer, you led me to the right path !

Decode incoming SMS numbers

I use some SIMCOM GSM module to receive incoming messages. When I send SMS from my mobile phone I see my normal number:
+CMT: "+38012345678", ...
But when SMS comes from my cell operator, or some named SMS service as Google I see somу trash like here from Google:
+CMT: "16p6p6w237562767963656", ...
one more:
+CMT: "w49511#495946535451425", ...
and more:
+CMT: "#497966737471627", ...
According to module documentation this parameter named <oa> and means GSM 03.40 TP-Originating-Address Address-Value string field.
Is it possible to decode it on any programming language, e.g. from python? What can it be? If I switch to UCS2 and decode from it is absolutely the same.
According to SIM800 Series AT Command Manual v1.10, page 114:
GSM 03.40 TP-Destination-Address Address-Value field in string
format; BCD numbers (or GSM default alphabet characters) are converted
to characters of the currently selected TE character set (refer
Command +CSCS in 3GPP TS 27.007); type of address given by
If phone number in CMT message does not start with "+" sign, it is encoded with BCD numbers.
I tried to compare those numbers with ASCII table. This is not exactly BCD encoding, but looks very similar.
To decode "16p6p6w237562767963656" split it into pairs: 16 p6 p6 w2 37 56 27 67 96 36 56
then reverse each pair: 61 6p 6p 2w 73 65 72 76 69 63 65
Now compare to HEX codes in ASCII table and get the result: all services. You may wonder how to read 6p 6p 2w. I wonder either!
After searching other examples of encoded numbers I made an assumption that HEX digits 0, A-F have equivalent of different characters:
0 - w
A
B - #
C - p
D
E - +
F - #
I have no idea, why HEX digits were replaces by random letters.
"w49511#495946535451425" stands for "#Y?KYIVSTAR". The code "11" is unprintable and replaced by "?".
"#497966737471627" stands for "Kyivstar".
Are you sure your module is set to text format (AT+CMGF=1) when receiving those SMS? If you switched off your module and on again it probably is set to "PDU" mode, which is more suited for computers than humans..
See the SIMCOM AT Command manual for details, it's very extensive (380 pages pdf).

Concatenated SMS extended symbols at segments border - what is correct split method?

For concatenated SMS messages (in GSM encoding), if extended table symbol (one of these: }{[]|~^\€) is placed at the end of segment, what is correct way to split such message:
Leave first byte of symbol (0b) at the end of segment and put second byte to the beginning of next one, and so fill all available bytes of UD (which seems logically correct)
OR
Move whole symbol bytes to the next segment and leave unused byte at the end?
I didn't found any clarification neither in SMPP 3.4 specs or implementation guide nor in GSM 03.38 specs, so assume that method selection is up to content provider or sending software.
When you are using a 7-bit encoding, split when you reach 153 characters (whether that is in-between the 1B and the next character or not).
I can't think of any reason to prefer option 2 (leave unused byte at the end).
The message should not be rendered unless it's completely received by the subscriber's device. Furthermore, nothing is even represented by 1B septet in GSM7, so even if a device tried to render something unexpected it would fail.

Windows DHCP client hostname encoding

Recently I have been trying to save list of hostnames from captured DHCP packets. I have found out, every DHCP hostname (option 12) should have form defined in RFC 1035. So if I understand it correctly, hostname should be encoded in 7-bit ASCII and have other restrictions like:
- name should not start with digit and should omit some forbidden characters.
Almost every device I have encountered in packets fulfill this constraint, but not Windows devices (Vendor ID MSFT 5.0). IMHO Windows DHCP client takes computer (mobile) name and fill it in hostname option.
Problem occurs, when computer name is set for example to "Lukáš-PC". Wireshark display this hostname as Luk\240\347-PC. (240 and 347 are numbers in octal). To see for myself I have printed values in packets with printf("%hhu", c) (C language).
á = 160
š = 231
IMHO I think this is simple char variable overflow. I tried deduce original value from overflow value, but I haven't found any relation between character and known encodings. So my questions are:
Is there any way to convert these values back to original?
If yes, what was original character encoding, when overflow happened?
Thanks.
Default char is usually signed, and extends to int when passed to a variadic function. To ensure that it is printed unsigned, use printf("%hhu", c) or printf("%d", (unsigned char)c);.
The correct encoding is impossible to know because it depends on each system's settings.
Note that any compliant systems MUST encode names according to RFC 3490, but Windows seems to enjoy violating standards.
The characters á and š that you are seing are encoded using code page 852 (Latin-2 - Central European languages).
Unfortunately there is no simple way how you can figure out the encoding used only by looking at DHCP requests. In principle the DHCP client can use any code page it wants. If you are working in a private/controlled network, then it is probably safe to assume the all clients are using the same code page and explicitly encode the strings using that particular code page.

SMS PDU Concatenation

I am working with SMS Concatenation. My GSM Modem supports PDU Mode. My UDH works fine when i use the IEI for 05 for using a certain port but then i tried using IEI 00 which is for concatenation. I am receiving the two messages combined as single message without problem but i am receiving unreadable sms of weird characters. Below is my PDU for the first part.
0041000B819062972624F60000A0050003A1020154741914AFA7C76B9058FEBEBB41E6371EA4AEB7E173D0DB5E983E8E832881DD6E741E4F7D905A2A2CBA0783D3D5E83C4F2F7DD0D32BFF12075BD0D9F83DEF6B21C44479741ECB03E0F22BFCF2E10155D06C5EBE9F11A2496BFEF6E90F98D07A9EB6DF81CF4B697E5203ABA0C6287F57910F97D7681A8E832285E4F8FD720B1FC7D7783CC6F
and this one is for the second part:
0041000B819062972624F600007B050003A102027890BADE86CF416F7B590EA2A3CB2076589F0791DF6717888A2E83E2F5F4780D12CBDF7737C8FCC683D4F5367C0E7ADBCB72101D5D06B1C3FA3C88FC3EBB4054741914AFA7C76B9058FEBEBB41E6371EA4AEB7E173D0DB5E9683E8E832881DD6E741E4F7D905
Thanks a lot for helps in advance.
Did you remember to pad your UDH with additional bits so that your UD septets that follow start on a septet boundary?
If for example you have 6 octets in your UDH (most common), which equals 48 bits, then you have to add 1 more bit so that the GSM-7 encoded characters start on a septet boundary (49 bits is 7 septets).
Read http://mobiletidings.com/2009/02/18/combining-sms-messages/ for more information.

Resources