How to know SMS segment counts?

I have a question regarding SMS.
According to the SMS specification, a single message can carry 160 characters.
That means if I try to send more than 160 characters (e.g. 161), the message is automatically split into two SMS segments, which are then delivered to the receiver.
But nowadays phones don't actually show two messages; they show just one.
It looks like there is some header that identifies the message parts, and the phone reassembles them automatically.
So, is there any way to see that SMS header info and how many messages were really delivered/received?
My smartphone (Nexus 5) doesn't show it.
Thank you.

SMS does not simply deliver 160 characters (1120 bits / (7 bits/character) = 160 characters) every time.
If the message is segmented, each segment can carry only 153 characters.
(http://spin.atomicobject.com/2011/04/20/concatenated-sms-messages-and-character-counts/)
Nc = Total number of characters in message
Nx = Characters from extended GSM table (|^{}[]~\ and euro)
L = Message length in 7-bit characters
M = Number of messages
L = Nc + Nx
L > 160: M = L / 153 [rounded up]
L <= 160: M = 1
The division by 153 is because when an SMS is split into parts, each part gets a 48-bit concatenation header; 48 bits rounds up to 7 of the 7-bit characters, leaving 160 - 7 = 153 characters per part.
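For reference, a quick sketch of that calculation in Python (the function name is mine; it assumes the GSM 7-bit alphabet and that you count the extended-table characters yourself):

import math

def sms_segment_count(nc, nx=0):
    """Return the number of SMS segments for a GSM 7-bit message.

    nc: total number of characters in the message (Nc)
    nx: characters from the extended GSM table (Nx), which each cost an extra 7-bit escape
    """
    length = nc + nx                 # L = Nc + Nx, in 7-bit characters
    if length <= 160:
        return 1                     # fits in a single, unsegmented SMS
    return math.ceil(length / 153)   # each part loses 7 characters to the 48-bit header

print(sms_segment_count(160))  # 1
print(sms_segment_count(161))  # 2
print(sms_segment_count(459))  # 3 (3 * 153 = 459)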

Related

Reading the value of a byte in Racket

In my program I have to communicate over TCP/IP.
To do so I have to marshal the data I want to send.
Sometimes I want to send an integer; this fits perfectly in one byte.
In Racket an integer (0 <= number < 256) is considered to be a byte.
So, for example, I send:
(write 15 outputPort)
(flush-output outputPort)
At the other end, the receiver has to unmarshal the received data.
So I do:
(define (loop)
  (if (byte-ready? inputPort)
      (display (read-byte inputPort))
      (loop)))
I would expect it to display 15 (as all numbers between 0 and 255 can fit in one byte), but instead it displays 49, which is the ASCII value of "1".
And if I loop once more I receive 53 as the second value, which is the ASCII value of "5".
So is there a way to make a byte from a value between 0 and 255 without converting each digit of the number to an ASCII value? That conversion means sending N bytes, where N is the number of digits in the number.
If it isn't possible, what's the advantage of bytes in Racket?
Since I could simply send my number as a string:
(write (number->string 15) outputPort)
(flush-output outputPort)
And then unmarshal it by reading a string and converting it back the other way:
(string->number (read-string length inputPort))
But I wanted to use bytes in order to avoid sending strings (because operations on strings are costly), so I could send one byte for a number between 0 and 255 instead of possibly 3 bytes (when the number contains 3 digits).
You want to use write-byte, not write:
-> (write-byte 61)
=
-> (write 61)
61
If you use write-byte instead of write it will actually send the byte value instead of the serialized representation of the value.
Similarly, you can use write-bytes to write a bytestring out byte-by-byte.

How to translate Text to Binary with Cocoa?

I'm making a simple Cocoa program that can encode text to binary and decode it back to text. I tried to write this myself and wasn't even close to accomplishing it. Can anyone help me? It should have two text boxes and two buttons, or whatever works best. Thanks!
There are two parts to this.
The first is to encode the characters of the string into bytes. You do this by sending the string a dataUsingEncoding: message. Which encoding you choose will determine which bytes it gives you for each character. Start with NSUTF8StringEncoding, and then experiment with other encodings, such as NSUnicodeStringEncoding, once you get it working.
The second part is to convert every bit of every byte into either a '0' character or a '1' character, so that, for example, the letter A, encoded in UTF-8 to a single byte, will be represented as 01000001.
So, converting characters to bytes, and converting bytes to characters representing bits. These two are completely separate tasks; the second part should work correctly for any stream of bytes, including any valid stream of encoded characters, any invalid stream of encoded characters, and indeed anything that isn't text at all.
The first part is easy enough:
- (NSString *) stringOfBitsFromEncoding:(NSStringEncoding)encoding
                               ofString:(NSString *)inputString
{
    //Encode the characters to bytes using the chosen encoding. The bytes are contained in an NSData object, which we receive.
    NSData *data = [inputString dataUsingEncoding:encoding];
    //I did say these were two separate jobs.
    return [self stringOfBitsFromData:data];
}
For the second part, you'll need to loop through the bytes of the data. A C for loop will do the job there, and that will look like this:
//This is the method we're using above. I'll leave out the method signature and let you fill that in.
- …
{
    //Find out how many bytes the data object contains.
    NSUInteger length = [data length];
    //Get the pointer to those bytes. “const” here means that we promise not to change the values of any of the bytes. (The compiler may give a warning if we don't include this, since we're not allowed to change these bytes anyway.)
    const char *bytes = [data bytes];
    //We'll store the output here. There are 8 bits per byte, and we'll be putting in one character per bit, so we'll tell NSMutableString that it should make room for (the number of bytes times 8) characters.
    NSMutableString *outputString = [NSMutableString stringWithCapacity:length * 8];
    //The loop. We start by initializing i to 0, then increment it (add 1 to it) after each pass. We keep looping as long as i < length; when i >= length, the loop ends.
    for (NSUInteger i = 0; i < length; ++i) {
        char thisByte = bytes[i];
        for (NSUInteger bitNum = 0; bitNum < 8; ++bitNum) {
            //Call a function, which I'll show the definition of in a moment, that will get the value of a bit at a given index within a given character.
            bool bit = getBitAtIndex(thisByte, bitNum);
            //If this bit is a 1, append a '1' character; if it is a 0, append a '0' character.
            [outputString appendFormat:@"%c", bit ? '1' : '0'];
        }
    }
    return outputString;
}
Bits 101 (or, 1100101)
Bits are literally just digits in base 2. Humans in the Western world usually write out numbers in base 10, but a number is a number no matter what base it's written in, and every character, and every byte, and indeed every bit, is just a number.
Digits—including bits—are counted up from the lowest place, according to the exponent to which the base is raised to find the magnitude of that place. We want bits, so that base is 2, so our place values are:
2^0 = 1: The ones place (the lowest bit)
2^1 = 2: The twos place (the next higher bit)
2^2 = 4: The fours place
2^3 = 8: The eights place
And so on, up to 2^7. (Note that the highest exponent is exactly one lower than the number of digits we're after; in this case, 7 vs. 8.)
If that all reminds you of reading about “the ones place”, “the tens place”, “the hundreds place”, etc. when you were a kid, it should: it's the exact same principle.
So a byte such as 65, which (in UTF-8) completely represents the character 'A', is the sum of:
2^7 × 0 = 0
+ 2^6 × 1 = 64
+ 2^5 × 0 = 0
+ 2^4 × 0 = 0
+ 2^3 × 0 = 0
+ 2^2 × 0 = 0
+ 2^1 × 0 = 0
+ 2^0 × 1 = 1
= 0 + 64 + 0 + 0 + 0 + 0 + 0 + 1
= 64 + 1
= 65
Back when you learned base 10 numbers as a kid, you probably noticed that ten is “10”, one hundred is “100”, etc. This is true in base 2 as well: as 10^x is “1” followed by x “0”s in base 10, so is 2^x “1” followed by x “0”s in base 2. So, for example, sixty-four in base 2 is “1000000” (count the zeroes and compare to the table above).
We are going to use these exact-power-of-two numbers to test each bit in each input byte.
Finding the bit
C has a pair of “shift” operators that will insert zeroes or remove digits at the low end of a number. The former is called “shift left”, and is written as <<, and you can guess the opposite.
We want shift left. We want to shift 1 left by the number of the bit we're after. That is exactly equivalent to raising 2 (our base) to the power of that number; for example, 1 << 6 = 2^6 = “1000000”.
Testing the bit
C has an operator for bit testing, too; it's &, the bitwise AND operator. (Do not confuse this with &&, which is the logical AND operator. && is for using whole true/false values in making decisions; & is one of your tools for working with bits within values.)
Strictly speaking, & does not test single bits; it goes through the bits of both input values, and returns a new value whose bits are the bitwise AND of each input pair. So, for example,
  01100101
& 00101011
----------
  00100001
Each bit in the output is 1 if and only if both of the corresponding input bits were also 1.
Putting these two things together
We're going to use the shift left operator to give us a number where one bit, the nth bit, is set—i.e., 2^n—and then use the bitwise AND operator to test whether the same bit is also set in our input byte.
//This is a C function that takes a char and an int, promising not to change either one, and returns a bool.
bool getBitAtIndex(const char byte, const int bitNum)
//It could also be a method, which would look like this:
//- (bool) bitAtIndex:(const int)bitNum inByte:(const char)byte
//but you would have to change the code above. (Feel free to try it both ways.)
{
    //Find 2^bitNum, which will be a number with exactly 1 bit set. For example, when bitNum is 6, this number is “1000000”—a single 1 followed by six 0s—in binary.
    const int powerOfTwo = 1 << bitNum;
    //Test whether the same bit is also set in the input byte.
    bool bitIsSet = byte & powerOfTwo;
    return bitIsSet;
}
A bit of magic I should acknowledge
The bitwise AND operator does not evaluate to a single bit—it does not evaluate to only 1 or 0. Remember the above example, in which the & operator returned 33.
The bool type is a bit magic: Any time you convert any value to bool, it automatically becomes either 1 or 0. Anything that is not 0 becomes 1; anything that is 0 becomes 0.
The Objective-C BOOL type does not do this, which is why I used bool in the code above. You are free to use whichever you prefer, except that you generally should use BOOL whenever you deal with anything that expects a BOOL, particularly when overriding methods in subclasses or implementing protocols. You can convert back and forth freely, though not losslessly (since bool will change non-zero values as described above).
Oh yeah, you said something about text boxes too
When the user clicks on your button, get the stringValue of your input field, call stringOfBitsFromEncoding:ofString: using a reasonable encoding (such as UTF-8) and that string, and set the resulting string as the new stringValue of your output field.
Extra credit: Add a pop-up button with which the user can choose an encoding.
Extra extra credit: Populate the pop-up button with all of the available encodings, without hard-coding or hard-nibbing a list.

LZW decompression algorithm

I'm writing a program for an assignment which has to implement LZW compression/decompression.
I'm using the following algorithms for this:
-compression
w = NIL;
while ( read a character k )
{
    if wk exists in the dictionary
        w = wk;
    else
    {
        add wk to the dictionary;
        output the code for w;
        w = k;
    }
}
output the code for w;
-decompression
read a character k;
output k;
w = k;
while ( read a character k )
/* k could be a character or a code. */
{
    entry = dictionary entry for k;
    output entry;
    add w + entry[0] to dictionary;
    w = entry;
}
For the compression stage I'm just outputting ints representing the index of the dictionary entry; the starting dictionary consists of the ASCII characters (0 - 255).
But when I come to the decompression stage I get an error.
For example, if I compress a text file consisting of only "booop",
compression goes through these steps to produce an output file:
w    k    Dictionary    Output
-    b    -             -
b    o    bo (256)      98 (b)
o    o    oo (257)      111 (o)
o    o    -             -
oo   p    oop (258)     257 (oo)
p    -    -             112 (p)
output.txt:
98
111
257
112
Then when I come to decompress the file:
w    k          entry     output    Dictionary
-    98 (b)     b         b         -
b    111 (o)    o         o         bo (256)
o    257        (error)
257 (oo) hasn't been added to the dictionary yet. Can anyone see where I'm going wrong here? I'm stumped. Is the algorithm wrong?
Your compression part is right and complete, but the decompression part is not: it only handles the case where the code is already in the dictionary. Since the decompression process is always one step behind the compression process, the decoder can encounter a code that is not yet in its dictionary. But because it is only one step behind, it can figure out what the encoder must have just added, output the decoded string correctly, and then add it to the dictionary. Continue your decompression process like this:
-decompression
read a character k;
output k;
w = k;
while ( read a character k )
/* k could be a character or a code. */
{
    if k exists in the dictionary
    {
        entry = dictionary entry for k;
        output entry;
        add w + entry[0] to dictionary;
        w = entry;
    }
    else
    {
        entry = w + firstCharacterOf(w);
        output entry;
        add entry to dictionary;
        w = entry;
    }
}
Then, when you come to decompress the file and see 257, you find it's not in the dictionary. But you know the previous entry is 'o' and its first character is 'o' too; put them together and you get "oo". Now output "oo" and add it to the dictionary. Next you get code 112, and you know it's p. DONE!
w    k          entry    output    Dictionary
-    98 (b)     b        b         -
b    111 (o)    o        o         bo (256)
o    257        oo       oo        oo (257)
oo   112 (p)    p        p         oop (258)
See this explanation by Steve Blackstock for more information, and this better page with a flow chart for the actual decoder and encoder implementation, on which the "icafe" Java image library's GIF encoder and decoder are based.
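For reference, here is a minimal runnable sketch of both directions in Python (this is not the poster's code; it just follows the pseudocode above, with new dictionary codes starting at 256 on top of the 256 single-byte entries):

def lzw_compress(data):
    """LZW compression: emit integer codes; codes 0-255 are single characters."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w = ""
    output = []
    for k in data:
        wk = w + k
        if wk in dictionary:
            w = wk
        else:
            output.append(dictionary[w])
            dictionary[wk] = next_code
            next_code += 1
            w = k
    if w:
        output.append(dictionary[w])  # emit the code for the final w
    return output

def lzw_decompress(codes):
    """LZW decompression, including the 'code not yet in dictionary' case."""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    codes = iter(codes)
    w = dictionary[next(codes)]
    output = [w]
    for k in codes:
        if k in dictionary:
            entry = dictionary[k]
        else:
            # k can only be the code the encoder just created: w + first character of w
            entry = w + w[0]
        output.append(entry)
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(output)

codes = lzw_compress("booop")
print(codes)                  # [98, 111, 257, 112]
print(lzw_decompress(codes))  # booop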
From http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch, are you falling into this case?
What happens if the decoder receives a code Z that is not yet in its dictionary? Since the decoder is always just one code behind the encoder, Z can be in the encoder's dictionary only if the encoder just generated it, when emitting the previous code X for χ. Thus Z codes some ω that is χ + ?, and the decoder can determine the unknown character as follows:
1) The decoder sees X and then Z.
2) It knows X codes the sequence χ and Z codes some unknown sequence ω.
3) It knows the encoder just added Z to code χ + some unknown character,
4) and it knows that the unknown character is the first letter z of ω.
5) But the first letter of ω (= χ + ?) must then also be the first letter of χ.
6) So ω must be χ + x, where x is the first letter of χ.
7) So the decoder figures out what Z codes even though it's not in the table,
8) and upon receiving Z, the decoder decodes it as χ + x, and adds χ + x to the table as the value of Z.
This situation occurs whenever the encoder encounters input of the form cScSc, where c is a single character, S is a string and cS is already in the dictionary, but cSc is not. The encoder emits the code for cS, putting a new code for cSc into the dictionary. Next it sees cSc in the input (starting at the second c of cScSc) and emits the new code it just inserted. The argument above shows that whenever the decoder receives a code not in its dictionary, the situation must look like this.
Although input of form cScSc might seem unlikely, this pattern is fairly common when the input stream is characterized by significant repetition. In particular, long strings of a single character (which are common in the kinds of images LZW is often used to encode) repeatedly generate patterns of this sort.
For this specific case, the Wikipedia description fits: you have χ + ?, where χ is (o); Z is unknown so far, so its first letter is the first letter of χ, giving (oo); add (oo) to the table as 257. I am just going on what I read at Wikipedia; let us know how this turns out if that is not the solution.

String to Number and back algorithm

This is a hard one (for me); I hope people can help me. I have some text and I need to transform it into a number, but the number has to be unique, just as the text is unique.
For example:
The word 'kitty' could produce 12432, but only the word 'kitty' produces that number. The text could be anything, and a proper number should be given.
One problem: the resulting integer must be a 32-bit unsigned integer, which means the largest possible number is 2147483647. I don't mind if there is a text length restriction, but I hope it can be as large as possible.
My attempt: you have the letters A-Z and the digits 0-9, so one character can have a number between 1 and 36. But if A = 1 and B = 2 and the text is A(1)B(2), adding them gives 3; the problem is that the text BA produces the same result, so this algorithm won't work.
Any ideas to point me in the right direction or is it impossible to do?
Your idea is generally sane; it only needs to be developed a little.
Let f(c) be a function converting character c to a unique number in the range [0..M-1]. Then you can calculate the resulting number for the whole string like this:
f(s[0]) + f(s[1])*M + f(s[2])*M^2 + ... + f(s[n])*M^n
You can easily prove that the number will be unique for a particular string (and you can get the string back from the number).
Obviously, you can't use very long strings here (up to 6 characters in your case), as 36^n grows fast.
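A minimal sketch of that scheme in Python (the alphabet, the f(c) mapping, and the function names are my own choices for illustration; note that the decoder needs to know the string length, because a trailing character with f(c) = 0 adds nothing to the number):

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"  # f(c) = index of c in this string
M = len(ALPHABET)  # 36

def encode(s):
    """f(s[0]) + f(s[1])*M + f(s[2])*M^2 + ... as described above."""
    number = 0
    for i, c in enumerate(s):
        number += ALPHABET.index(c) * (M ** i)
    return number

def decode(number, length):
    """Invert encode(); the length must be known (or stored separately)."""
    chars = []
    for _ in range(length):
        chars.append(ALPHABET[number % M])
        number //= M
    return "".join(chars)

n = encode("KITTY")
print(n)              # 41222170, unique to "KITTY" and within 32 bits
print(decode(n, 5))   # KITTY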
Imagine you were trying to store Strings from the character set "0-9" only in a number (the equivalent of obtaining a number of a string of digits). What would you do?
Pos  9 8 7 6 5 4 3 2 1 0
Str  0 5 2 1 2 5 4 1 2 6
Num = 6 * 10^0 + 2 * 10^1 + 1 * 10^2 ...
Apply the same thing to your characters.
Pos  5 4 3 2 1 0
Str  A B C D E F
L = 36
C(i): transforms a character to a number: C(0) = 0, ..., C(A) = 10, C(B) = 11, ...
Num = C(F) * L^0 + C(E) * L^1 + ...
Build a dictionary of words mapped to unique numbers and use that; that's the best you can do.
I doubt there are more than 2^32 words in use, but that is not the problem you're facing; the problem is that you need to map the numbers back to words.
If you were only mapping words to numbers, some hash algorithm might work, although you'd have to do some work to guarantee that it won't produce collisions.
However, mapping numbers back to words is quite a different problem, and the easiest solution is to just build a dictionary and map both ways.
In other words:
AARDUANI = 0
AARDVARK = 1
...
If you want to map numbers to base 26 characters, you can only store 6 characters (or 5 or 7 if I miscalculated), but not 12 and certainly not 20.
Unless you only count actual words, and they don't follow any good countable rules. The only way to do that is to just put all the words in a long list, and start assigning numbers from the start.
If it's correctly spelled text in some language, you can have a number for each word. However, you'd need to consider all possible plurals, place and people names, etc., which is generally impossible. What sort of text are we talking about? There will usually be some words that can't be coded in 32 bits in any way without prior knowledge of them.
Can you build a list of words as you go along? Just give the first word you see the number 1, the second the number 2, and check whether a word already has a number or needs a new one. Then save your newly created dictionary somewhere. This is likely the only workable solution if you require a 100% reliable, reversible mapping from the numbers back to the original words, given new unknown text that doesn't follow any known pattern.
With 64 bits and a sufficiently good hash like MD5 it's extremely unlikely to have collisions, but for 32 bits it doesn't seem likely that a safe hash exists.
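A minimal sketch of that build-as-you-go dictionary (the names are illustrative, not from any library):

word_to_id = {}   # maps each word to the number it was assigned
id_to_word = {}   # reverse mapping, number -> word

def number_for(word):
    """Return the word's number, assigning the next free one (1, 2, 3, ...) on first sight."""
    if word not in word_to_id:
        n = len(word_to_id) + 1
        word_to_id[word] = n
        id_to_word[n] = word
    return word_to_id[word]

def word_for(number):
    return id_to_word[number]

print(number_for("kitty"))   # 1 (first word seen)
print(number_for("dog"))     # 2
print(number_for("kitty"))   # 1 again
print(word_for(2))           # dog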
Just treat each character as a digit in base 36, and calculate the decimal equivalent?
So:
'A' = 0
'B' = 1
[...]
'Z' = 25
'0' = 26
[...]
'9' = 35
'AA' = 36
'AB' = 37
[...]
'CAB' = 3925
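Reading that listing as an enumeration in which all shorter strings come before longer ones ('A'..'9' take 0..35, then the two-character strings start at 36), a small Python sketch of both directions might look like this (the names are mine):

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"  # 'A' = 0, ..., '9' = 35

def encode(s):
    """Enumerate strings so that all shorter strings come first: 'A'=0, ..., '9'=35, 'AA'=36, ..."""
    n = -1
    for c in s:
        n = (n + 1) * len(ALPHABET) + ALPHABET.index(c)
    return n

def decode(n):
    """Invert encode()."""
    chars = []
    while n >= 0:
        chars.append(ALPHABET[n % len(ALPHABET)])
        n = n // len(ALPHABET) - 1
    return "".join(reversed(chars))

print(encode("AA"))    # 36
print(encode("AB"))    # 37
print(encode("CAB"))   # 3925
print(decode(3925))    # CAB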

How does Google Protocol Buffers compare to ASN.1

What are the most noticeable differences between Google Protocol Buffers and ASN.1 (with PER encoding)? For my project the most important issue is the size of the serialized data. Has anyone done any data-size comparisons between the two?
If you use ASN.1 with Unaligned PER, and define your data types using the appropriate constraints (e.g., specifying lower/upper bounds for integers, upper bounds for the length of lists, etc.), your encodings will be very compact. There will be no bits wasted for things like alignment or padding between the fields, and each field will be encoded in the minimum number of bits necessary to hold its permitted range of values. For example, a field of type INTEGER (1..8) will be encoded in 3 bits (1='000', 2='001', ..., 8='111'); and a CHOICE with four alternatives will occupy 2 bits (indicating the chosen alternative) plus the bits occupied by the chosen alternative. ASN.1 has many other interesting features that have been successfully used in many published standards. An example is the extension marker ("..."), which when applied to SEQUENCE, CHOICE, ENUMERATED, and other types, enables backward- and forward compatibility between endpoints implementing different versions of the specification.
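As a rough illustration of that constrained-encoding rule (this is only the bit-count arithmetic, not a real PER codec):

import math

def uper_bits_for_range(lower, upper):
    """Bits needed by unaligned PER for a constrained INTEGER (lower..upper)."""
    values = upper - lower + 1
    return math.ceil(math.log2(values)) if values > 1 else 0

print(uper_bits_for_range(1, 8))   # 3 -> INTEGER (1..8) takes 3 bits
print(uper_bits_for_range(0, 3))   # 2 -> a CHOICE with 4 alternatives needs a 2-bit index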
It's a long time since I've done any ASN.1 work, but the size is very likely to depend on the details of your types and actual data.
I would strongly recommend that you prototype both and put some real data in to compare.
If your protocol buffer would contain repeated primitive types, you should look at the latest source in Subversion for Protocol Buffers - they can be represented in a "packed" format now which is much more space-efficient. (My C# port has just caught up with this feature, some time last week.)
When the size of the packed/encoded message is important, you should also note that protobuf is not able to pack repeated fields that are not of a primitive numeric type; read this for more information.
This is an issue, for example, if you have messages of this type (the comments define the actual range of values):
message P {
    required sint32 x = 1; // -0x1ffff to 0x20000
    required sint32 y = 2; // -0x1ffff to 0x20000
    required sint32 z = 3; // -0x319c to 0x3200
}
message Array {
    repeated P ps = 1;
    optional uint32 somemoredata = 2;
}
If you have an array length of, e.g., 32, you end up with an encoded message size of approximately 250 to 450 bytes with protobuf, depending on what values the array actually contains. This can even grow to over 1000 bytes if you use the full 32-bit range, or if you use int32 instead of sint32 and have negative values.
The raw data blob (assuming that z can be stored as an int16 value) would only consume 320 bytes, and thus the ASN.1 message is always smaller than 320 bytes, since the maximum values are actually not 32-bit but 19-bit (x, y) and 15-bit (z).
The protobuf message size can be optimized with this message definition:
message Ps {
    repeated sint32 xs = 1 [packed=true];
    repeated sint32 ys = 2 [packed=true];
    repeated sint32 zs = 3 [packed=true];
}
message Array {
    required Ps ps = 1;
    optional uint32 somemoredata = 2;
}
which results in message sizes between approximately 100 bytes (all values zero), 300 bytes (values at the range maxima), and 500 bytes (all values are large 32-bit values).
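To make those estimates concrete, here is a rough back-of-the-envelope size calculator (the helper names are mine; it only counts tag, length-prefix, and zigzag-varint bytes for the point data, and ignores the optional somemoredata field):

def zigzag32(n):
    """ZigZag-encode a signed 32-bit int, as protobuf does for sint32."""
    return (n << 1) ^ (n >> 31)

def varint_len(value):
    """Number of bytes a non-negative value occupies as a varint."""
    return max(1, (value.bit_length() + 6) // 7)

def unpacked_size(points):
    """Array { repeated P ps } with P { sint32 x, y, z }: one sub-message per point."""
    total = 0
    for (x, y, z) in points:
        payload = sum(1 + varint_len(zigzag32(v)) for v in (x, y, z))  # field tag + value
        total += 1 + varint_len(payload) + payload  # ps tag + length prefix + payload
    return total

def packed_size(columns):
    """Ps { repeated sint32 xs, ys, zs [packed=true] }: one length-delimited field per column."""
    total = 0
    for column in columns:
        payload = sum(varint_len(zigzag32(v)) for v in column)
        total += 1 + varint_len(payload) + payload
    return 1 + varint_len(total) + total  # wrap the Ps message in Array.ps

n = 32
zeros = [(0, 0, 0)] * n
maxima = [(0x20000, 0x20000, 0x3200)] * n

print(unpacked_size(zeros))             # 256 bytes, the low end of the 250-450 estimate
print(unpacked_size(maxima))            # 448 bytes, the high end
print(packed_size(list(zip(*zeros))))   # 104 bytes with the packed layout
print(packed_size(list(zip(*maxima))))  # 297 bytes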
Protocol Buffers does not guarantee preservation of the order of fields in the binary encoding, but ASN.1 does. This is not related to size, so it might not be the most noticeable difference in your use case, but it is an important difference for byte-wise comparison, for digital signatures, for simplified parsing, and possibly other applications.
