Most efficient barcode to store a GUID - barcode

I have a system that I'm working on at the moment that requires users to log into the system, and the client wants to use a barcode scanner and cards to keep prices down. (Yes, a username and password would be cheaper, but she wants a card-type solution, so she gets one.)
All my data uses GUIDs as key fields, so I'd like to store the GUID directly on the card in the barcode. While it's simple enough to encode it as Code 3 of 9, that's not going to be the most efficient use of space.
Is there a best practice or most efficient method for storing GUIDs in a barcode? I'd have assumed that since the data has a consistent length and depth there would be a standard, but I can't find one. It would be easy enough to design my own (a control character at either end and binary data in between), but I would like something that standard readers will know how to interpret.
Any help gratefully received.

There are no open standards for special-purpose data compaction with generic linear barcodes such as Code 39 and Code 128. Most ISO/IEC-standardised 2D barcodes do support a special-purpose data encoding mechanism called Extended Channel Interpretation (ECI) which allows you to specify that data conforms to a certain application standard or encoding regime, for example ECI 298765 for IPv4 address compaction [*]. Unfortunately GUID compaction isn't amongst those that have been registered and even if it were you would nevertheless need to handle this within your application as reader support would be lacking.
That leaves you with having to pre-encode (and subsequently decode) the GUID into a format that can be handled efficiently by some ubiquitous barcode symbology.
An efficient way to store a GUID would be to convert it to a 40-digit[†] decimal representation and store the result in a Code 128 barcode using double-density numeric compression ("Mode C").
For example, consider the GUID:
cd171f7c-560d-4a62-8d65-16b87419a58c
Expressed as a hexadecimal number:
0xCD171F7C560D4A628D6516B87419A58C
Converted to 40 decimal digits:
0272611800569275698104677545117639878028
Encoded within a Code 128 barcode: [barcode image not included]
Your application would of course need to recognise this input as a decimal-encoded GUID and reverse the above process but I doubt that a significantly more efficient approach exists that doesn't require you to transform the data into an unusual radix and then deal with the complexities of handling ASCII control characters at scan time.
[*] The register of assigned ECI codes is available from the AIM store as "ECI Part 3: Register".
[†] Whilst it is possible to store the entire GUID range within 39 digits a 39-digit Mode C Code 128 symbol is in fact longer than a 40-digit symbol.
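The pre-encoding and decoding steps described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of any barcode standard: the function names `guid_to_decimal` and `decimal_to_guid` are made up for this example, and rendering the digits as a Code 128 symbol is left to whatever barcode library you already use.

```python
import uuid

DIGITS = 40  # fixed, even width so Code 128 Mode C can pair every digit


def guid_to_decimal(guid: str) -> str:
    """Return the GUID as a zero-padded 40-digit decimal string."""
    return str(uuid.UUID(guid).int).zfill(DIGITS)


def decimal_to_guid(digits: str) -> str:
    """Reverse guid_to_decimal, rejecting input that cannot be a GUID."""
    if len(digits) != DIGITS or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly 40 decimal digits")
    value = int(digits)
    if value >= 1 << 128:
        raise ValueError("value does not fit in 128 bits")
    return str(uuid.UUID(int=value))


if __name__ == "__main__":
    encoded = guid_to_decimal("cd171f7c-560d-4a62-8d65-16b87419a58c")
    print(encoded)
    print(decimal_to_guid(encoded))
```

The length and range checks give the application a cheap way to reject scans that are not decimal-encoded GUIDs.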

How to add a signature to protobuf messages?

Is there a common way to sign protobuf messages? What I can imagine is to add a data field and a signature field to a message, use SerializeToArray (in C++) or ToByteArray (in C#) to get the raw bytes, then use MD5, SHA-256, etc. to calculate the hash value and assign it to the 'sign' field. But I don't know whether the raw bytes differ between languages, or between proto2 and proto3.
The approach you discuss for signing is fine for integrity validation purposes, as long as your hashing algorithm is strong enough. If it is for anything stronger than an integrity checksum, you should probably use a true cryptographic signature (with public and private keys), as otherwise anyone can sign their own arbitrary payload, defeating the point.
You also seem to be asking about determinism. The raw bytes in protobuf are not entirely deterministic. There are multiple valid ways of representing the same payload in protobuf, including:
reordering fields (numerical order is a "should", not a "must")
including or omitting zeros (different between proto2 and proto3)
packed vs sequential "repeated" encoding
the reality that "map" is usually backed by some platform-specific inbuilt map/dictionary type, which commonly do not define order, so in theory it can vary every time
not really an issue in reality, but in theory you can encode a varint with an arbitrary length (up to 10 bytes) simply by including unnecessary groups of zero bytes; similar to in text (JSON, etc) saying that 42, 042, 0042 and 0000000042 all represent the same integer; nobody does that, but: it would be valid
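Two of the points above (field order and over-long varints) can be demonstrated without the protobuf library at all. The following is a hand-rolled sketch of the varint wire format, not the official encoder; `encode_varint`, `decode_varint` and `encode_field` are names invented for this example, and the `pad_to` parameter exists only to produce the redundant encoding.

```python
import hashlib


def encode_varint(value: int, pad_to: int = 0) -> bytes:
    """Encode a non-negative int as a varint, optionally padded with zero groups."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value or len(out) + 1 < pad_to:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of data."""
    result = 0
    for shift, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * shift)
        if not byte & 0x80:
            break
    return result


def encode_field(number: int, value: int) -> bytes:
    """Encode one varint-typed field (wire type 0) as tag + value."""
    return encode_varint(number << 3) + encode_varint(value)


if __name__ == "__main__":
    # Same integer, two valid encodings.
    print(encode_varint(42).hex(), encode_varint(42, pad_to=2).hex())
    # Same two fields, written in a different order.
    a = encode_field(1, 42) + encode_field(2, 7)
    b = encode_field(2, 7) + encode_field(1, 42)
    print(hashlib.sha256(a).hexdigest())
    print(hashlib.sha256(b).hexdigest())
```

Both byte strings decode to the same logical message, yet their hashes differ, so a signature has to be computed over (and verified against) the exact bytes that were transmitted rather than a re-serialized copy.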

Why do people sometimes convert numbers or strings to bytes?

Sometimes I encounter questions about converting something to bytes. Is there any situation where converting to bytes is vitally important, or what would I convert something to bytes for?
In most languages the most common string functions come as part of the language or in a pre-made library/include/import, often employing object code to take advantage of processor-based string functions. Sometimes, however, you need to do something with a string that isn't natively supported by the language. So, since 8-bit days, people have viewed strings as arrays of 7- or 8-bit characters, each of which fits within a byte, and used conventions like ASCII to determine which byte value represents which character.
While standard languages often have functions like "string.replaceChar(OFFSET,'a')", this methodology can be painstakingly slow, because each call to the replaceChar method incurs processing overhead which may be greater than the processing that actually needs to be done.
There is also the simplicity factor when designing your own string algorithms but, as I said, most of the common algorithms come prebuilt in modern languages (stringCompare, trimString, reverseString, etc.).
Suppose you want to perform an operation on a string which doesn't come as standard.
Suppose you want to add two numbers which are represented as decimal digits in strings, and these numbers are larger than the 64-bit word size of the processor. The RSA encryption/decryption behind the SSL browser padlock uses numbers which don't fit into the word size of a desktop computer, but nonetheless the programs on a desktop which deal with RSA certificates and keys must be able to process these data, which are actually strings.
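As a rough illustration of that decimal-string addition, here is a schoolbook "add with carry" that treats each number as an array of ASCII bytes. The function name `add_decimal_strings` is invented for this sketch; Python's own integers are already arbitrary precision, so the point is only to show the byte-level technique that languages without big integers rely on.

```python
def add_decimal_strings(a: bytes, b: bytes) -> bytes:
    """Add two non-negative decimal numbers given as ASCII digit bytes."""
    result = bytearray()
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    # Walk both byte arrays from the least significant digit.
    while i >= 0 or j >= 0 or carry:
        digit_a = a[i] - 0x30 if i >= 0 else 0  # 0x30 is ASCII '0'
        digit_b = b[j] - 0x30 if j >= 0 else 0
        carry, digit = divmod(digit_a + digit_b + carry, 10)
        result.append(0x30 + digit)
        i -= 1
        j -= 1
    result.reverse()  # digits were produced least significant first
    return bytes(result)


if __name__ == "__main__":
    # One more than the largest unsigned 64-bit value.
    print(add_decimal_strings(b"18446744073709551615", b"1").decode("ascii"))
```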
There are many and varied reasons you would want to deal with a string as an array of bytes, but each of these reasons would be fairly specialised.

Protocol Buffers - Best practice for repeated boolean values

I need to transfer some data over a relatively slow (down to only 1 Kb/s) connection. I have read that the encoding of Google's Protocol Buffers is efficient.
That's true for most of my data, but not for boolean values, especially in a repeated field.
The problem is that I have to transfer, besides other data, a fixed number (15) of boolean values every 50 milliseconds. Protobuf encodes each boolean value as one byte for the field ID and one byte for the boolean value (0x00 or 0x01), which results in 30 bytes of data for 15 boolean values.
So I am now searching for a better way of encoding this. Has anybody had this problem already? What would be the best practice to achieve an efficient encoding in this situation?
My idea was to use an integer data type (uint32) and encode the data manually, with one bit of the integer for each bool. Any feedback on this idea?
In Protobuf, your best bet is to use an integer bitfield. If you have more than 64 bits, use a bytes field (and pack the bits manually).
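A minimal sketch of the bitfield idea, assuming the 15 flags are carried in a single uint32 field; `pack_flags` and `unpack_flags` are hypothetical helper names, and the bit order (flag 0 in the least significant bit) is an arbitrary choice that sender and receiver simply have to agree on.

```python
from typing import List, Sequence


def pack_flags(flags: Sequence[bool]) -> int:
    """Pack up to 32 booleans into one integer, flag 0 in the lowest bit."""
    if len(flags) > 32:
        raise ValueError("a uint32 bitfield holds at most 32 flags")
    value = 0
    for index, flag in enumerate(flags):
        if flag:
            value |= 1 << index
    return value


def unpack_flags(value: int, count: int) -> List[bool]:
    """Recover the first `count` booleans from a packed integer."""
    return [bool((value >> index) & 1) for index in range(count)]


if __name__ == "__main__":
    flags = [True, False, True] + [False] * 11 + [True]  # 15 flags
    packed = pack_flags(flags)
    print(packed, unpack_flags(packed, 15) == flags)
```

A 15-bit value needs at most three varint bytes, so together with a one-byte field tag the 15 flags cost at most 4 bytes on the wire instead of 30.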
Note that Cap'n Proto will pack boolean values (in both structs and lists) as individual bits, and so may be worth looking at.
However, if you are extremely bandwidth-constrained, it may be best to develop your own custom protocol. Most of these serialization frameworks trade off a little bit of space for ease of use (especially when it comes to dealing with version skew), but in your case it may be more important to focus solely on size. A custom message format that just contains some bits should be easy enough to maintain and can be packed as tightly as you want.
(Disclosure: I am the author of Cap'n Proto, as well as most of Google's open source Protobuf code.)

Session Token Transmission

There is a requirement for my web application regarding sessions. The points are below:
Session Ids must be randomly generated.
Session Ids must be unpredictable.
The size of session Id should be large enough to ensure that it is not vulnerable to a brute force attack.
The character set should be complex, i.e. make use of special characters.
A length of 50 random characters is advised.
Essentially the answer to all those points is: use a good random number generator. Your system or platform should have some form of good (cryptographically secure) PRNG to offer; on UNIX that's /dev/random or /dev/urandom, on Windows it's the built-in crypto API. Your language of choice likely offers a wrapper API around those. Simply read a number of random bytes from that PRNG. If it's any good, those bytes will be random and unpredictable. The length is entirely up to you; just read enough data from the PRNG. The character set will be more than complex enough, since you'll receive raw bytes; you'll in fact have to "dumb them down" into the ASCII character set, for example by base64-encoding them.
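In Python, for instance, the standard `secrets` module is such a wrapper around the operating system's generator. The sketch below is one way to meet the listed requirements; the function name `new_session_id` is made up for this example, and 37 bytes is chosen only because its URL-safe base64 form (with padding stripped) is exactly 50 characters long.

```python
import secrets

TOKEN_BYTES = 37  # 37 random bytes encode to 50 URL-safe base64 characters


def new_session_id() -> str:
    """Return an unpredictable 50-character session ID."""
    return secrets.token_urlsafe(TOKEN_BYTES)


if __name__ == "__main__":
    print(new_session_id())
```

That is 296 bits of randomness, far beyond what a brute-force attack can enumerate.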

Encryption algorithm that output byte by byte based on password and offset

Is there a well-known (and well-regarded) algorithm that can encrypt/decrypt any arbitrary byte inside a file based on an entered password and the byte's offset inside the file?
(Databyte, Offset, Password) => EncryptedByte
(EncryptedByte, Offset, Password) => DataByte
And is there some fundamental weakness in this approach, or is it still theoretically possible to make it strong enough?
Update:
More details: any cryptographic algorithm has input and output. Many existing ones operate on large input blocks. I want to operate on a single byte, but a system that looks only at the byte itself can do no more than remap byte values and is weak by default. If we also take the byte's position in the file into account, we can, for example, interpret the bits of the position value as a sequence of operations (0: xor, 1: shift) and produce the encrypted byte that way. But that is too simple; I'm looking for something stronger.
Maybe it's not very efficient but how about this:
for encryption use:
encryptedDataByte = Encrypt(offset,key) ^ dataByte
for decryption use:
dataByte = Encrypt(offset,key) ^ encryptedDataByte
Where Encrypt(offset,key) might be e.g. 3DES or AES (with padding the offset, if needed, and throwing away all but one result bytes)
If you can live with block sizes of 16 bytes, you can try the XTS mode described in the Wikipedia article about disk encryption theory (the advantage being that some good cryptologists have already looked at it).
If you really need byte-wise encryption, I doubt that there is an established solution. At the Crypto 2009 conference there was a talk on "How to Encipher Messages on a Small Domain: Deterministic Encryption and the Thorp Shuffle". In your case the domain is a byte, and as this is a power of 2, a Thorp shuffle corresponds to a maximally unbalanced Feistel network. Maybe one can build something using the position and the password as key, but I'd be surprised if a home-made solution turned out to be secure.
You can use AES in Counter Mode, where you divide your input into blocks of 16 bytes (128 bits) and then basically encrypt a counter derived from the block number to get 16 pseudo-random bytes that you can XOR with the plaintext. It is critically important never to use the same counter start value (and/or initialization vector) with the same key again, or you open yourself to an easy attack: XORing two ciphertexts that share a keystream cancels the keystream and yields the XOR of the two plaintexts, and any known plaintext then reveals the keystream itself.
You mention that you want to only operate on individual bytes, but this approach would give you that flexibility. Output Feedback Mode is another common one, but you have to be careful in its use.
You might consider using the EAX mode for better security. Also, make sure you're using something like PBKDF2 or scrypt to derive your encryption key from the password.
However, as with most cryptography related issues, it's much better to use a rigorously tested and evaluated library rather than rolling your own.
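To make the warning about reusing a counter or IV concrete, here is a small demonstration. It does not use AES at all: the random `keystream` below merely stands in for the bytes a counter-mode cipher would produce for one key and counter value, and the messages are invented for the example.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def demo_keystream_reuse() -> bytes:
    """Show that a reused keystream lets known plaintext expose other plaintext."""
    keystream = os.urandom(16)  # stand-in for one block of counter-mode output
    plain1 = b"attack at dawn!!"
    plain2 = b"retreat at noon!"
    cipher1 = xor_bytes(plain1, keystream)
    cipher2 = xor_bytes(plain2, keystream)  # same keystream used again
    # The attacker never sees the keystream, only the two ciphertexts.
    leaked = xor_bytes(cipher1, cipher2)  # equals plain1 XOR plain2
    # Knowing (or guessing) plain1 is then enough to read plain2.
    return xor_bytes(leaked, plain1)


if __name__ == "__main__":
    print(demo_keystream_reuse())
```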
Basically what you need to do is generate some value X (probably 1 byte) based on the offset and password, and use this to encrypt/decrypt the byte at that offset. We'll call it
X = f(offset,password)
The problem is that an attacker that "knows something" about the file contents (e.g. the file is English text, or a JPEG) can come up with an estimate (or sometimes be certain) of what an X could be. So he has a "rough idea" about many X values, and for each of these he knows what the offset is. There is a lot of information available.
Now, it would be nice if all that information were of little use to the attacker. For most purposes, using a cryptographic hash function (like SHA-1) will give you a reasonable assurance of decent security.
But I must stress that if this is something critical, consult an expert.
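For illustration only, the function f above might be sketched with the Python standard library as follows. Several things here are my own assumptions rather than anything established in this thread: HMAC-SHA256 is swapped in as the keyed hash, the password is stretched with PBKDF2, the names `derive_key`, `keystream_byte` and `crypt_byte` are invented, and the salt is assumed to be random, unique per file, and stored alongside it. It provides no integrity protection and has not been reviewed, so treat it as a teaching sketch rather than something to deploy.

```python
import hashlib
import hmac

BLOCK = 32  # HMAC-SHA256 yields 32 keystream bytes per call


def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Stretch the password into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def keystream_byte(key: bytes, offset: int) -> int:
    """Compute X = f(offset, key): one pseudo-random byte for this file offset."""
    block_index, position = divmod(offset, BLOCK)
    block = hmac.new(key, block_index.to_bytes(8, "big"), hashlib.sha256).digest()
    return block[position]


def crypt_byte(data_byte: int, offset: int, key: bytes) -> int:
    """Encrypt or decrypt one byte; XOR makes the operation its own inverse."""
    return data_byte ^ keystream_byte(key, offset)


if __name__ == "__main__":
    key = derive_key("correct horse", salt=b"per-file-salt-16")
    encrypted = crypt_byte(ord("A"), 1_000_000, key)
    print(encrypted, chr(crypt_byte(encrypted, 1_000_000, key)))
```

Because the keystream depends only on the key and the offset, rewriting a byte at the same offset reuses the same X, which is exactly the kind of information leak described above.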
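One possibility is a one-time pad, possibly using the password to seed some pseudo-random number generator. One-time pads theoretically achieve perfect secrecy, but there are caveats: that guarantee holds only for a truly random pad that is as long as the data and never reused, so a pad generated from a password-seeded PRNG is in effect a stream cipher and only as strong as that generator. It should do what you're looking for, though.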
