Generate unique alpha-numeric object ID, like Parse - parse-platform

I already asked what I needed at the title. I want to generate using either PHP or javascript.
I think the class name and some properties are used to build the objectId but someone may already know how its done that could share here?

The Parse Server generates the objectId. It is a randomly generated String of 10 chars length. You can see their implementation at cryptoUtils.newObjectId(). From the code we can conclude that they are not enforcing uniqueness.
https://github.com/ParsePlatform/parse-server/blob/master/src/cryptoUtils.js

Parse is probably using ids generated in Mongodb. They are not random and can be potentially predicted :
A BSON ObjectID is a 12-byte value
consisting of a 4-byte timestamp
(seconds since epoch), a 3-byte
machine id, a 2-byte process id, and a
3-byte counter
http://www.mongodb.org/display/DOCS/Object+IDs

Related

Abbreviating a UUID

What would be a good way to abbreviate UUID for use in a button in a user interface when the id is all we know about the target?
GitHub seems to abbreviate commit ids by taking 7 characters from the beginning. For example b1310ce6bc3cc932ce5cdbe552712b5a3bdcb9e5 would appear in a button as b1310ce. While not perfect this shorter version is sufficient to look unique in the context where it is displayed. I'm looking for a similar solution that would work for UUIDs. I'm wondering is some part of the UUID is more random than another.
The most straight forward option would be splitting at dash and using the first part. The UUID 42e9992a-8324-471d-b7f3-109f6c7df99d would then be abbreviated as 42e9992a. All of the solutions I can come up with seem equally arbitrary. Perhaps there is some outside the box user interface design solution that I didn't think of.
Entropy of a UUID is highest in the first bits for UUID V1 and V2, and evenly distributed for V3, V4 and V5. So, the first N characters are no worse than any other N characters subset.
For N=8, i.e. the group before the first dash, the odds of there being a collision within a list you could reasonably display within a single GUI screen is vanishingly small.
The question is whether you want to show part of the UUID or only ensure that unique strings are presented as shorter unique strings. If you want to focus on the latter, which appears to be the goal you are suggesting in your opening paragraph:
(...) While not perfect this shorter version is sufficient to look unique in
the context where it is displayed. (...)
you can make use of hashing.
Hashing:
Hashing is the transformation of a string of characters into a usually
shorter fixed-length value or key that represents the original string.
Hashing is used to index and retrieve items in a database because it
is faster to find the item using the shorter hashed key than to find
it using the original value.
Hashing is very common and easy to use across many of popular languages; simple approach in Python:
import hashlib
import uuid
encoded_str = uuid.UUID('42e9992a-8324-471d-b7f3-109f6c7df99d').bytes
hash_uuid = hashlib.sha1(encoded_str).hexdigest()
hash_uuid[:10]
'b6e2a1c885'
Expectedly, a small change in string will result in a different string correctly showing uniqueness.
# Second digit is replaced with 3, rest of the string remains untouched
encoded_str_two = uuid.UUID('43e9992a-8324-471d-b7f3-109f6c7df99d').bytes
hash_uuid_two = hashlib.sha1(encoded_str_two).hexdigest()
hash_uuid_two[:10]
'406ec3f5ae'
After thinking about this for a while I realised that the short git commit hash is used as part of command line commands. Since this requirement does not exist for UUIDs and graphical user interfaces I simply decided to use ellipsis for the abbreviation. Like so 42e9992...

Encode array to fixed length string

I'm trying to implement a similar function to PCPartPicker's list permalink function.
https://au.pcpartpicker.com/list/
basically generate a permalink based on the items in the list. The key part is to generate a string which should be:
unique
persistent
fixed length
I'm thinking about encoding an array contains product id, but can't find the right way to implement it.
Base64 and the similar (like Hashids library) can ensure it's unique and persistent, but it ends up quite long when the array has many items.
Is there other way to encode the array or is there other direction I can implement this function?
Thank you in advance.
One can't generate unique fixed length string for arbitrary length list that will contain all info - there is always some length that can't fit.
Since your site has database, you can generate UUID and store list in DB along with UUID. To save space and efforts you can save it into DB only when user presses "get permalink" button or something like that.

What is the length and characters in ids generated by elasticsearch?

The documentation of elastic search states:
The index operation can be executed without specifying the id. In such
a case, an id will be generated automatically.
But it does not provide any information about the properties of the ids.
What is the length (minimun/maximum)?
my guess is 22.
Which characters are used in the id?
My guess is [-_A-Za-z0-9]
Can the properties of the generated ids change at any time (is that part of the API)?
Auto-generated ids are random base64-encoded UUIDs. The base64 algorithm is used in URL-safe mode hence - and _ characters might be present in ids.
Auto-generated ids by elasticsearch are exactly 20 characters length (not 22 characters) and encoded by url-safe base64 algorithm [-_A-Za-z0-9].
Read more in documentation: https://www.elastic.co/guide/en/elasticsearch/guide/master/index-doc.html#_autogenerating_ids

Compression algorithms for Strings

I have to generate QRCodes using concatenated object properties. These strings might be long, that's why I'd like to know which compression algorithm to use knowing that my String's length is between 25 an 100+ characters
thanks in advance,
Jerec
I am assuming that since you are going to use compression before you store the strings that these QR codes will not be readable by any client, it would have to be an application that you wrote (b/c you are storing character with an unknown encoding, the client won't be able to decode).
Instead of compressing and storing the long string in the QR code, have your application create a URI (like a GUID or a URL) and when your application decodes that URI it looks up all the values (uncompressed) that you wanted to store in the QR code. Then your app can just look up the format in any way it wants.
For example, assuming your persistant storage is an xml file, but it could be anything:
<URI = "http://mydomain.com/790C9704-8C61-435F-991D-CDBB5767AA3D">
<MyElement>14523</MyElement>
<MyElement>67548</MyElement>
...
<MyElement>46167</MyElement>
</URI>
Encoded on QR code: "http://mydomain.com/790C9704-8C61-435F-991D-CDBB5767AA3D", values can then be looked up.
The algorithm used to encode QR codes is dependent on the type of data you encode. See http://www.swetake.com/qr/qr1_en.html.
If you know, for example, that you always have the same number of digits per id and therefor could just string them together without punctuation, you can encode them as purely numeric and you'll use 10 bits for every three characters.
If you need some kind of separator, if you use something in "0-9A-Z $%*+-./:", you'll stay alphanumeric and get 2 characters in 11 bits.
If you give it arbitrary data (note that this includes any lower case: the list above does not include lower case letters) you're going to be using 8 bits per characters.
So numeric only would end up being 60% smaller.

MAPI: Format of PR_SEARCH_KEY

Does anyone know the format of the MAPI property PR_SEARCH_KEY?
The online documentation has this to say about it:
The search key is formed by
concatenating the address type (in
uppercase characters), the colon
character ':', the e-mail address in
canonical form, and the terminating
null character.
And the exchange document MS-OXOABK says this:
The PidTagSearchKey property of type
PtypBinary is a binary value formed by
concatenating the ASCII string "EX: "
followed by the DN for the object
converted to all upper case, followed
by a zero byte value.
However all the MAPI messages I've seen with this property have it as some sort of binary 16 byte sequence that looks like a GUID. Does anyone else have any more information about it? Is it always 16 bytes?
Thanks!
I believe that the property PR_SEARCH_KEY will be of different formats for different objects (as alluded to by Moishe).
A MAPI message object will have a unique value assigned on creation for PR_SEARCH_KEY, however if the object is copied this property value is copied also. I presume when you reply to an e-mail, Exchange will assign the PR_SEARCH_KEY value to be the original message's value.
You will need to inspect each object type to understand how the PR_SEARCH_KEY is formed but I doubt if it's always 16 bytes for all MAPI types.
This link USENET discussion has a good discussion with Dmitry Streblechenko involved who is an expert on Extended MAPI.
The sentence before the ones you quoted from the online docs reads, "MAPI uses specific rules for constructing search keys for message recipients" which makes me think that it's talking about the PR_SEARCH_KEY property on MAPI_MAILUSER objects -- or at least not on MAPI_MESSAGE objects.

Resources