What is the length and characters in ids generated by elasticsearch?

What is the length and characters in ids generated by elasticsearch? - elasticsearch

The documentation of elastic search states:
The index operation can be executed without specifying the id. In such
a case, an id will be generated automatically.
But it does not provide any information about the properties of the ids.
What is the length (minimun/maximum)?
my guess is 22.
Which characters are used in the id?
My guess is [-_A-Za-z0-9]
Can the properties of the generated ids change at any time (is that part of the API)?

Auto-generated ids are random base64-encoded UUIDs. The base64 algorithm is used in URL-safe mode hence - and _ characters might be present in ids.

Auto-generated ids by elasticsearch are exactly 20 characters length (not 22 characters) and encoded by url-safe base64 algorithm [-_A-Za-z0-9].
Read more in documentation: https://www.elastic.co/guide/en/elasticsearch/guide/master/index-doc.html#_autogenerating_ids

Related

How does GitHub encode their graphQL cursors?

GitHub's graphql cursors are intentionally opaque, so they shouldn't ever be decode by a client. However I'd like to know their approach towards pagination, especially when combined with sorting.

There are multiple layers of encoding for the encoding used for pagination cursors used by GitHub. I will list them in order from the perspective of a decoder:
The cursor string is encoded using URL safe base64 meaning it uses - and _ instead of + and /. This might be to have consistency with their REST based API.
Decoding the base64 string gives us another string in the format of cursor:v2:[something] so the next step is decoding the something.
The 'something' is a binary encoded piece of data containing the actual cursor properties. The first byte defines the cursor type:
0x91 => We don't use any sorting, the cursor contains the length of the id field and the id itself. 0xcd seems to indicate a two-byte id, 0xce a four-byte id. This is followed by the id itself, which can be verified by decoding the base64 id graphql field.
0x92 => A composite cursor containing the sorted property and the id. This is either a length-prefixed ordinal number or two bytes plus a string or ISO date string followed by the length-prefixed id.

Encode array to fixed length string

I'm trying to implement a similar function to PCPartPicker's list permalink function.
https://au.pcpartpicker.com/list/
basically generate a permalink based on the items in the list. The key part is to generate a string which should be:
unique
persistent
fixed length
I'm thinking about encoding an array contains product id, but can't find the right way to implement it.
Base64 and the similar (like Hashids library) can ensure it's unique and persistent, but it ends up quite long when the array has many items.
Is there other way to encode the array or is there other direction I can implement this function?
Thank you in advance.

One can't generate unique fixed length string for arbitrary length list that will contain all info - there is always some length that can't fit.
Since your site has database, you can generate UUID and store list in DB along with UUID. To save space and efforts you can save it into DB only when user presses "get permalink" button or something like that.

Generate unique alpha-numeric object ID, like Parse

I already asked what I needed at the title. I want to generate using either PHP or javascript.
I think the class name and some properties are used to build the objectId but someone may already know how its done that could share here?

The Parse Server generates the objectId. It is a randomly generated String of 10 chars length. You can see their implementation at cryptoUtils.newObjectId(). From the code we can conclude that they are not enforcing uniqueness.
https://github.com/ParsePlatform/parse-server/blob/master/src/cryptoUtils.js

Parse is probably using ids generated in Mongodb. They are not random and can be potentially predicted :
A BSON ObjectID is a 12-byte value
consisting of a 4-byte timestamp
(seconds since epoch), a 3-byte
machine id, a 2-byte process id, and a
3-byte counter
http://www.mongodb.org/display/DOCS/Object+IDs

Compression algorithms for Strings

I have to generate QRCodes using concatenated object properties. These strings might be long, that's why I'd like to know which compression algorithm to use knowing that my String's length is between 25 an 100+ characters
thanks in advance,
Jerec

I am assuming that since you are going to use compression before you store the strings that these QR codes will not be readable by any client, it would have to be an application that you wrote (b/c you are storing character with an unknown encoding, the client won't be able to decode).
Instead of compressing and storing the long string in the QR code, have your application create a URI (like a GUID or a URL) and when your application decodes that URI it looks up all the values (uncompressed) that you wanted to store in the QR code. Then your app can just look up the format in any way it wants.
For example, assuming your persistant storage is an xml file, but it could be anything:
<URI = "http://mydomain.com/790C9704-8C61-435F-991D-CDBB5767AA3D">
<MyElement>14523</MyElement>
<MyElement>67548</MyElement>
...
<MyElement>46167</MyElement>
</URI>
Encoded on QR code: "http://mydomain.com/790C9704-8C61-435F-991D-CDBB5767AA3D", values can then be looked up.

The algorithm used to encode QR codes is dependent on the type of data you encode. See http://www.swetake.com/qr/qr1_en.html.
If you know, for example, that you always have the same number of digits per id and therefor could just string them together without punctuation, you can encode them as purely numeric and you'll use 10 bits for every three characters.
If you need some kind of separator, if you use something in "0-9A-Z $%*+-./:", you'll stay alphanumeric and get 2 characters in 11 bits.
If you give it arbitrary data (note that this includes any lower case: the list above does not include lower case letters) you're going to be using 8 bits per characters.
So numeric only would end up being 60% smaller.

MAPI: Format of PR_SEARCH_KEY

Does anyone know the format of the MAPI property PR_SEARCH_KEY?
The online documentation has this to say about it:
The search key is formed by
concatenating the address type (in
uppercase characters), the colon
character ':', the e-mail address in
canonical form, and the terminating
null character.
And the exchange document MS-OXOABK says this:
The PidTagSearchKey property of type
PtypBinary is a binary value formed by
concatenating the ASCII string "EX: "
followed by the DN for the object
converted to all upper case, followed
by a zero byte value.
However all the MAPI messages I've seen with this property have it as some sort of binary 16 byte sequence that looks like a GUID. Does anyone else have any more information about it? Is it always 16 bytes?
Thanks!

I believe that the property PR_SEARCH_KEY will be of different formats for different objects (as alluded to by Moishe).
A MAPI message object will have a unique value assigned on creation for PR_SEARCH_KEY, however if the object is copied this property value is copied also. I presume when you reply to an e-mail, Exchange will assign the PR_SEARCH_KEY value to be the original message's value.
You will need to inspect each object type to understand how the PR_SEARCH_KEY is formed but I doubt if it's always 16 bytes for all MAPI types.
This link USENET discussion has a good discussion with Dmitry Streblechenko involved who is an expert on Extended MAPI.

The sentence before the ones you quoted from the online docs reads, "MAPI uses specific rules for constructing search keys for message recipients" which makes me think that it's talking about the PR_SEARCH_KEY property on MAPI_MAILUSER objects -- or at least not on MAPI_MESSAGE objects.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

What is the length and characters in ids generated by elasticsearch? - elasticsearch

Auto-generated ids are random base64-encoded UUIDs. The base64 algorithm is used in URL-safe mode hence - and _ characters might be present in ids.

Auto-generated ids by elasticsearch are exactly 20 characters length (not 22 characters) and encoded by url-safe base64 algorithm [-_A-Za-z0-9]. Read more in documentation: https://www.elastic.co/guide/en/elasticsearch/guide/master/index-doc.html#_autogenerating_ids

Related

How does GitHub encode their graphQL cursors?

Encode array to fixed length string

Generate unique alpha-numeric object ID, like Parse

Compression algorithms for Strings

MAPI: Format of PR_SEARCH_KEY

Categories

Resources