Kotlin Convert long String letters to a numerical id - algorithm

I'm trying to find a way to convert a long string ID like "T2hR8VAR4tNULoglmIbpAbyvdRi1y02rBX" to a numerical id.
I thought about getting the ASCII value of each number and then adding them up but I don't think that this is a good way as different numbers can have the same result, for example, "ABC" and "BAC" will have the same result
A = 10, B = 20, C = 50,
ABC = 10 + 20 + 50 = 80
BAC = 20 + 10 + 50 = 80
I also thought about getting each letters ASCII code, then set the numbers next to each other for example "ABC"
so ABC = 102050
this method won't work as having a 20 letter String will result in a huge number, so how can I solve this problem? thank you in advance.

You can use the hashCode() function. "id".hashcode(). All objects implement a variance of this function.
From the documentation:
open fun hashCode(): Int
Returns a hash code value for the object. The general contract of hashCode is:
Whenever it is invoked on the same object more than once, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
If two objects are equal according to the equals() method, then calling the hashCode method on each of the two objects must produce the same integer result.
All platform object implements it by default. There is always a possibility for duplicates if you have lots of ids.
If you use a JVM based kotlin environment the hash will be produced by the
String.hashCode() function from the JVM.

If you need to be 100% confident that there are no possible duplicates, and the input Strings can be up to 20 characters long, then you cannot store the IDs in a 64-bit Long. You will have to use BigInteger:
val id = BigInteger(stringId.toByteArray())
At that point, I question whether there is any point in converting the ID to a numerical format. The String itself can be the ID.

Related

What's the best way to compress multiple values into deserializable value?

I'm implementing an openpeeps.com library for Flutter in which user can create their own peeps to use as an avatar within our product.
One of the reasons behind using peeps as avatar is that (in theory) it can be easily stored as a single value within a database.
A Peep within my library contains of up to 6 PeepAtoms:
class Peep {
final PeepAtom head;
final PeepAtom face;
final PeepAtom facialHair;
final PeepAtom? accessories;
final PeepAtom? body;
final PeepAtom? pose;
}
A PeepAtom is currently just a name identifying the underlying image file required to build a Peep:
class PeepAtom {
final String name;
}
How to get a hash?
What I'd like to do now is get a single value from a Peep (int or string) which I can store in a database. If I retrieve the data, I'd like to deconstruct the value into the unique atoms so I can render the appropriate atom images to display the Peep. While I'm not really looking to optimize for storage size, it would be nice if the bytesize would be small.
Since I'm normally not working with such stuff I don't have an idea what's the best option. These are my (naïve) ideas:
do a Peep.toJson and convert the output to base64. Likely inefficient due to a bunch of unnecessary characters.
do a PeepAtom.hashCode for each field within a Peep and upload this. As an array that would be 64bit = 8 Byte * 6 (Atoms). Thats pretty ok but not a single value.
since there are only a limited number of Atoms in each category (less than 100) I could use bitshifts and ^ to put this into one int. However, I think this would not really working because I'd need a unique identifier and since I'm code generating the PeepAtoms within my code that likely would be quite complex.
Any better ideas/algorithms?
I'm not sure what you mean by "quite complex". It looks quite simple to pack your atoms into a double.
Note that this is no way a "hash". A hash is a lossy operation. I presume that you want to recover the original data.
Based on your description, you need seven bits for each atom. They can range in 0..98 (since you said "less than 100"). A double has 52 bits of mantissa. Your six atoms needs 42 bits, so it fits easily. For atoms that can be null, just give that a special unused 7-bit value, like 127.
Now just use multiply and add to combine them. Use modulo and divide to pull them back out. E.g.:
double val = head;
val = val * 128 + face;
val = val * 128 + facialHair;
...
To extract:
int pose = val % 128;
val = (val / 128).floorToDouble();
int body = val % 128;
val = (val / 128).floorToDouble();
...

Enter string repeatedly Ruby Cucumber

I have a test that includes character lengths within fields etc.
I was wondering if I could have a set string of 10 characters like str = 'abcdefghij'
then have it multiply that string by the amount of times needed to fulfil the character length and fill in the field.
I've tried the times method but that just enters the same value over x iterations.
What I want is to take str, increase it ten fold and enter that value as 1 continuous string so abcdefghij becomes abcdefghijabcdefghijabcdefghijabcdefghijabcdefghij etc
I'd parameterize the number of times to increase it depending on the field I'm testing. I want to do this so that I don't have huge amounts of variables stored to satisfy each test.
Can this be done? I hope I've explained clearly.
String#* would do:
'abc' * 10
#⇒ "abcabcabcabcabcabcabcabcabcabc"
To use a floating point parameter:
λ = ->(input, count) do
i, f = *count.divmod(1)
input * i << input[0...(f * input.size).to_i]
end
λ.('abcd', 2.5)
#⇒ 'abcdabcdab'

Hashing a long integer ID into a smaller string

Here is the problem, where I need to transform an ID (defined as a long integer) to a smaller alfanumeric identifier. The details are the following:
Each individual on the problem as an unique ID, a long integer of size 13 (something like 123123412341234).
I need to generate a smaller representation of this unique ID, a alfanumeric string, something like A1CB3X. The problem is that 5 or 6 character length will not be enough to represent such a large integer.
The new ID (eg A1CB3X) should be valid in a context where we know that only a small number of individuals are present (less than 500). The new ID should be unique within that small set of individuals.
The new ID (eg A1CB3X) should be the result of a calculation made over the original ID. This means that taking the original ID elsewhere and applying the same calculation, we should get the same new ID (eg A1CB3X).
This calculation should occur when the individual is added to the set, meaning that not all individuals belonging to that set will be know at that time.
Any directions on how to solve such a problem?
Assuming that you don't need a formula that goes in both directions (which is impossible if you are reducing a 13-digit number to a 5 or 6-character alphanum string):
If you can have up to 6 alphanumeric characters that gives you 366 = 2,176,782,336 possibilities, assuming only numbers and uppercase letters.
To map your larger 13-digit number onto this space, you can take a modulo of some prime number slightly smaller than that, for example 2,176,782,317, the encode it with base-36 encoding.
alphanum_id = base36encode(longnumber_id % 2176782317)
For a set of 500, this gives you a
2176782317P500 / 2176782317500 chance of a collision
(P is permutation)
Best option is to change the base to 62 using case sensitive characters
If you want it to be shorter, you can add unicode characters. See below.
Here is javascript code for you: https://jsfiddle.net/vewmdt85/1/
function compress(n) {
var symbols = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïð'.split('');
var d = n;
var compressed = '';
while (d >= 1) {
compressed = symbols[(d - (symbols.length * Math.floor(d / symbols.length)))] + compressed;
d = Math.floor(d / symbols.length);
}
return compressed;
}
$('input').keyup(function() {
$('span').html(compress($(this).val()))
})
$('span').html(compress($('input').val()))
How about using some base-X conversion, for example 123123412341234 becomes 17N644R7CI in base-36 and 9999999999999 becomes 3JLXPT2PR?
If you need a mapping that works both directions, you can simply go for a larger base.
Meaning: using base 16, you can reduce 1 to 16 to a single character.
So, base36 is the "maximum" that allows for shorter strings (when 1-1 mapping is required)!

Does random_string function return unique string?

I'm using random_string('alnum', 10) function from String Helper available in Codeigniter 3.0. Does the string returned by function each time remain unique as well? If not then what can I use?
No It does not unique.
As Example
random_string('alnum', 1);//if you run this more than 63 you will get minimum one duplicate.
For unique you can use
random_string('unique');
See the Full documentaion
As the name suggests the string is random. That means with a certain (potentially small) probability this function will return the same string twice.
One way to make create a unique string is to have a counter variable which is not a string but a number. Every time you need a new unique string, you increment the counter and then convert the number to a string.

Generating integer within range from unique string in ruby

I have a code that should get unique string(for example, "d86c52ec8b7e8a2ea315109627888fe6228d") from client and return integer more than 2200000000 and less than 5800000000. It's important, that this generated int is not random, it should be one for one unique string. What is the best way to generate it without using DB?
Now it looks like this:
did = "d86c52ec8b7e8a2ea315109627888fe6228d"
min_cid = 2200000000
max_cid = 5800000000
cid = did.hash.abs.to_s.split.last(10).to_s.to_i
if cid < min_cid
cid += min_cid
else
while cid > max_cid
cid -= 1000000000
end
end
Here's the problem - your range of numbers has only 3.6x10^9 possible values where as your sample unique string (which looks like a hex integer with 36 digits) has 16^32 possible values (i.e. many more). So when mapping your string into your integer range there will be collisions.
The mapping function itself can be pretty straightforward, I would do something such as below (also, consider using only a part of the input string for integer conversion, e.g. the first seven digits, if performance becomes critical):
def my_hash(str, min, max)
range = (max - min).abs
(str.to_i(16) % range) + min
end
my_hash(did, min_cid, max_cid) # => 2461595789
[Edit] If you are using Ruby 1.8 and your adjusted range can be represented as a Fixnum, just use the hash value of the input string object instead of parsing it as a big integer. Note that this strategy might not be safe in Ruby 1.9 (per the comment by #DataWraith) as object hash values may be randomized between invocations of the interpreter so you would not get the same hash number for the same input string when you restart your application:
def hash_range(obj, min, max)
(obj.hash % (max-min).abs) + [min, max].min
end
hash_range(did, min_cid, max_cid) # => 3886226395
And, of course, you'll have to decide what to do about collisions. You'll likely have to persist a bucket of input strings which map to the same value and decide how to resolve the conflicts if you are looking up by the mapped value.
You could generate a 32-bit CRC, drop one bit, and add the result to 2.2M. That gives you a max value of 4.3M.
Alternately you could use all 32 bits of the CRC, but when the result is too large, append a zero to the input string and recalculate, repeating until you get a value in range.

Resources