Creating hash function to map 6 numbers to a short string - algorithm

I have 6 variables 0 ≤ n₁,...,n₆ ≤ 12 and I'd like to build a hash function to do the direct mapping D(n₁,n₂,n₃,n₄,n₅,n₆) = S and another function to do the inverse mapping I(S) = (n₁,n₂,n₃,n₄,n₅,n₆), where S is a string (a-z, A-Z, 0-9).
My goal is to minimize the length of S for 3 or less.
I thought as the variables have 13 possible values, a single letter (a-z) should be able to represent 2 of them, but I realized that 1 + 12 = m and 2 + 11 = m, so I still don't know how to write a function.
Is there any approach to build a function that does this mapping and returns a small string?
Using the whole ASCII to represent S is an option if it's necessary.

You can convert a set of numbers in any given range to numbers in any other range using base conversion.
Binary is base 2 (0-1), decimal is base 10 (0-9). Your 6 numbers are base 13 (0-12).
Checking whether a conversion would be possible involves counting the number of possible combinations of values for each set. With each number in the range [0,n] (thus base n+1), we can go from all 0's to all n's, thus each number can take on n+1 values and the total number of possibilities is (n+1)numberCount. For 6 decimal digits, for example, it would be 106 = 1000000, which checks out, since there are 1000000 possible numbers with (at most) 6 digits, i.e. numbers < 1000000.
Lower- and uppercase letters and numbers (26+26+10) would be base 62 (0-61), but, following from the above, 3 such values would be insufficient to represent your 6 numbers (136 > 623). To do conversion from/to these, you can do the conversion to a set of base 62 numbers, then have appropriate if-statements to convert 0-9 <=> 0-9, a-z <=> 10-35, A-Z <=> 36-61.
You can represent your data in 3 bytes (since 2563 >= 136), although this wouldn't necessary be printable characters - 32-126 is considered the standard printable range (which is still too small of a range), 128-255 is the extended range and may not be displayed properly in any given environment (to give the best chance of properly displaying it, you should at least avoid 0-31 and 127, which are control characters - you can convert 0-... to the above ranges by adding 32 and then adding another 1 if the value is >= 127).
Many / most languages should allow you to give a numeric value to represent a character, so it should be fairly simple to output it once you do the base conversion. Although some may use Unicode to represent characters, which could make it a bit less trivial to work with ASCII.
If the numbers had specific constraints, that would reduce the number of possible combinations, thus possibly making it fit into a smaller set or range of numbers.
To do the actual base conversion:
It might be simplest to first convert it to a regular integral type (typically binary or decimal), where we don't have to worry about the base, and then convert it to the target base (although first make sure your value will fit in whichever data type you're using).
Consider how binary works:
1101 is 13 = 23 + 22 + 20
13 % 2 = 1 13 / 2 = 6
6 % 2 = 0 6 / 2 = 3
3 % 2 = 1 3 / 2 = 1
1 % 2 = 1
The above, from top to bottom: 1101 = our number
Using the same idea, we can convert to/from any base as follows: (pseudo-code)
int convertFromBase(array, base):
output = 0
for each i in array
output = base*output + i
return output
int[] convertToBase(num, base):
output = []
while num > 0
output.append(num % base)
num /= base
output.reverse()
return output
You can also extend this logic to situations where each number is in a different range by changing what you divide or multiple by at each step (a detailed explanation of that is perhaps a bit beyond the scope of the question).

I thought as the variables have 13 possible values, a single letter
(a-z) should be able to represent 2 of them
This reasoning is wrong. In fact to represent two variables (=any combination these variables might take) you will need 13x13 = 169 symbols.
For your example the 6 variables can take 13^6 (=4826809) different combinations. In order to represent all possible combinations you will need 5 letters (a-z) since 26^5 (=11881376) is the least amount that is will yield more than 13^6 combinations.
For ASCII characters 3 symbols should suffice since 256^3 > 13^6.
If you are still interested in code that does the conversion, I will be happy to help.

Related

How to find A%B for A and B are very large numbers (stored in string)

If I have two numbers A and B and I want to compute A%B for A and B are very large (as large as 10^100), both are stored in strings, how can I achieve that?
10^100 is actually not that large, so you can just use a language like Python, which do not have an explicitly defined limit on the size of long integers.
If you want to just do the calculation once, you can just use a big number calculator like here.
If you want to work this out for fun, you can implement the division algorithm directly on the string representation.
Addition of two numbers is not a big deal (from right to left, add the ASCII values and deduct that of 0; carry if necessary). Subtraction is similar. And the comparison of two numbers is also very similar.
Multiplication of a number by a digit is also manageable (from right to left, convert ASCII->digit, perform the multiply and convert the rightmost digit to ASCII; carries can be larger but will fit in an int).
The key operation is: given a dividend and a divisor, find the leftmost digit of the quotient.
E.g.
3452 : 27
27 fits once in 34, hence the first digit is 1. Now subtract and get the next digit
3452
-27
= 752
27 fits 2 times in 75, and
752
-54
=212
Finally, 27 fits 7 times in 212 and
212
-189
= 23
which is the remainder.

Check the Number of Powers of 2

I have a number X , I want to check the number of powers of 2 it have ?
For Ex
N=7 ans is 2 , 2*2
N=20 ans is 4, 2*2*2*2
Similar I want to check the next power of 2
For Ex:
N=14 Ans=16
Is there any Bit Hack for this without using for loops ?
Like we are having a one line solution to check if it's a power of 2 X&(X-1)==0,similarly like that ?
GCC has a built-in instruction called __builtin_clz() that returns the number of leading zeros in an integer. So for example, assuming a 32-bit int, the expression p = 32 - __builtin_clz(n) will tell you how many bits are needed to store the integer n, and 1 << p will give you the next highest power of 2 (provided p<32, of course).
There are also equivalent functions that work with long and long long integers.
Alternatively, math.h defines a function called frexp() that returns the base-2 exponent of a double-precision number. This is likely to be less efficient because your integer will have to be converted to a double-precision value before it is passed to this function.
A number is power of two if it has only single '1' in its binary value. For example, 2 = 00000010, 4 = 00000100, 8 = 00001000 and so on. So you can check it using counting the no. of 1's in its bit value. If count is 1 then the number is power of 2 and vice versa.
You can take help from here and here to avoid for loops for counting set bits.
If count is not 1 (means that Value is not power of 2) then take position of its first set bit from MSB and the next power of 2 value to this number is the value having only set bit at position + 1. For example, number 3 = 00000011. Its first set bit from MSB is 2nd bit. Therefore the next power of 2 number is a value having only set bit at 3rd position. i.e. 00000100 = 4.

Generate a unique number out of the combination of 'n' different numbers?

To clarify, as input I have 'n' (n1, n2, n3,...) numbers (integers) such as each number is unique within this set.
I would like to generate a number out of this set (lets call the generated number big 'N') that is also unique, and that allows me to verify that a number 'n1' belongs to the set 'n' just by using 'N'.
is that possible?
Edit:
Thanks for the answers guys, I am looking into them atm. For those requesting an example, here is a simple one:
imagine i have those paths (bi-directional graph) with a random unique value (let's call it identifier):
P1 (N1): A----1----B----2----C----3----D
P2 (N2): A----4----E----5----D
So I want to get the full path (unique path, not all paths) from A knowing N1 and this path as a result should be P1.
Mind you that 1,2,...are just unique numbers in this graph, not weights or distances, I just use them for my heuristic.
If you are dealing with small numbers, no problem. You are doing the same thing with digits every time you compose a number: a digit is a number from 0 to 9 and a full number is a combination of them that:
is itself a number
is unique for given digits
allows you to easily verify if a digit is inside
The gotcha is that the numbers must have an upper limit, like 10 is for digits. Let's say 1000 here for simplicity, the similar composed number could be:
n1*1000^k + n2*1000^(k-1) + n3*1000^(k-2) ... + nk*1000^(0)
So if you have numbers 33, 44 and 27 you will get:
33*1000000 + 44*1000 + 27, and that is number N: 33044027
Of course you can do the same with bigger limits, and binary like 256,1024 or 65535, but it grows big fast.
A better idea, if possible is to convert it into a string (a string is still a number!) with some separator (a number in base 11, that is 10 normal digits + 1 separator digit). This is more flexible as there are no upper limits. Imagine to use digits 0-9 + a separator digit 'a'. You can obtain number 33a44a27 in base 11. By translating this to base 10 or base 16 you can get an ordinary computer number (65451833 if I got it right). Then converting 65451833 to undecimal (base11) 33a44a27, and splitting by digit 'a' you can get the original numbers back to test.
EDIT: A VARIABLE BASE NUMBER?
Of course this would work better digitally in base 17 (16 digits+separator). But I suspect there are more optimal ways, for example if the numbers are unique in the path, the more numbers you add, the less are remaining, the shorter the base could shrink. Can you imagine a number in which the first digit is in base 20, the second in base 19, the third in base 18, and so on? Can this be done? Meh?
In this variating base world (in a 10 nodes graph), path n0-n1-n2-n3-n4-n5-n6-n7-n8-n9 would be
n0*10^0 + (n1*9^1)+(offset:1) + n2*8^2+(offset:18) + n3*7^3+(offset:170)+...
offset1: 10-9=1
offset2: 9*9^1-1*8^2+1=81-64+1=18
offset3: 8*8^2-1*7^3+1=343-512+1=170
If I got it right, in this fiddle: http://jsfiddle.net/Hx5Aq/ the biggest number path would be: 102411
var path="9-8-7-6-5-4-3-2-1-0"; // biggest number
o2=(Math.pow(10,1)-Math.pow(9,1)+1); // offsets so digits do not overlap
o3=(Math.pow(9,2)-Math.pow(8,2)+1);
o4=(Math.pow(8,3)-Math.pow(7,3)+1);
o5=(Math.pow(7,4)-Math.pow(6,4)+1);
o6=(Math.pow(6,5)-Math.pow(5,5)+1);
o7=(Math.pow(5,6)-Math.pow(4,6)+1);
o8=(Math.pow(4,7)-Math.pow(3,7)+1);
o9=(Math.pow(3,8)-Math.pow(2,8)+1);
o10=(Math.pow(2,9)-Math.pow(1,9)+1);
o11=(Math.pow(1,10)-Math.pow(0,10)+1);
var n=path.split("-");
var res;
res=
n[9]*Math.pow(10,0) +
n[8]*Math.pow(9,1) + o2 +
n[7]*Math.pow(8,2) + o3 +
n[6]*Math.pow(7,3) + o4 +
n[5]*Math.pow(6,4) + o5 +
n[4]*Math.pow(5,5) + o6 +
n[3]*Math.pow(4,6) + o7 +
n[2]*Math.pow(3,7) + o8 +
n[1]*Math.pow(2,8) + o9 +
n[0]*Math.pow(1,9) + o10;
alert(res);
So N<=102411 would represent any path of ten nodes? Just a trial. You have to find a way of naming them, for instance if they are 1,2,3,4,5,6... and you use 5 you will have to compact the remaining 1,2,3,4,6->5,7->6... => 1,2,3,4,5,6... (that is revertable and unique if you start from the first)
Theoretically, yes it is.
By defining p_i as the i'th prime number, you can generate N=p_(n1)*p_(n2)*..... Now, all you have to do is to check if N%p_(n) == 0 or not.
However, note that N will grow to huge numbers very fast, so I am not sure this is a very practical solution.
One very practical probabilistic solution is using bloom filters. Note that bloom filters is a set of bits, that can be translated easily to any number N.
Bloom filters have no false negatives (if you said a number is not in the set, it really isn't), but do suffer from false positives with an expected given probability (that is dependent on the size of the sets, number of functions used and number of bits used).
As a side note, to get a result that is 100% accurate, you are going to need at the very least 2^k bits (where k is the range of the elements) to represent the number N by looking at this number as a bitset, where each bit indicates existence or non-existence of a number in the set. You can show that there is no 100% accurate solution that uses less bits (peigeon hole principle). Note that for integers for example with 32 bits, it means you are going to need N with 2^32 bits, which is unpractical.

String to Number and back algorithm

This is a hard one (for me) I hope people can help me. I have some text and I need to transfer it to a number, but it has to be unique just as the text is unique.
For example:
The word 'kitty' could produce 12432, but only the word kitty produces that number. The text could be anything and a proper number should be given.
One problem the result integer must me a 32-bit unsigned integer, that means the largest possible number is 2147483647. I don't mind if there is a text length restriction, but I hope it can be as large as possible.
My attempts. You have the letters A-Z and 0-9 so one character can have a number between 1-36. But if A = 1 and B = 2 and the text is A(1)B(2) and you add it you will get the result of 3, the problem is the text BA produces the same result, so this algoritm won't work.
Any ideas to point me in the right direction or is it impossible to do?
Your idea is generally sane, only needs to be developed a little.
Let f(c) be a function converting character c to a unique number in range [0..M-1]. Then you can calculate result number for the whole string like this.
f(s[0]) + f(s[1])*M + f(s[2])*M^2 + ... + f(s[n])*M^n
You can easily prove that number will be unique for particular string (and you can get string back from the number).
Obviously, you can't use very long strings here (up to 6 characters for your case), as 36^n grows fast.
Imagine you were trying to store Strings from the character set "0-9" only in a number (the equivalent of obtaining a number of a string of digits). What would you do?
Char 9 8 7 6 5 4 3 2 1 0
Str 0 5 2 1 2 5 4 1 2 6
Num = 6 * 10^0 + 2 * 10^1 + 1 * 10^2...
Apply the same thing to your characters.
Char 5 4 3 2 1 0
Str A B C D E F
L = 36
C(I): transforms character to number: C(0)=0, C(A)=10, C(B)=11, ...
Num = C(F) * L ^ 0 + C(E) * L ^ 1 + ...
Build a dictionary out of words mapped to unique numbers and use that, that's the best you can do.
I doubt there are more than 2^32 number of words in use, but this is not the problem you're facing, the problem is that you need to map numbers back to words.
If you were only mapping words over to numbers, some hash algorithm might work, although you'd have to work a bit to guarantee that you have one that won't produce collisions.
However, for numbers back to words, that's quite a different problem, and the easiest solution to this is to just build a dictionary and map both ways.
In other words:
AARDUANI = 0
AARDVARK = 1
...
If you want to map numbers to base 26 characters, you can only store 6 characters (or 5 or 7 if I miscalculated), but not 12 and certainly not 20.
Unless you only count actual words, and they don't follow any good countable rules. The only way to do that is to just put all the words in a long list, and start assigning numbers from the start.
If it's correctly spelled text in some language, you can have a number for each word. However you'd need to consider all possible plurals, place and people names etc. which is generally impossible. What sort of text are we talking about? There's usually going to be some existing words that can't be coded in 32 bits in any way without prior knowledge of them.
Can you build a list of words as you go along? Just give the first word you see the number 1, second number 2 and check if a word has a number already or it needs a new one. Then save your newly created dictionary somewhere. This would likely be the only workable solution if you require 100% reliable, reversible mapping from the numbers back to original words given new unknown text that doesn't follow any known pattern.
With 64 bits and a sufficiently good hash like MD5 it's extremely unlikely to have collisions, but for 32 bits it doesn't seem likely that a safe hash would exist.
Just treat each character as a digit in base 36, and calculate the decimal equivalent?
So:
'A' = 0
'B' = 1
[...]
'Z' = 25
'0' = 26
[...]
'9' = 35
'AA' = 36
'AB' = 37
[...]
'CAB' = 46657

How many digits will be after converting from one numeral system to another

The main question: How many digits?
Let me explain. I have a number in binary system: 11000000 and in decimal is 192.
After converting to decimal, how many digits it will have (in dicimal)? In my example, it's 3 digits. But, it isn't a problem. I've searched over internet and found one algorithm for integral part and one for fractional part. I'm not quite understand them, but (I think) they works.
When converting from binary to octal, it's more easy: each 3 bits give you 1 digit in octal. Same for hex: each 4 bits = 1 hex digit.
But, I'm very curious, what to do, if I have a number in P numeral system and want to convert it to the Q numeral system? I know how to do it (I think, I know :)), but, 1st of all, I want to know how many digits in Q system it will take (u no, I must preallocate space).
Writing n in base b takes ceiling(log base b (n)) digits.
The ratio you noticed (octal/binary) is log base 8 (n) / log base 2 (n) = 3.
(From memory, will it stick?)
There was an error in my previous answer: look at the comment by Ben Schwehn.
Sorry for the confusion, I found and explain the error I made in my previous answer below.
Please use the answer provided by Paul Tomblin. (rewritten to use P, Q and n)
Y = ln(P^n) / ln(Q)
Y = n * ln(P) / ln(Q)
So Y (rounded up) is the number of characters you need in system Q to express the highest number you can encode in n characters in system P.
I have no answer (that wouldn't convert the number already and take up that many space in a temporary variable) to get the bare minimum for a given number 1000(bin) = 8(dec) while you would reserve 2 decimal positions using this formula.
If a temporary memory usage isn't a problem, you might cheat and use (Python):
len(str(int(otherBaseStr,P)))
This will give you the number of decimals needed to convert a number in base P, cast as a string (otherBaseStr), into decimals.
Old WRONG answer:
If you have a number in P numeral system of length n
Then you can calculate the highest number that is possible in n characters:
P^(n-1)
To express this highest number in number system Q you need to use logarithms (because they are the inverse to exponentiation):
log((P^(n-1))/log(Q)
(n-1)*log(P) / log(Q)
For example
11000000 in binary is 8 characters.
To get it in Decimal you would need:
(8-1)*log(2) / log(10) = 2.1 digits (round up to 3)
Reason it was wrong:
The highest number that is possible in n characters is
(P^n) - 1
not
P^(n-1)
If you have a number that's X digits long in base B, then the maximum value that can be represented is B^X - 1. So if you want to know how many digits it might take in base C, then you have to find the number Y that C^Y - 1 is at least as big as B^X - 1. The way to do that is to take the logarithm in base C of B^X-1. And since the logarithm (log) of a number in base C is the same as the natural log (ln) of that number divided by the natural log of C, that becomes:
Y = ln((B^X)-1) / ln(C) + 1
and since ln(B^X) is X * ln(B), and that's probably faster to calculate than ln(B^X-1) and close enough to the right answer, rewrite that as
Y = X * ln(B) / ln(C) + 1
Covert that to your favourite language. Because we dropped the "-1", we might end up with one digit more than you need in some cases. But even better, you can pre-calculate ln(B)/ln(C) and just multiply it by new "X"s and the length of the number you are trying to convert changes.
Calculating the number of digit can be done using the formulas given by the other answers, however, it might actually be faster to allocate a buffer of maximum size first and then return the relevant part of that buffer instead of calculating a logarithm.
Note that the worst case for the buffer size happens when you convert to binary, which gives you a buffer size of 32 characters for 32-bit integers.
Converting a number to an arbitrary base could be done using the C# function below (The code would look very similar in other languages like C or Java):
public static string IntToString(int value, char[] baseChars)
{
// 32 is the worst cast buffer size for base 2 and int.MaxValue
int i = 32;
char[] buffer = new char[i];
int targetBase= baseChars.Length;
do
{
buffer[--i] = baseChars[value % targetBase];
value = value / targetBase;
}
while (value > 0);
char[] result = new char[32 - i];
Array.Copy(buffer, i, result, 0, 32 - i);
return new string(result);
}
The keyword here is "logarithm", here are some suggestive links:
http://www.adug.org.au/MathsCorner/MathsCornerLogs2.htm
http://staff.spd.dcu.ie/johnbcos/download/Fermat%20material/Fermat_Record_Number/HOW_MANY.html
look at the logarithms base P and base Q. Round down to nearest integer.
The logarithm base P can be computed using your favorite base (10 or e): log_P(x) = log_10(x)/log_10(P)
You need to compute the length of the fractional part separately.
For binary to decimal, there are as many decimal digits as there are bits. For example, binary 0.11001101001001 is decimal 0.80133056640625, both 14 digits after the radix point.
For decimal to binary, there are two cases. If the decimal fraction is dyadic, then there are as many bits as decimal digits (same as for binary to decimal above). If the fraction is not dyadic, then the number of bits is infinite.
(You can use my decimal/binary converter to experiment with this.)

Resources