Bijective "Integer <-> String" function - algorithm

Here's a problem I'm trying to find the best solution for. I have a finite set of non-negative integers in the range [0...N]. I need to be able to represent each number in this set as a string and to convert such a string back to the original number. So this should be a bijective function.
Additional requirements are:
1. The string representation of a number should obfuscate the original number at least to some degree, so a primitive solution like f(x) = x.toString() will not work.
2. String length is important: the shorter, the better.
3. If one knows the string representation of K, I would like it to be non-trivial (to some degree) to guess the string representation of K+1.
For p.1 & p.2, the obvious solution is to use something like Base64 notation (or whatever BaseXXX fits all the values). But can we also satisfy p.3 with minimal additional effort? Common sense tells me that I additionally need a bijective "String <-> String" function for BaseXXX values. Any suggestions?
Or maybe there's something better than BaseXXX to use to fit all 3 requirements?

If you do not need this to be too secure, you can just use a simple symmetric cipher after encoding in BaseXXX. For example you can choose a key sequence of integers [n₁, n₂, n₃...] and then use a Vigenere cipher.
The basic idea behind the cipher is simple: encode each character C as C + K (mod 26), where K is an element from the key. As you go along, take the next number from the key for the next character, wrapping around once you run out of values in the key.
You really have two options here: you can first convert a number to a string in BaseXXX and then encrypt it, or you can use the same idea to encrypt each number directly as a single character. In that case, you would change the modulus from 26 to N + 1.
Come to think of it, an even simpler option would be to just XOR the element from the key with the value (as opposed to using the Vigenère formula). I think this would work just as well for obfuscation.
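Here is a minimal Python sketch of the first option (convert to BaseXXX, then apply Vigenère). It rests on a few assumptions: a 62-symbol alphabet (digits plus ASCII letters) and a hypothetical secret key KEY. Because the alphabet has 62 symbols, the shift is taken mod 62 rather than mod 26.

```python
import string

ALPHABET = string.digits + string.ascii_letters  # assumed 62-symbol BaseXXX alphabet
BASE = len(ALPHABET)
KEY = [17, 4, 42, 9]  # hypothetical secret key sequence

def to_base(x):
    """Plain base-62 digits of a non-negative int (no leading zeros)."""
    digits = []
    while True:
        x, r = divmod(x, BASE)
        digits.append(r)
        if x == 0:
            break
    return digits[::-1]

def encode(x, key):
    """Shift each base-62 digit by the next key element (Vigenère, mod 62)."""
    return ''.join(ALPHABET[(d + key[i % len(key)]) % BASE]
                   for i, d in enumerate(to_base(x)))

def decode(s, key):
    """Undo the shifts, then read the digits back as a base-62 number."""
    x = 0
    for i, ch in enumerate(s):
        d = (ALPHABET.index(ch) - key[i % len(key)]) % BASE
        x = x * BASE + d
    return x
```

The shift is applied per position, so the encodings of K and K+1 usually still share every character except the last. That means this addresses p.3 only weakly.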

This method meets requirements 1-3, but it is perhaps a bit too computationally expensive:
find a prime p > N+2, not too much larger
find a primitive root g modulo p, that is, a number whose multiplicative order modulo p is p-1
for 0 <= k <= N, let enc(k) = min {j > 0 : g^j == (k+2) (mod p)}
f(k) = enc(k).toString()
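Below is a minimal Python sketch of these four steps. It assumes N is small, because it builds the whole discrete-logarithm table by brute force, and that table is exactly the expensive part. Primality is checked by trial division, and make_codec is a hypothetical name. Decoding, by contrast, is cheap: it is just g^j mod p, minus 2.

```python
def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_factors(n):
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def make_codec(N):
    # Step 1: smallest prime p > N + 2.
    p = N + 3
    while not is_prime(p):
        p += 1
    # Step 2: g is a primitive root iff g^((p-1)/q) != 1 for every prime q | p-1.
    phi = p - 1
    qs = prime_factors(phi)
    g = 2
    while any(pow(g, phi // q, p) == 1 for q in qs):
        g += 1
    # Step 3: discrete-log table, log[y] = the unique j in 1..p-1 with g^j == y.
    log, y = {}, 1
    for j in range(1, p):
        y = y * g % p
        log[y] = j
    def enc(k):
        return str(log[k + 2])          # Step 4: f(k) = enc(k).toString()
    def dec(s):
        return pow(g, int(s), p) - 2    # inverse: g^j mod p, minus 2
    return enc, dec
```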

Construct a table of length M. This table should map the numbers 0 through M-1 to distinct short strings with a random ordering. Express the integer as a base-M number, using the strings from the table to represent the digits in the number. Decode with a straightforward reversal.
With M=26, you could just use a letter for each of the digits. Or take M=256 and use a byte for each digit.
Not even remotely a good cryptographic approach!
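A minimal Python sketch of the table approach with M = 26, assuming the table is one secretly shuffled lowercase letter per digit. The seed stands in for whatever secret you use to fix the ordering; make_table and the seed value are hypothetical.

```python
import random
import string

def make_table(m=26, seed=12345):
    """Secret, randomly ordered table mapping 0..m-1 to distinct letters (m <= 26)."""
    symbols = list(string.ascii_lowercase[:m])
    random.Random(seed).shuffle(symbols)
    return symbols

def encode(x, table):
    """Write x in base m, using table[d] for digit d."""
    m = len(table)
    out = []
    while True:
        x, r = divmod(x, m)
        out.append(table[r])
        if x == 0:
            break
    return ''.join(reversed(out))

def decode(s, table):
    """Straightforward reversal: map each symbol back to its digit."""
    m = len(table)
    index = {sym: i for i, sym in enumerate(table)}
    x = 0
    for ch in s:
        x = x * m + index[ch]
    return x
```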

So you need a string that obfuscates the original number, but allows one to determine str(K+1) when str(K) is known?
How about just doing f(x) = (x + a).toString(), where a is secret? Then an outside user can't determine x from f(x), but they can be confident that if they have a string "1234", say, for an unknown x then "1235" maps to x+1.

P. 1 and p. 3 are somewhat contradictory, and a bit vague, too.
I would propose using the hex representation of the integers:
17 => 0x11
123123 => 0x1E0F3

Related

Generating a perfect hash function given known list of strings?

Suppose I have a list of N strings, known at compile-time.
I want to generate (at compile-time) a function that will map each string to a distinct integer between 1 and N inclusive. The function should take very little time or space to execute.
For example, suppose my strings are:
{"apple", "orange", "banana"}
Such a function may return:
f("apple") -> 2
f("orange") -> 1
f("banana") -> 3
What's a strategy to generate this function?
I was thinking of analyzing the strings at compile time and looking for a couple of constants I could mod or add by, or something like that?
The compile-time generation time/space can be quite expensive (but obviously not ridiculously so).
Say you have m distinct strings, and let $a_{i,j}$ be the j-th character of the i-th string. In the following, I'll assume that they all have the same length. This can be easily translated into any reasonable programming language by treating $a_{i,j}$ as the null character if $j \ge |a_i|$.
The idea I suggest is composed of two parts:
1. Find (at most) m - 1 positions differentiating the strings, and store these positions.
2. Create a perfect hash function by considering the strings as length-(m - 1) vectors over these positions, and store the parameters of the perfect hash function.
In the worst case, the hash function must check at least m - 1 positions. It's easy to see this by induction. For 2 strings, at least 1 character must be checked. Assume it's true for i strings: i - 1 positions must be checked. Create a new set of strings by appending 0 to the end of each of the i strings, and add a new string that is identical to one of them, except that it has a 1 at the end. Telling the new pair apart requires the last position, so i positions are needed for the i + 1 strings.
Conversely, it's possible to find at most m - 1 positions sufficient for differentiating the strings (for some sets the number might of course be lower, as low as about $\log_{|\Sigma|} m$, where $|\Sigma|$ is the alphabet size). Again, it's easy to see this by induction. Two distinct strings must differ at some position. Placing the strings in a matrix with m rows, there must be some column where not all characters are the same. Partitioning the rows by their character in that column into two or more parts, and applying the argument recursively to each part with more than one row, shows this.
Say the m - 1 positions are $p_1, \dots, p_{m-1}$. In the following, recall the meaning above for $a_{i,p_j}$ when $p_j \ge |a_i|$: it is the null character.
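Let us define $h(a_i) = \left(\sum_{j=1}^{m-1} q_j \, a_{i,p_j}\right) \bmod n$, for random $q_j \in \{0, \dots, n-1\}$ and some prime n larger than every character code. Then h is known to be a universal hash function: for any two distinct strings $x \ne y$, the probability of a pair collision is $\Pr[h(x) = h(y)] \le 1/n$.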
Given a universal hash function, there are known constructions for creating a perfect hash function from it. Perhaps the simplest is creating a vector of size $n \ge m^2$ and successively trying the above h (with that n) with randomized coefficients until there are no collisions. The expected number of colliding pairs is at most $\binom{m}{2}/n < 1/2$, so each attempt succeeds with probability greater than 1/2. The expected number of attempts is therefore less than 2, and the probability that more attempts are needed decreases exponentially.
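Here is a minimal Python sketch of the retry construction. It makes two simplifications, both assumptions rather than part of the answer. First, it hashes every character position, padding shorter strings with the null character (value 0), instead of first selecting m - 1 distinguishing positions. Second, it picks a prime n at least max(m², largest character code + 1), so that distinct strings stay distinct modulo n. A final lookup table turns the slot into an index between 1 and N. build_perfect_hash is a hypothetical name.

```python
import random

def next_prime(n):
    def is_prime(k):
        if k < 2:
            return False
        d = 2
        while d * d <= k:
            if k % d == 0:
                return False
            d += 1
        return True
    while not is_prime(n):
        n += 1
    return n

def build_perfect_hash(strings, seed=None):
    rng = random.Random(seed)
    m = len(strings)
    length = max(len(s) for s in strings)
    max_char = max(ord(c) for s in strings for c in s)
    n = next_prime(max(m * m, max_char + 1))
    while True:
        # Random coefficients q_j; retry until the strings land in distinct slots.
        q = [rng.randrange(n) for _ in range(length)]
        def h(s, q=q):
            # zip stops at len(s): missing positions act as the null character (0).
            return sum(qj * ord(c) for qj, c in zip(q, s)) % n
        slots = [h(s) for s in strings]
        if len(set(slots)) == m:
            break
    # Map each occupied slot to 1..N so the final function is minimal.
    table = [0] * n
    for index, slot in enumerate(slots, start=1):
        table[slot] = index
    return lambda s: table[h(s)]
```

At compile time you would run build_perfect_hash once and emit the chosen coefficients and table as constants. The runtime function is then one multiply-add per character and one array lookup.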
It is simple: make a dictionary and assign 1 to the first word, 2 to the second, and so on. No need to make things complicated; just number your words.
To make the lookup efficient, use a trie, binary search, or whatever tool your language provides.
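A minimal Python sketch of this approach, using binary search over a sorted list (the standard bisect module). build_index is a hypothetical name. The words are numbered in sorted order rather than in the order given.

```python
from bisect import bisect_left

def build_index(words):
    """Number the words 1..N in sorted order; look them up by binary search."""
    ordered = sorted(words)
    def f(word):
        i = bisect_left(ordered, word)
        if i == len(ordered) or ordered[i] != word:
            raise KeyError(word)  # not one of the known strings
        return i + 1
    return f
```

For example, build_index(["apple", "orange", "banana"]) maps "apple" to 1, "banana" to 2 and "orange" to 3.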

Generating Combinational string of length n using 3 possible values

I have three possible values: war (w), buy (b) and sell (s). I have to generate all combinational strings of length N.
Suppose N is 2; then the total number of combinations is 3x3 = 9:
w,w
w,b
w,s
b,w
b,b
b,s
s,w
s,b
s,s
Likewise, I have to generate all combinational strings of (w, s, b) of size N, where 2 <= N <= 8000.
You can do it with a recursive function. Here's an example in Python, but you can easily rewrite it in your favorite language.
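# N and handle_solution are assumed to be defined elsewhere
def x(partial):
    if len(partial) == N:
        handle_solution(partial)
        return  # stop here; otherwise partial keeps growing forever
    for c in ('w', 'b', 's'):
        x(partial + c)

x('')  # start from the empty prefix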
This is going to be slow regardless of language or implementation. The number of solutions is 3^N so even for relatively small values of N this will take a very long time. You should go back to your original problem and figure out a way to solve it without going through all the combinations.
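N can be as large as 8000, but CPython's default recursion limit is 1000, so the recursive version will raise RecursionError for large N. Below is a non-recursive sketch using itertools.product, which yields the strings lazily in the same order (the function name combinations is hypothetical). It does not change the 3^N cost. It only avoids the recursion, which helps if you stop early or N is small.

```python
from itertools import product

def combinations(n, symbols=('w', 'b', 's')):
    """Lazily yield every length-n string over symbols (3**n of them),
    in the same order as the recursive version, without recursion."""
    for combo in product(symbols, repeat=n):
        yield ''.join(combo)
```

For n = 2 this yields ww, wb, ws, bw, bb, bs, sw, sb, ss, matching the list in the question.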

image encryption using henon equation

I want to encrypt pixel values using the Hénon equation:
$X_{i+2} = 1 - a X_{i+1}^2 + b X_i$ (sorry, I can't post an image)
where a = 1.4, b = 0.3, x0 = 0.01, x1 = 0.02.
With this code:
k[i+2] = 1-a*(Math.pow(k[i+1], 2))+b*k[i]
I can get random values from the Hénon equation:
1.00244,
-0.40084033504000005,
1.0757898361270288,
-0.7405053806319072,
0.5550494445953806,
0.3465365454865311,
0.99839222507778,
-0.2915408854881054,
1.1805231444476698,
-1.038551118053691,
-0.15586685140049938,
0.6544223990721852,
After that, I rounded the random values
with this code:
inter[i] = (int) Math.round((k[i]*65536)%256)
and I can encrypt the pixel values by XORing them with the (Hénon) random values.
My question:
there are some negative random values from Hénon, and as we know, there are no negative pixel values.
So may I skip the negative random values (only keep the positive random values) to encrypt the original pixel values?
Thanks
You are using the Hénon sequence as a source of pseudo-random numbers, right?
Then you can of course choose to discard negative numbers (or take the absolute value, or do some other fancy thing), as long as you do the same in encryption and decryption. If there is a specification, it had better be explicit about this.
Maybe you are using Java, JavaScript, or some other language where % is not the modulus but the remainder, so (k[i]*65536)%256 is negative (or zero) whenever k[i] is negative. If so, see this answer.
Three other things to note:
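Double-check that you are calculating the right thing. The Hénon map is x[n+1] = 1 - a*x[n]^2 + y[n], y[n+1] = b*x[n]; eliminating y gives k[i+2] = 1 - a*k[i+1]^2 + b*k[i], which is exactly what your code computes (with x0 = 0.01 and x1 = 0.02 its first value, 1.00244, matches your output). A version using only the last value, such as k[i+1] = 1 - a*k[i]^2 + b*k[i], would be a different map.
Do you really need to store all past values of k? If not, just keep the last two:
knew = 1 - a*k1*k1 + b*k0
k0 = k1
k1 = knew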
(Spoiler warning: this may be the didactic point of an exercise.) The Hénon sequence is chaotic, and floating-point errors will sooner or later influence the random numbers. So your random number generator may not be as deterministic as you think.
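Here is a minimal Python sketch of the keystream, assuming the question's parameters and its two-term recurrence. It keeps only the last two values. Note that it rounds before reducing, the reverse of the question's order. That way Python's % (a true modulus for a positive divisor) always yields a byte in 0..255, even for negative Hénon values, so nothing has to be skipped. Because XOR is its own inverse, the same function both encrypts and decrypts. The function names are hypothetical, and the floating-point caveat above still applies.

```python
def henon_keystream(count, a=1.4, b=0.3, x0=0.01, x1=0.02):
    """Yield count key bytes from x[i+2] = 1 - a*x[i+1]**2 + b*x[i]."""
    prev, cur = x0, x1
    for _ in range(count):
        prev, cur = cur, 1 - a * cur * cur + b * prev
        # Round first, then reduce: with a positive divisor, Python's %
        # never returns a negative number, so the result is in 0..255.
        yield round(cur * 65536) % 256

def xor_pixels(pixels, **params):
    """Encrypt or decrypt (XOR is self-inverse) a sequence of 0..255 values."""
    keys = henon_keystream(len(pixels), **params)
    return [p ^ k for p, k in zip(pixels, keys)]
```

Usage: encrypted = xor_pixels(pixels); calling xor_pixels(encrypted) with the same parameters restores the original pixels.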

How many digits will be after converting from one numeral system to another

The main question: How many digits?
Let me explain. I have a number in binary system: 11000000 and in decimal is 192.
After converting to decimal, how many digits will it have (in decimal)? In my example, it's 3 digits. But that isn't the problem. I've searched the internet and found one algorithm for the integral part and one for the fractional part. I don't quite understand them, but (I think) they work.
When converting from binary to octal, it's easier: each 3 bits give you 1 octal digit. Same for hex: each 4 bits = 1 hex digit.
But I'm very curious: what do I do if I have a number in a base-P numeral system and want to convert it to the base-Q numeral system? I know how to do it (I think I know :)), but first of all, I want to know how many digits it will take in the Q system (you know, I must preallocate space).
Writing a positive integer n in base b takes floor(log base b (n)) + 1 digits.
The ratio you noticed (binary/octal) is log base 2 (n) / log base 8 (n) = 3.
(From memory, will it stick?)
There was an error in my previous answer: see the comment by Ben Schwehn.
Sorry for the confusion; I found the error I made in my previous answer and explain it below.
Please use the answer provided by Paul Tomblin. (rewritten to use P, Q and n)
Y = ln(P^n) / ln(Q)
Y = n * ln(P) / ln(Q)
So Y (rounded up) is the number of characters you need in system Q to express the highest number you can encode in n characters in system P.
I have no way to get the bare minimum for a given number without actually converting it (and using that much space in a temporary variable): for 1000 (bin) = 8 (dec), this formula would reserve 2 decimal positions.
If temporary memory usage isn't a problem, you might cheat and use (Python):
len(str(int(otherBaseStr,P)))
This will give you the number of decimals needed to convert a number in base P, cast as a string (otherBaseStr), into decimals.
Old WRONG answer:
If you have a number in P numeral system of length n
Then you can calculate the highest number that is possible in n characters:
P^(n-1)
To express this highest number in number system Q you need to use logarithms (because they are the inverse to exponentiation):
log(P^(n-1)) / log(Q)
(n-1)*log(P) / log(Q)
For example
11000000 in binary is 8 characters.
To get it in Decimal you would need:
(8-1)*log(2) / log(10) = 2.1 digits (round up to 3)
Reason it was wrong:
The highest number that is possible in n characters is
(P^n) - 1
not
P^(n-1)
If you have a number that's X digits long in base B, then the maximum value that can be represented is B^X - 1. So if you want to know how many digits it might take in base C, you have to find the number Y such that C^Y - 1 is at least as big as B^X - 1. The way to do that is to take the logarithm in base C of B^X - 1. And since the logarithm (log) of a number in base C is the same as the natural log (ln) of that number divided by the natural log of C, that becomes:
Y = ln((B^X)-1) / ln(C) + 1
and since ln(B^X) is X * ln(B), and that's probably faster to calculate than ln(B^X-1) and close enough to the right answer, rewrite that as
Y = X * ln(B) / ln(C) + 1
Convert that to your favourite language, keeping only the integer part of Y. Because we dropped the "-1", we might end up with one digit more than you need in some cases. But even better, you can pre-calculate ln(B)/ln(C) once and just multiply it by each new X as the length of the number you are trying to convert changes.
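Below is a small Python sketch of this estimate, along with an exact digit counter (repeated integer division) to check it against. The names max_digits and exact_digits are hypothetical.

```python
import math

def max_digits(x, b, c):
    """Upper bound on the base-c digits needed for any x-digit base-b number:
    the integer part of x * ln(b) / ln(c) + 1."""
    return int(x * math.log(b) / math.log(c)) + 1

def exact_digits(value, base):
    """Count the base-`base` digits of a non-negative int by repeated division."""
    count = 1
    while value >= base:
        value //= base
        count += 1
    return count
```

For the question's example, max_digits(8, 2, 10) reserves 3 decimal digits for an 8-bit number, and exact_digits(192, 10) confirms that 192 needs 3. The estimate is never too small and is at most one digit too large.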
Calculating the number of digits can be done using the formulas given by the other answers; however, it might actually be faster to allocate a buffer of the maximum size first and then return the relevant part of that buffer instead of calculating a logarithm.
Note that the worst case for the buffer size happens when you convert to binary, which gives you a buffer size of 32 characters for 32-bit integers.
Converting a number to an arbitrary base could be done using the C# function below (The code would look very similar in other languages like C or Java):
public static string IntToString(int value, char[] baseChars)
{
    // Assumes value >= 0: for a negative value, value % targetBase is
    // negative and would index outside baseChars.
    // 32 is the worst-case buffer size (base 2 and int.MaxValue).
    int i = 32;
    char[] buffer = new char[i];
    int targetBase = baseChars.Length;
    do
    {
        buffer[--i] = baseChars[value % targetBase];
        value = value / targetBase;
    }
    while (value > 0);
    char[] result = new char[32 - i];
    Array.Copy(buffer, i, result, 0, 32 - i);
    return new string(result);
}
The keyword here is "logarithm"; here are some suggestive links:
http://www.adug.org.au/MathsCorner/MathsCornerLogs2.htm
http://staff.spd.dcu.ie/johnbcos/download/Fermat%20material/Fermat_Record_Number/HOW_MANY.html
Look at the logarithm of the number in base P and in base Q; rounding each down to the nearest integer and adding 1 gives the number of digits in that base.
The logarithm base P can be computed using your favorite base (10 or e): log_P(x) = log_10(x)/log_10(P)
You need to compute the length of the fractional part separately.
For binary to decimal, there are as many decimal digits as there are bits. For example, binary 0.11001101001001 is decimal 0.80133056640625, both 14 digits after the radix point.
For decimal to binary, there are two cases. If the decimal fraction is dyadic, then there are as many bits as decimal digits (same as for binary to decimal above). If the fraction is not dyadic, then the number of bits is infinite.
(You can use my decimal/binary converter to experiment with this.)
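For the binary-to-decimal direction, the digit count is easy to see. A k-bit fraction is m / 2^k = m·5^k / 10^k, so it needs at most k decimal digits, and exactly k when the last bit is 1. Below is a minimal Python sketch (the function name is hypothetical) that does the exact conversion with integers only.

```python
def binary_fraction_to_decimal(bits):
    """Convert the fraction 0.<bits> (base 2) to an exact decimal string."""
    k = len(bits)
    # 0.b1...bk = int(bits, 2) / 2**k = int(bits, 2) * 5**k / 10**k
    scaled = int(bits, 2) * 5 ** k
    return "0." + str(scaled).zfill(k)
```

Applied to "11001101001001", it reproduces the 14-digit example above.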

Map strings to numbers maintaining the lexicographic ordering

I'm looking for an algorithm or function that is able to map a string to a number in such a way that the resulting values correspond to the lexicographic ordering of strings. Example:
"book" -> 50000
"car" -> 60000
"card" -> 65000
"a longer string" -> 15000
"another long string" -> 15500
"awesome" -> 16000
As a function it should be something like: f(x) = y, so that for any x1 < x2 => f(x1) < f(x2), where x is an arbitrary string and y is a number.
If the input set of x is finite, then I could always do a sort and assign the proper values, but I'm looking for something generic for an unlimited input set for x.
If you require that f map to integers this is impossible.
Suppose that there is such a map f. Consider the strings a, aa, aaa, etc. Consider the values f(a), f(aa), f(aaa), etc. As we require that f(a) < f(aa) < f(aaa) < ... we see that f(a_n) tends to infinity as n tends to infinity; here I am using the obvious notation that a_n is the character a repeated n times. Now consider the string b. We require that f(a_n) < f(b) for all n. But f(b) is some finite integer and we just showed that f(a_n) goes to infinity. We have a contradiction. No such map is possible.
Maybe you could tell us what you need this for? This is fairly abstract and we might be able to suggest something more suitable. Further, don't necessarily worry about solving "it" generally. YAGNI and all that.
As a corollary to Jason's answer, if you can map your strings to rational numbers, such a mapping is very straightforward. If code(c) is the ASCII code of the character c and s[i] is the i-th character in the string s, just sum as follows:
result <- 0
scale <- 1
for i from 1 to length(s)
    scale <- scale / 27
    index <- (1 + code(s[i]) - code('a'))
    result <- result + index * scale
end for
return result
This maps the empty string to 0, and every other string of lowercase letters to a rational number between 0 and 1, maintaining lexicographical order. The letters use the digits 1 to 26, and base 27 leaves digit 0 for "end of string", so a string like "azzz" stays below "b". If you have arbitrary-precision decimal floating-point numbers, you can replace the division by powers of 27 with powers of 100 and still have exactly representable numbers; with arbitrary-precision binary floating-point numbers, you can divide by powers of 32.
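Below is a minimal Python sketch of this idea. It assumes the strings contain only lowercase a to z, and it uses fractions.Fraction so that the results are exact rationals (lex_rank is a hypothetical name). Each letter becomes a digit from 1 to 26 and is multiplied by a scale that shrinks by a factor of 27 per position. The base is 27 rather than 26 so that digit 0 can stand for "end of string". With base 26, a long enough run of z's after "a" would rank above "b".

```python
from fractions import Fraction

def lex_rank(s):
    """Map a lowercase a-z string to a rational in [0, 1) that preserves
    lexicographic order: digit i is 1..26 for 'a'..'z', weighted by 27**-i."""
    result = Fraction(0)
    scale = Fraction(1)
    for ch in s:
        scale /= 27
        result += (ord(ch) - ord('a') + 1) * scale
    return result
```

Sorting a list of such strings with key=lex_rank gives the same order as sorting the strings themselves.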
What you are asking for is a temporary suspension of the pigeonhole principle (http://en.wikipedia.org/wiki/Pigeonhole_principle).
The strings are the pigeons, the numbers are the holes.
There are more pigeons than holes, so you can't put each pigeon in its own hole.
You would be much better off writing a comparator which you can supply to a sort function. The comparator takes two strings and returns -1, 0, or 1. Even if you could create such a map, you still have to sort on it. If you need both a "hash" and the order, then keep stuff in two data structures - one that preserves the order, and one that allows fast access.
Maybe a Radix Tree is what you're looking for?
A radix tree, Patricia trie/tree, or crit bit tree is a specialized set data structure based on the trie that is used to store a set of strings. In contrast with a regular trie, the edges of a Patricia trie are labelled with sequences of characters rather than with single characters. These can be strings of characters, bit strings such as integers or IP addresses, or generally arbitrary sequences of objects in lexicographical order. Sometimes the names radix tree and crit bit tree are only applied to trees storing integers and Patricia trie is retained for more general inputs, but the structure works the same way in all cases.
LWN.net also has an article describing this data structure's use in the Linux kernel.
I have posted a question here: https://stackoverflow.com/questions/22798824/what-lexicographic-order-means
As a workaround, you can append empty symbols with code zero to the right side of the string and use the expansion from case II.
Without such an expansion with extra empty symbols, I actually don't know how to make such a mapping...
But if you have a finite set of symbols (V), then |V*| is equal to |N|, a fact from discrete math.

Resources