Recently, I attended an interview and faced a good question regarding hash collisions.
Question : Given a list of strings, print out the anagrams together.
Example : i/p : {act, god, animal, dog, cat}
o/p : act, cat, dog, god
I want to create a hash map with each word as a key and a list of its anagrams as the value.
To avoid collisions, I want to generate the same unique hash code for all anagrams of a word, instead of sorting each word and using the sorted form as the key.
I am looking for a hash algorithm that handles collisions other than by chaining. I want the algorithm to generate the same hash code for both "act" and "cat", so that it will add the next word to the value list.
Can anyone suggest a good algorithm?
Hashing with the sorted string is pretty nice; I'd probably have done that, but it could indeed be slow and cumbersome. Here's another thought, not sure if it works - pick a set of prime numbers, as small as you like, the same size as your character set, and build a fast mapping function from your chars to them. Then for a given word, map each character to its matching prime and multiply. Finally, hash using the result.
This is very similar to what Heuster suggested, only with fewer collisions (actually, I believe there will be no false collisions, given the uniqueness of the prime decomposition of any number).
A simple example:
int primes[] = {2, 3, 5, 7, ...}; // one prime per character; can be auto-generated with simple code

inline int prime_map(char c) {
    // check that c is within the legal character-set bounds first
    return primes[c - first_char];
}

...

char* word = get_next_word();
char* ptr = word;
int key = 1;
while (*ptr != '\0') {   // compare against the string terminator, not NULL
    key *= prime_map(*ptr);   // note: can overflow for long words
    ptr++;
}
hash[key].add_to_list(word);
[edit]
A few words about the uniqueness - any integer has a unique factorization into primes, so given an integer key in the hash you can actually reconstruct exactly the letter multiset that hashes to it, and only that one. Just factor the key into p1^n1 * p2^n2 * ... and convert each prime back to its matching char; the char for p1 appears n1 times, and so on.
You can never get a prime you didn't explicitly use: being prime means it cannot arise from any product of the other primes.
This brings another possible improvement - if you can reconstruct the string, you only need to mark which permutations you saw while populating the hash. Since the permutations can be ordered lexicographically, you can replace each one with a number. This saves the space of storing the actual strings in the hash, but requires more computation, so it's not necessarily a good design choice. Still, it makes a nice extension of the original question for interviews :)
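To make the reconstruction concrete, here is a hedged Python sketch (the helper names and the 26-prime table are my own, not from the answer): trial division recovers the letter multiset from a key.

```python
# Sketch: recover the letter multiset from a prime-product key by trial division.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]  # one per letter a-z

def key_of(word):
    k = 1
    for c in word:
        k *= PRIMES[ord(c) - ord('a')]
    return k

def letters_of(key):
    out = []
    for i, p in enumerate(PRIMES):
        while key % p == 0:          # the exponent of p is the count of that letter
            out.append(chr(ord('a') + i))
            key //= p
    return ''.join(out)

print(letters_of(key_of('cat')))  # 'act' - the sorted letters, from which all permutations follow
```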
Hash function: assign a prime number to each character. While computing the hash code, look up the prime assigned to each character and multiply it into the running value. All anagrams then produce the same hash value.
ex:
a - 2,
c - 3,
t - 7
hash value of cat = 3*2*7 = 42
hash value of act = 2*3*7 = 42
Print all strings that have the same hash value (anagrams will have the same hash value).
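As a runnable sketch of this scheme (the prime table and function names are my own illustration):

```python
# Group anagrams by their prime-product hash; same letters => same product.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def anagram_hash(word):
    h = 1
    for c in word:
        h *= PRIMES[ord(c) - ord('a')]
    return h

def group_anagrams(words):
    groups = {}
    for w in words:
        groups.setdefault(anagram_hash(w), []).append(w)
    return [g for g in groups.values() if len(g) > 1]

print(group_anagrams(["act", "god", "animal", "dog", "cat"]))
# [['act', 'cat'], ['god', 'dog']]
```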
The other posters suggested converting characters into prime numbers and multiplying them together. If you do this modulo a large prime, you get a good hash function that won't overflow. I tested the following Ruby code against the Unix word list of most English words and found no hash collisions between words that are not anagrams of one another. (On Mac OS X, this file is located at /usr/share/dict/words.)
My word_hash function takes the ordinal value of each character mod 32. This makes sure that uppercase and lowercase letters get the same code. The large prime I use is 2^58 - 27. Any large prime will do, so long as it is less than 2^64 / A, where A is my alphabet size. I am using 32 as my alphabet size, so this means I can't use a number larger than about 2^59 - 1. Since Ruby uses one bit for the sign and a second bit to indicate whether the value is a number or an object, I lose a bit relative to other languages.
def word_hash(w)
  # 32 prime numbers so we can use x.ord % 32. Doing this, 'A' and 'a' get the same hash value, 'B' matches 'b', etc. for all the upper and lower cased characters.
  # Punctuation gets assigned values that overlap the letters, but we don't care about that much.
  primes = [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131]
  # Use a large prime number as modulus. It must be small enough so that it will not overflow if multiplied by 32 (2^5). 2^64 / 2^5 equals 2^59, so we go a little lower.
  prime_modulus = (1 << 58) - 27
  w.chars.reduce(1) { |memo, letter| memo * primes[letter.ord % 32] % prime_modulus }
end
words = (IO.readlines "/usr/share/dict/words").map{|word| word.downcase.chomp}.uniq
wordcount = words.size
anagramcount = words.map { |w| w.chars.sort.join }.uniq.count
whash = {}
inverse_hash = {}
words.each do |w|
  h = word_hash(w)
  whash[w] = h
  x = inverse_hash[h]
  if x && x.each_char.sort.join != w.each_char.sort.join
    puts "Collision between #{w} and #{x}"
  else
    inverse_hash[h] = w
  end
end
hashcount = whash.values.uniq.size
puts "Unique words (ignoring capitalization) = #{wordcount}. Unique anagrams = #{anagramcount}. Unique hash values = #{hashcount}."
A small practical optimization I would suggest for the above hash method:
Assign the smallest primes to the vowels, then to the most frequently occurring consonants.
Ex:
e : 2
a : 3
i : 5
o : 7
u : 11
t : 13
and so on...
Also, the average word length in English is about 6, and the first 26 primes are all less than 100 [2, 3, 5, 7, ..., 97].
Hence, on average your hash would generate values around 100^6 = 10^12.
So there is very little chance of collision if you take a prime modulus bigger than 10^12.
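A hedged sketch of this optimization (the exact letter-frequency order below is a common approximation, my own assumption, not from the answer):

```python
# Assign the smallest primes to the most frequent English letters so the
# typical product stays small.
FREQ_ORDER = "etaoinshrdlcumwfgypbvkjxqz"   # approximate frequency order
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
PRIME_OF = {letter: p for letter, p in zip(FREQ_ORDER, PRIMES)}

def anagram_hash(word):
    h = 1
    for c in word:
        h *= PRIME_OF[c]
    return h

# 'e' maps to 2, 't' to 3, 'a' to 5, so common words stay well under 10**12
print(anagram_hash("eat"))  # 30
```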
The complexity above seems very misplaced! You don't need prime numbers or hashes. It's just three simple ops:
Map each OriginalWord to a Tuple of (SortedWord, OriginalWord). Example: "cat" becomes ("act", "cat"); "dog" becomes ("dgo", "dog"). This is a simple sort on the chars of each OriginalWord.
Sort the Tuples by their first element. Example: ("dgo", "dog"), ("act", "cat") sorts to ("act", "cat"), ("dgo", "dog"). This is a simple sort on the entire collection.
Iterate through the tuples (in order), emitting the OriginalWord. Example: ("act", "cat"), ("dgo", "dog") emits "cat" "dog". This is a simple iteration.
Two iterations and two sorts are all it takes!
In Scala, it's exactly one line of code:
val words = List("act", "animal", "dog", "cat", "elvis", "lead", "deal", "lives", "flea", "silent", "leaf", "listen")
words.map(w => (w.toList.sorted.mkString, w)).sorted.map(_._2)
# Returns: List(animal, act, cat, deal, lead, flea, leaf, dog, listen, silent, elvis, lives)
Or if, as the original question implies, you only want cases where the count > 1, it's just a bit more:
scala> words.map(w => (w.toList.sorted.mkString, w)).groupBy(_._1).filter({case (k,v) => v.size > 1}).mapValues(_.map(_._2)).values.toList.sortBy(_.head)
res64: List[List[String]] = List(List(act, cat), List(elvis, lives), List(flea, leaf), List(lead, deal), List(silent, listen))
The solution using the product of primes is brilliant, and here's a Java implementation in case anyone needs one.
import java.util.*;
import java.util.stream.*;

class HashUtility {
private int n;
private Map<Character, Integer> primeMap;
public HashUtility(int n) {
this.n = n;
this.primeMap = new HashMap<>();
constructPrimeMap();
}
/**
 * Utility to check if the passed {@code number} is a prime.
 *
 * @param number The number which is checked to be prime.
 * @return {@code boolean} value representing the prime nature of the number.
 */
private boolean isPrime(int number) {
if (number <= 2)
return number == 2;
else
return (number % 2) != 0
&&
IntStream.rangeClosed(3, (int) Math.sqrt(number))
.filter(n -> n % 2 != 0)
.noneMatch(n -> (number % n == 0));
}
/**
 * Maps the first {@code n} primes to the letters of the given language.
 */
private void constructPrimeMap() {
List<Integer> primes = IntStream.range(2, Integer.MAX_VALUE)
.filter(this::isPrime)
.limit(this.n) //Limit the number of primes here
.boxed()
.collect(Collectors.toList());
int curAlphabet = 0;
for (int i : primes) {
this.primeMap.put((char) ('a' + curAlphabet++), i);
}
}
/**
 * We calculate the hash code of a word as the product of each character's mapped prime. This works since
 * a product of primes has a unique factorization, so only words with the same letter counts share a product.
 * <p>
 * Since the product can be huge, we return it modulo a large prime.
 *
 * @param word The {@link String} to be hashed.
 * @return {@code int} representing the prime hash code associated with the {@code word}
 */
public int hashCode(String word) {
    long primeProduct = 1;
    long mod = 100000007;
    for (char currentCharacter : word.toCharArray()) {
        // reduce the running product, not the individual prime, so it never overflows
        primeProduct = (primeProduct * this.primeMap.get(currentCharacter)) % mod;
    }
    return (int) primeProduct;
}
}
Please let me know if/how I can improve this.
We can use a binary-weighted count array. This code snippet assumes all characters are lowercase Latin letters.
public int hashCode() {
    // count occurrences of each letter, then weight each count by a power of two
    int sLen = s.length();
    int[] ref = new int[26];
    for (int i = 0; i < sLen; i++) {
        ref[s.charAt(i) - 'a'] += 1;
    }
    int hashCode = 0;
    for (int i = 0; i < ref.length; i++) {
        // caveat: counts >= 2 can collide across letters (e.g. "aa" and "b" both
        // give 2); a base larger than the maximum word length avoids that
        hashCode += (1 << i) * ref[i];
    }
    return hashCode;
}
Create the hash code in the following way:
String hash(String s){
    char[] hashValue = new char[26];
    for(char c: s.toCharArray()){
        hashValue[c-'a']++;
    }
    return new String(hashValue);
}
Here hashValue is initialized with the char default value \u0000, and each increment moves a cell to the next Unicode code point. Since it's a char array, we can convert it to a String and use it as the key.
Problem
I need to pick one unique random number at a time between 0 and 10,000,000,000 and do it until all numbers are selected. Essentially, the behavior I need is a pre-built stack/queue with 10 billion numbers in random order, with no ability to push new items into it.
Not-so-good ways to solve it:
There's no shortage of inefficient ways in my brain. Such as:
Persist generated numbers and check whether each newly generated random number has already been used; at some point this gets us into an indefinite wait before a usable number is produced.
Persist all possible numbers in a table, pop a random row, and maintain the new row count for the next pick, etc. Not sure if this is good or bad.
Questions:
Are there other deterministic ways besides storing all possible combinations and using random?
Like maintaining windows of available numbers, randomly selecting a window first, then randomly selecting a number within that window, etc.
If not, what is the best type to store the numbers in a reasonably small amount of space?
Over 50% of the numbers won't fit in 32 bits (int), and 64 bits (long) is wasteful, since the largest number fits in 34 bits, wasting 30 bits per number (>37 GB total).
In case this problem hasn't been solved already:
What is a good data structure for storing and picking a random spot, such that adjusting the structure for the next pick is fast?
Edit: Sorry for the ambiguity. The largest selectable number is 9,999,999,999 and the smallest selectable is 1.
You ask: "Are there other deterministic ways besides storing all possible combinations and using random?"
Yes, there is: encryption. Encryption with a given key guarantees a unique result for unique inputs, since it is reversible. Each key defines a one-to-one permutation of the possible inputs. You need an encryption of inputs in the range [1..10^10]. To deal with something that big you need 34-bit numbers, which go up to 17,179,869,183.
There is no standard 34 bit encryption. Depending on how much security you need, and how fast you need the numbers, you can either write your own simple, fast, insecure four-round Feistel Cipher or else for something slower and more secure use Hasty Pudding cipher in 34 bit mode.
With either solution, if the first encryption gives a result outside the range, just encrypt the result again until the new result is within the range you want. The one-to-one property ensures that the final result of the chain of encryptions will be unique.
To generate a sequence of unique random-seeming numbers just encrypt 0, 1, 2, 3, 4, ... in order with the same key. Encryption guarantees that the results will be unique for that key. If you record how far you have got, then you can generate more unique numbers later, up to your 10 billion limit.
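A hedged sketch of the Feistel idea: a toy 4-round, 34-bit Feistel network (17-bit halves) with an ad-hoc round function and made-up keys, purely illustrative and not secure, plus the re-encrypt-until-in-range loop (often called cycle walking).

```python
# Toy 34-bit Feistel permutation. Any round function gives a bijection,
# because each round (L, R) -> (R, L xor F(R, k)) is invertible.
def feistel34(x, keys=(0x1234, 0xBEEF, 0xCAFE, 0x7777)):
    half = 17
    mask = (1 << half) - 1
    left, right = x >> half, x & mask
    for k in keys:
        f = (right * 0x9E3B + k) & mask   # arbitrary, insecure round function
        left, right = right, left ^ f
    return (left << half) | right

def nth_unique(n, limit=10_000_000_000):
    # cycle-walk: re-encrypt until the value lands inside the target range;
    # this preserves the one-to-one property, so outputs stay unique
    x = feistel34(n)
    while x >= limit:
        x = feistel34(x)
    return x

seq = [nth_unique(i) for i in range(5)]   # distinct values below 10 billion
```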
As mentioned by AChampion in the comments, you could use a Linear Congruential generator.
Your modulo (m) value will be 10 billion. In order to get a full period (all values in the range appear before the series repeats) you need to choose the a and c constants to satisfy certain criteria. m and c need to be relatively prime and a - 1 needs to be divisible by the prime factors of m (which are just 2 and 5) and also by 4 (since 10 billion is divisible by 4).
If you just come up with a single set of constants, you will only have one possible series and the numbers will always occur in the same order. However, you can easily generate random constants that satisfy the criteria. To test for relative primality of c and m, just check that c is divisible by neither 2 nor 5, since these are the only prime factors of m (see the first condition of the coprimality test here).
Simple sketch in Python:
import random
m = 10000000000
a = 0
c = 0
r = 0
def setupLCG():
    global a, c, r
    # choose value of c that is 0 < c < m and relatively prime to m
    c = 5
    while ((c % 5 == 0) or (c % 2 == 0)):
        c = random.randint(1, m - 1)
    # choose value of a that is 0 < a <= m and a - 1 is divisible by
    # prime factors of m, and 4
    a = 4
    while ((((a - 1) % 4) != 0) or (((a - 1) % 5) != 0)):
        a = random.randint(1, m)
    r = random.randint(0, m - 1)

def rand():
    global m, a, c, r
    r = (a*r + c) % m
    return r

random.seed()
setupLCG()
for i in range(1000):
    print(rand() + 1)
This approach won't give the full 10000000000! possible orderings, but it will still be on the order of 10^19, which is quite a lot. It does have a few other issues (e.g. it alternates even and odd values). You could mix it up a bit by keeping a small pool of numbers, adding a number from the sequence to it each time and randomly drawing one out.
Similar to what rossum has suggested, you can use an invertible integer hash function, which uniquely maps an integer in [0,2^k) to another integer in the same range. For your particular problem, you choose k=34 (2^34 ≈ 17 billion) and reject any number above 10 billion. Here is a complete implementation:
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
uint64_t hash_64(uint64_t key, uint64_t mask)
{
    key = (~key + (key << 21)) & mask; // key = (key << 21) - key - 1;
    key = key ^ key >> 24;
    key = ((key + (key << 3)) + (key << 8)) & mask; // key * 265
    key = key ^ key >> 14;
    key = ((key + (key << 2)) + (key << 4)) & mask; // key * 21
    key = key ^ key >> 28;
    key = (key + (key << 31)) & mask;
    return key;
}
int main(int argc, char *argv[])
{
    uint64_t i, shift, mask, max = 10000ULL;
    char *dummy;
    if (argc > 1) max = strtoull(argv[1], &dummy, 10); // strtoull: values can exceed long's range
    for (shift = 0; 1ULL<<shift <= max; ++shift) {}
    mask = (1ULL<<shift) - 1;
    for (i = 0; i <= mask; ++i) {
        uint64_t x = hash_64(i, mask);
        x = hash_64(x, mask);
        x = hash_64(x, mask); // apply multiple times to increase randomness
        if (x > max || x == 0) continue;
        printf("%llu\n", (unsigned long long)x);
    }
    return 0;
}
This should give you the numbers in [1, 10000000000] in random order.
The range 1-999,999,999,999 is equivalent to 0-999,999,999,998 (just add 1). Given the definition of an LCG, you can implement it like this:
import functools as ft
import itertools as it
import operator as op
from sympy import primefactors, nextprime
def LCG(m, seed=0):
    factors = set(primefactors(m))
    a = ft.reduce(op.mul, factors) + 1
    # Hull-Dobell: if 4 divides m, then 4 must also divide a - 1
    assert(m % 4 != 0 or (a - 1) % 4 == 0)
    c = nextprime(max(factors) + 1)
    assert(c < m)
    x = seed
    while True:
        x = (a * x + c) % m
        yield x
# Check the first 10,000,000 for duplicates
>>> x = list(it.islice(LCG(999999999999), 10000000))
>>> len(x) == len(set(x))
True
# Last 10 numbers
>>> x[-10:]
[99069910838, 876847698522, 765736597318, 99069940559, 210181061577,
432403293706, 99069970280, 543514424631, 99069990094, 99070000001]
I've taken a couple of shortcuts for the context of this question; the asserts should be replaced with proper handling code, as currently the generator would just fail if either assert were false.
I'm not aware of any truly random methods of picking the numbers without storing a list of the numbers already picked. You could do some sort of linear hashing algorithm, and then pass the numbers 0 to n through it (repeating when your hash returns a value above 10000000000), but this wouldn't be truly random.
If you are to store the numbers, you might consider doing it via a bitmask. To pick quickly in the bitmask, you would likely keep a tree, where each leaf would represent the number of free bits in the corresponding 32 bytes, the branches above that would list the number of free bits in the corresponding 2K entries, and so forth. You then have O(log(n)) time to find your next entry, and O(log(n)) time to claim a bit (as you have to update the tree). It would require something to the order of 2n bits to store as well.
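The tree-over-bitmap idea can be sketched with a Fenwick (binary indexed) tree of free counts, so the k-th still-free number can be found and claimed in O(log n). This is my own small illustration of the structure described above, not the poster's code.

```python
import random

class FreeSet:
    """Tracks which numbers 1..n are still free; pick and claim in O(log n)."""
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)
        for i in range(1, n + 1):        # every slot starts free
            self._update(i, 1)
        self.free = n

    def _update(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)

    def pop_kth_free(self, k):           # 1-based k-th free slot
        pos = 0
        for step in (1 << b for b in range(self.n.bit_length(), -1, -1)):
            if pos + step <= self.n and self.tree[pos + step] < k:
                pos += step
                k -= self.tree[pos]
        self._update(pos + 1, -1)        # claim it
        self.free -= 1
        return pos + 1                   # numbers are 1..n

def shuffled(n):
    fs = FreeSet(n)
    while fs.free:
        yield fs.pop_kth_free(random.randint(1, fs.free))
```

For 10 billion numbers the counts would live above a raw bitmap rather than a per-number array, but the selection logic is the same.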
You definitely don't need to store all the numbers.
If you want a perfect set of the numbers from 1 to 10B, each exactly once, there are two options that I see: as hinted at by the others, use a 34-bit LCG, Galois LFSR, or xorshift generator that produces a sequence of numbers from 1 to 17 billion or so, then throw out the ones over 10B. I'm not aware of any ready-made 34-bit functions for this, but I'm sure someone has one.
Option 2, if you can spare 1.25 GB of memory, is to create a bitmap that stores only the information that a certain number has been chosen, then use Floyd's Algorithm to get the numbers, which would be fast and give you much better quality numbers (in fact, it would work just fine with hardware RNGs).
Option 3, if you can live with a rare but occasional mistake (duplicate or never-selected number), replace the bitmap with a Bloom filter and save memory.
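For reference, Floyd's sampling algorithm mentioned in option 2 draws k distinct values from [1, n] with only O(k) extra memory and no rejection loop; here is a brief sketch (my own illustration):

```python
import random

def floyd_sample(k, n):
    """Floyd's algorithm: k distinct uniform values from 1..n."""
    chosen = set()
    for j in range(n - k + 1, n + 1):
        t = random.randint(1, j)
        # each iteration adds exactly one new element: t if unseen, else j
        chosen.add(t if t not in chosen else j)
    return chosen

sample = floyd_sample(5, 10_000_000_000)   # 5 distinct numbers up to 10 billion
```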
If predictability is not a concern, you can generate quickly using XOR operations. Suppose you want to generate a random-looking sequence of unique numbers with n bits (34 in your case):
1. Take a seed number of n bits. This number, K, can be considered a seed that you change each time you run a new experiment.
2. Use a counter from 0 upward.
3. Each time, XOR the counter with K: next = counter xor K; counter++;
To limit the range to 10 billion, which is not a power of two, you will need to do rejection.
The obvious drawback is predictability. In step 3, you can apply a transposition to the bytes of the counter first, for example inverting their order (as when converting between little-endian and big-endian). This would somewhat improve the unpredictability of the next number.
Finally, I have to admit that this answer can be considered a particular implementation of the encryption mentioned in @rossum's answer, but it's more specific and probably faster.
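The three steps above can be sketched in a few lines (K below is an arbitrary 34-bit seed of my choosing):

```python
# XOR-counter sequence with rejection; counter ^ K is a bijection on
# 34-bit values, so the outputs never repeat.
def xor_sequence(K, limit=10_000_000_000):
    for counter in range(1 << 34):
        x = counter ^ K
        if x < limit:        # rejection: limit is not a power of two
            yield x
```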
Incredibly slow, but it should work. Completely random:
using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;
namespace ConsoleApplication1
{
class Program
{
static Random random = new Random();
static void Main()
{
const long start = 1;
const long NumData = 10000000000;
const long RandomNess = NumData;
var sz = Marshal.SizeOf(typeof(long));
var numBytes = NumData * sz;
var filePath = Path.GetTempFileName();
using (var stream = new FileStream(filePath, FileMode.Create))
{
// create file with numbers in order
stream.Seek(0, SeekOrigin.Begin);
for (var index = start; index <= NumData; index++) // inclusive, so all NumData values get written
{
var bytes = BitConverter.GetBytes(index);
stream.Write(bytes, 0, sz);
}
for (var iteration = 0L; iteration < RandomNess; iteration++)
{
// get 2 random longs
var item1Index = LongRandom(0, NumData - 1, random);
var item2Index = LongRandom(0, NumData - 1, random);
// allocate room for data
var data1ByteArray = new byte[sz];
var data2ByteArray = new byte[sz];
// read the first value
stream.Seek(item1Index * sz, SeekOrigin.Begin);
stream.Read(data1ByteArray, 0, sz);
// read the second value
stream.Seek(item2Index * sz, SeekOrigin.Begin);
stream.Read(data2ByteArray, 0, sz);
var item1 = BitConverter.ToInt64(data1ByteArray, 0);
var item2 = BitConverter.ToInt64(data2ByteArray, 0);
Debug.Assert(item1 < NumData);
Debug.Assert(item2 < NumData);
// swap the values
stream.Seek(item1Index * sz, SeekOrigin.Begin);
stream.Write(data2ByteArray, 0, sz);
stream.Seek(item2Index * sz, SeekOrigin.Begin);
stream.Write(data1ByteArray, 0, sz);
}
}
File.Delete(filePath);
Console.WriteLine($"{numBytes}");
}
static long LongRandom(long min, long max, Random rand)
{
    // build a 61-bit value from two draws, then reduce to the range;
    // the original two-Next version could never reach indexes above ~5.7e9
    long range = max - min + 1;
    long value = ((long)rand.Next(1 << 30) << 31) | (long)rand.Next();
    return min + value % range;
}
}
}
I'm trying to find, given 4 arrays of N strings, a string that is common to at least 3 of the arrays in O(N*log(N)) time, and if it exists return the lexicographically first string.
What I tried was creating an array of size 4*N and adding the items from the 4 arrays to it while removing duplicates. Then I did a quicksort on the big array to find the first triplicate.
Does anyone know a better solution?
You can do this in O(n log n), with constant extra space. It's a standard k-way merge problem, after sorting the individual lists. If the individual lists can contain duplicates, then you'll need to remove the duplicates during the sorting.
So, assuming you have list1, list2, list3, and list4:
Sort the individual lists, removing duplicates
Create a priority queue (min-heap) of length 4
Add the first item from each list to the heap
last-key = ""
last-key-count = 0
while not done
remove the smallest item from the min-heap
add to the heap the next item from the list that contained the item you just removed.
if the item matches last-key
increment last-key-count
if last-key-count == 3 then
output last-key
exit done
else
last-key-count = 1
last-key = item key
end while
// if you get here, there was no triplicate item
An alternate way to do this is to combine all the lists into a single list, then sort it. You can then go through it sequentially to find the first triplicate. Again, if the individual lists can contain duplicates, you should remove them before you combine the lists.
combined = list1.concat(list2.concat(list3.concat(list4)))
last-key = ""
last-key-count = 0
for i = 0 to combined.length-1
if combined[i] == last-key
last-key-count++
if last-key-count == 3
exit done
else
last-key = combined[i]
last-key-count = 1
end for
// if you get here, no triplicate was found
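The combine-and-sort variant above, as a runnable Python sketch (my own rendering of the pseudocode; duplicates are removed per list first so a word repeated inside one list doesn't count three times):

```python
def first_triplicate(*lists):
    # set(lst) removes per-list duplicates; sorting makes the scan find
    # the lexicographically first triplicate
    combined = sorted(w for lst in lists for w in set(lst))
    last, count = None, 0
    for w in combined:
        count = count + 1 if w == last else 1
        last = w
        if count == 3:
            return w
    return None

print(first_triplicate(["act", "zzz", "aaa"],
                       ["aaa", "iii", "ddd"],
                       ["zzz", "kkk", "aaa"],
                       ["iii", "zzz", "hhh"]))  # -> aaa
```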
Here we have 4 arrays of N strings, where N = 5. My approach to get all triplicates is:
Get the 1st string of the 1st array and add it to a Map<String, Set<Integer>> with the array number in the Set (I'm using a hash because insertion and search are O(1));
Get the 1st string of the 2nd array and add it to the map the same way;
Repeat step 2 for the 3rd and 4th arrays;
Repeat steps 1-3 using the 2nd string instead of the 1st;
Repeat steps 1-3 using the 3rd string instead of the 1st;
Etc.
In the worst case, we will perform 4*N map operations, which is well within the O(N*log(N)) bound.
import java.util.*;

public class Main {
public static void main(String[] args) {
String[][] arr = {
{ "xxx", "xxx", "xxx", "zzz", "aaa" },
{ "ttt", "bbb", "ddd", "iii", "aaa" },
{ "sss", "kkk", "uuu", "rrr", "zzz" },
{ "iii", "zzz", "lll", "hhh", "aaa" }};
List<String> triplicates = findTriplicates(arr);
Collections.sort(triplicates);
for (String word : triplicates)
System.out.println(word);
}
public static List<String> findTriplicates(String[][] arr) {
Map<String, Set<Integer>> map = new HashMap<String, Set<Integer>>();
List<String> triplicates = new ArrayList<String>();
final int N = 5;
for (int i = 0; i < N; i++) {
for (int j = 0; j < 4; j++) {
String str = arr[j][i];
if (map.containsKey(str)) {
map.get(str).add(j);
if (map.get(str).size() == 3)
triplicates.add(str);
} else {
Set<Integer> set = new HashSet<Integer>();
set.add(j);
map.put(str, set);
}
}
}
return triplicates;
}
}
Output:
aaa
zzz
Ok, if you don't care about the constant factors, this can be done in O(N), where N is the total size of the strings. It is important to distinguish the number of strings from their total size for practical purposes. (At the end I propose an alternative version which is O(n log n), where n is the number of string comparisons.)
You need one map string -> int for counts, and one temporary already_counted map string -> bool. The latter is basically a set. The important thing is to use the unordered/hash versions of the associative containers, to avoid log factors.
For each array, for each element, check whether the current element is in the already_counted set. If not, do count[current_string]++. Before going over to the next array, empty the already_counted set.
Now you basically need a min search. Go over each element of count, and if an element has a value of 3 or more, compare its key to your current min. Voilà, min is the lowest string with 3 or more occurrences.
You don't need the N log N factor, because you do not need all the triplicates, so no sorting or ordered data structures are needed. You have O(3*N) (again, N is the total size of all strings). This is an overestimation; later I give a more detailed estimate.
Now, the caveat is that this method is based on string hashing, which is O(S), where S is the size of string. Twice, to deal with per-array repetitions. So, alternatively, might be faster, at least in c++ implementation, to actually use ordered versions of the containers. There are two reasons for this:
Comparing strings might be faster than hashing them. If the strings are different, a comparison returns relatively quickly, whereas hashing always walks the whole string, and hashing is quite a bit more complicated.
They are contiguous in memory - cache friendly.
Hashing also has a problem with rehashing, etc. etc.
If the number of strings is not large, or if their size is very big, I would place my bet on the ordered versions. Also, if you have an ordered count you get an edge in finding the least element, because it's the 1st with count >= 3, though in the worst case you will get tons of a* with count 1 and a z with 3.
So, to sum it all up, let's call n the number of string comparisons and N the number of string hashes.
The hash-based method is O(2 N + n), and with some trickery you can bring the constant factor down by 1, e.g. by reusing the hash for count and already_counted, or by combining both data structures, for example via a bitset. So you would get O(N + n).
A pure string-comparison-based method would be O(2 n log n + n). Maybe hinting could be used to drop the constant, but I am not sure.
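A compact Python sketch of the hash-based counting method described above (names are my own):

```python
def min_triplicate(arrays):
    count = {}
    for arr in arrays:
        for s in set(arr):               # set() plays the already_counted role
            count[s] = count.get(s, 0) + 1
    # linear min-scan over words seen in 3 or more arrays
    candidates = [s for s, c in count.items() if c >= 3]
    return min(candidates) if candidates else None

print(min_triplicate([
    ["xxx", "xxx", "xxx", "zzz", "aaa"],
    ["ttt", "bbb", "ddd", "iii", "aaa"],
    ["sss", "kkk", "uuu", "rrr", "zzz"],
    ["iii", "zzz", "lll", "hhh", "aaa"],
]))  # -> aaa
```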
It can be solved in O(N) using a Trie.
You loop over the 4 lists one by one; for each list you insert its strings into the Trie.
When inserting a string s from list L, mark that list L contains s. When s has appeared in at least 3 lists and is lexicographically smaller than the current answer, update the answer.
Here is sample C++ code; you can input 4 lists of strings, each containing 5 strings, to test it.
http://ideone.com/fTmKgJ
#include<bits/stdc++.h>
using namespace std;
vector<vector<string>> lists;
string ans = "";
struct TrieNode
{
TrieNode* l[128];
int n;
TrieNode()
{
memset(l, 0, sizeof(TrieNode*) * 128);
n = 0;
}
} *root = new TrieNode();
void add(string s, int listID)
{
TrieNode* p = root;
for (auto x: s)
{
if (!p->l[x]) p->l[x] = new TrieNode();
p = p->l[x];
}
p->n |= (1<<listID);
if(__builtin_popcount(p->n) >= 3 && (ans == "" || s < ans)) ans = s;
}
int main() {
for(int i=0; i<4;i++){
string s;
vector<string> v;
for(int i=0; i<5; i++){
cin >> s;
v.push_back(s);
}
lists.push_back(v);
}
for(int i=0; i<4;i++){
for(auto s: lists[i]){
add(s, i);
}
}
if(ans == "") cout << "NO ANSWER" << endl;
else cout << ans << endl;
return 0;
}
Basically, I would like help designing an algorithm that takes a given number and returns a random number that is unrelated to the first number. The stipulations are that a) the output number will always be the same for the same input number, and b) within a certain range (e.g. 1-100), all output numbers are distinct; i.e., no two different input numbers under 100 will give the same output number.
I know it's easy to do by creating an ordered list of numbers, shuffling them randomly, and then returning the input's index. But I want to know if it can be done without any caching at all. Perhaps with some kind of hashing algorithm? Mostly the reason for this is that if the range of possible outputs were much larger, say 10000000000, then it would be ludicrous to generate an entire range of numbers and then shuffle them randomly, if you were only going to get a few results out of it.
Doesn't matter what language it's done in, I just want to know if it's possible. I've been thinking about this problem for a long time and I can't think of a solution besides the one I've already come up with.
Edit: I just had another idea; it would be interesting to have another algorithm that returned the reverse of the first one. Whether or not that's possible would be an interesting challenge to explore.
This sounds like a non-repeating random number generator. There are several possible approaches.
As described in this article, we can select a prime number p that satisfies p % 4 == 3 and is large enough (greater than the maximum value in the output range), and generate values this way:
int randomNumberUnique(int p, int x)
    if (x * 2 < p)
        return (x * x) % p
    else
        return p - (x * x) % p
This algorithm will cover all values in [0, p) for inputs in the range [0, p).
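A runnable sketch of this quadratic-residue mapping (the specific prime 10007 is my own choice; any prime with p % 4 == 3 works), including an exhaustive check of the permutation property for a small prime:

```python
def qpr(x, p):
    """Maps [0, p) onto [0, p) bijectively when p is prime and p % 4 == 3."""
    r = (x * x) % p
    return r if 2 * x < p else p - r

# verify the permutation property exhaustively for a small prime
p = 10007   # prime, and 10007 % 4 == 3
assert sorted(qpr(x, p) for x in range(p)) == list(range(p))
```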
Here's an example in C#:
private void DoIt()
{
const long m = 101;
const long x = 387420489; // must be coprime to m
var multInv = MultiplicativeInverse(x, m);
var nums = new HashSet<long>();
for (long i = 0; i < 100; ++i)
{
var encoded = i*x%m;
var decoded = encoded*multInv%m;
Console.WriteLine("{0} => {1} => {2}", i, encoded, decoded);
if (!nums.Add(encoded))
{
Console.WriteLine("Duplicate");
}
}
}
private long MultiplicativeInverse(long x, long modulus)
{
return ExtendedEuclideanDivision(x, modulus).Item1%modulus;
}
private static Tuple<long, long> ExtendedEuclideanDivision(long a, long b)
{
if (a < 0)
{
var result = ExtendedEuclideanDivision(-a, b);
return Tuple.Create(-result.Item1, result.Item2);
}
if (b < 0)
{
var result = ExtendedEuclideanDivision(a, -b);
return Tuple.Create(result.Item1, -result.Item2);
}
if (b == 0)
{
return Tuple.Create(1L, 0L);
}
var q = a/b;
var r = a%b;
var rslt = ExtendedEuclideanDivision(b, r);
var s = rslt.Item1;
var t = rslt.Item2;
return Tuple.Create(t, s - q*t);
}
That generates numbers in the range 0-100, from input in the range 0-100. Each input results in a unique output.
It also shows how to reverse the process, using the multiplicative inverse.
You can extend the range by increasing the value of m. x must be coprime with m.
Code cribbed from Eric Lippert's article, A practical use of multiplicative inverses, and a few of the previous articles in that series.
You cannot have completely unrelated numbers (particularly if you want the reverse as well).
There is a concept of the modular inverse of a number, but this only works if the range bound is a prime; e.g. 100 will not work, you would need 101 (a prime). This can provide you a pseudo-random number if you want.
Here is the concept of the modular inverse:
If there are two numbers a and b such that
(a * b) % p = 1
where p is any number, then
a and b are modular inverses of each other.
For this to be true, if we have to find the modular inverse of a with respect to a number p, then a and p must be coprime, i.e. gcd(a, p) = 1.
So, for all numbers in a range to have modular inverses, the range bound must be a prime number.
A few outputs for range bound 101 will be:
1 == 1
2 == 51
3 == 34
4 == 76
etc.
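These values are easy to verify; in Python 3.8+, pow with a negative exponent computes the modular inverse directly:

```python
# Verify the modular-inverse table above for p = 101.
p = 101
for a, b in [(1, 1), (2, 51), (3, 34), (4, 76)]:
    assert a * b % p == 1        # a and b are modular inverses mod 101
    assert pow(a, -1, p) == b    # pow(a, -1, p) computes the inverse directly
```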
EDIT:
Hey... actually, you know, you can use the combined approach of the modular inverse and the method defined by @Paul. Since every pair will be unique and all numbers will be covered, your random number can be:
random(k) = randomUniqueNumber(ModuloInverse(k), p) // this is Paul's function
The question:
Given any string, add the least amount of characters possible to make it a palindrome in linear time.
I'm only able to come up with an O(N^2) solution.
Can someone help me with an O(N) solution?
Reverse the string.
Use a modified Knuth-Morris-Pratt search to find the latest match (the simplest modification is to just append the original string to the reversed string and ignore matches after len(string)).
Append the unmatched rest of the reversed string to the original.
1 and 3 are obviously linear, and 2 is linear because Knuth-Morris-Pratt is.
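A minimal Python sketch of these three steps (the sentinel character is my addition; it keeps the prefix function from matching across the join):

```python
def kmp_failure(t):
    # Standard KMP prefix (failure) function.
    fail = [0] * len(t)
    k = 0
    for i in range(1, len(t)):
        while k and t[i] != t[k]:
            k = fail[k - 1]
        if t[i] == t[k]:
            k += 1
        fail[i] = k
    return fail

def make_palindrome(s):
    if not s:
        return s
    # The longest palindromic suffix of s is the longest prefix of
    # reversed(s) that is also a suffix of s; the sentinel stops
    # matches from spanning the join.
    t = s[::-1] + "\x00" + s
    k = kmp_failure(t)[-1]
    # Append the unmatched rest of the reversed string.
    return s + s[:len(s) - k][::-1]

assert make_palindrome("race") == "racecar"
assert make_palindrome("aaba") == "aabaa"
```

Every step is a single O(n) pass, so the whole thing stays linear.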
If only appending is allowed
A Scala solution:
def isPalindrome(s: String) = s.view.reverse == s.view
def makePalindrome(s: String) =
s + s.take((0 to s.length).find(i => isPalindrome(s.substring(i))).get).reverse
If you're allowed to insert characters anywhere
Every palindrome can be viewed as a set of nested letter pairs.
a n n a b o b
| | | | | * |
| -- | | |
--------- -----
If the palindrome length n is even, we'll have n/2 pairs. If it is odd, we'll have n/2 full pairs and one single letter in the middle (let's call it a degenerated pair).
Let's represent them by pairs of string indexes - the left index counted from the left end of the string, and the right index counted from the right end of the string, both ends starting with index 0.
Now let's write pairs starting from the outer to the inner. So in our example:
anna: (0, 0) (1, 1)
bob: (0, 0) (1, 1)
In order to make any string a palindrome, we will go from both ends of the string one character at a time, and at every step, add a character where necessary to produce a correct pair of identical characters.
Example:
Assume the input word is "blob"
Pair (0, 0) is (b, b) ok, nothing to do, this pair is fine. Let's increase the counter.
Pair (1, 1) is (l, o). Doesn't match. So let's add "o" at position 1 from the left. Now our word became "bolob".
Pair (2, 2). We don't need to look even at the characters, because we're pointing at the same index in the string. Done.
Wait a moment, but we have a problem here: in point 2. we arbitrarily chose to add a character on the left. But we could as well add a character "l" on the right. That would produce "blolb", also a valid palindrome. So does it matter? Unfortunately it does because the choice in earlier steps may affect how many pairs we'll have to fix and therefore how many characters we'll have to add in the future steps.
Easy algorithm: search all the possibilities. That would give us an O(2^n) algorithm.
Better algorithm: use Dynamic Programming approach and prune the search space.
In order to keep things simpler, now we decouple inserting of new characters from just finding the right sequence of nested pairs (outer to inner) and fixing their alignment later. So for the word "blob" we have the following possibilities, both ending with a degenerated pair:
(0, 0) (1, 2)
(0, 0) (2, 1)
The more such pairs we find, the fewer characters we will have to add to fix the original string. Every full pair found gives us two characters we can reuse. Every degenerated pair gives us one character to reuse.
The main loop of the algorithm will iteratively evaluate pair sequences in such a way, that in step 1 all valid pair sequences of length 1 are found. The next step will evaluate sequences of length 2, the third sequences of length 3 etc. When at some step we find no possibilities, this means the previous step contains the solution with the highest number of pairs.
After each step, we will remove the pareto-suboptimal sequences. A sequence is suboptimal compared to another sequence of the same length, if its last pair is dominated by the last pair of the other sequence. E.g. sequence (0, 0)(1, 3) is worse than (0, 0)(1, 2). The latter gives us more room to find nested pairs and we're guaranteed to find at least all the pairs that we'd find for the former. However sequence (0, 0)(1, 2) is neither worse nor better than (0, 0)(2, 1). The one minor detail we have to beware of is that a sequence ending with a degenerated pair is always worse than a sequence ending with a full pair.
After bringing it all together:
def makePalindrome(str: String): String = {
/** Finds the pareto-minimum subset of a set of points (here pair of indices).
* Could be done in linear time, without sorting, but O(n log n) is not that bad ;) */
def paretoMin(points: Iterable[(Int, Int)]): List[(Int, Int)] = {
val sorted = points.toSeq.sortBy(identity)
(List.empty[(Int, Int)] /: sorted) { (result, e) =>
if (result.isEmpty || e._2 <= result.head._2)
e :: result
else
result
}
}
/** Find all pairs directly nested within a given pair.
* For performance reasons tries to not include suboptimal pairs (pairs nested in any of the pairs also in the result)
* although it wouldn't break anything as prune takes care of this. */
def pairs(left: Int, right: Int): Iterable[(Int, Int)] = {
val builder = List.newBuilder[(Int, Int)]
var rightMax = str.length
for (i <- left until (str.length - right)) {
rightMax = math.min(str.length - left, rightMax)
val subPairs =
for (j <- right until rightMax if str(i) == str(str.length - j - 1)) yield (i, j)
subPairs.headOption match {
case Some((a, b)) => rightMax = b; builder += ((a, b))
case None =>
}
}
builder.result()
}
/** Builds sequences of size n+1 from sequence of size n */
def extend(path: List[(Int, Int)]): Iterable[List[(Int, Int)]] =
for (p <- pairs(path.head._1 + 1, path.head._2 + 1)) yield p :: path
/** Whether full or degenerated. Full-pairs save us 2 characters, degenerated save us only 1. */
def isFullPair(pair: (Int, Int)) =
pair._1 + pair._2 < str.length - 1
/** Removes pareto-suboptimal sequences */
def prune(sequences: List[List[(Int, Int)]]): List[List[(Int, Int)]] = {
val allowedHeads = paretoMin(sequences.map(_.head)).toSet
val containsFullPair = allowedHeads.exists(isFullPair)
sequences.filter(s => allowedHeads.contains(s.head) && (isFullPair(s.head) || !containsFullPair))
}
/** Dynamic-Programming step */
@scala.annotation.tailrec
def search(sequences: List[List[(Int, Int)]]): List[List[(Int, Int)]] = {
val nextStage = prune(sequences.flatMap(extend))
nextStage match {
case List() => sequences
case x => search(nextStage)
}
}
/** Converts a sequence of nested pairs to a palindrome */
def sequenceToString(sequence: List[(Int, Int)]): String = {
val lStr = str
val rStr = str.reverse
val half =
(for (List(start, end) <- sequence.reverse.sliding(2)) yield
lStr.substring(start._1 + 1, end._1) + rStr.substring(start._2 + 1, end._2) + lStr(end._1)).mkString
if (isFullPair(sequence.head))
half + half.reverse
else
half + half.reverse.substring(1)
}
sequenceToString(search(List(List((-1, -1)))).head)
}
Note: The code does not list all the palindromes, but gives only one example, and it is guaranteed it has the minimum length. There usually are more palindromes possible with the same minimum length (O(2^n) worst case, so you probably don't want to enumerate them all).
O(n) time solution.
Algorithm:
Find the longest palindrome within the given string that contains the last character, then append all the characters that are not part of that palindrome to the back of the string in reverse order.
Key point:
In this problem, the longest palindrome in the given string MUST contain the last character.
ex:
input: abacac
output: abacacaba
Here the longest palindrome in the input that contains the last letter is "cac". Therefore append all the letters before "cac" to the back in reverse order to make the entire string a palindrome.
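The key point can be checked with a short (quadratic, for clarity) Python sketch that finds the longest palindromic suffix and appends the skipped prefix in reverse:

```python
def make_palindrome(word):
    # Try suffixes from longest to shortest; the first palindromic
    # suffix found is the longest one.
    for i in range(len(word) + 1):
        suffix = word[i:]
        if suffix == suffix[::-1]:
            # Append the skipped prefix in reverse order.
            return word + word[:i][::-1]

assert make_palindrome("abacac") == "abacacaba"
```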
Written in C#, with a few test cases commented out:
static public void makePalindrome()
{
//string word = "aababaa";
//string word = "abacbaa";
//string word = "abcbd";
//string word = "abacac";
//string word = "aBxyxBxBxyxB";
//string word = "Malayal";
string word = "abccadac";
int j = word.Length - 1;
int mark = j;
bool found = false;
for (int i = 0; i < j; i++)
{
char cI = word[i];
char cJ = word[j];
if (cI == cJ)
{
found = true;
j--;
if(mark > i)
mark = i;
}
else
{
if (found)
{
found = false;
i--;
}
j = word.Length - 1;
mark = j;
}
}
for (int i = mark-1; i >=0; i--)
word += word[i];
Console.Write(word);
}
}
Note that this code gives you the solution for the least amount of letters to APPEND TO THE BACK to make the string a palindrome. If you want to append to the front, just have a 2nd loop that goes the other way. This will make the algorithm O(n) + O(n) = O(n). If you want a way to insert letters anywhere in the string to make it a palindrome, then this code will not work for that case.
I believe @Chronical's answer is wrong, as it seems to be for the best-case scenario, not the worst case which is used to compute big-O complexity. I welcome a proof, but the "solution" doesn't actually describe a valid answer.
KMP finds a matching substring in O(n + k) time, where n is the length of the input string and k the length of the substring we're searching for, but it does not tell you in O(n) time what the longest palindrome in the input string is.
To solve this problem, we need to find the longest palindrome at the end of the string. If this longest suffix palindrome is of length x, the minimum number of characters to add is n - x. E.g. the string aaba's longest suffix palindrome is aba, of length 3, thus our answer is 1. The algorithm to find out if a string is a palindrome takes O(n) time, whether using KMP or the simpler two-pointer algorithm (n/2 comparisons):
Take two pointers, one at the first character and one at the last character
Compare the characters at the pointers, if they're equal, move each pointer inward, otherwise return false
When the pointers point to the same index (odd string length), or have overlapped (even string length), return true
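The two-pointer check described above, sketched in Python:

```python
def is_palindrome(s):
    i, j = 0, len(s) - 1      # one pointer at each end
    while i < j:
        if s[i] != s[j]:
            return False       # mismatch: not a palindrome
        i += 1                 # move both pointers inward
        j -= 1
    return True                # pointers met or crossed
```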
Using the simple algorithm, we start from the entire string and check if it's a palindrome. If it is, we return 0, and if not, we check the string string[1...end], string[2...end] until we have reached a single character and return n - 1. This results in a runtime of O(n^2).
Splitting up the KMP algorithm into
Build table
Search for longest suffix palindrome
Building the table takes O(n) time, and then each check of "are you a palindrome" for each substring from string[0...end], string[1...end], ..., string[end - 2...end] each takes O(n) time. k in this case is the same factor of n that the simple algorithm takes to check each substring, because it starts as k = n, then goes through k = n - 1, k = n - 2... just the same as the simple algorithm did.
TL;DR:
KMP can tell you if a string is a palindrome in O(n) time, but that doesn't supply an answer to the question, because you have to check whether all the substrings string[0...end], string[1...end], ..., string[end - 2...end] are palindromes, resulting in the same (and actually worse) runtime as a simple palindrome-check algorithm.
#include<iostream>
#include<string>
using std::cout;
using std::endl;
using std::cin;
int main() {
std::string word, left("");
cin >> word;
size_t start, end;
for (start = 0, end = word.length()-1; start < end; end--) {
if (word[start] != word[end]) {
left.append(word.begin()+end, 1 + word.begin()+end);
continue;
}
left.append(word.begin()+start, 1 + word.begin()+start), start++;
}
cout << left << ( start == end ? std::string(word.begin()+end, 1 + word.begin()+end) : "" )
<< std::string(left.rbegin(), left.rend()) << endl;
return 0;
}
I don't know if it appends the minimum number of characters, but it produces palindromes.
Explained:
We will start at both ends of the given string and iterate inwards towards the center.
At each iteration, we check if each letter is the same, i.e. word[start] == word[end]?.
If they are the same, we append a copy of the character word[start] to another string called left, which as its name suggests will serve as the left-hand side of the new palindrome string when iteration is complete. Then we move both indices, start++ and end--, towards the center.
In the case that they are not the same, we append a copy of the character word[end] to the same string left.
And that is the basic loop of the algorithm.
When the loop is finished, one last check is done to make sure that if we got an odd-length palindrome, we append the middle character to the middle of the new palindrome formed.
Note that if you decide to append the opposite characters to the string left, the opposite of everything in the code becomes true: i.e. which index is moved at each iteration, which is moved when a match is found, the order of printing the palindrome, etc. I don't want to go through it all again, but you can try it and see.
The running complexity of this code should be O(N) assuming that append method of the std::string class runs in constant time.
If someone wants to solve this in Ruby, the solution can be very simple:
str = 'xcbc' # Any string that you want.
arr1 = str.split('')
arr2 = arr1.reverse
count = 0
while(str != str.reverse)
count += 1
arr1.insert(count-1, arr2[count-1])
str = arr1.join('')
end
puts str
puts str.length - arr2.count
I am assuming that you cannot replace or remove any existing characters?
A good start would be reversing one of the strings and finding the longest-common-substring (LCS) between the reversed string and the other string. Since it sounds like this is a homework or interview question, I'll leave the rest up to you.
Here, see this solution.
This is better than O(N^2).
The problem is subdivided into many smaller subproblems.
ex:
original "tostotor"
reversed "rototsot"
Here the 2nd position is 'o', so we divide into two subproblems by breaking the original string into "t" and "ostot".
For 't': the solution is 1.
For 'ostot': the solution is 2, because the LCS is "tot" and the characters that need to be added are "os".
So the total is 2 + 1 = 3.
def shortPalin(S):
    k = 0
    lis = len(S)
    for i in range(len(S) // 2):
        if S[i] == S[lis-1-i]:
            k = k + 1
        else:
            break
    S = S[k:lis-k]
    lis = len(S)
    prev = 0
    w = len(S)
    tot = 0
    for i in range(len(S)):
        if i >= w:
            break
        elif S[i] == S[lis-1-i]:
            tot = tot + lcs(S[prev:i])
            prev = i
            w = lis-1-i
    tot = tot + lcs(S[prev:i])
    return tot

def lcs(S):
    if len(S) == 1:
        return 1
    li = len(S)
    X = [0 for x in range(len(S)+1)]
    Y = [0 for l in range(len(S)+1)]
    for i in range(len(S)-1, -1, -1):
        for j in range(len(S)-1, -1, -1):
            if S[i] == S[li-1-j]:
                X[j] = 1 + Y[j+1]
            else:
                X[j] = max(Y[j], X[j+1])
        Y = X
    return li - X[0]

print(shortPalin("tostotor"))
Using Recursion
#include <iostream>
using namespace std;
int length( char str[])
{ int l=0;
for( int i=0; str[i]!='\0'; i++, l++);
return l;
}
int palin(char str[],int len)
{ static int cnt;
int s=0;
int e=len-1;
while(s<e){
if(str[s]!=str[e]) {
cnt++;
return palin(str+1,len-1);}
else{
s++;
e--;
}
}
return cnt;
}
int main() {
char str[100];
cin.getline(str,100);
int len = length(str);
cout<<palin(str,len);
}
Solution with O(n) time complexity
public static void main(String[] args) {
String givenStr = "abtb";
String palindromeStr = convertToPalindrome(givenStr);
System.out.println(palindromeStr);
}
private static String convertToPalindrome(String str) {
char[] strArray = str.toCharArray();
int low = 0;
int high = strArray.length - 1;
int subStrIndex = -1;
while (low < high) {
if (strArray[low] == strArray[high]) {
high--;
} else {
high = strArray.length - 1;
subStrIndex = low;
}
low++;
}
return str + (new StringBuilder(str.substring(0, subStrIndex+1))).reverse().toString();
}
// string to append to convert it to a palindrome
public static void main(String args[])
{
String s = new java.util.Scanner(System.in).nextLine();
System.out.println(min_operations(s));
}
static String min_operations(String str)
{
int i=0;
int j=str.length()-1;
String ans="";
while(i<j)
{
if(str.charAt(i)!=str.charAt(j))
{
ans=ans+str.charAt(i);
}
if(str.charAt(i)==str.charAt(j))
{
j--;
}
i++;
}
StringBuffer sd=new StringBuffer(ans);
sd.reverse();
return (sd.toString());
}
I'm reading the numbers 0, 1, ..., (N - 1) one by one in some order. My goal is to find the lexicographic index of this given permutation, using only O(1) space.
This question was asked before, but all the algorithms I could find used O(N) space. I'm starting to think that it's not possible. But it would really help me a lot with reducing the number of allocations.
Considering the following data:
chars = [a, b, c, d]
perm = [c, d, a, b]
ids = get_indexes(perm, chars) = [2, 3, 0, 1]
A possible solution for permutation with repetitions goes as follows:
len = length(perm) (len = 4)
num_chars = length(chars) (num_chars = 4)
base = num_chars ^ len (base = 4 ^ 4 = 256)
base = base / num_chars (base = 256 / 4 = 64)
id = base * ids[0] (id = 64 * 2 = 128)
base = base / num_chars (base = 64 / 4 = 16)
id = id + (base * ids[1]) (id = 128 + (16 * 3) = 176)
base = base / num_chars (base = 16 / 4 = 4)
id = id + (base * ids[2]) (id = 176 + (4 * 0) = 176)
base = base / num_chars (base = 4 / 4 = 1)
id = id + (base * ids[3]) (id = 176 + (1 * 1) = 177)
Reverse process:
id = 177
(id / (4 ^ 3)) % 4 = (177 / 64) % 4 = 2 % 4 = 2 -> chars[2] -> c
(id / (4 ^ 2)) % 4 = (177 / 16) % 4 = 11 % 4 = 3 -> chars[3] -> d
(id / (4 ^ 1)) % 4 = (177 / 4) % 4 = 44 % 4 = 0 -> chars[0] -> a
(id / (4 ^ 0)) % 4 = (177 / 1) % 4 = 177 % 4 = 1 -> chars[1] -> b
The number of possible permutations is given by num_chars ^ num_perm_digits, having num_chars as the number of possible characters, and num_perm_digits as the number of digits in a permutation.
This requires O(1) in space, considering the initial list as a constant cost; and it requires O(N) in time, considering N as the number of digits your permutation will have.
Based on the steps above, you can do:
function identify_permutation(perm, chars) {
    for (i = 0; i < length(perm); i++) {
        ids[i] = get_index(perm[i], chars);
    }
    len = length(perm);
    num_chars = length(chars);
    index = 0;
    base = num_chars ^ len;
    base = base / num_chars;
    for (i = 0; i < length(perm); i++) {
        index += base * ids[i];
        base = base / num_chars;
    }
    return index;
}
It's a pseudocode, but it's also quite easy to convert to any language (:
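Here is a runnable Python version of that pseudocode. Note the divisor is num_chars, the positional base of this number system (in the worked example this coincides with len, since both are 4):

```python
def identify_permutation(perm, chars):
    # Index of a permutation-with-repetition, as in the worked example:
    # each position is a digit in base num_chars.
    ids = [chars.index(c) for c in perm]
    num_chars = len(chars)
    base = num_chars ** (len(perm) - 1)   # weight of the leftmost position
    index = 0
    for d in ids:
        index += base * d
        base //= num_chars
    return index

# The example from above: perm = [c, d, a, b], chars = [a, b, c, d].
assert identify_permutation("cdab", "abcd") == 177
```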
If you are looking for a way to obtain the lexicographic index or rank of a unique combination instead of a permutation, then your problem falls under the binomial coefficient. The binomial coefficient handles problems of choosing unique combinations in groups of K with a total of N items.
I have written a class in C# to handle common functions for working with the binomial coefficient. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters.
Converts the K-indexes to the proper lexicographic index or rank of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle and is very efficient compared to iterating over the set.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes. I believe it is also faster than older iterative solutions.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to use the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coefficient.
The following tested code will iterate through each unique combinations:
public void Test10Choose5()
{
String S;
int Loop;
int N = 10; // Total number of elements in the set.
int K = 5; // Total number of elements in each group.
// Create the bin coeff object required to get all
// the combos for this N choose K combination.
BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);
// The Kindexes array specifies the indexes for a lexicographic element.
int[] KIndexes = new int[K];
StringBuilder SB = new StringBuilder();
// Loop thru all the combinations for this N choose K case.
for (int Combo = 0; Combo < NumCombos; Combo++)
{
// Get the k-indexes for this combination.
BC.GetKIndexes(Combo, KIndexes);
// Verify that the Kindexes returned can be used to retrieve the
// rank or lexicographic order of the KIndexes in the table.
int Val = BC.GetIndex(true, KIndexes);
if (Val != Combo)
{
S = "Val of " + Val.ToString() + " != Combo Value of " + Combo.ToString();
Console.WriteLine(S);
}
SB.Remove(0, SB.Length);
for (Loop = 0; Loop < K; Loop++)
{
SB.Append(KIndexes[Loop].ToString());
if (Loop < K - 1)
SB.Append(" ");
}
S = "KIndexes = " + SB.ToString();
Console.WriteLine(S);
}
}
You should be able to port this class over fairly easily to the language of your choice. You probably will not have to port over the generic part of the class to accomplish your goals. Depending on the number of combinations you are working with, you might need to use a bigger word size than 4 byte ints.
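The linked class isn't reproduced here, but the core ranking idea (walking Pascal's triangle instead of iterating over the whole set) can be sketched in Python; the function name and layout are mine:

```python
from math import comb
from itertools import combinations

def combo_rank(c, n):
    # Lexicographic rank of the sorted combination c among C(n, len(c)).
    # For each position, count how many combinations start with a smaller
    # element: each skipped element y contributes C(n-1-y, k-1-i) entries.
    k = len(c)
    rank, prev = 0, -1
    for i, x in enumerate(c):
        for y in range(prev + 1, x):
            rank += comb(n - 1 - y, k - 1 - i)
        prev = x
    return rank

# Sanity check: ranks of all 5-choose-3 combinations are 0..9 in order.
assert [combo_rank(c, 5) for c in combinations(range(5), 3)] == list(range(10))
```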
There is a java solution to this problem on geekviewpoint. It has a good explanation for why it's true and the code is easy to follow. http://www.geekviewpoint.com/java/numbers/permutation_index. It also has a unit test that runs the code with different inputs.
There are N! permutations. To represent the index you need at least log2(N!) ≈ N·log2(N) bits, so the index alone does not fit in constant space.
Here is a way to do it if you want to assume that arithmetic operations are constant time:
def permutationIndex(numbers):
    n = len(numbers)
    result = 0
    j = 0
    while j < n:
        # Determine factor, which is the number of possible permutations of
        # the remaining digits.
        i = 1
        factor = 1
        while i < n - j:
            factor *= i
            i += 1
        i = 0
        # Determine index, which is how many previous digits there were at
        # the current position.
        index = numbers[j]
        while i < j:
            # Only the digits that weren't used so far are valid choices, so
            # the index gets reduced if the number at the current position
            # is greater than one of the previous digits.
            if numbers[i] < numbers[j]:
                index -= 1
            i += 1
        # Update the result.
        result += index * factor
        j += 1
    return result
I've purposefully written out certain calculations that could be done more simply using some Python built-in operations, but I wanted to make it more obvious that no extra non-constant amount of space was being used.
As maxim1000 noted, the number of bits required to represent the result will grow quickly as n increases, so eventually big integers will be required, which no longer have constant-time arithmetic, but I think this code addresses the spirit of your question.
Nothing really new in the idea but a fully matricial method with no explicit loop or recursion (using Numpy but easy to adapt):
import numpy as np
import math

vfact = np.vectorize(math.factorial, otypes='O')

def perm_index(p):
    return np.dot(vfact(range(len(p)-1, -1, -1)),
                  p - np.sum(np.triu(p > np.vstack(p)), axis=0))
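Reproduced with a small check so the snippet runs standalone; [2, 3, 0, 1] is the permutation with lexicographic index 16 among the 4! orderings of 0..3:

```python
import numpy as np
import math

vfact = np.vectorize(math.factorial, otypes='O')

def perm_index(p):
    # The comparison matrix counts, for each position, how many earlier
    # elements are smaller (the Lehmer code), then dots with factorials.
    return np.dot(vfact(range(len(p) - 1, -1, -1)),
                  p - np.sum(np.triu(p > np.vstack(p)), axis=0))

assert perm_index(np.array([0, 1, 2, 3])) == 0    # identity is index 0
assert perm_index(np.array([2, 3, 0, 1])) == 16
assert perm_index(np.array([3, 2, 1, 0])) == 23   # last permutation: 4! - 1
```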
I just wrote a program in Visual Basic that can directly calculate every index, or every permutation corresponding to a given index, for up to 17 elements (this limit is due to the scientific-notation approximation my compiler applies to numbers over 17!).
If you are interested I can send the program or publish it somewhere for download.
It works fine and can be useful for testing and comparing against the output of your own code.
I used the method of James D. McCaffrey called the factoradic; you can read about it here and also here (in the discussion at the end of the page).