Find same sequences of values in arrays - algorithm

I have N different arrays with different numbers of elements. I want to know whether there is a good algorithm to find common sequences of values.
For example:
a= 1,2,3,4,5,6,7,8
b= 9,10,13,5,6,7,13,12
c= 20,36,24,11,2,3,5,6,7,9,11
As a result, I want to find that all three arrays have the sequence 5,6,7 in common. Any suggestions?

You can use a suffix array with an LCP array, or a suffix tree, to solve this problem. Check this tutorial: http://wcipeg.com/wiki/Longest_common_substring
It works in O(N log N) time, where N is the sum of the lengths of all the sequences.
If the number of lists is small, you can use the dynamic programming solution explained here: http://wcipeg.com/wiki/Longest_common_substring#Dynamic_programming_solution
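The suffix-array method is the efficient route. As a simpler sketch that is easy to check by hand (a swapped-in technique, not the method from the linked tutorial), the Python below binary-searches on the run length and intersects the sets of windows of that length taken from every array. It relies on the fact that if a common run of length L exists, a common run of length L-1 also exists. The names common_windows and longest_common_run are illustrative; the cost is roughly O(N·L·log L), with L the length of the shortest array, so it is slower than the suffix-array approach on long inputs.

```python
def common_windows(arrays, length):
    """Return the set of contiguous runs of `length` values present in every array."""
    common = None
    for arr in arrays:
        windows = {tuple(arr[i:i + length]) for i in range(len(arr) - length + 1)}
        common = windows if common is None else common & windows
        if not common:  # no shared run of this length, stop early
            break
    return common


def longest_common_run(arrays):
    """Binary search on the run length; returns (length, set of longest common runs)."""
    lo, hi = 0, min(len(a) for a in arrays)
    best = set()
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = common_windows(arrays, mid)
        if found:
            lo, best = mid, found   # a common run of length mid exists, try longer
        else:
            hi = mid - 1            # none of length mid, so none longer either
    return lo, best


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [9, 10, 13, 5, 6, 7, 13, 12]
c = [20, 36, 24, 11, 2, 3, 5, 6, 7, 9, 11]
print(longest_common_run([a, b, c]))  # (3, {(5, 6, 7)})
```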

Related

Most natural data structure to store a permutation of distinct elements?

One easy way to store a permutation of a sequence of distinct elements is as a string (or list) like, "acb" which is clearly a permutation of "abc". However, if I use a string to represent my permutation, I will end up with the possibility of strings like "abb" which do not correspond to any permutation. As a result, the representation of permutations in strings is not dense, so to speak. Lists of indices like [2,3,1] have the same problem.
Alternatively, I could recognize that over N elements there are N! permutations, which can be enumerated in some way. Then, I could store the permutation as an integer. However this is not ideal because the integer would be opaque to interpretation (nobody would know what "permutation number 43" meant), and also because the group structure of the integers over addition is nothing like the group structure of permutations.
Is there a way to represent permutations in a computer that does not have the drawbacks of the methods that I have suggested?

Minimum number of elements to generate all other elements using XOR

I have n integers a_1, ..., a_n. I want to pick the minimum number of them such that XOR-ing the picked ones produces all the others.
For example, consider [1,2,3]: 1^3=2, so you don't need 2 in the array and can remove it, ending up with [1,3]. So the minimum number of elements is 2, and they can form all the original elements in the array by XOR-ing any 2 of them. Would a greedy approach work here, or DP?
Edit: to explain what I am thinking. The greedy approach I thought about is based on the fact that if a^b=c, then a^c=b and b^c=a. First I delete all duplicates. Then I list, for each element, all the pairs it can form with another element to produce a third element of the array. This preprocessing takes O(n^3). Then I pick the element with the least contribution, delete it, and subtract 1 from the pair counts of the other elements it was paired with. I repeat this until all elements have <= 2 pairs, and then I stop. This also takes O(n^3), for a total of O(n^3). Does this greedy approach work? Is there a DP way to do it?
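The preprocessing step described above can be sketched as follows. This covers only the "delete duplicates, then count pairs" part, not the greedy deletion loop, and it says nothing about whether the greedy is correct. Looking up a ^ b in a set brings this step down to O(n^2). The function name pair_counts is hypothetical.

```python
from itertools import combinations


def pair_counts(values):
    """For each distinct value c, count unordered pairs {a, b} of other distinct values with a ^ b == c."""
    distinct = sorted(set(values))        # step 1: delete duplicates
    present = set(distinct)
    counts = {c: 0 for c in distinct}
    for a, b in combinations(distinct, 2):
        c = a ^ b
        # a ^ 0 == a, so exclude pairs that merely reproduce one of their own members
        if c in present and c != a and c != b:
            counts[c] += 1
    return counts


print(pair_counts([1, 2, 3]))  # {1: 1, 2: 1, 3: 1}
```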
If n is bounded by 50 I think backtracking should work.
Suppose at some step we have already selected a subset S of numbers (that should produce all the others) and want to include a new number in that subset.
Then we can do the following:
Consider all remaining numbers R and include in S all numbers that can't be produced by others (in S and R)
Include in S a random (or "best" in some way) number from R
Remove from R all numbers that can be produced by those in updated S
Also, you should keep track of the current best solution and prune all branches that cannot lead to a better result.

Algorithm for comparing two arrays with user-defined criteria

I want to compare the values of two float arrays, but by criteria that may differ from the usual ones. Here is how I define which array is the best.
Say we have two arrays named a and b. First, we compare the max values of the two arrays, and the array with the smaller max value wins. If the max values are equal, we divide each array into two parts at the position of its maximum: a[1:max_loc(a)-1] and a[max_loc(a)+1:len(a)], and similarly for b. Then we apply the same criterion to a[1:max_loc(a)-1] and b[1:max_loc(b)-1] to see which has the smaller max value. If they have the same max value on these intervals, we divide them into smaller arrays and repeat the comparison. We do the same for a[max_loc(a)+1:len(a)] and b[max_loc(b)+1:len(b)]. Once one array has a smaller max value on corresponding intervals, the program ends and prints out the best array.
What's the algorithm to fulfill this comparison?
P.S. these two arrays may have different length.
Most of the time, what you are looking for is already somewhere on the Internet:
https://www.ics.uci.edu/~eppstein/161/960118.html
It gives two fully explained examples that follow the divide-and-conquer idea (MergeSort and QuickSort).
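The comparison from the question maps directly onto a recursive divide-and-conquer function. The question leaves several details open, so this sketch makes its own assumptions: the first occurrence of the maximum is used as the split point, an empty interval on either side decides nothing, and the left sub-intervals are compared before the right ones. The name compare is hypothetical. Slicing and max make it O(n^2) in the worst case.

```python
def compare(a, b):
    """Return -1 if a is better, 1 if b is better, 0 if no interval decides.

    Assumptions (not from the question): first occurrence of the max is the split
    point, empty intervals decide nothing, left parts are compared before right parts.
    """
    if not a or not b:
        return 0
    ma, mb = max(a), max(b)
    if ma != mb:
        return -1 if ma < mb else 1          # smaller maximum wins
    ia, ib = a.index(ma), b.index(mb)        # split each array at its maximum
    left = compare(a[:ia], b[:ib])
    if left != 0:
        return left
    return compare(a[ia + 1:], b[ib + 1:])


print(compare([1.0, 5.0, 2.0], [1.0, 5.0, 3.0]))  # -1: tie on 5.0, then 2.0 < 3.0 on the right
```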

sorting a bivalued list

If I have a list of just binary values containing 0's and 1's, like 000111010110,
and I want to sort it to 000000111111, what would be the most efficient way to do this if I also know the list size? Right now I am thinking of keeping one counter that counts the number of 0's as I traverse the list from beginning to end. Then subtracting numberOfZeros from listSize gives me numberOfOnes. Then, instead of reordering the list in place starting with the zeros, I would just create a new list. Would you agree this is the most efficient method?
Your algorithm implements the most primitive version of the classic bucket sort algorithm (its counting sort implementation). It is the fastest possible way to sort numbers when their range is known and (relatively) small. Since zeros and ones are all you have, you do not need the array of counters that bucket sort uses: a single counter is sufficient.
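A minimal sketch of the single-counter counting sort, assuming the input is a Python list that contains only 0 and 1 (sort_binary is a hypothetical name). The number of ones is the list length minus the number of zeros.

```python
def sort_binary(values):
    """Counting sort for a list holding only 0s and 1s: one counter is enough."""
    zeros = values.count(0)          # single pass to count the zeros
    ones = len(values) - zeros       # everything else is a one
    return [0] * zeros + [1] * ones  # build the new, sorted list


print(sort_binary([0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0]))
```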
If the values are packed as bits of an integer, you can count the set bits with a population-count instruction (POPCNT in x86 assembly); the bit-scan instruction BSF only finds the index of the lowest set bit and does not count bits. To create the "sorted" value with n ones, set the (n+1)-th bit, then subtract one. This sets all the bits to the right of the (n+1)-th bit.
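The bit-packed variant can be sketched in Python, assuming the list is stored in the low `width` bits of an integer and that "sorted" means all ones at the low (right) end. bin(x).count("1") stands in for the hardware population count; sort_bits is a hypothetical name.

```python
def sort_bits(value, width):
    """Sort the `width` low bits of `value` so that all ones end up at the low end."""
    ones = bin(value & ((1 << width) - 1)).count("1")  # population count
    return (1 << ones) - 1  # set bit number `ones`, subtract one -> `ones` low bits set


print(bin(sort_bits(0b000111010110, 12)))  # 0b111111
```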
Bucket sort is, after all, a sorting algorithm.
I don't think there is a need for such operations. As we know, no comparison-based sorting algorithm is faster than N log N, so a general-purpose sort is the wrong tool here.
All you have to do is what you said at the very beginning: traverse the list and count the zeros (or the ones), which is O(n). Then create a new array with the counted zeros at the beginning followed by the ones. That is N + N steps in total, which is O(n).
That only works because you have just two values, so neither quicksort nor any other comparison sort can do this faster: no comparison sort beats N log N.

Generating ids for a set of integers

Background:
I'm working with permutations of the sequence of integers {0, 1, 2 ... , n}.
I have a local search algorithm that transforms a permutation in some systematic way into another permutation. The point of the algorithm is to produce a permutation that minimises a cost function. I'd like to work with a wide range of problems, from n=5 to n=400.
The problem:
To reduce search effort I need to be able to check if I've processed a particular permutation of integers before. I'm using a hash table for this and I need to be able to generate an id for each permutation which I can use as a key into the table. However, I can't think of any nice hash function that maps a set of integers into a key such that collisions do not occur too frequently.
Stuff I've tried:
I started out by generating a sequence of n prime numbers and multiplying the ith number in my permutation with the ith prime then summing the results. The resulting key however produces collisions even for n=5.
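For n=5 the permutations are of {0, 1, 2, 3, 4, 5}, and the prime-weighted sum cannot avoid collisions. By the rearrangement inequality, every key lies between 62 (largest values on smallest primes) and 143 (largest values on largest primes), which is only 82 possible keys for 720 permutations, so collisions are guaranteed by the pigeonhole principle. The snippet below checks this; weighted_sum_key is a hypothetical name for the scheme described above.

```python
from itertools import permutations

PRIMES = [2, 3, 5, 7, 11, 13]  # first six primes, one per position


def weighted_sum_key(perm):
    """Multiply the ith value by the ith prime and sum the results."""
    return sum(p * a for p, a in zip(PRIMES, perm))


keys = [weighted_sum_key(perm) for perm in permutations(range(6))]
print(len(keys), len(set(keys)))  # 720 permutations, far fewer distinct keys
```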
I also thought to concatenate the values of all numbers together and take the integer value of the resulting string as a key but the id quickly becomes too big even for small values of n. Ideally, I'd like to be able to store each key as an integer.
Does stackoverflow have any suggestions for me?
Zobrist hashing might work for you. You need to create an NxN matrix of random integers, where cell (i, j) represents element i being at the jth position of the current permutation.
For a given permutation you pick the N cell values, and xor them one by one to get the permutation's key (note that key uniqueness is not guaranteed).
The point of this algorithm is that if you swap two elements in your permutation, you can easily generate the new key from the current one by simply xor-ing out the old positions and xor-ing in the new ones.
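A sketch of Zobrist hashing for permutations, assuming the elements are 0..N-1 and 64-bit random table entries (the helper names and the fixed seed are illustrative). The swap update touches only four table cells, so it is O(1) regardless of N. As noted above, keys are not guaranteed to be unique.

```python
import random


def make_table(n, seed=0):
    """table[elem][pos]: a random 64-bit value for 'elem is at position pos'."""
    rng = random.Random(seed)
    return [[rng.getrandbits(64) for _ in range(n)] for _ in range(n)]


def zobrist_key(perm, table):
    """XOR together the cells selected by the permutation."""
    key = 0
    for pos, elem in enumerate(perm):
        key ^= table[elem][pos]
    return key


def swap_and_update(perm, key, i, j, table):
    """Swap positions i and j in place and return the updated key in O(1)."""
    a, b = perm[i], perm[j]
    key ^= table[a][i] ^ table[b][j]  # xor out the old positions
    perm[i], perm[j] = b, a
    key ^= table[b][i] ^ table[a][j]  # xor in the new positions
    return key


table = make_table(5)
perm = [0, 1, 2, 3, 4]
key = swap_and_update(perm, zobrist_key(perm, table), 1, 3, table)
print(perm, key == zobrist_key(perm, table))  # [0, 3, 2, 1, 4] True
```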
Judging by your question, and the comments you've left, I'd say your problem is not possible to solve.
Let me explain.
You say that you need a unique hash from your combination, so let's make that rule #1:
1: Need a unique number to represent a combination of an arbitrary number of digits/numbers
Ok, then in a comment you've said that since you're using quite a few numbers, storing them as a string or whatnot as a key to the hashtable is not feasible, due to memory constraints. So let's rewrite that into another rule:
2: Cannot use the actual data that were used to produce the hash as they are no longer in memory
Basically, you're trying to take a large number, and store that into a much smaller number range, and still have uniqueness.
Sorry, but you can't do that.
Typical hashing algorithms produce relatively unique hash values, so unless you're willing to accept collisions, in the sense that a new combination might be flagged as "already seen" even though it hasn't been, you're out of luck.
If you were to try a bit-field, where each combination has a bit that is 0 if it hasn't been seen, you would still need large amounts of memory.
For the permutation with n=20 that you left in a comment, you have 20! (2,432,902,008,176,640,000) combinations, which, if you simply stored each combination as a 1-bit in a bit-field, would require about 276,589 TiB (roughly 304,113 TB) of storage.
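The storage figure can be checked with Python's arbitrary-precision integers: one bit per permutation of 20 elements, converted to bytes and then to binary (TiB, 2^40 bytes) and decimal (TB, 10^12 bytes) units.

```python
import math

bits = math.factorial(20)    # one bit per permutation of 20 elements
num_bytes = bits // 8
tib = num_bytes / 2**40      # binary terabytes (TiB)
tb = num_bytes / 10**12      # decimal terabytes (TB)
print(bits, round(tib), round(tb))
```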
You're going to have to limit your scope of what you're trying to do.
As others have suggested, you can use hashing to generate an integer that will be unique with high probability. However, if you need the integer to always be unique, you should rank the permutations, i.e. assign an order to them. For example, a common order of permutations for set {1,2,3} is the lexicographical order:
1,2,3
1,3,2
2,1,3
2,3,1
3,1,2
3,2,1
In this case, the id of a permutation is its index in the lexicographical order. There are other methods of ranking permutations, of course.
Making ids a range of continuous integers makes it possible to implement the storage of processed permutations as a bit field or a boolean array.
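A minimal sketch of lexicographic ranking (the Lehmer-code approach), assuming distinct, comparable elements; lex_rank is a hypothetical name and this O(n^2) version favors clarity over speed. For each position it counts how many smaller elements are still available, since each of those starts a block of (remaining length)! permutations that come earlier. Note that the ids run up to n! - 1, so for n around 400 they are unique but far too large for a bit field.

```python
from math import factorial


def lex_rank(perm):
    """0-based index of `perm` among all orderings of its elements, in lexicographic order."""
    n = len(perm)
    rank = 0
    for i, x in enumerate(perm):
        # later elements smaller than x could have been placed here instead
        smaller = sum(1 for y in perm[i + 1:] if y < x)
        rank += smaller * factorial(n - 1 - i)
    return rank


print([lex_rank(p) for p in ([1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1])])
```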
How fast does it need to be?
You could always gather the integers as a string, then take the hash of that, and then just grab the first 4 bytes.
For a hash you could use any function really, like MD5 or SHA-256.
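A sketch of the "hash the string, keep the first 4 bytes" idea using Python's standard hashlib, assuming a comma-separated string and big-endian byte order (short_key is a hypothetical name). A 32-bit key makes collisions likely once tens of thousands of permutations have been stored, by the birthday bound.

```python
import hashlib


def short_key(perm):
    """SHA-256 of the comma-separated values, truncated to a 32-bit integer."""
    text = ",".join(map(str, perm))
    digest = hashlib.sha256(text.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big")  # first 4 bytes as an unsigned int


print(short_key([0, 1, 2, 3, 4]))
```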
You could MD5-hash a comma-separated string containing your ints.
In C# it would look something like this (Disclaimer: I have no compiler on the machine I'm using today):
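using System;
using System.Security.Cryptography;
using System.Text;

public class SomeClass {
    // Hashes the comma-separated form of the numbers. MD5 yields exactly
    // 16 bytes, which is the size the Guid(byte[]) constructor requires.
    static Guid GetHash(int[] numbers) {
        string csv = string.Join(",", numbers);
        using (MD5 md5 = MD5.Create()) {
            return new Guid(md5.ComputeHash(Encoding.ASCII.GetBytes(csv)));
        }
    }
}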
Edit: What was I thinking? As stated by others, you don't need a hash. The CSV should be sufficient as a string Id (unless your numbers array is big).
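Convert each number to a String, concatenate the Strings with a separator between them (via StringBuffer), and take the contents of the StringBuffer as a key. The separator matters: without it, [1, 23] and [12, 3] would both become "123".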
Not directly related to the question, but as an alternative you may use a trie as the lookup structure. Tries are very good for string operations, their implementation is relatively easy, and they may be faster than a hash set for a large number of long strings, since a lookup costs O(k), where k is the length of the key. You also aren't limited in key size (as you are with a regular hash set keyed by an int). In your case the key would be a string of all the numbers separated by some character.
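A minimal trie sketch built from nested dicts. Unlike the answer's string key, it branches on the integers directly, which removes the need for a separator character; that choice, the class name PermutationTrie, and the end-marker sentinel are assumptions for illustration. Each insert or lookup walks one node per element, so it costs O(k) for a key of length k.

```python
_END = object()  # sentinel marking "a full key ends here"


class PermutationTrie:
    """Trie over integer sequences, stored as nested dicts keyed by element."""

    def __init__(self):
        self.root = {}

    def add(self, seq):
        """Insert seq; return True if it was already present."""
        node = self.root
        for x in seq:
            node = node.setdefault(x, {})  # walk or create one level per element
        seen = _END in node
        node[_END] = True
        return seen


trie = PermutationTrie()
print(trie.add([0, 1, 2]), trie.add([0, 1, 2]))  # False True
```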
Prime powers would work: if p_i is the ith prime and a_i is the ith element of your tuple, then
p_0**a_0 * p_1**a_1 * ... * p_n**a_n
should be unique by the Fundamental Theorem of Arithmetic. Those numbers will get pretty big, though :-)
(e.g. for n=5, (1,2,3,4,5) will map to 870,037,764,750 which is already more than 32 bits)
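The prime-power key and the example value can be reproduced in Python, whose integers never overflow. The trial-division prime generator is a simple illustrative helper, adequate only for small n; both function names are hypothetical.

```python
def first_primes(k):
    """First k primes by trial division against the primes found so far."""
    primes = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def prime_power_key(seq):
    """Product of p_i ** a_i; unique by the Fundamental Theorem of Arithmetic."""
    key = 1
    for p, a in zip(first_primes(len(seq)), seq):
        key *= p ** a
    return key


key = prime_power_key([1, 2, 3, 4, 5])
print(key, key.bit_length())  # 870037764750 40
```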
Similar to Bojan's post, it seems the best way to go is to have a deterministic order for the permutations. If you process them in that order, there is no need for a lookup to see whether you have already processed a particular permutation.
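Processing permutations in a fixed order can be sketched with the classic in-place "next permutation" step, which produces lexicographic order without storing anything already visited. This assumes comparable elements; next_permutation is an illustrative name (Python's itertools.permutations yields the same order for sorted input).

```python
def next_permutation(seq):
    """Rearrange seq in place into its lexicographic successor.

    Returns False (and leaves seq sorted ascending) after the last permutation.
    """
    i = len(seq) - 2
    while i >= 0 and seq[i] >= seq[i + 1]:   # find the rightmost ascent
        i -= 1
    if i < 0:
        seq.reverse()
        return False
    j = len(seq) - 1
    while seq[j] <= seq[i]:                  # rightmost element larger than seq[i]
        j -= 1
    seq[i], seq[j] = seq[j], seq[i]
    seq[i + 1:] = reversed(seq[i + 1:])      # smallest ordering of the suffix
    return True


p = [1, 2, 3]
order = [tuple(p)]
while next_permutation(p):
    order.append(tuple(p))
print(order)
```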
Given two permutations of the same numbers {1, ..., n}, construct the mapping tuples (id, permutation1[id], permutation2[id]), i.e. (id, f1(id), f2(id)). You then get a unique map f3: for each id, take f2(id), find the id' whose tuple (id', f1(id'), f2(id')) has f1(id') == f2(id), and set f3(id) = id'. In other words, f3 is the inverse of f1 composed with f2.
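The map f3 above can be computed in O(n) by inverting f1 once with a dictionary. This sketch assumes 0-based lists instead of {1, ..., n}; relative_map is a hypothetical name.

```python
def relative_map(p1, p2):
    """f3[id] = id' such that p1[id'] == p2[id], i.e. inverse(p1) composed with p2."""
    inverse = {v: i for i, v in enumerate(p1)}  # value -> position in p1
    return [inverse[v] for v in p2]


p1 = [2, 0, 1]
p2 = [0, 1, 2]
print(relative_map(p1, p2))  # [1, 2, 0]
```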
