I have been tried to do the Morris Pratt table and the code is basically this one in C:
void preMp(char *x, int m, int mpNext[]) {
int i, j;
i = 0;
j = mpNext[0] = -1;
while (i < m) {
while (j > -1 && x[i] != x[j])
j = mpNext[j];
mpNext[++i] = ++j;
}
}
and here is where i get so far in Fortran
program MP_ALGORITHM
implicit none
integer, parameter :: m=4
character(LEN=m) :: x='abac'
integer, dimension(4) :: T
integer :: i, j
i=0
T(1)=-1
j=-1
do while(i < m)
do while((j > -1) .AND. (x(i+1:i+1) /= (x(j+i+1:j+i+1))))
j=T(j)
end do
i=i+1
j=j+1
T(i)=j
end do
print *, T(1:)
end program MP_ALGORITHM
and the problem is i think i am having the wrong output.
for x=abac it should be (?):
a b a c
-1 0 1 0
and my code is returning 0 1 1 1
so, what i've done wrong?
The problem here is that C indices start from zero, but Fortran indices start from one. You can try to adjust the index for every array acces by one, but this will get unwieldy.
The Morris-Pratt table itself is an array of indices, so it should look different in C and Fortran: The Fortran array should have one-based indices and it should use zero as invalid index.
Together with the error that chw21 pointed out, your function might look like this:
subroutine kmp_table(x, t)
implicit none
character(*), intent(in) :: x
integer, dimension(:), intent(out) :: t
integer m
integer :: i, j
m = len(x)
i = 1
t(1) = 0
j = 0
do while (i < m)
do while(j > 0 .and. x(i:i) /= x(j:j))
j = t(j)
end do
i = i + 1
j = j + 1
t(i) = j
end do
end subroutine
You can then use it in the Morris-Pratt algorithm as taken straight from the Wikipedia page with adjustment for Fortran indices:
function kmp_index(S, W) result(res)
implicit none
integer :: res
character(*), intent(in) :: S ! text to search
character(*), intent(in) :: W ! word to find
integer :: m ! zero-based offset in S
integer :: i ! one-based offset in W and T
integer, dimension(len(W)) :: T ! KMP table
call kmp_table(W, T)
i = 1
m = 0
do while (m + i <= len(S))
if (W(i:i) == S(m + i:m + i)) then
if (i == len(W)) then
res = m + 1
return
end if
i = i + 1
else
if (T(i) > 0) then
m = m + i - T(i)
i = T(i)
else
i = 1
m = m + 1
end if
end if
end do
res = 0
end function
(The index m is zero-based here, because t is only ever used in conjunction with i in S(m + i:m + i). Adding two one-based indices will yield an offset of one, whereas keeping m zero-based makes this a neutral addition. m is a local variable that isn't exposed to code from the outside.)
Alternatively, you could make your Fortran arrays zero-based by specifying a lower bound of zero for your string and array. That will clash with the useful character(*) notation, though, which always uses one-based indexing. In my opinion, it is better to think about the whole algorithm in the typical one-based indexing scheme of Fortran.
this site isn't really a debugging site. Normally I would suggest you have a look at how to debug code. It didn't take me very long to go through your code with a pen and paper and verify that that is indeed the table it produces.
Still, here are a few pointers:
The C code compares x[i] and x[j], but you compare x[i] and x[i+j] in your Fortran code, more or less.
Integer arrays usually also start at index 1 in Fortran. So just like adding one to the index in the x String, you also need to add 1 every time you access T anywhere.
I am trying to do a fast exponentiation. But the result does not seem to produce the correct result. Any help would be appreciated.
EDIT: Manage to solve it thanks for all the help.
if (content[i] == '1')
s1 = (int)(po1 * (Math.pow(po1, 2)));
else
s1 = po1 * po1;
final_result *= temp;
Check out this Exponation by squaring
You probably want to bit-shift right and square your base each time you encounter a 1 bit in the exponent
int pow(int base, int e)
{
int retVal = 1;
while (e)
{
if (e % 2 == 1)//i.e. last bit of exponent is 1
retVal *= base;
e >>= 1; //bitshift exponent to the right.
base *= base; // square base since we shifted 1 bit in our exponent
}
return retVal ;
}
A good way of thinking about it is that your exponent is being broken down: say, 6^7 (exponent in bits is 1, 1, 1) = 6^1 * 6^2 * 6^4 = 6 * 36 * 36^2 = 6 * 36 * 1296. Your base is always squaring itself.
temp = (int)(g1 * (Math.pow(g1, 2)));
This basically just boils down to g13. I'm not familiar with this algorithm but this can't be right.
Also, as a side note, don't ever call Math.pow(<var>, 2), just write <var> * <var>.
There are several problems with your code, starting with the fact that you are reading the exp string in the wrong direction, adding extra multiplications by the base, and not considering the rank of the 1 when raising the powers of 2.
Here is a python quick sketch of what you are trying to achieve:
a = int(raw_input("base"))
b = "{0:b}".format(int(raw_input("exp")))
res = 1
for index, i in enumerate(b[::-1]):
if i == '1':
res *= a**(2**index)
print res
Alternatively, you could square a at every iteration instead:
for index, i in enumerate(b[::-1]):
if i == '1':
res *= a
a *= a
I know that there is an algorithm that permits, given a combination of number (no repetitions, no order), calculates the index of the lexicographic order.
It would be very useful for my application to speedup things...
For example:
combination(10, 5)
1 - 1 2 3 4 5
2 - 1 2 3 4 6
3 - 1 2 3 4 7
....
251 - 5 7 8 9 10
252 - 6 7 8 9 10
I need that the algorithm returns the index of the given combination.
es: index( 2, 5, 7, 8, 10 ) --> index
EDIT: actually I'm using a java application that generates all combinations C(53, 5) and inserts them into a TreeMap.
My idea is to create an array that contains all combinations (and related data) that I can index with this algorithm.
Everything is to speedup combination searching.
However I tried some (not all) of your solutions and the algorithms that you proposed are slower that a get() from TreeMap.
If it helps: my needs are for a combination of 5 from 53 starting from 0 to 52.
Thank you again to all :-)
Here is a snippet that will do the work.
#include <iostream>
int main()
{
const int n = 10;
const int k = 5;
int combination[k] = {2, 5, 7, 8, 10};
int index = 0;
int j = 0;
for (int i = 0; i != k; ++i)
{
for (++j; j != combination[i]; ++j)
{
index += c(n - j, k - i - 1);
}
}
std::cout << index + 1 << std::endl;
return 0;
}
It assumes you have a function
int c(int n, int k);
that will return the number of combinations of choosing k elements out of n elements.
The loop calculates the number of combinations preceding the given combination.
By adding one at the end we get the actual index.
For the given combination there are
c(9, 4) = 126 combinations containing 1 and hence preceding it in lexicographic order.
Of the combinations containing 2 as the smallest number there are
c(7, 3) = 35 combinations having 3 as the second smallest number
c(6, 3) = 20 combinations having 4 as the second smallest number
All of these are preceding the given combination.
Of the combinations containing 2 and 5 as the two smallest numbers there are
c(4, 2) = 6 combinations having 6 as the third smallest number.
All of these are preceding the given combination.
Etc.
If you put a print statement in the inner loop you will get the numbers
126, 35, 20, 6, 1.
Hope that explains the code.
Convert your number selections to a factorial base number. This number will be the index you want. Technically this calculates the lexicographical index of all permutations, but if you only give it combinations, the indexes will still be well ordered, just with some large gaps for all the permutations that come in between each combination.
Edit: pseudocode removed, it was incorrect, but the method above should work. Too tired to come up with correct pseudocode at the moment.
Edit 2: Here's an example. Say we were choosing a combination of 5 elements from a set of 10 elements, like in your example above. If the combination was 2 3 4 6 8, you would get the related factorial base number like so:
Take the unselected elements and count how many you have to pass by to get to the one you are selecting.
1 2 3 4 5 6 7 8 9 10
2 -> 1
1 3 4 5 6 7 8 9 10
3 -> 1
1 4 5 6 7 8 9 10
4 -> 1
1 5 6 7 8 9 10
6 -> 2
1 5 7 8 9 10
8 -> 3
So the index in factorial base is 1112300000
In decimal base, it's
1*9! + 1*8! + 1*7! + 2*6! + 3*5! = 410040
This is Algorithm 2.7 kSubsetLexRank on page 44 of Combinatorial Algorithms by Kreher and Stinson.
r = 0
t[0] = 0
for i from 1 to k
if t[i - 1] + 1 <= t[i] - 1
for j from t[i - 1] to t[i] - 1
r = r + choose(n - j, k - i)
return r
The array t holds your values, for example [5 7 8 9 10]. The function choose(n, k) calculates the number "n choose k". The result value r will be the index, 251 for the example. Other inputs are n and k, for the example they would be 10 and 5.
zero-base,
# v: array of length k consisting of numbers between 0 and n-1 (ascending)
def index_of_combination(n,k,v):
idx = 0
for p in range(k-1):
if p == 0: arrg = range(1,v[p]+1)
else: arrg = range(v[p-1]+2, v[p]+1)
for a in arrg:
idx += combi[n-a, k-1-p]
idx += v[k-1] - v[k-2] - 1
return idx
Null Set has the right approach. The index corresponds to the factorial-base number of the sequence. You build a factorial-base number just like any other base number, except that the base decreases for each digit.
Now, the value of each digit in the factorial-base number is the number of elements less than it that have not yet been used. So, for combination(10, 5):
(1 2 3 4 5) == 0*9!/5! + 0*8!/5! + 0*7!/5! + 0*6!/5! + 0*5!/5!
== 0*3024 + 0*336 + 0*42 + 0*6 + 0*1
== 0
(10 9 8 7 6) == 9*3024 + 8*336 + 7*42 + 6*6 + 5*1
== 30239
It should be pretty easy to calculate the index incrementally.
If you have a set of positive integers 0<=x_1 < x_2< ... < x_k , then you could use something called the squashed order:
I = sum(j=1..k) Choose(x_j,j)
The beauty of the squashed order is that it works independent of the largest value in the parent set.
The squashed order is not the order you are looking for, but it is related.
To use the squashed order to get the lexicographic order in the set of k-subsets of {1,...,n) is by taking
1 <= x1 < ... < x_k <=n
compute
0 <= n-x_k < n-x_(k-1) ... < n-x_1
Then compute the squashed order index of (n-x_k,...,n-k_1)
Then subtract the squashed order index from Choose(n,k) to get your result, which is the lexicographic index.
If you have relatively small values of n and k, you can cache all the values Choose(a,b) with a
See Anderson, Combinatorics on Finite Sets, pp 112-119
I needed also the same for a project of mine and the fastest solution I found was (Python):
import math
def nCr(n,r):
f = math.factorial
return f(n) / f(r) / f(n-r)
def index(comb,n,k):
r=nCr(n,k)
for i in range(k):
if n-comb[i]<k-i:continue
r=r-nCr(n-comb[i],k-i)
return r
My input "comb" contained elements in increasing order You can test the code with for example:
import itertools
k=3
t=[1,2,3,4,5]
for x in itertools.combinations(t, k):
print x,index(x,len(t),k)
It is not hard to prove that if comb=(a1,a2,a3...,ak) (in increasing order) then:
index=[nCk-(n-a1+1)Ck] + [(n-a1)C(k-1)-(n-a2+1)C(k-1)] + ... =
nCk -(n-a1)Ck -(n-a2)C(k-1) - .... -(n-ak)C1
There's another way to do all this. You could generate all possible combinations and write them into a binary file where each comb is represented by it's index starting from zero. Then, when you need to find an index, and the combination is given, you apply a binary search on the file. Here's the function. It's written in VB.NET 2010 for my lotto program, it works with Israel lottery system so there's a bonus (7th) number; just ignore it.
Public Function Comb2Index( _
ByVal gAr() As Byte) As UInt32
Dim mxPntr As UInt32 = WHL.AMT.WHL_SYS_00 '(16.273.488)
Dim mdPntr As UInt32 = mxPntr \ 2
Dim eqCntr As Byte
Dim rdAr() As Byte
modBinary.OpenFile(WHL.WHL_SYS_00, _
FileMode.Open, FileAccess.Read)
Do
modBinary.ReadBlock(mdPntr, rdAr)
RP: If eqCntr = 7 Then GoTo EX
If gAr(eqCntr) = rdAr(eqCntr) Then
eqCntr += 1
GoTo RP
ElseIf gAr(eqCntr) < rdAr(eqCntr) Then
If eqCntr > 0 Then eqCntr = 0
mxPntr = mdPntr
mdPntr \= 2
ElseIf gAr(eqCntr) > rdAr(eqCntr) Then
If eqCntr > 0 Then eqCntr = 0
mdPntr += (mxPntr - mdPntr) \ 2
End If
Loop Until eqCntr = 7
EX: modBinary.CloseFile()
Return mdPntr
End Function
P.S. It takes 5 to 10 mins to generate 16 million combs on a Core 2 Duo. To find the index using binary search on file takes 397 milliseconds on a SATA drive.
Assuming the maximum setSize is not too large, you can simply generate a lookup table, where the inputs are encoded this way:
int index(a,b,c,...)
{
int key = 0;
key |= 1<<a;
key |= 1<<b;
key |= 1<<c;
//repeat for all arguments
return Lookup[key];
}
To generate the lookup table, look at this "banker's order" algorithm. Generate all the combinations, and also store the base index for each nItems. (For the example on p6, this would be [0,1,5,11,15]). Note that by you storing the answers in the opposite order from the example (LSBs set first) you will only need one table, sized for the largest possible set.
Populate the lookup table by walking through the combinations doing Lookup[combination[i]]=i-baseIdx[nItems]
EDIT: Never mind. This is completely wrong.
Let your combination be (a1, a2, ..., ak-1, ak) where a1 < a2 < ... < ak. Let choose(a,b) = a!/(b!*(a-b)!) if a >= b and 0 otherwise. Then, the index you are looking for is
choose(ak-1, k) + choose(ak-1-1, k-1) + choose(ak-2-1, k-2) + ... + choose (a2-1, 2) + choose (a1-1, 1) + 1
The first term counts the number of k-element combinations such that the largest element is less than ak. The second term counts the number of (k-1)-element combinations such that the largest element is less than ak-1. And, so on.
Notice that the size of the universe of elements to be chosen from (10 in your example) does not play a role in the computation of the index. Can you see why?
Sample solution:
class Program
{
static void Main(string[] args)
{
// The input
var n = 5;
var t = new[] { 2, 4, 5 };
// Helping transformations
ComputeDistances(t);
CorrectDistances(t);
// The algorithm
var r = CalculateRank(t, n);
Console.WriteLine("n = 5");
Console.WriteLine("t = {2, 4, 5}");
Console.WriteLine("r = {0}", r);
Console.ReadKey();
}
static void ComputeDistances(int[] t)
{
var k = t.Length;
while (--k >= 0)
t[k] -= (k + 1);
}
static void CorrectDistances(int[] t)
{
var k = t.Length;
while (--k > 0)
t[k] -= t[k - 1];
}
static int CalculateRank(int[] t, int n)
{
int k = t.Length - 1, r = 0;
for (var i = 0; i < t.Length; i++)
{
if (t[i] == 0)
{
n--;
k--;
continue;
}
for (var j = 0; j < t[i]; j++)
{
n--;
r += CalculateBinomialCoefficient(n, k);
}
n--;
k--;
}
return r;
}
static int CalculateBinomialCoefficient(int n, int k)
{
int i, l = 1, m, x, y;
if (n - k < k)
{
x = k;
y = n - k;
}
else
{
x = n - k;
y = k;
}
for (i = x + 1; i <= n; i++)
l *= i;
m = CalculateFactorial(y);
return l/m;
}
static int CalculateFactorial(int n)
{
int i, w = 1;
for (i = 1; i <= n; i++)
w *= i;
return w;
}
}
The idea behind the scenes is to associate a k-subset with an operation of drawing k-elements from the n-size set. It is a combination, so the overall count of possible items will be (n k). It is a clue that we could seek the solution in Pascal Triangle. After a while of comparing manually written examples with the appropriate numbers from the Pascal Triangle, we will find the pattern and hence the algorithm.
I used user515430's answer and converted to python3. Also this supports non-continuous values so you could pass in [1,3,5,7,9] as your pool instead of range(1,11)
from itertools import combinations
from scipy.special import comb
from pandas import Index
debugcombinations = False
class IndexedCombination:
def __init__(self, _setsize, _poolvalues):
self.setsize = _setsize
self.poolvals = Index(_poolvalues)
self.poolsize = len(self.poolvals)
self.totalcombinations = 1
fast_k = min(self.setsize, self.poolsize - self.setsize)
for i in range(1, fast_k + 1):
self.totalcombinations = self.totalcombinations * (self.poolsize - fast_k + i) // i
#fill the nCr cache
self.choose_cache = {}
n = self.poolsize
k = self.setsize
for i in range(k + 1):
for j in range(n + 1):
if n - j >= k - i:
self.choose_cache[n - j,k - i] = comb(n - j,k - i, exact=True)
if debugcombinations:
print('testnth = ' + str(self.testnth()))
def get_nth_combination(self,index):
n = self.poolsize
r = self.setsize
c = self.totalcombinations
#if index < 0 or index >= c:
# raise IndexError
result = []
while r:
c, n, r = c*r//n, n-1, r-1
while index >= c:
index -= c
c, n = c*(n-r)//n, n-1
result.append(self.poolvals[-1 - n])
return tuple(result)
def get_n_from_combination(self,someset):
n = self.poolsize
k = self.setsize
index = 0
j = 0
for i in range(k):
setidx = self.poolvals.get_loc(someset[i])
for j in range(j + 1, setidx + 1):
index += self.choose_cache[n - j, k - i - 1]
j += 1
return index
#just used to test whether nth_combination from the internet actually works
def testnth(self):
n = 0
_setsize = self.setsize
mainset = self.poolvals
for someset in combinations(mainset, _setsize):
nthset = self.get_nth_combination(n)
n2 = self.get_n_from_combination(nthset)
if debugcombinations:
print(str(n) + ': ' + str(someset) + ' vs ' + str(n2) + ': ' + str(nthset))
if n != n2:
return False
for x in range(_setsize):
if someset[x] != nthset[x]:
return False
n += 1
return True
setcombination = IndexedCombination(5, list(range(1,10+1)))
print( str(setcombination.get_n_from_combination([2,5,7,8,10])))
returns 188
The function a = 2 ^ b can quickly be calculated for any b by doing a = 1 << b.
What about the other way round, getting the value of b for any given a? It should be relatively fast, so logs are out of the question. Anything that's not O(1) is also bad.
I'd be happy with can't be done too if its simply not possible to do without logs or a search type thing.
Build a look-up table. For 32-bit integers, there are only 32 entries so it is O(1).
Most architectures also have an instruction to find the position of the most significant bit of a number a, which is the value b. (gcc provides the __builtin_clz function for this.)
For a BigInt, it can be computed in O(log a) by repeatedly dividing by 2.
int b = -1;
while (a != 0) {
a >>= 1;
++ b;
}
For this sort of thing I usually refer to this page with bit hacks:
Bit Twiddling Hacks
For example:
Find the log base 2 of an integer with a lookup table:
static const char LogTable256[256] =
{
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
-1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
};
unsigned int v; // 32-bit word to find the log of
unsigned r; // r will be lg(v)
register unsigned int t, tt; // temporaries
if (tt = v >> 16)
{
r = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
}
else
{
r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
}
There are also a couple of O(log(n)) algorithms given on that page.
Some architectures have a "count leading zeros" instruction. For example, on ARM:
MOV R0,#0x80 # load R0 with (binary) 10000000
CLZ R1,R0 # R1 = number of leading zeros in R0, i.e. 7
This is O(1).
Or you can write:
while ((a >>= 1) > 0) b++;
This is O(1). One could imagine this to be expanded to:
b = (((a >> 1) > 0) ? 1 : 0) + (((a >> 2) > 0) ? 1 : 0) + ... + (((a >> 31) > 0) ? 1 : 0);
With a complier optimization, that once (a >> x) > 0) returns false, rest won't be calculated. Also comparing with 0 is faster then any other comparison. Also:
, where k is maximum of 32 and g is 1.
Reference: Big O notation
But in case you where using BigInteger, then my code example would look like:
int b = 0;
String numberS = "306180206916083902309240650087602475282639486413"
+ "866622577088471913520022894784390350900738050555138105"
+ "234536857820245071373614031482942161565170086143298589"
+ "738273508330367307539078392896587187265470464";
BigInteger a = new BigInteger(numberS);
while ((a = a.shiftRight(1)).compareTo(BigInteger.ZERO) > 0) b++;
System.out.println("b is: " + b);
If a is a double rather than an int then it will be represented as mantissa and exponent. The exponent is the part you are looking for, as this is the logarithm of the number.
If you can hack the binary representation then you can get the exponent out. Look up the IEEE standard to see where and how the exponent is stored.
For an integral value, if some method of getting the most significant bit position is not available then you can binary-search the bits for the upper-most 1 which is therefore O(log numbits). Doing this may well actually perform faster than converting to a double first.
In Java you can use Integer.numberOfLeadingZeros to compute the binary logarithm. It returns the number of leading zeros in the binary representation, so
floor(log2(x)) = 31 - numberOfLeadingZeros(x)
ceil(log2(x)) = 32 - numberOfLeadingZeros(x - 1)
It can't be done without testing the high bit, but most modern FPUs support log2 so all is not lost.