Question about a string sort algorithm

I have a question from Programming Pearls. The problem is the following:
show how to use Lomuto's partitioning scheme to sort varying-length bit strings in time proportional to the sum of their lengths.
The algorithm is the following:
each record in x[0..n-1] has an integer length and a pointer to the array bit[0..length-1]
The code:
void bsort(int l, int u, int depth)
{
    int i, m;
    if (l >= u) return;
    for (i = l; i <= u; i++)      /* records too short to have bit[depth] go to the front */
        if (x[i].length < depth)
            swap(i, l++);
    m = l;
    for (i = l; i <= u; i++)      /* Lomuto partition: 0-bits before 1-bits */
        if (x[i].bit[depth] == 0)
            swap(i, m++);
    bsort(l, m - 1, depth + 1);
    bsort(m, u, depth + 1);
}
I need the following things:
how does this algorithm work?
how do I implement it in Java?

It's essentially the same in Java. If you know Java, which I assume you do, porting it shouldn't take more than a few minutes. I'm sure we'd be more than happy to give you some pointers as to how the algorithm works, but I'd like to see some work from you first. Take a pencil and some paper and trace the code; that's going to be your best bet with recursion.
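Once you've traced it, here is one shape the port might take. This is a sketch under my own simplifications, not the book's code: a record is just an int[] of bits (a Java array carries its own length, so the separate length field disappears), and bit indices are 0-based, so the "record exhausted" test becomes length <= depth.

```java
import java.util.Arrays;

public class BitStringSort {
    // Each record is its bit array (entries 0 or 1).
    static int[][] x;

    static void swap(int i, int j) {
        int[] t = x[i]; x[i] = x[j]; x[j] = t;
    }

    // Sorts x[l..u]; all records in the range agree on bits 0..depth-1.
    static void bsort(int l, int u, int depth) {
        if (l >= u) return;
        // Pass 1: records with no bit at index `depth` are exhausted
        // prefixes; move them to the front, where they sort first.
        for (int i = l; i <= u; i++)
            if (x[i].length <= depth)   // 0-based bits, hence <= not <
                swap(i, l++);
        // Pass 2: Lomuto partition on bit `depth` (0-bits before 1-bits).
        int m = l;
        for (int i = l; i <= u; i++)
            if (x[i][depth] == 0)
                swap(i, m++);
        bsort(l, m - 1, depth + 1);   // the 0-bit group
        bsort(m, u, depth + 1);       // the 1-bit group
    }

    public static void main(String[] args) {
        x = new int[][] { {1, 0}, {0}, {1}, {0, 1}, {} };
        bsort(0, x.length - 1, 0);
        System.out.println(Arrays.deepToString(x));
        // prints [[], [0], [0, 1], [1], [1, 0]]
    }
}
```

Note that each record is touched only at the depths where it still has bits, which is where the "sum of the lengths" bound comes from.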

Binary search within BTree to improve performance

I'm reading through Cormen et al. again, specifically the chapter on BTrees, and I am implementing one in C. However, I have a question about whether I could improve search performance by using binary search rather than linear search. The pseudocode given in the book looks something like:
ordered_pair* btree_search(btreenode* x, double val) {
    int i = 0;
    while (i < x->num_keys && val > x->keys[i]) i++;
    if (i < x->num_keys && val == x->keys[i]) return ordered_pair(x, i);
    else if (x->leaf) return NULL;
    else {
        disk_read(x->children[i]);
        return btree_search(x->children[i], val);
    }
}
(I have modified it to look like C and used index 0 rather than 1)
My question(s):
This looks like a linear search. However, since each collection of keys in a BTree node is implemented as an array, couldn't I use binary search instead? Would that lessen the time complexity of searching each node from O(n) to O(lg n)? Or does reading from the disk make a binary search here rather pointless? The reason I'm asking is that it seems relatively trivial to implement, and I am confused why Cormen et al. don't mention it at all. Or perhaps I am just missing something.
If you take the time to answer or attempt to answer this question, thank you for your time!
Yes, you can definitely use a binary search.
Whether or not there's an advantage to it depends on how big your blocks are and how much it costs to read them, but as you say, it's not difficult, and it's not going to be slower.
I would always do this with a binary search.
Perhaps the authors just didn't want to complicate the lesson on B-Trees, or maybe they are assuming that the time spent reading those blocks dwarfs the search within each one.
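Concretely, the linear scan in the question computes the first index i with keys[i] >= val, so any lower-bound binary search is a drop-in replacement for it. A sketch (in Java for brevity; the array name and parameters are illustrative, not from CLRS):

```java
// First index i in keys[0..numKeys) with keys[i] >= val -- the same i
// the while loop in btree_search ends with, so the caller's logic
// (check for equality, else descend to children[i]) is unchanged.
static int lowerBound(double[] keys, int numKeys, double val) {
    int lo = 0, hi = numKeys;       // invariant: answer lies in [lo, hi]
    while (lo < hi) {
        int mid = (lo + hi) >>> 1;  // unsigned shift avoids overflow
        if (keys[mid] < val) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}
```

With a branching factor in the hundreds this replaces roughly t comparisons per node with about log2(t), though the disk reads will still dominate the total cost.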

Hashing algorithms for data summary

I am on the search for a non-cryptographic hashing algorithm with a given set of properties, but I do not know how to describe it in Google-able terms.
Problem space: I have a vector of 64-bit integers which are mostly linearly distributed throughout that space. There are two exceptions to this rule: (1) the number 0 occurs considerably more frequently than chance, and (2) if a number x occurs, it is more likely to occur again than the baseline 2^-64. The goal is, given two vectors A and B, to have a convenient mechanism for quickly detecting whether A and B are not the same. Vectors are not all of fixed size, but any vector I wish to compare to another will have the same size (i.e. a size check is trivial).
The only special requirement I have is the ability to "back out" a piece of data. In other words, given A[i] = x and hash(A), it should be cheap to compute the hash of A with A[i] replaced by y. In other words, I want a non-cryptographic hash.
The most reasonable thing I have come up with is this (in Python-ish):
# Imagine this uses a Mersenne Twister or some other seeded RNG...
NUMS = generate_numbers(seed)

def hash(a):
    out = 0
    for idx in range(len(a)):
        out ^= a[idx] ^ NUMS[idx]
    return out

def hash_replace(orig_hash, idx, orig_val, new_val):
    return orig_hash ^ (orig_val ^ NUMS[idx]) ^ (new_val ^ NUMS[idx])
It is an exceedingly simple algorithm and it probably works okay. However, all my experience with writing hashing algorithms tells me somebody else has already solved this problem in a better way.
I think what you are looking for is called a homomorphic hashing algorithm, and it has already been discussed in connection with the Paillier cryptosystem.
As far as I can see from that discussion, there are no practical implementations at the moment.
The most interesting feature, the one for which I guess it fits your needs, is that:
H(x*y) = H(x)*H(y)
Because of that, you can freely define the lower limit of your unit and rely on that property.
I used the Paillier cryptosystem a few years ago during my studies (there was a Java implementation somewhere, but I no longer have the link), but it's far more complex than what you are looking for.
It has interesting features under certain constraints, like the following one:
n*C(x) = C(n*x)
Again, it looks similar to what you are looking for, so maybe you should search for this family of hashing algorithms. I'll have a try at Googling for a more specific link.
References:
This one is quite interesting, but maybe it is not a viable solution given that your space is [0, 2^64) (unless you accept dealing with big numbers).

How to assess maximum number of recursive calls before stack overflows

Let's take a recursive function, for example factorial. Let's also assume that we have a stack of 1 MB. Using pen and paper, how can I estimate the number of recursive calls to the function before the stack overflows? I'm not interested in any particular language, but rather in an abstract approach.
There are questions on SO that look similar, but most of them are concerned with a specific language, or with extending the stack size, or with estimating the limit by running a specific function, or with preventing overflow. I would like to find a mathematical way to estimate it.
I found a similar question in an algorithmic challenge but couldn't come up with any reasonable solution.
Any suggestions are highly appreciated.
EDIT
In response to the replies provided: if the language truly cannot be taken out of the equation, let's assume it's C#. Also, since we are passing a simple int or long to the function, it's not passed by reference but as a copy. Also, assume a naive implementation, without memoization, without multi-threading, an implementation that resembles the mathematical definition of the function as closely as possible:
private static long Factorial(long n)
{
    if (n < 0)
    {
        throw new ArgumentException("Negative numbers not supported");
    }
    switch (n)
    {
        case 0:
            return 1;
        case 1:
            return 1;
        default:
            return n * Factorial(n - 1);
    }
}
It depends heavily on the implementation of the function: how much memory does the function use before calling itself again? When it recurses 100 times, you will also have 100 function frames in memory, each holding the function's arguments and local variables, plus 100 return addresses reserved on the stack.
I don't think the language can easily be taken out of the equation, because you need to know exactly how the stack is used. For example, are objects passed by reference, or are they copied onto the stack as new instances?
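A rough empirical cross-check of that arithmetic (in Java rather than C#, but the frame accounting is analogous; the depth counter is instrumentation I've added, not part of the original function):

```java
public class StackDepth {
    static int depth = 0;

    // Same shape as the Factorial in the question; depth records how
    // many frames fit on the stack before it gives out.
    static long factorial(long n) {
        depth++;
        return n <= 1 ? 1 : n * factorial(n - 1);
    }

    public static void main(String[] args) {
        try {
            factorial(Long.MAX_VALUE);  // recurse until the stack overflows
        } catch (StackOverflowError e) {
            System.out.println("overflowed at depth " + depth);
        }
        // Pen-and-paper estimate: stack size / bytes per frame. A frame
        // here holds a return address, saved frame data, and one long
        // argument, so with a 1 MB stack and frames of a few dozen bytes
        // you would expect on the order of 10^4 calls. Dividing the
        // configured stack size (-Xss on the JVM) by the measured depth
        // gives the actual per-frame cost.
    }
}
```

Catching StackOverflowError like this is fine for a one-off measurement, though not something to do in production code.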

Branchless Binary Search

I'm curious if anyone could explain a branchless binary search implementation to me. I saw it mentioned in a recent question but I can't imagine how it would be implemented. I assume it could be useful to avoid branches if the number of items is quite large.
I'm going to assume you're talking about the sentence "Make a static const array of all the perfect squares in the domain you want to support, and perform a fast branchless binary search on it." found in this answer.
A "branchless" binary search is basically just an unrolled binary search loop. This only works if you know in advance the number of items in the array you're searching (as you would if it's static const). You can write a program to write the unrolled code if it's too long to do by hand.
Then, you must benchmark your solution to see whether it really is faster than a loop. If your branchless code is too big, it won't fit inside the CPU's fast instruction cache and will take longer to run than the equivalent loop.
If one has a comparison function which returns +1, -1, or 0 based on the position of the correct item versus the current one, one can initialize position to size/2 and stepsize to position/2, and then after each comparison do position += direction * stepsize; stepsize = stepsize / 2. Iterate until stepsize is zero.
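A concrete variant of that halving idea: the loop below shrinks a window by a step that depends only on the array length, so for a fixed-size (e.g. static const) array the loop can be fully unrolled, and the comparison inside it typically compiles to a conditional move rather than a branch. A sketch in Java (assumes a non-empty sorted array; the benchmarking caveats from the earlier answer still apply):

```java
// Lower bound (first index with a[i] >= key) in a sorted, non-empty
// array, written so the loop body has no data-dependent control flow:
// each step either advances `base` by `half` or leaves it alone.
static int branchlessLowerBound(int[] a, int key) {
    int base = 0, len = a.length;
    while (len > 1) {
        int half = len / 2;
        if (a[base + half - 1] < key)  // candidate for a conditional move
            base += half;
        len -= half;
    }
    return (a[base] < key) ? base + 1 : base;
}
```

For example, over {1, 3, 5, 7, 9} a search for 5 narrows base to index 2 in three steps, with the same sequence of array accesses regardless of which side each comparison chooses.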

Is there a way to predict unknown function value based on its previous values

I have values returned by an unknown function, for example:
# this is an easy case - a parabolic function
# but in my case the function is really unknown, as it is connected to process execution time
[0, 1, 4, 9]
Is there a way to predict the next value?
Not necessarily. Your "parabolic function" might be implemented like this:
def mindscrew
  @nums ||= [0, 1, 4, 9, "cat", "dog", "cheese"]
  @nums.shift
end
You can take a guess, but to predict with certainty is impossible.
You can try a neural network approach. There are plenty of articles you can find with the Google query "neural network function approximation". Many books are also available, e.g. this one.
If you just want data points
Extrapolation of data outside of known points can be estimated, but you need to accept the potential differences are much larger than with interpolation of data between known points. Strictly, both can be arbitrarily inaccurate, as the function could do anything crazy between the known points, even if it is a well-behaved continuous function. And if it isn't well-behaved, all bets are already off ;-p
There are a number of mathematical approaches to this (that have direct application to computer science) - anything from simple linear algebra to things like cubic splines; and everything in between.
If you want the function
Getting esoteric; another interesting model here is genetic programming; by evolving an expression over the known data points it is possible to find a suitably-close approximation. Sometimes it works; sometimes it doesn't. Not the language you were looking for, but Jason Bock shows some C# code that does this in .NET 3.5, here: Evolving LINQ Expressions.
I happen to have his code "to hand" (I've used it in some presentations); with something like a => a * a it will find it almost instantly, but it should (in theory) be able to find virtually any method - but without any defined maximum run length ;-p It is also possible to get into a dead end (evolutionary speaking) where you simply never recover...
Use the Wolfram Alpha API :)
Yes. Maybe.
If you have some input and output values, i.e. in your case [0,1,2,3] and [0,1,4,9], you could use response surfaces (basically function fitting, I believe) to 'guess' the actual function (in your case f(x) = x^2). If you let your guessing function be f(x) = c1*x + c2*x^2 + c3, there are algorithms that will determine that c1 = 0, c2 = 1, and c3 = 0 given your input and output, and with the resulting function you can predict the next value.
Note that most of the other answers to this question are valid as well. I am just assuming that you want to fit some function to the data. In other words, I find your question quite vague; please try to pose your questions as completely as possible!
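One cheap special case of the fitting idea: if the samples are taken at evenly spaced points (0, 1, 2, ... as here) and the underlying function is a polynomial of degree d, the d-th finite differences are constant, so the next value can be read straight off the difference table. A sketch (exact for polynomial sequences, only a guess for anything else):

```java
// Predict the next term of a sequence by repeated differencing: build
// the difference table, assume the deepest difference stays constant,
// and fold the last entries back up into the next term.
static long predictNext(long[] seq) {
    int n = seq.length;
    long[][] diff = new long[n][];
    diff[0] = seq.clone();
    for (int d = 1; d < n; d++) {
        diff[d] = new long[n - d];
        for (int i = 0; i < n - d; i++)
            diff[d][i] = diff[d - 1][i + 1] - diff[d - 1][i];
    }
    long next = 0;
    for (int d = n - 1; d >= 0; d--)
        next += diff[d][diff[d].length - 1];  // last entry of each row
    return next;
}
```

For [0, 1, 4, 9] the differences are [1, 3, 5], then [2, 2], so the prediction is 9 + 5 + 2 = 16, i.e. it recovers x^2 without ever naming the function.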
In general, no... unless you know it's a function of a particular form (e.g. polynomial of some degree N) and there is enough information to constrain the function.
e.g. for a more "ordinary" counterexample (see Chuck's answer) of why you can't necessarily assume n^2 without knowing it's a quadratic: you could have f(n) = n^4 - 6n^3 + 12n^2 - 6n, which for n = 0,1,2,3,4,5 gives f(n) = 0, 1, 4, 9, 40, 145.
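That quartic really does shadow the squares; a one-liner to check it (coefficients taken straight from the formula above):

```java
// f(n) = n^4 - 6n^3 + 12n^2 - 6n: agrees with n^2 at n = 0..3,
// then jumps to 40 and 145 at n = 4 and 5.
static long f(long n) {
    return n * n * n * n - 6 * n * n * n + 12 * n * n - 6 * n;
}
```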
If you do know it's of a particular form, there are some options... if the form is a linear combination of basis functions (e.g. f(x) = a + b*cos(x) + c*sqrt(x)), then least squares can get you the unknown coefficients for the best fit using those basis functions.
See also this question.
You can apply statistical methods to try to guess the next answer, but that might not work very well if the function is like this one (C):
int evil(void) {
    static int e = 0;
    if (50 == e++) {
        e = e * 100;
    }
    return e;
}
This function will return nice simple increasing numbers then ... BAM.
That's a hard problem.
You could look into recurrence relations for the special cases where such a task is actually possible.
