Data structure for "interpolated" table lookup - performance

I have a collection of 2-D points which represent a 1-variable function.
Given a random input value, I have to select the closest value.
Example:
Curve:
(1,5)
(2,8)
(5,9)
Input: 3 Output: 8
My main concern is speed, space doesn't matter as much.
Which data structure would be best?
EDIT: The table is static, it won't change during runtime

It depends upon whether the table is static or dynamic.
If it's static data, simple sorted array and binary search will get the job done: search for the key, if it isn't found, check the index above and below to see which is closer to the search key, and return its associated value.
If the data is dynamic, I'd go with a B+Tree variant (though any balanced tree structure should work). Essentially the same algorithm, but you'd be checking sibling nodes, instead of just checking adjacent array cells.

You say the table is static, and won't change during runtime.
Then if you need blazing performance, and if the table is not too large, it's hard to beat a hard-coded binary search.
For the table you gave, it looks like this:
result = (x < 3.5
? (x < 1.5
? 5
: 8
)
: 9
);
You may have to write a little program to take the table as input, and generate the code as output, so you can include it in your main program.
If you don't mind using a macro, you might make it a little easier to write, like this:
#define M(a,middle,b) (x < (middle) ? (a) : (b))
result = M( M(5, 1.5, 8), 3.5, 9);
The only way to beat that is with a hard-coded hash search (using a switch statement).
If the table can change between runs, it might make sense to, whenever the program starts, it generates the code, compiles and links it into a dll, loads the dll, and runs with that.
That can take all of about a second, and then you have the high speed.

Related

How do I generate a predictable random stream in hierarchy in c#?

I am making a procedural game with hierarchy.
So object A will have 10 children.
Each child will have 10 children and so on.
Now suppose I want to give each child a random colour, and a random position (assume these are given by integers).
Therefor let X be the "ID" of an object.
Let COLOUR and POSITION be enums of type PROPERTY.
Then I want to generate random integers:
int GenerateRandomInteger(PROPERTY P, int childNumber);
So I can use:
int N = parentObject.GenerateRandomInteger(COLOUR, 7);
For example.
Any ideas how to go about this?
In this case, GetRandomInteger should be implemented as a hash function. A hash function takes arbitrary data (here, the values of P and childNumber) and outputs a hash code. For the purposes of a game:
The hash function should have the avalanche property, meaning that every bit of the input affects every bit of the hash code.
Good hash functions here include MurmurHash3 and xxHash.
This answer also assumes that childNumber is unique throughout the application, rather than unique for a given parent.
The resulting hash code can then be used to generate a pseudorandom color and a position (for example, the first 24 bits of the hash code can be extracted and treated as a 8-bit-per-component RGB color). But further details on how this will work will depend on what programming language you're using and what ranges are acceptable for colors and positions, which you didn't specify in your question (there are several languages that use ints and enums, for example).

Make a previously unknown number of parallel operations. In VHDL

Im working on a project for which I need to make calculations with vectors (orthogonalizing a matrix using gram schmidt method). The length of this vectors is unknown now, the program must be able to adapt to different lengths. One of such calculations is calculating a new vector (C) which is the result of adding A and B. Each element of the vectors is a number in fixed-point.
I want C(i)=A(i)+B(i). For all the elements of the vector (for i=0 to N, where N is the vector length).
I can find 2 solutions for this but both present some problems:
1- I can declare in the entity, vectors whose length changes according to a generic and then just create a for loop which goes through all the vector.
for I in 0 to N loop
C(I)<=A(I)+B(I);
end loop;
The problem with this solution is that the execution would be sequential, and therefore slow. Im not completly sure about this and I dont know how to check it but I guess that the compiler is not smart enough to notice that it can be processed in parallel. In this application speed is a key factor.
2- I can declare vectors which are as long as the maximum possible length for the actual data and fill them with zeroes. Then I could just assign:
C(0)<=A(0)+B(0);
C(1)<=A(1)+B(1);
C(2)<=A(2)+B(2);
...
C(Nmax)<=A(Nmax)+B(Nmax);
This is not an elegant solution and in this application N can be between 3 and 300 therefore it could be a complete waste and tedious to program.
3- I want to find a third solution which could be able to create a number (asigned by the generic) of combinational calculations following a template such as C(i)=A(i)+B(i). Is there any solution like this? It is actually creating a loop which would not be executed sequentially but instead all at the same time.
I know that similar stuff can be done using CUDA but this project is actually a comparison between GPUs and FPGAs, so changing the platform is not a suitable solution either.
Thank you in advance
Edit: I have tought of another unsatisfactory solution but I want to share it in case it is helpful for somebody else checking this in the future. Given that A and B have the same length, you can write them in a 1-D format, that is: A(normal)=[1001,1100,0011], A(1-D)=100111000011. The same would be done with B.
If you know before hand that the sum of any two possible numbers can be expressed with the same amount of bits, there will be no problems. So with 4 unsigned bits you should make sure that in any possible case the numbers in A or B are !>0111 (not higher than 0111). You could just write C(1-D)=A(1-D)+B(1-D) and then just asign C(0)=C(1-D)(3 downto 0), C(1)=C(1-D)(7 downto 4) etc.
If you cannot make sure that the numbers are not higher than 0111 (in the 4 bit case) it wont work.
You might be able to use the length attribute to create a loop depending on the size of your vector.
https://www.csee.umbc.edu/portal/help/VHDL/attribute.html
As mentioned in the comment to the question the loop should be unrolled as long as it is not synchronized to the clock.

What's the most efficient way of combining switch/if statements

This question doesn't address any programming language in particular but of course I'm happy to hear some examples.
Imagine a big number of files, let's say 5000, that have all kinds of letters and numbers in it. Then, there is a method that receives a user input that acts as an alias in order to display that file. Without having the files sorted in a folder, the method(s) need to return the file name that is associated to the alias the user provided.
So let's say user input "gd322" stands for the file named "k4e23", the method would look like
if(input.equals("gd322")){
return "k4e23";
}
Now, imagine having 4 values in that method:
switch(input){
case gd322: return fw332;
case g344d: return 5g4gh;
case s3red: return 536fg;
case h563d: return h425d;
} //switch on string, no break, no string indicators, ..., pls ignore the syntax, it's just pseudo
Keeping in mind we have 5000 entries, there are probably more than just 2 entries starting with g. Now, if the user input starts with 's', instead of wasting CPU cycles checking all the a's, b's, c's, ..., we could also make another switch for this, which then directs to the 'next' methods like this:
switch(input[0]){ //implying we could access strings like that
case a: switchA(input);
case b: switchB(input);
// [...]
case g: switchG(input);
case s: switchS(input);
}
So the CPU doesn't have to check on all of them, but rather calls a method like this:
switchG(String input){
switch(input){
case gd322: return fw332;
case g344d: return 5g4gh;
// [...]
}
Is there any field of computer science dealing with this? I don't know how to call it and therefore don't know how to search for it but I think my thoughts make sense on a large scale. Pls move the thread if it doesn't belong here but I really wanna see your thoughts on this.
EDIT: don't quote me on that "5000", I am not in the situation described above and I wanted to talk about this completely theoretical, it could also be 3 entries or 300'000, maybe even less or more
If you have 5000 options, you're probably better off hashing them than having hard-coded if / switch statements. In c++ you could also use std::map to pair a function pointer or other option handling information with each possible option.
Interesting, but I don't think you can give a generic answer. It all depends on how the code is executed. Many compilers will have all kinds of optimizations, in the if and switch, but also in the way strings are compared.
That said, if you have actual (disk) files with those lists, then reading the file will probably take much longer than processing it, since disk I/O is very slow compared to memory access and CPU processing.
And if you have a list like that, you may want to build a hash table, or simply a sorted list/array in which you can perform a binary search. Sorting it also takes time, but if you have to do many lookups in the same list, it may be well worth the time.
Is there any field of computer science dealing with this?
Yes, the science of efficient data structures. Well, isn't that what CS is all about? :-)
The algorithm you described resembles a trie. It wouldn't be statically encoded in the source code with switch statements, but would use dynamic lookups in a structure loaded from somewhere and stuff, but the idea is the same.
Yes the problem is known and solved since decades. Hash functions.
Basically you have a set of values (here strings like "gd322", "g344d") and you want to know if some other value v is among them.
The idea is to put the strings in a big array, at an index calculated from their values by some function. Given a value v, you'll compute an index the same way, and check whether the value v is here or not. Much faster than checking the whole array.
Of course there is a problem with different values falling at the same place : collisions. Some magic is needed then : perfect hash functions whose coefficients are tweaked so values from the initial set don't cause any collisions.

Fastest data structure with default values for undefined indexes?

I'm trying to create a 2d array where, when I access an index, will return the value. However, if an undefined index is accessed, it calls a callback and fills the index with that value, and then returns the value.
The array will have negative indexes, too, but I can overcome that by using 4 arrays (one for each quadrant around 0,0).
You can create a Matrix class that relies on tuples and dictionary, with the following behavior :
from collections import namedtuple
2DMatrixEntry = namedtuple("2DMatrixEntry", "x", "y", "value")
matrix = new dict()
defaultValue = 0
# add entry at 0;1
matrix[2DMatrixEntry(0,1)] = 10.0
# get value at 0;1
key = 2DMatrixEntry(0,1)
value = {defaultValue,matrix[key]}[key in matrix]
Cheers
This question is probably too broad for stackoverflow. - There is not a generic "one size fits all" solution for this, and the results depend a lot on the language used (and standard library).
There are several problems in this question. First of all let us consider a 2d array, we say this is simply already part of the language and that such an array grows dynamically on access. If this isn't the case, the question becomes really language dependent.
Now often when allocating memory the language automatically initializes the spots (again language dependent on how this happens and what the best method is, look into RAII). Though I can foresee that actual calculation of the specific cell might be costly (compared to allocation). In that case an interesting thing might be so called "two-phase construction". The array has to be filled with tuples/objects. The default construction of an object sets a bit/boolean to false - indicating that the value is not ready. Then on acces (ie a get() method or a operator() - language dependent) if this bit is false it constructs, else it just reads.
Another method is to use a dictionary/key-value map. Where the key would be the coordinates and the value the value. This has the advantage that the problem of construct-on-access is inherit to the datastructure (though again language dependent). The drawback of using maps however is that lookup speed of a value changes from O(1) to O(logn). (The actual time is widely different depending on the language though).
At last I hope you understand that how to do this depends on more specific requirements, the language you used and other libraries. In the end there is only a single data structure that is in each language: a long sequence of unallocated values. Anything more advanced than that depends on the language.

implementing a basic search engine with prefix tree

The problem is the implementing a prefix tree (Trie) in functional language without using any storage and iterative method.
I am trying to solve this problem. How should I approach this problem ? Can you give me exact algorithm or link which shows already implemented one in any functional language?
Why I am trying to do => creating a simple search engine with an feature of
adding word to tree
searching a word in tree
deleting a word in tree
Why I want to use functional language => I want improve my problem-solving ability a bit further.
NOTE : Since it is my hobby project, I will first implement basic features.
EDIT:
i.) What I mean about "without using storage" => I don't want use variable storage ( ex int a ), reference to a variable, array . I want calculate the result by recursively then showing result to the screen.
ii.) I have wrote some line but then I have erased because what I wrote is made me angry. Sorry for not showing my effort.
Take a look at haskell's Data.IntMap. It is purely functional implementation of
Patricia trie and it's source is quite readable.
bytestring-trie package extends this approach to ByteStrings
There is accompanying paper Fast Mergeable Integer Maps which is also readable and through. It describes implementation step-by-step: from binary tries to big-endian patricia trees.
Here is little extract from the paper.
At its simplest, a binary trie is a complete binary tree of depth
equal to the number of bits in the keys, where each leaf is either
empty, indicating that the corresponding key is unbound, or full, in
which case it contains the data to which the corresponding key is
bound. This style of trie might be represented in Standard ML as
datatype 'a Dict =
Empty
| Lf of 'a
| Br of 'a Dict * 'a Dict
To lookup a value in a binary trie, we simply read the bits of the
key, going left or right as directed, until we reach a leaf.
fun lookup (k, Empty) = NONE
| lookup (k, Lf x) = SOME x
| lookup (k, Br (t0,t1)) =
if even k then lookup (k div 2, t0)
else lookup (k div 2, t1)
The key point in immutable data structure implementations is sharing of both data and structure. To update an object you should create new version of it with the most possible number of shared nodes. Concretely for tries following approach may be used.
Consider such a trie (from Wikipedia):
Imagine that you haven't added word "inn" yet, but you already have word "in". To add "inn" you have to create new instance of the whole trie with "inn" added. However, you are not forced to copy the whole thing - you can create only new instance of the root node (this without label) and the right banch. New root node will point to new right banch, but to old other branches, so with each update most of the structure is shared with the previous state.
However, your keys may be quite long, so recreating the whole branch each time is still both time and space consuming. To lessen this effect, you may share structure inside one node too. Normally each node is a vector or map of all possible outcomes (e.g. in a picture node with label "te" has 3 outcomes - "a", "d" and "n"). There are plenty of implementations for immutable maps (Scala, Clojure, see their repositories for more examples) and Clojure also has excellent implementation of an immutable vector (which is actually a tree).
All operations on creating, updating and searching resulting tries may be implemented recursively without any mutable state.

Resources