Data Structure for Representing Paths of a Tree Without Redundancy - data-structures

Consider the following tree structure in Clojure:
(def tree [7 9 [7 5 3 [4 6 9] 9 3] 1 [2 7 9 9]])
The paths to - for instance - all even numbers in the tree would be:
[[2 3 0] [2 3 1] [4 0]]
This is a list of lists. Each 'inner' list represents an absolute path from the root of the tree to the leaves of interest.
I'm looking now for a data structure to represent such a result without redundancy. As you can see, for instance the fragment of [2 3] is repeated in two entries. I came up with a nested hash-map, but maybe there's something simpler:
{2 {3 {0 true 1 true}}
 4 {0 true}}

I believe that a DAWG is overkill for your problem. Your paths are unlikely to share suffixes, so a trie should be enough (this is actually your nested hash-map approach). It is also pretty easy to generate in Clojure.
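To make the trie idea concrete, here is a minimal sketch in Python rather than Clojure. The helper names `paths_to` and `paths_to_trie` are made up for illustration; the sketch assumes the tree is a nested list and that no path is a prefix of another, which holds for paths to leaves.

```python
def paths_to(tree, pred, prefix=()):
    """Yield the index path of every leaf in a nested list for which pred holds."""
    for i, node in enumerate(tree):
        path = prefix + (i,)
        if isinstance(node, list):
            yield from paths_to(node, pred, path)
        elif pred(node):
            yield list(path)


def paths_to_trie(paths):
    """Merge paths into a nested dict so that shared prefixes are stored once."""
    trie = {}
    for path in paths:
        node = trie
        for index in path[:-1]:
            node = node.setdefault(index, {})
        node[path[-1]] = True  # marks the end of a path, as in the question
    return trie


tree = [7, 9, [7, 5, 3, [4, 6, 9], 9, 3], 1, [2, 7, 9, 9]]
paths = list(paths_to(tree, lambda n: n % 2 == 0))
trie = paths_to_trie(paths)
```

With the tree from the question, `paths` holds the three paths listed there and `trie` is the nested map with the `[2 3]` prefix stored only once.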

I think you could use a "deterministic acyclic finite state automaton (DAFSA)", also called a "directed acyclic word graph (DAWG)".
In your data, the paths constitute a set of strings (or words). Each path to a leaf would represent a path to an even number.

Related

How to implement least cost path through matrix in Haskell

Hello, I have a particular question I can't find any resources on for Haskell. I'm looking to create a function that takes a matrix as a parameter and returns an array, something like:
returnPossiblePaths :: [[Int]] -> [Int]
The condition, though, is that I return the array with the 'least cost path', i.e. the path that has the lowest sum. So if I have the matrix:
[6 9 3
2 5 7]
I want to iterate from the head to the tail, add the numbers up in that path and return the array with the smallest sum. e.g:
6 -> 9 -> 3 -> 7 = 25
6 -> 9 -> 5 -> 7 = 27
6 -> 2 -> 5 -> 7 = 20
6 -> 2 -> 5 -> 9 -> 3 -> 7 = 32
So here my result array would be: [6, 2, 5, 7]. I need help on how to go about doing this. I have no idea how I would iterate from head to tail along different 'paths' without going through all the elements. My general plan was to collect all the paths into arrays, map sum over all of them, then compare the results and return the array with the smallest sum. So I would first get all the arrays (paths) from the matrix, then apply this function to them:
addm :: [Int] -> Int
addm xs = sum xs
store those values in a variable, compare them, then return the lowest one. I know Haskell has amazing functions that make this much easier, and I was wondering if I could get help on how to go about doing this. Any advice is greatly appreciated, thanks!
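The plan described above (enumerate the paths, sum each, keep the cheapest) can be sketched as follows, in Python rather than Haskell. The movement rule is an assumption inferred from the four example paths: a step may go right, down, or up, and a cell is never revisited. The names `all_paths` and `least_cost_path` are hypothetical.

```python
def all_paths(matrix):
    """Enumerate simple paths from the top-left to the bottom-right cell.

    Assumption: a step may go right, down, or up, and never revisits a cell.
    """
    rows, cols = len(matrix), len(matrix[0])
    goal = (rows - 1, cols - 1)
    found = []

    def walk(cell, visited, values):
        if cell == goal:
            found.append(values)
            return
        r, c = cell
        for nr, nc in ((r, c + 1), (r + 1, c), (r - 1, c)):
            nxt = (nr, nc)
            if 0 <= nr < rows and 0 <= nc < cols and nxt not in visited:
                walk(nxt, visited | {nxt}, values + [matrix[nr][nc]])

    walk((0, 0), {(0, 0)}, [matrix[0][0]])
    return found


def least_cost_path(matrix):
    """Return the enumerated path with the smallest sum."""
    return min(all_paths(matrix), key=sum)


best = least_cost_path([[6, 9, 3], [2, 5, 7]])
```

For the 2x3 matrix in the question this enumerates exactly the four listed paths. Exhaustive enumeration grows quickly with matrix size, so it is only meant to mirror the plan in the question, not to be an efficient solution.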

Practical algorithms for permuting external memory

On a spinning disk, I have N records that I want to permute. In RAM, I have an array of N indices that contain the desired permutation. I also have enough RAM to hold n records at a time. What algorithm can I use to execute the permutation on disk as quickly as possible, taking into account the fact that sequential disk access is a lot faster?
I have plenty of excess disk to use for intermediate files, if desired.
This is a known problem. Find the cycles in your permutation order. For instance, given five records to permute [1, 0, 3, 4, 2], you have cycles (0, 1) and (2, 3, 4). You do this by picking an unused starting position; follow the index pointers until you return to your starting point. The sequence of pointers describes a cycle.
You then permute the records with an internal temporary variable, one record long.
temp = disk[0]
disk[0] = disk[1]
disk[1] = temp
temp = disk[2]
disk[2] = disk[3]
disk[3] = disk[4]
disk[4] = temp
Note that you can also perform the permutation as you traverse the pointers. You will also need some method to recall which positions have already been permuted, such as clearing the permutation index (set it to -1).
Can you see how to generalize that?
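A minimal in-memory sketch of the cycle approach, with a Python list standing in for the disk. The function names are hypothetical, and the separate `seen` list plays the role of the "already permuted" marker mentioned above; the convention follows the pseudocode, where position i receives the record from position perm[i].

```python
def find_cycles(perm):
    """Split a permutation into cycles by following the index pointers."""
    seen = [False] * len(perm)
    cycles = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        cycle = []
        pos = start
        while not seen[pos]:
            seen[pos] = True
            cycle.append(pos)
            pos = perm[pos]
        cycles.append(cycle)
    return cycles


def permute_by_cycles(records, perm):
    """Rearrange records in place so position i receives the record from perm[i]."""
    for cycle in find_cycles(perm):
        temp = records[cycle[0]]  # the one-record temporary
        for dst, src in zip(cycle, cycle[1:]):
            records[dst] = records[src]
        records[cycle[-1]] = temp


disk = ["a", "b", "c", "d", "e"]
permute_by_cycles(disk, [1, 0, 3, 4, 2])
```

This only shows the bookkeeping; it makes no attempt to order the accesses for sequential disk reads.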
This is a problem of interval coordination. I'll simplify the notation slightly by changing the memory available to M records -- having upper- and lower-case N is a little confusing.
First, we re-cast the permutations as a series of intervals, the rotational span during which a record needs to reside in RAM. If a record needs to be written to a lower-numbered position, we increase the endpoint by the list size, to indicate the wraparound -- have to wait for the next disk rotation. For instance, using my earlier example, we expand the list:
[1, 0, 3, 4, 2]
0 -> 1
1 -> 0+5
2 -> 3
3 -> 4
4 -> 2+5
Now, we apply standard greedy scheduling resolution. First, sort by endpoint:
[0, 1]
[2, 3]
[3, 4]
[1, 5]
[4, 7]
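As a sketch of the two steps so far (building the intervals and sorting them by endpoint), assuming `perm[i]` is the position that the record at position i must be written to. The function name `residency_intervals` is made up for illustration.

```python
def residency_intervals(perm):
    """Build (start, end) intervals during which each record must sit in RAM.

    A record moving to a lower-numbered position waits for the next rotation,
    so its endpoint is increased by the list size. A record that stays in
    place gets a zero-length interval in this sketch.
    """
    n = len(perm)
    intervals = []
    for src, dst in enumerate(perm):
        end = dst + n if dst < src else dst
        intervals.append((src, end))
    return sorted(intervals, key=lambda iv: iv[1])


intervals = residency_intervals([1, 0, 3, 4, 2])
```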
Now, apply the algorithm for M-1 "lanes"; the extra one is needed for swap space. We fill each lane, appending the interval with the earliest endpoint, whose start-point doesn't overlap:
[0, 1] [2, 3] [3, 4] [4, 7]
[1, 5]
We can do this in a total of 7 "ticks" if M >= 3. If M=2, we defer the second lane by 2 rotations to [11, 15].
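The lane-filling step can be sketched like this. The function name `fill_lanes` is hypothetical, and the sketch only reports how many lanes the greedy pass would like to have; it does not model the deferral by extra rotations when fewer than that many lanes are available.

```python
def fill_lanes(intervals):
    """Greedily pack intervals into lanes.

    Each lane repeatedly takes the remaining interval with the earliest
    endpoint whose start does not overlap the lane's last interval.
    """
    remaining = sorted(intervals, key=lambda iv: iv[1])
    lanes = []
    while remaining:
        lane, skipped = [], []
        for start, end in remaining:
            if not lane or start >= lane[-1][1]:
                lane.append((start, end))
            else:
                skipped.append((start, end))
        lanes.append(lane)
        remaining = skipped
    return lanes


lanes = fill_lanes([(0, 1), (1, 5), (2, 3), (3, 4), (4, 7)])
```

For the running example this yields the two lanes shown above.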
Sneftal's nice example gives us more troubles, with deeper overlap:
[0, 4]
[1, 5]
[2, 6]
[3, 7]
[4, 0+8]
[5, 1+8]
[6, 2+8]
[7, 3+8]
This requires 4 "lanes" if available, deferring lanes as needed if M < 5.
The pathological case is where every record in the permutation needs to be copied back one position, such as [3, 0, 1, 2], with M=2.
[0, 3]
[1, 4]
[2, 5]
[3, 6]
In this case, we walk through the deferral cycle multiple times. At the end of every rotation, we have to defer all remaining intervals by one rotation, resulting in
[0, 3] [3, 6] [2+4, 5+4] [1+4+4, 4+4+4]
Does that get you moving, or do you need more detail?
I have an idea, which might need further improvement. But here it goes:
suppose the hdd has the following structure:
5 4 1 2 3
And we want to write out this permutation:
2 3 5 1 4
Since hdd is a circular buffer, and assuming it can only rotate in one direction, we can write the above permutation using shifts as such:
5 >> 2
4 >> 3
1 >> 1
2 >> 2
3 >> 2
So let's put that in an array, and since we know it is a circular array, lets put its mirrors side by side:
| 2 3 1 2 2 | 2 3 1 2 2 | 2 3 1 2 2 | 2 3 1 2 2 |... Inf
Since we want to favor sequential reads, (or writes) we can put a cost function to the above series. Let the cost function be linear, i. e:
0 1 2 3 4 5 6 7 8 9 10 ... Inf
Now, let us add the cost function to the above series, but how to select the starting point?
The idea is to select the starting point such that you get the maximum congruent monotonically increasing sequence.
For example, if you select the 0 point to be on "3", you'll get
(1) | - 3 2 4 5 | 6 8 7 9 10 | ...
If you select the 0 point to be on "2", the one just right of "1", you'll get:
(2) | - - - 2 3 | 4 6 5 7 8 | ...
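The shift amounts and the two series (1) and (2) can be reproduced with a short sketch. The function names are made up for illustration, and it assumes all records are distinct so each one has a unique target position.

```python
def shifts(current, wanted):
    """Forward (circular) distance each record must travel to reach its target."""
    n = len(current)
    target = {value: pos for pos, value in enumerate(wanted)}
    return [(target[value] - pos) % n for pos, value in enumerate(current)]


def cost_series(shift_list, start, length):
    """Add the linear cost 0, 1, 2, ... to the circular shift array from `start`."""
    n = len(shift_list)
    return [shift_list[(start + k) % n] + k for k in range(length)]


s = shifts([5, 4, 1, 2, 3], [2, 3, 5, 1, 4])
series_1 = cost_series(s, 1, 9)  # zero point on the "3"
series_2 = cost_series(s, 3, 7)  # zero point on the "2" just right of the "1"
```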
Since we are trying to favor consecutive reads, lets define our read-write function to work as such:
f():
At any currently pointed hdd location, function will read the currently pointed hdd file, into available RAM. (namely, total space - 1, because we want to save 1 for swap)
If no available space is left on RAM for read, the function will assert and program will halt.
At any current hdd location, if ram holds the value that we want to be written in that hdd location, function reads the current file into swap space, writes the wanted value from the ram to hdd, and destroys the value in ram.
If a value is placed into hdd, function will check if the sequence is completed. If it is, program will return with success.
Now, we should note that if the following holds:
shift amount <= n - 1 (n : available memory we can hold)
We can traverse the hard disk in one pass using the above function. For example:
current: 4 5 6 7 0 1 2 3
we want: 0 1 2 3 4 5 6 7
n : 5
We can start anywhere we want, say from the initial "4". We read 4 items sequentially, (n has 4 items now) and we start placing from 0 1 2 3, (we can because n = 5 total, and 4 is used. 1 is used for swap). So the total operations is 4 consecutive reads, and then r-w operations for 8 times.
Using that analogy, it becomes clear that if we subtract "n-1" from equations (1) and (2), the positions which have value "<= 0" will be a better suit for initial position because the ones higher than zero will definitely require another pass.
So we select eq. (2) and subtract, for let's say "n = 3", we subtract 2 from eq. (2):
(2) | - - - 0 1 | 2 4 3 5 6 | ...
Now it is clear that, using f(), and starting from 0, assuming n = 3, we will have a starting operation as such: r, r, r-w, r-w, ...
So, how do we do the rest and find minimum cost? We will place an array with initial minimum cost, just below equation (2). The positions in that array will signify where we want f() to be executed.
| - - - 0 1 | 2 4 3 5 6 | ...
| - - - 1 1 | 1 1 1 1 1 | ...
The second array, the ones with 1's and 0's tell the program where to execute f(). Note that, if we assumed those locations wrong, f() will assert.
Before we actually start placing files onto the hdd, we of course want to see whether the f() positions are correct. We check whether there are assertions, and we try to minimize cost whilst removing all assertions. So, e.g.:
(1) 1111000000000000001111
(2) 1111111000000000000000
(1) obviously has a higher cost than (2). So the question reduces to finding the 1-0 array.
Some ideas on finding the best array:
Simplest solution is to write out all 1's and turn assertions into 0's. (essentially it's a skip). This method is guaranteed to work.
Brute force: write an array as shown in (2) and start shifting 1's to the right, in an order that tries out every available permutation:
1111111100000000
1111111010000000
1111110110000000
...
Full random approach: plug in mt19937 and start permuting. Whenever you see a sharp drop in cost, stop executing and implement the hdd copy-paste. You won't find the global minimum, but you'll get a nice trade-off.
Genetic algorithms: For permutations where "shift count is much lower than n - 1", the methodology provided in this answer should (?) provide a global minimum and smooth gradients. This allows one to use genetic algorithms without relying on mutations too much.
One advantage I find in this approach is that, since the OP mentioned that this is a real-life problem, the method provides an easier way to change cost functions. It is easier to detect the effect of, say, having lots of contiguous small files to be copied vs. having a single huge file. Or perhaps rrwwrrww is better than rrrrwwww?
Does any of this even make sense? We will have to try out ...

Having trouble understanding the K-way merge algorithm (Counter example given)

In K-way merge, the solution that uses a heap essentially maintains a heap and repeatedly extracts the max from it. I have a counterexample for why this won't work well.
5 -> 1 -> 0
4 -> 2 -> 1
3 -> 2 -> 0
Suppose we initialize our heap. It contains {5, 4, 3}.
We run extract max, we obtain 5 and add that into our new list (that represents the final solution). Our heap now looks like {4,3}. We then refill our heap with the head of list that we extracted the max element from.
This implies that we get something like this: {4, 3, 1}.
This doesn't make sense to me. This heap doesn't represent the top K elements anymore. 1 shouldn't be used to refill the heap, it should have been 2. So, this O(nlgk) method doesn't make much sense to me.
I hope someone can shed light on how this algorithm works because I'm stuck here.
The max heap always contains the max elements of k lists (or arrays). For your 'counter' example:
5 -> 1 -> 0
4 -> 2 -> 1
3 -> 2 -> 0
The heap {5, 4, 3} contains the max elements of these three lists.
Now you extract 5 from the heap, which means you also remove 5 from the first list:
5-->1-->0: after extracting 5, the list is 1-->0, so 1 is now the top of the list.
The new heap is then {4, 3, 1}, which still contains the max elements of the lists.
Let's continue your example. The current heap after extracting 5 and heapifying is:
{4, 3, 1}
Extracting 4 from the heap means you also remove 4 from its list:
4-->2-->1: after removing 4 you have 2-->1, so 2 is now the top element of the list.
The new heap is then
{3, 2, 1}
Keep doing this and you get what you want (a descending list).
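The walkthrough above can be sketched with Python's `heapq` module. Since `heapq` is a min-heap, the values are negated to get max-heap behaviour; the function name `merge_descending` is made up for illustration, and the sketch assumes every input list is already sorted in descending order.

```python
import heapq


def merge_descending(lists):
    """Merge k descending lists into one descending list.

    The heap holds one entry per list: the current head of that list,
    together with the list index and the position inside that list.
    """
    heap = [(-lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    merged = []
    while heap:
        neg_value, i, pos = heapq.heappop(heap)
        merged.append(-neg_value)
        # Refill from the same list the extracted element came from.
        if pos + 1 < len(lists[i]):
            heapq.heappush(heap, (-lists[i][pos + 1], i, pos + 1))
    return merged


result = merge_descending([[5, 1, 0], [4, 2, 1], [3, 2, 0]])
```

The heap never needs to hold the global top K elements; it only needs the largest remaining element of each list, because the overall maximum is always one of those heads.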

Divide Set of numbers to sequence? Find General Term?

How can we divide a set of numbers into sequences, and find the general term of each?
1 - numbers are always in order
2 - if we have n numbers n/2 numbers are always present
For example we have:
Input: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
Output--> 2*X, x=[0..15]
OR
Input: 0,2,4,5,6,8,10,12,14,15,16,18,20,22,24,26,28,30
Divide into two sets:
A: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
B: 5,10,15,20
Output--> 2*X, x=[0..15] AND 5*X, x=[1..4]
I think this is very difficult, any comments?
What computer field or algorithm can help me?
The problem as I understand it is this: Given a sequence of numbers, find the set of sequences that start from zero and increase by a constant multiple which cover this set.
Here's a general outline of what I would do:
I would make a list of all the numbers in the set and iterate through it, starting from the first two elements, to generate all of the possible sets that meet your criteria. If you encounter an element of the list while generating, you can remove it from consideration as a generating number, since any list with that number as a constant multiple is a subset of a list you've encountered before. When you are done, you will have a list of possible sets you can use to cover the original set. For example:
0,2,4,5,6,8,10,12,14,15,16,18,20,22,24,26,28,30
We will start with 0 and 2. We'll look for elements that are successively 2 larger and remove them from the list of elements that will be considered as possible multiples. Once we find a multiple of 2 that's not in this list, we'll stop generating. Here we go:
s(2) = [0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30]
Which leaves:
[5,15]
as the two potential other candidates. Do you see that any of the elements, eg, 4, which are divisible by two will make subsets of that list and thus don't need to be considered?
The remaining list in the set will start at 0 and increase by 5, our smallest element:
[0,5,10,15,20]
(Remember, we are checking the original list for these multiples and not the truncated list; the truncated list is only the list of remaining candidates.) When the candidate list is empty, we know we have found all of the sets contained in this set that have no supersets.
For a more complex example:
[0 2 3 4 5 6 7 8 9 10 12 13 14 15]
We'll start with:
[0 2 4 6 8 10 12 14]
Which leaves
[3 5 7 9 13 15]
as candidates, which in turn generates:
[0 3 6 9 12 15]
which leaves
[5 7 13]
which generates
[0 5 10 15]
which leaves
[7 13]
which generates
[0 7 14]
which leaves
[13]
which generates
[0 13].
The total combination of sets is:
[0 2 4 6 8 10 12 14]
[0 3 6 9 12 15]
[0 5 10 15]
[0 7 14]
[0 13].
At this point, you have the smallest list of all of the sets needed to cover your set. It should be trivial to generate the proper [0,1...n]/a*n descriptors from here.
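The outline above can be sketched as follows. The function name `covering_sequences` is hypothetical, and the sketch assumes the input contains 0 and otherwise only positive integers.

```python
def covering_sequences(numbers):
    """Generate the sequences 0, s, 2s, ... described above.

    The smallest remaining candidate s is used as the step. Generation stops
    at the first multiple missing from the original set, and every generated
    element is dropped from the candidate list.
    """
    present = set(numbers)
    candidates = sorted(x for x in present if x > 0)
    result = []
    while candidates:
        step = candidates[0]
        seq = []
        m = 0
        while m in present:  # always checked against the original set
            seq.append(m)
            m += step
        result.append(seq)
        covered = set(seq) | {step}
        candidates = [c for c in candidates if c not in covered]
    return result


sets = covering_sequences([0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15])
```

Each returned sequence with step s and k elements corresponds to a descriptor of the form s*X, X=[0..k-1].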

Permutation of a vector

Suppose I have a vector:
  0  1  2  3  4  5
[45,89,22,31,23,76]
And a permutation of its indices:
[5,3,2,1,0,4]
Is there an efficient way to reorder it according to the permutation, thus obtaining:
[76,31,22,89,45,23]
Using at most O(1) additional space?
Yes. Starting from the leftmost position, we put the element there in its correct position i by swapping it with the (other) misplaced element at that position i. This is where we need the O(1) additional space. We keep swapping pairs of elements around until the element in this position is correct. Only then do we proceed to the next position and do the same thing.
Example:
[5 3 2 1 0 4] initial state
[4 3 2 1 0 5] swapped (5,4), 5 is now in the correct position, but 4 is still wrong
[0 3 2 1 4 5] swapped (4,0), now both 4 and 0 are in the correct positions, move on to next position
[0 1 2 3 4 5] swapped (3,1), now 1 and 3 are both in the correct positions, move on to next position
[0 1 2 3 4 5] all elements are in the correct positions, end.
Note:
Since each swap operation puts at least one (of the two) elements in the correct position, we need no more than N such swaps altogether.
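Here is a sketch of an in-place rearrangement that produces the output asked for in the question, i.e. position i ends up holding the old value at index perm[i]. Note that it is a cycle-following variant rather than the exact pairwise-swap procedure described above: it moves values along each cycle with one saved element, and it overwrites the index array with the identity as it goes. The function name is made up for illustration.

```python
def apply_permutation_in_place(values, perm):
    """Rearrange values so that values[i] becomes the old values[perm[i]].

    Uses O(1) extra space. perm is overwritten with the identity
    permutation as positions are finished.
    """
    for start in range(len(values)):
        if perm[start] == start:
            continue  # already in place, or finished as part of an earlier cycle
        saved = values[start]
        pos = start
        while perm[pos] != start:
            src = perm[pos]
            values[pos] = values[src]
            perm[pos] = pos  # mark this position as done
            pos = src
        values[pos] = saved
        perm[pos] = pos


a = [45, 89, 22, 31, 23, 76]
b = [5, 3, 2, 1, 0, 4]
apply_permutation_in_place(a, b)
```

If the index array must be preserved, it would have to be copied first, which costs O(N) extra space.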
Zach's solution is very good.
Still, I was wondering why there is any need to sort. If you have the permutation of the indices, use the values as a pointer to the old array.
This may eliminate the need to sort the array in the first place. This is not a solution that can be used in all cases, but it will work fine in most cases.
For example:
a = [45,89,22,31,23,76];
b = [5,3,2,1,0,4]
Now if you want to loop through the values in a, you can do something like (pseudo-code):
for i=0 to 5
{
process(a[i]);
}
If you want to loop through the values in the new order, do:
for i=0 to 5
{
process(a[b[i]]);
}
As mentioned earlier, this solution may be sufficient in many cases, but not in all. For the other cases you can use the solution by Zach. But where this solution can be used, it is better because no sorting is needed at all.
