Find recursive, hierarchical structures in a sequence - algorithm

I am looking for an efficient algorithm to find structures that are both recursive and hierarchical from a sequence. I did come across the sequitur algorithm but it only returns hierarchical structures.
Consider the following sequence as an example: abcbcabc
What sequitur returns:
0 -> 1 2 1
1 -> a 2
2 -> b c
Required output:
0 -> 1 1
1 -> a 2
2 -> b c | b c 2

Related

Counting number of key comparisons in hash table

I have a hash table that look like this:
0
1 -> 1101 -> 1222 -> 1343 \\ 3 key comparison
2
3 -> 2973 -> 2588 \\ 2 key comparison
4
How many key comparisons are there?
The given answer is 1 + 2 + 1 = 4 but shouldn't it be 3 + 2 = 5?
The given answer is correct. One possible sequence:
At first, you have an empty list -> then add 1101 -> no comparison needed.
Add 1222 -> go to the 1 list, compared it with 1101 -> add it to the end of the list -> 1 comparison.
Add 1343 -> go to the 1 list, compared it with 1101, 1222 -> add it to the end of the list -> 2 comparisons.
Add 2973 -> no comparison,
Add 2588 -> go to 3 list, compared it with 2973 -> 1 comparison.
So, in total, the number of comparison is 0 + 1 + 2 + 0 + 1
Don't know where do you get the 3 + 2 = 5 from? total number of elements?

Algorithm to calculate the price volatility of a commodity

I am trying to design an algorithm to calculate how volatile the price fluctuations of a commodity are.
The way I would like this to work is that if the price of a commodity constantly goes up and down, it should have a higher score than if the price of the commodity gradually increases and then falls in price rapidly.
Here is an example of what I mean:
Commodity A: 1 -> 2 -> 3 -> 2 -> 1 -> 3 -> 4 -> 2 -> 1
Commodity B: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 2
Commodity C: 1 -> 2 -> 3 -> 4 -> 5 -> 4 -> 3 -> 2-> 1
Commodity A has a 'wave' like pattern in that its price goes up and falls down on a regular basis.
Commodity B has a 'cliff' like pattern in that the price goes up gradually and then falls steeply.
Commodity C has a 'hill' like pattern in that the price rises gradually and then falls gradually.
A should receive the highest ranking, followed by C, followed by B. The more of a wave pattern the price of the commodity follows, the higher a ranking it should have.
Does have any suggestions for an algorithm that could do this?
Thanks!
My Approach looks something like this.
For my algorithm, I am considering the above example.
A: 1 -> 2 -> 3 -> 2 -> 1 -> 3 -> 4 -> 2 -> 1
B: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 2
C: 1 -> 2 -> 3 -> 4 -> 5 -> 4 -> 3 -> 2-> 1
Now I will squash these list, by squash i mean taking the start value and end value of an increasing or decreasing sequence.
So, after squashing the list will look something like this.
A: 1 -> 3 -> 1 -> 4 -> 1
B: 1 -> 8 -> 2
C: 1 -> 5 -> 1
Now once this it done, I take the difference between i and i+1 element and then take the average and based on the average, I give them the rank.
So the difference between i and i+1 element will look something like this
2 2 3 3
A: 1 --> 3 --> 1 --> 4 --> 1
7 6
B: 1 --> 8 --> 2
4 4
C: 1 --> 5 --> 1
Now let's sum this difference and take the average.
A: (2+2+3+3)/4 = 2.5
B: (7+6)/2 = 6.5
C: (4+4)/2 = 4
Now we can assign ranks based on this average value where
A < C < B
Hope this helps!

Minimum Hamiltonian path length using brute force approach

Assumption At least one Hamiltonian path exists in the graph. I am trying to find minimum path length among all Hamiltonian paths.
My approach
Let us say we have three nodes.
Possible paths are
1 -> 2 -> 3
1 -> 3 -> 2
2 -> 1 -> 3
2 -> 3 -> 1
3 -> 1 -> 2
3 -> 2 -> 1
Find path length of all tracks and take minimum among them. Time complexity of this approach will be O(N*(N!)) where N = #nodes
I am getting the wrong answer with this approach. Is the above approach correct? Please help.

Ilustrate the left-most derivation on a token stream

I am trying to understand the left-most derivation in the context of LL parsing algorithm. This link explains it from the generative perspective. i.e. It shows how to follow left-most derivation to generate a specific token sequence from a set of rules.
But I am thinking about the opposite direction. Given a token stream and a set of grammar rules, how to find the proper steps to apply a set of rules by the left-most derivation?
Let's continue to use the following grammar from the aforementioned link:
And the given token sequence is: 1 2 3
One way is this:
1 2 3
-> D D D
-> N D D (rewrite the *left-most* D to N according to the rule N->D.)
-> N D (rewrite the *left-most* N D to N according to the rule N->N D.)
-> N (same as above.)
But there are other ways to apply the grammar rules:
1 2 3 -> D D D -> N D D -> N N D -> N N N
OR
1 2 3 -> D D D -> N D D -> N N D -> N N
But only the first derivation ends up in a single non-terminal.
As the token sequence length increase, there can be many more ways. I think to infer a proper deriving steps, 2 prerequisites are needed:
a starting/root rule
the token sequence
After giving these 2, what's the algorithm to find the deriving steps? Do we have to make the final result a single non-terminal?
The general process of LL parsing consists of repeatedly:
Predict the production for the top grammar symbol on the stack, if that symbol is a non-terminal, and replace that symbol with the right-hand side of the production.
Match the top grammar symbol on the stack with the next input symbol, discarding both of them.
The match action is unproblematic but the prediction might require an oracle. However, for the purposes of this explanation, the mechanism by which the prediction is made is irrelevant, provided that it works. For example, it might be that for some small integer k, every possible sequence of k input symbols is only consistent with at most one possible production, in which case you could use a look-up table. In that case, we say that the grammar is LL(k). But you could use any mechanism, including magic. It is only necessary that the prediction always be accurate.
At any step in this algorithm, the partially-derived string is the consumed input appended with the stack. Initially there is no consumed input and the stack consists solely of the start symbol, so that the the partially-derived string (which has had 0 derivations applied). Since the consumed input consists solely of terminals and the algorithm only ever modifies the top (first) element of the stack, it is clear that the series of partially-derived strings constitutes a leftmost derivation.
If the parse is successful, the entire input will be consumed and the stack will be empty, so the parse results in a leftmost derivation of the input from the start symbol.
Here's the complete parse for your example:
Consumed Unconsumed Partial Production
Input Stack input derivation or other action
-------- ----- ---------- ---------- ---------------
N 1 2 3 N N → N D
N D 1 2 3 N D N → N D
N D D 1 2 3 N D D N → D
D D D 1 2 3 D D D D → 1
1 D D 1 2 3 1 D D -- match --
1 D D 2 3 1 D D D → 2
1 2 D 2 3 1 2 D -- match --
1 2 D 3 1 2 D D → 3
1 2 3 3 1 2 3 -- match --
1 2 3 -- -- 1 2 3 -- success --
If you read the last two columns, you can see the derivation process starting from N and ending with 1 2 3. In this example, the prediction can only be made using magic because the rule N → N D is not LL(k) for any k; using the right-recursive rule N → D N instead would allow an LL(2) decision procedure (for example,"use N → D N if there are at least two unconsumed input tokens; otherwise N → D".)
The chart you are trying to produce, which starts with 1 2 3 and ends with N is a bottom-up parse. Bottom-up parses using the LR algorithm correspond to rightmost derivations, but the derivation needs to be read backwards, since it ends with the start symbol.

merging linear lists - reconstruct railway network

I need to reconstruct the sequence of stations in a railway network from the sequences of single trips requested from a arbitrary station. There's no direction given in the data. But every request returns an terminal stop. The sequences of single trips can have gaps.
The (end-) result is always a linear list - forking is not allowed.
For example:
Result trips from requested station "4" :
4 - 3 - 2 - 1
4 - 1
4 - 5 - 6
4 - 8 - 9
4 - 6 - 7 - 8 - 9
manually reordered:
1 - 2 - 3 - 4
1 - 4
- 4 - 5 - 6
- 4 - 8 - 9
- 4 - 6 - 7 - 8 - 9
After merging result should be:
1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9
start/stop: 1, 9
Is there an algorithm to calculate the resulting "rope of pearls" list? I tried to figure it out with perls graph-module, but no luck. My books on algorithms doesn't help either.
I think, there are pathologic cases, where multiple solutions are possible, depending on input data.
Maybe someone has an idea to solve it!
As you see in the answers, there is more than one solution. So here's a real-world dataset:
2204236 -> 2200007 -> 2200001
2204236 -> 2203095 -> 2203976 -> 2200225 -> 2200007 -> 2200001
2204236 -> 2204805 -> 2204813 -> 2204401 -> 2219633 -> 2204476 -> 2202024 -> 2202508 -> 2202110 -> 2202026
2204236 -> 2204813 -> 2204401 -> 2219633 -> 2202508 -> 2202110 -> 2202026 -> 3011047 -> 3011048 -> 3011049
2204236 -> 2204813 -> 2204401 -> 2219633 -> 2204476 -> 2202024 -> 2202508 -> 2202110 -> 2202352 -> 2202026
2204236 -> 2204813 -> 2204401 -> 2219633 -> 2204476 -> 2202024 -> 2202508 -> 2209637 -> 2202110
solution of the example data with perl:
use Graph::Directed;
use Graph::Traversal::DFS;
my $g = Graph::Directed->new;
$g->add_path(1,2,3,4);
$g->add_path(1,4);
$g->add_path(4,5,6);
$g->add_path(4,8,9);
$g->add_path(4,6,7,8,9);
print "The graph is $g\n";
my #topo = $g->toposort;
print "g toposorted = #topo\n";
Output
> The graph is 1-2,1-4,2-3,3-4,4-5,4-6,4-8,5-6,6-7,7-8,8-9
> g toposorted = 1 2 3 4 5 6 7 8 9
Using the other direction
$g->add_path(4,3,2,1);
$g->add_path(4,1);
$g->add_path(4,5,6);
$g->add_path(4,8,9);
$g->add_path(4,6,7,8,9);
reveals the second solution
The graph is 2-1,3-2,4-1,4-3,4-5,4-6,4-8,5-6,6-7,7-8,8-9
g toposorted = 4 3 2 1 5 6 7 8 9
Treat the lists node links in a graph. 4-3-2-1 should mean 4 must come before 3, 3 before 2 and 2 before 1. So add arcs from 4 to 3, 3 to 2, 2 to 1.
Once you have all of those you run a topological sort(look it up on wikipedia) on the resulting graph. This will guarantee that the order you get will always respect the partial orderings you are given.
The only case when you are not going to find a solution is when the data is contradicting itself (if you have 4-3-2 and 4-2-3 there's no possible ordering).
You are right, there are multiple cases. Another good solution is 4-5-6-7-8-9-3-2-1, for your example.
Terminal stop station is articulation node and it splits graph into multiple partitions: all nodes inside partition are reachable from one another, nodes in different partitions are reachable only via known terminal stop station. Number of partitions is 2 in your example, but may be much larger, e.g. consider star-like structure 1 - 2, 1 - 3, 1 - 4, 1 - 5.
First of all you need to enumerate partitions. You treat your graph as undirected graph and run DFS from stop station in each of directions. At first run you discover partition #1, at second run partition #2 and so on.
Then you treat you graph as directed with stop station as root node for all partitions and run topological sorting (TS) for each of partitions.
Possible outcomes:
TS for one of partitions fails. This means there is no solution.
Number of partitions is one and TS for it succeeds. Solution is unique.
Number of partitions is more than one and TS succeeds for all of them. This means there are multiple solutions. To get any single valid result, you choose some partition and declare that it contains another terminal station. All other partitions are inserted into the first one in between arbitrary pair of nodes.

Resources