How to get data in regex in between multiple pair of indexes/positions - apache-nifi

I want to get chunk of data from sentence in between multiple pair of positions.
"There is forest and hills. That place is amazing"
out of this sentence I want data in between the indexes
8 - 13 -> forest
18 - 23 -> hills
so output will be -> forest hills

Related

Find elements of given matrix

You are given an infinite matrix whose upper-left square starts with 1. Here are the first five rows of the infinite matrix :
1 2 9 10 25
4 3 8 11 24
5 6 7 12 23
16 15 14 13 22
17 18 19 20 21
Your task is to find out the number in presents at row x and column y after observing a certain kind of patter present in the matrix
Input Format
The first input line contains an integer t: the number of test cases
After this, there are t lines, each containing integer x and y
For each test, print the number present at xth row and yth column.
sample input
3
2 3
1 1
4 2
sample output
8
1
15
Hint: the numbers at the right and bottom border of a left upper square are consecutive (going either down and left, or right and up). First determine in which border your position is, then find out which direction applies, and finally find the correct number at the position (which easy formula gives you the first number in the border?).

Algorithm to convert the given whole number into the custom binary representation

Problem Statement
I have two different patterns to represent whole numbers using binary digits as follows:
First (the standard decimal to binary conversion):
0 -> 000
1 -> 001
2 -> 010
3 -> 011
4 -> 100
5 -> 101
6 -> 110
7 -> 111
Second:
0 -> 0000
1 -> 0001
2 -> 0010
3 -> 0100
4 -> 1000
5 -> 0011
6 -> 0101
7 -> 1001
8 -> 0110
9 -> 1010
10 -> 1100
11 -> 0111
12 -> 1011
13 -> 1101
14 -> 1110
15 -> 1111
Explaination: First, take flip one bit each, highest priority to the least significant bit. Then in second iteration, keep first bit flipped and then flip the rest bits in order as before. In the third iteration, the least significant bit is unset, and then second significant bit set and then pattern continued.
But now, let's say I want to combine the first and second pattern into one single pattern where some parts are defined by the first patten while the others are defined by the second pattern. For example:
(First pattern, most 3 significant digits)(Second pattern, least 3 significant digits)
0 -> 000000
1 -> 000001
2 -> 000010
3 -> 000100
4 -> 000011
5 -> 000101
6 -> 000110
7 -> 000111
8 -> 001000
9 -> 001001
10 -> 001010
11 -> 001100
12 -> 001011
13 -> 001101
14 -> 001110
15 -> 001111
16 -> 010000
17 -> 010001
...
010111
011000
...
The goal:
Input which tells which group of bits have which of these 2 patterns (no. of bits can be as long as we want and the pattern repetition can keep shifting between groups of bits in this set infinitely); a whole number.
Output which converts the whole number to the corresponding binary representation based on the input pattern. E.g. 12 -> 001011
Where I am stuck:
The binary conversion is straight forward. The second pattern, I'm not really sure how to make. Even if I can convert the whole number to the second pattern binary representation, how could I combine the both types to find the correct binary equivalent for my input number? Since there is a pattern here, I'm sure there must be an elegant mathematical representation for this!
What is my use case?
I'm writing a code for an application where I want to create a search pattern similar to this binary representation, based on a similar type of input.
Assuming you have calculated first pattern and second pattern for a number.
Convert them back to "normal" decimal and then left shift firstNumber by 3 firstPatterNo << 3 and then do OR of 2 numbers firstPatterNo | secondPatterNo.
firstPatterNo = 3 // 011 (normal decimal representations)
secondPatternNo = 4 // 100 (normal decimal representations)
combined = (firstPatterNo << 3) | secondPatternNo
// Convert combined to normal binary representation
BTW whats the logic behind second pattern ? I am not able to figure out.

Find the number of substrings in a string containing equal numbers of a, b, c

I'm trying to solve this problem. Now, I was able to get a recursive solution:
If DP[n] gives the number of beautiful substrings (defined in problem) ending at the nth character of the string, then to find DP[n+1], we scan the input string backward from the (n+1)th character until we find an ith character such that the substring beginning at the ith character and ending at the (n+1)th character is beautiful. If no such i can be found, DP[n+1] = 0.
If such a string is found then, DP[n+1] = 1 + DP[i-1].
The trouble is, this solution gives a timeout on one testcase. I suspect it is the scanning backward part that is problematic. The overall time complexity for my solution seems to be O(N^2). The size of the input data seems to indicate that the problem expects an O(NlogN) solution.
You don't really need dynamic programming for this; you can do it by iterating over the string once and, after each character, storing the state (the relative number of a's, b's and c's that were encountered so far) in a dictionary. This dictionary has maximum size N+1, so the overall time complexity is O(N).
If you find that at a certain point in the string there are e.g. 5 more a's than b's and 7 more c's than b's, and you find the same situation at another point in the string, then you know that the substring between those two points contains an equal number of a's, b's and c's.
Let's walk through an example with the input "dabdacbdcd":
a,b,c
-> 0,0,0
d -> 0,0,0
a -> 1,0,0
b -> 1,1,0
d -> 1,1,0
a -> 2,1,0
c -> 2,1,1 -> 1,0,0
b -> 1,1,0
d -> 1,1,0
c -> 1,1,1 -> 0,0,0
d -> 0,0,0
Because we're only interested in the difference between the number of a's, b'a and c's, not the actual number, we reduce a state like 2,1,1 to 1,0,0 by subtracting the lowest number from all three numbers.
We end up with a dictionary of these states, and the number of times they occur:
0,0,0 -> 4
1,0,0 -> 2
1,1,0 -> 4
2,1,0 -> 1
States which occur only once don't indicate an abc-equal substring, so we can discard them; we're then left with these repetitions of states:
4, 2, 4
If a state occurs twice, there is 1 abc-equal substring between those two locations. If a state occurs 4 times, there are 6 abc-equal substrings between them; e.g. the state 1,1,0 occurs at these points:
dab|d|acb|d|cd
Every substring between 2 of those 4 points is abc-equal:
d, dacb, dacbd, acb, acbd, d
In general, if a state occurs n times, it represents 1 + 2 + 3 + ... + n-1 abc-equal substrings (or easier to calculate: n-1 × n/2). If we calculate this for every count in the dictionary, the total is our solution:
4 -> 3 x 2 = 6
2 -> 1 x 1 = 1
4 -> 3 x 2 = 6
--
13
Let's check the result by finding what those 13 substrings are:
1 d---------
2 dabdacbdc-
3 dabdacbdcd
4 -abdacbdc-
5 -abdacbdcd
6 --bdac----
7 ---d------
8 ---dacb---
9 ---dacbd--
10 ----acb---
11 ----acbd--
12 -------d--
13 ---------d

merging linear lists - reconstruct railway network

I need to reconstruct the sequence of stations in a railway network from the sequences of single trips requested from a arbitrary station. There's no direction given in the data. But every request returns an terminal stop. The sequences of single trips can have gaps.
The (end-) result is always a linear list - forking is not allowed.
For example:
Result trips from requested station "4" :
4 - 3 - 2 - 1
4 - 1
4 - 5 - 6
4 - 8 - 9
4 - 6 - 7 - 8 - 9
manually reordered:
1 - 2 - 3 - 4
1 - 4
- 4 - 5 - 6
- 4 - 8 - 9
- 4 - 6 - 7 - 8 - 9
After merging result should be:
1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9
start/stop: 1, 9
Is there an algorithm to calculate the resulting "rope of pearls" list? I tried to figure it out with perls graph-module, but no luck. My books on algorithms doesn't help either.
I think, there are pathologic cases, where multiple solutions are possible, depending on input data.
Maybe someone has an idea to solve it!
As you see in the answers, there is more than one solution. So here's a real-world dataset:
2204236 -> 2200007 -> 2200001
2204236 -> 2203095 -> 2203976 -> 2200225 -> 2200007 -> 2200001
2204236 -> 2204805 -> 2204813 -> 2204401 -> 2219633 -> 2204476 -> 2202024 -> 2202508 -> 2202110 -> 2202026
2204236 -> 2204813 -> 2204401 -> 2219633 -> 2202508 -> 2202110 -> 2202026 -> 3011047 -> 3011048 -> 3011049
2204236 -> 2204813 -> 2204401 -> 2219633 -> 2204476 -> 2202024 -> 2202508 -> 2202110 -> 2202352 -> 2202026
2204236 -> 2204813 -> 2204401 -> 2219633 -> 2204476 -> 2202024 -> 2202508 -> 2209637 -> 2202110
solution of the example data with perl:
use Graph::Directed;
use Graph::Traversal::DFS;
my $g = Graph::Directed->new;
$g->add_path(1,2,3,4);
$g->add_path(1,4);
$g->add_path(4,5,6);
$g->add_path(4,8,9);
$g->add_path(4,6,7,8,9);
print "The graph is $g\n";
my #topo = $g->toposort;
print "g toposorted = #topo\n";
Output
> The graph is 1-2,1-4,2-3,3-4,4-5,4-6,4-8,5-6,6-7,7-8,8-9
> g toposorted = 1 2 3 4 5 6 7 8 9
Using the other direction
$g->add_path(4,3,2,1);
$g->add_path(4,1);
$g->add_path(4,5,6);
$g->add_path(4,8,9);
$g->add_path(4,6,7,8,9);
reveals the second solution
The graph is 2-1,3-2,4-1,4-3,4-5,4-6,4-8,5-6,6-7,7-8,8-9
g toposorted = 4 3 2 1 5 6 7 8 9
Treat the lists node links in a graph. 4-3-2-1 should mean 4 must come before 3, 3 before 2 and 2 before 1. So add arcs from 4 to 3, 3 to 2, 2 to 1.
Once you have all of those you run a topological sort(look it up on wikipedia) on the resulting graph. This will guarantee that the order you get will always respect the partial orderings you are given.
The only case when you are not going to find a solution is when the data is contradicting itself (if you have 4-3-2 and 4-2-3 there's no possible ordering).
You are right, there are multiple cases. Another good solution is 4-5-6-7-8-9-3-2-1, for your example.
Terminal stop station is articulation node and it splits graph into multiple partitions: all nodes inside partition are reachable from one another, nodes in different partitions are reachable only via known terminal stop station. Number of partitions is 2 in your example, but may be much larger, e.g. consider star-like structure 1 - 2, 1 - 3, 1 - 4, 1 - 5.
First of all you need to enumerate partitions. You treat your graph as undirected graph and run DFS from stop station in each of directions. At first run you discover partition #1, at second run partition #2 and so on.
Then you treat you graph as directed with stop station as root node for all partitions and run topological sorting (TS) for each of partitions.
Possible outcomes:
TS for one of partitions fails. This means there is no solution.
Number of partitions is one and TS for it succeeds. Solution is unique.
Number of partitions is more than one and TS succeeds for all of them. This means there are multiple solutions. To get any single valid result, you choose some partition and declare that it contains another terminal station. All other partitions are inserted into the first one in between arbitrary pair of nodes.

Weekly group assignment algorithm

A friend of mine who who is a teacher has 23 students in a class. They want an algorithm that assigns students in groups of 2 and one group of 3 (handle the odd number of students) across 14 weeks such that no two pairs repeat across the 14 weeks (a pair is assigned to one week).
A brute force approach would be too inefficient, so I was thinking of other approaches, matrix representation sounds appealing, and graph theory. Does anyone have any ideas? The problems that I could find deal only with 1 week and this answer I could quite figure out.
Round-robin algorithm will do the trick i think.
Add the remaining student to the second group and you are done.
First run
1 2 3 4 5 6 7 8 9 10 11 12
23 22 21 20 19 18 17 16 15 14 13
Second run
1 23 2 3 4 5 6 7 8 9 10 11
22 21 20 19 18 17 16 15 14 13 12
...
Another possibility might be graph matching, 14 distinct graph matchings would be needed.
Try to describe the problem in terms of constraints.
Then pass the constraints to a tool like ECLiPSe (not Eclipse), see http://eclipseclp.org/.
In fact, your problem seems similar to that of the Golf example on that site (http://eclipseclp.org/examples/golf.ecl.txt).
Here's an example in Haskell that will produce groups of 14 non-repeating 11-pair-combinations. The value 'pairs' is all combinations of pairs from 1 to 23 (e.g., [1,2], [1,3] etc.). Then the program builds lists where each list is 14 lists of 11 pairs (choosing from the value 'pairs') such that no pair is repeated and no single number is repeated in one list of 11 pairs. It's up to you to simply place the missing last student for each week as you see fit. (It took about three minutes to calculate before it started to output results):
import Data.List
import Control.Monad
pairs = nubBy (\x y -> reverse x == y)
$ filter (\x -> length (nub x) == length x) $ replicateM 2 [1..23]
solve = solve' [] where
solve' results =
if length results == 14
then return results
else solveOne [] where
solveOne result =
if length result == 11
then solve' (result:results)
else do next <- pairs
guard (notElem (head next) result'
&& notElem (last next) result'
&& notElem next results')
solveOne (next:result)
where result' = concat result
results' = concat results
One sample from the output:
[[[12,17],[10,19],[9,18],[8,22],[7,21],[6,23],[5,11],[4,14],[3,13],[2,16],[1,15]],
[[12,18],[11,19],[9,17],[8,21],[7,23],[6,22],[5,10],[4,15],[3,16],[2,13],[1,14]],
[[12,19],[11,18],[10,17],[8,23],[7,22],[6,21],[5,9],[4,16],[3,15],[2,14],[1,13]],
[[15,23],[14,22],[13,17],[8,18],[7,19],[6,20],[5,16],[4,9],[3,10],[2,11],[1,12]],
[[16,23],[14,21],[13,18],[8,17],[7,20],[6,19],[5,15],[4,10],[3,9],[2,12],[1,11]],
[[16,21],[15,22],[13,19],[8,20],[7,17],[6,18],[5,14],[4,11],[3,12],[2,9],[1,10]],
[[16,22],[15,21],[14,20],[8,19],[7,18],[6,17],[5,13],[4,12],[3,11],[2,10],[1,9]],
[[20,21],[19,22],[18,23],[12,13],[11,14],[10,15],[9,16],[4,5],[3,6],[2,7],[1,8]],
[[20,22],[19,21],[17,23],[12,14],[11,13],[10,16],[9,15],[4,6],[3,5],[2,8],[1,7]],
[[20,23],[18,21],[17,22],[12,15],[11,16],[10,13],[9,14],[4,7],[3,8],[2,5],[1,6]],
[[19,23],[18,22],[17,21],[12,16],[11,15],[10,14],[9,13],[4,8],[3,7],[2,6],[1,5]],
[[22,23],[18,19],[17,20],[14,15],[13,16],[10,11],[9,12],[6,7],[5,8],[2,3],[1,4]],
[[21,23],[18,20],[17,19],[14,16],[13,15],[10,12],[9,11],[6,8],[5,7],[2,4],[1,3]],
[[21,22],[19,20],[17,18],[15,16],[13,14],[11,12],[9,10],[7,8],[5,6],[3,4],[1,2]]]
Start off with a set (maybe a bitset mapping to students for less memory consumption) for each student that has all other students in it. Iterate 14 times, each time picking 11 students (for the 11 groups you will form) for whom you will pick partners. For each student, pick a partner they haven't been in a group with yet. For one random student of those 11, pick a second partner, but make sure no student has less remaining partners than there are iterations left. For every pick, adjust the sets.

Resources