Divide a set of numbers into sequences? Find the general term? - algorithm

How can we divide a set of numbers into sequences and find the general term of each?
1 - the numbers are always in order
2 - if we have n numbers, n/2 numbers are always present
For example:
Input: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
Output--> 2*x, x=[0..15]
OR
Input: 0,2,4,5,6,8,10,12,14,15,16,18,20,22,24,26,28,30
Divide into two sets:
A: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
B: 5,10,15,20
Output--> 2*x, x=[0..15] AND 5*x, x=[1..4]
I think this is very difficult. Any comments?
What field of computer science or which algorithm can help me?

The problem as I understand it is this: given a set of numbers, find the sequences that start from zero, increase by a constant step, and together cover this set.
Here's a general outline of what I would do:
I would make a list of all the numbers in the set and, starting from the first two elements, generate all of the possible sequences that meet your criteria. Whenever an element is covered by a generated sequence, you can remove it from consideration as a generating number, since any sequence with that number as its step is a subset of a sequence you have already found. When you are done, you will have a list of sequences you can use to cover the set. FOR EXAMPLE:
0,2,4,5,6,8,10,12,14,15,16,18,20,22,24,26,28,30
We will start with 0 and 2. We'll look for elements that are successively 2 larger and remove them from the list of elements that will be considered as possible multiples. Once we find a multiple of 2 that's not in this list, we'll stop generating. Here we go:
s(2) = [0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30]
Which leaves:
[5,15]
as the two remaining candidates. Do you see that any of the elements divisible by two, e.g. 4, would generate subsets of that list and thus don't need to be considered?
The next sequence starts at 0 and increases by 5, our smallest remaining candidate:
[0,5,10,15,20]
(Remember that we check the original list for these multiples, not the truncated list; the truncated list is only the list of remaining candidates.) When the candidate list is empty, we know we have found all of the sets contained in this set that have no supersets.
For a more complex example:
[0 2 3 4 5 6 7 8 9 10 12 13 14 15]
We'll start with:
[0 2 4 6 8 10 12 14]
Which leaves
[3 5 7 9 13 15]
as candidates, which in turn generates:
[0 3 6 9 12 15]
which leaves
[5 7 13]
which generates
[0 5 10 15]
which leaves
[7 13]
which generates
[0 7 14]
which leaves
[13]
which generates
[0 13].
The total combination of sets is:
[0 2 4 6 8 10 12 14]
[0 3 6 9 12 15]
[0 5 10 15]
[0 7 14]
[0 13].
At this point, you have a list of sets that together cover your set. It should be straightforward to turn each one into an a*x, x=[0..n] descriptor from here.
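A minimal Python sketch of the greedy procedure described above, assuming the input contains 0 (as in both examples) and only non-negative integers. The names `cover_with_progressions` and `describe` are illustrative, not from the original post. Each sequence is extended while its next multiple is still in the original set, and covered values are dropped from the candidate list.

```python
def cover_with_progressions(values):
    """Greedily cover a set of non-negative integers with sequences 0, d, 2d, ...

    Assumes 0 is in the set, as in the examples above.
    """
    present = set(values)
    candidates = sorted(v for v in present if v > 0)  # possible steps, smallest first
    sequences = []
    while candidates:
        step = candidates[0]
        seq = [0]
        k = 1
        # Extend while the next multiple is in the ORIGINAL set, not the candidate list.
        while k * step in present:
            seq.append(k * step)
            k += 1
        sequences.append(seq)
        covered = set(seq)
        # Values already covered no longer need to be tried as steps.
        candidates = [c for c in candidates if c not in covered]
    return sequences


def describe(seq):
    """Format a sequence 0, d, ..., n*d in the question's 'd*x, x=[0..n]' notation."""
    return f"{seq[1]}*x, x=[0..{len(seq) - 1}]"


data = [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15]
for s in cover_with_progressions(data):
    print(s, "->", describe(s))
```

On the question's second input, this sketch yields 2*x, x=[0..15] and 5*x, x=[0..4]; the second sequence starts at 0 rather than 5 because the answer's sequences always include 0.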


Shell sort algorithm: shift or swap?

I read that Shell sort is an improved version of insertion sort, but online I sometimes read that it is about shifting and sometimes that it is about swapping. Which one is correct?
For example: [5 3 2 10 0]
If we take the gap to be 2, then the first step compares 5 and 2, and the result will be:
[2 3 5 10 0] by swapping and [2 5 3 10 0] by shifting. Which one is Shell sort?
The main principle in Shell sort is that with the chosen gap we look at the data as a collection of interleaved, shorter arrays. Each of those shorter arrays has its first entry at an index less than the gap. These shorter arrays are sorted independently. Once that is done, the gap is reduced.
In the example, there are two interleaved arrays, which we can picture like this:
interleaved array: 5 2 0
interleaved array: 3 10
The first algorithm would fit under Shell sort. But the second one does not sort the interleaved arrays independently, as such rotations (shifts) move values from one interleaved array to another:
interleaved array: 5 🠔 2 0
(5 moves down ⬊ into the second interleaved array, and 3 moves up ⬈ into the first)
interleaved array: 3 10
...resulting in:
interleaved array: 2 3 0
interleaved array: 5 10
Unless other precautions are taken, the second algorithm does not ensure that a rotation improves the situation. For instance, if the input is [3 1 2 4] and the gap is 2, then comparing 3 and 2 leads to a rotation, and we get [2 3 1 4]. But now the two values in the first interleaved array are still not in order (2 is greater than 1).
Shifting?
Shifting does not happen the way you depicted it (across multiple interleaved arrays), but it is commonly done within one interleaved array, just as in insertion sort. So, applied to your example:
interleaved array: 5 2 0
interleaved array: 3 10
The value 2 is picked up and preceding values are shifted forward within the same interleaved array until the right slot is found for the picked up value. In this case only one value is shifted (5), which makes it a swap:
interleaved array: 2 5 0
interleaved array: 3 10
Now 0 is picked up, and two values are shifted (2 and 5):
interleaved array: 0 2 5
interleaved array: 3 10
Now the first interleaved array is sorted. The second interleaved array happens to be sorted already. Then the gap is reduced to 1:
array: 0 3 2 10 5
Here 2 is picked up and one value (3) is shifted:
array: 0 2 3 10 5
Finally 5 is picked up and one value (10) is shifted:
array: 0 2 3 5 10
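A short Python sketch of Shell sort written as gapped insertion sort, where values are shifted only within their own interleaved array. The halving gap sequence (n//2, n//4, ..., 1) is an assumption; for a 5-element list it gives the gaps 2 and 1 used in the walkthrough above. Any custom `gaps` list must end with 1 for the result to be fully sorted.

```python
def shell_sort(values, gaps=None):
    """Shell sort as gapped insertion sort: shifts stay inside one interleaved array."""
    a = list(values)
    if gaps is None:
        # Assumed gap sequence: n//2, n//4, ..., 1.
        gaps = []
        g = len(a) // 2
        while g > 0:
            gaps.append(g)
            g //= 2
    for gap in gaps:
        # Indices i, i+gap, i+2*gap, ... form one interleaved array.
        for i in range(gap, len(a)):
            picked = a[i]
            j = i
            # Shift larger values of the SAME interleaved array forward by one slot (gap).
            while j >= gap and a[j - gap] > picked:
                a[j] = a[j - gap]
                j -= gap
            a[j] = picked
    return a


print(shell_sort([5, 3, 2, 10, 0], gaps=[2]))  # state after the gap-2 pass
print(shell_sort([5, 3, 2, 10, 0]))
```

The first call stops after the gap-2 pass, reproducing the intermediate array 0 3 2 10 5 from the walkthrough.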

Neighbors in the matrix - algorithm

I'm having trouble coming up with an algorithm for this "graph" problem :(
Maybe one of you would be so kind as to point me in the right direction <3
The task is as follows:
We have a board of at least 3x3 (it doesn't have to be square; it can be 4x5, for example). The user specifies a sequence of moves (as in the Android lock pattern). The task is to count how many consecutive pairs of the points entered are adjacent to each other horizontally or vertically.
Here is an example:
Matrix:
1 2 3 4
5 6 7 8
9 10 11 12
The user entered the code: 10,6,7,3
The algorithm should return the number 3 because:
10 is a neighbor of 6
6 is a neighbor of 7
7 is a neighbor of 3
So the result is 3
Second example:
Matrix:
1 2 3
4 5 6
7 8 9
The user entered the code: 7,8,6,3
The algorithm should return 2 because:
7 is a neighbor of 8
8 is not a neighbor of 6
6 is a neighbor of 3
So the result is 2
Of course, the number of comparisons equals the length of the array minus 1.
Sorry for "ile" ("how many") and "tutaj" ("here"); I'm Polish.
If all the codes are unique, use them as keys in a dictionary (with (row, col) pairs as values). Loop from the 2nd item of the user input to the end, checking whether math.Abs(cur.row-prev.row)+math.Abs(cur.col-prev.col)==1. This is not space efficient, but it handles the user input in linear time.
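A minimal Python sketch of the dictionary approach, assuming the board is given as a list of rows with unique codes; `count_adjacent_pairs` is a hypothetical name, and Python's `abs` stands in for Go's `math.Abs`.

```python
def count_adjacent_pairs(grid, codes):
    """Count consecutive entries in `codes` that are horizontal/vertical neighbours."""
    # Map each (unique) code to its (row, col): O(rows * cols) extra space.
    position = {value: (r, c) for r, row in enumerate(grid) for c, value in enumerate(row)}
    count = 0
    # Compare each entered code with the previous one: O(len(codes)) time.
    for prev, cur in zip(codes, codes[1:]):
        (pr, pc), (cr, cc) = position[prev], position[cur]
        if abs(cr - pr) + abs(cc - pc) == 1:  # Manhattan distance 1 means adjacent
            count += 1
    return count


grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(count_adjacent_pairs(grid, [10, 6, 7, 3]))
```

Because positions come from the grid itself, codes at the end of one row and the start of the next (such as 4 and 5) are correctly treated as non-adjacent.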
The idea is that you have 4 conditions, one for each direction. Given any matrix of shape n,m filled row by row with a sequence of consecutive integers, and given any element:
The element to the left or right is always 1 less or 1 greater than the given element.
The element above or below is always m less or m greater than the given element.
So, if abs(x-y) is m, or abs(x-y) is 1 and x and y are in the same row, then x and y are neighbors. (The same-row check matters: in the 3x4 example above, 4 and 5 differ by 1 but are not adjacent.)
I demonstrate this in Python.
import numpy as np

def get_neighbors(seq, matrix):
    m = matrix.shape[1]     # row length
    start = matrix[0, 0]    # first value, used to work out which row a value is in
    # Conditions: left/right differ by 1 AND share a row; up/down differ by m
    check = lambda x, y: (abs(x - y) == 1 and (x - start) // m == (y - start) // m) or abs(x - y) == m
    # Consecutive pairs of the sequence
    neighbours = [check(x, y) for x, y in zip(seq, seq[1:])]
    count = sum(neighbours)
    return neighbours, count

seq = [10, 6, 7, 3]
matrix = np.arange(1, 13).reshape((3, 4))
neighbours, count = get_neighbors(seq, matrix)
print('Matrix:')
print(matrix)
print('')
print('Sequence:', seq)
print('')
print('Count of neighbors:', count)
Matrix:
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
Sequence: [10, 6, 7, 3]
Count of neighbors: 3
Another example -
seq = [7,8,6,3]
matrix = np.arange(1,10).reshape((3,3))
neighbours, count = get_neighbors(seq, matrix)
Matrix:
[[1 2 3]
[4 5 6]
[7 8 9]]
Sequence: [7, 8, 6, 3]
Count of neighbors: 2
So your input is the width of a table, the height of a table, and a list of numbers.
W = 4, H = 3, list = [10,6,7,3]
There are two steps:
Convert the list of numbers into a list of row/column coordinates (1 to [1,1], 5 to [2,1], 12 to [3,4]).
In the new list of coordinates, find consecutive pairs in which one coordinate is identical and the other differs by 1.
Both steps are quite simple ("for" loops). Do you have problems with step 1 or step 2?
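The two steps can be sketched in Python as follows, assuming the board is numbered row by row starting at 1 and using the 1-based [row, column] convention from the answer. The function names are hypothetical.

```python
def to_coordinates(number, width):
    """Step 1: 1-based (row, column), e.g. 1 -> (1, 1), 5 -> (2, 1), 12 -> (3, 4) for width 4."""
    row, col = divmod(number - 1, width)
    return row + 1, col + 1


def count_neighbours(width, height, numbers):
    """Step 2: count consecutive pairs sharing one coordinate and differing by 1 in the other."""
    for n in numbers:
        if not 1 <= n <= width * height:
            raise ValueError(f"{n} is not on a {width}x{height} board")
    coords = [to_coordinates(n, width) for n in numbers]
    count = 0
    for (r1, c1), (r2, c2) in zip(coords, coords[1:]):
        if (r1 == r2 and abs(c1 - c2) == 1) or (c1 == c2 and abs(r1 - r2) == 1):
            count += 1
    return count


print(count_neighbours(4, 3, [10, 6, 7, 3]))
```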

Random triplet number generator

I would like to write code for a random generator over predetermined sets of triplets (200 sets in total to randomize). I would like the triplets to combine into sets of six numbers, with each combination of triplets remaining unique.
Example triplets: A = [1 2 3; 4 5 6; 7 8 9; 10 11 12; 13 14 15]; etc.
I would like the resulting combinations to keep the triplets in their original order:
1 2 3 + 4 5 6, 1 2 3 + 7 8 9, 1 2 3 + 10 11 12, 1 2 3 + 13 14 15
I am not a coder, so any help would be appreciated
You want to pick three triplets, keeping them in order. So your first triplet cannot be too close to the end: there have to be at least two more triplets after it. Similarly, the second triplet you pick needs at least one unpicked triplet after it.
I assume that you have your triplets in an array or similar, numbered 0 to 199.
Pick a random number A in the range 0 to 197. That is the index of your first triplet.
Pick a second random number B in the range (A + 1) to 198. That is the index of your second triplet.
Pick a third random number C in the range (B + 1) to 199. That is the index of your third triplet.
The range of random numbers you pick from is affected by the numbers you have previously picked and the number of picks remaining.
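A Python sketch of the picking scheme above, generalized to k picks from n triplets; `pick_ordered_indices` is a hypothetical name. With n = 200 and k = 3, the ranges are exactly 0 to 197, (A + 1) to 198, and (B + 1) to 199. Note that this sequential method does not give every combination the same probability (combinations with a small first index come up more often); `sorted(random.sample(range(n), k))` is a uniform alternative.

```python
import random


def pick_ordered_indices(n, k, rng=random):
    """Pick k strictly increasing indices from 0..n-1, leaving room for later picks."""
    picks = []
    low = 0
    for remaining in range(k, 0, -1):
        # This pick must leave (remaining - 1) indices free after it.
        high = n - remaining
        idx = rng.randint(low, high)  # randint includes both endpoints
        picks.append(idx)
        low = idx + 1
    return picks


# Two triplets give a set of six numbers, with the triplets kept in their original order.
triplets = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]
i, j = pick_ordered_indices(len(triplets), 2)
print(triplets[i] + triplets[j])
```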

Practical algorithms for permuting external memory

On a spinning disk, I have N records that I want to permute. In RAM, I have an array of N indices that contain the desired permutation. I also have enough RAM to hold n records at a time. What algorithm can I use to execute the permutation on disk as quickly as possible, taking into account the fact that sequential disk access is a lot faster?
I have plenty of excess disk to use for intermediate files, if desired.
This is a known problem. Find the cycles in your permutation order. For instance, given five records to permute [1, 0, 3, 4, 2], you have cycles (0, 1) and (2, 3, 4). You find them by picking an unused starting position and following the index pointers until you return to your starting point. The sequence of pointers describes a cycle.
You then permute the records with an internal temporary variable, one record long.
temp = disk[0]
disk[0] = disk[1]
disk[1] = temp
temp = disk[2]
disk[2] = disk[3]
disk[3] = disk[4]
disk[4] = temp
Note that you can also perform the permutation as you traverse the pointers. You will also need some method to recall which positions have already been permuted, such as clearing the permutation index (set it to -1).
Can you see how to generalize that?
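One way to generalize it, sketched in Python: a list stands in for the disk (a real implementation would go through file I/O), and the convention follows the temp example above, so after the call position i holds the record that was at perm[i]. The function names are hypothetical; visited positions are marked with -1 on a copy of the index array, as suggested.

```python
def find_cycles(perm):
    """Follow index pointers from each unused start until returning to it."""
    seen = [False] * len(perm)
    cycles = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        cycle, i = [], start
        while not seen[i]:
            seen[i] = True
            cycle.append(i)
            i = perm[i]
        cycles.append(cycle)
    return cycles


def permute_in_place(disk, perm):
    """Rearrange so that new disk[i] = old disk[perm[i]], using one temporary record."""
    perm = list(perm)  # copy; entries are cleared to -1 once their position is done
    for start in range(len(perm)):
        if perm[start] == -1:
            continue
        temp = disk[start]  # the single one-record temporary
        i = start
        while perm[i] != start:
            src = perm[i]
            disk[i] = disk[src]
            perm[i] = -1
            i = src
        disk[i] = temp  # close the cycle
        perm[i] = -1
    return disk


print(find_cycles([1, 0, 3, 4, 2]))
print(permute_in_place(list("abcde"), [1, 0, 3, 4, 2]))
```

This mirrors the temp-variable steps exactly, but it still reads records in cycle order, so it does nothing to favor sequential disk access.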
This is an interval-scheduling problem. I'll simplify the notation slightly by calling the available memory M records; having upper- and lower-case N is a little confusing.
First, we recast the permutation as a series of intervals: the rotational span during which a record needs to reside in RAM. If a record needs to be written to a lower-numbered position, we increase the endpoint by the list size to indicate the wraparound, since we have to wait for the next disk rotation. For instance, using my earlier example, we expand the list:
[1, 0, 3, 4, 2]
0 -> 1
1 -> 0+5
2 -> 3
3 -> 4
4 -> 2+5
Now, we apply standard greedy scheduling resolution. First, sort by endpoint:
[0, 1]
[2, 3]
[3, 4]
[1, 5]
[4, 7]
Now, apply the algorithm for M-1 "lanes"; the extra one is needed for swap space. We fill each lane by appending the interval with the earliest endpoint whose start point doesn't overlap:
[0, 1] [2, 3] [3, 4] [4, 7]
[1, 5]
We can do this in a total of 7 "ticks" if M >= 3. If M=2, we defer the second lane by 2 rotations to [11, 15].
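A Python sketch of the interval construction and greedy lane filling described above. It assumes, as in the expanded list, that the record at position i is written to position perm[i], and that touching endpoints (such as [2, 3] and [3, 4]) do not overlap. The function names are hypothetical, and deferring lanes by whole rotations when M-1 is smaller than the number of lanes is not modeled.

```python
def to_intervals(perm):
    """Interval (start, end) per record; a lower-numbered target wraps to the next rotation."""
    n = len(perm)
    return [(i, perm[i] if perm[i] >= i else perm[i] + n) for i in range(n)]


def fill_lanes(intervals):
    """Fill one lane at a time, taking the earliest-ending interval that doesn't overlap."""
    remaining = sorted(intervals, key=lambda iv: iv[1])  # sort by endpoint
    lanes = []
    while remaining:
        lane, leftover, last_end = [], [], float("-inf")
        for start, end in remaining:
            if start >= last_end:  # touching endpoints are allowed
                lane.append((start, end))
                last_end = end
            else:
                leftover.append((start, end))
        lanes.append(lane)
        remaining = leftover
    return lanes


for lane in fill_lanes(to_intervals([1, 0, 3, 4, 2])):
    print(lane)
```

For [1, 0, 3, 4, 2], this reproduces the two lanes shown above, so M-1 >= 2 lanes (M >= 3) suffices; for the 8-record example below, it produces 4 lanes.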
Sneftal's nice example gives us more troubles, with deeper overlap:
[0, 4]
[1, 5]
[2, 6]
[3, 7]
[4, 0+8]
[5, 1+8]
[6, 2+8]
[7, 3+8]
This requires 4 "lanes" if available, deferring lanes as needed if M < 5.
The pathological case is where every record in the permutation needs to be copied back one position, such as [3, 0, 1, 2], with M=2.
[0, 3]
[1, 4]
[2, 5]
[3, 6]
In this case, we walk through the deferral cycle multiple times. At the end of every rotation, we have to defer all remaining intervals by one rotation, resulting in
[0, 3] [3, 6] [2+4, 5+4] [1+4+4, 4+4+4]
Does that get you moving, or do you need more detail?
I have an idea, which might need further improvement. But here goes:
Suppose the hdd has the following structure:
5 4 1 2 3
And we want to write out this permutation:
2 3 5 1 4
Since an hdd is a circular buffer, and assuming it can only rotate in one direction, we can express the above permutation as shifts:
5 >> 2
4 >> 3
1 >> 1
2 >> 2
3 >> 2
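The shift amounts can be computed as in the following Python sketch, assuming each value appears once and the disk only rotates forward, so a value's shift is (target index - current index) mod n. The name `rotation_shifts` is hypothetical.

```python
def rotation_shifts(current, target):
    """Forward shift (mod n) that moves each value of `current` to its slot in `target`."""
    n = len(current)
    where = {value: i for i, value in enumerate(target)}  # target index of each value
    return [(where[v] - i) % n for i, v in enumerate(current)]


print(rotation_shifts([5, 4, 1, 2, 3], [2, 3, 5, 1, 4]))
```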
So let's put that in an array, and since we know it is a circular array, let's put copies of it side by side:
| 2 3 1 2 2 | 2 3 1 2 2| 2 3 1 2 2 | 2 3 1 2 2 |... Inf
Since we want to favor sequential reads (or writes), we can attach a cost function to the series above. Let the cost function be linear, i.e.:
0 1 2 3 4 5 6 7 8 9 10 ... Inf
Now, let us add the cost function to the series above. But how do we select the starting point?
The idea is to select the starting point such that you get the maximum congruent monotonically increasing sequence.
For example, if you select the 0 point to be on "3", you'll get
(1) | - 3 2 4 5 | 6 8 7 9 10 | ...
If you select the 0 point to be on "2", the one just right of "1", you'll get:
(2) | - - - 2 3 | 4 6 5 7 8 | ...
Since we are trying to favor consecutive reads, let's define our read-write function to work as follows:
f():
At the currently pointed hdd location, the function reads the currently pointed hdd file into available RAM (namely, total space - 1, because we want to save 1 for swap).
If no space is left in RAM for the read, the function asserts and the program halts.
At the current hdd location, if RAM holds the value that should be written to that location, the function reads the current file into swap space, writes the wanted value from RAM to the hdd, and discards that value from RAM.
When a value is placed on the hdd, the function checks whether the sequence is complete. If it is, the program returns with success.
Now, we should note that if the following holds:
shift amount <= n - 1 (n: the number of records we can hold in memory)
then we can traverse the hard disk in one pass using the above function. For example:
current: 4 5 6 7 0 1 2 3
we want: 0 1 2 3 4 5 6 7
n : 5
We can start anywhere we want, say from the initial "4". We read 4 items sequentially (RAM now holds 4 items), and we start placing 0 1 2 3 (we can, because n = 5 in total: 4 slots are used and 1 is kept for swap). So the total is 4 consecutive reads followed by 8 read-write operations.
Using that analogy, it becomes clear that if we subtract "n-1" from equations (1) and (2), the positions with a value "<= 0" are better suited as the initial position, because the positions with values above zero will definitely require another pass.
So we select eq. (2) and, assuming n = 3, subtract 2 from it:
(2) | - - - 0 1 | 2 4 3 5 6 | ...
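The cost rows (1) and (2), and the subtraction of n-1, can be reproduced with the following Python sketch. It reads the rows as: the entry at position p (for p at or after the chosen 0 point) is shift[p mod len] + (p - start), with None standing for the "-" entries. The function name and its `periods`/`memory` parameters are hypothetical.

```python
def cost_series(shifts, start, periods=2, memory=None):
    """Shift plus linear cost from the chosen 0 point, over `periods` copies of the series.

    If `memory` (n) is given, n - 1 is subtracted, as in the step above.
    """
    size = len(shifts)
    offset = memory - 1 if memory is not None else 0
    series = []
    for p in range(periods * size):
        if p < start:
            series.append(None)  # positions before the 0 point ("-")
        else:
            series.append(shifts[p % size] + (p - start) - offset)
    return series


shifts = [2, 3, 1, 2, 2]
print(cost_series(shifts, start=1))            # row (1)
print(cost_series(shifts, start=3))            # row (2)
print(cost_series(shifts, start=3, memory=3))  # row (2) minus 2
```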
Now it is clear that, using f(), and starting from 0, assuming n = 3, we will have a starting operation as such: r, r, r-w, r-w, ...
So, how do we do the rest and find the minimum cost? We will place an array with an initial minimum cost just below equation (2). The positions in that array signify where we want f() to be executed.
| - - - 0 1 | 2 4 3 5 6 | ...
| - - - 1 1 | 1 1 1 1 1 | ...
The second array, the one with 1s and 0s, tells the program where to execute f(). Note that if we guessed those locations wrong, f() will assert.
Before we start actually placing files onto the hdd, we of course want to check whether the f() positions are correct. We check for assertions, and we try to minimize cost while removing all assertions. For example:
(1) 1111000000000000001111
(2) 1111111000000000000000
(1) obviously has a higher cost than (2). So the problem reduces to finding the 1-0 array.
Some ideas on finding the best array:
Simplest solution: write out all 1s and turn assertions into 0s (essentially, a skip). This method is guaranteed to work.
Brute force: write an array as shown in (2) and start shifting the 1s to the right, in an order that tries out every permutation available:
1111111100000000
1111111010000000
1111110110000000
...
Full random approach: plug in mt19937 and start permuting. Whenever you see a sharp drop in cost, stop executing and implement the hdd copy-paste. You won't find the global minimum, but you'll get a nice trade-off.
Genetic algorithms: For permutations where "shift count is much lower than n - 1", the methodology provided in this answer should (?) provide a global minimum and smooth gradients. This allows one to use genetic algorithms without relying on mutations too much.
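The brute-force idea from the list above can be sketched with `itertools.combinations`, which enumerates every placement of a fixed number of 1s. This is an illustration under stated assumptions: the name `masks_with_ones` is hypothetical, the cost of each mask is left out because the answer does not define it precisely, and the order matches the listing only for the first two entries. The number of masks grows as C(length, ones), e.g. 12870 for length 16 with 8 ones.

```python
from itertools import combinations


def masks_with_ones(length, ones):
    """Yield every 0/1 string of `length` characters with exactly `ones` 1s."""
    for positions in combinations(range(length), ones):
        mask = ["0"] * length
        for p in positions:
            mask[p] = "1"
        yield "".join(mask)


for i, mask in enumerate(masks_with_ones(16, 8)):
    print(mask)
    if i == 2:
        break
```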
One advantage I find in this approach is that, since OP mentioned that this is a real-life problem, the method provides an easy(ier?) way to change cost functions. It is easier to detect the effect of, say, having lots of contiguous small files to be copied vs. having a single huge file. Or perhaps rrwwrrww is better than rrrrwwww?
Does any of this even make sense? We will have to try out ...

Find the number of non-decreasing and non-increasing subsequences in an array

I am attempting to complete a programming challenge from Quora on HackerRank: https://www.hackerrank.com/contests/quora-haqathon/challenges/upvotes
I have designed a solution that works for some test cases; however, for many others the algorithm I am using is incorrect.
Rather than seeking a solution, I am simply asking for an explanation of how the subsequences are formed, and then I will implement a solution myself.
For example, with the input:
6 6
5 5 4 1 8 7
the correct output is -5, but I fail to see how -5 is the answer. The subsequence would be [5 5 4 1 8 7] and I cannot for the life of me find a means to get -5 as the output.
Problem Statement
At Quora, we have aggregate graphs that track the number of upvotes we get each day.
As we looked at patterns across windows of certain sizes, we thought about ways to track trends such as non-decreasing and non-increasing subranges as efficiently as possible.
For this problem, you are given N days of upvote count data, and a fixed window size K. For each window of K days, from left to right, find the number of non-decreasing subranges within the window minus the number of non-increasing subranges within the window.
A window of days is defined as a contiguous range of days. Thus, there are exactly N−K+1 windows where this metric needs to be computed. A non-decreasing subrange is defined as a contiguous range of indices [a,b], a<b, where each element is at least as large as the previous element. A non-increasing subrange is similarly defined, except each element is at least as large as the next. There are up to K(K−1)/2 of these respective subranges within a window, so the metric is bounded by [−K(K−1)/2,K(K−1)/2].
Constraints
1≤N≤100,000 days
1≤K≤N days
Input Format
Line 1: Two integers, N and K
Line 2: N positive integers of upvote counts, each integer less than or equal to 10^9
Output Format
Line 1..: N−K+1 integers, one integer for each window's result on each line
Sample Input
5 3
1 2 3 1 1
Sample Output
3
0
-2
Explanation
For the first window of [1, 2, 3], there are 3 non-decreasing subranges and 0 non-increasing, so the answer is 3. For the second window of [2, 3, 1], there is 1 non-decreasing subrange and 1 non-increasing, so the answer is 0. For the third window of [3, 1, 1], there is 1 non-decreasing subrange and 3 non-increasing, so the answer is -2.
Given a window size of 6 and the sequence
5 5 4 1 8 7
the non-decreasing subranges are
5 5
1 8
and the non-increasing subranges are
5 5
5 4
4 1
8 7
5 5 4
5 4 1
5 5 4 1
So that's +2 for the non-decreasing subranges and -7 for the non-increasing subranges, giving -5 as the final answer.
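To check counts like these by hand, here is a brute-force Python sketch. The function names are hypothetical, and this is not the efficient solution the challenge expects: it costs O(K) per window, O(N·K) overall, which is too slow for the stated limits. It uses the fact that each new qualifying adjacent pair extends `run` qualifying subranges that end at the new element.

```python
def count_monotone_subranges(window, ok):
    """Count contiguous subranges of length >= 2 whose adjacent pairs all satisfy ok(a, b)."""
    total = 0
    run = 0  # consecutive qualifying pairs ending at the current element
    for a, b in zip(window, window[1:]):
        if ok(a, b):
            run += 1
            total += run  # `run` qualifying subranges end at b
        else:
            run = 0
    return total


def window_metric(window):
    """Non-decreasing subranges minus non-increasing subranges in one window."""
    non_decreasing = count_monotone_subranges(window, lambda a, b: a <= b)
    non_increasing = count_monotone_subranges(window, lambda a, b: a >= b)
    return non_decreasing - non_increasing


def window_results(values, k):
    """Metric for each of the N-K+1 windows, left to right."""
    return [window_metric(values[i:i + k]) for i in range(len(values) - k + 1)]


print(window_results([5, 5, 4, 1, 8, 7], 6))  # [-5]
print(window_results([1, 2, 3, 1, 1], 3))     # [3, 0, -2]
```

Equal neighbours such as 5 5 count in both directions, which is why [5, 5] appears in both lists above.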
