Clickhouse - Moving Sum in Arrays

I am looking for an efficient way to compute, for each position in an array, the sum of the next n elements (starting at that position).
For example:
Input -> [1,2,3,4,5,6,7,8]
Expected Result (for n = 2) -> [3,5,7,9,11,13,15,8]
Similarly, if n = 3 then Expected Result -> [6,9,12,15,18,21,15,8]
Is there a way to accomplish this efficiently within CH? Thanks!
Edit: I would like to know whether this can be accomplished without using
arrayReduceInRanges(agg_func, ranges, arr1...).
The CH version we are using (19.15.1) doesn't have arrayReduceInRanges.

Try this one:
WITH 2 AS n
SELECT
    range(1, 9, 1) AS arr,
    arrayMap((x, index) -> arraySum(arraySlice(arr, index, n)), arr, arrayEnumerate(arr)) AS result
┌─arr───────────────┬─result───────────────┐
│ [1,2,3,4,5,6,7,8] │ [3,5,7,9,11,13,15,8] │
└───────────────────┴──────────────────────┘
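
To check expected results outside ClickHouse, the same forward-looking sum can be sketched in Python. This is only an illustration of the semantics, not ClickHouse code. The function name forward_sums is made up for this example, and the prefix-sum trick is a swapped-in technique that avoids re-summing each slice.

```python
from itertools import accumulate

def forward_sums(arr, n):
    """For each index i, return the sum of arr[i:i+n] (fewer elements near the end)."""
    # prefix[i] holds the sum of arr[:i], so any window sum is one subtraction.
    prefix = [0] + list(accumulate(arr))
    size = len(arr)
    return [prefix[min(i + n, size)] - prefix[i] for i in range(size)]

print(forward_sums([1, 2, 3, 4, 5, 6, 7, 8], 2))  # [3, 5, 7, 9, 11, 13, 15, 8]
```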

Related

Count subsets of array which qualify min(subset)+max(subset) < k

Was asked this question in an interview, didn't have a better answer than generating all possible subsets.
Example:
a = [4,2,5,7] k = 8
output = 4
[2],[4,2],[2,5],[4,2,5]
Interviewer tried implying sorting the array should help, but I still couldn't figure out a better-than-brute-force solution. Will appreciate your input.

The interviewer implied that sorting the array would help and it does help. I'll try to explain.
Taking the array and k values you stated:
a = [4,2,5,7]
k = 8
Sorting the array will yield:
a_sort = [2,4,5,7]
Now we can consider the following procedure:
1. set ii = 0, jj = 1
2. choose a_sort[ii] as a part of your subset
2.1. If 2 * a_sort[ii] >= k, you are done. Otherwise, the subset [a_sort[ii]] satisfies the condition and is a part of the solution.
3. add a_sort[ii+jj] to your subset (if ii+jj is past the end of the array, treat it as case 3.2)
3.1. If a_sort[ii] + a_sort[ii+jj] < k,
3.1.1. the subset [a_sort[ii], a_sort[ii+jj]] satisfies the condition and is a part of the solution, as well as any subset which additionally contains any number of elements a_sort[kk] where ii < kk < ii+jj
3.1.2. set jj += 1 and go back to step 3.
3.2. else, set ii += 1, jj = 1, go back to step 2
With your input this procedure should return:
[[2], [2,4],[2,5],[2,4,5]]
# [2,7] results in 9 > 8 and therefore you move to [4]
# Note that for [4] subset you get 8 = 8 which is not smaller than 8, we are done
Explanation
If the subset [a_sort[ii]] does not satisfy 2 * a_sort[ii] < k, adding more numbers to it can only yield min(subset)+max(subset) >= 2 * a_sort[ii] >= k, so no larger subset with this minimum satisfies the condition. Moreover, starting from the subset [a_sort[ii+1]] gives 2 * a_sort[ii+1] >= 2 * a_sort[ii] >= k since a_sort is sorted. Therefore you will not find any additional subsets.
For jj >= 1, if a_sort[ii] + a_sort[ii+jj] < k then you can add any number of members of a_sort to the subset, as long as each index kk is bigger than ii and lower than ii+jj. Since a_sort is sorted, adding these members does not change the value of min(subset)+max(subset), which remains a_sort[ii] + a_sort[ii+jj], and we already know that this value is smaller than k.
Getting the count
If you only want the number of valid subsets, this is easier than generating the subsets themselves.
Assume that for some jj >= 1 the condition holds, i.e. a_sort[ii] + a_sort[ii+jj] < k. If jj = 1 this adds 1 possible subset. If jj > 1 there are jj - 1 elements between a_sort[ii] and a_sort[ii+jj], each of which can be either present or not without changing the value a_sort[ii] + a_sort[ii+jj]. Therefore there are a total of 2**(jj-1) additional subsets to add to the solution group (jj-1 elements, each independently present or not). This also holds for jj = 1, since in this case 2**(jj-1) = 2**0 = 1.
Looking at the example above:
[2] adds 1 count
[2,4] adds 1 count (ii = 0, jj = 1 --> 2**(1 - 1) = 2**0 = 1)
[2,5] adds 2 counts (ii = 0, jj = 2 --> 2**(2 - 1) = 2**1 = 2)
A total count of 4
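
The counting procedure above can be sketched in Python as follows. This is a minimal sketch, assuming jj is the offset from ii (as in a_sort[ii+jj]) and that equal values at different positions count as different elements. The function name count_subsets_scan is made up for this example.

```python
def count_subsets_scan(a, k):
    """Count non-empty subsets with min(subset) + max(subset) < k."""
    a_sort = sorted(a)
    total = 0
    for ii, smallest in enumerate(a_sort):
        if 2 * smallest >= k:
            break  # no later element can be a valid minimum either
        total += 1  # the single-element subset [a_sort[ii]]
        jj = 1
        while ii + jj < len(a_sort) and smallest + a_sort[ii + jj] < k:
            # a_sort[ii + jj] is the maximum; the jj - 1 elements in between are optional
            total += 2 ** (jj - 1)
            jj += 1
    return total

print(count_subsets_scan([4, 2, 5, 7], 8))  # 4
```

The inner loop makes this quadratic in the worst case; it mirrors the step-by-step description rather than aiming for the best complexity.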

Sort the array
For an element x at index l, do a binary search on the array to get index of the maximum integer in the array which is < k-x. Let the index be r.
For all subsets where min(subset) = x, we can have any element with index in range (l,r]. Number of subsets with min(subset) = x becomes the total number of possible subsets for (r-l) elements, so count = 2^(r-l) (or 0 if r<l).
(Note: in all such subsets, we are fixing x. That's why the range (l,r] isn't inclusive of l)
You have to iterate over the array, use the above process for each element/index to get the count of subsets where our current element is the minimum and the subset satisfies the given constraint. If you find an element with count=0, break the iteration.
This should work with O(N*log(N)) complexity, good enough for an interview question imo.
For the given example, sorted array = [2,4,5,7].
For element 2, l=0 and r=2. Count = 2^(2-0) = 4 (covers [2],[4,2],[2,5],[4,2,5]).
For element 4, l=1 and r=0. Count = 0, and we break the iteration.
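
A minimal Python sketch of this binary-search approach, using the standard bisect module. The function name count_subsets_bisect is made up for this example, and the sketch assumes that equal values at different positions count as different elements.

```python
from bisect import bisect_left

def count_subsets_bisect(a, k):
    """Count non-empty subsets with min(subset) + max(subset) < k in O(N log N)."""
    a_sort = sorted(a)
    total = 0
    for l, x in enumerate(a_sort):
        # r is the last index whose value is strictly smaller than k - x
        r = bisect_left(a_sort, k - x) - 1
        if r < l:
            break  # x itself is too large, and so is every later element
        total += 2 ** (r - l)  # x is fixed; each element in (l, r] is optional
    return total

print(count_subsets_bisect([4, 2, 5, 7], 8))  # 4
```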

Increase efficiency for following algorithm

Problem:-
input = n
output :-
1 2 3.......n [first row]
2n+1 2n+2 2n+3....3n [second row]
3n+1 3n+2 3n+3...4n [second last row]
n+1 n+2 n+3....2n [last row]
In the problem we have to print a square such that we have 'n' numbers of rows in our square and in every row we have 'n' numbers. We prepare rows from numbers from 1 to square(n) in such way we fill numbers for first row, then last row, second row, second last row and so on.....
for e.g. if n = 4
We start from 1 print upto 4 then print a newline, so our first row is:-
1 2 3 4
Then our last row comes in continuation
5 6 7 8
then our second row will be
9 10 11 12
few examples:
input = 1
output = 1
input = 2
output = 1 2
3 4
input = 3
output = 1 2 3
7 8 9
4 5 6
My Code:
n = int(input().strip())
lines = [i for i in range(1, n + 1)]
line_order1 = []
line_order2 = []
# Reordering lines so we know the starting element of our method
for i in lines:
    if(i % 2 == 1):
        line_order1.append(i)
    else:
        line_order2.append(i)
print(line_order1)
print(line_order2)
# Getting the desired order of lines
line_order2.reverse()
line_order1.extend(line_order2)
print(line_order1)
# Now printing the desired square
for l in line_order1:
    for i in range(1, n + 1):
        k = n * (l - 1)
        print(k + i, end = " ")
    print("\n")
Is there a better way to do this in terms of execution time?

While I see a few minor places where you can improve your code, the performance is unlikely to be much better (my suggestions below might not make any performance difference at all). Your code will take O(n**2) time, which is the best you can do, since you need to print out that many numbers to form your square. Even if you combine some of your longer, more verbose steps into more compact versions, they can only be better by a constant factor.
My first suggestion is to number the lines from 0 to n-1 instead of from 1 to n. This will save you some effort when you have to calculate what multiple of n to include in the values for the row. Currently you've got an awkward l - 1 in your calculation that you could skip if you just used zero-indexed numbers for the rows. (Also, l is a terrible variable name, since it looks like the digit 1 (one) in some fonts.)
My next suggestion is to simplify your code that builds the order. You don't need three lists, you can do the whole thing with one list that you feed two range objects, each counting up or down by two.
line_order = list(range(0, n, 2)) # count up by twos
line_order.extend(range(n - 1 - n%2, 0, -2)) # count down starting at either n-1 or n-2
Or, if you're willing to use a standard library module, you could import itertools and then use:
line_order = itertools.chain(range(0, n, 2), range(n - 1 - n%2, 0, -2))
The itertools.chain function returns an iterator that yields values from each of its iterable arguments as if they were concatenated together, without making any copies of the data or using significant extra memory. The difference is not likely to matter much here (since the maximum n you can usefully print out is fairly small), but if you were doing something different with the result of this algorithm and n was in the billions, it would be very nice to avoid filling a list with that many values.
My last suggestion is to use a range again to generate all the numbers in each row directly, rather than explicitly looping from 1 to n and adding k each time.
for row_num in line_order:
    print(*range(n * row_num + 1, n * (row_num + 1) + 1))
You can compute the start and end points with the multiples of n already included, rather than needing to do that in a separate step for each one. You certainly didn't need to be recomputing k as often as you were before. You can pass all the values from the range to print in one go using iterable unpacking syntax (*args).
Note though that unpacking the range that way is sort of the reverse of the previous suggestion regarding itertools.chain. If n is large, using a loop over the range would be more memory efficient, since you won't need all n values to exist in memory at the same time. Here's what that would look like:
for row_num in line_order:
    for value in range(n * row_num + 1, n * (row_num + 1) + 1):
        print(value, end=" ")
    print()
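
Putting these suggestions together, a complete version might look like the sketch below. The helper name square_rows is made up for this example; it returns the rows as lists so the ordering can be checked separately from the printing.

```python
def square_rows(n):
    """Return the rows of the square, filled first, last, second, second-last, ..."""
    # Zero-indexed number blocks in the order they appear from top to bottom.
    line_order = list(range(0, n, 2))                # count up by twos
    line_order.extend(range(n - 1 - n % 2, 0, -2))   # count down from n-1 or n-2
    return [list(range(n * row_num + 1, n * (row_num + 1) + 1))
            for row_num in line_order]

for row in square_rows(4):
    print(*row)
```

For n = 3 this gives the rows 1 2 3, then 7 8 9, then 4 5 6, matching the example in the question.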

Finding the lowest sum of values in a list to form a target factor

I'm stuck on how to design an algorithm that finds a combination of elements from a list whose product is a predetermined target value and whose sum is the lowest possible.
For instance a list:
(2,5,7,6,8,2,3)
And a target value:
12
Would result in these factors:
(2,2,3) and (2,6)
But the optimal combination would be:
(2,2,3)
As it has a lower sum

First erase from the list all numbers that aren't factors of n. So in your example your list would reduce to (2, 6, 2, 3). Then I would sort the list, so you have (2, 2, 3, 6). Start multiplying the elements from left to right; if you reach n, stop. If you exceed n, find the next smallest permutation of your numbers and repeat. This will be (2, 2, 6, 3) (for a C++ function that finds the next permutation see this link). This is guaranteed to find the product with the smallest sum because we are checking the products in order from smallest sum to largest. This runs in time proportional to the factorial of the size of your list, but I think that is as good as you're going to get. This problem sounds NP hard.
You can do slightly better by pruning the permutations. Let's say you were looking for 24 and your list is (2, 4, 8, 12). The only subset is (2, 12). But the next permutation will be (2, 4, 12, 8), which you don't even need to generate because you knew that 2*4 was too small and 2*4*8 was too big, and swapping 12 with 8 only increases 2*4*8. This way you don't have to test that permutation.

You should be able to break the problem down recursively. You have a multiset of potential factors S = {n_1, n_2, ..., n_k}. Let f(S,n) be the minimum sum n_i_1 + n_i_2 + ... + n_i_j where n_i_l are distinct elements of the multiset and n_i_1 * ... * n_i_j = n. Then f(S,n) = min_i { (n_i + f(S-{n_i},n/n_i)) where n_i divides n }. In other words, f(S,n) can be computed recursively. With a little more work you can get the algorithm to spit out the actual n_is that work. The time complexity could be bad, but you don't say what your goals are in that regard.
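
The recursion can be sketched in Python as below. This is a minimal sketch, assuming all values are integers greater than 1 and the target is greater than 1. The name min_sum_factors is made up for this example. Instead of removing n_i from S, it only looks at later positions in a sorted candidate list, which has the same effect without revisiting the same combination in a different order.

```python
def min_sum_factors(nums, target):
    """Return a lowest-sum selection from nums whose product is target, or None."""
    # Only divisors of the target can ever be part of a solution.
    candidates = sorted(x for x in nums if x > 1 and target % x == 0)
    best = None  # (sum, chosen elements) of the best solution found so far

    def search(start, remaining, chosen, total):
        nonlocal best
        if remaining == 1:
            if chosen and (best is None or total < best[0]):
                best = (total, list(chosen))
            return
        for i in range(start, len(candidates)):
            x = candidates[i]
            if remaining % x == 0:
                chosen.append(x)
                search(i + 1, remaining // x, chosen, total + x)
                chosen.pop()

    search(0, target, [], 0)
    return None if best is None else best[1]

print(min_sum_factors((2, 5, 7, 6, 8, 2, 3), 12))  # [2, 2, 3]
```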

def primes(n):
    primfac = []
    d = 2
    while d*d <= n:
        while (n % d) == 0:
            primfac.append(d) # supposing you want multiple factors repeated
            n //= d
        d += 1
    if n > 1:
        primfac.append(n)
    return primfac

def get_factors_list(dividend, ceiling = float('infinity')):
    """ Yield all lists of factors where the largest is no larger than ceiling """
    for divisor in range(min(ceiling, dividend - 1), 1, -1):
        quotient, mod = divmod(dividend, divisor)
        if mod == 0:
            if quotient <= divisor:
                yield [divisor, quotient]
            for factors in get_factors_list(quotient, divisor):
                yield [divisor] + factors

def print_factors(x):
    factorList = []
    if x > 0:
        for factors in get_factors_list(x):
            factorList.append(list(map(int, factors)))
    return factorList

Here is how you could do it in Haskell:
import Data.List(sortBy, subsequences)
import Data.Function(on)

lowestSumTargetFactor :: (Ord b, Num b) => [b] -> b -> [b]
lowestSumTargetFactor xs target = do
    let l = filter (/= []) $ sortBy (compare `on` sum)
                [x | x <- subsequences xs, product x == target]
    if l == []
        then error $ "lowestSumTargetFactor: " ++
                     "no subsequence product equals target."
        else head l
Here's what is happening:
[x | x <- subsequences xs, product x == target] builds a list made of all subsequences of the list xs whose product equals target. In your example, it would build the list [[2,6],[6,2],[2,2,3]].
Then the sortBy (compare `on` sum) part sorts that list of lists by the sum of each list's elements. It would return the list [[2,2,3],[2,6],[6,2]].
I then filter that list, removing any [] elements because product [] returns 1 (don't know the reasoning for this, yet). This was done because lowestSumTargetFactor [1, 1, 1] 1 would return [] instead of the expected [1].
Then I ask if the list we built is []. If no, I use the function head to return the first element of that list ([2,2,3] in your case). If yes, it returns the error as written.
Obs1: where it appears above, the $ just means that everything after it is enclosed in parentheses.
Obs2: the lowestSumTargetFactor :: (Ord b, Num b) => [b] -> b -> [b] part is just the function's type signature. It means that the function takes a list made of bs, a second argument b and returns another list made of bs, b being a member of both the Ord class of totally ordered datatypes, and the Num class, the basic numeric class.
Obs3: I'm still a beginner. A more experienced programmer would probably do this much more efficiently and elegantly.

find elements summing to s in an array

given an array of elements (all elements are unique ) , given a sum
s find all the subsets having sum s.
for ex array {5,9,1,3,4,2,6,7,11,10}
sum is 10
possible subsets are {10}, {6,4}, {7,3}, {5,3,2}, {6,3,1} etc.
there can be many more.
also find the total number of these subsets.
please help me to solve this problem..

It is a famous backtracking problem which can be solved by recursion. Basically it's a brute force approach in which every possible combination is tried, but the 3 boundary conditions given below at least prune the search.
Here is algorithm:
s variable for the sum of elements selected till now.
r variable for the overall sum of the remaining array.
M is the sum required.
k is index starting with 0
w is array of given integers
Sum(k,s,r)
{
    x[k]:=1; //select the current element
    if (s<=M) & (r>=M-s) & (w[k]<=M-s)
    then
    {
        if (s+w[k]==M)
        then output all i [1..k] that x[i]=1
        else
        Sum(k+1,s+w[k],r-w[k])
    }
    x[k]:=0; //don't select the current element
    if (s<=M) & (r>=M-s) & (w[k]<=M-s)
    then
    {
        if (M==s)
        then output all i [1..k] that x[i]=1
        else
        Sum(k+1,s,r-w[k])
    }
}
I am using an array "x" to mark the candidate numbers selected for the solution. At each step 3 boundary conditions are checked:
1. The sum of the selected elements in "x" from "w" shouldn't exceed M: s<=M.
2. The remaining numbers in the array should be able to complete M: r>=M-s.
3. The current value in w shouldn't overshoot M: w[k]<=M-s.
If any of the conditions fails, that branch is terminated.
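
A runnable Python sketch of the same backtracking idea follows. It is not a line-by-line translation of the pseudocode above: it keeps the chosen elements in a list instead of the marker array x, and it assumes all numbers are positive and unique. The function name subsets_with_sum is made up for this example.

```python
def subsets_with_sum(w, M):
    """Return all subsets of w (positive, unique integers) that sum to M."""
    results = []
    chosen = []

    def search(k, s, r):
        # s: sum of the chosen elements; r: sum of w[k:], the undecided elements
        if s == M:
            results.append(list(chosen))
            return
        if k == len(w) or s + r < M:
            return  # nothing left, or the rest cannot reach M
        if s + w[k] <= M:  # select w[k] only if it does not overshoot M
            chosen.append(w[k])
            search(k + 1, s + w[k], r - w[k])
            chosen.pop()
        search(k + 1, s, r - w[k])  # don't select w[k]

    search(0, 0, sum(w))
    return results

found = subsets_with_sum([5, 9, 1, 3, 4, 2, 6, 7, 11, 10], 10)
print(len(found))  # 9
```

The total number of subsets asked for in the question is simply the length of the returned list.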

Here's some python code doing what you want. It makes extensive use of itertools so to understand it you might want to have a look at the itertools docs.
>>> import itertools
>>> vals = (5,9,1,3,4,2,6,7,11,10)
>>> combos = itertools.chain(*((x for x in itertools.combinations(vals, i) if sum(x) == 10) for i in range(len(vals)+1)))
>>> for c in combos: print(c)
...
(10,)
(9, 1)
(3, 7)
(4, 6)
(5, 1, 4)
(5, 3, 2)
(1, 3, 6)
(1, 2, 7)
(1, 3, 4, 2)
What it does is basically this:
For all possible subset sizes - for i in range(len(vals)+1), do:
Iterate over all subsets with this size - for x in itertools.combinations(vals, i)
Test if the sum of the subset's values is 10 - if sum(x) == 10
In this case yield the subset
For each subset size another generator is yielded, so I'm using itertools.chain to chain them together so there's a single generator yielding all solutions.
Since you have only a generator and not a list, you need to count the elements while iterating over it - or you could use list(combos) to put all values from the generator into a list (this consumes the generator, so don't try iterating over it before/after that).

Since you don't say if it's homework or not, I give only some hints:
let nums be the array of numbers that you can use (in your example nums = {5,9,1,3,4,2,6,7,11,10})
let targetSum be the sum value you're given (in your example targetSum = 10)
sort nums: you don't want to search for solutions using elements of nums that are bigger than your targetSum
let S_s be a set of integers taken from nums whose sum is equal to s
let R_s be the set of all S_s
you want to find R_s (in your example R_10)
now, assume that you have a function find(i, s) which returns R_s using the sub-array of nums starting from position i
if nums[i] > s you can stop (remember that you have previously sorted nums)
if nums[i] == s you have found R_s = { { nums[i] } }, so return it
for every j >= 1 with i + j < nums.length you want to compute R_s' = find(i + j, s - nums[i]), then add nums[i] to every set in R_s', and add them to your result R_s
solve your problem by implementing find, and calling find(0, 10)
I hope this helps

Slow tail recursion in F#

I have an F# function that returns a list of numbers starting from 0 in the pattern of skip n, choose n, skip n, choose n... up to a limit. For example, this function for input 2 will return [2, 3, 6, 7, 10, 11...].
Initially I implemented this as a non-tail-recursive function as below:
let rec indicesForStep start blockSize maxSize =
    match start with
    | i when i > maxSize -> []
    | _ -> [for j in start .. ((min (start + blockSize) maxSize) - 1) -> j] @ indicesForStep (start + 2 * blockSize) blockSize maxSize
Thinking that tail recursion is desirable, I reimplemented it using an accumulator list as follows:
let indicesForStepTail start blockSize maxSize =
    let rec indicesForStepInternal istart accumList =
        match istart with
        | i when i > maxSize -> accumList
        | _ -> indicesForStepInternal (istart + 2 * blockSize) (accumList @ [for j in istart .. ((min (istart + blockSize) maxSize) - 1) -> j])
    indicesForStepInternal start []
However, when I run this in fsi under Mono with the parameters 1, 1 and 20,000 (i.e. should return [1, 3, 5, 7...] up to 20,000), the tail-recursive version is significantly slower than the first version (12 seconds compared to sub-second).
Why is the tail-recursive version slower? Is it because of the list concatenation? Is it a compiler optimisation? Have I actually implemented it tail-recursively?
I also feel as if I should be using higher-order functions to do this, but I'm not sure exactly how to go about doing it.

As dave points out, the problem is that you're using the @ operator to append lists. This is a more significant performance issue than tail recursion. In fact, tail recursion doesn't really speed up the program much (but it makes it work on large inputs where the stack would otherwise overflow).
The reason why your second version is slower is that you're appending a shorter list (the one generated using [...]) to a longer list (accumList). This is slower than appending a longer list to a shorter list (because the operation needs to copy the first list).
You can fix it by collecting the elements in the accumulator in a reversed order and then reversing it before returning the result:
let indicesForStepTail start blockSize maxSize =
    let rec indicesForStepInternal istart accumList =
        match istart with
        | i when i > maxSize -> accumList |> List.rev
        | _ ->
            let acc =
                [for j in ((min (istart + blockSize) maxSize) - 1) .. -1 .. istart -> j]
                @ accumList
            indicesForStepInternal (istart + 2 * blockSize) acc
    indicesForStepInternal start []
As you can see, this has the shorter list (generated using [...]) as the first argument to @ and, on my machine, it has similar performance to the non-tail-recursive version. Note that the [ ... ] comprehension generates elements in reversed order, so that they can be reversed back at the end.
You can also write the whole thing more nicely using the F# seq { .. } syntax. You can avoid using the @ operator completely, because it allows you to yield individual elements using yield and perform tail-recursive calls using yield!:
let rec indicesForStepSeq start blockSize maxSize = seq {
    match start with
    | i when i > maxSize -> ()
    | _ ->
        for j in start .. ((min (start + blockSize) maxSize) - 1) do
            yield j
        yield! indicesForStepSeq (start + 2 * blockSize) blockSize maxSize }
This is how I'd write it. When calling it, you just need to add Seq.toList to evaluate the whole lazy sequence. The performance of this version is similar to the first one.
EDIT With the correction from Daniel, the Seq version is actually slightly faster!

In F# the list type is implemented as a singly linked list. Because of this you get different performance for x @ y and y @ x if x and y are of different lengths. That's why you're seeing a difference in performance. x @ y has a running time proportional to the length of x.
// e.g.
let x = [1;2;3;4]
let y = [5]
If you did x @ y then x (4 elements) would be copied into a new list and its internal next pointer would be set to the existing y list. If you did y @ x then y (1 element) would be copied into a new list and its next pointer would be set to the existing list x.
I wouldn't use a higher order function to do this. I'd use list comprehension instead.
let indicesForStepTail start blockSize maxSize =
    [
        for block in start .. (blockSize * 2) .. (maxSize - 1) do
            for i in block .. (block + blockSize - 1) do
                yield i
    ]

This looks like the list append is the problem. Append is basically an O(N) operation on the size of the first argument. By accumulating on the left, this operation takes O(N^2) time.
The way this is typically done in functional code seems to be to accumulate the list in reverse order (by accumulating on the right), then at the end, return the reverse of the list.
The first version you have avoids the append problem, but as you point out, is not tail recursive.
In F#, probably the easiest way to solve this problem is with sequences. It is not very functional looking, but you can easily create an infinite sequence following your pattern, and use Seq.take to get the items you are interested in.
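
The infinite-sequence idea can be illustrated with a Python generator, where itertools.islice plays the role that Seq.take plays in F#. This is a Python analogue for illustration only, not F# code, and the name skip_take is made up for this example.

```python
from itertools import count, islice

def skip_take(block_size):
    """Yield indices in the pattern: skip block_size, take block_size, repeat forever."""
    # Each taken block starts block_size after the previous block ends.
    for start in count(block_size, 2 * block_size):
        yield from range(start, start + block_size)

print(list(islice(skip_take(2), 6)))  # [2, 3, 6, 7, 10, 11]
```

Because the generator is lazy, no intermediate lists are appended to each other, which is the cost the answers above identify in the slow version.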
