Logic: Applying gravity to a vector - algorithm

There is a method called gravity(Vector[] vector). The vector contains a sequence of numbers. The gravity function should return a new vector after applying gravity, as explained below.
Assume 0's are air and 1's are brick. When gravity is applied the bricks should fall down to the lowest level.
Let vector = [3, 7, 8]
Converting this to binary we get:
0 0 1 1 for 3
0 1 1 1 for 7
1 0 0 0 for 8
Applying gravity:
0 0 0 0 which is 0
0 0 1 1 which is 3
1 1 1 1 which is 15
So the gravity function should return [0, 3, 15].
I hope the explanation is clear. I tried a lot but couldn't figure out the logic for this. One thing I observed is that the sum of the numbers in the vector stays the same before and after applying gravity.
That is,
3 + 7 + 8 = 18 = 0 + 3 + 15 for the above case.

I think it is as simple as counting the total '1' bit of each position...
Let N be the input vector size and b the longest binary length among the input elements.
Pre-compute the total number of '1' bits at each position, stored in count[]: O(N*b).
Run the gravity function, that is, regenerate N numbers from count[]: O(N*b).
Total run time is O(N*b).
Below is the sample code in C++
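#include <cstdio>
#include <vector>
using namespace std;
const int N = 5;      // number of input elements
const int BITS = 4;   // longest binary length among the inputs
int v[N] = {3,9,7,8,5};
int cnt[BITS] = {0};  // cnt[j] = number of inputs with bit j set
vector<int> ans;
vector<int> gravity(){
    vector<int> ret;
    for(int i=0; i<N; i++){
        int s = 0;
        // take one brick from every column that still has bricks left
        for(int j=0; j<BITS; j++)
            if(cnt[j] > 0){ s += (1<<j); cnt[j]--; }
        ret.push_back(s);  // rows are produced bottom-first
    }
    return ret;
}
int main(){
    // precompute sum of 1 of each bit
    for(int i=0; i<N; i++){
        int tmp = v[i];
        for(int j=0; tmp; j++){
            if(tmp&1) cnt[j]++;
            tmp >>= 1;
        }
    }
    ans = gravity();
    for(int i=ans.size()-1; i>=0; i--) printf("%d ", ans[i]);
    return 0;
}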
The output is as follows:
Success time: 0 memory: 3272 signal:0
0 1 1 15 15

Start at the bottom. Any bricks in the row on top of that one will fall down except where there is already a brick on the bottom. So, the new bottom row is:
bottom_new = bottom_old OR top_old
The new top is:
top_new = bottom_old AND top_old
That is, there will be a brick in the new bottom row if there was a brick in either row, but there's only going to be a brick in the new top row if there was a brick in both rows.
Then you just work your way up the stack, with the new top row becoming the old bottom row for the next step.
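The OR/AND update described above can be sketched in Python. This is only an illustration, not the answerer's code: the name gravity_pairwise is made up, the list is assumed to be ordered top row first (as in the question's example), and repeating the bottom-to-top sweep until nothing changes is an added assumption, because a single sweep is not always enough (for example [1, 1, 0, 0] becomes [0, 1, 1, 0] after one sweep).

```python
def gravity_pairwise(rows):
    """Apply gravity with pairwise OR/AND updates.

    rows[0] is assumed to be the top row and rows[-1] the bottom row.
    """
    rows = list(rows)  # work on a copy
    changed = True
    while changed:  # assumption: sweep again until a full pass changes nothing
        changed = False
        # start at the bottom pair and work up the stack
        for i in range(len(rows) - 1, 0, -1):
            top, bottom = rows[i - 1], rows[i]
            new_bottom = bottom | top  # a brick in either row lands below
            new_top = bottom & top     # a brick stays on top only if both had one
            if (new_top, new_bottom) != (top, bottom):
                rows[i - 1], rows[i] = new_top, new_bottom
                changed = True
    return rows


print(gravity_pairwise([3, 7, 8]))        # [0, 3, 15]
print(gravity_pairwise([3, 9, 7, 8, 5]))  # [0, 1, 1, 15, 15]
```

Stopping when a sweep leaves the list unchanged also gives one possible termination condition for the iterative variants discussed in this thread.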

The only solution I can think of so far uses nested for loops:
v is the input vector of N integers
D is the number of digits in each integer
c keeps track of the bottom-most free space where a brick can fall
The algorithm checks if the ith bit in the number n is set using (n & (1<<i)), which works in most C-like languages.
The algorithm in C:
for (int j=0; j<D; ++j) {
    int bit = 1<<j;
    int c = N-1; // bottom-most free slot in column j
    for (int i=N-1; i>=0; --i) {
        if (v[i] & bit) { // if bit j of number v[i] is set...
            v[i] ^= bit; // clear bit j of number i using XOR
            v[c] ^= bit; // set bit j of the bottom-most free number c using XOR
            c -= 1; // the free slot moves up one row
        }
    }
}
If N is small and known in advance, you could work out the truth tables for the values of each digit and get the correct result using only bitwise operations and no loops.

So I found a solution that I think needs recursion, though I don't know the condition to stop the recursion.
The vector v = [3, 7, 8] is so simple that it cannot show why recursion is required, so I am considering a new vector v = [3, 9, 7, 8, 5].
In binary form :
0 0 1 1 - a4
1 0 0 1 - a3
0 1 1 1 - a2
1 0 0 0 - a1
0 1 0 1 - a0
Iteration 1 :
0 0 0 0 - b7 (b7 = a4 AND b5)
0 0 1 1 - b6 (b6 = a4 OR b5)
0 0 0 0 - b5 (b5 = a3 AND b3) ignore this
1 0 0 1 - b4 (b4 = a3 OR b3)
0 0 0 0 - b3 (b3 = a2 AND b1) ignore this
0 1 1 1 - b2 (b2 = a2 OR b1)
0 0 0 0 - b1 (b1 = a0 AND a1) ignore this
1 1 0 1 - b0 (b0 = a0 OR a1)
Intermediate vector = [b7, b6, b4, b2, b0] = [0, 3, 9, 7, 13] (relabelled b4..b0, top to bottom, for the next iteration)
Iteration 2 :
0 0 0 0 - c7 (c7 = b4 AND c5)
0 0 0 1 - c6 (c6 = b4 OR c5)
0 0 0 1 - c5 (c5 = b3 AND c3) ignore this
0 0 1 1 - c4 (c4 = b3 OR c3)
0 0 0 1 - c3 (c3 = b2 AND c1) ignore this
1 1 0 1 - c2 (c2 = b2 OR c1)
0 1 0 1 - c1 (c1 = b0 AND b1) ignore this
1 1 1 1 - c0 (c0 = b0 OR b1)
Intermediate vector = [c7, c6, c4, c2, c0] = [0, 1, 3, 13, 15] (relabelled c4..c0, top to bottom, for the next iteration)
Iteration 3 :
0 0 0 0 - d7 (d7 = c4 AND d5)
0 0 0 1 - d6 (d6 = c4 OR d5)
0 0 0 1 - d5 (d5 = c3 AND d3) ignore this
0 0 0 1 - d4 (d4 = c3 OR d3)
0 0 0 1 - d3 (d3 = c2 AND d1) ignore this
1 1 1 1 - d2 (d2 = c2 OR d1)
1 1 0 1 - d1 (d1 = c0 AND c1) ignore this
1 1 1 1 - d0 (d0 = c0 OR c1)
Resultant vector = [d7, d6, d4, d2, d0] = [0, 1, 1, 15, 15]
I got this solution by going backwards through the vector.
Another solution:
Construct a multidimensional array with all the bits of all the elements in the vector (i.e) if v = [3,7,8] then construct a 3x4 array and store all the bits.
Count the number of 1's in each column and store the count.
Fill each column with count number of 1's starting from the bottom bit.
This approach is simple but requires construction of large matrices.
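The column-counting idea can be sketched in Python as below. It is a hedged variant rather than the exact procedure described: instead of building the full bit matrix it keeps only the per-column counts, which is all the fill step needs. The name gravity_by_counting and the top-row-first ordering are assumptions taken from the question's example.

```python
def gravity_by_counting(values):
    """Rebuild the rows from the number of 1 bits in each bit column.

    values[0] is assumed to be the top row, values[-1] the bottom row.
    """
    if not values:
        return []
    width = max(values).bit_length()
    # counts[bit] = how many rows have that bit set
    counts = [sum((v >> bit) & 1 for v in values) for bit in range(width)]
    n = len(values)
    result = []
    for row in range(n):
        rows_from_bottom = n - row  # 1 for the bottom row
        value = 0
        for bit, count in enumerate(counts):
            # a column with `count` bricks fills the lowest `count` rows
            if count >= rows_from_bottom:
                value |= 1 << bit
        result.append(value)
    return result


print(gravity_by_counting([3, 7, 8]))  # [0, 3, 15]
```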


Efficient code to load AVX vectors for 1D convolution kernel of length 8

An implementation of a 1D convolution operation will often need to load vectors of data that sequentially step through a buffer, offset by one element each iteration.
For example, consider a buffer of input data X[0], X[1], ..., X[n-1], where n is greater than twice the kernel length. If the convolution length is three, and we can fit eight elements in each vector, we might first want a vector with X[0], X[1], ..., X[7], then the next with X[1], X[2], ..., X[8] and the last with X[2], X[3], ..., X[9].
Consider the case where the kernel length as well as the vector length is 8. We must load eight vectors, that might look sequentially like this:
{ 0 1 2 3 4 5 6 7 }
{ 1 2 3 4 5 6 7 8 }
{ 2 3 4 5 6 7 8 9 }
{ 3 4 5 6 7 8 9 10 }
{ 4 5 6 7 8 9 10 11 }
{ 5 6 7 8 9 10 11 12 }
{ 6 7 8 9 10 11 12 13 }
{ 7 8 9 10 11 12 13 14 }
By reducing this sequence vertically, we could produce a running mean or sum. I.e., the sum of these vectors will have the sum of the first 8 elements in its first position.
Consider that the order of the elements in the column does not matter. Any permutation of the elements in each column will still produce the same result. For a convolution, this permutation can be accounted for by altering the order of the constants used in the kernel.
Is there a faster way to load these vectors that takes advantage of this? Consider as a baseline the simple sequence of unaligned loads:
#include <immintrin.h>
// Any sort of sliding window function, i.e. running mean, running max, convolution, etc.
void sliding_window(const float* input, unsigned length)
{
    for (unsigned i = 0; i < length - 7; i += 8) {
        for (unsigned j = 0; j < 8; j++) {
            __m256 v = _mm256_loadu_ps(input + i + j);
            // reduction operation on v (e.g. max or fmadd) goes here
        }
    }
    // handle tail here
}
First of all, you should note that if your convolution is separable, this is very often worth doing. Simple example:
res[i] = x[i]+x[i+1]+x[i+2]+x[i+3]+x[i+4]+x[i+5]+x[i+6]+x[i+7];
This can be done by convoluting with [1 1] * [1 0 1] * [1 0 0 0 1] in three steps, for example like so:
void sliding_window(float* output, const float* input, size_t length)
{
    // Nomenclature
    // aX input at i+X
    // bX convolution with [1 1] starting at i+X
    // cX convolution with [1 1] * [1 0 1] starting at i+X
    // dX convolution with [1 1] * [1 0 1] * [1 0 0 0 1] starting at i+X
    __m256 a0 = _mm256_load_ps(input), a8 = _mm256_load_ps(input + 8);
    __m256 b0 = _mm256_add_ps(a0, _mm256_loadu_ps(input+1)), b8 = _mm256_add_ps(a8, _mm256_loadu_ps(input+9));
    __m256 b4 = _mm256_permute2f128_ps(b0, b8, 1+16*2);
    __m256 b2 = _mm256_shuffle_ps(b0, b4, 2+3*4+0*16+1*64);
    __m256 c0 = _mm256_add_ps(b0, b2);
    for (unsigned i = 0; i < length - 25; i += 8) {
        // Convolute input with [1 1]
        __m256 a16 = _mm256_load_ps( input + i + 16);
        __m256 a17 = _mm256_loadu_ps(input + i + 17);
        __m256 b16 = _mm256_add_ps(a16, a17);
        // Convolute first convolution with [1 0 1]
        __m256 b12 = _mm256_permute2f128_ps(b8, b16, 1+16*2);
        __m256 b10 = _mm256_shuffle_ps(b8, b12, 2+3*4+0*16+1*64);
        __m256 c8 = _mm256_add_ps(b8, b10);
        // Convolute second convolution with [1 0 0 0 1]
        __m256 c4 = _mm256_permute2f128_ps(c0, c8, 1+16*2);
        __m256 d0 = _mm256_add_ps(c0, c4);
        // Store result
        _mm256_store_ps(output + i, d0);
        // rename registers for next iteration:
        b8 = b16;
        c0 = c8;
    }
    // handle tail here ...
}
You can of course replace addps by maxps. Godbolt-Demo: https://godbolt.org/z/W9K9o943o
Overall, this takes 1 aligned + 1 unaligned load, 3 shuffles, 3 additions and 1 store for 8 elements (actually only using AVX1). On Intel CPUs with only 1 shuffle per cycle this may actually just be slightly faster than a naïve 8-load, 7-addition implementation (I did not benchmark this). On Zen3 I'm not sure about the actual cost of loading unaligned data.
If you have a non-trivial kernel it is probably hard to determine if it is separable, though.
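For the simple box-sum example, the factorisation into [1 1] * [1 0 1] * [1 0 0 0 1] can be checked with a few lines of plain Python. The convolve helper below is a hypothetical illustration written for this check (a direct full convolution); it is unrelated to the AVX code.

```python
def convolve(a, b):
    """Full discrete convolution of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# Combine the three short kernels used in the three-step scheme.
kernel = convolve(convolve([1, 1], [1, 0, 1]), [1, 0, 0, 0, 1])
print(kernel)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

The same helper could be used to test a candidate factorisation of another kernel by comparing the product with the kernel you actually want.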
The best I've been able to come up with is this sequence:
{ 0 1 2 3 4 5 6 7 }
{ 1 2 3 8 5 6 7 12 }
{ 2 3 8 9 6 7 12 13 }
{ 3 8 9 10 7 12 13 14 }
{ 4 5 6 7 8 9 10 11 }
{ 5 6 7 4 9 10 11 8 }
{ 6 7 4 5 10 11 8 9 }
{ 7 4 5 6 11 8 9 10 }
Each column contains the necessary elements. The first column contains 0 - 7, the next 1 - 8, then 2 - 9, etc.
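The column property can be verified mechanically. The snippet below is a small sanity check in Python, with a made-up helper name; it only confirms that column c of the permuted sequence holds exactly the indices c to c+7 in some order.

```python
def columns_cover_windows(rows):
    """Return True if column c contains exactly the indices c .. c+len(rows)-1."""
    width = len(rows[0])
    for col in range(width):
        column = sorted(row[col] for row in rows)
        if column != list(range(col, col + len(rows))):
            return False
    return True


permuted = [
    [0, 1, 2, 3, 4, 5, 6, 7],
    [1, 2, 3, 8, 5, 6, 7, 12],
    [2, 3, 8, 9, 6, 7, 12, 13],
    [3, 8, 9, 10, 7, 12, 13, 14],
    [4, 5, 6, 7, 8, 9, 10, 11],
    [5, 6, 7, 4, 9, 10, 11, 8],
    [6, 7, 4, 5, 10, 11, 8, 9],
    [7, 4, 5, 6, 11, 8, 9, 10],
]
print(columns_cover_windows(permuted))  # True
```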
This can be produced with the following sequence of operations:
void sliding_window(const float* input, unsigned length)
{
    __m256 a = _mm256_load_ps(input);
    for (unsigned i = 8; i < length - 7; i += 8) {
        __m256 b = _mm256_load_ps(input + i);
        __m256i ai = _mm256_castps_si256(a); // not part of sequence
        __m256i bi = _mm256_castps_si256(b); // just for code reduction
        // a is the first vector, these are remaining 7
        __m256 j1 = _mm256_castsi256_ps(_mm256_alignr_epi8(bi, ai, 4));
        // Reduction operation (add, fmadd, max, etc.) between a and j1 goes here
        __m256 j2 = _mm256_castsi256_ps(_mm256_alignr_epi8(bi, ai, 8));
        // Reduction with j2 goes here, and so on after each value
        __m256 j3 = _mm256_castsi256_ps(_mm256_alignr_epi8(bi, ai, 12));
        __m256 r0 = _mm256_permute2f128_ps(a, b, 0x21);
        a = b; // Register with "b" isn't needed anymore
        __m256 r1 = _mm256_permute_ps(r0, 0x39);
        __m256 r2 = _mm256_permute_ps(r0, 0x4e);
        __m256 r3 = _mm256_permute_ps(r0, 0x93);
        // Final reduction with r3 to produce result
    }
    // handle tail here
}
On Zen3, I benchmark this as about 10% faster than the sequence of unaligned loads.

Subset sum with maximum equal sums and without using all elements

You are given a set of integers and your task is the following: split them into 2 subsets with equal sums in such a way that these sums are maximal. You are allowed not to use all of the given integers. If it's impossible, report an error somehow.
My approach is rather straightforward: at each step, we pick a single item, mark it as visited, update the current sum and pick another item recursively. Finally, we try skipping the current element.
It works on simpler test cases, but it fails on this one:
T = 1
N = 25
Elements: 5 27 24 12 12 2 15 25 32 21 37 29 20 9 24 35 26 8 31 5 25 21 28 3 5
One can run it as follows:
1 25 5 27 24 12 12 2 15 25 32 21 37 29 20 9 24 35 26 8 31 5 25 21 28 3 5
I expect the sum to be equal to 239, but the algorithm fails to find such a solution.
I've ended up with the following code:
#include <cstdint>
#include <iostream>
#include <unordered_set>
using namespace std;
unordered_set<uint64_t> visited;
const int max_N = 50;
int data[max_N];
int p1[max_N];
int p2[max_N];
int out1[max_N];
int out2[max_N];
int n1 = 0;
int n2 = 0;
int o1 = 0;
int o2 = 0;
int N = 0;
void max_sum(int16_t &sum_out, int16_t sum1 = 0, int16_t sum2 = 0, int idx = 0) {
    if (idx < 0 || idx > N) return;
    if (sum1 == sum2 && sum1 > sum_out) {
        sum_out = sum1;
        o1 = n1;
        o2 = n2;
        for(int i = 0; i < n1; ++i) {
            out1[i] = p1[i];
        }
        for (int i = 0; i < n2; ++i) {
            out2[i] = p2[i];
        }
    }
    if (idx == N) return;
    uint64_t key = (static_cast<uint64_t>(sum1) << 48) | (static_cast<uint64_t>(sum2) << 32) | idx;
    if (visited.find(key) != visited.end()) return;
    visited.insert(key);
    // put the element into the first subset
    p1[n1] = data[idx];
    ++n1;
    max_sum(sum_out, sum1 + data[idx], sum2, idx + 1);
    --n1;
    // put the element into the second subset
    p2[n2] = data[idx];
    ++n2;
    max_sum(sum_out, sum1, sum2 + data[idx], idx + 1);
    --n2;
    // skip the element
    max_sum(sum_out, sum1, sum2, idx + 1);
}
int main() {
    int T = 0;
    cin >> T;
    for (int t = 1; t <= T; ++t) {
        int16_t sum_out;
        cin >> N;
        for(int i = 0; i < N; ++i) {
            cin >> data[i];
        }
        n1 = 0;
        n2 = 0;
        o1 = 0;
        o2 = 0;
        max_sum(sum_out);
        int res = 0;
        int res2 = 0;
        for (int i = 0; i < o1; ++i) res += out1[i];
        for (int i = 0; i < o2; ++i) res2 += out2[i];
        if (res != res2) cerr << "ERROR: " << "res1 = " << res << "; res2 = " << res2 << '\n';
        cout << "#" << t << " " << res << '\n';
    }
}
I have the following questions:
Could someone help me to troubleshoot the failing test? Are there any obvious problems?
How could I get rid of unordered_set for marking already visited sums? I prefer to use plain C.
Is there a better approach? Maybe using dynamic programming?
Another approach is to consider all the numbers in the range [1, 2^N - 2].
Let each bit position correspond to one element. Iterate over all numbers in [1, 2^N - 2] and check each one.
If a bit is set, put that element in set1, otherwise put it in set2, then check whether the sums of both sets are equal. This enumerates all possible splits; if you only want one, break as soon as you find it.
1) Could someone help me to troubleshoot the failing test? Are there any obvious problems?
The only issue I could see is that you have not set sum_out to 0.
When I tried running the program it seemed to work correctly for your test case.
2) How could I get rid of unordered_set for marking already visited sums? I prefer to use plain C.
See the answer to question 3
3) Is there a better approach? Maybe using dynamic programming?
You are currently keeping track of whether you have seen each combination of (sum of the first subset, sum of the second subset, position in the array).
If instead you keep track of the difference between the values then the complexity significantly reduces.
In particular, you can use dynamic programming to store an array A[diff] that for each value of the difference either stores -1 (to indicate that the difference is not reachable), or the greatest value of subset1 when the difference between subset1 and subset2 is exactly equal to diff.
You can then iterate over the entries in the input and update the array based on either assigning each element to subset1/subset2/ or not at all. (Note you need to make a new copy of the array when computing this update.)
In this form there is no need for unordered_set because you can simply use a plain C array. There is also no difference between subset1 and subset2, so you only need to keep non-negative differences.
Example Python Code
from collections import defaultdict
data = list(map(int, "5 27 24 12 12 2 15 25 32 21 37 29 20 9 24 35 26 8 31 5 25 21 28 3 5".split()))
A = defaultdict(int)  # Map from difference to best value of subset sum 1
A[0] = 0  # We start with a difference of 0
for a in data:
    A2 = defaultdict(int)
    def add(s1, s2):
        if s1 > s2:
            s1, s2 = s2, s1  # keep s1 as the smaller sum
        d = s2 - s1
        if d in A2:
            A2[d] = max(A2[d], s1)
        else:
            A2[d] = s1
    for diff, sum1 in A.items():
        sum2 = sum1 + diff
        add(sum1, sum2)      # leave a out
        add(sum1 + a, sum2)  # put a into subset 1
        add(sum1, sum2 + a)  # put a into subset 2
    A = A2
print(A[0])
This prints 239 as the answer.
For simplicity I haven't bothered with the optimization of using a linear array instead of the dictionary.
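A hedged sketch of the linear-array variant is shown below, written in Python to match the example above; it is not the answerer's code. The name max_equal_sum and the use of -1 as the "unreachable" marker are assumptions for illustration. The array is indexed by the non-negative difference between the two subset sums and stores the best smaller sum, so it maps directly onto a plain C array of size total + 1.

```python
def max_equal_sum(values):
    """Largest sum s such that two disjoint subsets of values both sum to s."""
    total = sum(values)
    unreachable = -1
    # best[d] = largest smaller-subset sum seen with difference d
    best = [unreachable] * (total + 1)
    best[0] = 0
    for a in values:
        nxt = best[:]  # copying covers the "skip a" choice
        for d, s in enumerate(best):
            if s == unreachable:
                continue
            # a goes to the larger subset: the difference grows
            if d + a <= total:
                nxt[d + a] = max(nxt[d + a], s)
            # a goes to the smaller subset
            if a <= d:
                nxt[d - a] = max(nxt[d - a], s + a)
            else:
                # the smaller subset overtakes the larger one
                nxt[a - d] = max(nxt[a - d], s + d)
        best = nxt
    return best[0]  # 0 means only the empty split works


data = [5, 27, 24, 12, 12, 2, 15, 25, 32, 21, 37, 29, 20,
        9, 24, 35, 26, 8, 31, 5, 25, 21, 28, 3, 5]
print(max_equal_sum(data))
```

The inner loop touches at most total + 1 entries per element, so the running time is O(N * total) and the memory is O(total).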
A very different approach would be to use a constraint or mixed integer solver. Here is a possible formulation.
x(i,g) = 1 if value v(i) belongs to group g
0 otherwise
The optimization model can look like:
max s
s = sum(i, x(i,g)*v(i)) for all g
sum(g, x(i,g)) <= 1 for all i
For two groups we get:
---- 31 VARIABLE s.L = 239.000
---- 31 VARIABLE x.L
g1 g2
i1 1
i2 1
i3 1
i4 1
i5 1
i6 1
i7 1
i8 1
i9 1
i10 1
i11 1
i12 1
i13 1
i14 1
i15 1
i16 1
i17 1
i18 1
i19 1
i20 1
i21 1
i22 1
i23 1
i25 1
We can easily do more groups. E.g. with 9 groups:
---- 31 VARIABLE s.L = 52.000
---- 31 VARIABLE x.L
g1 g2 g3 g4 g5 g6 g7 g8 g9
i2 1
i3 1
i4 1
i5 1
i6 1
i7 1
i8 1
i9 1
i10 1
i11 1
i12 1
i13 1
i14 1
i15 1
i16 1
i17 1
i19 1
i20 1
i21 1
i22 1
i23 1
i24 1
i25 1
If there is no solution, the solver will select zero elements in each group with a sum s=0.

Efficiently unpack a vector into binary matrix Octave

On Octave I'm trying to unpack a vector in the format:
y = [ 1
      2
      4
      1
      3 ]
I want to return a matrix of dimension ( rows(y) x max value(y) ), where for each row I have a 1 in the column of the original digits value, and a zero everywhere else, i.e. for the example above
y01 = [ 1 0 0 0
0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0 ]
so far I have
y01 = zeros( m, num_labels );
for i = 1:m
    for j = 1:num_labels
        y01(i,j) = (y(i) == j);
    end
end
which works, but is going to get slow for bigger matrices, and seems inefficient because it cycles through every single value even though the majority aren't changing.
I found this for R on another thread:
f3 <- function(vec) {
    U <- sort(unique(vec))
    M <- matrix(0, nrow = length(vec),
                ncol = length(U),
                dimnames = list(NULL, U))
    M[cbind(seq_len(length(vec)), match(vec, U))] <- 1L
    M
}
but I don't know R and I'm not sure if/how the solution ports to Octave.
Thanks for any suggestions!
Use a sparse matrix (which also saves a lot of memory) which can be used in further calculations as usual:
y = [1; 2; 4; 1; 3]
y01 = sparse (1:rows (y), y, 1)
if you really want a full matrix then use "full":
full (y01)
ans =
1 0 0 0
0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0
Sparse is a more efficient way to do this when the matrix is big.
If your dimension of the result is not very high, you can try this:
y = [1; 2; 4; 1; 3]
I = eye(max(y));
y01 = I(y,:)
The result is same as full(sparse(...)).
y01 =
1 0 0 0
0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0
% Vector y to Matrix Y
Y = zeros(m, num_labels);
% Loop through each row
for i = 1:m
    % Use the value of y as an index; set the value matching index to 1
    Y(i,y(i)) = 1;
end
Another possibility is:
y = [1; 2; 4; 1; 3]
classes = unique(y)(:)
num_labels = length(classes)
y01=[1:num_labels] == y
With the following detailed printout:
y =
   1
   2
   4
   1
   3
classes =
   1
   2
   3
   4
num_labels = 4
y01 =
1 0 0 0
0 1 0 0
0 0 0 1
1 0 0 0
0 0 1 0
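For readers who do not use Octave, the index-based idea (write a single 1 per row instead of comparing every cell) can be sketched in plain Python. This is an illustration only; the function name one_hot and the optional num_labels parameter are assumptions, and labels are taken to be 1-based as in the Octave examples.

```python
def one_hot(labels, num_labels=None):
    """Return a list of rows with a single 1 at each label's column."""
    if num_labels is None:
        num_labels = max(labels)
    matrix = [[0] * num_labels for _ in labels]
    for row, label in zip(matrix, labels):
        row[label - 1] = 1  # labels are 1-based, columns are 0-based
    return matrix


for row in one_hot([1, 2, 4, 1, 3]):
    print(row)
```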

How to efficiently calculate a row in pascal's triangle?

I'm interested in finding the nth row of pascal triangle (not a specific element but the whole row itself). What would be the most efficient way to do it?
I thought about the conventional way to construct the triangle by summing up the corresponding elements in the row above which would take:
1 + 2 + .. + n = O(n^2)
Another way could be using the combination formula of a specific element:
c(n, k) = n! / (k!(n-k)!)
for each element in the row, which I guess would take more time than the former method, depending on the way the combination is calculated. Any ideas?
>>> def pascal(n):
...     line = [1]
...     for k in range(n):
...         line.append(line[k] * (n-k) // (k+1))
...     return line
...
>>> pascal(9)
[1, 9, 36, 84, 126, 126, 84, 36, 9, 1]
This uses the following identity:
C(n,k+1) = C(n,k) * (n-k) / (k+1)
So you can start with C(n,0) = 1 and then calculate the rest of the line using this identity, each time multiplying the previous element by (n-k) / (k+1).
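For completeness, the identity follows directly from the factorial formula; the short derivation below uses the same C(n,k) notation.

```latex
\[
\frac{C(n,k+1)}{C(n,k)}
  = \frac{n!}{(k+1)!\,(n-k-1)!} \cdot \frac{k!\,(n-k)!}{n!}
  = \frac{n-k}{k+1}
\qquad\Longrightarrow\qquad
C(n,k+1) = C(n,k)\,\frac{n-k}{k+1}
\]
```

Because C(n,k) * (n-k) equals (k+1) * C(n,k+1), multiplying first and dividing afterwards is always exact in integer arithmetic.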
A single row can be calculated as follows:
First compute 1. -> N choose 0
Then N/1 -> N choose 1
Then N*(N-1)/(1*2) -> N choose 2
Then N*(N-1)*(N-2)/(1*2*3) -> N choose 3
Notice that you can compute the next value from the previous value by just multiplying by a single number and then dividing by another number.
This can be done in a single loop. Sample python.
def comb_row(n):
    r = 0
    num = n
    cur = 1
    yield cur
    while r < n:  # row n has n + 1 entries in total
        r += 1
        cur = (cur * num) // r  # exact integer division
        yield cur
        num -= 1
The most efficient approach would be:
#include <vector>
std::vector<int> pascal_row(int n){
    std::vector<int> row(n+1);
    row[0] = 1; //First element is always 1
    for(int i=1; i<n/2+1; i++){ //Progress up, until reaching the middle value
        row[i] = row[i-1] * (n-i+1)/i;
    }
    for(int i=n/2+1; i<=n; i++){ //Copy the inverse of the first part
        row[i] = row[n-i];
    }
    return row;
}
Here is a fast example implemented in Go that calculates from the outer edges of a row and works its way to the middle, assigning two values with a single calculation...
package main
import "fmt"
func calcRow(n int) []int {
    // row always has n + 1 elements
    row := make( []int, n + 1, n + 1 )
    // set the edges
    row[0], row[n] = 1, 1
    // calculate values for the next n-1 columns
    for i := 0; i < int(n / 2) ; i++ {
        x := row[ i ] * (n - i) / (i + 1)
        row[ i + 1 ], row[ n - 1 - i ] = x, x
    }
    return row
}
func main() {
    for n := 0; n < 20; n++ {
        fmt.Printf("n = %d, row = %v\n", n, calcRow( n ))
    }
}
the output for 20 iterations takes about 1/4 millisecond to run...
n = 0, row = [1]
n = 1, row = [1 1]
n = 2, row = [1 2 1]
n = 3, row = [1 3 3 1]
n = 4, row = [1 4 6 4 1]
n = 5, row = [1 5 10 10 5 1]
n = 6, row = [1 6 15 20 15 6 1]
n = 7, row = [1 7 21 35 35 21 7 1]
n = 8, row = [1 8 28 56 70 56 28 8 1]
n = 9, row = [1 9 36 84 126 126 84 36 9 1]
n = 10, row = [1 10 45 120 210 252 210 120 45 10 1]
n = 11, row = [1 11 55 165 330 462 462 330 165 55 11 1]
n = 12, row = [1 12 66 220 495 792 924 792 495 220 66 12 1]
n = 13, row = [1 13 78 286 715 1287 1716 1716 1287 715 286 78 13 1]
n = 14, row = [1 14 91 364 1001 2002 3003 3432 3003 2002 1001 364 91 14 1]
n = 15, row = [1 15 105 455 1365 3003 5005 6435 6435 5005 3003 1365 455 105 15 1]
n = 16, row = [1 16 120 560 1820 4368 8008 11440 12870 11440 8008 4368 1820 560 120 16 1]
n = 17, row = [1 17 136 680 2380 6188 12376 19448 24310 24310 19448 12376 6188 2380 680 136 17 1]
n = 18, row = [1 18 153 816 3060 8568 18564 31824 43758 48620 43758 31824 18564 8568 3060 816 153 18 1]
n = 19, row = [1 19 171 969 3876 11628 27132 50388 75582 92378 92378 75582 50388 27132 11628 3876 969 171 19 1]
An easy way to calculate it is by noticing that the element of the next row can be calculated as a sum of two consecutive elements in the previous row.
[1, 5, 10, 10, 5, 1]
[1, 6, 15, 20, 15, 6, 1]
For example 6 = 5 + 1, 15 = 5 + 10, 1 = 1 + 0 and 20 = 10 + 10. This gives a simple algorithm to calculate the next row from the previous one.
def pascal(n):
    row = [1]
    for x in range(n):
        row = [l + r for l, r in zip(row + [0], [0] + row)]
        # print(row)
    return row
print(pascal(10))
In Scala, I would have done it as simply as this:
def pascal(c: Int, r: Int): Int = c match {
  case 0 => 1
  case `c` if c >= r => 1
  case _ => pascal(c-1, r-1)+pascal(c, r-1)
}
I would call it inside this:
for (row <- 0 to 10) {
  for (col <- 0 to row)
    print(pascal(col, row) + " ")
  println()
}
resulting in:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
1 10 45 120 210 252 210 120 45 10 1
To explain step by step:
Step 1: If the column is the first one (c = 0), we always return 1.
Step 2: Counting from 0, row X has X + 1 columns, so if the column index is greater than or equal to the row index we are at the last column and return 1.
Step 3: Otherwise, we return the sum of two recursive calls: the element in the previous column of the previous row, and the element in the same column of the previous row.
Good Luck.
Let me build upon Shane's excellent work for an R solution. (Thank you, Shane!) His code for generating the triangle:
pascalTriangle <- function(h) {
  lapply(0:h, function(i) choose(i, 0:i))
}
This will allow one to store the triangle as a list. We can then index whatever row desired. But please add 1 when indexing! For example, I'll grab the bottom row:
pt_with_24_rows <- pascalTriangle(24)
row_24 <- pt_with_24_rows[25] # add one
row_24[[1]] # prints the row
So, finally, make-believe I have a Galton Board problem. I have the arbitrary challenge of finding out what percentage of beans have clustered in the center: say, bins 10 to 15 (out of 25).
Which turns out to be 0.7704771. All good!
In Ruby, the following code will print out the specific row of Pascal's Triangle that you want:
def row(n)
  pascal = [1]
  if n < 1
    p pascal
    return pascal
  else
    n.times do |num|
      # integer arithmetic keeps every entry exact (no float rounding)
      nextNum = pascal[num] * (n - num) / (num + 1)
      pascal << nextNum
    end
  end
  p pascal
end
Where calling row(0) returns [1] and row(5) returns [1, 5, 10, 10, 5, 1]
Here is another simple way to build Pascal's triangle dynamically using VBA.
Sub pascal()
    Dim book As Excel.Workbook
    Dim sht As Worksheet
    Set book = ThisWorkbook
    Set sht = book.Worksheets("sheet1")
    a = InputBox("Enter the Number", "Fill")
    For i = 1 To a
        For k = 1 To i
            If i >= 2 And k >= 2 Then
                sht.Cells(i, k).Value = sht.Cells(i - 1, k - 1) + sht.Cells(i - 1, k)
            Else
                sht.Cells(i, k).Value = 1
            End If
        Next k
    Next i
End Sub
I used Ti-84 Plus CE
The use of –> in line 6 is the store value button
Forloop syntax is
:For(variable, beginning, end [, increment])
nCr syntax is
:valueA nCr valueB
List indexes start at 1 so that's why i set it to R+1
N= row
R= column
:ClrList L1
:Disp "ROW
:Input N
:For(R,0,N,1)
:N nCr R–>L1(R+1)
:End
:Disp L1
This is the fastest way I can think of to do this in programming (with a TI-84), but if you mean to calculate the row using pen and paper then just draw out the triangle, because doing factorials is a pain!
Here's an O(n) space-complexity solution in Python:
def generate_pascal_nth_row(n):
    result = [1] * n  # the n-th row (1-based) has n entries
    for i in range(n):
        previous_res = result.copy()
        for j in range(1, i):
            result[j] = previous_res[j-1] + previous_res[j]
    return result
class Solution{
    int comb(int n,int r){
        long long c=1;
        for(int i=1;i<=r;i++) { //builds C(n,r) one factor at a time
            c=((c*n))/i; n--;
        }
        return c;
    }
public:
    vector<int> getRow(int n) {
        vector<int> v;
        for (int i = 0; i <= n; ++i) //row n has n+1 entries
            v.push_back(comb(n, i));
        return v;
    }
};
faster than 100% submissions on leet code https://leetcode.com/submissions/detail/406399031/
The most efficient way to calculate a row in Pascal's triangle is through convolution. First we choose the second row (1,1) to be a kernel, and then to get the next row we only need to convolve the current row with the kernel.
So convolution of the kernel with second row gives third row [1 1]*[1 1] = [1 2 1], convolution with the third row gives fourth [1 2 1]*[1 1] = [1 3 3 1] and so on
This is a function in Julia (very similar to MATLAB):
function binomRow(n::Int64)
    baseVector = [1] #the first row is equal to 1.
    kernel = [1,1] #This is the second row and a kernel.
    row = zeros(n)
    for i = 1 : n
        row = baseVector
        baseVector = conv(baseVector, kernel) #convolution with kernel
    end
    return row::Array{Int64,1}
end
To find nth row -
int res[] = new int[n+1];
res[0] = 1;
for(int i = 1; i <= n; i++)
    for(int j = i; j > 0; j--) // go right to left so res[j-1] is still the old value
        res[j] += res[j-1];

Code-golf: generate pascal's triangle

Generate a list of lists (or print, I don't mind) a Pascal's Triangle of size N with the least lines of code possible!
Here goes my attempt (118 characters in python 2.6 using a trick):
p=lambda n:[len(c()[k])and map(sum,zip(z+c()[k][-1],c()[k][-1]+z))or[1]for _ in range(n)]
the first element of the list comprehension (when the length is 0) is [1]
the next elements are obtained the following way:
take the previous list and make two lists, one padded with a 0 at the beginning and the other at the end.
e.g. for the 2nd step, we take [1] and make [0,1] and [1,0]
sum the two new lists element by element
e.g. we make a new list [(0,1),(1,0)] and map with sum.
repeat n times and that's all.
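The steps listed above can be written out without the golfing tricks. The following is a readable Python 3 sketch of the same procedure, not the golfed answer itself; the name pascal_triangle is made up for illustration.

```python
def pascal_triangle(n):
    """Return the first n rows of Pascal's triangle as a list of lists."""
    rows = []
    for _ in range(n):
        if not rows:
            rows.append([1])  # the first row is always [1]
        else:
            prev = rows[-1]
            # pad with a 0 at the beginning and at the end, then add pairwise
            rows.append([a + b for a, b in zip([0] + prev, prev + [0])])
    return rows


print(pascal_triangle(4))  # [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1]]
```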
usage (with pretty printing, actually out of the code-golf xD):
result = p(10)
lines = [" ".join(map(str, x)) for x in result]
for i in lines:
print i.center(max(map(len, lines)))
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
K (Wikipedia), 15 characters:
Example output:
p 10
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
1 10 45 120 210 252 210 120 45 10 1)
It's also easily explained:
p:{x {+':x,0} \ 1}
^ ^------^ ^ ^
p is a function taking an implicit parameter x.
p unfolds (C) an anonymous function (B) x times (A) starting at 1 (D).
The anonymous function simply takes a list x, appends 0 and returns a result by adding (+) each adjacent pair (':) of values: so e.g. starting with (1 2 1), it'll produce (1 2 1 0), add pairs (1 1+2 2+1 1+0), giving (1 3 3 1).
Update: Adapted to K4, which shaves off another two characters. For reference, here's the original K3 version:
J, another language in the APL family, 9 characters:
This uses J's builtin "combinations" verb.
p 10
1 1 1 1 1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9
0 0 1 3 6 10 15 21 28 36
0 0 0 1 4 10 20 35 56 84
0 0 0 0 1 5 15 35 70 126
0 0 0 0 0 1 6 21 56 126
0 0 0 0 0 0 1 7 28 84
0 0 0 0 0 0 0 1 8 36
0 0 0 0 0 0 0 0 1 9
0 0 0 0 0 0 0 0 0 1
Haskell, 58 characters:
r 0=[1]
r(n+1)=zipWith(+)(0:r n)$r n++[0]
p n=map r[0..n]
*Main> p 5
More readable:
-- # row 0 is just [1]
row 0 = [1]
-- # row (n+1) is calculated from the previous row
row (n+1) = zipWith (+) ([0] ++ row n) (row n ++ [0])
-- # use that for a list of the first n+1 rows
pascal n = map row [0..n]
69C in C:
Use it like so:
int main()
#define N 10
int i, j;
int t[N*N] = {N};
for (i = 0; i < N; i++)
for (j = 0; j <= i; j++)
printf("%d ", t[i*N + j]);
return 0;
F#: 81 chars
let f=bigint.Factorial
let p x=[for n in 0I..x->[for k in 0I..n->f n/f k/f(n-k)]]
Explanation: I'm too lazy to be as clever as the Haskell and K programmers, so I took the straightforward route: each element in Pascal's triangle can be uniquely identified using a row n and col k, where the value of each element is n!/(k! (n-k)!).
Python: 75 characters
def G(n):R=[[1]];exec"R+=[map(sum,zip(R[-1]+[0],[0]+R[-1]))];"*~-n;return R
Shorter prolog version (112 instead of 164):
n([H,I|T],[A|B]):-n([I|T],B),A is H+I.
p(N,[R,S|T]):-O is N-1,p(O,[S|T]),n([0|S],R).
another stab (python):
def pascals_triangle(n):
for i in range(n-1):
return x
Haskell, 164C with formatting:
i l=zipWith(+)(0:l)$l++[0]
fp=map (concatMap$(' ':).show)f$iterate i[1]
c n l=if(length l<n)then c n$' ':l++" "else l
cl l=map(c(length$last l))l
pt n=cl$take n fp
Without formatting, 52C:
i l=zipWith(+)(0:l)$l++[0]
pt n=take n$iterate i[1]
A more readable form of it:
iterateStep row = zipWith (+) (0:row) (row++[0])
pascalsTriangle n = take n $ iterate iterateStep [1]
-- For the formatted version, we reduce the number of rows at the final step:
formatRow r = concatMap (\l -> ' ':(show l)) r
formattedLines = map formatRow $ iterate iterateStep [1]
centerTo width line =
if length line < width
then centerTo width (" " ++ line ++ " ")
else line
centerLines lines = map (centerTo (length $ last lines)) lines
pascalsTriangle n = centerLines $ take n formattedLines
And perl, 111C, no centering:
$n=<>;$p=' 1 ';for(1..$n){print"$p\n";$x=" ";while($p=~s/^(?= ?\d)(\d* ?)(\d* ?)/$2/){$x.=($1+$2)." ";}$p=$x;}
Scheme — compressed version of 100 characters
(define(P h)(define(l i r)(if(> i h)'()(cons r(l(1+ i)(map +(cons 0 r)(append r '(0))))))(l 1 '(1)))
This is it in a more readable form (269 characters):
(define (pascal height)
  (define (next-row row)
    (map +
         (cons 0 row)
         (append row '(0))))
  (define (iter i row)
    (if (> i height)
        '()
        (cons row
              (iter (1+ i)
                    (next-row row)))))
  (iter 1 '(1)))
VBA/VB6 (392 chars w/ formatting)
Public Function PascalsTriangle(ByVal pRows As Integer)
    Dim iRow As Integer
    Dim iCol As Integer
    Dim lValue As Long
    Dim sLine As String
    For iRow = 1 To pRows
        sLine = ""
        For iCol = 1 To iRow
            If iCol = 1 Then
                lValue = 1
            Else
                lValue = lValue * (iRow - iCol + 1) / (iCol - 1)
            End If
            sLine = sLine & " " & lValue
        Next
        Debug.Print sLine
    Next
End Function
PHP 100 characters
$v[]=1;while($a<34){echo join(" ",$v)."\n";$a++;for($k=0;$k<=$a;$k++)$t[$k]=$v[$k-1]+$v[$k];$v=$t;}
Ruby, 83c:
def p(n);n>0?(m=p(n-1);k=m.last;m+[([0]+k).zip(k+[0]).map{|x|x[0]+x[1]}]):[[1]];end
irb(main):001:0> def p(n);n>0?(m=p(n-1);k=m.last;m+[([0]+k).zip(k+[0]).map{|x|x[0]+x[1]}]):[[1]];end
=> nil
irb(main):002:0> p(5)
=> [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1], [1, 5, 10, 10, 5, 1]]
Another python solution, that could be much shorter if the builtin functions had shorter names... 106 characters.
from itertools import*
r=range
p=lambda n:[[len(list(combinations(r(i),j)))for j in r(i+1)]for i in r(n)]
Another try, in prolog (I'm practising xD), not too short, just 164c:
s([H|T],[J|U],[K|V]):-s(T,U,V),K is H+J.
l(P,N):-M is N-1,l(A,M),append(A,[0],B),s(B,[0|A],P).
p([H|T],N):-M is N-1,l(H,N),p(T,M).
s = sum lists element by element
l = the Nth row of the triangle
p = the whole triangle of size N
VBA, 122 chars:
Sub p(n)
For r = 1 To n
l = "1"
v = 1
For c = 1 To r - 1
v = v / c * (r - c)
l = l & " " & v
Next
Debug.Print l
Next
End Sub
I wrote this C++ version a few years ago:
#include <iostream>
int main(int,char**a){for(int b=0,c=0,d=0,e=0,f=0,g=0,h=0,i=0;b<atoi(a[1]);(d|f|h)>1?e*=d>1?--d:1,g*=f>1?--f:1,i*=h>1?--h:1:((std::cout<<(i*g?e/(i*g):1)<<" "?d=b+=c++==b?c=0,std::cout<<std::endl?1:0:0,h=d-(f=c):0),e=d,g=f,i=h));}
The following is just a Scala function returning a List[List[Int]]. No pretty printing or anything. Any suggested improvements? (I know it's inefficient, but that's not the main challenge now, is it?). 145 C.
def p(n: Int)={def h(n:Int):List[Int]=n match{case 1=>1::Nil;case _=>(0::h(n-1) zipAll(h(n-1),0,0)).map{n=>n._1+n._2}};(1 to n).toList.map(h(_))}
Or perhaps:
def pascal(n: Int) = {
  def helper(n: Int): List[Int] = n match {
    case 1 => 1 :: List()
    case _ => (0 :: helper(n-1) zipAll (helper(n-1),0,0)).map{ n => n._1 + n._2 }
  }
  (1 to n).toList.map(helper(_))
}
(I'm a Scala noob, so please be nice to me :D )
a Perl version (139 chars w/o shebang)
@p = (1,1);
while ($#p < 20) {
    @q =();
    $z = 0;
    push @p, 0;
    foreach (@p) {
        push @q, $_+$z;
        $z = $_
    }
    @p = @q;
    print "@p\n";
}
output starts from 1 2 1
PHP, 115 chars
If you don't care whether print_r() displays the output array in the correct order, you can shave it to 113 chars like
Perl, 63 characters:
My attempt in C++ (378c). Not anywhere near as good as the rest of the posts.. but I'm proud of myself for coming up with a solution on my own =)
#include <iostream>
int* pt(int n)
{
    int s=n*n; // rows are stored with stride n, so n*n slots are needed
    int* t=new int[s];
    for(int i=0;i<n;++i)
        for(int j=0;j<=i;++j)
            t[i*n+j] = (!j || j==i) ? 1 : t[(i-1)*n+(j-1)] + t[(i-1)*n+j];
    return t;
}
int main()
{
    int n,*t;
    std::cin>>n;
    t=pt(n);
    for(int i=0;i<n;++i)
    {
        for(int j=0;j<=i;j++)
            std::cout<<t[i*n+j]<<' ';
        std::cout<<'\n';
    }
}
Old thread, but I wrote this in response to a challenge on another forum today:
def pascals_triangle(n):
    x=[[1]]
    for i in range(n-1):
        x.append([sum(i) for i in zip([0]+x[-1],x[-1]+[0])])
    return x
for x in pascals_triangle(5):
    print(x)
[1]
[1, 1]
[1, 2, 1]
[1, 3, 3, 1]
[1, 4, 6, 4, 1]
