Making parallel prime sieve with shared memory faster - algorithm

I have a prime sieve whose sequential version runs great. I finally figured out how to make the inner loop run in parallel, but (as I feared, based on prior experience with other languages) the single-threaded version is still faster.
Can this parallel version in Rust be optimized?
extern crate crossbeam;

fn main() {
    let residues = [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67,
                    71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 121, 127, 131, 137, 139,
                    143, 149, 151, 157, 163, 167, 169, 173, 179, 181, 187, 191, 193, 197, 199, 209, 211];
    let val = 1_000_000;
    let md = 210;
    let rescnt = 48;

    println!("val = {}, mod = {}, rescnt = {}", val, md, rescnt);

    let mut posn = [0; 210];
    for i in 1..rescnt {posn[residues[i]] = i - 1;}
    posn[1] = rescnt - 1;

    let mut modk; let mut r; let mut k;

    let num = val - 1 | 1;
    k = num / md; modk = md * k; r = 1;
    while num >= modk + residues[r] {r += 1;}
    let maxpcs = k * rescnt + r - 1;

    // must be mutable: the sieve below marks nonprime positions in it
    let mut prms: Vec<u8> = vec![0; maxpcs];

    println!("num = {}, k = {}, modk = {}, maxpcs = {}", num, k, modk, maxpcs);

    let sqrt_n = (num as f32).sqrt() as usize;
    modk = 0; r = 0; k = 0;

    // sieve to identify/eliminate nonprimes/locations in prms array
    for i in 0..maxpcs {
        r += 1; if r > rescnt {r = 1; modk += md; k += 1;};
        if prms[i] == 1 {continue;}
        let prm_r = residues[r];
        let prime = modk + prm_r;
        if prime > sqrt_n {break;}
        let prmstep = prime * rescnt;
        for ri in &residues[1..rescnt + 1] {
            let prms = &mut prms;
            crossbeam::scope(|scope| {
                scope.spawn(move || {
                    let prod = prm_r * ri;
                    let mut np = (k * (prime + ri) + (prod - 2) / md) * rescnt + posn[prod % md];
                    while np < maxpcs {prms[np] = 1; np += prmstep;}
                });
            });
        }
    }

    // the prms array now has all the positions for primes r1..N
    // count the primes remaining in prms
    let mut prmcnt = 4;
    modk = 0; r = 0;
    for i in 0..maxpcs {
        r += 1; if r > rescnt {r = 1; modk += md;};
        if prms[i] == 0 {prmcnt += 1;}
    }
    println!("{}", prmcnt);
}
Using Rust 1.6 on Linux.

Related

How to improve my Fibonacci Generation JavaScript

How can I make this Fibonacci function cleaner and possibly improve its performance?
function fibonacci(n) {
    var array = [];
    if (n === 1) {
        array.push(0);
        return array;
    } else if (n === 2) {
        array.push(0, 1);
        return array;
    } else {
        array = [0, 1];
        for (var i = 2; i < n; i++) {
            var sum = array[array.length - 2] + array[array.length - 1];
            array.push(sum);
        }
        return array;
    }
}
If you want to optimize your Fibonacci function, pre-calculate a bunch of values (say, up to 64 or even higher depending on your use case) and keep those pre-calculated values in a constant array that your function can use.
const precalcFibonacci = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170, 1836311903, 2971215073, 4807526976, 7778742049, 12586269025, 20365011074, 32951280099, 53316291173, 86267571272, 139583862445, 225851433717, 365435296162, 591286729879, 956722026041, 1548008755920, 2504730781961, 4052739537881, 6557470319842];
function fibonacci(n) {
    if (n <= 0) return [];
    if (n < 65) return precalcFibonacci.slice(0, n);
    else {
        let array = precalcFibonacci.slice();
        for (let i = 64, a = precalcFibonacci[62], b = precalcFibonacci[63]; i < n; i++) {
            array[i] = a + b;
            a = b;
            b = array[i];
        }
        return array;
    }
}
There is a way to get the N-th Fibonacci number in O(log N) time.
All you need is to raise the matrix
| 0 1 |
| 1 1 |
to the power N using binary matrix exponentiation.
This is really useful for very big N, where a traditional linear algorithm would be slow. A minimal sketch follows the links below.
links to materials:
https://kukuruku.co/post/the-nth-fibonacci-number-in-olog-n/
https://www.youtube.com/watch?v=eMXNWcbw75E
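Not from the original answer, but a minimal Python sketch of the idea: binary (square-and-multiply) exponentiation of the 2x2 matrix above. Python's built-in big integers keep the values exact.

def mat_mul(A, B):
    # multiply two 2x2 matrices
    return [[A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]],
            [A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]]]

def mat_pow(M, n):
    # binary exponentiation: O(log n) matrix multiplications
    result = [[1, 0], [0, 1]]  # identity
    while n > 0:
        if n & 1:
            result = mat_mul(result, M)
        M = mat_mul(M, M)
        n >>= 1
    return result

def fib(n):
    # the (0, 1) entry of [[0, 1], [1, 1]]^n is F(n)
    if n == 0:
        return 0
    return mat_pow([[0, 1], [1, 1]], n)[0][1]

print(fib(10))   # 55
print(fib(100))  # 354224848179261915075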

Given a list of integers, find the sets of numbers whose sum is >= a target number, minimising the total amount by which each set goes over the target

So somehow a relative of mine got these restaurant vouchers that can be used to deduct 500 baht off of each receipt. It is possible to ask the restaurant to issue multiple receipts so that multiple vouchers can be used. The relative wishes to spend as little cash as possible (anything over the voucher value of 500 will have to be paid in cash)
More formally, the question is:
given a list of prices (being the prices of items to be ordered), what are the combinations of prices that would require the least amount of cash to be paid?
For example:
let prices = [425, 105, 185, 185, 185, 98, 145, 155, 125, 125, 135, 295, 295, 155, 125]
if I were to just sum the prices in the given order, closing a group as soon as its sum reaches 500 or more:
[425 + 105], [185 + 185 + 185], [98 + 145 + 155 + 125], [125 + 135 + 295], [295 + 155 + 125]
Sums of each combination:
[530, 555, 523, 555, 575]
Amount over 500:
[30, 55, 23, 55, 75]
Total cash to pay: 238
What is the best combination of prices that would require the least amount of cash?
So far I have attempted a brute-force approach by generating all permutations of the prices and calculating the required cash amount for each permutation in the same fashion as the example above (summing from left to right, closing a group when its sum reaches 500 or more). However, this approach results in a heap out-of-memory error when there are more than 10 prices :(
Thanks in advance.
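For reference, here is a minimal Python sketch (my addition, not part of the original question) of the left-to-right grouping described above; it only reproduces the 238-baht figure for the given ordering and is not an optimiser. Prices left over in an unfinished final group are simply ignored here, which does not happen for this input.

def cash_for_order(prices, voucher=500):
    # Walk the prices in the given order, closing a group as soon as
    # its sum reaches the voucher value, as in the example above.
    groups, current = [], []
    for p in prices:
        current.append(p)
        if sum(current) >= voucher:
            groups.append(current)
            current = []
    # cash to pay = how far each closed group goes over the voucher value
    overs = [sum(g) - voucher for g in groups]
    return sum(overs), groups, overs

prices = [425, 105, 185, 185, 185, 98, 145, 155, 125, 125, 135, 295, 295, 155, 125]
cash, groups, overs = cash_for_order(prices)
print(overs)   # [30, 55, 23, 55, 75]
print(cash)    # 238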
I got 238 as well, with backtracking.
The program below tries all available new combinations of the state:
(index, sum_a, count_a, sum_b, count_b, ...)
where index is the index in prices, and each sum_x/count_x pair records a distinct bin sum and how many bins currently have that sum.
It returns the recorded value for that state if the state was previously seen, and it avoids searching any branch known to lead to a result greater than or equal to the smallest result recorded so far. It recurses using a single object to store the state in memory and updates it as it backtracks.
There's also a tweakable bound on how large a bin is allowed to be (the line if (new_sum >= 1.2 * 500)).
function getPrefixSums(A){
let pfxs = new Array(A.length);
pfxs[-1] = 0;
A.map((x, i) => pfxs[i] = A[i] + pfxs[i-1]);
return pfxs;
}
function add(sum, new_sum, dp){
if (dp.state[sum] == 1)
delete dp.state[sum];
else if (dp.state.hasOwnProperty(sum))
dp.state[sum] -= 1;
if (dp.state.hasOwnProperty(new_sum))
dp.state[new_sum] += 1;
else
dp.state[new_sum] = 1;
if (new_sum > 500)
dp.total += new_sum - 500;
else if (new_sum < 500);
dp.remaining -= new_sum - sum;
}
function remove(sum, new_sum, dp){
if (sum > 0 && !dp.state.hasOwnProperty(sum))
dp.state[sum] = 1;
else if (sum > 0)
dp.state[sum] += 1;
if (dp.state[new_sum] == 1)
delete dp.state[new_sum];
else if (dp.state.hasOwnProperty(new_sum))
dp.state[new_sum] -= 1;
if (new_sum > 500)
dp.total -= new_sum - 500;
else if (new_sum < 500);
dp.remaining += new_sum - sum;
}
function g(prices, pfxs, i, dp, memo){
const sorted = Object.entries(dp.state).sort(([sum1, count1], [sum2, count2]) =>
Number(sum1) - Number(sum2));
const key = String([i, sorted]);
if (memo.hasOwnProperty(key))
return memo[key];
if (dp.total >= dp.best)
return memo[key] = Infinity;
if (i == prices.length){
if (Object.keys(dp.state).some(x => Number(x) < 500))
return memo[key] = Infinity;
dp.best = Math.min(dp.best, dp.total);
return memo[key] = dp.total;
}
let best = Infinity;
let bestSum = -1;
// Add bin
if (pfxs[pfxs.length-1] - pfxs[i-1] >= 500 + dp.remaining){
dp.remaining += 500;
add(0, prices[i], dp);
const candidate = g(prices, pfxs, i+1, dp, memo);
best = candidate;
bestSum = 0;
dp.remaining -= 500;
remove(0, prices[i], dp);
}
// Add to existing bin;
for (let sum in dp.state){
const new_sum = Number(sum) + prices[i];
if (new_sum >= 1.2 * 500)
continue;
add(sum, new_sum, dp);
const candidate = g(prices, pfxs, i+1, dp, memo);
if (candidate < best){
best = candidate;
bestSum = sum;
}
remove(sum, new_sum, dp);
}
return memo[key] = best;
}
function backtrack(prices, total, memo){
let m = [];
for (let i in memo)
if (memo[i] == total)
m.push(i);
m = m.map(x => x.split(',').map(Number));
m.sort((a, b) => a[0] - b[0]);
function validate(added, removed){
return added.length == 1 &&
removed.length < 2 &&
!added.some(([sum, count]) => count > 1) &&
!removed.some(([sum, count]) => count > 1);
}
function back(i, prev_idx, dp){
if (i == m.length)
return dp;
const [idx, ...cts] = m[i];
const _dp = cts.reduce(function(acc, x, i){
if (!(i & 1))
acc[x] = cts[i+1];
return acc;
}, {});
if (idx == prev_idx)
return back(i + 1, prev_idx, dp);
let added = [];
let removed = [];
for (let sum in _dp){
if (!dp.hasOwnProperty(sum))
added.push([sum, _dp[sum]]);
else if (dp[sum] > _dp[sum])
removed.push([sum, dp[sum].count - _dp[sum]]);
}
for (let sum in dp){
if (!_dp.hasOwnProperty(sum))
removed.push([sum, dp[sum]]);
}
if (!validate(added, removed))
return back(i + 1, prev_idx, dp);
const [[new_sum, _]] = added;
let old_bin = [];
if (removed.length){
const [[old_sum, _]] = removed;
const len = dp[old_sum].bins.length;
old_bin = dp[old_sum].bins[len - 1];
if (dp[old_sum].count == 1){
delete dp[old_sum];
} else {
dp[old_sum].count -= 1;
dp[old_sum].bins.length = len - 1;
}
}
if (dp[new_sum]){
dp[new_sum].count += 1;
dp[new_sum].bins.push(old_bin.concat(prices[idx-1]));
} else {
dp[new_sum] = {
count: 1,
bins: [old_bin.concat(prices[idx-1])]
}
}
return back(i + 1, idx, dp);
}
function get_dp(row){
const [idx, ...cts] = row;
return cts.reduce(function(acc, x, i){
if (!(i & 1)){
acc[x] = {
count: cts[i+1],
bins: new Array(cts[i+1]).fill(null).map(_ => [x])
};
}
return acc;
}, {});
}
const dp = get_dp(m[1]);
return back(2, 1, dp);
}
function f(prices){
const pfxs = getPrefixSums(prices);
const dp = {
state: {'0': 1},
total: 0,
remaining: 0,
best: Infinity
};
const memo = {};
const result = g(prices, pfxs, 0, dp, memo);
const _dp = backtrack(prices, result, memo);
const bins = Object.values(_dp).flatMap(x => x.bins);
return [result, bins];
}
var prices = [425, 105, 185, 185, 185, 98, 145, 155, 125, 125, 135, 295, 295, 155, 125];
console.log(JSON.stringify(prices));
console.log(JSON.stringify(f(prices)));

Sufficient algorithm for swapping elements to meet a specific condition

You are given two positive numbers N and K. Find the smallest number of swaps of any two digits of N that makes two new numbers A and B whose difference (A - B) equals K; if there is more than one solution, use the solution with the biggest A.
For example:
We have N = 9834216 and K = 8826; we swap 16 to form the new number 9168342, giving A = 9168 and B = 342, and A - B = 9168 - 342 = 8826.
let N = 9834216, K = 8826;
let count = 0;

function fn(nArr, k, n) {
    if (n == 0) {
        let temp = -k;
        if (temp <= 0) {
            return;
        }
        let tArr = [...nArr];
        while (temp > 0) {
            let temp1 = temp % 10;
            let tId = tArr.indexOf(temp1);
            if (tId < 0) {
                return;
            }
            tArr.splice(tId, 1);
            temp = Math.floor(temp / 10);
        }
        if (tArr.length == 0) {
            console.log((K - k) + "-" + (-k) + "=" + K);
            count++;
        }
    } else {
        for (let i in nArr) {
            let tArr = [...nArr];
            tArr.splice(i, 1);
            fn(tArr, k - Math.pow(10, n - 1) * nArr[i], n - 1);
        }
    }
}

function getAB(N, K) {
    let nArr = [],
        n = N;
    while (n > 0) {
        nArr.push(n % 10);
        n = Math.floor(n / 10);
    }
    nArr.sort();
    nArr.reverse();
    for (let i = nArr.length; i > nArr.length / 2; i--) {
        fn(nArr, K, i);
    }
}

getAB(N, K);
console.log(count + " solutions");

Sum divisible by n

This is a problem from an Introduction to Algorithms course:
You have an array with n random positive integers (the array doesn't need to be sorted or the elements unique). Suggest an O(n) algorithm to find the largest sum of elements that is divisible by n.
It's relatively easy to find it in O(n²) using dynamic programming, storing the largest sum with remainder 0, 1, 2, ..., n - 1. This is the JavaScript code:
function sum_mod_n(a)
{
    var n = a.length;
    var b = new Array(n);
    b.fill(-1);
    for (var i = 0; i < n; i++)
    {
        var u = a[i] % n;
        var c = b.slice();
        for (var j = 0; j < n; j++) if (b[j] > -1)
        {
            var v = (u + j) % n;
            if (b[j] + a[i] > b[v]) c[v] = b[j] + a[i];
        }
        if (c[u] == -1) c[u] = a[i];
        b = c;
    }
    return b[0];
}
It's also easy to find it in O(n) for contiguous elements, storing partial sums MOD n. Another sample:
function cont_mod_n(a)
{
    var n = a.length;
    var b = new Array(n);
    b.fill(-1);
    b[0] = 0;
    var m = 0, s = 0;
    for (var i = 0; i < n; i++)
    {
        s += a[i];
        var u = s % n;
        if (b[u] == -1) b[u] = s;
        else if (s - b[u] > m) m = s - b[u];
    }
    return m;
}
But how about O(n) in the general case? Any suggestions will be appreciated! I suspect this has something to do with linear algebra, but I'm not sure what exactly.
EDIT: Can this actually be done in O(n log n)?
Since you don't specify what random means (uniform? if so, in what interval?), the only general solution is the one for arbitrary arrays, and I don't think you can get any better than O(n²). This is the dynamic programming algorithm in Python:
def sum_div(positive_integers):
    n = len(positive_integers)
    # initialise the dynamic programming state
    # the index runs over all possible remainders mod n
    # the DP values keep track of the maximum sum you can have for that remainder
    DP = [0] * n
    for positive_integer in positive_integers:
        for remainder, max_sum in list(enumerate(DP)):
            max_sum_next = max_sum + positive_integer
            remainder_next = max_sum_next % n
            if max_sum_next > DP[remainder_next]:
                DP[remainder_next] = max_sum_next
    return DP[0]
You can probably work out a faster solution if you have an upper limit for the values in the array, e.g. n.
Very interesting question!
This is my JS code. I don't think that O(n²) can be lowered, so I suppose the way forward is to find an algorithm that is more efficient in terms of benchmarking.
My (corrected) approach boils down to exploring paths of sums until the next matching one (i.e. divisible by _n) is found; the source array progressively shrinks as matching sums are removed.
(I provided different examples at the top.)
var _a = [1000, 1000, 1000, 1000, 1000, 1000, 99, 10, 9] ;
//var _a = [1000, 1000, 1000, 1000, 1000, 1000, 99, 10, 9, 11] ;
//var _a = [1, 6, 6, 6, 6, 6, 49] ;
//var _a = [ -1, 1, 2, 4 ] ;
//var _a = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ] ;
//var _a = [1,1,1,1,1,1] ;

var _n = _a.length, _del_indexes = [] ;
var _rec = 0, _sum = 0, _start = 0, _test = 0 ;

console.log( "input array : ", _a );
console.log( "cardinality : ", _a.length );

while( _start < _a.length )
{
    _test = 0 ;
    for( var _i = _start ; _i < _a.length ; _i++ )
    {
        _sum += _a[_i%_n] ;
        _del_indexes.push( _a[_i%_n] );
        if ( ( _sum % _n ) == 0 )
        {
            _rec = _sum ;
            _test = 1 ;
            break ;
        }
    }
    if ( _test )
    {
        for( var _d = 0 ; _d < _del_indexes.length ; _d++ ) _a.splice( _a.indexOf( _del_indexes[_d] ), 1 ) ;
        _start = 0 ;
    }
    else _start++ ;
    _del_indexes = [] ;
    _sum = _rec ;
}

console.log( "Largest sum % " + _n + " is : ", _rec == 0 ? "none" : _rec );

Fast solution to Subset sum algorithm by Pisinger

This is a follow-up to my previous question. I still find it a very interesting problem, and as there is one algorithm which deserves more attention I'm posting it here.
From Wikipedia: for the case that each x_i is positive and bounded by the same constant, Pisinger found a linear-time algorithm.
There is a different paper which seems to describe the same algorithm, but it is a bit difficult for me to read, so please: does anyone know how to translate the pseudo-code from page 4 (balsub) into a working implementation?
Here are a couple of pointers I have collected so far:
http://www.diku.dk/~pisinger/95-6.ps (the paper)
https://stackoverflow.com/a/9952759/1037407
http://www.diku.dk/hjemmesider/ansatte/pisinger/codes.html
PS: I don't really insist on precisely this algorithm, so if you know of any other similarly performant algorithm please feel free to suggest it below.
Edit
This is a Python version of the code posted below by oldboy:
class view(object):
    def __init__(self, sequence, start):
        self.sequence, self.start = sequence, start
    def __getitem__(self, index):
        return self.sequence[index + self.start]
    def __setitem__(self, index, value):
        self.sequence[index + self.start] = value

def balsub(w, c):
    '''A balanced algorithm for Subset-sum problem by David Pisinger
    w = weights, c = capacity of the knapsack'''
    n = len(w)
    assert n > 0
    sum_w = 0
    r = 0
    for wj in w:
        assert wj > 0
        sum_w += wj
        assert wj <= c
        r = max(r, wj)
    assert sum_w > c
    b = 0
    w_bar = 0
    while w_bar + w[b] <= c:
        w_bar += w[b]
        b += 1
    s = [[0] * 2 * r for i in range(n - b + 1)]
    s_b_1 = view(s[0], r - 1)
    for mu in range(-r + 1, 1):
        s_b_1[mu] = -1
    for mu in range(1, r + 1):
        s_b_1[mu] = 0
    s_b_1[w_bar - c] = b
    for t in range(b, n):
        s_t_1 = view(s[t - b], r - 1)
        s_t = view(s[t - b + 1], r - 1)
        for mu in range(-r + 1, r + 1):
            s_t[mu] = s_t_1[mu]
        for mu in range(-r + 1, 1):
            mu_prime = mu + w[t]
            s_t[mu_prime] = max(s_t[mu_prime], s_t_1[mu])
        for mu in range(w[t], 0, -1):
            for j in range(s_t[mu] - 1, s_t_1[mu] - 1, -1):
                mu_prime = mu - w[j]
                s_t[mu_prime] = max(s_t[mu_prime], j)
    solved = False
    z = 0
    s_n_1 = view(s[n - b], r - 1)
    while z >= -r + 1:
        if s_n_1[z] >= 0:
            solved = True
            break
        z -= 1
    if solved:
        print c + z
        print n
        x = [False] * n
        for j in range(0, b):
            x[j] = True
        for t in range(n - 1, b - 1, -1):
            s_t = view(s[t - b + 1], r - 1)
            s_t_1 = view(s[t - b], r - 1)
            while True:
                j = s_t[z]
                assert j >= 0
                z_unprime = z + w[j]
                if z_unprime > r or j >= s_t[z_unprime]:
                    break
                z = z_unprime
                x[j] = False
            z_unprime = z - w[t]
            if z_unprime >= -r + 1 and s_t_1[z_unprime] >= s_t[z]:
                z = z_unprime
                x[t] = True
        for j in range(n):
            print x[j], w[j]
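A small usage sketch (my addition, not from the original post): the weights and capacity below are made-up test values, and the output format follows the Input/Output description in the C++ comments further down.

# made-up test instance: weights w and knapsack capacity c
# (balsub asserts every weight is positive and <= c, and that the
#  total weight exceeds c)
w = [4, 6, 9, 3, 1]
c = 15
balsub(w, c)
# prints the best achievable sum <= c, then n, then one True/False
# inclusion flag per weight alongside the weight itself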
// Input:
// c (capacity of the knapsack)
// n (number of items)
// w_1 (weight of item 1)
// ...
// w_n (weight of item n)
//
// Output:
// z (optimal solution)
// n
// x_1 (indicator for item 1)
// ...
// x_n (indicator for item n)
#include <algorithm>
#include <cassert>
#include <iostream>
#include <vector>

using namespace std;

int main() {
    int c = 0;
    cin >> c;
    int n = 0;
    cin >> n;
    assert(n > 0);
    vector<int> w(n);
    int sum_w = 0;
    int r = 0;
    for (int j = 0; j < n; ++j) {
        cin >> w[j];
        assert(w[j] > 0);
        sum_w += w[j];
        assert(w[j] <= c);
        r = max(r, w[j]);
    }
    assert(sum_w > c);
    int b;
    int w_bar = 0;
    for (b = 0; w_bar + w[b] <= c; ++b) {
        w_bar += w[b];
    }
    vector<vector<int> > s(n - b + 1, vector<int>(2 * r));
    vector<int>::iterator s_b_1 = s[0].begin() + (r - 1);
    for (int mu = -r + 1; mu <= 0; ++mu) {
        s_b_1[mu] = -1;
    }
    for (int mu = 1; mu <= r; ++mu) {
        s_b_1[mu] = 0;
    }
    s_b_1[w_bar - c] = b;
    for (int t = b; t < n; ++t) {
        vector<int>::const_iterator s_t_1 = s[t - b].begin() + (r - 1);
        vector<int>::iterator s_t = s[t - b + 1].begin() + (r - 1);
        for (int mu = -r + 1; mu <= r; ++mu) {
            s_t[mu] = s_t_1[mu];
        }
        for (int mu = -r + 1; mu <= 0; ++mu) {
            int mu_prime = mu + w[t];
            s_t[mu_prime] = max(s_t[mu_prime], s_t_1[mu]);
        }
        for (int mu = w[t]; mu >= 1; --mu) {
            for (int j = s_t[mu] - 1; j >= s_t_1[mu]; --j) {
                int mu_prime = mu - w[j];
                s_t[mu_prime] = max(s_t[mu_prime], j);
            }
        }
    }
    bool solved = false;
    int z;
    vector<int>::const_iterator s_n_1 = s[n - b].begin() + (r - 1);
    for (z = 0; z >= -r + 1; --z) {
        if (s_n_1[z] >= 0) {
            solved = true;
            break;
        }
    }
    if (solved) {
        cout << c + z << '\n' << n << '\n';
        vector<bool> x(n, false);
        for (int j = 0; j < b; ++j) x[j] = true;
        for (int t = n - 1; t >= b; --t) {
            vector<int>::const_iterator s_t = s[t - b + 1].begin() + (r - 1);
            vector<int>::const_iterator s_t_1 = s[t - b].begin() + (r - 1);
            while (true) {
                int j = s_t[z];
                assert(j >= 0);
                int z_unprime = z + w[j];
                if (z_unprime > r || j >= s_t[z_unprime]) break;
                z = z_unprime;
                x[j] = false;
            }
            int z_unprime = z - w[t];
            if (z_unprime >= -r + 1 && s_t_1[z_unprime] >= s_t[z]) {
                z = z_unprime;
                x[t] = true;
            }
        }
        for (int j = 0; j < n; ++j) {
            cout << x[j] << '\n';
        }
    }
}
Great code, but it sometimes crashes in this code block:
for (mu = w[t]; mu >= 1; --mu)
{
    for (int j = s_t[mu] - 1; j >= s_t_1[mu]; --j)
    {
        if (j >= w.size())
        { // !!! PROBLEM !!!
        }
        int mu_prime = mu - w[j];
        s_t[mu_prime] = max(s_t[mu_prime], j);
    }
}
...
