I am new to Scala and I am trying to count pixels by RGB value in an image (green in particular). The code below counts only the green pixels. My question is: how can I achieve the same thing with map/reduce-style operations in Scala?
....
val lightGreen = new Color(0,255,0)
val darkGreen = new Color(0,100,0)
var ctrGreen = 0
var ctrTotal = 0
for (x <- 0 until w)
  for (y <- 0 until h) {
    val c = new Color(img.getRGB(x, y))
    if (isBetween(c, lightGreen, darkGreen)) {
      ctrGreen += 1
    }
    ctrTotal += 1
  }
println("Green pixel count: " + ctrGreen)
println("Total pixel count: " + ctrTotal)
def isBetween(c: Color, c1: Color, c2: Color): Boolean = {
  c.getRed >= c1.getRed && c.getRed <= c2.getRed &&
  c.getBlue >= c1.getBlue && c.getBlue <= c2.getBlue &&
  c.getGreen <= c1.getGreen && c.getGreen >= c2.getGreen
}
How to count number of green pixels using map reduce algorithm in Scala?
You can rewrite it as follows:
val rangeX = (0 until w)
val rangeY = (0 until h)
val (greenPixels, allPixels) = (for ( x <- rangeX; y <- rangeY ) yield( (isGreen(x, y), 1) )).reduce( (a, b) => (a._1 + b._1, a._2 + b._2) )
Or, with a more functional approach:
val (greenPixels, allPixels) = rangeX.flatMap( x => rangeY.map( y => (isGreen(x, y), 1) ) )
.reduce( (a, b) => (a._1 + b._1, a._2 + b._2) )
where isGreen is:
def isGreen(x: Int, y: Int) =
{
  val c = new Color(img.getRGB(x, y))
  val isGreenAsInt = if( isBetween(c, lightGreen,darkGreen) ) 1 else 0
  isGreenAsInt
}
I was reading the source code of Spark's correlation algorithm and, while going through it, I couldn't understand this particular piece of code.
This is from the file org/apache/spark/mllib/linalg/BLAS.scala:
def spr(alpha: Double, v: Vector, U: Array[Double]): Unit = {
  val n = v.size
  v match {
    case DenseVector(values) =>
      NativeBLAS.dspr("U", n, alpha, values, 1, U)
    case SparseVector(size, indices, values) =>
      val nnz = indices.length
      var colStartIdx = 0
      var prevCol = 0
      var col = 0
      var j = 0
      var i = 0
      var av = 0.0
      while (j < nnz) {
        col = indices(j)
        // Skip empty columns.
        colStartIdx += (col - prevCol) * (col + prevCol + 1) / 2
        av = alpha * values(j)
        i = 0
        while (i <= j) {
          U(colStartIdx + indices(i)) += av * values(i)
          i += 1
        }
        j += 1
        prevCol = col
      }
  }
}
I do not know Scala, which may be why I could not understand it. Can someone explain what is happening here?
It is called from RowMatrix.scala:
def computeGramianMatrix(): Matrix = {
val n = numCols().toInt
checkNumColumns(n)
// Computes n*(n+1)/2, avoiding overflow in the multiplication.
// This succeeds when n <= 65535, which is checked above
val nt = if (n % 2 == 0) ((n / 2) * (n + 1)) else (n * ((n + 1) / 2))
// Compute the upper triangular part of the gram matrix.
val GU = rows.treeAggregate(new BDV[Double](nt))(
seqOp = (U, v) => {
BLAS.spr(1.0, v, U.data)
U
}, combOp = (U1, U2) => U1 += U2)
RowMatrix.triuToFull(n, GU.data)
}
The correlation is defined here:
https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
The final goal is to understand the Spark correlation algorithm.
Update 1: Relevant paper: https://stanford.edu/~rezab/papers/linalg.pdf
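As a rough illustration of what the sparse branch of spr appears to compute, here is a minimal Python sketch (Python rather than Scala, purely for illustration). It assumes U holds the upper triangle of an n x n symmetric matrix packed column by column, so that entry (i, j) with i <= j sits at index j*(j+1)/2 + i. The names spr_sparse and spr_dense_reference are made up for this example and are not part of Spark.

```python
def spr_sparse(alpha, indices, values, U):
    """Add alpha * v * v^T to the packed upper triangle U, for a sparse v.

    indices/values describe the non-zero entries of v (indices ascending).
    Assumption: U is packed column by column, entry (i, j) at j*(j+1)//2 + i.
    """
    col_start = 0
    prev_col = 0
    for j, col in enumerate(indices):
        # Offset of column `col` minus offset of column `prev_col`:
        # col*(col+1)/2 - prev_col*(prev_col+1)/2, written as one product.
        col_start += (col - prev_col) * (col + prev_col + 1) // 2
        av = alpha * values[j]
        # Only rows at or above the diagonal, i.e. non-zeros up to position j.
        for i in range(j + 1):
            U[col_start + indices[i]] += av * values[i]
        prev_col = col
    return U


def spr_dense_reference(alpha, v):
    """Same update computed the slow way from a dense v, for comparison."""
    n = len(v)
    U = [0.0] * (n * (n + 1) // 2)
    for j in range(n):
        for i in range(j + 1):
            U[j * (j + 1) // 2 + i] += alpha * v[i] * v[j]
    return U


# v = (0, 2, 0, 3) has non-zeros at positions 1 and 3.
packed = spr_sparse(1.0, [1, 3], [2.0, 3.0], [0.0] * 10)
reference = spr_dense_reference(1.0, [0.0, 2.0, 0.0, 3.0])
```

Under that packing assumption, packed and reference hold the same values, which suggests the colStartIdx update simply jumps from the start of one non-empty column to the start of the next.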
I tried to use Miller-Rabin together with Pollard's rho (PollardP1_rho in my code) to factorize an integer into primes in Python 3, to reduce the time complexity as much as I could. It fails some tests and I know where the problem is, but I am a novice at algorithms and don't know how to fix it. So I will put all the relevant code here.
import random
def gcd(a, b):
    """
    a, b: integers
    returns: a positive integer, the greatest common divisor of a & b.
    """
    if a == 0:
        return b
    if a < 0:
        return gcd(-a, b)
    while b > 0:
        c = a % b
        a, b = b, c
    return a

def mod_mul(a, b, n):
    # Calculate a * b % n iteratively.
    result = 0
    while b > 0:
        if (b & 1) > 0:
            result = (result + a) % n
        a = (a + a) % n
        b = (b >> 1)
    return result

def mod_exp(a, b, n):
    # Calculate (a ** b) % n iteratively.
    result = 1
    while b > 0:
        if (b & 1) > 0:
            result = mod_mul(result, a, n)
        a = mod_mul(a, a, n)
        b = (b >> 1)
    return result

def MillerRabinPrimeCheck(n):
    if n in {2, 3, 5, 7, 11}:
        return True
    elif (n == 1 or n % 2 == 0 or n % 3 == 0 or n % 5 == 0 or n % 7 == 0 or n % 11 == 0):
        return False
    k = 0
    u = n - 1
    while not (u & 1) > 0:
        k += 1
        u = (u >> 1)
    random.seed(0)
    s = 5 # If the result isn't right, then increase s.
    for i in range(s):
        x = random.randint(2, n - 1)
        if x % n == 0:
            continue
        x = mod_exp(x, u, n)
        pre = x
        for j in range(k):
            x = mod_mul(x, x, n)
            if (x == 1 and pre != 1 and pre != n - 1):
                return False
            pre = x
        if x != 1:
            return False
    return True

def PollardP1_rho(n, c):
    '''
    Consider c as a constant integer.
    '''
    i = 1
    k = 2
    x = random.randrange(1, n - 1) + 1
    y = x
    while 1:
        i += 1
        x = (mod_mul(x, x, n) + c) % n
        d = gcd(y - x, n)
        if 1 < d < n:
            return d
        elif x == y:
            return n
        elif i == k:
            y = x
            k = (k << 1)

result = []
def PrimeFactorsListGenerator(n):
    if n <= 1:
        pass
    elif MillerRabinPrimeCheck(n) == True:
        result.append(n)
    else:
        a = n
        while a == n:
            a = PollardP1_rho(n, random.randrange(1,n - 1) + 1)
        PrimeFactorsListGenerator(a)
        PrimeFactorsListGenerator(n // a)
When I tried to test this:
PrimeFactorsListGenerator(4)
It never stopped; it kept looping on this call:
PollardP1_rho(4, random.randrange(1,4 - 1) + 1)
I have already tested the functions defined before PollardP1_rho and they work normally, so I know that PollardP1_rho cannot handle the number 4 correctly (nor the number 5). How can I fix that?
I have solved it myself.
There was one mistake in the code.
I should not use a variable 'result' outside the function as a global; I should define it inside the function and use result.extend() so that it works through the whole recursion. So I rewrote PollardP1_rho(n, c) and PrimeFactorsListGenerator(n):
def Pollard_rho(x, c):
    '''
    Consider c as a constant integer.
    '''
    i, k = 1, 2
    x0 = random.randint(0, x)
    y = x0
    while 1:
        i += 1
        x0 = (mod_mul(x0, x0, x) + c) % x
        d = gcd(y - x0, x)
        if d != 1 and d != x:
            return d
        if y == x0:
            return x
        if i == k:
            y = x0
            k += k

def PrimeFactorsListGenerator(n):
    result = []
    if n <= 1:
        # No prime factors; returning a list keeps result.extend() safe.
        return result
    if MillerRabinPrimeCheck(n):
        return [n]
    p = n
    while p >= n:
        p = Pollard_rho(p, random.randint(1, n - 1))
    result.extend(PrimeFactorsListGenerator(p))
    result.extend(PrimeFactorsListGenerator(n // p))
    return result
#PrimeFactorsListGenerator(400)
#PrimeFactorsListGenerator(40000)
There is an additional tip: you don't need to write mod_mul(a, b, n) or mod_exp(a, b, n) at all. Python integers have arbitrary precision, so a * b % n cannot overflow, and the built-in pow(a, b, n) performs modular exponentiation and is fully optimized.
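To illustrate the tip, here is a hedged sketch of the same approach using only built-ins: pow(a, d, n) for modular exponentiation and math.gcd for the gcd. The function names (is_probable_prime, pollard_rho, prime_factors), the fixed seed, the number of rounds, and the choice to strip factors of 2 before calling Pollard's rho are all assumptions of this sketch, not part of the code above.

```python
import math
import random


def is_probable_prime(n, rounds=20, rng=None):
    """Miller-Rabin test; composites are rejected with high probability."""
    rng = rng or random.Random(0)
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)          # built-in modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n         # plain multiplication is safe in Python
            if x == n - 1:
                break
        else:
            return False          # a is a witness that n is composite
    return True


def pollard_rho(n, rng):
    """Return a non-trivial factor of an odd composite n (Floyd's cycle)."""
    while True:
        c = rng.randrange(1, n)
        x = y = rng.randrange(0, n)
        d = 1
        while d == 1:
            x = (x * x + c) % n
            y = (y * y + c) % n
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
        if d != n:                # d == n means this (x, c) failed; retry
            return d


def prime_factors(n, rng=None):
    """Sorted list of prime factors of n, with multiplicity."""
    rng = rng or random.Random(0)
    factors = []
    while n > 1 and n % 2 == 0:   # handle 2 separately so rho only sees odd n
        factors.append(2)
        n //= 2
    stack = [n] if n > 1 else []
    while stack:
        m = stack.pop()
        if is_probable_prime(m, rng=rng):
            factors.append(m)
        else:
            d = pollard_rho(m, rng)
            stack.extend([d, m // d])
    return sorted(factors)
```

Stripping the factor 2 first means an input such as 4 never reaches Pollard's rho at all, and retrying with a fresh c and starting point handles the runs where the cycle closes before a factor shows up.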
A1, B1, C1, A2, B2 and C2 are six matrices, all of size 4435x2000.
I have to find the values i, j and k for which A1(k,2000) == A2(i,j), B1(k,2000) == B2(i,j) and C1(k,2000) == C2(i,j), subject to X(k)==1 and Y(i,j)==1.
The objective is to find: counter, L, T and D
Is there a way to make this code faster? Can I avoid loops?
counter=0;
L(1)=0;
T(1)=0;
D(1)=0;
for k=1:4435
    if X(k)==1 % X is a vector (4435x1)
        F(k,:) = [A1(k,2000) B1(k,2000) C1(k,2000)]
        for i=1:4435
            for j=100:1999
                if Y(i,j)==1 % Y is a matrix (4435x1999)
                    if F(k,:) == [A2(i,j) B2(i,j) C2(i,j)]
                        counter = counter+1;
                        L(counter)=k;
                        T(counter)=i;
                        D(counter)=j;
                    end
                end
            end
        end
    end
end
I want a solution that saves at least 80% of the computation time and does not fail with the error message "Out of memory".
See how this works out for you -
%// Store X-Y data by calling X() and Y() functions
X_data = X(1:4435);
Y_data = Y(1:4435,100:1999);
range1 = 100:1999 %// define range for columns
A2 = A2(:,range1); %// Crop out A2, B2, C2 based on column-range
B2 = B2(:,range1);
C2 = C2(:,range1);
Y_data = Y_data==1; %// Y_data was already cropped to columns 100:1999 above
%// Indices for dim-3
idx_X = find(X_data==1)
%// Map X==1 onto A1, B1, C1
A1Lr = A1(X_data==1,end)
B1Lr = B1(X_data==1,end)
C1Lr = C1(X_data==1,end)
%// Setup output array to store L, T, D as single Nx3 output array
out = zeros(sum(Y_data(:))*numel(A1Lr),3);
%// Try out(sum(Y_data(:)==1)*numel(A1Lr),3)=0; instead for speed!
%// Start collecting output indices
count = 1;
for iter1 = 1:numel(A1Lr)
[R,C] = find(Y_data & A2==A1Lr(iter1) & B2==B1Lr(iter1) & C2==C1Lr(iter1));
nR = numel(R);
out(count:count+nR-1,:) = [R C repmat(iter1,nR,1)];
count = count + nR;
end
out(find(out(:,1)==0,1):end,:)=[];
%// Packup the outputs
T = out(:,1)
D = out(:,2) + range1(1)-1
L = idx_X(out(:,3))
It is very difficult to determine what your code is actually supposed to accomplish, without really working to interpret your code. However, I'll give it a crack:
% Determine where X is true.
XTrue = X == 1;
% Extract values from A1,B1,C1 where X is true.
F ( XTrue , 1 : 3 ) = [ A1(XTrue,2000) B1(XTrue,2000) C1(XTrue,2000) ];
% Determine where Y is true.
YTrueIndex = find ( Y == 1 );
% Determine where the extracted values match
counter = 0;
L = [];
T = [];
D = [];
for ( ii = 1 : length(YTrueIndex) )
    indexCurrent = YTrueIndex(ii);
    FRowsThatMatch = F(:,1)==A2(indexCurrent) & F(:,2)==B2(indexCurrent) & F(:,3)==C2(indexCurrent);
    matchCount = length ( find ( FRowsThatMatch ) );
    if ( matchCount > 0 )
        counter = counter + matchCount;
        [ i , j ] = ind2sub ( size ( Y ) , indexCurrent );
        % Append as column vectors so that L, T and D stay aligned.
        L = [ L ; find ( FRowsThatMatch ) ];
        T = [ T ; ones(matchCount,1)*i ];
        D = [ D ; ones(matchCount,1)*j ];
    end
end
I am working on Project Euler problem 2.
I wanted to solve it this way so that I can compare timings afterwards.
//Find the sum of all the even-valued terms in the Fibonacci sequence which do not exceed four million
I should get
//A: 4613732
but I am getting a huge number:
177112424089630957537
Can someone explain why?
def Fibonaccu(max: Int) : BigInt = {
var a:BigInt = 0
var b:BigInt = 1
var sum:BigInt= 0
var i:BigInt = 0;
while(i < max){
i+=1
b = a + b
a = b - a
if (b % 2 == 0) sum += b
}
//Return
println(sum)
sum
}
}
Here it is:
scala> val fib: Stream[Int] = 0 #:: fib.scanLeft(1)(_+_)
fib: Stream[Int] = Stream(0, ?)
scala> fib.takeWhile(4000000>).filter(_%2 == 0).sum
res0: Int = 4613732
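And here is a version based on your code. Your loop compares the iteration counter i with max, so it sums the even terms among the first max Fibonacci numbers; the loop has to stop once the term b itself reaches max: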
scala> def Fibonaccu(max: Int) : BigInt = {
| var a:BigInt = 0
| var b:BigInt = 1
| var sum:BigInt= 0
| while(b < max) {
| if(b % 2 == 0) sum += b
| b = a + b
| a = b - a
| }
| sum
| }
Fibonaccu: (max: Int)BigInt
scala> Fibonaccu(4000000)
res1: BigInt = 4613732
Sought is an efficient algorithm that finds the unique integer in an interval [a, b] which has the maximum number of trailing zeros in its binary representation (a and b are integers > 0):
def bruteForce(a: Int, b: Int): Int =
(a to b).maxBy(Integer.numberOfTrailingZeros(_))
def binSplit(a: Int, b: Int): Int = {
require(a > 0 && a <= b)
val res = ???
assert(res == bruteForce(a, b))
res
}
Here are some examples:
bruteForce( 5, 7) == 6 // binary 110 (1 trailing zero)
bruteForce( 1, 255) == 128 // binary 10000000
bruteForce(129, 255) == 192 // binary 11000000
etc.
This one finds the number of zeros:
// Requires a>0
def mtz(a: Int, b: Int, mask: Int = 0xFFFFFFFE, n: Int = 0): Int = {
if (a > (b & mask)) n
else mtz(a, b, mask<<1, n+1)
}
This one returns the number with those zeros:
// Requires a > 0
def nmtz(a: Int, b: Int, mask: Int = 0xFFFFFFFE): Int = {
if (a > (b & mask)) b & (mask>>1)
else nmtz(a, b, mask<<1)
}
I doubt the log(log(n)) solution has a small enough constant term to beat this. (But you could do binary search on the number of zeros to get log(log(n)).)
I decided to take Rex's challenge and produce something faster. :-)
// requires a > 0
def mtz2(a: Int, b: Int, mask: Int = 0xffff0000, shift: Int = 8, n: Int = 16): Int = {
if (shift == 0) if (a > (b & mask)) n - 1 else n
else if (a > (b & mask)) mtz2(a, b, mask >> shift, shift / 2, n - shift)
else mtz2(a, b, mask << shift, shift / 2, n + shift)
}
Benchmarked with
import System.{currentTimeMillis => now}
def time[T](f: => T): T = {
val start = now
try { f } finally { println("Elapsed: " + (now - start)/1000.0 + " s") }
}
val range = 1 to 200
time(f((a, b) => mtz(a, b)))
time(f((a, b) => mtz2(a, b)))
First see if there is a power of two that lies within your interval. If there is at least one, the largest one wins.
Otherwise, choose the largest power of two that is less than your minimum bound.
Does 1100000...0 lie in your bound? If yes, you've won. If it's still less than your minimum bound, try 1110000...0; otherwise, if it's greater than your maximum bound, try 1010000...0.
And so forth, until you win.
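The steps above can be sketched roughly as follows. This is an illustrative Python version (the names best_in_range and trailing_zeros are hypothetical), assuming 0 < a <= b: start from the largest power of two not exceeding b, and if that is still below a, try adding one lower bit at a time, keeping the bit when the candidate is too small and dropping it when the candidate is too large.

```python
def trailing_zeros(x):
    """Number of trailing zero bits of a positive integer."""
    return (x & -x).bit_length() - 1


def best_in_range(a, b):
    """Integer in [a, b] with the most trailing zeros (assumes 0 < a <= b)."""
    if not 0 < a <= b:
        raise ValueError("expected 0 < a <= b")
    x = 1 << (b.bit_length() - 1)   # largest power of two <= b
    if x >= a:
        return x                    # a power of two lies in the interval
    # Invariant from here on: x < a <= b < x + 2 * bit.
    bit = x >> 1
    while True:
        cand = x | bit              # e.g. 1100...0, then 1110...0 or 1010...0
        if cand < a:
            x = cand                # still below the interval: keep this bit
        elif cand <= b:
            return cand             # inside the interval: done
        # otherwise cand > b: drop this bit and try the next lower one
        bit >>= 1


examples = [best_in_range(5, 7), best_in_range(1, 255), best_in_range(129, 255)]
```

For the three examples from the question this gives 6, 128 and 192. The loop runs at most once per bit of b, so it is logarithmic in b rather than linear in the size of the interval.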
As a conclusion, here is my variant of Rex's answer, which gives both the center value and an 'extent': the minimum power-of-two distance from the center that covers a in one direction and b in the other.
import scala.annotation.tailrec
@tailrec def binSplit(a: Int, b: Int, mask: Int = 0xFFFFFFFF): (Int, Int) = {
val mask2 = mask << 1
if (a > (b & mask2)) (b & mask, -mask)
else binSplit(a, b, mask2)
}
def test(): Unit = {
val Seq(r1, r2) = Seq.fill(2)(util.Random.nextInt(0x3FFFFFFF) + 1)
val (a, b) = if (r1 <= r2) (r1, r2) else (r2, r1)
val (center, extent) = binSplit(a, b)
assert((center >= a) && (center <= b) && (center - extent) <= a &&
(center - extent) >= 0 && (center + extent) > b, (a, b, center, extent))
}
for (i <- 0 to 100000) { test() }