Finding major axis/image orientation of binary image in R

I have a high res binary image which looks something like:
I'm trying to compute the major axis, which should be slightly rotated to the right, and eventually get the axis of orientation of the object.
A post here (in MATLAB) suggests that one way of doing this is to compute the covariance matrix of the data points and find its eigenvalues/eigenvectors.
I am trying to implement something similar in R:
%% MATLAB CODE Calculate axis and draw
[M N] = size(Ibw);
[X Y] = meshgrid(1:N,1:M);
%Mass and mass center
m = sum(sum(Ibw));
x0 = sum(sum(Ibw.*X))/m;
y0 = sum(sum(Ibw.*Y))/m;
#R code
d = dim(im)
M = d[1]
N = d[2]
t = meshgrid(M,N)
X = t[[2]]
Y = t[[1]]
m = sum(im);
x0 = sum(im %*% X)/m;
y0 = sum(im %*% Y)/m;
meshgrid <- function(r, c) {
  return(list(R = matrix(rep(1:r, r), r, byrow = T),
              C = matrix(rep(1:c, c), c)))
}
However, computing m , x0 and y0 takes too long in R.
Does anyone know of an implementation in R?

Computing the variance matrix directly, with var, takes 1/3 of a second.
# Sample data
M <- 2736
N <- 3648
im <- matrix( FALSE, M, N );
y <- as.vector(row(im))
x <- as.vector(col(im))
im[ abs( y - M/2 ) < M/3 & abs( x - N/2 ) < N/3 ] <- TRUE
theta <- runif(1, -pi/12, pi/12)
xy <- cbind(x+1-N/2,y+1-M/2) %*% matrix(c( cos(theta), sin(theta), -sin(theta), cos(theta) ), 2, 2)
#plot(xy[,1]+N/2-1, xy[,2]+M/2-1); abline(h=c(1,M),v=c(1,N))
f <- function(u, lower, upper) pmax(lower,pmin(round(u),upper))
im[] <- im[cbind( f(xy[,2] + M/2 - 1,1,M), f(xy[,1] + N/2 - 1,1,N) )]
image(1:N, 1:M, t(im), asp=1)
# Variance matrix of the points in the rectangle
i <- which(im)
V <- var(cbind( col(im)[i], row(im)[i] ))
# Their eigenvectors
u <- eigen(V)$vectors
abline( M/2-N/2*u[2,1]/u[1,1], u[2,1]/u[1,1], lwd=5 )
abline( M/2-N/2*u[2,2]/u[1,2], u[2,2]/u[1,2] )
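For readers outside R, the same covariance-and-eigenvector idea can be sketched in Python with NumPy. This is a minimal sketch, not the code from the answer: the function name `major_axis_angle` and the test bar are made up for illustration, and the angle is measured from the x (column) axis in image coordinates, where the row index grows downwards.

```python
import numpy as np

def major_axis_angle(im):
    """Angle (radians) of the major axis of the True pixels, in (-pi/2, pi/2]."""
    rows, cols = np.nonzero(im)                  # coordinates of foreground pixels
    cov = np.cov(np.vstack([cols, rows]).astype(float))  # 2x2 covariance of (x, y)
    evals, evecs = np.linalg.eigh(cov)           # symmetric input, eigenvalues ascending
    vx, vy = evecs[:, -1]                        # eigenvector of the largest eigenvalue
    angle = np.arctan2(vy, vx)
    # An axis has no direction, so fold the angle into (-pi/2, pi/2].
    if angle > np.pi / 2:
        angle -= np.pi
    elif angle <= -np.pi / 2:
        angle += np.pi
    return angle

# Hypothetical example: a wide horizontal bar should give an angle close to 0.
im = np.zeros((50, 100), dtype=bool)
im[20:30, 10:90] = True
print(major_axis_angle(im))
```

Only the foreground coordinates are used, so the cost scales with the number of True pixels rather than with any matrix product over the full image.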

Try replacing the default Rblas.dll with a suitable one from this link.


Drawing concentric tiling circles with even diameter

I need to draw circles using pixels with these constraints:
the total of pixels across the diameter is an even number,
there are no empty pixels between two circles of radius R and R+1 (R is an integer).
The midpoint algorithm can’t be used, but I found out that Eric Andres wrote the exact thing I want. The algorithm can be found in this article under the name of “half integer centered circle”. For those who don’t have access to it, I put the interesting part at the end of the question.
I am having difficulty implementing the algorithm. I copied it into Processing using the Python syntax (for ease of visualisation):
def half_integer_centered_circle(xc, yc, R):
    x = 1
    y = R
    d = R
    while y >= x:
        point(xc + x, yc + y)
        point(xc + x, yc - y + 1)
        point(xc - x + 1, yc + y)
        point(xc - x + 1, yc - y + 1)
        point(xc + y, yc + x)
        point(xc + y, yc - x + 1)
        point(xc - y + 1, yc + x)
        point(xc - y + 1, yc - x + 1)
        if d > x:
            d = d - x
            x = x + 1
        elif d < R + 1 - y:
            d = d + y - 1
            y = y - 1
        else:
            d = d + y - x - 1
            x = x + 1
            y = y - 1
The point() function just plots a pixel at the given coordinates. Please also note that in the article, x is initialised as S, which is strange because there is no S elsewhere (it’s not explained at all); however, it is said that the circle begins at (x, y) = (1, R), so I wrote x = 1.
Here is the result I get for radii between 1 pixel and 20 pixels:
As you can see, there are holes between circles and the circle with R = 3 is different from the given example (see below). Also, the circles are not really round compared to what you get with the midpoint algorithm.
How can I get the correct result?
Original Eric Andres’ algorithm:
I don't understand the way in which the algorithm has been presented in that paper. As I read it the else if clause associated with case (b) doesn't have a preceding if. I get the same results as you when transcribing it as written
Looking at the text, rather than the pseudocode, the article seems to be suggesting an algorithm of the following form:
x = 1
y = R
while x is less than or equal to y:
    draw(x, y)
    # ...
    if the pixel to the right has radius between R - 1/2 and R + 1/2:
        move one pixel to the right
    else if the pixel below has radius between R - 1/2 and R + 1/2:
        move one pixel down
    otherwise:
        move one pixel diagonally down and right
Which seems plausible. In python:
import numpy as np
import matplotlib.pyplot as pp
fg = pp.figure()
ax = fg.add_subplot(111)
def point(x, y, c):
    # Draw one pixel as a unit square centred on (x, y)
    xx = [x - 1/2, x + 1/2, x + 1/2, x - 1/2, x - 1/2 ]
    yy = [y - 1/2, y - 1/2, y + 1/2, y + 1/2, y - 1/2 ]
    ax.plot(xx, yy, 'k-')
    ax.fill_between(xx, yy, color=c, linewidth=0)
def half_integer_centered_circle(R, c):
    x = 1
    y = R
    while y >= x:
        point(x, y, c)
        point(x, - y + 1, c)
        point(- x + 1, y, c)
        point(- x + 1, - y + 1, c)
        point(y, x, c)
        point(y, - x + 1, c)
        point(- y + 1, x, c)
        point(- y + 1, - x + 1, c)
        def test(x, y):
            rSqr = x**2 + y**2
            return (R - 1/2)**2 < rSqr and rSqr < (R + 1/2)**2
        if test(x + 1, y):
            x += 1
        elif test(x, y - 1):
            y -= 1
        else:
            x += 1
            y -= 1
for i in range(1, 5):
    half_integer_centered_circle(2*i - 1, 'r')
    half_integer_centered_circle(2*i, 'b')
pp.show()
This seems to work as intended. Note that I removed the circle centre for simplicity. It should be easy enough to add in again.
Edit Realised I could match the radius 3 image if I tweaked the logic a bit.
I have been looking into this matter and observed three issues in the original paper:
The arithmetic circle copied here (Figure 10.a in the paper) is not consistent with the formal definition of the "half integer centered circle". In one case the distance to the center must be between R-1/2 and R+1/2 and in the other between integer values. The consequence is that this specific algorithm, if properly implemented, can never generate the circle of Figure 10.a.
There is a mistake in one of the inequalities of the algorithm pseudo code: the test for case (b) should be d <= (R + 1 - y) instead of d < (R + 1 - y).
All those pixels that satisfy x==y have only 4-fold symmetry (not 8-fold) and are generated twice by the algorithm. Although producing duplicated pixels may not be a problem for a drawing routine, it is not acceptable for the application that I am interested in. However this can be easily fixed by adding a simple check of the x==y condition and skipping the four duplicated pixels.
The python code of the original question includes the inequality error mentioned above and an additional mistake due to missing parenthesis in one of the expressions that should read d = d + (y - x - 1).
The following implementation fixes all this and is compatible with python2 and python3 (no integer division issues in the point() function):
import numpy as np
import matplotlib.pyplot as pp
fg = pp.figure()
ax = fg.add_subplot(111)
def point(x, y, c):
    # Draw one pixel as a unit square centred on (x, y)
    xx = [x - 0.5, x + 0.5, x + 0.5, x - 0.5, x - 0.5 ]
    yy = [y - 0.5, y - 0.5, y + 0.5, y + 0.5, y - 0.5 ]
    ax.plot(xx, yy, 'k-')
    ax.fill_between(xx, yy, color=c, linewidth=0)
def half_integer_centered_circle(R, c):
    x = 1
    y = R
    d = R
    while y >= x:
        point(x, y, c)
        point(x, - y + 1, c)
        point(- x + 1, y, c)
        point(- x + 1, - y + 1, c)
        if y != x:
            # Pixels with x == y only have 4-fold symmetry; skip the duplicates
            point(y, x, c)
            point(y, - x + 1, c)
            point(- y + 1, x, c)
            point(- y + 1, - x + 1, c)
        if d > x:
            d = d - x
            x = x + 1
        elif d <= R + 1 - y:
            d = d + y - 1
            y = y - 1
        else:
            d = d + (y - x - 1)
            x = x + 1
            y = y - 1
for i in range(1, 5):
    half_integer_centered_circle(2*i - 1, 'r')
    half_integer_centered_circle(2*i, 'b')
pp.show()
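To check the two constraints from the question without plotting, the corrected loop can collect pixels in a set instead of drawing them. The sketch below is an illustration only: the function name `half_integer_circle_pixels` is made up, the centre is assumed to sit at the corner shared by pixels (0, 0) and (1, 1), and the tiling has only been checked here for the small radii shown.

```python
def half_integer_circle_pixels(R):
    """Set of (x, y) pixels of the circle of integer radius R."""
    pixels = set()
    x, y, d = 1, R, R
    while y >= x:
        # Mirror the first-octant pixel; the set absorbs the x == y duplicates.
        for px, py in ((x, y), (y, x)):
            pixels.update({(px, py), (px, 1 - py), (1 - px, py), (1 - px, 1 - py)})
        if d > x:                 # step right
            d -= x
            x += 1
        elif d <= R + 1 - y:      # step down, using the corrected <= test
            d += y - 1
            y -= 1
        else:                     # step diagonally
            d += y - x - 1
            x += 1
            y -= 1
    return pixels

rings = [half_integer_circle_pixels(R) for R in range(1, 5)]
for R, ring in enumerate(rings, start=1):
    xs = [px for px, _ in ring]
    print(R, len(ring), max(xs) - min(xs) + 1)   # radius, pixel count, width in pixels
```

For these radii the width comes out as 2R, and the rings neither overlap nor leave holes, which are the two properties asked for.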

Most efficient algorithm to find integer points within an ellipse

I'm trying to find all the integer lattice points within various 3D ellipses.
I would like my program to take an integer N, and count all the lattice points within the ellipses of the form ax^2 + by^2 + cz^2 = n, where a,b,c are fixed integers and n is between 1 and N. This program should then return N tuples of the form (n, numlatticePointsWithinEllipse n).
I'm currently doing it by counting the points on the ellipses ax^2 + by^2 + cz^2 = m, for m between 0 and n inclusive, and then summing over m. I'm also only looking at x, y and z all positive initially, and then adding in the negatives by permuting their signs later.
Ideally, I'd like to reach N = 1,000,000+ on a timescale of hours.
Taking a specific example of x^2 + y^2 + 3z^2 = N, here's the Haskell code I'm currently using:
import System.Environment
isqrt :: Int -> Int
isqrt 0 = 0
isqrt 1 = 1
isqrt n = head $ dropWhile (\x -> x*x > n) $ iterate (\x -> (x + n `div` x) `div` 2) (n `div` 2)
latticePointsWithoutNegatives :: Int -> [[Int]]
latticePointsWithoutNegatives 0 = [[0,0,0]]
latticePointsWithoutNegatives n = [[x,y,z] | x<-[0.. isqrt n], y<- [0.. isqrt (n - x^2)], z<-[max 0 (isqrt ((n-x^2 -y^2) `div` 3))], x^2 + y^2 + 3*z^2 == n]
latticePoints :: Int -> [[Int]]
latticePoints n = [ zipWith (*) [x1,x2,x3] y | [x1,x2,x3] <- (latticePointsWithoutNegatives n), y <- [[a,b,c] | a <- (if x1 == 0 then [0] else [-1,1]), b<-(if x2 == 0 then [0] else [-1,1]), c<-(if x3 == 0 then [0] else [-1,1])]]
latticePointsUpTo :: Int -> Int
latticePointsUpTo n = sum [length (latticePoints x) | x<-[0..n]]
listResults :: Int -> [(Int, Int)]
listResults n = [(x, latticePointsUpTo x) | x<- [1..n]]
main = do
  args <- getArgs
  let cleanArgs = read (head args)
  print (listResults cleanArgs)
I've compiled this with
ghc -O2 latticePointsTest
but using the PowerShell "Measure-Command" command, I get the following results:
Measure-Command{./latticePointsTest 10}
TotalMilliseconds : 12.0901
Measure-Command{./latticePointsTest 100}
TotalMilliseconds : 12.0901
Measure-Command{./latticePointsTest 1000}
TotalMilliseconds : 31120.4503
and going any more orders of magnitude up takes us onto the scale of days, rather than hours or minutes.
Is there anything fundamentally wrong with the algorithm I'm using? Is there any core reason why my code isn't scaling well? Any guidance will be greatly appreciated. I may also want to process the data between "latticePoints" and "latticePointsUpTo", so I can't just rely entirely on clever number theoretic counting techniques - I need the underlying tuples preserved.
Some things I would try:
isqrt is not efficient for the range of values you are working with. Simply use the floating point sqrt function:
isqrt n = floor $ sqrt ((fromIntegral n) :: Double)
Alternatively, instead of computing integer square roots, use logic like this in your list comprehensions:
x <- takeWhile (\x -> x*x <= n) [0..],
y <- takeWhile (\y -> y*y <= n - x*x) [0..]
Also, I would use expressions like x*x instead of x^2.
Finally, why not compute the number of solutions with something like this:
sols a b c n =
  length [ () | x <- takeWhile (\x -> a*x*x <= n) [0..]
              , y <- takeWhile (\y -> a*x*x+b*y*y <= n) [0..]
              , z <- takeWhile (\z -> a*x*x+b*y*y+c*z*z <= n) [0..]
         ]
This does not exactly compute the same answer that you want because it doesn't account for positive and negative solutions, but you could easily modify it to compute your answer. The idea is to use one list comprehension instead of iterating over various values of n and summing.
Finally, I think using floor and sqrt to compute the integral square root is completely safe in this case. This code verifies that the integer square root computed using sqrt of (x*x) equals x for all x <= 3037000499:
testAll :: Int -> IO ()
testAll n =
  print $ head [ (x,a) | x <- [n,n-1 .. 1], let a = floor $ sqrt (fromIntegral (x*x) :: Double), a /= x ]
main = testAll 3037000499
Note I am running this on a 64-bit GHC - otherwise just use Int64 instead of Int since Doubles are 64-bit in either case. Takes only a minute or so to verify.
This shows that taking the floor of sqrt y will never result in the wrong answer if y <= 3037000499^2.
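The single-pass idea can be pushed one step further: visit each non-negative (x, y, z) once, record how many signed copies land on each value m, and take a running sum to get every n up to N at once. The Python sketch below is only an illustration of that bookkeeping, not a translation of the Haskell above; the name `lattice_counts` is made up, and it assumes a, b, c are positive integers.

```python
from itertools import accumulate
from math import isqrt  # Python 3.8+

def lattice_counts(a, b, c, N):
    """Return [(n, #points with a*x^2 + b*y^2 + c*z^2 <= n)] for n = 1..N."""
    on_surface = [0] * (N + 1)             # on_surface[m]: points whose value is exactly m
    for x in range(isqrt(N // a) + 1):
        wx = 1 if x == 0 else 2            # x and -x give the same value
        rest_x = N - a * x * x
        for y in range(isqrt(rest_x // b) + 1):
            wy = 1 if y == 0 else 2
            rest_y = rest_x - b * y * y
            for z in range(isqrt(rest_y // c) + 1):
                wz = 1 if z == 0 else 2
                on_surface[a * x * x + b * y * y + c * z * z] += wx * wy * wz
    within = list(accumulate(on_surface))  # running total over m = 0..n
    return [(n, within[n]) for n in range(1, N + 1)]

print(lattice_counts(1, 1, 3, 4))   # [(1, 5), (2, 9), (3, 11), (4, 23)]
```

The work is proportional to the number of points in the non-negative octant for the largest n, rather than being repeated for every n. If the tuples themselves are needed, the counter array could be replaced by per-m lists of points at the cost of memory.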

how to calculate a quadratic equation that best fits a set of given data

I have a vector X of 20 real numbers and a vector Y of 20 real numbers.
I want to model them as
y = ax^2+bx + c
How do I find the values of a, b and c,
and thus the best-fit quadratic equation?
Given Values
X = (x1,x2,...,x20)
Y = (y1,y2,...,y20)
I need a formula or procedure to find the following values:
a = ???
b = ???
c = ???
Thanks in advance.
Everything #Bartoss said is right, +1. I figured I would just add a practical implementation here, without QR decomposition. You want to find the values of a, b, c such that the distance between the measured and fitted data is minimal. As the measure you can pick
sum((ax^2 + bx + c - y)^2)
where the sum is over the elements of the vectors x, y.
Then, a minimum implies that the derivative of this quantity with respect to each of a, b, c is zero:
d (sum((ax^2 + bx + c - y)^2)) /da = 0
d (sum((ax^2 + bx + c - y)^2)) /db = 0
d (sum((ax^2 + bx + c - y)^2)) /dc = 0
These equations are
2*sum((ax^2 + bx + c - y)*x^2) = 0
2*sum((ax^2 + bx + c - y)*x) = 0
2*sum(ax^2 + bx + c - y) = 0
Dividing by 2, the above can be rewritten as
a*sum(x^4) +b*sum(x^3) + c*sum(x^2) =sum(y*x^2)
a*sum(x^3) +b*sum(x^2) + c*sum(x) =sum(y*x)
a*sum(x^2) +b*sum(x) + c*N =sum(y)
where N=20 in your case. Simple Python code showing how to do this follows.
from numpy import random, array
from scipy.linalg import solve
import matplotlib.pylab as plt
a, b, c = 6., 3., 4.
N = 20
x = random.rand((N))
y = a * x ** 2 + b * x + c
y += random.rand((20)) #add a bit of noise to make things more realistic
x4 = (x ** 4).sum()
x3 = (x ** 3).sum()
x2 = (x ** 2).sum()
M = array([[x4, x3, x2], [x3, x2, x.sum()], [x2, x.sum(), N]])
K = array([(y * x ** 2).sum(), (y * x).sum(), y.sum()])
A, B, C = solve(M, K)
print('exact values ', a, b, c)
print('calculated values', A, B, C)
fig, ax = plt.subplots()
ax.plot(x, y, 'b.', label='data')
ax.plot(x, A * x ** 2 + B * x + C, 'r.', label='estimate')
A much quicker way to implement a solution is to use a nonlinear least squares algorithm. This will be faster to write, but not faster to run. Using the one provided by scipy,
from scipy.optimize import leastsq
def f(arg):
    a, b, c = arg  # use the trial parameters, not the global a, b, c
    return a*x**2 + b*x + c - y
(A, B, C), _ = leastsq(f, [1, 1, 1])  # you must provide a first guess to start with in this case.
That is a linear least squares problem. I think the easiest method which gives accurate results is QR decomposition using Householder reflections. It is not something to be explained in a stackoverflow answer, but I hope you will find all that is needed with these links.
If you have never heard about these before and don't know how it connects with your problem:
A = [[x1^2, x1, 1]; [x2^2, x2, 1]; ...]
Y = [y1; y2; ...]
Now you want to find v = [a; b; c] such that A*v is as close as possible to Y, which is exactly what least squares problem is all about.
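The matrix formulation above can be sketched in a few lines of Python with NumPy. This is a minimal sketch under assumptions: the function name `fit_quadratic` is made up, and `numpy.linalg.qr` is used as a ready-made QR factorisation rather than a hand-written Householder routine.

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares fit of y ~ a*x^2 + b*x + c; returns (a, b, c)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([x**2, x, np.ones_like(x)])  # rows [x_i^2, x_i, 1]
    Q, R = np.linalg.qr(A)                           # reduced QR: A = Q R, R is 3x3
    v = np.linalg.solve(R, Q.T @ y)                  # solve R v = Q^T Y
    return tuple(v)

x = np.linspace(-1.0, 1.0, 20)
y = 6.0 * x**2 + 3.0 * x + 4.0   # noise-free data, so the fit should recover 6, 3, 4
print(fit_quadratic(x, y))
```

Working with Q and R avoids forming the sums of x^4 explicitly, which is the usual reason QR is preferred over the normal equations when the x values are badly scaled.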

Ordered set and natural bijection (combinatorial species)

Let A be some set (e.g. 1000, 1001, 1002, ..., 1999).
Let lessThan be some order relation function (e.g. (a lessThan b) <-> (a > b)).
Let index be a function (with inverse index') mapping an element of A to the naturals.
index a = 2000 - a
index' n = 2000 - n
Is there some way to construct the index (and index') functions for all (or some kinds of) (A, lessThan) pairs in P (polynomial time)?
Best regards and thanks in advance!
EDITED: A could be a set given by definition (e.g. all combinations with repetition of another big subset), so we can't assume A is completely traversable (in P).
EDITED: another non-trivial example: let An be a set (with elements like (x, y, p)) whose elements are ordered clockwise in an n X n square, like this
1 2 3 4
12 13 14 5
11 16 15 6
10 9 8 7
then, we can map each triplet in An to Bn = [1..n^2] with O(1) (a polynomial).
Given one An element we can index to Bn with O(1).
Given one Bn element we can index' to An with O(1).
// Square perimeter; square x = 1, 2, 3, ...
Func<int, int, int> perimeter = ( x, n ) => 4 * ( n - 2 * x + 1 );
// Given main diagonal coordinates (1, 1), (2, 2), ... return cell number
Func<int, int, int> diagonalPos = ( x, n ) => -4 * x * x + ( 4 * n + 8 ) * x - 4 * n - 3;
// Given a number, return the square (ring) it lies in
Func<int, int, int> inSquare = ( z, n ) => (int) Math.Floor(n * 0.5 - 0.5 * Math.Sqrt(n * n - z + 1.0) + 1.0);
Func<int, int, Point> coords = ( z, n ) => {
    var s = inSquare(z, n);
    var l = perimeter(s, n) / 4; // length sub-square edge -1
    var l2 = l + l;
    var l3 = l2 + l;
    var d = diagonalPos(s, n);
    if( z <= d + l )
        return new Point(s + z - d, s);
    if( z <= d + l2 )
        return new Point(s + l, s + z - d - l);
    if( z <= d + l3 )
        return new Point(s + d + l3 - z, s + l);
    return new Point(s, s + d + l2 + l2 - z);
};
(I have read about "Combinatorial species", "Ordered construction of combinatorial objects", "species" haskell package and others)
I may be misunderstanding what you want, but in case I'm not:
If lessThan defines a total order on the set, you can create the index and index' functions by
converting the set to a list (or an array/vector)
sorting that according to lessThan
construct index' as Data.Map.fromDistinctAscList $ zip [1 .. ] sortedList
construct index as Data.Map.fromDistinctAscList $ zip (map NTC sortedList) [1 .. ]
where NTC is a newtype constructor wrapping the type of elements of the set in a newtype whose Ord instance is given by lessThan.
newtype Wrapped = NTC typeOfElements
instance Eq Wrapped where
    (NTC x) /= (NTC y) = x `lessThan` y || y `lessThan` x
    -- that can usually be done more efficiently
instance Ord Wrapped where
    (NTC x) <= (NTC y) = not $ y `lessThan` x
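The sort-then-number steps can also be sketched in Python, shown here only as an illustration of the construction. The helper name `build_index` is made up, and the sketch assumes the set is finite, small enough to enumerate, made of hashable elements, and that lessThan is a strict total order on it.

```python
from functools import cmp_to_key

def build_index(elements, less_than):
    """Return (index, index_inv) dicts for a finite set under a strict total order."""
    def compare(u, v):
        if less_than(u, v):
            return -1
        if less_than(v, u):
            return 1
        return 0
    ordered = sorted(elements, key=cmp_to_key(compare))
    index = {e: i for i, e in enumerate(ordered, start=1)}      # element -> natural
    index_inv = {i: e for i, e in enumerate(ordered, start=1)}  # natural -> element
    return index, index_inv

# The example from the question: A = {1000, ..., 1999}, a lessThan b <-> a > b.
index, index_inv = build_index(range(1000, 2000), lambda a, b: a > b)
print(index[1999], index_inv[1])   # 1 1999
```

Building both tables costs one sort of the whole set, which is why the construction only helps when the set itself can be traversed.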
EDITED: A could be a set by definition (eg. all combinations with repetition of another big subset), then, we can't suppose A is completely traversable (in P).
In that case, unless I'm missing something fundamental, it's impossible in principle, because the index' function would provide a complete traversal of the set.
So you can create the index and index' functions in polynomial time if and only if the set is traversable in polynomial time.

Matlab wrong results with array calculations

I'm trying to reproduce the results from a paper for which I give a link to avoid writing down all the math needed:
On Modeling and Simulation of Game Theory-based Defense Mechanisms against DoS and DDoS Attacks
More specifically what I'm having a problem with is the Figure 3 plot. The plot gives in the z axis the results of equation 3 given the two variables m and M. The other equations that will be needed are 5,6,7 and there are also two small ones in the paragraph before equation 6. Also in order to see what Xi is check the 4.2 part. All the variable values needed are given before the plot.
Now to get to the point, I'm trying to create the exact same plot in matlab but I've failed and I need help because my matlab skills are not so good.
I have a script file in which I have the following:
w1 = 1000;
w2 = 1000;
w3 = 10;
B = 2000;
n = 20;
r_l = 60;
s_l = 20;
g = 10;
a_f = 5000;
b = 20;
vx = 0 : 1 : 500;
vy = 0 : 1 : 90;
[x,y] = meshgrid(vx,vy);
z = payoff(w1, w2, w3, y, r_l, n, g, B, b, x, s_l, a_f);
h = surfc(x,y,z);
set(h, 'edgecolor','none')
xlabel('Firewall Midpoint (M)')
ylabel('Number of zombies')
zlabel('Attackers payoff')
Payoff is a function that is as follows:
function out = payoff(w1, w2, w3, m, r_l, n, g, B, b, M, s_l, a_f)
r_a = a_f./ m;
r_a_dash = r_a.*(1-Fx(r_a, b, M, B));
r_l_dash = r_l.*(1-Fx(r_l, b, M, B));
v_b = ( m .* r_a_dash ) ./ ( n .* r_l_dash + m .* r_a_dash );
v_n = normcdf(( g .* ( n .* r_l_dash + m .* r_a_dash ) ./ B ), r_l, s_l);
out = w1 * v_b + w2 * v_n - w3 * m;
Fx again is a function that does the following:
function out = Fx(x,b,M,B)
I don't know where exactly is the mistake but the plot I get is the following which is not the same as the one in the paper.
The figure in the paper has a U-shaped curve along the Firewall Midpoint axis, whereas mine is monotonically increasing.
Can anyone spot any mistake(s) that I have? Thanks in advance.
The big thing I noticed was in your code you used:
v_n = normcdf(( g .* ( n .* r_l_dash + m .* r_a_dash ) ./ B ), r_l, s_l);
When you should have used (I think):
v_n = normcdf(( g .* ( n .* r_l_dash + m .* r_a_dash ) ./ B ), r_l_dash, s_l);
In the paper, they state:
Recall that rl represents the expected rate of a legitimate flow. Let the average rate of legitimate flows passing through the firewall be rl′.
In the normcdf function, the second argument should be the average, mu. This gives me a U-shaped curve along the Firewall Midpoint axis. However, I can see it does not exactly match the picture, and I believe that is due to the value of b, which, as someone has already pointed out, was not given.
Hope this helps. There may still be a calculation error as I've played around with various values of b and still can't match the image in the paper.
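To see how much the mean argument matters, here is a small Python sketch of a normal CDF with the same argument order as normcdf(x, mu, sigma). It is an illustration only: the helper is written from the standard erf formula, and the mean of 40 is a made-up stand-in for r_l_dash, not a value from the paper.

```python
from math import erf, sqrt

def normcdf(x, mu=0.0, sigma=1.0):
    """Normal CDF with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Same x and sigma, different mean: the result changes substantially.
print(normcdf(60.0, 60.0, 20.0))   # x equals the mean, so this is 0.5
print(normcdf(60.0, 40.0, 20.0))   # x is one standard deviation above the mean
```

Because r_l_dash depends on M through Fx, passing it as the mean makes v_n vary with the firewall midpoint in a way that a constant r_l cannot.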
