Sort points in clockwise order? - algorithm

Given an array of x,y points, how do I sort the points of this array in clockwise order (around their overall average center point)? My goal is to pass the points to a line-creation function to end up with something looking rather "solid", as convex as possible with no lines intersecting.
For what it's worth, I'm using Lua, but any pseudocode would be appreciated.
Update: For reference, this is the Lua code based on Ciamej's excellent answer (ignore my "app" prefix):
function appSortPointsClockwise(points)
local centerPoint = appGetCenterPointOfPoints(points)
app.pointsCenterPoint = centerPoint
table.sort(points, appGetIsLess)
return points
end
function appGetIsLess(a, b)
local center = app.pointsCenterPoint
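local ax, bx = a.x - center.x, b.x - center.x
if ax >= 0 and bx < 0 then return true
elseif ax < 0 and bx >= 0 then return false
elseif ax == 0 and bx == 0 then
if a.y - center.y >= 0 or b.y - center.y >= 0 then return a.y > b.y end
return b.y > a.y
end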
local det = (a.x - center.x) * (b.y - center.y) - (b.x - center.x) * (a.y - center.y)
if det < 0 then return true
elseif det > 0 then return false
end
local d1 = (a.x - center.x) * (a.x - center.x) + (a.y - center.y) * (a.y - center.y)
local d2 = (b.x - center.x) * (b.x - center.x) + (b.y - center.y) * (b.y - center.y)
return d1 > d2
end
function appGetCenterPointOfPoints(points)
local pointsSum = {x = 0, y = 0}
for i = 1, #points do pointsSum.x = pointsSum.x + points[i].x; pointsSum.y = pointsSum.y + points[i].y end
return {x = pointsSum.x / #points, y = pointsSum.y / #points}
end

First, compute the center point.
Then sort the points using whatever sorting algorithm you like, but use a special comparison routine to determine whether one point is less than the other.
You can check whether one point (a) is to the left or to the right of the other (b) in relation to the center by this simple calculation:
det = (a.x - center.x) * (b.y - center.y) - (b.x - center.x) * (a.y - center.y)
If the result is zero, then a and b lie on the same line through the center. If it is positive or negative, then a is on one side of b or the other, so one point will precede the other.
Using it you can construct a less-than relation to compare points and determine the order in which they should appear in the sorted array. But you have to define where that order begins, that is, which angle is the starting one (e.g. the positive half of the x-axis).
The code for the comparison function can look like this:
bool less(point a, point b)
{
if (a.x - center.x >= 0 && b.x - center.x < 0)
return true;
if (a.x - center.x < 0 && b.x - center.x >= 0)
return false;
if (a.x - center.x == 0 && b.x - center.x == 0) {
if (a.y - center.y >= 0 || b.y - center.y >= 0)
return a.y > b.y;
return b.y > a.y;
}
// compute the cross product of vectors (center -> a) x (center -> b)
int det = (a.x - center.x) * (b.y - center.y) - (b.x - center.x) * (a.y - center.y);
if (det < 0)
return true;
if (det > 0)
return false;
// points a and b are on the same line from the center
// check which point is closer to the center
int d1 = (a.x - center.x) * (a.x - center.x) + (a.y - center.y) * (a.y - center.y);
int d2 = (b.x - center.x) * (b.x - center.x) + (b.y - center.y) * (b.y - center.y);
return d1 > d2;
}
This will order the points clockwise starting from the 12 o'clock. Points on the same "hour" will be ordered starting from the ones that are further from the center.
If using integer types (which are not really present in Lua) you'd have to ensure that the det, d1 and d2 variables are of a type wide enough to hold the results of the calculations.
If you want to achieve something looking solid, as convex as possible, then I guess you're looking for a Convex Hull. You can compute it using the Graham Scan.
In this algorithm, you also have to sort the points clockwise (or counter-clockwise) starting from a special pivot point. Then you repeat a simple loop step, each time checking whether you turn left or right as you add new points to the convex hull. This check is based on a cross product, just like in the comparison function above.
Edit:
Added one more if statement if (a.y - center.y >= 0 || b.y - center.y >=0) to make sure that points that have x=0 and negative y are sorted starting from the ones that are further from the center. If you don't care about the order of points on the same 'hour' you can omit this if statement and always return a.y > b.y.
Corrected the first if statements with adding -center.x and -center.y.
Added the second if statement (a.x - center.x < 0 && b.x - center.x >= 0). It was an obvious oversight that it was missing. The if statements could be reorganized now because some checks are redundant. For example, if the first condition in the first if statement is false, then the first condition of the second if must be true. I decided, however, to leave the code as it is for the sake of simplicity. It's quite possible that the compiler will optimize the code and produce the same result anyway.
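For readers who want to experiment outside Lua or C, the same comparison rules can be sketched in Python. This is only an illustration, not the answer's original code: the name sort_clockwise, the use of (x, y) tuples, and the functools.cmp_to_key adapter that turns the less-than relation into a sort key are all assumptions made here.

```python
from functools import cmp_to_key

def sort_clockwise(points):
    """Sort (x, y) tuples clockwise around their average, starting at 12 o'clock."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)

    def less(a, b):
        ax, ay = a[0] - cx, a[1] - cy
        bx, by = b[0] - cx, b[1] - cy
        if ax >= 0 and bx < 0:
            return True
        if ax < 0 and bx >= 0:
            return False
        if ax == 0 and bx == 0:
            if ay >= 0 or by >= 0:
                return ay > by
            return by > ay
        # cross product of (center -> a) x (center -> b)
        det = ax * by - bx * ay
        if det != 0:
            return det < 0
        # same direction from the center: the farther point comes first
        return ax * ax + ay * ay > bx * bx + by * by

    def compare(a, b):
        if less(a, b):
            return -1
        if less(b, a):
            return 1
        return 0

    return sorted(points, key=cmp_to_key(compare))
```

With eight points placed on the axes and diagonals around the origin, the result starts at (0, 1) and proceeds through (1, 1), (1, 0) and so on around the clock.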

What you're asking for is a system known as polar coordinates. Conversion from Cartesian to polar coordinates is easily done in any language. The formulas can be found in this section.
After converting to polar coordinates, just sort by the angle, theta.
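A minimal sketch of this idea in Python, assuming points are (x, y) tuples and the center is their average. Only the angle is needed for sorting, so the radius is never computed. The function name and the choice to start at 12 o'clock (by passing dx, dy to atan2 in swapped order) are assumptions made for this illustration.

```python
import math

def sort_clockwise_by_angle(points):
    """Sort (x, y) tuples clockwise around their average, starting at 12 o'clock."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    def clockwise_angle(p):
        dx, dy = p[0] - cx, p[1] - cy
        # atan2(dx, dy) measures the angle clockwise from the +y axis;
        # the modulo maps (-pi, pi] onto [0, 2*pi)
        return math.atan2(dx, dy) % (2 * math.pi)

    return sorted(points, key=clockwise_angle)
```

Points that share the same angle keep their input order here, because Python's sort is stable; add the distance as a second key component if that matters.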

An interesting alternative approach to your problem would be to find an approximate minimum of the Traveling Salesman Problem (TSP), i.e. the shortest route linking all your points. If your points form a convex shape, it should be the right solution; otherwise, it should still look good (a "solid" shape can be defined as one that has a low perimeter/area ratio, which is what we are optimizing here).
You can use any implementation of an optimizer for the TSP, of which I am pretty sure you can find a ton in your language of choice.
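To make the idea concrete, here is a tiny brute-force sketch in Python. It is not an approximate optimizer like the ones mentioned above: it tries every ordering, so it is only usable for a handful of points (roughly nine or fewer). The function name is hypothetical and math.dist requires Python 3.8 or later.

```python
import itertools
import math

def shortest_closed_tour(points):
    """Return (tour, length) of the shortest closed route through all points."""
    first, rest = points[0], points[1:]
    best_tour, best_length = None, math.inf
    # Fixing the first point avoids re-checking rotations of the same tour.
    for perm in itertools.permutations(rest):
        tour = [first, *perm]
        length = sum(
            math.dist(tour[i], tour[(i + 1) % len(tour)])
            for i in range(len(tour))
        )
        if length < best_length:
            best_tour, best_length = tour, length
    return best_tour, best_length
```

For the four corners of a unit square given in scrambled order, the shortest tour walks the perimeter and has length 4.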

Another version (return true if a comes before b in counterclockwise direction):
bool lessCcw(const Vector2D &center, const Vector2D &a, const Vector2D &b) const
{
// Computes the quadrant for a and b (0-3):
// ^
// 1 | 0
// ---+-->
// 2 | 3
const int dax = ((a.x() - center.x()) > 0) ? 1 : 0;
const int day = ((a.y() - center.y()) > 0) ? 1 : 0;
const int qa = (1 - dax) + (1 - day) + ((dax & (1 - day)) << 1);
/* The previous computes the following:
const int qa =
( (a.x() > center.x())
? ((a.y() > center.y())
? 0 : 3)
: ((a.y() > center.y())
? 1 : 2)); */
const int dbx = ((b.x() - center.x()) > 0) ? 1 : 0;
const int dby = ((b.y() - center.y()) > 0) ? 1 : 0;
const int qb = (1 - dbx) + (1 - dby) + ((dbx & (1 - dby)) << 1);
if (qa == qb) {
return (b.x() - center.x()) * (a.y() - center.y()) < (b.y() - center.y()) * (a.x() - center.x());
} else {
return qa < qb;
}
}
This is faster because the compiler (tested on Visual C++ 2015) doesn't generate jumps to compute dax, day, dbx, dby. Here is the output assembly from the compiler:
; 28 : const int dax = ((a.x() - center.x()) > 0) ? 1 : 0;
vmovss xmm2, DWORD PTR [ecx]
vmovss xmm0, DWORD PTR [edx]
; 29 : const int day = ((a.y() - center.y()) > 0) ? 1 : 0;
vmovss xmm1, DWORD PTR [ecx+4]
vsubss xmm4, xmm0, xmm2
vmovss xmm0, DWORD PTR [edx+4]
push ebx
xor ebx, ebx
vxorps xmm3, xmm3, xmm3
vcomiss xmm4, xmm3
vsubss xmm5, xmm0, xmm1
seta bl
xor ecx, ecx
vcomiss xmm5, xmm3
push esi
seta cl
; 30 : const int qa = (1 - dax) + (1 - day) + ((dax & (1 - day)) << 1);
mov esi, 2
push edi
mov edi, esi
; 31 :
; 32 : /* The previous computes the following:
; 33 :
; 34 : const int qa =
; 35 : ( (a.x() > center.x())
; 36 : ? ((a.y() > center.y()) ? 0 : 3)
; 37 : : ((a.y() > center.y()) ? 1 : 2));
; 38 : */
; 39 :
; 40 : const int dbx = ((b.x() - center.x()) > 0) ? 1 : 0;
xor edx, edx
lea eax, DWORD PTR [ecx+ecx]
sub edi, eax
lea eax, DWORD PTR [ebx+ebx]
and edi, eax
mov eax, DWORD PTR _b$[esp+8]
sub edi, ecx
sub edi, ebx
add edi, esi
vmovss xmm0, DWORD PTR [eax]
vsubss xmm2, xmm0, xmm2
; 41 : const int dby = ((b.y() - center.y()) > 0) ? 1 : 0;
vmovss xmm0, DWORD PTR [eax+4]
vcomiss xmm2, xmm3
vsubss xmm0, xmm0, xmm1
seta dl
xor ecx, ecx
vcomiss xmm0, xmm3
seta cl
; 42 : const int qb = (1 - dbx) + (1 - dby) + ((dbx & (1 - dby)) << 1);
lea eax, DWORD PTR [ecx+ecx]
sub esi, eax
lea eax, DWORD PTR [edx+edx]
and esi, eax
sub esi, ecx
sub esi, edx
add esi, 2
; 43 :
; 44 : if (qa == qb) {
cmp edi, esi
jne SHORT $LN37@lessCcw
; 45 : return (b.x() - center.x()) * (a.y() - center.y()) < (b.y() - center.y()) * (a.x() - center.x());
vmulss xmm1, xmm2, xmm5
vmulss xmm0, xmm0, xmm4
xor eax, eax
pop edi
vcomiss xmm0, xmm1
pop esi
seta al
pop ebx
; 46 : } else {
; 47 : return qa < qb;
; 48 : }
; 49 : }
ret 0
$LN37@lessCcw:
pop edi
pop esi
setl al
pop ebx
ret 0
?lessCcw@@YA_NABVVector2D@@00@Z ENDP ; lessCcw
Enjoy.
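The quadrant arithmetic is easier to check in a high-level language. The following Python sketch mirrors the C++ function above; the names quadrant, less_ccw and sort_ccw, the tuple representation, and the cmp_to_key adapter are assumptions of this illustration, and none of the speed argument carries over to Python.

```python
from functools import cmp_to_key

def quadrant(dx, dy):
    """Quadrant 0-3, counted counterclockwise, using the same arithmetic as lessCcw."""
    dax = int(dx > 0)
    day = int(dy > 0)
    return (1 - dax) + (1 - day) + ((dax & (1 - day)) << 1)

def less_ccw(center, a, b):
    """True if a comes before b in counterclockwise direction around center."""
    ax, ay = a[0] - center[0], a[1] - center[1]
    bx, by = b[0] - center[0], b[1] - center[1]
    qa, qb = quadrant(ax, ay), quadrant(bx, by)
    if qa == qb:
        # same quadrant: decide with the cross product
        return bx * ay < by * ax
    return qa < qb

def sort_ccw(points, center):
    def compare(a, b):
        if less_ccw(center, a, b):
            return -1
        if less_ccw(center, b, a):
            return 1
        return 0
    return sorted(points, key=cmp_to_key(compare))
```

Note that the quadrants are half-open: a point exactly on the positive x-axis has dy = 0 and therefore lands in quadrant 3, not quadrant 0.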

vector3 a = new vector3(1, 0, 0); // reference direction: the x-axis
vector3 b = any_point - Center;
- y = (a x b) . n, where n is the plane normal (the signed value is needed, not the magnitude); x = a . b
- Atan2(y, x) gives the angle in the range -PI to +PI, in radians
- (angle + 2*PI) % (2*PI) maps it to the range 0 to 2*PI (in degrees: (angle % 360 + 360) % 360)
- sort the points into list_of_polygon_verts by this angle
Finally you get the vertices sorted anticlockwise.
list.Reverse() gives clockwise order.

I know this is somewhat of an old post with an excellent accepted answer, but I feel like I can still contribute something useful. All the answers so far essentially use a comparison function to compare two points and determine their order, but what if you want to use only one point at a time and a key function?
Not only is this possible, but the resulting code is also extremely compact. Here is the complete solution using Python's built-in sorted function:
import numpy as np
# Create some random points
num = 7
points = np.random.random((num, 2))
# Compute their center
center = np.mean(points, axis=0)
# Make arctan2 function that returns a value from [0, 2 pi) instead of [-pi, pi)
arctan2 = lambda s, c: angle if (angle := np.arctan2(s, c)) >= 0 else 2 * np.pi + angle
# Define the key function
def clockwise_around_center(point):
    diff = point - center
    rcos = np.dot(diff, center)
    # for 2D vectors this equals diff[0]*center[1] - diff[1]*center[0]
    rsin = np.cross(diff, center)
    return arctan2(rsin, rcos)
# Sort our points using the key function
sorted_points = sorted(points, key=clockwise_around_center)
This answer would also work in 3D, if the points are on a 2D plane embedded in 3D. We would only have to modify the calculation of rsin by dotting it with the normal vector of the plane. E.g.
rsin = np.dot([0,0,1], np.cross(diff, center))
if that plane has e_z as its normal vector.
The advantage of this code is that it works on only one point at a time using a key function. The quantity rsin, if you work it out on a coefficient level, is exactly the same as what is called det in the accepted answer, except that I compute it between point - center and center, not between point1 - center and point2 - center. But the geometrical meaning of this quantity is the radius times the sine of the angle, hence I call this variable rsin. Similarly for the dot product, which is the radius times the cosine of the angle and hence called rcos.
One could argue that this solution uses arctan2, and is therefore less clean. However, I personally think that the clarity of using a key function outweighs the need for one call to a trig function. Note that I prefer to have arctan2 return a value from [0, 2 pi), because then we get the angle 0 when point happens to be identical to center, and thus it will be the first point in our sorted list. This is an optional choice.
In order to understand why this code works, the crucial insight is that all our points are defined as arrows with respect to the origin, including the center point itself. So if we calculate point - center, this is equivalent to placing the arrow from the tip of center to the tip of point, at the origin. Hence we can sort the arrow point - center with respect to the angle it makes with the arrow pointing to center.

Here's a way to sort the vertices of a rectangle in clock-wise order. I modified the original solution provided by pyimagesearch and got rid of the scipy dependency.
import numpy as np
def pointwise_distance(pts1, pts2):
    """Calculates the distance between pairs of points
    Args:
        pts1 (np.ndarray): array of form [[x1, y1], [x2, y2], ...]
        pts2 (np.ndarray): array of form [[x1, y1], [x2, y2], ...]
    Returns:
        np.array: distances between corresponding points
    """
    dist = np.sqrt(np.sum((pts1 - pts2)**2, axis=1))
    return dist
def order_points(pts):
    """Orders points in form [top left, top right, bottom right, bottom left].
    Source: https://www.pyimagesearch.com/2016/03/21/ordering-coordinates-clockwise-with-python-and-opencv/
    Args:
        pts (np.ndarray): list of points of form [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]
    Returns:
        np.ndarray: the four points ordered [top left, top right, bottom right, bottom left], as float32
    """
    # sort the points based on their x-coordinates
    x_sorted = pts[np.argsort(pts[:, 0]), :]
    # grab the left-most and right-most points from the sorted
    # x-coordinate points
    left_most = x_sorted[:2, :]
    right_most = x_sorted[2:, :]
    # now, sort the left-most coordinates according to their
    # y-coordinates so we can grab the top-left and bottom-left
    # points, respectively
    left_most = left_most[np.argsort(left_most[:, 1]), :]
    tl, bl = left_most
    # now that we have the top-left coordinate, use it as an
    # anchor to calculate the Euclidean distance between the
    # top-left and right-most points; by the Pythagorean
    # theorem, the point with the largest distance will be
    # our bottom-right point. Note: this is a valid assumption because
    # we are dealing with rectangles only.
    # We need to use this instead of just using min/max to handle the case where
    # there are points that have the same x or y value.
    D = pointwise_distance(np.vstack([tl, tl]), right_most)
    br, tr = right_most[np.argsort(D)[::-1], :]
    # return the coordinates in top-left, top-right,
    # bottom-right, and bottom-left order
    return np.array([tl, tr, br, bl], dtype="float32")

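With numpy (sorting by increasing arctan2 angle gives counterclockwise order when y points up; reverse the result for clockwise):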
import matplotlib.pyplot as plt
import numpy as np
# List of coords
coords = np.array([7,7, 5, 0, 0, 0, 5, 10, 10, 0, 0, 5, 10, 5, 0, 10, 10, 10]).reshape(-1, 2)
centroid = np.mean(coords, axis=0)
sorted_coords = coords[np.argsort(np.arctan2(coords[:, 1] - centroid[1], coords[:, 0] - centroid[0])), :]
plt.scatter(coords[:,0],coords[:,1])
plt.plot(coords[:,0],coords[:,1])
plt.plot(sorted_coords[:,0],sorted_coords[:,1])
plt.show()

Related

How to vectorize calculation of homogenous transformation matrix/tensor?

For my simulation I need to calculate many transformation matrices therefore I would like to vectorize a for-loop that I'm using right now.
Is there a way to vectorize the existing for-loop or do I probably need another approach in calculating the vectors and matrices before?
I prepared a little working example:
n_dim = 1e5;
p1_3 = zeros(3,n_dim); % translationvector (no trans.) [3x100000]
tx = ones(1,n_dim)*15./180*pi; % turn angle around x-axis (fixed) [1x100000]
ty = zeros(1,n_dim); % turn angle around y-axis (no turn) [1x100000]
tz = randi([-180 180], 1, n_dim)./180*pi; % turn angle around z-axis (different turn) [1x100000]
hom = [0 0 0 1].*ones(n_dim,4); % vector needed for homogenous transformation [100000x4]
% calculate sin/cos values for rotation [100000x1 each]
cx = cos(tx)';
sx = sin(tx)';
cy = cos(ty)';
sy = sin(ty)';
cz = cos(tz)';
sz = sin(tz)';
% calculate rotation matrix [300000x3]
R_full = [ cy.*cz, -cy.*sz, sy; ...
cx.*sz+sx.*sy.*cz, cx.*cz-sx.*sy.*sz, -sx.*cy; ...
sx.*sz-cx.*sy.*cz, cz.*sx+cx.*sy.*sz, cx.*cy];
% preallocate transformation tensor
T = zeros(4,4,n_dim);
% create transformation tensor here
% T = [R11 R12 R13 p1;
% R21 R22 R23 p2;
% R31 R32 R33 p3;
% 0 0 0 1]
tic
for i = 1:n_dim
T(:,:,i) = [[R_full(i,1), R_full(i,2), R_full(i,3); ...
R_full(n_dim+i,1), R_full(n_dim+i,2), R_full(n_dim+i,3); ...
R_full(2*n_dim+i,1), R_full(2*n_dim+i,2), R_full(2*n_dim+i,3)], p1_3(:,i);
hom(i,:)];
end
toc
Try this:
T = permute(reshape(R_full,n_dim,3,3),[2,3,1]);
T(4,4,:) = 1;
Your method:
Elapsed time is 0.839315 seconds.
This method:
Elapsed time is 0.015389 seconds.
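Why the reshape/permute pair reproduces the loop can be checked with a small pure-Python emulation of MATLAB's column-major storage. This is only a sketch of the index arithmetic, using 0-based indices and nested lists; the function name reshape_permute is hypothetical and nothing here is meant as a replacement for the MATLAB code.

```python
def reshape_permute(R_full, n):
    """Emulate permute(reshape(R_full, n, 3, 3), [2, 3, 1]) for a (3n x 3) list of rows."""
    rows = 3 * n
    # MATLAB stores matrices column by column, so flatten in that order.
    flat = [R_full[row][col] for col in range(3) for row in range(rows)]
    # After reshape to (n, 3, 3), element (i, r, c) sits at linear index
    # i + n*r + 3*n*c. permute(..., [2, 3, 1]) moves it to position (r, c, i).
    return [
        [[flat[i + n * r + 3 * n * c] for i in range(n)] for c in range(3)]
        for r in range(3)
    ]
```

Entry (r, c) of sample i comes out equal to R_full[r*n + i][c], which is the same element the original loop reads as R_full(r*n_dim+i, c) in 1-based MATLAB notation.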
EDIT
I included Florian's answer, and of course he wins.
Are you ready for some crazy indexing foo? Here we go:
clear all;
close all;
clc;
n_dim_max = 200;
t_loop = zeros(n_dim_max, 1);
t_indexing = t_loop;
t_permute = t_loop;
fprintf("---------------------------------------------------------------\n");
for n_dim = 1:n_dim_max
p1_3 = zeros(3,n_dim); % translationvector (no trans.) [3x100000]
tx = ones(1,n_dim)*15./180*pi; % turn angle around x-axis (fixed) [1x100000]
ty = zeros(1,n_dim); % turn angle around y-axis (no turn) [1x100000]
tz = randi([-180 180], 1, n_dim)./180*pi; % turn angle around z-axis (different turn) [1x100000]
hom = [0 0 0 1].*ones(n_dim,4); % vector needed for homogenous transformation [100000x4]
% calculate sin/cos values for rotation [100000x1 each]
cx = cos(tx)';
sx = sin(tx)';
cy = cos(ty)';
sy = sin(ty)';
cz = cos(tz)';
sz = sin(tz)';
% calculate rotation matrix [300000x3]
R_full = [ cy.*cz, -cy.*sz, sy; ...
cx.*sz+sx.*sy.*cz, cx.*cz-sx.*sy.*sz, -sx.*cy; ...
sx.*sz-cx.*sy.*cz, cz.*sx+cx.*sy.*sz, cx.*cy];
% preallocate transformation tensor
T = zeros(4,4,n_dim);
% create transformation tensor here
% T = [R11 R12 R13 p1;
% R21 R22 R23 p2;
% R31 R32 R33 p3;
% 0 0 0 1]
tic
for i = 1:n_dim
T(:,:,i) = [[R_full(i,1), R_full(i,2), R_full(i,3); ...
R_full(n_dim+i,1), R_full(n_dim+i,2), R_full(n_dim+i,3); ...
R_full(2*n_dim+i,1), R_full(2*n_dim+i,2), R_full(2*n_dim+i,3)], p1_3(:,i);
hom(i,:)];
end
t_loop(n_dim) = toc;
tic
% preallocate transformation tensor
TT = zeros(4, 4);
TT(end) = 1;
TT = repmat(TT, 1, 1, n_dim);
% Crazy index finding.
temp = repmat(1:(3*n_dim):(3*3*n_dim), 3, 1) + n_dim .* ((0:2).' * ones(1, 3));
temp = repmat(temp, 1, 1, n_dim);
t = zeros(1, 1, n_dim);
t(:) = 0:(n_dim-1);
temp = temp + ones(3, 3, n_dim) .* t;
% Direct assignment using crazily found indices.
TT(1:3, 1:3, :) = R_full(temp);
t_indexing(n_dim) = toc;
tic
% preallocate transformation tensor
TTT = zeros(4, 4);
TTT(end) = 1;
TTT = repmat(TTT, 1, 1, n_dim);
TTT(1:3, 1:3, :) = permute(reshape(R_full, n_dim, 3, 3), [2, 3, 1]);
t_permute(n_dim) = toc;
% Check
fprintf("n_dim: %d\n", n_dim);
fprintf("T equals TT: %d\n", (sum(T(:) == TT(:))) == (4 * 4 * n_dim));
fprintf("T equals TTT: %d\n", (sum(T(:) == TTT(:))) == (4 * 4 * n_dim));
fprintf("---------------------------------------------------------------\n");
end
figure(1);
plot(1:n_dim_max, t_loop, 1:n_dim_max, t_indexing, 1:n_dim_max, t_permute);
legend({'Loop', 'Indexing', 'Permute'});
xlabel('Dimension');
ylabel('Elapsed time [s]');
Sorry, the script got lengthy, because it contains your initial solution, my solution (and Florian's solution), and the testing script all in one. Lazy Friday was the reason for me not to split things properly...
How did I get there? Simple "reverse engineering". I took your solution for n_dim = [2, 3, 4] and determined [~, ii] = ismember(T(1:3, 1:3, :), R_full), i.e. the mapping of R_full to T(1:3, 1:3, :). Then, I analyzed the indexing scheme, and found the proper solution to mimic that mapping for arbitrary n_dim. Done! ;-) Yes, I like crazy indexing stuff.

How to set a negative number to infinity without using an if statement (or ternary)

I have the following piece of code:
for(uint i=0; i<6; i++)
coeffs[i] = coeffs[i] < 0 ? 1.f/0.f : coeffs[i];
This checks an array of 6 elements; if it finds a negative entry it sets it to infinity, and otherwise it leaves the entry intact.
I need to do the same thing without using any if-statements.
One obvious question would be what infinity you need when the input is less than 0.
Any Infinity
If the result can be negative infinity, I'd do something like this:
coeffs[i] /= (coeffs[i] >= 0.0);
The coeffs[i] >= 0.0 comparison produces 1 if the input is non-negative, and 0 if the input is negative. Dividing the input by 1 leaves it unchanged. Dividing a negative input by 0 produces (negative) infinity.
Positive Infinity
If it has to be a positive infinity, you'd change that to something like:
coeffs[i] = fabs(coeffs[i]) / (coeffs[i] >= 0.0);
By taking the absolute value before the division, the infinity we produce for a negative is forced to be positive. Otherwise, the input started out positive, so the fabs and division by 1.0 leave the value intact.
Performance
As to whether this will actually improve performance, that's probably open to a lot more question. For the moment, let's look at code for the CPU, since Godbolt lets us examine that pretty easily.
Consider these two functions:
#include <limits>
double f(double in) {
return in / (in >= 0.0);
}
double g(double in) {
return in > 0.0 ? in : std::numeric_limits<double>::infinity();
}
So, let's look at the code produced for the first function:
xorpd xmm1, xmm1
cmplesd xmm1, xmm0
movsd xmm2, qword ptr [rip + .LCPI0_0] # xmm2 = mem[0],zero
andpd xmm2, xmm1
divsd xmm0, xmm2
ret
So that's not too terrible--branch-free, and (depending on the exact processor involved) a throughput around 8-10 cycles on most reasonably modern processors. On the other hand, here's the code produced for the second function:
xorpd xmm1, xmm1
cmpltsd xmm1, xmm0
andpd xmm0, xmm1
movsd xmm2, qword ptr [rip + .LCPI1_0] # xmm2 = mem[0],zero
andnpd xmm1, xmm2
orpd xmm0, xmm1
ret
This is also branch-free--and doesn't have that (relatively slow) divsd instruction either. Again, performance will vary depending on the specific processor, but we can probably plan on this having a throughput around 6 cycles or so--not tremendously faster than the previous, but probably at least a few cycles faster part of the time, and almost certain to never be any slower. In short, it's probably preferable under nearly any possible CPU.
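The and/andn/or sequence in the second listing is a mask-based select, and the same pattern can be sketched in Python on the raw IEEE 754 bits. This is an illustration only: the helper names are hypothetical, single precision is assumed to match the float array in the question, and Python itself gains nothing from avoiding branches. (The division trick cannot be shown directly, because Python raises ZeroDivisionError for float division by zero.)

```python
import struct

INF_BITS = 0x7F800000  # IEEE 754 single-precision +infinity
MASK32 = 0xFFFFFFFF

def float_to_bits(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]

def negative_to_inf(x):
    """Return +inf for negative x, x otherwise, selecting with a bit mask."""
    mask = -int(x < 0) & MASK32          # all ones if negative, else zero
    bits = float_to_bits(x)
    keep = bits & ~mask & MASK32         # original bits survive when mask is zero
    replace = INF_BITS & mask            # infinity bits survive when mask is all ones
    return bits_to_float(keep | replace)
```

The comparison still happens, as it does in the compiled code; what disappears is the conditional jump.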
GPU Code
GPUs have their own instruction sets, of course--but given the penalty they suffer for branches, compilers for them (and the instruction sets they provide) probably do at least as much to help eliminate branches as CPUs do, so chances are that the straightforward code will work just fine on it as well (though to say with certainty, you'd need to either examine the code it produced or profile it).
Big disclaimer up front: I haven't actually tested this, but I doubt it really is faster than using ternaries. Perform benchmarks to see if it really is an optimization!
Also: these are implemented/tested in C. They should be easily portable to GLSL, but you may need explicit type-conversions, which may make them (even) slower.
There are two ways to do it, based on whether you strictly need INFINITY or can just use a large value. Neither uses branching expressions or statements, but they do involve a comparison. Both use the fact that comparison operators in C always return either 0 or 1.
The INFINITY-based way uses a 2-element array and has the comparison output choose the element of the choice-array:
float chooseCoefs[2] = {0.f, INFINITY}; /* initialize choice-array */
for(uint i = 0; i < 6; i++){
int neg = coefs[i] < 0; /* outputs 1 or 0 */
/* set 0-element of choice-array to regular value */
chooseCoefs[0] = coefs[i];
/* if neg == 0: pick coefs[i], else neg == 1: pick INFINITY */
coefs[i] = chooseCoefs[neg];
}
If you can use a normal (but big) value instead of INFINITY you can use two multiplications and one addition instead:
#define BIGFLOAT 1000.f /* a swimming sasquatch... */
for(uint i = 0; i < 6; i++){
int neg = coefs[i] < 0;
/* if neg == 1: 1 * BIGFLOAT + 0 * coefs[i] == BIGFLOAT,
else neg == 0: 0 * BIGFLOAT + 1 * coefs[i] == coefs[i] */
coefs[i] = neg * BIGFLOAT + !neg * coefs[i];
}
Again, I didn't benchmark these, but my guess is that at least the array-based solution is far slower than simple ternaries. Don't underestimate the optimizing-power of your compiler!

How to divide by 9 using just shifts/add/sub?

Last week I was in an interview and there was a test like this:
Calculate N/9 (given that N is a positive integer), using only
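SHIFT LEFT, SHIFT RIGHT, ADD, SUBTRACT instructions.
First, find the representation of 1/9 in binary:
0.0001110001110001
which means it's (1/16) + (1/32) + (1/64) + (1/1024) + (1/2048) + (1/4096) + (1/65536),
so (x/9) is approximately (x>>4) + (x>>5) + (x>>6) + (x>>10) + (x>>11) + (x>>12) + (x>>16). Because the binary expansion is cut off and every shift rounds down, the sum can come out slightly low (for x = 9 it gives 0).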
Possible optimization (if loops are allowed):
if you loop over 0001110001110001b right shifting it each loop,
add "x" to your result register whenever the carry was set on this shift
and shift your result right each time afterwards,
your result is x/9
mov cx, 16 ; assuming 16 bit registers
mov bx, 7281 ; bit mask of 2^16 * (1/9)
mov ax, 8166 ; sample value, (1/9 of it is 907)
mov dx, 0 ; dx holds the result
div9:
inc ax ; or "add ax,1" if inc's not allowed :)
; workaround for the fact that 7/64
; are a bit less than 1/9
shr bx,1
jnc no_add
add dx,ax
no_add:
shr dx,1
dec cx
jnz div9
( currently cannot test this, may be wrong)
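The shift-and-add sum from the start of this answer can be tried out in Python. The sketch below also adds a correction step that is not part of the answer: it computes the remainder with one more shift and add, then bumps the quotient while the remainder is 9 or more. That step assumes comparisons and a loop are allowed, which the interview question may or may not permit; both function names are hypothetical.

```python
def div9_shift_sum(x):
    """Approximate x / 9 using only right shifts and additions."""
    return (x >> 4) + (x >> 5) + (x >> 6) + (x >> 10) + (x >> 11) + (x >> 12) + (x >> 16)

def div9(x):
    """Exact x // 9: shift-sum estimate plus a remainder correction."""
    q = div9_shift_sum(x)
    r = x - ((q << 3) + q)   # x - 9*q, with 9*q built from a shift and an add
    while r >= 9:            # the estimate never overshoots, so r >= 0 here
        q += 1
        r -= 9
    return q
```

For the sample value 8166 from the assembly listing, the raw sum gives 903 while the true quotient is 907, so some form of correction is needed if an exact result is required.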
You can use a fixed-point math trick.
You scale up so that the significant fractional part moves into the integer range, do the fractional math operation you need, and scale back.
a/9 = ((a*10000)/9)/10000
as you can see I scaled by 10000. Now the integer part of 10000/9=1111 is big enough so I can write:
a/9 = ~a*1111/10000
power of 2 scale
If you use a power-of-2 scale then you just need a bit-shift instead of a division. You need to compromise between precision and input value range. I empirically found that in 32-bit arithmetic the best scale for this is 1<<18, so:
(((a+1)<<18)/9)>>18 = ~a/9;
The (a+1) corrects the rounding errors back to the right range.
Hardcoded multiplication
Rewrite the multiplication constant to binary
q = (1<<18)/9 = 29127 = 0111 0001 1100 0111 bin
Now if you need to compute c=(a*q) use hard-coded binary multiplication: for each 1 of the q you can add a<<(position_of_1) to the c. If you see something like 111 you can rewrite it to 1000-1 minimizing the number of operations.
If you put all of this together you should get something like this C++ code of mine:
DWORD div9(DWORD a)
{
// ((a+1)*q)>>18 = (((a+1)<<18)/9)>>18 = ~a/9;
// q = (1<<18)/9 = 29127 = 0111 0001 1100 0111 bin
// valid for a = < 0 , 147455 >
DWORD c;
c =(a<< 3)-(a ); // c= a*29127
c+=(a<< 9)-(a<< 6);
c+=(a<<15)-(a<<12);
c+=29127; // c= (a+1)*29127
c>>=18; // c= ((a+1)*29127)>>18
return c;
}
Now if you look at the binary form, the pattern 111000 repeats, so you can further improve the code a bit:
DWORD div9(DWORD a)
{
DWORD c;
c =(a<<3)-a; // first pattern
c+=(c<<6)+(c<<12); // and the other 2...
c+=29127;
c>>=18;
return c;
}
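The stated validity range can be checked exhaustively. The Python sketch below mirrors the shorter div9 above; since Python integers do not overflow, a 32-bit mask is applied after each step to emulate DWORD wrap-around. The names MASK32 and div9_fixed_point are introduced here for illustration.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned (DWORD) wrap-around

def div9_fixed_point(a):
    """Compute ((a + 1) * 29127) >> 18 using only shifts, adds and subtracts."""
    c = ((a << 3) - a) & MASK32              # c = a * 7
    c = (c + (c << 6) + (c << 12)) & MASK32  # c = a * 7 * (1 + 64 + 4096) = a * 29127
    c = (c + 29127) & MASK32                 # c = (a + 1) * 29127
    return c >> 18
```

Looping over every a from 0 to 147455 and comparing against a // 9 confirms the range given in the comment; at a = 147456 the product no longer fits in 32 bits and the result is wrong.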

subtracting two 8 bits integer bit by bit in assembly x86

so I'm trying to implement this algorithm to calculate the difference of two 8 bits integers
b = 0
difference = 0
for i = 0 to (n-1)
x = bit i of X
y = bit i of Y
bit i of difference = x xor y xor b
b = ((not x) and y) or ((not x) and b) or (y and b)
end for loop
this is what i did
calculation:
mov ebx, 0
mov diff, 0
mov ecx, 7
subtract:
mov al, X
and al, 1h ; find bit i of X
mov dl, Y
and dl, 1h ; find bit i of Y
mov ah, al
mov dh, al
xor al, dl
xor al, bl
mov diff, al ; find bit i of the difference
; calculate b value for the next interation
not ah
and ah, dl
not dh
and dh, dl
and dl, bl
or ah, dh
or ah, dl
mov bl, ah
; rotate X and Y to get ready for the next iteration
rol X, 1
rol Y, 1
loop subtract
The problem with this code is that it only works on the first iteration of the loop.
For example, if I enter 2 as the first number and 1 as the second,
then on the first iteration the x value is 0 and the y value is 1, so bit i of the difference is 1 and the calculated b value is 1.
But this only works for the first iteration. On the next iteration I have x = 0, y = 0 and b = 1 (from the last calculation), so I expect diff to be 1 and b for this iteration to be 1; instead I get 0 for both.
Why doesn't the code work, given that I followed the algorithm and implemented it accordingly?
Thanks in advance.
Try a higher level language first to understand the algorithm, then port that to asm.
#include <stdio.h>
//b = 0
//difference = 0
//for i = 0 to (n-1)
//
// x = bit i of X
// y = bit i of Y
// bit i of difference = x xor y xor b
// b = ((not x) and y) or ((not x) and b) or (y and b)
//
//end for loop
int main ( void )
{
unsigned char X,Y,Z;
unsigned char x,y,z,b,bnext;
unsigned char i;
X=0Xf5; Y=0Xf1;
b=0;
Z=0;
for (i=1;i;i<<=1)
{
x=0;
y=0;
if(i&X) x=1;
if(i&Y) y=1;
z=((x^y)^b)&1;
if(z) Z|=i;
bnext = ((~x)&y) | ((~x)&b) | (y&b);
b=bnext&1;
}
printf("0x%02X 0x%02X\n",Z,X-Y);
return(0);
}
you might even re-write it a few times to approach real instructions.
z=((x^y)^b)&1;
becomes
z = x;
z = z ^ y;
z = z ^ b;
z = z & 1;
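The same borrow algorithm can also be written compactly in Python, which makes it easy to test against ordinary subtraction for all 8-bit inputs. The function name and the default width n=8 are assumptions of this sketch; (1 - x) is used as the one-bit "not x" so that no stray upper bits appear.

```python
def subtract_bitwise(X, Y, n=8):
    """Compute (X - Y) mod 2**n one bit at a time with a borrow flag."""
    b = 0
    difference = 0
    for i in range(n):
        x = (X >> i) & 1          # bit i of X
        y = (Y >> i) & 1          # bit i of Y
        difference |= (x ^ y ^ b) << i
        # borrow out: ((not x) and y) or ((not x) and b) or (y and b)
        b = ((1 - x) & y) | ((1 - x) & b) | (y & b)
    return difference
```

When porting to assembly it may be worth keeping in mind that a plain NOT on an 8-bit register flips all eight bits, so the result usually has to be masked back to one bit, as the C code above does with & 1.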

2D Grid Direction Iteration

I'm looking for some sort of formula that, for each i from 0 to 7, will return an x and a y offset to an adjacent cell in a certain direction. The idea is that if I'm in a grid of cells and I want to scan the surrounding cells, I don't have to make a Christmas tree of if statements (much, much slower than arithmetic). Note that this scan includes the diagonals. I've been looking online for something like this, but with no luck.
The directions can be output in any order as long as each input yields a different output, x and y can only equal 1, 0 or -1, and none of the outputs are (0, 0).
Assuming that x,y is original coordinate and nx,ny will be the current neighbour:
for (int cx = -1; cx <= 1; ++cx)
for (int cy = -1; cy <= 1; ++cy)
if (cx != 0 || cy != 0) // skip only the center cell (0, 0)
{
int nx = x + cx;
int ny = y + cy;
// do whatever you like
}
or just use constants:
int delta[8][2] = {{1,0},{-1,0},{0,1},{0,-1},{1,1},{1,-1},{-1,1},{-1,-1}};
for i in range(3):
    for j in range(3):
        if (i-1) or (j-1):
            print(i-1, j-1)
-1 -1
-1 0
-1 1
0 -1
0 1
1 -1
1 0
1 1
does this work?
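Since the question asks for a formula in terms of i, here is one possible closed form, sketched in Python. The idea is to number the cells of the 3x3 block 0 to 8 and skip index 4, the center. The function name neighbor_offset is hypothetical, and the comparison i >= 4 is used as a 0/1 value, not as a branch.

```python
def neighbor_offset(i):
    """Map i in 0..7 to a distinct (dx, dy) offset with dx, dy in {-1, 0, 1}, never (0, 0)."""
    k = i + int(i >= 4)       # skip index 4, the center cell of the 3x3 block
    return k % 3 - 1, k // 3 - 1
```

In a language without a cheap integer division, the constant lookup table shown above is likely the simpler choice.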
