I'm using a 64-bit LCG (MMIX, by Knuth). It generates blocks of random numbers inside my code, which uses them to perform some operations. My code runs on a single core and I would like to parallelize the work to reduce the execution time.
Before moving on to more advanced methods, I'd like to simply run several identical copies of the code in parallel (the code repeats the same task over a certain number of independent simulations, so I can split the simulations between identical copies and run them in parallel).
My only problem now is to find a seed for each copy. To avoid unwanted non-trivial correlations between data generated by different copies, I have to be sure that the random numbers generated by the various copies don't overlap. To do so, starting from a certain seed in the first copy, I need to find a value (the next seed) that is very distant, not in absolute value but in the pseudo-random sequence (that is, going from the first seed to the second takes a huge number of LCG steps).
My first attempt was this:
starting from the LCG relation between 2 consecutive numbers generated in the sequence
So, in principle, I could evaluate the above relation with, say, n = 2^40 and I_0 equal to the first seed, and obtain a new seed 2^40 steps away from the first one in the LCG sequence.
The problem is that, doing so, I necessarily overflow when calculating a^n. For MMIX (by Knuth) a ~ 2^62 and I use unsigned long long int (64 bits). Note that the only problem here is the fraction in the above relation. If there were only sums and multiplications, I could ignore the overflow thanks to the following modular properties (I'm using 2^64 as c, the modulus of a 64-bit generator):
So, starting from a certain value (the first seed), how can I find a second one that is a huge number of steps away in the LCG pseudo-random sequence?
[EDIT]
r3mainer's solution is perfectly suited to Python code. I'm now trying to implement it in C using unsigned __int128 variables. I have only one problem: in principle I should compute:
Say, for simplicity, I want to compute:
with n = 2^40 and c(a-1) ~ 2^126. I proceed with a loop. Starting with temp = a, in each iteration I compute temp = temp*temp and then temp % (c(a-1)). The problem is in the multiplication step (temp = temp*temp). temp could, in principle, be any number < c(a-1) ~ 2^126. If temp is large, say > 2^64, the product overflows past 2^128 - 1 before the next modulo operation. Is there a way to avoid this? For now the only solution I see is to perform each multiplication with a loop over the bits, as suggested here: c code: prevent overflow in modular operation with huge modules (modules near the overflow treshold)
Is there another way to perform the modulo operation during the multiplication?
(Note that, since c = 2^64, the mod c operation doesn't have the same problem, because the overflow point of unsigned long long variables coincides with the modulus.)
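The bit-loop multiplication mentioned above can be sketched as follows. This is only an illustration in Python (chosen because its arbitrary-precision integers make the result easy to check); the names `mulmod` and `powmod` and the `width` parameter are hypothetical, and the asserts simply confirm that no intermediate value would exceed a 128-bit word as long as the modulus is below 2^127.

```python
def mulmod(x, y, mod, width=128):
    """Return (x * y) % mod while keeping every intermediate below 2**width.

    Shift-and-add multiplication. Assumes mod < 2**(width - 1), so that
    doubling a reduced value or adding two reduced values cannot overflow.
    """
    limit = 1 << width
    assert 0 < mod < (limit >> 1)
    x %= mod
    y %= mod
    result = 0
    while y:
        if y & 1:
            result += x            # both terms < mod, so the sum is < 2*mod
            assert result < limit
            if result >= mod:
                result -= mod
        x <<= 1                    # x < mod, so 2*x < 2*mod
        assert x < limit
        if x >= mod:
            x -= mod
        y >>= 1
    return result


def powmod(base, exp, mod):
    """Square-and-multiply exponentiation built on mulmod."""
    result = 1 % mod
    base %= mod
    while exp:
        if exp & 1:
            result = mulmod(result, base, mod)
        base = mulmod(base, base, mod)
        exp >>= 1
    return result
```

With a = 6364136223846793005 and modulus 2^64 * (a - 1), which is a little below 2^127, `powmod(a, 2**40, modulus)` should agree with Python's built-in `pow(a, 2**40, modulus)`. The same loop structure carries over to C with `unsigned __int128`, at the cost of roughly 127 iterations per multiplication.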
Any LCG of the form x[n+1] = (x[n] * a + c) % m can be skipped to an arbitrary position very quickly.
Starting with a seed value of zero, the first few iterations of the LCG will give you this sequence:
x₀ = 0
x₁ = c % m
x₂ = (c(a + 1)) % m
x₃ = (c(a² + a + 1)) % m
x₄ = (c(a³ + a² + a + 1)) % m
It's pretty easy to see that each term is actually the sum of a geometric series, which can be calculated with a simple formula:
x_n = (c(a^{n-1} + a^{n-2} + ... + a + 1)) % m
= (c * (a^n - 1) / (a - 1)) % m
The (a^n - 1) term can be calculated quickly by modular exponentiation, but dividing by (a-1) is a bit tricky because (a-1) and m are both even (i.e., not coprime), so we can't calculate the modular multiplicative inverse of (a-1) mod m directly.
Instead, calculate (a^n-1) mod m*(a-1), then perform a straightforward (non-modular) division of the result by a-1. In Python, the calculation would go something like this:
def lcg_skip(m, a, c, n):
    # Calculate nth term of LCG sequence with parameters m (modulus),
    # a (multiplier) and c (increment), assuming an initial seed of zero
    a1 = a - 1
    t = pow(a, n, m * a1) - 1
    t = (t * c // a1) % m
    return t

def test(nsteps):
    m = 2**64
    a = 6364136223846793005
    c = 1442695040888963407
    #
    print("Calculating by brute force:")
    seed = 0
    for i in range(nsteps):
        seed = (seed * a + c) % m
    print(seed)
    #
    print("Calculating by fast method:")
    # Calculate nth term by modular exponentiation
    print(lcg_skip(m, a, c, nsteps))

test(1000000)
So to create LCGs with non-overlapping output sequences, all you would need to do is use initial seed values generated by lcg_skip() with values of n that are far enough apart.
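If the first seed is not zero, the same closed form still applies with one extra term: x_n = (a^n * x_0 + c * (a^n - 1) / (a - 1)) mod m. The sketch below is a hedged extension of lcg_skip() along those lines; `lcg_jump`, `worker_seeds` and the 2^40 stride are illustrative names and values, not part of the answer above.

```python
M = 2**64
A = 6364136223846793005
C = 1442695040888963407


def lcg_jump(seed, n, m=M, a=A, c=C):
    """State reached after n LCG steps starting from an arbitrary seed."""
    a1 = a - 1
    an = pow(a, n, m * a1)      # a^n modulo m*(a-1)
    series = (an - 1) // a1     # 1 + a + ... + a^(n-1), correct modulo m
    return (an * seed + c * series) % m


def worker_seeds(first_seed, workers, stride=2**40):
    """One starting state per worker, each `stride` steps after the previous."""
    return [lcg_jump(first_seed, k * stride) for k in range(workers)]
```

Each worker then has `stride` numbers to itself before it would run into the next worker's block, so the stride should be chosen larger than the number of draws any single run needs.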
For an LCG it is a known property that you can jump forward and backward in O(log2(N)) time, where N is the distance between the jump points; see the paper by F. Brown, "Random Number Generation with Arbitrary Stride," Trans. Am. Nucl. Soc. (Nov. 1994).
This means that if your LCG parameters (a, c) satisfy the Hull–Dobell theorem, the whole period is 2^64 numbers before the sequence repeats itself. For Nt threads you use a jump distance of 2^64 / Nt: all threads start with the same seed and, after initializing the LCG, jump by (2^64 / Nt)*threadId. As long as each thread draws fewer than 2^64 / Nt numbers, you are completely safe from RNG correlations due to overlapping sequences.
For the simplest case of ordinary 64-bit unsigned modular arithmetic, as implemented in NumPy, the code below should work fine
import numpy as np

class LCG(object):

    UZERO: np.uint64 = np.uint64(0)
    UONE : np.uint64 = np.uint64(1)

    def __init__(self, seed: np.uint64, a: np.uint64, c: np.uint64) -> None:
        self._seed: np.uint64 = np.uint64(seed)
        self._a   : np.uint64 = np.uint64(a)
        self._c   : np.uint64 = np.uint64(c)

    def next(self) -> np.uint64:
        self._seed = self._a * self._seed + self._c
        return self._seed

    def seed(self) -> np.uint64:
        return self._seed

    def set_seed(self, seed: np.uint64) -> None:
        self._seed = seed

    def skip(self, ns: np.int64) -> None:
        """
        Signed argument - skip forward as well as backward

        The algorithm here to determine the parameters used to skip ahead is
        described in the paper F. Brown, "Random Number Generation with Arbitrary Stride,"
        Trans. Am. Nucl. Soc. (Nov. 1994). This algorithm is able to skip ahead in
        O(log2(N)) operations instead of O(N). It computes parameters
        A and C which can then be used to find x_N = A*x_0 + C mod 2^M.
        """

        # Wrap negative skips modulo 2^64: skipping back by k is the same as
        # skipping forward by 2^64 - k (avoids relying on a negative-to-uint64 cast)
        nskip: np.uint64 = np.uint64(int(ns) % 2**64)

        a: np.uint64 = self._a
        c: np.uint64 = self._c

        a_next: np.uint64 = LCG.UONE
        c_next: np.uint64 = LCG.UZERO

        while nskip > LCG.UZERO:
            if (nskip & LCG.UONE) != LCG.UZERO:
                a_next = a_next * a
                c_next = c_next * a + c

            c = (a + LCG.UONE) * c
            a = a * a

            nskip = nskip >> LCG.UONE

        self._seed = a_next * self._seed + c_next

#%%
np.seterr(over='ignore')

seed = np.uint64(1)

rng64 = LCG(seed, np.uint64(6364136223846793005), np.uint64(1))

print(rng64.next())
print(rng64.next())
print(rng64.next())

#%%
rng64.skip(-3) # back by 3
print(rng64.next())
print(rng64.next())
print(rng64.next())

rng64.skip(-3) # back by 3
rng64.skip(2) # forward by 2
print(rng64.next())
Tested in Python 3.9.1, x64 Win 10
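The skip-ahead recurrence used in skip() can also be cross-checked with plain Python integers, which avoids any dependence on NumPy overflow behaviour. The following is a sketch under the assumption of a full-period LCG modulo 2^64; `skip_params` and `thread_start_states` are hypothetical helper names, and the per-thread stride of 2^64 / Nt follows the partitioning described above.

```python
MASK = 2**64 - 1


def skip_params(a, c, n):
    """Return (A, C) such that x_{k+n} = (A * x_k + C) mod 2**64."""
    n &= MASK                      # a negative n wraps to 2**64 - |n| (backward jump)
    A, C = 1, 0
    while n:
        if n & 1:                  # fold the current 2**k-step map into the result
            A = (A * a) & MASK
            C = (C * a + c) & MASK
        c = ((a + 1) * c) & MASK   # compose the step map with itself
        a = (a * a) & MASK
        n >>= 1
    return A, C


def thread_start_states(seed, a, c, n_threads):
    """Starting state for each thread, spaced 2**64 // n_threads steps apart."""
    stride = 2**64 // n_threads
    states = []
    for tid in range(n_threads):
        A, C = skip_params(a, c, tid * stride)
        states.append((A * seed + C) & MASK)
    return states
```

For a small n the pair returned by `skip_params` can be compared against n explicit LCG steps, and `skip_params(a, c, -n)` should map the result back to the original state.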
It is a straightforward question: Is there a faster alternative to all(a(:,i)==a,1) in MATLAB?
I'm thinking of an implementation that benefits from short-circuit evaluation throughout the whole process. I mean, all() definitely benefits from short-circuit evaluation, but a(:,i)==a doesn't.
I tried the following code,
% example for the input matrix
m = 3;                  % m and n aren't necessarily equal to those values.
n = 5000;               % It's only possible to know in advance that 'm' << 'n'.
a = randi([0,5],m,n);   % the maximum value of 'a' isn't necessarily equal to
                        % 5 but it's possible to state that every element in
                        % 'a' is a positive integer.

% all, equal solution
tic
for i = 1:n % stepping up the elapsed time in orders of magnitude
    %%%%%%%%%% all and equal solution %%%%%%%%%
    ax_boo = all(a(:,i)==a,1);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
toc

% alternative solution
tic
for i = 1:n % stepping up the elapsed time in orders of magnitude
    %%%%%%%%%%% alternative solution %%%%%%%%%%%
    ax_boo = a(1,i) == a(1,:);
    for k = 2:m
        ax_boo(ax_boo) = a(k,i) == a(k,ax_boo);
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
toc
but it's intuitive that any "for-loop-solution" within the MATLAB environment will be naturally slower. I'm wondering if there is a MATLAB built-in function written in a faster language.
EDIT:
After running more tests I found out that the implicit expansion does have a performance impact in evaluating a(:,i)==a. If the matrix a has more than one row, all(repmat(a(:,i),[1,n])==a,1) may be faster than all(a(:,i)==a,1) depending on the number of columns (n). For n=5000 repmat explicit expansion has proved to be faster.
But I think that a generalization of Kenneth Boyd's answer is the "ultimate solution" if all elements of a are positive integers. Instead of dealing with a (m x n matrix) in its original form, I will store and deal with adec (1 x n matrix):
exps = ((0):(m-1)).';
base = max(a,[],[1,2]) + 1;
adec = sum( a .* base.^exps , 1 );
In other words, each column will be encoded to one integer. And of course adec(i)==adec is faster than all(a(:,i)==a,1).
EDIT 2:
I forgot to mention that adec approach has a functional limitation. At best, storing adec as uint64, the following inequality must hold base^m < 2^64 + 1.
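The column-encoding idea and its size limit can be illustrated with a short sketch. It is written in Python rather than MATLAB purely for illustration; `encode_columns` is a hypothetical name, and each column is assumed to be given as a tuple of non-negative integers.

```python
def encode_columns(columns, limit=2**64):
    """Encode every column as a single integer in radix `base`.

    `columns` is a list of equal-length tuples of non-negative integers.
    Two columns get the same code exactly when they are equal element-wise.
    """
    base = max(max(col) for col in columns) + 1
    m = len(columns[0])
    if base ** m > limit:
        # Largest possible code is base**m - 1, which must fit in 64 bits
        raise OverflowError("base**m exceeds the range of a 64-bit unsigned integer")
    return [sum(v * base**k for k, v in enumerate(col)) for col in columns]


codes = encode_columns([(1, 2, 3), (1, 2, 3), (0, 5, 1)])
matches_first = [code == codes[0] for code in codes]
```

Here `matches_first` plays the role of adec(1)==adec: a single integer comparison per column instead of m element comparisons.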
Since your goal is to count the number of columns that match, my example converts the binary encoding of each column to a decimal integer; then you just loop over the possible values (with 3 rows there are 8 possible values) and count the number of matches.
a_dec = 2.^(0:(m-1)) * a;
num_poss_values = 2 ^ m;
num_matches = zeros(num_poss_values, 1);
for i = 1:num_poss_values
    num_matches(i) = sum(a_dec == (i - 1));
end
On my computer, using 2020a, here are the execution times for your first 2 options and the code above:
Elapsed time is 0.246623 seconds.
Elapsed time is 0.553173 seconds.
Elapsed time is 0.000289 seconds.
So my code is 853 times faster!
I wrote my code so it will work with m being an arbitrary integer.
The num_matches variable contains the number of columns that add up to 0, 1, 2, ...7 when converted to a decimal.
As an alternative you can use the third output of unique:
[~, ~, iu] = unique(a.', 'rows');
for i = 1:n
    ax_boo = iu(i) == iu;
end
As indicated in a comment:
ax_boo isolates the indices of the columns I have to sum in a row vector b. So, basically the next line would be something like c = sum(b(ax_boo),2);
It is a typical usage of accumarray:
[~, ~, iu] = unique(a.', 'rows');
C = accumarray(iu,b);
for i = 1:n
    c = C(iu(i)); % C is indexed by group, so look up the group of column i
end
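The unique-plus-accumarray pattern amounts to grouping identical columns and summing b within each group. A rough Python analogue is sketched below for illustration only; `group_sums` is a hypothetical name, and columns are assumed to be given as tuples so they can serve as dictionary keys.

```python
from collections import defaultdict


def group_sums(columns, b):
    """For each column, return the sum of b over all columns identical to it."""
    totals = defaultdict(int)
    for col, value in zip(columns, b):
        totals[tuple(col)] += value            # accumulate per distinct column
    return [totals[tuple(col)] for col in columns]


sums = group_sums([(1, 2), (3, 4), (1, 2)], [10, 20, 30])
```

The grouping is done once, so the per-column result is a lookup rather than a fresh comparison against all n columns.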
Context: I am working on implementing a navigation system for the mobile computers added by OpenComputers, a Minecraft mod. For those not familiar with the mod, it basically adds a variety of Lua-programmable, upgradable computers, including mobile ones - namely, robots, drones, and tablets. One of the many challenges often arising when trying to program robots and drones to carry out an autonomous task is to ensure they know their coordinates at all times.
The simplest solution would be to use Navigation upgrade, which does exactly that - provides the computer with its exact coordinates relative to the center of the map it was crafted with. It has two major downsides, however - it takes up a Tier II upgrade slot, which is no small thing, and is limited to the area of the map. The latter is more or less acceptable, but still makes this navigation method unavailable for some usage cases.
Another solution would be to make computers memorise their coordinates once and then keep track of their movements, but this has a number of potential caveats, too - you have to control all movement through custom subroutines or use hacks to intercept component calls, you can't move a computer without having to manually enter the coordinates each time, there are some precision errors for the drones and this won't work at all for the tablets.
A third method - the one I'm working on - is similar to the real life GPS. It's based on the fact that computers can be upgraded with wireless network cards to be able to send messages to each other within a quite large distance of 400 blocks, and along with the message itself they receive an exact distance (floating point number, in blocks) between the sender and the receiver. If we designate some fixed computers as "satellites" which constantly broadcast their position, we can make a mobile computer able to trilaterate its exact position using information from 4+ satellites.
This approach is scalable (you can just keep adding more satellites to the network to expand its coverage), does not take up an extra upgrade slot for navigation purposes only (since many mobile computers are upgraded with wireless network cards already) and precise, which gives it a clear advantage over two other methods. However, it requires some surprisingly complicated calculations, and this is where I'm getting stuck.
Problem: I need to find a trilateration algorithm (ideally with a code example) which would allow any mobile computer to calculate its position (within a margin of error of ~0.25 blocks) knowing the coordinates of the designated "satellites" and the distances to them. We assume that all computers and satellites are equipped with Tier II wireless cards (i.e. that they can send messages to each other within a total range of 400 blocks and know the distance between a sender and itself with the precision allowed by float32 numbers). The solution will be coded in pure Lua without accessing any third-party services, so packages like Mathematica are a no-go. Currently I'm betting on some sort of fitting method, though I don't know how to implement one or how well it could be adapted to the possibility of some satellites in range broadcasting a wrong position.
On the most basic level, we can assume there are 4 satellites which constantly and correctly broadcast their position, are set apart from each other at a moderate distance and do not lie on a single 2D plane. There are some optional conditions which the algorithm should ideally be able to adapt to - see section below.
Bonus points for:
Making the algorithm small enough to fit into the 2KB memory of the drone (assuming UTF8 encoding). It should take well less space than that, though, so that a main program could fit too. The smaller, the better.
Making an algorithm which allows the satellites to be very close to each other and to have non-integer coordinates (to allow for replacing several fixed satellites with one constantly moving robot or drone, or for making the mobile computer itself move as it takes measurements from a single satellite).
Making an algorithm which allows for fewer than 4 satellites to be present, assuming the position can still be determined - for instance, if the mobile computer in question is a robot and all but one of the possible positions are below or above the allowed height range for blocks (y<0 or y>255). Such a setup is possible if there are three satellites positioned at the height of, say, y=255.
Making an algorithm which is resistant to some satellites broadcasting a slightly wrong position (a minor mistake in the setup). Given the presence of enough correct measurements, the algorithm should deduce the correct position or flat out throw an error. Ideally, it can also log the location of the "off" satellite.
Making an algorithm which is resistant to the simultaneous presence of two or more groups of satellites correctly broadcasting their positions in different coordinate systems (a major mistake in the setup). Each network has a (supposedly unique) identifier that makes it possible to distinguish between different networks independently set up by different players (or, well, just one). If, however, they didn't bother to set the identifiers properly, different signals can mix, confusing the mobile computer. A resistant algorithm should therefore be able to detect this situation and either flat out throw an error or distinguish between the networks (then it could be fine-tuned to suit the purposes of a specific application - i.e. refuse to load, choose the closest network, choose the biggest network, prompt the user or the controlling server, etc.).
What I tried: Besides trying to solve the problem by myself, I've also tried to look up a fitting solution on the internet. However, none of the solutions I could find were fit for this task.
Most of the stuff I've found by googling up "trilateration algorithms" was dealing with the real-life GPS systems - that is, using just 2 coordinates, strongly accounting for errors and not giving enough precision in general.
Some, on the opposite, were purely mathematical, suggesting building series of equations to find the intersection points of the spheres. Sadly, as far as my weak mathematical background allows me to understand, this approach does not account for precision errors of floating numbers - circles do not quite intersect, points are not quite in the same locations, and so the equations do not have solutions.
Some seemed to explain the solution, but involved a lot of complicated math I couldn't understand and did not include an exact algorithm or at least a code example.
At least one used external packages like Mathematica, which, again, are not available in this case.
If I left some important points unclear, please leave a comment so that I could improve the question. Thanks in advance!
Such a trilateration system has already been developed for a different mod, ComputerCraft. Since it is probably not directly compatible with your setup, you will have to modify and adapt its logic, but the algorithm itself should work.
Here is the Source Code
CHANNEL_GPS = 65534

local function trilaterate( A, B, C )
    local a2b = B.vPosition - A.vPosition
    local a2c = C.vPosition - A.vPosition

    if math.abs( a2b:normalize():dot( a2c:normalize() ) ) > 0.999 then
        return nil
    end

    local d = a2b:length()
    local ex = a2b:normalize( )
    local i = ex:dot( a2c )
    local ey = (a2c - (ex * i)):normalize()
    local j = ey:dot( a2c )
    local ez = ex:cross( ey )

    local r1 = A.nDistance
    local r2 = B.nDistance
    local r3 = C.nDistance

    local x = (r1*r1 - r2*r2 + d*d) / (2*d)
    local y = (r1*r1 - r3*r3 - x*x + (x-i)*(x-i) + j*j) / (2*j)

    local result = A.vPosition + (ex * x) + (ey * y)

    local zSquared = r1*r1 - x*x - y*y
    if zSquared > 0 then
        local z = math.sqrt( zSquared )
        local result1 = result + (ez * z)
        local result2 = result - (ez * z)

        local rounded1, rounded2 = result1:round( 0.01 ), result2:round( 0.01 )
        if rounded1.x ~= rounded2.x or rounded1.y ~= rounded2.y or rounded1.z ~= rounded2.z then
            return rounded1, rounded2
        else
            return rounded1
        end
    end
    return result:round( 0.01 )
end

local function narrow( p1, p2, fix )
    local dist1 = math.abs( (p1 - fix.vPosition):length() - fix.nDistance )
    local dist2 = math.abs( (p2 - fix.vPosition):length() - fix.nDistance )

    if math.abs(dist1 - dist2) < 0.01 then
        return p1, p2
    elseif dist1 < dist2 then
        return p1:round( 0.01 )
    else
        return p2:round( 0.01 )
    end
end

function locate( _nTimeout, _bDebug )
    -- Let command computers use their magic fourth-wall-breaking special abilities
    if commands then
        return commands.getBlockPosition()
    end

    -- Find a modem
    local sModemSide = nil
    for n,sSide in ipairs( rs.getSides() ) do
        if peripheral.getType( sSide ) == "modem" and peripheral.call( sSide, "isWireless" ) then
            sModemSide = sSide
            break
        end
    end

    if sModemSide == nil then
        if _bDebug then
            print( "No wireless modem attached" )
        end
        return nil
    end

    if _bDebug then
        print( "Finding position..." )
    end

    -- Open a channel
    local modem = peripheral.wrap( sModemSide )
    local bCloseChannel = false
    if not modem.isOpen( os.getComputerID() ) then
        modem.open( os.getComputerID() )
        bCloseChannel = true
    end

    -- Send a ping to listening GPS hosts
    modem.transmit( CHANNEL_GPS, os.getComputerID(), "PING" )

    -- Wait for the responses
    local tFixes = {}
    local pos1, pos2 = nil, nil
    local timeout = os.startTimer( _nTimeout or 2 )
    while true do
        local e, p1, p2, p3, p4, p5 = os.pullEvent()
        if e == "modem_message" then
            -- We received a reply from a modem
            local sSide, sChannel, sReplyChannel, tMessage, nDistance = p1, p2, p3, p4, p5
            if sSide == sModemSide and sChannel == os.getComputerID() and sReplyChannel == CHANNEL_GPS and nDistance then
                -- Received the correct message from the correct modem: use it to determine position
                if type(tMessage) == "table" and #tMessage == 3 then
                    local tFix = { vPosition = vector.new( tMessage[1], tMessage[2], tMessage[3] ), nDistance = nDistance }
                    if _bDebug then
                        print( tFix.nDistance.." metres from "..tostring( tFix.vPosition ) )
                    end
                    if tFix.nDistance == 0 then
                        pos1, pos2 = tFix.vPosition, nil
                    else
                        table.insert( tFixes, tFix )
                        if #tFixes >= 3 then
                            if not pos1 then
                                pos1, pos2 = trilaterate( tFixes[1], tFixes[2], tFixes[#tFixes] )
                            else
                                pos1, pos2 = narrow( pos1, pos2, tFixes[#tFixes] )
                            end
                        end
                    end
                    if pos1 and not pos2 then
                        break
                    end
                end
            end

        elseif e == "timer" then
            -- We received a timeout
            local timer = p1
            if timer == timeout then
                break
            end

        end
    end

    -- Close the channel, if we opened one
    if bCloseChannel then
        modem.close( os.getComputerID() )
    end

    -- Return the response
    if pos1 and pos2 then
        if _bDebug then
            print( "Ambiguous position" )
            print( "Could be "..pos1.x..","..pos1.y..","..pos1.z.." or "..pos2.x..","..pos2.y..","..pos2.z )
        end
        return nil
    elseif pos1 then
        if _bDebug then
            print( "Position is "..pos1.x..","..pos1.y..","..pos1.z )
        end
        return pos1.x, pos1.y, pos1.z
    else
        if _bDebug then
            print( "Could not determine position" )
        end
        return nil
    end
end
From https://github.com/dan200/ComputerCraft/blob/master/src/main/resources/assets/computercraft/lua/rom/apis/gps.lua
Ask if you have any specific questions about the source code.
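For readers who want to experiment with the geometry outside the game, the core of trilaterate() and narrow() can be sketched in Python with plain tuples. This is a simplified illustration, not a port of the ComputerCraft API: the vector helpers and the function names are hypothetical, rounding is omitted, and the collinearity tolerance is an arbitrary assumption.

```python
import math


def sub(u, v): return tuple(a - b for a, b in zip(u, v))
def add(u, v): return tuple(a + b for a, b in zip(u, v))
def scale(u, k): return tuple(a * k for a in u)
def dot(u, v): return sum(a * b for a, b in zip(u, v))
def norm(u): return math.sqrt(dot(u, u))
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def trilaterate(p1, r1, p2, r2, p3, r3):
    """Return one or two candidate positions from three (position, distance) fixes."""
    a2b, a2c = sub(p2, p1), sub(p3, p1)
    d = norm(a2b)
    ex = scale(a2b, 1 / d)                    # local x axis: from p1 towards p2
    i = dot(ex, a2c)
    ey_raw = sub(a2c, scale(ex, i))
    if norm(ey_raw) < 1e-9 * norm(a2c):
        raise ValueError("satellites are (nearly) collinear")
    ey = scale(ey_raw, 1 / norm(ey_raw))      # local y axis: in the satellite plane
    j = dot(ey, a2c)
    ez = cross(ex, ey)                        # local z axis: normal to that plane
    x = (r1 * r1 - r2 * r2 + d * d) / (2 * d)
    y = (r1 * r1 - r3 * r3 - x * x + (x - i) ** 2 + j * j) / (2 * j)
    base = add(p1, add(scale(ex, x), scale(ey, y)))
    z_squared = r1 * r1 - x * x - y * y
    if z_squared <= 0:
        return [base]
    z = math.sqrt(z_squared)
    return [add(base, scale(ez, z)), sub(base, scale(ez, z))]


def narrow(candidates, p4, r4):
    """Pick the candidate whose distance to a fourth satellite fits best."""
    return min(candidates, key=lambda p: abs(norm(sub(p, p4)) - r4))
```

Three fixes generally leave two mirror-image candidates on either side of the satellite plane; a fourth satellite outside that plane, or a height constraint, is what resolves the ambiguity.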
The function trilateration expects a list of satellites (their coordinates and their distances from the mobile computer) and the previous coordinates of the mobile computer.
Gather only satellites from your own group; exclude satellites from all other groups.
Some of your satellites might send incorrect data; that's OK.
If there are not enough satellites accessible, the function returns nil, as it can't determine the current position.
Otherwise, the function returns the current coordinates of the mobile computer and a list of indices of the satellites blamed as incorrect.
In case of ambiguity, the new position is selected as the one nearest to the previous position of the mobile computer.
The output coordinates are integers; the Y coordinate is limited to the range 0..255.
The following conditions should be satisfied for proper trilateration:
(number_of_correct_satellites) must be >= 3
(number_of_correct_satellites) must be >= 4 if at least one incorrect satellite exists
(number_of_correct_satellites) must be > (number_of_incorrect_satellites)
Recognizing an incorrect satellite is a costly CPU operation.
Once a satellite is recognized as incorrect, please store it in some blacklist and exclude it from all future calculations.
do
    local floor, exp, max, min, abs, table_insert = math.floor, math.exp, math.max, math.min, math.abs, table.insert

    local function try_this_subset_of_sat(satellites, is_sat_incorrect, X, Y, Z)
        local last_max_err, max_err = math.huge
        for k = 1, math.huge do
            local oldX, oldY, oldZ = X, Y, Z
            local DX, DY, DZ = 0, 0, 0
            max_err = 0
            for j = 1, #satellites do
                if not is_sat_incorrect[j] then
                    local sat = satellites[j]
                    local dx, dy, dz = X - sat.x, Y - sat.y, Z - sat.z
                    local d = (dx*dx + dy*dy + dz*dz)^0.5
                    local err = sat.distance - d
                    local e = exp(err+err)
                    e = (e-1)/(e+1)/(d+1)
                    DX = DX + dx*e
                    DY = DY + dy*e
                    DZ = DZ + dz*e
                    max_err = max(max_err, abs(err))
                end
            end
            if k % 16 == 0 then
                if max_err >= last_max_err then
                    break
                end
                last_max_err = max_err
            end
            local e = 1/(1+(DX*DX+DY*DY+DZ*DZ)^0.5/max_err)
            X = X + DX*e
            Y = max(0, min(255, Y + DY*e))
            Z = Z + DZ*e
            if abs(oldX - X) + abs(oldY - Y) + abs(oldZ - Z) <= 1e-4 then
                break
            end
        end
        return max_err, floor(X + 0.5), floor(Y + 0.5), floor(Z + 0.5)
    end

    local function init_set(is_sat_incorrect, len, ctr)
        for j = 1, len do
            is_sat_incorrect[j] = (j <= ctr)
        end
    end

    local function last_combination(is_sat_incorrect)
        local first = 1
        while not is_sat_incorrect[first] do
            first = first + 1
        end
        local last = first + 1
        while is_sat_incorrect[last] do
            last = last + 1
        end
        if is_sat_incorrect[last] == nil then
            return true
        end
        is_sat_incorrect[last] = true
        init_set(is_sat_incorrect, last - 1, last - first - 1)
    end

    function trilateration(list_of_satellites, previous_X, previous_Y, previous_Z)
        local N = #list_of_satellites
        if N >= 3 then
            local is_sat_incorrect = {}
            init_set(is_sat_incorrect, N, 0)
            local err, X, Y, Z = try_this_subset_of_sat(list_of_satellites, is_sat_incorrect, previous_X, previous_Y, previous_Z)
            local incorrect_sat_indices = {}
            if err < 0.1 then
                return X, Y, Z, incorrect_sat_indices
            end
            for incorrect_ctr = 1, min(floor((N - 1) / 2), N - 4) do
                init_set(is_sat_incorrect, N, incorrect_ctr)
                repeat
                    err, X, Y, Z = try_this_subset_of_sat(list_of_satellites, is_sat_incorrect, previous_X, previous_Y, previous_Z)
                    if err < 0.1 then
                        for j = 1, N do
                            if is_sat_incorrect[j] then
                                table_insert(incorrect_sat_indices, j)
                            end
                        end
                        return X, Y, Z, incorrect_sat_indices
                    end
                until last_combination(is_sat_incorrect)
            end
        end
    end
end
Usage example:
-- assuming your mobile computer previous coordinates were 99 120 100
local previous_X, previous_Y, previous_Z = 99, 120, 100
-- assuming your mobile computer current coordinates are 111 112 113
local list_of_satellites = {
    {x=22, y=55, z=77, distance=((111-22)^2+(112-55)^2+(113-77)^2)^0.5}, -- correct satellite
    {x=35, y=99, z=42, distance=((111-35)^2+(112-99)^2+(113-42)^2)^0.5}, -- correct satellite
    {x=44, y=44, z=44, distance=((111-94)^2+(112-94)^2+(113-94)^2)^0.5}, -- incorrect satellite
    {x=10, y=88, z=70, distance=((111-10)^2+(112-88)^2+(113-70)^2)^0.5}, -- correct satellite
    {x=54, y=54, z=54, distance=((111-64)^2+(112-64)^2+(113-64)^2)^0.5}, -- incorrect satellite
    {x=91, y=33, z=15, distance=((111-91)^2+(112-33)^2+(113-15)^2)^0.5}, -- correct satellite
}

local X, Y, Z, list_of_incorrect_sat_indices = trilateration(list_of_satellites, previous_X, previous_Y, previous_Z)
if X then
    print(X, Y, Z)
    if #list_of_incorrect_sat_indices > 0 then
        print("Satellites at the following indices are incorrect: "..table.concat(list_of_incorrect_sat_indices, ","))
    end
else
    print"Not enough satellites"
end
Output:
111 112 113
Satellites at the following indices are incorrect: 3,5
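As an independent cross-check of a result like the one above, a closed-form solve is possible when four trusted satellites that do not lie in one plane are available. This is a different technique from the iterative fit in the answer: subtracting the first sphere equation from the other three removes the quadratic terms and leaves a 3x3 linear system, solved here with Cramer's rule. The sketch is in Python for illustration, and `det3` and `locate_linear` are hypothetical names.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def locate_linear(sats):
    """Position from four (x, y, z, distance) tuples that are not coplanar."""
    x1, y1, z1, d1 = sats[0]
    rows, rhs = [], []
    for x, y, z, d in sats[1:4]:
        # 2 (p_i - p_1) . X = |p_i|^2 - |p_1|^2 - d_i^2 + d_1^2
        rows.append([2 * (x - x1), 2 * (y - y1), 2 * (z - z1)])
        rhs.append((x * x + y * y + z * z) - (x1 * x1 + y1 * y1 + z1 * z1)
                   - d * d + d1 * d1)
    det = det3(rows)
    if abs(det) < 1e-9:
        raise ValueError("satellites are (nearly) coplanar")
    solution = []
    for k in range(3):
        replaced = [row[:] for row in rows]
        for r in range(3):
            replaced[r][k] = rhs[r]          # Cramer's rule: swap in column k
        solution.append(det3(replaced) / det)
    return tuple(solution)


target = (111, 112, 113)
correct = [(22, 55, 77), (35, 99, 42), (10, 88, 70), (91, 33, 15)]
sats = [(x, y, z, ((target[0] - x) ** 2 + (target[1] - y) ** 2
                   + (target[2] - z) ** 2) ** 0.5) for x, y, z in correct]
position = locate_linear(sats)
```

Unlike the iterative version, this sketch has no tolerance for a wrong satellite: a single bad position or distance shifts the answer, so it is only useful once the incorrect satellites have been excluded.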
My problem is roughly as follows. I am given a numerical matrix X in which each row is an item, and I want to find each row's nearest neighbor, in terms of L2 distance, among all rows except itself. I tried reading the official documentation but was still a little confused about how to achieve this. Could someone give me a hint?
My code is as follows
function l2_dist(v1, v2)
    return sqrt(sum((v1 - v2) .^ 2))
end

function main(Mat, dist_fun)
    n = size(Mat, 1)
    Dist = SharedArray{Float64}(n) #[Inf for i in 1:n]
    Id = SharedArray{Int64}(n) #[-1 for i in 1:n]
    @parallel for i = 1:n
        Dist[i] = Inf
        Id[i] = 0
    end
    Threads.@threads for i in 1:n
        for j in 1:n
            if i != j
                println(i, j)
                dist_temp = dist_fun(Mat[i, :], Mat[j, :])
                if dist_temp < Dist[i]
                    println("Dist updated!")
                    Dist[i] = dist_temp
                    Id[i] = j
                end
            end
        end
    end
    return Dict("Dist" => Dist, "Id" => Id)
end

n = 4000
p = 30
X = [rand() for i in 1:n, j in 1:p];
main(X[1:30, :], l2_dist)
@time N = main(X, l2_dist)
I'm trying to distribute all the i's (i.e. the calculation of each row's minimum) over different cores. But the version above apparently isn't working correctly: it is even slower than the sequential version. Can someone point me in the right direction? Thanks.
Maybe you're doing something in addition to what you have written down, but, from what I can see at this point, you aren't actually doing any computations in parallel. Julia requires you to tell it how many processors (or threads) you would like it to have access to. You can do this in either of two ways:
Start Julia with multiple processors: julia -p # (where # is the number of processors you want Julia to have access to)
Once you have started a Julia "session", call the addprocs function to add additional processors.
To have more than 1 thread, you need to run the command export JULIA_NUM_THREADS=# before starting Julia. I don't know very much about threading, so I will be sticking with the @parallel macro. I suggest reading the documentation for more details on threading. Maybe @Chris Rackauckas could expand a little more on the difference.
A few comments below about my code and on your code:
I'm on version 0.6.1-pre.0. I don't think I'm doing anything 0.6 specific, but this is a heads up just in case.
I'm going to use the Distances.jl package when computing the distances between vectors. I think it is a good habit to farm out as many of my computations to well-written and well-maintained packages as possible.
Rather than compute the distance between rows, I'm going to compute the distance between columns. This is because Julia is a column-major language, so this will increase the number of cache hits and give a little extra speed. You can obviously get the row-wise results you want by just transposing the input.
Unless you expect that many memory allocations, a large allocation count is a sign that something in your code is inefficient. It is often a type stability problem. I don't know if that was the case in your code before, but it doesn't seem to be an issue in the current version (it wasn't immediately clear to me why you were having so many allocations).
Code is below
# Make sure all processors have access to Distances package
@everywhere using Distances

# Create a random matrix
nrow = 30
ncol = 4000

# Seed creation of random matrix so it is always same matrix
srand(42)
X = rand(nrow, ncol)

function main(X::AbstractMatrix{Float64}, M::Distances.Metric)
    # Get size of the matrix
    nrow, ncol = size(X)

    # Create `SharedArray` to store output
    ind_vec = SharedArray{Int}(ncol)
    dist_vec = SharedArray{Float64}(ncol)

    # Compute the distance between columns
    @sync @parallel for i in 1:ncol
        # Initialize various temporary variables
        min_dist_i = Inf
        min_ind_i = -1
        X_i = view(X, :, i)

        # Check distance against all other columns
        for j in 1:ncol
            # Skip comparison with itself
            if i==j
                continue
            end

            # Tell us who is doing the work
            # (can uncomment if you want to verify stuff)
            # println("Column $i compared with Column $j by worker $(myid())")

            # Evaluate the new distance...
            # If it is less then replace it, otherwise proceed
            dist_temp = evaluate(M, X_i, view(X, :, j))
            if dist_temp < min_dist_i
                min_dist_i = dist_temp
                min_ind_i = j
            end
        end

        # Which column is minimum distance from column i
        dist_vec[i] = min_dist_i
        ind_vec[i] = min_ind_i
    end

    return dist_vec, ind_vec
end

# Using Euclidean metric
metric = Euclidean()
# main returns the distances first, then the indices
dist, inds = main(X, metric)
@time main(X, metric);
@show dist[[1, 5, 25]], inds[[1, 5, 25]]
You can run the code with
1 processor: julia testfile.jl
% julia testfile.jl
0.640365 seconds (16.00 M allocations: 732.495 MiB, 3.70% gc time)
(dist[[1, 5, 25]], inds[[1, 5, 25]]) = ([1.40892, 1.38206, 1.32184], [2541, 2459, 1602])
n processors (in this case 4): julia -p n testfile.jl
% julia -p 4 testfile.jl
0.201523 seconds (2.10 k allocations: 99.107 KiB)
(dist[[1, 5, 25]], inds[[1, 5, 25]]) = ([1.40892, 1.38206, 1.32184], [2541, 2459, 1602])
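The structure of the parallel loop, where each item's nearest neighbor is computed independently and written to its own output slot, can be illustrated in Python as well. This is only a sketch of the decomposition: `nearest_neighbor` and `all_nearest` are hypothetical names, and with CPython threads a pure-Python distance loop will generally not run faster, so it should not be read as a performance recipe.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def nearest_neighbor(points, i):
    """Index and L2 distance of the point closest to points[i], excluding itself."""
    best_j, best_d = -1, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d


def all_nearest(points, workers=4):
    """Run the independent per-item searches through a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: nearest_neighbor(points, i),
                             range(len(points))))


result = all_nearest([(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)])
```

Because no task reads another task's result, the items can be handed out in any order, which is the same property that makes the @parallel loop above safe.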