python testing data in 2d array - performance

I have a 2d array with depth information, 640x480.
I want to add (row, col) values to a list where the value is in the range 800 to 2800 (true in my example data for about 5% of the values).
I have this code (python 2.7, w10, new laptop 2017)
import numpy as np

depth = np.load("depth.npy")  # depth.shape = (640, 480), ndarray
obstacleList = []
for row in range(480):
    for col in range(640):
        dist = depth[col, row]
        if dist > 800 and dist < 2800:
            obstacleList.append((col, dist))
My time measure shows me that it takes almost 10 seconds to complete the list.
For further processing of the data I need only the col with the lowest dist value but I thought this would add only more processing time.
What is wrong with my code?

I found numpy.nanmin, which finds the minimum value for each column in practically no time. To get rid of some values (like 0), I needed to convert my array to float (as NaN is a float) and replace the unwanted values with NaN:
b = depth.astype(float)
b[b < 800] = np.nan
obstacles = np.nanmin(b, axis=1)
This gave me an array with the lowest non-NaN value for each column in no time (0.05 seconds on my machine versus the 8 seconds using normal iteration)!
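For comparison, here is a minimal sketch of the same idea using a boolean mask, which also applies the upper bound of 2800 from the question. The function name find_obstacles and the random stand-in for depth.npy are assumptions made for illustration, not part of the original code.

```python
import numpy as np

def find_obstacles(depth, lo=800, hi=2800):
    """Return (col, dist) pairs inside (lo, hi) and the nearest hit per column."""
    mask = (depth > lo) & (depth < hi)
    # The first axis is "col" in the question's depth[col, row] indexing.
    cols, _ = np.nonzero(mask)
    pairs = np.column_stack((cols, depth[mask]))
    # Out-of-range cells become +inf so they never win the minimum.
    nearest = np.where(mask, depth, np.inf).min(axis=1)
    return pairs, nearest

# Hypothetical stand-in for np.load("depth.npy")
rng = np.random.default_rng(0)
depth = rng.integers(0, 5000, size=(640, 480))
pairs, nearest = find_obstacles(depth)
```

Columns without any value in range come back as inf in nearest, so they can be filtered with np.isfinite. Note that the pairs are ordered by column first, unlike the nested loop in the question.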


Reading unformatted FORTRAN files into multidimensional arrays using Julia

I'm just getting started with Julia, and I'm trying to read an unformatted FORTRAN file and store the data in arrays that are shaped in a particular way. I'm not sure how to accomplish this using Julia.
I've found the Julia package FortranFiles, which provides a direct way to read unformatted FORTRAN files using Julia. The file I'm trying to read looks like:
1 integer:
nzones*3 integers (brackets indicate one record):
series of nzones datasets:
[xvalues1,yvalues1,zvalues1](floating point values) for 1st zone
[xvalues1,yvalues1,zvalues1](floating point values) for 2nd zone
[xvalues1,yvalues1,zvalues1](floating point values) for last zone
where the first line represents the number of zones and the lines that follow represent a grid dimension in each i, j, and k directions. Following these first two records are the x, y, and z coordinates, which are Float64s, for each i, j, and k point in a zone, and I would like to shape the arrays as x(1:im,1:jm,1:km,m), y(1:im,1:jm,1:km,m), and z(1:im,1:jm,1:km,m) where im, jm, and km are the imax,jmax, and kmax extents listed for each zone. Here's what I have so far:
using FortranFiles
fname = "my_file"
fid = FortranFile(fname)
@fread fid nblks::Int32
@fread fid ni::(Int32,nblks) nj::(Int32,nblks) nk::(Int32,nblks)
Here's where I'm getting hung up. For each zone I have x, y, and z coordinate arrays, which should all be rank 4 arrays. For the x array, I want to store all of the x coordinates so that x[1,1,1,1] refers to the x coordinate value at i=1, j=1, k=1, and zone=1, and x[end, end, end, end] refers to the x coordinate value at i=imax, j=jmax, k=kmax for the last zone listed (i.e., zone = nblks). Then I want to create similar arrays for the y and z coordinate values.
Something like:
for m = 1:nblks
    im = ni[m]
    jm = nj[m]
    km = nk[m]
    @fread fid x::(Float64,im,jm,km,m) y::(Float64,im,jm,km,m) z::(Float64,im,jm,km,m)
end
However, I get a FortranFilesError: attempting to read beyond record end when trying this approach.
It appears that my issue is somewhat related to how Julia reads unformatted binary data, which is different from how FORTRAN's read works on the same data.
In FORTRAN, I could do something like:
integer, dimension (:), allocatable :: idim, jdim, kdim
integer :: nblks, fid, ios, m
fid = 10
open(unit=fid,form='unformatted', file='my_file',status='old',iostat=ios)
if( ios /= 0 ) then
    write(*,*) '*** Error reading file ***'
end if
read(fid) nblks
allocate( idim(nblks), jdim(nblks), kdim(nblks) )
read(fid) ( idim(m), jdim(m), kdim(m), m = 1, nblks )
However, in Julia I need to keep track of the file pointer's position and account for the fact that each record is preceded and followed by a 4-byte integer. I haven't been able to find a way to read each zone's i, j, and k extents directly into three separate arrays as can be done in FORTRAN (since the record is probably parsed line by line), but an alternative in Julia is to just read the entire record into a single 3*nblks-element vector, and then reshape this vector afterwards:
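fid = open("my_file")
skip(fid, 4)        # leading 4-byte marker of the first record
nblks = read(fid, Int32)
skip(fid, 8)        # trailing marker of record 1 + leading marker of record 2
dims = Array{Int32}(undef, 3*nblks)
read!(fid, dims)    # read the whole second record into one vector
ni, nj, nk = [Array{Int32}(undef, nblks) for i in 1:3]
for m in 1:nblks
    ni[m] = dims[3*m-2]
    nj[m] = dims[3*m-1]
    nk[m] = dims[3*m]
end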
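To make the record framing concrete, here is a rough Python sketch of the same reading logic. It assumes little-endian data, 4-byte record markers, and one record per zone holding all x values, then all y values, then all z values; the helper names (write_record, read_record, read_grid) are made up for this illustration, and the tiny in-memory file stands in for the real one.

```python
import io
import struct
import numpy as np

def write_record(f, payload):
    """Write one record framed by matching 4-byte length markers."""
    marker = struct.pack("<i", len(payload))
    f.write(marker + payload + marker)

def read_record(f):
    """Read one record and check that both length markers agree."""
    (nbytes,) = struct.unpack("<i", f.read(4))
    payload = f.read(nbytes)
    (trailer,) = struct.unpack("<i", f.read(4))
    if trailer != nbytes:
        raise ValueError("record length markers disagree")
    return payload

def read_grid(f):
    """Return the (nblks, 3) extents and a list of (x, y, z) arrays, one per zone."""
    nblks = int(np.frombuffer(read_record(f), dtype="<i4")[0])
    dims = np.frombuffer(read_record(f), dtype="<i4").reshape(nblks, 3)
    zones = []
    for im, jm, km in dims:
        xyz = np.frombuffer(read_record(f), dtype="<f8")
        # Fortran is column-major: the first index varies fastest.
        xyz = xyz.reshape((im, jm, km, 3), order="F")
        zones.append((xyz[..., 0], xyz[..., 1], xyz[..., 2]))
    return dims, zones

# Build a tiny two-zone file in memory (hypothetical data).
extents = np.array([[2, 3, 1], [1, 2, 2]], dtype="<i4")
buf = io.BytesIO()
write_record(buf, np.array([len(extents)], dtype="<i4").tobytes())
write_record(buf, extents.tobytes())
for im, jm, km in extents:
    n = int(im * jm * km)
    write_record(buf, np.arange(3 * n, dtype="<f8").tobytes())
buf.seek(0)
dims, zones = read_grid(buf)
```

The sketch keeps one (x, y, z) tuple per zone instead of a single rank-4 array, because zones with different extents do not fit into one rectangular array without padding.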

Efficiently calculate histogram of a 3D numpy array along an axis with different bin edges

Problem description
I have a 3D numpy array, denoted as data, of shape N x R x C, i.e. N samples, R rows and C columns. I would like to obtain histograms along column for each combination of sample and row. However bin edges (see argument bins in numpy.histogram), of fixed length S, will be different at different rows but are shared across samples. Consider this example for illustration, for the 1st sample (data[0]), bin edge sequence for its 1st row is different from that for its 2nd row, but is the same as that for the 1st row from the 2nd sample (data[1]). Thus all the bin edge sequences are stored in a 2D numpy array of shape R x S, denoted as bin_edges.
My question is how to efficiently calculate the histograms?
A working but slow solution
Using numpy.histogram, I was able to come up with a working but fairly slow solution as shown in the below code snippet
# Get dummy data
#   N: number of samples
#   R: number of rows (or kernels)
#   C: number of columns (or pixels)
#   S: number of bins
import numpy as np
N, R, C, S = 100, 50, 1000, 10
data = np.random.randn(N, R, C)
# for each row/kernel, pool pixels of all samples
poolsamples = np.swapaxes(data, 0, 1).reshape(R, -1)
# use quantiles as bin edges
percentiles = np.linspace(0, 100, num=(S + 1))
bin_edges = np.transpose(np.percentile(poolsamples, percentiles, axis=1))
# A working but slow solution of getting histograms along column
hist = np.empty((N, R, S))
for idx in np.arange(R):
    bin_edges_i = bin_edges[idx, :]
    counts = np.apply_along_axis(
        lambda a: np.histogram(a, bins=bin_edges_i)[0],
        1, data[:, idx, :])
    hist[:, idx, :] = counts
Possible directions
Fancy numpy reshape to avoid using for loop at all
This problem arises from extracting low-end characteristics for each image forwarded through a trained neural network. Therefore, if the histogram extraction can be embedded in TensorFlow graph and ultimately be carried out on GPU, that would be ideal!
I noticed a Python package, fast-histogram, which claims to be 7-15x faster than numpy.histogram. However, its 1D histogram function only takes the number of bins instead of the actual bin positions.
I would love to hear any inputs! Thanks in advance!
Making use of 2D version of np.searchsorted : searchsorted2d -
def vectorized_app(data, bin_edges):
    N, R, C = data.shape
    a = np.sort(data.reshape(-1,C),1)
    b = np.repeat(bin_edges[None],N,axis=0).reshape(-1,bin_edges.shape[-1])
    idx = searchsorted2d(a,b)
    idx[:,0] = 0
    idx[:,-1] = a.shape[1]
    out = (idx[:,1:] - idx[:,:-1]).reshape(N,R,-1)
    return out
Runtime test -
In [591]: N, R, C, S = 100, 50, 1000, 10
...: data = np.random.randn(N, R, C)
...: # for each row/kernel, pool pixels of all samples
...: poolsamples = np.swapaxes(data, 0, 1).reshape(R, -1)
...: # use quantiles as bin edges
...: percentiles = np.linspace(0, 100, num=(S + 1))
...: bin_edges = np.transpose(np.percentile(poolsamples, percentiles, axis=1))
In [592]: %timeit org_app(data, bin_edges)
1 loop, best of 3: 481 ms per loop
In [593]: %timeit vectorized_app(data, bin_edges)
1 loop, best of 3: 224 ms per loop
In [595]: np.allclose(org_app(data, bin_edges), vectorized_app(data, bin_edges))
Out[595]: True
More than 2x speedup there.
Closer look reveals that the bottleneck with the proposed vectorized method is the sorting itself -
In [594]: %timeit np.sort(data.reshape(-1,C),1)
1 loop, best of 3: 194 ms per loop
We need this sorting to use searchsorted.
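The answer relies on searchsorted2d and org_app, neither of which is shown here. The sketch below fills them in so the approach can be checked end to end. Both are assumptions: searchsorted2d is a plain row-by-row stand-in (not the vectorized helper the answer refers to), and org_app is taken to be one np.histogram call per sample and row. The small sizes are chosen only to keep the check quick.

```python
import numpy as np

def searchsorted2d(a, b):
    """Row-wise np.searchsorted; a simple stand-in for the 2D helper."""
    out = np.empty(b.shape, dtype=np.intp)
    for i in range(a.shape[0]):
        out[i] = np.searchsorted(a[i], b[i])
    return out

def org_app(data, bin_edges):
    """Reference result: one np.histogram call per (sample, row) pair."""
    N, R, _ = data.shape
    hist = np.empty((N, R, bin_edges.shape[1] - 1), dtype=np.intp)
    for n in range(N):
        for r in range(R):
            hist[n, r] = np.histogram(data[n, r], bins=bin_edges[r])[0]
    return hist

def vectorized_app(data, bin_edges):
    N, R, C = data.shape
    a = np.sort(data.reshape(-1, C), 1)
    b = np.repeat(bin_edges[None], N, axis=0).reshape(-1, bin_edges.shape[-1])
    idx = searchsorted2d(a, b)
    # Valid because the outer edges are the pooled min and max of the data.
    idx[:, 0] = 0
    idx[:, -1] = a.shape[1]
    return (idx[:, 1:] - idx[:, :-1]).reshape(N, R, -1)

rng = np.random.default_rng(0)
N, R, C, S = 4, 3, 50, 5
data = rng.standard_normal((N, R, C))
poolsamples = np.swapaxes(data, 0, 1).reshape(R, -1)
bin_edges = np.percentile(poolsamples, np.linspace(0, 100, S + 1), axis=1).T
same = np.array_equal(org_app(data, bin_edges), vectorized_app(data, bin_edges))
```

Overwriting the first and last index columns is only safe when every value lies inside the outermost edges, which holds here because the edges are the 0th and 100th percentiles of the pooled data.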

MATLAB: Speeding up a discretization function using bsxfun

For a current project, I have to discretize quasi-continuous values into bins defined by some pre-defined binning resolution. For this purpose, I have written a function, which I expected to be highly efficient as it is able to both process scalar inputs as well as vector inputs using bsxfun. However, after some profiling, I found out that almost all processing time of my much larger project is produced in this function, and within the function, it's mainly the bsxfun part that takes time, with the min-query following on second place. Long story short, I am looking for advice on how to solve this task MUCH faster in MATLAB. Side note: I am usually passing vectors with some 50k elements.
Here's the code:
function sampleNo = value2sample(value,bins)
%Make sure both vectors have orientations fitting bsxfun
value = value(:);
bins = bins(:)';
%Recover bin resolution (avoids passing another parameter)
delta = median(diff(bins));
%Calculate distance matrix between all combinations
dist = abs(bsxfun(@minus,value,bins));
%What we really want to know is the minimum distance per row
[minval,ind] = min(dist,[],2);
%Make sure we don't accidentally further process NaNs as 1st bin
sampleNo = ind;
sampleNo(minval>delta) = NaN;
The reason that your function is slow is because you are computing the distance between every element of values and bins and storing them all in an array - if there are N values and M bins then you will require NM elements to store all the distances, and this is probably a really big number (e.g. if each input has 50,000 elements then you need 2.5 billion elements in the output array).
Moreover, since your bins are sorted (you didn't state this, but it looks like you are assuming it in your code), you do not need to compute the distance from every value to every bin. You can be much smarter:
function ind = value2sample(value, bins)
% Find median bin distance
delta = median(diff(bins));
% Bucket into 'nearest' bin by using midpoints
bins = bins(:);
mids = [-Inf; 0.5 * (bins(1:end-1) + bins(2:end))];
[~, ind] = histc(value, mids);
% Ensure that NaN values and points that aren't near any bin are returned as NaN
ind(isnan(value)) = NaN;
ind(abs(value - bins(ind)) > delta) = NaN;
In my tests, with values = randn(10000, 1) and bins = -50:50 it takes around 4.5 milliseconds to run the original function, and 485 microseconds to run the code above, so you are getting around a 10x speedup (and the speedup will be even greater as you increase the size of the inputs).
Thanks to @Chris Taylor, I was able to solve the problem very efficiently. The code now runs almost 400 times faster than before. The only changes I had to make from his version are reflected in the code below. The main issue was to replace histc (whose use is not encouraged anymore) by discretize.
function ind = value2sample(value, bins)
% Make sure the vectors are standing
value = value(:);
bins = bins(:);
% Bucket into 'nearest' bin by using midpoints
mids = [eps; 0.5 * (bins(1:end-1) + bins(2:end))];
ind = discretize(value, mids);
The only caveat is that in this implementation your bins must be non-negative. Other than that, this code does exactly what I want, including the fact that ind has the same size as value and contains NaNs whenever a value is NaN or out of the range of bins.
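For readers outside MATLAB, the midpoint idea from the answers can be sketched in Python with np.searchsorted. This is an illustration under assumptions, not a port of either function: it returns 0-based bin indices, assumes sorted bins, and reuses the name value2sample only for continuity.

```python
import numpy as np

def value2sample(value, bins):
    """Nearest-bin index (0-based) per value; NaN if the value is NaN or too far away."""
    value = np.asarray(value, dtype=float).ravel()
    bins = np.asarray(bins, dtype=float).ravel()
    delta = np.median(np.diff(bins))
    # A value belongs to bin k if it lies between the midpoints around bins[k].
    mids = 0.5 * (bins[:-1] + bins[1:])
    ind = np.searchsorted(mids, value, side="right")
    out = ind.astype(float)
    too_far = np.abs(value - bins[ind]) > delta
    out[np.isnan(value) | too_far] = np.nan
    return out

samples = value2sample([0.1, 0.6, 2.4, 10.0, np.nan], [0.0, 1.0, 2.0, 3.0])
```

The binary search needs only the sorted midpoints, so memory use stays proportional to the number of values plus the number of bins, rather than their product as in the full distance matrix.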

Random Algorithm with adjustable probability

I'm searching for an algorithm (in any programming language, or maybe pseudo-code) that returns a random number where each outcome has a different probability.
For example:
A random generator that simulates a die where the chance of a '6'
is 50% and the chance of each of the other 5 numbers is 10%.
The algorithm should be scalable, because this is my exact problem:
I have an array (or database) of elements, from which I want to
select 1 random element. But each element should have a different
probability of being selected. So my idea is that every element gets a
number, and this number divided by the sum of all numbers gives the
chance of that element being randomly selected.
Does anybody know a good programming language (or library) for this problem?
The best solution would be a good SQL query that delivers 1 random entry.
But I would also be happy with any hint or attempt in another programming language.
A simple algorithm to achieve this:
Create an auxiliary array where sum[i] = p1 + p2 + ... + pi. This is done only once.
To draw an element, draw a number r with uniform distribution over [0, sum[n]) and find the first i for which sum[i] is higher than r. A binary search does this efficiently.
It is easy to see that the probability for r to lie in a certain range [sum[i-1], sum[i]) is indeed sum[i] - sum[i-1] = pi.
(In the above, we regard sum[0] = 0, for completeness.)
For your dice example:
You have:
p1=p2=....=p5 = 0.1
p6 = 0.5
First, calculate sum array:
sum[1] = 0.1
sum[2] = 0.2
sum[3] = 0.3
sum[4] = 0.4
sum[5] = 0.5
sum[6] = 1
Then, each time you need to draw a number: draw a random number r in [0,1) and choose the first element whose sum is higher than r, for example:
r1 = 0.45 -> element = 5
r2 = 0.8 -> element = 6
r3 = 0.1 -> element = 2
r4 = 0.09 -> element = 1
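The steps above can be sketched in Python as follows. The function names are made up for this illustration, and integer weights (10 and 50 instead of 0.1 and 0.5) are used so that the cumulative sums are exact.

```python
import bisect
import itertools
import random

def build_cumulative(weights):
    """Running totals: cumulative[i] = weights[0] + ... + weights[i]."""
    return list(itertools.accumulate(weights))

def pick_index(cumulative, r):
    """Index of the first cumulative sum strictly greater than r."""
    # Capping the search at the last index guards against r == total.
    return bisect.bisect_right(cumulative, r, 0, len(cumulative) - 1)

def weighted_choice(items, cumulative, rng=random):
    """Draw one item with probability proportional to its weight."""
    r = rng.random() * cumulative[-1]  # uniform over [0, total)
    return items[pick_index(cumulative, r)]

faces = [1, 2, 3, 4, 5, 6]
weights = [10, 10, 10, 10, 10, 50]
cumulative = build_cumulative(weights)
roll = weighted_choice(faces, cumulative)
```

Building the cumulative array costs O(n) once, and each draw is an O(log n) binary search. In Python 3.6 and later, random.choices(faces, weights=weights) covers the same use case from the standard library.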
An alternative answer. Your example was in percentages, so set up an array with 100 slots. A 6 is 50%, so put 6 in 50 of the slots. 1 to 5 are at 10% each, so put 1 in 10 slots, 2 in 10 slots etc. until you have filled all 100 slots in the array. Now pick one of the slots at random using a uniform distribution in [0, 99] or [1, 100] depending on the language you are using.
The contents of the selected array slot will give you the distribution you want.
ETA: On second thoughts, you don't actually need the array, just use cumulative probabilities to emulate the array:
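r = rand(100) // In range 0 -> 99 inclusive.
if (r < 50) return 6; // Up to 50% returns a 6.
if (r < 60) return 1; // Between 50% and 60% returns a 1.
if (r < 70) return 2; // Between 60% and 70% returns a 2.
if (r < 80) return 3; // Between 70% and 80% returns a 3.
if (r < 90) return 4; // Between 80% and 90% returns a 4.
return 5; // The remaining 10% returns a 5.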
You already know what numbers are in what slots, so just use cumulative probabilities to pick a virtual slot: 50; 50 + 10; 50 + 10 + 10; ...
Be careful of edge cases and whether your RNG is 0 -> 99 or 1 -> 100.

Range update and querying in a 2D matrix

I don't have a scenario, but here is the problem. This one is just driving me crazy. There is an n x n boolean matrix whose elements are all initially 0; n <= 10^6 and is given as input.
Next there will be up to 10^5 queries. Each query either sets all elements of column c to 0 or 1, or sets all elements of row r to 0 or 1. There can be another type of query: printing the total number of 1's in column c or row r.
I have no idea how to solve this and any help would be appreciated. Obviously an O(n) solution per query is not feasible.
The idea of using a number to order the modifications is taken from Dukeling's post.
We will need 2 maps and 4 binary indexed trees (BIT, a.k.a. Fenwick tree): 1 map and 2 BITs for rows, and 1 map and 2 BITs for columns. Let us call them m_row, f_row[0], and f_row[1]; m_col, f_col[0] and f_col[1] respectively.
The map may be implemented with an array, a tree-like structure, or hashing. The 2 maps store the last modification to each row/column. Since there can be at most 10^5 modifications, you may use that fact to save space compared with a simple array implementation.
BIT has 2 operations:
adjust(value, delta_freq), which adjusts the frequency of the value by delta_freq amount.
rsq(from_value, to_value), (rsq stands for range sum query) which finds the sum of the all the frequencies from from_value to to_value inclusive.
Let us declare global variable: version
Let us define numRow to be the number of rows in the 2D boolean matrix, and numCol to be the number of columns in the 2D boolean matrix.
The BITs should have size of at least MAX_QUERY + 1, since it is used to count the number of changes to the rows and columns, which can be as many as the number of queries.
version = 1
# Map should return <0, 0> for rows or cols not yet
# directly updated by query
m_row = m_col = empty map
f_row[0] = f_row[1] = f_col[0] = f_col[1] = empty BIT
Update algorithm:
update(isRow, value, idx):
    if (isRow):
        # Since setting a row/column to a new value will reset
        # everything done to it, we need to erase earlier
        # modification to it.
        # For example, turn on/off on a row a few times, then
        # query some column
        <prevValue, prevVersion> = m_row.get(idx)
        if ( prevVersion > 0 ):
            f_row[prevValue].adjust( prevVersion, -1 )
        # Store the new <value, version> for this row
        m_row.put( idx, <value, version> )
        f_row[value].adjust( version, 1 )
    else:
        <prevValue, prevVersion> = m_col.get(idx)
        if ( prevVersion > 0 ):
            f_col[prevValue].adjust( prevVersion, -1 )
        # Store the new <value, version> for this column
        m_col.put( idx, <value, version> )
        f_col[value].adjust( version, 1 )
    version = version + 1
Count algorithm:
count(isRow, idx):
    if (isRow):
        # If this is row, we want to find number of reverse modifications
        # done by updating the columns
        <value, row_version> = m_row.get(idx)
        count = f_col[1 - value].rsq(row_version + 1, version)
    else:
        # If this is column, we want to find number of reverse modifications
        # done by updating the rows
        <value, col_version> = m_col.get(idx)
        count = f_row[1 - value].rsq(col_version + 1, version)
    if (isRow):
        if (value == 1):
            # A row has numCol cells
            return numCol - count
        else:
            return count
    else:
        if (value == 1):
            # A column has numRow cells
            return numRow - count
        else:
            return count
The complexity is logarithmic in worst case for both update and count.
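As a rough, runnable illustration of the pseudocode above, here is a Python sketch for a square n x n matrix. The class names Fenwick and LineGrid and the max_updates parameter are assumptions made for this example; the two dictionaries play the role of m_row and m_col, and the four trees the role of f_row and f_col.

```python
class Fenwick:
    """Binary indexed tree over positions 1..size."""

    def __init__(self, size):
        self.tree = [0] * (size + 1)

    def adjust(self, pos, delta):
        while pos < len(self.tree):
            self.tree[pos] += delta
            pos += pos & -pos

    def prefix(self, pos):
        total = 0
        while pos > 0:
            total += self.tree[pos]
            pos -= pos & -pos
        return total

    def rsq(self, lo, hi):
        return self.prefix(hi) - self.prefix(lo - 1)


class LineGrid:
    def __init__(self, n, max_updates):
        self.n = n
        self.version = 1
        # last[is_row][idx] = (value, version) of the latest update to that line
        self.last = {True: {}, False: {}}
        # bits[is_row][value] counts lines whose latest update set `value`
        self.bits = {
            is_row: [Fenwick(max_updates + 1), Fenwick(max_updates + 1)]
            for is_row in (True, False)
        }

    def update(self, is_row, value, idx):
        prev_value, prev_version = self.last[is_row].get(idx, (0, 0))
        if prev_version > 0:
            # Erase the earlier modification of this line.
            self.bits[is_row][prev_value].adjust(prev_version, -1)
        self.last[is_row][idx] = (value, self.version)
        self.bits[is_row][value].adjust(self.version, 1)
        self.version += 1

    def count(self, is_row, idx):
        value, version = self.last[is_row].get(idx, (0, 0))
        # Crossing lines that were later set to the opposite value.
        flipped = self.bits[not is_row][1 - value].rsq(version + 1, self.version)
        return self.n - flipped if value == 1 else flipped


grid = LineGrid(n=4, max_updates=10)
grid.update(True, 1, 2)   # set row 2 to 1
grid.update(False, 0, 0)  # set column 0 to 0
ones_in_row = grid.count(True, 2)
ones_in_col = grid.count(False, 1)
```

Each update touches at most two tree positions and each count is one range-sum query, so both run in O(log q) for q updates, independent of n.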
Take version just to mean a value that gets auto-incremented for each update.
Store the last version and last update value at each row and column.
Store a list of (versions and counts of zeros and counts of ones) for the rows. The same for the columns. So that's only 2 lists for the entire grid.
When a row is updated, we set its version to the current version and insert into the list for rows the version and if (oldRowValue == 0) zeroCount = oldZeroCount else zeroCount = oldZeroCount + 1 (so it's not the number of zeros, but rather the number of times a value was updated with a zero). Same for oneCount. Same for columns.
If you do a print for a row, we get the row's version and last value, we do a binary search for that version in the column list (first value greater than). Then:
if (rowValue == 1)
    target = n*rowValue
             - (latestColZeroCount - colZeroCount)
             + (latestColOneCount - colOneCount)
else
    target = (latestColOneCount - colOneCount)
Not too sure whether the above will work.
That's O(1) for update, O(log k) for print, where k is the number of updates.
