Filtering a range in Julia - filter

I have some MWE below. What I want is to have a subsection of a range, interact with the rest of the range, but not itself.
For instance if the range is 1:100, I want to have a for loop that will have each index in 4:6, interact with all values of 1:100 BUT NOT 4:6.
I want to do this using ranges/filters to avoid generating temporary arrays.
In my case the total range is the number of atoms in the system. The sub-range, is the atoms in a specific molecule. I need to do calculations where each atom in a molecule interacts with all other atoms, but not the atoms in the same molecule.
Further
I am trying to avoid using if statements because that messes up parallel codes. Doing this with an if statement would be
for i=4:6
for j = 1:100
if j == 4 || j==5 || j==6
continue
end
println(i, " ", j)
end
end
I have actual indexing in my code, I would never hardcode values like the above... But I want to avoid that if statement.
Trials
The following does what I want, but I now realize that using filter is bad when it comes to memory and the amount used scales linearly with b.
a = 4:6
b = 1:100
for i in a
for j in filter((b) -> !(b in a),b)
print(i, " ", j)
end
end
Is there a way to get the double for loop I want where the outer is a sub-range of the inner, but the inner does not include the outer sub-range and most importantly is fast and does not create alot of memory usage like filter does?

If memory usage is really a concern, consider two for loops using the range components:
systemrange = 1:50
moleculerange = 4:12
for i in systemrange[1]:moleculerange[1]-1
println(i)
end
for i in moleculerange[end]+1:systemrange[end]
println(i)
end
You might be able to do each loop in its own thread.

What about creating a custom iterator?
Note that example below needs some adjustments depending on how you define the exception lists (for example for long list with non continues indices you should use binary search).
struct RangeExcept
start::Int
stop::Int
except::UnitRange{Int}
end
function Base.iterate(it::RangeExcept, (el, stop, except)=(it.except.start > 1 ? it.start : it.except.stop+1, it.stop, it.except))
new_el = el+1
if new_el in except
new_el = except.stop+1
end
el > stop && return nothing
return (el, (new_el, stop,except))
end
Now let us test the code:
julia> for i in RangeExcept(1,10,3:7)
println(i)
end
1
2
8
9
10

Related

Convert structure fields to arrays efficiently matlab

I have a structure called s in Matlab. This is a structure with two fields a and b. The structure size is 1 x 1,620,000.
It is a very large structure (that probably takes half of the ram of my machine). This is what the structure looks like:
I am looking for an efficient way to concatenate each of the fields a and b into two separate arrays that I can then export to csv. I built the code below, to do so, but even after 12 hours running it has not even reached a quarter of the loop. Any more efficient way of doing this?
a = [];
b =[];
total_n = size(s,2);
count = 1;
while size(s,2)>0
if size(s(1).a,1)
a = [a; s(1).a];
end
if size(s(1).b,1)
b = [b; s(1).b];
end
s(1) = []; %to save memory
if mod(count,1000) == 0
fprintf('Done %2f \n', [count/total_n])
end
count = count+1;
end
s(1) = []; %to save memory
ah, but such huge misunderstanding that comment is.
if size(s) is 1 x 1,620,000, you just suddenly forced the loop to do (under the hood, you dont see it)
snew=zeros(1,size(s,2)-1) # now you use double memory
snew=s(2:end) # now you force an unnecesary copy
So not only does that line make your code require double the memory, but also in each loop, you make an unnecesary copy of a large array.
Just replace your while for a normal for loop of for ii=1:size(s,2) and then index s!
Now, you can see hopefully then why the following is equally a big mistake (not only that, but any modern MATLAB version is currently telling you this is a bad idea in your editor)
a=[]
a=[a;s(1).a]
In here in each loop you are forcing MATLAB to make a new a that is 1 bigger than before, and copy the contents of the old a there.
instead, preallocate the size of a.
As you don't know what you are going to put there, I suggest using a cell array, as each s(ii).a has a different length.
You can then, after the loop, remove all empty (isempty) cells if you want.
Managed to do it efficiently:
s= struct2cell(s);
s= squeeze(s);
a = a(1,:);
a = a';
a = vertcat(a{:});
b = a(2,:);
b = b';
b = vertcat(b{:});

Employ early bail-out in MATLAB

There is a example for Employ early bail-out in this book (http://www.amazon.com/Accelerating-MATLAB-Performance-speed-programs/dp/1482211297) (#YairAltman). for speed improvement we can convert this code:
data = [];
newData = [];
outerIdx = 1;
while outerIdx <= 20
outerIdx = outerIdx + 1;
for innerIdx = -100 : 100
if innerIdx == 0
continue % skips to next innerIdx (=1)
elseif outerIdx > 15
break % skips to next outerIdx
else
data(end+1) = outerIdx/innerIdx;
newData(end+1) = process(data);
end
end % for innerIdx
end % while outerIdx
to this code:
function bailableProcessing()
for outerIdx = 1 : 5
middleIdx = 10
while middleIdx <= 20
middleIdx = middleIdx + 1;
for innerIdx = -100 : 100
data = outerIdx/innerIdx + middleIdx;
if data == SOME_VALUE
return
else
process(data);
end
end % for innerIdx
end % while middleIdx
end % for outerIdx
end % bailableProcessing()
How we did this conversion? Why we have different middleIdx range in new code? Where is checking for innerIdx and outerIdx in new code? what is this new data = outerIdx/innerIdx + middleIdx calculation?
we have only this information for second code :
We could place the code segment that should be bailed-out within a
dedicated function and return from the function when the bail-out
condition occurs.
I am sorry that I did not clarify within the text that the second code segment is not a direct replacement of the first. If you reread the early bail-out section (3.1.3) perhaps you can see that it has two main parts:
The first part of the section (which includes the top code segment) illustrates the basic mechanism of using break/continue in order to bail-out from a complex processing loop, in order to save processing time in computing values that are not needed.
In contrast, the second part of the section deals with cases when we wish to break out of an ancestor loop that is not the direct parent loop. I mention in the text that there are three alternatives that we can use in this case, and the second code segment that you mentioned is one of them (the other alternatives are to use dedicated flags with break/continue and to use try/catch blocks). The three code segments that I provided in this second part of the section should all be equivalent to each other, but they are NOT equivalent to the code-segment at the top of the section.
Perhaps I should have clarified this in the text, or maybe I should have used the same example throughout. I will think about this for the second edition of the book (if and when it ever appears).
I have used a variant of these code segments in other sections of the book to illustrate various other aspects of performance speedups (for example, 3.1.4 & 3.1.6) - in all these cases the code segments are NOT equivalent to each other. They are merely used to illustrate the corresponding text.
I hope you like my book in general and think that it is useful. I would be grateful if you would place a positive feedback about it on Amazon (direct link).
p.s. - #SamRoberts was correct to surmise that mention of my name would act as a "bat-signal", attracting my attention :-)
it's all far more simple than you think!
How we did this conversion?
Irrationally. Those two codes are completely different.
Why we have different middleIdx range in new code?
Randomness. The point of the author is something different.
Where is checking for innerIdx and outerIdx in new code?
dont need that, as it's not intended to be the same code.
what is this new data = outerIdx/innerIdx + middleIdx calculation?
a random calculation as well as data(end+1) = outerIdx/innerIdx; in the original code.
i suppose the author wants to illustrate something far more profoundly: that if you wrap your code that does (possibly many) loops (fors/whiles, doesnt matter) inside a function and you issue a return statement if you somehow detect that you're done, it will result in an effectively "bailable" computation, e.g. the method that does the work returns earlier than it would normally do. that is illustrated here by the condition that checks on data == SOME_VALUE; you can have your favourite bailout condition there instead :-)
moreover, the keywords [continue/break] inside the first example are meant to illustrate that you can [skip the rest of/leave] the inner-most loop from whereever you call them. in principal, you can implement a bailout using these by e.g.
bailing = false;
for outer = 1:1000
for inner = 1:1000
if <somebailingcondition>
bailing = true;
break;
else
<do stuff>
end
end
if bailing
break;
end
end
but that would be very clumsy as that "cascade" of breaks will need to be as long as you have nested loops and messes up the code.
i hope that could clarify your issues.

Lua - why for loop limit is not calculated dynamically?

Ok here's a basic for loop
local a = {"first","second","third","fourth"}
for i=1,#a do
print(i.."th iteration")
a = {"first"}
end
As it is now, the loop executes all 4 iterations.
Shouldn't the for-loop-limit be calculated on the go? If it is calculated dynamically, #a would be 1 at the end of the first iteration and the for loop would break....
Surely that would make more sense?
Or is there any particular reason as to why that is not the case?
The main reason why numerical for loops limits are computed only once is most certainly for performance.
With the current behavior, you can place arbitrary complex expressions in for loops limits without a performance penalty, including function calls. For example:
local prod = 1
for i = computeStartLoop(), computeEndLoop(), computeStep() do
prod = prod * i
end
The above code would be really slow if computeEndLoop and computeStep required to be called at each iteration.
If the standard Lua interpreter and most notably LuaJIT are so fast compared to other scripting languages, it is because a number of Lua features have been designed with performance in mind.
In the rare cases where the single evaluation behavior is undesirable, it is easy to replace the for loop with a generic loop using while end or repeat until.
local prod = 1
local i = computeStartLoop()
while i <= computeEndLoop() do
prod = prod * i
i = i + computeStep()
end
The length is computed once, at the time the for loop is initialized. It is not re-computed each time through the loop - a for loop is for iterating from a starting value to an ending value. If you want the 'loop' to terminate early if the array is re-assigned to, you could write your own looping code:
local a = {"first", "second", "third", "fourth"}
function process_array (fn)
local inner_fn
inner_fn =
function (ii)
if ii <= #a then
fn(ii,a)
inner_fn(1 + ii)
end
end
inner_fn(1, a)
end
process_array(function (ii)
print(ii.."th iteration: "..a[ii])
a = {"first"}
end)
Performance is a good answer but I think it also makes the code easier to understand and less error-prone. Also, that way you can (almost) be sure that a for loop always terminates.
Think about what would happen if you wrote that instead:
local a = {"first","second","third","fourth"}
for i=1,#a do
print(i.."th iteration")
if i > 1 then a = {"first"} end
end
How do you understand for i=1,#a? Is it an equality comparison (stop when i==#a) or an inequality comparison (stop when i>=#a). What would be the result in each case?
You should see the Lua for loop as iteration over a sequence, like the Python idiom using (x)range:
a = ["first", "second", "third", "fourth"]
for i in range(1,len(a)+1):
print(str(i) + "th iteration")
a = ["first"]
If you want to evaluate the condition every time you just use while:
local a = {"first","second","third","fourth"}
local i = 1
while i <= #a do
print(i.."th iteration")
a = {"first"}
i = i + 1
end

How to increase memory to handle super large Lua tables

I have a Lua function that, given n, generates all permutations of the series from 1 to n and stores each unique series in table form within a container table.
The size of this generated table gets very large very quickly (and necessarily so). About the time I try n = 11, the script will run for several seconds before failing out to "lua: not enough memory." I have 16gb of physical RAM, but watching the performance monitor in the Windows task manager allows me to watch the ram be consumed during run time, and it only gets up to about 20% before the script ends with the memory error.
I found this post that looks like the direction I need to head: memory of a process in Lua
Since I'm running my script with Lua.exe, I'm assuming that I'm limited to how much memory Windows allocates for Lua.exe. Can I increase this amount? Can I use a C# wrapper program to simply run the Lua script (the idea being that it will have a higher/less restricted memory allocation)? Or am I looking in the wrong direction?
Do you need to store all the permutations in advance? You could generate them on-the-fly instead.
Example:
local function genPerm(self, i)
local result = {}
local f = 1
for j = 1, self.n do
f = f * j
table.insert(result, j)
end
for j = 1, self.n-1 do
f = f / (self.n + 1 - j)
local k = math.floor((i - 1) / f)
table.insert(result, j, table.remove(result, j+k))
i = i - k * f
end
return result
end
local function perms(n)
return setmetatable({n=n}, {__index=genPerm})
end
local generator = perms(11)
for _, i in ipairs {1, 42, 1000000, 39916800} do
print(table.concat(generator[i], ','))
end
In the same vein as finn's answer, here is another permutation generator:
local function perms(a,lo,hi,f)
if lo>hi then f(a) end
for i=lo,hi do
a[lo],a[i]=a[i],a[lo]
perms(a,lo+1,hi,f)
a[lo],a[i]=a[i],a[lo]
end
end
local function gperms(n,f)
local a={}
for i=1,n do a[i]=i end
perms(a,1,#a,f)
end
local function show(a)
for i=1,#a do io.write(a[i],' ') end
io.write('\n')
end
gperms(4,show)
You could perhaps use a memory-mapped file on the C++ side of Lua, for which you could provide an API via LuaBridge.
Update 1: an alternative to a memory-mapped file could be a NoSQL database

ruby parallel assignment, step question

so, i'm trying to learn ruby by doing some project euler questions, and i've run into a couple things i can't explain, and the comma ?operator? is in the middle of both. i haven't been able to find good documentation for this, maybe i'm just not using the google as I should, but good ruby documentation seems a little sparse . . .
1: how do you describe how this is working? the first snippet is the ruby code i don't understand, the second is the code i wrote that does the same thing only after painstakingly tracing the first:
#what is this doing?
cur, nxt = nxt, cur + nxt
#this, apparently, but how to describe the above?
nxt = cur + nxt
cur = nxt - cur
2: in the following example, how do you describe what the line with 'step' is doing? from what i can gather, the step command works like (range).step(step_size), but this seems to be doing (starting_point).step(ending_point, step_size). Am i right with this assumption? where do i find good doc of this?
#/usr/share/doc/ruby1.9.1-examples/examples/sieve.rb
# sieve of Eratosthenes
max = Integer(ARGV.shift || 100)
sieve = []
for i in 2 .. max
sieve[i] = i
end
for i in 2 .. Math.sqrt(max)
next unless sieve[i]
(i*i).step(max, i) do |j|
sieve[j] = nil
end
end
puts sieve.compact.join(", ")
1: It's called parallel assignment. Ruby cares to create temporal variables and not override your variables with incorrect values. So this example:
cur, nxt = nxt, cur + nxt
is the same as:
tmp = cur + nxt
cur = nxt
nxt = tmp
bur more compact, without place to make stupid mistake and so on.
2: There are 2 step methods in ruby core library. First is for Numeric class (every numbers), so you could write:
5.step(100, 2) {}
and it starts at 5 and takes every second number from it, stops when reaches 100.
Second step in Ruby is for Range:
(5..100).step(2) {}
and it takes range (which has start and end) and iterates through it taking every second element. It is different, because you could pass it not necessarily numeric range and it will take every nth element from it.
Take a look at http://ruby-doc.org/core-1.8.7/index.html
This a parallel assignment. In your example, Ruby first evaluates nxt and cur + nxt. It then assigns the results to cur and nxt respectively.
The step method in the code is actually Numeric#step (ranges are constructed with (n..m)). The step method of Numeric iterates using the number it is called on as a starting point. The arguments are the limit and step size respectively. The code above will therefore invoke the block with i * i and then each successive increment of i until max is reached.
A good starting point for Ruby documentation is the ruby-doc.org site.

Resources