Dynamic array creation in Cython - performance

Is there any way to dynamically create arrays in cython without using the horribly ugly kludge of malloc+pointer+free? There has to be some refcounting, garbage-collecting wrapper for this very basic function.
I need this to implement a ragged array.
inputs = [arr1, arr2, arr3, ...]
...
NELEMENTS = len(inputs)
cdef np.ndarray[double, 2] lookup[NELEMENTS]  # <--- this is where I'm stuck
for i in range(NELEMENTS):
    lookup[i] = inputs[i]

# data.shape == (5000, NELEMENTS)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        do_something(lookup[j, data[i, j]])

If I understand correctly, there are at least 2 ways of doing what you want:
1) Create a 2-dimensional numpy array, where the size of the 2nd dimension is fixed by the largest of your input arrays. This wastes some space, but it is easy and efficient. You can use the zeros function to create a 2-dim array full of zeros and then populate only the required entries. This is shown below as Option 1.
2) Create a nested numpy array, where lookup2[i] is a 1-dim numpy array whose size is set by the number of elements in inputs[i]. This is also straightforward, but less efficient, since the inner arrays are stored as generic Python objects.
import numpy as np
cimport numpy as np

inputs = [[1], [2, 3, 4], [5, 6], [7, 8, 9, 10, 11, 12]]
NELEMENTS = len(inputs)

# Option 1: create a 2-dim numpy array full of zeros and populate only the
# necessary parts
maxInputSize = max([len(x) for x in inputs])
cdef np.ndarray[double, ndim=2] lookup = np.zeros((NELEMENTS, maxInputSize))
for i in range(NELEMENTS):
    for j in range(len(inputs[i])):
        lookup[i, j] = inputs[i][j]

# Option 2: create a nested numpy array of 1-dim numpy arrays
cdef np.ndarray[object, ndim=1] lookup2 = np.empty((NELEMENTS,), dtype='object')
for i in range(NELEMENTS):
    nInputs = len(inputs[i])
    lookup2[i] = np.zeros(nInputs)
    for j in range(nInputs):
        lookup2[i][j] = inputs[i][j]
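To tie this back to the lookup loop in the question: with Option 1 the access stays a single 2-D index, while with Option 2 it becomes a two-step index into the inner array. Below is a minimal plain-Python sketch assuming the lookup and lookup2 arrays built above; the toy data index array and the do_something stand-in are assumptions, not part of the original post.
import numpy as np

def do_something(value):
    # Hypothetical stand-in for the question's do_something
    print(value)

# Toy index array of shape (rows, NELEMENTS); every entry is 0, which is a
# valid index into each of the example inputs above
data = np.zeros((3, NELEMENTS), dtype=np.intp)

for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        do_something(lookup[j, data[i, j]])    # Option 1: single 2-D index
        do_something(lookup2[j][data[i, j]])   # Option 2: two-step ragged index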

Related

Sorting array of struct in Julia

Suppose I have the following in Julia:
mutable struct emptys
    begin_time::Dict{Float64,Float64}
    finish_time::Dict{Float64,Float64}
    Revenue::Float64
end

population = [emptys(Dict(), Dict(), -Inf) for i in 1:n_pop]  # n_pop is a large positive integer value.

for ind in 1:n_pop
    r = rand()
    append!(population[ind].Revenue, r)
    append!(population[ind].begin_time, Dict(r => cld(r^2, rand())))
    append!(population[ind].finish_time, Dict(r => r^3 / rand()))
end
Now I want to sort this population based on the Revenue value. Is there any way to achieve this in Julia? If I were to do it in Python it would be something like this:
sorted(population, key = lambda x: x.Revenue) # The population in Python can be prepared using https://pypi.org/project/ypstruct/ library.
Please help.
There is a whole range of sorting functions in Julia. The key functions are sort (corresponding to Python's sorted) and sort! (corresponding to Python's list.sort).
And as in Python, they have a couple of keyword arguments, one of which is by, corresponding to key.
Hence the translation of
sorted(population, key = lambda x: x.Revenue)
would be
getrevenue(e::emptys) = e.Revenue
sort(population, by=getrevenue)
Or e -> e.Revenue, but having a getter function is good style anyway.

How do I tell Julia in which order to iterate in an Array?

This answer led me to another question:
When defining a new structure like this one:
struct ReversedRowMajor{T,A} <: AbstractMatrix{T}
    data::A
end
ReversedRowMajor(data::AbstractMatrix{T}) where {T} = ReversedRowMajor{T, typeof(data)}(data)
Base.size(R::ReversedRowMajor) = reverse(size(R.data))
Base.getindex(R::ReversedRowMajor, i::Int, j::Int) = R.data[end-j+1, end-i+1]
If R is a ReversedRowMajor array, then when accessing R[:,:] Julia will iterate through the CartesianIndices in the order that should be fastest for the array, i.e. the memory order (see the array performance tips). In this case, however, that is not the desired order, since we are permuting the indices: (i,j) → (end-j+1, end-i+1).
So the question is: given an array, is there a way to tell Julia which axis is the fastest one?
(see also Multidimensional algorithms and iteration)

python: list of lists to ordered dict and group by first element

I have a list of lists: list1 = [['colour','red'],['colour','blue'],['shape','rect'],['shape','square']]
what is the fastest way to make an OrderedDict out of list1?
{'colour': ['red','blue'], 'shape': ['rect','square']}
So far, I have been able to map through list1, extract the unique elements at index 0 of each inner list, and return them as list2.
I could map through list1 and list2 and, if a matching element is found, take the element at index 1 from each inner list of list1, but I am not sure whether that is the right or fastest approach.
Any help please?
Two approaches, depending on your inputs:
Option 1: If, as in your example, all the matched keys are consecutive (so you always see all colours together), you can use itertools.groupby to group them:
from collections import OrderedDict
from itertools import groupby
from operator import itemgetter
list1 = [['colour','red'],['colour','blue'],['shape','rect'],['shape','square']]
dict1 = OrderedDict((k, [v for _, v in grp]) for k, grp in groupby(list1, itemgetter(0)))
This is, at least theoretically, the fastest approach, since it writes each key in the dict exactly once without looking them up repeatedly each time a key is seen, but it relies on the input being ordered by key.
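As a quick illustration of that ordering caveat, here is a hedged sketch with made-up input (not from the question): when equal keys are not consecutive, groupby emits separate groups and the later group silently replaces the earlier list under the same dict key.
from collections import OrderedDict
from itertools import groupby
from operator import itemgetter

# 'colour' appears twice, but not consecutively
unordered = [['colour','red'], ['shape','rect'], ['colour','blue']]
bad = OrderedDict((k, [v for _, v in grp]) for k, grp in groupby(unordered, itemgetter(0)))
print(bad)  # OrderedDict([('colour', ['blue']), ('shape', ['rect'])]) -- 'red' was lost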
Option 2: Use the __missing__ special method to make an OrderedDict with the same behavior on looking up a missing key as defaultdict(list) (sadly, the two types are incompatible, so you can't make a class that inherits from both and call it a day), then write an explicit loop to fill it in:
from collections import OrderedDict

class OrderedMultidict(OrderedDict):
    __slots__ = ()  # Avoid overhead of per-instance __dict__
    def __missing__(self, key):
        # Missing keys are seamlessly initialized to an empty list
        self[key] = retval = []
        return retval
Then use it to accumulate the results:
dict1 = OrderedMultidict()
for k, v in list1:
    dict1[k].append(v)
This approach removes the ordering dependency of option 1, in exchange for repeated lookups of each key (though only the first lookup of a given key invokes Python-level code in __missing__; after that, if OrderedDict is a C built-in, as in modern Python 3, the lookups stay at the C level). While repeated lookups are theoretically somewhat worse than writing each key exactly once, in practice I suspect this solution will be faster on modern CPython, where OrderedDict is implemented in C. On Python 2 and older Python 3, where OrderedDict is implemented in Python (while groupby is always C level), groupby is more likely to win; but when both types are C accelerated, groupby carries some extra overhead that may make it lose.

Cannot convert object of type 'list' to a numeric value

I am making a Pyomo model where I want to use random numbers for my two-dimensional parameters. I wrote a small Python script for the random numbers that produces exactly the structure I wanted for my two-dimensional parameter, but in my objective function I get: TypeError: Cannot convert object of type 'list' (value=[[....]]) to a numeric value. Below are my objective function and the random-numbers script.
model.obj = Objective(expr=sum(model.C[v,l] * model.T[v,l] for v in model.V for l in model.L) +
                           sum(model.OC[o,l] * model.D[o,l] for o in model.O for l in model.L),
                      sense=minimize)

import random

C = [[] for i in range(7)]
for i in range(7):
    for j in range(5):
        C[i] += [random.randint(100, 500)]

model.C = Param(model.V, model.L, initialize=C)
Please let me know if someone can help me fix this.
You should initialize your parameter using a function instead of a nested list:
def init_c(m, i, j):
    return random.randint(100, 500)

model.C = Param(model.V, model.L, initialize=init_c)
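For context, here is a minimal self-contained sketch of how the initializer function plugs into a model; the set sizes (7 and 5) mirror the shape of the nested list in the question, and the ConcreteModel/Set setup is an assumption, not code from the original post.
import random
from pyomo.environ import ConcreteModel, Set, Param

model = ConcreteModel()
model.V = Set(initialize=range(7))   # assumed size, matching the 7x5 nested list
model.L = Set(initialize=range(5))

def init_c(m, i, j):
    # Called once per (v, l) index pair; Param expects a scalar here, not a list
    return random.randint(100, 500)

model.C = Param(model.V, model.L, initialize=init_c)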

Numpy savetxt loop

Using NumPy, I want to split an array of shape (557124, 2), dtype="S10", into 6 subarrays using:
sub_arr = np.split(arr, 6)
Now I would like to use a for loop on savetxt and save the 6 subarrays to 6 .txt files.
I tried:
for i in sub_array:
    np.savetxt(("Subarray", i, ".txt"), sub_array[i], fmt='%s')
There are 2 problems:
It's incorrect to say for i in sub_array. I should use range(5), but I want to make it adaptable to any number of subarrays.
I thought I could use a sort of "paste" as in R when I wrote ("Subarray", i, ".txt"). Is there anything similar in NumPy?
Any ideas?
From what I've understood,
sub_arr = np.split(arr, 6)
returns a list of 6 numpy arrays. Then you can use enumerate to get each array together with its index:
fname_template = "Subarray.{i}.txt"
for i, sarr in enumerate(sub_arr):
    np.savetxt(fname_template.format(i=i), sarr, fmt='%s')
To create the file name I've used new-style string formatting (str.format). Alternatively you can concatenate strings with +, as in "Subarray." + str(i) + ".txt", but then you have to make sure that all the elements you concatenate are strings.
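Putting the pieces together, here is a minimal self-contained sketch; the toy array contents and the file-name pattern are assumptions, only the split-and-save logic follows the question.
import numpy as np

# Small stand-in for the (557124, 2) "S10" array from the question
arr = np.array([[str(i), str(i * 2)] for i in range(12)], dtype="S10")

sub_arr = np.split(arr, 6)  # list of 6 equally sized sub-arrays

for i, sarr in enumerate(sub_arr):
    # .astype(str) decodes the S10 byte strings so '%s' writes plain text
    np.savetxt("Subarray.{i}.txt".format(i=i), sarr.astype(str), fmt='%s')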
