Data structure for conditional probabilities with updating conditions - data-structures

I have a list of lists of 4 integers. All integers in the same list are distinct, e.g.,
data = [[8, 9, 3, 0], [3, 8, 4, 9], [7, 9, 6, 4], [3, 6, 4, 8], [0, 5, 3, 7], [0, 9, 4, 2], [9, 0, 1, 5], [3, 2, 8, 6], [3, 5, 4, 0], [1, 2, 5, 9], [1, 3, 6, 5], [2, 4, 5, 7], [7, 8, 6, 3], [6, 2, 9, 8], [8, 7, 5, 4], [8, 5, 1, 3]]
Currently I have a function that receives the list above and a list of distinct integers, certain. The function returns a list with the probability of each integer (from 0 to 9) being in a list, given that the list contains all the integers in certain.
def probability(data, certain):
    probs = [0] * 10
    counter_total = 0
    set_certain = set(certain)
    for d in data:
        if set_certain.issubset(d):
            counter_total += 1
            for i in range(10):
                if i in d and i not in set_certain:
                    probs[i] += 1
    probs = [x / counter_total for x in probs]
    return probs
Initially, the list certain is empty and values are added later. Is there a data structure I can build at the start of the program so that I don't have to go through all the data again every time I append a new value to certain? The list data can be very big.
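One possible approach, sketched below, is an inverted index (value -> set of row ids) plus running counts over the rows that still match. This is only an illustration under stated assumptions: the class name ConditionalCounts and its methods are hypothetical, every value is assumed to lie in range(n_values), and when no row matches it returns zeros instead of dividing by zero as the original function would.

```python
from collections import defaultdict


class ConditionalCounts:
    """Hypothetical helper: tracks rows that contain every value added so far."""

    def __init__(self, data, n_values=10):
        self.data = data
        self.n_values = n_values
        self.index = defaultdict(set)   # value -> ids of rows containing it
        self.counts = [0] * n_values    # occurrences among matching rows
        for row_id, row in enumerate(data):
            for value in row:
                self.index[value].add(row_id)
                self.counts[value] += 1
        self.candidates = set(range(len(data)))  # rows matching `certain`
        self.certain = set()

    def add(self, value):
        """Add `value` to the condition and discard rows that lack it."""
        self.certain.add(value)
        dropped = self.candidates - self.index[value]
        for row_id in dropped:
            for v in self.data[row_id]:
                self.counts[v] -= 1
        self.candidates -= dropped

    def probabilities(self):
        total = len(self.candidates)
        if total == 0:  # assumption: report zeros when nothing matches
            return [0.0] * self.n_values
        return [0.0 if v in self.certain else self.counts[v] / total
                for v in range(self.n_values)]


tracker = ConditionalCounts([[8, 9, 3, 0], [3, 8, 4, 9], [7, 9, 6, 4], [3, 6, 4, 8]])
tracker.add(3)
print(tracker.probabilities()[8])  # 1.0 (all three rows containing 3 also contain 8)
```

Each row is dropped at most once, so the total work across all calls to add is bounded by the size of data, and probabilities only touches the ten counters.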

Related

How do I generate a nested array that contains all possible 4 digit permutations of 6 numbers with repeating values? ruby

Given the digits n = 1, 2, 3, 4, 5, 6,
I want to generate a nested array S that contains all possible 4-digit permutations of n.
Since 6^4 = 1296, there will be 1296 possible permutations.
Example:
S = [[1,1,1,1],[1,1,1,2],[1,1,2,2]...[6,6,6,6]]
I started the nested array with [1,1,1,1] at the first index.
Then I used a for...in loop over the range 0..1295 and tried to carry the value of S[i] over to S[i+1],
then increment S[i+1][x], where x starts at 3 and is decremented until it reaches 0, at which point it becomes 3 again. The problem with my procedure is that when I try to increment S[i+1][x], S[i][x] is incremented as well.
In the code below, S is called 'all_possible_combinations' instead.
all_possible_combinations = Array.new(1296) {Array.new(4)}
all_possible_combinations[0] = [1, 1, 1, 1]
x = 3
for i in 0..1295
  if i + 1 == 1296
    break
  else
    all_possible_combinations[i+1] = all_possible_combinations[i]
    all_possible_combinations[i+1][x] += 1
    x -= 1
    if x == 0
      x = 3
    end
  end
end
[Attached image shows the debugging process, where S[i][x] also gets incremented]
You may compute that array as follows.
a = [1, 2, 3, 4, 5, 6]
b = a.repeated_permutation(4).to_a
#=> [[1, 1, 1, 1], [1, 1, 1, 2], [1, 1, 1, 3], [1, 1, 1, 4], [1, 1, 1, 5],
# [1, 1, 1, 6], [1, 1, 2, 1], [1, 1, 2, 2], [1, 1, 2, 3], [1, 1, 2, 4],
# ...
# [6, 6, 5, 3], [6, 6, 5, 4], [6, 6, 5, 5], [6, 6, 5, 6], [6, 6, 6, 1],
# [6, 6, 6, 2], [6, 6, 6, 3], [6, 6, 6, 4], [6, 6, 6, 5], [6, 6, 6, 6]]
b.size
#=> 1296
See Array#repeated_permutation
If the array a may contain duplicates and you wish to remove duplicate permutations you may wish to tack on Array#uniq.
a = [1, 1, 3, 1, 1, 6]
b = a.repeated_permutation(4).to_a.uniq
#=> [[1, 1, 1, 1], [1, 1, 1, 3], [1, 1, 1, 6], [1, 1, 3, 1],
# [1, 1, 3, 3], [1, 1, 3, 6], [1, 1, 6, 1], [1, 1, 6, 3],
# [1, 1, 6, 6], [1, 3, 1, 1], [1, 3, 1, 3], [1, 3, 1, 6],
# [1, 3, 3, 1], [1, 3, 3, 3], [1, 3, 3, 6], [1, 3, 6, 1],
# [1, 3, 6, 3], [1, 3, 6, 6], [1, 6, 1, 1], [1, 6, 1, 3],
# [1, 6, 1, 6], [1, 6, 3, 1], [1, 6, 3, 3], [1, 6, 3, 6],
# [1, 6, 6, 1], [1, 6, 6, 3], [1, 6, 6, 6], [3, 1, 1, 1],
# [3, 1, 1, 3], [3, 1, 1, 6], [3, 1, 3, 1], [3, 1, 3, 3],
# [3, 1, 3, 6], [3, 1, 6, 1], [3, 1, 6, 3], [3, 1, 6, 6],
# [3, 3, 1, 1], [3, 3, 1, 3], [3, 3, 1, 6], [3, 3, 3, 1],
# [3, 3, 3, 3], [3, 3, 3, 6], [3, 3, 6, 1], [3, 3, 6, 3],
# [3, 3, 6, 6], [3, 6, 1, 1], [3, 6, 1, 3], [3, 6, 1, 6],
# [3, 6, 3, 1], [3, 6, 3, 3], [3, 6, 3, 6], [3, 6, 6, 1],
# [3, 6, 6, 3], [3, 6, 6, 6], [6, 1, 1, 1], [6, 1, 1, 3],
# [6, 1, 1, 6], [6, 1, 3, 1], [6, 1, 3, 3], [6, 1, 3, 6],
# [6, 1, 6, 1], [6, 1, 6, 3], [6, 1, 6, 6], [6, 3, 1, 1],
# [6, 3, 1, 3], [6, 3, 1, 6], [6, 3, 3, 1], [6, 3, 3, 3],
# [6, 3, 3, 6], [6, 3, 6, 1], [6, 3, 6, 3], [6, 3, 6, 6],
# [6, 6, 1, 1], [6, 6, 1, 3], [6, 6, 1, 6], [6, 6, 3, 1],
# [6, 6, 3, 3], [6, 6, 3, 6], [6, 6, 6, 1], [6, 6, 6, 3],
# [6, 6, 6, 6]]
b.size
#=> 81
To create a sequence where each element is generated based on the previous one, there's Enumerator.produce, e.g.:
enum = Enumerator.produce([1, 1, 1, 1]) do |a, b, c, d|
  d += 1                # ^^^^^^^^^^^^
                        # initial value
  if d > 6
    d = 1
    c += 1
  end
  if c > 6
    c = 1
    b += 1
  end
  if b > 6
    b = 1
    a += 1
  end
  if a > 6
    raise StopIteration # <- stops enumeration
  end
  [a, b, c, d]          # <- return value = next value
end
I've kept the example intentionally simple, using an explicit variable for each of the four digits. You could of course also have an array and use a little loop to handle the increment / carry.
The above gives you:
enum.count #=> 1296
enum.first(3) #=> [[1, 1, 1, 1], [1, 1, 1, 2], [1, 1, 1, 3]]
enum.to_a.last(3) #=> [[6, 6, 6, 4], [6, 6, 6, 5], [6, 6, 6, 6]]

Yen's K shortest Path giving incorrect results (Python)

I am trying to implement Yen's K Shortest Path algorithm based on the pseudo-code from https://en.wikipedia.org/wiki/Yen%27s_algorithm. Here is the code.
import numpy as np
import networkx as nx
edge_list = [[0, 1], [0, 2], [0, 7], [1, 2], [1, 9], [2, 5], [2, 7], [2, 9], [3, 4], [3, 5], [3, 6], [3, 8], [4, 5], [4, 6], [4, 7], [4, 8], [5, 6], [5, 7], [5, 8], [6, 8], [7, 8]]
graph = nx.Graph()
graph.add_edges_from(edge_list)
nx.draw(graph, with_labels = True)
source_node = 8
destination_node = 9
def yen_ksp(graph, source, sink, K):
    A, B = [], []
    A.append(nx.shortest_path(graph, source=source, target=sink))
    for k in range(1, 1+K):
        for i in range(len(A[k - 1]) - 1):
            spurNode = A[k-1][i]
            rootPath = A[k-1][0:i+1]
            removed_edges, removed_nodes = [], []
            for p in A:
                if rootPath == p[0:i+1] and p[i:i+2] not in removed_edges:
                    removed_edges.append(p[i:i+2])
            for edge in removed_edges:
                graph.remove_edge(edge[0], edge[1])
            try:
                spurPath = nx.shortest_path(graph, source=spurNode, target=sink)
            except:
                for edge in removed_edges:
                    graph.add_edge(edge[0], edge[1])
                continue
            totalPath = rootPath + spurPath[1:]
            B.append(totalPath)
            for edge in removed_edges:
                graph.add_edge(edge[0], edge[1])
        if B == []:
            # This handles the case of there being no spur paths, or no spur paths left.
            # This could happen if the spur paths have already been exhausted (added to A),
            # or there are no spur paths at all - such as when both the source and sink vertices
            # lie along a "dead end".
            break
        B.sort()
        A.append(B[-1])
        B.pop(-1)
    return A
print(yen_ksp(graph.copy(), source_node, destination_node, 10))
This is supposed to be an undirected, unweighted graph generated from the above code.
And this is the output of the code.
[[8, 5, 2, 9],
[8, 7, 2, 9],
[8, 7, 2, 1, 9],
[8, 7, 2, 1, 2, 9],
[8, 7, 2, 1, 2, 1, 9],
[8, 7, 2, 1, 2, 1, 2, 9],
[8, 7, 2, 1, 2, 1, 2, 1, 9],
[8, 7, 2, 1, 2, 1, 2, 1, 2, 9],
[8, 7, 2, 1, 2, 1, 2, 1, 2, 1, 9],
[8, 7, 2, 1, 2, 1, 2, 1, 2, 1, 2, 9],
[8, 7, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 9]]
Obviously there are shorter paths that the algorithm missed. And, the results contain paths that have loops. I want only the ones without.
Also, in other cases, the results were in the wrong order, some longer paths appear before other paths that are shorter. In the KSP problem, the order of results is obviously important because if I stop at some k, I want to be sure that there is no shorter path that I have missed.
I am open to other algorithms that can correctly and effectively solve this problem of KSP without loops on undirected-unweighted graphs.
Please help.
NetworkX provides a function, shortest_simple_paths, that generates all simple paths in a graph from source to target, starting from the shortest ones. As you can read in the documentation, this procedure is based on Yen's algorithm.
Using it is very simple:
paths = list(nx.shortest_simple_paths(graph, source_node, target_node))
If you want only the first K shortest paths you can make use of islice:
from itertools import islice
paths = list(islice(nx.shortest_simple_paths(graph, source_node, target_node), K))
Example:
from itertools import islice
import networkx as nx
K = 10
source_node = 8
target_node = 9
graph = nx.Graph()
edge_list = [[0, 1], [0, 2], [0, 7], [1, 2], [1, 9], [2, 5], [2, 7],
             [2, 9], [3, 4], [3, 5], [3, 6], [3, 8], [4, 5], [4, 6],
             [4, 7], [4, 8], [5, 6], [5, 7], [5, 8], [6, 8], [7, 8]]
graph.add_edges_from(edge_list)
for path in islice(nx.shortest_simple_paths(graph, source_node, target_node), K):
    print(path)
Output:
[8, 5, 2, 9]
[8, 7, 2, 9]
[8, 5, 7, 2, 9]
[8, 5, 2, 1, 9]
[8, 3, 5, 2, 9]
[8, 7, 0, 1, 9]
[8, 7, 2, 1, 9]
[8, 4, 5, 2, 9]
[8, 7, 5, 2, 9]
[8, 7, 0, 2, 9]
If you want to understand how shortest_simple_paths is implemented, you can check out its source code: it's well written and very easy to understand!
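For comparison with the code in the question, here is a minimal standard-library sketch of Yen's algorithm for unweighted, undirected graphs. It is not NetworkX's implementation: bfs_path and the adjacency dict adj are hypothetical names introduced for illustration. It differs from the question's version in three ways that seem to explain the looping, misordered output: root-path nodes are excluded from the spur search, candidates are ranked by length rather than lexicographically, and the shortest candidate is accepted instead of the last one. It returns at most K paths.

```python
from collections import deque


def bfs_path(adj, source, sink, banned_nodes=(), banned_edges=()):
    """Fewest-edge path avoiding banned nodes/edges, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == sink:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v in parent or v in banned_nodes:
                continue
            if (u, v) in banned_edges or (v, u) in banned_edges:
                continue
            parent[v] = u
            queue.append(v)
    return None


def yen_ksp(adj, source, sink, K):
    """Return up to K loopless paths, shortest (fewest edges) first."""
    first = bfs_path(adj, source, sink)
    if first is None:
        return []
    A, B = [first], []  # accepted paths, candidate paths
    while len(A) < K:
        prev = A[-1]
        for i in range(len(prev) - 1):
            root = prev[:i + 1]
            # Ban edges that would recreate an already accepted path.
            banned_edges = {(p[i], p[i + 1]) for p in A if p[:i + 1] == root}
            # Ban root-path nodes (except the spur node) to prevent loops.
            banned_nodes = set(root[:-1])
            spur = bfs_path(adj, prev[i], sink, banned_nodes, banned_edges)
            if spur is not None:
                total = root + spur[1:]
                if total not in A and total not in B:
                    B.append(total)
        if not B:
            break
        B.sort(key=len)      # rank candidates by length
        A.append(B.pop(0))   # accept the shortest candidate
    return A


edge_list = [[0, 1], [0, 2], [0, 7], [1, 2], [1, 9], [2, 5], [2, 7],
             [2, 9], [3, 4], [3, 5], [3, 6], [3, 8], [4, 5], [4, 6],
             [4, 7], [4, 8], [5, 6], [5, 7], [5, 8], [6, 8], [7, 8]]
adj = {}
for u, v in edge_list:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
for path in yen_ksp(adj, 8, 9, 10):
    print(path)
```

Paths of equal length may come out in a different order than with shortest_simple_paths, since ties are broken arbitrarily.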

Breaking a matrix into smaller sub-lists

I can't wrap my head around how to achieve this. To be more specific, I would like to break the following matrix
matrix = [[7, 9, 1, 8, 9, 1],
          [4, 2, 1, 2, 1, 5],
          [3, 2, 3, 1, 2, 3],
          [7, 9, 11, 6, 4, 8],
          [8, 9, 22, 3, 1, 9],
          [1, 1, 1, 1, 1, 1]]
into:
[[7, 9,
  4, 2],
 [1, 8,
  1, 2],
 [9, 1,
  1, 5],
 [3, 2,
  7, 9],
 [3, 1,
  11, 6],
 [2, 3,
  4, 8],
 [8, 9,
  1, 1],
 [22, 3,
  1, 1],
 [1, 9,
  1, 1]]
Or equivalently,
[[7, 9, 4, 2],
 [1, 8, 1, 2],
 [9, 1, 1, 5],
 [3, 2, 7, 9],
 [3, 1, 11, 6],
 [2, 3, 4, 8],
 [8, 9, 1, 1],
 [22, 3, 1, 1],
 [1, 9, 1, 1]]
Here is what I have tried doing:
def split(elevation_map):
    split_matrix = []
    temp_map = []
    row_limit, col_limit = 2, 2
    for row in range(len(elevation_map)):
        for col in range(len(elevation_map)):
            elevation = elevation_map[row][col]
            if row < row_limit and col < col_limit:
                temp_map.append(elevation)
        split_matrix.append(temp_map)
    return split_matrix
However, I had no luck in doing so.
Is there a way to do it without using libraries like numpy? Is it possible?
The solution is going to be neater if we write a helper function to extract one 2x2 sub-matrix into a list. After that, it's a simple list comprehension, iterating over the coordinates of the top-left of each submatrix.
def split_matrix(matrix, rows=2, cols=2):
    def helper(i, j):
        out = []
        for row in matrix[i:i+rows]:
            out.extend(row[j:j+cols])
        return out
    width, height = len(matrix[0]), len(matrix)
    return [
        helper(i, j)
        for i in range(0, height, rows)
        for j in range(0, width, cols)
    ]
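As a quick check of the idea, the same block extraction can be written as one nested comprehension and run on the matrix from the question. The name split_blocks is only illustrative; it assumes a rectangular, non-empty matrix.

```python
def split_blocks(matrix, rows=2, cols=2):
    """Flatten each rows x cols block of `matrix` into one list, row-major."""
    return [
        [value for row in matrix[i:i + rows] for value in row[j:j + cols]]
        for i in range(0, len(matrix), rows)      # top edge of each block
        for j in range(0, len(matrix[0]), cols)   # left edge of each block
    ]


matrix = [[7, 9, 1, 8, 9, 1],
          [4, 2, 1, 2, 1, 5],
          [3, 2, 3, 1, 2, 3],
          [7, 9, 11, 6, 4, 8],
          [8, 9, 22, 3, 1, 9],
          [1, 1, 1, 1, 1, 1]]

blocks = split_blocks(matrix)
print(blocks[:3])  # [[7, 9, 4, 2], [1, 8, 1, 2], [9, 1, 1, 5]]
```

If the dimensions are not multiples of rows and cols, slicing simply yields smaller blocks at the right and bottom edges.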

Sorting a vector of vectors by multiple elements in Julia

I've had a good read of the 'Sorting Functions' section of the Julia manual, and had a look at some of the similar questions that have already been asked on this board, but I don't think I've quite found the answer to my question. Apologies if I've missed something.
Essentially I have a vector of vectors, with the enclosed vectors containing integers. For the purposes of the example, each enclosed vector contains 3 integers, but it could be any number. I want to sort the enclosed vectors by the first element, then by the second element, then by the third element etc.
Let's start with the vector:
v = [[3, 6, 1], [2, 2, 6], [1, 5, 9], [2, 1, 8], [3, 7, 9],
[1, 1, 2], [2, 2, 2], [3, 6, 2], [1, 2, 5], [1, 5, 6],
[3, 7, 4], [2, 1, 4], [2, 2, 1], [3, 1, 2], [1, 2, 8]]
And continue with what I'm actually looking for:
v = [[1, 1, 2], [1, 2, 5], [1, 2, 8], [1, 5, 6], [1, 5, 9],
[2, 1, 4], [2, 1, 8], [2, 2, 1], [2, 2, 2], [2, 2, 6],
[3, 1, 2], [3, 6, 1], [3, 6, 2], [3, 7, 4], [3, 7, 9]]
So there should be no requirement for rocket science.
I can easily sort the vector by the first element of the enclosed vectors by one of two ways:
v = sort(v, lt = (x, y) -> isless(x[1], y[1]))
or:
v = sort(v, by = x -> x[1])
Both these methods produce the same answer:
v = [[1, 5, 9], [1, 1, 2], [1, 2, 5], [1, 5, 6], [1, 2, 8],
[2, 2, 6], [2, 1, 8], [2, 2, 2], [2, 1, 4], [2, 2, 1],
[3, 6, 1], [3, 7, 9], [3, 6, 2], [3, 7, 4], [3, 1, 2]]
So, as you can see, I have sorted by the first element of the enclosed vectors, but not by the subsequent elements.
So, to come back to the question in the title, is there a method of sorting by multiple elements using the sort() function?
I can actually get what I want using loops:
for i = 3:-1:1
    v = sort(v, lt = (x, y) -> isless(x[i], y[i]))
end
or:
for i = 3:-1:1
    v = sort(v, by = x -> x[i])
end
However, I don't want to re-invent the wheel, so if there's a way of doing it within the sort() function I'd love to learn about it.
You can pass the lexless function as the lt keyword argument; it does exactly what you want, if I understood your question correctly:
julia> sort(v, lt=lexless)
15-element Array{Array{Int64,1},1}:
[1, 1, 2]
[1, 2, 5]
[1, 2, 8]
[1, 5, 6]
[1, 5, 9]
[2, 1, 4]
[2, 1, 8]
[2, 2, 1]
[2, 2, 2]
[2, 2, 6]
[3, 1, 2]
[3, 6, 1]
[3, 6, 2]
[3, 7, 4]
[3, 7, 9]
EDIT: I have just checked that this is a solution for Julia 0.6. In Julia 0.7 you can simply write:
julia> sort(v)
15-element Array{Array{Int64,1},1}:
[1, 1, 2]
[1, 2, 5]
[1, 2, 8]
[1, 5, 6]
[1, 5, 9]
[2, 1, 4]
[2, 1, 8]
[2, 2, 1]
[2, 2, 2]
[2, 2, 6]
[3, 1, 2]
[3, 6, 1]
[3, 6, 2]
[3, 7, 4]
[3, 7, 9]

Split a 3D numpy array into 3D blocks

I would like to split a 3D numpy array into 3D blocks in a 'pythonic' way. I am working with image sequences that are somewhat large arrays (1000X1200X1600), so I need to split them into pieces to do my processing.
I have written functions to do this, but I am wondering if there is a native numpy way to accomplish this - numpy.split does not seem to do what I want for 3D arrays (but perhaps I don't understand its functionality)
To be clear: the code below accomplishes my task, but I am seeking a faster way to do it.
import numpy as np
def make_blocks(x,t):
    #x should be a yXmXn array, and t should evenly divide m and n
    #returns a list of 3D blocks of size yXtXt
    down = range(0,x.shape[1],t)
    across = range(0,x.shape[2],t)
    reshaped = []
    for d in down:
        for a in across:
            reshaped.append(x[:,d:d+t,a:a+t])
    return reshaped
def unmake_blocks(x,d,m,n):
    #this takes a list of matrix blocks of size dXd that is m*n/d^2 long
    #returns a 2D array of size mXn
    rows = []
    for i in range(0,int(m/d)):
        rows.append(np.hstack(x[i*int(n/d):(i+1)*int(n/d)]))
    return np.vstack(rows)
Here are vectorized versions of those loopy implementations using a combination of permuting dims with np.transpose and reshaping -
def make_blocks_vectorized(x,d):
    p,m,n = x.shape
    return x.reshape(-1,m//d,d,n//d,d).transpose(1,3,0,2,4).reshape(-1,p,d,d)
def unmake_blocks_vectorized(x,d,m,n):
    return np.concatenate(x).reshape(m//d,n//d,d,d).transpose(0,2,1,3).reshape(m,n)
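To make the reshape/transpose chain in make_blocks_vectorized easier to follow, the sketch below splits it into named steps and checks the result against plain slicing. The sizes p, m, n, d and the step1/step2 names are arbitrary choices for illustration.

```python
import numpy as np

p, m, n, d = 2, 4, 6, 2                      # small illustrative sizes
x = np.arange(p * m * n).reshape(p, m, n)

# Split the m and n axes into (block index, offset within block).
step1 = x.reshape(p, m // d, d, n // d, d)   # (p, m/d, d, n/d, d)
# Bring the two block-index axes to the front.
step2 = step1.transpose(1, 3, 0, 2, 4)       # (m/d, n/d, p, d, d)
# Merge the block-index axes into a single "block number" axis.
blocks = step2.reshape(-1, p, d, d)          # (m/d * n/d, p, d, d)

# The same blocks obtained by slicing, in row-major block order.
expected = [x[:, r:r + d, c:c + d]
            for r in range(0, m, d)
            for c in range(0, n, d)]
assert all(np.array_equal(b, e) for b, e in zip(blocks, expected))

print(step1.shape, step2.shape, blocks.shape)
# (2, 2, 2, 3, 2) (2, 3, 2, 2, 2) (6, 2, 2, 2)
```

The final reshape generally has to copy data because the transposed array is no longer contiguous, whereas the slices in the loop version are views.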
Sample run for make_blocks -
In [120]: x = np.random.randint(0,9,(2,4,4))
In [121]: make_blocks(x,2)
Out[121]:
[array([[[4, 7],
[8, 3]],
[[0, 5],
[3, 2]]]), array([[[5, 7],
[4, 0]],
[[7, 3],
[5, 7]]]), ... and so on.
In [122]: make_blocks_vectorized(x,2)
Out[122]:
array([[[[4, 7],
[8, 3]],
[[0, 5],
[3, 2]]],
[[[5, 7],
[4, 0]],
[[7, 3],
[5, 7]]], ... and so on.
Sample run for unmake_blocks -
In [135]: A = [np.random.randint(0,9,(3,3)) for i in range(6)]
In [136]: d = 3
In [137]: m,n = 6,9
In [138]: unmake_blocks(A,d,m,n)
Out[138]:
array([[6, 6, 7, 8, 6, 4, 5, 4, 8],
[8, 8, 3, 2, 7, 6, 8, 5, 1],
[5, 2, 2, 7, 1, 2, 3, 1, 5],
[6, 7, 8, 2, 2, 1, 6, 8, 4],
[8, 3, 0, 4, 4, 8, 8, 6, 3],
[5, 5, 4, 8, 5, 2, 2, 2, 3]])
In [139]: unmake_blocks_vectorized(A,d,m,n)
Out[139]:
array([[6, 6, 7, 8, 6, 4, 5, 4, 8],
[8, 8, 3, 2, 7, 6, 8, 5, 1],
[5, 2, 2, 7, 1, 2, 3, 1, 5],
[6, 7, 8, 2, 2, 1, 6, 8, 4],
[8, 3, 0, 4, 4, 8, 8, 6, 3],
[5, 5, 4, 8, 5, 2, 2, 2, 3]])
Alternative to make_blocks with view_as_blocks -
from skimage.util.shape import view_as_blocks
def make_blocks_vectorized_v2(x,d):
    return view_as_blocks(x,(x.shape[0],d,d))
Runtime test
1) make_blocks with original and view_as_blocks based approaches -
In [213]: x = np.random.randint(0,9,(100,160,120)) # scaled down by 10
In [214]: %timeit make_blocks(x,10)
1000 loops, best of 3: 198 µs per loop
In [215]: %timeit view_as_blocks(x,(x.shape[0],10,10))
10000 loops, best of 3: 85.4 µs per loop
2) unmake_blocks with original and transpose+reshape based approaches -
In [237]: A = [np.random.randint(0,9,(10,10)) for i in range(600)]
In [238]: d = 10
In [239]: m,n = 10*20,10*30
In [240]: %timeit unmake_blocks(A,d,m,n)
100 loops, best of 3: 2.03 ms per loop
In [241]: %timeit unmake_blocks_vectorized(A,d,m,n)
1000 loops, best of 3: 511 µs per loop
