How to get AdvancedSubtensor on GPU

I have a subtensor expression that, for some reason, Theano cannot move to the GPU.
Some sample code:
import numpy
import theano
import theano.printing
import theano.compile.io
import theano.compile.function_module
import theano.tensor as T
from theano.sandbox.cuda.basic_ops import as_cuda_ndarray_variable
n_copies, n_cells = 5, 10
P = T.constant(numpy.zeros((n_copies, n_cells), dtype="int32")) # (n_copies,n_cells) -> list of indices
meminkey = T.fmatrix() # (batch,n_cells)
meminkey = as_cuda_ndarray_variable(meminkey)
i_t = T.ones((meminkey.shape[0],))
batches = T.arange(0, i_t.shape[0]).dimshuffle(0, 'x', 'x') # (batch,n_copies,n_cells)
P_bc = P.dimshuffle('x', 0, 1) # (batch,n_copies,n_cells)
meminkeyP = meminkey[batches, P_bc] # (batch,n_copies,n_cells)
meminkeyP = as_cuda_ndarray_variable(meminkeyP)
func = theano.function(inputs=[meminkey], outputs=[meminkeyP])
theano.printing.debugprint(func)
I added some as_cuda_ndarray_variable calls to make the problem clearer: the output shows the GpuFromHost and HostFromGpu transfers, which Theano would avoid if it could run the AdvancedSubtensor on the GPU. Output:
Using gpu device 0: GeForce GTX TITAN (CNMeM is disabled, CuDNN not available)
GpuFromHost [id A] '' 5
|AdvancedSubtensor [id B] '' 4
|HostFromGpu [id C] '' 1
| |<CudaNdarrayType(float32, matrix)> [id D]
|InplaceDimShuffle{0,x,x} [id E] '' 3
| |ARange{dtype='int64'} [id F] '' 2
| |TensorConstant{0} [id G]
| |Shape_i{0} [id H] '' 0
| | |<CudaNdarrayType(float32, matrix)> [id D]
| |TensorConstant{1} [id I]
|TensorConstant{[[[4 0 1 2..5 8 9 7]]]} [id J]
So, why is Theano not able to transform this into a GPU op?
Also, how can I rewrite the code that Theano will do the calculation on GPU?
Related questions in Google Groups: here, here, and here.

OK, so the Google Groups posts I linked explain quite well why it doesn't work. AdvancedSubtensor is the most generic form and handles every variant of advanced indexing. AdvancedSubtensor1 only handles a restricted subset of cases. A GPU version exists only for AdvancedSubtensor1, not for AdvancedSubtensor. I didn't fully understand the reason, but there is an ongoing discussion about it.
AdvancedSubtensor1 can be used when there is a single list of indices. However, in my example, that is not the case. The common workaround, also used in some of the examples in those Google Groups posts, is to flatten the array first and calculate the indices into the flattened array.
Most examples work with something like nonzero(), where you flatten the base argument in the same way and then get the indices for the flattened version.
So, the question is, how to apply this to my code?
Actually, there is a simpler solution that uses AdvancedSubtensor1, which I didn't realize initially:
meminkeyP = meminkey[:, P] # (batch,n_copies,n_cells)
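As a quick sanity check of that equivalence, here is a minimal NumPy sketch. It assumes Theano's advanced indexing follows NumPy's semantics for this case; the array sizes and the random index matrix are made up for illustration and are not taken from the original code.

```python
import numpy as np

rng = np.random.RandomState(0)
n_batch, n_copies, n_cells = 3, 5, 10  # illustrative sizes

A = rng.rand(n_batch, n_cells).astype("float32")       # plays the role of meminkey
P = rng.randint(0, n_cells, size=(n_copies, n_cells))  # plays the role of P

# General form: one broadcastable index array per axis.
batches = np.arange(n_batch)[:, None, None]  # (batch, 1, 1)
P_bc = P[None, :, :]                         # (1, n_copies, n_cells)
general = A[batches, P_bc]

# Simple form: slice the first axis, index the second with P.
simple = A[:, P]

print(general.shape, simple.shape, np.array_equal(general, simple))
```

Both expressions have shape (batch, n_copies, n_cells) and hold the same values, so the slice form can stand in for the two-index form whenever the first index just enumerates the batch axis.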
However, before I realized that, I came up with a generic solution which also works for other cases. I transform my indices tuple (batches, P_bc) into indices for the flattened version. This is done with this function:
def indices_in_flatten_array(ndim, shape, *args):
    """
    We expect that all args can be broadcasted together.
    So, if we have some array A with ndim&shape as given,
    A[args] would give us a subtensor.
    We return the indices so that A[args].flatten()
    and A.flatten()[indices] are the same.
    """
    assert ndim > 0
    assert len(args) == ndim
    indices_per_axis = [args[i] for i in range(ndim)]
    for i in range(ndim):
        for j in range(i + 1, ndim):
            indices_per_axis[i] *= shape[j]
    indices = indices_per_axis[0]
    for i in range(1, ndim):
        indices += indices_per_axis[i]
    return indices
Then, I use it like this:
meminkeyP = meminkey.flatten()[indices_in_flatten_array(meminkey.ndim, meminkey.shape, batches, P_bc)]
This seems to work.
And I get this output:
Using gpu device 0: GeForce GTX TITAN (CNMeM is disabled, CuDNN not available)
GpuReshape{3} [id A] '' 11
|GpuAdvancedSubtensor1 [id B] '' 10
| |GpuReshape{1} [id C] '' 2
| | |<CudaNdarrayType(float32, matrix)> [id D]
| | |TensorConstant{(1,) of -1} [id E]
| |Reshape{1} [id F] '' 9
| |Elemwise{second,no_inplace} [id G] '' 8
| | |TensorConstant{(1, 5, 10) of 0} [id H]
| | |Elemwise{Mul}[(0, 0)] [id I] '' 7
| | |InplaceDimShuffle{0,x,x} [id J] '' 6
| | | |ARange{dtype='int64'} [id K] '' 4
| | | |TensorConstant{0} [id L]
| | | |Shape_i{0} [id M] '' 0
| | | | |<CudaNdarrayType(float32, matrix)> [id D]
| | | |TensorConstant{1} [id N]
| | |InplaceDimShuffle{x,x,x} [id O] '' 5
| | |Shape_i{1} [id P] '' 1
| | |<CudaNdarrayType(float32, matrix)> [id D]
| |TensorConstant{(1,) of -1} [id E]
|MakeVector{dtype='int64'} [id Q] '' 3
|Shape_i{0} [id M] '' 0
|TensorConstant{5} [id R]
|TensorConstant{10} [id S]
Small test case:
def test_indices_in_flatten_array():
    n_copies, n_cells = 5, 4
    n_complex_cells = n_cells // 2  # integer division, so it can be used as a slice bound
    n_batch = 3
    static_rng = numpy.random.RandomState(1234)
    def make_permut():
        p = numpy.zeros((n_copies, n_cells), dtype="int32")
        for i in range(n_copies):
            p[i, :n_complex_cells] = static_rng.permutation(n_complex_cells)
            # Same permutation for imaginary part.
            p[i, n_complex_cells:] = p[i, :n_complex_cells] + n_complex_cells
        return T.constant(p)
    P = make_permut()  # (n_copies,n_cells) -> list of indices
    meminkey = T.as_tensor_variable(static_rng.rand(n_batch, n_cells).astype("float32"))
    i_t = T.ones((meminkey.shape[0],))  # (batch,)
    n_batch = i_t.shape[0]
    batches = T.arange(0, n_batch).dimshuffle(0, 'x', 'x')  # (batch,n_copies,n_cells)
    P_bc = P.dimshuffle('x', 0, 1)  # (batch,n_copies,n_cells)
    meminkeyP1 = meminkey[batches, P_bc]  # (batch,n_copies,n_cells)
    meminkeyP2 = meminkey.flatten()[indices_in_flatten_array(meminkey.ndim, meminkey.shape, batches, P_bc)]
    numpy.testing.assert_allclose(meminkeyP1.eval(), meminkeyP2.eval())
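The index arithmetic itself can also be checked without Theano. The following is a minimal NumPy sketch of the same row-major flattening idea; flat_indices is a hypothetical helper written for this illustration (it uses Horner-style accumulation rather than the nested loops above), and the sizes are arbitrary.

```python
import numpy as np

def flat_indices(shape, *index_arrays):
    """Row-major flat index for one broadcastable index array per axis."""
    assert len(index_arrays) == len(shape) > 0
    flat = 0
    for size, idx in zip(shape, index_arrays):
        # Horner scheme: ((i0 * s1) + i1) * s2 + i2 ...
        flat = flat * size + np.asarray(idx)
    return flat

rng = np.random.RandomState(1234)
n_batch, n_copies, n_cells = 3, 5, 4
A = rng.rand(n_batch, n_cells)
P = rng.randint(0, n_cells, size=(n_copies, n_cells))

batches = np.arange(n_batch)[:, None, None]
P_bc = P[None, :, :]

direct = A[batches, P_bc]
via_flat = A.ravel()[flat_indices(A.shape, batches, P_bc)]
print(np.array_equal(direct, via_flat))
```

Because the flat index array keeps the broadcast shape (batch, n_copies, n_cells), indexing the flattened array with it already returns the result in the final shape.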

Related

Julia matrix operation

I am looking for a matrix operation, but I am not sure whether an existing operation covers it.
Example: P=[1 2 ; 3 4] and Q=[5 6 ; 7 8]
[P ; Q] # [P ; Q] => [P*P ; P*Q ; Q*P ; Q*Q]
# is the operation that I am looking for.
Thanks!
You can just define your custom operator such as:
function ⊗(a::Matrix,b::Matrix)
    h1 = Int(size(a,1)/2)
    P1 = @view a[1:h1,:]
    Q1 = @view a[h1+1:end,:]
    h2 = Int(size(b,1)/2)
    P2 = @view b[1:h2,:]
    Q2 = @view b[h2+1:end,:]
    [P1*P2 ; P1*Q2 ; Q1*P2 ; Q1*Q2]
end
And now use it!
julia> [P ; Q] ⊗ [P ; Q] == [P*P ; P*Q ; Q*P ; Q*Q]
true
Perhaps you need to add size checks etc.
You might also want to have an additional method function ⊗(a::Tuple{Matrix,Matrix},b::Tuple{Matrix,Matrix}) so you do not need to merge the P and Q matrices and then decompose them again later.
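For readers who want to check the block structure outside Julia, here is a rough NumPy sketch of the same operation, including the size check suggested above. block_products is a hypothetical name chosen for this illustration; it is not part of any library.

```python
import numpy as np

def block_products(a, b):
    """Split a and b into top/bottom halves and stack the four products."""
    if a.shape[0] % 2 or b.shape[0] % 2:
        raise ValueError("both inputs need an even number of rows")
    h1, h2 = a.shape[0] // 2, b.shape[0] // 2
    p1, q1 = a[:h1], a[h1:]   # top and bottom half of a
    p2, q2 = b[:h2], b[h2:]   # top and bottom half of b
    return np.vstack([p1 @ p2, p1 @ q2, q1 @ p2, q1 @ q2])

P = np.array([[1, 2], [3, 4]])
Q = np.array([[5, 6], [7, 8]])
PQ = np.vstack([P, Q])

result = block_products(PQ, PQ)
expected = np.vstack([P @ P, P @ Q, Q @ P, Q @ Q])
print(np.array_equal(result, expected))
```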

Generate and Test accumulating valid answer for next test

I know how to do a simple generate and test to return each answer individually. In the following example only items that are greater than 1 are returned.
item(1).
item(1).
item(2).
item(3).
item(1).
item(7).
item(1).
item(4).
gen_test(Item) :-
    item(Item), % generate
    Item > 1. % test
?- gen_test(Item).
Item = 2 ;
Item = 3 ;
Item = 7 ;
Item = 4.
I can also return the results as a list using bagof/3
gen_test_bagof(List) :-
    bagof(Item,(item(Item),Item > 1), List).
?- gen_test_bagof(List).
List = [2, 3, 7, 4].
Now I would like to be able to change the test so that it uses member/2 with a list where the list is the accumulation of all previous valid answers.
I have tried this
gen_test_acc_facts(L) :-
    gen_test_acc_facts([],L).
gen_test_acc_facts(Acc0,R) :-
    item(H), % generate
    (
        member(H,Acc0) % test
    ->
        gen_test_acc_facts(Acc0,R) % Passes test, don't accumulate, run generate and test again.
    ;
        gen_test_acc_facts([H|Acc0],R) % Fails test, accumulate, run generate and test again.
    ).
but since it is recursive, every call of item/1 results in the same first fact being used.
I suspect the answer will require eliminating backtracking as was done in this answer by mat because this needs to preserve information over backtracking.
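To make the intended behaviour concrete outside Prolog, here is a small Python sketch of a generate-and-test loop whose test consults the answers accepted so far. The function name gen_test_acc and the equivalent parameter are illustrative assumptions; in the graph problem, equivalent would be an isomorphism check.

```python
def gen_test_acc(items, equivalent=lambda a, b: a == b):
    """Yield each item that is not equivalent to an already accepted one."""
    accepted = []  # state that survives across generated candidates
    for item in items:                                     # generate
        if any(equivalent(item, prev) for prev in accepted):
            continue                                       # already covered, skip
        accepted.append(item)                              # accumulate
        yield item

items = [1, 1, 2, 3, 1, 7, 1, 4]
print(list(gen_test_acc(items)))
```

The point of the sketch is that the accumulator lives outside the candidate generator, which is the part that plain backtracking over item/1 does not give you.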
Details
The example given is a simplified version of the real problem, which is to generate graphs with N vertices where the edges are undirected, have no loops (self references), and have no weights; the vertices are labeled, there is no root vertex, and the set of graphs contains only the non-isomorphic graphs. Generating all of the graphs for N is easy, and while I can accumulate all of the graphs into a list first and then test all of them against each other, the memory is exhausted once N=8.
?- graph_sizes(N,Count).
N = 0, Count = 1 ;
N = Count, Count = 1 ;
N = Count, Count = 2 ;
N = 3, Count = 8 ;
N = 4, Count = 64 ;
N = 5, Count = 1024 ;
N = 6, Count = 32768 ;
N = 7, Count = 2097152 ;
ERROR: Out of global stack
However as there are many redundant isomorphic graphs generated, by pruning the list as it grows, the size of N can be increased. See: Enumerate all non-isomorphic graphs of a certain size
gen_vertices(N,L) :-
    findall(X,between(1,N,X),L).
gen_edges(Vertices, Edges) :-
    findall((X,Y), (member(X, Vertices), member(Y, Vertices), X < Y), Edges).
gen_combination_numerator(N,Numerator) :-
    findall(X,between(0,N,X),L0),
    member(Numerator,L0).
comb(0,_,[]).
comb(N,[X|T],[X|Comb]) :-
    N>0,
    N1 is N-1,
    comb(N1,T,Comb).
comb(N,[_|T],Comb) :-
    N>0,
    comb(N,T,Comb).
gen_graphs(N,Graph) :-
    gen_vertices(N,Vertices),
    gen_edges(Vertices,Edges),
    length(Edges,Denominator),
    gen_combination_numerator(Denominator,Numerator),
    comb(Numerator,Edges,Graph).
graph_sizes(N,Count) :-
    length(_,N),
    findall(x,gen_graphs(N,_), Sols), % any constant works as the template; only the count matters
    length(Sols,Count).
The test for isomorphic graphs in Prolog.
Examples of generated graphs:
?- gen_graphs(1,G).
G = [] ;
false.
?- gen_graphs(2,G).
G = [] ;
G = [(1, 2)] ;
false.
?- gen_graphs(3,G).
G = [] ;
G = [(1, 2)] ;
G = [(1, 3)] ;
G = [(2, 3)] ;
G = [(1, 2), (1, 3)] ;
G = [(1, 2), (2, 3)] ;
G = [(1, 3), (2, 3)] ;
G = [(1, 2), (1, 3), (2, 3)] ;
false.
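For comparison, the same enumeration can be sketched in Python with itertools. This is only an illustration of what gen_graphs/2 produces (every subset of the possible edges, grouped by edge count); the Python function name is reused here for readability and is not part of the original program.

```python
from itertools import combinations

def gen_graphs(n):
    """Yield every labeled simple graph on vertices 1..n as an edge list."""
    vertices = range(1, n + 1)
    edges = list(combinations(vertices, 2))   # all pairs (x, y) with x < y
    for k in range(len(edges) + 1):           # number of edges in the graph
        for graph in combinations(edges, k):  # choose k of the possible edges
            yield list(graph)

for g in gen_graphs(3):
    print(g)
print(sum(1 for _ in gen_graphs(4)))
```

Since it is a generator, the graphs are produced one at a time and never all held in memory, which is the property the pruning approach needs.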
The differences between all the graphs being generated (A006125) vs the desired graphs (A001349) .
N A006125 - A001349 = Extraneous
0 1 - 1 = 0
1 1 - 1 = 0
2 2 - 1 = 1
3 8 - 2 = 6
4 64 - 6 = 58
5 1024 - 21 = 1003
6 32768 - 112 = 32656
7 2097152 - 853 = 2096299
8 268435456 - 11117 = 268424339
9 68719476736 - 261080 = 68719215656
10 35184372088832 - 11716571 = 35184360372261
11 36028797018963968 - 1006700565 = 36028796012263403
12 73786976294838206464 - 164059830476 = 73786976130778375988
13 302231454903657293676544 - 50335907869219 = 302231454853321385807325
14 2475880078570760549798248448 - 29003487462848061 = 2475880078541757062335400387
15 40564819207303340847894502572032 - 31397381142761241960 = 40564819207271943466751741330072
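The A006125 column follows a closed form: a labeled graph on n vertices either contains or omits each of the n(n-1)/2 possible edges, so there are 2^(n(n-1)/2) of them. A short Python sketch reproduces the column; the A001349 values below are simply copied from the table for n = 0..8 and are not computed.

```python
def labeled_graph_count(n):
    """Number of labeled simple graphs on n vertices (A006125)."""
    return 2 ** (n * (n - 1) // 2)

# Values copied from the A001349 column above (n = 0..8), not computed here.
A001349 = [1, 1, 1, 2, 6, 21, 112, 853, 11117]

for n, wanted in enumerate(A001349):
    total = labeled_graph_count(n)
    print(n, total, "-", wanted, "=", total - wanted)
```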
Using geng and listg
Both of these programs, among many others, are included in the nauty and Traces download linked on the home page. (User's guide)
The programs are written in C, are built with make, and run on Linux. On Windows, WSL can be installed instead of using Cygwin.
The source code can be downloaded using
wget "http://pallini.di.uniroma1.it/nauty26r11.tar.gz"
then
tar xvzf nauty26r11.tar.gz
cd nauty26r11
./configure
make
nauty generates output in graph6 format by default but can be converted to list of edges using listg, e.g.
eric@WINDOWS-XYZ:~/nauty26r11$ ./geng 3 | ./listg -e
>A ./geng -d0D2 n=3 e=0-3
>Z 4 graphs generated in 0.00 sec
Graph 1, order 3.
3 0
Graph 2, order 3.
3 1
0 2
Graph 3, order 3.
3 2
0 2 1 2
Graph 4, order 3.
3 3
0 1 0 2 1 2
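If the edge lists are to be consumed by another program, the text above is easy to parse. The following Python sketch assumes the layout is exactly as in this sample (a "Graph k, order n." header, then the vertex and edge counts, then the edge endpoints as whitespace-separated integers); parse_listg_edges is a hypothetical helper and has not been checked against other listg options.

```python
def _to_graph(tokens):
    """tokens = [n, m, u1, v1, u2, v2, ...] -> (n, [(u1, v1), ...])."""
    n, m = tokens[0], tokens[1]
    pairs = tokens[2:2 + 2 * m]
    return n, [(pairs[i], pairs[i + 1]) for i in range(0, len(pairs), 2)]

def parse_listg_edges(text):
    """Parse output shaped like the `listg -e` sample into (order, edges) tuples."""
    graphs, tokens = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):   # skip blanks and the >A / >Z status lines
            continue
        if line.startswith("Graph"):           # header starts a new graph
            if tokens is not None:
                graphs.append(_to_graph(tokens))
            tokens = []
        elif tokens is not None:
            tokens.extend(int(t) for t in line.split())
    if tokens is not None:
        graphs.append(_to_graph(tokens))
    return graphs

sample = """Graph 1, order 3.
3 0
Graph 2, order 3.
3 1
0 2
Graph 3, order 3.
3 2
0 2 1 2
Graph 4, order 3.
3 3
0 1 0 2 1 2
"""
print(parse_listg_edges(sample))
```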
Useful options for the programs
geng
-help : options
-c : only write connected graphs (A001349)
-u : do not output any graphs, just generate and count them
Example
eric@WINDOWS-ABC:~/nauty26r11$ ./geng -c -u 10
>A ./geng -cd1D9 n=10 e=9-45
>Z 11716571 graphs generated in 5.09 sec
Notice that 11716571 is the A001349 value for N = 10.
How to create file on Windows using WSL
Since WSL can access the Windows file system it is possible to direct the output from WSL commands to a Windows file, e.g.
touch /mnt/c/Users/Eric/graphs.txt
The touch step is not needed, but I use it to create an empty file first then verify that it is there on Windows to ensure I have typed the path correctly.
Here is an example that creates the graph edge list for A001349 in the users directory.
./geng -c 1 | ./listg -e >> /mnt/c/Users/Eric/graphs.txt
./geng -c 2 | ./listg -e >> /mnt/c/Users/Eric/graphs.txt
In SWI-Prolog you can store values in global variables:
backtrackable (b): b_setval, b_getval
non-backtrackable (nb): nb_setval, nb_getval
besides using dynamic predicates: assert/retract.
item(1).
item(1).
item(2).
item(3).
item(1).
item(7).
item(1).
item(4).
Solution 1 using regular list
gen_test(Item) :-
    nb_setval(sofar, []),
    item(Item), % generate
    once(
        (
            nb_getval(sofar, SoFar),
            (Item > 1, \+ member(Item, SoFar)), % test custom + on membership in earlier tests
            nb_setval(sofar, [Item | SoFar])
        )
        ;
        true
    ),
    fail; true.
Solution 2 using open list
gen_test1(Item) :-
    (
        item(Item), % generate
        Item > 1, lookup(Item, SoFar, ItemExists),
        ItemExists == true -> fail; true
    );
    append(SoFar, [], SoFar), % stubbing open list
    nb_setval(sofar, SoFar).
lookup( I, [ I | _ ], true).
lookup( _, [ _ | L ], false) :-
    var( L ); L == [].
lookup( I, [ _ | L ], ItemExists ):-
    lookup( I, L, ItemExists ).

Delete a letter

I created something, but it doesn't work. The exercise asks to delete a letter. Example: in [c,o,m,k,p,u,t,e,r] the k must be eliminated.
den([c,o,m,p,u,t,e,r]).
den([n,e,t,w,o,r,k]).
den([p,r,o,g,r,a,m]).
% (c) delete(X,L1,L2):-
% append(A,[X,T],L1),
% append(A,T,L2).
% <------------------ L -------------------->
% +-----------------------------------------+
% |<-> A <-> | X | <-> B <-> | Y | <-> C <->|
% +-----------------------------------------+
% <--------- F --------->
% +-------------------------------------+
% |<-> A <-> | <-> B <-> | Y | <-> C <->|
% +-------------------------------------+
% <-------------- CL --------------->
% +---------------------------------+
% |<-> A <-> | <-> B <-> | <-> C <->|
% +---------------------------------+
delete_extra(Word, CorrectWord) :-
    append(Begin, [Letter1|Ypoloipo], Word),
    append(Middle, [Letter2|End], Ypoloipo),
    word(CorrectWord),
    append(Begin, YpoloipoCW, CorrectWord),
    append(Middle, End, YpoloipoCW),
    Letter1 \= Letter2.
Could it be as simple as:
delete_extra(Word, CorrectWord) :-
    select(_, Word, CorrectWord),
    den(CorrectWord).
Sample call:
?- delete_extra([c,o,m,k,p,u,t,e,r], CorrectWord).
CorrectWord = [c, o, m, p, u, t, e, r] ;
false.
The select/3 predicate is a de facto standard library predicate over lists that non-deterministically selects an element from a list, returning it and the rest of the list.
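The same idea, sketched in Python for comparison: try removing each position in turn and keep the results that are known words. The WORDS set mirrors the den/1 facts; delete_extra here is an illustrative Python function, not a translation of any library predicate.

```python
WORDS = {"computer", "network", "program"}  # mirrors the den/1 facts

def delete_extra(word, dictionary=WORDS):
    """Yield every dictionary word obtained by deleting exactly one letter."""
    for i in range(len(word)):
        candidate = word[:i] + word[i + 1:]  # word without the letter at position i
        if candidate in dictionary:
            yield candidate

print(list(delete_extra("comkputer")))
```

Like select/3 on backtracking, this reports one answer per deletable position, so a word with a doubled letter (for example "proggram") yields the same correction twice.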

Haskell: f n that returns list of lists of n elements from [1..n]

I need a function f :: (Integral n) => n -> [[n]].
The returned list should contain all lists of length n where all elements originate from [1..n].
Example:
f 2 = [[1,1],[1,2],[2,1],[2,2]]
This is an easy problem for constant ns:
f2 = [[a, b] | a <- [1..2], b <- [1..2]]
f3 = [[a, b, c] | a <- [1..3], b <- [1..3], c <- [1..3]]
f4 = [[a, b, c, d] | a <- [1..4], b <- [1..4], c <- [1..4], d <- [1..4]]
One solution can be:
f n = sequence . replicate n $ [1..n]
Note that f 10 will have 10^10 elements.
Another method could be:
Prelude> import Control.Monad
Prelude Control.Monad> f n = replicateM n [1..n]
Prelude Control.Monad> f 2
[[1,1],[1,2],[2,1],[2,2]]
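A solution with the great combinat library (note that combine returns combinations with repetition, so the result below has 10 lists rather than all 27 lists of length 3):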
Math.Combinat.Sets> combine 3 [1 .. 3]
[[1,1,1],[1,1,2],[1,1,3],[1,2,2],[1,2,3],[1,3,3],[2,2,2],[2,2,3],[2,3,3],[3,3,3]]
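The difference between the two kinds of result can be reproduced with Python's itertools, shown here only as a cross-check: product corresponds to the sequence/replicateM answers, and combinations_with_replacement corresponds to the combine output above. The function names all_tuples and multisets are made up for this sketch.

```python
from itertools import combinations_with_replacement, product

def all_tuples(n):
    """All lists of length n with elements from 1..n (n**n of them)."""
    return [list(t) for t in product(range(1, n + 1), repeat=n)]

def multisets(k, n):
    """Sorted lists of length k from 1..n, i.e. combinations with repetition."""
    return [list(t) for t in combinations_with_replacement(range(1, n + 1), k)]

print(all_tuples(2))
print(len(all_tuples(3)), len(multisets(3, 3)))
```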

Prolog Roman Numerals (Attribute Grammars)

I am working on an assignment in Prolog that scans a list of numerals and should return whether the list is a valid Roman numeral, along with the decimal value of the numerals. Example:
1 ?- roman(N, ['I'], []).
N = 1
true.
2 ?-
When I run the program that I feel should work, the decimal value is always right, so I'm guessing I got the synthesized attributes part right, but it always returns false for numeral lists that should return true. I'd also like to add that it aborts when it is supposed to if more than 3 Is, Xs, or Cs are present.
1 ?- roman(N, ['I'], []).
N = 1 ;
false.
2 ?- roman(N, ['I','I','I','I'], []).
Error: too many I's
% Execution Aborted
3 ?-
When I take out the N and throw in a {write('N = '), write(N)}, it works fine and returns true.
1 ?- roman(['I'], []).
N = 1
true.
When I remove {N is ValH + ValT + ValU} it returns true, however, it no longer displays the decimal value. Here is the top line of my code (because this is a current assignment, I would prefer to show as little as is necessary to get an answer):
roman(N) --> hundreds(ValH), tens(ValT), units(ValU), {N is ValH + ValT + ValU}.
Why is this returning false with N, but true without, and how do I fix it?
Assignment:
The following BNF specification defines the language of Roman numerals
less than 1000:
<roman> ::= <hundreds> <tens> <units>
<hundreds> ::= <low hundreds> | CD | D <low hundreds> | CM
<low hundreds> ::= e | <low hundreds> C
<tens> ::= <low tens> | XL | L <low tens> | XC
<low tens> ::= e | <low tens> X
<units> ::= <low units> | IV | V <low units> | IX
<low units> ::= e | <low units> I
Define attributes for this grammar to carry out two tasks:
a) Restrict the number of X’s in <low tens>, the I’s in <low units>, and
the C’s in <low hundreds> to no more than three.
b) Provide an attribute for <roman> that gives the decimal value of the
Roman numeral being defined.
Define any other attributes needed for these tasks, but do not change
the BNF grammar.
Did you notice that the grammar consists of the same pattern (group//5) repeated 3 times, just with different symbols? I like the compactness...
roman(N) -->
    group('C','D','M',100, H),
    group('X','L','C',10, T),
    group('I','V','X',1, U),
    {N is H+T+U}.
group(A,B,C, Scale, Value) -->
    ( g3(A, T)
    ; [A, B], {T = 4}
    % thanks to Daniel and Will for catching bugs
    ; [B], g3(A, F), {T is 5+F}
    ; [B], {T is 5}
    ; [A, C], {T = 9}
    ; {T = 0}
    ), {Value is Scale * T}.
g3(C, 1) --> [C].
g3(C, 2) --> [C,C].
g3(C, 3) --> [C,C,C].
Some tests:
?- atom_chars('CMXXX',L), phrase(roman(N),L).
L = ['C', 'M', 'X', 'X', 'X'],
N = 930 ;
false.
?- atom_chars('CMXLVIII',L), phrase(roman(N),L).
L = ['C', 'M', 'X', 'L', 'V', 'I', 'I', 'I'],
N = 948 ;
false.
Just a curiosity, showing DCG at work...
Edit after Daniel's and Will's comments...
?- atom_chars('VIII',L), phrase(roman(N),L).
L = ['V', 'I', 'I', 'I'],
N = 8 .
?- phrase(roman(X), ['L','I','X']).
X = 59 .
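The "same pattern three times" observation carries over to other languages. Below is a rough Python sketch of the idea, not a translation of the DCG: it parses each digit group greedily instead of backtracking, and the names GROUPS, parse_group and roman are chosen for this illustration. It returns None where the grammar would reject the input, for example when a symbol is repeated more than three times.

```python
# (one, five, ten, scale) symbols for hundreds, tens and units
GROUPS = [("C", "D", "M", 100), ("X", "L", "C", 10), ("I", "V", "X", 1)]

def parse_group(s, pos, one, five, ten, scale):
    """Parse one digit group at s[pos:]; return (value, new_pos)."""
    if s.startswith(one + ten, pos):       # e.g. IX
        return 9 * scale, pos + 2
    if s.startswith(one + five, pos):      # e.g. IV
        return 4 * scale, pos + 2
    digit = 0
    if s.startswith(five, pos):            # optional V / L / D
        digit, pos = 5, pos + 1
    count = 0
    while count < 3 and s.startswith(one, pos):  # at most three I / X / C
        count, pos = count + 1, pos + 1
    return (digit + count) * scale, pos

def roman(s):
    """Decimal value of a Roman numeral below 1000, or None if invalid."""
    pos, total = 0, 0
    for one, five, ten, scale in GROUPS:
        value, pos = parse_group(s, pos, one, five, ten, scale)
        total += value
    return total if pos == len(s) else None  # leftover input means invalid

for numeral in ["CMXXX", "CMXLVIII", "VIII", "LIX", "IIII"]:
    print(numeral, roman(numeral))
```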
