Mutual exclusivity in ProbLog - Prolog

We have
4 different storage spaces, and
5 different boxes (named b1, b2, b3, b4 and b5) which we want to put into these storage spaces.
Each storage space can hold only one box at a time, and a box cannot be in more than one storage space.
*But b5 is a special case: it may be used in multiple storage spaces at the same time.
Each box has a specific weight assigned to it (b1:4, b2:6, b3:5, b4:6 and b5:5).
Each box has a specific probability of being placed in the storage spaces (b1:1, b2:0.6, b3:1, b4:0.8, b5:1).
We want to get the possible contents of the storage spaces and their probabilities, given that the total weight is 22 (we will use this as the evidence).
For example :
SS1 - b2(6)
SS2 - b5(5)
SS3 - b4(6)
SS4 - b5(5)
Where the total weight will be 22
And the probability of this content.
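The constraints described above can be checked with a brute-force enumeration. This is only a sketch in Python, not ProbLog, and it deliberately ignores the probabilities; the names WEIGHTS, REUSABLE and valid_assignments are made up for this illustration.

```python
from itertools import product

# Weights copied from the question; b5 is the only box allowed to repeat.
WEIGHTS = {"b1": 4, "b2": 6, "b3": 5, "b4": 6, "b5": 5}
REUSABLE = {"b5"}

def valid_assignments(target=22, slots=4):
    """Return every ordered assignment of boxes to storage spaces that
    respects the uniqueness rule and reaches the target weight."""
    results = []
    for combo in product(WEIGHTS, repeat=slots):
        # Every box except the reusable one may appear at most once.
        if any(combo.count(b) > 1 for b in combo if b not in REUSABLE):
            continue
        if sum(WEIGHTS[b] for b in combo) == target:
            results.append(combo)
    return results

assignments = valid_assignments()
```

Under these assumptions the example (b2, b5, b4, b5) is among the results, and b1 never appears in any assignment that reaches 22.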
In my code below I get totalboxweight(b2, b5, b4, b5, 22) as one of the possible contents, which is okay for me. It means box b2 is in the first storage space, b5 is in the second storage space, and so on.
Here is my code so far; I added comments to explain my intentions.
But I need help updating it to add the probabilities and to apply the conditions I described above.
box(b1,4).
box(b2,6).
box(b3,5).
box(b4,6).
box(b5,5). % I tried to define the boxes, but I don't know how to assign probabilities to them in this format
total(D1,D2,D3,D4,Sum) :-
    Sum is D1+D2+D3+D4. % the sum calculation
totalboxweight(A,B,C,D,Sum) :-
    box(A,D1), box(B,D2), box(C,D3), box(D,D4),
    total(D1,D2,D3,D4,Sum). % sum up all the weights
sumtotal(Sum) :-
    box(A,D1), box(B,D2), box(C,D3), box(D,D4),
    total(D1,D2,D3,D4,Sum). % defined so that it can be used as evidence
evidence(sumtotal(22),true). % we know the total weight is 22
query(totalboxweight(D1,D2,D3,D4,22)). % what are the probable contents?
I am using an online Problog editor to test my code. Here is the link.
And I am trying to do it in Problog not Prolog, so the syntax is different.
Right now, with the help of the answers, I have overcome some issues. The problems I still have:
I couldn't apply the probabilities.
I couldn't apply the condition (each storage space can be filled with only one unique box at a time, but b5 has a special condition which allows it to be used in multiple storage spaces at the same time).
Thank you in advance.

Related

Masked self-attention in transformer's decoder

I'm writing my thesis about attention mechanisms. In the paragraph in which I explain the decoder of transformer I wrote this:
The first sub-layer is called masked self-attention, in which the masking operation consists in preventing the decoder from paying attention to subsequent words.
That is to say, while training a transformer for translation purposes, it is possible to access the target translation; on the other hand, during the inference, that is the translation of new sentences, it is not possible to access the target translation. Therefore, when calculating the probabilities of the next word in the sequence, the network must not access that word. Otherwise, the translation task would be banal and the network would not learn to predict the translation correctly.
I don't know if I said something wrong also in the previous part, but my professor thinks I made mistakes in the following part:
To understand in a simple way the functioning of the masked self-attention level, let's go back to the example “Orlando Bloom loves Miranda Kerr” (x1 x2 x3 x4 x5).
If we consider the inputs as vectors x1, x2, x3, x4, x5 and we want to translate the word x3 corresponding to "loves", we need to make sure that the following words x4 and x5 do not influence the translation y3. To prevent this influence, masking sets the weights of x4 and x5 to zero. Then a normalization of the weights is performed so that the sum of the elements of each column in the matrix is equal to 1. The result is a matrix with normalized weights in each column.
Can someone please tell me where the mistakes are?
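For comparison with the description above, here is a minimal sketch in plain Python of causal masking as it is usually implemented: the scores of later positions are set to minus infinity before the softmax, rather than being zeroed afterwards and renormalized. The function name and the one-row-per-query layout are assumptions made for this illustration, and the scores are made-up numbers, not the output of a trained model.

```python
import math

def causal_attention_weights(scores):
    """scores[i][j]: raw score of query position i attending to key position j."""
    n = len(scores)
    weights = []
    for i in range(n):
        # Positions j > i are masked with -inf, so exp() turns them into exactly 0.
        masked = [scores[i][j] if j <= i else -math.inf for j in range(n)]
        peak = max(masked)  # finite, because position i itself is never masked
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Five positions with identical (made-up) scores.
scores = [[0.0] * 5 for _ in range(5)]
weights = causal_attention_weights(scores)
```

In this layout, row 3 (index 2) spreads its weight evenly over the first three positions and gives exactly zero to the last two, and every row sums to 1.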

cbind before rbind, or rbind before cbind?

Say I have 20 frames on a 4-node H2O cluster: a1..a5, b1..b5, c1..c5, d1..d5. And I want to combine them into one big frame, from which I will build a model.
Is it better to combine sets of columns, then combine rows:
h2o.rbind(
  h2o.cbind(a1, b1, c1, d1),
  h2o.cbind(a2, b2, c2, d2),
  h2o.cbind(a3, b3, c3, d3),
  h2o.cbind(a4, b4, c4, d4),
  h2o.cbind(a5, b5, c5, d5)
)
Or, to combine the rows first, then the columns:
h2o.cbind(
  h2o.rbind(a1, a2, a3, a4, a5),
  h2o.rbind(b1, b2, b3, b4, b5),
  h2o.rbind(c1, c2, c3, c4, c5),
  h2o.rbind(d1, d2, d3, d4, d5)
)
For the sake of argument, 1/2/3/4/5 might each represent one month of data, which is why they got imported separately. And a/b/c/d are different sets of features, which again explains why they were imported separately. Let's say, a1..a5 have 1728 columns, b1..b5 have 113 columns, c1..c5 have 360 columns, and d1..d5 is a single column (the answer I'll be modelling). (Though I suspect, as H2O is a column database, that the relative number of columns in a/b/c/d does not matter?)
By "better" I mean quicker, but if there is a memory-usage difference between the two, that would also be good to know: I'm mainly interested in the Big Data case, where the combined frame is big enough that I wouldn't be able to fit it in the memory of just a single node.
I'm now fairly sure the answer is: doesn't matter.
Point 1: The two examples in the question are identical. This is because both h2o.cbind() and h2o.rbind() use lazy evaluation. So either way it returns immediately, and nothing happens until you perform some operation. (I've been using nrow() or ncol() to force creation of the new frame - it also allows me to check that I've got what I expected.)
Point 2: I've been informed by an H2O developer that there is no difference (CPU or memory), because either way the data will be copied.
Point 3: I've not noticed any significant speed difference on some reasonably big cbind/rbinds, with final frame size of 17GB (compressed size). This has not been rigorous, but I've never waited more than 30 to 40 seconds for the nrow() command to complete the copy.
Bonus Tip: Following on from point 1, it is essential you call nrow() (or whatever) to force the copy to happen, before you delete the constituent parts. If you do the all = rbind(parts), then h2o.rm(parts), then nrow(all) you get an error (and your data is lost and needs to be imported again).
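The ordering problem in the bonus tip can be shown with a toy model of lazy evaluation. This is a sketch in plain Python and not the H2O API: the store dictionary, the LazyRbind class and the rm function are hypothetical stand-ins for the cluster's data, a lazily combined frame and h2o.rm.

```python
# Toy stand-in for a cluster's key/value store; NOT the H2O API.
store = {"jan": [1, 2, 3], "feb": [4, 5]}

class LazyRbind:
    """Records which parts to combine; copies nothing until forced."""
    def __init__(self, *keys):
        self.keys = keys
        self.rows = None  # filled in on first use

    def nrow(self):
        if self.rows is None:
            # The copy happens here, so the parts must still exist.
            self.rows = [r for k in self.keys for r in store[k]]
        return len(self.rows)

def rm(*keys):
    """Delete the named parts from the store."""
    for k in keys:
        del store[k]

safe = LazyRbind("jan", "feb")
safe_rows = safe.nrow()      # force the copy first
unsafe = LazyRbind("jan", "feb")
rm("jan", "feb")             # parts removed before `unsafe` was forced
try:
    unsafe.nrow()
    failed = False
except KeyError:
    failed = True
```

In this toy model the frame that was forced before the removal keeps its rows, while the one forced afterwards fails because its parts are gone.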

Solving a puzzle using swi-prolog

As an assignment, I've been asked to write a solver for the battleships solitaire puzzle in Prolog. For those unfamiliar, the puzzle deals with a 6 by 6 grid on which a series of ships are placed according to the provided constraints on each row and column, i.e. the first row must contain 3 squares with ships, the second row must contain 1 square with a ship, the third row must contain 0 squares, and so on for the other rows and columns.
Each puzzle comes with its own set of constraints and revealed squares, typically two. An example can be seen here:
battleships
So, here's what I've done:
step([ShipCount,Rows,Cols,Tiles],[ShipCount2,Rows2,Cols2,Tiles2]):-
    ShipCount2 is ShipCount+1,
    nth1(X,Cols,X1),
    X1\==0,
    nth1(Y,Rows,Y1),
    Y1\==0,
    not(member([X,Y,_],Tiles)),
    pairs(Tiles,TilesXY),
    notdiaglist(X,Y,TilesXY),
    member(T,[1,2,3,4,5,6]),
    append([X,Y],[T],Tile),
    append([Tile],Tiles,Tiles2),
    dec_elem1(X,Cols,Cols2),
    dec_elem1(Y,Rows,Rows2).
dec_elem1(1,[A|Tail],[B|Tail]):- B is A-1.
dec_elem1(Count,[A|Tail],[A|Tail2]):- Count1 is Count-1,dec_elem1(Count1,Tail,Tail2).
neib(X1,Y1,X2,Y2) :- X2 is X1,(Y2 is Y1 -1;Y2 is Y1+1; Y2 is Y1).
neib(X1,Y1,X2,Y2) :- X2 is X1-1,(Y2 is Y1 -1;Y2 is Y1+1; Y2 is Y1).
neib(X1,Y1,X2,Y2) :- X2 is X1+1,(Y2 is Y1 -1;Y2 is Y1+1; Y2 is Y1).
notdiag(X1,Y1,X2,Y2) :- not(neib(X1,Y1,X2,Y2)).
notdiag(X1,Y1,X2,Y2) :- neib(X1,Y1,X2,Y2),((X1 == X2,t(Y1,Y2));(Y1 == Y2,t(X1,X2))).
notdiaglist(X1,Y1,[]).
notdiaglist(X1,Y1,[[X2,Y2]|Tail]):-notdiag(X1,Y1,X2,Y2),notdiaglist(X1,Y1,Tail).
t(X1,X2):- X is abs(X1-X2), X==1.
pairs([],[]).
pairs([[X,Y,Z]|Tail],[[X,Y]|Tail2]):-pairs(Tail,Tail2).
I represent a state with a list: [Count,Rows,Columns,Tiles]. The last state must be
[10,[0,0,0,0,0,0],[0,0,0,0,0,0], somelist]. A puzzle starts from an initial state, for example
initial([1, [1,3,1,1,1,2] , [0,2,2,0,0,5] , [[4,4,1],[2,1,0]]]).
I try to find a solution in the following manner:
run:-initial(S),step(S,S1),step(S1,S2),....,step(S8,F).
Now, here's the difficulty: if I restrict myself to one type of ship part by using member(T,[1])
instead of
member(T,[1,2,3,4,5,6])
it works fine. However, when I use the full range of possible values for T, which are needed
later, the query never finishes because it runs for too long. This puzzles me, since:
(a) it works for 6 types of ships but only for 8 steps instead of 9
(b) going from a single type of ship to 6 types increases the number
of options for just the last step by a factor of 6, which
shouldn't have such a dramatic effect.
So, what's going on?
To answer your question directly, what's going on is that Prolog is trying to sift through an enormous space of possibilities.
You're correct that altering that line increases the search space of the last call by a factor of six, but note that the size of the search space of, say, nine calls isn't proportional to 9 times the size of one call. Prolog will backtrack on failure, so it's proportional (bounded above, actually) to the number of possible results of one call raised to the ninth power.
That means we can expect the size of the space Prolog needs to search to grow by at most a factor of 6^9 = 10077696 when we allow T to take on 6 times as many values.
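The arithmetic behind that bound can be checked with a few lines of Python. The value of base is a made-up number of options per step, used only for illustration; the ratio does not depend on it.

```python
def search_space_bound(options_per_call, calls):
    """Upper bound on the number of states when every call can backtrack."""
    return options_per_call ** calls

base = 10  # hypothetical number of options per step with one ship type
growth = search_space_bound(6 * base, 9) // search_space_bound(base, 9)
```

Multiplying the options of every one of the nine calls by 6 multiplies the bound by 6^9, whatever the original number of options was.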
Of course, it doesn't help that (as far as I was able to tell) a solution doesn't exist if we call step 9 times starting with initial anyways. Since that last call is going to fail, Prolog will keep trying until it's exhausted all possibilities (of which there are a great many) before it finally gives up.
As far as a solution goes, I'm not sure I know enough about the problem. If the value of T is the kind of ship part that fits in the grid (e.g. single square, half of a 2-square-ship, part of a 3-square-ship), you should note that it gives you a lot more information than the numbers on the rows/columns.
Right now, in pseudocode, your step looks like this:
Find a (X,Y) pair that has non-zero markings on its row/column
Check that there isn't already a ship there
Check that it isn't diagonal to a ship
Pick a kind of ship-part for it to be.
I'd suggest you approach like this:
Finish any already placed ship-bits to form complete ships (if we can't: fail)
Until we're finished:
    Find acceptable places to place a ship
    Check that the markings on the row/column aren't zero
    Try to place an entire ship here (instead of a single part)
By using the most specific information that we have first (in this case, the previously placed parts), we can reduce the amount of work Prolog has to do and make things return reasonably fast.

How to determine character similarity?

I am using the Levenshtein distance to find similar strings after OCR. However, for some strings the edit distance is the same, although the visual appearance is obviously different.
For example the string Co will return these matches:
CY (1)
CZ (1)
Ca (1)
Considering that Co is the result from an OCR engine, Ca would be a more likely match than the others. Therefore, after calculating the Levenshtein distance, I'd like to refine the query result by ordering by visual similarity. To calculate this similarity I'd like to use a standard sans-serif font, like Arial.
Is there a library I can use for this purpose, or how could I implement this myself? Alternatively, are there any string similarity algorithms that are more accurate than the Levenshtein distance, which I could use in addition?
If you're looking for a table that will allow you to calculate a 'replacement cost' of sorts based on visual similarity, I've been searching for such a thing for awhile with little success, so I started looking at it as a new problem. I'm not working with OCR, but I am looking for a way to limit the search parameters in a probabilistic search for mis-typed characters. Since they are mis-typed because a human has confused the characters visually, the same principle should apply to you.
My approach was to categorize letters based on their stroke components in an 8-bit field. The bits are, from most significant (left) to least significant (right):
7: Left Vertical
6: Center Vertical
5: Right Vertical
4: Top Horizontal
3: Middle Horizontal
2: Bottom Horizontal
1: Top-left to bottom-right stroke
0: Bottom-left to top-right stroke
For lower-case characters, descenders on the left are recorded in bit 1, and descenders on the right in bit 0, as diagonals.
With that scheme, I came up with the following values which attempt to rank the characters according to visual similarity.
m: 11110000: F0
g: 10111101: BD
S,B,G,a,e,s: 10111100: BC
R,p: 10111010: BA
q: 10111001: B9
P: 10111000: B8
Q: 10110110: B6
D,O,o: 10110100: B4
n: 10110000: B0
b,h,d: 10101100: AC
H: 10101000: A8
U,u: 10100100: A4
M,W,w: 10100011: A3
N: 10100010: A2
E: 10011100: 9C
F,f: 10011000: 98
C,c: 10010100: 94
r: 10010000: 90
L: 10000100: 84
K,k: 10000011: 83
T: 01010000: 50
t: 01001000: 48
J,j: 01000100: 44
Y: 01000011: 43
I,l,i: 01000000: 40
Z,z: 00010101: 15
A: 00001011: 0B
y: 00000101: 05
V,v,X,x: 00000011: 03
This, as it stands, is too primitive for my purposes and requires more work. You may be able to use it, however, or perhaps adapt it to suit your purposes. The scheme is fairly simple. This ranking is for a mono-space font. If you are using a sans-serif font, then you likely have to re-work the values.
This table is a hybrid table including all characters, lower- and upper-case, but if you split it into upper-case only and lower-case only it might prove more effective, and that would also allow to apply specific casing penalties.
Keep in mind that this is early experimentation. If you see a way to improve it (for example by changing the bit-sequencing) by all means feel free to do so.
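One possible way to turn these bit fields into a similarity score is to count the stroke bits in which two characters differ. This is an assumption of this sketch, not something the scheme above prescribes; STROKES holds a few values copied from the table, and stroke_distance is a made-up name.

```python
# Stroke masks copied from the table above (a small subset).
STROKES = {"o": 0xB4, "a": 0xBC, "Y": 0x43, "Z": 0x15}

def stroke_distance(c1, c2):
    """Number of stroke bits in which two characters differ."""
    return bin(STROKES[c1] ^ STROKES[c2]).count("1")

# Rank candidate replacements for 'o' from most to least similar.
ranked = sorted("aYZ", key=lambda c: stroke_distance("o", c))
```

With these values, 'a' differs from 'o' in a single bit, while 'Z' and 'Y' differ in three and seven bits, so 'a' ranks first.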
In general I've seen Damerau-Levenshtein used much more often than plain Levenshtein; it basically adds the transposition operation. Together with insertion, deletion and substitution, that is said to account for more than 80% of human misspellings, so you should certainly consider it.
As to your specific problem, you could easily modify the algorithm to increase the cost when substituting a capital letter with a non-capital letter, and vice versa, to obtain something like this:
dist(Co, CY) = 2
dist(Co, CZ) = 2
dist(Co, Ca) = 1
So in your distance function just have a different cost for replacing different pairs of characters.
That is, rather than a replacement adding a set cost of one or two irrespective of the characters involved, have a replace cost function that returns something between 0.0 and 2.0 for the cost of replacing certain characters in certain contexts.
At each step of the memoization, just call this cost function:
cost[x][y] = min(
    cost[x-1][y] + 1, // insert
    cost[x][y-1] + 1, // delete
    cost[x-1][y-1] + cost_to_replace(a[x],b[y]) // replace
);
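A runnable version of that recurrence might look like the following Python sketch. The particular cost_to_replace shown here (0 for equal characters, 1 within the same case, 2 across a case change) is a hypothetical stand-in chosen to reproduce the example distances above; a table based on visual similarity would be plugged in the same way.

```python
def cost_to_replace(c1, c2):
    """Hypothetical cost: 0 if equal, 1 within the same case, 2 across cases."""
    if c1 == c2:
        return 0
    return 1 if c1.isupper() == c2.isupper() else 2

def weighted_distance(a, b, replace_cost=cost_to_replace):
    # cost[x][y] = distance between the first x chars of a and the first y of b
    cost = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for x in range(1, len(a) + 1):
        cost[x][0] = x
    for y in range(1, len(b) + 1):
        cost[0][y] = y
    for x in range(1, len(a) + 1):
        for y in range(1, len(b) + 1):
            cost[x][y] = min(
                cost[x - 1][y] + 1,   # drop a character of a
                cost[x][y - 1] + 1,   # add a character of b
                cost[x - 1][y - 1] + replace_cost(a[x - 1], b[y - 1]),
            )
    return cost[len(a)][len(b)]
```

Passing a different replace_cost function changes the ranking without touching the dynamic-programming loop.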
Here is my full Edit Distance implementation, just swap the replace_cost constant for a replace_cost function as shown:
https://codereview.stackexchange.com/questions/10130/edit-distance-between-two-strings
In terms of implementing the cost_to_replace function, you need a matrix of characters with costs based on how similar the characters are. There may be such a table floating around, or you could build it yourself by rendering each pair of characters to a pair of images and then comparing the images for similarity using standard vision techniques.
Alternatively, you could use a supervised method whereby you correct several OCR misreads and note the occurrences in a table that will then become the above cost table (i.e. if the OCR gets it wrong, then the characters must be similar).

Prolog - Getting element from a list of lists

I am having trouble figuring out how to access a single character from a list of strings without using recursion, but instead backtracking.
For example I have this list of Strings and I want to be able to return a single character from one of these strings ('.' 'o', '*'). The program I am working on is treating it as rows and columns. It is a fact in my database that looks like this:
matrix(["...o....",
        ".******.",
        "...o....",
        ".*...*..",
        "..o..*..",
        ".....*..",
        ".o...*..",
        "....o..o"]).
I have the predicate:
get(Row,Col,TheChar) :-
that takes a row and column number (with indices starting at 1) and returns the entry (TheChar) at that specific row and column.
I have a feeling my predicate head might not be built correctly, but I'm really more focused on how to go through each string in the list character by character without recursion and return that character.
I am new to prolog and am having major difficulty with this.
Any help at all would be greatly appreciated!
Thank you!
An implementation of get/3 might look like this:
get(Row,Col,TheChar) :-
    matrix(M),
    nth(Row,M,RowList),
    nth(Col,RowList,TheChar).
Note that TheChar is unified to a character code e.g.
| ?- get(1,4,X).
X = 111
If you want to see the character itself, you can for instance use atom_codes/2, e.g.
| ?- get(4,2,X), atom_codes(CharAtom,[X]).
X = 42
CharAtom = *
Hope this helps.
Using your matrix representation, you could do something like this:
cell(X,Y,Cell) :-
    matrix(Rows) ,
    Matrix =.. [matrix|Rows] ,
    arg(X,Matrix,Cols) ,
    Row =.. [row|Cols] ,
    arg(Y,Row,Cell)
    .
The use of =.. to construct terms on the fly might be a hint that your matrix representation isn't the best. You might consider different representations for your matrix.
Assuming a "standard" matrix with fixed-length rows, you could represent the matrix
A B C D
E F G H
I J K L
in a couple of different ways:
A single string, if the cell values can be represented as a single character and your prolog supports real strings (rather than string-as-list-of-char-atoms):
"ABCDEFGHIJKL"
Lookup is simple and zero-relative (e.g., the first row and the first column are both numbered 0):
( RowLength * RowOffset ) + ColOffset
gives you the index to the appropriate character in the atom. Retrieval consists of a simple substring operation. This has the advantages of speed and simplicity.
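The index formula can be sketched in a few lines of Python (used here only for illustration; the function name cell and its parameters are made up for this example):

```python
def cell(flat, row_length, row, col):
    """Return the character at zero-based (row, col) of a row-major string."""
    return flat[row_length * row + col]

matrix = "ABCDEFGHIJKL"  # 3 rows of 4 columns, as in the example above
```

For instance, cell(matrix, 4, 1, 2) looks up index 4 * 1 + 2 = 6, which is 'G'.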
A compound term is another option:
matrix( rows( row('A','B','C','D') ,
row('E','F','G','H') ,
row('I','J','K','L')
)
).
Lookup is still simple:
cell(X,Y,Matrix,Value) :-
    arg(X,Matrix,Row) ,
    arg(Y,Row,Value)
    .
A third option might be to use the database to represent your matrix more directly using the database predicates asserta, assertz, retract , retractall , recorda, recordz, recorded, erase. You could build a structure of facts, for instance in the database along the lines of:
matrix( Matrix_Name ).
matrix_cell( Matrix_Name , RowNumber , ColumnNumber , Value ).
This has the advantage of allowing both sparse (empty cells don't need to be represented) and jagged (rows can vary in length) representations.
Another option (last resort,you might say) would be to jump out into a procedural language, if your prolog allows that, and represent the matrix in a more...matrix-like manner. I had to do that once: we ran into huge performance problems with both memory and CPU once the data model got past a certain size. Our solution was to represent the needed relation as a ginormous array of bits, which was trivial to do in C (and not so much in Prolog).
I'm sure you can come up with other methods of representing matrices as well.
TMTOWTDI (Tim-Toady or "There's More Than One Way To Do It") as they say in the Perl community.
