Word disambiguation algorithm (Lesk algorithm)

Can anybody help me find an algorithm, ideally as Java code, that finds synonyms of a search word based on its context? I want to implement the algorithm with the WordNet database.
For example, take "I am running a Java program". I want to find synonyms for the word "running", but the synonyms must suit the context.

Let me illustrate a possible approach:
Let your sentence be A B C
Let each word have synsets i.e. {A:(a1, a2, a3), B:(b1), C:(c1, c2)}
Now form possible synset sets: (a1, b1, c1), (a1, b1, c2), (a2, b1, c1) ... (a3, b1, c2)
Define a function F(a, b, c) that returns a score for the set (a, b, c) based on the distances between its members.
Call F on each synset set.
Pick the set with the maximum score.
For starters, F can just return the product, over every pair of distinct synsets in the set, of the inverse of the number of nodes between the two synsets:
Maximize(Product[i < j] (1/D(node_i, node_j)))
Later on, you can increase its complexity.
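The enumeration above can be sketched as follows. This is only a minimal sketch in Python (the same loops translate directly to Java): the sense labels and the TOY_DISTANCE table are made-up stand-ins for real WordNet synsets and path lengths, and best_sense_assignment is a hypothetical helper name, not part of JAWS or any WordNet API.

```python
from itertools import combinations, product

# Made-up path lengths between hypothetical senses (NOT real WordNet data).
TOY_DISTANCE = {
    frozenset(("a1", "b1")): 4, frozenset(("a2", "b1")): 1,
    frozenset(("a3", "b1")): 3, frozenset(("b1", "c1")): 2,
    frozenset(("b1", "c2")): 1, frozenset(("a1", "c1")): 2,
    frozenset(("a1", "c2")): 5, frozenset(("a2", "c1")): 3,
    frozenset(("a2", "c2")): 2, frozenset(("a3", "c1")): 1,
    frozenset(("a3", "c2")): 4,
}


def toy_distance(s, t):
    """Look up the assumed number of nodes between two senses."""
    return TOY_DISTANCE[frozenset((s, t))]


def best_sense_assignment(candidates, distance):
    """Try every combination of one sense per word and keep the best one.

    candidates: one list of sense labels per word of the sentence.
    distance:   function (sense, sense) -> positive number.
    """
    best, best_score = None, float("-inf")
    for assignment in product(*candidates):
        # Score = product of inverse distances over all pairs i < j.
        score = 1.0
        for s, t in combinations(assignment, 2):
            score *= 1.0 / distance(s, t)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


# Sentence "A B C" with senses {A: a1..a3, B: b1, C: c1, c2}.
candidates = [["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]
print(best_sense_assignment(candidates, toy_distance))
# (('a2', 'b1', 'c2'), 0.5)
```

Note that the number of combinations is the product of the sense counts of all words, so this exhaustive version is only practical for short sentences.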

This is the perfect document for your problem. The accuracy of the algorithm is not high, but I think it will be enough.
At this link you can find a Java API for WordNet Searching (JAWS).

Hi, I came across this page while searching for Lesk algorithm implementations.
I think it comes as part of the JAWS package.
I haven't used it yet, but I guess it will help.

Related

Finding occurrences of predicate within list

I'm attempting to find the number of inversions in a list. An inversion is any pair a, b from the list, where ai is the index of a and bi is the index of b, that satisfies a > b and ai < bi. Essentially, a comes before b yet is larger than b.
The first thing I did was write a predicate to find out what the index is.
indexOf(Index, Element, List) :-
    nth1(Index, List, Element).
Then I wrote a predicate to determine if any set of two numbers is an inversion
isInversion(A, B, List) :-
    A \= B, indexOf(AI, A, List), indexOf(BI, B, List), A > B, AI < BI.
At this point I have a lot of questions, especially as I'm very unfamiliar with logic programming languages. My first question is, indexOf won't actually give me the index will it? I'm confused how that would actually work as it seems like it'd essentially have to try every number, which I'm not explicitly telling it to do.
If somehow indexOf will automatically determine the index and store it in AI/BI like I'm expecting, then I believe my isInversion predicate will evaluate correctly, if I'm wrong please let me know.
My main concern is how to actually determine the number of inversions. In something like Python I would do
count = 0
for a in puzzle:
    for b in puzzle:
        if a is b:
            continue
        if isInversion(a, b, puzzle):
            count = count + 1
That would give me my number of inversions. But how can I do this in Prolog? For loops don't seem very idiomatic there, so I don't want to use them.
Something to note: I have searched for other questions. It's a little tough since I obviously don't know exactly what I'm looking for. However, I just wanted to make it clear that questions such as "Prolog do predicate for all pairs in List?" didn't help me answer mine.
You should remove the constraint A \= B: when A and B are still unbound they can be unified, so A \= B fails immediately.
Then use aggregate_all/3 to count all the inversions (you don't actually need the values A/B of the inversion):
isInversion(A, B, List) :-
    indexOf(AI, A, List),
    indexOf(BI, B, List),
    A > B,
    AI < BI.

countInversions(List, N) :-
    aggregate_all(count, isInversion(_, _, List), N).
Sample run:
?- countInversions([4,3,2,1,9], N).
N = 6.
You can see which inversions exist by using findall/3 on isInversion:
?- findall(A-B, isInversion(A,B,[4,3,2,1,9]), LInversions).
LInversions = [4-3, 4-2, 4-1, 3-2, 3-1, 2-1].
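For comparison with the loop in the question, the same "collect all solutions, then count them" idea can be sketched in Python. This is a rough analogue, not a translation of the Prolog: inversions and count_inversions are hypothetical names, and pairs are taken by position rather than by looking up indices.

```python
from itertools import combinations


def inversions(items):
    """Return every pair (a, b) where a appears before b and a > b."""
    # combinations() yields pairs in positional order, so a always precedes b.
    return [(a, b) for a, b in combinations(items, 2) if a > b]


def count_inversions(items):
    """Count the inversions, like aggregate_all(count, ...) does above."""
    return len(inversions(items))


print(count_inversions([4, 3, 2, 1, 9]))  # 6
print(inversions([4, 3, 2, 1, 9]))
# [(4, 3), (4, 2), (4, 1), (3, 2), (3, 1), (2, 1)]
```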

How could I remove backtracking from this code?

The goal is to select shapes that don't touch each other using constraints (clpfd). Calling start(Pairs,4) would return Pairs = [1,3,5,7].
One problem I noticed is that if I print Final before labeling, it prints [1,3,5,7]. Which means labeling isn't doing anything.
What could I change/add to this code in order to fix that and also remove possible backtracking?
:- use_module(library(clpfd)).
:- use_module(library(lists)).

% init initialises Pairs and Max
% Pairs - The elements inside the Nth list in Pairs,
% represent the index of the shapes that shape N can touch
init([[3,5,6,7],[4,5,7],[1,4,5,7],[2,3,7],[1,2,3,7],[1],[1,2,3,4,5]],7).

start(Final, N) :-
    init(Pairs, Max),
    length(Final, N),
    domain(Final, 1, Max),
    ascending(Final),
    all_different(Final),
    rules(Pairs, Final),
    labeling([], Final).

rules(_, []).
rules(Pairs, [H|T]) :-
    nth1(H, Pairs, PairH),
    secondrule(PairH, T),
    rules(Pairs, T).

secondrule(_, []).
secondrule(PairH, [H|T]) :-
    element(_, PairH, H),
    secondrule(PairH, T).

ascending([_|[]]).
ascending([H|[T1|T2]]) :-
    H #< T1,
    ascending([T1|T2]).
This is an Independent Set problem, which is an NP-hard problem. Therefore, it is unlikely that anybody will ever find a way to do it without search (backtracking) for general instances.
Regarding your code, labeling/2 does nothing, because your rules/2 is in fact a search procedure that returns the solution if it can find one. all_different/1 is useless too, because it is implied by ascending/1.
Presumably, your goal is a program that sets up constraints (without any search) and then searches for a solution with labeling/2. For that, you need to rethink your constraint model. Read up a bit on independent sets.
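To make the "search is unavoidable" point concrete, here is a minimal brute-force sketch in Python. It is an illustration only, not a constraint model: it uses the lists from init/2 exactly the way rules/2 does (every later element of the selection must appear in the list of every earlier element), and the names ALLOWED and select are hypothetical. The running time grows exponentially with the number of shapes.

```python
from itertools import combinations

# Same data as init/2 in the question, keyed by shape index (1-based).
ALLOWED = {
    1: [3, 5, 6, 7],
    2: [4, 5, 7],
    3: [1, 4, 5, 7],
    4: [2, 3, 7],
    5: [1, 2, 3, 7],
    6: [1],
    7: [1, 2, 3, 4, 5],
}


def select(n, allowed=ALLOWED):
    """Return all ascending selections of n shapes accepted by the rule.

    A selection is accepted when, for every pair (earlier, later) in it,
    `later` appears in allowed[earlier], mirroring rules/2 + secondrule/2.
    """
    shapes = sorted(allowed)
    return [
        combo
        for combo in combinations(shapes, n)  # ascending by construction
        if all(later in allowed[earlier]
               for earlier, later in combinations(combo, 2))
    ]


print(select(4))  # [(1, 3, 5, 7)]
```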

Algorithm in finding combination group of number

I'm currently stuck on a part of an application I am working on. I don't want to copy all the code and paste it here, so let me get straight to the point with a simple example:
Suppose I have a string "abcdefg". I am trying to find an algorithm that would get all the possible groupings without reordering the characters, for example:
abcdefg
a, b, c, d, e, f, g
ab, c, d, e, f, g
..
..
abc, def, g
..
ab, cd, efg
..
and so on...
I think the example pretty much makes the point. Can anyone provide me with pseudo-code? I understand Java, C, and C++ as well, so a code snippet in one of those languages would be even better, but if not, pseudo-code is fine and I'll try to implement it. Thanks in advance.
It's surprisingly simple. Lop off the first letter and associate a 0 or a 1 with the remaining letters. A 1 means place a comma just before the letter. A 0 means don't.
E.g. 001100 corresponds to abc,d,efg.
The notation I'm using is a simple map of a number increasing from zero, expressed in binary.
So three things: (i) count integers, (ii) convert to binary, (iii) use that binary as the comma positioning rule.
The stopping condition follows directly: for a string of n letters there are n-1 positions, so stop once the counter reaches 2^(n-1).
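The three steps can be sketched as below. This is a minimal sketch in Python rather than Java/C/C++ (the loop structure carries over unchanged), and groupings is a hypothetical function name.

```python
def groupings(s):
    """Return every way to split s into consecutive groups, order preserved."""
    if not s:
        return []
    n = len(s)
    result = []
    for counter in range(2 ** (n - 1)):        # (i) count integers
        bits = format(counter, f"0{n - 1}b")   # (ii) convert to binary
        groups = [s[0]]                        # the first letter gets no bit
        for letter, bit in zip(s[1:], bits):   # (iii) apply the comma rule
            if bit == "1":
                groups.append(letter)          # 1: comma before this letter
            else:
                groups[-1] += letter           # 0: extend the current group
        result.append(groups)
    return result


all_groups = groupings("abcdefg")
print(len(all_groups))   # 64
print(all_groups[12])    # 12 is 001100 in binary: ['abc', 'd', 'efg']
```

A string of 7 letters has 6 possible comma positions, hence 2^6 = 64 groupings.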

Finite-state transducer that computes the relation

From http://www.cse.ohio-state.edu/~gurari/theory-bk/theory-bk-twoli1.html#30007-23021r2.2.4:
Let M = <Q, Σ, Δ, δ, q0, F> be the deterministic finite-state transducer whose transition diagram is given in Figure 2.E.2.
For each of the following relations find a finite-state transducer that computes the relation.
a. { (x, y) | x is in L(M), and y is in Δ* }.
b. { (x, y) | x is in L(M), y is in Δ*, and (x, y) is not in R(M) }.
Yes, this is HW, but I have been struggling with these questions and could at least use pointers. If you want to create your own c. and/or d. examples just to show me HOW to do it rather than lead me to the answers for a. and b. then obviously I'm fine with that.
Thanks in advance!
Since you don't indicate what progress you've made so far, I'm going to assume that you've made no progress at all, and will give overall guidance for how you can approach this sort of problem.
First of all, examine the transition diagram. Do you understand what all the notations mean? Note that the transducer is described as deterministic. Do you understand what that means? Convince yourself that the transducer depicted in the transition diagram is, in fact, deterministic. Trace through it; try to get a sense for what inputs are accepted by the transducer, and what outputs it gives.
Next, figure out what L(M), Δ, and R(M) are for this transducer, since the questions refer to them. Do you know what those notations mean?
Do you know what it means for a transducer to compute a certain relation? Do you understand the { (x, y) | ... } notation for describing the relation?
Can you modify the transition diagram to eliminate the ε/0 transition and merge it into adjacent transitions (which then might output multiple symbols at a single transition)? (This can help, IMHO, with creating other transducers that accept the same input language. More so with part b, in this case, than part a.)
Describe for yourself the transducers you need to create, in a way that's independent of the original transducer. Will these transducers be deterministic?
Create the transition diagrams for these transducers.

Floyd and Warshall's algorithms in Prolog

I want to program these algorithms in Prolog, and first I need to create a matrix from a list of graph edges. I've done this before (also with help from some of you), but now I don't know how to store it inside a list of lists (which I suppose is the best approach in Prolog's case). I think I'll be able to continue from there (with the triple for loop in each of the algorithms). The logic of the program is not difficult for me; the hard part is how to work with the data. Sorry for being a bother and thanks in advance!
My matrix generator:
graph(a,b).
graph(a,a).
graph(b,c).
graph(b,d).
graph(c,d).
graph(a,e).
graph(e,f).
matrix :- allnodes(X),printmatrix(X).
node(X) :- graph(X,_).
node(X) :- graph(_,X).
allnodes(Nodes) :- setof(X, node(X), Nodes).
printedge(X,Y) :- graph(Y,X), write('1 ').
printedge(X,Y) :- \+ graph(Y,X), write('0 ').
printmatrix(List):- member(Y, List),nl,member(X, List),printedge(X,Y),fail.
Your previous question, "Adjacency Matrix in prolog", dealt with the visual display (row over row) of the adjacency matrix of a graph. Here we address how to represent the adjacency matrix as a Prolog term. In particular, we will take as given the allnodes/1 predicate shown above as a means of getting a list of all nodes.
Prolog lacks any native "matrix" data type, so the approach taken here will be to use a list of lists to represent the adjacency matrix. Entries are organized by "row" in 0's and 1's that denote the adjacency of the node corresponding to a row with that node corresponding to a column.
Looking at your example graph/2 facts, I see that you've included one self-edge (from a to a). I'm not sure if you intend the graph to be directed or undirected, so I'll assume a directed graph was intended and note where a small change would be needed if otherwise an undirected graph was meant.
There is a "design pattern" here that defines a list by applying a rule to each item in a list. Here we do this one way to construct each row of the "matrix" and also (taking that as our rule) to construct the whole list of lists.
/* construct adjacency matrix for directed graph (allow self-edges) */
adjacency(AdjM) :-
    allnodes(L),
    adjMatrix(L, L, AdjM).

adjMatrix([], _, []).
adjMatrix([H|T], L, [Row|Rows]) :-
    row_AdjM(H, L, Row),
    adjMatrix(T, L, Rows).

row_AdjM(_, [], []).
row_AdjM(X, [Y|Ys], [C|Cs]) :-
    (   graph(X, Y)
    ->  C = 1
    ;   C = 0
    ),
    row_AdjM(X, Ys, Cs).
If an undirected graph were meant, then the call to graph(X,Y) should be replaced with an alternative ( graph(X,Y); graph(Y,X) ) that allows an edge to be considered in either direction.
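For readers who want to see where the list-of-lists representation leads, here is a minimal Python sketch of the same matrix construction followed by Warshall's transitive-closure loop. It is an illustration under assumptions, not a port of the Prolog: EDGES mirrors the graph/2 facts from the question, and all_nodes, adjacency_matrix and warshall are hypothetical names.

```python
# Mirrors the graph/2 facts from the question.
EDGES = [("a", "b"), ("a", "a"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("a", "e"), ("e", "f")]


def all_nodes(edges):
    """Sorted list of every node that occurs in an edge (like allnodes/1)."""
    return sorted({node for edge in edges for node in edge})


def adjacency_matrix(edges, directed=True):
    """Build the matrix row by row as a list of lists of 0/1 entries."""
    nodes = all_nodes(edges)
    edge_set = set(edges)
    if not directed:
        # Undirected: accept an edge in either direction.
        edge_set |= {(y, x) for x, y in edges}
    return [[1 if (x, y) in edge_set else 0 for y in nodes] for x in nodes]


def warshall(matrix):
    """Transitive closure: reach[i][j] is 1 if j is reachable from i."""
    n = len(matrix)
    reach = [row[:] for row in matrix]  # copy so the input stays untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = 1
    return reach


adj = adjacency_matrix(EDGES)
print(adj[0])            # row for node a: [1, 1, 0, 0, 1, 0]
print(warshall(adj)[0])  # a reaches every node: [1, 1, 1, 1, 1, 1]
```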
