How to make a template file of CRF++? - crf

I'm new to CRF++. I'm teaching myself by reading its manual:
http://crfpp.googlecode.com/svn/trunk/doc/index.html?source=navbar#templ
And I don't understand what this means:
This is a template to describe unigram features. When you give a
template "U01:%x[0,1]", CRF++ automatically generates a set of feature
functions (func1 ... funcN) like:
func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func2 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func3 = if (output = O and feature="U01:DT") return 1 else return 0
....
funcXX = if (output = B-NP and feature="U01:NN") return 1 else return 0
funcXY = if (output = O and feature="U01:NN") return 1 else return 0
The number of feature functions generated by a template amounts to (L * N), where L is the number of output classes and N is the number of unique strings expanded from the given template.
Why are there so many lines for the unigram features, and what do they mean?

After looking at the documentation for long enough, I think I figured it out.
Take the example in the documentation where the input data is:
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
and the feature template in question (in the format %x[row, col], where row is relative to your current position) is %x[0,1].
When %x[0,1] is expanded, depending on the current token, it yields one of the strings in the set [PRP, VBZ, DT, JJ, NN] (i.e. one of the unique strings from column 1, where the leftmost column is column 0). For each of these strings it creates a set of feature functions of the form (looking at the 3rd row of input data):
func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func2 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func3 = if (output = O and feature="U01:DT") return 1 else return 0
...
where that particular string (DT in the code above) is paired with every single output class.
So if the output classes are [B-NP, I-NP, O] the feature template expanded into feature functions will look like:
# row 1 (He, PRP, B-NP)
func1 = if (output = B-NP and feature="U01:PRP") return 1 else return 0
func2 = if (output = I-NP and feature="U01:PRP") return 1 else return 0
func3 = if (output = O and feature="U01:PRP") return 1 else return 0
# row 2 (reckons, VBZ, B-VP)
func4 = if (output = B-NP and feature="U01:VBZ") return 1 else return 0
func5 = if (output = I-NP and feature="U01:VBZ") return 1 else return 0
func6 = if (output = O and feature="U01:VBZ") return 1 else return 0
# row 3 (the, DT, B-NP)
func7 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func8 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func9 = if (output = O and feature="U01:DT") return 1 else return 0
# row 4 (current, JJ, I-NP)
func10 = if (output = B-NP and feature="U01:JJ") return 1 else return 0
func11 = if (output = I-NP and feature="U01:JJ") return 1 else return 0
func12 = if (output = O and feature="U01:JJ") return 1 else return 0
# row 5 (account, NN, I-NP)
func13 = if (output = B-NP and feature="U01:NN") return 1 else return 0
func14 = if (output = I-NP and feature="U01:NN") return 1 else return 0
func15 = if (output = O and feature="U01:NN") return 1 else return 0
Regarding the part of the documentation that says:
The number of feature functions generated by a template amounts to (L * N), where L is the number of output classes and N is the number of unique strings expanded from the given template.
In this case L would be 3 and N would be 5.
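The L * N count can be checked with a small sketch. This is not CRF++ code: the function name expand_unigram_template and the (label, feature string) tuples are assumptions made for illustration, and the label set is the three-class one assumed above.

```python
# Illustrative sketch only; CRF++ performs this expansion internally.
rows = [
    ("He", "PRP", "B-NP"),
    ("reckons", "VBZ", "B-VP"),
    ("the", "DT", "B-NP"),
    ("current", "JJ", "I-NP"),
    ("account", "NN", "I-NP"),
]
labels = ["B-NP", "I-NP", "O"]  # output classes assumed in the answer above


def expand_unigram_template(rows, col, labels, prefix="U01"):
    """Return one (label, feature_string) pair per generated feature function."""
    unique_strings = []
    for row in rows:
        feature = f"{prefix}:{row[col]}"
        if feature not in unique_strings:  # keep first-seen order, drop duplicates
            unique_strings.append(feature)
    # Every unique string is paired with every output class.
    return [(label, feature) for feature in unique_strings for label in labels]


functions = expand_unigram_template(rows, col=1, labels=labels)
print(len(functions))  # L * N = 3 * 5 = 15
print(functions[6])    # ('B-NP', 'U01:DT'), i.e. func7 in the listing above
```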

For a particular template %x[i,j], i is the row offset relative to the current position and j is the feature (column) you want to use.
Given data:
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP << CURRENT TOKEN
account NN I-NP
%x[0,1] refers to column 1 of the current token (row offset 0): its POS tag, JJ. The output tag of that token is I-NP.
Moving forward one token, %x[0,1] -> POS tag = NN, output tag = I-NP.
Each feature function pairs one possible value of the expanded template (here, a POS tag) with one possible output tag.
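A minimal sketch of how the %x[i,j] lookup resolves relative to the current token. The helper name expand_macro is hypothetical, positions are 0-based, and offsets that fall outside the sentence are deliberately not handled here.

```python
rows = [
    ("He", "PRP", "B-NP"),
    ("reckons", "VBZ", "B-VP"),
    ("the", "DT", "B-NP"),
    ("current", "JJ", "I-NP"),
    ("account", "NN", "I-NP"),
]


def expand_macro(rows, position, row_offset, col):
    """Resolve %x[row_offset, col] relative to the token at `position`."""
    index = position + row_offset
    if not 0 <= index < len(rows):
        raise IndexError("offset falls outside the sentence; not handled in this sketch")
    return rows[index][col]


current = 3  # the line "current JJ I-NP"
print(expand_macro(rows, current, 0, 1))   # JJ
print(expand_macro(rows, current, -1, 1))  # DT
print(expand_macro(rows, current, 1, 0))   # account
```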
update:
I think the explanation above is quite straightforward, provided that you understand the CRF model well.
CRF Model Reference
CRF++ is a replication of Sha and Pereira (2003)

Related

R - Using a While() loop inside a FOR() loop

I am porting VBA code to R. It counts transitions from one rating to another based on different conditions.
It is as follows:
## attach the relevant data table
attach(cohort)
# define the matrices that will contain all the counting information
ni = matrix(0,nrow = 1, ncol = classes - 1)
nij = matrix(0, nrow = classes-1, ncol = classes+1)
for (k in 1:obs)
{
# define the year of the kth observation
t = apply(data.frame(date[k],ystart),1,max, na.rm = F)
#t = year(as.Date(t))
while (t < yend)
{
# if this observation and the second one belong to the same id and year, break and move to the next one
if (id[k] == id[k+1] & date[k] == date[k+1]) {break}
# if the rating of this observation is 0 (not rated) or in default, then leave it
if (rating[k] == classes | rating[k] == 0) {break}
# add to the group of customers with rating = rating_k, 1 observation
rating_k = rating[k]
ni[rating_k] = ni[rating_k]+1
# determine the rating from end of next year
if (id[k] != id[k+1] | date[k+1] > (t+1))
{newrat = rating_k}
else
{
kn = k +1
while (date[kn]==date[kn+1] & id[kn]==id[kn+1])
{
if (rating[kn]==classes) {break}
Kn = kn+1
}
newrat = rating[kn]
}
nij[rating_k, newrat] = (nij[rating_k, newrat] + 1)
if(newrat!=rating[k]) {break}
else
{t = (t+1)}
}
print (k)
}
At the end of my code, if the condition "if(newrat!=rating[k])" is met, I want the code to break and move to the next k. Otherwise, I set t = t + 1 and the code goes back to the condition of the while (t < yend) loop.
I added "print(k)" at the end to see at which iteration of the for loop the code stops. It always stops at k = 9, after k = 1 to 8 have been printed. I have 4000 observations in total, but only 8 are processed; the loop never finishes and R keeps running.
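One thing worth checking, offered as an observation rather than a confirmed diagnosis: the inner loop contains the line Kn = kn+1, and R is case-sensitive, so that line assigns to a new variable Kn while kn never changes. If the inner while condition is true once, it then stays true forever. The sketch below mimics that loop in Python (also case-sensitive); the function names, the 0-based indexing, and the max_steps guard are assumptions added for illustration.

```python
def advance_with_typo(dates, ids, start, max_steps=1000):
    """Mimic the inner while loop, including the `Kn = kn + 1` typo."""
    kn = start
    steps = 0
    while dates[kn] == dates[kn + 1] and ids[kn] == ids[kn + 1]:
        Kn = kn + 1  # typo: assigns a different name, kn itself never advances
        steps += 1
        if steps >= max_steps:
            return None  # gave up: without this guard the loop never ends
    return kn


def advance_fixed(dates, ids, start):
    """Same loop with the counter spelled consistently and a bounds check."""
    kn = start
    while (kn + 1 < len(dates)
           and dates[kn] == dates[kn + 1]
           and ids[kn] == ids[kn + 1]):
        kn = kn + 1
    return kn


dates = [2001, 2001, 2002]
ids = [7, 7, 7]
print(advance_with_typo(dates, ids, 0, max_steps=50))  # None
print(advance_fixed(dates, ids, 0))                    # 1
```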

SpriteKit for loop

Hi, I'm trying to follow a tutorial on the Ray Wenderlich site:
http://www.raywenderlich.com/76740/make-game-like-space-invaders-sprite-kit-and-swift-tutorial-part-1
I'm going through the functions and breaking them down so I can understand how they work. I've commented the parts I think I understand, but this bit has me stumped.
Thanks for looking.
In the for loop, what is the var row = 1 at the beginning doing?
I've only ever written for loops like
for Position in 0...9
{
// do something with Position ten times
}
Then what does the % in if row % 3 mean?
for var row = 1; row <= kInvaderRowCount; row++ // start of loop
{
var invaderType: InvaderType // variable of type AType etc.
if row % 3 == 0
{
invaderType = .AType
} else if row % 3 == 1
Here's the rest of the code:
func makeInvaderOfType(invaderType: InvaderType) -> (SKNode) // function takes an enum value (AType, BType, CType) and returns an SKNode
{
var invaderColor: SKColor // variable for the colour
switch(invaderType) // switch statement: if we pass in AType we will get red
{
case .AType:
invaderColor = SKColor.redColor()
case .BType:
invaderColor = SKColor.greenColor()
case .CType:
invaderColor = SKColor.blueColor()
default:
invaderColor = SKColor.blueColor()
}
let invader = SKSpriteNode(color: invaderColor, size: kInvaderSize) // an SKSpriteNode with the colour from the switch statement and the size from kInvaderSize
invader.name = kInvaderName // name comes from the constant kInvaderName
return invader // return the sprite node with colour, size and name
}
func setupInvaders()
{
let baseOrigin = CGPoint(x:size.width/3, y:180) // variable holding a CGPoint: x = screen width / 3, y = 180
for var row = 1; row <= kInvaderRowCount; row++ // start of loop
{
var invaderType: InvaderType // variable of type AType etc.
if row % 3 == 0
{
invaderType = .AType
} else if row % 3 == 1
{
invaderType = .BType
} else
{
invaderType = .CType
}
let invaderPositionY = CGFloat(row) * (kInvaderSize.height * 2) + baseOrigin.y // CGFloat(row)? I think row is the loop counter: row times (16 * 2 = 32) plus 180, so 212 the first time, then 244
/* so if I've got this right, the sum goes: row = 1, kInvaderSize.height * 2 = 32, + baseOrigin.y = 180
1 * 32 + 180 = 212
2 * 32 + 180 = 392 but it's 244
*/
println(row)
var invaderPosition = CGPoint(x:baseOrigin.x, y:invaderPositionY) // variable to hold a CGPoint
println(invaderPosition.y)
for var col = 1; col <= kInvaderColCount; col++
{
var invader = makeInvaderOfType(invaderType) // variable that calls the function and gets back the sprite node with colour, size and name?
invader.position = invaderPosition
addChild(invader)
invaderPosition = CGPoint(x: invaderPosition.x + kInvaderSize.width + kInvaderGridSpacing.width, y: invaderPositionY)
}
}
}
If I understand your question correctly, here's the answer. Based on this code:
for var row = 1; row <= kInvaderRowCount; row++ // start of loop
{
var invaderType: InvaderType // variable of type AType etc.
if row % 3 == 0
{
invaderType = .AType
} else if row % 3 == 1
The first line means:
var row = 1: declare a new variable, row, with an initial value of 1
row <= kInvaderRowCount: as long as the variable row is less than or equal to kInvaderRowCount, keep running the for loop
row++: each time the loop body has run, increment (increase) the value of row by 1
As for the "%", that is the modulo operator. It returns the remainder after a division operation on integer values. So since 7 divided by 3 is 2 with a remainder of 1:
7 / 3 = 2
7 % 3 = 1
The modulo operator yields an integer. While 1 / 3 = 0.33... in ordinary arithmetic (and 0 in integer division), 1 % 3 = 1, because the remainder of 1 divided by 3 is 1.
1 % 3 = 1
2 % 3 = 2
3 % 3 = 0
4 % 3 = 1
5 % 3 = 2
6 % 3 = 0
see also: How Does Modulus Divison Work.
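To see how the remainder cycles through the rows, here is a small Python sketch of the same branching. The function names are made up for illustration, and the invader height of 16 is taken from the comment in the question, not from the tutorial itself.

```python
def invader_type_for_row(row):
    """Mirror the if / else-if chain: the remainder of row / 3 picks the type."""
    remainder = row % 3
    if remainder == 0:
        return "AType"
    elif remainder == 1:
        return "BType"
    return "CType"


def invader_y(row, invader_height=16, base_y=180):
    """Same arithmetic as invaderPositionY: row * (height * 2) + base y."""
    return row * (invader_height * 2) + base_y


for row in range(1, 7):
    print(row, row % 3, invader_type_for_row(row), invader_y(row))
```

With these assumed values, row 1 gives y = 212 and row 2 gives y = 244, which matches what the question's println output reports.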

Fastest solution for all possible combinations, taking k elements out of n possible with k>2 and n large

I am using MATLAB to find all of the possible combinations of k elements out of n possible elements. I stumbled across this question, but unfortunately it does not solve my problem. Of course, neither does nchoosek as my n is around 100.
Truth is, I don't need all of the possible combinations at the same time. I will explain what I need, as there might be an easier way to achieve the desired result. I have a matrix M of 100 rows and 25 columns.
Think of a submatrix of M as a matrix formed by ALL columns of M and only a subset of the rows. I have a function f that can be applied to any matrix which gives a result of either -1 or 1. For example, you can think of the function as sign(det(A)) where A is any matrix (the exact function is irrelevant for this part of the question).
I want to know the largest number of rows of M for which the submatrix A formed by these rows is such that f(A) = 1. Notice that if f(M) = 1, I am done. However, if this is not the case, then I need to start combining rows, starting with all combinations of 99 rows, then the ones with 98 rows, and so on.
Up to this point, my implementation relied on nchoosek, which worked when M had only a few rows. However, now that I am working with a considerably bigger dataset, things get stuck. Can anyone think of a way to implement this without having to use the above function? Any help would be greatly appreciated.
Here is my minimal working example, it works for small obs_tot but fails when I try to use bigger numbers:
value = -1; obs_tot = 100; n_rows = 25;
mat = randi(obs_tot,n_rows);
i = obs_tot; % subset size to try first (assumed: start from all rows)
while value == -1
possibles = nchoosek(1:obs_tot,i); % all subsets of size i, built at once
[num_tries,num_obs] = size(possibles);
num_try = 1;
while value == -1 && num_try <= num_tries % keep trying until f returns 1
check = mat(possibles(num_try,:),:);
value = sign(det(check));
num_try = num_try + 1;
end
i = i - 1;
end
obs_used = possibles(num_try-1,:)';
Preamble
As you noticed yourself in your question, it would be nice if nchoosek did not return all possible combinations at the same time, but rather enumerated them one by one so that memory does not explode when n becomes large. So something like:
enumerator = CombinaisonEnumerator(k, n);
while(enumerator.MoveNext())
currentCombination = enumerator.Current;
...
end
Here is an implementation of such an enumerator as a MATLAB class. It is based on the classic IEnumerator<T> interface in C# / .NET and mimics the subfunction combs in nchoosek (the unrolled way):
%
% PURPOSE:
%
% Enumerates all combinations of length 'k' in a set of length 'n'.
%
% USAGE:
%
% enumerator = CombinaisonEnumerator(k, n);
% while(enumerator.MoveNext())
% currentCombination = enumerator.Current;
% ...
% end
%
%% ---
classdef CombinaisonEnumerator < handle
properties (Dependent) % NB: Matlab R2013b bug => Dependent must be declared before their get/set !
Current; % Gets the current element.
end
methods
function [enumerator] = CombinaisonEnumerator(k, n)
% Creates a new combinations enumerator.
if (~isscalar(n) || (n < 1) || (~isreal(n)) || (n ~= round(n))), error('`n` must be a scalar positive integer.'); end
if (~isscalar(k) || (k < 0) || (~isreal(k)) || (k ~= round(k))), error('`k` must be a scalar positive or null integer.'); end
if (k > n), error('`k` must be less than or equal to `n`'); end
enumerator.k = k;
enumerator.n = n;
enumerator.v = 1:n;
enumerator.Reset();
end
function [b] = MoveNext(enumerator)
% Advances the enumerator to the next element of the collection.
if (~enumerator.isOkNext),
b = false; return;
end
if (enumerator.isInVoid)
if (enumerator.k == enumerator.n),
enumerator.isInVoid = false;
enumerator.current = enumerator.v;
elseif (enumerator.k == 1)
enumerator.isInVoid = false;
enumerator.index = 1;
enumerator.current = enumerator.v(enumerator.index);
else
enumerator.isInVoid = false;
enumerator.index = 1;
enumerator.recursion = CombinaisonEnumerator(enumerator.k - 1, enumerator.n - enumerator.index);
enumerator.recursion.v = enumerator.v((enumerator.index + 1):end); % adapt v (todo: should use private constructor)
enumerator.recursion.MoveNext();
enumerator.current = [enumerator.v(enumerator.index) enumerator.recursion.Current];
end
else
if (enumerator.k == enumerator.n),
enumerator.isInVoid = true;
enumerator.isOkNext = false;
elseif (enumerator.k == 1)
enumerator.index = enumerator.index + 1;
if (enumerator.index <= enumerator.n)
enumerator.current = enumerator.v(enumerator.index);
else
enumerator.isInVoid = true;
enumerator.isOkNext = false;
end
else
if (enumerator.recursion.MoveNext())
enumerator.current = [enumerator.v(enumerator.index) enumerator.recursion.Current];
else
enumerator.index = enumerator.index + 1;
if (enumerator.index <= (enumerator.n - enumerator.k + 1))
enumerator.recursion = CombinaisonEnumerator(enumerator.k - 1, enumerator.n - enumerator.index);
enumerator.recursion.v = enumerator.v((enumerator.index + 1):end); % adapt v (todo: should use private constructor)
enumerator.recursion.MoveNext();
enumerator.current = [enumerator.v(enumerator.index) enumerator.recursion.Current];
else
enumerator.isInVoid = true;
enumerator.isOkNext = false;
end
end
end
end
b = enumerator.isOkNext;
end
function [] = Reset(enumerator)
% Sets the enumerator to its initial position, which is before the first element.
enumerator.isInVoid = true;
enumerator.isOkNext = (enumerator.k > 0);
end
function [c] = get.Current(enumerator)
if (enumerator.isInVoid), error('Enumerator is positioned (before/after) the (first/last) element.'); end
c = enumerator.current;
end
end
properties (GetAccess=private, SetAccess=private)
k = [];
n = [];
v = [];
index = [];
recursion = [];
current = [];
isOkNext = false;
isInVoid = true;
end
end
We can check that the implementation works from the command window like this:
>> e = CombinaisonEnumerator(3, 6);
>> while(e.MoveNext()), fprintf(1, '%s\n', num2str(e.Current)); end
Which returns, as expected, the following n!/(k!*(n-k)!) = 20 combinations:
1 2 3
1 2 4
1 2 5
1 2 6
1 3 4
1 3 5
1 3 6
1 4 5
1 4 6
1 5 6
2 3 4
2 3 5
2 3 6
2 4 5
2 4 6
2 5 6
3 4 5
3 4 6
3 5 6
4 5 6
The implementation of this enumerator could be further optimized for speed, or changed to enumerate combinations in an order more appropriate for your case (e.g., test some combinations before others) ... Well, at least it works! :)
Problem solving
Now solving your problem is really easy:
n = 100;
m = 25;
matrix = rand(n, m);
k = n;
cont = true;
while(cont && (k >= 1))
e = CombinaisonEnumerator(k, n);
while(cont && e.MoveNext())
cont = f(matrix(e.Current(:), :)) ~= 1;
end
if (cont), k = k - 1; end
end
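For comparison, the same one-at-a-time search can be sketched in Python, where itertools.combinations already yields combinations lazily. This is only an illustration under assumptions: largest_passing_subset is a made-up name, f is whatever test function you supply, and the toy f in the usage lines is invented for the demo. Lazy enumeration avoids the memory blow-up, but the number of combinations to visit can still be astronomically large.

```python
from itertools import combinations


def largest_passing_subset(rows, f):
    """Return indices of the largest row subset with f(subset) == 1, or None."""
    n = len(rows)
    for k in range(n, 0, -1):                  # try the biggest subsets first
        for idx in combinations(range(n), k):  # generated one at a time
            if f([rows[i] for i in idx]) == 1:
                return idx
    return None


# Toy usage: f returns 1 when the first column of the subset sums to more than 3.
rows = [[1], [-2], [3]]
toy_f = lambda sub: 1 if sum(r[0] for r in sub) > 3 else -1
print(largest_passing_subset(rows, toy_f))  # (0, 2)
```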

How can I generate this pattern of numbers?

Given inputs 1-32 how can I generate the below output?
in. out
1. 1
2. 1
3. 1
4. 1
5. 2
6. 2
7. 2
8. 2
9. 1
10. 1
11. 1
12. 1
13. 2
14. 2
15. 2
16. 2
...
Edit: Not homework, just lack of sleep.
I am working in C#, but I was looking for a language-agnostic algorithm.
Edit 2: To provide a bit more background... I have an array of 32 items that represents a two-dimensional checkerboard. I needed the last part of this algorithm to convert between the vector and the graph, where the index aligns on the black squares on the checkerboard.
Final Code:
--Index;
int row = Index >> 2;
int col = 2 * Index - (((Index & 0x04) >> 2 == 1) ? 2 : 1);
Assuming that you can use bitwise operators, you can check what the inputs with the same output have in common. In this case I preferred using inputs 0-31 because it's simpler (you can just subtract 1 from the actual values).
What do you have? (inputs written in binary)
0000 -> 1
0001 -> 1
0010 -> 1
0011 -> 1
0100 -> 2
0101 -> 2
0110 -> 2
0111 -> 2
1000 -> 1
1001 -> 1
1010 -> 1
1011 -> 1
1100 -> 2
...
It's quite easy once you notice that the third bit (value 4) is always 0 when the output should be 1 and, vice versa, always 1 when the output should be 2.
So:
char codify(char input)
{
return ((((input-1)&0x04)>>2 == 1)?(2):(1));
}
EDIT
As suggested in a comment, this should also work:
char codify(char input)
{
return ((input-1 & 0x04)?(2):(1));
}
because in some languages (like C) 0 will evaluate to false and any other value to true. I'm not sure if it works in C# too because I've never programmed in that language. Of course this is not a language-agnostic answer but it's more C-elegant!
in C:
char output = "11112222"[input-1 & 7];
or
char output = (input-1 >> 2 & 1) + '1';
or after an idea of FogleBird:
char output = input - 1 & 4 ? '2' : '1';
or after an idea of Steve Jessop:
char output = '2' - (0x1e1e1e1e >> input & 1);
or
char output = "12"[input-1>>2&1];
C operator precedence is evil. Do use my code as bad examples :-)
You could use a combination of integer division and modulo 2 (even-odd): There are blocks of four, and the 1st, 3rd, 5th block and so on should result in 1, the 2nd, 4th, 6th and so on in 2.
s := ((n-1) div 4) mod 2;
return s + 1;
div is supposed to be integer division.
EDIT: Turned first mod into a div, of course
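The div/mod formula and the bit test describe the same pattern, which a short Python check over all 32 inputs can confirm. The function names below are made up for this sketch.

```python
def by_div_mod(n):
    """Blocks of four: integer-divide by 4, then odd/even block via mod 2."""
    return ((n - 1) // 4) % 2 + 1


def by_bit(n):
    """Test the bit with value 4 of the zero-based input."""
    return 2 if (n - 1) & 4 else 1


def by_lookup(n):
    """Index into the repeating eight-character pattern."""
    return int("11112222"[(n - 1) % 8])


outputs = [by_div_mod(n) for n in range(1, 33)]
print(outputs[:8])  # [1, 1, 1, 1, 2, 2, 2, 2]
print(all(by_div_mod(n) == by_bit(n) == by_lookup(n) for n in range(1, 33)))  # True
```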
Just for laughs, here's a technique that maps inputs 1..32 to two possible outputs, in any arbitrary way known at compile time:
// binary 1111 0000 1111 0000 1111 0000 1111 0000
const uint32_t lu_table = 0xF0F0F0F0;
// select 1 bit out of the table
if ((((uint32_t)1 << (input-1)) & lu_table) == 0) {
return 1;
} else {
return 2;
}
By changing the constant, you can handle whatever pattern of outputs you want. Obviously in your case there's a pattern which means it can probably be done faster (since no shift is needed), but everyone else already did that. Also, it's more common for a lookup table to be an array, but that's not necessary here.
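The same bitmask idea, sketched in Python with a helper that packs any desired 1/2 pattern into the constant. The names lookup and build_table are hypothetical and not part of the answer above.

```python
LU_TABLE = 0xF0F0F0F0  # bit i (0-based) is set where input i + 1 maps to 2


def lookup(n, table=LU_TABLE):
    """Select one bit of the table for input n in 1..32."""
    return 2 if (table >> (n - 1)) & 1 else 1


def build_table(outputs):
    """Pack a list of up to 32 outputs (each 1 or 2) into a bitmask."""
    table = 0
    for i, out in enumerate(outputs):
        if out == 2:
            table |= 1 << i
    return table


pattern = [lookup(n) for n in range(1, 33)]
print(pattern[:8])                     # [1, 1, 1, 1, 2, 2, 2, 2]
print(hex(build_table(pattern)))       # 0xf0f0f0f0
```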
The accepted answer return ((((input-1)&0x04)>>2 == 1)?(2):(1)); uses a branch, while I would have just written:
return 1 + (((input-1) & 0x04) >> 2);
Python
def f(x):
    return int((x - 1) % 8 > 3) + 1
Or:
def f(x):
    return 2 if (x - 1) & 4 else 1
Or:
def f(x):
    return (((x - 1) & 4) >> 2) + 1
In Perl:
#!/usr/bin/perl
use strict; use warnings;
sub it {
return sub {
my ($n) = @_;
return 1 if 4 > ($n - 1) % 8;
return 2;
}
}
my $it = it();
for my $x (1 .. 32) {
printf "%2d:%d\n", $x, $it->($x);
}
Or:
sub it {
return sub {
my ($n) = @_;
use integer;
return 1 + ( (($n - 1) / 4) % 2 );
}
}
In Haskell:
vec2graph :: Int -> Char
vec2graph n = (cycle "11112222") !! (n-1)
That's pretty straightforward:
if (input == "1") { Console.WriteLine(1); }
if (input == "2") { Console.WriteLine(1); }
if (input == "3") { Console.WriteLine(1); }
if (input == "4") { Console.WriteLine(1); }
if (input == "5") { Console.WriteLine(2); }
if (input == "6") { Console.WriteLine(2); }
if (input == "7") { Console.WriteLine(2); }
if (input == "8") { Console.WriteLine(2); }
etc...
HTH
It depends on the language you are using.
In VB.NET, you could do something like this :
for i as integer = 1 to 32
dim intAnswer as integer = 1 + (Math.Floor((i-1) / 4) mod 2)
' Do whatever you need to do with it
next
It might look complicated, but that's only because I put it all on a single line.
In Groovy:
def codify = { i ->
return (((((i-1)/4).intValue()) %2 ) + 1)
}
Then:
def list = 1..16
list.each {
println "${it}: ${codify(it)}"
}
char codify(char input)
{
return (((input-1) & 0x04)>>2) + 1;
}
Using Python:
output = 1
for i in range(1, 32+1):
    print("%d. %d" % (i, output))
    if i % 4 == 0:
        output = 2 if output == 1 else 1
JavaScript
My first thought was
output = ((input - 1 & 4) >> 2) + 1;
but drhirsch's code works fine in JavaScript:
output = input - 1 & 4 ? 2 : 1;
and the ridiculous (related to FogleBird's answer):
output = -~((input - 1) % 8 > 3);
Java, using the modulo operation ('%') on i-1 to give the cyclic behaviour (0,1,2...7) and then a ternary to 'round' to 1 (?) or 2 (:) depending on the returned value.
...
public static void main(String[] args) {
for (int i=1;i<=32;i++) {
System.out.println(i+"="+ ((i-1)%8<4?1:2) );
}
}
Produces (one pair per line, wrapped here):
1=1 2=1 3=1 4=1 5=2 6=2 7=2 8=2 9=1
10=1 11=1 12=1 13=2 14=2 15=2 16=2
17=1 18=1 19=1 20=1 21=2 22=2 23=2
24=2 25=1 26=1 27=1 28=1 29=2 30=2
31=2 32=2

I want a function in VB SCRIPT to calculate numerology

I want a function to calculate numerology. For example, if I enter "XYZ" then my output should be 3.
Here is how it became 3:
X = 24
Y = 25
Z = 26
Adding them gives 75, whose digits add up to 12 (7+5), whose digits in turn add up to 3 (1+2). Similarly, whatever name I pass, my output should be a single-digit score.
Here you are:
Function Numerology(Str)
Dim sum, i, char
' Convert the string to upper case, so that 'X' = 'x'
Str = UCase(Str)
sum = 0
' For each character, ...
For i = 1 To Len(Str)
' Check if it's a letter and raise an exception otherwise
char = Mid(Str, i , 1)
If char < "A" Or char > "Z" Then Err.Raise 5 ' Invalid procedure call or argument
' Add the letter's index number to the sum
sum = sum + Asc(char) - 64
Next
' Calculate the result using the digital root formula (http://en.wikipedia.org/wiki/Digital_root)
Numerology = 1 + (sum - 1) Mod 9
End Function
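To see why the digital root formula gives the same result as repeatedly summing digits, here is a small Python sketch of both approaches. The function names are illustrative, and the sketch assumes a non-empty name made only of the letters A-Z.

```python
def numerology(name):
    """Letter values A=1..Z=26, reduced with the digital root formula."""
    if not name:
        raise ValueError("name must not be empty")
    total = 0
    for ch in name.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a letter: {ch!r}")
        total += ord(ch) - ord("A") + 1
    return 1 + (total - 1) % 9


def numerology_by_summing(name):
    """Same result, computed by summing digits until one digit remains."""
    total = sum(ord(ch) - ord("A") + 1 for ch in name.upper())
    while total > 9:
        total = sum(int(digit) for digit in str(total))
    return total


print(numerology("XYZ"))             # 3  (24 + 25 + 26 = 75 -> 12 -> 3)
print(numerology_by_summing("XYZ"))  # 3
```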
In vbscript:
Function numerology(literal)
result = 0
for i = 1 to Len(literal)
'' // for each letter, take its ASCII value and subtract 64,
'' so "A" becomes 1 and "Z" becomes 26
result = result + Asc(Mid(literal, i, 1)) - 64
next
'' // while result has more than one digit, let's sum its digits
while(result > 9)
partial = 0
for i = 1 to Len(CStr(result))
partial = partial + CInt(Mid(CStr(result), i, 1))
next
result = partial
wend
numerology = result
End Function
I have no idea what this could possibly be used for, but it was fun to write anyway.
Private Function CalcStupidNumber(ByVal s As String) As Integer
s = s.ToLower
If (s.Length = 1) Then 'End condition
Try
Return Integer.Parse(s)
Catch ex As Exception
Return 0
End Try
End If
'Convert to values
Dim x As Int32
Dim tot As Int32 = 0
For x = 0 To s.Length - 1 Step 1
Dim Val As Integer = ConvertToVal(s(x))
tot += Val
Next
Return CalcStupidNumber(tot.ToString())
End Function
Private Function ConvertToVal(ByVal c As Char) As Integer
If (Char.IsDigit(c)) Then
Return Integer.Parse(c)
End If
Return System.Convert.ToInt32(c) - 96 ' offset of 'a'
End Function
