When cola.js fails, how do you find out what went wrong?
I don't have a minimal reproducible example, because that will likely take hours of debugging to make, and if I had it, then it wouldn't be an example of what I'm asking about. I'm hoping to hear of a way to quickly find out what's gone wrong. Presumably it's some bad data in my graph. But how can I tell what the bad data is?
The screenshot below illustrates a typical situation where something has gone wrong. Various NaNs are reported, there are a bunch of Uncaught TypeErrors, and an Assertion fails. There are a whole lot of local variables, all named with a single letter. Clicking and looking at them shows that a NaN got into a coordinate somewhere, as suggested by the previous error messages. It looks like sometime earlier, something got a null when it needed some sort of list-like data structure. How can I quickly find out what was the bad data that got into the graph?
I'm using cola.v3.min.js.
Details
I'm looking for a general approach to finding this kind of error, but here's the specific data that produced the errors in this example. Each line is the argument passed to cy.add() (as suggested by Stepan T.—and printing out everything passed to cy.add() might turn out to be the first step of the answer!).
{"group":"nodes","data":{"id":1,"label":"Workspace","parent":[]},"position":{"x":45,"y":0}}
{"group":"nodes","data":{"id":2,"label":"Target(121)","parent":[1]},"position":{"x":5,"y":0}}
{"group":"nodes","data":{"id":3,"label":"WantFullySourced","parent":[]},"position":{"x":50,"y":0}}
{"group":"nodes","data":{"id":4,"label":"Brick(120)","parent":[1]},"position":{"x":10,"y":0}}
{"group":"nodes","data":{"id":5,"label":"Avail","parent":[]},"position":{"x":55,"y":0}}
{"group":"nodes","data":{"id":6,"label":"Brick(1)","parent":[1]},"position":{"x":15,"y":0}}
{"group":"nodes","data":{"id":7,"label":"Avail","parent":[]},"position":{"x":60,"y":0}}
{"group":"nodes","data":{"id":8,"label":"Brick(2)","parent":[1]},"position":{"x":20,"y":0}}
{"group":"nodes","data":{"id":9,"label":"Avail","parent":[]},"position":{"x":65,"y":0}}
{"group":"nodes","data":{"id":10,"label":"Brick(3)","parent":[1]},"position":{"x":25,"y":0}}
{"group":"nodes","data":{"id":11,"label":"Avail","parent":[]},"position":{"x":70,"y":0}}
{"group":"nodes","data":{"id":12,"label":"Brick(4)","parent":[1]},"position":{"x":30,"y":0}}
{"group":"nodes","data":{"id":13,"label":"Avail","parent":[]},"position":{"x":75,"y":0}}
{"group":"nodes","data":{"id":14,"label":"Brick(5)","parent":[1]},"position":{"x":35,"y":0}}
{"group":"nodes","data":{"id":15,"label":"Avail","parent":[]},"position":{"x":80,"y":0}}
{"group":"nodes","data":{"id":16,"label":"NumericalRelationScout","parent":[1]},"position":{"x":40,"y":0}}
{"group":"nodes","data":{"id":17,"label":"OperandView","parent":[1]},"position":{"x":0,"y":0}}
{"group":"edges","data":{"id":"2.member_of.1.members","source":2,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"4.member_of.1.members","source":4,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"6.member_of.1.members","source":6,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"8.member_of.1.members","source":8,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"10.member_of.1.members","source":10,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"12.member_of.1.members","source":12,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"14.member_of.1.members","source":14,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"16.member_of.1.members","source":16,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"17.member_of.1.members","source":17,"source_port_label":"member_of","target":1,"target_port_label":"members","weight":0.5}}
{"group":"edges","data":{"id":"3.taggees.2.tags","source":3,"source_port_label":"taggees","target":2,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"2.support_from.3.support_to","source":2,"source_port_label":"support_from","target":3,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"3.support_from.2.support_to","source":3,"source_port_label":"support_from","target":2,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"5.taggees.4.tags","source":5,"source_port_label":"taggees","target":4,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"4.support_from.5.support_to","source":4,"source_port_label":"support_from","target":5,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"7.taggees.6.tags","source":7,"source_port_label":"taggees","target":6,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"6.support_from.7.support_to","source":6,"source_port_label":"support_from","target":7,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"9.taggees.8.tags","source":9,"source_port_label":"taggees","target":8,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"8.support_from.9.support_to","source":8,"source_port_label":"support_from","target":9,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"11.taggees.10.tags","source":11,"source_port_label":"taggees","target":10,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"10.support_from.11.support_to","source":10,"source_port_label":"support_from","target":11,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"13.taggees.12.tags","source":13,"source_port_label":"taggees","target":12,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"12.support_from.13.support_to","source":12,"source_port_label":"support_from","target":13,"target_port_label":"support_to","weight":2}}
{"group":"edges","data":{"id":"15.taggees.14.tags","source":15,"source_port_label":"taggees","target":14,"target_port_label":"tags","weight":0.5}}
{"group":"edges","data":{"id":"14.support_from.15.support_to","source":14,"source_port_label":"support_from","target":15,"target_port_label":"support_to","weight":2}}
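A listing like the one above can be produced by wrapping cy.add() so that every argument is printed before the call goes through. Here is a minimal sketch of that idea. The helper name logCalls is made up for illustration, and it assumes the arguments are plain JSON-like objects, as they are in this example.

```javascript
// Wrap a method so that each call's arguments are logged before the call runs.
// `target` would be the Cytoscape instance (cy) and `methodName` would be 'add'.
// `logCalls` is a hypothetical helper, not part of Cytoscape or cola.js.
function logCalls(target, methodName, log = console.log) {
  const original = target[methodName];
  target[methodName] = function (...args) {
    for (const arg of args) {
      let text;
      try {
        text = JSON.stringify(arg);
      } catch (err) {
        // Circular structures cannot be stringified; report that instead of throwing.
        text = `[unserializable: ${err.message}]`;
      }
      log(text);
    }
    // Forward to the original method with the same `this` and arguments.
    return original.apply(this, args);
  };
  return target;
}
```

With this in place, calling logCalls(cy, 'add') once before building the graph should print one line per cy.add() call.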
The object passed to cy.layout() had a circular structure, so JSON.stringify() wouldn't print it. Adding the getCircularReplacer() function recommended here seemed to crash the browser. But manually removing elements from the layout object exposed the error: in a relative alignment constraint (documented under API here), I had an offset of '0' where I needed a 0, i.e. a string where a number was needed.
OK, happily that problem is now fixed. That still leaves the original question: is there a faster way to find errors like that? (The static-typing advocates are surely cackling by now.)
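One approach that might have caught this faster is to walk the object passed to cy.layout() before running the layout and report any value that looks wrong for numeric code: NaN, Infinity, or a string that parses as a number (like '0'). The sketch below is only a heuristic under those assumptions. The function name findSuspectValues and the maxDepth limit are my own inventions, and it will also flag legitimate numeric strings such as string ids.

```javascript
// Walk a nested object and collect paths whose values look suspicious:
// non-finite numbers (NaN, Infinity) and numeric-looking strings such as '0'.
// A WeakSet guards against circular references, and maxDepth bounds the walk
// so that large object graphs do not make it run away.
// `findSuspectValues` is a hypothetical helper, not part of any library.
function findSuspectValues(root, maxDepth = 8) {
  const seen = new WeakSet();
  const problems = [];

  function visit(value, path, depth) {
    if (typeof value === 'number') {
      if (!Number.isFinite(value)) {
        problems.push({ path, value, reason: 'non-finite number' });
      }
    } else if (typeof value === 'string') {
      if (value.trim() !== '' && Number.isFinite(Number(value))) {
        problems.push({ path, value, reason: 'numeric string' });
      }
    } else if (value !== null && typeof value === 'object') {
      if (seen.has(value) || depth >= maxDepth) return;
      seen.add(value);
      for (const key of Object.keys(value)) {
        visit(value[key], path === '' ? key : `${path}.${key}`, depth + 1);
      }
    }
  }

  visit(root, '', 0);
  return problems;
}
```

Calling console.table(findSuspectValues(layoutOptions)) just before cy.layout(layoutOptions) would then list the path to each suspect value, which is easier to read than single-letter locals in minified code.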
It may not matter, but I am using the Windows distribution of CRF++ 0.58.
I have successfully used MALLET to train a CRF model and then test it. When I try to use the same train and test files with CRF++ (after creating a template file), I get a
The line search routine mcsrch failed: error code:0
error when I use either
-a CRF-L1
or the default
-a CRF-L2
When I use
-a MIRA
though, training works without error, and so does testing.
The format of the test and training data can be the same for both MALLET and CRF++, so that is not the issue. My template file is as simple as
#Mixed
M00:%x[0,0]
M01:%x[0,1]
M02:%x[0,2]
......
M12:%x[0,12]
The last column of my training data is either 0 or 1, which is the label to classify. None of my features contain whitespace; I use underscores where necessary. Am I missing something simple here? What would cause the L1 and L2 regularizations to fail like that?
I knew it was something silly ...
To use features like the ones I am using, you need the U prefix (as in Unigram). So U00:%x[0,0] is fine. You can't just call your features anything you want.
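For a longer template file, a quick check along these lines might save some time. This is only a sketch based on the template types described in the CRF++ documentation, where the first character of a line is U for unigram or B for bigram and lines starting with # are comments. The function name checkTemplateLines is made up for illustration.

```javascript
// Report template lines that do not start with 'U' (unigram) or 'B' (bigram).
// Blank lines and lines starting with '#' are treated as ignorable.
// `checkTemplateLines` is a hypothetical helper, not part of CRF++.
function checkTemplateLines(templateText) {
  const bad = [];
  templateText.split(/\r?\n/).forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) return; // blank line or comment
    if (!/^[UB]/.test(trimmed)) {
      bad.push({ lineNumber: index + 1, line: trimmed });
    }
  });
  return bad;
}
```

Run against the template in the question, this would flag every M-prefixed line.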
I also discovered that if I stripped my test data down to a single sentence, I would get the same error message. When I restored the test data to its original size of around 2600 sentences, the regularization algorithms ran. Overfitting is a common cause of this error message across various NLP and ML applications, but that was not the true problem in my case.
It can also happen in the extreme case of a dataset with just one class (due to a bug in the training set generation procedure).
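Both of these conditions (too few sentences, or only one class) are easy to check before training. Here is a rough sketch, assuming the usual CRF++ data layout: whitespace-separated columns, the label in the last column, and a blank line between sentences. The function name summarizeTrainingData is hypothetical.

```javascript
// Count sentences and label frequencies in CRF++-style training text.
// Assumes whitespace-separated columns, the label in the last column,
// and blank lines separating sentences.
// `summarizeTrainingData` is a hypothetical helper, not part of CRF++.
function summarizeTrainingData(text) {
  const labelCounts = new Map();
  let sentences = 0;
  let inSentence = false;
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === '') {
      inSentence = false; // a blank line ends the current sentence
      continue;
    }
    if (!inSentence) {
      sentences += 1;
      inSentence = true;
    }
    const columns = trimmed.split(/\s+/);
    const label = columns[columns.length - 1];
    labelCounts.set(label, (labelCounts.get(label) || 0) + 1);
  }
  return { sentences, labelCounts };
}
```

If labelCounts.size comes back as 1, or sentences is very small, that is worth fixing before blaming the regularizer.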