Stanford Pattern-based Information Extraction - stanford-nlp

How can I use SPIED (the Stanford bootstrapping tool)? Is there a demo file like the ones provided for the parsers, NER, etc.? The description in the file is not easy to use. I'm using other tools in my project thanks to their demo files.

Please see the main method of the class GetPatternsFromDataMultiClass. The static run method inside the class is almost like a demo. All you need is a properties file; an example demo properties file is provided with the release. You can also access the example properties here.
You would need to run the class with parameters: " -props [path-to-properties] "
SPIED code is different from NER etc. because there is no model released. The code is generic, like CRFs or logistic regression, which you use to train your own model.
An example code to run SPIED is (you can similarly use :
GetPatternsFromDataMultiClass<SurfacePattern> model = GetPatternsFromDataMultiClass.<SurfacePattern>run(props);
// Print the learned patterns and words for each label.
for (Map.Entry<String, Counter<SurfacePattern>> p : model.getLearnedPatterns().entrySet()) {
    System.out.println("For label " + p.getKey() + ", the patterns learned are: ");
    for (Map.Entry<SurfacePattern, Double> pat : p.getValue().entrySet()) {
        // Print the pattern itself (the entry key), not the whole map entry.
        System.out.println("Pattern " + pat.getKey() + " with score " + pat.getValue());
    }
    System.out.println("For label " + p.getKey() + ", the learned words are: " + model.constVars.getLearnedWords(p.getKey()));
}
For more details on how to use the model for another piece of text, look at explanations of flags loadSavedPatternsWordsDir in the example.properties file.

How to generate smarter/complex snippets in Visual Studio/Visual Studio Code?

Problem: I'm looking for a way to create complex snippets. At our company we have larger functions which are almost boilerplate, and I feel that writing them could be made much easier.
Desired solution: I want to create something, similar to how snippets work, but suitable for more complex generation of code. For instance, see the following code, which is typical for what we generate:
private readonly DependencyOne dependencyOne;
private readonly DependencyTwo dependencyTwo;

public ClassName(DependencyOne dependencyOne, DependencyTwo dependencyTwo)
{
    this.dependencyOne = dependencyOne;
    this.dependencyTwo = dependencyTwo;
}
Basically, I only want to type the two class names and have the constructor and the two associated fields generated from them. If possible, I want these fields added at the correct position in the code, much like IntelliSense's Quick Fix automatically finds the correct position in your code to place fields.
The reason I can't just generate everything above the constructor is that some of the generated methods aren't constructors and therefore don't sit at the top of the code.
How do I achieve this desired solution?
Solution with Visual Studio Code 1.24:
In Visual Studio Code you can define your own snippets by creating a snippet JSON file. Please refer to this doc to learn how to create a new snippet in VS Code.
Write the following in language.json, where language is whatever language you want to create the snippet for:
"Constructor - A unique name": {
    "prefix": "constructor",
    "body": [
        "private readonly ${DependencyOne} ${dependencyOne};",
        "private readonly ${DependencyTwo} ${dependencyTwo};",
        "",
        "public ClassName(${DependencyOne} ${dependencyOne}, ${DependencyTwo} ${dependencyTwo})",
        "{",
        "    this.${dependencyOne} = ${dependencyOne};",
        "    this.${dependencyTwo} = ${dependencyTwo};",
        "}"
    ],
    "description": "description of what it does"
}
After following the steps in the doc and writing the JSON, you can use the snippet by typing constructor, the value given as the snippet's "prefix".
And with release v1.25 the following works:
"Constructor and variables": {
    "prefix": "ctor",
    "body": [
        "private readonly ${1/(.*)/${1:/capitalize}/} ${1:var1};",
        "private readonly ${2/(.*)/${1:/capitalize}/} ${2:var2};",
        "",
        "public ClassName(${1/(.*)/${1:/capitalize}/} $1, ${2/(.*)/${1:/capitalize}/} $2)",
        "{",
        "    this.$1 = $1;",
        "    this.$2 = $2;",
        "}"
    ],
    "description": "your description"
},
For this you only type two names. I have made it so that you type the uncapitalized version and the snippet automatically capitalizes the class names. It would be easy to reverse those, but it would be a lot more code. After you enter the second class name/variable, hit Tab and your code will be capitalized correctly. You can replace "var1"/"var2" with whatever you want.
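To make the expected expansion easier to check, here is a rough sketch in plain JavaScript of the text the v1.25 snippet should produce for two typed names. The helpers `capitalize` and `expandConstructor` are hypothetical and exist only for this illustration; they are not part of VS Code.

```javascript
// Hypothetical helper: mimics the ${1/(.*)/${1:/capitalize}/} transform,
// which upper-cases the first character of the typed name.
function capitalize(name) {
  return name.charAt(0).toUpperCase() + name.slice(1);
}

// Hypothetical generator: builds the same text the snippet would insert
// for a class name and a list of uncapitalized dependency names.
function expandConstructor(className, deps) {
  const fields = deps.map((d) => `private readonly ${capitalize(d)} ${d};`);
  const params = deps.map((d) => `${capitalize(d)} ${d}`).join(", ");
  const assigns = deps.map((d) => `    this.${d} = ${d};`);
  return [...fields, "", `public ${className}(${params})`, "{", ...assigns, "}"].join("\n");
}

console.log(expandConstructor("ClassName", ["dependencyOne", "dependencyTwo"]));
```

Unlike the snippet, this sketch accepts any number of dependencies, which is one reason a small generator script can be preferable once the boilerplate grows.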

Removing Class Name of an element using element.className.replace method

var divfoo=document.getElementById("foo");
divfoo.className=" css-class css-class2 ";
divfoo.className=divfoo.className.replace(" css-class2 ", "");
The code above works, but I would like to change its last line, which uses the replace method. I would like to know why it doesn't work when written like this:
var divfoo=document.getElementById("foo");
divfoo.className=" css-class css-class2 ";
divfoo.className.replace(" css-class2 ", "");
Why must one assign to "divfoo.className" when applying the replace method to that same "divfoo.className"? Why can't we just call the method directly, as the code above does?
Should I hate JavaScript for not being logical because of this?
Element.className is a plain string representation of the element's class HTML attribute.
The String.replace method does not change the string it is called on; it just returns the result of the replacement.
If you want a more "logical / functional" approach, look at the Element.classList interface, namely its remove method.
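A minimal sketch of that behaviour, using a plain string so it runs outside a browser (the variable names are only for illustration):

```javascript
// Strings are immutable: replace() returns a new string and leaves
// the original untouched.
const original = " css-class css-class2 ";

original.replace(" css-class2 ", "");    // return value is discarded
console.log(JSON.stringify(original));   // " css-class css-class2 "

const updated = original.replace(" css-class2 ", "");
console.log(JSON.stringify(updated));    // " css-class"

// In a browser, the classList route avoids string handling entirely:
//   divfoo.classList.remove("css-class2");
```

This is why the assignment back to divfoo.className is needed: without it the new string is computed and then thrown away.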

Generate HTML documentation for a FreeMarker FTL library

I have a FreeMarker library that I want to ship with my product, and I'm looking for a way to generate HTML documentation for it based on the comments in the FTL file (in a Javadoc fashion).
For example, a typical function in my library is written like:
<#--
MyMacro: Does stuff with param1 and param2.
- param1: The first param, mandatory.
- param2: The second param, 42 if not specified.
-->
<#macro MyMacro param1 param2=42>
...
</#macro>
I didn't find anything on that subject, probably because there is no standard way of writing comments in FreeMarker (such as @param or @return in Javadoc).
I don't mind rolling my own solution for that, but I'm keen on using an existing system like Doxia (since I'm using Maven to build the project) or Doxygen maybe, instead of writing something from scratch.
Ideally I'd like to write the comment parsing code only, and rely on something else to detect the macros and generate the doc structure.
I'm open to changing the format of my comments if that helps.
In case you decide to write your own doc generator, or an FTL-specific front-end for an existing document generator, you can reuse some of FreeMarker's parsing infrastructure:
You can use Template.getRootTreeNode() to retrieve the template's top-level AST node. Because macros and the corresponding comments should be direct children of this top-level node (IIRC), iterating over its children and casting them to the right AST node subclass should give you almost everything you need with respect to FTL syntax. To illustrate the approach I hacked together a little "demo" (cfg is a normal FreeMarker Configuration object):
Template t = cfg.getTemplate("foo.ftl");
TemplateElement te = t.getRootTreeNode();
Enumeration e = te.children();
while (e.hasMoreElements()) {
    Object child = e.nextElement();
    if (child instanceof Comment) {
        Comment comment = (Comment) child;
        System.out.println("COMMENT: " + comment.getText());
    } else if (child instanceof Macro) {
        Macro macro = (Macro) child;
        System.out.println("MACRO: " + macro.getName());
        for (String argumentName : macro.getArgumentNames()) {
            System.out.println("- PARAM: " + argumentName);
        }
    }
}
produces for your given example macro:
COMMENT:
MyMacro: Does stuff with param1 and param2.
- param1: The first param, mandatory.
- param2: The second param, 42 if not specified.
MACRO: MyMacro
- PARAM: param1
- PARAM: param2
How you parse the comment is then up to you ;-)
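As one possible starting point for that parsing step, here is a sketch of a parser for the comment layout used in the question ("Name: description" followed by "- param: description" lines). It is written in JavaScript only for brevity; the function name `parseMacroComment` and the returned object shape are assumptions of this sketch, and the two regular expressions should carry over to Java largely unchanged.

```javascript
// Hypothetical parser for the comment layout shown in the question.
function parseMacroComment(text) {
  const doc = { name: null, description: null, params: [] };
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;
    // "- param1: The first param" becomes a parameter entry.
    const param = line.match(/^-\s*(\w+)\s*:\s*(.*)$/);
    if (param) {
      doc.params.push({ name: param[1], description: param[2] });
      continue;
    }
    // The first "Name: text" line becomes the macro name and description.
    const head = line.match(/^(\w+)\s*:\s*(.*)$/);
    if (head && doc.name === null) {
      doc.name = head[1];
      doc.description = head[2];
    }
  }
  return doc;
}

const sample = [
  "MyMacro: Does stuff with param1 and param2.",
  "- param1: The first param, mandatory.",
  "- param2: The second param, 42 if not specified.",
].join("\n");
console.log(JSON.stringify(parseMacroComment(sample), null, 2));
```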
Update: Found something called ftldoc in my backups and uploaded it to GitHub. Maybe this is what you are looking for...

How to make client side I18n with mustache.js

I have some static HTML files and want to change the static text inside them on the client side with mustache.js.
It seems that this was possible with Twitter's mustache extension on GitHub: https://github.com/bcherry/mustache.js
But lately the specific I18n extension has been removed or changed.
I imagine a solution where http://server/static.html?lang=en loads mustache.js and a language JSON file chosen by the lang param, e.g. data_en.json.
Then mustache replaces the {{tags}} with the data sent.
Can someone give me an example of how to do this?
You can use lambdas along with some library like i18next or something else.
{{#i18n}}greeting{{/i18n}} {{name}}
And the data passed:
{
    name: 'Mike',
    i18n: function() {
        return function(text, render) {
            return render(i18n.t(text));
        };
    }
}
This solved the problem for me
I don't think Silent's answer really solves/explains the problem.
The real issue is you need to run Mustache twice (or use something else and then Mustache).
That is, most i18n works as a two-step process like the following:
Render the i18n text with the given variables.
Render the HTML with the post rendered i18n text.
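The two steps can be sketched as follows. To keep the example self-contained it uses a tiny stand-in function, `fill`, instead of Mustache itself; `fill`, the `messages` catalog, and its wording are assumptions of this sketch, and unlike Mustache it does no HTML escaping.

```javascript
// Minimal stand-in for a template engine (NOT Mustache itself): replaces
// each {{key}} with the matching value from `data`, or "" when missing.
function fill(template, data) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_, key) =>
    key in data ? String(data[key]) : "");
}

// Hypothetical message catalog for one language.
const messages = { title: "You have {{amount}} item(s) left" };

// Step 1: render the i18n message with the given variables.
const title = fill(messages.title, { amount: 10 });

// Step 2: render the HTML with the already-rendered i18n text.
const html = fill("<p>{{title}}</p>", { title });

console.log(html); // <p>You have 10 item(s) left</p>
```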
Option 1: Use Mustache partials
<p>{{> i18n.title}}</p>
{{#somelist}}{{> i18n.item}}{{/somelist}}
The data given to this mustache template might be:
{
    "amount" : 10,
    "somelist" : [ { "description" : "poop" } ]
}
Then you would store all your i18n templates/messages as a massive JSON object of mustache templates on the server:
Below is the "en" translations:
{
"title" : "You have {{amount}} fart(s) left",
"item" : "Smells like {{description}}"
}
Now there is a rather big problem with this approach: Mustache has no logic, so handling things like pluralization gets messy.
The other issue is that performance might suffer from doing so many partial loads (maybe not).
Option 2: Let the server's i18n do the work.
Another option is to let the server do the first pass of expansion (step 1).
Java has lots of options for i18n expansion; I assume other languages do as well.
What's rather annoying about this solution is that you have to load your model twice: once with the regular model and a second time with the expanded i18n templates. You will have to know exactly which i18n templates to expand and put in the model (otherwise you would have to expand all of them). In other words, you're going to get some nice violations of DRY.
One way around the previous problem is pre-processing the mustache templates.
My answer is based on developingo's. His answer is great; I'll just add the possibility to use mustache tags in the message key. This is really needed if you want to be able to get messages according to the current mustache state or inside loops.
It's based on a simple double rendering:
info.i18n = function() {
    return function(text, render) {
        var code = render(text); // Render first to get all variable name codes set
        var value = i18n.t(code);
        return render(value); // then render the messages
    };
};
Performance doesn't take a hit, because mustache operates on a very small string.
Here is a little example:
JSON data:
array :
[
    { name : "banana" },
    { name : "cucumber" }
]
Mustache template:
{{#array}}
    {{#i18n}}description_{{name}}{{/i18n}}
{{/array}}
Messages:
description_banana = "{{name}} is yellow"
description_cucumber = "{{name}} is green"
The result is:
banana is yellow
cucumber is green
Plurals
[Edit]: As asked in the comments, here is a pseudo-code example of plural handling for English and French. It's a very simple, untested example, but it gives you a hint.
description_banana = "{{#plurable}}a {{name}} is{{/plurable}} green" (English adjectives don't take an "s" in the plural)
description_banana = "{{#plurable}}Une {{name}} est verte{{/plurable}}" (French adjectives take an "s" in the plural, so the section wraps the adjective as well)
info.plurable = function()
{
    // Check if a plural is needed
    // Split the text into words on spaces
    // Add an "s" at the end of each word, except for words found in a map of common exceptions such as "a" => "" (nothing), "is" => "are", and for French "est" => "sont", "une" => "des"
    // This map/function is specific to each language and should be expanded as needed.
}
This is quite simple and pretty straightforward.
First, you will need to add code to determine the lang query string parameter. For this, I use a snippet taken from an answer here.
function getParameterByName(name) {
    var match = RegExp('[?&]' + name + '=([^&]*)')
        .exec(window.location.search);
    return match && decodeURIComponent(match[1].replace(/\+/g, ' '));
}
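As an alternative sketch, runtimes that provide the standard URLSearchParams class can do the same lookup without a hand-written regular expression. The helper name `pickLanguage` and its fallback behaviour are assumptions of this example, not part of the answer above.

```javascript
// Hypothetical helper: pick a supported language from a query string,
// falling back to a default when the parameter is missing or unknown.
function pickLanguage(search, supported, fallback) {
  const lang = new URLSearchParams(search).get("lang");
  return supported.includes(lang) ? lang : fallback;
}

// In a browser you would pass window.location.search as the first argument.
console.log(pickLanguage("?lang=id", ["en", "id"], "en")); // id
console.log(pickLanguage("?lang=xx", ["en", "id"], "en")); // en
console.log(pickLanguage("", ["en", "id"], "en"));         // en
```

Falling back to a default language means the page still renders when the parameter is absent, instead of skipping the AJAX call.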
And then, I use jQuery to handle ajax and onReady state processing:
$(document).ready(function() {
    var possibleLang = ['en', 'id'];
    var currentLang = getParameterByName("lang");
    console.log("parameter lang: " + currentLang);
    console.log("possible lang: " + (jQuery.inArray(currentLang, possibleLang)));
    if (jQuery.inArray(currentLang, possibleLang) > -1) {
        console.log("fetching AJAX");
        var request = jQuery.ajax({
            processData: false,
            cache: false,
            url: "data_" + currentLang + ".json"
        });
        console.log("done AJAX");
        request.done(function(data) {
            console.log("got data: " + data);
            var output = Mustache.render("<h1>{{title}}</h1><div id='content'>{{content}}</div>", data);
            console.log("output: " + output);
            $("#output").append(output);
        });
        request.fail(function(xhr, textStatus) {
            console.log("error: " + textStatus);
        });
    }
});
For this answer, I try to use simple JSON data:
{"title": "this is title", "content": "this is english content"}
Get this GIST for complete HTML answer.
Remember that other languages are significantly different from EN.
In FR and ES, adjectives usually come after the noun. "green beans" becomes "haricots verts" (beans green) in FR, so if you're plugging in variables, your translated templates must have the variables in reverse order. So, for instance, plain printf-style formatting won't work, because the arguments can't change order. This is why you use named variables as in Option 1 above, and translate templates as whole sentences and paragraphs rather than concatenating phrases.
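A small sketch of why named variables help: each translated sentence decides where its variables go, so the caller never depends on argument order. The `format` helper and both message tables are made up for this illustration.

```javascript
// Hypothetical helper: replaces each {{key}} with the value of that name.
function format(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, key) => String(values[key]));
}

// Whole-sentence templates; each language orders the variables as it needs.
const templates = {
  en: "There are {{count}} {{color}} {{noun}} left",
  fr: "Il reste {{count}} {{noun}} {{color}}",
};

// The plugged-in words are translated too.
const words = {
  en: { noun: "beans", color: "green" },
  fr: { noun: "haricots", color: "verts" },
};

console.log(format(templates.en, { count: 3, ...words.en })); // There are 3 green beans left
console.log(format(templates.fr, { count: 3, ...words.fr })); // Il reste 3 haricots verts
```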
Your data also needs to be translated, so the word 'poop', which came from the data, has to be translated somehow. Different languages do plurals differently, and English has its own irregular forms, as in tooth/teeth, foot/feet, etc. EN also has glasses and pants, which are always plural. Other languages similarly have exceptions and strange idioms. In the UK, IBM 'are' at the trade show, whereas in the US, IBM 'is' at the trade show. Russian uses different plural forms depending on the number (one form after 1, another after 2 to 4, another after 5 and up), and Japanese uses different counter words depending on whether the things counted are people, animals, long narrow objects, etc. In other countries, thousands separators are spaces, dots, or apostrophes, and in some cases digits aren't grouped by 3: by 4 in Japan, and unevenly in India.
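Much of this is already handled by the standard ECMAScript Intl API, so it rarely needs to be hand-rolled. The sketch below assumes a runtime that ships full locale data (otherwise non-English locales may silently fall back to English rules); the `teeth` message table and the `pluralize` helper are hypothetical.

```javascript
// Intl.PluralRules maps a number to a plural category for a locale.
const en = new Intl.PluralRules("en");
const ru = new Intl.PluralRules("ru");
for (const n of [1, 2, 5]) {
  console.log(n, en.select(n), ru.select(n));
}

// Hypothetical message table keyed by English plural category,
// which also covers irregular plurals such as tooth/teeth.
const teeth = { one: "{{n}} tooth", other: "{{n}} teeth" };
function pluralize(n) {
  return teeth[en.select(n)].replace("{{n}}", String(n));
}
console.log(pluralize(1), "/", pluralize(3));

// Intl.NumberFormat applies locale-specific digit grouping.
console.log(new Intl.NumberFormat("en-IN").format(1234567));
console.log(new Intl.NumberFormat("de-DE").format(1234567));
```

Keying messages by plural category rather than by an "add an s" rule is what lets one table per language cover both regular and irregular forms.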
Be content with mediocre language support; it's just too much work.
And don't confuse changing language with changing country - Switzerland, Belgium and Canada also have FR speakers, not to mention Tahiti, Haiti and Chad. Austria speaks DE, Aruba speaks NL, and Macao speaks PT.

How can I search for a text and fill/click on a link with Selenium?

Here's the deal:
Is there a way to search for an input name or type which is not exact, and fill it?
For example, I want to fill any input named email with my email, but I may have inputs named email-123, emailemail, emails, etc. Is there a way to do something like *email*?
And how can I click on a link by checking for some text that could be on the link, above it, near it, in its class, etc.?
PS: I'm using Selenium IDE with Firefox.
You can use XPath to find it with something like //input[contains(@name,'email')]. If you have multiple instances like that on the page, it is worth moving your test to your favourite programming language and then doing:
emailInstances = sel.get_xpath_count("//input[contains(@name,'email')]")
for i in range(int(emailInstances)):
    # XPath positions are 1-based; the parentheses make the index apply across the whole document
    sel.type("xpath=(//input[contains(@name,'email')])[" + str(i + 1) + "]", "email@address.tld")
XPath works well and the solution above is good. If you are trying to test old versions of IE, you could also use JavaScript injection. I find it is very fast, although it can be a bit trickier to debug. I didn't actually check whether the code below works, but hopefully it gives you an idea of what you can do:
String javaScript = "_sl_enterEmailStr = function(parentObj,str) { "+
    "  var allTags = parentObj.getElementsByTagName('input'); "+
    "  for (var i = 0; i < allTags.length; ++i) { "+
    "    var tag = allTags[i]; "+
    "    if (tag.name && tag.type && tag.type === 'text' "+
    "        && tag.name.match(/email/)) { "+
    "      tag.value = str; "+
    "    } "+
    "  } "+
    "}; "+
    "_sl_enterEmailStr(this.browserbot.getCurrentWindow().document "+
    "  ,'myemail@mydomain.org'); ";
mySelenium.getEval(javaScript);
I find JavaScript injection with regular expressions allows me to do great things to dynamic input fields. Note you can use findElement() to be more specific about where you look for tags.
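The matching logic of the injected function can be tried outside a browser with a sketch like the one below, which works on any array-like of input-like objects. The function name `fillMatchingInputs` and the fake inputs are assumptions for illustration; in a page you would pass document.getElementsByTagName("input") instead.

```javascript
// Fills every text input whose name matches `pattern` and returns how
// many inputs were filled. Works on DOM collections or plain objects.
function fillMatchingInputs(inputs, pattern, value) {
  let filled = 0;
  for (const input of Array.from(inputs)) {
    if (input.type === "text" && pattern.test(input.name || "")) {
      input.value = value;
      filled += 1;
    }
  }
  return filled;
}

// Plain objects standing in for <input> elements.
const fakeInputs = [
  { name: "email-123", type: "text", value: "" },
  { name: "emailemail", type: "text", value: "" },
  { name: "password", type: "password", value: "" },
  { name: "username", type: "text", value: "" },
];

console.log(fillMatchingInputs(fakeInputs, /email/i, "me@example.org")); // 2
```

Pass the pattern without the g flag: a global regular expression keeps state between test() calls and could skip matches.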
Regarding clicking a link and getting text, those are simple click() and getText() operations that can be done given the proper locator. I would check out the Selenium API; for example, here is the link to the Java one for 1.0b2.
