I see that there is an option inside of SUTime to resolve ambiguous time references to the future, but I am not sure how to tell NER annotator to do so. For example, when annotating this sentence "let's go out on Friday" (and let's say that today's Sunday), I want SUTime to return next Friday's date, not the previous one, which appears by default, since it's closer to Sunday. Thanks.
You have to provide your own grammar file. You can copy the default one from CoreNLP. It should be located somewhere like stanford-sutime-models-1.3.5.jar:edu/stanford/nlp/models/sutime/english.sutime.txt
Then add the following code to the end of the section that starts with the comment # Final rules to determine how to resolve date:
{
pattern: ( [ $hasTemporal ] ),
action: VTag( $0[0].temporal.value, "resolveTo", RESOLVE_TO_FUTURE)
}
This will tag all temporals to be resolved into the future. Note that there are several predefined tags that resolve some time patterns into the past. You can delete or modify them too.
Then provide a resource path to your file to a TimeAnnotator constructor:
Properties props = new Properties();
props.setProperty("sutime.rules", "edu/stanford/nlp/models/sutime/defs.sutime.txt,PATH_TO_YOUR_RESOURCE_FOLDER/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt");
TimeAnnotator timeAnnotator = new TimeAnnotator("sutime", props);
There is also a small trick with a DocDateAnnotation. If you want time patterns like "on Friday at 7pm" to be resolved correctly, you should provide an iso formatted datetime (not only a date like YYYY-MM-DD) into a DocDateAnnotation.
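To make the date part concrete, here is a minimal plain-Java sketch (no CoreNLP involved) of the scenario from the question. It builds a full ISO datetime string of the kind you would store in the DocDateAnnotation, and shows the two candidate readings of "Friday" relative to a Sunday. The class and method names are made up for illustration, and the exact ISO variant SUTime accepts is an assumption you should check against your version.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper, not part of CoreNLP or SUTime.
final class DocDateSketch {

    // Formats the reference moment as a full ISO datetime (date and time of day).
    static String isoDocDate(LocalDateTime reference) {
        return reference.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME);
    }

    // The "resolve to the past" reading of a bare weekday name.
    static LocalDate previous(LocalDate reference, DayOfWeek day) {
        return reference.with(TemporalAdjusters.previous(day));
    }

    // The "resolve to the future" reading of a bare weekday name.
    static LocalDate next(LocalDate reference, DayOfWeek day) {
        return reference.with(TemporalAdjusters.next(day));
    }

    public static void main(String[] args) {
        LocalDateTime sunday = LocalDateTime.of(2017, 4, 16, 12, 0); // a Sunday
        System.out.println(isoDocDate(sunday));                               // 2017-04-16T12:00:00
        System.out.println(previous(sunday.toLocalDate(), DayOfWeek.FRIDAY)); // 2017-04-14
        System.out.println(next(sunday.toLocalDate(), DayOfWeek.FRIDAY));     // 2017-04-21
    }
}
```

With the RESOLVE_TO_FUTURE rule in place, the second Friday (2017-04-21) is the one you would expect SUTime to return.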
I want to add addresses (and possibly other rule-based entities) to an NER pipeline, and TokensRegex seems like a terribly useful DSL for doing so. Following https://stackoverflow.com/a/42604225, I created this rules file:
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
{ pattern: ([{ner:"NUMBER"}] [{pos:"NN"}|{pos:"NNP"}] /ave(nue)?|st(reet)?|boulevard|blvd|r(oa)?d/), action: Annotate($0, ner, "address") }
Here's a scala repl session showing how I'm trying to set up an annotation pipeline.
# import edu.stanford.nlp.pipeline.{StanfordCoreNLP, CoreDocument}
# import edu.stanford.nlp.util.PropertiesUtils.asProperties
# val pipe = new StanfordCoreNLP(asProperties(
"customAnnotatorClass.tokensregex", "edu.stanford.nlp.pipeline.TokensRegexAnnotator",
"annotators", "tokenize,ssplit,pos,lemma,ner,tokensregex",
"ner.combinationMode", "HIGH_RECALL",
"tokensregex.rules", "addresses.tregx"))
pipe: StanfordCoreNLP = edu.stanford.nlp.pipeline.StanfordCoreNLP#2ce6a051
# val doc = new CoreDocument("Adam Smith lived at 123 noun street in Glasgow, Scotland")
doc: CoreDocument = Adam Smith lived at 123 noun street in Glasgow, Scotland
# pipe.annotate(doc)
# doc.sentences.get(0).nerTags
res5: java.util.List[String] = [PERSON, PERSON, O, O, address, address, address, O, CITY, O, COUNTRY]
# doc.entityMentions
res6: java.util.List[edu.stanford.nlp.pipeline.CoreEntityMention] = [Adam Smith, 123, Glasgow, Scotland]
As you can see, the address gets correctly tagged in the nerTags for the sentence, but it doesn't show up in the documents entityMentions. Is there a way to do this?
Also, is there a way from the document to discern two adjacent matches of the tokenregex from a single match (assuming I have more complicated set of regexes; in the current example I only match exactly 3 tokens, so I could just count tokens)?
I tried approaching it using the regexner with a tokens regex described here https://stanfordnlp.github.io/CoreNLP/regexner.html, but I couldn't seem to get that working.
Since I'm working in scala I'll be happy to dive into the Java API to get this to work, rather than fiddle with properties and resource files, if that's necessary.
Yes, I've recently added some changes (in the GitHub version) to make this easier! Make sure to download the latest version from GitHub. Though we are aiming to release Stanford CoreNLP 3.9.2 fairly soon and it will have these changes.
If you read this page you can get an understanding of the full NER pipeline run by the NERCombinerAnnotator.
https://stanfordnlp.github.io/CoreNLP/ner.html
Furthermore there is a lot of write up on the TokensRegex here:
https://stanfordnlp.github.io/CoreNLP/tokensregex.html
Basically, what you want to do is run the ner annotator and use its TokensRegex sub-annotator. Imagine you have some named entity rules in a file called my_ner.rules.
You could run a command like this:
java -Xmx5g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ner.additional.tokensregex.rules my_ner.rules -outputFormat text -file example.txt
This will run a TokensRegex sub-annotator during the full named entity recognition process. Then, when the final entity mentions step is run, it will operate on the named entities extracted by the rules and create entity mentions from them.
I'm trying to build an email template with the Freemarker/Clickdimensions plugin in CRM 2013. I have a "Date only" field on an entity which, for example, contains the date 2017-04-17. I want this date to be shown as follows: Monday 17 April.
This is done with Freemarker and I have tried the following:
<#assign x = Recipient.field_booking.field_scheduleddate?time>
${x?string.full}
This doesn't seem to work. I'm not getting any result at all, just an empty line.
Does anyone know what could be wrong?
I will assume that field_scheduleddate is a string (not a java.util.Date).
At ?time, FreeMarker should throw an exception saying something like the string doesn't follow the expected pattern. I suspect the framework you are using catches and suppresses that exception (which makes using FreeMarker much, much harder). Check the logs; maybe it's there.
You want to deal with a date-only value there, hence you should use ?date, as ?time is for time-only values. Also, field_scheduleddate apparently uses ISO 8601 format, so unless the date_format configuration setting is set to ISO, you will have to use ?date.iso (supported since FreeMarker 2.3.21).
As for printing the date, ?string.full should work, but usually you should set date_format globally to the format you prefer, and then you can simply write ${x}.
(Also note that #assign is unnecessary above, as you can put arbitrarily complex expression inside ${}.)
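This is not FreeMarker, but the same two steps (interpret the string as an ISO date-only value, then format it) can be sketched in plain Java to show why treating the value as a time fails. The class and method names are made up for illustration.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical illustration; the FreeMarker built-ins themselves are not used here.
final class DateOnlySketch {

    // Parses an ISO 8601 date-only string and prints weekday, day of month and month name.
    static String weekdayDayMonth(String isoDate) {
        LocalDate date = LocalDate.parse(isoDate);
        return date.format(DateTimeFormatter.ofPattern("EEEE d MMMM", Locale.ENGLISH));
    }

    // Returns false when the text cannot be read as a time-only value.
    static boolean parsesAsTime(String text) {
        try {
            LocalTime.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(weekdayDayMonth("2017-04-17")); // Monday 17 April
        System.out.println(parsesAsTime("2017-04-17"));    // false
    }
}
```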
I am trying to parse a date string using the following pattern: yyMMdd and the STRICT resolver as follows:
DateTimeFormatter formatter = DateTimeFormatter.ofPattern(dateFormat).withResolverStyle(ResolverStyle.STRICT);
LocalDate.parse(expiryDate, formatter);
I get the following DateTimeParseException:
java.time.format.DateTimeParseException: Text '160501' could not be
parsed: Unable to obtain LocalDate from TemporalAccessor:
{YearOfEra=2016, MonthOfYear=5, DayOfMonth=1},ISO of type
java.time.format.Parsed
When I switch to the default resolver style, i.e. ResolverStyle.SMART, it allows such dates as the 30th of February.
Can someone please help?
The strict resolver requires an era to go with YearOfEra. Change your pattern to use "u" instead of "y" and it will work, ie. "uuMMdd".
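A small sketch of the difference, using the values from the question. The helper names are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

final class StrictYearSketch {

    // Parses the text with the given pattern under the STRICT resolver.
    static LocalDate parseStrict(String pattern, String text) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern)
                .withResolverStyle(ResolverStyle.STRICT);
        return LocalDate.parse(text, formatter);
    }

    // Returns true when strict parsing throws a DateTimeParseException.
    static boolean fails(String pattern, String text) {
        try {
            parseStrict(pattern, text);
            return false;
        } catch (DateTimeParseException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseStrict("uuMMdd", "160501")); // 2016-05-01
        System.out.println(fails("yyMMdd", "160501"));       // true: year-of-era without an era
        System.out.println(fails("uuMMdd", "160230"));       // true: 30 February is rejected
    }
}
```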
While JodaStephen has nicely explained the reason for the exception and given one good solution (use uu rather than yy), I am offering a couple of other possible solutions:
The obvious one that you probably don’t want: leave the resolver style at SMART (the default). In other words either leave out .withResolverStyle(ResolverStyle.STRICT) completely or change it to .withResolverStyle(ResolverStyle.SMART).
Provide a default era.
For the second option here is a code example:
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
.appendPattern("yyMMdd")
.parseDefaulting(ChronoField.ERA, 1)
.toFormatter()
.withResolverStyle(ResolverStyle.STRICT);
String expiryDate = "160501";
LocalDate result = LocalDate.parse(expiryDate, formatter);
System.out.println(result);
Output is:
2016-05-01
Where the last solution may make a difference compared to using uu in the format pattern:
It allows us to use a format pattern that is given to us where we cannot control whether pattern letter u or y is used.
With pattern letter y it will fail with an exception if the string contains a negative year. Depending on your situation and requirements this may be desirable or unacceptable.
Edit: The second argument to parseDefaulting() may also be written IsoEra.CE.getValue() rather than just 1 to make it clear that we are specifying the current era (CE; also often called Anno Domini or AD).
As Xaerxess found in this topic: Month name in genitive (Polish locale) with Joda-Time DateTimeFormatter
in JDK8 DateFormatSymbols.getInstance(new Locale("pl", "PL")).getMonths() returns month names in the genitive by default. Previous Java versions return month names in the nominative case.
With this, for example, SimpleDateFormat format with "dd-MMMM-yyyy" pattern gives different result in JDK8 than in JDK6 or 7.
It's a big change, and some of my old applications don't work properly with the new month names. I'm looking for a way to globally change the default month names for Locale PL.
I tried with DateFormatSymbols.getInstance().setMonths(new String[] {..}), but it doesn't work globally.
If I find a way to change the default month names with Java code, I could add this code at application initialization, without correcting the whole app. In my case I'd simply add a servlet to my web app with the load-on-startup option.
Or maybe you have a different idea how to make Java 8 compatible in this case? Maybe there is parameter / option which I could pass to jvm on start?
I had the same issue. Of all the billion standards my client decided to use all capital letters for the month breaking the default parsers. So my date looked like:
02-DEC-15
Now java.time can parse "Dec" but not "DEC". Annoying. I searched for a while and found the Java 8 replacement for this problem.
This is how you can add custom anything to your formatter:
public static void main(String[] args) {
String test = "02-DEC-15";
Map<Long, String> lookup = new HashMap<>();
lookup.put(1L, "JAN");
lookup.put(2L, "FEB");
lookup.put(3L, "MAR");
lookup.put(4L, "APR");
lookup.put(5L, "MAY");
lookup.put(6L, "JUN");
lookup.put(7L, "JUL");
lookup.put(8L, "AUG");
lookup.put(9L, "SEP");
lookup.put(10L, "OCT");
lookup.put(11L, "NOV");
lookup.put(12L, "DEC");
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
.appendPattern("dd-")
.appendText(ChronoField.MONTH_OF_YEAR, lookup)
.appendPattern("-yy")
.toFormatter();
LocalDate parse = LocalDate.parse(test,formatter);
System.out.println(parse);
}
This is a bit tricky because it did not behave the way I expected it. Here's a few points to consider:
Each append call adds a new parser instance, and the parsers are called strictly IN ORDER, failing as soon as one of them does not match. So, for example, if you took the above example and thought you could just add the months afterwards, like this:
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
.appendPattern("dd-MMM-yy")
.appendText(ChronoField.MONTH_OF_YEAR, lookup)
.toFormatter();
This will fail, because the pattern adds multiple parsers for your formatter:
dd
-
MMM
-
yy
Adding the lookup at the end won't work, because MMM will already fail your parsing context.
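The two orderings can be compared side by side with a small sketch. The class and method names are made up for illustration, and Locale.ENGLISH is fixed here (an addition to the original code) so the result does not depend on the default locale.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class AppendOrderSketch {

    // Month number -> upper-case abbreviation, as in the lookup map above.
    static Map<Long, String> upperMonths() {
        String[] names = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
                          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"};
        Map<Long, String> lookup = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            lookup.put((long) (i + 1), names[i]);
        }
        return lookup;
    }

    // The lookup sits exactly where the month text appears in the input.
    static DateTimeFormatter lookupInPlace() {
        return new DateTimeFormatterBuilder()
                .appendPattern("dd-")
                .appendText(ChronoField.MONTH_OF_YEAR, upperMonths())
                .appendPattern("-yy")
                .toFormatter(Locale.ENGLISH);
    }

    // The lookup is appended after a pattern whose MMM step has already consumed the month.
    static DateTimeFormatter lookupAppendedLast() {
        return new DateTimeFormatterBuilder()
                .appendPattern("dd-MMM-yy")
                .appendText(ChronoField.MONTH_OF_YEAR, upperMonths())
                .toFormatter(Locale.ENGLISH);
    }

    // Returns true when the text parses to a LocalDate with the given formatter.
    static boolean parses(DateTimeFormatter formatter, String text) {
        try {
            LocalDate.parse(text, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses(lookupInPlace(), "02-DEC-15"));      // true
        System.out.println(parses(lookupAppendedLast(), "02-DEC-15")); // false
    }
}
```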
Each step can therefore be added separately. Look at:
MonthDay#PARSER
This is what finally helped me to find the correct solution. In any case, using the Builder you can construct whatever freaky non-standard parsers you want. Please however do remember to yell at everyone who thinks they can come up with yet another way of representing a date that needs to be unleashed into this world.
I hope this saves someone some heartache.
Artur
Edit: So once again I half ignored the question. I don't think there is a default way, unless you register a provider for yourself. However that seems a bit wrong to me. I think the way the parsing is meant to work is to have a static instance of your parser that you use. This is with regards to looking at how for example the default parsers in java time are implemented.
So just create your custom parser at startup and reference it throughout your application for parsing.
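A minimal sketch of such a shared instance could look like the following. Note that it uses parseCaseInsensitive() from DateTimeFormatterBuilder rather than the lookup map, which is a different technique from the one shown above, and the class name AppDates is made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

// Hypothetical holder class; reference AppDates.parse(...) wherever dates are parsed.
final class AppDates {

    // Built once when the class is loaded. DateTimeFormatter is immutable and thread-safe,
    // so a single instance can be shared across the application.
    static final DateTimeFormatter DAY_MONTH_YEAR = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()       // must come before the pattern it should affect
            .appendPattern("dd-MMM-yy")
            .toFormatter(Locale.ENGLISH);

    private AppDates() {
    }

    static LocalDate parse(String text) {
        return LocalDate.parse(text, DAY_MONTH_YEAR);
    }

    public static void main(String[] args) {
        System.out.println(parse("02-DEC-15")); // 2015-12-02
        System.out.println(parse("02-Dec-15")); // 2015-12-02
    }
}
```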
How does the Grails tag fieldValue perform its formatting?
I have a domain class with a Double attribute.
class Thing {
Double numericValue
}
In GSP, the fieldValue is used (as created by grails generate-all) for rendering:
${fieldValue(bean:thing, field:"numericValue")}
Unfortunately digits after 3 decimal places are not displayed (ie, 0.123456 is displayed as 0.123). How do I control fieldValue's formatting?
Note that I could just use ${thing.numericValue} (which does no formatting) or
<g:formatNumber>, but I'd rather use the fieldValue tag and specify the formatting. I just don't know where to specify fieldValue's formatting.
Use
<g:formatNumber number="${thing.numericValue}" format="\\$###,##0.00" />
instead or use
${g.formatNumber(number:thing.numericValue, format:'\\$###,##0.00')}
Hope this helps.
An alternative to the answers above is to use the i18n files. This option is useful because the format can be changed for all numbers at once, and it can vary by locale.
If you go to the messages.properties file, you can add the following:
default.number.format = ###,##0.00
This will change the default format for all numbers.
If you plan on using the g:formatNumber tag, I would suggest using it as
<g:formatNumber number="${myNumber}" formatName="myCustom.number.format" />
and adding the code entry to the messages.properties files as:
myCustom.number.format = ###,##0.00
By doing this you only need to use the code wherever you need a similar number format and, if needed, make changes in a single place.
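As far as I know, these format strings are java.text.DecimalFormat patterns, so you can try a pattern in plain Java before putting it into messages.properties. A minimal sketch, with a made-up class name and Locale.US fixed so the separators are predictable:

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper for trying out number patterns outside of Grails.
final class NumberPatternSketch {

    // Formats the value with the given DecimalFormat pattern using US separators.
    static String format(String pattern, double value) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.US);
        return new DecimalFormat(pattern, symbols).format(value);
    }

    public static void main(String[] args) {
        System.out.println(format("###,##0.00", 1234.5));   // 1,234.50
        System.out.println(format("###,##0.00", 0.123456)); // 0.12
        System.out.println(format("0.000000", 0.123456));   // 0.123456
    }
}
```

The last pattern keeps six decimal places, which is what the original question about 0.123456 was after.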
It would be in your best interests to read this article from the grails docs.
OFFTOPIC: As a side note you can also change the default date format in the messages.properties file as follows
default.date.format=dd 'de' MMMM 'de' yyyy