Do we need to create process() inside a new annotator? - opennlp

I'm creating an annotator called "NewAnnotator" and trying to make it work in a pipeline with other ClearTK annotators such as
SentenceAnnotator, PosTaggerAnnotator, etc. I want to be able to run a pipeline like this:
aggregate.add(SentenceAnnotator.getDescription());
aggregate.add(PosTaggerAnnotator.getDescription());
aggregate.add(NewAnnotator.getDescription());
// run the classification pipeline on the new texts
SimplePipeline.runPipeline(reader, aggregate.createAggregateDescription());
The code compiles without errors, but running it produces a lot of errors, which I think come from this part of my NewAnnotator code:
public static AnalysisEngineDescription getDescription() throws ResourceInitializationException {
return AnalysisEngineFactory.createPrimitiveDescription(
NewAnnotator.class,
PARAM_POSTAG_MODEL_FILE,
ParamUtil.getParameterValue(PARAM_POSTAG_MODEL_FILE, "/somepath"));
}
public static final String PARAM_POSTAG_MODEL_FILE = ConfigurationParameterFactory.createConfigurationParameterName(
PosTaggerAnnotator.class,
"postagModelFile");
I copied this part almost verbatim from PosTaggerAnnotator, but it serves no purpose in my NewAnnotator. I only added it so that I can use:
aggregate.add(NewAnnotator.getDescription());
because I don't know any other way to add an annotator to the aggregate without getDescription(). I also don't know how to declare a correct getDescription() in my annotator, even though the annotator works fine without it.
Please give me some advice if you have experience with this. Thank you!

getDescription() is a convenience method to create a default description for your annotator. It uses AnalysisEngineFactory.createPrimitiveDescription(), to which you need to provide the right arguments, like this:
public static AnalysisEngineDescription getDescription() throws ResourceInitializationException {
return AnalysisEngineFactory.createPrimitiveDescription(
NewAnnotator.class,
first_parameter_name, first_parameter_value,
second_parameter_name, second_parameter_value,
... );
}
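The name/value pairs are variable arguments, so an annotator that takes no configuration parameters should be able to pass only the class and omit them. There are more examples in the uimaFIT codebase.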

Related

Return an item by id

I have this piece of code from a tutorial I am following. I want to return an element by a URL that looks like clients/1 instead of clients?id=1. How can I achieve this? Also, can the code below be simplified?
@GetMapping
public Client getClient(@RequestParam int id) {
Optional<Client> first = clientList.stream().filter(element -> element.getId() == id).findFirst();
return first.get();
}
You may want to use @PathVariable as follows:
@Controller
@RequestMapping("/clients")
public class MyController {
@GetMapping("/{id}")
public Client getClient(@PathVariable int id) {
return clientList.stream().filter(element -> element.getId() == id).findFirst().orElseThrow();
}
}
Note that the Optional can be unpacked with the orElseThrow method, which throws a NoSuchElementException if no element is found for the id.
Another option is to use orElse(new Client(...)) to return a default value if nothing is found.
Using get() is not really recommended. From the JavaDoc of the get() method:
API Note:
The preferred alternative to this method is orElseThrow().
Even though get() also throws a NoSuchElementException when the Optional is empty, just like orElseThrow, the usual consensus is that get() should not be used without isPresent, or should not be used at all. There are several other methods that unpack the Optional without forcing you to write an if.
The whole idea of Optional is to force you to think about the case when there is no value inside.
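The unpacking options described above can be tried in isolation. The sketch below is plain Java with no Spring involved; the nested Client class, its fields and the sample data are hypothetical stand-ins, since the original clientList is not shown.

```java
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Optional;

final class ClientLookup {
    // Hypothetical stand-in for the Client entity from the question.
    static final class Client {
        private final int id;
        private final String name;

        Client(int id, String name) {
            this.id = id;
            this.name = name;
        }

        int getId() { return id; }
        String getName() { return name; }
    }

    // Sample data; the real list would come from the application.
    static final List<Client> CLIENTS =
            List.of(new Client(1, "Alice"), new Client(2, "Bob"));

    static Optional<Client> find(int id) {
        return CLIENTS.stream().filter(c -> c.getId() == id).findFirst();
    }

    // Throws NoSuchElementException when the id is unknown (Java 10+).
    static Client findOrThrow(int id) {
        return find(id).orElseThrow();
    }

    // Falls back to a placeholder client when the id is unknown.
    static Client findOrDefault(int id) {
        return find(id).orElse(new Client(-1, "unknown"));
    }

    public static void main(String[] args) {
        System.out.println(findOrThrow(1).getName());
        System.out.println(findOrDefault(99).getName());
        try {
            findOrThrow(99);
        } catch (NoSuchElementException e) {
            System.out.println("no client with id 99");
        }
    }
}
```

In a web controller the exception variant is usually the better fit, because a missing id can then be translated into an error response instead of silently returning a placeholder.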

redissonClient.poll() only returning the first 8 characters of String type value

I'm currently using Redisson, creating a redissonClient and trying to poll data from a Redis server. I can see the data in the Redis db if I check via redis-cli, but when I look at the string value in my Java application it is always the first 8 characters of the string and no more. I'm not sure why it won't give me the whole value.
I've tried using the .peek() method as well and I see the same symptom: only 8 characters of the string are returned.
Here is the main part of the code. I can provide more details as needed:
@Service
@Slf4j
public class RedisConsumer {
RedisConfig redisConfig;
//RQueue<String> redisQueue;
RBlockingQueue<String> redisQueue;
@Autowired
RedisConsumer(RedisConfig redisConfig) {
this.redisConfig = redisConfig;
}
public void pollAuditQueue() {
//Redisson
redisQueue.add("{JSON string here snipped out for brevity}");
String item = redisQueue.poll();
if (!Objects.isNull(item)) {
log.info("I found this item: " + item);
} else {
log.info("Nothing in queue...");
}
}
@PostConstruct
private void init() throws Exception {
RedissonClient redissonClient = redisConfig.redisson();
redisQueue = redissonClient.getBlockingQueue("test");
while(true) {
pollAuditQueue();
Thread.sleep(5000);
}
}
}
When I look at the print statement in my console I see:
I found this item: {"AuditEv
When I check the redis-cli I can see the whole value:
1) "\xfc\t{\"AuditEvent\":{\"timestamp\":\"2018-11-27 04:31:47.818000+0000\" snipped the rest out for brevity}"
Lastly, if I check whether the item was removed from Redis after being polled in the Java app, I can confirm that it was.
Any help would be great. Since it's not throwing any specific error, I'm not finding any resources online to help address it.
I've found one thing I didn't notice in my earlier testing. When I inserted values manually through redis-cli, I was replicating what my first tests through Java had written, which put \xfc\t at the front, as can be seen in my sample above.
Just now, when I used redisQueue.add from within my application, I noticed that the value in Redis starts with \xfc\x80\x90\x01 instead, and for those values the entire string is returned to my application. I assume this has to do with memory allocation somehow? I'm marking the question as resolved as I am no longer experiencing the issue. If anyone can drop a comment on what those letters/numbers mean, it may be useful for anyone who reads this post later. Once I have researched it I will add that comment myself if no one has beaten me to it!
Specify the codec explicitly when you obtain the Redisson object. For example, for a map, replace
RMap map = redisson.getMap("SessionMap");
with
RMap map = redisson.getMap("SessionMap", new StringCodec("UTF-8"));
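On the question of what the leading bytes mean, one plausible reading is that they are a type tag followed by a length prefix written by the client's default serializing codec. This is an assumption, not something confirmed in the thread or checked against Redisson's codec implementation. It does fit the symptom: the printed fragment {"AuditEv has nine characters, and \t is the byte value 9, whereas \x80\x90\x01 could be read as a marker byte followed by the little-endian 16-bit number 400. The toy decoder below uses that assumed layout to show how a copied-over short length byte would truncate an otherwise intact payload.

```java
import java.nio.charset.StandardCharsets;

final class LengthPrefixDemo {
    // Toy layout, assumed for illustration only: [tag][length][payload].
    // The length is a single byte, or the marker 0x80 followed by a
    // little-endian 16-bit value. This is not Redisson's actual codec.
    static String decode(byte[] data) {
        int pos = 1; // skip the tag byte (0xfc in the question)
        int length = data[pos++];
        if (length == (byte) 0x80) {
            length = (data[pos] & 0xff) | ((data[pos + 1] & 0xff) << 8);
            pos += 2;
        }
        // Only 'length' bytes are read, however long the stored value is.
        return new String(data, pos, length, StandardCharsets.US_ASCII);
    }

    // Header with the long form of the length, e.g. 400 -> fc 80 90 01.
    static byte[] longHeader(int length) {
        return new byte[] {(byte) 0xfc, (byte) 0x80,
                (byte) (length & 0xff), (byte) (length >> 8)};
    }

    // Concatenates a header and an ASCII payload into one frame.
    static byte[] frame(byte[] header, String payload) {
        byte[] body = payload.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[header.length + body.length];
        System.arraycopy(header, 0, out, 0, header.length);
        System.arraycopy(body, 0, out, header.length, body.length);
        return out;
    }

    public static void main(String[] args) {
        String payload = "{\"AuditEvent\":{\"timestamp\":\"2018-11-27\"}}";
        // Stale one-byte length of 9: only the first nine characters come back.
        System.out.println(decode(frame(new byte[] {(byte) 0xfc, 9}, payload)));
        // Length matching the payload: the whole string comes back.
        System.out.println(decode(frame(longHeader(payload.length()), payload)));
    }
}
```

Under this reading, a plain string codec avoids the problem because values are stored without any prefix, which also makes them safe to insert by hand through redis-cli.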

How to persuade the ApiExplorer to create documentation for ExpandoObject?

I've created a very neat way of implementing a PATCH method for my Web API project by using an ExpandoObject as a parameter, as illustrated below:
[HttpPatch, Route("api/employee/{id:int}")]
public IHttpActionResult Update(int id, [FromBody] ExpandoObject employee)
{
var source = Repository.FindEmployeeById(id);
Patch(employee, source);
Repository.SaveEmployee(source);
return Ok(source);
}
However, when generating documentation ApiExplorer is at a loss as to what to do with the ExpandoObject, which is totally understandable. Would anyone have any ideas on how to manipulate the ApiExplorer to provide some sensible documentation?
My idea was to introduce a new attribute that points to the actual type that is expected:
public IHttpActionResult Update(int id, [FromBody, Mimics(typeof(Employee))] ExpandoObject employee)
{
...
}
But I have no idea where to start. Any ideas or suggestions are welcome.
This has been the source of some late evenings spent getting the ApiExplorer to play along with our Http Patch mechanism. Truth be told, I should probably do a proper write-up to fully explain the mechanics behind the whole idea. But for those of you who landed on this page because you want the ApiExplorer to use a different type in the documentation, this is where you need to look.
Open HelpPageConfigurationExtensions.cs and locate the following method:
//File: Areas/HelpPage/HelpPageConfigurationExtensions.cs
private static void GenerateRequestModelDescription(HelpPageApiModel apiModel, ModelDescriptionGenerator modelGenerator, HelpPageSampleGenerator sampleGenerator)
{
....
}
This is where the parameter information is available to you, and where you can substitute it with something else. I ended up doing the following to handle my ExpandoObject parameter issue:
if (apiParameter.Source == ApiParameterSource.FromBody)
{
Type parameterType = apiParameter.ParameterDescriptor.ParameterType;
// do something different when dealing with parameters
// of type ExpandoObject.
if (parameterType == typeof(ExpandoObject))
{
// if a request-type attribute is defined, assume the parameter
// is supposed to mimic the type it names.
var requestTypeAttribute = apiParameter.ParameterDescriptor.GetCustomAttributes<RequestTypeAttribute>().FirstOrDefault();
if (requestTypeAttribute != null)
{
parameterType = requestTypeAttribute.RequestType;
}
}
}
Note that RequestTypeAttribute is something I devised myself. My Web API endpoint now looks like this:
public IHttpActionResult Update(int id,
[FromBody, RequestType(typeof(Employee))] ExpandoObject employee)
Thank you to everyone who took time to look into the problem.

protobuf-net concurrent performance issue in TakeLock

We're using protobuf-net for sending log messages between services. When profiling our stress tests under high concurrency, we see very high CPU usage, and TakeLock in RuntimeTypeModel is the culprit. The hot call stack looks something like:
*Our code...*
ProtoBuf.Serializer.SerializeWithLengthPrefix(class System.IO.Stream,!!0,valuetype ProtoBuf.PrefixStyle)
ProtoBuf.Serializer.SerializeWithLengthPrefix(class System.IO.Stream,!!0,valuetype ProtoBuf.PrefixStyle,int32)
ProtoBuf.Meta.TypeModel.SerializeWithLengthPrefix(class System.IO.Stream,object,class System.Type,valuetype ProtoBuf.PrefixStyle,int32)
ProtoBuf.Meta.TypeModel.SerializeWithLengthPrefix(class System.IO.Stream,object,class System.Type,valuetype ProtoBuf.PrefixStyle,int32,class ProtoBuf.SerializationContext)
ProtoBuf.ProtoWriter.WriteObject(object,int32,class ProtoBuf.ProtoWriter,valuetype ProtoBuf.PrefixStyle,int32)
ProtoBuf.BclHelpers.WriteNetObject(object,class ProtoBuf.ProtoWriter,int32,valuetype ProtoBuf.BclHelpers/NetObjectOptions)
ProtoBuf.Meta.TypeModel.GetKey(class System.Type&)
ProtoBuf.Meta.RuntimeTypeModel.GetKey(class System.Type,bool,bool)
ProtoBuf.Meta.RuntimeTypeModel.FindOrAddAuto(class System.Type,bool,bool,bool)
ProtoBuf.Meta.RuntimeTypeModel.TakeLock(int32&)
[clr.dll]
I see that we can use the new precompiler to get a speed boost, but I'm wondering if that will get rid of the issue (sounds like it doesn't use reflection); it would be a bit of work for me to integrate this, so I haven't tested it yet. I also see the option to call Serializer.PrepareSerializer. My initial (small scale) testing didn't make the prepare seem promising.
A little more info about the type we're serializing:
[ProtoContract]
public class SomeMessage
{
[ProtoMember(1)]
public SomeEnumType SomeEnum { get; set; }
[ProtoMember(2)]
public long SomeId { get; set; }
[ProtoMember(3)]
public string SomeString { get; set; }
[ProtoMember(4)]
public DateTime SomeDate { get; set; }
[ProtoMember(5, DynamicType = true, OverwriteList = true)]
public Collection<object> SomeArguments { get; set; }
}
Thanks for your help!
UPDATE 9/17
Thanks for your response! We're going to try the workaround you suggest and see if that fixes things.
This code lives in our logging system so, in the SomeMessage example, SomeString is really a format string (e.g. "Hello {0}") and the SomeArguments collection is a list of objects used to fill in the format string, just like String.Format. Before we serialize, we look at each argument and call DynamicSerializer.IsKnownType(argument.GetType()); if the type isn't known, we convert the argument to a string first. I haven't looked at the ratios of data, but I'm pretty sure we have a lot of different strings coming in as arguments.
Let me know if this helps. If you need, I'll try to get more details.
TakeLock is only used when the model is being changed, for example because it is seeing a type for the first time. You shouldn't normally see TakeLock after the first time a particular type has been used. In most cases, using Serializer.PrepareSerializer<SomeMessage>() should perform all the necessary initialization (and similarly for any other contracts you are using).
However! I wonder if perhaps this is also related to your use of DynamicType; what are the actual objects being used here? It might be that I need to tweak the logic here, so that it doesn't spend any time on that step. If you let me know the actual objects (so I can repro), I will try to run some tests.
As for whether the precompiler would change this: yes, it would. A fully compiled static model has a completely different implementation of the ProtoBuf.Meta.TypeModel.GetKey method, so it would never call TakeLock (you don't need to protect a model that can never change!). But you can actually do something very similar without needing to use the precompiler. Consider the following, run as part of your app's initialization:
static readonly TypeModel serializer;
...
var model = TypeModel.Create();
model.Add(typeof(SomeMessage), true);
// TODO add other contracts you use here
serializer = model.Compile();
This will create a fully static-compiled serializer assembly in memory (instead of a mutable model with individual operations compiled). If you now use serializer.Serialize(...) instead of Serializer.Serialize (i.e. the instance method on your stored TypeModel rather than the static method on Serializer), it will essentially do something very similar to the precompiler, but without the need to actually precompile (obviously this is only available on "full" .NET). This will then never call TakeLock, as it is running a fixed model rather than a flexible one. It does, however, require you to know which contract types you use. You could use reflection to find these, by looking for all types with a given attribute:
static readonly TypeModel serializer;
...
var model = TypeModel.Create();
Type attributeType = typeof(ProtoContractAttribute);
foreach (var type in typeof(SomeMessage).Assembly.GetTypes()) {
if (Attribute.IsDefined(type, attributeType)) {
model.Add(type, true);
}
}
serializer = model.Compile();
But to emphasize: the above is a workaround. It sounds like there's a glitch, which I'll happily investigate if I can see an example where it actually happens. Most importantly: what are the objects in SomeArguments?

Can someone help me understand Guava CacheLoader?

I'm new to Google's Guava library and am interested in Guava's caching package. Currently I have version 10.0.1 downloaded. After reviewing the documentation and the JUnit test source code, and even after searching Google extensively, I still can't figure out how to use the caching package. The documentation is very short, as if it was written for someone who already uses Guava rather than for a newbie like me. I just wish there were more real-world examples of how to use the caching package properly.
Let's say I want to build a cache of 10 non-expiring items with Least Recently Used (LRU) eviction. Based on the example in the API docs, I built my code like this:
Cache<String, String> mycache = CacheBuilder.newBuilder()
.maximumSize(10)
.build(
new CacheLoader<String, String>() {
public String load(String key) throws Exception {
return something; // ?????
}
});
Since the CacheLoader is required, I have to include it in the build method of CacheBuilder. But I don't know how to return the proper value from mycache.
To add an item to mycache, I use the following code:
mycache.asMap().put("key123", "value123");
To get an item from mycache, I use this method:
mycache.get("key123")
The get method will always return whatever value I returned from CacheLoader's load method instead of getting the value from mycache. Could someone kindly tell me what I missed?
Guava's Cache type is generally intended to be used as a computing cache. You don't usually add values to it manually. Rather, you tell it how to load the expensive-to-calculate value for a key by giving it a CacheLoader that contains the necessary code.
A typical example is loading a value from a database or doing an expensive calculation.
private final FooDatabase fooDatabase = ...;
private final LoadingCache<Long, Foo> cache = CacheBuilder.newBuilder()
.maximumSize(10)
.build(new CacheLoader<Long, Foo>() {
public Foo load(Long id) {
return fooDatabase.getFoo(id);
}
});
public Foo getFoo(long id) {
// never need to manually put a Foo in... will be loaded from DB if needed
return cache.getUnchecked(id);
}
Also, I tried the example you gave and mycache.get("key123") returned "value123" as expected.
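To make the idea of a computing cache concrete without depending on Guava, here is a minimal sketch built on java.util.LinkedHashMap. TinyLoadingCache and its methods are hypothetical names invented for this illustration; it is not how Guava implements its cache, only a small model of the behavior described above: a lookup returns a stored value if there is one, calls the loader otherwise, and evicts the least recently used entry once the size limit is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class TinyLoadingCache<K, V> {
    private final Function<K, V> loader;
    private final Map<K, V> entries;

    TinyLoadingCache(int maximumSize, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts in LRU order.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maximumSize;
            }
        };
    }

    // Returns the cached value, loading and storing it on a miss.
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    // Manual insertion, comparable to asMap().put(...) in the question.
    synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    synchronized int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        TinyLoadingCache<String, String> cache =
                new TinyLoadingCache<>(10, key -> "loaded:" + key);
        cache.put("key123", "value123");
        System.out.println(cache.get("key123")); // manually stored value
        System.out.println(cache.get("other"));  // value produced by the loader
    }
}
```

The loader only runs for keys that are not already present, which matches the answer's observation that a manually stored "key123" is returned as "value123".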
