How do I train one model for OpenNLP Named Entity Recognition from multiple files in DKPro Core?

How do I train one model from multiple files in DKPro Core?
After annotating many documents in WebAnno and exporting them in XMI format, I tried to create a model with this code:
File model = new File("/tmp/", "model.bin");
SimplePipeline.runPipeline(
    // Reader: every *.xmi file found under /tmp/
    CollectionReaderFactory.createReaderDescription(XmiReader.class,
        ResourceCollectionReaderBase.PARAM_SOURCE_LOCATION, "/tmp/",
        ResourceCollectionReaderBase.PARAM_PATTERNS, ResourceCollectionReaderBase.INCLUDE_PREFIX + "*.xmi"),
    // Trainer: writes the trained model to /tmp/model.bin
    AnalysisEngineFactory.createEngineDescription(OpenNlpNamedEntityRecognizerTrainer.class,
        OpenNlpNamedEntityRecognizerTrainer.PARAM_TARGET_LOCATION, model,
        OpenNlpNamedEntityRecognizerTrainer.PARAM_LANGUAGE, "pt"));
The problem is that, although it did open all of the annotated files, only one file was trained.

The reader opens all files and sends them one by one to the trainer. The trainer learns from all of them and produces a single output model. That is why you only see one output file.
If you wanted to create one model per input file, you'd have to create a loop which passes the files one by one to the reader.

Related

Is it possible to use two or more repeating groups (not nested) on the same page in a PDF template? (BI Publisher)

The fProcessor.process() method gets stuck when a PDF template with two repeating groups on the same page is used.
Is it possible to use two or more repeating groups (not nested) on the same page?
FormProcessor fProcessor = new FormProcessor();
fProcessor.setTemplate(args[0]); // Input File (PDF) name
fProcessor.setData(args[1]); // Input XML data file name
fProcessor.setOutput(args[2]); // Output File (PDF) name
fProcessor.process();
I received the following reply from Oracle Support:
PDF templates do not support multiple nor nested loops.
You can only use one.

Is it possible to set the file name for h2o.save_model() (rather than simply using the model_id value)?

I'm trying to save an h2o model under a specific name that differs from the model's model_id field, but trying something like...
h2o.save_model(model=model,
path='/some/path/then/filename',
force=False)
just creates a dir/file structure like
some
|__path
   |__then
      |__filename
         |__<model_id>
as opposed to
some
|__path
   |__then
      |__filename
Is this possible to do from the save_model method?
I can't (or at least hesitate to) simply change the model_id before calling the save method, because the model names have timestamps appended to them to avoid name collisions with other models that may be on the h2o cluster. I am trying to remove these timestamps when saving to disk, and simplifying the name on the cluster before saving creates a window in which a naming collision can occur if other processes are also attempting to save such a model (with, say, a different timestamp).
Is there any way to get this behavior, or are there other common alternatives or workarounds?
This is currently not possible; however, I created a feature request here. There is a related question here which shows a solution for R (it could be adapted to Python). The workaround is simply to rename the file manually with a few lines of R/Python code.
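The rename workaround might look like this in Python. It relies on h2o.save_model() returning the full path of the file it wrote (named after the model_id); the helper name rename_saved_model and its force flag are hypothetical, with force chosen to mirror save_model's own argument. Saving under the timestamped model_id first and renaming on disk afterwards means the model_id on the cluster never changes, which sidesteps the collision window described in the question.

```python
import os


def rename_saved_model(saved_path: str, new_name: str, force: bool = False) -> str:
    """Rename a model file written by h2o.save_model() within its own directory.

    Hypothetical helper: only the file on disk is renamed; the model_id on the
    H2O cluster is left untouched. `force` allows overwriting an existing file.
    """
    target = os.path.join(os.path.dirname(saved_path), new_name)
    if os.path.exists(target) and not force:
        raise FileExistsError(f"{target} already exists; pass force=True to overwrite")
    # os.replace silently overwrites an existing target file (used when force=True).
    os.replace(saved_path, target)
    return target
```

For example, saved = h2o.save_model(model=model, path='/some/path/then', force=False) followed by rename_saved_model(saved, 'filename') should leave the model at /some/path/then/filename.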

Swift: Populating Information from a Data File

I'm new to Swift and programming in general.
I'm working on a small OSX application that displays information for countries when the user clicks on a map of the world. The map interface works fine. I've tried this on a smaller map with just four countries. I've put my country data into a class called Country with variables for the data (e.g. population, landArea, majorExport, etc.) I put the Countries into an array. When the user selects a country, the controller grabs the right Country from the array and populates the data fields. So far, so good.
I'm getting ready to scale up to a map of the world with 150+ countries. Is there a way to store all of my data in a separate file (like a .csv file) so I don't have to hard-code all of this Country data directly in a .swift file? If so:
(a) what kind of file should I use?
(b) how would I set it up?
(c) how do I get the application to access it?
Thank you.
Now that I think about it, you can use a .plist file. It's the easiest option, because you can simply use dict.writeToFile: to write and NSDictionary(contentsOfFile:) to read. And since NSDictionary is bridged to Swift, it just works. I found this great article
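If the country data currently lives in a spreadsheet or CSV export, one way to produce that plist is a small one-off conversion script run outside the app. The sketch below uses Python's standard plistlib module as an alternative to writing the file from Swift; the CSV column names (name, population, landArea, majorExport) are hypothetical, borrowed from the question's Country fields. It builds a dictionary keyed by country name, so the plist's root is a dictionary, which is what NSDictionary(contentsOfFile:) expects.

```python
import csv
import io
import plistlib


def csv_to_plist(csv_text: str) -> bytes:
    """Convert CSV rows of country data into an XML plist keyed by country name.

    Assumes hypothetical columns: name, population, landArea, majorExport.
    """
    countries = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row.pop("name")  # the key column becomes the dictionary key
        # Convert numeric columns so they are stored as plist numbers, not strings.
        row["population"] = int(row["population"])
        row["landArea"] = float(row["landArea"])
        countries[name] = dict(row)
    return plistlib.dumps(countries, fmt=plistlib.FMT_XML)
```

The returned bytes can be written to a file such as Countries.plist (an example name) and added to the app's resources, where each country's entry becomes a nested dictionary the controller can use to build its Country objects.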

Create a file with Windows Property Store (metadata) using win32 API

I'd like to create a new stub file, for instance "test.mp3", and add a Windows property to it (System.Author, for instance).
The solution must work for several file types, such as text, pictures, videos, etc.
If I just create a file and use IShellItem2::GetPropertyStore, I get an HRESULT failure for invalid arguments.
Using IShellItem2::GetPropertyStore on a real music file, I can read and write its properties just fine.
Please test your suggestions first.
Property Stores typically access and store data within the file itself. In the case of an mp3 file, it would be attempting to read and write the ID3 tags. Also, Property Stores are not stored in a database and cannot be arbitrarily added to files that don't support them.
You'll most likely need to implement your own property handlers to do what it appears you're trying to accomplish. For types that already have handlers, you'll have to replace the system handlers with your own.
The most likely reason your mp3 test is failing is that you have an empty file with no data and no valid ID3 tags.

OS X Data file editor saving to XML: Document based or not?

So I'm trying to make a data editor for an iOS/Android app I've got. There are 3 separate data files that I'd like to be able to edit, and I would like to save them as plist or XML files. I'm planning on using Core Data in the app. The problems I'm running into:
1) Should this be a Document-Based Application or not?
2) If so, how would I set it up to allow editing of 3 different structures of files?
3) And if so, how would I go about setting up the document-based app to use regular plist/XML files as the file type instead of some custom file type?
The plan is for the editor to be able to open up and edit the files, and then the saved files can be copied into the project resources of the iOS and Android apps.
1. Should the app be document-based?
Yes.
2. How would one allow editing three different structures of files?
Choose from any of the following:
Create three dictionaries in the document types list, all three of which reference the same document class.
The same as above, but with windowNibName or makeWindowControllers choosing the UI depending on the document type. In other words, shared model code, but different view hierarchies. (I probably would choose either of the alternatives instead.)
Create three document type dictionaries, each of which has its own document class.
Which one you choose will depend on just how different the types are.
You'll probably want to export a UTI for each document type, as well. Xcode will not help you there; you'll need to write each UTI dictionary by hand.
3. How would I set up the document-based app to use regular plist/XML files as the file type instead of some custom file type?
If you export one or more UTIs, you should set the parent UTI(s) of each UTI appropriately, but that's advisory; all it means is you'll be able to open the documents with generic plist or XML editors/viewers.
Reading in and writing out the data is up to you, in each document class. You will have to use NSPropertyListSerialization, NSXMLParser, PRHXMLParser, NSXMLDocument, or something else, as you see fit; NSDocument does not handle your file format for you.
