I want to classify news documents based on the type of content they contain, for example Sports, Politics, Entertainment, etc. How can I do this using Stanford NLP? If possible, please share an example.
This link should be of interest:
http://nlp.stanford.edu/software/classifier.shtml
I recently started learning Elasticsearch, and their book mentions this:
Aggregations allow hierarchical rollups too. For example, let’s find the average age of employees who share a particular interest:
The context of this quote can be found in the analytics section of the book.
So my question is: What exactly is a hierarchical rollup?
I consulted an article on searchbusinessanalytics, but I don't understand how the explanation given there relates to the term as used in the quote above. Could someone clarify the relationship between the two, if there is one?
N.B.: Here is a quote from the previously mentioned article on searchbusinessanalytics:
The simplest definition of data rollup is that we convert categories to variables.
Thanks.
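One way to read "hierarchical rollup" in the quoted sentence: documents are first grouped into buckets (one bucket per interest), and a metric (the average age) is then computed inside each bucket. Below is a minimal sketch in Python under stated assumptions. The field names `interests` and `age`, the aggregation names, and the sample employees are all made up for illustration. `query_body` shows the general shape of a `terms` aggregation with a nested `avg` sub-aggregation, and `rollup_avg_age` performs the same two-level computation locally, without Elasticsearch.

```python
from collections import defaultdict

# Request body shape: a terms aggregation (outer level, one bucket per
# interest) with a nested avg metric (inner level, computed per bucket).
# Field and aggregation names are illustrative assumptions.
query_body = {
    "size": 0,
    "aggs": {
        "all_interests": {
            "terms": {"field": "interests"},
            "aggs": {"avg_age": {"avg": {"field": "age"}}},
        }
    },
}

# Hypothetical sample documents.
employees = [
    {"name": "A", "age": 25, "interests": ["sports", "music"]},
    {"name": "B", "age": 32, "interests": ["music"]},
    {"name": "C", "age": 35, "interests": ["forestry"]},
]


def rollup_avg_age(docs):
    """Group docs by each interest, then average age within each group."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in doc["interests"]:
            ages[interest].append(doc["age"])
    return {interest: sum(v) / len(v) for interest, v in ages.items()}


result = rollup_avg_age(employees)
print(result)
```

The "hierarchy" is the nesting: the outer level decides which documents belong together, and the inner level summarizes each group. Further levels could be nested inside each bucket in the same way.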
I want to retrain Stanford CoreNLP's existing english-left3words-distsim.bin model with additional data that fits my use case. I want to assign custom tags to certain words; for example, "run" would be tagged COMMAND.
Where can I get the training data set? I could follow something like model training.
For the most part it is sections 0-18 of the WSJ Penn Treebank.
Link: https://catalog.ldc.upenn.edu/ldc99t42
We also add some extra data sets, which we don't distribute, on top of the WSJ data.
I'm using Stanford NER, and some of my results have the entity "MISC" from the 4-class model: Location, Person, Organization, Misc.
I don't know what this entity really represents. Does anyone know what it is?
Thanks
MISC is a category from the CoNLL 2003 evaluation data which is typically used to develop NER models. Honestly I don't think there is any definition of MISC beyond "is a named entity" and "isn't PERSON, ORG, or LOC".
I found this description on spaCy:
"MISC: Miscellaneous entities, e.g., events, nationalities, products, or works of art."
This applies to models recognizing PER, LOC, ORG, and MISC.
I'm pretty sure I know the answer to this question but am looking for confirmation from someone with more Elasticsearch experience than me.
Let's say I've got a database containing Authors and Books. An author can be associated with 0 or more books, and a book can be associated with 1 or more authors. We want users to be able to search on author name to find the author and all his/her books, and we also want them to be able to search on book title to get back its author(s). We know there will be plenty of multi-author books.
Because Elasticsearch only directly supports one level of parent-child relationships, and because children can only have one parent, it seems to me that we need to denormalize the data and use nested objects to establish this relationship. If we modify properties of an author who has published 23 books, we will need to reindex the author record and all 23 of his/her book records.
In my fantasy world, I'd love to have those 23 books each contain an array of author IDs so that I don't have to reindex books when I reindex authors. It seems like this would definitely be possible using Elasticsearch's parent-child support if a book could only have one author, but because of the many-to-many requirement, I have to use nested objects and reindex any related objects whenever anything changes.
Is this correct? It certainly seems like more work (and certainly more updates), but I want to do this the right way, not the "clever" way that introduces complexity and bugs and madness.
Any guidance would be appreciated.
From your question I can safely assume that ES will not be your primary data store. So the main question of how to denormalise your many-to-many relationship comes down to how, and for what, you will use ES. That is, which queries do you expect to build?
Think in terms of "query command" design and denormalize accordingly. Here are a few pointers:
Denormalising author IDs into the book: would you expect a user to execute a search such as "all books for userId=XYZ"? If not, you would rather need the name of the author as a multi-field in your Book document.
Duplicate, duplicate, and duplicate. Figure out which data will be heavily updated (authors, since books generally do not gain authors after their publication). Denormalize authors into books (names most likely). Duplicate, into another document type, something like "author_books", which would be a child of authors and support fairly frequent updates (again, denormalise the title and other relevant fields to search from the author's perspective).
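The trade-off in these pointers can be sketched in plain Python, with no Elasticsearch client involved. Each book document carries a copy of its authors' names, so a search by title or by author name only needs the book documents, and changing an author means rebuilding every book that references that author. All record shapes, IDs, names, and function names below are hypothetical.

```python
# Source-of-truth records (e.g. from the primary database); hypothetical data.
authors = {1: {"name": "Ann Lee"}, 2: {"name": "Bob Roy"}}
books = {
    10: {"title": "Search Basics", "author_ids": [1]},
    11: {"title": "Indexing Together", "author_ids": [1, 2]},
    12: {"title": "Solo Notes", "author_ids": [2]},
}


def build_book_doc(book_id):
    """Denormalize: copy the author names into the book document."""
    book = books[book_id]
    return {
        "title": book["title"],
        "author_ids": list(book["author_ids"]),
        "author_names": [authors[a]["name"] for a in book["author_ids"]],
    }


def books_to_reindex(author_id):
    """Book IDs whose denormalized copy goes stale when this author changes."""
    return sorted(b for b, book in books.items() if author_id in book["author_ids"])


# Simulated index: one denormalized document per book.
index = {book_id: build_book_doc(book_id) for book_id in books}

# Renaming author 1 requires rebuilding only the books that reference them.
authors[1]["name"] = "Ann Lee-Smith"
stale = books_to_reindex(1)
for book_id in stale:
    index[book_id] = build_book_doc(book_id)

print(stale)
print(index[11]["author_names"])
```

Keeping `author_ids` alongside the copied names is what makes it cheap to find the stale book documents after an author update.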
Hope this makes some sense ;)
I'm trying to determine if the Google Places API is suitable for a restaurant review website I'm working on (disclaimer: I'm not a developer so please excuse my lack of knowledge here).
Specifically, looking at https://developers.google.com/places/documentation/details for support, I'm trying to determine if the Places API includes the following restaurant-specific attributes in its database that we could query: cuisine type (e.g. Indian, Brunch, American) and/or neighborhood (e.g. Marina, Mission, Financial District). As an example in layman's terms, if we were to use the Google Places API, would users on our site be able to search for Indian restaurants in the Financial District and see restaurants that meet those criteria?
Thanks,
Jaydon
You'd have to run a search using the 'type' restaurant and include the term "indian" in the search; for the location, you could use a radius.
In layman's terms, yes you can, but the neighborhood would be determined as a radius around one central location (it wouldn't follow the specific outline of the actual neighborhood), and the type of food would have to be included in the search terms. You could add that automatically: ask the end user to specify the cuisine they are looking for (multiple choice), and then add that cuisine to the search terms yourself.
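As a rough sketch of the request described above, the following Python builds a Places Nearby Search URL with `type=restaurant`, a `keyword` for the cuisine, and a `location` plus `radius` standing in for the neighborhood. It targets the JSON web-service endpoint as I understand it, so check the current documentation before relying on it. The coordinates are an approximate, assumed center point for San Francisco's Financial District, the 800 m radius is arbitrary, and `YOUR_API_KEY` is a placeholder. The code only constructs the URL and does not send it.

```python
from urllib.parse import parse_qs, urlencode, urlparse

NEARBY_SEARCH_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"


def build_restaurant_search_url(cuisine, lat, lng, radius_m, api_key):
    """Build a Nearby Search URL for restaurants matching a cuisine keyword."""
    params = {
        "location": f"{lat},{lng}",  # center point standing in for the neighborhood
        "radius": radius_m,          # search radius in meters around that point
        "type": "restaurant",        # restrict results to restaurants
        "keyword": cuisine,          # cuisine goes into the search terms
        "key": api_key,
    }
    return f"{NEARBY_SEARCH_URL}?{urlencode(params)}"


# Assumed approximate center of the Financial District; placeholder API key.
url = build_restaurant_search_url("indian", 37.7946, -122.3999, 800, "YOUR_API_KEY")

# Parse the query string back to confirm which parameters were set.
query = parse_qs(urlparse(url).query)
print(url)
```

The cuisine chosen by the end user from a multiple-choice list would be passed as the `cuisine` argument, and each supported neighborhood would need its own center point and radius.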