What is an aggregation hierarchical rollup? - elasticsearch

I recently started learning Elasticsearch, and their book mentions this:
Aggregations allow hierarchical rollups too. For example, let’s find the average age of employees who share a particular interest:
Please find the link to the book, and the context of this quote by following the analytics section.
So my question is: What exactly is a hierarchical rollup?
I have consulted an article on searchbusinessanalytics, but I don't understand how the explanation given there relates to the term as used in the quote above. Could someone clarify the relationship between the two, if there is one?
N.B.: Here is a quote from the previously mentioned article on searchbusinessanalytics:
The simplest definition of data rollup is that we convert categories to variables.
Thanks.
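One way to read the quoted sentence, as a minimal sketch in plain Python rather than Elasticsearch itself: a "hierarchical rollup" first groups documents into buckets (here, one bucket per interest) and then computes a summary value inside each bucket (here, the average age). The employee records and the function name below are made up for illustration and are not taken from the book.

```python
from collections import defaultdict

# Hypothetical sample data; names, ages, and interests are invented.
employees = [
    {"name": "A", "age": 25, "interests": ["sports", "music"]},
    {"name": "B", "age": 32, "interests": ["music"]},
    {"name": "C", "age": 35, "interests": ["forestry"]},
]


def rollup_avg_age_by_interest(docs):
    """Level 1: bucket documents by interest. Level 2: average age per bucket."""
    ages_by_interest = defaultdict(list)
    for doc in docs:
        # An employee with several interests lands in several buckets.
        for interest in doc["interests"]:
            ages_by_interest[interest].append(doc["age"])
    return {
        interest: {"doc_count": len(ages), "avg_age": sum(ages) / len(ages)}
        for interest, ages in ages_by_interest.items()
    }


result = rollup_avg_age_by_interest(employees)
for interest, stats in sorted(result.items()):
    print(interest, stats)
```

The "hierarchy" is the nesting of the per-bucket metric under the grouping; further levels (for example, grouping by interest and then by department) would nest in the same way.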

Related

How to index two document types in parent-child relationship in Elasticsearch

I am building search functionality for two types of related documents, let's call them "blogs" and "posts": a blog website (with a bunch of posts) and the specific posts written in that blog, respectively. I'd like to be able to search against both of them. In a relational database (which ES is not), I would have two main tables linked by a foreign key, and I could search the two tables separately or with a join. In Elasticsearch, I am considering a parent-child relationship where "blog" is the parent document, and there are potentially many "post" documents associated with it as children.
EDIT: I should explain why I want to index them this way. Basically, I want people to be able to search for blogs (the overall series of posts written by the same author), and the search terms might not be in the blog's description alone, but rather in the posts; for instance, a blog about Python might have a general description that talks about python, but the blog posts might talk about django, so if someone searches for "django" I'd like the python blog to come up. Also, I want people to be able to search for specific posts. I also think (prove me wrong!) these need to be separate types of documents because they would have different fields, e.g. a post might have a date field, while a blog would not have that field.
In any case: Ideally, I would like to be able to offer a search function against "blog" which would also search against the "post" text (as the relevant text might be in the post); additionally, I'd like to allow users to search all posts regardless of what blog they are associated with.
What are the best practices for setting this up? From what I can tell, Elasticsearch has removed the ability to have two types of documents on the same index, and parent-child relationships need to be on the same index. With this constraint, it seems like parent-child relationships would only be for relationships between documents of the same type, e.g. if you are indexing people and you can indicate who is a parent and child (literally).
The other option would be to create two indexes, one for blogs (which would include the posts' texts) and a second index which would include only the posts. But my instinct is that this would duplicate a tremendous amount of data and would also take a lot more work to keep updated and in sync with my main relational data store.
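For reference, a rough sketch of what the single-index parent-child option could look like, assuming a version of Elasticsearch that provides the `join` field type and the `has_child` query. The request bodies are written as plain Python dictionaries; the field names (`blog_post_relation`, `description`, `body`, `published`) are hypothetical, and no client library call is shown.

```python
# Index mapping: one index holds both kinds of document, distinguished by a
# join field rather than by separate mapping types. All field names here are
# assumptions made for illustration.
index_body = {
    "mappings": {
        "properties": {
            "blog_post_relation": {
                "type": "join",
                "relations": {"blog": "post"},  # "blog" is parent of "post"
            },
            "description": {"type": "text"},  # used by blog documents
            "body": {"type": "text"},         # used by post documents
            "published": {"type": "date"},    # used by post documents only
        }
    }
}

# A parent (blog) document and one of its children (post). A child must be
# indexed on the same shard as its parent, which is normally done by passing
# the parent id as the routing value when indexing.
blog_doc = {"description": "A blog about Python", "blog_post_relation": "blog"}
post_doc = {
    "body": "Getting started with django",
    "published": "2018-01-01",
    "blog_post_relation": {"name": "post", "parent": "blog-1"},
}

# Find blogs that have at least one post mentioning "django".
blogs_by_post_text = {
    "query": {
        "has_child": {
            "type": "post",
            "query": {"match": {"body": "django"}},
        }
    }
}

# Search all posts regardless of blog: filter on the join field value.
all_posts = {
    "query": {
        "bool": {
            "must": {"match": {"body": "django"}},
            "filter": {"term": {"blog_post_relation": "post"}},
        }
    }
}
```

In this sketch the two kinds of document share one mapping, so fields that only make sense for posts are simply left out of blog documents.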

How to create a quick search in CRM that spans multiple entities with grouped conditions

We are a housing association with a large CRM system (2016 & SP1). We have a new requirement: our users need to be able to search for people who are current (i.e. not previous) occupants or residents, or who are not residents (e.g. contractors).
For this purpose, we need to search the Person entity, which has a related Tenancy entity. Person has a TenancyType field with possible (option set) values Occupant, Resident, Contractor. Tenancy has a TenancyStatus field with possible (text) values Current and Previous.
We tried using the following filter criteria in the quick view on the Person entity:
thinking that it would return all people who are not previous residents. However we noticed that it would filter out contractors because contractors do not have related tenancy records.
We needed to change the criteria to return all contractors OR all residents and occupants with no previous tenancy. So we changed it to the following:
at which point we got stuck because we noticed that it was not possible to AND together the second and the third conditions as the third one is a related entity.
We are wondering what the best way is to achieve the above, bearing in mind that we do not want a separate view for each condition, e.g. one for residents, one for non-residents, etc.
Any help or suggestion is greatly appreciated.
It is not possible to do this with a single query.
Instead, you can use two queries. If you do not want to do that, then using reports (as suggested by Alex) or a BI-solution would be other possibilities.
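As a rough illustration of the two-query idea, here is a sketch in plain Python over in-memory records, not CRM code. It assumes the intended rule is "all contractors, OR residents and occupants with no previous tenancy" as stated in the question; the record layout and function names are hypothetical.

```python
# Hypothetical in-memory stand-ins for Person records and their related
# Tenancy statuses. Contractors have no tenancy records at all.
people = [
    {"id": 1, "tenancy_type": "Contractor", "tenancy_statuses": []},
    {"id": 2, "tenancy_type": "Resident", "tenancy_statuses": ["Current"]},
    {"id": 3, "tenancy_type": "Occupant", "tenancy_statuses": ["Previous"]},
    {"id": 4, "tenancy_type": "Resident", "tenancy_statuses": ["Previous", "Current"]},
]


def query_contractors(records):
    """First query: condition on the Person entity only."""
    return [p for p in records if p["tenancy_type"] == "Contractor"]


def query_tenants_without_previous(records):
    """Second query: residents/occupants with no Previous tenancy."""
    return [
        p
        for p in records
        if p["tenancy_type"] in ("Resident", "Occupant")
        and "Previous" not in p["tenancy_statuses"]
    ]


def combined_search(records):
    """Run both queries and merge the results, de-duplicating by id."""
    merged = {p["id"]: p for p in query_contractors(records)}
    merged.update({p["id"]: p for p in query_tenants_without_previous(records)})
    return sorted(merged)


matching_ids = combined_search(people)
print(matching_ids)
```

The merge step is where the OR between the two conditions happens, which is the part a single view filter could not express.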
Thanks to everyone here who spent time answering my question. The following describes the correct answer:
https://community.dynamics.com/crm/f/117/p/241352/666651#666651

How to classify documents with Stanford NLP

I want to classify news documents by the type of content they contain, for example Sports, Politics, Entertainment, etc. How can I do this using Stanford NLP? If possible, please share an example.
This link should be of interest:
http://nlp.stanford.edu/software/classifier.shtml

Best practice for handling many-to-many relationships in Elasticsearch?

I'm pretty sure I know the answer to this question but am looking for confirmation from someone with more Elasticsearch experience than me.
Let's say I've got a database containing Authors and Books. An author can be associated with 0 or more books, and a book can be associated with 1 or more authors. We want users to be able to search on author name to find the author and all his/her books, and we also want them to be able to search on book title to get back its author(s). We know there will be plenty of multi-author books.
Because Elasticsearch only directly supports one level of parent-child relationships, and because children can only have one parent, it seems to me that we need to denormalize the data and use nested objects to establish this relationship. If we modify properties of an author who has published 23 books, we will need to reindex the author record and all 23 of his/her book records.
In my fantasy world, I'd love to have those 23 books each contain an array of author IDs so that I don't have to reindex books when I reindex authors. It seems like this would definitely be possible using Elasticsearch's parent-child support if a book could only have one author, but because of the many-to-many requirement, I have to use nested objects and reindex any related objects whenever anything changes.
Is this correct? It certainly seems like more work (and certainly more updates), but I want to do this the right way, not the "clever" way that introduces complexity and bugs and madness.
Any guidance would be appreciated.
From your question I can safely assume that ES will not be your primary data store. So the main question in deciding how to denormalise your many-to-many relationship is how, and for what, you will use ES. That is, what queries do you expect to build?
Think in terms of "query command" design and denormalise accordingly. Here are a few pointers:
Denormalising author IDs into the book: would you expect a user to execute a search such as "all books for userId=XYZ"? If not, you would rather need the name of the author as a multi-field in your Book document.
Duplicate, duplicate and duplicate. Figure out which data will be heavily updated (authors, as books generally do not gain authors after publication). Denormalise the author into books (names, most likely). Duplicate, into another document type, something like "author_books", which would be a child of authors and would support fairly frequent updates (again, denormalise the title and other relevant fields needed to search from the author perspective).
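The pointers above could be sketched roughly as follows, in plain Python that only builds the documents to be indexed. The source records, the field names, and the "author_books" document shape are assumptions for illustration; no Elasticsearch client call is shown.

```python
# Hypothetical records from the primary (relational) data store.
authors = {
    "a1": {"name": "Jane Doe"},
    "a2": {"name": "John Roe"},
}
books = {
    "b1": {"title": "Shared Work", "author_ids": ["a1", "a2"]},
    "b2": {"title": "Solo Work", "author_ids": ["a1"]},
}


def build_book_doc(book_id):
    """Book document with author names denormalised into it for searching."""
    book = books[book_id]
    return {
        "id": book_id,
        "title": book["title"],
        "author_names": [authors[a]["name"] for a in book["author_ids"]],
    }


def build_author_books_docs(author_id):
    """One small 'author_books' document per (author, book) pair, carrying
    the title so that searches from the author side need no join."""
    return [
        {"author_id": author_id, "book_id": book_id, "title": book["title"]}
        for book_id, book in sorted(books.items())
        if author_id in book["author_ids"]
    ]


def book_ids_to_reindex(author_id):
    """Books whose denormalised copy goes stale when this author changes."""
    return sorted(b for b, book in books.items() if author_id in book["author_ids"])


print(build_book_doc("b1"))
print(build_author_books_docs("a1"))
print(book_ids_to_reindex("a1"))
```

The last helper makes the cost of the duplication explicit: renaming an author means rebuilding every book document that embeds that name.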
Hope this makes some sense ;)

UI Design approach for a questionnaire

Scenario:
There is an online questionnaire that will be filled in by various departments in a company. The questions are data driven and are different for each department.
But for some of the questions, the way the input is taken also differs: for some departments the same question is answered by selecting a value from a drop-down, for others it is free-text entry. For some departments the caption next to the entry area also changes. This caption is not part of the question. It is also not coming from the database as of now, and I'd rather not put it all in the database and increase the joins for each select. Out of the twenty-odd questions that have such captions, only 3 captions change.
For example:
Department A.)
Q.) How would you like to get here?
{caption:"Enter your prefered transport method"} [Free Text Box]
Department B.)
Q.) How would you like to get here?
{caption:"Select option"} [Drop Down]
Which of the approaches below would be the best way to design and code such a web-based questionnaire?
Implement it using if-else conditions for each department and show and hide input controls as per department
Abstract all common inputs into a parent class and have multiple child classes for each department which contain their own specific behavior for data input
Any other better way?
Thanks for your time. :)
I would recommend using if/else statements to show and hide the various questions.
The reason why I say if/else instead of sub-classing is that you'll come across a case where a question which used to be "common" to all Departments becomes specific to a few, and you'll have to refactor that question to the sub-classes and delete it where it's not applicable. The if/else code may get tedious, but not more tedious than the refactor I mention above.
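A minimal sketch of the if/else approach, in Python for illustration only. The captions are copied from the example in the question; the function name, the question id, and the fallback for other departments are assumptions.

```python
def input_control_for(department, question_id):
    """Pick the caption and input control for a question, per department."""
    if question_id == "transport":
        if department == "A":
            return {
                "caption": "Enter your prefered transport method",
                "control": "free_text",
            }
        elif department == "B":
            return {"caption": "Select option", "control": "drop_down"}
    # Assumed default for questions and departments with no special case.
    return {"caption": "", "control": "free_text"}


print(input_control_for("A", "transport"))
print(input_control_for("B", "transport"))
print(input_control_for("C", "transport"))
```

If a question that used to be common becomes department-specific, only one branch has to be added here, which is the maintenance argument made above.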
In addition, I would also urge you to strongly consider normalizing the questions of your survey. In other words, in the example you provided above, I would make the free text and option two different questions. This doesn't change how the answers are input, it just changes how you consider the different questions.
The reason why I'm saying this is that any analysis you will be doing on the answers is dependent on comparing apples to apples -- when you restrict or free the list of available choices, you're comparing apples to oranges.

Resources