How to define document ordering based on filter parameter - elasticsearch

Hi Elasticsearch experts.
I have a problem which might be related to the fact that I am indexing relational DB data.
My scenario is the following:
I have two entities:
documents and meetings.
Documents and meetings are independent entities; however, documents can be assigned to meetings in a given order.
We are using a join table for this in the DB.
meetings(id,name,date)
document(id,title,author)
meeting_document(doc_id,meeting_id,order)
In Elasticsearch, I am indexing the document ids as a NESTED property of the meeting.
meeting example:
{
  "id": 25,
  "name": "test",
  "documents": [22, 12, 24, 55]
}
I will fetch the meeting first. After that, I would like to send a request for the documents, filtering on document.id, and ask Elasticsearch to return them in the same order as the list of ids I passed to the filter.
What is the best way to implement this ?
Thanks

Nice question!
I've spent some time on this and came up with a solution. It might be a bit tricky, but it works.
Let's have a look at my query.
I've used a script score to sort by a user-defined list.
POST index/type/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "script_score": {
            "script": "ar.size()-ar.indexOf(doc['docid'].value)",
            "params": {
              "ar": ["1", "2", "4", "3"]
            }
          }
        }
      ]
    }
  },
  "filter": {
    "terms": {
      "docid": ["1", "2", "4", "3"]
    }
  }
}
The thing you have to take care of is to send the same values in the filter and in params, as in the query above.
This returns hits with doc ids 1, 2, 4, 3, in that order.
You have to change the field name inside the script and in the filter, and you can use a terms query inside the query object.
I've tested the code. Hope this helps!
Thanks
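The query above uses an older request syntax (a string script with sibling params, plus a top-level filter). For Elasticsearch 5 and later, the same idea might be sketched as below; this is not the answerer's tested query. It assumes docid is mapped as keyword, writes the script in Painless, moves the terms filter into the function_score query, and sets boost_mode to replace so the script value alone becomes the score. build_ordered_query and script_score are hypothetical helpers; script_score only mirrors the script's arithmetic so the resulting order can be checked locally. Older 5.x releases use inline instead of source for the script body.

```python
from typing import Any, Dict, List


def build_ordered_query(field: str, ids: List[str]) -> Dict[str, Any]:
    """Search body that scores documents by their position in `ids`.

    Hypothetical helper: assumes Elasticsearch 5+ (Painless) and that
    `field` is a keyword field, so doc[field].value is a String.
    """
    return {
        "size": len(ids),
        "query": {
            "function_score": {
                # Only documents whose id is in the list can match.
                "query": {"terms": {field: ids}},
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                "lang": "painless",
                                # First id -> ids.size(), last id -> 1.
                                "source": f"params.ar.size() - params.ar.indexOf(doc['{field}'].value)",
                                "params": {"ar": ids},
                            }
                        }
                    }
                ],
                # Use the script value as the final score, ignoring relevance.
                "boost_mode": "replace",
            }
        },
    }


def script_score(ids: List[str], doc_id: str) -> int:
    """Same arithmetic as the Painless script, for checking the order locally."""
    return len(ids) - ids.index(doc_id)


ids = ["1", "2", "4", "3"]
body = build_ordered_query("docid", ids)
print(sorted(["3", "1", "4", "2"], key=lambda d: script_score(ids, d), reverse=True))
```

Because the terms query only admits listed ids, indexOf never returns -1 here, and every score stays positive (from the list length down to 1).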

Related

Elastic Search | How to get original search query with corresponding match value

I'm using ElasticSearch as search engine for a human resource database.
The user submits a competence (e.g. 'disruption'), and ElasticSearch returns all users ordered by best match.
I have configured the field 'competences' to use synonyms, so 'innovation' would match 'disruption'.
I want to show the user (who is performing the search) how a particular search result matched the search query. For this, I use the explain API.
The query works as expected and returns an _explanation to each hit.
Details (simplified a bit) for a particular hit could look like the following:
{
  "description": "weight(Synonym(skills:innovation skills:disruption))",
  "value": 3.0988
}
Problem: the _explanation does not show what the original search term was. As the example above illustrates, I can see that the query matched 'innovation' or 'disruption', but I need to know which skill the user actually searched for.
Question: Is there any way to solve this (for example, by passing a custom 'description' with info about the search term through to the _explanation)?
Expected Result:
{
  "description": "weight(Synonym(skills:innovation skills:disruption))",
  "value": 3.0988,
  "customDescription": "innovation"
}
Maybe you can put the original query in the _name field?
As explained in https://qbox.io/blog/elasticsearch-named-queries:
GET /_search
{
  "query": {
    "query_string": {
      "default_field": "skills",
      "query": "disruption",
      "_name": "disruption"
    }
  }
}
You can then find the original query in the matched_queries section of each hit in the response:
{
  "_index": "testindex",
  "_type": "employee",
  "_id": "2",
  "_score": 0.19178301,
  "_source": {
    "skills": "disruption"
  },
  "matched_queries": [
    "disruption"
  ]
}
Add explain to the request as well, and I think it should work fine.
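If the user can search for several skills at once, one named clause per term lets matched_queries report exactly which of them hit. A minimal sketch that builds such a request body as a Python dict; build_named_skill_query and original_terms are hypothetical helpers, and explain is set so the _explanation is still returned alongside matched_queries:

```python
from typing import Any, Dict, List


def build_named_skill_query(field: str, terms: List[str]) -> Dict[str, Any]:
    """One named match clause per search term (hypothetical helper)."""
    return {
        "explain": True,  # keep the per-hit _explanation as well
        "query": {
            "bool": {
                "should": [
                    # _name tags the clause, so a matching hit lists it in matched_queries.
                    {"match": {field: {"query": term, "_name": term}}}
                    for term in terms
                ],
                "minimum_should_match": 1,
            }
        },
    }


def original_terms(hit: Dict[str, Any]) -> List[str]:
    """Return the user-supplied terms that matched this hit (empty if none)."""
    return hit.get("matched_queries", [])


body = build_named_skill_query("skills", ["disruption", "leadership"])
print(original_terms({"_id": "2", "matched_queries": ["disruption"]}))
```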

How can i get unique suggestions without duplicates when i use completion suggester?

I am using Elasticsearch 5.1.1 in my environment. I have set up a completion suggester on a field named post_hashtags, which holds an array of strings. For the prefix "inv", I get the response below.
Request:
POST hashtag/_search?pretty&filter_path=suggest.hash-suggest.options.text,suggest.hash-suggest.options._source
{
  "_source": ["post_hashtags"],
  "suggest": {
    "hash-suggest": {
      "prefix": "inv",
      "completion": {
        "field": "post_hashtags"
      }
    }
  }
}
Response:
{
  "suggest": {
    "hash-suggest": [
      {
        "options": [
          {
            "text": "invalid",
            "_source": {
              "post_hashtags": [
                "invalid"
              ]
            }
          },
          {
            "text": "invalid",
            "_source": {
              "post_hashtags": [
                "invalid",
                "coment_me",
                "daya"
              ]
            }
          }
        ]
      }
    ]
  }
}
Here "invalid" is returned twice because it is also an input string for the same field "post_hashtags" in another document.
The problem is that if the same input string "invalid" is present in 1000 documents in the same index, I get 1000 duplicate suggestions, which is huge and not needed.
Can I apply an aggregation on a field of type completion?
Is there any way to get unique suggestions instead of duplicated text, even if the same input string is given to that field in multiple documents of the same index?
Elasticsearch 6.1 introduced the skip_duplicates option. Example usage:
{
  "suggest": {
    "autocomplete": {
      "prefix": "MySearchTerm",
      "completion": {
        "field": "name",
        "skip_duplicates": true
      }
    }
  }
}
Edit: This answer only applies to Elasticsearch 5
No, you cannot de-duplicate suggestion results. The autocomplete suggester is document-oriented in Elasticsearch 5 and will thus return suggestions for all documents that match.
In Elasticsearch 1 and 2, the autocomplete suggester automatically de-duplicated suggestions. There is an open GitHub ticket to bring back this functionality, and it looks like it is possible to do so in a future version.
For now, you have two options:
Use Elasticsearch version 1 or 2.
Use a different suggestion implementation not based on the autocomplete suggester. The only semi-official suggestion I have seen so far involves putting your suggestion strings in a separate index.
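A client-side workaround on Elasticsearch 5, not one of the two options above, is to request more options than you need through the completion suggester's size parameter and drop repeated texts in the application. A rough sketch, assuming the response has already been parsed into a dict; unique_suggestions is a hypothetical helper. It cannot guarantee enough distinct results when one text is shared by more documents than the requested size.

```python
from typing import Any, Dict, List, Set


def unique_suggestions(response: Dict[str, Any], name: str, limit: int = 5) -> List[str]:
    """Collect distinct suggestion texts, keeping the suggester's order."""
    seen: Set[str] = set()
    result: List[str] = []
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            text = option["text"]
            if text not in seen:
                seen.add(text)
                result.append(text)
                # Stop as soon as enough distinct suggestions are collected.
                if len(result) == limit:
                    return result
    return result


resp = {"suggest": {"hash-suggest": [{"options": [
    {"text": "invalid"}, {"text": "invalid"}, {"text": "invite"},
]}]}}
print(unique_suggestions(resp, "hash-suggest"))
```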

Scope Elasticsearch Results to Specific Ids

I have a question about the Elasticsearch DSL.
I would like to do a full text search, but scope the searchable records to a specific array of database ids.
In SQL world, it would be the functional equivalent of WHERE id IN(1, 2, 3, 4).
I've been researching, but I find the Elasticsearch query DSL documentation a little cryptic and devoid of useful examples. Can anyone point me in the right direction?
Here is an example query which might work for you. It assumes that the _all field is enabled on your index (the default in older versions; the _all field is deprecated since Elasticsearch 6.0). It will do a full text search across all the fields in your index. Additionally, with the added ids filter, the query will exclude any document whose id is not in the given array.
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "_all": "your search text"
        }
      },
      "filter": {
        "ids": {
          "values": ["1", "2", "3", "4"]
        }
      }
    }
  }
}
Hope this helps!
As discussed by Ali Beyad, an ids query can do this for you. To complement his answer, here is a working example, in case anyone needs it in the future.
GET index_name/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "field": "your query"
          }
        },
        {
          "ids": {
            "values": ["0aRM6ngBFlDmSSLpu_J4", "0qRM6ngBFlDmSSLpu_J4"]
          }
        }
      ]
    }
  }
}
You can create a bool query that contains an Ids query in a MUST clause:
https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-ids-query.html
By using a MUST clause in a bool query, your search will be further limited by the Ids you specify. I'm assuming here by Ids you mean the _id value for your documents.
According to the Elasticsearch docs, the ids query returns documents based on their IDs:
GET /_search
{
  "query": {
    "ids": {
      "values": ["1", "4", "100"]
    }
  }
}
With ElasticaBundle on Symfony 5.2:
$query = new Query();
$IdsQuery = new Query\Ids();
$IdsQuery->setIds($id);
$query->setQuery($IdsQuery);
$this->finder->find($query, $limit);
You have two options.
The ids query:
GET index/_search
{
  "query": {
    "ids": {
      "values": ["1", "2", "3"]
    }
  }
}
or
The terms query:
GET index/_search
{
  "query": {
    "terms": {
      "yourNonPrimaryIdField": ["1", "2", "3"]
    }
  }
}
The ids query targets the document's internal _id field (i.e. the primary ID). However, documents often contain secondary IDs as well, which you'd target through the terms query.
Note that if your secondary IDs contain uppercase characters and you don't map their field as keyword, they'll be analyzed (and lowercased), and the terms query will appear broken because it only matches exact terms. More on this here: Only getting results when elasticsearch is case sensitive
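Both options can be wrapped in one helper that restricts a full-text query to a list of ids. A sketch, assuming a hypothetical text field named content; the id restriction goes into a bool filter clause so it narrows the results without affecting scoring:

```python
from typing import Any, Dict, List, Optional


def scoped_search(text: str, ids: List[str], id_field: Optional[str] = None,
                  text_field: str = "content") -> Dict[str, Any]:
    """Full-text search limited to the given ids (hypothetical helper)."""
    if id_field is None:
        # Primary ids: the ids query matches against the _id field.
        id_filter: Dict[str, Any] = {"ids": {"values": ids}}
    else:
        # Secondary ids: exact matching, so id_field should be a keyword field.
        id_filter = {"terms": {id_field: ids}}
    return {
        "query": {
            "bool": {
                "must": {"match": {text_field: text}},
                # Filter context: restricts results, does not change scores.
                "filter": id_filter,
            }
        }
    }


print(scoped_search("your search text", ["1", "2", "3"]))
```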

Constructing a NEST/ElasticSearch query with nested properties

I'm querying an Elasticsearch database (the Danish CVR registry) using NEST in C#. I'm trying to formulate a query against this schema:
relations: [
  {
    participant: {
      key: 123123
    },
    organisations: [
      {
        organisationName: {
          name: "some string",
          period: {
            from: "SOME DATE",
            to: "SOMEDATE OR NULL"
          }
        },
        ... more of similar objects ..
      }
    ]
  },
  .. more of similar objects ..
]
My problem is that I need to find documents that have a certain participant.key value while also having a specific organisations.organisationName.name and a missing or null value in organisations.organisationName.period.to.
I know I need to use a nested query to get documents that have both a null value in the to field and a certain name in the name field, but on top of that I also need the specific key in the participant.key field, and this is where I'm having trouble. Note that all 3 fields that I'm checking must be within the same relations object, and the to and name fields must be within the same organisationName object.
The query without the key part as a JSON query is this:
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "relations.organisations.organisationName",
            "score_mode": "max",
            "query": {
              "bool": {
                "must": [
                  { "match": { "relations.organisations.organisationName.name": "EJERREGISTER" } },
                  { "filtered": { "filter": {
                    "missing": { "field": "relations.organisations.organisationName.period.to" }
                  } } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
Hoping someone out there is skilled at writing these queries in the NEST query DSL. I could also work from a pure Elasticsearch JSON query, but the .NET equivalent would be my preferred option :)
Thanks in advance!
After some experimentation, I came to the conclusion that the right answer to my problem is a nested query that 1. checks the key, and 2. contains a nested query that does the other checks I needed on the organisations.organisationName object.
I couldn't quite verify this, however, because the database I'm querying does not have the relations-object marked as nested (and I can't change that since it's a government database)
My workaround was to retrieve all relations related to my keys and then filter out the remaining objects in memory, as this wasn't too much overhead in my scenario.
Edit: as a follow up, the external database I was using added the nested clause, and it worked as explained above.
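The structure described above (an outer nested query on relations that checks the key and contains an inner nested query on the organisation name) might look like this as a request body. This is a Python sketch rather than NEST code, and it assumes that relations is mapped as nested and, as in the question's own query, that relations.organisations.organisationName is nested too. It also swaps the filtered and missing queries, which were removed in Elasticsearch 5.0, for a must_not exists clause; exists does not match null values, so this covers both missing and null. relation_query is a hypothetical helper.

```python
from typing import Any, Dict


def relation_query(key: int, org_name: str) -> Dict[str, Any]:
    """Outer nested on `relations`, inner nested on the organisation name."""
    inner_path = "relations.organisations.organisationName"
    return {
        "query": {
            "nested": {
                "path": "relations",
                "query": {
                    "bool": {
                        "must": [
                            # 1. The participant key, within the same relation object.
                            {"term": {"relations.participant.key": key}},
                            # 2. Name and open period, within the same organisationName object.
                            {
                                "nested": {
                                    "path": inner_path,
                                    "query": {
                                        "bool": {
                                            "must": [{"match": {f"{inner_path}.name": org_name}}],
                                            # No value (missing or null) in period.to.
                                            "must_not": [{"exists": {"field": f"{inner_path}.period.to"}}],
                                        }
                                    },
                                }
                            },
                        ]
                    }
                },
            }
        }
    }


print(relation_query(123123, "EJERREGISTER"))
```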

Elasticsearch to recommend book authors: how to limit maximum 3 books per author?

I use Elasticsearch to recommend authors (my Elasticsearch documents represent books, with a title, a summary and a list of author ids).
The user queries my index with some text (e.g. Georgia or Paris), and I need to aggregate the scores of individual books at the author level (meaning: recommend an author who writes about Paris).
I began with a simple aggregation; however, experimentally (cross-validation), it is best to stop aggregating the score of each author after at most 4 books per author. This way, an author with 200 books cannot "dominate" the results. Let me explain in pseudocode:
# the aggregated score of each author
Map<Author, Double> author_scores = new Map()
# the number of books (hits) that contributed to each author
Map<Author, Integer> author_cnt = new Map()
# iterate ES query results (sorted by descending score)
for Document doc in hits:
    # stop aggregating once 4 books from this author have been counted
    if (author_cnt.get(doc.author_id, default=0) < 4):
        author_scores.increment_by(doc.author_id, doc.score)
        author_cnt.increment_by(doc.author_id, 1)
the_result = author_scores.sort_map_by_value(reverse=true)
So far, I have implemented the above aggregation in custom application code, but I was wondering if it was possible to rewrite it using Elasticsearch's query DSL or org.elasticsearch.search.aggregations.Aggregator interface.
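The pseudocode above could be written as runnable Python along these lines. This is a sketch under a few assumptions: each hit is passed as a hypothetical (author_ids, score) pair, hits arrive in descending score order (the Elasticsearch default), and a book with several authors counts toward each of them, since the documents carry a list of author ids.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def capped_author_scores(hits: Iterable[Tuple[List[str], float]],
                         max_books: int = 4) -> List[Tuple[str, float]]:
    """Sum book scores per author, counting at most max_books books each."""
    scores: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for author_ids, score in hits:
        for author in author_ids:
            # Hits are in descending score order, so the first max_books are the best ones.
            if counts[author] < max_books:
                scores[author] += score
                counts[author] += 1
    # Highest aggregated score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


hits = [(["a"], 4.0), (["a", "b"], 3.0), (["a"], 2.0), (["a"], 1.0), (["b"], 0.5)]
print(capped_author_scores(hits, max_books=3))
```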
In my opinion, you cannot do this with the features ES offers. The closest thing I could find to your requirement is the "top_hits" aggregation. With it, you run your query, aggregate on whatever you want, and then ask for only the top X hits per bucket, ordered by some criterion.
For your particular scenario, the query is a "match" for "Paris", the aggregation is on author id, and you tell ES to return only the first 3 books per author, ordered by score. The good part is that ES gives you the best three books for each author, ordered by relevance, rather than all of each author's books or none. The not-so-good part is that "top_hits" doesn't allow a further sub-aggregation, so there is no way to sum the scores of only those top hits. You would still need to compute the sum of scores for each author yourself.
And a sample query:
{
  "query": {
    "match": {
      "title": "Paris"
    }
  },
  "aggs": {
    "top-authors": {
      "terms": {
        "field": "author_ids"
      },
      "aggs": {
        "top_books_hits": {
          "top_hits": {
            "sort": [
              {
                "_score": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "include": ["title"]
            },
            "size": 3
          }
        }
      }
    }
  }
}
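Since top_hits cannot feed a sum sub-aggregation, the per-author sum has to be computed from the response. A minimal client-side sketch, assuming the aggregation names from the sample query (top-authors, top_books_hits) and a response already parsed into a Python dict; summed_author_scores is a hypothetical helper. Keep in mind that the terms aggregation returns only 10 buckets by default, ordered by document count, so raise its size if relevant authors might be cut off.

```python
from typing import Any, Dict, List, Tuple


def summed_author_scores(response: Dict[str, Any], agg: str = "top-authors",
                         hits_agg: str = "top_books_hits") -> List[Tuple[Any, float]]:
    """Sum the _score of each author's top books and rank authors by it."""
    totals: List[Tuple[Any, float]] = []
    for bucket in response["aggregations"][agg]["buckets"]:
        # top_hits has already capped this list at "size" books per author.
        books = bucket[hits_agg]["hits"]["hits"]
        totals.append((bucket["key"], sum(book["_score"] for book in books)))
    # Highest aggregated score first.
    return sorted(totals, key=lambda kv: kv[1], reverse=True)


resp = {"aggregations": {"top-authors": {"buckets": [
    {"key": 7, "top_books_hits": {"hits": {"hits": [{"_score": 1.5}, {"_score": 0.5}]}}},
    {"key": 9, "top_books_hits": {"hits": {"hits": [{"_score": 3.0}]}}},
]}}}
print(summed_author_scores(resp))
```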
