Elasticsearch 5.1: applying additional filters to the "more like this" query - elasticsearch

I'm building a search engine on top of emails. MLT is great at finding emails with similar bodies or subjects, but sometimes I want to do something like: show me the emails with similar content to this one, but only from joe#yahoo.com and only during this date range. This seems to have been possible with ES 2.x, but it seems that 5.x doesn't allow filtering on fields other than the one being considered for similarity. Am I missing something?
I still can't figure out how to do what I described. Imagine I have an index of emails with two fields, for the sake of simplicity: body and sender. I know how to find messages that are restricted to a sender; the posted query would be something like:
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"sender": "mike#foo.com"
}
}
]
}
}
}
}
}
Similarly, to find messages that are similar to a single hero message based on the contents of the body, I can issue a query like:
{
"query": {
"more_like_this": {
"fields" : ["body"],
"like" : [{
"_index" : "foo",
"_type" : "email",
"_id" : "a1af33b9c3dd436dabc1b7f66746cc8f"
}],
"min_doc_freq" : 2,
"min_word_length" : 2,
"max_query_terms" : 12,
"include" : "true"
}
}
}
Both of these queries specify the results by adding clauses inside the query clause of the root object. However, any way I try to put them together gives me parse exceptions. I can't find any examples or documentation showing how to say: give me emails that are similar to this hero, but only from mike#foo.com.

You're almost there. You can combine the two using a bool/filter query like this, i.e. make your filter an array and put both constraints in it:
{
"query": {
"bool": {
"filter": [
{
"term": {
"sender": "mike#foo.com"
}
},
{
"more_like_this": {
"fields": [
"body"
],
"like": [
{
"_index": "foo",
"_type": "email",
"_id": "a1af33b9c3dd436dabc1b7f66746cc8f"
}
],
"min_doc_freq": 2,
"min_word_length": 2,
"max_query_terms": 12,
"include": "true"
}
}
]
}
}
}
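One caveat with the combined query above: clauses in filter context do not contribute to the score, so the hits satisfy the similarity criteria but are not ranked by it. If the ranking matters, one variant is to keep more_like_this in must and put only the restrictions in filter. The sketch below builds such a request body in Python. The date field name ("date") and the date values are assumptions for illustration, since the question does not show the mapping.

```python
import json


def similar_emails_query(index, doc_id, sender,
                         date_from=None, date_to=None, date_field="date"):
    """Build a search body ranked by similarity to one email,
    restricted to a sender and (optionally) a date range."""
    # Restrictions go in filter context: they narrow the hits without scoring.
    filters = [{"term": {"sender": sender}}]

    date_range = {}
    if date_from is not None:
        date_range["gte"] = date_from
    if date_to is not None:
        date_range["lte"] = date_to
    if date_range:
        # "date" is a hypothetical field name; use whatever the mapping defines.
        filters.append({"range": {date_field: date_range}})

    # The similarity clause goes in must so that it drives the _score.
    mlt = {
        "more_like_this": {
            "fields": ["body"],
            "like": [{"_index": index, "_type": "email", "_id": doc_id}],
            "min_doc_freq": 2,
            "min_word_length": 2,
            "max_query_terms": 12,
            "include": True,
        }
    }
    return {"query": {"bool": {"must": [mlt], "filter": filters}}}


body = similar_emails_query(
    "foo", "a1af33b9c3dd436dabc1b7f66746cc8f", "mike#foo.com",
    date_from="2017-01-01", date_to="2017-01-31",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be posted to the _search endpoint as-is. Leaving out both dates produces a body with only the sender restriction.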

Related

How to get the best matching document in Elasticsearch?

I have an index where I store all the places used in my documents. I want to use this index to see if the user mentioned one of the places in the text query I receive.
Unfortunately, I have two documents whose name is similar enough to trick Elasticsearch scoring: Stockholm and Stockholm-Arlanda.
My test phrase is intyg stockholm and this is the query I use to get the best matching document.
{
"size": 1,
"query": {
"bool": {
"should": [
{
"match": {
"name": "intyg stockholm"
}
}
],
"must": [
{
"term": {
"type": {
"value": "4"
}
}
},
{
"terms": {
"name": [
"intyg",
"stockholm"
]
}
},
{
"exists": {
"field": "data.coordinates"
}
}
]
}
}
}
As you can see, I use a terms query to find the interesting documents and I use a match query in the should part of the root bool query to use scoring to get the document I want (Stockholm) on top.
This query worked locally (where I run ES in a container), but it broke when I started testing on a cluster hosted in AWS (where I have the exact same dataset). I found this explaining what happens, and adding the search type argument actually fixes the issue.
Since the workaround is best not used in production, I'm looking for other ways to get the expected result.
Here are the two documents:
// Stockholm
{
"type" : 4,
"name" : "Stockholm",
"id" : "42",
"searchableNames" : [
"Stockholm"
],
"uniqueId" : "Place:42",
"data" : {
"coordinates" : "59.32932349999999,18.0685808"
}
}
// Stockholm-Arlanda
{
"type" : 4,
"name" : "Stockholm-Arlanda",
"id" : "1832",
"searchableNames" : [
"Stockholm-Arlanda"
],
"uniqueId" : "Place:1832",
"data" : {
"coordinates" : "59.6497622,17.9237807"
}
}

Custom ordering on elastic search

I'm executing a simple query which returns items matched by companyId.
In addition to only showing clients matching a specific company, I also want records matching a certain location to appear at the top. So if I somehow pass through a pseudo sort such as sort:"location=Johannesburg", it would return the data below with the items that match the specific location on top, followed by items with other locations.
Data:
{
"clientId" : 1,
"clientName" : "Name1",
"companyId" : 8,
"location" : "Cape Town"
},
{
"clientId" : 2,
"clientName" : "Name2",
"companyId" : 8,
"location" : "Johannesburg"
}
Query:
{
"query": {
"match": {
"companyId": "8"
}
},
"size": 10,
"_source": {
"includes": [
"firstName",
"companyId",
"location"
]
}
}
Is something like this possible in Elasticsearch, and if so, what is the name of this concept? (I'm not sure what to even Google for to solve this problem.)
It can be done in different ways.
The simplest (if you only need text matching) is to use a bool query with a should clause. From the docs:
"The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document."
Example:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "companyId": "8" } }
      ],
      "should": [
        { "match": { "location": "Johannesburg" } }
      ]
    }
  }
}
A more complex solution is to store geo points in location and use, for example, a distance feature query.
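As a rough sketch of the bool/should approach above, the request body can be built from the company and the preferred location. The function name and the boost parameter are illustrative assumptions, not part of the original answer. The boost simply scales how much a location match adds to the score.

```python
import json


def preferred_location_query(company_id, location, boost=1.0, size=10):
    """Match every client of a company; clients in `location` score higher."""
    return {
        "query": {
            "bool": {
                # Required: only clients of this company are returned.
                "must": [{"match": {"companyId": company_id}}],
                # Optional: a match here adds to the score, lifting the hit.
                "should": [
                    {"match": {"location": {"query": location, "boost": boost}}}
                ],
            }
        },
        "size": size,
    }


body = preferred_location_query(8, "Johannesburg", boost=2.0)
print(json.dumps(body, indent=2))
```

Because results are sorted by _score by default, no explicit sort clause is needed for the Johannesburg records to come first.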

ElasticSearch 5.x context suggester with multiple contexts

I want to use the context suggester from Elasticsearch, but my suggestion results need to match 2 context values.
Expanding the example from the docs, I want to do something like:
POST place/_search?pretty
{
"suggest": {
"place_suggestion" : {
"prefix" : "tim",
"completion" : {
"field" : "suggest",
"size": 10,
"contexts": {
"place_type": [ "cafe", "restaurants" ],
"rating": ["good"]
}
}
}
}
}
I would like to get results that have the context 'cafe' or 'restaurant' for place_type AND the context 'good' for rating.
When I try something like this, Elasticsearch performs an OR operation on the contexts, giving me all suggestions with the context 'cafe', 'restaurant' OR 'good'.
Can I somehow specify which boolean operator Elasticsearch should use for combining multiple contexts?
It looks like this functionality isn't supported from Elasticsearch 5.x onwards:
https://github.com/elastic/elasticsearch/issues/21291#issuecomment-375690371
Your best bet is to create a composite context, which seems to be how Elasticsearch 2.x achieved multiple contexts in a query:
https://github.com/elastic/elasticsearch/pull/26407#issuecomment-326771608
To do this, I guess you'll need a new field in your mapping. Let's call it cat-rating:
PUT place
{
"mappings": {
"properties": {
"suggest": {
"type": "completion",
"contexts": [
{
"name": "place_type-rating",
"type": "category",
"path": "cat-rating"
}
]
}
}
}
}
When you index new documents, you'll need to concatenate the place_type and rating fields, separated by -, to populate the cat-rating field.
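The concatenation step could look roughly like the Python sketch below. The helper names and the sample place ("Tim's Cafe") are made up for illustration. It assumes a document may carry several place types and ratings, in which case every combination is stored, and the same helper can produce the context list for the query.

```python
from itertools import product


def composite_contexts(place_types, ratings, sep="-"):
    """Return every place_type/rating pair joined into one context string."""
    return [f"{place_type}{sep}{rating}"
            for place_type, rating in product(place_types, ratings)]


def suggest_document(name, place_types, ratings):
    """Build a document whose cat-rating field feeds the composite context."""
    return {
        "suggest": {"input": [name]},
        "cat-rating": composite_contexts(place_types, ratings),
    }


def query_contexts(place_types, ratings):
    """Build the `contexts` section of the suggest request."""
    return {
        "place_type-rating": [
            {"context": value}
            for value in composite_contexts(place_types, ratings)
        ]
    }


doc = suggest_document("Tim's Cafe", ["cafe"], ["good"])
contexts = query_contexts(["cafe", "restaurant"], ["good"])
print(doc)
print(contexts)
```

Generating both the indexed values and the query contexts from one function keeps the separator and the ordering of the two parts consistent.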
Once that's done your query will need to look something like this:
POST place/_search?pretty
{
"suggest": {
"place_suggestion": {
"prefix": "tim",
"completion": {
"field": "suggest",
"size": 10,
"contexts": {
"place_type-rating": [
{
"context": "cafe-good"
},
{
"context": "restaurant-good"
}
]
}
}
}
}
}
That'll return suggestions for good cafes OR good restaurants.

Different boosting for the same field in different types in Elasticsearch 2.x with multi_match query

I am trying to do the following, as described in the documentation (which may be outdated by now).
https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping.html
I will adapt the scenario described there to what I want to achieve.
Imagine that we have two types in our index: blog_t1 for blog posts
about Topic 1, and blog_t2 for blog posts about Topic 2. Both types
have a title field.
Then, I want to apply query boosting to the title field for blog_t1
only.
In previous versions of Elasticsearch, you could reference the field
from the type by using blog_t1.title and blog_t2.title. So boosting
one of them was as simple as blog_t1.title^2.
But since Elasticsearch 2.x, some of the old support for types has been removed (for good reasons, like removing ambiguity). Those changes are described here.
https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_20_mapping_changes.html
So my question is, how can I do that boosting for the title, just for the type blog_t1, and not blog_t2, with Elasticsearch 2.x, in a multi_match query?
The query would be something like this, but this obviously does not work as type.field is not a thing anymore.
GET /my_index/_search
{
"query": {
"multi_match": {
"query": "Hello World",
"fields": [
"blog_t1.title^2",
"blog_*.title",
"author",
"content"
]
}
}
}
FYI, the only solution I found so far is to give the titles different names, like title_boosted for blog_t1 and just title for the others, which is problematic when making use of the information, as I can no longer use the "title" as a unique thing.
Thanks.
What about adding an extra "optional" constraint on the document type, so that documents matching it get a higher score (you can tune it with boosting)? For example:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Hello world" } }
      ],
      "should": [
        { "match": { "_type": "blog_t1" } }
      ]
    }
  }
}
Or with score functions:
{
"query": {
"function_score": {
"query": {
"match": {
"title": "Hello world"
}
},
"boost_mode": "multiply",
"functions": [
{
"filter": {
"term": {
"_type": "blog_t1"
}
},
"weight": 2
},
{
"filter": {
"term": {
"_type": "blog_t2"
}
},
"weight": 3
}
]
}
}
}

Highlight not working along with term lookup filter

I'm new to Elasticsearch and have been exploring it for the past few days. My requirement is to get the matched keywords highlighted.
So I have 2 indices
http://localhost:9200/lookup/type/1?pretty
Output
{
"_index" : "lookup",
"_type" : "type",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source":{"terms":["Apache Storm","Kafka","MR","Pig","Hive","Hadoop","Mahout"]}
}
And another one as follows:
http://localhost:9200/skillsetanalyzer/resume/_search?fields=keySkills
output
{"took":19,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":3,"max_score":1.0,"hits":[{"_index":"skillsetanalyzer","_type":"resume","_id":"1","_score":1.0,"fields":{"keySkills":["Core Java","J2EE","Struts 1.x","SOAP based Web Services using JAX-WS","Maven","Ant","JMS","Apache Storm","Kafka","RDBMS (MySQL","Tomcat","Weblogic","Eclipse","Toad","TIBCO product Suite (Administrator","Business Work","Designer","EMS)","CVS","SVN"]}},
The query below returns the correct results but does not highlight the matched keywords.
curl -XGET 'localhost:9200/skillsetanalyzer/resume/_search?pretty' -d '
{
"query":
{"filtered":
{"filter":
{"terms":
{"keySkills":
{"index":"lookup",
"type":"type",
"id":"1",
"path":"terms"
},
"_cache_key":"1"
}
}
}
},
"highlight": {
"fields":{
"keySkills":{}
}
}
}'
The "keySkills" field is not analyzed and its type is string. I can't work out what is wrong with the query.
Please help in providing the necessary pointers.
~Shweta
Highlighting works against the query, and here you are only filtering the results. You need to specify a highlight_query along with your filter, like this:
{
"query": {
"filtered": {
"filter": {
"terms": {
"keySkills": [
"MR","Pig","Hive"
]
}
}
}
},
"highlight": {
"fields": {
"keySkills": {
"highlight_query": {
"terms": {
"keySkills": [
"MR","Pig","Hive"
]
}
}
}
}
}
}
I hope this helps.
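Since the term list has to appear twice (once in the filter, once in highlight_query), it can help to generate the body from a single list so the two never drift apart. The Python sketch below is one way to do that. The function name is made up, and it assumes the terms have already been read from the lookup document's terms array. It keeps the filtered query used in this answer, which only exists in the older Elasticsearch versions discussed in this thread (it was removed in 5.0).

```python
import json


def filter_with_highlight(field, terms):
    """Build a body that filters on `terms` and highlights the same terms."""
    terms = list(terms)
    return {
        "query": {
            "filtered": {
                # The filter selects documents but gives the highlighter
                # nothing to work with.
                "filter": {"terms": {field: list(terms)}}
            }
        },
        "highlight": {
            "fields": {
                field: {
                    # Repeat the terms as a query so highlighting can apply.
                    "highlight_query": {"terms": {field: list(terms)}}
                }
            }
        },
    }


body = filter_with_highlight("keySkills", ["MR", "Pig", "Hive"])
print(json.dumps(body, indent=2))
```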

Resources