Highlight not working along with term lookup filter - elasticsearch

I'm new to elastic search and have started exploring it from the past few days. My requirement is to get the matched keywords highlighted.
So I have 2 indices
http://localhost:9200/lookup/type/1?pretty
Output
{
"_index" : "lookup",
"_type" : "type",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source":{"terms":["Apache
Storm","Kafka","MR","Pig","Hive","Hadoop","Mahout"]}
}
And another one as following:-
http://localhost:9200/skillsetanalyzer/resume/_search?fields=keySkills
output
{"took":19,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":3,"max_score":1.0,"hits":[{"_index":"skillsetanalyzer","_type":"resume","_id":"1","_score":1.0,"fields":{"keySkills":["Core
Java","J2EE","Struts 1.x","SOAP based
Web Services using JAX-WS","Maven","Ant","JMS","Apache
Storm","Kafka","RDBMS
(MySQL","Tomcat","Weblogic","Eclipse","Toad","TIBCO
product Suite (Administrator","Business
Work","Designer","EMS)","CVS","SVN"]}},
And below query returns the correct results but does not highlight the matched keywords.
curl -XGET 'localhost:9200/skillsetanalyzer/resume/_search?pretty' -d '
{
"query":
{"filtered":
{"filter":
{"terms":
{"keySkills":
{"index":"lookup",
"type":"type",
"id":"1",
"path":"terms"
},
"_cache_key":"1"
}
}
}
},
"highlight": {
"fields":{
"keySkills":{}
}
}
}'
Field "KeySkills" is not analyzed and its type is String. I'm not able to make out what is wrong with the
query.
Please help in providing the necessary pointers.
~Shweta

Highlighting works against the Query, you are just filtering the results. You need to specify highlight_query along with your filters like this
{
"query": {
"filtered": {
"filter": {
"terms": {
"keySkills": [
"MR","Pig","Hive"
]
}
}
}
},
"highlight": {
"fields": {
"keySkills": {
"highlight_query": {
"terms": {
"keySkills": [
"MR","Pig","Hive"
]
}
}
}
}
}
}
I hope this helps.

Related

Elasticsearch: How to filter results with a specific word in a value using elasticsearch

I need to add a parameter to my search that filters results containing a specific word in a value. The query is searching for user history records and contains a url key. I need to filter out /history and any other url containing that string.
Here's my current query:
GET /user_log/_search
{
"size" : 50,
"query": {
"match": {
"user_id": 56678
}
}
}
Here's an example of a record, boiled down to just the value we're looking at:
"_source": {
"url": "/history?page=2&direction=desc",
},
How can the parameters of the search be changed to filter out this result.
You can use the filter param of boolean query in Elasticsearch.
if your url field is of type keyword, you can use the below query
{
"query": {
"bool": {
"must": {
"match": {
"user_id": 56678
}
},
"filter": { --> note filter
"term": {
"url": "/history"
}
}
}
}
}
I found a way to solve my specific issue. Instead of filtering on the url I'm filtering on a different value. Here's what I'm using now:
{
"size" : 50,
"query": {
"bool" : {
"must" : {
"match" : { "user_id" : 56678 }
},
"must_not": {
"match" : { "controller": "History" }
}
}
}
}
I'm still going to leave this question open for a while to see if anyone has other ways of solving the original problem.

How to get the best matching document in Elasticsearch?

I have an index where I store all the places used in my documents. I want to use this index to see if the user mentioned one of the places in the text query I receive.
Unfortunately, I have two documents whose name is similar enough to trick Elasticsearch scoring: Stockholm and Stockholm-Arlanda.
My test phrase is intyg stockholm and this is the query I use to get the best matching document.
{
"size": 1,
"query": {
"bool": {
"should": [
{
"match": {
"name": "intyig stockholm"
}
}
],
"must": [
{
"term": {
"type": {
"value": "4"
}
}
},
{
"terms": {
"name": [
"intyg",
"stockholm"
]
}
},
{
"exists": {
"field": "data.coordinates"
}
}
]
}
}
}
As you can see, I use a terms query to find the interesting documents and I use a match query in the should part of the root bool query to use scoring to get the document I want (Stockholm) on top.
This code worked locally (where I run ES in a container) but it broke when I started testing on a cluster hosted in AWS (where I have the exact same dataset). I found this explaining what happens and adding the search type argument actually fixes the issue.
Since the workaround is best not used on production, I'm looking for ways to have the expected result.
Here are the two documents:
// Stockholm
{
"type" : 4,
"name" : "Stockholm",
"id" : "42",
"searchableNames" : [
"Stockholm"
],
"uniqueId" : "Place:42",
"data" : {
"coordinates" : "59.32932349999999,18.0685808"
}
}
// Stockholm-Arlanda
{
"type" : 4,
"name" : "Stockholm-Arlanda",
"id" : "1832",
"searchableNames" : [
"Stockholm-Arlanda"
],
"uniqueId" : "Place:1832",
"data" : {
"coordinates" : "59.6497622,17.9237807"
}
}

Search for documents with exactly different fields values

I'm adding documents with the following strutucte
{
"proposta": {
"matriculaIndicacao": 654321,
"filial": 100,
"cpf": "12345678901",
"idStatus": "3",
"status": "Reprovada",
"dadosPessoais": {
"nome": "John Five",
"dataNascimento": "1980-12-01",
"email": "fulanodasilva#fulano.com.br",
"emailValidado": true,
"telefoneCelular": "11 99876-9999",
"telefoneCelularValidado": true,
"telefoneResidencial": "11 2211-1122",
"idGenero": "1",
"genero": "M"
}
}
}
I'm trying to perform a search with multiple field values.
I can successfull search for a document with a specific cpf atribute with the following search
{
"query": {
"term" : {
"proposta.cpf" : "23798770823"
}
}
}
But now I need to add an AND clause, like
{
"query": {
"term" : {
"proposta.cpf" : "23798770823"
,"proposta.dadosPessoais.dataNascimento": "1980-12-01"
}
}
}
but it's returning an error message.
P.S: If possible I would like to perform a search where if the field doesn't exist, it returns the document that matches only the proposta.cpf field.
I really appreciate any help.
The idea is to combine your constraints within a bool/should query
{
"query": {
"bool": {
"should": [
{
"term": {
"proposta.cpf": "23798770823"
}
},
{
"term": {
"proposta.dadosPessoais.dataNascimento": "1980-12-01"
}
}
]
}
}
}

Elasticsearch 5.1: applying additional filters to the "more like this" query

Building a search engine on top of emails. MLT is great at finding emails with similar bodies or subjects, but sometimes I want to do something like: show me the emails with similar content to this one, but only from joe#yahoo.com and only during this date range. This seems to have been possible with ES 2.x, but it seems that 5.x doesn't allow allow filtration on fields other than that being considered for similarity. Am I missing something?
i still can't figure how to do what i described. Imagine I have an index of emails with two types for the sake of simplicity: body and sender. I know now to find messages that are restricted to a sender, the posted query would be something like:
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"sender": "mike#foo.com"
}
}
]
}
}
}
}
}
Similarly, if I wish to know how to find messages that are similar to a single hero message using the contents of the body, i can issue a query like:
{
"query": {
"more_like_this": {
"fields" : ["body"],
"like" : [{
"_index" : "foo",
"_type" : "email",
"_id" : "a1af33b9c3dd436dabc1b7f66746cc8f"
}],
"min_doc_freq" : 2,
"min_word_length" : 2,
"max_query_terms" : 12,
"include" : "true"
}
}
}
both of these queries specify the results by adding clauses inside the query clause of the root object. However, any way I try to put these together gives me parse exceptions. I can't find any examples of documentations that would say, give me emails that are similar to this hero, but only from mike#foo.com
You're almost there, you can combine them both using a bool/filter query like this, i.e. make an array out of your filter and put both constraints in there:
{
"query": {
"bool": {
"filter": [
{
"term": {
"sender": "mike#foo.com"
}
},
{
"more_like_this": {
"fields": [
"body"
],
"like": [
{
"_index": "foo",
"_type": "email",
"_id": "a1af33b9c3dd436dabc1b7f66746cc8f"
}
],
"min_doc_freq": 2,
"min_word_length": 2,
"max_query_terms": 12,
"include": "true"
}
}
]
}
}
}

Source field in not been shown while using the following query

Im using the following script fields query. It is getting me the score I wanted to have , but not the _source field. How can I solve the problem?. Here is the query Iam running.
{
"terms": {
"closing": ["wed"
]
}
}
"script_fields": {
"index": {
"script": "doc['collection'].value / doc['people'].value"
}
}
}
The issue here is when using script_fields, the response will not include "_source" by default. You need to specify it explicitly in the query. Modify your query like below and see if you are getting the results as expected
{
"terms": {
"closing": ["wed"
]
}
},
"fields": [
"_source"
],
"script_fields": {
"my_score": {
"script": "doc['collection'].value / doc['people'].value"
}
}
}

Resources