Get matched keywords while searching on an analysed field - elasticsearch

Is there a way to get only the matched keywords while searching on an analysed field? In my case, I have a 'content' field (an analysed string) against which I run a query like this:
GET /posts/post/_search?pretty=true
{
  "query": {
    "query_string": {
      "query": "content:(obama OR hilary)"
    }
  },
  "fields": ["id", "interaction_id", "sentiment", "tweet_created_at", "content"]
}
I get output like this:
"hits": [
  {
    "_index": "posts_v1",
    "_type": "post",
    "_id": "51764639fdccca097f03d095",
    "_score": 2.024847,
    "fields": {
      "content": "UGANDA HILARY",
      "id": "51764639fdccca097f03d095",
      "sentiment": 0,
      "tweet_created_at": "2012-11-24T14:59:25Z",
      "interaction_id": "1e236478961ca480e0744001f05ca8b8"
    }
  },
  {
    "_index": "posts_v1",
    "_type": "post",
    "_id": "51c2bae26c8f1806cb000001",
    "_score": 1.9791828,
    "fields": {
      "content": "Obama in Berlin — looking back",
      "id": "51c2bae26c8f1806cb000001",
      "sentiment": 0,
      "tweet_created_at": "2013-06-20T08:18:39Z",
      "interaction_id": "1e2d98202c55a980e07493a024172cb6"
    }
  },
  {
    "_index": "posts_v1",
    "_type": "post",
    "_id": "51c3a6b06c8f185fcb000001",
    "_score": 1.7071226,
    "fields": {
      "content": "Knowing Barack Obama, Hilary Clintonr",
      "id": "51c3a6b06c8f185fcb000001",
      "sentiment": 0,
      "tweet_created_at": "2013-06-21T01:04:45Z",
      "interaction_id": "1e2da0e8fb5fa480e07407b3fa87ab72"
    }
  }
]
What I need is something like this:
"hits": [
  {
    "_index": "posts_v1",
    "_type": "post",
    "_id": "51764639fdccca097f03d095",
    "_score": 2.024847,
    "fields": {
      "content": "UGANDA HILARY",
      "id": "51764639fdccca097f03d095",
      "sentiment": 0,
      "tweet_created_at": "2012-11-24T14:59:25Z",
      "interaction_id": "1e236478961ca480e0744001f05ca8b8",
      "content_tags": ["hilary"]
    }
  },
  {
    "_index": "posts_v1",
    "_type": "post",
    "_id": "51c2bae26c8f1806cb000001",
    "_score": 1.9791828,
    "fields": {
      "content": "Obama in Berlin — looking back",
      "id": "51c2bae26c8f1806cb000001",
      "sentiment": 0,
      "tweet_created_at": "2013-06-20T08:18:39Z",
      "interaction_id": "1e2d98202c55a980e07493a024172cb6",
      "content_tags": ["obama"]
    }
  },
  {
    "_index": "posts_v1",
    "_type": "post",
    "_id": "51c3a6b06c8f185fcb000001",
    "_score": 1.7071226,
    "fields": {
      "content": "Knowing Barack Obama, Hilary Clintonr",
      "id": "51c3a6b06c8f185fcb000001",
      "sentiment": 0,
      "tweet_created_at": "2013-06-21T01:04:45Z",
      "interaction_id": "1e2da0e8fb5fa480e07407b3fa87ab72",
      "content_tags": ["obama", "hilary"]
    }
  }
]
Note the content_tags field in each hit of the second listing. Is there a way to achieve this?

Elasticsearch doesn't directly support returning which terms matched which field, though I think it could be implemented reasonably easily as an additional "highlighter". I think you have two options at this point:
1. Do something hacky with highlighting: ask for a fragment length of max(all_strings.map(strlen).max, min_highlight_length) so each field comes back whole, strip the text that isn't highlighted, and dedupe what remains. I believe min_highlight_length is 13 characters or so, but that might only apply to the FVH (fast vector highlighter), which I don't suggest you use, so you can probably ignore it.
2. Do two searches (one per keyword here), either via multi-search or sequentially, and tag each hit with the keyword(s) whose search returned it.
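Both options can be sketched in Python. This is a minimal sketch, not an official client API. The helper names are hypothetical, and it assumes the default <em> highlight tags and the content field from the question.

For the highlighting option, it uses number_of_fragments: 0 instead of computing a fragment length. That setting asks Elasticsearch to return the whole field value highlighted rather than as fragments. For the multi-search option, it assumes one match query per keyword, sent to _msearch.

```python
import json
import re

# Hypothetical helpers: they only build request bodies and post-process
# responses; sending them to Elasticsearch is left to your client.

EM_TAG = re.compile(r"<em>(.*?)</em>", re.IGNORECASE)


def highlight_request(keywords, field="content"):
    """Option 1: one search that highlights the whole field.

    number_of_fragments = 0 returns the entire field value, highlighted,
    instead of separate fragments.
    """
    return {
        "query": {"query_string": {"query": f"{field}:({' OR '.join(keywords)})"}},
        "highlight": {"fields": {field: {"number_of_fragments": 0}}},
    }


def tags_from_highlight(hit, field="content"):
    """Collect the <em>-wrapped terms of one hit, lowercased and deduplicated."""
    tags = []
    for fragment in hit.get("highlight", {}).get(field, []):
        for term in EM_TAG.findall(fragment):
            term = term.lower()
            if term not in tags:
                tags.append(term)
    return tags


def msearch_body(index, keywords, field="content"):
    """Option 2: NDJSON for _msearch, one header/body pair per keyword."""
    lines = []
    for keyword in keywords:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps({"query": {"match": {field: keyword}}}))
    return "\n".join(lines) + "\n"  # _msearch requires a trailing newline


def tags_from_msearch(keywords, msearch_response):
    """Map each hit _id to the keywords whose search returned it.

    The responses come back in the same order as the searches in the request.
    """
    tags = {}
    for keyword, response in zip(keywords, msearch_response["responses"]):
        for hit in response["hits"]["hits"]:
            tags.setdefault(hit["_id"], []).append(keyword)
    return tags
```

With option 2, each per-keyword search only returns its top size hits (10 by default). Raise size or page through the results if every matching document needs its tags.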

Related

Elasticsearch - Delete query among nested object

I'm new to Elasticsearch and I can't find the right delete query for this case.
Here is an example of a document in myIndex:
{
  "_index": "myIndex",
  "_type": "_doc",
  "_id": "IPc5kn8Bq7SuVr5qM9dq",
  "_score": 1,
  "_source": {
    "code": "1234567",
    "matches": [
      {
        "hostname": "hostnameA.com",
        "url": "https://www.hostnameA.com/...."
      },
      {
        "hostname": "hostnameB.com",
        "url": "https://www.hostnameB.com/...."
      },
      {
        "hostname": "hostnameC.com",
        "url": "https://www.hostnameC.com/...."
      },
      {
        "hostname": "hostnameD.com",
        "url": "https://www.hostnameD.com/...."
      }
    ]
  }
}
Let's say this index contains 10k documents.
I would like a query that removes every item from the matches array whose hostname equals hostnameC.com, while keeping all the others.
Does anyone have an idea how to do this?
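One possible approach is an _update_by_query request whose Painless script calls removeIf on the matches list. The Python below is a hedged sketch: it only builds that request body and mirrors the script's effect on a local _source dict.

Assumptions:
- The helper names are hypothetical.
- matches uses the default dynamic (object, not nested) mapping, so hostname has a .keyword sub-field. With a nested mapping, the term query would have to be wrapped in a nested query.
- The script uses the "source" key, as in Elasticsearch 5.6 and later.

Elasticsearch index names must be lowercase, so the real index would be something like myindex.

```python
# Painless script run once per matching document: drop every element of
# ctx._source.matches whose hostname equals params.hostname.
REMOVE_SCRIPT = "ctx._source.matches.removeIf(m -> m.hostname == params.hostname)"


def remove_hostname_request(hostname):
    """Body for POST /<index>/_update_by_query (hypothetical helper)."""
    return {
        # Only rewrite documents that actually contain the hostname.
        # Assumes dynamic mapping created a matches.hostname.keyword sub-field.
        "query": {"term": {"matches.hostname.keyword": hostname}},
        "script": {
            "lang": "painless",
            "source": REMOVE_SCRIPT,
            "params": {"hostname": hostname},
        },
    }


def remove_hostname_locally(source, hostname):
    """Apply the same filtering to a local _source dict, for illustration."""
    source["matches"] = [m for m in source.get("matches", []) if m.get("hostname") != hostname]
    return source
```

Restricting the update with the term query means documents without hostnameC.com are not reindexed at all. With 10k documents, this matters more for speed than for correctness.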

Difference between match and wildcard queries

What is the difference between the match and wildcard queries? If the requirement is to search for a combination of words in a paragraph or log, which approach is better?
A match query analyzes the search text and returns documents that contain the resulting terms as whole tokens (case-insensitively, with the standard analyzer). A wildcard query matches a pattern against the indexed terms, so it also returns documents whose terms merely contain the search term, such as breadsticks for *bread*.
Adding a working example
Index Data:
{
"name":"breadsticks with soup"
}
{
"name":"multi grain bread"
}
Search query using a match query:
{
  "query": {
    "match": {
      "name": "bread"
    }
  }
}
Search result:
"hits": [
  {
    "_index": "67706115",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.9808291,
    "_source": {
      "name": "multi grain bread"
    }
  }
]
Search query using a wildcard query:
{
  "query": {
    "wildcard": {
      "name": "*bread*"
    }
  }
}
Search result:
"hits": [
  {
    "_index": "67706115",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.0,
    "_source": {
      "name": "multi grain bread"
    }
  },
  {
    "_index": "67706115",
    "_type": "_doc",
    "_id": "2",
    "_score": 1.0,
    "_source": {
      "name": "breadsticks with soup"
    }
  }
]
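The difference between the two results comes down to whole-token versus pattern matching. This can be illustrated with a deliberately simplified Python model.

The analyze function below is a stand-in, not Elasticsearch's real standard analyzer: it just lowercases and splits on non-word characters. fnmatch's * and ? play the role of the wildcard query's operators (fnmatch also supports [seq], which the wildcard query does not).

```python
import re
from fnmatch import fnmatchcase


def analyze(text):
    """Simplified standard-analyzer stand-in: lowercase, split into words."""
    return re.findall(r"\w+", text.lower())


def match_query(docs, text):
    """A doc matches if any analyzed query term equals one of its tokens."""
    query_terms = set(analyze(text))
    return [d for d in docs if query_terms & set(analyze(d))]


def wildcard_query(docs, pattern):
    """A doc matches if any of its tokens fits the (unanalyzed) pattern."""
    return [d for d in docs if any(fnmatchcase(tok, pattern) for tok in analyze(d))]


docs = ["breadsticks with soup", "multi grain bread"]
print(match_query(docs, "bread"))       # ['multi grain bread']
print(wildcard_query(docs, "*bread*"))  # ['breadsticks with soup', 'multi grain bread']
```

The model makes the key point visible: the match query only accepts the whole token "bread", while the wildcard pattern also accepts the token "breadsticks".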

Elasticsearch: negate a phrase and words in a simple query string

I'm trying to negate some words and phrases in an Elasticsearch request using the simple query string.
This is what I do:
&q=-"the witcher 3"-game-novel
Basically, I'm trying to negate a phrase AND the words after it, but that doesn't seem to work.
If I negate the words alone, it works.
How can I negate phrases and sentences in a simple query string?
Adding a working example with index data, search query, and search result.
Index Data:
{
"name":"test"
}
{
"name":"game"
}
{
"name":"the witcher"
}
{
"name":"the witcher 3"
}
{
"name":"the"
}
Search Query:
{
  "query": {
    "simple_query_string": {
      "query": "-(game | novel) -(the witcher 3)",
      "fields": ["name"],
      "default_operator": "and"
    }
  }
}
Search Result:
"hits": [
  {
    "_index": "stof_64133051",
    "_type": "_doc",
    "_id": "4",
    "_score": 2.0,
    "_source": {
      "name": "the"
    }
  },
  {
    "_index": "stof_64133051",
    "_type": "_doc",
    "_id": "3",
    "_score": 2.0,
    "_source": {
      "name": "the witcher"
    }
  },
  {
    "_index": "stof_64133051",
    "_type": "_doc",
    "_id": "1",
    "_score": 2.0,
    "_source": {
      "name": "test"
    }
  }
]
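To see why exactly those three documents survive, here is a small Python model of the query's boolean logic. It illustrates the semantics and is not how Elasticsearch evaluates the query; the tokens helper is a simplified analyzer stand-in.

With default_operator set to and:
- -(game | novel) drops any document containing game or novel.
- -(the witcher 3) drops documents containing all three of the, witcher and 3, in any order.
- A quoted -"the witcher 3" would instead drop only documents containing that exact phrase.

```python
import re

# Index data from the example above.
DOCS = ["test", "game", "the witcher", "the witcher 3", "the"]


def tokens(text):
    """Simplified analyzer stand-in: lowercase and split into words."""
    return re.findall(r"\w+", text.lower())


def has_phrase(toks, phrase):
    """True if the words of phrase appear consecutively in toks."""
    n = len(phrase)
    return any(toks[i:i + n] == phrase for i in range(len(toks) - n + 1))


def survives(text, any_of=("game", "novel"), all_of=("the", "witcher", "3"), phrase=None):
    toks = tokens(text)
    if any(w in toks for w in any_of):               # -(game | novel)
        return False
    if all_of and all(w in toks for w in all_of):    # -(the witcher 3), default_operator=and
        return False
    if phrase and has_phrase(toks, tokens(phrase)):  # -"the witcher 3"
        return False
    return True


print([d for d in DOCS if survives(d)])  # ['test', 'the witcher', 'the']
```

The two forms differ on reordered text. With the grouped form, survives("witcher 3 the") is False. With the phrase form, survives("witcher 3 the", all_of=(), phrase="the witcher 3") is True, because the words are not consecutive.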

match query returns exact values only in elasticsearch

I have the following documents:
{
  "_index": "testrest",
  "_type": "testrest",
  "_id": "sadfasdfw1",
  "_score": 1,
  "_source": {
    "s_item_name": "Create",
    "request_id": "35",
    "confidence": "0.5"
  }
},
{
  "_index": "testrest",
  "_type": "testrest",
  "_id": "asdfds",
  "_score": 1,
  "_source": {
    "s_item_name": "Update",
    "request_id": "35",
    "confidence": "0.3333"
  }
}
I am trying to get results for request_id of 35 and their confidence values.
For example, if the input is only "0.", both results should be displayed.
If the input is 0.5, only the first doc should be returned, and if it is 0.3, only the second.
Here's what I tried:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "confidence": "0.33" } }
      ],
      "filter": {
        "term": {
          "request_id": "35"
        }
      }
    }
  }
}
This returns 0 hits, because it only matches exact values such as 0.5 or 0.3333.
I thought match would work for this, unlike term.
How do I make the query similar to LIKE operator in SQL?
For LIKE-style matching, I suggest looking at the wildcard, prefix, or match_phrase queries in Elasticsearch. Alternatively, if you are using the latest version, you can write SQL statements using an ES SQL plugin.
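A minimal sketch of the prefix suggestion in Python. It builds the request body and mirrors its effect on the two sample documents. A prefix query compares the input against the start of the stored value, much like SQL's LIKE '0.3%'.

It assumes confidence is stored as a string with a keyword sub-field (confidence.keyword, which dynamic mapping creates in Elasticsearch 5 and later). That field path and the helper names are assumptions.

```python
def confidence_prefix_query(request_id, prefix):
    """Body for POST /testrest/_search, roughly: confidence LIKE '<prefix>%'."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"request_id": request_id}},
                    # Assumed keyword sub-field, so the whole stored value is compared.
                    {"prefix": {"confidence.keyword": prefix}},
                ]
            }
        }
    }


def filter_locally(docs, request_id, prefix):
    """What the query above should return for the sample documents."""
    return [d["s_item_name"] for d in docs
            if d["request_id"] == request_id and d["confidence"].startswith(prefix)]


DOCS = [
    {"s_item_name": "Create", "request_id": "35", "confidence": "0.5"},
    {"s_item_name": "Update", "request_id": "35", "confidence": "0.3333"},
]

print(filter_locally(DOCS, "35", "0."))   # ['Create', 'Update']
print(filter_locally(DOCS, "35", "0.5"))  # ['Create']
print(filter_locally(DOCS, "35", "0.3"))  # ['Update']
```

Both clauses sit in a filter context because neither the request_id check nor the prefix match needs to contribute to scoring.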

Is it possible to perform user count / cardinality with logical relationship in ElasticSearch?

I have user documents with the following format:
{
  userId: "<userId>",
  userAttributes: [
    "<Attribute1>",
    "<Attribute2>",
    ...
    "<AttributeN>"
  ]
}
I want to get the number of unique users that satisfy a logical statement, for example: how many users have attribute1 AND attribute2 OR attribute3?
I've read about the cardinality aggregation, but it seems to count distinct values of a single field and lacks the logical abilities of "AND" and "OR".
Note that I have around 1,000,000,000 documents and I need the results as fast as possible; this is why I was looking at cardinality estimation.
What about this approach, treating userAttributes as a simple array of strings (analyzed in my case, but each value is a single lowercase term):
POST /users/user/_bulk
{"index":{"_id":1}}
{"userId":123,"userAttributes":["xxx","yyy","zzz"]}
{"index":{"_id":2}}
{"userId":234,"userAttributes":["xxx","yyy","aaa"]}
{"index":{"_id":3}}
{"userId":345,"userAttributes":["xxx","yyy","bbb"]}
{"index":{"_id":4}}
{"userId":456,"userAttributes":["xxx","ccc","zzz"]}
{"index":{"_id":5}}
{"userId":567,"userAttributes":["xxx","ddd","ooo"]}
GET /users/user/_search
{
  "query": {
    "query_string": {
      "query": "userAttributes:(((xxx AND yyy) NOT zzz) OR ooo)"
    }
  },
  "aggs": {
    "unique_ids": {
      "cardinality": {
        "field": "userId"
      }
    }
  }
}
which gives the following:
"hits": [
  {
    "_index": "users",
    "_type": "user",
    "_id": "2",
    "_score": 0.16471066,
    "_source": {
      "userAttributes": [
        "xxx",
        "yyy",
        "aaa"
      ]
    }
  },
  {
    "_index": "users",
    "_type": "user",
    "_id": "3",
    "_score": 0.04318809,
    "_source": {
      "userAttributes": [
        "xxx",
        "yyy",
        "bbb"
      ]
    }
  },
  {
    "_index": "users",
    "_type": "user",
    "_id": "5",
    "_score": 0.021594046,
    "_source": {
      "userAttributes": [
        "xxx",
        "ddd",
        "ooo"
      ]
    }
  }
]
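The query's logic and the distinct count can be checked against the five bulk-indexed users with a short Python model. It reads ((xxx AND yyy) NOT zzz) OR ooo as "has xxx and yyy but not zzz, or has ooo".

This illustrates the semantics only. Elasticsearch's cardinality aggregation is an approximate, HyperLogLog++-based count, whose results below its precision_threshold are expected to be close to accurate. The exact count here is what that aggregation estimates.

```python
# The documents indexed by the _bulk request above.
USERS = [
    {"userId": 123, "userAttributes": ["xxx", "yyy", "zzz"]},
    {"userId": 234, "userAttributes": ["xxx", "yyy", "aaa"]},
    {"userId": 345, "userAttributes": ["xxx", "yyy", "bbb"]},
    {"userId": 456, "userAttributes": ["xxx", "ccc", "zzz"]},
    {"userId": 567, "userAttributes": ["xxx", "ddd", "ooo"]},
]


def matches(attributes):
    """((xxx AND yyy) NOT zzz) OR ooo, evaluated on one user's attributes."""
    attrs = set(attributes)
    return ({"xxx", "yyy"} <= attrs and "zzz" not in attrs) or "ooo" in attrs


def unique_matching_users(users):
    """Exact distinct count of userId over the matching documents."""
    return len({u["userId"] for u in users if matches(u["userAttributes"])})


print([u["userId"] for u in USERS if matches(u["userAttributes"])])  # [234, 345, 567]
print(unique_matching_users(USERS))  # 3
```

The three matching users correspond to documents 2, 3 and 5 in the hits above.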
