Counting the search term/phrase in a specific field in Elasticsearch

I have this type of data:
{
  "name_id": 2145,
  "address": "Antartica",
  "characteristics": "He is a very nice person with very nice personality. the nicest thing about him is his nice dog"
}
Now I am running this query:
GET friends/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "name_id.keyword": "B08F2BWX2V"
          }
        },
        {
          "match_phrase": {
            "characteristics": "nice"
          }
        }
      ]
    }
  }
}
Is there a way I can get the results together with the word count, i.e.
nice : 4

Elasticsearch has an API that can return the token count information you need: the Term vectors API.
I'm not sure it is exactly what you need, but the post below answers a question similar to yours:
https://stackoverflow.com/a/69734423/18778181
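As a rough illustration of how the count could be read out of a Term vectors response, here is a minimal Python sketch. The `sample_response` dict is hand-written to mirror the general shape of such a response (`term_vectors`, then the field name, then `terms`, then the term, then `term_freq`). The helper name `term_freq` and the numbers are assumptions for illustration, not output from a real cluster.

```python
from typing import Any, Dict

# Hand-written stand-in for a Term vectors response; only the parts used here.
# The values are illustrative assumptions, not real cluster output.
sample_response: Dict[str, Any] = {
    "found": True,
    "term_vectors": {
        "characteristics": {
            "terms": {
                "nice": {"term_freq": 3},
                "nicest": {"term_freq": 1},
                "dog": {"term_freq": 1},
            }
        }
    },
}


def term_freq(response: Dict[str, Any], field: str, term: str) -> int:
    """Return how often `term` occurs in `field` of one document, or 0 if absent."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return terms.get(term, {}).get("term_freq", 0)


print("nice :", term_freq(sample_response, "characteristics", "nice"))
```

Assuming an analyzer that does not stem (such as the default standard analyzer), "nicest" is indexed as its own term, so the count for "nice" in the sample sentence would be 3 rather than 4.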

Related

Match query fuzzily to an array of candidates

I have an index in elastic with the following document structure:
{
"questions": [
"What is your name?",
"How are you called?",
"What should I call you?",
...
],
"answer": "<answer>"
}
I would like to match queries to one of the entries in the questions array.
For example, the query "What's your name?".
The returning document should be the one with the closest matching entry of questions in all the documents in the index.
I have tried:
{
  "query": {
    "match": { "questions": { "query": "<question>", "fuzziness": "auto" } }
  }
}
But that sometimes returns a "wrong" document, even if the query is one of the entries of questions in one of the documents exactly.
I've also tried:
{
  "query": {
    "match_phrase": { "questions": "<query>" }
  }
}
But that doesn't allow fuzziness, and since the queries are human input, it doesn't catch enough cases.
And lastly I tried:
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "questions": { "fuzziness": "auto", "value": "<first word of the query>" }
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "questions": { "fuzziness": "auto", "value": "<second word of the query>" }
              }
            }
          }
        },
        ...
      ]
    }
  }
}
But that (at least as far as I seem to notice) only matches questions exactly with fuzzy words.
What I would like (at least as far as I understand), is a fuzzy TF-IDF across all entries of questions, get the best match and then rank the documents according to the best matches of one of the entries of questions (not the entirety of the questions array)
I'm a pretty inexperienced novice when it comes to Elastic, so I appreciate any tips and tricks or outright solutions you might have for me, thank you!

ElasticSearch Query DSL Combine Terms and Wildcard

I have two distinct queries which work well enough on their own:
{"wildcard":{"city":"*Beach*"}}
{"terms":{"state":["Florida","Georgia"]}}
but trying to combine them into one query is proving to be quite the challenge.
I thought simply writing {{"wildcard":{"city":"*Beach*"}},{"terms":{"state":["Florida","Georgia"]}}} would do it, but it does not. I then tried a few different iterations using arrays, bool queries, etc. Can someone point me in the right direction?
A bool query is the right way to go.
Below is an example for your use case:
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": { "city": "*Beach*" }
        },
        {
          "terms": {
            "state": [ "Florida", "Georgia" ]
          }
        }
      ]
    }
  }
}
If there are no results, it means that no entry matches both criteria.
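When the request body is assembled in client code, the same combination can be built programmatically. The following is a minimal Python sketch using only the standard library; the helper name `combine_must` is hypothetical and simply wraps whatever clauses it is given in a bool query with a `must` list.

```python
import json
from typing import Any, Dict


def combine_must(*clauses: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the given query clauses in a bool query where every clause must match."""
    return {"query": {"bool": {"must": list(clauses)}}}


# The two standalone queries from the question.
wildcard_clause = {"wildcard": {"city": "*Beach*"}}
terms_clause = {"terms": {"state": ["Florida", "Georgia"]}}

body = combine_must(wildcard_clause, terms_clause)
print(json.dumps(body, indent=2))
```

Each clause stays a complete object of its own inside the `must` array, which is what the attempt with two objects placed side by side in one pair of braces was missing.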

Elasticsearch query to search two word combination with space

I have an Elasticsearch query to search the data based on name.
My query is:
$http.post(elasticSearchURL,{ "filter": { "and": [{ "term": { "Name": "allan" } } ] } })
The above query works fine for a single-word search, but when I give two words separated by a space it doesn't pick up any data.
My query does not work for the scenario below:
{ "filter": { "and": [{ "term": { "Name": "allan edward" } } ] } }
I don't know what keyword I should add to satisfy my search scenario.
Thanks in advance.
A phrase match query is what you are looking for.
A query like the one below should work fine:
{
  "query": {
    "match_phrase": {
      "Name": "allan edward"
    }
  }
}

Elasticsearch complex proximity query

Given that I have a query like below:
council* W/5 (tip OR tips)
The above query can be translated as: Find anything that has council* and (tip OR tips) no more than 5 words apart.
So following text will match:
Shellharbour City Council Tip
council best tip
councils top 10 tips
But this one should not match:
... City Council at Shellharbour. There is not any good tip at all.
I need help to build an elasticsearch query for that. I was thinking about Regex query but I'm not quite sure about better alternatives. Thanks
You can use a combination of the span_near query, span_multi and span_or. We can use the query below to perform the same search.
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "prefix": { "text": "council" }
            }
          }
        },
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "text": {
                    "value": "tip"
                  }
                }
              },
              {
                "span_term": {
                  "text": {
                    "value": "tips"
                  }
                }
              }
            ]
          }
        }
      ],
      "slop": 5,
      "in_order": true
    }
  }
}
The important parts to look out for are the span_term clauses, which hold the text you're searching for. In this example I only had one field, called "text". slop indicates the number of words we will allow between the terms, and in_order indicates that the order of the words is important, so "tip council" will not match, whereas "council tip" will.
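To make the roles of slop and in_order concrete, here is a small Python sketch that imitates the proximity rule locally on plain strings. It is only an approximation under stated assumptions: the `tokenize` helper is a crude stand-in for an analyzer, and `near_in_order` is a hypothetical function that handles just this two-clause case (a prefix followed by one of several terms), not Elasticsearch's actual span matching.

```python
import re
from typing import List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumerics (a crude stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near_in_order(text: str, prefix: str, targets: Set[str], slop: int) -> bool:
    """True if a token starting with `prefix` is followed by a token in `targets`
    with at most `slop` other tokens in between."""
    tokens = tokenize(text)
    for i, tok in enumerate(tokens):
        if not tok.startswith(prefix):
            continue
        # Tokens at positions i+1 .. i+1+slop are close enough.
        window = tokens[i + 1 : i + 2 + slop]
        if any(t in targets for t in window):
            return True
    return False


examples = [
    "Shellharbour City Council Tip",
    "council best tip",
    "councils top 10 tips",
    "... City Council at Shellharbour. There is not any good tip at all.",
    "tip council",
]
for text in examples:
    print(near_in_order(text, "council", {"tip", "tips"}, slop=5), "|", text)
```

In the fourth example seven words separate "Council" and "tip", which exceeds the slop of 5, and in the last one the words appear in the wrong order, so both are rejected.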

ElasticSearch Bool Filter with a Phrase (instead of a single word/tag)

In Elasticsearch, this filter
{
  "bool": {
    "must": {
      "term": {
        "article.title": "google"
      }
    }
  }
}
properly returns articles with "google" in the title.
However,
{
  "bool": {
    "must": {
      "term": {
        "article.title": "google earth"
      }
    }
  }
}
does not return any results, despite the fact that there are articles with the exact words "google earth" in the title. I would like it to do so.
The full query:
{
  "size": 200,
  "filter": {
    "and": [{
      "bool": {
        "must": {
          "term": {
            "article.title": "google maps"
          }
        }
      }
    },
    {
      "range": {
        "created_date": {
          "from": "2013-01-11T02:14:03.352Z"
        }
      }
    }]
  }
}
As you can see, I don't have a "query" -- just a filter, size, and range. So I take it that ElasticSearch is using the default analyzer...?
What am I misunderstanding?
EDIT: For those looking for the solution, here is my filter:
{
  "query": {
    "bool": {
      "must": {
        "match_phrase": {
          "article.title": "google earth"
        }
      }
    }
  }
}
Note that (1) we wrapped the bool filter with "query" and (2) the "term" changed to a "match_phrase", which causes the entire phrase to be matched (as opposed to "match", which would run the standard analyzer on google earth and search article.title for the resulting terms).
The full query looks like this:
{
  "size": 200,
  "filter": {
    "query": {
      "bool": {
        "must": {
          "match_phrase": {
            "article.title": "google earth"
          }
        }
      }
    }
  }
}
FWIW, the reason I have this condition within the "filter" field (as opposed to using a standard query) is that sometimes I want to use a "must_not" instead of a "must", and sometimes I also add other elements to the query.
Elasticsearch isn't using an analyzer at all, because you have used the term query, which looks for exact terms.
Your title field IS analyzed (unless you have specified otherwise), so "google earth" will have been indexed as the two terms ["google","earth"]. That's why the term query for "google" works, but the term query for "google earth" doesn't - that EXACT term does not exist.
If you use a match query instead, then your query terms will be analyzed before searching.
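The difference can be imitated locally with a few lines of Python. This is a sketch under stated assumptions, not how Elasticsearch is implemented: `analyze` is a crude stand-in for an analyzer (lowercase, split on non-alphanumerics), the helper names are hypothetical, and the sample title is made up for illustration.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Crude stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(indexed: List[str], term: str) -> bool:
    """Term-style lookup: the query string is compared as-is to each indexed token."""
    return term in indexed


def phrase_matches(indexed: List[str], phrase: str) -> bool:
    """Phrase-style lookup: analyze the query, then require its tokens consecutively."""
    wanted = analyze(phrase)
    n = len(wanted)
    return n > 0 and any(
        indexed[i : i + n] == wanted for i in range(len(indexed) - n + 1)
    )


# Hypothetical title, indexed as separate lowercase tokens.
indexed_title = analyze("Exploring Google Earth on the web")

print(term_matches(indexed_title, "google"))          # True
print(term_matches(indexed_title, "google earth"))    # False
print(phrase_matches(indexed_title, "google earth"))  # True
```

The second lookup fails because no single indexed token equals the string "google earth"; the third succeeds because the query is split into tokens first and those tokens are then found next to each other.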
Using Elasticsearch 5.4.2, my solution evolved into the following:
{
  "query": {
    "bool": {
      "must": {
        "match_phrase": {
          "article.title": "google earth"
        }
      }
    }
  }
}
Hope this helps someone.
For those stumbling upon this more recently, be advised that a more concise way to represent
{"query":{"bool":{"must":{"match_phrase":{"article.title":"google earth"}}}}}
is with
{"query":{"match_phrase":{"article.title":"google earth"}}}
I solved this by splitting the passed phrase into its individual words, so just changing
{"bool":{"must":{"term":{"article.title":"google earth"}}}}
to
{"bool":{"must":{"terms":{"article.title":["google", "earth"]}}}}
It's not pretty and might be too slow if you have a lot of queries going on, but it works.
Note: I just found out this will also return any results containing either "google" or "earth".
