ElasticSearch Query DSL Combine Terms and Wildcard

I have two distinct queries which work well enough on their own:
{"wildcard":{"city":"*Beach*"}}
{"terms":{"state":["Florida","Georgia"]}}
but trying to combine them into one query is proving to be quite the challenge.
I had thought that simply writing {{"wildcard":{"city":"*Beach*"}},{"terms":{"state":["Florida","Georgia"]}}} would do it, but it does not. I then tried a few different iterations using arrays, bool queries, and so on. Can someone point me in the right direction?

Bool query should be the right way to go.
Below is an example for your use case:
{
"query": {
"bool": {
"must": [
{
"wildcard": { "city": "*Beach*" }
},
{
"terms": {
"state": [ "Florida", "Georgia" ]
}
}
]
}
}
}
If there are no results, it means that no document matches both of the criteria.
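As a small illustration of the answer above, here is a sketch in Python that wraps the two standalone clauses in a bool/must list. The helper name combine_must is made up for this example and is not part of any Elasticsearch client; the point is only that the clauses go into an array under "must", whereas the original attempt of placing two objects side by side inside braces is not valid JSON.

```python
import json


def combine_must(*clauses):
    """Wrap independent query clauses in bool/must so that all of them must match.

    combine_must is a hypothetical helper for illustration only.
    """
    return {"query": {"bool": {"must": list(clauses)}}}


# The two clauses from the question, unchanged.
wildcard_clause = {"wildcard": {"city": "*Beach*"}}
terms_clause = {"terms": {"state": ["Florida", "Georgia"]}}

body = combine_must(wildcard_clause, terms_clause)

# Serialize to the JSON body that would be sent to the _search endpoint.
print(json.dumps(body, indent=2))
```

The printed JSON has the same shape as the query in the answer and can be pasted into a search request.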

Related

How does the flow work in Elasticsearch queries?

I have written a query which has a couple of conditions, as shown below.
GET /agreement/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "T-0668",
"fields": [
"agreecondition.agreementId",
"agreecondition.conditionContractId"
]
}
},
{
"range": {
"agreecondition.validFrom": {
"gte": "02/18/2019"
}
}
},
{
"range": {
"agreecondition.validTo": {
"lte": "03/07/2019"
}
}
}
],
"filter": [
{
"terms": {
"agreecondition.promotionId.keyword": [
"x",
"y"
]
}
}
]
}
}
}
My question is: how does the flow work?
For example, does ES first get the results for the multi_match in the must condition, then apply the range conditions to that output, and finally apply the filter on top of the output of the range conditions?
I just want some clarity on this; if my assumption is wrong, then I need to rewrite the query.
You can read the official Elasticsearch blog post on query execution order to understand this in detail, but it may not give you everything you are looking for, because of a limitation that Elastic mentions at the end of the post:
Q: How can I check which query/filter got executed first?
A: We don't really expose this information, which is very internal. However if you
check the output of the profile API, you can count how many times
nextDoc/advance have been called on the one hand, and matches on the
other hand. Query nodes that have the higher counts have been run
first.
Note: The Profile API will be very handy for you, as the blog post also suggests.
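The counting approach described in the quoted answer can be sketched in Python as follows. This is a rough sketch, not an official tool: it assumes the search was sent with "profile": true and that each query node in the response carries a "breakdown" object with next_doc_count, advance_count and match_count keys, which is how recent versions report it as far as I know (older versions may name things differently). The sample response is entirely made up for illustration.

```python
def walk(node, rows):
    """Collect (type, description, iterator calls, match calls) for a node and its children."""
    breakdown = node.get("breakdown", {})
    iterator_calls = breakdown.get("next_doc_count", 0) + breakdown.get("advance_count", 0)
    match_calls = breakdown.get("match_count", 0)
    rows.append((node.get("type"), node.get("description"), iterator_calls, match_calls))
    for child in node.get("children", []):
        walk(child, rows)


def summarize_profile(response):
    """Return all profiled query nodes, highest call counts first."""
    rows = []
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for node in search.get("query", []):
                walk(node, rows)
    return sorted(rows, key=lambda r: r[2] + r[3], reverse=True)


# Made-up response fragment, shaped like profile output (assumption, not real data).
sample = {"profile": {"shards": [{"searches": [{"query": [{
    "type": "BooleanQuery",
    "description": "made-up bool",
    "breakdown": {"next_doc_count": 5, "advance_count": 0, "match_count": 0},
    "children": [
        {"type": "TermQuery", "description": "made-up term",
         "breakdown": {"next_doc_count": 40, "advance_count": 2, "match_count": 0}},
        {"type": "IndexOrDocValuesQuery", "description": "made-up range",
         "breakdown": {"next_doc_count": 0, "advance_count": 7, "match_count": 0}},
    ],
}]}]}]}}

rows = summarize_profile(sample)
for row in rows:
    print(row)
```

Following the quoted answer, the nodes at the top of this list are the ones most likely to have been driving the iteration.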

Difference between elasticsearch queries

I'm having a hard time trying to figure out why these two queries do not return the same number of results (I'm using elasticsearch 2.4.1):
{
"nested": {
"path": "details",
"filter": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
{
"nested": {
"path": "details",
"filter": {
"bool": {
"must": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
}
}
The first query has more results.
My guess was that the filter clause in the first query was working like an or/should, but if I replace the must in the second query with a should, the query yields more results than either of those two.
How does the meaning of those queries differ?
I'm afraid I have no knowledge of the structure of the indexed documents; all I know is how many rows each query returns.
The first query is wrong: the nested filter cannot be an array. I suspect ES doesn't parse it correctly and only takes one of the match clauses instead of both, which is probably why it returns more data than the second one.
The second query is correct in terms of nested filter and yields exactly what you expect.
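To see why the counts could order themselves this way, here is a toy in-memory simulation in Python. It is only a sketch under loud assumptions: the documents are invented, matching is plain string equality rather than analyzed match, and the "single clause" case models the suspicion above that only one match is applied, which the answer itself does not confirm.

```python
# Invented documents; the real index structure is unknown.
docs = [
    {"id": 1, "details": [{"id": "color", "value_str": "red"}]},
    {"id": 2, "details": [{"id": "color", "value_str": "blue"},
                          {"id": "size", "value_str": "red"}]},
    {"id": 3, "details": [{"id": "size", "value_str": "large"}]},
    {"id": 4, "details": [{"id": "size", "value_str": "red"}]},
]


def nested_match(doc, conditions, mode):
    """True if a single nested object satisfies the conditions.

    mode "must" requires every condition on the same object,
    mode "should" requires at least one.
    """
    combine = all if mode == "must" else any
    return any(
        combine(obj.get(field) == value for field, value in conditions)
        for obj in doc["details"]
    )


conditions = [("id", "color"), ("value_str", "red")]

must_ids = [d["id"] for d in docs if nested_match(d, conditions, "must")]
single_ids = [d["id"] for d in docs if nested_match(d, conditions[:1], "must")]
should_ids = [d["id"] for d in docs if nested_match(d, conditions, "should")]

print(must_ids, single_ids, should_ids)
```

On this toy data the must variant matches the fewest documents, the single-clause variant more, and the should variant the most, which is the same ordering the question reports.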

Elasticsearch terms query on array of values

I have data in an Elasticsearch index that looks like this:
{
"title": "cubilia",
"people": [
"Ling Deponte",
"Dana Madin",
"Shameka Woodard",
"Bennie Craddock",
"Sandie Bakker"
]
}
Is there a way for me to search for all the people whose names start with
"ling" (should be case insensitive) and get distinct terms properly cased, i.e. "Ling Deponte" and not "ling deponte"?
I am fine with changing the mappings on the index in any way.
Edit: this does what I want, but it is a really bad query:
{
"size": 0,
"aggs": {
"person": {
"filter": {
"bool":{
"should":[
{"regexp":{
"people.raw":"(.* )?[lL][iI][nN][gG].*"
}}
]}
},
"aggs": {
"top-colors": {
"terms": {
"size":10,
"field": "people.raw",
"include":
{
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
}
}
people.raw is not_analyzed
Yes, and you can do it without a regular expression by taking advantage of Elasticsearch's full text capabilities.
GET /test/_search
{
"query": {
"match_phrase": {
"people": "Ling"
}
}
}
Note: This could also be match or match_phrase_prefix in this case. The match_phrase* queries imply an order of the values in the text. match simply looks for any of the values. Since you only have one value, it's pretty much irrelevant.
The problem is that you cannot limit the document responses to just that name because the search API returns documents. With that said, you can use nested documents and get the desired behavior via inner_hits.
You want to avoid leading wildcards whenever possible because they simply do not work at scale. To put it in SQL terms, that's like doing a full table scan; you effectively lose the benefit of the inverted index because it has to walk it entirely to find the actual start.
Combining the two should work pretty well though. Here, I use the query to whittle down the results to what you are interested in, then I use your inner aggregation to include only the terms that match the pattern.
{
"size": 0,
"query": {
"match_phrase": {
"people": "Ling"
}
},
"aggs": {
"person": {
"terms": {
"size":10,
"field": "people.raw",
"include": {
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
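The include pattern above spells out each letter in both cases by hand. As a sketch, a small Python helper can generate that pattern from a plain prefix. The function name case_insensitive_pattern is made up for this example, and the check below uses Python's re module, whose syntax agrees with Lucene regular expressions for this simple pattern but not in general; re.escape in particular follows Python's escaping rules, which is an assumption when the result is sent to Elasticsearch.

```python
import re


def case_insensitive_pattern(prefix):
    """Build a pattern matching terms where some word starts with prefix, in any case.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    parts = []
    for ch in prefix:
        if ch.isalpha():
            # Spell out both cases, e.g. "l" becomes "[lL]".
            parts.append("[" + ch.lower() + ch.upper() + "]")
        else:
            parts.append(re.escape(ch))
    # "(.* )?" allows the prefix to start a later word in the term.
    return "(.* )?" + "".join(parts) + ".*"


pattern = case_insensitive_pattern("ling")
print(pattern)

# Whole-term matching, checked locally with Python's regex engine.
people = ["Ling Deponte", "Dana Madin", "Dana Lingard", "Darling"]
matches = [p for p in people if re.fullmatch(pattern, p)]
print(matches)
```

The generated string can be dropped into the "include" pattern of the terms aggregation in place of the hand-written one.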
Hi, please find the query below; it may help with your request:
GET skills/skill/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"wildcard": {
"skillNames.raw": "jav*"
}
}
]
}
}
}
}
}
My intention is to find documents starting with "jav".

Elasticsearch returning distinct results

Here is my ES query:
{
"fields": [
"news.authorname.raw",
"news.authorid"
],
"query": {
"filtered": {
"filter": {
"terms": {
"news.authorid": [
1,
2
]
}
}
}
}
}
With this query I get a list of pairs {authorid, authorname}. This list has repeated {authorid, authorname} values and I just need to get the same list but with no repetitions. This seems not that difficult or at least that is what I thought this morning. My small knowledge of ES together with the lack of documentation is making me desperate to find a solution to such a trivial problem.
Of course I could get the whole list and remove the repetitions in code, but if possible I would prefer not to receive unnecessary data only to remove it afterwards.
Can anyone give me a hand with that? Should I use some other approach?
Thanks in advance!!
I would suggest using source filtering:
{
"_source": [ "news.authorname.raw", "news.authorid" ],
"query": {
"filtered": {
"filter": {
"terms": {
"news.authorid": [
1,
2
]
}
}
}
}
}
It is generally easier to handle than fields, which sometimes do look like a cartesian product.
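Source filtering only controls which fields come back for each hit, so repeated pairs can still appear in the response. If they do, the client-side clean-up the question mentions could look like the Python sketch below. The response shape is an assumption: it presumes each hit has a _source with a "news" object holding authorid and authorname, and the sample data is invented.

```python
def distinct_authors(response):
    """Return unique (authorid, authorname) pairs in first-seen order."""
    seen = set()
    result = []
    for hit in response.get("hits", {}).get("hits", []):
        # Assumed layout: _source.news.authorid / _source.news.authorname
        news = hit.get("_source", {}).get("news", {})
        pair = (news.get("authorid"), news.get("authorname"))
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


# Invented sample response for illustration only.
sample = {"hits": {"hits": [
    {"_source": {"news": {"authorid": 1, "authorname": "Author One"}}},
    {"_source": {"news": {"authorid": 2, "authorname": "Author Two"}}},
    {"_source": {"news": {"authorid": 1, "authorname": "Author One"}}},
]}}

authors = distinct_authors(sample)
print(authors)
```

This keeps the first occurrence of each pair and preserves the order in which the hits were returned.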

Can _score from different queries be compared?

In my application, I issue multiple queries, each to a different index. Then I merge the results from these queries and sort them by the _score attribute in order to rank them according to their relevance. But I wonder if this makes sense at all, since the results came from different queries?
I guess my question is: can _scores from different queries be compared?
Instead of issuing multiple queries, it would be a good idea to combine them into a single query.
You can use the indices query to do index-specific operations.
So something like:
{
"bool": {
"should": [
{
"indices": {
"indices": [
"index1"
],
"query": {
"term": {
"tag": "wow"
}
}
}
},
{
"indices": {
"indices": [
"index2"
],
"query": {
"term": {
"name": "laptop"
}
}
}
}
]
}
}
Once this is done, the results will be sorted based on the _score.
Hope that helps.
