partial matching result in array removal - elasticsearch

I used the following query to search for books named 'new java' or 'java new':
"bool":{
"must":[
{
"term":{
"title":{
"value":"new"
}
}
},
{
"term":{
"title":{
"value":"java"
}
}
}
]
}
It returns the exact matches, but it also returns records that should not match. For example, the following document should not be returned, because 'new' and 'java' appear in different elements of the books array, yet it still shows up in the results:
{
"_index":"book-lists",
"_type":"book-list",
"_id":"AVBRSvHIXb7carZwcePS",
"_version":1,
"_score":1,
"_source":{
"title":"Technology",
"books":[
{
"title":"Java",
"isRead":true,
"summary":"lorem ipsum",
"rating":3.5
},
{
"title":"java jsp",
"isRead":true,
"summary":"lorem ipsum",
"rating":3.5
},
{
"title":"new servlet",
"isRead":true,
"summary":"lorem ipsum",
"rating":3.5
}
],
"numberViews":0,
"idOwner":"17xxxxxxxxxxxx45"
}
}
Is it possible to avoid matches where the terms are in different elements of the array?
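
One common way to keep both terms inside the same array element is to map books as a nested field and wrap the clauses in a nested query. The sketch below is not from the original post: it assumes books has been reindexed with a nested mapping and that books.title is an analyzed text field, and the helper name same_element_query is hypothetical. It only builds the request body as a Python dict.

```python
import json


def same_element_query(terms, path="books", field="books.title"):
    """Build a nested query requiring every term in ONE element of `path`.

    Assumes `path` is mapped as type "nested"; with the default object
    mapping the array is flattened and terms can match across elements.
    """
    must = [{"match": {field: term}} for term in terms]
    return {"query": {"nested": {"path": path, "query": {"bool": {"must": must}}}}}


body = same_element_query(["new", "java"])
print(json.dumps(body, indent=2))
```

With this shape, a document whose books array holds "java jsp" and "new servlet" as separate elements should no longer match, since no single element contains both terms.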

Related

Why does search performance differ between from/size and search_after?

There are hundreds of millions of documents in my index. When I search, I find that search_after is much slower than from/size. With from/size the search is quick and returns in a few milliseconds, but with search_after it takes 20 seconds. My search results are sorted by time and key (a keyword copy of _id). Why? What's the difference?
Search command:
{
"query":{
"bool":{
"filter":[
{
"query_string":{
"query":"*"
}
},
{
"range":{
"__time__":{
"gte":1324958207,
"lte":1724958207
}
}
}
]
}
},
"size":10,
"sort":[
{
"__time__":{
"order":"desc"
}
},
{
"__key__":{
"order":"desc"
}
}
],
"search_after":[
1630594662000,
"6130e666-2-67e9e3-f5-1"
],
"profile":true
}
profile:
{"searches":[
{
"query":[
{
"type":"BoostQuery",
"description":"(ConstantScore(DocValuesFieldExistsQuery [field=__time__]))^0.0",
"time_in_nanos":45722536283,
"breakdown":{
"set_min_competitive_score_count":0,
"match_count":0,
"shallow_advance_count":0,
"set_min_competitive_score":0,
"next_doc":45722241414,
"match":0,
"next_doc_count":410919487,
"score_count":0,
"compute_max_score_count":0,
"compute_max_score":0,
"advance":19517,
"advance_count":39,
"score":0,
"build_scorer_count":78,
"create_weight":14271,
"shallow_advance":0,
"create_weight_count":1,
"build_scorer":261081
},
"children":[
{
"type":"DocValuesFieldExistsQuery",
"description":"DocValuesFieldExistsQuery [field=__time__]",
"time_in_nanos":16571715415,
"breakdown":{
"set_min_competitive_score_count":0,
"match_count":0,
"shallow_advance_count":0,
"set_min_competitive_score":0,
"next_doc":16571493898,
"match":0,
"next_doc_count":410919487,
"score_count":0,
"compute_max_score_count":0,
"compute_max_score":0,
"advance":15074,
"advance_count":39,
"score":0,
"build_scorer_count":78,
"create_weight":517,
"shallow_advance":0,
"create_weight_count":1,
"build_scorer":205926
}
}
]
}
],
"rewrite_time":116538,
"collector":[
{
"name":"PagingFieldCollector",
"reason":"search_top_hits",
"time_in_nanos":30851166561
}
]
}
],
"aggregations":[
]
}
Because search_after uses a scroll-like approach to filtering, which has to sort all of the data before filtering, whereas from/size only retrieves elements as a stream. Note that from/size in turn gets slower and uses more memory the deeper you page with from.
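
To make the two paging styles concrete, here is a plain-Python sketch over an in-memory list. It only illustrates the paging semantics, assuming a (time desc, key desc) sort as in the question; it says nothing about how Elasticsearch executes either request internally, and the function names are hypothetical.

```python
def page_from_size(hits, start, size):
    """from/size: skip the first `start` sorted hits, then take `size`."""
    return hits[start:start + size]


def page_search_after(hits, after, size):
    """search_after: take the first `size` hits that sort strictly after `after`.

    With a descending sort, "after" means a smaller (time, key) tuple.
    """
    return [h for h in hits if h < after][:size]


# (time, key) pairs sorted descending, mirroring the sort in the question.
hits = sorted([(t, f"k{t}") for t in range(10)], reverse=True)

first = page_from_size(hits, 0, 3)
# The sort values of the last hit on one page become the cursor for the next.
second = page_search_after(hits, first[-1], 3)
```

Both styles return the same second page here; the difference is that search_after carries the last sort values forward as a cursor instead of an offset.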

Has Child join field issue

Good day,
I've set up a parent/child relationship between Facility and FacilityType. I'm trying to query Facility and load its children at the same time with a has_child query, but the following request fails:
{
"aggs":{
"Capacity":{
"children":{
"type":"facilitytype"
},
"aggs":{
"Capacity":{
"histogram":{
"field":"capacity",
"interval":10.0,
"missing":0.0
}
}
}
},
"Distance":{
"histogram":{
"field":"businessLocation",
"interval":10.0,
"order":{
"_count":"desc"
}
}
}
},
"query":{
"bool":{
"should":[
{
"bool":{
"must":[
{
"geo_distance":{
"boost":1.1,
"distance":"200.0m",
"distance_type":"arc",
"businessLocation":{
"lat":38.958878299999988,
"lon":-77.365260499999991
}
}
},
{
"has_child":{
"_name":"FacilityType",
"type":"doc",
"query":{
"match_all":{
}
}
}
}
]
}
},
{
"geo_distance":{
"boost":1.1,
"distance":"200.0m",
"distance_type":"arc",
"serviceAreas":{
"lat":38.958878299999988,
"lon":-77.365260499999991
}
}
}
]
}
}
}
I'm getting this error when I execute the query:
[has_child] join field [joinField] doesn't hold [doc] as a child
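
The error message says that the value given as type in has_child ("doc") is not declared as a child relation in the join field. A minimal sketch of the clause with a child relation name instead is shown below. It assumes the join field declares facilitytype as the child relation, which is what the children aggregation in the same request suggests; the actual mapping is not shown in the post, and the helper name has_child_clause is hypothetical.

```python
import json


def has_child_clause(child_relation, name=None):
    """Build a has_child clause.

    `type` must be a child relation name declared in the join field
    mapping, not the index mapping type (such as "doc").
    """
    clause = {"type": child_relation, "query": {"match_all": {}}}
    if name is not None:
        clause["_name"] = name
    return {"has_child": clause}


# Assumption: "facilitytype" is the child relation in the join field.
clause = has_child_clause("facilitytype", name="FacilityType")
print(json.dumps(clause, indent=2))
```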

Elasticsearch Sorting by Likes and Dislikes

I've been struggling to express the current logic problem I'm trying to solve with Elasticsearch, and I think I have a good way to represent it.
Let's say I'm building out an API to sort Mario Kart characters in order of the user's preference. The user can list characters they like, and those they dislike. Here is the data set:
{character: {name: "Mario", weight: "Light"}},
{character: {name: "Luigi", weight: "Medium"}},
{character: {name: "Peach", weight: "Light"}},
{character: {name: "Bowser", weight: "Heavy"}},
{character: {name: "Toad", weight: "Light"}},
{character: {name: "Koopa", weight: "Medium"}}
The user inputs that they like Mario and Luigi and do not like Bowser. With Elasticsearch, how could I go about sorting this data for the user so the list is returned like so:
[Mario (+), Luigi (+), Peach, Toad, Koopa, Bowser (-)]
*Pluses and minuses in there for legibility.
This would return the user's top choices in front, the ones they are OK with in the middle, and the ones they don't prefer at the end. Having to use nested queries really trips me up here.
Evolving the query, let's say there's a team mode where each team is comprised of pairs of two, determined by the game in the following pairs:
[Luigi (+), Bowser (-)]
[Mario (+), Peach]
[Toad, Koopa]
How do I ensure that I don't filter out teams that contain Bowser, yet still weight the results so that they come back like so:
[Mario (+), Peach]
[Toad, Koopa]
[Luigi (+), Bowser (-)]
Or, should [Luigi, Bowser] actually rank second?
I'm very confused about building complex queries like these in Elasticsearch and would appreciate any help.
Depending on your mapping, something along the lines of
GET /characters/_search
{
"sort":[
"_score"
],
"query":{
"bool":{
"should":[
{
"constant_score":{
"filter":{
"term":{
"name.keyword":"Mario"
}
},
"boost":2.0
}
},
{
"constant_score":{
"filter":{
"term":{
"name.keyword":"Luigi"
}
},
"boost":2.0
}
},
{
"constant_score":{
"filter":{
"term":{
"name.keyword":"Peach"
}
},
"boost":1.0
}
},
{
"constant_score":{
"filter":{
"term":{
"name.keyword":"Toad"
}
},
"boost":1.0
}
},
{
"constant_score":{
"filter":{
"term":{
"name.keyword":"Koopa"
}
},
"boost":1.0
}
},
{
"constant_score":{
"filter":{
"term":{
"name.keyword":"Bowser"
}
},
"boost":0
}
}
]
}
}
}
should work.
PS: If you have a nested mapping, wrap the bool query in a nested query clause and adjust the field name paths. To return only the name field, add a _source clause before the query with the path to name as its value.
First off, I gotta say: IMHO using Elasticsearch for this is major overkill. You should probably go with a much simpler in-memory data structure for this calculation.
Assuming you do decide to implement this with Elasticsearch, I would do the following thing:
1) Represent each character as a document using this mapping -
PUT game/characters/_mapping
{
"properties": {
"name":{
"type": "keyword"
},
"weight": {
"type": "keyword"
}
}
}
2) Each character will look like so:
PUT game/characters/bowser
{
"name": "Bowser",
"weight": "Heavy"
}
3) And then you can fetch them ordered by likes, similarly to how @sramalingam24 suggested. Note that boosts must be non-negative, so you'd need to "normalize" the likeability of the characters to a range starting at zero. Also note that a term query on a keyword field is case-sensitive, so the name in the query must match the indexed value exactly:
GET game/characters/_search
{
"size": 100,
"query": {
"bool": {
"should": [
{
"constant_score": {
"filter": {
"term": {
"name": "Peach"
}
},
"boost": 2
}
},{
"constant_score": {
"filter": {
"term": {
"name": "Mario"
}
},
"boost": 2
}
},{
"constant_score": {
"filter": {
"term": {
"name": "Toad"
}
},
"boost": 1
}
},{
"constant_score": {
"filter": {
"term": {
"name": "Bowser"
}
},
"boost": 0
}
}
]
}
}
}
Good luck!
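
Since the should clauses in both answers follow one pattern, they can be generated from the user's likes and dislikes. The following is a sketch only: the helper name preference_query and the 2/1/0 boost scheme are assumptions chosen to mirror the answers above, and the field is assumed to be a keyword field called name.

```python
def preference_query(names, likes, dislikes, field="name"):
    """Build a bool/should query whose constant_score boosts encode preference.

    Boosts: liked = 2, neutral = 1, disliked = 0 (boosts must be non-negative).
    """
    def boost_for(name):
        if name in likes:
            return 2
        if name in dislikes:
            return 0
        return 1

    should = [
        {"constant_score": {"filter": {"term": {field: n}}, "boost": boost_for(n)}}
        for n in names
    ]
    return {"size": len(names), "query": {"bool": {"should": should}}}


names = ["Mario", "Luigi", "Peach", "Bowser", "Toad", "Koopa"]
body = preference_query(names, likes={"Mario", "Luigi"}, dislikes={"Bowser"})

# Collect the boost assigned to each character for a quick sanity check.
boosts = {
    c["constant_score"]["filter"]["term"]["name"]: c["constant_score"]["boost"]
    for c in body["query"]["bool"]["should"]
}
```

Every character gets its own clause so that neutral and disliked ones still match and are returned, just with a lower score.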

Elasticsearch Prefix Exact Match

I have text fields like the ones below:
elastic|b|c
elastic,search|b|c
elastic,search,prefix|b|c
I want to run a prefix query on these strings. The query is:
"aggs":{
"field":{
"filter":{
"match":{
"field":{
"type":"prefix",
"query":"elastic|"
}
}
},
"aggs":{
"field":{
"terms":{
"field":"textField",
"size":255
}
}
}
}
}
This query returns all of the example texts above.
Do I need an extra analyzer or token filter on the texts?
How can I do an exact-match search with a prefix in Elasticsearch?
You can achieve that by using a wildcard query in Elasticsearch:
{
"query": {
"wildcard": {
"textField": {
"value": "elastic*"
}
}
}
}
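
Note that a pattern of elastic* would also match the other two sample values, since they start with "elastic" as well. An alternative sketch is a prefix query that includes the "|" delimiter, run against a field that is not analyzed. The sub-field name textField.keyword is an assumption (the mapping is not shown in the post), and prefix_query is a hypothetical helper.

```python
def prefix_query(field, value):
    """Build a prefix query.

    `field` should be a keyword (not analyzed) field so that the "|"
    delimiter is kept in the indexed term.
    """
    return {"query": {"prefix": {field: {"value": value}}}}


# Assumption: "textField.keyword" is a keyword sub-field of textField.
body = prefix_query("textField.keyword", "elastic|")

# What the prefix should select from the question's sample values.
samples = ["elastic|b|c", "elastic,search|b|c", "elastic,search,prefix|b|c"]
expected = [s for s in samples if s.startswith("elastic|")]
```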

Mixed filters, using OR as well as AND, in ElasticSearch

In your opinion what would be the best way to do the following?
I want to filter an ElasticSearch query by several ranges that are grouped in an OR filter, and then by one final range that needs to be included as an AND filter. The explanation is a bit crappy but hopefully the pseudo-code below will help...
Basically I tried structuring the following query:
{
"query":{
"multi_match":{
"query":"blue",
"fields":[
"name"
]
}
},
"sort":{
"_score":{
"order":"desc",
"missing":"_last"
}
},
"from":"0",
"size":"24",
"facets":{
"rating":{
"range":{
"field":"rating",
"ranges":[
{
"from":1
},
{
"from":2
},
{
"from":3
},
{
"from":4
}
]
}
},
"price":{
"range":{
"field":"price",
"ranges":[
{
"to":10
},
{
"from":10,
"to":100
},
{
"from":100,
"to":1000
},
{
"from":1000
}
]
}
}
},
"filter":{
"or":[
{
"range":{
"price":{
"from":"10",
"to":"100"
}
}
},
{
"range":{
"price":{
"from":"100",
"to":"1000"
}
}
}
],
"and":{
"numeric_range":{
"rating":{
"gte":"4"
}
}
}
}
}
This failed with the error that there was "No parser for element [numeric_range]". So I tried replacing:
"and":{
"numeric_range":{
"rating":{
"gte":"4"
}
}
}
with:
"numeric_range":{
"rating":{
"gte":"4"
}
}
The query now returns results but it's returning results with prices in the ranges 10-100, 100-1000 and ANY results with a rating greater than 4 (even if their price is outside of the defined range).
Any clues on how I could do this query? Do I need to be using a bool filter?
Ah ha, figured it out, with the help of Boaz Leskes over on the ElasticSearch mailing list!
It should be structured like this:
"filter": {
"bool": {
"must": [
{
"numeric_range":{
"rating":{
"gte":"4"
}
}
}
],
"should": [
{
"range":{
"price":{
"from":"10",
"to":"100"
}
}
},
{
"range":{
"price":{
"from":"100",
"to":"1000"
}
}
}
]
}
}
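
The numeric_range filter and the top-level filter element belong to old Elasticsearch releases. For newer versions, the same AND/OR logic could be sketched as a bool query with filter clauses, as below. This is an assumption about a possible port, not part of the original answer; the helper name rating_and_price_filter is hypothetical, and the inclusive gte/lte bounds are chosen to mirror the from/to ranges above.

```python
import json


def rating_and_price_filter(min_rating, price_ranges):
    """AND one rating range with an OR over several price ranges."""
    price_should = [
        {"range": {"price": {"gte": lo, "lte": hi}}} for lo, hi in price_ranges
    ]
    return {
        "bool": {
            "filter": [
                {"range": {"rating": {"gte": min_rating}}},
                # At least one price range must match (the OR group).
                {"bool": {"should": price_should, "minimum_should_match": 1}},
            ]
        }
    }


query = rating_and_price_filter(4, [(10, 100), (100, 1000)])
print(json.dumps({"query": query}, indent=2))
```

Setting minimum_should_match explicitly keeps the price group mandatory; without it, should clauses that sit next to filter or must clauses are optional. The multi_match on "blue" from the question would go into a must clause of the same outer bool.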
