Why is elasticsearch giving me results that don't match my query - elasticsearch

I'm trying to make sure that only documents where "relationship_type": "group" are returned, so why is a document with "relationship_type": "event" being returned as well, with a score similar to the "relationship_type": "group" documents? Also, why isn't my source filtering working?
My request on dev-tools
POST get-together/_search?size=5
{
"query": {
"match": { "relationship_type": "group" }
},
"fields": ["relationship_type"],
"_source": false
}
The response is below. Note that I had to put a limit on the size, otherwise it was returning everything for some reason. My source isn't being filtered and the last document doesn't match my query.
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 20,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "get-together",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"relationship_type" : "group",
"name" : "Denver Clojure",
"organizer" : [
"Daniel",
"Lee"
],
"description" : "Group of Clojure enthusiasts from Denver who want to hack on code together and learn more about Clojure",
"created_on" : "2012-06-15",
"tags" : [
"clojure",
"denver",
"functional programming",
"jvm",
"java"
],
"members" : [
"Lee",
"Daniel",
"Mike"
],
"location_group" : "Denver, Colorado, USA"
}
},
{
"_index" : "get-together",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"relationship_type" : "group",
"name" : "Elasticsearch Denver",
"organizer" : "Lee",
"description" : "Get together to learn more about using Elasticsearch, the applications and neat things you can do with ES!",
"created_on" : "2013-03-15",
"tags" : [
"denver",
"elasticsearch",
"big data",
"lucene",
"solr"
],
"members" : [
"Lee",
"Mike"
],
"location_group" : "Denver, Colorado, USA"
}
},
{
"_index" : "get-together",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"relationship_type" : "group",
"name" : "Elasticsearch San Francisco",
"organizer" : "Mik",
"description" : "Elasticsearch group for ES users of all knowledge levels",
"created_on" : "2012-08-07",
"tags" : [
"elasticsearch",
"big data",
"lucene",
"open source"
],
"members" : [
"Lee",
"Igor"
],
"location_group" : "San Francisco, California, USA"
}
},
{
"_index" : "get-together",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"relationship_type" : "group",
"name" : "Enterprise search London get-together",
"organizer" : "Tyler",
"description" : "Enterprise search get-togethers are an opportunity to get together with other people doing search.",
"created_on" : "2009-11-25",
"tags" : [
"enterprise search",
"apache lucene",
"solr",
"open source",
"text analytics"
],
"members" : [
"Clint",
"James"
],
"location_group" : "London, England, UK"
}
},
{
"_index" : "get-together",
"_type" : "_doc",
"_id" : "100",
"_score" : 1.0,
"_routing" : "1",
"_source" : {
"relationship_type" : {
"name" : "event",
"parent" : "1"
},
"host" : [
"Lee",
"Troy"
],
"title" : "Liberator and Immutant",
"description" : "We will discuss two different frameworks in Clojure for doing different things. Liberator is a ring-compatible web framework based on Erlang Webmachine. Immutant is an all-in-one enterprise application based on JBoss.",
"attendees" : [
"Lee",
"Troy",
"Daniel",
"Tom"
],
"date" : "2013-09-05T18:00",
"location_event" : {
"name" : "Stoneys Full Steam Tavern",
"geolocation" : "39.752337,-105.00083"
},
"reviews" : 4
}
}
]
}
}
This is what my mapping for the relationship_type field looks like

You need to remove the empty line between the POST line and the JSON query, otherwise the query is not taken into account. That is also why every document came back and why the source filtering was not applied.
In Dev Tools, it should look like this:
POST get-together/_search?size=5
{ <---- no empty line here
"query": {
"match": { "relationship_type": "group" }
},
"fields": ["description"],
"_source": false
}
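One way to confirm that the body was ignored is to inspect the response itself. A search sent without a body behaves like match_all, where every hit scores 1.0, and here _source came back even though "_source": false was requested. The helper below is only a heuristic sketch; the function name and the trimmed-down response are made up for illustration.

```python
def looks_like_match_all(response, requested_source=False):
    """Heuristic: True if the response looks like the request body was ignored.

    Signs checked (both must hold):
      * every hit has a score of exactly 1.0, as match_all produces
      * _source is present although the request asked for "_source": false
    """
    hits = response["hits"]["hits"]
    if not hits:
        return False
    all_scores_one = all(hit.get("_score") == 1.0 for hit in hits)
    source_leaked = (not requested_source) and any("_source" in hit for hit in hits)
    return all_scores_one and source_leaked


# Trimmed-down version of the response from the question (hypothetical subset).
response = {
    "hits": {
        "hits": [
            {"_id": "1", "_score": 1.0,
             "_source": {"relationship_type": "group"}},
            {"_id": "100", "_score": 1.0,
             "_source": {"relationship_type": {"name": "event", "parent": "1"}}},
        ]
    }
}

print(looks_like_match_all(response))
```

If this returns True for a request that contained a real query, the body most likely never reached Elasticsearch.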

Related

I need help for a query elasticsearch

I need help for a query.
This is my query and my sample :
GET /product/_search
{
"query": {
"bool" : {
"must" : {
"multi_match" : {
"query": "Torsades",
"fields": [ "ean^10", "name^4", "brand" ]
}
}
}
}
}
[
{
"_index" : "product_2022-05-13-194440",
"_type" : "_doc",
"_id" : "1",
"_score" : 13.78764,
"_source" : {
"country" : 1,
"ean" : "3250391967858",
"name" : "Torsades Semi-complètes BIO - 500G",
"brand" : "Fiorini"
}
},
{
"_index" : "product_2022-05-13-194440",
"_type" : "_doc",
"_id" : "74",
"_score" : 13.78764,
"_source" : {
"country" : null,
"ean" : "3564700009826",
"name" : "Pâtes Torsades - Turini - 500 g",
"brand" : "Turini"
}
},
{
"_index" : "product_2022-05-13-194440",
"_type" : "_doc",
"_id" : "78",
"_score" : 11.964245,
"_source" : {
"country" : null,
"ean" : "3250391967858",
"name" : "Torsades Semi-complètes BIO - 500G - ITM BENCHMARK",
"brand" : "Fiorini"
}
}
]
I need a specific condition and I can't find the solution.
I want:
ALL products for country=1 AND (ALL products for country=null MINUS products whose EAN also exists for country=1)
In my sample, I want to have 2 hits.
This one is removed because its EAN exists for country=1:
{
"_index" : "product_2022-05-13-194440",
"_type" : "_doc",
"_id" : "78",
"_score" : 11.964245,
"_source" : {
"country" : null,
"ean" : "3250391967858",
"name" : "Torsades Semi-complètes BIO - 500G - ITM BENCHMARK",
"brand" : "Fiorini"
}
}
Does anyone have a solution?
EDIT:
I want this result:
[
{
"_index" : "product_2022-05-13-194440",
"_type" : "_doc",
"_id" : "1",
"_score" : 13.78764,
"_source" : {
"country" : 1,
"ean" : "3250391967858",
"name" : "Torsades Semi-complètes BIO - 500G",
"brand" : "Fiorini"
}
},
{
"_index" : "product_2022-05-13-194440",
"_type" : "_doc",
"_id" : "74",
"_score" : 13.78764,
"_source" : {
"country" : null,
"ean" : "3564700009826",
"name" : "Pâtes Torsades - Turini - 500 g",
"brand" : "Turini"
}
}
]
Have you tried field collapsing?
GET test/_search
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "Torsades",
"fields": [
"ean^10",
"name^4",
"brand"
]
}
}
}
},
"collapse": {
"field": "ean.keyword"
}
}
Response:
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.5611319,
"_source" : {
"country" : 1,
"ean" : "3250391967858",
"name" : "Torsades Semi-complètes BIO - 500G",
"brand" : "Fiorini"
},
"fields" : {
"ean.keyword" : [
"3250391967858"
]
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.5611319,
"_source" : {
"country" : null,
"ean" : "3564700009826",
"name" : "Pâtes Torsades - Turini - 500 g",
"brand" : "Turini"
},
"fields" : {
"ean.keyword" : [
"3564700009826"
]
}
}
]
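Collapsing keeps only the top-sorted document per collapse key, so in the sample above the country=1 document survives because it also has the higher score. If the preference for country=1 has to hold regardless of score, it can be made explicit on the client side. The following is a minimal sketch of that idea, not part of the original answer; collapse_by_ean and _rank are hypothetical helper names.

```python
def _rank(hit):
    """Rank a hit: a document with a country set beats one without, then by score."""
    source = hit["_source"]
    return (source.get("country") is not None, hit["_score"])


def collapse_by_ean(hits):
    """Keep one hit per EAN, preferring the hit that has a country set."""
    best = {}
    for hit in hits:
        ean = hit["_source"]["ean"]
        current = best.get(ean)
        if current is None or _rank(hit) > _rank(current):
            best[ean] = hit
    # Present the survivors in descending score order, like a search response.
    return sorted(best.values(), key=lambda h: h["_score"], reverse=True)


hits = [
    {"_id": "1", "_score": 13.78764,
     "_source": {"country": 1, "ean": "3250391967858"}},
    {"_id": "74", "_score": 13.78764,
     "_source": {"country": None, "ean": "3564700009826"}},
    {"_id": "78", "_score": 11.964245,
     "_source": {"country": None, "ean": "3250391967858"}},
]

print([hit["_id"] for hit in collapse_by_ean(hits)])
```

With the three hits from the question, document 78 is dropped because document 1 shares its EAN and has a country.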

Elastic Search sort by boolean field

I want to sort my list by true value in a field called trusted.
I have found that the sort option does not support boolean sorting.
How can I achieve this?
If I understood your issue correctly, sorting on a boolean field does work. I ran a test locally on ES version 7.8 with the following data ingested in my index:
{
"content": "This is a test",
"trusted": true
}
{
"content": "This is a new test",
"trusted": true
}
{
"content": "This is not a test",
"trusted": false
}
Here is the mapping of the index:
"mappings" : {
"properties" : {
"content" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"trusted" : {
"type" : "boolean"
}
}
}
Here is the query when "order" : "desc":
{
"sort": [
{
"trusted": {
"order": "desc"
}
}
]
}
The response:
"hits" : [
{
"_index" : "boolean-sorting",
"_type" : "_doc",
"_id" : "B-YleHQBsTCl1BZvrFdA",
"_score" : null,
"_source" : {
"content" : "This is a test",
"trusted" : true
},
"sort" : [
1
]
},
{
"_index" : "boolean-sorting",
"_type" : "_doc",
"_id" : "CeYleHQBsTCl1BZvtFdJ",
"_score" : null,
"_source" : {
"content" : "This is a new test",
"trusted" : true
},
"sort" : [
1
]
},
{
"_index" : "boolean-sorting",
"_type" : "_doc",
"_id" : "DOYleHQBsTCl1BZvvVfl",
"_score" : null,
"_source" : {
"content" : "This is not a test",
"trusted" : false
},
"sort" : [
0
]
}
]
When "order":"asc", the response is:
"hits" : [
{
"_index" : "boolean-sorting",
"_type" : "_doc",
"_id" : "DOYleHQBsTCl1BZvvVfl",
"_score" : null,
"_source" : {
"content" : "This is not a test",
"trusted" : false
},
"sort" : [
0
]
},
{
"_index" : "boolean-sorting",
"_type" : "_doc",
"_id" : "B-YleHQBsTCl1BZvrFdA",
"_score" : null,
"_source" : {
"content" : "This is a test",
"trusted" : true
},
"sort" : [
1
]
},
{
"_index" : "boolean-sorting",
"_type" : "_doc",
"_id" : "CeYleHQBsTCl1BZvtFdJ",
"_score" : null,
"_source" : {
"content" : "This is a new test",
"trusted" : true
},
"sort" : [
1
]
}
]
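The sort values in the two responses (1 for true, 0 for false) show how the boolean field is ordered. The same ordering can be reproduced locally; this is only an illustrative sketch with a made-up function name, using the three test documents from above.

```python
docs = [
    {"content": "This is a test", "trusted": True},
    {"content": "This is a new test", "trusted": True},
    {"content": "This is not a test", "trusted": False},
]


def sort_by_trusted(documents, order="desc"):
    """Sort by the boolean field, treating true as 1 and false as 0."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    # sorted() is stable, so documents with the same flag keep their input order.
    return sorted(documents, key=lambda d: int(d["trusted"]),
                  reverse=(order == "desc"))


for doc in sort_by_trusted(docs, "desc"):
    print(int(doc["trusted"]), doc["content"])
```

With "desc" the trusted documents come first, and with "asc" the untrusted one comes first, as in the responses above.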
Links:
https://www.elastic.co/guide/en/elasticsearch/reference/current/sort-search-results.html
Please let me know if I misunderstood the question; I will be glad to help.

Problem with creating roles in open-distro for elasticsearch

I have 2 roles that are assigned to one user. In the first role, I include the field name for the documents that have _id 1 and 2:
{
"index_permissions": [
{
"index_patterns": [
"test"
],
"dls": "{\n \"terms\": {\n \"_id\": [ \"1\", \"2\"] \n }\n}\n\n",
"fls": [
"name"
],
"masked_fields": [],
"allowed_actions": [
"get",
"crud"
]
}
],
"tenant_permissions": [],
"cluster_permissions": [
"*"
]
}
In the second role, I include the field job_description for the document that has _id 3:
{
"index_permissions": [
{
"index_patterns": [
"test"
],
"dls": "{\n \"terms\": {\n \"_id\": [\"3\"] \n }\n}\n",
"fls": [
"job_description"
],
"masked_fields": [],
"allowed_actions": []
}
],
"tenant_permissions": [],
"cluster_permissions": []
}
When I try to get data from the index, it shows me both job_description and name in all documents:
{
"took" : 237,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 2.0,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 2.0,
"_source" : {
"name" : "John",
"job_description" : "Systems administrator and Linux specialist"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.0,
"_source" : {
"name" : "John",
"job_description" : "Systems administrator and Linux specialist"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.0,
"_source" : {
"name" : "John",
"job_description" : "Systems administrator and Linux specialist"
}
}
]
}
}
But I want to see only name in the first two records and only job_description in the third document, like this:
{
"took" : 237,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 2.0,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 2.0,
"_source" : {
"name" : "John"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.0,
"_source" : {
"name" : "John"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.0,
"_source" : {
"job_description" : "Systems administrator and Linux specialist"
}
}
]
}
}
Does anyone know how to do it?
DLS and FLS do not work in conjunction like that.
DLS restricts the search response to the subset of documents that match the DLS query, whereas FLS includes or excludes certain fields from the search response returned by Elasticsearch.
For a user with several such configurations, all the DLS queries are combined (OR condition) and, similarly, all the FLS input is combined (AND condition).
In your case, you have two DLS queries and two FLS settings. The two DLS queries work as an OR condition, so documents with _id 1, 2 or 3 are all returned. Similarly, both name and job_description are returned for each of them.
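The combination described above can be modelled in a few lines. This is a simplified sketch of the behaviour as the answer describes it, not the security plugin's actual implementation; effective_view and the dls_ids key are hypothetical names used only for this illustration.

```python
def effective_view(roles, documents):
    """Model the combined effect of several roles on a search response.

    Document ids allowed by any role are merged (OR), and the field lists of
    all roles are merged as well, so every visible document shows every
    merged field.
    """
    allowed_ids = set()
    visible_fields = set()
    for role in roles:
        allowed_ids.update(role["dls_ids"])
        visible_fields.update(role["fls"])
    return [
        {
            "_id": doc["_id"],
            "_source": {k: v for k, v in doc["_source"].items()
                        if k in visible_fields},
        }
        for doc in documents
        if doc["_id"] in allowed_ids
    ]


roles = [
    {"dls_ids": ["1", "2"], "fls": ["name"]},
    {"dls_ids": ["3"], "fls": ["job_description"]},
]
documents = [
    {"_id": doc_id,
     "_source": {"name": "John",
                 "job_description": "Systems administrator and Linux specialist"}}
    for doc_id in ("1", "2", "3")
]

for doc in effective_view(roles, documents):
    print(doc["_id"], sorted(doc["_source"]))
```

Every one of the three documents comes back with both fields, which matches the response shown in the question rather than the per-document result the asker hoped for.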

Why doesn't kibana display all the search results?

Here is my search query:
GET /bank/_search?q=*&sort=account_number:asc&pretty
which matches all of the 1000 docs in the bank index:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank LRl6fcZsSR6a0BMxIAQzIA 1 1 1000 0 414.3kb 414.3kb
green open .kibana_task_manager 2hiY91XzQQKAzmnXhpQLTA 1 0 2 0 12.8kb 12.8kb
green open .kibana_1 G4vY0_JASzqERwKlbqMqAg 1 0 4 0 14.7kb 14.7kb
yellow open customer 0B2gsBy3Rp-5vkMFhto-Wg 1 1 2 0 6.7kb 6.7kb
Below are my search results. Under "hits" at the top, you can see that there were 1000 hits, which is what I expected (all the _docs). Yet, kibana only displays 9 of the hits. Where are the rest?
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "0",
"_score" : null,
"_source" : {
"account_number" : 0,
"balance" : 16623,
"firstname" : "Bradshaw",
"lastname" : "Mckenzie",
"age" : 29,
"gender" : "F",
"address" : "244 Columbus Place",
"employer" : "Euron",
"email" : "bradshawmckenzie@euron.com",
"city" : "Hobucken",
"state" : "CO"
},
"sort" : [
0
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "amberduke@pyrami.com",
"city" : "Brogan",
"state" : "IL"
},
"sort" : [
1
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "2",
"_score" : null,
"_source" : {
"account_number" : 2,
"balance" : 28838,
"firstname" : "Roberta",
"lastname" : "Bender",
"age" : 22,
"gender" : "F",
"address" : "560 Kingsway Place",
"employer" : "Chillium",
"email" : "robertabender@chillium.com",
"city" : "Bennett",
"state" : "LA"
},
"sort" : [
2
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"account_number" : 3,
"balance" : 44947,
"firstname" : "Levine",
"lastname" : "Burks",
"age" : 26,
"gender" : "F",
"address" : "328 Wilson Avenue",
"employer" : "Amtap",
"email" : "levineburks@amtap.com",
"city" : "Cochranville",
"state" : "HI"
},
"sort" : [
3
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "4",
"_score" : null,
"_source" : {
"account_number" : 4,
"balance" : 27658,
"firstname" : "Rodriquez",
"lastname" : "Flores",
"age" : 31,
"gender" : "F",
"address" : "986 Wyckoff Avenue",
"employer" : "Tourmania",
"email" : "rodriquezflores@tourmania.com",
"city" : "Eastvale",
"state" : "HI"
},
"sort" : [
4
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "5",
"_score" : null,
"_source" : {
"account_number" : 5,
"balance" : 29342,
"firstname" : "Leola",
"lastname" : "Stewart",
"age" : 30,
"gender" : "F",
"address" : "311 Elm Place",
"employer" : "Diginetic",
"email" : "leolastewart@diginetic.com",
"city" : "Fairview",
"state" : "NJ"
},
"sort" : [
5
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "6",
"_score" : null,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
},
"sort" : [
6
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "7",
"_score" : null,
"_source" : {
"account_number" : 7,
"balance" : 39121,
"firstname" : "Levy",
"lastname" : "Richard",
"age" : 22,
"gender" : "M",
"address" : "820 Logan Street",
"employer" : "Teraprene",
"email" : "levyrichard@teraprene.com",
"city" : "Shrewsbury",
"state" : "MO"
},
"sort" : [
7
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "8",
"_score" : null,
"_source" : {
"account_number" : 8,
"balance" : 48868,
"firstname" : "Jan",
"lastname" : "Burns",
"age" : 35,
"gender" : "M",
"address" : "699 Visitation Place",
"employer" : "Glasstep",
"email" : "janburns@glasstep.com",
"city" : "Wakulla",
"state" : "AZ"
},
"sort" : [
8
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "9",
"_score" : null,
"_source" : {
"account_number" : 9,
"balance" : 24776,
"firstname" : "Opal",
"lastname" : "Meadows",
"age" : 39,
"gender" : "M",
"address" : "963 Neptune Avenue",
"employer" : "Cedward",
"email" : "opalmeadows@cedward.com",
"city" : "Olney",
"state" : "OH"
},
"sort" : [
9
]
}
]
}
}
Okay:
hits.hits – actual array of search results (defaults to first 10 documents)
You can control how many hits are returned, and therefore what Kibana displays, like this:
GET /bank/_search
{
"query": { "match_all": {} },
"size": 50
}
If size isn't specified:
GET /bank/_search
{
"query": { "match_all": {} }
}
then size defaults to 10.
By default the size parameter is set to 10, which is why you see only 10 results. To get more results, adjust this parameter to your needs. When you do not need all the data in one go, it is often better to combine size with the from parameter and fetch the results page by page.
So you can either use "size": 1000, or set "from": 0, "size": 100 to get the first 100 results and then keep sending the same query, changing only the value of from on each request. For example, to get the next 100 results, set "from": 100.
To get all 1000 results, add the size parameter as below:
{
"query":{
// your query here
},
"size": 1000
}
You can read more on from/size here.
You can also pass size as a query parameter:
GET /bank/_search?q=*&sort=account_number:asc&size=1000&pretty
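The page-by-page approach can be sketched as a small generator that produces one request body per page. It is a minimal illustration under the assumptions of the question (1000 documents, sorted by account_number); page_bodies is a hypothetical helper, and sending each body to the search endpoint is left out.

```python
def page_bodies(total, page_size, sort_field="account_number"):
    """Yield one search body per page, advancing `from` by `page_size`."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, total, page_size):
        yield {
            "query": {"match_all": {}},
            # A fixed sort keeps the pages consistent between requests.
            "sort": [{sort_field: "asc"}],
            "from": start,
            "size": min(page_size, total - start),
        }


bodies = list(page_bodies(total=1000, page_size=100))
print(len(bodies), bodies[0]["from"], bodies[1]["from"], bodies[-1]["from"])
```

Each body differs only in its from value, so the same query is reused for every page.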

Elastic query to return only max score records

I will be passing a single should query to Elasticsearch. It should fetch only the records with the max score from the results.
Records in my index :
GET testindex1/_search
"hits" : [
{
"_index" : "testindex1",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "xyz",
"description" : "better",
"place" : "kerala"
}
},
{
"_index" : "testindex1",
"_type" : "doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "abc",
"description" : "best",
"place" : "andra"
}
},
{
"_index" : "testindex1",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "mno",
"description" : "good",
"place" : "tamil"
}
}
]
Query passed:
GET testindex1/_search
{
"query": {
"bool": {
"should": [
{
"query_string": {
"default_field": "name",
"query": "xyz",
"default_operator": "AND"
}
},
{
"query_string": {
"default_field": "description",
"query": "*",
"default_operator": "AND"
}
},
{
"query_string": {
"default_field": "place",
"query": "*",
"default_operator": "AND"
}
}
]
}
}
}
Results for the above query:
"hits" : {
"total" : 3,
"max_score" : 2.287682,
"hits" : [
{
"_index" : "testindex1",
"_type" : "doc",
"_id" : "2",
"_score" : 2.287682,
"_source" : {
"name" : "xyz",
"description" : "better",
"place" : "kerala"
}
},
{
"_index" : "testindex1",
"_type" : "doc",
"_id" : "3",
"_score" : 2.0,
"_source" : {
"name" : "abc",
"description" : "best",
"place" : "andra"
}
},
{
"_index" : "testindex1",
"_type" : "doc",
"_id" : "1",
"_score" : 2.0,
"_source" : {
"name" : "mno",
"description" : "good",
"place" : "tamil"
}
}
]
}
For the above query the max score is 2.287682, and I want to display only the record with the max score. In my scenario more than one record might have the max score. I can't use a must query. I want to fetch only the best result for the query passed.
I don't believe Elasticsearch provides something like this in a single request.
There are two alternatives that come to mind.
If you have a maximum number of results, you can specify that number (e.g. I want the best 20 of all, even if there are more). Ask for that size (size: 20) and then filter the results at the application level. The max_score is available in the response, and so is the score of every document.
The second one is to make a request with size: 1 and filter_path=hits.max_score in order to get the max score, and then make a second request that adds the min_score parameter: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-min-score.html . This means that every search actually takes two requests, but you get the desired behaviour.
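Both alternatives can be sketched on the client side. The helpers below are hypothetical names for illustration: top_scoring_hits filters a response at the application level (first alternative), and with_min_score builds the body of the second request (second alternative). Comparing scores with a tolerance is an assumption made here to guard against floating-point rounding.

```python
import math


def top_scoring_hits(hits_section):
    """Return only the hits whose score equals the response's max_score."""
    max_score = hits_section.get("max_score")
    if max_score is None:
        return []
    return [hit for hit in hits_section["hits"]
            if math.isclose(hit["_score"], max_score)]


def with_min_score(body, max_score):
    """Copy a search body and add min_score for the follow-up request."""
    return {**body, "min_score": max_score}


# The "hits" section of the response from the question, without _source.
hits_section = {
    "total": 3,
    "max_score": 2.287682,
    "hits": [
        {"_id": "2", "_score": 2.287682},
        {"_id": "3", "_score": 2.0},
        {"_id": "1", "_score": 2.0},
    ],
}

print([hit["_id"] for hit in top_scoring_hits(hits_section)])
print(with_min_score({"query": {"match_all": {}}}, hits_section["max_score"]))
```

If several documents share the max score, the first helper returns all of them, which covers the case of more than one best record.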
