Using index sorting by default in _search - elasticsearch

I am using ElasticSearch 7.6 and the Index Sorting feature which was introduced in 6.0.
What I want is to run GET /myindice/_search without specifying a sort and get the documents ordered by the index sorting settings I have defined for my index, NOT by insertion order.
My index, as per the documentation:
PUT twitter
{
"settings" : {
"index" : {
"sort.field" : "date",
"sort.order" : "desc"
}
},
"mappings": {
"properties": {
"date": {
"type": "date"
}
}
}
}
PUT twitter/_doc/a
{
"date": "2015-01-01"
}
PUT twitter/_doc/b
{
"date": "2016-01-01"
}
PUT twitter/_doc/c
{
"date": "2017-01-01"
}
My initial expectation was that
GET twitter/_search
should return docs c, b and a.
Instead, I get the following:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "a",
"_score" : 1.0,
"_source" : {
"date" : "2015-01-01"
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "b",
"_score" : 1.0,
"_source" : {
"date" : "2016-01-01"
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "c",
"_score" : 1.0,
"_source" : {
"date" : "2017-01-01"
}
}
]
}
}
The documentation isn't clear on this particular point, and all of its query examples use a sort:
https://www.elastic.co/guide/en/elasticsearch/reference/6.0/index-modules-index-sorting.html
Am I required to specify the sort order in the GET query (thereby repeating the sort already defined in the index sorting settings)?
Thanks in advance to anyone who can help.
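For reference, here is a plain-Python sketch (no cluster or client involved) of the two pieces in play: the order the index sort settings imply for the three sample documents, and a search body that repeats that sort explicitly. The Elasticsearch index sorting documentation describes index.sort.* as sorting documents inside each segment, and its query examples all pass "sort" explicitly, so the second helper assumes you do repeat it. Both helper names are hypothetical.

```python
import json

# Index settings from the question: sort documents by "date", newest first.
INDEX_SETTINGS = {
    "settings": {"index": {"sort.field": "date", "sort.order": "desc"}},
    "mappings": {"properties": {"date": {"type": "date"}}},
}

# The three sample documents from the question, keyed by _id.
DOCS = {"a": "2015-01-01", "b": "2016-01-01", "c": "2017-01-01"}


def expected_order(docs, order):
    """Return the _ids sorted by date as the index sort would order them.

    ISO-8601 date strings compare correctly as plain strings.
    """
    return sorted(docs, key=docs.get, reverse=(order == "desc"))


def explicit_sort_body(settings):
    """Hypothetical helper: build a _search body whose sort mirrors the index sort."""
    index = settings["settings"]["index"]
    return {
        "query": {"match_all": {}},
        "sort": [{index["sort.field"]: {"order": index["sort.order"]}}],
    }


if __name__ == "__main__":
    order = INDEX_SETTINGS["settings"]["index"]["sort.order"]
    print(expected_order(DOCS, order))  # ['c', 'b', 'a']
    print(json.dumps(explicit_sort_body(INDEX_SETTINGS)))
```

With that body sent to GET twitter/_search, the hits come back as c, b, a because the order is requested explicitly rather than inherited from how documents happen to be laid out in segments.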

Related

Elasticsearch returning wrong results upon query

I am new to Elasticsearch and was doing some experiments to learn it, but the _search query seems to return wrong results. I inserted documents into the index using the following code:
PUT tryDB/_doc/2
{"personId":"2","minor":true,"money":15 }
PUT tryDB/_doc/3
{"personId":"3","minor":true,"money":20 }
PUT tryDB/_doc/4
{"personId":"4","minor":true,"money":25 }
PUT tryDB/_doc/5
{"personId":"5","minor":true,"money":30 }
PUT tryDB/_doc/6
{"personId":"6","minor":true,"money":35 }
PUT tryDB/_doc/7
{"personId":"7","minor":true,"money":40 }
PUT tryDB/_doc/8
{"personId":"8","minor":true,"money":45 }
PUT tryDB/_doc/9
{"personId":"9","minor":true,"money":55 }
PUT tryDB/_doc/10
{"personId":"10","minor":true,"money":60 }
PUT tryDB/_doc/11
{"personId":"11","minor":true,"money":65 }
PUT tryDB/_doc/12
{"personId":"12","minor":true,"money":70 }
PUT tryDB/_doc/13
{"personId":"2","minor":false,"money":80 }
PUT tryDB/_doc/14
{"personId":"2","minor":false,"money":90 }
PUT tryDB/_doc/15
{"personId":"2","minor":false,"money":100 }
PUT tryDB/_doc/16
{"personId":"2","minor":false,"money":10 }
Then I ran a GET tryDB/_search query to list all the documents, which returns:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 16,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "tryDB",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"personId" : "1",
"minor" : true,
"money" : 10
}
},
{
"_index" : "tryDB",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"personId" : "2",
"minor" : true,
"money" : 15
}
},
{
"_index" : "tryDB",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"personId" : "3",
"minor" : true,
"money" : 20
}
},
{
"_index" : "tryDB",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"personId" : "4",
"minor" : true,
"money" : 25
}
},
{
"_index" : "tryDB",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"personId" : "5",
"minor" : true,
"money" : 30
}
},
{
"_index" : "tryDB",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"personId" : "6",
"minor" : true,
"money" : 35
}
},
{
"_index" : "tryDB",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"personId" : "7",
"minor" : true,
"money" : 40
}
},
{
"_index" : "tryDB",
"_id" : "8",
"_score" : 1.0,
"_source" : {
"personId" : "8",
"minor" : true,
"money" : 45
}
},
{
"_index" : "tryDB",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"personId" : "9",
"minor" : true,
"money" : 55
}
},
{
"_index" : "tryDB",
"_id" : "10",
"_score" : 1.0,
"_source" : {
"personId" : "10",
"minor" : true,
"money" : 60
}
}
]
}
}
Where are the remaining 6 documents?
Next, I ran a range query:
GET tryDB/_search
{
"query": {
"range": {
"money": {
"lte":100
}
}
}
}
which returned:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "tryDB",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"personId" : "1",
"minor" : true,
"money" : 10
}
},
{
"_index" : "tryDB",
"_id" : "15",
"_score" : 1.0,
"_source" : {
"personId" : "2",
"minor" : false,
"money" : 100
}
},
{
"_index" : "tryDB",
"_id" : "16",
"_score" : 1.0,
"_source" : {
"personId" : "2",
"minor" : false,
"money" : 10
}
}
]
}
}
This is clearly wrong. Can anyone help me figure out what's going on here?
Where are the remaining 6 documents?
When you do not set "size", Elasticsearch returns 10 documents by default.
Set size like this:
GET tryDB/_search
{
"size": 20,
"query": {
"match_all": {}
}
}
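To see why hits.total.value says 16 while only 10 hits come back, here is a plain-Python model of from/size paging. It assumes the Elasticsearch defaults of from = 0 and size = 10 and that hits arrive in the order shown above; the page function is hypothetical and no client library is involved.

```python
DEFAULT_FROM = 0
DEFAULT_SIZE = 10


def page(hits, from_=DEFAULT_FROM, size=DEFAULT_SIZE):
    """Model one search response: the total counts every match,
    but only the hits[from_:from_ + size] window is returned."""
    return {"total": len(hits), "returned": hits[from_:from_ + size]}


if __name__ == "__main__":
    ids = [str(i) for i in range(1, 17)]  # the 16 documents in tryDB
    default = page(ids)
    print(default["total"], len(default["returned"]))  # 16 10
    print(len(page(ids, size=20)["returned"]))  # 16
    print(page(ids, from_=10)["returned"])  # the remaining 6 documents in this model
```

In a real request, from and size go in the search body; deep paging with from/size is capped by index.max_result_window, which defaults to 10,000.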
For the range query, you can put the condition in a bool filter (filter context, no scoring):
POST tryDB/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"money": {
"lte": 100
}
}
}
]
}
}
}
@rabbitbr Thanks for the quick response!
Hey, I figured out the solution (posting it here).
Based on the results, it looks like Elasticsearch indexed money as a string.
I set up an explicit mapping to make sure the money field is indexed as a number:
https://opensearch.org/docs/1.3/opensearch/mappings/
This worked.
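The three hits are exactly what a string comparison produces: "10" and "100" sort at or before "100", while "15", "20" and the rest sort after it because their second (or first) character is larger. Elasticsearch compares string terms by byte order, which for ASCII digits matches Python's string comparison. Below is a plain-Python sketch of that, assuming money ended up mapped as a string; it uses the money values from the search output above and also shows a hypothetical explicit mapping with money as a number (the types chosen for personId and minor are illustrative guesses).

```python
# money value per _id, taken from the search output above.
MONEY = {
    "1": 10, "2": 15, "3": 20, "4": 25, "5": 30, "6": 35, "7": 40, "8": 45,
    "9": 55, "10": 60, "11": 65, "12": 70, "13": 80, "14": 90, "15": 100, "16": 10,
}

# Hypothetical explicit mapping: money as a number; the other types are guesses.
EXPLICIT_MAPPING = {
    "mappings": {
        "properties": {
            "personId": {"type": "keyword"},
            "minor": {"type": "boolean"},
            "money": {"type": "integer"},
        }
    }
}


def range_lte(values, bound, as_string):
    """Return the _ids whose value is <= bound.

    With as_string=True, both sides are compared as strings
    (lexicographically), which models money indexed as a string.
    """
    if as_string:
        return [i for i, v in values.items() if str(v) <= str(bound)]
    return [i for i, v in values.items() if v <= bound]


if __name__ == "__main__":
    print(range_lte(MONEY, 100, as_string=True))  # ['1', '15', '16']
    print(len(range_lte(MONEY, 100, as_string=False)))  # 16
```

Elasticsearch does not let you change an existing field's type in place, so a mapping like this has to go on a new index, followed by reindexing or re-ingesting the data.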

How to set _id to a field from property in mapping in elasticsearch

I am trying to define a mapping in Elasticsearch in which _id is set from one of the fields in the mapping's properties.
So every time I post data, it should automatically extract this field and use it as the _id.
But every time I save data, a new random _id is generated. Is this the correct way to set _id when defining mappings in Elasticsearch?
PUT /index00001
{
"mappings": {
"_meta":{
"_id" : "userid"
},
"properties": {
"userid": {
"type": "text"
},
"nickname": {
"type": "text"
}
}
},
"settings" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
}
}
POST /index00001/_doc
{
"userid": "6009001",
"nickname": "nick"
}
After posting two documents this way, searching the index returns:
{
"took" : 438,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index00001",
"_type" : "_doc",
"_id" : "IeKqnn0BuUqEU88H_tlq",
"_score" : 1.0,
"_source" : {
"userid" : "6009001",
"nickname" : "nick"
}
},
{
"_index" : "index00001",
"_type" : "_doc",
"_id" : "JeKrnn0BuUqEU88HNtnu",
"_score" : 1.0,
"_source" : {
"userid" : "6009001",
"nickname" : "amit"
}
}
]
}
}
Why is my _id not set from the userid field?
Elasticsearch version: 7.8.0, Lucene version: 8.5.1
It used to be possible to have ES automatically use a field value as the document ID in ES 1.x, but this has not been possible since ES 2.0. The _meta entry in your mapping does not help either: _meta only stores custom application metadata, which Elasticsearch does not act on.
Now you need to pass the ID explicitly when indexing each document; otherwise, one is generated for you.
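One way to do that on the client side is to copy userid into the _id of each bulk action line. The sketch below only builds the NDJSON body for the _bulk endpoint in plain Python (no client library); the bulk_body helper is hypothetical, and the two documents are the ones from the output above, used purely as sample data.

```python
import json


def bulk_body(index, docs, id_field):
    """Build an NDJSON _bulk body where each document's _id is copied
    from one of its own fields (id_field)."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_id": str(doc[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [
        {"userid": "6009001", "nickname": "nick"},
        {"userid": "6009001", "nickname": "amit"},
    ]
    print(bulk_body("index00001", docs, "userid"), end="")
```

Both documents in the output above share userid 6009001, so with this scheme the second one would overwrite the first instead of creating a second document. For a single document, the same idea is PUT /index00001/_doc/6009001 with the document as the body.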

Range filter for count of documents with the same value for a field

In my index my-books, each document represents a book and has a field authorId, which uniquely represents the author of the book. I want to run a search query with a range filter on the total number of books authored by the book's author.
For example, say I have four authors: A, B, C and D.
A is the author of books a1, a2, a3.
B is the author of book b1.
C is the author of books c1, c2.
D is the author of books d1, d2, d3, d4.
Let's say I want to retrieve all books such that the number of books written by the same author is greater than 1 but less than 4. Then my result hits are [a1, a2, a3, c1, c2].
How do I write such a query?
You need to use:
a terms aggregation to group by author
top_hits to get the documents under each author
bucket_selector to keep only the terms whose doc count is less than 4 (add params.values > 1 && to the script to also enforce the lower bound)
{
"aggs": {
"NAME": {
"terms": {
"field": "author.keyword",
"size": 10
},
"aggs": {
"books": {
"top_hits": {
"size": 10
}
},
"final_filter": {
"bucket_selector": {
"buckets_path": {
"values": "_count"
},
"script": "params.values < 4"
}
}
}
}
}
}
Result
"aggregations" : {
"NAME" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "A",
"doc_count" : 2,
"books" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index148",
"_type" : "_doc",
"_id" : "-_pOUHoBVZyA6L_G1XrM",
"_score" : 1.0,
"_source" : {
"book" : "a1",
"author" : "A"
}
},
{
"_index" : "index148",
"_type" : "_doc",
"_id" : "_PpPUHoBVZyA6L_GL3q5",
"_score" : 1.0,
"_source" : {
"book" : "a3",
"author" : "A"
}
}
]
}
}
},
{
"key" : "B",
"doc_count" : 1,
"books" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index148",
"_type" : "_doc",
"_id" : "_fpPUHoBVZyA6L_GWHpg",
"_score" : 1.0,
"_source" : {
"book" : "b1",
"author" : "B"
}
}
]
}
}
},
{
"key" : "C",
"doc_count" : 1,
"books" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index148",
"_type" : "_doc",
"_id" : "_vpPUHoBVZyA6L_Gmnoj",
"_score" : 1.0,
"_source" : {
"book" : "c1",
"author" : "C"
}
}
]
}
}
}
]
}
}
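The grouping logic itself (group by author, count each bucket, keep the buckets inside the range, return their books) can be checked in plain Python against the data from the question. This is a model of what the aggregation pipeline computes, not Elasticsearch code; it applies both bounds, which in the aggregation above would correspond to a script such as params.values > 1 && params.values < 4. The function name is hypothetical.

```python
from collections import defaultdict

# Books from the question: (book, authorId).
BOOKS = [
    ("a1", "A"), ("a2", "A"), ("a3", "A"),
    ("b1", "B"),
    ("c1", "C"), ("c2", "C"),
    ("d1", "D"), ("d2", "D"), ("d3", "D"), ("d4", "D"),
]


def books_by_author_count(books, min_exclusive, max_exclusive):
    """Group books by author (like a terms aggregation), keep the authors
    whose book count is strictly between the bounds (like bucket_selector),
    and return those authors' books (like top_hits)."""
    buckets = defaultdict(list)
    for book, author in books:
        buckets[author].append(book)
    hits = []
    # Sorted by author key for a deterministic result; a real terms
    # aggregation orders buckets by doc_count (descending) by default.
    for author in sorted(buckets):
        if min_exclusive < len(buckets[author]) < max_exclusive:
            hits.extend(buckets[author])
    return hits


if __name__ == "__main__":
    print(books_by_author_count(BOOKS, 1, 4))  # ['a1', 'a2', 'a3', 'c1', 'c2']
```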

How do I apply reindex to new data values through filters?

This is the output for my basic data (example):
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 163,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "0513_final_test_instgram",
"_type" : "_doc",
"_id" : "6uShY3kBEkIlakOYovrR",
"_score" : 1.0,
"_source" : {
"host" : "DESKTOP-7MDCA36",
"path" : "C:/python_file/20210513_114123_instargram.csv",
"@version" : "1",
"message" : "hello",
"@timestamp" : "2021-05-13T02:50:05.962Z"
}
},
{
"_index" : "0513_final_test_instgram",
"_type" : "_doc",
"_id" : "EeShY3kBEkIlakOYovvm",
"_score" : 1.0,
"_source" : {
"host" : "DESKTOP-7MDCA36",
"path" : "C:/python_file/20210513_114123_instargram.csv",
"@version" : "1",
"message" : "python",
"@timestamp" : "2021-05-13T02:50:05.947Z"
}
}
]
}
}
First, out of all the fields, I extracted only the message values (code below):
GET 0513_final_test_instgram/_search?_source=message&filter_path=hits.hits._source
{
"hits" : {
"hits" : [
{
"_source" : {
"message" : "hello"
}
},
{
"_source" : {
"message" : "python"
}
}
]
}
}
I learned that the reindex API can store documents in a new index:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
However, even after reading the documentation, I can't work out how to do this.
My 0513 attempt:
POST _reindex
{
"source": {
"index": "0513_final_test_instgram"
},
"dest": {
"index": "new_data_index"
}
}
How do I use reindex to store only the extracted message values in a new index?
Update: after trying the suggestion from the comments, the output is:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 163,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "new_data_index",
"_type" : "_doc",
"_id" : "6uShY3kBEkIlakOYovrR",
"_score" : 1.0,
"_source" : {
"message" : "hello"
}
},
{
"_index" : "new_data_index",
"_type" : "_doc",
"_id" : "EeShY3kBEkIlakOYovvm",
"_score" : 1.0,
"_source" : {
"message" : "python"
}
}
]
}
}
You simply need to specify which fields you want to reindex into the new index:
POST _reindex
{
"source": {
"index": "0513_final_test_instgram",
"_source": ["message"]
},
"dest": {
"index": "new_data_index"
}
}
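In effect, listing fields under source._source makes reindex copy each document with its original _id but with only those top-level fields kept, which matches the updated output in the question (same _ids, only message left). Here is a plain-Python model of that, plus the request body as a dict; the function name is hypothetical, and the model only handles top-level field names, not wildcards or nested paths.

```python
# Request body from the answer, as a Python dict (send it with POST _reindex).
REINDEX_BODY = {
    "source": {"index": "0513_final_test_instgram", "_source": ["message"]},
    "dest": {"index": "new_data_index"},
}


def reindex_with_source_filter(docs, fields):
    """Copy {_id: _source} docs, keeping only the listed top-level fields."""
    return {
        doc_id: {k: v for k, v in source.items() if k in fields}
        for doc_id, source in docs.items()
    }


if __name__ == "__main__":
    source_index = {
        "6uShY3kBEkIlakOYovrR": {"host": "DESKTOP-7MDCA36", "message": "hello"},
        "EeShY3kBEkIlakOYovvm": {"host": "DESKTOP-7MDCA36", "message": "python"},
    }
    fields = REINDEX_BODY["source"]["_source"]
    print(reindex_with_source_filter(source_index, fields))
```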

Problem with creating roles in open-distro for elasticsearch

I have two roles assigned to one user. In the first role, I include the field name for the documents with _id 1 and 2:
{
"index_permissions": [
{
"index_patterns": [
"test"
],
"dls": "{\n \"terms\": {\n \"_id\": [ \"1\", \"2\"] \n }\n}\n\n",
"fls": [
"name"
],
"masked_fields": [],
"allowed_actions": [
"get",
"crud"
]
}
],
"tenant_permissions": [],
"cluster_permissions": [
"*"
]
}
In the second role, I include the field job_description for the document with _id 3:
{
"index_permissions": [
{
"index_patterns": [
"test"
],
"dls": "{\n \"terms\": {\n \"_id\": [\"3\"] \n }\n}\n",
"fls": [
"job_description"
],
"masked_fields": [],
"allowed_actions": []
}
],
"tenant_permissions": [],
"cluster_permissions": []
}
When I query the index, it shows both job_description and name in all documents:
{
"took" : 237,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 2.0,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 2.0,
"_source" : {
"name" : "John",
"job_description" : "Systems administrator and Linux specialist"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.0,
"_source" : {
"name" : "John",
"job_description" : "Systems administrator and Linux specialist"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.0,
"_source" : {
"name" : "John",
"job_description" : "Systems administrator and Linux specialist"
}
}
]
}
}
But I want to see only name in the first two documents and only job_description in the third document, like this:
{
"took" : 237,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 2.0,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 2.0,
"_source" : {
"name" : "John"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.0,
"_source" : {
"name" : "John"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.0,
"_source" : {
"job_description" : "Systems administrator and Linux specialist"
}
}
]
}
}
Does anyone know how to do this?
DLS and FLS do not work in conjunction like that.
DLS restricts which documents are returned, based on the DLS query, whereas FLS restricts which fields are included in or excluded from the documents that are returned.
For a user mapped to several roles, all the DLS queries are combined with OR, and all the FLS field lists are merged (the union of the listed fields).
In your case, you have two DLS queries and two FLS lists. The DLS queries are OR-ed, so documents with _id 1, 2 or 3 are returned. Likewise, both name and job_description are returned for every document.
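Here is a simplified plain-Python model of the combination rules described above, using the two roles from the question: the DLS filters are OR-ed, so a document is visible if any role's filter matches it, and the FLS field lists are merged, so every visible document shows the union of the listed fields. This only illustrates the described behavior, not the security plugin's actual implementation; it represents each DLS query as a set of allowed _ids and ignores masked fields and exclude-style FLS.

```python
# The two roles from the question: DLS as a set of allowed _ids, FLS as included fields.
ROLES = [
    {"dls_ids": {"1", "2"}, "fls": {"name"}},
    {"dls_ids": {"3"}, "fls": {"job_description"}},
]

DOCS = {
    doc_id: {"name": "John", "job_description": "Systems administrator and Linux specialist"}
    for doc_id in ("1", "2", "3")
}


def visible(docs, roles):
    """Apply the roles as the answer describes: OR the DLS filters and
    take the union of the FLS field lists, independently of each other."""
    allowed_ids = set().union(*(r["dls_ids"] for r in roles))
    allowed_fields = set().union(*(r["fls"] for r in roles))
    return {
        doc_id: {k: v for k, v in src.items() if k in allowed_fields}
        for doc_id, src in docs.items()
        if doc_id in allowed_ids
    }


if __name__ == "__main__":
    for doc_id, src in sorted(visible(DOCS, ROLES).items()):
        print(doc_id, sorted(src))  # every document shows both fields
```

In this model the field set is computed once for the user rather than per document, which is why the per-document layout asked for does not come out of these two roles.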
