Is it possible to extract the stored value of a keyword field when _source is disabled in Elasticsearch 7 - elasticsearch

I have the following index:
{
"articles_2022" : {
"mappings" : {
"_source" : {
"enabled" : false
},
"properties" : {
"content" : {
"type" : "text",
"norms" : false
},
"date" : {
"type" : "date"
},
"feed_canonical" : {
"type" : "boolean"
},
"feed_id" : {
"type" : "integer"
},
"feed_subscribers" : {
"type" : "integer"
},
"language" : {
"type" : "keyword",
"doc_values" : false
},
"title" : {
"type" : "text",
"norms" : false
},
"url" : {
"type" : "keyword",
"doc_values" : false
}
}
}
}
}
I have a very specific one-time need and I want to extract the stored values from the url field for all documents. Is this possible with Elasticsearch 7? Thanks!

Since the url field in your index mapping is of type keyword and has "doc_values": false, you cannot run a terms aggregation on it.
As far as I understand your question, you only need to get the value of the url field for several documents. For that you can use an exists query together with _source filtering. Note that this relies on _source being enabled, which is the case in the example index below.
Adding a working example
Index Mapping:
PUT idx1
{
"mappings": {
"properties": {
"url": {
"type": "keyword",
"doc_values": false
}
}
}
}
Index Data:
POST idx1/_doc/1
{
"url":"www.google.com"
}
POST idx1/_doc/2
{
"url":"www.youtube.com"
}
Search Query:
POST idx1/_search
{
"_source": [
"url"
],
"query": {
"exists": {
"field": "url"
}
}
}
Search Response:
"hits" : [
{
"_index" : "idx1",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"url" : "www.google.com"
}
},
{
"_index" : "idx1",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"url" : "www.youtube.com"
}
}
]

Since your index has
"_source" : { "enabled" : false }
you can add "store": true to the mapping of the field whose value you want to extract. The field has to be mapped this way before the documents are indexed.
For example:
PUT indexExample2
{
"mappings": {
"_source": {
"enabled": false
},
"properties": {
"url": {
"type": "keyword",
"doc_values": false,
"store": true
}
}
}
}
Now index some data (thanks to #ESCoder for the example):
POST indexExample2/_doc/1
{
"url":"www.google.com"
}
POST indexExample2/_doc/2
{
"url":"www.youtube.com"
}
You can then retrieve just the stored field in your search queries, even though _source is disabled.
POST indexExample2/_search
{
"query": {
"exists": {
"field": "url"
}
},
"stored_fields": ["url"]
}
This will output:
"hits" : [
{
"_index" : "indexExample2",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"url" : [
"www.google.com"
]
}
},
{
"_index" : "indexExample2",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"fields" : {
"url" : [
"www.youtube.com"
]
}
}
]
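Since this is a one-time extraction, the values can be collected on the client side from the "fields" section of each hit. Below is a minimal Python sketch, assuming the search response has already been parsed into a dict shaped like the output above. The function name stored_values is hypothetical, and paging through all documents (for example with scroll or search_after) is left out.

```python
from typing import Any, Dict, List


def stored_values(response: Dict[str, Any], field: str) -> List[str]:
    """Collect the values of one stored field from a parsed search response."""
    values: List[str] = []
    for hit in response.get("hits", {}).get("hits", []):
        # Stored fields are returned as arrays under "fields";
        # a hit without the field simply has no entry for it.
        values.extend(hit.get("fields", {}).get(field, []))
    return values


# Sample response mirroring the output shown above.
sample = {
    "hits": {
        "hits": [
            {"_id": "1", "fields": {"url": ["www.google.com"]}},
            {"_id": "2", "fields": {"url": ["www.youtube.com"]}},
        ]
    }
}

print(stored_values(sample, "url"))
```

Hits that lack the stored field are skipped rather than raising an error, which matters if only part of the index was written with "store": true.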

Related

Elasticsearch | Mapping exclude field with bulk API

I am using the bulk API to create an index and store data fields. I also want to set a mapping that excludes the field "field1" from the source. I know this can be done with the create index API (reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html), but I am using the bulk API. Below is a sample API call:
POST _bulk
{ "index" : { "_index" : "test", "_type" : "testType", "_id" : "1" } }
{ "field1" : "value1" }
Is there a way to add mapping settings while bulk indexing, similar to the code below?
{ "index" : { "_index" : "test", "_type" : "testType", "_id" : "1" },
"mappings": {
"_source": {
"excludes": [
"field1"
]
}
}
}
{ "field1" : "value1" }
How can I define a mapping with the bulk API?
It is not possible to define the mapping for a new index while using the bulk API. You have to create the index beforehand and define the mapping at that point, or define an index template and use an index name in your bulk request that matches the template's pattern.
The following example can be executed in the Dev Tools console in Kibana:
PUT /_index_template/mytemplate
{
"index_patterns": [
"te*"
],
"priority": 1,
"template": {
"mappings": {
"_source": {
"excludes": [
"testexclude"
]
},
"properties": {
"testfield": {
"type": "keyword"
}
}
}
}
}
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "testfield" : "value1", "defaultField" : "asdf", "testexclude": "this shouldn't be in source" }
GET /test/_mapping
You can see from the response that the template was applied to the new test index: testfield has only the keyword type, and the _source excludes setting comes from the template.
{
"test" : {
"mappings" : {
"_source" : {
"excludes" : [
"testexclude"
]
},
"properties" : {
"defaultField" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"testexclude" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"testfield" : {
"type" : "keyword"
}
}
}
}
}
Also, the document is returned without the excluded field:
GET /test/_doc/1
Response:
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"defaultField" : "asdf",
"testfield" : "value1"
}
}
Hope this answers your question and solves your use-case.
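To make the separation between mapping and bulk data concrete, here is a minimal Python sketch that builds the body for POST _bulk. The helper name bulk_body is hypothetical. The body contains only action lines and document lines; the mapping never appears in it and comes from the index or template created beforehand.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def bulk_body(index: str, docs: Iterable[Tuple[str, Dict[str, Any]]]) -> str:
    """Build a newline-delimited body for POST _bulk from (id, document) pairs."""
    lines = []
    for doc_id, doc in docs:
        # Action line: only metadata such as _index and _id belongs here.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body(
    "test",
    [("1", {"testfield": "value1", "testexclude": "this shouldn't be in source"})],
)
print(body)
```

Because the index name "test" matches the "te*" pattern of the template above, sending this body would create the index with the template's mapping.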

Why can't I get Elasticsearch's completion suggester to sort based on a field?

I'm trying to get autocomplete suggestions from Elasticsearch, but sorted by an internal popularity score that I supply in the data, so that the most popular ones show at the top. My POST looks like this:
curl "http://penguin:9200/node/_search?pretty" --silent --show-error \
--header "Content-Type: application/json" \
-X POST \
-d '
{
"_source" : [
"name",
"popular_score"
],
"sort" : [ "popular_score" ],
"suggest" : {
"my_suggestion" : {
"completion" : {
"field" : "searchbar_suggest",
"size" : 10,
"skip_duplicates" : true
},
"text" : "f"
}
}
}
'
I get back valid autocomplete suggestions, but they aren't sorted by the popular_score field:
{
...
"suggest" : {
"my_suggestion" : [
{
"text" : "f",
"offset" : 0,
"length" : 1,
"options" : [
{
"text" : "2020 Fact Longlist",
"_index" : "node",
"_type" : "_doc",
"_id" : "245105",
"_score" : 1.0,
"_source" : {
"popular_score" : "35",
"name" : "2020 Fact Longlist"
}
},
{
"text" : "Fable",
"_index" : "node",
"_type" : "_doc",
"_id" : "125903",
"_score" : 1.0,
"_source" : {
"popular_score" : "69.33333333333333333333333333333333333333",
"name" : "Fable"
}
},
{
"text" : "Fables",
"_index" : "node",
"_type" : "_doc",
"_id" : "172986",
"_score" : 1.0,
"_source" : {
"popular_score" : "24",
"name" : "Fables"
}
}
...
]
}
]
}
}
My mappings are:
{
"mappings": {
"properties": {
"nodeid": {
"type": "integer"
},
"name": {
"type": "text",
"copy_to": "searchbar_suggest"
},
"popular_score": {
"type": "float"
},
"searchbar_suggest": {
"type": "completion"
}
}
}
}
What am I doing wrong?

filter with special character in ElasticSearch 6.0.0

I am trying to filter all data that contains special characters such as '#', '.', or '/', but have not succeeded.
I want to fetch the cities that contain # or a dot (.), so I need a query whose output contains the special character.
I am quite new to Elasticsearch queries, so please help me.
Thanks
Below is the index data:
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [
{
"_index" : "student",
"_type" : "data",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "Mirja",
"city" : "pune # bandra",
"contact number" : 9723124343
}
},
{
"_index" : "student",
"_type" : "data",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "Rohan",
"city" : "BBSR /. patia",
"contact number" : 9723124343
}
},
{
"_index" : "student",
"_type" : "data",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "Diya",
"city" : "pune_bandra",
"contact number" : 9723124343
}
}
]
}
}
You need to check the analyzer on your city field. If it is the standard analyzer, it removes special characters when creating tokens. Instead, use the mapping below on the city field and search using a regular match query:
PUT test_index
{
"mappings": {
"properties": {
"city": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "whitespace"
}
}
}
}
}
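To see why this analyzer keeps the special characters, the tokenization can be approximated in plain Python. This is only a rough sketch of a whitespace tokenizer followed by a lowercase filter, not the Elasticsearch implementation, and the function name is hypothetical.

```python
from typing import List


def whitespace_lowercase_tokens(text: str) -> List[str]:
    """Approximate a whitespace tokenizer followed by a lowercase filter."""
    # str.split() with no arguments splits on runs of whitespace only,
    # so punctuation such as '#' or '/.' survives as part of a token.
    return [token.lower() for token in text.split()]


for city in ["pune # bandra", "BBSR /. patia", "pune_bandra"]:
    print(city, "->", whitespace_lowercase_tokens(city))
```

With these tokens, a match query for # finds "pune # bandra" because # is a token of its own. A query for a lone dot would not find "BBSR /. patia", since the indexed token there is "/." as a whole.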

Multi match query and the scoring calculation in Elasticsearch

I have a couple of documents in the index, and two of them are
{
"id" : "c0706549-d06c-4043-8086-1b4b3ec1ef95",
"title" : "Google Pixel XL",
"memory" : "4GB",
"quantity" : 3
}
{
"id" : "23ecaecd-6b3f-4592-b79f-f46a20157221",
"title" : "Google Pixel XL",
"memory" : "6GB",
"quantity" : 1
}
And for the query
{
"query": { "multi_match": { "query": "pixel xl 6gb", "fields": ["title", "memory"] } }
}
I get the response
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "c0706549-d06c-4043-8086-1b4b3ec1ef95",
"_score" : 2.4280763,
"_source" : {
"id" : "c0706549-d06c-4043-8086-1b4b3ec1ef95",
"title" : "Google Pixel XL",
"memory" : "4GB",
"quantity" : 3
}
},
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "23ecaecd-6b3f-4592-b79f-f46a20157221",
"_score" : 2.4280763,
"_source" : {
"id" : "23ecaecd-6b3f-4592-b79f-f46a20157221",
"title" : "Google Pixel XL",
"memory" : "6GB",
"quantity" : 1
}
}
But I expect the document with the memory field 6GB to be on top. Can you please advise why this happens and how to fix it?
Index mapping
{
"mappings" : {
"properties" : {
"memory" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"fielddata" : true
},
"title" : {
"type" : "text",
"analyzer" : "synonym_analyzer"
}
}
}
}
Index settings
{
"index" : {
"analysis" : {
"filter" : {
"synonym_filter" : {
"type" : "synonym",
"synonyms" : [
"laptop, notebook"
]
}
},
"analyzer" : {
"synonym_analyzer" : {
"tokenizer" : "standard",
"filter" : ["lowercase", "synonym_filter"]
}
}
}
}
}
Elasticsearch version 7.7.0
I just tried this locally and I get a much higher score (~2X) for the document containing 6GB. If you look carefully, in your case both documents have exactly the same score (2.4280763), which means both have exactly the same relevance and only their order in the Elasticsearch response differs.
Think of it this way: if you need to sort the numbers 1,2,3,1, the result is 1,1,2,3, and it doesn't matter which 1 comes first.
Also, you need to provide your mapping, index configuration (number of shards), and Elasticsearch version for the score calculation (older versions use TF/IDF while newer ones use BM25).
I tried this on ES 7.7 and, as mentioned earlier, with my mapping and your sample data the 6GB document got a 2X better score.
Index mapping
{
"mappings": {
"properties": {
"title" : {
"type": "text"
},
"memory" : {
"type" : "text"
}
}
}
}
Index your 2 docs
{
"title" : "Google Pixel XL",
"memory" : "6GB"
}
{
"title" : "Google Pixel XL",
"memory" : "4GB"
}
Search query
{
"query": {
"multi_match": {
"query": "pixel xl 6gb",
"fields": [
"title",
"memory"
]
}
}
}
And search result
"hits": [
{
"_index": "multma",
"_type": "_doc",
"_id": "2",
"_score": 0.6931471, --> note this
"_source": {
"title": "Google Pixel XL",
"memory": "6GB" --> 6 GB one has a better score and coming on top
}
},
{
"_index": "multma",
"_type": "_doc",
"_id": "1",
"_score": 0.36464313,
"_source": {
"title": "Google Pixel XL",
"memory": "4GB"
}
}
]
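The point about tied scores can be illustrated outside Elasticsearch with a small Python sketch. The two hits reuse the score from the question. Using quantity as a tiebreaker is an arbitrary choice made only for this illustration, not something the question asks for.

```python
from typing import Any, Dict, List

hits: List[Dict[str, Any]] = [
    {"memory": "4GB", "quantity": 3, "score": 2.4280763},
    {"memory": "6GB", "quantity": 1, "score": 2.4280763},
]

# Sorting by score alone: Python's sort is stable, so tied hits simply keep
# their input order. Neither order is "more correct" than the other.
by_score = sorted(hits, key=lambda h: -h["score"])

# Adding a second key makes the order of tied hits deterministic.
by_score_then_quantity = sorted(hits, key=lambda h: (-h["score"], h["quantity"]))

print([h["memory"] for h in by_score])
print([h["memory"] for h in by_score_then_quantity])
```

As long as the scores are equal, only an explicit secondary sort key, or a query that actually scores the 6GB document higher, can guarantee which document comes first.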

ElasticSearch join data within the same index

I am quite new to Elasticsearch and I am collecting some application logs in the same index, which have this format
{
"_index" : "app_logs",
"_type" : "_doc",
"_id" : "JVMYi20B0a2qSId4rt12",
"_source" : {
"username" : "mapred",
"app_id" : "application_1569623930006_490200",
"event_type" : "STARTED",
"ts" : "2019-10-02T08:11:53Z"
}
}
I can have different event types. In this case I am interested in STARTED and FINISHED. I would like to query ES to get all the apps that started on a certain day and enrich them with their end time. Basically I want to create start/end pairs (an end might also be missing, but that's fine).
I have realized that SQL-style joins cannot be used in ES, and I was wondering if I can exploit some other feature to get this result in one query.
Edit: these are the details of the index mapping
{
"app_logs" : {
"mappings" : {
"_doc" : {
"properties" : {
"event_type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"app_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"ts" : {
"type" : "date"
},
"event_type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}}}}
What I understood is that you want to collate the documents that share the same app_id and have a status of either STARTED or FINISHED.
Elasticsearch is not meant to perform JOIN operations. You can do it, but then you have to design your documents as mentioned in this link.
What you need is an aggregation query.
Below are a sample mapping, documents, the aggregation query, and the response as it appears, which should help you get the desired result.
Mapping:
PUT mystatusindex
{
"mappings": {
"properties": {
"username":{
"type": "keyword"
},
"app_id":{
"type": "keyword"
},
"event_type":{
"type":"keyword"
},
"ts":{
"type": "date"
}
}
}
}
Sample Documents
POST mystatusindex/_doc/1
{
"username" : "mapred",
"app_id" : "application_1569623930006_490200",
"event_type" : "STARTED",
"ts" : "2019-10-02T08:11:53Z"
}
POST mystatusindex/_doc/2
{
"username" : "mapred",
"app_id" : "application_1569623930006_490200",
"event_type" : "FINISHED",
"ts" : "2019-10-02T08:12:53Z"
}
POST mystatusindex/_doc/3
{
"username" : "mapred",
"app_id" : "application_1569623930006_490201",
"event_type" : "STARTED",
"ts" : "2019-10-02T09:30:53Z"
}
POST mystatusindex/_doc/4
{
"username" : "mapred",
"app_id" : "application_1569623930006_490202",
"event_type" : "STARTED",
"ts" : "2019-10-02T09:45:53Z"
}
POST mystatusindex/_doc/5
{
"username" : "mapred",
"app_id" : "application_1569623930006_490202",
"event_type" : "FINISHED",
"ts" : "2019-10-02T09:45:53Z"
}
POST mystatusindex/_doc/6
{
"username" : "mapred",
"app_id" : "application_1569623930006_490203",
"event_type" : "STARTED",
"ts" : "2019-10-03T09:30:53Z"
}
POST mystatusindex/_doc/7
{
"username" : "mapred",
"app_id" : "application_1569623930006_490203",
"event_type" : "FINISHED",
"ts" : "2019-10-03T09:45:53Z"
}
Query:
POST mystatusindex/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"ts": {
"gte": "2019-10-02T00:00:00Z",
"lte": "2019-10-02T23:59:59Z"
}
}
}
],
"should": [
{
"match": {
"event_type": "STARTED"
}
},
{
"match": {
"event_type": "FINISHED"
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"application_IDs": {
"terms": {
"field": "app_id"
},
"aggs": {
"ids": {
"top_hits": {
"size": 10,
"_source": ["event_type", "app_id"],
"sort": [
{ "event_type": { "order": "desc"}}
]
}
}
}
}
}
}
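Notice that for filtering I've used a Range Query, since you only want documents for that date, and added bool should clauses for STARTED and FINISHED. Keep in mind that when a bool query also has a must clause, should clauses are optional by default and only influence scoring; "minimum_should_match": 1 is needed for them to actually filter.
Once I have the documents, I use a Terms Aggregation and a Top Hits Aggregation to get the desired result.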
Result
{
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"application_IDs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "application_1569623930006_490200", <----- APP ID
"doc_count" : 2,
"ids" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "1", <--- Document with STARTED status
"_score" : null,
"_source" : {
"event_type" : "STARTED",
"app_id" : "application_1569623930006_490200"
},
"sort" : [
"STARTED"
]
},
{
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "2", <--- Document with FINISHED status
"_score" : null,
"_source" : {
"event_type" : "FINISHED",
"app_id" : "application_1569623930006_490200"
},
"sort" : [
"FINISHED"
]
}
]
}
}
},
{
"key" : "application_1569623930006_490202",
"doc_count" : 2,
"ids" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "4",
"_score" : null,
"_source" : {
"event_type" : "STARTED",
"app_id" : "application_1569623930006_490202"
},
"sort" : [
"STARTED"
]
},
{
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "5",
"_score" : null,
"_source" : {
"event_type" : "FINISHED",
"app_id" : "application_1569623930006_490202"
},
"sort" : [
"FINISHED"
]
}
]
}
}
},
{
"key" : "application_1569623930006_490201",
"doc_count" : 1,
"ids" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"event_type" : "STARTED",
"app_id" : "application_1569623930006_490201"
},
"sort" : [
"STARTED"
]
}
]
}
}
}
]
}
}
}
Note that the last document with only STARTED appears in the aggregation result as well.
Updated Answer
{
"size":0,
"query":{
"bool":{
"must":[
{
"range":{
"ts":{
"gte":"2019-10-02T00:00:00Z",
"lte":"2019-10-02T23:59:59Z"
}
}
}
],
"should":[
{
"term":{
"event_type.keyword":"STARTED" <----- Changed this
}
},
{
"term":{
"event_type.keyword":"FINISHED" <----- Changed this
}
}
],
"minimum_should_match":1
}
},
"aggs":{
"application_IDs":{
"terms":{
"field":"app_id.keyword" <----- Changed this
},
"aggs":{
"ids":{
"top_hits":{
"size":10,
"_source":[
"event_type",
"app_id"
],
"sort":[
{
"event_type.keyword":{ <----- Changed this
"order":"desc"
}
}
]
}
}
}
}
}
}
Note the changes I've made. Whenever you need exact matches or want to use aggregations, you need to use a field of keyword type.
In the mapping you've shared, there is no username field but there are two event_type fields. I'm assuming it's just a human error and that one of the fields should be username.
Now if you look carefully, the field event_type is of type text and has a keyword sub-field. I've modified the query to use the keyword sub-field, and because of that I'm using a Term Query.
Try this out and let me know if it helps!
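The aggregation groups the events per app_id, but building the actual start/end pairs still happens on the client. Below is a minimal Python sketch of that step, assuming each event is a dict with the app_id, event_type, and ts fields from the sample documents. The function name pair_events is hypothetical, and ts would have to be added to the "_source" list of the top_hits aggregation for it to be available.

```python
from typing import Dict, List, Optional


def pair_events(events: List[Dict[str, str]]) -> Dict[str, Dict[str, Optional[str]]]:
    """Group events by app_id and record the STARTED and FINISHED timestamps."""
    pairs: Dict[str, Dict[str, Optional[str]]] = {}
    for event in events:
        if event["event_type"] not in ("STARTED", "FINISHED"):
            continue  # ignore any other event types
        entry = pairs.setdefault(event["app_id"], {"STARTED": None, "FINISHED": None})
        entry[event["event_type"]] = event["ts"]
    # Keep only applications that have a start; the end may be missing.
    return {app: ts for app, ts in pairs.items() if ts["STARTED"] is not None}


events = [
    {"app_id": "application_1569623930006_490200", "event_type": "STARTED", "ts": "2019-10-02T08:11:53Z"},
    {"app_id": "application_1569623930006_490200", "event_type": "FINISHED", "ts": "2019-10-02T08:12:53Z"},
    {"app_id": "application_1569623930006_490201", "event_type": "STARTED", "ts": "2019-10-02T09:30:53Z"},
]

for app_id, ts in pair_events(events).items():
    print(app_id, ts["STARTED"], ts["FINISHED"])
```

An application without a FINISHED event keeps None as its end time, which matches the requirement that an end might be missing.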
