I have a problem setting this up in Elasticsearch 7.
My goal is to have an @timestamp field value added automatically whenever a new doc is created in ES.
I found some answers to similar questions, but they can't be the solution because they target a different version.
I tried the _default_ object in the mappings object, but it seems to no longer be supported in ES 7:
"_default_":{
"_timestamp" : {
"enabled" : true,
"store" : true
}
}
And this is the case where I want the @timestamp value added:
PUT /locations
{
"mappings": {
"properties": {
"location": {
"type": "geo_point"
},
"id": {
"type": "text"
}
}
}
}
PUT /locations/_doc/1
{
"location" : "31.387593,121.123446",
"id" : "xxxxxxxxxxxxxxxxxxxxxx"
}
Expected result:
{
"@timestamp" : "2019-10-23 10:23:50",
"location" : "31.387593,121.123446",
"id" : "xxxxxxxxxxxxxxxxxxxxxx"
}
You can create an ingest pipeline:
PUT _ingest/pipeline/timestamp
{
"description": "Adds timestamp to documents",
"processors": [
{
"set": {
"field": "_source.timestamp",
"value": "{{_ingest.timestamp}}"
}
}
]
}
And call it while inserting documents:
POST index39/_doc?pipeline=timestamp
{
"id":1
}
Response:
{
"_index" : "index39",
"_type" : "_doc",
"_id" : "KWF6920BpmJq35glEsr3",
"_score" : 1.0,
"_source" : {
"id" : 1,
"timestamp" : "2019-10-23T07:17:15.639200400Z"
}
}
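If you don't want to pass ?pipeline=timestamp on every index request, you can also make the pipeline run automatically by setting it as the index's default pipeline (available since ES 6.5). A minimal sketch, using the locations index from the question:
PUT /locations/_settings
{
  "index.default_pipeline": "timestamp"
}
After this, every document indexed into locations goes through the timestamp pipeline without any extra request parameter.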
I have the following index:
{
"articles_2022" : {
"mappings" : {
"_source" : {
"enabled" : false
},
"properties" : {
"content" : {
"type" : "text",
"norms" : false
},
"date" : {
"type" : "date"
},
"feed_canonical" : {
"type" : "boolean"
},
"feed_id" : {
"type" : "integer"
},
"feed_subscribers" : {
"type" : "integer"
},
"language" : {
"type" : "keyword",
"doc_values" : false
},
"title" : {
"type" : "text",
"norms" : false
},
"url" : {
"type" : "keyword",
"doc_values" : false
}
}
}
}
}
I have a very specific one-time need and I want to extract the stored values from the url field for all documents. Is this possible with Elasticsearch 7? Thanks!
Since in your index mapping you have defined the url field as keyword type with "doc_values": false, you cannot run a terms aggregation on it.
As far as I understand your question, you only need to get the value of the url field from several documents. For that you can use an exists query.
Here's a working example.
Index Mapping:
PUT idx1
{
"mappings": {
"properties": {
"url": {
"type": "keyword",
"doc_values": false
}
}
}
}
Index Data:
POST idx1/_doc/1
{
"url":"www.google.com"
}
POST idx1/_doc/2
{
"url":"www.youtube.com"
}
Search Query:
POST idx1/_search
{
"_source": [
"url"
],
"query": {
"exists": {
"field": "url"
}
}
}
Search Response:
"hits" : [
{
"_index" : "idx1",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"url" : "www.google.com"
}
},
{
"_index" : "idx1",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"url" : "www.youtube.com"
}
}
]
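Since the goal is to extract the url of all documents, keep in mind that a plain search only returns the first 10 hits by default. For a one-time full extraction you can raise size or page through the index, for example with the scroll API (a sketch against the same idx1 index):
POST idx1/_search?scroll=1m
{
  "size": 1000,
  "_source": [
    "url"
  ],
  "query": {
    "exists": {
      "field": "url"
    }
  }
}
Then keep calling POST _search/scroll with the returned _scroll_id until no more hits come back.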
Since your index has
"_source" : { "enabled" : false }
you can add "store": true in the mapping for the field whose value you want to extract, like this:
PUT indexExample2
{
"mappings": {
"_source": {
"enabled": false
},
"properties": {
"url": {
"type": "keyword",
"doc_values": false,
"store": true
}
}
}
}
Now index some data:
POST indexExample2/_doc/1
{
"url":"www.google.com"
}
POST indexExample2/_doc/2
{
"url":"www.youtube.com"
}
You can retrieve the stored field in your search queries, even though _source is disabled:
POST indexExample2/_search
{
"query": {
"exists": {
"field": "url"
}
},
"stored_fields": ["url"]
}
This will output:
"hits" : [
{
"_index" : "indexExample2",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"url" : [
"www.google.com"
]
}
},
{
"_index" : "indexExample2",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"fields" : {
"url" : [
"www.youtube.com"
]
}
}
]
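For a quick spot check, the stored value can also be fetched for a single document through the GET document API (using the same indexExample2 index):
GET indexExample2/_doc/1?stored_fields=url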
I am using the bulk API to create an index and store data fields. I also want to set a mapping that excludes a field "field1" from the source. I know this can be done using the create index API (reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html), but I am using the bulk API. Below is a sample API call:
POST _bulk
{ "index" : { "_index" : "test", _type = 'testType', "_id" : "1" } }
{ "field1" : "value1" }
Is there a way to add mapping settings while bulk indexing, similar to the code below?
{ "index" : { "_index" : "test", "_type" : "testType", "_id" : "1" },
"mappings": {
"_source": {
"excludes": [
"field1"
]
}
}
}
{ "field1" : "value1" }
How do I set up the mapping when using the bulk API?
It is not possible to define the mapping for a new index while using the bulk API. You have to create your index beforehand and define the mapping then, or you have to define an index template and use a name for your index in your bulk request that matches that template's pattern.
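For the first option, a minimal sketch using the index and field names from the question: create the index with the _source excludes mapping first, then send the bulk request unchanged.
PUT /test
{
  "mappings": {
    "_source": {
      "excludes": [
        "field1"
      ]
    }
  }
}

POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }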
For the second option, the following example code can be executed via the Dev Tools window in Kibana:
PUT /_index_template/mytemplate
{
"index_patterns": [
"te*"
],
"priority": 1,
"template": {
"mappings": {
"_source": {
"excludes": [
"testexclude"
]
},
"properties": {
"testfield": {
"type": "keyword"
}
}
}
}
}
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "testfield" : "value1", "defaultField" : "asdf", "testexclude": "this shouldn't be in source" }
GET /test/_mapping
You can see from the response that the mapping template was applied to the new test index: testfield has only the keyword type, and the _source excludes setting comes from the template.
{
"test" : {
"mappings" : {
"_source" : {
"excludes" : [
"testexclude"
]
},
"properties" : {
"defaultField" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"testexclude" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"testfield" : {
"type" : "keyword"
}
}
}
}
}
Also, the document is returned without the excluded field:
GET /test/_doc/1
Response:
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"defaultField" : "asdf",
"testfield" : "value1"
}
}
Hope this answers your question and solves your use-case.
Follow up to this question.
I have a dynamic template which copies the text of a JSON blob to a single text field, and I'd like to search on that field and highlight matches. Here is my full code for ES 6.5
DELETE /test
PUT /test?include_type_name=true
{
"settings": {"number_of_shards": 1,"number_of_replicas": 1},
"mappings": {
"_doc": {
"dynamic_templates": [
{
"full_name": {
"match_mapping_type": "string",
"path_match": "content.*",
"mapping": {
"type": "text",
"copy_to": "content_text"
}
}
}
],
"properties": {
"content_text": {
"type": "text"
},
"content": {
"type": "object",
"enabled": "true"
}
}
}
}
}
PUT /test/_doc/1?refresh=true
{
"content": {
"a": {
"b": {
"text": "42"
}
}
}
}
GET /test/_search
{
"query": {
"match": {
"content_text": "42"
}
},
"highlight": {
"fields": {
"content_text": {}
}
}
}
The response does not show the highlighted content_text
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"content" : {
"a" : {
"b" : {
"text" : "42"
}
}
}
}
}
]
}
}
As you can see, the content_text field is not highlighted; it's also missing from the response entirely. How do I get highlights for this field to show up?
This is a tricky one, but it will make sense once you read what follows.
As per the official documentation on highlighting, the actual content of a field is required to exist somewhere. So if the field is not stored (i.e. the mapping does not set store to true), the actual _source is loaded and the relevant field is extracted from _source.
In your case, the content_text field doesn't exist in the _source document (i.e. it is just indexed from other text fields present in content.*) and in the mapping, the store parameter is not set to true (it is false by default).
So you simply need to change your mapping to this:
"content_text": {
"store": true,
"type": "text"
},
And then, after recreating the index (store cannot be changed on an existing field), your query will yield this:
"highlight" : {
"content_text" : [
"<em>42</em>"
]
}
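With store set to true you can also return the field's content itself alongside the highlight, via stored_fields (a sketch based on the question's query):
GET /test/_search
{
  "stored_fields": [
    "content_text"
  ],
  "query": {
    "match": {
      "content_text": "42"
    }
  },
  "highlight": {
    "fields": {
      "content_text": {}
    }
  }
}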
So I have two fields in my docs:
{
"emails": ["", "", ""],
"name": ""
}
And I want a new field called uid added once the docs are indexed, which will just contain the concatenated strings of all the emails plus the name for every doc.
I am able to get such a scripted field using this GET request on my index's _search endpoint:
{
"script_fields": {
"combined": {
"script": {
"lang": "painless",
"source": "def result=''; for (String email: doc['emails.keyword']) { result = result + email;} return doc['name'].value + result;"
}
}
}
}
I want to know what my ingest pipeline PUT request body should look like to have the same scripted field indexed with my docs.
Let's say I have the below sample index and sample document.
Sample Source Index
For the sake of understanding, I've created the below mapping.
PUT my_source_index
{
"mappings": {
"properties": {
"email":{
"type":"text"
},
"name":{
"type": "text"
}
}
}
}
Sample Document:
POST my_source_index/_doc/1
{
"email": ["john#gmail.com","doe#outlook.com"],
"name": "johndoe"
}
Just follow the below steps
Step 1: Create Ingest Pipeline
PUT _ingest/pipeline/my-pipeline-concat
{
"description" : "describe pipeline",
"processors" : [
{
"join": {
"field": "email",
"target_field": "temp_uuid",
"separator": "-"
}
},
{
"set": {
"field": "uuid",
"value": "{{name}}-{{temp_uuid}}"
}
},
{
"remove":{
"field": "temp_uuid"
}
}
]
}
Notice that I've made use of the Ingest API, creating the above pipeline with three processors that are executed in sequence:
The first processor is a Join processor, which concatenates all the email ids and creates temp_uuid.
The second processor is a Set processor, which combines name with temp_uuid.
And in the third step, a Remove processor deletes temp_uuid.
Note that I am using - as the delimiter between all the values. Feel free to use anything you want.
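As an aside, the same uuid could presumably be built with a single script processor that mirrors the painless script from the question; this is only a sketch, and the pipeline name my-pipeline-script is made up:
PUT _ingest/pipeline/my-pipeline-script
{
  "description": "concatenate name and emails in one script processor",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "String result = ctx.name; for (String e : ctx.email) { result += '-' + e; } ctx.uuid = result;"
      }
    }
  ]
}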
Step 2: Create Destination Index:
PUT my_dest_index
{
"mappings": {
"properties": {
"email":{
"type":"text"
},
"name":{
"type": "text"
},
"uuid":{ <--- Do not forget to add this
"type": "text"
}
}
}
}
Step 3: Apply Reindex API:
POST _reindex
{
"source": {
"index": "my_source_index"
},
"dest": {
"index": "my_dest_index",
"pipeline": "my-pipeline-concat" <--- Make sure you add pipeline here
}
}
Note how I've mentioned the pipeline while using Reindex API
Step 4: Verify Destination Index:
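The original post omits the verification request; a plain search such as the following produces the response shown below:
POST my_dest_index/_search
{
  "query": {
    "match_all": {}
  }
}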
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_dest_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "johndoe",
"uuid" : "johndoe-john#gmail.com-doe#outlook.com", <--- Note this
"email" : [
"john#gmail.com",
"doe#outlook.com"
]
}
}
]
}
}
Hope this helps!
Why can I not see the _timestamp field while being able to filter a query by it?
The following query return the correct documents, but not the timestamp itself. How can I return the timestamp?
{
"fields": [
"_timestamp",
"_source"
],
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"range": {
"_timestamp": {
"from": "2013-01-01"
}
}
}
}
}
}
The mapping is:
{
"my_doctype": {
"_timestamp": {
"enabled": "true"
},
"properties": {
"cards": {
"type": "integer"
}
}
}
}
sample output:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test1",
"_type" : "doctype1",
"_id" : "HjfryYQEQL6RkEX3VOiBHQ",
"_score" : 1.0, "_source" : {"cards": "5"}
}, {
"_index" : "test1",
"_type" : "doctype1",
"_id" : "sDyHcT1BTMatjmUS0NSoEg",
"_score" : 1.0, "_source" : {"cards": "2"}
}]
}
When the timestamp field is enabled, it's indexed but not stored by default. So, while you can search and filter by the timestamp field, you cannot easily retrieve it with your records. In order to retrieve the timestamp field, you need to recreate your index with the following mapping:
{
"my_doctype": {
"_timestamp": {
"enabled": "true",
"store": "yes"
},
"properties": {
...
}
}
}
This way you will be able to retrieve the timestamp as the number of milliseconds since the epoch.
It is not necessary to store the timestamp field, since its exact value is preserved as a term, which is also more likely to already be present in RAM, especially if you are querying on it. You can access the timestamp via its term using a script field:
{
"query": {
...
},
"script_fields": {
"timestamp": {
"script": "_doc['_timestamp'].value"
}
}
}
The resulting value is expressed in milliseconds since the UNIX epoch. It's quite obscene that ElasticSearch can't do this for you, but hey, nothing's perfect.