Is there a way to enable _source on existing data? - elasticsearch

I created an index with the _source field disabled (for memory considerations).
I want to enable this field on the existing data. Is there a way to do that?
For example:
I will create dummy-index :
PUT /dummy-index?pretty
{
"mappings": {
"_doc": {
"_source": {
"enabled": false
}
}
}
}
and I will add the next document :
PUT /dummy-index/_doc/1?pretty
{
"name": "CoderIl"
}
When I search, I get only the hit metadata, without the name field:
{
"_index" : "dummy-index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0
}
The question is whether I can re-enable _source so that when I search again I'll get the missing data (in this example, the "name" field):
{
"_index" : "dummy-index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "CoderIl"
}
}

As clarified in the chat, the issue is that the
_source field is disabled.
In the search result, the OP wants the stored field values, which are returned as part of _source when it is enabled, like below:
"_source" : {
"name" : "CoderIl"
}
Now, in order to achieve this, the store option must be enabled on the field. Please note that this can't be changed dynamically; you have to re-index the data with the updated mapping.
Example
Index mapping
{
"mappings": {
"_source": {
"enabled": false
},
"properties": {
"name": {
"type": "text"
},
"title" :{
"type" : "text",
"store" : true
}
}
}
}
Index sample docs
{
"name" : "coderIL"
}
{
"name" : "coderIL",
"title" : "seconds docs"
}
Search the docs and retrieve field content using stored_fields:
{
"stored_fields": [
"title"
],
"query": {
"match": {
"name": "coderIL"
}
}
}
And search result
"hits": [
{
"_index": "without_source",
"_type": "_doc",
"_id": "1",
"_score": 0.18232156
},
{
"_index": "without_source",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"fields": {
"title": [
"seconds docs"
]
}
}
]
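The behavior above can be imitated with a small Python sketch (all helper names here are illustrative, not an Elasticsearch API): with _source disabled, a hit carries only metadata unless the requested field was mapped with store: true, in which case it comes back under fields.

```python
# Sketch: how stored_fields behaves when _source is disabled.
# "store": true was set only on "title", so only "title" can be returned.
stored_in_mapping = {"title"}  # fields mapped with "store": true

def build_hit(doc_id, doc, requested_stored_fields):
    """Build a search hit the way ES does with _source disabled:
    hit metadata always, plus a "fields" section for stored fields only."""
    hit = {"_index": "without_source", "_id": doc_id}
    fields = {
        name: [doc[name]]
        for name in requested_stored_fields
        if name in stored_in_mapping and name in doc
    }
    if fields:
        hit["fields"] = fields
    return hit

doc1 = {"name": "coderIL"}
doc2 = {"name": "coderIL", "title": "seconds docs"}

hit1 = build_hit("1", doc1, ["title"])  # no stored "title" -> metadata only
hit2 = build_hit("2", doc2, ["title"])  # stored "title" -> "fields" section
```

This mirrors the result above: doc 1 comes back with metadata only, doc 2 with `"fields": {"title": ["seconds docs"]}`.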

The store option on a field controls this. From the same official doc:
By default, field values are indexed to make them searchable, but they
are not stored. This means that the field can be queried, but the
original field value cannot be retrieved.
Usually this doesn’t matter. The field value is already part of the
_source field, which is stored by default. If you only want to retrieve the value of a single field or of a few fields, instead of
the whole _source, then this can be achieved with source filtering.
As mentioned in the doc, it is disabled by default; if you want to save space, you can enable it only on specific fields, but you need to re-index the data again.
Edit: the index option (enabled by default) controls whether a field is indexed, which is required for searching on the field. The store option controls whether the original value is stored; this is used if you want to get back the non-analyzed value, i.e. what you sent to ES in your index request, which, depending on the field type, goes through text analysis as part of indexing. Refer to this SO question for more info.
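As a rough mental model of the two options (a toy simulation, not how Lucene actually stores data): index puts tokens into an inverted index so the field is searchable, while store keeps the original value so it can be retrieved independently of _source.

```python
# Toy model of the "index" vs "store" mapping options (illustrative only).
def ingest(doc_id, doc, mapping, inverted_index, stored_values):
    for field, value in doc.items():
        opts = mapping.get(field, {})
        if opts.get("index", True):            # indexed -> searchable
            for token in value.lower().split():
                inverted_index.setdefault((field, token), set()).add(doc_id)
        if opts.get("store", False):           # stored -> retrievable as-is
            stored_values[(doc_id, field)] = value

mapping = {"name": {"index": True, "store": False},
           "title": {"index": True, "store": True}}
inverted_index, stored_values = {}, {}
ingest("1", {"name": "coderIL", "title": "seconds docs"},
       mapping, inverted_index, stored_values)

searchable = ("name", "coderil") in inverted_index   # "name" can be queried
retrievable = ("1", "name") in stored_values         # but not retrieved
```

With this model, "name" is searchable but not retrievable, while "title" is both, matching the behavior in the answer above.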

Related

ElasticSearch Set Processor

I am trying to use the Elasticsearch Set Processor functionality to add a Queue wise Constant field to a given index which contains data from multiple Queues. The ElasticSearch documentation is really sparse in this respect.
I am trying to use the below code to create a Set Processor for Index default-*, but somehow it's not working
PUT /_ingest/pipeline/set_aht
{
"description": "sets queue wise AHT constants",
"processors": [
{
"set": {
"field": "queueAHTVal",
"value": "10",
"if": "queueName == 'A'"
}
}
]
}
Looking for some howto guidance from anyone who might have previously worked on Set Processor for ElasticSearch
I tried to work out a possible suggestion. If I understood your issue correctly, you want to add a new field based on a field value (queueName) when it equals 'A'?
If yes, I modified your pipeline and did a test locally.
Here is the updated pipeline code:
PUT _ingest/pipeline/set_aht
{
"processors": [
{
"set": {
"field": "queueAHTVal",
"value": "10",
"if": "ctx.queueName.equals('A')"
}
}
]
}
I used the _reindex API to ingest the data into another index through the pipeline.
POST _reindex
{
"source": {
"index": "espro"
},
"dest": {
"index": "espro-v2",
"pipeline": "set_aht"
}
}
The response is:
"hits" : [
{
"_index" : "espro-v2",
"_type" : "_doc",
"_id" : "7BErVHQB3IIDvL59miT1",
"_score" : 1.0,
"_source" : {
"queueName" : "A",
"queueAHTVal" : "10"
}
},
{
"_index" : "espro-v2",
"_type" : "_doc",
"_id" : "IBEsVHQB3IIDvL59iien",
"_score" : 1.0,
"_source" : {
"queueName" : "B"
}
}
]
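For clarity, the whole flow (a conditional set processor applied during _reindex) can be sketched in plain Python; set_processor and reindex are made-up helper names, not Elasticsearch APIs:

```python
# Plain-Python sketch of the conditional set processor applied via _reindex.
def set_processor(doc, field, value, condition):
    # "set" processor: add the field only when the "if" condition holds
    return {**doc, field: value} if condition(doc) else doc

def reindex(source_docs, pipeline):
    # _reindex with "pipeline": every source doc passes through the pipeline
    return [pipeline(doc) for doc in source_docs]

# mirrors "if": "ctx.queueName.equals('A')"
pipeline = lambda doc: set_processor(
    doc, field="queueAHTVal", value="10",
    condition=lambda d: d.get("queueName") == "A")

dest = reindex([{"queueName": "A"}, {"queueName": "B"}], pipeline)
```

The result matches the response above: the queueName "A" document gains queueAHTVal, the "B" document is left untouched.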
Let me know if you need more help, or if I misunderstood your issue I will try again. Thank you.

Elasticsearch merge multiple indexes based on common field

I'm using ELK to generate views out of data from two different DBs. One is MySQL, the other is PostgreSQL. There is no way to write a join query between those two DB instances, but I have a common field called "nic". Following are the documents from each index.
MySQL
index: user_detail
"_id": "871123365V",
"_source": {
"type": "db-poc-user",
"fname": "Iraj",
"#version": "1",
"field_lname": "Sanjeewa",
"nic": "871456365V",
"#timestamp": "2020-07-22T04:12:00.376Z",
"id": 2,
"lname": "Santhosh"
}
PostgreSQL
Index: track_details
"_id": "871456365V",
"_source": {
"#version": "1",
"nic": "871456365V",
"#timestamp": "2020-07-22T04:12:00.213Z",
"track": "ELK",
"type": "db-poc-ceg"
},
I want to merge both indexes into a single new index using the common field "nic", so I can create visualizations in Kibana. How can this be achieved?
Please note that each document in the new index should have
"nic, fname, lname, track" as fields, not an aggregation.
I would leverage the enrich processor to achieve this.
First, you need to create an enrich policy (use the smallest index, let's say it's user_detail):
PUT /_enrich/policy/user-policy
{
"match": {
"indices": "user_detail",
"match_field": "nic",
"enrich_fields": ["fname", "lname"]
}
}
Then you can execute that policy in order to create an enrichment index
POST /_enrich/policy/user-policy/_execute
The next step requires you to create an ingest pipeline that uses the above enrich policy/index:
PUT /_ingest/pipeline/user_lookup
{
"description" : "Enriching user details with tracks",
"processors" : [
{
"enrich" : {
"policy_name": "user-policy",
"field" : "nic",
"target_field": "tmp",
"max_matches": "1"
}
},
{
"script": {
"if": "ctx.tmp != null",
"source": "ctx.putAll(ctx.tmp); ctx.remove('tmp');"
}
},
{
"remove": {
"field": ["#version", "#timestamp", "type"]
}
}
]
}
Finally, you're now ready to create your target index with the joined data. Simply leverage the _reindex API combined with the ingest pipeline we've just created:
POST _reindex
{
"source": {
"index": "track_details"
},
"dest": {
"index": "user_tracks",
"pipeline": "user_lookup"
}
}
After running this, the user_tracks index will contain exactly what you need, for instance:
{
"_index" : "user_tracks",
"_type" : "_doc",
"_id" : "0uA8dXMBU9tMsBeoajlw",
"_score" : 1.0,
"_source" : {
"fname" : "Iraj",
"nic" : "871456365V",
"lname" : "Santhosh",
"track" : "ELK"
}
}
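The join performed by the enrich processor can be approximated in Python to see what happens to each document (the helper names are hypothetical; the real work is done server-side by Elasticsearch):

```python
# Sketch of the enrich flow: the policy builds a lookup keyed on "nic";
# the pipeline merges the matched user fields into each track document
# and drops the temporary field plus the metadata fields.
def execute_policy(user_docs, match_field, enrich_fields):
    # POST /_enrich/policy/user-policy/_execute
    return {d[match_field]: {f: d[f] for f in enrich_fields if f in d}
            for d in user_docs}

def user_lookup_pipeline(doc, enrich_index):
    doc = dict(doc)
    tmp = enrich_index.get(doc.get("nic"))   # enrich processor, max_matches 1
    if tmp is not None:                      # script: ctx.putAll(ctx.tmp)
        doc.update(tmp)
    for field in ("#version", "#timestamp", "type"):   # remove processor
        doc.pop(field, None)
    return doc

users = [{"nic": "871456365V", "fname": "Iraj", "lname": "Santhosh",
          "type": "db-poc-user"}]
enrich_index = execute_policy(users, "nic", ["fname", "lname"])
track = {"nic": "871456365V", "track": "ELK", "#version": "1",
         "#timestamp": "2020-07-22T04:12:00.213Z", "type": "db-poc-ceg"}
joined = user_lookup_pipeline(track, enrich_index)
```

The joined document ends up with exactly the "nic, fname, lname, track" fields requested in the question.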
If your source indexes ever change (new users, changed names, etc), you'll need to re-run the above steps, but before doing it, you need to delete the ingest pipeline and the ingest policy (in that order):
DELETE /_ingest/pipeline/user_lookup
DELETE /_enrich/policy/user-policy
After that you can freely re-run the above steps.
PS: Just note that I cheated a bit since the record in user_detail doesn't have the same nic in your example, but I guess it was a copy/paste issue.

Elasticsearch - force a field to index only, avoid store

How do I force a field to be indexed only and not store the data? This option is available in Solr, and I'm not sure if it's possible in Elasticsearch.
From the documentation:
By default, field values are indexed to make them searchable, but they
are not stored. This means that the field can be queried, but the
original field value cannot be retrieved.
Usually this doesn’t matter. The field value is already part of the
_source field, which is stored by default. If you only want to retrieve the value of a single field or of a few fields, instead of
the whole _source, then this can be achieved with source filtering
If you don't want the field to be stored in _source either, you can exclude the field from _source in the mapping.
Mapping:
{
"mappings": {
"properties": {
"title":{
"type":"text"
},
"description":{
"type": "text"
}
},
"_source": {
"excludes": [
"description"
]
}
}
}
Query:
GET logs/_search
{
"query": {
"match": {
"description": "b" --> field "description" is searchable (indexed)
}
}
}
Result:
"hits" : [
{
"_index" : "logs",
"_type" : "_doc",
"_id" : "-aC9V3EBkD38P4LIYrdY",
"_score" : 0.2876821,
"_source" : {
"title" : "a" --> field "description" is not returned
}
}
]
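A minimal Python sketch of what _source.excludes does (illustrative only, not the actual ES internals): the excluded field is still tokenized for search, but filtered out of the stored source, so queries on it match while results omit it.

```python
# Toy simulation of "_source": {"excludes": [...]}.
def index_doc(doc, source_excludes):
    # every field is still analyzed into the inverted index...
    inverted = {(f, tok) for f, v in doc.items() for tok in v.lower().split()}
    # ...but excluded fields are filtered out of the stored _source
    stored_source = {f: v for f, v in doc.items() if f not in source_excludes}
    return inverted, stored_source

inverted, stored_source = index_doc(
    {"title": "a", "description": "b"}, source_excludes={"description"})

matches = ("description", "b") in inverted   # query on description still hits
```

This reproduces the result above: the match query on "description" finds the document, yet the returned _source contains only "title".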
Note:
Removing fields from _source will disable the following features:
The update, update_by_query, and reindex APIs.
On the fly highlighting.
The ability to reindex from one Elasticsearch index to another, either to change mappings or analysis, or to upgrade an index to a new major version.
The ability to debug queries or aggregations by viewing the original document used at index time.
Potentially in the future, the ability to repair index corruption automatically.

In Elasticsearch, how to get the "_id" value of a document by providing unique content present in the document

I have a few documents ingested in Elasticsearch. A sample document is below:
"_index": "author_index",
"_type": "_doc",
"_id": "cOPf2wrYBik0KF", --Automatically generated by Elastic search after ingestion
"_score": 0.13956004,
"_source": {
"author_data": {
"author": "xyz",
"author_id": "123", -- This is the unique id for each document
"publish_year" : "2016"
}
}
Is there a way to get the auto-generated _id by sending author_id from the High-Level REST Client?
I tried researching solutions, but all of them only fetch the document using _id. I need the reverse operation.
Actual Output expected: cOPf2wrYBik0KF
The SearchHit provides access to basic information like the index, document ID, and score of each search hit, so with the Search API you can do it this way in Java:
String index = hit.getIndex();
String id = hit.getId();
OR something like this:
SearchResponse searchResponse =
client.prepareSearch().setQuery(matchAllQuery()).get();
for (SearchHit hit : searchResponse.getHits()) {
String yourId = hit.getId();
}
SEE HERE: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-search.html#java-rest-high-search-response
You can use source filtering. You can turn off _source retrieval, since you are interested in just the _id. The _source parameter accepts one or more wildcard patterns to control what parts of _source should be returned (https://www.elastic.co/guide/en/elasticsearch/reference/7.0/search-request-source-filtering.html):
GET /author_index/_search
{
"_source" : false,
"query" : {
"term" : { "author_data.author_id" : "123" }
}
}
Another approach will also return the _id for the search. The stored_fields parameter is about fields that are explicitly marked as stored in the mapping, which is off by default and generally not recommended:
GET /author_index/_search
{
"stored_fields" : ["author_data.author_id", "_id"],
"query" : {
"term" : { "author_data.author_id" : "123" }
}
}
Output for both above queries:
"hits" : [
{
"_index" : "author_index",
"_type" : "_doc",
"_id" : "cOPf2wrYBik0KF",
"_score" : 6.4966354
}
]
More details here: https://www.elastic.co/guide/en/elasticsearch/reference/7.0/search-request-stored-fields.html
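Whichever query is used, pulling the _id out of the response body is straightforward; here is a small Python sketch against the response shape shown above (the response dict is hand-written to match it):

```python
# With "_source": false, the hit metadata still carries the auto-generated
# _id, so the reverse lookup (author_id -> _id) is just a matter of reading
# it from each hit in the search response.
response = {  # shape of a search response with _source turned off
    "hits": {
        "hits": [
            {"_index": "author_index", "_id": "cOPf2wrYBik0KF",
             "_score": 6.4966354}
        ]
    }
}

ids = [hit["_id"] for hit in response["hits"]["hits"]]
```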

Attempting to use Elasticsearch Bulk API when _id is equal to a specific field

I am attempting to bulk insert documents into an index. I need to have _id equal to a specific field that I am inserting. I'm using ES v6.6
POST productv9/_bulk
{ "index" : { "_index" : "productv9", "_id": "in_stock"}}
{ "description" : "test", "in_stock" : "2001"}
GET productv9/_search
{
"query": {
"match": {
"_id": "2001"
}
}
}
When I run the bulk statement it runs without any error. However, when I run the search statement it is not getting any hits. Additionally, I have many additional documents that I would like to insert in the same manner.
What I suggest is to create an ingest pipeline that sets the _id of your document based on the value of the in_stock field.
First create the pipeline:
PUT _ingest/pipeline/set_id
{
"description" : "Sets the id of the document based on a field value",
"processors" : [
{
"set" : {
"field": "_id",
"value": "{{in_stock}}"
}
}
]
}
Then you can reference the pipeline in your bulk call:
POST productv9/doc/_bulk?pipeline=set_id
{ "index" : {}}
{ "description" : "test", "in_stock" : "2001"}
By calling GET productv9/_doc/2001 you will get your document.
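What the set_id pipeline does during the bulk call can be sketched in Python; the mustache-style {{in_stock}} template is rendered against each document (the helper names are invented for illustration):

```python
import re

def render_template(template, doc):
    # mimic mustache substitution: "{{field}}" -> doc["field"]
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(doc[m.group(1)]), template)

def bulk_with_pipeline(docs):
    indexed = {}
    for doc in docs:
        # the set processor writes the rendered value into _id
        doc_id = render_template("{{in_stock}}", doc)
        indexed[doc_id] = doc
    return indexed

index = bulk_with_pipeline([{"description": "test", "in_stock": "2001"}])
```

The document ends up keyed by "2001", which is why `GET productv9/_doc/2001` then retrieves it.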
