How to index same doc in different indices with different routing - elasticsearch

I need to be able to index the same document into different indices with different routing values.
Basically, the problem to solve is being able to calculate complex aggregations over payment information from the perspective of both the payer and the collector. For example, "payments made / received in the last 15 days grouped by status".
I was wondering how to achieve this using the Elasticsearch bulk API.
Is it possible to do this without duplicating the document in the NDJSON payload? Something like this, for example:
POST _bulk
{ "index" : { "_index" : "test_1", "_id" : "1", "routing": "1234" } }
{ "index" : { "_index" : "test_2", "_id" : "1", "routing": "5678" } }
{ "field1" : "value1" }
I looked through the documentation but couldn't find anything that explains this.

Using only the bulk API, you will need to repeat the document for each index.
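For instance, reusing the values from the question, the fully duplicated bulk payload would look roughly like this:
POST _bulk
{ "index" : { "_index" : "test_1", "_id" : "1", "routing": "1234" } }
{ "field1" : "value1" }
{ "index" : { "_index" : "test_2", "_id" : "1", "routing": "5678" } }
{ "field1" : "value1" }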
Another way is to bulk-index the documents into the first index and then use the Reindex API to create the second index with a different routing value for each document.
POST _bulk
{ "index" : { "_index" : "test_1", "_id" : "1", "routing": "1234" } }
{ "field1" : "value1", "routing2": "5678" }
You can then reindex into the second index using the second routing value (which you need to store in the document somehow):
POST _reindex
{
  "source": {
    "index": "test_1"
  },
  "dest": {
    "index": "test_2"
  },
  "script": {
    "source": "ctx._routing = ctx._source.routing2",
    "lang": "painless"
  }
}
That way, you only send the data once via the bulk API, which takes roughly half the time of doubling every document, and the Reindex API then copies the data internally (i.e. without the added network latency of sending a potentially big payload a second time).
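As a quick sanity check (a sketch assuming the routing values above), you can fetch the reindexed document by passing its routing explicitly:
GET test_2/_doc/1?routing=5678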

Related

Elasticsearch merge multiple indexes based on common field

I'm using ELK to generate views out of the data from two different databases. One is MySQL, the other is PostgreSQL. There is no way to write a join query between those two DB instances, but I have a common field called "nic". Following are the documents from each index.
MySQL
index: user_detail
"_id": "871123365V",
"_source": {
"type": "db-poc-user",
"fname": "Iraj",
"#version": "1",
"field_lname": "Sanjeewa",
"nic": "871456365V",
"#timestamp": "2020-07-22T04:12:00.376Z",
"id": 2,
"lname": "Santhosh"
}
PostgreSQL
Index: track_details
"_id": "871456365V",
"_source": {
"#version": "1",
"nic": "871456365V",
"#timestamp": "2020-07-22T04:12:00.213Z",
"track": "ELK",
"type": "db-poc-ceg"
},
I want to merge both indices into a single new index using the common field "nic", so I can create visualizations in Kibana. How can this be achieved?
Please note that each document in the new index should have "nic", "fname", "lname", and "track" as fields, not an aggregation.
I would leverage the enrich processor to achieve this.
First, you need to create an enrich policy (use the smallest index, let's say it's user_detail):
PUT /_enrich/policy/user-policy
{
  "match": {
    "indices": "user_detail",
    "match_field": "nic",
    "enrich_fields": ["fname", "lname"]
  }
}
Then you can execute that policy in order to create the enrich index:
POST /_enrich/policy/user-policy/_execute
The next step requires you to create an ingest pipeline that uses the above enrich policy/index:
PUT /_ingest/pipeline/user_lookup
{
  "description": "Enriching user details with tracks",
  "processors": [
    {
      "enrich": {
        "policy_name": "user-policy",
        "field": "nic",
        "target_field": "tmp",
        "max_matches": "1"
      }
    },
    {
      "script": {
        "if": "ctx.tmp != null",
        "source": "ctx.putAll(ctx.tmp); ctx.remove('tmp');"
      }
    },
    {
      "remove": {
        "field": ["@version", "@timestamp", "type"]
      }
    }
  ]
}
Finally, you're now ready to create your target index with the joined data. Simply leverage the _reindex API combined with the ingest pipeline we've just created:
POST _reindex
{
  "source": {
    "index": "track_details"
  },
  "dest": {
    "index": "user_tracks",
    "pipeline": "user_lookup"
  }
}
After running this, the user_tracks index will contain exactly what you need, for instance:
{
  "_index" : "user_tracks",
  "_type" : "_doc",
  "_id" : "0uA8dXMBU9tMsBeoajlw",
  "_score" : 1.0,
  "_source" : {
    "fname" : "Iraj",
    "nic" : "871456365V",
    "lname" : "Santhosh",
    "track" : "ELK"
  }
}
If your source indices ever change (new users, changed names, etc.), you'll need to re-run the above steps, but before doing so you need to delete the ingest pipeline and the enrich policy (in that order):
DELETE /_ingest/pipeline/user_lookup
DELETE /_enrich/policy/user-policy
After that you can freely re-run the above steps.
PS: Just note that I cheated a bit since the record in user_detail doesn't have the same nic in your example, but I guess it was a copy/paste issue.

Can I use Elasticsearch with data that needs authentication to view (e.g. logged-in users only)?

I want to implement search on my website. Users should be able to search for products that they have in their shop. Obviously, the products returned should only be theirs, and the same applies when customers search on their website. How can I implement this with Elasticsearch? Obviously, my backend will run the query, not the front-end, but how will I limit the search results to a single user? Is it only possible by filtering with my own code? Does Elasticsearch have something like WHERE from SQL? Am I going about it the wrong way? Would it be better to use the full-text search from PostgreSQL?
I am using Go, by the way.
Best regards
Update: my use case, as requested:
A user is paired with an ID. He is in his dashboard and searches for a product he has in his shop. His request passes the session token cookie and I get his ID on my server. Then I need to get the products that match his query, and only his.
In SQL it would be SELECT * FROM products WHERE shop_id=ID, for example. Is it possible with Elasticsearch? Is it more trouble than it's worth compared to implementing full-text search in PostgreSQL?
It can be easily achieved using Elasticsearch: define shop_id as a keyword field and use it in the filter context of your query to make sure you only search products belonging to a particular shop_id.
Using shop_id in a filter context also improves the performance of your search significantly, as such filters are cached by default in Elasticsearch, as explained in the official doc:
In a filter context, a query clause answers the question "Does this document match this query clause?" The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data, e.g.
Is the status field set to "published"?
Frequently used filters will be cached automatically by Elasticsearch, to speed up performance.
Sample mapping and query according to your requirement:
Index mapping
{
  "mappings": {
    "properties": {
      "product": {
        "type": "text"
      },
      "shop_id": {
        "type": "keyword"
      }
    }
  }
}
Index sample docs for 2 diff shop ids
{
  "product": "foo",
  "shop_id": "stackoverflow"
}
{
  "product": "foo",
  "shop_id": "opster"
}
Search for foo product where shop_id is stackoverflow
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "product": "foo"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "shop_id": "stackoverflow"
          }
        }
      ]
    }
  }
}
Search result
"hits": [
{
"_index": "productshop",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"_source": {. --> note only foo belong to `stackoverflow` returned
"product": "foo",
"shop_id": "stackoverflow"
}
}
]

Elasticsearch - force a field to index only, avoid store

How do I force a field to be indexed only and not store the data? This option is available in Solr and I'm not sure if it's possible in Elasticsearch.
From the documentation:
By default, field values are indexed to make them searchable, but they are not stored. This means that the field can be queried, but the original field value cannot be retrieved.
Usually this doesn't matter. The field value is already part of the _source field, which is stored by default. If you only want to retrieve the value of a single field or of a few fields, instead of the whole _source, then this can be achieved with source filtering.
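For example, a search with source filtering could look roughly like this (a sketch reusing the logs index and the title/description fields from the mapping shown further down):
GET logs/_search
{
  "_source": ["title"],
  "query": {
    "match": {
      "description": "b"
    }
  }
}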
If you don't want the field to be stored in _source either, you can exclude the field from _source in the mapping.
Mapping:
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "description": {
        "type": "text"
      }
    },
    "_source": {
      "excludes": [
        "description"
      ]
    }
  }
}
Query:
GET logs/_search
{
  "query": {
    "match": {
      "description": "b"       --> the description field is searchable (indexed)
    }
  }
}
Result:
"hits" : [
{
"_index" : "logs",
"_type" : "_doc",
"_id" : "-aC9V3EBkD38P4LIYrdY",
"_score" : 0.2876821,
"_source" : {
"title" : "a" --> field "description" is not returned
}
}
]
Note:
Removing fields from _source will disable the following features:
The update, update_by_query, and reindex APIs.
On the fly highlighting.
The ability to reindex from one Elasticsearch index to another, either to change mappings or analysis, or to upgrade an index to a new major version.
The ability to debug queries or aggregations by viewing the original document used at index time.
Potentially in the future, the ability to repair index corruption automatically.

In Elasticsearch, how to get the "_id" value of a document by providing unique content present in the document

I have a few documents ingested in Elasticsearch. A sample document is shown below.
"_index": "author_index",
"_type": "_doc",
"_id": "cOPf2wrYBik0KF", --Automatically generated by Elastic search after ingestion
"_score": 0.13956004,
"_source": {
"author_data": {
"author": "xyz"
"author_id": "123" -- This is unique id for each document
"publish_year" : "2016"
}
}
Is there a way to get the auto-generated _id by sending author_id from High-level Rest Client?
I tried researching solutions, but all of them fetch the document using _id. I need the reverse operation.
Actual Output expected: cOPf2wrYBik0KF
The SearchHit provides access to basic information like the index, document ID, and score of each search hit, so with the Search API you can do it this way in Java:
String index = hit.getIndex();
String id = hit.getId();
Or something like this with the High Level REST Client:
SearchRequest searchRequest = new SearchRequest("author_index");
searchRequest.source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()));
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
for (SearchHit hit : searchResponse.getHits()) {
    String yourId = hit.getId();
}
SEE HERE: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-search.html#java-rest-high-search-response
You can use source filtering. You can turn off _source retrieval entirely, since you are only interested in the _id. The _source parameter also accepts one or more wildcard patterns to control which parts of the _source should be returned (https://www.elastic.co/guide/en/elasticsearch/reference/7.0/search-request-source-filtering.html):
GET /author_index/_search
{
  "_source": false,
  "query": {
    "term": { "author_data.author_id": "123" }
  }
}
Another approach that also returns the _id is to use the stored_fields parameter, which is for fields explicitly marked as stored in the mapping; this is off by default and generally not recommended:
GET /author_index/_search
{
  "stored_fields": ["author_data.author_id", "_id"],
  "query": {
    "term": { "author_data.author_id": "123" }
  }
}
Output for both above queries:
"hits" : [
{
"_index" : "author_index",
"_type" : "_doc",
"_id" : "cOPf2wrYBik0KF",
"_score" : 6.4966354
}
More details here: https://www.elastic.co/guide/en/elasticsearch/reference/7.0/search-request-stored-fields.html

Attempting to use Elasticsearch Bulk API when _id is equal to a specific field

I am attempting to bulk insert documents into an index. I need to have _id equal to a specific field that I am inserting. I'm using ES v6.6
POST productv9/_bulk
{ "index" : { "_index" : "productv9", "_id": "in_stock"}}
{ "description" : "test", "in_stock" : "2001"}
GET productv9/_search
{
  "query": {
    "match": {
      "_id": "2001"
    }
  }
}
When I run the bulk statement it runs without any error. However, when I run the search statement it is not getting any hits. Additionally, I have many additional documents that I would like to insert in the same manner.
In the bulk call above, _id is set to the literal string "in_stock" rather than to the value of that field, which is why the search returns no hits. What I suggest is to create an ingest pipeline that sets the _id of your document based on the value of the in_stock field.
First create the pipeline:
PUT _ingest/pipeline/set_id
{
  "description": "Sets the id of the document based on a field value",
  "processors": [
    {
      "set": {
        "field": "_id",
        "value": "{{in_stock}}"
      }
    }
  ]
}
Then you can reference the pipeline in your bulk call:
POST productv9/doc/_bulk?pipeline=set_id
{ "index" : {}}
{ "description" : "test", "in_stock" : "2001"}
By calling GET productv9/_doc/2001 you will get your document.
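To mirror the search from the question (a sketch, assuming the document above was indexed through the pipeline), an ids query should now return the document:
GET productv9/_search
{
  "query": {
    "ids": {
      "values": ["2001"]
    }
  }
}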
