Elasticsearch querying alias with routing giving partial results - elasticsearch

In an effort to create multi-tenant architecture for my project.
I've created an elasticsearch cluster with an index 'tenant'
"tenant" : {
"some_type" : {
"_routing" : {
"required" : true,
"path" : "tenantId"
},
Now,
I've also created some aliases -
"tenant" : {
"aliases" : {
"tenant_1" : {
"index_routing" : "1",
"search_routing" : "1"
},
"tenant_2" : {
"index_routing" : "2",
"search_routing" : "2"
},
"tenant_3" : {
"index_routing" : "3",
"search_routing" : "3"
},
"tenant_4" : {
"index_routing" : "4",
"search_routing" : "4"
}
I've added some data with tenantId = 2
After all that, I tried to query 'tenant_2' but I only got partial results, while querying 'tenant' index directly returns with the full results.
Why's that?
I was sure that routing is supposed to query all the shards that documents with tenantId = 2 resides on.

When you have created aliases in elasticsearch, you have to do all operations using aliases only. Be it indexing, update or search.
Try reindexing the data again and check if possible (If it is a test index, I hope so).
Remove all the indices.
curl -XDELETE 'localhost:9200/' # Warning:!! Dont use this in production.
Use this command only if it is test index.
Create the index again. Create alias again. Do all the indexing, search and delete operations on alias name. Even the import of data should also be done via alias name.

Related

Elasticsearch conflict while putting document to index

I want to create an index and modify its setting with template and at the same time create an alias for it
"template_1" : {
"order" : 0,
"index_patterns" : [
"test*"
],
"settings" : {
"index" : {
"number_of_shards" : "2",
"number_of_replicas" : "2"
}
},
"mappings" : { },
"aliases" : {
"some-alias" : { }
}
}
}
when I am trying to put a document using alias, it tries to create an index with the alias name. However I am looking for something which will search for the index which has this alias and throws an error that there are no index exist with this alias
The problem is you are referencing multiple indexes with a single alias, so when you PUT a document ES does not know in which document to store it to.
Quoting the doc:
If no write index is specified and there are multiple indices referenced by an alias, then writes will not be allowed.
One solution, as per quote above, is to specify a write index (see docs) as the default destination for new documents (its also possible to specify rollover rules to update it).
The other solution, of course, is use the actual index name when putting docs.

Attempting to delete all the data for an Index in Elasticsearch

I am trying to delete all the documents, i.e. data from an index. I am using v6.6 along with the dev tools in Kibana.
In the past, I have done this operation successfully but now it is saying 'not found'
{
"_index" : "new-index",
"_type" : "doc",
"_id" : "_query",
"_version" : 1,
"result" : "not_found",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 313,
"_primary_term" : 7
}
Here is my kibana statement
DELETE /new-index/doc/_query
{
"query": {
"match_all": {}
}
}
Also, the index GET operation which verified the index has data and exists:
GET new-index/doc/_search
I verified the type is doc but I can post the whole mapping, if needed.
Easier way is to navigate in Kibana to Management->Elasticsearch index mapping then select indexes you would like to delete via checkboxes, and click on Manage index -> delete index or flush index depending on your need.
I was able to resolve the issue by using a delete by query:
POST new-index/_delete_by_query
{
"query": {
"match_all": {}
}
}
Delete documents is a problematic way to clear data.
Preferable delete index:
DELETE [your-index]
From kibana console.
And recreate from scratch.
And more preferable way is to make a template for an index that creates index as well with the first indexed document.
Only solutions currently are to either delete the index itself (faster), or delete-by-query (slower)
https://www.elastic.co/guide/en/elasticsearch/reference/7.4/docs-delete-by-query.html
POST new-index/_delete_by_query?conflicts=proceed
{
"query": {
"match_all": {}
}
}
Delete API only removes a single document https://www.elastic.co/guide/en/elasticsearch/reference/7.4/docs-delete.html
My guess is that someone changed a field's name and now the DB (NoSQL) and Elasticsearch string name for that field doesn't match. So Elasticsearch tried to delete that field, but the field was "not found".
It's not an error I would lose sleep over.

Query to compare two columns from two different indexes

I have a Column in one Index with a number of Countries in it, I want to check whether these countries are similar or same as countries in the Column in another index.
So it is like, in one index we have user data with the countries user has specified and in the other index we have the master data with the actual countries. So now I want to check whether the countries entered by the user are the same as the ones in master data.
If anybody knows how to write a query for this in Kibana, kindly help.
GET final,master/_count
{"query": {"bool": {"must": [{"script": {"script":"['A_OPERATINGCOUNTRY'].value == ['AD_Country Name.keyword'].value"}}]}}}
You need to manage this outside Elasticsearch. But, since the incoming data doesn't have the country name you want, why do you want to check this on Elasticsearch? The form you are using should have the exactly value you want to index.
You can use the terms query
Suppose we have an index of page access logs like so:
PUT /mybeat-2018/_doc/1
{
"host" : "elastic.co",
"ttl" : 40
}
PUT /mybeat-2018/_doc/2
{
"host" : "elastic.co",
"ttl" : 666
}
PUT /mybeat-2018/_doc/3
{
"host" : "google.com",
"ttl" : 55
}
and an independent whitelist that can shrink or grow, with a bunch of hosts:
PUT /whitelist/_doc/1
{
"hosts" : [
{
"name" : "elastic.co"
},
{
"name" : "twitter.com"
}
]
}
Then a search on the mybeat-* for whatever is in the whitelist should reference the whitelist document (in our case the document with id: 1) like so:
GET /mybeat-*/_search
{
"query" : {
"terms" : {
"host" : {
"index" : "whitelist",
"type" : "_doc",
"id" : "1",
"path" : "hosts.name"
}
}
}
}

ElasticSearch Filtered Aliases Creation - Best Practice

We are planning to use Filtered Aliases as mentioned here - https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html
Our input data is going to be a stream with each line of the stream corresponding to an object we would like to store in ES.
Each object contains an 'id', which we are using for routing and filtering.
QUESTION -
How do we create alias and index data in a performant way ?
-- Do we index all data, keep track of all the unique 'id's and the very end create the filtered alias ? OR
-- For each object, check if an alias for that 'id' exists; if it doesn't create one ?
I'm leaning towards the first approach. Is it advisable and performant when compared to the second approach ?
TIA.
Based on our discussion above and after having glanced over the blog article you posted, I'm pretty positive that in your case you don't need aliases at all and the routing key would suffice. Again, only because you have a single index, if you had many indices this would not be true anymore!
You simply need to specify the routing key to use when indexing your document. Until ES 2.0, you can use the _routing field for that purpose, even though it's been deprecated in ES 1.5, but in your case it serves your purpose.
{
"customer" : {
"_routing" : {
"required" : true,
"path" : "customer_id" <----- the field you use as the routing key
},
"properties": { ... }
}
}
Then when searching you simply need to specify &routing=<customer_id> in your search URL in addition to your customer id filter (since a given shard can host documents for different customers). Your search will go directly to the shard identified by the given routing key, and thus, only retrieve data from the specified customer.
Using a filtered alias for this brings nothing as the filter and routing key you'd include in your alias definition would not contribute anything additional, since the retrieved documents are already "filtered" (kind of) by the routing key. This is way easier than trying to detect (on each new document to index) if an alias exists or not and create it if it doesn't.
UPDATE:
Now if you absolutely have/want to create filtered aliases, the more performant way would be the first one you mentioned:
First index your daily data
Then run a terms aggregation on your customer_id field with size high enough (i.e. higher than the cardinality of the field, which was ~100 in your case) to make sure you capture all unique customer ids to create your aliases
Loop over all the buckets to retrieve all unique customer ids
Create all aliases in one shot using one action for each customer_id
curl -XPOST 'http://localhost:9200/_aliases' -d '{
"actions" : [
{
"add" : {
"index" : "customers",
"alias" : "alias_cid1",
"routing" : "cid1",
"filter" : { "term" : { "customer_id" : "cid1" } }
}
},
{
"add" : {
"index" : "customers",
"alias" : "alias_cid2",
"routing" : "cid2",
"filter" : { "term" : { "customer_id" : "cid2" } }
}
},
{
"add" : {
"index" : "customers",
"alias" : "alias_cid3",
"routing" : "cid3",
"filter" : { "term" : { "customer_id" : "cid3" } }
}
},
...
]
}'
Note that you don't have to worry if an alias already exists, the whole command won't fail and silently ignore the existing alias.
When this command has run, you'll have all your aliases on your unique index, properly configured with a filter and a routing key.

Index huge data into Elasticsearch

I am new to elasticsearch and have huge data(more than 16k huge rows in the mysql table). I need to push this data into elasticsearch and am facing problems indexing it into it.
Is there a way to make indexing data faster? How to deal with huge data?
Expanding on the Bulk API
You will make a POST request to the /_bulk
Your payload will follow the following format where \n is the newline character.
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
...
Make sure your json is not pretty printed
The available actions are index, create, update and delete.
Bulk Load Example
To answer your question, if you just want to bulk load data into your index.
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
The first line contains the action and metadata. In this case, we are calling create. We will be inserting a document of type type1 into the index named test with a manually assigned id of 3 (instead of elasticsearch auto-generating one).
The second line contains all the fields in your mapping, which in this example is just field1 with a value of value3.
You will just concatenate as many as these as you'd like to insert into your index.
This may be an old thread but I though I would comment anyway for anyone who is looking for a solution to this problem. The JDBC river plugin for Elastic Search is very useful for importing data from a wide array of supported DB's.
Link to JDBC' River source here..
Using Git Bash' curl command I PUT the following configuration document to allow for communication between ES instance and MySQL instance -
curl -XPUT 'localhost:9200/_river/uber/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"strategy" : "simple",
"driver" : "com.mysql.jdbc.Driver",
"url" : "jdbc:mysql://localhost:3306/elastic",
"user" : "root",
"password" : "root",
"sql" : "select * from tbl_indexed",
"poll" : "24h",
"max_retries": 3,
"max_retries_wait" : "10s"
},
"index": {
"index": "uber",
"type" : "uber",
"bulk_size" : 100
}
}'
Ensure you have the mysql-connector-java-VERSION-bin in the river-jdbc plugin directory which contains jdbc-river' necessary JAR files.
Try bulk api
http://www.elasticsearch.org/guide/reference/api/bulk.html

Resources