I have a multi-tenant system and I am trying to design Elasticsearch to support multi-tenancy. I've searched the net, but none of the posts I've found explain in practice how to do it.
The basic idea is to have, on each index, one shard per customer and use custom routing to query the customer's dedicated shard. That much is clear. Now, how can I implement this? How can I create multiple shards per index, specifying the "key value", so that I can query that specific shard in the future? A code example would be helpful.
Thank you so much.
I don't think this is the correct way to achieve multi-tenancy; it shouldn't be done at the shard level.
What do you want to achieve exactly? I'd assume that you want different users (or, better, different roles) to be able to access different portions of an index. In that case, you should have a look at document-level security permissions.
How do you achieve this in practice? Let's say that you have an index named multitenants-index and two tenants such that: (i) the first tenant can read/write only those documents having the field "tenant": 1, and (ii) the second tenant can read/write only those documents having the field "tenant": 2. Then you might create the following two roles:
POST /_xpack/security/role/first_tenant
{
  "indices": [
    {
      "names": [ "multitenants-index" ],
      "privileges": [ "read", "write" ],
      "query": "{\"match\": {\"tenant\": 1}}"
    }
  ]
}

POST /_xpack/security/role/second_tenant
{
  "indices": [
    {
      "names": [ "multitenants-index" ],
      "privileges": [ "read", "write" ],
      "query": "{\"match\": {\"tenant\": 2}}"
    }
  ]
}
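Once the roles exist, you map them to users. As a minimal sketch (the username and password below are hypothetical placeholders), you could create one user per tenant with the create-user API:

POST /_xpack/security/user/tenant1_user
{
  "password": "CHANGE_ME",
  "roles": [ "first_tenant" ]
}

Any search or index request that tenant1_user then issues against multitenants-index is transparently filtered to documents matching "tenant": 1.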
Using Elastic Painless scripting, is it possible to get the user submitting a document update via the Kibana GUI?
Using ingest pipelines, I've tried to append the security user to the document context with a set_security_user processor (the pipeline name here is arbitrary):

PUT _ingest/pipeline/add-security-user
{
  "processors": [
    {
      "set_security_user": {
        "field": "_security",
        "properties": [
          "roles",
          "username",
          "email",
          "full_name",
          "metadata",
          "api_key",
          "realm",
          "authentication_type"
        ]
      }
    }
  ]
}
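(For reference, the pipeline only runs on writes if it is attached to them, e.g. by setting it as the index's default pipeline; the index name below is a hypothetical placeholder, assuming the pipeline was stored under the name used above:)

PUT my-enterprise-search-index/_settings
{
  "index.default_pipeline": "add-security-user"
}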
However, regardless of which user submits a change to the document (via the Kibana GUI), it always sets it to:
...
"roles": [
  "kibana_system",
  "cloud-internal-enterprise_search-server"
],
"realm": {
  "name": "found",
  "type": "file"
},
"authentication_type": "REALM",
"username": "cloud-internal-enterprise_search-server"
...
Context:
What I'm trying to achieve is an additional layer of restrictions when users modify the Enterprise Search indexes. I want Developer roles to be able to see all the configuration items within App Search (Enterprise Search via Kibana), but only to read them, not write. There doesn't seem to be a way to do this using the standard Enterprise Search roles, which give Admins, Owners and Devs full read/write permissions for the engine.
I have the following tables, which have millions of records and change frequently. Is there a way to load that data into Elasticsearch (for eventual consistency) with Spring Boot, both initially and incrementally?
Tables :
Employee
Role
contactmethod (Phone/email/mobile)
channel
department
Status
Address
The document will look like the one below:
{
  "id": 1,
  "name": "tom john",
  "Contacts": [
    {
      "mobile": 123,
      "type": "MOBILE"
    },
    {
      "phone": 223333,
      "type": "PHONE"
    }
  ],
  "Address": [
    {
      "city": "New york",
      "ZIP": 12343,
      "type": "PERMANENT"
    },
    {
      "city": "New york",
      "ZIP": 12343,
      "type": "TEMPORARY"
    }
  ]
}
... similar data for the ROLE, DEPT, etc. tables
How do I make sure that, e.g., a change to the mobile number of "tom john" in the relational DB will be propagated to the Elasticsearch DB?
You should have a background job in your application which pulls the data from the DB (you know when there is a change in the DB, of course) and, based on what you need (filtering, massaging), reindexes it into your Elasticsearch index.
Or you can use Logstash with its JDBC input to keep your data in sync; please refer to the Elastic blog on how to do it. A minimal pipeline sketch is shown below.
The first one is flexible but not an out-of-the-box solution, while the second one is out of the box. There are pros and cons to both approaches; choose what fits your use case best.
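For illustration, a minimal Logstash pipeline for the second approach might look like this (the connection settings, the employee table, and the updated_at tracking column are assumptions; adapt them to your schema):

input {
  jdbc {
    jdbc_driver_library => "/path/to/postgresql-jdbc.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    # run every minute, fetching only rows changed since the last run
    schedule => "* * * * *"
    statement => "SELECT * FROM employee WHERE updated_at > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "employee"
    # reuse the table's primary key so an update overwrites the old document
    document_id => "%{id}"
  }
}

Because document_id reuses the primary key, a changed row (e.g. tom john's new mobile number) replaces the previous version of the document instead of creating a duplicate.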
I started using the create role API and it works as expected: https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-put-role.html
I got the list of default roles in Elasticsearch via /_security/role, but I don't know how to create the following roles and wasn't able to find the proper docs for it.
I want to segregate the user based on the following needs,
Role which has the privilege to perform only READ / WRITE on all the indices in Elasticsearch (this role should not have the privilege to CREATE / DELETE indices)
Role which has the privilege to perform only operations on Kibana
Role which has the privilege to perform only operations on Logstash
When creating or updating a role, you can find all the valid privileges in the security privileges page of the Elasticsearch 7.x documentation, then add or remove some of them in the role you update.
The role setup below should cover the typical use cases of Kibana and Logstash:
For a Logstash user
add manage_index_templates to the cluster privileges list
add create_index and index to the index privileges list, for each index pattern
you may also need create or create_doc in the index privileges list, in case you generate the _id field of a document externally (instead of relying on the auto-generated ID from Elasticsearch)
assign the new role you created to whatever users you like
# Quick example, with POST request /_security/role/my_logstash_role
{
  "cluster": [ "manage_index_templates" ],
  "indices": [
    {
      "names": [ "logstash-*", "YOUR_INDEX_PATTERN_2" ],
      "privileges": [ "create_index", "index" ]
    }
  ],
  "applications": [
    {
      "application": "YOUR_APP_NAME",
      "privileges": [ "YOUR_APP_PRIV" ],
      "resources": [ "*" ]
    }
  ]
}
For a Kibana user
add read to the index privileges list, for each index pattern
assign the new role you created, plus the built-in role kibana_system, to whatever users you like; note that kibana_system includes (1) a cluster privilege named monitor and (2) access permissions to some index patterns, e.g. .kibana*, .reporting-*, .monitoring-*, which are required by Kibana
if you also use the Dev Tools console of Kibana to interact with the Elasticsearch REST API, you may need to add a few more privileges such as write, delete, manage, etc. to the role, depending heavily on which API endpoints you attempt to call
# Quick example, with POST request /_security/role/my_kibana_role
{
  "cluster": [],
  "indices": [
    {
      "names": [ "logstash-*", "YOUR_INDEX_PATTERN_2" ],
      "privileges": [ "read" ]
    }
  ],
  "applications": [
    {
      "application": "YOUR_APP_NAME",
      "privileges": [ "YOUR_CUSTOM_APP_PRIV" ],
      "resources": [ "*" ]
    }
  ]
}
I have 1 million users in a Postgres table. It has around 15 columns of different datatypes (integer, array of strings, string). Currently I'm using normal SQL queries to filter the data as per my requirements.
I also have an "N" number of projects (max 5 projects) under each user. I have indexed these projects in Elasticsearch and am doing fuzzy search on them. Currently, for each project (text file) I have created a document in Elasticsearch.
Both systems are working fine.
Now I need to query the data on both systems. Ex: I want all the records which have the keyword java (in Elasticsearch) and more than 10 years of experience (available in Postgres).
Since the user count will be increasing drastically, I have moved all the Postgres data into Elasticsearch.
Filters may be applied only on the fields related to the user (not on the project-related fields).
Now I need to create nested projects for the corresponding users. I tried parent-child types and it didn't work for me.
Could anyone help me with the following things?
What will be the correct way of indexing projects associated with the users?
Since each project document has a field called category, is it possible to get the matched category name in the response?
Is there any other, better way to implement this?
From your description, we can tell that the "base document" is always the user.
Now, regarding your questions:
Based on what I said before, you can add all the projects associated with each user as an array, like this:
{
  "user_name": "John W.",
  ..., #More information from this user
  "projects": [
    {
      "project_name": "project_1",
      "role": "Dev",
      "category": "Business Intelligence"
    },
    {
      "project_name": "project_3",
      "role": "QA",
      "category": "Machine Learning"
    }
  ]
},
{
  "user_name": "Diana K.",
  ..., #More information from this user
  "projects": [
    {
      "project_name": "project_1",
      "role": "Project Leader",
      "category": "Business Intelligence"
    },
    {
      "project_name": "project_4",
      "role": "DataBase Manager",
      "category": "Mobile Devices"
    },
    {
      "project_name": "project_5",
      "role": "Project Manager",
      "category": "Web services"
    }
  ]
}
This structure has the goal of adding all the user's info to each document, no matter whether some of it is repeated. Doing this will allow you to bring back, for example, all the users that work on a specific project, with queries like this:
{
  "query": {
    "match": {
      "projects.project_name": "project_1"
    }
  }
}
Yes. Like the query above, you can match all the projects by their "category" field. However, keep in mind that since your base document is the user, it will bring back the whole user document.
For that case, you might want to use the terms aggregation, which brings back the unique values of a field. It can be combined with a query, like this:
{
  "size": 0, #Set this to 0, since you want to focus on the aggregation's result
  "query": {
    "match": {
      "projects.category": "Mobile Devices"
    }
  },
  "aggs": {
    "unique_projects_names": {
      "terms": { "field": "projects.project_name.keyword" }
    }
  }
}
That last query will bring back, in the aggregation results, all the unique project names with the category "Mobile Devices" (the .keyword sub-field is used because a terms aggregation can't run directly on an analyzed text field).
You can create a new index where you'll store all the information related to your projects. However, the relationship between users and projects won't be easy to keep (remember that ES is NOT intended to be a structured or ER database, like SQL), and the queries will become very complex, even if you decide to name both of your indices (users and projects) in a way that lets you address them with a wildcard.
EDIT: Additionally, you can consider storing all the info related to your projects in Postgres and doing the call separately: first get the project ID (or name) from ES, and then the project's info from Postgres (since I assume that's the info least likely to change).
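As a sketch of that first step (field names as in the examples above), you can ask ES to return only the project names for the matching users, and then look those up in Postgres:

{
  "_source": [ "projects.project_name" ],
  "query": {
    "match": { "projects.category": "Mobile Devices" }
  }
}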
Hope this is helpful! :D
What is the difference between simple_query_string and query_string in Elasticsearch?
Which is better for searching?
In the Elasticsearch simple_query_string documentation, it is written:
Unlike the regular query_string query, the simple_query_string query
will never throw an exception and discards invalid parts of the
query.
but it's not clear to me. Which one is better?
There is no simple answer. It depends :)
In general, query_string is dedicated to more advanced uses. It has more options but, as you quoted, it throws an exception when the sent query cannot be parsed as a whole. By contrast, simple_query_string has fewer options but does not throw an exception on invalid parts.
As an example, take a look at the two queries below:
GET _search
{
  "query": {
    "query_string": {
      "query": "hyperspace AND crops",
      "fields": [ "description" ]
    }
  }
}

GET _search
{
  "query": {
    "simple_query_string": {
      "query": "hyperspace + crops",
      "fields": [ "description" ]
    }
  }
}
Both are equivalent and return the same results from your index. But if you break the query and send:
GET _search
{
  "query": {
    "query_string": {
      "query": "hyperspace AND crops AND",
      "fields": [ "description" ]
    }
  }
}

GET _search
{
  "query": {
    "simple_query_string": {
      "query": "hyperspace + crops +",
      "fields": [ "description" ]
    }
  }
}
Then you will get results only from the second one (simple_query_string). The first one (query_string) will throw something like this:
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "Failed to parse query [hyperspace AND crops AND]",
        "index_uuid": "FWz0DXnmQhyW5SPU3yj2Tg",
        "index": "your_index_name"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      ...
    ]
  },
  "status": 400
}
I hope you now understand the difference regarding throwing/not throwing exceptions.
Which is better? If you want to expose the search to plain end users, I would recommend simple_query_string: with it, the end user will get some results in every case, even after making a mistake in the query. query_string is better suited to more advanced users who have been trained in the correct query syntax, so they will know why they get no results in a particular situation.
Adding to what @Piotr has mentioned,
What I understand is that when external users or consumers make use of the search solution, simple_query_string offers a better experience in terms of error handling and of limiting what kind of queries users can construct.
In other words, if the search solution is exposed publicly for any consumer, then simple_query_string makes sense; however, if I know who my end-users are, well enough to guide them in what they are looking for, there is no reason why I cannot expose query_string to them.
Also, QueryStringQueryBuilder.java makes use of QueryStringQueryParser.java, while SimpleQueryStringBuilder.java makes use of SimpleQueryStringQueryParser.java, which makes me think there are deliberate limitations in the simple parser: the creators didn't want many features in end-users' hands, e.g. dis_max, which is available in query_string.
Perhaps the main purpose of simple_query_string is to limit end-users to simple querying and keep them away from all forms of complex querying and advanced features, so that we keep more control over our search engine (which I'm not really sure about, but it's just a thought).
Plus, the potential for misusing query_string is higher, as only advanced users are capable of constructing certain complex queries correctly, which may be too much for simple users looking for a basic search solution.
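One more practical knob worth knowing: simple_query_string accepts a flags parameter that whitelists which pieces of the simple syntax end-users may use, so you can lock the exposed search down even further. A small sketch (the description field is assumed from the examples above):

GET _search
{
  "query": {
    "simple_query_string": {
      "query": "hyperspace + crops",
      "fields": [ "description" ],
      "flags": "AND|OR|PREFIX"
    }
  }
}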