Scenario: I have an application where my Java application pushes user data from a database to Elasticsearch, which is accessed using Kibana dashboards. I also have a Content application which allows users to create/edit data that is saved to the database through my Java application.
Use case: When a user slices data in Kibana dashboards and reaches a point where they realize there is an error in the data, they will want to change that data point. E.g. a certain company is shown in a particular city on the dashboard, which appears to be incorrect. The user will want to change the city to the correct one.
Problem case: I am not able to find a way to either allow the data to be edited within Kibana, or to have some kind of deep link in Kibana which takes the user from Kibana to my Content application so that the data point can be edited there.
Currently the user can go to the Content application, search for the company, search for the addresses and make the change there, however that's very cumbersome with millions of companies and millions of data points in the database.
I haven't found any editing possibilities so far ... but linking is possible:
When you head to "Kibana/Mgmt/Index Patterns" you can define fields to be rendered as a clickable URL (e.g. to be used in the "Data Table" visualization).
If you have a field containing e.g. some ID myid, you can have Kibana output a clickable link instead, pointing to e.g. https://mysite/?id=myid.
See https://www.elastic.co/guide/en/kibana/current/field-formatters-string.html for details.
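As a minimal sketch (the site name is just the placeholder from the example above): in the index pattern, edit the field, set its format to "Url", and give it a URL template such as:

  https://mysite/?id={{value}}

Kibana substitutes {{value}} with the field's value for each document, so a row with myid = 42 renders as a link to https://mysite/?id=42.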
If you need more complex linking options (e.g. your effective link needs to incorporate multiple fields of a document), you can create a so-called scripted field; there you have access to multiple fields of an Elasticsearch document and can construct your link more or less freely.
We use that a lot to link from overview Kibana dashboards to other systems with detailed data on the respective item.
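For illustration, a scripted field (written in Painless; the field names are hypothetical) that builds a link target out of two document fields could look like this; give the scripted field the "Url" format and it renders as a single clickable link:

  "https://mysite/companies/" + doc['company_id'].value + "?city=" + doc['city.keyword'].value

Note that string fields have to be accessed through their keyword sub-field when reading doc values in a script.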
Popular search engines are quite performant when it comes to full-text search and many other aspects; however, I am not sure how to map the main document storage system's security policies to ES and/or Solr.
Consider Google Drive and its folders. Users can share any folder, and the files and folders below it are then also shared. Content management systems use something similar.
But how do you map that to an external search engine (that is, one not built into the application's content management system), especially if there are millions of documents in many tens of thousands of folders, and tens of thousands of users? Will it help if, for example, the depth (nesting) of the folders is limited to some small number?
I know ES has user roles, but I can't see how they help here, because access is granted more or less arbitrarily. Another approach is to somehow materialize user access in the documents (folders and files) themselves, but then a change in a user's role, local to some folder, will result in changing many thousands of documents.
Also, searches can be quite arbitrary and lengthy, so pagination is desired; fetching "everything" and then sorting out user access on the application side is therefore not an option.
I believe the scenario described is quite common, but I can't find any hints on how to implement it.
I have used Solr as a search engine, with Solr's Data Import Handler (DIH) feature for importing the data from the database into Solr.
I would suggest you go with the approach of indexing the ACLs along with the documents.
I have used the same approach and it is working fine so far.
I agree that you have to re-index the data on the Solr side whenever folder access or the access level of documents changes. We already need to re-index a document whenever its metadata or content changes; similarly, we can update the documents on the Solr side for any changes in the ACL (Access Control List).
Why index the ACL along with the document information?
The reason is that whenever a user searches for a document, you can pass the user's ACL entries as part of the query in the form of a filter query and get only the documents which are accessible to that user.
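For illustration, assuming each document is indexed with a multi-valued acl field listing the users and groups allowed to read it (the field name and values here are made up), the application would append the current user's identities as a filter query:

  q=quarterly report
  fq=acl:("u_alice" OR "g_engineering")

Because the filtering happens inside Solr, pagination and result counts stay correct, and filter queries are cached independently of the main query.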
I feel this removes the complexity of applying the ACL logic on the back-end side.
If you don't index the ACL in Solr, then you have to filter the documents after you retrieve them from Solr, checking each document against whatever ACL logic applies.
The last option could be to index the documents without ACLs and let the user search across all documents. When the user tries to perform an action on one of those documents, you check the permissions and either allow the action or deny it, telling the user they don't have enough permission to access the document.
Actions could be View, Download, Update, etc.
You need to decide which approach suits and works out best in your case.
What I have: an Elasticsearch database used for full-text search purposes.
What my requirement is: in a given Elasticsearch index, I need to detect certain sensitive data like IBANs, credit card numbers, passport numbers, social security numbers, addresses etc. and report them to the client. There will be checkboxes as input parameters. For instance, the client can select credit card number and passport number, then click the detect button. After that, the system will start scanning the index and report the documents which include credit card numbers and passport numbers. The aim is to support more than 200 sensitive data types, and clients will be able to make multiple selections over these types.
What I have done: I have created a C# application and used the NEST library for ES queries. In order to detect each sensitive data type, I have created regular expressions and some special validation rules in my C# app, which work well for manually provided input strings.
In my C# app, I have created a match-all query with the scroll API. When the user clicks the detect button, my app iterates over all the source records returned from the scroll API and, for each record, executes the sensitive data finder code based on the client's selection.
The problem here is searching all source records in the ES index, extracting the sensitive data and preparing the report as fast as possible with a large number of documents. I know ES is designed for full-text search, not for scanning the whole system and bringing back data. However, all the data is in Elasticsearch right now and I need to use this DB for the detection operation.
I am wondering if I can do this in a different, more efficient way. Can this problem be solved by writing an Elasticsearch plugin, without a C# app? Or is there a better solution for scanning the whole source data in an ES index?
Thanks for suggestions.
The passport number and other sensitive information detection algorithms should run once, at indexing time, or perhaps asynchronously as a separate job that updates documents with flags representing the presence of sensitive information. Based on those flags, the relevant documents can then be searched.
Search-time analysis in this case would be very costly and should be avoided.
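For example, if the indexing job adds a keyword field such as sensitive_types (a hypothetical name) holding the detected types for each document, the report for the client's checkbox selection becomes an ordinary terms query instead of a full scan:

  GET my_index/_search
  {
    "query": {
      "terms": { "sensitive_types": ["credit_card_no", "passport_no"] }
    }
  }

This returns every document flagged with at least one of the selected types, and it stays fast regardless of how many of the 200+ types the client ticks.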
I'm using ElasticSearch 7.1.1 as a full-text search engine. At the beginning all the documents are accessible to every user. I want to give users the possibility to edit documents. The modified version of the document will be accessible only to the editor and everyone else will only be able to see the default document.
To do this I will add two arrays to every document:
An array of users excluded from seeing the doc
An array with the only user that can see this doc
Every time someone edits a document I will:
Add the user that made the edit to the excluded users list of the original document
Create a new document containing the edit, available only to that user.
This way in the index I'll have three types of documents:
Documents accessible to everyone
Documents accessible to everyone except some users
Documents accessible only to a specific user
I use ElasticSearch not only to fetch documents but also to calculate live aggregations (e.g. sums of some field), so at query time I have to be able to fetch the user-specific documents.
I don't expect a lot of edits, less than 1% of the total documents.
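Under the proposed scheme, the per-user filter could look roughly like this (the field names excluded_users and visible_to are just illustrations): a document matches either because it is the editor's private copy, or because it is a default document (no visible_to field) that does not exclude the current user:

  GET documents/_search
  {
    "query": {
      "bool": {
        "should": [
          { "term": { "visible_to": "user42" } },
          {
            "bool": {
              "must_not": [
                { "exists": { "field": "visible_to" } },
                { "term": { "excluded_users": "user42" } }
              ]
            }
          }
        ],
        "minimum_should_match": 1
      }
    }
  }

The same bool clause can be used as a filter around the aggregations, so sums and other live aggregations stay consistent with what the user is allowed to see.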
Is there a smarter, and less query intensive, way to obtain the same results?
You could implement document-level security.
With that you can define roles that restrict read access to the documents that match a query (e.g. you could use the ID of the document).
So instead of updating the documents each time via your proposed array solution, you would update the role, respectively grant the role to the particular users. This would of course require that every user has an Elasticsearch user.
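A minimal sketch of such a role via the security API (the index and field names are assumptions carried over from the question; note that document-level security is a paid subscription feature):

  PUT /_security/role/own_documents_reader
  {
    "indices": [
      {
        "names": [ "documents" ],
        "privileges": [ "read" ],
        "query": {
          "template": {
            "source": { "term": { "visible_to": "{{_user.username}}" } }
          }
        }
      }
    ]
  }

The {{_user.username}} template variable is filled in per authenticated user, so a single role can serve all users instead of needing one role per user.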
As far as I know, this feature is the only way Elasticsearch offers "out of the box" to fulfill your requirements.
I hope I could help you.
Goal: I want to create a dashboard which shows user requests made to my website. For this, I created a filter in my Java web app and started capturing user requests and storing them in an ES index. The documents have the form:
{
  "user": "user1",
  "url": "domain.com/page1",
  "hitcount": 12
}
So, now I have an index which contains the information as to how many times a user requested which URLs.
Now, I want to create visualizations to show usage trends per user.
Question:
Which visualizations should be used for this use-case?
If I need to show the change in user trends over time, how should I save the data? For example, is there a visualization where I could show that a user has stopped or reduced requesting one page and now accesses a different page more frequently?
Any direction will be helpful.
Note: I understand this could be done with Grafana + Prometheus, but I wish to do it with the Elastic Stack.
I'd recommend logging user requests to a log file and having Filebeat read and index them into ES. It is better to send non-aggregated data into ES and then let ES aggregate it to create the required visualizations.
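Concretely, the idea is to index one document per request (with a @timestamp and the url, instead of a pre-aggregated hitcount) and let a date histogram with a terms sub-aggregation compute the trend; this is also the shape Kibana's line and area charts build for you. A sketch of the equivalent query (index and field names assumed; older ES versions use "interval" instead of "calendar_interval"):

  GET requests/_search
  {
    "size": 0,
    "aggs": {
      "per_week": {
        "date_histogram": { "field": "@timestamp", "calendar_interval": "week" },
        "aggs": {
          "by_url": { "terms": { "field": "url.keyword", "size": 10 } }
        }
      }
    }
  }

A Kibana line chart with a date histogram on the X-axis, split by url.keyword and filtered to one user, then directly shows one page's requests falling off while another page's rise.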
I'm wondering if having thousands of different indexes is a bad idea?
I'm adding a search page to my web app based on Elasticsearch. The search page lets users search for other users on the site by filtering on a number of different indexed criteria (name, location, gender etc.). This is fairly straightforward and will require just one index that contains a document for every user of the site.
However, I want to also create a page where users can see a list of all of the other users they follow. I want this page to have the same filtering options that are available on the search page. I'm wondering if a good way to go about this would be to create a separate index for each user containing documents for each user they follow?
While you can certainly create thousands of indices in Elasticsearch, I don't really see the need for it in your use case. I think you can use one index. Simply create an additional child type followers for the main user record. Every time user A follows user B, create a child record of B with the following content: {"followed_by": "A"}. To get the list of users that the current user is following, you can simply add a Has Child filter to your query.
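A rough sketch of that query (in current Elasticsearch versions the parent/child relation is modeled with a join field rather than mapping types, but the has_child shape is the same):

  GET users/_search
  {
    "query": {
      "has_child": {
        "type": "followers",
        "query": { "term": { "followed_by": "A" } }
      }
    }
  }

This returns the user documents that have at least one followers child with followed_by = A, i.e. everyone A follows, and it can be combined with the same name/location/gender filters in a bool query so the "following" page reuses the search page's filtering.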
I would like to add to Igor's answer that creating thousands of indices on a tiny cluster (one or two nodes) can have some drawbacks.
Each shard of an index is a full Lucene instance. That said, you will have many open files (probably too many open files) if you have a single node (or a small cluster, in terms of nodes).
That's one of the major reasons why I would not define too many indices...
See also "File descriptors" in the installation guide.