Type per user in Elasticsearch? - elasticsearch

I'm designing an analytics platform. Every user has access only to his own documents. All the documents have the same structure.
The default option is to have a userId field and use it every time I need to filter documents.
The question is: will a type per user improve search performance?

No, type per user won't improve your performance. It is exactly the same as filtering by the field.
However, you may want to consider "filtered aliases". Since you actually want different "views" of the same index, you can create one alias per user, each filtered by userId, as described in the Elasticsearch documentation on aliases.
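As a rough illustration of the filtered-alias idea, the sketch below builds the JSON body for a `POST /_aliases` request that adds one alias with a term filter on userId. The index name `analytics`, the alias naming scheme, and the helper name `filtered_alias_actions` are assumptions made up for this example; only the request body is built, nothing is sent to a cluster.

```python
import json


def filtered_alias_actions(index, user_id, field="userId"):
    """Build the body for POST /_aliases that adds one filtered alias.

    The alias name pattern "<index>_user_<id>" is a hypothetical convention.
    """
    alias = f"{index}_user_{user_id}"
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    # Searches through the alias only see this user's documents.
                    "filter": {"term": {field: user_id}},
                }
            }
        ]
    }


body = filtered_alias_actions("analytics", "42")
print(json.dumps(body, indent=2))
```

Searching through the alias then behaves like searching the index with the userId filter already applied, so the performance is the same as filtering by the field yourself; the alias only saves you from repeating the filter in every query.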

Related

Map multiple values to a unique column in Elasticsearch

I want to use Elasticsearch to process some WhatsApp chats, so I am starting by planning the data load.
The problem is that the data exported from WhatsApp doesn't contain a real unique id per user. It only contains the user's name, taken from the contact directory of the device where the chat was exported (i.e. a user can change their number or have two numbers in the same group).
Because of that, I need to create a custom explicit mapping table between the user names and a self-generated unique id, which gets populated in an additional column.
So my question is: "How can I implement this kind of explicit mapping in Elasticsearch to generate an additional unique column?" Alternatively, a valid answer could be a totally different approach to the problem.
PS. As I write this, I think the solution could be in the ingestion process, for example in a Python script, but I still want to post the question to understand whether this is something Elasticsearch can do by itself.
Yes, do it during the indexing process.
If the data that maps the name to the id is stored in a separate index, you can do this with an enrich processor: an ingest pipeline looks up the name at index time and adds whichever value you want to the document.
Also, Elasticsearch doesn't have columns, only fields.
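For the ingestion-side route the asker mentions, a minimal sketch in Python could look like the following. It is not the enrich processor itself; it only shows the name-to-id mapping applied before documents are sent for indexing. The class `UserIdMap`, the `user-N` id format, and the message field names `author` and `user_id` are all assumptions for illustration.

```python
import itertools


class UserIdMap:
    """Explicit mapping from exported display names to self-generated ids."""

    def __init__(self):
        self._ids = {}
        self._counter = itertools.count(1)

    def id_for(self, name):
        # Assign a new id the first time a name is seen, then reuse it.
        if name not in self._ids:
            self._ids[name] = f"user-{next(self._counter)}"
        return self._ids[name]

    def link(self, name, same_as):
        # Declare that two display names belong to the same person.
        self._ids[name] = self.id_for(same_as)


def add_user_ids(messages, id_map):
    """Return copies of the messages with a user_id field added."""
    return [{**m, "user_id": id_map.id_for(m["author"])} for m in messages]


id_map = UserIdMap()
id_map.link("Anna (work)", same_as="Anna")  # second number, same person

messages = [
    {"author": "Anna", "text": "hi"},
    {"author": "Bob", "text": "hello"},
    {"author": "Anna (work)", "text": "new number"},
]
docs = add_user_ids(messages, id_map)
```

The resulting documents can then be indexed as usual; the same table could instead be stored in its own index and used by an enrich policy if the lookup should happen inside Elasticsearch.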

Do I need to send data to Elasticsearch that I don't need for answering search requests?

Suppose I have a client with name, email, and company attributes, and I only need to search on the name and email attributes.
Do I still need to send the company attribute to Elasticsearch and set index to false, or should I send only the required attributes?
Typically you would just index the entire document here. Unless you have a massive data set (i.e. TBs), the savings would be minimal.
And, from experience, there's a good chance that someone will come along and ask to search the company field after all, which means you would need to reindex everything to allow that.
That said, yes, you can definitely take that approach.
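For reference, the "set index to false" option from the question could be sketched as an index-creation body like the one below. The helper name `client_mapping` and the choice of `text` and `keyword` field types are assumptions for this example; a field with `"index": false` is still kept in the stored document source but is not added to the search index.

```python
import json


def client_mapping(searchable, stored_only):
    """Build an index-creation body: searchable text fields plus unindexed ones."""
    props = {name: {"type": "text"} for name in searchable}
    # "index": False keeps the value in the document but skips indexing it.
    props.update({name: {"type": "keyword", "index": False} for name in stored_only})
    return {"mappings": {"properties": props}}


mapping = client_mapping(["name", "email"], ["company"])
print(json.dumps(mapping, indent=2))
```

Making company searchable later would mean changing this mapping and reindexing, which is the cost the answer warns about.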

ElasticSearch index per user?

I need to build a system using Elasticsearch.
Each user has their own documents, and those documents are visible only within that user's scope. No user's documents are accessible to any other user of the system.
The question is: what's the best approach, creating an index per user or creating a single index containing every user's documents?
Each user might have custom meta-information fields on their documents that other users do not have.
I know that in general it's recommended to use a single index with user aliases, but I don't understand how to add this custom per-user meta-information to one big index.
For example, imagine userA has two documents indexed and userB has three. My system has predefined meta-information such as filename and description; however, it also lets each user define their own custom meta-information. For example, userA might have a color field on their documents, and userB might have a size field on each document.
I understand one possibility would be to add a new field to the single index for each of these; however, the number of fields could grow out of bounds.
What would be the best approach?
Thanks.
One index per user sounds like you'd run into trouble at some point: there is an overhead per index that would become significant once you have a lot of users (say, 10,000 or so).
I don't think you need this though - you could allow custom attributes on a per user basis by using nested fields - each nested object would have name and value properties (possibly multiple value properties) and so you can have arbitrary searchable metadata for your documents without needing to change the mapping each time.
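A minimal sketch of the nested name/value approach, under stated assumptions: the nested field is called `meta`, values are stored as keyword strings, and a `userId` field scopes documents to their owner. The helper names are hypothetical; the code only builds the mapping, document, and query bodies.

```python
def metadata_mapping():
    """Index body with fixed fields plus one nested field for custom metadata."""
    return {
        "mappings": {
            "properties": {
                "userId": {"type": "keyword"},
                "filename": {"type": "text"},
                "description": {"type": "text"},
                "meta": {
                    "type": "nested",
                    "properties": {
                        "name": {"type": "keyword"},
                        "value": {"type": "keyword"},
                    },
                },
            }
        }
    }


def to_nested(custom):
    """Turn {"color": "red"} into [{"name": "color", "value": "red"}]."""
    return [{"name": k, "value": str(v)} for k, v in custom.items()]


def metadata_query(user_id, name, value):
    """Find one user's documents whose custom attribute `name` equals `value`."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"userId": user_id}},
                    {
                        "nested": {
                            "path": "meta",
                            # Both conditions must match the same nested object.
                            "query": {
                                "bool": {
                                    "filter": [
                                        {"term": {"meta.name": name}},
                                        {"term": {"meta.value": value}},
                                    ]
                                }
                            },
                        }
                    },
                ]
            }
        }
    }


doc = {"userId": "userA", "filename": "a.txt", "meta": to_nested({"color": "red"})}
query = metadata_query("userA", "color", "red")
```

Because userA's color and userB's size are both just entries in `meta`, the mapping stays the same no matter how many custom attributes users invent.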

Multiple routing field in elasticsearch

I am a newbie to Elasticsearch and need a clarification. I understand how routing works, but I have a question.
Can I create a routing value for a document from multiple fields? If yes, can I search the data using a single routing value? Can anyone provide an example?
Imagine I have 5 fields: [username,id,age,dept,salary]. Now I need to create a routing value for this document. Can I do so using the username and id fields?
Thanks in advance.
In answer to your question: no, you can't automatically use multiple fields for a routing value when indexing a document. You can choose one and only one field, and that field must contain a single value.
However, you could manually concatenate the username and id field and pass it in the indexing request:
PUT /index/type/id?routing=username_id
{ body }
That said, routing is a feature for more advanced users. It is very useful but does make life more complicated. You say that you're a newbie, so I'd suggest not playing with routing just yet. That can follow when you're running a 50 node cluster.
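The manual concatenation could be sketched in Python as below. The helper names, the `staff` index, and the `_doc` endpoint (used here in place of the `type` segment from the request above, which newer Elasticsearch versions no longer have) are assumptions for illustration; the code only builds request paths.

```python
from urllib.parse import quote


def routing_value(username, user_id):
    """Combine two fields into the single routing value Elasticsearch expects."""
    return f"{username}_{user_id}"


def index_path(index, doc_id, routing):
    """Path for indexing a document with an explicit routing value."""
    return f"/{index}/_doc/{quote(str(doc_id), safe='')}?routing={quote(routing, safe='')}"


def search_path(index, routing):
    """Path for a search limited to the shard that routing value maps to."""
    return f"/{index}/_search?routing={quote(routing, safe='')}"


routing = routing_value("jdoe", 17)
put_path = index_path("staff", 1, routing)
get_path = search_path("staff", routing)
```

The same concatenated value has to be supplied again when searching or fetching the document, so the rule for building it needs to live in one place in the application.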

XPages: can i filter a view to show only entries that belong to a group?

I have a view in an XPage with some entries (let's say clients). I have an ACL group of persons (clients) that contains some of the clients in the view. Now I want to use the search attribute of the view to show only entries that belong to the group.
I already use the search attribute to select users by name, e.g.:
FIELD Name Contains "Chuck Norris"
Is there any similar query? (maybe using #isMember on the field....?)
UPDATE: I will also have the group entries (client names) in a text list in a document. So can I filter the "name" field of the view based on the values of a text list?
Perhaps using a reader field is a good idea. You're talking about restricting document access to a group of Domino users - that's exactly what reader fields are for.
For example, make your text list field containing client names into a reader field like this:
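// Get the text list item that holds the client names
var item = document1.getFirstItem("myfield");
// Flag the item as a readers field so only the listed names can see the document
item.setReaders(true);
document1.save();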
myfield needs to contain canonical names (CN=firstname lastname/O=organisation).
Using reader fields, you don't need to do any view filtering at all; it happens automatically. If you have a very large number of documents (say, half a million or so), it could slow things down; otherwise, it's a nice approach.
Reader fields are no solution, though, if you want to restrict which documents are displayed in only one particular view. In that case, you need to do the view filtering yourself, as you tried.
If you want to filter for only ONE particular client, then a categorized view is the way to go. You can then give the view panel the name of that client as its category filter.
If you want to filter for multiple clients, you need to do it with fulltext search, just as you already tried. In that case, make sure you're working with Domino 9. Previous Domino versions don't apply the view sorting order to a fulltext search result, which means you have to sort it manually using custom JavaScript or similar, which is complicated.
Or, as Frantisek suggested, write a scheduled agent which puts documents in folders depending on their clients - but depending on the number of clients you want to filter the view for this may lead to many folders, which may lead to other problems. Furthermore, you need to make sure to remove folders when they are not needed anymore, and you have a lag until new documents appear in a folder.
So in a nutshell, if you want to do an application wide restriction based on client names, use reader fields.
If you want to restrict for one client name at a time, use categories.
Otherwise, use fulltext search with Domino 9.
