Elasticsearch: allow discovery of documents without exposing their source?

I'm trying to set up elasticsearch so that it allows users to discover the existence of documents, without having access to the document itself. For example, imagine a site that aggregates academic articles: they allow full-text search over the body, but only present the abstract.
I am trying to set up a system where different user groups have access to different documents, but everyone has access to the entire index.
What is the path of least resistance for me to set up restricted content search on elasticsearch? Is it a setting? A plugin? Write my own plugin? Fork?

To answer the first part of your question:
First way: you can disable returning the _source field for a particular query, like this:
{
  "_source": false,
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}
Second way: if you never want the _source field at all, you can disable storing it in the mapping:
{
  "tweet": {
    "_source": {
      "enabled": false
    }
  }
}
For the second part of your question: I didn't fully understand your requirements, but Shield can be useful if you want simple authentication and role-based access control, e.g. so that some sets of users can't modify documents.
If you control the user-facing system, you can also achieve this by adding an access-permission field to each document and mapping permissions to users. Then apply filters on that field when searching for documents. This works if you don't want to get into the details of Shield.
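As a sketch of that permission-field approach (the field name "allowed_groups" and the group names are hypothetical, not part of the original answer), the search body can combine "_source": false with a terms filter so users can only discover documents their groups may see:

```python
# Sketch of the permission-field approach: full-text search restricted
# to the user's groups, with _source suppressed so only the document's
# existence (and any explicitly requested fields) is exposed.
# "allowed_groups" and "body" are hypothetical field names.

def build_discovery_query(text, user_groups):
    """Build a search body that filters by group and hides _source."""
    return {
        "_source": False,  # discover documents without exposing their content
        "query": {
            "bool": {
                "must": {"match": {"body": text}},
                "filter": {"terms": {"allowed_groups": user_groups}},
            }
        },
    }

body = build_discovery_query("neural networks", ["students", "staff"])
```

The filter clause does not affect scoring, so ranking is driven purely by the full-text match.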

Related

Possible to provide an entire document to Update By Query?

I would like to search for a document that is stored in Elasticsearch based on its fields and overwrite that entire document with a new version. I am new to ES, but from what I can tell I can only use Update if I am searching for a document by its ES-assigned _id, so I was hoping to use Update By Query to do this. Unfortunately, it appears that if I use Update By Query, I need to provide a script to update the fields I care about. Something like below:
POST my-index-000001/_update_by_query
{
  "script": {
    "source": "ctx._source.count++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}
My problem is that my document has dozens of fields and I don't know which of them will have changed. I could loop through them and build the script, but I'm hoping there is a way to simply provide the document that you want and have anything that matches your query be overwritten by that document. Is this possible with Update By Query? Or is there another way to match on something other than _id and perform an update?
Your question is not entirely clear: are you trying to update the whole document for a given id? If yes, you can simply overwrite the existing document with a PUT call:
PUT index-name/_doc/<id>
This will overwrite the existing document, so make sure that you are sending the complete document in your PUT call and not just the fields that have changed.
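If you do need to match on something other than _id, one option (a sketch, not a tested recipe for your cluster) is to pass the full replacement document as a script parameter to _update_by_query and assign it to ctx._source, which replaces the whole source of every matching document:

```python
# Sketch: replace the entire _source of every document matching a query,
# by passing the new document as a Painless script parameter.
# The helper and its argument names are illustrative, not an ES API.

def build_replace_request(match_field, match_value, new_doc):
    """Body for POST <index>/_update_by_query that overwrites _source."""
    return {
        "script": {
            "source": "ctx._source = params.doc",  # overwrite the whole source
            "lang": "painless",
            "params": {"doc": new_doc},
        },
        "query": {"term": {match_field: match_value}},
    }

req = build_replace_request(
    "user.id", "kimchy", {"user": {"id": "kimchy"}, "count": 0}
)
```

This avoids looping over dozens of fields in the script, at the cost of replacing fields you did not intend to touch, so make sure new_doc is complete.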

How to find what index a field belongs to in elasticsearch?

I am new to elasticsearch. I have to write a query using a given field but I don't know how to find the appropriate index. How would I find this information?
Edit:
Here's an easier/better way using the mapping API:
GET _mapping/field/<fieldName>
Another way is to search for documents in which the field exists.
Replace <fieldName> with your field's name. /_search will search across all indices and return any document that has the field. Set _source to false, since you don't care about the document contents, only the index name.
GET /_search
{
  "_source": false,
  "query": {
    "exists": {
      "field": "<fieldName>"
    }
  }
}
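If many indices match, you can also let Elasticsearch list just the index names by aggregating on the built-in _index metadata field instead of paging through hits. A sketch (the aggregation name and size values are arbitrary choices, not required values):

```python
# Sketch: find which indices contain a field by combining the exists
# query with a terms aggregation on the _index metadata field.
# The response's "indices" buckets then list one entry per index.

def build_index_lookup(field_name):
    """Body for GET /_search that returns only index-name buckets."""
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "query": {"exists": {"field": field_name}},
        "aggs": {
            "indices": {"terms": {"field": "_index", "size": 100}}
        },
    }

body = build_index_lookup("user.id")
```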
Another, more visual way to do that is through the Kibana Index Management UI (assuming you have the privileges to access it).
There you can click on an index and open its Mappings tab to see all fields of that particular index. Then just search for the desired field.
Summary:
@Polynomial Proton's answer is the way to go 90% of the time. I just wanted to show you another way to solve your issue. It requires more manual steps than @Polynomial Proton's answer, and if you have a large number of indices it is not appropriate.

Elasticsearch index with no fields indexed

I need to create an Elasticsearch index whose contents will be accessed only by the document-id. There will never be any queries related to the contents of documents. These documents can contain any JSON, including instances where the same field can contain different types of data, etc.
I've hunted for this info, and have found much about indexing individual fields, but nothing about treating the entire document as essentially opaque.
Any help would be much appreciated.
You can do what you want, but in my opinion it is not the right approach.
First you need to create a mapping for the index:
PUT index_name_here
{
  "mappings": {
    "document_type_here": {
      "properties": {
        "field_name_for_json_data_here": {
          "type": "object",
          "enabled": false
        }
      }
    }
  }
}
After this, you can create documents with any custom structure of fields. You just need to store your JSON not directly in the document, but inside a field of the document (in my example, inside the field "field_name_for_json_data_here"). With "enabled": false, Elasticsearch stores the field in _source but does not index any of its contents, so conflicting types across documents are not a problem.
If possible, tell me why you chose Elasticsearch to store this kind of data. If I understood the question correctly, you need a simple key/value store (you could store your JSON as a string), and many databases are better suited for that.
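A sketch of the indexing side under that mapping (field and key names are placeholders carried over from the example above): wrap the arbitrary JSON under the disabled field, so the same key can hold different types in different documents without mapping conflicts:

```python
# Sketch: wrap arbitrary JSON under the disabled "object" field so
# Elasticsearch keeps it in _source but never indexes its contents.
# Retrieval is then by document id only, as the question requires.

def wrap_opaque(payload):
    """Payload may mix types freely, since the field is not indexed."""
    return {"field_name_for_json_data_here": payload}

doc = wrap_opaque(
    {"a": 1, "a_again": "now a string", "nested": {"ok": True}}
)
```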

Is there a way to define a dynamic query in Kibana dashboard?

A somewhat similar question has been asked here but there's no answer for that yet. That question relates to an older version of Kibana so I hope you can help me.
I'm trying to setup some predefined queries in the Kibana dashboard. I'm using Kibana 5.1. The purpose of those queries is filtering some logs based on multiple different parameters.
Let's see a query I'd like to execute:
{
  "index": "${index_name}",
  "query": {
    "query_string": {
      "query": "message:(+\"${LOG_LEVEL}\")",
      "analyze_wildcard": true
    }
  }
}
I know I can query directly in the dashboard something like "message:(+"ERROR")" and manually change the ERROR to WARN for example, but I don't want that - imagine that this query might be more complex and contain multiple fields.
Note that the data stored in the message is not structured - think of the message as a whole log line. This means I don't have fields like LOG_LEVEL which I could filter directly.
Is there any way I can set the index_name and LOG_LEVEL dynamically from the Kibana Discover dashboard?
You should go to Discover, open one document, and click the filter button next to any of its fields. After this, a filter will appear under the search bar, and you can edit it and put any custom query in it. If you want to add more filters with more custom queries, you can repeat the same action with a different document or field, or you can go to Settings (or Management) > Saved Objects, open the Search you saved, and in its JSON representation copy and paste the elements inside the filter array as many times as you want.
And remember that in order to apply one of the filters, you should probably disable the currently enabled ones (otherwise it will filter by all the enabled filters in your dashboard).

Elasticsearch - Filtering / Boosting based on a list of ids

I have user objects indexed with various fields in an ES index.
I also have another service (not ES) telling me which of my users are currently online. It's a simple list of user ids. I can't index this "is_online" field because it changes too regularly.
But I'd like to be able to perform various queries using this online status.
For example, I'd like to be able to filter my results to get only users that are currently online.
But I'd also like to be able to perform boost queries based on this list of ids.
For example, I'd like to get a list of users, preferably those who are online.
I don't know what the best way to do that is (performance-wise).
Any help is welcome.
I think that a Terms Query could help:
GET /_search
{
  "query": {
    "terms": {
      "user": ["id1", "id2", "idn"]
    }
  }
}
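For the boosting case (prefer online users rather than restricting to them), one sketch is to put the terms clause in a bool query's should, which affects scoring but does not exclude non-matching users. The boost value and field name here are illustrative:

```python
# Sketch: boost documents whose id is in the online list without
# excluding offline users. The "should" clause only raises the score
# of matching documents; "must" carries the actual search.

def build_online_boost_query(base_query, online_ids, boost=2.0):
    """Wrap a base query so online users rank higher."""
    return {
        "query": {
            "bool": {
                "must": base_query,
                "should": {
                    "terms": {"user": online_ids, "boost": boost}
                },
            }
        }
    }

body = build_online_boost_query({"match_all": {}}, ["id1", "id2"])
```

Since the id list comes from another service, it is rebuilt per request, which avoids reindexing documents every time someone goes online or offline.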