How to change Elasticsearch document source content by modifying Elasticsearch source code?

I need to encrypt Elasticsearch document source content for security.
The final effect to be achieved is as follows:
Input:
{
  "title": "you know, for search",
  "viewcount": 20
}
In ES:
{
  "title": "zpv!lopx-!gps!tfbsdi", // whatever, encrypted title
  "viewcount": ☯ // whatever, encrypted viewcount
}

Instead of storing encrypted data in ES, we can encrypt the communication between ES nodes and clients with X-Pack. That means that if a client is allowed to query the data, he will ultimately be able to get it; we can control that with X-Pack.
Indexing encrypted data in Elasticsearch is not recommended IMO, since it involves the additional overhead of decrypting and encrypting the data.
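A minimal sketch of what enabling that encryption looks like in elasticsearch.yml (the certificate paths are placeholders, and the exact setting names vary somewhat between Elasticsearch versions):

# elasticsearch.yml -- sketch of X-Pack security settings; paths are placeholders
xpack.security.enabled: true
# encrypt node-to-node (transport) traffic
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
# encrypt client-to-node (HTTP/REST) traffic
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/http.p12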

Related

Using ElasticSearch local version in Postman

I am trying to use the Elasticsearch server installed on my local machine with Postman, i.e., with the help of Postman I want to POST data and retrieve it with a GET operation, but I am unable to do so because I am getting the error unknown key [High] for create index.
So please help me with the same.
If you want to add a document to your index, your URL should look something like this (for document ID 1):
PUT http://localhost:9200/test/_doc/1
A good place to start:
https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index.html
For indexing a document in the index:
PUT http://localhost:9200/my_index/_doc/1
Retrieving the indexed document:
GET http://localhost:9200/my_index/_doc/1
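For completeness, a sketch of the full request pair with a JSON body, as you would send it from Postman or curl (the my_index name and the title field are just illustrative):

curl -XPUT 'http://localhost:9200/my_index/_doc/1' -H 'Content-Type: application/json' -d '
{
  "title": "you know, for search"
}'

curl -XGET 'http://localhost:9200/my_index/_doc/1'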
Introduction:
Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data for lightning fast search, fine‑tuned relevancy, and powerful analytics that scale with ease.
Kibana is a free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack. Do anything from tracking query load to understanding the way requests flow through your apps.
Logstash is a free and open server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite “stash.”
Elasticsearch exposes itself through a REST API, so in this case you don't have to use Logstash, as we are adding data directly to Elasticsearch.
How to add it directly:
You can create an index and type using:
{{url}}/index/type
where index is like a table and type is just a unique data type that we will be storing in the index, e.g. {{url}}/movielist/movie
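As a quick sketch of that call (the movielist index, movie type, and the fields are hypothetical; note that mapping types were removed in later Elasticsearch versions):

curl -XPUT '{{url}}/movielist/movie/1' -H 'Content-Type: application/json' -d '
{
  "title": "The Matrix",
  "year": 1999
}'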
https://praveendavidmathew.medium.com/visualization-using-kibana-and-elastic-search-d04b388a3032

Fuzzy search over encrypted data

I have a schema where a couple of fields must be encrypted. I was wondering if someone has done this, or can point me to a resource, so I know whether Elasticsearch gives me a way to implement a fuzzy search over this encrypted data.
For example, when I have
{
  "last_name": "encryptedLastName"
}
and 2 documents where last_name was encrypted, one with the encrypted value of last_name=Ferdinand and another one with the encrypted value of last_name=Ferdadian,
I'd like to be able to search with a string and fetch both documents as long as, for example, the Levenshtein similarity is above 80%. Is this at all possible?
On another note, I also want to be able to do searches with 'like' over the encrypted data, for example where last_name like 'Fer%'.
You can build an index over encrypted data, but it would mean the data sits unencrypted in the index. Whatever reason the fields are encrypted for in the database itself likely means they can't sit unencrypted in the Elasticsearch index either.
And if the encryption is any good, similar values look completely different after encryption.
Generally (not specific to Elasticsearch):
to search over encrypted data, you will need to decrypt it. If you want to make it fast, you need to keep a decrypted index. You can either have a fast search or good encryption, but not both.
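For reference, these are what the two requested queries look like over plaintext (i.e. decrypted and indexed) data; the people index is hypothetical. Neither behaves meaningfully over ciphertext, for the reasons above:

# fuzzy match: tolerates a bounded Levenshtein (edit) distance
curl -XGET 'http://localhost:9200/people/_search' -H 'Content-Type: application/json' -d '
{
  "query": { "fuzzy": { "last_name": { "value": "Ferdinand", "fuzziness": "AUTO" } } }
}'

# the SQL-style "like Fer%" is a prefix query over the indexed terms
curl -XGET 'http://localhost:9200/people/_search' -H 'Content-Type: application/json' -d '
{
  "query": { "prefix": { "last_name": "fer" } }
}'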

How to reindex AWS Elasticsearch?

My Ruby/Sinatra app connects to an AWS ES cluster using the elasticsearch-ruby gem to index text documents that authorised users (authorised by indexing with their user ID) can search through. Now I want to copy a document from one index to another, to make it query-able by a different authorised user. I tried the _reindex endpoint as documented on this file, only to get the following error:
Elasticsearch::Transport::Transport::Errors::Unauthorized - [401] {"Message":"Your request: '/_reindex' is not allowed."}:
Googling around, I stumbled across an Amazon docs page that lists all supported operations on both their APIs, and for some twisted reason _reindex isn't there yet. Why is that? More importantly, how do I get around this efficiently and achieve what I want to do?
You should double-check the Elasticsearch version deployed by AWS ES. The _reindex API became available in version 2.3. You can check the version number by sending a GET to the ES root endpoint with curl and inspecting version.number.
To work around not having the _reindex endpoint, I would recommend implementing it yourself. This isn't too bad: you can use a scroll to iterate through all the documents you want to reindex. If it is the entire index, you can use a match_all query with the scroll. You can then manipulate the documents as you wish, or simply use the bulk API to post (i.e. reindex) the documents to the new index.
Make sure to have created the new index with the mapping template you want ahead of time.
The procedure above is best for reindexing lots of documents. If you just want to move one or a few (which it sounds like you do), grab the document from its existing index by ID and submit it to your second index.
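A minimal sketch of the scroll-and-bulk loop described above, expressed as raw REST calls (the endpoint, index names, page size, and scroll timeout are all placeholders):

# 1) open a scroll over the source index with a match_all query
curl -XPOST 'https://<aws-es-endpoint>/source_index/_search?scroll=1m' -H 'Content-Type: application/json' -d '
{ "size": 500, "query": { "match_all": {} } }'

# 2) keep fetching pages using the _scroll_id returned by the previous call
curl -XPOST 'https://<aws-es-endpoint>/_search/scroll' -H 'Content-Type: application/json' -d '
{ "scroll": "1m", "scroll_id": "<_scroll_id from the previous response>" }'

# 3) bulk-index each page of hits into the destination index
#    (newline-delimited action/source pairs, ending with a newline)
curl -XPOST 'https://<aws-es-endpoint>/_bulk' -H 'Content-Type: application/json' -d '
{ "index": { "_index": "dest_index", "_id": "1" } }
{ "title": "document source copied from the scroll hit" }
'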
AWS Elasticsearch now supports remote reindex; check this documentation:
https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/remote-reindex.html
Example below:
POST <local-domain-endpoint>/_reindex
{
  "source": {
    "remote": {
      "host": "https://remote-domain-endpoint:443"
    },
    "index": "remote_index"
  },
  "dest": {
    "index": "local_index"
  }
}

Using ElasticSearch to store data without indexing or analysis (NEST client)

We are using ES via the NEST client for search, and we'd like to try to leverage it to store some reports that the system generates as well.
The reports are strings containing CSV data, and they can be quite large, 100 MB+, and we've run into some problems. First we were exceeding the 100 MB limit set in the HTTP config, so I increased that and the error stopped.
Now we're getting System.OutOfMemoryExceptions.
With the reports, we don't need to analyse them or have them tokenized and indexed. We just need to be able to get them back out by their ID to send along to the browser. I haven't had a lot of luck finding details on how to use ES as a dumb key-value store, though, or on whether that would help with the memory problem.
Additionally, it crossed my mind to zip-compress the data before sending it into ES, but again, I'm not sure if that would help or what would be involved.
I don't know how you have it currently configured, but you could try the string type with an index value of "not_analyzed" or "no". I'd also give the binary type a try. You should set store to "true" for either approach. That should prevent ES from attempting to analyze and index the field.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
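A sketch of such a mapping in the legacy (pre-5.x) string-type syntax the linked page describes; the reports index and csv_data field are hypothetical:

curl -XPUT 'http://localhost:9200/reports' -d '
{
  "mappings": {
    "report": {
      "properties": {
        "csv_data": {
          "type": "string",
          "index": "no",
          "store": true
        }
      }
    }
  }
}'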

What is the best way to index Couchbase data on Elasticsearch

I work with Couchbase DB, and I want to index part of its data in Elasticsearch (ES).
The data from Couchbase should be synced, i.e. if a document changes in CB, the corresponding document in ES should change as well.
I have several questions about the best way to do this:
What is the best way to sync the data? I saw that there is a CB plugin for ES (http://www.couchbase.com/couchbase-server/connectors/elasticsearch), but is that the recommended way?
I don't want to store the whole CB document in ES, but only part of it, e.g. some of the fields I want to store and some not. How can I do that?
My documents may have different attributes, and the difference may be big (e.g. 50 different attributes/fields). Assuming I want to index all these attributes in ES, will it affect performance that I have a lot of fields indexed?
Thanks,
Given the doc link, I am assuming you are using Couchbase and not CouchDB.
You are following the correct link for use of Elastic Search with Couchbase. Per the documentation, configure the Cross Data Center Replication (XDCR) capabilities of Couchbase to push data to ES automatically as mutations occur.
Without a defined mapping file, ES will create a default mapping. You can provide your own mapping file (or alter the one it generates) to control which fields get indexed. Refer to the enabled property in the ES documentation at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-object-type.html.
Yes, indexing all fields will affect performance. You can find some performance management tips for the Couchbase integration at http://docs.couchbase.com/couchbase-elastic-search/#managing-performance. The preferred approach to the integration is to perform the search in ES and only get keys back for the matched documents. You then make a multiget call against the Couchbase cluster to retrieve the document details themselves. So while ES will index many fields, you do not store all fields there, nor do you retrieve their values from ES. The in-memory multiget against Couchbase is the fastest way to retrieve the matching documents, using the IDs from ES.
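A sketch of that "IDs only" search (the index name and the query field are placeholders; note the doc wrapper that XDCR adds, described further below):

# ask ES for matching document IDs only, not the stored _source
curl -XGET 'http://localhost:9200/couchbase_index/_search' -H 'Content-Type: application/json' -d '
{
  "_source": false,
  "query": { "match": { "doc.title": "search terms here" } }
}'
# then feed the returned hit IDs into a Couchbase multiget to fetch the documents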
A lot of questions! Let me answer them one by one:
1) The best and already-available solution is to use the river plugin to dynamically sync the data. It also indexes only the changed documents, which helps a lot with performance.
2) Yes, you can restrict which fields are indexed in the river plugin.
The documentation for the plugin is available on the Couchbase website itself.
Refer to: http://docs.couchbase.com/couchbase-elastic-search/
The GitHub river is still in development, but you can use the code and modify it as you need:
https://github.com/mschoch/elasticsearch-river-couchbase
3) If you index all the fields, yes, there will be some lag in performance, so it is better to index only the needed fields. If you need to store some field just for storage, then mark it in the mapping as not_analyzed. That will decrease both indexing time and search time.
Hope it helps!
You might find this additional explanation regarding Don Stacy's answer to question 2 useful:
When replicating from Couchbase, there are 3 ways in which you can interfere with Elasticsearch's default mapping (before you start XDCR) and thus, as desired, not store certain fields by setting "store" = false:
1) Create manual mappings on your index
2) Create a dynamic template
3) Edit couchbase_template.json
Hints:
Note that when we do XDCR from Couchbase to Elasticsearch, Couchbase wraps the original document in a "doc" field. This means that you have to take this modified structure into account when you create your mapping. It would look something like this:
curl -XPUT 'http://localhost:9200/test/couchbaseDocument/_mapping' -d '
{
  "couchbaseDocument": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "doc": {
        "properties": {
          "your_field_name": {
            "store": true,
            ...
          },
          ...
        }
      }
    }
  }
}'
Mapping documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html
Including/excluding fields from _source: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html
Dynamic templates documentation: https://www.elastic.co/guide/en/elasticsearch/reference/2.0/dynamic-templates.html
https://forums.couchbase.com/t/about-elasticsearch-plugin/2433
https://forums.couchbase.com/t/custom-maps-for-jsontypes-with-elasticsearch-plugin/395
