Bulk import nested records from ElasticSearch into RethinkDB

I would like to import a lot of data from ElasticSearch into RethinkDB, but I'm not sure whether the import tool handles the case where the JSON objects I want are nested inside another object. In my case, ElasticSearch nests the "hits" inside query metadata:
{"took":7453,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":11853,"max_score":1.0,"hits":[...]}}
I want to quickly import everything inside ["hits"]["hits"]. Is this possible?
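One workaround is to pre-process the response before it ever reaches `rethinkdb import`: pull out the nested hits in a small script and either insert them with the Python driver or write them back out as a flat JSON array the import tool can read. A minimal sketch, assuming the response has been saved to es_response.json and a hypothetical test.docs table (driver import style varies by version):

```python
import json

import rethinkdb as r  # RethinkDB Python driver (older import style; newer releases use RethinkDB())

# Load the raw ElasticSearch response, e.g. dumped beforehand with curl.
with open("es_response.json") as f:
    es_response = json.load(f)

# The documents live under ["hits"]["hits"]; each hit's "_source" is the original body.
docs = [hit["_source"] for hit in es_response["hits"]["hits"]]

# Option 1: insert directly through the driver (db/table names are placeholders).
conn = r.connect(host="localhost", port=28015)
r.db("test").table("docs").insert(docs).run(conn)

# Option 2: write a flat JSON array that the `rethinkdb import` tool can read.
with open("hits_only.json", "w") as f:
    json.dump(docs, f)
```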

Related

Extracting a json field using Kibana

I have multiple fields stored as part of my log in Elasticsearch, and I am using Kibana to query them. One of the fields contains a JSON object, and I need to extract certain fields from that object. Is there a way to do this using Kibana?
At present, Kibana does not support nested JSON objects.
There has been some work on this: https://github.com/elastic/kibana/issues/1084
You can flatten the fields you want out of the nested object into parent-level key-value pairs, and then Kibana will be able to visualize them.
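A rough sketch of that flattening step, assuming the log events are plain Python dicts (the field names below are made up):

```python
def flatten(obj, parent_key="", sep="."):
    """Recursively hoist nested dict fields up to top-level dotted keys."""
    items = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items

event = {"user": {"name": "alice", "geo": {"city": "Oslo"}}, "status": 200}
print(flatten(event))
# {'user.name': 'alice', 'user.geo.city': 'Oslo', 'status': 200}
```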

How to create an index from search results, all on the server?

I will be getting documents from a filtered query (quite a lot of documents). I will then immediately create an index from them (in Python, using requests to directly query the REST API), without any modification.
Is it possible to make this operation directly on the server, without the round-trip of data to the script and back?
Another question was similar in intent, and the only answer there was to go via Logstash (equivalent to using my own code, though possibly more efficient).
Refer to http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/reindex.html
In short, what you need to do is (a Python sketch of these steps follows the links below):
0.) ensure you have _source set to true
1.) use the scan and scroll API, passing your filtered query with search_type set to scan
2.) fetch the documents using the scroll id
3.) bulk index the results using the _source field, which returns the JSON that was used to index the data
See also:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/bulk.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/reindex.html
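A minimal Python sketch of those steps using requests against the REST API; the index names and query are placeholders, and the search_type=scan parameter and raw scroll-id body assume an older 1.x-style cluster (newer versions use a plain scroll, or the _reindex API mentioned next):

```python
import json

import requests

SRC = "http://localhost:9200/old_index"   # source index (placeholder)
DST = "http://localhost:9200/new_index"   # target index (placeholder)
QUERY = {"query": {"term": {"status": "active"}}}  # your filtered query

# Steps 0/1: open a scan+scroll over the filtered query (_source must be enabled in the mapping).
resp = requests.post(f"{SRC}/_search",
                     params={"search_type": "scan", "scroll": "5m", "size": 500},
                     data=json.dumps(QUERY)).json()
scroll_id = resp["_scroll_id"]

while True:
    # Step 2: fetch the next batch of documents with the scroll id.
    page = requests.post("http://localhost:9200/_search/scroll",
                         params={"scroll": "5m"}, data=scroll_id).json()
    hits = page["hits"]["hits"]
    if not hits:
        break
    scroll_id = page["_scroll_id"]

    # Step 3: bulk index each hit's _source into the target index.
    lines = []
    for hit in hits:
        lines.append(json.dumps({"index": {"_type": hit["_type"], "_id": hit["_id"]}}))
        lines.append(json.dumps(hit["_source"]))
    requests.post(f"{DST}/_bulk", data="\n".join(lines) + "\n")
```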
ES 2.3 has an experimental feature that allows reindexing from a query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html

Get all documents from an index of an Elasticsearch cluster and index them in another Elasticsearch cluster

My goal here is to get all documents from an index of one ES cluster and insert them into another ES cluster, keeping the same metadata.
I had a look at the mget API to retrieve the data and the Bulk API to insert it, but the Bulk API needs a special structure:
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
So my idea is to retrieve the data from EScluster1 into a file, rearrange it to match the Bulk API structure, and index it into EScluster2.
Do you see a better and/or faster way to proceed?
elasticdump does this. If you want to do it manually, you'll want to query using scroll and then bulk index what comes out of that; not too hard to script together (a sketch follows below). With elasticdump you can pump the data around without writing to a file. However, it is somewhat limited when you have e.g. parent/child relations in your index.
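If you do script it yourself, the elasticsearch-py helpers take care of both the scrolling and the action_and_meta_data/source formatting for you. A rough sketch, assuming both clusters are reachable and the index name is a placeholder (_type only exists on older versions):

```python
from elasticsearch import Elasticsearch, helpers

src = Elasticsearch(["http://escluster1:9200"])
dst = Elasticsearch(["http://escluster2:9200"])

def actions():
    # helpers.scan drives the scroll API and yields raw hits one by one.
    for hit in helpers.scan(src, index="my_index", query={"query": {"match_all": {}}}):
        # Re-use the original metadata so _index/_type/_id survive the copy.
        yield {
            "_index": hit["_index"],
            "_type": hit["_type"],
            "_id": hit["_id"],
            "_source": hit["_source"],
        }

# helpers.bulk batches the actions into _bulk requests against the target cluster.
helpers.bulk(dst, actions())
```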

Do Elasticsearch queries touch the DB?

I'm just starting to use Elasticsearch with Haystack in Django, backed by Postgres, and I'm pretty happy with it so far.
I'm wondering whether search queries (filters) through ES submit a query to the DB, or whether they use data gathered during indexing.
Given that I can delete the data in the DB and still search, the answer seems to be that the queries do not touch the DB; they only touch the index.
Also, I found this documentation on the matter:
http://django-haystack.readthedocs.org/en/latest/best_practices.html#avoid-hitting-the-database
Further, this is also from the docs:
For example, one great way to leverage this is to pre-render an object's search result template DURING indexing. You define an additional field, render a template with it, and it follows the main indexed record into the index. Then, when that record is pulled because it matches a query, you can simply display the contents of that field, which avoids the database hit.
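In Haystack terms, that pre-rendering looks roughly like the sketch below: an extra CharField whose template is rendered at indexing time, so result pages can be built from the index alone (the model and template names are hypothetical):

```python
from haystack import indexes

from myapp.models import Note  # hypothetical model


class NoteIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    # Rendered from a template at indexing time (e.g. search/indexes/myapp/note_rendered.txt),
    # so displaying a search result never has to touch the database.
    rendered = indexes.CharField(use_template=True, indexed=False)

    def get_model(self):
        return Note
```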

Elasticsearch way to hide documents from search until the end of an import process

I am wondering if there is any way to "hide" documents from the search routine in the following case:
The import process runs daily.
The import process indexes documents into Elasticsearch via multiple calls to _bulk.
While the import process is running, I do not want the just-imported documents to be retrievable via _search.
Does Elasticsearch have some kind of transaction support, so that no indexed document becomes available until the transaction is committed?
I expect the number of documents indexed in one import run to be quite large, so I cannot do it in a single _bulk call.
I tried the index.refresh_interval index setting and calling _refresh at the end of the import process, but it did not help much; documents became searchable in the middle of the import.
Elasticsearch doesn't support transactions. You will need to handle this functionality on the client side. If you are only adding documents (no updates), you can associate a batch id with each record and filter out all records with a batch id >= the currently running batch. At the end of the batch, when you want to make the new records available, you can update the filter to include the just-finished batch id. You can associate this filter with an alias, which makes the switch transparent for searchers (a sketch follows below). This approach is not going to work in the case of updates, though.
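A rough sketch of that alias switch using requests; the index, alias, and batch_id field names are made up, and the remove action assumes the alias was already created by a previous run:

```python
import json

import requests

ES = "http://localhost:9200"
INDEX = "docs_v1"        # physical index the import writes into
ALIAS = "docs_search"    # searchers only ever query this alias
finished_batch = 42      # highest batch_id whose import has fully completed

# Re-point the alias with a filter that hides any newer, still-running batches.
actions = {
    "actions": [
        {"remove": {"index": INDEX, "alias": ALIAS}},
        {"add": {
            "index": INDEX,
            "alias": ALIAS,
            "filter": {"range": {"batch_id": {"lte": finished_batch}}},
        }},
    ]
}
requests.post(f"{ES}/_aliases",
              data=json.dumps(actions),
              headers={"Content-Type": "application/json"})
```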
