Getting RethinkDB index metadata - rethinkdb

I'd like to obtain metadata about an index on a RethinkDB table, such as:
what expression is used for arbitrary indexes
what fields are used for compound indexes
whether the index is multi or not
How can I get this information through the admin interface?
Through the Python API?
Thanks!

Unfortunately, at this point you can't get any of this information. It's planned as an addition to ReQL but hasn't been added yet.
If this is important to you, I'd recommend opening an issue here: https://github.com/rethinkdb/rethinkdb. That's the best way to influence development direction.

Related

Keeping the .enrich index in sync with the source index in Elasticsearch

I'm using the new enrich API of Elasticsearch (version 7.11).
To my understanding, I need to execute the policy ("PUT /_enrich/policy/my-policy/_execute") each time the source index changes, which leads to the creation of a new .enrich index.
Is there an option to make this happen automatically and avoid creating an index on every change to the source index?
This is not (yet) supported and there have been other reports of similar needs.
It seems to be complex to provide the ability to regularly update an enrich index based on a changing source index and the issue above explains why.
That feature might be available some day, something seems to be in the works. I agree it would be super useful.
You can add a default pipeline to your index. That pipeline will process the documents.
See here.
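Since the policy has to be re-executed by hand for now, one workaround is to trigger the execute request from a scheduled job. Here is a minimal sketch in Python using only the standard library. The base URL http://localhost:9200 and the helper names are assumptions for illustration; the policy name my-policy is the one from the question. It only automates the call and does not change how Elasticsearch rebuilds the enrich index.

```python
import urllib.parse
import urllib.request


def build_execute_request(base_url: str, policy_name: str) -> urllib.request.Request:
    """Build the PUT request that re-executes an enrich policy."""
    quoted = urllib.parse.quote(policy_name, safe="")
    path = "/_enrich/policy/{}/_execute".format(quoted)
    return urllib.request.Request(base_url.rstrip("/") + path, method="PUT")


def execute_policy(base_url: str, policy_name: str) -> int:
    """Send the request and return the HTTP status code.

    Needs a reachable cluster; authentication is not handled in this sketch.
    """
    request = build_execute_request(base_url, policy_name)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling execute_policy("http://localhost:9200", "my-policy") from cron (or any scheduler) after the source index has been updated would then refresh the enrich data at whatever interval you can tolerate.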

Is there any tool out there for generating Elasticsearch mappings?

Mostly I assemble the mapping by hand, choosing the correct types myself.
Is there any tool which facilitates this?
For example, one that reads a class (C#, Java, etc.) and chooses the closest ES types accordingly.
I've never seen such a tool. However, I know that Elasticsearch has a REST API over HTTP.
So you can create a simple HTTP request with a JSON body that depicts your object with its fields: field names plus values of the right types (strings, numbers, booleans), pretty much like the Java/C# class that you've described in the question.
Then you can ask ES to store the data in a non-existing index (to "index" your document, in ES terms). It will index the document, but it will also create the index and, most importantly for your question, create a mapping for you "dynamically", so that later you will be able to query the mapping structure (again via REST).
Here is the link to the relevant chapter about dynamically created mappings in the ES documentation.
And here you can find the API for querying the mapping structure.
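To get a feel for what the dynamically created mapping will look like before sending anything to a cluster, the default rules can be approximated offline. The sketch below is only an approximation of the documented defaults as I understand them (booleans become boolean, integers long, floating point numbers float, strings text with a keyword sub-field, arrays take the type of their first non-null element). It deliberately skips date and numeric detection inside strings, and the function names are made up for this example.

```python
from typing import Any, Dict


def infer_field(value: Any) -> Dict[str, Any]:
    """Guess the mapping for a single non-null, non-list value."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        return {"properties": infer_properties(value)}
    raise TypeError("unsupported value type: {}".format(type(value).__name__))


def infer_properties(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Build the 'properties' section of a mapping from a sample document."""
    properties = {}
    for name, value in doc.items():
        if isinstance(value, list):
            # Arrays are mapped by their first non-null element.
            value = next((item for item in value if item is not None), None)
        if value is None:
            # Null values (and empty arrays) do not add a field.
            continue
        properties[name] = infer_field(value)
    return properties
```

The real mapping that Elasticsearch generated remains the source of truth, so it is still worth comparing this output against what the mapping API returns for the same document.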
At the end of the day you'd still want to retain some control over how your mapping is generated. I'd recommend:
syncing some sample documents w/o a mapping
investigating what mapping was auto generated and
dropping the index & using dynamic_templates to pseudo-auto-generate / update the mapping as new documents come in.
This GUI could help too.
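As an illustration of the dynamic_templates step, here is a small sketch of an index-creation body that maps every newly seen string field to keyword instead of the default text type. The template name strings_as_keywords and the helper name are assumptions for this example, and the body uses the typeless mapping layout of Elasticsearch 7; which types you actually want depends on what you found in the auto-generated mapping.

```python
import json
from typing import Any, Dict


def build_index_body() -> Dict[str, Any]:
    """Return a create-index body with one dynamic template."""
    return {
        "mappings": {
            "dynamic_templates": [
                {
                    # Hypothetical template name; any unique name works.
                    "strings_as_keywords": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword"},
                    }
                }
            ]
        }
    }


# The JSON below is what would be sent as the body when creating the index.
body_json = json.dumps(build_index_body(), indent=2)
```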
Currently, there is no such tool available to generate the mapping for Elasticsearch.
It is similar to having to design a database schema in MySQL.
If you want to avoid that, you might use MongoDB, which requires no predefined schema.
But Elasticsearch has a dynamic mapping feature that lets you experiment. One of its most important features is that it tries to get out of your way and let you start exploring your data as quickly as possible, much like a MongoDB schema that can be changed dynamically.
To index a document, you don't need to first define a mapping or schema with your fields and their data types.
You can just index a document and the index, type, and fields will be created automatically.
For further details you can go through the below documentation:
Elastic Dynamic Mapping

Multi-tenancy in Elasticsearch

We are planning to introduce Elasticsearch (on AWS) for our multi-tenant application. We have the following options:
Using One Index Per Tenant
Using One Type Per Tenant
All Tenants Share One Index with Custom routing
According to this blog post, https://www.elastic.co/blog/found-multi-tenancy, the first option would cause memory issues, but it is not clear about the other options.
It seems that with the third option there is no data segregation, and I'm not sure about security.
I believe the second option would be better, as the data would be segregated.
Please help me identify the best option for using Elasticsearch with multi-tenancy.
Please note that we would be running on AWS infrastructure.
We are considering the same question right now, and the following set of articles by Elasticsearch was very helpful.
Start here: https://www.elastic.co/guide/en/elasticsearch/guide/current/scale.html
And read through each subsequent article until you hit this one: https://www.elastic.co/guide/en/elasticsearch/guide/current/finite-scale.html
The following two were very eye-opening for me:
https://www.elastic.co/guide/en/elasticsearch/guide/current/faking-it.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/one-big-user.html
The basic takeaway:
Alias per customer
Shard routing
Now you can have indexes for big customers, shared indexes for little customers, and they all appear to be separate indices
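The alias-per-customer idea can be sketched as the body of an aliases request: each tenant gets an alias on the shared index that carries both a filter and a routing value, so searches through the alias only see that tenant's documents and only hit that tenant's shard. The index name shared_small_tenants, the field name tenant_id and the tenant_<id> alias naming scheme are assumptions for illustration.

```python
from typing import Any, Dict, Iterable


def tenant_alias_action(index: str, tenant_id: str,
                        tenant_field: str = "tenant_id") -> Dict[str, Any]:
    """Build one 'add' action for a filtered, routed per-tenant alias."""
    return {
        "add": {
            "index": index,
            "alias": "tenant_{}".format(tenant_id),
            # Only documents belonging to this tenant are visible via the alias.
            "filter": {"term": {tenant_field: tenant_id}},
            # Index and search requests through the alias use this routing value.
            "routing": tenant_id,
        }
    }


def aliases_body(index: str, tenant_ids: Iterable[str]) -> Dict[str, Any]:
    """Build the request body for the aliases API for several tenants."""
    return {"actions": [tenant_alias_action(index, t) for t in tenant_ids]}
```

A big customer that outgrows the shared index can later be moved to a dedicated index, with the same alias name pointed at the new index, so the application code does not need to change.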
This link is too important not to be mentioned here:
http://www.bigeng.io/elasticsearch-scaling-multitenant/
Good architecture dilemmas, and great performance analysis / reasoning.
tldr; they had index groups that are built around shard allocation filtering to segregate load across nodes in the cluster
To sum up the accepted answer and the other articles:
Use a shared index with custom routing, accessed through an alias.
1.1) Special case: a big client can have a dedicated index, but only if needed.
The following article covers many use cases in detail:
https://www.elastic.co/blog/found-multi-tenancy
The following is the conclusion on how you can do it (link source: accepted answer):
https://www.elastic.co/guide/en/elasticsearch/guide/current/faking-it.html

Internal data storage mechanism of Elasticsearch

I have been working with Elasticsearch for the past 2 months. I have used both the REST approach and the client APIs in different languages to index, get and search data. I have also read a lot about Elasticsearch and found that it is not considered a good option as a data store. Why is this? I'm also curious about how Elasticsearch internally stores the indexed data. Is there a good link or explanation?
Elasticsearch is built on top of Apache Lucene. Here's a reference doc on the Lucene index file structure:
http://lucene.apache.org/core/4_7_2/core/org/apache/lucene/codecs/lucene46/package-summary.html#package_description
Regarding whether or not it's a good option as a data store, I think that's more a matter of individual opinion and specific use cases than a fact that can be proved. It does not have the transaction support that something like MySQL does, if that's what you are looking for. In that respect it's roughly on a par with other NoSQL solutions. This is a pretty decent writeup on the trade-offs and issues: https://www.found.no/foundation/elasticsearch-as-nosql/
In the end it depends on what you are doing with your data and what level of robustness you require.

What does the document cap mean in Websolr?

I'm using the Websolr add-on in Heroku. What does "250,000 documents" mean? How many DB records, or what size, is that?
Nick from Websolr here.
In this case, 'documents' would be all the distinct 'things' that you want to search.
A Solr index is composed of many documents. Each document has many fields. Typically each document is analogous to a row in a table, or an instance of a model for your particular ORM.
Typically, a Solr client for your preferred language will help you integrate that concept into your own application and the tools you have used to create it.
In Solr a document is an indexed 'item'.

Resources