With graphene-django, the docstrings of e.g. GraphQL object type classes are usually used to autogenerate documentation that is visible in GraphiQL. This should also apply to plain graphene-python and graphene-based implementations such as graphene-mongo. However, the docstrings of the resolver functions for queries are not used to auto-document those queries. How can I document queries with docstrings?
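For illustration, a minimal sketch of the situation (type, field, and resolver names are made up): the ObjectType docstring and a field's description argument do show up in GraphiQL, while the resolver docstring does not.

import graphene

class Query(graphene.ObjectType):
    """This docstring appears as the Query type's description in GraphiQL."""

    # The description kwarg is what GraphiQL shows for the "hello" field.
    hello = graphene.String(description="Returns a friendly greeting.")

    def resolve_hello(root, info):
        """This resolver docstring is ignored by the schema documentation."""
        return "Hello"

schema = graphene.Schema(query=Query)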
Mostly what I do is assemble the mapping by hand, choosing the correct types myself.
Is there any tool which facilitates this?
For example, one which reads a class (C#, Java, etc.) and chooses the closest ES types accordingly.
I've never seen such a tool; however, I know that Elasticsearch has a REST API over HTTP.
So you can create a simple HTTP request with a JSON body that depicts your object with its fields: field names plus types (strings, numbers, booleans) - pretty much like the Java/C# class you've described in the question.
Then you can ask ES to store the data in a non-existing index (to "index" your document, in ES terms). It will index the document, but it will also create the index and, most importantly for your question, will create a mapping for you "dynamically", so that later you will be able to query the mapping structure (again via REST).
Here is the link to the relevant chapter about dynamically created mappings in the ES documentation
And here you can find the API for querying the mapping structure
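A minimal sketch of that flow with the official Python client (the index name, fields, and cluster address are made up, and the 8.x client signatures are assumed):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Index a document into a non-existing index: ES creates the index and
# derives a mapping from the field values dynamically.
es.index(index="people", document={
    "name": "Jane Doe",   # detected as text (with a keyword sub-field)
    "age": 42,            # detected as long
    "active": True,       # detected as boolean
})

# Ask ES which mapping it generated.
print(es.indices.get_mapping(index="people"))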
At the end of the day you'd still want to retain some control over how your mapping is generated. I'd recommend:
indexing some sample documents without a mapping,
investigating what mapping was auto-generated, and
dropping the index and using dynamic_templates to pseudo-auto-generate / update the mapping as new documents come in.
This GUI could help too.
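A sketch of the dynamic_templates step with the Python client (template name, index pattern, and rule are invented; this assumes the newer composable _index_template API and the 8.x client, older clusters use the legacy _template endpoint instead):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Hypothetical rule: in any new index matching "events-*", map string
# fields to keyword instead of analyzed text; other types are still
# auto-detected as usual.
es.indices.put_index_template(
    name="events-template",
    index_patterns=["events-*"],
    template={
        "mappings": {
            "dynamic_templates": [
                {
                    "strings_as_keywords": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "keyword"},
                    }
                }
            ]
        }
    },
)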
Currently, there is no such tool available to generate the mapping for Elasticsearch.
It is similar to how you have to design a database schema in MySQL.
If you want to avoid a predefined schema altogether, you might use MongoDB, which requires none.
But Elasticsearch comes with a very dynamic feature of its own, which lets you play around with it. One of the most important features of Elasticsearch is that it tries to get out of your way and lets you start exploring your data as quickly as possible, much like a Mongo schema that can be manipulated dynamically.
To index a document, you don't need to first define a mapping or schema and declare your fields along with their data types.
You can just index a document, and the index, type, and fields will be created automatically.
For further details you can go through the documentation below:
Elastic Dynamic Mapping
I am working on a search dashboard with full-text search capabilities, backed by ES. The search would initially be consumed by a UI dashboard. I am planning to have an application web service (WS) API layer between the UI dashboard and ES, which will route the business searches to ES.
There can be multiple clients of the WS going forward, each with its own business use cases and complex data requirements (basically, response fields). There are many entities and a huge number of fields across them. Each client would need to specify which entities it wants returned and with which fields.
To support this dynamically changing requirement, one approach could be to have the WS be a pass-through to ES (with pre-validations like access control and post-transformations of the response from ES). The WS APIs would look exactly like the ES APIs; the UI would build ES queries through a JS client and send them to the WS, which, after access control, would fetch the data from ES.
I am new to ES and skeptical of this approach. Are there any particular challenges with this approach? A colleague of mine has worked with ES before, but always with a backend Java client, so he's not too sure either.
I looked up an ES JS client and there's an official one here.
Some context here:
We have around 4 different entities (which can increase in the future) with both full-text and keyword-type fields. A typical search could have multiple filters and search terms and would want to specify the result fields. Also, some searches would be across entities and some against individual ones. We are maintaining a separate index for each entity.
What I understand from your post is that, at a high level, this is what you want to achieve:
There can be multiple clients of the WS going forward, each with its own business use cases and complex data requirements (basically, response fields).
And as you are not sure how to do this, you are thinking of building the Elasticsearch queries from JavaScript in your front end only. I am not a big fan of this approach, as it exposes how you are building your queries, and if a hacker learns crucial information like the below, they can bring your entire ES cluster to its knees:
What types of wildcard queries you run.
Your index names and ES cluster details (you may have access control, but you are still exposing crucial info).
How you are building your search queries.
The above are just a few examples.
Right approach
As you already have a backend where you would be checking access, build the Elasticsearch queries only there; you even have the advantage of a teammate who already knows ES.
For building complex response fields, you can use source filtering, with which you can specify in your search request exactly which fields you want returned in your search results.
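A sketch of what such a backend call could look like with the official Python client (index, fields, and query are made up; the 8.x client's keyword arguments are assumed, with a 7.x client you would pass a body dict instead):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed cluster address

# The backend builds the query itself (after access checks) and uses
# source filtering so only the fields this client asked for are returned.
resp = es.search(
    index="products",
    query={"match": {"title": "laptop"}},
    source=["title", "price", "brand"],  # maps to the _source parameter
)

for hit in resp["hits"]["hits"]:
    print(hit["_source"])  # contains only title, price, and brand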
I'm using kafka-connect-elasticsearch with a custom converter, which extends the standard JsonConverter.
I have 250+ topics with different event types, so I'm happy that Kafka Connect automatically creates indices for me in Elasticsearch.
However, I'd like to disable all analysers except for the keyword analyser (I don't need full-text search here).
How can I do that? How and where can I manipulate index mappings?
In my custom converter I infer schema for my payload, convert it to kafka-connect-specific schema, and then return new SchemaAndValue(connectSchema, connectValue) object.
I suppose the Connect-specific schema is then used to generate the mappings - is that true?
I started reading the Elasticsearch documentation and came across the _type metadata element:
Elasticsearch exposes a feature called types which allows you to logically partition data inside of an index. Documents in different types may have different fields, but it is best if they are highly similar.
So my question is: in which situations is it best practice to split documents into types? Because in the documentation, they write that documents in different _types should have similar fields.
Let's say you create a new index "WWW" and its types are "http" and "https". Both types have the same mapping and fields. It would be easier to search all the "http" documents like this:
GET /WWW/http/_search?pretty
and the https like this:
GET /WWW/https/_search?pretty
It also gives you a logical separation between your data.
There's a good blog post about type vs index: https://www.elastic.co/blog/index-vs-type
Having the same mappings and fields is a good starting point (since sparsity is an issue). Just be aware that types will be removed in the future, so don't structure your logic around them too heavily. But you will be able to do the same with an enum-like field and a filter in your query.
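For example, with a single index and an enum-like "scheme" field instead of the http/https types, the per-type search above becomes a filtered query (index and field names are just illustrative; the 8.x Python client is assumed):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed cluster address

# Instead of GET /WWW/https/_search, query one "www" index and filter
# on an enum-like "scheme" field holding "http" or "https".
resp = es.search(
    index="www",
    query={"bool": {"filter": [{"term": {"scheme": "https"}}]}},
)
print(resp["hits"]["total"])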
We were comparing these search solutions (Solr and Elasticsearch) and started to wonder why one needs a schema and the other does not. What are the tradeoffs? Is it because one is like SQL and the other is like NoSQL in terms of schema configuration?
ES does have a schema, defined via templates and mappings. You don't have to use it, but in practice you will. A schema is actually a good thing, and if you notice a database claiming to be purely schemaless, there will be performance implications.
Schema is a tradeoff between ease of development and adoption versus performance. It is easy to read/write into a schemaless database, but it will be less performant, particularly for any non-trivial query.
Elasticsearch definitely has a schema. If you think it does not, try indexing a date into a field and then an int into the same field. Or even into fields with the same name in different types (I think ES 2.0 disallows that now).
What Elasticsearch does is simplify the auto-creation of a schema. That has tradeoffs, such as possible incorrect type detection, fields that are single-valued or multi-valued in the result output depending on the number of elements they contain (they are always multi-valued under the covers), and so on. Elasticsearch has some ways to work around that, mostly by defining some of the schema elements and an explicit schema mapping, as Oleksii wrote.
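A small illustration of that point, in the same spirit as the date/int example above (index and field names are made up; here the first value locks the field to a numeric type, so a later incompatible value is rejected):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# The first document makes ES auto-map "count" as a numeric (long) field.
es.index(index="schema-demo", document={"count": 1234})

# A later document with an incompatible value for the same field is
# rejected with a mapper_parsing_exception (the client raises an error
# for the 400 response) - the auto-created schema is enforced just like
# an explicit one.
es.index(index="schema-demo", document={"count": "not a number"})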
Solr also has a schemaless mode that closely matches Elasticsearch's, down to storing all JSON as a single field. And when you enable it, you get both the same kind of benefits and the same kind of disadvantages Elasticsearch has. Except that, in Solr, you can change things like the order of auto-type strategies and the mapping to field types; in Elasticsearch (1.x at least) it was hard-coded. You can see a - slightly dated - comparison in my presentation from 2014.
As Slomo said, they both use Lucene underneath for storage and most of the search. So, the core engine approach cannot change.