How to enforce a required field in Elasticsearch?

I am building a CMS and my team has decided to use Elasticsearch on the back end. I am new to it; I have mostly used Mongoose with MongoDB in previous projects. In MongoDB, if I assign a field wrongly or completely skip a required field, an error is thrown.
Is there a way to enforce required fields in elasticsearch?

There is no built-in functionality that lets you define required/mandatory fields in the mappings. Many will recommend doing such checks on the client side.
However, since Elasticsearch 5.x you can achieve this with an ingest node.
You can use ingest node to pre-process documents before the actual indexing takes place. This pre-processing happens by an ingest node that intercepts bulk and index requests, applies the transformations, and then passes the documents back to the index or bulk APIs. To pre-process documents before indexing, you define a pipeline that specifies a series of processors. Each processor transforms the document in some way.
Here is an example that shows this approach, using the simulate pipeline API:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "lang": "painless",
          "inline": "if (ctx.title == null) { throw new Exception('Document does not have the *title* field') }"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_type": "type",
      "_id": "1",
      "_source": {
        "title": "Elasticsearch 101"
      }
    },
    {
      "_index": "index",
      "_type": "type",
      "_id": "2",
      "_source": {
        "company": "Elastic"
      }
    }
  ]
}
For more information, please take a look at the ingest node documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.2/ingest.html
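To enforce this outside of the simulate endpoint, you would register the pipeline and then reference it when indexing. A minimal sketch, assuming a hypothetical pipeline name required-title:

PUT _ingest/pipeline/required-title
{
  "description": "Fail indexing when the title field is missing",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "inline": "if (ctx.title == null) { throw new Exception('Document does not have the *title* field') }"
      }
    }
  ]
}

PUT index/type/2?pipeline=required-title
{
  "company": "Elastic"
}

The second request would then fail with the script's exception instead of silently indexing the incomplete document.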

Related

ElasticSearch - Multiple query on one call (with sub limit)

I have a problem with ElasticSearch and I need your help :)
Today I have an index containing my documents. These documents represent either Products or Categories.
The structure is this:
{
  "_index": "documents-XXXX",
  "_type": "_doc",
  "_id": "cat-31",
  "_score": 1.0,
  "_source": {
    "title": "Category A",
    "type": "category",
    "uniqId": "cat-31",
    [...]
  }
},
{
  "_index": "documents-XXXX",
  "_type": "_doc",
  "_id": "prod-1",
  "_score": 1.0,
  "_source": {
    "title": "Product 1",
    "type": "product",
    "uniqId": "prod-1",
    [...]
  }
},
What I'd like to do, in one call, is get 5 documents whose type is "product" and 2 documents whose type is "category". Do you think that's possible?
That is, two queries in a single call, each with its own limit.
Also, wouldn't it be better to make two different indices, one for the products and the other for the categories?
If so, I have the same question: how do I run both queries in a single call?
Thanks in advance
If product and category are different contexts, I would try to separate them into different indices. Is this type field used in all your queries to filter results? For example, do you always search for a term like xpto only in docs with type product, or do you also search without applying any filter?
About your other question: you can run two queries in one request. The Multi search API can help with this.
You would get two responses, one for each query.
GET my-index-000001/_msearch
{ }
{"query": { "term": { "type": { "value": "product" } }}}
{"index": "my-index-000001"}
{"query": { "term": { "type": { "value": "category" } }}}

Misspelling suggestion ("did you mean") with phrase suggest and whitespace correction with Elasticsearch

I use the default "english" analyzer for searching documents and it is pretty good.
But I also need "did you mean" results when the search query is misspelled, or to search by such misspelled phrases.
What analyzers/filters/queries do I need to achieve such behaviour?
Source text
Elasticsearch is a distributed, open source search and analytics engine for all types of data,
including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built
on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic).
Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is
the central component of the Elastic Stack, a set of open source tools for data ingestion,
enrichment, storage, analysis, and visualization. Commonly referred to as the ELK Stack
(after Elasticsearch, Logstash, and Kibana), the Elastic Stack now includes a rich collection
of lightweight shipping agents known as Beats for sending data to Elasticsearch.
Search terms
search query => did you mean XXX?
missed letter or similar:
Elastisearch => Elasticsearch
distribated => distributed
Apacje => Apache
extra space
Elastic search => Elasticsearch
no space
opensource => open source
misspelled phrase
serach engne => search engine
Your first example (a missed letter or similar) can be achieved using the fuzzy query, and the second one (spacing issues) with a custom analyzer that uses the ngram or edge-ngram tokenizer; for examples of that, please refer to my blog on autocomplete.
Here is a fuzzy query example on your sample docs.
Index mapping
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      }
    }
  }
}
Index your sample docs and use the search queries below.
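For instance, the misspelled titles from the question can be indexed as two small docs (the index name didyou and the ids match the search results shown below):

PUT didyou/_doc/1
{ "title": "Elastisearch" }

PUT didyou/_doc/2
{ "title": "distribated" }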
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "distributed"
      }
    }
  }
}
And the search result:
"hits": [
{
"_index": "didyou",
"_type": "_doc",
"_id": "2",
"_score": 0.89166296,
"_source": {
"title": "distribated"
}
}
]
And for "Elasticsearch":
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "Elasticsearch"
      }
    }
  }
}
And the search result:
"hits": [
{
"_index": "didyou",
"_type": "_doc",
"_id": "1",
"_score": 0.8173577,
"_source": {
"title": "Elastisearch"
}
}
]
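For the "did you mean" part of the question, the phrase suggester is the usual tool. A minimal sketch against the same title field (the suggester name did-you-mean is just a label; in practice the phrase suggester works best on a shingled sub-field rather than the plain text field):

POST didyou/_search
{
  "suggest": {
    "did-you-mean": {
      "text": "serach engne",
      "phrase": {
        "field": "title",
        "direct_generator": [
          {
            "field": "title",
            "suggest_mode": "always"
          }
        ]
      }
    }
  }
}

The response contains corrections for the whole phrase, which can be shown to the user as a "did you mean" hint.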

What does @ mean in Elasticsearch documents?

My question is: "What does the @ mean in Elasticsearch documents?" @timestamp automatically gets created along with @version. Why is this, and what's the point?
Here is some context... I have a web app that writes logs to files. Then I have Logstash forward these logs to Elasticsearch. Finally, I use Kibana to visualize everything.
Here is an example of one of the documents in Elasticsearch:
{
  "_index": "logstash-2018.02.17",
  "_type": "doc",
  "_id": "0PknomEBajxXe2bTzwxm",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2018-02-17T05:06:13.362Z",
    "source": "source",
    "@version": "1",
    "message": "message",
    "env": "development",
    "host": "127.0.0.1"
  },
  "fields": {
    "@timestamp": [
      "2018-02-17T05:06:13.362Z"
    ]
  },
  "sort": [
    1518843973362
  ]
}
The @ fields are usually metadata fields generated by Logstash, @timestamp being the time at which the event was processed by Logstash. Similarly, @version is added by Logstash to denote the version number of the event format.
Here is the reference.
The @ fields are metadata created by Logstash. They are part of the data itself.
More info is here.
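These metadata fields can be queried like any other field. A sketch of a range query on @timestamp, reusing the index name from the sample document:

GET logstash-2018.02.17/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2018-02-17T00:00:00.000Z",
        "lt": "2018-02-18T00:00:00.000Z"
      }
    }
  }
}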

Elastic filter with dot (.) in name

I'm pretty new to ELK and seem to start with the complicated questions ;-)
I have elements that look like the following:
{
  "_index": "asd01",
  "_type": "doc",
  "_id": "...",
  "_score": 0,
  "_source": {
    "@version": "1",
    "my-key": "hello.world.to.everyone",
    "@timestamp": "2018-02-05T13:45:00.000Z",
    "msg": "myval1"
  }
},
{
  "_index": "asd01",
  "_type": "doc",
  "_id": "...",
  "_score": 0,
  "_source": {
    "@version": "1",
    "my-key": "helloworld.from.someone",
    "@timestamp": "2018-02-05T13:44:59.000Z",
    "msg": "myval2"
  }
}
I want to filter for my-key values that start with "hello." and ignore elements that start with "helloworld.". The dot seems to be interpreted as a wildcard, and no kind of escaping seems to work.
Ideally this would be a filter, as I want to be able to use the same expression in Kibana as well as in the API directly.
Can someone point me to how to get this working with Elasticsearch 6.1.1?
It's not being used as a wildcard; the dot is just being removed by the default analyzer (the standard analyzer). If you do not specify a mapping, Elasticsearch will create one for you. For string fields it will create a multi-field: the default is text (with the standard analyzer) plus a keyword sub-field of type keyword. If you do not want this behaviour, you must specify the mapping explicitly during index creation, or update it and reindex the data.
Try using this
GET asd01/_search
{
  "query": {
    "wildcard": {
      "my-key.keyword": {
        "value": "hello.*"
      }
    }
  }
}
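If you control index creation, a sketch of an explicit mapping that indexes my-key only as keyword, so the dots are preserved verbatim (ES 6.x syntax with the doc type; index name taken from the question):

PUT asd01
{
  "mappings": {
    "doc": {
      "properties": {
        "my-key": {
          "type": "keyword"
        }
      }
    }
  }
}

With that mapping, the wildcard query (or a prefix query on "hello.") matches against the whole untouched value, and the same expression works from Kibana.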

Discrepancies in ElasticSearch Results

I have a relatively simple search index built up for simple, plain text queries. No routing, custom analyzers or anything like that. One search instance/node, one index.
There are docs within the index that I have deleted, and the RESTful API confirms that:
GET /INDEX_NAME/person/464
{
  "_index": "INDEX_NAME",
  "_type": "person",
  "_id": "464",
  "exists": false
}
However, the doc is still being returned from a simple search:
POST /INDEX_NAME/person/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "default_field": "person.offices",
            "query": "Chicago"
          }
        }
      ]
    }
  }
}
One of the rows that is returned:
{
  "_index": "INDEX_NAME",
  "_type": "person",
  "_id": 464,
  "_score": null,
  "fields": [
    ...
  ]
}
I'm new to ElasticSearch and thought I finally had a grasp of the basic concepts before digging deeper. But I'm not sure why a document that isn't accessible via REST is still appearing in the search results.
I'm also running into the reverse issue where docs are returned from the API but they are not being returned in the search. For the sake of clarity I am considering that a separate issue for the time being, but I have a feeling that these two issues might be related.
Part of me wants to delete my index and rebuild it, but I don't want to get into the same situation in a few days (and I'm not sure if that would even help).
Any ideas or pointers on why this discrepancy might be happening? Maybe a process is in some zombie state and Elasticsearch just needs to be restarted?
