Avoid mapping multiple fields in Elasticsearch - elasticsearch

I have the following problem when indexing documents in Elasticsearch: my documents contain some fields that are not repeated in other documents, so I end up with a mapping of more than 100,000 elements. Let's see an example:
If I send something like this to an empty index:
{"example":{
"a1":123,
"a2":444,
"a3":52566,
"a4":7,
.....
"aN":11
}
}
It will create the following mapping:
{"example" : {
"properties" : {
"a1" : {
"type" : "long"
},
"a2" : {
"type" : "long"
},
"a3" : {
"type" : "long"
},
"a4" : {
"type" : "long"
},
.....
"aN" : {
"type" : "long"
}
}
}
}
Then if I send another document:
{"example":{
"b1":123,
"b2":444,
"b3":52566,
"b4":7,
.....
"bN":11
}
}
It will create a mapping twice the size of the one above.
The real object is more complex than this, but the situation I'm in now is that the mapping has grown so big that it is killing the server.
How can I address this? Would multi-fields work in this scenario? I have tried several approaches, but none seems to work.
Thanks.

It is pretty tough to give you a definitive answer given we have no idea of your use case, but my initial guess is that if you have a mapping of thousands of fields that have no logical bond, you've probably made some wrong choices about the architecture of your data. Could you tell us why you need thousands of differently named fields for a single document type? As it is, there's not much we can do to point you in the right direction.
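That said, a common remedy for this kind of mapping explosion, assuming your field names are really dynamic data rather than a fixed schema, is to move the names into values, e.g. an array of key/value pairs (the attributes field name here is hypothetical):
{
    "example": {
        "attributes": [
            { "name": "a1", "value": 123 },
            { "name": "a2", "value": 444 }
        ]
    }
}
Mapped as a nested type, this keeps the mapping at two fields (name and value) no matter how many distinct attribute names you index, and individual pairs remain queryable with a nested query.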

If you really want to do so, create the mapping as in the example below:
POST /index_name/_mapping/type_name
{
    "type_name": {
        "enabled": false
    }
}
It will give the required behavior: Elasticsearch will stop creating mappings for new fields, and will also stop parsing and indexing your documents (their contents remain stored in _source, but are no longer searchable).
See these links to get more information:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-object-type.html#_enabled_3
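If you still need your known fields to stay searchable, a less drastic option (a sketch, using the same pre-7.x typed syntax as the example above) is to disable dynamic mapping instead of disabling the whole object:
POST /index_name/_mapping/type_name
{
    "type_name": {
        "dynamic": false
    }
}
With "dynamic": false, unknown fields are ignored by the mapping (though still stored in _source), while already-mapped fields remain indexed and searchable; "dynamic": "strict" would reject documents containing unknown fields instead.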

Related

How to update data type of a field in elasticsearch

I am publishing data to Elasticsearch using fluentd. It has a field Data.CPU which is currently set to string. The index name is health_gateway.
I have made some changes in the Python code which generates the data, so this field Data.CPU has now become an integer. But Elasticsearch is still showing it as a string. How can I update its data type?
I tried running the below command in Kibana Dev Tools:
PUT health_gateway/doc/_mapping
{
    "doc": {
        "properties": {
            "Data.CPU": { "type": "integer" }
        }
    }
}
But it gave me the below error:
{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
    },
    "status": 400
}
There is also this document which says the data type can be converted using mutate, but I am not able to understand it properly.
I do not want to delete and recreate the index, as I have created a visualization based on it which would be deleted along with it. Can anyone please help with this?
The short answer is that you can't change the mapping of a field that already exists in a given index, as explained in the official docs.
The specific error you got is because you included /doc/ in your request path (you probably wanted /<index>/_mapping), but fixing this alone won't be sufficient.
Finally, I'm not sure you really have a dot in the field name there. Last I heard it wasn't possible to use dots in field names.
Nevertheless, there are several ways forward in your situation... here are a couple of them:
Use a scripted field
You can add a scripted field to the Kibana index-pattern. It's quick to implement, but has major performance implications. You can read more about them on the Elastic blog here (especially under the heading "Match a number and return that match").
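For example, a minimal sketch of such a scripted field in Painless, assuming the string value is reachable through a Data.CPU.keyword doc value (dynamic string mappings usually create one):
// Kibana scripted field (Painless): expose the string CPU value as a number
doc['Data.CPU.keyword'].size() > 0
    ? Integer.parseInt(doc['Data.CPU.keyword'].value)
    : null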
Add a new multi-field
You could add a new multi-field. The example below assumes that CPU is a nested field under Data, rather than really being called Data.CPU with a literal .:
PUT health_gateway/_mapping
{
    "properties": {
        "Data": {
            "properties": {
                "CPU": {
                    "type": "keyword",
                    "fields": {
                        "int": { "type": "short" }
                    }
                }
            }
        }
    }
}
Reindex your data within ES
Use the Reindex API. Be sure to set the correct mapping on the target index.
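For example, a minimal sketch (health_gateway_v2 is a made-up name for the target index):
// create the target index with the corrected mapping, then reindex into it
PUT health_gateway_v2
{
    "mappings": {
        "properties": {
            "Data": {
                "properties": {
                    "CPU": { "type": "integer" }
                }
            }
        }
    }
}

POST _reindex
{
    "source": { "index": "health_gateway" },
    "dest": { "index": "health_gateway_v2" }
}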
Delete and reindex everything from source
If you are able to regenerate the data from source in a timely manner, without disrupting users, you can simply delete the index and reingest all your data with an updated mapping.
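A sketch, reusing the same corrected mapping as in the reindex example above:
// WARNING: this destroys the existing data before reingesting from source
DELETE health_gateway

PUT health_gateway
{
    "mappings": {
        "properties": {
            "Data": {
                "properties": {
                    "CPU": { "type": "integer" }
                }
            }
        }
    }
}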
You can update the mapping by indexing the same field in multiple ways, i.e. by using multi-fields.
Using the below mapping, Data.CPU.raw will be of integer type:
{
    "mappings": {
        "properties": {
            "Data": {
                "properties": {
                    "CPU": {
                        "type": "text",
                        "fields": {
                            "raw": { "type": "integer" }
                        }
                    }
                }
            }
        }
    }
}
Or you can create a new index with the correct mapping and reindex the data into it using the Reindex API.

elasticsearch doesn't suggest anything if the exact word is used as text?

I'm using the term suggester of Elasticsearch. My index contains a document with a field called name whose value is crick:
{
    "suggest": {
        "my-suggest": {
            "text": "crick",
            "term": {
                "field": "name",
                "sort": "score"
            }
        }
    }
}
It returns no match; it only returns a value if there is a misspelling. If I pass the exact text, it returns nothing. Any idea?
You are not using suggest_mode.
The suggest mode controls which suggestions are included, i.e. for which terms of the suggest text suggestions should be returned. Three possible values can be specified:
missing: Only provide suggestions for suggest text terms that are not in the index. This is the default.
popular: Only suggest suggestions that occur in more docs than the original suggest text term.
always: Suggest any matching suggestions based on terms in the suggest text.
Since you haven't specified suggest_mode, it is picking missing by default; crick is already in the index, so nothing is suggested.
Use these settings:
{
    "suggest": {
        "my-suggest": {
            "text": "crick",
            "term": {
                "field": "name",
                "sort": "score",
                "suggest_mode": "always"
            }
        }
    }
}

Mapping to limit length of Array datatype in Elasticsearch

I'm trying to create an Elasticsearch mapping which limits the length of an array datatype to x number of items.
mapping = """
{
    "mappings": {
        "document": {
            "properties": {
                "pages": {
                    "type": "text"
                }
            }
        }
    }
}
"""
In this case, how do I set the "pages" array to have a maximum of 1,000 list items? Also, is there a way to "ignore" insert errors triggered by ES when this limit has been reached?
Elasticsearch has no such limits, you'd have to enforce it in your application.
As for ignoring errors, look at the ignore_malformed option that many field types support.
Hope this helps!
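For example, a minimal client-side sketch in Python (the prepare_doc helper is made up; the field and limit come from the question):
MAX_PAGES = 1000

def prepare_doc(doc):
    # Truncate the pages array before indexing so Elasticsearch never
    # receives more than MAX_PAGES items; raise or skip instead if preferred.
    pages = doc.get("pages", [])
    if len(pages) > MAX_PAGES:
        doc["pages"] = pages[:MAX_PAGES]
    return doc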
Thanks Honza!
I had assumed so eventually... To expand on your answer, here's how I'm indexing documents now:
data = {
    "_op_type": "update",  # scripted updates need the "update" op type, not "index"
    "_index": "myIndex",
    "_type": "document",
    "_id": item["id"],  # (assumed) id of the document whose pages array is appended to
    "script": {
        "inline": "if (ctx._source.pages.length < 1000) { ctx._source.pages.add(params.page); }",
        "params": {
            "page": "{}".format(item["page"])
        }
    }
}
I'm using the script field, combined with the "painless" language, to check the array length before appending to the document.
Note, I'm using Python Elasticsearch library's bulk helper in the above example, which is why you see the "_op_type" field.
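For completeness, a minimal sketch of feeding such actions to the bulk helper (host and client setup assumed):
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])  # assumed local cluster

actions = [data]  # one or more action dicts shaped like `data` above
helpers.bulk(es, actions)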

How to run Elasticsearch completion suggester query on limited set of documents

I'm using a completion suggester in Elasticsearch on a single field. The type contains documents of several users. Is there a way to limit the returned suggestions to documents that match a specific query?
I'm currently using this query:
{
    "name": {
        "text": "Peter",
        "completion": {
            "field": "name_suggest"
        }
    }
}
Is there a way to combine this query with a different one, e.g.
{
    "query": {
        "term": {
            "user_id": "590c5bd2819c3e225c990b48"
        }
    }
}
Have a look at the context suggester, which is just a specialized completion suggester with filtering capabilities - however this is still not a regular query filter, just keep that in mind.
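For example, a minimal sketch (users_index and the mapping are assumptions, using the typeless syntax of recent Elasticsearch versions):
// users_index is a made-up index name
PUT users_index
{
    "mappings": {
        "properties": {
            "name_suggest": {
                "type": "completion",
                "contexts": [
                    { "name": "user_id", "type": "category" }
                ]
            }
        }
    }
}

POST users_index/_search
{
    "suggest": {
        "name": {
            "prefix": "Peter",
            "completion": {
                "field": "name_suggest",
                "contexts": {
                    "user_id": [ "590c5bd2819c3e225c990b48" ]
                }
            }
        }
    }
}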
You can specify both the query and the suggester in your query, like this:
{
    "query": {
        "term": {
            "user_id": "590c5bd2819c3e225c990b48"
        }
    },
    "suggest": {
        "name": {
            "text": "Peter",
            "completion": {
                "field": "name_suggest"
            }
        }
    }
}
I have a similar use case, and I've posted my question on the Elasticsearch forum, see here.
From what I've read so far, I don't think you can limit documents with the completion suggester. It essentially builds a finite state transducer (a prefix tree) at index time; this makes it fast, but you lose the flexibility of filtering on additional fields. I don't think the context suggester would work in your case (let me know if I am wrong), because the cardinality of user_id is very high.
I think edge n-gram partial matching is more flexible and might actually work in your use case; see the sketch below.
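A minimal sketch of that approach (index name, analyzer settings, and gram sizes are all assumptions):
// users_index is a made-up index name
PUT users_index
{
    "settings": {
        "analysis": {
            "tokenizer": {
                "autocomplete_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20,
                    "token_chars": [ "letter", "digit" ]
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "autocomplete_tokenizer",
                    "filter": [ "lowercase" ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "autocomplete",
                "search_analyzer": "standard"
            },
            "user_id": { "type": "keyword" }
        }
    }
}

POST users_index/_search
{
    "query": {
        "bool": {
            "must": { "match": { "name": "Pet" } },
            "filter": { "term": { "user_id": "590c5bd2819c3e225c990b48" } }
        }
    }
}
Because the name field is indexed as edge n-grams, the prefix "Pet" matches at query time, while the user_id filter behaves like any regular query clause.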
Let me know what you end up implementing.

Exact (not substring) matching in Elasticsearch

{"query":{
"match" : {
"content" : "2"
}
}} matches all the documents whole content contains the number 2, however I would like the content to be exactly 2, no more no less - think of my requirement in a spirit of Java's String.equals.
Similarly, for the second query I would like to match when the document's content is exactly '3 3', nothing more and nothing less:
{
    "query": {
        "match": {
            "content": "3 3"
        }
    }
}
How could I do exact (String.equals) matching in Elasticsearch?
Without seeing your index type mapping and sample data, it's hard to answer this directly - but I'll try.
Offhand, I'd say this is similar to this answer here (https://stackoverflow.com/a/12867852/382774), where you simply set the content field's index option to not_analyzed in your mapping:
"url" : {
"type" : "string",
"index" : "not_analyzed"
}
Edit: I wasn't clear enough with my original answer, shown above. I did not mean to imply that you should add the example code to your query, I meant that you need to specify in your index type mapping that the url field is of type string and it is indexed but not analyzed (not_analyzed).
This tells Elasticsearch to not bother analyzing (tokenizing or token filtering) the field when you're indexing your documents - just store it in the index as it exists in the document. For more information on mappings, see http://www.elasticsearch.org/guide/reference/mapping/ for an intro and http://www.elasticsearch.org/guide/reference/mapping/core-types/ for specifics on not_analyzed (tip: search for it on that page).
Update:
The official docs tell us that in newer versions of Elasticsearch you can't define a field as "not_analyzed"; instead you should use "keyword".
For old versions of Elasticsearch:
{
    "foo": {
        "type": "string",
        "index": "not_analyzed"
    }
}
For new versions:
{
    "foo": {
        "type": "keyword",
        "index": true
    }
}
Note that this functionality (the keyword type) is available from Elasticsearch 5.0 onwards, and the backward compatibility layer was removed in the Elasticsearch 6.0 release.
Official Doc
You should use filter instead of match.
{
    "query": {
        "constant_score": {
            "filter": {
                "term": {
                    "content": 2
                }
            }
        }
    }
}
And you get docs whose content is exactly 2, instead of 20 or 2.1 (provided the field is mapped as not_analyzed/keyword, as described above).
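Putting both answers together for current Elasticsearch versions, a minimal sketch (my_index is a made-up index name):
// keyword fields are not analyzed, so term compares the whole stored value
PUT my_index
{
    "mappings": {
        "properties": {
            "content": { "type": "keyword" }
        }
    }
}

GET my_index/_search
{
    "query": {
        "term": { "content": "3 3" }
    }
}
Because the whole value is compared, "3 3" matches only documents whose content is exactly "3 3".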
