ElasticSearch create an index with dynamic properties - elasticsearch

Is it possible to create an index, restricting indexing a parent property?
For example,
$ curl -XPOST 'http://localhost:9200/actions/action/' -d '{
"user": "kimchy",
"message": "trying out Elasticsearch",
"actionHistory": [
{ "timestamp": 123456789, "action": "foo" },
{ "timestamp": 123456790, "action": "bar" },
{ "timestamp": 123456791, "action": "buz" },
...
]
}'
I don't want actionHistory to be indexed at all. How can this be done?
For the above document, I believe the index would be created as
$ curl -XPOST localhost:9200/actions -d '{
"settings": {
"number_of_shards": 1
},
"mappings": {
"action": {
"properties" : {
"user": { "type": "string", "index" : "analyzed" },
"message": { "type": "string": "index": "analyzed" },
"actionHistory": {
"properties": {
"timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"action": { "type": "string", "index": "analyzed" }
}
}
}
}
}
}'
Would removing properties from actionHistory and replace it with "index": "no" be the proper solution?
This is an example, however my actual situation are documents with dynamic properties (i.e. actionHistory contains various custom, non-repeating properties across all documents) and my mapping definition for this particular type has over 2000 different properties, making searches extremely slow (i.e. worst than full text search from the database).

You can probably get away by using dynamic templates, match on all actionHistory sub-fields and set "index": "no" for all of them.
PUT actions
{
"mappings": {
"action": {
"dynamic_templates": [
{
"actionHistoryRule": {
"path_match": "actionHistory.*",
"mapping": {
"type": "{dynamic_type}",
"index": "no"
}
}
}
]
}
}
}

Related

How to query documents where a rank_features field is missing?

I have an index with a few hundred thousand documents. Some of them have a rank_features field called my_field. I want to retrieve documents without that field.
I tried:
"query": {
"bool": {
"must_not": [
{"exists": {"field":"my_field"}}]
...
But I get the following error:
"error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "failed to create query: [rank_features] fields do not support [exists] queries",
...
The index mapping is defined as follows:
"mappings": {
"dynamic": "strict",
"_routing": {
"required": true
},
"properties": {
"my_field": {
"properties": {
"my_subfield": {
"type": "rank_features"
}
}
...
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"mapping": {
"total_fields": {
"limit": "2000"
}
},
"refresh_interval": "1s",
"number_of_shards": "10",
"blocks": {
"write": "false"
},
Note that despite the mapping being strict, this field was added recently and older documents don't have it.
Tldr;
You are doing a exist query against a field that only support rank_feature queries
As per the documentation of the rank_features field.
rank_features fields do not support sorting or aggregating and may only be queried using rank_feature queries.

Elasticsearch Field Preference for result sequence

I have created the index in elasticsearch with the following mapping:
{
"test": {
"mappings": {
"documents": {
"properties": {
"fields": {
"type": "nested",
"properties": {
"uid": {
"type": "keyword"
},
"value": {
"type": "text",
"copy_to": [
"fulltext"
]
}
}
},
"fulltext": {
"type": "text"
},
"tags": {
"type": "text"
},
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
While searching I want to set the preference of fields for example if search text found in title or url then that document comes first then other documents.
Can we set a field preference for search result sequence(in my case preference like title,url,tags,fields)?
Please help me into this?
This is called "boosting" . Prior to elasticsearch 5.0.0 - boosting could be applied in indexing phase or query phase( added as part of field mapping ). This feature is deprecated now and all mappings after 5.0 are applied in query time .
Current recommendation is to to use query time boosting.
Please read this documents to get details on how to use boosting:
1 - https://www.elastic.co/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html
2 - https://www.elastic.co/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html

Lucene search using Kibana does return my results

Using Kibana, I have created the following index:
put newsindex
{
"settings" : {
"number_of_shards":3,
"number_of_replicas":2
},
"mappings" : {
"news": {
"properties": {
"NewsID": {
"type": "integer"
},
"NewsType": {
"type": "text"
},
"BodyText": {
"type": "text"
},
"Caption": {
"type": "text"
},
"HeadLine": {
"type": "text"
},
"Approved": {
"type": "text"
},
"Author": {
"type": "text"
},
"Contact": {
"type": "text"
},
"DateCreated": {
"type": "date",
"format": "date_time"
},
"DateSubmitted": {
"type": "date",
"format": "date_time"
},
"LastModifiedDate": {
"type": "date",
"format": "date_time"
}
}
}
}
}
I have populated the index with Logstash. If I just perform a match_all query, all my records are returned as you'd expect. However, when I try to perform a targeted query such as:
get newsindex/_search
{
"query":{"match": {"headline": "construct abnomolies"}
}
}
I can see headline as a property of _source, but my query is ignored i.e. I still receive everything, regardless of whats in the headline. How do I need to change my index to make headline searchable. I'm using Elasticsearch 5.6.3
I needed to change the name property on my index to be lowercase. I noticed in the output windows the the properties under _source where lowercase. In Kibana the predictive text was offering my notation and lowercase. I've dropped my index and re-populated and it now works.

How do I alter the schema without destroying data in elasticsearch?

This is my current schema
{
"mappings": {
"historical_data": {
"properties": {
"continent": {
"type": "string",
"index": "not_analyzed"
},
"country": {
"type": "string",
"index": "not_analyzed"
},
"description": {
"type": "string"
},
"funding": {
"type": "long"
},
"year": {
"type": "integer"
},
"agency": {
"type": "string"
},
"misc": {
"type": "string"
},
"university": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
I have 700k records uploaded. Without destroying the data, how can I make the university index not "not_analysed" such that the change reflects in my existing data?
The mapping for an existing field cannot be modified.
However you can achieve the desired outcome in two ways .
Create another field. Adding fields is free using put _mapping API
curl -XPUT localhost:9200/YOUR_INDEX/_mapping -d '{
"properties": {
"new_university": {
"type": "string"
}
}
}'
Use multi-fields, add a sub-field to your not_analyzed field.
curl -XPUT localhost:9200/YOUR_INDEX/_mapping -d '{
"properties": {
"university": {
"type": "string",
"index": "not_analyzed",
"fields": {
"university_analyzed": {
"type": "string" // <-- ANALYZED sub field
}
}
}
}
}'
In both the case, you need to reindex in order to populate the new field. Use _reindex API
curl -XPUT localhost:9200/_reindex -d '{
"source": {
"index": "YOUR_INDEX"
},
"dest": {
"index": "YOUR_INDEX"
},
"script": {
"inline": "ctx._source.university = ctx._source.university"
}
}'
You are not exactly forced to "destroy" your data, what you can do is reindex your data as described in this article (I'm not gonna rip off the examples as they are particularly clear in the section Reindexing your data with zero downtime).
For reindexing, you can also take a look at the reindexing API, the simplest way being:
POST _reindex
{
"source": {
"index": "twitter"
},
"dest": {
"index": "new_twitter"
}
}
Of course it will take some resources to perform this operation, so I would suggest that you take a complete look at the changes you want to introduce in your mapping, and perform the operation when you have the least amount of activity on your servers (e.g. during the weekend, or at night...)

Update and search in multi field properties in ElasticSearch

I'm trying to use multi field properties for multi language support. I created following mapping for this:
{
"mappings": {
"product": {
"properties": {
"prod-id": {
"type": "string"
},
"prod-name": {
"type": "string",
"fields": {
"en": {
"type": "string",
"analyzer": "english"
},
"fr": {
"type": "string",
"analyzer": "french"
}
}
}
}
}
}
}
I created test record:
{
"prod-id": "1234567",
"prod-name": [
"Test product",
"Produit d'essai"
]
}
and tried to query using some language:
{
"query": {
"bool": {
"must": [
{"match": {
"prod-name.en": "Produit"
}}
]
}
}
}
As a result I got my document. But I expected that I will have empty result when I use French but choose English. It seems ElasticSearch ignores which field I specified in query. There is no difference in search result when I use "prod-name.en" or "prod-name.fr" or just "prod-name". Is this behaviour expected? Should I do some special things to have searching just in one language?
Another problem with updating multi field property. I can't update just one field.
{
"doc" : {
"prod-name.en": "Test"
}
}
I got following error:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Field name [prod-name.en] cannot contain '.'"
}
],
"type": "mapper_parsing_exception",
"reason": "Field name [prod-name.en] cannot contain '.'"
},
"status": 400
}
Is there any way to update just one field in multi field property?
In your mapping, the prod-name.en field will simply be analyzed using the english analyzer and the same for the french field. However, ES will not choose for you which value to put in which field.
Instead, you need to modify your mapping like this
{
"mappings": {
"product": {
"properties": {
"prod-id": {
"type": "string"
},
"prod-name": {
"type": "object",
"properties": {
"en": {
"type": "string",
"analyzer": "english"
},
"fr": {
"type": "string",
"analyzer": "french"
}
}
}
}
}
}
}
and input document to be like this and you'll get the results you expect.
{
"prod-id": "1234567",
"prod-name": {
"en": "Test product",
"fr": "Produit d'essai"
}
}
As for the updating part, your partial document should be like this instead.
{
"doc" : {
"prod-name": {
"en": "Test"
}
}
}

Resources