Separation of hits returned from elastic by nested field value - elasticsearch

I've index with products there. I'm trying to separate hits returned from elastic by nested field value. There's my shortened index:
{
"mapping": {
"product": {
"properties": {
"id": {
"type": "integer"
},
"model_name": {
"type": "text",
},
"variants": {
"type": "nested",
"properties": {
"attributes": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "text"
},
"product_attribute_id": {
"type": "integer"
},
"value": {
"type": "text"
}
}
},
"id": {
"type": "integer"
},
"product_id": {
"type": "integer"
}
}
}
}
}
}
}
And product example (there's is more variants and attributes in product - I just cut them off):
{
"_index":"product_index",
"_type":"product",
"id":192,
"model_name":"Some tshirt",
"variants":[
{
"id":1271,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"XL",
"product_attribute_id":36740
}
]
},
{
"id":1272,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"L",
"product_attribute_id":36741
}
]
}
]
}
The field in question is attribute id. Let's say I want to separate products by size attribute - id 29. It would be perfect if the response would look like:
"hits" : [
{
"_index":"product_index",
"_type":"product",
"id":192,
"model_name":"Some tshirt",
"variants":[
{
"id":1271,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"XL",
"product_attribute_id":36740
}
]
}
]
},
{
"_index":"product_index",
"_type":"product",
"id":192,
"model_name":"Some tshirt",
"variants":[
{
"id":1272,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"L",
"product_attribute_id":36741
}
]
}
]
}]
I thought about separate all variants in elastic request and then group them on application side by those attribute but i think it's not most elegant and above all, efficient way.
What are the elastic keywords that I should be interested in?
Thank you in advance for your help.

Related

Simple elasticsearch input - Rejecting mapping update final mapping would have more than 1 type: [_doc, doc]

I'm trying to send data to elasticsearch but running into an issue where my number field only comes up as a string. These are the steps I took.
Step 1. Add index & map
PUT http://123.com:5101/core_060619/
{
"mappings": {
"properties": {
"date": {
"type": "date",
"format": "HH:mm yyyy-MM-dd"
},
"data": {
"type": "integer"
}
}
}
}
Result:
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "core_060619"
}
Step 2. Add data
PUT http://123.com:5101/core_060619/doc/1
{
"test" : [ {
"data" : "119050300",
"date" : "00:00 2019-06-03"
} ]
}
Result:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Rejecting mapping update to [zyxnewcoreyxbl_060619] as the final mapping would have more than 1 type: [_doc, doc]"
}
],
"type": "illegal_argument_exception",
"reason": "Rejecting mapping update to [zyxnewcoreyxbl_060619] as the final mapping would have more than 1 type: [_doc, doc]"
},
"status": 400
}
You can not have more than one type of document in Elasticsearch 6.0.0+. If you set your document type to doc, then you can add another document by simply PUT http://123.com:5101/core_060619/doc/1, PUT http://123.com:5101/core_060619/doc/2 etc.
Elasticsearch 6.+
PUT core_060619/
{
"mappings": {
"doc": { //type of documents in index is 'doc'
"properties": {
"date": {
"type": "date",
"format": "HH:mm yyyy-MM-dd"
},
"data": {
"type": "integer"
}
}
}
}
}
Since we created mapping to have doc type of documents, now we can add new documents by simply adding /doc/_id:
PUT core_060619/doc/1
{
"test" : [ {
"data" : "119050300",
"date" : "00:00 2019-06-03"
} ]
}
PUT core_060619/doc/2
{
"test" : [ {
"data" : "111120300",
"date" : "10:15 2019-06-02"
} ]
}
Elasticsearch 7.+
Types are removed, but you can use custom like field(s):
PUT twitter
{
"mappings": {
"_doc": {
"properties": {
"type": { "type": "keyword" },
"name": { "type": "text" },
"user_name": { "type": "keyword" },
"email": { "type": "keyword" },
"content": { "type": "text" },
"tweeted_at": { "type": "date" }
}
}
}
}
PUT twitter/_doc/user-kimchy
{
"type": "user",
"name": "Shay Banon",
"user_name": "kimchy",
"email": "shay#kimchy.com"
}
PUT twitter/_doc/tweet-1
{
"type": "tweet",
"user_name": "kimchy",
"tweeted_at": "2017-10-24T09:00:00Z",
"content": "Types are going away"
}
GET twitter/_search
{
"query": {
"bool": {
"must": {
"match": {
"user_name": "kimchy"
}
},
"filter": {
"match": {
"type": "tweet"
}
}
}
}
}
Removal of mapping types

elasticsearch reindex nested object's element to keyword

I have an index structured like below:
"my_index": {
"mappings": {
"my_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeStatistics": {
"type": "nested",
"properties": {
"clicks": {
"type": "long"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
}
}
I need to remove the nested object in a new index and just save the creativeId as a new keyword (to make it clear: I know I will loose the clicks data, and it is not important). It means the final new index scheme would be:
"my_new_index": {
"mappings": {
"my_new_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
Right now each row has exactly one creativeStatistics. and therefore there is no complexity in selecting one of the creativeIds.
I know it is possible to reindex using painless scripts, but I don't know how can I do that. Any help will be appreciated.
You can do it like this:
POST _reindex
{
"source": {
"index": "my_old_index"
},
"dest": {
"index": "my_new_index"
},
"script": {
"source": "if (ctx._source.creativeStatistics != null && ctx._source.creativeStatistics.size() > 0) {ctx._source.creativeId = ctx._source.creativeStatistics[0].creativeId; ctx._source.remove('creativeStatistics')}",
"lang": "painless"
}
}
You can also create a Pipeline by creating a Script Processor as follows:
PUT _ingest/pipeline/my_pipeline
{
"description" : "My pipeline",
"processors" : [
{ "script" : {
"source": "for (item in ctx.creativeStatistics) { if(item.creativeId!=null) {ctx.creativeId = item.creativeId;} }"
}
},
{
"remove": {
"field": "creativeStatistics"
}
}
]
}
Note that if you have multiple nested objects, it would append the last object's creativeId. And it would only add creativeId if a source document has one in its creativeStatistics.
Below is how you can then use reindex query:
POST _reindex
{
"source": {
"index": "creativeindex_src"
},
"dest": {
"index": "creativeindex_dest",
"pipeline": "my_pipeline"
}
}

Nested query in ElasticSearch - two levels

I have the next mapping :
"c_index": {
"aliases": {},
"mappings": {
"an": {
"properties": {
"id": {
"type": "string"
},
"sm": {
"type": "nested",
"properties": {
"cr": {
"type": "nested",
"properties": {
"c": {
"type": "string"
},
"e": {
"type": "long"
},
"id": {
"type": "string"
},
"s": {
"type": "long"
}
}
},
"id": {
"type": "string"
}
}
}
}
}
}
And I need a query than gives me all the cr's when:
an.id == x and sm.id == y
I tried with :
{"query":{"bool":{"should":[{"terms": {"_id": ["x"]}},
{"nested":{"path": "sm","query":{
"match": {"sm.id":"y"}}}}]}}}
But runs very slow and gives more info than i need.
What's the most efficient way to do that ? Thank you!
You don't need nested query here. Also, use filter instead of should if you want to find documents matching all the queries (the exception would be if you wanted the query to affect the score, like match query, which is not the case here, then you could use should + minimum_should_match option)
{
"query": {
"bool": {
"filter": [
{ "term": { "_id": "x" } },
{ "term": { "sm.id": "y" } }
]
}
}
}

Update and search in multi field properties in ElasticSearch

I'm trying to use multi field properties for multi language support. I created following mapping for this:
{
"mappings": {
"product": {
"properties": {
"prod-id": {
"type": "string"
},
"prod-name": {
"type": "string",
"fields": {
"en": {
"type": "string",
"analyzer": "english"
},
"fr": {
"type": "string",
"analyzer": "french"
}
}
}
}
}
}
}
I created test record:
{
"prod-id": "1234567",
"prod-name": [
"Test product",
"Produit d'essai"
]
}
and tried to query using some language:
{
"query": {
"bool": {
"must": [
{"match": {
"prod-name.en": "Produit"
}}
]
}
}
}
As a result I got my document. But I expected that I will have empty result when I use French but choose English. It seems ElasticSearch ignores which field I specified in query. There is no difference in search result when I use "prod-name.en" or "prod-name.fr" or just "prod-name". Is this behaviour expected? Should I do some special things to have searching just in one language?
Another problem with updating multi field property. I can't update just one field.
{
"doc" : {
"prod-name.en": "Test"
}
}
I got following error:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Field name [prod-name.en] cannot contain '.'"
}
],
"type": "mapper_parsing_exception",
"reason": "Field name [prod-name.en] cannot contain '.'"
},
"status": 400
}
Is there any way to update just one field in multi field property?
In your mapping, the prod-name.en field will simply be analyzed using the english analyzer and the same for the french field. However, ES will not choose for you which value to put in which field.
Instead, you need to modify your mapping like this
{
"mappings": {
"product": {
"properties": {
"prod-id": {
"type": "string"
},
"prod-name": {
"type": "object",
"properties": {
"en": {
"type": "string",
"analyzer": "english"
},
"fr": {
"type": "string",
"analyzer": "french"
}
}
}
}
}
}
}
and input document to be like this and you'll get the results you expect.
{
"prod-id": "1234567",
"prod-name": {
"en": "Test product",
"fr": "Produit d'essai"
}
}
As for the updating part, your partial document should be like this instead.
{
"doc" : {
"prod-name": {
"en": "Test"
}
}
}

ElasticSearch: search inside the array of objects

I have a problem with querying objects in array.
Let's create very simple index, add a type with one field and add one document with array of objects (I use sense console):
PUT /test/
PUT /test/test/_mapping
{
"test": {
"properties": {
"parent": {"type": "object"}
}
}
}
POST /test/test
{
"parent": [
{
"name": "turkey",
"label": "Turkey"
},
{
"name": "turkey,mugla-province",
"label": "Mugla (province)"
}
]
}
Now I want to search by both names "turkey" and "turkey,mugla-province" . The first query works fine:
GET /test/test/_search {"query":{ "term": {"parent.name": "turkey"}}}
But the second one returns nothing:
GET /test/test/_search {"query":{ "term": {"parent.name": "turkey,mugla-province"}}}
I tried a lot of stuff including:
"parent": {
"type": "nested",
"include_in_parent": true,
"properties": {
"label": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string",
"store": true
}
}
}
But nothing helps. What do I miss?
Here's one way you can do it, using nested docs:
I defined an index like this:
PUT /test_index
{
"mappings": {
"doc": {
"properties": {
"parent": {
"type": "nested",
"properties": {
"label": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}
}
}
}
Indexed your document:
PUT /test_index/doc/1
{
"parent": [
{
"name": "turkey",
"label": "Turkey"
},
{
"name": "turkey,mugla-province",
"label": "Mugla (province)"
}
]
}
Then either of these queries will return it:
POST /test_index/_search
{
"query": {
"nested": {
"path": "parent",
"query": {
"match": {
"parent.name": "turkey"
}
}
}
}
}
POST /test_index/_search
{
"query": {
"nested": {
"path": "parent",
"query": {
"match": {
"parent.name": "turkey,mugla-province"
}
}
}
}
}
Here's the code I used:
http://sense.qbox.io/gist/6258f8c9ee64878a1835b3e9ea2b54e5cf6b1d9e
For search multiple terms use the Terms query instead of Term query.
"terms" : {
"tags" : [ "turkey", "mugla-province" ],
"minimum_should_match" : 1
}
There are various ways to construct this query, but this is the simplest and most elegant in the current version of ElasticSearch (1.6)

Resources