Query Elasticsearch nested field by index (order of insert)

I have an Elasticsearch document with some nested objects (mapped as a nested field),
for example:
{
"FirstName": "Test",
"LastName": "Test",
"Cost": 322.54,
"Email": "test@test.com",
"Vehicles": [
{
"Year": 2000,
"Make": "Mazda",
"Model": "6"
},
{
"Year": 2012,
"Make": "Ford",
"Model": "F150"
}
]
}
I am trying to run aggregations on a specific position of the array. For example, I want to sum the Cost of documents whose first vehicle has the make Ford.
Is this possible at all? I found very little information online about Elasticsearch nested fields and nothing about their index/order.

It is possible to achieve what you want, but you also need to add the index order as a field inside your nested documents:
{
"FirstName": "Test",
"LastName": "Test",
"Cost": 322.54,
"Email": "test@test.com",
"Vehicles": [
{
"Year": 2000,
"Make": "Mazda",
"Model": "6",
"Index": 0
},
{
"Year": 2012,
"Make": "Ford",
"Model": "F150",
"Index": 1
}
]
}
You can then query your index with two conditions inside the same nested query, one on Index and one on Make, like this:
{
"query": {
"nested": {
"path": "Vehicles",
"query": {
"bool": {
"filter": [
{
"match": {
"Vehicles.Index": 0
}
},
{
"match": {
"Vehicles.Make": "Ford"
}
}
]
}
}
}
}
}
With the sample document above, this query returns no results, as expected, because the vehicle at Index 0 is a Mazda and the Ford sits at Index 1.
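
To tie this back to the original goal (summing Cost for documents whose first vehicle is a Ford), here is a minimal Python sketch. It assumes that Index is mapped as a numeric field and that Cost lives on the parent document; the helper names and the aggregation name total_cost are made up for illustration. It only builds the documents and the request body and does not talk to a cluster.

```python
import copy


def add_vehicle_index(doc):
    """Return a copy of doc whose Vehicles entries carry their array position."""
    out = copy.deepcopy(doc)
    for position, vehicle in enumerate(out.get("Vehicles", [])):
        vehicle["Index"] = position
    return out


def first_vehicle_cost_body(make):
    """Build a search body that sums Cost over documents whose first vehicle has this make."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {
            "nested": {
                "path": "Vehicles",
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"Vehicles.Index": 0}},
                            {"match": {"Vehicles.Make": make}},
                        ]
                    }
                },
            }
        },
        # "total_cost" is a hypothetical aggregation name.
        "aggs": {"total_cost": {"sum": {"field": "Cost"}}},
    }


doc = {
    "FirstName": "Test",
    "Cost": 322.54,
    "Vehicles": [
        {"Year": 2000, "Make": "Mazda", "Model": "6"},
        {"Year": 2012, "Make": "Ford", "Model": "F150"},
    ],
}
indexed = add_vehicle_index(doc)
body = first_vehicle_cost_body("Ford")
```

Because both conditions sit inside one nested query, they have to hold for the same vehicle, and the sum aggregation then runs over the parent documents that matched.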

Elasticsearch - nested types vs collapse/aggs

I have a use case where I need to find the latest data based on some fields.
The fields are:
category.name
category.level
createdAt
For example: search for the newest data where category.name = 'John G.' AND category.level = 'A'. I expect the document with ID = 1, because it matches the criteria and is the newest one based on the createdAt field ("createdAt": "2022-04-18 19:09:27.527+0200").
The problem is that category.* is a nested field, and I can't use aggs/collapse on these fields because ES doesn't support it.
Mapping:
PUT data
{
"mappings": {
"properties": {
"createdAt": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSSZ"
},
"category": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"analyzer": "keyword"
}
}
},
"approved": {
"type": "text",
"analyzer": "keyword"
}
}
}
}
Data:
POST data/_create/1
{
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Chris T.",
"level": "A"
}
],
"createdBy": "John",
"createdAt": "2022-04-18 19:09:27.527+0200",
"approved": "no"
}
POST data/_create/2
{
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Chris T.",
"level": "A"
}
],
"createdBy": "Max",
"createdAt": "2022-04-10 10:09:27.527+0200",
"approved": "no"
}
POST data/_create/3
{
"category": [
{
"name": "Rick J.",
"level": "B"
}
],
"createdBy": "Rick",
"createdAt": "2022-03-02 02:09:27.527+0200",
"approved": "no"
}
I'm looking for either a search query that can handle that in an acceptable performant way, or a new object design without nested type where I could take advantage of aggs/collapse feature.
Any suggestion will be really appreciated.
About your first question (finding the newest document where category.name = 'John G.' AND category.level = 'A'), I believe you can do something along these lines:
GET /data/_search
{
"query": {
"nested": {
"path": "category",
"query": {
"bool": {
"must": [
{
"match": {
"category.name": "John G."
}
},
{
"match": {
"category.level": "A"
}
}
]
}
}
}
},
"sort": [
{
"createdAt": {
"order": "desc"
}
}
],
"size":1
}
For the second matter, it really depends on what you are aiming to do. You could merge category.name and category.level into a single field, so that your document would look like this:
{
"category": ["John G. A","Chris T. A"],
"createdBy": "Max",
"createdAt": "2022-04-10 10:09:27.527+0200",
"approved": "no"
}
The nested type is then no longer needed, although I agree it feels like using tape to fix your issue.
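
As a rough illustration of that merge, here is a small Python sketch that could run before indexing. The function name flatten_categories is hypothetical, and joining name and level with a single space is simply the convention used in the example document above.

```python
def flatten_categories(doc):
    """Return a copy of doc whose category list holds "name level" strings."""
    flat = dict(doc)
    flat["category"] = [
        f'{entry["name"]} {entry["level"]}' for entry in doc.get("category", [])
    ]
    return flat


original = {
    "category": [
        {"name": "John G.", "level": "A"},
        {"name": "Chris T.", "level": "A"},
    ],
    "createdBy": "Max",
    "createdAt": "2022-04-10 10:09:27.527+0200",
    "approved": "no",
}
flattened = flatten_categories(original)
```

The input document is left untouched, so the nested and the flattened shapes can be compared side by side while testing.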

Elastic Search Wildcard query with space failing 7.11

My data is indexed in Elasticsearch 7.11. This is the mapping I got when I added documents directly to my index:
{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}
I haven't added the keyword part myself and have no idea where it came from.
I am running a wildcard query on this field, but I am unable to get results for search terms containing spaces.
{
"query": {
"bool":{
"should":[
{"wildcard": {"name":"*hello world*"}}
]
}
}
}
I have seen many answers related to not_analyzed, and I have tried setting {"index":"true"} in the mapping, but that did not help. How can I make the wildcard search work in this version of Elasticsearch?
I tried changing the field to the wildcard type:
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type" :"wildcard"
}
}
}
and got the following response:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
}
],
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
},
"status": 400
}
Adding a sample document to match
{
"_index": "accelerators",
"_type": "_doc",
"_id": "602ec047a70f7f30bcf75dec",
"_score": 1.0,
"_source": {
"acc_id": "602ec047a70f7f30bcf75dec",
"name": "hello world example",
"type": "Accelerator",
"description": "khdkhfk ldsjl klsdkl",
"teamMembers": [
{
"userId": "karthik.r@gmail.com",
"name": "Karthik Ganesh R",
"shortName": "KR",
"isOwner": true
},
{
"userId": "anand.sajan@gmail.com",
"name": "Anand Sajan",
"shortName": "AS",
"isOwner": false
}
],
"sectorObj": [
{
"item_id": 14,
"item_text": "Cross-sector"
}
],
"geographyObj": [
{
"item_id": 4,
"item_text": "Global"
}
],
"technologyObj": [
{
"item_id": 1,
"item_text": "Artificial Intelligence"
}
],
"themeColor": 1,
"mainImage": "assets/images/Graphics/Asset 35.svg",
"features": [
{
"name": "Ideation",
"icon": "Asset 1007.svg"
},
{
"name": "Innovation",
"icon": "Asset 1044.svg"
},
{
"name": "Strategy",
"icon": "Asset 1129.svg"
},
{
"name": "Intuitive",
"icon": "Asset 964.svg"
}
],
"logo": {
"actualFileName": "",
"fileExtension": "",
"fileName": "",
"fileSize": 0,
"fileUrl": ""
},
"customLogo": {
"logoColor": "#B9241C",
"logoText": "EC",
"logoTextColor": "#F6F6FA"
},
"collaborators": [
{
"userId": "muhammed.arif@gmail.com",
"name": "muhammed Arif P T",
"shortName": "MA"
},
{
"userId": "anand.sajan@gmail.com",
"name": "Anand Sajan",
"shortName": "AS"
}
],
"created_date": "2021-02-18T19:30:15.238000Z",
"modified_date": "2021-03-11T11:45:49.583000Z"
}
}
You cannot change the type of an existing field once it has been created. However, you can add another sub-field of type wildcard, like this:
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type": "text",
"fields": {
"wildcard": {
"type" :"wildcard"
},
"keyword": {
"type" :"keyword",
"ignore_above":256
}
}
}
}
}
Once the mapping is updated, you need to reindex your existing data so that the new sub-field gets populated, like this:
POST http://localhost:9001/indexname/_update_by_query
When this finishes, you'll be able to query the new sub-field like this:
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"name.wildcard": "*hello world*"
}
}
]
}
}
}
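
To see why the pattern with a space fails on the text field but works on the wildcard sub-field, here is a small Python simulation. It is only a sketch: lowercasing and splitting on whitespace is a rough stand-in for the standard analyzer, and fnmatch is used to mimic wildcard matching against individual indexed terms. Nothing here calls Elasticsearch.

```python
from fnmatch import fnmatchcase


def analyzed_terms(value):
    """Rough stand-in for a text field: lowercase, then split into separate terms."""
    return value.lower().split()


def unanalyzed_terms(value):
    """A wildcard or keyword field keeps the whole value as a single term."""
    return [value]


def wildcard_matches(terms, pattern):
    """A wildcard query matches if any single indexed term fits the pattern."""
    return any(fnmatchcase(term, pattern) for term in terms)


name = "hello world example"
pattern = "*hello world*"

on_text_field = wildcard_matches(analyzed_terms(name), pattern)
on_wildcard_field = wildcard_matches(unanalyzed_terms(name), pattern)
```

On the text field the value is stored as the separate terms hello, world and example, and none of them contains a space, so the pattern can never match. The wildcard sub-field keeps the full string, which is why querying name.wildcard works.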

Item variants in ElasticSearch

What is the best way to model item variants in Elasticsearch and retrieve only one item per variant group?
For example, let's say I have the following items:
[{
"sku": "abc-123",
"group": "abc",
"color": "red",
"price": 10
},
{
"sku": "def-123",
"group": "def",
"color": "red",
"price": 10
},
{
"sku": "abc-456",
"group": "abc",
"color": "black",
"price": 20
}
]
The first item and the last one are in the same group, so I want only to return one of them if I query for items below the price of 20 (for example), but with the best hit score.
Feel free to suggest documents design and queries accordingly.
If your variants are mapped with the nested datatype (here assumed to live in a nested field called childs), then you can retrieve the matching ones with inner_hits like this:
GET index/type/_search
{
"size": 2000,
"_source": false,
"query": {
"bool": {
"filter": {
"nested": {
"path": "childs",
"query": {
"bool": {
"filter": {
"term": {
"childs.group.keyword": "abc"
}
}
}
},
"inner_hits": {}
}
}
}
}
}
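
If the variants are instead kept as flat documents, one per SKU as in the question, a different technique, field collapsing, may be closer to what is asked. The Python sketch below only builds such a request body. It assumes group is mapped as a keyword field and price as a number, and the function name and the match_all placeholder for the scoring query are made up for illustration.

```python
def variant_search_body(scoring_query, max_price):
    """Build a search body that returns one top-scoring item per variant group."""
    return {
        "query": {
            "bool": {
                # The scoring part decides which variant ranks best within a group.
                "must": [scoring_query],
                # The price restriction filters without affecting the score.
                "filter": [{"range": {"price": {"lt": max_price}}}],
            }
        },
        # Collapse keeps only the top-sorted hit for each distinct group value.
        "collapse": {"field": "group"},
    }


# Placeholder scoring query; a real search would put its match clause here.
body = variant_search_body({"match_all": {}}, 20)
```

With the default sort by score, each group contributes only its best-scoring variant that passes the price filter.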

Elasticsearch query on inner list and get only matching objects from list instead of entire list in result document

In the following Elasticsearch document I need to find the comments from a specific name, e.g. "Mary Brown". Basically, I want to query on the inner list and get only the matching objects from the list instead of the entire list in the result document. Is this possible? I have mapped 'comments' as nested.
{
"title": "Investment secrets",
"body": "What they don't tell you ...",
"tags": [ "shares", "equities" ],
"comments": [
{
"name": "Mary Brown",
"comment": "Lies, lies, lies",
"age": 42,
"stars": 1,
"date": "2014-10-18"
},
{
"name": "John Smith",
"comment": "You're making it up!",
"age": 28,
"stars": 2,
"date": "2014-10-16"
},
{
"name": "Mary Brown",
"comment": "making it!!!",
"age": 42,
"stars": 3,
"date": "2014-10-20"
}
]
}
Since you have properly mapped your comments field as nested, then yes this is possible using inner_hits, like this:
{
"_source": false,
"query": {
"nested": {
"path": "comments",
"inner_hits": {
"_source": [
"comment", "date"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"comments.name": "Mary Brown"
}
}
]
}
}
}
}
}
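
The matching comments come back under an inner_hits section attached to each top-level hit. Here is a short Python sketch of pulling them out of a response. The response dictionary is a hand-written, trimmed illustration rather than real output, and the helper name matching_comments is hypothetical.

```python
def matching_comments(response, inner_name="comments"):
    """Collect the _source of every nested inner hit across all top-level hits."""
    found = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(inner_name, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            found.append(inner_hit["_source"])
    return found


# Hand-written sample shaped like a search response with inner hits.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "comments": {
                        "hits": {
                            "hits": [
                                {"_source": {"comment": "Lies, lies, lies", "date": "2014-10-18"}},
                                {"_source": {"comment": "making it!!!", "date": "2014-10-20"}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}
comments = matching_comments(sample_response)
```

By default the inner_hits entry is keyed by the nested path, which is why the sketch looks under "comments".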

How to get the number of hits of several matching fields in one record?

I have records similar to
{
"who": "John",
"hobby": [
{"name": "gardening",
"skills": 2
},
{"name": "sleeping",
"skills": 3
},
{"name": "darts",
"skills": 2
}
]
}
,
{
"who": "Mary",
"hobby": [
{"name": "gardening",
"skills": 2
},
{"name": "volleyball",
"skills": 3
},
{"name": "kung-fu",
"skills": 2
}
]
}
I am looking at building a query which would answer the question: "how many hobbies with skills=2 do we have?"
The answer for the example above would be 3 ("gardening" is common to both, and each has another unique one).
Every "query" or "query"+"aggs" I tried returns in ['hits']['hits'] or ['aggregations']['sources']['buckets'] the number of matching documents, that is two in the case above (one for "John" and one for "Mary", each of them satisfying the query).
Is there a way to build a query so that it returns the total number of fields (in the example above: the elements of the list "hobby") which matched that query? (fields, not documents)
Note: If my documents were flat:
{"who": "John", "name": "gardening", "skills": 2},
{"who": "John", "name": "sleeping", "skills": 3},
(...)
{"who": "Mary", "name": "kung-fu", "skills": 2}
then a simple "query" to match "skills": 2 plus an aggregation on "name" would have done the job.
Yes, you can achieve this with the nested type, using inner_hits and/or nested aggregations.
Here is the mapping you should use, with the hobby list mapped as type nested:
curl -XPUT localhost:9200/hobbies -d '{
"mappings": {
"hob": {
"properties": {
"who": {
"type": "string"
},
"hobby": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"skills": {
"type": "integer"
}
}
}
}
}
}
}'
Then we can insert your two sample documents using the _bulk endpoint like this:
curl -XPOST localhost:9200/hobbies/hob/_bulk -d '
{"index":{}}
{"who":"John", "hobby":[{"name": "gardening","skills": 2},{"name": "sleeping","skills": 3},{"name": "darts","skills": 2}]}
{"index":{}}
{"who":"Mary", "hobby":[{"name": "gardening","skills": 2},{"name": "volley-ball","skills": 3},{"name": "kung-fu","skills": 2}]}
'
And finally, we can query your index for how many hobbies have skills: 2 like this:
curl -XPOST localhost:9200/hobbies/hob/_search -d '{
"_source": false,
"query": {
"nested": {
"path": "hobby",
"query": {
"term": {
"hobby.skills": 2
}
},
"inner_hits": {}
}
},
"aggs": {
"hobbies": {
"nested": {
"path": "hobby"
},
"aggs": {
"skills": {
"filter": {
"term": {
"hobby.skills": 2
}
},
"aggs": {
"by_field": {
"terms": {
"field": "hobby.name"
}
}
}
}
}
}
}
}'
This query returns:
In the hits part (through inner_hits), the four nested hobby entries that have skills: 2
In the aggs part, a breakdown of the 3 distinct hobby names which have skills: 2
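
To make those two numbers concrete, here is a plain Python sketch that computes the same counts locally over the two sample documents. It is only a cross-check of the expected figures and does not query Elasticsearch.

```python
from collections import Counter

docs = [
    {"who": "John", "hobby": [
        {"name": "gardening", "skills": 2},
        {"name": "sleeping", "skills": 3},
        {"name": "darts", "skills": 2},
    ]},
    {"who": "Mary", "hobby": [
        {"name": "gardening", "skills": 2},
        {"name": "volley-ball", "skills": 3},
        {"name": "kung-fu", "skills": 2},
    ]},
]

# Every nested hobby entry with skills == 2 (what inner_hits would surface).
matching_names = [
    hobby["name"] for doc in docs for hobby in doc["hobby"] if hobby["skills"] == 2
]
total_matches = len(matching_names)

# Per-name breakdown (what the terms sub-aggregation would bucket).
by_name = Counter(matching_names)
distinct_names = len(by_name)
```

Here gardening is counted twice because both John and Mary have it, which accounts for the gap between four matching entries and three distinct names.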
