How do we use and query the keyword field? - Elasticsearch

When I run
PUT /vehicles/_doc/123
{
"make" : "Honda Civic",
"color" : "Blue",
"from": "Japan",
"size": "Big",
"comment": "deja vu",
"HP" : 250,
"milage" : 24000,
"price": 19300.97
}
Elasticsearch automatically generates the index definition below:
{
"vehicles": {
"aliases": {},
"mappings": {
"properties": {
"HP": {
"type": "long"
},
"color": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"comment": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"from": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"make": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"milage": {
"type": "long"
},
"price": {
"type": "float"
},
"size": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"number_of_shards": "1",
"provided_name": "vehicles",
"creation_date": "1670864230815",
"number_of_replicas": "1",
"uuid": "etLFicsvSXCpeuFiYCiT0g",
"version": {
"created": "8050299"
}
}
}
}
}
In the index, a field such as color has type text and also has a keyword sub-field. How do we use and query the keyword field?

To query the keyword field, use color.keyword as the field name in your query. To query only the text part, use color as the field name.
text and keyword fields are tokenized and stored differently and are used in different scenarios: a text field is analyzed for full-text search, while a keyword field is indexed as a single unmodified term for exact matching, sorting, and aggregations. This answer is useful for understanding the difference.
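As a minimal sketch of that difference, the two request bodies below target the vehicles index from the question. They are built as Python dictionaries so they can be printed or handed to whatever client you use. The helper names exact_color_query and full_text_color_query are made up for illustration; only the term and match query types and the color / color.keyword field names come from Elasticsearch and the mapping above.

```python
import json


def exact_color_query(value):
    """Body for an exact, case-sensitive match on the keyword sub-field."""
    return {"query": {"term": {"color.keyword": value}}}


def full_text_color_query(text):
    """Body for an analyzed match on the text field.

    The default analyzer tokenizes and lowercases the input before matching.
    """
    return {"query": {"match": {"color": text}}}


if __name__ == "__main__":
    # Print both bodies as JSON, ready to paste after GET /vehicles/_search
    print(json.dumps(exact_color_query("Blue"), indent=2))
    print(json.dumps(full_text_color_query("blue"), indent=2))
```

With the default analyzer, the match query on color should find the sample document for either "Blue" or "blue", while the term query on color.keyword only matches the stored value "Blue" exactly. The same color.keyword name is what you would normally use in a terms aggregation or a sort.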

Related

How to perform nested aggregation in child parent relationship

I am using Elasticsearch 7.11 and have implemented a parent-child relation. One of the main reasons was that my updates are very frequent and a new child can be added under a parent at any time.
My project manages all the computers in a network; all activity related to the endpoints should be logged for analytics purposes.
My mapping is something like:
PcInformation -> User
A PC has its own information (the main thing to note is the activationTime), and the user has its department, username, role, etc.
Now I want to get the top departments with respect to PCs and their time.
Say I want to know which departments have the most PCs in 2020.
What I am currently doing is first getting all the PCs through the user relationship. The has_child query is below.
{
"query": {
"bool": {
"filter": [
{
"has_child": {
"type": "user",
"query": {
"nested": {
"path": "user",
"query": {
"match_all": {}
}
}
}
}
},
{
"range": {
"regDate": {
"gte": "2020-04-11",
"lte": "2022-04-31"
}
}
}
]
}
}
}
This returns all the PCs in a specific time range.
Then I perform an aggregation, first on user and then a sub-aggregation on the pcConnection data for the time-based aggregation. Now I want to know the name of the department, but this is not in the PC information.
One option is to put the user information in the PC document, but then I would lose what I am using the parent-child model for.
Is there any way to do so?
Updated
The Sample Mapping
{
"pcinformation": {
"mappings": {
"properties": {
"_class": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"user": {
"type": "nested",
"properties": {
"userGroup": {
"type": "keyword"
},
"userTeam": {
"type": "keyword"
},
"userCode": {
"type": "long"
},
"userName": {
"type": "keyword"
}
}
},
"antivirus": {
"type": "nested",
"properties": {
"datetime": {
"type": "date"
},
"name": {
"type": "keyword"
}
}
},
"cpuId": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"domainName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"firewall": {
"type": "nested",
"properties": {
"datetime": {
"type": "date"
},
"status": {
"type": "keyword"
}
}
},
"friendlyName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"activationDate": {
"type": "date"
},
"macId": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"osArch": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"osType": {
"type": "keyword"
},
"osVersion": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"pcSignature": {
"type": "text"
},
"pcSignatureHash": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"relation": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"infection": [
"user"
]
}
},
"userName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"vm": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
Since this is a parent-child relation, I have two records. The first one is:
{
"_index": "pcInformation",
"_type": "_doc",
"_id": "abcd",
"_version": 1,
"_score": 1,
"_source": {
"_class": "stor.doc.pcInformation",
"pcSignatureHash": "abcd",
"pcSignature": "dddd",
"name": "DESKTOP8JGBPB9",
"userName": "Win1064",
"osType": "Windows.10.Enterprise",
"domainName": "DESKTOP8JGBPB9",
"cpuId": "NOCPUID",
"osVersion": "10.0.19042",
"osArch": "32",
"macId": "0800278A763D",
"activationDate": "2021-05-25T08:46:30.510Z",
"vm": "No VM",
"friendlyName": "Windows Defender",
"relation": {
"name": "pcInformation"
}
}
}
The other one is user information.
{
"_index": "pcInformation",
"_type": "_doc",
"_id": "Qw60onkBDTnt1BMJOeq0",
"_version": 1,
"_score": 1,
"_routing": "abcd",
"_source": {
"_class": "stor.doc.pcInformation",
"agent": {
"userCode": 1,
"userGroup":"admin",
"userRole":"manager"
},
"relation": {
"name": "user",
"parent": "abcd"
}
}
}

My phrase_prefix query does not work for numeric values

My query is pretty simple. It looks like this:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "something_to_search",
"type": "phrase_prefix",
"fields": [
"name",
"id"
...
],
"lenient": true
}
}
],
"minimum_should_match": 1,
"boost": 1.0
}
}
}
name is a text value and id is a numeric value. If I search for "Jo", I get people whose names start with "Jo", but if I search for "123", I won't get people whose ids start with "123". If I search for the exact id, I do get a result.
Can someone please tell me how I can also get prefix matching on numeric fields?
My mappings:
{
"people_db": {
"mappings": {
"person": {
"properties": {
"address": {
"properties": {
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"street": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"streetNumber": {
"type": "long"
},
"zipCode": {
"type": "long"
}
}
},
"country": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"id": {
"type": "long"
}
}
}
}
}
}

Negative values in Elasticsearch range queries

I found this problem while creating a watch in Elasticsearch. This is my query:
"body": {
"query": {
"bool": {
"must": [
{
"range": {
"percent": {
"lt": 100
}
It successfully returns every document with percent between 0 and 99; however, it ignores those with a negative value. The "percent" field is mapped as a long in the index.
Can you help me?
Thanks
Edit: Return of executing "curl -XGET localhost:9200/monthly-tickets-2018-06"
{
"monthly-tickets-2018-06": {
"aliases": {},
"mappings": {
"monthly_tickets": {
"properties": {
"percent": {
"type": "long"
},
"priority": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"project": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"ref": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"timestamp": {
"type": "date"
}
}
}
},
"settings": {
"index": {
"creation_date": "1528946562231",
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "aIfLjFwqS_aCzQFvZm0L5Q",
"version": {
"created": "6020399"
},
"provided_name": "monthly-tickets-2018-06"
}
}
}
}

Elasticsearch - using nested object value in Function Score

I currently have a nested object interest_scores in ES that looks like this:
[{
username: 'Somebody',
interest_scores: [
{ name: 'Running', score: 10 },
{ name: 'Food and drinks', score: 21 }
]
},
{
username: 'SomebodyElse',
interest_scores: [
{ name: 'Running', score: 7 },
{ name: 'Food and drinks', score: 29 }
]
}]
When I enter the search term Running I would like the user with the highest score for Running to get returned first.
I know the way to do this is to use a Function Score Query but I am not sure how to use the matching search term in the function / script. What I think is that the query will return all documents that have the interest "Running" and then I could use something like interest_scores.{match}.score to add to or multiply by the document score.
Any help with this would be greatly appreciated!
As requested, here is the mapping:
{
"influencers": {
"mappings": {
"influencer": {
"properties": {
"email": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"gender": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"geo": {
"type": "geo_point"
},
"hashtags": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"interest_scores": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"score": {
"type": "long"
}
}
},
"interests": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"language": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"location": {
"properties": {
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"country": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"country_code": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"lat": {
"type": "float"
},
"lng": {
"type": "float"
},
"state_code": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"subdivision": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"network_data": {
"properties": {
"facebook": {
"properties": {
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"username": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"instagram": {
"properties": {
"bio": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"engagement": {
"type": "float"
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"picture": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"reach": {
"type": "long"
},
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"username": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"pinterest": {
"properties": {
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"username": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"twitter": {
"properties": {
"bio": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"engagement": {
"type": "float"
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"picture": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"reach": {
"type": "long"
},
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"username": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"youtube": {
"properties": {
"bio": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"engagement": {
"type": "float"
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"picture": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"reach": {
"type": "long"
},
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"username": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"videos": {
"type": "long"
},
"views": {
"type": "long"
},
"views_per_video": {
"type": "float"
}
}
}
}
},
"networks": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"picture": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"total_reach": {
"type": "long"
},
"username": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
I do not have a function score query yet; I am only testing in the Dev Tools of Kibana. I do have all of the other filters working correctly, though. I am just looking to say: "If the search term matches an interest_scores.name, then sort the hits by the interest_scores.score of that interest_scores.name."
Update
The following seems to be working when I test it in Kibana dev tools:
{
"query": {
"nested": {
"path": "interest_scores",
"score_mode": "sum",
"query": {
"function_score": {
"query": {
"match": { "interest_scores.name": "Running" }
},
"script_score": {
"script": "_score + doc['interest_scores.score'].value"
}
}
}
}
}
}
I have tested it with a few different search terms and it always returns the highest score first, but what is weird is that I get the same results when I remove the script_score function. Can anyone tell me if this is a good solution, or why it works without the script_score?
As described here, you can sort by nested fields:
{
"_source": false, # for inner hits - you can remove it
"query": {
"nested": {
"path": "interest_scores",
"filter": {
"range": {
"interest_scores.score": {
"gte": "0"
}
}
},
"inner_hits": {} # for inner hits - you can remove it
}
},
"sort": {
"interest_scores.score": {
"order": "desc",
"mode": "max",
"nested_filter": {
"range": {
"interest_scores.score": {
"gte": "0"
}
}
}
}
}
}
* Note that you can use inner_hits to show only the relevant nested documents. If all nested documents are relevant, remove the marked lines.
** Use the filter on the score field or on any other field (e.g. the name you would like to filter by).
EDIT 1:
If you want to sort by the score of a specific name, try:
{
"_source": false,
"query": {
"nested": {
"path": "interest_scores",
"filter": {
"term": {
"interest_scores.name": "SCORE_NAME"
}
},
"inner_hits": {}
}
},
"sort": {
"interest_scores.score": {
"order": "desc",
"mode": "max",
"nested_filter": {
"term": {
"interest_scores.name": "SCORE_NAME"
}
}
}
}
}
Replace SCORE_NAME with the desired interest name.
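For comparison, here is a hedged sketch of the same idea in the form newer Elasticsearch versions expect, where the sort clause takes a nested object with path and filter instead of nested_filter (as far as I know this applies from 6.1 onward). The function name sort_by_interest_score is hypothetical. The sketch also assumes the dynamically mapped interest_scores.name.keyword sub-field from the mapping in the question, because a term query on the analyzed text field would not match a capitalized value such as "Running".

```python
import json


def sort_by_interest_score(interest_name):
    """Build a search body that ranks documents by the score of one named interest."""
    # Exact match on the keyword sub-field of the nested name.
    name_filter = {"term": {"interest_scores.name.keyword": interest_name}}
    return {
        "query": {
            "nested": {
                "path": "interest_scores",
                # Only documents that have the named interest are returned.
                "query": {"bool": {"filter": [name_filter]}},
                "inner_hits": {},
            }
        },
        "sort": [
            {
                "interest_scores.score": {
                    "order": "desc",
                    "mode": "max",
                    # Restrict the sort value to the matching nested entry,
                    # so other interests' scores do not affect the order.
                    "nested": {"path": "interest_scores", "filter": name_filter},
                }
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(sort_by_interest_score("Running"), indent=2))
```

The important design point is that the same name filter appears twice: once in the query, to select documents, and once inside the sort, so that only the score of the named interest is used as the sort value.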

Sort a nested array and return top 10 in elastic

I have a nested data type in an elastic index and want to sort this ascending for all returned results. I have tried the following:
GET indexname/_search
{
"_source" : ["m_iTopicID", "m_iYear", "m_Companies"],
"query": {
"terms":{
"m_iTopicID": [11,12,13]
}
},
"sort" : [
{
"m_Companies.value" : {
"order" : "asc",
"nested_path" : "m_Companies"
}
}
]
}
The mapping of the index as follows:
{
"indexname": {
"mappings": {
"topicyear": {
"properties": {
"m_Companies": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_People": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_Places": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_Subtopics": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"m_fActivation": {
"type": "float"
},
"m_iDocBodyWordCnt": {
"type": "long"
},
"m_iNodeID": {
"type": "long"
},
"m_iTopicID": {
"type": "long"
},
"m_iYear": {
"type": "long"
},
"m_szDocID": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"m_szDocTitle": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"m_szGeo1": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_szSourceType": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "float"
}
}
},
"m_szSrcUrl": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"m_szTopicNames": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
This returns all topics with ID 11, 12 or 13 with a list of m_Companies... but the lists aren't sorted ascending by the value field.
I would then like to only return the top 10 of each list. So the list doesn't return hundreds like currently but just n. If I can't achieve this option I will just obtain the top 10 at the front-end with a javascript splice(0,10) but it would be great if elastic could do this for me.
Thanks in advance.
Since you provided the sort in the main/parent level query, it sorts only the parent/root documents. As you might have observed in the results, the documents are ordered by their minimum m_Companies.value.
To sort the nested documents within each document, you have to go inside the nested document and apply the sort there, because m_Companies entries are subdocuments of the parent document. You have to use nested inner_hits and then sort the inner_hits.
This GitHub issue has a very good example of what I am trying to explain, namely that a top-level sort orders only the parent/root documents based on values in the nested documents.
Since you want all the nested documents, you can let the nested query fetch them with match_all and sort them on the value field.
You can use the following query:
{
"_source": ["m_iYear", "m_Companies"],
"query": {
"bool": {
"must": [{
"terms": {
"m_iTopicID": [11, 12, 13]
}
},
{
"nested": {
"path": "m_Companies",
"query": {
"match_all": {}
},
"inner_hits": {
"sort": [{
"m_Companies.value": "asc"
}]
}
}
}
]
}
},
"sort": [{
"m_Companies.value": {
"order": "asc",
"nested_path": "m_Companies"
}
}]
}
Hope this helps,
Thanks
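To also cover the "top 10" part of the question, a possible sketch is to set size inside inner_hits, which limits how many nested entries are returned per parent document. The function name top_companies_query and the top_n parameter are made up for illustration, and the body is built as a Python dictionary so it can be printed as JSON. The terms clause is placed in filter context here, which is an assumption on my part; it does not need to contribute to scoring.

```python
import json


def top_companies_query(topic_ids, top_n=10):
    """Build a search body returning the top_n lowest-valued companies per topic."""
    return {
        # The full m_Companies list is left out of _source on purpose;
        # the sorted, truncated list comes back under inner_hits instead.
        "_source": ["m_iTopicID", "m_iYear"],
        "query": {
            "bool": {
                "filter": [{"terms": {"m_iTopicID": topic_ids}}],
                "must": [
                    {
                        "nested": {
                            "path": "m_Companies",
                            "query": {"match_all": {}},
                            "inner_hits": {
                                "size": top_n,
                                "sort": [{"m_Companies.value": {"order": "asc"}}],
                            },
                        }
                    }
                ],
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(top_companies_query([11, 12, 13]), indent=2))
```

Each hit should then carry its sorted entries under inner_hits.m_Companies.hits.hits rather than in _source. Note that the inner_hits size is bounded by the index.max_inner_result_window index setting, so very large values may be rejected.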

Resources