I am using Elasticsearch 1.2.1 and trying to sort results on an integer field, but it does not behave the way I expect: I cannot get the results in ascending or descending order. I am using the Sense extension for Chrome, and Elasticsearch listens on port 9200 on localhost.
Here's how I defined the index:
PUT keyword_test
Then I added a mapping for the keyword_test index:
PUT /keyword_test/_mapping/keyword
{
  "keyword": {
    "properties": {
      "id": {
        "type": "string"
      },
      "search_date": {
        "type": "string"
      },
      "keyword": {
        "type": "string",
        "index": "analyzed"
      },
      "count": {
        "type": "integer"
      }
    }
  }
}
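One thing worth double-checking (a suggestion on my part, not something from the original post): if documents were indexed before this mapping was applied, count may already have been dynamically mapped as a different type, and the explicit mapping will not override it for the existing field. The effective mapping can be inspected with:

GET /keyword_test/_mapping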
Then I added different keywords with different counts and tried to search among them with the query below:
GET _search
{
  "sort": {
    "count": {
      "order": "asc",
      "ignore_unmapped": true
    }
  },
  "query": {
    "fuzzy": {
      "keyword": "iphone"
    }
  }
}
I get the result below:
"hits": {
"total": 3,
"max_score": null,
"hits": [
{
"_index": "keyword_test",
"_type": "keyword",
"_id": "8",
"_score": null,
"_source": {
"id": 8,
"count": 9000,
"keyword": "iphone 5s",
"search_date": "2015-05-05"
},
"sort": [
9000
]
},
{
"_index": "keyword_test",
"_type": "keyword",
"_id": "10",
"_score": null,
"_source": {
"id": 10,
"count": 9500,
"keyword": "iphone 6 plus",
"search_date": "2015-05-05"
},
"sort": [
9500
]
},
{
"_index": "keyword_test",
"_type": "keyword",
"_id": "9",
"_score": null,
"_source": {
"id": 9,
"count": 9100,
"keyword": "iphone 6",
"search_date": "2015-05-05"
},
"sort": [
9100
]
}
]
}
The results should come back in 9000, 9100, 9500 order, but they come back in 9000, 9500, 9100 order. I also get a SearchParseException if I remove ignore_unmapped. What should I do? Am I missing some mapping for the count field? Any help would be appreciated, thanks.
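One hedged observation that may be relevant here: GET _search with no index name searches every index in the cluster, so both the sort order and the SearchParseException on an unmapped count field can be caused by other indices that lack this mapping. Scoping the request to the keyword_test index would rule that out:

GET /keyword_test/_search
{
  "sort": {
    "count": {
      "order": "asc"
    }
  },
  "query": {
    "fuzzy": {
      "keyword": "iphone"
    }
  }
}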
Related
I have data in Elasticsearch in the following format:
"segments": [
{"id": "ABC", "value":123},
{"id": "PQR", "value":345},
{"id": "DEF", "value":567},
{"id": "XYZ", "value":789},
]
I want to retrieve all segments where id is "ABC" or "DEF".
I looked up the docs (https://www.elastic.co/guide/en/elasticsearch/reference/7.9/query-dsl-nested-query.html) and a few examples on YouTube, but they all seem to retrieve only a single object, while I want to retrieve more than one.
Is there a way to do this?
You can use a nested query with inner_hits, as shown here.
I assume your index mapping looks like the one below, with the segments field defined as nested:
"mappings": {
"properties": {
"segments": {
"type": "nested",
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "long"
}
}
}
}
}
You can use the query below:
{
  "_source": false,
  "query": {
    "nested": {
      "path": "segments",
      "query": {
        "terms": {
          "segments.id.keyword": [
            "ABC",
            "DEF"
          ]
        }
      },
      "inner_hits": {}
    }
  }
}
Response:
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "73895503",
"_id": "TmM8iYMBrWOLJcwdvQGG",
"_score": 1,
"inner_hits": {
"segments": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "73895503",
"_id": "TmM8iYMBrWOLJcwdvQGG",
"_nested": {
"field": "segments",
"offset": 0
},
"_score": 1,
"_source": {
"id": "ABC",
"value": 123
}
},
{
"_index": "73895503",
"_id": "TmM8iYMBrWOLJcwdvQGG",
"_nested": {
"field": "segments",
"offset": 2
},
"_score": 1,
"_source": {
"id": "DEF",
"value": 567
}
}
]
}
}
}
}
]
}
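A small follow-up note on inner_hits: by default it returns at most three matching nested objects per document. If a segments array can contain more matches, the limit can be raised inside the same query (a hedged tweak, not part of the original answer):

{
  "_source": false,
  "query": {
    "nested": {
      "path": "segments",
      "query": {
        "terms": {
          "segments.id.keyword": [
            "ABC",
            "DEF"
          ]
        }
      },
      "inner_hits": {
        "size": 10
      }
    }
  }
}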
I have a lot of keywords that I want to extract from a query, along with the position (offset) of where each keyword occurs in that text.
So far, this is my progress: I created two custom analyzers, keyword and shingle:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        },
        "my_analyzer_shingle": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "shingle"
          ]
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "keyword": {
          "type": "string",
          "index_analyzer": "my_analyzer_keyword",
          "search_analyzer": "my_analyzer_shingle"
        }
      }
    }
  }
}
And here are the keywords that I have:
{
  "hits": {
    "total": 2000,
    "hits": [
      {
        "id": 1,
        "keyword": "python programming"
      },
      {
        "id": 2,
        "keyword": "facebook"
      },
      {
        "id": 3,
        "keyword": "Microsoft"
      },
      {
        "id": 4,
        "keyword": "NLTK"
      },
      {
        "id": 5,
        "keyword": "Natural language processing"
      }
    ]
  }
}
And I make a query something like this:
{
  "query": {
    "match": {
      "keyword": "I post a lot of things on Facebook and quora"
    }
  }
}
So with the code above I get
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0.009332742,
    "hits": [
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "2",
        "_score": 0.009332742,
        "_source": {
          "id": 2,
          "keyword": "facebook"
        }
      },
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "4",
        "_score": 0.009207102,
        "_source": {
          "id": 4,
          "keyword": "quora"
        }
      }
    ]
  }
}
But I don't know where in the text those words are, i.e. the offsets of those words.
I want to know that quora starts at index 40, but not by highlighting the matches between tags or something like that.
I want to mention that my post is based on this post:
Extract keywords (multi word) from text using elastic search
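Since the question is specifically about offsets, here is one avenue worth sketching (an assumption on my part, not taken from the linked post, and assuming the settings above were applied to an index named test): the _analyze API reports a start_offset and end_offset for every token it emits, so running the input text through the shingle analyzer and intersecting the emitted tokens with the matched keywords yields their positions. On recent Elasticsearch versions this looks like:

POST test/_analyze
{
  "analyzer": "my_analyzer_shingle",
  "text": "I post a lot of things on Facebook and quora"
}

Each token in the response carries its character offsets in the original string, so the token quora comes back with the start_offset the question asks for.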
I have the following index template
{
  "index_patterns": "notificationtiles*",
  "order": 1,
  "version": 1,
  "aliases": {
    "notificationtiles": {}
  },
  "settings": {
    "number_of_shards": 5,
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "dynamic": "false",
    "properties": {
      "id": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      },
      "influencerId": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      },
      "friendId": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      },
      "message": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      },
      "type": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      },
      "sponsorshipCharityId": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      },
      "createdTimestampEpochInMilliseconds": {
        "type": "date",
        "format": "epoch_millis",
        "index": false
      },
      "updatedTimestampEpochInMilliseconds": {
        "type": "date",
        "format": "epoch_millis",
        "index": false
      },
      "createdDate": {
        "type": "date"
      },
      "updatedDate": {
        "type": "date"
      }
    }
  }
}
with the following query
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "influencerId": "52407710-f7be-49c1-bc15-6d52363144a6"
          }
        },
        {
          "match": {
            "type": "friend_completed_sponsorship"
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "friendId": {
      "terms": {
        "field": "friendId",
        "size": 2
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "sort": [
              {
                "createdDate": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [
                "sponsorshipCharityId",
                "message",
                "createdDate"
              ]
            },
            "size": 1
          }
        }
      }
    }
  }
}
which returns
{
  "took": 72,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 12,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "friendId": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 7,
      "buckets": [
        {
          "key": "cf750fd8-998f-4dcd-9c88-56b2b6d6fce9",
          "doc_count": 3,
          "latest": {
            "hits": {
              "total": {
                "value": 3,
                "relation": "eq"
              },
              "max_score": null,
              "hits": [
                {
                  "_index": "notificationtiles-1",
                  "_type": "_doc",
                  "_id": "416a8e07-fd72-46d4-ade1-b9442ef46978",
                  "_score": null,
                  "_source": {
                    "createdDate": "2020-06-24T17:35:17.816842Z",
                    "sponsorshipCharityId": "336de13c-f522-4796-9218-f373ff0b4373",
                    "message": "Contact Test 788826 Completed Sponsorship!"
                  },
                  "sort": [
                    1593020117816
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "93ab55c5-795f-44b0-900c-912e3e186da0",
          "doc_count": 2,
          "latest": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": null,
              "hits": [
                {
                  "_index": "notificationtiles-1",
                  "_type": "_doc",
                  "_id": "66913b8f-94fe-49fd-9483-f332329b80dd",
                  "_score": null,
                  "_source": {
                    "createdDate": "2020-06-24T17:57:17.816842Z",
                    "sponsorshipCharityId": "dbad136c-5002-4470-b85d-e5ba1eff515b",
                    "message": "Contact Test 788826 Completed Sponsorship!"
                  },
                  "sort": [
                    1593021437816
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
However, I'd like the results to include the latest documents (ordered by createdDate desc), for example the following document:
{
  "_index": "notificationtiles-1",
  "_type": "_doc",
  "_id": "68a2a0a8-27aa-4347-8751-d7afccfa797d",
  "_score": 1.0,
  "_source": {
    "id": "68a2a0a8-27aa-4347-8751-d7afccfa797d",
    "influencerId": "52407710-f7be-49c1-bc15-6d52363144a6",
    "friendId": "af342805-1990-4794-9d67-3bb2dd1e36dc",
    "message": "Contact Test 788826 Completed Sponsorship!",
    "type": "friend_completed_sponsorship",
    "sponsorshipCharityId": "b2db72e6-a70e-414a-bf8b-558e6314e7ec",
    "createdDate": "2020-06-25T17:35:17.816842Z",
    "updatedDate": "2020-06-25T17:35:17.816876Z",
    "createdTimestampEpochInMilliseconds": 1593021437817,
    "updatedTimestampEpochInMilliseconds": 1593021437817
  }
}
I need to get the 2 latest documents grouped by friendId, with the latest document per friendId. The grouping by friendId with the latest document per friendId works fine. However, I'm unable to sort the index by createdDate desc before the aggregation happens.
Essentially, I'd like to sort the index by createdDate desc before the aggregation takes place. I don't want a parent aggregation by createdDate, since that wouldn't result in unique friendIds. How can that be achieved?
It looks like you need to set the order property of your terms aggregation. By default the buckets are ordered by document count. You want them ordered by the max createdDate, so add a sub-aggregation that calculates the max createdDate, and then reference that aggregation's name in the order of the parent terms aggregation, as sketched below.
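A minimal sketch of that idea, reusing the field names from the question (the max_createdDate aggregation name is mine; keep the same bool query as before):

{
  "size": 0,
  "aggs": {
    "friendId": {
      "terms": {
        "field": "friendId",
        "size": 2,
        "order": {
          "max_createdDate": "desc"
        }
      },
      "aggs": {
        "max_createdDate": {
          "max": {
            "field": "createdDate"
          }
        },
        "latest": {
          "top_hits": {
            "sort": [
              {
                "createdDate": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [
                "sponsorshipCharityId",
                "message",
                "createdDate"
              ]
            },
            "size": 1
          }
        }
      }
    }
  }
}

This keeps one bucket per friendId but makes the bucket order follow each friend's most recent createdDate rather than the hit count.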
I am trying to search for all the unique names in the index test_nested.
GET test_nested/_mappings
{
  "test_nested": {
    "mappings": {
      "my_type": {
        "properties": {
          "group": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "user": {
            "type": "nested",
            "properties": {
              "name": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
GET test_nested/_search
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_nested",
        "_type": "my_type",
        "_id": "AWG5iVBz4bQsVnslc9gL",
        "_score": 1,
        "_source": {
          "group": "fans",
          "user": [
            {
              "name": "Linux"
            },
            {
              "name": "Android (operating system)"
            },
            {
              "name": "Widows 10"
            }
          ]
        }
      },
      {
        "_index": "test_nested",
        "_type": "my_type",
        "_id": "AWG5ieKW4bQsVnslc9gM",
        "_score": 1,
        "_source": {
          "group": "fans",
          "user": [
            {
              "name": "Bitcoin"
            },
            {
              "name": "PHP"
            },
            {
              "name": "Microsoft Windows"
            }
          ]
        }
      },
      {
        "_index": "test_nested",
        "_type": "my_type",
        "_id": "AWG5irrV4bQsVnslc9gN",
        "_score": 1,
        "_source": {
          "group": "fans",
          "user": [
            {
              "name": "Windows XP"
            }
          ]
        }
      },
      {
        "_index": "test_nested",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "group": "fans",
          "user": [
            {
              "name": "iOS"
            },
            {
              "name": "Android (operating system)"
            },
            {
              "name": "Widows 10"
            },
            {
              "name": "Widows XP"
            }
          ]
        }
      }
    ]
  }
}
I want all the unique names matching a term, i.e. if I search for "wi*" then I should get [Microsoft Windows, Widows 10, Windows XP].
I don't know exactly what you mean, but I use this query to list all statuses:
GET order/default/_search
{
  "size": 0,
  "aggs": {
    "status_terms": {
      "terms": {
        "field": "status.keyword",
        "missing": "N/A",
        "min_doc_count": 0,
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}
My model has a status field, and that query lists all statuses.
This is a bucket aggregation.
One of the fields in the result is:
sum_other_doc_count - Elastic returns only the top unique terms, so if you have many different terms, some of them will not appear in the results. This field is the sum of the document counts for the terms that are not part of the response.
For nested objects, try reading and using the Nested Query docs; a sketch of a nested aggregation for this exact case follows below.
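To make that concrete for the test_nested mapping above, here is a sketch (the aggregation names users and unique_names are mine): a nested aggregation scoped to the user path, with a terms sub-aggregation on user.name.keyword that uses include to keep only the names matching a pattern:

GET test_nested/_search
{
  "size": 0,
  "aggs": {
    "users": {
      "nested": {
        "path": "user"
      },
      "aggs": {
        "unique_names": {
          "terms": {
            "field": "user.name.keyword",
            "include": ".*[Ww]i.*",
            "size": 100
          }
        }
      }
    }
  }
}

Note that include is a regex matched against the whole keyword term, hence the leading and trailing .*, and that keyword terms are case-sensitive.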
I found the solution. Hope it helps someone.
GET record_new/_search
{
  "size": 0,
  "query": {
    "term": {
      "software_tags": {
        "value": "windows"
      }
    }
  },
  "aggs": {
    "software_tags": {
      "terms": {
        "field": "software_tags.keyword",
        "include": ".*Windows.*",
        "size": 10000,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
I have an index full of keywords, and based on those keywords I want to extract the ones present in an input text.
Following is the sample keyword index. Please note that the keywords can be of multiple words too, or basically they are tags which are unique.
{
  "hits": {
    "total": 2000,
    "hits": [
      {
        "id": 1,
        "keyword": "thousand eyes"
      },
      {
        "id": 2,
        "keyword": "facebook"
      },
      {
        "id": 3,
        "keyword": "superdoc"
      },
      {
        "id": 4,
        "keyword": "quora"
      },
      {
        "id": 5,
        "keyword": "your story"
      },
      {
        "id": 6,
        "keyword": "Surgery"
      },
      {
        "id": 7,
        "keyword": "lending club"
      },
      {
        "id": 8,
        "keyword": "ad roll"
      },
      {
        "id": 9,
        "keyword": "the honest company"
      },
      {
        "id": 10,
        "keyword": "Draft kings"
      }
    ]
  }
}
Now, if I input the text "I saw the news of lending club on facebook, your story and quora", the output of the search should be ["lending club", "facebook", "your story", "quora"]. Also, the search should be case insensitive.
There's just one real way to do this. You'll have to index your data as keywords and search it analyzed with shingles:
See this reproduction:
First, we'll create two custom analyzers, keyword and shingle:
PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        },
        "my_analyzer_shingle": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "shingle"
          ]
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "keyword": {
          "type": "string",
          "index_analyzer": "my_analyzer_keyword",
          "search_analyzer": "my_analyzer_shingle"
        }
      }
    }
  }
}
Now let's create some sample data using what you gave us:
POST /test/your_type/1
{
  "id": 1,
  "keyword": "thousand eyes"
}

POST /test/your_type/2
{
  "id": 2,
  "keyword": "facebook"
}

POST /test/your_type/3
{
  "id": 3,
  "keyword": "superdoc"
}

POST /test/your_type/4
{
  "id": 4,
  "keyword": "quora"
}

POST /test/your_type/5
{
  "id": 5,
  "keyword": "your story"
}

POST /test/your_type/6
{
  "id": 6,
  "keyword": "Surgery"
}

POST /test/your_type/7
{
  "id": 7,
  "keyword": "lending club"
}

POST /test/your_type/8
{
  "id": 8,
  "keyword": "ad roll"
}

POST /test/your_type/9
{
  "id": 9,
  "keyword": "the honest company"
}

POST /test/your_type/10
{
  "id": 10,
  "keyword": "Draft kings"
}
And finally, the query to run the search:
POST /test/your_type/_search
{
  "query": {
    "match": {
      "keyword": "I saw the news of lending club on facebook, your story and quora"
    }
  }
}
And this is the result:
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0.009332742,
    "hits": [
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "2",
        "_score": 0.009332742,
        "_source": {
          "id": 2,
          "keyword": "facebook"
        }
      },
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "7",
        "_score": 0.009332742,
        "_source": {
          "id": 7,
          "keyword": "lending club"
        }
      },
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "4",
        "_score": 0.009207102,
        "_source": {
          "id": 4,
          "keyword": "quora"
        }
      },
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "5",
        "_score": 0.0014755741,
        "_source": {
          "id": 5,
          "keyword": "your story"
        }
      }
    ]
  }
}
So what does it do behind the scenes?
It indexes your documents as whole keywords (it emits the whole string as a single token). I've also added the asciifolding filter, so it normalizes letters (i.e. é becomes e), and the lowercase filter (case-insensitive search). So, for instance, Draft kings is indexed as draft kings.
The search analyzer uses the same logic, except that its tokenizer emits word tokens and, on top of that, creates shingles (combinations of tokens), which will match the keywords indexed in the first step.
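One last hedged caveat for readers on newer clusters: index_analyzer was removed in Elasticsearch 2.0, and the string type was replaced by text/keyword in 5.0, so on a modern version the equivalent field mapping would look something like this:

"mappings": {
  "properties": {
    "keyword": {
      "type": "text",
      "analyzer": "my_analyzer_keyword",
      "search_analyzer": "my_analyzer_shingle"
    }
  }
}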