I'm pretty new to Elasticsearch and would be thankful for the help.
I have an index. Here is an example of its data:
{
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1834,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "profile_similarity",
"_id" : "9c346fe0-253b-4c68-8f11-97bbb18d9c9a",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Salt Lake City Metropolitan Area",
"headline" : "Product Manager"
}
},
{
"_index" : "profile_similarity",
"_id" : "e97cdbe8-445f-49f0-b659-6a19829a0a14",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Los Angeles",
"headline" : "K2 & Amazon, Smarter King, LLC."
}
},
{
"_index" : "profile_similarity",
"_id" : "a7a69710-4fad-4b7d-88e4-bd0873e6fd03",
"_score" : 1.0,
"_source" : {
"country" : "CA",
"city" : "Greater Toronto Area",
"headline" : "Senior Product Manager"
}
}
]
}
}
Its mappings:
{
"profile_similarity_ivan" : {
"mappings" : {
"properties" : {
"city" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"fielddata" : true
},
"country" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"fielddata" : true
},
"headline" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"fielddata" : true
}
}
}
}
}
I would like to count the number of occurrences of specific words in the country and headline fields.
For example, if I search for 'US', the output might look like this:
{
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1834,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "profile_similarity",
"_id" : "9c346fe0-253b-4c68-8f11-97bbb18d9c9a",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Salt Lake City Metropolitan Area",
"headline" : "Product Manager",
"country_count_US" : 1,
"headline_count_US" : 0
}
},
{
"_index" : "profile_similarity",
"_id" : "e97cdbe8-445f-49f0-b659-6a19829a0a14",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Los Angeles",
"headline" : "K2 & Amazon, Smarter King, LLC.",
"country_count_US" : 1,
"headline_count_US" : 0
}
},
{
"_index" : "profile_similarity",
"_id" : "a7a69710-4fad-4b7d-88e4-bd0873e6fd03",
"_score" : 1.0,
"_source" : {
"country" : "CA",
"city" : "Greater Toronto Area",
"headline" : "Senior Product Manager",
"country_count_US" : 0,
"headline_count_US" : 0
}
}
]
}
}
I've noticed that this can be done with runtime fields in Elasticsearch and Painless scripting.
My main problem is writing the Painless script for this task.
Could you please help me write the script and build the right Elasticsearch query for it?
I'd also be thankful for any advice on how this can be done with other Elasticsearch functionality (not only runtime fields).
Thanks
This can be done, but there are a few things to sort out first.
You don't seem to have created an explicit mapping for your index; what you show looks like the dynamic mapping Elasticsearch assigns on its own to any given field. Even with your current mappings, you can simply run a terms aggregation on the results of your query and you will get the counts of the words that you need. Just pass them as individual terms to be aggregated. Something like this will give you some output:
GET _search
{
"query": {
"match": {
"Country": "US"
}
},
"aggs": {
"country_count": {
"composite" : {
"sources" : [
{"country" : {"terms" : {"field" : "country"}}},
{"id" : {"terms" : {"field" : "_id", "include" : "US"}}}
]
}
}
}
}
The composite aggregation will return, PER DOCUMENT, how many times the word "US" occurs.
Have a look at the docs on how to paginate the composite aggregation; this way you can get the required counts for EVERY SINGLE DOCUMENT.
Composite Aggregation
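For example, to page through you take the after_key returned with each response and pass it back as after in the next request. A minimal sketch, reusing the sources from the query above (the after values shown here are only illustrative, taken from the sample response):
GET _search
{
  "size": 0,
  "query": {
    "match": {
      "country": "US"
    }
  },
  "aggs": {
    "country_count": {
      "composite": {
        "size": 100,
        "sources": [
          { "country": { "terms": { "field": "country" } } },
          { "id": { "terms": { "field": "_id" } } }
        ],
        "after": { "country": "US", "id": "e97cdbe8-445f-49f0-b659-6a19829a0a14" }
      }
    }
  }
}
Repeat the request, updating after each time, until the response no longer contains an after_key.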
Generally, aggregations are used to get such answers. You may need to tweak the mappings of the fields to use a different analyzer (e.g. whitespace), but in most cases a terms aggregation is all you need.
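If you do want the counts returned per hit, as in your desired output, here is a minimal sketch of the runtime-field approach you mentioned. It assumes Elasticsearch 7.11+ (where runtime fields and the fields option are available); the field name country_count_US and the hard-coded word "US" are only for illustration, and the computed values come back under "fields" in each hit rather than inside "_source":
GET profile_similarity/_search
{
  "query": { "match_all": {} },
  "runtime_mappings": {
    "country_count_US": {
      "type": "long",
      "script": {
        "source": """
          // count how many whitespace-separated tokens equal the (illustrative) word 'US'
          String s = doc['country.keyword'].size() == 0 ? '' : doc['country.keyword'].value;
          int count = 0;
          for (String token : s.splitOnToken(' ')) {
            if (token == 'US') {
              count++;
            }
          }
          emit(count);
        """
      }
    }
  },
  "fields": ["country_count_US"]
}
A second runtime field over headline.keyword would work the same way. Note that this counts exact whitespace-separated tokens; the aggregation approach above is usually the better fit when you need such counts for many words at once.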
HTH.
Here is my search query:
GET /bank/_search?q=*&sort=account_number:asc&pretty
which matches all of the 1000 docs in the bank index:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank LRl6fcZsSR6a0BMxIAQzIA 1 1 1000 0 414.3kb 414.3kb
green open .kibana_task_manager 2hiY91XzQQKAzmnXhpQLTA 1 0 2 0 12.8kb 12.8kb
green open .kibana_1 G4vY0_JASzqERwKlbqMqAg 1 0 4 0 14.7kb 14.7kb
yellow open customer 0B2gsBy3Rp-5vkMFhto-Wg 1 1 2 0 6.7kb 6.7kb
Below are my search results. Under "hits" at the top, you can see that there were 1000 hits, which is what I expected (all the _docs). Yet Kibana only displays 10 of the hits. Where are the rest?
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "0",
"_score" : null,
"_source" : {
"account_number" : 0,
"balance" : 16623,
"firstname" : "Bradshaw",
"lastname" : "Mckenzie",
"age" : 29,
"gender" : "F",
"address" : "244 Columbus Place",
"employer" : "Euron",
"email" : "bradshawmckenzie#euron.com",
"city" : "Hobucken",
"state" : "CO"
},
"sort" : [
0
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "amberduke#pyrami.com",
"city" : "Brogan",
"state" : "IL"
},
"sort" : [
1
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "2",
"_score" : null,
"_source" : {
"account_number" : 2,
"balance" : 28838,
"firstname" : "Roberta",
"lastname" : "Bender",
"age" : 22,
"gender" : "F",
"address" : "560 Kingsway Place",
"employer" : "Chillium",
"email" : "robertabender#chillium.com",
"city" : "Bennett",
"state" : "LA"
},
"sort" : [
2
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"account_number" : 3,
"balance" : 44947,
"firstname" : "Levine",
"lastname" : "Burks",
"age" : 26,
"gender" : "F",
"address" : "328 Wilson Avenue",
"employer" : "Amtap",
"email" : "levineburks#amtap.com",
"city" : "Cochranville",
"state" : "HI"
},
"sort" : [
3
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "4",
"_score" : null,
"_source" : {
"account_number" : 4,
"balance" : 27658,
"firstname" : "Rodriquez",
"lastname" : "Flores",
"age" : 31,
"gender" : "F",
"address" : "986 Wyckoff Avenue",
"employer" : "Tourmania",
"email" : "rodriquezflores#tourmania.com",
"city" : "Eastvale",
"state" : "HI"
},
"sort" : [
4
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "5",
"_score" : null,
"_source" : {
"account_number" : 5,
"balance" : 29342,
"firstname" : "Leola",
"lastname" : "Stewart",
"age" : 30,
"gender" : "F",
"address" : "311 Elm Place",
"employer" : "Diginetic",
"email" : "leolastewart#diginetic.com",
"city" : "Fairview",
"state" : "NJ"
},
"sort" : [
5
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "6",
"_score" : null,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond#netagy.com",
"city" : "Dante",
"state" : "TN"
},
"sort" : [
6
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "7",
"_score" : null,
"_source" : {
"account_number" : 7,
"balance" : 39121,
"firstname" : "Levy",
"lastname" : "Richard",
"age" : 22,
"gender" : "M",
"address" : "820 Logan Street",
"employer" : "Teraprene",
"email" : "levyrichard#teraprene.com",
"city" : "Shrewsbury",
"state" : "MO"
},
"sort" : [
7
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "8",
"_score" : null,
"_source" : {
"account_number" : 8,
"balance" : 48868,
"firstname" : "Jan",
"lastname" : "Burns",
"age" : 35,
"gender" : "M",
"address" : "699 Visitation Place",
"employer" : "Glasstep",
"email" : "janburns#glasstep.com",
"city" : "Wakulla",
"state" : "AZ"
},
"sort" : [
8
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "9",
"_score" : null,
"_source" : {
"account_number" : 9,
"balance" : 24776,
"firstname" : "Opal",
"lastname" : "Meadows",
"age" : 39,
"gender" : "M",
"address" : "963 Neptune Avenue",
"employer" : "Cedward",
"email" : "opalmeadows#cedward.com",
"city" : "Olney",
"state" : "OH"
},
"sort" : [
9
]
}
]
}
}
Okay:
hits.hits – actual array of search results (defaults to first 10 documents)
You can control the number of results Kibana outputs like this:
GET /bank/_search
{
"query": { "match_all": {} },
"size": 50
}
If size isn't specified:
GET /bank/_search
{
"query": { "match_all": {} },
}
then size defaults to 10.
By default the size parameter is set to 10, which is why you can only see 10 results. To get more results you can adjust this parameter to your needs. Sometimes it is better to use the size parameter together with the from parameter and fetch results page by page, when you don't need all the data in one go.
So you can either use "size": 1000, or set "from": 0, "size": 100 to get the first 100 results and then keep sending the same query, changing only the value of the from parameter on each request. For example, to get the next 100 results, set "from": 100.
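For instance, the second page of 100 results (keeping the sort so that the paging order stays stable) would look like this:
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [ { "account_number": "asc" } ],
  "from": 100,
  "size": 100
}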
To get all 1000 results, add the size parameter as below:
{
"query":{
// your query here
},
"size": 1000
}
You can read more on from/size here.
As a query parameter, you can add size like this:
GET /bank/_search?q=*&sort=account_number:asc&size=1000&pretty
I'm pretty new to Elasticsearch and would like to write a query for all of the values of a specific field. I mean, say I have the fields "Number" and "change_manager_group": is there a query that lists all the Numbers for which "change_manager_group" = "Change Managers - 2"?
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 10,
"successful" : 10,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1700,
"max_score" : 1.0,
"hits" : [
{
"_index" : "test-tem-changes",
"_type" : "_doc",
"_id" : "CHG0393073_1554800400000",
"_score" : 1.0,
"_source" : {
"work_notes" : "",
"priority" : "4 - Low",
"planned_start" : 1554800400000,
"Updated_by" : "system",
"Updated" : 1554819333000,
"phase" : "Requested",
"Number" : "CHG0312373",
"change_manager_group" : "Change Managers - 1",
"approval" : "Approved",
"downtime" : "false",
"close_notes" : "",
"Standard_template_version" : "",
"close_code" : null,
"actual_start" : 1554819333000,
"closed_by" : "",
"Type" : "Normal"
}
},
{
"_index" : "test-tem-changes",
"_type" : "_doc",
"_id" : "CHG0406522_0",
"_score" : 1.0,
"_source" : {
"work_notes" : "",
"priority" : "4 - Low",
"planned_start" : 0,
"Updated_by" : "svcmdeploy_automation",
"Updated" : 1553320559000,
"phase" : "Requested",
"Number" : "CHG041232",
"change_manager_group" : "Change Managers - 2",
"approval" : "Approved",
"downtime" : "false",
"close_notes" : "Change Installed",
"Standard_template_version" : "",
"close_code" : "Successful",
"actual_start" : 1553338188000,
"closed_by" : "",
"Type" : "Automated"
}
},
{
"_index" : "test-tem-changes",
"_type" : "_doc",
"_id" : "CHG0406526_0",
"_score" : 1.0,
"_source" : {
"work_notes" : "",
"priority" : "4 - Low",
"planned_start" : 0,
"Updated_by" : "svcmdeploy_automation",
"Updated" : 1553321854000,
"phase" : "Requested",
"Number" : "CHG0412326",
"change_manager_group" : "Change Managers - 2",
"approval" : "Approved",
"downtime" : "false",
"close_notes" : "Change Installed",
"Standard_template_version" : "",
"close_code" : "Successful",
"actual_start" : 1553339629000,
"closed_by" : "",
"Type" : "Automated"
}
},
I tried this after a bit of googling, but it errors out:
curl -XGET "http://localhost:9200/test-tem-changes/_search?pretty=true" -H 'Content-Type: application/json' -d '
> {
> "query" : { "Number" : {"query" : "*"} }
> }
> '
What am I missing here?
To get all the documents where change_manager_group == "Change Managers - 2", you want to use a term query. Below I am wrapping it in a filter context so that it is faster (it does not compute relevance scores).
If change_manager_group is not a keyword-mapped field, you may have to use change_manager_group.keyword, depending on your mapping.
GET test-tem-changes/_search
{
"query": {
"bool": {
"filter": {
"term": {
"change_manager_group": "Change Managers - 2"
}
}
}
}
}
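For example, if change_manager_group was dynamically mapped as text with a keyword sub-field (the default for string values), the same query would target the sub-field:
GET test-tem-changes/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "change_manager_group.keyword": "Change Managers - 2"
        }
      }
    }
  }
}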
I'm currently trying to setup my suggestion implementation.
My index settings / mappings:
{
"settings" : {
"analysis" : {
"analyzer" : {
"trigrams" : {
"tokenizer" : "mesh_default_ngram_tokenizer",
"filter" : [ "lowercase" ]
},
"suggestor" : {
"type" : "custom",
"tokenizer" : "standard",
"char_filter" : [ "html_strip" ],
"filter" : [ "lowercase" ]
}
},
"tokenizer" : {
"mesh_default_ngram_tokenizer" : {
"type" : "nGram",
"min_gram" : "3",
"max_gram" : "3"
}
}
}
},
"mappings" : {
"default" : {
"properties" : {
"uuid" : {
"type" : "string",
"index" : "not_analyzed"
},
"language" : {
"type" : "string",
"index" : "not_analyzed"
},
"fields" : {
"properties" : {
"content" : {
"type" : "string",
"index" : "analyzed",
"analyzer" : "trigrams",
"fields" : {
"suggest" : {
"type" : "completion",
"analyzer" : "suggestor"
}
}
}
}
}
}
}
}
My query:
{
"suggest": {
"query-suggest" : {
"text" : "som",
"completion" : {
"field" : "fields.content.suggest"
}
}
},
"_source": ["fields.content", "uuid", "language"]
}
The query result:
{
"took" : 44,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 0.0,
"hits" : [ {
"_index" : "node-08c5d084d4e842b385d084d4e8a2b301-fe6212a62ad94590a212a62ad9759026-44874a2a8d2e4483874a2a8d2e44830c-draft",
"_type" : "default",
"_id" : "c6b7391075cc437ab7391075cc637a05-en",
"_score" : 0.0,
"_source" : {
"language" : "en",
"fields" : {
"content" : "This is<pre>another set of <strong>important</strong>content s<b>om</b>e text with more content you can poke a stick at"
},
"uuid" : "c6b7391075cc437ab7391075cc637a05"
}
}, {
"_index" : "node-08c5d084d4e842b385d084d4e8a2b301-fe6212a62ad94590a212a62ad9759026-44874a2a8d2e4483874a2a8d2e44830c-draft",
"_type" : "default",
"_id" : "96e2c6765b6841fea2c6765b6871fe36-en",
"_score" : 0.0,
"_source" : {
"language" : "en",
"fields" : {
"content" : "This is<pre>another set of <strong>important</strong>content no text with more content you can poke a stick at"
},
"uuid" : "96e2c6765b6841fea2c6765b6871fe36"
}
}, {
"_index" : "node-08c5d084d4e842b385d084d4e8a2b301-fe6212a62ad94590a212a62ad9759026-44874a2a8d2e4483874a2a8d2e44830c-draft",
"_type" : "default",
"_id" : "fd1472555e9d4d039472555e9d5d0386-en",
"_score" : 0.0,
"_source" : {
"language" : "en",
"fields" : {
"content" : "This is<pre>another set of <strong>important</strong>content someth<strong>ing</strong> completely different"
},
"uuid" : "fd1472555e9d4d039472555e9d5d0386"
}
}, {
"_index" : "node-08c5d084d4e842b385d084d4e8a2b301-fe6212a62ad94590a212a62ad9759026-44874a2a8d2e4483874a2a8d2e44830c-draft",
"_type" : "default",
"_id" : "5a3727b134064de4b727b134063de4c4-en",
"_score" : 0.0,
"_source" : {
"language" : "en",
"fields" : {
"content" : "This is<pre>another set of <strong>important</strong>content some<strong>what</strong> strange content"
},
"uuid" : "5a3727b134064de4b727b134063de4c4"
}
}, {
"_index" : "node-08c5d084d4e842b385d084d4e8a2b301-fe6212a62ad94590a212a62ad9759026-44874a2a8d2e4483874a2a8d2e44830c-draft",
"_type" : "default",
"_id" : "865257b6be4340c69257b6be4340c603-en",
"_score" : 0.0,
"_source" : {
"language" : "en",
"fields" : {
"content" : "This is<pre>another set of <strong>important</strong>content some <strong>more</strong> content you can poke a stick at too"
},
"uuid" : "865257b6be4340c69257b6be4340c603"
}
} ]
},
"suggest" : {
"query-suggest" : [ {
"text" : "som",
"offset" : 0,
"length" : 3,
"options" : [ ]
} ]
}
}
I'm currently using Elasticsearch 2.4.6 and can't upgrade at the moment.
There are 5 documents in my index and only 4 of them contain the word "some".
Why do I see 5 hits but no options?
The options are not empty if I start my suggest text with the first word of the field string (e.g. "this").
Is my usage of the suggest feature valid when dealing with fields that contain full HTML pages? I'm not sure whether the feature was meant to handle many tokens per document.
I already tried using the ngram tokenizer for my suggestor analyzer, but that did not change the situation. Any hint or feedback would be appreciated.
It seems that the issue I'm seeing is a restriction of completion suggesters:
Matching always starts at the beginning of the text. So, for example, “Smi” will match “Smith, Fed” but not “Fed Smith”. However, you could list both “Smith, Fed” and “Fed Smith” as two different inputs for the one output.
http://rea.tech/implementing-autosuggest-in-elasticsearch/
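One workaround the quoted docs hint at is to supply several inputs per document at index time. That does not work with a multi-field like fields.content.suggest (multi-fields cannot be fed their own values), so this sketch assumes a separate, hypothetical top-level completion field named content_suggest, using the Elasticsearch 2.x completion syntax:
# index name, document id and the content_suggest field are hypothetical
PUT my_suggest_index/default/1
{
  "language": "en",
  "content_suggest": {
    "input": [
      "some text with more content you can poke a stick at",
      "text with more content you can poke a stick at",
      "content you can poke a stick at"
    ]
  }
}
The suggest request would then point at content_suggest instead of fields.content.suggest, and "som" would match because "some text with more content..." is listed as its own input.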
I'm trying to do a wildcard query with spaces. It easily matches words on a term basis, but not across the whole field.
I've read the documentation, which says that I need to have the field as not_analyzed, but with that setting the query returns nothing.
This is the mapping with which it works on term basis:
{
"denshop" : {
"mappings" : {
"products" : {
"properties" : {
"code" : {
"type" : "string"
},
"id" : {
"type" : "long"
},
"name" : {
"type" : "string"
},
"price" : {
"type" : "long"
},
"url" : {
"type" : "string"
}
}
}
}
}
}
This is the mapping with which the exact same query returns nothing:
{
"denshop" : {
"mappings" : {
"products" : {
"properties" : {
"code" : {
"type" : "string"
},
"id" : {
"type" : "long"
},
"name" : {
"type" : "string",
"index" : "not_analyzed"
},
"price" : {
"type" : "long"
},
"url" : {
"type" : "string"
}
}
}
}
}
}
The query is here:
curl -XPOST http://127.0.0.1:9200/denshop/products/_search?pretty -d '{"query":{"wildcard":{"name":"*test*"}}}'
Response with the not_analyzed property:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Response without not_analyzed:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [ {
...
EDIT: Adding requested info
Here is the list of documents:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [ {
"_index" : "denshop",
"_type" : "products",
"_id" : "3L1",
"_score" : 1.0,
"_source" : {
"id" : 3,
"name" : "Testovací produkt 2",
"code" : "",
"price" : 500,
"url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt-2/"
}
}, {
"_index" : "denshop",
"_type" : "products",
"_id" : "4L1",
"_score" : 1.0,
"_source" : {
"id" : 4,
"name" : "Testovací produkt 3",
"code" : "",
"price" : 666,
"url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt-3/"
}
}, {
"_index" : "denshop",
"_type" : "products",
"_id" : "2L1",
"_score" : 1.0,
"_source" : {
"id" : 2,
"name" : "Testovací produkt",
"code" : "",
"price" : 500,
"url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt/"
}
}, {
"_index" : "denshop",
"_type" : "products",
"_id" : "5L1",
"_score" : 1.0,
"_source" : {
"id" : 5,
"name" : "Testovací produkt 4",
"code" : "",
"price" : 666,
"url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt-4/"
}
}, {
"_index" : "denshop",
"_type" : "products",
"_id" : "6L1",
"_score" : 1.0,
"_source" : {
"id" : 6,
"name" : "Testovací produkt 5",
"code" : "",
"price" : 666,
"url" : "http://www.denshop.lh/tricka-tilka-tuniky/testovaci-produkt-5/"
}
} ]
}
}
Without not_analyzed, it returns results with this:
curl -XPOST http://127.0.0.1:9200/denshop/products/_search?pretty -d '{"query":{"wildcard":{"name":"*testovací*"}}}'
But not with this (notice the space before the closing asterisk):
curl -XPOST http://127.0.0.1:9200/denshop/products/_search?pretty -d '{"query":{"wildcard":{"name":"*testovací *"}}}'
When I add not_analyzed to the mapping, it returns no hits no matter what I put in the wildcard query.
Add a custom analyzer that lowercases the text. Then, before passing the search text to the query, lowercase it in your client application (wildcard terms are not analyzed, so the case has to match what is stored in the index).
To also keep the original analysis chain, I've added a sub-field to your name field that uses the custom analyzer.
PUT /denshop
{
"settings": {
"analysis": {
"analyzer": {
"keyword_lowercase": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"products": {
"properties": {
"name": {
"type": "string",
"fields": {
"lowercase": {
"type": "string",
"analyzer": "keyword_lowercase"
}
}
}
}
}
}
}
And the query will work on the sub-field:
GET /denshop/products/_search
{
"query": {
"wildcard": {
"name.lowercase": "*testovací *"
}
}
}
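To sanity-check the analyzer, you can run one of the product names through the _analyze API and confirm it comes back as a single lowercased token:
GET /denshop/_analyze
{
  "analyzer": "keyword_lowercase",
  "text": "Testovací produkt 2"
}
This should return one token, testovací produkt 2, which is exactly the kind of value the wildcard pattern *testovací * can match against.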