How to sort fields with elasticsearch 7 - elasticsearch

I tried to sort results by title, but it doesn't work properly.
Query:
GET /products/_search
{
"sort": [
{ "title.keyword": { "order": "desc" }}
],
"query": {
....
}
}
Mapping
"mappings" : {
"properties" : {
...
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
...
}
}
Results
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 826,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "products",
"_type" : "_doc",
"_id" : "1457580605505",
"_score" : null,
"_source" : {
"id" : 1457580605505,
"title" : "Étui-portefeuille multifonction pour iPhone", <-----
"body_html" : "description here",
After googling, I didn't find the right answer for my case, maybe because I'm using ES7 and the solutions given are not compatible with it.
I have multiple products whose titles start with Z, so I expected those to come first.
Thanks

É sorts after Z in character (code point) order (É is a different character from E). When you want to sort on a string field in Elasticsearch, you should apply a normalizer to the field to get natural sorting.
See this documentation page: normalizer
In your case, since the text is French, your normalizer should be composed of the lowercase and asciifolding filters. The example in the documentation page should match your needs.
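To illustrate why the raw keyword sort puts É ahead of Z in descending order, and what folding changes, here is a minimal Python sketch. It only approximates the effect of a lowercase plus asciifolding normalizer on the client side; the helper name `fold` and the two extra titles are assumptions made up for the illustration, and the real normalizer is configured in the index settings rather than in application code.

```python
import unicodedata

def fold(text):
    """Approximate lowercase + ASCII folding: lowercase, decompose, drop combining accents."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

titles = [
    "Zebra case",                                        # hypothetical title
    "\u00c9tui-portefeuille multifonction pour iPhone",  # title from the question (starts with É)
    "apple cable",                                       # hypothetical title
]

# Raw code point order: É (U+00C9) is greater than Z (U+005A) and than any ASCII letter.
raw_desc = sorted(titles, reverse=True)

# Folded order: compare the normalized form instead of the raw string.
folded_desc = sorted(titles, key=fold, reverse=True)

print(raw_desc[0])     # the É title comes first
print(folded_desc[0])  # the Z title comes first
```

With the folded key, É is compared as e, so it lands between the a and z titles instead of after everything else.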

Related

How to count the number of repetitions of a specific word in specific fields of each document in the ElasticSearch index?

I'm pretty new to Elasticsearch and would be thankful for help.
I have an index.
Here is an example of its data:
{
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1834,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "profile_similarity",
"_id" : "9c346fe0-253b-4c68-8f11-97bbb18d9c9a",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Salt Lake City Metropolitan Area",
"headline" : "Product Manager"
}
},
{
"_index" : "profile_similarity",
"_id" : "e97cdbe8-445f-49f0-b659-6a19829a0a14",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Los Angeles",
"headline" : "K2 & Amazon, Smarter King, LLC."
}
},
{
"_index" : "profile_similarity",
"_id" : "a7a69710-4fad-4b7d-88e4-bd0873e6fd03",
"_score" : 1.0,
"_source" : {
"country" : "CA",
"city" : "Greater Toronto Area",
"headline" : "Senior Product Manager"
}
}
]
}
}
Its mappings:
{
"profile_similarity_ivan" : {
"mappings" : {
"properties" : {
"city" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"fielddata" : true
},
"country" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"fielddata" : true
},
"headline" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"fielddata" : true
}
}
}
}
}
For the country and headline fields, I would like to count the number of occurrences of a specific word.
For example, if I search for 'US', the output might look like this:
{
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1834,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "profile_similarity",
"_id" : "9c346fe0-253b-4c68-8f11-97bbb18d9c9a",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Salt Lake City Metropolitan Area",
"headline" : "Product Manager",
"country_count_US" : 1,
"headline_count_US" : 0
}
},
{
"_index" : "profile_similarity",
"_id" : "e97cdbe8-445f-49f0-b659-6a19829a0a14",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Los Angeles",
"headline" : "K2 & Amazon, Smarter King, LLC.",
"country_count_US" : 1,
"headline_count_US" : 0
}
},
{
"_index" : "profile_similarity",
"_id" : "a7a69710-4fad-4b7d-88e4-bd0873e6fd03",
"_score" : 1.0,
"_source" : {
"country" : "CA",
"city" : "Greater Toronto Area",
"headline" : "Senior Product Manager",
"country_count_US" : 0,
"headline_count_US" : 0
}
}
]
}
}
I noticed that this can be done using runtime fields in Elasticsearch and scripting with Painless.
In general, I have trouble writing the Painless script for this task.
Could you please help me write this script and build the right Elasticsearch query for it?
I would also be thankful for any advice on whether this task can be done with other Elasticsearch functionality (not only runtime fields).
Thanks
This can be done but you need to fix three things.
You do not seem to have created an explicit mapping for your index; what you show looks like the dynamic mapping that ES assigns on its own to any given field. Even with your current mappings, you can run a terms aggregation on the results of your query to get the counts of the words you need. Just pass them as individual terms to be aggregated. Something like this will give you some output.
GET _search
{
"query": {
"match": {
"country": "US"
}
},
"aggs": {
"country_count": {
"composite" : {
"sources" : [
{"country" : {"terms" : {"field" : "country"}}},
{"id" : {"terms" : {"field" : "_id", "include" : "US"}}}
]
}
}
}
}
The composite aggregation will return, per document, how many times the word "US" occurs.
See the docs on how to paginate the composite aggregation; that way you can get the required counts for every single document.
Composite Aggregation
Generally, aggregations are used to get such answers. You may need to tweak the mappings of the fields to use a different analyzer (for example, whitespace).
But generally you just need to use terms aggregations.
HTH.
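As one further option outside Elasticsearch itself, the per-document counts from the question's desired output could be computed client-side once the hits come back. This is only a sketch under that assumption: the helper names `count_word` and `add_counts` are hypothetical, the sample hits are copied from the question, and matching is whole-word and case-sensitive.

```python
import re

def count_word(text, word):
    """Count whole-word, case-sensitive occurrences of `word` in `text`."""
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return len(re.findall(pattern, text))

def add_counts(hits, fields, word):
    """Return a copy of each hit's _source with <field>_count_<word> keys added."""
    enriched = []
    for hit in hits:
        source = dict(hit["_source"])
        for field in fields:
            source[f"{field}_count_{word}"] = count_word(source.get(field, ""), word)
        enriched.append(source)
    return enriched

# Two hits taken from the sample data in the question.
hits = [
    {"_source": {"country": "US", "city": "Los Angeles",
                 "headline": "K2 & Amazon, Smarter King, LLC."}},
    {"_source": {"country": "CA", "city": "Greater Toronto Area",
                 "headline": "Senior Product Manager"}},
]

result = add_counts(hits, ["country", "headline"], "US")
for source in result:
    print(source["country"], source["country_count_US"], source["headline_count_US"])
```

This keeps the index untouched, at the cost of only counting within the hits that were actually fetched.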

Query consecutive words using match_phrase elasticsearch works unexpected

I have the parameter name as a text:
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
Because of how the text type works in Elasticsearch, a match query matches on any individual word of the phrase. That's why in some cases I get the following results:
POST /example-tags/_search
{
"query": {
"match": {
"name": "Jordan Rudess was born in 1956"
}
}
}
// Results
{
"took" : 28,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 2.1596613,
"hits" : [
{
"_index" : "example-tags",
"_type" : "_doc",
"_id" : "6101e538bc8ec610aff699e4",
"_score" : 4.1596613,
"_source" : {
"name" : "Jordan Rudess"
}
},
{
"_index" : "example-tags",
"_type" : "_doc",
"_id" : "610123538bc8ec61034ff699e4",
"_score" : 4.1796613,
"_source" : {
"name" : "Alice in Chains"
}
}
]
}
}
As you can see, for the text "Jordan Rudess was born in 1956" I get the result Alice in Chains just because of the word "in". I want to avoid this behaviour.
If I try:
POST /example-tags/_search
{
"query": {
"match_phrase": {
"name": "Dream Theater keyboardist's Jordan Rudess was born in 1956"
}
}
}
// Results
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
So, in the previous example I was expecting to get the Jordan Rudess tag name, but I get empty results.
I need to get the tag whose name matches the maximum number of consecutive words in the phrase. How can I achieve that?
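The two behaviours described in this question can be reproduced with a simplified model. The sketch below is not how Elasticsearch scores or executes queries; it assumes lowercasing and whitespace splitting as a stand-in for the analyzer, and the function names are hypothetical. It only shows why match hits on a single shared word while match_phrase needs every query word to be present, adjacent and in order.

```python
def tokens(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def match(field_value, query):
    """Model of match with the default OR operator: any shared token is enough."""
    return bool(set(tokens(field_value)) & set(tokens(query)))

def match_phrase(field_value, query):
    """Model of match_phrase with slop 0: all query tokens, adjacent and in order."""
    field, phrase = tokens(field_value), tokens(query)
    return any(
        field[i:i + len(phrase)] == phrase
        for i in range(len(field) - len(phrase) + 1)
    )

# "in" is shared, so the OR-style match succeeds.
print(match("Alice in Chains", "Jordan Rudess was born in 1956"))  # True

# The query is longer than the tag name, so the whole phrase can never be found in it.
long_query = "Dream Theater keyboardist's Jordan Rudess was born in 1956"
print(match_phrase("Jordan Rudess", long_query))  # False
```

In this model the phrase query only succeeds when the query is the shorter side, for example searching for "Jordan Rudess" in a longer field value.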

No matches when querying Elastic Search

I'm trying to run a query against Elasticsearch. When I run this query
GET accounts/_search/
{
"query": {
"term": {
"address_line_1": "1000"
}
}
}
I get back multiple records like
"hits" : [
{
"_index" : "accounts",
"_type" : "_doc",
"_id" : "...",
"_score" : 8.355149,
"_source" : {
"state_id" : 35,
"first_name" : "...",
"last_name" : "...",
"middle_name" : "P",
"dob" : "...",
"status" : "ACTIVE",
"address_line_1" : "1000 BROADROCK CT",
"address_line_2" : "",
"address_city" : "PARMA",
"address_zip" : "",
"address_zip_plus_4" : ""
}
},
But when I expand the search term as shown below, I don't get any matches
GET accounts/_search/
{
"query": {
"term": {
"address_line_1": "1000 B"
}
}
}
The response is
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
The term query looks for exact matches against the indexed terms. Your address_line_* fields were most probably indexed with the standard analyzer, which splits the text into separate tokens and lowercases all the letters (1000, broadrock, ct), so no single indexed term equals "1000 B" and the query cannot match.
So either use
GET accounts/_search/
{
"query": {
"match": { <--
"address_line_1": "1000 B"
}
}
}
which analyzes the search text as well and so does not really 'care' about B being lower or upper case, or adjust your field analyzers such that the capitalization is preserved.
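A rough way to see what the term query is compared against is to model the analysis step. The sketch below assumes the field uses the standard analyzer and approximates it with a regex split plus lowercasing; `standard_tokens` and `term_matches` are hypothetical helpers, not Elasticsearch APIs.

```python
import re

def standard_tokens(text):
    """Approximate the standard analyzer: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def term_matches(indexed_text, term):
    """A term query compares the search term, unanalyzed, to each indexed token."""
    return term in standard_tokens(indexed_text)

address = "1000 BROADROCK CT"

print(standard_tokens(address))            # ['1000', 'broadrock', 'ct']
print(term_matches(address, "1000"))       # True
print(term_matches(address, "1000 B"))     # False: no single token equals "1000 B"
print(term_matches(address, "BROADROCK"))  # False: tokens were lowercased
print(term_matches(address, "broadrock"))  # True
```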

elasticsearch does not return expected returns

I'm completely new to Elasticsearch. I tried the search API, but it's not returning what I expected.
What I did:
POST /test/_doc/1
{
"name": "Hello World"
}
GET /test/_doc/1
Response:
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 5,
"_seq_no" : 28,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "Hello World"
}
}
GET /test/_mapping
Response:
{
"test" : {
"mappings" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"query" : {
"properties" : {
"term" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
}
}
GET /test/_search
{
"query": {
"term": {
"name": "Hello"
}
}
}
Response:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
GET /test/_search
{
"query": {
"term": {
"name": "Hello World"
}
}
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
My elasticsearch version is 7.3.2
The last two searches should return document 1, is that correct? Why do they match nothing?
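The problem is that you are using term queries. Term queries are not analyzed, so Hello does not match the term hello in your index (note the case difference). Likewise, Hello World does not match because the text was indexed as the two separate terms hello and world.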
Unlike full-text queries, term-level queries do not analyze search terms. Instead, term-level queries match the exact terms stored in a field.
Reference
Match queries, on the other hand, also analyze the search term.
{
"query": {
"match": {
"name": "Hello"
}
}
}
You can use _analyze to check how your terms are indexed.
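As a sketch of what such an _analyze call could look like from a script, the snippet below builds the HTTP request with the Python standard library. The base URL `http://localhost:9200` is an assumption about a local, unsecured cluster, and `build_analyze_request` is a hypothetical helper; the request body uses the `field` and `text` parameters of the analyze API so that the analyzer configured for `name` is applied.

```python
import json
import urllib.request

def build_analyze_request(base_url, index, field, text):
    """Build (but do not send) a POST /<index>/_analyze request for the given field."""
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Assumed local cluster address; index and field names come from the question.
request = build_analyze_request("http://localhost:9200", "test", "name", "Hello World")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it (requires a running cluster):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The response lists the tokens produced for the text, which are the values a term query has to match exactly.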

ElasticSearch: What is the param limit in painless scripting?

I will have documents with the following data -
1. id
2. user_id
3. online_hr
4. offline_hr
My use case is the following -
I want to sort the users who are active by the online_hr field,
and the users who are inactive by the offline_hr field.
I am planning to use an Elasticsearch Painless script for this.
I would pass two arrays, online_user_list and offline_user_list, in the script params,
check whether each document's user_id is present in either list, and sort accordingly.
I want to know whether there is any limit on the params object,
since the user base may be in the hundreds of thousands,
and whether passing two lists of that size in the script params would be a problem.
Is there a better approach?
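The intended ordering can be sketched in plain Python to make the logic concrete. This is an illustration of the comparison described above, not a Painless script: `sort_key` is a hypothetical helper, documents 2 and 3 are made-up sample data, and it assumes a single combined ordering in which each user is ranked by whichever field applies to them. Because the hour fields are stored as strings in the mapping, they are converted to integers before comparing.

```python
def sort_key(doc, online_users):
    """Use online_hr for active users and offline_hr for everyone else."""
    field = "online_hr" if doc["user_id"] in online_users else "offline_hr"
    return int(doc[field])  # values are strings in the index, so convert first

docs = [
    {"id": 1, "user_id": "1", "online_hr": "1", "offline_hr": "2"},  # from the question
    {"id": 2, "user_id": "2", "online_hr": "5", "offline_hr": "0"},  # hypothetical
    {"id": 3, "user_id": "3", "online_hr": "4", "offline_hr": "9"},  # hypothetical
]

# A set gives constant-time membership checks, unlike scanning a list per document.
online_users = {"1", "3"}

ordered = sorted(docs, key=lambda doc: sort_key(doc, online_users))
print([doc["id"] for doc in ordered])  # [2, 1, 3]
```

The per-document membership check is the part that becomes expensive when the lists hold hundreds of thousands of ids.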
Query to add data -
POST /products/_doc/1
{
"id":1,
"user_id" : "1",
"online_hr" : "1",
"offline_hr" : "2"
}
Sample data -
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "products",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : 1,
"user_id" : "1",
"online_hr" : "1",
"offline_hr" : "2"
}
}
]
}
}
Mapping -
{
"products" : {
"aliases" : { },
"mappings" : {
"properties" : {
"id" : {
"type" : "long"
},
"offline_hr" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"online_hr" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"user_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1566466257331",
"number_of_shards" : "1",
"number_of_replicas" : "1",
"uuid" : "g2F3UlxQSseHRisVinulYQ",
"version" : {
"created" : "7020099"
},
"provided_name" : "products"
}
}
}
}
I found that Painless scripts have a default size limit of 65,535 bytes,
while the Elasticsearch compiler had a limit of 16834 characters.
Reference -
https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-walkthrough.html
https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-sort-context.html
