ElasticSearch: What is the param limit in painless scripting? - elasticsearch

I will have documents with the following data -
1. id
2. user_id
3. online_hr
4. offline_hr
My use case is the following -
I wish to sort the users who are active using online_hr field,
While I want to sort the users who are inactive using the offline_hr field.
I am planning to use ElasticSearch painless script for this use case,
I will have using 2 arrays of online_user_list and offline_user_list into the script params,
And I plan to compare each document's user_id,
if it is present in the either of the params lists and sort accordingly.
I want to know if there is any limit to the param object,
As the userbase may be in 100s of thousands,
And if passing 2 lists of that size in the ES scripting params would be troublesome?
And if there is any better approach?
Query to add data -
POST /products/_doc/1
{
"id":1,
"user_id" : "1",
"online_hr" : "1",
"offline_hr" : "2"
}
Sample data -
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "products",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : 1,
"user_id" : "1",
"online_hr" : "1",
"offline_hr" : "2"
}
}
]
}
}
Mapping -
{
"products" : {
"aliases" : { },
"mappings" : {
"properties" : {
"id" : {
"type" : "long"
},
"offline_hr" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"online_hr" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"user_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1566466257331",
"number_of_shards" : "1",
"number_of_replicas" : "1",
"uuid" : "g2F3UlxQSseHRisVinulYQ",
"version" : {
"created" : "7020099"
},
"provided_name" : "products"
}
}
}
}
I found Painless scripts have a default size limit of 65,535 bytes,
while the ElasticSearch compiler had a limit of 16834 characters
Reference -
https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-walkthrough.html
https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-sort-context.html

Related

ElasticSearch - knn search. Sometimes returns _score = null

The more I pass an array to knn_vetcors, the more sources have _score=null
For example - I sent array with length 2 and I got 3 results with valid _score. But if i sent array with length 60 I got all results with _score is null
Request
{
"_source":[],
"collapse":{
"field":"id"
},
"query":{
"knn":{
"vector":{
"k":10,
"vector":[
0,
// array size - 46
0
]
}
}
},
"size":100,
"track_scores":false
}
Response (first and second scores is null but third is float)
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "sb_index_images_ba7587a1-35ab-482f-93d8-a433dd132556_1667904180",
"_type" : "_doc",
"_id" : "207445df53a7b54c76ff76c0bec352c9",
"_score" : null,
"fields" : {
"id" : [
"377007"
]
}
},
{
"_index" : "sb_index_images_ba7587a1-35ab-482f-93d8-a433dd132556_1667904180",
"_type" : "_doc",
"_id" : "ea374a9b90d83ab93a77fb03226cafd3",
"_score" : null,
"fields" : {
"id" : [
"377009"
]
}
},
{
"_index" : "sb_index_images_ba7587a1-35ab-482f-93d8-a433dd132556_1667904180",
"_type" : "_doc",
"_id" : "1f93035d08e2b7af7d482a89f36e3c7c",
"_score" : 0.134376,
"fields" : {
"id" : [
"377014"
]
}
}
]
}
}
Mapping my index
{
"sb_index_images_ba7587a1-35ab-482f-93d8-a433dd132556_1667904180" : {
"mappings" : {
"properties" : {
"colors" : {
"type" : "long"
},
"colors_vector" : {
"type" : "knn_vector",
"dimension" : 9
},
"id" : {
"type" : "keyword"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"params" : {
"properties" : {
"0d23f34d9f2168ab98e5542149eb2f3d" : {
"properties" : {
"name" : {
"type" : "keyword",
"ignore_above" : 256
},
"value" : {
"type" : "keyword",
"eager_global_ordinals" : true,
"ignore_above" : 256,
"fields" : {
"float" : {
"type" : "float",
"ignore_malformed" : true
}
}
}
}
}
}
},
"vector" : {
"type" : "knn_vector",
"dimension" : 2048
}
}
}
}
}

How to count the number of repetitions of a specific word in specific fields of each document in the ElasticSearch index?

I'm pretty new is ElasticSearch and will be thankful for the help.
I have an index.
It's an example of data:
{
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1834,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "profile_similarity",
"_id" : "9c346fe0-253b-4c68-8f11-97bbb18d9c9a",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Salt Lake City Metropolitan Area",
"headline" : "Product Manager"
}
},
{
"_index" : "profile_similarity",
"_id" : "e97cdbe8-445f-49f0-b659-6a19829a0a14",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Los Angeles",
"headline" : "K2 & Amazon, Smarter King, LLC."
}
},
{
"_index" : "profile_similarity",
"_id" : "a7a69710-4fad-4b7d-88e4-bd0873e6fd03",
"_score" : 1.0,
"_source" : {
"country" : "CA",
"city" : "Greater Toronto Area",
"headline" : "Senior Product Manager"
}
}
]
}
}
Its mappings:
{
"profile_similarity_ivan" : {
"mappings" : {
"properties" : {
"city" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"fielddata" : true
},
"country" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"fielddata" : true
},
"headline" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"fielddata" : true
}
}
}
}
}
I would like for fields country and headline to count a number of specific words.
For example, if I search for 'US', an output might be like this:
{
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1834,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "profile_similarity",
"_id" : "9c346fe0-253b-4c68-8f11-97bbb18d9c9a",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Salt Lake City Metropolitan Area",
"headline" : "Product Manager",
"country_count_US" : 1,
"headline_count_US" : 0
}
},
{
"_index" : "profile_similarity",
"_id" : "e97cdbe8-445f-49f0-b659-6a19829a0a14",
"_score" : 1.0,
"_source" : {
"country" : "US",
"city" : "Los Angeles",
"headline" : "K2 & Amazon, Smarter King, LLC.",
"country_count_US" : 1,
"headline_count_US" : 0
}
},
{
"_index" : "profile_similarity",
"_id" : "a7a69710-4fad-4b7d-88e4-bd0873e6fd03",
"_score" : 1.0,
"_source" : {
"country" : "CA",
"city" : "Greater Toronto Area",
"headline" : "Senior Product Manager",
"country_count_US" : 0,
"headline_count_US" : 0
}
}
]
}
}
I notice that it can be done using runtime fields in ElasticSearch and scripting with painless
In general, I have issues with writing the painless script for this task.
Can you help me please write this script and create the right query in ElasticSearch for this task please?
Also will be thankful for any advice for this task can be finished by other functionality (not only by runtime fields) of ElasticSearch.
Thanks
This can be done but you need to fix three things.
You seem not to have created a mapping for your index, what you show look like the dynamic mappings ES assigns on its own to any given field. Even with your current mappings, you can simply run a terms aggregation on the results of your query and you will get the count of the words that you need. Just pass them as individual terms to be aggregated. Something like this will give you some output.
GET _search
{
"query": {
"match": {
"Country": "US"
}
},
"aggs": {
"country_count": {
"composite" : {
"sources" : [
{"country" : {"terms" : {"field" : "country"}}},
{"id" : {"terms" : {"field" : "_id", "include" : "US"}}}
]
}
}
}
}
The compostie aggregation will return PER DOCUMENT, how many times the word "US" has come.
Just go look at the docs about how to paginate the composite aggregation. This way you can get all the required counts for EVERY SINGLE DOCUMENT.
Composite Aggregation
Generally aggregations are used to get such answers. You may need to tweak the mappings of the fields, to use different analyzers(whitespace).
But generally you just need to use terms aggregations.
HTH.

How to get the exact match of using DSL

How mapping have role to find the search??
GET courses/_search
return is below
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0226655,
"hits" : [
{
"_index" : "courses",
"_type" : "classroom",
"_id" : "7",
"_score" : 1.0226655,
"_source" : {
"name" : "Computer Internals 250",
"room" : "C8",
"professor" : {
"name" : "Gregg Va",
"department" : "engineering",
"facutly_type" : "part-time",
"email" : "payneg#onuni.com"
},
"students_enrolled" : 33,
"course_publish_date" : "2012-08-20",
"course_description" : "cpt Int 250 gives students an integrated and rigorous picture of applied computer science, as it comes to play in the construction of a simple yet powerful computer system. "
}
},
{
"_index" : "courses",
"_type" : "classroom",
"_id" : "4",
"_score" : 0.2876821,
"_source" : {
"name" : "Computer Science 101",
"room" : "C12",
"professor" : {
"name" : "Gregg Payne",
"department" : "engineering",
"facutly_type" : "full-time",
"email" : "payneg#onuni.com"
},
"students_enrolled" : 33,
"course_publish_date" : "2013-08-27",
"course_description" : "CS 101 is a first year computer science introduction teaching fundamental data structures and algorithms using python. "
}
}
]
}
}
mapping is below
{
"courses" : {
"mappings" : {
"properties" : {
"course_description" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"course_publish_date" : {
"type" : "date"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"professor" : {
"properties" : {
"department" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"email" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"facutly_type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"room" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"students_enrolled" : {
"type" : "long"
}
}
}
}
}
I need to return the exact match phrase professor.name=Gregg Payne
I tried below query as per direction from https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html
GET courses/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"professor.name" : "Gregg Payne"
}
}
}
}
}
Based on your mapping, here is the query that shall work for you -
POST http://localhost:9200/courses/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"professor.name.keyword" : "Gregg Payne"
}
}
}
}
}
Answering your question in the comments - search is always about mappings :) In your case you use Term query which is about searching for exact values and it needs a keyword field. Text fields get analyzed:
Avoid using the term query for text fields.
By default, Elasticsearch changes the values of text fields as part of
analysis. This can make finding exact matches for text field values
difficult.
To search text field values, use the match query instead

How to sort fields with elasticsearch 7

I tried to sort results with a title but it didn't work properly.
Query :
GET /products/_search
{
"sort": [
{ "title.keyword": { "order": "desc" }}
],
"query": {
....
},
}
Mapping
"mappings" : {
"properties" : {
...
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
...
}
}
Results
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 826,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "products",
"_type" : "_doc",
"_id" : "1457580605505",
"_score" : null,
"_source" : {
"id" : 1457580605505,
"title" : "Étui-portefeuille multifonction pour iPhone", <-----
"body_html" : "description here",
after googling I didn't found the right answer for my case. maybe because I'm using ES7 and solution giving not compatible with it.
I have multiple products start with Z...
thanks
É is after Z in character sorting. ( É is different from E ). When you want to sort on some string in elastic you should apply a normalizer to your field to achieve natural sorting.
You should go to this documentation page : normalizer
In your case since you use french language, your normalizer should be composed of lowercase and ascii_folding filters. So the example in the documentation page should perfectly match your needs.

Elasticsearch wildcard query with spaces

I'm trying to do a wildcard query with spaces. It easily matches the words on term basis but not on field basis.
I've read the documentation which says that I need to have the field as not_analyzed but with this type set, it returns nothing.
This is the mapping with which it works on term basis:
{
"denshop" : {
"mappings" : {
"products" : {
"properties" : {
"code" : {
"type" : "string"
},
"id" : {
"type" : "long"
},
"name" : {
"type" : "string"
},
"price" : {
"type" : "long"
},
"url" : {
"type" : "string"
}
}
}
}
}
}
This is the mapping with which the exact same query returns nothing:
{
"denshop" : {
"mappings" : {
"products" : {
"properties" : {
"code" : {
"type" : "string"
},
"id" : {
"type" : "long"
},
"name" : {
"type" : "string",
"index" : "not_analyzed"
},
"price" : {
"type" : "long"
},
"url" : {
"type" : "string"
}
}
}
}
}
}
The query is here:
curl -XPOST http://127.0.0.1:9200/denshop/products/_search?pretty -d '{"query":{"wildcard":{"name":"*test*"}}}'
Response with the not_analyzed property:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Response without not_analyzed:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [ {
...
EDIT: Adding requested info
Here is the list of documents:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [ {
"_index" : "denshop",
"_type" : "products",
"_id" : "3L1",
"_score" : 1.0,
"_source" : {
"id" : 3,
"name" : "Testovací produkt 2",
"code" : "",
"price" : 500,
"url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt-2/"
}
}, {
"_index" : "denshop",
"_type" : "products",
"_id" : "4L1",
"_score" : 1.0,
"_source" : {
"id" : 4,
"name" : "Testovací produkt 3",
"code" : "",
"price" : 666,
"url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt-3/"
}
}, {
"_index" : "denshop",
"_type" : "products",
"_id" : "2L1",
"_score" : 1.0,
"_source" : {
"id" : 2,
"name" : "Testovací produkt",
"code" : "",
"price" : 500,
"url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt/"
}
}, {
"_index" : "denshop",
"_type" : "products",
"_id" : "5L1",
"_score" : 1.0,
"_source" : {
"id" : 5,
"name" : "Testovací produkt 4",
"code" : "",
"price" : 666,
"url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt-4/"
}
}, {
"_index" : "denshop",
"_type" : "products",
"_id" : "6L1",
"_score" : 1.0,
"_source" : {
"id" : 6,
"name" : "Testovací produkt 5",
"code" : "",
"price" : 666,
"url" : "http://www.denshop.lh/tricka-tilka-tuniky/testovaci-produkt-5/"
}
} ]
}
}
Without the not_analyzed it returns with this:
curl -XPOST http://127.0.0.1:9200/denshop/products/_search?pretty -d '{"query":{"wildcard":{"name":"*testovací*"}}}'
But not with this (notice the space before asterisk):
curl -XPOST http://127.0.0.1:9200/denshop/products/_search?pretty -d '{"query":{"wildcard":{"name":"*testovací *"}}}'
When I add the not_analyzed to mapping, it returns no hits no matter what I put in the wildcard query.
Add a custom analyzer that should lowercase the text. Then in your search query, before passing the text to it have it lowercased in your client application.
To, also, keep the original analysis chain, I've added a sub-field to your name field that will use the custom analyzer.
PUT /denshop
{
"settings": {
"analysis": {
"analyzer": {
"keyword_lowercase": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"products": {
"properties": {
"name": {
"type": "string",
"fields": {
"lowercase": {
"type": "string",
"analyzer": "keyword_lowercase"
}
}
}
}
}
}
}
And the query will work on the sub-field:
GET /denshop/products/_search
{
"query": {
"wildcard": {
"name.lowercase": "*testovací *"
}
}
}

Resources