How to get Elasticsearch boolean match working for multiple fields

I need some expert guidance on trying to get a bool match working. I'd like the query to only return a successful search result if both 'message' matches 'Failed password for', and 'path' matches '/var/log/secure'.
This is my query:
curl -s -XGET 'http://localhost:9200/logstash-2015.05.07/syslog/_search?pretty=true' -d '{
  "filter" : { "range" : { "@timestamp" : { "gte" : "now-1h" } } },
  "query" : {
    "bool" : {
      "must" : [
        { "match_phrase" : { "message" : "Failed password for" } },
        { "match_phrase" : { "path" : "/var/log/secure" } }
      ]
    }
  }
}'
Here is the start of the output from the search:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 46,
    "max_score" : 13.308596,
    "hits" : [ {
      "_index" : "logstash-2015.05.07",
      "_type" : "syslog",
      "_id" : "AU0wzLEqqCKq_IPSp_8k",
      "_score" : 13.308596,
      "_source":{"message":"May 7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
    }, ...
The problem: if I change '/var/log/secure' to just 'var' and run the query, I still get a result, just with a lower score. I understood the bool ... must construct to mean that both match clauses would have to succeed. What I'm after is no result at all unless 'path' exactly matches '/var/log/secure'...
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 46,
    "max_score" : 10.354593,
    "hits" : [ {
      "_index" : "logstash-2015.05.07",
      "_type" : "syslog",
      "_id" : "AU0wzLEqqCKq_IPSp_8k",
      "_score" : 10.354593,
      "_source":{"message":"May 7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
    }, ...
I checked the mappings for these fields to verify that they are not analyzed:
curl -X GET 'http://localhost:9200/logstash-2015.05.07/_mapping?pretty=true'
I think these fields are not analyzed, and so I believe the search terms will not be analyzed either (based on some training documentation I read recently from Elasticsearch). Here is a snippet of the _mapping output for this index below.
....
"message" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
},
"path" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
},
....
Where am I going wrong, or what am I misunderstanding here?

As mentioned in the OP, you would need to query the "not_analyzed" view of the fields; per the OP's mapping, the not-analyzed versions of the fields are message.raw and path.raw.
Example:
{
  "filter" : { "range" : { "@timestamp" : { "gte" : "now-1h" } } },
  "query" : {
    "bool" : {
      "must" : [
        { "match_phrase" : { "message.raw" : "Failed password for" } },
        { "match_phrase" : { "path.raw" : "/var/log/secure" } }
      ]
    }
  }
}
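Note that message.raw holds the entire log line, so an exact match on it for the fragment "Failed password for" would not match anything; keeping the phrase match on the analyzed message field and putting the exact-value constraint on path.raw may be closer to the stated goal. A minimal sketch under that assumption:
{
  "query" : {
    "bool" : {
      "must" : [
        { "match_phrase" : { "message" : "Failed password for" } },
        { "term" : { "path.raw" : "/var/log/secure" } }
      ]
    }
  }
}
With this, changing '/var/log/secure' to 'var' returns no hits, because the term query compares the whole untokenized path.raw value.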
See the multi-fields documentation for more insight.
To expand further:
The mapping in the OP for path is as follows:
"path" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
}
This specifies that the path field uses the default analyzer, while path.raw is not analyzed.
If you instead want the path field itself to be not analyzed, with the raw sub-field analyzed, it would be something along these lines:
"path" : {
  "type" : "string",
  "index" : "not_analyzed",
  "norms" : {
    "enabled" : false
  },
  "fields" : {
    "raw" : {
      "type" : "string",
      "index" : <whatever analyzer you want>,
      "ignore_above" : 256
    }
  }
}
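If you do re-map path itself as not_analyzed, the exact-match clause can then target the field directly, e.g. { "term" : { "path" : "/var/log/secure" } } in place of the path.raw clause shown above.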

Related

How to get an exact match using the DSL

What role does the mapping play in the search?
GET courses/_search
The response is below:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0226655,
    "hits" : [
      {
        "_index" : "courses",
        "_type" : "classroom",
        "_id" : "7",
        "_score" : 1.0226655,
        "_source" : {
          "name" : "Computer Internals 250",
          "room" : "C8",
          "professor" : {
            "name" : "Gregg Va",
            "department" : "engineering",
            "facutly_type" : "part-time",
            "email" : "payneg@onuni.com"
          },
          "students_enrolled" : 33,
          "course_publish_date" : "2012-08-20",
          "course_description" : "cpt Int 250 gives students an integrated and rigorous picture of applied computer science, as it comes to play in the construction of a simple yet powerful computer system. "
        }
      },
      {
        "_index" : "courses",
        "_type" : "classroom",
        "_id" : "4",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "Computer Science 101",
          "room" : "C12",
          "professor" : {
            "name" : "Gregg Payne",
            "department" : "engineering",
            "facutly_type" : "full-time",
            "email" : "payneg@onuni.com"
          },
          "students_enrolled" : 33,
          "course_publish_date" : "2013-08-27",
          "course_description" : "CS 101 is a first year computer science introduction teaching fundamental data structures and algorithms using python. "
        }
      }
    ]
  }
}
The mapping is below:
{
  "courses" : {
    "mappings" : {
      "properties" : {
        "course_description" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "course_publish_date" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "professor" : {
          "properties" : {
            "department" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "email" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "facutly_type" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "room" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "students_enrolled" : {
          "type" : "long"
        }
      }
    }
  }
}
I need to return an exact match for the phrase professor.name=Gregg Payne.
I tried the query below, following https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html:
GET courses/_search
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "term" : {
          "professor.name" : "Gregg Payne"
        }
      }
    }
  }
}
Based on your mapping, here is a query that should work for you -
POST http://localhost:9200/courses/_search
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "term" : {
          "professor.name.keyword" : "Gregg Payne"
        }
      }
    }
  }
}
Answering your question in the comments - search is always about mappings :) In your case you use the term query, which is meant for searching exact values and needs a keyword field, because text fields get analyzed:
"Avoid using the term query for text fields. By default, Elasticsearch changes the values of text fields as part of analysis. This can make finding exact matches for text field values difficult. To search text field values, use the match query instead."
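Per that advice, a match_phrase query is the analyzed-field counterpart if you ever need to search professor.name itself rather than its keyword sub-field; a sketch, which here matches only the document whose professor.name is exactly "Gregg Payne":
GET courses/_search
{
  "query" : {
    "match_phrase" : {
      "professor.name" : "Gregg Payne"
    }
  }
}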

ElasticSearch: What is the param limit in painless scripting?

I will have documents with the following data -
1. id
2. user_id
3. online_hr
4. offline_hr
My use case is the following -
I wish to sort the users who are active by the online_hr field, while sorting the users who are inactive by the offline_hr field.
I am planning to use an Elasticsearch Painless script for this use case:
I will pass 2 arrays, online_user_list and offline_user_list, in the script params,
and I plan to compare each document's user_id against them,
sorting by the appropriate field depending on which list it appears in.
I want to know whether there is any limit on the params object,
as the userbase may be in the hundreds of thousands,
and whether passing 2 lists of that size in the script params would cause trouble.
Is there a better approach?
Query to add data -
POST /products/_doc/1
{
  "id" : 1,
  "user_id" : "1",
  "online_hr" : "1",
  "offline_hr" : "2"
}
Sample data -
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "products",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "id" : 1,
          "user_id" : "1",
          "online_hr" : "1",
          "offline_hr" : "2"
        }
      }
    ]
  }
}
Mapping -
{
  "products" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "id" : {
          "type" : "long"
        },
        "offline_hr" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "online_hr" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "user_id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1566466257331",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "g2F3UlxQSseHRisVinulYQ",
        "version" : {
          "created" : "7020099"
        },
        "provided_name" : "products"
      }
    }
  }
}
I found that Painless scripts have a default size limit of 65,535 bytes, while the Elasticsearch compiler has a limit of 16834 characters.
References -
https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-walkthrough.html
https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-sort-context.html
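For reference, a minimal sketch of the script-based sort described in the question, using the mapping above (field names come from the question; the params values are hypothetical placeholders):
GET products/_search
{
  "sort" : {
    "_script" : {
      "type" : "number",
      "order" : "desc",
      "script" : {
        "lang" : "painless",
        "params" : {
          "online_user_list" : [ "1", "42" ],
          "offline_user_list" : [ "7" ]
        },
        "source" : "String uid = doc['user_id.keyword'].value; if (params.online_user_list.contains(uid)) { return Double.parseDouble(doc['online_hr.keyword'].value); } else { return Double.parseDouble(doc['offline_hr.keyword'].value); }"
      }
    }
  }
}
Since the params lists are serialized into every search request, lists with hundreds of thousands of entries inflate request size and per-document lookup cost; indexing an "active" flag on each document (updated out of band) and sorting on the corresponding field directly would likely scale better.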

Elasticsearch range query does not work with icu_collation for Turkish words

I have documents with Turkish words like "şa, za, sb, şc, sd, şe" etc. in the customer_address property.
I indexed my documents as shown below because I want to order documents by the customer_address field. Sorting works well.
Sorting and Collations
Now I'm trying to apply a range query over the "customer_address" field. When I send the query below, I get an empty result. (expected result: sb, sd, şa, şd)
curl -XGET http://localhost:9200/sampleindex/_search?pretty -d '{"query":{"bool":{"filter":[{"range":{"customer_address.sort":{"from":"plaj","to":"şcam","include_lower":true,"include_upper":true,"boost":1.0}}}],"disable_coord":false,"adjust_pure_negative":true,"boost":1.0}}}'
When I queried, I saw that my field values are stored encoded (as collation keys), as described in the documentation.
curl -XGET http://localhost:9200/sampleindex/_search?pretty -d '{"aggs":{"myaggregation":{"terms":{"field":"customer_address.sort","size":10000}}},"size":0}'
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "myaggregation" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "⚕䁁䀠怀\u0001",
          "doc_count" : 1
        },
        {
          "key" : "⚗䁁䀠怀\u0001",
          "doc_count" : 1
        },
        {
          "key" : "✁ੀ⃀ၠ\u0000\u0000",
          "doc_count" : 1
        },
        {
          "key" : "✁ୀ⃀ၠ\u0000\u0000",
          "doc_count" : 1
        },
        {
          "key" : "✁ీ⃀ၠ\u0000\u0000",
          "doc_count" : 1
        },
        {
          "key" : "ⶔ䁁䀠怀\u0001",
          "doc_count" : 1
        }
      ]
    }
  }
}
So, how should I send my parameters in the range query to get a successful result?
Thanks in advance.
My Mapping:
curl -XGET http://localhost:9200/sampleindex?pretty
{
  "sampleindex" : {
    "aliases" : { },
    "mappings" : {
      "invoice" : {
        "properties" : {
          "customer_address" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword"
              },
              "sort" : {
                "type" : "text",
                "analyzer" : "turkish",
                "fielddata" : true
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "provided_name" : "sampleindex",
        "max_result_window" : "2147483647",
        "creation_date" : "1521732167023",
        "analysis" : {
          "filter" : {
            "turkish_phonebook" : {
              "variant" : "@collation=phonebook",
              "country" : "TR",
              "language" : "tr",
              "type" : "icu_collation"
            },
            "turkish_lowercase" : {
              "type" : "lowercase",
              "language" : "turkish"
            }
          },
          "analyzer" : {
            "turkish" : {
              "filter" : [
                "turkish_lowercase",
                "turkish_phonebook"
              ],
              "tokenizer" : "keyword"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "ChNGX459TUi8VnBLTMn-Ng",
        "version" : {
          "created" : "5020099"
        }
      }
    }
  }
}
I solved my problem by defining an analyzer with a char filter during index creation. I don't know whether it is a good solution, but I could not solve it with ICU's "turkish_phonebook", so this solution seems to work for now.
First, I created an index with a "turkish_collation_analyzer". Then, for the properties that need it, I added a field "property.tr" that uses this analyzer. Finally, during range queries, I converted my values to the form this field expects.
"settings": {
"index": {
"number_of_shards": "5",
"provided_name": "sampleindex",
"max_result_window": "2147483647",
"creation_date": "1522050241730",
"analysis": {
"analyzer": {
"turkish_collation_analyzer": {
"char_filter": [
"turkish_char_filter"
],
"tokenizer": "keyword"
}
},
"char_filter": {
"turkish_char_filter": {
"type": "mapping",
"mappings": [
"a => x01",
"b => x02",
.,
.,
.,
]
}
}
},
"number_of_replicas": "1",
"uuid": "hiEqIpjYTLePjF142B8WWQ",
"version": {
"created": "5020099"
}
}
}
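The range query then targets the .tr field with client-side-converted bounds; a sketch (the placeholder values stand for the converted strings, which depend on the full char-filter table elided above):
curl -XGET http://localhost:9200/sampleindex/_search?pretty -d '{
  "query" : {
    "range" : {
      "customer_address.tr" : {
        "gte" : "<converted form of plaj>",
        "lte" : "<converted form of şcam>"
      }
    }
  }
}'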

only basic searches are working

I'm playing with curl, querying an Elasticsearch database from my Mac console, but I'm having trouble executing more complex searches. So far I can run a match_all query like this:
curl -XGET 'localhost:9200/products/fashion/_search?pretty' -d'
{
  "query" : { "match_all" : {} }
}'
And I receive the following data:
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 915503,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "products",
        "_type" : "fashion",
        "_id" : "57d49ee494efcfdfe0f3abfe",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "id" : "57d49ee494efcfdfe0f3abfe",
            "name" : "Shorts",
            "price" : {
              "value" : 35
            }
          }
        }
      },
      ...
}
I have no problem requesting a mapping like this:
curl -XGET 'localhost:9200/products/_mapping/fashion?pretty'
And the result for price is:
...
"price" : {
  "properties" : {
    "currency" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "value" : {
      "type" : "long"
    }
  }
},
...
But all my attempts to query with a filter on "price.value" returned no hits.
curl -XGET 'localhost:9200/products/fashion/_search?pretty' -d'
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "term" : {
          "price.value" : 35
        }
      }
    }
  }
}'
{
  "took" : 26,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
I took this query from the Elasticsearch guide.
I've run out of ideas and examples for how to write this query to return what I obviously have in the database. As you might have noticed, I have at least one document with price.value = 35.
That's because your price field is within another field named doc, so you need to query doc.price.value like this:
curl -XPOST 'localhost:9200/products/fashion/_search?pretty' -d'
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "term" : {
          "doc.price.value" : 35
        }
      }
    }
  }
}'
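As a quick sanity check of the field path, a range filter works the same way once the doc. prefix is in place (the bounds here are arbitrary examples):
curl -XGET 'localhost:9200/products/fashion/_search?pretty' -d'
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "range" : {
          "doc.price.value" : { "gte" : 30, "lte" : 40 }
        }
      }
    }
  }
}'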

Elasticsearch wildcard query with spaces

I'm trying to run a wildcard query that contains spaces. It easily matches on a per-term basis, but not on whole field values.
I've read documentation saying that I need to set the field to not_analyzed, but with that set, the query returns nothing.
This is the mapping with which it works on a per-term basis:
{
  "denshop" : {
    "mappings" : {
      "products" : {
        "properties" : {
          "code" : {
            "type" : "string"
          },
          "id" : {
            "type" : "long"
          },
          "name" : {
            "type" : "string"
          },
          "price" : {
            "type" : "long"
          },
          "url" : {
            "type" : "string"
          }
        }
      }
    }
  }
}
This is the mapping with which the exact same query returns nothing:
{
  "denshop" : {
    "mappings" : {
      "products" : {
        "properties" : {
          "code" : {
            "type" : "string"
          },
          "id" : {
            "type" : "long"
          },
          "name" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "price" : {
            "type" : "long"
          },
          "url" : {
            "type" : "string"
          }
        }
      }
    }
  }
}
The query is here:
curl -XPOST http://127.0.0.1:9200/denshop/products/_search?pretty -d '{"query":{"wildcard":{"name":"*test*"}}}'
Response with the not_analyzed property:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
Response without not_analyzed:
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [ {
      ...
EDIT: Adding requested info
Here is the list of documents:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "denshop",
      "_type" : "products",
      "_id" : "3L1",
      "_score" : 1.0,
      "_source" : {
        "id" : 3,
        "name" : "Testovací produkt 2",
        "code" : "",
        "price" : 500,
        "url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt-2/"
      }
    }, {
      "_index" : "denshop",
      "_type" : "products",
      "_id" : "4L1",
      "_score" : 1.0,
      "_source" : {
        "id" : 4,
        "name" : "Testovací produkt 3",
        "code" : "",
        "price" : 666,
        "url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt-3/"
      }
    }, {
      "_index" : "denshop",
      "_type" : "products",
      "_id" : "2L1",
      "_score" : 1.0,
      "_source" : {
        "id" : 2,
        "name" : "Testovací produkt",
        "code" : "",
        "price" : 500,
        "url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt/"
      }
    }, {
      "_index" : "denshop",
      "_type" : "products",
      "_id" : "5L1",
      "_score" : 1.0,
      "_source" : {
        "id" : 5,
        "name" : "Testovací produkt 4",
        "code" : "",
        "price" : 666,
        "url" : "http://www.denshop.lh/damske-obleceni/testovaci-produkt-4/"
      }
    }, {
      "_index" : "denshop",
      "_type" : "products",
      "_id" : "6L1",
      "_score" : 1.0,
      "_source" : {
        "id" : 6,
        "name" : "Testovací produkt 5",
        "code" : "",
        "price" : 666,
        "url" : "http://www.denshop.lh/tricka-tilka-tuniky/testovaci-produkt-5/"
      }
    } ]
  }
}
Without not_analyzed, it returns hits with this:
curl -XPOST http://127.0.0.1:9200/denshop/products/_search?pretty -d '{"query":{"wildcard":{"name":"*testovací*"}}}'
But not with this (notice the space before the asterisk):
curl -XPOST http://127.0.0.1:9200/denshop/products/_search?pretty -d '{"query":{"wildcard":{"name":"*testovací *"}}}'
When I add not_analyzed to the mapping, the wildcard query returns no hits no matter what I put in it.
This happens because the standard analyzer splits "Testovací produkt 2" into separate lowercased terms, so no single indexed term contains a space; with not_analyzed, the whole value is one term, but it keeps its capital "T", and wildcard query patterns are not analyzed, so the lowercase pattern "*testovací *" never matches it. The fix is a custom analyzer that keeps the value as one token but lowercases it; lowercase the search text in your client application before passing it to the query.
To also keep the original analysis chain, I've added a sub-field to your name field that uses the custom analyzer.
PUT /denshop
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "products": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "lowercase": {
              "type": "string",
              "analyzer": "keyword_lowercase"
            }
          }
        }
      }
    }
  }
}
And the query will work on the sub-field:
GET /denshop/products/_search
{
  "query": {
    "wildcard": {
      "name.lowercase": "*testovací *"
    }
  }
}
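To verify that the sub-field analyzer emits a single lowercased token, the _analyze API can help (JSON-body form; older Elasticsearch versions accept the analyzer and text as query-string parameters instead):
GET /denshop/_analyze
{
  "analyzer": "keyword_lowercase",
  "text": "Testovací produkt 2"
}
This should return one token, "testovací produkt 2", which the spaced wildcard pattern can match.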
