Add simple search on every field in Elasticsearch

here's my query:
{
  "query": {
    "bool": {
      "must": [
        { "term": { "device_id": "1" } },
        {
          "range": {
            "created_at": {
              "gte": "2019-10-01",
              "lte": "2019-12-10"
            }
          }
        }
      ],
      "must_not": [
        { "term": { "action": "ping" } },
        { "term": { "action": "checkstatus" } }
      ]
    }
  },
  "sort": { "created_at": "desc" },
  "size": 10,
  "from": 0
}
The logs vary a lot, with completely different sets of fields. What can I do to search whether any of them contains a string I'm looking for? Say I'm looking for "mit" and it shows me one log with
surname -> Smith
and a second one with
action -> "message comMITted"
I've tried using match [ '_all' => "mit" ], but I've heard the _all field is deprecated.
I'm using Elasticsearch 7.3.1

To search across all fields, use copy_to to copy the values of all fields into a single grouped field, and run your queries against that field. If your index uses dynamic mappings, copy_to can be applied through dynamic templates.
For infix matching (finding "mit" inside "Smith"), you can use ngrams.
Infix matching can also be done with a wildcard query, but that is not recommended for performance reasons.
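A minimal sketch of the copy_to approach (the index and field names here are illustrative, not from the question):

```json
PUT my_logs
{
  "mappings": {
    "properties": {
      "surname":    { "type": "text", "copy_to": "all_fields" },
      "action":     { "type": "text", "copy_to": "all_fields" },
      "all_fields": { "type": "text" }
    }
  }
}
```

A match query against all_fields then searches the copied values of every field at once.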

Related

Elasticsearch sort exact matches and fuzzy matches in different sets

This is my first ever question here so I apologize if I make any mistakes.
I'm trying to run a fuzzy search (a match query with the fuzziness parameter) on my index that returns the results in alphabetical order. But I need the exact matches to come first (alphabetically ordered among themselves) and the fuzzy matches after them.
I have tried this to give exact matches a higher score, but then the results are simply sorted by score:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "myPropertyName": {
              "query": "myWord",
              "fuzziness": "AUTO"
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "myPropertyName": {
              "query": "myWord",
              "boost": 20
            }
          }
        }
      ]
    }
  },
  "sort": [
    "_score",
    {
      "myProperty.keyword": {
        "order": "asc"
      }
    }
  ],
  "track_scores": true
}
Then I tried many ways to make the scores identical among all exact matches and among all fuzzy matches. I can do it for the fuzzy matches by using filter or constant_score, but I couldn't figure out a way to assign a custom score to the results of the should clause in my search.
How can I achieve this?
I've managed to achieve this by using a function_score query with "boost_mode": "replace" and a custom value for the weight parameter, e.g. "weight": 10.
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": [
            {
              "match": {
                "myPropertyName": {
                  "query": "myWord",
                  "fuzziness": "AUTO"
                }
              }
            }
          ]
        }
      },
      "boost_mode": "replace",
      "functions": [
        {
          "filter": {
            "match": {
              "myPropertyName": {
                "query": "myWord"
              }
            }
          },
          "weight": 10
        }
      ]
    }
  },
  "sort": [
    "_score",
    {
      "myProperty.keyword": {
        "order": "asc"
      }
    }
  ],
  "track_scores": true
}
This way, documents that match the match query come back with a score of 0, since it sits in a filter clause and filter clauses don't contribute to scoring. Among those documents, the ones that also match the function get a score of 10, because of "boost_mode": "replace" and "weight": 10.
When sorting, Elasticsearch first orders the results by score, since "_score" comes first in the sort array; documents with the same score are then sorted alphabetically among themselves.
This worked perfectly for me.

bool query with filter does not return any documents

The simple query
"query": {
  "simple_query_string": { "query": "great guide" }
},
returns my document as expected, containing
"groups": [
"Local Business"
],
But if I use a filter, it returns no documents:
"query": {
  "bool": {
    "must": [
      { "simple_query_string": { "query": "great guide" } }
    ],
    "filter": {
      "terms": {
        "groups": ["Local Business"]
      }
    }
  }
},
If I remove the "filter" key and values, then the document is retrieved.
Why isn't the filter matching the document?
If the groups field is of type keyword, then the query you've mentioned works as expected.
However, it wouldn't work if the field groups is of type text. In that case, the query below would fit what you're looking for.
Query for group - Type text
POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "simple_query_string": {
            "query": "great guide"
          }
        }
      ],
      "filter": {
        "match": {
          "groups": "Local Business"
        }
      }
    }
  }
}
The reason the query you mentioned doesn't work for a field of type text is that such a field goes through the analysis phase, using the standard analyzer by default, which first lowercases Local Business and then stores local and business as two separate tokens in the inverted index.
Elasticsearch only returns results if the terms you query match what's stored in the index.
A keyword field, by contrast, stores Local Business as-is in the inverted index.
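You can verify the tokenization with the _analyze API; the standard analyzer is built in, so no index is needed:

```json
POST _analyze
{
  "analyzer": "standard",
  "text": "Local Business"
}
```

The response lists the two tokens local and business, neither of which equals the exact term Local Business that the terms filter looks up.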
Note: You can try the query you have by replacing groups with groups.keyword if the mapping hasn't been defined explicitly and was created dynamically.
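Assuming dynamic mappings (where every text field gets a .keyword sub-field by default), the original filter could be rewritten as:

```json
"filter": {
  "terms": {
    "groups.keyword": ["Local Business"]
  }
}
```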
Hope this helps!

Elasticsearch: get exact match, then fuzzy

I run this query:
GET /thing/_search
{
  "query": {
    "multi_match": {
      "query": "castle pontivy",
      "type": "most_fields",
      "fields": [ "title", "loc" ]
    }
  }
}
It works and returns results from thing where title and/or loc contain castle and/or pontivy, in a relevant order. Nice.
Now I want to keep querying like this, but I also want the results to prefer an exact match on title. That means if one item's title exactly matches castle pontivy, it must be returned as the first element (the other results are then ranked as usual).
Is there a way to do this?
You could add a phrase match and give it a boost of 5, so exact phrase matches score well above the rest. If you want to get deeper into scoring, look into the function_score query (I recommend you do).
The second multi_match then matches the rest of the documents using most_fields.
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "castle pontivy",
            "type": "phrase",
            "fields": [ "title", "loc" ],
            "boost": 5
          }
        },
        {
          "multi_match": {
            "query": "castle pontivy",
            "type": "most_fields",
            "fields": [ "title", "loc" ]
          }
        }
      ]
    }
  }
}

Elastic Search: Matching sub token default operator

Is there a way to set the default operator for sub tokens (tokens generated through the analyzer)? It currently seems to default to OR and setting operator does not work.
I'm using the validate API to see how Elastic Search is understanding the query:
/myIndex/mapping/_validate/query?explain=true
{
  "query": {
    "multi_match": {
      "type": "phrase_prefix",
      "query": "test123",
      "fields": [ "message" ],
      "lenient": true,
      "analyzer": "myAnalyzer"
    }
  }
}
Which returns
+(message:test123 message:test message:123)
What I want is
+message:test123 +message:test +message:123
Is there any way to do this without using a script or splitting the terms and creating a more complex query in the application?
EDIT
Using operator or minimum_should_match does not make a difference.
My Elasticsearch analysis settings for myAnalyzer are:
{
  "analysis": {
    "filter": {
      "foldAscii": {
        "type": "asciifolding",
        "preserve_original": "1"
      },
      "capturePattern": {
        "type": "pattern_capture",
        "patterns": [
          "(\\p{Ll}+|\\p{Lu}\\p{Ll}+|\\p{Lu}+(?!\\p{Ll}+))",
          "(\\d+)"
        ]
      },
      "noDuplicates": {
        "type": "unique",
        "only_on_same_position": "true"
      }
    },
    "analyzer": {
      "myAnalyzer": {
        "filter": [
          "capturePattern",
          "lowercase",
          "foldAscii",
          "noDuplicates"
        ],
        "tokenizer": "standard"
      }
    }
  }
}

Allow wildcards in proximity searches with multiple words

I am using Elasticsearch 5.6 on Ubuntu 16.04. My problem is when I try to use wildcards inside a proximity search with multiple words.
Examples:
"hell* worl*"~3
Basically, I would like to match all documents where a word starting with "hell" and a word starting with "worl" appear close to each other, with a maximum distance of 3.
I don't get any error, but it doesn't find the documents. It seems that the wildcards are not analyzed, even though I have set analyze_wildcard: true.
The DOC says:
By default, wildcards terms in a query string are not analyzed. By
setting this value to true, a best effort will be made to analyze
those as well.
But, only the following query works:
"hello world"~3 # this works
This is my query:
{
  "size": 15,
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "\"hell* worl*\"~3",
            "analyze_wildcard": true
          }
        }
      ]
    }
  }
}
Reference:
Proximity Searches
Wildcards
You can use span queries to achieve what you want, though be careful because the terms are not analyzed here.
{
  "size": 15,
  "from": 0,
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "wildcard": {
                "t": "hell*"
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "wildcard": {
                "t": "worl*"
              }
            }
          }
        }
      ],
      "slop": 3,
      "in_order": true
    }
  }
}
The problem in your query_string is that the * character is not treated as a wildcard inside quotes. What you get is a plain slop phrase on the literal terms, because special characters have no meaning within quotes.
Be careful though: span queries perform much worse than a simple phrase search (although they still seem to be faster than slop phrases, which actually surprised me).
A better option, if you can still prepare your data for this scenario, is to use ngrams. With ngrams, a simple "hell worl"~3 would match what you want.
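As a sketch of the ngram route, the field could be indexed with an edge_ngram token filter while queries go through a plain analyzer, so the query terms "hell" and "worl" match indexed prefixes of "hello" and "world". The analyzer name, gram sizes, and doc type below are illustrative (5.x mapping syntax, since the question uses Elasticsearch 5.6):

```json
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "edge_4_10": { "type": "edge_ngram", "min_gram": 4, "max_gram": 10 }
      },
      "analyzer": {
        "prefix_analyzer": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "edge_4_10" ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "t": {
          "type": "text",
          "analyzer": "prefix_analyzer",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
```

Because the edge ngrams are emitted at the position of the original token, a slop phrase like "hell worl"~3 against this field behaves as the question intends.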
