Elasticsearch term query with colons - elasticsearch

I have a string field "title" (not analyzed) in Elasticsearch. A document has the title "Garfield 2: A Tail Of Two Kitties (2006)".
When I use the following JSON to query, no results are returned.
{"query":{"term":{"title":"Garfield 2: A Tail Of Two Kitties (2006)"}}}
I tried to escape the colon character and the parentheses, like:
{"query":{"term":{"title":"Garfield 2\\: A Tail Of Two Kitties \\(2006\\)"}}}
Still not working.

A term query won't tokenize or apply analyzers to the search text. Instead it looks for an exact match, which won't work here because string fields are analyzed/tokenized by default.
To give this a better explanation:
Let's say there is a string value such as "I am in summer:camp".
When indexed, it is broken into tokens as below:
"I am in summer:camp" => [ i, am, in, summer, camp ]
Hence even if you do a term search for "I am in summer:camp", it still won't work, as the token "I am in summer:camp" is not present in the index.
Something like a phrase query might work better here.
Or you can set the field's "index" option to "not_analyzed" to make sure the string is not tokenized.
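A sketch of both options. First, a match_phrase query analyzes the search text the same way the field was analyzed and then looks for the tokens in sequence, so the colon and parentheses don't get in the way:
{"query":{"match_phrase":{"title":"Garfield 2: A Tail Of Two Kitties (2006)"}}}
Second, for exact-match semantics, the mapping could look like this (pre-5.x syntax; the type name movies is a placeholder, and on 5.x+ you would use the keyword field type instead):
{
  "mappings": {
    "movies": {
      "properties": {
        "title": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}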

Related

Match is not obtained by using match_phrase_prefix in Elasticsearch

I have used match_phrase_prefix. It returns results when I search by entering a few characters, but when I add some more characters to the search field, the query returns zero hits.
For example: if I type abcd it returns a match.
But if I type abcd e, then even if there is a match, I don't get the hit.
Following is the query I have used,
where _field is the field name and
_queryText is the search field value that I enter.
Can I use a must or should condition or minimum_should_match here? If yes, then how?
Thanks in advance
{
  body: {
    'query': {
      'match_phrase_prefix': {
        [_field]: _queryText
      },
    },
    'size': 15,
  }
}
match_phrase_prefix is the same as match_phrase, except that it allows prefix matches on the last term in the text.
In your case, when you search for abcd, as it is the only term in the search query, it will make a prefix query, and documents containing abcd, abcde, abcdef will all match.
But as soon as you change your search query to abcd e, it has two words, abcd and e. It will make a match query on abcd and a prefix query on e, so documents containing abcd followed by e or ef or efg will match.
It would be better if you could provide your sample docs, so that I can explain better what should match, what shouldn't, and the reason behind it.
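As for the must/should part of the question: you can combine clauses in a bool query. A minimal sketch, assuming the field is named title and that you want strict full-word matching alongside the prefix behaviour:
{
  "query": {
    "bool": {
      "should": [
        { "match_phrase_prefix": { "title": "abcd e" } },
        { "match": { "title": { "query": "abcd e", "operator": "and" } } }
      ],
      "minimum_should_match": 1
    }
  },
  "size": 15
}
With "minimum_should_match": 1, a document matches if either clause matches, and documents matching both are scored higher.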

Match string with minus character in elasticsearch

So in the DB I have this entry:
Mark-Whalberg
When searching with the term
Mark-Whalberg
I get no match.
Why? Is the minus a special character, as I understand it? Does it symbolize "exclude"?
The query is this:
{"query_string": {"query": 'Mark-Whalberg', "default_operator": "AND"}}
Searching for anything else, like:
Mark
Whalberg
hlb
Mark Whalberg
returns a match.
Is this stored as two different pieces? How can I get a match when including the minus sign in the search term?
--------------EDIT--------------
This is the current query:
var fields = [
  "field1",
  "field2",
];
{"query_string": {"query": '*Mark-Whalberg*', "default_operator": "AND", "fields": fields}};
You have an analyzer configuration issue.
Let me explain. When you defined your index in Elasticsearch, you didn't indicate any analyzer for the field, which means the Standard Analyzer applies.
According to the documentation:
Standard Analyzer
The standard analyzer is the default analyzer which is used if none is specified. It provides grammar based tokenization (based on the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29) and works well for most languages.
Also, to answer your question:
Why? Is the minus a special character, as I understand it? Does it symbolize "exclude"?
For the Standard Analyzer, yes it is. It doesn't mean "exclude", but it is a special character that will be dropped during analysis.
From the documentation:
Why doesn't the term query match my document?
[...] There are many ways to analyze text: the default standard analyzer drops most punctuation, breaks up text into individual words, and lower cases them. For instance, the standard analyzer would turn the string "Quick Brown Fox!" into the terms [quick, brown, fox]. [...]
Example:
If you have the following text:
"The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
Then the Standard Analyzer will produce:
[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]
If you don't want the field analyzed, you have 2 solutions:
You can use a match query.
You can ask Elasticsearch not to analyze the field when you create your index: here's how.
I hope this will help you.
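If you want to verify what the analyzer does to your text, the _analyze API shows the tokens it produces (a sketch; this request-body form works on recent Elasticsearch versions):
GET http://localhost:9200/_analyze
{
  "analyzer": "standard",
  "text": "Mark-Whalberg"
}
This should return the two tokens mark and whalberg, which is why the query for the whole hyphenated string finds nothing.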
I was stuck on the same question, and the answer from @Mickael was perfect for understanding what is going on (I really recommend you read the linked documentation).
I solved this by defining an operator for the query:
GET http://localhost:9200/creative/_search
{
  "query": {
    "match": {
      "keyword_id": {
        "query": "fake-keyword-uuid-3",
        "operator": "AND"
      }
    }
  }
}
To better understand the algorithm this query uses, try adding "explain": true and analysing the results:
GET http://localhost:9200/creative/_search
{
  "explain": true,
  "query": // ...
}

Searchkick substring matches lookup

I'm thinking this might be a question for the wider ElasticSearch community, but since we're using Searchkick, I thought I'd start here...
We have an index containing records with multiple string fields, say:
"Jimi", "Hendrix", "Guitar"
"Phil", "Collins", "Drums"
"Sting", "", "Bass"
"Ringo", "Starr", "Drums"
"Paul", "McCartney", "Bass"
I want to pass Searchkick/Elasticsearch a long string, say:
"It is known that Jimi liked to set light to his guitar and smash up all the drums while on stage."
and I want to get back the records that have any matches, preferably ordered by most matches first:
"Jimi", "Hendrix", "Guitar"
"Phil", "Collins", "Drums"
"Ringo", "Starr", "Drums"
How do I go about setting up the query?
Thanks!
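Not a Searchkick-specific answer, but the underlying Elasticsearch query could be as simple as a multi_match with the default or operator, which scores a record higher the more of the search terms it contains (the field names below are assumptions based on the sample records):
{
  "query": {
    "multi_match": {
      "query": "It is known that Jimi liked to set light to his guitar and smash up all the drums while on stage.",
      "fields": ["first_name", "last_name", "instrument"]
    }
  }
}
On the Searchkick side, passing operator: "or" to the search call should give the same any-term-matches behaviour, since Searchkick requires all terms by default.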

Elasticsearch substring matching without ending

For example, if my search word is "Houses", I want it to find the result "House". How do I search while ignoring the last 1-2 letters of a word?
I tried the "nGram" filter, but it searches for the full word.
I feel you are chasing the wrong approach.
Judging by your example, I feel what you are looking for is stemmers.
Elasticsearch has stemmers like snowball which can convert any word to its base form or stem.
For example, the stemmer can convert
[ "jumping", "jumped" ] -> "jump"
[ "staying", "stayed" ] -> "stay"
And so on...
Snowball - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-snowball-analyzer.html#analysis-snowball-analyzer
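A minimal settings sketch wiring a snowball stemmer into a custom analyzer (all names here are placeholders):
{
  "settings": {
    "analysis": {
      "filter": {
        "my_snowball": { "type": "snowball", "language": "English" }
      },
      "analyzer": {
        "my_stemmed": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_snowball" ]
        }
      }
    }
  }
}
With this analyzer on the field, "Houses" and "House" reduce to the same stem at index and search time, so one finds the other.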

In Elasticsearch, not able to search name with special character '-'

I am trying to search names in Elasticsearch.
Consider a name such as kanal-kannan.
Normally we search names with a wildcard like *na. I tried to search like this:
"/index/party_details/_search?size=200&from=0&q=(first_name_v:kanal-*)"
This results in zero records.
Unless the hyphen character has been dealt with specifically by the analyzer, the two words in your example, kanal and kannan, will be indexed separately, because any non-alpha character is treated by default as a word delimiter.
Have a look at the documentation for Word Delimiter Token Filter and specifically at the type_table parameter.
Here's an example I used to ensure that an email field was correctly indexed:
ft.custom_delimiter = {
  "type": "word_delimiter",
  "split_on_numerics": false,
  "type_table": ["# => ALPHANUM", ". => ALPHANUM", "- => ALPHA", "_ => ALPHANUM"]
};
- is a special character that needs to be escaped to be searched literally: \-
If you use the q parameter (that is, the query_string query), the rules of the Lucene Queryparser Syntax apply.
Depending on your analyzer chain, you might not have any - characters in your index; replacing them with a space in your query would work in those cases too.
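Both variants as a sketch, based on the query from the question:
{"query_string": {"query": "Mark\\-Whalberg", "default_operator": "AND"}}
{"query_string": {"query": "Mark Whalberg", "default_operator": "AND"}}
The first escapes the hyphen for the Lucene query parser (the doubled backslash is JSON string escaping); the second relies on the analyzer having dropped the hyphen at index time, with AND requiring both terms to match.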
@l4rd's answer should work properly (I have the same setup). Another option you have is to map the field with the keyword analyzer to prevent tokenizing at all. Note that the keyword tokenizer doesn't lowercase anything, so use a custom analyzer with the keyword tokenizer in this case.
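A sketch of such a custom analyzer, applied to the first_name_v field from the question (the analyzer name and mapping layout are assumptions):
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "first_name_v": { "type": "text", "analyzer": "lowercase_keyword" }
    }
  }
}
With the whole name indexed as the single (lowercased) token kanal-kannan, the wildcard query first_name_v:kanal-* can then match.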
