I am running elasticsearch v1.1.1 and I am having trouble getting results from regex searches.
{
"query" : {
"regexp" : {
"lastname" : "smit*"
}
}
}
Returns 0 results (even though I know I have 'smith' in the data).
I have also tried:
{
"query" : {
"filtered" : {
"filter" : {
"regexp" : {
"lastname" : "smit*"
}
}
}
}
}
Any help would be appreciated.
So first off, a lot of this is dependent on how you indexed the field - analyzed or not, what kind of tokenizer, was it lowercased, etc.
To answer your specific question concerning regexp queries, assuming your field is indexed as "smith" (all lower case) you should change your search string to "smit.*" which should match "smith". "smit." should also work.
The reason is that in regexp (which is different from wildcard) "." matches any single character and "*" matches zero or more of the preceding character. So your search "smit*" would match "smi", "smit", "smitt" or "smittt", but not "smith". The construct ".*" means any number (including 0) of the preceding character, which is ".", i.e. any character. The combination of the two is the regexp equivalent of the wildcard "*".
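A minimal sketch of the difference between the three patterns. It uses Python's re module with fullmatch to imitate whole-term matching; that is an assumption made for illustration, since Lucene has its own regexp engine, but for patterns this simple the semantics should agree. The helper name and the term list are made up.

```python
import re

def matches_whole_term(pattern: str, term: str) -> bool:
    """Return True only if the pattern covers the entire term."""
    return re.fullmatch(pattern, term) is not None

# Hypothetical indexed terms (already lowercased).
terms = ["smi", "smit", "smitt", "smith", "smithers"]

for pattern in ["smit*", "smit.", "smit.*"]:
    matched = [t for t in terms if matches_whole_term(pattern, t)]
    print(f"{pattern:8} -> {matched}")
```

Only "smit." and "smit.*" pick up "smith"; "smit*" just repeats the final "t".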
That said, I'd caution that regexp and wildcard searches can have significant performance challenges in text indexes, depending upon the nature of the field, how it's indexed and the number of documents. These kinds of searches can be very useful but more than one person has built wildcard or regexp searches tested on small data sets only to be disappointed by the production performance. Use with caution.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html
I am not able to search for strings that start with a number.
PUT music/song/1?refresh
{
"suggest" : [
{
"input": "123hello",
"weight" : 3
}
]
}
I have tried the following regex query:
POST music/_search?pretty
{
"suggest": {
"song-suggest" : {
"regex" : "^[0-9].*$",
"completion" : {
"field" : "suggest"
}
}
}
}
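You should try [0-9].* (one digit followed by anything). Note that [0-9]*.* would match every string, because [0-9]* also matches zero digits.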
The point is that lucene regexp doesn't use '^' and '$' as symbols of start and end of the string to anchor your pattern. In fact regexp in lucene is anchored to whole string by default, see:
Most regular expression engines allow you to match any part of a string. If you want the regexp pattern to start at the beginning of the string or finish at the end of the string, then you have to anchor it specifically, using ^ to indicate the beginning or $ to indicate the end.
Lucene’s patterns are always anchored. The pattern provided must match the entire string.
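A small sketch of what "always anchored" means in practice. Python's re.search stands in for a typical unanchored engine and re.fullmatch for Lucene's whole-string behaviour; this is an illustration under that assumption, not Lucene's actual implementation, and the sample values are made up.

```python
import re

pattern = "[0-9].*"
values = ["123hello", "hello123"]

# Unanchored search: the pattern may match anywhere inside the string.
unanchored = [v for v in values if re.search(pattern, v)]

# Anchored match: the pattern must cover the whole string, so no ^ or $ is needed.
anchored = [v for v in values if re.fullmatch(pattern, v)]

print(unanchored)  # ['123hello', 'hello123']
print(anchored)    # ['123hello']
```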
See also my very similar question, especially if your field could be more than 256 characters long.
I don't know if the question is still relevant, so I'll just leave this here.
I've a problem with the NOT operator in Elasticsearch.
I start a query string query and look for these keywords:
plain~ AND NOT port~
I'm getting a list with documents which contains the word "plain" (that's ok) but also with the word "airport".
Is this the correct behavior and how can I exclude these compound words?
Yes, this is correct behaviour. Please have a look at the documentation for the fuzzy operator and especially the fuzziness parameter values.
The point is that fuzzy operator "uses the Damerau-Levenshtein distance to find all terms with a maximum of two changes, where a change is the insertion, deletion or substitution of a single character..."
The word "airport" is not excluded by your query because it is three changes away from "port", which is more than the maximum of two.
But this query would work:
{
"query": {
"query_string": {
"fields" : ["description"],
"query": "NOT rport~2"
}
}
}
It would exclude airport from the results. But you cannot increase the fuzziness factor to 3 as this is not supported (so this "query": "NOT port~3" won't work).
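To check the edit counts behind this, here is a rough sketch of an edit distance that counts insertions, deletions, substitutions and adjacent transpositions (the "optimal string alignment" variant). It is a simplified stand-in written for illustration, not the code Lucene runs, and the function name is made up.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance with insert, delete, substitute and adjacent transposition."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

for query in ["port", "rport"]:
    print(query, "->", osa_distance(query, "airport"))
```

"port" is 3 edits from "airport" (outside the limit of 2), while "rport" is 2 edits away, which is why the second form excludes it.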
Your needs sound to me more like one of the cases of Partial Matching
So in DB I have this entry:
Mark-Whalberg
When searching with term
Mark-Whalberg
I get no match.
Why? Is the minus sign a special character, as I understand it? Does it symbolize "exclude"?
The query is this:
{"query_string": {"query": 'Mark-Whalberg', "default_operator": "AND"}}
Searching everything else, like:
Mark
Whalberg
hlb
Mark Whalberg
returns a match.
Is this stored as two different pieces? How can I get a match when including the minus sign in the search term?
--------------EDIT--------------
This is the current query:
var fields = [
"field1",
"field2",
];
{"query_string":{"query": '*Mark-Whalberg*',"default_operator": "AND","fields": fields}};
You have an analyzer configuration issue.
Let me explain that. When you defined your index in ElasticSearch, you didn't indicate any analyzer for the field. It means it's the Standard Analyzer that will apply.
According to the documentation :
Standard Analyzer
The standard analyzer is the default analyzer which is used if none is
specified. It provides grammar based tokenization (based on the
Unicode Text Segmentation algorithm, as specified in Unicode Standard
Annex #29) and works well for most languages.
Also, to answer your question:
Why? Is minus a special character what I understand? It symbolizes
"exclude"?
For the Standard Analyzer, yes it is. It doesn't mean "exclude" but it is a special char that will be deleted after analysis.
From documentation :
Why doesn’t the term query match my document?
[...] There are many ways to analyze text: the default standard
analyzer drops most punctuation, breaks up text into individual words,
and lower cases them. For instance, the standard analyzer would turn
the string “Quick Brown Fox!” into the terms [quick, brown, fox].
[...]
Example :
If you have the following text :
"The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
Then the Standard Analyzer will produce :
[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]
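As a rough illustration of why the hyphenated value is stored as two terms, the sketch below approximates the analyzer's behaviour with a regular expression: lowercase everything, split on punctuation, keep apostrophes inside words. This is only an approximation written for this example (the real analyzer follows Unicode Text Segmentation), and the function name is hypothetical.

```python
import re

def rough_standard_analyze(text: str) -> list:
    """Approximate the standard analyzer; not a UAX #29 implementation."""
    # Runs of word characters, optionally joined by inner apostrophes.
    return re.findall(r"\w+(?:'\w+)*", text.lower())

print(rough_standard_analyze("Mark-Whalberg"))
print(rough_standard_analyze(
    "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
))
```

With this approximation "Mark-Whalberg" becomes the two terms mark and whalberg, so no single indexed term contains the hyphen.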
To get a match on the hyphenated value you have 2 solutions :
You can use a match query, which analyzes the search text with the same analyzer as the field.
You can ask ElasticSearch to not analyze the field when you create your index : here's how
I hope this will help you.
I got stuck on the same question, and the answer from @Mickael was perfect for understanding what is going on (I really recommend reading the linked documentation).
I solved this by defining an operator for the query:
GET http://localhost:9200/creative/_search
{
"query": {
"match": {
"keyword_id": {
"query": "fake-keyword-uuid-3",
"operator": "AND"
}
}
}
}
To better understand the algorithm that this query uses, try adding "explain": true and analyse the results:
GET http://localhost:9200/creative/_search
{
"explain": true,
"query": // ...
}
I'm trying to make my search ignore accents, in much the same way that the downcase function makes it ignore case.
Currently I search for all results starting with a given string like this:
r.Table("places").Filter(func(customer r.Term) interface{}{
return customer.Field("Name").Downcase().Match("^" + strings.ToLower(value))
})
but it does not work for words that contain an accent.
Example : with search word "yes", it'll find :
"yes" "yesy" "yessss"
but not
"yés"
What is the best way to remove accents in query to pick them up, too?
What kind of analyzers would you implement in Elasticsearch for searching book titles?
The requirements are that there must be fuzziness and that some words are only 3 letters long.
I'm not going to include code because I would like to get fresh insight.
The problem I am having is what happens when I misspell a 3-letter word.
Say I type "dns" and there is a document with a field "dna"; instead I get
"kindness" or something else that has "dns" in the word.
I believe you can solve your problem with the fuzziness field of the fuzzy query. It sets the maximum edit distance, so long words will not be matched when your input is a very short word.
{
"fuzzy" : {
"user" : {
"value" : "ki",
"fuzziness" : 2,
"prefix_length" : 1
}
}
}
The above query would match all 3 letter words which start with the letter 'k' and all 4 letter words which start with the letters 'ki'. A fuzziness of 2 means that up to 2 edits are allowed, e.g. either change 'i' to another letter and then add another letter, or add two more letters while keeping 'ki'. The prefix length tells elasticsearch how many leading characters of the query must match exactly before the fuzziness can take over.
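The interplay of fuzziness and prefix_length can be sketched as a filter over candidate terms: the first prefix_length characters must be identical, and the whole term must be within the allowed edit distance. This is a simplified model for illustration only; it uses plain Levenshtein distance (no transpositions), and the function names and sample terms are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = curr
    return prev[-1]

def fuzzy_match(term: str, value: str, fuzziness: int, prefix_length: int = 0) -> bool:
    """True if term shares the required prefix and is within the edit budget."""
    if term[:prefix_length] != value[:prefix_length]:
        return False
    return levenshtein(term, value) <= fuzziness

terms = ["dna", "dns", "kindness"]
print([t for t in terms if fuzzy_match(t, "dns", fuzziness=1, prefix_length=1)])
```

With a fuzziness of 1, "dna" is accepted for the input "dns", while "kindness" is rejected both by the prefix check and by its edit distance of 5.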
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html