Elasticsearch Query String with Dot/Point at the end, i.e. +foo.* - elasticsearch

I have an index containing lots of streets. The index looks like this:
Mainstreet 42
Some other street 15
Foostr. 9
The default search query looks like this:
+QUERY_STRING*
So querying for foo (sent as +foo*) or foostr (sent as +foostr*) results in Foostr. 9, which is correct. BUT querying for foostr. (which gets sent to Elasticsearch as +foostr.*) gives no results. Why?
I use the standard analyzer and the query string with no special options. (This also returns 0 results when using http://127.0.0.1:9200/test/streets?q=+foostr.*).
Btw. this: http://127.0.0.1:9200/test/streets?q=+foostr. (the same as above, without the asterisk) finds the right results.
Questions:
Why is this happening?
How to avoid this behavior?

One thing I didn't think about was:
Elasticsearch will not analyze wildcard queries by default!
This means that, by default, it will act like this:
input query | the query that ES will use
------------+---------------------------
foo         | foo
foo.        | foo
foo*        | foo*
foo.*       | foo.*
As you can see, if the input query contains a wildcard, ES will not remove any characters. When there is no wildcard, ES will run the query through an analyzer, which (e.g. the default analyzer) will remove all dots.
To "fix" this, you can either
Remove all dots manually from the query string. Or
Use analyze_wildcard=true (e.g. http://127.0.0.1:9200/test/streets?q=+foostr.*&analyze_wildcard=true). Here's an explanation of what happens: https://github.com/elastic/elasticsearch/issues/787
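The same option can also be set in the request body when using a query_string query. A minimal sketch (the index name test is taken from the URLs above):
GET /test/_search
{
  "query": {
    "query_string": {
      "query": "+foostr.*",
      "analyze_wildcard": true
    }
  }
}
With analyze_wildcard enabled, the analyzer strips the trailing dot from foostr. before the wildcard is applied, so the query effectively becomes +foostr* again.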

1) This is because the standard analyzer does not index special characters. For example, if you index the string Yoo! My name is Karthik., Elasticsearch breaks it down to (yoo, my, name, is, karthik), without special characters (which actually makes sense in many simple cases) and in lowercase. So, when you searched for foostr., there were no results, as it was indexed as foostr (without the ".").
2) You can use different types of analyzers for different fields, depending on your requirements, while indexing (or mark a field as not_analyzed).
Example:
$ curl -XPUT 'http://localhost:9200/bookstore/book/_mapping' -d '
{
  "book" : {
    "properties" : {
      "title" : {"type" : "string", "analyzer" : "simple"},
      "description" : {"type" : "string", "index" : "not_analyzed"}
    }
  }
}
'
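If you want to see exactly what an analyzer does to a string before deciding, the _analyze API is handy. A sketch using the ES 1.x-style query-string form, to match the examples above:
$ curl -XGET 'http://localhost:9200/_analyze?analyzer=standard&text=Foostr.%209'
With the standard analyzer this should return the tokens foostr and 9, confirming that the trailing dot is stripped at index time.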
You can refer to this and this for more information.
HTH!

Related

Kibana visualize use wild card in search bar

Is it possible to use a wildcard in the Kibana visualize search bar?
I tried to use it like below, but it did not work.
operation: "Revers" NOT file:"*Test.Revers"
This returns 2, because there are two Revers terms ("Revers", "/test/count/Test.Revers"), even though there is only one data entry in the stats data.
The following also returns the same value, 2:
operation: "Revers"
A sample of the stats data is below.
"_source": {
"status": 0,
"trstime": 1819,
"username": "test",
"operation": "Revers",
"file": "/test/count/Test.Revers"
}
I have tested this in ES 7.10, as you did not mention your ES version.
The answer to your question is YES, you can use a wildcard in the Kibana visualize search bar, but the value should be without double quotes. If you put the value in double quotes, it will be treated as text and searched for as such.
You can try the below query and it will give you your expected output:
operation: Revers AND NOT file.keyword: *Test.Revers
The below query, without double quotes, also gives the result 1:
operation: Revers AND NOT file: *Test.Revers

How should I configure elasticsearch mapping in order to get the MySQL "like" like behaviour?

I have a field, say "name", which can contain:
Multiple words;
both lower case and upper case;
digits;
special characters: !##$%^&*();
different languages like English, French, Danish and others.
The task is to define the settings of this field so that when I search I get the desired results: no matter what I pass in as the searched string (e.g. '1', 'a', '#1', 'èæ qтчert1'), I should get all documents that contain the searched sequence.
Note: I use Elasticsearch v5.6.
I believe the text type should be OK. The best way to find out is by testing it.
PUT /language_test/sample/1
{
  "sentence" : "你吃饭了吗?",
  "lang" : "chinese"
}
PUT /language_test/sample/2
{
  "sentence" : "Var kan jag hitta någon som talar engelska?",
  "lang" : "swedish"
}
GET /language_test/_mapping
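Once the samples are indexed, a search can confirm the LIKE-style behaviour. A minimal sketch (only a starting point; a wildcard query matches against the analyzed tokens, not the raw string):
GET /language_test/_search
{
  "query": {
    "wildcard": {
      "sentence": "*engelska*"
    }
  }
}
This should return the Swedish sample, since the standard analyzer indexes engelska as its own lowercased token.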

Elastic search filename search not working with dots in filename

I have elasticsearch mapping as follows:
{
  "info": {
    "properties": {
      "timestamp": { "type": "date", "format": "epoch_second" },
      "user": { "type": "keyword" },
      "filename": { "type": "text" }
    }
  }
}
When I try a match query on filename, it works properly when I don't give a dot in the search input, but when a dot is included, it returns many false results.
I learnt that the standard analyzer is the issue: it breaks the search input on dots and then searches. What analyzer can I use in this case? There can be millions of filenames, and I don't want something which takes a lot of memory and time. Please suggest.
As you are talking about filenames here, I would suggest using the keyword analyzer. This will not split the string and will just index it as it is.
You could also just change your mapping from text to keyword instead, as sketched below.
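A minimal sketch of that mapping change, reusing the field names from the question:
{
  "info": {
    "properties": {
      "timestamp": { "type": "date", "format": "epoch_second" },
      "user": { "type": "keyword" },
      "filename": { "type": "keyword" }
    }
  }
}
With filename as a keyword, a term query on the full filename (dots and all) matches exactly; the trade-off is that partial-word matching is lost.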

How do you search for exact terms (which may include special characters) with trailing/leading wildcard matching in Elasticsearch?

I am trying to figure out how to create Elasticsearch queries that allow for exact matches containing reserved characters while supporting trailing or leading wildcard expansion. I am using logstash dynamic templates, which automatically also create a raw field for each of my terms.
To sum up as concisely as possible, I want to create queries that can support two generic types of matching across all values:
Searching terms such as 'abc' to return results like 'abc.xyz.com'. In this case, the standard analyzer keeps 'abc.xyz.com' as one token, and wildcard matching can succeed using the following command:
{
  "query": {
    "wildcard": {
      "_all": "*abc*"
    }
  }
}
Searching terms such as full paths like '/Intel/1938138191(1).zip' to return results like 'C:/Program Files (x86)/Intel/1938138191(1).zip'. In this case, even if I backslash-escape all of the reserved characters, a wildcard match like
{
  "query": {
    "wildcard": {
      "_all": "*/Intel/1938138191(1).zip*"
    }
  }
}
will not work. This is because _all defaults to using the standard analyzer, so the path will be split into tokens and an exact match cannot be made. However, if I specifically query the raw field like below (whether or not I escape the special characters), I get the correct result:
{
  "query": {
    "wildcard": {
      "field.raw": "*/Intel/1938138191(1).zip*"
    }
  }
}
So my question is: is there any way to support wildcard queries across both the tokens produced by the standard analyzer and the raw fields which are not analyzed at all, in one query? That is, some way of generically encapsulating searched terms so that in both of my above examples I would get the correct result? For reference, I am using Elasticsearch version 1.7. I have also tried looking into query string matching and term matching, all to no avail.

Elastic search query string regex

I am having an issue querying a field (title) using a query string regex.
This works: "title:/test/"
This does not : "title:/^test$/"
However they mention it is supported https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax
My goal is to do an exact match, but this match should not be partial; it should match the whole field value.
Does anybody have an idea what might be wrong here?
From the documentation:
The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators.
You are using the anchors ^ and $, which are not supported because there is no need for them; again from the docs:
Lucene’s patterns are always anchored. The pattern provided must match the entire string
If you are looking for a phrase-query kind of match, you could use double quotes like this:
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "\"test phrase\""
    }
  }
}
but this would also match documents with a title like test phrase someword.
If you want an exact match, you should look at term queries: make your title field mapping "index" : "not_analyzed", or use the keyword analyzer with a lowercase filter for a case-insensitive match. Your query would look like this:
{
  "query": {
    "term": {
      "title": {
        "value": "my title"
      }
    }
  }
}
This will give you an exact match.
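For the case-insensitive variant mentioned above, the index settings could define a custom analyzer that combines the keyword tokenizer with a lowercase filter. A sketch (the analyzer name lowercase_keyword is made up for illustration):
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
Assigning this analyzer to title keeps the whole value as a single lowercased token, so matches are case-insensitive as long as the search value is lowercased too.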
Usually in regex, the ^ and $ symbols are used to indicate that the text should be located at the start/end of the string. This is called anchoring. Lucene regex patterns are anchored by default.
So the pattern "test" with Elasticsearch is the equivalent of "^test$" in say Java.
You have to work to "unanchor" your pattern, for example by using "te.*" to match "test", "testing" and "teeth", because the pattern "test" would only match "test".
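Expressed as an actual query, such an unanchored pattern could be used with a regexp query. A minimal sketch (field name taken from the question):
{
  "query": {
    "regexp": {
      "title": "te.*"
    }
  }
}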
Note that this requires that the field is not analyzed, and also note that it has terrible performance. For an exact match, use a term filter as described in the answer by ChintanShah25.
