Escaping forward slashes in Elasticsearch

I am doing a general search against Elasticsearch (1.7) and all is well, except that my account numbers have forward slashes in them. The account number field is not the id field, and it is "not_analyzed".
If I do a search on an account number, e.g. AC/1234/A01, then I get thousands of results, presumably because it is doing a regex search(?).
{
"query" : { "query_string" : {"query" : "AC/1234/A01"} }
}
I can get the result I want by doing an exact match search
{
"query" : { "query_string" : {"query" : "\"AC/1234/A01\""} }
}
This actually gives me the result I want, and it will probably fit the bill as a backup option (surrounding all 'single word' searches with quotes). However, I'm thinking that if users do a multiple-word search that includes the account number, I will be back to thousands of results; and although I can't see the value of that search, I would like to avoid it happening.
Essentially, I have a Java app querying Elasticsearch, and I would like to escape all forward slashes entered in the GUI.
My Googling has told me that
{
"query" : { "query_string" : {"query" : "AC\\/1234\\/A01"} }
}
ought to do this, but it makes no difference: the query works, but I still get thousands of results.
Could anyone point me in the right direction?

You should get what you want without escaping anything simply by specifying a keyword analyzer for the query string, like this:
{
  "query" : {
    "query_string" : {
      "query" : "AC\\/1234\\/A01",
      "analyzer" : "keyword"   <---- add this line
    }
  }
}
If you don't do this, the standard analyzer is used (and will tokenize your query string) regardless of your field's type or whether it is not_analyzed.
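You can see this tokenization for yourself with the _analyze API (a quick sketch using ES 1.x syntax to match the question; host and port are assumptions):
curl -XGET 'localhost:9200/_analyze?analyzer=standard' -d 'AC/1234/A01'
This returns the tokens ac, 1234 and a01, which is why the unquoted query matches thousands of documents: any document containing any one of those tokens is a hit.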

Use this query as an example:
{
  "query": {
    "query_string": {
      "fields": [
        "account_number.keyword"
      ],
      "query": "AC\\/1234\\/A01",
      "analyzer": "keyword"
    }
  }
}

I use query_string because I want to give my users the ability to build complex queries with OR and AND. Having the search break when a slash is used (e.g. when searching for a URL) is not helpful.
I worked around the issue by adding quotes when the search string contains a slash but no quotes:
// Quote the whole query so the slash is treated as part of a phrase
if (strpos($query, '/') !== false && strpos($query, '"') === false) {
    $query = '"' . $query . '"';
}

Related

Elastic exact matching and substring matching together

I know that Elasticsearch has the "keyword" type for exact matching. Ex:
"address": { "type": "keyword"}
That's cool, exact matching works!
But I would like to have both "exact matching" and "sub-string" matching, so I decided to create the following mapping:
"address": { "type": "text" , "index": true }
Problem
If I have the "text" type, how can I search for an exact matching string (not a sub-string)? I've tried several ways, but none of them work:
GET testing_index/_search
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "term" : {
          "address" : "washington"
        }
      }
    }
  }
}
or
GET testing_index/_search
{
  "query": {
    "match": {
      "address" : "washington"
    }
  }
}
I just need one universal mapping:
to find an exact string
to find sub-strings
I hope Elasticsearch can do this.
By default, text fields use the default analyzer, which drops most punctuation, breaks up text into individual words, and lower cases them. For instance, the standard analyzer would turn the string “Quick Brown Fox!” into the terms [quick, brown, fox]. As you can imagine, this makes it difficult to write an exact match query against the text field. For your use case, I suggest one of 2 options:
store as keyword, and accomplish sub-string-like matching using wildcard or fuzzy queries. Wildcard queries, in particular queries with a leading wildcard, are notoriously slow, so proceed with caution.
store the field twice: one as keyword and one as text. Obvious downside here is bloating the size of the index.
For more background, see the "Term Query" Elasticsearch documentation, and in particular the section on "Why doesn’t the term query match my document?": https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
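For option 2, here is a minimal sketch of a multi-field mapping (the index name and the "raw" sub-field name are my own choices; the syntax assumes a recent, typeless-mapping Elasticsearch):
PUT testing_index
{
  "mappings": {
    "properties": {
      "address": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}
With this in place, a match query on address gives you analyzed (per-word) matching, while a term query on address.raw only matches the exact original string:
GET testing_index/_search
{
  "query": {
    "term": {
      "address.raw": "washington"
    }
  }
}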

Phrase suggester returns unexpected result when first letter is misspelled

I'm using the Elasticsearch phrase suggester to correct users' misspellings. Everything is working as I expect unless a user enters a query whose first letter is misspelled. In that situation the phrase suggester returns nothing, or returns unexpected results.
My query for suggestion:
{
  "suggest": {
    "text": "user_query",
    "simple_phrase": {
      "phrase": {
        "field": "title.phrase",
        "collate": {
          "query": {
            "inline": {
              "bool": {
                "should": [
                  { "match": { "title": "{{suggestion}}" } },
                  { "match": { "participants": "{{suggestion}}" } }
                ]
              }
            }
          }
        }
      }
    }
  }
}
Example when first letter is misspelled:
"simple_phrase" : [
{
"text" : "گاشانچی",
"offset" : 0,
"length" : 11,
"options" : [ {
"text" : "گارانتی",
"score" : 0.00253151
}]
}
]
Example when fifth letter is misspelled:
"simple_phrase" : [
{
"text" : "کاشاوچی",
"offset" : 0,
"length" : 11,
"options" : [ {
"text" : "کاشانچی",
"score" : 0.1121
},
{
"text" : "کاشانجی",
"score" : 0.0021
},
{
"text" : "کاشنچی",
"score" : 0.0020
}]
}
]
I expect these two misspelled queries to return the same suggestions (the suggestions I expect are the second set). What is wrong?
P.S: I'm using this feature for the Persian language.
I have a solution for your problem; you only need to add some fields to your schema.
P.S: I don't have that much expertise in Elasticsearch, but I have solved the same problem using Solr, and you can implement it the same way in Elasticsearch too.
Create a new ngram field and copy all your title values into that ngram field.
When you fire a query for a misspelled word and get an empty result, split the word and fire the same query again; you will get the results you expect.
Example: suppose the user is searching for the word Akshay but types it as Skshay. Build the query as below and you should hopefully get the expected results.
I am giving you a Solr example here; you can achieve the same thing with Elasticsearch (see the sketch below).
(ngram:"skshay" OR ngram:"sk" OR ngram:"ks" OR ngram:"sh" OR ngram:"ha" OR ngram:"ay")
We have split the word into sequential bigrams and fired the query against the ngram field.
Hope it helps.
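For Elasticsearch, a rough equivalent of that Solr query could be sketched with query_string (the ngram field name and the index name are assumptions, and the ngram field must exist in your mapping):
GET my_index/_search
{
  "query": {
    "query_string": {
      "query": "ngram:\"skshay\" OR ngram:\"sk\" OR ngram:\"ks\" OR ngram:\"sh\" OR ngram:\"ha\" OR ngram:\"ay\""
    }
  }
}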
From the Elasticsearch docs:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-suggesters-phrase.html
prefix_length
The number of minimal prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur at the beginning of terms. (The old name "prefix_len" is deprecated.)
So by default the phrase suggester assumes that the first character is correct, because the default value of prefix_length is 1.
Note: setting this value to 0 is not a good idea, because it has performance implications.
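For illustration, here is a sketch of the question's suggester with the prefix requirement relaxed through a direct generator (field names are taken from the question, the index name is an assumption; given the performance note above, treat this as a trade-off rather than a recommendation):
POST my_index/_search
{
  "suggest": {
    "text": "user_query",
    "simple_phrase": {
      "phrase": {
        "field": "title.phrase",
        "direct_generator": [
          {
            "field": "title.phrase",
            "prefix_length": 0
          }
        ]
      }
    }
  }
}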
You need to use a reverse analyzer.
I explained it in this post, so please go and check my answer:
Elasticsearch spell check suggestions even if first letter missed
And regarding the duplicates, you can use skip_duplicates:
Whether duplicate suggestions should be filtered out (defaults to false).

fuzzy searching with query_string Elasticsearch

I have a record saved in Elasticsearch which contains a string exactly equal to Clash of clans.
Now I want to search for this string in Elasticsearch, and I am using this:
{
  "query_string" : {
    "query" : "clash"
  }
}
It works perfectly, but if I write
"query" : "class"
it doesn't give me back any record. So I realized I should use fuzzy searching, and I came to know that I can use the fuzziness parameter with query_string, so I did
{
  "query_string" : {
    "query" : "clas",
    "fuzziness" : 1
  }
}
but Elasticsearch is still not returning anything!
Kindly help. I can't use a fuzzy query; I can only use query_string.
Thanks
You need to use the ~ operator to have fuzzy searching in query_string:
{
  "query": {
    "query_string": {
      "query": "class~"
    }
  }
}
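The ~ operator uses an automatic edit distance by default in recent Elasticsearch versions; if you want to bound it explicitly, you can append the distance, e.g. class~1 for at most one edit:
{
  "query": {
    "query_string": {
      "query": "class~1"
    }
  }
}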

How to use Elasticsearch for advanced queries

I'm using Elasticsearch. I'm already pretty deep into it, but I'm very confused as to how to go about writing advanced queries. There are queries, filters, and more, and I'm not sure how to proceed.
I have a schema that looks like this:
photos: {
  people: [ { person_id: 1, person_name: "john kealy" } ],
  tags: [ { tag_id: 1, tag_name: "other tag" } ],
  by_line: "John D Kealy/My website.com",
  location: "Some Place Out West"
}
I need to be able to string together these queries dynamically, ALWAYS pulling in FULL MATCHES, e.g. I would like to search for
people.person_id: [1,2] (pulls in only photos with BOTH people, or more)
tags.tag_id: [1,2,3] (pulls in only photos with all three tags, or more)
by_line: "John D. Kealy/My Website.com" (the full name, including the slash)
location: "some place out west"
I would like to write one query with all these items. I need to include the slash in "by_line"; I don't care about upper or lower case. I need the exact match "some place out west". What do I use here: queries, or filters / filtered?
General guidelines for bool filters/queries can be found here.
If you are constructing an "exact match" query, you can often use the term filter (or query).
If you are constructing a search that requires solid performance speed-wise, a filtered query is often advisable, as filters are applied before the query is run, often improving performance.
As for your specific example, the filters below should work; throw them around a matchAll query or anything else you need (the first query assumes a non-analyzed by_line field; the analyzed version uses a match query). This should give you an idea of how to construct future queries:
NOTE: This assumes that your by_line field is not analyzed. The double backslash escapes your slash delimiter; if you are using an analyzed field, you must use a match query.
Without analyzer on by_line
{
  "query" : {
    "filtered" : {
      "filter" : {
        "bool" : {
          "must" : [
            { "terms" : { "people.person_id" : ["1", "2"] } },
            { "terms" : { "tags.tag_id" : ["1", "2", "3"] } },
            { "term" : { "by_line" : "John D. Kealy\\/My Website.com" } },
            { "term" : { "location" : "some place out west" } }
          ]
        }
      }
    }
  }
}
I will keep the above for future readers; however, I see in your post history that you are using the standard analyzer, so your query should be structured as follows.
With analyzer on by_line
{
  "query" : {
    "filtered" : {
      "query" : {
        "match" : {
          "by_line" : "John Kealy/BFA.com"
        }
      },
      "filter" : {
        "bool" : {
          "must" : [
            { "terms" : { "people.person_id" : ["1", "2"] } },
            { "terms" : { "tags.tag_id" : ["1", "2", "3"] } },
            { "term" : { "location" : "some place out west" } }
          ]
        }
      }
    }
  }
}

Elastic Search Hyphen issue with term filter

I have the following Elastic Search query with only a term filter. My query is much more complex but I am just trying to show the issue here.
{
  "filter": {
    "term": {
      "field": "update-time"
    }
  }
}
When I pass a hyphenated value to the filter, I get zero results back, but if I try an unhyphenated value, I get results back. I am not sure if the hyphen is the issue here, but my scenario makes me believe so.
Is there a way to escape the hyphen so the filter returns results? I have tried escaping the hyphen with a backslash, which I read about on the Lucene forums, but that didn't help.
Also, if I pass into this field a GUID value which is hyphenated and surrounded by curly braces, something like {ASD23-34SD-DFE1-42FWW}, would I need to lower-case the alphabetic characters, and would I need to escape the curly braces too?
Thanks
I would guess that your field is analyzed, which is the default setting for string fields in Elasticsearch. As a result, when it is indexed, it's not indexed as the single term "update-time" but as two terms: "update" and "time". That's why your term search cannot find it. If your field will always contain values that have to be matched completely, as-is, it would be best to define the field in the mapping as not analyzed. You can do that by recreating the index with a new mapping:
curl -XPUT http://localhost:9200/your-index -d '{
  "mappings" : {
    "your-type" : {
      "properties" : {
        "field" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'
curl -XPUT http://localhost:9200/your-index/your-type/1 -d '{
  "field" : "update-time"
}'
curl -XPOST http://localhost:9200/your-index/your-type/_search -d '{
  "filter": {
    "term": {
      "field": "update-time"
    }
  }
}'
Alternatively, if you want some flexibility in finding records based on this field, you can keep this field analyzed and use text queries instead:
curl -XPOST http://localhost:9200/your-index/your-type/_search -d '{
  "query": {
    "text": {
      "field": "update-time"
    }
  }
}'
Please keep in mind that if your field is analyzed, this record will also be found by searching for just the word "update" or just the word "time".
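For instance, with the analyzed mapping from the first example above, either single token still hits the document; a quick sketch using match (the current name of the old text query):
curl -XPOST http://localhost:9200/your-index/your-type/_search -d '{
  "query": {
    "match": {
      "field": "update"
    }
  }
}'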
The accepted answer didn't work for me with Elasticsearch 6.1. I solved it using the "keyword" sub-field that Elasticsearch provides by default on string fields.
{
  "filter": {
    "term": {
      "field.keyword": "update-time"
    }
  }
}
Based on the answer by @imotov: if you're using spring-data-elasticsearch, then all you need to do is mark your field as
@Field(type = FieldType.String, index = FieldIndex.not_analyzed)
instead of
@Field(type = FieldType.String)
Note that you will need to drop the index and re-instantiate it with the new mappings, though.
