ElasticSearch Match Multiple Prefix Terms - performance

I am trying to give ElasticSearch a query with multiple terms and then be given matching documents where the terms specified are anywhere in the target field. The terms may be full words or word prefixes.
Example document:
{
"msg": "hello I am a text message"
}
Example query string:
"hello message"
The words "hello" and "message" appear in the text so I want the document returned. The same query should also return the document if the query string is:
"hel mes"
What is the most performant way to query ElasticSearch to achieve this goal?

Related

How to prevent slow match / match_phrase queries for keywords in Kibana?

How can I achieve that a match query for certain fields is equivalent to a term query?
I have a larger index in Elastic covering events. Each event has an eventid field consisting of a random hex string (e.g. f4fc38c993c1a8273f9c40eedc9050b7) as well as some other fields. The eventid is indexed as keyword in Elastic.
If I query based on this field in Kibana, the query often runs into timeouts, because Kibana automatically generates a match query for eventid:f4fc38c993c1a8273f9c40eedc9050b7.
If I set a manual filter using { "query": { "term": { "eventid": "f4fc38c993c1a8273f9c40eedc9050b7" } } } (so a term instead of match query) I get a response quite quickly.
From my understanding, these should be pretty much equivalent, as keyword fields aren't analyzed, so the match query should be equivalent to a term query.
What am I missing?

Elastic search wildcard search space issue

Consider an index field "ProductName" with the value "dove 3.75oz". When the user searches for "dove 3.75oz", the bool query below retrieves the document:
{"bool":{"must":[{"wildcard":{"ProductName":{"value":"dove"}}},{"wildcard":{"ProductName":{"value":"3.75oz"}}}]}}
If the user searches for "dove 3.75 oz" (with a space between "3.75" and "oz"), the bool query fails to retrieve the same document:
{"bool":{"must":[{"wildcard":{"ProductName":{"value":"dove"}}},{"wildcard":{"ProductName":{"value":"3.75 oz"}}}]}}
Question: How can I design a wildcard query that matches with or without the space? Please share an example.
Text field values are broken into tokens by default and then stored. So something like "hello man" is stored as the separate tokens hello and man because of the space between them. That is exactly why this will not work with a wildcard query:
{"wildcard":{"ProductName":{"value":"3.75 oz"}}}
A wildcard query is matched against individual tokens, so a pattern containing a space never matches. For wildcard searches over the whole value you can use a special field type called wildcard.
If you do not want to reindex your data, try phrase search like:
"match_phrase": {
"ProductName": {
"query": "3.75 oz"
}
}
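The tokenization behaviour described above can be illustrated without a cluster. This is only a rough sketch: the analyze function is a crude stand-in for a real analyzer (it lowercases and splits on whitespace, nothing more), and both helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def analyze(text):
    """Crude stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def wildcard_matches(pattern, field_value):
    """Test the pattern against each indexed token, never the whole string."""
    return any(fnmatchcase(token, pattern) for token in analyze(field_value))


stored = "dove 3.75oz"
print(wildcard_matches("dove", stored))     # True: matches the token "dove"
print(wildcard_matches("3.75oz", stored))   # True: matches the token "3.75oz"
print(wildcard_matches("3.75 oz", stored))  # False: no single token contains a space
```

Note that a phrase query for "3.75 oz" looks for two adjacent tokens, so whether it finds a value stored as "3.75oz" depends on how the analyzer split that value at index time.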

How to find all documents with specific string in field?(Elasticsearch)

I have a document with fields:
"provider": "AppStore",
"device_model": "iPad3,6[graphicsDeviceName: PowerVR SGX 554]",
"days_in_game": 34,
I need to get all documents whose device_model contains the string iPad.
Is it possible?
There are two broad kinds of search queries in Elasticsearch: term-level queries and match (full-text) queries. A match query first analyzes the query string, then looks for documents containing the resulting words and scores them by how closely they match.
A term query is basically a yes-or-no query and returns only the documents that contain an exact match.
I think a term-level query is a better fit for your case. Since the field does not contain the exact word iPad but something like iPad3, you should use a prefix, wildcard or possibly a regexp query, depending on what your documents actually contain.
You could use the following query:
{
  "query": {
    "prefix": {
      "device_model": "iPad"
    }
  }
}
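If the query is generated from code, a small builder like the one below may help. It is a minimal sketch: build_prefix_query is a hypothetical helper, not part of any client library. One assumption worth stating: the prefix value is not analyzed, so if device_model is an analyzed text field whose tokens were lowercased at index time, "iPad" may not match. Recent Elasticsearch versions (7.10 and later, as far as I know) accept a case_insensitive option on prefix queries, which the sketch can switch on.

```python
import json


def build_prefix_query(field, prefix, case_insensitive=False):
    """Build a prefix query body; the helper name is hypothetical."""
    body = {"value": prefix}
    if case_insensitive:
        # Assumes a version that supports this option on prefix queries.
        body["case_insensitive"] = True
    return {"query": {"prefix": {field: body}}}


query = build_prefix_query("device_model", "iPad", case_insensitive=True)
print(json.dumps(query, indent=2))
```

The resulting dictionary can be passed as the request body of a search call or serialized to JSON as shown.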

Is it possible to chain fquery filters in elastic search with exact matches?

I have been having trouble writing a method that will take in various search parameters in elasticsearch. I was working with queries that looked like this:
body:
{query:
{filtered:
{filter:
{and:
[
{term: {some_term: "foo"}},
{term: {is_visible: true}},
{term: {"term_two": "something"}}]
}
}
}
}
Using this syntax I thought I could chain these terms together and programmatically generate the queries. I was using simple strings, and for a term like "person_name" I could split the query in two and say "where person_name match 'JOHN'" and "where person_name match 'SMITH'", getting accurate results.
However, I just came across the "fquery" upon asking this question:
Escaping slash in elasticsearch
I was not able to use this "and"/"term" filter to search for a value with slashes in it, so I learned that I can use fquery to search for the full value, like this:
"fquery": {
  "query": {
    "match": {
      "by_line": "John Smith"
    }
  }
}
But how can I search like this for multiple items? It seems that when I combine fquery with my filtered/filter/and/term queries, my "and" term queries are ignored. What is the best practice for making nested / chained queries using Elasticsearch?
As in the comment below, yes I can just add fquery to the "and" block like so
{:filtered=>
{:filter=>
{:and=>[
{:term=>{:is_visible=>true}},
{:term=>{:is_private=>false}},
{:fquery=>
{:query=>{:match=>{:sub_location=>"New JErsey"}}}}]}}}
Why would elasticsearch also return results with "sub_location" = "new York"? I would like to only return "new jersey" here.
A match query analyzes the input and by default it is a boolean OR query if there are multiple terms after the analysis. In your case, "New JErsey" gets analyzed into the terms "new" and "jersey". The match query that you are using will search for documents in which the indexed value of field "sub_location" is either "new" or "jersey". That is why your query also matches documents where the value of field "sub_location" is "new York" because of the common term "new".
To only match for "new jersey", you can use the following version of the match query:
{
"query": {
"match": {
"sub_location": {
"query": "New JErsey",
"operator": "and"
}
}
}
}
This will not match documents where the value of field "sub_location" is "New York". However, it will match documents where the value is, say, "Jersey New", because the query finally translates into a boolean query like "new" AND "jersey", which ignores word order. If you are fine with this behaviour, well and good; otherwise read on.
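The difference between the default OR behaviour and "operator": "and" can be shown with a toy model. This is only a sketch: analyze is a crude stand-in for the default analyzer (lowercase plus whitespace split), and match_query is a hypothetical function that ignores scoring entirely.

```python
def analyze(text):
    """Crude stand-in for the default analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match_query(query, field_value, operator="or"):
    """Return True if the analyzed query terms match the analyzed field value."""
    indexed_terms = set(analyze(field_value))
    hits = [term in indexed_terms for term in analyze(query)]
    return all(hits) if operator == "and" else any(hits)


print(match_query("New JErsey", "new York"))                    # True: "new" is shared
print(match_query("New JErsey", "new York", operator="and"))    # False: "jersey" is missing
print(match_query("New JErsey", "Jersey New", operator="and"))  # True: word order is ignored
```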
All these issues arise because you are using the default analyzer for the field "sub_location", which breaks the text into tokens at word boundaries and indexes them. If you do not care about partial matches and always want to match the entire string, you can define a custom analyzer that uses the Keyword Tokenizer and the Lowercase Token Filter. Mind you, this approach requires re-indexing all your documents.
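Index settings for such an analyzer might look roughly like the following. Treat it as a sketch under assumptions: the analyzer name lowercase_keyword is made up for illustration, and the typeless mappings layout assumes a recent Elasticsearch version (older releases, such as the one that still had filtered and fquery, expect a type name under mappings).

```python
import json

# Request body for creating the index; pass it to whatever client you use.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Hypothetical name: emits the whole value as one lowercased token.
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "sub_location": {"type": "text", "analyzer": "lowercase_keyword"}
        }
    },
}

print(json.dumps(index_body, indent=2))
```

With this mapping, "New JErsey" would be indexed as the single token "new jersey", so a match query on the field compares whole values case-insensitively instead of individual words.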

ElasticSearch: Matching multiple queries

I am using Tire (ElasticSearch Ruby gem), and want to match a few fields on the keyword "community marketing". However, I also want ElasticSearch to return me results for the keyword "communities marketing" as well. The standard analyzer does not parse/tokenize "communities" as "community" so they're separate keywords.
How do I get ElasticSearch to return me results for both "community marketing" and "communities marketing"? I prefer to do this in query time, rather than index time. I'm fine with ElasticSearch standard analyzer and prefer not to mess around with it.
fields = ["title", "popular_hash_tags"]
keyword = "communities marketing"
keyword2 = "community marketing"
s = Tire.search "articles" do
query do
match fields, keyword, :operator => "AND"
#NOW I also want to match keyword2??
end
end
I suggest digging through the query DSL of Elasticsearch. You will find a lot of interesting stuff.
For instance, the "should" clause of a bool filter.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-filter.html
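As a rough sketch of the "should" idea in raw query DSL, built here as a Python dictionary rather than with Tire: each keyword becomes its own multi_match clause with the AND operator, and a document needs to satisfy at least one of them. The helper name build_any_keyword_query is hypothetical, and the field names are taken from the question.

```python
import json


def build_any_keyword_query(fields, keywords):
    """Match documents that satisfy at least one keyword phrase (hypothetical helper)."""
    should_clauses = [
        {"multi_match": {"query": keyword, "fields": list(fields), "operator": "and"}}
        for keyword in keywords
    ]
    return {"query": {"bool": {"should": should_clauses, "minimum_should_match": 1}}}


fields = ["title", "popular_hash_tags"]
query = build_any_keyword_query(fields, ["communities marketing", "community marketing"])
print(json.dumps(query, indent=2))
```

This keeps everything at query time, as asked, at the cost of listing each keyword variant explicitly.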
