Performing an AND query in Elasticsearch

I have tried looking for another solution to this, but the Bool query in ES seems to not do quite what I am looking for. Or I am just not using it correctly.
In our current implementation of search we are trying to boost performance/reduce memory footprint of each query by changing our query logic. Today, if you search for "The Red Ball" you may get back 5 million documents because ES returns any document that matches "the" OR "red" OR "ball" which means we get back WAAAAAY too many irrelevant documents (mostly because of the "the" term). I would like to change our query to instead use AND so ES would return only documents that match "the" AND "red" AND "ball".
I am using the NEST Client to do this with C# so an example using the client would be best since that seems to be where I cannot figure out what to do. Thanks

You can simply use a query_string query with the AND operator (remove default_field if you want to search on all fields):
{
  "query": {
    "query_string": {
      "default_field": "your_field",
      "query": "the red ball",
      "default_operator": "AND"
    }
  }
}
Or simply:
{
  "query": {
    "query_string": {
      "query": "the AND red AND ball"
    }
  }
}
I do not know C#, but this is roughly how it might look in NEST (everyone, feel free to edit):
client.Search<your_index>(q => q
    .Query(qu => qu
        .QueryString(qs => qs
            .OnField(x => your_field).Query("the AND red AND ball")
        )
    )
);

I found the appropriate query to make using the NEST client:
SearchDescriptor<BackupEntitySearchDocument> desc = new SearchDescriptor<BackupEntitySearchDocument>();
desc.Query(qq => qq.MultiMatch(m => m.OnFields(_searchFields).Query(query).Operator(Operator.And)));
var searchResp = await _client.SearchAsync<BackupEntitySearchDocument>(desc).ConfigureAwait(false);
Where _searchFields is a List<string> containing the fields to match on and query is the term to search for.
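
For readers not using NEST, the request body that this kind of call is meant to produce can be sketched in Python. This is only an illustration: the function name build_and_query and the field names are made up for the example, and it assumes the standard multi_match query with its operator parameter. One caveat, as far as I know: with the default best_fields type the operator is applied per field, so all terms have to appear in the same field.

```python
import json


def build_and_query(text, fields):
    """Build a search body that requires every term of `text` to match."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "operator": "and",
            }
        }
    }


# Hypothetical field names, for illustration only.
body = build_and_query("the red ball", ["title", "description"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request with any HTTP client.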

Related

Search in two fields on elasticsearch with kibana

Assuming I have an index with two fields, title and loc, I would like to search in these two fields and get the "best" match. So if I have three items:
{"title": "castle", "loc": "something"},
{"title": "something castle something", "loc": "something,pontivy,something"},
{"title": "something else", "loc": "something"}
... I would like to get the second one, which has "castle" in its title and "pontivy" in its loc. I have simplified the example; the real data is a bit more complicated. So I tried this query, but it does not seem accurate (it's a feeling, not really easy to explain):
GET merimee/_search/?
{
  "query": {
    "multi_match": {
      "query": "castle pontivy",
      "fields": [ "title", "loc" ]
    }
  }
}
Is this the right way to search across several fields and get the document that matches in all of them?
Not sure my question is clear enough, I can edit if required.
EDIT:
The story is: the user types "castle pontivy" and I want to get the "best" result for this query, which is the second one because it contains "castle" in "title" and "pontivy" in "loc". In other words, I want the document that scores best across both fields.
As the other poster suggested, you could use a bool query, but that might not work for your use case since you have a single search box that you want to query against multiple fields.
I recommend looking at a Simple Query String query as that will likely give you the functionality you're looking for. See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html
So you could do something similar to this:
{
  "query": {
    "simple_query_string": {
      "query": "castle pontivy",
      "fields": ["title", "loc"],
      "default_operator": "and"
    }
  }
}
This returns the best-scoring documents in which both terms match, each in either of those fields. The default operator is set to AND here because it is otherwise OR, which might not give you the expected results.
It is worthwhile to experiment with the other options available for this query type as well. You might also explore the Query String query, as it gives more flexibility, but the Simple Query String query works very well for most cases.
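
To make the AND behaviour concrete, here is a small Python simulation run against the three sample documents from the question. It is not how Elasticsearch works internally: the tokens function is a crude stand-in for an analyzer, and matches_all_terms is a made-up helper that only mimics the rule "every term must match in at least one of the listed fields".

```python
import re


def tokens(text):
    """Very rough stand-in for an analyzer: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def matches_all_terms(doc, query, fields):
    """True if every query term occurs in at least one of the given fields."""
    field_tokens = [tokens(doc.get(f, "")) for f in fields]
    return all(any(term in ft for ft in field_tokens) for term in tokens(query))


docs = [
    {"title": "castle", "loc": "something"},
    {"title": "something castle something", "loc": "something,pontivy,something"},
    {"title": "something else", "loc": "something"},
]

hits = [d for d in docs if matches_all_terms(d, "castle pontivy", ["title", "loc"])]
print(hits)
```

Only the second document survives, because it is the only one where both "castle" and "pontivy" are found somewhere in title or loc.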
This can be done by using a bool query with one match clause per field:
GET _search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "castle" } },
        { "match": { "loc": "pontivy" } }
      ]
    }
  }
}

Custom score for exact, phonetic and fuzzy matching in elasticsearch

I have a requirement where there needs to be custom scoring on name. To keep it simple lets say, if I search for 'Smith' against names in the index, the logic should be:
if input = exact 'Smith' then score = 100%
else
if input = phonetic match then
score = <depending upon fuzziness match of input with name>%
end if
end if;
I'm able to search documents with a fuzziness of 1 but I don't know how to give it custom score depending upon how fuzzy it is. Thanks!
Update:
I went through a post that had the same requirement as mine and it was mentioned that the person solved it by using native scripts. My question still remains, how to actually get the score based on the similarity distance such that it can be used in the native scripts:
The post for reference:
https://discuss.elastic.co/t/fuzzy-query-scoring-based-on-levenshtein-distance/11116
The text to look for in the post:
"For future readers I solved this issue by creating a custom score query and
writing a (native) script to handle the scoring."
You can implement this search logic using the function_score query.
Here is a possible example:
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "input": "Smith"
        }
      },
      "boost": "5",
      "functions": [
        {
          "filter": { "match": { "input.keyword": "Smith" } },
          "weight": 23
        }
      ]
    }
  }
}
In this example we have a mapping with the input field indexed both as text and as keyword (input.keyword is for exact matches). Documents that match the term "Smith" exactly are re-scored higher than the other documents matched by the first query (a match in this example, but in your case it would be the query with fuzziness).
You can control the strength of the re-scoring by tuning the weight parameter.
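
The question also asks how a score could be derived from the similarity distance itself. The sketch below shows one possible formula in plain Python, computed outside Elasticsearch: 100 for an exact match, and otherwise a percentage that shrinks with the Levenshtein distance. The names levenshtein and name_score are made up for this example, the phonetic step is not modelled, and this is not the score Elasticsearch computes for a fuzzy query.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep when equal)
            ))
        prev = curr
    return prev[-1]


def name_score(query, candidate):
    """Return 100.0 for an exact (case-insensitive) match, less as edits grow."""
    q, c = query.lower(), candidate.lower()
    longest = max(len(q), len(c))
    if longest == 0:
        return 100.0
    return 100.0 * (longest - levenshtein(q, c)) / longest


for name in ["Smith", "Smyth", "Smithe"]:
    print(name, round(name_score("Smith", name), 1))
```

The same arithmetic could be applied to the hits returned by a fuzzy query on the client side, or ported into a scoring script if the scoring has to happen inside the cluster.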

Analyzer to find, e.g., "starbucks" when mistakenly querying "star bucks"

How would I define an analyzer so a query recalls a document with term "starbucks" when mistakenly querying "star bucks"?
Or in general: how would I define an analyzer that can find combined terms by ignoring the term separators/spaces in the supplied query?
N-grams clearly don't work, since you'd have to know to split up the term 'starbucks' on indexing in 2 separate terms 'star' and 'bucks'. Splitting on syllables might be enough, but not sure if that's possible (or scales)
Thoughts?
You can use Fuzzy Search.
Here is a full working sample:
PUT test1
POST test1/a
{
  "item1": "starbucks"
}
POST test1/a
{
  "item1": "foo"
}
GET test1/a/_search
{
  "query": {
    "fuzzy": {
      "item1": "star bucks"
    }
  }
}
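
Why this works, as far as I understand it: the fuzzy query is a term-level query, so "star bucks" is not split into two words but compared as one string against the indexed terms, and it is a single edit (deleting the space) away from "starbucks". The Python sketch below only illustrates that edit-distance argument; one_deletion_variants is a made-up helper, not an Elasticsearch API.

```python
def one_deletion_variants(term):
    """All strings obtained by deleting exactly one character from `term`."""
    return {term[:i] + term[i + 1:] for i in range(len(term))}


query_term = "star bucks"   # sent to the fuzzy query as a single, unanalyzed term
indexed_term = "starbucks"  # the term stored in the index

print(indexed_term in one_deletion_variants(query_term))
```

The other sample document ("foo") is many edits away, so it is not returned.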

In elastic search, q=joh* is returning a correct set, but a JSON with match: joh* is not

When I call this URL:
http://192.168.x.x:9200/identities/work/_search?q=joh*
ES is returning a limited (5) set of matches, starting with some entries for people named John, Johnny, etc. That seems to be the correct result.
But when I send this JSON to ES:
{
  "query": {
    "match": {
      "_all": "joh*"
    }
  }
}
I get results that I can't even logically explain. Seems rather random, and a lot of indexes too (hundreds, not a lot of johns and johnny's either ;))
Is this not the equivalent of the URL mentioned above? What am I doing wrong?
When you call the following URL, what ES does implicitly is create a query_string query, not a match query:
http://192.168.x.x:9200/identities/work/_search?q=joh*
So the equivalent JSON query would be:
{
  "query": {
    "query_string": {
      "query": "joh*"
    }
  }
}
Moreover, match queries do not handle wildcards such as joh*: the * is treated as ordinary text (with the standard analyzer it is simply dropped during analysis), not as a wildcard.
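
The difference can be imitated in a few lines of Python. This is only a sketch: analyze is a crude stand-in for a standard analyzer, indexed_terms is invented sample data, and fnmatch is used here merely to mimic wildcard expansion against the indexed terms.

```python
import fnmatch
import re


def analyze(text):
    """Rough stand-in for a standard analyzer: lowercase word tokens only."""
    return re.findall(r"\w+", text.lower())


# Invented sample of indexed terms, for illustration only.
indexed_terms = ["john", "johnny", "joh", "mary"]

# query_string-style: "*" acts as a wildcard over the indexed terms.
wildcard_hits = [t for t in indexed_terms if fnmatch.fnmatchcase(t, "joh*")]

# match-style: the input is analyzed first, so "*" is dropped and
# the remaining token "joh" has to equal an indexed term exactly.
match_terms = analyze("joh*")
match_hits = [t for t in indexed_terms if t in match_terms]

print(wildcard_hits)
print(match_hits)
```

In this toy setup the wildcard version finds john, johnny and joh, while the match version finds only the literal term joh.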

How do I construct this elasticsearch query object?

My documents are indexed like this:
{
title: "stuff here",
description: "stuff here",
keywords: "stuff here",
score1: "number here",
score2: "number here"
}
I want to perform a query that:
Uses the title, description, and keywords fields for matching the text terms.
It doesn't have to be a complete match. E.g., if someone searches "I have a big nose", and "nose" is in one of the document titles but "big" is not, then this document should still be returned.
Edit: I tried this query and it works. Can someone confirm if this is the right way to do it? Thanks.
{
  query: {
    'multi_match': {
      'query': q,
      'fields': ['title^2', 'description', 'keywords'],
    }
  }
}
Your way is definitely the way to go!
The multi_match query is usually the one that you want to expose to end users. The query_string query is similar, but more powerful and also more dangerous, since it exposes the Lucene query syntax. Rule of thumb: don't use query_string if you don't need it.
Also, searching on multiple fields is easy: just provide the list of fields you want to search on, as you did, without the need for a bool query.
Below is code that will create the query you can use. I wrote it in C# with Lucene.Net, but it will work the same way in other languages.
What you need to do is create a BooleanQuery and require that at least 1 of its conditions matches. Then add a condition with the Occur.SHOULD enum value for every document field you want checked:
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

BooleanQuery searchQuery = new BooleanQuery();
searchQuery.SetMinimumNumberShouldMatch(1);
// One SHOULD clause per field; at least one of them has to match.
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "title", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
Query titleQuery = parser.Parse("I have a big nose");
searchQuery.Add(titleQuery, BooleanClause.Occur.SHOULD);
parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "description", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
Query descriptionQuery = parser.Parse("I have a big nose");
searchQuery.Add(descriptionQuery, BooleanClause.Occur.SHOULD);
parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "keywords", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
Query keywordsQuery = parser.Parse("I have a big nose");
searchQuery.Add(keywordsQuery, BooleanClause.Occur.SHOULD);
// Wrap the per-field clauses in an outer query as a mandatory clause.
BooleanQuery queryThatShouldBeExecuted = new BooleanQuery();
queryThatShouldBeExecuted.Add(searchQuery, BooleanClause.Occur.MUST);
Here is a link to an example in Java: http://www.javadocexamples.com/java_source/org/apache/lucene/search/TestBooleanMinShouldMatch.java.html
The corresponding JSON object for an HTTP POST request would be this:
{
  "bool": {
    "should": [
      {
        "query_string": {
          "query": "I have a big nose",
          "default_field": "title"
        }
      },
      {
        "query_string": {
          "query": "I have a big nose",
          "default_field": "description"
        }
      },
      {
        "query_string": {
          "query": "I have a big nose",
          "default_field": "keywords"
        }
      }
    ],
    "minimum_should_match": 1,
    "boost": 1
  }
}
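
Since the three clauses differ only in the field name, the object can also be generated in a loop, which avoids copy-paste slips. The Python sketch below is one way to do that. The function name build_should_query is made up, wrapping the bool object in a top-level "query" key assumes the body is sent to the _search endpoint, and minimum_should_match is assumed to be the parameter name accepted by your Elasticsearch version.

```python
def build_should_query(text, fields, minimum_should_match=1):
    """Build a bool query with one query_string clause per field."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"query_string": {"query": text, "default_field": field}}
                    for field in fields
                ],
                "minimum_should_match": minimum_should_match,
            }
        }
    }


body = build_should_query("I have a big nose", ["title", "description", "keywords"])
for clause in body["query"]["bool"]["should"]:
    print(clause["query_string"]["default_field"])
```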

Resources