Search in two fields on elasticsearch with kibana - elasticsearch

Assuming I have an index with two fields: title and loc, I would like to search in this two fields and get the "best" match. So if I have three items:
{"title": "castle", "loc": "something"},
{"title": "something castle something", "loc": "something,pontivy,something"},
{"title": "something else", "loc": "something"}
... I would like to get the second one which has "castle" in its title and "pontivy" in its loc. I tried to simplify the example and the base, it's a bit more complicated. So I tried this query, but it seems not accurate (it's a feeling, not really easy to explain):
GET merimee/_search/?
{
"query": {
"multi_match" : {
"query": "castle pontivy",
"fields": [ "title", "loc" ]
}
}
}
Is it the right way to search in various field and get the one which match the in all the fields?
Not sure my question is clear enough, I can edit if required.
EDIT:
The story is: the user type "castle pontivy" and I want to get the "best" result for this query, which is the second because it contains "castle" in "title" and "pontivy" in "loc". In other words I want the result that has the best result in both fields.

As the other posted suggested, you could use a bool query but that might not work for your use case since you have a single search box that you want to query against multiple fields with.
I recommend looking at a Simple Query String query as that will likely give you the functionality you're looking for. See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html
So you could do something similar to this:
{
"query": {
"simple_query_string" : {
"query": "castle pontivy",
"fields": ["title", "loc"],
"default_operator": "and"
}
}
}
So this will try to give you the best documents that match both terms in either of those fields. The default operator is set as AND here because otherwise it is OR which might not give you the expected results.
It is worthwhile to experiment with other options available for this query type as well. You might also explore using a Query String query as it gives more flexibility but the Simple Query String term works very well for most cases.

This can be done by using bool type of query and then matching the fields.
GET _search
{
"query":
{
"bool": {"must": [{"match": {"title": "castle"}},{"match": {"loc": "pontivy"}}]
}
}
}

Related

How can we make few tokens to be phrase in elastic search query

I want to search part of query to be considered as phrase .For e.g. I want to search "Can you show me documents for Hospitality and Airline Industry"
Here I want Airline Industry to be considered as phrase.I dont find any such settings in multi_match .
Even when we try to use multi_match query using "Can you show me documents for Hospitality and \"Airline Industry\"" .Default analyser breaks it into separate tokens.I dont want to change settings of my analyser.Also I have found that we can do this in simple_query_string but that has consequences that we can not apply filter option as we have in multi_match boolean query because I want to apply filter on certain feilds as well.
search_text="Can you show me documents for Hospitality and Airline Industry" Now I Want to pass Airline Industry as a phrase to search my indexed document against 2 fields.
okay so say I have existing code like this.
If filter:
qry={
“query":{
“bool”:{
“must”:{
"multi_match":{
"query":search_text,
"type":"best_fields",
"fields":["TITLE1","TEXT"],
"tie_breaker":0.3,
}
},
“filter”:{“terms”:{“GRP_CD”:[“1234”,”5678”] }
}
}
else:
qry={
"query":{
"multi_match":{
"query":search_text',
"type":"best_fields",
"fields":["TITLE1",TEXT"],
"tie_breaker":0.3
}
}
}
'But then I have realised this code is not handling Airline Industry as a phrase even though I am passing search string like this
"Can you show me documents for Hospitality and \"Airline Industry\""
As per elastic search document I came to know there is this query which might handle this
qry={"query":{
"simple_query_string":{
"query":"Can you show me documents for Hospitality and \"Airline Industry\"",
"fields":["TITLE1","TEXT"] }
} }
But now my issue is what if user want to apply filter..with filter query as above I can not pass phrase and boolean query is not possible with simple_query_string'
You can always combine queries using boolean query. Lets understand this case by case. Before going to the cases I would like to clarify one thing which is about filter. The filter clause of boolean query behave just like a must clause but the difference is that any query (even another boolean query with a must/should clause(s)) inside filter clause have filter context. Filter context means, that part of query will not be considered for score calculation.
Now lets move on to cases:
Case 1: Only query and no filters.
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Can you show me documents for Hospitality and \"Airline Industry\"",
"fields": [
"TITLE1",
"TEXT"
]
}
}
]
}
}
}
Notice that the query is same as specified by you in the question. All I have done here is that I wrapped it in a bool query. This doesn't make any logical change to the query but doing so will make it easier to add queries to filter clause programmatically.
Case 2: Phrase query with filter.
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Can you show me documents for Hospitality and \"Airline Industry\"",
"fields": [
"TITLE1",
"TEXT"
]
}
}
],
"filter": [
{
"terms": {
"GRP_CD": [
"1234",
"5678"
]
}
}
]
}
}
}
This way you can combine query(query context) with the filters.

Find one result based on a term query or a list of results based on a match query

I have an index of documents, each containing an id and name field. Each document name happens to be unique.
I want to perform a query on the name field that returns one exact result if possible, or falls back to return a list of similar results. For example, if the search term is Acme Incorporated and there is an exact result, return that only. Otherwise return similar matches; e.g: ACME Inc., acme, Ace etc.
I assumed that I need to somehow combine a keyword-based term query for an exact match, and a text-based match query for the similar matches. I am still getting to grips with compound queries so my first attempt was pretty naive:
{
"query": {
"bool": {
"should": [
{
"term": {
"name.exact": "Acme Incorporated"
}
},
{
"match": {
"name": "Acme Incorporated"
}
}
]
}
}
}
This returns a list of similar matches AND an exact match if present, because at least one query should succeed. This is obviously not correct.
In order to facilitate the keyword-based term query above, I added name.exact to my document mapping:
{
"mappings": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "text",
"fields": {
"exact": {
"type": "keyword"
}
}
}
}
}
}
I suppose another approach is use the Multi Search API to perform the above queries separately. This allows me to look at the responses, and decide to use the match query if the term query result set is empty. This will work for my use case but I suspect that this is not an optimal approach.
I assume this is a common use-case but I am not sure what the solution is.
Edit
My current thinking on this is that I go with a Multi Search query as described above, the first is the same keyword-based term query to attempt to find an exact result and the second is the following — a compound bool query that excludes an exact result.
{
"query": {
"bool": {
"must": {
"match": {
"name": "Acme Incorporated"
}
},
"must_not": {
"term": {
"name.keyword": "Acme Incorporated"
}
}
}
}
}
In the end, the MultiSearch API suited my use case:
The multi search API executes several searches from a single API request. The format of the request is similar to the bulk API format and makes use of the newline delimited JSON (NDJSON) format.
I used this to perform two queries in one request:
Find any exact results with a keyword-based term query on the document name field.
Find any similar results with a bool query, comprising a match query on the
document name field, and a must_not of the first query to
filter out any exact results.
A Multi Search body is constructed of one or more pairs of an (optionally) empty header and body (a single query) delimited by newlines; e.g:
GET /myindex/_msearch
{}
{"query": {"constant_score": {"filter": {"term": {"name.keyword": "Acme Incorporated"}}}}}
{}
{"query": {"bool": {"must": {"match": {"name": "Acme Incorporated"}}, "must_not": {"term": {"name.keyword": "Acme Incorporated"}}}}}
The query is in ndjson format, which states that "Each Line is a Valid JSON Value". This requires that each query be compressed to one line, which is not very readable but not an issue if you're using a library to construct queries.

How can I achieve this type of queries in ElasticSearch?

I have added a document like this to my index
POST /analyzer3/books
{
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
And then I do queries like this
GET /analyzer3/_analyze
{
"analyzer": "english",
"text": "\"The * day I went with my * to the\""
}
And it successfully returns the previously added document.
My idea is to have quotes so that the query becomes exact, but also wildcards that can replace any word. Google has this exact functionality, where you can search queries like this, for instance "I'm * the university" and it will return page results that contain texts like I'm studying in the university right now, etc.
However I want to know if there's another way to do this.
My main concern is that this doesn't seem to work with other languages like Japanese and Chinese. I've tried with many analyzers and tokenizers to no avail.
Any answer is appreciated.
Exact matches on the tokenized fields are not that straightforward. Better save your field as keyword if you have such requirements.
Additionally, keyword data type support wildcard query which can help you in your wildcard searches.
So just create a keyword type subfield. Then use the wildcard query on it.
Your search query will look something like below:
GET /_search
{
"query": {
"wildcard" : {
"title.keyword" : "The * day I went with my * to the"
}
}
}
In the above query, it is assumed that title field has a sub-field named keyword of data type keyword.
More on wildcard query can be found here.
If you still want to do exact searches on text data type, then read this
Elasticsearch doesn't have Google like search out of the box, but you can build something similar.
Let's assume when someone quotes a search text what they want is a match phrase query. Basically remove the \" and search for the remaining string as a phrase.
PUT test/_doc/1
{
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
GET test/_search
{
"query": {
"match_phrase": {
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
}
}
For the * it's getting a little more interesting. You could just make multiple phrase searches out of this and combine them. Example:
GET test/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"title": "The"
}
},
{
"match_phrase": {
"title": "day I went with my"
}
},
{
"match_phrase": {
"title": "to the"
}
}
]
}
}
}
Or you could use slop in the phrase search. All the terms in your search query have to be there (unless they are being removed by the tokenizer or as stop words), but the matched phrase can have additional words in the phrase. Here we can replace each * with 1 other words, so a slop of 2 in total. If you would want more than 1 word in the place of each * you will need to pick a higher slop:
GET test/_search
{
"query": {
"match_phrase": {
"title": {
"query": "The * day I went with my * to the",
"slop": 2
}
}
}
}
Another alternative might be shingles, but this is a more advanced concept and I would start off with the basics for now.

Elasticsearch wrong explanation validate api

I'm using Elasticsearch 5.2. I'm executing the below query against an index that has only one document
Query:
GET test/val/_validate/query?pretty&explain=true
{
"query": {
"bool": {
"should": {
"multi_match": {
"query": "alkis stackoverflow",
"fields": [
"name",
"job"
],
"type": "most_fields",
"operator": "AND"
}
}
}
}
}
Document:
PUT test/val/1
{
"name": "alkis stackoverflow",
"job": "developer"
}
The explanation of the query is
+(((+job:alkis +job:stackoverflow) (+name:alkis +name:stackoverflow))) #(#_type:val)
I read this as:
Field job must have alkis and stackoverflow
AND
Field name must have alkis and stackoverflow
This is not the case with my document though. The AND between the two fields is actually OR (as it seems from the result I'm getting)
When I change the type to best_fields I get
+(((+job:alkis +job:stackoverflow) | (+name:alkis +name:stackoverflow))) #(#_type:val)
Which is the correct explanation.
Is there a bug with the validate api? Have I misunderstood something? Isn't the scoring the only difference between these two types?
Since you picked the most_fields type with an explicit AND operator, the reasoning is that one match query is going to be generated per field and all terms must be present in a single field for a document to match, which is your case, i.e. both terms alkis and stackoverflow are present in the name field, hence why the document matches.
So in the explanation of the corresponding Lucene query, i.e.
+(((+job:alkis +job:stackoverflow) (+name:alkis +name:stackoverflow)))
when no specific operator is specified between the terms, the default one is an OR
So you need to read this as: Field job must have both alkis and stackoverflow OR field name must have both alkis and stackoverflow.
The AND operator that you apply only concerns all the terms in your query but in regard to a single field, it's not an AND between all fields. Said differently, your query will be executed as a two match queries (one per field) in a bool/should clause, like this:
{
"query": {
"bool": {
"should": [
{ "match": { "job": "alkis stackoverflow" }},
{ "match": { "name": "alkis stackoverflow" }}
]
}
}
}
In summary, the most_fields type is most useful when querying multiple fields that contain the same text analyzed in different ways. This is not your case and you'd probably better be using cross_fields or best_fields depending on your use case, but certainly not most_fields.
UPDATE
When using the best_fields type, ES generates a dis_max query instead of a bool/should and the | (which is not an OR !!) sign separates all sub-queries in a dis_max query.

Productsearch with Elasticsearch

I am relatively new to elasticsearch and I want to perform a search for products with brand and type names.
I already tried a bit but I think I am missing something important to have a solid search algorithm. Here is my approach:
A product looks e.g. like this:
{
brandName: "Samsung",
typeName: "PS-50Q7HX",
...
}
I will have a single input field. The user can search for a brand/type only or for a brand in combination with a type name. E.g.
Samsung | Samsung PS-50Q7HX | PS-50Q7HX
To eliminate misstyping in the typeName field I use an ngram tokenizer which works great when I search for types only. But in combination with the brandName field I get in trouble. Using something like this does not work well (especially when I use an ngram tokenizer on the brandName field too):
{
"query" : {
"multi_match" : {
"query": "Samsung PS 50Q 7HX",
"type": "cross_fields",
"fields": ["brandName", "typeName"]
}
}
}
Of course I know why this is not working well with two ngram tokenizer and a mixed field but I am not sure how to solve this the best way.
I think the main problem is that I do not know if the user entered a brand name or not and I thought about using a second index filled with all available brands, which I use to perform a "pre-search" for an eventually given brand name in my query string. If I find a match I am able to split the search string into type and brand name and perform a more specific search. Like this one
{
"query": {
"bool": {
"must": [
{ "match": { "brandName": "Samsung" } },
{ "match": { "typeName": "PS-50Q7HX" } }
]
}
}
}
Does this sound like a good approach? Or does anyone see a better way?
Any help is appreciated!
Thank you very much and best regards,
Stefan
To eliminate the typo mistake by the user, you used ngram analyzer which is a costly one. You could use stem analyzer which provide some flexible options to eliminate the typo mistakes
As per my concern, instead of index this in 2 different fields you could index this as a single field.
ex:- "FIELD_NAME": "Samsung|PS-50Q7HX"
Brand name and Product name with some delimiter i used |. analyse this field values with delimiter. so your content data will be index as follows
Samsung
PS-50Q7HX
Then you could search by the following query
{
"query": {
"query-string": {
"query": "Samsung PS-50Q7HX",
"default_operator": "or",
"fields": [
"FIELD_NAME"
]
}
}
}
this will retrieve the document which has the brand name as samsung or product name as PS-50Q7Hx from index. you could use prefix search and if you use default_operator as and then your search will be most accuracy.

Resources