Elasticsearch multi_match query over multiple fields with fuzziness

How can I add fuzziness to a multi_match query, so that a search for 'basball' still finds 'baseball' articles? Currently my query looks like this:
POST /newspaper/articles/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "baseball",
"type": "phrase",
"fields": [
"subject^3",
"section^2.5",
"article^2",
"tags^1.5",
"notes^1"
]
}
}
}
}
}
One option I was looking at is something like the following, but I don't know if it is the best option. It's important to keep the results sorted by score:
"query" : {
"query_string" : {
"query" : "subject:basball^3 section:basball^2.5 article:basball^2",
"fuzzy_prefix_length" : 1
}
}
Suggestions?

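To add fuzziness to a multi_match query, add the fuzziness property as described in the Elasticsearch documentation. Note that fuzziness cannot be combined with "type": "phrase" (or phrase_prefix), so the type line is dropped here and multi_match falls back to its default best_fields type:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "baseball",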
"fields": [
"subject^3",
"section^2.5",
"article^2",
"tags^1.5",
"notes^1"
],
"fuzziness" : "AUTO",
"prefix_length" : 2
}
}
}
}
}
Note that the documentation describes prefix_length as:
The number of initial characters which will not be “fuzzified”. This helps to reduce the number of terms which must be examined. Defaults to 0.
For the possible values of fuzziness, see the Elasticsearch documentation.
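To make the two parameters concrete, here is a small Python sketch of the matching rule. It is a simplified model and not Elasticsearch code: the function names are made up for this example, the AUTO thresholds (0 edits up to 2 characters, 1 edit for 3 to 5, 2 edits above that) follow the documented defaults, and it uses plain Levenshtein distance, whereas Elasticsearch by default also counts a transposition of two adjacent characters as a single edit.

```python
def auto_max_edits(term: str) -> int:
    """Edit budget that fuzziness AUTO gives a term of this length (default thresholds)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute ca -> cb
        previous = current
    return previous[-1]


def fuzzy_matches(query_term: str, indexed_term: str, prefix_length: int = 0) -> bool:
    """True if indexed_term is within the AUTO edit budget of query_term."""
    # The first prefix_length characters are never "fuzzified".
    if query_term[:prefix_length] != indexed_term[:prefix_length]:
        return False
    return edit_distance(query_term, indexed_term) <= auto_max_edits(query_term)


print(fuzzy_matches("basball", "baseball", prefix_length=2))  # True: one insert, prefix "ba" matches
print(fuzzy_matches("casball", "baseball", prefix_length=2))  # False: prefix "ca" differs
```

In this model 'basball' is 7 characters long, so AUTO allows two edits, and 'baseball' is one insertion away.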

Related

Elasticsearch with grouped query_string

{
"query":
{
"query_string" :
{
"query" : "((name:the_search_phrase) OR (keywords:the_search_phrase)) AND (city:Sydney, Australia)"
}
}
}
New to elasticsearch. Building the JSON as per the documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
The query runs; however, results with a city other than Sydney, Australia are returned too. Why is the AND part not working?
I want the search phrase to match against either or both name, keywords but the city should be strictly Sydney.
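What you are doing is a full-text query, whereas city:Sydney, Australia is really a filter, like a WHERE clause in SQL. Inside a query_string, city: applies only to the term directly after it, so Australia is searched in the default field instead of city, which is probably why other cities are returned. You are better off using a filter for that.
Look at the bool query for examples.
Something like this, perhaps: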
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "the_search_term",
"fields": [
"name",
"keywords"
]
}
}
],
"filter": [
{
"match_phrase": {
"city": "Sydney, Australia"
}
}
]
}
}
}
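If the request body is built in code, the same shape can be sketched in Python as below. The helper name search_in_city is made up for this example, and the filter uses match_phrase, which is a choice of this sketch so that both words of the city have to appear together; whether match, match_phrase or term is right depends on how the city field is mapped.

```python
import json


def search_in_city(term, city):
    """Build a bool query: full-text search on name/keywords, strict filter on city."""
    return {
        "query": {
            "bool": {
                # Scored part: the search phrase may match name, keywords, or both.
                "must": [
                    {"query_string": {"query": term, "fields": ["name", "keywords"]}}
                ],
                # Non-scored part: documents from other cities are excluded.
                "filter": [
                    {"match_phrase": {"city": city}}
                ],
            }
        }
    }


body = search_in_city("the_search_term", "Sydney, Australia")
print(json.dumps(body, indent=2))
```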

How do I write a search query in Elasticsearch when the field name is unknown?

I want to use Elasticsearch to query documents that have the value "dc883c6f24776ad6ce1f86c41b5cf87cfb784e85".
The structure of the source document is shown in the attached image. Is it possible to query this with something like the following, using a * wildcard, when the field name is unknown to us?
"query" : {
"constant_score" : {
"filter" : {
"terms" : {
"commitId.persistence-statics-service.*":["dc883c6f24776ad6ce1f86c41b5cf87cfb784e85"]
}
}
}
}
You can use query_string or multi_match to specify a wildcard in the field name part. I think the multi_match is simpler though:
{
"query": {
"constant_score": {
"filter": {
"multi_match": {
"query": "dc883c6f24776ad6ce1f86c41b5cf87cfb784e85",
"fields": [
"commitId.persistence-statics-service.*"
],
"analyzer": "keyword"
}
}
}
}
}
Try query_string; it is very powerful for partial (wildcard) search in Elasticsearch:
{
"query": {
"query_string": {
"fields" : ["commitId.persistence-statics-service.*"] ,
"query": "*dc883c6f24776ad6ce1f86c41b5cf87cfb784e85*"
}
}
}
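query_string is more powerful than multi_match, though a leading wildcard like the one above makes Elasticsearch examine many more terms and can be slow on a large index.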
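As a rough illustration of what the wildcard in the field name does, the Python sketch below expands the pattern against a list of field names. The field names are hypothetical (the real document structure is only in the image), and fnmatchcase is used as a stand-in for Elasticsearch's own pattern matching, which this sketch assumes behaves the same way for a simple trailing *.

```python
from fnmatch import fnmatchcase

# Hypothetical field names, for illustration only.
mapped_fields = [
    "commitId.persistence-statics-service.build-101",
    "commitId.persistence-statics-service.build-102",
    "commitId.other-service.build-7",
    "author",
]

pattern = "commitId.persistence-statics-service.*"

# The query is run against every field whose full name matches the pattern.
expanded = [name for name in mapped_fields if fnmatchcase(name, pattern)]
print(expanded)
```

Only the two fields under persistence-statics-service are selected, so the unknown last part of the name does not need to be spelled out.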

elasticsearch ngram analyzer return unexpected result

I'm using an ngram analyzer for indexing and the standard analyzer for querying.
Currently I have indexed multiphone and iphone.
When I search for iphone, the score (and therefore the relevancy) of multiphone is higher than that of iphone.
How should I build the query in order to get a higher score for iphone?
The query that I execute is:
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "iphone",
"fields": [
"englishName",
"aliasName"
]
}
},
What I need is for iphone to score higher than multiphone.
What about performance?
I have answered a similar question elsewhere.
Basically, you need to add a raw version of the field to your mapping. You could use the keyword analyzer with a lowercase filter, make it "index" : "not_analyzed", or even use the default standard analyzer.
Then you do a bool query and add a clause for the exact match, which will be scored higher.
EDIT: Example
You could map your englishName field as follows; the raw sub-field with "index": "not_analyzed" is the addition:
"englishName": {
"type": "string",
"index_analyzer": "ngram_analyzer",
"search_analyzer": "standard",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
You could do the same with aliasName.
Then your query would look something like this:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "iphone",
"fields": [
"englishName",
"aliasName"
]
}
},
{
"multi_match": {
"query": "iphone",
"fields": [
"englishName.raw",
"aliasName.raw"
],
"boost": 5
}
}
]
}
}
}
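With this query, a document whose field is exactly iphone also matches the boosted clause on the raw fields, so it scores higher than multiphone.
Hope this helps.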
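If the query is generated in code, the pattern can be sketched in Python as follows. The helper name exact_match_boost and its parameters are made up for this example, and it assumes every listed field has a raw sub-field mapped as described above.

```python
def exact_match_boost(text, fields, boost=5, raw_suffix=".raw"):
    """Combine an ngram match with a boosted exact match on the raw sub-fields."""
    raw_fields = [name + raw_suffix for name in fields]
    return {
        "query": {
            "bool": {
                "should": [
                    # Partial match through the ngram-analyzed fields.
                    {"multi_match": {"query": text, "fields": list(fields)}},
                    # Exact match on the unanalyzed copies, weighted more heavily.
                    {"multi_match": {"query": text, "fields": raw_fields, "boost": boost}},
                ]
            }
        }
    }


body = exact_match_boost("iphone", ["englishName", "aliasName"])
print(body["query"]["bool"]["should"][1]["multi_match"]["fields"])
```

Keep in mind that a not_analyzed field is case-sensitive, so a value indexed as iPhone would not pick up the boost for the query iphone; the keyword analyzer with a lowercase filter mentioned above avoids that.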

Elasticsearch filter

I am new to Elasticsearch. Please help me find the filter/query to write in order to match the exact records using the Java API.
Below are the MongoDB records. I need to get both records matching the word 'Jerry' using Elasticsearch.
{
"searchcontent" : [
{
"key" : "Jerry",
"sweight" : "1"
},{
"key" : "Kate",
"sweight" : "1"
}
],
"contentId" : "CON_4",
"keyword" : "TEST",
"_id" : ObjectId("55ded619e4b0406bbd901a47")
},
{
"searchcontent" : [
{
"key" : "TOM",
"sweight" : "2"
},{
"key" : "Kruse",
"sweight" : "2"
}
],
"contentId" : "CON_3",
"keyword" : "Jerry",
"_id" : ObjectId("55ded619e4b0406ccd901a47")
}
If you would like to search in all the fields, you can just do a match query on _all:
POST <index name>/<type name>/_search
{
"query": {
"match" : {
"_all" : "Jerry"
}
}
}
This searches for 'Jerry' in all the fields.
A multi_match query is what you need to search across multiple fields. The query below searches for the word "jerry" in both the "searchcontent.key" and "keyword" fields, which is what you want.
POST <index name>/<type name>/_search
{
"query": {
"multi_match": {
"query": "jerry",
"fields": [
"searchcontent.key",
"keyword"
]
}
}
}
There is no single solution; it depends on how you map your data in Elasticsearch and what you are indexing.
GET /intu/_settings
You can use a query_string query.
If you don't need to combine it with a filter, you can remove the bool and should wrappers.
From the documentation: "The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document."
For example:
GET /yourstaff/_search
{
"query": {
"filtered": {
"query": {
"bool": {
"should":
{
"query_string": {
"query": "jerry"
}
}
}
}
}
}
}
Take a look at the documentation:
Query string
Term vs full-search
Bool query
Use Sense to experiment until you get the results you want.
Using a filter is a better option, as filter results can be cached:
{
"query":
{
"bool":
{
"should":
[
{
"term":
{
"searchcontent.key":"jerry"
}
},
{
"term":
{
"keyword":"jerry"
}
}
]
}
}
}
https://www.elastic.co/blog/found-optimizing-elasticsearch-searches
A suggested read for better search.
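The term clauses above sit directly under query, so they are still scored. One way to run them in filter context instead is to wrap them in the filter clause of an outer bool query, sketched here in Python. The helper name any_field_equals is made up for this example, and minimum_should_match is set explicitly so that at least one of the term clauses has to match.

```python
def any_field_equals(value, fields):
    """Match documents where at least one of the fields contains the exact term."""
    return {
        "query": {
            "bool": {
                # Filter context: no scoring, and the result is eligible for caching.
                "filter": [
                    {
                        "bool": {
                            "should": [{"term": {name: value}} for name in fields],
                            "minimum_should_match": 1,
                        }
                    }
                ]
            }
        }
    }


body = any_field_equals("jerry", ["searchcontent.key", "keyword"])
print(body)
```

A term query is not analyzed, so the lowercase jerry only matches if the indexed terms were lowercased at index time, as the standard analyzer does.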

Elasticsearch: how to disable scoring on a field?

I am new to Elasticsearch and please forgive me if the answer is obvious.
Here is what I have for the mapping of the field in question:
"condition" : { "type" : "string", "store" : "no", "index": "not_analyzed", "omit_norms" : "true" }
I need to search on this field, but I need a 100% string match (no stemming, etc.) on a sub-string (blank separated). An example of this field in a document is as follows:
{
"condition": "abc xyz"
}
An example query is:
/_search?q=condition:xyz
Is the above mapping correct? I also used omit_norms (true). Is this a correct thing to do in my case?
How can I disable scoring on this field? Can I do it in mapping? What is the best way of doing it? (Actually I need to disable scoring on more than one. I do have fields that need scoring)
Thanks and regards!
Setting omit_norms: true means the length of the field is not taken into account for scoring, because Elasticsearch does not index the norms information. If you don't need scoring on this field, that is a good thing to do, as it also saves some disk space.
If you're not interested in scoring in your queries, use a filtered query:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": {
"term": {
"condition": "abc xyz"
}
}
}
}
}
}
}
The filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0. The equivalent syntax is now a bool query with a filter clause:
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"term": {
"condition": "abc"
}
}
}
}
}
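The two term filters above use different values ("abc xyz" and "abc"), and which one matches depends on the mapping. The Python sketch below is a toy model of that behaviour, not Elasticsearch code: the function names are made up, and analyzed=True stands for a mapping that only splits on whitespace (for example the whitespace analyzer), which is an assumption of this sketch.

```python
def indexed_terms(value, analyzed):
    """Terms stored for a field value: one whole term, or one per blank-separated word."""
    return value.split() if analyzed else [value]


def term_query_matches(term, value, analyzed):
    """A term query matches only if the term equals one of the indexed terms exactly."""
    return term in indexed_terms(value, analyzed)


# not_analyzed: the whole string is a single term.
print(term_query_matches("abc xyz", "abc xyz", analyzed=False))  # True
print(term_query_matches("xyz", "abc xyz", analyzed=False))      # False

# Split on whitespace: each word is its own term, with no stemming or lowercasing.
print(term_query_matches("xyz", "abc xyz", analyzed=True))       # True
```

Under this model, an exact match on a blank-separated sub-string such as xyz needs a whitespace-tokenized field rather than a not_analyzed one.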

Resources