Searching in elasticsearch with proximity(slop) zero and one - elasticsearch

I have created the following index
PUT /proximity_example_2
{
"mappings":{
"properties":{
"doc_id": {
"type": "text"
},
"test_name":{
"type": "text"
}
}
}
}
Then indexed a document
POST proximity_example_2/_doc
{
"doc_id": "id1",
"test_name": "test proximity here"
}
Then queried with proximity 0, as follow
GET proximity_example_2/_search
{
"query": {
"match_phrase": {
"test_name": {
"query": "proximity test",
"slop": 0.0
}
}
}
}
But I didn't get any results. Then I searched with proximity 1, and this time I didn't get any documents either.
But when I searched with a proximity greater than 1, I got results.
GET proximity_example_2/_search
{
"query": {
"match_phrase": {
"test_name": {
"query": "proximity test",
"slop": 2.0
}
}
}
}
GET proximity_example_2/_search
{
"query": {
"match_phrase": {
"test_name": {
"query": "proximity test",
"slop": 3.0
}
}
}
}
So does that mean that in Elasticsearch, when we search with proximity 1 or 0, the order of the search terms matters?
Thank you...

A slop of 0 is the same as a plain phrase search: it is very restrictive, and the search terms must appear in exactly the same order as in the indexed text. So yes, order matters at slop 0, and it also matters at slop 1, because swapping two adjacent terms counts as two position moves and therefore needs a slop of at least 2. As you increase the slop, this restriction is relaxed and you get more search results, but beware that raising it too high defeats the purpose of a phrase search and returns irrelevant results.
You can read this and this detailed blog post that explains how it works internally
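To make the slop arithmetic concrete, here is a minimal Python sketch of a simplified model of sloppy phrase matching. It is not Lucene's actual implementation; the function name `min_slop` is made up for illustration, and the model assumes each query term is distinct. For every query term it measures how far the term sits from where the exact phrase would place it, and the required slop is the spread of those shifts.

```python
from itertools import product

def min_slop(query_terms, doc_terms):
    """Smallest slop at which the phrase matches in this simplified model.

    Returns None when a query term does not occur in the document.
    """
    # Collect every document position at which each query term occurs.
    candidates = []
    for term in query_terms:
        positions = [i for i, t in enumerate(doc_terms) if t == term]
        if not positions:
            return None
        candidates.append(positions)

    best = None
    for combo in product(*candidates):
        # Shift = document position minus position inside the query phrase.
        shifts = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        needed = max(shifts) - min(shifts)
        if best is None or needed < best:
            best = needed
    return best

doc = "test proximity here".split()
print(min_slop("proximity test".split(), doc))  # 2: swapped terms need two moves
print(min_slop("test proximity".split(), doc))  # 0: exact phrase
print(min_slop("test here".split(), doc))       # 1: one word in between
```

Under this model the query "proximity test" needs a slop of 2 against "test proximity here", which is consistent with the results seen in the question.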

Related

Counting the SEARCH term/phrase in a specific field in Elasticsearch

I have this type of data
{
"name_id": 2145,
"address": "Antartica",
"characteristics" : "He is a very nice person with very nice personality. the nicest thing about him is his nice dog"
}
Now I am running this query:
GET friends/_search
{
"query": {
"bool": {
"must": [
{"term": {
"name_id.keyword": "B08F2BWX2V"
}
},
{
"match_phrase": {
"characteristics": "nice"
}
}
]
}
}
}
Is there a way I can get the results together with the word count, i.e.
nice : 4
Elasticsearch has an API that can return the token count information you need: the Term vectors API.
I'm not sure it is exactly what you need, but the post below answers a question similar to yours:
https://stackoverflow.com/a/69734423/18778181
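The Term vectors API reports a frequency per indexed token, so the count depends on how the field is analyzed. The Python sketch below only approximates the default analyzer by lowercasing and splitting on non-alphanumeric characters (an assumption, and the helper name `term_frequencies` is hypothetical); it does not call Elasticsearch.

```python
import re
from collections import Counter

def term_frequencies(text):
    """Count tokens after a rough lowercase + word-split analysis."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)

characteristics = (
    "He is a very nice person with very nice personality. "
    "the nicest thing about him is his nice dog"
)
freqs = term_frequencies(characteristics)
print(freqs["nice"], freqs["nicest"])  # 3 1
```

Note that under this tokenization "nice" is counted 3 times, not 4: "nicest" is a separate token unless the field's analyzer maps it to the same term.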

How do I search my whole index without specifying fields in Elasticsearch/Kibana

GET my_production_productsd/_search
{
"query": {
"match_phrase_prefix": {
"ProductDescription": "women"
}
}
}
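This returns results only from ProductDescription.
Take a look at the _all field. It was deprecated in Elasticsearch 6.0.0, mostly because of its storage cost, and it was removed in 7.0, so it is only an option on older versions, where it may not be a problem in your case. On newer versions, a multi_match or query_string query without an explicit field list, or a field populated via copy_to, serves the same purpose.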
Example code:
GET /my_index/_search
{
"query": {
"match": {
"_all": "john smith 1970"
}
}
}
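For versions where _all is not available, one possible alternative is a multi_match query of type phrase_prefix over all fields. The Python sketch below only builds such a request body; the helper name `all_fields_phrase_prefix` is hypothetical, and the use of `"fields": ["*"]` with `"lenient": true` (to skip fields whose type does not fit the query text) is an assumption that should be checked against your Elasticsearch version.

```python
import json

def all_fields_phrase_prefix(text):
    """Build a search body that runs a phrase-prefix match over every field."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "phrase_prefix",
                "fields": ["*"],   # assumption: wildcard selects all fields
                "lenient": True,   # assumption: ignore type mismatches
            }
        }
    }

body = all_fields_phrase_prefix("women")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the Kibana console as the body of a `GET my_production_productsd/_search` request.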

Elasticsearch terms query on array of values

I have data on ElasticSearch index that looks like this
{
"title": "cubilia",
"people": [
"Ling Deponte",
"Dana Madin",
"Shameka Woodard",
"Bennie Craddock",
"Sandie Bakker"
]
}
Is there a way for me to search for all the people whose name starts with
"ling" (case-insensitively) and get distinct terms with their original casing, i.e. "Ling Deponte" and not "ling deponte"?
I am fine with changing the mappings on the index in any way.
Edit: this does what I want, but it is a really bad query:
{
"size": 0,
"aggs": {
"person": {
"filter": {
"bool":{
"should":[
{"regexp":{
"people.raw":"(.* )?[lL][iI][nN][gG].*"
}}
]}
},
"aggs": {
"top-colors": {
"terms": {
"size":10,
"field": "people.raw",
"include":
{
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
}
}
people.raw is not_analyzed
Yes, and you can do it without a regular expression by taking advantage of Elasticsearch's full text capabilities.
GET /test/_search
{
"query": {
"match_phrase": {
"people": "Ling"
}
}
}
Note: This could also be match or match_phrase_prefix in this case. The match_phrase* queries imply an order of the values in the text, whereas match simply looks for any of the values. Since you only have one value, the difference is pretty much irrelevant.
The problem is that you cannot limit the document responses to just that name, because the search API returns whole documents. That said, you can use nested documents and get the desired behavior via inner_hits.
You want to avoid wildcard prefixing whenever possible because it simply does not work at scale. To put it in SQL terms, it is like doing a full table scan: you effectively lose the benefit of the inverted index because it has to be walked entirely to find the actual start.
Combining the two should work pretty well though. Here, I use the query to whittle the results down to the documents you are interested in, then I use your inner aggregation to include only the matching values.
{
"size": 0,
"query": {
"match_phrase": {
"people": "Ling"
}
},
"aggs": {
"person": {
"terms": {
"size":10,
"field": "people.raw",
"include": {
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
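To see what the include pattern keeps, here is a small Python sketch that applies the same regular expression to the raw (unanalyzed) names and collects the distinct matches with their original casing. It only imitates the filtering step and runs without Elasticsearch; the second document is made up for illustration, and the result is listed in order of first appearance rather than by document count as a terms aggregation would.

```python
import re

# Same pattern as in the aggregation; Lucene regexes match the whole term,
# hence fullmatch() below.
INCLUDE = re.compile(r"(.* )?[lL][iI][nN][gG].*")

def matching_people(docs):
    """Return distinct names containing a word that starts with 'ling'."""
    seen = []
    for doc in docs:
        for name in doc["people"]:
            if INCLUDE.fullmatch(name) and name not in seen:
                seen.append(name)
    return seen

docs = [
    {"title": "cubilia",
     "people": ["Ling Deponte", "Dana Madin", "Shameka Woodard",
                "Bennie Craddock", "Sandie Bakker"]},
    # Hypothetical second document, not from the original data.
    {"title": "example",
     "people": ["Mei Ling Tan", "Darling Smith", "Ling Deponte"]},
]
print(matching_people(docs))  # ['Ling Deponte', 'Mei Ling Tan']
```

"Darling Smith" is rejected because the pattern requires "ling" at the start of a word, not in the middle of one.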
Here is a query that may help with your request:
GET skills/skill/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"wildcard": {
"skillNames.raw": "jav*"
}
}
]
}
}
}
}
}
The intention is to find documents whose skillNames value starts with "jav".

Elastic Search - Sort By Doc Type

I have an Elasticsearch index with two different doc types, 'a' and 'b'. I would like to sort my results by type and give preference to type='b' (even if it has a low score). I had been consuming the results of the search below on the client side and sorting them there, but I've realized that this approach does not work well: I only inspect the first 10 results, which often do not contain any b's. Increasing the number of returned results is not ideal. I'd like Elasticsearch to do the work.
http://<server>:9200/my_index/_search?q=london
You would need to play with function_score and, depending on how you already score your documents, test some weight values, boost_modes and score_modes for each type. For example:
GET /some_index/a,b/_search
{
"query": {
"function_score": {
"query": {
# your query here
},
"functions": [
{
"filter": {
"type": {
"value": "b"
}
},
"weight": 3
},
{
"filter": {
"type": {
"value": "a"
}
},
"weight": 1
}
],
"score_mode": "first",
"boost_mode": "multiply"
}
}
}
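The effect of these settings can be sketched in a few lines of Python. With score_mode "first" each hit takes the weight of the first function whose filter matches, and with boost_mode "multiply" that weight is multiplied with the query score. The scores below are invented purely for illustration, and `rescored` is a hypothetical helper, not part of any client library.

```python
def rescored(hits, weights):
    """Multiply each hit's query score by the weight for its type."""
    out = []
    for hit in hits:
        # Hits whose type has no function keep their score (factor 1).
        weight = weights.get(hit["type"], 1.0)
        out.append({**hit, "score": hit["score"] * weight})
    return sorted(out, key=lambda h: h["score"], reverse=True)

hits = [
    {"id": "a1", "type": "a", "score": 4.0},
    {"id": "a2", "type": "a", "score": 2.0},
    {"id": "b1", "type": "b", "score": 1.0},
]
ranked = rescored(hits, {"b": 3, "a": 1})
print([h["id"] for h in ranked])  # ['a1', 'b1', 'a2']
```

Note that a weight of 3 lifts b1 above a2 but not above a1, which is why the weights need testing against your real score distribution if every 'b' must come first.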
This works for me. Run the command below at the command prompt:
curl -XGET 'localhost:9200/index_v1,index_v2/_search?pretty' -d @boost.json
where boost.json contains:
{
"indices_boost" : {
"index_v2" : 1.4,
"index_v1" : 1.3
}
}

Elasticsearch fuzzy matching: How can I get direct hits first?

I'm using Elasticsearch to search names in a database, and I want it to be fuzzy to allow for minor spelling errors. Based on the advice I've found on the matter, I'm using "match" and "fuzziness" instead of "fuzzy", which definitely seems to be more accurate. This is my query:
{ "query":
{ "match":
{ "last_name":
{ "query": "Beach",
"type": "phrase",
"fuzziness": 2
}
}
}
}
However, even though I have numerous results with last_name "Beach" (I know there's at least 100), I also get results with last_name "Beech" and "Berch" in the first 10 hits returned by my query. Can someone help me figure out how to get the exact matches first?
Try changing your query to a boolean query with 2 should queries.
The first one being your current query, and then second being a query that only gives exact matches, then give that one a big boost (like 10.0).
That should get your exact matches on top while still listing your partial matches.
Here is a sample based on Constantijn's answer above:
{
"query": {
"bool": {
"should": [
{
"match": {
"last_name": {
"query": "Beach",
"fuzziness": 2,
"boost": 1
}
}
},
{
"match": {
"last_name": {
"query": "Beach",
"boost": 10
}
}
}
]
}
}
}
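Why this puts exact matches first can be illustrated with a toy model in Python. An exact match satisfies both should clauses, so it collects both boosts, while a fuzzy-only match collects just the first. This is a deliberately simplified stand-in for real relevance scoring (actual scores come from the similarity model and will differ), and all function names here are made up.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def toy_score(name, query, fuzziness=2, fuzzy_boost=1, exact_boost=10):
    """Sum the boosts of the should clauses that the name satisfies."""
    a, b = name.lower(), query.lower()
    score = 0
    if edit_distance(a, b) <= fuzziness:  # first clause: fuzzy match
        score += fuzzy_boost
    if a == b:                            # second clause: exact match
        score += exact_boost
    return score

names = ["Beech", "Berch", "Beach", "Smith"]
ranked = sorted(names, key=lambda n: toy_score(n, "Beach"), reverse=True)
print(ranked)
```

In this model "Beach" scores 11, "Beech" and "Berch" score 1, and "Smith" scores 0, so the exact match sorts to the top while the near misses are still returned.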

Resources