Match partial query string with complete document in Elasticsearch index - elasticsearch

I need to search short text documents stored in an Elasticsearch index, and I want a document returned only if the search query contains that document's text completely.
E.g.
Let these be two documents added to index.
1.) {
name: "alpha beta"
}
2.) {
name: "gamma delta"
}
If the query strings are:
1.) "alpha beats beta"
2.) "alpha beats gammma"
then the first query should return the first document, because all of that document's tokens appear in the query unchanged. The second query shouldn't return any document, because no document has all of its tokens present in the query string.
NOTE: A result should be returned only if all the tokens of the document's text field are present in the query string.

You can split the terms on whitespace and write the query as below.
{
"query": {
"bool": {
"should" : [
{"match_phrase" : { "name": "alpha"}},
{"match_phrase" : { "name": "beta"}}
],
"minimum_should_match" : 2
}
},
"size": 50
}
Here "minimum_should_match" : n is the key, where n is the number of terms.
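The request body above can also be generated from a list of terms. This is a minimal sketch in Python, assuming a hypothetical helper name (build_all_terms_query) and plain whitespace splitting; it only builds the JSON body and does not contact a cluster.

```python
import json


def build_all_terms_query(terms, field="name", size=50):
    """Build a bool query in which every term in `terms` must match `field`.

    The helper name and defaults are illustrative, not part of any client library.
    """
    should = [{"match_phrase": {field: term}} for term in terms]
    return {
        "query": {
            "bool": {
                "should": should,
                # n = number of terms, so all should clauses have to match
                "minimum_should_match": len(should),
            }
        },
        "size": size,
    }


# Terms split on whitespace, as in the answer's example
query = build_all_terms_query("alpha beta".split())
print(json.dumps(query, indent=2))
```

The printed body can be sent to the _search endpoint with whatever client is already in use.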

Related

Atlas Search Index partial match

I have a test collection with these two documents:
{ _id: ObjectId("636ce11889a00c51cac27779"), sku: 'kw-lids-0009' }
{ _id: ObjectId("636ce14b89a00c51cac2777a"), sku: 'kw-fs66-gre' }
I've created a search index with this definition:
{
"analyzer": "lucene.standard",
"searchAnalyzer": "lucene.standard",
"mappings": {
"dynamic": false,
"fields": {
"sku": {
"type": "string"
}
}
}
}
If I run this aggregation:
[{
$search: {
index: 'test',
text: {
query: 'kw-fs',
path: 'sku'
}
}
}]
Why do I get 2 results? I only expected the one with sku: 'kw-fs66-gre' 😬
During indexing, the standard analyzer breaks the string "kw-lids-0009" into 3 tokens [kw][lids][0009], and similarly tokenizes "kw-fs66-gre" as [kw][fs66][gre]. When you query for "kw-fs", the same analyzer tokenizes the query as [kw][fs], so Lucene matches both documents, as both have the [kw] token in the index.
To get the behavior you're looking for, you should index the sku field as type autocomplete and use the autocomplete operator in your $search stage instead of text.
You're still getting 2 results because of the tokenization, i.e., you're still matching on [kw] in two documents. If you search for "fs66", you'll get a single match only. Results are scored based on relevance; they are not filtered. You can add {$project: {score: { $meta: "searchScore" }}} to your pipeline and see the difference in score between the matching documents.
If you are looking to get exact matches only, you can use the keyword analyzer or a custom analyzer that strips the dashes, so that you deal with a single token per field and not 3.
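The token overlap described above can be reproduced outside Atlas. This is a rough sketch in Python: approx_standard_tokens is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, which is only an approximation of lucene.standard but yields the same tokens for these two SKUs.

```python
import re


def approx_standard_tokens(text):
    """Approximate the analyzer: lowercase, then split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def text_match(query, value):
    """True if the query and the stored value share at least one token."""
    return bool(set(approx_standard_tokens(query)) & set(approx_standard_tokens(value)))


skus = ["kw-lids-0009", "kw-fs66-gre"]
for q in ["kw-fs", "fs66"]:
    hits = [s for s in skus if text_match(q, s)]
    print(q, "->", hits)
```

With "kw-fs" both SKUs are hits because they share the kw token; with "fs66" only kw-fs66-gre is.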

Elasticsearch - boosting fields for multi match without specifying complete field list in query

I am trying to boost fields in a multi match query without specifying the complete field list, but I cannot find out how to do it. I am searching across multiple indices on all fields. I don't know the full set of fields at run time, but I do know which ones are important.
For example I have index A with the fields 1,2,3,4 and index B with fields 1,5,6,7,8. I need to search across both indexes through all fields with the boosting on field 1.
So far I got
GET A,B/_search
{
"query": {
"multi_match" : {
"query" : "somethingToSearch"
}
}
}
Which goes through all fields on both indices, but I would like to have something like this (boosting match on field 1 before the others)
GET A,B/_search
{
"query": {
"multi_match" : {
"query" : "somethingToSearch",
"fields" : ["1^5,*"]
}
}
}
Is there any way to do this without using bool queries?

elasticsearch searching where nth character of a field matches the parameter

Elasticsearch index has json docs like this
{
id: ABC120379013
name: Matlo Jonipa
jobs: {nested data}
}
When defining index schema, I register id as keyword.
Now I need to write a query that can return all docs where 4th char in the id field value is digit 9.
# matched docs
id: ABC920379013,
id: Zxr900013000
...
I have two questions for this use case.
1- Am I indexing it correctly when I set id as a keyword field? I feel I should be using some analyzer, but I don't know which one.
2- Can someone guide me on how to write a query that matches the nth char of a field?
I can use this regex to match a string whose 4th character is 9, but can I use it in an Elasticsearch query?
/^...9.*$/
or
/^.{3}9.*$/
The query below should help with your use case. I've used a Script Query.
Also, yes, you are doing it right: the id field needs to be of type keyword. Note that the keyword type doesn't make use of analyzers.
POST regexindex/_search
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"source": "doc['id'].value.indexOf('9') == 3",
"lang": "painless"
}
}
}
}
}
}
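I've used .indexOf('9') == 3 because the index starts from 0. Note that indexOf returns the position of the first occurrence, so an id that also has a 9 before the 4th character would not match.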
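To see how the indexOf check relates to the regex from the question, the two conditions can be compared in plain Python. This is only a sketch of the string logic, not Painless; the function names are hypothetical, and the id "A9C920379013" is an invented example used to show where the two checks differ.

```python
import re


def first_nine_at_index_3(doc_id):
    """Mirrors indexOf('9') == 3: true only when the FIRST '9' sits at index 3."""
    return doc_id.find("9") == 3


def fourth_char_is_nine(doc_id):
    """Mirrors the question's regex /^.{3}9.*$/: the 4th character is '9'."""
    return re.fullmatch(r".{3}9.*", doc_id) is not None


# The last two ids are illustrative; "A9C920379013" is made up for this comparison
for doc_id in ["ABC920379013", "Zxr900013000", "A9C920379013", "ABC120379013"]:
    print(doc_id, first_nine_at_index_3(doc_id), fourth_char_is_nine(doc_id))
```

The two checks agree on the ids from the question and disagree only when a 9 also appears in one of the first three positions.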

Make a prefix query on the whole field in Elasticsearch

I have a field called text_field, and two documents with these values:
1. lubricant
2. air lube
I used an edge n-gram analyzer with a terms query, but the result is not what I expect when I search with lub.
Terms query over a field analyzed with an edge n-gram analyzer:
{
"terms" : {
"text_field" : [ "lub" ]
}
}
Prefix query over a field analyzed with the keyword tokenizer:
{
"prefix" : {
"text_field" : {
"prefix" : "lub"
}
}
}
With both of these queries I get two results in the result set:
"lubricant",
"air lube"
I don't want air lube in the results, as it starts with the word air. Is there any way to run a prefix query against the whole field? It looks like the query is checking individual terms instead.

Elasticsearch bool search matching incorrectly

So I have an object with an Id field which is populated by a Guid. I'm doing an elasticsearch query with a "Must" clause to match a specific Id in that field. The issue is that elasticsearch is returning a result which does not match the Guid I'm providing exactly. I have noticed that the Guid I'm providing and one of the results that Elasticsearch is returning share the same digits in one particular part of the Guid.
Here is my query source (I'm using the Elasticsearch head console):
{
query:
{
bool:
{
must: [
{
text:
{
couchbaseDocument.doc.Id: 5cd1cde9-1adc-4886-a463-7c8fa7966f26
}
}]
must_not: [ ]
should: [ ]
}
}
from: 0
size: 10
sort: [ ]
facets: { }
}
And it is returning two results. One with ID of
5cd1cde9-1adc-4886-a463-7c8fa7966f26
and the other with ID of
34de3d35-5a27-4886-95e8-a2d6dcf253c2
As you can see, they both share the same middle term "-4886-". However, I would expect this query to only return a record if the record were an exact match, not a partial match. What am I doing wrong here?
The query is (probably) correct.
What you're almost certainly seeing is the work of the 'Standard Analyzer', which is used by default at index time. This analyzer tokenizes the input (splits it into terms) on the hyphen ('-'), among other characters. That's why a match is found.
To remedy this, set your couchbaseDocument.doc.Id field to not_analyzed.
See: How to not-analyze in ElasticSearch? and the links from there into the official docs.
Mapping would be something like:
{
"yourType" : {
"properties" : {
"couchbaseDocument.doc.Id" : {"type" : "string", "index" : "not_analyzed"}
}
}
}
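The effect of not_analyzed on the two Guids from the question can be illustrated in plain Python. This is a sketch under stated assumptions: analyzed_terms is a hypothetical approximation that splits on non-alphanumeric characters (the real analyzer has more rules), and matches_not_analyzed simply compares whole values.

```python
import re


def analyzed_terms(value):
    """Approximate analyzed indexing: lowercase and split on non-alphanumerics."""
    return {t for t in re.split(r"[^0-9a-z]+", value.lower()) if t}


def matches_analyzed(query, stored):
    """Analyzed field: any shared term counts as a match."""
    return bool(analyzed_terms(query) & analyzed_terms(stored))


def matches_not_analyzed(query, stored):
    """not_analyzed field: the whole value is one term, so only equality matches."""
    return query == stored


wanted = "5cd1cde9-1adc-4886-a463-7c8fa7966f26"
other = "34de3d35-5a27-4886-95e8-a2d6dcf253c2"

print(analyzed_terms(wanted) & analyzed_terms(other))
print(matches_analyzed(wanted, other), matches_not_analyzed(wanted, other))
```

The only shared term is 4886, which is enough for the analyzed field to match but not the not_analyzed one.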
