Elasticsearch: how to query for documents where a keyword exists in any of the given fields

I have an Elasticsearch index of emails, created with the following mappings for the email sender and recipients:
"mappings": {
...
"recipients": {
"type": "keyword"
},
"sender": {
"type": "keyword"
},
...
I am given a list of email addresses, and I want to find the emails in which any of those addresses is either the sender OR a recipient. For example, I tried the following query:
{
"query": {
"multi_match" : {
"query": "abc#apple.com defg#samsung.com",
"operator": "OR",
"fields": [ "recipients", "sender" ],
"type": "cross_fields"
}
}
}
to find the emails where (abc#apple.com is the sender or a recipient) OR (defg#samsung.com is the sender or a recipient). But it doesn't return any results, even though matching emails do exist.
Does anyone know how to query for emails where any of the given addresses is the sender or a recipient?
Thanks

It's good that you have found a solution, but it is important to understand why multi_match didn't work, why query_string did, and why you should avoid query_string where possible.
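As mentioned in the official Elasticsearch documentation, query_string uses a strict syntax and returns an error for any invalid input, which is why it is not recommended for search boxes.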
Your multi_match query didn't work because you passed both addresses in a single query string, abc#apple.com defg#samsung.com, and the query text is analyzed with each field's analyzer. Keyword fields are not tokenized, so the whole string is kept as one term: Elasticsearch looks for the exact value abc#apple.com defg#samsung.com in your fields, not for abc#apple.com or defg#samsung.com.
If you want to use multi_match, the right query would be
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "abc#apple.com",
"operator": "OR",
"fields": [
"recipients",
"sender"
],
"type": "cross_fields"
}
},
{
"multi_match": {
"query": "defg#samsung.com",
"operator": "OR",
"fields": [
"recipients",
"sender"
],
"type": "cross_fields"
}
}
]
}
}
}
which returns the documents below.
"hits": [
{
"_index": "71367024",
"_id": "1",
"_score": 0.6931471,
"_source": {
"recipients": "abc#apple.com",
"sender": "foo#bar.com"
}
},
{
"_index": "71367024",
"_id": "2",
"_score": 0.6931471,
"_source": {
"recipients": "defg#samsung.com",
"sender": "baz#bar.com"
}
}
]
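Since both fields are mapped as keyword, another way to express the same "any of these addresses in either field" condition is one terms query per field inside a bool should. The sketch below only builds the request body in Python. The helper name build_email_query is hypothetical, and it assumes the addresses are stored exactly as given, because keyword fields match whole values case-sensitively.

```python
import json


def build_email_query(emails, fields=("recipients", "sender")):
    """Build a bool/should body matching docs where any field holds any email.

    Hypothetical helper: one `terms` clause per field, OR-ed together.
    """
    emails = list(emails)
    return {
        "query": {
            "bool": {
                "should": [{"terms": {field: emails}} for field in fields],
                # At least one of the should clauses has to match.
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    body = build_email_query(["abc#apple.com", "defg#samsung.com"])
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. Unlike the multi_match version, it does not need one clause per address.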

I think I may have found the answer. The following query works:
{
"query": {
"query_string" : {
"query": "abc#apple.com OR defg#samsung.com",
"fields": [ "recipients", "sender" ]
}
}
}

Related

add fuzziness to elasticsearch query

I have a query for an autocomplete/suggestions index that looks like this:
{
"size": 10,
"query": {
"multi_match": {
"query": "'"+search_text+"'",
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
]
}
}
}
This query works exactly as I want it to. However I want to add fuzziness:"AUTO" to this query. I read the documentation and tried adding it like this:
{
"size": 10,
"query": {
"multi_match": {
"query": {
"fuzzy": {
"value": "'"+search_text+"'",
"fuzziness": "AUTO"
}
},
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
]
}
}
}
But I get this error:
```
"type": "parsing_exception",
"reason": "[multi_match] unknown token [START_OBJECT] after [query]",
```
This is causing my query not to work.
There is no need to add a fuzzy query. To add fuzziness to a multi_match query, set the fuzziness parameter directly on the multi_match object, next to type and fields.
Since you are using bool_prefix as the multi_match type, it creates a match_bool_prefix query on each field, which analyzes its input and constructs a bool query from the terms. Each term except the last is used in a term query. The last term is used in a prefix query.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"company_name": {
"type": "search_as_you_type",
"max_shingle_size": 3
},
"serviceTitle": {
"type": "search_as_you_type",
"max_shingle_size": 3
},
"services": {
"type": "search_as_you_type",
"max_shingle_size": 3
}
}
}
}
Index Data:
{
"company_name":"sequencing how shingles are actually used"
}
Search Query:
{
"size": 10,
"query": {
"multi_match": {
"query": "sequensing how shingles",
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
],
"fuzziness":"auto"
}
}
}
Search Result:
"hits": [
{
"_index": "65153201",
"_type": "_doc",
"_id": "1",
"_score": 1.5465959,
"_source": {
"company_name": "sequencing how shingles are actually used"
}
}
]
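If you want to search for sequensing alone and still get the above document, you need to change the multi_match type from bool_prefix to another type that fits your use case. With bool_prefix the last term (here the only one) is run as a prefix query, and fuzziness has no effect on it.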
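As a rough sketch of the point made in this answer, the request body can be assembled as a Python dict so that fuzziness sits next to type and fields inside multi_match instead of being nested under query. The function name and its defaults are hypothetical; the field names are the ones from the question.

```python
import json


def build_suggest_query(search_text, fuzziness="AUTO", size=10):
    """Build a bool_prefix multi_match body with fuzziness as a sibling of type.

    Hypothetical helper; `query` stays a plain string, which is what
    multi_match expects.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": search_text,
                "type": "bool_prefix",
                "fields": [
                    "company_name",
                    "company_name._2gram",
                    "company_name._3gram",
                ],
                "fuzziness": fuzziness,
            }
        },
    }


if __name__ == "__main__":
    # json.dumps handles the quoting, so no manual string concatenation is needed.
    print(json.dumps(build_suggest_query("sequensing how shingles"), indent=2))
```

Building the body as a dict and serializing it also avoids the manual quote concatenation around search_text used in the question.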

Returning documents that match multiple wildcard string queries

I'm new to Elasticsearch and would greatly appreciate help with this.
In the query below I want only the first document to be returned, but both documents are returned instead. How can I write a query that searches for two wildcard strings on two separate fields and returns only the documents that match both?
I think what's being returned currently is score dependent, but I don't need the score.
POST /pr/_doc/1
{
"type": "Type ONE",
"currency":"USD"
}
POST /pr/_doc/2
{
"type": "Type TWO",
"currency":"USD"
}
GET /pr/_search
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Type ON*",
"fields": ["type"],
"analyze_wildcard": true
}
},
{
"simple_query_string": {
"query": "US*",
"fields": ["currency"],
"analyze_wildcard":true
}
}
]
}
}
}
Use the query below, which is a query_string query with default_operator set to AND. See the query_string documentation for in-depth information and further reading.
Search query
{
"query": {
"query_string": {
"query": "(Type ON*) AND (US*)",
"fields" : ["type", "currency"],
"default_operator" : "AND"
}
}
}
Index your sample docs and it returns your expected doc only:
"hits": [
{
"_index": "multiplequery",
"_type": "_doc",
"_id": "1",
"_score": 2.1823215,
"_source": {
"type": "Type ONE",
"currency": "USD"
}
}
]
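If the score is not needed, one alternative worth trying is to keep one simple_query_string clause per field, set default_operator to AND so that every term in a clause is required, and place the clauses in a bool filter, which does not compute scores. This is a sketch that only builds the request body. The helper name and its input shape are hypothetical, and the query has not been run against the sample index here.

```python
import json


def build_wildcard_filter(field_patterns):
    """Build a bool/filter body with one simple_query_string clause per field.

    `field_patterns` maps a field name to its wildcard pattern (hypothetical
    input shape). Every clause has to match; no scoring is involved.
    """
    clauses = [
        {
            "simple_query_string": {
                "query": pattern,
                "fields": [field],
                "default_operator": "AND",
                "analyze_wildcard": True,
            }
        }
        for field, pattern in field_patterns.items()
    ]
    return {"query": {"bool": {"filter": clauses}}}


if __name__ == "__main__":
    body = build_wildcard_filter({"type": "Type ON*", "currency": "US*"})
    print(json.dumps(body, indent=2))
```

Keeping each pattern tied to its own field avoids searching Type ON* in currency or US* in type.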

Word and phrase search on multiple fields in ElasticSearch

I'd like to search documents in Elasticsearch using Python. I am looking for documents that contain a word and/or a phrase in any one of three fields.
GET /my_docs/_search
{
"query": {
"multi_match": {
"query": "Ford \"lone star\"",
"fields": [
"title",
"description",
"news_content"
],
"minimum_should_match": "-1",
"operator": "AND"
}
}
}
In the above query, I'd like to get documents whose title, description, or news_content contain "Ford" and "lone star" (as a phrase).
However, it seems that it does not consider "lone star" as a phrase. It returns documents with "Ford", "lone", and "star".
I was able to reproduce your issue and solved it using the Elasticsearch REST API, since I am not familiar with the Python syntax. I'm glad you provided your search query in JSON format; I built my solution on top of it.
Index definition
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"description" :{
"type" : "text"
},
"news_content" : {
"type" : "text"
}
}
}
}
Sample docs (the first one matches your criteria)
{
"title" : "Ford",
"news_content" : "lone star",
"description" : "foo bar"
}
{
"title" : "Ford",
"news_content" : "lone",
"description" : "star"
}
Search query you are looking for. Both clauses are in must, so both have to match, and the second clause uses type phrase so that lone star is matched as a phrase.
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "ford",
"fields": [
"title",
"description",
"news_content"
]
}
},
{
"multi_match": {
"query": "lone star",
"fields": [
"title",
"description",
"news_content"
],
"type": "phrase"
}
}
]
}
}
}
Result contains just one doc from sample
"hits": [
{
"_index": "so_phrase",
"_type": "_doc",
"_id": "1",
"_score": 0.9527341,
"_source": {
"title": "Ford",
"news_content": "lone star",
"description": "foo bar"
}
}
]
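Since the question mentions Python, here is a minimal sketch of how the same bool/must body could be generated from a search string such as Ford "lone star". It only builds the dict; the function name, the regular expression, and the rule "quoted text becomes a phrase clause" are assumptions made for illustration, not part of any client library.

```python
import json
import re

FIELDS = ["title", "description", "news_content"]

# Either a double-quoted phrase ("lone star") or a bare word (Ford).
TOKEN_RE = re.compile(r'"([^"]+)"|(\S+)')


def build_word_phrase_query(text, fields=FIELDS):
    """Turn each word or quoted phrase into its own multi_match clause.

    Quoted phrases get "type": "phrase"; all clauses go into bool/must,
    so every word and phrase has to match in at least one field.
    """
    must = []
    for phrase, word in TOKEN_RE.findall(text):
        clause = {"query": phrase or word, "fields": list(fields)}
        if phrase:
            clause["type"] = "phrase"
        must.append({"multi_match": clause})
    return {"query": {"bool": {"must": must}}}


if __name__ == "__main__":
    print(json.dumps(build_word_phrase_query('Ford "lone star"'), indent=2))
```

The resulting dict has the same shape as the JSON query in this answer and can be passed as the search body by whichever Python client you use.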

elasticsearch: How to rank first appearing words or phrases higher

For example, if I have the following documents:
1. Casa Road
2. Jalan Casa
Say my query term is "cas". On searching, both documents get the same score. I want the one where casa appears earlier (i.e. document 1 here) to rank first in my query output.
I am using an edgeNGram analyzer. I am also using aggregations, so I cannot use the normal sorting that happens after querying.
You can use the Bool Query to boost the items that start with the search query:
{
"bool" : {
"must" : {
"match" : { "name" : "cas" }
},
"should": {
"prefix" : { "name" : "cas" }
}
}
}
I'm assuming the values you gave are in the name field, and that the field is not analyzed. If it is analyzed, maybe look at this answer for more ideas.
The way it works is:
Both documents will match the query in the must clause, and will receive the same score for that. A document won't be included if it doesn't match the must query.
Only the document with the term starting with cas will match the query in the should clause, causing it to receive a higher score. A document won't be excluded if it doesn't match the should query.
This might be a bit more involved, but it should work.
Basically, you need the position of the term within the text itself and, also, the number of terms from the text. The actual scoring is computed using scripts, so you need to enable dynamic scripting in elasticsearch.yml config file:
script.engine.groovy.inline.search: on
This is what you need:
a mapping that uses term_vector set to with_positions, an edgeNGram analyzer, and a sub-field of type token_count:
PUT /test
{
"mappings": {
"test": {
"properties": {
"text": {
"type": "string",
"term_vector": "with_positions",
"index_analyzer": "edgengram_analyzer",
"search_analyzer": "keyword",
"fields": {
"word_count": {
"type": "token_count",
"store": "yes",
"analyzer": "standard"
}
}
}
}
}
},
"settings": {
"analysis": {
"filter": {
"name_ngrams": {
"min_gram": "2",
"type": "edgeNGram",
"max_gram": "30"
}
},
"analyzer": {
"edgengram_analyzer": {
"type": "custom",
"filter": [
"standard",
"lowercase",
"name_ngrams"
],
"tokenizer": "standard"
}
}
}
}
}
test documents:
POST /test/test/1
{"text":"Casa Road"}
POST /test/test/2
{"text":"Jalan Casa"}
the query itself:
GET /test/test/_search
{
"query": {
"bool": {
"must": [
{
"function_score": {
"query": {
"term": {
"text": {
"value": "cas"
}
}
},
"script_score": {
"script": "termInfo=_index['text'].get('cas',_POSITIONS);wordCount=doc['text.word_count'].value;if (termInfo) {for(pos in termInfo){return (wordCount-pos.position)/wordCount}};"
},
"boost_mode": "sum"
}
}
]
}
}
}
and the results:
"hits": {
"total": 2,
"max_score": 1.3715843,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 1.3715843,
"_source": {
"text": "Casa Road"
}
},
{
"_index": "test",
"_type": "test",
"_id": "2",
"_score": 0.8715843,
"_source": {
"text": "Jalan Casa"
}
}
]
}
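To see where the gap between the two scores comes from, the arithmetic of the script can be reproduced offline. The sketch below is plain Python, not the Groovy script itself: it approximates the analyzer with a lowercase whitespace split and uses the first word that starts with the query term, both of which are assumptions for illustration.

```python
def position_bonus(text, term):
    """Return (word_count - position) / word_count for the first word
    that starts with `term`, or 0.0 when no word matches."""
    words = text.lower().split()
    for position, word in enumerate(words):
        if word.startswith(term.lower()):
            return (len(words) - position) / len(words)
    return 0.0


if __name__ == "__main__":
    for doc in ("Casa Road", "Jalan Casa"):
        print(doc, position_bonus(doc, "cas"))
```

"Casa Road" gets (2 - 0) / 2 = 1.0 and "Jalan Casa" gets (2 - 1) / 2 = 0.5. With boost_mode set to sum, these bonuses are added to the term query's score, which is consistent with the 0.5 difference between 1.3715843 and 0.8715843 in the results.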

Elasticsearch OR filtered query does not return results

I have the following data set:
{
"_index": "myIndex",
"_type": "myType",
"_id": "220005",
"_score": 1,
"_source": {
"id": "220005",
"name": "Some Name",
"type": "myDataType",
"doc_as_upsert": true
}
}
Doing a direct match query like so:
GET typo3data/destination/_search
{
"query": {
"match": {
"name": "Some Name"
}
},
"size": 500
}
Will return the data just fine:
"hits": {
"total": 1,
"max_score": 3.442347,
"hits": [...
Doing an OR-query however (I am not sure which syntax is correct, the first syntax is taken from elasticsearch docs, the second is a working query taken from another project with the same versions):
GET typo3data/destination/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"or": {
"filters": [
{
"term": {
"name": "Some Name"
}
}
]
}
}
}
},
"size": 500
}
or
{
"query":
{
"match_all": {}
},
"filter":
{
"or":
[
{ "term": { "name": "Some Name"} },
{ "term": { "name": "Some Other Name"} }
]
},
"size": 1000
}
Does not return anything.
The mapping for the name field is:
"name": {
"type": "string",
"index": "not_analyzed"
}
Elasticsearch version is 1.4.4.
When "some name" is indexed into an analyzed field, it is broken into tokens as follows:
"some name" => [ "some" , "name" ]
A normal match query applies the same process to the query text before matching. If either "some" or "name" is present, the document qualifies as a result.
match query ("some name") => search for term "some" or "name"
The term query does not analyze or tokenize your query. This means that it looks for the exact token or term "some name", which is not present.
term query ("some name") => search for term "some name"
Hence you won't see any results.
Things should work fine if you make the field not_analyzed, but then make sure the case also matches.
You can read more about the same here.
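The difference between the two query types can be mimicked in a few lines of Python. This is only a toy model: the analyze function below is a stand-in that lowercases and splits on whitespace, which is a simplification of what a real analyzer does, and all three function names are hypothetical.

```python
def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def match_query(indexed_tokens, query_text):
    """match analyzes the query text; any resulting token may hit."""
    return any(token in indexed_tokens for token in analyze(query_text))


def term_query(indexed_tokens, query_text):
    """term looks the query text up as-is, without any analysis."""
    return query_text in indexed_tokens


if __name__ == "__main__":
    analyzed_field = analyze("Some Name")  # tokens kept for an analyzed field
    not_analyzed_field = ["Some Name"]     # not_analyzed keeps the whole value
    print(match_query(analyzed_field, "Some Name"))
    print(term_query(analyzed_field, "Some Name"))
    print(term_query(not_analyzed_field, "Some Name"))
```

In this model the term lookup only succeeds when the field keeps the whole value, which is the situation a not_analyzed mapping is meant to produce.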
After extending our mapping to include every field we have:
PUT typo3data/_mapping/destination
{
"someType": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string",
"index": "not_analyzed"
},
"parentId": {
"type": "integer"
},
"type": {
"type": "string"
},
"generatedUid": {
"type": "integer"
}
}
}
}
The or-filters were working. So the general answer is: If you have such a problem, check your mappings closely and rather do too much work on them than too little.
If someone has an explanation why this might be happening, I will gladly pass the answer mark on to it.
