Elasticsearch: how to use multi_match with wildcard

I have a User object with properties Name and Surname. I want to search both fields with a single query. I found multi_match in the documentation, but I don't know how to use it properly with a wildcard. Is it possible?
I tried a multi_match query, but it didn't work:
{
"query": {
"multi_match": {
"query": "*mar*",
"fields": [
"user.name",
"user.surname"
]
}
}
}

Alternatively, you could use a query_string query with wildcards:
"query": {
"query_string": {
"query": "*mar*",
"fields": ["user.name", "user.surname"]
}
}
This will be slower than using an nGram filter at index time (see my other answer), but it works if you are looking for a quick-and-dirty solution.
Also, I am not sure about your mapping, but if you are querying user.name instead of name, your mapping needs to look like this:
"your_type_name_here": {
"properties": {
"user": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"surname": {
"type": "string"
}
}
}
}
}
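A minimal Python sketch of the query_string approach above, assuming the request body is built client-side as a plain dict. The helper name wildcard_query_string is hypothetical; only the query and fields parameters shown in the answer are used.

```python
import json


def wildcard_query_string(text, fields):
    """Build a query_string request body that wraps `text` in wildcards."""
    return {
        "query": {
            "query_string": {
                # Same shape as the answer above: "*mar*" over several fields.
                "query": f"*{text}*",
                "fields": list(fields),
            }
        }
    }


body = wildcard_query_string("mar", ["user.name", "user.surname"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request.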

This query worked for me:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"should": [
{"query": {"wildcard": {"user.name": {"value": "*mar*"}}}},
{"query": {"wildcard": {"user.surname": {"value": "*mar*"}}}}
]
}
}
}
}
}
It is similar to what you are doing, except that in my case each field can have a different mask.
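A rough Python sketch of the "different masks for different fields" idea, written against a plain bool query instead of filtered (the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0). The helper name wildcard_any and the second mask are made up for illustration.

```python
def wildcard_any(masks):
    """Return a request body matching documents where any field matches its mask.

    `masks` maps a field name to that field's own wildcard pattern.
    """
    clauses = [
        {"wildcard": {field: {"value": mask}}}
        for field, mask in masks.items()
    ]
    return {
        "query": {
            "bool": {
                "should": clauses,
                # Require at least one of the wildcard clauses to match.
                "minimum_should_match": 1,
            }
        }
    }


query = wildcard_any({"user.name": "*mar*", "user.surname": "mar*"})
```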

I just did this now:
GET _search {
"query": {
"bool": {
"must": [
{
"range": {
"theDate": {
"gte": "2014-01-01",
"lte": "2014-12-31"
}
}
},
{
"match" : {
"Country": "USA"
}
}
],
"should": [
{
"wildcard" : { "Id_A" : "0*" }
},
{
"wildcard" : { "Id_B" : "0*" }
}
],"minimum_number_should_match": 1
}
}
}

Similar to the suggestions above, but this is simpler and worked for me:
{
"query": {
"bool": {
"must":
[
{
"wildcard" : { "processname.keyword" : "*system*" }
},
{
"wildcard" : { "username" : "*admin*" }
},
{
"wildcard" : { "device_name" : "*10*" }
}
]
}
}
}

I would not use wildcards; they will not scale well, because you are asking a lot of the search engine at query time. You can use the nGram filter instead, so that the processing happens at index time rather than at search time.
See this discussion on the nGram filter.
After indexing the name and surname correctly (change your mapping; there are examples in the above link), you can use multi_match without wildcards and get the expected results.
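To illustrate why index-time n-grams can replace a "*mar*" wildcard, here is a small Python sketch. The ngrams function only imitates what an n-gram token filter produces, and the analyzer and filter names in index_settings are hypothetical. The filter type is spelled "ngram" in recent versions and "nGram" in older ones.

```python
def ngrams(token, min_gram=3, max_gram=3):
    """Return all substrings of `token` whose length is in [min_gram, max_gram]."""
    token = token.lower()
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


# Hypothetical index settings; "trigram_filter" and "trigram_analyzer" are made-up names.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "trigram_filter": {"type": "ngram", "min_gram": 3, "max_gram": 3}
            },
            "analyzer": {
                "trigram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "trigram_filter"],
                }
            },
        }
    }
}

print(ngrams("Omar"))
print(ngrams("Martin"))
```

Both names contain the gram "mar", so a plain match on "mar" against a field analyzed this way finds them without any wildcard work at query time.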

description: {
type: 'keyword',
normalizer: 'useLowercase',
},
product: {
type: 'object',
properties: {
name: {
type: 'keyword',
normalizer: 'useLowercase',
},
},
},
activity: {
type: 'object',
properties: {
name: {
type: 'keyword',
normalizer: 'useLowercase',
},
},
},
query:
query: {
bool: {
must: [
{
bool: {
should: [
{
wildcard: {
description: {
value: `*${value ? value : ''}*`,
boost: 1.0,
rewrite: 'constant_score',
},
},
},
{
wildcard: {
'product.name': {
value: `*${value ? value : ''}*`,
boost: 1.0,
rewrite: 'constant_score',
},
},
},
{
wildcard: {
'activity.name': {
value: `*${value ? value : ''}*`,
boost: 1.0,
rewrite: 'constant_score',
},
},
},
],
},
},
{
match: {
recordStatus: RecordStatus.Active,
},
},
{
bool: {
must_not: [
{
term: {
'user.id': req.currentUser?.id,
},
},
],
},
},
{
bool: {
should: tags
? tags.map((name: string) => {
return {
nested: {
path: 'tags',
query: {
match: {
'tags.name': name,
},
},
},
};
})
: [],
},
},
],
filter: {
bool: {
must_not: {
terms: {
id: existingIds ? existingIds : [],
},
},
},
},
},
},
sort: [
{
updatedAt: {
order: 'desc',
},
},
],
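One thing the template string above does not handle is user input that itself contains * or ?, or an empty value (which yields "**" and matches everything). A hedged Python sketch of one way to deal with both, assuming Lucene's backslash escaping applies to wildcard patterns; the helper names are hypothetical.

```python
def escape_wildcard(value):
    """Backslash-escape the characters a wildcard pattern treats specially."""
    out = []
    for ch in value:
        if ch in ("\\", "*", "?"):
            out.append("\\")
        out.append(ch)
    return "".join(out)


def contains_clause(field, value):
    """Build a wildcard clause matching `value` anywhere in `field`.

    Returns None for empty input so the caller can skip the clause.
    """
    if not value:
        return None
    return {"wildcard": {field: {"value": f"*{escape_wildcard(value)}*"}}}


fields = ["description", "product.name", "activity.name"]
should = [c for c in (contains_clause(f, "50% off?") for f in fields) if c]
```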

Related

ElasticSearch 6.8 doesn't order by exact matches first

I've been struggling with this issue for some days and haven't been able to make it work. I followed steps like this and this, but with no success.
So basically, I have the following data on ElasticSearch:
{ title: "Black Dust" },
{ title: "Dust In The Wind" },
{ title: "Gold Dust Woman" },
{ title: "Another One Bites The Dust" }
The problem is that I want to search for the word "Dust" and have the results ordered like this:
{ title: "Dust In The Wind" },
{ title: "Black Dust" },
{ title: "Gold Dust Woman" },
{ title: "Another One Bites The Dust" }
That is, the title that starts with "Dust" must appear at the top of the results.
Posting the mappings and the query is probably better than explaining the issue further.
settings: {
analysis: {
normalizer: {
lowercase: {
type: 'custom',
filter: ['lowercase']
}
}
}
},
mappings: {
_doc: {
properties: {
title: {
type: 'text',
analyzer: 'standard',
fields: {
raw: {
type: 'keyword',
normalizer: 'lowercase'
},
fuzzy: {
type: 'text',
},
},
}
}
}
}
and my query is:
"query": {
"bool": {
"must": {
"query_string": {
"fields": [
"title"
],
"default_operator": "AND",
"query": "dust"
}
},
"should": {
"prefix": {
"title.raw": "dust"
}
}
}
}
Can anyone please help me with this?
Thank you!
SOLUTION!
I figured it out and solved it with the following query:
"query": {
"bool": {
"must": {
"bool": {
"should": [
{
"prefix": {
"title.raw": {
"value": "dust",
"boost": 1000000
}
}
},
{
"match": {
"title": {
"query": "dust",
"boost": 50000
}
}
},
{
"match": {
"title": {
"query": "dust",
"boost": 10,
"fuzziness": 1
}
}
}
]
}
}
}
}
However, while writing tests, I found a small issue.
I generate a random uuid and add the following documents to the database:
{ title: `${uuid} A` }
{ title: `${uuid} W` }
{ title: `${uuid} Z` }
{ title: `A ${uuid}` }
{ title: `z ${uuid}` }
{ title: `Z ${uuid}` }
When I run the query above looking for the uuid, I get:
uuid Z
uuid A
uuid W
Z uuid
I achieved my first goal, which was to have the titles starting with the uuid first, but why is Z before A (the first and second results)?

When everything else fails, you can use a trivial substring-position sort like so:
{
"query": {
"bool": {
"must": {
...
},
"should": {
...
}
}
},
"sort": [
{
"_script": {
"script": "return doc['title.raw'].value.indexOf('dust')",
"type": "number",
"order": "asc" <--
}
}
]
}
I've set the order to asc because the lower the substring index, the higher the 'score'.
EDIT
We have to account for index == -1 (no match), so replace the script above with:
"script": "def pos = doc['title.raw'].value.indexOf('dust'); return pos == -1 ? Integer.MAX_VALUE : pos"

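To see what ordering the substring-position sort above produces, here is a small Python imitation of the script, run on the four titles from the question. It only mimics the sort key locally (with lowercasing standing in for the title.raw normalizer); it is not the Painless script itself.

```python
import sys


def substring_position(title, term):
    """Mimic the sort script: index of `term`, or a huge value if it is absent."""
    pos = title.lower().find(term.lower())
    return sys.maxsize if pos == -1 else pos


titles = [
    "Black Dust",
    "Dust In The Wind",
    "Gold Dust Woman",
    "Another One Bites The Dust",
]
ranked = sorted(titles, key=lambda t: substring_position(t, "dust"))
print(ranked)
```

The title starting with "Dust" comes first, and the rest follow by how early the term appears, so "Gold Dust Woman" (position 5) lands just ahead of "Black Dust" (position 6).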
elasticsearch nested query returns only last 3 results

We have the following elasticsearch mapping
{
index: 'data',
body: {
settings: {
analysis: {
analyzer: {
lowerCase: {
tokenizer: 'whitespace',
filter: ['lowercase']
}
}
}
},
mappings: {
// used for _all field
_default_: {
index_analyzer: 'lowerCase'
},
entry: {
properties: {
id: { type: 'string', analyzer: 'lowerCase' },
type: { type: 'string', analyzer: 'lowerCase' },
name: { type: 'string', analyzer: 'lowerCase' },
blobIds: {
type: 'nested',
properties: {
id: { type: 'string' },
filename: { type: 'string', analyzer: 'lowerCase' }
}
}
}
}
}
}
}
and a sample document that looks like the following:
{
"id":"5f02e9dae252732912749e13",
"type":"test_type",
"name":"test_name",
"creationTimestamp":"2020-07-06T09:07:38.775Z",
"blobIds":[
{
"id":"5f02e9dbe252732912749e18",
"filename":"test1.csv"
},
{
"id":"5f02e9dbe252732912749e1c",
"filename":"test2.txt"
},
// removed in-between elements for simplicity
{
"id":"5f02e9dbe252732912749e1e",
"filename":"test3.csv"
},
{
"id":"5f02e9dbe252732912749e58",
"filename":"test4.txt"
},
{
"id":"5f02e9dbe252732912749e5a",
"filename":"test5.csv"
},
{
"id":"5f02e9dbe252732912749e5d",
"filename":"test6.txt"
}
]
}
I have the following ES query, which selects documents in a certain time range based on the creationTimestamp field and then filters the nested field blobIds with a user query that should match the blobIds.filename field.
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"range": {
"creationTimestamp": {
"gte": "2020-07-01T09:07:38.775Z",
"lte": "2020-07-07T09:07:40.147Z"
}
}
},
{
"nested": {
"path": [
"blobIds"
],
"query": {
"query_string": {
"fields": [
"blobIds.filename"
],
"query": "*"
}
},
// returns the actual blobId hit
// and not the whole array
"inner_hits": {}
}
},
{
"query": {
"query_string": {
"query": "+type:*test_type* +name:*test_name*"
}
}
}
]
}
}
}
},
"sort": [
{
"creationTimestamp": {
"order": "asc"
},
"id": {
"order": "asc"
}
}
]
}
The above entry clearly matches the query. However, something seems to be wrong with the returned inner_hits: I always get only the last 3 blobIds elements instead of the whole array of 24 elements, as can be seen below.
{
"name": "test_name",
"creationTimestamp": "2020-07-06T09:07:38.775Z",
"id": "5f02e9dae252732912749e13",
"type": "test_type",
"blobIds": [
{
"id": "5f02e9dbe252732912749e5d",
"filename": "test4.txt"
},
{
"id": "5f02e9dbe252732912749e5a",
"filename": "test5.csv"
},
{
"id": "5f02e9dbe252732912749e58",
"filename": "test6.txt"
}
]
}
I find it very strange, since I'm only doing a simple * query.
I am using Elasticsearch v1.7 and cannot upgrade at the moment.
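One thing worth checking (an assumption, not something confirmed in this thread): inner_hits returns only 3 hits per parent document by default, which would match the symptom. A minimal Python sketch of the nested clause with an explicit size. The helper name and the value 100 are hypothetical, and support for size on v1.7 should be verified against that version's documentation.

```python
def nested_filename_query(pattern, max_inner_hits=100):
    """Build the nested clause from the question with an explicit inner_hits size."""
    return {
        "nested": {
            "path": "blobIds",
            "query": {
                "query_string": {
                    "fields": ["blobIds.filename"],
                    "query": pattern,
                }
            },
            # Ask for more than the default number of inner hits.
            "inner_hits": {"size": max_inner_hits},
        }
    }


clause = nested_filename_query("*")
```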

Elasticsearch with nested objects query

I have an index with a nested mapping.
I want to perform a query that returns all the documents where each word in the search term appears in one or more of the nested documents.
Here is the index:
properties: {
column_values_index_as_objects: {
type: "nested",
properties: {
value: {
ignore_above: 256,
type: 'keyword',
fields: {
word_middle: {
analyzer: "searchkick_word_middle_index",
type: "text"
},
analyzed: {
term_vector: "with_positions_offsets",
type: "text"
}
}
}
}
}
}
Here is the latest query I tried:
nested: {
path: "column_values_index_as_objects",
query: {
bool: {
must: [
{
match: {
"column_values_index_as_objects.value.analyzed": {
query: search_term,
boost: 10 * boost_factor,
operator: "or",
analyzer: "searchkick_search"
}
}
}
For example, if I search for 'food and water', I want each word to appear in at least one nested document.
The current search returns the document even if only one of the words exists.
Thanks for the help!
Update:
As Cristoph suggested, the solution works. Now I have the following problem.
Here is my index:
properties: {
name: {
type: "text"
},
column_values_index_as_objects: {
type: "nested",
properties: {
value: {
ignore_above: 256,
type: 'keyword',
fields: {
word_middle: {
analyzer: "searchkick_word_middle_index",
type: "text"
},
analyzed: {
term_vector: "with_positions_offsets",
type: "text"
}
}
}
}
}
}
The query I want to perform is this: if I search for 'my name is guy', return all the documents where all the words are found, whether in the nested documents or in the name field.
For example, I could have a document with the value 'guy' in the name field and the other words in the nested documents.

To do this, I usually split the terms and generate a request like this (foo:bar is another criterion on another field):
{
"bool": {
"must": [
{
"nested": {
"path": "column_values_index_as_objects",
"query": {
"match": {
"column_values_index_as_objects.value.analyzed": {
"query": "food",
"boost": "10 * boost_factor",
"analyzer": "searchkick_search"
}
}
}
}
},
{
"nested": {
"path": "column_values_index_as_objects",
"query": {
"match": {
"column_values_index_as_objects.value.analyzed": {
"query": "and",
"boost": "10 * boost_factor",
"analyzer": "searchkick_search"
}
}
}
}
},
{
"nested": {
"path": "column_values_index_as_objects",
"query": {
"match": {
"column_values_index_as_objects.value.analyzed": {
"query": "water",
"boost": "10 * boost_factor",
"analyzer": "searchkick_search"
}
}
}
}
},
{
"query": {
"term": {
"foo": "bar"
}
}
}
]
}
}
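The "split the terms and generate one nested clause per term" step can be sketched in Python as follows. The function name and the extra_fields parameter are hypothetical. The extra_fields variant is one possible way to also accept a match in a top-level field such as name, as asked in the update, and is an assumption rather than something taken from the answer.

```python
def all_terms_query(search_term, path, field, extra_fields=()):
    """Require every whitespace-separated term to match somewhere.

    Each term may match in any nested document under `path`, or,
    if `extra_fields` is given, in one of those top-level fields.
    """
    clauses = []
    for term in search_term.split():
        nested = {
            "nested": {
                "path": path,
                "query": {"match": {field: {"query": term}}},
            }
        }
        if extra_fields:
            alternatives = [nested] + [
                {"match": {name: {"query": term}}} for name in extra_fields
            ]
            clauses.append(
                {"bool": {"should": alternatives, "minimum_should_match": 1}}
            )
        else:
            clauses.append(nested)
    return {"bool": {"must": clauses}}


nested_only = all_terms_query(
    "food and water",
    "column_values_index_as_objects",
    "column_values_index_as_objects.value.analyzed",
)
with_name = all_terms_query(
    "my name is guy",
    "column_values_index_as_objects",
    "column_values_index_as_objects.value.analyzed",
    extra_fields=["name"],
)
```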

elasticsearch: combine text match and array contains

We have documents in elastic saved in the following structure:
{
...
name: "name",
ancestors: ["id1", "id2", "id3"],
...
}
I want to create a search query that searches for name: "some name" AND ancestors contains "id1".
I've tried many queries, but none of them works or returns the desired result. If it's of any help, this combined query should only ever return one entry.
Some of the queries I've tried are the following:
filtered: {
query: {
query_string: {
query: "name:name"
},
term: {
ancestors: "id1"
}
}
}
__
match: {
name: "name",
ancestors: "id1"
},
defaultOperator: 'AND'
__
bool: {
must: { term: { name: "name" }},
filter: {
term: { ancestors: "id1" }
}
}
The mappings are the following:
{
"data": {
"mappings": {
"entry": {
"properties": {
"ancestors": {
"type": "string"
},
"id": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}
}
}
We haven't changed the default mappings, which is why ancestors is of type string, but I don't think this makes any difference.

Try this query:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "name",
"query": "stripe AND 1"
}
},
{
"match": {
"ancestors": "id1"
}
}
]
}
}
}
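A small Python sketch of the same shape of query, with the name text and the ancestor id passed in as parameters. It swaps the query_string clause for a match clause with operator "and" (my substitution, not the answerer's), and the helper name is hypothetical.

```python
def name_and_ancestor_query(name, ancestor_id):
    """Match documents whose name contains all words of `name`
    and whose ancestors array contains `ancestor_id`."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"name": {"query": name, "operator": "and"}}},
                    # A match on an array field succeeds if any element matches.
                    {"match": {"ancestors": ancestor_id}},
                ]
            }
        }
    }


body = name_and_ancestor_query("some name", "id1")
```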

Using a custom_score to sort by a nested child's timestamp

I'm pretty new to elasticsearch and have been banging my head trying to get this sorting to work. The general idea is to search email message threads with nested messages and nested participants. The goal is to display search results at the thread level, sorting by the participant who is doing the search and either the last_received_at or last_sent_at column depending on which mailbox they are in.
My understanding is that you can't sort by a single child's value among many nested children. To work around this, I saw a couple of suggestions for using a custom_score with a script and then sorting on the score. My plan is to dynamically change the sort column and then run a nested custom_score query that returns the date of one of the participants as the score. I've noticed two issues: the score format is strange (e.g. it always has 4 zeros at the end), and it may not be returning the date that I was expecting.
Below are simplified versions of the index and the query in question. If anyone has any suggestions, I'd be very grateful. (FYI - I am using elasticsearch version 0.20.6.)
Index:
mappings: {
message_thread: {
properties: {
id: {
type: long
}
subject: {
dynamic: true
properties: {
id: {
type: long
}
name: {
type: string
}
}
}
participants: {
dynamic: true
properties: {
id: {
type: long
}
name: {
type: string
}
last_sent_at: {
format: dateOptionalTime
type: date
}
last_received_at: {
format: dateOptionalTime
type: date
}
}
}
messages: {
dynamic: true
properties: {
sender: {
dynamic: true
properties: {
id: {
type: long
}
}
}
id: {
type: long
}
body: {
type: string
}
created_at: {
format: dateOptionalTime
type: date
}
recipient: {
dynamic: true
properties: {
id: {
type: long
}
}
}
}
}
version: {
type: long
}
}
}
}
Query:
{
"query": {
"bool": {
"must": [
{
"term": { "participants.id": 3785 }
},
{
"custom_score": {
"query": {
"filtered": {
"query": { "match_all": {} },
"filter": {
"term": { "participants.id": 3785 }
}
}
},
"params": { "sort_column": "participants.last_received_at" },
"script": "doc[sort_column].value"
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"term": { "messages.recipient.id": 3785 }
}
]
}
},
"sort": [ "_score" ]
}
Solution:
Thanks to @imotov, here is the final result. The participants were not properly nested in the index (while the messages didn't need to be). In addition, include_in_root was used for the participants to simplify the query (participants are small records and not a real size issue, although @imotov also provided an example without it). He then restructured the JSON request to use a dis_max query.
curl -XDELETE "localhost:9200/test-idx"
curl -XPUT "localhost:9200/test-idx" -d '{
"mappings": {
"message_thread": {
"properties": {
"id": {
"type": "long"
},
"messages": {
"properties": {
"body": {
"type": "string",
"analyzer": "standard"
},
"created_at": {
"type": "date",
"format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''"
},
"id": {
"type": "long"
},
"recipient": {
"dynamic": "true",
"properties": {
"id": {
"type": "long"
}
}
},
"sender": {
"dynamic": "true",
"properties": {
"id": {
"type": "long"
}
}
}
}
},
"messages_count": {
"type": "long"
},
"participants": {
"type": "nested",
"include_in_root": true,
"properties": {
"id": {
"type": "long"
},
"last_received_at": {
"type": "date",
"format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''"
},
"last_sent_at": {
"type": "date",
"format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''"
},
"name": {
"type": "string",
"analyzer": "standard"
}
}
},
"subject": {
"properties": {
"id": {
"type": "long"
},
"name": {
"type": "string"
}
}
}
}
}
}
}'
curl -XPUT "localhost:9200/test-idx/message_thread/1" -d '{
"id" : 1,
"subject" : {"name": "Test Thread"},
"participants" : [
{"id" : 87793, "name" : "John Smith", "last_received_at" : null, "last_sent_at" : "2010-10-27T17:26:58Z"},
{"id" : 3785, "name" : "David Jones", "last_received_at" : "2010-10-27T17:26:58Z", "last_sent_at" : null}
],
"messages" : [{
"id" : 1,
"body" : "This is a test.",
"sender" : { "id" : 87793 },
"recipient" : { "id" : 3785},
"created_at" : "2010-10-27T17:26:58Z"
}]
}'
curl -XPUT "localhost:9200/test-idx/message_thread/2" -d '{
"id" : 2,
"subject" : {"name": "Elastic"},
"participants" : [
{"id" : 57834, "name" : "Paul Johnson", "last_received_at" : "2010-11-25T17:26:58Z", "last_sent_at" : "2010-10-25T17:26:58Z"},
{"id" : 3785, "name" : "David Jones", "last_received_at" : "2010-10-25T17:26:58Z", "last_sent_at" : "2010-11-25T17:26:58Z"}
],
"messages" : [{
"id" : 2,
"body" : "More testing of elasticsearch.",
"sender" : { "id" : 57834 },
"recipient" : { "id" : 3785},
"created_at" : "2010-10-25T17:26:58Z"
},{
"id" : 3,
"body" : "Reply message.",
"sender" : { "id" : 3785 },
"recipient" : { "id" : 57834},
"created_at" : "2010-11-25T17:26:58Z"
}]
}'
curl -XPOST localhost:9200/test-idx/_refresh
echo
# Using include in root
curl "localhost:9200/test-idx/message_thread/_search?pretty=true" -d '{
"query": {
"filtered": {
"query": {
"nested": {
"path": "participants",
"score_mode": "max",
"query": {
"custom_score": {
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"participants.id": 3785
}
}
}
},
"params": {
"sort_column": "participants.last_received_at"
},
"script": "doc[sort_column].value"
}
}
}
},
"filter": {
"query": {
"multi_match": {
"query": "test",
"fields": ["subject.name", "participants.name", "messages.body"],
"operator": "and",
"use_dis_max": true
}
}
}
}
},
"sort": ["_score"],
"fields": []
}
'
# Not using include in root
curl "localhost:9200/test-idx/message_thread/_search?pretty=true" -d '{
"query": {
"filtered": {
"query": {
"nested": {
"path": "participants",
"score_mode": "max",
"query": {
"custom_score": {
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"participants.id": 3785
}
}
}
},
"params": {
"sort_column": "participants.last_received_at"
},
"script": "doc[sort_column].value"
}
}
}
},
"filter": {
"query": {
"bool": {
"should": [{
"match": {
"subject.name":"test"
}
}, {
"nested" : {
"path": "participants",
"query": {
"match": {
"name":"test"
}
}
}
}, {
"match": {
"messages.body":"test"
}
}
]
}
}
}
}
},
"sort": ["_score"],
"fields": []
}
'
There are a couple of issues here. First, you are asking about nested objects, but participants are not defined as nested objects in your mapping. The second possible issue is that the score has type float, so it might not have enough precision to represent a timestamp as is. If you can figure out how to fit this value into a float, take a look at this example: Elastic search - tagging strength (nested/child document boosting). However, if you are developing a new system, it might be prudent to upgrade to 0.90.0.Beta1, which supports sorting on nested fields.
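The float precision point can be checked with a short Python sketch. It assumes the score is stored as a 32-bit float and that the date field value is milliseconds since the epoch, and it uses one of the timestamps from the sample documents.

```python
import struct
from datetime import datetime, timezone


def as_float32(value):
    """Round-trip a number through a 32-bit float, as a float score would store it."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


received = datetime(2010, 10, 27, 17, 26, 58, tzinfo=timezone.utc)
millis = int(received.timestamp() * 1000)
stored = int(as_float32(millis))

# Original value, value after the float32 round trip, and the absolute error.
print(millis, stored, abs(stored - millis))
```

At this magnitude, adjacent 32-bit floats are 131072 ms apart (about 131 seconds), so timestamps closer together than that can collapse to the same score.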

Resources