Difference between match and wildcard queries - Elasticsearch

What is the difference between the match and wildcard queries? If the requirement is to search for a combination of words in a paragraph or log, which approach is better?

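The match query is a full-text query: it analyzes the search text (with the default analyzer, lowercasing it and splitting it into words) and returns documents whose field contains at least one of the resulting terms as a whole token. The wildcard query is a term-level query: the pattern is not split into words and is compared against the individual terms in the index, so *bread* also matches a term such as breadsticks.
Here is a working example.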
Index Data:
{
"name":"breadsticks with soup"
}
{
"name":"multi grain bread"
}
Search Query using Match query:
{
"query": {
"match": {
"name": "bread"
}
}
}
Search Result will be
"hits": [
{
"_index": "67706115",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"name": "multi grain bread"
}
}
]
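The question also asks about searching for a combination of words. As a rough sketch, the request bodies below show two standard options: a match query with operator set to and (every word must be present, in any order) and a match_phrase query (the words must be adjacent and in order). The helper names and the sample text "grain bread" are made up for illustration; only the name field comes from the example above.

```python
import json


def match_all_words(field, text):
    """Body for a match query that requires every analyzed word to be present."""
    return {"query": {"match": {field: {"query": text, "operator": "and"}}}}


def match_phrase(field, text):
    """Body for a match_phrase query: words must be adjacent and in order."""
    return {"query": {"match_phrase": {field: text}}}


# Print both bodies so they can be pasted into any HTTP client.
print(json.dumps(match_all_words("name", "grain bread"), indent=2))
print(json.dumps(match_phrase("name", "grain bread"), indent=2))
```

For log lines or paragraphs, these full-text queries are usually a better fit than a leading-wildcard pattern, which has to scan many terms.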
Search Query using wildcard query:
{
"query": {
"wildcard": {
"name": "*bread*"
}
}
}
Search Result will be
"hits": [
{
"_index": "67706115",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"name": "multi grain bread"
}
},
{
"_index": "67706115",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"name": "breadsticks with soup"
}
}
]
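To make the difference in the two results concrete, here is a small local simulation. It does not talk to Elasticsearch; it assumes a simplified analyzer (lowercase plus whitespace split) as a stand-in for the default one, and uses Python's fnmatch for the wildcard pattern. The function names are illustrative only.

```python
from fnmatch import fnmatchcase

# Same two documents as in the example, keyed by _id.
docs = {"1": "multi grain bread", "2": "breadsticks with soup"}


def analyze(text):
    # Simplified stand-in for the default analyzer: lowercase, split on whitespace.
    return text.lower().split()


def match(query, docs):
    # The query text is analyzed; a document matches if it shares a whole token.
    wanted = set(analyze(query))
    return [doc_id for doc_id, text in docs.items() if wanted & set(analyze(text))]


def wildcard(pattern, docs):
    # The pattern is not split into words; it is tested against each indexed token.
    return [
        doc_id
        for doc_id, text in docs.items()
        if any(fnmatchcase(token, pattern) for token in analyze(text))
    ]


print(match("bread", docs))       # ['1']
print(wildcard("*bread*", docs))  # ['1', '2']
```

The token breadsticks is not equal to bread, so match skips document 2, while the pattern *bread* matches both bread and breadsticks.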

Related

Elastic search negate phrase and words in simple query string

I'm trying to negate some words and phrases in an Elastic Search request using the simple query string.
This is what I do:
&q=-"the witcher 3"-game-novel
So basically, trying to negate a phrase AND the words after it. But that doesn't seem to work.
If I try to negate the words alone it works.
How can I negate phrases and sentences in a simple query string?
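Here is a working example with index data, search query, and search result. Note that -(the witcher 3) combined with default_operator and excludes documents that contain all three terms in any order, not only the exact phrase.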
Index Data:
{
"name":"test"
}
{
"name":"game"
}
{
"name":"the witcher"
}
{
"name":"the witcher 3"
}
{
"name":"the"
}
Search Query:
{
"query": {
"simple_query_string" : {
"query": "-(game | novel) -(the witcher 3)",
"fields": ["name"],
"default_operator": "and"
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64133051",
"_type": "_doc",
"_id": "4",
"_score": 2.0,
"_source": {
"name": "the"
}
},
{
"_index": "stof_64133051",
"_type": "_doc",
"_id": "3",
"_score": 2.0,
"_source": {
"name": "the witcher"
}
},
{
"_index": "stof_64133051",
"_type": "_doc",
"_id": "1",
"_score": 2.0,
"_source": {
"name": "test"
}
}
]
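If the phrase has to be excluded as a phrase, one alternative (not the approach used in the answer above) is a bool query with must_not clauses: a match_phrase for the phrase and a match for the single words. The sketch below only builds the request body; the helper name is hypothetical.

```python
import json


def exclude_phrase_and_words(field, phrase, words):
    """Body that drops documents containing the phrase or any of the words."""
    return {
        "query": {
            "bool": {
                "must_not": [
                    # Excludes documents where the words appear adjacent and in order.
                    {"match_phrase": {field: phrase}},
                    # match defaults to OR, so any one of the words excludes a document.
                    {"match": {field: " ".join(words)}},
                ]
            }
        }
    }


body = exclude_phrase_and_words("name", "the witcher 3", ["game", "novel"])
print(json.dumps(body, indent=2))
```

A bool query that has only must_not clauses returns every document that is not excluded, so no extra match_all clause is needed.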

elasticsearch query for finding id in fields in json file

I have a JSON file that I indexed in Elasticsearch, and I need a query that retrieves "_id_osm". Can you help me, please?
This is one entry from my JSON file:
{
"index": {
"_index": "pariss",
"_type": "sig",
"_id": 1
}
}
{
"fields": {
"_id_osm": 416747747,
"_categorie": "",
"_name": [
""
],
"_location": [
36.1941834,
5.3595221
]
}
}
I have updated the answer based on the comments.
If the mapping for _id_osm has store set to true, you can use the query below to fetch the field value.
{
"stored_fields" : ["_id_osm"],
"query": {
"match": {
"_id": 1
}
}
}
This call returns the response below. Note the fields section, which contains the field name and its value.
"hits": [
{
"_index": "intqu",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"_id_osm": [
416747747
]
}
}
]
If store is not set to true (it is false by default), use _source filtering to get the data.
{
"_source": [ "_id_osm" ],
"query": {
"match": {
"_id": 1
}
}
}
This returns the response below, where _source contains the data.
"hits": [
{
"_index": "intqu",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"_id_osm": 416747747
}
}
]
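As a minimal sketch of the _source filtering variant, the helpers below build the request body and pull the value out of a response. Both helper names are hypothetical, and the sample response is a trimmed copy of the one shown above. If the documents were indexed exactly as in the question's bulk file, the field path would presumably be fields._id_osm rather than _id_osm; that is an assumption about the mapping, which is why the path is a parameter.

```python
def source_filter_body(doc_id, field_path):
    """Body that selects one document by _id and returns only one _source field."""
    return {
        "_source": [field_path],
        "query": {"ids": {"values": [str(doc_id)]}},
    }


def first_value(response, field_path):
    """Walk a dotted path through the _source of the first hit, or return None."""
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    value = hits[0].get("_source", {})
    for key in field_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Trimmed response in the shape Elasticsearch returns (hits.hits is the array).
sample = {"hits": {"hits": [{"_id": "1", "_source": {"_id_osm": 416747747}}]}}

print(source_filter_body(1, "_id_osm"))
print(first_value(sample, "_id_osm"))  # 416747747
```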

Elastic search - query-string - return result based on custom order

When the search contains more than one keyword, the query below returns results in the order I want (listed after the query).
{
"query": {
"query_string" : {
"query" : "(Sony Music) OR (Sony Music*) OR (*Sony Music) OR (*Sony Music*)",
"fields" : ["MDMGlobalData.Name1"]
}
}
}
Exact matches first.
Then, those that start with the search term.
Then, those that end with the search term.
Then, the remainder.
But if it's just one word in the query, say sony, the order is messed up.
Could someone please tell me why this is happening, and what is the best approach to get the ordered results above using a query_string search?
When you only query sony, it should have the lowest score. Is that not what you expect? By default, the query string does seem to take into consideration the order of the OR clauses so I'd say yours is already pretty optimized.
Have you tried tinkering with the default_operator option?
Also, what do you mean by sony "being in the query data"? The query string itself or a document whose field MDMGlobalData.Name1 is sony?
You wrote: "But if its just one word, say sony in query data. The order is messed up."
Based on that statement and the comment you left on the answer above, here is a working example with sample documents and a search query.
Index Sample Data:
{
"MDMGlobalData":{
"name":"Sony Music"
}
}
{
"MDMGlobalData":{
"name":"Sony Music Corp"
}
}
{
"MDMGlobalData":{
"name":"All Sony Music Corp"
}
}
{
"MDMGlobalData":{
"name":"Sony"
}
}
Search Query:
{
"query": {
"query_string": {
"query": "(Sony) OR (Sony*) OR (*Sony) OR (*Sony*)",
"fields": [
"MDMGlobalData.name"
]
}
}
}
Search Result:
"hits": [
{
"_index": "foo1",
"_type": "_doc",
"_id": "4",
"_score": 3.1396344,
"_source": {
"MDMGlobalData": {
"name": "Sony"
}
}
},
{
"_index": "foo1",
"_type": "_doc",
"_id": "1",
"_score": 3.114749,
"_source": {
"MDMGlobalData": {
"name": "Sony Music"
}
}
},
{
"_index": "foo1",
"_type": "_doc",
"_id": "2",
"_score": 3.097392,
"_source": {
"MDMGlobalData": {
"name": "Sony Music Corp"
}
}
},
{
"_index": "foo1",
"_type": "_doc",
"_id": "3",
"_score": 3.084596,
"_source": {
"MDMGlobalData": {
"name": "All Sony Music Corp"
}
}
}
]
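As you can see, the intended order is maintained: Sony has the highest score, as it should for this query. Every sample document matches all four OR clauses, so the ranking does not come from the order of the clauses. The three wildcard clauses appear to contribute a constant score each, and the small differences come from the plain Sony term clause, which scores shorter name values higher.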
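If the order needs to be explicit rather than a side effect of scoring, one possible approach is a bool query whose should clauses are wrapped in constant_score with decreasing boosts. This is a sketch under assumptions, not the method from the answers above: it assumes a keyword sub-field (here called MDMGlobalData.name.keyword, as dynamic mapping would typically create) and the boost values 4, 3, 2, 1 are arbitrary. The expected_score function only mimics the arithmetic locally; keyword fields are case-sensitive, and so is this simulation.

```python
import json


def ordered_query(keyword_field, term):
    """bool/should of constant_score clauses; the score is the sum of matched boosts."""
    def clause(query, boost):
        return {"constant_score": {"filter": query, "boost": boost}}

    return {
        "query": {
            "bool": {
                "should": [
                    clause({"term": {keyword_field: term}}, 4),                  # exact
                    clause({"prefix": {keyword_field: term}}, 3),                # starts with
                    clause({"wildcard": {keyword_field: "*" + term}}, 2),        # ends with
                    clause({"wildcard": {keyword_field: "*" + term + "*"}}, 1),  # contains
                ]
            }
        }
    }


def expected_score(name, term):
    """Local mimic of the summed boosts for one field value."""
    score = 0
    if name == term:
        score += 4
    if name.startswith(term):
        score += 3
    if name.endswith(term):
        score += 2
    if term in name:
        score += 1
    return score


names = ["All Sony Music Corp", "Sony Music Corp", "Sony", "Sony Music"]
term = "Sony Music"
# Values scoring 0 match no should clause and would not be returned at all.
ranked = sorted(
    (n for n in names if expected_score(n, term) > 0),
    key=lambda n: expected_score(n, term),
    reverse=True,
)

print(json.dumps(ordered_query("MDMGlobalData.name.keyword", term), indent=2))
print(ranked)  # ['Sony Music', 'Sony Music Corp', 'All Sony Music Corp']
```

Because an exact match also satisfies the prefix, suffix, and contains clauses, it collects every boost and always ranks first.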

elasticsearch : merge 2 fields from different types in search result

Do you know whether two fields that belong to different types can be merged dynamically into a single field?
I have an index my_index with two types, type1 and type2.
I am running a search across both types:
POST /my_index/_search
{
"min_score": 1,
"query": {
"bool": {
"should": [
{
"match": {
"titreType1": {
"query": "boy"
}
}
},
{
"match": {
"titreType2": {
"query": "boy"
}
}
}
]
}
}
}
The results come from both types and look like this:
"hits": [
{
"_index": "my_index",
"_type": "type1",
"_id": "AVo0LhFj8N13TOVDqMo9",
"_score": 13.171456,
"_source": {
"titreType1": "the boy !"
}
},
{
"_index": "my_index",
"_type": "type1",
"_id": "AVo0Lg5X8N13TOVDqMUH",
"_score": 12.986091,
"_source": {
"titreType1": "if i were a boy"
}
},
{
"_index": "my_index",
"_type": "type2",
"_id": "AVo0S-nM8N13TOVDqNPX",
"_score": 12.34135,
"_source": {
"titreType2": "boy are very nasty and it is sad"
}
},
...
]
I would like the result to have just one column named "title" that displays the value from titreType1 or titreType2.
Do you know how to do this?
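One option that needs no change to the index is to merge the two fields on the client after the search returns. The sketch below is only that: a post-processing step over hits shaped like the ones above, with a hypothetical helper name. It takes the first of the two fields that is present in each hit.

```python
def with_title(hit, candidates=("titreType1", "titreType2")):
    """Return a flat row whose 'title' is the first candidate field present."""
    source = hit.get("_source", {})
    title = next((source[name] for name in candidates if name in source), None)
    return {"_id": hit["_id"], "_type": hit.get("_type"), "title": title}


# The three hits shown in the question, trimmed to the relevant keys.
hits = [
    {"_type": "type1", "_id": "AVo0LhFj8N13TOVDqMo9",
     "_source": {"titreType1": "the boy !"}},
    {"_type": "type1", "_id": "AVo0Lg5X8N13TOVDqMUH",
     "_source": {"titreType1": "if i were a boy"}},
    {"_type": "type2", "_id": "AVo0S-nM8N13TOVDqNPX",
     "_source": {"titreType2": "boy are very nasty and it is sad"}},
]

rows = [with_title(hit) for hit in hits]
for row in rows:
    print(row["_type"], "|", row["title"])
```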

Is it possible to perform user count / cardinality with logical relationship in ElasticSearch?

I have documents of Users with the following format:
{
userId: "<userId>",
userAttributes: [
"<Attribute1>",
"<Attribute2>",
...
"<AttributeN>"
]
}
I want to be able to get the number of unique users that satisfy a logical expression, for example: how many users have attribute1 AND attribute2 OR attribute3?
I've read about the cardinality aggregation, but it seems to work on a single value and lacks the logic of "AND" and "OR".
Note that I have around 1,000,000,000 documents and I need the results as fast as possible, which is why I was looking at cardinality estimation.
What about this attempt, considering the userAttributes as a simple array of strings (analyzed in my case, but single lowercase terms):
POST /users/user/_bulk
{"index":{"_id":1}}
{"userId":123,"userAttributes":["xxx","yyy","zzz"]}
{"index":{"_id":2}}
{"userId":234,"userAttributes":["xxx","yyy","aaa"]}
{"index":{"_id":3}}
{"userId":345,"userAttributes":["xxx","yyy","bbb"]}
{"index":{"_id":4}}
{"userId":456,"userAttributes":["xxx","ccc","zzz"]}
{"index":{"_id":5}}
{"userId":567,"userAttributes":["xxx","ddd","ooo"]}
GET /users/user/_search
{
"query": {
"query_string": {
"query": "userAttributes:(((xxx AND yyy) NOT zzz) OR ooo)"
}
},
"aggs": {
"unique_ids": {
"cardinality": {
"field": "userId"
}
}
}
}
which gives the following:
"hits": [
{
"_index": "users",
"_type": "user",
"_id": "2",
"_score": 0.16471066,
"_source": {
"userAttributes": [
"xxx",
"yyy",
"aaa"
]
}
},
{
"_index": "users",
"_type": "user",
"_id": "3",
"_score": 0.04318809,
"_source": {
"userAttributes": [
"xxx",
"yyy",
"bbb"
]
}
},
{
"_index": "users",
"_type": "user",
"_id": "5",
"_score": 0.021594046,
"_source": {
"userAttributes": [
"xxx",
"ddd",
"ooo"
]
}
}
]
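The hits above are the three documents that satisfy the expression; the unique count itself is reported separately by the unique_ids aggregation, and the cardinality aggregation returns an approximate count. As a quick local cross-check of the logic (not a call to Elasticsearch), the sketch below evaluates the same boolean expression over the five sample documents. It also shows a request body variant with size set to 0, which skips the hits when only the aggregation is needed.

```python
# The five sample documents from the bulk request above.
docs = [
    {"userId": 123, "userAttributes": ["xxx", "yyy", "zzz"]},
    {"userId": 234, "userAttributes": ["xxx", "yyy", "aaa"]},
    {"userId": 345, "userAttributes": ["xxx", "yyy", "bbb"]},
    {"userId": 456, "userAttributes": ["xxx", "ccc", "zzz"]},
    {"userId": 567, "userAttributes": ["xxx", "ddd", "ooo"]},
]


def matches(attrs):
    # Mirrors ((xxx AND yyy) NOT zzz) OR ooo from the query string.
    present = set(attrs)
    return ({"xxx", "yyy"} <= present and "zzz" not in present) or "ooo" in present


unique_users = {d["userId"] for d in docs if matches(d["userAttributes"])}
print(len(unique_users))  # 3

# Same search, but "size": 0 returns only the aggregation result.
body = {
    "size": 0,
    "query": {
        "query_string": {
            "query": "userAttributes:(((xxx AND yyy) NOT zzz) OR ooo)"
        }
    },
    "aggs": {"unique_ids": {"cardinality": {"field": "userId"}}},
}
```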
