Elasticsearch | Match multiple phrases - elasticsearch

I'm trying to create a query that will find all statuses that contain either #breakingbad OR "breaking bad".
Here is what I have so far, but it's obviously wrong according to Sense:
{
"query": {
"match": {
"_all": {
"query": "breaking bad",
"type": "phrase"
}
},
"match": {
"_all": {
"query": "#breakingbad",
"type": "phrase"
}
}
}
}

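ANSWER:
Wrap the two clauses in a bool query. Each entry in the should array is optional, and when a bool query has no must or filter clause at least one should clause has to match, so a document containing either phrase is returned: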
{
"query": {
"bool": {
"should": [
{
"match": {
"message": {
"query": "breaking bad",
"type": "phrase"
}
}
},
{
"match": {
"message": "#breakingbad"
}
}
]
}
}
}
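As a sketch of the same pattern for any number of phrases, here is a Python helper that builds the request body. The helper name `phrase_or_query` is hypothetical, and `message` is the answer's field name, so adjust it to your mapping. The helper uses match_phrase, which newer Elasticsearch versions expect instead of the `"type": "phrase"` option on match.

```python
import json


def phrase_or_query(field: str, phrases: list[str]) -> dict:
    """Build a bool query that matches documents containing ANY of the phrases.

    Hypothetical helper: each phrase becomes one match_phrase clause in the
    should array, and at least one of them has to match.
    """
    return {
        "query": {
            "bool": {
                "should": [{"match_phrase": {field: phrase}} for phrase in phrases],
                "minimum_should_match": 1,
            }
        }
    }


body = phrase_or_query("message", ["breaking bad", "#breakingbad"])
print(json.dumps(body, indent=2))
```

Setting minimum_should_match to 1 only spells out the default for a bool query that has nothing but should clauses.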

Why not use multi_match? It runs the same query against several fields at once:
{
"query" : {
"multi_match" : {
"fields" : ["name", "description"],
"query" : "breaking bad",
"type" : "phrase_prefix"
}
}
}
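The same query with the Java API (MatchQueryBuilder.Type belongs to older clients; in newer ones the enum is MultiMatchQueryBuilder.Type):
MultiMatchQueryBuilder builder = QueryBuilders.multiMatchQuery(query,
"name", "description").type(MatchQueryBuilder.Type.PHRASE_PREFIX);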
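For intuition only, here is a rough local model of what phrase_prefix does. The query's terms must appear in order, and the last one only has to be a prefix of an indexed term. With multi_match, a document matches if any listed field matches. The lowercase whitespace tokenizer is a simplification rather than Elasticsearch's analyzer, and the function names and sample document are hypothetical.

```python
def phrase_prefix_matches(text: str, query: str) -> bool:
    """Approximate match_phrase_prefix on one field (lowercased whitespace tokens)."""
    tokens = text.lower().split()
    terms = query.lower().split()
    if not terms:
        return False
    *exact, last = terms
    n = len(terms)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        # Every term but the last must match exactly; the last one is a prefix.
        if window[:-1] == exact and window[-1].startswith(last):
            return True
    return False


def multi_match_phrase_prefix(doc: dict, fields: list[str], query: str) -> bool:
    """A document matches if the phrase prefix matches in any listed field."""
    return any(phrase_prefix_matches(doc.get(f, ""), query) for f in fields)


doc = {"name": "Breaking Bad", "description": "A chemistry teacher turns to crime"}
print(multi_match_phrase_prefix(doc, ["name", "description"], "breaking ba"))  # True
```

Word order still matters, so "bad breaking" does not match. That is the difference from a plain match query.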

Related

Elasticsearch: minimum_should_match doesn't return correctly

I'm new to the elastic universe and I have a question about a query. I'll try to describe it here:
I have an index of stores ('loja' documents), each holding a list of customers ('clientes'). The field names are Portuguese: nome = name, telefone = phone, nomeCliente = customer name:
loja {
nome,
telefone,
email,
clientes : [
{
nomeCliente,
telefone,
email
}
]
}
I need a query that returns the stores in which at least one pair of the searched customers is registered.
For example, I search for 'Ana Maria', 'Sandra Maria' and 'Alberto Braz', and I need to get back the stores that have [Ana Maria and Sandra Maria], [Ana Maria and Alberto Braz] or [Sandra Maria and Alberto Braz].
I wrote the search with the DSL below, but minimum_should_match does not enforce the minimum of 2: I get results where only 1 of the names was found.
Am I doing something wrong in the query?
Could you help me out with this one?
Query:
{
"query": {
"bool": {
"must": [
{
"nested": {
"query": {
"bool":{
"should": {
"match": {
"clientes.nomeCliente" : {
"query" : "ANA MARIA",
"type" : "phrase",
"operator": "and",
"slop" : 40
}
}
},
"should": {
"match":{
"clientes.nomeCliente" : {
"query" : "SANDRA MARIA",
"type" : "phrase",
"operator": "and",
"slop" : 40
}
}
},
"should": {
"match":{
"clientes.nomeCliente" : {
"query" : "ALBERTO BRAZ",
"type" : "phrase",
"operator": "and",
"slop" : 40
}
}
}
},"minimum_should_match": 2
},
"path": "clientes",
"inner_hits" : {
"size" : 10
}
}
}
]
}
}
}
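A JSON object can hold each key only once, which is why repeating the should key cannot work. Python's standard json module, for example, silently keeps the last value, so two of the three clauses vanish before anything is sent. The exact behaviour of Elasticsearch's own parser depends on the version, so treat this sketch as an illustration only. The trimmed query and the `reject_duplicates` hook are made up for the demo.

```python
import json

# Trimmed-down version of the query above, with the should key repeated.
raw = """
{
  "bool": {
    "should": {"match": {"clientes.nomeCliente": "ANA MARIA"}},
    "should": {"match": {"clientes.nomeCliente": "SANDRA MARIA"}},
    "should": {"match": {"clientes.nomeCliente": "ALBERTO BRAZ"}}
  }
}
"""

parsed = json.loads(raw)
# Only one "should" survives, and it is the last one in the text.
print(parsed["bool"]["should"])


def reject_duplicates(pairs):
    """object_pairs_hook that fails loudly instead of dropping clauses."""
    keys = [k for k, _ in pairs]
    dupes = sorted({k for k in keys if keys.count(k) > 1})
    if dupes:
        raise ValueError(f"duplicate keys: {dupes}")
    return dict(pairs)


try:
    json.loads(raw, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate keys: ['should']
```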
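For should you need to use an array instead of repeating the key. A JSON object holds each key only once, so with three should keys only one clause survives parsing (depending on the version, Elasticsearch may reject the request instead). minimum_should_match also belongs inside the bool object, next to should; in your query it sits one level too high. So your query needs to look something like this: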
{
"query": {
"bool": {
"must": [
{
"nested": {
"query": {
"bool": {
"should": [
{
"match": {
"clientes.nomeCliente": {
"query": "ANA MARIA",
"type": "phrase",
"operator": "and",
"slop": 40
}
}
},
{
"match": {
"clientes.nomeCliente": {
"query": "SANDRA MARIA",
"type": "phrase",
"operator": "and",
"slop": 40
}
}
},
{
"match": {
"clientes.nomeCliente": {
"query": "ALBERTO BRAZ",
"type": "phrase",
"operator": "and",
"slop": 40
}
}
}
],
"minimum_should_match": 2
}
},
"path": "clientes",
"inner_hits": {
"size": 10
}
}
}
]
}
}
}
I could not check the parameters of the match query because I don't have the mapping and sample data. But you can check that part with your index directly.
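There is one more thing to check against your data. A nested query evaluates its inner query against each nested object on its own. With the bool inside nested, minimum_should_match: 2 therefore asks a single customer's name to match two of the phrases. To require two different customers of the same store, a common approach is one nested query per name inside an outer bool. The Python sketch below builds that body and simulates both variants locally. The helper names, the sample stores and the simplified phrase matcher are hypothetical, and slop is left out for simplicity. Each inner_hits gets its own name so that the several nested queries on the clientes path do not collide.

```python
import json

# Hypothetical sample data: store -> customer names (clientes.nomeCliente).
STORES = {
    "loja_1": ["Ana Maria", "Sandra Maria", "Carlos Lima"],
    "loja_2": ["Ana Maria", "Pedro Souza"],
}
SEARCHED = ["ANA MARIA", "SANDRA MARIA", "ALBERTO BRAZ"]


def phrase_matches(name: str, phrase: str) -> bool:
    """Rough stand-in for match_phrase: lowercase tokens in order, no slop."""
    tokens, wanted = name.lower().split(), phrase.lower().split()
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


def bool_inside_nested(customers, phrases, msm=2):
    """Query as written: ONE customer has to match at least msm phrases."""
    return any(sum(phrase_matches(c, p) for p in phrases) >= msm for c in customers)


def nested_inside_bool(customers, phrases, msm=2):
    """One nested query per phrase: msm phrases, each matched by any customer."""
    return sum(any(phrase_matches(c, p) for c in customers) for p in phrases) >= msm


def build_query(phrases, msm=2):
    """Outer bool with one nested query per searched name."""
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "nested": {
                            "path": "clientes",
                            "query": {"match_phrase": {"clientes.nomeCliente": p}},
                            # Distinct names: several nested queries share one path.
                            "inner_hits": {"name": f"cliente_{i}"},
                        }
                    }
                    for i, p in enumerate(phrases)
                ],
                "minimum_should_match": msm,
            }
        }
    }


for loja, customers in STORES.items():
    print(loja, bool_inside_nested(customers, SEARCHED), nested_inside_bool(customers, SEARCHED))
print(json.dumps(build_query(SEARCHED), indent=2))
```

With this sample, loja_1 has both Ana Maria and Sandra Maria, but only the nested-inside-bool variant finds it, because no single customer carries both names.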

full-text and knn_vector hybrid search for elastic

I am currently working on a search engine and I've started to implement semantic search. I use the Open Distro version of Elasticsearch, and my mapping currently looks like this:
{
"settings": {
"index": {
"knn": true,
"knn.space_type": "cosinesimil"
}
},
"mappings": {
"properties": {
"title": {
"type" : "text"
},
"data": {
"type" : "text"
},
"title_embeddings": {
"type": "knn_vector",
"dimension": 600
},
"data_embeddings": {
"type": "knn_vector",
"dimension": 600
}
}
}
}
For a basic knn_vector search I use this:
{
"size": size,
"query": {
"script_score": {
"query": {
"match_all": { }
},
"script": {
"source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
"params": {
"field1": "title_embeddings",
"field2": "data_embeddings",
"query_value": query_vec
}
}
}
}
}
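For reference, here is a plain-Python sketch of what the script computes for one document: the cosine similarity between the query vector and each stored embedding, summed. Each cosine lies in [-1, 1], so the sum lies in [-2, 2]. Recent Elasticsearch versions reject negative script scores, which is why their dense_vector examples add 1.0; whether that applies to your Open Distro version is worth checking. The 3-dimensional vectors are made up and stand in for the real 600-dimensional ones.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def vector_score(query_vec, doc):
    """Mirror of the script: cosine with title_embeddings + cosine with data_embeddings."""
    return (cosine_similarity(query_vec, doc["title_embeddings"])
            + cosine_similarity(query_vec, doc["data_embeddings"]))


# Made-up 3-d vectors standing in for the 600-d embeddings.
query_vec = [1.0, 0.0, 0.0]
docs = {
    "same_direction": {"title_embeddings": [2.0, 0.0, 0.0], "data_embeddings": [1.0, 0.0, 0.0]},
    "orthogonal": {"title_embeddings": [0.0, 1.0, 0.0], "data_embeddings": [0.0, 0.0, 3.0]},
    "opposite": {"title_embeddings": [-1.0, 0.0, 0.0], "data_embeddings": [-5.0, 0.0, 0.0]},
}
for name, doc in docs.items():
    print(name, vector_score(query_vec, doc))
```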
and I've managed to get a kind of hybrid search with this:
{
"size": size,
"query": {
"function_score": {
"query": {
"multi_match": {
"query": query,
"fields": ["data", "title"]
}
},
"script_score": {
"script": {
"source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
"params": {
"field1": "title_embeddings",
"field2": "data_embeddings",
"query_value": query_vec
}
}
}
}
}
}
The problem is that if a document doesn't contain the query words, it isn't returned. For example, with the first (vector-only) query, searching for "trump" (which is not in my dataset) still returns documents about social networks and politics. I don't get these results with the hybrid search.
I also tried this:
{
"size": size,
"query": {
"function_score": {
"query": {
"match_all": { }
},
"functions": [
{
"filter" : {
"multi_match": {
"query": query,
"fields": ["data", "title"]
}
},
"weight": 1
},
{
"script_score" : {
"script" : {
"source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
"params": {
"field1": "title_embeddings",
"field2": "data_embeddings",
"query_value": query_vec
}
}
},
"weight": 4
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
}
}
but the multi_match filter gives the same constant score (its weight) to every document that matches, whereas I want it to rank documents the way a normal full-text query does. Any idea how to do that, or should I use another strategy? Thank you in advance.
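With help from Archit Saxena, here is the solution to my problem. function_score only rescores the documents its inner query matches, so with multi_match alone as the query, documents without the words were never candidates. Adding match_all as a second should clause makes every document match, while documents that also match the multi_match keep its full-text score on top. The cosine-similarity script is then added through boost_mode sum: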
{
"size": size,
"query": {
"function_score": {
"query": {
"bool": {
"should" : [
{
"multi_match" : {
"query": query,
"fields": ["data", "title"]
}
},
{
"match_all": { }
}
],
"minimum_should_match" : 0
}
},
"functions": [
{
"script_score" : {
"script" : {
"source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
"params": {
"field1": "title_embeddings",
"field2": "data_embeddings",
"query_value": query_vec
}
}
},
"weight": 20
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
}
}
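The arithmetic behind this query can be reproduced in a few lines of Python. The function name and the example numbers are hypothetical, and real text scores come from BM25. A bool query sums the scores of its matching should clauses, match_all contributes 1.0, the single function contributes weight × script result, and boost_mode sum adds the two parts.

```python
MATCH_ALL_SCORE = 1.0  # match_all scores every document 1.0 by default


def hybrid_score(text_score, vector_score, weight=20.0):
    """Final score under the function_score above.

    text_score: the multi_match score, or None when the text does not match.
    vector_score: result of the cosine-similarity script for the document.
    """
    # bool/should sums the scores of the clauses that match.
    query_score = MATCH_ALL_SCORE + (text_score or 0.0)
    # One function, so score_mode "sum" is just weight * script result.
    function_score = weight * vector_score
    # boost_mode "sum": query score + function score.
    return query_score + function_score


# Made-up numbers: doc_a contains the query words, doc_b is only semantically close.
print(hybrid_score(text_score=7.5, vector_score=0.4))   # 16.5
print(hybrid_score(text_score=None, vector_score=0.9))  # 19.0
```

The weight of 20 decides how much the vector part dominates. With these numbers, the document that is only semantically close outranks the one that merely contains the words.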

Elasticsearch wildcard query on nested types

I'm trying to run a wildcard query on a nested type in Elasticsearch. I have records with the following structure:
{
"field_1": "value_1",
"nested_field_1": [
{
"field_type": "some_field_type",
"field_value": "some_value"
},
{
"field_type": "another_field_type",
"field_value": "another_value"
}
]
}
I want to be able to run a wildcard query on nested_field_1, on either field_value or field_type.
I can query for an exact match with this syntax:
{
"query": {
"nested": {
"path": "nested_field_1",
"query": {
"bool": {
"must": [
{
"match": {
"nested_field_1.field_value": "another_value"
}
}
]
}
}
}
}
}
But replacing match with wildcard doesn't return any results.
Any help would be welcome.
I just tried your example, following the official Elasticsearch wildcard query documentation, and it returns the result.
Index definition (note that field_value is mapped as an integer here, unlike the string values in the question):
{
"mappings": {
"properties": {
"field_1": {
"type": "text"
},
"nested_field_1" :{
"type" : "nested",
"properties" : {
"field_type" :{
"type" : "text"
},
"field_value" :{
"type" : "integer"
}
}
}
}
}
}
Index doc
{
"field_1": "value_1",
"nested_field_1": [
{
"field_type": "some_field_type",
"field_value": 20
},
{
"field_type": "another_field_type",
"field_value": 40
}
]
}
Wildcard search query (note the wildcard clause where the question had match):
{
"query": {
"nested": {
"path": "nested_field_1",
"query": {
"bool": {
"must": [
{
"wildcard": {
"nested_field_1.field_type": {
"value": "another_field_type"
}
}
}
]
}
}
}
}
}
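Search result (the hit's _source; a nested query returns the whole parent document, including the nested object that did not match, so add inner_hits if you need to see which one matched):
{
"field_1": "value_1",
"nested_field_1": [
{
"field_type": "some_field_type",
"field_value": 20
},
{
"field_type": "another_field_type",
"field_value": 40
}
]
}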
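The wildcard query is a term-level query. Its pattern is not analyzed; it is compared against the terms in the index. On a text field those terms are whatever the analyzer produced (lowercased and split into words), so a pattern with capital letters, or one meant to span several words, can find nothing even though match works. That is one common cause of an empty result like the one in the question, though it cannot be confirmed without the original wildcard query. The Python sketch below models this. Its analyzer is a rough stand-in for the standard one, the helper names are hypothetical, and fnmatchcase handles * and ? like the wildcard query (it also accepts [...], which Elasticsearch does not).

```python
import re
from fnmatch import fnmatchcase


def analyze(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: lowercase word tokens.

    The regex keeps underscores, so "another_field_type" stays a single token.
    """
    return re.findall(r"\w+", text.lower())


def wildcard_matches(pattern: str, terms: list[str]) -> bool:
    """Term-level wildcard: the pattern is NOT analyzed, only compared to indexed terms."""
    return any(fnmatchcase(term, pattern) for term in terms)


def nested_wildcard_query(field: str, pattern: str) -> dict:
    """Wildcard clause on one sub-field of nested_field_1."""
    return {
        "query": {
            "nested": {
                "path": "nested_field_1",
                "query": {"wildcard": {f"nested_field_1.{field}": {"value": pattern}}},
            }
        }
    }


terms = analyze("another_field_type")
print(terms)                                # ['another_field_type']
print(wildcard_matches("another*", terms))  # True
print(wildcard_matches("Another*", terms))  # False: the pattern is not lowercased
```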

Elasticsearch query with multiple conditions and time range

I'm trying to create a query to count instances where two conditions are met over the last day.
This query shows the count for the two conditions, but when I add a date range, it seems to match all documents:
GET logstash-*/_count
{
"query": {
"bool": {
"should": [
{
"match": {
"rawmsg": {
"query": "Could not send Message.",
"type": "phrase"
}
}
},
{
"match": {
"stack_trace": {
"query": "*WebServiceException*",
"type": "phrase"
}
}
}
]
}
}
}
Here's how I'm trying to add the date range:
GET logstash-*/_count
{
"query": {
"bool": {
"should": [
{
"match": {
"rawmsg": {
"query": "Could not send Message.",
"type": "phrase"
}
}
},
{
"match": {
"stack_trace": {
"query": "*WebServiceException*",
"type": "phrase"
}
}
},
{
"range" : {
"@timestamp" : {
"gte" : "now-1d/d",
"lt" : "now/d"
}
}
}
]
}
}
}
I ended up finding two ways of accomplishing what I needed:
GET logstash-*/tcp_input/_count?q=stack_trace: *WebServiceException* AND rawmsg: "Could not send Message" AND @timestamp: [ now-30d TO now ]
and
GET logstash-*/_count
{
"query": {
"query_string": {
"query": """stack_trace: *WebServiceException* AND rawmsg: "Could not send Message" AND @timestamp: [ now-3d TO now]""",
"analyze_wildcard": true
}
}
}
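You are using a should clause instead of a must clause, which effectively combines your conditions with an OR instead of an AND. Any document from the date window matches the range clause on its own, so the count covers everything in that window. Put the two text conditions in must and the range in filter (or must). Note also that a match query does not interpret * as a wildcard, which is why the query_string version with analyze_wildcard works for stack_trace.
As for the range, now-1d/d and now/d-1d resolve to the same instant (midnight at the start of yesterday, in UTC unless a time_zone is set). Together with "lt": "now/d", the clause therefore covers yesterday only. For the last 24 hours, use "gte": "now-1d" without rounding.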
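Here is a Python sketch of the corrected request and of the date math. The first function builds a body with the text conditions in must and the date range in filter. It uses the query_string clause the asker found working for the wildcard and assumes the usual Logstash @timestamp field. The last lines use datetime (assuming UTC and an arbitrary fixed "now") to show that now-1d/d and now/d-1d land on the same instant. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def start_of_day(t: datetime) -> datetime:
    """Date-math "/d" rounding as used on these bounds: round down to midnight."""
    return t.replace(hour=0, minute=0, second=0, microsecond=0)


def count_body(gte="now-1d/d", lt="now/d") -> dict:
    """must = AND of the text conditions; the range goes in filter (no scoring)."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match_phrase": {"rawmsg": "Could not send Message."}},
                    {"query_string": {"query": "stack_trace:*WebServiceException*",
                                      "analyze_wildcard": True}},
                ],
                "filter": [{"range": {"@timestamp": {"gte": gte, "lt": lt}}}],
            }
        }
    }


now = datetime(2024, 5, 17, 15, 30, tzinfo=timezone.utc)  # arbitrary "now", UTC
a = start_of_day(now - timedelta(days=1))  # now-1d/d
b = start_of_day(now) - timedelta(days=1)  # now/d-1d
print(a == b, a.isoformat())  # True 2024-05-16T00:00:00+00:00
```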

exact match query in elasticsearch

I'm trying to run an exact match query in ES.
In MySQL my query would be:
SELECT * WHERE `content_state`='active' AND `author`='bob' AND `title` != 'Beer';
I looked at the ES docs here:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html
and came up with this:
{
"from" : '.$offset.', "size" : '.$limit.',
"filter": {
"and": [
{
"and": [
{
"term": {
"content_state": "active"
}
},
{
"term": {
"author": "bob"
}
},
{
"not": {
"filter": {
"term": {
"title": "Beer"
}
}
}
}
]
}
]
}
}
but my results still include documents whose title is Beer; the filter doesn't seem to exclude them.
Did I do something wrong?
I'm pretty new to ES.
I figured it out; I used this instead:
{
"from" : '.$offset.', "size" : '.$limit.',
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "content_state",
"query": "active"
}
},
{
"query_string": {
"default_field": "author",
"query": "bob"
}
}
],
"must_not": [
{
"query_string": {
"default_field": "title",
"query": "Beer"
}
}
]
}
}
}
The query_string query is a convenient way to express relationships between search criteria; see the query string syntax documentation for details. The same conditions as a single query string:
{
"query": {
"query_string": {
"query": "(content_state:active AND author:bob) AND NOT (title:Beer)"
}
}
}
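Filters are supposed to work on exact values. Your title field was presumably analyzed by the default analyzer, which indexes "Beer" as the lowercase term beer. A term filter is not analyzed, so the term Beer matches nothing and the not filter excludes nothing (active and bob worked because they are already lowercase). If you had mapped title as a non-analyzed field, your previous attempt with filters would have worked as well, for example: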
{
"mappings": {
"test": {
"_all": {
"enabled": false
},
"properties": {
"content_state": {
"type": "string"
},
"author": {
"type": "string"
},
"title": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
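Here is a small Python model of that explanation. The helper names are hypothetical, and the analyzer is a rough stand-in for the standard one. A term lookup compares the raw value with the indexed terms, so Beer only matches when the field is indexed as one exact value. The body at the end writes the same conditions as a bool query with filter and must_not, which assumes Elasticsearch 2.x or later. From 5.0 on, a not_analyzed string becomes a keyword field.

```python
import re


def standard_like(text: str) -> list[str]:
    """Rough stand-in for an analyzed string field: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def not_analyzed(text: str) -> list[str]:
    """not_analyzed / keyword field: the whole value is one exact term."""
    return [text]


def term_matches(value: str, indexed_terms: list[str]) -> bool:
    """term query/filter: the value is NOT analyzed, it must equal an indexed term."""
    return value in indexed_terms


print(term_matches("Beer", standard_like("Beer")))  # False: nothing gets excluded
print(term_matches("Beer", not_analyzed("Beer")))   # True: must_not excludes it

# Same conditions as a bool query with term clauses (field names from the question).
body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"content_state": "active"}},
                {"term": {"author": "bob"}},
            ],
            "must_not": [{"term": {"title": "Beer"}}],
        }
    }
}
```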
