Elasticsearch exact match or partial match - elasticsearch

My idea is to implement exact/partial matching: if an exact match is found in any of the fields, return only that; if there is no exact match, return all partial matches. To illustrate, I will use emails as an example.
I use the standard analyzer and map emails as text.
For example, we have the following emails indexed:
one#gmail.com
two#gmail.com
three#gmail.com
four#hotmail.com
If query string is "one#gmail.com" only one#gmail.com should be returned.
If query string is "gmail.com" all gmail emails should be returned.
I currently have two queries:
{"query": {
"multi_match":{
"query":"one#gmail.com",
"fields":["email", "other_fields"],
"type":"phrase"
}
}}
This one finds exact matches when the standard analyzer and text mapping are used.
{"query": {
"query_string":{
"query":"gmail.com"
}
}}
This one will find all partial matches.
I need to somehow combine them with OR.
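One way to express the OR in a single request is a bool query whose should clauses hold both queries. The Python sketch below only builds the request body (no client calls), and the boost value of 10 is an arbitrary assumption. Note that this ranks phrase matches higher but still returns partial matches alongside them, so on its own it does not implement "return only the exact match if one exists".

```python
import json


def build_exact_or_partial_query(text, fields):
    """Combine a phrase clause and a query_string clause with OR (bool/should)."""
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        # exact (phrase) clause; the boost is an assumed value that
                        # makes phrase hits score higher, not a guaranteed ordering
                        "multi_match": {
                            "query": text,
                            "fields": fields,
                            "type": "phrase",
                            "boost": 10,
                        }
                    },
                    # partial clause: any matching term is enough
                    {"query_string": {"query": text, "fields": fields}},
                ],
                # a document must satisfy at least one of the two clauses
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    body = build_exact_or_partial_query("one#gmail.com", ["email", "other_fields"])
    print(json.dumps(body, indent=2))
```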
So far I have followed the suggestion from the comments and do the following:
First, I search using phrase matching:
{"query":
{"query_string":
{
"query":"one#gmail.com",
"type":"phrase"
}
}
}
If there are NO hits, I do a second search:
{"query":
{"query_string":
{
"query":"one#gmail.com"
}
}
}
This tends to work well in most cases, but it fails when the search query contains data from several different fields, for example "one#gmail.com Alex 095111111".
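The two-step fallback described above can be sketched as follows. The `search` argument is a hypothetical stand-in for whatever client call sends a request body and returns the list of hits; it is injected so the control flow can be shown without assuming any particular client API.

```python
from typing import Callable, List

# Hypothetical signature: takes a request body, returns the hits
# (for example resp["hits"]["hits"] from a real client response).
SearchFn = Callable[[dict], List[dict]]


def phrase_query(text: str) -> dict:
    # first step: phrase matching
    return {"query": {"query_string": {"query": text, "type": "phrase"}}}


def partial_query(text: str) -> dict:
    # second step: plain query_string, any term may match
    return {"query": {"query_string": {"query": text}}}


def exact_then_partial(text: str, search: SearchFn) -> List[dict]:
    """Run the phrase search; fall back to the partial search only on zero hits."""
    hits = search(phrase_query(text))
    if hits:
        return hits
    return search(partial_query(text))
```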

Related

Kibana Regex check if a field Value contains another field value

I'm trying to search for documents in which a description field contains the value of a name field (from another document). I tried a regexp query as follows:
GET inventory-full-index/_search
{
"query": {
"regexp": {
"description.description_data.value.keyword": ".*doc['name.keyword'].*"
}
}
}
It returns some documents that look relevant. The problem is that I created a document that contains "python3" in the description, and I made sure there was a document named "python3" as well. This query doesn't return that document, so I have obviously missed something.
Any idea how to fix this?
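The regexp query treats its value as a literal pattern: `doc['name.keyword']` is not evaluated as a field reference, and the square brackets are read as a regex character class, which may explain the unexpected hits. Since a regexp query cannot reference values from another document, one workaround (an assumption, not from the original thread) is to fetch the name values first and then build one query per name, escaping regex operators in each name. The helper names below are hypothetical, and the reserved-character set follows Lucene's regexp syntax with its optional operators enabled.

```python
# Characters Lucene's regexp syntax treats as operators, including the
# optional operators (# @ & < > ~) that are enabled by default.
_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regexp(text: str) -> str:
    """Backslash-escape every reserved character so it is matched literally."""
    return "".join("\\" + ch if ch in _RESERVED else ch for ch in text)


def contains_name_query(field: str, name: str) -> dict:
    # regexp patterns must match the whole term, hence .* on both sides
    pattern = ".*" + escape_lucene_regexp(name) + ".*"
    return {"query": {"regexp": {field: pattern}}}
```

Each name returned by a first search would then be passed to `contains_name_query("description.description_data.value.keyword", name)`.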

Query to see if a field contains a string using Query DSL

I am trying to filter Kibana for a field that contains the string "pH". The field is called extra.monitor_value_name. Examples of potential values are Temperature_ABC01, DO_ABC01, or pH_ABC01.
Kibana's Elasticsearch Query DSL does not seem to have a "contains string" query, so I need to build a custom query.
I am new to Query DSL, can you help me create the query?
Also, is it proper to call it Query DSL? I'm not even sure of the proper wording.
Okay! Circling back with an answer to my own question.
My initial problem stemmed from not knowing about field_name vs. field_name.keyword. For more on keyword fields, see: What's the difference between the 'field' and 'field.keyword' fields in Kibana?
Solution 1
Here's the query I ended up using. I used a regexp query. I found this article useful in figuring out syntax for the regexp:
{
"query": {
"regexp": {
"extra.monitor_value_name.keyword": "pH.*"
}
}
}
Solution 2
Another way to filter without Query DSL is to type this into the search bar: extra.monitor_value_name.keyword:pH*.
One interesting thing to note is that .keyword doesn't seem to be necessary with this method. I am not sure why.
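Try this filter using Elasticsearch Query DSL (wildcard patterns use * and ?, not regex .*, and the keyword field keeps the original casing of "pH"):
{
"query": {
"wildcard": {
"extra.monitor_value_name.keyword": {
"value": "*pH*"
}
}
}
}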
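Wildcard patterns use `*` (any sequence) and `?` (any single character) rather than regex syntax, so a "contains" query wraps the substring in `*`. A small Python helper that builds such a body, escaping wildcard characters in the input so they match literally, is sketched below. The function name is hypothetical, and patterns that start with `*` can be slow on large indices.

```python
def contains_wildcard_query(field: str, substring: str) -> dict:
    """Build a wildcard query matching values that contain `substring`."""
    # escape the backslash first, then the two wildcard operators
    escaped = (
        substring.replace("\\", "\\\\")
        .replace("*", "\\*")
        .replace("?", "\\?")
    )
    return {"query": {"wildcard": {field: {"value": "*" + escaped + "*"}}}}


if __name__ == "__main__":
    print(contains_wildcard_query("extra.monitor_value_name.keyword", "pH"))
```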

Not able to understand this Elasticsearch query

{
"query": {
"nested": {
"path": "product_vendors",
"query": {
"bool" :{
"must" : {
"bool" : {
"should" : [
{ "terms": {"product_vendors.manufacturer_style":["FSS235D-26","SG463-1128-5","SG463-2879-4"]}},
{ "terms": {"product_vendors.id":["71320"]}}
]
}
}
}
}
}
}
}
I have the above Elasticsearch query and I can't understand it. Could anyone please explain what it means and which documents it will return?
Update: #christinabo, I tried your query and it returned results, but there is a small issue: apart from the matched documents, two additional documents are returned in which only the vendor id matches. Why are these two unmatched documents returned? Do we need to set some attribute to make the search strict? Please suggest.
By observing the query, I can understand that there is a nested object in the data. I can imagine that it has this structure:
product_vendors: {
'id': 'the_id',
'manufacturer_style': 'some style'
}
In order to query a nested object, you need a nested query. This is why you have the nested keyword there. In a nested query, you need to specify the path (product_vendors) that leads to the embedded fields (id, manufacturer_style).
Then, the query defines a bool query with the must keyword, which means that the query that follows must appear in matching documents. In this case, what must appear is another bool query, defined with the should keyword. It contains two terms sub-queries (one for manufacturer_style and one for id), which means that matching documents should match at least one of them. Each sub-query targets the embedded field by its full path in the nested object, using dot notation (i.e. product_vendors.manufacturer_style).
I would expect the query to return you the documents that match at least one of the terms queries, with the documents that match both to have higher score.
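To see how should and must differ here, the sketch below builds the same nested query from Python lists. It drops the outer must/bool wrapper, which holds a single clause and does not change which documents match (an observation, not part of the original answer). The `require_both` flag is a hypothetical parameter showing the strict variant, in which both terms must match within the same nested object; that variant would exclude documents that match only on the vendor id.

```python
def product_vendor_query(styles, vendor_ids, require_both=False):
    """Build a nested query over product_vendors (function name is hypothetical)."""
    clauses = [
        {"terms": {"product_vendors.manufacturer_style": list(styles)}},
        {"terms": {"product_vendors.id": list(vendor_ids)}},
    ]
    # should: a nested object matches if at least one clause matches (OR)
    # must:   every clause has to match within the same nested object (AND)
    occur = "must" if require_both else "should"
    return {
        "query": {
            "nested": {
                "path": "product_vendors",
                "query": {"bool": {occur: clauses}},
            }
        }
    }
```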
I hope that this explanation gives you an overall idea of this query.
More about bool queries from the documentation here.

How can I sort Elasticsearch results properly

For example, I have the following two documents with fields Id and Name (the Name field is analyzed):
1,jack-in-box
2,box
When my query is "box", I get both documents, but I only want document 2, or at least document 2 ranked above document 1.
How can I write this query?
I know that doc1 was tokenized into jack, in and box, so searching for box returns doc1. My current solution is to create another field called name_not_analyzed that is not analyzed. But I wonder whether there is a better way to solve this at query time, so that I don't have to reindex. Thanks in advance!
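A query-time way to put doc2 first, assuming the name_not_analyzed field described above already exists: require a match on the analyzed name, and add a boosted term clause on name_not_analyzed so the document whose whole name equals the input scores higher. The function name and the boost of 5 are assumptions, and the term clause is case-sensitive.

```python
def exact_first_query(text: str) -> dict:
    """Rank whole-name matches above partial ones (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                # analyzed field: finds both "jack-in-box" and "box"
                "must": {"match": {"name": text}},
                # not-analyzed field: only a document whose entire name equals
                # `text` gets this extra score (exact, case-sensitive comparison)
                "should": {
                    "term": {"name_not_analyzed": {"value": text, "boost": 5}}
                },
            }
        }
    }
```

To return only the exact document instead, the term clause alone could be used as the query.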
As #jgr pointed out in the comments, doc2 should rank above doc1 by default, unless you have your own ranking algorithm, you are using a constant score query, or you are only using a filter, which gives every document a score of 1.
Now, if you only want doc2, you could use a script:
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "_source.name.toLowerCase()=='box'"
}
}
}
}
}
This accesses the source directly and lowercases the name so that BOX, Box, etc. also match.
Hope this helps!!

Exclude setting on integer field in term query

My documents contain an integer array field, storing the id of tags describing them. Given a specific tag id, I want to extract a list of top tags that occur most frequently together with the provided one.
I can solve this problem by combining a terms aggregation over the tag id field with a term filter on the same field, but the list I get back obviously always starts with the tag id I provide: all documents matching my filter have that tag, so it is always first in the list.
I thought of using the exclude setting to avoid creating the problematic bucket, but since I'm dealing with an integer field, that doesn't seem to be possible. This query
{
"size": 0,
"query": {
"term": {
"tag_ids": "00001"
}
},
"aggs": {
"tags": {
"terms": {
"size": 3,
"field": "tag_ids",
"exclude": "00001"
}
}
}
}
returns an error saying that Aggregation [tags] cannot support the include/exclude settings as it can only be applied to string values.
Is it possible to avoid getting back this bucket?
This is, as of Elasticsearch 1.4, a shortcoming of ES itself.
After the community proposed this change, the functionality has been added and will be included in Elasticsearch 1.5.0.
It should be fixed as of version 1.5.0.
Look at this: https://github.com/elasticsearch/elasticsearch/pull/7727
Until it is fixed, my workaround is to have the aggregation use a script instead of accessing the field directly, and let that script return the value as a string.
Works well and without measurable performance loss.
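Another workaround, not from the answers above, is to drop the unwanted bucket on the client side: request one extra bucket (size + 1) and filter out the provided tag id from the returned buckets. A minimal Python sketch, assuming `buckets` is the list found at response["aggregations"]["tags"]["buckets"], already ordered by doc_count; the function name is hypothetical.

```python
def co_occurring_tags(buckets, tag_id, size):
    """Remove the bucket for `tag_id` and keep the top `size` remaining buckets.

    Ask the terms aggregation for size + 1 buckets so that `size` remain
    after the provided tag's own bucket is removed.
    """
    return [b for b in buckets if b["key"] != tag_id][:size]


if __name__ == "__main__":
    buckets = [
        {"key": 1, "doc_count": 10},
        {"key": 5, "doc_count": 4},
        {"key": 7, "doc_count": 3},
        {"key": 9, "doc_count": 1},
    ]
    print([b["key"] for b in co_occurring_tags(buckets, 1, 3)])
```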
