Elasticsearch match exact term - elasticsearch

I have an Elasticsearch repo and a aplication that create documents for what we call 'assets'. I need to prevent users to create 'assets' with the same 'title'.
When the user tries to create an 'asset' I am querying the repo with the title and if there is a match an error message is shown to the user.
My problem is that when I query the title I am getting multiple results (for similar matches).
This is my query so far:
GET assets-1/asset/_search
{
"query": {
"match": {
"title": {
"query": "test",
"operator": "and"
}
}
}
}
I have many records with title: 'test 1', 'test 2', 'test bla' and only one with the title 'test'.
But I am getting all of the above.
Is there any condition or property I have to add to the query so I will exact match the term?

Your title field is probably analyzed and thus the test token will match any title containing that token.
In order to implement an exact match you need to have a not_analyzed field and do a term query on it.
You need to change the mapping of your title field to this:
curl -XPUT localhost:9200/assets-1/_mapping/asset -d '{
"asset": {
"properties": {
"title": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}'
Then you need to reindex your data and you'll then be able to run an exact match query like this:
curl -XPOST localhost:9200/assets-1/asset/_search -d '{
"query": {
"term": {
"title.raw": "test"
}
}
}'

Related

handling both exact and partial search on the same search string

I want to define the schema which can tackle the partial as well as the exact search for the same search value.
The exact search should always return the "exact match", ES should not break the search string into tokens in this case.
For partial match data type of the property should be text and for exact it should be keyword. For having the feasibility to have both partial and exact search without having to index the data to different properties you can leverage using fields. What it does is that it helps to index same data into different ways.
So, lets say you want to index name of persons, and have the ability for partial and exact search. In such case the mapping would be:
PUT test
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Lets index a few docs:
PUT test/_doc/1
{
"name": "Nishant Saini"
}
PUT test/_doc/2
{
"name": "Nishant Kumar"
}
For partial search we have to query name field and it is of type text.
GET test/_doc/_search
{
"query": {
"query_string": {
"query": "Nishant Saini",
"field": [
"name"
]
}
}
}
The above query will return both docs (1 and 2) because one token i.e. Nishant appears in both the document for field name.
For exact search we need to query on name.keyword. To perform exact match we can use term query as below:
{
"query": {
"term": {
"name.keyword": "Nishant Saini"
}
}
}
This would match doc 1 only.

Elasticsearch mappings doesn't seem to apply for query

I use Elasticsearch with Spring Boot application. In this application there
I have index customer, and customer contains field secretKey. This secret key is string that is build from numbers and letters in way FOOBAR-000
My goal was to select exactly one customer by his secret key, so I changed mappings to NOT ANALYZE that fields but it seems not to work. What am I doing wrong?
Here's my mapping
curl -X GET 'http://localhost:9200/customer/_mapping'
{
"customer": {
"mappings": {
"customer": {
"properties": {
"secretKey": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
but after I will run query
curl -XGET "http:/localhost:9200/customer/_validate/query?explain" -d'
{
"query": {
"query_string": {
"query": "FOOBAR-3121"
}
}
}'
I get following explanation:
"explanations": [
{
"index": "customer",
"valid": true,
"explanation": "_all:foobar _all:3121"
},
]
From my understanding you have an index called "customer" and within this index, a document containing a "customer field. In your case the secretKey should be nested in the "customer" field. For some reasons Elasticsearch decided to have a strange behaviour if you encapsulate objects without specifying that they are of nested type. This is the article from the doc that explains the behaviour in details. If you specify it with the following :
{
"customer": {
"mappings": {
"_doc": {
"properties": {
"customer": {
"type": "nested"
}
}
}
}
}
}
Then it should work with your query
You need to specify field name in your query, without it ElasticSearch executes query against all field, so you see _all . Try this one:
curl -XGET "http:/localhost:9200/customer/_validate/query?explain" -d'
{
"query": {
"term": {
"secretKey": {
"value": "FOOBAR-3121"
}
}
}
}'
My goal was to select exactly one customer by his secret key
Your requirement is strict, so use MATCH query to select ONLY matched customer!
curl -XGET "http:/localhost:9200/customer/_validate/query?explain" -d'
{
"query": {
"match": {
"secretKey": "FOOBAR-3121"
}
}

elasticsearch aggregations separated words

I simply run an aggregations in browser plugin(marvel) as you see in picture below there is only one doc match the query but aggregrated separated by spaces but it doesn't make sense I want aggregrate for different doc.. ın this scenario there should be only one group with count 1 and key:"Drow Ranger".
What is the true way of do this in elasticsearch..
It's probably because your heroname field is analyzed and thus "Drow Ranger" gets tokenized and indexed as "drow" and "ranger".
One way to get around this is to transform your heroname field to a multi-field with an analyzed part (the one you search on with the wildcard query) and another not_analyzed part (the one you can aggregate on).
You should create your index like this and specify the proper mapping for your heroname field
curl -XPUT localhost:9200/dota2 -d '{
"mappings": {
"agust": {
"properties": {
"heroname": {
"type": "string",
"fields": {
"raw: {
"type": "string",
"index": "not_analyzed"
}
}
},
... your other fields go here
}
}
}
}
Then you can run your aggregation on the heroname.raw field instead of the heroname field.
UPDATE
If you just want to try on the heroname field, you can just modify that field and not recreate the whole index. If you run the following command, it will simply add the new heroname.raw sub-field to your existing heroname field. Note that you still have to reindex your data though
curl -XPUT localhost:9200/dota2/_mapping/agust -d '{
"properties": {
"heroname": {
"type": "string",
"fields": {
"raw: {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Then you can keep using heroname in your wildcard query, but your aggregation will look like this:
{
"aggs": {
"asd": {
"terms": {
"field": "heroname.raw", <--- use the raw field here
"size": 0
}
}
}
}

How to make query_string search exact phrase in ElasticSearch

I put 2 documents in Elasticsearch :
curl -XPUT "http://localhost:9200/vehicles/vehicle/1" -d'
{
"model": "Classe A"
}'
curl -XPUT "http://localhost:9200/vehicles/vehicle/2" -d'
{
"model": "Classe B"
}'
Why is this query returns the 2 documents :
curl -XPOST "http://localhost:9200/vehicles/_search" -d'
{
"query": {
"query_string": {
"query": "model:\"Classe A\""
}
}
}'
And this one, only the second document :
curl -XPOST "http://localhost:9200/vehicles/_search" -d'
{
"query": {
"query_string": {
"query": "model:\"Classe B\""
}
}
}'
I want elastic search to match on the exact phrase I pass to the query parameter, WITH the whitespace, how can I do that ?
What you need to look at is the analyzer you're using. If you don't specify one Elasticsearch will use the Standard Analyzer. It is great for the majority of cases with plain text input, but doesn't work for the use case you mention.
What the standard analyzer will do is split the words in your string and then converts them to lowercase.
If you want to match the whole string "Classe A" and distinguish this from "Classe B", you can use the Keyword Analyzer. This will keep the entire field as one string.
Then you can use the match query which will return the results you expect.
Create the mapping:
PUT vehicles
{
"mappings": {
"vehicle": {
"properties": {
"model": {
"type": "string",
"analyzer": "keyword"
}
}
}
}
}
Perform the query:
POST vehicles/_search
{
"query": {
"match": {
"model": "Classe A"
}
}
}
If you wanted to use the query_string query, then you could set the operator to AND
POST vehicles/vehicle/_search
{
"query": {
"query_string": {
"query": "Classe B",
"default_operator": "AND"
}
}
}
Additionally, you can use query_string and escape the quotes will also return an exact phrase:
POST _search
{
"query": {
"query_string": {
"query": "\"Classe A\""
}
}
use match phrase query as mentioned below
GET /company/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}
Seems like in the latest versions of ES you can just use .keyword
POST vehicles/_search
{
"query": {
"term": {
"model.keyword": "Classe A"
}
}
}
It will match exactly the string "Classe A"
Dynamic fields determined by ES as text will have a subfield 'keyword', very useful for this cases:
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html
Another nice solution would be using match and minimum_should_match(providing the percentage of the words you want to match). It can be 100% and will return the results containing at least the given text;
It is important that this approach is NOT considering the order of the words.
"query":{
"bool":{
"should":[
{
"match":{
"my_text":{
"query":"I want to buy a new new car",
"minimum_should_match":"90%"
}
}
}
]
}
}

Range filter not working

I'm using the elasticsearch search engine and when I run the code below, the results returned doesn't match the range criteria(I get items with published date below the desired limit):
#!/bin/bash
curl -X GET 'http://localhost:9200/newsidx/news/_search?pretty' -d '{
"fields": [
"art_text",
"title",
"published",
"category"
],
"query": {
"bool": {
"should": [
{
"fuzzy": {"art_text": {"boost": 89, "value": "google" }}
},
{
"fuzzy": {"art_text": {"boost": 75, "value": "twitter" }}
}
],
"minimum_number_should_match": 1
}
},
"filter" : {
"range" : {
"published" : {
"from" : "2013-04-12 00:00:00"
}
}
}
}
'
I also tried putting the range clause in a must one, inside the bool query, but the results were the same.
Edit: I use elasticsearch to search in a mongodb through a river plugin. This is the script I ran to search the mongodb db with ES:
#!/bin/bash
curl -X PUT localhost:9200/_river/mongodb/_meta -d '{
"type":"mongodb",
"mongodb": {
"db": "newsful",
"collection": "news"
},
"index": {
"name": "newsidx",
"type": "news"
}
}'
Besides this, I didn't create another indexes.
Edit 2:
A view to the es mappings:
http://localhost:9200/newsidx/news/_mapping
published: {
type: "string"
}
The reason is in your mapping. The published field, which you are using as a date, is indexed as a string. That's probably because the date format you are using is not the default one in elasticsearch, thus the field type is not auto-detected and it's indexed as a simple string.
You should change your mapping using the put mapping api. You need to define the published field as a date there, specifying the format you're using (can be more than one) and reindex your data.
After that your range filter should work!

Resources