Find misspelled documents in Elasticsearch

I have Author documents in my Elasticsearch index, and users can submit new authors to be added to it.
Before storing a new Author, I want to check whether that Author already exists in the index, even if it was misspelled the first time.
Fuzzy search seems to be the way to do this.
Here is the request I'm making:
curl 'http://localhost:9200/my_index/Author/_search?pretty' -d '{
"query":
{
"fuzzy": {
"name": {
"value": "put a name here"
}
}
}
}'
Given that I have an Author named "Daniel Bluefield",
the above request works well when I search for "Danel".
But it doesn't return anything if I search for the full name.
How can I make a request for "Danel Bluefld" return some results?

Change it to fuzzy_like_this_field; you might need to tweak the fuzziness parameter. (Note that fuzzy_like_this_field was deprecated in Elasticsearch 1.6 and removed in 2.0.)
curl 'http://localhost:9200/my_index/Author/_search?pretty' -d '{
"query":
{
"fuzzy_like_this_field" : {
"name" : {
"like_text" : "Danel Bluefld",
"max_query_terms" : 10
}
}
}
}'

Mihai's answer works well; however, I've managed to make it work another way:
{
"min_score": 3,
"query": {
"match": {
"name": {
"query": "danil greenfld",
"fuzziness": "AUTO"
}
}
}
}
But I can't really see the difference between those two queries...
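To make the difference between the two queries concrete, here is a small Python model (not Elasticsearch code). It rests on two facts: the fuzzy query does not analyze its input, so "Danel Bluefld" is compared as one single term against the indexed tokens daniel and bluefield, while the match query analyzes the input first and applies fuzziness to each resulting token, which is roughly what fuzzy_like_this_field does as well. Assumptions, all simplifications: lowercasing plus whitespace splitting stands in for the standard analyzer, plain Levenshtein distance is used (Elasticsearch also counts a transposition as a single edit by default), and the helper names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed by fuzziness AUTO (default AUTO:3,6)."""
    if len(term) < 3:
        return 0  # 0..2 characters: must match exactly
    if len(term) < 6:
        return 1  # 3..5 characters: one edit
    return 2      # longer terms: two edits


def analyze(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer."""
    return text.lower().split()


def fuzzy_term_query(term: str, indexed_tokens: list[str]) -> bool:
    """fuzzy query model: the input is NOT analyzed, it is one term."""
    return any(levenshtein(term, tok) <= auto_fuzziness(term)
               for tok in indexed_tokens)


def match_query_fuzzy(text: str, indexed_tokens: list[str]) -> int:
    """match query model: analyze, fuzzy-match each token, count the hits."""
    return sum(fuzzy_term_query(tok, indexed_tokens) for tok in analyze(text))


doc_tokens = analyze("Daniel Bluefield")  # ['daniel', 'bluefield']
print(fuzzy_term_query("danel bluefld", doc_tokens))  # False
print(match_query_fuzzy("Danel Bluefld", doc_tokens))  # 2
```

Because the match query combines the per-token clauses with OR by default, a document matching only one of the names still counts as a hit; the min_score of 3 in the question is one way to drop such low-scoring matches.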

Related

How to query in elasticsearch?

I am using Elasticsearch to fetch the records that contain the string "bond":
{
"query": {
"match": {
"name": "Bond"
}
}
}
but I am getting an empty array as output (hits: []), even though multiple records contain the string "bold".
How can I solve this issue?
I am using the same query on another index and it works, but for the index named "all_colleges" it does not.
It only returns a record when the string is a perfect match, i.e. "Bond" == "Bond".
You can try with fuzziness:
{
"query": {
"match": {
"name": {
"query": "Bond",
"fuzziness": "AUTO"
}
}
}
}
There are many parameters you can add to the match query to get the results you want; see the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
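Fuzziness tolerates small typos, but it is not substring matching. The following Python sketch (a simplified model, not Elasticsearch code) mimics how a match query with fuzziness AUTO expands the analyzed term "bond" into the indexed terms within one edit. The term dictionary is hypothetical, and plain Levenshtein distance is a simplification of the edit distance Elasticsearch uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def fuzzy_expansions(query: str, term_dictionary: list[str]) -> list[str]:
    """Indexed terms a fuzziness=AUTO match on a one-word query could hit."""
    term = query.lower()  # the match query analyzes "Bond" into "bond"
    # AUTO: 0 edits for 1-2 characters, 1 edit for 3-5, 2 edits for longer terms
    max_edits = 0 if len(term) < 3 else 1 if len(term) < 6 else 2
    return sorted(t for t in term_dictionary if levenshtein(term, t) <= max_edits)


# Hypothetical terms indexed in the "name" field of all_colleges
terms = ["bond", "bold", "bonds", "blond", "bondholder", "college"]
print(fuzzy_expansions("Bond", terms))  # ['blond', 'bold', 'bond', 'bonds']
```

Note that "bondholder" is not returned: if the records contain "bond" only as part of a longer word, fuzziness alone will not find them.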

Elasticsearch Update by Query

I am trying to update several documents based on a search query with ES version 2.3.4. My use case is to search for documents where two fields match certain values and then add a new field with a certain value. So let's say I want to search for all employees with first name "John" and last name "Smith" and add a new field "job" to their profiles with the value "Engineer".
So my first question is whether it is possible to do this using the "doc" option with the update_by_query API (the same way as with the update API).
If not, and a script must be used (which is how I'm doing it now), then maybe somebody can help me get rid of the following error:
{"error":{"root_cause":[{"type":"class_cast_exception","reason":"java.lang.String cannot be cast to java.util.Map"}],"type":"class_cast_exception","reason":"java.lang.String cannot be cast to java.util.Map"},"status":500}
The code I'm using looks as follows:
curl -XPOST -s 'http://localhost:9200/test_index/_update_by_query?conflicts=proceed' -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"match": { "first_name" : "John" }
},
{
"match": { "last_name" : "Smith" }
}
]
}
}
}
},
"script" : "ctx._source.job = \"Engineer\""
}'
When sending the same query (without the "script" field) to the count API, no error is reported and the correct number of documents is returned.
The correct syntax is to pass the script as an object:
"script": {
"inline": "ctx._source.job = \"Engineer\""
}
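A Python sketch of the same request, with the script passed as an object as the answer suggests. Assumptions: the host and index name are the placeholders from the question; the deprecated filtered query (removed in Elasticsearch 5.0) is swapped for a bool query with a filter clause; and the inline key matches the 2.x syntax (later versions renamed it to source). The HTTP helper is a hypothetical convenience, not part of any Elasticsearch client.

```python
import json
import urllib.request


def build_update_by_query_body(first_name: str, last_name: str, job: str) -> dict:
    """Body that sets _source.job on every doc matching both names."""
    return {
        "query": {
            "bool": {
                # filter clauses: no scoring, like the original filtered query
                "filter": [
                    {"match": {"first_name": first_name}},
                    {"match": {"last_name": last_name}},
                ]
            }
        },
        # Script as an object, not a bare string (the cause of the
        # "String cannot be cast to Map" error, per the answer above).
        # json.dumps turns the value into a double-quoted string literal.
        "script": {"inline": "ctx._source.job = " + json.dumps(job)},
    }


def post_update_by_query(base_url: str, index: str, body: dict) -> dict:
    """Hypothetical helper: POST the body and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_update_by_query?conflicts=proceed",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    body = build_update_by_query_body("John", "Smith", "Engineer")
    print(json.dumps(body, indent=2))
    # post_update_by_query("http://localhost:9200", "test_index", body)
```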

Elasticsearch find ONLY perfect match

I've been trying to do this for some time, read and searched a lot and I haven't found any definitive answer or solution.
Let's say we add some documents:
$ curl -XPUT http://localhost:9200/tm/entries/1 -d '{"item": "foo" }'
{"_index":"tm","_type":"entries","_id":"1","_version":1,"created":true}
$ curl -XPUT http://localhost:9200/tm/entries/2 -d '{"item": "foo bar" }'
{"_index":"tm","_type":"entries","_id":"2","_version":1,"created":true}
$ curl -XPUT http://localhost:9200/tm/entries/3 -d '{"item": "foo bar foo" }'
{"_index":"tm","_type":"entries","_id":"3","_version":1,"created":true}
After this, I want to find ONLY the document(s) that match the search query perfectly:
$ curl -XGET http://localhost:9200/_search?q=foo
The result contains all 3 documents, and I only want to get the one that matches "foo" and nothing else.
Also,
$ curl -XGET 'http://localhost:9200/_search?q=bar%20foo'
Should not return any results.
Can Elasticsearch do that?
How?
Update:
Existing mapping:
{
"tm": {
"mappings": {
"entries": {
"properties": {
"item": {
"type": "string"
}
}
}
}
}
}
Use the following mapping:
{
"tm": {
"mappings": {
"entries": {
"properties": {
"item": {
"type": "string",
"index" : "not_analyzed"
}
}
}
}
}
}
Then use a term query to find an exact match. Term queries are not analyzed.
curl -XGET "http://localhost:9200/tm/entries/_search" -d'
{
"query": {
"term": {
"item": {
"value": "foo bar"
}
}
}
}'
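Why not_analyzed plus a term query gives exact matches can be modeled with a toy inverted index in Python. This is an illustration only, not how Elasticsearch is implemented: lowercasing and splitting on whitespace approximate the standard analyzer, and the not_analyzed case stores each whole value as a single term.

```python
from collections import defaultdict

docs = {1: "foo", 2: "foo bar", 3: "foo bar foo"}


def build_index(documents: dict[int, str], analyzed: bool) -> dict[str, set[int]]:
    """Map each indexed term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # analyzed: one term per word; not_analyzed: the whole value is one term
        terms = text.lower().split() if analyzed else [text]
        for term in terms:
            index[term].add(doc_id)
    return index


def term_query(index: dict[str, set[int]], value: str) -> list[int]:
    """A term query looks the value up as-is, without analyzing it."""
    return sorted(index.get(value, set()))


analyzed = build_index(docs, analyzed=True)
not_analyzed = build_index(docs, analyzed=False)

print(term_query(analyzed, "foo"))          # [1, 2, 3]
print(term_query(not_analyzed, "foo"))      # [1]
print(term_query(not_analyzed, "foo bar"))  # [2]
print(term_query(not_analyzed, "bar foo"))  # []
```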
Try adding "index" : "not_analyzed" to the mapping.
And the query should be something like:
{
"match_phrase": {
"item": "foo"
}
}
You should use a match query instead of query_string. It'll solve your issue.
{
"match" : {
"item" : "bar foo"
}
}
Also, make sure the terms you are searching for are actually present in the indexed field as whole values; for that, the field needs to use the "keyword" analyzer.
If you are trying to search with a GET request, this might help:
$ curl -XGET http://localhost:9200/tm/entries/_search?q=item:foo
The syntax is _search?q=<field>:<value>.
You can find the documentation under URI Search.
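When the value contains spaces, the q parameter has to be URL-encoded (and the URL quoted in the shell). A small Python sketch that builds such a URL with the standard library; the value is wrapped in double quotes so that query_string treats it as a phrase, and the host, index, and type are the ones from the question.

```python
from urllib.parse import quote, urlencode


def uri_search_url(base_url: str, index: str, doc_type: str, field: str, value: str) -> str:
    """Build a URI Search URL of the form _search?q=<field>:"<value>"."""
    q = f'{field}:"{value}"'  # quotes make query_string treat the value as a phrase
    # quote_via=quote encodes spaces as %20 instead of '+'
    return f"{base_url}/{index}/{doc_type}/_search?" + urlencode({"q": q}, quote_via=quote)


url = uri_search_url("http://localhost:9200", "tm", "entries", "item", "foo bar")
print(url)  # http://localhost:9200/tm/entries/_search?q=item%3A%22foo%20bar%22
```

The printed URL can then be passed to curl -XGET inside single quotes.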
If you want to use a filter, it is good to have the field mapped as not_analyzed (as described above).
For more complex queries:
curl -XPOST "http://localhost:9200/tm/entries/_search" -d'
{
"filter": {
"term": {
"item": "foo"
}
}
}'
Hope this helps.

How to make query_string search exact phrase in ElasticSearch

I put two documents in Elasticsearch:
curl -XPUT "http://localhost:9200/vehicles/vehicle/1" -d'
{
"model": "Classe A"
}'
curl -XPUT "http://localhost:9200/vehicles/vehicle/2" -d'
{
"model": "Classe B"
}'
Why does this query return both documents:
curl -XPOST "http://localhost:9200/vehicles/_search" -d'
{
"query": {
"query_string": {
"query": "model:\"Classe A\""
}
}
}'
And this one returns only the second document:
curl -XPOST "http://localhost:9200/vehicles/_search" -d'
{
"query": {
"query_string": {
"query": "model:\"Classe B\""
}
}
}'
I want Elasticsearch to match on the exact phrase I pass to the query parameter, WITH the whitespace. How can I do that?
What you need to look at is the analyzer you're using. If you don't specify one, Elasticsearch will use the Standard Analyzer. It is great for the majority of cases with plain text input, but doesn't work for the use case you mention.
What the standard analyzer does is split your string into words and then convert them to lowercase.
If you want to match the whole string "Classe A" and distinguish this from "Classe B", you can use the Keyword Analyzer. This will keep the entire field as one string.
Then you can use the match query which will return the results you expect.
Create the mapping:
PUT vehicles
{
"mappings": {
"vehicle": {
"properties": {
"model": {
"type": "string",
"analyzer": "keyword"
}
}
}
}
}
Perform the query:
POST vehicles/_search
{
"query": {
"match": {
"model": "Classe A"
}
}
}
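One plausible explanation for the "Classe A" result, stated as an assumption: as far as I know, Elasticsearch releases before 1.0 configured the standard analyzer to remove English stop words by default, so "A" was dropped and the phrase query reduced to just "classe", which matches both documents. The Python model below illustrates that idea next to the keyword analyzer; the stop-word set is a tiny illustrative subset, and the phrase check ignores the position gaps Lucene leaves behind.

```python
from functools import partial

# Tiny illustrative subset of English stop words (assumption, not the real list)
STOP_WORDS = {"a", "an", "and", "the", "of"}


def standard_like(text: str, remove_stop_words: bool) -> list[str]:
    """Lowercase + whitespace split, optionally dropping stop words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def keyword(text: str) -> list[str]:
    """Keyword analyzer: the whole value is one token."""
    return [text]


def phrase_matches(query_tokens: list[str], doc_tokens: list[str]) -> bool:
    """True if query_tokens appear consecutively in doc_tokens."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


def search(query: str, docs: dict[int, str], analyzer) -> list[int]:
    """Analyze query and documents the same way, then phrase-match."""
    q = analyzer(query)
    return [doc_id for doc_id, text in docs.items()
            if phrase_matches(q, analyzer(text))]


docs = {1: "Classe A", 2: "Classe B"}
old_standard = partial(standard_like, remove_stop_words=True)

print(search("Classe A", docs, old_standard))  # [1, 2]
print(search("Classe B", docs, old_standard))  # [2]
print(search("Classe A", docs, keyword))       # [1]
```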
If you want to use the query_string query, you can set default_operator to AND:
POST vehicles/vehicle/_search
{
"query": {
"query_string": {
"query": "Classe B",
"default_operator": "AND"
}
}
}
Additionally, using query_string with escaped quotes will also search for an exact phrase:
POST _search
{
"query": {
"query_string": {
"query": "\"Classe A\""
}
}
}
Use a match_phrase query, as shown below:
GET /company/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}
It seems that in recent versions of ES (5.0 and later) you can just use the .keyword subfield:
POST vehicles/_search
{
"query": {
"term": {
"model.keyword": "Classe A"
}
}
}
It will match exactly the string "Classe A"
String fields that ES maps dynamically as text get a keyword subfield, which is very useful for cases like this:
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html
Another nice solution is to use a match query with minimum_should_match, which sets the percentage of query words that must match. At 100%, only documents containing every word of the given text are returned.
Note that this approach does NOT consider the order of the words.
{
"query":{
"bool":{
"should":[
{
"match":{
"my_text":{
"query":"I want to buy a new new car",
"minimum_should_match":"90%"
}
}
}
]
}
}
}
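How minimum_should_match turns a percentage into a required number of terms can be sketched in Python. Per the Elasticsearch documentation, a positive percentage is applied to the number of optional clauses (one per query term here) and rounded down; and in a bool query with no must clauses, at least one should clause still has to match. The matching helper is a simplified model that ignores scoring and any analysis beyond lowercasing and splitting.

```python
import math


def required_matches(num_terms: int, percent: int) -> int:
    """Positive percentage -> number of terms that must match, rounded down."""
    return math.floor(num_terms * percent / 100)


def matches(query: str, doc: str, percent: int) -> bool:
    """Order-insensitive check of how many query terms the document contains."""
    query_terms = query.lower().split()
    doc_terms = set(doc.lower().split())
    hits = sum(term in doc_terms for term in query_terms)
    # With only should clauses, at least one of them must match.
    return hits >= max(required_matches(len(query_terms), percent), 1)


print(matches("buy new car", "a new car to buy", 100))  # True: order ignored
print(matches("buy new car", "new car", 100))           # False: 2 of 3 terms
print(matches("buy new car", "new car", 90))            # True: floor(2.7) = 2
```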

Elasticsearch grouping facet by owner, mine vs others

I am using Elasticsearch to index documents that have an owner which is stored in a userId property of the source object. I can easily do a facet on the userId and get facets for each owner that there is, but I'd like to have the facets for owner show up like so:
Documents owned by me (X)
Documents owned by others (Y)
I could handle this on the client side and take all of the facets returned by elasticsearch and go through them and figure out those owned by the current user and not and display it appropriately, but I was hoping there was a way to tell elasticsearch to handle this in the query itself.
You can use filtered facets to do this:
curl -XGET "http://localhost:9200/_search" -d'
{
"query": {
"match_all": {}
},
"facets": {
"my_docs": {
"filter": {
"term": { "user_id": "my_user_id" }
}
},
"others_docs": {
"filter": {
"not": {
"term": { "user_id": "my_user_id" }
}
}
}
}
}'
One of the nice things about this is that the two term filters are identical and so are only executed once. The not filter just inverts the results of the cached term filter.
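Facets were removed in Elasticsearch 2.0 in favor of aggregations. Assuming a version with aggregations, the same mine-vs-others split can be expressed with a filters aggregation; the Python sketch below builds that request body and reads the two counts from a response. The field name user_id comes from the answer, the aggregation and helper names are hypothetical, and the sample response in the usage lines is made up for illustration.

```python
def owner_split_body(user_id: str, field: str = "user_id") -> dict:
    """Search body with a filters aggregation: my documents vs everyone else's."""
    mine = {"term": {field: user_id}}
    return {
        "size": 0,  # only the counts are needed, not the hits
        "query": {"match_all": {}},
        "aggs": {
            "owner": {
                "filters": {
                    "filters": {
                        "my_docs": mine,
                        "others_docs": {"bool": {"must_not": mine}},
                    }
                }
            }
        },
    }


def owner_counts(response: dict) -> tuple[int, int]:
    """Extract (mine, others) doc counts from a search response."""
    buckets = response["aggregations"]["owner"]["buckets"]
    return buckets["my_docs"]["doc_count"], buckets["others_docs"]["doc_count"]


# Made-up response fragment, shaped like a named filters aggregation result
fake = {"aggregations": {"owner": {"buckets": {
    "my_docs": {"doc_count": 4}, "others_docs": {"doc_count": 11}}}}}
print(owner_counts(fake))  # (4, 11)
```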
You're right, Elasticsearch has a way to do that. Take a look at scripting term facets, especially the second example ("using the boolean feature"). You should be able to do something like:
{
"query" : {
"match_all" : { }
},
"facets" : {
"userId" : {
"terms" : {
"field" : "userId",
"size" : 10,
"script" : "term == '<your user id>' ? true : false"
}
}
}
}
