Elasticsearch error when same search query - elasticsearch

Now i am working with Elastic search.But when i run same query for search,i can not get same results.
First query:"GET /mega/employee/_search?q=last_name:Smith"
Result:I only get 2 results that "last_name==Smith"
Second query:
"GET /mega/employee/_search
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}"
I get 3 results,with last result:last_name==Fer
Someones can explain for me?

You probably need to POST to _search. GET with a body is ill-supported (and, I'd argue, meaningless according to the spec), and is rather abused by Elasticsearch. Your request is almost certainly being interpreted at a GET to your _search endpoint with no query attached.

Related

Using term or terms with one value in Elasticsearch queries

I am querying an Elasticsearch index using the values of a field. Sometimes, I have to extract all the documents having a field set to exactly one value; Some other times I have to retrieve all the documents having a field, set with one of the values in a list of values.
The latter use case contains the former. Can I use a single query using the terms construct?
POST /_search
{
"query": {
"terms" : { "user" : ["kimchy", "elasticsearch"]}
}
}
Or, in cases I know I need to search only for a unique value, it is better to use the term construct?
POST _search
{
"query": {
"term" : { "user" : "kimchy" }
}
}
Which approach is better regarding performance? Does Elasticsearch perform any optimization if the value in the terms construct is unique?
Thanks to all.
See this link. Terms query is automatically cached while term query is not . So, the next you run the same query, the took time for query for execution will be faster. So if you have a case where you need to run the same query again and again, terms query is a good choice. If not, there is not much of difference between the two.

Jest Client Size Parameter Ignored

I am using the Jest Client to query Elasticsearch from my Java program. Everything works correctly except that when I add the "size" parameter it is ignored. Building and execution of the Search is like so:
Search search = new Search.Builder(query)
.setSearchType(SearchType.QUERY_THEN_FETCH)
.addIndex(index)
.addType(type)
.setParameter(Parameters.SIZE, 1)
.build();
jestClient.execute(search);
This query always returns 10 results, instead of the 1 result expected. In case it is relevant, there are only 5 shards, so it is not returning a result per shard.
Is there any particular reason this parameter is ignored? When running the same query with the same parameters on the command line with 'curl -XGET' or when simply putting it in the browser the query runs correctly and the size parameter is taken in to account.
It turns out I had a misunderstanding of how the size parameter was working. I believe Jest was handling it correctly when sizing the query results, but what I was actually interested in were the aggregation results.
Adding a "size" parameter to my aggregation worked, as can be seen in the Terms Aggregation docs (copied below).
{
"aggs" : {
"products" : {
"terms" : {
"field" : "product",
"size" : 5
}
}
}
}

Finding fields Elasticsearch has matched on

I am using Elasticsearch to search for a group a user should join. I have the user data nested into the search query. On return I get back the closest matched group that user should be in.
The field I am searching on is a nested field as follows:
`{"interests": [
{"topics":["python", "stackoverflow", "elasticsearch"]},
{"topics":["arts", "textiles"]}
]}`
However if you want an understanding of a match - how do you do this?
Elasticsearch does have an explain function which says what the scoring is made up of using tfidf, but not specifically what terms were used.
For example, if I search for 'Textile', the doc should match on 'textiles'. Thus I want the term 'textiles' to be returned in explain or some other way.
The only way I see that provides this need, is to store the search and the document retrieved and then process both to discover words ES has most likely matched on.
EDIT - for some more clarity of the question
An example in my index of a group which has "interests": ['arts', 'fine arts', 'art painting', 'arts and crafts', 'sports']
Now my search, I am looking for Arts and many other things. Now the term I am searching for comes up in this list many times, thus should always be a contributor.
What I want in the response is to say these words were matched ['arts', 'fine arts', 'art painting', 'arts and crafts']along with the degree to which they match i..e 'arts' should be higher than the others, but all others are also relevant
Elasticsearch allows you to specify the _name field for all queries and
filters. This means that you can separate your query into different parts with
separate names, which will allow you to determine which parts matched.
For example:
{
"query" : {
"bool" : {
"should" : [
{"match" : { "interests.topics" : {"query" : "python", "_name" : "py-topic"} }},
{"match" : { "interests.topics" : {"query" : "arts", "_name" : "arts-topic"} }}
]
}
}
}
Then, in your response, you will get back any array of which queries (or
filters) matched and you can determine if the py-topic query and/or the
arts-topic query matched above.

How to enable fuzziness for phrase queries in ElasticSearch

We're using ElasticSearch for searching through millions of tags. Our users should be able to include boolean operators (+, -, "xy", AND, OR, brackets). If no hits are returned, we fall back to a spelling suggestion provided by ES and search again. That's our query:
$ curl -XGET 'http://127.0.0.1:9200/my_index/my_type/_search' -d '
{
"query" : {
"query_string" : {
"query" : "some test query +bools -included",
"default_operator" : "AND"
}
},
"suggest" : {
"text" : "some test query +bools -included",
"simple_phrase" : {
"phrase" : {
"field" : "my_tags_field",
"size" : 1
}
}
}
}
Instead of only providing a fallback to spelling suggestions, we'd like to enable fuzzy matching. If, for example, a user searches for "stackoverfolw", ES should return matches for "stackoverflow".
Additional question: What's the better performing method for "correcting" spelling errors? As it is now, we have to perform two subsequent requests, first with the original search term, then with the by ES suggested term.
The query_string does support some fuzziness but only when using the ~ operator, which I think doesn't your usecase. I would add a fuzzy query then and put it in or with the existing query_string. For instance you can use a bool query and add the fuzzy query as a should clause, keeping the original query_string as a must clause.
As for your additional question about how to correct spelling mistakes: I would use fuzzy queries to automatically correct them and two subsequent requests if you want the user to select the right correction from a list (e.g. Did you mean), but your approach sounds good too.

Queries vs. Filters

I can't see any description of when I should use a query or a filter or some combination of the two. What is the difference between them? Can anyone please explain?
The difference is simple: filters are cached and don't influence the score, therefore faster than queries. Have a look here too. Let's say a query is usually something that the users type and pretty much unpredictable, while filters help users narrowing down the search results , for example using facets.
This is what official documentation says:
As a general rule, filters should be used instead of queries:
for binary yes/no searches
for queries on exact values
As a general rule, queries should be used instead of filters:
for full text search
where the result depends on a relevance score
An example (try it yourself)
Say index myindex contains three documents:
curl -XPOST localhost:9200/myindex/mytype -d '{ "msg": "Hello world!" }'
curl -XPOST localhost:9200/myindex/mytype -d '{ "msg": "Hello world! I am Sam." }'
curl -XPOST localhost:9200/myindex/mytype -d '{ "msg": "Hi Stack Overflow!" }'
Query: How well a document matches the query
Query hello sam (using keyword must)
curl localhost:9200/myindex/_search?pretty -d '
{
"query": { "bool": { "must": { "match": { "msg": "hello sam" }}}}
}'
Document "Hello world! I am Sam." is assigned a higher score than "Hello world!", because the former matches both words in the query. Documents are scored.
"hits" : [
...
"_score" : 0.74487394,
"_source" : {
"name" : "Hello world! I am Sam."
}
...
"_score" : 0.22108285,
"_source" : {
"name" : "Hello world!"
}
...
Filter: Whether a document matches the query
Filter hello sam (using keyword filter)
curl localhost:9200/myindex/_search?pretty -d '
{
"query": { "bool": { "filter": { "match": { "msg": "hello sam" }}}}
}'
Documents that contain either hello or sam are returned. Documents are NOT scored.
"hits" : [
...
"_score" : 0.0,
"_source" : {
"name" : "Hello world!"
}
...
"_score" : 0.0,
"_source" : {
"name" : "Hello world! I am Sam."
}
...
Unless you need full text search or scoring, filters are preferred because frequently used filters will be cached automatically by Elasticsearch, to speed up performance. See Elasticsearch: Query and filter context.
Filters -> Does this document match? a binary yes or no answer
Queries -> Does this document match? How well does it match? uses scoring
Few more addition to the same.
A filter is applied first and then the query is processed over its results. To store the binary true/false match per document , something called a bitSet Array is used.
This BitSet array is in memory and this would be used from second time the filter is queried. This way , using bitset array data-structure , we are able to utilize the cached result.
One more point to note here , the filter cache is created only when the request is executed hence only from the second hit , we actually get the advantage of caching.
But then you can use warmer API , to outgrow this. When you register a query with filter against a warmer API , it will make sure that this is executed against a new segment whenever it comes live. Hence we will get consistent speed from the first execution itself.
Basically, a query is used when you want to perform a search on your documents with scoring.
And filters are used to narrow down the set of results obtained by using query. Filters are boolean.
For example say you have an index of restaurants something like zomato.
Now you want to search for restaurants that serve 'pizza', which is basically your search keyword.
So you will use query to find all the documents containing "pizza" and some results will obtained.
Say now you want list of restaurant that serves pizza and has rating of atleast 4.0.
So what you will have to do is use the keyword "pizza" in your query and apply the filter for rating as 4.0.
What happens is that filters are usually applied on the results obtained by querying your index.
Since version 2 of Elasticsearch, filters and queries have been merged and any query clause can be used as either a filter or a query (depending on the context). As with version 1, filters are cached and should be used if scoring does not matter.
Source: https://logz.io/blog/elasticsearch-queries/
Queries : calculate score; thus they’re able to return results sorted by relevance.
Filters : don’t calculate score, making them faster and easier to cache.

Resources