ElasticSearch and Aggregation - elasticsearch

I have been given a problem where I need to perform a search based on different fields. For example, on the UI the user is given several search options like company name, department, state/province, title, country and region.
The user selects a few of these options, like company name, department and state. I need to perform the search on these fields and return the results.
Can I do this with the help of aggregations in Elasticsearch? Can anyone give me a detailed example of how this can be done?
I tried a few examples, like performing an aggregation on gender. The query is as follows:
"aggs" : {"group_by_gender" : {"terms" : {"field" : "gender"}}}
When I ran this type of query, all the sources (from the documents) were returned, so I was confused about whether the aggregation was actually performed.
Thanks in advance

Aggregations are meant to compute statistics over the values of fields. If you need to search documents depending on fields, you need to use (boolean) queries.
Example:
POST my_index/_search
{
"query" : {
"bool" : {
"must" : [
{"term" : { "name" : "kimchy" }},
{"term" : { "state" : "unicorn planet" }}
]
}
}
}
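Applied to the UI in the question, the bool query can be built from whichever search options the user actually filled in. A minimal sketch in JavaScript (the language of the qryObj1 snippet further down this page), assuming the field names company_name, department, state and title, which are hypothetical and should be replaced with the names in your own mapping. It uses match rather than term so that free-text input is analyzed the same way as the indexed text; for exact keyword fields a term clause would work as well.

```javascript
// Sketch: turn the search options the user filled in into a bool query body.
// Field names (company_name, department, state, title) are hypothetical.
function buildSearchBody(selected) {
  const must = [];
  for (const [field, value] of Object.entries(selected)) {
    // Skip options the user left empty.
    if (value === undefined || value === null || value === "") continue;
    // match analyzes the input like an analyzed text field is analyzed at index time.
    must.push({ match: { [field]: value } });
  }
  // With no selected options, fall back to matching every document.
  if (must.length === 0) {
    return { query: { match_all: {} } };
  }
  // Every clause in "must" has to match for a document to be returned.
  return { query: { bool: { must } } };
}

const body = buildSearchBody({
  company_name: "Acme",
  department: "Sales",
  state: "Ontario",
  title: ""
});
console.log(JSON.stringify(body, null, 2));
```

The resulting object is what would be sent as the body of the _search request; the empty title option is simply left out of the query.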

See the Elasticsearch bool query documentation.
The bool query accepts the clauses must, should, must_not and filter; match and match_all are separate queries that you put inside those clauses.
Hope this helps.
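On the confusion in the question: a search response always contains a hits section (10 hits by default) next to the "aggregations" section, so seeing the documents' _source does not mean the aggregation was skipped. Setting "size": 0 drops the hits so that only the aggregation results come back. A minimal sketch combining a bool filter clause with the question's terms aggregation, assuming "gender" is a keyword field (with default dynamic mapping it may need to be "gender.keyword") and using a hypothetical "state" filter; the sample response below is made up for illustration.

```javascript
// Sketch only: "state" and its value are hypothetical; "gender" is assumed to be a keyword field.
function buildGenderStatsBody(state) {
  return {
    // size: 0 suppresses the hits list, so only the aggregation results come back.
    size: 0,
    query: {
      bool: {
        // filter clauses narrow down the documents without affecting the score.
        filter: [{ term: { state: state } }]
      }
    },
    // The aggregation runs over the documents that match the query above.
    aggs: {
      group_by_gender: { terms: { field: "gender" } }
    }
  };
}

// Read the buckets of a terms aggregation into a { key: doc_count } object.
function bucketCounts(response, aggName) {
  const counts = {};
  for (const bucket of response.aggregations[aggName].buckets) {
    counts[bucket.key] = bucket.doc_count;
  }
  return counts;
}

// Made-up response, trimmed to the parts that bucketCounts reads.
const sampleResponse = {
  hits: { hits: [] },
  aggregations: {
    group_by_gender: {
      buckets: [
        { key: "F", doc_count: 2 },
        { key: "M", doc_count: 1 }
      ]
    }
  }
};

console.log(JSON.stringify(buildGenderStatsBody("ID")));
console.log(bucketCounts(sampleResponse, "group_by_gender")); // { F: 2, M: 1 }
```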

Related

Filter on score after rescore in Elasticsearch

I have been on an internet manhunt for days for this and am getting ready to give up. I need to filter on _score in Elasticsearch after the rescore function has completed. So given an example query like this:
POST /_search
{
"query" : {
"match" : {
"message" : {
"operator" : "or",
"query" : "the quick brown"
}
}
},
"rescore" : {
"window_size" : 50,
"query" : {
"rescore_query" : {
"match_phrase" : {
"message" : {
"query" : "the quick brown",
"slop" : 2
}
}
},
"query_weight" : 0.7,
"rescore_query_weight" : 1.2
}
}
}
Say just for simplicity's sake that the above returns 5 documents with scores ranging from 0.0 to 1.0. I want the final returned results set to only be the documents with a score above 0.90. In other words, take those newly-rescored docs, and hand them off to a filter where it drops all documents scored below 0.90.
I have tried many, many different ways, but nothing is working. The post_filter parameter is apparently meant to come after the main query but before rescore, so that one doesn't work. min_score does not work at all with rescore; it only works with the original ES scores from the main query. Aggregations are one feature I am able to get to work after rescore, but aggregating is not what I need to do here. At least it shows me that ES has the ability to continue operating on the data after a rescore query.
Any thoughts on how to get this seemingly simple task accomplished? I have also tried using function_score and script_score but really those are just ways to further modify the scores, whereas I need to filter on the scores generated by the rescore. The requirement here is to get it done in the query. We can't do it as a post-processing step.

How to specify certain fields only in the query property

I am using a service which wraps requests to Elastic Search. This service only allows me to send the query property to Elastic Search. I want to tell Elastic Search to look only for matches in a certain field in a document.
For example, if this is my document:
{
name: 'foo',
value: 'true'
}
Then I want to tell Elastic Search to look only for documents where name equals foo.
The Elastic Search documentation says to do this by using the fields property like so:
{
"multi_match" : {
"query" : "this is a test",
"fields" : [ "subject^3", "message" ]
}
}
But I can ONLY access the query property, so I can't specify fields. Lower down on the page, under best fields it says that this is equivalent to doing something like +first_name:will +first_name:smith. But when I put this, it's looking for text that actually matches +first_name:will +first_name:smith in the value, rather than looking for a first_name field that has a value will.
Is it possible to specify what field to search in with Elastic Search using only the query property?
This sounds like a perfect match for query_string (https://www.elastic.co/guide/en/elasticsearch/reference/1.x/query-dsl-query-string-query.html). You can do something like this with it:
"query_string" : {
"query" : "subject:whatever OR message:whatever"
}
So, if you can change multi_match to query_string this would be what you are looking for.
Lucene supports fielded data. When performing a search you can either specify a field or use the default field. The field names and the default field are implementation specific.
You can search any field by typing the field name followed by a colon ":" and then the term you are looking for.
{
"query": {
"query_string": {
"query": "Name:\"foo bar cook\"",
"default_operator" : "or"
}
}
}
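Use default_operator AND to require all of the terms, or OR to match any of them (OR is the default). Note that default_operator only affects unquoted terms; a quoted value such as "foo bar cook" is searched as a phrase either way.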
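Because only the query property can be sent, the field restriction has to live inside the query_string text itself. A minimal sketch in JavaScript that builds such a query from field/value pairs; the helper names are hypothetical. Each value is quoted as a phrase, and backslashes and double quotes inside it are escaped so user input cannot break out of the quotes.

```javascript
// Sketch: restrict a query_string search to one field with Lucene's field:"value" syntax.
function fieldPhraseQuery(field, value) {
  // Escape backslashes and double quotes so the value stays inside the phrase.
  const escaped = value.replace(/[\\"]/g, (ch) => "\\" + ch);
  return { query_string: { query: `${field}:"${escaped}"` } };
}

// Combine several field clauses; default_operator decides how they are joined.
function fieldsQuery(pairs, operator) {
  const clauses = Object.entries(pairs).map(
    ([field, value]) => fieldPhraseQuery(field, value).query_string.query
  );
  return {
    query_string: {
      query: clauses.join(" "),
      default_operator: operator // "AND" or "OR"
    }
  };
}

// The document from the question: only match documents whose name is foo.
console.log(JSON.stringify({ query: fieldPhraseQuery("name", "foo") }));
console.log(JSON.stringify({ query: fieldsQuery({ name: "foo", value: "true" }, "AND") }));
```

Quoting every value keeps characters such as colons or spaces in user input from being read as query syntax.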

Queries vs Filters - Order of execution

I've read this question, and a colleague of mine made me doubt it:
In a filtered query, when is the filter applied? Before or after executing the query? When is the result cached?
If the filter is applied beforehand, wouldn't it be a good thing to duplicate the query part in the filters?
If the filter is applied afterward, then I'm having trouble understanding what is cached.
Luckily, ES provides two types of filters for you to work with:
{
"query" : {
"field" : { "title" : "Catch-22" }
},
"filter" : {
"term" : { "year" : 1961 }
}
}
{
"query": {
"filtered" : {
"query" : {
"field" : { "title" : "Catch-22" }
},
"filter" : {
"term" : { "year" : 1961 }
}
}
}
}
In the first case, filters are applied to all documents found by the query. In the second case, the documents are filtered before the query runs. This yields better performance.
Quoted from: http://www.packtpub.com/elasticsearch-server-for-fast-scalable-flexible-search-solution/book
About the cache, I'm not sure how the filter cache works.
My guess would be:
In the first case, since the filter runs against the set of results returned by the query, the cache is specific to that result set.
In the second case, the filter is applied first and the cache is stored for the indices it was checked against. This cache is more reusable because it does not depend on the content of the query, but it costs more memory and makes the first query slower (before the cache is built).
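Both request shapes in this answer come from old Elasticsearch versions: the top-level filter was renamed post_filter in 1.0, and the filtered query was deprecated in 2.0 and removed in 5.0 in favour of a bool query with a filter clause. A minimal sketch of the modern equivalents as JavaScript request bodies, assuming Elasticsearch 5.x or later; match is swapped in for the old field query, which no longer exists.

```javascript
// Case 2 ("filtered" query) -> bool query with a filter clause.
// The filter clause restricts the matching documents and does not add to the score.
const filteredEquivalent = {
  query: {
    bool: {
      must: [{ match: { title: "Catch-22" } }],
      filter: [{ term: { year: 1961 } }]
    }
  }
};

// Case 1 (top-level "filter") -> post_filter.
// post_filter is applied to the hits after the query runs and after aggregations
// are computed, so aggregations still see the unfiltered query results.
const postFilterEquivalent = {
  query: { match: { title: "Catch-22" } },
  post_filter: { term: { year: 1961 } }
};

console.log(JSON.stringify(filteredEquivalent));
console.log(JSON.stringify(postFilterEquivalent));
```

The bool/filter form is the usual choice; post_filter mainly makes sense when aggregations must ignore the filter.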
Let me explain how search query execution works.
First, there is always a complete set of documents that you search against.
If you include a filter with the search query, it just makes that set smaller; in other words, filter results are cached and reused for the same filter.
Now you have a smaller set to search with your query text.
As for your doubt: duplicating the query in the filters will only add overhead to the cache mechanism. There are many guidelines on what to put in a filter and what to leave out; it all comes down to relevancy.

Doing search in elasticsearch

I prepare a query object and do a search in Elasticsearch.
To make the query object, I give keys and their values.
The problem is that when the key and value are like "brand":"Men's Wear", Elasticsearch is unable to give me the related docs. I think the problem is the apostrophe or maybe the space. Everything is fine if I use another JSON property for the key and value (one with no space or apostrophe, like "priority":"high").
Any help please!
Update:
No, the match query is still not working! I found one more problem while creating the search query. The query I am using is:
var qryObj1 = {
"query" : {
"text" : {"name":"Tom"}
}
};
This will return all docs having name Tom. Now I want to get all docs having name Tom and profession developer. So, here is the modified one:
qryObj1 = {
"query" : {
"text" : {"name":"Tom","profession":"developer"}
},"operator" : "and"
};
but the search result is the same as before. Any help?
Sounds like you are using TermQuery, aren't you?
Term queries are not analyzed, so they don't match your analyzed content.
Try a MatchQuery instead. It should work.
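The difference between the two query types for a value like "Men's Wear" can be sketched as two request bodies in JavaScript. This assumes "brand" is an analyzed text field with a "brand.keyword" sub-field, which is what dynamic mapping creates in Elasticsearch 5.x and later; with a different mapping the keyword field name will differ.

```javascript
// Full-text search: the input is analyzed like the indexed text, so
// "Men's Wear" is split into terms; operator "and" requires all of them.
function fullTextQuery(field, value) {
  return { query: { match: { [field]: { query: value, operator: "and" } } } };
}

// Exact match: term queries are not analyzed, so they have to target the
// unanalyzed keyword sub-field (assumed here) and match the value exactly, case included.
function exactQuery(field, value) {
  return { query: { term: { [`${field}.keyword`]: value } } };
}

console.log(JSON.stringify(fullTextQuery("brand", "Men's Wear")));
console.log(JSON.stringify(exactQuery("brand", "Men's Wear")));
```

A term query on the analyzed "brand" field itself would look for the single token "Men's Wear", which the analyzer never indexed, and that explains the missing results.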
You need to use a bool query:
http://www.elasticsearch.org/guide/reference/query-dsl/bool-query.html
Here you can ask ES to combine several queries with AND (must) or OR (should):
{
"query" : {
"bool" : {
"must" : [
{"match" : {"name" : "Tom"}},
{"match" : {"profession" : "developer"}}
]
}
}
}

elastic search faceted query returns incorrect count

I need help with aggregate / faceted queries in Elasticsearch. I have used a faceted query to group the results, but I'm not getting grouped results with the correct count.
Please suggest how to get grouped results from Elasticsearch.
{
"query" : {
"query_string" : {"query" : "pared_cat_id:1"} } ,
"facets" : {
"subcategory" : {
"terms" : {
"field": "sub_cat_id",
"size" : 50,
"order" : "term",
"all_terms" : true
}
}
},
"from" : 0,
"size": 50
}
I am trying to get grouped results by sub-category id for the passed parent category id.
The query "query_string" : {"query" : "pared_cat_id:1"}
is applied to the overall data and not to the facet counts.
For this you need to use a facet query, in which you can specify the same condition as in the main query string.
So the facet counts being shown to you now are based on the results without applying "query_string" : {"query" : "pared_cat_id:1"}, i.e. on the whole data. In case you want facet counts after applying "query_string" : {"query" : "pared_cat_id:1"}, provide it in the facet query.
Elasticsearch faceting queries work very well in terms of accuracy; at least, I have not seen any problems yet.
Just a few questions:
Is this field a string or numeric? Please give an example.
Have you applied any custom mapping, or are you using the default "standard" analyzer?
Please state the kind of inaccuracy: for example, "aa" should have a count of 100 but shows 50, or is it some other kind of inaccuracy?
Elasticsearch terms facet counts can be incorrect if the number of shards is greater than 1. Also, facets are deprecated and will be removed in a future release; you are encouraged to migrate to aggregations instead.
I suggest that you take a look at this blog post, in which Alex Brasetvik gives a good description along with some examples of how to use the aggregations feature properly.
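For reference, the terms facet from the question maps fairly directly onto a terms aggregation. A minimal sketch of the request body in JavaScript, reusing the field names from the question; the comments note which facet option each parameter replaces. The shard_size value is hypothetical: asking each shard for more candidate terms reduces, but does not eliminate, the multi-shard count error mentioned above.

```javascript
// Sketch: the question's terms facet rewritten as a terms aggregation.
function subcategoryAggBody(parentCatId) {
  return {
    query: {
      query_string: { query: `pared_cat_id:${parentCatId}` }
    },
    // Aggregations are computed over the documents matched by the query above.
    aggs: {
      subcategory: {
        terms: {
          field: "sub_cat_id",
          size: 50,               // facet "size"
          shard_size: 200,        // hypothetical value; candidates fetched per shard
          order: { _key: "asc" }, // facet "order": "term" ("_term" before 6.0)
          min_doc_count: 0        // roughly like facet "all_terms": true
        }
      }
    },
    from: 0,
    size: 50
  };
}

console.log(JSON.stringify(subcategoryAggBody(1), null, 2));
```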
