Can you refer to and filter on a script field in a query expression, after defining it? - elasticsearch

I'm new to ElasticSearch and was wondering: once you define a script field with mvel syntax, can you subsequently filter on it, or refer to it in the query body, as if it were any other field?
I can't find any examples of this, and I don't see any mention of whether it is possible on the docs pages:
http://www.elasticsearch.org/guide/reference/modules/scripting/
http://www.elasticsearch.org/guide/reference/api/search/script-fields/
The book ElasticSearch Server doesn't mention whether this is possible either.

As of 2018 and Elasticsearch 6.2, it is still not possible to filter by fields defined with script_fields. However, you can define a custom script filter for the same purpose. For example, let's assume that you've defined the following script field:
{
"script_fields" : {
"some_date_fld_year" : {
"script" : {
"source": "doc['some_date_fld'].empty ? null : doc['some_date_fld'].date.year"
}
}
}
}
You can filter by the same value with:
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"source": "!doc['some_date_fld'].empty && doc['some_date_fld'].date.year >= 2017",
"lang": "painless"
}
}
}
}
}
}
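Since the script field and the script filter repeat nearly the same expression, it can help to generate both from one place. The following is a minimal sketch in Python that only builds the request body as a dictionary; it does not talk to a cluster. The helper names (`year_expression`, `year_filter_expression`, `build_search_body`) are hypothetical, and the body assumes the 6.x `source` / `lang` script syntax used above.

```python
import json


def year_expression(field):
    """Painless source for the script field (single quotes avoid JSON escaping)."""
    return "doc['{0}'].empty ? null : doc['{0}'].date.year".format(field)


def year_filter_expression(field, min_year):
    """Painless source for the script filter; it has to evaluate to a boolean."""
    return "!doc['{0}'].empty && doc['{0}'].date.year >= {1}".format(
        field, int(min_year)
    )


def build_search_body(field, min_year):
    """Builds one request body holding both the script filter and the script field."""
    return {
        "query": {
            "bool": {
                "must": {
                    "script": {
                        "script": {
                            "source": year_filter_expression(field, min_year),
                            "lang": "painless",
                        }
                    }
                }
            }
        },
        "script_fields": {
            field + "_year": {
                "script": {"source": year_expression(field), "lang": "painless"}
            }
        },
    }


body = build_search_body("some_date_fld", 2017)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request. Going through `json.dumps` also takes care of quoting, which is easy to get wrong when the script is written by hand inside a JSON string.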

It's not possible for one simple reason: the script_fields are calculated during the final stage of the search (the fetch phase), and only for the records that you retrieve (the top 10 by default). The script filter is applied to all records that were not filtered out by preceding filters, and this happens during the query phase, which precedes the fetch phase. In other words, when filters are applied, the script_fields don't exist yet.
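The ordering described above can be illustrated with a small simulation. This is not Elasticsearch code: the function names (`query_phase`, `fetch_phase`), the sample documents, and the simplified "top N" selection are all assumptions made for illustration, and they only mimic the order in which the two steps run.

```python
from datetime import date

# Hypothetical in-memory "index"; the field name follows the example above.
DOCS = [
    {"id": 1, "some_date_fld": date(2016, 5, 1)},
    {"id": 2, "some_date_fld": date(2017, 3, 9)},
    {"id": 3, "some_date_fld": None},  # document without a value
    {"id": 4, "some_date_fld": date(2018, 1, 20)},
]


def year_script(doc):
    """Mimics the script field: null when the field is empty, else the year."""
    value = doc["some_date_fld"]
    return None if value is None else value.year


def year_at_least_2017(doc):
    """Mimics the script filter; it recomputes the year itself."""
    year = year_script(doc)
    return year is not None and year >= 2017


def query_phase(docs, script_filter, size=10):
    """Runs the filter over every candidate document, keeps the first `size` ids."""
    matching = [d["id"] for d in docs if script_filter(d)]
    return matching[:size]


def fetch_phase(docs, ids, script_fields):
    """Loads only the selected documents and computes script fields for them."""
    by_id = {d["id"]: d for d in docs}
    hits = []
    for doc_id in ids:
        fields = {name: fn(by_id[doc_id]) for name, fn in script_fields.items()}
        hits.append({"_id": doc_id, "fields": fields})
    return hits


top_ids = query_phase(DOCS, year_at_least_2017)
hits = fetch_phase(DOCS, top_ids, {"some_date_fld_year": year_script})
print(hits)
```

In this sketch `some_date_fld_year` only comes into existence inside `fetch_phase`, after `query_phase` has already decided which documents match. That is why the filter has to repeat the computation instead of referring to the script field by name.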

Related

How to run Elasticsearch completion suggester query on limited set of documents

I'm using a completion suggester in Elasticsearch on a single field. The type contains documents of several users. Is there a way to limit the returned suggestions to documents that match a specific query?
I'm currently using this query:
{
"name" : {
"text" : "Peter",
"completion" : {
"field" : "name_suggest"
}
}
}
Is there a way to combine this query with a different one, e.g.
{
"query":{
"term" : {
"user_id" : "590c5bd2819c3e225c990b48"
}
}
}
Have a look at the context suggester, which is just a specialized completion suggester with filtering capabilities. Keep in mind, however, that this is still not a regular query filter.
You can specify both the query and the suggester in the same request, like this (the query restricts the returned hits; the suggester itself is still evaluated independently of it):
{
"query":{
"term" : {
"user_id" : "590c5bd2819c3e225c990b48"
}
},
"suggest": {
"name" : {
"text" : "Peter",
"completion" : {
"field" : "name_suggest"
}
}
}
}
I have a similar use case, and I've posted my question on the Elasticsearch forum, see here.
From what I've read so far, I don't think you can limit documents with the completion suggester. It essentially builds a finite state transducer (prefix tree) at index time. This makes it fast, but you lose the flexibility of filtering on additional fields. I don't think the context suggester would work in your case (let me know if I am wrong), because the cardinality of user_id is very high.
I think edge-ngrams partial matching is more flexible and might actually work in your use case.
Let me know what you end up implementing.
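The edge-ngram idea mentioned above can be sketched in plain Python to show why it combines well with a regular filter. This is a simplified simulation, not Elasticsearch code: the `PrefixIndex` class, the `edge_ngrams` helper, the sample names, and the `min_gram` / `max_gram` defaults are hypothetical, and only single-word prefixes are handled.

```python
from collections import defaultdict


def edge_ngrams(text, min_gram=1, max_gram=10):
    """Returns the leading substrings of every whitespace-separated token."""
    grams = set()
    for token in text.lower().split():
        for n in range(min_gram, min(len(token), max_gram) + 1):
            grams.add(token[:n])
    return grams


class PrefixIndex:
    """Toy inverted index: edge n-gram -> ids of the documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._docs = {}

    def add(self, doc_id, name, user_id):
        self._docs[doc_id] = {"name": name, "user_id": user_id}
        for gram in edge_ngrams(name):
            self._postings[gram].add(doc_id)

    def suggest(self, prefix, user_id):
        """Prefix lookup combined with an ordinary filter on user_id."""
        candidates = self._postings.get(prefix.lower(), set())
        return sorted(
            self._docs[i]["name"]
            for i in candidates
            if self._docs[i]["user_id"] == user_id
        )


index = PrefixIndex()
index.add(1, "Peter Pan", "u1")
index.add(2, "Petra", "u1")
index.add(3, "Peter Parker", "u2")
print(index.suggest("Pet", "u1"))
```

Because the prefixes are stored as ordinary terms, the lookup is just a term match, and any other field of the document (here `user_id`) can be used to narrow the result. The price is a larger index, since every token contributes several terms.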

ElasticSearch aggregation: exclude one filter per aggregation

I want to filter out documents whose field 'A' is equal to 'a', and I want to facet the field 'A' at the same time, excluding of course the previous filter.
I know that you can put the filter 'outside' the query in order to get the facets without that filter applied, like:
ElasticSearch
{
"query" : { "match_all" : { } },
"filter" : { "term" : { "A" : "a" } },
"facets" : {
"A" : { "terms" : { "field" : "A" } } //this should exclude the filter A:a
}
}
SOLR
&q=*:*
&fq={!tag=Aa}A:a
&facet=true&facet.field={!ex=Aa}A
This is very nice, but what happens if I have multiple filters and facets, where each facet should exclude its own filter?
Example:
filter=A:a
filter=B:b
filter=C:c
facet={exclude filter A:a}A
facet={exclude filter B:b}B
facet={exclude filter C:c}C
That is, for facet A I want to keep all filters except A:a, for facet B all except B:b, and so on.
The most obvious way would be to do n queries (one per each of the n facets), but I'd like to stay away from that.
The global scope provides access to every document; you can then add the same filters you used for the main query.
I gave an example with global scope in this related topic.
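For the "each facet excludes its own filter" case, one way to write this without issuing n queries is sketched below. Note that it swaps in a different mechanism from the facets and global scope discussed above: it assumes a later Elasticsearch version in which facets have been replaced by aggregations, and uses one `filter` aggregation per field together with a `post_filter`. The function name `build_request` and the inner aggregation name `values` are hypothetical.

```python
import json


def build_request(filters):
    """filters: mapping of field -> selected value, e.g. {"A": "a", "B": "b"}."""

    def combined(exclude=None):
        # All selected filters except the one on the field being counted.
        clauses = [
            {"term": {field: value}}
            for field, value in filters.items()
            if field != exclude
        ]
        return {"bool": {"filter": clauses}} if clauses else {"match_all": {}}

    aggs = {}
    for field in filters:
        aggs[field] = {
            "filter": combined(exclude=field),
            "aggs": {"values": {"terms": {"field": field}}},
        }

    return {
        "query": {"match_all": {}},
        # post_filter narrows the hits only; aggregations see the unfiltered set.
        "post_filter": combined(),
        "aggs": aggs,
    }


request = build_request({"A": "a", "B": "b", "C": "c"})
print(json.dumps(request, indent=2))
```

Every aggregation re-applies all filters except its own, while the hits get all of them through `post_filter`, so a single request returns both the filtered results and the per-field counts.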
Could you give any feedback about performance issues with post_filter?

Queries vs Filters - Order of execution

I've read this question, and a colleague of mine made me doubt:
In a filtered query, when is the filter applied? Before or after executing the query? When is the result cached?
If the filter is applied beforehand, wouldn't it be a good thing to duplicate the query part in the filters?
If the filter is applied afterward, then I'm having trouble understanding what is cached.
Luckily, ES provides two types of filters for you to work with:
{
"query" : {
"field" : { "title" : "Catch-22" }
},
"filter" : {
"term" : { "year" : 1961 }
}
}
{
"query": {
"filtered" : {
"query" : {
"field" : { "title" : "Catch-22" }
},
"filter" : {
"term" : { "year" : 1961 }
}
}
}
}
In the first case, filters are applied to all documents found by the query. In the second case, the documents are filtered before the query runs. This yields better performance.
Quoted from: http://www.packtpub.com/elasticsearch-server-for-fast-scalable-flexible-search-solution/book
As for caching, I'm not sure how the filter cache mechanism works.
My guess would be:
First case: since the filter runs against the set of results returned by the query, the cache is specific to that result set.
Second case: the filter is applied first, and the cache is stored for the indices you checked against. This cache is more reusable because it does not depend on the content of the query, but it comes at a larger memory cost and a slower first query (before the cache is generated).
Let me explain how a search query is executed.
First, there is always a complete set of documents that you want to search.
If a filter is included with the search query, it simply makes that set smaller; in other words, filters are cached results of the same query.
Your query text then has a smaller set to search.
Now for your doubt: duplicating the query in the filters will only increase the overhead of the cache mechanism. There are many guidelines on what to include in a filter and what to leave out. It all comes down to relevancy.
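The two request shapes discussed above can also be written in the syntax of more recent Elasticsearch versions. The sketch below assumes a version in which the `filtered` query has been replaced by a `bool` query with a `filter` clause, the top-level `filter` is called `post_filter`, and `match` is used in place of the `field` query. The function names are hypothetical, and the code only builds the request bodies.

```python
import json


def filter_inside_query(title, year):
    """The filter is part of the query, so non-matching documents are never scored."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"title": title}},
                "filter": {"term": {"year": year}},
            }
        }
    }


def filter_after_query(title, year):
    """The query runs first; the filter is then applied to the documents it found."""
    return {
        "query": {"match": {"title": title}},
        "post_filter": {"term": {"year": year}},
    }


first = filter_after_query("Catch-22", 1961)
second = filter_inside_query("Catch-22", 1961)
print(json.dumps(first, indent=2))
print(json.dumps(second, indent=2))
```

Both bodies return the same hits for this example. The difference lies in where the `term` clause sits, which determines whether it can reduce the number of documents the scoring part has to visit.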

Register and call query in ElasticSearch

Is it possible to register queries (like the percolate process) and call them by name to execute them?
I am building an application that will let the user save a search query associated with a label. I would like to save the query generated by the filter in ES.
If I save the query in an index, I have to call ES first to retrieve the query, extract the field containing the query, and then call ES again to execute it. Can I do it in one call?
The other solution is to register queries (labels) with _percolator, using an identifier of the user:
/_percolate/transaction/user1_label1
{
"userId": "user1",
"query":{
"term":{"field1":"foo" }
}
}
and, when there is a new document, use the percolator in non-indexing mode (filtered per userId) to retrieve which queries match, then update the document by adding a field "label":["user1_label1", "user1_label2"], and finally index the document. So the labelling is done at indexing time.
What do you think?
Thanks in advance.
Try Filter Aliases.
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
"actions" : [
{
"add" : {
"index" : "the_real_index",
"alias" : "user1",
"filter" : { "term" : { "field1" : "foo" } }
}
}
]
}'
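To connect this to the saved-label use case, here is a minimal sketch that derives one alias per user and label and builds the corresponding `_aliases` body. The naming scheme (`user1_label1`), the helper `filter_alias_action`, and the idea of one alias per saved search are assumptions for illustration; the code only builds the body and the search path, it does not call Elasticsearch.

```python
import json


def filter_alias_action(index, user_id, label, query):
    """Returns the alias name and the body to POST to /_aliases."""
    alias = "{0}_{1}".format(user_id, label)
    body = {
        "actions": [
            {"add": {"index": index, "alias": alias, "filter": query}}
        ]
    }
    return alias, body


alias, body = filter_alias_action(
    "the_real_index", "user1", "label1", {"term": {"field1": "foo"}}
)
# Searching the alias applies the stored filter, so one call is enough.
search_path = "/{0}/_search".format(alias)
print(search_path)
print(json.dumps(body, indent=2))
```

Once the alias exists, the saved search is executed by name with a single request against `search_path`, which is the "one call" the question asks for.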

How to perform a date range elasticsearch query given multiple dates per document?

I'm using ElasticSearch to index forum threads and reply posts. Each post has a date field associated with it. I'd like to perform a query that includes a date range which will return threads that contain posts matching a date range. I've looked at using a nested mapping but the docs say the feature is experimental and may lead to inaccurate results.
What's the best way to accomplish this? I'm using the Java API.
You haven't said much about your data structure, but I'm inferring from your question that you have post objects which contain a date field, and presumably a thread_id field, i.e. some way of identifying which thread a post belongs to?
Do you also have a thread object, or is your thread_id sufficient?
Either way, your stated goal is to return a list of threads which have posts in a particular date range. This means that you need to group your threads (rather than returning the same thread_id multiple times for each post in the date range).
This grouping can be done by using facets.
So the query in JSON would look like this:
curl -XGET 'http://127.0.0.1:9200/posts/post/_search?pretty=1&search_type=count' -d '
{
"facets" : {
"thread_id" : {
"terms" : {
"size" : 20,
"field" : "thread_id"
}
}
},
"query" : {
"filtered" : {
"query" : {
"text" : {
"content" : "any keywords to match"
}
},
"filter" : {
"numeric_range" : {
"date" : {
"lt" : "2011-02-01",
"gte" : "2011-01-01"
}
}
}
}
}
}
'
Notes:
I'm using search_type=count because I don't actually want the posts returned, just the thread_ids.
I've specified that I want the 20 most frequently encountered thread_ids (size: 20). The default would be 10.
I'm using a numeric_range for the date field because dates typically have many distinct values, and the numeric_range filter uses a different approach from the range filter, making it perform better in this situation.
If your thread_ids look like how-to-perform-a-date-range-elasticsearch-query, then you can use these values directly. But if you have a separate thread object, then you can use the multi-get API to retrieve the threads.
Your thread_id field should be mapped as { "index": "not_analyzed" } so that the whole value is treated as a single term, rather than being analyzed into separate terms.
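What the filter plus terms facet computes can be shown with a small in-memory simulation. This is not the Java API or Elasticsearch itself: the sample posts and the function name `threads_with_posts_in_range` are hypothetical, and keyword matching is left out so that only the date range and the grouping remain.

```python
from collections import Counter
from datetime import date

# Hypothetical posts; each one carries the id of the thread it belongs to.
POSTS = [
    {"thread_id": "thread-a", "date": date(2011, 1, 3)},
    {"thread_id": "thread-a", "date": date(2011, 1, 17)},
    {"thread_id": "thread-b", "date": date(2011, 1, 20)},
    {"thread_id": "thread-c", "date": date(2011, 2, 2)},
]


def threads_with_posts_in_range(posts, gte, lt, size=20):
    """Counts posts per thread_id within [gte, lt) and returns the top `size`."""
    counts = Counter(
        post["thread_id"] for post in posts if gte <= post["date"] < lt
    )
    return counts.most_common(size)


result = threads_with_posts_in_range(POSTS, date(2011, 1, 1), date(2011, 2, 1))
print(result)
```

Each entry in the result corresponds to one facet term: a thread_id together with the number of its posts that fall inside the date range. A thread therefore appears once, however many of its posts match.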
