Search within the results returned by Elasticsearch

Is it possible to search within the results that I get from elasticsearch?
Currently I have to run, and wait for, two separate searches on Elasticsearch. The first search is
{ "match": { "title": "foo" } }
It takes 5 seconds and returns 500 docs. Then comes a second search:
{
"bool": {
"must": [
{ "match": { "title": "foo" } },
{ "match": { "title": "bar" } }
]
}
}
It takes another 5 seconds and returns 200 docs. From Elasticsearch's perspective, it has nothing to do with the first search.
Instead of doing it this way, I'd like to offer my users a "search further within the results" option, so that they can refine the results of the first search by supplying more keywords.
So my scenario is: a user searches for the keyword "foo" and gets 500 results on the web page, then selects "search further within the results" to run a second search within those 500 results, and expects the refined results to come back quickly.
How can I achieve this? Thanks!

What you could do is use the ids query. Collect all document IDs from the first request, then send a new bool query that includes an ids query in a must clause next to the original query. You can collect the IDs efficiently in the first request using the Scroll API. Since the second result will be sorted anyway, there is no point in sorting the first request, which lets you speed it up.
See:
Scroll API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
IDS Query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html
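
To make the shape of that second request concrete, here is a minimal Python sketch that only builds the request body as a plain dict. The function name build_refine_query and the sample IDs are made up for illustration, and how you collect the IDs (scroll or otherwise) is left out.

```python
import json


def build_refine_query(first_ids, clauses):
    """Build a bool query whose matches are restricted to the given document IDs."""
    return {
        "query": {
            "bool": {
                # Every clause must match, and the ids clause limits the
                # candidates to the documents found by the first search.
                "must": list(clauses) + [{"ids": {"values": list(first_ids)}}]
            }
        }
    }


# Hypothetical IDs collected from the first search ("foo").
first_ids = ["1", "7", "42"]
clauses = [
    {"match": {"title": "foo"}},
    {"match": {"title": "bar"}},
]
body = build_refine_query(first_ids, clauses)
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the body of a normal _search request. Keep in mind that a very long ID list makes the request itself large, so this probably suits result sets in the hundreds better than in the millions.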

post_filter is a way to search within the results of another search.
In your case:
GET _search
{
"query": {
"match": {
"title": "foo"
}
},
"post_filter": {
"match": {
"title": "bar"
}
}
}
The post_filter is applied to the hits that the query returns.

Related

Find one result based on a term query or a list of results based on a match query

I have an index of documents, each containing an id and name field. Each document name happens to be unique.
I want to perform a query on the name field that returns one exact result if possible, or falls back to return a list of similar results. For example, if the search term is Acme Incorporated and there is an exact result, return that only. Otherwise return similar matches; e.g: ACME Inc., acme, Ace etc.
I assumed that I need to somehow combine a keyword-based term query for an exact match, and a text-based match query for the similar matches. I am still getting to grips with compound queries so my first attempt was pretty naive:
{
"query": {
"bool": {
"should": [
{
"term": {
"name.exact": "Acme Incorporated"
}
},
{
"match": {
"name": "Acme Incorporated"
}
}
]
}
}
}
This returns a list of similar matches and the exact match if present, because only one of the should clauses has to match. That is not what I want.
In order to facilitate the keyword-based term query above, I added name.exact to my document mapping:
{
"mappings": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "text",
"fields": {
"exact": {
"type": "keyword"
}
}
}
}
}
}
I suppose another approach is to use the Multi Search API to perform the above queries separately. That lets me look at the responses and fall back to the match query results if the term query result set is empty. This will work for my use case, but I suspect it is not an optimal approach.
I assume this is a common use-case but I am not sure what the solution is.
Edit
My current thinking is to go with a Multi Search request as described above: the first query is the same keyword-based term query that attempts to find an exact result, and the second is the following compound bool query, which excludes an exact result.
{
"query": {
"bool": {
"must": {
"match": {
"name": "Acme Incorporated"
}
},
"must_not": {
"term": {
"name.keyword": "Acme Incorporated"
}
}
}
}
}
In the end, the MultiSearch API suited my use case:
The multi search API executes several searches from a single API request. The format of the request is similar to the bulk API format and makes use of the newline delimited JSON (NDJSON) format.
I used this to perform two queries in one request:
- Find any exact results with a keyword-based term query on the document name field.
- Find any similar results with a bool query, comprising a match query on the document name field and a must_not of the first query to filter out any exact results.
A Multi Search body consists of one or more header/body pairs, delimited by newlines; the header may be empty and the body is a single query. For example:
GET /myindex/_msearch
{}
{"query": {"constant_score": {"filter": {"term": {"name.keyword": "Acme Incorporated"}}}}}
{}
{"query": {"bool": {"must": {"match": {"name": "Acme Incorporated"}}, "must_not": {"term": {"name.keyword": "Acme Incorporated"}}}}}
The request body is in NDJSON format, in which each line is a valid JSON value. This requires each query to be collapsed onto a single line, which is not very readable, but that is not an issue if you're using a library to construct the queries.
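
For anyone building the body by hand rather than with a client library, a rough Python sketch could look like this. The function names and the fake response in the usage are made up for illustration; the field names are the ones used in the request above, and the only response structure relied on is that _msearch returns a "responses" array in the same order as the searches.

```python
import json


def build_msearch_body(name, exact_field="name.keyword", text_field="name"):
    """Serialize the exact and the similar search as NDJSON header/body pairs."""
    exact = {"query": {"constant_score": {"filter": {"term": {exact_field: name}}}}}
    similar = {
        "query": {
            "bool": {
                "must": {"match": {text_field: name}},
                "must_not": {"term": {exact_field: name}},
            }
        }
    }
    lines = [{}, exact, {}, similar]
    # json.dumps emits no line breaks by default, so each value stays on one
    # line; the body has to end with a newline as well.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


def pick_hits(msearch_response):
    """Return the exact hits if there are any, otherwise the similar ones."""
    exact, similar = msearch_response["responses"]
    return exact["hits"]["hits"] or similar["hits"]["hits"]


body = build_msearch_body("Acme Incorporated")
print(body)

# Made-up response: no exact hit, one similar hit.
fake_response = {
    "responses": [
        {"hits": {"hits": []}},
        {"hits": {"hits": [{"_id": "2", "_source": {"name": "ACME Inc."}}]}},
    ]
}
print(pick_hits(fake_response))
```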

How can I achieve this type of query in ElasticSearch?

I have added a document like this to my index
POST /analyzer3/books
{
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
And then I do queries like this
GET /analyzer3/_analyze
{
"analyzer": "english",
"text": "\"The * day I went with my * to the\""
}
And it successfully returns the previously added document.
My idea is to have quotes so that the query becomes exact, but also wildcards that can replace any word. Google has this exact functionality, where you can search queries like this, for instance "I'm * the university" and it will return page results that contain texts like I'm studying in the university right now, etc.
However I want to know if there's another way to do this.
My main concern is that this doesn't seem to work with other languages like Japanese and Chinese. I've tried with many analyzers and tokenizers to no avail.
Any answer is appreciated.
Exact matches on tokenized fields are not that straightforward. It is better to store your field as keyword if you have such requirements.
Additionally, the keyword data type supports the wildcard query, which can help with your wildcard searches.
So just create a keyword-type subfield, then use the wildcard query on it.
Your search query will look something like below:
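GET /_search
{
"query": {
"wildcard" : {
"title.keyword" : "The * day I went with my * to the *"
}
}
}
In the above query, it is assumed that the title field has a sub-field named keyword of data type keyword. A wildcard pattern has to match the entire keyword value, so the pattern ends with a * to cover the rest of the title.
More on the wildcard query can be found in the Elasticsearch documentation.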
If you still want to do exact searches on text data type, then read this
Elasticsearch doesn't have Google-like search out of the box, but you can build something similar.
Let's assume that when someone quotes a search text, what they want is a match_phrase query. Basically, remove the quotes (\") and search for the remaining string as a phrase.
PUT test/_doc/1
{
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
GET test/_search
{
"query": {
"match_phrase": {
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
}
}
For the * it gets a little more interesting. You could split the input at each * into several phrase searches and combine them. Example:
GET test/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"title": "The"
}
},
{
"match_phrase": {
"title": "day I went with my"
}
},
{
"match_phrase": {
"title": "to the"
}
}
]
}
}
}
Or you could use slop in the phrase search. All the terms in your search query have to be there (unless they are removed by the tokenizer or as stop words), but the matched phrase can contain additional words. Here each * can be replaced by one other word, so the total slop is 2. If you want to allow more than one word in place of each *, you will need to pick a higher slop:
GET test/_search
{
"query": {
"match_phrase": {
"title": {
"query": "The * day I went with my * to the",
"slop": 2
}
}
}
}
Another alternative might be shingles, but this is a more advanced concept and I would start off with the basics for now.
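
If you want to derive the slop from the user's input instead of hard-coding it, a small Python sketch could look like the following. The function name is made up, it only builds the request body, and it assumes each standalone * stands for exactly one word and that the analyzer drops the bare * tokens, as in the example above.

```python
def phrase_with_wildcards(field, user_input):
    """Turn a quoted search such as '"The * day"' into a match_phrase query with slop."""
    # Drop surrounding whitespace and the quotes that mark the phrase.
    phrase = user_input.strip().strip('"')
    # Each standalone * allows one extra word, so it adds 1 to the slop.
    slop = sum(1 for token in phrase.split() if token == "*")
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


query = phrase_with_wildcards("title", '"The * day I went with my * to the"')
print(query)
```

Note that slop is a total budget for the whole phrase rather than one gap per *, so this is looser than a true per-position wildcard.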

ElasticSearch - Delete documents by specific field

This seemingly simple task is not well-documented in the ElasticSearch documentation:
We have an ElasticSearch instance with an index that has a field in it called sourceId. What API calls would I make to first GET all documents with 100 in the sourceId field (to verify the results before deletion) and then DELETE those same documents?
You probably need to make two API calls here: the first to view the count of documents, the second to perform the deletion.
The query is the same in both; only the endpoints differ. I'm also assuming that sourceId is of type keyword.
Query to Verify
POST <your_index_name>/_search
{
"size": 0,
"query": {
"term": {
"sourceId": "100"
}
}
}
Execute the above term query and take note of hits.total in the response.
Remove "size": 0 from the above query if you want to see the full documents in the response.
Once you have the details, you can go ahead and perform the deletion using the same query, as shown below. Note the different endpoint.
Query to Delete
POST <your_index_name>/_delete_by_query
{
"query": {
"term": {
"sourceId": "100"
}
}
}
Once you execute the delete by query, check the deleted field in the response. It should show the same number.
I've used a term query, but you can also use a match query or any complex bool query. Just make sure that the query is correct.
Hope it helps!
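
If you script this check, a rough Python sketch of the verify-then-delete bookkeeping might look like this. The helper names and the sample responses are made up; only the endpoints, the hits.total field and the deleted field come from the steps above, and actually sending the requests is left to whatever HTTP client you use.

```python
def build_requests(index, field, value):
    """Return (path, body) pairs for the count check and for the deletion."""
    query = {"query": {"term": {field: value}}}
    count_request = (f"/{index}/_search", {"size": 0, **query})
    delete_request = (f"/{index}/_delete_by_query", query)
    return count_request, delete_request


def total_hits(search_response):
    """Read hits.total, which is an object in Elasticsearch 7+ and a number before."""
    total = search_response["hits"]["total"]
    # In 7+ the value is capped at 10,000 by default unless track_total_hits
    # is set, so treat large counts with care.
    return total["value"] if isinstance(total, dict) else total


def deletion_matches(search_response, delete_response):
    """Check that as many documents were deleted as the search had counted."""
    return total_hits(search_response) == delete_response["deleted"]


count_request, delete_request = build_requests("my_index", "sourceId", "100")
print(count_request)
print(delete_request)
```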
To delete all the documents of an index without deleting the mapping and settings:
POST /my_index/_delete_by_query?conflicts=proceed&pretty
{
"query": {
"match_all": {}
}
}
See: https://opster.com/guides/elasticsearch/search-apis/elasticsearch-delete-by-query/

How to use multifield search in elasticsearch combining should and must clause

This may be a repeated question, but I haven't found a good solution.
I'm trying to search Elasticsearch in order to get documents that contain:
- "event":"myevent1"
- "event":"myevent2"
- "event":"myevent3"
A single document does not have to contain all of them, but the result should contain only documents with one of those event types.
This part is simple, because the should clause returns exactly what I want.
But then I want every document to also satisfy another condition: the field result.example.example = 200. This must hold for every single document, and in addition the document must have one of the events described above.
So, for example, one document has "event":"myevent1" and result.example.example = 200, another has "event":"myevent2" and result.example.example = 200, and so on.
I've tried this configuration:
{
"query": {
"bool": {
"must":{"match":{"operation.result.http_status":200}},
"should": [
{
"match": {
"event": "bank.account.patch"
}
},
{
"match": {
"event": "bank.account.add"
}
},
{
"match": {
"event": "bank.user.patch"
}
}
]
}
}
}
but it's not working, because I also get documents that don't match any of the should clauses.
I hope I explained it well.
Thanks in advance!
As is, your query tells ES to look for documents that must have "operation.result.http_status":200 and to boost those that have a matching event type.
You're looking to combine two must queries:
- one that matches one of your event types,
- one for your other condition.
The event clause accepts multiple values and those values are exact matches: you're looking for a terms query.
Try
{
"query": {
"bool": {
"must": [
{"match":{"operation.result.http_status":200}},
{
"terms" : {
"event" : [
"bank.account.patch",
"bank.account.add",
"bank.user.patch"
]
}
}
]
}
}
}
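
If the list of events changes at runtime, you can generate the body. Below is a small Python sketch with made-up function names. The first function reproduces the query above; the second is an alternative that is not part of the answer: it keeps the original should clauses and adds minimum_should_match, which makes at least one of them required even though a must clause is present.

```python
def events_with_status(events, status):
    """Require the status and one of the events, using a terms query."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"operation.result.http_status": status}},
                    {"terms": {"event": list(events)}},
                ]
            }
        }
    }


def events_with_status_should(events, status):
    """Same intent, keeping the should clauses but making one of them mandatory."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"operation.result.http_status": status}},
                "should": [{"match": {"event": event}} for event in events],
                # Without this, should clauses only affect scoring here.
                "minimum_should_match": 1,
            }
        }
    }


events = ["bank.account.patch", "bank.account.add", "bank.user.patch"]
terms_query = events_with_status(events, 200)
should_query = events_with_status_should(events, 200)
print(terms_query)
print(should_query)
```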

Performance of elastic queries

This query takes 200+ ms every time it is executed:
{
"filter": {
"term": {
"id": "123456",
"_cache": true
}
}
}
but this one only takes 2-3 ms every time it is executed after the first query:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"id": "123456"
}
}
}
}
}
Note the same ID values in both queries. It looks like the second query uses cached results from the first query. But why can't the first query use the cached results itself? Removing "_cache": true from the first query doesn't change anything.
And when I execute the second query with some other ID, it takes ~ 40 ms to execute it for the first time and 2-3 ms every time after that. So the second query not only works faster but it also caches the results and uses the cache for subsequent calls.
Is there an explanation for all this?
The top-level filter element in the first request has a very specific function in Elasticsearch: it filters the search results without affecting facets. To avoid interfering with facets, this filter is applied while results are collected rather than during searching, which is why it is slow. Using a top-level filter without facets makes very little sense, because filtered and constant_score queries typically perform much better. If the verbosity of the filtered query with match_all bothers you, you can rewrite your second request as an equivalent constant_score query:
{
"query": {
"constant_score": {
"filter": {
"term": {
"id": "123456"
}
}
}
}
}
