Scope Elasticsearch Results to Specific Ids

I have a question about the Elasticsearch DSL.
I would like to do a full text search, but scope the searchable records to a specific array of database ids.
In the SQL world, it would be the functional equivalent of WHERE id IN (1, 2, 3, 4).
I've been researching, but I find the Elasticsearch query DSL documentation a little cryptic and devoid of useful examples. Can anyone point me in the right direction?

Here is an example query which might work for you. This assumes that the _all field is enabled on your index (which is the default). It will do a full text search across all the fields in your index. Additionally, with the added ids filter, the query will exclude any document whose id is not in the given array.
{
  "bool": {
    "must": {
      "match": {
        "_all": "your search text"
      }
    },
    "filter": {
      "ids": {
        "values": ["1", "2", "3", "4"]
      }
    }
  }
}
Hope this helps!

As discussed by Ali Beyad, the ids field in the query can do that for you. Just to complement his answer, here is a working example, in case anyone needs it in the future.
GET index_name/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "field": "your query"
          }
        },
        {
          "ids": {
            "values": ["0aRM6ngBFlDmSSLpu_J4", "0qRM6ngBFlDmSSLpu_J4"]
          }
        }
      ]
    }
  }
}

You can create a bool query that contains an Ids query in a MUST clause:
https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-ids-query.html
By using a MUST clause in a bool query, your search will be further limited by the Ids you specify. I'm assuming here by Ids you mean the _id value for your documents.
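As a rough sketch of what that could look like (the index name and the match field below are placeholders for illustration, not taken from the question):
# "your_index" and "title" are placeholder names
GET your_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "your search text" } },
        { "ids": { "values": ["1", "2", "3", "4"] } }
      ]
    }
  }
}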

According to the Elasticsearch docs, the ids query returns documents based on their IDs:
GET /_search
{
  "query": {
    "ids": {
      "values": ["1", "4", "100"]
    }
  }
}

With ElasticaBundle on Symfony 5.2 (PHP, using Elastica's Ids query):
$query = new Query();
$idsQuery = new Query\Ids();
$idsQuery->setIds($id); // $id is the array of document ids to match
$query->setQuery($idsQuery);
$this->finder->find($query, $limit);

You have two options.
The ids query:
GET index/_search
{
  "query": {
    "ids": {
      "values": ["1", "2", "3"]
    }
  }
}
or
The terms query:
GET index/_search
{
  "query": {
    "terms": {
      "yourNonPrimaryIdField": ["1", "2", "3"]
    }
  }
}
The ids query targets the document's internal _id field (the primary ID). But it often happens that documents contain secondary (and more) IDs, which you'd target through the terms query.
Note that if your secondary IDs contain uppercase characters and you don't set their field's mapping to keyword, they'll be normalized (and lowercased) and the terms query will appear broken because it only works with exact matches. More on this here: Only getting results when elasticsearch is case sensitive
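For reference, a minimal mapping sketch for that case, assuming Elasticsearch 7+ and a placeholder index name (the field name is the one from the example above):
# placeholder index name; the keyword type keeps the IDs un-analyzed, so case is preserved
PUT index
{
  "mappings": {
    "properties": {
      "yourNonPrimaryIdField": { "type": "keyword" }
    }
  }
}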

Related

Elasticsearch join-like query on multiple types and different fields

I have an Elasticsearch index called my_index which contains documents of two types, Type1 and Type2.
The two document types contain different data about the same type of entity.
The two document types both contain the ID of the related entity.
I've been trying to construct a join-like query which would return entities which match conditions on both document types, but I can't get it to work, and I also can't find any citation in the Elasticsearch multi-type or query documentation that says it's not possible.
The problem I'm trying to solve is avoiding having to manually join two result sets by getting all Type1 hits and all Type2 hits and doing the join outside of Elasticsearch, since the index has millions of documents.
The equivalent in SQL would be
select * from
Type1 inner join Type2
on Type2.EntityId = Type1.EntityId
where
Type1.Field = Condition AND
Type2.Field = Condition [...]
The URL I'm using to query against is http://elastic/my_index/Type1,Type2/_search to include both document types.
If I perform a blank query against this URL, I get hits of both Type1 and Type2.
If I add a criterion for Type1, it works as expected:
{ "query": {
"bool": {
"must": [{
"term": {
"FieldOnType1": "lorem" } } ] } } }
Somehow Elasticsearch can infer that FieldOnType1 is indeed a field on Type1.
When I add a criterion for Type2, I don't get any hits:
{ "query": {
"bool": {
"must": [{
"term": {
"FieldOnType1": "lorem" } }, {
"term": {
"FieldOnType2": "ipsum" } } ] } } }
In reality, there are sometimes more than 2 term queries, or range queries and term queries.
I'm guessing the problem with the above query is that no single document can match both criteria at once.
I've tried using should instead of must, qualifying the field names with type names, and many variations of the query (including using filters instead of queries), but everything gives me 0 hits.
Similar questions here suggest using the Elasticsearch multi-search API instead of the search API, but that won't solve my "manual join" problem.
Is there a way to make an elaborate "OR" query that would allow queries on both types? Or something else?
Try a multi_match query (I use ES 6, so I have one index per type):
GET index1,index2/_search
{
  "query": {
    "multi_match": {
      "query": "1",
      "fields": ["FieldOnType1", "FieldOnType2"]
    }
  }
}
If you need to use different fields, a should clause should work:
GET test,test1/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": { "firstName": "john" }
        },
        {
          "term": { "firstName1": "jerry1" }
        }
      ]
    }
  }
}

How to use multifield search in elasticsearch combining should and must clause

This may be a repeated question, but I'm not finding a good solution.
I'm trying to search Elasticsearch in order to get documents that contain:
- "event":"myevent1"
- "event":"myevent2"
- "event":"myevent3"
A document doesn't need to contain all of them, but the results should contain only documents with one of those types of events.
And this is simple, because Elasticsearch helps me with the should clause,
which returns exactly what I want.
But then I want all the documents to match another condition as well: the field result.example.example = 200 must be present in every single document, PLUS the document should have one of the previously described "event" values.
So, for example, one document has "event":"myevent1" and result.example.example = 200, another one has "event":"myevent2" and result.example.example = 200, and so on.
I've tried this configuration:
{
  "query": {
    "bool": {
      "must": { "match": { "operation.result.http_status": 200 } },
      "should": [
        {
          "match": {
            "event": "bank.account.patch"
          }
        },
        {
          "match": {
            "event": "bank.account.add"
          }
        },
        {
          "match": {
            "event": "bank.user.patch"
          }
        }
      ]
    }
  }
}
but it is not working, because I also get documents that don't contain any of the should fields.
Hope I explained well,
Thanks in advance!
As is, your query tells ES to look for documents that must have "operation.result.http_status": 200 and to boost those that have a matching event type.
You're looking to combine two must clauses:
- one that matches one of your event types,
- one for your other condition.
The event clause accepts multiple values, and those values are exact matches: you're looking for a terms query.
Try
{
  "query": {
    "bool": {
      "must": [
        { "match": { "operation.result.http_status": 200 } },
        {
          "terms": {
            "event": [
              "bank.account.patch",
              "bank.account.add",
              "bank.user.patch"
            ]
          }
        }
      ]
    }
  }
}
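Alternatively (this is not part of the original answer, just a sketch), you could keep the should clauses from the question and force at least one of them to match with minimum_should_match:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "operation.result.http_status": 200 } }
      ],
      "should": [
        { "match": { "event": "bank.account.patch" } },
        { "match": { "event": "bank.account.add" } },
        { "match": { "event": "bank.user.patch" } }
      ],
      "minimum_should_match": 1
    }
  }
}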

How do I search within an list of strings in Elastic Search?

My data has a field localities which is an array of strings.
"localities": [
"Mayur Vihar Phase 1",
"Paschim Vihar",
"Rohini",
"",
"Laxmi Nagar",
"Vasant Vihar",
"Dwarka",
"Karol Bagh",
"Inderlok" ]
What query should I write to filter the documents by a specific locality such as "Rohini"?
A simple match query will be enough (if you don't know the mapping of your localities field).
POST <your index>/_search
{
  "query": {
    "match": {
      "localities": "Rohini"
    }
  }
}
If the localities field is set as a string type and indexed as not_analyzed, the best way to query this is to use a term filter, wrapped in a filtered query (you can't use filters directly):
POST <your index>/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "localities": "Rohini"
        }
      }
    }
  }
}
If you don't need the score, the second solution is the way to go, as filters don't compute scores, are faster, and are cached.
Check the documentation for information about analysis which is a very important subject in ElasticSearch, heavily influencing the way you query.
POST /_search
{
  "query": {
    "match": {
      "localities": "Rohini"
    }
  }
}
Or you can simply query:
GET /_search?q=localities:Rohini

Filter items whose array contains any of the given values

I have a set of documents like
{
  tags: ['a', 'b', 'c'],
  // ... a bunch of properties
}
As stated in the title: Is there a way to filter all documents containing any of the given tags using Nest?
For instance, the record above would match ['c','d'].
Or should I build multiple "OR"s manually?
elasticsearch 2.0.1:
There's also the terms query, which should save you some work. Here's an example from the docs:
{
  "terms": {
    "tags": ["blue", "pill"],
    "minimum_should_match": 1
  }
}
Under the hood it constructs a boolean should, so it's basically the same thing as above, but shorter.
There's also a corresponding terms filter.
So to summarize, your query could look like this:
{
  "filtered": {
    "query": {
      "match": { "title": "hello world" }
    },
    "filter": {
      "terms": {
        "tags": ["c", "d"]
      }
    }
  }
}
With a greater number of tags this could make quite a difference in length.
Edit: The bitset stuff below is maybe an interesting read, but the answer itself is a bit dated. Some of this functionality is changing around in 2.x. Also Slawek points out in another answer that the terms query is an easy way to DRY up the search in this case. Refactored at the end for current best practices. —nz
You'll probably want a Bool Query (or more likely Filter alongside another query), with a should clause.
The bool query has three main properties: must, should, and must_not. Each of these accepts another query, or array of queries. The clause names are fairly self-explanatory; in your case, the should clause may specify a list of filters, a match against any one of which will return the document you're looking for.
From the docs:
In a boolean query with no must clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter.
Here's an example of what that Bool query might look like in isolation:
{
  "bool": {
    "should": [
      { "term": { "tag": "c" }},
      { "term": { "tag": "d" }}
    ]
  }
}
And here's another example of that Bool query as a filter within a more general-purpose Filtered Query:
{
  "filtered": {
    "query": {
      "match": { "title": "hello world" }
    },
    "filter": {
      "bool": {
        "should": [
          { "term": { "tag": "c" }},
          { "term": { "tag": "d" }}
        ]
      }
    }
  }
}
Whether you use Bool as a query (e.g., to influence the score of matches), or as a filter (e.g., to reduce the hits that are then being scored or post-filtered) is subjective, depending on your requirements.
It is generally preferable to use Bool in favor of an Or Filter, unless you have a reason to use And/Or/Not (such reasons do exist). The Elasticsearch blog has more information about the different implementations of each, and good examples of when you might prefer Bool over And/Or/Not, and vice-versa.
Elasticsearch blog: All About Elasticsearch Filter Bitsets
Update with a refactored query...
Now, with all of that out of the way, the terms query is a DRYer version of all of the above. It does the right thing with respect to the type of query under the hood, it behaves the same as the bool + should using the minimum_should_match options, and overall is a bit more terse.
Here's that last query refactored a bit:
{
  "filtered": {
    "query": {
      "match": { "title": "hello world" }
    },
    "filter": {
      "terms": {
        "tag": ["c", "d"],
        "minimum_should_match": 1
      }
    }
  }
}
Whilst this is an old question, I ran into this problem myself recently and some of the answers here are now deprecated (as the comments point out). So, for the benefit of others who may have stumbled here:
A term query can be used to find the exact term specified in the inverted index:
{
  "query": {
    "term": { "tags": "a" }
  }
}
From the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
Alternatively you can use a terms query, which will match all documents with any of the items specified in the given array:
{
  "query": {
    "terms": { "tags": ["a", "c"] }
  }
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
One gotcha to be aware of (which caught me out): how you define the document also makes a difference. If the field you're searching in has been indexed as a text type, then Elasticsearch will perform a full text search (i.e. using an analyzed string).
If you've indexed the field as a keyword, then a keyword search using a 'non-analyzed' string is performed. This can have a massive practical impact, as analyzed strings are pre-processed (lowercased, punctuation dropped, etc.). See https://www.elastic.co/guide/en/elasticsearch/guide/master/term-vs-full-text.html
To avoid these issues, the string field has been split into two new types: text, which should be used for full-text search, and keyword, which should be used for keyword search. (https://www.elastic.co/blog/strings-are-dead-long-live-strings)
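As an illustrative sketch (the index name, field name, and sub-field name are placeholders, assuming Elasticsearch 7+), a mapping that indexes the same field both ways via a multi-field:
# "my-index", "tags" and the "raw" sub-field are placeholder names
PUT my-index
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}
A terms query against tags.raw would then require exact, case-sensitive matches, while queries against tags go through the analyzer.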
For those looking at this in 2020, you may notice that the accepted answer is now deprecated, but there is a similar approach available using a terms_set and minimum_should_match_script combination.
Please see the detailed answer here in the SO thread
You should use Terms Query
{
  "query": {
    "terms": {
      "tags": ["c", "d"]
    }
  }
}

elasticsearch - confused on how to search items where a field contains a string

This query works fine and returns only one item, "steve_jobs".
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "name": "steve_jobs"
        }
      }
    }
  }
}
So, now I want to get all people with name prefix steve_. So I try this:
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "name": "steve_"
        }
      }
    }
  }
}
This is returning nothing. Why?
I'm confused about when to use term query / term filter / terms filter / querystring query.
What you need is Prefix Query.
If you are indexing your document like so:
POST /testing_nested_query/class/
{
  "name": "my name is steve_jobs"
}
And you are using the default analyzer, then the problem is that the term steve_jobs will be indexed as one term. So your Term Query will never be able to find any docs matching the term steve, as there is no such term in the index. The Prefix Query helps you solve your problem by searching for a prefix in all the indexed terms.
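A minimal sketch of that Prefix Query, reusing the index and field from the indexing example above:
# prefix runs against the indexed terms, so it can find the single term steve_jobs
GET /testing_nested_query/_search
{
  "query": {
    "prefix": {
      "name": "steve_"
    }
  }
}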
You can solve the same problem by creating custom analyzers (read this and this) so that steve_jobs is stored as steve and jobs.
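A rough sketch of what such an analyzer could look like (the index, analyzer, and tokenizer names are made up for illustration, assuming Elasticsearch 7+ without mapping types). It splits on whitespace, punctuation, and underscores with a pattern tokenizer, so steve_jobs is indexed as steve and jobs:
# all names below are hypothetical; the pattern splits on non-word characters and underscores
PUT /people
{
  "settings": {
    "analysis": {
      "analyzer": {
        "underscore_analyzer": {
          "type": "custom",
          "tokenizer": "underscore_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "underscore_tokenizer": {
          "type": "pattern",
          "pattern": "[\\W_]+"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "underscore_analyzer"
      }
    }
  }
}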
