Search multiple indices for an ID efficiently? - elasticsearch

I need to get a document but I have no idea what index it is in. I have a bunch of indices for different days; all prefixed with "mydocs-". I've tried:
GET /mydocs-*/adoc/my_second_doc
returns "index_not_found_exception"
GET /mydocs-*/adoc/_search
{
"query": {
"bool":{
"filter": [{
"term":{
"_id": ["my_second_doc"]
}
}]
}
}
}
returns all the docs in the index.
Now, if I search the specific index I can get the doc. Problem is that I don't always know the index it is in beforehand. So, I'd have to search many, many indices for it (thousands of indices).
GET /mydocs-12/adoc/my_second_doc
returns the desired doc.
Any ideas on how to do an efficient Get/Search for the doc?

Have you tried with :
GET mydocs-*/adoc/_search
{
"query": {
"term": {
"_id": "my_second_doc"
}
}
}
Or more specifically with :
GET mydocs-*/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"_id": "my_second_doc"
}
},
{
"term": {
"_type": "adoc"
}
}
]
}
}
}
Both queries above find every document whose index name starts with mydocs-, whose type is adoc, and whose id is my_second_doc.
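As a rough sketch, the second request body can also be assembled in code. The helper name build_id_query is hypothetical, Python is used only for illustration, and nothing here contacts a cluster; the resulting dictionary would still have to be sent to mydocs-*/_search by whatever client you use.

```python
import json

def build_id_query(doc_id, doc_type):
    """Return a search body matching one _id and one _type (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "must": [
                    # term expects a single value here, not a list
                    {"term": {"_id": doc_id}},
                    {"term": {"_type": doc_type}},
                ]
            }
        }
    }

body = build_id_query("my_second_doc", "adoc")
print(json.dumps(body, indent=2))
```

Passing the id as a plain string, as in the answer's queries, is the difference from the original attempt, which wrapped it in an array.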

Related

Is it possible to access a query term in a script field?

I would like to construct an elasticsearch query in which I can search for a term and on-the-fly compute a new field for each found document, which is calculated based on some existing fields as well as the query term. Is this possible?
For example, let's say in my ES query I am searching for documents which have the keyword "amsterdam" in the "text" field.
"filter": [
{
"match_phrase": {
"text": {
"query": "amsterdam"
}
}
}]
Now I would also like to have a script field in my query, which computes some value based on other fields as well as the query.
So far, I have only found how to access the other fields of a document, using doc['someOtherField'], for example:
"script_fields" : {
"new_field" : {
"script" : {
"lang": "painless",
"source": """
if (doc['citizens'].value > 10000) {
return 'large';
}
return 'small';
"""
}
}
}
How can I integrate the query term, e.g. if I wanted to add to the if statement "if the query term starts with a-e"?
You're on the right track, but script_fields are primarily used to post-process your documents' attributes. They won't help you filter any docs because they run after the query phase.
With that being said, you can use scripts to filter your documents through script queries. Before you do that, though, you should explore alternatives.
In other words, scripts should be used when all other mechanisms and techniques have been exhausted.
Back to your example. I see three possibilities off the top of my head.
Match phrase prefix queries as a group of bool-should subqueries:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match_phrase_prefix": {
"text_field": "a"
}
},
{
"match_phrase_prefix": {
"text_field": "b"
}
},
{
"match_phrase_prefix": {
"text_field": "c"
}
},
... till the letter "e"
]
}
}
]
}
}
}
A regexp query:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"regexp": {
"text_field": "[a-e].+"
}
}
]
}
}
}
Script queries using .charAt comparisons:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"script": {
"script": {
"source": """
char c = doc['text_field.keyword'].value.charAt(0);
return c >= params.gte.charAt(0) && c <= params.lte.charAt(0);
""",
"params": {
"gte": "a",
"lte": "e"
}
}
}
}
]
}
}
}
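A minimal sketch of how the first and third options behave, written in Python purely for illustration. The helper names prefix_should_clauses and first_char_in_range are hypothetical, the sample strings are made up, and the checks run on raw strings rather than on analyzed terms in an index.

```python
import json
import re
import string

def prefix_should_clauses(field, first, last):
    """One match_phrase_prefix clause per letter from first to last, inclusive."""
    letters = string.ascii_lowercase
    return [
        {"match_phrase_prefix": {field: ch}}
        for ch in letters[letters.index(first):letters.index(last) + 1]
    ]

def first_char_in_range(value, gte="a", lte="e"):
    """Same comparison as the script query: first character within [gte, lte]."""
    return gte[0] <= value[0] <= lte[0]

# Option 1: generate the should list for the letters a through e.
should = prefix_should_clauses("text_field", "a", "e")
print(json.dumps({"bool": {"should": should}}, indent=2))

# Options 2 and 3 on raw strings; re.fullmatch stands in for the regexp query,
# whose pattern has to cover the whole term.
pattern = re.compile(r"[a-e].+")
for value in ["amsterdam", "berlin", "zurich", "a"]:
    print(value, bool(pattern.fullmatch(value)), first_char_in_range(value))
```

One difference shows up in the last sample: a one-character value such as "a" passes the character comparison but not [a-e].+, because .+ needs at least one more character.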
If you're relatively new to ES and would love to see real-world examples, check out my recently released Elasticsearch Handbook. One chapter is dedicated to scripting and as it turns out, you can achieve a lot with scripts (if of course executed properly).

How to write ElasticSearch query with AND condition

I am trying to write an Elasticsearch query that searches the data with two conditions, something like the below:
{
"query": {
"match": {
"trackingId": "track4324234234244",
"log_message": "downstream request-response"
}
}
}
The above query won't work because the [match] query doesn't support multiple fields. Is there a way I can achieve this?
You can use a bool query, where a must clause can be used.
must means: The clause (query) must appear in matching documents. These clauses must match, like logical AND.
To learn about the difference between must and should, refer to this SO answer.
Adding a working example with sample docs and a search query:
Index Sample Data:
{
"trackingId":"track4324234234244",
"log_message":"downstream request-response"
}
{
"trackingId":"track4324234234244",
"log_message":"downstream"
}
{
"trackingId":"tracks4324234234244",
"log_message":"downstream request-response"
}
Search query:
{
"query": {
"bool": {
"must": [
{
"match": {
"trackingId": "track4324234234244"
}
},
{
"match": {
"log_message": {
"query": "downstream request-response",
"operator": "and"
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 1.8570712,
"_source": {
"trackingId": "track4324234234244",
"log_message": "downstream request-response"
}
}
]
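To see why only the first sample document is returned, here is a toy simulation in Python. It is a sketch under loud assumptions: tokens() is a crude stand-in for the analyzer (lowercase, split on non-alphanumerics), not Elasticsearch's real analysis chain, and the function names are hypothetical.

```python
import re

def tokens(text):
    """Very rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match_all_terms(doc, field, query):
    """Like match with operator 'and': every query token must occur in the field."""
    return tokens(query) <= tokens(doc[field])

docs = [
    {"trackingId": "track4324234234244", "log_message": "downstream request-response"},
    {"trackingId": "track4324234234244", "log_message": "downstream"},
    {"trackingId": "tracks4324234234244", "log_message": "downstream request-response"},
]

# bool/must: a document is a hit only if every clause matches.
hits = [
    d for d in docs
    if match_all_terms(d, "trackingId", "track4324234234244")
    and match_all_terms(d, "log_message", "downstream request-response")
]
print(hits)
```

The second document fails the log_message clause (it lacks "request" and "response"), and the third fails the trackingId clause ("tracks..." is a different token), which leaves only the first.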
Apart from Bool, you can also make use of simple query string as mentioned below:
POST <your_index_name>/_search
{
"query": {
"simple_query_string": {
"fields": ["trackingId", "log_message"],
"query": "track4324234234244 downstream request-response",
"default_operator": "AND"
}
}
}
Note how I've just added all the terms and made use of default_operator: AND so that it returns only documents having all the terms present in the fields.
There is also query_string; however, I would recommend the one above (simple_query_string), because query_string is strict: it throws an error if the query string has any syntax errors, while simple_query_string does not.
POST <your_index_name>/_search
{
"query": {
"query_string": {
"fields": ["trackingId", "log_message"],
"query": "(track4324234234244) AND (downstream request-response)",
"default_operator": "AND"
}
}
}
As for when to use simple_query_string: mostly when you want to expose the query string or terms to the end user; that is where it is useful.
Hope that helps!

Difference between elasticsearch queries

I'm having a hard time trying to figure out why these two queries do not return the same number of results (I'm using elasticsearch 2.4.1):
{
"nested": {
"path": "details",
"filter": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
{
"nested": {
"path": "details",
"filter": {
"bool": {
"must": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
}
}
The first query has more results.
My guess was that the filter clause in the first query was working like an or/should, but if I replace the must in the second query with a should, the query yields a greater number of results than that of those two.
How does the meaning of those queries differ?
I'm afraid I have no knowledge of the structure of the indexed documents; all I know is how many rows each query returns.
The first query is wrong: the nested filter cannot be an array. I suspect ES doesn't parse it correctly and only takes one match instead of both, which is probably why it returns more data than the second one.
The second query is correct in terms of nested filter and yields exactly what you expect.
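If the clauses are assembled in code, one way to avoid the array form is to always wrap them in a single bool/must object. This is a minimal Python sketch; the helper name nested_all is hypothetical and it only builds the query dictionary.

```python
def nested_all(path, clauses):
    """Wrap several clauses in bool/must so nested.filter is a single query object."""
    return {"nested": {"path": path, "filter": {"bool": {"must": list(clauses)}}}}

query = nested_all("details", [
    {"match": {"details.id": "color"}},
    {"match": {"details.value_str": "red"}},
])
print(query)
```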

How to check field data is numeric when using inline Script in ElasticSearch

Per our requirement, we need to find the max ID of the documents before adding a new document. The problem is that a doc may also contain string data, so I had to use an inline script in the Elasticsearch query to find the max ID only for documents that hold integer data, returning 0 otherwise. I am using the following inline script query to find the max key, but it is not working. Can you help me with this?
{
"size":0,
"query":
{"bool":
{"filter":[
{"term":
{"Name":
{
"value":"Test2"
}
}}
]
}},
"aggs":{
"MaxId":{
"max":{
"field":"Key","script":{
"inline":"((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"}}
}
}
}
The error occurs because the max aggregation only supports numeric fields, i.e. you cannot specify a string field (such as Key) in a max aggregation.
Simply remove the "field":"Key" part and keep only the script part:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"Name": "Test2"
}
}
]
}
},
"aggs": {
"MaxId": {
"max": {
"script": {
"source": "((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"
}
}
}
}
}
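The value the scripted aggregation is meant to compute can be sketched in plain Python: each key counts as its integer value if it parses, otherwise as 0, and the maximum is taken over those. The function names and the sample keys are hypothetical, and Python's int() is only an approximation of the script's numeric check.

```python
def key_as_int(key):
    """Integer value of key, or 0 when the key is not numeric."""
    try:
        return int(key)
    except ValueError:
        return 0

def max_id(keys):
    """Largest numeric key, or 0 if there are no keys or none are numeric."""
    return max((key_as_int(k) for k in keys), default=0)

print(max_id(["7", "abc", "42", "x9"]))
```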

Elasticsearch terms query on array of values

I have data on ElasticSearch index that looks like this
{
"title": "cubilia",
"people": [
"Ling Deponte",
"Dana Madin",
"Shameka Woodard",
"Bennie Craddock",
"Sandie Bakker"
]
}
Is there a way for me to search for all the people whose name starts with
"ling" (should be case insensitive) and get distinct terms properly cased, "Ling Deponte" not "ling deponte"?
I am fine with changing mappings on the index in any way.
Edit: this does what I want, but it is a really bad query:
{
"size": 0,
"aggs": {
"person": {
"filter": {
"bool":{
"should":[
{"regexp":{
"people.raw":"(.* )?[lL][iI][nN][gG].*"
}}
]}
},
"aggs": {
"top-colors": {
"terms": {
"size":10,
"field": "people.raw",
"include":
{
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
}
}
people.raw is not_analyzed
Yes, and you can do it without a regular expression by taking advantage of Elasticsearch's full text capabilities.
GET /test/_search
{
"query": {
"match_phrase": {
"people": "Ling"
}
}
}
Note: This could also be match or match_phrase_prefix in this case. The match_phrase* queries imply an order of the values in the text. match simply looks for any of the values. Since you only have one value, it's pretty much irrelevant.
The problem is that you cannot limit the document responses to just that name because the search API returns documents. With that said, you can use nested documents and get the desired behavior via inner_hits.
You do not want to do wildcard prefixing whenever possible because it simply does not work at scale. To put it in SQL terms, that's like doing a full table scan; you effectively lose the benefit of the inverted index because it has to walk it entirely to find the actual start.
Combining the two should work pretty well, though. Here, I use the query to whittle down the results to what you are interested in, then I use your inner aggregation to include only the matching values.
{
"size": 0,
"query": {
"match_phrase": {
"people": "Ling"
}
},
"aggs": {
"person": {
"terms": {
"size":10,
"field": "people.raw",
"include": {
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
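The include pattern is still needed because a terms aggregation on an array field buckets every value in the matching documents, not just the value that caused the match. A small Python sketch of what the pattern keeps from the sample document; re.fullmatch is used as an approximation of a pattern that has to cover the whole term, and the two extra names in the final checks are made up.

```python
import re

# Optional "anything plus a space" prefix, then a word starting with ling (any case).
pattern = re.compile(r"(.* )?[lL][iI][nN][gG].*")

people = ["Ling Deponte", "Dana Madin", "Shameka Woodard", "Bennie Craddock", "Sandie Bakker"]

kept = [name for name in people if pattern.fullmatch(name)]
print(kept)

# A later word may start with "ling", but "ling" inside a word does not count.
print(bool(pattern.fullmatch("Mei Ling")), bool(pattern.fullmatch("Darling")))
```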
Hi, please find the query below; it may help with your request:
GET skills/skill/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"wildcard": {
"skillNames.raw": "jav*"
}
}
]
}
}
}
}
}
My intention is to find documents starting with "jav".

Resources