Returning just what's inside of _source - elasticsearch

I'm using the elasticsearch JavaScript library and am struggling to figure out how to return only what's inside the _source object. I pull the data like this:
client.search({
index: 'kafkajmx2',
body: {
"_source": "*",
"size": 10000,
"query": {
"bool": {
"must": [
{ "match": { "metric_name": "IsrExpandsPerSec.Count" }}
],
"filter": [
{
"range": {
"#timestamp": {
"gte": "now-60m"
}
}
}
]
}
}
}
})
but I don't get just the source back. If I change "_source": "*" to "_source": true, I still get the same results.

The returned results always carry metadata. The * you put in _source only selects fields within _source; it does not affect the metadata, which is everything outside the _source object in the JSON payload. I believe the question "Elasticsearch - how to return only data, not meta information?" is similar to yours, and it appears this is not doable, although that question is fairly old and newer versions of Elasticsearch have been released since. The latest version as of this writing, 5.2, does not allow it either. You will need to parse the results returned by the query.
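Parsing the results is short in practice. Here is a minimal sketch in Python, assuming the response has the usual search shape (a top-level hits object holding a hits array, with each entry carrying its own _source). The function name extract_sources, the metric_value field and the sample values are hypothetical.

```python
from typing import Any, Dict, List


def extract_sources(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return only the _source of every hit, dropping the metadata around it."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits if "_source" in hit]


# Hypothetical response, trimmed to the parts that matter here.
sample_response = {
    "took": 3,
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "kafkajmx2",
                "_id": "1",
                "_score": 1.0,
                "_source": {"metric_name": "IsrExpandsPerSec.Count", "metric_value": 0},
            }
        ],
    },
}

sources = extract_sources(sample_response)
```

The same idea should carry over to the JavaScript client by mapping over the hits array of the response body and keeping each hit's _source.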

Related

Elasticsearch inline string replace seems to do nothing

We have some legacy fields in an Elasticsearch index that cause us trouble, and we would like to perform a string replace over the whole index.
For instance, some old timestamps are stored in the format 2000-01-01T00:00:00.000+0100 but should be stored as 2000-01-01T00:00:00.000+01:00.
I tried to run the following query:
POST /my_index/_update_by_query
{
"script":
{
"lang": "painless",
"inline": "ctx._source.timestamp = ctx._source.timestamp.replace('+0100', '+01:00')"
}
}
I ran the query in Kibana, but I always get a query timeout. I guess that is not necessarily bad considering how huge the database is, but I never see the fields updated.
Is there a way to see the status of such a query?
I also tried to create a search query for the update, but with no luck:
GET /my_index/_search
{
"query": {
"query_string": {
"query": "*0100",
"allow_leading_wildcard": true,
"analyze_wildcard": true,
"fields": ["timestamp"]
}
}
}
Unfortunately, this always returns an empty set, and I am not sure what might be wrong.
What would be the correct way to achieve such an update?
I would solve this with an ingest pipeline that you use to update your whole index.
First, create the ingest pipeline as shown below. It detects documents whose timestamp field ends with +0100 and rewrites the timestamp with the correctly formatted timezone offset.
PUT _ingest/pipeline/fix-tz
{
"processors": [
{
"dissect": {
"if": "ctx.timestamp.endsWith('+0100')",
"field": "timestamp",
"pattern": "%{timestamp}+%{tz}"
}
},
{
"set": {
"if": "ctx.tz != null",
"field": "timestamp",
"value": "{{timestamp}}+01:00"
}
},
{
"remove": {
"if": "ctx.tz != null",
"field": "tz"
}
}
]
}
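Once the pipeline is created, update your index with it, like this:
POST my_index/_update_by_query?pipeline=fix-tz&wait_for_completion=false
Because of wait_for_completion=false, the call returns immediately with a task ID instead of waiting for the update to finish, and you can check progress with GET _tasks/<task_id>. Once the task has run to completion, your index should be properly updated.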
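To sanity-check what the pipeline does to a single value before running it over the whole index, the same three steps can be mirrored locally. This is only a sketch of the logic in Python, not something Elasticsearch runs. The function name fix_timezone is hypothetical, and the None check is an addition of this sketch (the pipeline's first condition assumes every document has a timestamp field).

```python
from typing import Optional


def fix_timezone(timestamp: Optional[str]) -> Optional[str]:
    """Mirror the pipeline: rewrite a trailing +0100 offset as +01:00."""
    # Values that do not end with +0100 are left untouched, like the "if" conditions do.
    if timestamp is None or not timestamp.endswith("+0100"):
        return timestamp
    # dissect step: split on the first "+" into the local time and the offset.
    local_time, _, _tz = timestamp.partition("+")
    # set step: reassemble with the corrected offset (the remove step just drops tz).
    return local_time + "+01:00"


fixed = fix_timezone("2000-01-01T00:00:00.000+0100")
```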

Elasticsearch - Include fields in highlight excluded in _source

I know fields excluded via the _source mapping can still be used in a search query. But I have a requirement to include matching terms in the highlight section of the response.
For example, I have a mapping like:
{
"mappings": {
"doc": {
"_source": {
"excludes": ["some_nested_object.complex_tags_object"]
},
"properties": {
"some_nested_object": {
"type": "nested"
}
}
}
}
}
Search Query:
GET my_index/_search
{
"size": 500,
"query": {
"bool": {
"must": [{
"nested": {
"query": {
"bool": {
"must":
[{
"match_phrase_prefix": {
"some_nested_object.complex_tags_object.name": {
"query": "account"
}
}
}
]
}
},
"path": "some_nested_object"
}
}
]
}
},
"highlight": {
"pre_tags": [
""
],
"post_tags": [
""
],
"fields": {
"some_nested_object.complex_tags_object.name": {}
}
}
}
If I exclude the object at query time rather than in the mapping, the matching terms are returned in the highlight section, but the response is very slow due to the large size of the object.
So is it possible to highlight fields that are excluded from _source in the mapping?
Unfortunately, the short answer to your question is no. From the Elasticsearch highlighting documentation:
"Highlighting requires the actual content of a field. If the field is not stored (the mapping does not set store to true), the actual _source is loaded and the relevant field is extracted from _source."
You have a few options, each of which involves a compromise:
Include the field in the source again if you absolutely need to support highlighting over it (I appreciate this conflicts with the reasons for excluding it from the source in the first place).
Relax the requirement to support highlighting over this field (a compromise on features).
Implement highlighting for this field outside Elasticsearch (this will probably compromise the quality of your solution, and perhaps its cost).
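As a rough illustration of the last option, here is a minimal client-side sketch in Python. It assumes the text of the field is available to the application from somewhere other than _source, and it only approximates a prefix match on word boundaries; it does not reproduce Elasticsearch's analyzers or its highlighter. The function name highlight_prefix and the tag defaults are hypothetical.

```python
import re


def highlight_prefix(text: str, prefix: str,
                     pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    """Wrap every word starting with prefix (case-insensitive) in the given tags."""
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*", re.IGNORECASE)
    return pattern.sub(lambda match: pre_tag + match.group(0) + post_tag, text)


highlighted = highlight_prefix("Accounting team and account owner", "account")
```

The quality trade-off mentioned above shows up immediately: stemming, synonyms and multi-word phrases would all need extra work here.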

Escaping a hash in a parent ID for elastic search?

We were having issues where our queries weren't returning items with specific version IDs on Elasticsearch 2.3. After some investigation, it looks like our current query misbehaves when there is a '#' in the version ID.
The query I am trying to perform looks something like this:
{
"query": {
"constant_score": {
"filter": {
"terms": {
"_parent": [
"faro-deployments-webservice-infrastructure|#abc123",
"faro-deployments-webservice-infrastructure|xyz321"
]
}
}
}
}
}
This works fine, but it excludes any results where the parent ID contains a '#' character.
I can't seem to find it again, but I recall reading somewhere that # has a specific meaning in this context. I have tried a variety of ways to escape the #. Is there a way to support versions containing a # character here, or to perform a similar query with similar results?
The following seems to work for me. I changed the query to a has_parent query that matches on the parent's "_id" instead of using the "_parent" field.
{
"query": {
"has_parent": {
"type": "deck",
"query": {
"constant_score": {
"filter": {
"terms": {
"_id": [
"faro-deployments-webservice-infrastructure|#abc123",
"faro-deployments-webservice-infrastructure|xyz321"
]
}
}
}
}
}
}
}
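If the list of parent IDs is built at runtime, the same request body can be assembled programmatically. This is a minimal sketch in Python that only constructs and serializes the body shown above; the function name build_parent_id_query is hypothetical, and the parent type "deck" is taken from the query above.

```python
import json
from typing import Any, Dict, List


def build_parent_id_query(parent_type: str, parent_ids: List[str]) -> Dict[str, Any]:
    """Build a has_parent query that matches children by their parent's _id."""
    return {
        "query": {
            "has_parent": {
                "type": parent_type,
                "query": {
                    "constant_score": {
                        "filter": {"terms": {"_id": list(parent_ids)}}
                    }
                },
            }
        }
    }


query = build_parent_id_query(
    "deck",
    [
        "faro-deployments-webservice-infrastructure|#abc123",
        "faro-deployments-webservice-infrastructure|xyz321",
    ],
)
# The '#' needs no escaping at the JSON level; it is sent as-is.
body = json.dumps(query)
```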

elasticsearch-py driver is not filtering data properly when aggregating

I ran into some weird behavior with the Elasticsearch Python drivers and would like someone to explain it to me! The query below works directly from cURL but doesn't work with python-requests or elasticsearch-py. Strangely, it works when I switch to the pyelasticsearch library. The details are:
I have a type called MY_TYPE that has a nested object MY_NESTED_FIELD and a child document type MY_CHILD_TYPE. I'm trying to do a terms facet aggregation on the nested attributes, based on filters applied to the MY_TYPE and MY_CHILD_TYPE types. The query looks like this:
{
"query": {
"filtered": {
"filter": {
"has_child": {
"query": {
"range": {
"CHILD_FIELD": {
"gte": 0.5
}
}
},
"type": "MY_CHILD_TYPE"
}
}
}
},
"aggs": {
"aggregation_results": {
"aggs": {
"boards": {
"terms": {
"field": "MY_NESTED_FIELD.KEY",
"size": 100
},
"aggs": {
"MY_RANGES": {
"range": {
"ranges": [
{
"to": 0.5,
"from": 0
},
{
"to": 0.8,
"from": 0.5
}
],
"field": "MY_NESTED_FIELD_PATH.VALUE"
}
}
}
}
},
"nested": {
"path": "MY_NESTED_FIELD_PATH"
}
}
}
}
When I run this query against Elasticsearch directly (using cURL or the head plugin), it filters the parents and returns aggregations based on the correct results. However, when I run it from the Python script, it completes successfully but returns the wrong data (facets from all the documents, without the filter applied).
I have tried:
cURL: Works!
Elasticsearch's head plugin: Works!
python-requests version 2.8.1: Did not work!
elasticsearch-py API versions 1.4.0 and 2.1.0: Did not work!
pyelasticsearch version 1.4: Works!
The code snippet for elasticsearch-py is:
from elasticsearch import Elasticsearch
es = Elasticsearch('HOST:PORT')
data = es.search(index='INDEX_NAME', doc_type='MY_TYPE', body=payload, q='*:*', size=0)
When using python-requests, the code was:
import json
import requests
url = 'http://ES_HOST:ES_PORT/ES_INDEX/ES_TYPE/_search'
params = {'size':0, 'q':'*:*'}
data = requests.post(url, params=params, data=json.dumps(payload)).json()
My Elasticsearch version is:
{
"version": {
"number": "1.4.4",
"build_hash": "c88f77ffc81301dfa9dfd81ca2232f09588bd512",
"build_timestamp": "2015-02-19T13:05:36Z",
"build_snapshot": false,
"lucene_version": "4.10.3"
}
}
So my questions are:
Is this the best way to write this query?
Is there an explanation for why elasticsearch-py is acting strangely?
Is there a fix for this on elasticsearch-py?

Elastic search aggregation sum

I'm using Elasticsearch 1.0.2 and I want to run a search on it using a query with aggregation functions like sum().
Suppose a single record looks something like this:
{
"_index": "outboxpro",
"_type": "message",
"_id": "PAyEom_mRgytIxRUCdN0-w",
"_score": 4.5409594,
"_source": {
"team_id": "1bf5f3f968e36336c9164290171211f3",
"created_user": "1a9d05586a8dc3f29b4c8147997391f9",
"created_ip": "192.168.2.245",
"folder": 1,
"report": [
{
"networks": "ec466c09fd62993ade48c6c4bb8d2da7facebook",
"status": 2,
"info": "OK"
},
{
"networks": "bdc33d8ca941b8f00c2a4e046ba44761twitter",
"status": 2,
"info": "OK"
},
{
"networks": "ad2672a2361d10eacf8a05bd1b10d4d8linkedin",
"status": 5,
"info": "[unauthorized] Invalid or expired token."
}
]
}
}
Let's say I need to fetch the count of all successfully posted messages, i.e. those with status = 2 in the report field. There will be many records in the collection, and I want a report of all the successfully posted messages.
I have tried the following code:
{
"size": 2000,
"query": {
"filtered": {
"query": {
"match": {
"team_id": {
"query": "1bf5f3f968e36336c9164290171211f3"
}
}
}
}
},
"aggs": {
"genders": {
"terms": {
"field": "report.status"
}
}
}
}
Please help me find a solution. I am a newbie with Elasticsearch. Is there any other aggregation method that would do this? Your help is much appreciated.
Your script filter is slow on big data and doesn't benefit from indexing. Did you think about parent/child instead of nested? With parent/child, you could use aggregations natively to calculate the sum.
You will have to make use of nested mappings here. Have a look at https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-mapping.html.
Then you will have to aggregate on the nested fields, as in https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html.
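Putting the two links together, the request could look roughly like the sketch below, written as a Python dict. It assumes the report field has been remapped as a nested type (the mapping change itself is not shown), and the aggregation names reports and with_status are hypothetical. The small local counter is only there to show what number the filter bucket's doc_count is expected to represent; it is not part of the request.

```python
from typing import Any, Dict, List


def build_status_count_request(team_id: str, status: int) -> Dict[str, Any]:
    """Count nested report entries with the given status for one team."""
    return {
        "size": 0,  # Only the aggregation result is needed, not the hits.
        "query": {"match": {"team_id": team_id}},
        "aggs": {
            "reports": {
                "nested": {"path": "report"},
                "aggs": {
                    "with_status": {
                        "filter": {"term": {"report.status": status}}
                    }
                },
            }
        },
    }


def count_status_locally(sources: List[Dict[str, Any]], status: int) -> int:
    """Reference count over already-fetched _source documents."""
    return sum(
        1
        for source in sources
        for entry in source.get("report", [])
        if entry.get("status") == status
    )


request_body = build_status_count_request("1bf5f3f968e36336c9164290171211f3", 2)

# Statuses from the sample record in the question: 2, 2 and 5.
sample_sources = [{"report": [{"status": 2}, {"status": 2}, {"status": 5}]}]
local_count = count_status_locally(sample_sources, 2)
```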
