Elasticsearch inline string replace seems to do nothing

We have some legacy fields in an Elasticsearch index which cause us some trouble, and we would like to perform a string replace over the whole index.
For instance, some old timestamps are stored in the format 2000-01-01T00:00:00.000+0100 but should be stored as 2000-01-01T00:00:00.000+01:00.
I tried to run the following query:
POST /my_index/_update_by_query
{
  "script": {
    "lang": "painless",
    "inline": "ctx._source.timestamp = ctx._source.timestamp.replace('+0100', '+01:00')"
  }
}
I ran the query within Kibana, but I always get a query timeout. I guess that is not necessarily bad considering how huge the index is, but I never see the fields updated.
Is there a way to see the status of such query?
I also tried to create a search query for the update, but with no luck:
GET /my_index/_search
{
  "query": {
    "query_string": {
      "query": "*0100",
      "allow_leading_wildcard": true,
      "analyze_wildcard": true,
      "fields": ["timestamp"]
    }
  }
}
This unfortunately always returns an empty result set, and I'm not sure what might be wrong.
What would be a correct way to achieve such update?

I would solve this using an ingest pipeline, which you can then apply to your whole index.
First, create the ingest pipeline shown below. It detects documents whose timestamp field ends with +0100 and then updates the timestamp to use the correctly formatted timezone.
PUT _ingest/pipeline/fix-tz
{
  "processors": [
    {
      "dissect": {
        "if": "ctx.timestamp != null && ctx.timestamp.endsWith('+0100')",
        "field": "timestamp",
        "pattern": "%{timestamp}+%{tz}"
      }
    },
    {
      "set": {
        "if": "ctx.tz != null",
        "field": "timestamp",
        "value": "{{timestamp}}+01:00"
      }
    },
    {
      "remove": {
        "if": "ctx.tz != null",
        "field": "tz"
      }
    }
  ]
}
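If you want to verify the pipeline's behavior first, you can dry-run it against a sample document with the simulate API (the document below is just a made-up example):
POST _ingest/pipeline/fix-tz/_simulate
{
  "docs": [
    {
      "_source": {
        "timestamp": "2000-01-01T00:00:00.000+0100"
      }
    }
  ]
}
The response shows the transformed document, so you can confirm the timestamp comes out as 2000-01-01T00:00:00.000+01:00 before touching the real index.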
Then, when the pipeline is created, you just have to update your index with it, like this:
POST my_index/_update_by_query?pipeline=fix-tz&wait_for_completion=false
Once this has run completely, your index should be properly updated.
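This also answers the status question: because of wait_for_completion=false, the call above returns immediately with a task ID instead of timing out, and you can poll that task to see progress (documents updated, failures, and so on):
GET _tasks/<taskId>
You can also list all running update-by-query tasks:
GET _tasks?detailed=true&actions=*byquery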

Related

Find all entries on a list within Kibana via Elasticsearch Query DSL

Could you please help me with this? My Kibana database within "Discover" contains a list of trades. I now want to find all trades in this DB that have been done in specific instruments (ISIN numbers). When I add a filter manually and switch to Elasticsearch Query DSL, I find the following:
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "obdetails.isin": "CH0253592783"
          }
        },
        {
          "match_phrase": {
            "obdetails.isin": "CH0315622966"
          }
        },
        {
          "match_phrase": {
            "obdetails.isin": "CH0357659488"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
Since I want to check the DB for more than 200 ISINs, this seems inefficient. Is there a way to just say "show me the trade if it contains one of the following 200 ISINs"?
I already googled and tried this, which did not work:
{
  "query": {
    "terms": {
      "obdetails.isin": ["CH0357659488", "CH0315622966"],
      "boost": 1.0
    }
  }
}
The query executes without errors, but does not return any results.
To conclude: a field of type text is analyzed, which basically converts the given data into a list of terms using the configured analyzers, rather than storing it as a single term.
This behavior causes the terms query not to match the original values.
Rather than changing the type of the field, one may add an additional sub-field of type keyword. That way terms queries can be performed while still having the ability to run match queries on the field.
{
  "isin": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
The above example will add an extra field called obdetails.isin.keyword which can be used for terms queries, while still being able to use match queries on obdetails.isin.
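With that mapping in place, the original terms query should work when pointed at the keyword sub-field:
{
  "query": {
    "terms": {
      "obdetails.isin.keyword": ["CH0357659488", "CH0315622966", "CH0253592783"]
    }
  }
}
Note that documents indexed before the sub-field was added need to be reindexed (or updated) for the sub-field to be populated.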

ElasticSearch must-terms does not return data

My Elasticsearch must-terms query does not work; the data has a clientId value of "08d71bc7-c4ab-6e1d-f858-cf3448242e8b", but the result is empty. I am using elasticsearch:6.7.1. Do you know what the problem is here?
{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "clientId": ["08d71bc7-c4ab-6e1d-f858-cf3448242e8b", "08d71bc7-c4ab-6e1d-f858-cf3448242e8c"]
          }
        },
        {
          "query_string": {
            "query": "*d*",
            "fields": ["name", "description", "title"]
          }
        },
        {
          "query_string": {
            "query": "1",
            "fields": ["type"]
          }
        }
      ]
    }
  }
}
I've shared some sample data as well.
I haven't worked much with query_string, but if you remove those clauses and run your query, I'm sure it would at least give you some results. If so, your query_string clauses are the ones giving you this bad time.
First, I recommend using filter instead of must, since you don't need relevance scoring here.
Consider using a regexp query for your first query_string; to cover multiple fields, you can put one regexp clause per field inside a bool/should.
For the second, it would be enough to use term instead of query_string. A sketch combining these suggestions follows below.
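A rough sketch of the restructured query (field names and values are taken from the question; the .*d.* regexp is just one guess at what *d* was meant to match):
{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "clientId": ["08d71bc7-c4ab-6e1d-f858-cf3448242e8b", "08d71bc7-c4ab-6e1d-f858-cf3448242e8c"]
          }
        },
        { "term": { "type": "1" } },
        {
          "bool": {
            "should": [
              { "regexp": { "name": ".*d.*" } },
              { "regexp": { "description": ".*d.*" } },
              { "regexp": { "title": ".*d.*" } }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  }
}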
Hope this is helpful! :D
The search results depend on how clientId is analyzed. If clientId is a keyword, your query should work as expected, but if the type of clientId is text then the value gets tokenized into smaller parts (it breaks at the dashes).
You can check the clientId field's type in the index mappings, and you can also run the analyze API to check the tokenization: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html
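For example, assuming your index is called my_index, this shows exactly which tokens the UUID is broken into:
GET my_index/_analyze
{
  "field": "clientId",
  "text": "08d71bc7-c4ab-6e1d-f858-cf3448242e8b"
}
If the field is mapped as text with the standard analyzer, the response will contain several tokens (split at the dashes), which is why a terms query with the full UUID matches nothing.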

Elasticsearch not searching a field in term query

I'm having a problem searching records by a particular field. The field exists with the value in the document in Elasticsearch, but when I use this field in a term query, it does not fetch the record. Searching by other fields works fine.
JSON Request:
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "and": [
          {
            "term": {
              "sellerId": "6dd7035e-1d6f-4ddb-82f4-521902bfc29e"
            }
          }
        ]
      }
    }
  }
}
It does not return any error, it just doesn't fetch the related document. I tried searching with other fields and they worked fine.
Is there anything I'm missing here ?
Elasticsearch version: 2.2.2
You need to reindex your data and change the mapping of that field to
"sellerId": {
"type": "string",
"index": "not_analyzed"
}
That way the UUID won't be analyzed and split into tokens and you'll be able to search it using a term query.
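A minimal sketch of that flow, assuming a hypothetical new index my_index_v2 with a type named myType; after creating it you would copy your documents over and point your application (or an alias) at the new index:
PUT my_index_v2
{
  "mappings": {
    "myType": {
      "properties": {
        "sellerId": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
This uses the Elasticsearch 2.x mapping syntax from the answer; in 5.x and later you would use a keyword field instead.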

returning just what's inside of _source

I'm using the elasticsearch JavaScript library and am struggling to figure out how to return just what's inside of the _source object... I pull the data like this:
client.search({
  index: 'kafkajmx2',
  body: {
    "_source": "*",
    "size": 10000,
    "query": {
      "bool": {
        "must": [
          { "match": { "metric_name": "IsrExpandsPerSec.Count" } }
        ],
        "filter": [
          {
            "range": {
              "#timestamp": {
                "gte": "now-60m"
              }
            }
          }
        ]
      }
    }
  }
})
but I don't get just the source back... If I change "_source": "*" to "_source": true, I still get the same results back.
There is metadata associated with the results that are returned. The * you are indicating in _source only selects fields within _source, not the metadata, which is everything outside the _source object in the JSON payload. The question "Elasticsearch - how to return only data, not meta information?" is, I believe, similar to what you are asking, and it appears that this is not doable, although that question is fairly old and there are newer versions of Elasticsearch out there. Looking at the latest version, which as of this writing is 5.2, it still does not allow this. You will need to parse the returned results of the query yourself.
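As an aside, Elasticsearch also supports the filter_path query parameter, which strips most of the response metadata; the documents still come back nested under hits.hits, so some unwrapping in your own code remains necessary. For example:
GET kafkajmx2/_search?filter_path=hits.hits._source
This returns only the _source objects, but still wrapped in the hits.hits structure.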

Aggregation over "LastUpdated" property or _timestamp

My Elasticsearch mapping looks roughly like this:
{
  "myIndex": {
    "mappings": {
      "myType": {
        "_timestamp": {
          "enabled": true,
          "store": true
        },
        "properties": {
          "LastUpdated": {
            "type": "date",
            "format": "dateOptionalTime"
          }
          /* lots of other properties */
        }
      }
    }
  }
}
So, _timestamp is enabled, and there's also a LastUpdated property on every document. LastUpdated can have a different value than _timestamp: sometimes documents get updated physically (e.g. updates to denormalized data), which updates _timestamp, but LastUpdated remains unchanged because the document hasn't actually been "updated" from a business perspective.
Also, there are many documents without a LastUpdated value (mostly old data).
What I'd like to do is run an aggregation which counts the number of documents per calendar day (kindly ignore the fact that the dates need to be midnight-aligned, please). For every document, use LastUpdated if it's there, otherwise use _timestamp.
Here's what I've tried:
{
  "aggregations": {
    "counts": {
      "terms": {
        "script": "doc.LastUpdated == empty ? doc._timestamp : doc.LastUpdated"
      }
    }
  }
}
The bucketization appears to work to some extent, but the keys in the result look weird:
buckets: [
  {
    key: org.elasticsearch.index.fielddata.ScriptDocValues$Longs#7ba1f463,
    doc_count: 300544
  },
  {
    key: org.elasticsearch.index.fielddata.ScriptDocValues$Longs#5a298acb,
    doc_count: 257222
  },
  {
    key: org.elasticsearch.index.fielddata.ScriptDocValues$Longs#6e451b5e,
    doc_count: 101117
  },
  ...
]
What's the proper way to run this aggregation and get meaningful keys (i.e. timestamps) in the result?
The weird keys are the toString() of the ScriptDocValues objects your script returns; you need to extract the actual values with getValue(). I've tested and made a Groovy script for you:
POST index/type/_search
{
  "aggs": {
    "counts": {
      "terms": {
        "script": "ts=doc['_timestamp'].getValue();v=doc['LastUpdated'].getValue();rv=v?:ts;rv",
        "lang": "groovy"
      }
    }
  }
}
This returns the required result.
Hope this helps!! Thanks!!
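For reference, _timestamp and Groovy scripting were removed in later Elasticsearch versions, so the same fallback idea would need Painless and two ordinary date fields. A hedged sketch, where indexedAt is a hypothetical stand-in for _timestamp:
POST myIndex/_search
{
  "size": 0,
  "aggs": {
    "counts": {
      "terms": {
        "script": {
          "lang": "painless",
          // 'indexedAt' is a hypothetical field; substitute your own fallback date field
          "source": "doc['LastUpdated'].size() > 0 ? doc['LastUpdated'].value.toInstant().toEpochMilli() : doc['indexedAt'].value.toInstant().toEpochMilli()"
        }
      }
    }
  }
}
The keys then come back as plain epoch-millisecond values instead of ScriptDocValues objects.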
