Related to the previous question, how can we format a string-escaped value? - Elasticsearch

Related to the previous question:
How to make the _source field dynamic?
I was able to make the search template's _source field dynamic from the front-end, but because the result was invalid JSON, I had to collapse it into a single escaped string, which is very hard to read. Is there any way to keep it in a readable form? I tried adding \ after each new line (as suggested for Ruby) but could not get it working.
"source": "{\"query\":{\"bool\":{\"must\":{\"match\":{\"line\":\"{{text}}\"}},\"filter\":{{{#line_no}}\"range\":{\"line_no\":{{{#start}}\"gte\":\"{{start}}\"{{#end}},{{/end}}{{/start}}{{#end}}\"lte\":\"{{end}}\"{{/end}}}}{{/line_no}}}}}}"
This is the string query, which is saved in a YML file.
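For reference, here is the same "source" string with the escaping removed and the JSON pretty-printed (this is only a re-rendering for readability; Elasticsearch still requires "source" to be a single JSON string):
{
  "query": {
    "bool": {
      "must": {
        "match": { "line": "{{text}}" }
      },
      "filter": {
        {{#line_no}}
        "range": {
          "line_no": {
            {{#start}}"gte": "{{start}}"{{#end}},{{/end}}{{/start}}
            {{#end}}"lte": "{{end}}"{{/end}}
          }
        }
        {{/line_no}}
      }
    }
  }
}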
I tried a Ruby multiline string, but it still gives a parsing error.
I created a template.yml file and stored the template as given below:
template: |
  {
    "script": {
      "lang": "mustache",
      "source": '{'\
        '"_source": {{#toJson}}fields{{/toJson}}'\
      '}'\
    }
  }
I also tried replacing the single quotes with double quotes, and backticks did not help either.
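For what it's worth, YAML's literal block scalar (|) already preserves newlines, so no backslash line continuations or single quotes should be needed; the constraint is only that the value of "source" must itself remain a single JSON string, with its inner quotes escaped. A minimal sketch of what template.yml could look like under that assumption (the fields parameter is taken from the attempt above):
template: |
  {
    "script": {
      "lang": "mustache",
      "source": "{\"_source\": {{#toJson}}fields{{/toJson}}}"
    }
  }
The surrounding JSON can then be laid out freely across lines, while the escaping is confined to the one-line "source" value.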

Elasticsearch filename search not working with dots in filename

I have an Elasticsearch mapping as follows:
{
  "info": {
    "properties": {
      "timestamp": { "type": "date", "format": "epoch_second" },
      "user": { "type": "keyword" },
      "filename": { "type": "text" }
    }
  }
}
When I run a match query on filename, it works properly when I don't include a dot in the search input, but when a dot is included, it returns many false results.
I learned that the standard analyzer is the issue: it breaks the search input on dots and then searches on the parts. What analyzer can I use in this case? There can be millions of filenames, and I don't want something that takes a lot of memory and time. Please suggest.
As you are talking about filenames here, I would suggest using the keyword analyzer. It will not split the string and will just index it as it is.
You could also simply change your mapping from text to keyword instead.
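A minimal sketch of both options, reusing the mapping from the question (only the filename property changes):
{
  "info": {
    "properties": {
      "filename": { "type": "keyword" }
    }
  }
}
or, keeping the text type but with the keyword analyzer:
{
  "info": {
    "properties": {
      "filename": { "type": "text", "analyzer": "keyword" }
    }
  }
}
Either way the filename is indexed as a single unbroken term, so dots no longer split it.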

Elasticsearch - text type regexp

Does Elasticsearch support regex search on a text-type string?
I created a document like the one below:
{
  "T": "a$b$c$d"
}
and I tried to search for this document with the query below:
{
  "query": {
    "query_string": {
      "query": "T:/a.*/"
    }
  }
}
That seems to work for me, BUT when I try to query with the '$' symbol, it is unable to find the document:
{
  "query": {
    "query_string": {
      "query": "T:/a$.*/"
    }
  }
}
What should I do to find the document? This field should be text type (not keyword), since the data can be longer than the keyword maximum length.
You should be aware of some things here:
If your field is analyzed (and tokenized in the process), you will only find matches against individual tokens (not the whole text) that match your RegExp. If you want the whole content of the field to match, you must use a keyword field, or at least a keyword analyzer that doesn't tokenize the text.
The $ symbol has a special meaning in regular expressions (it marks the end of a string), so you'll have to escape it: a\$.*
Your RegExp must match a whole token to get a hit. That's also why there's no point in using $ as a (non-escaped) RegExp symbol: your RegExp must match a whole token from beginning to end anyway. So (to stick to your example) to match documents where a is followed by c, you'd need .*?a[^c]*c.*, or if you need the $s in there, escape them: .*?a\$[^c]*c\$.*
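Putting that together, a sketch of the escaped query (note that inside a JSON body each backslash must itself be escaped, so \$ is written \\$; the field name T comes from the question):
{
  "query": {
    "query_string": {
      "query": "T:/a\\$.*/"
    }
  }
}
Whether this matches a whole-field value like a$b$c$d still depends on the analyzer, as described in the first point above.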

Not able to parse string to date in Logstash/Elasticsearch

I created a Logstash script to read a log file whose timestamps have the format "2018-05-08T12:18:53.506+0530". I am trying to parse them into dates using the date filter in Logstash:
date {
  match => ["edrTimestamp", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", "ISO8601"]
  target => "edrTimestamp"
}
Running the above Logstash script creates an Elasticsearch index, but the string is still not parsed into a date, and the index shows a date-parse exception.
It creates output like this:
{
  "tags": [
    "_dateparsefailure"
  ],
  "statusCode": "805",
  "campaignRedemptionLimitTotal": 1000,
  "edrTimestamp": "2018-05-22T16:41:25.162+0530 ",
  "msisdn": "+919066231327",
  "timestamp": "2018-05-22T16:41:25.122+0530",
  "redempKeyword": "print1",
  "campaignId": "C910101-1527004962-1582",
  "category": "RedeemRequestReceived"
}
Please tell me what's wrong with the above code. I have tried many alternatives, but it is still not working.
Your issue is that your timestamp has a space at the end of it, "edrTimestamp": "2018-05-22T16:41:25.162+0530 ", which is causing the date parsing to fail. You need to add a:
mutate {
  strip => ["edrTimestamp"]
}
before your date filter.
I don't think you should be quoting the Z. Your time is not Zulu (zero offset), so the offset needs to be part of the pattern, and an unquoted Z matches an offset such as +0530. Since your timestamp uses a period before the milliseconds, you probably want something like:
yyyy-MM-dd'T'HH:mm:ss.SSSZ
The Heroku grok debug app is useful for checking this.
If I pass your string
2018-05-08T12:18:53.506+0530
and use the pattern %{TIMESTAMP_ISO8601}, then it matches. This pattern is made up of the following sub-patterns:
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
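Combining both answers, a sketch of the corrected filter section (the field name edrTimestamp comes from the question; stripping the trailing space before parsing is the important part, and the unquoted Z matches the +0530 offset):
mutate {
  strip => ["edrTimestamp"]
}
date {
  match => ["edrTimestamp", "yyyy-MM-dd'T'HH:mm:ss.SSSZ", "ISO8601"]
  target => "edrTimestamp"
}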

How do you search for exact terms (which may include special characters) with trailing/leading wildcard matching in Elasticsearch?

I am trying to figure out how to create Elasticsearch queries that allow for exact matches containing reserved characters while supporting trailing or leading wildcard expansion. I am using Logstash dynamic templates, which automatically create a raw field for each of my terms.
To sum up as concisely as possible, I want to create queries that can support two generic types of matching across all values:
Searching terms such as 'abc' to return results like 'abc.xyz.com'. In this case, the standard analyzer keeps 'abc.xyz.com' together as a single token, and wildcard matching succeeds using the following query:
{
  "query": {
    "wildcard": {
      "_all": "*abc*"
    }
  }
}
Searching terms such as full paths like '/Intel/1938138191(1).zip' to return results like 'C:/Program Files (x86)/Intel/1938138191(1).zip'. In this case, even if I backslash-escape all of the reserved characters, a wildcard match like
{
  "query": {
    "wildcard": {
      "_all": "*/Intel/1938138191(1).zip*"
    }
  }
}
will not work. This is because _all defaults to the standard analyzer, so the path is split into tokens and an exact match cannot be made. However, if I specifically query the raw field as below (whether I escape the special characters or not), I get the correct result:
{
  "query": {
    "wildcard": {
      "field.raw": "*/Intel/1938138191(1).zip*"
    }
  }
}
So my question is: is there any way to run wildcard queries against both the tokens produced by the standard analyzer and the raw fields that are not analyzed at all, in one query? That is, some way of generically wrapping searched terms so that both of my examples above return the correct result? For reference, I am using Elasticsearch version 1.7. I have also tried looking into query-string matching and term matching, all to no avail.
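One approach worth trying (a sketch only, not a verified answer for 1.7; field.raw stands in for whichever raw sub-field the Logstash template generated) is to put both wildcard clauses into a bool query, so a hit on either the analyzed _all field or the non-analyzed raw field is enough:
{
  "query": {
    "bool": {
      "should": [
        { "wildcard": { "_all": "*/Intel/1938138191(1).zip*" } },
        { "wildcard": { "field.raw": "*/Intel/1938138191(1).zip*" } }
      ],
      "minimum_should_match": 1
    }
  }
}
The _all clause covers terms that survive standard analysis (like abc), while the raw-field clause covers exact values containing reserved characters.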

Elasticsearch escape hyphenated field in groovy script

I am attempting to add a field to a document, doing something similar to https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#_scripted_updates. However, I appear to be running into issues because the field name is hyphen-separated (the hyphen appears to be treated as a minus sign) rather than underscore-separated.
Example body below:
{"script":"ctx._source.path.to.hyphen-separated-field = \"new data\""}
I attempted to escape the hyphens with a backslash, but had no luck.
You can access the field using square brackets, i.e. simply do it like this:
{"script": "ctx._source.path.to['hyphen-separated-field'] = \"new data\""}
This one worked for me on 2.x (and maybe other versions as well):
{
  "script": {
    "inline": "ctx._source.path.to[field] = val",
    "params": {
      "val": "This is the new value",
      "field": "hyphen-separated-field"
    }
  }
}
Or this will also work:
{"script": "ctx._source.path.to.'hyphen-separated-field' = 'new data'"}
