Client-side validation of Elasticsearch query string - elasticsearch

I have an application that uses NEST (the Elasticsearch .NET client) to communicate with an Elasticsearch cluster. The integration allows the user to supply the input for the "query_string" portion of a query.
The user may enter an invalid query, say "AND", which is invalid because the predicate is incomplete. But the error message that comes back from Elasticsearch is exceedingly verbose and contains terminology that isn't very user-friendly, like "all shards failed".
Is there a way I can offer the user a more meaningful error message (say, "bad predicate")? Ideally, the user's search string would be validated without an Elasticsearch round-trip, but I'll settle for a simpler error message however I can get it.

The error message returned by Elasticsearch is verbose, but for parsing errors like these, Elasticsearch throws a QueryParsingException. If you examine the error message closely, you'll find the string QueryParsingException towards the end of it; this exception (and its message) is the part you're interested in. For example, when I spelt must as mus2t in a search request, I got a huge error message from Elasticsearch, and below is the last part of it:
QueryParsingException[[<index name>] bool query does not support [mus2t]]; }]
You can parse this fragment out of the full error message and show it to the user instead.
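The extraction idea is language-agnostic; here is a minimal sketch in Go of pulling the last QueryParsingException[...] fragment out of the verbose error text (the regex and function name are my own, not part of any client library):

```go
package main

import (
	"fmt"
	"regexp"
)

// extractParseError pulls the message out of the last
// "QueryParsingException[...];" fragment in a verbose Elasticsearch error.
// ok is false when no such fragment is present.
func extractParseError(raw string) (msg string, ok bool) {
	// Greedy (.*) backtracks to the last "];", which keeps the
	// nested [...] parts of the message intact.
	re := regexp.MustCompile(`QueryParsingException\[(.*)\];`)
	matches := re.FindAllStringSubmatch(raw, -1)
	if len(matches) == 0 {
		return "", false
	}
	return matches[len(matches)-1][1], true
}

func main() {
	raw := `... lots of shard noise ... QueryParsingException[[myindex] bool query does not support [mus2t]]; }]`
	if msg, ok := extractParseError(raw); ok {
		fmt.Println(msg) // prints: [myindex] bool query does not support [mus2t]
	}
}
```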

You can use the validate API.
For the following query:
var validateResponse = client.Validate<Document>(descriptor => descriptor
    .Explain()
    .Query(query => query
        .QueryString(qs => qs
            .OnFields(f => f.Name)
            .Query("AND"))));
you will get
org.elasticsearch.index.query.QueryParsingException: [indexname]
Failed to parse query [AND];
org.apache.lucene.queryparser.classic.ParseException: Cannot parse
'AND': Encountered " <AND> "AND "" at line 1, column 0. Was expecting
one of:
<NOT> ...
"+" ...
"-" ...
<BAREOPER> ...
"(" ...
"*" ...
<QUOTED> ...
<TERM> ...
<PREFIXTERM> ...
<WILDTERM> ...
<REGEXPTERM> ...
"[" ...
"{" ...
<NUMBER> ...
<TERM> ...
"*" ...
Still not perfect for the end user, and it requires a round-trip to Elasticsearch, but maybe it will be helpful.
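If NEST isn't a hard requirement, the same check is exposed over REST as GET /<index>/_validate/query?explain=true, and the parse error sits in explanations[].error of the JSON response. Below is a sketch in Go of extracting that field, run against a canned response body rather than a live cluster (the JSON field names follow the validate API; the struct and function names are my own):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validateResponse mirrors the parts of the
// _validate/query?explain=true response we care about.
type validateResponse struct {
	Valid        bool `json:"valid"`
	Explanations []struct {
		Index string `json:"index"`
		Valid bool   `json:"valid"`
		Error string `json:"error"`
	} `json:"explanations"`
}

// firstParseError returns the first per-index error message,
// or "" when the query is valid.
func firstParseError(body []byte) (string, error) {
	var r validateResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if r.Valid {
		return "", nil
	}
	for _, e := range r.Explanations {
		if !e.Valid && e.Error != "" {
			return e.Error, nil
		}
	}
	return "query is invalid (no explanation returned)", nil
}

func main() {
	body := []byte(`{"valid":false,"explanations":[{"index":"myindex","valid":false,"error":"Failed to parse query [AND]"}]}`)
	msg, _ := firstParseError(body)
	fmt.Println(msg) // prints: Failed to parse query [AND]
}
```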

Related

Cannot Parse The "search_phase_execution_exception" Into Go Structure *elastic.Error And Get The Root Cause

My department uses olivere/elastic v7.0.26 as its Elasticsearch client. Sometimes our front-end system returns an error message such as "Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]" without any other useful message to help troubleshooting, so I searched GitHub for how to parse errors into *elastic.Error, and the code looks like this:
if err != nil {
	log.WithFields(ctx, log.Fields{}).WithError(err).Warn("list query es error")
	if ex, ok := err.(*elastic.Error); ok {
		log.WithFields(ctx, log.Fields{"query": query, "status": ex.Status, "detail": ex.Details}).WithError(ex).Warnf("list query es err")
	}
	return res, err
}
But the strange thing is that "list query es error" was printed in our log system while the next log line, "list query es err", was not. When I force an error with deep paging (a large from+size), the assertion works and prints the elastic error with its root cause, so I get the max_result_window tip. In this case, though, an unparsable error comes back as search_phase_execution_exception. Company code is not allowed to be pasted to open-source websites, so I just want to know: what errors can make Elasticsearch return search_phase_execution_exception with code 400? Really appreciate your help!
Same as the description above, but one point I need to add: the problem does not always happen, so I can exclude index-mapping field-type errors such as "text/keyword". I just want to know the whole set of errors/exceptions Elasticsearch can return, but I cannot find the relevant documentation in the Elastic guide; it's OK if you can provide the guide/doc about this.
Uhh, it seems like a joke XD. I had never learned this troubleshooting technique, but after I asked this question, I realized that I could get the Elasticsearch server log. I asked our company's SRE and got the log: it was an error about a terms query exceeding the maximum number of terms. The cause was our RPC interface returning an exceptional result for a nil-data Redis set key (a big-key problem), but we found this and fixed it.

Can a logstash filter error be forwarded to elastic?

I'm having these json parsing errors from time to time:
[2022-01-07T12:15:19,872][WARN ][logstash.filters.json ] Error parsing json
{:source=>"message", :raw=>" { the invalid json }", :exception=>#<LogStash::Json::ParserError: Unrecognized character escape 'x' (code 120)
Is there a way to get the :exception field in the logstash config file?
I opened the exact same thread on the Elastic forum and got a working solution there. Thanks to @Badger on the forum, I ended up using the following raw ruby filter:
ruby {
  code => '
    @source = "message"
    source = event.get(@source)
    return unless source
    begin
      parsed = LogStash::Json.load(source)
    rescue => e
      event.set("jsonException", e.to_s)
      return
    end
    @target = "jsonData"
    if @target
      event.set(@target, parsed)
    end
  '
}
which extracts the info I needed:
"jsonException" => "Unexpected character (',' (code 44)): was expecting a colon to separate field name and value\n at [Source: (byte[])\"{ \"baz\", \"oh!\" }\r\"; line: 1, column: 9]",
Or, as the author of the solution suggested, get rid of the @target part and use the normal json filter for the rest of the data.

Logstash Grok Parser not working for error logs

I am trying to parse error logs using Logstash to capture a few fields, especially the error message, but I am unable to capture it. Below are the actual error message and the parser I wrote:
12345 http://google.com 2017-04-17 09:02:43.065 ERROR 10479 --- [http-nio-8052-exec-2] com.utilities.TokenUtils : Error
org.xml.SAXParseException: An invalid XML character (Unicode: 0xe) was found in the value of attribute "ID" and element is "saml".
at org.apache.parsers.DOMParser.parse(Unknown Source)
at org.apache.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
at com.utilities.TokenUtils.validateSignature(TokenUtils.java:99)
Parser:
`%{NOTSPACE:stnum}\s*%{NOTSPACE:requestURL}\s*%{TIMESTAMP_ISO8601:log_timestamp}\s*%{LOGLEVEL:loglevel}\s*%{NUMBER:pid}\s*---\s*\[(?<thread>[A-Za-z0-9-]+)\]\s*%{DATA:class}\s*:\s%{NOTSPACE:level}\s*(?<errormessage>.[^\n]*).[^\n]*`
I am trying to capture this message from the log:
org.xml.SAXParseException: An invalid XML character (Unicode: 0xe) was found in the value of attribute "ID" and element is "saml".
Which Logstash parser are you using? Please provide the whole conf file, which would give us more info. Here's a sample to parse the exception type from your logs (using the grok filter):
filter {
  grok {
    match => ["message", "%{DATA:errormessage} %{GREEDYDATA:EXTRA}"]
  }
}
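Note that the SAXParseException the asker wants is on the second line of a multiline event, so grok can only see it after the stack-trace lines have been joined into one message (e.g. with the multiline codec). Assuming that join has happened, a pattern along these lines, matching across newlines with (?m), could capture it (this is an untested sketch; the field names mirror the asker's pattern):

```
filter {
  grok {
    match => {
      "message" => "(?m)%{NOTSPACE:stnum}\s+%{NOTSPACE:requestURL}\s+%{TIMESTAMP_ISO8601:log_timestamp}\s+%{LOGLEVEL:loglevel}\s+%{NUMBER:pid}\s+---\s+\[%{DATA:thread}\]\s+%{DATA:class}\s+:\s+%{DATA:level}\n(?<errormessage>[^\r\n]+)"
    }
  }
}
```

The custom (?<errormessage>[^\r\n]+) capture grabs the first line after the header, which in the sample log is the org.xml.SAXParseException line.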

fluent-plugin-elasticsearch: "Could not push log to Elasticsearch" error with "error"=>{"type"=>"mapper_parsing_exception"}

When I inject data collected by Fluentd into Elasticsearch using fluent-plugin-elasticsearch, some data causes the following error:
2017-04-09 23:47:37 +0900 [error]: Could not push log to Elasticsearch: {"took"=>3, "errors"=>true, "items"=>[{"index"=>{"_index"=>"logstash-201704", "_type"=>"ruby", "_id"=>"AVtTLz_cUzkwT9CQCxrH", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [message]", "caused_by"=>{"type"=>"illegal_state_exception", "reason"=>"Can't get text on a START_OBJECT at 1:27"}}}}, .....]}
It seems that Elasticsearch rejected the data because of failed to parse [message] and Can't get text on a START_OBJECT at 1:27, but I cannot see what data was sent to Elasticsearch or what's wrong with it.
Any ideas?
fluent-plugin-elasticsearch uses the _bulk API to send data. I put request-dumping code in /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-api-5.0.4/lib/elasticsearch/api/actions/bulk.rb as follows:
def bulk(arguments={})
  ...
    payload = body
  end
  $log.info([method, path, params, payload].inspect) # <=== here ($log is the global logger of fluentd)
  perform_request(method, path, params, payload).body
And I found that the request sent to Elasticsearch was the following:
POST /_bulk
{"index":{"_index":"logstash-201704","_type":"ruby"}}
{"level":"INFO","message":{"status":200,"time":{"total":46.26,"db":33.88,"view":12.38},"method":"PUT","path":"filtered","params":{"time":3815.904,"chapter_index":0},"response":[{}]},"node":"main","time":"2017-04-09T14:39:06UTC","tag":"filtered.console","#timestamp":"2017-04-09T23:39:06+09:00"}
The problem is that the message field contains a JSON object, although that field is mapped as an analyzed string in Elasticsearch, so the bulk request fails with mapper_parsing_exception.
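If the object can't be flattened upstream, one way out is to serialize message back into a string before it reaches Elasticsearch. A sketch using fluentd's record_transformer filter (the match tag filtered.console is taken from the dumped request above; whether to_json fits your record layout is an assumption):

```
<filter filtered.console>
  @type record_transformer
  enable_ruby true
  <record>
    message ${record["message"].is_a?(Hash) ? record["message"].to_json : record["message"]}
  </record>
</filter>
```

After this filter, the message field is a plain string and the analyzed-string mapping accepts it; the trade-off is that the inner fields are no longer individually searchable.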

umbraco lucene syntax for not

I am trying to write a lucene query to retrieve some pages in my website so I have the following:
string.Format("nodeName: ({0})^7 bodyText: ({0})^6", _searchTerm)
which means it will search for content that either has the nodeName or the bodyText that includes the _searchTerm variable
Where I am struggling is that I also want it to exclude any results that have a hideInNav flag set to 1, so I tried:
string.Format("nodeName: ({0})^7 bodyText: ({0})^6 +hideInNav: NOT(1)", _searchTerm)
However this is throwing up the following error:
Encountered " <NOT> "NOT "" at line 1, column 140.
Was expecting one of:
"(" ...
"*" ...
<QUOTED> ...
<TERM> ...
<PREFIXTERM> ...
<WILDTERM> ...
"[" ...
"{" ...
<NUMBER> ...
As far as I can tell, the query does have a ( after the NOT, so I'm stumped as to where this is being expected.
Try this query:
string.Format("nodeName: ({0})^7 bodyText: ({0})^6 !hideInNav: (1)", _searchTerm)
The exclamation mark can also be changed to NOT:
string.Format("nodeName: ({0})^7 bodyText: ({0})^6 NOT hideInNav: (1)", _searchTerm)
See this page for an overview of the Lucene query syntax (it's not the current version, but I doubt it has changed a lot).
Edit: maybe reversing your hideInNav statement will fix it:
string.Format("nodeName: ({0})^7 bodyText: ({0})^6 +hideInNav: (0)", _searchTerm), i.e. checking whether it is zero instead.
You might also want to download Luke to inspect the content of your index and see how values are stored.
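One more caveat, hedged since I haven't run it against this index: in classic Lucene syntax a query consisting only of negative clauses matches nothing, and mixing NOT with purely optional (SHOULD) clauses can behave unexpectedly. A form that usually works is the minus (prohibited-clause) operator alongside the optional clauses:

```
nodeName:({0})^7 bodyText:({0})^6 -hideInNav:1
```

This keeps the two boosted clauses optional and simply excludes any hit where hideInNav is 1, without introducing a standalone NOT clause.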
