Search for a string that starts with a wildcard in Elasticsearch - elasticsearch

I am building a Kibana dashboard that displays information about X509 certificates. I would like to build a pie chart of certificates that contain a wildcard in their CN or SAN attributes, but I cannot find a query syntax that works.
To match a string like subject.cn: "*.example.net", I tried the following Kibana queries:
subject.cn:/\*./
subject.cn:/^\*./
subject.cn:\*\.
subject.cn:\*.
subject.cn:*.
Could someone point me to the proper syntax? Is this even something ES/Lucene supports?

Analysing *.example.net with the standard analyser will give you a single term of example.net - i.e. the asterisk and first "." have been stripped.
Using not_analyzed will store the complete field *.example.net (as expected!)
If the wildcard is always at the beginning of the CN name then using a simple prefix query will work (I've simplified the field name):
curl -XGET 'http://localhost:9200/mytest/certificates/_search?pretty' -d '{
  "query": {
    "prefix": { "cn.raw": "*" }
  }
}'
However if you want to search against different levels of the domain name you'll need to change the analyser you're using.
E.g. use the pattern analyser and define "." as your delimiter, or possibly create a custom analyser that calls the path hierarchy tokeniser - it's going to depend on how users want to search your data.
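For example, a pattern analyser split on "." might be defined like this (a minimal sketch; the analyser name is illustrative):
"settings": {
  "analysis": {
    "analyzer": {
      "domain_parts": {
        "type": "pattern",
        "pattern": "\\.",
        "lowercase": true
      }
    }
  }
}
With this, *.example.net is indexed as the terms *, example and net, so each level of the domain becomes searchable.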

Thanks to Olly's answer, I was able to find a solution that works. Once the raw fields are defined, the trick is to escape the wildcard so it is treated as a literal character, and to surround it with unescaped wildcards to accept surrounding characters:
ca:false AND (subject.cn.raw:*\** OR x509v3Extensions.subjectAlternativeName.raw:*\**)
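For reference, the raw sub-fields might be mapped like this (ES 1.x/2.x-era multi-field syntax, matching the not_analyzed approach from Olly's answer):
"subject": {
  "properties": {
    "cn": {
      "type": "string",
      "fields": {
        "raw": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}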

Related

Elasticsearch ignore special characters unless quoted

I am making a search tool to query against text fields.
If the user searches for:
6ES7-820
Then I would like to return all documents containing:
6ES7820
6-ES7820
6ES7-820
...
In other words I would like to ignore special characters. I could achieve this by removing the special characters in my search analyzer and my indexing analyzer.
But when the user would search for the same term using quotation marks (or something else):
"6ES7-820"
I want to only return the documents containing
6ES7-820
So then the special characters should not be ignored, which means I cannot simply remove these characters while indexing.
How could this search method be implemented in Elasticsearch, and which analyzers should I use?
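One possible sketch (not from the thread; the index, field and analyzer names are illustrative): index the field twice - once through an analyzer that strips the special characters, once through one that keeps them - and route quoted phrases to the exact sub-field with the quote_field_suffix parameter of the query_string query:
"settings": {
  "analysis": {
    "char_filter": {
      "strip_special": {
        "type": "pattern_replace",
        "pattern": "[^A-Za-z0-9\\s]",
        "replacement": ""
      }
    },
    "analyzer": {
      "loose": {
        "type": "custom",
        "char_filter": ["strip_special"],
        "tokenizer": "standard",
        "filter": ["lowercase"]
      },
      "exact": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": ["lowercase"]
      }
    }
  }
},
"mappings": {
  "properties": {
    "text": {
      "type": "text",
      "analyzer": "loose",
      "fields": {
        "exact": { "type": "text", "analyzer": "exact" }
      }
    }
  }
}
A query such as
{
  "query": {
    "query_string": {
      "query": "\"6ES7-820\"",
      "fields": ["text"],
      "quote_field_suffix": ".exact"
    }
  }
}
then matches 6ES7820, 6-ES7820 and 6ES7-820 when unquoted, but only the literal 6ES7-820 when quoted.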

Matching two words as a single word

Consider that I have a document which has a field with the following content: 5W30 QUARTZ INEO MC 3 5L
A user wants to be able to search for MC3 (no space) and get the document; however, a search for MC 3 (with a space) should also work. Moreover, there can be documents that have the content without spaces, and those should be found when querying with a space.
I tried indexing without spaces (e.g. 5W30QUARTZINEOMC35L), but that does not really work: with a wildcard search I would match too much, e.g. MC35 would also match, and I only want to match two exact words concatenated together (as well as each exact single word).
So far I'm thinking of additionally indexing all pairs of adjacent words, e.g. 5W30QUARTZ, QUARTZINEO, INEOMC, MC3, 35L. However, does Elasticsearch have a native solution for this?
I'm pretty sure what you want can be done with the shingle token filter. Depending on your mapping, I would imagine you'd need to add a filter looking something like this to your content field to get your tokens indexed in pairs:
"filter_shingle":{
"type":"shingle",
"max_shingle_size":2,
"min_shingle_size":2,
"output_unigrams":"true"
}
Note that this is already the default configuration; I only spelled it out for clarity.
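One detail worth noting: by default the shingle filter joins the two tokens with a space, so MC + 3 would produce the shingle "MC 3" rather than "MC3". To also get the concatenated form, you would likely set token_separator to an empty string and wire the filter into a custom analyzer - a sketch (the analyzer name is illustrative):
"analysis": {
  "filter": {
    "filter_shingle": {
      "type": "shingle",
      "max_shingle_size": 2,
      "min_shingle_size": 2,
      "output_unigrams": "true",
      "token_separator": ""
    }
  },
  "analyzer": {
    "shingle_analyzer": {
      "type": "custom",
      "tokenizer": "standard",
      "filter": ["lowercase", "filter_shingle"]
    }
  }
}
5W30 QUARTZ INEO MC 3 5L would then be indexed with the pair tokens 5w30quartz, quartzineo, ineomc, mc3 and 35l alongside the single words, so both MC3 and MC 3 find the document.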

Searching for special characters in elastic

I have a field name in my index with value $$$ LTD
The standard analyser is applied to this field.
I'm trying to search for records with this value as shown below, but nothing is found.
http://localhost:9200/my-index/_search?q=name:$$$
At the same time, when I search for name:"$$$ LTD" it returns all records that contain LTD, as if $$$ were ignored.
I'm quite sure the proper value exists in the index. So how can I search for it?
UPD.
Mapping related to searchable field:
{“name":{"type":"string","boost":4.0,"analyzer”:”nameAnalyzer"}
{"nameAnalyzer":{"filter":["lowercase"],"type":"custom","tokenizer":"standard"}}}
Don't use the special character ($) directly in your URL parameters; URL-encode it instead. For example, $ encodes to %24, so query it this way:
http://localhost:9200/my-index/_search?q=name:%24%24%24
Solved.
The standard tokeniser strips special characters.
I have to define a different type of tokeniser (probably whitespace-based).
More information on this question can be found at:
https://discuss.elastic.co/t/how-to-index-special-characters-and-search-those-special-characters-in-elasticsearch/42506/2
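A sketch of such an analyser, reusing the nameAnalyzer name from the mapping above but with a whitespace tokeniser:
"analysis": {
  "analyzer": {
    "nameAnalyzer": {
      "type": "custom",
      "tokenizer": "whitespace",
      "filter": ["lowercase"]
    }
  }
}
$$$ LTD is then indexed as the terms $$$ and ltd, and the URL-encoded query name:%24%24%24 above matches.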

elasticsearch - fulltext search for words with special/reserved characters

I am indexing documents that may contain any special/reserved characters in their fulltext body. For example
"PDF/A is an ISO-standardized version of the Portable Document Format..."
I would like to be able to search for pdf/a without having to escape the forward slash.
How should I analyze my query string, and what type of query should I use?
The default standard analyzer will tokenize a string like that so that "PDF" and "A" are separate tokens. The "A" token might get cut out by the stop token filter (See Standard Analyzer). So without any custom analyzers, you will typically get any documents with just "PDF".
You can try creating your own analyzer modeled on the standard analyzer that includes a Mapping Char Filter. The idea would be that "PDF/A" gets transformed into something like "pdf_a" at index and query time. A simple match query will then work just fine. But this is a very simplistic approach, and you might want to consider how '/' characters are used in your content and use slightly more complex regex filters, which are also not perfect solutions.
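A sketch of that approach (the analyzer and char filter names are illustrative):
"analysis": {
  "char_filter": {
    "slash_mapping": {
      "type": "mapping",
      "mappings": ["/ => _"]
    }
  },
  "analyzer": {
    "slash_analyzer": {
      "type": "custom",
      "char_filter": ["slash_mapping"],
      "tokenizer": "standard",
      "filter": ["lowercase"]
    }
  }
}
Applied at both index and query time, "PDF/A" becomes the single token pdf_a, so a plain match query for pdf/a finds it without any escaping.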
Sorry, I completely missed your point about having to escape the character. Can you elaborate on your use case if this turns out to not be helpful at all?
To support queries containing reserved characters I now use the Simple Query String Query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html)
Since it does not use a strict query parser it is a bit limited (e.g. no field queries like id:5), but it solves the purpose.
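For example (index and field names are illustrative):
GET my-index/_search
{
  "query": {
    "simple_query_string": {
      "query": "pdf/a",
      "fields": ["body"]
    }
  }
}
simple_query_string never throws a parse error, and characters like / that are reserved in the full query_string syntax are simply treated as part of the text.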

How to search emoticon/emoji in elasticsearch?

I am trying to search for text containing emoticons/emoji in Elasticsearch. Earlier, I inserted tweets into ES. Now I want to search for tweets with, for example, smiling or sad faces. I tried the following:
1) I used the equivalent Unicode value of a smile, but it didn't work. No results were returned.
GET /myindex/twitter_stream/_search
{
  "query": {
    "match": {
      "text": "\u1f603"
    }
  }
}
How do I set up emoji search in Elasticsearch? Do I have to encode the raw tweets before ingesting them into Elasticsearch? What would the query be? Any tried-and-tested approaches? Thanks.
The specification explains how to search for emoji:
Searching includes both searching for emoji characters in queries, and finding emoji characters in the target. These are most useful when they include the annotations as synonyms or hints. For example, when someone searches for ⛽︎ on yelp.com, they see matches for “gas station”. Conversely, searching for “gas pump” in a search engine could find pages containing ⛽︎.
Annotations are language-specific: searching on yelp.de, someone would expect a search for ⛽︎ to result in matches for “Tankstelle”.
You can keep the real Unicode character and expand it to its annotation in each language you aim to support.
This can be done with a synonym filter. But Elasticsearch's standard tokenizer will remove the emoji, so there is quite a lot of work to do:
remove emoji modifier, clean everything up;
tokenize via whitespace;
remove undesired punctuation;
expand the emoji to their synonyms.
The whole process is described here: http://jolicode.com/blog/search-for-emoji-with-elasticsearch (disclaimer: I'm the author).
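A condensed sketch of steps 2 and 4 (the synonym list is illustrative; the full setup is in the post linked above):
"analysis": {
  "filter": {
    "emoji_synonyms": {
      "type": "synonym",
      "synonyms": [
        "😃 => smiley, smile",
        "⛽ => fuel, gas"
      ]
    }
  },
  "analyzer": {
    "emoji_analyzer": {
      "type": "custom",
      "tokenizer": "whitespace",
      "filter": ["emoji_synonyms", "lowercase"]
    }
  }
}
A tweet containing 😃 is then indexed under the terms smiley and smile, so searching for either the emoji or the words finds it.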
The way I have seen emoticons work is that a string is stored in place of their image counterparts when they are stored in a database, e.g. a smile is stored as :smile:. You can verify whether that is the case for you. If so, you can add a custom tokenizer that does not split on colons, so that an exact match for the emoticons can be made. Then, while searching, you just need to convert the emoticon image in the search to the appropriate string, and Elasticsearch will be able to find it. Hope it helps.
