Searching for special characters in elastic

I have a field called name in my index with the value $$$ LTD.
The standard analyser is applied to this field.
I'm trying to search for the record with this value as shown below, but nothing is found:
http://localhost:9200/my-index/_search?q=name:$$$
At the same time, when I search for name:"$$$ LTD" it returns all records that contain LTD, as if $$$ were ignored.
I'm quite sure the value exists in the index. So how can I search for it?
UPD.
Mapping related to searchable field:
{"name":{"type":"string","boost":4.0,"analyzer":"nameAnalyzer"}
{"nameAnalyzer":{"filter":["lowercase"],"type":"custom","tokenizer":"standard"}}}

Do not put the special character $ directly in URL parameters; use its URL-encoded form instead. For example, $ encodes to %24, so search this way:
http://localhost:9200/my-index/_search?q=name:%24%24%24
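For anyone building such URLs in code, here is a minimal Python sketch of the encoding step. The helper name build_search_url is made up for illustration; the host, index and field are taken from the question.

```python
from urllib.parse import quote

def build_search_url(base, index, field, value):
    # Percent-encode the value; safe="" also encodes characters such as "/" and "$".
    return f"{base}/{index}/_search?q={field}:{quote(value, safe='')}"

url = build_search_url("http://localhost:9200", "my-index", "name", "$$$")
print(url)  # http://localhost:9200/my-index/_search?q=name:%24%24%24
```

Encoding only makes sure the characters reach the server intact. Whether they match anything still depends on how the field was analysed.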

Solved.
The standard tokeniser strips special characters.
I have to define a different type of tokeniser (probably whitespace-based).
More information on this question can be found on this page:
https://discuss.elastic.co/t/how-to-index-special-characters-and-search-those-special-characters-in-elasticsearch/42506/2
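To illustrate why $$$ disappears, here is a rough Python sketch. The two functions only imitate a standard-style and a whitespace-style tokeniser locally (they are not Elasticsearch code), and the settings dict is one possible replacement for nameAnalyzer, assuming the built-in "whitespace" tokenizer is what you want.

```python
import json
import re

def standard_like_tokens(text):
    # Rough stand-in for the standard tokenizer + lowercase filter:
    # keep runs of letters, digits and underscores, drop everything else.
    return [t.lower() for t in re.findall(r"\w+", text)]

def whitespace_tokens(text):
    # Stand-in for a whitespace tokenizer + lowercase filter.
    return [t.lower() for t in text.split()]

# Hypothetical replacement for nameAnalyzer from the question.
settings = {
    "analysis": {
        "analyzer": {
            "nameAnalyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["lowercase"],
            }
        }
    }
}

print(standard_like_tokens("$$$ LTD"))  # ['ltd']
print(whitespace_tokens("$$$ LTD"))     # ['$$$', 'ltd']
print(json.dumps(settings, indent=2))
```

With the whitespace variant the $$$ token survives indexing, so a query for it has something to match. The data would need to be reindexed after changing the analyser.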

Related

Matching two words as a single word

Consider that I have a document which has a field with the following content: 5W30 QUARTZ INEO MC 3 5L
A user wants to be able to search for MC3 (no space) and get the document; however, search for MC 3 (with spaces) should also work. Moreover, there can be documents that have the content without spaces and that should be found when querying with a space.
I tried indexing without spaces (e.g. 5W30QUARTZINEOMC35L), but that does not really work as using a wildcard search I would match too much, e.g. MC35 would also match, and I only want to match two exact words concatenated together (as well as exact single word).
So far I'm thinking of additionally indexing all combinations of two words, e.g. 5W30QUARTZ, QUARTZINEO, INEOMC, MC3, 35L. However, does Elasticsearch have a native solution for this?

I'm pretty sure what you want can be done with the shingle token filter. Depending on your mapping, I would imagine you'd need to add a filter looking something like this to your content field to get your tokens indexed in pairs:
"filter_shingle":{
    "type":"shingle",
    "max_shingle_size":2,
    "min_shingle_size":2,
    "output_unigrams":"true"
}
Note that these values are also the filter's defaults; I have only spelled them out for clarity.
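As a rough illustration of what two-word shingles look like for the example document, here is a small Python sketch. It only imitates the idea locally. One assumption to check against your Elasticsearch version: the shingle filter joins tokens with a token_separator that defaults to a single space, so producing mc3 rather than "mc 3" would need an empty separator, which is what the sketch uses.

```python
def shingles(tokens, size=2, separator=""):
    # Original single tokens (like output_unigrams: true) ...
    out = list(tokens)
    # ... plus every run of `size` adjacent tokens joined by `separator`.
    for i in range(len(tokens) - size + 1):
        out.append(separator.join(tokens[i:i + size]))
    return out

tokens = "5W30 QUARTZ INEO MC 3 5L".lower().split()
indexed = shingles(tokens)
print(indexed)
```

Here mc3 and mc are both indexed while mc35 is not, which matches the behaviour asked for in the question.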

elasticsearch - fulltext search for words with special/reserved characters

I am indexing documents that may contain any special/reserved characters in their fulltext body. For example
"PDF/A is an ISO-standardized version of the Portable Document Format..."
I would like to be able to search for pdf/a without having to escape the forward slash.
How should I analyze my query string, and what type of query should I use?

The default standard analyzer will tokenize a string like that so that "PDF" and "A" are separate tokens. The "A" token might get cut out by the stop token filter (See Standard Analyzer). So without any custom analyzers, you will typically get any documents with just "PDF".
You can try creating your own analyzer, modelled on the standard analyzer, that includes a Mapping Char Filter. The idea would be that "PDF/A" gets transformed into something like "pdf_a" at both index and query time, so a simple match query will work just fine. But this is a very simplistic approach: you might want to consider how '/' characters are used in your content and use slightly more complex regex filters, which are also not perfect solutions.
Sorry, I completely missed your point about having to escape the character. Can you elaborate on your use case if this turns out to not be helpful at all?
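The mapping char filter idea above could look roughly like this. The names slash_to_underscore and slash_aware are invented for illustration, and the analyze function is only a local imitation of the analyzer, not Elasticsearch itself.

```python
import re

# Hypothetical analyzer settings: a "mapping" char filter rewrites "/" to "_"
# before the standard tokenizer runs.
settings = {
    "analysis": {
        "char_filter": {
            "slash_to_underscore": {"type": "mapping", "mappings": ["/ => _"]}
        },
        "analyzer": {
            "slash_aware": {
                "type": "custom",
                "char_filter": ["slash_to_underscore"],
                "tokenizer": "standard",
                "filter": ["lowercase"],
            }
        },
    }
}

def analyze(text):
    # Rough local imitation: map "/" to "_", split on non-word characters, lowercase.
    mapped = text.replace("/", "_")
    return [t.lower() for t in re.findall(r"\w+", mapped)]

print(analyze("PDF/A is an ISO-standardized version"))
print(analyze("pdf/a"))  # ['pdf_a']
```

Because the same analyzer runs on the document and on the query, a user typing pdf/a ends up searching for the token pdf_a.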

To support queries containing reserved characters, I now use the Simple Query String Query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html)
Because it does not use the full query parser, it is a bit limited (e.g. no field queries like id:5), but it serves the purpose.
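A request body for that query type might be built like this. This is a sketch only: the field name body is an assumption, and the helper simple_query_body is not part of any client library.

```python
import json

def simple_query_body(user_input, fields):
    # Pass the user's text through unescaped; simple_query_string is designed
    # to tolerate input that the full query_string parser would reject.
    return {
        "query": {
            "simple_query_string": {
                "query": user_input,
                "fields": fields,
            }
        }
    }

body = simple_query_body("pdf/a", ["body"])
print(json.dumps(body))
```

The text is still run through the field's analyzer, so what pdf/a matches depends on how the field was indexed.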

Is there a way to search fhir resources on a text search parameter using wildcards?

I'm trying to search for all Observations where "blood" is associated with the code using:
GET [base]/Observation?code:text=blood
It appears that the search is matching Observations where the associated text starts with "blood" but not matching on associated text that contains "blood".
Using the following, I get results with a Coding.display of "Systolic blood pressure" but I'd like to also get these Observations by searching using the text "blood".
GET [base]/Observation?code:text=sys
Is there a different modifier I should be using or wildcards I should use?

The servers seem to do as the spec requests: when using the modifier :text on a token search parameter (like code here), the spec says:
":text The search parameter is processed as a string that searches
text associated with the code/value"
If we look at how a server is supposed to search a string, we find:
"By default, a field matches a string query if the value of the field
equals or starts with the supplied parameter value, after both have
been normalized by case and accent."
Now, if code had been a true string search parameter, we could have applied the :contains modifier. However, modifiers cannot be stacked, so although code:text:contains would be logical here, it is not part of the current specification.
So, I am afraid that there is currently no "standard" way to do what you want.
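The difference between the default string matching quoted above and a contains match can be sketched in Python. This is only an illustration of the rule as quoted, not code from any FHIR server, and the accent handling shown is one possible interpretation of "normalized by case and accent".

```python
import unicodedata

def normalize(s):
    # Strip accents via Unicode decomposition, then fold case.
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def matches_default(field_value, param):
    # Default string search: equals or starts with the parameter value.
    return normalize(field_value).startswith(normalize(param))

def matches_contains(field_value, param):
    # What a :contains modifier would do: match anywhere in the value.
    return normalize(param) in normalize(field_value)

display = "Systolic blood pressure"
print(matches_default(display, "sys"))     # True
print(matches_default(display, "blood"))   # False
print(matches_contains(display, "blood"))  # True
```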

Amazon Cloudsearch not searching with partial string

I'm testing Amazon CloudSearch for my web application and I'm running into some strange issues.
I have the following domain indexes: name, email, id.
For example, I have data such as: John Doe, John@example.com, 1
When I search for jo I get nothing. If I search for joh I still get nothing, but if I search for john I get the above document as a hit. Why does it not match when I enter partial strings? I even put suggesters on name and email with fuzzy matching enabled. Is there something else I'm missing? I have read the following:
http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-text.html
http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching.html
http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-compound-queries.html
I'm doing the searches using boto as well as with the form on AWS page.

What you're trying to do -- finding "john" by searching "jo" -- is called a prefix search.
You can accomplish this either by searching
(prefix field=name 'jo')
or
q=jo*
Note that if you use the q=jo* method of appending * to all your queries, you may want to do something like q=jo* |jo because john* will not match john.
This can seem a little confusing, but imagine if Google gave back results for prefix matches: if you searched for tort and got back a mess of results about tortoises and torture instead of tort (a legal term), you would be very confused (and frustrated).
A suggester is also a viable approach, but it gives you back suggestions (like john, jordan and jostle) rather than results, and you would then need to search for those; it does not return matching documents to you.
See "Searching for Prefixes in Amazon CloudSearch" at http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-text.html
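The two query forms above can be assembled into request parameters with a short Python sketch. The helper names are invented, and the quote escaping in the structured form follows my understanding of the structured syntax, so check it against the CloudSearch documentation before relying on it.

```python
from urllib.parse import parse_qs, urlencode

def structured_prefix_query(field, prefix):
    # Escape backslashes and single quotes inside the quoted string.
    escaped = prefix.replace("\\", "\\\\").replace("'", "\\'")
    return urlencode({
        "q": f"(prefix field={field} '{escaped}')",
        "q.parser": "structured",
    })

def simple_prefix_query(term):
    # Match either the prefix or the exact term, as suggested above.
    return urlencode({"q": f"{term}* | {term}"})

structured = structured_prefix_query("name", "jo")
simple = simple_prefix_query("jo")
print(structured)
print(simple)
# Decoding the parameters again shows the query text that will be sent.
print(parse_qs(structured)["q"][0])  # (prefix field=name 'jo')
```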

Are your index field types "Text"? If they are just "Literals", they have to be an exact match.

I think you must have your name and email fields set as the literal type instead of the text type, otherwise a simple text search of 'jo' or 'Joh' should've found the example document.
While using a prefix search may have solved your problem (and that makes sense if the fields are set as the literal type), the accepted answer isn't really correct. The notion that it's "like a google search" isn't based on anything in the documentation. It actually contradicts the example they use, and in general muddies up what's possible with the service. From the docs:
When you search text and text-array fields for individual terms, Amazon CloudSearch finds all documents that contain the search terms anywhere within the specified field, in any order. For example, in the sample movie data, the title field is configured as a text field. If you search the title field for star, you will find all of the movies that contain star anywhere in the title field, such as star, star wars, and a star is born. This differs from searching literal fields, where the field value must be identical to the search string to be considered a match.

Search for a string that starts with a wildcard in ElasticSearch

I am building a kibana dashboard that displays information about X509 certificates. I would like to build a pie chart of certificates that contain a wildcard in their CN or SAN attributes, but I cannot find a query syntax that works.
To match a string like subject.cn: "*.example.net", I tried the following kibana queries:
subject.cn:/\*./
subject.cn:/^\*./
subject.cn:\*\.
subject.cn:\*.
subject.cn:*.
Could someone point me to the proper syntax? Is this even something ES/Lucene supports?

Analysing *.example.net with the standard analyser will give you a single term of example.net - i.e. the asterisk and first "." have been stripped.
Using not_analyzed will store the complete field *.example.net (as expected!)
If the wildcard is always at the beginning of the CN name then using a simple prefix query will work (I've simplified the field name):
curl -XGET 'http://localhost:9200/mytest/certificates/_search?pretty' -d '{
    "query": {
        "prefix": { "cn.raw": "*" }
    }
}'
However if you want to search against different levels of the domain name you'll need to change the analyser you're using.
E.g. use the pattern analyser and define "." as your delimiter, or possibly create a custom analyzer that calls the path hierarchy tokenizer - it's going to depend on how users want to search your data.
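To make the path hierarchy suggestion concrete, here is a plain Python sketch of the terms such a tokenizer could emit for a certificate CN. It assumes the tokenizer is configured with "." as the delimiter and in reverse mode, neither of which the answer specifies, and it only imitates the behaviour locally rather than calling Elasticsearch.

```python
def reverse_path_hierarchy(value, delimiter="."):
    parts = value.split(delimiter)
    # Emit the full value, then drop one leading label at a time.
    return [delimiter.join(parts[i:]) for i in range(len(parts))]

terms = reverse_path_hierarchy("*.example.net")
print(terms)  # ['*.example.net', 'example.net', 'net']
```

With terms like these, a search for example.net would find both the wildcard certificate and certificates for hosts under that domain.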

Thanks to Olly's answer, I was able to find a solution that works. Once the raw fields are defined, the trick is to escape the wildcard so that it is treated as a literal character, and to surround it with unescaped wildcards so that surrounding characters are accepted:
ca:false AND (subject.cn.raw:*\** OR x509v3Extensions.subjectAlternativeName.raw:*\**)
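If the literal text comes from user input, the escaping step can be done in code. The following Python sketch uses invented helper names and a reserved-character list based on the Lucene query string syntax; it builds the same kind of clause as the query above.

```python
# Characters treated as operators by the Lucene query string syntax.
RESERVED = set('+-=&|><!(){}[]^"~*?:\\/')

def escape_literal(text):
    # Put a backslash in front of every reserved character.
    return "".join("\\" + c if c in RESERVED else c for c in text)

def contains_literal(field, text):
    # Escaped literal in the middle, unescaped wildcards on both sides.
    return f"{field}:*{escape_literal(text)}*"

clause = contains_literal("subject.cn.raw", "*")
print(clause)  # subject.cn.raw:*\**
```

Leading wildcards like this can be slow on large indices, so it is worth testing the query on realistic data.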
