Elasticsearch 7.9 forward slashes

I'm using Elasticsearch 7.9.1 and want to search for "/abc" (including the forward slash) in the field named "Path", as in "mysite.com/abc/xyz". Here's the relevant part of the index template, but it doesn't work:
"Path": {
"type": "text",
"index": false
}
What did I do wrong? Can you please help? Thanks!

They changed the syntax for "not analyzed" text only once (in ES 5), from
{
  "type": "string",
  "index": "not_analyzed"
}
to
{
  "type": "keyword"
}
Also note that "index": false disables indexing entirely, so the field cannot be searched at all. If you want special characters like / to be preserved at indexing time instead of being stripped during analysis, you should use keyword instead of text.
Moreover, if your intent is to search within URLs, you should prefer the wildcard field type, or keep using text with an appropriate custom analyzer that splits your URLs into parts.
If you upgrade to 7.11, you also get access to the URI parts ingest processor, which does all of this work for you.
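For instance, here is a minimal sketch of a keyword mapping plus a wildcard query that matches "/abc" anywhere in the path (the index name my-index is just a placeholder):
PUT /my-index
{
  "mappings": {
    "properties": {
      "Path": {
        "type": "keyword"
      }
    }
  }
}

GET /my-index/_search
{
  "query": {
    "wildcard": {
      "Path": {
        "value": "*/abc*"
      }
    }
  }
}
A leading-wildcard query like this can be slow on large indices; that is exactly the case the wildcard field type was designed for.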

Related

Elasticsearch search yields no results, analyzers might be the issue

Elasticsearch version: 1.6.0
I've been using Elasticsearch for the last few months (I just started) and now I'm running into problems with it. Here is some info about my database:
The index I'm using has the default dynamic mapping (i.e. I haven't tinkered with its mapping), since my objects should be schema-free. The index also uses the default analyzer (I haven't touched that either), so index/_settings looks like this:
{
  "default": {
    "settings": {
      "index": {
        "creation_date": "1441808338958",
        "uuid": "34Yn1_ixSqOzp9UotOE_4g",
        "number_of_replicas": "1",
        "number_of_shards": "1",
        "version": {
          "created": "1060099"
        }
      }
    }
  }
}
Here's the issue I'm having: on some field values the search does not work as expected (I concluded it's because of the analyzer). Example: the field email has the value user#example.com; the query {"query":{"bool":{"must":[{"term":{"user.email":"user#example.com"}}]}}} won't work, but using just "user" as the term value does (because the value gets tokenized, and there is no token containing the full email address).
Here's what I want: both wildcard text searches (e.g. finding a bad word in a comment's text) AND exact searches (on email, for example) on any field; I'll then be using bool and should with either term or wildcard.
The problem is I just can't tell Elasticsearch "ok, on this field you should use the X analyzer", because all my fields are dynamic.
What I've tried: on the index's settings I PUT this: {"analysis":{"analyzer":{"default":{"type":"keyword"}}}}; it doesn't work: nothing changed (and I didn't forget to close the index before doing so and to reopen it afterwards).
Is this issue even related to analyzers?
This query won't work
{"query":{"bool":{"must":[{"term":{"user.email":"user#example.com"}}]}}
A term query is an exact match, meaning the value you supply ("user#example.com" in your case) must match one of the tokens ES has indexed for that field.
When you don't assign an analyzer to a field, ES assumes the standard analyzer for it. When "user#example.com" is indexed, it gets tokenized into pieces ("user", "example", "com"), none of which equals the full address.
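You can check for yourself which tokens are produced with the _analyze API (the JSON body form shown here is for ES 5 and later; on 1.x you would pass the analyzer and text as query-string parameters instead):
POST /_analyze
{
  "analyzer": "standard",
  "text": "user#example.com"
}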
To solve your problem you have to mark the email field as not_analyzed in your index's mapping.
With the help of Ryan Huynh I've solved my issue.
Use a dynamic template; create the index like so:
PUT /index
{
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "string_template": {
            "mapping": {
              "index": "not_analyzed",
              "type": "string"
            },
            "match_mapping_type": "string",
            "match": "*"
          }
        }
      ]
    }
  }
}
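A quick way to verify the template took effect (the type and field names below are only hypothetical examples) is to index a document with a string field and re-run the exact term query:
PUT /index/mytype/1
{
  "email": "user#example.com"
}

GET /index/mytype/_search
{
  "query": {
    "term": {
      "email": "user#example.com"
    }
  }
}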

ElasticSearch results starting with a particular letter?

I have a large elasticsearch database full of records that each have a Name field, which is a single word. I would like to be able to page through the (sorted by Name) results starting at a particular letter. For example, I want to be able to start showing results where Name starts with the letter 'J', and then be able to page through all the remaining results.
This is how Name is currently mapped:
"Name": {
"type": "multi_field",
"fields": {
"name_exact": {
"type": "string",
"index": "not_analyzed"
},
"name_simple": {
"type": "string",
"analyzer": "simple"
},
"name_snow": {
"type": "string",
"analyzer": "snowball"
}
}
}
Is there a query that will let me do this?
You can use a prefix filter (cached by default) or prefix query (not cacheable).
Note that the query string itself is not analyzed.
If you want analysis on the query string, you should change your mapping and add an edge-ngram analyzed field; you can then use it with a match query.
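For example, here is a sketch of a prefix query against the existing not_analyzed name_exact subfield, sorted so you can page through the results from 'J' onward (the from/size values are placeholders):
{
  "query": {
    "prefix": {
      "Name.name_exact": "J"
    }
  },
  "sort": [
    { "Name.name_exact": "asc" }
  ],
  "from": 0,
  "size": 25
}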

How to keep *only* the longest term produced by the PathHierarchy tokenizer in ElasticSearch?

I need to use the PathHierarchy tokenizer during the indexing stage (so I can generate terms like "a", "a/b", "a/b/c").
But during the search stage I would like to keep only the longest term ("a/b/c"). I need this because Kibana uses query_string queries, so the query string itself is analyzed.
(question regarding Kibana queries is here:
do the queries for values analyzed with hierarchical path work correctly in Kibana and ElasticSearch?)
Is it possible to create a custom analyzer which uses the path_hierarchy tokenizer and then applies a filter that keeps only the longest term?
You can use a different analyzer for indexing and for searching. Maybe this mapping can help you:
PUT /myindex
{
  "mappings": {
    "mytype": {
      "properties": {
        "path": {
          "type": "string",
          "index_analyzer": "path_hierarchy",
          "search_analyzer": "keyword"
        }
      }
    }
  }
}
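Note that path_hierarchy is the name of a built-in tokenizer, not of a built-in analyzer, so for the mapping above to work you would first need to define a custom analyzer under that name in the index settings (combined with the mappings block above in the same index-creation request); a minimal sketch:
PUT /myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "path_hierarchy": {
          "type": "custom",
          "tokenizer": "path_hierarchy"
        }
      }
    }
  }
}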

ElasticSearch: what analyzer should be used for searching both URL fragments and exact URL paths?

I want to store a URI in a mapping and I want to make it searchable in the following ways:
Exact match (i.e. if I stored http://stackoverflow.com/questions, then looking up the term http://stackoverflow.com/questions retrieves the item).
A bit like the letter tokenizer: all "words" should be searchable. So searching for questions, stackoverflow or even com will bring back http://stackoverflow.com/questions as a hit.
URL fragments separated by '.' or '/' should still be searchable. So searching for stackoverflow.com will bring back http://stackoverflow.com/questions as a hit.
Searches should be case-insensitive (as with the lowercase filter).
The http://, https://, www. etc. prefixes should be optional when searching. So searching for either http://stackoverflow.com or stackoverflow.com will bring back http://stackoverflow.com/questions as a hit.
Maybe the solution is something like chaining tokenizers. I'm quite new to ES, so this may be a trivial question.
So what kind of analyzer should I use/build to achieve this functionality?
Any help would be greatly appreciated.
You are absolutely correct. You will want to set your field type to multi_field and then create an analyzer for each scenario. At the core, you can then run a multi_match query:
=============type properties===============
{
  "fun_documents": {
    "properties": {
      "url": {
        "type": "multi_field",
        "fields": {
          "keyword": {
            "type": "string",
            "analyzer": "keyword"
          },
          "alphanum_only": {
            "type": "string",
            "analyzer": "my_custom_alpha_num_analyzer"
          },
          "etc": {
            "etc": "etc"
          }
        }
      }
    }
  }
}
==================query=====================
{
  "query": {
    "multi_match": {
      "query": "stackoverflow",
      "fields": [
        "url.keyword",
        "url.alphanum_only",
        "url.optional_fun"
      ]
    }
  }
}
Note that you can get fancy with multi_field aliases and reuse the same name, but this is a simple demonstration.
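The my_custom_alpha_num_analyzer referenced above is not built in; one plausible definition (a sketch under that assumption; the analyzer and index names are placeholders) is a custom analyzer combining the letter tokenizer, which splits on anything that is not a letter, with a lowercase filter for case-insensitive matching:
PUT /fun_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_alpha_num_analyzer": {
          "type": "custom",
          "tokenizer": "letter",
          "filter": ["lowercase"]
        }
      }
    }
  }
}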

How to not-analyze in ElasticSearch?

I've got a field in an ElasticSearch index which I do not want to have analyzed, i.e. it should be stored and compared verbatim. The values will contain letters, numbers, whitespace, dashes, slashes and maybe other characters.
If I do not give an analyzer in my mapping for this field, the default still uses a tokenizer which hacks my verbatim string into chunks of words. I don't want that.
Is there a super simple analyzer which, basically, does not analyze? Or is there a different way of denoting that this field shall not be analyzed?
I only create the index, I don't do anything else. I can use analyzers like "english" for other fields, which seem to be built-in names for pre-configured analyzers. Is there a list of other such names? Maybe there's one fitting my needs (namely doing nothing with the input).
This is my mapping currently:
{
  "my_type": {
    "properties": {
      "my_field1": { "type": "string", "analyzer": "english" },
      "my_field2": { "type": "string" }
    }
  }
}
my_field1 is language-dependent; this seems to work. my_field2 shall be verbatim. I'd like to give an analyzer there which simply does not do anything.
A sample value for my_field2 would be "B45c 14/04".
"my_field2": {
"properties": {
"title": {
"type": "string",
"index": "not_analyzed"
}
}
}
See https://www.elastic.co/guide/en/elasticsearch/reference/1.4/mapping-core-types.html for further info.
This is no longer true since the string type was removed (replaced by keyword and text), as described here. Instead you should use the keyword type with "index": true | false.
For example, OLD:
{
  "foo": {
    "type": "string",
    "index": "not_analyzed"
  }
}
becomes NEW:
{
  "foo": {
    "type": "keyword",
    "index": true
  }
}
This means the field is indexed, but since it is typed as keyword it is implicitly not analyzed. If you would like to have the field analyzed, you need to use the text type.
The keyword analyzer can also be used.
// don't actually use this, use "index": "not_analyzed" instead
{
  "my_type": {
    "properties": {
      "my_field1": { "type": "string", "analyzer": "english" },
      "my_field2": { "type": "string", "analyzer": "keyword" }
    }
  }
}
As noted here: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-keyword-analyzer.html, it makes more sense to mark those fields as not_analyzed.
But the keyword analyzer can be useful when it is set as the default for a whole index.
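For example, a sketch of making keyword the default analyzer for an entire index at creation time (the index name is a placeholder):
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "keyword"
        }
      }
    }
  }
}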
UPDATE: As said in the comments, the string type is no longer supported in 5.x.
