Elasticsearch search yields no results; analyzers might be the issue

Elasticsearch version: 1.6.0
I've been using Elasticsearch for the last few months (I just started) and now I'm running into problems with it. Here is some info about my database:
The index I'm using has the default dynamic mapping (i.e., I haven't tinkered with it). My objects should be schema-free. The index also uses the default analyzer (I haven't touched that either), so index/_settings looks like this:
{
  "default": {
    "settings": {
      "index": {
        "creation_date": "1441808338958",
        "uuid": "34Yn1_ixSqOzp9UotOE_4g",
        "number_of_replicas": "1",
        "number_of_shards": "1",
        "version": {
          "created": "1060099"
        }
      }
    }
  }
}
Here's the issue I'm having: on some field values the search does not work as expected (I concluded it's because of the analyzer). Example: the field email has the value user@example.com; the query {"query":{"bool":{"must":[{"term":{"user.email":"user@example.com"}}]}}} returns nothing, but using just "user" as the term value works (the address gets tokenized, and no single token contains the full email address).
Here's what I want: both wildcard text searches (e.g., finding a bad word in a comment's text) AND strict searches (e.g., on email) on any field; I'll then use bool and should with either term or wildcard.
The problem is I can't just tell Elasticsearch "OK, on this field you should use the X analyzer", because all my fields are dynamic.
What I've tried: on the index's settings I PUT this: {"analysis":{"analyzer":{"default":{"type":"keyword"}}}}. It doesn't work: nothing changed (and no, I didn't forget to close the index before doing so and reopen it afterwards).
Is this issue even related to analyzers?

This query won't work:
{"query":{"bool":{"must":[{"term":{"user.email":"user@example.com"}}]}}}
A term query is an exact match: the value you supply for that field ("user@example.com" in your case) must exactly match one of the tokens ES has indexed for that field.
When you don't assign an analyzer to a field, ES assumes the standard analyzer for it. So when "user@example.com" is indexed, it gets tokenized into ("user", "example", "com").
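You can check the tokenization yourself with the _analyze API (the index name here is a placeholder; the 1.x query-parameter form is shown):

```json
GET /index/_analyze?analyzer=standard&text=user@example.com
```

The response lists exactly which tokens the analyzer produced, which is what a term query has to match against.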
To solve your problem, you have to mark the email field as "not_analyzed" in your index's mapping.
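A minimal sketch using the 1.x string/not_analyzed syntax (the index and type names are assumptions):

```json
PUT /index/_mapping/user
{
  "properties": {
    "email": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
```

Note that a mapping change only affects newly indexed documents; existing documents must be reindexed before exact term queries will match them.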

With the help of Ryan Huynh I've solved my issue:
Use dynamic mappings; create the index like so:
PUT /index
{
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "string_template": {
            "mapping": {
              "index": "not_analyzed",
              "type": "string"
            },
            "match_mapping_type": "string",
            "match": "*"
          }
        }
      ]
    }
  }
}
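With this template in place, every dynamically mapped string field is indexed verbatim, so the exact term query from the question should now return the document (a sketch; the field and value follow the example above):

```json
POST /index/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "user.email": "user@example.com" } }
      ]
    }
  }
}
```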

Related

How to conditionally apply an analyzer at index time to a field that could be one of many languages?

I have documents with a field (e.g. input_text) containing a string that could be any of 20-odd languages, and another field (e.g. lang) holding the short form of the language.
I want to conditionally apply an analyzer to the text field at index time, depending on the language detected from the language field.
I eventually want a Kibana dashboard with a single word cloud of the most common words in the text field (i.e., across multiple languages), but only words that have been stemmed and tokenized with stop words removed.
Is there a way to do this?
The Elasticsearch documentation suggests using multiple fields, one per language, and then specifying an analyzer for the appropriate field, but I can't do this as there are 20-some languages and this would overload my nodes.
There is no way to achieve what you want in Elasticsearch (applying an analyzer to field A based on the value of field B).
I would recommend creating one index per language, then creating an index alias that groups all those indices and querying against that alias.
PUT lang_de
{
  "mappings": {
    "properties": {
      "input_text": {
        "type": "text",
        "analyzer": "german"
      }
    }
  }
}
PUT lang_en
{
  "mappings": {
    "properties": {
      "input_text": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "lang_*",
        "alias": "lang"
      }
    }
  ]
}
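At ingest time you would route each document to the index matching its lang field, and at query time search the alias so every language-specific analyzer is applied to its own index (a sketch; the sample document is an assumption):

```json
POST lang_de/_doc
{
  "lang": "de",
  "input_text": "Schnelle braune Füchse springen"
}

POST lang/_search
{
  "query": {
    "match": { "input_text": "springen" }
  }
}
```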

Elasticsearch 7.9 forward slashes

I'm using Elasticsearch 7.9.1 and want to search for "/abc" (including the forward slash) in the field named Path, as in "mysite.com/abc/xyz". Here's the index template, but it doesn't work:
"Path": {
  "type": "text",
  "index": false
}
What did I do wrong? Can you please help? Thanks!
They changed the syntax for "not analyzed" text only once (in ES 5), from
{
  "type": "string",
  "index": "not_analyzed"
}
to
{
  "type": "keyword"
}
Also note that "index": false disables indexing for the field entirely, so it cannot be searched at all. If you want special characters like / to be preserved at indexing time instead of stripped during analysis, you should use keyword instead of text.
Moreover, if your intent is to search within URLs, you should prefer the wildcard field type, or keep using text with an appropriate custom analyzer that splits your URL into parts.
If you upgrade to 7.11, you also get access to the URI parts ingest processor, which does all of that work for you.
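A minimal sketch of the keyword approach (the index name is an assumption):

```json
PUT paths_index
{
  "mappings": {
    "properties": {
      "Path": { "type": "keyword" }
    }
  }
}

POST paths_index/_search
{
  "query": {
    "wildcard": { "Path": "*/abc*" }
  }
}
```

A keyword field stores the whole string as a single token, so the slash survives and the wildcard query can match it anywhere in the value.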

Elasticsearch - template matching based on field value

Imagine this document:
{
  "_index": "project.datasync.20180101",
  "_type": "com.redhat.viaq.common",
  "service": "data-sync-server",
  "data": {
    "foo": "bar"
  }
  ...
}
I would like to have a mapping for the "data.foo" field (imagine I need some changes in how it is indexed, etc.).
I know I can match indices like this:
{
  "template": "project.datasync.*",
  "order": 100,
  "mappings": {
    "data": {
      "enabled": true,
      "properties": {
        "foo": {"type": "string", "index": "not_analyzed", ...}
      }
    }
  }
}
However, the datasync part of the index name comes from somewhere else, and there's no guarantee it will be datasync or anything similar that matches a pattern.
So my index template wouldn't match if the index is project.thedatasync.20180101.
I know I can use project.* in my index template for matching, but that is too generic and matches irrelevant indices.
So I would like to have this mapping active only when service is data-sync-server, which is always true for the documents I'm interested in.
Any ideas? This seems fundamentally against how Elasticsearch works, and if so, I would like that clarified.
Please note that documents are sent to Elasticsearch by Fluentd, and I don't have access to the Fluentd config to change the index name there.

elasticsearch - field filterable but not searchable

Using Elastic 2.3.5. Is there a way to make a field filterable but not searchable? For example, I have a language field with values like en-US. By setting several filters in query->bool->filter->term, I'm able to filter the result set without affecting the score, e.g. selecting only documents that have en-US in the language field.
However, I want a query searching for the term en-US to return no results, since this is not really a field indexed for searching, just one I can filter on.
Can I do this?
Elasticsearch uses an _all field to allow fast full-text search across entire documents. This is why searching for en-US in all fields of all documents returns the one containing 'language': 'en-US'.
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
You can specify "include_in_all": false in the mapping to stop a field from being included in _all.
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string"
        },
        "country": {
          "type": "string"
        },
        "language": {
          "type": "string",
          "include_in_all": false
        }
      }
    }
  }
}
In this example, searching for 'US' across all fields will return only documents containing US in title or country, but you will still be able to filter your query using the language field.
https://www.elastic.co/guide/en/elasticsearch/reference/current/include-in-all.html
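Combining this with a not_analyzed language field also makes exact term filters reliable. A sketch in 2.x syntax (the index/type names and the search term "holiday" are assumptions):

```json
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": { "type": "string" },
        "language": {
          "type": "string",
          "index": "not_analyzed",
          "include_in_all": false
        }
      }
    }
  }
}

POST my_index/my_type/_search
{
  "query": {
    "bool": {
      "must":   [ { "match": { "title": "holiday" } } ],
      "filter": [ { "term": { "language": "en-US" } } ]
    }
  }
}
```

The filter clause of the bool query does not contribute to scoring, so the language field shapes the result set without affecting relevance.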

Excluding field from _source causes aggregation to not work

We're using Elasticsearch 1.7.2 and trying to use the "include/exclude from _source" feature as it's described here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html
We have a field, types, that's 'pretty' and that we would like to return to the client, but it's not well suited to aggregations; and a field types_int (and also a types_string, but that's not relevant now) that's 'ugly' but optimized for search/aggregations, which we don't want to return to the client but do want to aggregate/filter on.
The field types_int doesn't need to be stored anywhere; it just needs to be indexed. We don't want to waste bandwidth returning it to the client either, so we don't want to include it in _source.
The mapping for it looks like this:
"types_int": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"value_int": {
"type": "integer"
}
}
}
However, after we add the exclude, our filters/aggregations on it stop working.
The excludes block looks like this:
"_source": {
"excludes": [
"types_int"
]
}
Without that in the mapping, everything works fine.
An example of a filter:
POST my_index/my_type/_search
{
  "filter": {
    "nested": {
      "path": "types_int",
      "filter": {
        "term": {
          "types_int.name": "<something>"
        }
      }
    }
  }
}
Again, removing the excludes and everything works fine.
Thinking it might have something to do with nested types, since they're separate documents and perhaps handled differently from normal fields, I added an exclude mapping for a 'normal' value-type field, and then my filter also stopped working.
"publication": {
"type": "string",
"index": "not_analyzed"
}
"_source": {
"excludes": [
"publication"
]
}
So my conclusion is that after you exclude something from _source, you can no longer filter on it? That doesn't make sense to me, so I'm thinking we're doing something wrong here. The _source include/exclude is just a post-processing step that manipulates the string data inside that field, right?
I understand that we can also use source filtering at query time to request that specific fields not be returned, but it's simply unnecessary to store the field in the first place. If anything, I would just like to understand why this doesn't work :)
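For reference, the query-time source filtering mentioned above would look something like this in 1.x (a sketch reusing the nested filter from earlier; "<something>" is the placeholder from the question):

```json
POST my_index/my_type/_search
{
  "_source": {
    "exclude": [ "types_int" ]
  },
  "filter": {
    "nested": {
      "path": "types_int",
      "filter": {
        "term": { "types_int.name": "<something>" }
      }
    }
  }
}
```

Unlike the mapping-level exclude, this only trims the response payload per request; the full _source is still stored in the index.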
