Elasticsearch: Constructing mappings for Java Client

In my elasticsearch.yml file I am trying to implement a mapping where one field belonging to one type is indexed using a different analyzer from the rest.
At present the yaml file has the following structure:
index:
  bookshelf:
    types:
      book:
        mappings:
          title: {analyzer: customAnalyzer}
  analysis:
    analyzer:
      # set standard analyzer with no stop words as the default
      default:
        type: standard
        stopwords: _none_
      # set custom analyzer to provide relevant search results
      customAnalyzer:
        type: custom
        tokenizer: nGramTokenizer
        filter: [lowercase, stopWordsFilter, asciifolding]
    tokenizer:
      nGramTokenizer:
        type: nGram
        min_gram: 1
        max_gram: 2
    filter:
      nGramFilter:
        type: nGram
        min_gram: 1
        max_gram: 2
      stopWordsFilter:
        type: stop
        stopwords: _none_
This does not apply the custom analyzer to the title field, so I was hoping someone could point me in the right direction for applying custom analyzers to individual fields.

I answered this on the mailing list:
If you are using Java, you don't have to use a yml file. You can, but you don't have to.
If you are using Spring, you can have a look at the ES spring factory project:  https://github.com/dadoonet/spring-elasticsearch
If not, there are different ways of creating indices and mappings in Java:
You can have a look here to see how I'm doing this by reading a json
mapping file: 
https://github.com/dadoonet/spring-elasticsearch/blob/master/src/main/java/fr/pilato/spring/elasticsearch/ElasticsearchAbstractClientFactoryBean.java#L616
You can also use XContent objects provided by ES to build your
mappings in Java: 
https://github.com/dadoonet/rssriver/blob/master/src/test/java/org/elasticsearch/river/rss/RssRiverTest.java#L14
Using this object is described here:  https://github.com/dadoonet/rssriver/blob/master/src/test/java/org/elasticsearch/river/rss/AbstractRssRiverTest.java#L98
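For example, here is a minimal sketch (untested, and written against the same pre-1.x Java API as the snippet below) of building the mapping from the question with XContentBuilder, applying customAnalyzer to the title field:

import org.elasticsearch.common.xcontent.XContentBuilder;
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

public static XContentBuilder mapping() throws Exception {
    // equivalent to: {"book":{"properties":{"title":{"type":"string","analyzer":"customAnalyzer"}}}}
    return jsonBuilder()
        .startObject()
            .startObject("book")
                .startObject("properties")
                    .startObject("title")
                        .field("type", "string")
                        .field("analyzer", "customAnalyzer")
                    .endObject()
                .endObject()
            .endObject()
        .endObject();
}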
Then add the mapping as follows:
node.client().admin().indices()
    .preparePutMapping("yourindex")
    .setType("yourtype")
    .setSource(mapping())
    .execute().actionGet();
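Note that the mapping can only reference customAnalyzer if the analysis settings already exist on the index, so create the index with them first. A sketch, assuming the same node client and the relevant analysis definitions from the question serialized as JSON:

String settingsJson =
      "{\"analysis\":{"
    +   "\"analyzer\":{\"customAnalyzer\":{\"type\":\"custom\","
    +     "\"tokenizer\":\"nGramTokenizer\","
    +     "\"filter\":[\"lowercase\",\"stopWordsFilter\",\"asciifolding\"]}},"
    +   "\"tokenizer\":{\"nGramTokenizer\":{\"type\":\"nGram\",\"min_gram\":1,\"max_gram\":2}},"
    +   "\"filter\":{\"stopWordsFilter\":{\"type\":\"stop\",\"stopwords\":\"_none_\"}}}}";

// create the index with the analysis settings before putting the mapping
node.client().admin().indices()
    .prepareCreate("yourindex")
    .setSettings(settingsJson)
    .execute().actionGet();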
I hope this helps.

Related

Filebeat date field mapped as type keyword

Filebeat is reading logs from a file, where logs are in the following format:
{"logTimestamp":"2019-11-29T16:39:43.027Z","#version":"1","message":"Hello world","logger_name":"se.lolotron.App","thread_name":"thread-1","level":"INFO","level_value":40000,"application":"my-app"}
So there is a field logTimestamp logged in ISO 8601 time format.
The problem is that this field is mapped as a keyword in the Elasticsearch Filebeat index:
"logTimestamp": {
  "type": "keyword",
  "ignore_above": 1024
},
On the other hand, if I index a similar document in the same Elasticsearch instance but a different index, e.g.
POST /new_index/_doc/
{
  "message": "hello world",
  "logTimestamp": "2019-11-29T16:39:43.027Z"
}
the mapping is
"logTimestamp": {
  "type": "date"
},
According to the docs here and here, by default Elasticsearch should detect a date if it is formatted with strict_date_optional_time. And strict_date_optional_time is described as
A generic ISO datetime parser where the date is mandatory and the time is optional.
which I presume is ISO 8601, and I think I proved that by indexing a new doc into new_index in the example above.
Why is logTimestamp saved as a keyword in the case of Filebeat? Any ideas?
I'm using Filebeat 7.2.1 and Elasticsearch 7.2.1.
Also, the default fields.yml is used.
I just found out that date_detection is disabled for filebeat indices by default (Filebeat version 7.2.1).
This can be seen here
var (
    // Defaults used in the template
    defaultDateDetection = false
    ...
It does not look like it can be overridden.
The workaround for this is to use the experimental append_fields feature (experimental at least at the time of writing this post; see here for more) and add the following to the filebeat.yml config:
setup.template.overwrite: true
setup.template.append_fields:
  - name: logTimestamp
    type: date
This will make sure that the mapping for logTimestamp is date.
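To check that the workaround took effect once the template has been overwritten and a new index has been created, you can query the field mapping directly (the filebeat-* pattern is an assumption based on Filebeat's default index naming):

curl -XGET 'localhost:9200/filebeat-*/_mapping/field/logTimestamp'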

Can't deal with accents in Elasticsearch indexing and search

I have an issue with elasticsearch and the way the data are indexed/retrieved. I don't understand what happens.
This is the mapping I use (sorry, it's yaml format):
The idea is simple, in theory... I have a string analyzer with lowercase and asciifolding filters. I don't want to care about case or accents, and I would like to use this analyzer to index and search.
settings:
  index:
    analysis:
      filter:
        autocomplete_filter:
          type: edgeNGram
          side: front
          min_gram: 1
          max_gram: 20
      analyzer:
        autocomplete:
          type: custom
          tokenizer: standard
          filter: [lowercase, asciifolding, autocomplete_filter]
        string_analyzer:
          type: custom
          tokenizer: standard
          filter: [lowercase, asciifolding]
types:
  city:
    mappings:
      cityName:
        type: string
        analyzer: string_analyzer
        search_analyzer: string_analyzer
      location: {type: geo_point}
When I run this query:
{
  "query": {
    "prefix": {
      "cityName": "per"
    }
  },
  "size": 20
}
I get some results like "Perpezat", "Pern", "Péreuil", which is the expected result.
But if I run the following query:
{
  "query": {
    "prefix": {
      "cityName": "pér"
    }
  },
  "size": 20
}
Then I get no results at all.
If you have any clue or help, I would be happy to hear it.
Thanks
In the Prefix Query, your search input is not analyzed like in other cases:
Matches documents that have fields containing terms with a specified prefix (not analyzed)
Your first example works because the documents are analyzed at index time using your analyzer with lowercase and asciifolding, so they contain a term starting with per (perpezat, pern, pereuil).
Your second example does not work because those documents don't contain any terms starting with pér.
Since I couldn't find a way to tell Elasticsearch to analyze the prefix before performing the search, you could achieve your goal by manually adding this step:
Ask Elasticsearch to analyze your input by calling the Analyze API
Use the output from step 1 (it should be per in the examples) for the prefix query
For this to work, your search input should be a single term (I think that could be why Elasticsearch doesn't analyze it in the first place).
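A sketch of both steps with curl (index_name is a placeholder, and the field-scoped _analyze call assumes the cityName mapping from the question):

# step 1: analyze the raw input with the field's analyzer; should return the single token "per"
curl -XGET 'localhost:9200/index_name/_analyze?field=cityName' -d 'pér'
# step 2: feed the analyzed term into the prefix query
curl -XPOST 'localhost:9200/index_name/_search' -d '{
  "query": {
    "prefix": {
      "cityName": "per"
    }
  },
  "size": 20
}'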
@mario-trucco Finally, I've found this post that explains a better way to analyze the strings:
What is an effective way to search world-wide location names with ElasticSearch?
Of course it doesn't answer my initial question and I still don't understand what happened, but it solves my problem by removing it.
Thanks again for your help and time.

Elasticsearch view indexed data

I have a char filter which replaces characters:
char_filter:
  lt_characters:
    type: mapping
    mappings: ["a=>bbbbbb", "c=>tttttt", "ddddddd=>k"]
I added this filter to my index. Now, how can I check whether the filter works? Where can I find the indexed data?
I mean, how do I actually view the replacements?
To see what tokens are created with your char_filter, you can use the Analyze API against your index (the filter is defined in that index's settings, so the cluster-level _analyze endpoint won't know about it):
curl -XGET 'localhost:9200/yourindex/_analyze?char_filters=lt_characters' -d 'this is a test'
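For instance, with the mappings above and a keyword tokenizer (so the whole input survives as a single token), the replacements become directly visible; the output below is what I would expect rather than a verified result:

curl -XGET 'localhost:9200/yourindex/_analyze?char_filters=lt_characters&tokenizer=keyword' -d 'a cat'
# expected single token: "bbbbbb ttttttbbbbbbt" ("a" -> "bbbbbb", "c" -> "tttttt")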

How to search fields with '-' characters in elastic search

I am new to Elasticsearch. I have the following document, where one of the fields, "eventId", has "-" in its value.
When I try to search with the complete value of eventId, I don't get any results.
Sample document app/event:
{
  "tags": {},
  "eventId": "cc98d57b-c6bc-424c-b54c-df1e3df0d942"
}
I haven't created any explicit settings for my index.
Thanks.
You should check whether the tokenizer splits your value into multiple tokens. Maybe your value is stored as five terms: "cc98d57b", "c6bc", "424c", "b54c" and "df1e3df0d942".
You can inspect that with the 'Kopf' plugin (https://github.com/lmenezes/elasticsearch-kopf).
If that is your problem, you should change your field mapping so that the value is not analyzed ("index": "not_analyzed").
For an example of how to set that mapping, see here: Elasticsearch mapping settings 'not_analyzed' and grouping by field in Java
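A minimal sketch of such a mapping with curl (index and type names taken from the question, pre-5.x string syntax; note that changing the mapping of an already-indexed field requires reindexing):

curl -XPUT 'localhost:9200/app/event/_mapping' -d '{
  "event": {
    "properties": {
      "eventId": {"type": "string", "index": "not_analyzed"}
    }
  }
}'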
After that, you should be able to search for your specific value.

Elasticsearch: field "title" was indexed without position data; cannot run PhraseQuery

I have an index in ElasticSearch with the following mapping:
mappings: {
  feed: {
    properties: {
      html_url: {
        index: not_analyzed
        omit_norms: true
        index_options: docs
        type: string
      }
      title: {
        index_options: offsets
        type: string
      }
      created: {
        store: true
        format: yyyy-MM-dd HH:mm:ss
        type: date
      }
      description: {
        type: string
      }
    }
  }
}
I am getting the following error when performing a phrase search ("video games"):
IllegalStateException[field \"title\" was indexed without position data; cannot run PhraseQuery (term=video)];
Single-word searches work fine. I tried "index_options: positions" as well, but with no luck. The title field contains text in multiple languages and is sometimes empty. Interestingly, it seems to fail randomly; for example, it would fail with 200K documents or 800K using the same dataset. Is there a reason some titles wouldn't get indexed with positions?
Elasticsearch version 0.90.5
Just in case someone else has the same issue: there was another type/table (feed2) in the same index with the same "title" field, which was set to "not_analyzed".
For some reason, even if you specify the type (http://elasticsearchhost.com:9200/index_name/feed/_search), the other type is still being searched as well. Changing the mapping for the feed2 type fixed the problem.
You probably have another field named 'title' with a different mapping in another type but in the same index.
Basically, if you have two fields with the same name in the same index, even if they are in different types, they cannot have different mappings. To be more precise, even if they have the same type (e.g. "string") but one of them is "analyzed" and the other is "not_analyzed", problems will arise.
I mean, yeah, you can try to set up two different mappings, and Elasticsearch will not complain, but when searching you get strange results and everything goes bananas.
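As an illustration with the names from this question (a hedged sketch against the 0.90-era API, where one index could hold several types):

# "title" indexed with positions/offsets in type "feed"
curl -XPUT 'localhost:9200/index_name/feed/_mapping' -d '{
  "feed": {"properties": {"title": {"type": "string", "index_options": "offsets"}}}
}'
# the same field name mapped as not_analyzed in type "feed2"; Elasticsearch accepts this...
curl -XPUT 'localhost:9200/index_name/feed2/_mapping' -d '{
  "feed2": {"properties": {"title": {"type": "string", "index": "not_analyzed"}}}
}'
# ...but at search time the two "title" mappings collide, and a phrase query against
# "feed" can hit terms that were indexed without position data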
You can read more about this issue here where they say:
[...] In the end, we opted to enforce the rule that all fields with the same name in the same index must have the same mapping [...]
And yeah, considering how the promise of Elasticsearch has always been "it just works", this little detail took a lot of people by surprise.
