Elasticsearch: what sample document would look like for this mapping? - elasticsearch

I am new to Elasticsearch. I was given this mapping document:
{
"book": {
"properties": {
"title": {
"properties": {
"de": {
"type": "string",
"fields": {
"default": {
"type": "string",
"analyzer": "de_analyzer"
}
}
},
"fr": {
"type": "string",
"fields": {
"default": {
"type": "string",
"analyzer": "fr_analyzer"
}
}
}
}
}
}
}
}
I am wondering what a sample document conforming to this mapping would like so that I can generate right json string.
Thanks and regards.

It should look something similar to:
{
"book": {
"title": {
"de": "somestring",
"fr": "somestring"
}
}
}
For more information read this.
This is how your json will look like. I see you are confused by:
"fr": {
"type": "string",
"fields": {
"fr": {
"type": "string",
"analyzer": "fr_analyzer"
}
}
}
Actually this is used for indexing a single field in multiple ways. Effective you are telling elasticsearch that you want same field to index in two different ways. In your case I'm not getting why you are using same name "fr" and "de" for indexing multi-fields. If you will use same name, ES will index it only as a single field. So, in your case doing like this is a similar thing:
"fr": {
"type": "string",
"analyzer": "fr_analyzer"
}
For more information go through these links:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/most-fields.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-fields.html
Hope this helps.

Well, I'm way slow here, but, as dark_shadow points out, this looks like a convoluted attempt at exposing multiple analyzers for a field (formerly known as multi-field mapping).
I've written up an example of mapping this in two ways: giving title two properties, fr and de, for which explicit strings need to be provided in each document, and just having a single title property, with fr and de analyzer 'fields'. Note that I replaced the name analyzers with "standard" to keep me from having to load those analyzers locally.
The difference between these is apparent. In example one, you have the ability to provide the translated titles on each book, with the index/search analyzer appropriate to the language being use on the respective properties.
In example two, you are providing a single title string, which will be indexed using both analyzers, and then you can choose the analyser to search by in your query.
#!/bin/sh
echo "--- delete index"
curl -X DELETE 'http://localhost:9200/so_multi_map/'
echo "--- create index and put mapping into place"
curl -XPUT http://localhost:9200/so_multi_map/?pretty -d '{
"mappings": {
"example_one": {
"properties": {
"title": {
"properties": {
"de": {
"type": "string",
"analyzer": "standard"
},
"fr": {
"type": "string",
"analyzer": "standard"
}
}
}
}
},
"example_two": {
"properties": {
"title": {
"type": "string",
"fields": {
"de": {
"type": "string",
"analyzer": "standard"
},
"fr": {
"type": "string",
"analyzer": "standard"
}
}
}
}
}
},
"settings" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
}
}'
echo "--- index a book by example_one by POSTing"
curl -XPOST http://localhost:9200/so_multi_map/example_one -d '{
"title": {
"de": "Auf der Suche nach der verlorenen Zeit",
"fr": "À la recherche du temps perdu"
}
}'
echo "--- index a book by example_two by POSTing"
curl -XPOST http://localhost:9200/so_multi_map/example_two -d '{
"title": "À la recherche du temps perdu"
}'
echo "\n --- flush indices"
curl -XPOST 'http://localhost:9200/_flush'
echo "--- show the books"
curl -XGET http://localhost:9200/so_multi_map/_search?pretty -d '
{
"query": {
"match_all": {}
}
}'
echo "--- how to query just the fr title in example one:"
curl -XGET http://localhost:9200/so_multi_map/example_one/_search?pretty -d '
{
"query": {
"match": {
"title.fr": "recherche"
}
}
}'
echo "--- how to query just the fr title in example two:"
curl -XGET http://localhost:9200/so_multi_map/example_two/_search?pretty -d '
{
"query": {
"match": {
"title.fr": "recherche"
}
}
}'

Related

Enable stop token filter for standard analyzer

I'm using ElasticSearch 5.1 and I want to enable the Stop Token Filter for the standard analyzer which is disabled by default
The document describes how to use it in a custom analyzer, but I would like to know how to enable it, since it's already included.
you have to configure the standard analyzer, see the example below how to do it with curl command(taken from docs here):
curl -XPUT 'localhost:9200/my_index?pretty' -d'
{
"settings": {
"analysis": {
"analyzer": {
"std_english": {
"type": "standard",
"stopwords": "_english_"
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"my_text": {
"type": "text",
"analyzer": "standard",
"fields": {
"english": {
"type": "text",
"analyzer": "std_english"
}
}
}
}
}
}
}'
curl -XPOST 'localhost:9200/my_index/_analyze?pretty' -d'
{
"field": "my_text",
"text": "The old brown cow"
}'
curl -XPOST 'localhost:9200/my_index/_analyze?pretty' -d'
{
"field": "my_text.english",
"text": "The old brown cow"
}'

How to implement case sensitive search in elasticsearch?

I have a field in my indexed documents where i need to search with case being sensitive. I am using the match query to fetch the results.
An example of my data document is :
{
"name" : "binoy",
"age" : 26,
"country": "India"
}
Now when I give the following query:
{
“query” : {
“match” : {
“name” : “Binoy"
}
}
}
It gives me a match for "binoy" against "Binoy". I want the search to be case sensitive. It seems by default,elasticsearch seems to go with case being insensitive. How to make the search case sensitive in elasticsearch?
In the mapping you can define the field as not_analyzed.
curl -X PUT "http://localhost:9200/sample" -d '{
"index": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}'
echo
curl -X PUT "http://localhost:9200/sample/data/_mapping" -d '{
"data": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}'
Now if you can do normal index and do normal search , it wont analyze it and make sure it deliver case insensitive search.
It depends on the mapping you have defined for you field name. If you haven't defined any mapping then elasticsearch will treat it as string and use the standard analyzer (which lower-cases the tokens) to generate tokens. Your query will also use the same analyzer for search hence matching is done by lower-casing the input. That's why "Binoy" matches "binoy"
To solve it you can define a custom analyzer without lowercase filter and use it for your field name. You can define the analyzer as below
"analyzer": {
"casesensitive_text": {
"type": "custom",
"tokenizer": "standard",
"filter": ["stop", "porter_stem" ]
}
}
You can define the mapping for name as below
"name": {
"type": "string",
"analyzer": "casesensitive_text"
}
Now you can do the the search on name.
note: the analyzer above is for example purpose. You may need to change it as per your needs
Have your mapping like:
PUT /whatever
{
"settings": {
"analysis": {
"analyzer": {
"mine": {
"type": "custom",
"tokenizer": "standard"
}
}
}
},
"mappings": {
"type": {
"properties": {
"name": {
"type": "string",
"analyzer": "mine"
}
}
}
}
}
meaning, no lowercase filter for that custom analyzer.
Here is the full index template which worked for my ElasticSearch 5.6:
{
"template": "logstash-*",
"settings": {
"analysis" : {
"analyzer" : {
"case_sensitive" : {
"type" : "custom",
"tokenizer": "standard",
"filter": ["stop", "porter_stem" ]
}
}
},
"number_of_shards": 5,
"number_of_replicas": 1
},
"mappings": {
"fluentd": {
"properties": {
"message": {
"type": "text",
"fields": {
"case_sensitive": {
"type": "text",
"analyzer": "case_sensitive"
}
}
}
}
}
}
}
As you see, the logs are coming from FluentD and are saved into a timebased index logstash-*. To make sure, I can still execute wildcard queries on the message filed, I put a multi-field mapping on that field. Wildcard/analyzed queries can be done on message field and the case sensitive one on the message.case_sensitive field.

How to use stopword elasticsearch

I have an Elasticsearch 1.5 running on my server,
specifically, I want/create three fields with is
1.name
2.description
3.nickname
I want setup stopword for description and nickname field when I insert the data on the Elasticsearch then stop word automatically remove unwanted stopword. I'm trying so many time but not working.
curl -X POST http://127.0.0.1:9200/tryoindex/ -d'
{
"settings": {
"analysis": {
"filter": {
"custom_english_stemmer": {
"type": "stemmer",
"name": "english"
},
"snowball": {
"type" : "snowball",
"language" : "English"
}
},
"analyzer": {
"custom_lowercase_stemmed": {
"tokenizer": "standard",
"filter": [
"lowercase",
"custom_english_stemmer",
"snowball"
]
}
}
}
},
"mappings": {
"test": {
"_all" : {"enabled" : true},
"properties": {
"text": {
"type": "string",
"analyzer": "custom_lowercase_stemmed"
}
}
}
}
}'
curl -X POST "http://localhost:9200/tryoindex/nama/1" -d '{
"text" : "Tryolabs running monkeys KANGAROOS and jumping elephants jum is your"
}'
curl "http://localhost:9200/tryoindex/nama/_search?pretty=1" -d '{
"query": {
"query_string": {
"query": "Tryolabs running monkeys KANGAROOS and jumping elephants jum is your",
"fields": ["text"]
}
}
}'
Change your analyzer part to
"analyzer": {
"custom_lowercase_stemmed": {
"tokenizer": "standard",
"filter": [
"stop",
"lowercase",
"custom_english_stemmer",
"snowball"
]
}
}
To verify the changes use
curl -XGET 'localhost:9200/tryoindex/_analyze?analyzer=custom_lowercase_stemmed' -d 'testing this is stopword testing'
and observe the tokens
{"tokens":[{"token":"test","start_offset":0,"end_offset":7,"type":"<ALPHANUM>","position":1},{"token":"stopword","start_offset":16,"end_offset":24,"type":"<ALPHANUM>","position":4},{"token":"test","start_offset":25,"end_offset":32,"type":"<ALPHANUM>","position":5}]}%
PS: If you don't want to get the stemmed version of testing, then remove the stemming filters.
You need to use the stop token filter in your analyzer filter chain.

Elasticsearch: sorting Spanish double names alphabetically

I am doing an Elasticsearch query and I want the results ordered alphabetically by last name. My problem: the last names are all Spanish double names, and ES doesn't order them the way I would like it.
I would prefer the order to be:
Batres Rivera
Batrín Chojoj
Fion Morales
Lopez Giron
Martinez Castellanos
Milán Casanova
This is my query:
{
"query": {
"match_all": {}
},
"sort": [
{
"Last Name": {
"order": "asc"
}
}
]
}
The order that I get with this is:
Batres Rivera
Batrín Chojoj
Milán Casanova
Martinez Castellanos
Fion Morales
Lopez Giron
So it is not sorting by the first string, but by either of both (Batres, Batrín, Casanova, Castellanos, Fion, Giron).
If I try additionally
{
"order": "asc",
"mode": "max"
}
then I get:
Batrín Chojoj
Lopez Giron
Martinez Castellanos
Milán Casanova
Fion Morales
Batres Rivera
All the fields are indexed by default, I checked with
curl -XGET localhost/my_index/_mapping
and I get back
my_index: {
my_type: {
properties: {
FirstName: {
type: string
}LastName: {
type: string
}MiddleName: {
type: string
}
...
}
}
}
Does anyone know how to make the results to be ordered to be ordered alphabetically by the beginning string of the last name?
Thanks!
The problem is that your LastName field is analyzed, so the string Batres Rivera is indexed as a multi-value field with two terms: batres and rivera. But this isn't like an ordered array, it's more like a "bag of values". So when you try to sort on the field, it chooses one of the terms (the min or max) and sorts on that.
What you need to do is to store the LastName as a single term (Batres Rivera) for sorting purposes, by mapping the field as
{ "type": "string", "index": "not_analyzed"}
Obviously you can't then use that field for search purposes: you wouldn't be able to search for rivera and match on that field.
The way to support both searching and sorting is to use multi-fields: ie index the same value in two ways, one for searching and one for sorting.
In 0.90.* the syntax for multi-fields is:
curl -XPUT "http://localhost:9200/my_index" -d'
{
"mappings": {
"my_type": {
"properties": {
"LastName": {
"type": "multi_field",
"fields": {
"LastName": {
"type": "string"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
In 1.0.* the multi_field type has been removed and now any core field type supports sub-fields as follows:
curl -XPUT "http://localhost:9200/my_index" -d'
{
"mappings": {
"my_type": {
"properties": {
"LastName": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
So you can use the LastName field for searching, and the LastName.raw field for sorting:
curl -XGET "http://localhost:9200/my_index/my_type/_search" -d'
{
"query": {
"match": {
"LastName": "rivera"
}
},
"sort": "LastName.raw"
}'
Language specific sorting
You should also look at using the ICU analysis plugin to sort using the Spanish sort order (or collation). This is a bit more complex but is worth using:
curl -XPUT "http://localhost:9200/my_index" -d'
{
"settings": {
"analysis": {
"analyzer": {
"folding": {
"type": "custom",
"tokenizer": "icu_tokenizer",
"filter": [
"icu_folding"
]
},
"es_sorting": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"spanish"
]
}
},
"filter": {
"spanish": {
"type": "icu_collation",
"language": "es"
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"LastName": {
"type": "string",
"analyzer": "folding",
"fields": {
"raw": {
"type": "string",
"analyzer": "es_sorting"
}
}
}
}
}
}
}'
We create a folding analyzer which we'll use for the LastName field, which will analyze a string like Muñoz Rivera into the two terms munoz (without the ~) and rivera. So a user can search for munoz or muñoz and either will match.
Then we create the es_sorting analyzer which indexes the proper sort order for muñoz rivera (lowercased) in Spanish.
Searching would be done in the same way:
curl -XGET "http://localhost:9200/my_index/my_type/_search" -d'
{
"query": {
"match": {
"LastName": "rivera"
}
},
"sort": "LastName.raw"
}'
We need to know how you are indexing the name.
Please check this discussion link.
http://elasticsearch-users.115913.n3.nabble.com/Is-there-a-way-to-search-terms-lower-cased-td932996.html
This should be very helpful for your case. This depends on your mapping settings. What analyzer you use for the name field.
Need your mapping definition to decide on a proper solution.

Indexing website/url in Elastic Search

I have a website field of a document indexed in elastic search. Example value: http://example.com . The problem is that when I search for example, the document is not included. How to map correctly the website/url field?
I created the index below:
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"analyzer_html":{
"type":"custom",
"tokenizer": "standard",
"filter":"standard",
"char_filter": "html_strip"
}
}
}
}
},
"mapping":{
"blogshops": {
"properties": {
"category": {
"properties": {
"name": {
"type": "string"
}
}
},
"reviews": {
"properties": {
"user": {
"properties": {
"_id": {
"type": "string"
}
}
}
}
}
}
}
}
}
I guess you are using standard analyzer, which splits http://example.dom into two tokens - http and example.com. You can take a look http://localhost:9200/_analyze?text=http://example.com&analyzer=standard.
If you want to split url, you need to use different analyzer or specify our own custom analyzer.
You can take a look how would be url indexed with simple analyzer - http://localhost:9200/_analyze?text=http://example.com&analyzer=simple. As you can see, now is url indexed as three tokens ['http', 'example', 'com']. If you don't want to index tokens like ['http', 'www'] etc, you can specify your analyzer with lowercase tokenizer (this is the one used in simple analyzer) and stop filter. For example something like this:
# Delete index
#
curl -s -XDELETE 'http://localhost:9200/url-test/' ; echo
# Create index with mapping and custom index
#
curl -s -XPUT 'http://localhost:9200/url-test/' -d '{
"mappings": {
"document": {
"properties": {
"content": {
"type": "string",
"analyzer" : "lowercase_with_stopwords"
}
}
}
},
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
},
"analysis": {
"filter" : {
"stopwords_filter" : {
"type" : "stop",
"stopwords" : ["http", "https", "ftp", "www"]
}
},
"analyzer": {
"lowercase_with_stopwords": {
"type": "custom",
"tokenizer": "lowercase",
"filter": [ "stopwords_filter" ]
}
}
}
}
}' ; echo
curl -s -XGET 'http://localhost:9200/url-test/_analyze?text=http://example.com&analyzer=lowercase_with_stopwords&pretty'
# Index document
#
curl -s -XPUT 'http://localhost:9200/url-test/document/1?pretty=true' -d '{
"content" : "Small content with URL http://example.com."
}'
# Refresh index
#
curl -s -XPOST 'http://localhost:9200/url-test/_refresh'
# Try to search document
#
curl -s -XGET 'http://localhost:9200/url-test/_search?pretty' -d '{
"query" : {
"query_string" : {
"query" : "content:example"
}
}
}'
NOTE: If you don't like to use stopwords here is interesting article stop stopping stop words: a look at common terms query

Resources