Adding ngram to existing index - elasticsearch

Is there a way to add ngram matching to existing index? I saw plenty examples in documentation how to create index which will search through using ngrams, but when I try to follow those instructions I get error:
{"error":"IndexAlreadyExistsException[[nameOfIndex] already exists]","status":400}
Example curl which I'm using:
curl -XPUT elasticUrl/nameOfIndex -d '{
"settings": {
"number_of_shards": 1
},
"mappings": {
"title": {
"properties": {
"text_field": {
"type": "string",
"term_vector": "yes"
}
}
}
}
}'

Try this:
First close the index, then apply updates and then open the index
POST /blog/_close
// apply index settings updates
POST /blog/_open
Source: link

Related

elasticsearch mapping is empty after creating

I'm trying to create an autocomplete index for my elasticsearch using the search_as_you_type datatype.
My first command I run is
curl --request PUT 'https://elasticsearch.company.me/autocomplete' \
'{
"mappings": {
"properties": {
"company_name": {
"type": "search_as_you_type"
},
"serviceTitle": {
"type": "search_as_you_type"
}
}
}
}'
which returns
{"acknowledged":true,"shards_acknowledged":true,"index":"autocomplete"}curl: (3) nested brace in URL position 18:
{
"mappings": {
"properties": etc.the rest of the json object I created}}
Then I reindex using
curl --silent --request POST 'http://elasticsearch.company.me/_reindex?pretty' --data-raw '{
"source": {
"index": "existing_index"
},
"dest": {
"index": "autocomplete"
}
}' | grep "total\|created\|failures"
I expect to see some "total":1000,"created":5etc but some kind of response from the terminal, but I get nothing. Also, when I check the mapping of my autocomplete index, by running curl -u thething 'https://elasticsearch.company.me/autocomplete/_mappings?pretty',
I get an empty mapping result:
{
"autocomplete" : {
"mappings" : { }
}
}
Is my error in the creation of my index or the reindexing? I'm expecting the autocomplete mappings to show the two fields I'm searching for, ie: "company_name" and "serviceTitle". Any ideas how to fix?

elasticsearch aggregations separated words

I simply run an aggregations in browser plugin(marvel) as you see in picture below there is only one doc match the query but aggregrated separated by spaces but it doesn't make sense I want aggregrate for different doc.. ın this scenario there should be only one group with count 1 and key:"Drow Ranger".
What is the true way of do this in elasticsearch..
It's probably because your heroname field is analyzed and thus "Drow Ranger" gets tokenized and indexed as "drow" and "ranger".
One way to get around this is to transform your heroname field to a multi-field with an analyzed part (the one you search on with the wildcard query) and another not_analyzed part (the one you can aggregate on).
You should create your index like this and specify the proper mapping for your heroname field
curl -XPUT localhost:9200/dota2 -d '{
"mappings": {
"agust": {
"properties": {
"heroname": {
"type": "string",
"fields": {
"raw: {
"type": "string",
"index": "not_analyzed"
}
}
},
... your other fields go here
}
}
}
}
Then you can run your aggregation on the heroname.raw field instead of the heroname field.
UPDATE
If you just want to try on the heroname field, you can just modify that field and not recreate the whole index. If you run the following command, it will simply add the new heroname.raw sub-field to your existing heroname field. Note that you still have to reindex your data though
curl -XPUT localhost:9200/dota2/_mapping/agust -d '{
"properties": {
"heroname": {
"type": "string",
"fields": {
"raw: {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Then you can keep using heroname in your wildcard query, but your aggregation will look like this:
{
"aggs": {
"asd": {
"terms": {
"field": "heroname.raw", <--- use the raw field here
"size": 0
}
}
}
}

copy_to and custom analyzer not working

(I'm doing this with a fresh copy of Elasticsearch 1.5.2)
I've defined a custom analyzer and it's working:
curl -XPUT 127.0.0.1:9200/test -d '{
"settings": {
"index": {
"analysis": {
"tokenizer": {
"UrlTokenizer": {
"type": "pattern",
"pattern": "https?://([^/]+)",
"group": 1
}
},
"analyzer": {
"accesslogs": {
"tokenizer": "UrlTokenizer"
}
}
}
}
}
}'; echo
curl '127.0.0.1:9200/test/_analyze?analyzer=accesslogs&text=http://192.168.1.1/123?a=2#1111' | json_pp
Now I apply it to an index:
curl -XPUT 127.0.0.1:9200/test/accesslogs/_mapping -d '{
"accesslogs" : {
"properties" : {
"referer" : { "type" : "string", "copy_to" : "referer_domain" },
"referer_domain": {
"type": "string",
"analyzer": "accesslogs"
}
}
}
}'; echo
From the mapping I can see both of them are applied.
Now I try to insert some data,
curl 127.0.0.1:9200/test/accesslogs/ -d '{
"referer": "http://192.168.1.1/aaa.php",
"response": 100
}';echo
And the copy_to field, aka referer_domain was not generated and if I try to add a field with that name, the tokenizer is not applied either.
Any ideas?
copy_to works but, you are assuming that since you don't see the field being generated, it doesn't exist.
When you return your document back (with GET /test/accesslogs/1 for example), you don't see the field under _source. This contains the original document that has been indexed. And you didn't index any referer_domain field, just referer and response. And this is the reason why you don't see it.
But Elasticsearch does create that field in the inverted index. You can use it to query, compute or retrieve if you stored it.
Let me exemplify my statements:
you can query that field and you will get results back based on it. If you really want to see what has been stored in the inverted index, you can do this:
GET /test/accesslogs/_search
{
"fielddata_fields": ["referer","response","referer_domain"]
}
you can, also, retrieve that field if you stored it:
"referer_domain": {
"type": "string",
"analyzer": "accesslogs",
"store" : true
}
with this:
GET /test/accesslogs/_search
{
"fields": ["referer","response","referer_domain"]
}
In conclusion, copy_to modifies the indexed document, not the source document. You can query your documents having that field and it will work because the query looks at the inverted index. If you want to retrieve that field you need to store it, as well. But you will not see that field in the _source field because _source is the initial document that has been indexed. And the initial document doesn't contain referer_domain.

I can not change the analyzer and mapping at already existing index

I would like change settings and mapping on an already existing index in elasticsearch. However, I get the error.
curl -XPOST localhost:9200/myindex/_close
{"acknowledged":true}
curl -XPUT localhost:9200/myindex/_settings -d '{
"index": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
}
}'
{"acknowledged":true}
curl -XPUT localhost:9200/myindex/mytype/_mapping -d '{
"properties": {
"myfield": {
"type": "string",
"search_analyzer": "custom_analyzer",
"index_analyzer": "custom_analyzer"
}
}
}'
{"error":"MergeMappingException[Merge failed with failures {[mapper [myfield] has different index_analyzer]}]","status":400}
What am I doing wrong?
All indices are stored immutable so you have to reindex you data.
zero downtime ver. 1
To reindex you can follow this steps:
create index with wanted settings/mappings
pull you data from old to new index with the _bulk API
zero downtime ver. 2
Or you can create a new index with wanted settings/mappings and create a alias (with the old index name) to the newly created index.
downtime needed
The last way to do this is to close your index, make your changes and reopen your index again.

Why Elasticsearch "not_analyzed" field is split into terms?

I have the following field in my mapping definition:
...
"my_field": {
"type": "string",
"index":"not_analyzed"
}
...
When I index a document with value of my_field = 'test-some-another' that value is split into 3 terms: test, some, another.
What am I doing wrong?
I created the following index:
curl -XPUT localhost:9200/my_index -d '{
"index": {
"settings": {
"number_of_shards": 5,
"number_of_replicas": 2
},
"mappings": {
"my_type": {
"_all": {
"enabled": false
},
"_source": {
"compressed": true
},
"properties": {
"my_field": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}'
Then I index the following document:
curl -XPOST localhost:9200/my_index/my_type -d '{
"my_field": "test-some-another"
}'
Then I use the plugin https://github.com/jprante/elasticsearch-index-termlist with the following API:
curl -XGET localhost:9200/my_index/_termlist
That gives me the following response:
{"ok":true,"_shards":{"total":5,"successful":5,"failed":0},"terms": ["test","some","another"]}
Verify that mapping is actually getting set by running:
curl localhost:9200/my_index/_mapping?pretty=true
The command that creates the index seems to be incorrect. It shouldn't contain "index" : { as a root element. Try this:
curl -XPUT localhost:9200/my_index -d '{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 2
},
"mappings": {
"my_type": {
"_all": {
"enabled": false
},
"_source": {
"compressed": true
},
"properties": {
"my_field": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}'
In ElasticSearch a field is indexed when it goes within the inverted index, the data structure that lucene uses to provide its great and fast full text search capabilities. If you want to search on a field, you do have to index it. When you index a field you can decide whether you want to index it as it is, or you want to analyze it, which means deciding a tokenizer to apply to it, which will generate a list of tokens (words) and a list of token filters that can modify the generated tokens (even add or delete some). The way you index a field affects how you can search on it. If you index a field but don't analyze it, and its text is composed of multiple words, you'll be able to find that document only searching for that exact specific text, whitespaces included.
You can have fields that you only want to search on, and never show: indexed and not stored (default in lucene).
You can have fields that you want to search on and also retrieve: indexed and stored.
You can have fields that you don't want to search on, but you do want to retrieve to show them.

Resources