Elasticsearch: updating the analyzer creates a "members" field

I came across a problem where I needed to update the stopwords on an index that specified the english analyzer as the default analyzer. Typically, the analyzers are specified in the settings for the index:
{
  "twitter": {
    "settings": {
      "index": {
        "creation_date": "1469465586110",
        "analysis": {
          "filter": {
            "lowercaseFilter": {
              "type": "lowercase"
            }
          },
          "analyzer": {
            "default": {
              "type": "english"
            },
            ...
So, the analyzers are located at <index name>.settings.index.analysis.analyzer
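For reference, the output above is the shape returned by the index settings endpoint:
curl -XGET "http://localhost:9200/twitter/_settings?pretty"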
To update the analyzer, I ran these commands:
curl -XPOST "http://localhost:9200/twitter/_close" && \
curl -XPUT "http://localhost:9200/twitter/_settings" -d'
{
  "analysis": {
    "analyzer": {
      "default": {
        "type": "english",
        "stopwords": "_none_"
      }
    }
  }
}' && \
curl -XPOST "http://localhost:9200/twitter/_open"
After running those commands, I verified that the default analyzer was analyzing text, and keeping all stopwords.
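One way to spot-check this is the _analyze API (request shape for ES 5+; older versions also accept the analyzer and text as query-string parameters, and the sample text here is arbitrary):
curl -XPOST "http://localhost:9200/twitter/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer": "default",
  "text": "the quick brown fox"
}'
With stopwords disabled, "the" should show up in the returned tokens.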
However, when I update the settings through the Jest client, the settings end up looking like this, and analysis no longer works properly (note how the analysis settings are now nested under a "members" property):
{
  "twitter": {
    "settings": {
      "index": {
        "members": {
          "analysis": {
            "analyzer": {
              "default": {
                "type": "english",
                "stopwords": "_none_"
              },
              ...
I've stepped through the code and everything looks in order.

I figured it out. So by running:
sudo tcpflow -p -c -i lo0 port 9200 2>/dev/null | grep -oE '.*(GET|POST|PUT|DELETE) .*_dev.*' -A30
I could see that the JsonObject I was sending included the members field, which is where Gson's JsonObject stores its child elements internally. Since I was passing this raw object into Jest's UpdateSettings builder, it was serialized in a way I didn't expect (including the members field) and sent to Elasticsearch that way. I solved the problem by calling the JsonObject's toString() method and passing the resulting JSON string to the UpdateSettings Builder.
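In code, the fix looks roughly like this (a minimal sketch; the JestClient wiring and the buildAnalysisSettings helper are assumed from the surrounding application):

import com.google.gson.JsonObject;
import io.searchbox.client.JestClient;
import io.searchbox.indices.settings.UpdateSettings;

import java.io.IOException;

class SettingsUpdater {
    static void updateAnalyzer(JestClient client) throws IOException {
        // Hypothetical helper that builds the new analysis settings as a Gson JsonObject.
        JsonObject settings = buildAnalysisSettings();

        // Problematic: passing the JsonObject itself lets it be serialized like an
        // ordinary POJO, exposing Gson's internal "members" map in the request body:
        // new UpdateSettings.Builder(settings).build();

        // Fix: serialize to a JSON string first, so the request body is the JSON
        // itself rather than the object's internal structure.
        UpdateSettings update = new UpdateSettings.Builder(settings.toString()).build();
        client.execute(update);
    }
}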

Related

Elasticsearch mapping is empty after creating

I'm trying to create an autocomplete index for my elasticsearch using the search_as_you_type datatype.
The first command I run is
curl --request PUT 'https://elasticsearch.company.me/autocomplete' \
'{
  "mappings": {
    "properties": {
      "company_name": {
        "type": "search_as_you_type"
      },
      "serviceTitle": {
        "type": "search_as_you_type"
      }
    }
  }
}'
which returns
{"acknowledged":true,"shards_acknowledged":true,"index":"autocomplete"}curl: (3) nested brace in URL position 18:
{
"mappings": {
"properties": etc.the rest of the json object I created}}
Then I reindex using
curl --silent --request POST 'http://elasticsearch.company.me/_reindex?pretty' --data-raw '{
  "source": {
    "index": "existing_index"
  },
  "dest": {
    "index": "autocomplete"
  }
}' | grep "total\|created\|failures"
I expect to see something like "total":1000,"created":5, etc., or at least some kind of response in the terminal, but I get nothing. Also, when I check the mapping of my autocomplete index by running curl -u thething 'https://elasticsearch.company.me/autocomplete/_mappings?pretty',
I get an empty mapping result:
{
"autocomplete" : {
"mappings" : { }
}
}
Is my error in the creation of my index or in the reindexing? I'm expecting the autocomplete mappings to show the two fields I'm searching for, i.e. "company_name" and "serviceTitle". Any ideas how to fix this?
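For what it's worth, the curl error above hints at the likely cause: without a --data/-d flag, curl treats the JSON as a second URL (hence "nested brace in URL position 18"), so the index is created with default, empty mappings. A corrected creation command would pass the mapping as a request body with the appropriate content type, roughly:
curl --request PUT 'https://elasticsearch.company.me/autocomplete' \
  -H 'Content-Type: application/json' \
  --data-raw '{
  "mappings": {
    "properties": {
      "company_name": {
        "type": "search_as_you_type"
      },
      "serviceTitle": {
        "type": "search_as_you_type"
      }
    }
  }
}'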

Elasticsearch: Issues reindexing - ending up with more than one type

ES 6.8.6
I am trying to reindex some indexes to reduce the number of shards.
The original index had a type of 'auth' but recently I added a template that used _doc. When I tried:
curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "auth_2019.03.02"
  },
  "dest": {
    "index": "auth_ri_2019.03.02",
    "type": "_doc"
  }
}
'
I get this error:
"Rejecting mapping update to [auth_ri_2019.03.02] as the final mapping would have more than 1 type: [_doc, auth]"
I understand that I can't have more than one type and that types are deprecated in 7.x. My question is: can I change the type during the reindex operation?
I am trying to tidy everything up in preparation to moving to 7.x.
It looks like you have to write a script to change the document during the reindex process.
From the docs,
Like _update_by_query, _reindex supports a script that modifies the document.
You are indeed able to change type.
Think of the possibilities! Just be careful; you are able to change:
_id,
_type,
_index,
_version,
_routing
For your case add
"script": {
"source": "ctx._type = '_doc'",
"lang": "painless"
}
Full example:
{
  "source": {
    "index": "auth_2019.03.02"
  },
  "dest": {
    "index": "auth_ri_2019.03.02"
  },
  "script": {
    "source": "ctx._type = '_doc'",
    "lang": "painless"
  }
}
First, thanks to leandrojmp for prompting me to reread the docs and notice the example where a type is specified for both source and dest.
I don't understand why, but adding a type to the source specification solved the problem.
This worked:
curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "auth_2019.03.02",
    "type": "auth"
  },
  "dest": {
    "index": "auth_ri_2019.03.02",
    "type": "_doc"
  }
}
'
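To double-check the result, fetching the destination index's mapping (index name from the example above) should now show only the _doc type:
curl "localhost:9200/auth_ri_2019.03.02/_mapping?pretty"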

Mapping definition for [fields] has unsupported parameters: [analyzer : case_sensitive]

In my search engine, users can select to search case-sensitively or not. If they choose to, the query will search on fields which use a custom case-sensitive analyser. This is my setup:
GET /candidates/_settings
{
  "candidates": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "candidates",
        "creation_date": "1528210812046",
        "analysis": {
          "analyzer": {
            "case_sensitive": {
              "filter": [
                "stop",
                "porter_stem"
              ],
              "type": "custom",
              "tokenizer": "standard"
            }
          }
        },
        ...
      }
    }
  }
}
So I have created a custom analyser called case_sensitive taken from this answer. I am trying to define my mapping as follows:
PUT /candidates/_mapping/candidate
{
  "properties": {
    "first_name": {
      "type": "text",
      "fields": {
        "case": {
          "type": "text",
          "analyzer": "case_sensitive"
        }
      }
    }
  }
}
So, when querying, for a case-sensitive match, I can do:
"simple_query_string": {
  "query": "<text to search>",
  "fields": [
    "first_name.case"
  ]
}
I am not even getting to the last step, as the error described in the title is thrown when I try to define the mapping.
I initially thought that my error was similar to this one but I think that issue is only related to using the keyword tokenizer and not the standard one
In this mapping definition, I was actually trying to adjust the mapping for several different fields, not just first_name. One of those fields has the type long, and that is the mapping definition that was throwing the error. When I remove it from the mapping definition, everything works as expected. However, I am unsure why this fails for this data type.
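A likely explanation: the analyzer parameter only applies to analyzed text fields, and numeric types such as long are not analyzed, so the mapping parser rejects it as an unsupported parameter. A minimal reproduction sketch (the years_experience field name is hypothetical):
PUT /candidates/_mapping/candidate
{
  "properties": {
    "years_experience": {
      "type": "long",
      "fields": {
        "case": {
          "type": "long",
          "analyzer": "case_sensitive"
        }
      }
    }
  }
}
This fails with an error like the one in the title; dropping the analyzer line (or the whole case sub-field) from the long field should make the request succeed.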

How to denormalize hierarchy in ElasticSearch?

I am new to ElasticSearch and I have a tree which describes a path to a certain document (not real filesystem paths, just simple text fields categorizing articles, images, and documents alike). Each path entry has a type, e.g. Group Name, Assembly Name, or even Unknown. The types could be used in queries to skip certain entries in the path, for example.
My source data is stored in SQL Server. The tree builds up by connecting Tree.Id to Tree.ParentId, each node must have a type, and the Documents are connected to a leaf in the Tree.
I am not worried about querying the structure in SQL Server; however, I need to find a good approach to denormalizing and searching it in Elastic. If I flatten the paths and make a list of "descriptors" for a document, I can store each of the Document entries as an Elastic document:
{
  "path": "NodeNameRoot/NodeNameLevel_1/NodeNameLevel_2/NodeNameLevel_3/NodeNameLevel_4",
  "descriptors": [
    {
      "name": "NodeNameRoot",
      "type": "type1"
    },
    {
      "name": "NodeNameLevel_1",
      "type": "type1"
    },
    {
      "name": "NodeNameLevel_2",
      "type": "type2"
    },
    {
      "name": "NodeNameLevel_3",
      "type": "type2"
    },
    {
      "name": "NodeNameLevel_4",
      "type": "type3"
    }
  ],
  "document": {
    ...
  }
}
Can I query such a structure in ElasticSearch, or should I denormalize the paths in a different way?
My main questions:
Can I query them based on type or text value (regex matching, for example)? E.g.: give me all the type2->type3 paths (practically leaving type1 out) where the path contains X.
Is it possible to query based on levels? E.g. only the paths where there are exactly 4 descriptors.
Can I do the searching with the built-in functionality or do I need to write an extension?
Edit
Based on G Quintana's answer, I made an index like this:
curl -X PUT \
  http://localhost:9200/test \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
  "mappings": {
    "path": {
      "properties": {
        "names": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            },
            "tokens": {
              "type": "text",
              "analyzer": "pathname_analyzer"
            },
            "depth": {
              "type": "token_count",
              "analyzer": "pathname_analyzer"
            }
          }
        },
        "types": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            },
            "tokens": {
              "type": "text",
              "analyzer": "pathname_analyzer"
            }
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "pathname_analyzer": {
          "type": "pattern",
          "pattern": "#->>",
          "lowercase": true
        }
      }
    }
  }
}'
And I could query the depth like this:
curl -X POST \
  http://localhost:9200/test/path/_search \
  -H 'content-type: application/json' \
  -d '{
  "query": {
    "bool": {
      "should": [
        { "match": { "names.depth": 5 } }
      ]
    }
  }
}'
This returns correct results. I will test it a little more.
First of all you should identify all your query patterns to design how you will index your data.
From the example you gave, I would index documents of the form:
{
  "path": "NodeNameRoot/NodeNameLevel_1/NodeNameLevel_2/NodeNameLevel_3/NodeNameLevel_4",
  "types": "type1/type1/type2/type2/type3",
  "document": {
    ...
  }
}
Before indexing, you must configure mapping and analysis (a sketch follows this list):
Field path:
use type text + analyzer based on pattern analyzer to split at / characters
use type token_count + same analyzer to compute path depth. Create a multi field (path.depth)
Field types
use type text + analyzer based on pattern analyzer to split at / characters
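A sketch of that mapping and analysis (the index name documents and type name doc are placeholders; the pattern analyzer splits on / characters):
PUT /documents
{
  "settings": {
    "analysis": {
      "analyzer": {
        "path_analyzer": {
          "type": "pattern",
          "pattern": "/",
          "lowercase": true
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "path": {
          "type": "text",
          "analyzer": "path_analyzer",
          "fields": {
            "depth": {
              "type": "token_count",
              "analyzer": "path_analyzer"
            }
          }
        },
        "types": {
          "type": "text",
          "analyzer": "path_analyzer"
        }
      }
    }
  }
}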
With the path and types fields split this way, each of your queries maps onto built-in functionality (a combined example follows below):
Give me all the type2->type3 paths: use a match_phrase query on the types field.
Where the path contains X: use a match query on the path field.
Where there are 4 descriptors: use a term query on the path.depth sub-field.
Your descriptors field is not needed.
The path_hierarchy tokenizer might be interesting for some use cases.
You can apply multiple analyzers to the same field using multi-fields and then query the sub-fields.
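Putting those together, a query for type2->type3 paths containing X at depth 4 might look like this (index name from the sketch above; the search text X is a placeholder):
POST /documents/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "types": "type2/type3" } },
        { "match": { "path": "X" } },
        { "term": { "path.depth": 4 } }
      ]
    }
  }
}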

Change settings and mappings on existing index in Elasticsearch

I would like the following settings and mapping set on an already existing index in Elasticsearch:
{
  "analysis": {
    "analyzer": {
      "dot-analyzer": {
        "type": "custom",
        "tokenizer": "dot-tokenizer"
      }
    },
    "tokenizer": {
      "dot-tokenizer": {
        "type": "path_hierarchy",
        "delimiter": "."
      }
    }
  }
}
{
  "doc": {
    "properties": {
      "location": {
        "type": "string",
        "index_analyzer": "dot-analyzer",
        "search_analyzer": "keyword"
      }
    }
  }
}
I have tried to add these two lines of code:
client.admin().indices().prepareUpdateSettings(Index).setSettings(settings).execute().actionGet();
client.admin().indices().preparePutMapping(Index).setType(Type).setSource(mapping).execute().actionGet();
But this is the result:
org.elasticsearch.index.mapper.MapperParsingException: Analyzer [dot-analyzer] not found for field [location]
Anyone? Thanks a lot,
Stine
This seems to work:
if (client.admin().indices().prepareExists(Index).execute().actionGet().exists()) {
    // Index already exists: close it, update the analysis settings, reopen it,
    // then replace the mapping.
    client.admin().indices().prepareClose(Index).execute().actionGet();
    client.admin().indices().prepareUpdateSettings(Index).setSettings(settings.string()).execute().actionGet();
    client.admin().indices().prepareOpen(Index).execute().actionGet();
    client.admin().indices().prepareDeleteMapping(Index).setType(Type).execute().actionGet();
    client.admin().indices().preparePutMapping(Index).setType(Type).setSource(mapping).execute().actionGet();
} else {
    // Index doesn't exist yet: create it with the mapping and settings in one go.
    client.admin().indices().prepareCreate(Index).addMapping(Type, mapping).setSettings(settings).execute().actionGet();
}
If you look at your settings after sending the changes, you'll notice that the analyzer is not there. In fact, you can't change the analysis section of the settings on a live index. It's better to create the index with the desired settings in the first place; otherwise, you can close it:
curl -XPOST localhost:9200/index_name/_close
While the index is closed you can send the new settings, for example the analysis settings from the question (the index name is a placeholder):
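curl -XPUT localhost:9200/index_name/_settings -d '{
  "analysis": {
    "analyzer": {
      "dot-analyzer": {
        "type": "custom",
        "tokenizer": "dot-tokenizer"
      }
    },
    "tokenizer": {
      "dot-tokenizer": {
        "type": "path_hierarchy",
        "delimiter": "."
      }
    }
  }
}'
After that you can reopen the index: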
curl -XPOST localhost:9200/index_name/_open
While the index is closed it doesn't use any cluster resources, but it is not readable or writable. If you want to close and reopen the index using the Java API, you can use the following code:
client.admin().indices().prepareClose(indexName).execute().actionGet();
//TODO update settings
client.admin().indices().prepareOpen(indexName).execute().actionGet();
