Elasticsearch mapping is empty after creating index

I'm trying to create an autocomplete index for my Elasticsearch cluster using the search_as_you_type datatype.
The first command I run is
curl --request PUT 'https://elasticsearch.company.me/autocomplete' \
'{
  "mappings": {
    "properties": {
      "company_name": {
        "type": "search_as_you_type"
      },
      "serviceTitle": {
        "type": "search_as_you_type"
      }
    }
  }
}'
which returns
{"acknowledged":true,"shards_acknowledged":true,"index":"autocomplete"}curl: (3) nested brace in URL position 18:
{
"mappings": {
"properties": etc.the rest of the json object I created}}
Then I reindex using
curl --silent --request POST 'http://elasticsearch.company.me/_reindex?pretty' --data-raw '{
  "source": {
    "index": "existing_index"
  },
  "dest": {
    "index": "autocomplete"
  }
}' | grep "total\|created\|failures"
I expect to see something like "total":1000,"created":5... or at least some kind of response in the terminal, but I get nothing. Also, when I check the mapping of my autocomplete index by running curl -u thething 'https://elasticsearch.company.me/autocomplete/_mappings?pretty',
I get an empty mapping result:
{
"autocomplete" : {
"mappings" : { }
}
}
Is my error in the creation of my index or in the reindexing? I'm expecting the autocomplete mappings to show the two fields I'm searching for, i.e. "company_name" and "serviceTitle". Any ideas how to fix this?
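For what it's worth, the curl: (3) output above suggests curl parsed the JSON as a second URL, since the command passes the body as a bare argument with no --data-raw flag or Content-Type header; the PUT would then have gone through with an empty body, leaving an empty mapping. A minimal sketch of the same create command with the body actually attached (same hostname as in the question, assuming Elasticsearch 7.x for search_as_you_type):

# hypothetical corrected command: body passed via --data-raw with a JSON Content-Type
curl --request PUT 'https://elasticsearch.company.me/autocomplete' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "mappings": {
      "properties": {
        "company_name": { "type": "search_as_you_type" },
        "serviceTitle": { "type": "search_as_you_type" }
      }
    }
  }'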

Related

Elasticsearch join-like query within same index

I have an index with the following structure (mapping):
{
  "properties": {
    "content": {
      "type": "text"
    },
    "prev_id": {
      "type": "text"
    },
    "next_id": {
      "type": "text"
    }
  }
}
where prev_id and next_id are IDs of documents in this index (and may be null).
I want to perform a _search query and also get the prev.content and next.content fields.
Now I use two queries: the first searches by the content field
curl -X GET 'localhost:9200/idx/_search' -H 'content-type: application/json' -d '{
  "query": {
    "match": {
      "content": "yellow fox"
    }
  }
}'
and the second to get next and prev records.
curl -X GET 'localhost:9200/idx/_search' -H 'content-type: application/json' -d '{
  "query": {
    "ids": {
      "values": ["5bb93552e42140f955501d7b77dc8a0a", "cd027a48445a0a193bc80982748bc846", "9a5b7359d3081f10d099db87c3226d82"]
    }
  }
}'
Then I join the results on the application side.
Can I achieve my goal with only one query?
P.S. The purpose of storing next/prev as IDs is to save disk space. I have a lot of records and the content field is quite large.
What you are doing is the way to go. But how large is the content? Maybe you can consider not storing content (source = false)?
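A rough sketch of what that suggestion could look like, assuming Elasticsearch 7.x (the index name idx_slim is made up here). Excluding content from _source keeps it searchable but no longer retrievable or reindexable, so it only helps if the text can be rebuilt from elsewhere; keyword is used for the ID fields, the usual choice for exact-match IDs:

# hypothetical index: content is indexed for search but excluded from _source
curl -X PUT 'localhost:9200/idx_slim' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "_source": { "excludes": ["content"] },
    "properties": {
      "content": { "type": "text" },
      "prev_id": { "type": "keyword" },
      "next_id": { "type": "keyword" }
    }
  }
}'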

Elasticsearch: Issues reindexing - ending up with more than one type

ES 6.8.6
I am trying to reindex some indexes to reduce the number of shards.
The original index had a type of 'auth', but recently I added a template that uses _doc. When I tried:
curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "auth_2019.03.02"
  },
  "dest": {
    "index": "auth_ri_2019.03.02",
    "type": "_doc"
  }
}
'
I get this error:
"Rejecting mapping update to [auth_ri_2019.03.02] as the final mapping would have more than 1 type: [_doc, auth]"
I understand that I can't have more than one type and that types are deprecated in 7.x. My question is: can I change the type during the reindex operation?
I am trying to tidy everything up in preparation for moving to 7.x.
It looks like you have to write a script to change the document during the reindex process.
From the docs,
Like _update_by_query, _reindex supports a script that modifies the document.
You are indeed able to change type.
Think of the possibilities! Just be careful; you are able to change:
_id,
_type,
_index,
_version,
_routing
For your case, add
"script": {
  "source": "ctx._type = '_doc'",
  "lang": "painless"
}
Full example:
{
  "source": {
    "index": "auth_2019.03.02"
  },
  "dest": {
    "index": "auth_ri_2019.03.02"
  },
  "script": {
    "source": "ctx._type = '_doc'",
    "lang": "painless"
  }
}
First, thanks to leandrojmp for prompting me to reread the docs and for noticing the example where they had a type specified for both source and dest.
I don't understand why but adding a type to the source specification solved the problem.
This worked:
curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "auth_2019.03.02",
    "type": "auth"
  },
  "dest": {
    "index": "auth_ri_2019.03.02",
    "type": "_doc"
  }
}
'
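As a hypothetical sanity check (not part of the original answer), the mapping of the new index can be inspected afterwards to confirm that only _doc remains:

# hypothetical follow-up check
curl -X GET "localhost:9200/auth_ri_2019.03.02/_mapping?pretty"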

Changing the timestamp format of an Elasticsearch index

I am trying to load log records into Elasticsearch (7.3.1) and show the results in Kibana. Although the records are loaded into Elasticsearch and a curl GET shows them, they are not visible in Kibana.
Most of the time, this is because of the timestamp format. In my case, the proper timestamp format should be basic_date_time, but the index only has:
# curl -XGET 'localhost:9200/og/_mapping'
{
  "og": {
    "mappings": {
      "properties": {
        "#timestamp": { "type": "date" },
        "componentName": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        }
      }
    }
  }
}
I would like to add the format 'basic_date_time' to the #timestamp property, but every attempt is either not accepted by Elasticsearch or does not change the index field.
I simply fail to get the right command to do the job.
For example, the simplest I could think of,
curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/og/_mapping' -d'
{
  "mappings": {
    "properties": {
      "#timestamp": { "type": "date", "format": "basic_date_time" }
    }
  }
}
'
gives error
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Root mapping definition has unsupported parameters: [mappings : {properties={#timestamp={format=basic_date_time, type=date}}}]"}],"type":"mapper_parsing_exception","reason":"Root mapping definition has unsupported parameters: [mappings : {properties={#timestamp={format=basic_date_time, type=date}}}]"},"status":400}%
and trying to do it via Kibana with
PUT /og
{
  "mappings": {
    "properties": {
      "#timestamp": { "type": "date", "format": "basic_date_time" }
    }
  }
}
gives
{
  "error": {
    "root_cause": [
      {
        "type": "resource_already_exists_exception",
        "reason": "index [og/NIT2FoNfQpuPT3Povp97bg] already exists",
        "index_uuid": "NIT2FoNfQpuPT3Povp97bg",
        "index": "og"
      }
    ],
    "type": "resource_already_exists_exception",
    "reason": "index [og/NIT2FoNfQpuPT3Povp97bg] already exists",
    "index_uuid": "NIT2FoNfQpuPT3Povp97bg",
    "index": "og"
  },
  "status": 400
}
I am not sure if I should even try this in Kibana. But I would be very glad if I could find the right curl command to get the index changed.
Thanks for helping, Ruud
You can do it either via curl like this:
curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/og/_mapping' -d '{
  "properties": {
    "#timestamp": {
      "type": "date",
      "format": "basic_date_time"
    }
  }
}'
Or in Kibana like this:
PUT /og/_mapping
{
  "properties": {
    "#timestamp": {
      "type": "date",
      "format": "basic_date_time"
    }
  }
}
Also worth noting: once an index/mapping is created, you usually cannot modify it (with very few exceptions). You can instead create a new index with the correct mapping and reindex your data into it, as sketched below.
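A minimal sketch of that route, with og_v2 as a made-up target name:

# sketch: create the new index with the desired mapping (og_v2 is a made-up name)
curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/og_v2' -d '{
  "mappings": {
    "properties": {
      "#timestamp": { "type": "date", "format": "basic_date_time" }
    }
  }
}'
# then copy the data across
curl -H 'Content-Type: application/json' -XPOST 'http://localhost:9200/_reindex' -d '{
  "source": { "index": "og" },
  "dest": { "index": "og_v2" }
}'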

Elasticsearch updating the analyzer creates a members field

I came across a problem where I needed to update the stopwords on an index, which was specifying the english analyzer as the default analyzer. Typically, the analyzers are specified in the settings for the index:
{
  "twitter": {
    "settings": {
      "index": {
        "creation_date": "1469465586110",
        "analysis": {
          "filter": {
            "lowercaseFilter": {
              "type": "lowercase"
            }
          },
          "analyzer": {
            "default": {
              "type": "english"
            },
            ...
So, the analyzers are located at <index name>.settings.index.analysis.analyzer
To update the analyzer, I ran these commands:
curl -XPOST "http://localhost:9200/twitter/_close" && \
curl -XPUT "http://localhost:9200/twitter/_settings" -d'
{
  "analysis": {
    "analyzer": {
      "default": {
        "type": "english",
        "stopwords": "_none_"
      }
    }
  }
}' && \
curl -XPOST "http://localhost:9200/twitter/_open"
After running those commands, I verified that the default analyzer was analyzing text and keeping all stopwords.
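One way to verify that (sketched here for a 2.x-era cluster, since the _analyze request shape differs across versions) is to run a sample string through the default analyzer and check that stopwords such as "the" come back as tokens:

# sketch for ES 1.x/2.x; newer versions take a JSON body instead of query params
curl -XGET 'http://localhost:9200/twitter/_analyze?analyzer=default' -d 'the quick brown fox'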
However, when I use the Jest client, the settings end up looking like this, and the analysis isn't happening properly (note how the analysis settings are now under the "members" property):
{
  "twitter": {
    "settings": {
      "index": {
        "members": {
          "analysis": {
            "analyzer": {
              "default": {
                "type": "english",
                "stopwords": "_none_"
              },
              ...
I've stepped through the code and everything looks in order.
I figured it out. So by running:
sudo tcpflow -p -c -i lo0 port 9200 2>/dev/null | grep -oE '.*(GET|POST|PUT|DELETE) .*_dev.*' -A30
I could see that the JsonObject I was sending included the members field, which is where Gson's JsonObject stores the objects inside itself. Since I was passing this raw object into Jest's UpdateSettings builder, it was being serialized in a way I didn't expect (including the members field) and sent to Elasticsearch that way. I solved the problem by calling the JsonObject's toString() method and passing that to the UpdateSettings builder.

Why is an Elasticsearch "not_analyzed" field split into terms?

I have the following field in my mapping definition:
...
"my_field": {
  "type": "string",
  "index": "not_analyzed"
}
...
When I index a document with a my_field value of 'test-some-another', that value is split into 3 terms: test, some, another.
What am I doing wrong?
I created the following index:
curl -XPUT localhost:9200/my_index -d '{
  "index": {
    "settings": {
      "number_of_shards": 5,
      "number_of_replicas": 2
    },
    "mappings": {
      "my_type": {
        "_all": {
          "enabled": false
        },
        "_source": {
          "compressed": true
        },
        "properties": {
          "my_field": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}'
Then I index the following document:
curl -XPOST localhost:9200/my_index/my_type -d '{
  "my_field": "test-some-another"
}'
Then I use the plugin https://github.com/jprante/elasticsearch-index-termlist with the following API:
curl -XGET localhost:9200/my_index/_termlist
That gives me the following response:
{"ok":true,"_shards":{"total":5,"successful":5,"failed":0},"terms": ["test","some","another"]}
Verify that mapping is actually getting set by running:
curl localhost:9200/my_index/_mapping?pretty=true
The command that creates the index seems to be incorrect. It shouldn't contain "index" : { as a root element. Try this:
curl -XPUT localhost:9200/my_index -d '{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 2
  },
  "mappings": {
    "my_type": {
      "_all": {
        "enabled": false
      },
      "_source": {
        "compressed": true
      },
      "properties": {
        "my_field": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}'
In Elasticsearch a field is indexed when it goes into the inverted index, the data structure Lucene uses to provide its fast full-text search capabilities. If you want to search on a field, you have to index it.
When you index a field you can decide whether you want to index it as it is, or you want to analyze it, which means choosing a tokenizer to apply to it. The tokenizer generates a list of tokens (words), and a list of token filters can then modify the generated tokens (even add or delete some).
The way you index a field affects how you can search on it. If you index a field but don't analyze it, and its text is composed of multiple words, you'll only be able to find that document by searching for that exact text, whitespace included.
You can have fields that you only want to search on, and never show: indexed and not stored (the default in Lucene).
You can have fields that you want to search on and also retrieve: indexed and stored.
You can have fields that you don't want to search on, but do want to retrieve and show: not indexed but stored (a sketch of all three follows below).
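A sketch of those three combinations in the legacy pre-2.x string-mapping syntax the question uses (demo_index and the field names are invented for illustration; in that era store defaulted to "no" because values are normally retrieved from _source):

# sketch in legacy (pre-2.x) syntax; demo_index and the field names are invented
curl -XPUT localhost:9200/demo_index -d '{
  "mappings": {
    "my_type": {
      "properties": {
        "search_only": { "type": "string", "index": "analyzed", "store": "no" },
        "search_and_show": { "type": "string", "index": "analyzed", "store": "yes" },
        "show_only": { "type": "string", "index": "no", "store": "yes" }
      }
    }
  }
}'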
