Latest fsriver plugin not working - elasticsearch

I have an issue with the latest fsriver plugin.
I executed the following command to index documents:
PUT _river/mynewriver2/_meta
{
"type": "fs",
"fs": {
"url": "d://tmp",
"update_rate": "1h",
"includes": [ "*.doc" , "*.xls", "*.txt" ]
},
"index": {
"index": "docs1",
"type": "doc1",
"bulk_size": 50
}
}
Inside d://tmp I have a simple txt file containing a person's name.
But when I run the command to check for documents, I don't get any back.
GET docs1/doc1/_search
Output:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
In the Elasticsearch console, I see the following log:
[2015-05-23 12:40:40,645][INFO ][cluster.metadata ] [Ulysses] [.marvel-2015.05.23] update_mapping [cluster_stats] (dynamic)
[2015-05-23 12:40:54,037][INFO ][cluster.metadata ] [Ulysses] [_river] creating index, cause [auto(index api)], templates [], shards [1]/[1], mappings [mynewriver2]
[2015-05-23 12:40:56,511][INFO ][cluster.metadata ] [Ulysses] [_river] update_mapping [mynewriver2] (dynamic)
[2015-05-23 12:40:57,023][INFO ][fr.pilato.elasticsearch.river.fs.river.FsRiver] [Ulysses] [fs][mynewriver2] Starting fs river scanning
[2015-05-23 12:40:57,309][INFO ][cluster.metadata ] [Ulysses] [docs1] creating index, cause [api], templates [], shards [5]/[1], mappings []
[2015-05-23 12:41:00,762][INFO ][cluster.metadata ] [Ulysses] [.marvel-2015.05.23] update_mapping [index_event] (dynamic)
I am running Elasticsearch 1.5.2 on Windows 7 (64-bit).

Since you're on a Windows system, the specified path does not look correct according to the documentation: you should use either two backslashes or a single forward slash in your path instead of two forward slashes. Can you try deleting your river and re-creating it like this:
PUT _river/mynewriver2/_meta
{
"type": "fs",
"fs": {
"url": "d:\\tmp",
"update_rate": "1h",
"includes": [ "*.doc" , "*.xls", "*.txt" ]
},
"index": {
"index": "docs1",
"type": "doc1",
"bulk_size": 50
}
}
or like this:
PUT _river/mynewriver2/_meta
{
"type": "fs",
"fs": {
"url": "d:/tmp",
"update_rate": "1h",
"includes": [ "*.doc" , "*.xls", "*.txt" ]
},
"index": {
"index": "docs1",
"type": "doc1",
"bulk_size": 50
}
}
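If the river definition is generated from a script, the escaping can be left to a JSON serializer. The following is a minimal sketch in Python, not part of the plugin: river_meta is a hypothetical helper, and collapsing "://" into ":/" is my assumption about how to clean up the path from the question.

```python
import json

def river_meta(path, index="docs1", doc_type="doc1"):
    """Build an fsriver _meta body for a local directory (hypothetical helper)."""
    # Assumption: "url" is a local Windows path, so the doubled forward slash
    # from the question ("d://tmp") is collapsed into a single separator.
    # A backslash path such as d:\tmp is left untouched.
    url = path.replace("://", ":/")
    return {
        "type": "fs",
        "fs": {
            "url": url,
            "update_rate": "1h",
            "includes": ["*.doc", "*.xls", "*.txt"],
        },
        "index": {"index": index, "type": doc_type, "bulk_size": 50},
    }

# json.dumps escapes the single backslash, so the request body carries d:\\tmp.
backslash_body = json.dumps(river_meta("d:\\tmp"))
# Forward slashes need no escaping; the doubled slash is reduced to d:/tmp.
forward_body = json.dumps(river_meta("d://tmp"))
```

Either string can be sent as the body of PUT _river/mynewriver2/_meta after the old river has been deleted.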

Related

Elasticsearch alias not being created on index creation

I'm using the go-elasticsearch API in my application to create indices in an Elastic.co cloud cluster. The application dynamically creates an index with a template and then starts indexing documents. The template includes an alias name and looks like this:
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"title": {
"type": "text"
},
"created_at": {
"type": "date"
},
"updated_at": {
"type": "date"
},
"status": {
"type": "keyword"
}
}
},
"aliases": {
"rollout-nodes-f0776f0": {}
}
}
The name of the alias can change, so we pass it to the template when we create a new index. This is done with the Create indices API in Go:
indexTemplate := getIndexTemplate()
res, err := n.client.Indices.Create(
indexName,
n.client.Indices.Create.WithBody(indexTemplate),
n.client.Indices.Create.WithContext(ctx),
n.client.Indices.Create.WithTimeout(time.Second),
)
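To make the "pass the alias to the template" step concrete, here is a rough sketch of how such a body could be assembled. It is written in Python rather than Go for brevity, and BASE_TEMPLATE and index_body are hypothetical names; the real getIndexTemplate may work differently.

```python
import copy
import json

# Static part of the index body, copied from the template shown earlier.
BASE_TEMPLATE = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "created_at": {"type": "date"},
            "updated_at": {"type": "date"},
            "status": {"type": "keyword"},
        }
    },
}

def index_body(alias_name):
    """Return the create-index request body with the given alias attached."""
    # Deep-copy so repeated calls never mutate the shared base template.
    body = copy.deepcopy(BASE_TEMPLATE)
    body["aliases"] = {alias_name: {}}
    return json.dumps(body)

body = index_body("rollout-nodes-f0776f0")
```

Logging the serialized body right before the create call is a cheap way to confirm that the "aliases" block is actually present in what gets sent to the cloud cluster.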
In testing, this code works on localhost (without security enabled) but not against the cluster on Elastic.co: the index is created, but the alias is not.
I think the problem is related either to the API key permissions or to some configuration on the server, but I have not yet been able to find which permission I'm missing.
For more context, this is the API Key I'm using:
{
"id": "fakeID",
"name": "index-service-key",
"creation": 1675350573126,
"invalidated": false,
"username": "fakeUser",
"realm": "cloud-saml-kibana",
"metadata": {},
"role_descriptors": {
"logstash_writer": {
"cluster": [
"monitor",
"transport_client",
"read_ccr",
"read_ilm",
"manage_index_templates"
],
"indices": [
{
"names": [
"*"
],
"privileges": [
"all"
],
"allow_restricted_indices": false
}
],
"applications": [],
"run_as": [],
"metadata": {},
"transient_metadata": {
"enabled": true
}
}
}
}
Any ideas? I know I can use the POST _aliases API, but the index creation option should be working too.

Mapping exception while executing Rollup job

I have a Kibana instance which stores log data from our Java apps in daily indexes, like logstash-java-beats-2019.09.01. Since the number of indexes could become pretty big in the future, I want to create a rollup job so that I can archive old logs in a separate index, something like logstash-java-beats-rollup. A typical document in the logstash-java-beats-2019.09.01 index looks like this:
{
"_index": "logstash-java-beats-2019.10.01",
"_type": "_doc",
"_id": "C9mfhG0Bf_Fr5GBl6kTg",
"_version": 1,
"_score": 1,
"_source": {
"@timestamp": "2019-10-01T00:02:13.756Z",
"ecs": {
"version": "1.0.0"
},
"event_timestamp": "2019-10-01 00:02:13,756",
"log": {
"offset": 5729359,
"file": {
"path": "/var/log/application-name/application.log"
}
},
"tags": [
"service-name",
"location",
"beats_input_codec_plain_applied"
],
"loglevel": "WARN",
"java_class": "java.class.name",
"message": "Log message here",
"host": {
"name": "host-name-beat"
},
"@version": "1",
"agent": {
"hostname": "host-name",
"id": "a34af368-3359-495a-9775-63502693d148",
"ephemeral_id": "cc4afd3c-ad97-47a4-bd21-72255d450232",
"type": "filebeat",
"version": "7.2.0",
"name": "host-name-beat"
},
"input": {
"type": "log"
}
}
}
So I created a rollup job with this config:
{
"config": {
"id": "Test 2 job",
"index_pattern": "logstash-java-beats-2*",
"rollup_index": "logstash-java-beats-rollup",
"cron": "0 0 * * * ?",
"groups": {
"date_histogram": {
"fixed_interval": "1000ms",
"field": "@timestamp",
"delay": "1d",
"time_zone": "UTC"
}
},
"metrics": [],
"timeout": "20s",
"page_size": 1000
},
"status": {
"job_state": "stopped",
"current_position": {
"@timestamp.date_histogram": 1567933199000
},
"upgraded_doc_id": true
},
"stats": {
"pages_processed": 1840,
"documents_processed": 5322525,
"rollups_indexed": 1838383,
"trigger_count": 1,
"index_time_in_ms": 1555018,
"index_total": 1839,
"index_failures": 0,
"search_time_in_ms": 59059,
"search_total": 1840,
"search_failures": 0
}
}
but it fails to roll up the data with this exception:
Error while attempting to bulk index documents: failure in bulk execution:
[0]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$GTvyIZtPhKqi-dtfVd6MXg], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
[1]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$v-r89eEpLvImr0lWIrOb_Q], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
[2]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$quCHwZP1iVU_Bs2fmhgSjQ], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
...
The logstash-java-beats-rollup index is empty, even though some stats are available for the rollup job.
I'm using Elasticsearch v7.2.0.
Could you please explain what is wrong with the data or with the rollup job configuration?

Elasticsearch queries consuming 100% of CPU

I'm still relatively new to Elasticsearch and, currently, I'm attempting to switch from Solr to Elasticsearch and am seeing a huge increase in CPU usage when ES is on our production website. The site sees anywhere from 10,000 to 30,000 requests to ES per second. Solr handles that load just fine with our current hardware.
The books index mapping: https://pastebin.com/bKM9egPS
A query for a book: https://pastebin.com/AdfZ895X
ES is hosted on AWS on an m4.xlarge.elasticsearch instance.
Our cluster is set up as follows (anything not included is default):
"persistent": {
"cluster": {
"routing": {
"allocation": {
"cluster_concurrent_rebalance": "2",
"node_concurrent_recoveries": "2",
"disk": {
"watermark": {
"low": "15.0gb",
"flood_stage": "5.0gb",
"high": "10.0gb"
}
},
"node_initial_primaries_recoveries": "4"
}
}
},
"indices": {
"recovery": {
"max_bytes_per_sec": "60mb"
}
}
Our nodes have the following configuration:
"_nodes": {
"total": 2,
"successful": 2,
"failed": 0
},
"cluster_name": "cluster",
"nodes": {
"####": {
"name": "node1",
"version": "6.3.1",
"build_flavor": "oss",
"build_type": "zip",
"build_hash": "####",
"roles": [
"master",
"data",
"ingest"
]
},
"###": {
"name": "node2",
"version": "6.3.1",
"build_flavor": "oss",
"build_type": "zip",
"build_hash": "###",
"roles": [
"master",
"data",
"ingest"
]
}
}
Can someone please help me figure out what exactly is happening so I can get this deployment finished?

Healthy Elasticsearch cluster turns RED after opening a closed index

I have a managed cluster hosted by elastic.co. Here is the configuration:
Platform: Amazon Web Services
Memory: 4 GB
Storage: 96 GB
SSD: Yes
High availability: Yes, 2 data centers
Each index in this cluster contains the log data of exactly one day. The average index size is 15 MB and the average doc count is 15,000. The cluster is not under any kind of pressure (JVM, indexing and search time, and disk space are all well within comfortable limits).
When I opened a previously closed index, the cluster turned RED. Here are some metrics I found by querying Elasticsearch.
GET /_cluster/allocation/explain
{
"index": "some_index_name", # 1 Primary shard , 1 replica shard
"shard": 0,
"primary": true
}
Response:
"unassigned_info": {
"reason": "ALLOCATION_FAILED",
"failed_allocation_attempts": 3,
"details": "failed recovery, failure RecoveryFailedException[[some_index_name][0]: Recovery failed on {instance-*****}{Hash}{HASH}{IP}{IP}{logical_availability_zone=zone-1, availability_zone=***, region=***}]; nested: IndexShardRecoveryException[failed to fetch index version after copying it over]; nested: IndexShardRecoveryException[shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: IndexNotFoundException[no segments* file found in store(mmapfs(/app/data/nodes/0/indices/MFIFAQO2R_ywstzqrfbY4w/0/index)): files: []]; ",
"last_allocation_status": "no_valid_shard_copy"
},
"can_allocate": "no_valid_shard_copy",
"allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
"node_allocation_decisions": [
{
"node_name": "instance-***",
"node_decision": "no",
"store": {
"in_sync": false,
"allocation_id": "RANDOM_HASH",
"store_exception": {
"type": "index_not_found_exception",
"reason": "no segments* file found in SimpleFSDirectory@/app/data/nodes/0/indices/RANDOM_HASH/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@346e1b99: files: []"
}
}
},
{
"node_name": "instance-***",
"node_attributes": {
"logical_availability_zone": "zone-0"
},
"node_decision": "no",
"store": {
"found": false
}
}
I've tried rerouting the shard to a node, even with the accept_data_loss flag set to true.
POST _cluster/reroute
{
"commands" : [
{"allocate_stale_primary" : {
"index" : "some_index_name", "shard" : 0,
"node" : "instance-***",
"accept_data_loss" : true
}
}
]
}
Response:
"acknowledged": true,
"state": {
"version": 338190,
"state_uuid": "RANDOM_HASH",
"master_node": "RANDOM_HASH",
"blocks": {
"indices": {
"restored_**": {
"4": {
"description": "index closed",
"retryable": false,
"levels": [
"read",
"write"
]
}
},
"restored_**": {
"4": {
"description": "index closed",
"retryable": false,
"levels": [
"read",
"write"
]
}
}
}
},
"routing_table": {
"indices": {
"SOME_INDEX_NAME": {
"shards": {
"0": [
{
"state": "INITIALIZING",
"primary": true,
"relocating_node": null,
"shard": 0,
"index": "SOME_INDEX_NAME",
"recovery_source": {
"type": "EXISTING_STORE"
},
"allocation_id": {
"id": "HASH"
},
"unassigned_info": {
"reason": "ALLOCATION_FAILED",
"failed_attempts": 4,
"delayed": false,
"details": "same as explanation above ^ ",
"allocation_status": "no_valid_shard_copy"
}
},
{
"state": "UNASSIGNED",
"primary": false,
"node": null,
"relocating_node": null,
"shard": 0,
"index": "some_index_name",
"recovery_source": {
"type": "PEER"
},
"unassigned_info": {
"reason": "INDEX_REOPENED",
"delayed": false,
"allocation_status": "no_attempt"
}
}
]
}
},
Any suggestions are welcome. Thanks and regards.

This occurs when the master node is brought down abruptly.
Here are the steps I took to resolve the same issue when I encountered it:
Step 1: Check the allocation
curl -XGET 'http://localhost:9200/_cat/allocation?v'
Step 2: Check the shard stores
curl -XGET 'http://localhost:9200/_shard_stores?pretty'
Look for the "index", "shard" and "node" values of the entries that report the error you posted.
The error should be: "no segments* file found in SimpleFSDirectory@/...."
Step 3: Now reroute that index as shown below
curl -XPOST 'http://localhost:9200/_cluster/reroute?master_timeout=5m' \
-H 'Content-Type: application/json' \
-d '{ "commands": [ { "allocate_empty_primary": { "index": "IndexFromStep2", "shard": ShardFromStep2 , "node": "NodeFromStep2", "accept_data_loss" : true } } ] }'
Step 4: Repeat Step 2 and Step 3 until you see this output.
curl -XGET 'http://localhost:9200/_shard_stores?pretty'
{
"indices" : { }
}
Your cluster should go green soon.
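Steps 2 and 3 can be scripted so the index, shard and node values are not copied by hand. The sketch below is in Python and makes assumptions: failed_copies and empty_primary_command are hypothetical helpers, and the sample dictionary is a trimmed stand-in for a _shard_stores response, not real output. Keep in mind that allocate_empty_primary discards whatever data the shard held.

```python
import json

def failed_copies(shard_stores):
    """Yield (index, shard, node_name) for each store copy with a store_exception."""
    for index, index_info in shard_stores.get("indices", {}).items():
        for shard, shard_info in index_info.get("shards", {}).items():
            for store in shard_info.get("stores", []):
                if "store_exception" not in store:
                    continue
                # Assumption: the node entry is the only nested dict with a "name".
                for value in store.values():
                    if isinstance(value, dict) and "name" in value:
                        yield index, int(shard), value["name"]

def empty_primary_command(index, shard, node):
    """Build the reroute body from Step 3 for one failed shard copy."""
    return {
        "commands": [
            {
                "allocate_empty_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,
                }
            }
        ]
    }

# Trimmed, made-up stand-in for a _shard_stores response.
sample = {
    "indices": {
        "some_index_name": {
            "shards": {
                "0": {
                    "stores": [
                        {
                            "node-id-1": {"name": "instance-0000000001"},
                            "allocation_id": "abc",
                            "allocation": "unused",
                            "store_exception": {
                                "type": "index_not_found_exception",
                                "reason": "no segments* file found",
                            },
                        }
                    ]
                }
            }
        }
    }
}

copies = list(failed_copies(sample))
bodies = [json.dumps(empty_primary_command(*copy)) for copy in copies]
```

Each entry in bodies is one request body for POST _cluster/reroute; an empty list corresponds to the empty "indices" output in Step 4.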

Cluster Red - unallocated shards in a index

My cluster suddenly went red because a shard of one index failed to allocate. When I run
GET /_cluster/allocation/explain
{
"index": "my_index",
"shard": 0,
"primary": true
}
output:
{
"shard": {
"index": "twitter_tracker",
"index_uuid": "mfXc8oplQpq2lWGjC1TxbA",
"id": 0,
"primary": true
},
"assigned": false,
"shard_state_fetch_pending": false,
"unassigned_info": {
"reason": "ALLOCATION_FAILED",
"at": "2018-01-02T08:13:44.513Z",
"failed_attempts": 1,
"delayed": false,
"details": "failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: NotSerializableExceptionWrapper[shard_lock_obtain_failed_exception: [twitter_tracker][0]: obtaining shard lock timed out after 5000ms]; ",
"allocation_status": "no_valid_shard_copy"
},
"allocation_delay_in_millis": 60000,
"remaining_delay_in_millis": 0,
"nodes": {
"n91cV7ocTh-Zp58dFr5rug": {
"node_name": "elasticsearch-24-384-node-1",
"node_attributes": {},
"store": {
"shard_copy": "AVAILABLE"
},
"final_decision": "YES",
"final_explanation": "the shard can be assigned and the node contains a valid copy of the shard data",
"weight": 0.45,
"decisions": []
},
"_b-wXdjGRdGLEtvY76PDSA": {
"node_name": "elasticsearch-24-384-node-2",
"node_attributes": {},
"store": {
"shard_copy": "NONE"
},
"final_decision": "NO",
"final_explanation": "there is no copy of the shard available",
"weight": 0,
"decisions": []
}
}
}
What would the solution be? This happened on my production cluster. My Elasticsearch version is 5.0 and I have two nodes.

This is an issue that everyone running an Elasticsearch cluster bumps into sooner or later :)
A safe way to reroute your red index:
curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed'
This command can take some time, but you won't get an allocation error while data is being transferred.
The issue is explained in more detail here.
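The same retry can be issued from a script. This is a minimal sketch using only the Python standard library; retry_failed_request is a hypothetical helper and the base URL assumes a node listening on localhost:9200 without security.

```python
from urllib.parse import urlencode
from urllib.request import Request

def retry_failed_request(base_url="http://localhost:9200"):
    """Build (but do not send) the POST that retries failed shard allocations."""
    query = urlencode({"retry_failed": "true"})
    return Request(f"{base_url}/_cluster/reroute?{query}", method="POST")

req = retry_failed_request()
# urllib.request.urlopen(req) would actually send it; that call is left out
# here so the sketch has no side effects on a cluster.
```

Building the request separately from sending it makes it easy to log the exact URL before touching a production cluster.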

I solved my issue with the following command.
curl -XPOST 'localhost:9200/_cluster/reroute?pretty' -d '{
"commands" : [ {
"allocate_stale_primary" :
{
"index" : "da-prod8-other", "shard" : 3,
"node" : "node-2-data-pod",
"accept_data_loss" : true
}
}
]
}'
Note that you might lose data with this command. For me it worked well. It was nerve-racking to run, but luckily it worked. For more details, check this thread.
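Before running a command like that, it helps to target a node that still holds a copy of the shard. The sketch below is a rough Python illustration: nodes_with_copy and stale_primary_command are hypothetical helpers, and the explain dictionary is a trimmed copy of the 5.0-style allocation explain output from the question (newer versions use a different response layout).

```python
import json

def nodes_with_copy(explain):
    """Return names of nodes whose store reports an AVAILABLE shard copy."""
    return [
        info["node_name"]
        for info in explain.get("nodes", {}).values()
        if info.get("store", {}).get("shard_copy") == "AVAILABLE"
    ]

def stale_primary_command(index, shard, node):
    """Build an allocate_stale_primary reroute body; this can lose data."""
    return {
        "commands": [
            {
                "allocate_stale_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,
                }
            }
        ]
    }

# Trimmed from the allocation explain output in the question.
explain = {
    "shard": {"index": "twitter_tracker", "id": 0, "primary": True},
    "nodes": {
        "n91cV7ocTh-Zp58dFr5rug": {
            "node_name": "elasticsearch-24-384-node-1",
            "store": {"shard_copy": "AVAILABLE"},
        },
        "_b-wXdjGRdGLEtvY76PDSA": {
            "node_name": "elasticsearch-24-384-node-2",
            "store": {"shard_copy": "NONE"},
        },
    },
}

candidates = nodes_with_copy(explain)
shard = explain["shard"]
body = json.dumps(stale_primary_command(shard["index"], shard["id"], candidates[0]))
```

Here the only candidate is elasticsearch-24-384-node-1, the node the explain output marks as holding a valid copy, so a retry or a stale-primary allocation would be directed there.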

Resources