Kafka Connect Elasticsearch sink connector not consuming topics

This is my Kafka Connect worker properties file:
##
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##
# This file contains some of the configurations for the Kafka Connect distributed worker. This file is intended
# to be used with the examples, and some settings may differ from those used in a production system, especially
# the `bootstrap.servers` and those specifying replication factors.
# A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
bootstrap.servers=localhost:9092
# unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs
group.id=connect-cluster
# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=false
value.converter.schemas.enable=false
# Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.
# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
offset.storage.topic=__connect_offsets
offset.storage.replication.factor=1
#offset.storage.partitions=25
# Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated,
# and compacted topic. Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
config.storage.topic=__connect_configs
config.storage.replication.factor=1
# Topic to use for storing statuses. This topic can have multiple partitions and should be replicated and compacted.
# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
status.storage.topic=__connect_status
status.storage.replication.factor=1
#status.storage.partitions=5
# Flush much faster than normal, which is useful for testing/debugging
#offset.flush.interval.ms=10000
# These are provided to inform the user about the presence of the REST host and port configs
# Hostname & Port for the REST API to listen on. If this is set, it will bind to the interface used to listen to requests.
#rest.host.name=
#rest.port=8083
# The Hostname & Port that will be given out to other workers to connect to i.e. URLs that are routable from other servers.
#rest.advertised.host.name=
#rest.advertised.port=
# Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
# (connectors, converters, transformations). The list should consist of top level directories that include
# any combination of:
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
# Examples:
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
plugin.path=/usr/share/java
And this is the POST body I use for creating the Elasticsearch sink:
{
  "name": "test-distributed-connector",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "2",
    "topics.regex": "^test[0-9A-Za-z-_]*(?<!-raw$)$",
    "connection.url": "http://elasticsearch:9200",
    "connection.username": "admin",
    "connection.password": "admin",
    "type.name": "_doc",
    "key.ignore": "true",
    "schema.ignore": "true",
    "transforms": "TimestampRouter",
    "transforms.TimestampRouter.type": "org.apache.kafka.connect.transforms.TimestampRouter",
    "transforms.TimestampRouter.topic.format": "${topic}-${timestamp}",
    "transforms.TimestampRouter.timestamp.format": "YYYY.MM.dd",
    "batch.size": "100",
    "offset.flush.interval.ms": "60000",
    "offset.flush.timeout.ms": "15000",
    "read.timeout.ms": "15000",
    "connection.timeout.ms": "10000",
    "max.buffered.records": "1500"
  }
}
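For context, a body like this is normally submitted to the Connect worker's REST API. A minimal sketch (assuming Python with the requests package and the worker's default REST port 8083; only part of the config is repeated here):
import requests

# Assumed Connect worker REST endpoint (rest.port defaults to 8083).
CONNECT_URL = "http://localhost:8083"

sink = {
    "name": "test-distributed-connector",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "tasks.max": "2",
        "topics.regex": "^test[0-9A-Za-z-_]*(?<!-raw$)$",
        "connection.url": "http://elasticsearch:9200",
        # ... remaining settings exactly as in the JSON body above ...
    },
}

# POST /connectors creates the connector; HTTP 201 means it was accepted.
resp = requests.post(f"{CONNECT_URL}/connectors", json=sink)
print(resp.status_code, resp.json())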
The issue I'm running into is that sometimes this sink works and sends data to Elasticsearch, showing
[2020-09-15 20:27:05,904] INFO WorkerSinkTask{id=test-distributed-connector-0} Committing offsets asynchronously using sequence number 1.......
But most of the time it just gets stuck and repeats this part:
[2020-09-15 20:24:29,458] INFO [Consumer clientId=consumer-4, groupId=connect-test-distributed-connector] Group coordinator kafka:9092 (id: 2147483543 rack: null) is unavailable or invalid, will attempt rediscovery (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:706)
[2020-09-15 20:24:29,560] INFO [Consumer clientId=consumer-4, groupId=connect-test-distributed-connector] Discovered group coordinator kafka:9092 (id: 2147483543 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:654)
[2020-09-15 20:24:29,561] INFO [Consumer clientId=consumer-4, groupId=connect-test-distributed-connector] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:486)
One suspicion I have is that this Elasticsearch sink reads a lot of topics with a fair amount of messages/data,
so it runs into trouble when trying to read those topics from Kafka.
I suspect this because I have another Elasticsearch sink with basically identical settings to this one, and that one works.
Is there any method or tweak that can make this Elasticsearch sink work?
######## Update ########
Also, sometimes (quite frequently) I'll see this log:
[2020-09-16 09:51:18,189] WARN [Consumer clientId=consumer-6, groupId=connect-test-distributed-connector] Close timed out with 1 pending requests to coordinator, terminating client connections (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:769)
and
[2020-09-16 10:17:43,369] WARN [Consumer clientId=consumer-16, groupId=connect-test-distributed-connector-] Close timed out with 1 pending requests to coordinator, terminating client connections (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:769)
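One way to see what that consumer group is actually doing is to query the broker directly. A rough diagnostic sketch (assuming the kafka-python package and that the broker from the logs is reachable at kafka:9092):
from kafka import KafkaAdminClient

# Assumed bootstrap address, matching the broker shown in the logs above.
admin = KafkaAdminClient(bootstrap_servers="kafka:9092")
group_id = "connect-test-distributed-connector"

# Group state (Stable, PreparingRebalance, Dead, ...) and its current members.
for group in admin.describe_consumer_groups([group_id]):
    print(group.group, group.state, len(group.members), "members")

# Offsets the sink's consumer group has committed, per topic-partition.
for tp, meta in admin.list_consumer_group_offsets(group_id).items():
    print(tp.topic, tp.partition, meta.offset)

admin.close()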

You are using a secure connection; I assume X-Pack security is enabled.
Try changing
"connection.url" : "http://elasticsearch:9200"
to
"connection.url" : "https://elasticsearch:9200"

Related

Using multiple pipelines in Logstash with beats input

As per an earlier discussion (Defining multiple outputs in Logstash whilst handling potential unavailability of an Elasticsearch instance) I'm now using pipelines in Logstash in order to send data input (from Beats on TCP 5044) to multiple Elasticsearch hosts. The relevant extract from pipelines.yml is shown below.
- pipeline.id: beats
  queue.type: persisted
  config.string: |
    input {
      beats {
        port => 5044
        ssl => true
        ssl_certificate_authorities => '/etc/logstash/config/certs/ca.crt'
        ssl_key => '/etc/logstash/config/certs/forwarder-001.pkcs8.key'
        ssl_certificate => '/etc/logstash/config/certs/forwarder-001.crt'
        ssl_verify_mode => "force_peer"
      }
    }
    output { pipeline { send_to => [es100, es101] } }
- pipeline.id: es100
  path.config: "/etc/logstash/pipelines/es100.conf"
- pipeline.id: es101
  path.config: "/etc/logstash/pipelines/es101.conf"
In each of the pipeline .conf files I have the related virtual address i.e. the file /etc/logstash/pipelines/es101.conf includes the following:
input {
  pipeline {
    address => es101
  }
}
This configuration seems to work well i.e. data is received by each of the Elasticsearch hosts es100 and es101.
I need to ensure that if one of these hosts is unavailable, the other still receives data, and thanks to a previous tip I'm now using pipelines, which I understand allow for this. However, I'm obviously missing something key in this configuration, as data isn't received by one host if the other is unavailable. Any suggestions are gratefully welcomed.
Firstly, you should configure persistent queues on the downstream pipelines (es100, es101), and size them to contain all the data that arrives during an outage.
But even with persistent queues, Logstash has an at-least-once delivery model: if a persistent queue fills up, back-pressure will cause the beats input to stop accepting data. As the documentation on the output isolator pattern says, "If any of the persistent queues of the downstream pipelines ... become full, both outputs will stop".
If you really want to make sure an output is never blocked because another output is unavailable, then you will need to introduce some software with a different delivery model. For example, configure Filebeat to write to Kafka, then have two pipelines that read from Kafka and write to Elasticsearch. If Kafka is configured with an at-most-once delivery model (the default) then it will lose data if it cannot deliver it.

Keep getting Fetch offset is out of range for partition resetting offset

We have a Spring Boot application that reads from topic XXXX and writes to topic YYYY.
We have noticed that topic YYYY contains duplicate records.
After some investigation we saw the following WARN message in the Spring Boot application's log:
Fetch offset 948982335 is out of range for partition XXXX-6, resetting offset
Resetting offset for partition XXXX-6 to offset 568675320.
We suspect that this warning message is what causes the duplication.
After some research we found the following answers: frequent-offset-out-of-range-messages-partitions-deserted-by-consumer and kafka-consumer-offsets-out-of-range-with-no-configured-reset-policy-for-partitio, but they explained neither the cause nor the solution.
And this is a code snippet from my Spring application:
topology.addSource("XXXXRawSourceInput", configProperties.XXXX)
    .addProcessor("XXXXProcessor", ProcessorSupplier {
        XXXXProcessor()
    }, "XXXXSourceInput")
    .addSink("SinkRawProcessorToYYYY", "YYYY", "XXXXProcessor")
This is my server config
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
#broker.id=0
broker.id.generation.enable=true
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
advertised.listeners=PLAINTEXT://km-kafka06:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=30
# The number of threads that the server uses for processing requests, which may include disk I/O
#num.io.threads=8
num.io.threads=16
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
#log.dirs=/tmp/kafka-logs
log.dirs=/data/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
#num.partitions=1
num.partitions=7
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
#num.recovery.threads.per.data.dir=1
num.recovery.threads.per.data.dir=4
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
#offsets.topic.replication.factor=1
#transaction.state.log.replication.factor=1
#transaction.state.log.min.isr=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=3
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=480
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=km-zookeeper01:2181,km-zookeeper02:2181,km-zookeeper03:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
#group.initial.rebalance.delay.ms=0
group.initial.rebalance.delay.ms=5000
# To enable topics deletion
delete.topic.enable=false
# To enable topic auto creation
1- Why does this behavior keep happening?
2- How can we solve this issue?
It was a bug in Kafka, solved after upgrading.
The same issue is mentioned here: Consumer offset reset after new segment rolling
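For anyone hitting the same warning before upgrading, it can help to check whether the group's committed offset really falls outside the range the broker still retains. A rough sketch (assuming the kafka-python package; the broker address and group id are placeholders):
from kafka import KafkaConsumer, TopicPartition

tp = TopicPartition("XXXX", 6)               # partition from the WARN message above
consumer = KafkaConsumer(
    bootstrap_servers="km-kafka06:9092",     # assumed broker address
    group_id="my-streams-app",               # placeholder application/group id
    enable_auto_commit=False,
)
consumer.assign([tp])

earliest = consumer.beginning_offsets([tp])[tp]
latest = consumer.end_offsets([tp])[tp]
committed = consumer.committed(tp)

print(f"retained range: [{earliest}, {latest}), committed: {committed}")
# If the committed offset falls outside this range, the consumer resets it
# according to auto.offset.reset, and re-reading old records is what shows up
# as duplicates downstream.
consumer.close()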

Kafka producer unexpected behaviour

I am running into a strange behaviour with my Kafka producer and consumer.
Below is my setup on my local machine:
1 ZooKeeper node
2 Kafka broker nodes
1 producer (doing async writes) and 1 subscriber, written in Go using this library
I am creating a topic using Kafka's command-line tool as below:
./kafka-topics.sh --zookeeper localhost:2181 --create --topic foo --partitions 1 --replication-factor 2 --config min.insync.replicas=2
The issue is that whenever I kill the leader node of the partition, the producer and consumer still keep pushing and pulling messages from the Kafka cluster, even though the min.insync.replicas setting for my topic is 2. I expect the producer to throw exceptions and the partition to not be allowed for writes, as per the docs.
I found one more thread similar to mine wherein it was suggested to set min.insync.replicas per topic, which I have done, but there are still no errors on the producer.
Am I doing something wrong somewhere?
Producer code (Go):
package main

import (
	"fmt"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

// kafkaProducer and kafkaConErr are package-level so deliveryReportHandler can
// read the producer's event channel.
var (
	kafkaProducer *kafka.Producer
	kafkaConErr   error
	empData       []byte // payload placeholder; defined elsewhere in the original program
)

func main() {
	kafkaProducer, kafkaConErr = kafka.NewProducer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092",
		"acks":              "-1",
	})
	if kafkaConErr != nil {
		fmt.Println("Error creating InfluxDB Client: ", kafkaConErr.Error())
		return
	}
	defer kafkaProducer.Close()

	topic := "foo"
	/* for range []string{"Welcome", "to", "the", "Confluent", "Kafka", "Golang", "client"} { */
	perr := kafkaProducer.Produce(&kafka.Message{
		TopicPartition: kafka.TopicPartition{
			Topic:     &topic,
			Partition: kafka.PartitionAny,
		},
		Value: empData,
	}, nil)
	if perr != nil {
		fmt.Println(perr.Error())
		return
	}
	deliveryReportHandler()
}

// deliveryReportHandler prints the delivery report for each produced message.
func deliveryReportHandler() {
	go func() {
		for e := range kafkaProducer.Events() {
			switch ev := e.(type) {
			case *kafka.Message:
				if ev.TopicPartition.Error != nil {
					fmt.Printf("Delivery failed: %v\n", ev.TopicPartition)
				} else {
					fmt.Printf("Delivered message to topic %s [%d] at offset %v\n",
						*ev.TopicPartition.Topic, ev.TopicPartition.Partition, ev.TopicPartition.Offset)
				}
			default:
				fmt.Printf("Ignored event: %s\n", ev)
			}
		}
	}()
}
My broker config is as below:
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3
# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
#offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=localhost:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
auto.leader.rebalance.enable=true
leader.imbalance.check.interval.seconds=5
leader.imbalance.per.broker.percentage=0
min.insync.replicas=2
offsets.topic.replication.factor=2
replica.lag.time.max.ms=1000
Update: filed an issue in the GitHub repo (link).

Creating cluster member removes some configuration

I'm using WAS ND and want to have a dmgr profile with a federated managed profile (app).
I am creating the cluster using:
AdminTask.createCluster('[-clusterConfig [-clusterName %s -preferLocal true]]' % nameOfModulesCluster)
Next, I'm configuring my WAS instance: queues, datasources, JDBC providers, JMS activation specs, factories, etc.
Right before creating the cluster member, I display:
print("QUEUES: \n" + AdminTask.listSIBJMSQueues(AdminConfig.getid('/ServerCluster:ModulesCluster/')))
print("JMS AS: \n" + AdminTask.listSIBJMSActivationSpecs(AdminConfig.getid('/ServerCluster:ModulesCluster/')))
And it returns all the queues I've created earlier. But when I call
AdminTask.createClusterMember('[-clusterName %(cluster)s -memberConfig [-memberNode %(node)s -memberName %(server)s -memberWeight 2 -genUniquePorts true -replicatorEntry false] -firstMember [-templateName default -nodeGroup DefaultNodeGroup -coreGroup DefaultCoreGroup -resourcesScope cluster]]' % {'cluster': nameOfCluster, 'node': nameOfNode, 'server': nameOfServer})
AdminConfig.save()
the configuration displayed earlier is... gone. Some configuration (like the datasources) is still displayed in the ibm/console, but the queues and JMS activation specs are not. The same print statements now display nothing, but the member is added to the cluster.
I can't find any information using Google. I've tried AdminNodeManagement.syncActiveNodes(), but it won't work since I'm using
/opt/IBM/WebSphere/AppServer/bin/wsadmin.sh -lang jython -conntype NONE -f global.py
and AdminControl is not available.
What should I do in order to keep the configuration I created before clustering? Do I have to sync it somehow?
This is the default behavior and is due to the -resourcesScope attribute in the createClusterMember command. This attribute determines how the server resources are promoted in the cluster, while adding the first cluster member.
Valid options for resourcesScope are:
Cluster: moves the resources of the first cluster member to the cluster level. The resources of the first cluster member replace the resources of the cluster. (This is the default option.)
Server: maintains the server resources at the new cluster member level. The cluster resources remain unchanged.
Both: copies the resources of the cluster member (server) to the cluster level. The resources of the first cluster member replace the resources of the cluster. The same resources exist at both the cluster and cluster member scopes.
Since you have set "-resourcesScope cluster" in your createClusterMember command, all the configuration created at cluster scope is being deleted/replaced by the empty configuration of the new cluster member.
So, for your configurations to work, set "-resourcesScope server", such that the cluster configurations are not replaced by the cluster member configurations.
AdminTask.createClusterMember('[-clusterName %(cluster)s -memberConfig [-memberNode %(node)s -memberName %(server)s -memberWeight 2 -genUniquePorts true -replicatorEntry false] -firstMember [-templateName default -nodeGroup DefaultNodeGroup -coreGroup DefaultCoreGroup -resourcesScope server]]' % {'cluster': nameOfCluster, 'node': nameOfNode, 'server': nameOfServer})
AdminConfig.save()
Refer "Select how the server resources are promoted in the cluster" section in https://www.ibm.com/support/knowledgecenter/en/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/urun_rwlm_cluster_create2_v61.html for more details.

ElasticSearch Out Of Memory

I have a blog (30 small articles) based on Symfony 2.6 and running on a little Ubuntu 14.04 VPS (4GB memory, 50GB disk space). I use Elasticsearch through the FOSElasticaBundle in order to allow users/readers to search for articles on this blog (by keywords and categories, that's it).
Everything was going well for nearly 2 months, and now it appears that the blog is completely unavailable!
I figured out this was due to some kind of "OOM" problem.
I have tried to set indices.fielddata.cache.size to 40%.
I have tried to have a look at the head plugin. It said that the cluster was not connected.
I have tried the /_nodes/stats/indices/fielddata?fields=* request. It reported about 5572 bytes used for this node, which doesn't seem like much.
When I try to stop the node with Ctrl+C in the terminal, it takes ages and prints:
[2016-01-04 23:38:37,085][INFO ][node ] [Novs]
stopping ... Exception in thread "elasticsearch[Novs][generic][T#4]"
java.lang.OutOfMemoryError: Java heap space
I also found out that my elasticsearch1..../data folder was absolutely huge, about 26GB. I am going to run out of disk space soon and don't know if I can just delete old folders manually, for example.
Is there any simple command-line tool that could help get rid of this OOM problem in a matter of seconds? Or something like that?
The Elasticsearch config (the only one I have been able to find, in /elasticsearch-1.7.3/config/):
##################### Elasticsearch Configuration Example
#####################
# This file contains an overview of various configuration settings,
# targeted at operations staff. Application developers should
# consult the guide at <elasticsearch.org/guide>.
#
# The installation procedure is covered at
# <elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html>.
#
# Elasticsearch comes with reasonable defaults for most settings,
# so you can try it out without bothering with configuration.
#
# Most of the time, these defaults are just fine for running a production
# cluster. If you're fine-tuning your cluster, or wondering about the
# effect of certain configuration option, please _do ask_ on the
# mailing list or IRC channel [elasticsearch.org/community].
# Any element in the configuration can be replaced with environment variables
# by placing them in ${...} notation. For example:
#
#node.rack: ${RACK_ENV_VAR}
# For information on supported formats and syntax for the config file, see
# <elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html>
################################### Cluster ###################################
# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
#cluster.name: elasticsearch
#################################### Node #####################################
# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
#node.name: "Franz Kafka"
# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
#node.master: true
#
# Allow this node to store data (enabled by default):
#
#node.data: true
# You can exploit these settings to design advanced cluster topologies.
#
# 1. You want this node to never become a master node, only to hold data.
# This will be the "workhorse" of your cluster.
#
#node.master: false
#node.data: true
#
# 2. You want this node to only serve as a master: to not store any data and
# to have free resources. This will be the "coordinator" of your cluster.
#
#node.master: true
#node.data: false
#
# 3. You want this node to be neither master nor data node, but
# to act as a "search load balancer" (fetching data from nodes,
# aggregating results, etc.)
#
#node.master: false
#node.data: false
# Use the Cluster Health API [localhost:9200/_cluster/health], the
# Node Info API [localhost:9200/_nodes] or GUI tools
# such as <http://www.elasticsearch.org/overview/marvel/>,
# <github.com/karmi/elasticsearch-paramedic>,
# <github.com/lukas-vlcek/bigdesk> and
# mobz.github.com/elasticsearch-head> to inspect the cluster state.
# A node can have generic attributes associated with it, which can later be used
# for customized shard allocation filtering, or allocation awareness. An attribute
# is a simple key value pair, similar to node.key: value, here is an example:
#
#node.rack: rack314
# By default, multiple nodes are allowed to start from the same installation location
# to disable it, set the following:
#node.max_local_storage_nodes: 1
#################################### Index ####################################
# You can set a number of options (such as shard/replica options, mapping
# or analyzer definitions, translog settings, ...) for indices globally,
# in this file.
#
# Note, that it makes more sense to configure index settings specifically for
# a certain index, either when creating it or by using the index templates API.
#
# See <elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html> and
# <elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html>
# for more information.
# Set the number of shards (splits) of an index (5 by default):
#
#index.number_of_shards: 5
# Set the number of replicas (additional copies) of an index (1 by default):
#
#index.number_of_replicas: 1
# Note, that for development on a local machine, with small indices, it usually
# makes sense to "disable" the distributed features:
#
#index.number_of_shards: 1
#index.number_of_replicas: 0
# These settings directly affect the performance of index and search operations
# in your cluster. Assuming you have enough machines to hold shards and
# replicas, the rule of thumb is:
#
# 1. Having more *shards* enhances the _indexing_ performance and allows to
# _distribute_ a big index across machines.
# 2. Having more *replicas* enhances the _search_ performance and improves the
# cluster _availability_.
#
# The "number_of_shards" is a one-time setting for an index.
#
# The "number_of_replicas" can be increased or decreased anytime,
# by using the Index Update Settings API.
#
# Elasticsearch takes care about load balancing, relocating, gathering the
# results from nodes, etc. Experiment with different settings to fine-tune
# your setup.
# Use the Index Status API (<localhost:9200/A/_status>) to inspect
# the index status.
#################################### Paths ####################################
# Path to directory containing configuration (this file and logging.yml):
#
#path.conf: /path/to/conf
# Path to directory where to store index data allocated for this node.
#
#path.data: /path/to/data
#
# Can optionally include more than one location, causing data to be striped across
# the locations (a la RAID 0) on a file level, favouring locations with most free
# space on creation. For example:
#
#path.data: /path/to/data1,/path/to/data2
# Path to temporary files:
#
#path.work: /path/to/work
# Path to log files:
#
#path.logs: /path/to/logs
# Path to where plugins are installed:
#
#path.plugins: /path/to/plugins
#################################### Plugin ###################################
# If a plugin listed here is not installed for current node, the node will not start.
#
#plugin.mandatory: mapper-attachments,lang-groovy
################################### Memory ####################################
# Elasticsearch performs poorly when JVM starts swapping: you should ensure that
# it _never_ swaps.
#
# Set this property to true to lock the memory:
#
#bootstrap.mlockall: true
# Make sure that the ES_MIN_MEM and ES_MAX_MEM environment variables are set
# to the same value, and that the machine has enough memory to allocate
# for Elasticsearch, leaving enough memory for the operating system itself.
#
# You should also make sure that the Elasticsearch process is allowed to lock
# the memory, eg. by using `ulimit -l unlimited`.
############################## Network And HTTP ###############################
# Elasticsearch, by default, binds itself to the 0.0.0.0 address, and listens
# on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node
# communication. (the range means that if the port is busy, it will automatically
# try the next port).
# Set the bind address specifically (IPv4 or IPv6):
#
#network.bind_host: 192.168.0.1
# Set the address other nodes will use to communicate with this node. If not
# set, it is automatically derived. It must point to an actual IP address.
#
#network.publish_host: 192.168.0.1
# Set both 'bind_host' and 'publish_host':
#
#network.host: 192.168.0.1
# Set a custom port for the node to node communication (9300 by default):
#
#transport.tcp.port: 9300
# Enable compression for all communication between nodes (disabled by default):
#
#transport.tcp.compress: true
# Set a custom port to listen for HTTP traffic:
#
#http.port: 9200
# Set a custom allowed content length:
#
#http.max_content_length: 100mb
# Disable HTTP completely:
#
#http.enabled: false
################################### Gateway ###################################
# The gateway allows for persisting the cluster state between full cluster
# restarts. Every change to the state (such as adding an index) will be stored
# in the gateway, and when the cluster starts up for the first time,
# it will read its state from the gateway.
# There are several types of gateway implementations. For more information, see
# <elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html>.
# The default gateway type is the "local" gateway (recommended):
#
#gateway.type: local
# Settings below control how and when to start the initial recovery process on
# a full cluster restart (to reuse as much local data as possible when using shared
# gateway).
# Allow recovery process after N nodes in a cluster are up:
#
#gateway.recover_after_nodes: 1
# Set the timeout to initiate the recovery process, once the N nodes
# from previous setting are up (accepts time value):
#
#gateway.recover_after_time: 5m
# Set how many nodes are expected in this cluster. Once these N nodes
# are up (and recover_after_nodes is met), begin recovery process immediately
# (without waiting for recover_after_time to expire):
#
#gateway.expected_nodes: 2
############################# Recovery Throttling #############################
# These settings allow to control the process of shards allocation between
# nodes during initial recovery, replica allocation, rebalancing,
# or when adding and removing nodes.
# Set the number of concurrent recoveries happening on a node:
#
# 1. During the initial recovery
#
#cluster.routing.allocation.node_initial_primaries_recoveries: 4
#
# 2. During adding/removing nodes, rebalancing, etc
#
#cluster.routing.allocation.node_concurrent_recoveries: 2
# Set to throttle throughput when recovering (eg. 100mb, by default 20mb):
#
#indices.recovery.max_bytes_per_sec: 20mb
# Set to limit the number of open concurrent streams when
# recovering a shard from a peer:
#
#indices.recovery.concurrent_streams: 5
################################## Discovery ##################################
# Discovery infrastructure ensures nodes can be found within a cluster
# and master node is elected. Multicast discovery is the default.
# Set to ensure a node sees N other master eligible nodes to be considered
# operational within the cluster. This should be set to a quorum/majority of
# the master-eligible nodes in the cluster.
#
#discovery.zen.minimum_master_nodes: 1
# Set the time to wait for ping responses from other nodes when discovering.
# Set this option to a higher value on a slow or congested network
# to minimize discovery failures:
#
#discovery.zen.ping.timeout: 3s
# For more information, see
# <elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html>
# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
#
# 1. Disable multicast discovery (enabled by default):
#
#discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
# to perform discovery when new nodes (master or data) are started:
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
# EC2 discovery allows to use AWS EC2 API in order to perform discovery.
#
# You have to install the cloud-aws plugin for enabling the EC2 discovery.
#
# For more information, see
# <elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-ec2.html>
#
# See <http://elasticsearch.org/tutorials/elasticsearch-on-ec2/>
# for a step-by-step tutorial.
# GCE discovery allows to use Google Compute Engine API in order to perform discovery.
#
# You have to install the cloud-gce plugin for enabling the GCE discovery.
#
# For more information, see <github.com/elasticsearch/elasticsearch-cloud-gce>.
# Azure discovery allows to use Azure API in order to perform discovery.
#
# You have to install the cloud-azure plugin for enabling the Azure discovery.
#
# For more information, see <github.com/elasticsearch/elasticsearch-cloud-azure>.
################################## Slow Log ##################################
# Shard level query and fetch threshold logging.
#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms
#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms
#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms
################################## GC Logging ################################
#monitor.jvm.gc.young.warn: 1000ms
#monitor.jvm.gc.young.info: 700ms
#monitor.jvm.gc.young.debug: 400ms
#monitor.jvm.gc.old.warn: 10s
#monitor.jvm.gc.old.info: 5s
#monitor.jvm.gc.old.debug: 2s
################################## Security ################################
# Uncomment if you want to enable JSONP as a valid return transport on the
# http server. With this enabled, it may pose a security risk, so disabling
# it unless you need it is recommended (it is disabled by default).
#
#http.jsonp.enable: true
Thanks in advance for your help.
As this seems to be a heap space issue, make sure you have sufficient memory. Read this blog about heap sizing.
As you have 4GB RAM, assign half of it to the Elasticsearch heap: run export ES_HEAP_SIZE=2g. Also lock the memory for the JVM by uncommenting bootstrap.mlockall: true in your config file.
Another important thing here: if you have only 30 small articles, how is your data folder 26GB in size? To see how many indices you have, run GET _cat/indices and check which index is taking that much space. Run GET /_nodes/stats to see detailed info about the node; you might be able to figure out what the issue is. One more thing: if you are using the Marvel plugin, the Marvel indices are pretty huge and you need to delete them to free disk space.
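Those checks can also be scripted. A minimal sketch (assuming Python with the requests package and Elasticsearch on localhost:9200, as in a default 1.x install; the index name is a placeholder and the DELETE is destructive, so verify it with _cat/indices first):
import requests

ES = "http://localhost:9200"

# Equivalent of GET _cat/indices?v - lists every index with its size on disk.
print(requests.get(f"{ES}/_cat/indices?v").text)

# Marvel typically writes one index per day (e.g. .marvel-YYYY.MM.DD); deleting a
# day's index frees its disk space.
old_index = ".marvel-2015.12.01"   # placeholder name - check that it exists first
resp = requests.delete(f"{ES}/{old_index}")
print(resp.status_code, resp.text)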
Tweaking indices.fielddata.cache.size is not a solution for a lack of memory. From the docs:
This setting is a safeguard, not a solution for insufficient memory. If you don't have enough memory to keep your fielddata resident in memory, Elasticsearch will constantly have to reload data from disk, and evict other data to make space. Evictions cause heavy disk I/O and generate a large amount of garbage in memory, which must be garbage collected later on.
Hope this helps!!
