Cassandra - unable to connect via cqlsh - bash

I have a problem connecting to Cassandra via cqlsh. I've deployed a cluster of 3 nodes on CentOS 7. I can see that the nodes are communicating with each other; the nodetool status output is below:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load        Tokens  Owns (effective)  Host ID  Rack
UN  ${SEED2}  226.47 KiB  1       60,3%             <hash>   rack1
UN  ${SEED}   190.77 KiB  1       50,9%             <hash>   rack1
UN  ${IP}     157.62 KiB  1       88,7%             <hash>   rack1
But connecting via cqlsh doesn't work. I've tried connecting to localhost and to the node's IP. Here is the output of the cqlsh command:
[root@node02 default.conf]# cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1':
error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
Connection refused")})
[root@node02 default.conf]# cqlsh ${IP}
connection error: ('Unable to connect to any servers', {'${IP}':
ConnectionShutdown('Connection to ${IP} was closed',)})
It's not obvious to me why 'Connection to ... was closed' is printed when connecting to the rpc_address, but 'Connection refused' when connecting to localhost.
Does anyone know the cause of this problem?
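As a first check (not part of the original post), this is a quick way to see which ports the node is actually bound to; ss and the cassandra service name are assumed to be present on a stock CentOS 7 package install:
# Show TCP listeners for the native (9042) and Thrift (9160) ports
ss -tlnp | grep -E ':(9042|9160)'
# Confirm the Cassandra process itself is running
systemctl status cassandra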
The cassandra.yaml file is below:
# Cassandra storage config YAML
cluster_name: '${NAME}'
hinted_handoff_enabled: true
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
data_file_directories:
- /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
hints_directory: /var/lib/cassandra/hints
key_cache_size_in_mb: 2
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
concurrent_reads: 32
concurrent_writes: 32
storage_port: 7000
ssl_storage_port: 7001
rpc_port: 9042
start_rpc: true
rpc_keepalive: true
rpc_server_type: sync
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
index_interval: 128
listen_address: ${IP}
rpc_address: ${IP}
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: ${IP},${SEED}

Found the issue: you set rpc_port to 9042. I think you're confusing RPC (Thrift) with the native (CQL) transport. RPC is the old interface and is deprecated in later releases. I would recommend setting start_rpc to false and setting rpc_port back to its default value, 9160.
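A minimal sketch of applying that fix, assuming a package install with the config directory shown in the prompt above; cqlsh speaks the native protocol, which binds to native_transport_port (9042 by default), so it only works once rpc_port stops occupying 9042:
# Assumed edits in /etc/cassandra/default.conf/cassandra.yaml:
#   start_rpc: false
#   rpc_port: 9160
#   # native_transport_port defaults to 9042, which is what cqlsh connects to
systemctl restart cassandra
# Verify the native transport is now listening and reachable:
ss -tlnp | grep 9042
cqlsh ${IP}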

Related

Cannot stop ZooKeeper in Windows

When I tried to stop ZooKeeper with the command "zkServer stop", I got the following result:
call "C:\Program Files\Java\jdk1.8.0_121"\bin\java "-Dzookeeper.log.dir=C:\zookeeper-3.4.10\bin\.." "-Dzookeeper.root.logger=INFO,CONSOLE" -cp "C:\zookeeper-3.4.10\bin\..\build\classes;C:\zookeeper-3.4.10\bin\..\build\lib\*;C:\zookeeper-3.4.10\bin\..\*;C:\zookeeper-3.4.10\bin\..\lib\*;C:\zookeeper-3.4.10\bin\..\conf" org.apache.zookeeper.server.quorum.QuorumPeerMain "C:\zookeeper-3.4.10\bin\..\conf\zoo.cfg" stop
Output:
2017-09-01 13:55:22,070 [myid:] - INFO [main:DatadirCleanupManager#78] - autopurge.snapRetainCount set to 3
2017-09-01 13:55:22,072 [myid:] - INFO [main:DatadirCleanupManager#79] - autopurge.purgeInterval set to 0
2017-09-01 13:55:22,072 [myid:] - INFO [main:DatadirCleanupManager#101] - Purge task is not scheduled.
2017-09-01 13:55:22,072 [myid:] - WARN [main:QuorumPeerMain#113] - Either no config or no quorum defined in config, running in standalone mode
2017-09-01 13:55:22,145 [myid:] - ERROR [main:ZooKeeperServerMain#55] - Invalid arguments, exiting abnormally
java.lang.NumberFormatException: For input string: "C:\zookeeper-3.4.10\bin\..\conf\zoo.cfg"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at org.apache.zookeeper.server.ServerConfig.parse(ServerConfig.java:59)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:84)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2017-09-01 13:55:22,148 [myid:] - INFO [main:ZooKeeperServerMain#56] - Usage: ZooKeeperServerMain configfile | port datadir [ticktime] [maxcnxns]
I am sure ZooKeeper has been started, because when I tried to start a new instance, it showed "java.net.BindException: Address already in use: bind".
Another strange problem is that I cannot find ZooKeeper in the Windows services list. However, when I listed all port usage in Windows PowerShell with netstat -anb, I found that port 2181 is in use:
Proto Local Address Foreign Address State
TCP 0.0.0.0:2181 0.0.0.0:0 LISTENING
[java.exe]
TCP [::1]:2181 [::1]:62268 ESTABLISHED
[java.exe]
TCP [::1]:2181 [::1]:62279 ESTABLISHED
[java.exe]
TCP [::1]:2181 [::1]:62280 ESTABLISHED
[java.exe]
TCP [::1]:2181 [::1]:62281 ESTABLISHED
[java.exe]
I was running ZooKeeper on Windows and wasn't able to stop the ZooKeeper instance running on port 2181 using zookeeper-stop.sh, so I tried this double-slash "//" method with taskkill, and it worked:
1. netstat -ano | findstr :2181
TCP 0.0.0.0:2181 0.0.0.0:0 LISTENING 8876
TCP [::]:2181 [::]:0 LISTENING 8876
2. taskkill //PID 8876 //F
SUCCESS: The process with PID 8876 has been terminated.
Credit goes to: How do I kill the process currently using a port on localhost in Windows?
It looks like there is an open bug concerning the start and stop commands in ZooKeeper.
To start ZooKeeper, omit the start parameter and call bin\zkServer instead.
To stop it, if you don't see the process in Task Manager, you need to connect to the ZooKeeper server as an administrator and run the kill commands.
More details are here.

Cassandra Node communication issue

I have a two-node cluster on AWS. Everything was working fine until yesterday.
Today I came across a problem: when I run nodetool status, the following output appears.
Node1 thinks Node2 is down and vice versa.
From ip2
ip2$ nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
DN <ip1> ? 256 ? 27c91f95-4b58-492b-a16e-d9b99867a505 r1
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN <ip2> 9.11 GiB 256 ? e628324d-34dd-4c9c-a53d-99abfacb54af rack1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
From ip1
ip1$ nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
DN <ip2> ? 256 ? e628324d-34dd-4c9c-a53d-99abfacb54af r1
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN <ip1> 9.14 GiB 256 ? 27c91f95-4b58-492b-a16e-d9b99867a505 rack1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
As per the last line there is some replication settings problem, but I am not able to figure it out. Please suggest. The system log shows:
WARN [OptionalTasks:1] 2017-08-08 15:33:37,223 CassandraRoleManager.java:344 - CassandraRoleManager skipped default role setup: some nodes were not ready
INFO [OptionalTasks:1] 2017-08-08 15:33:37,223 CassandraRoleManager.java:383 - Setup task failed with error, rescheduling
INFO [HANDSHAKE-/172.15.14.106] 2017-08-08 15:33:37,340 OutboundTcpConnection.java:515 - Handshaking version with /172.15.14.106
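A few checks, not from the original thread, that follow from the output above: the differing datacenter names (DC1 vs datacenter1) usually point to a snitch mismatch, and DN status to the gossip port being unreachable. Paths and ports below are common package defaults; adjust to your install:
# Run on each node and compare: the snitch, schema versions and reachable members should match
nodetool describecluster
grep endpoint_snitch /etc/cassandra/conf/cassandra.yaml
# Gossip uses the storage port 7000 (7001 with SSL); it must be open both ways in the AWS security group
nc -zv <ip2> 7000     # from node1; run nc -zv <ip1> 7000 from node2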

Nutch 2.3.1 on cassandra couldn't start

I'm trying to run Nutch 2.3.1 with Cassandra. I followed the steps at http://wiki.apache.org/nutch/Nutch2Cassandra. Finally, when I try to start Nutch with the command:
bin/crawl urls/ test http://localhost:8983/solr/ 2
I got the following exception:
GeneratorJob: starting
GeneratorJob: filtering: false
GeneratorJob: normalizing: false
GeneratorJob: topN: 50000
GeneratorJob: java.lang.RuntimeException: job failed: name=[test]generate: 1454483370-31180, jobid=job_local1380148534_0001
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:227)
at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:256)
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:322)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:330)
Error running:
/home/user/apache-nutch-2.3.1/runtime/local/bin/nutch generate -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -topN 50000 -noNorm -noFilter -adddays 0 - crawlId webmd -batchId 1454483370-31180
Failed with exit value 255.
When I check logs/hadoop.log, here's the error message:
2016-02-03 15:18:14,741 ERROR connection.HConnectionManager - Could not start connection pool for host localhost(127.0.0.1):9160
...
2016-02-03 15:18:15,185 ERROR store.CassandraStore - All host pools marked down. Retry burden pushed out to client.
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:390)
But my cassandra server is up:
runtime/local$ netstat -l |grep 9160
tcp 0 0 172.16.230.130:9160 *:* LISTEN
Can anyone help with this issue? Thanks.
The address Cassandra is listening on is not localhost; it's 172.16.230.130. That is the reason Nutch cannot connect to the Cassandra store.
Hope this helps,
Le Quoc Do
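A hedged follow-up sketch, assuming the standard Nutch 2.x local runtime layout; gora.cassandrastore.servers is the usual Gora property for the Cassandra Thrift endpoint, but check the exact key in your own gora.properties:
cd /home/user/apache-nutch-2.3.1/runtime/local
grep cassandrastore conf/gora.properties
# Point Gora at the address Cassandra actually listens on (from the netstat output above), e.g.:
#   gora.cassandrastore.servers=172.16.230.130:9160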

ElasticSearch java.net.NoRouteToHostException in docker

[2015-10-11 13:08:26,587][WARN ][transport.netty ] [Joseph] exception caught on transport layer [[id: 0x7e9f652b]], closing connection
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
I get this exception when launching Elasticsearch in Docker (actually I only have this problem on a CentOS 7 Docker host).
First, my Dockerfile exposes the UDP ports:
EXPOSE 9200 9300/udp 9301/udp 9302/udp 9303/udp 9304/udp 9305/udp
When I start the Docker container, I open these ports via -p 9200:9200 -p 9300:9300/udp -p 9301:9301/udp -p 9302:9302/udp -p 9303:9303/udp -p 9304:9304/udp -p 9305:9305/udp
In docker ps, I can see these ports are open as 0.0.0.0:9300-9305->9300-9305/udp
And here are some lines of my elasticsearch.yml:
cluster.name: changsha
discovery.zen.ping.unicast.hosts: [ "10.0.5.241" ]
network.publish_host: 10.0.5.241
10.0.5.241 is my Docker host's IP address. What is wrong here? It worked on a CentOS 6 host but fails on this CentOS 7 host.
UPDATE
Following this answer, I get the following result from tcpdump -p -nn icmp.
09:26:53.277117 IP 10.0.5.241 > 172.17.0.8: ICMP host 10.0.5.241 unreachable - admin prohibited, length 68
09:26:53.277494 IP 10.0.5.241 > 172.17.0.8: ICMP host 10.0.5.241 unreachable - admin prohibited, length 68
09:26:53.277822 IP 10.0.5.241 > 172.17.0.8: ICMP host 10.0.5.241 unreachable - admin prohibited, length 68
09:26:53.278043 IP 10.0.5.241 > 172.17.0.8: ICMP host 10.0.5.241 unreachable - admin prohibited, length 68
09:26:54.277753 IP 10.0.5.241 > 172.17.0.8: ICMP host 10.0.5.241 unreachable - admin prohibited, length 68
09:27:04.280703 IP 10.0.5.241 > 172.17.0.8: ICMP host 10.0.5.241 unreachable - admin prohibited, length 68
First, find out the Docker interface IP address:
# ifconfig
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.42.1 netmask 255.255.0.0 broadcast 0.0.0.0
ether 56:84:7a:fe:97:99 txqueuelen 0 (Ethernet)
RX packets 115761 bytes 12605533 (12.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 55687 bytes 22647938 (21.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Then add the whole Docker address range to the firewall's trusted zone:
firewall-cmd --permanent --zone=trusted --add-source=172.17.0.0/16
firewall-cmd --reload
Problem solved
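An optional verification, not part of the original answer; "es" is a hypothetical container name, and ping must be available inside the image:
firewall-cmd --zone=trusted --list-sources      # should now include 172.17.0.0/16
docker exec es ping -c 1 10.0.5.241             # the host should be reachable from the container again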
If someone comes across this issue on CentOS 7.4, it's because of a conflict between the docker service and the firewalld service.
You can solve it by disabling firewalld and then restarting the docker service.
Please refer to https://sanenthusiast.com/docker-and-firewalld-mess-in-centos-7/
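A minimal sketch of that workaround on CentOS 7.x with systemd; note that disabling firewalld removes host-level firewalling entirely, so only do this if that is acceptable in your environment:
systemctl stop firewalld
systemctl disable firewalld
systemctl restart docker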

Datastax Opscenter - Agent not connecting

I set up Cassandra, OpsCenter and the required DataStax agent on my Amazon EC2 machine. At the moment it's only one machine.
Everything seems to be running fine, except that the node list is empty and so are the keyspaces in OpsCenter. The Cassandra, DataStax agent and OpsCenter logs show no errors, and I followed the installation/configuration carefully and then tried all the suggested fixes.
My guess is that the problem lies in the communication between the agent and OpsCenter.
After a while these requests fail:
etc/cassandra/cassandra.yaml: (simplified)
cluster_name: 'CassandraCluster'
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
- seeds: "1.2.3.4"
listen_address: 1.2.3.4
rpc_address: 0.0.0.0
endpoint_snitch: Ec2Snitch
etc/opscenter/opscenterd.conf: (simplified)
[webserver]
port = 81
interface = 0.0.0.0
[authentication]
enabled = False
[stat_reporter]
[agents]
use_ssl = false
var/lib/datastax-agent/conf/address.yaml: (simplified)
stomp_interface: 1.2.3.4
local_interface: 1.2.3.4
use_ssl: 0
nodetool status output:
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: eu-west_1_cassandra
===============================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 1.2.3.4 2.06 MB 256 100.0% 8a121c12-7cbf-4a2a-b111-4ad111c111d8 1a
Nothing really strange shows up in the logs except for the repeated occurrence of the following line in agent.log:
INFO [install-location-finder] 2015-03-11 15:26:04,690 New JMX connection (127.0.0.1:7199)
INFO [install-location-finder] 2015-03-11 15:27:04,698 New JMX connection (127.0.0.1:7199)
INFO [install-location-finder] 2015-03-11 15:28:04,709 New JMX connection (127.0.0.1:7199)
INFO [install-location-finder] 2015-03-11 15:29:04,716 New JMX connection (127.0.0.1:7199)
INFO [install-location-finder] 2015-03-11 15:30:04,724 New JMX connection (127.0.0.1:7199)
INFO [install-location-finder] 2015-03-11 15:31:04,731 New JMX connection (127.0.0.1:7199)
To supply all the info, here are the logs:
opscenterd.log
agent.log
cassandra/system.log
In certain environments the persistent connection between the browser and opscenterd may fail. We're working on implementing a more robust connection that will work in all environments, but in the meantime you can use the following workaround:
http://www.datastax.com/documentation/opscenter/5.1/opsc/troubleshooting/opscTroubleshootingZeroNodes.html
The minimal configuration that I found working was setting the options below in address.yaml:
stomp_interface: [opscenter-ip]
stomp_port: 61620
use_ssl: 0
cassandra_conf: /etc/cassandra/cassandra.yaml
jmx_host: [cassandra-node-ip]
jmx_port: 7199
Make sure you also have sysstat installed.
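A short sketch of applying that answer, assuming a package-based install; the service name and log path below are the usual defaults but may differ between versions:
yum install -y sysstat
service datastax-agent restart
tail -f /var/log/datastax-agent/agent.log    # watch for a successful connection to opscenterd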
