How to KILL coordinator Leader node in Apache Druid from a python program (AWS lambda function)?

I am working on a Druid system. Some of the segments get stuck in rebalancing because of problems in the metadata. To prevent this from occurring, I am thinking of killing the coordinator leader node; ZooKeeper will then restart the node.
I have the following queries:
How do I kill the Druid coordinator leader node from a Python AWS Lambda function?
How do I identify whether segments are stuck during rebalancing from a Python AWS Lambda function?
I know how to identify the coordinator leader node from a Python program. There is a Druid API endpoint for that [ GET on
"/druid/coordinator/v1/leader" ], which is listed in the api-reference.html document.
But I am not able to find a Druid POST endpoint for killing a Druid leader node.
I went through the API reference documents and posts on the internet, but could not find answers to my queries.

How do I kill the Druid coordinator leader node from a Python AWS Lambda function?
Databases rarely expose endpoints to kill their own services, and Druid is no exception.
I think the real question is what is causing your Druid cluster to be unstable.
How do I identify whether segments are stuck during rebalancing from a Python AWS Lambda function?
This is one of the symptoms.
I would first check whether there is enough space on the historicals. You can open the Services tab in the Druid console and see how full your historicals are (see the sketch below).
I would also check whether your historicals are configured correctly to pull from deep storage.
Since you are on AWS, please configure it to use S3:
https://druid.apache.org/docs/latest/tutorials/cluster.html#deep-storage
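For the free-space check from Python, here is a hedged sketch, assuming the coordinator's servers endpoint in its ?simple form (field names such as currSize and maxSize come from that form and may vary by Druid version):

import requests

# Assumed coordinator address; adjust to your deployment.
COORDINATOR_URL = "http://coordinator-host:8081"

# The ?simple form returns per-server metadata, including segment cache sizes.
servers = requests.get(f"{COORDINATOR_URL}/druid/coordinator/v1/servers?simple", timeout=10).json()

for s in servers:
    if s.get("type") == "historical" and s.get("maxSize"):
        used_pct = 100.0 * s["currSize"] / s["maxSize"]
        print(f'{s["host"]}: {used_pct:.1f}% of segment cache used')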
If you have more doubts, please feel free to open an issue on GitHub: https://github.com/apache/druid/issues
or find us on the Druid Slack channel: https://druid.apache.org/community/

Related

Role of Zookeeper in Hadoop

I understand from the slides that, in the context of Hadoop, Zookeeper is used for storing information about the master and the status of different tasks, i.e. which worker is working on which partition, and that the set of available workers is also stored in Zookeeper.
Why is Zookeeper used for this metadata storage? Any data store could be used, right?
For instance, Celery can be configured with any result backend (Redis, Mongo, etc.). So in practice Hadoop could use any storage backend, right? But why Zookeeper?
This doc suggests that Redis, SQLite, MySQL, or PostgreSQL can be used for Celery task result storage:
https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/index.html
Zookeeper's ZAB protocol is used for leader election as well as distributed locks.
It is not simply a datastore, and no, not just any datastore can be used.
Celery isn't used within the Hadoop ecosystem, so I'm not sure how that's relevant to the question.
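As a concrete illustration of those primitives, here is a minimal Python sketch using the kazoo client (hosts, paths, and identifiers are placeholders, not anything Hadoop itself uses):

from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

# Distributed lock: at most one client across the whole cluster holds it at a time.
lock = zk.Lock("/myapp/lockpath", "worker-1")
with lock:
    pass  # do work that must not run concurrently

# Leader election: run() blocks until this client is elected, then calls the function;
# leadership is given up when the function returns.
election = zk.Election("/myapp/electionpath", "worker-1")
election.run(lambda: print("I am the leader now"))

zk.stop()

Coordination guarantees like these (strict ordering via ZAB, ephemeral nodes, watches) are what a plain result backend such as Redis or SQLite does not give you out of the box.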

Amazon EMR spam applications by user dr.who?

I am running Spark processes using Python (PySpark). I create an Amazon EMR cluster to run my Spark scripts, but as soon as the cluster is created, a lot of processes are launched by themselves (?), as I can see in the cluster UI:
So, when I try to launch my own scripts, they enter an endless queue, sometimes ACCEPTED but never getting into the RUNNING state.
I couldn't find any info about this issue, even in the Amazon forums, so I'd be glad of any advice.
Thanks in advance.
You need to check the security group of the master node and look at the inbound rules.
You may have a rule open to anywhere; remove it (or remove it temporarily and check whether things start working), because it is a vulnerability.
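To do the same check programmatically, here is a hedged boto3 sketch (the security group ID is a placeholder; look it up in the EMR console or via describe-cluster):

import boto3

# Hypothetical security group ID of the EMR master node.
MASTER_SG_ID = "sg-0123456789abcdef0"

ec2 = boto3.client("ec2")
sg = ec2.describe_security_groups(GroupIds=[MASTER_SG_ID])["SecurityGroups"][0]

for perm in sg["IpPermissions"]:
    # Flag any inbound rule open to the whole internet (0.0.0.0/0).
    if any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])):
        print("Open inbound rule:", perm.get("IpProtocol"), perm.get("FromPort"), perm.get("ToPort"))
        # Review first, then revoke the offending rule:
        # ec2.revoke_security_group_ingress(GroupId=MASTER_SG_ID, IpPermissions=[perm])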

Is there a way to shutdown and start an AWS redshift cluster with the cli?

I'm just bringing up a redshift cluster to start a development effort and usually use a cron service to bring down all of my development resources outside of business hours to save money.
As I browse the aws cli help:
aws redshift help
I don't see any options to stop or shutdown my test cluster like I have in the console.
If there is no way to do this, does anybody know why they don't offer this functionality? These instances are pretty spendy to keep online and I don't want to have to go in and shut them down by hand every night.
It sounds like you are looking for:
delete-cluster, explicitly specifying a final snapshot
restore-from-cluster-snapshot, restoring the snapshot taken above
From the aws-cli aws redshift delete-cluster documentation:
If you want to shut down the cluster and retain it for future use, set SkipFinalClusterSnapshot to "false" and specify a name for FinalClusterSnapshotIdentifier. You can later restore this snapshot to resume using the cluster. If a final cluster snapshot is requested, the status of the cluster will be "final-snapshot" while the snapshot is being taken, then it's "deleting" once Amazon Redshift begins deleting the cluster.
Example usage, again from the documentation:
# When shutting down at night...
aws redshift delete-cluster --cluster-identifier mycluster --final-cluster-snapshot-identifier my-snapshot-id
# When starting up in the morning...
aws redshift restore-from-cluster-snapshot --cluster-identifier mycluster --snapshot-identifier my-snapshot-id
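Since the original goal was a cron-style shutdown and startup, here is a hedged boto3 equivalent of the two commands above (the cluster and snapshot identifiers are just the example names from the documentation snippet):

import boto3

redshift = boto3.client("redshift")

CLUSTER_ID = "mycluster"
SNAPSHOT_ID = "my-snapshot-id"

def shutdown():
    # Take a final snapshot, then delete the cluster (the "stop" in this approach).
    redshift.delete_cluster(
        ClusterIdentifier=CLUSTER_ID,
        SkipFinalClusterSnapshot=False,
        FinalClusterSnapshotIdentifier=SNAPSHOT_ID,
    )

def startup():
    # Restore the cluster from the snapshot taken at shutdown (the "start").
    redshift.restore_from_cluster_snapshot(
        ClusterIdentifier=CLUSTER_ID,
        SnapshotIdentifier=SNAPSHOT_ID,
    )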

Need help regarding storm

1) What happens if Nimbus fails? Can we convert some other node into a Nimbus?
2) Where is the output of a topology stored? When a bolt emits a tuple, where is it stored?
3) What happens if Zookeeper fails?
Nimbus is itself a failure-tolerant process, which means it doesn't store its state in memory but in an external database (Zookeeper). So if Nimbus crashes (an unlikely scenario), on the next start it will resume processing just where it stopped. Nimbus usually must be set up to be monitored by an external monitoring system, such as Monit, which will check the Nimbus process state periodically and restart it if any problem occurs. I suggest you read the Storm project's wiki for further information.
Nimbus is the master node of a Storm cluster, and it isn't possible to have multiple Nimbus nodes. (Update: the Storm community is now (as of 5/2014) actively working on making the Nimbus daemon fault tolerant in a failover manner, by having multiple Nimbuses heartbeating each other.)
The tuple is "stored" in the tuple tree, and it is passed to the next bolt in the topology execution chain as topology execution progresses. As for physical storage, tuples are probably stored in an in-memory structure and seralized as necessary to be distributed among the cluster's nodes. The complete Storm cluster's state itself is stored in Zookeeper. Storm doesn't concern itself with persisent storage of a topology or a bolt's output -- it is your job to persist the results of the processing.
Same as for Nimbus, Zookeeper in a real, production Storm cluster must be configured for reliability, and for Zookeeper that means having an odd number of Zookeeper nodes running on different servers. You can find more information on configuring a Zookeeper production cluster in the Zookeeper Administrator's Guide. If Zookeeper were to fail (although a highly unlikely scenario in a properly configured Zookeeper cluster), the Storm cluster wouldn't be able to continue processing, since all of the cluster's state is stored in Zookeeper.
Regarding question 1), this bug report and subsequent comment from Storm author and maintainer Nathan Marz clarifies the issue:
Storm is not designed for having topologies partially running. When you bring down the master, it is unable to reassign failed workers. We are working on Nimbus failover. Nimbus is fault-tolerant to the process restarting, which has made it fault-tolerant enough for our and most people's use cases.

read data from amazon hbase

Can anyone tell me whether I can read data from Amazon HBase using org.apache.hadoop.conf.Configuration and org.apache.hadoop.hbase.client.HTablePool?
We are migrating to Amazon's EMR framework with HBase running on top of it.
The present implementation is based on the pure Apache Hadoop and HBase distributions. I'm trying to verify that no code changes are needed when we migrate to Amazon's EMR.
Please share your thoughts.
While code changes should not be needed, I would expect problems and tuning changes related to the nature of EC2 and its networking.
HBase relies on region servers being able to renew their leases in a timely manner. If region servers are too busy, because of some massive operations running on them, they cannot do so and get kicked out of the cluster.
On Amazon, the performance of EC2 instances is much less predictable than in a dedicated cluster (unless you use cluster instances), so you may need to adjust timeout parameters and/or the nature of your loads to get the cluster to work properly.
