how to setup the cassandra cluster in cloud - amazon-ec2

I'm new to Cassandra. I installed Cassandra on my ec2 machine, but how can I configure Cassandra in cluster mode.
Is there any link that will be helpful.
Thanks in advance

Step 3: Running a cluster (let me know if that is not enough for you)
You should also read the last section (updated yesterday (110310) by jbellis) on this page

I just tried this one today :
https://cloud.google.com/solutions/cassandra/
Google cloud will let you deploy a pre-configured cluster of Cassandra nodes within a few mouse clicks. When this service is not free, they still let you have a trial for 60 days.
** Another option is to follow their script on how to deploy multiple nodes cluster:
https://cloud.google.com/solutions/cassandra/deployment-details

Related

How to setup 2 nodes on elasticsearch?

Hello enthusiastic people.
I am a student trying to learn Elastic stack.
I have 1 node installed on my local machine. I have also successfully installed beats on my other local machine to get data and deliver it to my logstash.
My question is, what if I add another node, do I still need to install kibana and elasticsearch? Then connect it from my first node?
I just read a lot that a single node is prone to data loss.
Sorry for my noob question.
Your answer is very appreciated.
Thanks in advance.
Having a cluster with at least 3 nodes would be good to ensure data security and integrity.
A cluster can have one or more nodes.
An example scenario:
It will be easier for you to install with docker during the learning and development process. I recommend you follow the link below. This link explains how to set up an elasticsearch cluster with 3 nodes on docker.
Start a multi-node cluster with Docker Compose

Amazon EMR spam applications by user dr.who?

I am working spark processes using python (pyspark). I create an amazon EMR cluster to run my spark scripts, but when cluster is just created a lot of processes ar launched by itself (¿?), when I check cluster UI:
So, when I try to lunch my own script, they enter in an endless queue, sometime ACCEPTED but never get into RUNNING state.
I couldn't find any info about this issue even in amazon forums, so I'll glad any advice.
Thanks in advance.
you need to check in the security group of the master node, check the inbound traffic,
maybe you have a rule for anywhere, please remove that or try to remove and check if the things work it is a vulnerability.

How do I install components such as Apache Drill and Apache Hue in IBM Bluemix BigInsights Apache Hadoop

I am new to IBM Bluemix platform and exploring its BigInsights service. I can see pre configured components such as Pig Hive Hbase and others. But I want to know How can I install services like Drill or say Hue which is not configured by default. Also ssh to cluster nodes allows restricted access with no sudo rights in case one need to run yum commands.Does bluemix allows root access as I cannot see one. Thanks In advance.
As far as I know, it is not possible.
But you can use http://www.softlayer.com/ to build your own IOP (IBM Open Platform) Cluster in the cloud.
If you are interested in IBM's value-adds and you just want to try out:
https://www.youtube.com/watch?v=4p7LDeu_qQQ it is a nice tutorial to set up your own cluster via Docker.
This tutorial should be still valid for Hue:
https://developer.ibm.com/hadoop/2015/06/02/deploying-hue-on-ibm-biginsights/
Installing Drill doesn't look complicated:
https://drill.apache.org/docs/installing-drill-in-distributed-mode/
In conclusion: You need to move away from Bluemix, if you want to have a more customised BigInsights. But there are options: Softlayer, AWS, .. or just on your local computer (if you got sufficient resources, since some components like Hbase need a minimum amount of nodes)

Datastax Opscenter issue: dashboard timeout

I installed Datastax community version in an EC2 server and it worked fine. After that I tried to add one more server and I see two nodes in the Nodes menu but in the main dashboard I see the following error:
Error: Call to /Test_Cluster__No_AMI_Parameters/rc/dashboard_presets/ timed out.
One potential rootcause I can see is the name of the cluster? I specified something else in the cassandra.yaml but it looks like opscenter is still using the original name? Any help would be grealy appreciated.
It was because cluster name change wasn't made properly. I found it easier to change the cluster name before starting Cassandra cluster. On top of this, only one instance of opscentered needs to run in one single cluster. datastax-agent needs to be running in all nodes in the cluster but they need to point to the same opscenterd (change needs to be made at /var/lib/datastax-agent/conf/address.yaml)

Hadoop installation on Amazon cloud

i am new to Hadoop ,i likes to go in hadoop administration line so studied basics of hadoop and tried to install hadoop in pseudo distribution mode and installed successfully and run some basic examples also, now i need to improve me further,so i need to try a way to learn hadoop installation and configuration in real time so decided to go for Amazon micro instance ,can any one please tell how to install and configure hadoop in Amazon cloud.
Thanks in Advance.
I have tried this personally and you will not really be able to use hadoop on a single micro instance due to memory restrictions. IMHO you should atleast try a medium instance to run hadoop or better yet use their elastic-mapreduce api which is a modified version of hadoop. You can run a 3 node cluster for around 00.25 cents an hour. If you really want to learn big data this is the way I went.
You should check out their documentation here
http://aws.amazon.com/documentation/elasticmapreduce/

Resources