I have enabled client-to-node and node-to-node encryption in my Cassandra cluster. What optional parameters do I need to pass to get cassandra-stress to work?
I'm striking out trying to find any examples or good documentation on this.
You have probably already found the answer, but I will answer for those who might come looking.
Try:
cassandra-stress write -node 127.0.0.1 -transport truststore=/path/to/cluster/truststore.jks truststore-password=mytruststorepassword -mode native cql3 user=myuser password=mypassword
I found this blog post very useful.
This was not available in Apache Cassandra v2.1; if you want to learn more about how it was implemented, look at this JIRA ticket.
I'm working on a cluster, but I don't know exactly how many hosts it has, what their IPs are, or which rack they belong to.
I've previously worked with clusters managed via Cloudera and got that information from the Cloudera API (http://cloudera.github.io/cm_api/apidocs/v16/); in particular, this (http://cm_server_host:7180/api/v16/hosts) gave me all the info I was looking for. But how can I do that if the cluster doesn't use Cloudera? It has Spark as well, but since there is Hadoop and HDFS, I think the information is more likely to be found there.
Thanks in advance!
You can find that information via the HDFS NameNode web interface, which by default should be available at this URL:
http://<namenodehost>:50070
and via the YARN web interface, which by default should be available at this URL:
http://<resourcemanagerhost>:8088/cluster/nodes
Alternatively, you can use the ResourceManager REST API:
http://<resourcemanagerhost>:8088/ws/v1/cluster/nodes
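For example, a minimal Python sketch that lists every node with its rack and state (assuming the requests library is installed and the default ResourceManager port; the hostname is a placeholder):

import requests

# Placeholder host: replace with your ResourceManager's hostname.
rm_url = "http://resourcemanagerhost:8088/ws/v1/cluster/nodes"

resp = requests.get(rm_url, timeout=10)
resp.raise_for_status()

# The response is JSON of the form {"nodes": {"node": [...]}};
# each entry carries the node's hostname, rack and state.
for node in resp.json()["nodes"]["node"]:
    print(node["nodeHostName"], node["rack"], node["state"])

The Datanodes page of the NameNode web interface above shows the HDFS side of the same picture.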
You can find more about the topic here, for example:
https://www.datadoghq.com/blog/collecting-hadoop-metrics/
Can we create a new pipeline using the Apache NiFi API, without using the GUI? If yes, please let me know the steps.
The answer to your question is yes. You can use:
the NiFi REST API,
the NiFi CLI (available from version 1.6),
the NiPyApi Python client (thanks to @Chaffelson).
You can find the documentation here:
https://github.com/apache/nifi/tree/master/nifi-toolkit/nifi-toolkit-cli
https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
You can also search the Hortonworks community site; there is a lot of content that can be helpful.
If you are familiar with Python, there is also a community Python client for NiFi.
https://github.com/Chaffelson/nipyapi
And a quick introduction here:
https://community.hortonworks.com/articles/167364/nifi-sdlc-automation-in-python-with-nipyapi-part-1.html
Note: I am the primary author.
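As a quick illustration, a minimal nipyapi sketch that creates an empty process group on the canvas (the NiFi URL and the group name are placeholders, and the exact call signatures may vary slightly between nipyapi versions):

import nipyapi

# Placeholder URL: point the client at your NiFi instance's REST endpoint.
nipyapi.config.nifi_config.host = 'http://localhost:8080/nifi-api'

# Fetch the root process group and create a new child group on the canvas.
root_pg = nipyapi.canvas.get_process_group(nipyapi.canvas.get_root_pg_id(), identifier_type='id')
new_pg = nipyapi.canvas.create_process_group(root_pg, 'my_pipeline', location=(400.0, 400.0))
print(new_pg.id)

From there you would add processors and connections in the same way (nipyapi.canvas.create_processor and related calls), or drive the raw REST endpoints directly if you prefer.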
Word-count example with HBase and Hadoop
I am new to Hadoop and HBase, and I am going to implement a real example on a data set to understand the logic behind them.
I have already installed Hadoop and HBase on my system (Ubuntu 17.04).
hadoop-2.8.0
hbase-1.3.1
Is there any step-by-step tutorial for implementing a word-count example?
(a word-count example, or any basic example that exists)
There is a comprehensive tutorial provided in the HBase reference guide:
http://hbase.apache.org/book.html#mapreduce.example
Note that there is also an alternative mechanism called Cascading, which is similar to MapReduce but allows you to write code in a simplified way (it's described in the reference guide too).
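If you just want some basic word-count example to run against your installation first, here is a small Hadoop Streaming sketch in Python (a rough illustration; the HBase-specific MapReduce examples in the guide above are written in Java):

#!/usr/bin/env python3
# wordcount.py - run the same script as the Streaming mapper or reducer.
import sys

def mapper():
    # Emit "word<TAB>1" for every word read from stdin.
    for line in sys.stdin:
        for word in line.strip().split():
            print(word + "\t1")

def reducer():
    # Streaming sorts by key, so all counts for one word arrive together.
    current_word, current_count = None, 0
    for line in sys.stdin:
        line = line.rstrip("\n")
        if not line:
            continue
        word, count = line.rsplit("\t", 1)
        if word != current_word:
            if current_word is not None:
                print(current_word + "\t" + str(current_count))
            current_word, current_count = word, 0
        current_count += int(count)
    if current_word is not None:
        print(current_word + "\t" + str(current_count))

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "map":
        mapper()
    else:
        reducer()

You would submit it with the hadoop-streaming jar that ships under share/hadoop/tools/lib in your hadoop-2.8.0 install, passing the script via -files and using "python wordcount.py map" and "python wordcount.py reduce" as the mapper and reducer; the linked chapter then shows how to do the equivalent directly against HBase tables.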
Quite new to HBase - can anyone recommend any full tutorials or examples of how to connect to HBase using Ruby?
So far I've tried using an old version of Thrift; the code gets through #transport and #protocol, but dies on #client, probably because of the old version.
I'm using HBase in a VM and I'm not sure how to generate a Thrift client package. As far as I understand, thrift --gen [lang] [hbase-root]/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift will generate a gen-rb directory inside the VM. Do I then use this in my Ruby code ($:.push('./gen-rb'))?
Alternatively, should I forget about Thrift and instead use Massive Record?
Recently I've been writing about using HBase within Ruby in a day-to-day practical sense.
You might want to check this introductory post I wrote about it; it has working examples which you can use to handle your HBase cluster from the outside using pure Ruby.
At the end of that post I also keep a list of links to other posts and tutorials I'll continue writing on the subject.
EDIT
Also, about Thrift vs Massive Record, I would suggest you stick to Thrift.
Thrift has come a long way since its first gem was published, and it actually is Apache's answer to accessing HBase externally.
I am interested in the Apache Hadoop project, but I would like to know if any other tested (please mind the 'tested') projects/frameworks are out there.
I would appreciate any information/links to projects similar to Apache Hadoop, and any comments on the Apache Hadoop project from anyone who has used it.
Regards,
As mentioned in an answer to this question:
https://stackoverflow.com/questions/2168558/is-there-anything-like-hadoop-in-c
MongoDB might be something you could look at. It's a scalable database which allows MapReduce algorithms to be run against it.
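To give a rough idea of what that looks like, a small word-count sketch with the pymongo driver (the connection string, the docs collection and its words field are just placeholders):

from bson.code import Code
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client.testdb

# Map and reduce are JavaScript functions executed inside MongoDB.
# The 'docs' collection and its 'words' array field are made-up examples.
mapper = Code("function () { this.words.forEach(function (w) { emit(w, 1); }); }")
reducer = Code("function (key, values) { return Array.sum(values); }")

db.command("mapReduce", "docs", map=mapper, reduce=reducer, out="word_counts")
for row in db.word_counts.find():
    print(row["_id"], row["value"])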
There are indeed open-source projects utilizing and building on Hadoop.
See Apache Mahout for data mining: http://lucene.apache.org/mahout/
And are you aware of the other MapReduce implementations available?
http://en.wikipedia.org/wiki/MapReduce#Implementations
Maybe. But none of them will have anywhere near the testing and real-world experience that Hadoop does. Companies like Facebook and Yahoo are paying to scale Hadoop, and I know of no similar open-source projects that are really worth looking at.
A possible way is to use org.apache.hadoop.hdfs.MiniDFSCluster and org.apache.hadoop.mapred.MiniMRCluster, which are used in testing Hadoop itself.
What they do is launch a small cluster locally. To test your program, point hdfs-site.xml (and the other *-site.xml files) at the local cluster and add them to your classpath. This local cluster behaves just like any other cluster, only smaller. You can use hadoop/src/test/*-site.xml as templates.
For more examples, take a look at hadoop/src/test/.
There is a Hadoop-like framework, built on top of Hadoop, that focuses on prioritized execution of iterative algorithms.
It is tested; I have run the WordCount example on it. It is very, very similar to Hadoop (especially the installation).
You can find the paper here :
http://rio.ecs.umass.edu/mnilpub/papers/socc11-zhang.pdf
and the code here
https://code.google.com/p/priter/
Hope this helps
A