does Apache Kylin need a Apache Derby or Mysql for run the sample cube - hadoop

I installed Java and Hadoop and Hbase and Hive and Spark and Kylin.
hadoop-3.0.3
hbase-1.2.6
apache-hive-2.3.3-bin
spark-2.2.2-bin-without-hadoop
apache-kylin-2.3.1-bin
I will be grateful if someone in Help me with Kyle's installation and configuration them.

http://kylin.apache.org/docs/ this may help you. You can send email to dev#kylin.apache.org, then the questions will be discussed and answered in the mailing list. There are some tips for sending the email: 1. provide Kylin version 2. provide log information 3.provide the usage scenario. If you want to get a quick start, you can run Kylin in a Hadoop sandbox VM or in the cloud, for example, start a small AWS EMR or Azure HDInsight cluster and then install Kylin in one of the nodes. When you use Kylin-2.3.1, I suggest you use Spark-2.1.2.

Related

How do I install components such as Apache Drill and Apache Hue in IBM Bluemix BigInsights Apache Hadoop

I am new to IBM Bluemix platform and exploring its BigInsights service. I can see pre configured components such as Pig Hive Hbase and others. But I want to know How can I install services like Drill or say Hue which is not configured by default. Also ssh to cluster nodes allows restricted access with no sudo rights in case one need to run yum commands.Does bluemix allows root access as I cannot see one. Thanks In advance.
As far as I know, it is not possible.
But you can use http://www.softlayer.com/ to build your own IOP (IBM Open Platform) Cluster in the cloud.
If you are interested in IBM's value-adds and you just want to try out:
https://www.youtube.com/watch?v=4p7LDeu_qQQ it is a nice tutorial to set up your own cluster via Docker.
This tutorial should be still valid for Hue:
https://developer.ibm.com/hadoop/2015/06/02/deploying-hue-on-ibm-biginsights/
Installing Drill doesn't look complicated:
https://drill.apache.org/docs/installing-drill-in-distributed-mode/
In conclusion: You need to move away from Bluemix, if you want to have a more customised BigInsights. But there are options: Softlayer, AWS, .. or just on your local computer (if you got sufficient resources, since some components like Hbase need a minimum amount of nodes)

Apache Kylin installation without Sandbox

I was wondering if there are any resources regarding Apache Kylin installation without any sandbox (like cloudera, hortonworks) support. I have managed to do the following:
Install Hadoop 2.6
Install Hive
Install HBase
Then I used the binary from kylin site and so far been able to run it. The problem start when I try to build a cube, the map reduce job gets stuck in step 2. I am thinking if it is still assuming to be in sandbox mode and not submitting job to hadoop at all (there is no entry in hadoop jobtracker).
So I need solution regarding the two:
1. Possible configuration of kylin in pure hadoop setup (no sandbox)
2. somehow enable the kylin setup to submit job to hadoop.
There is no such sandbox or non-sandbox configuration in Kylin. Just make sure the machine where Kylin runs has hadoop setup correctly and you should be fine.
Under the scene, kylin.sh uses hbase classpath and hive -e set | grep 'env:CLASSPATH' to detect hadoop settings. Double check these commands work as expect if you are not sure what cluster Kylin connects to.
If Kylin has problem submitting MR jobs, check two places. First is hadoop resource manager, see if the job has really been submitted or not. Sometimes it's just running slow. Second check kylin.log, see if any exception there. Post the log to kylin dev mailing list and someone will be able to help.
You can install hadoop-2.6 , hive-0.14 ,hbase-0.98.8-hadoop2 with Zookeeper inbuilt or external zookeeper-3.5
Now you can run kylin-v1.1-release on it
If you still face Issues paste the log here

Install datastax cassandra on cloudera cluster

I have an existing CDH 5.3 cluster running on Ubuntu server. I would like to install Cassandra on the same nodes and integrate it with the existing Cloudera cluster. I know Cassandra allows BYOH now but I cannot find any guide online to help me accomplish it. Has anyone done this? Do you have any instructions i could follow?
Thank You
Here is the Datastax documentation for BYOH

Hadoop installation on Amazon cloud

i am new to Hadoop ,i likes to go in hadoop administration line so studied basics of hadoop and tried to install hadoop in pseudo distribution mode and installed successfully and run some basic examples also, now i need to improve me further,so i need to try a way to learn hadoop installation and configuration in real time so decided to go for Amazon micro instance ,can any one please tell how to install and configure hadoop in Amazon cloud.
Thanks in Advance.
I have tried this personally and you will not really be able to use hadoop on a single micro instance due to memory restrictions. IMHO you should atleast try a medium instance to run hadoop or better yet use their elastic-mapreduce api which is a modified version of hadoop. You can run a 3 node cluster for around 00.25 cents an hour. If you really want to learn big data this is the way I went.
You should check out their documentation here
http://aws.amazon.com/documentation/elasticmapreduce/

Cassandra integration with Hadoop

I am newbie to Cassandra. I am posting this question as different documentations were providing different details with respect to integeting Hive with Cassandra and I was not able to find the github page.
I have installed a single node Cassandra 2.0.2 (Datastax Community Edition) in one of the data nodes of my 3 node HDP 2.0 cluster.
I am unable to use hive to access Cassandra using 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'. I am getting the error ' return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
I have copied all the jars in /$cassandra_home/lib/* to /$hive-home/lib and also included the /cassandra_home/lib/* in the $HADOOP_CLASSPATH.
Is there any other configuration changes that I have to make to integrate Cassandra with Hadoop/Hive?
Please let me know. Thanks for the help!
Thanks,
Arun
Probably these are starting points for you:
Hive support for Cassandra, github
Top level article related to your topic with general information: Hive support for Cassandra CQL3.
Hadoop support, Cassandra Wiki.
Actually your question is not so narrow, there could be lot of reasons for this. But what you should remember Hive is based on MapReduce engine.
Hope this helps.

Resources