Apache Kylin not able to load models/configuration - hadoop

I'm new to hadoop,hive, hbase and kylin. I tried to install thoose first three, and it's seems to be working.
After that I tried to install apache kylin, run the sample.sh and success.
After running the script I restart and open the web interface. Some page cannot be opened ex: /cube, /models, /admin/config
The problem is: I can see there are 5 tables created in hive, and also 2 cubes created. But when I open in web gui, the models is in loading-state and I cannot build the cube.
When I try to build the cube
I cannot find any infomative log (Or maybe there is one, but I don't know about it)
kylin.log
https://pastebin.com/TUZkQepa
hadoop-hadoop-namenode-master.log
https://pastebin.com/T8eNt3PY
hadoop-hadoop-secondarynamenode-master.log
https://pastebin.com/iMJDNFfU
yarn-hadoop-resourcemanager-master.log
https://pastebin.com/TGwJWTRF
hbase-hadoop-zookeeper-master.log
https://pastebin.com/Ym6eky5h
hbase-hadoop-master-master.log
https://pastebin.com/p1ygfw4W
Here is the configuration for hadoop
(yarn-site.xml)
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Configuration for hbase
regionservers
slave2
hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/datadir</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave2</value>
</property>
</configuration>
Configuration for hive
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/metastore?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>user name for connecting to mysql server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>gwudainget</value>
<description>password for connecting to mysql server</description>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
<description>Whether to include the current database in the Hive prompt.</description>
</property>
</configuration>
For kylin, I use default configuration, because I don't really know what to do with the kylin configuration.
What i use:
hadoop 2.7.5 binary
hbase 1.2.6 binary
hive 1.2.2 binary
kylin 2.2.0 source (I just add logs)

Related

How to change mapreduce temporary working directory /tmp to other folder

I am using hive and I want to change the mapreduce temporary working directory from /tmp to some other directory. I tried everything which could I find on internet but nothing is working. I can see by du -h command that /tmp is filling up during the mapreduce task. Please somebody help me to change the directory.
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/bd/tmp/hadoop-${user.name}</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/bd/tmp/hadoop/dfs/journalnode/</value>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>/data/bd/tmp/mapred/local</value>
</property>
<property>
<name>mapreduce.task.tmp.dir</name>
<value>/data/bd/tmp</value>
</property>
<property>
<name>mapreduce.cluster.temp.dir</name>
<value>/data/bd/tmp/mapred/temp</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/data/bd/tmp/hadoop-yarn/staging</value>
</property>
<property>
<name>mapreduce.jobtracker.system.dir</name>
<value>/data/bd/tmp/mapred/system</value>
</property>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<value>/data/bd/tmp/mapred/staging</value>
</property>
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
</configuration>
yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/data/bd/tmp/logs</value>
<description>The staging dir used while submitting jobs</description>
</property>
<property>
<name>yarn.timeline-service.entity-group-fs-store.active-dir</name>
<value>/data/bd/tmp/entity-file-history/active</value>
<description>HDFS path to store active application’s timeline data</description>
</property>
<property>
<name>yarn.timeline-service.entity-group-fs-store.done-dir</name>
<value>/data/bd/tmp/entity-file-history/done/</value>
<description>HDFS path to store done application’s timeline data</description>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/data/bd/tmp/hadoop-ubuntu/nm-local-dir</value>
<description>List of directories to store localized files</description>
</property>
</configuration>
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>user name for connecting to mysql server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password for connecting to mysql server</description>
</property>
<property>
<name>hive.exec.parallel</name>
<value>true</value>
<description>Whether to execute jobs in parallel</description>
</property>
<property>
<name>hive.exec.parallel.thread.number</name>
<value>8</value>
<description>How many jobs at most can be executed in parallel</description>
</property>
<property>
<name>hive.cbo.enable</name>
<value>true</value>
<description>Flag to control enabling Cost Based Optimizations using Calcite framework.</description>
</property>
<property>
<name>hive.compute.query.using.stats</name>
<value>true</value>
<description>
When set to true Hive will answer a few queries like count(1) purely using stats
stored in metastore. For basic stats collection turn on the config hive.stats.autogather to true.
For more advanced stats collection need to run analyze table queries.
</description>
</property>
<property>
<name>hive.stats.fetch.partition.stats</name>
<value>true</value>
<description>
Annotation of operator tree with statistics information requires partition level basic
statistics like number of rows, data size and file size. Partition statistics are fetched from
metastore. Fetching partition statistics for each needed partition can be expensive when the
number of partitions is high. This flag can be used to disable fetching of partition statistics
from metastore. When this flag is disabled, Hive will make calls to filesystem to get file sizes
and will estimate the number of rows from row schema.
</description>
</property>
<property>
<name>hive.stats.fetch.column.stats</name>
<value>true</value>
<description>
Annotation of operator tree with statistics information requires column statistics.
Column statistics are fetched from metastore. Fetching column statistics for each needed column
can be expensive when the number of columns is high. This flag can be used to disable fetching
of column statistics from metastore.
</description>
</property>
<property>
<name>hive.stats.autogather</name>
<value>true</value>
<description>A flag to gather statistics automatically during the INSERT OVERWRITE command.</description>
</property>
<property>
<name>hive.stats.dbclass</name>
<value>fs</value>
<description>
Expects one of the pattern in [jdbc(:.*), hbase, counter, custom, fs].
The storage that stores temporary Hive statistics. In filesystem based statistics collection ('fs'),
each task writes statistics it has collected in a file on the filesystem, which will be aggregated
after the job has finished. Supported values are fs (filesystem), jdbc:database (where database
can be derby, mysql, etc.), hbase, counter, and custom as defined in StatsSetupConst.java.
</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/data/bd/tmp</value>
<description>Scratch space for Hive jobs</description>
</property>
<property>
<name>hive.service.metrics.file.location</name>
<value>/data/bd/tmp/report.json</value>
<description>For metric class org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics JSON_FILE reporter, the location of local JSON metrics file. This file will get overwritten at every interval.</description>
</property>
<property>
<name>hive.query.results.cache.directory</name>
<value>/data/bd/tmp/hive/_resultscache_</value>
<description>unknown</description>
</property>
<property>
<name>hive.llap.io.allocator.mmap.path</name>
<value>/data/bd/tmp</value>
<description>unknown</description>
</property>
<property>
<name>hive.hbase.snapshot.restoredir</name>
<value>/data/bd/tmp</value>
<description>unknown</description>
</property>
<property>
<name>hive.druid.working.directory</name>
<value>/data/bd/tmp//workingDirectory</value>
<description>unknown</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/data/bd/tmp</value>
<description>logs hive</description>
</property>
</configuration>
For hadoop 2.7.1
Configure mapreduce.cluster.local.dir in $HADOOP_HOME/etc/hadoop/mapred-site.xml, it also supports comma-separated list of directories on different devices.
https://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

Hadoop can't execute a basic Example

The software I'm using:
System:macOS Mojave 10.14.2
Hadoop:3.1.1
JDK:10.0.2
I execute this command:hadoop jar /usr/local/Cellar/hadoop/3.1.1/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar pi 2 5, it failed:
I need help, thank you!!!
In hadoop-env.sh, I just add the sentence:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-10.0.2.jdk/Contents/Home
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
I solved it.
It's the reason of java version.
When I added the two lines of code to yarn-env.sh, it didnt't work for me.
export YARN_RESOURCEMANAGER_OPTS="--add-modules=ALL-SYSTEM"
export YARN_NODEMANAGER_OPTS="--add-modules=ALL-SYSTEM"
In the end, I change the java version, set it to java8 and deleted above two lines of code, it worked for me.
You can set it in hadoop-env.sh.
Thx

Hadoop - failed to specify server's Kerberos principal name

Error - Failed to specify server's Kerberos principal name
I am trying to setup a Hadoop cluster using Kerberos. I managed to get the cluster working with Spark and Yarn before starting the Kerberos configuration. Currently my master and three nodes are running but i'm getting an error in the yarn logs.
Error:
java.io.IOException: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException : Failed to specify server's Kerberos principal name
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopmaster:9000</value>
</property>
<!--Kerberos configuration-->
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>
RULE:[2:$1#$0](hdfs/.*#.*EXAMPLEREALM.COM)s/.*/hdfs/
RULE:[2:$1#$0](HTTP/.*#.*EXAMPLEREALM.COM)s/.*/hdfs/
RULE:[2:$1#$0](yarn/.*#.*EXAMPLEREALM.COM)s/.*/yarn/
DEFAULT
</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/datanode</value>
</property>
<property>
<name>dfs.replication<name>
<value>2</value>
</property>
<!-- General HDFS security config -->
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<!-- NameNode security config -->
<property>
<name>dfs.namenode.keytab.file</name>
<value>/etc/security/keytabs/hdfs.service.keytab</value> <!-- path to the HDFS keytab -->
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/hadoopslave1.examplerealm.com#EXAMPLEREALM.COM</value>
</property>
<property>
<name>dfs.namenode.kerberos.internal.spnego.principal</name>
<value>HTTP/hadoopslave1.examplerealm.com#EXAMPLEREALM.COM</value>
</property>
<!-- Secondary NameNode security config -->
<property>
<name>dfs.secondary.namenode.keytab.file</name>
<value>/etc/security/keytabs/hdfs.service.keytab</value> <!-- path to the HDFS keytab -->
</property>
<property>
<name>dfs.secondary.namenode.kerberos.principal</name>
<value>hdfs/hadoopslave1.examplerealm.com#EXAMPLEREALM.COM</value>
</property>
<property>
<name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
<value>HTTP/hadoopslave1.examplerealm.com#EXAMPLEREALM.COM</value>
</property>
<!-- DataNode security config -->
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>700</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:1004</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:1006</value>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>/etc/security/keytabs/hdfs.service.keytab</value> <!-- path to the HDFS keytab -->
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hdfs/hadoopslave1.examplerealm.com#EXAMPLEREALM.COM</value>
</property>
<!-- Web Authentication config -->
<property>
<name>dfs.web.authentication.kerberos.principal</name>
<value>HTTP/hadoopslave1.examplerealm.com#EXAMPLEREALM.COM</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoopmaster</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.principal</name>
<value>yarn/hadoopslave1.examplerealm.com#EXAMPLEREALM.COM</value>
</property>
<property>
<name>yarn.nodemanager.keytab</name>
<value>/etc/security/keytabs/yarn.service.keytab</value>
</property>
</configuration>
Have you installed krb5-libs and krb5-workstation on all nodes?
on Centos:
yum install krb5-server krb5-libs
yum install krb5-libs krb5-workstation
In that case, trying this might help you:
systemctl enable krb5kdc
systemctl start krb5kdc
systemctl enable kadmin
systemctl start kadmin
Also check: https://community.hortonworks.com/questions/176262/failed-to-specify-servers-kerberos-principal-name.html

Hadoop : DataNode change directory not taking effect

We are using hadoop 2.7.3 changed the hdfs-site.xml to point to new directory provided permissions on new directory too ...and ran start-dfs.sh and stop-dfs.sh ..on name node ...but changes are not taking effect it still points to the old directory ...
Am I missing anything while doing the configuration changes? And how can we make sure to use the new directory?
it's a multi node cluster
this is the hdfs-site.xml on name node
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///test/hadoop/hadoopinfra/hdfs/namenode</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:///tmp/hadoop/data</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>2368709120</value>
</property>
<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
<value>1.0</value>
</property>
</configuration>
this is the hdfs-site.xml under data node
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///test/hadoop/hadoopinfra/hdfs/datanode</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:///tmp/hadoop/data</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>2368709120</value>
</property>
<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
<value>1.0</value>
</property>
</configuration>

hive slow performance with this configuration

When i add this configuration command in hive-site.xml ,hive queries is very slow.
Anyone can explain why happened for me?how can we fix it?
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
this is my hive-site.xml:
<configuration>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value></value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
property>
<name>hive.enforce.bucketing</name>
<value>true</value>
</property>

Resources