When I add this configuration to hive-site.xml, Hive queries become very slow.
Can anyone explain why this happens, and how can we fix it?
This is my hive-site.xml:
<description>Username to use against metastore database</description>
I am using Hive and I want to change the MapReduce temporary working directory from /tmp to another directory. I have tried everything I could find on the internet, but nothing works. Using the du -h command, I can see that /tmp fills up during the MapReduce task. Could somebody help me change the directory?
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<description>Whether virtual memory limits will be enforced for containers</description>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
<description>The staging dir used while submitting jobs</description>
<description>HDFS path to store active application’s timeline data</description>
<description>HDFS path to store done application’s timeline data</description>
<description>List of directories to store localized files</description>
<description>metadata is stored in a MySQL server</description>
<description>MySQL JDBC driver class</description>
<description>user name for connecting to mysql server</description>
<description>password for connecting to mysql server</description>
<description>Whether to execute jobs in parallel</description>
<description>How many jobs at most can be executed in parallel</description>
<description>Flag to control enabling Cost Based Optimizations using Calcite framework.</description>
When set to true Hive will answer a few queries like count(1) purely using stats
stored in metastore. For basic stats collection turn on the config hive.stats.autogather to true.
For more advanced stats collection need to run analyze table queries.
Annotation of operator tree with statistics information requires partition level basic
statistics like number of rows, data size and file size. Partition statistics are fetched from
metastore. Fetching partition statistics for each needed partition can be expensive when the
number of partitions is high. This flag can be used to disable fetching of partition statistics
from metastore. When this flag is disabled, Hive will make calls to filesystem to get file sizes
and will estimate the number of rows from row schema.
Annotation of operator tree with statistics information requires column statistics.
Column statistics are fetched from metastore. Fetching column statistics for each needed column
can be expensive when the number of columns is high. This flag can be used to disable fetching
of column statistics from metastore.
<description>A flag to gather statistics automatically during the INSERT OVERWRITE command.</description>
Expects one of the pattern in [jdbc(:.*), hbase, counter, custom, fs].
The storage that stores temporary Hive statistics. In filesystem based statistics collection ('fs'),
each task writes statistics it has collected in a file on the filesystem, which will be aggregated
after the job has finished. Supported values are fs (filesystem), jdbc:database (where database
can be derby, mysql, etc.), hbase, counter, and custom as defined in StatsSetupConst.java.
<description>Scratch space for Hive jobs</description>
<description>For metric class org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics JSON_FILE reporter, the location of local JSON metrics file. This file will get overwritten at every interval.</description>
<description>logs hive</description>
For Hadoop 2.7.1:
Configure mapreduce.cluster.local.dir in $HADOOP_HOME/etc/hadoop/mapred-site.xml. It also accepts a comma-separated list of directories on different devices.
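A minimal mapred-site.xml fragment for this, assuming two local disks mounted at /data1 and /data2. These paths are hypothetical; use directories that exist and are writable on every node.

```xml
<!-- mapred-site.xml: place inside the existing <configuration> element -->
<property>
  <name>mapreduce.cluster.local.dir</name>
  <!-- Hypothetical example paths, comma-separated, one per device -->
  <value>/data1/mapred/local,/data2/mapred/local</value>
</property>
```

On YARN, the containers that run MapReduce tasks also use the NodeManager's yarn.nodemanager.local-dirs (the "List of directories to store localized files" setting), whose default lives under hadoop.tmp.dir. If /tmp keeps filling up, that setting may need the same treatment.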
I'm new to Hadoop, Hive, HBase and Kylin. I tried to install the first three, and they seem to be working.
After that I tried to install Apache Kylin, ran sample.sh, and it succeeded.
After running the script, I restarted and opened the web interface. Some pages cannot be opened, e.g. /cube, /models, /admin/config.
The problem is: I can see that 5 tables were created in Hive and 2 cubes were created. But when I open the web GUI, the models stay in a loading state and I cannot build the cube.
When I try to build the cube
I cannot find any informative log (or maybe there is one, but I don't know where it is).
Here is the configuration for Hadoop:
Configuration for HBase:
Configuration for Hive:
<description>metadata is stored in a MySQL server</description>
<description>MySQL JDBC driver class</description>
<description>user name for connecting to mysql server</description>
<description>password for connecting to mysql server</description>
<description>Whether to include the current database in the Hive prompt.</description>
For Kylin, I use the default configuration, because I don't really know what to do with the Kylin configuration.
What I use:
Hadoop 2.7.5 binary
HBase 1.2.6 binary
Hive 1.2.2 binary
Kylin 2.2.0 source (I just added logs)
As of now, I'm figuring out how to properly save a specific Hive table, derived from a mapped source table, into a specific database. Let's say there is a separate database for the tester side and for the developer side. How can I segregate the lists of tables that each side can access?
For now, I monitor the state of the two databases via HUE. I have a Spark program running on a YARN cluster that creates a table, which should be stored in one database or the other depending on whether the user is a developer or a tester.
The Spark program I've just created is a simple app that reads a table from the current warehouse location and saves a new table named new_table.
I have the following Hive configuration XML:
Based on my current understanding, if I change the warehouse location via hive.warehouse.dir when submitting the Spark app to the YARN cluster (using --files /file/hive-site.xml), for example to
hdfs:/user/diff/warehouse, the Hive configuration in the Spark app should detect the Hive tables that exist in that directory.
However, when I do so, it still uses the location of the default database from hive.metastore.uris, which points to hdfs:/user/hive/warehouse. Based on my understanding, hive.metastore.uris overrides the database location in hive.metastore.dir.
What am I doing wrong? Is there something I need to configure properly in hive-site.xml? Any answers would be appreciated. Thank you. I'm a novice developer when it comes to Spark and Hadoop.
Create separate databases
Creating the databases is a one-time task:
hive> create database dev_db location '/user/hive/my_databases/dev';
hive> create database tst_db location '/user/hive/my_databases/tst';
When you create a table, you choose the database you want to work with:
hive> create table dev_db.my_dev_table (i int);
hive> create table tst_db.my_tst_table (i int);
hive> desc formatted dev_db.my_dev_table;
# col_name data_type comment
i int
# Detailed Table Information
Database: dev_db
Location: hdfs://quickstart.cloudera:8020/user/hive/my_databases/dev/my_dev_table
hive> desc formatted tst_db.my_tst_table;
Database: tst_db
Location: hdfs://quickstart.cloudera:8020/user/hive/my_databases/tst/my_tst_table
I configured the Spark engine in hive-site.xml using:
In yarn-site.xml:
When I run a Hive on Spark job, dynamic allocation does not work. Spark sets spark.executor.instances to whatever value I set for spark.dynamicAllocation.initialExecutors, and the number never changes. Can anyone help me figure out the problem?
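For comparison, here is a sketch of the settings that Spark's dynamic allocation on YARN usually depends on, following the Spark documentation for running on YARN. The executor bounds are placeholder values, not recommendations, and the snippet shows property fragments for two different files rather than one complete document. Depending on the Spark version, an explicitly set spark.executor.instances can also turn dynamic allocation off, so it may be worth checking whether hive-site.xml sets it.

```xml
<!-- hive-site.xml: property fragments, placed inside <configuration> -->
<property>
  <name>spark.dynamicAllocation.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Dynamic allocation relies on the external shuffle service -->
  <name>spark.shuffle.service.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Placeholder bounds: choose values that fit the cluster -->
  <name>spark.dynamicAllocation.minExecutors</name>
  <value>1</value>
</property>
<property>
  <name>spark.dynamicAllocation.maxExecutors</name>
  <value>10</value>
</property>

<!-- yarn-site.xml: register Spark's shuffle service as a NodeManager auxiliary service -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

Per the Spark YARN documentation, the spark-<version>-yarn-shuffle.jar also has to be on the NodeManager classpath, and the NodeManagers must be restarted after yarn-site.xml is changed.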
I have configured HBase 1.1.5 and Hadoop 2.7.2 with Kerberos security.
I have enabled authorization for HBase.
When I execute any authorization command, such as user_permission, grant or revoke,
it takes more than 40 seconds to complete.
Below are my hbase-site.xml configuration properties:
Please help me improve the performance of HBase ACL commands.
Thanks in advance