Apache kylin cube fails "no counters for job" - hadoop

I'm using Kylin 1.5.4. When I build the cube, it fails at step 3, and the log says "no counter for job". It's not fetching the cardinality of the Hive table either. When I create a model or cube, it throws a "failed to take action" error, but when I close the JSON page, they have been created. It also isn't fetching the date partition column, and the logs show "column not found".
Any help or insights are greatly appreciated.

A little late to this, but I also had this issue on my current project where I got the "no counters for job" error. The problem was that we were using ORC tables in Hive. We just moved the data from the ORC tables into new TEXTFILE tables and set everything up in the Kylin cube with the new tables and everything worked.

When Kylin reports "no counter for job", it is usually because the MR job failed or the Hadoop history server wasn't started. If the MR job hit an error, you need to look into the Hadoop job logs to find the root cause. Could you double-check and paste the error trace? Also, please check whether it matches KYLIN-2026 or KYLIN-2032, which will be fixed in 1.5.4.1.
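If you're not sure whether the history server is up at all, a quick reachability check can rule that out before digging through job logs. This is only a rough sketch: it assumes the stock ports (10020 for mapreduce.jobhistory.address, 19888 for the history server web UI), and the host name is a placeholder. Your cluster may override both in mapred-site.xml.

```python
import socket

# Assumptions: default JobHistory Server ports; replace the host with your own.
HISTORY_SERVER_HOST = "localhost"   # hypothetical placeholder
HISTORY_SERVER_PORTS = {"ipc": 10020, "web ui": 19888}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    for label, port in HISTORY_SERVER_PORTS.items():
        state = "reachable" if port_is_open(HISTORY_SERVER_HOST, port) else "NOT reachable"
        print(f"history server {label} ({HISTORY_SERVER_HOST}:{port}): {state}")
```

An open port only tells you that something is listening there. If both ports are reachable, the failed MR job's own logs are the next place to look.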

Related

Not able to index a column in an HBase table using Phoenix

I have stored 3 million records in an HBase table using CSV bulk load and am trying to fetch data with SQL through Phoenix, but fetching records takes a long time. So I created an index using Phoenix, but I am not able to populate the index table using the Phoenix MapReduce process. I used the following command to insert the index data.
I am not sure where the HFiles are stored in HDFS. Please tell me where they are stored, or point out what I did wrong.
sudo -u hdfs hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table BIGSS --index-table BIG_STATUS_INDEXS --output-path STATUS_HFILES
I have attached a screenshot showing the error I am facing.
I used the following link for reference.
https://phoenix.apache.org/secondary_indexing.html

storing a Dataframe to a hive partition table in spark

I'm trying to store a stream of data coming in from a Kafka topic into a Hive partitioned table. I was able to convert the DStream to a DataFrame and create a HiveContext. My code looks like this:
val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
newdf.registerTempTable("temp") //newdf is my dataframe
newdf.write.mode(SaveMode.Append).format("osv").partitionBy("date").saveAsTable("mytablename")
But when I deploy the app on the cluster, it says:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file:/tmp/spark-3f00838b-c5d9-4a9a-9818-11fbb0007076/scratch_hive_2016-10-18_23-18-33_118_769650074381029645-1, expected: hdfs://
When I save it as a normal table and comment out the Hive configurations, it works. But with a partitioned table, it gives me this error.
I also tried registering the DataFrame as a temp table and then writing that table to the partitioned table. That gave me the same error.
Can someone please tell me how I can solve this?
Thanks.
You need to have Hadoop (HDFS) configured if you are deploying the app on the cluster.
With saveAsTable the default location that Spark saves to is controlled by the HiveMetastore (based on the docs). Another option would be to use saveAsParquetFile and specify the path and then later register that path with your hive metastore OR use the new DataFrameWriter interface and specify the path option write.format(source).mode(mode).options(options).saveAsTable(tableName).
I figured it out.
In the code for the Spark app, I declared the scratch dir location as below, and it worked.
sqlContext.sql("SET hive.exec.scratchdir=<hdfs location>")

Errors from avro.serde.schema - "CannotDetermineSchemaSentinel"

When running jobs on Hadoop (CDH4.6 and Hive 0.10), these errors showed up:
avro.serde.schema
{"type":"record","name":"CannotDetermineSchemaSentinel","namespace":"org.apache.hadoop.hive","fields":
[{"name":"ERROR_ERROR_ERROR_ERROR_ERROR_ERROR_ERROR","type":"string"},{"name":"Cannot_determine_schema","type":"string"},{"name":"check","type":"string"},
{"name":"schema","type":"string"},{"name":"url","type":"string"},{"name":"and","type":"string"},{"name":"literal","type":"string"}]}
What's the root cause, and how do I resolve them?
Thanks!
This happens when Hive is unable to read or parse the avro schema you have given it. Check the avro.schema.url or avro.schema.literal property in your table; it is likely it is set incorrectly.
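Before touching the table, it can help to confirm that the schema text itself is parseable. A minimal sketch, assuming you have the schema as a string (the contents of avro.schema.literal, or the file behind avro.schema.url) and that it is meant to be a top-level record like the sentinel above. The function name is made up for illustration, and this is a rough sanity check, not a full Avro validation.

```python
import json

# Name of the placeholder schema Hive shows when it cannot load the real one
# (taken from the error output in the question).
SENTINEL_NAME = "CannotDetermineSchemaSentinel"


def check_avro_schema_text(text):
    """Return a list of problems found in an Avro record schema given as JSON text."""
    try:
        schema = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]
    if not isinstance(schema, dict):
        return ["top-level schema is not a JSON object"]

    problems = []
    if schema.get("type") != "record":
        problems.append('top-level "type" is not "record"')
    if schema.get("name") == SENTINEL_NAME:
        problems.append("this is Hive's error sentinel, not your real schema")
    fields = schema.get("fields")
    if not isinstance(fields, list):
        problems.append('"fields" is missing or not a list')
    else:
        for index, field in enumerate(fields):
            if not isinstance(field, dict) or "name" not in field or "type" not in field:
                problems.append(f"field #{index} lacks a name or a type")
    return problems


good = '{"type": "record", "name": "Event", "fields": [{"name": "id", "type": "long"}]}'
truncated = '{"type": "record", "name": "Event", "fields": [{"name": "id"'
print(check_avro_schema_text(good))
print(check_avro_schema_text(truncated))
```

If the text passes this check and Hive still shows the sentinel, the more likely culprit is the location in avro.schema.url (wrong path, or not readable from the cluster).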

Reason for having TempStatStore in hadoop

Could anyone tell me the purpose of TempStatStore and the derby.log files in Hadoop, and when they are created?
While trying to execute a query in Hive, I'm getting an error: unable to create TempStatStore
from http://osdir.com/ml/general/2011-05/msg06513.html
TempStatsStore is a Derby database used for stats gathering (intermediate stats). You can turn off stats gathering by setting hive.stats.autogather=false

What is Hive: Return Code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

I am getting:
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
While trying to make a copy of a partitioned table using the commands in the hive console:
CREATE TABLE copy_table_name LIKE table_name;
INSERT OVERWRITE TABLE copy_table_name PARTITION(day) SELECT * FROM table_name;
I initially got some semantic analysis errors and had to set:
set hive.exec.dynamic.partition=true
set hive.exec.dynamic.partition.mode=nonstrict
I'm not sure what the above properties do, though.
Full output from the hive console:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201206191101_4557, Tracking URL = http://jobtracker:50030/jobdetails.jsp?jobid=job_201206191101_4557
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=master:8021 -kill job_201206191101_4557
2012-06-25 09:53:05,826 Stage-1 map = 0%, reduce = 0%
2012-06-25 09:53:53,044 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201206191101_4557 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
That's not the real error, here's how to find it:
Go to the hadoop jobtracker web-dashboard, find the hive mapreduce jobs that failed and look at the logs of the failed tasks. That will show you the real error.
The console output errors are useless, largely because the console doesn't have a view of the individual jobs/tasks to pull the real errors from (there could be errors in multiple tasks).
I know I am 3 years late on this thread, however still providing my 2 cents for similar cases in future.
I recently faced the same issue/error in my cluster.
The JOB would always get to some 80%+ reduction and fail with the same error, with nothing to go on in the execution logs either.
After multiple iterations and some research, I found that among the plethora of files being loaded, some did not comply with the structure defined for the base table (the table used to insert data into the partitioned table).
The point to note here is that whenever I ran a select query for a particular value of the partitioning column, or created a static partition, it worked fine, because in that case the error records were being skipped.
TL;DR: Check the incoming data/files for inconsistency in the structuring as HIVE follows Schema-On-Read philosophy.
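One cheap way to do that check for delimited text files is to count fields per line before loading. This is only a sketch under assumptions: plain delimited text with no quoting or escaping, a known column count, and a made-up helper name. The default delimiter here is Ctrl-A (\x01), Hive's default field delimiter for text tables; the sample uses commas for readability.

```python
def find_malformed_rows(lines, expected_fields, delimiter="\x01"):
    """Yield (line_number, field_count) for lines whose field count is unexpected.

    Assumes simple delimited text: no quoting, no escaped delimiters.
    """
    for line_number, line in enumerate(lines, start=1):
        field_count = len(line.rstrip("\r\n").split(delimiter))
        if field_count != expected_fields:
            yield line_number, field_count


# Hypothetical sample: three columns expected, second line is short.
sample = [
    "2016-10-18,alice,3\n",
    "2016-10-18,bob\n",
    "2016-10-19,carol,7\n",
]
bad_rows = list(find_malformed_rows(sample, expected_fields=3, delimiter=","))
print(bad_rows)
```

For real files you would pass an open file object instead of the sample list. A matching field count doesn't prove the values have the right types, so treat this as a first filter only.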
Adding some information here, as it took me a while to find the Hadoop JobTracker web dashboard in HDInsight (Azure's Hadoop), and a colleague finally showed me where it was. There is a shortcut on the head node called "Hadoop Yarn Status", which is just a link to a local HTTP page (http://headnodehost:9014/cluster in my case).
In that dashboard you can find your failed application, and then after clicking into it you can look at the logs of the individual map and reduce jobs.
In my case it seemed to still be running out of memory in the reducers, even though I had cranked the memory in the configuration already. For some reason it was not surfacing the "java outofmemory" errors I got earlier though.
The top answer is right that the error code doesn't give you much info. One of the common causes our team saw for this error code was a query that was not well optimized. A known case was an inner join where the left-side table was orders of magnitude bigger than the table on the right side. Swapping the tables would usually do the trick in such cases.
I removed the _SUCCESS file from the EMR output path in S3 and it worked fine.
I was also facing the same error when inserting data into a Hive external table that was pointing to an Elasticsearch cluster.
I replaced the older JAR elasticsearch-hadoop-2.0.0.RC1.jar with elasticsearch-hadoop-5.6.0.jar, and everything worked fine.
My suggestion is to use the JAR that matches your Elasticsearch version. Don't use older JARs if you are using a newer version of Elasticsearch.
Thanks to this post Hive- Elasticsearch Write Operation #409
I received this error when joining two tables, where one table is large and the other is small enough to fit into memory. In such a case, use
set hive.auto.convert.join = false
This might help to get rid of the above error. For more detail on this issue, please refer to the threads below:
Hive Map-Join configuration mystery
Hive.auto.convert.join = true what is the significance of this?
I faced the same issue too. When I checked the dashboard, I found the following error. The data was coming in through Flume and had been interrupted partway, so a few files may have been left inconsistent.
Caused by: org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input within/between OBJECT entries
Running on fewer files, it worked. Inconsistent file format was the reason in my case.
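To find which files were cut off, you can try parsing every record before Hive does. A small sketch, assuming the files hold one JSON object per line (that layout is an assumption, and the helper name is made up):

```python
import json


def find_broken_json_lines(lines):
    """Return (line_number, message) for each non-empty line that is not valid JSON."""
    broken = []
    for line_number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are skipped, not reported
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            broken.append((line_number, exc.msg))
    return broken


# Hypothetical sample: the second record was truncated mid-write.
sample = [
    '{"id": 1, "status": "ok"}\n',
    '{"id": 2, "status": \n',
]
broken = find_broken_json_lines(sample)
print(broken)
```

Files that report broken lines can then be repaired or moved out of the table location before rerunning the query.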
I faced the same issue because I didn't have permission to query the database I was trying to query.
If you don't have permission to query the table/database, then besides the Return Code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask error, you will see that Cloudera Manager is not even registering your query.
In my case, the solution was adding more RAM to the virtual machines. Sometimes code 2 means that the map and reduce nodes do not have enough memory.
Another option could be changing the properties "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb" in the mapred-site.xml file.
I got the same error while creating the Hive table in beeline. I then tried to create it through spark-shell, which threw the actual error. In my case, the error was the disk space quota for the HDFS directory.
org.apache.hadoop.ipc.RemoteException: The DiskSpace quota of /user/hive/warehouse/XXX_XX.db is exceeded: quota = 6597069766656 B = 6 TB but diskspace consumed = 6597493381629 B = 6.00 TB
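The "6.00 TB" in that message hides how small the overage is. Working through the two byte counts from the exception (the only inputs here; the unit constants are just binary TiB and MiB):

```python
# Byte counts copied from the RemoteException message above.
QUOTA_BYTES = 6597069766656
CONSUMED_BYTES = 6597493381629

TIB = 2 ** 40  # bytes per tebibyte
MIB = 2 ** 20  # bytes per mebibyte

quota_tib = QUOTA_BYTES / TIB                  # quota expressed in TiB
overage_bytes = CONSUMED_BYTES - QUOTA_BYTES   # how far over the quota
overage_mib = overage_bytes / MIB

print(f"quota   = {quota_tib:.2f} TiB")
print(f"overage = {overage_bytes} bytes (about {overage_mib:.0f} MiB)")
```

So the quota is exactly 6 TiB and the directory is only a few hundred MiB over it. Freeing a modest amount of space or raising the quota slightly should be enough to get past this particular error.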
