HBase : Create table command taking long time - hadoop

I am new to HBase and i'm followig the book "Hadoop-The Definitve Guide".
I have started all application on my local system, which means there should be not network overhead. But when i ran the simple command to create a table in base, it is taking around nine seconds.
Here is the procedure which i used to start the hbase and create table:
./start-hbase.sh
./hbase shell
create 'test' , 'data'
And here is the console logs showing it takes around 8.3 seconds:
hbase(main):002:0> create 'test' , 'data'
0 row(s) in 8.3310 seconds
=> Hbase::Table - test
Although there is no error or exception in the logs of the base. For reference, here is my hbase-KV-master-KV.local.log file:
2017-06-20 16:23:49,335 INFO [B.defaultRpcServer.handler=9,queue=0,port=64717] master.HMaster: Client=KV//127.0.0.1 create 'test', {NAME => 'data', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2017-06-20 16:23:49,463 INFO [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x15cc51a8c440000 type:create cxid:0x2ca zxid:0x47 txntype:-1 reqpath:n/a Error Path:/hbase/table-lock/test Error:KeeperErrorCode = NoNode for /hbase/table-lock/test
2017-06-20 16:23:54,615 INFO [RegionOpenAndInitThread-test-1] regionserver.HRegion: creating HRegion test HTD == 'test', {NAME => 'data', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://172.**.**.168/var/folders/sm/814w032j2q3d9npm7c4509xm0000gn/T/hbase-KV/hbase/.tmp Table name == test
2017-06-20 16:23:55,041 INFO [RegionOpenAndInitThread-test-1] regionserver.HRegion: Closed test,,1497956029330.a5c5e9076c1e38f4255f4dc8eea50f97.
2017-06-20 16:23:55,164 INFO [ProcedureExecutor-1] hbase.MetaTableAccessor: Added 1
2017-06-20 16:23:55,273 INFO [ProcedureExecutor-1] zookeeper.ZKTableStateManager: Moving table test state from null to ENABLING
2017-06-20 16:23:55,278 INFO [ProcedureExecutor-1] master.AssignmentManager: Assigning 1 region(s) to localhost,64720,1497955472691
2017-06-20 16:23:55,285 INFO [ProcedureExecutor-1] master.RegionStates: Transition {a5c5e9076c1e38f4255f4dc8eea50f97 state=OFFLINE, ts=1497956035278, server=null} to {a5c5e9076c1e38f4255f4dc8eea50f97 state=PENDING_OPEN, ts=1497956035285, server=localhost,64720,1497955472691}
2017-06-20 16:23:55,290 INFO [PriorityRpcServer.handler=14,queue=0,port=64720] regionserver.RSRpcServices: Open test,,1497956029330.a5c5e9076c1e38f4255f4dc8eea50f97.
2017-06-20 16:23:55,303 INFO [AM.ZK.Worker-pool2-t10] master.RegionStates: Transition {a5c5e9076c1e38f4255f4dc8eea50f97 state=PENDING_OPEN, ts=1497956035285, server=localhost,64720,1497955472691} to {a5c5e9076c1e38f4255f4dc8eea50f97 state=OPENING, ts=1497956035303, server=localhost,64720,1497955472691}
2017-06-20 16:23:55,307 INFO [StoreOpener-a5c5e9076c1e38f4255f4dc8eea50f97-1] hfile.CacheConfig: Created cacheConfig for data: blockCache=LruBlockCache{blockCount=0, currentSize=867896, freeSize=844179528, maxSize=845047424, heapSize=867896, minSize=802795072, minFactor=0.95, multiSize=401397536, multiFactor=0.5, singleSize=200698768, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
2017-06-20 16:23:55,307 INFO [StoreOpener-a5c5e9076c1e38f4255f4dc8eea50f97-1] compactions.CompactionConfiguration: size [134217728, 9223372036854775807, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; major period 604800000, major jitter 0.500000, min locality to compact 0.000000
2017-06-20 16:23:55,319 INFO [RS_OPEN_REGION-localhost:64720-1] regionserver.HRegion: Onlined a5c5e9076c1e38f4255f4dc8eea50f97; next sequenceid=2
2017-06-20 16:23:55,322 INFO [PostOpenDeployTasks:a5c5e9076c1e38f4255f4dc8eea50f97] regionserver.HRegionServer: Post open deploy tasks for test,,1497956029330.a5c5e9076c1e38f4255f4dc8eea50f97.
2017-06-20 16:23:55,325 INFO [PostOpenDeployTasks:a5c5e9076c1e38f4255f4dc8eea50f97] hbase.MetaTableAccessor: Updated row test,,1497956029330.a5c5e9076c1e38f4255f4dc8eea50f97. with server=localhost,64720,1497955472691
2017-06-20 16:23:55,327 INFO [AM.ZK.Worker-pool2-t11] master.RegionStates: Transition {a5c5e9076c1e38f4255f4dc8eea50f97 state=OPENING, ts=1497956035303, server=localhost,64720,1497955472691} to {a5c5e9076c1e38f4255f4dc8eea50f97 state=OPEN, ts=1497956035327, server=localhost,64720,1497955472691}
2017-06-20 16:23:55,329 INFO [ProcedureExecutor-1] zookeeper.ZKTableStateManager: Moving table test state from ENABLING to ENABLED
Any suggestion, what could be the issue? Why is it taking so long time?

Related

Ambari metrics not show metrics after Cleaning up Ambari Metrics System Data

we have ambari with HDP version 2.6.5
we want to clean all metrics data , according to the following instructions on link - https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data
so we did the following
note - Metrics Service operation mode - distributed
we stop the metrics service from ambari
we clean all data: ( from hdfs )
hdfs dfs -rm -r -f /apps/ams/metrics/*
20/02/13 06:10:01 INFO fs.TrashPolicyDefault: Moved: 'hdfs://hdfsha/apps/ams/metrics/.tmp' to trash at: hdfs://hdfsha/user/hdfs/.Trash/Current/apps/ams/metrics/.tmp
20/02/13 06:10:01 INFO fs.TrashPolicyDefault: Moved: 'hdfs://hdfsha/apps/ams/metrics/MasterProcWALs' to trash at: hdfs://hdfsha/user/hdfs/.Trash/Current/apps/ams/metrics/MasterProcWALs
20/02/13 06:10:01 INFO fs.TrashPolicyDefault: Moved: 'hdfs://hdfsha/apps/ams/metrics/WALs' to trash at: hdfs://hdfsha/user/hdfs/.Trash/Current/apps/ams/metrics/WALs
20/02/13 06:10:01 INFO fs.TrashPolicyDefault: Moved: 'hdfs://hdfsha/apps/ams/metrics/archive' to trash at: hdfs://hdfsha/user/hdfs/.Trash/Current/apps/ams/metrics/archive
20/02/13 06:10:01 INFO fs.TrashPolicyDefault: Moved: 'hdfs://hdfsha/apps/ams/metrics/data' to trash at: hdfs://hdfsha/user/hdfs/.Trash/Current/apps/ams/metrics/data
20/02/13 06:10:01 INFO fs.TrashPolicyDefault: Moved: 'hdfs://hdfsha/apps/ams/metrics/hbase.id' to trash at: hdfs://hdfsha/user/hdfs/.Trash/Current/apps/ams/metrics/hbase.id
20/02/13 06:10:01 INFO fs.TrashPolicyDefault: Moved: 'hdfs://hdfsha/apps/ams/metrics/hbase.version' to trash at: hdfs://hdfsha/user/hdfs/.Trash/Current/apps/ams/metrics/hbase.version
20/02/13 06:10:01 INFO fs.TrashPolicyDefault: Moved: 'hdfs://hdfsha/apps/ams/metrics/oldWALs' to trash at: hdfs://hdfsha/user/hdfs/.Trash/Current/apps/ams/metrics/oldWALs
And we clean also the following folders
ls /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/zookeeper_0/
ls /var/lib/ambari-metrics-collector/hbase-tmp/phoenix-spool/
We start the metrics services from ambari
But from ambari metrics graphs not appears , and metrics collector service have alert
Not clearly why metrics not created after full metrics cleaning ,
From the log we can see the following:
2020-02-13 06:15:33,024 INFO [ProcedureExecutorThread-5] procedure2.ProcedureExecutor: Rolledback procedure CreateTableProcedure (table=SYSTEM.CATALOG) id=6 owner=ams state=ROLLEDBACK exec-time=239msec exception=org.apache.hadoop.hbase.TableExistsException: SYSTEM.CATALOG
2020-02-13 06:15:44,356 INFO [timeline] timeline.HadoopTimelineMetricsSink: No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2020-02-13 06:16:21,223 INFO [RpcServer.FifoWFPBQ.default.handler=28,queue=1,port=61300] master.HMaster: Client=ams/null List Table Descriptor for the SYSTEM.CATALOG table fails
2020-02-13 06:16:21,236 INFO [RpcServer.FifoWFPBQ.default.handler=28,queue=1,port=61300] master.HMaster: Client=ams/null create 'SYSTEM.CATALOG', {TABLE_ATTRIBUTES => {PRIORITY => '2000', coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.coprocessor.MetaDataEndpointImpl|805306366|', coprocessor$6 => '|org.apache.phoenix.coprocessor.MetaDataRegionObserver|805306367|'}, {NAME => '0', BLOOMFILTER => 'ROW', VERSIONS => '1000', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'true', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2020-02-13 06:16:21,349 INFO [ProcedureExecutorThread-6] procedure.CreateTableProcedure: CreateTableProcedure (table=SYSTEM.CATALOG) id=7 owner=ams state=RUNNABLE execute state=CREATE_TABLE_PRE_OPERATION
2020-02-13 06:16:21,360 WARN [ProcedureExecutorThread-6] procedure.CreateTableProcedure: The table SYSTEM.CATALOG does not exist in meta but has a znode. run hbck to fix inconsistencies.
2020-02-13 06:16:21,652 INFO [ProcedureExecutorThread-6] procedure2.ProcedureExecutor: Rolledback procedure CreateTableProcedure (table=SYSTEM.CATALOG) id=7 owner=ams state=ROLLEDBACK exec-time=305msec exception=org.apache.hadoop.hbase.TableExistsException: SYSTEM.CATALOG
2020-02-13 06:17:14,354 INFO [timeline] timeline.HadoopTimelineMetricsSink: No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2020-02-13 06:17:58,076 INFO [RpcServer.FifoWFPBQ.default.handler=28,queue=1,port=61300] master.HMaster: Client=ams/null List Table Descriptor for the SYSTEM.CATALOG table fails
2020-02-13 06:17:58,093 INFO [RpcServer.FifoWFPBQ.default.handler=28,queue=1,port=61300] master.HMaster: Client=ams/null create 'SYSTEM.CATALOG', {TABLE_ATTRIBUTES => {PRIORITY => '2000', coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.coprocessor.MetaDataEndpointImpl|805306366|', coprocessor$6 => '|org.apache.phoenix.coprocessor.MetaDataRegionObserver|805306367|'}, {NAME => '0', BLOOMFILTER => 'ROW', VERSIONS => '1000', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'true', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2020-02-13 06:17:58,206 INFO [ProcedureExecutorThread-7] procedure.CreateTableProcedure: CreateTableProcedure (table=SYSTEM.CATALOG) id=8 owner=ams state=RUNNABLE execute state=CREATE_TABLE_PRE_OPERATION
2020-02-13 06:17:58,218 WARN [ProcedureExecutorThread-7] procedure.CreateTableProcedure: The table SYSTEM.CATALOG does not exist in meta but has a znode. run hbck to fix inconsistencies.
2020-02-13 06:17:58,484 INFO [ProcedureExecutorThread-7] procedure2.ProcedureExecutor: Rolledback procedure CreateTableProcedure (table=SYSTEM.CATALOG) id=8 owner=ams state=ROLLEDBACK exec-time=279msec exception=org.apache.hadoop.hbase.TableExistsException: SYSTEM.CATALOG
2020-02-13 06:19:24,358 INFO [timeline] timeline.HadoopTimelineMetricsSink: No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
2020-02-13 06:19:34,540 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=156.56 KB, freeSize=147.69 MB, max=147.84 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=30, evicted=0, evictedPerRun=0.0
Did you check the Ambari-server.log. You can definitely find something there.

spark 1.6.3 stuck when union rdd

I want union two RDDs, one from RDBMS and one from hdfs.
It work fine when the hdfs file was not exists. And when it's exists, and union the two RDDs. Spark will stuck.
the log:
16/11/24 20:01:29 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 200 blocks
16/11/24 20:01:29 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
16/11/24 20:01:29 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/11/24 20:01:29 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
16/11/24 20:01:29 INFO Executor: Finished task 2.0 in stage 5.0 (TID 409). 2074 bytes result sent to driver
16/11/24 20:01:29 INFO TaskSetManager: Finished task 2.0 in stage 5.0 (TID 409) in 76 ms on localhost (1/3)
stop at here.
the code:
val allUserAction = if (exists) {
val textFile = sc.textFile(historyHDFSFilePath).map { line =>
//val textFile = sc.textFile("file:///Volumes/disk02/Desktop/part-00000").map { line =>
val values = line.split(",")
if (values.length == 13) {
(values(0).toLong, (values(1), values(2).toLong, values(3).toLong,
values(4).toInt, values(5).toLong, values(6).toInt, values(7).toBoolean, false, values(8).toBoolean,
values(9).toBoolean, values(10).toBoolean, values(11).toInt, values(12).toInt))
} else {
(0L, ("", 0L, 0L, 0, 0L, 0, false, false, false, false, false, 0, 0))
}
}
textFile.union(registerUserSourceWithOp)
} else {
registerUserSourceWithOp
}.cache()
allUserAction.collect().foreach(println)
If I copy the file from hdfs to local machine. and read from it. it work fine.
How to fix it. THX~

unable to load data in hbase table from hive

I am using hadoop version 2.7.0, hive version 1.1.0, HBase version hbase-0.98.14-hadoop2.
I have created a hbase table from hive successfully.
hive (Koushik)> CREATE TABLE hive_hbase_emp_test(eid int, ename string, esal double)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES
> ("hbase.columns.mapping" = ":key,cfstr:enm,cfsal:esl")
> TBLPROPERTIES ("hbase.table.name" = "hive_hbase_emp_test");
OK
Time taken: 0.874 seconds
hbase(main):004:0> describe 'hive_hbase_emp_test'
Table hive_hbase_emp_test is ENABLED
hive_hbase_emp_test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cfsal', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VER
SIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'cfstr', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VER
SIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
2 row(s) in 3.0650 seconds
But when I am trying to load the table from hive it is failing.
hive (Koushik)> INSERT OVERWRITE TABLE hive_hbase_emp_test SELECT empid,empname,empsal FROM hive_employee;
Query ID = hduser_20150921110000_249675d5-9da7-49fe-b03e-3a2d813ac898
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1442836788507_0011, Tracking URL = http://localhost:8088/proxy/application_1442836788507_0011/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1442836788507_0011
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2015-09-21 11:01:39,041 Stage-0 map = 0%, reduce = 0%
2015-09-21 11:02:39,429 Stage-0 map = 0%, reduce = 0%
2015-09-21 11:02:45,814 Stage-0 map = 100%, reduce = 0%
Ended Job = job_1442836788507_0011 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1442836788507_0011_m_000000 (and more) from job job_1442836788507_0011
Task with the most failures(4):
-----
Task ID:
task_1442836788507_0011_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1442836788507_0011&tipid=task_1442836788507_0011_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:147)
... 22 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.serde2.lazy.LazyUtils.getByte(Ljava/lang/String;B)B
at org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.collectSeparators(LazySerDeParameters.java:223)
at org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.<init>(LazySerDeParameters.java:90)
at org.apache.hadoop.hive.hbase.HBaseSerDeParameters.<init>(HBaseSerDeParameters.java:95)
at org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:117)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:344)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126)
... 22 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
The content of auxlib folder in hive is as below
hduser#ubuntu:/usr/lib/hive/auxlib$ ls
activation-1.1.jar
aopalliance-1.0.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
asm-3.1.jar
avro-1.7.4.jar
aws-java-sdk-1.7.4.jar
azure-storage-2.0.0.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
commons-codec-1.7.jar
commons-collections-3.2.1.jar
commons-compress-1.4.1.jar
commons-configuration-1.6.jar
commons-daemon-1.0.13.jar
commons-digester-1.8.jar
commons-el-1.0.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-lang3-3.3.2.jar
commons-logging-1.1.1.jar
commons-math-2.1.jar
commons-math3-3.1.1.jar
commons-net-3.1.jar
curator-client-2.7.1.jar
curator-framework-2.7.1.jar
curator-recipes-2.7.1.jar
findbugs-annotations-1.3.9-1.jar
gmbal-api-only-3.0.0-b023.jar
grizzly-framework-2.1.2.jar
grizzly-http-2.1.2.jar
grizzly-http-server-2.1.2.jar
grizzly-http-servlet-2.1.2.jar
grizzly-rcm-2.1.2.jar
gson-2.2.4.jar
guava-12.0.1.jar
guice-3.0.jar
guice-servlet-3.0.jar
hadoop-annotations-2.7.0.jar
hadoop-ant-2.7.0.jar
hadoop-archives-2.7.0.jar
hadoop-auth-2.7.0.jar
hadoop-aws-2.7.0.jar
hadoop-azure-2.7.0.jar
hadoop-client-2.2.0.jar
hadoop-common-2.2.0.jar
hadoop-datajoin-2.7.0.jar
hadoop-distcp-2.7.0.jar
hadoop-extras-2.7.0.jar
hadoop-gridmix-2.7.0.jar
hadoop-hdfs-2.7.0.jar
hadoop-hdfs-2.7.0-tests.jar
hadoop-hdfs-nfs-2.7.0.jar
hadoop-mapreduce-client-app-2.7.0.jar
hadoop-mapreduce-client-common-2.7.0.jar
hadoop-mapreduce-client-core-2.7.0.jar
hadoop-mapreduce-client-hs-2.7.0.jar
hadoop-mapreduce-client-hs-plugins-2.7.0.jar
hadoop-mapreduce-client-jobclient-2.7.0.jar
hadoop-mapreduce-client-jobclient-2.7.0-tests.jar
hadoop-mapreduce-client-shuffle-2.7.0.jar
hadoop-mapreduce-examples-2.7.0.jar
hadoop-openstack-2.7.0.jar
hadoop-rumen-2.7.0.jar
hadoop-sls-2.7.0.jar
hadoop-streaming-2.7.0.jar
hadoop-yarn-api-2.7.0.jar
hadoop-yarn-applications-distributedshell-2.7.0.jar
hadoop-yarn-applications-unmanaged-am-launcher-2.7.0.jar
hadoop-yarn-client-2.7.0.jar
hadoop-yarn-common-2.7.0.jar
hadoop-yarn-registry-2.7.0.jar
hadoop-yarn-server-applicationhistoryservice-2.7.0.jar
hadoop-yarn-server-common-2.7.0.jar
hadoop-yarn-server-nodemanager-2.7.0.jar
hadoop-yarn-server-resourcemanager-2.7.0.jar
hadoop-yarn-server-sharedcachemanager-2.7.0.jar
hadoop-yarn-server-tests-2.7.0.jar
hadoop-yarn-server-web-proxy-2.7.0.jar
hamcrest-core-1.3.jar
hbase-annotations-0.98.14-hadoop2.jar
hbase-checkstyle-0.98.14-hadoop2.jar
hbase-client-0.98.14-hadoop2.jar
hbase-common-0.98.14-hadoop2.jar
hbase-common-0.98.14-hadoop2-tests.jar
hbase-examples-0.98.14-hadoop2.jar
hbase-hadoop2-compat-0.98.14-hadoop2.jar
hbase-hadoop-compat-0.98.14-hadoop2.jar
hbase-it-0.98.14-hadoop2.jar
hbase-it-0.98.14-hadoop2-tests.jar
hbase-prefix-tree-0.98.14-hadoop2.jar
hbase-protocol-0.98.14-hadoop2.jar
hbase-resource-bundle-0.98.14-hadoop2.jar
hbase-rest-0.98.14-hadoop2.jar
hbase-server-0.98.14-hadoop2.jar
hbase-server-0.98.14-hadoop2-tests.jar
hbase-shell-0.98.14-hadoop2.jar
hbase-testing-util-0.98.14-hadoop2.jar
hbase-thrift-0.98.14-hadoop2.jar
high-scale-lib-1.1.1.jar
hive-hbase-handler-1.2.1.jar
hive-serde-1.2.1.jar
htrace-core-2.04.jar
htrace-core-3.1.0-incubating.jar
httpclient-4.1.3.jar
httpclient-4.2.5.jar
httpcore-4.1.3.jar
httpcore-4.2.5.jar
jackson-annotations-2.2.3.jar
jackson-core-2.2.3.jar
jackson-core-asl-1.8.8.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.2.3.jar
jackson-jaxrs-1.8.8.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.8.8.jar
jackson-mapper-asl-1.9.13.jar
jackson-xc-1.9.13.jar
jamon-runtime-2.3.1.jar
jasper-compiler-5.5.23.jar
jasper-runtime-5.5.23.jar
javax.inject-1.jar
java-xmlbuilder-0.4.jar
javax.servlet-3.1.jar
javax.servlet-api-3.0.1.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jcodings-1.0.8.jar
jersey-client-1.8.jar
jersey-core-1.8.jar
jersey-core-1.9.jar
jersey-grizzly2-1.9.jar
jersey-guice-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jersey-test-framework-core-1.9.jar
jersey-test-framework-grizzly2-1.9.jar
jets3t-0.9.0.jar
jettison-1.1.jar
jettison-1.3.1.jar
jetty-6.1.26.jar
jetty-sslengine-6.1.26.jar
jetty-util-6.1.26.jar
joda-time-2.7.jar
joni-2.1.2.jar
jruby-complete-1.6.8.jar
jsch-0.1.42.jar
jsp-2.1-6.1.14.jar
jsp-api-2.1-6.1.14.jar
jsp-api-2.1.jar
jsr305-3.0.0.jar
junit-4.11.jar
leveldbjni-all-1.8.jar
libthrift-0.9.0.jar
log4j-1.2.17.jar
management-api-3.0.0-b012.jar
metrics-core-3.0.1.jar
mockito-all-1.8.5.jar
netty-3.6.6.Final.jar
paranamer-2.3.jar
protobuf-java-2.5.0.jar
servlet-api-2.5-6.1.14.jar
servlet-api-2.5.jar
slf4j-api-1.6.4.jar
slf4j-log4j12-1.6.4.jar1
snappy-java-1.0.4.1.jar
stax-api-1.0-2.jar
xmlenc-0.52.jar
xz-1.0.jar
zookeeper-3.4.6.jar
What's I am missing here??
It looks like there is a version compatibility issue. The org.apache.hadoop.hive.serde2.lazy.LazyUtils.getByte is added to this class in this commit, which is released in Hive 1.2. See here
Actually I made a mistake. I have kept hive-hbase-handler-1.2.1.jar & hive-serde-1.2.1.jar in the auxlib path, which was causing the problem. When I removed 1.2.1 version of jars and then it is working fine with hive-hbase-handler-1.1.0.jar & hive-serde-1.1.0.jar. So the problem resolved with hive version 1.1.0 only (with habse version 0.98.14 and hadoop version 2.7.0).
NoSuchMethodError represents JVM could find the Class, but the Method is not found.
May be the class(runtime), is not the same with your hive version.
You can start hive cli in debug mode(bin/hive -hiveconf hive.root.logger=DEBUG,console). It will show all jars in and find the jar version in the logs.

Errors reading data from 1G file on localCluster mode with apache storm

Hi I'm using storm with local cluster mode for developing.
I ran a simple code that contains spout and two bolts, the code example count words from log file.
code example url :
http://kaviddiss.com/2013/05/17/how-to-get-started-with-storm-framework-in-5-minutes/
the code works perfectly with small log files (7.3M), but when I try to run a big log file (100M-1000M) I'm getting exceptions.
I set a long delay till the cluster is going down.
May I miss some configuration options here?
exceptions:
11326 [Thread-6] INFO backtype.storm.daemon.supervisor - Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "HelloStorm-1-1403522378", :executors ([3 3] [ 4 4] [2 2] [1 1])} for this supervisor 868aff95-7b63-44d1-ad55-2dd07d9c7ba2 on port 1024 with id df052251-45ec-4bc3-a486-c2bf11a8a0fa
11336 [Thread-6] INFO backtype.storm.daemon.worker - Launching worker for HelloStorm-1-1403522378 on 868aff95-7b63-44d1-ad55-2dd07d9c7ba2:1024 with id df052251-45ec-4bc3-a486-c2bf11a8a0fa and conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization" true, "topology.ma x.error.report.per.interval" 5, "zmq.linger.millis" 0, "topology.skip.missing.kryo.registrations" true, "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" "-Xmx768m", "storm.zookeeper. session.timeout" 20000, "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 50, "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/usr/local/li b:/opt/local/lib:/usr/lib", "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/var/tmp//77d5cd63-9539-44a4-892a-9e91553987df", "storm.messaging.netty.buffer_size" 5242880, "supervisor.w orker.start.timeout.secs" 120, "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64, "topology.worker.sha red.thread.pool.size" 4, "nimbus.host" "localhost", "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 2000, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_t hreads" 1, "storm.zookeeper.servers" ["localhost"], "transactional.zookeeper.root" "/transactional", "topology.acker.executors" nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts " nil, "drpc.queue.size" 128, "worker.childopts" "-Xmx768m", "supervisor.heartbeat.frequency.secs" 5, "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor. frequency.secs" 3, "drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, "topology.tasks" nil, "storm.messaging.netty.max_retries" 30, "topology.spout.w ait.strategy" "backtype.storm.spout.SleepSpoutWaitStrategy", "nimbus.thrift.max_buffer_size" 1048576, "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval" 1000, "topology.sleep.spout. wait.strategy.time.ms" 1, "nimbus.topology.validator" "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" (1024 1025 1026), "topology.debug" false, "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Xmx256m", "nimbus.thrift.port" 6627, "topol ogy.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" "backtype.storm.serialization.types.ListDelegateSerializer", "topology.disruptor.wait.strategy" "com.lm ax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serialization.DefaultKryoFactory", "drpc.invoc ations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "storm.thrift.transport" "backtype.storm.security.auth.SimpleTransportPlugin", "topology.state.synchroniz ation.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport" "backtype.storm.messaging.netty.Context", "logviewer.appender.name" "A1", "storm.messaging.netty.max_wait_ms" 1000, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts" "-Xmx1024m", "storm.cluster.mode" "local", "topolog y.optimize" true, "topology.max.task.parallelism" nil}
11337 [Thread-6] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
11344 [Thread-6-EventThread] INFO backtype.storm.zookeeper - Zookeeper state update: :connected:none
11358 [Thread-6] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
11611 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor line-reader-spout:[2 2]
11618 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks line-reader-spout:[2 2]
11632 [Thread-16-line-reader-spout] INFO backtype.storm.daemon.executor - Opening spout line-reader-spout:(2)
Start Time: 18512885554479686
11634 [Thread-16-line-reader-spout] INFO backtype.storm.daemon.executor - Opened spout line-reader-spout:(2)
11636 [Thread-16-line-reader-spout] INFO backtype.storm.daemon.executor - Activating spout line-reader-spout:(2)
11638 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor line-reader-spout:[2 2]
11677 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor word-counter:[3 3]
11721 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks word-counter:[3 3]
11725 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor word-counter:[3 3]
11733 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor word-spitter:[4 4]
11735 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks word-spitter:[4 4]
11737 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor word-spitter:[4 4]
11746 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor __system:[-1 -1]
11747 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks __system:[-1 -1]
11748 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor __system:[-1 -1]
11761 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor __acker:[1 1]
11765 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks __acker:[1 1]
11767 [Thread-6] INFO backtype.storm.daemon.executor - Timeouts disabled for executor __acker:[1 1]
11768 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor __acker:[1 1]
11768 [Thread-6] INFO backtype.storm.daemon.worker - Launching receive-thread for 868aff95-7b63-44d1-ad55-2dd07d9c7ba2:1024
11786 [Thread-6] INFO backtype.storm.daemon.worker - Worker has topology config {"storm.id" "HelloStorm-1-1403522378", "dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.se cs" nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization" true, "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 0, "topology.skip.missing.k ryo.registrations" true, "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" true, "topology.trident.batch.emit.in terval.millis" 50, "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", "topology.executor.send.buffer.size" 1024, "storm.l ocal.dir" "/var/tmp//77d5cd63-9539-44a4-892a-9e91553987df", "storm.messaging.netty.buffer_size" 5242880, "supervisor.worker.start.timeout.secs" 120, "topology.enable.message.timeouts" true, "inputF ile" "test_log.log", "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64, "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "localhost", "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 2000, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" 1024, "transactional.zookeeper.servers" nil, "stor m.zookeeper.root" "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" 1, "storm.zookeeper.servers" ["localhost"], "transactional.zookeeper.root" "/transactional", "topology.acker.executors" nil, "topology.kryo.decorators" (), "topology.name" "HelloStorm", "topology.transfer.buffer.size" 1024, "topology.worker .childopts" nil, "drpc.queue.size" 128, "worker.childopts" "-Xmx768m", "supervisor.heartbeat.frequency.secs" 5, "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, "superviso r.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, "topology.tasks" nil, "storm.messaging.netty.max_retries" 30, "topolo gy.spout.wait.strategy" "backtype.storm.spout.SleepSpoutWaitStrategy", "nimbus.thrift.max_buffer_size" 1048576, "topology.max.spout.pending" 1, "storm.zookeeper.retry.interval" 1000, "topology.slee p.spout.wait.strategy.time.ms" 1, "nimbus.topology.validator" "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" (1024 1025 1026), "topology.debug" false, "nimbus.task.launc h.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.kryo.register" nil, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Xmx25 6m", "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" "backtype.storm.serialization.types.ListDelegateSerializer", "top ology.disruptor.wait.strategy" "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serializ ation.DefaultKryoFactory", "drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "storm.thrift.transport" "backtype.storm.security.auth.SimpleTransp ortPlugin", "topology.state.synchronization.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport" "backtype.storm.messaging.nett y.Context", "logviewer.appender.name" "A1", "storm.messaging.netty.max_wait_ms" 1000, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts" "-Xmx1024m", "storm.cluster.mode" "local", "topology.optimize" true, "topology.max.task.parallelism" nil}
11786 [Thread-6] INFO backtype.storm.daemon.worker - Worker df052251-45ec-4bc3-a486-c2bf11a8a0fa for storm HelloStorm-1-1403522378 on 868aff95-7b63-44d1-ad55-2dd07d9c7ba2:1024 has finished loading
11801 [Thread-18-word-counter] INFO backtype.storm.daemon.executor - Preparing bolt word-counter:(3)
11821 [Thread-18-word-counter] INFO backtype.storm.daemon.executor - Prepared bolt word-counter:(3)
11823 [Thread-20-word-spitter] INFO backtype.storm.daemon.executor - Preparing bolt word-spitter:(4)
11825 [Thread-20-word-spitter] INFO backtype.storm.daemon.executor - Prepared bolt word-spitter:(4)
11838 [Thread-24-__acker] INFO backtype.storm.daemon.executor - Preparing bolt __acker:(1)
11840 [Thread-22-__system] INFO backtype.storm.daemon.executor - Preparing bolt __system:(-1)
11854 [Thread-24-__acker] INFO backtype.storm.daemon.executor - Prepared bolt __acker:(1)
12173 [Thread-22-__system] INFO backtype.storm.daemon.executor - Prepared bolt __system:(-1)
112055 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
112058 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
112058 [Thread-6-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
112058 [Thread-6-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
121441 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
121442 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
121442 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
121442 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
121443 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
121443 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
121443 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
121444 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
134654 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
134655 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
134655 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
134656 [main-EventThread] WARN com.netflix.curator.ConnectionState - Session expired event received
134656 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
134656 [main-EventThread] WARN com.netflix.curator.ConnectionState - Session expired event received
134657 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: LOST
134657 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
134657 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: LOST
139931 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
149745 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
149745 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
149746 [main-EventThread] WARN com.netflix.curator.ConnectionState - Session expired event received
149746 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: LOST
149747 [main-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
149747 [main-EventThread] WARN com.netflix.curator.ConnectionState - Session expired event received
149747 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: LOST
149747 [main-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
158929 [main-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
158931 [main-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
158931 [Thread-6-EventThread] WARN com.netflix.curator.ConnectionState - Session expired event received
158931 [Thread-6-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: LOST
158931 [Thread-6-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
158932 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
158933 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
176934 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
357333 [CuratorFramework-5] ERROR com.netflix.curator.ConnectionState - Connection timed out
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:72) ~[curator-client-1.0.1.jar:na]
at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:74) [curator-client-1.0.1.jar:na]
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:353) [curator-framework-1.0.1.jar:na]
at com.netflix.curator.framework.imps.BackgroundSyncImpl.performBackgroundOperation(BackgroundSyncImpl.java:39) [curator-framework-1.0.1.jar:na]
at com.netflix.curator.framework.imps.OperationAndData.callPerformBackgroundOperation(OperationAndData.java:40) [curator-framework-1.0.1.jar:na]
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:547) [curator-framework-1.0.1.jar:na]
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.access$200(CuratorFrameworkImpl.java:50) [curator-framework-1.0.1.jar:na]
at com.netflix.curator.framework.imps.CuratorFrameworkImpl$2.call(CuratorFrameworkImpl.java:177) [curator-framework-1.0.1.jar:na]
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [na:1.6.0_65]
at java.util.concurrent.FutureTask.run(FutureTask.java:138) [na:1.6.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [na:1.6.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [na:1.6.0_65]
at java.lang.Thread.run(Thread.java:680) [na:1.6.0_65]
[update]
I got new exception running 70M file:
622366 [CuratorFramework-9] ERROR com.netflix.curator.framework.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up
java.lang.OutOfMemoryError: GC overhead limit exceeded
The problem seems to be exactly as described: you've loaded more data into memory than your JVM can support. I assume this is happening to the spout. For very large files you'll need to break up the processing by either splitting the files in advance or streaming the files in instead of trying to load the whole file into memory.

Copy Data from one hbase table to another

I have created one table hivetest which also create the table in hbase with name of 'hbasetest'. Now I want to copy 'hbasetest' data into another hbase table(say logdata) with the same schema. So, can anyone help me how do copy the data from 'hbasetest' to 'logdata' without using the hive.
CREATE TABLE hivetest(cookie string, timespent string, pageviews string, visit string, logdate string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "m:timespent, m:pageviews, m:visit, m:logdate")
TBLPROPERTIES ("hbase.table.name" = "hbasetest");
Updated question :
I have created the table logdata like this. But, I am getting the following error.
create 'logdata', {NAME => ' m', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>'0', TTL => '2147483647', BLOCKSIZE=> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
13/09/23 12:57:19 INFO mapred.JobClient: Task Id : attempt_201309231115_0025_m_000000_0, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 755 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family m does not exist in region logdata,,1379920697845.30fce8bcc99bf9ed321720496a3ec498. in table 'logdata', {NAME => 'm', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3773)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
: 755 times, servers with issues: master:60020,
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1674)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1450)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916)
at org.apache.hadoop.hbase.client.HTable.close(HTable.java:953)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:109)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
13/09/23 12:57:29 INFO mapred.JobClient: Task Id : attempt_201309231115_0025_m_000000_1, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 755 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family m does not exist in region logdata,,1379920697845.30fce8bcc99bf9ed321720496a3ec498. in table 'logdata', {NAME => 'm', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3773)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
: 755 times, servers with issues: master:60020,
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1674)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1450)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916)
at org.apache.hadoop.hbase.client.HTable.close(HTable.java:953)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:109)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
13/09/23 12:57:38 INFO mapred.JobClient: Task Id : attempt_201309231115_0025_m_000000_2, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 755 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family m does not exist in region logdata,,1379920697845.30fce8bcc99bf9ed321720496a3ec498. in table 'logdata', {NAME => 'm', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3773)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
: 755 times, servers with issues: master:60020,
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1674)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1450)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916)
at org.apache.hadoop.hbase.client.HTable.close(HTable.java:953)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:109)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
13/09/23 12:57:53 INFO mapred.JobClient: Job complete: job_201309231115_0025
13/09/23 12:57:53 INFO mapred.JobClient: Counters: 7
13/09/23 12:57:53 INFO mapred.JobClient: Job Counters
13/09/23 12:57:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=34605
13/09/23 12:57:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/09/23 12:57:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/09/23 12:57:53 INFO mapred.JobClient: Rack-local map tasks=4
13/09/23 12:57:53 INFO mapred.JobClient: Launched map tasks=4
13/09/23 12:57:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/09/23 12:57:53 INFO mapred.JobClient: Failed map tasks=1
Use the copyTable command. Example :
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=logdata hbasetest
Actually i am using hive-0.9.0. Which has a bug
https://issues.apache.org/jira/browse/HIVE-3243.
So, while creating the table SerDe of HBaseStorageHandler doesn't ignore white space between comma and column family. Hence you need to remove the white spaces. Then it will work fine.

Resources