Increasing neo4j performance when doing large number of writes - performance

I am trying to load data into Neo4j through my application, which uses the Bolt driver. The application is initially very write-intensive. I am unable to reduce the load time because Neo4j writes seem to be quite slow with the Bolt driver. I see all threads doing the operation below:
at sun.nio.ch.FileDispatcherImpl.read0(java.io.FileDescriptor, long, int)
at sun.nio.ch.SocketDispatcher.read(java.io.FileDescriptor, long, int)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher)
at sun.nio.ch.IOUtil.read(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher)
at sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer)
at org.neo4j.driver.internal.security.TLSSocketChannel.channelRead(java.nio.ByteBuffer) (line: 159)
at org.neo4j.driver.internal.security.TLSSocketChannel.unwrap(java.nio.ByteBuffer) (line: 229)
at org.neo4j.driver.internal.security.TLSSocketChannel.read(java.nio.ByteBuffer) (line: 419)
at org.neo4j.driver.internal.net.BufferingChunkedInput.readNextPacket(java.nio.channels.ReadableByteChannel, java.nio.ByteBuffer) (line: 409)
at org.neo4j.driver.internal.net.BufferingChunkedInput.readChunkSize() (line: 345)
at org.neo4j.driver.internal.net.BufferingChunkedInput.read(java.nio.ByteBuffer) (line: 247)
at org.neo4j.driver.internal.net.BufferingChunkedInput.fillScratchBuffer(int) (line: 216)
at org.neo4j.driver.internal.net.BufferingChunkedInput.readByte() (line: 110)
at org.neo4j.driver.internal.packstream.PackStream$Unpacker.unpackStructHeader() (line: 430)
at org.neo4j.driver.internal.messaging.PackStreamMessageFormatV1$Reader.read(org.neo4j.driver.internal.messaging.MessageHandler) (line: 398)
at org.neo4j.driver.internal.net.SocketClient.receiveOne(org.neo4j.driver.internal.net.SocketResponseHandler) (line: 176)
at org.neo4j.driver.internal.net.SocketConnection.receiveOne() (line: 212)
at org.neo4j.driver.internal.net.ConcurrencyGuardingConnection.receiveOne() (line: 165)
at org.neo4j.driver.internal.net.pooling.PooledSocketConnection.receiveOne() (line: 183)
at org.neo4j.driver.internal.InternalStatementResult.receiveOne() (line: 335)
at org.neo4j.driver.internal.InternalStatementResult.tryFetchNext() (line: 325)
at org.neo4j.driver.internal.InternalStatementResult.hasNext() (line: 193)
Heap memory min and max = 4 GB
Page cache size = 2 GB
Total database size = approx. 2 GB (expected to grow up to 10 GB)
Is there a way I can optimize for the above operation? I can increase the heap to at most 8 GB, since other applications are running on this machine as well.

If your application is write-intensive, you'll benefit from performing your writes in batched transactions. Do this from your application: aggregate the nodes and edges until you reach the batch size, then open a transaction, write them, and commit. You can experiment with the batch size (for example 1,000 or 10,000) and see what works best for you.
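The batching idea can be sketched roughly as follows. This is only an illustration of the chunking logic, not driver code: `batched`, `load_in_batches`, and the `write_batch` callback are hypothetical names, and the callback stands in for whatever opens a transaction, sends one chunk, and commits in your application.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_in_batches(
    rows: Iterable[dict],
    write_batch: Callable[[List[dict]], None],
    batch_size: int = 1000,
) -> int:
    """Call `write_batch` once per chunk and return how many chunks were written."""
    transactions = 0
    for chunk in batched(rows, batch_size):
        # Assumption: a real write_batch would open one transaction,
        # send the whole chunk in it, and commit before returning.
        write_batch(chunk)
        transactions += 1
    return transactions


if __name__ == "__main__":
    committed: List[List[dict]] = []
    rows = ({"id": i} for i in range(2500))
    count = load_in_batches(rows, committed.append, batch_size=1000)
    print(count, [len(c) for c in committed])  # prints: 3 [1000, 1000, 500]
```

With this shape, trying a different batch size is a one-argument change, which makes it easy to measure 1,000 against 10,000 on your own data.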

Related

When I use Flink to write to HBase, I get an HBase error in the region server

The software versions are as follows:
apache hbase 2.1.6
apache flink 1.13.6
apache hadoop 3.1.1
When I use the hbase-client api to access hbase, I get the following error:
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=16, exceptions:
Wed Sep 28 03:03:11 UTC 2022, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68532: java.io.IOException: Invalid currTagsLen -32239. Block offset: 1319713, block length: 99991, position: 42422 (without header). path=hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/cd083a4a1ef04baff94ebb5aabdb8cb8/i/1f6dd8a1bc054eefbc9faa1bf625e24f
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:472)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: java.lang.IllegalStateException: Invalid currTagsLen -32239. Block offset: 1319713, block length: 99991, position: 42422 (without header). path=hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/cd083a4a1ef04baff94ebb5aabdb8cb8/i/1f6dd8a1bc054eefbc9faa1bf625e24f
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.checkTagsLen(HFileReaderImpl.java:642)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readKeyValueLen(HFileReaderImpl.java:630)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1080)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1097)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:208)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:120)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:653)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:153)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6581)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6745)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6518)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3155)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3404)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42190)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
... 3 more
The exception for hbase regionserver is as follows:
2022-09-28 11:19:36,019 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system started
2022-09-28 11:20:20,946 INFO [MemStoreFlusher.0] regionserver.HRegion: Flushing 1/1 column families, dataSize=1.95 MB heapSize=2.09 MB
2022-09-28 11:20:20,969 INFO [MemStoreFlusher.0] regionserver.DefaultStoreFlusher: Flushed memstore data size=1.95 MB at sequenceid=8934625 (bloomFilter=true), to=hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/.tmp/i/2629dbae7d5e402489ef56b1c097289f
2022-09-28 11:20:20,977 INFO [MemStoreFlusher.0] regionserver.HStore: Added hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/2629dbae7d5e402489ef56b1c097289f, entries=1212, sequenceid=8934625, filesize=359.1 K
2022-09-28 11:20:20,978 INFO [MemStoreFlusher.0] regionserver.HRegion: Finished flush of dataSize ~1.95 MB/2041026, heapSize ~2.09 MB/2190200, currentSize=0 B/0 for e63ee2269b0b076a415c5f76d546855f in 32ms, sequenceid=8934625, compaction requested=true
2022-09-28 11:20:20,986 INFO [regionserver/bghbaseclusterdn9528:16020-shortCompactions-1664173471436] regionserver.HRegion: Starting compaction of i in expose,9ffffff6,1663741391432.e63ee2269b0b076a415c5f76d546855f.
2022-09-28 11:20:20,986 INFO [regionserver/bghbaseclusterdn9528:16020-shortCompactions-1664173471436] regionserver.HStore: Starting compaction of [hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/98d0ecd1ed7744a8a5f94923c382861e, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/30bab1682dba4721b25e58b78dd17255, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/f80c2f08176e417a9184f434d4300935, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/52baca576c154c26b7df3b5d126d47b8, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/7d8291d422d042de9aa43aa5b79da6ad, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/8bf3b47909ab4eeb86d8a5c283cfe942, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/0663d48a4ed94dbe9fdc78f6649c1eb3, hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/b80b55d744174bc882db93283cd70c71] into tmpdir=hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/.tmp, totalSize=18.9 M
2022-09-28 11:20:21,153 INFO [regionserver/bghbaseclusterdn9528:16020-shortCompactions-1664173471436] throttle.PressureAwareThroughputController: e63ee2269b0b076a415c5f76d546855f#i#compaction#637 average throughput is 122.45 MB/second, slept 0 time(s) and total slept time is 0 ms. 0 active operations remaining, total limit is 61.86 MB/second
2022-09-28 11:20:21,159 ERROR [regionserver/bghbaseclusterdn9528:16020-shortCompactions-1664173471436] regionserver.CompactSplit: Compaction failed region=expose,9ffffff6,1663741391432.e63ee2269b0b076a415c5f76d546855f., storeName=i, priority=73, startTime=1664335220978
java.lang.IllegalStateException: Invalid currTagsLen -9. Block offset: 1677972, block length: 161891, position: 48652 (without header). path=hdfs://cthbaseclusterpro01/apps/hbase/data/data/default/expose/e63ee2269b0b076a415c5f76d546855f/i/b80b55d744174bc882db93283cd70c71
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.checkTagsLen(HFileReaderImpl.java:642)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readKeyValueLen(HFileReaderImpl.java:630)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1080)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1097)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:208)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:120)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:653)
at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:388)
at org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:327)
at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1410)
at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2187)
at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:596)
at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:638)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2022-09-28 11:20:25,000 INFO [RpcServer.default.FPBQ.Fifo.handler=18,queue=3,port=16020] regionserver.HRegion: writing data to region expose,9ffffff6,1663741391432.e63ee2269b0b076a415c5f76d546855f. with WAL disabled. Data may be lost in the event of a crash.
2022-09-28 11:24:01,565 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.08 GB, freeSize=2.52 GB, max=3.60 GB, blockCount=17155, accesses=133155383, hits=132992986, hitRatio=99.88%, , cachingAccesses=132985682, cachingHits=132951576, cachingHitsRatio=99.97%, evictions=16199, evicted=0, evictedPerRun=0.0
2022-09-28 11:24:01,569 INFO [MobFileCache #0] mob.MobFileCache: MobFileCache Statistics, access: 0, miss: 0, hit: 0, hit ratio: 0%, evicted files: 0
2022-09-28 11:24:05,246 INFO [regionserver/bghbaseclusterdn9528:16020.logRoller] wal.AbstractFSWAL: Rolled WAL /apps/hbase/data/WALs/bghbaseclusterdn9528,16020,1664173440239/bghbaseclusterdn9528%2C16020%2C1664173440239.1664331845190 with entries=21, filesize=5.39 KB; new WAL /apps/hbase/data/WALs/bghbaseclusterdn9528,16020,1664173440239/bghbaseclusterdn9528%2C16020%2C1664173440239.1664335445235
I found some possible solutions in the code, such as HBASE-21507, HBASE-24515, and HBASE-21775.

All queries with 'ON CLUSTER' clause timed out with error message 'There is no a local address in host list'

We're building up a ClickHouse cluster (version 20.1.8.41) on 7 nodes, using a "circular replica" pattern (i.e. 7 shards * 2 replicas on different nodes), with an extra ZooKeeper cluster.
The /etc/hosts files are all correctly configured, and the cluster started successfully.
However, when we execute distributed DDL queries, they all hang and eventually time out, e.g.:
:) create database ods on cluster sht_ck_cluster_1;
CREATE DATABASE ods ON CLUSTER sht_ck_cluster_1
→ Progress: 0.00 rows, 0.00 B (0.00 rows/s., 0.00 B/s.) Received exception from server (version 20.1.8):
Code: 159. DB::Exception: Received from localhost:9002. DB::Exception: Watching task /clickhouse/task_queue/ddl/query-0000000007 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 14 unfinished hosts (0 of them are currently active), they are going to execute the query in background.
0 rows in set. Elapsed: 180.589 sec.
The clickhouse-server.log on the client node shows the following:
2020.04.23 00:33:33.327414 [ 32 ] {c3c49bd3-333d-4fca-aa2f-2520f5c0cb9f} <Error> executeQuery: Code: 159, e.displayText() = DB::Exception: Watching task /clickhouse/task_queue/ddl/query-0000000007 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 14 unfinished hosts (0 of them are currently active), they are going to execute the query in background (version 20.1.8.41) (from 127.0.0.1:42198) (in query: CREATE DATABASE ods ON CLUSTER sht_ck_cluster_1), Stack trace (when copying this message, always include the lines below):
0. 0xb2087bc Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) in /usr/bin/clickhouse
1. 0x4d8e3c9 DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) in /usr/bin/clickhouse
2. 0x84846b9 DB::DDLQueryStatusInputStream::readImpl() in /usr/bin/clickhouse
3. 0x8345e3f DB::IBlockInputStream::read() in /usr/bin/clickhouse
4. 0x833d541 DB::AsynchronousBlockInputStream::calculate() in /usr/bin/clickhouse
5. 0x833e113 ? in /usr/bin/clickhouse
6. 0x4dc8b7a ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) in /usr/bin/clickhouse
7. 0x4dc9790 ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...)::'lambda'()::operator()() const in /usr/bin/clickhouse
8. 0x4dc7eca ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) in /usr/bin/clickhouse
9. 0x4dc69dc ? in /usr/bin/clickhouse
10. 0x7e25 start_thread in /usr/lib64/libpthread-2.17.so
11. 0xfebad clone in /usr/lib64/libc-2.17.so
What seems weird is that the clickhouse-server.log on the other nodes says:
2020.04.23 00:30:32.744984 [ 23 ] {} <Debug> DDLWorker: Processing tasks
2020.04.23 00:30:32.744992 [ 24 ] {} <Debug> DDLWorker: Cleaning queue
2020.04.23 00:30:32.746629 [ 23 ] {} <Debug> DDLWorker: Will not execute task query-0000000007: There is no a local address in host list
2020.04.23 00:30:32.746641 [ 23 ] {} <Debug> DDLWorker: Waiting a watch
I'm completely confused about this message. I've tried restarting the cluster, disabling DNS cache, and setting the parameter explicitly, but nothing worked.
What else should I do? Many thanks.
Regards
ON CLUSTER does not work with the "circular replica" pattern.

Pyspark: Saving dataframe to hadoop or hdfs without overflowing memory?

I'm working on a pipeline that reads a number of hive tables and parses them into some DenseVectors for eventual use in SparkML. I want to do a lot of iteration to find optimal training parameters, both inputs to the model and with computing resources. The dataframe I'm working with is somewhere between 50-100gb all said, spread across a dynamic number of executors on a YARN cluster.
Whenever I try to save, either to Parquet or with saveAsTable, I get a series of failed tasks before the job finally fails completely and suggests raising spark.yarn.executor.memoryOverhead. Each id is a single row, no more than a few KB.
feature_df.write.parquet('hdfs:///user/myuser/featuredf.parquet',mode='overwrite',partitionBy='id')
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 98 in stage 33.0 failed 4 times, most recent failure: Lost task 98.3 in
stage 33.0 (TID 2141, rs172.hadoop.pvt, executor 441): ExecutorLostFailure
(executor 441 exited caused by one of the running tasks) Reason: Container
killed by YARN for exceeding memory limits. 12.0 GB of 12 GB physical memory used.
Consider boosting spark.yarn.executor.memoryOverhead.
I currently have this at 2g.
Spark workers are currently getting 10gb, and the driver (which is not on the cluster) is getting 16gb with a maxResultSize of 5gb.
I'm caching the dataframe before I write, what else can I do to troubleshoot?
Edit: It seems like it's trying to do all of my transformations at once. When I look at the details for the saveAsTable() method:
== Physical Plan ==
InMemoryTableScan [id#0L, label#90, features#119]
+- InMemoryRelation [id#0L, label#90, features#119], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
+- *Filter (isnotnull(id#0L) && (id#0L < 21326835))
+- InMemoryTableScan [id#0L, label#90, features#119], [isnotnull(id#0L), (id#0L < 21326835)]
+- InMemoryRelation [id#0L, label#90, features#119], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
+- *Project [id#0L, label#90, pythonUDF0#135 AS features#119]
+- BatchEvalPython [<lambda>(collect_list_is#108, 56845.0)], [id#0L, label#90, collect_list_is#108, pythonUDF0#135]
+- SortAggregate(key=[id#0L, label#90], functions=[collect_list(indexedSegs#39, 0, 0)], output=[id#0L, label#90, collect_list_is#108])
+- *Sort [id#0L ASC NULLS FIRST, label#90 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(id#0L, label#90, 200)
+- *Project [id#0L, UDF(segment#2) AS indexedSegs#39, cast(label#1 as double) AS label#90]
+- *BroadcastHashJoin [segment#2], [entry#12], LeftOuter, BuildRight
:- HiveTableScan [id#0L, label#1, segment#2], MetastoreRelation pmccarthy, reka_data_long_all_files
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
+- *Project [cast(entry#7 as string) AS entry#12]
+- HiveTableScan [entry#7], MetastoreRelation reka_trop50, public_crafted_audiences_sized
My suggestion would be to disable dynamic allocation. Try running it with the configuration below:
--master yarn-client --driver-memory 15g --executor-memory 15g --executor-cores 10 --num-executors 15 -Dspark.yarn.executor.memoryOverhead=20000 -Dspark.yarn.driver.memoryOverhead=20000 -Dspark.default.parallelism=500
Ultimately the clue I got from the Spark user mailing list was to look at the partitions, both balance and sizes. As the planner had it, too much was being given to a single executor instance. Adding .repartition(1000) to the expression creating the dataframe to be written made all the difference, and more gains could probably be achieved by creating and partitioning on a clever key column.
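To see why the partition count matters here, a rough back-of-the-envelope calculation may help. The sketch below is plain arithmetic, not Spark code: the 100 GiB figure is the upper end of the size quoted in the question, 200 is the partition count shown in the `Exchange hashpartitioning(id#0L, label#90, 200)` step of the plan, and the 128 MiB target per partition is only an assumed rule of thumb. The helper names are hypothetical, and the averages assume perfectly balanced partitions, so a skewed key would be worse.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def avg_partition_mib(total_bytes: int, num_partitions: int) -> float:
    """Average partition size in MiB, assuming perfectly even balance."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return total_bytes / num_partitions / MIB


def partitions_for(total_bytes: int, target_partition_bytes: int) -> int:
    """Smallest partition count that keeps the average at or below the target."""
    if target_partition_bytes < 1:
        raise ValueError("target_partition_bytes must be at least 1")
    return max(1, math.ceil(total_bytes / target_partition_bytes))


if __name__ == "__main__":
    total = 100 * GIB  # upper end of the 50-100 GB quoted in the question
    print(avg_partition_mib(total, 200))    # 512.0 MiB per partition on average
    print(avg_partition_mib(total, 1000))   # roughly 102.4 MiB per partition
    print(partitions_for(total, 128 * MIB)) # 800 partitions for a 128 MiB target
```

Under these assumptions, 200 partitions means about half a gigabyte per task before any skew, while 1000 partitions brings the average down to about 100 MiB, which is consistent with `.repartition(1000)` relieving the memory pressure.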

Cloudera Hive: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce

I keep getting this error when trying to query data using hue
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce
From the hue job browser under the syslog tab
The error log is too big to paste here
http://pastebin.com/h8tgYuzR
Error from terminal
hive> SELECT count(*) FROM tweets;
Query ID = cloudera_20161128145151_137efb02-413b-4457-b21d-084101b77091
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1480364897609_0003, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1480364897609_0003/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1480364897609_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-11-28 14:52:09,804 Stage-1 map = 0%, reduce = 0%
2016-11-28 14:53:10,955 Stage-1 map = 0%, reduce = 0%
2016-11-28 14:53:13,213 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1480364897609_0003 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://quickstart.cloudera:8088/proxy/application_1480364897609_0003/
Examining task ID: task_1480364897609_0003_m_000000 (and more) from job job_1480364897609_0003
Task with the most failures(4):
-----
Task ID:
task_1480364897609_0003_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1480364897609_0003&tipid=task_1480364897609_0003_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable Objavro.schema�
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable Objavro.schema�
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:505)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.StringReader@7aee0989; line: 1, column: 2]
at com.cloudera.hive.serde.JSONSerDe.deserialize(JSONSerDe.java:128)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:136)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:100)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:496)
... 9 more
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.StringReader@7aee0989; line: 1, column: 2]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:306)
at org.codehaus.jackson.impl.ReaderBasedParser._handleUnexpectedValue(ReaderBasedParser.java:630)
at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:364)
at org.codehaus.jackson.map.ObjectMapper._initForReading(ObjectMapper.java:2439)
at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2396)
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1602)
at com.cloudera.hive.serde.JSONSerDe.deserialize(JSONSerDe.java:126)
... 12 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Here is the table
CREATE EXTERNAL TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/cloudera/flume/tweets';
data from the file I am trying to load http://pastebin.com/g7eg1BaP
I have a feeling that the table was defined using AVRO as the data type but a non-avro file was loaded to the table.
Remember that Hive is "schema on read" and not "schema on load". It will check for schema only when a job is run, not during loading or defining.
Please post the CREATE TABLE command used and a few records from the file you are trying to load.
Hope this helps.
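One way to check what kind of file is actually sitting in the table location is to look at its first few bytes. The sketch below is a minimal illustration under stated assumptions: `sniff_format` is a hypothetical helper, and it relies on the fact that Avro object container files begin with the four bytes `Obj` followed by `0x01`, which would match the `Objavro.schema` text at the start of the record in the error above. The sample bytes in the demo are made up for illustration and are not taken from the real file.

```python
AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def sniff_format(head: bytes) -> str:
    """Guess a file's format from its first bytes: 'avro', 'json', or 'unknown'."""
    if head.startswith(AVRO_MAGIC):
        return "avro"
    stripped = head.lstrip()  # JSON text may start with whitespace
    if stripped[:1] in (b"{", b"["):
        return "json"
    return "unknown"


if __name__ == "__main__":
    # Made-up sample headers, for illustration only.
    print(sniff_format(b"Obj\x01\x04\x16avro.schema"))  # avro
    print(sniff_format(b'  {"id": 1, "text": "hi"}'))   # json
    print(sniff_format(b"id,text\n1,hi\n"))             # unknown
```

In practice you would copy one file out of the table's LOCATION, read its first few bytes in binary mode, and pass them to a check like this before deciding which SerDe the table should use.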

sonarqube 5.2 MySQLTransactionRollbackException: Deadlock found when trying to get lock

Using SonarQube 5.2 I’m seeing the following Deadlock issue:
05:48:22 ERROR: Error during Sonar runner execution
05:48:22 java.lang.IllegalStateException: Fail to execute request
[code=500, url=http://192.168.109.6/api/ce/submit?projectKey=CoprHD&projectName=CoprHD-controller&projectBranch=bugfix-COP-19001-hotfix]:
{"errors":[{"msg":"\n### Error updating database.
Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException:
Deadlock found when trying to get lock; try restarting transaction\n
### The error may involve org.sonar.db.user.RoleMapper.insertGroupRole-Inline\n### The error occurred while setting parameters\n
### SQL: INSERT INTO group_roles (group_id, resource_id, role) VALUES (?, ?, ?)\n
### Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction"}]}
05:48:22 at org.sonar.batch.report.ReportPublisher.uploadMultiPartReport(ReportPublisher.java:182)
05:48:22 at org.sonar.batch.report.ReportPublisher.sendOrDumpReport(ReportPublisher.java:151)
05:48:22 at org.sonar.batch.report.ReportPublisher.execute(ReportPublisher.java:115)
05:48:22 at org.sonar.batch.phases.PhaseExecutor.publishReportJob(PhaseExecutor.java:116)
05:48:22 at org.sonar.batch.phases.PhaseExecutor.execute(PhaseExecutor.java:106)
05:48:22 at org.sonar.batch.scan.ModuleScanContainer.doAfterStart(ModuleScanContainer.java:192)
05:48:22 at org.sonar.core.platform.ComponentContainer.startComponents(ComponentContainer.java:100)
05:48:22 at org.sonar.core.platform.ComponentContainer.execute(ComponentContainer.java:85)
05:48:22 at org.sonar.batch.scan.ProjectScanContainer.scan(ProjectScanContainer.java:258)
05:48:22 at org.sonar.batch.scan.ProjectScanContainer.scanRecursively(ProjectScanContainer.java:253)
05:48:22 at org.sonar.batch.scan.ProjectScanContainer.doAfterStart(ProjectScanContainer.java:243)
05:48:22 at org.sonar.core.platform.ComponentContainer.startComponents(ComponentContainer.java:100)
05:48:22 at org.sonar.core.platform.ComponentContainer.execute(ComponentContainer.java:85)
05:48:22 at org.sonar.batch.bootstrap.GlobalContainer.executeAnalysis(GlobalContainer.java:153)
05:48:22 at org.sonar.batch.bootstrapper.Batch.executeTask(Batch.java:110)
05:48:22 at org.sonar.runner.batch.BatchIsolatedLauncher.execute(BatchIsolatedLauncher.java:55)
05:48:22 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
05:48:22 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
05:48:22 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
05:48:22 at java.lang.reflect.Method.invoke(Method.java:606)
05:48:22 at org.sonar.runner.impl.IsolatedLauncherProxy.invoke(IsolatedLauncherProxy.java:61)
05:48:22 at com.sun.proxy.$Proxy0.execute(Unknown Source)
05:48:22 at org.sonar.runner.api.EmbeddedRunner.doExecute(EmbeddedRunner.java:275)
05:48:22 at org.sonar.runner.api.EmbeddedRunner.runAnalysis(EmbeddedRunner.java:166)
05:48:22 at org.sonar.runner.api.EmbeddedRunner.runAnalysis(EmbeddedRunner.java:153)
05:48:22 at org.sonar.runner.cli.Main.runAnalysis(Main.java:118)
05:48:22 at org.sonar.runner.cli.Main.execute(Main.java:80)
05:48:22 at org.sonar.runner.cli.Main.main(Main.java:66)
Searching for similar reports I found this reference which says the issue was resolved: https://jira.sonarsource.com/browse/SONAR-1945
I also found a reference that transaction-isolation should be changed from REPEATABLE-READ to READ-COMMITTED. Is this a reasonable thing to do with mysql for Sonar?
mysql> show variables like '%wait_timeout%';
+--------------------------+----------+
| Variable_name | Value |
+--------------------------+----------+
| innodb_lock_wait_timeout | 500 |
| lock_wait_timeout | 31536000 |
| wait_timeout | 28800 |
+--------------------------+----------+
3 rows in set (0.25 sec)
mysql> show variables like '%tx_isolation%';
+---------------+-----------------+
| Variable_name | Value |
+---------------+-----------------+
| tx_isolation | REPEATABLE-READ |
+---------------+-----------------+
1 row in set (0.00 sec)
mysql> SELECT @@GLOBAL.tx_isolation, @@tx_isolation;
+-----------------------+-----------------+
| @@GLOBAL.tx_isolation | @@tx_isolation |
+-----------------------+-----------------+
| REPEATABLE-READ | REPEATABLE-READ |
+-----------------------+-----------------+
For further info about the Deadlock issue, here is some data.
Does anyone know if this issue is something that should be tweaked in mysql or is this an issue that needs to be fixed in the SonarQube app?
mysql> show engine innodb status
=====================================
2015-12-18 07:42:25 7f61f03cd700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 31 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 44635 srv_active, 0 srv_shutdown, 1284536 srv_idle
srv_master_thread log flush and writes: 1329157
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 224853
OS WAIT ARRAY INFO: signal count 1727534
Mutex spin waits 1578113, rounds 7231747, OS waits 74673
RW-shared spins 483413, rounds 5257332, OS waits 110301
RW-excl spins 197945, rounds 3737144, OS waits 35005
Spin rounds per wait: 4.58 mutex, 10.88 RW-shared, 18.88 RW-excl
------------------------
LATEST DETECTED DEADLOCK
------------------------
2015-12-17 05:46:47 7f61f0594700
*** (1) TRANSACTION:
TRANSACTION 17641507, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 8 lock struct(s), heap size 1184, 7 row lock(s), undo log entries 9
MySQL thread id 5021, OS thread handle 0x7f61f071a700, query id 33269201 localhost 127.0.0.1 sonar update
INSERT INTO group_roles (group_id, resource_id, role)
VALUES (null, 1515106, 'codeviewer')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 310 page no 6 n bits 472 index `group_roles_resource` of table `sonar`.`group_roles` trx id 17641507 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
*** (2) TRANSACTION:
TRANSACTION 17641509, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
7 lock struct(s), heap size 1184, 4 row lock(s), undo log entries 3
MySQL thread id 5005, OS thread handle 0x7f61f0594700, query id 33269204 localhost 127.0.0.1 sonar update
INSERT INTO group_roles (group_id, resource_id, role)
VALUES (1, 1515107, 'admin')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 310 page no 6 n bits 472 index `group_roles_resource` of table `sonar`.`group_roles` trx id 17641509 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 310 page no 6 n bits 472 index `group_roles_resource` of table `sonar`.`group_roles` trx id 17641509 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 18864174
Purge done for trx's n:o
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 7482, OS thread handle 0x7f61f03cd700, query id 38116433 localhost sonar init
show engine innodb status
---TRANSACTION 18864038, not started
MySQL thread id 7478, OS thread handle 0x7f61f3349700, query id 38115903 localhost 127.0.0.1 sonar cleaning up
---TRANSACTION 18864173, not started
MySQL thread id 7475, OS thread handle 0x7f61f040e700, query id 38116432 localhost 127.0.0.1 sonar cleaning up
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
7146308 OS file reads, 6478063 OS file writes, 1783568 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 3077, seg size 3079, 22965 merges
merged operations:
insert 45672, delete mark 7198683, delete 214896
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 6374777, node heap has 11107 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 219765124434
Log flushed up to 219765124434
Pages flushed up to 219765124434
Last checkpoint at 219765124434
0 pending log writes, 0 pending chkp writes
1189792 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 3296722944; in additional pool allocated 0
Dictionary memory allocated 359878
Buffer pool size 196600
Free buffers 8192
Database pages 177301
Old database pages 65285
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 1567756, not young 296705943
0.00 youngs/s, 0.00 non-youngs/s
Pages read 7146255, created 1592527, written 5004155
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 177301, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]

Resources