Spark streaming receiver Out of Memory (OOM) - spark-streaming
I have a problem where the receiver restarts again and again.
I am using Spark 1.6.1.
I use Spark Streaming to receive data from a stream, then use map to deserialize the protobuf (pb) data.
My testing covers two cases:
Just receive the data and print it directly: the app is stable.
Receive and deserialize: this produces the problem, and the time at which it occurs is not regular.
The data rate is about 500Mb/min and I have set the executor memory to 8 GB. It looks as if something is allocating memory aggressively, but I don't know why.
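For context on where that memory is likely going: with a receiver-based DStream, incoming records are buffered by the BlockGenerator and serialized with Kryo before being stored, which is exactly where the OOM in the log below occurs. Below is a hedged sketch of receiver-side settings that bound this buffering; the class name ThrottledStringReceiver and all numeric values are illustrative assumptions, not tested recommendations (spark.streaming.receiver.maxRate and spark.streaming.blockInterval are standard Spark settings).

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

val conf = new SparkConf()
// Illustrative values only: cap what each receiver ingests per second and
// shrink the block interval so each stored block (and its Kryo buffer) stays small.
conf.set("spark.streaming.receiver.maxRate", "20000")
conf.set("spark.streaming.blockInterval", "100ms")

// Hypothetical receiver skeleton: MEMORY_AND_DISK_SER lets received blocks
// spill to disk instead of piling up in the old generation.
class ThrottledStringReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER) {
  override def onStart(): Unit = { /* connect and call store(...) from a background thread */ }
  override def onStop(): Unit = { /* close the connection */ }
}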
My code:
val conf = new SparkConf().setAppName(args(8))
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.streaming.stopGracefullyOnShutdown", "true")
conf.set("spark.streaming.backpressure.enabled","true")
conf.set("spark.speculation","true")
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(args(7).toInt))
val bigPipeStreams = (1 to args(3).toInt).map{
i => ssc.networkStream(
new MyBigpipeLogagentReceiver(args(0),args(1),args(2),i,args(4),args(5),args(6).toInt)
)
}
val lines = ssc.union(bigPipeStreams)
def deserializePbData(value: String) : String = {
if (null == value || value.isEmpty) {
return ""
}
var cuid = ""
var os = ""
var channel = ""
var sv = ""
var resid = ""
var appid = ""
var prod = ""
try { //if exception,useless data,just drop it
val timeStrIndex = value.indexOf(",\"time_str\"")
var strAfterTruncation = ""
if (-1 != timeStrIndex) {
strAfterTruncation = value.substring(0,timeStrIndex) + "}"
} else {
strAfterTruncation = value
}
val jsonData = JSONObject.fromObject(strAfterTruncation)
//val jsonData = value.getAsJsonArray()
val binBody = jsonData.getString("bin_body")
val pbData = binBody.substring(1,binBody.length()-1).split(",").foldLeft(ArrayBuffer.empty[Byte])((b,a) => b +java.lang.Byte.parseByte(a)).drop(8).toArray
Lighttpd.lighttpd_log.parseFrom(pbData).getRequest().getUrl().getUrlFields().getAutokvList().asScala.foreach(a =>
a.getKey() match {
case "cuid" => cuid += a.getValue()
case "os" => os += a.getValue()
case "channel" => channel += a.getValue()
case "sv" => sv += a.getValue()
case "resid" => resid += a.getValue()
case "appid" => appid += a.getValue()
case "prod" => prod += a.getValue()
case _ => null
}
)
val decodeCuid = URLDecoder.decode(cuid, "UTF-8")
os = os.toLowerCase()
if (os.matches("android(.*)")) {
os = "android"
} else if (os.matches("iphone(.*)")) {
os = "iphone"
} else if (os.matches("ipad(.*)")) {
os = "ipad"
} else if (os.matches("s60(.*)")) {
os = "symbian"
} else if (os.matches("wp7(.*)")) {
os = "wp7"
} else if (os.matches("wp(.*)")) {
os = "wp"
} else if (os.matches("tizen(.*)")) {
os = "tizen"
val ifHasLogid = Lighttpd.lighttpd_log.parseFrom(pbData).hasLogid()
val time = Lighttpd.lighttpd_log.parseFrom(pbData).getTime()
if (ifHasLogid) {
val logid = Lighttpd.lighttpd_log.parseFrom(pbData).getLogid()
if (logid.isEmpty || logid.toString().equals("-") || !resid.toString().equals("01") || channel.isEmpty |!appid.isEmpty || !prod.isEmpty) {
""
} else {
decodeCuid + "\001" + os + "\001" + channel + "\001" + sv + "\001" + "1" + "\001" + "1" + "\001" + time + "\n"
}
} else {
""
}
} catch {
case _:Throwable => ""
}
}
lines.map(parseData).print()
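One thing worth noting about the deserialization above: it parses the same protobuf message up to four times per record and grows an ArrayBuffer one byte at a time, both of which add allocation pressure in the second test case. Below is a hedged sketch of a parse-once variant; deserializePbDataOnce is a hypothetical name, it assumes the same imports and the same JSONObject and Lighttpd.lighttpd_log classes as the question, and the field extraction is elided.

// Hedged sketch only, not the original method: parse the protobuf once per
// record and build the byte array in a single pass.
def deserializePbDataOnce(value: String): String = {
  if (value == null || value.isEmpty) return ""
  try {
    val timeStrIndex = value.indexOf(",\"time_str\"")
    val truncated =
      if (timeStrIndex != -1) value.substring(0, timeStrIndex) + "}" else value
    val binBody = JSONObject.fromObject(truncated).getString("bin_body")
    val pbData = binBody.substring(1, binBody.length - 1)
      .split(",")
      .map(s => java.lang.Byte.parseByte(s))   // one pass, no per-element ArrayBuffer growth
      .drop(8)
    val log = Lighttpd.lighttpd_log.parseFrom(pbData)  // parsed exactly once
    if (!log.hasLogid()) return ""
    // ... extract cuid/os/channel/... from log.getRequest() as in the original ...
    log.getLogid().toString
  } catch {
    case _: Throwable => ""  // malformed record: drop it, as in the original
  }
}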
The error text:
2016-07-12T12:00:01.546+0800: 5096.643: [GC (Allocation Failure)
Desired survivor size 442499072 bytes, new threshold 1 (max 15)
[PSYoungGen: 0K->0K(2356736K)] 5059009K->5059009K(7949312K), 0.0103342 secs] [Times: user=0.21 sys=0.00, real=0.01 secs]
2016-07-12T12:00:01.556+0800: 5096.654: [Full GC (Allocation Failure) [PSYoungGen: 0K->0K(2356736K)] [ParOldGen: 5059009K->5057376K(5592576K)] 5059009K->5057376K(7949312K), [Metaspace: 44836K->44490K(1089536K)], 0.8769617 secs] [Times: user=17.88 sys=0.04, real=0.88 secs]
2016-07-12T12:00:02.434+0800: 5097.531: Total time for which application threads were stopped: 1.2951974 seconds, Stopping threads took: 0.0000662 seconds
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid24310.hprof ...
2016-07-12T12:00:30.960+0800: 5126.057: Total time for which application threads were stopped: 28.5260812 seconds, Stopping threads took: 0.0000995 seconds
Heap dump file created [5211252802 bytes in 28.526 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill %p"
# Executing /bin/sh -c "kill 24310"...
2016-07-12T12:00:31.589+0800: 5126.686: Total time for which application threads were stopped: 0.6289627 seconds, Stopping threads took: 0.0001258 seconds
2016-07-12T12:00:31.595+0800: 5126.692: Total time for which application threads were stopped: 0.0004822 seconds, Stopping threads took: 0.0001493 seconds
2016-07-12 12:00:31.597 [Thread-5] ERROR [Logging.scala:95] - Uncaught exception in thread Thread[Thread-5,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236) ~[na:1.8.0_51]
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118) ~[na:1.8.0_51]
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) ~[na:1.8.0_51]
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153) ~[na:1.8.0_51]
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_51]
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126) ~[na:1.8.0_51]
at com.esotericsoftware.kryo.io.Output.flush(Output.java:155) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at com.esotericsoftware.kryo.io.Output.require(Output.java:135) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at com.esotericsoftware.kryo.io.Output.writeString_slow(Output.java:420) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at com.esotericsoftware.kryo.io.Output.writeString(Output.java:326) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:153) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:194) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:153) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:1196) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1202) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:858) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:645) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.streaming.receiver.BlockManagerBasedBlockHandler.storeBlock(ReceivedBlockHandler.scala:77) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.pushAndReportBlock(ReceiverSupervisorImpl.scala:157) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.pushArrayBuffer(ReceiverSupervisorImpl.scala:128) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.streaming.receiver.ReceiverSupervisorImpl$$anon$3.onPushBlock(ReceiverSupervisorImpl.scala:109) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.streaming.receiver.BlockGenerator.pushBlock(BlockGenerator.scala:296) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.streaming.receiver.BlockGenerator.org$apache$spark$streaming$receiver$BlockGenerator$$keepPushingBlocks( BlockGenerator.scala:268) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.streaming.receiver.BlockGenerator$$anon$1.run(BlockGenerator.scala:109) ~[ spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
2016-07-12 12:00:31.600 [SIGTERM handler] ERROR [SignalLogger.scala:57] - RECEIVED SIGNAL 15: SIGTERM
2016-07-12T12:00:31.611+0800: 5126.708: Total time for which application threads were stopped: 0.0005602 seconds, Stopping threads took: 0.0001765 seconds
2016-07-12T12:00:31.617+0800: 5126.714: Total time for which application threads were stopped: 0.0004800 seconds, Stopping threads took: 0.0001412 seconds
2016-07-12 12:00:32.483 [Bigpipe Receiver-SendThread(cq01-bigpipe-proxy01.cq01.baidu.com:2181)] WARN [ClientCnxnSocket.java:139] - Connected to an old server; r-o mode will be unavailable
2016-07-12T12:00:32.507+0800: 5127.604: Total time for which application threads were stopped: 0.0004604 seconds, Stopping threads took: 0.0001198 seconds
2016-07-12T12:00:32.509+0800: 5127.606: Total time for which application threads were stopped: 0.0002919 seconds, Stopping threads took: 0.0001800 seconds
2016-07-12T12:00:32.509+0800: 5127.607: Total time for which application threads were stopped: 0.0002692 seconds, Stopping threads took: 0.0001612 seconds
2016-07-12 12:00:32.549 [Bigpipe Receiver-SendThread(tc-bigpipe-proxy03.tc.baidu.com:2181)] WARN [ClientCnxnSocket.java:139] - Connected to an old server; r-o mode will be unavailable
2016-07-12T12:00:34.220+0800: 5129.317: [GC (Allocation Failure)
Desired survivor size 424148992 bytes, new threshold 2 (max 15)
[PSYoungGen: 1931776K->188775K(2363904K)] 6989152K->5246152K(7956480K), 0.2569385 secs] [Times: user=0.00 sys=5.19, real=0.26 secs]
2016-07-12T12:00:34.477+0800: 5129.575: Total time for which application threads were stopped: 0.2575019 seconds, Stopping threads took: 0.0000384 seconds
2016-07-12T12:00:35.478+0800: 5130.575: Total time for which application threads were stopped: 0.0002786 seconds, Stopping threads took: 0.0000424 seconds
2016-07-12T12:00:37.600+0800: 5132.697: [GC (Allocation Failure)
Desired survivor size 482344960 bytes, new threshold 3 (max 15)
[PSYoungGen: 2120551K->387013K(2268160K)] 7177928K->5444389K(7860736K), 0.5153031 secs] [Times: user=0.00 sys=9.89, real=0.52 secs]
2016-07-12T12:00:38.116+0800: 5133.213: Total time for which application threads were stopped: 0.5157529 seconds, Stopping threads took: 0.0000427 seconds
2016-07-12T12:00:40.116+0800: 5135.213: Total time for which application threads were stopped: 0.0003171 seconds, Stopping threads took: 0.0001000 seconds
2016-07-12T12:00:40.419+0800: 5135.516: [GC (Allocation Failure)
Desired survivor size 599785472 bytes, new threshold 2 (max 15)
[PSYoungGen: 2240965K->471033K(2324992K)] 7298341K->5633517K(7917568K), 0.3621433 secs] [Times: user=0.12 sys=7.11, real=0.36 secs]
2016-07-12T12:00:40.781+0800: 5135.878: Total time for which application threads were stopped: 0.3626080 seconds, Stopping threads took: 0.0000429 seconds
2016-07-12T12:00:41.781+0800: 5136.879: Total time for which application threads were stopped: 0.0003301 seconds, Stopping threads took: 0.0000947 seconds
2016-07-12T12:00:43.108+0800: 5138.205: [GC (Allocation Failure)
Desired survivor size 620756992 bytes, new threshold 3 (max 15)
[PSYoungGen: 2324985K->378481K(2054656K)] 7487469K->5831048K(7647232K), 0.2593685 secs] [Times: user=0.66 sys=4.96, real=0.26 secs]
2016-07-12T12:00:43.368+0800: 5138.465: [Full GC (Ergonomics) [PSYoungGen: 378481K->0K(2054656K)] [ParOldGen: 5452566K->4713601K(5592576K)] 5831048K->4713601K(7647232K), [Metaspace: 44635K->44635K(1089536K)], 4.3137405 secs] [Times: user=9.78 sys=74.53, real=4.31 secs]
2016-07-12T12:00:47.682+0800: 5142.779: Total time for which application threads were stopped: 4.5736603 seconds, Stopping threads took: 0.0000449 seconds
2016-07-12T12:00:47.682+0800: 5142.779: Total time for which application threads were stopped: 0.0002430 seconds, Stopping threads took: 0.0000856 seconds
2016-07-12T12:00:49.954+0800: 5145.052: [GC (Allocation Failure)
Desired survivor size 597688320 bytes, new threshold 4 (max 15)
[PSYoungGen: 1583616K->161266K(2189824K)] 6297217K->4874867K(7782400K), 0.0388138 secs] [Times: user=0.00 sys=0.84, real=0.04 secs]
2016-07-12T12:00:49.993+0800: 5145.091: Total time for which application threads were stopped: 0.0392926 seconds, Stopping threads took: 0.0000449 seconds
2016-07-12T12:00:51.903+0800: 5147.000: [GC (Allocation Failure)
Desired survivor size 596115456 bytes, new threshold 5 (max 15)
[PSYoungGen: 1744882K->324587K(2213888K)] 6458483K->5038189K(7806464K), 0.0334029 secs] [Times: user=0.69 sys=0.03, real=0.04 secs]
2016-07-12T12:00:51.936+0800: 5147.034: Total time for which application threads were stopped: 0.0338707 seconds, Stopping threads took: 0.0000404 seconds
2016-07-12T12:00:53.942+0800: 5149.039: [GC (Allocation Failure)
Desired survivor size 654835712 bytes, new threshold 6 (max 15)
[PSYoungGen: 1954795K->490438K(2120704K)] 6668397K->5204039K(7713280K), 0.0441762 secs] [Times: user=0.95 sys=0.02, real=0.05 secs]
2016-07-12T12:00:53.986+0800: 5149.083: Total time for which application threads were stopped: 0.0446174 seconds, Stopping threads took: 0.0000456 seconds
2016-07-12T12:00:56.102+0800: 5151.199: [GC (Allocation Failure)
Desired survivor size 763887616 bytes, new threshold 5 (max 15)
[PSYoungGen: 2120646K->639467K(1943552K)] 6834247K->5370280K(7536128K), 0.1124828 secs] [Times: user=1.07 sys=1.30, real=0.11 secs]
2016-07-12T12:00:56.214+0800: 5151.312: Total time for which application threads were stopped: 0.1129348 seconds, Stopping threads took: 0.0000396 seconds
2016-07-12T12:00:57.784+0800: 5152.881: [GC (Allocation Failure)
Desired survivor size 895483904 bytes, new threshold 4 (max 15)
[PSYoungGen: 1943531K->745977K(2050048K)] 6674344K->5504073K(7642624K), 0.0971717 secs] [Times: user=1.20 sys=0.67, real=0.10 secs]
2016-07-12T12:00:57.881+0800: 5152.979: Total time for which application threads were stopped: 0.0977363 seconds, Stopping threads took: 0.0000941 seconds
2016-07-12T12:00:59.406+0800: 5154.504: [GC (Allocation Failure)
Desired survivor size 935329792 bytes, new threshold 5 (max 15)
[PSYoungGen: 2050041K->599188K(1715200K)] 6808137K->5647517K(7307776K), 0.3651465 secs] [Times: user=0.98 sys=5.88, real=0.37 secs]
2016-07-12T12:00:59.772+0800: 5154.869: Total time for which application threads were stopped: 0.3656089 seconds, Stopping threads took: 0.0000479 seconds
2016-07-12T12:01:00.968+0800: 5156.066: [GC (Allocation Failure)
Desired survivor size 954204160 bytes, new threshold 4 (max 15)
[PSYoungGen: 1568404K->697830K(1667072K)] 6616733K->5746159K(7259648K), 0.0978955 secs] [Times: user=1.91 sys=0.04, real=0.09 secs]
2016-07-12T12:01:01.066+0800: 5156.164: Total time for which application threads were stopped: 0.0983759 seconds, Stopping threads took: 0.0000482 seconds
2016-07-12T12:01:02.189+0800: 5157.287: [GC (Allocation Failure)
Desired survivor size 954204160 bytes, new threshold 3 (max 15)
[PSYoungGen: 1667046K->465454K(1864192K)] 6715375K->5855655K(7456768K), 0.1261993 secs] [Times: user=2.41 sys=0.29, real=0.12 secs]
2016-07-12T12:01:02.316+0800: 5157.413: [Full GC (Ergonomics) [PSYoungGen: 465454K->65236K(1864192K)] [ParOldGen: 5390200K->5592328K(5592576K)] 5855655K->5657564K(7456768K), [Metaspace: 44635K->44635K(1089536K)], 3.2729437 secs] [Times: user=12.34 sys=57.11, real=3.28 secs]
2016-07-12T12:01:05.589+0800: 5160.686: Total time for which application threads were stopped: 3.3998619 seconds, Stopping threads took: 0.0000521 seconds
2016-07-12T12:01:05.589+0800: 5160.686: Total time for which application threads were stopped: 0.0002330 seconds, Stopping threads took: 0.0000949 seconds
2016-07-12T12:01:05.688+0800: 5160.785: Total time for which application threads were stopped: 0.0002935 seconds, Stopping threads took: 0.0000514 seconds
Heap
PSYoungGen total 1864192K, used 146620K [0x0000000715580000, 0x00000007c0000000, 0x00000007c0000000)
eden space 932352K, 8% used [0x0000000715580000,0x000000071a4fa138,0x000000074e400000)
from space 931840K, 7% used [0x0000000787200000,0x000000078b1b5290,0x00000007c0000000)
to space 931840K, 0% used [0x000000074e400000,0x000000074e400000,0x0000000787200000)
ParOldGen total 5592576K, used 5592328K [0x00000005c0000000, 0x0000000715580000, 0x0000000715580000)
object space 5592576K, 99% used [0x00000005c0000000,0x00000007155420a8,0x0000000715580000)
Metaspace used 44654K, capacity 44990K, committed 45864K, reserved 1089536K
class space used 6212K, capacity 6324K, committed 6440K, reserved 1048576K
New error: I think it is this block-upload error that triggers the OOM problem. How can I fix this upload error?
2016-07-15 11:41:47.307 [shuffle-client-0] ERROR [TransportChannelHandler.java:128] - Connection to nmg01-taihang-d10207.nmg01.baidu.com/10.76.48.22:30456 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
2016-07-15 11:41:47.309 [shuffle-client-0] ERROR [TransportResponseHandler.java:122] - Still have 1 requests outstanding when connection from nmg01-taihang-d10207.nmg01.baidu.com/10.76.48.22:30456 is closed
2016-07-15 11:41:47.314 [shuffle-client-0] ERROR [Logging.scala:95] - Error while uploading block input-0-1468553896200
java.io.IOException: Connection from nmg01-taihang-d10207.nmg01.baidu.com/10.76.48.22:30456 closed
at org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:124) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:94) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:739) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:659) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
2016-07-15T11:41:47.316+0800: 2176.487: Total time for which application threads were stopped: 0.0002632 seconds, Stopping threads took: 0.0000521 seconds
2016-07-15 11:41:47.316 [Thread-5] WARN [Logging.scala:91] - Failed to replicate input-0-1468553896200 to BlockManagerId(2, nmg01-taihang-d10207.nmg01.baidu.com, 30456), failure #0
java.io.IOException: Connection from nmg01-taihang-d10207.nmg01.baidu.com/10.76.48.22:30456 closed
at org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:124) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:94) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:739) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:659) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) ~[spark-assembly-1.6.1.0-baidu-SNAPSHOT-hadoop2.5.1.3-baidu-SNAPSHOT.jar:1.6.1.0-baidu-SNAPSHOT]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_51]
2016-07-15T11:41:48.316+0800: 2177.487: Total time for which application threads were stopped: 0.0003391 seconds, Stopping threads took: 0.0000979 seconds
2016-07-15T11:41:51.312+0800: 2180.483: [GC (Allocation Failure) --[PSYoungGen: 2894863K->2894863K(3007488K)] 8299519K->9550273K(9998336K), 0.7462118 secs] [Times: user=9.78 sys=0.02, real=0.74 secs]
2016-07-15T11:41:52.059+0800: 2181.230: [Full GC (Ergonomics) [PSYoungGen: 2894863K->0K(3007488K)] [ParOldGen: 6655410K->6895736K(6990848K)] 9550273K->6895736K(9998336K), [Metaspace: 44409K->44409K(1087488K)], 0.4061892 secs] [Times: user=7.50 sys=0.01, real=0.41 secs]
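Regarding the upload error itself: the first log line explicitly suggests raising spark.network.timeout, and the warning shows the failure happens while replicating the received block to a second executor. Below is a hedged sketch of the two settings involved; the timeout value is an assumption, not a recommendation.

import org.apache.spark.SparkConf

val conf = new SparkConf()
// Give the remote block upload more time before the connection is declared dead
// (value is an assumption; tune for the actual network).
conf.set("spark.network.timeout", "300s")
// If replicating received blocks is not required, choosing a non-replicated
// storage level in the custom receiver (e.g. StorageLevel.MEMORY_AND_DISK_SER
// instead of MEMORY_AND_DISK_SER_2) avoids this block upload entirely.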
Your code appears to have an error in its structure. While re-indenting it to reflect the structure as posted, I found that your last else if statement:
} else if (os.matches("tizen(.*)")) {
os = "tizen"
opens a block, but does not close the block where it "should". Instead, the block is actually terminated with:
} catch {
The full code, as it appears to have been intended (and re-indented), is:
val conf = new SparkConf().setAppName(args(8))
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.streaming.stopGracefullyOnShutdown", "true")
conf.set("spark.streaming.backpressure.enabled","true")
conf.set("spark.speculation","true")
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(args(7).toInt))

val bigPipeStreams = (1 to args(3).toInt).map{
    i => ssc.networkStream(
        new MyBigpipeLogagentReceiver(args(0),args(1),args(2),i,args(4),args(5),args(6).toInt)
    )
}
val lines = ssc.union(bigPipeStreams)

def deserializePbData(value: String) : String = {
    if (null == value || value.isEmpty) {
        return ""
    }
    var cuid = ""
    var os = ""
    var channel = ""
    var sv = ""
    var resid = ""
    var appid = ""
    var prod = ""
    try { //if exception, useless data, just drop it
        val timeStrIndex = value.indexOf(",\"time_str\"")
        var strAfterTruncation = ""
        if (-1 != timeStrIndex) {
            strAfterTruncation = value.substring(0,timeStrIndex) + "}"
        } else {
            strAfterTruncation = value
        }
        val jsonData = JSONObject.fromObject(strAfterTruncation)
        //val jsonData = value.getAsJsonArray()
        val binBody = jsonData.getString("bin_body")
        val pbData = binBody.substring(1,binBody.length()-1).split(",").foldLeft(ArrayBuffer.empty[Byte])((b,a) => b + java.lang.Byte.parseByte(a)).drop(8).toArray
        Lighttpd.lighttpd_log.parseFrom(pbData).getRequest().getUrl().getUrlFields().getAutokvList().asScala.foreach(a =>
            a.getKey() match {
                case "cuid" => cuid += a.getValue()
                case "os" => os += a.getValue()
                case "channel" => channel += a.getValue()
                case "sv" => sv += a.getValue()
                case "resid" => resid += a.getValue()
                case "appid" => appid += a.getValue()
                case "prod" => prod += a.getValue()
                case _ => null
            }
        )
        val decodeCuid = URLDecoder.decode(cuid, "UTF-8")
        os = os.toLowerCase()
        if (os.matches("android(.*)")) {
            os = "android"
        } else if (os.matches("iphone(.*)")) {
            os = "iphone"
        } else if (os.matches("ipad(.*)")) {
            os = "ipad"
        } else if (os.matches("s60(.*)")) {
            os = "symbian"
        } else if (os.matches("wp7(.*)")) {
            os = "wp7"
        } else if (os.matches("wp(.*)")) {
            os = "wp"
        } else if (os.matches("tizen(.*)")) {
            os = "tizen"
        }
        val ifHasLogid = Lighttpd.lighttpd_log.parseFrom(pbData).hasLogid()
        val time = Lighttpd.lighttpd_log.parseFrom(pbData).getTime()
        if (ifHasLogid) {
            val logid = Lighttpd.lighttpd_log.parseFrom(pbData).getLogid()
            if (logid.isEmpty || logid.toString().equals("-") || !resid.toString().equals("01") || channel.isEmpty |!appid.isEmpty || !prod.isEmpty) {
                ""
            } else {
                decodeCuid + "\001" + os + "\001" + channel + "\001" + sv + "\001" + "1" + "\001" + "1" + "\001" + time + "\n"
            }
        } else {
            ""
        }
    } catch {
        case _:Throwable => ""
    }
}

lines.map(parseData).print()
I have not checked your code for functionality. This is just a syntax/structure issue that stood out when briefly looking at the code you posted.
Related
ffmpeg takes too long to start
I have this command in python script, in a loop: ffmpeg -i somefile.mp4 -ss 00:03:12 -t 00:00:35 piece.mp4 -loglevel error -stats It cuts out pieces of input file (-i). Input filename, as well as start time (-ss) and length of the piece I cut out (-t) varies, so it reads number of mp4 files and cuts out number of pieces from each one. During execution of the script it might be called around 100 times. My problem is that each time before it starts, there is a delay of 6-15 seconds and it adds up to significant time. How can I get it to start immediately? Initially I thought it was process priority problem, but I noticed that even during the "pause", all processors work at 100%, so apparently some work is being done. The script (process_videos.py): import subprocess import sys import math import time class TF: """TimeFormatter class (TF). This class' reason for being is to convert time in short form, e.g. 1:33, 0:32, or 23 into long form accepted by mp4cut function in bash, e.g. 00:01:22, 00:00:32, etc""" def toLong(self, shrt): """Converts time to its long form""" sx = '00:00:00' ladd = 8 - len(shrt) n = sx[:ladd] + shrt return n def toShort(self, lng): """Converts time to short form""" if lng[0] == '0' or lng[0] == ':': return self.toShort(lng[1:]) else: return lng def toSeconds(self, any_time): """Converts time to seconds""" if len(any_time) < 3: return int(any_time) tt = any_time.split(':') if len(any_time) < 6: return int(tt[0])*60 + int(tt[1]) return int(tt[0])*3600 + int(tt[1])*60 + int(tt[2]) def toTime(self, secsInt): """""" tStr = '' hrs, mins, secs = 0, 0, 0 if secsInt >= 3600: hrs = math.floor(secsInt / 3600) secsInt = secsInt % 3600 if secsInt >= 60: mins = math.floor(secsInt / 60) secsInt = secsInt % 60 secs = secsInt return str(hrs).zfill(2) + ':' + str(mins).zfill(2) + ':' + str(secs).zfill(2) def minus(self, t_start, t_end): """""" t_e = self.toSeconds(t_end) t_s = self.toSeconds(t_start) t_r = t_e - t_s hrs, mins, secs = 0, 0, 0 if t_r >= 3600: hrs = math.floor(t_r / 3600) t_r = t_r - (hrs * 3600) if t_r >= 60: mins = math.floor(t_r / 60) t_r = t_r - (mins * 60) secs = t_r hrsf = str(hrs).zfill(2) minsf = str(mins).zfill(2) secsf = str(secs).zfill(2) t_fnl = hrsf + ':' + minsf + ':' + secsf return t_fnl def go_main(): tf = TF() vid_n = 0 arglen = len(sys.argv) if arglen == 2: with open(sys.argv[1], 'r') as f_in: lines = f_in.readlines() start = None end = None cnt = 0 for line in lines: if line[:5] == 'BEGIN': start = cnt if line[:3] == 'END': end = cnt cnt += 1 if start == None or end == None: print('Invalid file format. 
start = {}, end = {}'.format(start,end)) return else: lines_r = lines[start+1:end] del lines print('videos to process: {}'.format(len(lines_r))) f_out_prefix = "" for vid in lines_r: vid_n += 1 print('\nProcessing video {}/{}'.format(vid_n, len(lines_r))) f_out_prefix = 'v' + str(vid_n) + '-' dat = vid.split('!')[1:3] title = dat[0] dat_t = dat[1].split(',') v_pieces = len(dat_t) piece_n = 0 video_pieces = [] cmd1 = "echo -n \"\" > tmpfile" subprocess.run(cmd1, shell=True) print(' new tmpfile created') for v_times in dat_t: piece_n += 1 f_out = f_out_prefix + str(piece_n) + '.mp4' video_pieces.append(f_out) print(' piece filename {} added to video_pieces list'.format(f_out)) v_times_spl = v_times.split('-') v_times_start = v_times_spl[0] v_times_end = v_times_spl[1] t_st = tf.toLong(v_times_start) t_dur = tf.toTime(tf.toSeconds(v_times_end) - tf.toSeconds(v_times_start)) cmd3 = ["ffmpeg", "-i", title, "-ss", t_st, "-t", t_dur, f_out, "-loglevel", "error", "-stats"] print(' cutting out piece {}/{} - {}'.format(piece_n, len(dat_t), t_dur)) subprocess.run(cmd3) for video_piece_name in video_pieces: cmd4 = "echo \"file " + video_piece_name + "\" >> tmpfile" subprocess.run(cmd4, shell=True) print(' filename {} added to tmpfile'.format(video_piece_name)) vname = f_out_prefix[:-1] + ".mp4" print(' name of joined file: {}'.format(vname)) cmd5 = "ffmpeg -f concat -safe 0 -i tmpfile -c copy joined.mp4 -loglevel error -stats" to_be_joined = " ".join(video_pieces) print(' joining...') join_cmd = subprocess.Popen(cmd5, shell=True) join_cmd.wait() print(' joined!') cmd6 = "mv joined.mp4 " + vname rename_cmd = subprocess.Popen(cmd6, shell=True) rename_cmd.wait() print(' File joined.mp4 renamed to {}'.format(vname)) cmd7 = "rm " + to_be_joined rm_cmd = subprocess.Popen(cmd7, shell=True) rm_cmd.wait() print('rm command completed - pieces removed') cmd8 = "rm tmpfile" subprocess.run(cmd8, shell=True) print('tmpfile removed') print('All done') else: print('Incorrect number of arguments') ############################ if __name__ == '__main__': go_main() process_videos.py is called from bash terminal like this: $ python process_videos.py video_data video_data file has the following format: BEGIN !first_video.mp4!3-23,55-1:34,2:01-3:15,3:34-3:44! !second_video.mp4!2-7,12-44,1:03-1:33! 
END My system details: System: Host: snowflake Kernel: 5.4.0-52-generic x86_64 bits: 64 Desktop: Gnome 3.28.4 Distro: Ubuntu 18.04.5 LTS Machine: Device: desktop System: Gigabyte product: N/A serial: N/A Mobo: Gigabyte model: Z77-D3H v: x.x serial: N/A BIOS: American Megatrends v: F14 date: 05/31/2012 CPU: Quad core Intel Core i5-3570 (-MCP-) cache: 6144 KB clock speeds: max: 3800 MHz 1: 1601 MHz 2: 1601 MHz 3: 1601 MHz 4: 1602 MHz Drives: HDD Total Size: 1060.2GB (55.2% used) ID-1: /dev/sda model: ST31000524AS size: 1000.2GB ID-2: /dev/sdb model: Corsair_Force_GT size: 60.0GB Partition: ID-1: / size: 366G used: 282G (82%) fs: ext4 dev: /dev/sda1 ID-2: swap-1 size: 0.70GB used: 0.00GB (0%) fs: swap dev: /dev/sda5 Info: Processes: 313 Uptime: 16:37 Memory: 3421.4/15906.9MB Client: Shell (bash) inxi: 2.3.56 UPDATE: Following Charles' advice, I used performance sampling: # perf record -a -g sleep 180 ...and here's the report: Samples: 74K of event 'cycles', Event count (approx.): 1043554519767 Children Self Command Shared Object - 50.56% 45.86% ffmpeg libavcodec.so.57.107.100 - 3.10% 0x4489480000002825 0.64% 0x7ffaf24b92f0 - 2.12% 0x5f7369007265646f av_default_item_name 1.39% 0 - 44.48% 40.59% ffmpeg libx264.so.152 5.78% x264_add8x8_idct_avx2.skip_prologue 3.13% x264_add8x8_idct_avx2.skip_prologue 2.91% x264_add8x8_idct_avx2.skip_prologue 2.31% x264_add8x8_idct_avx.skip_prologue 2.03% 0 1.78% 0x1 1.26% x264_add8x8_idct_avx2.skip_prologue 1.09% x264_add8x8_idct_avx.skip_prologue 1.06% x264_me_search_ref 0.97% x264_add8x8_idct_avx.skip_prologue 0.60% x264_me_search_ref - 38.01% 0.00% ffmpeg [unknown] 4.10% 0 - 3.49% 0x4489480000002825 0.70% 0x7ffaf24b92f0 0.56% 0x7f273ae822f0 0.50% 0x7f0c4768b2f0 - 2.29% 0x5f7369007265646f av_default_item_name 1.99% 0x1 10.13% 10.12% ffmpeg [kernel.kallsyms] - 3.14% 0.73% ffmpeg libavutil.so.55.78.100 2.34% av_default_item_name - 1.73% 0.21% ffmpeg libpthread-2.27.so - 0.70% pthread_cond_wait##GLIBC_2.3.2 - 0.62% entry_SYSCALL_64_after_hwframe - 0.62% do_syscall_64 - 0.57% __x64_sys_futex 0.52% do_futex 0.93% 0.89% ffmpeg libc-2.27.so - 0.64% 0.64% swapper [kernel.kallsyms] 0.63% secondary_startup_64 0.21% 0.18% ffmpeg libavfilter.so.6.107.100 0.20% 0.11% ffmpeg libavformat.so.57.83.100 0.12% 0.11% ffmpeg ffmpeg 0.11% 0.00% gnome-terminal- [unknown] 0.09% 0.07% ffmpeg libm-2.27.so 0.08% 0.07% ffmpeg ld-2.27.so 0.04% 0.04% gnome-terminal- libglib-2.0.so.0.5600.4
When you put -ss after -i, ffmpeg does not use keyframes to seek; it decodes the video from the beginning of the file up to the requested position. That's where the 6-15 second delay with 100% CPU usage comes from. You can put -ss before the -i, e.g.: ffmpeg -ss 00:03:12 -i somefile.mp4 -t 00:00:35 piece.mp4 -loglevel error -stats. This makes ffmpeg seek by keyframe and jump directly to the start time.
PyMQI bad consuming performance with persistence
I'm testing the performance of IBM MQ (running the latest version in a local docker container) I use a persistent queue. On the producer side, I can get higher throughput by running multiple producing applications in parallel. However, on the consumer side, I cannot increase the throughput by parallelizing consumer processes. On the contrary, the throughput is even worse for multiple consumers than for one single consumer. What could be the reason for the poor consuming performance? It shouldn't be due to the hardware limit as I'm comparing the consumption with the production and I did only message consumption without any other processing. Does the GET perform the commit for each message? I don't find any explicit commit method in PyMQI though. put_demo.py #!/usr/bin/env python3 import pymqi import time queue_manager = 'QM1' channel = 'DEV.APP.SVRCONN' host = '127.0.0.1' port = '1414' queue_name = 'DEV.QUEUE.1' message = b'Hello from Python!' conn_info = '%s(%s)' % (host, port) nb_messages = 1000 t0 = time.time() qmgr = pymqi.connect(queue_manager, channel, conn_info) queue = pymqi.Queue(qmgr, queue_name) for i in range(nb_messages): try: queue.put(message) except pymqi.MQMIError as e: print(f"Fatal error: {str(e)}") queue.close() qmgr.disconnect() t1 = time.time() print(f"tps: {nb_messages/(t1-t0):.0f} nb_message_produced: {nb_messages}") get_demo.py #!/usr/bin/env python3 import pymqi import time import os queue_manager = 'QM1' channel = 'DEV.APP.SVRCONN' host = '127.0.0.1' port = '1414' queue_name = 'DEV.QUEUE.1' conn_info = '%s(%s)' % (host, port) nb_messages = 1000 nb_messages_consumed = 0 t0 = time.time() qmgr = pymqi.connect(queue_manager, channel, conn_info) queue = pymqi.Queue(qmgr, queue_name) gmo = pymqi.GMO(Options = pymqi.CMQC.MQGMO_WAIT | pymqi.CMQC.MQGMO_FAIL_IF_QUIESCING) gmo.WaitInterval = 1000 while nb_messages_consumed < nb_messages: try: msg = queue.get(None, None, gmo) nb_messages_consumed += 1 except pymqi.MQMIError as e: if e.reason == 2033: # No messages, that's OK, we can ignore it. 
pass queue.close() qmgr.disconnect() t1 = time.time() print(f"tps: {nb_messages_consumed/(t1-t0):.0f} nb_messages_consumed: {nb_messages_consumed}") run results > for i in {1..10}; do ./put_demo.py & done tps: 385 nb_message_produced: 1000 tps: 385 nb_message_produced: 1000 tps: 383 nb_message_produced: 1000 tps: 379 nb_message_produced: 1000 tps: 378 nb_message_produced: 1000 tps: 377 nb_message_produced: 1000 tps: 377 nb_message_produced: 1000 tps: 378 nb_message_produced: 1000 tps: 374 nb_message_produced: 1000 tps: 374 nb_message_produced: 1000 > for i in {1..10}; do ./get_demo.py & done tps: 341 nb_messages_consumed: 1000 tps: 339 nb_messages_consumed: 1000 tps: 95 nb_messages_consumed: 1000 tps: 82 nb_messages_consumed: 1000 tps: 82 nb_messages_consumed: 1000 tps: 82 nb_messages_consumed: 1000 tps: 82 nb_messages_consumed: 1000 tps: 82 nb_messages_consumed: 1000 tps: 82 nb_messages_consumed: 1000 tps: 82 nb_messages_consumed: 1000 get_demo.py updated version using syncpoint and batch commit #!/usr/bin/env python3 import pymqi import time import os queue_manager = 'QM1' channel = 'DEV.APP.SVRCONN' host = '127.0.0.1' port = '1414' queue_name = 'DEV.QUEUE.1' conn_info = '%s(%s)' % (host, port) nb_messages = 1000 commit_batch = 10 nb_messages_consumed = 0 t0 = time.time() qmgr = pymqi.connect(queue_manager, channel, conn_info) queue = pymqi.Queue(qmgr, queue_name) gmo = pymqi.GMO(Options = pymqi.CMQC.MQGMO_WAIT | pymqi.CMQC.MQGMO_FAIL_IF_QUIESCING | pymqi.CMQC.MQGMO_SYNCPOINT) gmo.WaitInterval = 1000 while nb_messages_consumed < nb_messages: try: msg = queue.get(None, None, gmo) nb_messages_consumed += 1 if nb_messages_consumed % commit_batch == 0: qmgr.commit() except pymqi.MQMIError as e: if e.reason == 2033: # No messages, that's OK, we can ignore it. pass queue.close() qmgr.disconnect() t1 = time.time() print(f"tps: {nb_messages_consumed/(t1-t0):.0f} nb_messages_consumed: {nb_messages_consumed}") Thanks.
How can I measure mutex contention in Ruby?
I recently found myself trying to diagnose why a particular Ruby program was running slow. In the end it turned out be caused by a scaling issue causing a lot of contention on a particular mutex. I was wondering if there are any tools that I could have used to make this issue easier to diagnose? I know I could have used ruby-prof to get detailed output of what all 100+ threads of this program were spending their time on, but I'm curious whether there is any tool that is specifically focused on just measuring mutex contention in Ruby?
So If figured out how to do this with DTrace. Given a Ruby program like this: # mutex.rb mutex = Mutex.new threads = [] threads << Thread.new do loop do mutex.synchronize do sleep 2 end end end threads << Thread.new do loop do mutex.synchronize do sleep 4 end end end threads.each(&:join) We can use a DTrace script like this: /* mutex.d */ ruby$target:::cmethod-entry /copyinstr(arg0) == "Mutex" && copyinstr(arg1) == "synchronize"/ { self->file = copyinstr(arg2); self->line = arg3; } pid$target:ruby:rb_mutex_lock:entry /self->file != NULL && self->line != NULL/ { self->mutex_wait_start = timestamp; } pid$target:ruby:rb_mutex_lock:return /self->file != NULL && self->line != NULL/ { mutex_wait_ms = (timestamp - self->mutex_wait_start) / 1000; printf("Thread %d acquires mutex %d after %d ms - %s:%d\n", tid, arg1, mutex_wait_ms, self->file, self->line); self->file = NULL; self->line = NULL; } When we run this script against the Ruby program, we then get information like this: $ sudo dtrace -q -s mutex.d -c 'ruby mutex.rb' Thread 286592 acquires mutex 4313316240 after 2 ms - mutex.rb:14 Thread 286591 acquires mutex 4313316240 after 4004183 ms - mutex.rb:6 Thread 286592 acquires mutex 4313316240 after 2004170 ms - mutex.rb:14 Thread 286592 acquires mutex 4313316240 after 6 ms - mutex.rb:14 Thread 286592 acquires mutex 4313316240 after 4 ms - mutex.rb:14 Thread 286592 acquires mutex 4313316240 after 4 ms - mutex.rb:14 Thread 286591 acquires mutex 4313316240 after 16012158 ms - mutex.rb:6 Thread 286592 acquires mutex 4313316240 after 2002593 ms - mutex.rb:14 Thread 286591 acquires mutex 4313316240 after 4001983 ms - mutex.rb:6 Thread 286592 acquires mutex 4313316240 after 2004418 ms - mutex.rb:14 Thread 286591 acquires mutex 4313316240 after 4000407 ms - mutex.rb:6 Thread 286592 acquires mutex 4313316240 after 2004163 ms - mutex.rb:14 Thread 286591 acquires mutex 4313316240 after 4003191 ms - mutex.rb:6 Thread 286591 acquires mutex 4313316240 after 2 ms - mutex.rb:6 Thread 286592 acquires mutex 4313316240 after 4005587 ms - mutex.rb:14 ... We can collect this output and use it to derive information about which mutexes are causing the most contention.
singleshot QTimer on OS X rapid fires multiple times and too early
I have implemented an idle timer on a resource (class) instances of which can be open in several applications at once. Hence, the idleTimer is not only a simple QTimer, but the slot (trigger) needs to verify if no other applications have accessed the same resources during the last N minutes. If that is the case, the timer is reset (without updating the lastAccessedTime value), otherwise the resource is closed. The timer is thus a singleshot one, and lastAccessTime is kept in a QSharedMemory object. Here's some trace output: ### "Google Contacts () of type Google Contacts" Idle timeout 6 min. for KWallet::Wallet(0x105d1f900) "kdewallet" handle 0 ; elapsed minutes= 5.83601 timer QTimer(0x11d273d60) triggered 1 times ### slotIdleTimedOut ->handleIdleTiming: setting QTimer(0x11d273d60) for wallet "kdewallet" handle 0 timeout to 6 ### "Google Contacts () of type Google Contacts" Idle timeout 6 min. for KWallet::Wallet(0x105d1f900) "kdewallet" handle 0 ; elapsed minutes= 5.83634 timer QTimer(0x11d273d60) triggered 2 times ### "Google Contacts () of type Google Contacts" Idle timeout 6 min. for KWallet::Wallet(0x105d1f900) "kdewallet" handle 0 ; elapsed minutes= 5.83634 timer QTimer(0x11d273d60) triggered 3 times ### "Google Contacts ()of type Google Contacts" Idle timeout 6 min. for KWallet::Wallet(0x105d1f900) "kdewallet" handle 0 ; elapsed minutes= 5.83634 timer QTimer(0x11d273d60) triggered 4 times ### "Google Contacts () of type Google Contacts" Idle timeout 6 min. for KWallet::Wallet(0x105d1f900) "kdewallet" handle 0 ; elapsed minutes= 5.83634 timer QTimer(0x11d273d60) triggered 5 times ### "Google Contacts () of type Google Contacts" Idle timeout 6 min. for KWallet::Wallet(0x105d1f900) "kdewallet" handle 0 ; elapsed minutes= 5.83635 timer QTimer(0x11d273d60) triggered 6 times ### "Google Contacts () of type Google Contacts" Idle timeout 6 min. for KWallet::Wallet(0x105d1f900) "kdewallet" handle 0 ; elapsed minutes= 5.83635 timer QTimer(0x11d273d60) triggered 7 times ### "Google Contacts () of type Google Contacts" Idle timeout 6 min. for KWallet::Wallet(0x105d1f900) "kdewallet" handle 0 ; elapsed minutes= 5.83635 timer QTimer(0x11d273d60) triggered 8 times ### "KMail" Idle timeout 6 min. for KWallet::Wallet(0x1083f1ac0) "kdewallet" handle 0 ; elapsed minutes= 6 timer QTimer(0x120a1b5f0) triggered 1 times ### "KMail" Idle timeout 6 min. for KWallet::Wallet(0x1083f1ac0) "kdewallet" handle -1 ; elapsed minutes= 6.00008 timer QObject(0x0) triggered 2 times ### "KMail" Idle timeout 6 min. for KWallet::Wallet(0x1083f1ac0) "kdewallet" handle -1 ; elapsed minutes= 6.00009 timer QObject(0x0) triggered 3 times ### "KMail" Idle timeout 6 min. for KWallet::Wallet(0x1083f1ac0) "kdewallet" handle -1 ; elapsed minutes= 6.00012 timer QObject(0x0) triggered 4 times ### "KMail" Idle timeout 6 min. for KWallet::Wallet(0x1083f1ac0) "kdewallet" handle -1 ; elapsed minutes= 6.00012 timer QObject(0x0) triggered 5 times ### "KMail" Idle timeout 6 min. for KWallet::Wallet(0x1083f1ac0) "kdewallet" handle -1 ; elapsed minutes= 6.00012 timer QObject(0x0) triggered 6 times ### "KMail" Idle timeout 6 min. for KWallet::Wallet(0x1083f1ac0) "kdewallet" handle -1 ; elapsed minutes= 6.00012 timer QObject(0x0) triggered 7 times ### "KMail" Idle timeout 6 min. for KWallet::Wallet(0x1083f1ac0) "kdewallet" handle -1 ; elapsed minutes= 6.00012 timer QObject(0x0) triggered 8 times The principle works, but I notice 2 things: the timer fires a bit early. Of course that causes the timer to be reset. 
it fires several times in fast succession. The fact that an early fire should reset it doesn't have the slightest effect. Below is the relevant part of my code, including the function that resets the timer at each resource access, and the timer's trigger slot. Any idea what I'm doing wrong? I stop the timer before (re)setting it to singleshot mode and starting it (anew). The object and application identifiers show that it is indeed the same timer that triggers multiple times, and that it can get triggered even after I deleted the timer object. Could it be that the trigger slot is not application (or even instance) specific, somehow leading to 1 instance receiving the idleTimer trigger signals from all other instances across the various applications that set an instance of this timer? idleTimer gets set to NULL only in the class destructor and/or when timeOut is <=0, so I'm stymied that my trigger slot can get called with a NULL timer object! From the timer install function (handleIdleTiming, a member of KWallet::Wallet as is the idleTimer itself): // This function is to be called at every operation that is supposed to launch or reset // the idle timing. #p timeOut is a time in minutes. void handleIdleTiming(const char *caller="", bool touchAccessTime=true) { // ... if( timeOut >= 0 ){ if( !idleTimer ){ idleTimer = new QTimer(0); } else{ idleTimer->stop(); } // when the idle timer fires, the wallet is supposed to be closed. There is thus // no reason to use a repeating timer. idleTimer->setSingleShot(true); connect( idleTimer, SIGNAL(timeout()), q, SLOT(slotIdleTimedOut()) ); if( touchAccessTime ){ if( lastAccessTime.lock() ){ *((double*)lastAccessTime.data()) = HRTime_Time(); lastAccessTime.unlock(); } else{ qDebug() << "Cannot lock lastAccessTime for wallet" << name << "error" << lastAccessTime.errorString(); } } idleTimer->start( timeOut * 60 * 1000 ); The timer trigger slot: void Wallet::slotIdleTimedOut() { double lastAccessTime = 0; // check the last time anyone accessed this wallet: if( d->lastAccessTime.lock() ){ lastAccessTime = *((double*)d->lastAccessTime.data()); d->lastAccessTime.unlock(); } else{ qDebug() << "Cannot lock lastAccessTime for wallet" << d->name << "error" << d->lastAccessTime.errorString(); } // the time elapsed since that last access, in minutes: double elapsed = (HRTime_Time() - lastAccessTime) / 60; d->idleTimerTriggered += 1; qDebug() << "###" << appid() << "Idle timeout" << d->timeOut << "min. for" << this << d->name << "handle" << d->handle << "; elapsed minutes=" << elapsed << "timer" << d->idleTimer << "triggered" << d->idleTimerTriggered << "times"; if( elapsed >= d->timeOut ){ // we have a true timeout, i.e. we didn't access the wallet in timeOut minutes, and no one else did either. slotWalletClosed(d->handle); } else{ // false alarm, reset the timer, but there's no need to count this as an access! d->handleIdleTiming(__FUNCTION__, false); } }
It must indeed have been because I issued a connect statement each time I reset the timer, instead of only once after creating it: every reset added another connection to the same slot, so a single timeout was delivered multiple times.
Solaris libumem why not show memory leak for first dynamic allocation
Say:

void main()
{
    void *buff;
    buff = malloc(128);
    buff = malloc(60);
    buff = malloc(30);
    buff = malloc(16);
    free(buff);
    sleep(180);
}

libumem on Solaris 10 shows only the 60-byte and 30-byte allocations as leaks; why does it not show the 128 bytes as leaked as well?
Three memory leaks are detected by mdb and libumem with your code here:

cat > leak.c <<%
int main()
{
    void *buff;
    buff = malloc(128);
    buff = malloc(60);
    buff = malloc(30);
    buff = malloc(16);
    free(buff);
    sleep(180);
}
%
gcc -g leak.c -o leak
pkill leak
UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1 leak &
sleep 5
rm -f core.leak.*
gcore -o core.leak $(pgrep leak)
mdb leak core.leak.* <<%
::findleaks -d
%

gcore: core.leak.1815 dumped
CACHE     LEAKED   BUFCTL CALLER
0807f010       1 0808b4b8 main+0x29
0807d810       1 0808d088 main+0x39
0807d010       1 08092cd0 main+0x49
------------------------------------------------------------------------
   Total       3 buffers, 280 bytes

umem_alloc_160 leak: 1 buffer, 160 bytes
            ADDR          BUFADDR        TIMESTAMP THREAD  CACHE   LASTLOG CONTENTS
         808b4b8          8088f28      1af921ab662      1 807f010  806c000        0
                 libumem.so.1`umem_cache_alloc_debug+0x144
                 libumem.so.1`umem_cache_alloc+0x19a
                 libumem.so.1`umem_alloc+0xcd
                 libumem.so.1`malloc+0x2a
                 main+0x29
                 _start+0x83

umem_alloc_80 leak: 1 buffer, 80 bytes
            ADDR          BUFADDR        TIMESTAMP THREAD  CACHE   LASTLOG CONTENTS
         808d088          808cf68      1af921c11eb      1 807d810  806c064        0
                 libumem.so.1`umem_cache_alloc_debug+0x144
                 libumem.so.1`umem_cache_alloc+0x19a
                 libumem.so.1`umem_alloc+0xcd
                 libumem.so.1`malloc+0x2a
                 main+0x39
                 _start+0x83

umem_alloc_40 leak: 1 buffer, 40 bytes
            ADDR          BUFADDR        TIMESTAMP THREAD  CACHE   LASTLOG CONTENTS
         8092cd0          808efc8      1af921f2bd2      1 807d010  806c0c8        0
                 libumem.so.1`umem_cache_alloc_debug+0x144
                 libumem.so.1`umem_cache_alloc+0x19a
                 libumem.so.1`umem_alloc+0xcd
                 libumem.so.1`malloc+0x2a
                 main+0x49
                 _start+0x83