I am trying to populate a Gridgain cache which is running on a cluster on other machine. I'm getting ClientDisconnectedException while calling method on a Gridgain Cache and getting this after calling put method on the Cache.
Here is my cache Configuration:
// DPH cache
CacheConfiguration<K,V> DPHCacheCfg = new CacheConfiguration<>(DPH_CACHE);
DPHCacheCfg.setCacheMode(CacheMode.PARTITIONED); // Default.
DPHCacheCfg.setIndexedTypes(String.class, DPH.class);
DPHCacheCfg.setOffHeapMaxMemory(10 * 1024L * 1024L * 1024L);
DPHCacheCfg.setMemoryMode(CacheMemoryMode.ONHEAP_TIERED);
FifoEvictionPolicy evctPolicy = new FifoEvictionPolicy();
DPHCacheCfg.setEvictionPolicy(evctPolicy);
Her is how I put data in The Cache:
DPHCache.put(K, V); where V is some object. After certain number of puts, I am having the below exception.
avax.cache.CacheException: class org.apache.ignite.IgniteClientDisconnectedException: Operation has been cancelled (client node disconnected).
at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1615)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.cacheException(IgniteCacheProxy.java:1955)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.put(IgniteCacheProxy.java:1155)
at com.elsevier.elssie.datafabric.LoadQuetzal.populateDPH(LoadQuetzal.java:246)
at com.elsevier.elssie.datafabric.LoadQuetzal.main(LoadQuetzal.java:163)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:294)
at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.IgniteClientDisconnectedException: Operation has been cancelled (client node disconnected).
at org.apache.ignite.internal.util.IgniteUtils$14.apply(IgniteUtils.java:829)
at org.apache.ignite.internal.util.IgniteUtils$14.apply(IgniteUtils.java:827)
... 11 more
Caused by: class org.apache.ignite.internal.IgniteClientDisconnectedCheckedException: Operation has been cancelled (client node disconnected).
at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.disconnectedError(GridCacheMvccManager.java:406)
at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.onDisconnected(GridCacheMvccManager.java:382)
at org.apache.ignite.internal.processors.cache.GridCacheSharedContext.onDisconnected(GridCacheSharedContext.java:151)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onDisconnected(GridCacheProcessor.java:934)
at org.apache.ignite.internal.IgniteKernal.onDisconnected(IgniteKernal.java:3023)
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery(GridDiscoveryManager.java:588)
at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2058)
at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2039)
at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1435)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
This usually happens when all server nodes stop. You should check their logs to understand what exactly happened.
Also note that the client will reconnect automatically when at least one server is back. IgniteClientDisconnectedException has the reconnectFuture() method which returns the future that will be completed when the reconnection happens, so you can block the client until the cluster is functional.
Related
I have a spring batch job which is reading data from DB and indexing into solr.After deploying war for the first time i run it runs fine. But if i run it second time its showing exception as below
java.util.concurrent.RejectedExecutionException: Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$5793/1735313067#63e1224 rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor#3411eba2[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 157]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:194)
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient.addRunner(ConcurrentUpdateSolrClient.java:429)
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient.request(ConcurrentUpdateSolrClient.java:527)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106)
at org.apache.solr.client.solrj.SolrClient.addBeans(SolrClient.java:357)
at org.apache.solr.client.solrj.SolrClient.addBeans(SolrClient.java:329)
at sun.reflect.GeneratedMethodAccessor1557.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:136)
at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:124)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
at com.sun.proxy.$Proxy1112.write(Unknown Source)
at org.springframework.batch.core.step.item.SimpleChunkProcessor.writeItems(SimpleChunkProcessor.java:188)
I think connection to solr is refused second time due to some issues.Pls help
This means you closed the CloudSolrClient and then tried to add more work with it after it was closed.
So you probably have some code like this:
try (CloudSolrClient solrClient = new CloudSolrClient.Builder()
.withZkHost(SOLR_ZK_HOSTS)
.withZkChroot(SOLR_ZK_CHROOT)
.withConnectionTimeout(10000)
.withSocketTimeout(60000)
.build()) {
... start all threads that do the work here ...
}
wait for threads to complete
You should have instead did this:
try (CloudSolrClient solrClient = new CloudSolrClient.Builder()
.withZkHost(SOLR_ZK_HOSTS)
.withZkChroot(SOLR_ZK_CHROOT)
.withConnectionTimeout(10000)
.withSocketTimeout(60000)
.build()) {
... start all threads that do the work here ...
wait for threads to complete
}
Notice how the wait for threads to complete is inside the try with resources loop. The result of not waiting for the threads to complete is that you ended up closing the solrClient before the files were all added.
I'm trying to use 'DBCPConnectionPoolLookup' service in 'ExecuteGroovyScript' to dynamically query the required database based on 'database.name' parameter in the input flow file.
The processor is successfully able to get the corresponding 'DBCPConnectionPool' service for querying but I'm getting the an exception java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object. As opposed to if I directly use the 'DBCPConnectionPool' service without the 'Lookup' service without changing any configuration it works fine.
I access the service as follows:
def clientDb = CTL.SQLLookupService.getConnection(flowFile.getAttributes())
Then use the 'clientDb' object to query as:
clientDb.rows(timseriesSqlCountQuery).eachWithIndex { row, idx ->numRowsTimeSeries= row.c}
I have tried increasing the values of Max Wait Time and Max Total Connections to higher values in 'DBCPConnectionPool' service, it does not help.
Please find below detail links of images for code,error and configuration
Exception
Configuration of 'ExecuteGroovyScript'
Configuration of 'DBCPConnectionPool' service
Configuration of 'DBCPConnectionPoolLookup' service
Script Code
import org.apache.nifi.distributed.cache.client.Deserializer
import org.apache.nifi.distributed.cache.client.Serializer
import org.apache.nifi.distributed.cache.client.exception.DeserializationException
import org.apache.nifi.distributed.cache.client.exception.SerializationException
import groovy.sql.Sql
import java.time.*
try {
def flowFile = session.get()
def isBootstrap=flowFile."isBootstrap"
def timseriesSqlQuery='SELECT id FROM [dbo].[Points] where ([MappedToEquipment] = \'Mapped\' or PointStatus = \'Mapped\')'
def timseriesSqlCountQuery='SELECT count(id) as c FROM [dbo].[Points] where ([MappedToEquipment] = \'Mapped\' or PointStatus = \'Mapped\')'
def spaceSqlQuery='select id from (select id from dbo.organization union select id from dbo.facility union select id from dbo.building union select id from dbo.floor union select id from dbo.wing union select id from dbo.room union select id from dbo.systems) tmp'
def spaceSqlCountQuery='select count(id) as c from (select id from dbo.organization union select id from dbo.facility union select id from dbo.building union select id from dbo.floor union select id from dbo.wing union select id from dbo.room union select id from dbo.systems) tmp'
def cache = CTL.lastIngestTimeMap
def clientDb = CTL.SQLLookupService.getConnection(flowFile.getAttributes())//SQL.staticService
int numRowsTimeSeries=0
int numRowsSpace=0
clientDb.rows(timseriesSqlCountQuery).eachWithIndex { row, idx ->numRowsTimeSeries= row.c}
clientDb.rows(spaceSqlCountQuery).eachWithIndex { row, idx ->numRowsSpace= row.c}
}
Exception from Nifi logs
2019-09-12 06:18:33,629 ERROR [Timer-Driven Process Thread-3] o.a.n.p.groovyx.ExecuteGroovyScript ExecuteGroovyScript[id=b435c079-ee6c-3c42-a6ea-020968267ecf] ExecuteGroovyScript[id=b435c079-ee6c-3c42-a6ea-020968267ecf] failed to process session due to java.lang.ClassCastException; Processor Administratively Yielded for 1 sec: java.lang.ClassCastException
java.lang.ClassCastException: null
2019-09-12 06:18:33,629 WARN [Timer-Driven Process Thread-3] o.a.n.controller.tasks.ConnectableTask Administratively Yielding ExecuteGroovyScript[id=b435c079-ee6c-3c42-a6ea-020968267ecf] due to uncaught Exception: java.lang.ClassCastException
java.lang.ClassCastException: null
2019-09-12 06:18:33,629 ERROR [Timer-Driven Process Thread-9] o.a.n.p.groovyx.ExecuteGroovyScript ExecuteGroovyScript[id=9b81ca15-93a5-3953-9f40-d0874cfe2531] ExecuteGroovyScript[id=9b81ca15-93a5-3953-9f40-d0874cfe2531] failed to process session due to java.lang.ClassCastException; Processor Administratively Yielded for 1 sec: java.lang.ClassCastException
java.lang.ClassCastException: null
2019-09-12 06:18:33,629 WARN [Timer-Driven Process Thread-9] o.a.n.controller.tasks.ConnectableTask Administratively Yielding ExecuteGroovyScript[id=9b81ca15-93a5-3953-9f40-d0874cfe2531] due to uncaught Exception: java.lang.ClassCastException
java.lang.ClassCastException: null
2019-09-12 06:18:33,708 ERROR [Timer-Driven Process Thread-10] o.a.n.p.groovyx.ExecuteGroovyScript ExecuteGroovyScript[id=a1ec4496-dca3-38ab-a47b-43d7ff95e40f] org.apache.nifi.processor.exception.ProcessException: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object: org.apache.nifi.processor.exception.ProcessException: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object
org.apache.nifi.processor.exception.ProcessException: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object
at org.apache.nifi.dbcp.DBCPConnectionPool.getConnection(DBCPConnectionPool.java:308)
at org.apache.nifi.dbcp.DBCPService.getConnection(DBCPService.java:49)
at sun.reflect.GeneratedMethodAccessor106.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:84)
at com.sun.proxy.$Proxy89.getConnection(Unknown Source)
at org.apache.nifi.processors.groovyx.ExecuteGroovyScript.onInitSQL(ExecuteGroovyScript.java:339)
at org.apache.nifi.processors.groovyx.ExecuteGroovyScript.onTrigger(ExecuteGroovyScript.java:439)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object
at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:142)
at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:1563)
at org.apache.nifi.dbcp.DBCPConnectionPool.getConnection(DBCPConnectionPool.java:305)
... 19 common frames omitted
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:451)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:365)
at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:134)
... 21 common frames omitted
2019-09-12 06:18:33,708 ERROR [Timer-Driven Process Thread-2] o.a.n.p.groovyx.ExecuteGroovyScript ExecuteGroovyScript[id=54d1e251-88f2-33f3-0489-722879a802bd] org.apache.nifi.processor.exception.ProcessException: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object: org.apache.nifi.processor.exception.ProcessException: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object
org.apache.nifi.processor.exception.ProcessException: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object
at org.apache.nifi.dbcp.DBCPConnectionPool.getConnection(DBCPConnectionPool.java:308)
at org.apache.nifi.dbcp.DBCPService.getConnection(DBCPService.java:49)
at sun.reflect.GeneratedMethodAccessor106.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:84)
at com.sun.proxy.$Proxy89.getConnection(Unknown Source)
at org.apache.nifi.processors.groovyx.ExecuteGroovyScript.onInitSQL(ExecuteGroovyScript.java:339)
at org.apache.nifi.processors.groovyx.ExecuteGroovyScript.onTrigger(ExecuteGroovyScript.java:439)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object
at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:142)
at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:1563)
at org.apache.nifi.dbcp.DBCPConnectionPool.getConnection(DBCPConnectionPool.java:305)
... 19 common frames omitted
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:451)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:365)
at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:134)
... 21 common frames omitted
Finally after bring down Nifi twice I have found the solution. The problem seemed to be in the code which I was using, I used the object returned by CTL.index.getConnection(flowFile.getAttributes()) to query the SQL table which actually is a connection table, now due to this Nifi used up all available connections to SQL, due to which even if I reverted to using 'DBCPConnectionPool' service instead if 'Lookup' I was getting the above error. When I used to restart Nifi it used to work fine.
The actual code to be used in your script for using 'Lookup' Service is
def connectionObj = CTL.index.getConnection(flowFile.getAttributes())
def clientDb = new Sql(connectionObj)
Now use the 'clientDb' object to query your table
clientDb.rows(timseriesSqlCountQuery).eachWithIndex { row, idx ->numRowsTimeSeries= row.c}
I'm trying to submit a simple spark job in an Amazon EMR cluster. My cluster has 5 M4.2xlarge instances (1 master, 4 slaves), each with 16 vCPU, and 32 gigs of memory.
This is my code:
def main(args : Array[String]): Unit = {
val sparkConfig = new SparkConf()
.set("hive.exec.dynamic.partition", "true")
.set("hive.exec.dynamic.partition.mode", "nonstrict")
.set("hive.s3.max-client-retries", "50")
.set("hive.s3.max-error-retries", "50")
.set("hive.s3.max-connections", "100")
.set("hive.s3.connect-timeout", "5m")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.kryo.registrationRequired", "true")
.set("spark.kryo.classesToRegister", "org.apache.spark.graphx.impl.VertexAttributeBlock")
.set("spark.broadcast.compress", "true")
val spark = SparkSession.builder()
.appName("Spark Hive Example")
.enableHiveSupport()
.config(sparkConfig)
.getOrCreate()
// Set Kryo for serializing
GraphXUtils.registerKryoClasses(sparkConfig)
val res = spark.sql("SELECT col1, col2, col3 FROM table1 limit 10000")
val edgesRDD = res.rdd.map(row => Edge(row.getString(0).hashCode, row.getString(1).hashCode, row(2).asInstanceOf[String]))
val res_two = spark.sql("SELECT col1 FROM table2 where col1 is not NULL and col1 != '' limit 100000")
val vertexRDD: RDD[(VertexId, String)] = res_two.rdd.map(row => (row.getString(0).hashCode, row(0).asInstanceOf[String]))
val graph = Graph(vertexRDD, edgesRDD)
val connectedComponents = graph.connectedComponents().vertices
Both table1, and table2 are S3 backed external tables on hive. When I run this program, my job fails with the following error:
Job aborted due to stage failure: Task 827 in stage 0.0 failed 4 times, most recent failure: Lost task 827.3 in stage 0.0 (TID 921, xxx.internal, executor 3): com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1069)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4169)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4116)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1237)
at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:24)
at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:10)
at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:82)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:176)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.getObjectMetadata(AmazonS3LiteClient.java:94)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AbstractAmazonS3Lite.getObjectMetadata(AbstractAmazonS3Lite.java:39)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:211)
at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy35.retrieveMetadata(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:768)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:1194)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:773)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.open(EmrFileSystem.java:166)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:355)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:316)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:237)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1204)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1113)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:246)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:245)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:203)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:286)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:263)
at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.conn.$Proxy37.get(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1190)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
... 59 more
Not sure if it is coming from hadoop or when reading from hive, but I saw a similar issue here, so I added the following params in my spark-submit command:
--conf "spark.driver.extraJavaOptions=-Djavax.net.ssl.sessionCacheSize=1000 -Djavax.net.ssl.sessionCacheTimeout=60" --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.sessionCacheSize=1000 -Djavax.net.ssl.sessionCacheTimeout=60"
Still doesn't work. Does anyone know what's going on?
TLDR: The property you need to set is fs.s3.maxConnections in the emrfs-site.xml configuration file. It defaults to 50. We were getting exactly the same error/stack trace as you, so I set it to 5000, which fixed the problem and had no ill effects.
From what I can tell, the root cause is InputFormat implementations that do not properly use try...finally to ensure that connections get closed when an exceptions are thrown. Notably, older versions of Hive, including v1.2.1 that Spark is compiled against, exhibit this bug. Hive 2.x massively refactors OrcInputFormat, though I haven't verified that the bug is fixed, nor do I know if/when/how you can compile Spark against Hive 2.x.
The workaround increases the size of the connection pool, as suggested in another answer, but both the property and its location are quite different than in the "classic" S3 filesystems (s3/s3a/s3n). Of course, this isn't documented anywhere and required decompilation of the emrfs jar to tease out...
I don't use EMRFS, but I do know the other spark/hadoop S3 clients all use a pool of http connections for their requests to S3, and "timeout waiting for pool" messages invariably means "pool isn't big enough". See if you can find out what the emrfs options are for increasing that pool size. You will need at least one for every worker thread running in your process, and I'd double it in the hope that emrfs parallelises block uploads the way the s3a client does.
I have a service which uses apache HttpAsyncClient. (versions: httpasyncclient-4.0.2.jar, httpcore-4.4.3.jar, httpcore-nio-4.3.3.jar)
All requests start failing some time after starting the async client with following being initial exception -
[#|2016-03-16T22:31:59.376-0700|SEVERE|glassfish3.1.2|org.apache.http.impl.nio.client.InternalHttpAsyncClient|_ThreadID=564;_ThreadName=Thread-6;|I/O reactor terminated abnormally
org.apache.http.nio.reactor.IOReactorException: I/O dispatch worker terminated abnormally
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:357)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:189)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.doExecute(CloseableHttpAsyncClientBase.java:67)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.access$000(CloseableHttpAsyncClientBase.java:38)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:57)
at java.lang.Thread.run(Unknown Source)
Caused by: RestException(statusCode=500, code=null, message=I/O operation failed, developerMessage=RestException(statusCode=500, code=null, message=I/O operation failed, developerMessage=null)
at com.notificationservice.analytics.client.AsyncResponse$2.failed(AsyncResponse.java:178)
at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:134)
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.failed(DefaultClientExchangeHandlerImpl.java:258)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.exception(HttpAsyncRequestExecutor.java:127)
at org.apache.http.impl.nio.client.InternalIODispatch.onException(InternalIODispatch.java:68)
at org.apache.http.impl.nio.client.InternalIODispatch.onException(InternalIODispatch.java:37)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.outputReady(AbstractIODispatch.java:154)
at org.apache.http.impl.nio.reactor.BaseIOReactor.writable(BaseIOReactor.java:180)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:342)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:316)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:277)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:105)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:586)
at java.lang.Thread.run(Unknown Source)
)
at com.notificationservice.client.AsyncResponse$2.failed(AsyncResponse.java:178)
at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:134)
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.failed(DefaultClientExchangeHandlerImpl.java:258)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.exception(HttpAsyncRequestExecutor.java:127)
at org.apache.http.impl.nio.client.InternalIODispatch.onException(InternalIODispatch.java:68)
at org.apache.http.impl.nio.client.InternalIODispatch.onException(InternalIODispatch.java:37)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.outputReady(AbstractIODispatch.java:154)
at org.apache.http.impl.nio.reactor.BaseIOReactor.writable(BaseIOReactor.java:180)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:342)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:316)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:277)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:105)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:586)
... 1 more
Same problem happens with new versions - httpasyncclient-4.1.1.jar, httpcore-4.4.4.jar, httpcore-nio-4.4.4.jar
Any insight would be highly appreciated. Is there some IOReactorConfig parameter which needs to be changed?
I would say something is wrong with your rest parameters. StatusCode 500 comes from the server so your request are going to it.
Caused by: RestException(statusCode=500, code=null, message=I/O operation failed, developerMessage=RestException(statusCode=500, code=null, message=I/O operation failed, developerMessage=null
I use coherence in my project with following configuration.
Everything is Ok with coherence 12.1.2, but after updating coherence to version 12.1.3, i have a problem. When i put for example 1000 item on cache one by one using NamedCache.put() there is no error, but when i put 1000 items on cache with calling NamedCache.putall once, coherence raises an exception.
My project is in java and is deployed on jboss.
cache configuration:
<distributed-scheme>
<scheme-name>my-map</scheme-name>
<service-name>MyMap</service-name>
<serializer>
<instance>
<class-name>com.tangosol.io.pof.ConfigurablePofContext</class-name>
<init-params>
<init-param>
<param-type>String</param-type>
<param-value>pof-config.xml</param-value>
</init-param>
</init-params>
</instance>
</serializer>
<thread-count>100</thread-count>
<local-storage>true</local-storage>
<backup-count>0</backup-count>
<backing-map-scheme>
<read-write-backing-map-scheme>
<internal-cache-scheme>
<ramjournal-scheme/>
</internal-cache-scheme>
<cachestore-scheme>
<class-scheme>
<class-name>myPackage.loader</class-name>
</class-scheme>
</cachestore-scheme>
</read-write-backing-map-scheme>
</backing-map-scheme>
<autostart>true</autostart>
</distributed-scheme>
Raised exception details:
Exception in thread "pool-7-thread-1" com.tangosol.net.RequestIncompleteException: Partial failure
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$BinaryMap.validatePartialResponse(PartitionedCache.CDB:56)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$BinaryMap.putAll(PartitionedCache.CDB:123)
at com.oracle.common.collections.ConverterCollections$ConverterMap.putAll(ConverterCollections.java:1553)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$ViewMap.putAll(PartitionedCache.CDB:5)
at com.tangosol.coherence.component.util.SafeNamedCache.putAll(SafeNamedCache.CDB:1)
at mypackage.CacheEnabledDataProvider.putOnCache(CacheEnabledDataProvider.java:66)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: Portable(com.tangosol.util.WrapperException): (Wrapped: Failed request execution for myMap service on Member(Id=1, Timestamp=2014-10-20 18:00:25.737, Address=192.168.70.101:8088, MachineId=5642, Location=site:,machine:PS,process:9876,member:Mem_1, Role=CoherenceServer) (Wrapped: Failed to store keys="9112791815, 9165497418, 9199193873, 9192139970, 9392020128, ") Assertion failed:) Assertion failed:
at com.tangosol.util.Base.ensureRuntimeException(Base.java:289)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.tagException(Grid.CDB:50)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onPartialCommit(PartitionedCache.CDB:7)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onPutAllRequest(PartitionedCache.CDB:85)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$PutAllRequest$PutJob.run(PartitionedCache.CDB:1)
at com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:1)
at com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:32)
at com.tangosol.coherence.component.util.DaemonPool$Daemon.onNotify(DaemonPool.CDB:65)
at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:51)
at java.lang.Thread.run(Unknown Source)
at <process boundary>
at com.tangosol.io.pof.ThrowablePofSerializer.deserialize(ThrowablePofSerializer.java:57)
at com.tangosol.io.pof.PofBufferReader.readAsObject(PofBufferReader.java:3316)
at com.tangosol.io.pof.PofBufferReader.readObject(PofBufferReader.java:2604)
at com.tangosol.io.pof.ConfigurablePofContext.deserialize(ConfigurablePofContext.java:376)
at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.readObject(Service.CDB:1)
at com.tangosol.coherence.component.net.Message.readObject(Message.CDB:1)
at com.tangosol.coherence.component.net.message.responseMessage.DistributedPartialResponse.read(DistributedPartialResponse.CDB:12)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$PartialValueResponse.read(PartitionedCache.CDB:4)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.deserializeMessage(Grid.CDB:20)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onNotify(Grid.CDB:21)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService.onNotify(PartitionedService.CDB:3)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onNotify(PartitionedCache.CDB:3)
at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:51)
at java.lang.Thread.run(Thread.java:745)
Caused by: Portable(com.tangosol.util.AssertionException): Assertion failed:
at com.tangosol.util.Base.azzertFailed(Base.java:209)
at com.tangosol.util.Base.azzert(Base.java:166)
at com.tangosol.net.cache.ReadWriteBackingMap$CacheStoreWrapper.storeAllInternal(ReadWriteBackingMap.java:5943)
at com.tangosol.net.cache.ReadWriteBackingMap$StoreWrapper.storeAll(ReadWriteBackingMap.java:5067)
at com.tangosol.net.cache.ReadWriteBackingMap.putAll(ReadWriteBackingMap.java:840)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.putAllPrimaryResource(PartitionedCache.CDB:7)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.postPutAll(PartitionedCache.CDB:27)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.putAll(PartitionedCache.CDB:14)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onPutAllRequest(PartitionedCache.CDB:62)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$PutAllRequest$PutJob.run(PartitionedCache.CDB:1)
at com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:1)
at com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:32)
at com.tangosol.coherence.component.util.DaemonPool$Daemon.onNotify(DaemonPool.CDB:65)
at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:51)
at java.lang.Thread.run(Unknown Source)
at <process boundary>
at com.tangosol.io.pof.ThrowablePofSerializer.deserialize(ThrowablePofSerializer.java:57)
at com.tangosol.io.pof.PofBufferReader.readAsObject(PofBufferReader.java:3316)
at com.tangosol.io.pof.PofBufferReader.readObject(PofBufferReader.java:2604)
at com.tangosol.io.pof.PortableException.readExternal(PortableException.java:150)
at com.tangosol.io.pof.ThrowablePofSerializer.deserialize(ThrowablePofSerializer.java:59)
at com.tangosol.io.pof.PofBufferReader.readAsObject(PofBufferReader.java:3316)
at com.tangosol.io.pof.PofBufferReader.readObject(PofBufferReader.java:2604)
at com.tangosol.io.pof.ConfigurablePofContext.deserialize(ConfigurablePofContext.java:376)
at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.readObject(Service.CDB:1)
at com.tangosol.coherence.component.net.Message.readObject(Message.CDB:1)
at com.tangosol.coherence.component.net.message.responseMessage.DistributedPartialResponse.read(DistributedPartialResponse.CDB:12)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$PartialValueResponse.read(PartitionedCache.CDB:4)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.deserializeMessage(Grid.CDB:20)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onNotify(Grid.CDB:21)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService.onNotify(PartitionedService.CDB:3)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onNotify(PartitionedCache.CDB:3)
at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:51)
at java.lang.Thread.run(Thread.java:745)
I found that if loader class defined in cachestore-scheme element, implements CacheStore instead of CacheLoader and leave store() and storeAll() empty, no exception is raised. According to oracle documents, implementing CacheStore is not necessary for read-only cache-store and implementing CacheLoader is sufficient. This might be a bug.
I am not sure implementing CacheStore with empty Store() and storeAll() methods when we need read-only cache-store has performance penalty or not?
I would only implement CacheLoader (not CacheStore) if you are not writing back to the database.