Azure Redis Cache - Multiple errors TimeoutException: Timeout performing GET {key} - caching

We deployed our application to Azure. It is using the Azure Redis Cache and we are experiencing quite a few timeouts. Namely:
[TimeoutException: Timeout performing GET textobjectDetails__23290_TextObject, inst: 1, mgr: Inactive, queue: 5, qu=0, qs=5, qc=0, wr=0/0, in=56864/0]
[TimeoutException: Timeout performing GET featured_series_CachedSeries, inst: 1, mgr: Inactive, queue: 4, qu=0, qs=4, qc=0, wr=0/0, in=44470/0]
[TimeoutException: Timeout performing GET SeriesByFranchiseId_1_CachedSeries, inst: 1, mgr: Inactive, queue: 3, qu=0, qs=3, qc=0, wr=0/0, in=11252/0]
[TimeoutException: Timeout performing GET media_silo-1-1-0_Media, inst: 1, mgr: Inactive, queue: 3, qu=0, qs=3, qc=0, wr=0/0, in=15188/0]
[TimeoutException: Timeout performing GET textobjectDetails__3092_TextObject, inst: 3, mgr: Inactive, queue: 7, qu=0, qs=7, qc=0, wr=0/0, in=65536/0]
[TimeoutException: Timeout performing GET textobjectbytype_104__TextObject, inst: 11, mgr: Inactive, queue: 9, qu=0, qs=9, qc=0, wr=0/0, in=65536/0]
[TimeoutException: Timeout performing GET groupnews_2_14_TextObject, inst: 1, mgr: Inactive, queue: 7, qu=0, qs=7, qc=0, wr=0/0, in=65536/0]
[TimeoutException: Timeout performing GET archived_news_by_group_13586_1_TextObject, inst: 2, mgr: Inactive, queue: 7, qu=0, qs=7, qc=0, wr=0/0, in=65536/0]
[TimeoutException: Timeout performing GET textobjectDetails__24404_TextObject, inst: 11, mgr: Inactive, queue: 12, qu=0, qs=12, qc=0, wr=0/0, in=65536/0]
[TimeoutException: Timeout performing GET standings_232_lcds_fixtures, inst: 2, mgr: Inactive, queue: 11, qu=0, qs=11, qc=0, wr=0/0, in=65536/0]
[TimeoutException: Timeout performing GET player_name1099_Player, inst: 4, mgr: Inactive, queue: 11, qu=0, qs=11, qc=0, wr=0/0, in=65536/0]
[TimeoutException: Timeout performing GET groupnews_1_6_TextObject, inst: 4, mgr: Inactive, queue: 9, qu=0, qs=9, qc=0, wr=0/0, in=65536/0]
[TimeoutException: Timeout performing GET archivednewscount__20789_TextObject, inst: 2, mgr: Inactive, queue: 11, qu=0, qs=11, qc=0, wr=0/0, in=65536/0]
[TimeoutException: Timeout performing GET media_id3648_Media, inst: 1, mgr: Inactive, queue: 10, qu=0, qs=10, qc=0, wr=0/0, in=65536/0]
The body of the exception is the same for all of them:
StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server):509
StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server):25
StackExchange.Redis.RedisDatabase.StringGet(RedisKey key, CommandFlags flags):16
AB.SiteCaching.Providers.RedisDataSource+<>c__DisplayClasse`1.<RetrieveCacheObject>b__b():0
Microsoft.Practices.TransientFaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func):115
AB.SiteCaching.Providers.RedisDataSource.RetrieveCacheObject[T](String fullCacheKey):56
AB.SiteCaching.Providers.RedisDataSource.RetrieveCached[T](String key, Func`1 onNotCached, TimeSpan timeOut):61
DataAccess.Data.Caching.CachedSeries.GetSeriesByFranchiseId(Int32 franchiseId):64
Shared.Services.SeriesService.LatestYearByFranchiseId(Int32 franchiseId):0
AllBlacksdotcom.Controllers.FixturesController._MostRecentYearOfFixtures(Int32 franchiseId):0
(unknown).lambda_method(Closure , ControllerBase , Object[] ):-1
System.Web.Mvc.ActionMethodDispatcher.Execute(ControllerBase controller, Object[] parameters):0
System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters):87
System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters):0
System.Web.Mvc.Async.AsyncControllerActionInvoker.<BeginInvokeSynchronousActionMethod>b__39(IAsyncResult asyncResult, ActionInvocation innerInvokeState):0
System.Web.Mvc.Async.AsyncResultWrapper+WrappedAsyncResult`2.CallEndDelegate(IAsyncResult asyncResult):0
System.Web.Mvc.Async.AsyncResultWrapper+WrappedAsyncResultBase`1.End():41
System.Web.Mvc.Async.AsyncControllerActionInvoker.EndInvokeActionMethod(IAsyncResult asyncResult):0
System.Web.Mvc.Async.AsyncControllerActionInvoker+AsyncInvocationWithFilters.<InvokeActionMethodFilterAsynchronouslyRecursive>b__3d():20
System.Web.Mvc.Async.AsyncControllerActionInvoker+AsyncInvocationWithFilters+<>c__DisplayClass46.<InvokeActionMethodFilterAsynchronouslyRecursive>b__3f():134
System.Web.Mvc.Async.AsyncControllerActionInvoker+<>c__DisplayClass33.<BeginInvokeActionMethodWithFilters>b__32(IAsyncResult asyncResult):0
System.Web.Mvc.Async.AsyncResultWrapper+WrappedAsyncResult`1.CallEndDelegate(IAsyncResult asyncResult):0
System.Web.Mvc.Async.AsyncResultWrapper+WrappedAsyncResultBase`1.End():41
System.Web.Mvc.Async.AsyncControllerActionInvoker.EndInvokeActionMethodWithFilters(IAsyncResult asyncResult):0
System.Web.Mvc.Async.AsyncControllerActionInvoker+<>c__DisplayClass21+<>c__DisplayClass2b.<BeginInvokeAction>b__1c():0
System.Web.Mvc.Async.AsyncControllerActionInvoker+<>c__DisplayClass21.<BeginInvokeAction>b__1e(IAsyncResult asyncResult):65
System.Web.Mvc.Async.AsyncResultWrapper+WrappedAsyncResult`1.CallEndDelegate(IAsyncResult asyncResult):0
System.Web.Mvc.Async.AsyncResultWrapper+WrappedAsyncResultBase`1.End():41
System.Web.Mvc.Async.AsyncControllerActionInvoker.EndInvokeAction(IAsyncResult asyncResult):0
System.Web.Mvc.Controller.<BeginExecuteCore>b__1d(IAsyncResult asyncResult, ExecuteCoreState innerState):0
System.Web.Mvc.Async.AsyncResultWrapper+WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult):0
System.Web.Mvc.Async.AsyncResultWrapper+WrappedAsyncResultBase`1.End():41
System.Web.Mvc.Controller.EndExecuteCore(IAsyncResult asyncResult):0
System.Web.Mvc.Controller.<BeginExecute>b__15(IAsyncResult asyncResult, Controller controller):0
System.Web.Mvc.Async.AsyncResultWrapper+WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult):0
System.Web.Mvc.Async.AsyncResultWrapper+WrappedAsyncResultBase`1.End():41
System.Web.Mvc.Controller.EndExecute(IAsyncResult asyncResult):0
System.Web.Mvc.Controller.System.Web.Mvc.Async.IAsyncController.EndExecute(IAsyncResult asyncResult):0
System.Web.Mvc.MvcHandler.<BeginProcessRequest>b__5(IAsyncResult asyncResult, ProcessRequestState innerState):0
System.Web.Mvc.Async.AsyncResultWrapper+WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult):0
System.Web.Mvc.Async.AsyncResultWrapper+WrappedAsyncResultBase`1.End():41
System.Web.Mvc.MvcHandler.EndProcessRequest(IAsyncResult asyncResult):0
System.Web.Mvc.MvcHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result):0
System.Web.Mvc.HttpHandlerUtil+ServerExecuteHttpHandlerAsyncWrapper+<>c__DisplayClassa.<EndProcessRequest>b__9():0
System.Web.Mvc.HttpHandlerUtil+ServerExecuteHttpHandlerWrapper+<>c__DisplayClass4.<Wrap>b__3():0
System.Web.Mvc.HttpHandlerUtil+ServerExecuteHttpHandlerWrapper.Wrap[TResult](Func`1 func):0
System.Web.Mvc.HttpHandlerUtil+ServerExecuteHttpHandlerWrapper.Wrap(Action action):25
System.Web.Mvc.HttpHandlerUtil+ServerExecuteHttpHandlerAsyncWrapper.EndProcessRequest(IAsyncResult result):32
System.Web.HttpServerUtility.ExecuteInternal(IHttpHandler handler, TextWriter writer, Boolean preserveForm, Boolean setPreviousPage, VirtualPath path, VirtualPath filePath, String physPath, Exception error, String queryStringOverride):772
Please see our time-out settings:
retryTimeoutInMilliseconds = "5000"
connectionTimeoutInMilliseconds = "5000"
operationTimeoutInMilliseconds = "1000"
How do I approach these time-outs? Would increasing operationTimeoutInMilliseconds do the trick? I also read about g-zip compression being helpful for decreasing time that it takes to read the data from Redis.
All nuget packages related to Redis are on their latest versions. We are using C1 version of Azure Redis Cache (would increasing to C2 help?).

A few points that improved our situation:
Protobuf-net instead of BinaryFormatter
I recommend using protobuf-net as it will reduce the size of values that you want to store in your cache.
public interface ICacheDataSerializer
{
byte[] Serialize(object o);
T Deserialize<T>(byte[] stream);
}
public class ProtobufNetSerializer : ICacheDataSerializer
{
public byte[] Serialize(object o)
{
using (var memoryStream = new MemoryStream())
{
Serializer.Serialize(memoryStream, o);
return memoryStream.ToArray();
}
}
public T Deserialize<T>(byte[] stream)
{
var memoryStream = new MemoryStream(stream);
return Serializer.Deserialize<T>(memoryStream);
}
}
Implement retry strategy
Implement this RedisCacheTransientErrorDetectionStrategy to handle timeout issues.
using Microsoft.Practices.TransientFaultHandling;
public class RedisCacheTransientErrorDetectionStrategy : ITransientErrorDetectionStrategy
{
/// <summary>
/// Custom Redis Transient Error Detenction Strategy must have been implemented to satisfy Redis exceptions.
/// </summary>
/// <param name="ex"></param>
/// <returns></returns>
public bool IsTransient(Exception ex)
{
if (ex == null) return false;
if (ex is TimeoutException) return true;
if (ex is RedisServerException) return true;
if (ex is RedisException) return true;
if (ex.InnerException != null)
{
return IsTransient(ex.InnerException);
}
return false;
}
}
Instantiate like this:
private readonly RetryPolicy _retryPolicy;
// CODE
var retryStrategy = new FixedInterval(3, TimeSpan.FromSeconds(2));
_retryPolicy = new RetryPolicy<RedisCacheTransientErrorDetectionStrategy>(retryStrategy);
Use like this:
var cachedString = _retryPolicy.ExecuteAction(() => dataCache.StringGet(fullCacheKey));
Review your code to minimize cache calls and values that you are storing in your cache. I reduced lots of errors by storing values more efficiently.
If none of this helps. Move to higher cache (we ended up using C3 instead of C1).

In all of your exception messages, you have something like "in=65536/0". This indicates that 65536 bytes have been received locally (in the kernel's socket buffer on the local machine) but have not yet been processed by StackExchange.Redis. This would point at an issue on the client app side of things that is keeping the system from processing your response. Two possibilities I can think of that could lead to this:
Your system does not have enough free thread pool threads to process the data fast enough. If you update to build 1.0.450 or later of StackExchange.Redis, the timeout messages will include statistics for how busy the thread pool is, which will help diagnose this issue. Alternatively, you can take use code like https://github.com/JonCole/SampleCode/blob/master/ThreadPoolMonitor/ThreadPoolLogger.cs to log this same information yourself whenever you get an exception.
You are receiving a very large object from the cache and the size of the object is not received within the timeout configured on StackExchange.Redis. When this happens, all calls that were queued onto the same connection behind the large item will timeout when the large item times out.
Anyway, hope this helps.

Go to Azure and get a bigger Redis instance. That t

Related

Tuning Kafka for latency, packet loss, and unreachability

I am trying to optimize the performances of Kafka in a scenario where there is high latency (>500ms) and intermittent packet loss. I am working with JAVA and using 'kafka_2.13', version: '2.5.0' API. I have 24
nodes connected to a single broker, each node tries to send a small message to all the other subscribers. I observe that all nodes are able to communicate when there is no packet loss nor latency but they don't seem to be able to communicate soon after I add latency and packet loss. I will do more tests on Monday but I was wondering if anyone had any suggestions on possible configuration improvements.
Following you can see the code that I see to publish and receive messages and then the different configurations that used for consumers and producers.
Publishers:
boolean sendAsyncMessage (byte[] value, String topic) {
ProducerRecord<Long, byte[]> record = new ProducerRecord<> (topic, System.currentTimeMillis (), value);
long msStart = System.currentTimeMillis ();
producer.send (record, (metadata, exception) -> {
long msDelta = System.currentTimeMillis () - msStart;
logger.info ("Message with topic {} sent at {}, was ack after {}", topic, msStart, msDelta);
if (metadata == null) {
logger.info ("An exception was triggered during send:" + exception.toString ());
}
});
producer.flush ();
return true;
}
Subscribers:
while (keepGoing.get ()) {
try {
// java example do it every time!
subscribe ();
ConsumerRecords<Long, byte[]> consumerRecords = consumer.poll (Duration.ofMillis (2000));
manageMessage (consumerRecords);
//Thread processRecords = new Thread (() -> manageMessage (consumerRecords));
//processRecords.start ();
} catch (Exception e) {
logger.error ("Problem in polling: " + e.toString ());
}
}
Producer:
properties.put (ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, KafkaBroker.KEY_SERIALIZER);
properties.put (ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaBroker.VALUE_SERIALIZER);
properties.put (ProducerConfig.ACKS_CONFIG, reliable ? "1" : 0);
// host1:port1,host2:port2
properties.put (ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, server);
// how many bytes to buffer records waiting to be sent to the server
//properties.put (ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432);
properties.put (ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
properties.put (ProducerConfig.CLIENT_ID_CONFIG, clientID);
//15 MINUTES
properties.put (ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG, 54000000);
// MAX UNCOMPRESSED MESSAGE SIZE
// properties.put (ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1048576);
// properties.put (ProducerConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG, 1000);
properties.put (ProducerConfig.RECONNECT_BACKOFF_MS_CONFIG, 300);
properties.put (ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 300);
Consumer
properties.put (ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, KafkaBroker.KEY_DESERIALIZER);
properties.put (ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaBroker.VALUE_DESERIALIZER);
// host1:port1,host2:port2
properties.put (ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, server);
// should be the topic
properties.put (ConsumerConfig.GROUP_ID_CONFIG, groupID);
properties.put (ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");
properties.put (ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
properties.put (ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
Before trying to change all settings, I'd make a few changes in your logic:
Producer
Currently you are calling flush() after sending each message, effectively doing a synchronous send. This is not recommended as it forces the Kafka client to make a request to the cluster for every single message. This is pretty inefficient. In most cases, it's best to let the client decide when to actually send messages and not use flush().
Consumer
In each iteration, you are calling subscribe(), this is not needed. You should only call subscribe() when you want to change the subscription. Also creating a new thread in each poll() loop is not recommended! In addition to being slow, you risk creating hundreds or thousands of threads if the consumer starts fetching large amounts of messages.
Kafka is using a TCP protocol, so packets lost should be automatically retried. By default, Kafka clients are configured to retry most operations and automatically reconnect to brokers if a connection is lost.
When doing your tests, before changing configurations, you should see how the Kafka client is behaving by monitoring its metrics and logs. Are timeouts reached because of the latency? Are messages retried?
In the end, the biggest factor that was preventing my distributed system to correctly communicate was the producer acks option. In the beginning, we had set this to all (the strictest option) and it seems that that, paired with the deteriorated network was preventing Kafka to have performances similar to other TCP based protocols. We now use 0 for unreliable messages and 1 for reliable.

Cypress test logs in Team City are too verbose

We are running Cypress tests in Team City and the output of the tests is crazy verbose. When I run the tests on the agent machine directly I don't see such verbose logging so it might be because of Team City.
Small sample (there are thousands of these):
cypress:https-proxy Making intercepted connection to 60378
cypress:server:server Got UPGRADE request from /__socket.io/?EIO=3&transport=websocket
cypress:server:timers queuing timer id 5 after 85000 ms
cypress:server:timers child received timer id 5
cypress:server:socket socket connected
cypress:server:timers clearing timer id 5 from queue { '3': { args: [], ms: 30000, cb: [Function: timeoutTimeout] }, '5': { args: [], ms: 85000, cb: [Function] } }
cypress:server:timers queuing timer id 6 after 85000 ms
cypress:server:timers child received timer id 6
cypress:server:timers clearing timer id 6 from queue { '3': { args: [], ms: 30000, cb: [Function: timeoutTimeout] }, '6': { args: [], ms: 85000, cb: [Function] } }
cypress:server:timers queuing timer id 7 after 85000 ms
cypress:server:timers queuing timer id 8 after 1000 ms
cypress:server:timers clearing timer id 8 from queue { '3': { args: [], ms: 30000, cb: [Function: timeoutTimeout] }, '7': { args: [], ms: 85000, cb: [Function] }, '8': { args: [], ms: 1000, cb: [Function: timeoutTimeout] } }
cypress:server:timers child received timer id 7
cypress:server:timers child received timer id 8
Any ideas?
Those are Cypress' own debug messages - they are only logged in case the DEBUG environment variable is set to a matching pattern, i.e. * or cypress:*. They are not related to the teamcity logger.
Either override the env.DEBUG Build Parameter in your Build Configuration and set it to an empty string, or (in case there's something else in your build that needs it to be set to *) unset it just at the launch inside your build step, like
env -u DEBUG npm run cypress -- --reporter teamcity

Testing my Xamarin.Forms iOS Application with Xamarin.UITest: The first test always fails

I am testing an Xamarin.iOS Application on a real device (iPhone 6s with iOS 12.1) with Xamarin.UITest.
When I am running my few UI Tests, the first test always seam to crash (regardless of what is happening inside the test) due to the same error (see below).
Enviroment is:
Xamarin.UITest 2.2.7
NUnitTestAdapter 2.1.1
NUnit 2.6.3
Xamarin.TestCloud.Agent 0.21.7
Setup is:
[SetUp]
public void Setup(){
this.app = ConfigureApp.iOS
.EnableLocalScreenshots()
.InstalledApp("my.app.bundle")
.DeviceIdentifier("My-iPhone6s-UDID-With-iOS12.1")
.StartApp();
}
[Test]
public void FirstTestThatCouldBeEmpty(){
//But doesn't have to be empty to produce the error
}
Resulting error:
2019-01-17T14:11:20.4902700Z 1 - ClearData:> 2019-01-17T14:11:20.4916340Z bundleId: my.app.bundle
2019-01-17T14:11:20.4929580Z deviceId: My-iPhone6s-UDID-With-iOS12.1
2019-01-17T14:11:33.7561260Z 3 - LaunchTestAsync:
2019-01-17T14:11:33.7574050Z deviceId: My-iPhone6s-UDID-With-iOS12.1
2019-01-17T14:11:33.9279420Z 5 - HTTP request failed, retry limit hit
2019-01-17T14:11:33.9302300Z Exception: System.Net.Http.HttpRequestException: An error occurred while sending the request ---> System.Net.WebException: Unable to read data from the transport connection: Connection reset by peer. ---> System.IO.IOException: Unable to read data from the transport connection: Connection reset by peer. ---> System.Net.Sockets.SocketException: Connection reset by peer
2019-01-17T14:11:33.9322710Z at System.Net.Sockets.Socket.EndReceive (System.IAsyncResult asyncResult) [0x00012] in <23340a11bb41423aa895298bf881ed68>:0
2019-01-17T14:11:33.9340560Z at System.Net.Sockets.NetworkStream.EndRead (System.IAsyncResult asyncResult) [0x00057] in <23340a11bb41423aa895298bf881ed68>:0
2019-01-17T14:11:33.9358740Z --- End of inner exception stack trace ---
2019-01-17T14:11:33.9377100Z at System.Net.Sockets.NetworkStream.EndRead (System.IAsyncResult asyncResult) [0x0009b] in <23340a11bb41423aa895298bf881ed68>:0
2019-01-17T14:11:33.9398100Z at System.IO.Stream+<>c.b__43_1 (System.IO.Stream stream, System.IAsyncResult asyncResult) [0x00000] in <98fac219bd4e453693d76fda7bd96ab0>:0
2019-01-17T14:11:33.9415720Z at System.Threading.Tasks.TaskFactory1+FromAsyncTrimPromise1[TResult,TInstance].Complete (TInstance thisRef, System.Func`3[T1,T2,TResult] endMethod, System.IAsyncResult asyncResult, System.Boolean requiresSynchronization) [0x00000] in <98fac219bd4e453693d76fda7bd96ab0>:0
Sometimes it is this error:
SetUp : Xamarin.UITest.XDB.Exceptions.DeviceAgentException : Unable to end session: An error occurred while sending the request
System.Net.Http.HttpRequestException : An error occurred while sending the request
System.Net.WebException : Unable to read data from the transport connection: Connection reset by peer.
System.IO.IOException : Unable to read data from the transport connection: Connection reset by peer.
System.Net.Sockets.SocketException : Connection reset by peer
What could be a solution to this?
Doing the version puzzle of the nuget packages got me this far, that all tests are working besides this one.
A dummy test would be a possibility, but that would that countinous builds with those test would never be in the state of "successful" because this one test that is failing =(
I was able to find a work around for this issue, and really any issue with the device agent not connecting to the app under test.
public static IApp StartApp(Platform platform)
{
if (platform == Platform.iOS)
{
try
{
return ConfigureApp.iOS
.InstalledApp("com.example")
#if DEBUG
.Debug()
#endif
.StartApp();
}
catch
{
return ConfigureApp.iOS
.InstalledApp("com.example")
#if DEBUG
.Debug()
#endif
.ConnectToApp();
}
}
In plain language : "If you tried to start the app, and something went wrong, try to see if the app under test was started anyway and connect to it"
Note : As https://stackoverflow.com/users/1214857/iupchris10 mentioned in the comments, a caveat to this approach is that when you run tests consecutively, while the StartApp call will in theory reliably shut down the app, with the point of failure being reconnection, if it does not and you are running tests consecutively, you will simply connect to the running app instance in whatever state it was left in. This approach offers no recourse. Xamarin's StartApp will throw Exception, rather than a typed exception, and so mitigating this will rely on you parsing InnerException.Message and keeping a table of expected failures.

ZMQ - forwarding request between sockets or one-time proxy

I'm struggling with connecting two sockets:
frontend (ROUTER) - which handles clients request and forward them to backend
backend (ROUTER) - which receives request from frontend and deals with them with the use of number of workers ( which require some initialization, configuration etc).
The server code looks like this:
void server_task::run() {
frontend.bind("tcp://*:5570");
backend.bind("inproc://backend");
zmq::pollitem_t items[] = {
{ frontend, 0, ZMQ_POLLIN, 0 },
{ backend, 0, ZMQ_POLLIN, 0}
};
try {
while (true) {
zmq::poll(&items[0], 2, -1);
if (items[0].revents & ZMQ_POLLIN) {
frontend_h();
}
if (items[1].revents & ZMQ_POLLIN) {
backend_h();
}
}
}
catch (std::exception& e) {
LOG(info) << e.what();
}
}
frontend_h and backend_h are handler classes, each having access to both sockets.
The question is:
Considering synchronous execution of frontend_h() and backend_h() how can I forward the request dealt in frontend_h() to backend_h()?
I tried to simply re-send the message using backend socket like that:
void frontend_handler::handle_query(std::unique_ptr<zmq::message_t> identity, std::unique_ptr<zmq::message_t> request) {
zmq::message_t req_msg, req_identity;
req_msg.copy(request.get());
req_identity.copy(identity.get());
zmq::message_t header = create_header(request_type::REQ_QUERY);
backend.send(header, ZMQ_SNDMORE);
backend.send(message);
}
But it stucks on zmq::poll in run() after the execution of handle_query().
Stucks on zmq::poll()?
Your code has instructed the .poll() method to block, exactly as documentation states:
If the value of timeout is -1, zmq_poll() shall block indefinitely until a requested event has occurred...
How can I forward the request?
It seems pretty expensive to re-marshall each message ( +1 for using at least the .copy() method and avoiding re-packing overheads ) once your code is co-located and the first, receiving handler, can request and invoke any appropriate method of the latter directly ( and without any Context()-processing associated efforts and overheads.

How can I avoid this Deadlock in Oracle.sql.ARRAY type?

We've come across the deadlock situation below when running multiple Java EE MDB instances that are trying to write to the DB:
[deadlocked thread] [ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)':
Thread '[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)'' is waiting to acquire lock 'oracle.jdbc.driver.T4CConnection#90106ee' that is held by thread '[ACTIVE] ExecuteThread: '8' for queue: 'weblogic.kernel.Default (self-tuning)''
Stack trace:
oracle.sql.ARRAY.toBytes(ARRAY.java:673)
oracle.jdbc.driver.OraclePreparedStatement.setArrayCritical(OraclePreparedStatement.java:5985)
oracle.jdbc.driver.OraclePreparedStatement.setARRAYInternal(OraclePreparedStatement.java:5944)
oracle.jdbc.driver.OraclePreparedStatement.setObjectCritical(OraclePreparedStatement.java:8782)
oracle.jdbc.driver.OraclePreparedStatement.setObjectInternal(OraclePreparedStatement.java:8278)
oracle.jdbc.driver.OraclePreparedStatement.setObject(OraclePreparedStatement.java:8868)
oracle.jdbc.driver.OraclePreparedStatementWrapper.setObject(OraclePreparedStatementWrapper.java:240)
weblogic.jdbc.wrapper.PreparedStatement.setObject(PreparedStatement.java:287)
org.springframework.jdbc.core.StatementCreatorUtils.setValue(StatementCreatorUtils.java:356)
org.springframework.jdbc.core.StatementCreatorUtils.setParameterValueInternal(StatementCreatorUtils.java:216)
org.springframework.jdbc.core.StatementCreatorUtils.setParameterValue(StatementCreatorUtils.java:127)
org.springframework.jdbc.core.PreparedStatementCreatorFactory$PreparedStatementCreatorImpl.setValues(PreparedStatementCreatorFactory.java:298)
org.springframework.jdbc.object.BatchSqlUpdate$1.setValues(BatchSqlUpdate.java:192)
org.springframework.jdbc.core.JdbcTemplate$4.doInPreparedStatement(JdbcTemplate.java:892)
org.springframework.jdbc.core.JdbcTemplate$4.doInPreparedStatement(JdbcTemplate.java:1)
org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:586)
org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:614)
org.springframework.jdbc.core.JdbcTemplate.batchUpdate(JdbcTemplate.java:883)
org.springframework.jdbc.object.BatchSqlUpdate.flush(BatchSqlUpdate.java:184)
com.csfb.fao.rds.rfi.common.dao.storedprocs.SaveEarlyExceptionBatchStoredProc.execute(SaveEarlyExceptionBatchStoredProc.java:93)
com.csfb.fao.rds.rfi.common.dao.EarlyExceptionDAOImpl.saveEarlyExceptionBatch(EarlyExceptionDAOImpl.java:34)
com.csfb.fao.rds.rfi.application.rulesengine.RulesEngine.saveEarlyExceptions(RulesEngine.java:302)
com.csfb.fao.rds.rfi.application.rulesengine.RulesEngine.executeRules(RulesEngine.java:209)
com.csfb.fao.rds.rfi.application.rulesengine.RulesEngine.onMessage(RulesEngine.java:97)
com.csfb.fao.rds.feeds.process.BaseWorkerMDB.onMessage(BaseWorkerMDB.java:518)
weblogic.ejb.container.internal.MDListener.execute(MDListener.java:466)
weblogic.ejb.container.internal.MDListener.transactionalOnMessage(MDListener.java:371)
weblogic.ejb.container.internal.MDListener.onMessage(MDListener.java:327)
weblogic.jms.client.JMSSession.onMessage(JMSSession.java:4547)
weblogic.jms.client.JMSSession.execute(JMSSession.java:4233)
weblogic.jms.client.JMSSession.executeMessage(JMSSession.java:3709)
weblogic.jms.client.JMSSession.access$000(JMSSession.java:114)
weblogic.jms.client.JMSSession$UseForRunnable.run(JMSSession.java:5058)
weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:516)
weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
and...
[deadlocked thread] [ACTIVE] ExecuteThread: '8' for queue: 'weblogic.kernel.Default (self-tuning)':
Thread '[ACTIVE] ExecuteThread: '8' for queue: 'weblogic.kernel.Default (self-tuning)'' is waiting to acquire lock 'oracle.jdbc.driver.T4CConnection#b48b568' that is held by thread '[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)''
Stack trace:
oracle.sql.ARRAY.toBytes(ARRAY.java:673)
oracle.jdbc.driver.OraclePreparedStatement.setArrayCritical(OraclePreparedStatement.java:5985)
oracle.jdbc.driver.OraclePreparedStatement.setARRAYInternal(OraclePreparedStatement.java:5944)
oracle.jdbc.driver.OraclePreparedStatement.setObjectCritical(OraclePreparedStatement.java:8782)
oracle.jdbc.driver.OraclePreparedStatement.setObjectInternal(OraclePreparedStatement.java:8278)
oracle.jdbc.driver.OraclePreparedStatement.setObject(OraclePreparedStatement.java:8868)
oracle.jdbc.driver.OraclePreparedStatementWrapper.setObject(OraclePreparedStatementWrapper.java:240)
weblogic.jdbc.wrapper.PreparedStatement.setObject(PreparedStatement.java:287)
org.springframework.jdbc.core.StatementCreatorUtils.setValue(StatementCreatorUtils.java:356)
org.springframework.jdbc.core.StatementCreatorUtils.setParameterValueInternal(StatementCreatorUtils.java:216)
org.springframework.jdbc.core.StatementCreatorUtils.setParameterValue(StatementCreatorUtils.java:127)
org.springframework.jdbc.core.PreparedStatementCreatorFactory$PreparedStatementCreatorImpl.setValues(PreparedStatementCreatorFactory.java:298)
org.springframework.jdbc.object.BatchSqlUpdate$1.setValues(BatchSqlUpdate.java:192)
org.springframework.jdbc.core.JdbcTemplate$4.doInPreparedStatement(JdbcTemplate.java:892)
org.springframework.jdbc.core.JdbcTemplate$4.doInPreparedStatement(JdbcTemplate.java:1)
org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:586)
org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:614)
org.springframework.jdbc.core.JdbcTemplate.batchUpdate(JdbcTemplate.java:883)
org.springframework.jdbc.object.BatchSqlUpdate.flush(BatchSqlUpdate.java:184)
com.csfb.fao.rds.rfi.common.dao.storedprocs.SaveEarlyExceptionBatchStoredProc.execute(SaveEarlyExceptionBatchStoredProc.java:93)
com.csfb.fao.rds.rfi.common.dao.EarlyExceptionDAOImpl.saveEarlyExceptionBatch(EarlyExceptionDAOImpl.java:34)
com.csfb.fao.rds.rfi.application.rulesengine.RulesEngine.saveEarlyExceptions(RulesEngine.java:302)
com.csfb.fao.rds.rfi.application.rulesengine.RulesEngine.executeRules(RulesEngine.java:209)
com.csfb.fao.rds.rfi.application.rulesengine.RulesEngine.onMessage(RulesEngine.java:97)
com.csfb.fao.rds.feeds.process.BaseWorkerMDB.onMessage(BaseWorkerMDB.java:518)
weblogic.ejb.container.internal.MDListener.execute(MDListener.java:466)
weblogic.ejb.container.internal.MDListener.transactionalOnMessage(MDListener.java:371)
weblogic.ejb.container.internal.MDListener.onMessage(MDListener.java:327)
weblogic.jms.client.JMSSession.onMessage(JMSSession.java:4547)
weblogic.jms.client.JMSSession.execute(JMSSession.java:4233)
weblogic.jms.client.JMSSession.executeMessage(JMSSession.java:3709)
weblogic.jms.client.JMSSession.access$000(JMSSession.java:114)
weblogic.jms.client.JMSSession$UseForRunnable.run(JMSSession.java:5058)
weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:516)
weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
Looking at the ARRAY.toBytes() method:
public byte[] toBytes()
throws SQLException
{
synchronized (getInternalConnection())
{
return this.descriptor.toBytes(this, this.enableBuffering);
}
}
..., it synchronizes on the following method (getInternalConnection() -> getPhysicalConnection()):
oracle.jdbc.internal.OracleConnection getPhysicalConnection()
{
if (this.physicalConnection == null)
{
try
{
this.physicalConnection = ((oracle.jdbc.internal.OracleConnection)new OracleDriver().defaultConnection());
}
catch (SQLException localSQLException)
{
}
}
return this.physicalConnection;
}
defaultConnection() does the following:
public Connection defaultConnection()
throws SQLException
{
if ((defaultConn == null) || (defaultConn.isClosed()))
{
synchronized (OracleDriver.class)
{
if ((defaultConn == null) || (defaultConn.isClosed()))
{
defaultConn = connect("jdbc:oracle:kprb:", new Properties());
}
}
}
return defaultConn;
}
So there's synchronizations on the connection instance and OracleDriver.class object... I can't see how this can deadlock. To get to the point of needing the lock on OracleDriver.class, the thread would already have the lock on the connection instance.... clearly I'm missing something.
We are creating the ARRAY type using the following code:
public ARRAY getOracleLongArray(DataSource ds, List<Long> vals, String typeName)
throws SQLException {
oracle.jdbc.OracleConnection conn = (oracle.jdbc.OracleConnection) ds
.getConnection();
Object[] data = vals.toArray(new Object[0]);
ARRAY mddArray = conn.createARRAY(typeName, data);
conn.close(); // Close the extra connection made here.
return mddArray;
}
Thanks
It looks like you are sharing that connection with multiple threads (by way of that defaultConnection() method. Don't do that, make sure a connection is used by only one thread at a time.
Mark had the answer:
We were using a Connection pool but passing the DataSource reference to it - every time we created a New ARRAY object it was getting a different Connection object from the Datasource (to that used by the actual db call) and thus creating a new db connection.
We wrapped the DataSource to always return the same connection and it fixed the issue.
Thanks

Resources