I'm trying to use the Go pubsub library against the local emulated pubsub server. I'm finding that the "old style" (deprecated) functions (e.g. CreateSub and PullWait) work fine, but the "new style" API (e.g. Iterators and SubscriptionHandles) does not work as expected.
I've written two different unit tests that both test the same sequence of actions, one using the "new style" API and one using the "old style" API.
The sequence is:
create a subscription
fail to pull any messages (since none available)
publish a message
pull that message, but don't ACK it
lastly pull it again which should take 10s since the message ACK timeout has to expire first
https://gist.github.com/ianrose14/db6ecd9ccb6c84c8b36bf49d93b11bfb
The test using the old-style API works just as I would expect:
=== RUN TestPubSubRereadLegacyForDemo
--- PASS: TestPubSubRereadLegacyForDemo (10.32s)
pubsubintg_test.go:217: PullWait returned in 21.64236ms (expected 0)
pubsubintg_test.go:228: PullWait returned in 10.048119558s (expected 10s)
PASS
Whereas the test using the new-style API works unreliably. Sometimes things work as expected:
=== RUN TestPubSubRereadForDemo
--- PASS: TestPubSubRereadForDemo (11.38s)
pubsubintg_test.go:149: iter.Next() returned in 17.686701ms (expected 0)
pubsubintg_test.go:171: iter.Next() returned in 10.059492646s (expected 10s)
PASS
But sometimes I find that iter.Stop() doesn't return promptly as it should (and note how the second iter.Next took way longer than it should):
=== RUN TestPubSubRereadForDemo
--- FAIL: TestPubSubRereadForDemo (23.87s)
pubsubintg_test.go:149: iter.Next() returned in 7.3284ms (expected 0)
pubsubintg_test.go:171: iter.Next() returned in 20.074994835s (expected 10s)
pubsubintg_test.go:183: iter.Stop() took too long (2.475055901s)
FAIL
And other times I find that the first Pull after publishing the message takes too long (it should be near instant):
=== RUN TestPubSubRereadForDemo
--- FAIL: TestPubSubRereadForDemo (6.32s)
pubsubintg_test.go:147: failed to pull message from iterator: context deadline exceeded
FAIL
Any ideas? Are there any working examples using the new-style API? Unfortunately, the Go starter project here uses the old, deprecated API.
(Note: It looks like the line numbers in your example output don't match the code that you've linked.)
But sometimes I find that iter.Stop() doesn't return promptly as it should
A few changes have landed recently which fix the excessive delay when calling iter.Stop. It should now return promptly if all messages have been acked. Try syncing and testing it out again.
(and note how the second iter.Next took way longer than it should):
In your code that uses the new API, you first do a pull from an empty subscription, using a context with a 1s deadline. Let's call this "Pull request A". Although the underlying HTTP request is cancelled, it seems that the connection is not being closed in any way that the server respects. So, as far as the server is concerned, "A" is still pending. Immediately after publishing, you make a new pull request; let's call that "B". After a message is returned via pull request B, you leave the message unacked and make another pull request, "C".
Now, when you publish the message, the server will deliver it to either "A" or "B". If it delivers it to "A" first, you will see the first pull exceeding the 5s context deadline. If it is delivered to "B" first, you will see the first pull returning quickly, as expected. After the message is delivered to "B" and left unacked, the server will redeliver it to either "A" or "C". If it picks "A" first, then you will end up with the second pull taking longer than expected. If it picks "C", then you will see both the first and second pulls taking as long as you expect.
If you don't do that initial pull from the empty subscription, you should see your test behave as you expect.
Note: You don't see any of this when you use the old API because you're not doing the extra "pull from empty subscription" request with the old API (presumably because it doesn't properly support a cancellable context).
Aside: if you want to leave a message unacked, you should call Message.Done(false).
My Cypress test is acting inconsistently due to an assertion set on header text. Here is my code:
cy.get('.heading-large').should('contain', 'dashboard') // passes
cy.contains('View details').first().click()
cy.get('.heading-large').should('contain', 'Registration details') // sometimes fails
If it fails, it is because the heading still contains 'dashboard' - Cypress appears not to have retried but gives error Timed out retrying: expected '<h1.heading-large>' to contain 'Registration details'
From reading about Cypress retry-ability, my understanding is that the should assertion should keep trying until timeout, which is set as "defaultCommandTimeout" : 5000. This feels true even if I have an element with the same identifier across two pages. There are no major performance issues with the app I'm testing.
The test seems more likely to fail if I am not watching the window and this issue looks like a possible cause.
Can anyone help determine: is there an issue with my test or Cypress, and how might I improve the test? I'm using Cypress 5.1.0 and Chrome 85 on MacOS Catalina.
It is failing occasionally because the request that fills the header with information has not resolved by the time the timeout has been reached.
You can solve this by setting up a route with a route alias to wait for that exact response from the request to resolve before you proceed with the click.
In other words, when you click(), a request is sent that grabs the information you want to check for in the next get(). The response to this request has sometimes not resolved by the time your get() reaches its timeout. You could increase the timeout, but that's not recommended and not good practice here. Instead, wait for that specific response with a route and route alias. If you do that, the last get() won't be called until the information it is looking for has arrived.
I don't know your request but it would work something like this:
// setup the route and alias
cy.server()
cy.route("/youRequestUrlHere").as("myLovelyAlias")
// first get
cy.get('.heading-large').should('contain', 'dashboard')
// this click fires the request url from route() above
cy.contains('View details').first().click()
// wait for route to resolve using route alias
cy.wait("@myLovelyAlias").then((response) => {
  // next get called after response resolves
  cy.get('.heading-large').should('contain', 'Registration details')
})
Reference:
Route & alias
Route
Best Practice - get()
Network Request - wait()
edit:
As mentioned above, you could also cheat and set the defaultCommandTimeout to a higher number but that is not recommended because you could still run into cases where the response resolution takes longer than the timeout you've set. The route/wait pattern is the better, more stable approach.
Just in case you want to know how it's done, though, you would pass the timeout option to your get(), something like:
cy.get('.heading-large', {timeout: 60000}).should('contain', 'Registration details')
Again, the other way would be much better.
Reference:
Cypress configuration
It looks like we need to wait for the Cypress bug "Some tests flake only if test runner's browser loses focus (or run headlessly)" to be fixed. This is because I have tried the alternative, helpful answers but am consistently facing the original issue when the window is out of focus.
Thank you to those who have answered and commented.
I have some async form validation code that I'd like to put under test using Cypress. The code is pretty simple -
1. on user input, enter async validation UI state (or stay in that state if there are previous validation requests that haven't been responded to)
2. send a request to the server
3. receive a response
4. if there are no pending requests, leave async validation UI state
Step 1 is the part I want to test. Right now, this means checking if some element has been assigned some class -- but the state changes can happen very fast, and most of the time (not always!) Cypress times out waiting for something that has ALREADY happened (in other words, step 4 has already occurred by the time we get around to seeing if step 1 happened).
So the failing test looks like:
cy.get("#some-input").type("...");
cy.get("#some-target-element").should("have.class", "class-to-check-for");
Usually, by the time Cypress gets to the second line, step 4 has already run and the test fails. Is there a common pattern I should know about to solve this? I would naturally prefer not to have to change the code under test.
Edit 1:
I'm not certain that I've 100% solved the "race" condition here, but if I use the underlying native elements (discarding the jQuery abstraction), I haven't had a failure yet.
So, changing:
cy.get("#some-input").type("...")
to:
cy.get("#some-input").then(jQueryObj => {
  let nativeElement = jQueryObj[0];
  nativeElement.value = "...";
  nativeElement.dispatchEvent(new Event("input")); // make sure the app knows this element changed
});
And then running Cypress' checks for what classes have / haven't been added has been effective.
You can stub the server request that happens during form validation - and slow it down, see delay parameter https://docs.cypress.io/api/commands/route.html#Use-delays-for-responses
While the request is delayed, your app's validation UI is showing, you can validate it and then once the request finishes, check if the UI goes away.
TL;DR
How to safely await on function execution (takes str and int as arguments and doesn't require any other context) in a separate process?
Long story
I have an aiohttp.web web API that uses a Boost.Python wrapper for a C++ extension, run under Gunicorn (and I plan to deploy it on Heroku), tested with Locust.
About the extension: it has just one function that performs a non-blocking operation. It takes one string (and one integer for timeout management), does some calculations with it, and returns a new string. For every input string there is only one possible output (except on timeout, in which case a C++ exception must be raised and translated by Boost.Python into a Python-compatible one).
In short, a handler for specific URL executes the code below:
res = await loop.run_in_executor(executor, func, *args)
where executor is the ProcessPoolExecutor instance and func is a function from the C++ extension module. (In the real project, this code is in a coroutine method of a class, and func is a classmethod that only executes the C++ function and returns the result.)
Error catching
When a new request arrives, I extract its POST data with request.post() and store it in an instance of a custom class named Call (because I have no idea what else to name it). So that call object contains all the input data (a string), the time the request was received, and the unique id that comes with the request.
The call object then goes to a class named Handler (not the aiohttp request handler), which passes its input to another class's method that has loop.run_in_executor inside. Handler also has a logging system that works like middleware: it reads the id and receive time of every incoming call object and logs them with a message saying whether the call has just started executing, executed successfully, or ran into trouble. Handler also has a try/except and stores all errors inside the call object, so that the logging middleware knows what error occurred or what output the extension returned.
Testing
I have a unit test that just creates 256 coroutines with this code inside and an executor that has 256 workers, and it works well.
But when testing with Locust, a problem appears. I use 4 Gunicorn workers and 4 executor workers for this kind of testing. At some point the application just starts to return wrong output.
My Locust TaskSet is configured to log every failed response with all available information: output string, error string, input string (which is also returned by the application), and id. All simulated requests are the same, but the id is unique for each.
The situation is better when setting Gunicorn's max_requests option to 100 requests, but failures still occur.
The interesting thing is that sometimes I can trigger a "wrong output" period simply by stopping and starting Locust's test.
I need a 100% guarantee that my web API works as I expect.
UPDATE & solution
I just asked my teammate to review the C++ code, and the problem was in global variables. Somehow this wasn't a problem with 256 parallel coroutines, but it was under Gunicorn.
I want to send notifications to Apple devices in batches (1,000 device tokens per batch, for example). And it seems that I can't know for sure that a message was delivered to APNs.
Here is the code sample:
ssl_connection(bundle_id) do |ssl, socket|
  device_tokens.each do |device_token|
    ssl.write(apn_message_for device_token)
    # I can check if there is an error response from APNs
    response_has_an_error = IO.select([socket],nil,nil,0) != nil
    # ...
  end
end
The main problem is that if the network goes down after the ssl_connection is established,
ssl.write(...)
will never raise an error. Is there any way to check that the connection still works?
The second problem is the delay between ssl.write and the error response from APNs becoming readable. I can pass a timeout parameter to IO.select after the last message is sent. Maybe it's OK to wait a few seconds for a batch of 1,000, but what if I have to send 1,000 messages for different bundle_ids?
At https://zeropush.com, we use a gem named grocer to handle our communication with Apple and we had a similar problem. The solution we found was to use the socket's read_nonblock method before each write to check for incoming data on the socket, which would indicate an error.
It makes the logic a bit funny because read_nonblock raises IO::WaitReadable if there is no data to read. So we call read_nonblock and catch IO::WaitReadable before continuing as normal. In our case, catching the exception is the happy case. You may be able to use a similar approach rather than using IO.select(...).
One issue to be aware of is that Apple may not respond immediately and any notifications sent between a failing notification and reading from the socket will be lost.
You can see the code we are using in production at https://github.com/SymmetricInfinity/grocer/blob/master/lib/grocer/connection.rb#L30.
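As a rough illustration of the check-before-write idea described above (not grocer's actual code), here is a minimal Ruby sketch. The helper names pending_error and send_batch and the 6-byte read size are assumptions made for the example; it should work with any socket-like object that supports read_nonblock.

```ruby
require "socket"

# Returns bytes already waiting on the socket, :closed if the peer hung up,
# or nil when there is nothing to read (the happy path).
# Assumption: an error reply fits in max_bytes; adjust for your protocol.
def pending_error(io, max_bytes = 6)
  io.read_nonblock(max_bytes)
rescue IO::WaitReadable
  nil
rescue EOFError
  :closed
end

# Writes messages one by one, checking for an error reply before each write.
# Returns how many messages were written and the error found, if any.
def send_batch(io, messages)
  messages.each_with_index do |message, index|
    error = pending_error(io)
    return { sent: index, error: error } if error
    io.write(message)
  end
  { sent: messages.size, error: pending_error(io) }
end
```

Because the reply may arrive late, a result with a non-nil :error only tells you that something failed at or before the :sent position. Messages written after the failing one may still need to be resent.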
I'm new to XMPP and the xmpp4r library, so please forgive my noob question if this is obviously documented somewhere.
What's the most straightforward way, in a synchronous manner, to find out if a given JID is online? (so that I can call something like is_online?(jid) in an if statement)
My details:
I'm writing a Sinatra app that will attempt to send a message to a user when a particular url gets requested on the web server, but it should only try to send the message to the user if that user is currently online. Figuring out if a given JID is online is my problem.
Now, I know that if I connect and wait a few seconds for all the initial presence probe responses to come back to the Roster helper, then I can inspect any of those presences from my Roster and call #online? on them to get the correct value. But, I don't know when all of the presence updates have been sent, so there's a race condition there and sometimes calling #online? on a presence from my roster will return false if I just haven't received that presence probe response yet.
So, my current thinking is that the most straightforward way to find out if someone is online is to construct a new Presence message of type :probe and send that out to the JID that I'm interested in. Here's how I'm doing it right now:
# jabber is the result of Client::new
# email is the jid I'm interested in polling
def is_online?(jabber, email)
  online = false
  p = Presence.new
  p.set_to(email)
  p.set_from(jabber.jid)
  p.set_type(:probe)
  pres = jabber.send(p) do |returned_presence|
    online = returned_presence.nil?
  end
  return online
end
Now, this works in cases where the user is actually online, but when the user is offline, it looks like the presence probe message that comes back is being caught by some other presence_callback handler that doesn't know what to do with it, and my is_online? function never finishes returning a value.
Can anyone help me by providing a simple example is_online? function that I can call, or point me in the right direction for how I can detect when the roster is done getting all the initial presence updates before I try checking a presence for #online?
As it turns out, there's not a synchronous way to ask for a JID presence. You've just got to ask for what you want, then wait for your response handler to fire when the response arrives.
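If you still want something that reads like a blocking call, one common pattern is to have the response handler push its result onto a queue and wait on that queue with a timeout. The sketch below is library-agnostic: wait_for_reply is a hypothetical helper, none of it is xmpp4r API, and the block is where you would send the probe and register whatever presence callback your library provides.

```ruby
require "timeout"

# Yields a "deliver" lambda to the block, then blocks until deliver is
# called or timeout_seconds elapse. Returns the delivered value, or nil
# on timeout.
def wait_for_reply(timeout_seconds)
  replies = Queue.new
  yield ->(reply) { replies << reply }
  Timeout.timeout(timeout_seconds) { replies.pop }
rescue Timeout::Error
  nil
end

# Stand-in for an asynchronous reply arriving on another thread.
answer = wait_for_reply(2) do |deliver|
  Thread.new { deliver.call(:available) }
end
puts answer.inspect
```

A nil result only means that no reply arrived in time; whether to treat that as "offline" is a decision for the caller.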