The Go blog mentions two Go features, faketime and timejump:
https://blog.golang.org/playground
The part that interests me is:
modify scheduler condition to wait for deadlock, then:
- check if timers are pending
- advance clock to trigger-time of first timer
I would like to know how I can leverage this implementation to run unit tests with faketime. By this I mean many unit tests that use time.Sleep. Testing in real time is prohibitive since the execution time adds up to hours. With faketime the tests run within a split second.
To be clear: I do not plan or want to mess up the runtime. I want to build a fake clock that works.
I am pretty convinced that the implementation referenced above works correctly in the concurrent case. If you have an idea, or a tip or two, on how to borrow this implementation and build a fake clock from it, that would be great.
My question is based on the assumption that google won't accept a pull request for runtime/time.go to turn faketime into Faketime or to add "func Faketime(f int64) {faketime = f}".
My question is based on the assumption that google won't accept a pull request for runtime/time.go to turn faketime into Faketime or to add "func Faketime(f int64) {faketime = f}".
4 years later (Sept. 2019), that assumption might be challenged.
Brad Fitzpatrick just announced:
So we're moving the #golang playground from Native Client to linux/amd64 binaries with runtime faketime support under gvisor. e.g.
And (tweet):
Austin's faketime #golang support is in master (1.14?).
See golang/go commit 5ff38e4 & its prior two commits.
Note that the sleep is simulated: instead, all writes are prefixed by a binary header containing the fake time, starting from the usual Go epoch (play.golang.org).
See issue 30439 and issue 30324.
The context is the playground.
This is for use by the playground and might not even be documented or supported (kept compatible).
This is so we can replay sleeps in JavaScript but execute quickly server side.
Assume I have a scenario where I am processing a background job in a worker. It simply receives a URL for a file (image, video, pdf, ..) hosted on a remote CDN and the worker does its work as:
Some processing on the file content in-memory
Then calls a 3rd party API to retrieve a signed valid URL for uploading the content to that same 3rd party.
Uploads the content to the 3rd party API – the response contains a unique file ID
Sends a message to a user through the 3rd party API with the unique file ID received earlier
Now, the problem is between steps (3) and (4). The constraint is that the 3rd party API needs a few seconds to process the file (step 3) before we can send a message containing the ID of the file we just uploaded (step 4).
One more assumption here is that I need to make sure all 4 steps execute in one go, as in, not to have any partial failure opportunities.
Possible approaches
The most naive way is to sleep 5 between steps (3) and (4). It might hurt or fail hard, since I am not sure exactly how many seconds the 3rd party API needs for processing, but in my trials a 5-second sleep seemed alright.
I could do an in-process exponential retry, 3 (or X) times, for step (3): catch an exception from the 3rd party and attempt step (4) once step (3) is successful. This is what I have now, and it works alright.
I could perhaps use either a job scheduler or a Ruby concurrency library to do step (4) in a delayed fashion. I don't like this path, as it feels like it favours complexity.
This piece of logic is built in Ruby. The question is not very Ruby-specific and may apply to other languages, but I would like to hear what Ruby folks think.
The API docs you linked to say:
Attention! Some time needed by a server to process an uploaded file.
File should be sent to a chat after a short timeout (a couple of
seconds)
I would usually advise against something of this nature, but since your vendor specifically says "timeout", sleep is the best option.
I'd try a delayed task, as it allows the thread to continue working: the thread pool won't need to create new threads (which are quite expensive in memory), and your thread can keep doing useful work without a context switch (which is expensive in CPU time).
As for the purity of the solution, asynchronous programming should not involve any blocking tasks (avoiding blocking is the whole point of asynchronous programming), so this is one more reason to use a delayed task.
If the application does not need the highest performance (is Ruby a performance-oriented language?), then sleep may really be the easiest solution, though not the most efficient one.
This is more of a theoretical question.
Imagine that I have two programs that run simultaneously. The main one only does something when it receives a flag set to true from the secondary program. So the main program has a function that keeps asking the secondary one for the value of the flag, and when it gets true, it does something.
What I learned at college is that polling is the simplest way of doing that. But when I started working as a developer, coworkers told me that this method generates some overhead, or is a waste of computation, because it asks for a value at regular intervals.
I tried to come up with ideas for doing this differently and searched the internet, but didn't find anything useful.
I read about interrupts and passive approaches in which the main program gets the data only when the secondary program notifies it. But how does this happen? The main program will need a function to check for the interrupt, right? So won't it end up the same way as before?
What could I do differently?
There is no magic...
No program can guess when there is new information to read. What you can do is choose between two approaches:
A -> asks -> B
A <- is informed <- B
When to use each? It depends on many other factors, such as:
1- How fast does the data need to be delivered from the moment it is generated? As fast as possible? Or can it be held for a while and accumulated?
2- How fast is the data generated?
3- How many simultaneous clients are requesting data from the same server?
4- What type of data are you dealing with? Persistent? Fast-changing?
If you are building something like a stock analyzer where you need to ask for stock prices every second (and they also change every second), the approach you mentioned may be the best.
If you are writing a chat app like WhatsApp, where you need to check whether there is a new message for the client and most of the time there won't be, publish/subscribe may be the best.
But all of this is a very superficial look at a high-impact architecture decision; it is not possible to find the best option by looking at just one factor.
What I want to show is that
coworkers told me that this method generate some overhead or it's
waste of computation
is not a correct statement in general. It may be in some particular scenarios, but overhead will always exist in distributed systems.
The typical way to prevent polling is by using the Publish/Subscribe pattern.
Your client program will subscribe to the server program and when an event occurs, the server program will publish to all its subscribers for them to handle however they need to.
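Inside a single process, the idea can be sketched with channels. This is a minimal illustration in Go, not a production broker: Broker, Subscribe and Publish are hypothetical names, and in this sketch a subscriber that has not yet read its previous value simply misses the next one.

```go
package main

import (
	"fmt"
	"sync"
)

// Broker is a hypothetical in-process publisher of a boolean flag.
type Broker struct {
	mu   sync.Mutex
	subs []chan bool
}

// Subscribe registers a new subscriber and returns the channel it reads from.
func (b *Broker) Subscribe() <-chan bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan bool, 1)
	b.subs = append(b.subs, ch)
	return ch
}

// Publish delivers flag to every subscriber without blocking.
func (b *Broker) Publish(flag bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		select {
		case ch <- flag:
		default: // subscriber still has an unread value; drop this one
		}
	}
}

func main() {
	b := &Broker{}
	flag := b.Subscribe()

	go b.Publish(true) // plays the role of the secondary program

	// The main program blocks here instead of polling; it uses no CPU
	// until a value arrives.
	fmt.Println("main received flag:", <-flag)
}
```

The receive on the channel is the "passive wait": the Go runtime parks the goroutine and wakes it when the publisher sends, so there is no loop that repeatedly asks for the value.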
If you flip the order of the requests you end up with something more similar to a standard web API. Your main program (left in your example) would be a server listening for requests. The secondary program would be a client hitting an endpoint on the server to trigger an event.
There are many ways to accomplish this in every language, and it doesn't have to be tied to TCP/IP requests.
I'll add a few links for you shortly.
Well, in most languages you won't implement this at such a low level. But theoretically speaking, there are different waiting strategies, and what you are describing is active waiting. Doing this, you can easily eat all your CPU time.
Most languages provide libraries that let you start a process as a service that waits passively and is triggered when a request comes in.
In Go, if we have a type with a method that starts some looped mechanism (polling A and doing B forever) is it best to express this as:
// Run does stuff, you probably want to run this as a goroutine
func (t Type) Run() {
// Do long-running stuff
}
and document that this probably wants to be launched as a goroutine (and let the caller deal with that)
Or to hide this from the caller:
// Run does stuff concurrently
func (t Type) Run() {
go DoRunStuff()
}
I'm new to Go and unsure if convention says let the caller prefix with 'go' or do it for them when the code is designed to run async.
My current view is that we should document and give the caller a choice. My thinking is that in Go the concurrency isn't actually part of the exposed interface, but a property of using it. Is this right?
I shared your opinion on this until I started writing an adapter for a web service that I want to make concurrent. I have a goroutine that must be started to parse the results that the web calls send back on a channel. There is absolutely no case in which this API would work without running it as a goroutine.
I then began to look at packages like net/http. There is mandatory concurrency within that package. It is documented at the interface level that it can be used concurrently; however, the default implementations start goroutines automatically.
Because Go's standard library commonly fires off goroutines within its own packages, I think that if your package or API warrants it, you can handle them on your own.
My current view is that we should document and give the caller a choice.
I tend to agree with you.
Since Go makes it so easy to run code concurrently, you should try to avoid concurrency in your API (which forces clients to use it concurrently). Instead, create a synchronous API, and then clients have the option to run it synchronously or concurrently.
This was discussed in a talk a couple years ago: Twelve Go Best Practices
Slide 26, in particular, shows code more like your first example.
I would view the net/http package as an exception because in this case, the concurrency is almost mandatory. If the package didn't use concurrency internally, the client code would almost certainly have to. For example, http.Client doesn't (to my knowledge) start any goroutines. It is only the server that does so.
In most cases, it's going to be one line of code for the caller either way:
go Run() or StartGoroutine()
The synchronous API is no harder to use concurrently and gives the caller more options.
There is no 'right' answer because circumstances differ.
Obviously there are cases where an API might contain utilities, simple algorithms, data collections etc that would look odd if packaged up as goroutines.
Conversely, there are cases where it is natural to expect 'under-the-hood' concurrency, such as a rich IO library (http server being the obvious example).
For a more extreme case, consider you were to produce a library of plug-n-play concurrent services. Such an API consists of modules each having a well-described interface via channels. Clearly, in this case it would inevitably involve goroutines starting as part of the API.
One clue might well be the presence or absence of channels in the function parameters. But I would expect clear documentation of what to expect either way.
I have a Nagios configuration which is performing a number of tests on a few hundred nodes; one of these is a variant of check_http. It's not configured to --enable-embedded-perl (ePN) but we'll be changing that soon. Even with ePN enabled I'm concerned about the model where each execution of this Perl HTTP+SSL check will be handling only a single target.
I'd like to write a simple select() (or poll() / epoll()) driven daemon which creates connections to multiple targets concurrently, reads the results and spits out results in a form that's useable to Nagios as if it were results from a passive check.
Is there a guide to how one could accomplish this? What's the interface or API for providing batched check updates to Nagios?
One hack I'm considering would be to have my daemon update a Redis store (with a key for each target, and a short expiration time) and replace check_http with a very small, lightweight GET of the local Redis instance on the key (the GET would either get the actual results for Nagios or a "(nil)" response, which will be treated as if the HTTP connection had timed out).
However, I'm also a bit skeptical of my idea since I'd think someone has already done something like this by now.
(BTW: I'm ready to be convinced to switch to something like Icinga or Zabbix or Zenoss or OpenNMS ... pretty much anything that will scale better).
As to whether or not to let Nagios handle the scheduling and checks, I'll leave that to you, as it depends on your version of Nagios (newer versions can run these checks concurrently) and on why you want a separate daemon for it. Regarding Nagios versions, version 3 IIRC uses concurrent checks and thus scales to larger node counts than you report.
However, I can answer the Redis route concept as I've done it with Postfix queue stats and TTFB tracking for web sites.
Setting up the check using Python with the curl and multiprocessing modules is fairly straightforward, as is dumping the results into Redis. Setting an expiration is a solid idea to keep the DB from growing. I'd recommend this value be no more than (or possibly just less than) the check interval, to avoid grabbing stale check results. If the currently running check hasn't completed when the Redis-to-Nagios check runs, it pulls in the previous result and you can miss failed checks.
For the Redis-to-Nagios check, a simple redis-cli plus bash script or a Python check that pulls the data for a given host and returns OK or otherwise depending on your data is fairly simple and would run quickly enough.
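The decision logic of such a check is small. The sketch below is written in Go and makes several assumptions that are not from the setup described here: the cached value is the HTTP status code stored as a string, a missing or expired key is treated as CRITICAL (as the question proposes for a "(nil)" reply), and a map stands in for the Redis lookup so the example stays runnable with the standard library only. The exit codes 0 to 3 are the standard Nagios plugin return codes.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Standard Nagios plugin return codes.
const (
	stateOK       = 0
	stateWarning  = 1
	stateCritical = 2
	stateUnknown  = 3
)

// evaluate maps a cached value to a Nagios state and a status line.
// found is false when the key is missing or has expired.
func evaluate(host, cached string, found bool) (int, string) {
	if !found {
		return stateCritical, fmt.Sprintf("CRITICAL - no fresh result for %s", host)
	}
	status, err := strconv.Atoi(cached)
	if err != nil {
		return stateUnknown, fmt.Sprintf("UNKNOWN - unparsable cached value %q for %s", cached, host)
	}
	if status >= 400 {
		return stateCritical, fmt.Sprintf("CRITICAL - %s returned HTTP %d", host, status)
	}
	return stateOK, fmt.Sprintf("OK - %s returned HTTP %d", host, status)
}

func main() {
	// cache stands in for the local Redis instance in this sketch.
	cache := map[string]string{"web01": "200"}

	host := "web01" // hypothetical host name
	value, found := cache[host]
	code, message := evaluate(host, value, found)

	fmt.Println(message)
	os.Exit(code) // Nagios reads the state from the exit code
}
```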
I'd recommend running the Redis instance on the Nagios check server to ensure minimum latency and avoid a network issue causing false alerts on your checks. I would also recommend a Nagios check on your Redis instance and the checking daemon. Make the check_http replacement check dependent on the Redis and http_check daemons running. Thus you have a dependency chain as follows:
Redis -> http_checkd -> http_check_replacement
This will prevent false alerts on http_check_replacement by identifying the problem. For example, if your redis_checkd dies you get alerted to that, not 200+ "failed http_check_replacement" ones.
Also, since your data in Redis is by definition transient, I would disable the disk persistence. No need to write to disk when the data is constantly rotating.
On a side note, if you use libcurl, I would recommend pulling statistics from it about how long it takes to open the connection and how long the server takes to respond (Time To First Byte, TTFB), and taking advantage of Nagios's ability to store check statistics. You may well reach a time when having that data is really handy for troubleshooting and performance analysis.
I have a CLI tool I've written in C which does this and uploads the results to a local Redis instance. It is fast: barely more than the time to fetch the URL. I'm expecting it to be open sourced this week, and I can add Nagios-style output to it fairly easily. In fact, I think I'll do that in the next week or two.
We've got an application developed in Java, with GWT providing the frontend. The application is used on a variety of hardware, including older machines. Of course, users complain about performance.
We'd like to collect profiling data from real-world users. So far we can measure the pure server-side duration (that's easy) and the duration of the network roundtrip (not so easy, but we managed that).
The hardest part for us is measuring the time elapsed between "user clicking on search button" and "first xxx rows of grid have been displayed".
Any idea?
Thanks
Holger
I would play around with creating a timestamp at the start of the page load and a timestamp at the end. I believe that "the beginning" would be "onModuleLoad" and the end would be after your last element/widget is added. I hope I have given you a good idea of where to start. You can play with moving these timestamps around to maximize the time difference that you get. Once you feel confident that you are getting the rendering time, you can save the time difference in a database so that whenever anyone uses your page you get more user data.
Try using SpeedTracer, a Google Chrome plugin developed by Google itself.
There is no full-stack solution at the moment as far as I know. What you could build internally is a combination of remote logging, gwt lightweight measurements and deferred binding magic.
The first part is to understand that all RPC events and the initialization sequence are already measured, and how to plug into that: http://code.google.com/webtoolkit/doc/latest/DevGuideLightweightMetrics.html
The second part is adding Deferred Binding magic to measure onSuccess() method execution time of application Callbacks. The inspiration (but not a solution) could be found here: http://josephmarques.wordpress.com/2010/11/29/performance-monitoring-using-gwt/
The final part is delivering back to client. Here you may use gwt-log or new gwt logging possibilities. Not sure if they implemented that in JDK logging though.
I was thinking to create an embeddable open source library today as we have solved exactly the same problem recently and in process of porting that to GWT 2.0 :)
But I guess it will take some time from idea to implementation...
Hope it helps.
Dmitry
You can use the Duration class available on GWT Client side. com.google.gwt.core.client.Duration
It is a utility class for measuring elapsed time on the client side.
Example usage:
Duration duration = new Duration();
doSomething();
// elapsedMillis() returns the number of milliseconds that have elapsed since this object was created.
GWT.log("time taken for doSomething() to complete: " + duration.elapsedMillis());