How get a data without polling? - algorithm

This is more of a theorical question.
Well, imagine that I have two programas that work simultaneously, the main one only do something when he receives a flag marked with true from a secondary program. So, this main program has a function that will keep asking to the secondary for the value of the flag, and when it gets true, it will do something.
What I learned at college is that the polling is the simplest way of doing that. But when I started working as an developer, coworkers told me that this method generate some overhead or it's waste of computation, by asking every certain amount of time for a value.
I tried to come up with some ideas for doing this in a different way, searched on the internet for something like this, but didn't found a useful way about how to do this.
I read about interruptions and passive ways that can cause the main program to get that data only if was informed by the secondary program. But how this happen? The main program will need a function to check for interruption right? So it will not end the same way as before?
What could I do differently?

There is no magic...
no program will guess when it has new information to be read, what you can do is decide between two approaches,
A -> asks -> B
A <- is informed <- B
whenever use each? it depends in many other factors like:
1- how fast you need the data be delivered from the moment it is generated? as far as possible? or keep a while and acumulate
2- how fast the data is generated?
3- how many simoultaneuos clients are requesting data at same server
4- what type of data you deal with? persistent? fast-changing?
If you are building something like a stocks analyzer where you need to ask the price of stocks everysecond (and it will change also everysecond) the approach you mentioned may be the best
if you are writing a chat based app like whatsapp where you need to check if there is some new message to the client and most of time wont... publish subscribe may be the best
but all of this is a very superficial look into a high impact architecture decision, it is not possible to get the best by just looking one factor
what i want to show is that
coworkers told me that this method generate some overhead or it's
waste of computation
it is not a right statement, it may be in some particular scenario but overhead will always exist in distributed systems

The typical way to prevent polling is by using the Publish/Subscribe pattern.
Your client program will subscribe to the server program and when an event occurs, the server program will publish to all its subscribers for them to handle however they need to.

If you flip the order of the requests you end up with something more similar to a standard web API. Your main program (left in your example) would be a server listening for requests. The secondary program would be a client hitting an endpoint on the server to trigger an event.
There's many ways to accomplish this in every language and it doesn't have to be tied to tcp/ip requests.
I'll add a few links for you shortly.

Well, in most of languages you won't implement such a low level. But theorically speaking, there are different waiting strategies, you are talking about active waiting. Doing this you can easily eat all your memory.
Most of languages implements libraries to allow you to start a process as a service which is at passive waiting and it is triggered when a request comes.

Related

Parallel Req/Rep via Pub/Sub

I have multiple servers, at any point, one and only one will be the leader whcih can respond to a request, all others just drop the request. The issue is that the client does not know which server is the leader.
I have tried using a pub socket on the client for the parallel request out, however I can't work out the right semantics for the response. In terms of how to get the server to respond to that specific client.
A hacky solution which I have tried is to have a sub socket on the client to pub sockets on all the servers, with the leader responding by publishing a message with a filter such that it only goes to the client.
However I am unable to receive any responses this way, the server believes that it sent the message and the client believes it subscribed to "" but then doesn't receive anything...
So I am wondering whether there is a more proper way of doing this? I have thought that potentially a dealer/router with sending to a specific client would work, however I am unsure how to do that.
Essentially I am trying to do a standard Req/Rep however doing the req in parallel to all the nodes, rather than round robin.
UPDATE: By sending the routing id of the dealer in the pub request, making the remote call idempotent (just returning pre-computed results on repeated attempts), and then sending the result back via a router, with message filtering on the receiving side, it now works.
Q : " is (there) a more proper way of doing this? "
Yes.
Start to apply the Maslow's Hammer rule:
“When the only tool you have is a hammer, every problem begins to resemble a nail.”
In other words, do not try use (one) hammer for solving every problem. PUB/SUB-archetype was designed to serve those-and-only-those multi-party Formal-Communications-Pattern archetypes, where many SUB-scribe to .recv() some PUB-lisher(s) .send()-broadcast messages, but nothing other.
Similarly, REQ/REP-archetype was defined and implemented so as to serve one-and-only-one multi-party distributed Formal-Communications-Pattern ( and will obviously not meet any use-case, which has any single other or even a slightly different requirement ).
Users often require some special, non-trivial features, that obviously were not a part of the said trivial Formal-Communications-Pattern archetype primitives ( those ready-made blocks, made available in the ZeroMQ toolbox ).
It is architecs' / designers' role to define, analyse and implement any more complex user-specific distributed-behaviour definition ( a protocol ) and to implement it, most often using a layered combination of the ready-made ZeroMQ primitives.
If in doubts, take a sheet of paper and pencil, draw a small crowd of kids on playground and sketch their "shouts", their "listening", their "silence", "waiting" and "doubts", their many or few "replies", their "voting" and "anger" of not being voted for by friends, their fight for a place on the Sun and their "persistence" not to let others take theirs turn and let 'em sit on the "swing" after releasing the so far pleasurable swinging oneselves.
All this is the part of finding the right mix of ( protocol-orchestrated ) levels of control and levels of freedom to act.
There we get the new, distributed-behaviour, tailor-made for your specific use-case.
Probability to find a ready-made primitive tool to match and fulfill any user-specific use case is limitlessly close to Zero ( sure, unless one's own, user-specific use-case requirements match all those of the primitive archetype, but that is not a user-specific use-case, but a re-use of an already implemented archetype for the very same situation, that was foreseen by the ZeroMQ fathers, wasn't it? )
Again, welcome to the art of Zen-of-Zero.
Maylike to readthis and this and this

Eventual Consistency in microservice-based architecture temporarily limits functionality

I'll illustrate my question with Twitter. For example, Twitter has microservice-based architecture which means that different processes are in different servers and have different databases.
A new tweet appears, server A stored in its own database some data, generated new events and fired them. Server B and C didn't get these events at this point and didn't store anything in their databases nor processed anything.
The user that created the tweet wants to edit that tweet. To achieve that, all three services A, B, C should have processed all events and stored to db all required data, but service B and C aren't consistent yet. That means that we are not able to provide edit functionality at the moment.
As I can see, a possible workaround could be in switching to immediate consistency, but that will take away all microservice-based architecture benefits and probably could cause problems with tight coupling.
Another workaround is to restrict user's actions for some time till data aren't consistent across all necessary services. Probably a solution, depends on customer and his business requirements.
And another workaround is to add additional logic or probably service D that will store edits as user's actions and apply them to data only when they will be consistent. Drawback is very increased complexity of the system.
And there are two-phase commits, but that's 1) not really reliable 2) slow.
I think slowness is a huge drawback in case of such loads as Twitter has. But probably it could be solved, whereas lack of reliability cannot, again, without increased complexity of a solution.
So, the questions are:
Are there any nice solutions to the illustrated situation or only things that I mentioned as workarounds? Maybe some programming platforms or databases?
Do I misunderstood something and some of workarounds aren't correct?
Is there any other approach except Eventual Consistency that will guarantee that all data will be stored and all necessary actions will be executed by other services?
Why Eventual Consistency has been picked for this use case? As I can see, right now it is the only way to guarantee that some data will be stored or some action will be performed if we are talking about event-driven approach when some of services will start their work when some event is fired, and following my example, that event would be “tweet is created”. So, in case if services B and C go down, I need to be able to perform action successfully when they will be up again.
Things I would like to achieve are: reliability, ability to bear high loads, adequate complexity of solution. Any links on any related subjects will be very much appreciated.
If there are natural limitations of this approach and what I want cannot be achieved using this paradigm, it is okay too. I just need to know that this problem really isn't solved yet.
It is all about tradeoffs. With eventual consistency in your example it may mean that the user cannot edit for maybe a few seconds since most of the eventual consistent technologies would not take too long to replicate the data across nodes. So in this use case it is absolutely acceptable since users are pretty slow in their actions.
For example :
MongoDB is consistent by default: reads and writes are issued to the
primary member of a replica set. Applications can optionally read from
secondary replicas, where data is eventually consistent by default.
from official MongoDB FAQ
Another alternative that is getting more popular is to use a streaming platform such as Apache Kafka where it is up to your architecture design how fast the stream consumer will process the data (for eventual consistency). Since the stream platform is very fast it is mostly only up to the speed of your stream processor to make the data available at the right place. So we are talking about milliseconds and not even seconds in most cases.
The key thing in these sorts of architectures is to have each service be autonomous when it comes to writes: it can take the write even if none of the other application-level services are up.
So in the example of a twitter like service, you would model it as
Service A manages the content of a post
So when a user makes a post, a write happens in Service A's DB and from that instant the post can be edited because editing is just a request to A.
If there's some other service that consumes the "post content" change events from A and after a "new post" event exposes some functionality, that functionality isn't going to be exposed until that service sees the event (yay tautologies). But that's just physics: the sun could have gone supernova five minutes ago and we can't take any action (not that we could have) until we "see the light".

Laravel Raffle Project. Is a Queue the best way to achieve this?

I'm creating a raffle site as a small side project. It will handle multiple raffles each with an end time. At the end of each raffle a single winner is chosen.
Are Laravel Jobs the best way to go with this? Do I just create a single forever-repeating job to check if any raffles have ended and need a winner?
If not, what would be the best way to go?
I don't think that forever-repeating scripts are generally a good idea.
I just create a single forever-repeating job
This is almost never a good idea. It has its applications in legacy code bases but websockets and events are best considered for this job. Also, you have the benefit of using a really good framework like Laravel, so take advantage of it
Websockets
If you want people to be notified in real time in the browser.
If you have all your users subscribe to a websocket channel when they load the page, you can easily send a message to a websocket server to all subscribed clients (ie browsers) to let them know who the winner is.
Then, in your client side code (Javascript), you can parse that message to determine who the winner is and render a pop up that let's the user know.
Events
If you don't mind a bit of a delay, most definitely use events for this.
At the end of every action that might potentially end a raffle (ie, a name is chosen at random by a computer - function chooseName()). Fire an event that notifies all participants in the raffle.
https://laravel.com/docs/5.2/events
NB: I've listed the above two as separate issues, but actually, the could be used together. For example, in the event that a name is chosen at random, determine if the raffle is over and notify clients via a websocket connection.
Why I wouldn't use delayed Jobs
The crux of the reason - maintainability
Imagine a scenario where something extends the time of your raffle by a week. This could've happened because a raffle was cheated on or whatever (can't really think of all the use cases in that area).
Now, your job has a set delay in place - is it really a good programming principle to have to change two things when only one scenario changed? Nope. Having something like an event in place - onRaffleEnd - explicitly looks for the occurrence of an event. Laravel doesn't care when that event happens.
Using delayed Jobs can work - it's just not a good programming use case in your scenario and limits what you're able to do in the longer run. It will force you to make more considerations when unforeseen circumstances come along as well as when you want to change things. This also decentralizes the logic related to your raffle. Whilst decoupling code is good practice, having logic sit in completely different places makes maintenance a nightmare.

How to design and structure a program that uses Actors

From Joe Armstrong's dissertation, he specified that an Actor-based program should be designed by following three steps. The thing is, I don't understand how the steps map to a real world problem or how to apply them. Here's Joe's original suggestion.
We identify all the truly concurrent activities in our real world activity.
We identify all message channels between the concurrent activities.
We write down all the messages which can flow on the different message channels.
Now we write the program. The structure of the program should exactly follow the structure of the problem. Each real world concurrent activity should be mapped onto exactly one concurrent process in our programming language. If there is a 1:1 mapping of the problem onto the program we say that the program is isomorphic to the problem.
It is extremely important that the mapping is exactly 1:1. The reason for this is that it minimizes the conceptual gap between the problem and the solution. If this mapping is not 1:1 the program will quickly degenerate, and become difficult to understand. This degeneration is often observed when non-CO languages are used to solve concurrent problems. Often the only way to get the program to work is to force several independent activities to be controlled by the same language thread or process. This leads to an inevitable loss of clarity, and makes the programs subject to complex and irreproducible interference errors.
I think #1 is fairly easy to figure out. It's #2 (and 3) where I get lost. To illustrate my frustration I stubbed out a small service available in this gist (Ruby service with callbacks).
Looking at that example service I can see how to answer #1. We have 5 concurrent services.
Start
LoginGateway
LogoutGateway
Stop
Subscribe
Some of those services don't work (or shouldn't) depending on the state the service is in. If the service hasn't been Started, then Login/Logout/Subscribe make no sense. Does this kind of state information have any relevance to Joe's 3 steps?
Anyway, given the example/mock service in that gist, I'm wondering how someone would go about designing a program to wrap this service up in an Actory fashion. I would just like to see a list of guidelines on how to apply Joe's 3 steps. Bonus points for writing some code (any language).
Generally, when structuring an application to use actors you have to identify the concurrent features of your application, which can be tricky to get the hang of. You identify 5 concurrent "services":
Start
LoginGateway
LogoutGateway
Stop
Subscribe
1, 4 and 5 seem to be types of messages that can flow through the system, 2 and 3 I'm not sure how to describe. Your gist is rather large and not super clear to me, but it looks like you've got some kind of message queue system. The actions a User can take are:
Log in to the system
Log out of the system
Subscribe to a Queue of messages
I'll assume logging in and out requires some auth step. I'll assume further that if the user fails the auth step their connection is broken but that creating a connection is not sufficient authentication.
The actions the System takes are:
Handling User actions
Routing messages to subscribers of a Queue
If that's not broadly true, let me know and I'll change this answer. (I'll assume that the messages that get sent to users are not generated by users but are an intrinsic part of the System; maybe we're discussing a monitoring service.) Anyhow, what is concurrent here? A few things:
Users act independently of one another
Queues have separate states
An actor based architecture represents each concurrent entity as its own process. The User is a finite state machine which authenticates, subscribes to a queue, alternatively receives messages and subscribes to more queues and eventually disconnects. In Erlang/OTP we'd represent this by a gen_fsm. The User process carries all the state needed to interact with the client which, if we're exposing a service over a network, would be a socket.
Authentication implies that the System is itself a 'process', though, more likely than not it's really a collection of processes which in Erlang/OTP we call an application. I digress. For simplification we'll assume that System is itself a single process which has some well-defined protocol and a state that keeps user credentials. User logins are, then, a well-defined message from a User process to the System process and the response therefrom. If there were no authentication we'd have no need for a System process as the only state related to a User would be a socket.
The careful reader will ask where do we accept socket connections for each User? Ah, good question. There's another concurrent entity in not mentioned, which we'll call here the Listener. It's another process that only listens for connections, creates a User for each new established socket and hands over ownership to the new User process, then loops back to listen.
The Queue is also a finite state machine. From its start state it accepts User subscription requests via a well-defined protocol, broadcasts messages to subscribers or accepts unsubscribe requests from User processes. This implies that the Queue has an internal store of User processes, the details of which are very dependent on language and need. In Erlang/OTP, for example, each Queue process would be a gen_server which stored User process ids--or PIDs--in a list and for each message to transmit simply did a multi-send to each User process in the list.
(In Erlang/OTP we'd user supervisors to ensure that processes stay alive and are restarted on death, which simplifies greatly the amount of work an Erlang developer has to do to ensure reliability in an actor-based architecture.)
Basically, to restate what Joe wrote, actor based architecture boils down to these points:
identify concurrent entities in the system and represent them in the implementation by processes,
decide how your processes will send messages (a primitive operation in Erlang/OTP, but something that has to be implemented explicitly in C or Ruby) and
create well-defined protocols between entities in the system which hide state modification.
It's been said that the Internet is the world's most successful actor based architecture and, really, that's not far off.

Distributed time synchronization and web applications

I'm currently trying to build an application that inherently needs good time synchronization across the server and every client. There are alternative designs for my application that can do away with this need for synchronization, but my application quickly begins to suck when it's not present.
In case I am missing something, my basic problem is this: firing an event in multiple locations at exactly the same moment. As best I can tell, the only way of doing this requires some kind of time synchronization, but I may be wrong. I've tried modeling the problem differently, but it all comes back to either a) a sucky app, or b) requiring time synchronization.
Let's assume I Really Really Do Need synchronized time.
My application is built on Google AppEngine. While AppEngine makes no guarantees about the state of time synchronization across its servers, usually it is quite good, on the order of a few seconds (i.e. better than NTP), however sometimes it sucks badly, say, on the order of 10 seconds out of sync. My application can handle 2-3 seconds out of sync, but 10 seconds is out of the question with regards to user experience. So basically, my chosen server platform does not provide a very reliable concept of time.
The client part of my application is written in JavaScript. Again we have a situation where the client has no reliable concept of time either. I have done no measurements, but I fully expect some of my eventual users to have computer clocks that are set to 1901, 1970, 2024, and so on. So basically, my client platform does not provide a reliable concept of time.
This issue is starting to drive me a little mad. So far the best thing I can think to do is implement something like NTP on top of HTTP (this is not as crazy as it may sound). This would work by commissioning 2 or 3 servers in different parts of the Internet, and using traditional means (PTP, NTP) to try to ensure their sync is at least on the order of hundreds of milliseconds.
I'd then create a JavaScript class that implemented the NTP intersection algorithm using these HTTP time sources (and the associated roundtrip information that is available from XMLHTTPRequest).
As you can tell, this solution also sucks big time. Not only is it horribly complex, but only solves one half the problem, namely giving the clients a good notion of the current time. I then have to compromise on the server, either by allowing the clients to tell the server the current time according to them when they make a request (big security no-no, but I can mitigate some of the more obvious abuses of this), or having the server make a single request to one of my magic HTTP-over-NTP servers, and hoping that request completes speedily enough.
These solutions all suck, and I'm lost.
Reminder: I want a bunch of web browsers, hopefully as many as 100 or more, to be able to fire an event at exactly the same time.
Let me summarize, to make sure I understand the question.
You have an app that has a client and server component. There are multiple servers that can each be servicing many (hundreds) of clients. The servers are more or less synced with each other; the clients are not. You want a large number of clients to execute the same event at approximately the same time, regardless of which server happens to be the one they connected to initially.
Assuming that I described the situation more or less accurately:
Could you have the servers keep certain state for each client (such as initial time of connection -- server time), and when the time of the event that will need to happen is known, notify the client with a message containing the number of milliseconds after the beginning value that need to elapse before firing the event?
To illustrate:
client A connects to server S at time t0 = 0
client B connects to server S at time t1 = 120
server S decides an event needs to happen at time t3 = 500
server S sends a message to A:
S->A : {eventName, 500}
server S sends a message to B:
S->B : {eventName, 380}
This does not rely on the client time at all; just on the client's ability to keep track of time for some reasonably short period (a single session).
It seems to me like you're needing to listen to a broadcast event from a server in many different places. Since you can accept 2-3 seconds variation you could just put all your clients into long-lived comet-style requests and just get the response from the server? Sounds to me like the clients wouldn't need to deal with time at all this way ?
You could use ajax to do this, so yoǘ'd be avoiding any client-side lockups while waiting for new data.
I may be missing something totally here.
If you can assume that the clocks are reasonable stable - that is they are set wrong, but ticking at more-or-less the right rate.
Have the servers get their offset from a single defined source (e.g. one of your servers, or a database server or something).
Then have each client calculate it's offset from it's server (possible round-trip complications if you want lots of accuracy).
Store that, then you the combined offset on each client to trigger the event at the right time.
(client-time-to-trigger-event) = (scheduled-time) + (client-to-server-difference) + (server-to-reference-difference)
Time synchronization is very hard to get right and in my opinion the wrong way to go about it. You need an event system which can notify registered observers every time an event is dispatched (observer pattern). All observers will be notified simultaneously (or as close as possible to that), removing the need for time synchronization.
To accommodate latency, the browser should be sent the timestamp of the event dispatch, and it should wait a little longer than what you expect the maximum latency to be. This way all events will be fired up at the same time on all browsers.
Google found the way to define time as being absolute. It sounds heretic for a physicist and with respect to General Relativity: time is flowing at different pace depending on your position in space and time, on Earth, in the Universe ...
You may want to have a look at Google Spanner database: http://en.wikipedia.org/wiki/Spanner_(database)
I guess it is used now by Google and will be available through Google Cloud Platform.

Resources