IVR technical implementation details

I searched the web extensively for IVR technical implementation details, but could not find appropriate results.
Can anyone help me with details on what an IVR is? I mean the technical components involved, how they interact and integrate, how calls flow, which networks, servers and databases are involved, and what input and output responses are required. I looked at VoiceXML in detail, but how is it
I need to know the technical architecture of an IVR.
I need to develop an IVR system in Java, but first I would like to understand the above so that I have a solid foundation.

Ricky from Twilio here.
We built an example IVR in Java you can check out. With our architecture, when someone makes a phone call to our IVR, Twilio makes an HTTP request to our server, and the server answers with some basic instructions, written in TwiML, describing what we'd like to respond back to the user with. Here's the code from our example; we're playing an MP3 and listening for the user to press a digit on their keypad using the <Gather> verb:
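protected void doPost(HttpServletRequest servletRequest, HttpServletResponse servletResponse)
        throws IOException {
    Gather gather = new Gather();
    gather.setAction("/menu/show"); // assumed: the action route named in the text below
    Play play = new Play("http://howtodocs.s3.amazonaws.com/et-phone.mp3");
    try {
        gather.append(play); // assumed: nest the <Play> inside the <Gather>
    } catch (TwiMLException e) {
        throw new RuntimeException(e); // assumed: catch body missing from the post
    }
    TwiMLResponse twiMLResponse = new TwiMLResponse();
    try {
        twiMLResponse.append(gather); // assumed: add the <Gather> to the response
    } catch (TwiMLException e) {
        throw new RuntimeException(e); // assumed: catch body missing from the post
    }
    servletResponse.getWriter().write(twiMLResponse.toXML()); // assumed: return the TwiML
}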
Once the user presses a digit, another HTTP request will be made to the action route we specified (/menu/show in this case), where we look at what digit the user pressed and take an action:
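protected void doPost(HttpServletRequest servletRequest, HttpServletResponse servletResponse)
        throws IOException {
    String selectedOption = servletRequest.getParameter("Digits");
    Map<String, String> optionPhones = new HashMap<>();
    optionPhones.put("2", "+12024173378");
    optionPhones.put("3", "+12027336386");
    optionPhones.put("4", "+12027336637");
    TwiMLResponse twiMLResponse = null;
    try {
        // Known digit: dial the matching number. Anything else: back to the main menu.
        twiMLResponse = optionPhones.containsKey(selectedOption)
                ? dial(optionPhones.get(selectedOption))
                : Redirect.toMainMenu();
    } catch (TwiMLException e) {
        throw new RuntimeException(e); // assumed: catch body missing from the post
    }
    // Assumed: send the TwiML back as the HTTP response.
    servletResponse.getWriter().write(twiMLResponse.toXML());
}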
Hope taking a look at how we built this IVR helps!

Do you want to write everything yourself or will you have a framework like VXML to work with? If you just want to write the VXML and JSP files then you need to have a VXML browser. If you want to write everything completely yourself then making a VXML browser is probably overkill and, regardless of whether you make a VXML browser or something else, you will need to worry about abstracting the hardware - an IVR with one voice/fax/modem will need different low level code than an IVR with Dialogic cards connected to T1 lines and that would be different than one that handles just SIP calls.
Assuming you have a VXML browser already and you just need to provide the VXML and JSP files, then what you need to worry about is whether you just want call flow or if you are going to do back-end integration. If your IVR is just going to answer the call, ask for some input from the caller and then play more info and hang up or transfer then it gets really easy - you don't need Java at all. The Java is needed for the back end integration.
Assuming you are going to have back end integration - whether it is just a database or web services to another server you need to worry about doing the back end calls asynchronously - if callers hear more than a second of dead air without being warned they will think the IVR is not working and will hang up. So, when the call arrives you need to send your initial request for data, then say "Welcome to my IVR" and then attempt to retrieve the result. If the result is not yet returned you need to say something else like "Please wait while I retrieve your details" and then check again. Eventually if the request doesn't return you need a fallback plan - you can either say "That service isn't currently available" and then transfer or hang up or you could offer a reduced service IVR. Whatever you do, you don't want the customer to ever hear more than a second of silence unless you have specifically told them you are waiting for something - either waiting for them to give input or waiting for their account details (or something similar).
To have this kind of asynchronous experience with VXML and JSPs you will need an in-memory queue of requests and an execution service that can provide worker threads to service those requests. That way you can queue a request and continue the IVR call flow, checking periodically for a result. The execution service will eventually process the request and update it with the result. Then, when the IVR checks and the result is available, it can use that info. But if the result doesn't come back in time the IVR will give up and stop checking, so you also need a long-lived housekeeping thread that scans the queue and, after a certain length of time, cancels the request if the execution service is still processing it and then deletes the request from the queue.
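As a rough illustration of the queue-plus-worker-pool idea above, here is a minimal Java sketch. The class name `PendingLookups`, its methods, the pool size and the timings are all made up for this example and are not part of any VXML platform. The assumption is that a JSP calls `submit` when the call arrives and `poll` each time the call flow loops back, and that a scheduled task plays the role of the housekeeping thread.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: in-memory queue of back-end requests, keyed by call ID. */
class PendingLookups {
    // Daemon threads so an idle pool never keeps the JVM alive.
    private static final ThreadFactory DAEMON = r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    };
    private final ExecutorService workers = Executors.newFixedThreadPool(4, DAEMON);
    private final ScheduledExecutorService reaper =
            Executors.newSingleThreadScheduledExecutor(DAEMON);
    private final Map<String, Future<String>> pending = new ConcurrentHashMap<>();

    /** Queue the request, then let the call flow carry on playing prompts. */
    void submit(String callId, Callable<String> lookup, long maxAgeMillis) {
        Future<String> future = workers.submit(lookup);
        pending.put(callId, future);
        // Housekeeping: cancel and drop the request if it is still queued after maxAgeMillis.
        reaper.schedule(() -> {
            if (pending.remove(callId, future)) {
                future.cancel(true);
            }
        }, maxAgeMillis, TimeUnit.MILLISECONDS);
    }

    /** Non-blocking check the call flow can make between prompts. */
    Optional<String> poll(String callId) {
        Future<String> future = pending.get(callId);
        if (future == null || !future.isDone() || future.isCancelled()) {
            return Optional.empty(); // unknown, still running, or given up on
        }
        pending.remove(callId, future);
        try {
            return Optional.ofNullable(future.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty(); // back-end call failed: use the fallback plan
        }
    }

    void shutdown() {
        workers.shutdownNow();
        reaper.shutdownNow();
    }
}
```

An empty result from `poll` means "not ready, failed, or timed out", so the call flow can treat all three the same way: keep stalling for a while, then fall back to the reduced service.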
A VXML browser queues the voice prompt and doesn't wait for it to actually be played until caller input is collected, so if you are using voice to stall while you retrieve data, the voice prompt will need to be attached to a grammar that doesn't accept any valid input, just so that the IVR knows when the voice is finished. If you absolutely need the result of the back-end request before continuing the call flow, you will need to loop around checking for the result until it either arrives or a smallish timeout has elapsed (how long depends on whether you warned the caller it could take a while or not). The same thing applies in this case - you will need to play a small silence attached to a grammar so that the call flow waits before checking again for the result - there normally isn't much point checking more often than every 100-200 ms.
If you aren't going to use a VXML browser but instead will write something yourself then the same advice mostly applies. But if you are going to have back end integration I would recommend making the system always wait for the voice prompt to finish playing instead of just queueing it - it makes everything MUCH easier. You will still need the in-memory queue and an execution pool so that the back-end integration can be done in the background.


How to manage a slow callback function in the ESPAsyncWebServer library

I understand that delaying or yielding inside ESPAsyncWebServer library callbacks is a no-no. However, my callback function needs to query another device via the serial port. This process is slow and crashes the ESP32 as a result.
Here is an example:
void getDeviceConfig(AsyncWebServerRequest *request) {
AsyncResponseStream *response =
StaticJsonDocument<1024> doc;
JsonArray array = doc.createNestedArray("get");
for (size_t i = 0; i < request->params(); i++)
serializeJson(doc, Serial);
/* At this point, the remote device determines what is being asked for
and builds a response. This can take fair bit of time depending on
what is being asked (>1sec) */
I looked into building a response callback. However, I would need to know ahead of time how much data the remote device will generate. There's no way for me to know this.
I also looked into using a chunked response. In this case, the library will continuously call my callback function until I return 0 (which indicates that there is no more data). This is a good start, but it doesn't quite fit. I can't tell the caller that more data is definitely coming when I just haven't received a single byte yet. All I can do here is return 0, which stops the caller.
Is there an alternative approach I could use here?
The easiest way to do this without major changes to your code is to separate the request and the response and poll periodically for the results.
Your initial request as you have it written would initiate the work. The callback handler would set a global boolean variable indicating there was work to be done and, if there were any parameters for the work, would save them in globals. Then it would return, and the client would see the HTTP request complete but wouldn't have an answer yet.
In loop() you'd look for the boolean that there was work to be done, do the work, store any results in global variables, set a different global boolean indicating that the work was done, and set the original boolean that indicated work needed to be done to false.
You'd write a second HTTP request that checked to see if the work was complete, and issue that request periodically until you got an answer. The callback handler for the second request would check the "work was done" boolean and return either the results or an indication that the results weren't available yet.
Doing it this way would likely be considered hostile on a shared server or public API, but you have 100% of the ESP32 at your disposal so while it's wasteful it doesn't matter that it's wasteful.
It would also have problems if you ever issued a new request to do work before the first one was complete. If that is a possibility you'd need to move to a queueing system where each request created a queue entry for work, returned an ID for the request, and then the polling request to ask if work was complete would send the ID. That's much more complicated and a lot more work.
An alternate solution would be to use websockets. ESPAsyncWebServer supports async websockets. A websocket connection stays open indefinitely.
The server could listen for a websocket connection and then, instead of performing a new HTTP request for each query, the client would send an indication over the websocket that it wanted the server to do the work. The websocket callback would work much the same way as the regular HTTP server callback I wrote about above. But when the work was complete, the code doing it would just write the result back to the client over the websocket.
Like the polling approach this would get a lot more complicated if you could ever have two or more overlapping requests.

Which of these is the best practice for web sockets in terms of performance?

This is more of a hypothetical question, so I can't really show any code examples. Imagine if a site like Twitter wanted to live-update stats on a Tweet via web sockets/Socket.io. In terms of performance, which of these would be the best approach?
1. Each action (like, retweet, reply) sends a message to the server, which then gets emitted to all clients, and the client is responsible for updating the appropriate tweet.
2. Each tweet the client loads is connected to a different room so that it only emits and receives messages relevant to itself.
Or perhaps it's dependent on the scale of the application? Maybe 1 is better if you had a Twitter clone with only a few users, whereas I would think 2 is better in Twitter's case because it's a matter of hundreds of "rooms" vs millions of signals/second? And if that's the case, at what point is one approach preferred over the other?
At scale, you do not want to be sending messages to clients that they did not ask for and do not have any use for. Imagine a twitter client that was receiving every single tweet being sent in real time. That could overwhelm that client and it would mean the server would be delivering every single tweet to every single connected client. That obviously doesn't scale on either the server side or the client side.
So option 1 is out.
The appropriate solution has the server send each client only the messages it has a particular interest in seeing. This works just fine at any scale. I can't tell whether your option 2 is that or not, since rooms are just a tool for making groups of connections that you can send the same message to - they don't really decide who gets what message - that logic must be baked into your server code.
For a twitter-like service, it seems you're going to have to have a system where your server can easily tell which users have an interest in this particular new message. That can presumably be for a number of reasons such as they are following the author, they are following a hashtag present in the message, they are mentioned in the message, etc... That is server-side logic, not just simple rooms.
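To make the "server decides who is interested" point concrete, here is a small Java sketch of such server-side logic. It is only an illustration under assumptions: `InterestRegistry`, the connection IDs and the topic strings such as "author:alice" or "tag:java" are invented for this example and have nothing to do with the Socket.io API. The server would record an interest when a client follows someone or opens a tweet, and look up the recipients before emitting anything.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory record of which connection cares about which topic. */
class InterestRegistry {
    private final Map<String, Set<String>> subscribersByTopic = new HashMap<>();

    /** Remember that this connection wants updates for the topic. */
    void subscribe(String connectionId, String topic) {
        subscribersByTopic.computeIfAbsent(topic, t -> new HashSet<>()).add(connectionId);
    }

    /** Forget the interest, e.g. when the tweet scrolls out of view. */
    void unsubscribe(String connectionId, String topic) {
        Set<String> subscribers = subscribersByTopic.get(topic);
        if (subscribers != null) {
            subscribers.remove(connectionId);
            if (subscribers.isEmpty()) {
                subscribersByTopic.remove(topic);
            }
        }
    }

    /** Connections that should receive a message touching any of the given topics. */
    Set<String> recipients(Collection<String> topics) {
        Set<String> result = new TreeSet<>(); // sorted, and each connection listed once
        for (String topic : topics) {
            result.addAll(subscribersByTopic.getOrDefault(topic, Collections.emptySet()));
        }
        return result;
    }
}
```

A new message by alice tagged #java would be looked up with the topics "author:alice" and "tag:java", and only the returned connections get the emit. Rooms could still be used as the delivery mechanism once this decision has been made.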

Is it a good idea to call third-party services from an Axon aggregate?

I have an Axon aggregate. It handles a command and, before applying an event, has to invoke a third-party service to validate some parameters; depending on the result of this validation I either apply the events or not. Is this good practice? Or should I do the validation before I send the command?
public class SomeAggregate {
    public void someHandler() {
        if(thirdPartyService.invoke) {
If it's a non-blocking (domain) service, something like a finite state machine, it's okay to call from within the aggregate, since it's most likely going to finish soon.
However, 'third party service' to me sounds like an outbound call, which might take some time.
When Axon loads an aggregate, it blocks the aggregate so no other thread can change its state or handle commands on it.
A third-party service would mean that the aggregate is blocked even longer.
Hence, I would suggest not calling a third party service in your aggregate.
Either call the service prior to entering the aggregate or perform a compensating action after command handling was finalized to revert the decision. Which of the two makes most sense in your scenario, is dependent on your domain. I see "pre-validation" through the third-party service however as the most reasonable option to take.
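A minimal sketch of the "pre-validation" option in plain Java follows. None of these types come from Axon: `ReserveCredit`, `PreValidatingSender`, the `Predicate` standing in for the third-party service and the `Consumer` standing in for whatever dispatches the command are all hypothetical names chosen for this example.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical command object; a plain class, not an Axon type. */
final class ReserveCredit {
    final String accountId;
    final String taxNumber;

    ReserveCredit(String accountId, String taxNumber) {
        this.accountId = accountId;
        this.taxNumber = taxNumber;
    }
}

/** Calls the slow third-party check first and only then sends the command. */
final class PreValidatingSender {
    private final Predicate<String> thirdPartyCheck;   // stand-in for the remote service
    private final Consumer<ReserveCredit> commandSink; // stand-in for the command dispatcher

    PreValidatingSender(Predicate<String> thirdPartyCheck, Consumer<ReserveCredit> commandSink) {
        this.thirdPartyCheck = thirdPartyCheck;
        this.commandSink = commandSink;
    }

    /** Returns true if the command was sent, false if validation rejected it. */
    boolean send(ReserveCredit command) {
        // The outbound call runs here, before any aggregate is loaded,
        // so no aggregate is locked while we wait for the remote service.
        if (!thirdPartyCheck.test(command.taxNumber)) {
            return false;
        }
        commandSink.accept(command);
        return true;
    }
}
```

With this shape the command handler in the aggregate only ever sees commands that already passed the external check, so it can decide quickly from its own state.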
It depends. If your third party service has side effects and isn't idempotent then I'm not sure what to do (I'm still trying to figure it out).
If it does have side effects, then I would want the aggregate to block / lock and use the aggregate's state / history to carefully manage such an interaction like this
public class SomeAggregate {
    public void someHandler() {
        // Reason about whether it's appropriate to send a request.
        // e.g. if a request has been sent but no response has been received,
        // then depending on the third party service it might be in an indeterminate state.
        // Instead of trying to interact with it, it might be better
        // to notify someone instead.
        // Effectively locks this interaction / any other instances in the same path
        // should get a concurrent modification exception when trying to commit this event.
        commit(new ThirdPartyServiceRequested())
        if(thirdPartyService.invoke) {
But Axon's 'unit of work' means that the emitted events won't be published / committed until the command handler has completed, so we can't guard in this manner.
Any ideas?

Does http have to be a request/response protocol?

I have to ask a plaintive question. I know that http is normally request-response. Can it be request-done?
We have a situation where we would like to send an ajax call off to one server, and then when that completes post a form to another server. We can't send them both without coordinating them, because the post makes the browser navigate to another server, and we lose our context.
What I am currently doing is making the first ajax call and then, in its callback, calling document['order-form'].submit(). My boss pointed out that if the ajax call doesn't complete for a while, the user will see his browser not making progress, even though it's still responsive. He wanted me to put a reasonable timeout on the ajax call.
But really, the ajax call is a "nice but not necessary" thing. My boss would be equally happy if we could send it and forget about it.
I'm having a lot of trouble formulating an appropriate query for Google. "Use HTTP like UDP" doesn't work. A lot of things don't work. Time to ask a human.
If you look at the ISO-OSI model of networking, HTTP is an application layer protocol and UDP is in the transport layer. HTTP typically uses TCP and rarely uses UDP. RTP (Realtime Transport Protocol) however uses UDP and is used for media streaming. Here is one more thing, UDP is not going to assure you a 100% transport, whereas TCP tries to (when packet loss is detected, TCP tries a re-transmission). So we expect drops in UDP. So when you say - fire and forget - What happens when your packet fails to reach?
So I guess you confused UDP with HTTP (I am sorry if that's not the case and there really is some way of using HTTP over UDP for web pages that I am not aware of right now).
The best way, IMHO, to co-ordinate an asynchronous process like this is to have an AJAX call (with CORS enabled if required) like what you have written currently, coupled with good UI/UX frontends which intelligently shows progress/status to the end user.
Also - maybe we could tune up the process which makes the AJAX response slower..say a DB call which is supposed to return data can be tuned up a bit.
Here's what Eric Bidelman says:
// Listen to the upload progress.
var progressBar = document.querySelector('progress');
xhr.upload.onprogress = function(e) {
    if (e.lengthComputable) {
        progressBar.value = (e.loaded / e.total) * 100;
        progressBar.textContent = progressBar.value; // Fallback for unsupported browsers.
    }
};
I think this has the germ of an answer. 1) We can find out when the request has entirely gone. 2) We can choose not to have handlers for the response.
As soon as you have been informed that the request has gone out, you can take your next step, including navigating to another page.
I'm not sure, however, how many browsers support xhr.upload.onprogress.
If something is worth doing, surely it's worth knowing whether what you requested was done or not. Otherwise how can you debug problems, or give any kind of good user experience?
A response is any kind of response, it need not carry a message body. A simple 204 response could indicate that something succeeded, as opposed to a 403 or 401 which may require some more action.
I think I've figured out the answer. And it is extremely simple. Good across all browsers.
Just add xhr.timeout = 100; to your ajax call. If it takes the server a full second to respond, you don't care. You already moved on at 1/10 second.
So in my case, I put document['order-form'].submit() in my timeout handler. When the browser navigates away, I am assured that the request has finished going out.
Doesn't use any esoteric knowledge of protocols, or any recent innovations.

What would be the best implementation to detect repeating SIP message?

I've written a SIP UAC and tried a few ways to detect and ignore repeated incoming messages from the UAS, but with every approach something went wrong. My problem is that all the messages belonging to the same call have the same signature, and comparing the entire message text is too expensive. So I was wondering which parts of a message I should look at when trying to detect these repeated messages.
I had a problem with an incoming OPTIONS request, which I handled by sending the server an empty OK response. (Update: after a while of testing I noticed that I still get another OPTIONS request every few seconds, so I tried responding with a Bad Request instead, and now I only get the OPTIONS request once or twice per registration/re-registration.)
Currently I get repeated Session in Progress messages, and various error messages such as Busy Here and unavailable. I get so many of these that they clutter my log, and I would like to filter them out.
Any idea how to achieve that?
I'll try your techniques before posting back; perhaps this will solve my problems.
Here is what I used, it works nicely:
private boolean compare(SIPMessage message1, SIPMessage message2) {
    if (message1.getClass() != message2.getClass())
        return false;
    if (message1.getCSeq().getSeqNumber() != message2.getCSeq().getSeqNumber())
        return false;
    if (!message1.getCSeq().getMethod().equals(message2.getCSeq().getMethod()))
        return false;
    if (!message1.getCallId().equals(message2.getCallId()))
        return false;
    if (message1.getClass()==SIPResponse.class)
        return false;
    return true;
}
It's a bit more complicated than ChrisW's answer.
First, the transaction layer filters out most retransmissions. For most messages it does this by comparing the received message against a list of current transactions. If a matching transaction is found, that transaction will mostly swallow retransmissions as per the diagrams in RFC 3261, section 17. For instance, an INVITE server transaction (on the UAS side) in the Proceeding state will absorb a delayed retransmitted INVITE instead of passing it up again.
Matching takes place in one of two ways, depending on the remote stack. If it's an RFC 3261 stack (the branch parameter on the topmost Via starts with "z9hG4bK") then things are fairly straightforward. Section 17.2.3 covers the full details.
Matching like this will filter out duplicate/retransmitted OPTIONS (which you mention as a particular problem). OPTIONS messages don't form dialogs, so looking at CSeq won't work. In particular, if the UAS sends out five OPTIONS requests which aren't just retransmissions, you'll get five OPTIONS requests (and five non-INVITE server transactions).
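For the RFC 3261 case, the request matching rule from section 17.2.3 (same branch on the topmost Via, same sent-by, same method, with ACK matching the INVITE that created the transaction) can be sketched as a lookup key. This is only an illustration: `TransactionMatcher` and its methods are hypothetical and not part of any SIP stack, the Via fields are assumed to be already parsed into strings, and the older RFC 2543 matching procedure is not covered.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper that recognises retransmitted requests by transaction key. */
class TransactionMatcher {
    static final String MAGIC_COOKIE = "z9hG4bK";
    private final Set<String> knownTransactions = new HashSet<>();

    /** Builds the key: top Via branch + sent-by + method (ACK counts as INVITE). */
    static String requestKey(String branch, String sentBy, String method) {
        if (branch == null || !branch.startsWith(MAGIC_COOKIE)) {
            throw new IllegalArgumentException("not an RFC 3261 branch: " + branch);
        }
        // An ACK matches the INVITE transaction with the same branch and sent-by.
        String keyMethod = method.equals("ACK") ? "INVITE" : method;
        // Host names compare case-insensitively, so normalise the sent-by value.
        return branch + "|" + sentBy.toLowerCase(Locale.ROOT) + "|" + keyMethod;
    }

    /** Returns true for the first request of a transaction, false for a retransmission. */
    boolean isNew(String branch, String sentBy, String method) {
        return knownTransactions.add(requestKey(branch, sentBy, method));
    }
}
```

Two OPTIONS requests with the same branch and sent-by map to the same key, so the second one is treated as a retransmission, while a fresh OPTIONS with a new branch starts a new transaction. A real stack would also expire keys when the transaction terminates.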
Retransmitted provisional responses to a non-INVITE transaction are passed up to the Transaction-User layer, or core as it's sometimes called, but other than the first one, final responses are not. (Again, you get this simply by implementing the FSM for that transaction - a final response puts a UAC non-INVITE transaction in the Completed state, which drops any further responses.)
After that, the Transaction-User layer will typically receive multiple responses for INVITE transactions.
It's perfectly normal for a UAS to send multiple 183s, at least for an INVITE. For instance it might immediately send a 100 to quench your retransmissions (over unreliable transports at least), then a few 183s, a 180, maybe some more 183s, and finally a 200 (or more, for unreliable transports).
It's important that the transaction layer hands up all these responses because proxies and user agents handle the responses differently.
At this level the responses aren't, in a way, retransmitted. I should say: a UAS doesn't use retransmission logic to send loads of provisional responses (unless it implements RFC 3262). 200 OKs to INVITEs are resent because they destroy the UAC transaction. You can avoid their retransmission by sending your ACKs timeously.
I think that a message is duplicate/identical, if its ...
and method name (e.g. "INVITE")
... values match that of another message.
Note that a response message has the same CSeq as the request to which it's responding, and that for a single request you may get several provisional but non-duplicate responses (e.g. RINGING followed by OK).