Amazon EC2 PublicDnsName - amazon-ec2

How to get PublicDnsName in amazonEC2?
we can get it from instance using ins.getPublicDnsName() but it get created after a certain time,is there any alternative way to get it?
Or some how get it as soon as it is generated?
for making it wait i did
while(flag) {
time = System.currentTimeMillis() - start;
for (Reservation res : ec2.describeInstances().getReservations()) {
for (Instance ins : res.getInstances()) {
if(ins.getState().getName().equalsIgnoreCase("running") || time == MAX_TIME_FOR_THREAD){
System.out.println(ins.getPublicDnsName()+"#########"+ins.getInstanceId());
flag = false;
break;
}
}
}
}
but inside a run of thread but while I am creating multiple ec2 resources it is returning dns of first up machine multiple times where as i feel it should return different dns address.

There is always a slight delay between the instance transitioning to the 'Running' state and the public IP address (and therefore the public DNS name) being allocated - and a longer one if you're also automatically allocating an Elastic IP. It can vary from a second or two up to 5-10 seconds on a busy cluster.
There's nothing you can do but introduce a short delay after detecting the 'Running' state before querying the API - although in your code you're not actually waiting long enough for the instance to get to 'Running' and just hitting MAX_TIME_FOR_THREAD, which is interestingly returning the last DNS name retrieved. This might actually be a small glitch in the API you're using.
In my launcher (written against the .NET API) I have a multithreaded staging loop which launches the instances, then a sleep wait for each to register as 'Running', and then a further sleep wait for the IP address to populate.

Related

How can I gracefully close my app when the EC2 instance is shut down?

I have a data processing app which takes many hours to run. One of our customers is running it on an AWS EC2 Windows 2012 instance which has a shutdown schedule defined. As a result, it keeps getting cut off part way through each run.
This doesn't pose a problem for the data processing itself: each data row is processed atomically, and it can be safely restarted afterwards without re-doing any of the data rows already processed. However, it does write lots of summary data to disk at the end of the process, which I could really do with seeing, and this data is not getting written.
The app is written in VB.NET WinForms. There is a form that displays progress on screen, and the processing logic is done via a BackgroundWorker component.
The form handles cancellation events as follows:
Private Sub MyBase_FormClosing(sender As Object, e As FormClosingEventArgs) Handles MyBase.FormClosing
' If the user tried to close the app, ask if they are sure, and if not, then don't close down.
If e.CloseReason = CloseReason.UserClosing AndAlso ContainsFocus AndAlso Not PromptToClose() Then
e.Cancel = True
Return
End If
' If the background worker is running, then politely request it to stop. Otherwise, let the form close down.
If worker.IsBusy Then
If Not worker.CancellationPending Then
worker.CancelAsync()
End If
e.Cancel = True
Else
Environment.ExitCode = AppExitCode.Cancelled
End If
End Sub
The background thread polls for cancellation once per data row processed, as follows:
Private Sub ReportProgress(state as UIState)
worker.ReportProgress(0, state)
If worker.CancellationPending Then
Throw New AbortException ' custom exception type
End If
End Sub
The top-level Catch block in the background thread handles all success and failure paths appropriately, writing all summary data to disk, before exiting the thread.
I have tested this extensively outside AWS. In particular, I have verified that, if you end the process via Task Manager, then it gracefully shuts down straight away without prompting the user, and writes all summary data to disk as expected. The graceful shutdown takes a few seconds at most.
My problem is, the summary data is not written to disk when AWS shuts down the EC2 instance on schedule. I don't know exactly what happens during the EC2 shutdown process, and my searching of the EC2 documentation has not yet shed light on this. I guess it could be one of:
AWS or Windows is terminating running processes without sending WM_CLOSE messages
AWS or Windows is giving insufficient time to each process to shut down
Can anyone clarify how the EC2 shutdown process works, or suggest how I can improve my code to handle it?

Performance Azure function with multiple output bindings

Hello all who read this,
We have written a router function on azure in an app plan that receives messages from iothub
and depending the message type we route our message to another eventhub.
Previously we had 6 out bindings to eventhubs in this function
Recently we added 3 more message type so 3 more out binding to 3 more eventhubs
No processing of the messages happen in this function but what we see now is that we spend 16 times more time in the routing function.
Is there a performance issue about having multiple output bindings.
We don't see an increase in load of the incoming messages.
We are running on azure functions 1.0 (Runtime version: 1.0.12205.0 (~1))
Regards Ben
Simplified Sample code of the routing function
public static class IotHubRouterFunction
{
[FunctionName("IotHubRouterFunction")]
public static void Run([EventHubTrigger("%iothub%", Connection = "IothubRouterListen")]EventData myEventHubData,
[EventHub("%msg1-eventhub%", Connection = "msg1event")] ICollector<EventData> eventHub4Dmsg1Event,
[EventHub("%msg2-eventhub%", Connection = "msg2event")] ICollector<EventData> eventHub4Dmsg2Event,
[EventHub("%msg3-eventhub%", Connection = "msg3event")] ICollector<EventData> eventHub4Dmsg3Event,
//... like 6 more bindings like this
ILogger logger
)
{
try
{
var messageType = GetValue(myEventHubData.Properties, "type");
// routing
switch (messageType)
{
case "msg1event":
{
eventHub4DevicesStatusChanged.Add(eventHub4Dmsg1Event);
break;
}
case "msg2event":
{
eventHub4MeasurementLog.Add(eventHub4Dmsg2Event);
break;
}
case "msg3event":
{
eventHub4DeviceDiscovered.Add(eventHub4Dmsg3Event);
break;
}
//6 more cases like this
default:
{
logger.LogError("Unrouteable message of type: {messageType}", messageType);
break;
}
}
}
catch (Exception ex)
{
//removed
}
}
}
With 6 bindings the message fly through the router function at 50ms
With 9 bindings the message crawl through the router function at 800ms
CPU raised with 30% as well on the applan (we scaled extra so we have it under control but why so much what is causing this)
A little late with the follow up of what happened
In the end we found out what was going on
We have several instances of our app plan
but the old monitoring solution showed the average of the cpu and memory overall the instances of the applan.
Basically with switching to the newer metrics and azure monitoring we were able to drill down in the separate instances of the app plan and the instances of the functions.
We found out that one instance of a function which was running three times two of them norammly but the third function had crashed it's internal apppool and consumed all cpu power it got hold off and did absolutely nothing.
We restarted the function and all issues were gone.
Still wondering if it was something in our code that made it go through the roof
or that something happened in azure that made it go crazy.
:-s
When you are using Azure Function under App service plan then you have to watch out for performance parameters like scaling. Have you investigated your function is not getting overloaded ?
On the other hand , As part of your design this approach is wrong to me. With this many bindings there could be potential performance issues , and what if you are supposed to add more bindings in future ? If you are not performing any operation then you shouldn't be taking overhead of redirecting messages.
Event Grid
We can use event grids for that. Based on topic the IoT hub publishes the event to a topic and events are consumed by subscribers in your case other event hubs. You also get advantage of micro billing (serverless) and auto scaling as well. https://learn.microsoft.com/en-us/azure/event-grid/overview

How can i terminate myself if i run too long?

I have a application that runs periodically (it's a scheduled task). The task is launched once a minute, and normally only takes a few seconds to do its business, then exits.
But there's a ~1 in 80,000 chance (every two or three months) that the application will hang. The root cause is because we're using Microsoft ServerXmlHttpRequest component to perform some work, and sometimes it just decides to hang. The virtue of ServerXmlHttpRequest over XmlHttpRequest is that the latter is not recommended for important scenarios, such as where reliability and security are important (which is true of an unattended server component):
The ServerXMLHTTP object offers functionality similar to that of the XMLHTTP object. Unlike XMLHTTP, however, the ServerXMLHTTP object does not rely on the WinInet control for HTTP access to remote XML documents. ServerXMLHTTP uses a new HTTP client stack. Designed for server applications, this server-safe subset of WinInet offers the following advantages:
Reliability — The HTTP client stack offers longer uptimes. WinInet features that are not critical for server applications, such as URL caching, auto-discovery of proxy servers, HTTP/1.1 chunking, offline support, and support for Gopher and FTP protocols are not included in the new HTTP subset.
Security — The HTTP client stack does not allow a user-specific state to be shared with another user's session. ServerXMLHTTP provides support for client certificates.
The job is being run as a scheduled task. I need the task to continue to run periodically; killing the existing process if it's dead.
The Windows Task Scheduler does have an option for forcibly close a task that is running too long:
The only downside to that approach is that it simply doesn't work - it simply does not stop the task. The hung process keeps running.
Given that i cannot trust the Microsoft ServerXmlHttpRequest to not arbitrarily lock up, and the task scheduler is unable to terminate the scheduled task, i need some way to do it myself.
Jobs
I tried looking into using the Job Objects API:
A job object allows groups of processes to be managed as a unit. Job objects are namable, securable, sharable objects that control attributes of the processes associated with them. A job can enforce limits such as working set size, process priority, and end-of-job time limit on each process that is associated with the job.
That one note sounded like exactly what i needed:
A job can enforce limits such as end-of-job time limit on each process that is associated with the job.
The only down-side to that approach is that it does not work. Job cannot impose a time-limit on a process. They can only impose a user time limit on a process:
PerProcessUserTimeLimit
If LimitFlags specifies JOB_OBJECT_LIMIT_PROCESS_TIME, this member is the per-process user-mode execution time limit, in 100-nanosecond ticks.
If the process is idle (for example, sitting at a MsgWaitForSingleObject as ServerXmlHttpRequest is), then it will accumulate no user time. I tested it. I created a job with a 1 second time limit, and placed my self process into it. As long as i don't move the mouse around my test application, it quite happily sits there for longer than one second.
Watchdog Thread
The only other technique i can imagine, given that my main thread is indefinitely blocked, is another thread. The only solution i can imagine is spawn another thread that will sleep for my three minutes, then ExitProcess:
Int32 watchdogTimeoutSeconds = FindCmdLineSwitch("watchdog", 0);
if (watchdogTimeoutSeconds > 0)
Thread thread = new Thread(KillMeCallback, new IntPtr(watchdogTimeoutSeconds));
void KillMeCallback(IntPtr data)
{
Int32 secondsUntilProcessIsExited = data.ToInt32();
if (secondsUntilProcessIsExited <= 0)
return;
Sleep(secondsUntilProcessIsExited*1000); //seconds --> milliseconds
LogToEventLog(ExtractFilename(Application.ExeName),
"Watchdog fired after "+secondsUntilProcessIsExited.ToString()+" seconds. Process will be forcibly exited.", EVENTLOG_WARNING_TYPE, 999);
ExitProcess(999);
}
And that works. The only downside is that it's a bad idea.
Can anyone think of anything better?
Edit
For now i will implement a
Contoso.exe /watchdog 180
So the process will be exited after 180 seconds. It means the duration is configurable, or can be removed completely easily in the field.
I used the route where i pass a special WatchDog argument to my process on the command line;
>Contoso.exe /watchdog 180
During initialization i check for the presence of the WatchDog option, with an integer number of seconds after it:
String s = Toolkit.FindCmdLineOption("watchdog", ["/", "-"]);
if (s <> "")
{
Int32 seconds = StrToIntDef(s, 0);
if (seconds > 0)
RunInThread(WatchdogThreadProc, Pointer(seconds));
}
and my thread procedure:
void WatchdogProc(Pointer Data);
{
Int32 secondsUntilProcessIsExited = Int32(Data);
if (secondsUntilProcessIsExited <= 0)
return;
Sleep(secondsUntilProcessIsExited*1000); //seconds -> milliseconds
LogToEventLog(ExtractFileName(ParamStr(0)),
Format("Watchdog fired after %d seconds. Process will be forcibly exited.", secondsUntilProcessIsExited),
EVENTLOG_WARNING_TYPE, 999);
ExitProcess(2);
}

Rate limiting algorithm for throttling request

I need to design a rate limiter service for throttling requests.
For every incoming request a method will check if the requests per second has exceeded its limit or not. If it has exceeded then it will return the amount of time it needs to wait for being handled.
Looking for a simple solution which just uses system tick count and rps(request per second). Should not use queue or complex rate limiting algorithms and data structures.
Edit: I will be implementing this in c++. Also, note I don't want to use any data structures to store the request currently getting executed.
API would be like:
if (!RateLimiter.Limit())
{
do work
RateLimiter.Done();
}
else
reject request
The most common algorithm used for this is token bucket. There is no need to invent a new thing, just search for an implementation on your technology/language.
If your app is high avalaible / load balanced you might want to keep the bucket information on some sort of persistent storage. Redis is a good candidate for this.
I wrote Limitd is a different approach, is a daemon for limits. The application ask the daemon using a limitd client if the traffic is conformant. The limit is configured on the limitd server and the app is agnostic to the algorithm.
since you give no hint of language or platform I'll just give out some pseudo code..
things you are gonna need
a list of current executing requests
a wait to get notified where a requests is finished
and the code can be as simple as
var ListOfCurrentRequests; //A list of the start time of current requests
var MaxAmoutOfRequests;// just a limit
var AverageExecutionTime;//if the execution time is non deterministic the best we can do is have a average
//for each request ether execute or return the PROBABLE amount to wait
function OnNewRequest(Identifier)
{
if(count(ListOfCurrentRequests) < MaxAmoutOfRequests)//if we have room
{
Struct Tracker
Tracker.Request = Identifier;
Tracker.StartTime = Now; // save the start time
AddToList(Tracker) //add to list
}
else
{
return CalculateWaitTime()//return the PROBABLE time it will take for a 'slot' to be available
}
}
//when request as ended release a 'slot' and update the average execution time
function OnRequestEnd(Identifier)
{
Tracker = RemoveFromList(Identifier);
UpdateAverageExecutionTime(Now - Tracker.StartTime);
}
function CalculateWaitTime()
{
//the one that started first is PROBABLY the first to finish
Tracker = GetTheOneThatIsRunnigTheLongest(ListOfCurrentRequests);
//assume the it will finish in avg time
ProbableTimeToFinish = AverageExecutionTime - Tracker.StartTime;
return ProbableTimeToFinish
}
but keep in mind that there are several problems with this
assumes that by returning the wait time the client will issue a new request after the time as passed. since the time is a estimation, you can not use it to delay execution, or you can still overflow the system
since you are not keeping a queue and delaying the request, a client can be waiting for more time that what he needs.
and for last, since you do not what to keep a queue, to prioritize and delay the requests, this mean that you can have a live lock, where you tell a client to return later, but when he returns someone already took its spot, and he has to return again.
so the ideal solution should be a actual execution queue, but since you don't want one.. I guess this is the next best thing.
according to your comments you just what a simple (not very precise) requests per second flag. in that case the code can be something like this
var CurrentRequestCount;
var MaxAmoutOfRequests;
var CurrentTimestampWithPrecisionToSeconds
function CanRun()
{
if(Now.AsSeconds > CurrentTimestampWithPrecisionToSeconds)//second as passed reset counter
CurrentRequestCount=0;
if(CurrentRequestCount>=MaxAmoutOfRequests)
return false;
CurrentRequestCount++
return true;
}
doesn't seem like a very reliable method to control whatever.. but.. I believe it's what you asked..

measuring cross process latency on windows

I am building latency measurement into a communication middleware I am building. The way I have it working is that I periodically send a probe msg from my publishing apps. Subscribing apps receive this probe, cache it, and send an echo back at a time of their choosing, noting how much time the msg was kept “on hold”. The subscribing app receives these echos and calculates latency as (now() – time_sent – time_on_hold) / 2.
This kinda works, but the numbers are vastly different (3x) when “time on hold” is greater than 0. I.e if I echo the msg back immediately I get around 50us on my dev env and if I wait, then send the msg back the time jumps to 150us (though I discount whatever time I was on hold). I use QueryPerfomanceCounter for all measurements.
This is all inside a single Windows 7 box. What am I missing here?
TIA.
A bit more information. I am using the following to measure time:
static long long timeFreq;
static struct Init
{
Init()
{
QueryPerformanceFrequency((LARGE_INTEGER*) &timeFreq);
}
} init;
long long OS::now()
{
long long result;
QueryPerformanceCounter((LARGE_INTEGER*)&result);
return result;
}
double OS::secondsDiff(long long ts1, long long ts2)
{
return (double) (ts1-ts2)/timeFreq;
}
On the publish side I do something like:
Probe p;
p.sentTimeStamp = OS::now();
send(p);
Response r = recv();
latency=OS::secondsDiff(OS::now()- r.sentTimeStamp) - r.secondsOnHoldOnReceiver;
And on the receiver side:
Probe p = recv();
long long received = OS::now();
sleep();
Response r;
r.sentTimeStamp = p.timeStamp;
r.secondOnHoldOnReceiver = OS::secondsDiff(OS::now(), received);
send(r);
Ok, I have edited my answer to reflect your answer: Sorry for the delay, but I didn't notice that you had elaborated on the question by creating an answer.
It's seems that functionally you are doing nothing wrong.
I think that when you distribute your application outside of localhost conditions, the additional 100us (if it is indeed roughly constant) will pale into insignificance compared to the average latency of a functioning network.
For the purposes of answering your question am thinking that there is a thread/interrupt scheduling issue on the server side that needs to be investigated, as you do not seem to be doing anything on the client that is not accounted for.
Try the following test scenario:
Send two Probes to clients A and B. (all localhost)
Send the Probe to 'Client B' one second (or X/2 seconds) after you send the probe to Client A.
Ensuring that 'Client A' waits for two seconds (or X seconds) and 'Client B' waits 'one second (or X/2 seconds)
The idea being that hopefully, both clients will send back their probe answers at roughly the same time and both after a sleep/wait (performing the action that exposes the problem). The objective is to try to get one of the clients responses to 'wake up' the publisher to see if the next clients answer will be processed immediately.
If one of these returned probes is not showing the anomily (most likely the second response) it could point to the fact that the publisher thread is waking from a sleep cycle (on recv 1st responce) and is immediately available to process the second response.
Again, if it turns out that the 100us delay is roughly constant, it will be +-10% of 1ms which is the timeframe appropriate for realworld network conditions.

Resources