Consul health check script is not showing output on UI - consul

Im not able to see the script output on consul UI...
The script runs but the output is not seen
What is that I'm missing or going wrong please help ! :/ :/

The following information is correct for Consul up to 0.7.2 and is subject to change in the future.
The output of a check is only updated in real time when the state of the check changes (i.e. when it goes from OK to WARNING or CRITICAL to OK). The actual text of the check will be updated periodically based Consul's anti-entropy runs, which default to happening every 10 minutes, iirc. If you're patient, the output will be updated. Or if you go to the Consul agent running the check and query the appropriate /v1/agent endpoint, it should be real-time. But if you query through the Consul Server's catalogs, it can be delayed.
This trade off in freshness was made due to scalability reasons and not wanting to continually stream hojilions of check updates into a single set of servers.

Related

Best way to monitor health of a spring app set up as a lambda function

I have a spring batch application that is hosted on AWS and runs as a lambda function. This lambda function is triggered when a file is dropped in the corresponding S3 bucket.
My question is: What would be the best way to perform health checks in this scenario? If this were a regular service running in EC2 (i.e. constantly running), I'd just schedule a health check to run after a fixed time interval, but since this lambda only runs for a couple of minutes at most, I'm not sure how I should proceed. I was thinking of just simply setting the health check status based on the individual reader and writer steps somehow. For instance, if the job was able to read successfully, return status UP, else, return some other status.
I do also want to note that the health of this app will need to be documented in splunk via logs.
Please let me know if there is a better solution. I'm new to health checks so my implementation might be incorrect.

Concepts to write code to monitor running application on the server

Requirement:-
I have to write code to monitor all the running applications on the server and give their name as output if it's down.
Research:-
During my research I found that:-
There are several tools like azure and monito that themselves monitor all the applications but this does not match our requirements.
We can write code that can check all the running services on the local desktop or the server and from there we can also check the running status of the required applications and if the status is stopped or sleep then we can easily notify.
We can send requests to the deployed URL at some regular interval and if we get a response status rather than 200 then we can notify the user as something is wrong and this particular website is not working.
If anyone can through some light on this and can suggest some more methods or references from their experience, it will be highly appreciated.

Conditionally restart a service

I've just learned how to use notifications and subscriptions in Chef to carry out actions such as restarting services if a config file is changed.
I am still learning chef so may just have not got to this section yet but I'd like to know how to do the actions conditionally.
Eg1 if I change a config file for my stand alone apache server I only want to restart the service if we are outside core business hours ie the current local time is between 6pm and 6am. If we are in core business hours I want the restart to happen but at a later time, outside core hours.
Eg2 if I change a config file for my load balanced apache server cluster I only want restart the service if a) the load balancer service status is "running" and b) all other nodes in the cluster have their apache service status as running ie I'm not taking down more than one node in the cluster at once.
I imagine we might need to put the action in a ruby block that either loops until the conditions are met or sets a flag or creates a scheduled task to execute later but I have no idea what to look for to learn how best to do this.
I guess this topic is kind of philosophical. For me, Chef should not have a specific state or logic beyond the current node and run. If I want to restart at a specific time, I would create a cron job with a conditional and just set the conditional with chef (Something like debian's /var/run/reboot-required). Then crond would trigger the reboot.
For your second example, the LB should have no issues to deal with a restarting apache backend and failover to another backend. Given that Chef runs regulary with something called "splay" the probability is very low that no backend is reachable. Even with only 2 backends. That said, reloading may be the better way.

How to download 300k log lines from my application?

I am running a job on my Heroku app that generates about 300k lines of log within 5 minutes. I need to extract all of them into a file. How can I do this?
The Heroku UI only shows logs in real time, since the moment it was opened, and only keeps 10k lines.
I attached a LogDNA Add-on as a drain, but their export also only allows 10k lines export. To even have the option of export, I need to apply a search filter (I typed 2020 because all the lines start with a date, but still...). I can scroll through all the logs to see them, but as I scroll up the bottom gets truncated, so I can't even copy-paste them myself.
I then attached Sumo Logic as a drain, which is better, because the export limit is 100k. However I still need to filter the logs in 30s to 60s intervals and download separately. Also it exports to CSV file and in reverse order (newest first, not what I want) so I have to still work on the file after its downloaded.
Is there no option to get actual raw log files in full?
Is there no option to get actual raw log files in full?
There are no actual raw log files.
Heroku's architecture requires that logging be distributed. By default, its Logplex service aggregates log output from all services into a single stream and makes it available via heroku logs. However,
Logplex is designed for collating and routing log messages, not for storage. It retains the most recent 1,500 lines of your consolidated logs, which expire after 1 week.
For longer persistence you need something else. In addition to commercial logging services like those you mentioned, you have several options:
Log to a database instead of files. Something like Apache Cassandra might be a good fit.
Send your logs to a logging server via Syslog (my preference):
Syslog drains allow you to forward your Heroku logs to an external Syslog server for long-term archiving.
Send your logs to a custom logging process via HTTPS.
Log drains also support messaging via HTTPS. This makes it easy to write your own log-processing logic and run it on a web service (such as another Heroku app).
Speaking solely from the Sumo Logic point of view, since that’s the only one I’m familiar with here, you could do this with its Search Job API: https://help.sumologic.com/APIs/Search-Job-API/About-the-Search-Job-API
The Search Job API lets you kick off a search, poll it for status, and then when complete, page through the results (up to 1M records, I believe) and do whatever you want with them, such as dumping them into a CSV file.
But this is only available to trial and Enterprise accounts.
I just looked at Heroku’s docs and it does not look like they have a native way to retrieve more than 1500 and you do have to forward those logs via syslog to a separate server / service.
I think your best solution is going to depend, however, on your use-case, such as why specifically you need these logs in a CSV.

IIS 7 request duration monitoring

i am curious if there is a way of monitoring the request duration time on an iis server. Personally I have came up with a solution but it's really resource intensive and that is why i'm asking the question, just to gather more opinions.
My plan is to extract the duration time of each request and send it to graphite so as to have a real time overview of the performance of the webserver. The idea i've came up with is to use poweshell with its webadministration module. And if you run get-item IIS:\AppPools\DefaultAppPool | Get-WebRequest for example you get all the requests on that app pool with a lot of info including the time info.
The thing is that i should have a script which runs every 100 ms to get all requests and that is kinda wasteful. Is there a way to tell iis to put the request duration time(in miliseconds) in the logs? Because then it would be much easier to get the information I need.
I don't know if there is such a feature on IIS, but I've done the same (sending iis page times to graphite) by using a reverse proxy between internet and the iis server, like nginx.
The proxy module from nginx allow you to log on each request the time the backend took to produce the page.
Also, having a proxy like nginx in fron of an IIS could be very helpful if you have to deal with visits with slow connections, nginx will store the reply from backend, drop backend connection and wait until visitor gets all the content. Highly recommended.
In case you go this route, you should use logster (also from etsy guys) or logstash to parse nginx logs each period of time you want (likely every minute).
Seems that there is a feature that logs requests based on a regex, and it's called Advanced Logging Module. You can specify from a number of fields what you want to get loged and it's W3C compliant. In my case i had time take as a filed which can be specified and that was what i was looking for. After that i written a script in powershell which parses the logs and gets the information i need, constructs a metric and sends it to statsd which in term sends it to powershell.
The method i chose for the log parsing was the following: in the script i used get-content comandlet from powershell to gather all the logs in one file(yes iis breaks the logs in multiple files, and i'm guessing the number of logs is dependent on the number of your working processes but i'm not sure). This was the first iteration in a second iteration i gather all the logs in another file and make a diff between the first file and the latter and only the difference gets processed.
I chose this method because it's i thought it wold be better to have the minimum regex processing. The next step is erasing the first file of accumulated logs and moving the second one in pace of the first that was erased and running the script again, so to have always a method of comparison. Also the log rollover is at one hour, after which the logs are erased.

Resources