I have a bash script that runs every five minutes to get an updated file from an FTP server. I know when they generate the file: every 5 minutes, starting at 0 minutes past each hour. It usually takes them about 30 seconds to generate it, so I have my job offset by 1 minute (running every 5 minutes, starting at 1 minute past each hour). However, there are times when their server bogs down and takes longer (sometimes minutes) to generate the file, while it only takes about 13 seconds for me to download and process it. When that happens I end up getting only the first portion of the file and not the rest.
Is there a way to verify that what I just downloaded matches what is on the FTP server? I was thinking there might be a way to check that the file size of what I downloaded matches the size on the FTP server. Is that possible? My other thought was that, if it is, then depending on how quickly those two sizes can be compared, a delay may need to be built in to allow time for more data to be written to the file on the FTP server if it is still in progress. Thoughts/suggestions? Thanks in advance.
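One way to approach this, as a rough sketch: ask the FTP server for the file's size, wait, ask again, and only download once the size has stopped changing; then compare the reported size against the local copy. This assumes curl and GNU stat are available and that the server answers the SIZE command (curl issues it for a header-only FTP request). The host, credentials and paths below are placeholders, not your actual values:

#!/bin/bash
# Sketch only: host, credentials and paths are placeholders.
REMOTE_URL="ftp://ftp.example.com/path/to/data.csv"
LOCAL_FILE="/tmp/data.csv"
CREDS="user:password"

# Ask the server for the file size (curl reports it as Content-Length for -I over FTP).
get_remote_size() {
    curl -sI --user "$CREDS" "$REMOTE_URL" | awk '/Content-Length/ {print $2}' | tr -d '\r'
}

# If the file is still being generated its size keeps changing, so wait until
# two consecutive readings agree before downloading.
remote_size=$(get_remote_size)
while true; do
    sleep 15
    new_size=$(get_remote_size)
    [ "$new_size" = "$remote_size" ] && break
    remote_size=$new_size
done

curl -s --user "$CREDS" -o "$LOCAL_FILE" "$REMOTE_URL"
local_size=$(stat -c%s "$LOCAL_FILE")

if [ "$local_size" -ne "$remote_size" ]; then
    echo "size mismatch: remote=$remote_size local=$local_size" >&2
    exit 1
fi

The post-download comparison catches the truncated-file case even if the stability check is fooled, so a mismatch can simply trigger a retry on the next run.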
Related
I am running an API on a Linux server. For most of the day the API runs completely fine, but every single day at 1:00 am/pm I see a large spike of failures that ends after about 5 minutes. Looking into the failures, there is no consistent pattern in the requests, and most requests still process fine (just a higher proportion fail). It also is not likely to be traffic related, as 1:00 am is a very slow traffic window.
I am not the only person who is using this server though, so I suspect that someone else is running a process every 12 hours at this time which is eating up a lot of resources.
My question is: is there a bash command I can run on my server to see whether any processes are always running during this window?
Use the top command in batch mode to collect system statistics and save the output to a text file:
top -b -n 1 > top.txt
To grab more than one iteration of the top command, change -n 1 to your desired number, for example -n 100:
top -b -n 100 > top.txt
Put the command in a script and execute the script via crontab or as an at job at 1:00 am, then check your text file.
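For example, a crontab entry along these lines (the output path is a placeholder) would capture roughly ten minutes of snapshots, one every 5 seconds, starting just before 1:00 am and 1:00 pm; note that % has to be escaped in crontab:

# 120 iterations, 5 seconds apart = ~10 minutes of top snapshots around 1:00 am/pm
58 0,12 * * * /usr/bin/top -b -d 5 -n 120 > /home/me/top_$(date +\%F_\%H\%M).txt 2>&1

You can then grep the file for the processes with the highest CPU or memory usage during the failure window.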
I have a Perl script which uses LWP::UserAgent to retrieve a YouTube video's information via an HTTPS request. The script normally takes 2-3 seconds to complete three separate HTTPS requests. Now it takes 15-20 minutes.
I wrote a similar script in Python where I use the following:
import google_auth_oauthlib.flow
import googleapiclient.discovery
import googleapiclient.errors
And this one request is taking 5-7 minutes.
Then - on Thursday night - the Perl script went back to only taking 2-3 seconds. This lasted for a few hours, and now it is back to taking 15-20 minutes. The script seems to take forever on the HTTPS call - and not at any other point in the script.
If I take the HTTPS URL and put it in a web browser, it takes less than a second.
The script is over 2,000 lines long, so I can't post it here. But, all I am doing is trying to retrieve:
https://www.googleapis.com/youtube/v3/videos?part=statistics&key=$key&id=$video_comments_to_get,
where $key is my authorization key, and $video_comments_to_get is the 11 character YouTube video ID value.
This all started after YouTube's outage on November 11th. Any ideas why this would be taking so long? I only run this script 10-15 times a day - so it isn't like I am over my quota.
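One thing that may help narrow this down is timing the same request from the shell on the server, to see whether the delay is in DNS, the TLS handshake, or waiting for the response. A rough sketch - the key and video ID are placeholders for your own values:

curl -s -o /dev/null \
  -w 'dns %{time_namelookup}s  connect %{time_connect}s  tls %{time_appconnect}s  first byte %{time_starttransfer}s  total %{time_total}s\n' \
  'https://www.googleapis.com/youtube/v3/videos?part=statistics&key=YOUR_KEY&id=VIDEO_ID'

If the total here is also minutes, the problem is between the server and the API (DNS or routing are common culprits); if it is sub-second, the slowdown is somewhere inside the script's HTTP stack.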
My understanding of the parse.com API rate limit is that it’s not a concurrent-job limit, it’s just the number of requests started in a given second. So if a user is, say, uploading a file from a slow network and it takes 30 seconds, that’s not 1 of my 30 req/s taken up that whole time. It’s just one request, the first second.
On my team, though, is a wonderful security guy whose job it is to worry. He thinks that if 30 users upload a file each, for 30 seconds, at a 30 r/s limit, no one else will be able to use our app until they are done.
Which one is correct?
Your understanding is correct. It's the number of requests started per second; the duration of the request does not come into play.
Source: I work at Parse.
I think you are right. I've run some experiments with Parse: for example, I reloaded a UITableView 10 or 20 times in one second (can't remember exactly) for 3-4 minutes and checked the requests in the admin panel. The maximum value was always less than 30, but that doesn't really matter; the point is that you can test it this way and get more information.
Just create a test project and reload the SampleViewController.m (which contains a Parse query) 30 times in one second; after that you can check the data browser, which will display the traffic in req/sec.
As a second option, you can upload a bunch of images as the current user every second. Since the upload time is longer than 1 second, you can check what happens when you start uploading a bunch of images (or other data) every second.
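If you want to trip the limit without wiring up a full test app, a rough shell sketch like this fires a burst of requests within roughly one second and tallies the HTTP status codes that come back; the URL is a placeholder for whatever endpoint and credentials your Parse project actually uses:

# Fire 40 requests more or less at once and count the response codes.
URL='https://example.com/your-parse-endpoint'   # placeholder
for i in $(seq 1 40); do
    curl -s -o /dev/null -w '%{http_code}\n' "$URL" &
done > /tmp/burst_codes.txt
wait
sort /tmp/burst_codes.txt | uniq -c   # a spike of non-200 codes means the limit kicked in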
We have about 10 different Python scripts that download data from the web, read data from a database and write data back to that database. They do so repeatedly every 10 seconds (or 10 seconds after the last task has completed).
The question is, what is the best approach at running these tasks? I can think of a few ways:
a while True that runs the task then sleeps for the interval. It could be guarded by a watchdog like supervisord, making sure it is always up.
having the script execute the task just once, and invoking the script externally once every 10 seconds by another process.
having the script execute the task lets say for 1 hour (every 10 seconds for an hour), and having a watchdog make sure that task runs again once the hour is over.
I would like to avoid long running processes that actually do something because I don't want to deal with memory problems etc over long periods of time.
Additional Information
The scripts are different because they each retrieve data from a different source, and query, calculate and insert different data into the database.
The tasks are performed every 10 seconds because the data being retrieved is real-time, and we need not only to keep updating it very frequently but also to keep all the historical data in the database.
There are a lot of resources being used by the scripts - MySQL connections, HTTP connections, Redis connections, etc. We have encountered issues with using the long-running approach before, specifically with MySQL connections (things like MySQL server has gone away, even though all connections had been closed). Hence the inclination toward having the scripts run in shorter periods of time.
What are some common approaches at this?
Unless your scripts somehow leak memory (quite unlikely), the approaches should all behave about the same. So, for sheer simplicity (your time programming/debugging is much more expensive than a few milliseconds of the machine's time, even every 10 seconds!), I'd go for the single long-running script that checks every 10 seconds.
OTOH, checking every 10 seconds sounds like busywork. Can't you set things up so that whatever you are monitoring tells you when there are changes? Or batch the records up so you can retrieve, say, a day's worth at a time?
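As a rough sketch of that simplest option, the long-running version really is just a loop; the task name is a placeholder, and a supervisor such as supervisord can restart the whole script if it ever dies:

#!/bin/bash
# Minimal sketch of a single long-running script: run the task, sleep, repeat.
while true; do
    /usr/local/bin/run_task.sh    # placeholder for the real fetch/calculate/insert work
    sleep 10
done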
If you are running on Linux, cron has a granularity of one minute. We have processes we run constantly. Rather than watch them, the script takes a lock that gets released when the program finishes, whether normally or not. That way, if a run goes long and cron starts it again, the new copy exits when it can't get the lock. This lets you call it as often as you need to without it stepping on a possibly still-running copy.
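A minimal sketch of that locking pattern using flock; the lock path and task are placeholders:

#!/bin/bash
# Exit immediately if a previous copy still holds the lock.
exec 200>/var/lock/fetch_data.lock          # placeholder lock file
if ! flock -n 200; then
    exit 0                                  # another copy is still running
fi

/usr/local/bin/fetch_and_store.sh           # placeholder for the real task
# The lock is released automatically when this script exits, normally or not.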
I have to set up a cron job on my hosting provider.
This cron job needs to run every second. It's not intensive, just doing a check.
The hosting provider however only allows cron jobs to be run every two minutes. (can't change hosting btw)
So I'm clueless about how to go about this.
My thoughts so far:
If it can only run every two minutes, I need to make it run every second for two minutes. 1) How do I make my script run for two minutes executing a function every second?
But it's important that there are no interruptions. 2) I have to ensure that it runs smoothly and that it remains constantly active.
Maybe I can also try making it run forever, and run the cron job every two minutes checking whether it is running? 3) Is this possible?
My friend mentioned using multithreading to ensure it's running every second. 4) Any comments on this?
Thanks for any advice. I'm using ZF.
Approach #3 is the standard solution. For instance, you can have the cron job touch a file every time it runs. Then on startup you can check whether that file has been touched recently, and if it has, exit immediately; otherwise start running. (Other approaches include using file locking, or writing the PID to a file and, on startup, checking whether that PID exists and belongs to the expected program.)
As for the one-second interval, I would suggest calling usleep at the end of each check, supplying the number of microseconds from now until you next want to run. If you do a plain sleep then you'll actually run less than once a second, because sleeps sometimes last longer than expected and your check itself takes time. As long as your check takes under a second to run, this should work fine.
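Here is a shell sketch of that combination - the freshness check plus a once-a-second loop that runs for roughly two minutes. The paths are placeholders, and the real version would live in your PHP/ZF code rather than bash:

#!/bin/bash
LOCKFILE=/tmp/every_second.lock             # placeholder
# If the lock file was touched within the last two minutes, another copy is active.
if [ -e "$LOCKFILE" ] && [ $(( $(date +%s) - $(stat -c %Y "$LOCKFILE") )) -lt 120 ]; then
    exit 0
fi
end=$(( $(date +%s) + 120 ))
while [ "$(date +%s)" -lt "$end" ]; do
    touch "$LOCKFILE"                       # heartbeat so the next cron run knows we're alive
    /usr/local/bin/do_check.sh              # placeholder for the actual check
    sleep 1                                 # ideally sleep only the time remaining in this second
done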
I don't think cron allows second level resolution. http://unixhelp.ed.ac.uk/CGI/man-cgi?crontab+5
field allowed values
----- --------------
minute 0-59
hour 0-23
day of month 1-31
month 1-12 (or names, see below)
day of week 0-7 (0 or 7 is Sun, or use names)
So, even if your hosting provider allowed it, you couldn't schedule a cron job that repeats every second. However, you can use something like the watch command for repeated execution of your script.
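For example (the script path is a placeholder):

watch -n 1 /usr/local/bin/do_check.sh    # re-runs the check every second until interrupted

Note that watch is an interactive, foreground process, so you would still need something (a terminal multiplexer or a process supervisor) to keep it running unattended.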