Bash script that checks website every 10 seconds - bash

The following script checks a site's content every 10 seconds to see whether anything on it has changed. It's for a very time-sensitive application: if something on the site changes, I have merely seconds to do something else. It will then start a new download-and-compare cycle and wait for the next change. The "do something else" part has yet to be scripted and is not relevant to the question.
The question: Will it be a problem for a public website to have a script downloading a single page every 10-15 seconds? If so, is there any other way to monitor a site unattended?
#!/bin/bash
Domain="example.com"

# Take an initial snapshot of the page
Ocontent=$(curl -L "$Domain")
Ncontent="$Ocontent"

# Re-download every 10 seconds until the page differs from the snapshot
until [ "$Ocontent" != "$Ncontent" ]; do
    Ocontent=$(curl -L "$Domain")
    #CONTENT CHANGED TRUE
    #if [ "$Ocontent" == "$Ncontent" ]; then
    #    Ocontent=$(curl -L "$Domain")
    #fi
    echo "$Ocontent"
    sleep 10
done

The problems you're going to run into:
If the site notices and has a problem with it, you may end up on a banned IP list. Using an IP pool or other distributed resource can mitigate this.
Hitting the website at precise x-second intervals is unrealistic; network latency will introduce a great deal of variance.
If you get a network partition, your code should know how to cope. (What if your connection goes down? What should happen?)
Note that getting the immediate response is only part of downloading a webpage. There may be changes to referenced files, such as CSS, JavaScript or images, that are not apparent from the original HTTP response alone.
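If you do stick with polling, a slightly more defensive version of the loop addresses the last two points: compare checksums instead of whole page bodies, give curl a timeout, and treat a failed download as "try again later" rather than as a change. This is only a sketch; the interval, the timeout/retry values and the use of sha256sum are assumptions on my part, not anything from the original script.
#!/bin/bash
# Sketch: poll a page every $Interval seconds, comparing checksums,
# and tolerate transient network failures instead of treating them as changes.
Domain="example.com"
Interval=10

fetch_hash() {
    local body
    # -s silent, -f fail on HTTP errors, -L follow redirects;
    # --max-time and --retry keep a flaky network from hanging the loop
    body=$(curl -sfL --max-time 15 --retry 2 "$Domain") || return 1
    printf '%s' "$body" | sha256sum | awk '{print $1}'
}

# Keep trying until we get an initial snapshot
until Baseline=$(fetch_hash); do
    sleep "$Interval"
done

while true; do
    if Current=$(fetch_hash); then
        if [ "$Current" != "$Baseline" ]; then
            echo "$(date): content changed"
            # the time-sensitive "do something else" goes here
            Baseline="$Current"
        fi
    else
        echo "$(date): download failed, will retry" >&2
    fi
    sleep "$Interval"
done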

Related

Why is Firefox lagging so much?

I have made a simple PHP script and I am running it from my localhost.
The script does not use any sessions, cookies, databases or files. It just sleeps for 100 ms and measures how long it actually slept. The code is not important here, but anyway:
$r = microtime(true);
usleep(100000);
echo 1000*(microtime(true) - $r);
When I run this script in Chrome or Firefox I get results like:
99.447011947632
or
99.483013153076
However, the script always renders within a second in Chrome but takes up to 4 seconds in Firefox!
Here is my benchmark from Tamper Data for Firefox:
(Sorry it's not in English. The columns are from left to right: URL, Total time, Size, HTTP Status)
So there is something wrong with Firefox. What could it be? The problem affects other scripts as well.
Should I send any special HTTP headers for Firefox?

FastRWeb performance on Ubuntu with built-in web server

I have installed FastRWeb 1.1-0 on an installation of R 2.15.2 (Trick or Treat) running on an Ubuntu 10.04 box. I hope to use the resulting system to run a web service.
I've configured the system by setting http.port to 8181 in rserve.conf and unsetting the socket destination. I've assigned .http.request to FastRWeb::.http.request. I exchange JSON blobs between the client and the server using HTTP POST (the second blob can exceed 150KB in size, and will not fit in an HTTP GET query string.)
Everything works end to end -- I have a little client-side R script which generates JSON RPC calls across the channel. I see the run function invoked, and see it return.
I've run into a significant performance problem, however: the return path takes in excess of 12 seconds between the time run() returns (including the call to done()) and the time the R client gets the return value. RCurl doesn't seem to be the culprit; something appears to be taking twelve seconds to do the return.
Does anybody have any suggestions of where to look? I can easily shift over to using Apache 2.0 and CGI, but, honestly, I'd rather keep everything R centric.
Answering my own question.
I wrapped .http.request with an Rprof()/Rprof(NULL) pair and looked at the time spent in each routine. It turns out that the system spends ~11 seconds inside URLDecode in the standard implementation of .run. This looks like a scaling problem in URLDecode in the core.

Regulating / rate limiting ruby mechanize

I need to regulate how often a Mechanize instance connects to an API (no more than once every 2 seconds).
So this:
instance.pre_connect_hooks << Proc.new { sleep 2 }
I had thought this would work, and it sort of does, BUT now every method in that class sleeps for 2 seconds, as if the Mechanize instance is touched and told to hold for 2 seconds. I'm going to try a post-connect hook, but it's obvious I need something a bit more elaborate; I just don't know what at this point.
The code explains it better, so if you are interested, follow along here: https://github.com/blueblank/reddit_modbot. Otherwise, my question concerns how to efficiently and effectively rate limit a Mechanize instance to within a specific time frame specified by an API (where overstepping that limit results in dropped requests and bans). I'm also guessing I need to integrate the Mechanize instance into my class better, so any pointers on that are appreciated as well.
Pre and post connect hooks are called on every connect, so if there is some redirection it could trigger many times for one request. Try history_added which only gets called once:
instance.history_added = Proc.new {sleep 2}
I use SlowWeb to rate limit calls to a specific URL.
require 'slowweb'
SlowWeb.limit('example.com', 10, 60)
In this case calls to example.com domain are limited to 10 requests every 60 seconds.

How do I absolutely ensure that a Phusion Passenger instance stays alive?

I'm having a problem where no matter what I try all Passenger instances are destroyed after an idle period (5 minutes, but sometimes longer). I've read the Passenger docs and related questions/answers on Stack Overflow.
My global config looks like this:
PassengerMaxPoolSize 6
PassengerMinInstances 1
PassengerPoolIdleTime 300
And my virtual config:
PassengerMinInstances 1
The above should ensure that at least one instance is kept alive after the idle timeout. I'd like to avoid setting PassengerPoolIdleTime to 0 as I'd like to clean up all but one idle instance.
I've also added the ruby binary to my CSF ignore list to prevent the long running process from being culled.
Is there somewhere else I should be looking?
Have you tried setting PassengerMinInstances to something other than 1, such as 3, to see whether that works?
OK, I found the answer for you at this link: http://groups.google.com/group/phusion-passenger/browse_thread/thread/7557f8ef0ff000df/62f5c42aa1fe5f7e. Look at the last comment by the Phusion guy.
"Is there a way to ensure that I always have 10 processes up and running, and that each process only serves 500 requests before being shut down?"
"Not at this time. But the current behavior is such that the next time it determines that more processes need to be spawned it will make sure at least PassengerMinInstances processes exist."
I have to say their documentation doesn't seem to match the current behavior.
This seems to be quite a common problem for people running Apache on WHM/cPanel:
http://techiezdesk.wordpress.com/2011/01/08/apache-graceful-restart-requested-every-two-hours/
Enabling piped logging sorted the problem out for me.

How can I performance test using shell scripts - tools and techniques?

I have a system to which I must apply load for the purpose of performance testing. Some of the load can be created via LoadRunner over HTTP.
However in order to generate realistic load for the system I also need to simulate users using a command line tool which uses a non HTTP protocol* to talk to the server.
* edit: actually it is HTTP but we've been advised by the vendor that it's not something easy to record/script and replay. So we're limited to having to invoke it using the CLI tool.
I have the constraint of not having the licences for LoadRunner to do this and not having the time to put the case to get the license.
Therefore I was wondering if there was a tool that I could use to control the concurrent execution of a collection of shell scripts (it needs to run on Solaris) which will be my transactions. Ideally it would be able to ramp up in accordance with a predetermined schedule.
I've had a look around and can't tell if JMeter will do the trick. It seems very web oriented.
You can use the below script to trigger a load test for HTTP/S requests:
#!/bin/bash
# Define variables
set -x              # run in debug mode
DURATION=60         # how long load should be applied, in seconds
TPS=20              # number of requests per second
end=$((SECONDS + DURATION))

# Start load
while [ "$SECONDS" -lt "$end" ]; do
    for ((i = 1; i <= TPS; i++)); do
        curl -X POST <url> -H 'Accept: application/json' -H 'Authorization: Bearer xxxxxxxxxxxxx' -H 'Content-Type: application/json' -d '{}' --cacert /path/to/cert/cert.crt -o /dev/null -s -w '%{time_starttransfer}\n' >> response-times.log &
    done
    sleep 1
done
wait

# End load
echo "Load test has been completed"
You may refer to this for more information.
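Since each curl call above appends one time_starttransfer value per line to response-times.log, here is a quick way to summarize the run afterwards (just a sketch based on that log format):
# Report request count, average and maximum time-to-first-byte in seconds
awk '{ sum += $1; if ($1 > max) max = $1 } END { if (NR) printf "requests=%d avg=%.3fs max=%.3fs\n", NR, sum/NR, max }' response-times.log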
If all you need is starting a bunch of shell scripts in parallel, you can quickly create something of your own in perl with fork, exec and sleep.
#!/usr/bin/perl
for $i (1..1000)
{
    if (fork == 0)
    {
        exec ("script.sh");
        exit;
    }
    sleep 1;
}
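If you would rather stay in shell, the same idea can be sketched directly in bash using background jobs; the count of 1000, the one-second pacing and the script name are just placeholders mirroring the Perl above:
#!/bin/bash
# Sketch: launch 1000 copies of script.sh, one per second, then wait for all of them to finish
for i in {1..1000}; do
    ./script.sh &    # run each transaction in the background
    sleep 1          # simple linear ramp-up
done
wait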
For anyone interested I have written a Java tool to manage this for me. It references a few files to control how it runs:
1) Schedules File - defines various named lists of timings which control the length of sequential phases.
e.g. MAIN,120,120,120,120,120
This will result in a schedule named MAIN which has 5 phases each 120 seconds long.
2) Transactions File - defines the transactions that need to run. Each transaction has a name, a command to be called, a boolean controlling repetition, an integer controlling the pause between repetitions in seconds, a data file reference, the schedule to use, and per-phase increments.
e.g. Trans1,/path/to/trans1.ksh,true,10,trans1.data.csv,MAIN,0,10,0,10,0
This will result in a transaction running trans1.ksh repeatedly, with a pause of 10 seconds between repetitions. It will reference the data in trans1.data.csv. During phase 1 it will increment the number of parallel invocations by 0, in phase 2 it will add 10 parallel invocations, in phase 3 none, and so on. Phase times are taken from the schedule named MAIN.
3) Data Files - as referenced in the transaction file, this will be a CSV with a header. Each line of data will be passed to subsequent invocations of the transaction.
e.g.
HOSTNAME,USERNAME,PASSWORD
server1,jimmy,password123
server1,rodney,ILoveHorses
These get passed to the transaction scripts via environment variables (e.g. PASSWORD=ILoveHorses), which is a bit clunky, but workable.
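As a rough bash illustration of that mechanism (not the actual Java tool, and assuming the trans1.data.csv layout shown above), an executor could do something like:
# Skip the CSV header, then run the transaction once per data row,
# exposing the columns to the script as environment variables
tail -n +2 trans1.data.csv | while IFS=, read -r HOSTNAME USERNAME PASSWORD; do
    HOSTNAME="$HOSTNAME" USERNAME="$USERNAME" PASSWORD="$PASSWORD" /path/to/trans1.ksh
done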
My Java tool simply parses the config files and sets up a manager thread per transaction, which itself takes care of setting up and starting executor threads in accordance with the configuration. Managers add executors linearly so as not to totally overload the system.
When it runs, it just reports every second on how many workers each transaction has running and which phase it's in.
It was a fun little weekend project, it's certainly no load runner and I'm sure there are some massive flaws in it that I'm currently blissfully unaware of, but it seems to do ok.
So in summary the answer here was to "roll ya own".
