I need to convert some of my Perl CGI scripts to binaries.
But when a 100 KB script is converted into a binary, it grows to about 2-3 MB. I understand why: the compiler has to pack in everything needed to execute the script.
My question is about page load time when the scripts on the server are binaries. Say I have a binary Perl script, "script", that answers AJAX requests and weighs about 3 MB. Will that affect the AJAX requests? If some users have slow connections, will they wait for ages until all 3 MB are transferred? Or, the server WON'T send all the 3mb to a user, but just an answer (short XML/JSON whatsoever)?
Another case is an HTML page generated by this binary Perl script on the server. The user points his browser at the script, which weighs 3 MB, and should get an HTML page back. Will the user again wait until the whole script has been loaded (every single byte of those 3 MB), or only as long as it takes to load EXACTLY the HTML page (say, 70 KB), with the rest running on the server side only and not making the user wait for it?
Thanks!
Or, the server WON'T send all the 3mb to a user, but just an answer (short XML/JSON whatsoever)?
This.
The server executes the program. It sends the output of the program to the client.
There might be an impact on performance by bundling the script up (and it will probably be a negative one) but that has to do with how long it takes the server to run the program and nothing to do with how long it takes to send data back to the client over the network.
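A minimal sketch of that point, written in Python rather than Perl. The function name run_cgi_like and the JSON payload are made up for illustration, and the Python interpreter stands in for the packed binary. The server runs the executable, and only what the program writes to standard output goes over the wire, whatever the executable's size on disk.

```python
import os
import subprocess
import sys


def run_cgi_like(command):
    """Run a program the way a CGI server would and return only its output.

    The executable itself is never sent to the client; only the bytes it
    writes to standard output become the response body.
    """
    completed = subprocess.run(command, stdout=subprocess.PIPE, check=True)
    return completed.stdout


# Stand-in for the packed binary: the Python interpreter printing a short
# JSON reply (hypothetical payload).
command = [sys.executable, "-c", "print('{\"status\": \"ok\"}')"]
response_body = run_cgi_like(command)

print("executable size on disk:", os.path.getsize(sys.executable), "bytes")
print("bytes sent to the client:", len(response_body))
```

The two printed numbers are unrelated: the first depends on what was packed into the executable, the second only on what the program printed.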
Wrapping/Packaging a perl script into a binary can be useful for ease of transport or installation. Some folks even use it as a (trivial) form of obfuscation. But in the end, the act of "Unpacking" the binary into usable components at the beginning of every CGI call will actually slow you down.
If you wish to improve performance in a CGI situation, you should seriously consider techniques that make your script persistent to eliminate startup time. mod_perl is an older solution to this problem. More modern solutions include FCGI or wrapping your script into its own mini web server.
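To make the "mini web server" idea concrete, here is a rough sketch in Python rather than Perl. The names AjaxHandler and make_server and the JSON field are assumptions for illustration only. The point it shows is that startup work happens once, when the process launches, and every later request is served by the already-running process.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Expensive setup happens once, when the process starts, not once per request.
STARTED_AT = time.time()


class AjaxHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every request is answered by the same long-lived process.
        payload = {"uptime_seconds": round(time.time() - STARTED_AT, 3)}
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    # Port 0 lets the operating system pick a free port.
    return HTTPServer(("127.0.0.1", port), AjaxHandler)
```

Calling make_server(8080).serve_forever() would keep the process alive between requests; a front-end web server would then proxy to it instead of spawning the script for each hit.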
Now if you are delivering the script to a customer and a PHB requires wrapping for obfuscation purposes, then be comforted that the startup performance hit only occurs once if you write your script to be persistent.
I have a hunch that it may not be possible but what does it take for me to ask? Hey, "the fool didn't know it was impossible, so he did it."
This question is env/tool/version agnostic. I target any industry standard tools for performance testing like HP LR, Apache JMeter, SilkPerformer etc.
The scenario:
A Web(HTML/HTTP) script is being executed in LR Vugen.
As the script execution progresses, the vuser follows the scripted steps/journey. Each action function calls the SUT hosts/services for responses while maintaining local sessions, managing cookies, remembering headers, emulating browser caches, and so on.
Now, we can pause the execution any time in the tool and that will stop the vuser action. Question is, can we resume the session somewhere else or in the tool to manually interact with the web page or response while maintaining the same session?
This will help users reproduce a case with plenty of steps: the tool executes them, and the user takes over at a certain point to carry on along a different path.
The only way to accomplish interaction with the session is using LoadRunner's TruClient protocol.
This protocol actually uses a real browser (Firefox or an IE emulator) for every VUser being executed. With some runtime setting options, the browser can be visible while replaying the load scenario.
Of course, TruClient protocol is only used for testing web sites.
Hope this helps.
I am curious whether there is a way of monitoring request duration on an IIS server. I have come up with a solution myself, but it's really resource intensive, which is why I'm asking the question, just to gather more opinions.
My plan is to extract the duration of each request and send it to Graphite so as to have a real-time overview of the web server's performance. The idea I've come up with is to use PowerShell with its WebAdministration module. If you run get-item IIS:\AppPools\DefaultAppPool | Get-WebRequest, for example, you get all the requests on that app pool with a lot of info, including the time info.
The thing is that I would need a script that runs every 100 ms to get all requests, and that is kind of wasteful. Is there a way to tell IIS to put the request duration (in milliseconds) in the logs? Then it would be much easier to get the information I need.
I don't know if there is such a feature in IIS, but I've done the same (sending IIS page times to Graphite) by using a reverse proxy, like nginx, between the internet and the IIS server.
The proxy module from nginx allows you to log, on each request, the time the backend took to produce the page.
Also, having a proxy like nginx in front of IIS can be very helpful if you have to deal with visitors on slow connections: nginx will store the reply from the backend, drop the backend connection, and wait until the visitor gets all the content. Highly recommended.
In case you go this route, you should use logster (also from the Etsy guys) or logstash to parse the nginx logs at whatever interval you want (likely every minute).
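If you would rather see what such a parser boils down to, here is a rough Python sketch. It assumes a custom nginx log_format whose lines end with rt=$request_time urt=$upstream_response_time. Both are real nginx variables, reported in seconds, but the line layout, the function name to_graphite_line, and the metric name are my assumptions. The output is one line in Graphite's plaintext protocol ("path value timestamp").

```python
import re

# Assumed log layout: each line ends with
#   rt=$request_time urt=$upstream_response_time
# The variables are real nginx ones; the layout itself is an assumption.
TIMING = re.compile(r"rt=(?P<rt>[\d.]+) urt=(?P<urt>[\d.]+)")


def to_graphite_line(log_line, metric, timestamp):
    """Turn one access-log line into a Graphite plaintext-protocol line.

    Returns None when the line carries no numeric backend timing, for
    example when nginx logged "-" because no upstream was contacted.
    """
    match = TIMING.search(log_line)
    if match is None:
        return None
    backend_ms = float(match.group("urt")) * 1000  # seconds -> milliseconds
    return f"{metric} {backend_ms:.0f} {timestamp}"


sample = ('203.0.113.7 - - [10/Oct/2013:13:55:36 +0000] '
          '"GET /default.aspx HTTP/1.1" 200 7012 rt=0.250 urt=0.200')
print(to_graphite_line(sample, "iis.backend_ms", 1381413336))
```

Tools like logster or logstash do the same job with aggregation and scheduling on top, so this is only meant to show the shape of the data.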
It seems that there is a feature that logs requests based on a regex, and it's called the Advanced Logging Module. You can choose which of a number of fields you want logged, and it's W3C compliant. In my case, time taken was one of the fields that can be specified, and that was what I was looking for. After that I wrote a PowerShell script which parses the logs, gets the information I need, constructs a metric, and sends it to statsd, which in turn sends it to Graphite.
The method I chose for the log parsing was the following. In the script I used PowerShell's Get-Content cmdlet to gather all the logs into one file (yes, IIS breaks the logs into multiple files, and I'm guessing the number of logs depends on the number of worker processes, but I'm not sure). That was the first iteration. In a second iteration I gather all the logs into another file, make a diff between the first file and the latter, and process only the difference.
I chose this method because I thought it would be better to keep regex processing to a minimum. The next step is erasing the first file of accumulated logs, moving the second one into the place of the first, and running the script again, so there is always something to compare against. Also, log rollover happens every hour, after which the logs are erased.
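For illustration, here is roughly what the parsing step looks like, sketched in Python rather than the PowerShell I actually used. It assumes the log is in W3C extended format with a "#Fields:" header that includes a time-taken column in milliseconds. The function name statsd_timers and the metric name are made up. Each request becomes a statsd timer packet of the form name:value|ms.

```python
def statsd_timers(log_lines, metric="iis.request_time"):
    """Yield one statsd timer packet per request in a W3C extended log.

    Assumes a "#Fields:" header naming the columns and a time-taken
    column holding the request duration in milliseconds.
    """
    fields = []
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names follow the directive
        elif line and not line.startswith("#"):
            record = dict(zip(fields, line.split()))
            if "time-taken" in record:
                yield f"{metric}:{record['time-taken']}|ms"


sample_log = [
    "#Version: 1.0",
    "#Fields: date time cs-uri-stem sc-status time-taken",
    "2013-10-10 13:55:36 /default.aspx 200 187",
    "2013-10-10 13:55:37 /api/items 200 42",
]
for packet in statsd_timers(sample_log):
    print(packet)
```

Each packet would then be sent to statsd as a UDP datagram.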
I have noticed that some of my ajax-heavy sites (ones I visit, not ones I have built), have certain auto-refresh features. For example, in GMail, if I get a new message, I see the new message without a page reload. It's the same with the Facebook browser-based IM client. From what I can tell, there aren't any java applets handling the server-browser binding, so I'm left to assume it's being done by AJAX and perhaps some element I'm unaware of. So by my best guess, it's done in one of two ways:
The javascript does a steady "ping" to a server-side script, checking for any updates that might be available (which would explain why some of these pages bring any other heavy-duty pages to a crawl). or
The javascript sits idly by and a server-side script actually "Pushes" any updates to the browser. But I'm not sure if this is possible. I'd imagine there is some kind of AJAX function that still pings, but it simply asks "any updates?" and the server-script has a simple boolean that says "nope" or "I'm glad you asked." But if this is the case, any data changes would need to call the script directly so that it has the data changes ready and makes the change to that boolean function.
So is that possible/feasible/how it works? I imagine something like:
Someone sends an email/IM/DB update to the server, the server calls the script using the script's URL plus some relevant GET variable, the script notes the change and updates the "updates available" variable, the AJAX gets the response that there are in fact updates, the AJAX runs its normal "update page" functions, which executes the normal update scripts and outputs them to the browser.
I ask because it seems really inefficient that the js is just doing a constant check which requires a) the server to do work every 1.5 seconds, and b) my browser to do work every 1.5 seconds just so that on my end I can say "Oh boy, I got an IM! just like a real IM client!"
Read about Comet
I've actually been working on a small .NET Web App that uses the Ajax with long polling technique described.
Depending on what technology you're using, you could use thread signaling mechanisms to hold your request until an update is retrieved.
With ASP.NET I'm running my server on a single machine, so I store a reference to my Producer object (which contains a thread that processes the data). To initiate the data pull, my service's Subscribe method is called, which creates a Consumer object that's registered with the Producer. If the Consumer is in long polling mode, it has an AutoResetEvent which is signaled whenever it receives new data. Whenever the web client makes a request for data, the Consumer first waits on the reset event and then returns the data.
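The same signaling idea, sketched in Python instead of .NET. This is not my actual code: the class layout and the method names publish and poll are assumptions, and Python's threading.Event is manual-reset, so the sketch clears it by hand where an AutoResetEvent would reset itself.

```python
import threading


class Consumer:
    """Holds pending updates for one subscribed client."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()
        self._has_data = threading.Event()

    def publish(self, item):
        # Called by the producer thread whenever new data arrives.
        with self._lock:
            self._items.append(item)
            self._has_data.set()

    def poll(self, timeout=30.0):
        # Called by the request handler: block until data arrives or the
        # timeout expires, then return whatever is queued (possibly nothing).
        self._has_data.wait(timeout)
        with self._lock:
            items, self._items = self._items, []
            self._has_data.clear()
        return items
```

The web request sits inside poll() without burning CPU. When it returns, either with data or empty after the timeout, the browser immediately issues the next request.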
But you're mentioning something about PHP - as far as I know persistence is maintained through serialization, not actually keeping the object in memory, so I don't know how you could reference a Producer object using $_CACHE[] or $_SESSION[]. When I developed in PHP I never really knew anything about multithreading so I didn't play around with it, but I guess you can look into that.
Using infinite loops is going to consume a lot of your processing power - I would exhaust all other options first.
I'm working on a consumer web app that needs to do a long running background process that is tied to each customer request. By long running, I mean anywhere between 1 and 3 minutes.
Here is an example flow. The object/widget doesn't really matter.
Customer comes to the site and specifies object/widget they are looking for.
We search/clean/filter for widgets matching some initial criteria. <-- long running process
Customer further configures more detail about the widget they are looking for.
When the long running process is complete the customer is able to complete the last few steps before conversion.
Steps 3 and 4 aren't really important. I just mention them because we can buy some time while we are doing the long running process.
The environment we are working in is a LAMP stack-- currently using PHP. It doesn't seem like a good design to have the long running process take up an apache thread in mod_php (or fastcgi process). The apache layer of our app should be focused on serving up content and not data processing IMO.
A few questions:
Is our thinking right in that we should separate this "long running" part out of the apache/web app layer?
Is there a standard/typical way to break this out under Linux/Apache/MySQL/PHP (we're open to using a different language for the processing if appropriate)?
Any suggestions on how to go about breaking it out? E.g. do we create a daemon that churns through a FIFO queue?
Edit: Just to clarify, only about 1/4 of the long running process is database centric. We're working on optimizing that part. There is some work that we could potentially do, but we are limited in the amount we can do right now.
Thanks!
Consider providing the search results via AJAX from a web service instead of your application. Presumably you could offload this to another server and let your web application deal with the content as you desire.
Just curious: 1-3 minutes seems like a long time for a lookup query. Have you looked at indexes on the columns you are querying to improve the speed? Or do you need to do some algorithmic process -- perhaps you could perform some of this offline and prepopulate some common searches with hints?
As Jonnii suggested, you can start a child process to carry out background processing. However, this needs to be done with some care:
Make sure that any parameters passed through are escaped correctly
Ensure that more than one copy of the process does not run at once
If several copies of the process can run, there's nothing stopping a (not even malicious, just impatient) user from hitting reload on the page which kicks it off, eventually starting so many copies that the machine runs out of RAM and grinds to a halt.
So you can use a subprocess, but do it carefully, in a controlled manner, and test it properly.
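A rough sketch of both precautions, in Python rather than PHP. The helper names and the lock-file approach are my assumptions, not the only way to do it. In PHP, escapeshellarg plays the role that shlex.quote plays here.

```python
import os
import shlex


def build_command(script, user_input):
    """Build a shell command line with the user-supplied value safely quoted."""
    return (f"/usr/bin/php {shlex.quote(script)} {shlex.quote(user_input)}"
            " > /dev/null 2>&1 &")


def try_acquire_lock(lock_path):
    """Return True if this caller created the lock file, False if it exists.

    O_CREAT | O_EXCL makes creation atomic, so two simultaneous requests
    cannot both succeed.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_lock(lock_path):
    """Remove the lock file once the background process has finished."""
    os.remove(lock_path)


print(build_command("long_running_process.php", "a; rm -rf /"))
```

With one lock file per customer or per job, an impatient reload finds the lock already taken and does not start a second copy. A crashed process can leave a stale lock behind, so a real setup would also need some way to expire it.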
Another option is to have a daemon permanently running waiting for requests, which processes them and then records the results somewhere (perhaps in a database)
This is the poor man's solution:
exec ("/usr/bin/php long_running_process.php > /dev/null &");
Alternatively you could:
Insert a row into your database with details of the background request, which a daemon can then read and process.
Write a message to a message queue which a daemon then reads and processes.
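The database-row variant could look roughly like this. It is a sketch in Python with sqlite3 standing in for MySQL; the table layout and the function names are invented for illustration, and it assumes a single daemon (several workers would need to claim rows atomically).

```python
import sqlite3


def create_queue(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "params TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending', "
        "result TEXT)"
    )


def enqueue(conn, params):
    """Web request side: record the job and return immediately."""
    cur = conn.execute("INSERT INTO jobs (params) VALUES (?)", (params,))
    conn.commit()
    return cur.lastrowid


def process_next(conn, work):
    """Daemon side: take the oldest pending job, run it, store the result."""
    row = conn.execute(
        "SELECT id, params FROM jobs WHERE status = 'pending' "
        "ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # nothing to do; the daemon would sleep and retry
    job_id, params = row
    conn.execute(
        "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
        (work(params), job_id),
    )
    conn.commit()
    return job_id
```

The web page then only has to look up the job row by its id to see whether the result is ready.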
Here's some discussion on the Java version of this problem.
See java: what are the best techniques for communicating with a batch server
Two important things you might do:
Switch to Java and use JMS.
Read up on JMS but use another queue manager. Unix named pipes, for instance, might be an acceptable implementation.
Java servlets can do background processing. You could do something similar to this technology in a web technology with threading support. I don't know about PHP though.
Not a complete answer, but I would think of using AJAX and passing the 2nd step to something that's faster than PHP (C, C++, C#), then having a PHP function pick the results off of some stack, most likely just a database.
This is my setting: I have written a .NET application for local client machines, which implements a feature that could also be used on a webpage. To keep this example simple, assume that the client installs a software into which he can enter some data and gets some data back.
The idea is to create a webpage that holds a form into which the user enters the same data and gets the same results back as above. Due to the company's available web servers, the first idea was to create a mono webservice, but this was dismissed for reasons unknown. The "service" is not to be run as a webservice, but should be called by a PHP script. This is currently realized by calling the mono application via shell_exec from PHP.
So now I am stuck with a mono port of my application, which works fine, but takes way too long to execute. I have already stripped out all unnecessary dlls, methods etc, but calling the application via the command line - submitting the desired data via commandline parameters - takes approximately 700ms. We expect about 10 hits per second, so this could only work when setting up a lot of servers for this task.
I assume the 700 ms are related to the cost of starting the application every time, because it does not make much difference in terms of time if I handle the request only once or five hundred times. (I take the original input, vary it slightly and do 500 iterations with "new" data every time. Starting from the second iteration, the processing time drops down to approximately 1 ms per iteration.)
My next idea was to setup the mono application as a remoting server, so that it only has to be started once and can then handle incoming requests. I therefore wrote another mono application that serves as the client. Calling the client, letting the client pass the data to the server and retrieving the result now takes 344ms. This is better, but still way slower than I would expect and want it to be.
I have then implemented a new project from scratch based on this blog post and get stuck with the same performance issues.
The question is: am I missing something related to the mono-projects that could improve the speed of the client/server? Although the idea of creating a webservice for this task was dismissed, would a webservice perform better under these circumstances (as I would not need the client application to call the service), although it is said that remoting is faster than webservices?
I could have made that clearer, but implementing a webservice is currently not an option (and please don't ask why, I didn't write the requirements ;))
Meanwhile I have checked that it's indeed the startup of the client, which takes most of the time in the remoting scenario.
I could imagine accessing the server via pipes from the command line, which would be perfectly suitable in my scenario. I guess this would be done using sockets?
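To make the socket idea concrete, here is a minimal sketch in Python rather than Mono. The names RequestHandler, make_worker_server and ask are hypothetical, and the line-based protocol and the upper-casing "work" are placeholders. A long-running server keeps the expensive startup behind it, and a thin client sends one line and reads one line back.

```python
import socket
import socketserver


class RequestHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One line in, one line out. The expensive startup already happened
        # when the server process was launched.
        line = self.rfile.readline().decode().strip()
        self.wfile.write((line.upper() + "\n").encode())


def make_worker_server(port=0):
    # Port 0 lets the operating system pick a free port.
    return socketserver.ThreadingTCPServer(("127.0.0.1", port), RequestHandler)


def ask(port, text):
    """Thin client: connect, send one request line, read one reply line."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall((text + "\n").encode())
        with conn.makefile() as reader:
            return reader.readline().strip()
```

If the calling script opens the socket itself, no second application has to be started per request, which is where most of the time goes in the remoting scenario.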
You can try to use AOT to reduce the startup time. On .NET you would use ngen for that purpose; on Mono, just run mono --aot on all assemblies used by your application.
AOT'ed code is slower than JIT'ed code, but has the advantage of reducing startup time.
You can even try to AOT framework assemblies such as mscorlib and System.
I believe that remoting is not an ideal thing to use in this scenario. However your idea of having mono on server instead of starting it every time is indeed solid.
Did you consider using SOAP webservices over HTTP? This would also help you with your 'web page' scenario.
Even if it is a little too slow for you, in my experience a custom RESTful service implementation would be easier to work with than remoting.