Determining session time for a website - algorithm

For one of my classes we need to calculate the session length for a user visiting a website. We were given a web log. The web log is in this format:
IPAddress date httpMethod httpStatus size referrer browserInfo
The httpMethod looks like this: GET /include/main_page.css HTTP/1.1
The referrer is always the main page: http://www.cs.myCollage.com or -
I am using a timeout value of 20 minutes.
QUESTIONS:
I am not sure how to tell when a session is over other than when it times out. Is the only way to end a session with a timeout? Is there a way to detect when a user leaves the site (using only the information in the logs)?
This is my current strategy (assume that we have these logs):
IPAddress Time httpMethod ...
IP1 2:15 GET something
IP1 2:17 GET something else
IP1 2:30 GET something else
IP1 4:30 GET something else
IP1 4:32 GET something else
This means that the user has had two sessions. I think that the first session would be either 15 minutes or 35 minutes. Should I include the timeout in the session time?
The second session would be between 2 minutes and 22 minutes.

The timeout value is used to separate different sessions coming from the same IP address (which is not necessarily the same person). In your example there are two sessions because the gap from 2:30 to 4:30 is larger than the timeout value.
As for determining session length, the straightforward homework solution, and probably what your teacher had in mind, is to subtract the start time from the end time: 15 minutes for the first session and 2 minutes for the second.
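As a minimal sketch of that split-and-subtract approach, assuming the request times for one IP have already been parsed out of the log: the helper names `split_sessions`, `session_minutes` and `parse` are made up for illustration, and treating a gap exactly equal to the timeout as the same session is an arbitrary choice.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=20)  # timeout value from the question


def split_sessions(times, timeout=TIMEOUT):
    """Group request times from one IP into sessions.

    A new session starts whenever the gap to the previous request
    exceeds the timeout (a gap equal to the timeout stays in the
    same session; that boundary choice is an assumption).
    """
    sessions = []
    for t in sorted(times):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


def session_minutes(session):
    """Session length as end time minus start time, in minutes."""
    return (session[-1] - session[0]).total_seconds() / 60


def parse(hhmm):
    """Parse the simplified H:MM times used in the example log."""
    return datetime.strptime(hhmm, "%H:%M")


times = [parse(s) for s in ["2:15", "2:17", "2:30", "4:30", "4:32"]]
sessions = split_sessions(times)
lengths = [session_minutes(s) for s in sessions]
print(lengths)  # [15.0, 2.0]
```

A real log would need the full date parsed as well, so that sessions crossing midnight are handled correctly.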
If this were a real-world project, the last page in each session should probably be given some duration too. For this you can use a temporal locality approach:
The duration of the last GET can be estimated as the average duration of all the pages that precede it. In your example (2:15, 2:17, 2:30) the first two pages lasted 15 minutes in total, so the estimate is that the visitor is fairly slow and/or thorough, that the third page lasted 7.5 minutes, and that the session total is 22.5 minutes. From (4:30, 4:32) we deduce that the last page lasted 2 minutes, for a session total of 4 minutes. In the special case of a single page visit you must pick some arbitrary duration, such as 1 minute.
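The estimate above can be sketched roughly as follows. The function name `estimated_session_minutes` and the representation of times as minutes since midnight are assumptions made for the example; the 1-minute default for single-page sessions is the arbitrary value mentioned above.

```python
def estimated_session_minutes(minutes, single_page_default=1.0):
    """Estimate session length including the last page.

    `minutes` holds the sorted request times of ONE session, expressed
    as minutes since midnight. The last page is assumed to last as long
    as the average of the pages before it.
    """
    if len(minutes) == 1:
        # Only one page: nothing to average, fall back to the default.
        return single_page_default
    observed = minutes[-1] - minutes[0]
    avg_page = observed / (len(minutes) - 1)
    return observed + avg_page


first = estimated_session_minutes([135, 137, 150])   # 2:15, 2:17, 2:30
second = estimated_session_minutes([270, 272])       # 4:30, 4:32
print(first, second)  # 22.5 4.0
```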
Another approach is to assign a duration to every page, since some pages take more time to read than others. This means you must read the whole log, determine the average visit time for each page when it appears mid-session, and use that average when the page is last in a session. This is more complicated, and probably not the answer to your homework question.
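A rough sketch of this per-page approach, assuming the log has already been split into sessions of `(minute, url)` pairs. The function names, the sample URLs `/a` and `/b`, and the 1-minute fallback for pages never seen mid-session are all made up for illustration.

```python
from collections import defaultdict


def average_dwell_by_page(sessions):
    """Average time spent on each URL when it is NOT the last page.

    `sessions` is a list of sessions; each session is a list of
    (minute, url) pairs sorted by time.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for session in sessions:
        # Pair every request with the one that follows it.
        for (t, url), (t_next, _next_url) in zip(session, session[1:]):
            totals[url] += t_next - t
            counts[url] += 1
    return {url: totals[url] / counts[url] for url in totals}


def session_minutes_with_page_estimate(session, averages, default=1.0):
    """Observed length plus the average dwell time of the last page."""
    observed = session[-1][0] - session[0][0]
    last_url = session[-1][1]
    return observed + averages.get(last_url, default)


# Hypothetical sessions; the URLs are placeholders, not from a real log.
sessions = [
    [(135, "/a"), (137, "/b"), (150, "/a")],
    [(270, "/a"), (272, "/b")],
]
averages = average_dwell_by_page(sessions)
totals = [session_minutes_with_page_estimate(s, averages) for s in sessions]
print(averages)  # {'/a': 2.0, '/b': 13.0}
print(totals)    # [17.0, 15.0]
```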
The best real-world solution would probably be a mix of these two approaches.


How to disable "Sleeping for N seconds" on PyWikiBot

This question was already asked some 2 years ago on Pywikibot's MediaWiki talk page.
The answers there were along the lines of "you shouldn't" and "maxthrottle isn't the right parameter for that".
For intranet use cases the throttle is mostly counterproductive. Especially when testing the automation, the throttle kicks in no matter how low the number of API accesses is. So I'd rather switch it off or set it to a reasonable time of a few milliseconds instead of the default 10 seconds.
How can the throttle be set to a different time?
see https://github.com/donkaban/pywiki-bot/blob/master/user-config.py#L159
# Slow down the robot such that it never makes a second page edit within
# 'put_throttle' seconds.
put_throttle = 0
0 looks like a good value

Visual Studio Load Test request completion and think time

I'm using load tests in Visual Studio to test our web API services, but to my surprise I can't seem to test what I want to. I have a single URL in my .webtest file and send the same URL again and again to see what the avg. response time is.
Here are the details:
1. I use a constant load of 1 user
2. Test duration of 1 hour
3. Think time of 10 seconds (not the think time between iterations)
4. The avg. response time that I get is 1.5 seconds
5. So the avg. test time comes out to be 11.5 seconds
6. Requests/sec are 0.088
7. And I'm using Sequential Test Order among the 4 different types of test mix
These figures make me think that every time a virtual user sends a request, it waits for that request to complete, in addition to the specified think time, before sending a new one. Thus technically the total think time becomes
Total think time = think time specified + avg. response time
But I don't want the user to wait for an already sent request to come back and only then send a new one after the specified think time. I need to configure the load test so that, if the think time is 10 seconds, the user sends the next request every 10 seconds without waiting for the first one to come back (waiting for the response and then thinking for another 10 seconds is what makes the total 11.5 seconds in my case, as mentioned above). No matter which of the 4 test mix types I choose, Visual Studio always forces the virtual user to wait for the request to complete, then adds the specified think time, and then sends a new one.
I know that what the Visual Studio load test does is the more realistic approach, where the user sends a request, waits until it comes back, thinks or interacts with the website, and then sends a new one.
Any help or suggestion would be appreciated towards what I'm trying to achieve.
In the properties of the scenario, set the "Test mix type" to be "Test mix based on user pace" and set the "Tests per user per hour" as appropriate. See here.
The suggestion in the question that:
Total think time = think time specified + avg. response time
is erroneous. To my mind adding the values does not provide a useful result. The two values on the right are as stated. Think time simulates the time a user spends reading the page, deciding what to do next and typing/clicking/etc. their response. Response time is the "turnaround" time between sending a request and getting the response. Adding them does not increase the think time in any sense; it just gives the total duration for handling the request in this specific test. Another test might make the same request with a different think time. Note that many web pages cause more than one request and response to be issued; JavaScript and other technologies allow web pages to do many clever things.

Does JMeter show the correct average response time for the first page it hits for many virtual users?

I'm load testing a system with 500 virtual users. I've set the "Ramp-Up period (in seconds)" option to zero. As I understand it, JMeter will then hit the system with 500 virtual users all at the same time. Please correct me if I'm wrong here.
Now, the summary report shows that the average response time for the first page is ~100 seconds, which is more than a minute and a half of wait time. But while JMeter was running, I manually went to the same page/URL using a browser and didn't have to wait that long. It was not even close; the page response was almost immediate for me.
My question is: is there any known issue for the average response time of the first page? Is it JMeter which is taking long to trigger that many users?
Thanks in advance.
--Ishtiaque
There is no issue in JMeter related to first page response time.
The Summary Report shows all response times in milliseconds. For the "100 seconds" value, have you converted milliseconds to seconds?
Also, to make sure that 500 users hit concurrently, use a Synchronizing Timer.
Hope this will help.
While the response times will be accurate, you need to consider the effect of starting so many threads at once on both your server and your client.
Starting 500 threads at once is not insignificant on the client. If your server has the connections, it will start 500 threads as well.
Ramping over a period of time is more realistic loadwise, but still not really indicative of server capability until the threads have all started and settled in.
Databases can also require a settling in period which can affect response times.
An alternative to ramping is to introduce a random wait at the start of each thread before firing the first sample. You can then choose not to ramp over time, but you should still expect resources on the client to come under load suddenly, and change the settings if you hit limits. This will make the entire run much more representative of typical behaviour. However, you need to determine whether your use cases are typical.
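To illustrate the idea of a random initial wait outside of JMeter itself, here is a minimal Python sketch. The function `start_offsets`, the 60-second window and the seed are assumptions for the example, not JMeter settings.

```python
import random


def start_offsets(num_threads, max_delay_s, seed=None):
    """Random delay (in seconds) before each thread's first sample.

    Delays are drawn uniformly from [0, max_delay_s], so the threads
    start spread over the window instead of all at the same instant.
    """
    rng = random.Random(seed)
    return sorted(rng.uniform(0, max_delay_s) for _ in range(num_threads))


# 500 virtual users spread over a hypothetical 60-second window.
offsets = start_offsets(500, 60.0, seed=1)
print(len(offsets), offsets[0] >= 0.0, offsets[-1] <= 60.0)
```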
Even after increasing the heap size, I noticed that the reported time was still longer than the actual response time. Later I realised it was the probe effect (the extra time a tool adds through the test execution itself).

Do ongoing parse.com requests continue to count against the API limit?

My understanding of the parse.com API rate limit is that it’s not a concurrent-job limit, it’s just the number of requests started in a given second. So if a user is, say, uploading a file from a slow network and it takes 30 seconds, that’s not 1 of my 30 req/s taken up that whole time. It’s just one request, the first second.
On my team, though, is a wonderful security guy whose job it is to worry. He thinks that if 30 users upload a file each, for 30 seconds, at a 30 r/s limit, no one else will be able to use our app until they are done.
Which one is correct?
Your understanding was correct. It's the number of requests started per second. The duration of the request does not come into play.
Source: I work at Parse.
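A minimal sketch of the difference between counting requests started per second and counting requests in flight, using the scenario from the question. This only illustrates the counting rule described above; it is not Parse's actual implementation, and the function name is made up.

```python
from collections import Counter


def starts_per_second(requests):
    """Count how many requests START in each one-second bucket.

    `requests` is an iterable of (start_time_s, duration_s) pairs;
    the duration is deliberately ignored.
    """
    return Counter(int(start) for start, _duration in requests)


# 30 uploads that all start in second 0 and each take 30 seconds.
uploads = [(0.0, 30.0)] * 30
counts = starts_per_second(uploads)
print(counts[0], counts[1])  # 30 0
```

Under this rule the uploads use up the limit only for the second in which they start, so seconds 1 through 29 remain fully available to other users.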
I think you are right. I ran some experiments with Parse: for example, I reloaded a UITableView 10 or 20 times per second (I can't remember exactly) for 3-4 minutes and checked the requests in the admin panel. The maximum value was always less than 30, but that doesn't matter; the point is that you can test it this way and get more information.
Just create a test project and reload SampleViewController.m (which contains a Parse query) 30 times in one second; afterwards you can check the data browser, which displays the traffic in req/sec.
As a second option, you can start uploading a bunch of images as the current user every second. Since the upload time is longer than 1 second, you can see what happens when new uploads of images (or other data) keep starting every second.

sorting algorithm issue

I need help with a problem in my server application. The thing is:
I need to count the 'top urls' on my web server within, e.g., one minute. How can I do that?
By 'top urls' I mean the top 10 or so.
Suppose in one minute I got:
1 request with url 'http://localhost/10.jpg',
2 requests with url 'http://localhost/1.jpg', and 'http://localhots/12.jpg'
4 requests with url 'http://localhost/2.jpg' and 'http://localhost/3.jpg'
and 10 requests for 'http://localhost/13.jpg'
Should I add all requests to a table and then sort them after the given time, or is there another, simpler way to sort them?
Thanks for all your help.
If you are keeping a temporary hit counter for each page, you don't really need to sort. When you want to start tracking, reset all the temporary counters to 0 and initialize a top ten list of pages. Every time a page is fetched, increment its count, then check the value against the top ten list. If the count is greater than the next higher count on the list, move it up a rank.
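For comparison, here is a minimal Python sketch that keeps a hit counter per URL and extracts the top entries at the end of the window. Note that it uses `collections.Counter.most_common` rather than the incrementally maintained top ten list described above, and the function name `top_urls` is made up for the example.

```python
from collections import Counter


def top_urls(urls, n=10):
    """Return the n most requested URLs as (url, count) pairs."""
    return Counter(urls).most_common(n)


# Request counts similar to the question's example for one minute.
hits = (
    ["http://localhost/10.jpg"] * 1
    + ["http://localhost/1.jpg"] * 2
    + ["http://localhost/2.jpg"] * 4
    + ["http://localhost/3.jpg"] * 4
    + ["http://localhost/13.jpg"] * 10
)
best = top_urls(hits, 1)
print(best)  # [('http://localhost/13.jpg', 10)]
```

For a per-minute window, one option is to start a fresh counter each minute and report `top_urls` for the counter that just closed.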
