sorting algorithm issue - sorting

I need help with a problem in my server application.
I need to count the 'top URLs' on my web server within a time window, e.g. one minute. How can I do that?
By 'top URLs' I mean the top 10 or so.
Suppose that in one minute I got:
1 request for 'http://localhost/10.jpg',
2 requests each for 'http://localhost/1.jpg' and 'http://localhots/12.jpg',
4 requests each for 'http://localhost/2.jpg' and 'http://localhost/3.jpg',
and 10 requests for 'http://localhost/13.jpg'.
Should I add all requests to a table and then sort them after the given time, or is there another, simpler way to sort them?
Thanks for all help.

If you are keeping a temporary hit counter for each page, you don't really need to sort. When you want to start tracking, reset all the temporary counters to 0 and initialize a top ten list of pages. Every time a page is fetched, increment its count, then check the value against the top ten list. If the count is greater than the next higher count on the list, move the page up a rank.
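The counter-plus-ranked-list idea above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name `TopUrlTracker` and its methods are hypothetical, and it assumes all counters fit in memory for one tracking window.

```python
from collections import defaultdict


class TopUrlTracker:
    """Hypothetical helper: per-window hit counters plus a small ranked list."""

    def __init__(self, size=10):
        self.size = size
        self.reset()

    def reset(self):
        # Call at the start of each tracking window (e.g. every minute).
        self.counts = defaultdict(int)
        self.top = []  # URLs ordered from most hits to fewest hits

    def hit(self, url):
        self.counts[url] += 1
        if url not in self.top:
            if len(self.top) < self.size:
                self.top.append(url)
            elif self.counts[url] > self.counts[self.top[-1]]:
                # The newcomer now beats the lowest-ranked entry: replace it.
                self.top[-1] = url
            else:
                return
        # Move the URL up while it beats the entry ranked just above it.
        i = self.top.index(url)
        while i > 0 and self.counts[url] > self.counts[self.top[i - 1]]:
            self.top[i - 1], self.top[i] = self.top[i], self.top[i - 1]
            i -= 1

    def ranking(self):
        return [(url, self.counts[url]) for url in self.top]


# Usage with the request counts from the question, keeping only a top 3.
tracker = TopUrlTracker(size=3)
hits = {"10.jpg": 1, "1.jpg": 2, "12.jpg": 2, "2.jpg": 4, "3.jpg": 4, "13.jpg": 10}
for name, n in hits.items():
    for _ in range(n):
        tracker.hit("http://localhost/" + name)
top3 = tracker.ranking()
```

Each hit costs at most a scan of the short list, so no full sort of all URLs is needed at the end of the window.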

Related

Oracle Reports Builder 12 multiple page report

I have a report that spans 3 pages. When I run it for a parameter that may return more than one record, it returns all the page 1s first, then all the page 2s and finally all the page 3s.
I haven't been able to figure out how to format it so that it outputs page 1-3 for the first record, then page 1-3 for the second and so on.
Any guidance would be appreciated.
From the way you described it, I think of 3 queries, each of them producing one of the pages.
One of them (let's call it Q1) should be the "master" query, and others linked to it.
In Paper Layout Editor, Q1's repeating frame (R1) should enclose all other frames so that all data related to the 1st record are grouped together. Don't forget to set vertical elasticity properties (probably to "variable"), set page protect to "yes"; also, you'll probably want to set page break on all repeating frames so that they are printed on separate pages.

Chance of data mismatch when Multiple users share datafiles

The scenario is: change user password (the user changes the current password to a new password).
I have to run a fixed number of users for a duration of 1 hour.
The input I have is a fixed number of user IDs, all set to the same password.
The data file (username,password,newpassword) for 3 users (the number of users can vary up to 200) looks like:
u1,p1,newp1
u2,p1,newp1
u3,p1,newp1
u1,newp1,newp2
u2,newp1,newp2
u3,newp1,newp2
u1,newp2,newp3
u2,newp2,newp3
u3,newp2,newp3....
The sharing mode used for the CSV Data Set Config is "All threads".
So basically all users will start with password p1 and change it to newp1 in the first iteration. Then for the next iteration the current password will be newp1 and it will be changed to newp2.
Initially I will use a 5-10 minute ramp-up.
My doubt is this: it works with a few users, but is it certain that all the users will pick the data from the CSV in the correct order over a prolonged duration?
For example, if some of the users finish an iteration earlier, will they pick up user IDs that are still being processed by other threads?
Any mismatch will affect the execution, since the assumption is that the users will pick the correct current and new passwords. Also, one failure will fail all remaining iterations.
Please check whether this approach is correct and suggest modifications if needed. Also feel free to suggest a better approach for the scenario if you have one.

How get max count of request in time in splunk

Hi, I'm developing a Rails web application with a Solr search engine inside. The path to get search results is '/search/results'.
Users make many requests when searching for something, and I need to get the maximum number of search requests per time interval over all time (to check whether I need to do some optimization, increase RAM, etc.). I know that there are peak times when the load is critical and search works slowly.
I use the Splunk service to collect app logs and it's possible to get this request count from the logs, but I don't know how to write the correct Splunk query to get the data I need.
So, how can I get the maximum number of requests per hour to the '/search/results' path for a date range?
Thanks kindly!
If you can post your example data and/or your sample search, it's much easier to figure out. I'll just post a few examples that I think might lead you in the right direction.
Let's say the '/search/results' is in a field called "uri_path".
earliest=-2w latest=-1w sourcetype=app_logs uri_path="/search/results"
| stats count(uri_path) by date_hour
would give you a count per hour of the day over the week that ended one week ago. Note that date_hour is the hour of the day (0-23), so the same hour on different days is counted together.
earliest=-2w latest=-1w sourcetype=app_logs uri_path=*
| stats count by uri_path, date_hour
would split the table (you can think 'group by') by the different uri_paths.
You can use the time-range picker on the right side of the search bar to use a GUI to select your time if you don't want to use the time range abbreviations, (w=week, mon=month, m=minute, and so on).
After that, all you need to do is | pipe to the stats command where you can count by date_hour (which is an automatically generated field).
NOTE:
If you don't have the uri_path field already extracted, you can do it really easily with the rex command.
... | rex "matching stuff before uri path (?<uri_path>\/\w+\/\w+) stuff after"
| search uri_path="/search/results"
| stats count(uri_path) by date_hour
In case you want to learn more:
Stats Functions (in Splunk)
Field Extractor - for permanent extractions
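For checking the numbers outside Splunk, the "count per hour, then take the maximum" logic can be sketched in Python. This is only an illustration under assumptions: the function name `max_hourly_requests` is hypothetical, and it assumes the log events have already been parsed into (timestamp, uri_path) pairs. Unlike grouping by date_hour, it buckets by calendar hour, so hours on different days stay separate.

```python
from collections import Counter
from datetime import datetime


def max_hourly_requests(events, path="/search/results"):
    """Return (hour_start, count) for the busiest calendar hour, or None.

    `events` is an iterable of (datetime, uri_path) pairs (assumed format).
    """
    per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0)
        for ts, uri_path in events
        if uri_path == path
    )
    if not per_hour:
        return None
    return max(per_hour.items(), key=lambda item: item[1])


# Made-up sample events, purely for illustration.
sample = [
    (datetime(2024, 1, 1, 9, 5), "/search/results"),
    (datetime(2024, 1, 1, 9, 40), "/search/results"),
    (datetime(2024, 1, 1, 10, 1), "/search/results"),
    (datetime(2024, 1, 2, 9, 15), "/search/results"),
    (datetime(2024, 1, 1, 9, 30), "/other"),
]
busiest = max_hourly_requests(sample)
```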

Correct method to calculate "Total wait time" of a session in oracle

I need to find out the total time a session spends waiting while it is active.
For this I used a query like the one below:
SELECT (SUM (wait_time + time_waited) / 1000000)
FROM v$active_session_history
WHERE session_id = 614
But I feel I'm not getting what I wanted from this query.
The first time I ran this query I got 145.980962, the second time 145.953926, and the third time 127.706429.
Ideally, the time should stay the same or increase. But, as you can see, the value returned decreases every time.
Please correct me where I'm going wrong.
It does not contain whole history, v$active_session_history "forgets" older lines. Think about it as a ring of buffers. Once all buffers are written, it restarts from 1st buffer.
To get the events of a session, look at v$session_event. To get the current (active) event of an active session, use v$session_wait (in recent Oracle versions, you can also find this info in v$session).
NOTE: v$session_event view will not show you CPU time (which is not event but can be seen in v$active_session_history). You can add it, for example, from v$sesstat if needed...
Your bloomer is that you have not understood the nature of v$active_session_history: it is a sample not a log. That is, each record in ASH is a point in time, and doesn't refer back to previous records.
Don't worry, it's a common mistake.
This is a particular problem with WAIT_TIME. This is the total time waited for that specific occurrence of that event. So if the wait event stretches across two samples, in the first record WAIT_TIME will be 1 (one second) and in the next sample it will be 2 (two seconds). However, a SUM(WAIT_TIME) would produce a total of 3, which is too much. Of course this is an arithmetic progression, so if the wait event stretches to ten samples (ten seconds) a SUM(WAIT_TIME) would produce a total of 55.
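The arithmetic above can be checked with a small sketch. It assumes, as described, one sample per second and a per-sample value that grows 1, 2, 3, ... while the wait continues; both function names are hypothetical.

```python
def summed_wait(samples: int) -> int:
    """Total obtained by SUMming the cumulative values 1, 2, ..., samples."""
    return sum(range(1, samples + 1))


def counted_wait(samples: int, interval_seconds: int = 1) -> int:
    """Estimate obtained by COUNTing samples: one sample ~ one interval waited."""
    return samples * interval_seconds


two_sample_sum = summed_wait(2)    # inflated total for a two-second wait
ten_sample_sum = summed_wait(10)   # inflated total for a ten-second wait
ten_sample_count = counted_wait(10)
```

The sum grows as n(n+1)/2 while the real wait grows as n, which is why counting samples gives the more sensible estimate.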
Basically, WAIT_TIME is a flag - if it is 0 the session is ON CPU and if it's greater than zero it is WAITING.
TIME_WAITED is only populated when the event has stopped waiting. So a SUM(TIME_WAITED) wouldn't give an inflated value. In fact just the opposite: it will only be populated for wait events which were ongoing at the sample time. So there can be lots of waits which fall between the interstices of the samples which won't show up in that SUM.
This is why ASH is good for highlighting big performance issues and bad for identifying background niggles.
So why doesn't the total time increase each time you run your query? Because ASH is a circular buffer. Older records get aged out to make way for new samples. AWR stores a percentage of the ASH records on disk (the default is one record in ten); they are accessible through DBA_HIST_ACTIVE_SESS_HISTORY. So probably ASH purged some samples with high wait times between the second and third times you ran your query. You could check that by including MIN(SAMPLE_TIME) in the select list.
Finally, bear in mind that SIDs get reused. The primary key for identifying a session is (SID, Serial#). Your query only filters on SID, so it may use data from several different sessions.
There is a useful presentation by Graham Wood, one of the Oracle gurus who worked on ASH, called "Sifting through the ASHes". Although it would be better to hear Graham speaking, the slide deck on its own still provides some useful insights. Find it here.
tl;dr
ASH is a sample not a log. Use it for COUNTs not SUMs.
"Anything wrong in the way query these tables? "
As I said above, but perhaps didn't make clear enough, DBA_HIST_ACTIVE_SESS_HISTORY only holds a fraction of the records from ASH. So it is even less meaningful to run SUM() on its columns than on the live ASH.
Whereas V$SESSION_EVENT is an actual log of events. Its wait times are reliable and accurate. That's why you pay the overhead of enabling timed statistics. Having said which, V$SESSION_EVENT only gives us aggregated values per session, so it's not particularly useful in diagnosis.

Determining session time for a website

For one of my classes we need to calculate the session length for a user visiting a website. We were given a web log. The web log is in this format:
IPAddress date httpMethod httpStatus size referrer browserInfo
The httpMethod looks like this: GET /include/main_page.css HTTP/1.1
The referrer is always the main page: http://www.cs.myCollage.com or -
I am using a timeout value of 20 minutes.
QUESTIONS:
I am not sure how to tell when a session is over other than when it times out. Is the only way to end a session with a timeout? Is there a way to detect when a user leaves the site (using only the information in the logs)?
This is my current strategy (assume that we have these logs):
IPAddress Time httpMethod ...
IP1 2:15 GET something
IP1 2:17 GET something else
IP1 2:30 GET something else
IP1 4:30 GET something else
IP1 4:32 GET something else
This means that the user has had two sessions. I think that the first session would be either 15 minutes or 35 minutes. Should I include the timeout in the session time?
The second session would be between 2 minutes and 22 minutes.
The timeout value is used to separate different sessions coming from the same IP (which is not necessarily the same person). In your example you have two different sessions because the gap from 2:30 to 4:30 is larger than the timeout value.
As for determining session length, the straightforward class homework solution, and probably what the teacher had in mind, is to subtract the start time from the end time. In your case that gives 15 minutes for the first session and 2 minutes for the second.
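A minimal sketch of this splitting and subtraction, assuming the request times for one IP have already been parsed into datetime values (the function names and the calendar date are placeholders, not part of the assignment):

```python
from datetime import datetime, timedelta


def split_sessions(times, timeout=timedelta(minutes=20)):
    """Group one IP's request times into sessions separated by gaps > timeout."""
    sessions = []
    for t in sorted(times):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)  # still within the current session
        else:
            sessions.append([t])    # gap too large: start a new session
    return sessions


def session_length(session):
    """Simple length: last request time minus first request time."""
    return session[-1] - session[0]


day = datetime(2024, 1, 1)  # arbitrary date, only the times matter here
log = [day.replace(hour=h, minute=m)
       for h, m in [(2, 15), (2, 17), (2, 30), (4, 30), (4, 32)]]
lengths = [session_length(s) for s in split_sessions(log)]
```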
If this were a real-world project, then maybe the last page in each session should be given some duration too. For this you can use a temporal locality approach:
The duration of the last GET can be estimated as the average duration of all the pages that precede it. In your example (2:15, 2:17, 2:30) the first two pages lasted 15 minutes in total, so the estimate is that the visitor is fairly slow and/or thorough and that the third page lasted 7.5 minutes, giving a session total of 22.5 minutes. From (4:30, 4:32) we deduce that the last page lasted 2 minutes, for a session total of 4 minutes. In the special case where there is only one page visit you must pick some arbitrary value for the duration, such as 1 minute.
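The estimate can be written as a short sketch. It assumes the request times of a single session are given in minutes since midnight; the function name and the 1-minute default for single-page sessions are illustrative choices.

```python
def estimated_session_minutes(request_minutes, single_page_minutes=1.0):
    """Observed span plus the average duration of the preceding pages."""
    times = sorted(request_minutes)
    if len(times) == 1:
        # Only one page: nothing to average, so fall back to an arbitrary value.
        return single_page_minutes
    observed = times[-1] - times[0]
    average_page = observed / (len(times) - 1)  # mean time per preceding page
    return observed + average_page


first = estimated_session_minutes([2 * 60 + 15, 2 * 60 + 17, 2 * 60 + 30])
second = estimated_session_minutes([4 * 60 + 30, 4 * 60 + 32])
```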
Another approach is to assign a typical duration to every page, since some pages take more time to read than others. This means you must read the whole log and determine the average visit time for each page when it appears mid-session, then use that time when the page is the last one in a session. This is more complicated, and probably not an answer to your homework question.
The best real-world solution would probably be a mix of these two approaches.
