I have created a view in Ganglia showing cpu_user stats.
Can someone tell me what Sintr means? I was not able to find any information on Google or the Stack Exchange sites.
Interestingly, I have two servers with identical hardware that I'm monitoring, but only one of them has the Sintr entry (which caught my eye).
Okay, I found an answer hidden in some Ganglia dev mailing list...
From this post:
I also added two specific metrics to Linux. cpu_intr and cpu_sintr
count the number of cycles spent on hard/soft interrupts.
Still wondering why it's only shown for one server and not for the other, but that's another story.
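For reference, on Linux these CPU metrics come from /proc/stat, where the irq and softirq columns hold the time spent servicing hard and soft interrupts. A minimal Python sketch of the same calculation (the column positions are standard, but treat this as an illustration rather than Ganglia's actual code):

```python
# Sketch: derive cpu_intr / cpu_sintr style percentages from /proc/stat.
# Columns on the aggregate "cpu" line: user nice system idle iowait irq softirq ...
import time

def cpu_times():
    with open("/proc/stat") as f:
        return list(map(int, f.readline().split()[1:8]))

before = cpu_times()
time.sleep(1)
after = cpu_times()
delta = [b - a for a, b in zip(before, after)]
total = sum(delta) or 1

print(f"cpu_intr  = {100.0 * delta[5] / total:.2f}%")  # hard interrupts (irq)
print(f"cpu_sintr = {100.0 * delta[6] / total:.2f}%")  # soft interrupts (softirq)
```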
I would like to know if there is a proper method to track memory accesses
across multiple resources at once. For example, I set up a simple dual-core system
by extending the simple.py from learning gem5 (I just added another
TimingSimpleCPU and made the port connections).
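For reference, here is roughly what that modification looks like (a minimal sketch following the learning-gem5 simple.py; the port names are from recent gem5 versions, and the SE-mode workload/process setup is omitted):

```python
# Sketch: dual-core variant of learning gem5's simple.py (structural part only).
from m5.objects import *

system = System()
system.clk_domain = SrcClockDomain(clock="1GHz", voltage_domain=VoltageDomain())
system.mem_mode = "timing"
system.mem_ranges = [AddrRange("512MB")]

# Two timing CPUs instead of one, both sharing the memory bus.
system.cpu = [TimingSimpleCPU(cpu_id=i) for i in range(2)]
system.membus = SystemXBar()

for cpu in system.cpu:
    cpu.icache_port = system.membus.cpu_side_ports
    cpu.dcache_port = system.membus.cpu_side_ports
    cpu.createInterruptController()
    # On X86, also wire up the interrupt controller ports,
    # exactly as in the single-core tutorial script.

system.mem_ctrl = MemCtrl()
system.mem_ctrl.dram = DDR3_1600_8x8()
system.mem_ctrl.dram.range = system.mem_ranges[0]
system.mem_ctrl.port = system.membus.mem_side_ports
system.system_port = system.membus.cpu_side_ports
```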
I took a look at the different debug options and found, for example, the
MemoryAccess flag (and others), but this seemed to show only the accesses at
the DRAM or at one other resource component.
Nevertheless, I imagine there is a way to track events across the CPU, the bus, and finally the memory.
Does this feature already exist?
What can I try next? Would it make sense to add my own --debug-flag, or can I work
with the TraceCPU for my use case?
I haven't worked much with gem5 yet, so I'm not sure how to achieve this. Since until now I have only run in SE mode, would FS mode be a solution?
Finally, I also found the TraceCPUData flag among the --debug-flags, but running
my config script with it created no output (like many other flags, by the way).
It seems that this is a debug flag for the TraceCPU; what kind of output does this flag create, and can it help me?
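One workable approach, since there is no single flag that follows a request end to end: enable several per-component flags together (for example --debug-flags=ExecAll,MemoryAccess --debug-file=trace.out; flag names vary by gem5 version, and gem5.opt --debug-help lists them), then filter the combined trace for the address you care about. A minimal sketch, assuming the usual "<tick>: <object>: <message>" line format of gem5 debug output and a config script of your own:

```python
# Sketch: follow one physical address across components in a gem5 debug trace.
# Produce the trace with something like:
#   build/X86/gem5.opt --debug-flags=ExecAll,MemoryAccess \
#       --debug-file=trace.out your_two_cpu_config.py
import re
import sys

LINE = re.compile(r"^\s*(\d+):\s+([\w.\[\]]+):\s+(.*)$")

def follow(trace_path, needle):
    """Print every event that mentions `needle`, e.g. an address like 0x400."""
    with open(trace_path) as f:
        for raw in f:
            m = LINE.match(raw)
            if m and needle in m.group(3):
                tick, obj, msg = m.groups()
                print(f"{tick:>12}  {obj:<35} {msg}")

if __name__ == "__main__":
    follow(sys.argv[1], sys.argv[2])  # e.g.: python follow.py trace.out 0x400
```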
I am evaluating various system monitoring tools, looking for one to monitor my Hadoop cluster.
One of the tools I am impressed by is collectl. I have been playing around with it for a couple of days.
I am struggling to find out how to aggregate the metrics captured by collectl when using colmux.
Say I have 10 nodes in my Hadoop cluster, each running collectl as a service. Using colmux I can see the
performance metrics of each node in a single view (in single- and multi-line formats). Great!
But what if I want an aggregate of CPU, I/O, etc. across all the nodes in the cluster? That is, I want to find out
how my cluster as a whole is performing by aggregating the performance metrics from each node into corresponding
numbers, thereby giving me cluster-level metrics instead of node-level ones.
Any help is greatly appreciated. Thanks!
I had already answered this on the mailing list, but for the benefit of those not on it I'll repeat myself here.
That's a cool idea. So if I understand you correctly you might see some sort of total line at the bottom? I can always add to my wish list but no promises. But I think I may also have a solution if you don't mind doing a little extra work on your own ;) btw - can I assume you've installed readkey so you can change sort columns with the arrow keys?
If you run colmux with --noesc, it will take it out of full-screen mode and simply print everything as scrolling output. If you then also include "--lines 99999" (or some big number) it will print all the output from all the remote systems so you don't miss anything. Finally you can pipe the output through perl, python, bash, or whatever your favorite scripting tool might be and do the totals yourself. Then whenever you see a new header fly by, print the totals and reset the counters to 0. You could even add timestamps and maybe even ultimately make it your own open-source project. I bet others would find it useful too.
-mark
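To make that suggestion concrete, here is a minimal Python sketch of the idea. The exact colmux invocation and column layout depend on your collectl flags, so the parsing below (hostname first, '#' header lines marking each interval) is an assumption to adapt:

```python
#!/usr/bin/env python3
# Sketch: sum colmux's per-node columns into cluster-wide totals, e.g.
#   colmux -address node1,node2 -command "-scd" --noesc --lines 99999 | python3 totals.py
# Assumes data lines start with a hostname followed by numeric columns, and
# that each sample interval begins with a header line starting with '#'.
import sys

totals = []

def flush():
    global totals
    if totals:
        print("TOTAL:", " ".join(f"{t:.1f}" for t in totals))
    totals = []

for line in sys.stdin:
    print(line, end="")                 # pass the original output through
    if line.startswith("#"):            # new header -> emit totals, reset
        flush()
        continue
    nums = []
    for f in line.split()[1:]:          # drop the hostname column
        try:
            nums.append(float(f))
        except ValueError:
            nums.append(0.0)            # non-numeric column, count as 0
    if nums:
        totals = [t + n for t, n in zip(totals, nums)] if totals else nums
flush()
```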
We have been having a bit of a nightmare this last week with a business-critical XPages application. All of a sudden it has started crawling really badly, to the point where I have to reboot the server daily, and even then some pages can take 30 seconds to open.
The server has 12GB RAM, and 2 CPUs, I am waiting for another 2 to be added to see if this helps.
The database has around 100,000 documents in it, with no more than 50,000 displayed in any one view.
The same database, set up as a training application with far fewer documents on the same server, always responds, even when the main copy is crawling.
There are a number of view panels in this application - I have read these are really slow. Should I get rid of them and replace with a Repeat control?
There are also Readers fields on the documents containing roles, and Authors fields, as it's a workflow application.
I removed quite a few unnecessary views from the back end over the weekend to help speed it up but that has done very little.
Any ideas where I can check to see what's causing this massive performance hit? It's only really become unworkable in the last week but as far as I know nothing in the design has changed, apart from me deleting some old views.
Try to get more information about the state of your server and application.
Hardware troubleshooting is summarized here: http://www-10.lotus.com/ldd/dominowiki.nsf/dx/Domino_Server_performance_troubleshooting_best_practices
Since, in your experience, only one of the two applications has slowed down, it is more likely a code problem. The best thing is to profile your code: http://www.openntf.org/main.nsf/blog.xsp?permaLink=NHEF-84X8MU
To go deeper, you can start looking for semaphore locks (http://www-01.ibm.com/support/docview.wss?uid=swg21094630), look at javadumps (http://lazynotesguy.net/blog/2013/10/04/peeking-inside-jvms-heap-part-2-usage/) and NSDs (http://www-10.lotus.com/ldd/dominowiki.nsf/dx/Using_NSD_A_Practical_Guide/$file/HND202%20-%20LAB.pdf), and check the garbage collector settings (see "Best setting for HTTPJVMMaxHeapSize in Domino 8.5.3 64 Bit").
This presentation gives a good overview of Domino troubleshooting (among many others on the web).
OK, so we resolved the performance issues by doing a number of things. I'll list the changes we made in order of the improvement gained, starting with the simple tweaks that weren't really noticeable.
Defragged the Domino drive - it was showing as 32% fragmented and I thought I was on to a winner, but it was really no better after the defrag, even though IBM docs say even 1% fragmentation can cause performance issues.
Reviewed all the main code in the application and took out a number of needless lookups where they could be replaced with applicationScope variables. For instance, on the search page one of the drop-down choices got its values by doing an @Unique lookup on all documents in the database. I changed it to a keyword and put that in the applicationScope.
Removed multiple checks on database.queryAccessRole and put the user's roles in a sessionScope variable.
The DB had 103,000 documents - 70,000 of them were tiny little docs with about 5 fields on them. They don't need to be indexed by the FT index, so we moved them into a separate database and pointed the data source to that DB when these docs were needed. The FT index went from 500 MB to 200 MB = faster indexing and searches, but the overall performance of the app was still rubbish.
The big one - I finally got around to checking the application properties, advanced tab. I set the following options:
Optimize document table map (ran a copy-style compact)
Don't overwrite free space
Don't support specialized response hierarchy
Use LZ1 compression (ran a copy-style compact with the -ZU option to convert existing attachments)
Don't allow headline monitoring
Limit entries in $UpdatedBy and $Revisions to 10 (as per the Domino documentation)
And also don't allow the use of stored forms.
Now I don't know which one of these options was the biggest gain, and not all of them will be applicable to your own apps, but after doing this the application flies! It's running like there are no documents in there at all, views load super fast, documents open like they should - quickly - and everyone is happy.
Until the HTTP threads get locked out - that's another question of mine that I am about to post, so please take a look if you have any idea of what's going on :-)
Thanks to all who have suggested things to try.
In my script I have a scenario where the page contains multiple checkboxes, for example 10. The selection varies per user: for example, one user selects 4 checkboxes and another user selects 5.
So how do I correlate those values?
Thank you.
From the website: "Please don’t share your solutions, ask for help, or help others. This is meant to be a challenge."
So you appear to be violating one of the primary rules of this website. I have looked at this challenge, and it's really good for gauging someone's knowledge.
However, to address the technology generally: in reading your question I get the sense you may be missing certain fundamental knowledge for this kind of thing. Here is some of that fundamental knowledge; hopefully it will help, and you can use the increased general knowledge to address this specific question.
Definitions:
Correlation - you're taking data the SERVER sends to the browser, capturing it, and sending it back. Information present on web pages fits into this category.
Parameterization - you've got a set of values you'd like to put into web forms. These are usually values like names, addresses, etc.
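A tool-agnostic sketch of the two ideas in Python (LoadRunner and friends have their own syntax for this; the URL and field names below are made up for illustration):

```python
# Correlation vs. parameterization, illustrated with plain Python requests.
# (The URL, form fields, and users.csv columns are hypothetical.)
import csv
import re
import requests

# Parameterization: values WE supply, e.g. usernames from a data file.
with open("users.csv") as f:
    users = list(csv.DictReader(f))   # e.g. columns: username,password

for user in users:
    session = requests.Session()
    page = session.get("https://example.com/form")

    # Correlation: capture a value the SERVER generated (a token embedded
    # in the page) and send it back with the next request.
    token = re.search(r'name="csrf_token" value="([^"]+)"', page.text).group(1)

    session.post("https://example.com/submit", data={
        "csrf_token": token,              # correlated (server-generated)
        "username": user["username"],     # parameterized (from our data)
    })
```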
Also understand exactly what is happening when you perform certain actions in your browser. When you "click" a checkbox, does that actually send a message to a server? Usually (though not always) it doesn't. So when you use phrases like 'click a checkbox', that tells me you may not appreciate the fact that performance testing is server-focused, not browser-focused.
Performance testing isn't intuitive, so you need to understand these concepts. If you dedicate time to understanding the concepts I've outlined above, you'll have the knowledge to complete the challenge.
Good luck.
What is driving the variation in checkboxes being checked? Is it the result of something that comes back from the server, from a previous request? Or is it somewhat random, based on whatever the user wants to do at runtime?
I'm working on an app that gives traffic alerts in real time and is based on crowd-sourced information. In other words, people use the app and report traffic problems and at the same time they are informed about traffic problems in their area.
A difficult task is how to distinguish real alert reports from fake ones so that the app behaves properly and is useful.
Do you know of any documentation regarding this issue, or any programmer stories or insights into this problem? How should it be tackled?
What I've come up with until now is:
each person using the app is uniquely identified
each alert report has a reliability value in an interval 1 .. x
the reliability of a report is calculated based on the number of users that reported it or confirmed it and the reputation of those people. But how exactly?
each person has a reputation value which is calculated somehow. But how?
I'm not sure how to handle the reputation/reliability stuff so I'd love some input on this. There must be some documentation on how to create a crowd-sourcing product that works.
Panos Ipeirotis has a fabulous talk on this subject. Yes, you want to incentivize good behavior. If Waze is too complicated, this talk will be too, but it will give you a good idea of what is possible if you throw the kitchen sink at it.
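As a concrete starting point for the reliability/reputation bullets above, here is a minimal sketch of one possible scheme, using a 0-1 scale rather than 1..x. All formulas and constants here are illustrative assumptions, not an established algorithm:

```python
# Sketch of one possible reliability/reputation scheme (illustrative only).
from dataclasses import dataclass, field

@dataclass
class User:
    reputation: float = 0.5        # start neutral, kept in the range (0, 1)

@dataclass
class Report:
    reporters: list = field(default_factory=list)   # users who reported/confirmed

def reliability(report: Report) -> float:
    """More confirmations from higher-reputation users -> more reliable.
    1 - prod(1 - rep_i) treats each reporter as independent evidence."""
    score = 1.0
    for user in report.reporters:
        score *= (1.0 - user.reputation)
    return 1.0 - score

def update_reputation(user: User, was_correct: bool, rate: float = 0.1):
    """Nudge reputation toward 1 when a user's report is later confirmed,
    toward 0 when it turns out to be fake."""
    target = 1.0 if was_correct else 0.0
    user.reputation += rate * (target - user.reputation)

# Usage: three users confirm the same alert.
a, b, c = User(0.8), User(0.6), User(0.5)
alert = Report(reporters=[a, b, c])
print(f"reliability = {reliability(alert):.2f}")   # 1 - 0.2*0.4*0.5 = 0.96
update_reputation(a, was_correct=True)             # a's reputation rises to 0.82
```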