How to get number of calls for a trace - google-cloud-stackdriver

I've added traces to measure the execution of some code running on GCP. The traces appear in the trace list on the Stackdriver page and I can see their duration. What I cannot find is the number of times each trace was issued. Where does this number appear?

So the comments confirm that Stackdriver Trace is not usable for collecting information about the number (density) of events along the timeline. I'd expect to get this data from the traces, but I will have to find another tool to obtain it.
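One possible workaround (not mentioned in the original thread) is to count the traces yourself via the Cloud Trace API. Below is a minimal sketch assuming the google-cloud-trace Python client and a hypothetical project ID "my-project"; the exact request shape can differ between client library versions.

    # Rough sketch: count traces reported in the last hour.
    # Assumes the google-cloud-trace client library; the request
    # fields may need adjusting for your library version.
    from datetime import datetime, timedelta, timezone
    from google.cloud import trace_v1

    client = trace_v1.TraceServiceClient()
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=1)

    traces = client.list_traces(
        request={
            "project_id": "my-project",  # hypothetical project ID
            "start_time": start,
            "end_time": end,
        }
    )
    print("traces in the last hour:", sum(1 for _ in traces))

Counting client-side like this is crude and subject to API quotas, but it at least gives a density figure per time window.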

Related

Why are there holes in my cloudwatch logs?

I have been running lambdas written in C# with the serverless.com framework for some months now, and I consistently notice holes in the CloudWatch logs. So far it has only been an annoyance, but I have been looking around for an explanation, and it is getting to the point where I need to understand/fix the problem.
For instance, today the lambda monitor shows hundreds to thousands of executions between 7AM and 8AM, but the CloudWatch logs show log files up until 7:19AM and then nothing until 8:52AM.
What is going on here?
Logs are written per invocation of the lambda, while log streams within a log group correspond to concurrent executions. If you look at your lambda metrics, you will see a stat called ConcurrentExecutions - this is the total number of simultaneous lambda containers you have running at any given moment - but that is NOT the same as Invocations. The headless project I'm on is doing about 5k invocations an hour and we've never been above 5 concurrent executions across any of our 25-ish lambdas (it helps that, after start-up, they all run in about 300ms).
So if you have 100 invocations in 10 seconds, but they each take less than a second to run, then once a given lambda container is spun up it will be reused as long as it keeps receiving events. This is how AWS works around the 'cold start' problem as much as possible, where a given lambda may take 10-15 seconds or more to start up. By trying to predict traffic flow (and you can manipulate these settings as well), AWS attempts to have a warm lambda ready to go whenever you need it.
These concurrent executions are slowly shut down as their volume drops off, their traffic routed back to the containers that are still active.
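As a rough way to see that difference for yourself (my own sketch, not part of the original answer), you can pull both metrics from CloudWatch. The function name "my-function" below is a placeholder, and note that the per-function ConcurrentExecutions metric may not be emitted in every setup.

    # Sketch: compare total Invocations against peak ConcurrentExecutions
    # for one hour. Assumes boto3 credentials and a Lambda "my-function".
    from datetime import datetime, timedelta, timezone
    import boto3

    cw = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=1)

    def stat(metric, statistic):
        resp = cw.get_metric_statistics(
            Namespace="AWS/Lambda",
            MetricName=metric,
            Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
            StartTime=start,
            EndTime=end,
            Period=3600,
            Statistics=[statistic],
        )
        points = resp["Datapoints"]
        return points[0][statistic] if points else 0

    print("Invocations (Sum):", stat("Invocations", "Sum"))
    print("ConcurrentExecutions (Max):", stat("ConcurrentExecutions", "Maximum"))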
What this means for log group logs is twofold:
you may see large 'gaps' in the times, but if you look closely, any given log stream will contain multiple invocations.
log delivery is delayed by several seconds to several minutes depending on server load, so at any given time you may not actually be seeing all the logs for a given moment.
The other possibility is that your logging is not set up correctly (Python lambdas in particular have difficulty logging properly to CloudWatch - the default logging handler doesn't play nice with the way Lambda boots up a handler and attaches it to the log group), or what you are getting is a ton of hits that are not actually doing anything - only pings/keep-alive events that never trigger any of your log statements - in which case you will generally only see the start-up/shutdown log statements (and as stated above, those are far fewer).
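For the Python logging caveat mentioned above, the usual workaround (a general sketch, not something specific to this thread) is to reuse the root logger that the Lambda runtime has already configured instead of attaching your own handler:

    # The Python Lambda runtime attaches its own handler to the root
    # logger, so grab that logger and just set the level; calling
    # logging.basicConfig() would be a no-op because a handler exists.
    import logging

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    def handler(event, context):
        logger.info("received event: %s", event)  # lands in CloudWatch
        return {"statusCode": 200}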
What do you mean by gaps in log groups?
A log group gets its logs from log streams, and invocations handled by the same lambda container use the same log stream. So the most recent log stream in your log group may not be the one with the latest log entry.
Here you can read more about it:
https://dashbird.io/blog/how-to-save-hundreds-hours-debugging-lambda/
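If you want to find the stream that actually holds the newest entries, one option (an illustrative sketch, not from the linked article) is to sort the streams by last event time rather than by name or creation time; the log group name below is a placeholder:

    # Sketch: list the log streams for a Lambda's log group,
    # ordered by the time of their most recent event.
    import boto3

    logs = boto3.client("logs")
    resp = logs.describe_log_streams(
        logGroupName="/aws/lambda/my-function",  # placeholder
        orderBy="LastEventTime",
        descending=True,
        limit=5,
    )
    for stream in resp["logStreams"]:
        print(stream["logStreamName"], stream.get("lastEventTimestamp"))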
While trying to edit my question with screenshots and tallies of the data, I came upon the answer. I thought it would be helpful for this to be a separate answer as it is extremely specific and enlightening.
The crux of the problem is that I didn't expect such huge gaps between invocation times and log write times. 12 minutes is an eternity compared to the work I have done in the past.
Consider this graph:
12:59 UTC should be 7:59AM CST. Counting the invocations between 12:59 and 13:08, I get roughly 110.
Cloudwatch shows these log streams:
Looking at these log streams, there seems to be a large gap. The timestamp on a log stream is the "file close" time. The log stream for 8:08:37 includes events from 12 minutes before.
So the timestamps on the log streams are not very useful for finding debug data. The "search all" feature has not been very helpful up until now either - slow and very limited. I will look into some other method for crunching logs.
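One alternative to eyeballing individual log streams (my own sketch, assuming boto3 and a placeholder log group name) is to query the whole log group for a time window, which sidesteps the misleading stream timestamps entirely:

    # Sketch: pull every event in the log group for a 15-minute window,
    # regardless of which log stream it landed in.
    from datetime import datetime, timedelta, timezone
    import boto3

    logs = boto3.client("logs")
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=15)

    paginator = logs.get_paginator("filter_log_events")
    for page in paginator.paginate(
        logGroupName="/aws/lambda/my-function",   # placeholder
        startTime=int(start.timestamp() * 1000),  # epoch milliseconds
        endTime=int(end.timestamp() * 1000),
    ):
        for event in page["events"]:
            print(event["timestamp"], event["message"].rstrip())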

Should average response time include failed transactions or not?

In the LoadRunner report, failed transactions are excluded when calculating the average response time, but JMeter includes failed transactions as well. I am a bit confused here. What is the best way to calculate average response time? Should it include failed transactions or not? A detailed explanation would be highly appreciated.
It depends on where exactly your "transaction" failed.
If it reached the server, made a "hit" (or several hits), kicked off request processing and failed with a non-successful status code - I believe it should be included, as your load testing tool has triggered the request and it's the application under test which failed to respond properly or on time.
If the "transaction" didn't start due to missing test data or incorrect configuration of the load testing tool - it shouldn't be included. However, that means your test is not correct and needs to be fixed.
So for well-behaved tests I would include everything in the report and maybe prepare 3 views:
Everything (with passed and failed transactions)
Successes only
Failures only
In JMeter you can use the Filter Results Tool to remove failed transactions from the final report; the tool can be installed using the JMeter Plugins Manager.
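As a rough illustration of those three views (my own sketch, assuming a CSV-format JTL named results.jtl with the default label, elapsed and success columns), you can compute them straight from the results file:

    # Sketch: average response time per label for all / passed / failed samples.
    # Assumes a CSV JTL with the default JMeter columns, where "elapsed"
    # is in milliseconds and "success" is the string "true" or "false".
    import pandas as pd

    df = pd.read_csv("results.jtl")
    passed = df["success"].astype(str).str.lower() == "true"

    views = {
        "everything": df,
        "successes only": df[passed],
        "failures only": df[~passed],
    }
    for name, view in views.items():
        print(f"--- {name} ---")
        print(view.groupby("label")["elapsed"].mean().round(1))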
A failed transaction can be faster than one which passes. For example, a 4xx or 5xx status message may arrive back at the client almost instantaneously. Get enough of these errors and your average response time will drop considerably. In fact, if I were an unscrupulous tester, castigated for the level of failure in my tests, I might include a lot of "fast responses" in my data set to deliberately skew the response time so my stakeholders don't yell at me anymore.
Not that this ever happens.

Generating summary report from jtl

After running a JMeter load test from the command line (non-GUI mode), I would like to have a summary report with each transaction, average response times, number of transactions and so on. I tried to achieve this by importing the summary_report.jtl file with the following steps:
Open JMeter-UI
Add Summary Report Listener
Browse to the summary_report.jtl file that was created during the test.
Now I am seeing all the transactions, #samples, Error% and so on, but the average, min, max and std. deviation values are all ZERO.
What could be the issue here?
Can you look at the raw file and check whether the latency has been captured properly? If your JTL didn't capture latency, you may see all the metrics as 0.
Also check whether there is any exception in the jmeter.log file when you try to open the jtl; that might help with debugging.
(Also, since you mentioned summary_report.jtl: check whether your JTL contains all the samples or is itself a summary report.)
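If you want to verify that quickly (an illustrative sketch, assuming the JTL is in CSV format with the default headers), you can check whether the timing columns are present and actually populated:

    # Sketch: sanity-check a CSV JTL - are the timing columns there,
    # and do they contain non-zero values?
    import csv

    with open("summary_report.jtl", newline="") as f:
        rows = list(csv.DictReader(f))

    print("columns:", list(rows[0].keys()) if rows else "no rows")
    for col in ("elapsed", "Latency"):
        values = [int(r[col]) for r in rows if (r.get(col) or "").isdigit()]
        nonzero = sum(1 for v in values if v > 0)
        print(f"{col}: {len(values)} values, {nonzero} non-zero")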

Get time stamps for stack in crash dump

I have a process that crashes unexpectedly.
Around the same time the crash occurs, I see an error in the log infrastructure process, which then shuts down gracefully.
I'm trying to understand which of the processes is causing the problem: whether the log infra failure brings down my process, or the other way around.
In order to do that, I'm looking at the crash dump my process produced (taken with adplus) and trying to understand at exactly what time the first exit-related method was called, then compare it with the log infra error time and shutdown time.
How can I do that? Is there a way to get timestamps for the method calls in a stack?
Thanks.
Attach WinDbg, or start your app under WinDbg, and enable the show-timestamps setting:
.echotimestamps 1
This will insert timestamps into the output for all events such as exceptions, thread creations, etc. - see this MSDN link.
I would also write a log to disk immediately once WinDbg attaches:
.logopen c:\temp\mylog.txt
to capture the output, this should achieve what you want.

What is the difference between trace log and counter log?

In Performance Monitor, what is the difference between a counter log and a trace log? Any guidelines on when we should use each of them?
Counter logs sample counter values at a periodic interval (e.g. every minute) based on a setting, whereas trace logs only record an event when something happens.
