Some spans reported to Google Trace represent method calls that ended in an error.
Is there a way to get Google Trace to visually set these spans apart from successful spans (a different color, an error icon similar to AWS X-Ray, ...)?
I tried setting these attributes, but visually they made no difference:
Span status
/error/message attribute
/error/name attribute
/http/status_code attribute
You could also use a Trace Filter, which filters traces by "terms". For instance, you could filter on a span together with a latency value, and Stackdriver Trace will show only the matching traces.
I am implementing Micrometer for our GraphQL service. One thing I am noticing is that for @BatchMapping methods we get a DataFetcherObservationContext for each index in the incoming list.
Example: I am looking up a group of SKUs, and for each of those SKUs I am looking up the brand information using a @BatchMapping, so that I make only one web service call to our Brand microservice. However, when I look at the observability trace metrics in Grafana, I see an entry for each index (SKU) in the list that I give to the @BatchMapping. Is there a way to combine these into a single DataFetcherObservationContext so that I do not get one for each SKU that I ultimately return?
See attached screenshot for what I see in Grafana.
I am using all of the out-of-the-box (OOB) Observation contexts for GraphQL and have just started to dabble in creating my own custom implementation, but I am hoping there is an easier way.
I am expecting a single observation for the entire @BatchMapping, not an individual one for each index of the incoming parent list.
Edit: One other thing I am seeing is a StackOverflowError if I try to look at the graphQLContext of the Observation object for all of the parentObservations. It seems to be doing so many that it overflows the buffer.
Is there a way to get the full trace request given a starting point from anywhere in the lifecycle of the trace?
Basically, if I have a middle point or an end point of a trace, can I use those points to obtain the full trace of a request?
I want to build a tracing service (in Golang) where the service can return the full trace of a request given that a user supplies a point/span at any time during the trace of the request.
I have tried searching and looking to see if any projects have mentioned backwards tracing or something similar to that.
Currently, with other tracers like Datadog, it's not possible to get the trace of a full request given any starting point that is not the beginning.
In OpenTelemetry, the Trace ID is immutable and intended to be the same for the entire logical request (assuming W3C Headers). A trace is a directed acyclic graph, which means the ordering can be determined by finding all spans with the same trace ID and then sorting them by their edges (which would be determined by the span ID and parent span ID fields). This means that you can 'look back' very easily as long as you have all of the spans available - you just look for everything with the same trace ID as the span you have, and create the graph.
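The "look back" described above can be sketched in a few lines. This is a minimal illustration, not an OpenTelemetry API: the `Span` record, the `full_trace` helper, and the in-memory `STORE` are all hypothetical stand-ins for whatever span storage you actually have. It collects every span with the same trace ID as the span you hold, links children to parents through the parent span ID, and walks the result from the root.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    # Hypothetical minimal span record; real spans carry many more fields.
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None for the root span
    name: str


def full_trace(known: Span, store: list[Span]) -> list[tuple[int, Span]]:
    """Return every span sharing known.trace_id as (depth, span), parents first."""
    spans = [s for s in store if s.trace_id == known.trace_id]
    ids = {s.span_id for s in spans}
    children = defaultdict(list)
    roots = []
    for s in spans:
        # A span whose parent was never collected is treated as a root,
        # so a partially collected trace still produces output.
        if s.parent_span_id is None or s.parent_span_id not in ids:
            roots.append(s)
        else:
            children[s.parent_span_id].append(s)

    ordered = []
    stack = [(0, r) for r in reversed(roots)]
    while stack:  # iterative depth-first walk from the root(s)
        depth, span = stack.pop()
        ordered.append((depth, span))
        for child in reversed(children[span.span_id]):
            stack.append((depth + 1, child))
    return ordered


# Hypothetical span store containing two traces, in arbitrary order.
STORE = [
    Span("t1", "c", "b", "db.query"),
    Span("t1", "a", None, "GET /orders"),
    Span("t2", "x", None, "GET /health"),
    Span("t1", "b", "a", "orders.lookup"),
]

# Start from a leaf span (the database call) and recover the whole request.
TRACE = full_trace(STORE[0], STORE)
for depth, span in TRACE:
    print("  " * depth + span.name)
```

The same idea carries over to a Go service: the only requirement is that the backing store can be queried by trace ID.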
It depends on what you've logged. If you have access to the full raw logs, and you've logged the entire context at the beginning of every server, client, and middleware request, you can look for all the logs with the same trace ID in their context. Again, depending on your instrumentation, you might not have these logs.
For example, I see an error log in Kibana, but I am not interested only in this error, but also the context of this line, i.e., I want to know what happens before and after this error. Such as:
the order failed with status "FAILED", but the log just before this line would contain the name of the method that caused this error
some 5-10 lines before this, I know there would be a line like "Start processing order xxxxx with status xxx"
and 15-20 lines after this log, there would be something like "End processing with status xxx"
All this together, marks a life cycle of processing of this particular order. And all these lines are what I mean by saying "context".
How can I get all these lines as a search in Kibana? (Let's suppose all the literals are in the field "message".)
For now, I know we can "view surrounding documents", but that is not efficient enough.
https://www.elastic.co/guide/en/kibana/current/xpack-apm.html
Well, I just learned about Elastic APM, and it can solve part of the problem. APM can record "spans" and "transactions" to form a "distributed trace" and add that info to the "trace" field of the logs. We can then aggregate all logs with the same trace ID to learn the context of this event across the microservices.
The question now changes to "How to use APM to add a trace". One of our microservices is reactive, which cannot easily be adapted to use APM: the context in a reactive pipeline is thread-based, and there is no easy way to transfer the trace from one thread's context to another. So this is the part APM cannot solve.
But at least now we know in imperative apps we have a way.
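To make the aggregation step concrete, here is a small sketch of the idea in plain Python rather than in Kibana or the APM agent. Everything in it is an assumption made for illustration: the record layout, the `trace_id` and `ts` field names, and the `context_for` helper. It finds the log line of interest, then returns every log line that carries the same trace ID, in time order.

```python
# Hypothetical log records; "trace_id" stands in for whatever field
# the tracing agent writes into each log line.
LOGS = [
    {"ts": 3, "trace_id": "abc", "message": "End processing with status FAILED"},
    {"ts": 1, "trace_id": "abc", "message": "Start processing order 42 with status NEW"},
    {"ts": 2, "trace_id": "def", "message": "Start processing order 43 with status NEW"},
    {"ts": 2, "trace_id": "abc", "message": "Order failed with status FAILED"},
]


def context_for(logs, needle):
    """Return all log lines in the same trace as the first line containing needle."""
    hit = next((r for r in logs if needle in r["message"]), None)
    if hit is None:
        return []
    same_trace = [r for r in logs if r["trace_id"] == hit["trace_id"]]
    return sorted(same_trace, key=lambda r: r["ts"])


CONTEXT = [r["message"] for r in context_for(LOGS, "Order failed")]
for line in CONTEXT:
    print(line)
```

In Kibana the equivalent is a two-step search: find the error, copy its trace ID, then filter on that ID and sort by timestamp.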
My organization is setting up dashboards for our backend services, and after running performance tests we noticed that some API calls report an HTTP status of N/A.
That is not very helpful. Has anyone seen something like this?
Is that a configuration issue?
Sounds like some of your http.server.requests.count metric values do not have any status tag, so when you group by the status tag, those are being aggregated with a value of n/a.
If it is intentional/expected that this metric would have values without the status tag and you just want to ignore those metric values, then you can use the exclude_null() function to remove that tag grouping from your graph (docs here).
If it is not intentional/expected that this metric would have values without the status tag, then you probably want to reach out to support@datadoghq.com to get that looked into.
The Top Events report in Analytics is showing really strange numbers and I'm not sure why. I noticed because many events with the same category/action (but different labels and pages) have exactly the same number of events occurring (e.g. 8 with 83, ~15 with 62, ~75 with 21), yet these events are on pages with drastically different pageviews, so they shouldn't have such similar counts for clicking the contact button.
Also, if I make a custom flat table report and report on the same numbers they show what seems to be a much more accurate picture... Is there something odd about the numbers in 'Top Events' that I'm not understanding that would mean the numbers it's reporting are correct, or is it actually just messed up/a bug/etc?
Note: the custom report also shows 4 different events from the same page happening throughout the year (as the email changed several times), whereas the Top Events report only shows 1. The single item reported in Top Events corresponds to the 3rd item in the flat table.
Top Events:
Custom Flat Table:
I'm under the impression that you didn't set these event triggers up, so these are spambots that are sending fake event data to your analytics. You may see some spammy referral data in your Referral reports too (unless you've done some filtering already).
Since you mentioned that the events don't correlate to pageviews, they are likely ghosts (they are just sending analytics data without actually visiting your site). You'll want to set up something like a Hostname filter to prevent ghost referrals from sending data to your property without actually visiting your server.
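The logic behind a hostname filter can be illustrated outside of Analytics. The sketch below is an assumption-laden stand-in, not the GA filter itself: `example.com`, the hit records, and the `drop_ghost_hits` helper are hypothetical. It keeps only hits whose hostname matches the hostnames that really serve your pages, which is the include pattern you would enter in the filter.

```python
import re

# Hypothetical: the hostnames that legitimately serve this property's pages.
VALID_HOSTNAME = re.compile(r"(www\.)?example\.com")


def drop_ghost_hits(hits):
    """Keep only hits reported from a valid hostname."""
    return [h for h in hits if VALID_HOSTNAME.fullmatch(h.get("hostname", ""))]


# Hypothetical hits; ghost traffic never loaded the site, so its hostname
# is typically missing or fake.
HITS = [
    {"hostname": "www.example.com", "event": "contact-click"},
    {"hostname": "(not set)", "event": "contact-click"},
    {"hostname": "spam.invalid", "event": "contact-click"},
    {"hostname": "example.com", "event": "contact-click"},
]

KEPT = drop_ghost_hits(HITS)
print([h["hostname"] for h in KEPT])
```

Remember to include every hostname you legitimately track from (for example a checkout or translation domain), or the filter will drop real traffic too.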
For reference, if they were not ghosts (and were actually going to your site), you'd need to either set up more specific Source filters in GA, or a server-side blacklist.
Here is a good article on the motives behind spambots, and here are some options for how to filter them.