How to preserve mixed case in GCP logging as currently it shows all lowercase? - google-cloud-logging

Following is one example of the issue.
The Ruby code is as follows:
DesLogger.instance.info("Google::Cloud.configure.use_error_reporting: #{Google::Cloud.configure.use_error_reporting}")
However, the log entry shown in GCP Logging is:
INFO -- : google::cloud.configure.use_error_reporting: true
The reason for this question is that we have Base64-encoded data shown in GCP logs entirely in lowercase. Because Base64 is case sensitive, the lowercased data cannot be decoded, which blocks further analysis.
I'd appreciate any pointers.
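For anyone hitting the same symptom, here is a minimal Ruby sketch for narrowing down where the lowercasing happens, assuming the DesLogger singleton from the snippet above. Base64 output is mixed case by construction, so comparing the process's own output with what appears in Cloud Logging shows whether the app or the logging pipeline is at fault:

require "base64"

encoded = Base64.strict_encode64("Hello")  # => "SGVsbG8=" (mixed case by construction)
# DesLogger is the application's logger singleton from the question.
DesLogger.instance.info("payload: #{encoded}")
# Writing the same payload straight to STDERR bypasses the logger; if STDERR
# shows mixed case but Cloud Logging shows lowercase, the downcasing happens
# somewhere after the entry leaves the application code.
$stderr.puts("payload: #{encoded}")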

Related

Ill-formed log in Loki

I'm using zerolog in Go, which outputs JSON-formatted logs. The app is running on Kubernetes, and its logs are in the CRI-O format shown in the screenshot below.
[Screenshot: actual log on Grafana Loki]
My question is: since there is some non-JSON text prepended to my JSON log lines, I can't effectively query the log. For example, when I tried to pipe the log into logfmt, exceptions were thrown.
What I want is to be able to query the subfields of the JSON.
My intuition is to select, for each log line, only the part starting from { (the start of the JSON); then I could do more interesting manipulation. I'm a bit stuck and not sure what's the best way to proceed.
Any help and comments are appreciated.
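For what it's worth, that "slice from the first {" idea can be expressed directly in LogQL with the regexp and line_format stages (the {app="myapp"} selector is a placeholder, not from the original post):

{app="myapp"} | regexp `(?P<json_part>\{.*)` | line_format `{{.json_part}}` | json

The regexp stage captures everything from the first { into json_part, line_format rewrites each line to just that capture, and json then exposes the subfields for querying.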
After some head scratching, the problem was solved.
I was directly using the promtail setup from https://raw.githubusercontent.com/grafana/loki/master/tools/promtail.sh
Within this setup the default parser is docker, but it needs to be changed to cri; afterwards, the logs are properly parsed as JSON in my Grafana dashboard.
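Concretely, the change amounts to swapping one pipeline stage in the promtail scrape config. A sketch (the job name and discovery settings here are assumptions, not taken from the linked script):

scrape_configs:
  - job_name: kubernetes-pods   # placeholder name
    kubernetes_sd_configs:
      - role: pod
    pipeline_stages:
      - cri: {}   # the default setup used "- docker: {}"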

Why protobuf needs to encode the type info in the message?

I am reading the official protobuf encoding doc. It states that a protobuf message encodes the type of each field in the message. But I thought the client side has the schema class file as well, so the client should already know the types. Why does protobuf even bother to send type info the client already knows?
It says right there in your linked docs:
When the message is being decoded, the parser needs to be able to skip fields that it doesn't recognize. This way, new fields can be added to a message without breaking old programs that do not know about them. To this end, the "key" for each pair in a wire-format message is actually two values – the field number from your .proto file, *plus a wire type that provides just enough information to find the length of the following value*.
(emphasis mine)
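To make that concrete, here is a small Python sketch of how the wire-format key packs just enough self-description for a parser to skip unknown fields (the function name is mine; the bit layout is from the encoding doc):

# The wire-format "key" for each field is the varint (field_number << 3) | wire_type.
def field_key(field_number: int, wire_type: int) -> int:
    return (field_number << 3) | wire_type

# Field 1 as a varint (wire type 0) gives the familiar 0x08 key byte.
assert field_key(1, 0) == 0x08

# A parser that doesn't recognize the field number still reads the wire type
# from the key, which tells it how to skip over the value:
#   0 = varint, 1 = 64-bit, 2 = length-delimited (length prefix follows), 5 = 32-bit.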

How to read data in logs using logstash?

I have just started with Logstash. I have log files in which a whole object is printed. Since my object is huge, I can't write grok patterns for the entire object, and I am expecting only two values out of that object. Can you please let me know how I can get them?
My log files look like the line below:
2015-06-10 13:02:57,903 your done OBJ[name:test;loc:blr;country:india,acc:test#abe.com]
This is just an example; my object has a lot of attributes in it, and from those objects I need to get only name and acc.
Regards
Mohan.
You can use the following pattern for this:
%{GREEDYDATA}\[name:%{WORD:name};%{GREEDYDATA},acc:%{NOTSPACE:account}\]
GREEDYDATA is defined as follows:
GREEDYDATA .*
The key lies in understanding the GREEDYDATA macro: it matches as many characters as possible.
Logstash patterns don't have to match the entire line. You could also pull the leading information (date, time, etc.) off in one grok{} and then use a different grok{} to pull out just the two fields that you want, as sketched below.
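A sketch of that two-pass approach against the sample line above (the field names logdate and rest are illustrative, not required):

filter {
  # First pass: peel off the leading timestamp, keep everything else in "rest".
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:logdate} %{GREEDYDATA:rest}" }
  }
  # Second pass: extract only the two wanted fields from the object dump.
  grok {
    match => { "rest" => "\[name:%{WORD:name};%{GREEDYDATA},acc:%{NOTSPACE:account}\]" }
  }
}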

How to load complex web logs syntax with Pig?

I am a complete beginner with Pig. I have installed CDH4 Pig and I am connected to a CDH4 cluster. We need to process these web log files, which are going to be massive (the files are already being loaded to HDFS). Unfortunately the log syntax is quite involved (not a typical comma-delimited file). A restriction is that I currently cannot pre-process the log files with some other tool, because they are just too huge and we can't afford to store a copy. Here is a raw line from the logs:
"2013-07-02 16:17:12
-0700","?c=Thing.Render&d={%22renderType%22:%22Primary%22,%22renderSource%22:%22Folio%22,%22things%22:[{%22itemId%22:%225442f624492068b7ce7e2dd59339ef35%22,%22userItemId%22:%22873ef2080b337b57896390c9f747db4d%22,%22listId%22:%22bf5bbeaa8eae459a83fb9e2ceb99930d%22,%22ownerId%22:%222a4034e6b2e800c3ff2f128fa4f1b387%22}],%22redirectId%22:%22tgvm%22,%22sourceId%22:%226da6f959-8309-4387-84c6-a5ddc10c22dd%22,%22valid%22:false,%22pageLoadId%22:%224ada55ef-4ea9-4642-ada5-d053c45c00a4%22,%22clientTime%22:%222013-07-02T23:18:07.243Z%22,%22clientTimeZone%22:5,%22process%22:%22ml.mobileweb.fb%22,%22c%22:%22Thing.Render%22}","http://m.someurl.com/listthing/5442f624492068b7ce7e2dd59339ef35?rdrId=tgvm&userItemId=873ef2080b337b57896390c9f747db4d&fmlrdr=t&itemId=5442f624492068b7ce7e2dd59339ef35&subListId=bf5bbeaa8eae459a83fb9e2ceb99930d&puid=2a4034e6b2e800c3ff2f128fa4f1b387&mlrdr=t","Mozilla/5.0
(iPhone; CPU iPhone OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML,
like Gecko) Mobile/10B329
[FBAN/FBIOS;FBAV/6.2;FBBV/228172;FBDV/iPhone4,1;FBMD/iPhone;FBSN/iPhone
OS;FBSV/6.1.3;FBSS/2;
FBCR/Sprint;FBID/phone;FBLC/en_US;FBOP/1]","10.nn.nn.nnn","nn.nn.nn.nn,
nn.nn.0.20"
As you probably noticed, there is some JSON embedded there, but it is URL-encoded. After URL decoding (can Pig do URL decoding?), here is how the JSON looks:
{"renderType":"Primary","renderSource":"Folio","things":[{"itemId":"5442f624492068b7ce7e2dd59339ef35","userItemId":"873ef2080b337b57896390c9f747db4d","listId":"bf5bbeaa8eae459a83fb9e2ceb99930d","ownerId":"2a4034e6b2e800c3ff2f128fa4f1b387"}],"redirectId":"tgvm","sourceId":"6da6f959-8309-4387-84c6-a5ddc10c22dd","valid":false,"pageLoadId":"4ada55ef-4ea9-4642-ada5-d053c45c00a4","clientTime":"2013-07-02T23:18:07.243Z","clientTimeZone":5,"process":"ml.mobileweb.fb","c":"Thing.Render"}
I need to extract the different fields in the JSON, including the "things" field, which is in fact a collection. I also need to extract the other query-string values in the log. Can Pig directly deal with this kind of source data, and if so, could you be so kind as to guide me through getting Pig to parse and load it?
Thank you!
For such a complicated task, you usually need to write your own Load function.
I recommend Chapter 11, Writing Load and Store Functions, in Programming Pig; the Load/Store Functions page in the official documentation is too brief. A minimal skeleton is sketched below.
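As a rough illustration only, a bare-bones custom LoadFunc has this shape (the class name and parsing logic are placeholders; see the book chapter for the real details):

import java.io.IOException;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class WebLogLoader extends LoadFunc {
    private RecordReader reader;
    private final TupleFactory tupleFactory = TupleFactory.getInstance();

    @Override
    public InputFormat getInputFormat() {
        return new TextInputFormat();  // read the raw log lines as text
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) {
        this.reader = reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            if (!reader.nextKeyValue()) {
                return null;  // end of input
            }
            String line = reader.getCurrentValue().toString();
            // Real parsing of the quoted/URL-encoded fields would go here;
            // emit only the fields you actually need.
            return tupleFactory.newTuple(line);
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}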
I experimented plenty and learned tons. I tried a couple of JSON libraries, Piggybank, and java.net.URLDecoder, and even tried CSVExcelStorage. I registered the libraries and was able to solve the problem partially. When I ran the tests against a larger data set, it started hitting encoding issues in some lines of the source data, resulting in exceptions and job failure. So I ended up using Pig's built-in regex functionality to extract the desired values:
A = load '/var/log/live/collector_2013-07-02-0145.log' using TextLoader();
-- fix some of the encoding issues
A = foreach A GENERATE REPLACE($0,'\\\\"','"');
-- super basic url-decode
A = foreach A GENERATE REPLACE($0,'%22','"');
-- extract each of the fields from the embedded json
A = foreach A GENERATE
REGEX_EXTRACT($0,'^.*"redirectId":"([^"\\}]+).*$',1) as redirectId,
REGEX_EXTRACT($0,'^.*"fromUserId":"([^"\\}]+).*$',1) as fromUserId,
REGEX_EXTRACT($0,'^.*"userId":"([^"\\}]+).*$',1) as userId,
REGEX_EXTRACT($0,'^.*"listId":"([^"\\}]+).*$',1) as listId,
REGEX_EXTRACT($0,'^.*"c":"([^"\\}]+).*$',1) as eventType,
REGEX_EXTRACT($0,'^.*"renderSource":"([^"\\}]+).*$',1) as renderSource,
REGEX_EXTRACT($0,'^.*"renderType":"([^"\\}]+).*$',1) as renderType,
REGEX_EXTRACT($0,'^.*"engageType":"([^"\\}]+).*$',1) as engageType,
REGEX_EXTRACT($0,'^.*"clientTime":"([^"\\}]+).*$',1) as clientTime,
REGEX_EXTRACT($0,'^.*"clientTimeZone":([^,\\}]+).*$',1) as clientTimeZone;
I decided not to use REGEX_EXTRACT_ALL in case the order of the fields varies.

Structured debug log

I am writing a complex application (a compiler analysis). To debug it I need to examine the application's execution trace to determine how its values and data structures evolve during its execution. It is quite common for me to generate megabytes of text output for a single run and sifting my way through all that is very labor-intensive. To help me manage these logs I've written my own library that formats them in HTML and makes it easy to color text from different code regions and indent code in called functions. An example of the output is here.
My question is: is there any better solution than my own home-spun library? I need some way to emit debug logs that may include arbitrary text and images, visually structure them, and, if possible, index them so that I can easily find the region of the output I'm most interested in. Is there anything like this out there?
Even though you didn't mention which language you're using, I'd like to propose the Apache Log4XXX family: http://logging.apache.org/
It offers customizable detail levels as well as tag-driven loggers. The GUI tool (Chainsaw) can be combined with the good old grep approach (so you see only what you're interested in at the moment).
Colorizing, search and filtering using an expression syntax is available in the latest developer snapshot of Chainsaw. The expression syntax also supports regular expressions (using the 'like' keyword).
Chainsaw can parse any regular text log file, not just log files generated by log4j.
The latest developer snapshot of Chainsaw is available here:
http://people.apache.org/~sdeboy
The 'File, Load Chainsaw configuration' menu item is where you define the format and location of the log file you want to process; the expression syntax can be found in the tutorial, available from the Help menu.
Feel free to email the log4j users list if you have additional questions.
I created a framework that might help you: https://github.com/pablito900/VisualLogs
