I'm trying to find a solution to quickly merge 2 log files coming from 2 application servers.
The log files look like this:
00:00:00,028 DEBUG [com.acme.productionservice...
I would like something that, based on the timestamp, prints one line from one log file or the other. For example:
if file one has these 2 lines:
00:00:00,028 DEBUG [com.acme.productionservice...
00:00:00,128 DEBUG [com.acme.productionservice...
and file two has these 3 lines:
00:00:00,045 DEBUG [com.acme.productionservice...
00:00:00,100 DEBUG [com.acme.productionservice...
00:00:00,150 DEBUG [com.acme.productionservice...
the output should be
00:00:00,028 DEBUG [com.acme.productionservice... (file 1)
00:00:00,045 DEBUG [com.acme.productionservice... (file 2)
00:00:00,100 DEBUG [com.acme.productionservice... (file 2)
00:00:00,128 DEBUG [com.acme.productionservice... (file 1)
00:00:00,150 DEBUG [com.acme.productionservice... (file 2)
the only way I currently know is using
cat file1 file2 | sort
but this is very slow for gigabytes of logs.
I need something that reads the two files, compares the timestamps, and decides which line to print.
I ended up using
sort -m
I also used a trick to tell which log file each line comes from:
for a in *.log ; do
awk '$0 = FILENAME " " $0' "$a" > "$a.log"
done
sort -m -k 2 *.log.log
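The same prefix-and-merge idea also works in a single pipeline with no temporary files. A minimal sketch, assuming bash process substitution and two already-sorted input files (file names are placeholders):
sort -m -k 2 <(awk '{ print FILENAME, $0 }' file1.log) <(awk '{ print FILENAME, $0 }' file2.log)
Because sort -m only merges inputs that are already sorted, it runs in one pass and avoids the full re-sort that makes cat file1 file2 | sort slow on gigabytes of logs.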
Try Super Speedy Syslog Searcher
(assuming you have rust installed)
cargo install super_speedy_syslog_searcher
then
s4 log1 log2
However, Super Speedy Syslog Searcher expects to find a datetime stamp. If you can change the logging format from a time-only stamp to a full datetime stamp, then s4 can sort and merge the lines.
I am having trouble combining thousands of netCDF files (42000+, about 3 GB in size for this particular folder/variable). The main variable that I want to combine has a structure of (6, 127, 118), i.e. (time, lat, lon).
I'm appending the files one by one, since the list of files is too long to pass at once.
I have tried:
for i in input_source/**/**/*.nc; do ncrcat -A -h append_output.nc $i append_output.nc ; done
but this method seems to be really slow (on the order of kB/s, and it seems to get slower as more files are appended) and is also giving a warning:
ncrcat: WARNING Intra-file non-monotonicity. Record coordinate "forecast_period" does not monotonically increase between (input file file1.nc record indices: 17, 18) (output file file1.nc record indices 17, 18) record coordinate values 6.000000, 1.000000
which basically just repeats the variable "forecast_period" 1-6 n times, with n = 42000 files, i.e. [1,2,3,4,5,6,1,2,3,4,5,6,...].
Despite this warning I can still open the file and ncrcat does what it's supposed to; it is just slow, at least with this particular method.
I have also tried adding in the option:
--no_tmp_fl
but this gives an error:
ERROR: nco__open() unable to open file "append_output.nc"
full error attached below
If it helps, I'm using WSL with Ubuntu on Windows 10.
I'm new to bash and any comments would be much appreciated.
Either of these commands should work:
ncrcat --no_tmp_fl -h *.nc append_output.nc
or
ls input_source/**/**/*.nc | ncrcat --no_tmp_fl -h append_output.nc
Your original command is slow because you open and close the output file N times. These commands open it once, fill it up, then close it.
I would use CDO for this task. Given the huge number of files it is recommended to first sort them on time (assuming you want to merge them along the time axis). After that, you can use
cdo cat *.nc outfile
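If the file names sort chronologically, a sketch of the sorted invocation might look like this (assumes a shell with ** globbing enabled; note that 42000+ paths can exceed the argument-list limit, in which case the list has to be fed in batches):
cdo cat $(ls input_source/**/**/*.nc | sort) outfile.nc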
I am using diff -u -s file1 file2 and counting + and - lines as added and deleted lines for file-comparison automation. (A modified line counts as one + and one -.) These counts match the Araxis compare statistics (script total added+deleted = Araxis changed+deleted+new) for most of the files, but the script total and the Araxis total do not match for a few files.
P.S. - I am using cygwin to run the script on Windows. I tried dos2unix, tail -c 4, etc. in the hope of removing BOM characters, but some of the culprit files do not have a BOM, and the counts still do not match. Below are a few sample culprit files.
(1) SIACPO_ActivacionDesactivacionBlacklist.aspx.vb - script gives a total count of 57, while Araxis gives 55
(2) SIACPO_Suspension_Servicio.aspx - script gives a total count of 2509, while Araxis gives 2473
(3) repCuadreProceso.aspx - script gives a total count of 1165, while Araxis gives 1163
(4) detaPago.aspx.vb - this is a strange file. There is no change at all, except a BOM character on the 1st line. The script gives a 0, 0 count, so why is it in the list of modified files at all?
Now how can I attach these 4 culprit files (Dev as well as Prod versions) for your troubleshooting?
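For reference, a minimal sketch of the counting approach described above (my reconstruction, not the poster's exact script; assumes GNU diff and skips the two ---/+++ header lines so they are not miscounted):
diff -u -s file1 file2 | tail -n +3 | awk '/^\+/ { a++ } /^-/ { d++ } END { print "added:", a+0, "deleted:", d+0 }'
A UTF-8 BOM can also be stripped before diffing, e.g. with GNU sed: sed -i '1s/^\xEF\xBB\xBF//' file1. That would account for case (4), where the BOM is the only difference.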
I have enabled the CONFIG_DYNAMIC_DEBUG option in the kernel, after which we get a control file in the debug/dynamic_debug directory.
After we enable some debug statements in the control file, where will those log statements be printed, i.e. in which log file?
You can check the kernel log level with cat /proc/sys/kernel/printk. The default is 4. The log levels are defined here: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/kern_levels.h?id=refs/tags/v4.8-rc8#n7. As a test you can set it to the highest level to make sure that everything is logged: echo "7" > /proc/sys/kernel/printk.
You can also run cat /proc/kmsg while the dynamic debug statements are running. /proc/kmsg holds kernel messages until they are picked up by dmesg or something else.
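Putting it together, a hedged sketch of the whole flow, using the file ... +p syntax from the kernel's dynamic-debug documentation (the source file name is just an illustration):
# enable pr_debug() output for one source file
echo 'file svcsock.c +p' > /sys/kernel/debug/dynamic_debug/control
# raise the console log level so debug messages are not filtered
echo "7" > /proc/sys/kernel/printk
# follow the kernel ring buffer (or: cat /proc/kmsg)
dmesg -w
Depending on the syslog setup, the same messages usually also land in /var/log/kern.log or are visible via journalctl -k.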
After running a set of tests with drmemory overnight, I am trying to resolve the error stacks by providing pdb symbols. The pdbs come from a large samba-mapped repository, and using _NT_SYMBOL_PATH at runtime slowed things down too much.
Does anyone know of a tool that post-processes results.txt and pulls in new symbols (via _NT_SYMBOL_PATH or otherwise) as required to produce more detailed stacks? If not, any hints for adapting asan_symbolize.py to do this?
https://llvm.org/svn/llvm-project/compiler-rt/trunk/lib/asan/scripts/asan_symbolize.py
What I came up with so far, using dbghelp.dll, is below. It works but could be better.
https://github.com/patraulea/postpdb
OK, this query does not pertain to the use of windbg, nor does it have anything to do with _NT_SYMBOL_PATH.
Dr.Memory is a memory diagnostic tool akin to Valgrind, based on the DynamoRIO instrumentation framework and usable on raw, unmodified binaries.
On Windows you can invoke it like drmemory.exe calc.exe from a command prompt (cmd.exe).
As soon as the binary finishes execution, a log file named results.txt is written to a default location.
If you have set up _NT_SYMBOL_PATH, drmemory honors it and resolves symbol information from prepulled symbol files (viz. *.pdb). It does not seem to download files from the MS symbol server; it simply seems to ignore the SRV* cache and use only the downstream symbol folder.
So if the pdb file is missing or has not been downloaded yet, results.txt will contain a stack trace like:
# 6 USER32.dll!gapfnScSendMessage +0x1ce (0x75fdc4e7 <USER32.dll+0x1c4e7>)
# 7 USER32.dll!gapfnScSendMessage +0x2ce (0x75fdc5e7 <USER32.dll+0x1c5e7>)
while if the symbol file was available it would show
# 6 USER32.dll!InternalCallWinProc
# 7 USER32.dll!UserCallWinProcCheckWow
So basically you need the symbol file for the application in question; as I commented, you need to fetch the symbols for the exe in question.
You can use symchk on a running process to create a manifest file, then use symchk on a machine that is connected to the internet to download the symbols, copy them to a local folder on the non-internet machine, and point _NT_SYMBOL_PATH to that folder:
>tlist | grep calc.exe
1772 calc.exe Calculator
>symchk /om calcsyms.txt /ip 1772
SYMCHK: GdiPlus.dll FAILED - MicrosoftWindowsGdiPlus-1.1.7601.17514-gdiplus.pdb mismatched or not found
SYMCHK: FAILED files = 1
SYMCHK: PASSED + IGNORED files = 27
>head -n 4 calcsyms.txt
calc.pdb,971D2945E998438C847643A9DB39C88E2,1
calc.exe,4ce7979dc0000,1
ntdll.pdb,120028FA453F4CD5A6A404EC37396A582,1
ntdll.dll,4ce7b96e13c000,1
>tail -n 4 calcsyms.txt
CLBCatQ.pdb,00A720C79BAC402295B6EBDC147257182,1
clbcatq.dll,4a5bd9b183000,1
oleacc.pdb,67620D076A2E43C5A18ECD5AF77AADBE2,1
oleacc.dll,4a5bdac83c000,1
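On the internet-connected machine, the manifest can then be used to pull the matching pdbs into a local folder (symchk's /im and /s switches; the cache path is just an example):
>symchk /im calcsyms.txt /s SRV*c:\symcache*https://msdl.microsoft.com/download/symbols
Copy c:\symcache to the offline machine and point _NT_SYMBOL_PATH at it.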
So, assuming you have fetched the symbols, it would be easiest to rerun the tests with locally cached copies of the symbol files.
If you have fetched the symbols but cannot rerun the tests and have to work solely with the output in results.txt, you have some text-processing work to do (sed, grep, awk, or a custom parser).
The drmemory suite comes with a symquery.exe in the bin folder, and it can be used to resolve the symbols from results.txt.
In the example above you can see the offset relative to the module base, like 0x1c4e7 in the line
# 6 USER32.dll!gapfnScSendMessage +0x1ce (0x75fdc4e7 <USER32.dll+0x1c4e7>)
So for each line in results.txt you have to parse out the module and offset and invoke symquery on that module, like below:
:\>symquery.exe -f -e c:\Windows\System32\user32.dll -a +0x1c4e7
InternalCallWinProc+0x23
??:0
:\>symquery.exe -f -e c:\Windows\System32\user32.dll -a +0x1c5e7
UserCallWinProcCheckWow+0xb3
A simple text-processing example on results.txt, with trimmed output:
:\>grep "^#" results.txt | sed s/".*<"//g
# 0 system call NtUserBuildPropList parameter #2
USER32.dll+0x649d9>)
snip
COMCTL32.dll+0x2f443>)
Notice the comctl32.dll: there is a default comctl32.dll in system32 and several others in winsxs, so you have to consult the other log files like global.log to see the dll load path.
symquery.exe -f -e c:\Windows\winsxs\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.7601.17514_none_41e6975e2bd6f2b2\comctl32.dll -a +0x2f443
CallOriginalWndProc+0x1a
??:0
symquery.exe -f -e c:\Windows\system32\comctl32.dll -a +0x2f443
DrawInsert+0x120 <----- wrong symbol due to wrong module (late-binding / forwarding reasons)
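Putting it together, a rough sketch of an automated pass over results.txt (my sketch, not a drmemory-supplied tool; assumes a unix-style shell with the grep/sed used above, and hardcodes System32 even though, as noted, the real module may load from winsxs):
grep '^#' results.txt | sed -n 's/.*<\([^+]*\)+\(0x[0-9a-fA-F]*\)>.*/\1 \2/p' |
while read module offset ; do
  symquery.exe -f -e "c:\\Windows\\System32\\$module" -a "+$offset"
done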
When trying to mirror using lftp, I receive the following output (-d, debugging mode):
<--- 227 Entering Passive Mode {some numbers}
---- Connecting data socket to (more numbers and port)
---- Data connection established
---> REST 0
<--- 350 Restart position accepted (0).
---> RETR {some filename}
When I open this file, the file is corrupted: its content is shifted down by several lines and then a normal copy of the file is written on top of it. For example, if the file had five lines (line breaks not shown for compactness): line1 line2 line3 line4 line5, then the corrupted file would read: line1 line2 line3 line3 line4 line5.
Given the other problems I am experiencing with this ftp/network combination, I understand that this is not lftp's fault. However, I wonder whether disabling restart-position changes would fix those corrupted files (at least it works for the other files). Reading the manual, I can see these two options:
hftp:use-range (boolean)
when true, lftp will use Range header for transfer restart.
http:use-range (boolean)
when true, lftp will use Range header for transfer restart.
I don't know if this is relevant to what I am trying to achieve (forcing lftp to always download the data in full, without restarting at a position), or whether what I want is achievable in principle. I would try these options by actually running them, but I cannot see any predictable pattern in when files get corrupted, and re-downloading the same files always gives the correct version. So any help is appreciated! :)
Not sure if this is the solution, but based on the logs I think the problem in my case was caused by get -c commands, so I removed --continue from the mirror job.
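In other words, let mirror fetch files whole instead of resuming partial ones. A sketch of the change (host and paths are placeholders):
# before: resumes partial files, sending REST with a nonzero offset as in the log above
# lftp -e 'mirror --continue /remote/dir /local/dir; quit' host
# after: files are always downloaded in full
lftp -e 'mirror /remote/dir /local/dir; quit' host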