Using head and tail in UNIX to extract specific transaction - shell

I have a question about using the head and tail commands in UNIX to extract a specific transaction from a huge transaction log:
head -X <<<filename>>> | tail -Y > <<<Truncatedfile>>>
where X is the number of lines I want from the beginning of the file and Y is the number of lines I want from the bottom of the file.
How can I modify this so that the truncated file contains only the log lines for a single transaction ID? For example:
The file contains transaction logs for n transaction IDs in sequence. If I only need the logs extracted for a single transaction ID, how should I modify the above command?

You wouldn't modify the above code; instead you'd use
grep -w transactionid filename
Assuming that the transactionid appears as a separate word (-w).
Edit: You can include some context lines (this includes 10 lines after the match):
grep -w -A 10 transactionid filename
Put another way,
grep -w transactionid filename
simply hides all lines NOT containing the transaction ID. This is close to equivalent to doing sed -e '/transactionid/!d'.
To print lines 5-12
sed -n '5,12p' filename
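If the lines for one transaction form a contiguous block but not every line carries the ID, a rough sketch combining grep -n with the sed range above might look like this (TXN12345, transactions.log and TXN12345.log are placeholder names):
# When every line of the transaction carries the ID, plain grep is enough:
grep -w 'TXN12345' transactions.log > TXN12345.log
# When only some lines carry the ID but the block is contiguous, find the
# first and last matching line numbers and cut that range out with sed:
start=$(grep -nw 'TXN12345' transactions.log | head -1 | cut -d: -f1)
end=$(grep -nw 'TXN12345' transactions.log | tail -1 | cut -d: -f1)
sed -n "${start},${end}p" transactions.log > TXN12345.log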

Related

Pipe tail output into column

I'm trying to tail a log file and format the output into columns. This gives me what I want without tail:
cat /var/log/test.log | column -t -s "|"
How can I pipe the output of tail -f /var/log/test.log into column?
EDIT: Here's an excerpt from the file. I'm manually adding the first line so it can be used as the column headers, but I could format it differently if necessary.
timestamp|type|uri|referer|user_id|link|message
Feb 5 23:58:29 181d5d6339bd drupal_overlake: 1612569509|geocoder|https://overlake.lando/admin/config/development/configuration/config-split/add|https://overlake.lando/admin/config/development/configuration/config-split/add|0||Could not execute query "https://maps.googleapis.com/maps/api/geocode/json?address=L-054%2C%20US&language=&region=US".
Feb 5 23:58:29 181d5d6339bd drupal_overlake: 1612569509|geocoder|https://overlake.lando/admin/config/development/configuration/config-split/add|https://overlake.lando/admin/config/development/configuration/config-split/add|0||Unable to geocode 'L-054, US'.
You can't do it with the -f option to tail. column can't produce any output until it receives all its input, since it needs to calculate the number of rows and columns by examining all the input. tail -f never stops writing, so column doesn't know when it's done.
You can use
tail -n 100 test.log | column -t -s "|"
to format the last 100 lines of the log.
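If you also want the header row from the edit above prepended to the output, one way (a sketch building on the non-follow version, using the column names from the excerpt) is to group the echo and the tail into a single pipeline:
{ echo "timestamp|type|uri|referer|user_id|link|message"; tail -n 100 /var/log/test.log; } | column -t -s "|"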

Show only newly added lines of logfile in terminal

I use tail -f to show the contents of a logfile.
What I want is when the logfile content changes, instead of appending the new lines to my screen, only the newly added lines should be shown on my screen.
So as if a clearscreen was made every time before printing the new lines.
I tried to find a solution by web search but couldn't find anything useful.
Edit:
In my case it happens that several lines will be added at once (it is a php error logfile). So I am looking for a solution where more than the single last line can be shown on screen.
The watch command in combination with tail shows the last line of a log file at an interval of every 2 seconds (the default). It doesn't refresh the moment a new line is appended to the log file, but since you can specify the interval, it might still help for your use case.
watch -t tail -1 <path_to_logfile>
If you need a faster interval, like every 0.5 seconds, you can specify it with the -n option, e.g.:
watch -t -n 0.5 tail -1 <path_to_logfile>
Try
$ watch 'tac FILE | grep -m1 -C2 PATTERN | tac'
where
PATTERN is any keyword (or regexp) to identify errors you seek in the log,
tac prints the lines in reverse,
-m is a max count of matching lines to grep,
-C is any number of lines of context (before and after the match) to show (optional).
That would be similar to
$ tail -f FILE | grep -C2 PATTERN
if you didn't mind just appending occurrences to the output in real-time.
But if you don't know any generic PATTERN to look for at all,
you'd have to just follow all the updates as the logfile grows:
$ tail -n0 -f FILE
Or even, create a copy of the logfile and then do a diff:
Copy: cp file.log{,.old}
Refresh the webpage with your .php code (or whatever, to trigger the error)
Run: diff file.log{,.old}
(or, if you prefer sort to diff: $ sort file.log{,.old} | uniq -u)
The curly braces are shorthand for both filenames (see Brace Expansion in $ man bash).
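For instance, the expansion here is simply:
cp file.log{,.old}     # the shell expands this to: cp file.log file.log.old
diff file.log{,.old}   # likewise: diff file.log file.log.old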
If you must avoid any temp copies, store the line count in memory:
z=$(grep -c ^ file.log)
Refresh the webpage to trigger an error
tail -n +$((z + 1)) file.log
The latter approach can be built upon, to create a custom scripting solution more suitable for your needs (check timestamps, clear screen, filter specific errors, etc). For example, to only show the lines that belong to the last error message in the log file updated in real-time:
$ clear; z=$(grep -c ^ FILE); while true; do d=$(date -r FILE); sleep 1; b=$(date -r FILE); if [ "$d" != "$b" ]; then clear; tail -n +$((z + 1)) FILE; z=$(grep -c ^ FILE); fi; done
where
FILE is, obviously, your log file name;
grep -c ^ FILE counts all lines in the file (almost, but not entirely, unlike cat FILE | wc -l, which only counts newline characters and so misses a final line without a trailing newline);
sleep 1 sets the pause/delay between checks of the file timestamp to 1 second, but you could even change it to a floating point number (the shorter the interval, the higher the CPU usage).
To simplify any repetitive invocations in future, you could save this compound command in a Bash script that could take a target logfile name as an argument, or define a shell function, or create an alias in your shell, or just reverse-search your bash history with CTRL+R. Hope it helps!
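As an illustration of that last suggestion, here is the same loop as a small Bash script that takes the target logfile as its first argument. This is a sketch; the script name and argument handling are additions, and date -r FILE prints the modification time as in the one-liner above (GNU date):
#!/usr/bin/env bash
# Usage: ./follow-errors.sh /path/to/file.log   (the script name is hypothetical)
file="${1:?usage: $0 logfile}"
clear
z=$(grep -c ^ "$file")                 # current number of lines in the log
while true; do
    d=$(date -r "$file")               # modification time before the pause
    sleep 1
    b=$(date -r "$file")               # modification time after the pause
    if [ "$d" != "$b" ]; then          # the file was updated during the pause
        clear
        tail -n +$((z + 1)) "$file"    # print only the lines added since the last check
        z=$(grep -c ^ "$file")         # remember the new line count
    fi
done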

grep -Ff producing invalid output

I'm using this code -
grep -Ff list.txt C:/data/*.txt > found.txt
but it keeps outputting invalid results; the lines don't contain the emails I input.
list.txt contains -
email#email.com
customer#email.com
imadmin#gmail.com
newcustomer#email.com
helloworld#yes.com
and so on: one email to match per line.
search files contain -
user1:phonenumber1:email#email.com:last-active:recent
user2:phonennumber2:customer#email.com:last-active:inactive
user3:phonenumber3:blablarandom#bla.com:last-active:never
then another may contain -
blublublu email#email.com phonenumber subscribed
nanananana customer#email.com phonenumber unsubscribed
useruser noemailinput#noemail.com phonenumber pending
So what I'm trying to do is give grep a list of emails/strings (list.txt), have it search the files in the given directory for matches of each string, and output the entire line that contains each match.
Example output in this case would be -
user1:phonenumber1:email#email.com:last-active:recent
user2:phonennumber2:customer#email.com:last-active:inactive
blublublu email#email.com phonenumber subscribed
nanananana customer#email.com phonenumber unsubscribed
yet it wouldn't output the other two lines -
user3:phonenumber3:blablarandom#bla.com:last-active:never
useruser noemailinput#noemail.com phonenumber pending
because no string from the list appears in those lines.
The file list.txt probably contains empty lines or some of the separators. When I added : to list.txt, all the lines from the first sample started to match. Similarly, adding a space made all the lines from the second sample match. Adding # causes the same symptoms.
Try running grep -oFf ... (if your grep supports -o) to see the exact matching parts. If there are empty lines in list.txt, the number of lines output with -o will be smaller than the number of matching lines reported without it. Try searching the output of -o for extremely short matches to spot suspicious strings. You can also examine the shortest lines in list.txt:
while read line ; do echo ${#line} "$line" ; done < list.txt | sort -nk1,1
I think your file list.txt may have blank lines in it, causing it to match every line in the files specified with C:/data/*.txt. To fix it, you can either manually delete every empty line or run sed -i '/^$/d' list.txt, where the -i flag edits the file in place.
The issue may also be related to DOS carriage returns. Try running cat -v list.txt and checking whether the lines are followed by ^M:
email#email.com^M
customer#email.com^M
If this is the case you will need to amend the file using either dos2unix or tr -d '\r' < list.txt > output.txt.
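Putting those two checks together, a cleanup sketch before re-running the search might be (clean.txt is a placeholder name for the intermediate file):
# Strip DOS carriage returns and drop empty lines from the pattern list,
# then re-run the fixed-string search against the cleaned list.
tr -d '\r' < list.txt | grep -v '^$' > clean.txt
grep -Ff clean.txt C:/data/*.txt > found.txt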

compare rows in two files in unix shell script and merge without redundant data

There is one old report file residing on a drive.
Every time a new report is generated, it should be compared to the contents of this old file.
If any new account row is reported in the new report file, it should be added to the old file; otherwise it is just skipped.
Both files will have same title and headers.
Eg: old report
RUN DATE:xyz FEE ASSESSMENT REPORT
fee calculator
ACCOUNT NUMBER DELVRY DT TOTAL FEES
=======================================================
123456 2014-06-27 110.0
The new report might be
RUN DATE:xyz FEE ASSESSMENT REPORT
fee calculator
ACCOUNT NUMBER DELVRY DT TOTAL FEES
=======================================================
898989 2014-06-26 11.0
So after the merge the old report should contain both rows under the header: the 123456 and the 898989 account number rows.
I am new to shell scripting. I don't know whether I should use the diff command, a while read LINE loop, or awk.
Thanks!
This looks like a job for several commands combined into an actual script, rather than a single line of adept command-line fu.
Assuming the number of lines in the header section of the report is consistent, you can use tail -n +7 to return everything from line 7 onward (adjust the number to match your actual header).
If the headers are not the same length, but all end with the "==========" line you've shown above, then you can use grep -n to find that line's number and start parsing the account numbers after it.
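A rough sketch of that variable-header case (old_report.txt and body.txt are placeholder names; the full script below sticks with the fixed 7-line assumption instead):
# Find the line number of the "=====" separator, then keep everything after it.
sep=$(grep -n '^=====' old_report.txt | head -1 | cut -d: -f1)
tail -n +$((sep + 1)) old_report.txt > body.txt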
#!/usr/bin/env bash
OLD_FILE="ancient_report.log"
NEW_FILE="latest_and_greatest.log"
tmp_ext=".tmp"
tail -n +7 ${OLD_FILE} > ${OLD_FILE}${tmp_ext}
tail -n +7 ${NEW_FILE} >> ${OLD_FILE}${tmp_ext}
sort -u ${OLD_FILE}${tmp_ext} > ${OLD_FILE}${tmp_ext}.unique
mv -f ${OLD_FILE}${tmp_ext}.unique ${OLD_FILE}
To illustrate this script:
#!/usr/bin/env bash
The shebang line above tells *nix how to run it.
OLD_FILE="ancient_report.log"
NEW_FILE="latest_and_greatest.log"
tmp_ext=".tmp"
Declare the starting variables. You could also take the file names as command-line arguments: OLD_FILE=${1} picks up the first argument.
tail -n +7 ${OLD_FILE} > ${OLD_FILE}${tmp_ext}
tail -n +7 ${NEW_FILE} >> ${OLD_FILE}${tmp_ext}
Put the bodies (everything after the header) of the two files into a single 'tmp' file.
sort -u ${OLD_FILE}${tmp_ext} > ${OLD_FILE}${tmp_ext}.unique
sort and retain only the 'unique' entries with -u
If your OS version of sort does not have the -u then you can get the same results by using: sort <filename> | uniq
mv -f ${OLD_FILE}${tmp_ext}.unique ${OLD_FILE}
Replace old file with new uniq'd file.
There are of course many simpler ways to do this, but this one gets the job done with several commands in a sequence.
Edit:
To preserve the header portion of the file with the latest report date, then instead of mving the new tmp file over the old, do:
rm ${OLD_FILE};
head -n 7 ${NEW_FILE} > ${OLD_FILE}
cat ${OLD_FILE}${tmp_ext}.unique >> ${OLD_FILE}
This removes OLD_FILE, then cats together the header of the new file (to pick up its run date) and the entire contents of the unique tmp file. After this you can do general file cleanup such as removing any new files you've created. To preserve/debug any changes, you can add a datestamp to each 'uniqued' file name and keep them as an audit trail of all report additions.
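Pulling the script and the edit together, a header-preserving variant might look like this. It is only a sketch: the argument handling with defaults is an addition, and HEADER=7 follows the 7-line header assumption above.
#!/usr/bin/env bash
# Merge the new report's rows into the old one, keeping the newest header.
OLD_FILE="${1:-ancient_report.log}"       # file names can be passed as arguments,
NEW_FILE="${2:-latest_and_greatest.log}"  # with the original names as defaults
tmp_ext=".tmp"
HEADER=7                                  # number of header lines in each report
# Collect the account rows (everything after the header) of both reports.
tail -n +$((HEADER + 1)) "${OLD_FILE}"  > "${OLD_FILE}${tmp_ext}"
tail -n +$((HEADER + 1)) "${NEW_FILE}" >> "${OLD_FILE}${tmp_ext}"
# Keep each account row only once.
sort -u "${OLD_FILE}${tmp_ext}" > "${OLD_FILE}${tmp_ext}.unique"
# Rebuild the old report: header from the newest report, then the merged rows.
head -n "${HEADER}" "${NEW_FILE}" > "${OLD_FILE}"
cat "${OLD_FILE}${tmp_ext}.unique" >> "${OLD_FILE}"
# Remove the temporary files.
rm -f "${OLD_FILE}${tmp_ext}" "${OLD_FILE}${tmp_ext}.unique"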

Create files using grep and wildcards with input file

This should be a no-brainer, but apparently I have no brain today.
I have 50 20-gig logs that contain entries from multiple apps, one of which adds a transaction ID to its log lines. I have 42 transaction IDs I need to review, and I'd like to parse out the appropriate lines into separate files.
To do a single file, the command would be simply,
grep CDBBDEADBEEF2020X02393 server.log* > CDBBDEADBEEF2020X02393.log
that creates a log isolated to that transaction, from all 50 server.logs.
Now, I have a file with 42 txnIDs (shortening to 4 here):
CDBBDEADBEEF2020X02393
CDBBDEADBEEF6548X02302
CDBBDE15644F2020X02354
ABBDEADBEEF21014777811
And I wrote:
#!/bin/sh
grep $1 server.\* > $1.log
But that is not working. Changing the shebang to #!/bin/bash -xv gives me this weird output (obviously I'm playing with what the correct escape magic must be):
$ ./xtrakt.sh B7F6E465E006B1F1A
#!/bin/bash -xv
grep - ./server\.\*
' grep - './server.*
: No such file or directory
I have also tried the command line
grep - server.* < txids.txt > $1
But OBVIOUSLY that $1 is pointless and I have no idea how to get a file named per txid using the input redirect form of the command.
Thanks in advance for any ideas. I haven't gone the route of doing a foreach in the shell script, because I want grep to put the original filename in the output lines so I can examine context later if I need to.
Also - it would be great to have the server.* files ordered numerically (server.log.1, server.log.2 NOT server.log.1, server.log.10...)
try this:
while read -r txid
do
grep "$txid" server.* > "$txid.log"
done < txids.txt
and for the file ordering: rename the files that have a one-digit suffix to two digits with leading zeroes, e.g. mv server.log.1 server.log.01.
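A quick loop for that renaming step could look like this (a sketch; it assumes the single-digit files are named exactly server.log.1 through server.log.9):
# Zero-pad single-digit suffixes so lexical order matches numeric order.
for f in server.log.?; do
    [ -e "$f" ] || continue              # skip if the glob matched nothing
    mv "$f" "${f%.*}.0${f##*.}"          # e.g. server.log.1 -> server.log.01
done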
