compare 2 csv files in shell using awk - shell

I have 2 CSV files that I need to compare, reporting back if they differ. The file format is the same in both files, and even the first column (column A) has the same content in both (it's header info).
I tried using an awk command, but there are conditions that I am not sure how to implement.
conditions :
a. Need to exclude the first 2 rows (since those are not required for comparison). Can this be achieved by doing:
NR > 2
b. If any of the values differ, then the output needs to report the header info and the respective server name along with the values.
File1.csv :
Status Check
APP servers
Server name,abc,def,ghi,jkl,mno,
Summary,,,,,,
System Start Time,Nov/12/2016 20:12:24 GMT,Nov/12/2016 20:15:38 GMT,Nov/12/2016 20:15:37 GMT,Nov/12/2016 20:15:57 GMT,Nov/12/2016 20:11:42 GMT,
System Life Time,118day.14hr.15min.19sec,118day.14hr.12min.01sec,118day.14hr.12min.03sec,118day.14hr.11min.44sec,118day.14hr.16min.01sec,
OS Version,SunOS 5.10,SunOS 5.10,SunOS 5.10,SunOS 5.10,SunOS 5.10,
Service Pack Version,Generic_147148-26,Generic_147148-26,Generic_147148-26,Generic_147148-26,Generic_147148-26,
State,Up,Up,Up,Up,Up,
File2.csv :
Status Check
APP servers
Server name,abc,def,ghi,jkl,mno,
Summary,,,,,,
System Start Time,Nov/13/2016 20:12:24 GMT,Nov/13/2016 20:15:38 GMT,Nov/13/2016 20:15:37 GMT,Nov/13/2016 20:15:57 GMT,Nov/13/2016 20:11:42 GMT,
System Life Time,118day.14hr.15min.19sec,118day.14hr.12min.01sec,118day.14hr.12min.03sec,118day.14hr.11min.44sec,118day.14hr.16min.01sec,
OS Version,SunOS 5.10,SunOS 5.10,SunOS 5.11,SunOS 5.12,SunOS 5.10,
Service Pack Version,Generic_147148-26,Generic_147148-26,Generic_147148-26,Generic_147148-26,Generic_147148-26,
State,Down,Up,Down,Up,Down,
Result/Output:
OS Version value differs for server names ghi and jkl: 5.11,5.12
State value differs for server names abc, ghi and mno: Down,Down,Down
Is it possible to exclude rows 5/6 from the comparison as well, since those are date/time related and not required for comparison?
Can I give a key value (say column B/C) so that only those specific columns get compared between the files?

This may give you an idea of how to approach the problem:
$ paste -d, file{1,2} |
awk -F, 'NR<3 {next}
NR==3 {n=split($0,h); m=n/2}
NR!=5 && NR!=6 {for(i=2;i<=m-1;i++)
if($i!=$(i+m)) print $1,h[i],$i,$(i+m)}'
OS Version ghi SunOS 5.10 SunOS 5.11
OS Version jkl SunOS 5.10 SunOS 5.12
State abc Up Down
State ghi Up Down
State mno Up Down
Your output formatting can be added, but it will complicate the code. Since your values contain spaces, you may want to keep the comma as the output field separator as well.
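Following that note, a variant of the same approach with the comma kept as the output field separator (file names assumed to be File1.csv and File2.csv as in the question):

```shell
# compare File1.csv and File2.csv column by column,
# skipping the two title rows and the two date/time rows (5 and 6)
paste -d, File1.csv File2.csv |
awk -F, -v OFS=, '
    NR < 3  { next }                     # skip the two title rows
    NR == 3 { m = split($0, h) / 2 }     # header row: h[] holds server names, m = fields per file
    NR != 5 && NR != 6 {                 # skip System Start Time / System Life Time rows
        for (i = 2; i < m; i++)
            if ($i != $(i + m))
                print $1, h[i], $i, $(i + m)
    }'
```

Each mismatch prints as a comma-separated line: row label, server name, file1 value, file2 value.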


FreeBSD script to show active connections and append number remote file

I am using NetScaler FreeBSD, which recognizes many of the UNIX-like commands: grep, awk, crontab, etc.
I run the following command to get the number of connected users that we have on the system
#> nsconmsg -g aaa_cur_ica_conn -d stats
OUTPUT (numbered lines):
Line1: Displaying current counter value information
Line2: NetScaler V20 Performance Data
Line3: NetScaler NS11.1: Build 63.9.nc, Date: Oct 11 2019, 06:17:35
Line4:
Line5: reltime:mili second between two records Sun Jun 28 23:12:15 2020
Line6: Index reltime counter-value symbol-name&device-no
Line7: 1 2675410 605 aaa_cur_ica_conn
…
…
From the above output, I only need the number of connected users (represented on Line 7, 3rd column; 605 to be precise), along with the hostname and the time (of the running script).
Now, to extract this important 3rd-column number, i.e. 605, along with the hostname and the time of data collection, I wrote the following script:
printf '%s - %s - %s\n' "$(hostname)" "$(date '+%H:%M')" "$(nsconmsg -g aaa_cur_ica_conn -d stats | grep aaa_cur_ica_conn | awk '{print $3}')"
The result is perfect, showing hostname, time, and the number of connected users as follows:
Hostname - 09:00 - 605
Now can anyone please shed light on how I can:
Run this script every day, 5am to 5pm (12 hours)?
Each time the script runs, append its output to a file on a remote UNIX share?
I appreciate this might be a bit of a challenge... but I would be grateful for any bash script wizards out there who can create magic!
Thanks in advance!
I would suggest a quick look into the FreeBSD Handbook or For People New to Both FreeBSD and UNIX® so that you can get familiar with the operating system and the tools that can help you achieve what you want.
For example, there is a utility/command named cron
The software utility cron is a time-based job scheduler in Unix-like computer operating systems.
For example, to run something every minute, every day, between 5am and 5pm, you could use something like:
* 05-17 * * * command
Try more options here: https://crontab.guru/#*_05-17_*_*_*.
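Putting both requests together, a sketch of a crontab entry that runs hourly between 5am and 5pm and appends the output to a file on a mounted share (the script path and the mount point are placeholders, not from the question):

```shell
# minute 0 of hours 05-17, every day; both paths below are assumptions
0 5-17 * * * /usr/local/bin/ica_count.sh >> /mnt/share/ica_conn.log 2>&1
```

The script itself would just be the printf one-liner from the question; `>>` appends, and `2>&1` keeps any error messages in the same log.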
There are more tools for scheduling commands, for example at (https://en.wikipedia.org/wiki/At_(command)), but this is something you need to evaluate and read more about.
Now, regarding the command you are using to get the "number of connected users": you could avoid the grep and just use awk, for example:
awk '/aaa_cur_ica_conn/ {print $3}'
This will print only column 3 if the line contains aaa_cur_ica_conn; but as before, I invite you to read more about the topic so that you can get a better overview and better understand the commands.
Last but not least, check this link: How do I ask a good question? The better you format and elaborate your question, the easier it is for others to give an answer.

How to read every line from a txt file and print starting from the line which starts with "Created_Date" in shell scripting [duplicate]

This question already has answers here:
How to get the part of a file after the first line that matches a regular expression
(12 answers)
Closed 4 years ago.
5G_Fixed_Wireless_Dashboard_TestScedule||||||||||||||||^M
Report Run Date||08/07/2018|||||||||||||||||||||^M
Requesting User Company||NEW|||||||||||||||||||||^M
Report Criteria|||||||||||||||||||||||^M
" Service Job Updated from Date:
Service Job Updated to Date:
Service Job Created from Date: 08/06/2018
Service Job Created to Date:
Service Job Status:
Resolution Code:"|||||||||||||||||||||||^M
Created Date|Job Status|Schedule Date|Job
Number|Service Job Type|Verizon Customer Order
Number|Verizon Location Code|Service|Installation
Duration|Part Number
I want to print starting from Created Date. The result file should be something like below.
Created Date|Job Status|Schedule Date|Job
Number|Service Job Type|Verizon Customer Order
Number|Verizon Location Code|Service|Installation
Duration|Part Number
I have tried the following lines after being linked to some other questions. But my requirement is to print the result to the same file.
FILELIST=$(find $MFROUTDIR -maxdepth 1 -name "XXXXXX_5G_Order_*.txt")
for nextFile in $FILELIST;do
cat $nextFile | sed -n -e '/Created Date/,$p'
done
By writing the above lines of code, the output is printed on the console. Could you please suggest a way to print it to the same file?
This can be easily done with a simple awk command:
awk '/^Created Date/{p=1} p' file
Created Date|Job Status|Schedule Date|Job
Number|Service Job Type|Verizon Customer Order
Number|Verizon Location Code|Service|Installation
Duration|Part Number
We set a flag p to 1 when we encounter a line that starts with Created Date. After that, awk's default action prints each line while p==1.
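Since the requirement is to write the result back to the same file, one sketch (redirect to a temporary file, then move it over the original) is:

```shell
# filter each matching file in place via a temporary file
for nextFile in "$MFROUTDIR"/XXXXXX_5G_Order_*.txt; do
    awk '/^Created Date/{p=1} p' "$nextFile" > "$nextFile.tmp" &&
    mv "$nextFile.tmp" "$nextFile"
done
```

The `&&` guard ensures the original is only overwritten if awk succeeded.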
References:
Effective AWK Programming
Awk Tutorial

Extract a string that is located above and nearest to the matching pattern in a multiline output

Below is the HP ssacli command to see configured hardware RAID details:
ssacli ctrl slot=0 show config
and its output is as below:
HPE Smart Array P408i-a SR Gen10 in Slot 0 (Embedded)
Internal Drive Cage at Port 1I, Box 1, OK
Internal Drive Cage at Port 2I, Box 0, OK
Port Name: 1I (Mixed)
Port Name: 2I (Mixed)
Array A (Solid State SAS, Unused Space: 0 MB)
logicaldrive 1 (447.10 GB, RAID 1, OK)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS SSD, 480 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS SSD, 480 GB, OK)
SEP (Vendor ID HPE, Model Smart Adapter) 379 (Port: Unknown)
I have to figure out the Array name in order to delete it, by searching for the matching disk info that I get as input from the user. For example, if the disk input is 1I:1:1, then I have to search for this string in the output of the above command. Since this disk is present and matches, I have to extract the Array name (here it is A); once I have this Array parameter, I can go ahead and delete the existing RAID configuration.
ssacli ctrl slot=0 show config | grep -B 4 '1I:1:1' | grep Array | awk '{print $2}'
The problems with the above command are:
the value 4 in grep -B cannot always be constant, as the matching disk may come first, second, third, or so on under an Array in the output.
there may be multiple RAID array configurations in the output (Array A, B, C, etc.); I have to find and retrieve the nearest Array string above my input disk.
I think your requirement can be solved with a single awk invocation. You pass the disk name in as a variable, and store the latest array name as you go through the lines. Once you match the disk name, you print the array you just stored. So pipe your command output to:
| awk -v disk="1I:1:1" '/^[[:space:]]*Array/{ array=$2; } $0 ~ disk { print array; exit }'
This answer assumes that the array names do not contain spaces; otherwise it would print only the first word of the array name.
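For example, feeding a trimmed copy of the sample output through the one-liner (a here-doc stands in for the live ssacli command):

```shell
awk -v disk="1I:1:1" '/^[[:space:]]*Array/{ array=$2 } $0 ~ disk { print array; exit }' <<'EOF'
HPE Smart Array P408i-a SR Gen10 in Slot 0 (Embedded)
   Array A (Solid State SAS, Unused Space: 0 MB)
      logicaldrive 1 (447.10 GB, RAID 1, OK)
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS SSD, 480 GB, OK)
EOF
```

This prints A, the array containing the requested disk.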
You can process the file from the end:
tac infile \
| awk -v input='1I:1:1' '$0 ~ input {flag=1} flag && /Array/ {print $2; exit}'
This sets a flag when encountering a line matching the user input; after that, if a line matches Array and the flag is set, the second field is printed.

Use AWK to extract just the MAC addresses from a show mac address-table from a cisco switch

I need to take the output from a show mac address-table on a Cisco switch and extract only the MAC addresses, putting them in a CSV file.
The output looks like this
vlan Mac Address Type Ports
----- ----------- ----- -----
All 0011.2233.4455 STATIC CPU
All 0011.2233.4466 STATIC CPU
All 0011.2233.4477 STATIC CPU
All 0011.2233.4488 STATIC CPU
MACs are displayed in groups of 4 as seen above. I need to grab each MAC address and output it to a CSV file.
This works, but it also grabs unwanted output:
awk '{print $2}' macTable.log > macTable.csv
Just add a condition that the field contains only hex digits and dots (MAC addresses can contain the letters a-f as well as digits):
awk '$2~/^[0-9a-f.]+$/{print $2}' macTable.log
You can test for the two types of MAC address entries, STATIC and DYNAMIC:
awk 'tolower($3)~/static|dynamic/ {print $2}' log
0011.2233.4455
0011.2233.4466
0011.2233.4477
0011.2233.4488
PS: you need to use tolower since switches may use uppercase and routers use lowercase.
You could also do it with grep (again allowing hex digits in the match):
egrep -o '([0-9a-fA-F]{4}\.){2}[0-9a-fA-F]{4}'
Output:
0011.2233.4455
0011.2233.4466
0011.2233.4477
0011.2233.4488
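Redirecting either variant's output completes the CSV requirement; a self-contained check against a couple of sample rows (the header and separator lines are filtered out by the condition):

```shell
printf '%s\n' \
    'vlan Mac Address Type Ports' \
    '----- ----------- ----- -----' \
    'All 0011.2233.4455 STATIC CPU' \
    'All 0011.2233.4466 DYNAMIC Gi0/1' |
awk 'tolower($3) ~ /static|dynamic/ {print $2}'   # append > macTable.csv to save
```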

creating a script which finds two alternating patterns

So my issue is that I need to make a script which finds a pattern where Time to live and User-Agent occur in that order, and increments a count (or grabs the data I want, etc.; it will likely evolve from there).
For example:
Time to live: 64
Some other data: ________
...
User-Agent: Mozilla/Chrome/IE:Windows/Unix/Mac
So basically the data appears in that order, TTL then User-Agent. From that information I can grab the data I want, but I don't know how to identify this pattern. If it helps, I'm getting this data from a Wireshark capture saved as a text file.
Thanks to Shellter I got to the point where I have:
egrep ' User-Agent:| Time to live:' ../*.txt
which finds if both (TTL and UA) are in the file.
I'd appreciate any assistance.
Fragment offset: 0
Time to live: 128
Protocol: TCP (6)
Header checksum: 0x7e4d [correct]
[Good: True]
[Bad: False]
Source: 1.1.1.3 (1.1.1.3)
Destination: 1.1.1.4 (1.1.1.4)
//packet 2
Fragment offset: 0
Time to live: 128
Protocol: TCP (6)
Hypertext Transfer Protocol
GET / HTTP/1.1\r\n
[Expert Info (Chat/Sequence): GET / HTTP/1.1\r\n]
[Message: GET / HTTP/1.1\r\n]
[Severity level: Chat]
[Group: Sequence]
Request Method: GET
Request URI: /
Request Version: HTTP/1.1
Host: mail.yahoo.com\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0\r\n
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
I apologize for the slow reply, I had to do some editing.
So basically I need to identify when a TTL occurs alone, and when a TTL occurs together with user-agent data; I use this to identify clients behind a gateway.
So if TTL is 126 (Windows) and I see 125, we assume it's behind a gateway and count++.
If we get that same count but with a different user-agent and the same OS, count doesn't change.
If we get that same count but with a different user-agent and OS, count++.
so output could be as simple as:
1 (ttl)
1 (ttl+os)
2 (ttl+os+ua)
from the example (not the data) above.
It's still a little unclear what you're looking to report, but maybe this will help.
We're going to use awk as that tool was designed to solve problems of this nature (among many others).
And while my output doesn't match your output exactly, I think the code is self-documenting enough that you can work with this, and make a closer approximation to your final need. Feel free to update your question with your new code, new output, and preferably an exact example of the output you hope to achieve.
awk '
/Time to live/{ttl++}
/User-Agent/{agent++}
/Windows|Linux|Solaris/{os++}
END{print "ttl="ttl; print "os="os; print"agent="agent}
' ttlTest.txt
output
ttl=2
os=1
agent=1
The key thing to understand is that awk (and most Unix regex-based utilities, grep included) reads each line of input and decides whether it will print (or do something else with) the current line of data.
awk normally will print every line of input if you give it something like
awk '{print $1}' file
in this example, printing just the first field from each line of data.
In the solution above, we're filtering the data with regular expressions and then applying an action when we match some data, i.e.
/Time to live/ { ttl++ }
      |        |   |   |
      |        |   |   > block end
      |        |   > action (in this case, increment the value of the ttl var)
      |        > block begin
      > regex to match
So we have 2 other 'regular expressions' that we're scanning each line for, and every time we match that regular expression, we increment the related variable.
Finally, awk allows for END blocks that execute after all data has been read from files.
This is how we create your summary report. awk also has BEGIN blocks that execute before any data has been read.
Another awk idiom, which allows more complex patterns to be matched, looks like:
awk '{
  if ( /Time to live/ && /User-Agent/ ) {
    ttl_agent++
  }
}' ttlTest.txt
Where the first and last { } block-definition characters indicate that this logic will be applied to each line that is read from the data. This block can be quite complex, and can use other variable values in the if test, like if ( var == 5 ) { print "found var=5" }.
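As a next step toward the TTL-then-User-Agent pairing the question describes, here is a hedged sketch (the `": "` field separator is an assumption based on the capture excerpt) that remembers the last TTL seen and counts each completed pair:

```shell
awk -F': ' '
    /Time to live/ { ttl = $2 }                      # remember the most recent TTL value
    /User-Agent/   { pairs++
                     print "pair " pairs ": ttl=" ttl }
    END            { print "total pairs: " pairs + 0 }
' ttlTest.txt
```

From there, counting distinct TTL/OS/agent combinations is a matter of keying an array on the stored values instead of printing.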
IHTH