Group_by and group_concat in shell script - shell

My intent is to identify duplicate jars in the classpath, so I used the following command to do some preprocessing:
mvn -o dependency:list | grep ":.*:.*:.*" | cut -d] -f2- | sed 's/:[a-z]*$//g' | sort -u -t: -k2
The file produced is in the format
group_id:artifact_id:type:version
So, for example, I have the following two lines in a file:
com.sun.jersey:jersey-client:jar:1.19.1
org.glassfish.jersey.core:jersey-client:jar:2.26
I want to produce a file with the following content:
jersey-client | com.sun.jersey:1.19.1,org.glassfish.jersey.core:2.26
The content of this file varies; there can be multiple libs with different versions.
Any idea how to do it with a shell script? I want to avoid a database query.
Adding a snippet of the sample file here:
org.glassfish.jaxb:jaxb-runtime:jar:2.4.0-b180725.0644
org.jboss.spec.javax.annotation:jboss-annotations-api_1.2_spec:jar:1.0.2.Final
org.jboss.logging:jboss-logging:jar:3.3.2.Final
org.jboss.spec.javax.transaction:jboss-transaction-api_1.2_spec:jar:1.0.1.Final
org.jboss.spec.javax.websocket:jboss-websocket-api_1.1_spec:jar:1.1.3.Final
com.github.stephenc.jcip:jcip-annotations:jar:1.0-1
com.beust:jcommander:jar:1.72
com.sun.jersey.contribs:jersey-apache-client4:jar:1.19.1
org.glassfish.jersey.ext:jersey-bean-validation:jar:2.26
com.sun.jersey:jersey-client:jar:1.19.1
org.glassfish.jersey.core:jersey-client:jar:2.26
org.glassfish.jersey.core:jersey-common:jar:2.26
org.glassfish.jersey.containers:jersey-container-servlet:jar:2.26
org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.26
com.sun.jersey:jersey-core:jar:1.19.1
org.glassfish.jersey.ext:jersey-entity-filtering:jar:2.26
org.glassfish.jersey.inject:jersey-hk2:jar:2.31
org.glassfish.jersey.media:jersey-media-jaxb:jar:2.26
org.glassfish.jersey.media:jersey-media-json-jackson:jar:2.26
org.glassfish.jersey.media:jersey-media-multipart:jar:2.26
org.glassfish.jersey.core:jersey-server:jar:2.26
org.glassfish.jersey.ext:jersey-spring4:jar:2.26
net.minidev:json-smart:jar:2.3
com.google.code.findbugs:jsr305:jar:3.0.1
javax.ws.rs:jsr311-api:jar:1.1.1
org.slf4j:jul-to-slf4j:jar:1.7.25
junit:junit:jar:4.12
org.latencyutils:LatencyUtils:jar:2.0.3
org.liquibase:liquibase-core:jar:3.5.5
log4j:log4j:jar:1.2.16
org.apache.logging.log4j:log4j-api:jar:2.10.0
com.googlecode.log4jdbc:log4jdbc:jar:1.2
org.apache.logging.log4j:log4j-to-slf4j:jar:2.10.0
ch.qos.logback:logback-classic:jar:1.2.3
ch.qos.logback:logback-core:jar:1.2.3
io.dropwizard.metrics:metrics-core:jar:4.1.6
io.dropwizard.metrics:metrics-healthchecks:jar:4.1.6
io.dropwizard.metrics:metrics-jmx:jar:4.1.6
io.micrometer:micrometer-core:jar:1.0.6
org.jvnet.mimepull:mimepull:jar:1.9.6
com.microsoft.sqlserver:mssql-jdbc:jar:6.2.2.jre8
com.netflix.netflix-commons:netflix-commons-util:jar:0.3.0
com.netflix.netflix-commons:netflix-statistics:jar:0.1.1
io.netty:netty-buffer:jar:4.1.27.Final
io.netty:netty-codec:jar:4.1.27.Final
io.netty:netty-codec-http:jar:4.1.27.Final
io.netty:netty-common:jar:4.1.27.Final
io.netty:netty-resolver:jar:4.1.27.Final
io.netty:netty-transport:jar:4.1.27.Final
io.netty:netty-transport-native-epoll:jar:4.1.27.Final
io.netty:netty-transport-native-unix-common:jar:4.1.27.Final
com.nimbusds:nimbus-jose-jwt:jar:8.3

There might be easier methods, but this is what I can do for now; it could probably be narrowed down to a single line with some tweaking.
[07:38 am alex ~]$ date; cat a
Wed 4 Nov 07:38:21 GMT 2020
com.sun.jersey:jersey-client:jar:1.19.1
org.glassfish.jersey.core:jersey-client:jar:2.26
[07:38 am alex ~]$ FIRST=`cat a | awk -F'[:]' '{print $2}' | uniq`
[07:38 am alex ~]$ SECOND=`cat a | awk -F'[:]' '{print $1":"$4}' | xargs | sed 's/ /,/g'`
[07:38 am alex ~]$ echo "$FIRST | $SECOND"
jersey-client | com.sun.jersey:1.19.1,org.glassfish.jersey.core:2.26

Could you please try the following? This could be done in a single awk itself. Completely based on your shown samples only.
awk '
BEGIN{
FS=":"
OFS=" | "
}
FNR==1{
name=$2
first=$1":"$4
next
}
FNR==2{
print name,first","$1":"$NF
}
' Input_file
Explanation: Adding a detailed explanation for the above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=":" ##Setting field separator as colon here.
OFS=" | " ##Setting output field separator as space | space here.
}
FNR==1{ ##Checking condition if this is the first line, then do the following.
name=$2 ##Saving the artifact_id (2nd field) into name.
first=$1":"$4 ##Saving group_id:version of the first line into first.
next ##next will skip all further statements from here.
}
FNR==2{ ##Checking condition if this is the 2nd line, then do the following.
print name,first","$1":"$NF ##Printing name, then first, a comma, and group_id:version of the current line.
}
' Input_file ##Mentioning Input_file name here.

Related

Extract text between 2 similar or different strings separately in shell script

I want to extract the text between each ### separately, to compare with a different file. I need to extract all CVE numbers for all docker images to compare with the previous report. The file looks as shown below; this is a snippet, and it has more than 100 such lines. I need to do this via a shell script. Kindly help.
### Vulnerabilities found in docker image alarm-integrator:22.0.0-150
| CVE | X-ray Severity | Anchore Severity | Trivy Severity | TR |
| :--- | :------------: | :--------------: | :------------: | :--- |
|[CVE-2020-29361](#221fbde4e2e4f3dd920622768262ee64c52d1e1384da790c4ba997ce4383925e)|||Important|
|[CVE-2021-35515](#898e82a9a616cf44385ca288fc73518c0a6a20c5e0aae74ed8cf4db9e36f25ce)|||High|
### Vulnerabilities found in docker image br-agent:22.0.0-154
| CVE | X-ray Severity | Anchore Severity | Trivy Severity | TR |
| :--- | :------------: | :--------------: | :------------: | :--- |
|[CVE-2020-29361](#221fbde4e2e4f3dd920622768262ee64c52d1e1384da790c4ba997ce4383925e)|||Important|
|[CVE-2021-23214](#75eaa96ec256afa7bc6bc3445bab2e7c5a5750678b7cda792e3c690667eacd98)|||Important|
I've tried something like grep -oP '(?<=\"##\").*?(?=\"##\")' but it doesn't work.
Expected Output:
For alarm-integrator
CVE-2020-29361
CVE-2021-35515
For br-agent
CVE-2020-29361
CVE-2021-23214
With your shown samples, please try the following awk code.
awk '
/^##/ && match($0,/docker image[[:space:]]+[^:]*/){
split(substr($0,RSTART,RLENGTH),arr1)
print "For "arr1[3]
next
}
match($0,/^\|\[[^]]*/){
print substr($0,RSTART+2,RLENGTH-2)
}
' Input_file
Explanation: Adding a detailed explanation for the above awk code.
awk ' ##Starting awk program from here.
/^##/ && match($0,/docker image[[:space:]]+[^:]*/){ ##If the line starts with ## AND the regex docker image[[:space:]]+[^:]* matches, grab the needed value.
split(substr($0,RSTART,RLENGTH),arr1) ##Splitting the matched part into array arr1 on the default delimiter (space).
print "For "arr1[3] ##Printing "For " followed by arr1's 3rd element (the image name).
next ##next will skip all further statements from here.
}
match($0,/^\|\[[^]]*/){ ##Matching from the leading |[ till just before the first ] here.
print substr($0,RSTART+2,RLENGTH-2) ##Printing the matched substring minus the leading |[ characters.
}
' Input_file ##Mentioning Input_file name here.
Using GNU awk (which I assume you have or can get since you're using GNU grep) for the 3rd arg to match():
$ cat tst.awk
match($0,/^###.* ([^:]+):.*/,a) { print "For", a[1] }
match($0,/\[([^]]+)/,a) { print a[1] }
!NF
$ awk -f tst.awk file
For alarm-integrator
CVE-2020-29361
CVE-2021-35515
For br-agent
CVE-2020-29361
CVE-2021-23214
With awk you can do:
awk -v FS=' |[[]|[]]' '/^[#]+/{sub(/:.*$/,"");print "For " $NF} /^\|\[/{print $2} /^$/ {print ""}' file
For alarm-integrator
CVE-2020-29361
CVE-2021-35515
For br-agent
CVE-2020-29361
CVE-2021-23214
We configure the field separator FS as ' |[[]|[]]': a space, a [ character, or a ] character.
The first condition-action gets For alarm-integrator and For br-agent.
The second condition-action gets all the CVE numbers.
And lastly we add the blank line.
More readable:
awk -v FS=' |[[]|[]]' '
/^[#]+/{sub(/:.*$/,"");print "For " $NF}
/^\|\[/{print $2}
/^$/ {print ""}
' file
For alarm-integrator
CVE-2020-29361
CVE-2021-35515
For br-agent
CVE-2020-29361
CVE-2021-23214
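Since the stated goal is to compare each image's CVE list against a previous report, a possible extension is to write one file per image; this is a sketch only, and the report.md sample and the cves_<image>.txt naming are assumptions, not from the question:

```shell
# Abbreviated sample report in the format shown in the question
# (anchors shortened; they are placeholders).
cat > report.md <<'EOF'
### Vulnerabilities found in docker image alarm-integrator:22.0.0-150
|[CVE-2020-29361](#221fbd)|||Important|
|[CVE-2021-35515](#898e82)|||High|
### Vulnerabilities found in docker image br-agent:22.0.0-154
|[CVE-2020-29361](#221fbd)|||Important|
|[CVE-2021-23214](#75eaa9)|||Important|
EOF

# Route each image's CVEs into its own file, cves_<image>.txt,
# so each list can be diffed against the previous report.
awk '
/^###/ && match($0,/docker image +[^:]+/){
  split(substr($0,RSTART,RLENGTH),a," ")
  out = "cves_" a[3] ".txt"
  next
}
match($0,/^\|\[[^]]*/){
  print substr($0,RSTART+2,RLENGTH-2) > out
}
' report.md
```

Afterwards cves_alarm-integrator.txt contains CVE-2020-29361 and CVE-2021-35515, ready for diff against the old report.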

Extract a property value from a text file

I have a log file which contains lines like the following one:
Internal (reserved=1728469KB, committed=1728469KB)
I'd need to extract the value contained in "committed", i.e. 1728469.
I'm trying to use awk for that:
cat file.txt | awk '{print $4}'
However, that produces:
committed=1728469KB)
This is still incomplete and would still need some work. Is there a simpler solution instead?
Thanks
Could you please try the following, using awk's match function.
awk 'match($0,/committed=[0-9]+/){print substr($0,RSTART+10,RLENGTH-10)}' Input_file
With GNU grep, using its \K operator:
grep -oP '.*committed=\K[0-9]*' Input_file
Output will be 1728469 in both above solutions.
1st solution explanation:
awk ' ##Starting awk program from here.
match($0,/committed=[0-9]+/){ ##Using match function to match committed= followed by digits in the current line.
print substr($0,RSTART+10,RLENGTH-10) ##Printing the matched substring minus the 10-character committed= prefix, i.e. only the digits.
}
' Input_file ##Mentioning Input_file name here.
Sed is better at simple matching tasks:
sed -n 's/.*committed=\([0-9]*\).*/\1/p' input_file
$ awk -F'[=)]' '{print $3}' file
1728469KB
You can try this:
str="Internal (reserved=1728469KB, committed=1728469KB)"
echo $str | awk '{print $3}' | cut -d "=" -f2 | rev | cut -c4- | rev
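If the value is needed in a shell variable anyway, plain POSIX parameter expansion avoids spawning awk or sed entirely; a minimal sketch, assuming a single committed= per line:

```shell
# The log line we want to mine (from the question).
line='Internal (reserved=1728469KB, committed=1728469KB)'

v=${line##*committed=}  # strip everything up to and including "committed="
v=${v%%KB*}             # strip "KB)" and anything after it
echo "$v"               # prints 1728469
```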

Need only parent domain from a URL in dnsmasq.log file

I want to fetch the website names visited by connected LAN clients from the dnsmasq.log file. So far I have been able to get this done:
cat /tmp/dnsmasq.log | grep query | egrep -v 'AAA|SRV|PTR' | awk '{print $1" "$2" "$3","$8","$6}'
May 29 12:00:17,127.0.0.1,ftp.box.com
May 29 12:00:33,10.0.0.41,2.android.pool.ntp.org
I need output as
May 29 12:00:17,127.0.0.1,box.com
May 29 12:00:33,10.0.0.41,ntp.org
I need just the parent domain name in the output. Please help.
Thanks
Could you please try the following, written and tested with the shown samples, considering that you need the last 2 elements of your URL.
awk 'BEGIN{FS=OFS=","} {num=split($NF,array,".");$NF=array[num-1]"."array[num]} 1' Input_file
Explanation: Adding detailed explanation.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this awk program from here.
FS=OFS="," ##Setting field separator and output field separator as comma here.
}
{
num=split($NF,array,".") ##Splitting the last field into array, with . as the separator.
$NF=array[num-1]"."array[num] ##Setting the last column to the 2nd-last element, a DOT, and the last element of array.
}
1 ##1 will print lines here.
' Input_file ##Mentioning Input_file name here.
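Note that keeping the last two dot-separated labels is only a heuristic: it yields box.com and ntp.org as desired, but would reduce a host under a multi-level public suffix such as example.co.uk to just co.uk. A quick sanity check against the shown samples (the log.csv file name is an assumption):

```shell
# Two sample records as produced by the dnsmasq preprocessing above.
printf '%s\n' \
  'May 29 12:00:17,127.0.0.1,ftp.box.com' \
  'May 29 12:00:33,10.0.0.41,2.android.pool.ntp.org' > log.csv

# Keep only the last two dot-separated labels of the final field.
awk 'BEGIN{FS=OFS=","} {n=split($NF,a,"."); $NF=a[n-1]"."a[n]} 1' log.csv
```

This prints box.com and ntp.org for the samples; handling arbitrary domains correctly would need a public-suffix list.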

Awk command to cut the url

I want to cut my URL https://jenkins-crumbtest2.origin-ctc-core-nonprod.com/ down to https://origin-ctc-core-nonprod.com. I have tried several ways to handle it:
$ echo https://jenkins-crumbtest2-test.origin-ctc-core-nonprod.com/ | cut -d"/" -f3 | cut -d"/" -f5
jenkins-crumbtest2.origin-ctc-core-nonprod.com
I have 3 inputs, and passing any of them should produce the same expected output.
Input:
1. https://jenkins-crumbtest2-test.origin-ctc-core-nonprod.com/ (or)
2. https://jenkins-crumbtest2.origin-ctc-core-nonprod.com/ (or)
3. https://jenkins-crumbtest2-test-lite.origin-ctc-core-nonprod.com/
Expected Output:
https://origin-ctc-core-nonprod.com
Can someone please help me?
Could you please try the following. Written and tested with the shown samples only.
awk '{gsub(/:\/\/.*test\.|:\/\/.*crumbtest2\.|:\/\/.*test-lite\./,"://")} 1' Input_file
OR the non-one-liner form of the above solution is as follows.
awk '
{
gsub(/:\/\/.*test\.|:\/\/.*crumbtest2\.|:\/\/.*test-lite\./,"://")
}
1
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
{
gsub(/:\/\/.*test\.|:\/\/.*crumbtest2\.|:\/\/.*test-lite\./,"://") ##Globally substituting everything from :// up to test. OR crumbtest2. OR test-lite. with :// in the line.
}
1 ##Printing current line here.
' Input_file ##Mentioning Input_file name here.
This awk skips the records that don't contain the fixed string origin-ctc-core-nonprod.com:
awk 'match($0,/origin-ctc-core-nonprod\.com/){print "https://" substr($0,RSTART,RLENGTH)}'
You can use it with echo "$string" | awk ..., with cat file | awk ..., or as awk ... file.
Explained:
awk ' # using awk
match($0,/origin-ctc-core-nonprod\.com/) { # if fixed string is matched
print "https://" substr($0,RSTART,RLENGTH) # output https:// and fixed string
# exit # uncomment if you want only
}' # one line of output like in sample
Or if you don't need the https:// part, you could just use grep:
grep -om 1 "origin-ctc-core-nonprod\.com"
Then again:
$ var=$(grep -om 1 "origin-ctc-core-nonprod\.com" file) && echo https://$var
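If the rule is really "drop the first host label, whatever it is", a sed sketch avoids hard-coding the subdomain names (this assumes exactly one label before the first dot and an optional trailing slash):

```shell
# All three sample URLs should collapse to the same result.
printf '%s\n' \
  'https://jenkins-crumbtest2-test.origin-ctc-core-nonprod.com/' \
  'https://jenkins-crumbtest2.origin-ctc-core-nonprod.com/' \
  'https://jenkins-crumbtest2-test-lite.origin-ctc-core-nonprod.com/' |
sed -E 's#(https://)[^./]+\.#\1#; s#/$##'
```

Each input line becomes https://origin-ctc-core-nonprod.com: the first s command removes the first dot-terminated label after the scheme, and the second drops the trailing slash.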

Copy numbers at the beginning of each line to the end of line

I have a file that produces these kinds of lines. I want to edit these lines and put them into passageiros.txt:
a82411:x:1015:1006:Adriana Morais,,,:/home/a82411:/bin/bash
a60395:x:1016:1006:Afonso Pichel,,,:/home/a60395:/bin/bash
a82420:x:1017:1006:Afonso Alves,,,:/home/a82420:/bin/bash
a69225:x:1018:1006:Afonso Alves,,,:/home/a69225:/bin/bash
a82824:x:1019:1006:Afonso Carreira,,,:/home/a82824:/bin/bash
a83112:x:1020:1006:Aladje Sanha,,,:/home/a83112:/bin/bash
a82652:x:1022:1006:Alexandre Ferreira,,,:/home/a82652:/bin/bash
a83063:x:1023:1006:Alexandre Feijo,,,:/home/a83063:/bin/bash
a82540:x:1024:1006:Ana Santana,,,:/home/a82540:/bin/bash
With the following code I'm able to get something like this:
cat /etc/passwd |grep "^a[0-9]" | cut -d ":" -f1,5 | sed "s/a//" | sed "s/,//g" > passageiros.txt
sed -e "s/$/:::a/" -i passageiros.txt
82411:Adriana Morais:::a
60395:Afonso Pichel:::a
82420:Afonso Alves:::a
69225:Afonso Alves:::a
82824:Afonso Carreira:::a
83112:Aladje Sanha:::a
82652:Alexandre Ferreira:::a
83063:Alexandre Feijo:::a
82540:Ana Santana:::a
So my goal is to create something like this:
82411:Adriana Morais:::a82411#
60395:Afonso Pichel:::a60395#
82420:Afonso Alves:::a82420#
69225:Afonso Alves:::a69225#
82824:Afonso Carreira:::a82824#
83112:Aladje Sanha:::a83112#
82652:Alexandre Ferreira:::a82652#
83063:Alexandre Feijo:::a83063#
82540:Ana Santana:::a82540#
How can I do this?
Could you please try the following.
awk -F'[:,]' '{val=$1;sub(/[a-z]+/,"",$1);print $1,$5,_,_,val"#"}' OFS=":" Input_file
Explanation: Adding explanation for above code too.
awk -F'[:,]' ' ##Starting awk script here and making the field separator colon and comma here.
{ ##Starting main block here for awk.
val=$1 ##Creating a variable val whose value is the first field.
sub(/[a-z]+/,"",$1) ##Using sub to substitute any alphabets from a to z in the first field with NULL here.
print $1,$5,_,_,val"#" ##Printing the 1st and 5th fields, two NULL variables, and variable val with # appended.
} ##Closing block for awk here.
' OFS=":" Input_file ##Mentioning the OFS value as colon here and mentioning Input_file name here.
EDIT: Adding @Aserre's solution here too.
awk -F'[:,]' '{print substr($1, 2),$5,_,_,$1"#"}' OFS=":" Input_file
You may use the following awk:
awk 'BEGIN {FS=OFS=":"} {sub(/^a/, "", $1); gsub(/,/, "", $5); print $1, $5, _, _, "a" $1 "#"}' file > passageiros.txt
Details
BEGIN {FS=OFS=":"} sets the input and output field separator to :
sub(/^a/, "", $1) removes the first a from Field 1
gsub(/,/, "", $5) removes all , from Field 5
print $1, $5, _, _, "a" $1 "#" prints only the necessary fields to the output.
You can use just one sed:
grep '^a' file | cut -d: -f1,5 | sed 's/a\([^:]*\)\(.*\)/\1\2:::a\1#/;s/,,,//'
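The grep | cut | sed pipeline can also be collapsed into a single sed invocation; a sketch based only on the sample lines shown (sample.txt is an assumed file name):

```shell
# Two sample /etc/passwd-style records from the question.
cat > sample.txt <<'EOF'
a82411:x:1015:1006:Adriana Morais,,,:/home/a82411:/bin/bash
a60395:x:1016:1006:Afonso Pichel,,,:/home/a60395:/bin/bash
EOF

# Capture the digits after the leading "a" and the name (5th field, up to
# the first comma), then rebuild the whole line in one pass.
sed -n 's/^a\([0-9]*\):[^:]*:[^:]*:[^:]*:\([^,:]*\).*/\1:\2:::a\1#/p' sample.txt
```

This prints 82411:Adriana Morais:::a82411# and 60395:Afonso Pichel:::a60395#; the -n/p pair also filters out lines that don't start with a plus digits, replacing the grep.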