Extract a property value from a text file - bash

I have a log file which contains lines like the following one:
Internal (reserved=1728469KB, committed=1728469KB)
I need to extract the value contained in "committed", i.e. 1728469.
I'm trying to use awk for that
cat file.txt | awk '{print $4}'
However that produces:
committed=1728469KB)
This is still incomplete and would need some more work. Is there a simpler solution instead?
Thanks

Could you please try the following, using the match function of awk:
awk 'match($0,/committed=[0-9]+/){print substr($0,RSTART+10,RLENGTH-10)}' Input_file
With GNU grep, using its \K option:
grep -oP '.*committed=\K[0-9]*' Input_file
The output will be 1728469 for both of the above solutions.
1st solution explanation:
awk ' ##Starting awk program from here.
match($0,/committed=[0-9]+/){ ##Using match function to match from committed= till digits in current line.
print substr($0,RSTART+10,RLENGTH-10) ##Printing the substring that starts at RSTART+10 with length RLENGTH-10, i.e. only the digits.
}
' Input_file ##Mentioning Input_file name here.
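As a quick check, feeding the sample line in on stdin (assuming it looks exactly as shown) gives:
echo 'Internal (reserved=1728469KB, committed=1728469KB)' | awk 'match($0,/committed=[0-9]+/){print substr($0,RSTART+10,RLENGTH-10)}'
1728469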

Sed is better at simple matching tasks:
sed -n 's/.*committed=\([0-9]*\).*/\1/p' input_file

$ awk -F'[=)]' '{print $3}' file
1728469KB

You can try this:
str="Internal (reserved=1728469KB, committed=1728469KB)"
echo $str | awk '{print $3}' | cut -d "=" -f2 | rev | cut -c4- | rev

Related

how to discard the last field of the content of a file using awk command

how to discard the last field using awk
My list.txt file contains data like below:
Ram/45/simple
Gin/Run/657/No/Sand
Ram/Hol/Sin
Tan/Tin/Bun
but I require the output below:
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
I tried the following command, but it prints only the last field:
cat list.txt |awk -F '/' '{print $(NF)}'
45
No
Hol
Tin
With GNU awk, you could try the following; decrementing NF drops the last field and makes gawk rebuild the record with the / output separator.
awk 'BEGIN{FS=OFS="/"} NF--' Input_file
Or, with any awk, try the following.
awk 'BEGIN{FS=OFS="/"} match($0,/.*\//){print substr($0,RSTART,RLENGTH-1)}' Input_file
This simple awk should work:
awk '{sub(/\/[^/]*$/, "")} 1' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
Or this even simpler sed should also work:
sed 's~/[^/]*$~~' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin

Group_by and group_concat in shell script

My intent is to identify the duplicate jars in the classpath, so I have used the following commands to do some preprocessing:
mvn -o dependency:list | grep ":.*:.*:.*" | cut -d] -f2- | sed 's/:[a-z]*$//g' | sort -u -t: -k2
and the file produced is in the format
group_id:artifact_id:type:version
So now, for example, I have the following two lines in a file:
com.sun.jersey:jersey-client:jar:1.19.1
org.glassfish.jersey.core:jersey-client:jar:2.26
I want to produce a file with following content.
jersey-client | com.sun.jersey:1.19.1,org.glassfish.jersey.core:2.26
The content of this file varies; there can be multiple libs with different versions.
Any idea how to do it with shell script? I want to avoid database query.
Adding a sample of the file here:
org.glassfish.jaxb:jaxb-runtime:jar:2.4.0-b180725.0644
org.jboss.spec.javax.annotation:jboss-annotations-api_1.2_spec:jar:1.0.2.Final
org.jboss.logging:jboss-logging:jar:3.3.2.Final
org.jboss.spec.javax.transaction:jboss-transaction-api_1.2_spec:jar:1.0.1.Final
org.jboss.spec.javax.websocket:jboss-websocket-api_1.1_spec:jar:1.1.3.Final
com.github.stephenc.jcip:jcip-annotations:jar:1.0-1
com.beust:jcommander:jar:1.72
com.sun.jersey.contribs:jersey-apache-client4:jar:1.19.1
org.glassfish.jersey.ext:jersey-bean-validation:jar:2.26
com.sun.jersey:jersey-client:jar:1.19.1
org.glassfish.jersey.core:jersey-client:jar:2.26
org.glassfish.jersey.core:jersey-common:jar:2.26
org.glassfish.jersey.containers:jersey-container-servlet:jar:2.26
org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.26
com.sun.jersey:jersey-core:jar:1.19.1
org.glassfish.jersey.ext:jersey-entity-filtering:jar:2.26
org.glassfish.jersey.inject:jersey-hk2:jar:2.31
org.glassfish.jersey.media:jersey-media-jaxb:jar:2.26
org.glassfish.jersey.media:jersey-media-json-jackson:jar:2.26
org.glassfish.jersey.media:jersey-media-multipart:jar:2.26
org.glassfish.jersey.core:jersey-server:jar:2.26
org.glassfish.jersey.ext:jersey-spring4:jar:2.26
net.minidev:json-smart:jar:2.3
com.google.code.findbugs:jsr305:jar:3.0.1
javax.ws.rs:jsr311-api:jar:1.1.1
org.slf4j:jul-to-slf4j:jar:1.7.25
junit:junit:jar:4.12
org.latencyutils:LatencyUtils:jar:2.0.3
org.liquibase:liquibase-core:jar:3.5.5
log4j:log4j:jar:1.2.16
org.apache.logging.log4j:log4j-api:jar:2.10.0
com.googlecode.log4jdbc:log4jdbc:jar:1.2
org.apache.logging.log4j:log4j-to-slf4j:jar:2.10.0
ch.qos.logback:logback-classic:jar:1.2.3
ch.qos.logback:logback-core:jar:1.2.3
io.dropwizard.metrics:metrics-core:jar:4.1.6
io.dropwizard.metrics:metrics-healthchecks:jar:4.1.6
io.dropwizard.metrics:metrics-jmx:jar:4.1.6
io.micrometer:micrometer-core:jar:1.0.6
org.jvnet.mimepull:mimepull:jar:1.9.6
com.microsoft.sqlserver:mssql-jdbc:jar:6.2.2.jre8
com.netflix.netflix-commons:netflix-commons-util:jar:0.3.0
com.netflix.netflix-commons:netflix-statistics:jar:0.1.1
io.netty:netty-buffer:jar:4.1.27.Final
io.netty:netty-codec:jar:4.1.27.Final
io.netty:netty-codec-http:jar:4.1.27.Final
io.netty:netty-common:jar:4.1.27.Final
io.netty:netty-resolver:jar:4.1.27.Final
io.netty:netty-transport:jar:4.1.27.Final
io.netty:netty-transport-native-epoll:jar:4.1.27.Final
io.netty:netty-transport-native-unix-common:jar:4.1.27.Final
com.nimbusds:nimbus-jose-jwt:jar:8.3
There might be easier methods, but this is what I can do for now; it could probably be narrowed down to a single line with some tweaking.
[07:38 am alex ~]$ date; cat a
Wed 4 Nov 07:38:21 GMT 2020
com.sun.jersey:jersey-client:jar:1.19.1
org.glassfish.jersey.core:jersey-client:jar:2.26
[07:38 am alex ~]$ FIRST=`cat a | awk -F'[:]' '{print $2}' | uniq`
[07:38 am alex ~]$ SECOND=`cat a | awk -F'[:]' '{print $1":"$4}' | xargs | sed 's/ /,/g'`
[07:38 am alex ~]$ echo "$FIRST | $SECOND"
jersey-client | com.sun.jersey:1.19.1,org.glassfish.jersey.core:2.26
Could you please try the following; this could be done in a single awk itself. It is based completely on your shown samples only.
awk '
BEGIN{
FS=":"
OFS=" | "
}
FNR==1{
first=$1
value=$NF
second=$2
next
}
FNR==2{
print second,first":"value","$1":"$NF
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=":" ##Setting field separator colon here.
OFS=" | " ##Setting output field separator as space | space here.
}
FNR==1{ ##Checking condition: if this is the first line then do the following.
first=$1 ##Creating first with 1st field value.
value=$NF ##Creating value with the last field (the version) of the first line.
second=$2 ##Creating second with 2nd field value of current line.
next ##next will skip all further statements from here.
}
FNR==2{ ##Checking condition if this is 2nd line then do following.
print second,first":"value","$1":"$NF ##Printing second, then the saved group and version from line 1, a comma, and the group and version from the current line.
}
' Input_file ##Mentioning Input_file name here.
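Both of the answers above assume exactly two input lines that share one artifact id. For the longer sample, one way to group by artifact and concatenate the group:version pairs is an awk associative array. This is only a sketch, assuming the group_id:artifact_id:type:version layout shown and printing only artifact ids that occur more than once (i.e. the duplicates):
awk -F: '
{
list[$2]=(list[$2]=="" ? "" : list[$2] ",") $1 ":" $4 ##Append group_id:version to the entry for this artifact_id.
count[$2]++ ##Count occurrences of the artifact_id.
}
END{
for(a in list)
if(count[a]>1) ##Only duplicated artifact ids are printed.
print a " | " list[a]
}' Input_file
The jersey-client lines in the sample then come out as:
jersey-client | com.sun.jersey:1.19.1,org.glassfish.jersey.core:2.26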

Awk command to cut the url

I want to cut my url https://jenkins-crumbtest2.origin-ctc-core-nonprod.com/ down to https://origin-ctc-core-nonprod.com. I have tried several ways to handle it:
$ echo https://jenkins-crumbtest2-test.origin-ctc-core-nonprod.com/ | cut -d"/" -f3 | cut -d"/" -f5
jenkins-crumbtest2.origin-ctc-core-nonprod.com
I have 3 inputs which I want to pass to get the expected output. I want to be able to pass any of these inputs and get the same output.
Input:
1. https://jenkins-crumbtest2-test.origin-ctc-core-nonprod.com/ (or)
2. https://jenkins-crumbtest2.origin-ctc-core-nonprod.com/ (or)
3. https://jenkins-crumbtest2-test-lite.origin-ctc-core-nonprod.com/
Expected Output:
https://origin-ctc-core-nonprod.com
Can someone please help me?
Could you please try the following. It is written and tested with the shown samples only.
awk '{gsub(/:\/\/.*test\.|:\/\/.*crumbtest2\.|:\/\/.*test-lite\./,"://")} 1' Input_file
Or, the non-one-liner form of the above solution is as follows.
awk '
{
gsub(/:\/\/.*test\.|:\/\/.*crumbtest2\.|:\/\/.*test-lite\./,"://")
}
1
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
{
gsub(/:\/\/.*test\.|:\/\/.*crumbtest2\.|:\/\/.*test-lite\./,"://") ##Globally substituting everything from :// up to test., crumbtest2. or test-lite. with :// in the line.
}
1 ##Printing current line here.
' Input_file ##Mentioning Input_file name here.
This awk skips the records that don't have fixed string origin-ctc-core-nonprod.com in them:
awk 'match($0,/origin-ctc-core-nonprod\.com/){print "https://" substr($0,RSTART,RLENGTH)}'
You can use it with echo "string" | awk ..., cat file | awk ..., or awk ... file.
Explained:
awk ' # using awk
match($0,/origin-ctc-core-nonprod\.com/) { # if fixed string is matched
print "https://" substr($0,RSTART,RLENGTH) # output https:// and fixed string
# exit # uncomment if you want only one line of output, as in the sample
}'
Or if you don't need the https:// part, you could just use grep:
grep -om 1 "origin-ctc-core-nonprod\.com"
Then again:
$ var=$(grep -om 1 "origin-ctc-core-nonprod\.com" file) && echo https://$var
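If the intent is simply to drop the first label of the hostname, whatever it happens to be, a sed sketch like the following should also cover all three sample inputs (this generalisation is an assumption, not taken from the answers above; the second expression just trims the trailing slash):
echo 'https://jenkins-crumbtest2-test.origin-ctc-core-nonprod.com/' | sed -E 's|://[^./]*\.|://|; s|/$||'
https://origin-ctc-core-nonprod.com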

Grep only 2 portions in a line

I have the following line. I can grep one part but am struggling to also grep the second portion.
Line:
html:<TR><TD>PICK_1</TD><TD>36.0000</TD><TD>1000000</TD><TD>26965</TD><TD>100000000</TD><TD>97074000</TD><TD>2926000</TD><TD>2.926%</TD><TD>97.074%</TD></TR>
I want to have the following results after grepping this line.
PICK_1 97.074%
Currently I am just grepping the first portion via the following command:
grep -Po "<TR><TD>[A-Z0-9_]+" test.txt
Appreciate any help on how I can go about doing this. Thanks.
Use awk with a custom field separator:
awk -F'[<>TDR/]+' '{ print $2, $(NF-1) }' file
This splits the line on things that look like one or more opening or closing <TD> or <TR> tags, and prints the second and second-last field.
Warning: this will break on almost every input except the one that you've shown, since awk, grep and friends are designed for processing text, not HTML.
If you always have the same number of fields delimited by "TD" tags, you can try this (dirty) awk:
awk -F'[<TD>|</TD>]' '{print $8 " " $80}'
Or this combination of column and awk:
column -t -s "</TD>" | awk -F' ' '{print $3 " " $11}'
Or with sed instead of column:
sed -e 's/<TD>/ /g' | awk -F' ' '{print $3 " " $11}'
Try providing each pattern after the -e option:
grep -e PICK_1 -e "<TR><TD>[A-Z0-9_]+" test.txt
awk -F'[<>]' '{print $5,$(NF-4)}' file
PICK_1 97.074%
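If you want to stay with grep, GNU grep's -P mode can pull both pieces out in one pass; this is only a sketch for the exact line shown, with paste re-joining the two matches onto one line:
grep -oP '<TR><TD>\K[A-Z0-9_]+|[0-9.]+%(?=</TD></TR>)' test.txt | paste -d' ' - -
PICK_1 97.074%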

awk - split only by first occurrence

I have a line like:
one:two:three:four:five:six seven:eight
and I want to use awk to get $1 to be one and $2 to be two:three:four:five:six seven:eight
I know I can get it by doing sed first; that is, change the first occurrence of : with sed, then awk it using the new delimiter.
However, replacing the delimiter with a new one would not help me, since I cannot guarantee that the new delimiter is not already somewhere in the text.
I want to know if there is an option to get awk to behave this way.
So something like:
awk -F: '{print $1,$2}'
will print:
one two:three:four:five:six seven:eight
I will also want to do some manipulations on $1 and $2, so I don't want to just substitute the first occurrence of :.
Without any substitutions
echo "one:two:three:four:five" | awk -F: '{ st = index($0,":");print $1 " " substr($0,st+1)}'
The index function finds the first occurrence of ":" in the whole string, so in this case the variable st would be set to 4. I then use the substr function to grab the rest of the string starting from position st+1; if no length is supplied, it goes to the end of the string. The output is:
one two:three:four:five
If you want to do further processing, you could always assign the remainder to a variable:
rem = substr($0,st+1)
Note: this was tested on Solaris awk, but I can't see any reason why it shouldn't work on other flavours.
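A small self-contained sketch of that idea, keeping both pieces in variables so they can be manipulated before printing (the names key and rem are just placeholders):
echo "one:two:three:four:five:six seven:eight" | awk '{st=index($0,":"); key=substr($0,1,st-1); rem=substr($0,st+1); print toupper(key), rem}'
ONE two:three:four:five:six seven:eight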
Something like this?
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1'
one two:three:four:five:six
This replaces the first : with a space.
You can then get it into $1 and $2:
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1' | awk '{print $1,$2}'
one two:three:four:five:six
Or in the same awk, so that even with substitution you get $1 and $2 the way you like:
echo "one:two:three:four:five:six" | awk '{sub(/:/," ");$1=$1;print $1,$2}'
one two:three:four:five:six
EDIT:
Using a different separator, you can get the first part as field $1 and the rest in $2, like this:
echo "one:two:three:four:five:six seven:eight" | awk -F\| '{sub(/:/,"|");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
With a unique separator:
echo "one:two:three:four:five:six seven:eight" | awk -F"#;#." '{sub(/:/,"#;#.");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
The closest you can get is with GNU awk's FPAT:
$ awk '{print $1}' FPAT='(^[^:]+)|(:.*)' file
one
$ awk '{print $2}' FPAT='(^[^:]+)|(:.*)' file
:two:three:four:five:six seven:eight
However, $2 will include the leading delimiter, but you can use substr to fix that:
$ awk '{print substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
two:three:four:five:six seven:eight
So putting it all together:
$ awk '{print $1, substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
Storing the results of the substr back in $2 will allow further processing on $2 without the leading delimiter:
$ awk '{$2=substr($2,2); print $1,$2}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
A solution that should work with mawk 1.3.3 (FS='\0' keeps the whole line in $1, so $1 and $2 can be carved out with index and substr):
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1}' FS='\0'
one
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $2}' FS='\0'
two:three:four:five:six seven:eight
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1,$2}' FS='\0'
one two:three:four:five:six seven:eight
Just throwing this on here as a solution I came up with when I wanted to split off the first two columns on : but keep the rest of the line intact.
Comments inline.
echo "a:b:c:d::e" | \
awk '{
split($0,f,":"); # split $0 into array of fields `f`
sub(/^([^:]+:){2}/,"",$0); # remove first two "fields" from `$0`
print f[1],f[2],$0 # print first two elements of `f` and edited `$0`
}'
Returns:
a b c:d::e
In my input I didn't have to worry about the first two fields containing an escaped :; if that were a requirement, this solution wouldn't work as expected.
Amended to match the original requirements:
echo "a:b:c:d::e" | \
awk '{
split($0,f,":");
sub(/^([^:]+:)/,"",$0);
print f[1],$0
}'
Returns:
a b:c:d::e
