Need only parent domain from a URL in dnsmasq.log file - shell

I want to extract the website names visited by connected LAN clients from the dnsmasq.log file. So far I have this:
cat /tmp/dnsmasq.log | grep query | egrep -v 'AAA|SRV|PTR' | awk '{print $1" "$2" "$3","$8","$6}'
which currently prints:
May 29 12:00:17,127.0.0.1,ftp.box.com
May 29 12:00:33,10.0.0.41,2.android.pool.ntp.org
I need the output to be:
May 29 12:00:17,127.0.0.1,box.com
May 29 12:00:33,10.0.0.41,ntp.org
I need just the parent domain name in the output. Please help.
Thanks

Could you please try the following, written and tested with the shown samples, considering that you need the last two dot-separated labels of the name.
awk 'BEGIN{FS=OFS=","} {num=split($NF,array,".");$NF=array[num-1]"."array[num]} 1' Input_file
Explanation:
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this awk program from here.
FS=OFS="," ##Setting field separator and output field separator as comma here.
}
{
num=split($NF,array,".") ##Splitting the last field into array on "." and storing the number of pieces in num.
$NF=array[num-1]"."array[num] ##Setting the last column to the 2nd-last element, a DOT, and the last element of array.
}
1 ##1 will print lines here.
' Input_file ##Mentioning Input_file name here.
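For completeness, the filtering and the trimming can be combined into one awk call. This is a sketch, assuming the standard dnsmasq query-log layout the question implies (the PID in dnsmasq[1234] is made up):

```shell
# Build a two-line sample in the layout the question implies (hypothetical PID).
cat > /tmp/dnsmasq.log.sample <<'EOF'
May 29 12:00:17 dnsmasq[1234]: query[A] ftp.box.com from 127.0.0.1
May 29 12:00:33 dnsmasq[1234]: query[A] 2.android.pool.ntp.org from 10.0.0.41
EOF

# One awk replaces the grep | egrep | awk chain and keeps only the last two labels.
awk '/query/ && !/AAAA|SRV|PTR/ {
    n = split($6, d, ".")                       # split queried name on dots
    print $1" "$2" "$3","$8","d[n-1]"."d[n]     # date,client,parent domain
}' /tmp/dnsmasq.log.sample
```

Note that, like the answer above, this treats the last two labels as the "parent domain", so names under public suffixes such as co.uk would come out as just co.uk.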


Reformatting text file using awk and cut as a one liner

Data:
CHR SNP BP A1 TEST NMISS BETA SE L95 U95 STAT P
1 chr1:1243:A:T 1243 T ADD 16283 -6.124 0.543 -1.431 0.3534 -1.123 0.14
Desired output:
MarkerName P-Value
chr1:1243 0.14
The actual file is 1.2G worth of lines like the above
I need to strip everything past the 2nd colon from the 2nd column, pair it with the 12th column, and give both new headers.
I have tried:
awk '{print $2, $12}' | cut -d: -f1-2
but this removes everything after the colons, and I want to keep the "P" column
I outputted this to a new file and then pasted it onto the P-value column using awk but was wondering if there was a one-liner method of doing this?
Many thanks
My comment in more understandable form:
$ awk '
BEGIN {
print "MarkerName P-Value" # output header
}
NR>1 { # skip the funky first record
split($2,a,/:/) # split by :
printf "%s:%s %s\n",a[1],a[2],$12 # printf allows easier output formatting
}' file
Output:
MarkerName P-Value
chr1:1243 0.14
EDIT: Adding one more solution here, since OP mentioned the first solution somehow didn't work for them (it worked fine for me); adding this as an alternative.
awk '
BEGIN{
print "MarkerName P-Value"
}
FNR>1{
match($2,/([^:]*:){2}/)
print OFS substr($2,RSTART,RLENGTH-1),$NF
}
' Input_file
With the shown samples, could you please try the following. You need not use cut with awk; awk can take care of everything itself.
awk -F' +|:' '
BEGIN{
print "MarkerName P-Value"
}
FNR>1{
print OFS $2":"$3,$NF
}
' Input_file
Explanation:
awk -F' +|:' ' ##Starting awk program from here and setting field separator as spaces or colon for all lines.
BEGIN{ ##Starting BEGIN section of this program from here.
print "MarkerName P-Value" ##Printing headers here.
}
FNR>1{ ##Checking condition if line number is greater than 1 then do following.
print OFS $2":"$3,$NF ##Printing space(OFS) 2nd field colon 3rd field and last field as per OP request.
}
' Input_file ##Mentioning Input_file name here.
$ awk -F'[: ]+' '{print (NR==1 ? "MarkerName P-Value" : $2":"$3" "$NF)}' file
MarkerName P-Value
chr1:1243 0.14
Sed alternative:
sed -En '1{s/^.*$/MarkerName\tP-Value/p};s/([[:digit:]]+[[:space:]]+)([[:alnum:]]+:[[:digit:]]+)(.*)([[:digit:]]+\.[[:digit:]]+$)/\2\t\4/p'
For the first line, substitute the full line for the headers. Then, split the line into 4 sections based on regular expressions and then print the 2nd subsection followed by a tab and then the 4th subsection.
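A quick run of that sed against the sample (GNU sed is assumed for the \t escape in the replacement; the file name file is an assumption, and the columns come out tab-separated, unlike the space-separated awk versions):

```shell
# Recreate the two sample lines from the question.
printf '%s\n' \
  'CHR SNP BP A1 TEST NMISS BETA SE L95 U95 STAT P' \
  '1 chr1:1243:A:T 1243 T ADD 16283 -6.124 0.543 -1.431 0.3534 -1.123 0.14' > file

# Header line becomes the new header; data lines are reduced to groups 2 and 4.
sed -En '1{s/^.*$/MarkerName\tP-Value/p};s/([[:digit:]]+[[:space:]]+)([[:alnum:]]+:[[:digit:]]+)(.*)([[:digit:]]+\.[[:digit:]]+$)/\2\t\4/p' file
```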

Group_by and group_concat in shell script

My intent is to identify the duplicate jars in the classpath, so I have used the following commands to do some preprocessing.
mvn -o dependency:list | grep ":.*:.*:.*" | cut -d] -f2- | sed 's/:[a-z]*$//g' | sort -u -t: -k2
and the file produced is in format
group_id:artifact_id:type:version
so, now for an example, I have following two lines in a file
com.sun.jersey:jersey-client:jar:1.19.1
org.glassfish.jersey.core:jersey-client:jar:2.26
I want to produce a file with following content.
jersey-client | com.sun.jersey:1.19.1,org.glassfish.jersey.core:2.26
The content of this file varies; there can be multiple libraries with different versions.
Any idea how to do it with shell script? I want to avoid database query.
Here is a snippet of a sample file:
org.glassfish.jaxb:jaxb-runtime:jar:2.4.0-b180725.0644
org.jboss.spec.javax.annotation:jboss-annotations-api_1.2_spec:jar:1.0.2.Final
org.jboss.logging:jboss-logging:jar:3.3.2.Final
org.jboss.spec.javax.transaction:jboss-transaction-api_1.2_spec:jar:1.0.1.Final
org.jboss.spec.javax.websocket:jboss-websocket-api_1.1_spec:jar:1.1.3.Final
com.github.stephenc.jcip:jcip-annotations:jar:1.0-1
com.beust:jcommander:jar:1.72
com.sun.jersey.contribs:jersey-apache-client4:jar:1.19.1
org.glassfish.jersey.ext:jersey-bean-validation:jar:2.26
com.sun.jersey:jersey-client:jar:1.19.1
org.glassfish.jersey.core:jersey-client:jar:2.26
org.glassfish.jersey.core:jersey-common:jar:2.26
org.glassfish.jersey.containers:jersey-container-servlet:jar:2.26
org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.26
com.sun.jersey:jersey-core:jar:1.19.1
org.glassfish.jersey.ext:jersey-entity-filtering:jar:2.26
org.glassfish.jersey.inject:jersey-hk2:jar:2.31
org.glassfish.jersey.media:jersey-media-jaxb:jar:2.26
org.glassfish.jersey.media:jersey-media-json-jackson:jar:2.26
org.glassfish.jersey.media:jersey-media-multipart:jar:2.26
org.glassfish.jersey.core:jersey-server:jar:2.26
org.glassfish.jersey.ext:jersey-spring4:jar:2.26
net.minidev:json-smart:jar:2.3
com.google.code.findbugs:jsr305:jar:3.0.1
javax.ws.rs:jsr311-api:jar:1.1.1
org.slf4j:jul-to-slf4j:jar:1.7.25
junit:junit:jar:4.12
org.latencyutils:LatencyUtils:jar:2.0.3
org.liquibase:liquibase-core:jar:3.5.5
log4j:log4j:jar:1.2.16
org.apache.logging.log4j:log4j-api:jar:2.10.0
com.googlecode.log4jdbc:log4jdbc:jar:1.2
org.apache.logging.log4j:log4j-to-slf4j:jar:2.10.0
ch.qos.logback:logback-classic:jar:1.2.3
ch.qos.logback:logback-core:jar:1.2.3
io.dropwizard.metrics:metrics-core:jar:4.1.6
io.dropwizard.metrics:metrics-healthchecks:jar:4.1.6
io.dropwizard.metrics:metrics-jmx:jar:4.1.6
io.micrometer:micrometer-core:jar:1.0.6
org.jvnet.mimepull:mimepull:jar:1.9.6
com.microsoft.sqlserver:mssql-jdbc:jar:6.2.2.jre8
com.netflix.netflix-commons:netflix-commons-util:jar:0.3.0
com.netflix.netflix-commons:netflix-statistics:jar:0.1.1
io.netty:netty-buffer:jar:4.1.27.Final
io.netty:netty-codec:jar:4.1.27.Final
io.netty:netty-codec-http:jar:4.1.27.Final
io.netty:netty-common:jar:4.1.27.Final
io.netty:netty-resolver:jar:4.1.27.Final
io.netty:netty-transport:jar:4.1.27.Final
io.netty:netty-transport-native-epoll:jar:4.1.27.Final
io.netty:netty-transport-native-unix-common:jar:4.1.27.Final
com.nimbusds:nimbus-jose-jwt:jar:8.3
There might be easier methods, but this is what I can do for now; it can probably be narrowed down to a single line with some tweaking.
[07:38 am alex ~]$ date; cat a
Wed 4 Nov 07:38:21 GMT 2020
com.sun.jersey:jersey-client:jar:1.19.1
org.glassfish.jersey.core:jersey-client:jar:2.26
[07:38 am alex ~]$ FIRST=`cat a | awk -F'[:]' '{print $2}' | uniq`
[07:38 am alex ~]$ SECOND=`cat a | awk -F'[:]' '{print $1":"$4}' | xargs | sed 's/ /,/g'`
[07:38 am alex ~]$ echo "$FIRST | $SECOND"
jersey-client | com.sun.jersey:1.19.1,org.glassfish.jersey.core:2.26
Could you please try the following; this can be done within a single awk itself. It is completely based on your shown two-line sample only.
awk '
BEGIN{
FS=":"
OFS=" | "
}
FNR==1{
first=$1":"$NF
second=$2
next
}
FNR==2{
print second,first","$1":"$NF
}
' Input_file
Explanation:
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=":" ##Setting field separator as colon here.
OFS=" | " ##Setting output field separator as space | space here.
}
FNR==1{ ##Checking condition if this is the first line, then do the following.
first=$1":"$NF ##Creating first from the 1st field (group id), a colon, and the last field (version) of line 1.
second=$2 ##Creating second from the 2nd field (artifact id) of the current line.
next ##next will skip all further statements from here.
}
FNR==2{ ##Checking condition if this is the 2nd line, then do the following.
print second,first","$1":"$NF ##Printing second, then first, a comma, and the 1st field, colon, last field of the current line.
}
' Input_file ##Mentioning Input_file name here.

Use sed (or similar) to remove anything between repeating patterns

I'm essentially trying to "tidy" a lot of data in a CSV. I don't need any of the information that's in "quotes".
Tried sed 's/".*"/""/' but it removes the commas when there is more than one quoted section on a line.
I would like to get from this:
1,2,"a",4,"b","c",5
To this:
1,2,,4,,,5
Is there a sed wizard who can help? :)
You may use
sed 's/"[^"]*"//g' file > newfile
See online sed demo:
s='1,2,"a",4,"b","c",5'
sed 's/"[^"]*"//g' <<< "$s"
# => 1,2,,4,,,5
Details
The "[^"]*" pattern matches a ", then 0 or more characters other than ", and then a ". The matches are removed since the RHS is empty. The g flag makes it match all occurrences on each line.
Could you please try the following.
awk -v s1="\"" 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++){if($i~s1){$i=""}}} 1' Input_file
The non-one-liner form of the solution is:
awk -v s1="\"" '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
if($i~s1){
$i=""
}
}
}
1
' Input_file
Detailed explanation:
awk -v s1="\"" ' ##Starting awk program from here and mentioning variable s1 whose value is "
BEGIN{ ##Starting BEGIN section of this code here.
FS=OFS="," ##Setting field separator and output field separator as comma(,) here.
}
{
for(i=1;i<=NF;i++){ ##Starting a for loop which traverse through all fields of current line.
if($i~s1){ ##Checking if current field has " in it if yes then do following.
$i="" ##Nullifying current field value here.
}
}
}
1 ##Mentioning 1 will print edited/non-edited line here.
' Input_file ##Mentioning Input_file name here.
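Checking the awk approach against the sample from the question:

```shell
# Each field containing a double quote is blanked; FS=OFS="," keeps the commas.
echo '1,2,"a",4,"b","c",5' |
  awk -v s1="\"" 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++){if($i~s1){$i=""}}} 1'
# → 1,2,,4,,,5
```

Note that both the sed and the awk versions assume quoted fields never contain commas or escaped quotes; for real-world CSV data a proper CSV parser is safer.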
With Perl:
perl -p -e 's/".*?"//g' file
? forces * to be non-greedy.
Output:
1,2,,4,,,5

How to add single quote after specific word using sed?

I am trying to write a script to add single quotes around the value that follows the "GOOD" keyword.
For example, I have file1 :
//WER GOOD=ONE
//WER1 GOOD=TWO2
//PR1 GOOD=THR45
...
Desired change is to add single quotes :
//WER GOOD='ONE'
//WER1 GOOD='TWO2'
//PR1 GOOD='THR45'
...
This is the script which I am trying to run:
#!/bin/bash
for item in `grep "GOOD" file1 | cut -f2 -d '='`
do
sed -i 's/$item/`\$item/`\/g' file1
done
Thank you for the help in advance !
Could you please try the following.
sed "s/\(.*=\)\(.*\)/\1'\2'/" Input_file
OR, as per OP's comment, to also remove empty lines use:
sed "s/\(.*=\)\(.*\)/\1'\2'/;/^$/d" Input_file
Explanation: the following is for explanation purposes only.
sed " ##Starting sed command from here.
s/ ##Using s to start substitution process from here.
\(.*=\)\(.*\) ##Using sed's capture groups to store matched text in memory: everything up to and including = goes into the 1st buffer and the rest of the line into the 2nd.
/\1'\2' ##Substituting with the 1st buffer, then the 2nd buffer wrapped in single quotes, adding the quotes after = as OP needs.
/" Input_file ##Closing block for substitution, mentioning Input_file name here.
Please use the -i option in the above code in case you want to save the output into Input_file itself.
2nd solution with awk:
awk 'match($0,/=.*/){$0=substr($0,1,RSTART) "\047" substr($0,RSTART+1,RLENGTH) "\047"} 1' Input_file
Explanation:
awk '
match($0,/=.*/){ ##Using the match function to match everything from = till the end of the line.
$0=substr($0,1,RSTART) "\047" substr($0,RSTART+1,RLENGTH) "\047" ##Rebuilding $0 as the substring up to and including =, then ' (octal escape \047), then the rest of the line, then another '.
} ##Where RSTART and RLENGTH are variables which will be SET once a TRUE matched regex is found.
1 ##1 will print edited/non-edited line.
' Input_file ##Mentioning Input_file name here.
3rd solution: In case you have only 2 fields in your Input_file, then try something simpler in awk:
awk 'BEGIN{FS=OFS="="} {$2="\047" $2 "\047"} 1' Input_file
Explanation of the 3rd solution (for explanation purposes only; for running, use the code above):
awk ' ##Starting awk program here.
BEGIN{FS=OFS="="} ##Setting FS and OFS values as = for all lines of Input_file here.
{$2="\047" $2 "\047"} ##Wrapping $2 in single quotes (\047 is the octal escape for ') as per OP's need.
1 ##Mentioning 1 will print edited/non-edited lines here.
' Input_file ##Mentioning Input_file name here.
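A quick check of the first sed and the third awk solutions against the shown sample:

```shell
# Recreate file1 from the question.
printf '%s\n' '//WER GOOD=ONE' '//WER1 GOOD=TWO2' '//PR1 GOOD=THR45' > file1

# Both commands print the value after = wrapped in single quotes.
sed "s/\(.*=\)\(.*\)/\1'\2'/" file1
awk 'BEGIN{FS=OFS="="} {$2="\047" $2 "\047"} 1' file1
```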

Replace header of one column by file name

I have about 100 comma-separated text files with eight columns.
Example of two file names:
sample1_sorted_count_clean.csv
sample2_sorted_count_clean.csv
Example of file content:
Domain,Phylum,Class,Order,Family,Genus,Species,Count
Bacteria,Proteobacteria,Alphaproteobacteria,Sphingomonadales,Sphingomonadaceae,Zymomonas,Zymomonas mobilis,0.0
Bacteria,Bacteroidetes,Flavobacteria,Flavobacteriales,Flavobacteriaceae,Zunongwangia,Zunongwangia profunda,0.0
For each file, I would like to replace the column header "Count" by sample ID, which is contained in the first part of the file name (sample1, sample2)
In the end, the header should then look like this:
Domain,Phylum,Class,Order,Family,Genus,Species,sample1
This is my code:
for f in *_clean.csv; do echo ${f}; sed -e "1s/Domain,Phylum,Class,Order,Family,Genus,Species,RPMM/Domain,Phylum,Class,Order,Family,Genus,Species,${f%_clean.csv}/" ${f} > ${f%_clean.csv}_clean2.csv; done
If I use it, the header looks like this:
Domain,Phylum,Class,Order,Family,Genus,Species,${f%_clean.csv}
I also tried:
for f in *_clean.csv; do gawk -F"," '{$NF=","FILENAME}1' ${f} > t && mv t ${f%_clean.csv}_clean2.csv; done
In this case, "Count" is replaced by the entire file name, but every row of the column now contains the file name; the count values are no longer present. This is not what I want.
Do you have any ideas on what else I may try?
Thank you very much in advance!
Anna
If you are ok with awk, could you please try the following.
awk 'BEGIN{FS=OFS=","} FNR==1{var=FILENAME;sub(/_.*/,"",var);$NF=var} 1' *.csv
EDIT: Since OP asked that everything after the 2nd underscore be removed from the file name, try the following.
awk 'BEGIN{FS=OFS=","} FNR==1{split(FILENAME,array,"_");$NF=array[1]"_"array[2]} 1' *.csv
Explanation:
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of code from here, which will be executed before Input_file(s) are being read.
FS=OFS="," ##Setting FS and OFS as comma here for all files all lines.
} ##Closing BEGIN section here.
FNR==1{ ##Checking condition FNR==1, i.e. the very first line of the current Input_file is being read; then do the following.
split(FILENAME,array,"_") ##Using awk's built-in split function to split FILENAME (which holds the current file's name) into an array named array on delimiter _.
$NF=array[1]"_"array[2] ##Setting the last field to array's 1st element, an underscore, and array's 2nd element.
} ##Closing FNR==1 condition BLOCK here.
1 ##Mentioning 1 will print the rest of the lines for current Input_file.
' *.csv ##Passing all *.csv files to awk program here.
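The awk above prints all files to one combined stream; to write the per-file *_clean2.csv copies the question asks for, one sketch is a shell loop around a simpler awk (assuming the sample ID is everything before the first underscore, as in the original question):

```shell
for f in *_clean.csv; do
    id=${f%%_*}    # "sample1" from "sample1_sorted_count_clean.csv"
    # Replace only the last header field; data rows pass through unchanged.
    awk -v id="$id" 'BEGIN{FS=OFS=","} FNR==1{$NF=id} 1' "$f" \
        > "${f%_clean.csv}_clean2.csv"
done
```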
