Parsing functionality in shell script - shell

If I am trying to look up which host bus the hard drive is attached to, I would use
ls -ld /sys/block/sd*/device
which returns
lrwxrwxrwx 1 root root 0 Oct 18 14:52 /sys/block/sda/device -> ../../../1:0:0:0
Now, if I want to parse out the "1" at the end of the above string, what would be the quickest way?
Sorry, I am very new to shell scripting and can't yet make full use of this powerful language.
Thanks!

Split on slashes, select the last field, then split that on colons and select the first result:
ls -ld /sys/block/sd*/device | awk -F'/' '{ split( $NF, arr, /:/ ); print arr[1] }'
It yields:
1
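If you need this for several drives at once, here is a minimal sketch using readlink and parameter expansion (it assumes the same sysfs layout and sd* naming as in the question; the N:N:N:N target format is taken from the example above):

#!/bin/bash
# Print each sd* block device together with the host number it is attached to.
for dev in /sys/block/sd*; do
    target=$(readlink -f "$dev/device")   # resolves to a path ending in e.g. 1:0:0:0
    scsi_addr=${target##*/}               # last path component: 1:0:0:0
    host=${scsi_addr%%:*}                 # everything before the first colon: 1
    printf '%s -> host %s\n' "${dev##*/}" "$host"
done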

Try this:
$ ls -ld /sys/block/sd*/device | grep -oP '\d+(?=:\d+:\d+:\d+)'
0
2
3
or
$ printf '%s\n' /sys/block/sd*/device |
xargs readlink -f |
grep -oP '\d+(?=:\d+:\d+:\d+)'
and if you want only the first occurrence:
grep -oPm1 '\d+(?=:\d+:\d+:\d+)'
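To tie each number back to its device name, a hedged sketch (again assuming the sd* naming used above) reads each symlink individually and labels the output:

for dev in /sys/block/sd*; do
    # readlink prints the raw symlink target, e.g. ../../../1:0:0:0
    printf '%s: %s\n' "${dev##*/}" "$(readlink "$dev/device" | grep -oPm1 '\d+(?=:\d+:\d+:\d+)')"
done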

Related

Find unique URLs in a file

Situation
I have many URLs in a file, and I need to find out how many unique URLs exist.
I would like to run either a bash script or a command.
myfile.log
/home/myfiles/www/wp-content/als/xm-sf0ab5df9c1262f2130a9b313192deca4-f0ab5df9c1262f2130a9b313192deca4-c23c5fbca96e8d641d148bac41017635|https://public.rgfl.org/HS/PowerPoint%20Presentations/Health%20and%20Safety%20Law.ppt,18,17
/home/myfiles/www/wp-content/als/xm-s4bf050d47df5bfaf0486a50a8528cb16-4bf050d47df5bfaf0486a50a8528cb16-c23c5fbca96e8d641d148bac41017635|https://public.rgfl.org/HS/PowerPoint%20Presentations/Health%20and%20Safety%20Law.ppt,15,14
/home/myfiles/www/wp-content/als/xm-sad122bf22152ba4823a520cc2fe59f40-ad122bf22152ba4823a520cc2fe59f40-c23c5fbca96e8d641d148bac41017635|https://public.rgfl.org/HS/PowerPoint%20Presentations/Health%20and%20Safety%20Law.ppt,17,16
/home/myfiles/www/wp-content/als/xm-s3c0f031eebceb0fd5c4334ecef15292d-3c0f031eebceb0fd5c4334ecef15292d-c23c5fbca96e8d641d148bac41017635|https://public.rgfl.org/HS/PowerPoint%20Presentations/Health%20and%20Safety%20Law.ppt,12,11
/home/myfiles/www/wp-content/als/xm-sff661e8c3b4f94957926d5434d0ad549-ff661e8c3b4f94957926d5434d0ad549-c23c5fbca96e8d641d148bac41017635|https://quality.gha.org/Portals/2/documents/HEN/Meetings/nursesinstitute/062013/nursesroleineliminatingharm_moddydunning.pptx,17,16
/home/myfiles/www/wp-content/als/xm-s32c41ec2a5440ad220008b9abfe9add2-32c41ec2a5440ad220008b9abfe9add2-c23c5fbca96e8d641d148bac41017635|https://quality.gha.org/Portals/2/documents/HEN/Meetings/nursesinstitute/062013/nursesroleineliminatingharm_moddydunning.pptx,19,18
/home/myfiles/www/wp-content/als/xm-s28787ca2f4372ddb3616d3fd53c161ab-28787ca2f4372ddb3616d3fd53c161ab-c23c5fbca96e8d641d148bac41017635|https://quality.gha.org/Portals/2/documents/HEN/Meetings/nursesinstitute/062013/nursesroleineliminatingharm_moddydunning.pptx,22,21
/home/myfiles/www/wp-content/als/xm-s89a7b68158e38391da9f0de1e636c0d5-89a7b68158e38391da9f0de1e636c0d5-c23c5fbca96e8d641d148bac41017635|https://quality.gha.org/Portals/2/documents/HEN/Meetings/nursesinstitute/062013/nursesroleineliminatingharm_moddydunning.pptx,13,12
/home/myfiles/www/wp-content/als/xm-sc4b14e10f6151995f21334061ff1d139-c4b14e10f6151995f21334061ff1d139-c23c5fbca96e8d641d148bac41017635|https://royalmechanical.files.wordpress.com/2011/06/hy-wire-car-2.pptx,13,12
/home/myfiles/www/wp-content/als/xm-se589d47d163e43fa0c0d68e824e2c286-e589d47d163e43fa0c0d68e824e2c286-c23c5fbca96e8d641d148bac41017635|https://royalmechanical.files.wordpress.com/2011/06/hy-wire-car-2.pptx,19,18
/home/myfiles/www/wp-content/als/xm-s52f897a623c539d09bfb988bfb153888-52f897a623c539d09bfb988bfb153888-c23c5fbca96e8d641d148bac41017635|https://royalmechanical.files.wordpress.com/2011/06/hy-wire-car-2.pptx,14,13
/home/myfiles/www/wp-content/als/xm-sccf27a904c5b88e96a3522b2e1180fed-ccf27a904c5b88e96a3522b2e1180fed-c23c5fbca96e8d641d148bac41017635|https://royalmechanical.files.wordpress.com/2011/06/hy-wire-car-2.pptx,18,17
/home/myfiles/www/wp-content/als/xm-s6874bf9d589708764dab754e5af06ddf-6874bf9d589708764dab754e5af06ddf-c23c5fbca96e8d641d148bac41017635|https://royalmechanical.files.wordpress.com/2011/06/hy-wire-car-2.pptx,17,16
/home/myfiles/www/wp-content/als/xm-s46c55ec8387dbdedd7a83b3ad541cdc1-46c55ec8387dbdedd7a83b3ad541cdc1-c23c5fbca96e8d641d148bac41017635|https://royalmechanical.files.wordpress.com/2011/06/hy-wire-car-2.pptx,19,18
/home/myfiles/www/wp-content/als/xm-s08cfdc15f5935b947bbaa93c7193d496-08cfdc15f5935b947bbaa93c7193d496-c23c5fbca96e8d641d148bac41017635|https://royalmechanical.files.wordpress.com/2011/06/hydro-power-plant.ppt,9,8
/home/myfiles/www/wp-content/als/xm-s86e267bd359c12de262c0279cee0c941-86e267bd359c12de262c0279cee0c941-c23c5fbca96e8d641d148bac41017635|https://royalmechanical.files.wordpress.com/2011/06/hydro-power-plant.ppt,15,14
/home/myfiles/www/wp-content/als/xm-s5aa60354d134b87842918d760ec8bc30-5aa60354d134b87842918d760ec8bc30-c23c5fbca96e8d641d148bac41017635|https://royalmechanical.files.wordpress.com/2011/06/hydro-power-plant.ppt,14,13
Desired Result:
Unique Urls: 4
cut -d "|" -f 2 file | cut -d "," -f 1 | sort -u | wc -l
Output:
4
See: man cut, man sort
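If you want the output in exactly the Desired Result format, a small wrapper around the same pipeline (nothing beyond the commands already shown) will do:

echo "Unique Urls: $(cut -d "|" -f 2 file | cut -d "," -f 1 | sort -u | wc -l)"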
An awk solution would be
awk '{sub(/^[^|]*\|/,"");gsub(/,[^,]*/,"");i+=a[$0]++?0:1}END{print i}' file
4
If you happen to use GNU awk, the following gives the same result:
awk '{i+=a[gensub(/.*(http[^,]*).*/,"\\1",1)]++?0:1}END{print i}' file
4
Or, even shorter, as pointed out in this cracking comment by @cyrus:
awk -F '[|,]' '{i+=!a[$2]++} END{print i}' file
4
which uses awk's multiple-field-separator functionality and is more idiomatic awk.
Note: See the awk manual for more info.
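For readability, here is that one-liner spread over several lines with comments (same logic, same placeholder file name as above):

awk -F '[|,]' '
    # With both | and , as field separators, $2 is the URL.
    # a[$2]++ is 0 (false) the first time a URL is seen, so !a[$2]++ adds 1
    # to i exactly once per distinct URL.
    { i += !a[$2]++ }
    END { print i }
' file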
Parse with sed and, since the file appears to be already sorted with respect to URLs, just run uniq and count:
echo Unique URLs: $(sed 's/^.*|\([^,]*\),.*$/\1/' file | uniq | wc -l)
Use GNU grep to extract URLs:
echo Unique URLs: $(grep -o 'ht[^|,]*' file | uniq | wc -l)
Output (either method):
Unique URLs: 4
tr , '|' < myfile.log | sort -u -t '|' -k 2,2 | wc -l
tr , '|' < myfile.log translates all commas into pipe characters
sort -u -t '|' -k 2,2 sorts unique (-u), pipe delimited (-t '|'), in the second field only (-k 2,2)
wc -l counts the unique lines

How to remove all but the last 3 parts of FQDN?

I have a list of IP lookups and I wish to remove all but the last 3 parts, so:
98.254.237.114.broad.lyg.js.dynamic.163data.com.cn
would become
163data.com.cn
I have spent hours searching for clues, including parameter substitution, but the closest I got was:
$ string="98.254.237.114.broad.lyg.js.dynamic.163data.com.cn"
$ string1=${string%.*.*.*}
$ echo $string1
Which gives me the inverted answer of:
98.254.237.114.broad.lyg.js.dynamic
which is everything but the last 3 parts.
A script to do a list would be better than just the static example I have here.
Using CentOS 6, I don't mind if it by using sed, cut, awk, whatever.
Any help appreciated.
Thanks; now that I have working answers, may I ask a follow-up: how would I then process the resulting list so that if the last part (after the last '.') is 3 characters (e.g. .com, .net), only the last 2 parts are kept?
If this is against protocol, please advise how to post a follow-up question.
If parameter expansion inside another parameter expansion is supported, you can use this:
$ s='98.254.237.114.broad.lyg.js.dynamic.163data.com.cn'
$ # removing last three fields
$ echo "${s%.*.*.*}"
98.254.237.114.broad.lyg.js.dynamic
$ # pass output of ${s%.*.*.*} plus the extra . to be removed
$ echo "${s#${s%.*.*.*}.}"
163data.com.cn
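Since you want to process a whole list rather than a single string, a minimal sketch wrapping the same expansion in a read loop (ip.txt is the sample file used further below) would be:

while IFS= read -r s; do
    # strip everything up to and including the dot before the last three fields
    echo "${s#"${s%.*.*.*}".}"
done < ip.txt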
You can also reverse the line, select the required fields with cut, and reverse again; this makes it easier to change the number of fields kept:
$ echo "$s" | rev | cut -d. -f1-3 | rev
163data.com.cn
$ echo "$s" | rev | cut -d. -f1-4 | rev
dynamic.163data.com.cn
$ # and easy to use with file input
$ cat ip.txt
98.254.237.114.broad.lyg.js.dynamic.163data.com.cn
foo.bar.123.baz.xyz
a.b.c.d.e.f
$ rev ip.txt | cut -d. -f1-3 | rev
163data.com.cn
123.baz.xyz
d.e.f
echo $string | awk -F. '{ if (NF == 2) { print $0 } else { print $(NF-2)"."$(NF-1)"."$NF } }'
NF is the total number of fields separated by ".", so we want the last field ($NF), the last but one ($(NF-1)), and the last but two ($(NF-2)).
$ echo $string | awk -F'.' '{printf "%s.%s.%s\n",$(NF-2),$(NF-1),$NF}'
163data.com.cn
Brief explanation:
Set the field separator to .
Print only the last 3 fields using $(NF-2), $(NF-1), and $NF.
There's also another option you may try:
$ echo $string | awk -v FPAT='[^.]+.[^.]+.[^.]+$' '{print $NF}'
163data.com.cn
It sounds like this is what you need:
awk -F'.' '{sub("([^.]+[.]){"NF-3"}","")}1'
e.g.
$ echo "$string" | awk -F'.' '{sub("([^.]+[.]){"NF-3"}","")}1'
163data.com.cn
but with just 1 sample input/output it's just a guess.
Regarding your follow-up question, this might be what you're asking for:
$ echo "$string" | awk -F'.' '{n=(length($NF)==3?2:3); sub("([^.]+[.]){"NF-n"}","")}1'
163data.com.cn
$ echo 'www.google.com' | awk -F'.' '{n=(length($NF)==3?2:3); sub("([^.]+[.]){"NF-n"}","")}1'
google.com
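If you prefer to keep the follow-up logic in parameter expansion instead, here is a sketch only; it assumes a plain list of names, one per line, such as the ip.txt sample above, and the last_parts helper name is just for illustration:

last_parts() {
    local s="$1"
    local tld="${s##*.}"
    if [ "${#tld}" -eq 3 ]; then
        # 3-character TLD such as .com or .net: keep only the last 2 parts
        echo "${s#"${s%.*.*}".}"
    else
        # otherwise keep the last 3 parts
        echo "${s#"${s%.*.*.*}".}"
    fi
}

while IFS= read -r line; do last_parts "$line"; done < ip.txt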
A version which uses expr:
echo $(expr "$string" : '.*\.\(.*\..*\..*\)')
To use it with a file you can iterate with xargs:
File:
head list.dat
98.254.237.114.broad.lyg.js.dynamic.163data.com.cn
98.254.34.56.broad.kkk.76onepi.co.cn
98.254.237.114.polst.a65dal.com.cn
Iterating over the whole file:
cat list.dat | xargs -I^ -L1 expr "^" : '.*\.\(.*\..*\..*\)'
Note: it won't be very efficient at large scale, so consider for yourself whether it is good enough for your use case.
Regexp explanation:
.*\.        matches the leading part, up to and including the final dot before the last three words; it is outside the brackets, so we are not interested in it
\( ... \)   the escaped brackets mark which part we extract
.*\..*\..*  inside the brackets, the \. separate the specific number of words (here, three) that we keep
Details: http://tldp.org/LDP/abs/html/string-manipulation.html (Substring Extraction)

One-liner file monitoring

I have a logfile continuously filling with stuff.
I wish to monitor this file, grep for a specific line and then extract and use parts of that line in a curl command.
I had a look at How to grep and execute a command (for every match)
This would work in a script, but I wonder if it is possible to achieve it with the one-liner below, using xargs or something else?
Example:
Tue May 01|23:59:11.012|I|22|Event to process : [imsi=242010800195809, eventId = 242010800195809112112, msisdn=4798818181, inbound=false, homeMCC=242, homeMNC=01, visitedMCC=238, visitedMNC=01, timestamp=Tue May 12 11:21:12 CEST 2015,hlr=null,vlr=4540150021, msc=4540150021 eventtype=S, currentMCC=null, currentMNC=null teleSvcInfo=null camelPhases=null serviceKey=null gprsenabled= false APNlist: null SGSN: null]|com.uws.wsms2.EventProcessor|processEvent|139
Extract the fields I want and semi-colon separate them:
tail -f file.log | grep "Event to process" | awk -F'=' '{print $2";"$4";"$12}' | tr -cd '[[:digit:].\n.;]'
Curl command, e.g. something like:
http://user:pass@www.some-url.com/services/myservice?msisdn=...&imsi=...&vlr=...
Thanks!
Try this:
tail -f file.log | grep "Event to process" | awk -F'=' '{print $2" "$4" "$12; }' | tr -cd '[[:digit:].\n. ]' |while read msisdn imsi vlr ; do curl "http://user:pass@www.some-url.com/services/myservice?msisdn=$msisdn&imsi=$imsi&vlr=$vlr" ; done
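For readability (and easier editing later), the same pipeline can be spread over a short script. This is only a sketch: it keeps the field positions ($2, $4, $12) assumed above, adds --line-buffered so grep does not hold matches back in the pipe, and reuses the hypothetical endpoint URL from the question:

tail -f file.log \
  | grep --line-buffered "Event to process" \
  | awk -F'=' '{ print $2, $4, $12 }' \
  | tr -cd '[:digit:]. \n' \
  | while read -r msisdn imsi vlr; do
      curl "http://user:pass@www.some-url.com/services/myservice?msisdn=$msisdn&imsi=$imsi&vlr=$vlr"
    done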

Bind two files by column in bash

When I have two files, such as file A
012
658
458
895
235
and file B
1
2
3
4
5
how could they be joined in bash? The output should just be
1012
2658
3458
4895
5235
Really, I just want to bind by column, as with cbind in R.
Assuming both files have the same number of lines, you can use the paste command:
paste --delimiters='' fileB fileA
The default delimiter for paste is TAB, so '' makes sure no delimiter is inserted.
Like this maybe:
paste -d'\0' B A
Or, if you like awk:
awk 'FNR==NR{A[FNR]=$0;next} {print $0,A[FNR]}' OFS='' A B
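The FNR==NR idiom in that one-liner is worth spelling out; here it is as a commented script with the same behavior (file names A and B as above):

awk '
    # While reading the first file (A), FNR == NR: remember each line by its
    # line number and skip to the next record.
    FNR == NR { a[FNR] = $0; next }
    # While reading the second file (B), print the B line immediately followed
    # by the corresponding A line (OFS is set to the empty string below).
    { print $0, a[FNR] }
' OFS='' A B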
Using pure Bash and no external commands:
while read -u 3 A && read -u 4 B; do
    echo "${B}${A}"
done 3< File_A.txt 4< File_B.txt
grep "run complete" *.err | awk -F: '{print $1}'|sort > a
ls ../bam/*bam | grep -v temp | awk -F[/_] '{print $3".err"}' | sort > b
diff <(grep "run complete" *.err | awk -F: '{print $1}'|sort) <(ls ../bam/*bam | grep -v temp | awk -F[/_] '{print $3".err"}' )
paste a b

Sorting output with awk, and formatting it

I'm trying to format the output of ls -la to only contain files modified in December and output them nicely; this is what they currently look like:
ls -la | awk {'print $6,$7,$8,$9,$10'} | grep "Dec" | sort -r | head -5
Dec 4 20:15 folder/
Dec 4 19:51 ./
Dec 4 17:42 Folder\ John/
Dec 4 16:19 Homework\ MAT\ 08/
Dec 4 16:05 Folder\ Smith/
etc..
How can I set up something like a regular expression to not include entries like "./" and "../"?
Also, how can I omit the backslash "\" for folders that have spaces in their names? I'd also like to drop the "/" at the end. Is this possible with a shell command, or would I have to use Perl to modify the text? I do want the date and time to remain as-is. Any help would be greatly appreciated!
The box has linux and this is being done via SSH.
Edit:
Here's what I have so far (thanks to Mark and gbacon for this):
ls -laF | grep -vE ' ..?/?$' | awk '{ for (i=6; i<=NF; i++) printf("%s ", $i); printf("\n"); } ' | grep "Dec" | sort -r | head -5
I'm just having trouble with replacing "\ " with just a space " ". Other than that, thanks for all the help up to this point!
You can use find to do most of the work for you:
find -mindepth 1 -maxdepth 1 -printf "%Tb %Td %TH:%TM %f\n" | grep "^Dec" | sort -r
The parent directory (..) is not included by default, and -mindepth 1 gets rid of the current directory (.). You can remove the -maxdepth 1 to make it recursive, but you should then change the %f to %p to include the path with the filename (a recursive variant is sketched after the field list below).
These are the fields in the -printf:
%Tb - short month name
%Td - day of the month
%TH:%TM - hours and minutes
%f - filename
In the grep I've added a match for the beginning of the line so it won't match a file named "Decimal" that was modified in November, for example.
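Following the note about recursion above, a hedged sketch of the recursive variant (it just applies the %p substitution described there; adjust the month as needed):

find . -mindepth 1 -printf "%Tb %Td %TH:%TM %p\n" | grep "^Dec" | sort -r | head -5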
Check and make sure your 'ls' command isn't aliased to something else. Typically, "raw" ls doesn't give you the / for directories, nor should it be escaping the spaces.
Clearly something is escaping the spaces for you; since awk breaks fields up on whitespace, that's where the \ characters come from.
Spaces in file names seem designed specifically to frustrate easy script-and-pipe mashups like the one you're trying to build here.
You could filter the output of ls:
ls -la | grep -vE ' ..?/?$' | awk {'print $6,$7,$8,$9,$10'} | grep "Dec" | sort -r | head -5
If you're content to use Perl:
ls -la | perl -lane 's/\\ / /g;
print "#F[5..9]"
if $F[8] !~ m!^..?/?$! &&
$F[5] eq "Dec"'
Here's one of your answers:
How can I set up something like a regular expression to not include things like "./" and "../",
Use ls -lA instead of ls -la.
Instead of printing out a fixed number of columns, you can print out everything from column 6 to the end of the line:
ls -lA | awk '{ for (i=6; i<=NF; i++) printf("%s ", $i); printf("\n"); } '
I don't get the spaces backslashed, so I don't know why you are getting that. To fix it you could add this:
| sed 's/\\//g'
What's with all the greps and seds?
ls -laF | awk '!/\.\.\/$/ && !/\.\/$/ &&/Dec/ { for (i=6; i<=NF; i++) printf("%s ", $i); printf("\n"); }'
Well, you can drop . and .. by adding grep -v "\." | grep -v "\.\.", but I'm not sure about the rest.
It really irks me to see pipelines with awk and grep/sed. Awk is a very powerful line-processing tool.
ls -laF | awk '
    / \.\.?\/$/ {next}
    / Dec / {for (i=1; i<=5; i++) $i = ""; print}
' | sort -r | head -5
