I am trying to use awk to get the name of a file given the absolute path to the file.
For example, when given the input path /home/parent/child/filename I would like to get filename
I have tried:
awk -F "/" '{print $5}' input
which works perfectly.
However, I am hard coding $5 which would be incorrect if my input has the following structure:
So a generic solution requires always taking the last field (which will be the filename).
Is there a simple way to do this with the awk substr function?

Use the fact that awk splits each line into fields based on a field separator that you can define. Hence, setting the field separator to / you can say:
awk -F "/" '{print $NF}' input
Since NF is the number of fields in the current record, printing $NF prints the last one.
So given a file like this:
This would be the output:
$ awk -F"/" '{print $NF}' file
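A minimal sketch of the same idea, wrapped in a helper so it can be tried on paths of different depth. The function name last_field and the sample paths are only illustrative:

```shell
# last_field: read paths on stdin and print the last "/"-separated field of each.
# NF changes from line to line, so $NF always points at the final component.
last_field() {
  awk -F/ '{ print $NF }'
}

printf '%s\n' \
  /home/parent/child/filename \
  /home/parent/child1/child2/filename | last_field
```

Both input lines should yield filename, even though the first has five fields and the second has six.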

In this case it is better to use basename instead of awk:
$ basename /home/parent/child1/child2/filename

If you're open to a Perl solution, here is one similar to fedorqui's awk solution:
perl -F/ -lane 'print $F[-1]' input
-F/ specifies / as the field separator
$F[-1] is the last element in the @F autosplit array

Another option is to use bash parameter substitution.
$ foo="/home/parent/child/filename"
$ echo ${foo##*/}
$ foo="/home/parent/child/child2/filename"
$ echo ${foo##*/}

I know I'm about 5 years late; thanks for all the proposals. I used to do it this way:
$ echo /home/parent/child1/child2/filename | rev | cut -d '/' -f1 | rev
Glad to see there are better ways.

This should be a comment on the basename answer, but I don't have enough reputation points.
Without double quotes, basename does not work on a path that contains a space character:
$ basename /home/foo/bar foo/bar.png
With double quotes it works:
$ basename "/home/foo/bar foo/bar.png"
file example
$ cat a
/home/parent/child 1/child 2/child 3/filename1
/home/parent/child 1/child2/filename2
$ while read b ; do basename "$b" ; done < a
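A slightly more defensive variant of the loop above, as a sketch: IFS= and -r keep leading blanks and backslashes in each line intact, and the extra test handles a final line without a trailing newline. The function name print_basenames is made up for this example:

```shell
# print_basenames FILE: print the last path component of every line in FILE.
print_basenames() {
  while IFS= read -r path || [ -n "$path" ]; do
    # The quotes keep a path with spaces as a single argument.
    basename "$path"
  done < "$1"
}
```

Called as print_basenames a on the example file, it should print filename1 and filename2.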

I know I'm about 3 years late on this, but...
You should consider parameter expansion; it's built into the shell and faster than spawning an external process.
If your input is in a variable, say $var1, just use ${var1##*/}. See below:
$ var1='/home/parent/child1/filename'
$ echo ${var1##*/}
$ var1='/home/parent/child1/child2/filename'
$ echo ${var1##*/}
$ var1='/home/parent/child1/child2/child3/filename'
$ echo ${var1##*/}
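As a small sketch, the expansion can be wrapped in a function (path_tail is a hypothetical name). ${1##*/} removes the longest prefix that matches */, that is, everything up to and including the last slash:

```shell
# path_tail PATH: print whatever follows the last "/" in PATH.
path_tail() {
  printf '%s\n' "${1##*/}"
}

path_tail '/home/parent/child1/child2/filename'   # prints: filename
path_tail '/home/foo/bar foo/bar.png'             # prints: bar.png
```

One difference from basename worth knowing: for a path with a trailing slash such as /home/parent/ the expansion yields an empty string, whereas basename prints parent.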

you can skip all of that complex regex :
echo '/home/parent/child1/child2/filename' |
mawk '$!_=$-_=$NF' FS='[/]'
2nd to last :
mawk '$!--NF=$NF' FS='/'
3rd last field :
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$--NF' FS='[/]'
4th-last :
mawk '$!--NF=$(--NF-!-FS)' FS='/'
echo '/home/parent/child000/child00/child0/child1/child2/filename' |
echo '/home/parent/child1/child2/filename'
major caveat :
- `gawk/nawk` has a slight discrepancy with `mawk` regarding
- how it tracks multiple,
- and potentially conflicting, decrements to `NF`,
- so other than the 1st solution regarding last field,
- the rest for now, are only applicable to `mawk-1/2`

just realized it's much much cleaner this way in mawk/gawk/nawk :
echo '/home/parent/child1/child2/filename' | …
awk ++NF FS='.+/' OFS= # updated such that
# root "/" still gets printed

You can also use:
sed -n 's/.*\/\([^\/]\{1,\}\)$/\1/p'
sed -n 's/.*\/\([^\/]*\)$/\1/p'


Reformat date in text file (.csv) with sed and date

This is the input .csv file
"item1","10/11/2017 2:10pm",1,2, ...
"item2","10/12/2017 3:10pm",3,4, ...
Now, I want to convert the second column (date) to this specific format
date -d '10/12/2017 2:10pm' +'%Y/%m/%d %H:%M:%S', so that "10/12/2017 2:10pm" converts to "2017/10/12 14:10:00"
Expecting output file
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
I know it can be done by using bash or python, but I want to do it in one-line command. Any ideas? Is there a way to pass date result to sed?
One-liner awk approach.
awk -F',' '{gsub(/"/,"",$2); cmd="date -d\""$2"\" +\\\"%Y/%m/%d\\ %T\\\"";
cmd |getline $2; close(cmd) }1' OFS=, infile #>>outfile
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
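This prints the result to your terminal; redirect the output to a file if you want to keep it. Alternatively, print to FILENAME to write the output back to the input file itself. Be aware that this overwrites the file while awk is still reading it, which can lose data on larger files, so keep a backup.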
awk -F',' '{gsub(/"/,"",$2); cmd="date -d\""$2"\" +\\\"%Y/%m/%d\\ %T\\\"";
cmd |getline $2; close(cmd); print >FILENAME }' OFS=, infile
Or use GNU awk, which supports in-place editing via -i inplace; see 'awk' save modifications in place.
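If spawning date once per line turns out to be too slow, a different technique is to do the conversion in awk itself. The sketch below is not the answer's method: it assumes every date has exactly the form M/D/YYYY h:mmam or h:mmpm, that no quoted field contains a comma, and that there is no header line. The function name convert_dates is made up:

```shell
# convert_dates: read the CSV on stdin and rewrite column 2
# from "MM/DD/YYYY h:mm(am|pm)" to "YYYY/MM/DD HH:MM:00".
convert_dates() {
  awk -F, -v OFS=, '{
    d = $2
    gsub(/"/, "", d)                     # drop the surrounding quotes
    split(d, p, "[/ :]")                 # p[1]=MM p[2]=DD p[3]=YYYY p[4]=h p[5]=mm+am/pm
    hour = p[4] + 0
    min  = substr(p[5], 1, 2) + 0
    ampm = tolower(substr(p[5], 3))
    if (ampm == "pm" && hour < 12) hour += 12
    if (ampm == "am" && hour == 12) hour = 0
    $2 = sprintf("\"%04d/%02d/%02d %02d:%02d:00\"", p[3], p[1], p[2], hour, min)
    print
  }'
}
```

Usage would be convert_dates < infile > outfile. Anything outside the assumed date format is not validated, so date -d remains the more general tool.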
You can do it in one line, but that begs the question -- "How long of a line do you want?" Since you have it labeled 'shell' and not bash, etc., you are a bit limited in your string handling. POSIX shell provides enough to do what you want, but it isn't the speediest remedy. You are either going to end up with an awk or sed solution that calls date or a shell solution that calls awk or sed to parse old date from the original file and feeds the result to date to get your new date. You will have to work out which provides the most efficient remedy.
As far as the one-liner goes, you can do something similar to the following while remaining POSIX compliant. It simply uses awk to get the 2nd field from the file, pipes the result to a while loop which uses expr length "$field" to get the length and uses that within expr substr "$field" "2" <length expression - 2> to chop the double-quotes from the end of the original date olddt, followed by date -d "$olddt" +'%Y/%m/%d %H:%M:%S' to get newdt and finally sed -i "s;$olddt;$newdt;" to perform the substitution in place. Your one-liner (shown with auto line-continuations for readability)
$ awk -F, '{print $2}' timefile.txt |
while read -r field; do
olddt="$(expr substr "$field" "2" "$(($(expr length "$field") - 2))")";
newdt=$(date -d "$olddt" +'%Y/%m/%d %H:%M:%S');
sed -i "s;$olddt;$newdt;" timefile.txt; done
Example Input File
$ cat timefile.txt
"item1","10/11/2017 2:10pm",1,2, ...
"item2","10/12/2017 3:10pm",3,4, ...
Resulting File
$ cat timefile.txt
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
There are probably faster ways to do it, but this is a reasonable length one-liner (relatively speaking).
Revised less ugly sed method:
sed 's/^.*,"\|",.*//g;h;s#.*#date "+%Y/%m/%d %T" -d "&"#e;H;g;s#\n\|$#,#g;s/^/s,/' input.csv | sed -f - input.csv
Spread out, (it works the same):
sed 's/^.*,"\|",.*//g
h
s#.*#date "+%Y/%m/%d %T" -d "&"#e
H;g
s#\n\|$#,#g
s/^/s,/' input.csv | sed -f - input.csv
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
How it works:
The first sed block uses the evaluate command to run date, the output of which is used to generate some new sed substitute commands. To show the new s commands, temporarily replace the shell script | pipe with a # comment:
s,10/11/2017 2:10pm,2017/10/11 14:10:00,
s,10/12/2017 3:10pm,2017/10/12 15:10:00,
These are piped to the second sed.

awk: Preserve multiple field separators

I'm using awk to swap fields in a filename using two different field separators.
I want to know if it's possible to preserve both separators, '/' and '_', in the correct positions in the output.
I want to change this:
into this:
I've tried:
awk -F "[/_]" '{ t=$3; $3=$4; $4=t;print}' file.txt
but the field separators are missing from the output:
path to file example 123.txt
I've tried preserving the field separators:
awk -F "[/_]" '{t=$3; $3=$4; $4=t; OFS=FS; print}' file.txt
but I get this:
Is there a way of preserving the correct original field separator in awk when you're dealing with multiple separators?
Here is one solution:
awk -F/ '{n=split($NF,a,"_");b=a[1];a[1]=a[2];a[2]=b;$NF=a[1];for (i=2;i<=n;i++) $NF=$NF"_"a[i]}1' OFS=/ file
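The one-liner above can be spread out as a sketch for readability: split only on / at the top level so the path separators survive, then split the last component on _ and rejoin it by hand. The function name swap_name_parts and the sample path are assumptions, since the original example input is not shown:

```shell
# swap_name_parts: swap the first two "_"-separated parts of the file name,
# leaving the "/"-separated directory part untouched.
swap_name_parts() {
  awk -F/ -v OFS=/ '{
    n = split($NF, part, "_")
    if (n >= 2) { tmp = part[1]; part[1] = part[2]; part[2] = tmp }
    name = part[1]
    for (i = 2; i <= n; i++) name = name "_" part[i]
    $NF = name          # assigning a field rebuilds $0 using OFS="/"
    print
  }'
}

echo 'path/to/file_example_123.txt' | swap_name_parts
```

For the hypothetical input above this should print path/to/example_file_123.txt.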
You can always use Perl.
$ echo $e
$ echo $e | perl -ple 's/([^_\/]+)_([^_\/]+)/\2_\1/'
$ cat /tmp/1
$ awk -F'_' '{split($1,a,".*/"); gsub(a[2],"",$1);print $1$2"_"a[2]"_"$3}' /tmp/1

how to extract string appears after one particular string in Shell

I am working on a script where I am grepping lines that contains -abc_1.
I need to extract string that appear just after this string as follow :
option : -abc_1 <some_path>
I have used following code :
grep "abc_1" | awk -F " " '{print $4}'
This code is failing if there are more spaces used between string , e.g :
option : -abc_1 <some_path>
It will be helpful if I can extract the path somehow without bothering of spaces.
This should do:
echo 'option : -abc_1 <some_path>' | awk '/abc_1/ {print $4}'
If you do not specify a field separator, awk uses one or more blanks as the separator.
PS you do not need both grep and awk
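A quick way to see the default splitting at work, as a sketch (extract_path and the /tmp paths are made up for illustration): the amount of whitespace between the words does not change the field numbering.

```shell
# extract_path: print the 4th whitespace-separated field of lines containing -abc_1.
extract_path() {
  awk '/-abc_1/ { print $4 }'
}

printf '%s\n' \
  'option : -abc_1 /tmp/a' \
  'option   :    -abc_1      /tmp/b' | extract_path
```

Both lines should print their path, /tmp/a and /tmp/b, despite the different spacing.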
With sed you can do the search and the filter in one step:
sed -n 's/^.*-abc_1 *\([^ ]*\).*$/\1/p'
The -n option suppresses printing, but the p command at the end still prints if a successful substitution was made.
perl -lne ' print $1 if(/-abc_1 (.*)/)' your_file
Or if you want to use awk:
awk '{for(i=1;i<=NF;i++)if($i=="-abc_1")print $(i+1)}' your_file
try this grep only way:
grep -Po '^option\s*:\s*-abc_1\s*\K.*' file
or if the white spaces were fixed:
grep -Po '^option : -abc_1 \K.*' file

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns I want and append them to a new file
It works but I want a better way to do this. I specifically want to find an alternative method for exploding each line into an array. Using command line arguments doesn't seem like the way to go. ANY COMMENTS ARE WELCOME.
# Takes :: separated file as 1st parameters
# create csv target file
touch $TARGET
echo "#userId,itemId" > "$TARGET"
while read LINE
# Replaces all matches of :: with a ,
set -- $CSV_LINE
echo "$1,$2" >> $TARGET
done < $SOURCE
Instead of set, you can use an array:
echo "${arr[0]},${arr[1]}"
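The line that fills arr is not shown above. One possible way to do it in bash, as a sketch: replace :: with a single-character delimiter and let read -a split on it. This assumes bash (arrays and here-strings are not POSIX sh) and that no field contains a comma; split_line and the sample record are hypothetical:

```shell
# split_line LINE: split LINE on "::" into the global array arr (bash only).
split_line() {
  local line=$1
  # Turn "::" into "," and let read split on "," for this one command only.
  IFS=, read -r -a arr <<< "${line//::/,}"
}

split_line '1::1193::5::978300760'
echo "${arr[0]},${arr[1]}"   # prints: 1,1193
```

Because IFS is set only as a prefix to read, the shell's global word splitting is left unchanged.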
The following prints columns 1 and 2 from infile.dat. Replace $1, $2 with
a comma-separated list of the numbered columns you do want.
awk 'BEGIN { FS="::"; OFS=","; } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a 1 liner to do it.
Awk can probably do it easily too.
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, -v OFS=, '{print $1, $2}'
# Or to avoid a UUOC award (and prolong the life of your keyboard by 3 characters)
sed -e 's/::/,/g' inputfile | awk -F, -v OFS=, '{print $1, $2}'
awk is indeed the right tool for the job here, it's a simple one-liner.
$ cat
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}'
$ cat altfile

Only get hash value using md5sum (without filename)

I use md5sum to generate a hash value for a file.
But I only need to receive the hash value, not the file name.
md5=`md5sum ${my_iso_file}`
echo ${md5}
3abb17b66815bc7946cefe727737d295 ./iso/somefile.iso
How can I 'strip' the file name and only retain the value?
A simple array assignment works... Note that the first element of a Bash array can be addressed by just the name without the [0] index, i.e., $md5 contains only the 32 characters of md5sum.
md5=($(md5sum file))
echo $md5
# 53c8fdfcbb60cf8e1a1ee90601cc8fe2
Using AWK:
md5=`md5sum ${my_iso_file} | awk '{ print $1 }'`
You can use cut to split the line on spaces and return only the first such field:
md5=$(md5sum "$my_iso_file" | cut -d ' ' -f 1)
On Mac OS X:
md5 -q file
md5="$(md5sum "${my_iso_file}")"
md5="${md5%% *}" # remove the first space and everything after it
echo "${md5}"
Another way is to do:
md5sum filename | cut -f 1 -d " "
cut will split the line to each space and return only the first field.
By leaning on head:
md5_for_file=`md5sum ${my_iso_file}|head -c 32`
One way:
set -- $(md5sum $file)
Another way:
md5=$(md5sum $file | while read sum file; do echo $sum; done)
Another way:
md5=$(set -- $(md5sum $file); echo $1)
(Do not try that with backticks unless you're very brave and very good with backslashes.)
The advantage of these solutions over other solutions is that they only invoke md5sum and the shell, rather than other programs such as awk or sed. Whether that actually matters is then a separate question; you'd probably be hard pressed to notice the difference.
If you need to print it and don't need a newline, you can use:
printf $(md5sum filename)
md5=$(md5sum < $file | tr -d ' -')
md5=`md5sum ${my_iso_file} | cut -b-32`
md5sum puts a backslash before the hash if there is a backslash in the file name. The first 32 characters or anything before the first space may not be a proper hash.
It will not happen when using standard input (file name will be just -), so pixelbeat's answer will work, but many others will require adding something like | tail -c 32.
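Putting the two observations together, a sketch that feeds the file on standard input (so the file name never reaches md5sum and no backslash prefix can appear) and then trims with parameter expansion. The function name hash_only is made up:

```shell
# hash_only FILE: print only the MD5 hex digest of FILE.
hash_only() {
  local sum
  # Reading from stdin makes md5sum report "-" instead of a file name.
  sum=$(md5sum < "$1") || return 1
  # Drop the first space and everything after it.
  printf '%s\n' "${sum%% *}"
}
```

Usage would be md5=$(hash_only "$my_iso_file"). No process other than md5sum is started.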
if you're concerned about screwy filenames :
md5sum < "${file_name}" | awk NF=1
other messier ways to deal with this :
md5sum "${file_name}" | awk NF=NF OFS= FS=' .*$'
| awk '_{ exit }++_' RS=' '
to do it entirely inside awk :
mawk 'BEGIN {
__ = ARGV[ --ARGC ]
_ = sprintf("%c",(_+=(_^=_<_)+_)^_+_*++_)
( _=" md5sum < "((_)(__)_) ) | getline
print $(_*close(_)) }' "${file_name}"
Well, I had the same problem today, but I was trying to get the file MD5 hash when running the find command.
I took the most voted answer and wrapped it in a function called md5 to run from the find command. My goal was to calculate the hash for all files in a folder and output it as hash:filename.
md5() { md5sum $1 | awk '{ printf "%s",$1 }'; }
export -f md5
find -type f -exec bash -c 'md5 "$0"' {} \; -exec echo -n ':' \; -print
So, I'd got some pieces from here and also from 'find -exec' a shell function in Linux
For the sake of completeness, a way with sed using a regular expression and a capture group:
md5=$(md5sum "${my_iso_file}" | sed -r 's:\\*([^ ]*).*:\1:')
The regular expression is capturing everything in a group until a space is reached. To get a capture group working, you need to capture everything in sed.
(More about sed and capture groups here: How can I output only captured groups with sed?)
As the delimiter in sed I use colons, because they rarely appear in file paths and I then don't have to escape the slashes in the file path.
Another way:
md5=$(md5sum "${my_iso_file}" | sed 's/ .*//')
md5=$(md5sum < index.html | head -c -4)
