pair up files with matching extensions - bash

Let's say I have this list of files:
a.1
a.2
a.3
b.1
b.2
Is there a bash one-liner that could find the one 'a' file for which there is no 'b' with the same extension? (i.e. a.3: no match)
I'm sure I could write a short bash/perl script to do this.
But I would like to know if there is any "trick" for this (of course, GNU tools are at my disposal; awk, sed, find ...)

You could try this:
ls -1 [ab].* | sort -t . -k 2 | uniq -u -s2

Here is my bash one-liner
for afile in a*; do bfile=${afile/a/b}; test -f $bfile || echo $afile; done
I like the above solution since it only uses bash and no other tools. However, to showcase the power of Unix tools, the next solution uses 4: ls, sed, sort, and uniq:
ls [ab]* | sed 's/b/a/' | sort | uniq -u
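For what it's worth, here is a minimal sketch of the loop approach wrapped in a function, with the expansions quoted and the extension taken from the end of the name. The function name unmatched_a and its directory argument are made up for illustration; they are not from the answers above.

```shell
# unmatched_a DIR: print every a.EXT in DIR that has no b.EXT next to it.
# (Hypothetical helper; DIR defaults to the current directory.)
unmatched_a() {
    dir=${1:-.}
    for afile in "$dir"/a.*; do
        [ -e "$afile" ] || continue          # the glob matched nothing
        ext=${afile##*.}                     # text after the last dot
        [ -e "$dir/b.$ext" ] || printf '%s\n' "${afile##*/}"
    done
}
```

Called as `unmatched_a .` in the directory from the question, it should print only a.3.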

If you can use Perl:
perl -le 'for (<a*>) { /(.*)\.(.*)/; print "$1.$2" if !-e "b.$2"}'

bash version 4 has associative arrays, so you can do this:
declare -A a_files
while read -r filename; do
ext="${filename##*.}"
case "${filename%.*}" in
a) a_files[$ext]="$filename" ;;
b) unset a_files[$ext] ;;
esac
done < <(ls [ab].*)
echo "non-matched 'a' files: ${a_files[@]}"
Or, with awk:
ls [ab].* | awk -F. '
$1 == "a" {a_files[$2] = $0}
$1 == "b" {delete a_files[$2]}
END {for (ext in a_files) print a_files[ext]}
'

Ruby(1.9+)
$ ruby -e 'Dir["a.*"].each{|x|puts x if not File.exist? "b."+x.scan(/a\.(\d+)/)[0][0]}'

Related

Sum of file sizes with awk on a list of files

I have a list of files and want to sum over their file sizes.
So, I created a (global) variable as a counter and am trying to loop over that list, get the file size with ls, and cut and add it with
export COUNTER=1
for x in $(cat ./myfiles.lst); do ls -all $x | awk '{COUNTER+=$5}'; done
However, my counter is empty?
> echo $COUNTER
> 1
Does anyone have an idea what I am missing here?
Cheers and thanks,
Thomas
OK, I found a way piping the result from the awk pipe into a variable
(which is probably not elegant but working ;) )
for x in $(cat ./myfiles.lst); do a=$(ls -all $x |awk '{print $5}'); COUNTER=$(($COUNTER+$a)) ; done
> echo $COUNTER
> 4793061514
awk is called once for every file, and the COUNTER inside the awk program is awk's own variable, not the shell's, so the shell variable is never updated.
A better solution is:
xargs ls -l <myfiles.lst | awk '{COUNTER+=$5} END {print COUNTER}'
But you are reinventing the wheel here. You can do something like
du -s <myfiles.lst
(If you have du installed. Note: see the comments below my answer about du. I had tested this with cygwin and with that it worked like a charm.)
Shorter version of the last:
ls -l | awk '{sum += $5} END {print sum}'
Now, say you want to filter by certain types of files, age, etc... Just throw the ls -l into a find, and you can filter using find's extensive filter parameters:
find . -type f -exec ls -l {} \; | awk '{sum += $5} END {print sum}'
ls -ltS | awk -F " " '{print $5}' | awk '{s+=$1} END {print s}'
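If you would rather not parse ls output at all, here is a rough sketch that keeps the running total in the shell, which is what the question was originally trying to do. It assumes one file name per line in the list; sum_sizes is a made-up name, and wc -c is used only because it is portable (it reads each file, so it can be slow on very large files).

```shell
# sum_sizes LIST: add up the byte sizes of the files named in LIST,
# one name per line. (Hypothetical helper, not from the answers above.)
sum_sizes() {
    total=0
    while IFS= read -r f; do
        [ -f "$f" ] || continue              # skip missing entries
        size=$(wc -c < "$f")                 # byte count only, no file name
        total=$((total + size))
    done < "$1"
    printf '%s\n' "$total"
}
```

Because the loop reads from a redirection rather than a pipe, total is updated in the current shell and is still set after the loop ends.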

How to list all files and put number in front of them , using shell

I want to count all files that I have in my directory and put number in front of them, and in a new line, for example :
file.txt nextfile.txt example.txt
and the output to be :
1.file.txt
2.nextfile.txt
3.example.txt
and so on.
I am trying something with: ls -L |
You can do this if you have nl installed:
ls -1 | nl
(Note: the -1 is not needed when ls is writing to a pipe, because ls then prints one entry per line anyway. This applies to the solutions below too.)
Or with awk:
ls -1 | awk '{print NR, $0}'
Or with a single awk command:
awk 'BEGIN {for (i = 1; i < ARGC; i++) print i, ARGV[i]}' *
Or with cat:
cat -n <(ls -1)
You can do this by using shell built-in printf in a for loop:
n=1
for i in *; do
    printf "%d.%s\n" $((n++)) "$i"
done
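Here is a small sketch of the same printf loop as a function that numbers whatever arguments it is given, starting at 1 as in the question's expected output. The name number_items is made up, and the arithmetic is written in plain POSIX form so it does not depend on bash.

```shell
# number_items ARG...: print each argument prefixed with "N." from 1 up.
# (Hypothetical helper for illustration.)
number_items() {
    n=1
    for item in "$@"; do
        printf '%d.%s\n' "$n" "$item"
        n=$((n + 1))
    done
}
```

In a directory you would call it as `number_items *`, so the shell expands the file names and no ls is involved.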

sh shell script of working with for loop

I am using sh shell script to read the files of a folder and display on the screen:
for d in `ls -1 $IMAGE_DIR | egrep "jpg$"`
do
pgm_file=$IMAGE_DIR/`echo $d | sed 's/jpg$/pgm/'`
echo "file $pgm_file";
done
the output result is reading line by line:
file file1.jpg
file file2.jpg
file file3.jpg
file file4.jpg
Because I am not familiar with this language, I would like the output to print two results per row, like this:
file file1.jpg; file file2.jpg;
file file3.jpg; file file4.jpg;
In other languages, I just put d++ but it does not work with this case.
Would this be doable? I will be happy if you would provide me sample code.
thanks in advance.
Let the shell do more work for you:
end_of_line=""
for d in "$IMAGE_DIR"/*.jpg
do
file=$( basename "$d" )
printf "file %s; %s" "$file" "$end_of_line"
if [[ -z "$end_of_line" ]]; then
end_of_line=$'\n'
else
end_of_line=""
fi
pgm_file=${d%.jpg}.pgm
# do something with "$pgm_file"
done
for d in "$IMAGE_DIR"/*jpg; do
pgm_file=${d%jpg}pgm
printf '%s;\n' "$d"
done |
awk 'END {
if (ORS != RS)
print RS
}
ORS = NR % n ? FS : RS
' n=2
Set n to whatever value you need.
If you're on Solaris, use nawk or /usr/xpg4/bin/awk
(do not use /usr/bin/awk).
Note also that I'm trying to use a standard shell syntax,
given your question is sh related (i.e. you didn't mention bash or ksh,
for example).
Something like this inside the loop:
echo -n "something; "
[[ -n "$oddeven" ]] && oddeven= || { echo;oddeven=x;}
should do.
Three per line would be something like
[[ "$((n++%3))" = 0 ]] && echo
(with n=1) before entering the loop.
Why use a loop at all? How about:
ls $IMAGE_DIR | egrep 'jpg$' |
sed -e 's/$/;/' -e 's/^/file /' -e 's/jpg$/pgm/' |
perl -pe '$. % 2 && chomp'
(The perl just deletes every other newline. You may want to insert a space and add a trailing newline if the last line is an odd number.)
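Another loop-free option, sketched here with paste instead of perl (this is a swapped-in technique, not one of the answers above): paste - - reads two lines from standard input for every line it writes. The function name pair_up is made up.

```shell
# pair_up NAME...: print "file NAME;" entries, two per output line.
# (Hypothetical helper; paste joins each pair of input lines with a space.)
pair_up() {
    [ "$#" -gt 0 ] || return 0               # nothing to print
    printf 'file %s;\n' "$@" | paste -d' ' - -
}
```

For three entries per line you would use `paste -d' ' - - -`. With an odd number of names, the last line ends in a trailing space.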

results of wc as variables

I would like to use the lines coming from 'wc' as variables. For example:
echo 'foo bar' > file.txt
echo 'blah blah blah' >> file.txt
wc file.txt
2 5 23 file.txt
I would like to have something like $lines, $words and $characters associated to the values 2, 5, and 23. How can I do that in bash?
In pure bash: (no awk)
a=($(wc file.txt))
lines=${a[0]}
words=${a[1]}
chars=${a[2]}
This works by using bash's arrays. a=(1 2 3) creates an array with elements 1, 2 and 3. We can then access separate elements with the ${a[index]} syntax.
Alternative: (based on gonvaled solution)
read lines words chars filename <<< $(wc x)
Or in sh:
a=$(wc file.txt)
lines=$(echo $a|cut -d' ' -f1)
words=$(echo $a|cut -d' ' -f2)
chars=$(echo $a|cut -d' ' -f3)
There are other solutions but a simple one which I usually use is to put the output of wc in a temporary file, and then read from there:
wc file.txt > xxx
read lines words characters filename < xxx
echo "lines=$lines words=$words characters=$characters filename=$filename"
lines=2 words=5 characters=23 filename=file.txt
The advantage of this method is that you do not need to create several awk processes, one for each variable. The disadvantage is that you need a temporary file, which you should delete afterwards.
Be careful: this does not work:
wc file.txt | read lines words characters filename
The problem is that piping to read creates another process, and the variables are updated there, so they are not accessible in the calling shell.
Edit: adding solution by arnaud576875:
read lines words chars filename <<< $(wc x)
This works without writing to a file (and does not have the pipe problem). It is bash-specific.
From the bash manual:
Here Strings
A variant of here documents, the format is:
<<<word
The word is expanded and supplied to the command on its standard input.
The key is the "word is expanded" bit.
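If you need the same effect outside bash, a here-document with a command substitution in it should do much the same job as the here-string. This is only a sketch: wc_to_vars is a made-up name, and it reads with wc < file so that no file name appears in the output.

```shell
# wc_to_vars FILE: set the shell variables lines, words and chars
# from wc's output. (Hypothetical helper for illustration.)
wc_to_vars() {
    read -r lines words chars <<EOF
$(wc < "$1")
EOF
}
```

read runs in the current shell here, so the three variables are still set after the function returns.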
lines=`wc file.txt | awk '{print $1}'`
words=`wc file.txt | awk '{print $2}'`
...
you can also store the wc result somewhere first.. and then parse it.. if you're picky about performance :)
Just to add another variant --
set -- `wc file.txt`
lines=$1
words=$2
chars=$3
This obviously clobbers $* and related variables. Unlike some of the other solutions here, it is portable to other Bourne shells.
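The clobbering can be contained by doing the set -- inside a function, since a function has its own positional parameters and the caller's come back when it returns. A sketch, with wc_counts as a made-up name:

```shell
# wc_counts FILE: set lines, words and chars without touching the
# caller's "$@". (Hypothetical helper for illustration.)
wc_counts() {
    set -- $(wc < "$1")                      # unquoted on purpose: split into fields
    lines=$1
    words=$2
    chars=$3
}
```

wc prints its counts in the order lines, words, characters, which is why they are assigned from $1, $2 and $3 in that order.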
I wanted to store the number of csv files in a variable. The following worked for me:
CSV_COUNT=$(ls ./pathToSubdirectory | grep '\.csv$' | wc -l | xargs)
xargs trims the whitespace around the output of the wc command.
I ran this bash script from a different folder than the one containing the csv files, hence the pathToSubdirectory.
You can assign output to a variable by opening a sub shell:
$ x=$(wc some-file)
$ echo $x
1 6 60 some-file
Now, in order to get the separate variables, the simplest option is to use awk:
$ x=$(wc some-file | awk '{print $1}')
$ echo $x
1
declare -a result
result=( $(wc < file.txt) )
lines=${result[0]}
words=${result[1]}
characters=${result[2]}
echo "Lines: $lines, Words: $words, Characters: $characters"

How to split a string in shell and get the last field

Suppose I have the string 1:2:3:4:5 and I want to get its last field (5 in this case). How do I do that using Bash? I tried cut, but I don't know how to specify the last field with -f.
You can use string operators:
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5
This trims everything from the front until a ':', greedily.
${foo <-- from variable foo
## <-- greedy front trim
* <-- matches anything
: <-- until the last ':'
}
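For comparison, here is a short sketch of the four related trimming operators applied to the same string. The variable names last, rest, init and first are only for illustration.

```shell
foo=1:2:3:4:5

last=${foo##*:}    # longest match of "*:" removed from the front
rest=${foo#*:}     # shortest match of "*:" removed from the front
init=${foo%:*}     # shortest match of ":*" removed from the end
first=${foo%%:*}   # longest match of ":*" removed from the end

printf '%s\n' "$last" "$rest" "$init" "$first"
```

All four are standard POSIX parameter expansions, so they work in sh as well as bash.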
Another way is to reverse before and after cut:
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef
This makes it very easy to get the last but one field, or any range of fields numbered from the end.
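As a sketch of that last point, the rev/cut/rev pipeline can be wrapped in a small function that takes any cut field list, counted from the end. from_end is a made-up name, and rev is assumed to be installed (it comes with util-linux on most Linux systems).

```shell
# from_end STRING FIELDS: apply cut's -f FIELDS counting from the end
# of a colon-separated STRING. (Hypothetical helper for illustration.)
from_end() {
    printf '%s\n' "$1" | rev | cut -d: -f"$2" | rev
}
```

For example, `from_end ab:cd:ef 2` gives the last-but-one field, and `from_end ab:cd:ef 1-2` gives the last two fields in their original order.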
It's difficult to get the last field using cut, but here are some solutions in awk and perl
echo 1:2:3:4:5 | awk -F: '{print $NF}'
echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5
Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.
You could try something like this if you want to use cut:
echo "1:2:3:4:5" | cut -d ":" -f5
You can also use grep, like this:
echo " 1:2:3:4:5" | grep -o '[^:]*$'
One way:
var1="1:2:3:4:5"
var2=${var1##*:}
Another, using an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[@]: -1}
Yet another with an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[@]}
var2=${var2[$count-1]}
Using Bash (version >= 3.2) regular expressions:
var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}
$ echo "a b c d e" | tr ' ' '\n' | tail -1
e
Simply translate the delimiter into a newline and choose the last entry with tail -1.
Using sed:
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5
$ echo '' | sed 's/.*://' # => (empty)
$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c
$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c
There are many good answers here, but still I want to share this one using basename :
basename $(echo "a:b:c:d:e" | tr ':' '/')
However it will fail if there are already some '/' in your string.
If slash / is your delimiter then you just have to (and should) use basename.
It's not the best answer but it just shows how you can be creative using bash commands.
If your last field is a single character, you could do this:
a="1:2:3:4:5"
echo ${a: -1}
echo ${a:(-1)}
Check string manipulation in bash.
Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo \$${#}
0
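Here is a sketch of the same set -- idea without eval, and without leaving IFS changed afterwards. The function body is a subshell, so the IFS and set -f changes do not leak out; last_field is a made-up name.

```shell
# last_field STRING: print the last colon-separated field of STRING.
# (Hypothetical helper; the ( ... ) body runs in a subshell.)
last_field() (
    IFS=:
    set -f                                   # no glob expansion while splitting
    set -- $1                                # unquoted on purpose: split on ":"
    [ "$#" -gt 0 ] || exit 0                 # empty input, nothing to print
    shift $(($# - 1))                        # drop all but the last field
    printf '%s\n' "$1"
)
```

After the shift, the last field is simply $1, so there is no need to build the parameter name with eval.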
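printf '%s' "a:b:c:d:e" | xargs -d : -n1 | tail -1
First, xargs splits the input on ":"; -n1 means that every output line holds only one part. Then tail prints the last part. (printf is used instead of echo because the trailing newline that echo adds would become part of the last field. The -d option is specific to GNU xargs.)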
Regex matching in sed is greedy (always goes to the last occurrence), which you can use to your advantage here:
$ foo=1:2:3:4:5
$ echo ${foo} | sed "s/.*://"
5
A solution using the read builtin:
IFS=':' read -a fields <<< "1:2:3:4:5"
echo "${fields[4]}"
Or, to make it more generic:
echo "${fields[-1]}" # prints the last item
for x in `echo $str | tr ";" "\n"`; do echo $x; done
Improving on the answers from @mateusz-piotrowski and @user3133260:
echo "a:b:c:d::e:: ::" | tr ':' ' ' | xargs | tr ' ' '\n' | tail -1
First, tr ':' ' ' replaces each ':' with a space.
Then xargs trims and collapses the whitespace.
After that, tr ' ' '\n' replaces the remaining spaces with newlines.
Lastly, tail -1 gets the last string.
For those who are comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice to solve this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'
From the pythonpy help: -x treat each row of stdin as x.
With that tool, it is easy to write python code that gets applied to the input.
Edit (Dec 2020):
Pythonpy is no longer online.
Here is an alternative:
$ echo "a:b:c:d:e" | python -c 'import sys; sys.stdout.write(sys.stdin.read().split(":")[-1])'
It contains more boilerplate code (i.e. sys.stdin.read and sys.stdout.write) but requires only the Python standard library.
