Bash: Filter directory when piping from `ls` to `tee`

(background info)
Writing my first bash pseudo-program. The program downloads a bunch of files from the network, stores them in a sub-directory called ./network-files/, then removes all the files it downloaded. It also logs the result to several log files in ./logs/.
I want to log the filenames of each file deleted.
Currently, I'm doing this:
echo -e "$(date -u) >>> Removing files: $(ls -1 "$base_directory"/network-files/* | tr '\n' ' ')" | tee -a $network_files_log $verbose_log $network_log
($base_directory is a variable defining the base directory for the app, $network_files_log etc are variables defining the location of various log files)
This produces some pretty grody and unreadable output:
Tue Jun 21 04:55:46 UTC 2016 >>> Removing files: /home/vagrant/load-simulator/network-files/207822218.png /home/vagrant/load-simulator/network-files/217311040.png /home/vagrant/load-simulator/network-files/442119100.png /home/vagrant/load-simulator/network-files/464324101.png /home/vagrant/load-simulator/network-files/525787337.png /home/vagrant/load-simulator/network-files/581100197.png /home/vagrant/load-simulator/network-files/640387393.png /home/vagrant/load-simulator/network-files/650797708.png /home/vagrant/load-simulator/network-files/827538696.png /home/vagrant/load-simulator/network-files/833069509.png /home/vagrant/load-simulator/network-files/8580204.png /home/vagrant/load-simulator/network-files/858174053.png /home/vagrant/load-simulator/network-files/998266826.png
Any good way to strip out the /home/vagrant/load-simulator/network-files/ part from each of those file paths? I suspect there's something I should be doing with sed or grep, but haven't had any luck so far.

You might also consider using find. It's perfect for walking directories, removing files, and customizing output with -printf:
find "$base_directory/network-files" -type f -printf "%f\n" -delete >> "$YourLogFile.log"
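Adapted to the question's variables, a sketch might look like this (assuming GNU find for -printf/-delete, and the same log variables as above):
{ printf '%s >>> Removing files: ' "$(date -u)"
  # -printf '%f ' prints just the basename of each file before -delete removes it
  find "$base_directory/network-files" -type f -printf '%f ' -delete
  printf '\n'
} | tee -a "$network_files_log" "$verbose_log" "$network_log"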

Don't use ls at all; use a glob to populate an array with the desired files. You can then use parameter expansion to shorten each array element.
d=$base_directory/network-files
files=( "$d"/* )
printf '%s >>> Removing files: %s\n' "$(date -u)" "${files[*]#"$d"/}" | tee ...
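If the directory can be empty, a guard is worth adding; a sketch assuming bash's nullglob option:
shopt -s nullglob   # an empty directory yields an empty array, not a literal '*'
files=( "$d"/* )
(( ${#files[@]} )) && printf '%s >>> Removing files: %s\n' "$(date -u)" "${files[*]#"$d"/}" | tee -a "$network_files_log" "$verbose_log" "$network_log"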

You could do this a couple of ways. To directly answer the question, you could use sed's substitution command like:
echo -e "$(date -u) >>> Removing files: $(ls -1 "$base_directory"/network-files/* | tr '\n' ' ')" | sed -e "s,$base_directory/network-files/,," | tee -a $network_files_log $verbose_log $network_log
which adds sed -e "s,$base_directory/network-files/,," to the pipeline. It substitutes that prefix with the empty string, so long as there are no commas in $base_directory. If there are, you can pick a different separator for the sed command, such as an underscore: sed -e "s_$base_directory/network-files/__"
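A hypothetical demo of the separator problem (the comma-containing path here is made up):
base_directory='/tmp/a,b'   # hypothetical path containing a comma; the ',' separator would break on it
echo "$base_directory/network-files/x.png" | sed -e "s_$base_directory/network-files/__"
# -> x.png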
Instead though, you could just have the subshell cd to that directory and then the string wouldn't be there in the first place:
echo -e "$(date -u) >>> Removing files: $(cd "$base_directory/network-files/"; ls -1 | tr '\n' ' ')" | tee -a "$network_files_log" "$verbose_log" "$network_log"
Or you could avoid some potential pitfalls with echo and use printf like
{ printf '%s >>> Removing files: ' "$(date -u)"; (cd "$base_directory/network-files" && printf '%s ' *); printf '\n'; } | tee -a ...

testdata="/home/vagrant/load-simulator/network-files/207822218.png /home/vagrant/load-simulator/network-files/217311040.png"
echo -e $testdata | sed -e 's/\/[^ ]*\///g'
Pipe your output to sed and replace the matched text with nothing.
The regex: \/[^ ]*\/
It starts at a /, then matches everything that is not a space up to the last / in that run.
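With the test data above, the output should be roughly:
$ echo -e $testdata | sed -e 's/\/[^ ]*\///g'
207822218.png 217311040.png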

Related

delete files with partial match to another list of names

I have a list of names in file1; each is a substring of the true filename of a file I want to delete. I want to delete the files that are partially matched by the names in file1. Any idea how to specify the files to delete?
file1:
file123
file313
file355
True files:
file123.sam.out
file313.sam.out
file355.sam.out
file342455.sam.out
file34455.sam.out
Files to keep:
file342455.sam.out
file34455.sam.out
Assuming you don't have any filenames containing newline literals...
printf '%s\n' * | grep -Fvf file1 | xargs -d $'\n' rm -f --
Let's walk through this piece-by-piece:
printf '%s\n' * generates a list of files in your current directory.
grep -Fvf file1 removes from that list any string that contains a line from file1 as a substring
xargs -d $'\n' rm -f -- splits its stdin on newlines and passes whatever is left as arguments to rm -f --
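To preview what would be deleted first, a dry run can swap rm for echo:
printf '%s\n' * | grep -Fvf file1 | xargs -d $'\n' echo rm -f --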
If you have GNU tools (and a shell, like bash, with process substitution support), you can use NUL delimiters and thus work with all possible filenames:
printf '%s\0' * |
grep -zFvf <(tr '\n' '\0' <file1) |
xargs -0 rm -f --
printf '%s\0' * puts a NUL, instead of a newline, after each filename.
tr '\n' '\0' <file1 emits the contents of file1 with newlines replaced with NULs
grep -zFvf reads both its inputs as NUL-delimited, and writes NUL-delimited output, but otherwise behaves as above.
xargs -0 rm -f -- reads content, splitting on NULs, and passes input as arguments to rm -f --.
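If the filtered list can come up empty, GNU xargs' -r (--no-run-if-empty) keeps rm from being run with no file arguments at all:
printf '%s\0' * | grep -zFvf <(tr '\n' '\0' <file1) | xargs -0 -r rm -f --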
#!/bin/bash
PATTERN_FILE=file1
FILE_TO_REMOVE_FOLDER=files
while IFS= read -r x
do
    if [ -n "$x" ]
    then
        echo "rm $FILE_TO_REMOVE_FOLDER/$x*"
        rm "$FILE_TO_REMOVE_FOLDER/$x"*
    fi
done < "$PATTERN_FILE"

Combining all files that contain the same word into a new text file, leaving newlines between individual files

It is my first question here. I have a folder called "materials", which has 40 text files in it. I am trying to combine the text files that contain the word "carbon" (in either capitalized or lowercase form) into a single file, leaving newlines between them. I used grep -w carbon * to identify the files that contain the word carbon. I just don't know what to do after this point. I really appreciate all your help!
grep -il carbon materials/*txt | while IFS= read -r line; do
echo ">> Adding $line";
cat "$line" >> result.out;
echo >> result.out;
done
Explanation
grep searches for the string in the files. -i ignores case for the searched string. -l prints only the names of the files containing the string.
The while loop iterates over the files containing the string.
cat with >> appends each file's content to result.out.
echo >> adds a newline after each file's content is appended to result.out.
Execution
$ ls -1 materials/*.txt
materials/1.txt
materials/2.txt
materials/3.txt
$ grep -i carbon materials/*.txt
materials/1.txt:carbon
materials/2.txt:CARBON
$ grep -il carbon materials/*txt | while IFS= read -r line; do echo ">> Adding $line"; cat "$line" >> result.out; echo >> result.out; done
>> Adding materials/1.txt
>> Adding materials/2.txt
$ cat result.out
carbon
CARBON
Try this (assuming your filenames don't contain newline characters):
grep -iwl carbon ./* |
while IFS= read -r f; do cat "$f"; echo; done > /tmp/combined
If it is possible that your filenames may contain newline characters and your shell is bash, then:
grep -iwlZ carbon ./* |
while IFS= read -r -d '' f; do cat "$f"; echo; done > /tmp/combined
grep is assumed to be GNU grep (for the -w and -Z options). Note that these will leave a trailing newline character in the file /tmp/combined.

Sed: replace substring only if expression exists

In a bash script, I am trying to remove the directory name from filenames:
documents/file.txt
direc/file5.txt
file2.txt
file3.txt
So I first try to see if there is a "/" and, if yes, delete everything before it:
for i in **/*.scss *.scss; do
echo "$i" | sed -n '^/.*\// s/^.*\///p'
done
But it doesn't work for files in the current directory; it gives me a blank string.
I get:
file.txt
file5.txt
When you only want the filename, use basename instead of sed.
basename /path/to/file
returns file; see man basename for details.
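Applied to the question's sample paths:
$ basename documents/file.txt
file.txt
$ basename file2.txt
file2.txt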
Your sed attempt is basically fine, but you should print regardless of whether you performed a substitution; take out the -n and the p at the end. (Also there was an unrelated syntax error.)
Also, don't needlessly loop over all files.
printf '%s\n' **/*.scss *.scss |
sed 's%^.*/%%'
This can also be done with awk.
Example:
echo "1/2/i.py" | awk 'BEGIN {FS="/"} {print $NF}'
output: i.py
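Applied to the question's sample names (-F/ is just shorthand for the same FS assignment):
$ printf '%s\n' documents/file.txt file2.txt | awk -F/ '{print $NF}'
file.txt
file2.txt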
Eventually, I did:
for i in **/*.scss *.scss; do
# for i in *.scss; do
# for i in _hm-globals.scss; do
name=${i##*/} # remove dir name
name=${name%.scss} # remove extension
name=$(echo "$name" | sed "s/^_hm-//") # remove _hm- prefix
if [[ $name = *"."* ]]; then
name=$(echo "$name" | sed 's/\./-/g') # replace . with -
fi
echo "$name" >&2
done

How does xargs format the input of $'\n'?

Problem
(1) Given a string, I replace spaces with $'\n' using sed:
echo "one two" | sed 's/ /$'"'"'\\n'"'"'/g'
This outputs:
# one$'\n'two
(2) Note that echoing the output of (1):
echo one$'\n'two
results in:
# one
# two
(3) I echo the output of (1) another way, by piping it into xargs echo:
echo "one two" | sed 's/ /$'"'"'\\n'"'"'/g' | xargs echo
But I don't get the same output as (2):
# one$\ntwo
Question
What does xargs do when formatting the input of a string containing $'\n'?
Why is echoing a string with $'\n' not the same as using xargs echo on the same string?
When you write
echo one$'\n'two
at the command line, bash itself replaces the $'\n' with a newline before echo ever runs. But when the literal characters $'\n' arrive in a pipe, they are just data; xargs performs no such replacement.
But piping it to xargs will still not do what you want, since by default xargs uses the newline as an argument separator:
$ echo "one two" | tr ' ' '\n' | xargs echo
one two
You must tell xargs to use a different separator, even if it is a bogus one:
$ echo "one two" | tr ' ' '\n' | xargs -0 echo
one
two
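With GNU xargs you can also keep newline as an explicit separator and run one echo per word:
$ echo "one two" | tr ' ' '\n' | xargs -d '\n' -n1 echo
one
two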
Unsure if answering your question, but a trick I've used in the past for similar cases is to use printf instead, which reuses its format string until all of its arguments are consumed, e.g.:
$ printf "%s\n" one two
one
two
Use the shell's own whitespace splitting if the above are in a single string:
$ args="one two"
$ printf "%s\n" $args
one
two
Just for completeness, feed it to xargs -n1 with a small scriptlet:
$ printf "%s\n" one two | xargs -n1 sh -c 'echo [$(date -R)] foo=$1' --
[Sun, 03 Jun 2018 21:34:17 -0300] foo=one
[Sun, 03 Jun 2018 21:34:17 -0300] foo=two

extract characters from filename of newest file

I am writing a bash script where I will need to check a directory for existing files and look at the last 4 digits of the first segment of the file name to set the counter when adding new files to the directory.
Naming structure:
yymmddHNAZXLCOM0001.835
I need to put the 0001 portion of the example into a CTR variable so the next file it puts into the directory will be
yymmddHNAZXLCOM0002.835
and so on.
What would be the easiest and shortest way to do this?
You can do this with sed:
filename="yymmddHNAZXLCOM0001.835"
first_part=$(echo $filename | sed -e 's/\(.*\)\([0-9]\{4,4\}\)\.\(.*\)/\1/')
counter=$(echo $filename | sed -e 's/\(.*\)\([0-9]\{4,4\}\)\.\(.*\)/\2/')
suffix=$(echo $filename | sed -e 's/\(.*\)\([0-9]\{4,4\}\)\.\(.*\)/\3/')
echo "$first_part$(printf "%04u" $(($counter + 1))).$suffix"
=> "yymmddHNAZXLCOM0002.835"
All three sed calls use the same regular expression. The only thing that changes is the group selected to return. (The 10# prefix forces base-10 arithmetic; without it, a counter like 0008 would be read as invalid octal.) There's probably a way to do all of that in one call, but my sed-fu is rusty.
Alternate version, using a Bash array:
filename="yymmddHNAZXLCOM0001.835"
ary=($(echo $filename | sed -e 's/\(.*\)\([0-9]\{4,4\}\)\.\(.*\)/\1 \2 \3/'))
echo "${ary[0]}$(printf "%04u" $((${ary[1]} + 1))).${ary[2]}"
=> "yymmddHNAZXLCOM0002.835"
Note: This version assumes that the filename does not have spaces in it.
Try this...
current=`echo yymmddHNAZXLCOM0001.835 | cut -d . -f 1 | rev | cut -c 1-4 | rev`
next=`echo $current | awk '{printf("%04i",$0+1)}'`
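With the sample name, those two steps give:
$ echo "$current"
0001
$ echo "$next"
0002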
f() {
    if [[ $1 =~ (.*)([[:digit:]]{4})(\.[^.]*)$ ]]; then
        local -a ctr=("${BASH_REMATCH[@]:1}")
        touch "${ctr[0]}$(printf '%04d' $((10#${ctr[1]} + 1)))${ctr[2]}"
        # ...
    else
        echo 'no matches'
    fi
}
shopt -s nullglob
f *
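Assuming a directory holding only the sample file, the before-and-after would look roughly like:
$ ls
yymmddHNAZXLCOM0001.835
$ f *
$ ls
yymmddHNAZXLCOM0001.835  yymmddHNAZXLCOM0002.835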
