Deleting files with the same name using shell script - bash

I'm a total newbie at shell scripting.
I need to compare file names in two directories and delete the files that have the same name.
E.g.:
Directory1/
one
two
three
four
Directory2/
two
four
five
After running the script the directories will be:
Directory1/
one
three
Directory2/
five
Thanks

test -f tests if a file exists:
cd dir1
for file in *
do
    test -f ../dir2/"$file" && rm "$file" ../dir2/"$file"
done
cd ..

Quick and dirty:
while read fname
do
    rm -vf Directory{1,2}/"$fname"
done < <(sort <(cd Directory1/ && ls) \
              <(cd Directory2/ && ls) |
         uniq -d)
This assumes a number of things about the filenames, but it should get you there with the input shown, and similar cases.
Tested too, now:
mkdir /tmp/stacko && cd /tmp/stacko
mkdir Directory{1,2}
touch Directory1/{one,two,three,four} Directory2/{two,four,five}
Running the command shows:
removed `Directory1/four'
removed `Directory2/four'
removed `Directory1/two'
removed `Directory2/two'
And the resulting tree is:
Directory1/one
Directory1/three
Directory2/five
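An equivalent sketch using comm to find the shared names (assuming plain filenames without newlines; same idea as the sort | uniq -d pipeline above):
comm -12 <(cd Directory1 && ls | sort) <(cd Directory2 && ls | sort) |
while read -r fname
do
    rm -v Directory1/"$fname" Directory2/"$fname"
done
comm -12 prints only the lines common to both sorted listings, so each shared name gets removed from both directories.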

SSH - Run multiple commands at once [duplicate]

Say I have a file /templates/apple and I want to
put it in two different places and then
remove the original.
So, /templates/apple will be copied to /templates/used AND /templates/inuse
and then after that I’d like to remove the original.
Is cp the best way to do this, followed by rm? Or is there a better way?
I want to do it all in one line so I’m thinking it would look something like:
cp /templates/apple /templates/used | cp /templates/apple /templates/inuse | rm /templates/apple
Is this the correct syntax?
You are using | (pipe) to direct the output of a command into another command. What you are looking for is the && operator, which executes the next command only if the previous one succeeded:
cp /templates/apple /templates/used && cp /templates/apple /templates/inuse && rm /templates/apple
Or
cp /templates/apple /templates/used && mv /templates/apple /templates/inuse
To summarize (non-exhaustively) bash's command operators/separators (a short demo follows the list):
| pipes (pipelines) the standard output (stdout) of one command into the standard input of another one. Note that stderr still goes to its default destination, whatever that happens to be.
|& pipes both stdout and stderr of one command into the standard input of another one. Very useful, available in bash version 4 and above.
&& executes the right-hand command of && only if the previous one succeeded.
|| executes the right-hand command of || only if the previous one failed.
; executes the right-hand command of ; always, regardless of whether the previous command succeeded or failed. Unless set -e was previously invoked, which causes bash to exit on an error.
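For illustration, a small hypothetical session exercising each separator:
true && echo "runs: the previous command succeeded"
false || echo "runs: the previous command failed"
false ; echo "runs regardless of the previous command"
echo "hello" | tr a-z A-Z              # stdout of echo becomes stdin of tr
ls /nonexistent |& grep -i 'no such'   # bash 4+: stdout and stderr both piped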
Why not cp to location 1, then mv to location 2. This takes care of "removing" the original.
And no, it's not the correct syntax. | is used to "pipe" output from one program and turn it into input for the next program. What you want is ;, which separates multiple commands.
cp file1 file2 ; cp file1 file3 ; rm file1
If you require that the individual commands MUST succeed before the next can be started, then you'd use && instead:
cp file1 file2 && cp file1 file3 && rm file1
That way, if either of the cp commands fails, the rm will not run.
Note that cp A B; rm A is exactly mv A B. It'll be faster too, as you don't have to actually copy the bytes (assuming the destination is on the same filesystem), just rename the file. So you want cp A B; mv A C
Another option is typing Ctrl+V Ctrl+J at the end of each command.
Example (replace # with Ctrl+V Ctrl+J):
$ echo 1#
echo 2#
echo 3
Output:
1
2
3
This will execute the commands regardless if previous ones failed.
Same as: echo 1; echo 2; echo 3
If you want to stop execution on failed commands, add && at the end of each line except the last one.
Example (replace # with Ctrl+V Ctrl+J):
$ echo 1 &&#
failed-command &&#
echo 2
Output:
1
failed-command: command not found
In zsh you can also use Alt+Enter or Esc+Enter instead of Ctrl+V Ctrl+J
Using pipes seems weird to me. Anyway, you should use the logical AND bash operator:
$ cp /templates/apple /templates/used && cp /templates/apple /templates/inuse && rm /templates/apple
If the cp commands fail, the rm will not be executed.
Or, you can make a more elaborated command line using a for loop and cmp.

Listing only directories using ls in Bash but preserve ls format and without dirname

There are already many answers for this kind of question, but none of them satisfy my requirements. I want:
List all directories under a directory without using glob (*) syntax, i.e. I want to directly use lsdir somedir
Output should contain the basenames of the directories, just as when you use plain ls, like:
$ lsdir path/to/some/dir
dir1 dir2 dir3 dir4
but not this:
$ lsdir path/to/dir
path/to/dir/dir1 path/to/dir/dir2 path/to/dir/dir3 path/to/dir/dir4
To satisfy requirement 1, it seems feasible to define a function; either way we are going to use the -d option, so that ls lists the directories given as parameters themselves.
But when using the -d option, ls lists each directory name with its parent path prepended, as shown above.
ls format (color, align, sort) should be preserved.
To satisfy requirement 2, we can use find, but that way we lose all the ls output formatting: coloring (based on a customized dircolors theme), alignment (output in aligned columns), sorting (customized with various flags, in a column-first manner), and maybe some other things.
I know it's too greedy to want this many features simultaneously, and indeed I can live without all of them.
It's possible to emulate ls output format manually but that's too inconsistent.
I wonder if there is a way to achieve this and still utilize ls, i.e. how to achieve requirement 2 using ls.
This may be what you're looking for:
cd path/to/dir && dirs=(*/) && ls -d "${dirs[@]%?}"
or, perhaps
(shopt -s nullglob; cd path/to/dir && dirs=(*/) && ((${#dirs[@]} > 0)) && ls -d "${dirs[@]%?}")
The */ glob matches only directories, and %? strips the trailing slash from each array element so that ls -d prints bare names. The second version runs in a subshell and prints nothing if there is no subdirectory inside path/to/dir.
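For example (a hypothetical layout; output abridged):
$ mkdir -p /tmp/demo/{dir1,dir2} && touch /tmp/demo/file1
$ (shopt -s nullglob; cd /tmp/demo && dirs=(*/) && ((${#dirs[@]} > 0)) && ls -d "${dirs[@]%?}")
dir1  dir2
Only the two directories are listed; file1 is filtered out by the */ glob.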
Based on @M. Nejat Aydin's excellent answer, I am going to improve it a little more to make a useful command, especially with respect to processing options and multiple directories.
list_directories() {
    local opts=()
    local args=()
    # Separate ls options (anything starting with -) from directory arguments
    for i in $(seq $#); do
        if [[ "${!i}" == -* ]]; then
            opts+=("${!i}")
        else
            args+=("${!i}")
        fi
    done
    (( ${#args[@]} == 0 )) && args=('.')    # default to the current directory
    local -i args_n_1=${#args[@]}-1
    for i in $(seq 0 $args_n_1); do
        if (( ${#args[@]} > 1 )); then
            # Print an ls-style "dir:" header when listing several directories
            (( i > 0 )) && echo
            echo "${args[i]}:"
        fi
        (
            shopt -s nullglob
            cd "${args[i]}" &&
                dirs=(*/) &&
                (( ${#dirs[@]} > 0 )) &&
                ls -d "${opts[@]}" "${dirs[@]%?}"
        )
    done
}
alias lsd=list_directories
This lsd can be used with any number of ls options and directories freely mixed.
$ lsd -h dir1 dir2 -rt ~
Note: the semantics change when you use globs with lsd.
lsd path/to/dir* lists all directories under each directory whose name starts with "path/to/dir".
To list all directories whose names start with "path/to/dir", use plain old ls -d path/to/dir*.
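For example, with hypothetical directories proj1 and proj2 that each contain src and docs subdirectories:
$ lsd proj*
proj1:
docs  src

proj2:
docs  src
$ ls -d proj*
proj1  proj2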

create a directory for every file and generate “n” copies for each file

While I was looking for a solution for my files, I found something that is perfect; I include the answer here: https://unix.stackexchange.com/questions/219991/how-do-i-create-a-directory-for-every-file-in-a-parent-directory/220026#220026?newreg=94b9d49a964a4cd1a14ef2d8f6150bf8
But now my problem is how to generate 50 copies inside the directory generated for each file. I was trying the following command line
ls -p | grep -v / | xargs -t -n1 -i bash -c 'for i in {1..50}; do cp {} "{}_folder/copy${i}_{}" ; done'
to get the following
-file1.csv---->folder_for_file1---->1copy_file1.csv,2copy_file1.csv,3copy_file1.csv........50copy_file1.csv
-file2.csv---->folder_for_file2---->1copy_file2.csv,2copy_file2.csv,3copy_file2.csv........50copy_file2.csv
-file3.csv---->folder_for_file3---->1copy_file3.csv,2copy_file3.csv,3copy_file3.csv........50copy_file3.csv
...
-file256.csv---->folder_forfile256---->1copy_file256.csv,2copy_file256.csv,3copy_file256.csv........50copy_file256.csv
How can I combine this with the previous answer? I include the working code from that answer:
cd ParentFolder
for x in ./*.csv; do
    mkdir "${x%.*}" && mv "$x" "${x%.*}"
done
All the credit goes to the person who wrote that great answer, and thanks in advance to everyone.
Replace the move with a copy/remove and add a for loop:
cd ParentFolder
for x in ./*.csv; do
    mkdir "${x%.*}"
    for (( i=1; i<=50; i++ )); do              # loop 50 times
        cp "$x" "${x%.*}/copy${i}_${x##*/}"    # use i in the copy name; ${x##*/} strips the leading ./
    done
    rm -f "$x"                                 # remove the original after the 50 copies
done
I have done some tests and can publish the following code, which works partially: it effectively copies each file 50 times into the generated folder, but names every new file just "copy" and also adds the .csv extension. If someone can provide a solution to this, that would be great. I thank @Raman Sailopal for his help and comments.
Code:
cd pruebas
for x in ./*.csv; do
mkdir "${x%.*}"
for ((i=1;i<=50;i++)); do # Create a loop, looping 50 times
cp "$x" "${x%.*}/copy_$x_$i.csv" # use i in the copy command
#rm -f "$x" # Remove the file after the 50 copies
done
done
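A likely fix for the naming problem (a sketch, untested against the original data): in bash, copy_$x_$i is parsed as copy_ followed by the empty variable x_ and then $i, so braces are needed to delimit the variable names; $x also carries a leading ./ that must be stripped from the destination name:
cd pruebas
for x in ./*.csv; do
    name=${x##*/}                            # basename: strip the leading ./
    mkdir -p "${x%.*}"
    for (( i=1; i<=50; i++ )); do
        cp "$x" "${x%.*}/copy${i}_${name}"   # braces keep i and name as separate variables
    done
    #rm -f "$x"                              # uncomment to remove the original afterwards
done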

if then else statement will not loop properly

I figured out how to get an if then else statement to work, but it now seems to have broken. =( I cannot work out what is going wrong!
There are up to 10 directories in ./ called barcode01 through barcode09, plus one called unclassified. This script is supposed to go into each one and prep the directory for ~/Taxonomy.R (which requires all the fastq files to be gzipped and put into a sub-directory titled "data"). It then runs the ~/Taxonomy.R script to make a metadata file for each.
Edit: the tmp.txt file is created using ls > tmp.txt, then echo "0" >> tmp.txt, to make a sacrificial list of directories for the script to chew through, stopping when it gets to 0.
#!/bin/bash
source deactivate
source activate R-Env
value=(sed -n 1p tmp.txt)
if [ "$value" = "0" ]
then
rm tmp.txt
else
cd "$(sed -n 1p tmp.txt)"
gzip *fastq
#
for i in *.gz
do
mv "$i" "${i%.*}_R1.fastq.gz";
done
#this adds the direction identifier "R1" to all the fastq.gzips
mkdir Data
mv *gz Data
~/Taxonomy3.R
cd Data
mv * ..
cd ..
rm -r Data
cd ..
sed '1d' tmp.txt > tmp2.txt
mv tmp2.txt tmp.txt
fi
Currently, it is only making the metadata file in the first barcode directory.
If you indent your code, things will get a lot clearer.
On the other hand, modifying your tmp.txt file this way is slow and dangerous. Better to traverse its contents, only reading it.
#!/bin/bash
source deactivate
source activate R-Env
for value in $(<tmp.txt)
do
    cd "$value"
    gzip *fastq
    for i in *.gz
    do
        # This adds the direction identifier "R1" to all the fastq.gzips
        mv "$i" "${i%.*}_R1.fastq.gz"
    done
    mkdir Data
    mv *gz Data
    ~/Taxonomy3.R
    mv Data/* .
    rmdir Data
    cd -
done
rm tmp.txt
With this reworked script you only need to create the tmp.txt file WITHOUT adding any marker at the end (in fact, you never needed it; you could have checked for an empty file instead).
For each folder in the list, the operations you wanted are executed. I simplified some of the folder changing, minimizing it to what the R script requires to run properly. To go back, I used cd -, which returns to the previous folder; that way you can have more than one level in your tmp.txt entries.
Hope everything else is clear.
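As a side note, $(<tmp.txt) splits on whitespace, so directory names containing spaces would break. A safer sketch (same body, assuming one name per line in tmp.txt) reads line by line and uses a subshell instead of cd -:
while IFS= read -r value; do
    [ -d "$value" ] || continue    # skip the "0" marker or anything else that is not a directory
    (
        cd "$value" || exit
        gzip *fastq
        for i in *.gz; do
            mv "$i" "${i%.*}_R1.fastq.gz"
        done
        mkdir Data && mv *gz Data
        ~/Taxonomy3.R
        mv Data/* . && rmdir Data
    )
done < tmp.txt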

gnu parallel to parallelize a for loop

I have seen several questions about this topic, but I lack the ability to translate this to my specific problem. I have a for loop that loops through sub directories and then executes a .sh script on a compressed text file inside each directory. I want to parallelize this process, but I'm struggling to apply gnu parallel.
Here is my loop:
for d in ./*/ ; do (cd "$d" && script.sh); done
I understand I need to feed a list into parallel, so I have been trying this:
ls -d */ | parallel cd && script.sh
While this appears to get started, I get an error when gzip tries to unzip one of the txt files inside the directory, saying the file does not exist:
gzip: *.txt.gz: No such file or directory
However, when I run the original for loop, I have no issues aside from it taking a century to finish. Also, I only get the gzip error once when using parallel, which is so weird considering I have over 1000 sub-directories.
My questions are:
How do I get parallel to work in my case? How do I get parallel to parallelize the application of a .sh script to 1000s of files in their own sub-directories? I.e., what is the solution to my problem? I gotta make progress.
What am I missing? Syntax, loop, bad script? I want to learn.
Is parallel actually attempting to run all these .sh scripts in parallel? Why don't I get an error for every .txt.gz file?
Is parallel the best option for the application? Is there another option that is better suited to my needs?
Two problems:
In:
ls -d */ | parallel cd && script.sh
what is parallelized is just cd, not script.sh. script.sh is executed only once, after all the parallel cd jobs have run, and only if there was no error. It is the same as:
ls -d */ | parallel cd
if [ $? -eq 0 ]; then script.sh; fi
You do not pass the target directory to cd. So what is executed by parallel is just cd, which simply changes the current directory to your home directory. The final script.sh is then executed in the current directory (the one from which you invoked the command), where there are probably no *.txt.gz files, hence the error.
You can check yourself the effect of the first problem with:
$ mkdir /tmp/foobar && cd /tmp/foobar && mkdir a b c
$ ls -d */ | parallel cd && pwd
/tmp/foobar
The output of pwd is printed only once, even if you have more than one input directory. You can fix it by quoting the command and then check the second problem with:
$ ls -d */ | parallel 'cd && pwd'
/homes/myself
/homes/myself
/homes/myself
You should see as many pwd outputs as there are input directories but it is always the same output: your home directory. You can fix the second problem by using the {} replacement string that is substituted with the current input. Check it with:
$ ls -d */ | parallel 'cd {} && pwd'
/tmp/foobar/a
/tmp/foobar/b
/tmp/foobar/c
Now, you should have all input directories properly listed in the output.
For your specific problem this should work:
ls -d */ | parallel 'cd {} && script.sh'
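If any of the 1000+ directory names contain spaces or other special characters, parsing the output of ls is fragile. A more robust sketch of the same idea uses find with NUL-delimited output:
find . -mindepth 1 -maxdepth 1 -type d -print0 |
    parallel -0 'cd {} && script.sh'
The -print0/-0 pair keeps each directory name intact regardless of embedded whitespace.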
