Move just those files named as specific rows in a sample sheet - bash

Imagine I have these files in my working directory in bash:
123.tsv 456.tsv 789.tsv 101112.tsv 131415.tsv
and that I have this sample sheet (tab separated):
sampleID tissue
123 lung
124 bone
456 lung
457 bone
Now, I want to move those files corresponding to lung samples to a new directory, so I would like to have the following files in the new directory:
123.tsv
456.tsv
I was trying to use:
awk -F"\t" '$2 == "lung"'
But I am not sure about how to include this in a for loop to select filenames included in the first column of the output file from the awk command.
How can I solve this?

If the row number is larger than 1 and the second column is exactly lung, print the content of the first column wrapped in an mv command:
mkdir new_dir
awk 'NR>1 && $2=="lung" {print "mv", $1 ".tsv new_dir"}' sample.sheet
If the output looks fine, append | sh to the awk line to execute the commands.
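If handing generated text to sh feels risky (an ID containing spaces or shell metacharacters would be interpreted by the shell), the same awk selection can feed a read loop instead. A minimal sketch, assuming the sheet is tab-separated with a header row; the function name move_lung_files and its two arguments are made up for illustration:

```shell
# Hypothetical helper: move <id>.tsv for every "lung" row of a sample sheet.
# $1 = sample sheet (tab-separated, with a header line), $2 = target directory
move_lung_files() {
    sheet=$1
    dest=$2
    mkdir -p "$dest"
    # awk prints only the sample IDs; the loop does the moving, so no
    # generated text is ever handed to a shell for evaluation.
    awk -F '\t' 'NR > 1 && $2 == "lung" { print $1 }' "$sheet" |
    while IFS= read -r id; do
        # Skip IDs that have no matching file in the current directory.
        if [ -f "$id.tsv" ]; then
            mv -- "$id.tsv" "$dest/"
        fi
    done
}
```

It would be called as move_lung_files sample.sheet new_dir from the directory holding the .tsv files.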

#!/bin/sh
#
#
me=$( basename "${0}" )
# Adjust these as needed. If you want to use your current
# working directory change (or remove) `/tmp/` to `./`.
old_dir="/tmp/foo"
new_dir="/tmp/bar"
list="/tmp/sample_sheet"
# Make sure all the pieces are available. Exit if not.
if [ ! -d "${old_dir}" ]
then
echo "ERROR: ${me}: Source '${old_dir}' does not exist." 1>&2
exit 1
elif [ ! -d "${new_dir}" ]
then
echo "ERROR: ${me}: Target '${new_dir}' does not exist." 1>&2
exit 2
elif [ ! -r "${list}" ]
then
echo "ERROR: ${me}: Sample sheet input '${list}' does not exist or is not readable." 1>&2
exit 3
fi
# Iterate over the first column in `${list}`.
for file in $( awk 'NR>1 && $2=="lung" {print $1".tsv"}' "${list}" )
do
# If the file exists move it, if not do nothing.
if [ -f "${old_dir}/${file}" ]
then
echo "INFO: ${me}: mv ${old_dir}/${file} ${new_dir}/${file}"
mv "${old_dir}/${file}" "${new_dir}/${file}"
fi
done

Here's a script that you can run like, for example, this:
./move_files.sh lung
This works for both cases (lung and bone), and is general. Put this into a file called move_files.sh:
#!/usr/bin/env bash
files=$(sed -e "s/\([0-9]\{3\}\)\( *$1\)/\1/g" <(grep $1 eg.sheet))
if [ ! -d $1 ]; then
mkdir $1
fi
for t in ${files[@]}; do
mv "./$t.tsv" $1
done
With the following directory content:
101112.tsv 123.tsv 124.tsv 131415.tsv 456.tsv 457.tsv 789.tsv eg.sheet move_files.sh
and eg.sheet containing:
sampleID tissue
123 lung
124 bone
456 lung
457 bone
... running the script with
./move_files.sh lung
... results in 123.tsv and 456.tsv being moved into a newly created lung directory (or simply moved there if the directory already exists).
You can then simply run
./move_files.sh bone
to move 124.tsv and 457.tsv to a newly created bone directory. Of course this is then generalisable to whatever is in eg.sheet.
Side note: you must run chmod +x move_files.sh in order to use it in the way I've suggested. Otherwise, you can invoke it with bash move_files.sh lung instead.
EDIT:
To address the point raised by keithpjolley in the comments, this can still work with "tissues" such as "eye lash" just by quoting the $1 variable throughout and by calling it with a quoted string (e.g., ./move_files.sh "eye lash"):
#!/usr/bin/env bash
files=$(sed -e "s/\([0-9]\{3\}\)\( *$1\)/\1/g" <(grep "$1" eg.sheet))
if [ ! -d "$1" ]; then
mkdir "$1"
fi
for t in ${files[@]}; do
mv "./$t.tsv" "$1"
done
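One caveat with the grep/sed selection: grep matches the tissue name anywhere on the line, and the sed pattern assumes three-digit IDs. A sketch of an alternative that compares the second column exactly, assuming a tab-separated sheet with a header row; ids_for_tissue is a hypothetical helper name:

```shell
# Hypothetical helper: print the sample IDs whose tissue column equals $1 exactly.
# $1 = tissue name (may contain spaces), $2 = tab-separated sample sheet
ids_for_tissue() {
    # -v passes the tissue into awk as a variable, so it is compared as a
    # plain string rather than matched as a regular expression.
    awk -F '\t' -v tissue="$1" 'NR > 1 && $2 == tissue { print $1 }' "$2"
}
```

With this, a row such as "459<TAB>lung tumour" is not picked up by ids_for_tissue lung eg.sheet, and ids_for_tissue "eye lash" eg.sheet returns only the rows whose tissue is exactly "eye lash".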

Related

Simple bash shell program

I want to write a bash script that accepts 4 parameters: the name of a file, the name of a directory, and two strings.
If there is a mistake (the first parameter is not a file or the second is not a directory), the string given as the third parameter should be printed; otherwise the file should be copied to the directory and the string given as the fourth parameter should be printed. I don't know why the shell reports an error in line 3 at then.
#!/bin/bash
if [-f $1]; then
if[-d $2] ; then
cp $1 / $2
echo $4
fi
done
else
echo $3
exit 1
fi
If you are having problems, paste your code in at https://www.shellcheck.net/
Fix each issue, then get the report again.
The result:
#!/bin/bash
if [ -f "$1" ]; then
if [ -d "$2" ] ; then
cp "$1" / "$2"
echo "$4"
fi
else
echo "$3"
exit 1
fi
I still think you are likely to have an issue at line 4 though, when it tries to copy the root directory into arg 2 without -r. I think what you meant was just
cp "$1" "$2"
Also, you have no action for the case where someone passes a valid file as $1 but a non-directory as $2. The program will just exit silently and do nothing.
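Putting those two remarks together, here is a sketch of the same logic as a function that prints the third argument whenever either check fails. The name copy_or_report is made up; the argument order follows the question:

```shell
# Hypothetical wrapper around the logic of the script.
# $1 = file, $2 = directory, $3 = message on error, $4 = message on success
copy_or_report() {
    if [ -f "$1" ] && [ -d "$2" ]; then
        # Print the success message only if the copy itself worked.
        cp -- "$1" "$2" && echo "$4"
    else
        # Covers both a missing file and a missing directory.
        echo "$3"
        return 1
    fi
}
```

In a script the last line would be copy_or_report "$1" "$2" "$3" "$4", so the script's exit status follows the function's.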

How to add a character to folder names when using the ls command?

I would like to display a character at the beginning of all folder names when typing the ls command.
So instead of this:
ls
Folder-1 Folder-2 file.txt
It displays this:
ls
📁Folder-1 📁Folder-2 file.txt
Is there a script I can write in my .bash_profile to do this?
You can write a custom script that does this for you:
#!/bin/bash
lsmod=($(ls)) # Convert ls output to an array
for folder in ${lsmod[@]}; do # Iterate over ls results
if [[ -d $folder ]]; then # If this is a folder then...
echo "=> $folder" # Put your char
else # If not, display normally
echo "$folder"
fi
done
Folders are pointed out with a =>. Hope this helps.
One line approach:
for i in `ls`; do echo $([[ -d "$i" ]] && echo "=> $i" || echo "$i"); done;
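Both versions above split the output of ls on whitespace, so a name such as "My Folder" is reported as two entries. A sketch that lets the shell glob the names instead; the function name lsd and the default marker are illustrative choices, not an existing command:

```shell
# Hypothetical replacement for the loops above: list the current directory,
# prefixing directories with a marker. $1 = optional marker (default "=> ").
lsd() {
    prefix=${1:-"=> "}
    for entry in *; do
        # An unmatched glob stays as a literal "*"; skip it in empty directories.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        if [ -d "$entry" ]; then
            printf '%s%s\n' "$prefix" "$entry"
        else
            printf '%s\n' "$entry"
        fi
    done
}
```

Defining the function in .bash_profile and calling it as lsd "📁" would give the folder marker asked for in the question, with one entry per line rather than in columns.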

bash script - ignoring whitespaces in script parameters

I'm quite new to bash scripting and I've run out of ideas in my homework script.
The script takes 4 arguments - pathToDirectory, [-c/-m/-r] == copy/move/remove, ext1 and ext2 (example of running a script: script.sh /home/user/somefolder -c a.txt b.sh ).
The script should find all files in /home/user/someFolder (and its all subfolders) that contain 'a.txt' in their names and (in -c and -m case) rename that 'a.txt' part to 'b.sh' and depending on -c/-m argument either create a new file or just rename an existing file (in -r case it just removes the file) and then write in stdout something like 'old name => new name'.
example output of a script mentioned above:
/home/user/someFolder/bbb.txt => /home/user/someFolder/bba.txt
Well, that was not a problem to implement, everything worked until I posted my code to our upload system (evaluates our script).
The very first Upload System's try to run my script looked like "script.sh /something/graph 1 -c .jpg .jpeg".
The problem now is, that the whole '/something/graph 1' is a path and that whitespace before '1' ruins it all.
expected output: ./projekty/graph 1.jpg => ./projekty/graph 1.jpeg
my script output: ./projekty/graph => ./projekty/graph.jpeg
1.jpg => 1.jpeg
What I have so far:
if [ "$2" = "-r" ]; then
for file in $(find $1 -name "*$3"); do
echo $file
rm -f $file
done
elif [ "$2" = "-c" ]; then
for file in $(find "$1" -name "*$3") ; do
cp "$file" "${file//$3/$4}"
echo $file "=>" ${file%$3}$4
done
elif [ "$2" = "-m" ]; then
for file in $(find $1 -name "*$3"); do
mv "$file" "${file//$3/$4}"
echo $file "=>" ${file%$3}$4
done
else
echo Unknown parameter >&2
fi
My tried, not-working and probably stupid idea: since the -r/-c/-m parameter should be $2, I could detect when $2 is something else (assuming it still belongs to the path) and append it to $1, giving me a variable DIR holding the whole path. Using shift I moved all parameters to the left (because of the whitespace, the -r/-m/-c parameter was in $3 rather than $2, so I made it $2 again), and the code then looked like this (just the -c part):
DIR=$1
if [ "$2" != "-r" ] && [ "$2" != "-c" ] && [ "$2" != "-m" ]; then
DIR+=" $2"
shift
fi
if [ "$2" = "-c" ]; then
for file in $(find "$DIR" -name "*$3") ; do
cp "$file" "${file//$3/$4}"
echo $file "=>" ${file%$3}$4
done
fi
when i echoed "$DIR", it showed the whole path (correctly), but it still didn't work..
Is there any other/better/any way how to fix this please ? :/
Thanks in advance !
As the target string needs to be replaced only at the very end of a filename, "${file//$3/$4}" is a bad idea: it replaces every occurrence of $3, not just the trailing one.
Example: ./projekty/graph.jpg.jpg.jpg.graph.jpg
Looping over the unquoted output of a command substitution is no better an idea.
The fact is that find works as expected and its output looks like:
./projekty/graph 1.jpg
But inside a loop it is expanded incorrectly:
./projekty/graph
1.jpg
To avoid this, you can save the output of find to a variable and then tokenize it until no text is left:
list="$(find "$1" -name "*$3")"
while [ -n "$list" ]; do
file="${list%%$'\n'*}"
list="${list#"$file"}"
list="${list#$'\n'}"
# your commands here
# ...
done
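Another way to sidestep word splitting is to let find hand each path to a small inline script as a separate argument. A sketch of the -c case only, assuming the extension arguments are plain suffixes such as .jpg without glob characters; copy_with_new_ext is a hypothetical name:

```shell
# Hypothetical helper for the -c (copy) case.
# $1 = directory, $2 = old suffix (e.g. .jpg), $3 = new suffix (e.g. .jpeg)
copy_with_new_ext() {
    # find passes each path as one argument, so spaces in names survive.
    # The two suffixes travel as the first two arguments of the inline script.
    find "$1" -type f -name "*$2" -exec sh -c '
        old=$1
        new=$2
        shift 2
        for file in "$@"; do
            # Strip the suffix only at the end of the name, then append the new one.
            target=${file%"$old"}$new
            cp -- "$file" "$target" && printf "%s => %s\n" "$file" "$target"
        done
    ' sh "$2" "$3" {} +
}
```

Called as copy_with_new_ext "/something/graph 1" .jpg .jpeg, the directory argument must still be quoted by the caller. The -m case would swap cp for mv, and the -r case needs only rm.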

How to make option to run one directory or all directory in shell script?

I want my shell script to let the user choose whether to run on only one directory or on all directories. This is my code:
#!/bin/sh
function_run ()
{
python Declare.py
for a in $1
do
echo $a
# Loop to call the file.
for i in 1 2 3 4 5
do
# Cut the row and column
grep -v '^#' $a/result*.txt | tr -s ' ' | cut -d ' ' -f 6 | cat > pos.txt
done
python Calculate.py $a/pos.txt $a/neg.txt $a/pos$i.txt $a/neg$i.txt
python Graph.py $a/output*.py $a/output1.py
python allMax.py "$a" $a/output1.py $a/maxList.py
done
python allMaxGraph.py maxList.py maxTPRlist.py max1TPRlist.py
}
echo "Which KO you want to run?(If all just put ALL): "
read input
a=K*
if [ $input = 'ALL' ]
then
function_run $a
elif [ -d $input ]
then
function_run $input
else
echo "File $input does not exist"
fi
But when I enter ALL to run on all directories, only one directory is processed and then the script stops. It does not run on all of them.
Your main problem is that you are reusing the variable name a inside the function_run(). Change the local variable name or use local a.
You have more problems.
Start using quotes around variables, such as
a="K*"
if [ "$input" = "ALL" ]
You have a check for an existing dir. That check is skipped when you use ALL. Move the check to the function.
I do not understand your function. You are looping with $i, but never use the different values.
Your function is parsing the input with for a in $1, which only looks at the first argument. Use "$@" to loop over all parameters.
When you think you are finished, also test with a directory name with a space inside (mkdir "Know what").
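The points above, sketched with the Python calls replaced by a placeholder echo. The names run_one and run_dirs are invented for the example:

```shell
# Placeholder for the per-directory work (the python calls in the question).
run_one() {
    echo "processing $1"
}

# Process every directory passed as an argument; report anything that is not one.
run_dirs() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            run_one "$dir"
        else
            echo "Directory '$dir' does not exist" >&2
        fi
    done
}
```

For ALL the call would be run_dirs K* (unquoted, so the glob expands to one argument per name), and for a single directory run_dirs "$input". Because the loop iterates over "$@", a directory such as "Know what" stays one argument.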

Command with two in files, one out file, looped

so here is my dilemma. I have a command in the form:
grdpaste infile.grd infile.grd -Goutfile.grd
I have a series of folders in the same directory that each contain a file named infile.grd. I want to iterate through all the folders so that the first run combines infile.grd from the first and second folders, the second run combines outfile.grd from the first run with infile.grd from the third folder, and so on. I do not know how many folders exist, and the final product should contain the combination of all the infiles.
I think I can use a counter to control the combination parts (I did it earlier in my script), but I do not know how to make a for loop that takes one file from one folder and the other file from the next folder, without knowing the names of the folders. I hope this makes sense, thanks much.
AM
If grdpaste will accept an empty input file in a sane way then the following should work:
lastfile=dummy.grd
touch "$lastfile"
for infile in */infile.grd; do
_outfile=outfile$((i++)).grd
grdpaste "$lastfile" "$infile" -G"$_outfile"
lastfile=$_outfile
done
If it can't then the above loop needs to be modified to store the first name it sees in $lastfile and do nothing else that first loop through... something like this:
lastfile=
for infile in */infile.grd; do
[ -z "$lastfile" ] && { lastfile=$infile; continue; }
_outfile=outfile$((i++)).grd
grdpaste "$lastfile" "$infile" -G"$_outfile"
lastfile=$_outfile
done
Solution posted below. For the complete code, see the moravi project here.
for folder in */
do
ls "$folder" | sed 's/e/e/' >"${folder%/}.tmp"
done
for file in *.tmp
do
lat=$(echo $file | awk -F "." '{print $1}')
count=0
while read line
do
count=$(( $count + 1 ))
if [ "$count" = "1" ]
then
declare "tmp_${count}=$line"
elif [ "$count" = "2" ]
then
declare "tmp_${count}=$line"
prod="P"$(( ${count} - 1 ))".grd"
grdpaste ./${lat}/${tmp_1} ./${lat}/${tmp_2} -G./${lat}/${prod} -V
elif [ "$count" -gt 2 ]
then
r="tmp_"${count}
declare "r=$line"
pprod="P"$(( ${count} - 2 ))".grd"
prod="P"$(( ${count} - 1 ))".grd"
grdpaste ./${lat}/${r} ./${lat}/${pprod} -G./${lat}/${prod} -V
to_paste=${prod}
fi
done <$file
done
rm *.tmp
