Bash: looping over files in different directories and printing output

I have *.vcf, *.vcf.vcfidx and *.vcf.idx files in the directory /mypath/mydir/. I want to loop over the .vcf files only, using the command below (shown for file 1):
Command for one vcf file:
vcf-subset -c sample.txt vcffile1.vcf | bgzip -c > output_vcffile1.vcf_.vcf.gz
Can someone please help me loop over all the .vcf files (not .vcf.vcfidx or .vcf.idx) and write the output for each file to the designated directory /get/inthis/dir/ using the command shown above?

Just use the glob pattern *.vcf:
for i in *.vcf; do echo "$i"; done
The glob pattern *.vcf will match only files ending in .vcf.
Your command:
for i in *.vcf; do
    vcf-subset -c sample.txt "$i" | bgzip -c > /get/inthis/dir/output_"$i"_.vcf.gz
done
If you have to search for .vcf files in a specific directory, e.g. /foo/bar/, do:
for i in /foo/bar/*.vcf; do
    vcf-subset -c sample.txt "$i" | bgzip -c > /get/inthis/dir/output_"${i##*/}"_.vcf.gz
done
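If there is a chance the directory contains no .vcf files at all, you may also want to enable nullglob first so the loop body never runs on the literal pattern (a small addition, not part of the original answer):
shopt -s nullglob    # make *.vcf expand to nothing instead of the literal string when nothing matches
for i in /foo/bar/*.vcf; do
    vcf-subset -c sample.txt "$i" | bgzip -c > /get/inthis/dir/output_"${i##*/}"_.vcf.gz
done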

Related

cp only the folders that match between two different directories into one directory using Linux or bash

What would be an efficient way to look into two different directories and, if subdirectories match, copy these subfolders to a new output folder with Linux or bash scripting? I know I need the cp command and to match based on the SC#### values.
Example folder one:
[NAME]$ Project
Sample_SC1234-AAA-AAA
Sample_SC2345-AAA-BBB
Sample_SC3456-CCC-CCC
Sample_SC4567-DDD-AAA
Example folder Two:
[NAME]$ Lz
Sample_SC1234-AAA-BBB
Sample_SC4567-BBB-AAA
Sample_SC5678-DDD-BBB
Sample_SC6789-BBB-DDD
Wanted output:
[NAME]$ New
Sample_SC1234-AAA-BBB
Sample_SC4567-BBB-AAA
Sample_SC1234-AAA-AAA
Sample_SC4567-DDD-AAA
ls Project Lz|grep Sample_SC |cut -d '-' -f 1|sort |uniq -c |awk '{if($1 > 1)print $2}' |while read line
do
cp Project/$line* Lz/$line* New/
done
Get the duplicate SC#### values from the directories listed under ./Project and ./Lz and use those values in your recursive copy command.
#!/bin/bash
mkdir -p ./New
# List both directories, pull out the SC#### tokens with grep -o, and have awk print
# only the values it has already seen once (i.e. the duplicates); then copy every
# matching sample directory from both trees into ./New.
while read -r line ; do
    cp -r ./Project/*"$line"* ./Lz/*"$line"* ./New
done < <(awk 'a[$0]++{print $0}' <(grep -o 'SC[0-9]\{4\}' <(ls Lz Project)))
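With the example directories above, the intermediate output looks roughly like this (a sketch using the sample names from the question):
ls Lz Project | grep -o 'SC[0-9]\{4\}'
# -> eight SC#### tokens, one per sample directory
ls Lz Project | grep -o 'SC[0-9]\{4\}' | awk 'a[$0]++{print $0}'
# -> SC1234
# -> SC4567
so only the Sample_SC1234-* and Sample_SC4567-* directories end up copied into ./New.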

Run a script on all recently modified files in bash

I would like to:
Find latest modified file in a folder
Change some files in the folder
Find all files modified after file of step 1
Run a script on these files from step 2
This is where I've ended up:
#!/bin/bash
var=$(find /home -type f -exec stat {} --printf="%y\n" \; |
      sort -n -r |
      head -n 1)
echo "$var"
sudo touch -d "$var" /home/foo
find /home/ -newer /home/foo
Can anybody help me in achieving these actions ?
Use inotifywait instead to monitor files and check for changes
inotifywait -m -q -e modify --format "%f" {Path_To__Monitored_Directory}
Also, you can make it output to a file, loop over its contents and run your script on every entry.
inotifywait -m -q -e modify --format "%f" -o {Output_File} {Path_To_Monitored_Directory}
sample output:
file1
file2
Example
We are monitoring directory named /tmp/dir which contains file1 and file2.
The following script monitors the whole directory and echoes the file name:
#!/bin/bash
while read ch
do
    echo "File modified= $ch"
done < <(inotifywait -m -q -e modify --format "%f" /tmp/dir)
Run this script and modify file1, e.g. echo "123" > /tmp/dir/file1; the script will output the following:
File modified= file1
Also, you can look at this Stack Overflow answer.
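If you prefer to stay with the find -newer idea from the question, a minimal sketch of the four steps might look like the following (assumes GNU find and touch; your_script.sh is a hypothetical placeholder for whatever you want to run):
#!/bin/bash
dir=/home
ref=$(mktemp)    # temporary reference file that carries the cutoff time

# step 1: find the most recently modified file under $dir
newest=$(find "$dir" -type f -printf '%T@ %p\n' | sort -nr | head -n 1 | cut -d' ' -f2-)
touch -r "$newest" "$ref"    # copy its mtime onto the reference file

# step 2: change some files in $dir here ...

# steps 3 and 4: find everything modified after the reference and run the script on it
find "$dir" -type f -newer "$ref" -exec your_script.sh {} \;
rm -f "$ref"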

Shell script: how to copy files containing a specific string from a big corpus

I have a small bug and don't know how to solve it. I want to copy files from a big folder with many files, where the files contain a specific string. For this I use grep, ack or (in this example) ag. When I'm inside the folder the search matches without problems, but when I try it with a loop over the files in the following script, it doesn't loop over the matches. Here is my script:
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" | while read -d $'\0' file; do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done
SEARCH_QUERY holds the string I want to find inside the files, INPUT_DIR is the folder where the files are located, and OUTPUT_DIR is the folder the found files should be copied to. Is there something wrong with the while loop?
EDIT:
Thanks for the suggestions! I took this one now, because it also looks for files in subfolders and saves a list with all the files.
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" > "output_list.txt"
while read -r file
do
    echo "${file##*/}"
    cp "${file}" "${OUTPUT_DIR}/${file##*/}"
done < "output_list.txt"
Better to implement it like below, with a find command:
find "${INPUT_DIR}" -name "*.*" | xargs grep -l "${SEARCH_QUERY}" > /tmp/file_list.txt
while read file
do
    echo "$file"
    cp "${file}" "${OUTPUT_DIR}/${file}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt
or another option:
grep -l "${SEARCH_QUERY}" "${INPUT_DIR}/*.*" > /tmp/file_list.txt
while read file
do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt
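If file names can contain spaces, a null-delimited variant of the same idea avoids the word-splitting problems (a sketch, assuming GNU find, GNU grep and bash):
find "${INPUT_DIR}" -type f -print0 | xargs -0 grep -lZ "${SEARCH_QUERY}" |
while IFS= read -r -d '' file
do
    echo "${file##*/}"
    cp "$file" "${OUTPUT_DIR}/${file##*/}"
done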
If you do not mind doing it in just one line, then:
grep -lr 'ONE\|TWO\|THREE' | xargs -I xxx -P 0 cp xxx dist/
guide:
-l      just print the file name and nothing else
-r      search recursively through the CWD and all sub-directories
'ONE\|TWO\|THREE'   match any of these words: 'ONE', 'TWO' or 'THREE'
|       pipe the output of grep to xargs
-I xxx  the name of each file is saved in xxx; it is just a placeholder
-P 0    run all the commands (= cp) in parallel (= as fast as possible)
cp      copy each file xxx to the dist directory
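Adapted to the variable names used in the rest of this question, the same one-liner would look something like this (a sketch; xxx is still just the xargs placeholder):
grep -lr "${SEARCH_QUERY}" "${INPUT_DIR}" | xargs -I xxx -P 0 cp xxx "${OUTPUT_DIR}/"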
If I understand the behavior of ag correctly, then you have to either
adjust the read delimiter to '\n' or
use ag -0 -l to force delimiting by '\0'
to solve the problem in your loop.
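For the ag route, a minimal sketch of the fixed loop (assuming your ag build supports -0/--print0):
ag -0 -l "${SEARCH_QUERY}" "${INPUT_DIR}" | while IFS= read -r -d '' file
do
    echo "${file##*/}"
    cp "$file" "${OUTPUT_DIR}/${file##*/}"
done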
Alternatively, you can use the following script, which is based on find instead of ag.
while read file; do
    echo "$file"
    cp "$file" "$OUTPUT_DIR/$file"
done < <(find "$INPUT_DIR" -name "*$SEARCH_QUERY*" -print)

Display filename of tar file

I would like to know how to display the filename along with the lines matching a specific word in a tar file.
Command wise :
zcat file | grep "stuff" -r # shows what I want
zcat *.gz | grep "stuff" -ar # this fails
You can use zgrep:
For a single file, you can use the following command to display the filename (the extra /dev/null makes grep see more than one input, so it prefixes each match with the filename):
zgrep "stuff" file.gz /dev/null
For multiple files:
zgrep "stuff" *.gz
Maybe this related answer can help. It uses tar to untar (you would need to add -z) and pipes each file of the archive to awk for "grepping" inside it.
I'm not quite sure what the question is but if you are looking for tar files on your system then just do something like this. This will recursively search your current directory and any child directories for .tar files. Hope this helps.
find -name "*.tar"
If zcat file | grep "stuff" -r shows what you want, you can do this for multiple files:
for name in *.gz ; do zcat "$name" | grep -a "stuff" | sed -e "s/^/${name}: /" ; done
This command uses globbing (*) to expand to a list of .gz files in your working directory, then calls zcat for extraction, grep for the search and sed for prefixing with the filename on each of the files.
Note that if you are working with gzipped tarballs, most people give them a .tgz or .tar.gz instead of just .gz extension.
This will output nameOfFileInTar:LineNumber:Match. Invoke with greptar.sh tarfile.tar.gz pattern (the script uses tar's z option, so it expects a gzip-compressed tarball).
If you don't want the line number, remove the -n option. If you only want the line number, add | cut -f1 -d: after the grep.
#!/bin/bash
TARFILE=$1
PATTERN=$2
tar ztf "$TARFILE" | while read -r FILE
do
    res=$(tar zxf "$TARFILE" "$FILE" -O | grep -n "$PATTERN")
    if [[ $? == 0 ]]; then
        echo "$res" | while read -r line; do
            echo "$FILE:$line"
        done
    fi
done

Bash - how do I wipe the contents of all files in a directory

Is it possible to wipe the contents of all given files in a directory? E.g. if I have a bunch of .csv files I want wiped.
I generally use "# > .csv" on the command line for a single csv file, but "# > *.csv" results in an error: bash: *.csv: ambiguous redirect
I have tried piping /dev/null to *.csv but get the same result. When I have a directory full of files whose contents I want wiped, it's a real pain.
If I use a script and a for loop over all the files, I get the same error when using the redirect on $f (the file) in the loop.
Thanks
for f in *.csv; do
    > "$f"    # redirecting nothing into the file truncates it to zero bytes
done
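The same loop can be written with the : no-op builtin if you prefer the truncation to be explicit:
for f in *.csv; do
    : > "$f"
done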
You can use truncate for the same:
truncate -s 0 *.csv
When you say "wipe", do you mean:
"overwrite" with random content,
or a simple "truncate",
or even simpler, delete?
Delete:
rm *.csv    # will delete all .csv files in the current directory
Truncate:
see John Zwinck's answer
Overwrite with random and delete:
shopt -s nullglob
for file in *.csv
do
    echo "wiping $file" >&2
    # ask GNU stat (gstat, e.g. from coreutils on macOS) for the file's block count and block size
    eval "$(gstat -c 'count=%b;blocksize=%B' "$file")"
    # overwrite the whole allocation with random data, then remove the file
    dd if=/dev/random of="$file" bs="$blocksize" count="$count" 2>/dev/null
    rm "$file"
done
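If GNU coreutils is available, shred does the overwrite-and-remove in one step (a simpler alternative, not part of the original answer):
shred -u *.csv    # overwrite each file's contents, then unlink it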
This is a way using sed:
sed -i '1,$d' *.csv
The address 1,$ covers every line and d deletes them, so each file is emptied in place. Also (GNU sed: the empty script with -n produces no output, which -i then writes back):
sed -ni '' *.csv
