Find single line files and move them to a subfolder - bash

I am using the following bash line to find text files in a folder that contain a given pattern and move them to a subfolder:
find originalFolder/ -maxdepth 1 -type f -exec grep -q 'mySpecificPattern' {} \; -exec mv -i {} destinationFolder/ \;
Now instead of grepping a pattern, I would like to move the files to a subfolder if they consist only of a single line (of text): how can I do that?

You can do it this way:
while IFS= read -r -d '' file; do
[[ $(wc -l < "$file") -eq 1 ]] && echo mv -i "$file" destinationFolder/
done < <(find originalFolder/ -maxdepth 1 -type f -print0)
Note the echo in front of mv, which lets you verify the output before actually executing mv. Once you're satisfied with the output, remove the echo before mv.

Using wc as shown above is the most straightforward way, although it reads the entire file to determine the length. It's also possible to do length checks with awk, and the exit function lets you fit that into a find command.
find . -type f -exec awk 'END { exit (NR==1 ? 0 : 1) } NR==2 { exit 1 }' {} \; -print
The command returns status 0 if there has been only 1 input record at end-of-file, and it also exits immediately with status 1 when line 2 is encountered; this should easily outrun wc if large files are a performance concern.
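The awk test can be chained with -exec mv in the same way as the grep test in the question. What follows is a minimal sketch, assuming a find that supports -maxdepth (GNU and BSD find do); the function name move_single_line_files is made up for illustration. One difference from the wc version: wc -l counts newline characters, so a one-line file without a trailing newline is reported as 0, whereas awk still counts that line as a record.

```shell
#!/usr/bin/env bash
# move_single_line_files SRC DST   (hypothetical helper name)
# Moves every regular file directly inside SRC that consists of exactly
# one line into DST. Empty files and files with two or more lines stay put.
move_single_line_files() {
    local src=$1 dst=$2
    find "$src" -maxdepth 1 -type f \
        -exec awk 'NR == 2 { exit 1 } END { exit (NR == 1 ? 0 : 1) }' {} \; \
        -exec mv -i {} "$dst"/ \;
}
```

It would be called as move_single_line_files originalFolder destinationFolder. The second -exec only runs for files on which awk exited with status 0.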

Related

For Loop: Identify Filename Pairs, Input to For Loop

I am attempting to adapt a previously answered question for use in a for loop.
I have a folder containing multiple paired file names that need to be provided sequentially as input to a for loop.
Example Input
WT1_0min-SRR9929263_1.fastq
WT1_0min-SRR9929263_2.fastq
WT1_20min-SRR9929265_1.fastq
WT1_20min-SRR9929265_2.fastq
WT3_20min-SRR12062597_1.fastq
WT3_20min-SRR12062597_2.fastq
Paired file names can be identified with the answer from the previous question:
find . -name '*_1.fastq' -exec basename {} '_1.fastq' \; | xargs -n1 -I{} echo {}_1.fastq {}_2.fastq
I now want to adapt this for use in a for loop so that each output file can be independently piped to subsequent commands and also so that output file names can be appended.
Input files can be provided as a comma-separated list of files after the -1 and -2 flags respectively. So for this example, the bulk and undesired input would be:
-1 WT1_0min-SRR9929263_1.fastq,WT1_20min-SRR9929265_1.fastq,WT3_20min-SRR12062597_1.fastq
-2 WT1_0min-SRR9929263_2.fastq,WT1_20min-SRR9929265_2.fastq,WT3_20min-SRR12062597_2.fastq
However, I would like to run this as a for loop so that input files are provided sequentially:
Iteration #1
-1 WT1_0min-SRR9929263_1.fastq
-2 WT1_0min-SRR9929263_2.fastq
Iteration #2
-1 WT1_20min-SRR9929265_1.fastq
-2 WT1_20min-SRR9929265_2.fastq
Iteration #3
-1 WT3_20min-SRR12062597_1.fastq
-2 WT3_20min-SRR12062597_2.fastq
Below is an example of the for loop I would like to run using the xargs code to pull filenames. It currently does not work. I assume I need to somehow save the paired filenames from the xargs code as a variable that can be referenced in the for loop?
find . -name '*_1.fastq' -exec basename {} '_1.fastq' \; | xargs -n1 -I{} echo {}_1.fastq {}_2.fastq
for file in *.fastq
do
bowtie2 -p 8 -x /path/genome \
1- {}_1.fastq \
2- {}_2.fastq \
"../path/${file%%.fastq}_UnMappedReads.fastq.gz" \
2> "../path/${file%%.fastq}_Bowtie2_log.txt" | samtools view -# 7 -b | samtools sort -# 7 -m 5G -o "../path/${file%%.fastq}_Mapped.bam"
done
The expected outputs for the example would be:
WT1_0min-SRR9929263_UnMappedReads.fastq.gz
WT1_20min-SRR9929265_UnMappedReads.fastq.gz
WT3_20min-SRR12062597_UnMappedReads.fastq.gz
WT1_0min-SRR9929263_Bowtie2_log.txt
WT1_20min-SRR9929265_Bowtie2_log.txt
WT3_20min-SRR12062597_Bowtie2_log.txt
WT1_0min-SRR9929263_Mapped.bam
WT1_20min-SRR9929265_Mapped.bam
WT3_20min-SRR12062597_Mapped.bam
I don't know what "bowtie2" or "samtools" are but best I can tell all you need is:
#!/usr/bin/env bash
for file1 in *_1.fastq; do
file2="${file1%_1.fastq}_2.fastq"
echo "$file1" "$file2"
done
Replace echo with whatever you want to do with a pair of files.
If you HAD to use find for some reason then it'd be:
#!/usr/bin/env bash
while IFS= read -r file1; do
file2="${file1%_1.fastq}_2.fastq"
echo "$file1" "$file2"
done < <(find . -type f -name '*_1.fastq' -print)
or if your file names can contain newlines then:
#!/usr/bin/env bash
while IFS= read -r -d $'\0' file1; do
file2="${file1%_1.fastq}_2.fastq"
echo "$file1" "$file2"
done < <(find . -type f -name '*_1.fastq' -print0)
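Applied to the question, the same loop can derive the shared sample prefix once and build every output name from it. The sketch below only prints the names; pair_names is an illustrative function name, and the actual bowtie2/samtools invocation is left out because its options are not confirmed here.

```shell
#!/usr/bin/env bash
# pair_names DIR   (hypothetical helper name)
# For each *_1.fastq in DIR, print the two input names and one derived
# output name. Pairs whose *_2.fastq is missing are reported on stderr.
pair_names() {
    local dir=$1 file1 file2 prefix
    for file1 in "$dir"/*_1.fastq; do
        [ -e "$file1" ] || continue              # the glob matched nothing
        file2="${file1%_1.fastq}_2.fastq"
        if [ ! -e "$file2" ]; then
            echo "missing mate for $file1" >&2
            continue
        fi
        prefix=$(basename "$file1" _1.fastq)     # e.g. WT1_0min-SRR9929263
        printf '%s %s %s\n' \
            "$(basename "$file1")" "$(basename "$file2")" "${prefix}_Mapped.bam"
    done
}
```

The other output names from the question follow the same pattern, for example "${prefix}_Bowtie2_log.txt".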

sed to replace string in file only displayed but not executed

I want to find all files with certain name (Myfile.txt) that do not contain certain string (my-wished-string) and then do a sed in order to do a replace in the found files. I tried with:
find . -type f -name "Myfile.txt" -exec grep -H -E -L "my-wished-string" {} + | sed 's/similar-to-my-wished-string/my-wished-string/'
But this only displays me all files with wished name that miss the "my-wished-string", but does not execute the replacement. Do I miss here something?
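With a for loop and invoking a shell:
find . -type f -name "Myfile.txt" -exec sh -c '
for f; do
    grep -q -E "my-wished-string" "$f" ||
        sed -i "s/similar-to-my-wished-string/my-wished-string/" "$f"
done' sh {} +
grep -q is used here instead of -L because it prints nothing and its exit status directly tells whether the file contains the string; the exit status of grep -L has differed between GNU grep releases. Do not add -n to sed together with -i: with no p command, that would write nothing back and empty the file.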
You can do this by constructing two stacks; the first containing the files to search, and the second containing negative hits, which will then be iterated over to perform the replacement.
find . -type f -name "Myfile.txt" > stack1
while read -r line;
do
[ -z "$(sed -n '/my-wished-string/p' "${line}")" ] && echo "${line}" >> stack2
done < stack1
while read -r line;
do
sed -i "s/similar-to-my-wished-string/my-wished-string/" "${line}"
done < stack2
With some versions of sed, you can use -i to edit the file. But don't pipe the list of names to sed, just execute sed in the find:
find . -type f -name Myfile.txt -not -exec grep -q "my-wished-string" {} \; -exec sed -i 's/similar-to-my-wished-string/my-wished-string/g' {} \;
Note that any file which contains similar-to-my-wished-string also contains the string my-wished-string as a substring, so with these exact strings the command is a no-op, but I suppose your actual strings are different than these.
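The find-only version can be wrapped in a function so that the strings become parameters. This is a sketch assuming GNU sed (BSD sed needs -i '' instead of -i); the function name is hypothetical, and OLD and WANTED are inserted into the sed expression as-is, so slashes, ampersands or regex metacharacters in them would need escaping first.

```shell
#!/usr/bin/env bash
# replace_if_missing ROOT NAME WANTED OLD   (hypothetical helper name)
# In every file called NAME under ROOT that does not already contain the
# fixed string WANTED, replace each occurrence of OLD with WANTED.
replace_if_missing() {
    local root=$1 name=$2 wanted=$3 old=$4
    find "$root" -type f -name "$name" \
        ! -exec grep -q -F -- "$wanted" {} \; \
        -exec sed -i "s/$old/$wanted/g" {} \;
}
```

For example, replace_if_missing . Myfile.txt color colour would touch only the files named Myfile.txt that do not contain "color" anywhere.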

Is there a way to pipe from a variable?

I'm trying to find all files in a file structure above a certain file size, list them, then delete them. What I currently have looks like this:
filesToDelete=$(find $find $1 -type f -size +$2k -ls)
if [ -n "$filesToDelete" ];then
echo "Deleting files..."
echo $filesToDelete
$filesToDelete | xargs rm
else
echo "no files to delete"
fi
Everything works, except the $filesToDelete | xargs rm, obviously. Is there a way to use pipe on a variable? Or is there another way I could do this? My google-fu didn't really find anything, so any help would be appreciated.
Edit: Thanks for the information everyone. I will post the working code here now for anyone else stumbling upon this question later:
if [ $(find $1 -type f -size +$2k | wc -l) -ge 1 ]; then
find "$1" -type f -size +"$2"k -exec sh -c 'f=$1; echo "deleting file $f"; rm -- "$f"' sh {} \;
else
echo "no files above" $2 "kb found"
fi
As already pointed out, you don't need to pipe a variable in this case. But in case you need it in some other situation, you can use a here-string:
xargs rm <<< "$filesToDelete"
or, more portably:
echo "$filesToDelete" | xargs rm
Beware of spaces in file names: xargs splits its input on whitespace.
To also print the value while piping it, use tee with process substitution:
echo "$filesToDelete" | tee >( xargs rm )
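To see both ways of feeding a variable to a command's standard input side by side, here is a small self-contained sketch. The variable contents are made-up sample data, and wc -l stands in for whatever command should consume the list.

```shell
#!/usr/bin/env bash
list=$'alpha\nbeta gamma\ndelta'   # three sample names, one containing a space

# Here-string: bash feeds the variable (plus a final newline) to stdin.
count_here=$(wc -l <<< "$list")

# Pipe: printf is more predictable than echo for arbitrary data.
count_pipe=$(printf '%s\n' "$list" | wc -l)

# Reading line by line keeps "beta gamma" together as a single item,
# which whitespace-splitting tools such as plain xargs would not.
items=()
while IFS= read -r name; do
    items+=("$name")
done <<< "$list"

echo "here-string: $count_here, pipe: $count_pipe, items: ${#items[@]}"
```

The quotes around "$list" matter in both forms: without them the shell splits the value into words before the command ever sees it.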
You can use -exec directly to perform an action on the files that find matches:
find "$1" -type f -size +"$2"k -exec rm {} \;
-exec makes find run the given command once for each match. {} is replaced by the path of the match, and \; marks the end of the command.
If you want to perform more than one action, -exec sh -c '...' does it. For example, here you can both print the name of each file that is about to be removed and remove it. The file name is passed to the shell as an argument (after a placeholder for $0) and stored with f=$1, so that it can be used later in echo and rm. Writing {} inside the script text itself would break on names with spaces or quotes:
find "$1" -type f -size +"$2"k -exec sh -c 'f=$1; echo "removing $f"; rm -- "$f"' sh {} \;
In case you want to print a message if no matches were found, you can use wc -l to count the number of matches (if any) and do an if / else condition with it:
if [ $(find $1 -type f -size +$2k | wc -l) -ge 1 ]; then
find $1 -type f -size +$2k -exec rm {} \;
else
echo "no matches found"
fi
wc is a command that does word count (see man wc for more info). Doing wc -l counts the number of lines. So command | wc -l counts the number of lines returned by command.
Then we use the if [ $(command | wc -l) -ge 1 ] check, which does an integer comparison: if the value is greater or equal to 1, then do what follows; otherwise, do what is in else.
However, the previous approach runs find twice, which is a bit inefficient. Because -exec sh -c starts a separate shell process for each file, we cannot rely on a variable to keep track of the number of files deleted: a child process cannot assign values in its parent shell.
Instead, let's store the files that were deleted into a file, and then count it:
find . -name "*.txt" -exec sh -c 'f=$1; echo "$f" >> /tmp/findtest; rm -- "$f"' sh {} \;
if [ -s /tmp/findtest ]; then # true if the file exists and is not empty
echo "file has $(wc -l < /tmp/findtest) lines"
# you can also `cat /tmp/findtest` here to show the deleted files
else
echo "no matches"
fi
Note that you can cat /tmp/findtest to see the deleted files, or also use echo "$f" alone (without redirection) to indicate while removing. rm /tmp/findtest is also an option, to do once the process is finished.
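Another way to run find only once, without a temporary file, is to collect the matches in a bash array and act on the array. This is a sketch assuming bash 4.4 or newer (for mapfile -d '') and a find that supports -print0; delete_large_files is an illustrative name, not something from the question.

```shell
#!/usr/bin/env bash
# delete_large_files DIR SIZE_KB   (hypothetical helper name)
# Lists and then deletes every regular file under DIR larger than SIZE_KB KiB.
delete_large_files() {
    local dir=$1 size_kb=$2
    local -a files
    # NUL-delimited output survives spaces and newlines in file names.
    mapfile -d '' -t files < <(find "$dir" -type f -size +"${size_kb}"k -print0)
    if (( ${#files[@]} == 0 )); then
        echo "no files above ${size_kb} kb found"
        return 0
    fi
    echo "Deleting ${#files[@]} file(s):"
    printf '  %s\n' "${files[@]}"
    rm -- "${files[@]}"
}
```

Because the whole list is handed to a single rm, an extremely large number of matches could exceed the system's argument-length limit; in that case piping find -print0 into xargs -0 is the safer choice.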
You don't need to do all this. You can use find directly to get the files over a particular size limit and delete them using xargs.
This should work:
#!/bin/bash
if [ $(find $1 -type f -size +$2k | wc -l) -eq 0 ]; then
echo "No Files to delete"
else
echo "Deleting the following files"
find "$1" -type f -size +"$2"k -print
find "$1" -type f -size +"$2"k -print0 | xargs -0 rm -f
echo "Done"
fi

copy list of filenames in a textfile in bash

I need to copy a list of filenames in a textfile. Trying by this:
#!/bin/sh
mjdstart=55133
mjdend=56674
datadir=/nfs/m/ir1/ssc/evt
hz="h"
for mjd in $(seq $mjdstart $mjdend); do
find $datadir/ssc"${hz}"_allcl_mjd"${mjd}".evt -maxdepth 1 -type f -printf $datadir'/%f\n' > ssc"${hz}".list
done
I tried also:
find $datadir/ssc"${hz}"_allcl_mjd"${mjd}".evt -maxdepth 1 -type f -printf $datadir'/%f\n' | split -l999 -d - ssc"${hz}".list
Or other combinations, but clearly I am missing something: the textfile is empty. Where is my mistake?
Use >> (append) instead of > (overwrite) otherwise you will have output of last command only:
> ssc"${hz}".list
for mjd in $(seq $mjdstart $mjdend); do
find $datadir/ssc"${hz}"_allcl_mjd"${mjd}".evt -maxdepth 1 -type f -printf $datadir'/%f\n' >> ssc"${hz}".list
done
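The difference is easy to reproduce in isolation. With > inside the loop the file is truncated on every iteration, so if the last iteration prints nothing the file ends up empty, which matches the symptom in the question. A small sketch using a temporary file and made-up values:

```shell
#!/usr/bin/env bash
out=$(mktemp)

for i in 1 2 3; do echo "$i" > "$out"; done    # truncates on every iteration
overwritten=$(cat "$out")                      # only the last value survives

: > "$out"                                     # start from an empty file
for i in 1 2 3; do echo "$i" >> "$out"; done   # appends on every iteration
appended=$(wc -l < "$out")

for i in 1 2 3; do echo "$i"; done > "$out"    # one redirection for the whole loop
whole_loop=$(wc -l < "$out")

rm -f "$out"
```

Redirecting the loop as a whole opens the file only once, so it needs neither >> nor a separate truncation step.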
You don't need to use find here, as you simply have a range of specific file names whose existence you are checking for:
#!/bin/sh
mjdstart=55133
mjdend=56674
datadir=/nfs/m/ir1/ssc/evt
hz="h"
for mjd in $(seq $mjdstart $mjdend); do
fname="$datadir/ssc${hz}_allcl_mjd${mjd}.evt"
[ -f "$fname" ] && printf '%s\n' "$fname"
done > "ssc$hz.list"
find is normally given the directory to search as its first argument, with the file name after -name. Also, using > overwrites your list file on every iteration. Use >> to append:
find "$datadir" -maxdepth 1 -type f -name "ssc${hz}_allcl_mjd${mjd}.evt" >> ssc"${hz}".list

Bash function find and rm

I am trying to do a recursive grep and deleting files with less than a specified entry.
To be more clear, I have a directory of 400000 text files and in each file I have 10 items each starting with the >. Now the problem is that some of the files out of the 4000000 files have only 6-7 or 8-9 items starting with >.
So I wish to delete the files which have fewer than 10 items. I am using the recursive function, however I am not able to figure out how to add rm in the recursive way. What I have till now is:
find . -name "*.[txt]" -exec grep ">" -c {} \;
You can use -exec like this:
find . -name "*.txt" -exec bash -c '(( $(grep -c ">" "$1") < 10 )) && rm "$1"' - '{}' \;
To avoid creating a shell per file, you can use:
while IFS= read -r f; do
    (( $(grep -c ">" "$f") < 10 )) && rm "$f"
done < <(find . -name "*.txt")
I would break it up into smaller steps:
find . -type f -exec grep -c -H '>' {} + |
awk -F: '$2 < 10 {print $1}' |
xargs echo rm
Remove the "echo" once you're satisfied it's working.
The awk step is fragile if any filenames contain ":", and xargs will split names that contain whitespace.
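A variant that avoids parsing "name:count" lines altogether is to count per file and emit NUL-terminated names. This is a sketch assuming find -print0 and bash; list_files_below is an illustrative name, and it counts lines that start with ">" (the pattern '^>'), which is slightly stricter than matching ">" anywhere on the line.

```shell
#!/usr/bin/env bash
# list_files_below DIR MIN   (hypothetical helper name)
# Prints, NUL-terminated, each *.txt file under DIR that has fewer than
# MIN lines starting with ">".
list_files_below() {
    local dir=$1 min=$2 f n
    while IFS= read -r -d '' f; do
        # grep -c prints 0 and exits non-zero when nothing matches.
        n=$(grep -c '^>' -- "$f" || true)
        if (( n < min )); then
            printf '%s\0' "$f"
        fi
    done < <(find "$dir" -type f -name '*.txt' -print0)
}
```

A dry run would be list_files_below . 10 | xargs -0 echo rm --, and dropping the echo performs the deletion.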
