Need to remove the extra empty lines from the output of shell script - shell

I'm trying to write a script that prints all files larger than min_size (say 10G) in a directory. The problem is that the code below prints every file, regardless of min_size. Later in the script I will also collect other details such as mtime and owner, but this part alone doesn't work. What's wrong here?
#!/bin/sh
if (( $# <3 )); then
echo "$0 dirname min_size count"
exit 1
else
dirname="$1";
min_size="$2";
count="$3";
#shift 3
fi
tmpfile=$(mktemp /lawdump/pulkit/files.XXXXXX)
exec 3> "$tmpfile"
find "${dirname}" -type f -print0 2>&1 | grep -v "Permission denied" | xargs -0 -I {} echo "{}" > "$tmpfile"
for i in `cat tmpfile`
do
x="`du -ah $i | awk '{print $1}' | grep G | sort -nr -k 1`"
size=$(echo $x | sed 's/[A-Za-z]*//g')
if [ size > $min_size ];then
echo $size
fi
done
Note: I know this can be done with find or du alone, but I need a shell script so that an email with all the details can be sent out regularly.
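Several things in the script above stand out: `cat tmpfile` lacks the `$`, so the temp file is never read; `[ size > $min_size ]` tests the literal string `size` and redirects output into a file named after `$min_size`, so it is always true; and `(( ))` is a bash feature, not POSIX sh. Comparing `du -h` output is also fragile, because human-readable units are not plain integers. Below is a minimal bash sketch of the size filter only, assuming the threshold is given in bytes. The function name `list_big_files` is hypothetical and not part of the original script.

```bash
#!/usr/bin/env bash
# Sketch only: list_big_files and the byte-based threshold are illustrative choices.
# Usage: list_big_files DIR MIN_BYTES
list_big_files() {
    local dir=$1 min_bytes=$2 f size
    # -print0 with read -d '' keeps names containing spaces or newlines intact.
    while IFS= read -r -d '' f; do
        size=$(wc -c < "$f") || continue   # size in bytes; skip unreadable files
        # (( )) compares integers; inside [ ], ">" would be an output redirection.
        if (( size > min_bytes )); then
            printf '%d\t%s\n' "$size" "$f"
        fi
    done < <(find "$dir" -type f -print0 2>/dev/null)
}
```

For a 10G threshold and the `count` argument from the question, a call could look like `list_big_files "$dirname" $((10 * 1024 * 1024 * 1024)) | sort -rn | head -n "$count"`.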


Shell: Add string to the end of each line, which match the pattern. Filenames are given in another file

I'm still new to the shell and need some help.
I have a file stapel_old.
Also I have in the same directory files like english_old_sync, math_old_sync and vocabulary_old_sync.
The content of stapel_old is:
english
math
vocabulary
The content of e.g. english is:
basic_grammar.md
spelling.md
orthography.md
I want to manipulate all files which are given in stapel_old like in this example:
take the first line of stapel_old 'english', (after that math, and so on)
convert in this case english to english_old_sync, (or after that what is given in second line, e.g. math to math_old_sync)
search in english_old_sync line by line for the pattern '.md'
And append to each line after .md :::#a1
The result should be e.g. of english_old_sync:
basic_grammar.md:::#a1
spelling.md:::#a1
orthography.md:::#a1
of math_old_sync:
geometry.md:::#a1
fractions.md:::#a1
and so on. stapel_old should stay unchanged.
How can I achieve that?
I tried sed -n and a while loop (while read -r line), and I feel that is somehow the right way, but after 4 hours of inspecting and reading I still get errors instead of the expected result.
Thank you!
EDIT
Here is the working code (The files are stored in folder 'olddata'):
clear
echo -e "$(tput setaf 1)$(tput setab 7)Learning directories:$(tput sgr 0)\n"
# put here directories which should not become flashcards, command: | grep -v 'name_of_directory_which_not_to_learn1' | grep -v 'directory2'
ls ../ | grep -v 00_gliederungsverweise | grep -v 0_weiter | grep -v bibliothek | grep -v notizen | grep -v Obsidian | grep -v z_nicht_uni | tee olddata/stapel_old
# count folders
echo -ne "\nHow much different folders: " && wc -l olddata/stapel_old | cut -d' ' -f1 | tee -a olddata/stapel_old
echo -e "Are this learning directories correct? [j ODER y]--> yes; [Other]-->no\n"
read lernvz_korrekt
if [ "$lernvz_korrekt" = j ] || [ "$lernvz_korrekt" = y ];
then
read -n 1 -s -r -p "Learning directories correct. Press any key to continue..."
else
read -n 1 -s -r -p "Learning directories not correct, please change in line 4. Press any key to continue..."
exit
fi
echo -e "\n_____________________________\n$(tput setaf 6)$(tput setab 5)Found cards:$(tput sgr 0)$(tput setaf 6)\n"
#GET && WRITE FOLDER NAMES into olddata/stapel_old
anzahl_zeilen=$(cat olddata/stapel_old |& tail -1)
#GET NAMES of .md files of every stapel and write All to 'stapelname'_old_sync
i=0
name="var_$i"
for (( num=1; num <= $anzahl_zeilen; num++ ))
do
i="$((i + 1))"
name="var_$i"
name=$(cat olddata/stapel_old | sed -n "$num"p)
find ../$name/ -name '*.md' | grep -v trash | grep -v Obsidian | rev | cut -d'/' -f1 | rev | tee olddata/$name"_old_sync"
done
(tput sgr 0)
I tried to add:
input="olddata/stapel_old"
while IFS= read -r line
do
sed -n "$line"p olddata/stapel_old
done < "$input"
The code to change only the english_old_sync is:
lines=$(wc -l olddata/english_old_sync | cut -d' ' -f1)
for ((num=1; num <= $lines; num++))
do
content=$(sed -n "$num"p olddata/english_old_sync)
sed -i "s/"$content"/""$content":::#a1/g"" olddata/english_old_sync
done
So now this needs to be an inner for loop inside an outer for loop that holds the variable for english, right?
stapel_old should stay unchanged.
You could try a while + read loop and embed sed inside the loop.
#!/usr/bin/env bash
while IFS= read -r files; do
echo cp -v "$files" "${files}_old_sync" &&
echo sed '/^.*\.md$/s/$/:::#a1/' "${files}_old_sync"
done < olddata/stapel_old
convert in this case english to english_old_sync, (or after that what is given in second line, e.g. math to math_old_sync)
cp copies the file under a new name. If the goal is to rename the original file, using the names listed in stapel_old, change cp to mv.
The -n and -i flags of sed were omitted; include them if needed.
The script also assumes that there are no empty/blank lines in stapel_old. If there are, add an additional test after the line with do.
[[ -n $files ]] || continue
It also assumes that the entries in stapel_old are existing files. Just in case, add an additional test.
[[ -e $files ]] || { printf >&2 '%s no such file or directory.\n' "$files"; continue; }
Or an if statement.
if [[ ! -e $files ]]; then
printf >&2 '%s no such file or directory\n' "$files"
continue
fi
See also help test
See also help continue
Combining them all together should be something like:
#!/usr/bin/env bash
while IFS= read -r files; do
[[ -n $files ]] || continue
[[ -e $files ]] || {
printf >&2 '%s no such file or directory.\n' "$files"
continue
}
echo cp -v "$files" "${files}_old_sync" &&
echo sed '/^.*\.md$/s/$/:::#a1/' "${files}_old_sync"
done < olddata/stapel_old
Remove the echos once you're satisfied with the output, so that the script actually copies/renames and edits the files.
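If the *_old_sync files already exist (as in the question's EDIT, where they are written into olddata), the copy step is not needed and the loop only has to edit each file. A minimal sketch under that assumption follows. The function name `append_marker` and its two arguments are hypothetical, and a temp file is used instead of `sed -i` because the `-i` syntax differs between GNU and BSD sed.

```bash
#!/usr/bin/env bash
# Sketch only: append_marker and its arguments are illustrative names.
# Usage: append_marker LIST_FILE DIR
# For every name in LIST_FILE, append ":::#a1" to each line ending in ".md"
# inside DIR/<name>_old_sync. LIST_FILE itself is only read, never written.
append_marker() {
    local list=$1 dir=$2 name target tmp
    while IFS= read -r name; do
        [[ -n $name ]] || continue                  # skip blank lines
        target="$dir/${name}_old_sync"
        [[ -f $target ]] || { printf >&2 '%s: no such file\n' "$target"; continue; }
        tmp=$(mktemp) || return 1
        # Only lines that end in ".md" get the suffix; other lines pass through unchanged.
        sed 's/\.md$/&:::#a1/' "$target" > "$tmp" && cat "$tmp" > "$target"
        rm -f "$tmp"
    done < "$list"
}
```

A call such as `append_marker olddata/stapel_old olddata` would then process english_old_sync, math_old_sync and so on. The line count that the question's script appends to stapel_old is reported as a missing file and skipped. Running the function twice does not add the suffix twice, because after the first run the lines no longer end in `.md`.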

Looping through each file in directory - bash

I'm trying to perform a certain operation on each file in a directory, but there is a problem with the order in which it goes through them. It should process one file at a time. The long line (unzipping, grepping, zipping) works fine on a single file without a script, so the problem is in the loop. Any ideas?
The script should grep through each zipped file and look for word1 or word2. If at least one of them exists, then:
unzip file
grep word1 and word2 and save it to file_done
remove unzipped file
zip file_done to /donefiles/ with original name
remove file_done from original directory
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c 'word1\|word2' $file)
if [[ $counter -gt 0 ]]; then
echo $counter
for file in *.gz; do
filenoext=${file::-3}
filedone=${filenoext}_done
echo $file
echo $filenoext
echo $filedone
gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip -f -c $filedone > /donefiles/$file | rm -f $filedone
done
else
echo "nothing to do here"
fi
done
The code snippet you've provided has a few problems, e.g. an unneeded nested for loop and an erroneous pipeline
(the whole line gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip...): a pipe starts all of its commands at once and only connects their output and input, so it does not run them one after another.
Note also that your code will work correctly only if the *.gz file names contain no spaces (or special characters).
Also, zgrep -c 'word1\|word2' will match strings like line_starts_withword1_orword2_ as well.
Here is the working version of the script:
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c -E 'word1|word2' $file) # now counter is the number of lines in $file that match word1 or word2
if [[ $counter -gt 0 ]]; then
name=$(basename $file .gz)
zcat $file | grep -E 'word1|word2' > ${name}_done
gzip -f -c ${name}_done > /donefiles/$file
rm -f ${name}_done
else
echo 'nothing to do here'
fi
done
What we can improve here:
since we are unzipping the file anyway to check for the presence of word1|word2, we can unzip into a temp file and avoid unzipping twice
we don't need to count how many times word1 or word2 occurs in the file; we can just check for their presence
${name}_done can be a temp file that is cleaned up automatically
we can use a while loop to handle file names with spaces
#!/bin/bash
tmp=`mktemp /tmp/gzip_demo.XXXXXX` # create temp file for us
trap "rm -f \"$tmp\"" EXIT INT TERM QUIT HUP # clean $tmp upon exit or termination
find . -maxdepth 1 -mindepth 1 -type f -name '*.gz' | while IFS= read -r f; do
# quotes around $f are now required in case of spaces in it
s=$(basename "$f") # short name w/o dir
gunzip -f -c "$f" | grep -P '\b(word1|word2)\b' > "$tmp"
[ -s "$tmp" ] && gzip -f -c "$tmp" > "/donefiles/$s" # create archive if anything is found
done
It looks like you have an inner loop inside the outer one:
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c 'word1\|word2' $file)
if [[ $counter -gt 0 ]]; then
echo $counter
for file in *.gz; do #<<< HERE
filenoext=${file::-3}
filedone=${filenoext}_done
echo $file
echo $filenoext
echo $filedone
gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip -f -c $filedone > /donefiles/$file | rm -f $filedone
done
else
echo "nothing to do here"
fi
done
The inner loop goes through all the files in the directory whenever one of them contains word1 or word2. You probably want this:
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c 'word1\|word2' $file)
if [[ $counter -gt 0 ]]; then
echo $counter
filenoext=${file::-3}
filedone=${filenoext}_done
echo $file
echo $filenoext
echo $filedone
gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip -f -c $filedone > /donefiles/$file | rm -f $filedone
else
echo "nothing to do here"
fi
done
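The steps from the question (check for a match, keep only the matching lines, recompress under the original name) can also be sketched as one function that streams through pipes and never writes an uncompressed file to disk. This is only a sketch: the function name `filter_gz` and the explicit output directory argument are hypothetical, and the file is decompressed twice, which trades some CPU time for simplicity.

```bash
#!/usr/bin/env bash
# Sketch only: filter_gz and the explicit OUTDIR argument are illustrative.
# Usage: filter_gz FILE.gz OUTDIR
filter_gz() {
    local file=$1 outdir=$2
    # grep -q only reports whether at least one line matches.
    if gunzip -c "$file" | grep -q -E 'word1|word2'; then
        # Each stage feeds the next through the pipe; nothing is written to disk in between.
        gunzip -c "$file" | grep -E 'word1|word2' | gzip -c > "$outdir/$(basename "$file")"
    else
        echo "nothing to do here: $file"
    fi
}

# Quoting "$file" keeps names with spaces intact; the glob itself never splits them.
# for file in *.gz; do filter_gz "$file" /donefiles; done
```

Here a pipeline is the right tool because each command really consumes the previous command's output, unlike the `rm` and redirection stages in the original one-liner.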

Creating a shell script with diff function to compare multiple files

I have five files, each in a different directory. I want to check which files match and also find the unique ones.
I am not sure how I should handle this.
You can look at the output of
cksum "path1/file1" "path2/f2" "p3/f3" "p4/f4" "p5/f5" | sort
You can also make a script looping through the files with
files=("path1/file1" "path2/f2" "p3/f3" "p4/f4" "p5/f5")
for i in {0..4}; do
((j=$i+1))
while [ $j -le 4 ]; do
diff "${files[i]}" "${files[j]}" >/dev/null
if [ $? -eq 0 ]; then
echo "${files[i]} and ${files[j]} are the same."
else
echo "${files[i]} and ${files[j]} are different."
fi
((j++))
done
done
You can use cksum or md5sum to detect identical files:
find . -type f | while read f; do md5sum "$f"; done > tmp.txt
cat tmp.txt | cut -d" " -f1 | while read c
do n=`grep $c tmp.txt | wc -l`
if [ "$n" != "1" ]; then
grep $c tmp.txt
fi
done | sort -u
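The checksum idea can be condensed into a single pass that labels every file, which answers both halves of the question (matching files and unique files). The sketch below assumes the POSIX cksum output format `<crc> <size> <name>` and file names without newlines. The function name `report_duplicates` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: report_duplicates is an illustrative name.
# Usage: report_duplicates FILE...
# Prints "duplicate<TAB>name" or "unique<TAB>name" for each file, in argument order.
report_duplicates() {
    cksum "$@" | awk '
        {
            key[NR] = $1 " " $2                       # checksum plus size identifies the content
            name[NR] = $0
            sub(/^[0-9]+ [0-9]+ /, "", name[NR])      # strip both numbers, keep the full name
            seen[key[NR]]++
        }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s\t%s\n", (seen[key[i]] > 1 ? "duplicate" : "unique"), name[i]
        }'
}
```

A call might look like `report_duplicates path1/file1 path2/f2 p3/f3 p4/f4 p5/f5`. Since cksum is only a CRC, two files flagged as duplicates could in principle still differ, so confirming a pair with cmp or diff is a reasonable last step.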

remove files which contain more than 14 lines in a folder

Unix command used
wc -l * | grep -v "14" | rm -rf
However this grouping doesn't seem to do the job. Can anyone point me towards the correct way?
Thanks
wc -l * 2>/dev/null | while read -r num file; do [[ $file != total ]] && ((num > 14)) && echo rm "$file"; done
Remove "echo" if you're happy with the results.
Here's one way to print out the names of all files with at least 15 lines (assuming you have GNU awk, for the nextfile command):
awk 'FNR==15{print FILENAME;nextfile}' *
That will produce an error for any subdirectory, so it's not ideal.
You don't actually want to print the filenames, though. You want to delete them. You can do that in awk with the system function:
# The following has been defanged in case someone decides to copy&paste
awk 'FNR==15{system("echo rm "FILENAME);nextfile}' *
for f in *; do if [ $(wc -l < "$f") -gt 14 ]; then rm -f "$f"; fi; done
There are a few problems with your solution: rm doesn't take its arguments from stdin, and your grep only drops lines that contain 14 anywhere instead of selecting files with more than 14 lines. Try this instead:
find . -maxdepth 1 -type f | while IFS= read -r f; do [ $(wc -l < "$f") -gt 14 ] && rm "$f"; done
Here's how it works:
find . -maxdepth 1 -type f #all files (not directories) in the current directory
[ #start comparison
wc -l < "$f" #get the line count of the file; reading from stdin makes wc print only the number
-gt 14 ] #test if that was greater than 14
&& rm "$f" #if the comparison was true, delete the file
I tried to figure out a solution just using find with -exec, but I couldn't figure out a way to test the line count. Maybe somebody else can come up with a way to do it.
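One possible way to test the line count from find -exec is to hand the files to a small inline sh script. The sketch below only prints the matching names, so it is safe to try. The function name `long_files` is hypothetical, and -maxdepth is assumed to be available (it is in GNU and BSD find, but not in POSIX).

```bash
#!/usr/bin/env bash
# Sketch only: long_files is an illustrative name.
# Usage: long_files DIR LIMIT
# Prints every regular file directly inside DIR that has more than LIMIT lines.
long_files() {
    local dir=$1 limit=$2
    find "$dir" -maxdepth 1 -type f -exec sh -c '
        limit=$1; shift
        for f do
            # wc -l < file prints only the count, with no file name attached.
            if [ $(wc -l < "$f") -gt "$limit" ]; then
                printf "%s\n" "$f"
            fi
        done' sh "$limit" {} +
}
```

`long_files . 14` lists the candidates. Replacing the printf line with `rm -- "$f"` would turn the listing into the deletion the question asks for.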

bash echo string >> file does not work

I wrote the following script:
for filename in `find . -name '*'.cpp | grep $IN_REGEX | grep -v $OUT_REGEX`
do
echo "Output file is $OUTPUT_FILE"
count=`git log --pretty=format: --name-only $filename | grep -v ^$ | wc -l`
echo "$count $filename" >> $OUTPUT_FILE
done
But nothing gets written into the output file.
Please note:
I have set the values for OUTPUT_FILE, IN_REGEX and OUT_REGEX.
The code inside the loop is being executed. I checked this with an sh -x invocation.
When I remove the >> $OUTPUT_FILE I get the output.
I tried a touch $OUTPUT_FILE inside the script and that is working fine.
Can someone please point out what is my mistake here?
This line of code
for filename in `find . -name '*'.cpp
is fragile: the for loop splits the output of find on whitespace, so it breaks on spaces in file names.
You should instead do:
while IFS= read -r file; do
echo "Output file is $OUTPUT_FILE"
count=$(git log --pretty=format: --name-only "$file" | grep -v '^$' | wc -l)
echo "$count $file" >> "$OUTPUT_FILE"
done < <(find . -name '*.cpp' | grep "$IN_REGEX" | grep -v "$OUT_REGEX")
For this to work, make sure that $OUTPUT_FILE contains a path.
