Find number of files with prefixes in bash - bash

I'm trying to count the files that share a given prefix and print the prefix whenever that count is not 5.
To achieve this, I wrote the following bash script:
#!/bin/bash
for filename in $(ls); do
name=$(echo $filename | cut -f 1 -d '.')
num=$(ls $name* | wc -l)
if [$num != 5]; then
echo $name
fi
done
But I get this error (repeatedly):
./check_uneven_number.sh: line 5: [1: command not found
Thank you!

The if statement takes a command, runs it, and checks its exit status. Left bracket ([) by itself is a command, but you wrote [$num. The shell expands $num to 1, creating the word [1, which is not a command.
if [ $num != 5 ]; then
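To see why the spaces matter, here is a minimal sketch (the value of num is made up for illustration) showing that [ is looked up like any other command name and then receives the rest of the line as separate arguments:

```bash
#!/bin/bash
num=1

# "[" is a shell builtin (and usually also a program on disk).
bracket_type=$(type -t [)

# With spaces, "[" receives "$num", "!=", "5" and "]" as separate arguments.
if [ "$num" != 5 ]; then
    result="differs"
else
    result="equal"
fi

echo "$bracket_type $result"
```

Without the space, the shell would look for a command literally named [1, which is the error in the question.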

Your code loops over file names, not prefixes; so if there are three file names with a particular prefix, you will get three warnings, instead of one.
Try this instead:
# Avoid pesky ls
printf '%s\n' * |
# Trim to just prefixes
cut -d . -f 1 |
# Reduce to unique
sort -u |
while IFS='' read -r prefix; do
# Pay attention to quoting
num=$(printf '.%.0s' "$prefix"* | wc -c) # %.0s consumes one file name per dot
# Pay attention to spaces
if [ "$num" -ne 5 ]; then
printf '%s\n' "$prefix"
fi
done
Personally, I'd prefer case over the clunky if here, but it takes some getting used to.
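For comparison, a case version of that test could look roughly like this. The function name check_count is hypothetical and only wraps the test so it can be tried on its own; it is a sketch, not part of the original answer:

```bash
#!/bin/bash
# check_count is a hypothetical helper name used only for this illustration.
check_count() {
    local prefix=$1 num=$2
    case $num in
        5) ;;                          # exactly five files: stay quiet
        *) printf '%s\n' "$prefix" ;;  # anything else: report the prefix
    esac
}

out=$(check_count alpha 5; check_count beta 3)
echo "$out"
```

Note that case matches strings against patterns, so this assumes num holds plain digits with no surrounding whitespace.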

Related

Counting number of lines in file and saving it in a bash file

I am trying to loop through all the files in a folder and add the file name of those files with 10 lines to a txt file but I don't know how to write the if statement.
As of right now, what I have is:
for FILE in *.txt do if wc $FILE == 10; then "$FILE" >> saved_names.txt fi done
I am getting stuck on how to write a condition that evaluates to a boolean for the if statement.
I have already tried the if statement as:
if [ wc $FILE != 10 ]
if "wc $FILE" != 10
if "wc $FILE != 10"
as well as other ways but I don't seem to get it right. I know I am new to Bash but I can't seem to find a solution to this question.
There are a few problems in your code.
To count the lines in a file, run the "wc -l" command. Called with a file name, it prints the count followed by the name (for example, 10 a.txt; you can check this by running the command on a file in your terminal). To get only the number, redirect the file to the command's standard input instead of passing its name as an argument.
"==" is used in bash to compare strings. To compare integers, as in this case, use "-eq" (see https://tldp.org/LDP/abs/html/comparison-ops.html).
In terms of brackets: to use the output of wc in your code, wrap the command in a command substitution, $(wc -l ...), which runs it and substitutes its output in place. To turn the comparison into something if can test, use square brackets with spaces around them: [ 1 -eq 1 ].
To save the name of the file in another file using >>, you first need to write the name to standard output (>> redirects standard output and appends it to the chosen file). The echo command does that.
The code should look like this:
#!/bin/bash
for FILE in *.txt
do
if [ "$(wc -l < "$FILE")" -eq 10 ]
then
echo "$FILE" >> saved_names.txt
fi
done
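The difference between passing the file name and redirecting the file can be seen in a small sketch. The temporary directory and the file a.txt are made up for the example:

```bash
#!/bin/bash
tmpdir=$(mktemp -d)
printf 'line %s\n' 1 2 3 > "$tmpdir/a.txt"

with_name=$(wc -l "$tmpdir/a.txt")     # count followed by the file name
count_only=$(wc -l < "$tmpdir/a.txt")  # count only; wc never sees a name

echo "with name:  $with_name"
echo "count only: $count_only"

# Only the bare number is usable in a numeric test.
if [ "$count_only" -eq 3 ]; then
    verdict="three lines"
fi
echo "$verdict"
rm -r "$tmpdir"
```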
Try:
for file in *.txt; do
if [[ $(wc -l < "$file") -eq 10 ]]; then
printf '%s\n' "$file"
fi
done > saved_names.txt
Change > to >> if you want to append the filenames.
Related docs:
Command Substitution
Conditional Constructs
Extract just the number of lines with wc -l "$FILE" | cut -f1 -d' ' (this assumes wc prints no leading spaces, as with GNU wc), compare it with the -eq operator, and use echo to write the name:
for FILE in *.txt; do if [ "$(wc -l "$FILE" | cut -f1 -d' ')" -eq 10 ]; then echo "$FILE" >> saved_names.txt; fi; done

Counting all the 5 from a specific range in Bash

I want to count how many times the digit "5" appears in the numbers from 1 to 4321. For example, the number 5 contains it once and the number 555 contains it three times.
Here is my code so far, however, the results are 0, and they are supposed to be 1262.
#!/bin/bash
typeset -i count5=0
for n in {1..4321}; do
echo ${n}
done | \
while read -n1 digit ; do
if [ `echo "${digit}" | grep 5` ] ; then
count5=count5+1
fi
done | echo "${count5}"
P.s. I am looking to fix my code so it can print the right output. I do not want a completely different solution or a shortcut.
What about something like this
seq 4321 | tr -Cd 5 | wc -c
1262
This creates the sequence, deletes everything but the 5s, and counts the remaining characters.
The main problem here is http://mywiki.wooledge.org/BashFAQ/024. With minimal changes, your code could be refactored to
#!/bin/bash
typeset -i count5=0
for n in {1..4321}; do
echo $n # braces around ${n} provide no benefit
done | # no backslash required here; fix weird indentation
while read -n1 digit ; do
# prefer modern command substitution syntax over backticks
if [ $(echo "${digit}" | grep 5) ] ; then
count5=count5+1
fi
echo "${count5}" # variable will not persist outside subprocess
done | tail -n 1 # so print the running total each time and keep only the last line
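With some common antipatterns removed, this reduces to
#!/bin/bash
printf '%s\n' {1..4321} |
grep -o 5 |
wc -l
Here grep -o (available in GNU and BSD grep) prints every match on its own line, so wc -l counts occurrences. Note that the shorter
printf '%s\n' {1..4321} | grep -c 5
counts the numbers that contain a 5 rather than the individual digits, so it gives a smaller result.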
One primary issue:
Each stage of a pipeline runs in its own subshell, and in bash any variables set in a subshell are lost when the subshell exits. So even if count5 is incremented correctly inside the while loop, you still end up with 0 (the starting value) once the pipeline has finished.
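Making minimal changes to OP's current code (the typeset line is kept from the original script, since count5=count5+1 only does arithmetic on an integer variable):
typeset -i count5=0
while read -n1 digit ; do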
if [ `echo "${digit}" | grep 5` ]; then
count5=count5+1
fi
done < <(for n in {1..4321}; do echo ${n}; done)
echo "${count5}"
NOTE: there are a couple performance related issues with this method of coding but since OP has explicitly asked to a) 'fix' the current code and b) not provide any shortcuts ... we'll leave the performance fixes for another day ...
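The subshell behaviour described above can be checked with a small sketch. The variable names in_pipe and in_redirect are made up for the illustration, and it assumes bash's default settings (the lastpipe option is off):

```bash
#!/bin/bash
typeset -i in_pipe=0 in_redirect=0

# The while loop is the last stage of a pipeline, so it runs in a subshell.
printf '%s\n' 5 5 5 | while read -r line; do
    in_pipe=in_pipe+1
done

# Here the loop runs in the current shell; only the producer is a subprocess.
while read -r line; do
    in_redirect=in_redirect+1
done < <(printf '%s\n' 5 5 5)

echo "pipe: $in_pipe, redirect: $in_redirect"
```

The first counter is still 0 after its loop, while the second one keeps the value it reached inside the loop.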
A simpler way to count the 5s in a single number n would be
nx=${n//[^5]/} # Remove all non-5 characters
count5=${#nx} # Calculate the length of what is left
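Put into the loop from the question, those two expansions might be combined as follows. This is a sketch that keeps the question's variable names and sums the per-number counts instead of overwriting count5:

```bash
#!/bin/bash
typeset -i count5=0
for n in {1..4321}; do
    nx=${n//[^5]/}    # keep only the 5s in this number
    count5+=${#nx}    # integer variable, so += adds the length
done
echo "$count5"
```

Because nothing here runs in a pipeline, count5 is still set after the loop, and the script prints 1262, the figure given in the question.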
A simpler method in pure bash could be:
printf -v seq '%s' {1..4321} # print the sequence into the variable seq
fives=${seq//[!5]} # delete all characters but 5s
count5=${#fives} # length of the string is the count of 5s
echo $count5 # print it
Or, using standard utilities tr and wc
printf '%s' {1..4321} | tr -dc 5 | wc -c
Or using awk:
awk 'BEGIN { for(i=1;i<=4321;i++) {$0=i; x=x+gsub("5",""); } print x} '

Bash shell script to find missing files from filename

I have a folder that should contain 1485 files, named PA0001.png, PA0002.png ... up to PA1485.png
Some of them are missing and I'd like to write a shell script able to identify the missing ones and print them, as a list, in a .txt file (preferably without the leading string PA and the .png extension, but with the leading zeroes, if any)
I have no clue on how to proceed though, maybe using awk? But I'm still quite of a noob... Any help would be much appreciated!
You can get the sequence numbers of the missing files using a bash loop:
# Redirect output, per answer
exec > file.txt
for ((i=1 ; i<=1485 ; i++)) ; do
# Convert to 4 digit zero padded
printf -v id '%04d' $i
if [ ! -f "PA$id.png" ] ; then
echo $id
fi
done
Here's a slight refactoring of the existing answer, with explanations in the comments.
# Assign each number in the sequence to i; loop until we have done them all
for ((i=1 ; i<=1485 ; i++)) ; do
# Format the number with padding for the file name part
printf -v id '%04d' "$i"
# If a file with this name does not exist,
if [ ! -f "PA$id.png" ] ; then
# Print it to standard output
echo "$id"
fi
# Redirect the loop's standard output to a file
done >missing.txt
You can do exactly this without a single Bash loop:
#!/usr/bin/env bash
{
find . \
-maxdepth 1 \
-regextype posix-extended \
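-regex '.*/PA[[:digit:]]{4}\.png' \
-printf '%f\n'
printf 'PA%04d.png\n' {1..1485}
} | sort | uniq --unique | sed -e 's/^PA//' -e 's/\.png$//'
It combines the list of existing files with the list of expected file names,
then sorts them and keeps the entries that occur only once. Those are the names that appear only in the expected list, so they are the missing files. The final sed strips the PA prefix and the .png extension. Note that -regextype, -printf and uniq --unique are GNU extensions.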
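The sort and uniq step is effectively a set difference, which is easier to see on a tiny made-up data set. In this sketch the present list stands in for the find output and the expected range is shortened to 1 through 4:

```bash
#!/bin/bash
# Hypothetical data: 0001 and 0003 exist; 0001 through 0004 are expected.
present=$(printf '%s\n' 0001 0003)
expected=$(printf '%04d\n' {1..4})

# Names in both lists appear twice after sorting; uniq -u keeps the singletons.
missing=$(printf '%s\n' "$present" "$expected" | sort | uniq -u)
echo "$missing"
```

This relies on every existing name also being in the expected list; a stray file outside the expected range would occur once as well and be reported too.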

Bash scripting; confused with for loop

I need to make a for loop that loops for every item in a directory.
My issue is the for loop is not acting as I would expect it to.
cd $1
local leader=$2
if [[ $dOpt = 0 ]]
then
local items=$(ls)
local nitems=$(ls |grep -c ^)
else
local items=$(ls -l | egrep '^d' | awk '{print $9}')
local nitems=$(ls -l | egrep '^d' | grep -c ^)
fi
for item in $items;
do
printf "${CYAN}$nitems\n${NONE}"
let nitems--
if [[ $nitems -lt 0 ]]
then
exit 4
fi
printf "${YELLOW}$item\n${NONE}"
done
dOpt is just a switch for a script option.
The issue I'm having is the nitems count doesn't decrease at all, it's as if the for loop is only going in once. Is there something I'm missing?
Thanks
Goodness gracious, don't rely on ls to iterate over files.
local is only useful in functions.
Use filename expansion patterns to store the filenames in an array.
cd "$1"
leader=$2 # where do you use this?
if [[ $dOpt = 0 ]]
then
items=( * )
else
items=( */ ) # the trailing slash limits the results to directories
fi
nitems=${#items[@]}
for item in "${items[@]}" # ensure the quotes are present here
do
printf "${CYAN}%s\n${NONE}" "$((nitems--))"
printf "${YELLOW}%s\n${NONE}" "$item" # pass the name as an argument so a % in it is harmless
done
Using this technique will safely handle files with spaces, even newlines, in the name.
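A quick way to convince yourself is a throwaway directory with awkward names. The directory and file names below are invented for the sketch, and the globs are given a path prefix so no cd is needed:

```bash
#!/bin/bash
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.txt" "$tmpdir/with space.txt"
mkdir "$tmpdir/sub dir"

items=( "$tmpdir"/* )    # every entry, one array element per name
dirs=( "$tmpdir"/*/ )    # directories only, thanks to the trailing slash

echo "${#items[@]} entries, ${#dirs[@]} directory"
for item in "${items[@]}"; do
    printf '<%s>\n' "${item##*/}"   # strip the leading path for display
done
rm -r "$tmpdir"
```

Each name stays a single array element, spaces included, which is exactly what word-splitting the output of ls cannot guarantee.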
Try this:
if [ "$dOpt" == "0" ]; then
list=(`ls`)
else
list=(`ls -l | egrep '^d' | awk '{print $9}'`)
fi
for item in "${list[@]}"; do # $list alone would expand to the first element only
... # do something with item
done
Thanks for all the suggestions. I found out the problem was changing $IFS to ":". While I meant for this to avoid problems with whitespaces in the filename, it just complicated things.
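The effect of IFS on an unquoted expansion can be reproduced in a few lines. The helper count_words and the sample string are made up for this sketch:

```bash
#!/bin/bash
names="alpha beta gamma"

# Hypothetical helper: reports how many arguments it received.
count_words() { echo $#; }

default_count=$(count_words $names)   # unquoted: split on spaces

old_ifs=$IFS
IFS=":"
colon_count=$(count_words $names)     # no colons, so it stays one word
IFS=$old_ifs

echo "default IFS: $default_count, IFS=':' : $colon_count"
```

With IFS set to ":", the whole list is treated as a single word, so a for loop over it runs only once.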

ksh: shell script to search for a string in all files present in a directory at a regular interval

I have a directory (output) in unix (SUN). Two types of files are created there, each with a timestamp prefix in the file name. These files are created at a regular interval of 10 minutes.
e. g:
1. 20140129_170343_fail.csv (some lines are there)
2. 20140129_170343_success.csv (some lines are there)
Now I have to search for a particular string in all the files present in the output directory and if the string is found in fail and success files, I have to count the number of lines present in those files and save the output to the cnt_succ and cnt_fail variables. If the string is not found I will search again in the same directory after a sleep timer of 20 seconds.
here is my code
#!/usr/bin/ksh
for i in 1 2
do
grep -l 0140127_123933_part_hg_log_status.csv /osp/local/var/log/tool2/final_logs/* >log_t.txt; ### log_t.txt will contain all the matching file list
while read line ### reading the log_t.txt
do
echo "$line has following count"
CNT=`wc -l $line|tr -s " "|cut -d" " -f2`
CNT=`expr $CNT - 1`
echo $CNT
done <log_t.txt
if [ $CNT > 0 ]
then
exit
fi
echo "waiitng"
sleep 20
done
The problem I'm facing is that I can't tell the _success and _fail files apart in line and check their counts separately.
I'm not sure about ksh, but in bash a while ... do; ... done loop that is fed by a pipe is notorious for losing whatever variables you set inside it. ksh might be similar.
If I've understood your question right, SunOS has grep, uniq and sort AFAIK, so a possible alternative might be...
First of all:
$ cat fail.txt
W34523TERG
ADFLKJ
W34523TERG
WER
ASDTQ34T
DBVSER6
W34523TERG
ASDTQ34T
DBVSER6
$ cat success.txt
abcde
defgh
234523452
vxczvzxc
jkl
vxczvzxc
asdf
234523452
vxczvzxc
dlkjhgl
jkl
wer
234523452
vxczvzxc
And now:
egrep "W34523TERG|ASDTQ34T" fail.txt | sort | uniq -c
2 ASDTQ34T
3 W34523TERG
egrep "234523452|vxczvzxc|jkl" success.txt | sort | uniq -c
3 234523452
2 jkl
4 vxczvzxc
Depending on the input data, you may want to see what options sort has on your system. Examining uniq's options may prove useful too (it can do more than just count duplicates).
I think you want something like this (it will work in both bash and ksh):
#!/bin/ksh
while read -r file; do
lines=$(wc -l < "$file")
((sum+=$lines))
done < <(grep -Rl --include="[12]*_fail.csv" "somestring" .)
echo "$sum"
Note this will match files starting with 1 or 2 and ending in _fail.csv, not exactly clear if that's what you want or not.
For example, say I have two files, one starting with 1 (containing 4 lines) and one starting with 2 (containing 3 lines), both ending in _fail.csv and located somewhere under my current working directory:
> abovescript
7
Important to understand grep options here
-R, --dereference-recursive
Read all files under each directory, recursively. Follow all
symbolic links, unlike -r.
and
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match. (-l is specified by
POSIX.)
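As a small illustration of -l, here is a sketch with two invented files, only one of which contains the search string (the file names and contents are assumptions for the example):

```bash
#!/bin/bash
tmpdir=$(mktemp -d)
printf 'header\nsomestring here\nsomestring again\n' > "$tmpdir/1_fail.csv"
printf 'header\nnothing relevant\n' > "$tmpdir/2_fail.csv"

# -l prints each matching file name once, however many lines match.
matches=$(grep -l "somestring" "$tmpdir"/*_fail.csv)
echo "$matches"
rm -r "$tmpdir"
```

Only the path of 1_fail.csv is printed, and it appears once even though two of its lines match.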
Finally, I was able to find the solution. Here is the complete code:
#!/usr/bin/ksh
file_name="0140127_123933.csv"
for i in 1 2
do
grep -l $file_name /osp/local/var/log/tool2/final_logs/* >log_t.txt;
while read line
do
if [ $(echo "$line" |awk '/success/') ] ## will check the success file
then
CNT_SUCC=`wc -l $line|tr -s " "|cut -d" " -f2`
CNT_SUCC=`expr $CNT_SUCC - 1`
fi
if [ $(echo "$line" |awk '/fail/') ] ## will check the fail file
then
CNT_FAIL=`wc -l $line|tr -s " "|cut -d" " -f2`
CNT_FAIL=`expr $CNT_FAIL - 1`
fi
done <log_t.txt
if [ "${CNT_SUCC:-0}" -gt 0 ] && [ "${CNT_FAIL:-0}" -gt 0 ] ## numeric test; a bare > would be a redirection
then
echo " Fail count = $CNT_FAIL"
echo " Success count = $CNT_SUCC"
exit
fi
echo "waiting for next search..."
sleep 10
done
Thanks everyone for your help.
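One detail worth knowing when comparing counts inside [ ]: an unquoted > is parsed by the shell as output redirection, not as "greater than". A minimal sketch, using a temporary directory so the stray file is easy to spot (all names here are made up):

```bash
#!/bin/bash
tmpdir=$(mktemp -d)
count=0

# Unquoted ">" redirects to a file named 5; the test itself is just [ "$count" ].
if ( cd "$tmpdir" && [ "$count" > 5 ] ); then
    redirect_result=true
else
    redirect_result=false
fi
stray_file=$(ls "$tmpdir")

# -gt performs the numeric comparison.
if [ "$count" -gt 5 ]; then
    numeric_result=true
else
    numeric_result=false
fi

echo "with >: $redirect_result (created file: $stray_file), with -gt: $numeric_result"
rm -r "$tmpdir"
```

The first test succeeds for any non-empty value and leaves an empty file behind, whereas -gt compares the numbers.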
I'm not sure I understand correctly, but is the problem that you can't differentiate the files?
maybe try:
#...
CNT=`expr $CNT - 1`
if [ $(echo $line | grep -o "fail") ]
then
#do something with fail count
else
#do something with success count
fi
