Elegant way to check for equal values within an array or any given textfile - bash

Hello, I'm fairly new to scripting and struggling with trying to test/check whether 4 lines in a text file are equal to each other. I cannot figure this one out, since the comparison examples I find all use two variables. I've come up with this:
#!/bin/sh
#check if mxf videofiles are older than 10 minutes and parse them into tclist.txt
find . -amin +10 |sed "s/^..//" >tclist.txt
#grep timecode and cut : from the output of mxfprobe and place that into variable TC
for z in $(cat tclist.txt); do TC=$(mxfprobe -i "$z" 2>&1 |grep timecode|sed "s/[^0-9]*//"|sed "s/://"|sed "s/://"|sed "s/://")
echo $TC >>offsetcheck.txt
done;
The output of offsetcheck.txt then looks like this:
10194013
10194013
10194014
10194014
How can I test whether those 4 values are equal to each other? (In this example, two files have drifted by one frame.)
I've tried to place those values into an array and check them for uniqueness...
exec 10<&0
exec < offsetcheck.txt
let count=0
while read LINE; do
ARRAY[$count]=$LINE
((count++))
done
echo ${ARRAY[@]}
exec 0<&10 10<&-
if ($ARRAY !== array_unique($ARRAY))
{
echo There were duplicate values
}

... struggling with trying to test/check whether 4 lines in a text file are
equal to each other
You could use sort and wc to determine the number of distinct values in the file. If that number is 1, all lines are equal:
(( $(sort -u offsetcheck.txt | wc -l) == 1 )) && echo "All values are equal" || echo "Values differ"
If you wanted to do the same for an array, you could say:
for i in "${ARRAY[#]}"; do echo "$i" ; done | sort -u | wc -l
to get the number of unique values in the array.
If the values in the array are guaranteed not to have any space, then saying:
echo "${ARRAY[#]}" | tr ' ' '\n' | sort -u | wc -l
would suffice. (But note the if above.)
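A printf-based variant sidesteps that caveat, since printf prints each element on its own line regardless of embedded spaces (a small sketch, using the same ARRAY):
printf '%s\n' "${ARRAY[@]}" | sort -u | wc -l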

Looks to me like the whole process can be reduced to
n=$(
find . -amin +10 |
sed "s/^..//" |
xargs -I FILE mxfprobe -i "FILE" 2>&1 |
grep -h timecode |
sed 's/[^0-9]//g' |
sort -u |
wc -l
)
Then check whether n equals 1.
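For instance, the final test might look like this (a sketch; the messages are placeholders):
if (( n == 1 )); then
    echo "all timecodes match"
else
    echo "timecode drift detected"
fi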

Related

Get full path name of file and its size using awk

I want to get the file names followed by their sizes for all files whose size is in MB or GB. I have done this much so far:
LIST=$(ls -lh -d -1 $PWD/{*,} | awk '{ print $9":"$5 }')
for i in $LIST
do
if [[ $( echo "$i" | cut -f2 -d: | egrep "M|G" | wc -l) -ne 0 ]]
# egrep not working, only finds M
then
echo "$i" >> bigfiles
fi
done
What I am getting is :
amit@C0deDaedalus:~$ test/findbig
/home/amit/Batch:3.8M
/home/amit/Black:3.6M
What I want is :
amit@C0deDaedalus:~$ test/findbig
/home/amit/Batch File Programming.pdf:3.8M
/home/amit/Black Panther - Legend Has It ( Instrumental ).opus:3.6M
Basically, everything is working fine except that the filenames I get are not complete; only the first word is shown. I can't figure out whether there is something wrong with the logic or the syntax, but I think it has something to do with awk.
So, how do I get the full path names of the files (which have spaces in them) in the output?
I have tried the loop trick in awk, but don't know how to get both of the columns to fit in.
You can use read and the convenient placement of the filename at the right side of the ls -l listing. read puts all the "extra" fields into the final variable:
function f_getfields
{
    local perm lnk uname grp size d1 d2 d3 filename
    while read perm lnk uname grp size d1 d2 d3 filename
    do
        echo "$filename $size"
    done < <(ls -l)
}
f_getfields
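If you would rather keep the awk pipeline from the question, a similar effect can be had by joining fields 9 through NF, so that names containing spaces stay intact (a sketch; assumes the usual ls -l column layout and no symlinks in the listing):
ls -lh -d -1 $PWD/{*,} | awk '{ name = $9; for (f = 10; f <= NF; f++) name = name " " $f; print name ":" $5 }'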
The problem is due to the spaces in your file names. The for loop uses spaces as delimiters, so the first item in your list will be "/home/amit/Batch", the second item "File", and so on.
You can use a while loop instead of for, something like:
ls -lh -d -1 $PWD/{*,} | awk '{ print $9":"$5 }' | while read LINE
do
    echo "${LINE}"
    # do your stuff here
done
As an aside, if your only intention is to find large files, you may want to check out the disk usage command:
$ du -a | sort -rn | head

Counting grep results

I'm new to bash, so I'm having trouble doing something very basic.
Through playing with various scripts, I found out that the following script prints the lines that contain "word":
for file in *; do
cat $file | grep "word"
done
Doing the following:
for file in *; do
cat $file | grep "word" | wc -l
done
printed, on each iteration, how many times "word" appeared in that file.
How can I implement a counter for all those appearances and in the
end just echo the counter?
I tried implementing a counter that way, but it stayed 0.
let x+=cat $filename | grep "word"
You can pipe the entire loop to wc -l.
for file in *; do
cat $file | grep "word"
done | wc -l
This is a useless use of cat. How about:
for file in *; do
grep "word" "$file"
done | wc -l
Actually, the entire loop is unnecessary if you pass all the file names to grep at once.
grep "word" * | wc -l
Note that if word shows up more than once on the same line, these solutions will count the entire line only once. If you want to count same-line occurrences separately, you can use -o to print each match on a separate line:
grep -o "word" * | wc -l
The one-liner in John's answer is the way to go. Just to satisfy your curiosity:
sum=0
for f in *; do
    x="$(grep 'word' "$f" | wc -l)"
    echo "x: $x"
    (( sum += x ))
done
echo "sum: $sum"
If the line containing the grep and wc ever fails to yield a number, you are out of luck. That is why you should stick to the other solution, or do a pure bash implementation with things like read, case ... *word*), or if [[ "$line" =~ $re_containing_word ]]; then ...
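A minimal pure-bash sketch of that idea, counting lines that contain the literal string word without calling grep or wc:
sum=0
for f in *; do
    [[ -f $f ]] || continue          # skip directories and other non-files
    while IFS= read -r line; do
        [[ $line == *word* ]] && (( ++sum ))
    done < "$f"
done
echo "sum: $sum"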

convert factor to numeric in bash

What's the most efficient way to convert a factor vector (not all levels are unique) into a numeric vector in bash? The values in the numeric vector do not matter as long as each represents a unique level of the factor.
To illustrate, this would be the R equivalent to what I want to do in bash:
numeric<-seq_along(levels(factor))[factor]
I.e.:
factor
AV1019A
ABG1787
AV1019A
B77hhA
B77hhA
numeric
1
2
1
3
3
Many thanks.
It is probably not the most efficient way, but maybe something to start with.
#!/bin/bash
input_data=$( mktemp )
map_file=$( mktemp )
# your example written to a file
echo -e "AV1019A\nABG1787\nAV1019A\nB77hhA\nB77hhA" >> $input_data
# create a map <numeric, factor> and write it to a file
idx=0
for factor in $( cat $input_data | sort -u )
do
    echo $idx $factor
    let idx=$idx+1
done > $map_file
# go through your file again and replace values with keys
while read line
do
    key=$( cat $map_file | grep -e ".* ${line}$" | awk '{print $1}' )
    echo $key
done < $input_data
# cleanup
rm -f $input_data $map_file
I initially wanted to use associative arrays, but they are a bash 4+ feature and not available everywhere. If you have bash 4, you need one file less, which is obviously more efficient.
#!/bin/bash
# your example written to a file
input_data=$( mktemp )
echo -e "AV1019A\nABG1787\nAV1019A\nB77hhA\nB77hhA" >> $input_data
# declare an array
declare -a factor_map=($( cat $input_data | sort -u | tr "\n" " " ))
# go through your file and replace values with keys
while read line
do
    echo ${factor_map[@]/$line//} | cut -d/ -f1 | wc -w | tr -d ' '
done < $input_data
# cleanup
rm -f $input_data
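For completeness, here is how the associative-array version mentioned above might look with bash 4+ (a sketch; factors.txt stands in for your input file, and the ids follow first-appearance order rather than sorted order):
#!/bin/bash
# map each distinct factor level to a numeric id as it is first seen (requires bash 4+)
declare -A seen
idx=0
while IFS= read -r factor; do
    if [[ -z ${seen[$factor]+x} ]]; then
        (( ++idx ))
        seen[$factor]=$idx
    fi
    printf '%s\n' "${seen[$factor]}"
done < factors.txt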

How to process values from for loop in shell script

I have the below for loop in a shell script:
#!/bin/bash
#Get the year
curr_year=$(date +"%Y")
FILE_NAME=/test/codebase/wt.properties
key=wt.cache.master.slaveHosts=
prop_value=""
getproperty(){
prop_key=$1
prop_value=`cat ${FILE_NAME} | grep ${prop_key} | cut -d'=' -f2`
}
#echo ${prop_value}
getproperty ${key}
#echo "Key = ${key}; Value="${prop_value}
arr=( $prop_value )
for i in "${arr[#]}"; do
echo $i | head -n1 | cut -d "." -f1
done
The output I am getting is as below.
test1
test2
test3
I want to use the test2 from the above results in the below script, in place of 'ABCD':
grep test12345 /home/ptc/storage/'ABCD'/apache/$curr_year/logs/access.log* | grep GET > /tmp/test.access.txt
I tried all the options but was not able to succeed, as I am new to shell scripting.
Ignoring the many bugs elsewhere and focusing on the one piece of code you say you want to change:
for i in "${arr[#]}"; do
val=$(echo "$i" | head -n1 | cut -d "." -f1)
grep test12345 /dev/null "/home/ptc/storage/$val/apache/$curr_year/logs/access.log"* \
| grep GET
done > /tmp/test.access.txt
Notes:
Always quote your expansions. "$i", "/path/with/$val/"*, etc. (The * should not be quoted on the assumption that you want it to be expanded).
for i in $prop_value would have the exact same (buggy) behavior; using arr buys you nothing. If you want arr to actually improve correctness, populate it correctly: read -r -a arr <<<"$prop_value"
The redirection is moved outside the loop -- that way the second iteration through the loop doesn't overwrite the file written by the first one.
The extra /dev/null passed to grep ensures that its behavior is consistent regardless of the number of matches; otherwise, it would display filenames only if more than one matching log file existed, and not otherwise.
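Putting those notes together, the whole script might look like this (a sketch; the paths, property key and file names are taken from the question):
#!/bin/bash
curr_year=$(date +"%Y")
FILE_NAME=/test/codebase/wt.properties
key=wt.cache.master.slaveHosts=
# read the property value and split it into an array
prop_value=$(grep "${key}" "${FILE_NAME}" | cut -d'=' -f2)
read -r -a arr <<<"$prop_value"
for i in "${arr[@]}"; do
    val=${i%%.*}    # keep everything before the first ".", like cut -d "." -f1
    grep test12345 /dev/null "/home/ptc/storage/$val/apache/$curr_year/logs/access.log"* \
        | grep GET
done > /tmp/test.access.txt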

Bash Script to save grep -c results

I am new to programming altogether and am trying to write my first bash script.
I have a file called NUMBERS.txt that has various numbers in it, as such:
1000
1001
1001
1000
1002
1001
etc..
I would like to write a script to count the occurrences of each number, save them, and print them into a new text file like this:
1001= 3
1000= 2
etc..
I am completely stuck.
Here's what I have so far:
#!/bin/bash
for Count in `grep -c '1000' /NUMBERS.txt `
do
echo 'Count = '${Count}
done
for Count in `grep -c '1001' /NUMBERS.txt `
do
echo 'Count = '${Count}
done
Sort the file then count how many times each unique line occurs:
sort NUMBERS.txt | uniq -c
Since your file already has one number on each line, it is simpler:
for i in `sort -u NUMBERS.txt ` ; do count=`grep -c "$i" NUMBERS.txt ` ; echo "$i=$count" ; done > your_result.txt
or in a different format
for i in `sort -u NUMBERS.txt `
do
count=`grep -c "$i" NUMBERS.txt `
echo "$i=$count"
done > your_result.txt
As pointed out, the performance of this is not very good; here is a much better one:
sort NUMBERS.txt | uniq -c | awk '{print $2"=", $1}'
Basically, you go through NUMBERS.txt twice. On the first pass you get the unique numbers;
on the second pass you count the occurrences of each unique number.
I'm not the best at shell scripting, but here is a solution that works, using bash and grep -c:
#!/bin/bash
INPUT="./numbers.txt"
OUTPUT="./result.txt"
rm -f ${OUTPUT}
# you might want to change the values
for i in {1000..2000}; do
    for Count in `grep -c ${i} ${INPUT}`; do
        echo "${i} = ${Count}" >> ${OUTPUT}
    done
done
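If you prefer to avoid the fixed {1000..2000} range, a single awk pass can count every distinct number and print it in the requested format (a sketch; the order of the output lines is unspecified):
awk '{ count[$1]++ } END { for (n in count) print n"=", count[n] }' NUMBERS.txt > result.txt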
