How to perform arithmetic operations on the output of a shell command - tcsh

I'm trying to count the number of entries in a set of log files. Some of these logs have lines that should not be counted (the number of these remains constant). The way I'd like to go about this is a Perl script that iterates over a hash mapping log names to a one-liner that gets the number of entries for that particular log (I figured this would be easier to maintain than dozens of if-else statements).
Getting the number of lines is simple:
wc -l [logfile] | cut -f1 -d " "
The issue is when I need to subtract, say, 1 or 2 from this value. I tried the following:
expr( wc -l [logfile] | cut -f1 -d " " ) - 1
But this results in an error:
Badly placed ()'s.
: Command not found.
How do I perform arithmetic operations on the output of a shell command? Is there a better way to do this?

To display one less than the number of lines with bash or any Bourne-like shell (for tcsh, see the csh answer further down):
echo $(( $(wc -l <file) - 1 ))
Discussion
To get the number of lines, you used:
wc -l logfile | cut -f1 -d " "
cut is required here because wc copies the file name to its output. To avoid that, and thus avoid the need for cut, supply the input to wc via stdin:
wc -l <logfile
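For example (output illustrative, assuming a 42-line logfile):
wc -l logfile     # prints: 42 logfile
wc -l <logfile    # prints: 42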
In modern (POSIX) shells, arithmetic is done with $((...)). Thus, we can subtract one from the number of lines via:
$(( $(wc -l <file) - 1 ))
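For the original use case (a fixed number of lines to ignore per log), a minimal bash sketch using an associative array; the log names and offsets below are hypothetical:

#!/bin/bash
# Hypothetical map from log name to the number of lines that should not be counted.
declare -A skip=( [access.log]=1 [error.log]=2 )
for log in "${!skip[@]}"; do
    entries=$(( $(wc -l < "$log") - skip[$log] ))
    echo "$log: $entries entries"
done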

It's a bit clunky to shell out to wc and cut just to count the number of lines in a file.
Your requirement isn't very clear, but this Perl code creates a hash that relates every log file in the current directory to the number of lines it contains. It works by reading each file into an array of lines, and then evaluating that array in scalar context to give the line count. I hope it's obvious how to subtract a constant delta from each line count.
use strict;
use warnings;

my %lines;

for my $logfile ( glob '*.log' ) {
    my $num_lines = do {
        open my $fh, '<', $logfile or die qq{Unable to open "$logfile" for input: $!};
        my @lines = <$fh>;    # list assignment evaluated in scalar context yields the line count
    };
    $lines{$logfile} = $num_lines;
}
Update
After a comment from w.k, I think this version may be rather nicer:
use strict;
use warnings;

my %lines;

for my $logfile ( glob '*.log' ) {
    open my $fh, '<', $logfile or die qq{Unable to open "$logfile" for input: $!};
    1 while <$fh>;            # read to end of file so that $. holds the line count
    $lines{$logfile} = $.;
}

The existing answers went in the direction of solving your issue in Perl, which you mentioned, but your own experiments were in shell syntax.
You indicated tcsh, but expr is POSIX shell syntax.
Here is an example of a csh script that counts the number of lines in the file whose name it is passed, and then does arithmetic on the count (in csh, arithmetic assignment is done with the @ builtin):
set lines=`wc -l < $1`
@ oneless = ($lines - 1)
echo "There are $lines lines in $1 and minus one makes $oneless"
Test:
csh count.csh count.csh
There are 3 lines in count.csh and minus one makes 2

Related

Counting number of lines in file and saving it in a bash file

I am trying to loop through all the files in a folder and add the names of the files with 10 lines to a txt file, but I don't know how to write the if statement.
As of right now, what I have is:
for FILE in *.txt do if wc $FILE == 10; then "$FILE" >> saved_names.txt fi done
I am stuck on how to format the expression so that it evaluates to a boolean for the if statement.
I have already tried the if statement as:
if [ wc $FILE != 10 ]
if "wc $FILE" != 10
if "wc $FILE != 10"
as well as other ways, but I don't seem to get it right. I know I am new to Bash, but I can't seem to find a solution to this question.
There are a few problems in your code.
To count the number of lines in the file you should run the wc -l command. However, that command outputs both the number of lines and the name of the file (for example: 10 a.txt; you can test it by running the command on a file in your terminal). To get only the number of lines, redirect the file to the command's standard input: wc -l < a.txt.
== is used in bash to compare strings. To compare integers, as in this case, you should use -eq (take a look at https://tldp.org/LDP/abs/html/comparison-ops.html).
In terms of brackets: to substitute a command's output into the code, use command substitution: $(wc -l < a.txt). To evaluate a comparison as a boolean, use square brackets with spaces: [ 1 -eq 1 ].
To save the name of the file in another file using >>, you first need to write the name to standard output (>> redirects standard output to the chosen place). You can use the echo command for that.
The code should look like this:
#!/bin/bash
for FILE in *.txt
do
    if [ "$(wc -l < "$FILE")" -eq 10 ]
    then
        echo "$FILE" >> saved_names.txt
    fi
done
Try:
for file in *.txt; do
    if [[ $(wc -l < "$file") -eq 10 ]]; then
        printf '%s\n' "$file"
    fi
done > saved_names.txt
Change > to >> if you want to append the filenames. Redirecting once, after done, opens saved_names.txt a single time for the whole loop rather than once per matching file.
Related docs:
Command Substitution
Conditional Constructs
Extract the actual number of lines from a file with wc -l "$FILE" | cut -f1 -d' ' and use the -eq operator:
for FILE in *.txt; do if [ "$(wc -l "$FILE" | cut -f1 -d' ')" -eq 10 ]; then echo "$FILE" >> saved_names.txt; fi; done

How to only concatenate files with same identifier using bash script?

I have a directory with files, some have the same ID, which is given in the first part of the file name before the first underscore (always). e.g.:
S100_R1.txt
S100_R2.txt
S111_1_R1.txt
S111_R1.txt
S111_R2.txt
S333_R1.txt
I want to concatenate the files with identical IDs (and, if possible, place the original files in another dir). Example output:
original files (folder)
S100_merged.txt
S111_merged.txt
S333_R1.txt
Small note: I imagine that a solution might be to place all the files to be processed by the code in a new directory, and then in a second step move the files with the appended "merged" back to the original dir, or something like that...
I am extremely new to bash scripting, so I really can't produce this code. I am used to the R language; I can see how it should work, but I can't write it.
My pitiful attempt is something like this:
while IFS= read -r -d '' id; do
cat *"$id" > "./${id%.txt}_grouped.txt"
done < <(printf '%s\0' *.txt | cut -zd_ -f1- | sort -uz)
or this:
for ((k=100;k<400;k=k+1));
do
IDList= echo "S${k}_S*.txt" | awk -F'[_.]' '{$1}'
while [ IDList${k} == IDList${k+n} ]; do
cat IDList${k}_S*.txt IDList${k+n}_S*.txt S${k}_S*.txt S${k}_S*.txt >cat/S${k}_merged.txt &;
done
Sometimes there is only one version of a file (e.g. S333_R1.txt), sometimes two (S100*), three (S111*), or more.
I am prepared for harsh critique of this question because I am so far from a solution, but if someone would be willing to help me out, I would greatly appreciate it!
while read -r line
do
    if [[ "$(find . -maxdepth 1 -name "${line}_*.txt" | wc -l)" -gt 1 ]]
    then
        cat "${line}"_*.txt >> "${line}_merged.txt"
    fi
done <<< "$(for i in *_*.txt; do echo "$i"; done | awk -F_ '{ print $1 }' | sort -u)"
List the files matching *_*.txt and run the output into awk, printing the part before the first "_"; sort -u deduplicates the prefixes. Run these through a while loop. For each prefix, check with find whether the number of files with that prefix is greater than 1, and if it is, cat the files with that prefix into a merged file.
for id in $(ls | grep -Po '^[^_]+' | uniq) ; do
    if [ $(ls ${id}_*.txt 2> /dev/null | wc -l) -gt 1 ] ; then
        cat ${id}_*.txt > _${id}_merged.txt
        mv ${id}_*.txt folder
    fi
done
for f in _*_merged.txt ; do
    mv ${f} ${f:1}
done
A plain bash loop with preprocessing:
# first get the list of files
find . -type f |
# then extract the prefix
sed 's#./\([^_]*\)_#\1\t&#' |
# then in a loop merge the files
while IFS=$'\t' read prefix file; do
    cat "$file" >> "${prefix}_merged.txt"
done
That script is iterative: one file at a time. To detect whether there is only one file with a specific prefix, we have to look at all the files at once. So first, an awk script joins the filenames that share a common prefix:
find . -type f | # maybe `sort |` ?
# join filenames with common prefix
awk '{
    f=$0;                            # remember the file path
    gsub(/.*\//,"");gsub(/_.*/,"");  # extract prefix from filepath and store it in $0
    a[$0]=a[$0]" "f                  # join path with a leading space in an associative array indexed by prefix
}
# Output prefix and filenames separated by spaces.
# TBH a tab would be a better separator..
END{for (i in a) print i a[i]}
' |
# Read input separated by spaces into a bash array
while IFS=' ' read -ra files; do
    # first array element is the prefix
    prefix=${files[0]}
    unset files[0]
    # the rest are the files
    case "${#files[@]}" in
    0) echo super error; ;;
    # one file - preserve the filename
    1) cat "${files[@]}" > "$outdir"/"${files[1]}"; ;;
    # more files - use a _merged.txt suffix
    *) cat "${files[@]}" > "$outdir"/"${prefix}_merged.txt"; ;;
    esac
done
Tested on repl.
IDList= echo "S${k}_S*.txt"
Executes the command echo with the environment variable IDList set to the empty string for that command, with one argument equal to S<insert value of k here>_S*.txt.
Filename expansion (i.e. * -> list of files) is not performed inside " double quotes.
To assign the result of an execution to a variable, use command substitution: var=$( something | something ).
IDList${k+n}_S*.txt
${var+pattern} is a variable expansion that does not add two variables together. It expands to pattern when var is set and to nothing when var is unset. See shell parameter expansion and my answer on ${var-pattern}, which is similar.
To add two numbers, use arithmetic expansion: $((k + n)).
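For illustration, a quick demonstration of the difference (output shown in comments):

unset n
echo "${n+set}"     # prints an empty line: n is unset
n=5
echo "${n+set}"     # prints: set
k=100 n=3
echo "$((k + n))"   # prints: 103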
awk -F'[_.]' '{$1}'
{$1} on its own is just invalid here. To print the first field, print it: {print $1}.
Remember to check your scripts with http://shellcheck.net
A pure bash way is below. It uses only globs (no need for external commands like ls or find for this question) to enumerate filenames, and an associative array (supported by bash since version 4.0) to compute id frequencies. Parsing ls output to list files is questionable in bash; you may consider reading ParsingLs.
#!/bin/bash
backupdir=original_files # The directory to move the original files to
declare -A count         # Associative array to hold id counts
# If it is assumed that the backup directory exists prior to the call,
# then drop the line below
mkdir "$backupdir" || exit
for file in [^_]*_*; do ((++count[${file%%_*}])); done
for id in "${!count[@]}"; do
    if ((count[$id] > 1)); then
        mv "$id"_* "$backupdir"
        cat "$backupdir/$id"_* > "$id"_merged.txt
    fi
done
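As a side note, the id extraction with ${file%%_*} strips the longest suffix matching _*, i.e. everything from the first underscore onward:

file=S111_1_R1.txt
echo "${file%%_*}"   # prints: S111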

Counting the number of special-character delimiters in a file - bash shell script performance improvement

Hi, I have a script that counts the number of records in a file and finds the expected number of delimiters per record by dividing the delimiter count (rs_count) by the total record count. It works fine, but it is a little slow on large files. I was wondering if there is a way to improve performance. The RS is a special character, octal \246. I am using a bash shell script.
Some additional info:
A line is a record.
The file will always have the same number of delimiters.
The purpose of the script is to check if the file has the expected number of fields. After calculating it, the script just echos it out.
for file in $SOURCE; do
    echo "executing File -"$file
    if (( $total_record_count != 0 )); then
        filename=$(basename "$file")
        total_record_count=$(wc -l < $file)
        rs_count=$(sed -n 'l' $file | grep -o $RS | wc -l)
        Delimiter_per_record=$((rs_count/total_record_count))
    fi
done
Counting the delimiters (not total records) in a file
On a file with 50,000 lines, I see around a 10-fold speedup by collapsing the sed, grep, and wc pipeline into a single awk process:
awk -v RS='Delimiter' 'END{print NR -1}' input_file
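For instance (illustrative, using a throwaway file):

printf 'a\246b\246c\n' > /tmp/sample
awk -v RS=$'\246' 'END{print NR - 1}' /tmp/sample   # prints: 2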
Dealing with wc when there's no trailing line breaks
If you count the instances of ^ (start of line), you will get a true count of lines. Using grep:
grep -co "^" input_file
(Thankfully, even though ^ is a regex, the performance of this is on par with wc)
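To see why this matters, consider a file without a final newline (illustrative):

printf 'one\ntwo' > /tmp/nofinal   # two lines, no trailing newline
wc -l < /tmp/nofinal               # prints: 1 (wc counts newline characters)
grep -co "^" /tmp/nofinal          # prints: 2 (counts line starts)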
Incorporating these two modifications into a trivial test based on your supplied code:
#!/usr/bin/env bash
SOURCE="$1"
RS=$'\246'
for file in $SOURCE; do
    echo "executing File -"$file
    if [[ $total_record_count != 0 ]]; then
        filename=$(basename "$file")
        total_record_count=$(grep -oc "^" $file)
        rs_count="$(awk -v RS=$'\246' 'END{print NR -1}' $file)"
        Delimiter_per_record=$((rs_count/total_record_count))
    fi
done
echo -e "\$rs_count:\t${rs_count}\n\$Delimiter_per_record:\t${Delimiter_per_record}\n\$total_record_count:\t${total_record_count}" | column -t
Running this on a file with 50,000 lines on my macbook:
time ./recordtest.sh /tmp/randshort
executing File -/tmp/randshort
$rs_count: 186885
$Delimiter_per_record: 3
$total_record_count: 50000
real 0m0.064s
user 0m0.038s
sys 0m0.012s
Unit test one-liner
(creates /tmp/recordtest, chmod +x's it, creates /tmp/testfile with 10 lines of random characters including octal \246, and then runs the script file on the testfile)
echo $'#!/usr/bin/env bash\n\nSOURCE="$1"\nRS=$\'\\246\'\n\nfor file in $SOURCE; do\n echo "executing File -"$file\n if [[ $total_record_count != 0 ]];then\n filename=$(basename "$file")\n total_record_count=$(grep -oc "^" $file)\n rs_count="$(awk -v RS=$\'\\246\' \'END{print NR -1}\' $file)"\n Delimiter_per_record=$((rs_count/total_record_count))\n fi\ndone\n\necho -e "\\$rs_count:\\t${rs_count}\\n\\$Delimiter_per_record:\\t${Delimiter_per_record}\\n\\$total_record_count:\\t${total_record_count}" | column -t' > /tmp/recordtest ; echo $'\246459ca4f23bafff1c8fc017864aa3930c4a7f2918b\246753f00e5a9278375b\nb\246a3\246fc074b0e415f960e7099651abf369\246a6f\246f70263973e176572\2467355\n1590f285e076797aa83b2ee537c7f99\24666990bb60419b8aa\246bb5b6b\2467053\n89b938a5\246560a54f2826250a2c026c320302529331229255\246ef79fbb52c2\n9042\246bb\246b942408a22f912268ffc78f08c\2462798b0c05a75439\246245be2ea5\n0ef03170413f90e\246e0\246b1b2515c4\2466bf0a1bb\246ee28b78ccce70432e6b\24653\n51229e7ab228b4518404360b31a\2463673261e3242985bf24e59bc657\246999a\n9964\246b08\24640e63fae788ea\246a1777\2460e94f89af8b571e\246e1b53e6332\246c3\246e\n90\246ae12895f\24689885e\246e736f942080f267a275132a348ec1e837b99efe94\n2895e91\246\246f506f\246c1b986a63444b4258\246bc1b39182\24630\24696be' > /tmp/testfile ; chmod +x /tmp/recordtest ; /tmp/./recordtest /tmp/testfile
Which produces this result:
$rs_count: 39
$Delimiter_per_record: 3
$total_record_count: 10
Though there are a number of solutions for counting instances of characters in files, quite a few come undone when processing special characters like octal \246. awk seems to handle it reliably and quickly.

piping files in unix for "wc" while retaining filename

I have a bunch of files of the form myfile[somenumber] that are in nested directories.
I want to generate a line count on each of the files, and output that count to a file.
These files are binary and so they have to be piped through an additional script open_file before they can be counted by "wc". I do:
ls ~/mydir/*/*/other_dir/myfile* | while read x; do open_file $x | wc -l; done > stats
This works, but the problem is that it outputs the line counts to the file stats without the original filename. For example, it outputs:
100
150
instead of:
/mydir/...pathhere.../myfile1: 100
/mydir/...pathhere.../myfile2: 150
Second question:
What if I wanted to divide the number of wc -l by a constant, e.g. dividing it by 4, before outputting it to the file?
I know that the number of lines is a multiple of 4 so the result should be in an integer. Not sure how to do that from the above script.
how can I make it put the original filename and the wc -l result in the output file?
thank you.
You can output the file name before counting the lines:
echo -n "$x: " ; open_file $x | wc -l. The -n parameter to echo omits the trailing newline in the output.
To divide integers, you can use expr, e.g., expr $(open_file $x | wc -l) / 4 (the spaces around / are required, since expr takes each operand and operator as a separate argument).
So, the complete while loop will look as follows:
while read x; do echo -n "$x: " ; expr $(open_file $x | wc -l) / 4 ; done
Try this:
while read x; do echo -n "$x: " ; s=$(open_file $x | wc -l); echo $(($s / 4)); done
You've thrown away the filename by the time you get to wc(1) -- all it ever sees is a pipe(7) -- but you can echo the filename yourself before opening the file. If open_file fails, this will leave you with an ugly output file, but it might be a suitable tradeoff.
The $((...)) uses arithmetic expansion. It is POSIX, so it works in bash(1) and any modern sh-compatible shell, though not in csh/tcsh.

How do I divide the output of a command by two, and store the result into a bash variable?

Say I wanted to do this command:
(cat file | wc -l)/2
and store it in a variable such as middle, how would i do it?
I know it's simply not the case of
$middle=$(cat file | wc -l)/2
So how would I do it?
middle=$((`wc -l < file` / 2))
middle=$((`wc -l file | awk '{print $1}'`/2))
This relies on Bash being able to reference the first element of an array using scalar syntax, and on the fact that it does word splitting on whitespace by default.
middle=($(wc -l file)) # create an array which looks like: middle='([0]="57" [1]="file")'
middle=$((middle / 2)) # do the math on ${middle[0]}
The second line can also be:
((middle /= 2))
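For illustration (assuming a 57-line file named file):

middle=($(wc -l file))   # middle[0]=57, middle[1]="file"
echo "$middle"           # prints: 57 (a bare $middle is ${middle[0]})
((middle /= 2))          # arithmetic on a bare name also acts on element 0
echo "$middle"           # prints: 28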
When assigning variables, you don't use the $.
Here is what I came up with:
mid=$(cat file | wc -l)
middle=$((mid/2))
echo $middle
The double parentheses on the second line are important: $((...)) is Bash's arithmetic expansion, which evaluates its contents as an integer expression.
Using awk:
middle=$(awk 'END{print NR/2}' file)
Note that awk does floating-point division, so an odd line count yields a fractional result; use int(NR/2) if you want truncation.
You can also make your own "wc" using just the shell:
linec(){
    i=0
    while read -r line
    do
        ((i++))
    done < "$1"
    echo $i
}
middle=$(linec "file")
echo "$middle"
