Collect output in bash before writing

I've got the following issue: I am parsing a file in a bash script and preparing an output format like this:
echo "Evaluation of data from $date" > $outPut
printf "\n%s\t\t\t%s\t\t%s\n" "column1" "column2" "column3" >> $outPut
printf '%*s\n' "${COLUMNS:-$(tput cols)}" '' | tr ' ' - >> $outPut
cat $file | \
while read i;
do
...
printf "..."
done >> $outPut
It works fine if $outPut is a file, but depending on a program parameter $outPut can be /dev/stdout.
If I pipe the output to less:
bash someprogram.bash --tostdout | less
less starts immediately but shows no output. After a while I see everything, but if I press G to jump to the end, less just stops responding and I can only kill it with Ctrl-C. If I don't pipe the output, everything works fine.
What I want is for the script to collect its output and write everything at once, so that the pipe waits for my program to finish.

I found a workaround: first write to a temp file, then cat it to stdout afterwards:
tmpFile=$(mktemp -p /tmp)
echo "Evaluation of data from $date" > $tmpFile
printf "\n%s\t\t\t%s\t\t%s\n" "column1" "column2" "column3" >> $tmpFile
printf '%*s\n' "${COLUMNS:-$(tput cols)}" '' | tr ' ' - >> $tmpFile
cat $file | \
while read i;
do
...
printf "..."
done >> $tmpFile
cat $tmpFile
rm $tmpFile
The advantage is that less gets the full output of the script at once rather than chunk after chunk.
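An alternative to the temp file is to collect everything in a shell function and emit it with a single write. A minimal sketch, assuming the per-line processing can live inside a function (the name report is mine, not from the original script):

report() {
    echo "Evaluation of data from $date"
    printf "\n%s\t\t\t%s\t\t%s\n" "column1" "column2" "column3"
    printf '%*s\n' "${COLUMNS:-$(tput cols)}" '' | tr ' ' -
    while read -r i; do
        # per-line processing as before
        printf "..."
    done < "$file"
}

# Command substitution buffers the whole report in memory;
# the single printf then writes it to $outPut (or /dev/stdout) in one go.
out=$(report)
printf '%s\n' "$out" > "$outPut"

less then receives the complete report, just as with the temp-file workaround, but without the cleanup step.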

Related

Concatenate the output of 2 commands in the same line in Unix

I have a command like below
md5sum test1.txt | cut -f 1 -d " " >> test.txt
I want the output of the above command prefixed with File_CheckSum:
Expected output: File_CheckSum: <checksumvalue>
I tried the following:
echo 'File_Checksum:' >> test.txt | md5sum test.txt | cut -f 1 -d " " >> test.txt
but I'm getting the result as:
File_Checksum:
adbch345wjlfjsafhals
I want the entire output on one line:
File_Checksum: adbch345wjlfjsafhals
echo writes a newline after it finishes writing its arguments. Some versions of echo allow a -n option to suppress this, but it's better to use printf instead.
You can use a command group to concatenate the standard output of your two commands:
{ printf 'File_Checksum: '; md5sum test.txt | cut -f 1 -d " "; } >> test.txt
Note that there is a race condition here: you can theoretically write to test.txt before md5sum is done reading from it, causing you to checksum more data than you intended. (Your original command mentions test1.txt and test.txt as separate files, so it's not clear if you are really reading from and writing to the same file.)
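If the file being hashed really is the same file being written to, one way to sidestep that race is to compute the checksum first and only append afterwards (a small sketch, not part of the original answer):

sum=$(md5sum test.txt | cut -d ' ' -f 1)      # md5sum finishes reading before anything is written
printf 'File_Checksum: %s\n' "$sum" >> test.txt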
You can use command grouping to have a list of commands executed as a unit and redirect the output of the group at once:
{ printf 'File_Checksum: '; md5sum test1.txt | cut -f 1 -d " "; } >> test.txt
printf "%s: %s\n" "File_Checksum:" "$(md5sum < test1.txt | cut ...)" > test.txt
Note that if you are trying to compute the hash of test.txt (the same file you are trying to write to), this changes things significantly.
Another option is:
{
printf "File_Checksum: "
md5sum ...
} > test.txt
Or:
exec > test.txt
printf "File_Checksum: "
md5sum ...
but be aware that all subsequent commands will also write their output to test.txt. The typical way to restore stdout is:
exec 3>&1
exec > test.txt # Redirect all subsequent commands to `test.txt`
printf "File_Checksum: "
md5sum ...
exec >&3 # Restore original stdout
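Once stdout has been restored you may also want to close the saved descriptor so it doesn't leak into child processes (a small addition to the snippet above):

exec 3>&-   # close the spare copy of stdout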
You can also chain commands with the && operator, which runs the second command only if the first succeeds, e.g. mkdir example && cd example.

Bash - Extract Matching String from GZIP Files Is Running Very Slow

Complete novice in Bash here. I'm trying to iterate through 1000 gzip files; maybe GNU parallel is the solution?
#!/bin/bash
ctr=0
echo "file_name,symbol,record_count" > $1
dir="/data/myfolder"
for f in "$dir"/*.gz; do
gunzip -c $f | while read line;
do
str=`echo $line | cut -d"|" -f1`
if [ "$str" == "H" ]; then
if [ $ctr -gt 0 ]; then
echo "$f,$sym,$ctr" >> $1
fi
ctr=0
sym=`echo $line | cut -d"|" -f3`
echo $sym
else
ctr=$((ctr+1))
fi
done
done
Any help to speed up the process would be greatly appreciated!
#!/bin/bash
ctr=0
export ctr
echo "file_name,symbol,record_count" > $1
dir="/data/myfolder"
export dir
doit() {
f="$1"
gunzip -c $f | while read line;
do
str=`echo $line | cut -d"|" -f1`
if [ "$str" == "H" ]; then
if [ $ctr -gt 0 ]; then
echo "$f,$sym,$ctr"
fi
ctr=0
sym=`echo $line | cut -d"|" -f3`
echo $sym >&2
else
ctr=$((ctr+1))
fi
done
}
export -f doit
parallel doit ::: "$dir"/*.gz 2>&1 >> "$1"
The Bash while read loop is probably your main bottleneck here. Calling multiple external processes for simple field splitting will exacerbate the problem. Briefly,
while IFS="|" read -r first second third rest; do ...
leverages the shell's built-in field splitting functionality, but you probably want to convert the whole thing to a simple Awk script anyway.
echo "file_name,symbol,record_count" > "$1"
for f in "/data/myfolder"/*.gz; do
gunzip -c "$f" |
awk -F "\|" -v f="$f" -v OFS="," '
/H/ { if(ctr) print f, sym, ctr
ctr=0; sym=$3;
print sym >"/dev/stderr"
next }
{ ++ctr }'
done >>"$1"
This vaguely assumes that printing the lone sym is just for diagnostics. It should hopefully not be hard to see how this can be refactored if this is an incorrect assumption.
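The two answers can also be combined, since each file is processed independently: wrap the Awk pass in a function and hand the files to GNU parallel. A hedged sketch (the name doawk is mine, the diagnostic print of sym is dropped, and "$1" inside the function is the .gz file handed over by parallel, while "$1" at the top level is still the script's output-file argument):

doawk() {
    gunzip -c "$1" |
    awk -F '|' -v f="$1" -v OFS="," '
        $1 == "H" { if (ctr) print f, sym, ctr
                    ctr = 0; sym = $3; next }
        { ++ctr }'
}
export -f doawk

echo "file_name,symbol,record_count" > "$1"
parallel doawk ::: /data/myfolder/*.gz >> "$1"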

Redirection based on variables assigned within while loop setup

I want to access a while loop variable outside the loop:
while read line
do
...
...
...
done < $file > /home/Logs/Sample_$line_$(date +"%Y%m%d_%H%M%S").log
In the above example, the log file that gets generated doesn't contain the value of the line variable, i.e. $line is not expanding here.
Please let me know how this can be written to make it work.
#!/bin/sh
exec 1> /home/Logs/`basename $0 | cut -d"." -f1 | sed 's/\.sh//g'`_$(date +"%Y%m%d_%H%M%S").log 2>&1
echo "Execution Started : `date` \n"
SQL_DIR=/home/Sql
INFILE=in_file
TEMPFILE=temp_file
RETURN_CODE=0
ls -ltr $SQL_DIR|grep ".sql"|awk -F" " '{print $9}'|awk -F"." '{print $1}' > $INFILE
sed '/^$/d' $INFILE > $TEMPFILE; mv $TEMPFILE $INFILE
while read line
do
{
START_TIME=`date +%s`
printf "\n SQL File Executed Is : $line.sql"
SQL_FILE_NM=$line.sql
SQL_FILE=$SQL_DIR/$SQL_FILE_NM
nzsql -db netezza_db -Atqf $SQL_FILE > /dev/null
RETURN_CODE=$?
if [ $RETURN_CODE -eq 0 ]; then
echo "Time taken to execute sqlfile $SQL_FILE=$TT_HRS:$TT_MINS:$TT_REM_SECS HH:MM:SS" > $TEMPFILE
printf "\n Success: Sql completed successfully at `date` \n"
cat $TEMPFILE|mailx -s "Time taken to execute sqlfile $SQL_FILE=$TT_HRS:$TT_MINS:$TT_REM_SECS HH:MM:SS" 'koushik.chandra#a.com'
else
printf "\n Error: Failed in sql execution at `date`"
exit $RETURN_CODE
fi
END_TIME=`date +%s`
TT_SECS=$(( END_TIME - START_TIME))
TT_HRS=$(( TT_SECS / 3600 ))
TT_REM_MS=$(( TT_SECS % 3600 ))
TT_MINS=$(( TT_REM_MS / 60 ))
TT_REM_SECS=$(( TT_REM_MS % 60 ))
printf "\n"
printf "Total time taken to execute the sql $line="$TT_HRS:$TT_MINS:$TT_REM_SECS HH:MM:SS
printf "\n"
} > /home/Logs/sql_query_time_$line_$(date +"%Y%m%d_%H%M%S").log
done < $INFILE
rm -f $INFILE $TEMPFILE
exit $RETURN_CODE
You actually need redirection inside the while loop:
while read -r line; do
{ cmd1; cmd2; cmd3; } > "/home/Logs/Sample_${line}_$(date +"%Y%m%d_%H%M%S").log"
done < "$file"
When you put > outfile after done, the redirection is opened once, before the loop runs, so $line has no value yet and all of the output goes to that single file. Note also that in Sample_$line_$(date ...) the shell looks up a variable named line_ (underscores are valid in variable names), which is another reason to write ${line} with braces as above.
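If, on the other hand, you want one log for the whole run, the file name can only be built from values known before the loop starts; a sketch keeping the Sample_ prefix and timestamp from the question ($line necessarily drops out of the name):

logfile="/home/Logs/Sample_$(date +"%Y%m%d_%H%M%S").log"
while read -r line; do
    cmd1; cmd2; cmd3
done < "$file" > "$logfile"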

A script to find all the users who are executing a specific program

I've written a bash script (searchuser) which should display all the users who are executing a specific program or script (at least a bash script). But searching for scripts fails, because the command the OS is actually executing is something like bash scriptname.
The script works by parsing the ps output: it searches for all occurrences of the specified program name, extracts the user and the program name, verifies that the program name is the one we're searching for and, if it is, displays the relevant information (in this case the user name and the program name; it might be better to also output the PID, but that is simple to add). The verification is there to reject lines whose program name merely contains the search string but isn't the program we're looking for; if we're searching for gedit we don't want to match sgedit or gedits.
Other issues I have are:
I would like to avoid the use of a temp file.
I would like not to be tied to GNU extensions.
The script has to be executed as:
root# searchuser programname <enter>
The script searchuser is the following:
#!/bin/bash
i=0
search=$1
tmp=`mktemp`
ps -aux | tr -s ' ' | grep "$search" > $tmp
while read fileline
do
user=`echo "$fileline" | cut -f1 -d' '`
prg=`echo "$fileline" | cut -f11 -d' '`
prg=`basename "$prg"`
if [ "$prg" = "$search" ]; then
echo "$user - $prg"
i=`expr $i + 1`
fi
done < $tmp
if [ $i = 0 ]; then
echo "No users are executing $search"
fi
rm $tmp
exit $i
Do you have suggestions on how to solve these issues?
One approach might look like this:
IFS=$'\n' read -r -d '' -a pids < <(pgrep -x -- "$1"; printf '\0')
if (( ! ${#pids[@]} )); then
echo "No users are executing $1"
fi
for pid in "${pids[#]}"; do
# build a more accurate command line than the one ps emits
args=( )
while IFS= read -r -d '' arg; do
args+=( "$arg" )
done </proc/"$pid"/cmdline
(( ${#args[@]} )) || continue # exited while we were running
printf -v cmdline_str '%q ' "${args[@]}"
user=$(stat --format=%U /proc/"$pid") || continue # exited while we were running
printf '%q - %s\n' "$user" "${cmdline_str% }"
done
Unlike the output from ps, which doesn't distinguish between ./command "some argument" and ./command "some" "argument", this will emit output which correctly shows the arguments run by each user, with quoting which will re-run the given command correctly.
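Because the arguments are %q-quoted, the captured string can even be replayed later if you need to reproduce what a user was running (illustration only; this assumes argv[0] still resolves from your PATH or current directory):

eval "$cmdline_str"   # re-executes the captured argument vector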
What about:
ps -e -o user,comm | egrep "^[^ ]+ +$1$" | cut -d' ' -f1 | sort -u
Addendum:
This statement:
ps -e -o user,pid,comm | egrep "^\s*\S+\s+\S+\s*$1$" | while read a b; do echo $a; done | sort | uniq -c
or this one:
ps -e -o user,pid,comm | egrep "^\s*\S+\s+\S+\s*sleep$" | xargs -L1 echo | cut -d ' ' -f1 | sort | uniq -c
shows the number of process instances by user.
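Folding this back into the original script while addressing the two stated constraints (no temp file, no GNU extensions) could look roughly like this; a sketch, not a drop-in replacement, and it shares the one-liner's limitation that a script run as bash scriptname shows up as bash:

#!/bin/sh
search=$1
# -o user= -o comm= prints just the two columns with no header; awk keeps exact name matches only
matches=$(ps -e -o user= -o comm= | awk -v p="$search" '$2 == p { print $1 " - " $2 }')
if [ -z "$matches" ]; then
    echo "No users are executing $search"
else
    printf '%s\n' "$matches"
fi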

Bash - sometimes creates only empty output

I am trying to create a bash dictionary script that takes its first argument and creates a file named after it, then takes the remaining arguments (files inside the same folder) and writes their content into that file. It also sorts, deletes symbols, etc. The main problem is that the output file is sometimes empty (I am passing one non-empty file and one non-existent file); after deleting it and running the script a few more times, it is sometimes empty and sometimes not.
#!/bin/bash
numberoffileargs=$(( $# - 1 ))
exitstat=0
counterexit=0
acceptingstdin=0;
> "$1";
#check if we have given input files given
if [ "$#" -gt 1 ]; then
#for cycle going through input files
for i in "${#:2}"
do
#check whether input file is readable
if [ -r "${i}" ]; then
cat "${i}" >> "$1"
#else redirect to standard output
else
exitstat=2
counterexit=$((counterexit + 1))
echo "file does not exist" 1>&2
fi
done
else
echo "stdin code to be done"
acceptingstdin=1
#stdin input to output file
#stdin=$(cat)
fi
#one word for each line, alphabetical sort, alphabet only, remove duplicates
#all lowercase
#sort -u >> "$1"
if [ "$counterexit" -eq "$numberoffileargs" ] && [ "$acceptingstdin" -eq 0 ]; then
exitstat=3
fi
cat "$1" | sed -r 's/[^a-zA-Z\-]+/ /g' | tr A-Z a-z | tr ' ' '\n' | sort -u | sed '/^$/d' > "$1"
echo "$numberoffileargs"
echo "$counterexit"
echo "$exitstat"
exit $exitstat
Here is your script with some syntax improvements. Your trouble comes from the fact that the dictionary was both the input and the output of your pipeline; I added a temp file to fix it.
#!/bin/bash
(($# >= 1)) || { echo "Usage: $0 dictionary file ..." >&2 ; exit 1;}
dict="$1"
shift
echo "Creating $dict ..."
>| "$dict" || { echo "Failed." >&2 ; exit 1;}
numberoffileargs=$#
exitstat=0
counterexit=0
acceptingstdin=0
if (($# > 0)); then
for i ; do
#check whether input file is readable
if [ -r "${i}" ]; then
cat "${i}" >> "$dict"
else
exitstat=2
let counterexit++
echo "file does not exist" >&2
fi
done
else
echo "stdin code to be done"
acceptingstdin=1
fi
if ((counterexit == numberoffileargs && acceptingstdin == 0)); then
exitstat=3
fi
sed -r 's/[^a-zA-Z\-]+/ /g' < "$dict" | tr '[:upper:]' '[:lower:]' | tr ' ' '\n' |
sort -u | sed '/^$/d' >| tmp$$
mv -f tmp$$ "$dict"
echo "$numberoffileargs"
echo "$counterexit"
echo "$exitstat"
exit $exitstat
The pipeline might be improved.
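For instance, the character cleanup can be collapsed into a single tr -cs call; a possible tightening, roughly equivalent under the same assumptions (ASCII letters and hyphens only):

tr -cs 'A-Za-z-' '\n' < "$dict" | tr '[:upper:]' '[:lower:]' | sed '/^$/d' | sort -u >| tmp$$
mv -f tmp$$ "$dict"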
