I've written my first script, which checks whether two files have the same values in a specific column.
Both files are WEKA machine-learning prediction outputs from different algorithms, so they are in the same format, but the prediction column may differ.
Here's the code I've written based on the tutorial presented in https://linuxconfig.org/bash-scripting-tutorial-for-beginners:
#!/bin/bash
lineasdel1=$(wc -l $1 | awk '{print $1}')
lineasdel2=$(wc -l $2 | awk '{print $1}')
if [ "$lineasdel1" != "$lineasdel2" ]; then
echo "Files $1 and $2 have different number of lines, unable to perform"
exit 1
fi
function quitalineasraras {
awk '$1!="==="&&NF>0'
}
function acomodo {
awk '{gsub(/^ +| +$/, ""); gsub(/ +0/, " W 0"); gsub(/ +1$/, " W 1"); gsub(/ +/, "\t") gsub(/\+\tW/, "+"); print}'
}
function procesodel1 {
quitalineasraras "$1" | acomodo
}
function procesodel2 {
quitalineasraras "$2" | acomodo
}
el1procesado=$(procesodel1)
el2procesado=$(procesodel2)
function pegar {
paste <(echo "$el1procesado") <(echo "$el2procesado")
}
function contarintersec {
awk 'BEGIN {FS="\t"} $3==$8 {n++} END {print n}'
}
unido=$(pegar)
interseccion=$(contarintersec $unido)
echo "Estos 2 archivos tienen $interseccion coincidencias."
I ran the code of each function individually in the terminal and verified that it works (I'm using Linux Mint 19.2). I have also made the script executable, and the paste command is supposed to work with that variable syntax.
But when I run it via:
./script.sh file1 file2
and both files have the same number of lines, I get no output after pressing Enter. Instead, the terminal shows an empty line with the cursor waiting for something. To type another command, I have to press Ctrl+C.
If the files have different numbers of lines, the error message prints correctly, so I think the problem has something to do with the functions, with awk needing different syntax for some tasks, or with capturing the output of functions in variables.
I know that I'm missing something, but I can't figure out what it could be.
Any help will be appreciated.
function quitalineasraras {
awk '$1!="==="&&NF>0'
}
function procesodel1 {
quitalineasraras "$1" | acomodo
}
el1procesado=$(procesodel1)
Positional parameters such as $1 are set separately for each function call. procesodel1 is called without arguments, so the "$1" inside it expands to an empty string, and quitalineasraras is passed a single empty argument "".
The awk inside quitalineasraras is given only the program text and no file name (the function ignores its argument anyway), so it reads from standard input, i.e., it waits for input on standard input.
That awk call without any file arguments is what makes your script seem to hang.
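A minimal sketch of a fix, assuming bash (for the <(...) process substitution): each function receives the file name explicitly and hands it to awk, and the pasted result is piped straight into the counter. Piping also avoids a second hang in the original, because contarintersec $unido passes the data as arguments that the function never uses, so its awk would also wait on standard input. The awk programs are the question's own, with a missing semicolon added and n+0 so that zero matches prints 0; the names procesar and contar_coincidencias are hypothetical.

```bash
#!/bin/bash
# Sketch: count rows whose predicted column matches in two WEKA output files.
# Every function uses its own "$1" and passes it to awk, so awk reads the
# file instead of waiting on standard input.

quitalineasraras() {
  awk '$1 != "===" && NF > 0' "$1"      # drop "=== ... ===" banners and blank lines
}

acomodo() {
  # Normalise spacing into tab-separated columns (same rules as the question).
  awk '{gsub(/^ +| +$/, ""); gsub(/ +0/, " W 0"); gsub(/ +1$/, " W 1");
        gsub(/ +/, "\t"); gsub(/\+\tW/, "+"); print}'
}

# Hypothetical helper: clean and normalise one file.
procesar() {
  quitalineasraras "$1" | acomodo
}

contarintersec() {
  # Column 3 of file 1 vs. column 3 of file 2 (field 8 after paste).
  # n+0 prints 0 rather than an empty line when nothing matches.
  awk 'BEGIN {FS="\t"} $3 == $8 {n++} END {print n+0}'
}

# Hypothetical wrapper: contar_coincidencias FILE1 FILE2
contar_coincidencias() {
  if [ "$(wc -l < "$1")" -ne "$(wc -l < "$2")" ]; then
    echo "Files $1 and $2 have different number of lines, unable to perform" >&2
    return 1
  fi
  paste <(procesar "$1") <(procesar "$2") | contarintersec
}
```

In the script itself you would then finish with something like echo "These 2 files have $(contar_coincidencias "$1" "$2") matches." so that the script's own positional parameters are forwarded explicitly.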
Related
I have a shell script that outputs a few lines. Below is the output I am getting from it. The script first checks whether a file exists: if it does, it should print the file name and its modification date; if it doesn't, it should print the file name and "Not Found". The result should be in tabular form with a header row, and it should be sent by email.
CMC_daily_File.xlsx Not Found
CareOneHMA.xlsx Jun 11
Expected output:
File Name Modified Date
CMC_daily_File.xlsx Not Found
CareOneHMA.xlsx Jun 11
UPDATE
sample of script
#!/bin/bash
if [ -e /saddwsgnas/radsfftor/coffe/COE_daily_File.xlsx ]; then
cd /sasgnas/radstor/coe/
ls -la COE_daily_File.xlsx | awk '{print $9, $6"_"$7}'
else
echo "CMC_COE_daily_File.xlsx Not_Found"
fi
Output
CMC_COE_daily_File.xlsx Jun_11
I thought I might offer you some options with a slightly modified script. I use the stat command to obtain the file modification time in a more detailed format, and I use an arbitrary, predefined spacer character to divide the column data. That way, you can display the content in its original, untampered form. It also lets filenames that contain spaces be reported without breaking the logic for formatting/aligning columns. The column command is told about that spacer character (-s) and adjusts the width of each column to the widest content in it. (I only wish it also let you specify a column divider character to be printed; newer util-linux versions of column offer -o/--output-separator for that, but not every implementation does.)
I also added an extra awk step, in case you want the header row to stand out more.
#!/bin/sh
#QUESTION: https://stackoverflow.com/questions/74571967/how-to-send-shell-script-output-in-a-tablular-form-and-send-the-mail
SPACER="|"
SOURCE_DIR="/saddwsgnas/radsfftor/coe"
SOURCE_DIR="."   # overrides the line above for local testing; remove it in production
{
printf '%s\n' "File Name${SPACER}Modified Date"
#for file in COE_daily_File.xlsx
for file in test_55.sh awkReportXmlTagMissingPropertyFieldAssignment.sh test_54.sh
do
if [ -e "${SOURCE_DIR}/${file}" ]; then
# stat prints e.g. "2022-11-27 14:07:15.123456789 +0100"; cut keeps the part before the first "."
echo "${file}${SPACER}$(stat --format "%y" "${SOURCE_DIR}/${file}" | cut -f1 -d.)"
else
echo "${file}${SPACER}Not Found"
fi
done
} | column -x -t -s "${SPACER}" |
awk '{
### Refer to:
# https://man7.org/linux/man-pages/man4/console_codes.4.html
# https://www.ecma-international.org/publications-and-standards/standards/ecma-48/
if( NR == 1 ){
printf("\033[93;3m%s\033[0m\n", $0) ;
}else{
print $0 ;
} ;
}'
Without that last awk command, the output session for that script was as follows:
ericthered@OasisMega1:/0__WORK$ ./test_55.sh
File Name Modified Date
test_55.sh 2022-11-27 14:07:15
awkReportXmlTagMissingPropertyFieldAssignment.sh 2022-11-05 21:28:00
test_54.sh 2022-11-27 00:11:34
ericthered@OasisMega1:/0__WORK$
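With that last awk command, the output is the same, except that the header line is displayed in bright yellow italics (escape codes 93 and 3), if your terminal supports them.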
I want to write a script that takes one or more names as arguments and prints the paths to the home directories of the people with those names.
I am new to scripting. Is there a simple way to do this with awk or egrep?
Example:
$ show names jakub anna (as an argument)
/home/users/jakubo
/home/students/j_luczka
/home/students/kubeusz
/home/students/jakub5z
/home/students/qwertinx
/home/users/lazinska
/home/students/annalaz
Here is my friend's code, but I have to write it a different way, and it has to be as simple as this code:
#!/bin/bash
for name in "$@"
do
awk -v n="$name" -F ':' 'BEGIN{IGNORECASE=1};$5~n{print $6}' /etc/passwd | while read line
do
echo $line
done
done
It's possible to use a single awk script to look for matching names.
The list of names can be passed to awk as a space-separated list, from which awk constructs (in the BEGIN section) a combined pattern (e.g. '(names|jakub|anna)'). The pattern is then tested against the full-name (GECOS) column ($5) of the password file.
#! /bin/sh
awk -v "L=$*" -F: '
BEGIN {
name_pat = L
gsub(/ +/, "|", name_pat)   # portable gsub instead of gawk-only gensub: "jakub anna" -> "jakub|anna"
name_pat = "(" name_pat ")"
}
$5 ~ name_pat { print $6 }
' /etc/passwd
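The friend's version also matched case-insensitively through IGNORECASE=1, which only gawk understands. A minimal portable sketch of the same idea lower-cases both the name list and field $5 with tolower(). The function name show_homes and the PASSWD_FILE override (handy for testing against a fake file) are hypothetical additions, not part of either answer:

```bash
#!/bin/sh
# Sketch: print the home directory ($6) of every passwd entry whose
# full-name (GECOS) field ($5) matches any of the given names, ignoring case.
# show_homes and PASSWD_FILE are hypothetical names used for illustration.
show_homes() {
  awk -v "L=$*" -F: '
    BEGIN {
      pat = tolower(L)
      gsub(/ +/, "|", pat)      # "jakub anna" -> "jakub|anna"
      pat = "(" pat ")"
    }
    tolower($5) ~ pat { print $6 }
  ' "${PASSWD_FILE:-/etc/passwd}"
}
```

For example, show_homes jakub anna. With no arguments the pattern becomes "()", which matches every entry, so you may want to check $# before calling it.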
Since at present the question as a whole is unclear, this is more of a long comment, and only a partial answer.
There is one easy simplification, since the sample code includes:
... | while read line
do
echo $line
done
All of the code shown above, from the | onward, is needless and does nothing useful (like a UUoC, a Useless Use of Cat), so it should be removed. (Strictly speaking, echo $line with an unquoted $line collapses repeated spaces and other formatting, but that's not relevant to the task at hand, so we can say the code above does nothing.)
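The parenthetical is easy to check with made-up input: passing a line straight through leaves it intact, while the read/echo loop with an unquoted $line collapses runs of spaces.

```bash
#!/bin/sh
# Made-up input containing repeated spaces.
input='first   second'

# Straight through: unchanged.
direct=$(printf '%s\n' "$input")

# Through the read/echo loop from the question: the unquoted $line is
# word-split, so the run of spaces collapses to a single space.
looped=$(printf '%s\n' "$input" | while read line
do
  echo $line
done)

printf 'direct: %s\nlooped: %s\n' "$direct" "$looped"
```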
I'm studying awk pretty fiercely to write a git diffn implementation which will show line numbers for git diff, and I want confirmation on whether or not this Wikipedia page on awk is wrong [Update: I've now fixed this part of that Wikipedia page, but this is what it used to say]:
(pattern)
{
print 3+2
print foobar(3)
print foobar(variable)
print sin(3-2)
}
Output may be sent to a file:
(pattern)
{
print "expression" > "file name"
}
or through a pipe:
(pattern)
{
print "expression" | "command"
}
Notice (pattern) is above the opening brace. I'm pretty sure this is wrong but need to know for certain before editing the page. What I think that page should look like is this:
/regex_pattern/ {
print 3+2
print foobar(3)
print foobar(variable)
print sin(3-2)
}
Output may be sent to a file:
/regex_pattern/ {
print "expression" > "file name"
}
or through a pipe:
/regex_pattern/ {
print "expression" | "command"
}
Here's a test to "prove" it. I'm on Linux Ubuntu 18.04.
1. test_awk.sh
gawk \
'
BEGIN
{
print "START OF AWK PROGRAM"
}
'
Test and error output:
$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
gawk: cmd. line:3: BEGIN blocks must have an action part
But with this:
2. test_awk.sh
gawk \
'
BEGIN {
print "START OF AWK PROGRAM"
}
'
It works fine!:
$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
START OF AWK PROGRAM
Another example (fails to provide expected output):
3. test_awk.sh
gawk \
'
/hey/
{
print $0
}
'
Erroneous output:
$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
hey1
hey1
hello
hey2
hey2
But like this:
4. test_awk.sh
gawk \
'
/hey/ {
print $0
}
'
It works as expected:
$ echo -e "hey1\nhello\nhey2" | ./test_awk.sh
hey1
hey2
Update: after solving this problem, I added the sections below:
Learning material:
In the process of working on this problem, I just spent several hours and created these examples: https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world/tree/master/awk. These examples, comments, and links would prove useful to anyone getting started learning awk/gawk.
Related:
git diff with line numbers and proper code alignment/indentation
"BEGIN blocks must have an action part" error in awk script
The whole point of me learning awk at all in the first place was to write git diffn. I just got it done: Git diff with line numbers (Git log with line numbers)
I agree with you that the Wikipedia page is wrong. It's right in the awk manual:
A pattern-action statement has the form
pattern { action }
A missing { action } means print the line; a missing pattern always matches. Pattern-action statements are separated by newlines or semicolons.
...
Statements are terminated by semicolons, newlines or right braces.
This is from the man page for the default awk on my Mac. The same information is in the GNU awk manual; it's just buried a little deeper. And the POSIX specification of awk states:
An awk program is composed of pairs of the form:
pattern { action }
Either the pattern or the action (including the enclosing brace characters) can be omitted.
A missing pattern shall match any record of input, and a missing action shall be equivalent to:
{ print }
You can see in your examples that statements can be terminated by newlines instead of semicolons. When you have
/regex/
{ ...
}
it's equivalent to /regex/; {...}, which in turn is equal to /regex/ {print $0} {...}, as your tests showed.
Note that BEGIN and END are special patterns that always need an explicit action: the default action {print $0} makes no sense for BEGIN, since no input record has been read yet. That's why the opening curly brace must be on the same line as BEGIN or END. It may look like a rule of convenience, but it's all consistent.
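This equivalence is easy to check: the three programs below (pattern and action on separate lines, the same two rules separated by an explicit semicolon, and the default action written out) should all print the same thing for the question's sample input.

```bash
#!/bin/sh
input='hey1
hello
hey2'

# Pattern and action on separate lines: awk sees two independent rules.
a=$(printf '%s\n' "$input" | awk '/hey/
{ print $0 }')

# The same two rules, separated by an explicit semicolon.
b=$(printf '%s\n' "$input" | awk '/hey/; { print $0 }')

# The same program with the default action of /hey/ written out.
c=$(printf '%s\n' "$input" | awk '/hey/ { print $0 } { print $0 }')

[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "all three are equivalent"
```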
I am trying to collate a series of .csv log files that are named by date (e.g., 2019-02-24.csv). There are a bunch of them, so I'm trying to script the process. I've crafted an AWK script that combines individual files:
awk ' FNR==1 { while (/"_time",PIN,FULLNAME,OFFICE,Acronym,Name/) getline; } 1 { print } ' 2019-01-01.csv >> usage_history.csv
But I am failing when I try to string the AWK commands together with a control loop in BASH:
for i in {01..28}; do echo "awk ' FNR==1 { while (/\"_time\",PIN,FULLNAME,OFFCODE,Acronym,Name/) getline; } 1 { print } ' 2019-01-$i.csv >> user_history.csv"; done
When I run this, it prints the correct commands to the command line, but the awk scripts are not executed (they are only printed). If I run it without echo, I get errors telling me that the file doesn't exist, even though all the files are present:
bash: awk ' FNR==1 { while (/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/) getline; } 1 { print } ' 2019-01-01.csv >> user_history.csv: No such file or directory
What am I missing in my loop?
Here is a condensed sample of the command and the error messages:
$ for i in {01..02}; do "awk ' FNR==1 { while (/\"_time\",PIN,FULLNAME,OFFCODE,Acronym,Name/) getline; } 1 { print } ' 2019-01-$i.csv >> user_history.csv"; done
bash: awk ' FNR==1 { while (/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/) getline; } 1 { print } ' 2019-01-01.csv >> user_history.csv: No such file or directory
bash: awk ' FNR==1 { while (/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/) getline; } 1 { print } ' 2019-01-02.csv >> user_history.csv: No such file or directory
Could you please try the following?
awk '!/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/' 2019-01-[0-9]*.csv >> user_history.csv
Here are the reasons one could use this approach:
1- Using a for loop and calling awk once per file is overkill. Since awk can read multiple files itself, we should take advantage of that.
2- Instead of the getline approach you tried in your code, if we want to skip lines containing a string, we can simply negate the match with !/string_to_be_skipped/, so that awk prints only the lines that do NOT contain that string.
3- To pass multiple files to a single awk command I used 2019-01-[0-9]*.csv, because you haven't said whether a file is created every day. If we generate the file names loop-style and one of those files is NOT present, we get an error. For example, say I use the following awk command after intentionally removing the file 2019-01-02.csv:
awk '........' 2019-01-{01..29}.csv
awk: cannot open 2019-01-02.csv (No such file or directory)
So, to avoid this kind of situation, I used 2019-01-[0-9]*.csv, which only matches files that actually exist and have a digit after 2019-01-, so awk will NOT complain that some file is missing.
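A small sketch of point 3 in a throwaway directory, with made-up file contents. 2019-01-02.csv is deliberately absent, so the glob expands only to the two files that exist, and the header lines are dropped by the negated regex.

```bash
#!/bin/sh
# Throwaway directory with made-up daily files (hypothetical data).
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '"_time",PIN,FULLNAME,OFFCODE,Acronym,Name\n2019-01-01T08:00,1,A,X,AA,Ann\n' > 2019-01-01.csv
printf '"_time",PIN,FULLNAME,OFFCODE,Acronym,Name\n2019-01-03T09:00,2,B,Y,BB,Bob\n' > 2019-01-03.csv
# 2019-01-02.csv is missing on purpose: the glob only expands to existing files.

awk '!/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/' 2019-01-[0-9]*.csv >> user_history.csv
cat user_history.csv
```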
Try this:
for i in {01..28}; do awk '!/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/' 2019-01-$i.csv >>user_history.csv;done
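The commands after do should not be quoted. When the whole command is inside double quotes, bash treats the entire string as the name of a single command, which is why you get "No such file or directory".
And what your awk script does essentially amounts to skipping the header lines.
The { print } after 1 is unnecessary: a lone 1 implies { print }. The 1 just provides an always-true condition.
-- When there's only an expression and no action block, the action defaults to { print }.
-- And a bare regex is equivalent to $0 ~ /regex/; here I negated it.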
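Those shorthands can be checked side by side on a made-up sample (the header is the one from the question):

```bash
#!/bin/sh
# Made-up sample: the header line followed by two data lines.
sample='"_time",PIN,FULLNAME,OFFCODE,Acronym,Name
2019-01-01T08:00,1,A,X,AA,Ann
2019-01-01T09:00,2,B,Y,BB,Bob'

# A lone true pattern prints every line, exactly like 1 { print }.
all_short=$(printf '%s\n' "$sample" | awk '1')
all_long=$(printf '%s\n' "$sample" | awk '1 { print }')

# A bare (negated) regex is shorthand for testing $0.
skip_short=$(printf '%s\n' "$sample" | awk '!/"_time",PIN/')
skip_long=$(printf '%s\n' "$sample" | awk '!($0 ~ /"_time",PIN/) { print $0 }')
```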
If there's no other command inside the loop, you can simplify the loop with one awk command:
awk '!/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/' 2019-01-{01..28}.csv >>user_history.csv
But this one will throw an error and stop executing when one of the files doesn't exist.
Another way is:
awk '!/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/' 2019-01-[0-3][0-9].csv >>user_history.csv
This one only matches existing filenames, instead of looping over every possible name.
It won't stop executing or throw an error, so if a file is missing you won't know. And it will match extra files if they exist.
For example, it will read 2019-01-34.csv if it exists.
So if you want the warnings (they won't affect the results) but don't want the commands to stop, use the first version, with the for loop.
Pitfalls:
[0-3][1-9] won't match 10,20 and 30, but will match 32 to 39.
[0-9]* will also match longer numbers, and the matches come in string order rather than numeric order (e.g., 20 to 29 sort before 3).
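The first pitfall is easy to reproduce with empty, made-up files in a scratch directory; echo shows what each glob expands to:

```bash
#!/bin/sh
# Scratch directory with three made-up, empty "daily" files.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 2019-01-01.csv 2019-01-10.csv 2019-01-32.csv

# [0-3][1-9] misses day 10 (second digit 0) but accepts the impossible day 32.
narrow=$(echo 2019-01-[0-3][1-9].csv)

# [0-3][0-9] picks up day 10, and still day 32.
wide=$(echo 2019-01-[0-3][0-9].csv)

printf 'narrow: %s\nwide:   %s\n' "$narrow" "$wide"
```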
Thanks to @Tiw and @RavinderSingh13 for their guidance. Here is the final awk command, which works well for my case, where I have daily files from multiple days, months, and years (only 2018 and 2019 in this case):
awk '!/"_time",PIN,FULLNAME,OFFCODE,Acronym,Name/' 201[8-9]-[0-1][0-9]-[0-3][0-9].csv >> user_history.csv
Problem: Comparison of files from Pre-check status and Post-check status of a node for specific parameters.
With some help from the community, I have written the following solution, which extracts information from the files in the pre and post directories based on the "Node-ID" (which is unique and is also extracted from the files). After extracting the data from the pre/post folders, I created folders named after the node ID and dumped the files into them.
My code to extract the data (the data is extracted from the Pre and Post folders):
FILES=$(find postcheck_logs -type f -name *.log)
for f in $FILES
do
NODE=`cat $f | grep -m 1 ">" | awk '{print $1}' | sed 's/[>]//g'` ##Generate the node-id
echo "Extracting Post check information for " $NODE
mkdir temp/$NODE-post ## create a temp directory
cat $f | awk 'BEGIN { RS=$NODE"> "; } /^param1/ { foo=RS $0; } END { print foo ; }' > temp/$NODE-post/param1.txt ## extract data
cat $f | awk 'BEGIN { RS=$NODE"> "; } /^param2/ { foo=RS $0; } END { print foo ; }' > temp/$NODE-post/param2.txt
cat $f | awk 'BEGIN { RS=$NODE"> "; } /^param3/ { foo=RS $0; } END { print foo ; }' > temp/$NODE-post/param3.txt
done
After this I have a structure as:
/Node1-pre/param1.txt
/Node1-post/param1.txt
and so on.
Now I am stuck on comparing the $NODE-pre and $NODE-post files.
I have tried to do it using recursive grep, but I am not finding a suitable way to do so. What is the best possible way to compare these files using diff?
Moreover, I find the above data extraction program very slow. I believe it's not the best possible way (using least resources) to do so. Any suggestions?
Look askance at any instance of cat one-file: you could use I/O redirection on the next command in the pipeline instead.
You can do the whole thing more simply with:
for f in $(find postcheck_logs -type f -name '*.log')
do
NODE=$(sed -n '/>/{ s/ .*//; s/>//g; p; q; }' "$f") ##Generate the node-id
echo "Extracting Post check information for $NODE"
mkdir -p "temp/$NODE-post"
## a multi-character RS needs gawk or mawk
awk -v NODE="$NODE" -v DIR="temp/$NODE-post" \
'BEGIN { RS=NODE"> " }
/^param1/ { param1 = $0 }
/^param2/ { param2 = $0 }
/^param3/ { param3 = $0 }
END {
print RS param1 > (DIR "/param1.txt")
print RS param2 > (DIR "/param2.txt")
print RS param3 > (DIR "/param3.txt")
}' "$f"
done
The NODE finding process is much better done by a single sed command than cat | grep | awk | sed, and you should plan to use $(...) rather than back-quotes everywhere.
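The single-sed node extraction can be tried on a made-up log excerpt. Note the -n: without it, sed would also autoprint every line before the first match (and print the matching line a second time when q quits), so NODE would contain junk.

```bash
#!/bin/sh
# Made-up log excerpt: a banner line, then prompts of the form "NODE> command".
log='Connecting...
Node1> show param1
param1 value 10
Node1> show param2'

# -n disables autoprint, so only the explicit p on the first ">" line is output;
# q then stops sed before it reads the rest of the input.
NODE=$(printf '%s\n' "$log" | sed -n '/>/{ s/ .*//; s/>//g; p; q; }')
echo "$NODE"
```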
The main processing of the log file should be done once; a single awk command is sufficient. The script is passed two variables: NODE and the directory name. The BEGIN block is cleaned up; the $ before NODE was probably not what you intended (in awk, $NODE refers to a field, not to the shell variable). The main actions are very similar: each looks for the relevant parameter name and saves the record in an appropriate variable. At the end, it writes the saved values to the relevant files, prefixed with the value of RS. Semicolons are only needed when there's more than one statement on a line; there's just one statement per line in this expanded script. It looks bigger than the original, but only because I'm using more vertical space.
As to comparing the before and after files, you can do it in many ways, depending on what you want to know. If you've got a POSIX-compliant diff (you probably do), you can use:
diff -r temp/$NODE-pre temp/$NODE-post
to report on the differences, if any, between the contents of the two directories. Alternatively, you can do it manually:
for file in param1.txt param2.txt param3.txt
do
if cmp -s temp/$NODE-pre/$file temp/$NODE-post/$file
then : No difference
else diff temp/$NODE-pre/$file temp/$NODE-post/$file
fi
done
Clearly, you can wrap that in a 'for each node' loop. And, if you are going to need to do that, then you probably do want to capture the output of the find command in a variable (as in the original code) so that you do not have to repeat that operation.
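One way to do that wrapping, as a sketch: it derives each node ID from the temp/<NODE>-pre directory names created by the extraction loop, which assumes that layout. The function name compare_all_nodes and its optional base-directory argument are hypothetical.

```bash
#!/bin/sh
# Sketch: compare every <base>/<NODE>-pre directory with its <base>/<NODE>-post twin.
# Assumes the temp/<NODE>-pre and temp/<NODE>-post layout from the question.
compare_all_nodes() {
  base=${1:-temp}
  for pre in "$base"/*-pre
  do
    [ -d "$pre" ] || continue             # skip if the glob matched nothing
    NODE=$(basename "$pre" -pre)          # strip the directory and the -pre suffix
    post="$base/$NODE-post"
    if [ ! -d "$post" ]; then
      echo "$NODE: no post-check directory" >&2
      continue
    fi
    for file in param1.txt param2.txt param3.txt
    do
      if ! cmp -s "$pre/$file" "$post/$file"; then
        echo "=== $NODE: $file differs ==="
        diff "$pre/$file" "$post/$file"
      fi
    done
  done
}
```

Call it as compare_all_nodes (which defaults to temp) or compare_all_nodes some/other/dir.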