Replace specific column with values passed as parameter in a file within loop in shell script

Suppose we have a file test_file.csv:
"261718"|"2017-08-21"|"ramesh_1"|"111"
"261719"|"2017-08-23"|"suresh_1"|"112"
The required modified test_file.csv should be:
"261718"|"2017-08-21"|"ramesh"|"111"
"261719"|"2017-08-23"|"suresh"|"112"
How would I find and replace the third column with the required values passed as parameters? It should be within an iteration.

You can save your arguments as comma-separated values and store them in a variable args.
Pass this variable to awk using the -v option (quoting "$args" so the shell hands it to awk as a single word), then overwrite the third column $3 with the nth array element, where n is the current row number NR.
args='"ramesh","suresh"'
awk -F '|' -v args="$args" '
BEGIN {
  split(args, arr, ",")
}
{
  $3 = arr[NR]; OFS = FS; print
}' test_file.csv
Output:
"261718"|"2017-08-21"|"ramesh"|"111"
"261719"|"2017-08-23"|"suresh"|"112"

Related

Is there a way to iterate over values of a column then check if it's present elsewhere?

I have generated 2 .csv files, one containing the original md5sums of some files in a directory and one containing the md5sums calculated at a specific moment.
md5_original.csv
----------
$1 $2 $3
7815696ecbf1c96e6894b779456d330e,,s1.txt
912ec803b2ce49e4a541068d495ab570,,s2.txt
040b7cf4a55014e185813e0644502ea9,,s64.txt
8a0b67188083b924d48ea72cb187b168,,b43.txt
etc.
md5_$current_date.csv
----------
$1 $2 $3
7815696ecbf1c96e6894b779456d330e,,s1.txt
4d4046cae9e9bf9218fa653e51cadb08,,s2.txt
3ff22b3585a0d3759f9195b310635c29,,b43.txt
etc.
* some files could be deleted when calculating current md5sums
I am looking to iterate over the values of column $3 in md5_$current_date.csv and, for each value of that column, to check if it exists in the md5_original.csv and if so, finally to compare its value on $1.
Output should be:
s2.txt hash changed from 912ec803b2ce49e4a541068d495ab570 to 4d4046cae9e9bf9218fa653e51cadb08.
b43.txt hash changed from 8a0b67188083b924d48ea72cb187b168 to 3ff22b3585a0d3759f9195b310635c29.
I have written the script that builds these two .csv files, but I am struggling with the awk part where I have to do what I asked above. I don't know if there is a better way to do this; I am a newbie.
I would use GNU AWK for this task following way, let md5_original.csv content be
7815696ecbf1c96e6894b779456d330e {BLANK_COLUMN} s1.txt
912ec803b2ce49e4a541068d495ab570 {BLANK_COLUMN} s2.txt
040b7cf4a55014e185813e0644502ea9 {BLANK_COLUMN} s64.txt
8a0b67188083b924d48ea72cb187b168 {BLANK_COLUMN} b43.txt
and md5_current.csv content be
7815696ecbf1c96e6894b779456d330e {BLANK_COLUMN} s1.txt
4d4046cae9e9bf9218fa653e51cadb08 {BLANK_COLUMN} s2.txt
3ff22b3585a0d3759f9195b310635c29 {BLANK_COLUMN} b43.txt
then
awk 'FNR==NR{arr[$3]=$1;next}($3 in arr)&&($1 != arr[$3]){print $3 " hash changed from " arr[$3] " to " $1}' md5_original.csv md5_current.csv
output
s2.txt hash changed from 912ec803b2ce49e4a541068d495ab570 to 4d4046cae9e9bf9218fa653e51cadb08
b43.txt hash changed from 8a0b67188083b924d48ea72cb187b168 to 3ff22b3585a0d3759f9195b310635c29
Explanation: FNR is the row number within the current file, while NR is the global row number; they are equal only while the first file is being processed. While reading the first file I build the array arr so that the keys are filenames and the values are the corresponding hashes; next makes GNU AWK skip straight to the next line, so the remaining rules apply only to the second file. ($3 in arr) asks whether the current $3 is one of the keys of arr. If it is, and the hash differs ($1 != arr[$3]), I print the filename, the string " hash changed from ", the old hash from arr, " to ", and the current hash $1. If a filename is not present in arr, nothing is printed for that line.
Edit: added exclusion for hash which not changed as suggested in comment.
(tested in gawk 4.2.1)
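The question also notes that some files could be deleted between the two snapshots. A hedged extension of the same two-pass idea can report those as well, here using the question's original comma-separated layout (hash,,filename); the "was deleted" wording is an assumption, not part of the asked-for output.

```shell
# Recreate the question's comma-separated files (empty middle column)
cat > md5_original.csv <<'EOF'
7815696ecbf1c96e6894b779456d330e,,s1.txt
912ec803b2ce49e4a541068d495ab570,,s2.txt
040b7cf4a55014e185813e0644502ea9,,s64.txt
8a0b67188083b924d48ea72cb187b168,,b43.txt
EOF
cat > md5_current.csv <<'EOF'
7815696ecbf1c96e6894b779456d330e,,s1.txt
4d4046cae9e9bf9218fa653e51cadb08,,s2.txt
3ff22b3585a0d3759f9195b310635c29,,b43.txt
EOF

awk -F',' '
FNR==NR { arr[$3] = $1; next }   # 1st file: filename -> original hash
{ seen[$3] = 1 }                 # 2nd file: remember which files still exist
($3 in arr) && ($1 != arr[$3]) {
  print $3 " hash changed from " arr[$3] " to " $1
}
END {                            # anything never seen in the 2nd file is gone
  for (f in arr) if (!(f in seen)) print f " was deleted"
}' md5_original.csv md5_current.csv
```

Note that the order of a for (f in arr) loop is unspecified in awk; with a single deleted file that does not matter, but with several you may want to pipe the result through sort.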

Grab similar data and do the math operation on the grabbed data

I need to search the value of each student attendance from below table and need to do summation of each student attendance.
I have a file dump with below data inside file_dump.txt
[days Leaves PERCENTAGE student_attendance]
194 1.3 31.44% student1.entry2
189 1.3 30.63% student2._student2
138 0.9 22.37% student3.entry2
5 0.0 0.81% student3._student3
5 0.0 0.81% student1._student1
I need to search student1 data from above table using linux command (grep or other commands) and then do the summation of student1.entry2 and student1._student1 together that is ( 194 + 5 = 199).
How can I do this using linux command line ?
Awk is eminently suitable for small programming tasks like this.
awk -v student="student1" '$4 ~ "^" student "\\." { sum += $1 }
END { print sum }' file
The -v option lets you pass in a value for an Awk variable from the command line; we do that to provide a value for the variable student. The first line checks whether the fourth field $4 begins with that variable immediately followed by a dot, and if so, adds the first field $1 to the variable sum. (Conveniently, uninitialized variables spring to life with a default value of zero / empty string.) This gets repeated for each input line in the file. Then the END block gets executed after the input file has been exhausted, and we print the accumulated sum.
If you want to save this in an executable script, you might want to allow the caller to specify the student to search for:
#!/bin/sh
# Fail if $1 is unset
: ${1?Syntax: $0 studentname}
awk -v student="$1" '$4 ~ "^" student "\\." { sum += $1 }
END { print sum }' file
Notice how the $ variables inside the single quotes are Awk field names (the single quotes protect the script's contents from the shell's variable interpolation etc facilities), whereas the one with double quotes around it gets replaced by the shell with the value of the first command-line argument.
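With the sample table from the question saved as file_dump.txt, a quick self-contained run of the one-liner looks like this (the double backslash is needed so the regex sees a literal \. rather than an undefined string escape):

```shell
# Sample table from the question
cat > file_dump.txt <<'EOF'
[days Leaves PERCENTAGE student_attendance]
194 1.3 31.44% student1.entry2
189 1.3 30.63% student2._student2
138 0.9 22.37% student3.entry2
5 0.0 0.81% student3._student3
5 0.0 0.81% student1._student1
EOF

# Sum field 1 for rows whose 4th field starts with "student1."
awk -v student="student1" '$4 ~ "^" student "\\." { sum += $1 }
END { print sum }' file_dump.txt
```

This prints 199 (194 + 5). The header line is skipped naturally, because its fourth field does not begin with "student1.".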

Update column in file based on associative array value in bash

So I have a file named testingFruits.csv with the following columns:
name,value_id,size
apple,1,small
mango,2,small
banana,3,medium
watermelon,4,large
I also have an associative array that stores the following data:
declare -A fruitSizes=(
  [apple]=xsmall
  [mango]=small
  [banana]=medium
  [watermelon]=xlarge
)
Is there anyway I can update the 'size' column within the file based on the data within the associative array for each value in the 'name' column?
I've tried using awk but I had no luck. Here's a sample of what I tried to do:
awk -v t="${fruitSizes[*]}" 'BEGIN{n=split(t,arrayval,""); ($1 in arrayval) {$3=arrayval[$1]}' "testingFruits.csv"
I understand this command would get the bash defined array fruitSizes, do a split on all the values, then check if the first column (name) is within the fruitSizes array. If it is, then it would update the third column (size) with the value found in fruitSizes for that specific name.
Unfortunately this gives me the following error:
Argument list too long
This is the expected output I'd like in the same testingFruits.csv file:
name,value_id,size
apple,1,xsmall
mango,2,small
banana,3,medium
watermelon,4,xlarge
One edge case I'd like to handle is the presence of duplicate values in the name column with different values for the value_id and size columns.
If you want to stick to an awk script, pass the array via stdin to avoid running into ARG_MAX issues.
Since your array is associative, listing only the values ${fruitSizes[@]} is not sufficient. You also need the keys ${!fruitSizes[@]}. pr -2 can pair the keys and values line by line.
This assumes that ${fruitSizes[@]} and ${!fruitSizes[@]} expand in the same order, and that your keys and values are free of the field separator (, in this case).
printf %s\\n "${!fruitSizes[@]}" "${fruitSizes[@]}" | pr -t -2 -s, |
awk -F, -v OFS=, 'NR==FNR {a[$1]=$2; next} $1 in a {$3=a[$1]} 1' - testingFruits.csv
However, I'm wondering where the array fruitSizes comes from. If you read it from a file or something like that, it would be easier to leave out the array altogether and do everything in awk.
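If the mapping does come from a file, the idea above reduces to a plain two-file awk run with no bash array at all. A sketch, using a hypothetical sizes.csv (name,size pairs) standing in for fruitSizes:

```shell
# Hypothetical mapping file standing in for the fruitSizes array
cat > sizes.csv <<'EOF'
apple,xsmall
mango,small
banana,medium
watermelon,xlarge
EOF

# The input file from the question
cat > testingFruits.csv <<'EOF'
name,value_id,size
apple,1,small
mango,2,small
banana,3,medium
watermelon,4,large
EOF

# 1st pass fills a[name]=size; 2nd pass rewrites column 3 when the name is known
awk -F, -v OFS=, 'NR==FNR {a[$1]=$2; next} $1 in a {$3=a[$1]} 1' \
  sizes.csv testingFruits.csv
```

The header line passes through untouched because "name" is not a key in the mapping. Duplicate names in testingFruits.csv are all rewritten to the same size, which matches the edge case mentioned in the question.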

Initialize an Array inside AWK Command and use the Array to Print using AWK

I'm trying to do a comparison of the data in two files and print certain output from it.
My objective is to initialize an array containing some values inside the same awk statement and use it for printing.
Below is the command I am using, which I believe has a syntax error.
Please help with the awk part: how should I define the array, and how can I use it inside awk?
Command tried:
paste -d "|" filedata.txt tabdata.txt | awk -F '|' '{array=("RE_LOG_ID" "FILE_RUN_ID" "FH_RECORDTYPE" "FILECATEGORY")}' '{c=NF/2;for(i=1;i<=c;i++)if($i!=$(i+c))printf "%s|%s|%s|%s\n",$1,${array[i]},$i,$(i+c)}'
SAMPLE INPUT FILE
filedata.txt
A|1|2|3
B|2|3|4
tabdata.txt
A|1|4|3
B|2|3|7
The output I want is:
A|FH_RECORDTYPE|2|4
B|FILECATEGORY|4|7
The output comprises the differences:
PRIMARYKEY|COLUMNNAME|FILE1DATA|FILE2DATA
I want the array to be initialized inside awk as array=("RE_LOG_ID" "FILE_RUN_ID" "FH_RECORDTYPE" "FILECATEGORY"), corresponding to the column names.
The condition for fetching the column name from the array is ($i != $(i+c)): whichever ith position mismatches, I print the ith element from the array.
Finding the differences works perfectly if I remove the array part from my command; my ask is to initialize an array containing the column names and print from it within the same awk statement.
I just need help incorporating the array part within awk.
Unfortunately, arrays in AWK cannot be assigned the way you expect. As an alternative, you can use the split function:
split("RE_LOG_ID FILE_RUN_ID FH_RECORDTYPE FILECATEGORY", array, " ")
(The explicit " " separator argument is needed because FS has been overwritten to |.)
Then your command will look like:
paste -d "|" filedata.txt tabdata.txt | awk -F '|' '
BEGIN {split("RE_LOG_ID FILE_RUN_ID FH_RECORDTYPE FILECATEGORY", array, " ")}
{
c= NF/2;
for(i=1; i<=c; i++)
if ($i != $(i+c))
printf "%s|%s|%s|%s\n", $1, array[i], $i, $(i+c);
}'
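Putting the sample files together, the corrected command can be exercised end to end; the files are recreated inline here so the snippet runs on its own:

```shell
cat > filedata.txt <<'EOF'
A|1|2|3
B|2|3|4
EOF
cat > tabdata.txt <<'EOF'
A|1|4|3
B|2|3|7
EOF

paste -d '|' filedata.txt tabdata.txt | awk -F '|' '
BEGIN { split("RE_LOG_ID FILE_RUN_ID FH_RECORDTYPE FILECATEGORY", array, " ") }
{
  c = NF/2
  for (i = 1; i <= c; i++)
    if ($i != $(i+c))                  # column i differs between the files
      printf "%s|%s|%s|%s\n", $1, array[i], $i, $(i+c)
}'
```

This prints the two mismatch lines from the question: A|FH_RECORDTYPE|2|4 and B|FILECATEGORY|4|7. Note the approach assumes both files have the same number of lines and columns, since paste glues them row by row.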

Bash Find Null values of all variables after equal sign in a file

I have a configuration file (conf.file) with a list of variables and their values, generated from a shell script:
cat conf.file
export ORA_HOME=/u01/app/12.1.0
export ORA_SID=test1
export ORA_LOC=
export TW_WALL=
export TE_STAT=YES
I want to find whether any variable has a null value after the equals sign (=); if so, report the message: Configuration file has following list of null variables
You can use awk for this:
awk -F"[= ]" '$3=="" && NF==3 {print $2}' conf.file
That will split each record by a space or an equal sign, then test the third field in each row. If it's empty, it will print the second field (the variable).
UPDATE: Added a test for the number of fields (NF == 3) to avoid matching blank lines.
try:
awk -F"=" '!$2{gsub(/.* |=/,"",$1);print $1}' Input_file
Making = the field separator, the condition !$2 selects only the lines whose 2nd field is empty, i.e. the null variables. For each such line, gsub strips everything up to the last space (the export keyword) and any remaining = from $1, leaving just the variable name, which is then printed. Let me know if this helps.
