Save changes to a file AWK/SED - bash

I have a huge text file delimited with comma.
19429,(Starbucks),390 Provan Walk,Glasgow,G34 9DL,-4.136909,55.872982
The first one is a unique id. I want the user to enter the id and enter a value for one of the following 6 fields in order to be replaced. Also, i'm asking him to enter a 2-7 value in order to identify which field should be replaced.
Now i've done something like this. I am checking every line to find the id user entered and then i'm replacing the value.
awk -F ',' -v elem=$element -v id=$code -v value=$value '{if($1==id) {if(elem==2) { $2=value } etc }}' $path
Where $path = /root/clients.txt
Let's say user enters "2" in order to replace the second field, and also enters "Whatever". Now i want "(Starbucks)" to be replaced with "Whatever" What i've done work fine but does not save the change into the file. I know that awk is not supposed to do so, but i don't know how to do it. I've searched a lot in google but still no luck.
Can you tell me how i'm supposed to do this? I know that i can do it with sed but i don't know how.

Newer versions of GNU awk support inplace editing:
awk -i inplace -v elem="$element" -v id="$code" -v value="$value" '
BEGIN{ FS=OFS="," } $1==id{ $elem=value } 1
' "$path"
With other awks:
awk -v elem="$element" -v id="$code" -v value="$value" '
BEGIN{ FS=OFS="," } $1==id{ $elem=value } 1
' "$path" > /usr/tmp/tmp$$ &&
mv /usr/tmp/tmp$$ "$path"
NOTES:
Always quote your shell variables unless you have an explicit reason not to and fully understand all of the implications and caveats.
If you're creating a tmp file, use "&&" before replacing your original with it so you don't zap your original file if the tmp file creation fails for any reason.
I fully support replacing Starbucks with Whatever in Glasgow - I'd like to think they wouldn't have let it open in the first place back in my day (1986 Glasgow Uni Comp Sci alum) :-).

awk is much easier than sed for processing specific variable fields, but it does not have in-place processing. Thus you might do the following:
#!/bin/bash
code=$1
element=$2
value=$3
echo "code is $code"
awk -F ',' -v elem=$element -v id=$code -v value=$value 'BEGIN{OFS=",";} /^'$code',/{$elem=value}1' mydb > /tmp/mydb.txt
mv /tmp/mydb.txt ./mydb
This finds a match for a line starting with code followed by a comma (you could also use ($1==code)), then sets the elemth field to value; finally it prints the output, using the comma as output field separator. If nothing matches, it just echoes the input line.
Everything is written to a temporary file, then overwrites the original.
Not very nice but it gets the job done.

Related

how to insert text in matching line only and only if not already present for bash

Trying to search /etc/pam.d/common-password file for entry that starts with password required pam_unix.so and check to see if sha512 is in the fourth column. If its there, do nothing. If its not there add it. It could be anywhere in the fourth column, but it seems easiest to add to the end if not there.
password required pam_unix.so use_authtok nullok shadow try_first_pass
So ending line should look like:
password required pam_unix.so use_authtok nullok shadow try_first_pass sha512
Using the below if statement and replacing the comment with my command:
if [[ "$(grep 'password'$'\t''required'$'\t''pam_unix.so' common-password | cut -f4)" != *"sha512"* ]]; then
>&2 echo 'Remediation needs to be done'
fi
Tried below awk but end up with an empty common-password file
awk '/^'password'$'\t''required'$'\t''pam_unix.so'/{print $0," sha512"}' /etc/pam.d/common-password > /etc/pam.d/common-password.fixed && mv /etc/pam.d/common-password.fixed /etc/pam.d/common-password
Tried below grep and sed, but get error "sed: no input files"
grep -F 'password'$'\t''required'$'\t''pam_unix.so' /etc/pam.d/common-password | sed --follow-symlinks -ie 's/$/& sha512/g'
Need to be able to find this line and modify it and only this line. And only if it has not already been modified. I only want to modify this one line if it exists and leave the rest of the file intact.
Any help would be greatly appreciated.
If you can use gnu-awk you can make use of -i inplace and set the field separator and the output field separator to a tab to compare the field values.
Then you can check if field 4 does not contain sha512 using $4 !~ /sha512/
For example
awk -i inplace '
BEGIN{FS=OFS="\t"}
$1=="password" && $2=="required" && $3=="pam_unix.so" && $4 !~ /sha512/{$0 = $0 OFS "sha512"}1
' file
For one line, use vi and save yourself the trouble of fixing the mess a typo will make.
If you really need an automated edit, simplify to one executable and consolidate your logic.
$: sed -i '/password\trequired\tpam_unix.so\t/{ /\tsha512/{n}; s/$/\tsha512/; }' file
this:
searches for records that have the initial required string of password\trequired\tpam_unix.so\t
opens a block that groups statements to be executed collectively in {...}
searches for records to exclude with /\tsha512/{n}; the n goes on to the next record and skips the rest of the steps in the outer block. This will still output the record, so it won't be deleted from the file on the in-place edit (use d for that).
if the previous check did not match, the n won't be executed, so the next command substitutes in the string you want at the end of the record.
the closing brace of the outer block ends the set of commands on the only outer scan, so sed proceeds to the next record.
If the end-of-record tab logic is not an issue, and you are sure there are no trailing tabs otherwise, or if you are confident that they indicate an empty final field, just use 's/$/\tsha512/', but if it's possible there might be trash tabs at the end of the record, you might want to clean those - maybe 's/\t*$/\tsha512/', or even 's/\t{0,1}$/\tsha512/'. Check your file and your logic against the possible consequences of carelessly editing...
Good luck.

Unable to use associative array value in sed or awk

I am trying to iteratively search and replace strings in a file using a variable input and replacement string. I have tried using sed and awk and have seemed to determine that it is actually the associative array value that is giving me issues(?).
I am looking at an associative array like this:
declare -A speedReplaceValuePairsText
speedReplaceValuePairsText["20"]="xthirtyx"
speedReplaceValuePairsText["30"]="xfiftyx"
speedReplaceValuePairsText["40"]="xsixtyx"
speedReplaceValuePairsText["50"]="xeightyx"
speedReplaceValuePairsText["60"]="xhundredx"
and for ease I was declaring my replacement vars first:
for speedBeforeValue in "${!speedReplaceValuePairsText[#]}";
do
findValue=${speedBeforeValue}
replaceWithValue=${speedReplaceValuePairsText[$speedBeforeValue]}
#replaceWithValue="blah"
echo " Replacing $findValue with $replaceWithValue..."
awk -v srch="$findValue" -v repl="$replaceWithValue" '{gsub(srch,repl); print}' infile.txt > outfile.txt
#sed 's/'"$findValue"'/'"$replaceWithValue"'/g' infile.txt > outfile.txt
#sed "s/$findValue/$replaceWithValue/g" $scriptDir/$currentFileName > outfile.txt
done
The commented out lines are alternate versions of what I have tried with similar inbetween versions.
I have tried using just a normal string (the commented out "blah") and that works fine.
The weirdest part is that the echo statement displays the right value for both key and value.
I have tried so many combinations I am losing my mind. Please someone tell me I am doing something dumb here.
NOTE: This is nested inside another loop but I do not believe this to be an issue, let me know if I am wrong
EDIT: I have simplified the in and out files, and to clarify, if i try to use my associative array value, nothing gets replaced. But if i use a dummy string like "blah" it works.
BONUS: I have marked the answer below, but my search and replace values start and end in double quotes but no matter what I try it replaces all instances of 60. How can i make it replace "60" with "xsixtyx"?
Thanks
I think you want to use >> instead of > inside your loop?
awk -v srch="$findValue" -v repl="$replaceWithValue" '{gsub(srch,repl); print}' $scriptDir/$currentFileName >> ./$outputFolderName/$currentFileName
I tried to run your code it works as expected except that >.
Or if you just want to see the replaced results
awk -v srch="$findValue" -v repl="$replaceWithValue" '{ if (gsub(srch,repl)) print}' $scriptDir/$currentFileName >> ./$outputFolderName/$currentFileName
For a file with
30
20
60
the output looks like
xthirtyx
xhundredx
xfiftyx
For the second case.
Here is the full bash script I tried
#!/bin/bash
declare -A speedReplaceValuePairsText
speedReplaceValuePairsText["20"]="xthirtyx"
speedReplaceValuePairsText["30"]="xfiftyx"
speedReplaceValuePairsText["40"]="xsixtyx"
speedReplaceValuePairsText["50"]="xeightyx"
speedReplaceValuePairsText["60"]="xhundredx"
for speedBeforeValue in "${!speedReplaceValuePairsText[#]}";
do
findValue=${speedBeforeValue}
replaceWithValue=${speedReplaceValuePairsText[$speedBeforeValue]}
echo " Replacing $findValue with $replaceWithValue..."
awk -v srch="$findValue" -v repl="$replaceWithValue" '{if (gsub(srch,repl)) print}' test.txt >> /tmp/test.txt
done

performance issues in shell script

I have a 200 MB tab separated text file with millions of rows. In this file, I have a column with multiple locations like US , UK , AU etc.
Now I want to break this file on the basis of this column. Though this code is working fine for me, but facing performance issue as it is taking more than 1 hour to split the file into multiple files based on locations. Here is the code:
#!/bin/bash
read -p "Please enter the file to split " file
read -p "Enter the Col No. to split " col_no
#set -x
header=`head -1 $file`
cnt=1
while IFS= read -r line
do
if [ $((cnt++)) -eq 1 ]
then
echo "$line" >> /dev/null
else
loc=`echo "$line" | cut -f "$col_no"`
f_name=`echo "file_"$loc".txt"`
if [ -f "$f_name" ]
then
echo "$line" >> "$f_name";
else
touch "$f_name";
echo "file $f_name created.."
echo "$line" >> "$f_name";
sed -i '1i '"$header"'' "$f_name"
fi
fi
done < $file
The logic applied here is that we are reading the entire file only once, and depending on the locations, we are creating and appending the data to it.
Please suggest necessary improvements in the code to enhance its performance.
Following is a sample data and is separated by colon instead of tab. The country code is in the 4th column:
ID1:ID2:ID3:ID4:ID5
100:abcd:TEST1:ZA:CCD
200:abcd:TEST2:US:CCD
300:abcd:TEST3:AR:CCD
400:abcd:TEST4:BE:CCD
500:abcd:TEST5:CA:CCD
600:abcd:TEST6:DK:CCD
312:abcd:TEST65:ZA:CCD
1300:abcd:TEST4153:CA:CCD
There are a couple of things to bear in mind:
Reading files using while read is slow
Creating subshells and executing external processes is slow
This is a job for a text processing tool, such as awk.
I would suggest that you used something like this:
# save first line
NR == 1 {
header = $0
next
}
{
filename = "file_" $col ".txt"
# if country code has changed
if (filename != prev) {
# close the previous file
close(prev)
# if we haven't seen this file yet
if (!(filename in seen)) {
print header > filename
}
seen[filename]
}
# print whole line to file
print >> filename
prev = filename
}
Run the script using something along the following lines:
awk -v col="$col_no" -f script.awk file
where $col_no is a shell variable containing the column number with the country codes.
If you don't have too many different country codes, you can get away with leaving all the files open, in which case you can remove the call to close(filename).
You can test the script on the sample provided in the question like this:
awk -F: -v col=4 -f script.awk file
Note that I've added -F: to change the input field separator to :.
I think Tom is on the right track, but I'd simplify this a little.
Awk is magical in some ways. One of those ways is that it will keep all its input and output file handles open unless you explicitly close them. So if you create a variable containing an output file name, you can simply redirect to your variable and trust that awk will send the data to the place you've specified and eventually close the output file when it runs out of input to process.
(N.B. an extension of this magic is that in addition to redirects, you can maintain multiple PIPES. Imagine if you were to cmd="gzip -9 > file_"$4".txt.gz"; print | cmd)
The following splits your file without adding a header to each output file.
awk -F: 'NR>1 {out="file_"$4".txt"; print > out}' inp.txt
If adding the header is important, a little more code is required. But not much.
awk -F: 'NR==1{h=$0;next} {out="file_"$4".txt"} !(out in files){print h > out; files[out]} {print > out}' inp.txt
Or, because this one-liner is now a bit long, we can split it out for explanation:
awk -F: '
NR==1 {h=$0;next} # Capture the header
{out="file_"$4".txt"} # Capture the output file
!(out in files){ # If we haven't seen this output file before,
print h > out; # print the header to it,
files[out] # and record the fact that we've seen it.
}
{print > out} # Finally, print our line of input.
' inp.txt
I tested these two scripts successfully on the input data you provided in your question. With this type of solution, there is no need to sort your input data -- your output in each file will be in the order in which that subset's records appeared in your input data.
Note: different versions of awk will permit you to open different numbers of open files. GNU awk (gawk) has a limit in the thousands -- significantly more than the number of countries you might have to deal with. BSD awk version 20121220 (in FreeBSD) appears to run out after 21117 files. BSD awk version 20070501 (in OS X El Capitan) is limited to 17 files.
If you're not confident in your potential number of open files, you can experiment with your version of awk usig something like this:
mkdir -p /tmp/i
awk '{o="/tmp/i/file_"NR".txt"; print "hello" > o; printf "\r%d ",NR > "/dev/stderr"}' /dev/random
You can also test the number of open pipes:
awk '{o="cat >/dev/null; #"NR; print "hello" | o; printf "\r%d ",NR > "/dev/stderr"}' /dev/random
(If you have a /dev/yes or something that just spits out lines of text ad nauseam, that would be better than using /dev/random for input.)
I haven't previously come across this limit in my own awk programming because when I've needed to create many many output files, I've always used gawk. :-P

Bash Script: Grabbing First Item Per Line, Throwing Into Array

I'm fairly new to the world of writing Bash scripts and am needing some guidance. I've begun writing a script for work, and so far so good. However, I'm now at a part that needs to collect database names. The names are actually stored in a file, and I can grep them.
The command I was given is cat /etc/oratab which produces something like this:
# This file is used by ORACLE utilities. It is created by root.sh
# and updated by the Database Configuration Assistant when creating
# a database.
# A colon, ':', is used as the field terminator. A new line terminates
# the entry. Lines beginning with a pound sign, '#', are comments.
#
# The first and second fields are the system identifier and home
# directory of the database respectively. The third filed indicates
# to the dbstart utility that the database should , "Y", or should not,
# "N", be brought up at system boot time.
#
OEM:/software/oracle/agent/agent12c/core/12.1.0.3.0:N
*:/software/oracle/agent/agent11g:N
dev068:/software/oracle/ora-10.02.00.04.11:Y
dev299:/software/oracle/ora-10.02.00.04.11:Y
xtst036:/software/oracle/ora-10.02.00.04.11:Y
xtst161:/software/oracle/ora-10.02.00.04.11:Y
dev360:/software/oracle/ora-11.02.00.04.02:Y
dev361:/software/oracle/ora-11.02.00.04.02:Y
xtst215:/software/oracle/ora-11.02.00.04.02:Y
xtst216:/software/oracle/ora-11.02.00.04.02:Y
dev298:/software/oracle/ora-11.02.00.04.03:Y
xtst160:/software/oracle/ora-11.02.00.04.03:Y
I turn turned around and wrote grep ":/software/oracle/ora" /etc/oratab so it can grab everything I need, which is 10 databases. Not the most elegant way, but it gets what I need:
dev068:/software/oracle/ora-10.02.00.04.11:Y
dev299:/software/oracle/ora-10.02.00.04.11:Y
xtst036:/software/oracle/ora-10.02.00.04.11:Y
xtst161:/software/oracle/ora-10.02.00.04.11:Y
dev360:/software/oracle/ora-11.02.00.04.02:Y
dev361:/software/oracle/ora-11.02.00.04.02:Y
xtst215:/software/oracle/ora-11.02.00.04.02:Y
xtst216:/software/oracle/ora-11.02.00.04.02:Y
dev298:/software/oracle/ora-11.02.00.04.03:Y
xtst160:/software/oracle/ora-11.02.00.04.03:Y
So, if I want to grab the name, such as dev068 or xtst161, how do I? I think for what I need to do with this project moving forward, is storing them in an array. As mentioned in the documentation, a colon is the field terminator. How could I whip this together so I have an array, something like:
dev068
dev299
xtst036
xtst161
dev360
dev361
xtst215
xtst216
dev298
xtst160
I feel like I may be asking for too much assistance here but I'm truly at a loss. I would be happy to clarify if need be.
It is much simpler using awk:
awk -F: -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab
dev068
dev299
xtst036
xtst161
dev360
dev361
xtst215
xtst216
dev298
xtst160
To populate a BASH array with above output use:
mapfile -t arr < <(awk -F: -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab)
To check output:
declare -p arr
declare -a arr='([0]="dev068" [1]="dev299" [2]="xtst036" [3]="xtst161" [4]="dev360" [5]="dev361" [6]="xtst215" [7]="xtst216" [8]="dev298" [9]="xtst160")'
We can pipe the output of grep to the cut utility to extract the first field, taking colon as the field separator.
Then, assuming there are no whitespace or glob characters in any of the names (which would be subject to word splitting and filename expansion), we can use a command substitution to run the pipeline, and capture the output in an array by assigning it within the parentheses.
names=($(grep ':/software/oracle/ora' /etc/oratab| cut -d: -f1;));
Note that the above command actually makes use of word splitting on the command substitution output to split the names into separate elements of the resulting array. That is why we must be sure that no whitespace occurs within any single database name, otherwise that name would be internally split into separate elements of the array. The only characters within the command substitution output that we want to be taken as word splitting delimiters are the line feeds that delimit each line of output coming off the cut utility.
You could also use awk for this:
awk -F: '!/^#/ && $2 ~ /^\/software\/oracle\/ora-/ {print $1}' /etc/oratab
The first pattern excludes any commented-out lines (starting with a #). The second pattern looks for your expected directory pattern in the second field. If both conditions are met it prints the first field, which the Oracle SID. The -F: flag sets the field delimiter to a colon.
With your file that gets:
dev068
dev299
xtst036
xtst161
dev360
dev361
xtst215
xtst216
dev298
xtst160
Depending on what you're doing you could finesse it further and check the last flag is set to Y; although that is really to indicate automatic start-up, it can sometime be used to indicate that a database isn't active at all.
And you can put the results into an array with:
declare -a DBS=(`awk -F: -v key='/software/oracle/ora' '$2 ~ key{print $1}' /etc/oratab`)
and then refer to ${DBS[1]} (which evaluates to dev299) etc.
If you'd like them into a Bash array:
$ cat > toarr.bash
#!/bin/bash
while read -r line
do
if [[ $line =~ .*Y$ ]] # they seem to end in a "Y"
then
arr[$((i++))]=${line%%:*}
fi
done < file
echo ${arr[*]} # here we print the array arr
$ bash toarr.bash
dev068 dev299 xtst036 xtst161 dev360 dev361 xtst215 xtst216 dev298 xtst160

how can I supply bash variables as fields for print in awk

I currently am trying to use awk to rearrange a .csv file that is similar to the following:
stack,over,flow,dot,com
and the output would be:
over,com,stack,flow,dot
(or any other order, just using this as an example)
and when it comes time to rearrange the csv file, I have been trying to use the following:
first='$2'
second='$5'
third='$1'
fourth='$3'
fifth='$4'
awk -v a=$first -v b=$second -v c=$third -v d=$fourth -v e=$fifth -F '^|,|$' '{print $a,$b,$c,$d,$e}' somefile.csv
with the intent of awk/print interpreting the $a,$b,$c,etc as field numbers, so it would come out to the following:
{print $2,$5,$1,$3,$4}
and print out the fields of the csv file in that order, but unfortunately I have not been able to get this to work correctly yet. I've tried several different methods, this seeming like the most promising, but unfortunately have not been able to get any solution to work correctly yet. Having said that, I was wondering if anyone could possibly give any suggestions or point out my flaw as I am stumped at this point in time, any help would be much appreciated, thanks!
Use simple numbers:
first='2'
second='5'
third='1'
fourth='3'
fifth='4'
awk -v a=$first -v b=$second -v c=$third -v d=$fourth -v e=$fifth -F '^|,|$' \
'{print $a, $b, $c, $d, $e}' somefile.csv
Another way with a shorter example:
aa='$2'
bb='$1'
cc='$3'
awk -F '^|,|$' "{print $aa,$bb,$cc}" somefile.csv
You already got the answer to your specific question but have you considered just specifying the order as a string instead of each individual field? For example:
order="2 5 1 3 4"
awk -v order="$order" '
BEGIN{ FS=OFS=","; n=split(order,a," ") }
{ for (i=1;i<n;i++) printf "%s%s",$(a[i]),OFS; print $(a[i]) }
' somefile.csv
That way if you want to add/delete fields or change the order you just trivially rearrange the numbers in the first line instead of having to mess with a bunch of hard-coded variables, etc.
Note that I changed your FS as there was no need for it to be that complicated. Also, you don't need the shell variable, "order",you could just populate the awk variable of the same name explicitly, I just started with the shell variable since you had started with shell variables so maybe you have a reason.

Resources