Grabbing values from one file (via awk) and using them in another (via sed) - bash

I am using gawk to grab some values, but not all values, from a file. I have another file that is a template; I want to replace certain pieces of it and generate a file specific to the values I grab. I would like to use sed to substitute these fields of interest in the template. The template looks like this:
the dog NAME , likes to ACTION in water when he's bored
Another file, f1, has the name of the dog and the action:
Maxs,swim
StoneCold,digs
Thor,leaps
So I can grab these values and store them in an associative array... what I can't see is how to get them into my sed script.
So a simple sed script could look like this:
s/NAME/ value from f1
s/ACTION/ value from f1
so my output for the template would be:
the dog Maxs , likes to swim in water when he's bored
So if I ran a bash script, the command would look something like this; these are my attempts:
gawk -f f1 animalNameAction | sed -f (is there a way to put something here) template | cat
gawk -f f1 animalNameAction > PulledValues| sed -f PulledValues template | cat
but none of this has worked. So I am left wondering how this could be done.
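One way to get the values awk pulls from f1 into sed is to skip the intermediate file and run sed once per record, substituting both placeholders in the same pass. A minimal bash/sed sketch of that idea, assuming the values in f1 never contain characters that are special to sed (such as / or &):
# Read each NAME,ACTION pair from f1 and instantiate the template once per pair
while IFS=',' read -r name action; do
    sed -e "s/NAME/$name/g" -e "s/ACTION/$action/g" template
done < f1
A single sed -f run over the template with all of the generated s/// commands would not do what you want here: each s/NAME/.../ only fires once against the one copy of the template, so only the first record would take effect.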

You can do this using awk itself.
I assume the template can span multiple lines, so in the FNR==NR{} block I save the entire template file's contents in the variable t, and in the other block I replace NAME and ACTION with the first and second fields of the comma-separated file.
Here is an example:
$ cat template
the dog NAME , likes to ACTION in water when he's bored
$ cat file
Maxs,swim
StoneCold,digs
Thor,leaps
$ awk 'FNR==NR{ t = (t ? t RS :"") $0; next}{ s=t; gsub(/NAME/,$1,s); gsub(/ACTION/,$2,s); print s}' template FS=',' file
the dog Maxs , likes to swim in water when he's bored
the dog StoneCold , likes to digs in water when he's bored
the dog Thor , likes to leaps in water when he's bored
More readable:
awk 'FNR==NR{
t = (t ? t RS :"") $0;
next
}
{
s=t;
gsub(/NAME/,$1,s);
gsub(/ACTION/,$2,s);
print s
}
' template FS=',' file
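The question mentions generating a file specific to each set of values; a small variation of the same awk can write each filled-in template to its own file. A sketch, assuming a NAME.txt naming scheme taken from the first field:
awk 'FNR==NR{ t = (t ? t RS : "") $0; next }
{ s = t; gsub(/NAME/, $1, s); gsub(/ACTION/, $2, s); print s > ($1 ".txt") }
' template FS=',' file
This would produce Maxs.txt, StoneCold.txt and Thor.txt, each containing its own filled-in copy of the template.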

Related

How to replace a whole line (between 2 words) using sed?

Suppose I have text as:
This is a sample text.
I have 2 sentences.
text is present there.
I need to replace the whole text between two 'text' words. The required output should be:
This is a sample text.
I have new sentences.
text is present there.
I tried using the command below but it's not working:
sed -i 's/text.*?text/text\
\nI have new sentence/g' file.txt
With your shown samples, please try the following. sed doesn't support lazy matching in regex, but with awk's RS you can do the substitution (for your shown samples only). You need to create a variable val that holds the new value; then a simple substitution in awk will do the rest to get your expected output.
awk -v val="your_new_line_Value" -v RS="" '
{
sub(/text\.\n*[^\n]*\n*text/,"text.\n"val"\ntext")
}
1
' Input_file
The code above prints the output to the terminal. Once you are happy with the results and want to save the output into Input_file itself, try the following code.
awk -v val="your_new_line_Value" -v RS="" '
{
sub(/text\.\n*[^\n]*\n*text/,"text.\n"val"\ntext")
}
1
' Input_file > temp && mv temp Input_file
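If you do want to stay in sed, a shorter GNU sed sketch sidesteps the lazy-matching limitation with an address range instead; it assumes the block starts at a line ending in "text." and ends at a line starting with "text", as in the sample:
sed '/text\.$/,/^text/{
/text\.$/b
/^text/!d
i\
I have new sentences.
}' file.txt
The d deletes every line strictly between the two marker lines, and i\ inserts the replacement just before the closing marker is printed.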
You have already solved your problem using awk, but in case anyone else is looking for a sed solution in the future, here's a sed script that does what you needed. Granted, the script uses some advanced sed features, but that's the fun part of it :)
replace.sed
#!/usr/bin/env sed -nEf
# This pattern determines the start marker for the range of lines where we
# want to perform the substitution. In our case the pattern is any line that
# ends with "text." — the `$` symbol meaning end-of-line.
/text\.$/ {
# [p]rint the start-marker line.
p
# Next, we'll read lines (using `n`) in a loop, so mark this point in
# the script as the beginning of the loop using a label called `loop`.
:loop
# Read the next line.
n
# If the last read line doesn't match the pattern for the end marker,
# just continue looping by [b]ranching to the `:loop` label.
/^text/! {
b loop
}
# If the last read line matches the end marker pattern, then just insert
# the text we want and print the last read line. The net effect is that
# all the previous read lines will be replaced by the inserted text.
/^text/ {
# Insert the replacement text
i\
I have a new sentence.
# [print] the end-marker line
p
}
# Exit the script, so that we don't hit the [p]rint command below.
b
}
# Print all other lines.
p
Usage
$ cat lines.txt
foo
This is a sample text.
I have many sentences.
I have many sentences.
I have many sentences.
I have many sentences.
text is present there.
bar
$
$ ./replace.sed lines.txt
foo
This is a sample text.
I have a new sentence.
text is present there.
bar
Substitute
sed -i 's/I have 2 sentences./I have new sentences./g' file.txt
sed -i 's/[A-Z]\s[a-z].*/I have new sentences./g' file.txt
Insert
sed -i -e '2iI have new sentences.' -e '2d' file.txt
I need to replace whole text between two 'text' words.
If I understand correctly, the first text. (with a dot) is at the end of the first line and the second text is at the beginning of the third line. With awk you can get the required output by accumulating values in the variable s:
awk -v s='\nI have new sentences.\n' '/text.?$/ {s=$0 s;next} /^text/ {s=s $0;print s;s=""}' file
This is a sample text.
I have new sentences.
text is present there.

How to get paragraphs of text by index number

I am wondering if there is a way to get paragraphs of text (the source file would be a pyx file) by number, as sed does with lines:
sed -n ${i}p
At this moment I'd be interested to use awk with:
awk '/custom-pyx-tag\(/,/\)custom-pyx-tag/'
but I can't find documentation or examples about that.
I'm also trying to trim "\r\n" with gsub(/\r\n/,"; ") in the same awk command but it doesn't work, and I can't really figure out why.
Any hint would be much appreciated, thanks.
EDIT:
This is just one example and not my exact need, but I need to know how to do it for a multipurpose project.
Let's say I have exported the ID3 tags of a huge collection of audio files and stored them in a pyx-like format, so in the end I have a nice big file with this pattern repeating for each file in the collection:
audio-genre(
blablabla
)audio-genre
audio-artist(
bla.blabla
)audio-artist
audio-album(
bla-bla-bla
)audio-album
audio-track-num(
0x
)audio-track-num
audio-track-title(
bla.bla-bla
)audio-track-title
audio-lyrics(
blablablablabla
bla.bla.bla.bla
blah-blah-blah
blabla-blabla
)audio-lyrics
...
Now if I want to extract the artist of the 1234th audio file I can use:
awk '/audio-artist\(/, /)audio-artist/' | sed '/audio-artist/d' | sed -n 1234p
Since that is a single line it can be selected with sed, but I don't know how to get an entire paragraph given its index. For example, if I want the lyrics of the 6543rd file, how could I do it?
In the end it is just a question of whether there is a command equivalent to
sed -n ${num}p
but to be used for paragraphs
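For the literal question (an equivalent of sed -n ${num}p for paragraphs), awk's paragraph mode gets close. A minimal sketch, assuming the paragraphs you want are separated by blank lines:
# RS="" puts awk in paragraph mode: records are blank-line-separated
# paragraphs, so NR counts paragraphs instead of lines
awk -v n=6543 'BEGIN{ RS=""; ORS="\n\n" } NR==n { print; exit }' file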
awk -v indx=1234 'BEGIN {
RS=""
}
{ split($0,arr,"audio-artist");
for (i=2;i<=length(arr);i=i+2)
{ gsub("[()]","",arr[i]);
arts[cnt+=1]=arr[i]
}
}
END {
print arts[indx]
}' audioartist
One liner:
awk -v indx=1234 'BEGIN {RS=""} NR==1 { split($0,arr,"audio-artist");for (i=2;i<=length(arr);i=i+2) { gsub("[()]","",arr[i]);arts[cnt+=1]=arr[i] } } END { print arts[indx] }' audioartist
Using awk, and the file called audioartist, we consume the file as a single record by setting the record separator (RS) to "" (there are no blank lines, so paragraph mode reads the whole file at once). We then split that record into an array arr, based on the separator audio-artist. We loop through arr starting from 2 in steps of 2 to the end of the array, strip out the opening and closing brackets, and build another array called arts with an incrementing count as the index and the stripped artist as the value. At the end we print the arts entry specified by the passed indx variable (in this case 1234).
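The same idea can be made tag-agnostic, which covers the audio-lyrics case from the question. A hedged sketch, assuming each opening marker (audio-lyrics() and each closing marker ()audio-lyrics) sits on its own line as in the sample:
awk -v tag="audio-lyrics" -v n=6543 '
$0 == (tag "(") { c++; if (c == n) p = 1; next }  # n-th opening marker: start printing
$0 == (")" tag) { if (p) exit; next }             # closing marker: stop, and quit if it was ours
p                                                 # print the lines inside the wanted block
' file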

Using sed to find-and-replace in a text file using strings from another text file

I have two files as follows. The first is sample.txt:
new haven co-op toronto on $1245
joe schmo co-op powell river bc $4444
The second is locations.txt:
toronto
powell river
on
bc
We'd like to use sed to produce a marked-up sample-new.txt that adds ; before and after each of these locations, so that the final strings would appear like:
new haven co-op ;toronto; ;on; $1245
joe schmo co-op ;powell river; ;bc; $4444
Is this possible using bash? The actual files are much longer (thousands of lines in each case) but as a one-time job we're not too concerned about processing time.
--- edited to add ---
My original approach was something like this:
cat locations.txt | xargs -i sed 's/{}/;/' sample.txt
But it only ran the script once per pattern, as opposed to the methods you've proposed here.
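For the record, the xargs idea can be made to work with a couple of tweaks. A hedged sketch, assuming GNU xargs and GNU sed: -d '\n' keeps multi-word places such as "powell river" in one piece, \< and \> restrict matches to whole words, and -i lets the edits from each pass accumulate in the output copy:
cp sample.txt sample-new.txt
xargs -d '\n' -I{} sed -i 's/\<{}\>/;&;/g' sample-new.txt < locations.txt
It still runs one sed pass per place, and places nested inside other places (the "niagara on the lake" case discussed below) will still collide.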
Using awk:
awk 'NR==FNR{a[NR]=$0; next;} {for(i in a)gsub("\\<"a[i]"\\>",";"a[i]";"); print} ' locations.txt sample.txt
Using awk+sed
sed -f <(awk '{print "s|\\<"$0"\\>|;"$0";|g"}' locations.txt) sample.txt
Same using pure sed:
sed -f <(sed 's/.*/s|\\<&\\>|\;&\;|g/' locations.txt) sample.txt
(After you show your coding attempts, I will add the explanation of why this works.)
Just to complete your set of options, you can do this in pure bash, slowly:
#!/usr/bin/env bash
readarray -t places < t2
while read line; do
for place in "${places[#]}"; do
line="${line/ $place / ;$place; }"
done
echo "$line"
done < t1
Note that this likely won't work as expected if you include places that are inside other places, for example "niagara on the lake" which is in "on":
foo bar co-op ;niagara ;on; the lake; on $1
Instead, you might want to do more targeted pattern matching, which will be much easier in awk:
#!/usr/bin/awk -f
# Collect the location list into the index of an array
NR==FNR {
places[$0]
next
}
# Now step through the input file
{
# Handle two-letter provinces
if ($(NF-1) in places) {
$(NF-1)=";" $(NF-1) ";"
}
# Step through the remaining places doing substitutions as we find matches
for (place in places) {
if (length(place)>2 && index($0,place)) {
sub(place,";"place";")
}
}
}
# Print every line
1
This works for me using the data in your question:
$ cat places
toronto
powell river
niagara on the lake
on
bc
$ ./tst places input
new haven co-op ;toronto; ;on; $1245
joe schmo co-op ;powell river; ;bc; $4444
foo nar co-op ;niagara on the lake; ;on; $1
You may have a problem if your places file contains an actual non-province comprising two letters. I'm not sure if such things exist in Canada, but if they do, you'll either need to tweak such lines manually, or make the script more complex by handling provinces separately from cities.
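If it comes to that, one way to handle provinces separately from cities is to keep them in two lists and only allow province codes in the second-to-last field. A rough sketch under that assumption; provinces.txt and cities.txt are hypothetical file names:
#!/usr/bin/awk -f
# Hypothetical split of the places list into two files
FILENAME == "provinces.txt" { prov[$0]; next }
FILENAME == "cities.txt"    { city[$0]; next }
{
    # A two-letter province code may only appear in the second-to-last field
    if ($(NF-1) in prov) $(NF-1) = ";" $(NF-1) ";"
    # City names are matched as substrings, as in the script above
    for (c in city)
        if (index($0, c)) sub(c, ";" c ";")
    print
}
Run it the same way as the script above, passing the two list files before the data file, e.g. ./tst2 provinces.txt cities.txt sample.txt.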

AWK between 2 patterns - first occurrence

I have this example of an ini file. I need to extract the names between the two patterns [Name_Z1] and OBJ=Name_Z1 and put each of them on its own line.
The problem is that there is more than one occurrence of Name_Z1 and OBJ=Name_Z1, and I only need the first occurrence.
[Name_Z5]
random;text
Names;Jesus;Tom;Miguel
random;text
OBJ=Name_Z5
[Name_Z1]
random;text
Names;Jhon;Alex;Smith
random;text
OBJ=Name_Z1
[Name_Z2]
random;text
Names;Chris;Mara;Iordana
random;text
OBJ=Name_Z2
[Name_Z1_Phone]
random;text
Names;Bill;Stan;Mike
random;text
OBJ=Name_Z1_Phone
My desired output would be:
Jhon
Alex
Smith
I am currently writing a larger script in bash and I am stuck on this. I would prefer awk to do the job.
Any help is greatly appreciated. Thank you!
For Wintermute's solution: the [Name_Z1] part looks like this:
[CAB_Z1]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;AIRE;ALIMENTA;BATER;CONVERTIDOR;DISTRIBUCION;FUEGO;HURTO;MAINS;MALLO;MAYOR;MENOR;PANEL;TEMP
NAME=CAB_Z1
And the [Name_Z1_Phone] part looks like this:
[CAB_Z1_FUEGO]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;FUEGO
NAME=CAB_Z1_FUEGO
The fix should be somewhere around the "|PerceivedSeverity"
Expected Output:
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
This should work:
sed -n '/^\[Name_Z1/,/^OBJ=Name_Z1/ { /^Names/ { s/^Names;//; s/;/\n/g; p; q } }' foo.txt
Explanation: Written readably, the code is
/^\[Name_Z1/,/^OBJ=Name_Z1/ {
/^Names/ {
s/^Names;//
s/;/\n/g
p
q
}
}
This means: In the pattern range /^\[Name_Z1/,/^OBJ=Name_Z1/, for all lines that match the pattern /^Names/, remove the Names; in the beginning, then replace all remaining ; with newlines, print the whole thing, and then quit. Since it immediately quits, it will only handle the first such line in the first such pattern range.
EDIT: The update made things a bit more complicated. I suggest
sed -n '/^\[CAB_Z1/,/^NAME=CAB_Z1/ { /^FilterAttr=/ { s/^.*contains;\(.*\)|PerceivedSeverity.*$/\1/; s/;/\n/g; p; q } }' foo.txt
The main difference is that instead of removing ^Names from a line, the substitution
s/^.*contains;\(.*\)|PerceivedSeverity.*$/\1/;
is applied. This isolates the part between contains; and |PerceivedSeverity before continuing as before. It assumes that there is only one such part in the line. If the match is ambiguous, it will pick the one that appears last in the line.
A (g)awk way that doesn't need a set number of fields (although I have assumed that contains; will always be on the line you need the names from).
gawk '(x+=/Z1/)&&match($0,/contains;([^|]+)/,a)&&gsub(";","\n",a[1]){print a[1];exit}' f
Explanation
(x+=/Z1/) - Increments x when Z1 is found. It is also part of the condition, so x must be non-zero to continue.
match($0,/contains;([^|]+)/,a) - Matches contains; and captures everything after it up to the |, storing the capture in a. Again a condition, so it must succeed to continue.
gsub(";","\n",a[1]) - Substitutes all the ; for newlines in the capture group a[1].
{print a[1];exit} - If all conditions are met, print a[1] and exit.
This way should work in (m)awk
awk '(x+=/Z1/)&&/contains/{split($0,a,"|");y=split(a[2],b,";");for(i=3;i<=y;i++)
print b[i];exit}' file
sed -n '/\[Name_Z1\]/,/OBJ=Name_Z1$/ s/Names;//p' file.txt | tr ';' '\n'
That is sed -n to avoid printing anything not explicitly requested. Start from Name_Z1 and finish at OBJ=Name_Z1. Remove Names; and print the rest of the line where it occurs. Finally, replace semicolons with newlines.
An awk solution would be:
$ awk -F";" '/Name_Z1/{f=1} f && /Names/{print $2,$3,$4} /OBJ=Name_Z1/{exit}' OFS="\n" input
Jhon
Alex
Smith
OR
$ awk -F";" '/Name_Z1/{f++} f==1 && /Names/{print $2,$3,$4}' OFS="\n" input
Jhon
Alex
Smith
-F";" sets the field seperator as ;
/Name_Z1/{f++} matches the line with pattern /Name_Z1/ If matched increment {f++}
f==1 && /Names/{print $2,$3,$4} is same as if f == 1 and maches pattern Name with line if true, then print the the columns 2 3 and 4 (delimted by ;)
OFS="\n" sets the output filed seperator as \n new line
EDIT
$ awk -F"[;|]" '/Z1/{f++} f==1 && NF>1{for (i=5; i<15; i++)print $i}' input
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
Here is a more generic solution for data grouped in blocks.
This awk does not need the end tag, just the start.
awk -vRS= -F"\n" '/^\[Name_Z1\]/ {n=split($3,a,";");for (i=2;i<=n;i++) print a[i];exit}' file
Jhon
Alex
Smith
How it works:
awk -vRS= -F"\n" ' # By setting RS to nothing, one record equals one block. Then FS is set to one line as a field
/^\[Name_Z1\]/ { # Search for block with [Name_Z1]
n=split($3,a,";") # Split field 3, the names and store number of fields in variable n
for (i=2;i<=n;i++) # Loop from second to last field
print a[i] # Print the fields
exit # Exits after first find
' file
With updated data
cat file
data
[CAB_Z1_FUEGO]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;FUEGO
NAME=CAB_Z1_FUEGO
data
awk -vRS= -F"\n" '/^\[CAB_Z1_FUEGO\]/ {split($3,a,"|");n=split(a[2],b,";");for (i=3;i<=n;i++) print b[i]}' file
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
The following awk script will do what you want:
awk 's==1&&/^Names/{gsub("Names;","",$0);gsub(";","\n",$0);print}/^\[Name_Z1\]$/||/^OBJ=Name_Z1$/{s++}' inputFileName
In more detail:
s==1 && /^Names;/ {
gsub ("Names;","",$0);
gsub(";","\n",$0);
print
}
/^\[Name_Z1\]$/ || /^OBJ=Name_Z1$/ {
s++
}
The state s starts with a value of zero and is incremented whenever you find one of the two lines:
[Name_Z1]
OBJ=Name_Z1
That means, between the first set of those lines, s will be equal to one. That's where the other condition comes in. When s is one and you find a line starting with Names;, you do two substitutions.
The first is to get rid of the Names; at the front, the second is to replace all ; semi-colon characters with a newline. Then you print it out.
The output for your given test data is, as expected:
Jhon
Alex
Smith

Bash script replace two fields in a text file using variables

This should be a simple fix but I cannot wrap my head around it at the moment.
I have a comma-delimited file called my_course that contains a list of courses with some information about them.
I need to get user input about the last two fields and change them accordingly.
Each line is constructed like:
CourseNumber,CourseTitle,CreditHours,Status,Grade
Example file:
CSC3210,COMPUTER ORG & PROGRAMMING,3,0,N/A
CSC2010,INTRO TO COMPUTER SCIENCE,3,0,N/A
CSC1010,COMPUTERS & APPLICATIONS,3,0,N/A
I get the user input for 3 things: Course Number, Status (0 or 1), and Grade (A,B,C,N/A)
So far I have tried matching the line containing the course number and changing the last two fields. I haven't been able to figure out how to modify the last two fields using sed, so I'm using this horrible jumble of awk and sed:
temporary=$(awk -v status=$status -v grade=$grade '
BEGIN { FS="," }; $(NF)=""; $(NF-1)="";
/'$cNum'/ {printf $0","status","grade;}' my_course)
sed -i "s/CSC$cNum.*/$temporary/g" my_course
The issue that I'm running into here is that the number of fields in the course title can range from 1 to 4, so I can't just easily print the first n fields. I've tried removing the last two fields and appending the new values for status and grade, but that isn't working for me.
Note: I have already done checks to ensure that the user inputs valid data.
Use a simple awk-script:
BEGIN {
FS=","
OFS=FS
}
$0 ~ course {
$(NF-1)=status
$NF=grade
} {print}
and on the command line, set the three variables course, status and grade.
in action:
$ cat input
CSC3210,COMPUTER ORG & PROGRAMMING,3,0,N/A
CSC2010,INTRO TO COMPUTER SCIENCE,3,0,N/A
CSC1010,COMPUTERS & APPLICATIONS,3,0,N/A
$ awk -vcourse="CSC3210" -vstatus="1" -vgrade="A" -f grades.awk input
CSC3210,COMPUTER ORG & PROGRAMMING,3,1,A
CSC2010,INTRO TO COMPUTER SCIENCE,3,0,N/A
CSC1010,COMPUTERS & APPLICATIONS,3,0,N/A
$ awk -vcourse="CSC1010" -vstatus="1" -vgrade="B" -f grades.awk input
CSC3210,COMPUTER ORG & PROGRAMMING,3,0,N/A
CSC2010,INTRO TO COMPUTER SCIENCE,3,0,N/A
CSC1010,COMPUTERS & APPLICATIONS,3,1,B
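To wire this into the bash script from the question, the user's variables can be passed straight through, with a temporary file to rewrite my_course in place since awk has no -i by default. A sketch, assuming the script above is saved as grades.awk and that $cNum holds the digits after CSC as in the question:
awk -v course="CSC$cNum" -v status="$status" -v grade="$grade" \
    -f grades.awk my_course > my_course.tmp && mv my_course.tmp my_course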
It doesn't matter how many commas you have in the course name as long as you look only at the last two fields:
sed -i "/CSC$cNum/ s/.,[^,]*\$/$status,$grade/" my_course
The trick is to use $ in the pattern to match the end of the line; it is escaped as \$ to keep the shell from expanding it inside the double quotes (an unescaped $$ would become the shell's process ID).
And don't bother building the "temporary" line; apply the substitution only to the line that matches the course number.
