Replace the start of a string between two lines using bash

Is there a way to use sed to replace the start of a string throughout an entire file without using a loop?
For example, my source data is the following:
str_address: 123 main street
str_desc: Apt3
str_desc: 2nd floor
str_city: new york
str_desc: mailing address
Now, the file will have thousands of addresses, but any time "str_desc" appears after "str_address" and before "str_city" I want it replaced with "str_address"; any "str_desc" that appears after "str_city" should remain unchanged.
Desired output:
str_address: 123 main street
str_address: Apt3
str_address: 2nd floor
str_city: new york
str_desc: mailing address
I can extract this info with:
cat file | awk '/str_city/{f=0} f; /str_address/{f=1}'
which gives
str_desc: Apt3
str_desc: 2nd floor
But I am having trouble changing the first "str_desc" to "str_address".

You almost have the complete solution in your awk extraction code:
awk '/str_city/{f=0} f; /str_address/{f=1}'
The idea is to:
turn the flag on when you see str_address.
turn the flag off when you see str_city.
replace str_desc with str_address if the flag is on.
That's basically (in readable form, and the order is important):
awk '
$1 == "str_address:" { flag = 1 }
$1 == "str_desc:" && flag == 1 { $1 = "str_address:" }
$1 == "str_city:" { flag = 0 }
{ print }
' < inputFile >outputFile
Here's a transcript showing it in action:
pax$ echo '
str_address: 123 main street
str_desc: Apt3
str_desc: 2nd floor
str_city: new york
str_desc: mailing address
' | awk '
$1 == "str_address:" { flag = 1 }
$1 == "str_desc:" && flag == 1 { $1 = "str_address:" }
$1 == "str_city:" { flag = 0 }
{ print }'
str_address: 123 main street
str_address: Apt3
str_address: 2nd floor
str_city: new york
str_desc: mailing address
And, of course, a minified version:
awk '$1=="str_address:"{f=1}$1=="str_desc:"&&f==1{$1="str_address:"}$1=="str_city:"{f=0}{print}' < inputFile >outputFile

You can use an address range in sed:
$ sed '/str_address/,/str_city/s/str_desc/str_address/' infile
str_address: 123 main street
str_address: Apt3
str_address: 2nd floor
str_city: new york
str_desc: mailing address
This leaves all the str_desc outside of the /str_address/,/str_city/ range untouched, and substitutes the others with str_address (that's the s/str_desc/str_address/ part).
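If you want the change written back to the file rather than to stdout, GNU sed can do the edit in place; here's a sketch (note that on BSD/macOS sed the -i option requires an explicit suffix argument, e.g. -i ''):
sed -i.bak '/str_address/,/str_city/s/str_desc/str_address/' infile
The .bak suffix keeps a backup of the original file; with GNU sed you can drop it (plain -i) once you trust the result.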


To split and arrange numbers in single inverted commas

I have around 65,000 product codes in a text file. I want to split those numbers into groups of 999 each, and then wrap each number in single quotes, separated by commas.
Could you please suggest how I can achieve the above through a Unix script?
87453454
65778445
.
.
.
.
(and so on, up to 65,000 product codes)
I need to arrange them in the below pattern:
'87453454','65778445',
With awk:
awk '
++c == 1 { out = "\047" $0 "\047"; next }
{ out = out ",\047" $0 "\047" }
c == 999 { print out; c = 0 }
END { if (c) print out }
' file
Or, with GNU sed:
sed "
:a
\$bb
N
0~999{
:b
s/\n/','/g
s/^/'/
s/$/'/
b
}
ba" file
With Perl:
perl -ne '
    sub pq { chomp; print "\x27$_\x27" } pq;
    for (1 .. 998) {
        if (defined($_ = <>)) {
            print ",";
            pq
        }
    }
    print "\n"
' < file
Credit to Mauke on #perl (Libera.Chat).
65000 isn't that many lines for awk - just do it all in one shot:
mawk 'BEGIN { FS = RS; RS = "^$"; OFS = (_="\47")(",")_
} gsub(/^|[^0-9]*$/,_, $!(NF = NF))'
'66771756','69562431','22026341','58085790','22563930',
'63801696','24044132','94255986','56451624','46154427'
That's for grouping them all into one line. To make groups of 999, try:
jot -r 50 10000000 99999999 |
# change "5" to "999" here
rs -C= 0 5 |
mawk 'sub(".*", "\47&\47", $!(NF -= _==$NF ))' FS== OFS='\47,\47'
'36452530','29776340','31198057','36015730','30143632'
'49664844','83535994','86871984','44613227','12309645'
'58002568','31342035','72695499','54546650','21800933'
'38059391','36935562','98323086','91089765','65672096'
'17634208','14009291','39114390','35338398','43676356'
'14973124','19782405','96782582','27689803','27438921'
'79540212','49141859','25714405','42248622','25589123'
'11466085','87022819','65726165','86718075','56989625'
'12900115','82979216','65469187','63769703','86494457'
'26544666','89342693','64603075','26102683','70528492'
_==$NF checks whether the rightmost column is empty, i.e. whether there's a trailing separator that needs to be trimmed.
If your input file only contains short codes as shown in your example, you could use the following hack:
xargs -L 999 bash -c "printf \'%s\', \"\$@\"; echo" . <inputFile >outputFile
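To see how that hack behaves, here's a small-scale sketch with a batch size of 3 instead of 999 (the trailing . supplies $0 for bash -c, and printf reuses its format string for every argument in the batch):
seq 101 107 | xargs -L 3 bash -c "printf \'%s\', \"\$@\"; echo" .
'101','102','103',
'104','105','106',
'107',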
Alternatively, you can use this sed command:
sed -Ene"s/(.*)/'\1',/;H" -e{'0~999','$'}'{z;x;s/\n//g;p}' <inputFile >outputFile
s/(.*)/'\1',/ wraps each line in '...',
but does not print it (-n)
instead, H appends the modified line to the so-called hold space, basically a helper variable storing a single string.
(This also adds a line break as a separator, but we remove that later).
Every 999 lines (0~999) and at the end of the input file ($) ...
... the hold space is then printed and cleared (z;x;...;p)
while deleting all delimiter-linebreaks (s/\n//g) mentioned earlier.
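To see the mechanism at a smaller scale, here's a sketch that groups by 3 instead of 999 (GNU sed, since it relies on the 0~N step address and the z command):
seq 1 7 | sed -Ene"s/(.*)/'\1',/;H" -e{'0~3','$'}'{z;x;s/\n//g;p}'
'1','2','3',
'4','5','6',
'7',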

Calculating the total time in an online meeting

Suppose there was a meeting and the meeting record is saved in a CSV file. How do I write a bash/awk script to find the total amount of time for which an employee stayed online? An employee may leave and rejoin the meeting; all of his/her online time should be counted.
What I did is as follows, but I got stuck on how to compare one record with every other record, and how to add up the time of each joined/left pair for a person.
#!/bin/bash
inputFile=$1
startTime=$(date -u -d $2 +"%s")
endTime=$(date -u -d $3 +"%s")
awk 'BEGIN{ FS=","; totalTime=0; }
{
for (rows=1; rows <= NR; rows++) {
#I am stuck here on how to compare a record with each and every record
if (($1==?? && $2=="Joined") && ($1==?? && $2=="Left")) {
totalTime=$($(date -u -d $3 +"%s")-$(date -u -d $3 +"%s"))
print $1 "," $totalTime +"%H:%M:%S"
}' $inputFile
The start_time and end_time of the meeting are given on the command line, such as:
$ ./script.sh input.csv 10:00:00 13:00:00
The output looks like this (it can be stored in an output file):
Bob, 00:30:00
John, 01:02:00
The contents of the CSV file is as follows:
Employee_name, Joined/Left, Time
John, joined, 10:00:00
Bob, joined, 10:01:00
James, joined, 10:00:30
Bob, left, 10:20:00
Bob, joined, 10:35:00
Bob, left, 11:40:00
James, left, 11:40:00
John, left, 10:41:00
Bob, joined, 11:45:00
$ cat tst.awk
BEGIN { FS=" *, *"; OFS=", " }
NR==1 { next }
$1 in joined {
    jt = time2secs(joined[$1])
    lt = time2secs($3)
    totSecs[$1] += (lt - jt)
    delete joined[$1]
    next
}
{ joined[$1] = $3 }
END {
    for (name in totSecs) {
        print name, secs2time(totSecs[name])
    }
}

function time2secs(time, t) {
    split(time,t,/:/)
    return (t[1]*60 + t[2])*60 + t[3]
}

function secs2time(secs, h,m,s) {
    h = int(secs / (60*60))
    m = int((secs - (h*60*60)) / 60)
    s = int(secs % 60)
    return sprintf("%02d:%02d:%02d", h, m, s)
}
$ awk -f tst.awk file
James, 01:39:30
Bob, 01:24:00
John, 00:41:00
If you need to consider DST transitions, leap-seconds, meetings going overnight (or for multiple days), people still being in the meeting when it ends, or anything else you haven't shown in your question - that is left as an exercise :-).
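If you did want to count people who are still in the meeting when it ends, one way (a sketch, assuming the meeting end time is passed in with -v end=13:00:00 or similar) is to close out everyone left in joined[] before printing, i.e. replace the END block above with:
END {
    for (name in joined) {                  # never left: count them up to the meeting end time
        totSecs[name] += time2secs(end) - time2secs(joined[name])
    }
    for (name in totSecs) {
        print name, secs2time(totSecs[name])
    }
}
and run it as awk -v end='13:00:00' -f tst.awk file.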

Bash, csv file: get number of rows by column name

What I have:
file.csv
car, speed, gas, color
2cv, 120, 8 , green
vw, 80, , yellow
Jaguar, 250, 15 , red
Benz, , , silver
What I have found:
This script returns exactly what I need by column number:
#!/bin/bash
awk -F', *' -v col=3 '
    FNR>1 {
        if ($col)
            maxc=FNR
    }
    END {
        print maxc
    }
' file.csv
read -p "For End press Enter or Ctl + C"
I get exactly the output which I need (the line number of the last row where the column is non-empty):
* for "col=1" ("car" column), the answer: 5
* for "col=2" ("speed" column), the answer: 4
* for "col=3" ("gas" column), the answer: 4
* for "col=4" ("color" column), the answer: 5
What I am looking for:
I am looking for a way to get the same result not by column number (e.g. col=3) but by column headline value (e.g. col=gas).
It may need something additional like:
col_name=gas # selected column headline
col=get column number from $col_name # not working part
Here's one way to do exactly what you're doing, but which looks up the column number from its name when FNR==1:
#!/bin/bash
columns=(car speed gas color)
for col in "${columns[@]}"
do
    LINE_CNT=$(awk '-F[\t ]*,[\t ]*' -vcol=${col} '
        FNR==1 {
            for(i=1; i<=NF; ++i) {
                if($i == col) {
                    col = i;
                    break;
                }
            }
            if(i>NF) {
                exit 1;
            }
        }
        FNR>1 {
            if($col) maxc=FNR;
        }
        END{
            print maxc;
        }' file.csv)
    echo "$col $LINE_CNT"
done
Output:
car 5
speed 4
gas 4
color 5
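If you'd rather avoid the shell loop entirely, a single awk invocation can map the header name to its column number itself; here's a sketch (assuming the header names don't themselves contain commas):
awk -F' *, *' -v name="gas" '
    FNR==1 {                                # find the column number for the requested header
        for (i=1; i<=NF; i++) if ($i == name) col = i
        if (!col) exit 1                    # header not found
        next
    }
    $col != "" { maxc = FNR }               # remember the last line where that column is non-empty
    END { print maxc }
' file.csv
Swap "gas" for any other header name to get the corresponding count.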

Find nth row using AWK and assign them to a variable

Okay, I have two files: one is a baseline and the other is a generated report. I have to validate that a specific string in both files matches; it is not just a single word, see the example below:
.
.
name os ksd
56633223223
some text..................
some text..................
My search criterion here is to find a unique number such as "56633223223" and retrieve the 1 line above and 3 lines below it. I can do that on both the base file and the report, and then compare whether they match. On the whole, I need a shell script for this.
Since the strings above and below are unique but the line count varies, I have put them in a file called "actlist":
56633223223 1 5
56633223224 1 6
56633223225 1 3
.
.
Now, from "Rcount" below I get how many iterations are to be performed, and in each iteration I have to get the ith row and check whether the word count is 3; if it is, take those values into variables and use something like this.
I'm stuck on the part below, namely which command to use. I'm thinking of using AWK, but if there is anything better please advise. Here's some pseudo-code showing what I'm trying to do:
xxxxx=/root/xxx/xxxxxxx
Rcount=`wc -l $xxxxx | awk -F " " '{print $1}'`
i=1
while ((i <= Rcount))
do
record=_________________'(Awk command to retrieve ith(1st) record (of $xxxx),
wcount=_________________'(Awk command to count the number of words in $record)
(( i=i+1 ))
done
Note: record, wcount values are later printed to a log file.
Sounds like you're looking for something like this:
#!/bin/bash
while read -r word1 word2 word3 junk; do
    if [[ -n "$word1" && -n "$word2" && -n "$word3" && -z "$junk" ]]; then
        echo "all good"
    else
        echo "error"
    fi
done < /root/shravan/actlist
This will go through each line of your input file, assigning the three columns to word1, word2 and word3. The -n tests check that read hasn't assigned an empty value to any of those variables. The -z check verifies that there are only three columns, i.e. that $junk is empty.
I PROMISE you you are going about this all wrong. To find words in file1 and search for those words in file2 and file3 is just:
awk '
NR==FNR{ for (i=1;i<=NF;i++) words[$i]; next }
{ for (word in words) if ($0 ~ word) print FILENAME, word }
' file1 file2 file3
or similar (assuming a simple grep -f file1 file2 file3 isn't adequate). It DOES NOT involve shell loops to call awk to pull out strings to save in shell variables to pass to other shell commands, etc, etc.
So far all you're doing is asking us to help you implement part of what you think is the solution to your problem. We're struggling to do that because what you're asking for doesn't make sense as part of any reasonable solution to what your problem sounds like, so it's hard to suggest anything sensible.
If you tell us what you are trying to do AS A WHOLE, with sample input and expected output for your whole process, then we can help you.
We don't seem to be getting anywhere so let's try a stab at the kind of solution I think you might want and then take it from there.
Look at these 2 files "old" and "new" side by side (line numbers added by the cat -n):
$ paste old new | cat -n
     1  a            b
     2  b            56633223223
     3  56633223223  c
     4  c            d
     5  d            h
     6  e            56633223225
     7  f            i
     8  g            Z
     9  h            k
    10  56633223225  l
    11  i
    12  j
    13  k
    14  l
Now let's take this "actlist":
$ cat actlist
56633223223 1 2
56633223225 1 3
and run this awk command on all 3 of the above files (yes, I know it could be briefer, more efficient, etc. but favoring simplicity and clarity for now):
$ cat tst.awk
ARGIND==1 {
    numPre[$1] = $2
    numSuc[$1] = $3
}
ARGIND==2 {
    oldLine[FNR] = $0
    if ($0 in numPre) {
        oldHitFnr[$0] = FNR
    }
}
ARGIND==3 {
    newLine[FNR] = $0
    if ($0 in numPre) {
        newHitFnr[$0] = FNR
    }
}
END {
    for (str in numPre) {
        if ( str in oldHitFnr ) {
            if ( str in newHitFnr ) {
                for (i=-numPre[str]; i<=numSuc[str]; i++) {
                    oldFnr = oldHitFnr[str] + i
                    newFnr = newHitFnr[str] + i
                    if (oldLine[oldFnr] != newLine[newFnr]) {
                        print str, "mismatch at old line", oldFnr, "new line", newFnr
                        print "\t" oldLine[oldFnr], "vs", newLine[newFnr]
                    }
                }
            }
            else {
                print str, "is present in old file but not new file"
            }
        }
        else if (str in newHitFnr) {
            print str, "is present in new file but not old file"
        }
    }
}
$ awk -f tst.awk actlist old new
56633223225 mismatch at old line 12 new line 8
j vs Z
It's outputting that result because the 2nd line after 56633223225 is "j" in file "old" but "Z" in file "new", and the file "actlist" said the 2 files had to be common from one line before until 3 lines after that pattern.
Is that what you're trying to do? The above uses GNU awk for ARGIND but the workaround is trivial for other awks.
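For reference, the usual workaround when your awk has no ARGIND is to derive it yourself; a minimal sketch (it assumes none of the input files are empty):
FNR == 1 { argind++ }       # FNR resets to 1 at the start of each input file
argind == 1 { numPre[$1] = $2; numSuc[$1] = $3 }
argind == 2 { oldLine[FNR] = $0; if ($0 in numPre) oldHitFnr[$0] = FNR }
argind == 3 { newLine[FNR] = $0; if ($0 in numPre) newHitFnr[$0] = FNR }
with the END block from tst.awk left unchanged.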
Use the below code:
awk '{if (NF == 3) { word1=$1; word2=$2; word3=$3; print "Words are:" word1, word2, word3} else {print "Line", NR, "is having", NF, "Words" }}' filename.txt
I have given the solution as per the requirement.
awk '{ # awk starts from here and read a file line by line
if (NF == 3) # It will check if current line is having 3 fields. NF represents number of fields in current line
{ word1=$1; # If current line is having exact 3 fields then 1st field will be assigned to word1 variable
word2=$2; # 2nd field will be assigned to word2 variable
word3=$3; # 3rd field will be assigned to word3 variable
print word1, word2, word3} # It will print all 3 fields
}' filename.txt >> output.txt # These 3 fields will be redirected to a file which can be used for further processing.
This is as per the requirement; there are many other ways of doing this, but awk was asked for.

Remove linefeed from csv preserving rows

I have a CSV that was exported; some lines have a linefeed (ASCII 012) in the middle of a record. I need to replace this with a space, but preserve the newline at the end of each record so I can load it.
Most of the lines are fine, however a good few have this:
Input:
10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :
Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"
Output:
10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"
I have been looking into Awk but cannot really make sense of how to preserve the actual row.
Another Example:
Input:
9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working
Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
Output:
9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
One way using GNU awk:
awk -f script.awk file.txt
Contents of script.awk:
BEGIN {
    FS = "[,~]"
}

NF < 21 {
    line = (line ? line OFS : line) $0
    fields = fields + NF
}

fields >= 21 {
    print line
    line = ""
    fields = 0
}

NF == 21 {
    print
}
Alternatively, you can use this one-liner:
awk -F "[,~]" 'NF < 21 { line = (line ? line OFS : line) $0; fields = fields + NF } fields >= 21 { print line; line=""; fields=0 } NF == 21 { print }' file.txt
Explanation:
I made an observation about your expected output: it seems each line should contain exactly 21 fields. Therefore, if a line contains fewer than 21 fields, store the line and store its field count. When we move on to the next line, that line is joined to the stored line with a space, and the field counts are totalled. If this total is greater than or equal to 21 (the field counts of a broken line will sum to 22), print the stored line. Otherwise, if the line contains exactly 21 fields (NF == 21), print it as-is. HTH.
I think sed is your choice. I assume all complete records end with a double quote; thus, if a line does not end with a quote, it is recognized as incomplete and the next line should be joined onto it.
Here is the code:
cat data | sed -e '/[^"]$/N' -e 's/\n//g'
The first expression, -e '/[^"]$/N', matches that abnormal case (a line not ending with a quote) and reads in the next line without emptying the buffer. Then -e 's/\n//g' removes the embedded newline character.
try this one-liner:
awk '{if(t){print;t=0;next;}x=$0;n=gsub(/"/,"",x);if(n%2){printf $0" ";t=1;}else print $0}' file
Idea:
Count the number of " characters in a line. If the count is odd, join the following line; otherwise the current line is considered complete.
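If a quoted field can span more than two physical lines, the same idea works with a running quote count instead of a per-line one; here's a sketch (assuming double quotes are never escaped inside a field, and joining the pieces with a space as the question asks):
awk '{
    buf = (buf == "" ? $0 : buf " " $0)              # stitch physical lines together
    q += gsub(/"/, "\"")                             # running total of quote characters seen
    if (q % 2 == 0) { print buf; buf = ""; q = 0 }   # even count: the record is complete
}' file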
