Below here I'm trying to remove comma only from msg column.
Input file ("abc.txt" has many entries as below):
alert tcp any any -> any [10,112,34] (msg:"Its an Test, Rule"; reference:url,view/Main; sid:1234; rev:1;)
Expected Output:
alert tcp any any -> any [10,112,34] (msg:"Its an Test Rule"; reference:url,view/Main; sid:1234; rev:1;)
This is what i have tried using awk:
awk -F ';' '{for(i=1;i<=NF;i++){if(match($i,"msg:")>0){split($i, array, "\"");tmessage=array[2];gsub("[',']","",tmessage);message=tmessage; }}print message'} abc.txt
The problem with having awk rewrite your fields is that output for modified lines will be field-separated by OFS, which is static.
The way around this is to avoid dealing with fields, and just handle the string replacement on $0. You could piece together the parts of the line manually, like this:
awk '{x=index($0,"msg:"); y=index(substr($0,x),";"); s=substr($0,x,y); gsub(/,/,"",s); print substr($0,1,x-1) s substr($0,x+y)}' input.txt
Or spelled out for easier reading:
{
x=index($0,"msg:") # find the offset of the interesting bit
y=index(substr($0,x),";") # find the length of that bit
s=substr($0,x,y) # clip the bit
gsub(/,/,"",s) # replace commas,
print substr($0,1,x-1) s substr($0,x+y) # print the result.
}
Related
Suppose I have text as:
This is a sample text.
I have 2 sentences.
text is present there.
I need to replace whole text between two 'text' words. The required solution should be
This is a sample text.
I have new sentences.
text is present there.
I tried using the below command but its not working:
sed -i 's/text.*?text/text\
\nI have new sentence/g' file.txt
With your shown samples please try following. sed doesn't support lazy matching in regex. With awk's RS you could do the substitution with your shown samples only. You need to create variable val which has new value in it. Then in awk performing simple substitution operation will so the rest to get your expected output.
awk -v val="your_new_line_Value" -v RS="" '
{
sub(/text\.\n*[^\n]*\n*text/,"text.\n"val"\ntext")
}
1
' Input_file
Above code will print output on terminal, once you are Happy with results of above and want to save output into Input_file itself then try following code.
awk -v val="your_new_line_Value" -v RS="" '
{
sub(/text\.\n*[^\n]*\n*text/,"text.\n"val"\ntext")
}
1
' Input_file > temp && mv temp Input_file
You have already solved your problem using awk, but in case anyone else will be looking for a sed solution in the future, here's a sed script that does what you needed. Granted, the script is using some advanced sed features, but that's the fun part of it :)
replace.sed
#!/usr/bin/env sed -nEf
# This pattern determines the start marker for the range of lines where we
# want to perform the substitution. In our case the pattern is any line that
# ends with "text." — the `$` symbol meaning end-of-line.
/text\.$/ {
# [p]rint the start-marker line.
p
# Next, we'll read lines (using `n`) in a loop, so mark this point in
# the script as the beginning of the loop using a label called `loop`.
:loop
# Read the next line.
n
# If the last read line doesn't match the pattern for the end marker,
# just continue looping by [b]ranching to the `:loop` label.
/^text/! {
b loop
}
# If the last read line matches the end marker pattern, then just insert
# the text we want and print the last read line. The net effect is that
# all the previous read lines will be replaced by the inserted text.
/^text/ {
# Insert the replacement text
i\
I have a new sentence.
# [print] the end-marker line
p
}
# Exit the script, so that we don't hit the [p]rint command below.
b
}
# Print all other lines.
p
Usage
$ cat lines.txt
foo
This is a sample text.
I have many sentences.
I have many sentences.
I have many sentences.
I have many sentences.
text is present there.
bar
$
$ ./replace.sed lines.txt
foo
This is a sample text.
I have a new sentence.
text is present there.
bar
Substitue
sed -i 's/I have 2 sentences./I have new sentences./g'
sed -i 's/[A-Z]\s[a-z].*/I have new sentences./g'
Insert
sed -i -e '2iI have new sentences.' -e '2d'
I need to replace whole text between two 'text' words.
If I understand, first text. (with a dot) is at the end of first line and second text at the beginning of third line. With awk you can get the required solution adding values to var s:
awk -v s='\nI have new sentences.\n' '/text.?$/ {s=$0 s;next} /^text/ {s=s $0;print s;s=""}' file
This is a sample text.
I have new sentences.
text is present there.
I am wondering if there is a way to get paragraphs of text (source file would be a pyx file) by number as sed does with lines
sed -n ${i}p
At this moment I'd be interested to use awk with:
awk '/custom-pyx-tag\(/,/\)custom-pyx-tag/'
but I can't find documentation or examples about that.
I'm also trying to trim "\r\n" with gsub(/\r\n/,"; ") int the same awk command but it doesn't work, and I can't really figure out why.
Any hint would be very appreciated, thanks
EDIT:
This is just one example and not my exact need but I would need to know how to do it for a multipurpose project
Let's take the case that I have exported the ID3Tags of a huge collection of audio files and these have been stored in a pyx-like format, so in the end I will have a nice big file with this pattern repeating for each file in the collection:
audio-genre(
blablabla
)audio-genre
audio-artist(
bla.blabla
)audio-artist
audio album(
bla-bla-bla
)audio-album
audio-track-num(
0x
)audio-track-num
audio-track-title(
bla.bla-bla
)audio-track-title
audio-lyrics(
blablablablabla
bla.bla.bla.bla
blah-blah-blah
blabla-blabla
)audio-lyrics
...
Now if I want to extract the artist of the 1234th audio file I can use:
awk '/audio-artist\(/, /)audio-artist/' | sed '/audio-artist/d' | sed -n 1234p
so being one line it can be obtained with sed, but I don't know how to get an entire paragraph given its index, for example if I want to get the lyrics of the 6543th file how could I do it?
In the end it is just a question of whether there is a command equivalent to
sed -n $ {num} p
but to be used for paragraphs
awk -v indx=1024
'BEGIN {
RS=""
}
{ split($0,arr,"audio-artist");
for (i=2;i<=length(arr);i=i+2)
{ gsub("[()]","",arr[i]);
arts[cnt+=1]=arr[i]
}
}
END {
print arts[indx]
}' audioartist
One liner:
awk -v indx=1234 'BEGIN {RS=""} NR==1 { split($0,arr,"audio-artist");for (i=2;i<=length(arr);i=i+2) { gsub("[()]","",arr[i]);arts[cnt+=1]=arr[i] } } END { print arts[indx] }' audioartist
Using awk, and the file called audioartist, we consume the file as one line by setting the records separator (RS) to "". We then split the whole file into an array arr, based on the separator audio-artist. We look through the array arr starting from 2 in steps of 2 till the end of the array and strip out the opening and closing brackets, creating another array called arts with an incrementing count as the index and the stripped artist as the value. At the end we print the arts index specified by the passed indx variable (in this case 1234).
XYZNA0000778800Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
XYZNA0000778900Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
I have above file format from which I want to find a matching record. For example, match a number(7789) on line starting with XYZ and once matched look for a matching number (7345) in lines below starting with 1 until it reaches to line starting with 9. retrieve the entire line record. How can I accomplish this using shell script, awk, sed or any combination.
Expected Output:
XYZNA0000778900Z
17345000012300324000000004000000000000000
With sed one can do:
$ sed -n '/^XYZ.*7789/,/^9$/{/^1.*7345/p}' file
17345000012300324000000004000000000000000
Breakdown:
sed -n ' ' # -n disabled automatic printing
/^XYZ.*7789/, # Match line starting with XYZ, and
# containing 7789
/^1.*7345/p # Print line starting with 1 and
# containing 7345, which is coming
# after the previous match
/^9$/ { } # Match line that is 9
range { stuff } will execute stuff when it's inside range, in this case the range is starting at /^XYZ.*7789/ and ending with /^9$/.
.* will match anything but newlines zero or more times.
If you want to print the whole block matching the conditions, one can use:
$ sed -n '/^XYZ.*7789/{:s;N;/\n9$/!bs;/\n1.*7345/p}' file
XYZNA0000778900Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
This works by reading lines between ^XYZ.*7779 and ^9$ into the pattern
space. And then printing the whole thing if ^1.*7345 can be matches:
sed -n ' ' # -n disables printing
/^XYZ.*7789/{ } # Match line starting
# with XYZ that also contains 7789
:s; # Define label s
N; # Append next line to pattern space
/\n9$/!bs; # Goto s unless \n9$ matches
/\n1.*7345/p # Print whole pattern space
# if \n1.*7345 matches
I'd use awk:
awk -v rid=7789 -v fid=7345 -v RS='\n9\n' -F '\n' 'index($1, rid) { for(i = 2; i < $NF; ++i) { if(index($i, fid)) { print $i; next } } }' filename
This works as follows:
-v RS='\n9\n' is the meat of the whole thing. Awk separates its input into records (by default lines). This sets the record separator to \n9\n, which means that records are separated by lines with a single 9 on them. These records are further separated into fields, and
-F '\n' tells awk that fields in a record are separated by newlines, so that each line in a record becomes a field.
-v rid=7789 -v fid=7345 sets two awk variables rid and fid (meant by me as record identifier and field identifier, respectively. The names are arbitrary.) to your search strings. You could encode these in the awk script directly, but this way makes it easier and safer to replace the values with those of a shell variables (which I expect you'll want to do).
Then the code:
index($1, rid) { # In records whose first field contains rid
for(i = 2; i < $NF; ++i) { # Walk through the fields from the second
if(index($i, fid)) { # When you find one that contains fid
print $i # Print it,
next # and continue with the next record.
} # Remove the "next" line if you want all matching
} # fields.
}
Note that multi-character record separators are not strictly required by POSIX awk, and I'm not certain if BSD awk accepts it. Both GNU awk and mawk do, though.
EDIT: Misread question the first time around.
an extendable awk script can be
$ awk '/^9$/{s=0} s&&/7345/; /^XYZ/&&/7789/{s=1} ' file
set flag s when line starts with XYZ and contains 7789; reset when line is just 9, and print when flag is set and contains pattern 7345.
This might work for you (GNU sed):
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^XYZ[^\n]*7789/!b;/7345/p' file
Use the option -n for the grep-like nature of sed. Gather up records beginning with XYZ and ending in 9. Reject any records which do not have 7789 in the header. Print any remaining records that contain 7345.
If the 7345 will always follow the header,this could be shortened to:
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^XYZ[^\n]*7789.*7345/p' file
If all records are well-formed (begin XYZ and end in 9) then use:
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^[^\n]*7789.*7345/p' file
Below is my sample data in the csv .
20160711,"M","N1","F","S","A","good data with.....some special character and space
space ..
....","M","072","00126"
20160711,"M","N1","F","S","A","R","M","072","00126"
20160711,"M","N1","F","S","A","R","M","072","00126"
In above in a field I have good data along with junk data and line splited to new line .
I want to remove this special character (due to this special char and space,the line was moved to the next line) as well as merge this split line to a single line.
currently I am using something like below which is taking lots of time :
tr -cd '\11\12\15\40-\176' | gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' MY_FILE.csv > MY_FILE.csv.tmp
attached a screenshot of original data in the file .
You could use
tr -c '[:print:]\r\n' ' ' <bad.csv >better.csv
to get rid of the non-printable chars…
sed '/[^"]$/ { N ; s/\n// }' better.csv | sed '/[^"]$/ { N ; s/\n// }' >even_better.csv
would cover most cases (i.e. would fail to trap an extra line break just after a random quote)
– Samson Scharfrichter
One problem that you will likely have with a traditional unix tool like awk is that while it supports field separators, it does not support quote+comma-style CSV formatting like the one in your screenshot or sample data. Awk can separate fields in a record using a field separator, but it has no concept of quote armour around your fields, so embedded commas are also considered field separators.
If you're comfortable with that because none of your plaintext data includes commas, and none of your "non-printable" data includes commas by accident, then you can just consider the quotes to be part of the field. They're printable characters, after all.
If you want to join your multi-line records into a single line and strip any non-printable characters, the following awk one-liner might do:
awk -F, 'NF<10{$0=last $0;last=$0} NF<10{next} {last="";sub(/[^[:print:]]/,"")} 1' inputfile
Note that this works except in cases where the line break is between the last comma and the content of the last field because from awk's perspective an empty field is valid and there's no need to join. If this logic doesn't match your data, you get another fun programming task as a result. :)
Let's break out the awk script and see what it does.
awk -F, ' # Set comma as the field separator...
NF<10 { # For any lines that have fewer than 10 fields...
$0=last $0 # Insert the last "saved" line here,
last=$0 # and save the newly joined line for the next round.
}
NF<10 { # If we still have fewer than 10 lines,
next # repeat.
}
{
sub(/[^[:print:]]/,"") # finally, substitute an empty string
} # for all non-printables,
1' inputfile # And print the current line.
This should be a simple fix but I cannot wrap my head around it at the moment.
I have a comma-delimited file called my_course that contains a list of courses with some information about them.
I need to get user input about the last two fields and change them accordingly.
Each field is constructed like:
CourseNumber,CourseTitle,CreditHours,Status,Grade
Example file:
CSC3210,COMPUTER ORG & PROGRAMMING,3,0,N/A
CSC2010,INTRO TO COMPUTER SCIENCE,3,0,N/A
CSC1010,COMPUTERS & APPLICATIONS,3,0,N/A
I get the user input for 3 things: Course Number, Status (0 or 1), and Grade (A,B,C,N/A)
So far I have tried matching the line containing the course number and changing the last two fields. I haven't been about to figure out how to modify the last two fields using sed so I'm using this horrible jumble of awk and sed:
temporary=$(awk -v status=$status -v grade=$grade '
BEGIN { FS="," }; $(NF)=""; $(NF-1)="";
/'$cNum'/ {printf $0","status","grade;}' my_course)
sed -i "s/CSC$cNum.*/$temporary/g" my_course
The issue that I'm running into here is the number of fields in the course title can range from 1 to 4 so I can't just easily print the first n fields. I've tried removing the last two fields and appending the new values for status and grade but that isn't working for me.
Note: I have already done checks to ensure that the user inputs valid data.
Use a simple awk-script:
BEGIN {
FS=","
OFS=FS
}
$0 ~ course {
$(NF-1)=status
$NF=grade
} {print}
and on the cmd-line, set three parameters for the various parameters like course, status and grade.
in action:
$ cat input
CSC3210,COMPUTER ORG & PROGRAMMING,3,0,N/A
CSC2010,INTRO TO COMPUTER SCIENCE,3,0,N/A
CSC1010,COMPUTERS & APPLICATIONS,3,0,N/A
$ awk -vcourse="CSC3210" -vstatus="1" -vgrade="A" -f grades.awk input
CSC3210,COMPUTER ORG & PROGRAMMING,3,1,A
CSC2010,INTRO TO COMPUTER SCIENCE,3,0,N/A
CSC1010,COMPUTERS & APPLICATIONS,3,0,N/A
$ awk -vcourse="CSC1010" -vstatus="1" -vgrade="B" -f grades.awk input
CSC3210,COMPUTER ORG & PROGRAMMING,3,0,N/A
CSC2010,INTRO TO COMPUTER SCIENCE,3,0,N/A
CSC1010,COMPUTERS & APPLICATIONS,3,1,B
It doesn't matter how much commas you have in course name as long as you look only at last two commas:
sed -i "/CSC$cNum/ s/.,[^,]*$$/$status,$grade/"
The trick is to use $ in pattern to match the end of line. $$ because of double quotes.
And don't bother building the "temporary" line - apply substitution only to line that matches course number.