Use grep to extract specific words from a string under conditions and print it to variable file names - bash

I have a sample file like this:
r 2.1238 /NodeList/1/DeviceList/1/ ( type=0x806, source=00:00, destination=ff:ff) ns3::IPv4Header (source ipv4: 10.1.2.1 dest ipv4: 10.1.2.4)
+ 2.0076 /NodeList/0/DeviceList/1/ ( type=0x806, source=00:00, destination=ff:ff) ns3::ArpHeader (source ipv4: 10.1.2.1 dest ipv4: 10.1.2.4)
- 2.0077 /NodeList/1/DeviceList/1/ ( type=0x806, source=00:00, destination=ff:ff) ns3::IPv4Header (source ipv4: 10.1.2.1 dest ipv4: 10.1.2.4)
d 4.0042 /NodeList/2/DeviceList/1/ ( type=0x806, source=00:00, destination=ff:ff) ns3::IPv4Header (source ipv4: 10.1.2.1 dest ipv4: 10.1.2.4)
and so on.
The important fields are the first three. The first field can be '-', '+', 'r' or 'd'. The second field is a time stamp. The third field contains the node number (the number after /NodeList/), which names the file to which the previous two fields are to be printed.
The restriction is that I want data only from lines containing IPv4Header, e.g. the 1st, 3rd and 4th lines in the above sample.
So, I want my output to be like:
Node0.txt:
+ 2.0076
Node1.txt:
r 2.1238
- 2.0077
As you can see Node0 in line 2 is rejected as it does not contain IPv4Header. As we can see the number of output files being generated is equal to the number of nodes having IPv4Header.
Assume the number of lines and nodes to be multiple and variable for different files. Can someone please provide me with the necessary grep code?

awk to the rescue!
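$ awk -F/ '/IPv4Header/{print $1 > ("Node" $3 ".txt") }' file
should create two files with contents
Node1.txt:
r 2.1238
- 2.0077
Node2.txt:
d 4.0042
Note that Node0.txt won't be created, as per your definition. The parentheses around the file name keep the concatenation unambiguous across awk implementations.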
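If the trace has many nodes, a variant that appends to each file and closes it after every write avoids the open-file limit of some awk implementations. This is only a minimal sketch, assuming whitespace-separated fields as in the sample; the scratch directory, the file name trace.txt and the names dir, part and out are made up for illustration:

```shell
# Illustrative setup: a scratch directory holding the sample lines from the question.
workdir=$(mktemp -d)
cat > "$workdir/trace.txt" <<'EOF'
r 2.1238 /NodeList/1/DeviceList/1/ ( type=0x806, source=00:00, destination=ff:ff) ns3::IPv4Header (source ipv4: 10.1.2.1 dest ipv4: 10.1.2.4)
+ 2.0076 /NodeList/0/DeviceList/1/ ( type=0x806, source=00:00, destination=ff:ff) ns3::ArpHeader (source ipv4: 10.1.2.1 dest ipv4: 10.1.2.4)
- 2.0077 /NodeList/1/DeviceList/1/ ( type=0x806, source=00:00, destination=ff:ff) ns3::IPv4Header (source ipv4: 10.1.2.1 dest ipv4: 10.1.2.4)
d 4.0042 /NodeList/2/DeviceList/1/ ( type=0x806, source=00:00, destination=ff:ff) ns3::IPv4Header (source ipv4: 10.1.2.1 dest ipv4: 10.1.2.4)
EOF

awk -v dir="$workdir" '/IPv4Header/ {
    split($3, part, "/")              # $3 looks like /NodeList/1/DeviceList/1/
    out = dir "/Node" part[3] ".txt"  # part[3] is the node number
    print $1, $2 >> out               # append marker and time stamp
    close(out)                        # keep at most one output file open
}' "$workdir/trace.txt"

cat "$workdir/Node1.txt"
```

Because the files are opened in append mode, run it in an empty directory (or remove old Node*.txt files first), otherwise repeated runs accumulate lines.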

What you are actually looking for is sed. For example, using sed you could do the following:
$ sed -n '/IPv4Header/s/^\([-+rd]\)[ ]*\([0-9.]*\)[^0-9]*\([0-9]\).*$/Node\3.txt \1 \2/p' file
Node1.txt r 2.1238
Node1.txt - 2.0077
Node2.txt d 4.0042
Where
sed -n suppresses normal output,
/IPv4Header/ selects only lines containing IPv4Header.
What follows is the standard sed substitute s/match/replace/, where match uses capture groups \(stuff\) to save stuff for use in replace through the backreferences \1, \2, ... which are replaced with stuff1, stuff2, etc.
^\([-+rd]\) saves the first character in capture group 1 if it is one of -, +, r or d (the - comes first in the brackets so that it is taken literally instead of forming a range),
[ ]* skips any number of spaces,
\([0-9.]*\) saves a sequence of 0-9 and . in capture group 2,
[^0-9]* skips anything that is not 0-9,
\([0-9]\) captures the next digit in capture group 3,
.*$ skips the remaining chars,
Node\3.txt \1 \2 replaces the line with the backreferences in the order shown,
p prints it.
Give it a try and let me know if it is doing what you need.
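The sed command only prints the target file name in front of each record; it does not create the files. One way to finish the job is to read that output in a loop and append each record to the file named in its first column. A rough sketch, assuming single-digit node numbers as in the sed pattern and using a made-up scratch directory and input name (trace.txt), with the bracket expression written as [-+rd]:

```shell
# Illustrative setup: scratch directory with the sample lines from the question.
workdir=$(mktemp -d)
cat > "$workdir/trace.txt" <<'EOF'
r 2.1238 /NodeList/1/DeviceList/1/ ( type=0x806, source=00:00, destination=ff:ff) ns3::IPv4Header (source ipv4: 10.1.2.1 dest ipv4: 10.1.2.4)
+ 2.0076 /NodeList/0/DeviceList/1/ ( type=0x806, source=00:00, destination=ff:ff) ns3::ArpHeader (source ipv4: 10.1.2.1 dest ipv4: 10.1.2.4)
- 2.0077 /NodeList/1/DeviceList/1/ ( type=0x806, source=00:00, destination=ff:ff) ns3::IPv4Header (source ipv4: 10.1.2.1 dest ipv4: 10.1.2.4)
d 4.0042 /NodeList/2/DeviceList/1/ ( type=0x806, source=00:00, destination=ff:ff) ns3::IPv4Header (source ipv4: 10.1.2.1 dest ipv4: 10.1.2.4)
EOF

# sed emits "NodeN.txt marker timestamp"; the loop splits that into three words
# and appends the last two to the file named by the first.
sed -n '/IPv4Header/s/^\([-+rd]\)[ ]*\([0-9.]*\)[^0-9]*\([0-9]\).*$/Node\3.txt \1 \2/p' "$workdir/trace.txt" |
while read -r outfile marker stamp; do
    printf '%s %s\n' "$marker" "$stamp" >> "$workdir/$outfile"
done

cat "$workdir/Node1.txt"
```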

Related

sed replace string with pipe and stars

I have the following string:
|**barak**.version|2001.0132012031539|
in file text.txt.
I would like to replace it with the following:
|**barak**.version|2001.01.2012031541|
So I run:
sed -i "s/\|\*\*$module\*\*.version\|2001.0132012031539/|**$module**.version|$version/" text.txt
but the result is a duplicate instead of replacing:
|**barak**.version|2001.01.2012031541|**barak**.version|2001.0132012031539|
What am I doing wrong?
Here is the value for module and version:
$ echo $module
barak
$ echo $version
2001.01.2012031541
Assumptions:
lines of interest start and end with a pipe (|) and have one more pipe somewhere in the middle of the data
search is based solely on the value of ${module} existing between the 1st/2nd pipes in the data
we don't know what else may be between the 1st/2nd pipes
the version number is the only thing between the 2nd/3rd pipes
we don't know the version number that we'll be replacing
Sample data:
$ module='barak'
$ version='2001.01.2012031541'
$ cat text.txt
**barak**.version|2001.0132012031539| <<<=== leave this one alone
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.0132012031539| <<<=== replace this one
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.0132012031539| <<<=== replace this one
One sed solution with -Extended regex support enabled and making use of a capture group:
$ sed -E "s/^(\|[^|]*${module}[^|]*).*/\1|${version}|/" text.txt
Where:
\| - the escaped pipe matches a literal pipe (with -E, an unescaped | means alternation); the pipes inside the bracket expression [^|] and in the replacement are already literal
^(\|[^|]*${module}[^|]*) - first capture group that starts at the beginning of the line, starts with a pipe, then some number of non-pipe characters, then the search pattern (${module}), then more non-pipe characters (continues up to next pipe character)
.* - matches rest of the line (which we're going to discard)
\1|${version}| - replace line with our first capture group, then a pipe, then the new replacement value (${version}), then the final pipe
The above generates:
**barak**.version|2001.0132012031539|
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.01.2012031541| <<<=== replaced
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.01.2012031541| <<<=== replaced
An awk alternative using GNU awk:
awk -v mod="$module" -v vers="$version" -F \| '{ OFS=FS;split($2,map,".");inmod=substr(map[1],3,length(map[1])-4);if (inmod==mod) { $3=vers } }1' file
Pass two variables mod and vers to awk using $module and $version. Set the field delimiter to |. Split the second field into array map using the split function and using . as the delimiter. Then strip the leading and ending "**" from the first index of the array to expose the module name as inmod using the substr function. Compare this to the mod variable and if there is a match, change the 3rd delimited field to the variable vers. Print the lines with short hand 1
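Pipe is only special when you're using extended regular expressions (sed -E). In basic regex a bare | is literal, while GNU sed treats the escaped \| as alternation. The leading \| in your command therefore offered an empty alternative, which matched at the very start of the line, so the replacement was inserted in front of the original text instead of replacing it.
There's no reason why you need extended here, stick with basic regex and leave the pipes unescaped: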
sed "
# for lines matching module.version
/|\*\*$module\*\*.version|/ {
# replace the version
s/|2001.0132012031539|/|$version|/
}
" text.txt
or as an unreadable one-liner
sed "/|\*\*$module\*\*.version|/ s/|2001.0132012031539|/|$version|/" text.txt
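If the old version number is not known in advance, the same basic-regex approach can match whatever sits between the last two pipes. A minimal sketch, assuming the version field contains only digits and dots and that $version contains no / or & characters (both would need escaping in the replacement); the variable names line and result are only for illustration:

```shell
module='barak'
version='2001.01.2012031541'
line='|**barak**.version|2001.0132012031539|'

# Address: only lines for this module. Substitution: replace the final
# |digits-and-dots| field, whatever its current value is.
result=$(printf '%s\n' "$line" |
    sed "/|\*\*$module\*\*\.version|/ s/|[0-9.]*|\$/|$version|/")

printf '%s\n' "$result"
```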

Allow only specific characters, else null should be transferred, in unix

The characters allowed in the 2nd column are 0 to 9, A to Z, and the symbols "+" and "-". If the 2nd column contains only allowed characters, the complete record should be transferred; otherwise null should be transferred in the 2nd column.
Input
- 1|89+
- 2|-AB
- 3|XY*
- 4|PR%
Output
- 1|89+
- 2|-AB
- 3|<null>
- 4|<null>
grep -E '^[a-zA-Z0-9\+\-\|]+$' file > file1
but the above code discards the complete record if no match is found. I need all records: if the 2nd column matches, the record should be transferred unchanged, else null should be transferred in the 2nd column.
Use sed to replace everything from the pipe to the end of the line with a single pipe when the pipe is followed by zero or more characters in the class of digits, letters, plus or minus and then by one character not in that class. The pipe is left unescaped because GNU sed treats \| as alternation in basic regex.
sed 's/|[0-9a-zA-Z+-]*[^0-9a-zA-Z+-].*$/|/' file
Using awk and character classes where supported:
$ awk 'BEGIN{FS=OFS="|"}$2~/[^[:alnum:]+-]/{$2=""}1' file
1|89+
2|-AB
3|
4|
Where not supported (such as mawk) use:
$ awk 'BEGIN{FS=OFS="|"}$2~/[^A-Za-z0-9+-]/{$2=""}1' file
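Both awk commands above also let lowercase letters through. If only A to Z should be allowed, as the question states, the class can be narrowed. A small sketch, with a made-up fifth record (5|ab1) added to show the difference and LC_ALL=C set so that A-Z means the plain ASCII range:

```shell
input='1|89+
2|-AB
3|XY*
4|PR%
5|ab1'

# Blank the 2nd field whenever it contains a character outside A-Z, 0-9, + and -.
result=$(printf '%s\n' "$input" |
    LC_ALL=C awk 'BEGIN { FS = OFS = "|" } $2 ~ /[^A-Z0-9+-]/ { $2 = "" } 1')

printf '%s\n' "$result"
```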

reformatting text file from rows to column

I have multiple files in a directory that I need to reformat, putting the output in one file. The file structure is:
========================================================
Daily KPIs - DATE: 24/04/2013
========================================================
--------------------------------------------------------
Number of des = 5270
--------------------------------------------------------
Number of users = 210
--------------------------------------------------------
Number of active = 520
--------------------------------------------------------
Total non = 713
--------------------------------------------------------
========================================================
I need the output format to be:
Date,Numberofdes,Numberofusers,Numberofactive,Totalnon
24042013,5270,210,520,713
The directory has around 1500 files with the same format and I'm using CentOS 7.
Thanks
First we need a method to join the elements of an array into a string (cf. Join elements of an array?):
function join_array()
{
local IFS=$1
shift
echo "$*"
}
Then we can cycle over each of the files and convert each one into a comma-separated list (assuming that the original files have names ending in .txt).
for f in *.txt
do
sed -n 's/[^:=]\+[:=] *\(.*\)/\1/p' < "$f" | {
mapfile -t fields
join_array , "${fields[@]}"
}
done
Here, the sed command looks inside each input file for lines that:
begin with a substring that contains neither a : nor a = character (the [^:=]\+ part);
then follow a : or a = and an arbitrary number of spaces (the [:=] * part);
finally, end with an arbitrary substring (the *\(.*\) part).
The last substring is then captured and printed instead of the original string. Any other line in the input files is discarded.
After that, the output of sed is read by mapfile into the indexed array variable fields (the -t ensures that the trailing newline of each line read is discarded) and finally the lines are joined by our previously defined join_array function.
The reason why mapfile and join_array must be grouped together on the right-hand side of the pipe is that each side of a pipeline runs in its own subshell, so fields would not survive outside it; this is explained here: readarray (or pipe) issue.
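The loop above prints the date with its slashes and no header line. To get exactly the requested layout, one option is sketched below. Note that it swaps mapfile and join_array for paste -sd, (which joins lines with commas), strips the slashes with tr, and anchors the sed pattern; the scratch directory and the file names kpi1.txt and all.csv are made up for the example:

```shell
# Illustrative setup: one input file in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/kpi1.txt" <<'EOF'
========================================================
Daily KPIs - DATE: 24/04/2013
========================================================
--------------------------------------------------------
Number of des = 5270
--------------------------------------------------------
Number of users = 210
--------------------------------------------------------
Number of active = 520
--------------------------------------------------------
Total non = 713
--------------------------------------------------------
========================================================
EOF

{
    echo 'Date,Numberofdes,Numberofusers,Numberofactive,Totalnon'
    for f in "$workdir"/*.txt; do
        # Keep the value after ":" or "=", drop the slashes in the date,
        # then join the five values of this file with commas.
        sed -n 's/^[^:=]\{1,\}[:=] *\(.*[^ ]\) *$/\1/p' "$f" | tr -d '/' | paste -sd, -
    done
} > "$workdir/all.csv"

cat "$workdir/all.csv"
```

The tr step removes every slash in the extracted values, which is harmless here only because the other values are plain numbers.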

How to read and replace Special characters in a fixed length file using shell script

I have a fixed length file in which some records have different special characters like Еӏєпа
I'm able to select the records containing special characters.
I want to read 2 columns from those records and update it with '*' padded with blanks
Sample Data :
1234562013-09-01 01:05:30Еӏєпа Нцвѡі A other
5657812011-05-05 02:34:56abu jaya B other
Specifically, the 3rd and 4th column containing special characters, should be replaced with a single '*' padded with blanks to fill the length
I need result like below
1234562013-09-01 01:05:30* * A2013-09-01 02:03:40other
5657812011-05-05 02:34:56abu jaya B2013-09-01 07:06:10other
Tried the following commands :
sed -r "s/^(.{56}).{510}/\1$PAD/g;s/^(.{511}).{1023}/\1$PAD/g" errorline.txt
cut -c 57-568
Could someone help me out with this?
I would go with awk, something like:
awk '/[LIST__OF_SPECIAL_CHARS]/ {
l=$0
# for 3rd col
# NOTE the * must be padded if you have a fixed length file
# This can be done with spaces and/or (s)printf, read the docs
if (substr($0,FROM,NUM_OF_CHARS) ~ /[LIST__OF_SPECIAL_CHARS]/) {
l=substr(l,1,START_OF_3RD_COL_MINUS_1) "*" substr(l,START_OF_4TH_COL)
}
# for 4th col
# NOTE the * must be padded if you have a fixed length file
# This can be done with spaces and/or (s)printf, read the docs
if (substr($0,START_OF_4TH_COL,NUM_OF_CHARS) ~ /[LIST__OF_SPECIAL_CHARS]/) {
l=substr(l,1,START_OF_4TH_COL_MINUS_1) "*" substr(l,END_OF_4TH_COL_PLUS_1)
}
# after printing this line, skip to next record.
print l
next
}
{ # prints every other record
print }' INPUTFILE
sed "/.\{56\}.*[^a-zA-Z0-9 ].*.\{7\}/ s/\(.\{56\}\).\{20\}\(.\{7\}\)/\1* * \2/" errorline.txt
where:
56 is the length of the first part of your line, which doesn't contain special chars
20 is the length of the second part, which may contain special chars
7 is the length of the last part, the end of your string.
"* * " is the string that will replace your special char section.
Adapt those values to your string structure.
This sed reads the whole file and replaces only the lines with special chars.

Using sed to search large number of files for specific string and replace it

What I am trying to do is search a large number of source files for a particular pattern and put another expression in front of this pattern. The files I am looking in all have the same extension, *.F90.
My first step is to use grep to find all lines of those files containing allocate but not allocated, so I have:
grep -I " allocate *(" *.F90 | grep -v allocated
The first problem that I have is that the bracket might be preceded by one or more spaces. I can have
allocate(
or allocate (
or allocate (
This is why I need the “*” in the grep command.
The general rule however (besides the spaces) says that the allocate is followed by “(” and then comes the thing that is being allocated. So I have:
allocate ( array_name ( ....
again the spaces are optional
So what I would like to do is find this string, and put in front of it the following:
If( allocated(array_name) ) deallocate(array_name)
and immediately after this, on the next line, I would like to have the original string allocate(array( … .
Please note that array_name is an alphanumeric string which, after the substitution, appears in more than one place. It is the name of the array being allocated.
I would be very grateful if someone can give me a hint how to do this. I am stuck and have no idea how to do it.
I assume you mean you want to replace allocate ( array_name ) with If( allocated(array_name) ) deallocate(array_name) allocate ( array_name ).
In GNU or BSD sed you can do the following:
sed -i.bk -e '/allocated/b' \
-e 's/allocate *( *\([A-Za-z0-9_]*\) *)/If( allocated(\1) ) deallocate(\1) &/' \
*.F90
This will search and replace matching lines in *.F90, skipping lines that contain allocated: the b command branches to the end of the script for those lines. Each original file is kept as a backup with .bk appended to its name.
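To put the guard on its own line and keep the indentation, as the question asks, the leading spaces can be captured as well. A rough sketch on a made-up three-line file (demo.F90), assuming allocate is the first thing on the line after optional spaces and using b (branch to end of script) to skip lines containing allocated; it writes to a new file instead of editing in place:

```shell
workdir=$(mktemp -d)
cat > "$workdir/demo.F90" <<'EOF'
      allocate ( work )
      if (allocated(work)) print *, 'ok'
      allocate(buf)
EOF

# \1 = indentation, \2 = the original allocate statement, \3 = the array name.
# The backslash-newline inside the replacement inserts a line break.
sed -e '/allocated/b' \
    -e 's/^\( *\)\(allocate *( *\([A-Za-z0-9_]*\) *)\)/\1If( allocated(\3) ) deallocate(\3)\
\1\2/' "$workdir/demo.F90" > "$workdir/demo.new"

cat "$workdir/demo.new"
```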
As @Anders Johansson mentioned, there can be other cases where the argument to allocate is something other than alphanumerics and underscores; you can search for these before you run the search and replace:
for i in *.F90; do
echo "$i"
sed -n '/.*allocate *( *\([^ )]*\) *).*/{h; s//\1/; /^[A-Za-z0-9_]*$/t
x; p;}' "$i"
done
(Note the newline after t: BSD sed interprets everything after t as a label.) Press ctrl+v ctrl+j in bash to input a newline on the command line.
/a\(b\)c/ find a line with the matching string
h *h*old: copy the line containing abc into hold space
s//\1/ *s*ubstitute the last match abc with the first group b
/^[a-z]*$/t if b matches ^[a-z]*$, then branch to end of script
x e*x*change hold space (the original line) and pattern space (b)
p *p*rint pattern space, which now holds the original line
cat old_file.txt | sed 's/allocate *( *\([a-zA-Z0-9_]*\)/If( allocated(\1) ) deallocate(\1)\
allocate(\1/' > new_file.txt

Resources