I have a text document with the following content:
[ForwardTimer],__fc_layer_1__,[Span:1ms970us]
[ForwardTimer],__batch_norm_2__,[Span:5ms64us]
[ForwardTimer],__batch_norm_3__,[Span:5ms87us]
I want to convert the time values in ms unit, like
[ForwardTimer],__fc_layer_1__,1.970ms
[ForwardTimer],__batch_norm_2__,5.064ms
[ForwardTimer],__batch_norm_3__,5.087ms
while keeping previous words unchanged.
How can I process the document using shell script, especially with sed or awk command?
awk -F '\\[Span:' '{split($2,array,"ms|us"); printf("%s%s.%03dms\n",$1,array[1],array[2])}' file.txt
Output:
[ForwardTimer],__fc_layer_1__,1.970ms
[ForwardTimer],__batch_norm_2__,5.064ms
[ForwardTimer],__batch_norm_3__,5.087ms
This splits each line at [Span: into two fields ($1 and $2). The split() function then breaks $2 at ms or us into three parts (array[1], array[2] and array[3]); array[3] is unused. Finally, printf() produces the formatted output.
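A minimal sketch of the same idea wrapped in a shell function, in case it helps to see the parts separately. The function name span_to_ms is made up for illustration, and the handling of a span with no ms part (e.g. [Span:970us]) is an assumption, since the question does not show such a line.

```shell
# Hypothetical helper: reads lines on stdin (or from files) and rewrites
# a trailing [Span:XmsYus] as X.YYYms.
span_to_ms() {
    awk -F '[[]Span:' '{
        ms = 0; us = 0; span = $2
        # Pick out the digits in front of "ms" and "us" independently,
        # so a missing ms part (an assumption, not in the question) gives 0.
        if (match(span, /[0-9]+ms/)) ms = substr(span, RSTART, RLENGTH - 2)
        if (match(span, /[0-9]+us/)) us = substr(span, RSTART, RLENGTH - 2)
        # %03d zero-pads the microseconds to three digits.
        printf "%s%d.%03dms\n", $1, ms, us
    }' "$@"
}
```

Usage would be something like span_to_ms file.txt, or piping the text into span_to_ms.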
This might work for you (GNU sed):
sed -E 's/\[Span:([0-9]*)([^0-9]*)([0-9]*)[^]]*[]]/\1.\n\3\2/;:a;/\n[0-9]{3}/!s/\n/&0/;ta;s/\n//' file
Use pattern matching and back references to achieve the desired result.
Do not forget to zero-pad the decimal part of the match; this is done with a loop and a temporarily introduced newline, which is removed on completion.
The first substitution command focuses on a string such as [Span:5ms64us] and if found groups the 5 in back reference 1, ms in back reference 2 and 64 in back reference 3. These are rearranged into \1.\n\3\2 i.e. 5.\n64ms and the remainder of the initial string is removed.
The second part of the sed script zero-pads the decimal part (back reference 3) to be 3 digits long with leading zeroes. Using the \n as a marker, if fewer than 3 digits follow the \n, a 0 is appended to the \n and the check is repeated. Once the check passes, i.e. there are 3 digits, the \n is removed and the processing is complete.
I have a file whose columns are separated by a semicolon (;), and I want to change all occurrences of a word to another word, but only in one particular column. The column number varies and is held in a variable. The word I want to change is stored in a variable, and the word I want to change it to is stored in a variable too.
I tried
sed -i "s/\<$word\>/$wordUpdate/g" $anyFile
but it changed all occurrences of the word in the whole file! I only want to change it in a particular column.
The column number is stored in a variable called numColumn,
and the columns are separated by a semicolon (;).
It is much simpler to use awk for column edits, e.g. if your input looks like this:
68;61;83;27;60;70;84;11;46;62;93;97;40;23;19
33;70;17;49;81;21;68;83;16;6;42;38;68;81;89
73;40;95;64;32;33;77;56;23;11;70;28;33;80;24
8;9;74;6;86;78;87;41;11;79;23;28;71;99;15
29;87;77;9;98;12;7;66;60;85;20;14;55;97;17
39;24;21;58;23;61;39;26;57;70;76;16;70;53;8
37;46;18;64;56;28;86;7;80;71;94;46;19;53;43
71;2;47;62;9;21;68;9;9;80;32;59;73;74;72
20;34;89;58;74;92;86;35;48;81;50;6;63;67;90
78;17;6;63;61;65;75;31;33;82;24;5;90;46;12
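You can replace 60 in column c with s with something like this (note that ~ is a regex match, so m=60 would also match a field such as 160; use $c == m for an exact comparison):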
<infile awk '$c ~ m { $c = s } 1' FS=';' OFS=';' c=5 m=60 s=XX
Output:
68;61;83;27;XX;70;84;11;46;62;93;97;40;23;19
33;70;17;49;81;21;68;83;16;6;42;38;68;81;89
73;40;95;64;32;33;77;56;23;11;70;28;33;80;24
8;9;74;6;86;78;87;41;11;79;23;28;71;99;15
29;87;77;9;98;12;7;66;60;85;20;14;55;97;17
39;24;21;58;23;61;39;26;57;70;76;16;70;53;8
37;46;18;64;56;28;86;7;80;71;94;46;19;53;43
71;2;47;62;9;21;68;9;9;80;32;59;73;74;72
20;34;89;58;74;92;86;35;48;81;50;6;63;67;90
78;17;6;63;61;65;75;31;33;82;24;5;90;46;12
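A possible way to wire this to the variables from the question (numColumn, word, wordUpdate) is sketched below. The function name replace_in_column is hypothetical, and the exact string comparison is a deliberate change from the regex match used above.

```shell
# Hypothetical helper: replace_in_column COLUMN WORD REPLACEMENT [FILE...]
replace_in_column() {
    col=$1
    word=$2
    new=$3
    shift 3
    awk -v c="$col" -v m="$word" -v s="$new" '
        BEGIN { FS = OFS = ";" }
        # Appending "" forces a string comparison, so 60 matches only "60",
        # not "160" and not "060".
        ($c "") == (m "") { $c = s }
        { print }
    ' "$@"
}

# Example call with the variable names from the question:
# replace_in_column "$numColumn" "$word" "$wordUpdate" "$anyFile"
```

Note that awk interprets backslash escapes in values passed with -v, so a word containing backslashes would need extra care.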
This might work for you (GNU sed):
word=foo wordUpdate=bar numColumn=3
sed -i 'y/;/\n/
s#.*#echo "&" | sed "'${numColumn}'s/\<'${word}'\>/'${wordUpdate}'/"#e
y/\n/;/' file
Convert each line into a separate file where the columns are lines.
Substitute the matching line (column number) with the word for the updated word.
Reverse the conversion.
N.B. The solution relies on the GNU-only e evaluation flag. Also, the values of word and wordUpdate may need to be quoted.
This can be done with a little creativity...
Note that I'm using double-quotes to embed the logic. This takes a little extra care to double your \'s on backreferences.
$: word=baz; c=3; new=XX; lead="^([^;]*;){$((c-1))}"; sed -E "/$lead$word;/{s/($lead)$word/\\1$new/}" file
1;2;3;4;5;6;7;8;9;0;
foo;bar;XX;qux;foo;bar;baz;qux;
a;b;c;d;e;f;g;
Explained:
lead="^([^;]*;){$((c-1))}"
^ means at the start of a record
(...) is grouping for the following {...}, which specifies repetition
[^;]* means zero or more non-semicolons
$((c-1)) does the math and returns one less than the desired column; if you want to look at column 3, it returns two.
So, ^([^;]*;){$((c-1))} means: at the start of the record, one-less-than-column occurrences of non-semicolons followed by a semicolon
Thus, sed -E "/$lead$word;/{s/($lead)$word/\\1$new/}" file means: read file and, on records where $word occurs in the requested column, save everything before it and put that back, but replace $word with $new.
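To see what the prefix expression expands to, a small sketch follows. The helper name column_prefix is invented for this illustration; sed -E is assumed to be available, as in the answer itself.

```shell
# Hypothetical helper: print the regex that skips to column $1.
column_prefix() {
    printf '^([^;]*;){%d}' "$(($1 - 1))"
}

lead=$(column_prefix 3)
echo "$lead"                                           # prints ^([^;]*;){2}

# Show which part of a record the prefix consumes before column 3.
echo 'foo;bar;baz;qux;' | sed -E "s/($lead).*/\\1/"    # prints foo;bar;
```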
Even if you MUST use sed, I recommend a function.
fix(){
local word="$1" col="$2" new="$3" file="$4"
local lead="^([^;]*;){$((col-1))}"
sed -E "/$lead$word;/{s/($lead)$word/\\1$new/}" "$file"
}
In use -
$: fix bar 2 HI file
1;2;3;4;5;6;7;8;9;0;
foo;HI;baz;qux;foo;bar;baz;qux;
a;b;c;d;e;f;g;
$: fix 1 1 XX file
XX;2;3;4;5;6;7;8;9;0;
foo;bar;baz;qux;foo;bar;baz;qux;
a;b;c;d;e;f;g;
$: fix bar 2 '(^_^)' file
1;2;3;4;5;6;7;8;9;0;
foo;(^_^);baz;qux;foo;bar;baz;qux;
a;b;c;d;e;f;g;
No changes if no matches -
$: fix bar 5 HI file
1;2;3;4;5;6;7;8;9;0;
foo;bar;baz;qux;foo;bar;baz;qux;
a;b;c;d;e;f;g;
NOTE -
This logic requires trailing delimiters if you ever want to match the last field -
$: fix 0 10 HI file
1;2;3;4;5;6;7;8;9;HI;
foo;bar;baz;qux;foo;bar;baz;qux;
a;b;c;d;e;f;g;
With the trailing delimiters removed, the last field no longer matches:
$: fix 0 10 HI file
1;2;3;4;5;6;7;8;9;0
foo;bar;baz;qux;foo;bar;baz;qux
a;b;c;d;e;f;g
Otherwise you have to complicate the logic a bit.
But honestly, for field parsing, you'd be so much better served to use awk, or even perl or python, or for that matter a bash loop, though that's going to be relatively slow.
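As for complicating the logic a bit: one possible sketch is to accept either a semicolon or the end of the line after the word, so the last field matches without a trailing delimiter. The function name fix_any_column is hypothetical and this is only one way to do it, tested in my head against GNU sed -E behaviour.

```shell
# Hypothetical variant of fix: fix_any_column WORD COLUMN NEW [FILE...]
fix_any_column() {
    word=$1
    col=$2
    new=$3
    shift 3
    lead="^([^;]*;){$((col - 1))}"
    # Groups: \1 = whole prefix, \2 = last repeated field inside it,
    # \3 = whatever followed the word (a semicolon or nothing at end of line).
    sed -E "s/($lead)$word(;|\$)/\\1$new\\3/" "$@"
}
```

As with the original function, $word is used as a regex and $new as a sed replacement, so special characters such as / or & in either would need escaping.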
I have the following contents in a file
{"Hi","Hello","unix":["five","six"]}
I would like to replace the commas within the square brackets only with semicolons. The rest of the commas in the line should not be changed.
Output should be
{"Hi","Hello","unix":["five";"six"]}
I have tried using sed but it is not working. Below is the command I tried. Kindly help.
sed 's/:\[*\,*\]/;/'
Thanks
If your Input_file is the same as the sample shown, then the following may help you.
sed 's/\([^[]*\)\([^,]*\),\(.*\)/\1\2;\3/g' Input_file
Output will be as follows.
{"Hi","Hello","unix":["five";"six"]}
EDIT: Adding an explanation. The breakdown below is for explanation only; run the one-line command above to get the output.
sed 's/\([^[]*\)\([^,]*\),\(.*\)/\1\2;\3/g' Input_file
s ## starts a substitution in sed.
\([^[]*\) ## First capture group: everything from the start of the line up to (but not including) the first [; referenced later as \1.
\([^,]*\) ## Second capture group: everything from the [ (where the first group stopped) up to the first following comma; referenced as \2.
, ## Matches the literal comma that is to be replaced.
\(.*\) ## Third capture group: everything after that comma up to the end of the current line.
/\1\2;\3/g ## Replacement: the captured groups are written back in order, with a ; (semicolon) between \2 and \3 in place of the comma, as the OP requested.
Awk would also be useful here
awk -F'[][]' '{gsub(/,/,";",$2); print $1"["$2"]"$3}' file
By using gsub, you can replace all occurrences of the matched symbol inside a specific field.
Input File
{"Hi","Hello","unix":["five","six"]}
{"Hi","Hello","unix":["five","six","seven","eight"]}
Output
{"Hi","Hello","unix":["five";"six"]}
{"Hi","Hello","unix":["five";"six";"seven";"eight"]}
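The awk command above handles one bracket pair per line. If a line could contain several bracketed lists (an assumption; the question only shows one), a sketch along the following lines would walk through each pair in turn. The function name is made up for illustration.

```shell
# Hypothetical helper: turn commas into semicolons inside every [...] on a line.
brackets_commas_to_semicolons() {
    awk '{
        out = ""; rest = $0
        # match() finds the next [...] group; RSTART/RLENGTH describe it.
        while (match(rest, /\[[^]]*\]/)) {
            inside = substr(rest, RSTART, RLENGTH)
            gsub(/,/, ";", inside)               # only touches the bracketed part
            out = out substr(rest, 1, RSTART - 1) inside
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$@"
}
```

Nested brackets are not handled by this sketch.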
You should definitely use RavinderSingh13's answer instead of mine (it's less likely to break or exhibit unexpected behavior given very complex input) but here's a less robust answer that's a little easier to explain than his:
sed -r 's/(:\[.*),(.*\])/\1;\2/g' test
() is a capture group. You can see there are two in the search. In the replacement, they are referred to as \1 and \2. This allows you to put chunks of your search back in the replacement expression. -r keeps the ( and ) from needing to be escaped with a backslash. [ and ] are special and need to be escaped for literal interpretation. Oh, and you wanted .* not *. The * on its own is a glob and is used in some places in bash and other shells, but not in regexes alone.
edit: /g allows the replacement to happen multiple times per line. Note, however, that because .* is greedy this pattern matches only once per line, so with more than two items only the last comma inside the brackets is replaced.
I am trying to clean up the next file:
1. 10.160.120.10 ; 140.0.0.40 ;Data-- 1155~00120~xtl~12/01/2016 03:00:24~000BBBBBA4FB~ÍežG5„È&gÈe#Ÿ#•Œ‘„¦åEI²6frÞõ+ã:®*ÓÓÂ"ða5»V$è~
2. ¼?Amµxðïej£„7‹ìËÏð‡.4 --
3. 10.160.120.11 ; 140.10.10.10 ;Data-- 1155~00120~xtl~12/01/2016 03:00:54~2B3BB1EB1BBB~£ˆD]†CÀ,£ÑÉ»In&Ry+/jÑ%A¡ã ÷d_#C÷—NÏÕÞ
3. Ü‚úè"åD\’c\ûñ7x°yFæï --
Note that the numbers are not an actual part of the file. They are just a reference for the line number. The length of a line depends on the encoded message (that is why the 3 is repeated: it is basically one line). There are thousands of records, but they follow the same pattern. Each record ends with a (--).
Basically, what I am trying to achieve is to get just the IP and the ID side by side.
For example:
10.160.120.10 000BBBBBA4FB
My first step would be to delete everything between the first (;) and the fourth (~) since that pattern is the same for each record.
Which leads me to this.
sed 's/;.*~//'
However, this particular command would delete everything until the last (~) and not the fourth.
If it successfully removed everything between the first (;) and the fourth (~), it would give me something like this:
0.165.65.113 0008B9A4F3~ÍežG5„È&gÈe#Ÿ#•Œ‘„¦åEI²6frÞõ+ã:®*ÓÓÂ"ða5»V$è~
¼?Amµxðïej£„7‹ìËÏð‡.4 --
And then I guess I could delete everything after the first (~) so I can get the desired output.
Am I following the right procedure? Should I achieve this with sed or awk? Any suggestions are appreciated!
Instead of trying to remove stuff, why don't you just keep the stuff you want?
sed -r -n 's/^[^0-9]*(([0-9]{1,3}\.){3}[0-9]{1,3}).*([0-9A-F]{12}).*$/\1 \3/p'
#                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^
#                    IP Address                     12 Hex digits
Explanation:
\1 \3 means insert everything that matched the first and the third set of parentheses of the search term.
^[^0-9]* matches all non-digits from the beginning of the line
([0-9]{1,3}\.){3}[0-9]{1,3} matches an IP address. The whole term is in parentheses because we want to keep it. The inner (...) could be referenced as \2 in the replacement term, but we don't need that.
[0-9A-F]{12} is simply 12 hexadecimal digits (upper case; use [0-9a-fA-F] if you expect lower case as well)
Assuming your data structure is always the same,
use several field separators at once, with a bracket expression containing ";" and "~". Be careful not to let a space alone act as a separator (as it does by default), because that yields a different field 3 (and 6).
awk -F '[[:blank:]]*[;~][[:blank:]]*' '/--$/ {print $1 " " $7}' YourFile
Assuming there are only space characters and no tabs around the separators, and that the data line contains Data:
awk -F ' *[;~] *' '/--$/ {print $1 " " $7}' YourFile
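If a record is physically split over two lines, as the numbered sample in the question suggests, the line ending in -- may not be the one holding the IP. A hedged sketch that instead selects lines containing ;Data-- (an assumption based on the sample, as is the function name):

```shell
# Hypothetical helper: print the first IP and the 12-character ID of each record.
extract_ip_and_id() {
    # Fields are split on ; or ~ with optional surrounding spaces or tabs.
    awk -F '[ \t]*[;~][ \t]*' '/;Data--/ { print $1, $7 }' "$@"
}
```

With the field layout shown in the question, $1 is the first IP address and $7 is the value after the fourth ~.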
I have a file with fields separated by the '`' character. But sometimes the actual data also contains this character. How can I remove all the erroneous rows and retain only the good-quality data?
A sample row is shown below. Towards the end, 'fff`ff' is the erroneous column. In such a case the row should be eliminated.
xxx`1000165811`2012`2012_q2`05/09/2012 22:02:00`1343`04/07/2004 00:00:00`05/09/2012 00:00:00````F`1`1.000000`9.620000`1.0000````fff`Not`Free`Free`1.000000`9.620000`0.000000`1.0000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`56565666`255.590000`21`0`0.000000```ddd`dddd`FA May 2012 ddd`0.000000`0.000000`0.000000`0.000000`0.000000`05/30/2012 00:00:00`05/30/2012 00:00:00`1.000000`ddd`ddd`OW`DL`dd dd dd`ddd`dd`dd dd`dd 
dd`0.000000`0.000000``````````0.000000`````````Non_Mobile`9.620000`1.000000`1`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`9.620000`9.620000`0.000000`0.000000`0.000000`0.000000`28.590000`6.990000`**fff`ff**`````````9.620000`1.000000`1
You need to know what the correct number of delimiters in a line is. You need to count the actual number of delimiters in each line, and reject those lines where the actual count is not the correct number.
Assuming that the correct number of separators is n=5, you could try:
n=5
grep -E '^[^`]*(`[^`]*){'"$n"'}$' data
The regex uses extended regular expressions (-E). The regex matches the start of the line, zero or more non-back-ticks, then a sequence of n occurrences of a back tick followed by zero or more non-back-ticks, followed by the end of line. Because the back-tick is a shell metacharacter, it is best to enclose most of the regular expression in single quotes. The variable $n could be used without the double quotes around it, but it's generally best to enclose variables in double quotes. Clearly, you can also use this version too:
grep -E '^([^`]*`){'"$n"'}[^`]*$' data
Given a data file data:
AA`BB`CC`DD`EE`FF
AABB`CC`DD`EE`FF
A`A`BB`CC`DD`EE`FF
`BB`CC`DD`EE`FF
`BB`CC`DD`EE`
``CC`DD`EE`
``CC``EE`
````EE`
`BB```EE`
`````
``````
````
Welcome`to`the`land`of`insanity
The output of the command is:
AA`BB`CC`DD`EE`FF
`BB`CC`DD`EE`FF
`BB`CC`DD`EE`
``CC`DD`EE`
``CC``EE`
````EE`
`BB```EE`
`````
Welcome`to`the`land`of`insanity
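The same count can also be expressed with awk's field count, since a line with n delimiters has n+1 fields. This is only a sketch of an alternative, not part of the grep answer; the function name is made up.

```shell
# Hypothetical helper: keep_rows_with_n_delims N [FILE...]
keep_rows_with_n_delims() {
    n=$1
    shift
    # With a single-character separator, NF is the delimiter count plus one.
    awk -F'`' -v n="$n" 'NF == n + 1' "$@"
}
```

For the data file above, keep_rows_with_n_delims 5 data should select the same lines as the grep command.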
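grep -v '[^`]`[^`]`[^`]`' data
Use single quotes so the shell does not treat the backticks as command substitution. The [^`]` group has to be repeated one more time than the number of delimiters a correct line has, so that lines with too many delimiters are excluded. Note that [^`] requires a non-backtick character before each delimiter, so empty fields are not counted.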
In the spirit of "Be careful what you ask for", here is a "one-liner" (spread over three lines for readability) that will do what was asked, using only awk and assuming that $FILE is the relevant filename.
awk -F'`' -v file="$FILE" '
BEGIN{ while((getline<file)>0){if (min==""||NF<min){min=NF}}}
NF==min' "$FILE"
This incantation first determines the minimum number of delimiters per line (without sorting the file), and then rejects all lines with more than that many.
(This is similar to Ed Morton's proposal, but without the bug :-)
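A common alternative to getline for this kind of two-pass job is to name the file twice and use NR==FNR to recognise the first pass. The sketch below assumes a regular file (not a pipe); the function name is hypothetical.

```shell
# Hypothetical helper: keep only rows with the minimum field count in FILE.
keep_min_field_rows() {
    # The same file is passed twice: pass 1 finds the minimum, pass 2 filters.
    awk -F'`' '
        NR == FNR { if (min == "" || NF < min) min = NF; next }
        NF == min
    ' "$1" "$1"
}
```

Like the getline version, this assumes that the rows with the fewest delimiters are the good ones.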
I found several related questions, but none of them fits what I need, and since I am a real beginner, I can't figure it out.
I have a text file with entries like this, separated by a blank line:
example entry &with/ special characters
next line (any characters)
next %*entry
more words
I would like the output to merge the lines of each entry, put a comma between them, and delete the empty lines. I.e., the example should look like this:
example entry &with/ special characters, next line (any characters)
next %*entry, more words
I would prefer sed, because I know it a little bit, but am also happy about any other solution on the linux command line.
Improved per Kent's elegant suggestion:
awk 'BEGIN{RS="";FS="\n";OFS=","}{$1=$1}7' file
which allows any number of lines per block, rather than the 2 rigid lines per block I had. Thank you, Kent. Note: The 7 is Kent's trademark... any non-zero expression will cause awk to print the entire record, and he likes 7.
You can do this with awk:
awk 'BEGIN{RS="";FS="\n";OFS=","}{print $1,$2}' file
That sets the record separator to blank lines, the field separator to newlines and the output field separator to a comma.
Output:
example entry &with/ special characters,next line (any characters)
next %*entry,more words
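To get a comma followed by a space, as in the question's desired output, the output field separator can be set to ", ". A small sketch combining that with the any-number-of-lines form (the function name is made up):

```shell
# Hypothetical helper: join the lines of each blank-line-separated block with ", ".
merge_paragraphs() {
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = ", " }
         { $1 = $1; print }   # assigning $1 rebuilds the record using OFS
    ' "$@"
}
```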
Simple sed command,
sed ':a;N;$!ba;s/\n/, /g;s/, , /\n/g' file
:a;N;$!ba;s/\n/, /g -> According to this answer, this code replaces all the new lines with ,(comma and space).
So After running only the first command, the output would be
example entry &with/ special characters, next line (any characters), , next %*entry, more words
s/, , /\n/g - > Replacing , , with new line in the above output will give you the desired result.
example entry &with/ special characters, next line (any characters)
next %*entry, more words
This might work for you (GNU sed):
sed ':a;$!N;/.\n./s/\n/, /;ta;/^[^\n]/P;D' file
Append the next line to the current line and if there are characters either side of the newline substitute the newline with a comma and a space and then repeat. Eventually an empty line or the end-of-file will be reached, then only print the next line if it is not empty.
Another version, a little more sophisticated (allowing for white space in the empty line), would be:
sed ':a;$!N;/^\s*$/M!s/\n/, /;ta;/\`\s*$/M!P;D' file
sed -n '1h;1!H
$ {x
s/\([^[:cntrl:]]\)\n\([^[:cntrl:]]\)/\1, \2/g
s/\(\n\)\n\{1,\}/\1/g
p
}' YourFile
This makes all the changes after loading the whole file into the buffer. It could also be done "on the fly" while reading the file, depending on whether each line is empty or not.
Use -e on GNU sed.