How to parse strace in shell into plain text?

How to parse strace in shell into plain text? - shell

I've trace log generated by strace command like on running PHP by:
sudo strace -e sendto -fp $(pgrep -n php) -o strace.log
And the output looks like:
11208 sendto(4, "set 29170397297_-cache-schema 85 0 127240\r\n\257\202\v\0?\0\0\0\2\27\10stdClass\24\7\21\3cid\21\6schema\21\4d\37ata\25\n\247\21\5block\24\6\21\6fields\24\f\21\3bid\24\2\5\21\4type 0\37erial\21\10not null\5\21\6module\24\4\16\7\21\7va\37rchar\21\6length\6#\16\t\5\21\7default\r\21\5de\2lta#\5\16\v\16\f\6 \35\7\16\r\21\0010\21\5t \207C\30#6\2\16\r\r n\4tatus#0\4\21\3int/\7\6\0\21\4size \222\finy\21\6weight\24\3 ;\0\22\300 \6\6region#8\340\5P\5custom\27\300,\17\16\23\16\24\21\nvisibility\340\t\34\7\5pages\24\2 \205\3\4tex#\206 \261\1it \365\0\5\240\0\377y\10\r\21\ftransl!N\2ble %\1ca!a\340\3Q\0\1n\31\vprimary key\24\1\6\0\16\6\21\vunique#\21\ts\24\1\21\3tmd\24\3 \31\0\20 2\v\n\6\2\16\16\21\7index \210\10\1\21\4list\24\5\240\36\0\21 \36\10\26\6\3\16\25\6\4\16\n \1\6\4\21\4name \7\0\na\317\2_ro\252\0\5!$\0\n \3\341\2\23\0\16\340\0\16A\214\2\21\3r!\354# \v\22\21\10unsigned\5#\332\0\36\213\0\n \213\0\16 l\6%\16!\24\1\16%\271\0%#p\5\16#\16$\21\f\200l\241b#n\2\4\16\6M\2\10\16&#E\4\21\4bod\201_\5\32\16\t\4\16\23B\\\2g\16\34 \30\3info .\0\7a\255\0\200#q!L\5\6forma\201\332B/!d\2\4\16\37 y\0*y\0 \225a;\240\201\2'\21\van\0_\207\200\2\5\16\1\340\0U =#U\1\16\3#\222 \212\2lob#O\n\23\16)\21\6expire#\30\342\0\26\7\21\7create\241\17< \25\0\n\203\1\"\177\0dY\0\22 \305\5\5small\240!a\32\0.\230\0.\240\240\0\1\240\240\3,\21\vb S\2kpo\"\313\2s\24\6!\220\2\t\21\2\241q\0\10 ?\4\21\tno \213\6ort\5\21\fm\";\3ine_A\313\232\241\3\2\5\16#\340\4\16!\345\340\0U\223\340\0'AC\4sourc\202\202\340\3\27\0\v\200\27\0_C\326\340\0074\1\16\21_\240\363\2\1\16\25\340\3\16\r\0\21\vmultipliers\31\0- \223\1\21\t\341\0\30B-\0\1!\10\0003a\253\0005\v\0005ac \327Dz\"\364 \20\0\10 \6\0 #\333\r\0165\16\36\0163\21\nidenti$x\nr\0166\21\vadmin_ce\10\21\5label\21\f\244H\6 hook\21\23\240\r\0_\340\1\375\fs\21\3api\24\4\21\5own F\0062\16C\16B\21\17 H\5imum_v \260$\25\7\6\1\21\17curr m\340\1\22!\242\0002\"\305\0022\21\20\340\1N\5_groupa\247\2\6\0163\352\0\10 \352\2\0164\5 \325C%\341\0P\341\5\220\1\0162aQA\26\4\16:\5\21\17\201\321\1 c\"$\5back\21#\340\7b\0_\200!\340\3\311\1\16\7C\340\0a!\312\1\no \300#\240!&}\241\237\0\0\242e\341\4n\5\16;\24\10\16< \7\2=\21\35\340\1m\0\320\0 \342\3XAz\v\16>\16G\16?\16#\16A\21\30\341\tT\201\5\1\21\22\200\243\0 B0\6 string#o\4toolsbD\1\16C \260\0D!D\4C\16L\16E!P\0F \3\201T\16G\21\21ckeditor_set%\266\0gE\323\0\5%Q\0# 4#\345!)\"w#\372\1\21\10\340\0!\0\1 \31\0\32\240\334\4#\16\n\21\10\300D \r\2O\21\25\300\r\6_input_\244+\340\16V\1\16+ \31\340\4h X\0\2!;\0# \245\0+ \247\0Q T\7R\21\26comme#/\0_%\266\2cko W\3pane ;\4\5\24\10\21\7#\v\0_\243\257\301\231\1\21\4F\35 !\340\1\22F\323\0021\21\10\"\311'B\0e#\223A\254&f`\346\"~\6\vcollap&q%\227\340\6\35\2\0\21\t\240\35\344\1a\3009\0\0#\212\300.\0001\200L$\247\1enFl\344\0\216\300,\0\1G\5\3view\340\0002\300\177 \372\0\1 K\0T!"..., 8196, MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 8196
It sounds like these are represented by ordinary C escape codes.
I've tried to decode them in shell by printf like:
while read line; do printf "%s" "$line"; done < <(cat strace.log | head -n2)
but it failed (looks like it doesn't make any sense):
11208 sendto(4, "set 29170397297_-cache-schema 85 0 127240rn257202v0?00022710stdClass247213cid216schema214d37ata25n247215block246216fields24f213bid2425214type 037erial2110not null5216module244167217va37rchar216length6#16t5217defaultr215de2lta#516v16f6 35716r210010215t 207C30#6216rr n4tatus#04213int/760214size 222finy216weight243 ;022300 66region#83405P5custom27300,171623162421nvisibility340t3475pages242 20534tex#206 2611it 365052400377y10r21ftransl!N2ble %1ca!a3403Q01n31vprimary key2416016621vunique#21ts241213tmd243 31020 2vn621616217index 210101214list24524036021 3610266316256416n 164214name 70na3172_ro25205!$0n 3341223016340016A2142213r!354# v222110unsigned5#3320362130n 213016 l6%16!24116%2710%#p516#16$21f200l241b#n24166M21016&#E4214bod201_53216t41623B\2g1634 303info .07a2550200#q!L56forma201332B/!d241637 y0*y0 225a;2402012'21van0_207200251613400U =#U1163#222 2122lob#On2316)216expire#303420267217create24117< 250n2031"1770dY022 30555small240!a320.`2300.240240012402403,21vb S2kpo"3132s246!2202t212241q010...
Is there any better way to parse the output of strace command to see plain strings passed to recvfrom/sendto?
Ideally it is possible to print printable characters including new lines (\r\n), but cut-off NULLs and other non-printable characters?

The problem why read doesn't work, because shell is already escaping the characters, so the string is doubled escaped, therefore \r\n is printed as rn.
To ignore escaping of characters by shell, you can use read -r which allow backslashes to escape any characters (so they're treated literally). Here is example:
while read -r line; do printf "%b\n" "$line"; done < strace.log | strings
Since it's a binary data, above example also includes strings command to display only printable strings.
Strace also support printing all strings in hex when -x is specified, but it'll work the same.
Here is the version to parse strace output in real-time:
while read -r line;
do printf "%b\n" "$line" | strings
done < <(sudo strace -e recvfrom,sendto -s 1000 -fp $(pgrep -n php) 2>/dev/stdout)
Further more strings, can be replaced by more specific filter using grep, to get only what is inside double quotes:
grep -o '".\+[^"]"' | grep -o '[^"]\+[^"]'
however this may still print binary formats.
To avoid that, lets simplify the whole process, so lets define the following formatter alias:
alias format-strace='grep --line-buffered -o '\''".\+[^"]"'\'' | grep --line-buffered -o '\''[^"]*[^"]'\'' | while read -r line; do printf "%b" $line; done | tr "\r\n" "\275\276" | tr -d "[:cntrl:]" | tr "\275\276" "\r\n"'
where:
grep -o '".\+[^"]"' - select double-quoted string with quotes
grep -o '[^"]*[^"]' - select text within the double quotes
while read -r line - store each line into $line and do some action (help read)
printf "%b" $line - print line by expanding backslash escape sequences
tr "\r\n" "\275\276" - temporarily replace \r\n into \275\276
tr -d "[:cntrl:]" - remove all control characters
tr "\275\276" "\r\n" - restore new line endings
then the complete example to trace some command (e.g. php) can look like:
strace -e trace=read,write,recvfrom,sendto -s 1000 -fp $(pgrep -n php) 2>&1 | format-strace
Check for similar example: How to view the output of a running process in another bash session? at Unix.SE

Related

Using sed to find a string with wildcards and then replacing with same wildcards

So I am trying to remove new lines using sed, because it the only way I can think of to do it. I'm completely self taught so there may be a more efficient way that I just don't know.
The string I am searching for is \HF=-[0-9](newline character). The problem is the data it is searching through can look like (Note: there are actual new line characters in this data, which I think is causing a bit of the problem)
1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc-
pVDZ\\Squish3_Slide0\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0.,-1.
3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0.6974
,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411,0.\H
,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0,0.,1
.3948,3.\C,0,0.,-1.3948,3.\C,0,1.2079,0.6974,3.\C,0,-1.2079,0.6974,3.\
C,0,-1.2079,-0.6974,3.\C,0,1.2079,-0.6974,3.\H,0,0.,2.4822,3.\H,0,2.14
97,1.2411,3.\H,0,-2.1497,1.2411,3.\H,0,-2.1497,-1.2411,3.\H,0,2.1497,-
1.2411,3.\H,0,0.,-2.4822,3.\\Version=ES64L-G09RevD.01\State=1-AG\HF=-4
61.3998608\MP2=-463.0005321\RMSD=3.490e-09\PG=D02H [SG"(C4H4),X(C8H8)]
\\#
OR
1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc-
pVDZ\\Squish3.1_Slide0\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0.,-
1.3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0.69
74,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411,0.
\H,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0,0.
,1.3948,3.1\C,0,0.,-1.3948,3.1\C,0,1.2079,0.6974,3.1\C,0,-1.2079,0.697
4,3.1\C,0,-1.2079,-0.6974,3.1\C,0,1.2079,-0.6974,3.1\H,0,0.,2.4822,3.1
\H,0,2.1497,1.2411,3.1\H,0,-2.1497,1.2411,3.1\H,0,-2.1497,-1.2411,3.1\
H,0,2.1497,-1.2411,3.1\H,0,0.,-2.4822,3.1\\Version=ES64L-G09RevD.01\St
ate=1-AG\HF=-461.4104442\MP2=-463.0062587\RMSD=3.651e-09\PG=D02H [SG"(
C4H4),X(C8H8)]\\#
OR
1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc-
pVDZ\\Squish3.3_Slide1.7\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0.
,-1.3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0.
6974,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411,
0.\H,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0,
0.,-0.3052,3.3\C,0,0.,-3.0948,3.3\C,0,1.2079,-1.0026,3.3\C,0,-1.2079,-
1.0026,3.3\C,0,-1.2079,-2.3974,3.3\C,0,1.2079,-2.3974,3.3\H,0,0.,0.782
2,3.3\H,0,2.1497,-0.4589,3.3\H,0,-2.1497,-0.4589,3.3\H,0,-2.1497,-2.94
11,3.3\H,0,2.1497,-2.9411,3.3\H,0,0.,-4.1822,3.3\\Version=ES64L-G09Rev
D.01\State=1-AG\HF=-461.436061\MP2=-463.0177441\RMSD=7.859e-09\PG=C02H
[SGH(C4H4),X(C8H8)]\\#
OR
1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc-
pVDZ\\Squish3.6_Slide0.9\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0.
,-1.3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0.
6974,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411,
0.\H,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0,
0.,0.4948,3.6\C,0,0.,-2.2948,3.6\C,0,1.2079,-0.2026,3.6\C,0,-1.2079,-0
.2026,3.6\C,0,-1.2079,-1.5974,3.6\C,0,1.2079,-1.5974,3.6\H,0,0.,1.5822
,3.6\H,0,2.1497,0.3411,3.6\H,0,-2.1497,0.3411,3.6\H,0,-2.1497,-2.1411,
3.6\H,0,2.1497,-2.1411,3.6\H,0,0.,-3.3822,3.6\\Version=ES64L-G09RevD.0
1\State=1-AG\HF=-461.4376969\MP2=-463.0163868\RMSD=7.263e-09\PG=C02H [
SGH(C4H4),X(C8H8)]\\#
Basically the number I am looking for can be broken up into two lines at any point based on character count. I need to get rid of the newline breaking up the number so that I can extract the entire value into a separate file. (I have no problems with the extraction to a new file, hence why it isn't included in the code)
Currently I am using this code
sed -i ':a;N;$!ba;s/HF=-*[0-9]*\n/HF=-*[0-9]*/g' $i &&
Which ALMOST works, expect it doesn't replace the wildcard values with the same values. It replaces it with the actual text [0-9] instead and doesn't always remove the new line character.
Important to the is that THERE ARE ACTUAL NEW LINE CHARACTERS in the output file and there is no way to change that without messing up the other 30 lines I am extracting from this output file.
What I want is to just get rid of the newline characters that occur when that string is found, regardless of how many digits there are in between the - sign and the newline character.
So the expected output would be something like
1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc-
pVDZ\\Squish3_Slide0\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0.,-1.
3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0.6974
,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411,0.\H
,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0,0.,1
.3948,3.\C,0,0.,-1.3948,3.\C,0,1.2079,0.6974,3.\C,0,-1.2079,0.6974,3.\
C,0,-1.2079,-0.6974,3.\C,0,1.2079,-0.6974,3.\H,0,0.,2.4822,3.\H,0,2.14
97,1.2411,3.\H,0,-2.1497,1.2411,3.\H,0,-2.1497,-1.2411,3.\H,0,2.1497,-
1.2411,3.\H,0,0.,-2.4822,3.\\Version=ES64L-G09RevD.01\State=1-AG\HF=-461.3998608\MP2=-463.0005321\RMSD=3.490e-09\PG=D02H [SG"(C4H4),X(C8H8)]
\\#
These files are rather large and have over 1500 executions of this line of code, so the more efficient the better.
Everything else in the script this is in is using a combination of grep, awk, sed, and basic UNIX commands.
EDIT
After trying
sed -i -E ':a;N;$!ba;s/(\\HF=-?[.0-9]*)\n/\1/' $i &&
I still had no luck getting rid of those pesky new line characters.
If it has any effect on the answers at all here is the rest of the code to go with the one line that is causing problems
echo name HF MP2 mpdiff | cat > allE
for i in *.out
do echo name HF MP2 mpdiff | cat > $i.allE
grep "Slide" $i | cut -d "\\" -f2 | cat | tr -d '\n' > $i.name &&
grep "EUMP2" $i | cut -d "=" -f3 | cut -c 1-25 | tr '\n' ' ' | tr -s ' ' >> $i.mp &&
grep "EUMP2" $i | cut -d "=" -f2 | cut -c 1-25 | tr '\n' ' ' | tr -s ' ' >> $i.mpdiff &&
sed -i -E ':a;N;$!ba;s/(\\HF=-?[.0-9]*)\n/\1/' $i &&
grep '\\HF' $i | awk -F 'HF' '{print substr($2,2,14)}' | tr '\n' ' ' >> $i.hf &&
paste $i.name >> $i.energies &&
sed -i 's/ /0 /g' $i.hf &&
sed -i 's/\\/0/g' $i.hf &&
sed -i 's/[A-Z]/0/g' $i.hf &&
paste $i.hf >> $i.energies &&
sed -i 's/[ABCEFGHIJKLMNOPQRSTUVWXYZ]//g' $i.mp &&
paste $i.mp >> $i.energies &&
sed -i 's/[ABCEFGHIJKLMNOPQRSTUVWXYZ]//g' $i.mpdiff &&
paste $i.mpdiff >> $i.energies &&
transpose $i.energies >> $i.allE #temp.txt &&
#cat temp.txt > $i.energies
#echo $i is finished
done
echo see allE for energies
#rm *.energies #temp.txt
rm *.name
rm *.mp
rm *.hf
rm *.mpdiff

Here is how you can fix your current attempt.
sed -E ':a;N;$!ba;s/(\\HF=-?[.0-9]*)\n/\1/'
Add the i flag if you want to make the changes on the file itself, add && to send the job to the background, etc. The -E flag is needed, because backreferences (see below) are part of extended regular expressions.
I made the following changes: I changed -* to -? as there should be at most one dash (if I understand correctly and that is in fact a minus sign, not a dash). I added the period to the bracket expression, so that the decimal point would be matched too. (Note that in a bracket expression, the dot is a regular character). I wrapped the whole thing except the newline in parentheses - making it into a subexpression, which you can refer to with a backreference - which is what I did in the replacement part.
A few notes though - this will join the lines even if the entire number is at the end of one line, but not followed by the closing \. If in fact the entire number being on one line, but the closing \ is on the next line, you can change the sed command slightly, to leave those alone. On the other hand, this does not handle situations where, for example, one line ends in \H and the next line begins with F=304.222\ You only mentioned "split number" in your problem statement; shouldn't you, though, also handle such cases, where the newline splits the \HF=...\ token, just not in the "number" portion of the token?

It looks like your input lines start with a space. I have ignored them in this solution.
sed -rz 's/(AG\\HF=-[0-9]*)\n/\1/g' "$i"

Command execution in sed while preserving unmatched part of the line

It is simple - I have a data stream with IPv4 addresses encoded into hexadecimal representation like for example 0c22384e which stands for 12.34.56.78.
I figured out sed command with substitution of captured octets into decimal numbers separated by dot.
echo "0c22384e" | sed -E 's/([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})/printf "%d.%d.%d.%d" 0x\1 0x\2 0x\3 0x\4/eg'
This works with a single number BUT as soon I add some text that is not supposed to be matched, it is also passed for the execution - via printf in this case.
How can I preserve the unmatched part of the line without being passed for the execution?

With only one address in a line you could use
echo "Something 0c22384e more" |
sed -r 's/(.*)([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})(.*)/"\1" 0x\2 0x\3 0x\4 0x\5 "\6"/' |
xargs -n6 printf '%s%d.%d.%d.%d%s\n'
EDIT:
Replaced solution for one line and more addresses
with solution for more lines (assuming no '\r' in the stream):
echo "Something 0c22384e more 0c22385e
Second line: 0c22386e and 0c223870
Third line: 0c22388e and 0c223890
4th line: 0c2238ae and 0c2238b0" |
sed 's/$/\r/' |
sed -r 's/[0-9a-f]{8}/\n&\n/g' |
sed -r 's/([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})/printf '%d.%d.%d.%d' 0x\1 0x\2 0x\3 0x\4/e' |
tr -d '\n' |
tr '\r' '\n'

How to search & replace arbitrary literal strings in sed and awk (and perl)

Say we have some arbitrary literals in a file that we need to replace with some other literal.
Normally, we'd just reach for sed(1) or awk(1) and code something like:
sed "s/$target/$replacement/g" file.txt
But what if the $target and/or $replacement could contain characters that are sensitive to sed(1) such as regular expressions. You could escape them but suppose you don't know what they are - they are arbitrary, ok? You'd need to code up something to escape all possible sensitive characters - including the '/' separator. eg
t=$( echo "$target" | sed 's/\./\\./g; s/\*/\\*/g; s/\[/\\[/g; ...' ) # arghhh!
That's pretty awkward for such a simple problem.
perl(1) has \Q ... \E quotes but even that can't cope with the '/' separator in $target.
perl -pe "s/\Q$target\E/$replacement/g" file.txt
I just posted an answer!! So my real question is, "is there a better way to do literal replacements in sed/awk/perl?"
If not, I'll leave this here in case it comes in useful.

The quotemeta, which implements \Q, absolutely does what you ask for
all ASCII characters not matching /[A-Za-z_0-9]/ will be preceded by a backslash
Since this is presumably in a shell script, the problem is really of how and when shell variables get interpolated and so what the Perl program ends up seeing.
The best way is to avoid working out that interpolation mess and instead properly pass those shell variables to the Perl one-liner. This can be done in several ways; see this post for details.
Either pass the shell variables simply as arguments
#!/bin/bash
# define $target
perl -pe"BEGIN { $patt = shift }; s{\Q$patt}{$replacement}g" "$target" file.txt
where the needed arguments are removed from #ARGV and utilized in a BEGIN block, so before the runtime; then file.txt gets processed. There is no need for \E in the regex here.
Or, use the -s switch, which enables command-line switches for the program
# define $target, etc
perl -s -pe"s{\Q$patt}{$replacement}g" -- -patt="$target" file.txt
The -- is needed to mark the start of arguments, and switches must come before filenames.
Finally, you can also export the shell variables, which can then be used in the Perl script via %ENV; but in general I'd rather recommend either of the above two approaches.
A full example
#!/bin/bash
# Last modified: 2019 Jan 06 (22:15)
target="/{"
replacement="&"
echo "Replace $target with $replacement"
perl -wE'
BEGIN { $p = shift; $r = shift };
$_=q(ah/{yes); s/\Q$p/$r/; say
' "$target" "$replacement"
This prints
Replace /{ with &
ah&yes
where I've used characters mentioned in a comment.
The other way
#!/bin/bash
# Last modified: 2019 Jan 06 (22:05)
target="/{"
replacement="&"
echo "Replace $target with $replacement"
perl -s -wE'$_ = q(ah/{yes); s/\Q$patt/$repl/; say' \
-- -patt="$target" -repl="$replacement"
where code is broken over lines for readability here (and thus needs the \). Same printout.

Me again!
Here's a simpler way using xxd(1):
t=$( echo -n "$target" | xxd -p | tr -d '\n')
r=$( echo -n "$replacement" | xxd -p | tr -d '\n')
xxd -p file.txt | sed "s/$t/$r/g" | xxd -p -r
... so we're hex-encoding the original text with xxd(1) and doing search-replacement using hex-encoded search strings. Finally we hex-decode the result.
EDIT: I forgot to remove \n from the xxd output (| tr -d '\n') so that patterns can span the 60-column output of xxd. Of course, this relies on GNU sed's ability to operate on very long lines (limited only by memory).
EDIT: this also works on multi-line targets eg
target=$'foo\nbar'
replacement=$'bar\nfoo'

With awk you could do it like this:
awk -v t="$target" -v r="$replacement" '{gsub(t,r)}' file
The above expects t to be a regular expression, to use it a string you can use
awk -v t="$target" -v r="$replacement" '{while(i=index($0,t)){$0 = substr($0,1,i-1) r substr($0,i+length(t))} print}' file
Inspired from this post
Note that this won't work properly if the replacement string contains the target. The above link has solutions for that too.

This is an enhancement
of wef’s answer.
We can remove the issue of the special meaning of various special characters
and strings (^, ., [, *, $, \(, \), \{, \}, \+, \?,
&, \1, …, whatever, and the / delimiter)
by removing the special characters. 
Specifically, we can convert everything to hex;
then we have only 0-9 and a-f to deal with. 
This example demonstrates the principle:
$ echo -n '3.14' | xxd
0000000: 332e 3134 3.14
$ echo -n 'pi' | xxd
0000000: 7069 pi
$ echo '3.14 is a transcendental number. 3614 is an integer.' | xxd
0000000: 332e 3134 2069 7320 6120 7472 616e 7363 3.14 is a transc
0000010: 656e 6465 6e74 616c 206e 756d 6265 722e endental number.
0000020: 2020 3336 3134 2069 7320 616e 2069 6e74 3614 is an int
0000030: 6567 6572 2e0a eger..
$ echo "3.14 is a transcendental number. 3614 is an integer." | xxd -p \
| sed 's/332e3134/7069/g' | xxd -p -r
pi is a transcendental number. 3614 is an integer.
whereas, of course, sed 's/3.14/pi/g' would also change 3614.
The above is a slight oversimplification; it doesn’t account for boundaries. 
Consider this (somewhat contrived) example:
$ echo -n 'E' | xxd
0000000: 45 E
$ echo -n 'g' | xxd
0000000: 67 g
$ echo '$Q Eak!' | xxd
0000000: 2451 2045 616b 210a $Q Eak!.
$ echo '$Q Eak!' | xxd -p | sed 's/45/67/g' | xxd -p -r
&q gak!
Because $ (24) and Q (51)
combine to form 2451,
the s/45/67/g command rips it apart from the inside. 
It changes 2451 to 2671, which is &q (26 + 71). 
We can prevent that by separating the bytes of data in the search text,
the replacement text and the file with spaces. 
Here’s a stylized solution:
encode() {
xxd -p -- "$#" | sed 's/../& /g' | tr -d '\n'
}
decode() {
xxd -p -r -- "$#"
}
left=$( printf '%s' "$search" | encode)
right=$(printf '%s' "$replacement" | encode)
encode file.txt | sed "s/$left/$right/g" | decode
I defined an encode function because I used that functionality three times,
and then I defined decode for symmetry. 
If you don’t want to define a decode function, just change the last line to
encode file.txt | sed "s/$left/$right/g" | xxd -p –r
Note that the encode function triples the size of the data (text)
in the file, and then sends it through sed as a single line
— without even having a newline at the end. 
GNU sed seems to be able to handle this;
other versions might not be able to.
As an added bonus, this solution handles multi-line search and replace
(in other words, search and replacement strings that contain newline(s)).

I can explain why this doesn't work:
perl(1) has \Q ... \E quotes but even that can't cope with the '/' separator in $target.
The reason is because the \Q and \E (quotemeta) escapes are processed after the regex is parsed, and a regex is not parsed unless there are valid pattern delimiters defining a regex.
As an example, here's an attempt to replace the string /etc/ in /etc/hosts by using a variable in a string passed to perl:
$target="/etc/";
perl -pe "s/\Q$target\E/XXX/" <<<"/etc/hosts";
After the shell expands the variable in the string, perl receives the command s/\Q/etc/\E/XXX/ which is not a valid regex because it doesn't contain three pattern delimiters (perl sees five delimiters, i.e., s/…/…/…/…/). Therefore, the \Q and \E are never even executed.
The solution, as #zdim suggested, is to pass the variables to perl in a way that they are included in the regex after the regex is parsed, such as like this:
perl -s -pe 's/\Q$target\E/XXX/ig' -- -target="/etc/" <<<"/etc/123"

awk escaping is not all that complex either :
on the searching regex, just these 2 suffices to escape any and all awk variants - simply "cage" all of them, with additional escaping performed for just the circumflex/caret, and backslash itself :
-- technically you don't need to escape space at all - sometimes i like using it for marking an unambiguous anchoring point for the character instead of letting awk be too agile about how it handles spaces and tabs. swap the space for "!" inside the regex if u like
jot -s '' -c - 32 126 |
mawk 'gsub("[[-\440{-~:-# -/]", "[&]") \
gsub(/\\|\^/, "\\\\&")^_' FS='^$' RS='^$'
\440 is (`) - i'm just not a fan of having those exposed in my code
|
[ ][!]["][#][$][%][&]['][(][)][*][+] [,] [-][.] [/] # re-aligned for
0123456789 [:][;] [<] [=][>] [?] # readability
[#]ABCDEFGHIJKLMNOPQRSTUVWXYZ [[][\\] []][\^][_]
[`]abcdefghijklmnopqrstuvwxyz [{] [|] [}][~]
as for replacement, only literal "&" needs to be escaped via
gsub(target_regex, "&") # nothing escaped
matched text
gsub(target_regex, "\\&") # 2 backslashes
literal "&"
gsub("[[:punct:]]", "\\\\&") # 4 backslashes
\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\#\[\\\]\^\_\`\{\|\}\~
—- (personally prefer using square-brackets i.e. char classes as an escaping mechanism than having backslash galore)
gsub("[[:punct:]]", "\\\\\\&") # 6 backslashes
\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&\&
Use 6-backslashes only if you're planning to feed this output further down to another gsub()/match() function call

Convert String to HEX using sed command

I need to convert a string in chinese to its appropriate HEX format. I can do it using sed in the following way
echo -n 欢迎 | xxd -p -u | sed 's/.\{2\}/&\\x/g' | sed 's/^\(.\{0\}\)/\1\\x/' | sed -r 's/(.*)\\x/\1 /'
which gives me output as:
\xE6\xAC\xA2\xE8\xBF\x8E
This is correct answer that I am looking for. Please suggest me making using of sed more efficiently in above command. The above command is being run on ubuntu 16.04 terminal

You can chain sed-commands with ";":
echo -n 欢迎 | xxd -p -u | sed 's/.\{2\}/&\\x/g;s/^\(.\{0\}\)/\1\\x/' | sed -r 's/(.*)\\x/\1 /'
\xE6\xAC\xA2\xE8\xBF\x8E
Since you use sed and sed -r interchangingly, you have to modify the second, remaining sed call, to combine the remaining ones:
echo -n 欢迎 | xxd -p -u | sed 's/.\{2\}/&\\x/g;s/^\(.\{0\}\)/\1\\x/;s/\(.*\)\\x/\1 /'
Having a second look at it, what the output of xxd is without sed, I observed, the solution is much more easy:
echo -n 欢迎 | xxd -p -u | sed -r 's/(..)/\\x\1/g'
Your initial approach appended \x to 2 characters, but you can preceed it your pairs. However chaining multiple sed commands might still be a useful thing to know.

From an efficiency standpoint, about the best option I could come up with would be to replace xdd, 3-pipes, and 3 calls to sed with od and 2 bash parameter expansions. (there may be more efficient ways, but this was what came to mind)
For example, you could assign the result of command substitution $(printf "欢迎" | od -A none -t x1) to a variable which would contain ' e6 ac a2 e8 bf 8e'. Then it is simply a matter of converting to upper-case and then using a substring replacement of 'space' to '\x' (both provided by bash parameter expansions, e.g.
a=$(printf "欢迎" | od -A none -t x1); \
a=${a^^}; \
a=${a// /\\x}; \
echo $a
\xE6\xAC\xA2\xE8\xBF\x8E
(shown with line-continuations above, you can just copy/paste into your terminal to test)
From Your Request in Comment for C
The code in C to output the upper-case hex bytes contained in your string is trivial, e.g.
#include <stdio.h>
int main (void) {
char *s = "欢迎";
while (*s) /* output each byte in upper-case hex */
printf ("\\x%hhX", ((unsigned char)*s++));
putchar ('\n');
return 0;
}
Example Use/Output
$ ./bin/str2hexbytes
\xE6\xAC\xA2\xE8\xBF\x8E
(note: you could use the exact-width types in stdint.h and the exact-width format specifiers provided in inttypes.h for a more formal solution, but it would accomplish the same thing. Similarly, you could use wide-character types, but virtually all modern compilers have no problem handling multibyte characters in an ordinary string or array of char)

Base64 encoding new line

I am trying to encode some hex values to base64 in shell script.
nmurshed#ugster05:~$ echo -n "1906 1d8b fb01 3e78 5c21 85db 58a7 0bf9 a6bf 1e42 cb59 95cd 99be 66f7 8758 cf46 315f 1607 66f7 6793 e5b3 61f9 fa03 952d 9101 b129 7180 6f1d ca93 3494 55e0 0e2e" | xxd -r -p | base64
GQYdi/sBPnhcIYXbWKcL+aa/HkLLWZXNmb5m94dYz0YxXxYHZvdnk+WzYfn6A5UtkQGxKXGAbx3K
kzSUVeAOLg==
I get a automatic new line after 76 charecters, Is there a way to avoid that ?
Online i found, use "-n" to ignore new lines...Can anyone suggest something ?

echo -n doesn't actually matter here: It controls whether there's a newline on the output from echo, but whether echo emits a newline has no bearing on whether xxd or base64 emit newlines.
Because xxd ignores any trailing newline in the input, echo or echo -n will behave precisely the same here; whether there's a newline by echo makes no difference, because that newline (if it exists) will be consumed by xxd when reading its input. Rather, what you ultimately care about is the output of base64, which is what is generating your final result.
Assuming you have the GNU version of base64, add -w 0 to disable line wrapping in its output. Thus:
printf '%s' "1906 1d8b fb01 3e78 5c21 85db 58a7 0bf9 a6bf 1e42 cb59 95cd 99be 66f7 8758 cf46 315f 1607 66f7 6793 e5b3 61f9 fa03 952d 9101 b129 7180 6f1d ca93 3494 55e0 0e2e" \
| xxd -r -p \
| base64 -w 0

I had a similar problem where
var1=$(echo -n "$USER:$PASSWORD" | base64)
was resulting in an erroneous base64 encoded value which was unusable in my next step of the script, used printf & it worked fine. Here is my code:
var1=$(printf "%s" "${USER}:${PASSWORD}" | base64)

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

How to parse strace in shell into plain text? - shell

Related

Using sed to find a string with wildcards and then replacing with same wildcards

Command execution in sed while preserving unmatched part of the line

How to search & replace arbitrary literal strings in sed and awk (and perl)

Convert String to HEX using sed command

Base64 encoding new line

Categories

Resources