generate a random number/string or an iterator in sed 's/' - bash

I adapted Jan Goyvaerts's e-mail regex to a bash function to be used in pipes to anonymize e-mail addresses:
function remove_emails {
sed -r "s|\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b|email.address#removed.com|gI";
}
which I'm using in a bash pipe:
mysqldump \
-uuser \
-ppass \
db_name \
| remove_emails \
| gzip -c \
| cat \
> tmp.sql.gz
This works fine, but now I'd like each replaced address to be different. I'd be satisfied with:
email.address1#removed.com
email.address2#removed.com
or
eiyyzhupzftrvjwehbqp#removed.com
kwmbrshzmxqlrqatqpff#removed.com
or anything else, as long as each replacement is unique.
I'm quite comfortable with bash, but counters, process substitution and the like fail because sed is invoked only once, so
sed "s,sth,$(echo $RANDOM),g"
and similar won't work.
Is there any way to generate random values or counters within sed itself?

This might work for you (GNU sed):
<<<'Here is a random number.' sed 's/random number/& $RANDOM/;s/.*/echo "&"/e'
or if you prefer:
<<<'Here is a random number.' sed 's/random number/& $RANDOM/;s/.*/echo "&"/' | sh

I experimented with potong's correct answer and found a way to implement an iterator which answers the other part of my question:
remove_emails() {
sed -r 's|\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b|test$(( iterator++ ))#example.com|gI;s|.*|echo "&"|' | bash
}
iterator=0
test_data='some.e.mail.address.#domain.com\nsome.other#email.co.uk\nwhatever#man.biz\nsed#sed.com\n'
echo -e "before:\n${test_data}"
echo -e "after: \n${test_data}" | remove_emails

You could do it by repeatedly invoking sed in a while loop as shown below:
remove_emails() {
while IFS= read -r line
do
sed -r "s|\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b|email.address${RANDOM}#removed.com|gI" <<< "$line"
done
}
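A counter can also be kept outside sed altogether. The following is only a sketch and not taken from the answers above: the function name remove_emails_numbered is made up for illustration, it uses awk instead of sed, and it assumes the same address pattern as the listings above (including the # separator shown there). Because no text is handed to a shell for evaluation, content in the dump such as $(...) or backticks is never executed.

```shell
# Hypothetical helper: replace every address with a numbered placeholder.
# The counter n lives inside awk and keeps counting across lines.
remove_emails_numbered() {
  awk '
    {
      out = ""
      rest = $0
      # match() finds the leftmost, longest match in rest and sets RSTART/RLENGTH
      while (match(rest, /[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z][A-Za-z]+/)) {
        n++
        out = out substr(rest, 1, RSTART - 1) "email.address" n "#removed.com"
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }'
}

printf 'some.other#email.co.uk and sed#sed.com\nwhatever#man.biz\n' | remove_emails_numbered
```

It can be dropped into the same pipe in place of remove_emails. Unlike ${RANDOM}, the counter cannot repeat within one run, and several addresses on one line each get their own number.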

Related

Editing lines in .mk file

I would like to edit a .mk file using Bash.
Inside the file, it looks like this:
SRC_PATHS = src/lib \
src/Application \
src/win \
src/prj
I would like to add a new source, which should look like this:
SRC_PATHS = src/lib \
src/Application \
src/win \
src/prj \
src/New
I am trying a sed command, but cannot add a new line.
Note: the last src path (src/prj) is not always the same.
If ed is available/acceptable.
#!/usr/bin/env bash
ed -s file.mk <<-'EOF'
$t.
-1s/$/ \\/
+s|\(^[[:blank:]]\{1,\}\) \(.\{1,\}\)$|\1 src/New|
,p
Q
EOF
In-one-line
printf '%s\n' '$t.' '-1s/$/ \\/' '+s|\(^[[:blank:]]*\) \(.*\)$|\1 src/New|' ,p Q | ed -s file.mk
with a shell variable to store the replacement.
#!/usr/bin/env bash
var='src/New'
ed -s file.mk <<-EOF
\$t.
-1s/\$/ \\\/
+s|\(^[[:blank:]]\{1,\}\) \(.\{1,\}\)\$|\1 $var|
,p
Q
EOF
Remove the ,p to silence the output to stdout; it is there only to show the edited buffer.
Change Q to w if in-place editing is needed.
Note that neither the script nor the one-liner is limited to bash; both should work in any POSIX-compliant shell.
With sed how about:
sed -i '$s#.\+#& \\'\\$'\n'' src/New#' file.mk
Result:
SRC_PATHS = src/lib \
src/Application \
src/win \
src/prj \
src/New
Considering the indentation of the input and of the desired result, which is not uniform between the first line and the others, I suspect that it is not important at all. If this is the case, then this sed command might work:
sed -z 's#\n$# \\\nsrc/New\n#' file.mk
where
-z is to treat the file as a single line/stream with embedded \ns
\n$ targets the EOF together with the last \n
the replacement string is \\\nsrc/New\n.
Thanks to all who answered. I tried all your suggestions; here are the snippets that work for my needs:
sed -i '/^SRC_PATHS[\t ]*=/{:a;/\\$/{N;ba;};s,$, \\\n\tsrc/New,}' file.mk
In some instances there is already a "\" in the file, so I added a command to clean up those lines:
sed -i '/^$*.\\/d' file.mk
Then, to add another path at the end of the file:
sed -i '$s#.\+#& \\'\\$'\n'' src/New#' file.mk
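The same "append a continuation to the last line" idea can be sketched without GNU-specific sed features. This is an illustration only: the function name add_src_path is made up, it assumes (like the $-based sed commands above) that the SRC_PATHS list ends on the last line of the file, and it assumes a tab is an acceptable indent for the new entry.

```shell
# Hypothetical helper: print file $1 with " \" added to its last line
# and the new path $2 appended as one more continuation line.
add_src_path() {
  awk -v new="$2" '
    NR > 1 { print prev }     # emit every line except the last unchanged
    { prev = $0 }
    END {
      if (NR > 0) {
        print prev " \\"      # last line gets the continuation backslash
        print "\t" new
      }
    }' "$1"
}

# Usage (writes to stdout; redirect to a new file and move it into place):
#   add_src_path file.mk src/New > file.mk.new && mv file.mk.new file.mk
```

Writing to a separate file first keeps the original intact if anything goes wrong.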

Bash: replace specific text with its translation

I have a huge file in which I want to replace all the text between '=' and the end of the line with its translation. Here is an example:
input:
screen.LIGHT_COLOR=Lighting Color
screen.LIGHT_M=Light (Morning)
screen.AMBIENT_M=Ambient (Morning)
output:
screen.LIGHT_COLOR=Цвет Освещения
screen.LIGHT_M=Свет (Утро)
screen.AMBIENT_M=Эмбиент (Утро)
All I have managed to do until now is to extract and translate the targeted text.
while IFS= read -r line
do
echo $line | cut -d= -f2- | trans -b en:ru
done < file.txt
output:
Цвет Освещения
Свет (Утро)
Эмбиент (Утро)
*trans is short for translate-shell. It is slow, but does the job. -b for brief translation; en:ru means English to Russian.
If you have any suggestions or solutions, I'll be glad to hear them. Thanks!
Edit, in case someone needs it:
After discovering translate-shell's limitations, I ended up going with Taylor G.'s suggestion. It seems that translate-shell allows around 110 requests per some period of time. Processing each line separately results in 1300 requests, which breaks the script.
Long story short, it is faster to pack all the data into a single request, which reduces the processing time from a couple of minutes to mere seconds. Sorry for the messy code, it's my third day with:
cut -s -d = -f 1 en_US.lang > option_en.txt
cut -s -d = -f 2 en_US.lang > value_en.txt
# merge lines
sed ':a; N; $!ba; s/\n/ :: /g' value_en.txt > value_en_block.txt
trans -b en:ru -i value_en_block.txt -o value_ru_block.txt
sed 's/ :: /\n/g' value_ru_block.txt > value_ru.txt
paste -d = option_en.txt value_ru.txt > ru_RU.lang
# remove temporary files
rm option_en.txt value_en.txt value_en_block.txt value_ru.txt value_ru_block.txt
Thanks Taylor G., Armali and every commentator
Using a pipe inside a large loop is expensive. You can try the following instead.
cut -s -d = -f 1 file.txt > name.txt
cut -s -d = -f 2- file.txt | trans -b en:ru > translate.txt
paste -d = name.txt translate.txt
It should be much faster than your current script. I'm not sure how your trans method is written. If it does not already process batch input, it needs to be updated to do so, e.g. using a while loop.
trans() {
while read -r line; do
# do translate and print result
done
}
You already did most of the work, though it can be optimized a bit. What's missing is just to output the first part of the line up to the equal sign together with the translation:
while IFS== read -r left right
do echo "$left=$(trans -b en:ru <<<"$right")"
done <file.txt
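The split, translate, and paste steps discussed above can be wrapped in one function. This is a sketch under assumptions: the name translate_values is made up, the translator is passed in as a command so that anything reading lines on stdin can be used, and that command is assumed to print exactly one output line per input line (otherwise paste would misalign keys and values).

```shell
# Hypothetical helper: $1 is a key=value file, the remaining arguments are a
# filter command that reads the values on stdin and writes one line per value.
translate_values() {
  file=$1
  shift
  keys=$(mktemp) || return 1
  vals=$(mktemp) || return 1
  cut -s -d= -f1  "$file" > "$keys"          # everything before the first "="
  cut -s -d= -f2- "$file" | "$@" > "$vals"   # the rest, sent through the filter
  paste -d= "$keys" "$vals"
  rm -f "$keys" "$vals"
}

# Usage with translate-shell, as in the question:
#   translate_values en_US.lang trans -b en:ru > ru_RU.lang
```

As with cut -s in the answers above, lines without an "=" are dropped from both halves.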

Convert String to HEX using sed command

I need to convert a string in Chinese to its corresponding hex format. I can do it using sed in the following way:
echo -n 欢迎 | xxd -p -u | sed 's/.\{2\}/&\\x/g' | sed 's/^\(.\{0\}\)/\1\\x/' | sed -r 's/(.*)\\x/\1 /'
which gives me output as:
\xE6\xAC\xA2\xE8\xBF\x8E
This is the correct output that I am looking for. Please suggest how to use sed more efficiently in the above command. The command is being run in an Ubuntu 16.04 terminal.
You can chain sed-commands with ";":
echo -n 欢迎 | xxd -p -u | sed 's/.\{2\}/&\\x/g;s/^\(.\{0\}\)/\1\\x/' | sed -r 's/(.*)\\x/\1 /'
\xE6\xAC\xA2\xE8\xBF\x8E
Since you use sed and sed -r interchangeably, the remaining sed -r call has to be rewritten in basic regex syntax before it can be merged with the others:
echo -n 欢迎 | xxd -p -u | sed 's/.\{2\}/&\\x/g;s/^\(.\{0\}\)/\1\\x/;s/\(.*\)\\x/\1 /'
Having a second look at what xxd outputs without sed, I noticed that the solution is much simpler:
echo -n 欢迎 | xxd -p -u | sed -r 's/(..)/\\x\1/g'
Your initial approach appended \x after each pair of characters, but you can prepend it to the pairs instead. However, chaining multiple sed commands might still be a useful thing to know.
From an efficiency standpoint, about the best option I could come up with would be to replace xxd, three pipes, and three calls to sed with od and two bash parameter expansions (there may be more efficient ways, but this is what came to mind).
For example, you could assign the result of the command substitution $(printf "欢迎" | od -A none -t x1) to a variable, which would then contain ' e6 ac a2 e8 bf 8e'. Then it is simply a matter of converting to upper case and replacing each space with '\x', both of which bash parameter expansions provide, e.g.
a=$(printf "欢迎" | od -A none -t x1); \
a=${a^^}; \
a=${a// /\\x}; \
echo $a
\xE6\xAC\xA2\xE8\xBF\x8E
(shown with line-continuations above, you can just copy/paste into your terminal to test)
From Your Request in Comment for C
The code in C to output the upper-case hex bytes contained in your string is trivial, e.g.
#include <stdio.h>
int main (void) {
char *s = "欢迎";
while (*s) /* output each byte in upper-case hex */
printf ("\\x%02hhX", ((unsigned char)*s++));
putchar ('\n');
return 0;
}
Example Use/Output
$ ./bin/str2hexbytes
\xE6\xAC\xA2\xE8\xBF\x8E
(note: you could use the exact-width types in stdint.h and the exact-width format specifiers provided in inttypes.h for a more formal solution, but it would accomplish the same thing. Similarly, you could use wide-character types, but virtually all modern compilers have no problem handling multibyte characters in an ordinary string or array of char)
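For strings longer than 16 bytes, od wraps its output over several lines, which the space-to-\x substitution on a single variable does not account for. A minimal sketch that flattens the od output first is shown below; the function name str2hex is made up for illustration, and it relies only on od, tr, and sed.

```shell
# Hypothetical helper: print each byte of "$1" as \xHH in upper-case hex.
str2hex() {
  # -An: no offsets, -v: do not collapse repeated lines, -tx1: one byte per field
  hex=$(printf '%s' "$1" | od -An -v -tx1 | tr -d '[:space:]' | tr 'a-f' 'A-F')
  # prefix every pair of hex digits with \x
  printf '%s\n' "$hex" | sed 's/../\\x&/g'
}

str2hex 'Az'
```

Called as str2hex "欢迎" in a UTF-8 terminal, it should give the same \xE6\xAC\xA2\xE8\xBF\x8E as the commands above.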

How to parse strace in shell into plain text?

I have a trace log generated by running strace on PHP:
sudo strace -e sendto -fp $(pgrep -n php) -o strace.log
And the output looks like:
11208 sendto(4, "set 29170397297_-cache-schema 85 0 127240\r\n\257\202\v\0?\0\0\0\2\27\10stdClass\24\7\21\3cid\21\6schema\21\4d\37ata\25\n\247\21\5block\24\6\21\6fields\24\f\21\3bid\24\2\5\21\4type 0\37erial\21\10not null\5\21\6module\24\4\16\7\21\7va\37rchar\21\6length\6#\16\t\5\21\7default\r\21\5de\2lta#\5\16\v\16\f\6 \35\7\16\r\21\0010\21\5t \207C\30#6\2\16\r\r n\4tatus#0\4\21\3int/\7\6\0\21\4size \222\finy\21\6weight\24\3 ;\0\22\300 \6\6region#8\340\5P\5custom\27\300,\17\16\23\16\24\21\nvisibility\340\t\34\7\5pages\24\2 \205\3\4tex#\206 \261\1it \365\0\5\240\0\377y\10\r\21\ftransl!N\2ble %\1ca!a\340\3Q\0\1n\31\vprimary key\24\1\6\0\16\6\21\vunique#\21\ts\24\1\21\3tmd\24\3 \31\0\20 2\v\n\6\2\16\16\21\7index \210\10\1\21\4list\24\5\240\36\0\21 \36\10\26\6\3\16\25\6\4\16\n \1\6\4\21\4name \7\0\na\317\2_ro\252\0\5!$\0\n \3\341\2\23\0\16\340\0\16A\214\2\21\3r!\354# \v\22\21\10unsigned\5#\332\0\36\213\0\n \213\0\16 l\6%\16!\24\1\16%\271\0%#p\5\16#\16$\21\f\200l\241b#n\2\4\16\6M\2\10\16&#E\4\21\4bod\201_\5\32\16\t\4\16\23B\\\2g\16\34 \30\3info .\0\7a\255\0\200#q!L\5\6forma\201\332B/!d\2\4\16\37 y\0*y\0 \225a;\240\201\2'\21\van\0_\207\200\2\5\16\1\340\0U =#U\1\16\3#\222 \212\2lob#O\n\23\16)\21\6expire#\30\342\0\26\7\21\7create\241\17< \25\0\n\203\1\"\177\0dY\0\22 \305\5\5small\240!a\32\0.\230\0.\240\240\0\1\240\240\3,\21\vb S\2kpo\"\313\2s\24\6!\220\2\t\21\2\241q\0\10 ?\4\21\tno \213\6ort\5\21\fm\";\3ine_A\313\232\241\3\2\5\16#\340\4\16!\345\340\0U\223\340\0'AC\4sourc\202\202\340\3\27\0\v\200\27\0_C\326\340\0074\1\16\21_\240\363\2\1\16\25\340\3\16\r\0\21\vmultipliers\31\0- \223\1\21\t\341\0\30B-\0\1!\10\0003a\253\0005\v\0005ac \327Dz\"\364 \20\0\10 \6\0 #\333\r\0165\16\36\0163\21\nidenti$x\nr\0166\21\vadmin_ce\10\21\5label\21\f\244H\6 hook\21\23\240\r\0_\340\1\375\fs\21\3api\24\4\21\5own F\0062\16C\16B\21\17 H\5imum_v \260$\25\7\6\1\21\17curr m\340\1\22!\242\0002\"\305\0022\21\20\340\1N\5_groupa\247\2\6\0163\352\0\10 \352\2\0164\5 
\325C%\341\0P\341\5\220\1\0162aQA\26\4\16:\5\21\17\201\321\1 c\"$\5back\21#\340\7b\0_\200!\340\3\311\1\16\7C\340\0a!\312\1\no \300#\240!&}\241\237\0\0\242e\341\4n\5\16;\24\10\16< \7\2=\21\35\340\1m\0\320\0 \342\3XAz\v\16>\16G\16?\16#\16A\21\30\341\tT\201\5\1\21\22\200\243\0 B0\6 string#o\4toolsbD\1\16C \260\0D!D\4C\16L\16E!P\0F \3\201T\16G\21\21ckeditor_set%\266\0gE\323\0\5%Q\0# 4#\345!)\"w#\372\1\21\10\340\0!\0\1 \31\0\32\240\334\4#\16\n\21\10\300D \r\2O\21\25\300\r\6_input_\244+\340\16V\1\16+ \31\340\4h X\0\2!;\0# \245\0+ \247\0Q T\7R\21\26comme#/\0_%\266\2cko W\3pane ;\4\5\24\10\21\7#\v\0_\243\257\301\231\1\21\4F\35 !\340\1\22F\323\0021\21\10\"\311'B\0e#\223A\254&f`\346\"~\6\vcollap&q%\227\340\6\35\2\0\21\t\240\35\344\1a\3009\0\0#\212\300.\0001\200L$\247\1enFl\344\0\216\300,\0\1G\5\3view\340\0002\300\177 \372\0\1 K\0T!"..., 8196, MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 8196
It sounds like these are represented by ordinary C escape codes.
I've tried to decode them in shell by printf like:
while read line; do printf "%s" "$line"; done < <(cat strace.log | head -n2)
but it failed (looks like it doesn't make any sense):
11208 sendto(4, "set 29170397297_-cache-schema 85 0 127240rn257202v0?00022710stdClass247213cid216schema214d37ata25n247215block246216fields24f213bid2425214type 037erial2110not null5216module244167217va37rchar216length6#16t5217defaultr215de2lta#516v16f6 35716r210010215t 207C30#6216rr n4tatus#04213int/760214size 222finy216weight243 ;022300 66region#83405P5custom27300,171623162421nvisibility340t3475pages242 20534tex#206 2611it 365052400377y10r21ftransl!N2ble %1ca!a3403Q01n31vprimary key2416016621vunique#21ts241213tmd243 31020 2vn621616217index 210101214list24524036021 3610266316256416n 164214name 70na3172_ro25205!$0n 3341223016340016A2142213r!354# v222110unsigned5#3320362130n 213016 l6%16!24116%2710%#p516#16$21f200l241b#n24166M21016&#E4214bod201_53216t41623B\2g1634 303info .07a2550200#q!L56forma201332B/!d241637 y0*y0 225a;2402012'21van0_207200251613400U =#U1163#222 2122lob#On2316)216expire#303420267217create24117< 250n2031"1770dY022 30555small240!a320.`2300.240240012402403,21vb S2kpo"3132s246!2202t212241q010...
Is there any better way to parse the output of strace command to see plain strings passed to recvfrom/sendto?
Ideally, it would print the printable characters including newlines (\r\n), but cut out NULs and other non-printable characters.
The reason read doesn't work is that, without -r, it treats every backslash as an escape character and strips it, so \r\n is printed as rn.
To stop read from interpreting backslashes, use read -r, which keeps them as literal characters, and let printf "%b" expand the escape sequences. Here is an example:
while read -r line; do printf "%b\n" "$line"; done < strace.log | strings
Since this is binary data, the example above also pipes the result through the strings command to display only printable strings.
strace can also print all strings in hex when -x is specified, but the decoding works the same way.
Here is the version to parse strace output in real-time:
while read -r line;
do printf "%b\n" "$line" | strings
done < <(sudo strace -e recvfrom,sendto -s 1000 -fp $(pgrep -n php) 2>/dev/stdout)
Furthermore, strings can be replaced by a more specific filter using grep, to get only what is inside the double quotes:
grep -o '".\+[^"]"' | grep -o '[^"]\+[^"]'
However, this may still print binary data.
To avoid that, let's simplify the whole process by defining the following formatter alias:
alias format-strace='grep --line-buffered -o '\''".\+[^"]"'\'' | grep --line-buffered -o '\''[^"]*[^"]'\'' | while read -r line; do printf "%b" $line; done | tr "\r\n" "\275\276" | tr -d "[:cntrl:]" | tr "\275\276" "\r\n"'
where:
grep -o '".\+[^"]"' - select double-quoted string with quotes
grep -o '[^"]*[^"]' - select text within the double quotes
while read -r line - store each line into $line and do some action (help read)
printf "%b" $line - print line by expanding backslash escape sequences
tr "\r\n" "\275\276" - temporarily replace \r\n with \275\276
tr -d "[:cntrl:]" - remove all control characters
tr "\275\276" "\r\n" - restore new line endings
then the complete example to trace some command (e.g. php) can look like:
strace -e trace=read,write,recvfrom,sendto -s 1000 -fp $(pgrep -n php) 2>&1 | format-strace
Check for similar example: How to view the output of a running process in another bash session? at Unix.SE
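The same extract, decode, and filter steps can also be written as a function, which avoids the nested quoting of the alias and keeps "$payload" quoted so the shell does not split or glob it. This is only a sketch: the name decode_strace is made up, it takes everything between the first and last double quote on each line as the payload, and it keeps printable characters plus CR and LF. Octal escapes that start with \0 may be decoded differently by %b than strace intended, so treat binary payloads with care.

```shell
# Hypothetical helper: read strace lines on stdin, print the decoded payloads.
decode_strace() {
  # keep only the text between the first and the last double quote
  sed -n 's/^[^"]*"\(.*\)"[^"]*$/\1/p' |
  while IFS= read -r payload; do
    printf '%b\n' "$payload"      # expand \r, \n, \v, octal escapes, ...
  done |
  tr -cd '[:print:]\r\n'          # drop everything that is not printable, CR or LF
}

# Usage:
#   decode_strace < strace.log
```

Lines that contain no quoted string are skipped.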

What is the most compact or efficient way of doing several substitutions in a file in bash

I have a file data.base which looks like:
1234 XXXX
4321 XXXX
9884 ZZZZ
5454 YYYY
4311 YYYY
9882 ZZZZ
9976 ZZZZ
( ... random occurrences like this till 10000 lines)
I would like to create a file called data.case which derives from data.base just with substitutions of XXXX, YYYY, ZZZZ for float numbers.
I wonder what would be the most compact/efficient/short way to do that on bash or friends.
What I usually do is something like:
sed -e "s/XXXX/1.34555/g" data.base > temp1
sed -e "s/YYYY/2.985/g" temp1 > temp2
sed -e "s/ZZZZ/-4.3435/g" temp2 > data.case
rm -fr temp1 temp2
But I do not think this is the most compact or efficient way when you have to deal with more than 3 substitutions.
Thanks
Separate the commands with ; to execute several of them in the same sed call:
sed "s/XXXX/1.34555/g; s/YYYY/2.985/g; s/ZZZZ/-4.3435/g" data.base > data.case
$ cat sedcommands
s/XXXX/1.34555/g
s/YYYY/2.985/g
s/ZZZZ/-4.3435/g
$ sed -f sedcommands data.base > data.case
You can make use of associative arrays in awk:
awk 'BEGIN{
# add as needed
s["XXXX"]=1.3455
s["YYYY"]=2.985
s["ZZZZ"]=-4.3435
}
($2 in s) { print $1,s[$2] }' file
output
$ ./shell.sh
1234 1.3455
4321 1.3455
9884 -4.3435
5454 2.985
4311 2.985
9882 -4.3435
9976 -4.3435
sed -e "s/XXXX/1.34555/g;s/YYYY/2.985/g;s/ZZZZ/-4.3435/g"
or put them in a cmd file
and list them out.
Whilst sed can do multiple substitutions in one pass, the general UNIX approach, which is more widely applicable and can be combined with other commands, is to use command piping:
cat data.base | \
sed -e "s/XXXX/1.34555/g" | \
sed -e "s/YYYY/2.985/g" | \
sed -e "s/ZZZZ/-4.3435/g" > data.case
Do not redirect the output back to data.base: the shell truncates the target of > when it sets up the pipeline, typically before cat has read it, so the original contents would be lost. To replace data.base itself, write to a temporary file first, so that you can also intercept error conditions without losing the original.
(When using piping, it's useful to be familiar with the tee program, which saves the stream to a file whilst passing it on.)
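The temporary-file idea can be sketched as follows. The function name apply_substitutions is made up for illustration, and the three substitutions are the ones from the question; the output file is only replaced if sed exits successfully, so input and output may be the same file.

```shell
# Hypothetical helper: apply all substitutions from $1 and write the result to $2.
apply_substitutions() {
  tmp=$(mktemp) || return 1
  if sed -e 's/XXXX/1.34555/g' \
         -e 's/YYYY/2.985/g' \
         -e 's/ZZZZ/-4.3435/g' "$1" > "$tmp"
  then
    mv "$tmp" "$2"        # replace the target only after sed succeeded
  else
    rm -f "$tmp"          # leave the target untouched on failure
    return 1
  fi
}

# Usage:
#   apply_substitutions data.base data.case
```

Note that mktemp creates the file with restrictive permissions, so the resulting file may need a chmod if others have to read it.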
