How to add a space after special characters in bash script? - bash

I have a text file with something like,
!aa
#bb
#cc
$dd
%ee
expected output is,
! aa
# bb
# cc
$ dd
% ee
What I have tried, echo "${foo//#/# }".
This does work fine with one string but it does not work for all the lines in the file. I have tried with this while loop to read all the lines of the file and do the same using echo but it does not work.
while IFS= read -r line; do
foo=$line
sep="!@#$%"
echo "${foo//$sep/$sep }"
done < $1
I have also tried awk's split, but it does not give the expected output. Is there a workaround for this using awk or sed?

The following assumes you want to add a space after every character in the !@#$% set (even if it is the last character in a line). Test file:
$ cat file.txt
a!a
#bb
c#c
$dd
ee%
foo
%b%r
$ sep='!@#$%'
With sed:
$ sed 's/['"$sep"']/& /g' file.txt
a! a
# bb
c# c
$ dd
ee%
foo
% b% r
With awk:
$ awk '{gsub(/['"$sep"']/,"& "); print}' file.txt
a! a
# bb
c# c
$ dd
ee%
foo
% b% r
With plain bash (not recommended, it is too slow):
$ while IFS= read -r line; do
str=""
for (( i=0; i<${#line}; i++ )); do
char="${line:i:1}"
str="$str$char"
[[ "$char" =~ [$sep] ]] && str="$str "
done
printf '%s\n' "$str"
done < file.txt
a! a
# bb
c# c
$ dd
ee%
foo
% b% r
Or (not sure which is the worst):
$ while IFS= read -r line; do
for (( i=0; i<${#sep}; i++ )); do
char="${sep:i:1}"
line="${line//$char/$char }"
done
printf '%s\n' "$line"
done < file.txt
a! a
# bb
c# c
$ dd
ee%
foo
% b% r

The characters you call special in your example seem to be a subset of the characters matched by [[:punct:]] in GNU sed, so I propose the following solution:
sed 's/\([[:punct:]]\)/\1 /g' file.txt
which with file.txt content being
!aa
#bb
#cc
$dd
%ee
output
! aa
# bb
# cc
$ dd
% ee
Explanation: I use a capturing group \(...\) that matches any character belonging to [:punct:], then I replace what was captured with the content of that capture followed by a space. I use g to apply it to all occurrences in each line, though this has no visible impact on the data above. You might elect to drop g if you are sure there will be at most one character to replace in every line.
If you want to know more about [:punct:] or other similar character sets, read about Character Classes on Regular-Expressions.info.
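As a small sketch of the same idea: in sed, & in the replacement stands for the whole match, so the capturing group is optional here. The function name add_space_after_punct is only an illustrative label, not something from the answer above.

```shell
# add_space_after_punct: hypothetical helper name, used only for illustration.
# Appends a space after every punctuation character read from stdin or files.
add_space_after_punct() {
    # '&' expands to the matched text, so no \( \) group is needed.
    sed 's/[[:punct:]]/& /g' "$@"
}

printf '%s\n' '!aa' '#bb' '$dd' | add_space_after_punct
```

Called with file names (add_space_after_punct file.txt), it reads those files instead of stdin.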

If the file always contains a symbol at the start of each line like that, then use this:
sed -Ei 's/^(.)/\1 /g' yourfile.txt
The -E option tells sed to use extended regular expressions, so the group can be written as (.) without backslashes. -i modifies the file in place; remove it if you want the output on the console or redirected to another file. The ^(.) regex captures the first character on the line, and the replacement writes it back followed by a space (\1 ).
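A minimal sketch of the same substitution that writes to stdout instead of editing the file, assuming every non-empty line starts with the symbol. It uses the basic-regex form with &, so neither -E nor -i is needed; space_after_first is a made-up name.

```shell
# space_after_first: hypothetical helper; inserts a space after the first
# character of each non-empty line and writes the result to stdout.
space_after_first() {
    sed 's/^./& /' "$@"
}

printf '%s\n' '!aa' '%ee' | space_after_first
```

Redirect the output (space_after_first yourfile.txt > newfile.txt) if you want to keep the result.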

Assuming that special characters are non-numeric and non-alphabetic characters, and special characters can appear anywhere in the line, use the following regular expression to replace them.
sed 's/[^a-zA-Z0-9]/& /g' urfile
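Note that [^a-zA-Z0-9] also matches spaces and tabs, so existing whitespace gets an extra space as well. If that is not wanted, one possible variation (an assumption about what "special" should mean, not part of the answer above) excludes whitespace too. The helper name is hypothetical.

```shell
# space_after_symbols: hypothetical helper; treats anything that is neither
# alphanumeric nor whitespace as "special" (an assumption, adjust as needed).
space_after_symbols() {
    sed 's/[^[:alnum:][:space:]]/& /g' "$@"
}

printf '%s\n' 'a!b c' | space_after_symbols
```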

Related

pipe stdout and prepend some chars for each line

I want to pipe the stdout of a process through a "tool" to prepend some chars to each line. I'm working in bash.
Example:
PREPEND=' * '
foo.bin | toolXY "$PREPEND"
If foo.bin outputs:
hello
world
the output after toolXY should be:
* hello
* world
What would the command for toolXY be?
Awk would work for this as well.
foo.bin | awk -v PREPEND=" * " '{print PREPEND $0}'
foo.bin | sed "s/^/$PREPEND/"
or
foo.bin | while IFS= read -r line; do printf '%s%s\n' "$PREPEND" "$line"; done
The second is more robust if $PREPEND can contain unpredictable special characters. IFS= preserves leading whitespace on each line and -r protects backslashes from being interpreted as escape sequences.
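The read loop can be wrapped in a function so that it behaves like the "toolXY" from the question. This is only a sketch; the name prepend is made up, and the prefix is taken from the first argument.

```shell
# prepend: hypothetical name for the "toolXY" in the question.
# Usage: some_command | prepend ' * '
prepend() {
    # '|| [ -n "$line" ]' also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s%s\n' "$1" "$line"
    done
}

printf 'hello\nworld\n' | prepend ' * '
```

Because the prefix is passed to printf as data, characters such as /, & or \ in it need no escaping.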

shell script for reading file and replacing new file with | symbol

I have a text file like the one below.
abc
def
ghi
123
456
789
expected output is
abc|def|ghi
123|456|789
I want to replace each newline with a pipe symbol (|) so that I can use the result in egrep. After an empty line, a new output line should start.
You can try awk:
awk -v RS= -v OFS="|" '{$1=$1}1' file
You get:
abc|def|ghi
123|456|789
Explanation
Setting RS to a null (empty) value makes awk treat blocks of text separated by blank lines as records.
From the POSIX specification for awk:
RS
The first character of the string value of RS shall be the input record separator; a <newline> by default. If RS contains more than one character, the results are unspecified. If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is.
$1=$1 forces awk to rebuild the record with OFS as the separator; the trailing 1 is an always-true pattern, so every record is printed.
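As a variation on the awk command above: with the default FS, spaces inside a line are field separators too, so a line such as "a b" would also be joined with |. A sketch that splits on newlines only, assuming that is the desired behaviour (join_paragraphs is a hypothetical name):

```shell
# join_paragraphs: hypothetical helper; joins the lines of each blank-line
# separated block with '|', keeping spaces inside a line intact.
join_paragraphs() {
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "|" } { $1 = $1; print }' "$@"
}

printf '%s\n' 'abc' 'def' 'ghi' '' '123' '456' '789' | join_paragraphs
```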
Here's one using GNU sed:
cat file | sed ':a; N; $!ba; s/\n/|/g; s/||/\n/g'
If you're using BSD sed (the flavor packaged with Mac OS X), you will need to pass in each expression separately, and use a literal newline instead of \n (more info):
cat file | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/|/g' -e 's/||/\
/g'
If file is:
abc
def
ghi
123
456
789
You get:
abc|def|ghi
123|456|789
This replaces each newline with a | (credit to this answer), and then || (i.e. what was a pair of newlines in the original input) with a newline.
The caveat here is that | can't appear at the beginning or end of a line in your input; otherwise, the second substitution will add newlines in the wrong places. To work around that, you can use another character that won't be in your input as an intermediate value, and then replace pairs of that character with \n and the remaining singletons with |.
EDIT
Here's an example that implements the workaround above, using the NUL character \x00 (which should be highly unlikely to appear in your input) as the intermediate character:
cat file | sed ':a;N;$!ba; s/\n/\x00/g; s/\x00\x00/\n/g; s/\x00/|/g'
Explanation:
:a;N;$!ba; puts the entire file in the pattern space, including newlines
s/\n/\x00/g; replaces all newlines with the NUL character
s/\x00\x00/\n/g; replaces all pairs of NULs with a newline
s/\x00/|/g replaces the remaining singletons of NULs with a |
BSD version:
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/\x00/g' -e 's/\x00\x00/\
/g' -e 's/\x00/|/g'
EDIT 2
For a more direct approach (GNU sed only), provided by @ClaudiuGeorgiu:
sed -z 's/\([^\n]\)\n\([^\n]\)/\1|\2/g; s/\n\n/\n/g'
Explanation:
-z uses NUL characters as line-endings (so newlines are not given special treatment and can be matched in the regular expression)
s/\([^\n]\)\n\([^\n]\)/\1|\2/g; replaces every 3-character sequence of <non-newline><newline><non-newline> with <non-newline>|<non-newline>
s/\n\n/\n/g replaces all pairs of newlines with a single newline
In native bash:
#!/usr/bin/env bash
curr=
while IFS= read -r line; do
if [[ $line ]]; then
curr+="|$line"
else
printf '%s\n' "${curr#|}"
curr=
fi
done
[[ $curr ]] && printf '%s\n' "${curr#|}"
Tested:
$ f() { local curr= line; while IFS= read -r line; do if [[ $line ]]; then curr+="|$line"; else printf '%s\n' "${curr#|}"; curr=; fi; done; [[ $curr ]] && printf '%s\n' "${curr#|}"; }
$ f < <(printf '%s\n' 'abc' 'def' 'ghi' '' 123 456 789)
abc|def|ghi
123|456|789
Use rs. For example:
rs -C'|' 2 3 < file
rs = reshape data array. Here I'm specifying that I want 2 rows, 3 columns, and the output separator to be pipe.

How to perform a for loop on each character in a string in Bash?

I have a variable like this:
words="这是一条狗。"
I want to make a for loop on each of the characters, one at a time, e.g. first character="这", then character="是", character="一", etc.
The only way I know is to output each character to separate line in a file, then use while read line, but this seems very inefficient.
How can I process each character in a string through a for loop?
You can use a C-style for loop:
foo=string
for (( i=0; i<${#foo}; i++ )); do
echo "${foo:$i:1}"
done
${#foo} expands to the length of foo. ${foo:$i:1} expands to the substring starting at position $i of length 1.
With sed in the dash shell and LANG=en_US.UTF-8, I got the following to work correctly:
$ echo "你好嗎 新年好。全型句號" | sed -e 's/\(.\)/\1\n/g'
你
好
嗎
新
年
好
。
全
型
句
號
and
$ echo "Hello world" | sed -e 's/\(.\)/\1\n/g'
H
e
l
l
o
w
o
r
l
d
Thus, the output can be looped over with while read ... ; do ... ; done
Edit: the sample text, translated into English:
"你好嗎 新年好。全型句號" is zh_TW.UTF-8 encoded text for:
"你好嗎" = How are you[ doing]
" " = a normal space character
"新年好" = Happy new year
"。全型句號" = a full-width full stop followed by its text description
${#var} returns the length of var
${var:pos:N} returns N characters from pos onwards
Examples:
$ words="abc"
$ echo ${words:0:1}
a
$ echo ${words:1:1}
b
$ echo ${words:2:1}
c
so it is easy to iterate.
another way:
$ grep -o . <<< "abc"
a
b
c
or
$ grep -o . <<< "abc" | while read letter; do echo "my letter is $letter" ; done
my letter is a
my letter is b
my letter is c
I'm surprised no one has mentioned the obvious bash solution utilizing only while and read.
while read -n1 character; do
echo "$character"
done < <(echo -n "$words")
Note the use of echo -n to avoid the extraneous newline at the end. printf is another good option and may be more suitable for your particular needs. If you want to ignore whitespace then replace "$words" with "${words// /}".
Another option is fold. Please note however that it should never be fed into a for loop. Rather, use a while loop as follows:
while read char; do
echo "$char"
done < <(fold -w1 <<<"$words")
The primary benefit of using the external fold command (from the coreutils package) would be brevity. You can feed its output to another command such as xargs (part of the findutils package) as follows:
fold -w1 <<<"$words" | xargs -I% -- echo %
You'll want to replace the echo command used in the example above with the command you'd like to run against each character. Note that xargs will discard whitespace by default. You can use -d '\n' to disable that behavior.
Internationalization
I just tested fold with some of the Asian characters and realized it doesn't have Unicode support. So while it is fine for ASCII needs, it won't work for everyone. In that case there are some alternatives.
I'd probably replace fold -w1 with an awk array:
awk 'BEGIN{FS=""} {for (i=1;i<=NF;i++) print $i}'
Or the grep command mentioned in another answer:
grep -o .
Performance
FYI, I benchmarked the aforementioned options. The loop-based variants were fast and nearly tied, with the while loop slightly faster than the fold loop. Unsurprisingly, xargs was the slowest: about 75x slower.
Here is the (abbreviated) test code:
words=$(python -c 'from string import ascii_letters as l; print(l * 100)')
testrunner(){
for test in test_while_loop test_fold_loop test_fold_xargs test_awk_loop test_grep_loop; do
echo "$test"
(time for (( i=1; i<$((${1:-100} + 1)); i++ )); do "$test"; done >/dev/null) 2>&1 | sed '/^$/d'
echo
done
}
testrunner 100
Here are the results:
test_while_loop
real 0m5.821s
user 0m5.322s
sys 0m0.526s
test_fold_loop
real 0m6.051s
user 0m5.260s
sys 0m0.822s
test_fold_xargs
real 7m13.444s
user 0m24.531s
sys 6m44.704s
test_awk_loop
real 0m6.507s
user 0m5.858s
sys 0m0.788s
test_grep_loop
real 0m6.179s
user 0m5.409s
sys 0m0.921s
I believe there is still no ideal solution that would correctly preserve all whitespace characters and is fast enough, so I'll post my answer. Using ${foo:$i:1} works, but is very slow, which is especially noticeable with large strings, as I will show below.
My idea is an expansion of a method proposed by Six, which involves read -n1, with some changes to keep all characters and work correctly for any string:
while IFS='' read -r -d '' -n 1 char; do
# do something with $char
done < <(printf %s "$string")
How it works:
IFS='' - Redefining the internal field separator as an empty string prevents stripping of spaces and tabs. Doing it on the same line as read means that it will not affect other shell commands.
-r - Means "raw", which prevents read from interpreting backslashes, for example treating \ at the end of a line as a line continuation.
-d '' - Passing empty string as a delimiter prevents read from stripping newline characters. Actually means that null byte is used as a delimiter. -d '' is equal to -d $'\0'.
-n 1 - Means that one character at a time will be read.
printf %s "$string" - Using printf instead of echo -n is safer, because echo treats -n and -e as options. If you pass "-e" as a string, echo will not print anything.
< <(...) - Passing string to the loop using process substitution. If you use here-strings instead (done <<< "$string"), an extra newline character is appended at the end. Also, passing string through a pipe (printf %s "$string" | while ...) would make the loop run in a subshell, which means all variable operations are local within the loop.
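To see that the options above really keep every character, here is a small Bash-only sketch that copies a string one character at a time and compares the copy with the original. The function name rebuild_string and the sample string are made up for the demonstration.

```shell
#!/usr/bin/env bash
# rebuild_string: hypothetical helper that copies $1 one character at a time
# with the read loop described above, then prints the copy.
rebuild_string() {
    local char out=
    while IFS= read -r -d '' -n 1 char; do
        out+=$char
    done < <(printf %s "$1")
    printf %s "$out"
}

original=$' two  spaces,\ta tab,\na newline and a back\\slash'
copy=$(rebuild_string "$original")
[[ $copy == "$original" ]] && echo "identical"
```

The sample string deliberately does not end in a newline, because the command substitution used to capture the copy would strip it.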
Now, let's test the performance with a huge string.
I used the following file as a source:
https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt
The following script was called through time command:
#!/bin/bash
# Saving contents of the file into a variable named `string'.
# This is for test purposes only. In real code, you should use
# `done < "filename"' construct if you wish to read from a file.
# Using `string="$(cat makefiles.txt)"' would strip trailing newlines.
IFS='' read -r -d '' string < makefiles.txt
while IFS='' read -r -d '' -n 1 char; do
# remake the string by adding one character at a time
new_string+="$char"
done < <(printf %s "$string")
# confirm that new string is identical to the original
diff -u makefiles.txt <(printf %s "$new_string")
And the result is:
$ time ./test.sh
real 0m1.161s
user 0m1.036s
sys 0m0.116s
As we can see, it is quite fast.
Next, I replaced the loop with one that uses parameter expansion:
for (( i=0 ; i<${#string}; i++ )); do
new_string+="${string:$i:1}"
done
The output shows exactly how bad the performance loss is:
$ time ./test.sh
real 2m38.540s
user 2m34.916s
sys 0m3.576s
The exact numbers may vary on different systems, but the overall picture should be similar.
I've only tested this with ascii strings, but you could do something like:
while test -n "$words"; do
c=${words:0:1} # Get the first character
echo character is "'$c'"
words=${words:1} # trim the first character
done
It is also possible to split the string into a character array using fold and then iterate over this array:
for char in `echo "这是一条狗。" | fold -w1`; do
echo $char
done
The C-style loop in @chepner's answer is in the shell function update_terminal_cwd, and the grep -o . solution is clever, but I was surprised not to see a solution using seq. Here's mine:
read word
for i in $(seq 1 ${#word}); do
echo "${word:i-1:1}"
done
#!/bin/bash
word=$(echo 'Your Message' |fold -w 1)
for letter in ${word} ; do echo "${letter} is a letter"; done
Here is the output:
Y is a letter
o is a letter
u is a letter
r is a letter
M is a letter
e is a letter
s is a letter
s is a letter
a is a letter
g is a letter
e is a letter
To iterate ASCII characters on a POSIX-compliant shell, you can avoid external tools by using the Parameter Expansions:
#!/bin/sh
str="Hello World!"
while [ ${#str} -gt 0 ]; do
next=${str#?}
echo "${str%"$next"}"
str=$next
done
or
str="Hello World!"
while [ -n "$str" ]; do
next=${str#?}
echo "${str%"$next"}"
str=$next
done
sed works with Unicode:
IFS=$'\n'
for z in $(sed 's/./&\n/g' <(printf '你好嗎')); do
echo hello: "$z"
done
outputs
hello: 你
hello: 好
hello: 嗎
Another approach, if you don't care about whitespace being ignored:
for char in $(sed -E s/'(.)'/'\1 '/g <<<"$your_string"); do
# Handle $char here
done
Another way is:
Characters="TESTING"
index=1
while [ $index -le ${#Characters} ]
do
echo ${Characters} | cut -c${index}-${index}
index=$(expr $index + 1)
done
fold and while read are great for the job as shown in some answers here. Contrary to those answers, I think it's much more intuitive to pipe in the order of execution:
echo "asdfg" | fold -w 1 | while read c; do
echo -n "$c "
done
Outputs: a s d f g
I share my solution:
read word
for char in $(grep -o . <<<"$word") ; do
echo $char
done
TEXT="hello world"
for i in {1..${#TEXT}}; do
echo ${TEXT[i]}
done
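where {1..N} is an inclusive range,
${#TEXT} is the number of characters in the string, and
${TEXT[i]} gets a character from the string like an item from an array.
Note that this is zsh syntax. In Bash, brace expansion happens before parameter expansion, so {1..${#TEXT}} is not expanded into a range, and ${TEXT[i]} is an array subscript rather than a character index.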

Extract words from files

How can I extract all the words from a file, every word on a single line?
Example:
test.txt
This is my sample text
Output:
This
is
my
sample
text
The tr command can do this...
tr '[:blank:]' '\n' < test.txt
This asks the tr program to replace each blank (space or tab) with a newline. The quotes keep the shell from treating [:blank:] as a glob pattern.
The output goes to stdout, but it could be redirected to another file, result.txt:
tr '[:blank:]' '\n' < test.txt > result.txt
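If the input may contain runs of spaces, tabs, or blank lines, a possible refinement is to translate all whitespace and squeeze the resulting newlines with -s. This is a sketch; the function name words is made up, and leading whitespace at the very start of the input still produces one empty first line.

```shell
# words: hypothetical helper; prints every whitespace-separated word of the
# input on its own line. -s squeezes the runs of newlines produced by the
# translation, so repeated spaces, tabs and blank lines yield no empty lines.
words() {
    tr -s '[:space:]' '\n'
}

printf 'This  is\tmy\n\nsample text\n' | words
```

Since tr only reads stdin, a file is passed with a redirection: words < test.txt.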
And here the obvious bash line:
for i in $(< test.txt)
do
printf '%s\n' "$i"
done
EDIT Still shorter:
printf '%s\n' $(< test.txt)
That's all there is to it, no special (pathetic) cases included (And handling multiple subsequent word separators / leading / trailing separators is by Doing The Right Thing (TM)). You can adjust the notion of a word separator using the $IFS variable, see bash manual.
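To illustrate the remark about $IFS, here is a sketch that splits on an arbitrary separator. The function name split_on and the sample input are hypothetical. It also turns off globbing, which the unquoted expansions above leave on, so a word like * is printed literally instead of being replaced by file names.

```shell
# split_on: hypothetical helper; splits string $2 on the characters in $1
# and prints one field per line.
split_on() (
    # The function body is a subshell, so IFS and 'set -f' do not leak out.
    IFS=$1
    set -f                 # disable globbing: a field like '*' stays literal
    printf '%s\n' $2       # intentionally unquoted so field splitting applies
)

split_on ',' 'This,is my,*,text'
```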
The above answer doesn't handle multiple spaces and such very well. An alternative would be
perl -p -e '$_ = join("\n",split);' test.txt
which would. E.g.
esben@mosegris:~/ange/linova/build master $ echo "test  test" | tr '[:blank:]' '\n'
test

test
But
esben@mosegris:~/ange/linova/build master $ echo "test  test" | perl -p -e '$_ = join("\n",split);'
test
test
This might work for you:
# echo -e "this is\tmy\nsample text" | sed 's/\s\+/\n/g'
this
is
my
sample
text
A Perl answer would be:
pearl.214> cat file1
a b c d e f
pearl.215> perl -p -e 's/ /\n/g' file1
a
b
c
d
e
f
pearl.216>

Replace certain token with the content of a file (using a bash-script)

I have a file containing some text and the words INSERT_HERE1 and INSERT_HERE2. I'd like to replace these words with the content of file1.txt and file2.txt respectively.
I suspect sed or awk could pull it off but I've basically never used them.
Sed does have a built-in read file command. The commands you want would look something like this:
$ sed -e '/INSERT_HERE1/ {
r FILE1
d }' -e '/INSERT_HERE2/ {
r FILE2
d }' < file
This would output
foo
this is file1
bar
this is file2
baz
The r command reads the file, and the d command deletes the line with the INSERT_HERE tag. You need the curly braces to group the two commands, and the multi-line input because r takes the rest of its line as the file name, so the following command has to start on its own line. Depending on your shell, you may need \ at the end of the lines to avoid premature execution. If this is something you would use a lot, you can just put the commands in a file and use sed -f to run it.
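The sed -f variant mentioned above could look like the following sketch. All file names and contents are placeholders created in a scratch directory for the demonstration.

```shell
# Demo in a scratch directory; all file names here are placeholders.
tmp=$(mktemp -d)
printf 'this is file1\n' > "$tmp/FILE1"
printf 'this is file2\n' > "$tmp/FILE2"
printf '%s\n' foo INSERT_HERE1 bar INSERT_HERE2 baz > "$tmp/file"

# One command per line; 'r' takes the rest of its line as the file name.
cat > "$tmp/insert.sed" <<EOF
/INSERT_HERE1/{
r $tmp/FILE1
d
}
/INSERT_HERE2/{
r $tmp/FILE2
d
}
EOF

result=$(sed -f "$tmp/insert.sed" "$tmp/file")
printf '%s\n' "$result"
rm -r "$tmp"
```

Keeping the commands in a script file avoids the quoting and line-continuation issues of typing them on the command line.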
If you are okay with Perl you can do:
$ cat FILE1
this is file1
$ cat FILE2
this is file2
$ cat file
foo
INSERT_HERE1
bar
INSERT_HERE2
baz
$ perl -ne 's/^INSERT_HERE(\d+)\s+$/`cat FILE$1`/e;print' file
foo
this is file1
bar
this is file2
baz
$
This is not tested, but would be pretty close to what you need:
sed -e "s/INSERT_HERE1/`cat file1.txt`/" -e "s/INSERT_HERE2/`cat file2.txt`/" <file >file.out
It will not properly handle a file with slashes in it, though, so you may need to tweak it a bit.
I'd recommend Perl instead, though. Something like this:
#!/usr/bin/perl -w
my $f1 = `cat file1.txt`;
my $f2 = `cat file2.txt`;
while (<>) {
chomp;
s/INSERT_HERE1/$f1/;
s/INSERT_HERE2/$f2/;
print "$_\n";
}
This assumes that INSERT_HERE1 and INSERT_HERE2 may only appear once per line, and that the file1.txt does not include the text INSERT_HERE2 (wouldn't be difficult to fix, though). Use like this:
./script <file >file.out
This is suitable for small substitution files that may be substituted many times:
awk 'BEGIN {
while ((getline line < ARGV[1]) > 0) {file1 = file1 nl line; nl = "\n"};
close (ARGV[1]); nl = "";
while ((getline line < ARGV[2]) > 0) {file2 = file2 nl line; nl = "\n"};
close (ARGV[2]);
ARGV[1] = ""; ARGV[2] = "" }
{ gsub("token1", file1);
gsub("token2", file2);
print }' file1.txt file2.txt mainfile.txt
You may want to add some extra newlines here and there, depending on how you want your output to look.
Easily done with Bash. If you need it to be POSIX shell let me know:
#!/bin/bash
IFS= # Needed to prevent the shell from interpreting the newlines
f1=$(< /path/to/file1.txt)
f2=$(< /path/to/file2.txt)
while read -r line; do
if [[ "$line" == "INSERT_HERE1" ]]; then
echo "$f1"
elif [[ "$line" == "INSERT_HERE2" ]]; then
echo "$f2"
else
echo "$line"
fi
done < /path/to/input/file
This snippet replaces every marker listed in the arrays at the top with the contents of the corresponding file. For example, it replaces
<!--insert.txt-->
with the contents of "insert.txt":
#!/bin/bash
replace[1]=\<!--insert.txt--\> ; file[1]=insert.txt
replace[2]=\<!--insert2.txt--\> ; file[2]=insert2.txt
replacelength=${#replace[@]}
cat blank.txt > tmp.txt
for i in $(seq 1 ${replacelength})
do
echo Replacing ${file[i]} ...
sed -e "/${replace[i]}/r ${file[i]}" -e "/${replace[i]}/d" tmp.txt > tmp_2.txt
mv tmp_2.txt tmp.txt
done
mv tmp.txt file.txt
If you're not afraid of .zip files you can try this example as long as it is online: http://ablage.stabentheiner.de/2013-04-16_contentreplace.zip
I would use Perl's in-place replacement with the -i.ext option:
perl -pi.bak -e 's|INSERT_HERE1|`cat FILE1`|ge;
s|INSERT_HERE2|`cat FILE2`|ge;' myfile
Then, use diff myfile.bak myfile to verify the result.

Resources