Filter input to remove certain characters/strings - bash

I have a quick question about text parsing. For example:
INPUT="a b c d e f g"
PATTERN="a e g"
The INPUT variable should be modified so that the characters listed in PATTERN are removed, so in this example:
OUTPUT="b c d f"
I've tried to use tr -d $x in a for loop over PATTERN, but I don't know how to pass the output to the next loop iteration.
Edit:
What if the INPUT and PATTERN variables contain strings instead of single characters?

Where does $x come from? Anyway, you were close:
tr -d "$PATTERN" <<< "$INPUT"
To assign the result to a variable, just use
OUTPUT=$(tr -d "$PATTERN" <<< "$INPUT")
Just note that spaces will be removed, too, because they are part of the $PATTERN.
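If the spaces should survive, one option is to strip the spaces from the pattern first and tidy up the leftover blanks afterwards. This is a minimal sketch, assuming the pattern holds single characters only; the chars variable and the cleanup pipeline are illustrative additions, not part of the answer above:

```bash
INPUT="a b c d e f g"
PATTERN="a e g"

# Build the character set without the separating spaces: "aeg".
chars=$(printf '%s' "$PATTERN" | tr -d ' ')

# Delete those characters, squeeze repeated spaces,
# then trim one leading and one trailing space.
OUTPUT=$(printf '%s\n' "$INPUT" | tr -d "$chars" | tr -s ' ' | sed 's/^ //; s/ $//')

echo "'$OUTPUT'"
```

Note that tr still works on characters, not words, so this does not help once INPUT or PATTERN contain multi-character strings.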

Pure Bash using parameter substitution:
INPUT="a b c d e f g"
PATTERN="a e g"
for p in $PATTERN; do
    INPUT=${INPUT/ $p/}
    INPUT=${INPUT/$p /}
done
echo "'$INPUT'"
Result:
'b c d f'
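The substitution above removes substrings, so a pattern word such as a would also be cut out of a longer word such as ab. For the case where INPUT and PATTERN contain whole strings, a word-by-word comparison may be safer. The following is only a sketch: remove_words is a hypothetical helper name, and it assumes that words are separated by whitespace and contain no glob characters (the unquoted expansions would otherwise be subject to filename expansion).

```bash
# remove_words INPUT PATTERN
# Print INPUT without any word that also appears in PATTERN.
remove_words() {
    result=
    for word in $1; do
        keep=1
        for p in $2; do
            if [ "$word" = "$p" ]; then
                keep=0
                break
            fi
        done
        if [ "$keep" -eq 1 ]; then
            # Append with a separating space unless result is still empty.
            result="${result:+$result }$word"
        fi
    done
    printf '%s\n' "$result"
}

OUTPUT=$(remove_words "foo bar baz qux" "foo baz")
echo "'$OUTPUT'"
```

Because whole words are compared, remove_words "x ab a" "a" keeps ab and only drops the standalone a.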

Related

How to iterate over a string containing blanks?

I have the following code:
line="95:p1=a b c 95:p2=d e 96:p1=a b c 96:p2=d e"
for l in $line; do
echo $l
done
I got the following:
95:p1=a
b
c
95:p2=d
e
96:p1=a
b
c
96:p2=d
e
But in fact "a b c" is a single string in my use case. Is there a way to get the following instead?
95:p1=a b c
95:p2=d e
96:p1=a b c
96:p2=d e
1st solution: With your shown samples, please try the following awk code. Written and tested with GNU awk (RT is a GNU awk extension).
Here is the online demo for the regex used.
echo "$line"
95:p1=a b c 95:p2=d e 96:p1=a b c 96:p2=d e
awk -v RS='[0-9]{2}:p[0-9]=[a-zA-Z] [a-zA-Z]( [a-zA-Z]|$)*' 'RT{print RT}' <<<"$line"
Output with shown samples will be as follows:
95:p1=a b c
95:p2=d e
96:p1=a b c
96:p2=d e
2nd solution: With any POSIX awk, please try the following awk code:
awk '
{
  while(match($0,/[0-9]{2}:p[0-9]=[a-zA-Z] [a-zA-Z]( [a-zA-Z]|$)*/)){
    print substr($0,RSTART,RLENGTH)
    $0=substr($0,RSTART+RLENGTH)
  }
}
' <<<"$line"
With bash
read -ra f <<<"$line" # split the string into words
n=${#f[@]}
i=0
lines=()
while ((i < n)); do
    l=${f[i++]}
    until ((i == n)) || [[ ${f[i]} =~ ^[0-9]+: ]]; do
        l+=" ${f[i++]}"
    done
    lines+=( "$l" )
done
declare -p lines
outputs
declare -a lines=([0]="95:p1=a b c" [1]="95:p2=d e" [2]="96:p1=a b c" [3]="96:p2=d e")
Now you can do
for l in "${lines[@]}"; do
    do_something_with "$l"
done
Or sed, and capture the lines with bash builtin mapfile
mapfile -t lines < <(sed -E 's/ ([0-9]+:)/\n\1/g' <<< "$line")
You can't do this with regular parameters. If you want a collection of strings that can contain whitespace, use an array.
line=("95:p1=a b c" "95:p2=d e" "96:p1=a b c" "96:p2=d e")
for l in "${line[@]}"; do
    echo "$l"
done
Otherwise, you'll need some way of distinguishing between "literal" spaces and "delimiter" spaces. (Maybe the latter is followed by <num>:, but that logic is not trivial to implement using bash regular expressions. You would probably be better off using a more capable language instead of trying to do this in bash.)
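For the simple case where a "delimiter" space is always followed by a word that starts with a digit and contains a colon, the distinction can be sketched with a plain case pattern instead of a regular expression. This is an illustration under that assumption only; split_records is a hypothetical helper name, and the glob [0-9]*:* is looser than <num>: (it accepts any word that starts with a digit and contains a colon).

```bash
# split_records STRING
# Print one record per line; a new record starts at each word
# that begins with a digit and contains a colon.
split_records() {
    cur=
    for w in $1; do
        case $w in
            [0-9]*:*)
                # Start of a new record: flush the previous one, if any.
                if [ -n "$cur" ]; then
                    printf '%s\n' "$cur"
                fi
                cur=$w
                ;;
            *)
                # A "literal" space: keep the word in the current record.
                cur="${cur:+$cur }$w"
                ;;
        esac
    done
    if [ -n "$cur" ]; then
        printf '%s\n' "$cur"
    fi
}

line="95:p1=a b c 95:p2=d e 96:p1=a b c 96:p2=d e"
records=$(split_records "$line")
printf '%s\n' "$records"
```

The output can be consumed with a while IFS= read -r loop, one record per iteration.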
echo "${line}" |
mawk 'BEGIN { FS=RS="^$"(ORS="") } gsub(" [^ :]+:","\1&") + gsub("\1.","\n")^_'
95:p1=a b c
95:p2=d e
96:p1=a b c
96:p2=d e
If your grep supports -P (PCRE) option, would you please try:
grep -Po "\d+:.*?(?=(?:\s*\d+:|$))" <<< "$line"
Output:
95:p1=a b c
95:p2=d e
96:p1=a b c
96:p2=d e
Explanation of the regex \d+:.*?(?=(?:\s*\d+:|$)):
\d+: matches digits followed by a colon. It will match 95: or 96:.
.*?(?=pattern) matches the shortest sequence of characters
followed by the pattern. (?=pattern) is a lookahead assertion,
which is not included in the matched result.
The pattern above is (?:\s*\d+:|$), an alternation of
digits followed by a colon, or the end of the string. The former
matches the start of the next item. The \s* before
\d+ matches zero or more whitespace characters, which trims the
trailing whitespace from the matched result.
If you want to iterate over the divided substrings, you can say:
while IFS= read -r i; do
echo "$i" # or whatever you want to do with "$i"
done < <(grep -Po "\d+:.*?(?=(?:\s*\d+:|$))" <<< "$line")

Store multiple values in a single variable in linux

I need to store multiple command outputs that come from a for loop in a single variable.
The variable should store the outputs separated by spaces.
Output that I am expecting:
for i in a b c d e
do
xyz=$i
done
echo $xyz should return a b c d e
Concatenate strings
#!/usr/bin/env sh
# initialize xyz to empty
xyz=
for i in a b c d e
do
# concatenate xyz space and i into xyz
xyz="$xyz $i"
done
# remove the extra leading space from xyz
xyz="${xyz# }"
echo "$xyz should return a b c d e"
Or with growing the arguments array:
#!/usr/bin/env sh
# clear arguments array
set --
for i in a b c d e
do
# add i to arguments array
set -- "$@" "$i"
done
# expand arguments into xyz
xyz="$*"
echo "$xyz should return a b c d e"
I'd suggest using an array instead of a plain variable:
for i in a b c d e
do
arr+=("$i")
done
echo "${arr[@]}"
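If a single space-separated variable is still needed at the end, the array can be joined afterwards. A small sketch, assuming bash (arrays are not available in plain sh); initializing arr to an empty array is an addition so that leftovers from earlier code do not leak in:

```bash
arr=()                  # start from an empty array
for i in a b c d e
do
    arr+=("$i")         # append one element per iteration
done

# "${arr[*]}" joins the elements with the first character of IFS
# (a space by default) into one string.
xyz="${arr[*]}"
echo "$xyz"
```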
Here is one way to solve the problem:
for i in a b c d e
do
xyz="$xyz$sep$i"
sep=" "
done
echo "$xyz"
Here is the output:
a b c d e
Here is how it works. At first, both variables xyz and sep are unset, so expanding them (that is, $xyz and $sep) leads to empty strings. The sep variable represents a separator which is empty initially.
After the first iteration, the xyz variable is set to "a" and the sep variable is set to " " (a space). For the second and subsequent iterations the separator is a space, so xyz="$xyz$sep$i" appends a space and the new value of $i to the existing value of $xyz. For example, in the second iteration, xyz="$xyz$sep$i" expands to xyz="a b".
Of course, another alternative that avoids altering the separator value between iterations is as follows:
for i in a b c d e
do
xyz="$xyz $i"
done
xyz="${xyz# }"
echo "$xyz"
Once again the output is:
a b c d e
In this alternative solution, the separator is always a space. That means the first iteration of xyz="$xyz $i" adds a leading space to $xyz: after the first iteration, the value of the xyz variable is " a" (note the leading space). The final value of the variable thus becomes " a b c d e" (note the leading space again). We get rid of this leading space with the ${xyz# } syntax, which removes the smallest matching prefix pattern.
See POSIX.1-2008: Shell Command Language: 2.6.2 Parameter Expansion for more details on parameter expansion and prefix pattern removal.
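To make the prefix-removal forms concrete, here is a short sketch of the expansions discussed above. The variable names path and acc are made up for illustration; the ${acc:+...} idiom is an additional alternative that avoids the leading space in the first place.

```bash
xyz=" a b c d e"
trimmed=${xyz# }        # remove one leading space   -> "a b c d e"

path="usr/local/bin"
shortest=${path#*/}     # smallest matching prefix   -> "local/bin"
longest=${path##*/}     # largest matching prefix    -> "bin"

# ${acc:+$acc } expands to "$acc " only when acc is non-empty,
# so no separator is added in the first iteration.
acc=
for i in a b c d e
do
    acc="${acc:+$acc }$i"
done

echo "$trimmed"
echo "$shortest"
echo "$longest"
echo "$acc"
```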

Is there a way of finding index by content without loop in bash

I am wondering if there is a simple solution (one line command of sed or awk) of finding index by content in bash. For example, array=(a b c d e), given a target element "d", how can I get its corresponding array index of 3 without looping through the array and comparing each element with the target?
Try this with GNU grep:
array=(a b c d e)
declare -p array | grep -Po '\[\K[^\]](?=\]="d")'
or with sed:
array=(a b c d e)
declare -p array | sed 's/.*\[\([^\[]\)\]\+="d".*/\1/'
Output with grep and sed:
3
With a variable:
array=(a b c d e)
target="d"
index="$(declare -p array | grep -Po '\[\K[^\]](?=\]="'"$target"'")')"
echo "$index"
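For comparison, the explicit loop that the question wants to avoid is short and does not depend on the output format of declare -p. A sketch, assuming bash and a non-sparse array whose indices start at 0; index_of is a hypothetical helper name:

```bash
# index_of TARGET ELEMENT...
# Print the zero-based position of the first element equal to TARGET;
# return 1 if there is no such element.
index_of() {
    local target=$1
    shift
    local i=0 item
    for item in "$@"; do
        if [[ $item == "$target" ]]; then
            printf '%s\n' "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

array=(a b c d e)
index=$(index_of d "${array[@]}")
echo "$index"
```

The comparison is a literal string match, so targets containing regex metacharacters need no escaping.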

Splitting a string to tokens according to shell parameter rules without eval

I have a string like
$ str="abc 'e f g' hij"
and I wish to get the whole e f g part of it. In other words, I wish to tokenize the string according to shell parameter rules.
Currently, I am doing that as
$ str="abc 'e f g' hij"; (eval "set -- $str"; echo $2)
but this is totally unsafe if a single * appears outside of the single quotes.
Any better solutions?
You can use set -f to disable filename expansion altogether.
$ str="* 'e f g' hij"
$ ( set -f; eval "set -- $str"; echo $2 )
e f g
This addresses just one problem you might anticipate with eval, but there may be other options available with set you can explore.
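Another option that avoids eval entirely is to let xargs do the splitting, since it understands single quotes, double quotes and backslashes and performs no expansions of its own. This is only a sketch: xargs quoting is similar to, but not identical with, the shell's rules, and the approach assumes that no token contains a newline or starts with a dash.

```bash
str="* 'e f g' hij"

# xargs splits its input into arguments, honoring the quotes, and
# passes them to printf, which prints one token per line.
# sed then picks the second line.
second=$(printf '%s\n' "$str" | xargs printf '%s\n' | sed -n '2p')

echo "$second"
```

Since nothing is evaluated by the shell, the * stays a literal token and a $(...) inside str is not executed.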

How to line wrap output in bash?

I have a command which outputs in this format:
A
B
C
D
E
F
G
I
J
etc
I want the output to be in this format
A B C D E F G I J
I tried using ./script | tr "\n" " " but all it does is remove n from the output
How do I get all the output in one line (line wrapped)?
Edit: I accidentally put in grep while asking the question. I removed it. My original question still stands.
The grep is superfluous.
This should work:
./script | tr '\n' ' '
It did for me with a command al that lists its arguments one per line:
$ al A B C D E F G H I J
A
B
C
D
E
F
G
H
I
J
$ al A B C D E F G H I J | tr '\n' ' '
A B C D E F G H I J $
As Jonathan Leffler points out, you don't want the grep. The command you're using:
./script | grep tr "\n" " "
doesn't even invoke the tr command; it would search for the pattern "tr" in files named "\n" and " ". Since that's not the output you reported, I suspect you've mistyped the command you're using.
You can do this:
./script | tr '\n' ' '
but (a) it joins all its input into a single line, and (b) it doesn't append a newline to the end of the line. Typically that means your shell prompt will be printed at the end of the line of output.
If you want everything on one line, you can do this:
./script | tr '\n' ' ' ; echo ''
Or, if you want the output wrapped to a reasonable width:
./script | fmt
The fmt command has a number of options to control things like the maximum line length; read its documentation (man fmt or info fmt) for details.
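A further alternative is paste, which joins lines with a chosen delimiter and still ends the result with a single newline. A small sketch, using printf as a stand-in for ./script:

```bash
# printf stands in for ./script here and emits one letter per line.
# paste -s joins all input lines; -d ' ' uses a space as the delimiter.
joined=$(printf '%s\n' A B C D E | paste -sd ' ' -)

echo "$joined"
```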
No need to use other programs, why not use Bash to do the job? (-- added in edit)
line=$(./script.sh)
set -- $line
echo "$*"
set -- $line assigns the words of $line to the positional parameters. Because $line is unquoted, the shell splits it on the characters in IFS, and a newline is one of the default separators. EDIT: This will overwrite any existing command-line arguments, but good coding practice suggests reassigning those to named variables early in the script.
When we use "$*" (note the quotes), it joins them all together again using the first character of IFS as the glue. By default that is a space.
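The role of IFS as the glue can be seen by changing it temporarily. A minimal sketch; join_with is a hypothetical helper name, and wrapping the logic in a function keeps the script's own positional parameters untouched:

```bash
# join_with SEP WORD...
# Join the remaining arguments using SEP (a single character).
join_with() {
    sep=$1
    shift
    old_ifs=$IFS
    IFS=$sep
    joined="$*"         # "$*" uses the first character of IFS as glue
    IFS=$old_ifs        # restore the previous IFS
    printf '%s\n' "$joined"
}

join_with ' ' A B C
join_with ',' A B C
```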
tr is an unnecessary child process.
By the way, there is a command called script, so be careful of using that name.
If I'm not mistaken, the newlines disappear when the variable is expanded unquoted: the shell splits the value into words, and echo prints those words separated by single spaces:
tmp=$(./script.sh)
echo $tmp
results in
A B C D E F G H I J
whereas
tmp=$(./script.sh)
echo "$tmp"
results in
A
B
C
D
E
F
G
H
I
J
If needed, you can re-assign the output of the echo command to another variable:
tmp=$(./script.sh)
tmp2=$(echo $tmp)
The $tmp2 variable will then contain no newlines.
