Is there a way to find an array index by content without a loop in bash?

I am wondering if there is a simple solution (a one-line sed or awk command) for finding an index by content in bash. For example, given array=(a b c d e) and a target element "d", how can I get its corresponding index, 3, without looping through the array and comparing each element with the target?

Try this with GNU grep:
array=(a b c d e)
declare -p array | grep -Po '\[\K[0-9]+(?=\]="d")'
or with sed:
array=(a b c d e)
declare -p array | sed -n 's/.*\[\([0-9][0-9]*\)\]="d".*/\1/p'
Output with grep and sed:
3
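With a variable (note that $target is spliced into the regex, so characters such as . or * in it are treated as regex syntax):
array=(a b c d e)
target="d"
index="$(declare -p array | grep -Po '\[\K[0-9]+(?=\]="'"$target"'")')"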
echo "$index"
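Another way to avoid an explicit loop, sketched here under the assumption of a non-sparse indexed array whose elements contain no newlines: print one element per line and let grep report the matching line number. Using -F means the target is matched literally rather than as a regex; -m is supported by GNU and BSD grep but is not POSIX.

```shell
#!/usr/bin/env bash
# Sketch: print one element per line and let grep report the matching line number.
# Assumes a non-sparse indexed array whose elements contain no newlines.
array=(a b c d e f g h i j k l)
target="l"

# -x: whole-line match, -F: target taken literally, -m1: stop at the first hit,
# -n: prefix the match with its 1-based line number (here "12:l").
line=$(printf '%s\n' "${array[@]}" | grep -nxF -m1 -- "$target" | cut -d: -f1)
if [[ -n $line ]]; then
  index=$((line - 1))   # bash indices start at 0, line numbers at 1
  echo "$index"
fi
```

The array deliberately has more than ten elements, so the result (11) has two digits.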

Related

Shell separate line into multiple lines after every number

So I have a selection of text files, each of which is all on one line.
I need a way to separate the line into multiple lines after every number.
At the minute I have something like this
a 111111b 222c 3d 444444
and I need a way to get it to this
a 111111
b 222
c 3
d 444444
I have been trying to create a gawk with regex but I'm not aware of a way to get this to work. (I am fairly new to shell)
Easy with sed.
$: cat file
a 51661b 99595c 65652d 51515
$: sed -E 's/([a-z] [0-9]+)\n*/\1\n/g' file
a 51661
b 99595
c 65652
d 51515
Pretty easy with GNU awk (gensub is a gawk extension).
$: awk '{ print gensub("([a-z] [0-9]+)\n*", "\\1\n", "g") }' file
a 51661
b 99595
c 65652
d 51515
Could even do it with bash built-ins only... but don't.
while IFS= read -r line
do  while [[ "$line" =~ [a-z]\ [0-9]+ ]]
    do  printf "%s\n" "$BASH_REMATCH"
        line=${line#*"$BASH_REMATCH"}   # drop everything up to and including the match
    done
done < file
a 51661
b 99595
c 65652
d 51515
You already have a good answer from Paul, but for sed an arguably more direct expression simply using the first two numbered backreferences separated by a newline would be:
sed -E 's/([0-9])([^0-9])/\1\n\2/g' file
Example Use/Output
In your case that would be:
$ echo "a 111111b 222c 3d 444444" | sed -E 's/([0-9])([^0-9])/\1\n\2/g'
a 111111
b 222
c 3
d 444444
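If every record really is one lowercase letter, a space, then digits (an assumption based on the sample), grep can simply print each match on its own line without any substitution. The -o option is available in GNU and BSD grep, though it is not POSIX.

```shell
#!/usr/bin/env bash
# Sketch: print each "letter space digits" match on its own line with grep -o.
# Assumes every record is exactly one lowercase letter, a space, then digits.
input="a 111111b 222c 3d 444444"
result=$(grep -oE '[a-z] [0-9]+' <<< "$input")
echo "$result"
```

Because [0-9]+ is greedy, each match stops right before the next letter, which is exactly where the line break is wanted.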

How to iterate string with blank?

I have the following code:
line="95:p1=a b c 95:p2=d e 96:p1=a b c 96:p2=d e"
for l in $line; do
echo $l
done
I get this output:
95:p1=a
b
c
95:p2=d
e
96:p1=a
b
c
96:p2=d
e
But a b c is really a single value in my case. Is there a way to get the following instead?
95:p1=a b c
95:p2=d e
96:p1=a b c
96:p2=d e
1st solution: With your shown samples and attempts, please try the following awk code. Written and tested with GNU awk (RT is a gawk extension).
echo "$line"
95:p1=a b c 95:p2=d e 96:p1=a b c 96:p2=d e
awk -v RS='[0-9]{2}:p[0-9]=[a-zA-Z] [a-zA-Z]( [a-zA-Z]|$)*' 'RT{print RT}' <<<"$line"
Output with the shown samples will be as follows:
95:p1=a b c
95:p2=d e
96:p1=a b c
96:p2=d e
2nd solution: With any POSIX awk, please try the following awk code:
awk '
{
  while (match($0, /[0-9]{2}:p[0-9]=[a-zA-Z] [a-zA-Z]( [a-zA-Z]|$)*/)) {
    print substr($0, RSTART, RLENGTH)
    $0 = substr($0, RSTART + RLENGTH)
  }
}
' <<<"$line"
With bash
read -ra f <<<"$line"   # split the string into words
n=${#f[@]}
i=0
lines=()
while ((i < n)); do
  l=${f[i++]}
  # append following words until the next one starts with "<digits>:"
  until ((i == n)) || [[ ${f[i]} =~ ^[0-9]+: ]]; do
    l+=" ${f[i++]}"
  done
  lines+=( "$l" )
done
declare -p lines
outputs
declare -a lines=([0]="95:p1=a b c" [1]="95:p2=d e" [2]="96:p1=a b c" [3]="96:p2=d e")
Now you can do
for l in "${lines[@]}"; do
  do_something_with "$l"
done
Or use sed, and capture the lines with the bash builtin mapfile:
mapfile -t lines < <(sed -E 's/ ([0-9]+:)/\n\1/g' <<< "$line")
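Put together with the loop, the sed + mapfile approach might look like the sketch below. It assumes GNU sed (for \n in the replacement) and bash 4 or later (for mapfile); the brackets in the printf format are only there to make the preserved inner spaces visible.

```shell
#!/usr/bin/env bash
# Sketch: split before each " <digits>:" with GNU sed, then load the pieces with mapfile.
# Assumes GNU sed (\n in the replacement) and bash 4+ (mapfile).
line="95:p1=a b c 95:p2=d e 96:p1=a b c 96:p2=d e"
mapfile -t lines < <(sed -E 's/ ([0-9]+:)/\n\1/g' <<< "$line")

for l in "${lines[@]}"; do
  printf '[%s]\n' "$l"   # brackets show that the inner spaces survived
done
```

Only a space followed by digits and a colon is treated as a delimiter, so the spaces inside a b c are left alone.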
You can't do this with regular parameters. If you want a collection of strings that can contain whitespace, use an array.
line=("95:p1=a b c" "95:p2=d e" "96:p1=a b c" "96:p2=d e")
for l in "${line[@]}"; do
  echo "$l"
done
Otherwise, you'll need some way of distinguishing between "literal" spaces and "delimiter" spaces. (Maybe the latter is followed by <num>:, but that logic is not trivial to implement using bash regular expressions. You would probably be better off using a more capable language instead of trying to do this in bash.)
echo "${line}" |
mawk 'BEGIN { FS=RS="^$"(ORS="") } gsub(" [^ :]+:","\1&") + gsub("\1.","\n")^_'
95:p1=a b c
95:p2=d e
96:p1=a b c
96:p2=d e
If your grep supports the -P (PCRE) option, try:
grep -Po "\d+:.*?(?=(?:\s*\d+:|$))" <<< "$line"
Output:
95:p1=a b c
95:p2=d e
96:p1=a b c
96:p2=d e
Explanation of the regex \d+:.*?(?=(?:\s*\d+:|$)):
\d+: matches digits followed by a colon, such as 95: or 96:.
.*?(?=pattern) matches the shortest sequence of characters followed by the pattern. (?=pattern) is a lookahead assertion, so the pattern itself is not included in the matched result.
Here the pattern is (?:\s*\d+:|$), an alternation of "digits followed by a colon" or "end of the string". The former matches the start of the next item. The \s* before \d+ matches zero or more whitespace characters, which keeps the separating whitespace out of the matched result.
If you want to iterate over the divided substrings, you can say:
while IFS= read -r i; do
echo "$i" # or whatever you want to do with "$i"
done < <(grep -Po "\d+:.*?(?=(?:\s*\d+:|$))" <<< "$line")

How to line wrap output in bash?

I have a command which outputs in this format:
A
B
C
D
E
F
G
I
J
etc
I want the output to be in this format
A B C D E F G I J
I tried using ./script | tr "\n" " " but all it does is remove n from the output
How do I get all the output on one line? (Line wrapped)
Edit: I accidentally put in grep while asking the question. I removed it. My original question still stands.
The grep is superfluous.
This should work:
./script | tr '\n' ' '
It did for me with a command al that lists its arguments one per line:
$ al A B C D E F G H I J
A
B
C
D
E
F
G
H
I
J
$ al A B C D E F G H I J | tr '\n' ' '
A B C D E F G H I J $
As Jonathan Leffler points out, you don't want the grep. The command you're using:
./script | grep tr "\n" " "
doesn't even invoke the tr command; it would search for the pattern "tr" in files named "\n" and " ". Since that's not the output you reported, I suspect you've mistyped the command you're using.
You can do this:
./script | tr '\n' ' '
but (a) it joins all its input into a single line, and (b) it doesn't append a newline to the end of the line. Typically that means your shell prompt will be printed at the end of the line of output.
If you want everything on one line, you can do this:
./script | tr '\n' ' ' ; echo ''
Or, if you want the output wrapped to a reasonable width:
./script | fmt
The fmt command has a number of options to control things like the maximum line length; read its documentation (man fmt or info fmt) for details.
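Another option, sketched below, is paste in serial mode (-s), which is specified by POSIX: it joins all input lines with the given delimiter and, unlike tr '\n' ' ', ends the joined line with a newline and leaves no trailing space. The printf only stands in for ./script so the example is self-contained.

```shell
#!/usr/bin/env bash
# Sketch: join lines with single spaces using paste -s (serial mode).
# printf stands in for ./script so the example is self-contained.
joined=$(printf '%s\n' A B C D E F G H I J | paste -s -d ' ' -)
echo "$joined"
```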
No need to use other programs; why not let Bash do the job? (-- added in edit)
line=$(./script.sh)
set -- $line
echo "$*"
set -- replaces the positional parameters with the words of $line. Because $line is unquoted, it is split on the characters in IFS, which by default include "\n" (it is also subject to filename expansion, so a * in the output would be expanded). EDIT: This overwrites any existing command-line arguments, but good coding practice suggests assigning those to named variables early in the script anyway.
When we use "$*" (note the quotes), it joins them all together again using the first character of IFS as the glue. By default that is a space.
tr is an unnecessary child process.
By the way, there is a command called script, so be careful of using that name.
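A variant of the same built-ins-only idea that leaves $1, $2, ... untouched and never performs filename expansion is sketched below (bash 4+ assumed for mapfile). Each output line becomes one array element, and "${words[*]}" joins them with the first character of IFS. The printf stands in for ./script.sh; its '*' line shows that nothing gets globbed.

```shell
#!/usr/bin/env bash
# Sketch: one array element per output line, then join with "${arr[*]}".
# printf stands in for ./script.sh; the '*' line is kept literally, not globbed.
mapfile -t words < <(printf '%s\n' A B '*' D)
joined="${words[*]}"   # elements joined with the first IFS character (space by default)
echo "$joined"
```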
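The newlines also disappear when the variable is expanded unquoted as an argument to echo: the shell splits the value into words on IFS characters (newline included), and echo prints those words separated by single spaces: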
tmp=$(./script.sh)
echo $tmp
results in
A B C D E F G H I J
whereas
tmp=$(./script.sh)
echo "$tmp"
results in
A
B
C
D
E
F
G
H
I
J
If needed, you can re-assign the output of the echo command to another variable:
tmp=$(./script.sh)
tmp2=$(echo $tmp)
The $tmp2 variable will then contain no newlines.

Filter input to remove certain characters/strings

I have a quick question about text parsing, for example:
INPUT="a b c d e f g"
PATTERN="a e g"
The INPUT variable should be modified so that the PATTERN characters are removed; in this example:
OUTPUT="b c d f"
I've tried to use tr -d $x in a for loop over 'PATTERN', but I don't know how to pass the output on to the next loop iteration.
edit:
What if the INPUT and PATTERN variables contain strings instead of single characters?
Where does $x come from? Anyway, you were close:
tr -d "$PATTERN" <<< "$INPUT"
To assign the result to a variable, just use
OUTPUT=$(tr -d "$PATTERN" <<< "$INPUT")
Just note that spaces will be removed, too, because they are part of the $PATTERN.
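If the goal is the "b c d f" shown in the question rather than "bcdf", one option (a sketch that only handles single-character PATTERN entries) is to remove the spaces from the set passed to tr, then let read -a re-split the result so the remaining letters end up single-spaced.

```shell
#!/usr/bin/env bash
# Sketch: delete only the non-space PATTERN characters, then normalize the spacing.
# Works for single-character PATTERN entries only.
INPUT="a b c d e f g"
PATTERN="a e g"

stripped=$(tr -d "${PATTERN// /}" <<< "$INPUT")   # " b c d  f "
read -ra words <<< "$stripped"                     # (b c d f)
OUTPUT="${words[*]}"
echo "$OUTPUT"
```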
Pure Bash using parameter substitution:
INPUT="a b c d e f g"
PATTERN="a e g"
for p in $PATTERN; do
INPUT=${INPUT/ $p/}
INPUT=${INPUT/$p /}
done
echo "'$INPUT'"
Result:
'b c d f'
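For the edit (strings instead of single characters), the substitution above matches substrings, so a pattern word like be would also cut into beta (${INPUT/ be/} turns "alpha beta gamma" into "alphata gamma"). A word-level sketch, assuming both variables are whitespace-separated word lists and bash 4+ for declare -A:

```shell
#!/usr/bin/env bash
# Sketch: drop whole words listed in PATTERN from INPUT (bash 4+ for declare -A).
INPUT="alpha beta gamma delta epsilon"
PATTERN="alpha delta"

declare -A drop=()
read -ra pattern_words <<< "$PATTERN"
for w in "${pattern_words[@]}"; do
  drop[$w]=1   # mark each PATTERN word for removal
done

read -ra input_words <<< "$INPUT"
kept=()
for w in "${input_words[@]}"; do
  [[ -n ${drop[$w]} ]] || kept+=("$w")   # keep words that are not in PATTERN
done

OUTPUT="${kept[*]}"   # join with the first IFS character (a space by default)
echo "$OUTPUT"
```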

How to delete all lines containing certain words, except those containing certain words?

I have a file called file1.txt. I'd like to delete every line containing the words, "center of", "farm", or "middle of", etc. except lines which contain "①" or "city".
The list of deletions and exceptions is quite long.
The files are in UTF-8.
How can I delete every line containing at least one of these words, but not those lines which have some of the exceptions?
This might work for you:
sed -i '/center of\|farm\|middle of/{/①\|city/!d}' file1.txt
or
sed -i '/center of/ba
/farm/ba
/middle of/ba
b
:a
/①/b
/city/b
d' file1.txt
and if you have words.txt and exceptions.txt files, use this:
sed '/\*exceptions\*/{h;s/.*/b\n:a/p;d}
x
/./{x;s|.*|/&/b|p;$!d;s/.*/d/;q}
x
s|.*|/&/ba|' words.txt - <<<"*exceptions*" exceptions.txt > file.sed
sed -i -f file.sed file1.txt
sed -r '/①|city/{p;d};/center of|farm|middle of/d' file1.txt
sed '/blacklist/{/whitelist/p;d}' file
Delete lines matching the blacklist, except those that also match the whitelist:
echo -e "a b\nb c\nc d\nd e\ne f" | sed '/c\|d/{/a\|b/p;d}'
prints
a b
b c
e f
that is, every line that does not contain c or d, plus the lines containing c or d only if they also contain a or b.
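When the deletion and exception lists are long and kept in files, a single awk pass can read both lists and then filter. The sketch below matches entries as fixed substrings (not regexes, unlike the sed versions above); the file names and sample lines are hypothetical, created in a temporary directory so the example is self-contained.

```shell
#!/usr/bin/env bash
# Sketch: keep a line if it contains any exception, otherwise drop it if it contains
# any deletion word. Entries are fixed substrings; file names are hypothetical.
tmp=$(mktemp -d)
printf '%s\n' 'center of' 'farm' 'middle of' > "$tmp/words.txt"
printf '%s\n' '①' 'city' > "$tmp/exceptions.txt"
printf '%s\n' 'the center of town' 'city farm' 'a quiet street' 'farm ① road' 'old farm' \
  > "$tmp/file1.txt"

result=$(awk '
  FILENAME == ARGV[1] { del[$0];  next }   # first file: words to delete
  FILENAME == ARGV[2] { keep[$0]; next }   # second file: exceptions
  {
    for (k in keep) if (index($0, k)) { print; next }   # exception wins
    for (d in del)  if (index($0, d)) next              # otherwise delete on any hit
    print
  }' "$tmp/words.txt" "$tmp/exceptions.txt" "$tmp/file1.txt")

echo "$result"
rm -r "$tmp"
```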
