How does negative matching work in extglob in parameter expansion - bash

Problem
The behaviour of
!(pattern-list)
does not work the way I would expect when used in parameter expansion, specifically
${parameter/pattern/string}
Input
a="1 2 3 4 5 6 7 8 9 10"
Test cases
$ printf "%s\n" "${a/!([0-9])/}"
[blank]
#expected 12 3 4 5 6 7 8 9 10
$ printf "%s\n" "${a/!(2)/}"
[blank]
#expected 2 3 4 5 6 7 8 9 10
$ printf "%s\n" "${a/!(*2*)/}"
2 3 4 5 6 7 8 9 10
#Produces the behaviour expected in previous one, not sure why though
$ printf "%s\n" "${a/!(*2*)/,}"
,2 3 4 5 6 7 8 9 10
#Expected after previous worked
$ printf "%s\n" "${a//!(*2*)/}"
2
#Expected again previous worked
$ printf "%s\n" "${a//!(*2*)/,}"
,,2,
#Why are there 3 commas???
Specs
GNU bash, version 4.2.46(1)-release (x86_64-redhat-linux-gnu)
Notes
These are very basic examples, so if it is possible to include more complex examples with explanations in the answer then please do.
Any more info or examples needed let me know in the comments.
Have already looked at How does extglob work with shell parameter expansion?, and have even commented on what the problem is with that particular problem, so please don't mark as a dupe.

Parameter expansion of the form ${parameter/pattern/string} (where pattern doesn't start with a /) works by finding the leftmost longest substring in the value of the variable parameter that matches pattern and replacing it with string. In other words, $parameter is decomposed into three parts, prefix, match, and suffix, such that
$parameter == "${prefix}${match}${suffix}"
$prefix is the shortest possible string enabling the other requirements to be fulfilled (i.e. the match, if at all possible, occurs in the leftmost position)
$match matches pattern and is as long as possible
any of $prefix, $match and/or $suffix can be empty
and the result of ${parameter/pattern/string} is "${prefix}string${suffix}".
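The decomposition rule above can be modelled with a brute-force search. This is only a sketch of the rule as described, not bash's actual implementation: decompose is a hypothetical helper name, and it relies on [[ ... == pattern ]] accepting extended glob patterns.

```bash
#!/bin/bash
shopt -s extglob   # enable !(...) and friends

# Hypothetical helper: find the leftmost, longest substring of $1 that
# matches the glob pattern $2. Sets the globals prefix, match and suffix;
# returns 1 if no substring (not even the empty one) matches.
decompose() {
    local s=$1 pat=$2 i len
    for ((i = 0; i <= ${#s}; i++)); do                 # leftmost start first
        for ((len = ${#s} - i; len >= 0; len--)); do   # longest length first
            if [[ ${s:i:len} == $pat ]]; then
                prefix=${s:0:i} match=${s:i:len} suffix=${s:i+len}
                return 0
            fi
        done
    done
    return 1
}

a="1 2 3 4 5 6 7 8 9 10"
decompose "$a" '!(*2*)'
printf 'prefix=[%s] match=[%s] suffix=[%s]\n' "$prefix" "$match" "$suffix"
```

For this input it prints prefix=[] match=[1 ] suffix=[2 3 4 5 6 7 8 9 10]: every longer substring starting at position 0 contains a 2 and therefore fails !(*2*).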
For the global replacement form (${parameter//pattern/string}) of this type of parameter expansion, the same process is recursively performed for the suffix part, however a zero-length match is handled as a special case (in order to prevent infinite recursion):
if "${prefix}${match}" != ""
"${parameter//pattern/string}" = "${prefix}string${suffix//pattern/string}"
else suffix=${parameter:1} and
"${parameter//pattern/string}" = "string${parameter:0:1}${suffix//pattern/string}"
Now let's analyze the cases individually:
"${a/!([0-9])/}" --> prefix='' match='1 2 3 4 5 6 7 8 9 10' suffix=''. Indeed, '1 2 3 4 5 6 7 8 9 10' is not a string consisting of a single digit, and therefore it matches the pattern !([0-9]). Hence the empty result of expansion.
"${a/!(2)/}" --> prefix='' match='1 2 3 4 5 6 7 8 9 10' suffix=''. Similar to the above, '1 2 3 4 5 6 7 8 9 10' is not a string consisting of the single character '2', and therefore it matches the pattern !(2). Hence the empty result of expansion.
"${a/!(*2*)/}" --> prefix='' match='1 ' suffix='2 3 4 5 6 7 8 9 10'. The substring '1 ' doesn't match the pattern *2*, and therefore it matches the pattern !(*2*).
"${a/!(*2*)/,}". There were no surprises here, so no need to elaborate.
"${a//!(*2*)/}". There were no surprises here, so no need to elaborate.
"${a//!(*2*)/,}" --> prefix='' match='1 ' suffix='2 3 4 5 6 7 8 9 10'. Then ${suffix//!(*2*)/,} expands to ",2," as follows. The empty string in the beginning of suffix matches the pattern !(*2*), producing an extra comma in the result. Since the zero-length match special case (described above) was triggered, the first character of suffix is forcibly consumed, leaving us with ' 3 4 5 6 7 8 9 10', which matches the !(*2*) pattern in its entirety and is replaced with the last comma that we see in the final result of the expansion.

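The three commas in the last case can be reproduced with a small model of the global form, including the zero-length special case. This is a sketch of the description above rather than bash's real code; replace_all is a hypothetical name, and other bash versions may handle empty matches differently from the 4.2 release in the question.

```bash
#!/bin/bash
shopt -s extglob

# Hypothetical model of ${parameter//pattern/string} as described above.
# Usage: replace_all STRING PATTERN REPLACEMENT   (prints the result)
replace_all() {
    local s=$1 pat=$2 rep=$3 out='' i len found
    while [[ -n $s ]]; do
        found=0
        for ((i = 0; i <= ${#s}; i++)); do                 # leftmost first
            for ((len = ${#s} - i; len >= 0; len--)); do   # longest first
                if [[ ${s:i:len} == $pat ]]; then
                    found=1
                    break 2
                fi
            done
        done
        if (( found == 0 )); then
            out+=$s                 # nothing left to replace
            break
        fi
        if (( i + len > 0 )); then
            out+=${s:0:i}$rep       # normal case: prefix + replacement
            s=${s:i+len}            # continue with the suffix
        else
            out+=$rep${s:0:1}       # zero-length match: emit replacement,
            s=${s:1}                # then force one character forward
        fi
    done
    printf '%s\n' "$out"
}

replace_all "1 2 3 4 5 6 7 8 9 10" '!(*2*)' ','
```

The call prints ,,2, which agrees with the output shown in the question: one comma for '1 ', one for the empty match in front of the 2, and one for ' 3 4 5 6 7 8 9 10'.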
Related

grep: remove lines with the same number twice

I have a .txt file with some numbers on each line. I need to filter out the lines that contain the same number more than once, so the output should be only the lines in which all the numbers are different. I have to use the grep command!
Example:
File_input:
1 1 2 3 4 5
1 2 3 4 5 6
6 6 6 6 6 6
What I want
File_output:
1 2 3 4 5 6
The first and third lines contain repeated numbers, so they have to be filtered out.
This should work for your example:
grep -v "\([0-9]\).*\1" myfile
The idea is to match any single digit with [0-9], capture it with \(\), and search for the same digit again with \1 on the same line. You can easily extend this to any word made of digits.
With the given input you can use
sed -r '/([0-9]+).+\1/d' File_input
You will have problems with substrings: 1 matches 12 and 12 matches 1.
You can add word boundaries \b with
sed -r '/\b([0-9]+)\b.*\b\1\b/d' File_input
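Since the question insists on grep, the same word-boundary idea can be written as a grep filter. A minimal sketch, assuming GNU grep (both \b and back-references inside -E patterns are extensions); unique_number_lines is a hypothetical name, and the fourth input line is added here only to exercise the substring case.

```bash
#!/bin/bash
# Keep only the lines in which no whole number occurs twice.
# Assumes GNU grep: \b and \1 in an -E pattern are not POSIX.
unique_number_lines() {
    grep -Ev '\b([0-9]+)\b.*\b\1\b'
}

printf '%s\n' '1 1 2 3 4 5' '1 2 3 4 5 6' '6 6 6 6 6 6' '1 12 13' |
    unique_number_lines
```

This prints 1 2 3 4 5 6 and 1 12 13; the last line survives because 1 is not a whole word inside 12 or 13.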

Replace the nth field of every mth line using awk or bash

For a file that contains entries similar to as follows:
foo 1 6 0
fam 5 11 3
wam 7 23 8
woo 2 8 4
kaz 6 4 9
faz 5 8 8
How would you replace the nth field of every mth line with the same element using bash or awk?
For example, if n = 1 and m = 3 and the element = wot, the output would be:
foo 1 6 0
fam 5 11 3
wot 7 23 8
woo 2 8 4
kaz 6 4 9
wot 5 8 8
I understand you can call / print every mth line using e.g.
awk 'NR%7==0' file
So far I have tried to keep this in memory but to no avail... I need to keep the rest of the file as well.
I would prefer answers using bash or awk, but sed solutions would also be helpful. I'm a beginner in all three. Please explain your solution.
awk -v m=3 -v n=1 -v el='wot' 'NR % m == 0 { $n = el } 1' file
Note, however, that the inter-field whitespace is not guaranteed to be preserved as-is, because awk splits a line into fields by any run of whitespace; as written, the output fields of modified lines will be separated by a single space.
If your input fields are consistently separated by 2 spaces, however, you can effectively preserve the input whitespace by adding -F'  ' -v OFS='  ' (two spaces inside each pair of quotes) to the awk invocation.
-v m=3 -v n=1 -v el='wot' defines Awk variables m, n, and el
NR % m == 0 is a pattern (condition) that evaluates to true for every m-th line.
{ $n = el } is the associated action that replaces the nth field of the input line with variable el, causing the line to be rebuilt, implicitly using OFS, the output-field separator, which defaults to a space.
1 is a common Awk shorthand for printing the (possibly modified) input line at hand.
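To see the whitespace caveat in action, here is a small sketch that runs the same assignment with and without two-space separators. The sample line with two spaces between fields is an assumption made for the demo.

```bash
#!/bin/bash
# Sample line whose fields are separated by exactly two spaces
# (an assumption for this demo).
line='wam  7  23  8'

# Default separators: assigning to $1 rebuilds the line with single spaces.
collapsed=$(printf '%s\n' "$line" | awk -v n=1 -v el='wot' '{ $n = el } 1')

# Two-space FS and OFS: the rebuilt line keeps the original spacing.
preserved=$(printf '%s\n' "$line" |
    awk -F'  ' -v OFS='  ' -v n=1 -v el='wot' '{ $n = el } 1')

printf '[%s]\n[%s]\n' "$collapsed" "$preserved"
```

The first result is [wot 7 23 8] and the second is [wot  7  23  8].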
Great little exercise. While I would probably lean toward an awk solution, in bash you can also rely on parameter expansion with substring replacement to replace the nth field of every mth line. Essentially, you can read every line, preserving whitespace, then check your line count, e.g. if c is your line counter and m your variable for mth line, you could use:
if (( $((c % m )) == 0)) ## test for mth line
If the line is a replacement line, you can read each word into an array after restoring default word-splitting and then use your array element index n-1 to provide the replacement (e.g. ${line/find/replace} with ${line/"${array[$((n-1))]}"/replace}).
If it isn't a replacement line, simply output the line unchanged. A short example could be similar to the following (to which you can add additional validations as required)
#!/bin/bash
[ -n "$1" -a -r "$1" ] || { ## filename given and readable
printf "error: insufficient or unreadable input.\n"
exit 1
}
n=${2:-1} ## variables with default n=1, m=3, e=wot
m=${3:-3}
e=${4:-wot}
c=1 ## line count
while IFS= read -r line; do
if (( $((c % m )) == 0)) ## test for mth line
then
IFS=$' \t\n'
a=( $line ) ## split into array
IFS=
echo "${line/"${a[$((n-1))]}"/$e}" ## nth replaced with e
else
echo "$line" ## otherwise just output line
fi
((c++)) ## advance counter
done <"$1"
Example Use/Output
n=1, m=3, e=wot
$ bash replmn.sh dat/repl.txt
foo 1 6 0
fam 5 11 3
wot 7 23 8
woo 2 8 4
kaz 6 4 9
wot 5 8 8
n=1, m=2, e=baz
$ bash replmn.sh dat/repl.txt 1 2 baz
foo 1 6 0
baz 5 11 3
wam 7 23 8
baz 2 8 4
kaz 6 4 9
baz 5 8 8
n=3, m=2, e=99
$ bash replmn.sh dat/repl.txt 3 2 99
foo 1 6 0
fam 5 99 3
wam 7 23 8
woo 2 99 4
kaz 6 4 9
faz 5 99 8
An awk solution is shorter (and avoids problems when the text of the nth field also occurs earlier in $line, in which case the substring replacement changes the wrong occurrence), but both would need similar validation of field existence, etc. Learn from both and let me know if you have any questions.
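If the duplicate-text problem matters, one possible bash variant rebuilds the line from an array so that only the nth field is touched. This is a sketch, not a drop-in replacement for the script above: replace_nth is a hypothetical name, and like awk it rejoins the fields with single spaces, so the original spacing is lost on modified lines.

```bash
#!/bin/bash
# Hypothetical variant: replace field n of every m-th line of stdin with e.
# Usage: replace_nth N M ELEMENT < file
replace_nth() {
    local n=$1 m=$2 e=$3 c=0 line
    local -a f
    while IFS= read -r line || [[ -n $line ]]; do
        (( ++c ))
        if (( c % m == 0 )); then
            read -r -a f <<<"$line"        # split on whitespace, no globbing
            if (( n >= 1 && n <= ${#f[@]} )); then   # field must exist
                f[n-1]=$e
                line=${f[*]}               # rejoin with single spaces
            fi
        fi
        printf '%s\n' "$line"
    done
}

# The third field is replaced even though "5" also occurs earlier in the line.
printf '%s\n' 'kaz 6 4 9' '5 5 5' | replace_nth 3 2 x
```

The example prints kaz 6 4 9 followed by 5 5 x, whereas a plain ${line/5/x} would have changed the first 5.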

Split number string arbitrarily using bash into fixed number of variables

I have a string with 3000 elements (NOT in series) in bash,
sections='1 2 4 ... 3000'
I am trying to split this string into x chunks of length n.
I want x to be typically between 3-10. Each chunk may not be of
the same length.
Each chunk is the input to a job.
Looking at https://unix.stackexchange.com/questions/122499/bash-split-a-list-of-files
and using bash arrays, my first attempt looks like this:
#! /bin/bash
nArgs=10
nChunkSize=10
z="0 1 2 .. 1--"
zs=(${z// / })
echo ${zs[@]}
for i in $nArgs; do
echo "Creating argument: "$i
startItem=$i*$nChunkSize
zArg[$i] = ${zs[@]:($startItem:$chunkSize}
done
echo "Resulting args"
for i in $nArgs; do
echo "Argument"${zArgs[$1]}
done
The above is far from working I'm afraid. Any pointers on the ${zs[@]:($startItem:$chunkSize} syntax?
For an input of 13 elements:
z='0 1 2 3 4 5 6 7 8 10 11 12 15'
nChunks=3
and nArgs=4
I would like to obtain an array with 3 elements, zs with content
zs[0] = '0 1 2 3'
zs[1] = '4 5 6 7'
zs[2] = '8 10 11 12 15'
Each zs will be used as arguments to subsequent jobs.
First note: This is a bad idea. It won't work reliably with arbitrary (non-numeric) contents, as bash doesn't have support for nested arrays.
output=( )
sections_str='1 2 4 5 6 7 8 9 10 11 12 13 14 15 16 3000'
batch_size=4
read -r -a sections <<<"$sections_str"
for ((i=0; i<${#sections[@]}; i+=batch_size)); do
current_pieces=( "${sections[@]:i:batch_size}" )
output+=( "${current_pieces[*]}" )
done
declare -p output # to view your output
Notes:
zs=( $z ) is buggy. For example, any * inside your list will be replaced with a list of filenames in the current directory. Use read -a to read into an array in a reliable way that doesn't depend on shell configuration other than IFS (which can be scoped to just that one command with IFS=' ' read -r -a).
${array[@]:start:count} expands to up to count items from your array, starting at position start.
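The loop above produces batches of a fixed size. The question instead asks for a fixed number of chunks with the last one absorbing the remainder, which could be sketched as follows. The variable names n_chunks, items and chunks are illustrative, and the sketch assumes there are at least n_chunks elements.

```bash
#!/bin/bash
z='0 1 2 3 4 5 6 7 8 10 11 12 15'
n_chunks=3

read -r -a items <<<"$z"                 # split safely, no globbing
size=$(( ${#items[@]} / n_chunks ))      # base chunk size (integer division)

chunks=()
for ((k = 0; k < n_chunks; k++)); do
    start=$(( k * size ))
    if (( k == n_chunks - 1 )); then
        piece=( "${items[@]:start}" )        # last chunk takes the remainder
    else
        piece=( "${items[@]:start:size}" )
    fi
    chunks+=( "${piece[*]}" )                # store each chunk as one string
done

declare -p chunks
```

With the 13 elements from the question this yields '0 1 2 3', '4 5 6 7' and '8 10 11 12 15'.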

How to sequence lines in files if some lines are strings

I encountered a problem with bash, which I started using recently.
I realize that a lot of magic stuff can be done with just one line, as my previous question was solved that way.
This time question is simple:
I have a file which has this format
2 2 10
custom
8 10
3 5 18
custom
1 5
Some of the lines are equal to the string custom (it can be any line!) and the other lines contain 2 or 3 numbers.
I want a file which will sequence the line with numbers but keep the lines with custom (order also must be the same), so desired output is
2 4 6 8 10
custom
8 9 10
3 8 13 18
custom
1 2 3 4 5
I also wish to overwrite input file with this one.
I know that I can do the sequencing with seq, but I would like an elegant way to do it on a file.
You can use awk like this:
awk '/^([[:blank:]]*[[:digit:]]+){2,3}[[:blank:]]*$/ {
j = (NF==3) ? $2 : 1
s=""
for(i=$1; i<=$NF; i+=j)
s = sprintf("%s%s%s", s, (i==$1)?"":OFS, i)
$0=s
} 1' file
2 4 6 8 10
custom
8 9 10
3 8 13 18
custom
1 2 3 4 5
Explanation:
/^([[:blank:]]*[[:digit:]]+){2,3}[[:blank:]]*$/ - match only lines with 2 or 3 numbers.
j = (NF==3) ? $2 : 1 - set variable j to $2 if there are 3 columns otherwise set j to 1
for(i=$1; i<=$NF; i+=j) run a loop from 1st col to last col, increment by j
sprintf is used for formatting the generated sequence
1 is an always-true pattern with no action, so awk applies its default action and prints each (possibly rewritten) line
This might work for you (GNU sed, seq and paste):
sed '/^[0-9]/s/.*/seq & | paste -sd\\ /e' file
If a line begins with a digit, use the line's values as parameters for the seq command, whose output is then piped to the paste command. The RHS of the substitute command is evaluated as a shell command by the e flag (GNU sed specific).
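For completeness, the same idea can be sketched in plain bash with seq, which the question mentions. expand_sequences is a hypothetical name; the sketch assumes the numeric lines hold "first last" or "first step last", exactly as seq expects its arguments.

```bash
#!/bin/bash
# Hypothetical filter: expand "first last" or "first step last" lines with
# seq and pass every other line (such as "custom") through unchanged.
expand_sequences() {
    local re='^[[:blank:]]*[0-9]+([[:blank:]]+[0-9]+){1,2}[[:blank:]]*$'
    local line
    local -a nums
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            read -r -a nums <<<"$line"              # 2 or 3 numbers
            seq "${nums[@]}" | paste -sd ' ' -      # join seq output with spaces
        else
            printf '%s\n' "$line"
        fi
    done
}

printf '%s\n' '2 2 10' 'custom' '8 10' '3 5 18' | expand_sequences
```

To overwrite the input, one option is to write to a temporary file first, for example expand_sequences <file >file.tmp && mv file.tmp file (file.tmp being an arbitrary name), since redirecting straight back into the input would truncate it before it is read.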

How to delete leading newline in a string in bash?

I'm having the following issue. I have an array of numbers:
text="\n1\t2\t3\t4\t5\n6\t7\t8\t9\t0"
And I'd like to delete the leading newline.
I've tried
sed 's/.//' <<< "$text"
cut -c 1- <<< "$text"
and some iterations. But the issue is that both of those delete the first character AFTER EVERY newline. Resulting in this:
text="\n\t2\t3\t4\t5\n\t7\t8\t9\t0"
This is not what I want and there doesn't seem to be an answer to this case.
Is there a way to tell either of those commands to treat newlines like characters and the entire string as one entity?
awk to the rescue!
awk 'NR>1'
of course you can do the same with tail -n +2 or sed 1d as well.
You can probably use the substitution modifier (see parameter expansion and ANSI C quoting in the Bash manual):
$ text=$'\n1\t2\t3\t4\t5\n6\t7\t8\t9\t0'
$ echo "$text"

1 2 3 4 5
6 7 8 9 0
$ echo "${text/$'\n'/}"
1 2 3 4 5
6 7 8 9 0
$
It replaces the first newline with nothing, as requested. However, note that it is not anchored to the first character:
$ alt="${text/$'\n'/}"
$ echo "${alt/$'\n'/}"
1 2 3 4 56 7 8 9 0
$
Using a caret ^ before the newline doesn't help — it just means there's no match.
As pointed out by rici in the comments, if you read the manual page I referenced, you can find how to anchor the pattern at the start with a # prefix:
$ echo "${text/#$'\n'/}"
1 2 3 4 5
6 7 8 9 0
$ echo "${alt/#$'\n'/}"
1 2 3 4 5
6 7 8 9 0
$
The notation bears no obvious resemblance to other regex systems; you just have to know it.
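A shorter spelling of the anchored removal is the prefix-removal form ${var#pattern}, which only ever matches at the start of the value. A minimal sketch, reusing the text variable from above; trimmed and again are illustrative names.

```bash
#!/bin/bash
text=$'\n1\t2\t3\t4\t5\n6\t7\t8\t9\t0'

# ${var#pattern} removes the shortest match of pattern from the start only.
trimmed=${text#$'\n'}

# Applying it again is harmless: the value no longer starts with a newline,
# so the inner newline is left alone.
again=${trimmed#$'\n'}

printf '%s\n' "$trimmed"
```

Unlike the unanchored ${text/$'\n'/}, repeating this never joins the two rows.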

Resources