From the docs:
$(patsubst PATTERN,REPLACEMENT,TEXT)
Finds whitespace-separated words in TEXT that match PATTERN and
replaces them with REPLACEMENT. Here PATTERN may contain a %
which acts as a wildcard, matching any number of any characters
within a word.
...
Whitespace between words is folded into single space characters;
leading and trailing whitespace is discarded.
Now, given this makefile:
# The pattern for patsubst does NOT contain '%'
foo := $(patsubst x,y,x   x   x)
# The pattern for patsubst does contain '%'
bar := $(patsubst x%,y,x   x   x)
# 'foo' is the result of a patsubst pattern that did NOT contain '%'
# 'bar' is the result of a patsubst pattern that did contain '%'
all ::
	@echo 'foo is: "$(foo)"'
	@echo 'bar is: "$(bar)"'
Executing, we get:
foo is: "y   y   y"
bar is: "y y y"
So evidently Make may or may not "fold" runs of whitespace into a single space.
Or did I do something wrong?
In fact all is explained in the doc:
Finds whitespace-separated words in TEXT ...
means that one or more spaces have to separate the words.
... that match PATTERN ...
means that it selects only the words that match the pattern (which can include some spaces).
... and replaces them with REPLACEMENT.
means that the selected words are replaced by the replacement.
A picture is worth a thousand words.
For PATTERN = X:
          +---- SEPARATORS ----+
          |                    |
  +-------+-------+   +--------+------+
  |               |   |               |
X space space space X space space space x
|                   |                   |
+-------------------+-------------------+
                    |
                PATTERNS
For PATTERN = X%:
                +---- SEPARATORS ---+
                |                   |
              +-+-+               +-+-+
              |   |               |   |
X space space space X space space space x
|            |      |            |      |
+------+-----+      +------+-----+      |
       |                   |            |
       +--- PATTERNS ------+------------+
Interesting thing:
When you use the % character in your pattern, you can re-use it in the replacement, like this:
$(patsubst x%,y%,xa xb xc)
# Will be "ya yb yc"
But when the words in TEXT are separated by several spaces, make strips the extra spaces in the result:
$(patsubst x%,y%,xa   xb   xc)
# Will also be "ya yb yc"
EDIT: After reading the source code, the interesting things are:
function.c +146: The function patsubst_expand_pat
misc.c +337: The function find_next_token
misc.c +325: The function next_token
So here is the behavior:
If there is no % in the pattern, this is a simple substitution (which keeps the spaces).
Otherwise it splits the text into words and gets rid of all the whitespace between them (using the isblank function).
Finally, it does the replacement.
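The two code paths can be observed side by side. What follows is a minimal sketch, assuming GNU make is installed and on the PATH; the function name demo_patsubst and the use of $(info ...) to print the variables are illustrative choices, not part of the original makefile.

```shell
#!/bin/sh
# Feed a throwaway makefile to make on stdin (-f -) and print both variables.
# The heredoc delimiter is quoted so the shell leaves $(...) for make to expand.
demo_patsubst() {
    make -s -f - <<'EOF'
# No % in the pattern: plain substitution, the spacing of TEXT is kept
foo := $(patsubst x,y,x   x   x)
# % in the pattern: TEXT is split into words and re-joined with single spaces
bar := $(patsubst x%,y,x   x   x)
$(info foo is: "$(foo)")
$(info bar is: "$(bar)")
# Empty recipe so that make has a target and prints nothing else
all: ;
EOF
}

demo_patsubst
```

With the three-space input used here, foo should keep its three-space gaps while bar comes out with single spaces, matching the output shown in the question.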
I want to number all the lines that do not contain a character "b" or "c" in the line. How to do that?
My first idea was to write this:
ls -l | nl -bp'[!bc]'
because without the exclamation mark it numbers only the lines that do contain the characters "b" or "c". I thought the exclamation mark would invert the numbering, but it did not.
I need to invert it somehow so that it numbers the lines that do not contain these characters. Could you write this command using nl -bp?
Number lines that consist entirely of allowed characters.
ls -l | nl -bp'^[^bc]*$'
[^bc] matches any single character that is neither b nor c; the ^ inside the brackets negates the character class. Without it, [bc] would match only b and c.
The ^ at front and $ are anchors. They require the entire line to match, not just a part of the line.
Note that this use of ^ is entirely unrelated to the ^ inside square brackets. It's an unfortunate reuse of the same symbol. One is an anchor and the other is a character class negation.
* matches zero or more of the preceding item. You may have seen .* before, which matches anything at all because a dot matches any single character. The use here is similar, except instead of . we have [^bc].
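To try the pattern on predictable input, here is a small sketch that replaces ls -l with four fixed lines. The helper names sample and number_lines_without_bc are made up for this illustration.

```shell
#!/bin/sh
# Fixed sample input instead of `ls -l`, so the result is reproducible.
sample() {
    printf '%s\n' 'apple' 'banana' 'cherry' 'dog'
}

# -bp'REGEX' numbers only the lines that match the basic regular expression.
# '^[^bc]*$' matches a line made up entirely of characters other than b and c.
number_lines_without_bc() {
    nl -bp'^[^bc]*$'
}

sample | number_lines_without_bc
```

Here apple and dog receive the numbers 1 and 2, while banana and cherry are printed without a number.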
With awk expression:
ls -l | awk '/^[^bc]*$/{ $0 = ++c FS $0 }1'
or the same with negated regex pattern:
ls -l | awk '!/[bc]/{ $0 = ++c FS $0 }1'
Since you use ls I think you want to print all the files in a folder.
You could use: ls -l | grep -v "[bc]".
I am trying to replace every single space with two spaces in Unix. We just read from standard input and write to standard output, and I have to avoid using awk and perl. For example, if I read in something like "San Diego" it should print "San  Diego". If there are already multiple spaces, it should leave them alone.
How about bash only? First test file:
$ cat file
1
2 3
4 5
San Diego NO
Then:
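$ re='(^|.*[^ ]) ([^ ].*)'   # a lone space after line start or a non-space, before a non-space
$ cat file |
while IFS= read -r line
do
    while [[ $line =~ $re ]]
    do
        # re-join the two halves with two spaces
        line="${BASH_REMATCH[1]}  ${BASH_REMATCH[2]}"
    done
    echo "$line"
done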
1
2 3
4 5
San Diego NO
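Because spaces are hard to see in terminal output, the same loop can be wrapped in a function and its result printed between | markers. This is only a sketch: the function name double_single_spaces is invented here, and the `$` alternative in the second group is an addition of mine (an assumption about what is wanted) so that a single trailing space is doubled as well.

```shell
#!/usr/bin/env bash
# ERE kept in a variable so the literal spaces need no escaping inside [[ ]].
# Group 1: start of line, or everything up to a non-space character.
# Group 2: end of line, or a non-space character and the rest of the line.
single_space_re='(^|.*[^ ]) ($|[^ ].*)'

double_single_spaces() {
    local line
    while IFS= read -r line; do
        # Each pass doubles one isolated space; stop when none is left.
        while [[ $line =~ $single_space_re ]]; do
            line="${BASH_REMATCH[1]}  ${BASH_REMATCH[2]}"
        done
        printf '%s\n' "$line"
    done
}

# Wrap the result in | so leading and trailing spaces are visible.
printf '|%s|\n' "$(printf ' foo bar  car \n' | double_single_spaces)"
```

The inner loop ends because every pass turns one isolated space into a pair, and a pair never matches the pattern again.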
You have to be a bit careful here not to forget spaces at the beginning or end.
I present three solutions for educational purposes:
sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g'   # solution 1
sed 's/\( \+\)/ \1/g;s/ \(  \+\)/\1/g'     # solution 2
sed 's/ \( \+\)/\1/g;s/\( \+\)/ \1/g'      # solution 3
All three solutions make use of subexpressions:
9.3.6 BREs Matching Multiple Characters
A subexpression can be defined within a BRE by enclosing it between
the character pairs \( and \). Such a subexpression shall match
whatever it would have matched without the \( and \), except that
anchoring within subexpressions is optional behavior; see BRE
Expression Anchoring. Subexpressions can be arbitrarily nested.
The back-reference expression '\n' shall match the same (possibly
empty) string of characters as was matched by a subexpression enclosed
between "\(" and "\)" preceding the '\n'. The character n shall be a
digit from 1 through 9, specifying the nth subexpression (the one that
begins with the nth \( from the beginning of the pattern and ends
with the corresponding paired \) ). The expression is invalid if
less than n subexpressions precede the \n. For example, the
expression "^\(.*\)\1$" matches a line consisting of two adjacent
appearances of the same string, and the expression "\(a\)*\1" fails to
match 'a'. When the referenced subexpression matched more than one
string, the back-referenced expression shall refer to the last matched
string. If the subexpression referenced by the back-reference matches
more than one string because of an asterisk (*) or an interval
expression (see item (5)), the back-reference shall match the last
(rightmost) of these strings.
Solution 1: sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g'
Here there are two subexpressions. The first subexpression \(^\|[^ ]\) matches the beginning of the line (^) or (\|) a non-space character ([^ ]). The second subexpression \($\|[^ ]\) is similar but with the end-of-line ($).
Solution 2: sed 's/\( \+\)/ \1/g;s/ \(  \+\)/\1/g'
This replaces each run of one or more spaces with the same run plus one extra space. Afterwards we correct the runs that now have 3 spaces or more by removing a single space from each.
Solution 3: sed 's/ \( \+\)/\1/g;s/\( \+\)/ \1/g'
This does the same thing as solution 2 but inverts the logic. First remove a space from every sequence that has more than one space, and afterwards add a space to every sequence. This one-liner is just one character shorter than solution 2.
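The three one-liners can be compared on the same input. The sketch below assumes GNU sed, since \| and \+ are GNU extensions to basic regular expressions; the wrapper names solution1 to solution3 and show are only for this illustration.

```shell
#!/bin/sh
# The three one-liners wrapped as filters (GNU sed syntax assumed).
solution1() { sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g'; }
solution2() { sed 's/\( \+\)/ \1/g;s/ \(  \+\)/\1/g'; }
solution3() { sed 's/ \( \+\)/\1/g;s/\( \+\)/ \1/g'; }

# Print the result between | markers so every space is visible.
show() {
    printf '%-10s |%s|\n' "$1" "$(printf '%s\n' "$2" | "$1")"
}

for input in ' foo bar  car ' 'a b c'; do
    printf '%-10s |%s|\n' 'input' "$input"
    for f in solution1 solution2 solution3; do
        show "$f" "$input"
    done
done
```

On the first input all three agree. On 'a b c' solution 1 leaves the second space single: the b is consumed by the first match, so it cannot also serve as the left-hand character of the next one. Solutions 2 and 3 only look at runs of spaces and are not affected.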
Example: based on solution 1
The following commands are nothing more than echo "string" | sed ..., wrapped in a printf statement to make the spaces visible.
# default string
$ printf "|%s|" " foo bar car "
| foo bar car |
# spaces replaced
$ printf "|%s|" "$(echo " foo bar car " | sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g')"
|  foo  bar  car  |
# 3 spaces in front and back
$ printf "|%s|" "$(echo "   foo bar car   " | sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g')"
|   foo  bar  car   |
note: If you want to replace any kind of blank (spaces and tabs in any encoding) with the same blank doubled, you could use:
sed 's/\(^\|[^[:blank:]]\)\([[:blank:]]\)\($\|[^[:blank:]]\)/\1\2\2\3/g'
sed 's/\(^\|[[:graph:]]\)\([[:blank:]]\)\($\|[[:graph:]]\)/\1\2\2\3/g'
Something along the lines of
cat input.txt | sed 's,\([[:alnum:]]\) \([[:alnum:]]\),\1  \2,g'
should work for that purpose.
Replace only the occurrences of 1 space between 2 characters that are not white space with 2 spaces:
sed 's/\([^ ]\) \([^ ]\)/\1  \2/g' file
1) [^ ] - a character that is not a space
2) \1  \2 - the first parenthesized expression, 2 spaces, the second parenthesized expression
3) sed used with s///g replaces every match of the regex between the first pair of slashes with the value between the second pair
I want to get what is between "aa=" and either % or the end of the string.
string="aa=value%bb"
string2="bb=%aa=value"
The rule must work on both strings to get the value of "aa=".
I would like a Bash solution if possible.
Use this:
result=$(echo "$string" | grep -o 'aa=[^%]*')
result=${result:3} # remove aa=
[^%]* matches any sequence of characters that doesn't contain %, so it will stop when it gets to % or the end of the string. ${result:3} expands to the substring starting from character 3, which removes aa= from the beginning.
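Since the question asks for a shell-only solution, the same extraction can also be done without grep, using parameter expansion instead of a regular expression. This is a sketch of that alternative technique, not the method given above; the function name extract_aa and the variable rest are made up for the example.

```shell
#!/bin/sh
# Print the text between "aa=" and the next "%" (or the end of the string).
# Returns 1 if the input contains no "aa=" at all.
extract_aa() {
    case $1 in
        *aa=*) ;;
        *) return 1 ;;
    esac
    rest=${1#*aa=}                 # drop everything up to and including the first "aa="
    printf '%s\n' "${rest%%\%*}"   # drop the first "%" and everything after it
}

extract_aa 'aa=value%bb'
extract_aa 'bb=%aa=value'
```

Both calls print value. The backslash in ${rest%%\%*} only makes it easier to see that the third % is a literal character and not part of the operator.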
I matched a string against a regex:
s = "`` `foo`"
r = /(?<backticks>`+)(?<inline>.+)\g<backticks>/
And I got:
s =~ r
$& # => "`` `foo`"
$~[:backticks] # => "`"
$~[:inline] # => " `foo"
Why is $~[:inline] not "` `foo"? Since $& is s, I expect:
$~[:backticks] + $~[:inline] + $~[:backticks]
to be s, but it is not, one backtick is gone. Where did the backtick go?
It is actually expected. Look:
(?<backticks>`+) - matches 1+ backticks and stores them in the named capture group "backticks" (there are two backticks). Then...
(?<inline>.+) - 1+ characters other than a newline are matched into the "inline" named capture group. It grabs the rest of the string, then backtracks to give characters back to the subexpression call that re-runs the "backticks" group. So,...
\g<backticks> - finds 1 backtick at the end of the string, which satisfies the condition to match 1+ backticks. The "backticks" named capture buffer is overwritten here.
The matching works like this:
"`` `foo`"
 ||1
   | 2 |
        |3
And then 1 becomes 3, and since 1 and 3 are the same group, you see one backtick.
I want to use sed to modify my file named "baz".
When I search for the pattern foo, and foo is not at the beginning or end of the line, I want to insert bar before foo. How can I do this using sed?
Input file named baz:
blah_foo_blahblahblah
blah_foo_blahblahblah
blah_foo_blahblahblah
blah_foo_blahblahblah
Output file
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
You can just use something like:
sed 's/foo/barfoo/g' baz
(the g at the end means global, every occurrence on each line rather than just the first).
For an arbitrary (rather than fixed) pattern such as foo[0-9], you could use capture groups as follows:
pax$ echo 'xyz fooA abc
xyz foo5 abc
xyz fooB abc' | sed 's/\(foo[0-9]\)/bar\1/g'
xyz fooA abc
xyz barfoo5 abc
xyz fooB abc
The parentheses capture the actual text that matched the pattern and the \1 uses it in the substitution.
You can use arbitrarily complex patterns with this one, including ensuring you match only complete words. For example, only changing the pattern if it's immediately surrounded by a word boundary:
pax$ echo 'xyz fooA abc
xyz foo5 abc foo77 qqq xfoo4 zzz
xyz fooB abc' | sed 's/\(\bfoo[0-9]\b\)/bar\1/g'
xyz fooA abc
xyz barfoo5 abc foo77 qqq xfoo4 zzz
xyz fooB abc
In terms of how the capture groups work, you can use parentheses to store the text that matches a pattern for later use in the replacement. The captured identifiers are based on the ( characters reading from left to right, so the regex (I've left off the \ escape characters and padded it a bit for clarity):
( ( \S* )   ( \S* ) )
^ ^     ^   ^     ^ ^
| |     |   |     | |
| +--2--+   +--3--+ |
+---------1---------+
when applied to the text Pax Diablo would give you three groups:
\1 = Pax Diablo
\2 = Pax
\3 = Diablo
as shown below:
pax$ echo 'Pax Diablo' | sed 's/\(\(\S*\) \(\S*\)\)/[\1] [\2] [\3]/'
[Pax Diablo] [Pax] [Diablo]
Just substitute the start of the line with something different.
sed '/^foo/s/^/bar/'
To replace or modify all "foo" except at beginning or end of line, I would suggest to temporarily replace them at beginning and end of line with a unique sentinel value.
sed 's/^foo/____veryunlikelytoken_bol____/
s/foo$/____veryunlikelytoken_eol____/
s/foo/bar&/g
s/^____veryunlikelytoken_bol____/foo/
s/____veryunlikelytoken_eol____$/foo/'
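To see the sentinel approach at work, the script can be wrapped as a filter and fed a line that has foo at the start, in the middle, and at the end. The function name bar_before_inner_foo and the second sample line are invented for this sketch.

```shell
#!/bin/sh
# Sentinel approach as a filter: hide "foo" at the start and end of the line,
# prefix the remaining ones with "bar", then restore the hidden ones.
bar_before_inner_foo() {
    sed 's/^foo/____veryunlikelytoken_bol____/
s/foo$/____veryunlikelytoken_eol____/
s/foo/bar&/g
s/^____veryunlikelytoken_bol____/foo/
s/____veryunlikelytoken_eol____$/foo/'
}

printf '%s\n' 'blah_foo_blahblahblah' 'foo_mid_foo_end_foo' | bar_before_inner_foo
```

The first line becomes blah_barfoo_blahblahblah, and in the second only the middle foo is changed, giving foo_mid_barfoo_end_foo.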
In sed there is no way to specify "cannot match here". In Perl regex and derivatives (meaning languages which borrowed from Perl's regex, not necessarily languages derived from Perl) you have various negative assertions so you can do something like
perl -pe 's/(?!^)foo(?!$)/barfoo/g'