sed backreferences and command interpolation - bash

I am having an interesting issue using only sed to substitute short month strings (e.g. "Oct") with the corresponding number value (e.g. "10") given a string such as the following:
Oct 14 09:23:35 some other input
To be replaced directly via sed with:
14-10-2013 09:23:35 some other input
None of the following is actually relevant to solving the trivial problem of month string -> number conversion; I'm more interested in understanding some weird behavior I encountered while trying to solve this problem entirely with sed.
Without any attempt of this string substitution (the echo statement is in lieu of the actual input in my script):
...
MMM_DD_HH_mm_ss="([A-Za-z]{3}) ([0-9]{2}) (.+:[0-9]{2})"
echo "Oct 14 09:23:35 some other input" | sed -r "s/$MMM_DD_HH_mm_ss (.+)/\2-\1-\3 \4/"
The next question is how to transform the backreference \1 into a number. Of course one thinks of using command substitution with the backreference as an argument:
...
TestFunc()
{
echo "received input $1$1"
}
...
echo "Oct 14 09:23:35 some other input" | sed -r "s/$MMM_DD_HH_mm_ss (.+)/\2-$(TestFunc \\1)-\3 \4/"
Where TestFunc would be a variation of the date command (as proposed by Jotne below) with the echo'd date-time group as an input. Here TestFunc is just an echo because I'm much more interested in the behavior of what the function believes to be the value of $1.
In this case the sed with TestFunc produces the output:
14-received input OctOct-09:23:35 some other input
Which suggests that sed actually is inserting backreference \1 into the command substitution $(...) for handling by TestFunc (which appears to receive \1 as the local variable $1).
However, all attempts to do anything more with the local $1 fail. For example:
TestFunc()
{
echo "processed: $1$1" > tmp.txt # Echo 1
if [ "$1" == "Oct" ]; then
echo "processed: 10"
else
echo "processed: $1$1" # Echo 2
fi
}
Returns:
14-processed: OctOct-09:23:35 some other input
$1 has been substituted into Echo 2, yet tmp.txt contains the value processed: \1\1; as if the backreference is not being inserted into the command substitution. Even weirder, the if condition fails with $1 != "Oct", yet it falls through to an echo statement which indicates $1 = "Oct".
My question is why is the backreference insertion working in the case of Echo 2 but not Echo 1? I suspect that the backreference insertion isn't working at all (given the failure of the if statement in TestFunc) but rather something subtle is going on that makes the substitution appear to work correctly in the case of Echo 2; what is that subtlety?
Solution
On further reflection I believe I understand what is going on:
\\1 is passed to the function inside the command substitution as the literal two-character string \1. This is why the equality test within the function fails, and why tmp.txt contains \1\1.
However, echo simply prints whatever $1 contains. So echo "aa$1aa" returns aa\1aa to the shell, which splices it into the sed script. Other commands such as rev also "see" $1 as \1.
sed then expands the \1 in aa\1aa as Oct (or whatever the backreference matched) and returns aaOctaa to the user.
The shell performs the command substitution $(...) before sed even starts, so the function can never see the matched text. It would be really cool if sed replaced \\1 (or \1, whatever) with the backreference before executing the command substitution; this would significantly increase sed's power...
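To make the order of evaluation visible, here is a minimal sketch that builds the sed script in a variable before running sed. The variable names pattern, script and result are only for illustration, and -E is used as the equivalent of -r in GNU sed.

```shell
TestFunc() { echo "received input $1$1"; }

# Build the sed script first, exactly as the shell does before launching sed.
pattern='([A-Za-z]{3}) ([0-9]{2}) (.+:[0-9]{2}) (.+)'
script="s/$pattern/\2-$(TestFunc \\1)-\3 \4/"

# TestFunc has already run at this point; sed only ever receives its output,
# which still contains the literal text \1\1.
printf '%s\n' "$script"

# Only now does sed expand the \1\1 that TestFunc echoed back.
result=$(echo "Oct 14 09:23:35 some other input" | sed -E "$script")
printf '%s\n' "$result"
```

Printing the script shows "received input \1\1" sitting inside the replacement, which is why the function's own test against "Oct" can never succeed.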

This might work for you (GNU sed):
sed -r 's/$/\nJan01...Oct10Nov11Dec12/;s/(...) (..) (..:..:.. .*)\n.*\1(..).*/\2-\4-2013 \3/;s/\n.*//' file
Add a lookup to the end of the line and use the back reference to match on it making sure to remove the lookup table in all cases.
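As a sketch of the same lookup-table idea with the table written out in full, assuming GNU sed (for \n in the replacement and a backreference inside an extended regex). The year 2013 is hard-coded as in the answer, and the variable name result is just for illustration.

```shell
result=$(echo "Oct 14 09:23:35 some other input" |
  sed -E 's/$/\nJan01Feb02Mar03Apr04May05Jun06Jul07Aug08Sep09Oct10Nov11Dec12/
          s/(...) (..) (..:..:.. .*)\n.*\1(..).*/\2-\4-2013 \3/
          s/\n.*//')
printf '%s\n' "$result"
```

The first command appends the table after a newline, the second finds the captured month name in the table and picks up the two digits that follow it, and the third removes the table from lines where the second command did not match.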
Here's an example of passing a backreference to a function:
f(){ echo "x$1y$1z"; }
echo a b c | sed -r 's/(.) (.) (.)/'"$(f \\2)"'/'
returns:
xbybz
HTH

Use the correct tool:
date -d "Oct 14 09:23:35" +"%d-%m-%Y %H:%M:%S"
14-10-2013 09:23:35
date reads your input and converts it to any format you like. Because the input has no year, date assumes the current year (2013 when this was written).
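If the line carries extra text after the timestamp, one possible approach is to split it off with read and hand only the date part to date. This is a sketch assuming GNU date (for -d); the year is pinned to 2013 so the output does not depend on when it runs, and the variable names are illustrative.

```shell
line="Oct 14 09:23:35 some other input"

# The first three words are the timestamp; everything else lands in rest.
read -r mon day time rest <<< "$line"

stamp=$(date -d "$mon $day $time 2013" +"%d-%m-%Y %H:%M:%S")
result="$stamp $rest"
printf '%s\n' "$result"
```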

Related

How to prepend to a string that comes out of a pipe

I have two strings saved in a bash variable, delimited by :. I want to extract the second string, prepend THIS_VAR= to it, and append the result to a file named saved.txt
For example if myVar="abc:pqr", THIS_VAR=pqr should be appended to saved.txt.
This is what I have so far,
myVar="abc:pqr"
echo $myVar | cut -d ':' -f 2 >> saved.txt
How do I prepend THIS_VAR=?
printf 'THIS_VAR=%q\n' "${myVar#*:}"
See Shell Parameter Expansion and run help printf.
The more general solution in addition to @konsolebox's answer is piping into a compound statement, where you can perform arbitrary operations:
echo This is in the middle | {
echo This is first
cat
echo This is last
}
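Applied to the question, the compound statement could look like the sketch below. The temporary file in out stands in for saved.txt so the example does not touch a real file; that substitution is an assumption for illustration only.

```shell
myVar="abc:pqr"
out=$(mktemp)   # stand-in for saved.txt

# printf writes the prefix without a newline, then cat passes the piped text through.
echo "$myVar" | cut -d ':' -f 2 | { printf 'THIS_VAR='; cat; } >> "$out"

cat "$out"
```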

In bash how can I get the last part of a string after the last hyphen [duplicate]

I have this variable:
A="Some variable has value abc.123"
I need to extract this value i.e abc.123. Is this possible in bash?
Simplest is
echo "$A" | awk '{print $NF}'
Edit: explanation of how this works...
awk breaks the input into different fields, using whitespace as the separator by default. Hardcoding 5 in place of NF prints out the 5th field in the input:
echo "$A" | awk '{print $5}'
NF is a built-in awk variable that gives the total number of fields in the current record. The following returns the number 5 because there are 5 fields in the string "Some variable has value abc.123":
echo "$A" | awk '{print NF}'
Combining $ with NF outputs the last field in the string, no matter how many fields your string contains.
Yes; this:
A="Some variable has value abc.123"
echo "${A##* }"
will print this:
abc.123
(The ${parameter##word} notation is explained in §3.5.3 "Shell Parameter Expansion" of the Bash Reference Manual.)
Some examples using parameter expansion
A="Some variable has value abc.123"
echo "${A##* }"
abc.123
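Remove the shortest suffix matching " *" (drops the last word)
echo "${A% *}"
Some variable has value
Remove the shortest suffix matching ".*" (drops everything from the last dot)
echo "${A%.*}"
Some variable has value abc
Remove the longest suffix matching " *" (keeps only the first word)
echo "${A%% *}"
Some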
Read more Shell-Parameter-Expansion
The documentation is a bit painful to read, so I've summarised it in a simpler way.
Note that the '*' needs to swap places with the ' ' depending on whether you use # or %. (The * is just a wildcard, so you may need to take off your "regex hat" while reading.)
${A% *} - remove shortest trailing * (strip the last word)
${A%% *} - remove longest trailing * (strip the last words)
${A#* } - remove shortest leading * (strip the first word)
${A##* } - remove longest leading * (strip the first words)
Of course a "word" here may contain any character that isn't a literal space.
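A quick way to see all four forms side by side, as a small sketch using the same example string (the variable names on the left are only for illustration):

```shell
A="Some variable has value abc.123"

without_last=${A% *}     # Some variable has value
first_only=${A%% *}      # Some
without_first=${A#* }    # variable has value abc.123
last_only=${A##* }       # abc.123

printf '%s\n' "$without_last" "$first_only" "$without_first" "$last_only"
```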
You might commonly use this syntax to trim filenames:
${A##*/} removes all containing folders, if any, from the start of the path, e.g.
/usr/bin/git -> git
/usr/bin/ -> (empty string)
${A%/*} removes the last file/folder/trailing slash, if any, from the end:
/usr/bin/git -> /usr/bin
/usr/bin/ -> /usr/bin
${A%.*} removes the last extension, if any (just be wary of things like my.path/noext):
archive.tar.gz -> archive.tar
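The filename cases above can be checked with a short sketch like this one; p and f are illustrative variable names.

```shell
p="/usr/bin/git"
f="archive.tar.gz"

name=${p##*/}      # git
dir=${p%/*}        # /usr/bin
stem=${f%.*}       # archive.tar
bare=${f%%.*}      # archive (longest match strips every extension)

printf '%s\n' "$name" "$dir" "$stem" "$bare"
```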
How do you know where the value begins? If it always starts at the 5th word, you could use e.g.:
B=$(echo "$A" | cut -d ' ' -f 5-)
This uses the cut command to slice out part of the line, using a simple space as the word delimiter.
As pointed out by Zedfoxus, this is a very clean method that works wherever rev and cut are available. Besides, you don't need to know the exact position of the substring.
A="Some variable has value abc.123"
echo "$A" | rev | cut -d ' ' -f 1 | rev
# abc.123
More ways to do this:
(Run each of these commands in your terminal to test this live.)
For all answers below, start by typing this in your terminal:
A="Some variable has value abc.123"
The array example (#3 below) is a really useful pattern, and depending on what you are trying to do, sometimes the best.
1. with awk, as the main answer shows
echo "$A" | awk '{print $NF}'
2. with grep:
echo "$A" | grep -o '[^ ]*$'
the -o says to only retain the matching portion of the string
the [^ ] part says "don't match spaces"; ie: "not the space char"
the * means: "match 0 or more instances of the preceding match pattern (which is [^ ])", and the $ means "match the end of the line." So, this matches the last word after the last space through to the end of the line; ie: abc.123 in this case.
3. via regular bash "indexed" arrays and array indexing
Convert A to an array, with elements being separated by the default IFS (Internal Field Separator) characters, which are space, tab, and newline:
Option 1 (will "break in mysterious ways", as @tripleee put it in a comment here, if the string stored in the A variable contains certain special shell characters, so Option 2 below is recommended instead!):
# Capture space-separated words as separate elements in array A_array
A_array=($A)
Option 2 [RECOMMENDED!]. Use the read command, as I explain in my answer here, and as is recommended by the bash shellcheck static code analyzer tool for shell scripts, in ShellCheck rule SC2206, here.
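# Capture space-separated words as separate elements in array A_array, using
# a "herestring". No -d '' here: with it, read would keep the herestring's
# trailing newline in the last element and return a non-zero status.
# See my answer here: https://stackoverflow.com/a/71575442/4561887
IFS=" " read -r -a A_array <<< "$A"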
Then, print only the last element in the array:
# Print only the last element via bash array right-hand-side indexing syntax
echo "${A_array[-1]}" # last element only
Output:
abc.123
Going further:
What also makes this pattern so useful is that it lets you easily do the opposite: obtain all words except the last one, like this:
array_len="${#A_array[@]}"
array_len_minus_one=$((array_len - 1))
echo "${A_array[@]:0:$array_len_minus_one}"
Output:
Some variable has value
For more on the ${array[@]:start:length} array slicing syntax above, see my answer here: Unix & Linux: Bash: slice of positional parameters, and for more info. on the bash "Arithmetic Expansion" syntax, see here:
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Arithmetic-Expansion
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Shell-Arithmetic
You can use a Bash regex:
A="Some variable has value abc.123"
[[ $A =~ [[:blank:]]([^[:blank:]]+)$ ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
Prints:
abc.123
That works with any [:blank:] delimiter in the current locale (usually [ \t]). If you want to be more specific:
A="Some variable has value abc.123"
pat='[ ]([^ ]+)$'
[[ $A =~ $pat ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
echo "Some variable has value abc.123"| perl -nE'say $1 if /(\S+)$/'

I want to extract the strings from file name

one_two_three_four_five.rtf
I need five in variable A
I need four in variable B
And everything that remains in variable C
The name should be read from the last character backwards.
Note that only the last two underscores separate A and B; the remainder may contain any number of underscores and should all go into C.
Is it possible?
For example using parameter expansion
#!/bin/ksh
string="one_two_three_four_five.rtf"
base=${string%.rtf}
a=${base##*_}; base=${base%_$a}
b=${base##*_}; base=${base%_$b}
c=$base
echo "$a - $b - $c"
s="one_two_three_four_five.rtf"
source <(sed -r 's/(.*)_([^_]*)_([^_]*)[.].*/C="\1"; B="\2";A="\3"/' <<< "${s}")
# Result:
echo "A=$A, B=$B, C=$C"
A=five, B=four, C=one_two_three
Explanation:
sed -r Extended regular expressions, so the parentheses need no backslash escapes
(.*)_ Matches largest string until underscore with the condition that there are underscores left for matching the remaining string
([^_]*) String without underscore
[.] A dot without special meaning
"\1" First remembered string
<<< "${s}" Input for sed is like echo "${s}" | sed ...
<(..) Simulates a file, so sourcing these will execute the commands.
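The same regex idea can also be expressed with bash's own =~ operator, which avoids sourcing generated code. This is a sketch of an alternative, not the answer's method; the pattern is anchored and kept in the variable re to sidestep quoting issues.

```shell
s="one_two_three_four_five.rtf"
re='^(.*)_([^_]*)_([^_]*)[.][^.]*$'

if [[ $s =~ $re ]]; then
  C=${BASH_REMATCH[1]}   # everything before the last two underscores
  B=${BASH_REMATCH[2]}   # second-to-last piece
  A=${BASH_REMATCH[3]}   # last piece, without the extension
fi

echo "A=$A, B=$B, C=$C"
```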

Bash Columns SED and BASH Commands without AWK?

I wrote 2 different scripts but I am stuck at the same problem.
I am building a table from a file ($2) that I get as an argument; $1 is the number of columns. It is a little hard to explain, so I will show the input and output.
The problem is that I don't know how to save each column in a different variable so I can build it into my HTML code later
#printf <TR><TD>$...</TD><TD>$...</TD><TD>$..</TD></TR><TD>$...
The input looks like this:
Name\tSize\tType\tprobe
bla\t4711\tfile\t888888888
abcde\t4096\tdirectory\t5555
eeeee\t333333\tblock\t6666
aaaaaa\t111111\tpackage\t7777
sssss\t44444\tfile\t8888
bbbbb\t22222\tfolder\t9999
Code :
c=1
column=$1
file=$2
echo "$( < $file)"| while read Line ; do
Name=$(sed "s/\\\t/ /g" $file | cut -d' ' -f$c,-$column)
printf "$Name \n"
#let c=c+1
#printf "<TR><TD>$Name</TD><TD>$Size</TD><TD>$Type</TD></TR>\n"
exit 0
done
Output:
Name Size Type probe
bla 4711 file 888888888
abcde 4096 directory 5555
eeeee 333333 block 6666
aaaaaa 111111 package 7777
sssss 44444 file 8888
bbbbb 22222 folder 9999
This is a tailor-made job for awk. See this script:
awk -F'\t' '{printf "<tr>";for(i=1;i<=NF;i++) printf "<td>%s</td>", $i;print "</tr>"}' input
<tr><td>bla</td><td>4711</td><td>file</td><td>888888888</td></tr>
<tr><td>abcde</td><td>4096</td><td>directory</td><td>5555</td></tr>
<tr><td>eeeee</td><td>333333</td><td>block</td><td>6666</td></tr>
<tr><td>aaaaaa</td><td>111111</td><td>package</td><td>7777</td></tr>
<tr><td>sssss</td><td>44444</td><td>file</td><td>8888</td></tr>
<tr><td>bbbbb</td><td>22222</td><td>folder</td><td>9999</td></tr>
In bash:
celltype=th
while IFS=$'\t' read -a columns; do
rowcontents=$( printf "<$celltype>%s</$celltype>" "${columns[@]}" )
printf '<tr>%s</tr>\n' "$rowcontents"
celltype=td
done < <( sed $'s/\\\\t/\t/g' "$2")
Some explanations:
IFS=$'\t' read -a columns reads a line from standard input, using only the tab character to separate fields, and putting each field into a separate element of the array columns. We change IFS so that other whitespace, which could occur in a field, is not treated as a field delimiter.
On the first line read from standard input, <th> elements will be output by the printf line. After resetting the value of celltype at the end of the loop body, all subsequent rows will consist of <td> elements.
When setting the value of rowcontents, take advantage of the fact that printf reuses its format string as many times as necessary to consume all the arguments.
Input is via process substitution from the sed command, which requires a crazy amount of quoting. First, the entire argument is quoted with $'...', which tells bash to replace escaped characters. bash converts this to the literal string s/\\t/^T/g, where I am using ^T to represent a literal ASCII 09 tab character. When sed sees this argument, it performs its own escape replacement, so the search text is a literal backslash followed by a literal t, to be replaced by a literal tab character.
The first argument, the column count, is unnecessary and is ignored.
Normally, you avoid making the while loop part of a pipeline because you set parameters in the loop that you want to use later. Here, all the variables are truly local to the while loop, so you could avoid the process substitution and use a pipeline if you wish:
sed $'s/\\\\t/\t/g' "$2" | while IFS=$'\t' read -a columns; do
...
done
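Two of the points above are easy to check in isolation: printf reusing its format for every remaining argument, and the $'...' quoting that turns a literal backslash-t into a real tab. A small sketch, with sample data and variable names chosen only for illustration:

```shell
columns=(Name Size Type probe)

# One format, four arguments: printf applies the format once per argument.
row=$(printf '<td>%s</td>' "${columns[@]}")
printf '<tr>%s</tr>\n' "$row"

# The input line contains a backslash followed by t; sed swaps it for a tab.
converted=$(printf 'bla\\t4711\n' | sed $'s/\\\\t/\t/g')
printf '%s\n' "$converted"
```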

How to chop the last n bytes of a string in bash?

For example, given qa_sharutils-2009-04-22-15-20-39, I want to chop the last 20 bytes and get 'qa_sharutils'.
I know how to do it in sed, but why does $A=${A/.\{20\}$/} not work?
Thanks!
If your string is stored in a variable called str, then this will give you the substring without the last 20 characters in bash
${str:0:${#str} - 20}
basically, string slicing can be done using
${[variableName]:[startIndex]:[length]}
and the length of a string is
${#[variableName]}
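Wrapped in a function, the slicing could look like the sketch below. chop_last is a hypothetical helper name, and the guard for strings shorter than n is an added assumption about the desired behaviour.

```shell
# chop_last STRING N: print STRING without its last N characters.
chop_last() {
  local s=$1 n=$2
  if (( n >= ${#s} )); then
    printf '\n'                       # nothing left once n reaches the length
  else
    printf '%s\n' "${s:0:${#s}-n}"    # offset 0, length = total length - n
  fi
}

chopped=$(chop_last "qa_sharutils-2009-04-22-15-20-39" 20)
printf '%s\n' "$chopped"
```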
EDIT:
solution using sed that works on files:
sed 's/.\{20\}$//' < inputFile
similar to substr('abcdefg', 2-1, 3) in php:
echo 'abcdefg'|tail -c +2|head -c 3
using awk:
echo $str | awk '{print substr($0,1,length($0)-20)}'
or using strings manipulation - echo ${string:position:length}:
echo ${str:0:$((${#str}-20))}
In the ${parameter/pattern/string} syntax in bash, pattern is a filename-wildcard (glob) pattern, not a regular expression. In glob syntax a dot is just a literal dot, the escaped braces \{ and \} are literal braces, and $ has no end-of-string meaning, so that expression only looks for the literal text ".{20}$" and leaves the value unchanged when it is not there. (Also, an assignment is written A=..., without a leading $.)
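The difference can be seen in a short sketch: the regex-looking pattern changes nothing, a glob made of twenty ? wildcards does the job, and a real regular expression works through bash's =~ operator. The variable names are illustrative.

```shell
A="qa_sharutils-2009-04-22-15-20-39"

# Treated as a glob, this looks for the literal text .{20}$ and finds nothing.
unchanged=${A/.\{20\}$/}

# Build a glob of twenty ? wildcards and strip it from the end.
printf -v marks '%20s' ''
marks=${marks// /?}
globbed=${A%$marks}

# A genuine regular expression is available through [[ =~ ]].
re='^(.*).{20}$'
[[ $A =~ $re ]] && matched=${BASH_REMATCH[1]}

printf '%s\n' "$unchanged" "$globbed" "$matched"
```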
There are several ways to accomplish the basic task.
$ str="qa_sharutils-2009-04-22-15-20-39"
To strip the last 20 characters (this substring selection is zero-based):
$ echo ${str::${#str}-20}
qa_sharutils
Use "%" and "%%" to strip from the right-hand side of the string. For instance, if you want the basename, minus everything from the first "-" onward:
$ echo ${str%%-*}
qa_sharutils
This works only if the last 20 bytes are always a date.
$ str="qa_sharutils-2009-04-22-15-20-39"
$ IFS="-"
$ set -- $str
$ echo $1
qa_sharutils
$ unset IFS
or when first dash and beyond are not needed.
$ echo ${str%%-*}
qa_sharutils

Resources