Bash script, command - output to array, then print to file - bash

I need advice on how to achieve this output:
myoutputfile.txt
Tom Hagen 1892
State: Canada
Hank Moody 1555
State: Cuba
J.Lo 156
State: France
output of mycommand:
/usr/bin/mycommand
Tom Hagen
1892
Canada
Hank Moody
1555
Cuba
J.Lo
156
France
Im trying to achieve with this shell script:
IFS=$'\r\n' GLOBIGNORE='*' :; names=( $(/usr/bin/mycommand) )
for name in ${names[#]}
do
#echo $name
echo ${name[0]}
#echo ${name:0}
done
Thanks

Assuming you can always rely on the command to output groups of 3 lines, one option might be
/usr/bin/mycommand |
while read name;
read year;
read state; do
echo "$name $year"
echo "State: $state"
done
An array isn't really necessary here.
One improvement could be to exit the loop if you don't get all three required lines:
while read name && read year && read state; do
# Guaranteed that name, year, and state are all set
...
done

An easy one-liner (not tuned for performance):
/usr/bin/mycommand | xargs -d '\n' -L3 printf "%s %s\nState: %s\n"
It reads 3 lines at a time from the pipe and then passes them to a new instance of printf which is used to format the output.
If you have whitespace at the beginning (it looks like that in your example output), you may need to use something like this:
/usr/bin/mycommand | sed -e 's/^\s*//g' | xargs -d '\n' -L3 printf "%s %s\nState: %s\n"

#!/bin/bash
COUNTER=0
/usr/bin/mycommand | while read LINE
do
if [ $COUNTER = 0 ]; then
NAME="$LINE"
COUNTER=$(($COUNTER + 1))
elif [ $COUNTER = 1 ]; then
YEAR="$LINE"
COUNTER=$(($COUNTER + 1))
elif [ $COUNTER = 2 ]; then
STATE="$LINE"
COUNTER=0
echo "$NAME $YEAR"
echo "State: $STATE"
fi
done

chepner's pure bash solution is simple and elegant, but slow with large input files (loops in bash are slow).
Michael Jaros' solution is even simpler, if you have GNU xargs (verify with xargs --version), but also does not perform well with large input files (external utility printf is called once for every 3 input lines).
If performance matters, try the following awk solution:
/usr/bin/mycommand | awk '
{ ORS = (NR % 3 == 1 ? " " : "\n")
gsub("^[[:blank:]]+|[[:blank:]]*\r?$", "") }
{ print (NR % 3 == 0 ? "State: " : "") $0 }
' > myoutputfile.txt
NR % 3 returns the 0-based index of each input line within its respective group of consecutive 3 lines; returns 1 for the 1st line, 2 for the 2nd, and 0(!) for the 3rd.
{ ORS = (NR % 3 == 1 ? " " : "\n") determines ORS, the output-record separator, based on that index: a space for line 1, and a newline for lines 2 and 3; the space ensures that line 2 is appended to line 1 with a space when using print.
gsub("^[[:blank:]]+|[[:blank:]]*\r?$", "") strips leading and trailing whitespace from the line - including, if present, a trailing \r, which your input seems to have.
{ print (NR % 3 == 0 ? "State: " : "") $0 } prints the trimmed input line, prefixed by "State: " only for every 3rd input line, and implicitly followed by ORS (due to use of print).

Related

Is there a command for substituting a set of characters by a set of strings?

I'm would like to substitute a set of edit: single byte characters with a set of literal strings in a stream, without any constraint on the line size.
#!/bin/bash
for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
printf '\a,\b,\t,\v'
done |
chars_to_strings $'\a\b\t\v' '<bell>' '<backspace>' '<horizontal-tab>' '<vertical-tab>'
The expected output would be:
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>...
I can think of a bash function that would do that, something like:
chars_to_strings() {
local delim buffer
while true
do
delim=''
IFS='' read -r -d '.' -n 4096 buffer && (( ${#buffer} != 4096 )) && delim='.'
if [[ -n "${delim:+_}" ]] || [[ -n "${buffer:+_}" ]]
then
# Do the replacements in "$buffer"
# ...
printf "%s%s" "$buffer" "$delim"
else
break
fi
done
}
But I'm looking for a more efficient way, any thoughts?
Since you seem to be okay with using ANSI C quoting via $'...' strings, then maybe use sed?
sed $'s/\a/<bell>/g; s/\b/<backspace>/g; s/\t/<horizontal-tab>/g; s/\v/<vertical-tab>/g'
Or, via separate commands:
sed -e $'s/\a/<bell>/g' \
-e $'s/\b/<backspace>/g' \
-e $'s/\t/<horizontal-tab>/g' \
-e $'s/\v/<vertical-tab>/g'
Or, using awk, which replaces newline characters too (by customizing the Output Record Separator, i.e., the ORS variable):
$ printf '\a,\b,\t,\v\n' | awk -vORS='<newline>' '
{
gsub(/\a/, "<bell>")
gsub(/\b/, "<backspace>")
gsub(/\t/, "<horizontal-tab>")
gsub(/\v/, "<vertical-tab>")
print $0
}
'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><newline>
For a simple one-liner with reasonable portability, try Perl.
for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
printf '\a,\b,\t,\v'
done |
perl -pe 's/\a/<bell>/g;
s/\b/<backspace>/g;s/\t/<horizontal-tab>/g;s/\v/<vertical-tab>/g'
Perl internally does some intelligent optimizations so it's not encumbered by lines which are longer than its input buffer or whatever.
Perl by itself is not POSIX, of course; but it can be expected to be installed on any even remotely modern platform (short of perhaps embedded systems etc).
Assuming the overall objective is to provide the ability to process a stream of data in real time without having to wait for a EOL/End-of-buffer occurrence to trigger processing ...
A few items:
continue to use the while/read -n loop to read a chunk of data from the incoming stream and store in buffer variable
push the conversion code into something that's better suited to string manipulation (ie, something other than bash); for sake of discussion we'll choose awk
within the while/read -n loop printf "%s\n" "${buffer}" and pipe the output from the while loop into awk; NOTE: the key item is to introduce an explicit \n into the stream so as to trigger awk processing for each new 'line' of input; OP can decide if this additional \n must be distinguished from a \n occurring in the original stream of data
awk then parses each line of input as per the replacement logic, making sure to append anything leftover to the front of the next line of input (ie, for when the while/read -n breaks an item in the 'middle')
General idea:
chars_to_strings() {
while read -r -n 15 buffer # using '15' for demo purposes otherwise replace with '4096' or whatever OP wants
do
printf "%s\n" "${buffer}"
done | awk '{print NR,FNR,length($0)}' # replace 'print ...' with OP's replacement logic
}
Take for a test drive:
for (( i = 1; i <= 20; i++ ))
do
printf '\a,\b,\t,\v'
sleep 0.1 # add some delay to data being streamed to chars_to_strings()
done | chars_to_strings
1 1 15 # output starts printing right away
2 2 15 # instead of waiting for the 'for'
3 3 15 # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15
A variation on this idea using a named pipe:
mkfifo /tmp/pipeX
sleep infinity > /tmp/pipeX # keep pipe open so awk does not exit
awk '{print NR,FNR,length($0)}' < /tmp/pipeX &
chars_to_strings() {
while read -r -n 15 buffer
do
printf "%s\n" "${buffer}"
done > /tmp/pipeX
}
Take for a test drive:
for (( i = 1; i <= 20; i++ ))
do
printf '\a,\b,\t,\v'
sleep 0.1
done | chars_to_strings
1 1 15 # output starts printing right away
2 2 15 # instead of waiting for the 'for'
3 3 15 # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15
# kill background 'awk' and/or 'sleep infinity' when no longer needed
don't waste FS/OFS - use the built-in variables to take 2 out of the 5 needed :
echo $' \t abc xyz \t \a \n\n ' |
mawk 'gsub(/\7/, "<bell>", $!(NF = NF)) + gsub(/\10/,"<bs>") +\
gsub(/\11/,"<h-tab>")^_' OFS='<v-tab>' FS='\13' ORS='<newline>'
<h-tab> abc xyz <h-tab> <bell> <newline><newline> <newline>
To have NO constraint on the line length you could do something like this with GNU awk:
awk -v RS='.{1,100}' -v ORS= '{
$0 = RT
gsub(foo,bar)
print
}'
That will read and process the input 100 chars at a time no matter which chars are present, whether it has newlines or not, and even if the input was one multi-terabyte line.
Replace gsub(foo,bar) with whatever substitution(s) you have in mind, e.g.:
$ printf '\a,\b,\t,\v' |
awk -v RS='.{1,100}' -v ORS= '{
$0 = RT
gsub(/\a/,"<bell>")
gsub(/\b/,"<backspace>")
gsub(/\t/,"<horizontal-tab>")
gsub(/\v/,"<vertical-tab>")
print
}'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab>
and of course it'd be trivial to pass a list of old and new strings to awk rather than hardcoding them, you'd just have to sanitize any regexp or backreference metachars before calling gsub().

How to make that variable $f defined how much "Freq" will be printed from column number 3?

I need a help with my bash script. I've problem with code:
for v in $(seq 1 $f)); do echo $(grep "Freq" freq.log) | awk '{print$3}')
because this comands printed $f times column number 3 instead should be printed $f values of "Freq" from column number 3.
It's look like
enter image description here
Should be like
enter image description here
I don't know how make that variable $f defined how much "Freq" will be printed from column number 3. In this file I've plenty expressions of "Freq" but I need just $f.
For sure I pasted all content of script:
#!/bin/bash
e=$(grep "atomic number" freq.log | tail -1 | awk '{print$2}')
echo "Liczba atomow znajdujacyh sie w podanej czasteczce wynosi: $e"
f=$(bc <<< "($e*3-6)/3")
echo "Liczba wartosci Freq, ktore wczyta skrypt to $f"
for v in $(seq 1 $f); do
echo "$(grep "Freq" freq.log | awk '{print$3}')"
done
Sample input data file; geometry optimization calculations in GAUSSIAN
A A A
Frequencies -- 182.1477 202.8948 227.7144
Red. masses -- 6.6528 8.2622 6.3837
Frc consts -- 0.1300 0.2004 0.1950
IR Inten -- 0.8602 0.4870 1.2090
NAtoms= 35 NActive= 35 NUniq= 35 SFac= 1.00D+00 NAtFMM= 60 NAOKFM=F Big=F
Here is your bash script converted to a single awk script:
awk script script.awk
/atomic number/{ # for each line matching regEx pattern "atomic number"
e = $2; # store current 2nd field in variable e
}
/Freq/{ # for each line matching regEx pattern "Freq"
freqArr[fr++]=$3; # add 3rd field to array freqArr, increment array counter fr
}
END { # on complete scanning input file
print "Liczba atomow znajdujacyh sie w podanej czasteczce wynosi: " e;
f = ( ((e * 3) - 6) / 3 ); # claculate vairable f
print "Liczba wartosci Freq, ktore wczyta skrypt to " f;
for (currFreq in freqArr) { # scan all element freqArr
if (currFreq == f) # if currFreq equals f
freqCount++; # increment freqCount coutner
}
print freqCount;
}
run command
awk -f script.awk freq.log

bash to identify and verify file headers

Using the tab-delimited file below I am trying to validate the header line 1 and then store that number in a variable $header to use in a couple of if statements. If $header equals 10 then file has expected number of fields, but if $header less than 10 file is missing header for: and the missing header fields are printed underneath. The bash seems close and if i use the awk by itself it seems to work perfectly, but I can not seem to use it in the if. Thank you :).
file.txt
Index Chr Start End Ref Alt Freq Qual Score Input
1 1 1 100 C - 1 GOOD 10 .
2 2 20 200 A C .002 STRAND BIAS 2 .
3 2 270 400 - GG .036 GOOD 6 .
file2.txt
Index Chr Start End Ref Alt Freq Qual Score
1 1 1 100 C - 1 GOOD 10
2 2 20 200 A C .002 STRAND BIAS 2
3 2 270 400 - GG .036 GOOD 6
bash
for f in /home/cmccabe/Desktop/validate/*.txt; do
bname=`basename $f`
pref=${bname%%.txt}
header=$(awk -F'\t' '{print NF, "fields detected in file and they are:" ORS $0; exit}') $f >> ${pref}_output # detect header row in file and store in header and write to output
if [[ $header == "10" ]]; then # display results
echo "file has expected number of fields" # file is validated for headers
else
echo "file is missing header for:" # missing header field ...in file not-validated
echo "$header"
fi # close if.... else
done >> ${pref}_output
desired output for file.txt
file has expected number of fields
desired output for file1.txt
file is missing header for:
Input
You can use awk if you like, but bash is more than capable of handling the first line fields comparison on its own. If you maintain an array of expected field names, you can then easily split the first line into fields, compare against the expected number of fields, and output the identity of the missing field if you read less than the expected number of fields from any given file.
The following is a short example that takes filenames as arguments (you need to take filenames from stdin for a large number of files, or use xargs, as required). The script simply reads the first line in each file, separates the line into fields, checks the field count, and outputs any missing fields in a short error message:
#!/bin/bash
declare -i header=10 ## header has 10 fields
## aray of field names (can be read from 1st file)
fields=( "Index"
"Chr"
"Start"
"End"
"Ref"
"Alt"
"Freq"
"Qual"
"Score"
"Input" )
for i in "$#"; do ## for each file given as argument
read -r line < "$i" ## read first line from file into 'line'
oldIFS="$IFS" ## save current Internal Field Separator (IFS)
IFS=$'\t' ## set IFS to word-split on '\t'
fldarray=( $line ); ## fill 'fldarray' with fields in line
IFS="$oldIFS" ## restore original IFS
nfields=${#fldarray[#]} ## get number of fields in 'line'
if (( nfields < header )) ## test against header
then
printf "error: only '%d' fields in file '%s'\nmissing:" "$nfields" "$i"
for j in "${fields[#]}" ## for each expected field
do ## check against those in line, if not present print
[[ $line =~ $j ]] || printf " %s" "$j"
done
printf "\n\n" ## tidy up with newlines
fi
done
Example Input
$ cat dat/hdr.txt
Index Chr Start End Ref Alt Freq Qual Score Input
1 1 1 100 C - 1 GOOD 10 .
2 2 20 200 A C .002 STRAND BIAS 2 .
3 2 270 400 - GG .036 GOOD 6 .
$ cat dat/hdr2.txt
Index Chr Start End Ref Alt Freq Qual Score
1 1 1 100 C - 1 GOOD 10
2 2 20 200 A C .002 STRAND BIAS 2
3 2 270 400 - GG .036 GOOD 6
$ cat dat/hdr3.txt
Index Chr Start End Alt Freq Qual Score Input
1 1 1 100 - 1 GOOD 10 .
2 2 20 200 C .002 STRAND BIAS 2 .
3 2 270 400 GG .036 GOOD 6 .
Example Use/Output
$ bash hdrfields.sh dat/hdr.txt dat/hdr2.txt dat/hdr3.txt
error: only '9' fields in file 'dat/hdr2.txt'
missing: Input
error: only '9' fields in file 'dat/hdr3.txt'
missing: Ref
Look things over, while awk can do many things bash cannot on its own, bash is more than capable with parsing text.
Here is one in GNU awk (nextfile):
$ awk '
FNR==NR {
for(n=1;n<=NF;n++)
a[$n]
nextfile
}
NF==(n-1) {
print FILENAME " file has expected number of fields"
nextfile
}
{
for(i=1;i<=NF;i++)
b[$i]
print FILENAME " is missing header for: "
for(i in a)
if(i in b==0)
print i
nextfile
}' file1 file1 file2
file1 file has expected number of fields
file2 is missing header for:
Input
The first file processed by the script defines the headers (in a) that the following files should have and compares them (in b) against it.
This piece of code will do exactly what you are asking. Let me know if it works for you.
for f in ./*.txt; do
[[ $( head -1 $f | awk '{ print NF}' ) -eq 10 ]] && echo "File $f has all the fields on its header" || echo "File $f is missing " $( echo "Index Chr Start End Ref Alt Freq Qual Score Input $( head -1 $f )" | tr ' ' '\n' | sort | uniq -c | awk '/1 / {print $2}' );
done
Output :
File ./file2.txt is missing Input
File ./file.txt has all the fields on its header

bash routine to return the page number of a given line number from text file

Consider a plain text file containing page-breaking ASCII control character "Form Feed" ($'\f'):
alpha\n
beta\n
gamma\n\f
one\n
two\n
three\n
four\n
five\n\f
earth\n
wind\n
fire\n
water\n\f
Note that each page has a random number of lines.
Need a bash routine that return the page number of a given line number from a text file containing page-breaking ASCII control character.
After a long time researching the solution I finally came across this piece of code:
function get_page_from_line
{
local nline="$1"
local input_file="$2"
local npag=0
local ln=0
local total=0
while IFS= read -d $'\f' -r page; do
npag=$(( ++npag ))
ln=$(echo -n "$page" | wc -l)
total=$(( total + ln ))
if [ $total -ge $nline ]; then
echo "${npag}"
return
fi
done < "$input_file"
echo "0"
return
}
But, unfortunately, this solution proved to be very slow in some cases.
Any better solution ?
Thanks!
The idea to use read -d $'\f' and then to count the lines is good.
This version migth appear not ellegant: if nline is greater than or equal to the number of lines in the file, then the file is read twice.
Give it a try, because it is super fast:
function get_page_from_line ()
{
local nline="${1}"
local input_file="${2}"
if [[ $(wc -l "${input_file}" | awk '{print $1}') -lt nline ]] ; then
printf "0\n"
else
printf "%d\n" $(( $(head -n ${nline} "${input_file}" | grep -c "^"$'\f') + 1 ))
fi
}
Performance of awk is better than the above bash version. awk was created for such text processing.
Give this tested version a try:
function get_page_from_line ()
{
awk -v nline="${1}" '
BEGIN {
npag=1;
}
{
if (index($0,"\f")>0) {
npag++;
}
if (NR==nline) {
print npag;
linefound=1;
exit;
}
}
END {
if (!linefound) {
print 0;
}
}' "${2}"
}
When \f is encountered, the page number is increased.
NR is the current line number.
----
For history, there is another bash version.
This version is using only built-it commands to count the lines in current page.
The speedtest.sh that you had provided in the comments showed it is a little bit ahead (20 sec approx.) which makes it equivalent to your version:
function get_page_from_line ()
{
local nline="$1"
local input_file="$2"
local npag=0
local total=0
while IFS= read -d $'\f' -r page; do
npag=$(( npag + 1 ))
IFS=$'\n'
for line in ${page}
do
total=$(( total + 1 ))
if [[ total -eq nline ]] ; then
printf "%d\n" ${npag}
unset IFS
return
fi
done
unset IFS
done < "$input_file"
printf "0\n"
return
}
awk to the rescue!
awk -v RS='\f' -v n=09 '$0~"^"n"." || $0~"\n"n"." {print NR}' file
3
updated anchoring as commented below.
$ for i in $(seq -w 12); do awk -v RS='\f' -v n="$i"
'$0~"^"n"." || $0~"\n"n"." {print n,"->",NR}' file; done
01 -> 1
02 -> 1
03 -> 1
04 -> 2
05 -> 2
06 -> 2
07 -> 2
08 -> 2
09 -> 3
10 -> 3
11 -> 3
12 -> 3
A script of similar length can be written in bash itself to locate and respond to the embedded <form-feed>'s contained in a file. (it will work for POSIX shell as well, with substitute for string index and expr for math) For example,
#!/bin/bash
declare -i ln=1 ## line count
declare -i pg=1 ## page count
fname="${1:-/dev/stdin}" ## read from file or stdin
printf "\nln:pg text\n" ## print header
while read -r l; do ## read each line
if [ ${l:0:1} = $'\f' ]; then ## if form-feed found
((pg++))
printf "<ff>\n%2s:%2s '%s'\n" "$ln" "$pg" "${l:1}"
else
printf "%2s:%2s '%s'\n" "$ln" "$pg" "$l"
fi
((ln++))
done < "$fname"
Example Input File
The simple input file with embedded <form-feed>'s was create with:
$ echo -e "a\nb\nc\n\fd\ne\nf\ng\nh\n\fi\nj\nk\nl" > dat/affex.txt
Which when output gives:
$ cat dat/affex.txt
a
b
c
d
e
f
g
h
i
j
k
l
Example Use/Output
$ bash affex.sh <dat/affex.txt
ln:pg text
1: 1 'a'
2: 1 'b'
3: 1 'c'
<ff>
4: 2 'd'
5: 2 'e'
6: 2 'f'
7: 2 'g'
8: 2 'h'
<ff>
9: 3 'i'
10: 3 'j'
11: 3 'k'
12: 3 'l'
With Awk, you can define RS (the record separator, default newline) to form feed (\f) and IFS (the input field separator, default any sequence of horizontal whitespace) to newline (\n) and obtain the number of lines as the number of "fields" in a "record" which is a "page".
The placement of form feeds in your data will produce some empty lines within a page so the counts are off where that happens.
awk -F '\n' -v RS='\f' '{ print NF }' file
You could reduce the number by one if $NF == "", and perhaps pass in the number of the desired page as a variable:
awk -F '\n' -v RS='\f' -v p="2" 'NR==p { print NF - ($NF == "") }' file
To obtain the page number for a particular line, just feed head -n number to the script, or loop over the numbers until you have accrued the sum of lines.
line=1
page=1
for count in $(awk -F '\n' -v RS='\f' '{ print NF - ($NF == "") }' file); do
old=$line
((line += count))
echo "Lines $old through line are on page $page"
((page++)
done
This gnu awk script prints the "page" for the linenumber given as command line argument:
BEGIN { ffcount=1;
search = ARGV[2]
delete ARGV[2]
if (!search ) {
print "Please provide linenumber as argument"
exit(1);
}
}
$1 ~ search { printf( "line %s is on page %d\n", search, ffcount) }
/[\f]/ { ffcount++ }
Use it like awk -f formfeeds.awk formfeeds.txt 05 where formfeeds.awk is the script, formfeeds.txt is the file and '05' is a linenumber.
The BEGIN rule deals mostly with the command line argument. The other rules are simple rules:
$1 ~ search applies when the first field matches the commandline argument stored in search
/[\f]/ applies when there is a formfeed

How can I get the column in BASH?

I have a large .xml file like that:
c1="a1" c2="b1" c3="cccc1"
c1="aa2" c2="bbbb2" c3="cc2"
c1="aaaaaa3" c2="bb3" c3="cc3"
...
I need the result like the following:
a1 b1 cccc1
aa2 bbbb2 cc2
aaaaaa3 bb3 cc3
...
How can I get the column in BASH?
I have the following method in PL/SQL,but it's very inconvenient:
SELECT C1,
TRIM(BOTH '"' FROM REGEXP_SUBSTR(C1, '"[^"]+"', 1, 1)) c1,
TRIM(BOTH '"' FROM REGEXP_SUBSTR(C1, '"[^"]+"', 1, 2)) c2,
TRIM(BOTH '"' FROM REGEXP_SUBSTR(C1, '"[^"]+"', 1, 3)) c3
FROM TEST;
Use cut:
cut -d'"' -f2,4,6 --output-delimiter=" " test.txt
Or you can use sed if the number of columns is not known:
sed 's/[a-z][a-z0-9]\+="\([^"]\+\)"/\1/g' < test.txt
Explanation:
[a-z][a-z0-9]\+ - matches a string starting with a alpha char followed by any number of alphanumeric chars
"\([^"]\+\)" - captures any string inside the quotes
\1 - represents the captured string that in this case is used to replace the entire match
A perl approach (based on the awk answer by #A-Ray)
perl -F'"' -ane 'print join(" ",#F[ map { 2 * $_ + 1} (0 .. $#F) ]),"\n";' < test.txt
Explanation:
-F'"' set input separator to "
-a turn autosplit on - this results in #F being filed with content of fields in the input
-n iterate through all lines but don't print them by default
-e execute code following
map { 2 * $_ + 1} (0 .. $#F) generates a list of indexes (1,3,5 ...)
#F[map { 2 * $_ + 1} (0 .. $#F)] takes a slice from the array, selecting only odd fields
join - joins the slice with spaces
NOTE: I would not use this approach without a good reason, the first two are easier.
Some benchmarking (on a Raspberry Pi, with a 60000 lines input file and output thrown away to /dev/null)
cut - 0m0.135s no surprise there
sed - 0m5.864s
perl - 0m8.218s - I guess regenerating the index list every line isn't that fast (with a hard coded slice list it goes to half, but that would defeat the purpose)
the read based solution - 0m52.027s
You can also look at the built-in substring replacement/removal bash offers. Either in a short script or One-Liner:
#!/bin/bash
while read -r line; do
new=${line//c[0-9]=/} ## remove 'cX=', where X is '0-9'
new=${new//\"/} ## remove all '"' (double-quotes)
echo "$new"
done <"$1"
exit 0
Input
$ cat dat/stuff.xml
c1="a1" c2="b1" c3="cccc1"
c1="aa2" c2="bbbb2" c3="cc2"
c1="aaaaaa3" c2="bb3" c3="cc3"
Output
$ bash parsexmlcx.sh dat/stuff.xml
a1 b1 cccc1
aa2 bbbb2 cc2
aaaaaa3 bb3 cc3
As a One-Liner
while read -r line; do new=${line//c[0-9]=/}; new=${new//\"/}; echo "$new"; done <dat/stuff.xml
awk -F '"' '{ for(i=2; i<=NF; i+=2) { printf $i" " } print "" }'
Explanation
-F '"' makes Awk treat quotation marks (") as field delimiters. For example, Awk will split the line...
c1="a1" c2="b1" c3="cccc1"
...into fields numbered as...
1: 'c1='
2: 'a1'
3: ' c2='
4: 'b1'
5: ' c3='
6: 'cccc1'
7: ''
for(i=2; i<=NF; i+=2) { printf $i" " } starts at field 2, prints the value of the field, skips a field, and continues. In this case, fields 2, 4, and 6 will be printed.
print outputs a string following by a newline. printf also outputs a string, but doesn't append a newline. Therefore...
printf $i" "
...outputs the value of field $i followed by a space.
print ""
...simply outputs a newline.

Resources