Separating sections of a text file with a bash script - bash

I have a list:
### To Read:
One Hundred Years of Solitude | Gabriel García Márquez
Moby-Dick | Herman Melville
Frankenstein | Mary Shelley
On the Road | Jack Kerouac
Eyeless in Gaza | Aldous Huxley
### Read:
The Name of the Wind (The Kingkiller Chronicles: Day One) | Patrick Rothfuss | 6-27-2013
The Wise Man’s Fear (The Kingkiller Chronicles: Day Two) | Patrick Rothfuss | 8-4-2013
Vampires in the Lemon Grove | Karen Russell | 12-25-2013
Brave New World | Aldous Huxley | 2-2014
I'd like to use something like python's string.split(' | ') to separate the various fields into separate strings, but since the two sections have different numbers of fields, I think I need to treat them differently. How do I go about selecting the lines in between '### To Read:' and '### Read:' and after '### Read:' and splitting them? Should I use awk or sed?

You have not specified any desired output. So, as I interpret your question, you want to read certain lines from a file, split the lines on '|' and, analogous to python lists, put the results in bash arrays. The specified lines include all lines after ### To Read: except for the line that reads ### Read:. The script below does this and then, to demonstrate success, displays the arrays (using declare):
active=
while read line
do
if [ "$line" = '### To Read:' ]
then
active=1
elif [ "$line" = '### Read:' ]
then
active=1
elif [ "$active" ]
then
IFS='|' my_array=($line)
declare -p my_array
fi
done <mylist
The output from your sample input is:
declare -a my_array='([0]="One Hundred Years of Solitude " [1]=" Gabriel García Márquez")'
declare -a my_array='([0]="Moby-Dick " [1]=" Herman Melville")'
declare -a my_array='([0]="Frankenstein " [1]=" Mary Shelley")'
declare -a my_array='([0]="On the Road " [1]=" Jack Kerouac")'
declare -a my_array='([0]="Eyeless in Gaza " [1]=" Aldous Huxley")'
declare -a my_array='([0]="The Name of the Wind (The Kingkiller Chronicles: Day One) " [1]=" Patrick Rothfuss " [2]=" 6-27-2013")'
declare -a my_array='([0]="The Wise Man’s Fear (The Kingkiller Chronicles: Day Two) " [1]=" Patrick Rothfuss " [2]=" 8-4-2013")'
declare -a my_array='([0]="Vampires in the Lemon Grove " [1]=" Karen Russell " [2]=" 12-25-2013")'
declare -a my_array='([0]="Brave New World " [1]=" Aldous Huxley " [2]=" 2-2014")'
Note that this approach easily handles the input even though the lines have different numbers of fields.
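The fields in the output above keep the spaces that surrounded each '|'. A minimal sketch of one way to split a single line and trim each field, assuming bash; the function name split_fields is made up for illustration and is not part of the answer above:

```shell
#!/usr/bin/env bash
# split_fields (hypothetical helper): split one line on '|' and trim each field.
split_fields() {
    local line=$1 field
    local -a parts
    # IFS is set only for this read, so the global IFS is left untouched
    IFS='|' read -r -a parts <<< "$line"
    for field in "${parts[@]}"; do
        # strip leading, then trailing, whitespace with parameter expansion
        field="${field#"${field%%[![:space:]]*}"}"
        field="${field%"${field##*[![:space:]]}"}"
        printf '%s\n' "$field"
    done
}

split_fields 'Brave New World | Aldous Huxley | 2-2014'
```

Because the split happens inside the function, lines with two fields and lines with three fields go through the same code path.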

You are not telling us how to deliver the final output, but here is a skeleton for an Awk solution.
awk -F ' [|] ' '/^### To Read:/ { s=1; next }
/^### Read:/ { s=2; next }
s==1 { print $1 "," $2 ",\"\"" }
s==2 { print $1 "," $2 "," $3 }' file
This will simply print an empty third field from the first subsection. You can obviously adapt the actions to be anything you like, or rewrite this in Python if you are more familiar with that.

Related

Bash: concatenated variables derived from text file using grep gives confused output

In my directory, I have multiple nifti files (e.g., WIP944_mp2rage-0.75iso_TR5.nii) from my MRI scanner, accompanied by text files (e.g., WIP944_mp2rage-0.75iso_TR5_info.txt) containing information on the acquisition parameters (e.g., "Series description: WIP944_mp2rage-0.75iso_TR5_INV1_PHS_ND"). Based on these parameters (e.g., INV1_PHS_ND), I need to change the names of the nifti files, which are stored in $niftibase. I used grep to extract the parameters. When I echo the variables individually, I get what I want, but when I try to concatenate them into one filename, the variables are mixed together instead of being delimited by dots.
I tried multiple forms of sed to cut away potentially invisible characters and identified the source of the problems: the "INV1_PHS_ND" part of 'series description' gives me troubles, which is the $struct component, potentially due to the fact that this part varies in how many fields are extracted. Sometimes this is 3 (in the case of INV1_PHS_ND), but it can be 2 as well (INV1_ND). When I introduce this variable into the filename, everything goes haywire.
for infofile in ${PWD}/*.txt; do
# General characteristics of subjects (i.e., date of session, group number, and subject number)
reco=$(grep -A0 "Series description:" ${infofile} | cut -d ' ' -f 3 | cut -d '_' -f 1)
date=$(grep -A0 "Series date:" ${infofile} | cut -c 16-21)
group=$(grep -A0 "Subject:" ${infofile} | cut -d '^' -f 2 | cut -d '_' -f 1 )
number=$(grep -A0 "Subject:" ${infofile} | cut -d '^' -f 2 | cut -d '_' -f 2)
ScanNr=$(grep -A0 "Series number:" ${infofile} | cut -d ' ' -f 3)
# Change name if reco has structural prefix
if [[ $reco = *WIP944* ]]; then
struct=$(grep -A0 "Series description: WIP944" ${infofile} | cut -d '_' -f 4,5,6)
niftibase=$(basename $infofile _info.txt).nii
#echo ${subStudy}.struct.${date}.${group}.${protocol}.${paradigm}.nii
echo ${subStudy}.struct.${struct}.${date}.${group}.${protocol}${number}.${paradigm}.n${ScanNr}.nii
#mv ${niftibase} ${subStudy}.struct.${struct}.${date}.${group}.${protocol}${number}.${paradigm}.n${ScanNr}.nii
fi
done
This gives me output like this:
.niit47.n4lot.Noc002
.niit47.n5lot.Noc002D
.niit47.n6lot.Noc002
.niit47.n8lot.Noc002
.niit47.n9lot.Noc002
.niit47.n10ot.Noc002
.niit47.n11ot.Noc002D
for all 7 WIP944 files. However, it needs to be in the direction of this:
H1.struct.INV2_PHS_ND.190523.Pilot.Noc001.Heat47.n11.nii, where H1, Noc, and Heat47 are loaded in from a setup file.
EDIT: I tried to use awk in the following way:
reco=$(awk 'FNR==8 {print;exit}' $infofile | cut -d ' ' -f 3 | cut -d '_' -f 1)
date=$(awk 'FNR==2 {print;exit}' $infofile | cut -c 15-21)
group=$(awk 'FNR==6 {print;exit}' $infofile | cut -d '^' -f 2 | cut -d '_' -f 1 )
number=$(awk 'FNR==6 {print;exit}' $infofile | cut -d '^' -f 2 | cut -d '_' -f 2)
ScanNr=$(awk 'FNR==14 {print;exit}' $infofile | cut -d ' ' -f 3)
which again gave me the correct output when echoing the variables individually, but not when I tried to combine them: .niit47.n11022_PHS_ND.
I used echo "$struct" | tr -dc '[:print:]' | od -c to see if there were hidden characters due to line endings, which resulted in:
0000000 I N V 2 _ P H S _ N D
0000013
EDIT: This is how the text file looks like:
Series UID: 1.3.12.2.1107.5.2.34.18923.2019052316005066316714852.0.0.0
Study date: 20190523
Study time: 153529.718000
Series date: 20190523
Series time: 160111.750000
Subject: MDC-0153,pilot_003^pilot_003
Subject birth date: 19970226
Series description: WIP944_mp2rage-0.75iso_TR5_INV1_PHS_ND
Image type: ORIGINAL\PRIMARY\P\ND
Manufacturer: SIEMENS
Model name: Investigational_Device_7T
Software version: syngo MR B17
Study id: 1
Series number: 5
Repetition time (ms): 5000
Echo time[1] (ms): 2.51
Inversion time (ms): 900
Flip angle: 7
Number of averages: 1
Slice thickness (mm): 0.75
Slice spacing (mm):
Image columns: 320
Image rows: 320
Phase encoding direction: ROW
Voxel size x (mm): 0.75
Voxel size y (mm): 0.75
Number of volumes: 1
Number of slices: 240
Number of files: 240
Number of frames: 0
Slice duration (ms) : 0
Orientation: sag
PixelBandwidth: 248
I have one of these for each nifti file. subStudy is hardcoded in a setup file, which is loaded in prior to running the for loop. When I echo this, it shows the correct value. I need to change the names of multiple files with a specific prefix, which are stored in $reco.
As confirmed in comments, the input files have DOS carriage returns, which are basically invalid in Unix files. Also, you should pay attention to proper quoting.
As a general overhaul, I would recommend replacing the entire Bash script with a simple Awk script, which is both simpler and more idiomatic.
for infofile in ./*.txt; do # no need to use ${PWD}
# Pre-filter with a simple grep; skip files without the WIP944 prefix
grep -q '^Series description: [^ _]*WIP944' "$infofile" || continue
# Still here? Means we want to rename
suffix="$(awk -F : '
BEGIN { n = split("Series description:Series date:Subject:Series number", k, /:/)
for (i = 1; i <= n; i++) f[k[i]] = 1 } # index f by field name so "in" works
{ sub(/\r$/, "") } # get rid of pesky DOS carriage return
$1 in f { x[$1] = substr($0, length($1)+3) } # value after ": "
END {
split(x["Series description"], t, /_/); reco = t[1]
struct = t[4] "_" t[5]; if (t[6] != "") struct = struct "_" t[6]
date = substr(x["Series date"], 3, 6) # YYYYMMDD -> YYMMDD
split(x["Subject"], t, /\^/); split(t[2], tt, /_/); group = tt[1]
number = tt[2]
ScanNr = x["Series number"]
### FIXME: protocol and paradigm are still undefined
print struct "." date "." group "." protocol number "." paradigm ".n" ScanNr
}' "$infofile")"
echo mv "${infofile%_info.txt}.nii" "$subStudy.struct.$suffix.nii"
done
This probably still requires some tweaking (at least "protocol" and "paradigm" are still undefined). Once it seems to print the correct values, you can remove the echo before mv and have it actually rename files for you.
(It is probably still best to test on a copy of your real data files first!)
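To see why a carriage return scrambles the concatenated name: a terminal treats \r as "return to column 0", so whatever is printed after it overwrites the start of the line. A small sketch of stripping it in bash; the helper name strip_cr is hypothetical, and the sample value is taken from the question:

```shell
#!/usr/bin/env bash
# strip_cr (hypothetical helper): remove every carriage return from a string.
strip_cr() {
    printf '%s' "${1//$'\r'/}"
}

struct=$'INV1_PHS_ND\r'          # value as read from a DOS-format file
echo "struct.${struct}.190523"   # on a terminal, ".190523" overwrites the start of the line
struct=$(strip_cr "$struct")
echo "struct.${struct}.190523"   # now prints struct.INV1_PHS_ND.190523
```

To convert a whole file instead of a single variable, tr -d '\r' < input > output does the same job.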

How to search a string in a file and print the line shell script

I have a file where there is a lot of books with index number.
I want to search the books with index number.
The file format is kind of like this:
"The Declaration of Independence of the United States of America,
1
by Thomas Jefferson"
......................
Alice's Adventures in Wonderland, by Lewis Carroll
11
#!/bin/bash
echo "Enter the content your are searching for:"
read content
echo -e "\nResult Shwoing For: $content\n"
grep $content GUTINDEX.ALL
If the user searches for 1, this code prints every line that contains a 1, such as the lines for 1 and 11. I want to print only the line whose index is exactly 1:
"The Declaration of Independence of the United States of America, 1
Simply use the -w flag; read more with grep --help.
grep -w "${line_number}" "${file_name}"
For example, grep -w 1 books prints:
The Declaration of Independence of the United States of America 1
Bobs's 1 in Wonderland, by Lewis Carroll 11
It may also catch book names that contain the number as a separate word,
so it is better to anchor the pattern with $, for example 1$, to match
the index only at the end of the line.
grep -w "${line_number}\$" "${file_name}"
For example, grep -w '1$' books prints:
The Declaration of Independence of the United States of America, 1
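A short, self-contained check of combining -w with an end-of-line anchor, using a throwaway file. The file contents and the function name find_index are invented for illustration:

```shell
#!/usr/bin/env bash
# find_index (hypothetical helper): print lines of FILE whose last word is exactly NUMBER.
find_index() {
    local number=$1 file=$2
    # -w: match whole words only; \$ anchors the match at the end of the line
    grep -w -- "${number}\$" "$file"
}

books=$(mktemp)
printf '%s\n' \
    'The Declaration of Independence, by Thomas Jefferson 1' \
    'Chapter 1 of Something, by Someone 11' \
    "Alice's Adventures in Wonderland, by Lewis Carroll 11" > "$books"

find_index 1 "$books"    # prints only the Declaration line
rm -f "$books"
```

The sketch assumes the search term is a plain number; a term containing regex metacharacters would need escaping first.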
You need to use a regex with anchors. Use grep -E (egrep is deprecated).
file:
1
11
111
If you want to match only 1, you can use
grep -E '^1$' file # the line must start and end with 1
If the number is only one field of the line, you need to extend the script. For example
file.txt
abc,1
abd,111
abf,11111
#
while read -r line; do
if echo "$line" | awk -F, '{print $2}' | grep -q '^1$'; then
echo "$line"
fi
done < file.txt
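The loop above starts several processes per input line. As a hedged alternative, the same field comparison can be sketched with a single awk call; the function name lines_with_index is hypothetical:

```shell
#!/usr/bin/env bash
# lines_with_index (hypothetical helper): print lines of a comma-separated FILE
# whose second field is exactly NUMBER.
lines_with_index() {
    local number=$1 file=$2
    # Appending "" forces a string comparison, so "01" does not match "1"
    awk -F, -v n="$number" '($2 "") == (n "")' "$file"
}

sample=$(mktemp)
printf '%s\n' 'abc,1' 'abd,111' 'abf,11111' > "$sample"
lines_with_index 1 "$sample"    # prints abc,1
rm -f "$sample"
```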

Tabulate part of text file written by shell

I have a shell script that is writing(echoing) the output on an array to a file. The file is in the following format
The tansaction detials for today are 35
Please check the 5 biggest transactions below
-----------------------------------------------------------------------------------
Client Name,Account Number,Amount,Tran Time
Michael Press,20484,602117,11.41.02
Adam West,164121,50152,11.41.06
John Smith,15113,411700,11.41.07
Leo Anderson,2115116,350056,11.41.07
Wayne Clark,451987,296503,11.41.08
And I have multiple such lines.
How do I tabulate the names after ---?
I tried using spaces while echoing the array elements, and I also tried tabs. I tried column with the -t and -s options, but the text above the --- interferes with the desired output.
The desired output is
The tansaction detials for today are 35
Please check the 5 biggest transactions below
-----------------------------------------------------------------------------------
Client Name   Account Number   Amount  Tran Time
Michael Press 20484            602117  11.41.02
Adam West     164121           50152   11.41.06
John Smith    15113            411700  11.41.07
Leo Anderson  2115116          350056  11.41.07
Wayne Clark   451987           296503  11.41.08
The printing to a file is part of a bigger script, so I am looking for a simple solution to plug into this script.
Here's the snippet from that script where I am echoing to the file.
echo "The tansaction detials for today are 35 " >> log.txt
echo "" >> log.txt
echo " Please check the 5 biggest transactios below " >> log.txt
echo "" >> log.txt
echo "-----------------------------------------------------------------------------------" >> log.txt
echo "" >> log.txt
echo "" >> log.txt
echo "Client Name,Account Number,Amount,Tran Time" >> log.txt
array=( `output from a different script` )
x=1
for i in ${array[@]}
do
#echo "Array $x - $i"
Clientname=$(echo $i | cut -f1 -d',')
accountno=$(echo $i | cut -f2 -d',')
amount=$(echo $i | cut -f3 -d',')
trantime=$(echo $i | cut -f4 -d',')
echo "$Clientname,$accountno,$amount,$trantime" >> log.txt
(( x=$x+1 ))
done
I'm not sure I understand everything =P
but to answer this question:
How do I tabulate the names after ---?
echo -e "Example1\tExample2"
-e means: enable interpretation of backslash escapes
So for your output, I suggest:
echo -e "$Clientname\t$accountno\t$amount\t$trantime" >> log.txt
Edit: If you need more space, you can double or triple the tabs:
echo -e "Example1\t\tExample2"
If I understand your question, in order to produce the output format of:
Client Name   Account Number   Amount  Tran Time
Michael Press 20484            602117  11.41.02
Adam West     164121           50152   11.41.06
John Smith    15113            411700  11.41.07
Leo Anderson  2115116          350056  11.41.07
Wayne Clark   451987           296503  11.41.08
you should use the output formatting provided by printf instead of echo. For example, for the headings, you can use:
printf "%-14s%-17s%-8s%s\n" "Client Name" "Account Number" "Amount" "Tran Time" >> log.txt
instead of:
echo "Client Name,Account Number,Amount,Tran Time" >> log.txt
For writing the five largest amounts and details, you could use:
printf "%-14s%-17s%-8s%s\n" "$Clientname" "$accountno" "$amount" "$trantime" >> log.txt
instead of:
echo "$Clientname,$accountno,$amount,$trantime" >> log.txt
If that isn't what you are needing, just drop a comment and let me know and I'm happy to help further.
(you may have to tweak the field widths a bit, I just did a rough count)
True Tabular Output Requires Measuring Each Field
If you want to ensure that your data is always in tabular form, you need to measure the width of each field (including its heading) and use the maximum as the field width for your output. Below is an example of how that can be done (with a here-document simulating the input from your other program):
#!/bin/bash
ofn="log.txt" # set output filename
# declare variables as array and integer types
declare -a line_arr hdg name acct amt trn tmp
declare -i nmx=0 acmx=0 ammx=0 tmx=0
# set heading array (so you can measure lengths)
hdg=( "Client Name"
"Account Number"
"Ammount"
"Tran Time" )
## set the initial max based on headings
nmx="${#hdg[0]}" # max name width
acmx="${#hdg[1]}" # max account width
ammx="${#hdg[2]}" # max ammount width
tmx="${#hdg[3]}" # max tran width
{ IFS=$'\n' # your array=( `output from a different script` )
line_arr=($(
cat << EOF
Michael Press,20484,602117,11.41.02
Adam West,164121,50152,11.41.06
John Smith,15113,411700,11.41.07
Leo Anderson,2115116,350056,11.41.07
Wayne Clark,451987,296503,11.41.08
EOF
)
)
}
# write heading to file
cat << EOF > "$ofn"
The tansaction detials for today are 35
Please check the 5 biggest transactions below
-----------------------------------------------------------------------------------
EOF
# read line array into tmp, compare to max field widths
{ IFS=','
for i in "${line_arr[@]}"; do
tmp=( $(printf "%s" "$i") )
((${#tmp[0]} > nmx )) && nmx=${#tmp[0]}
((${#tmp[1]} > acmx )) && acmx=${#tmp[1]}
((${#tmp[2]} > ammx )) && ammx=${#tmp[2]}
((${#tmp[3]} > tmx )) && tmx=${#tmp[3]}
name+=( "${tmp[0]}" ) # fill name array
acct+=( "${tmp[1]}" ) # fill account num array
amt+=( "${tmp[2]}" ) # fill amount array
trn+=( "${tmp[3]}" ) # fill tran array
done
}
printf "%-*s  %-*s  %-*s  %s\n" "$nmx" "${hdg[0]}" "$acmx" "${hdg[1]}" \
"$ammx" "${hdg[2]}" "${hdg[3]}" >> "$ofn"
for ((i = 0; i < ${#name[@]}; i++)); do
printf "%-*s  %-*s  %-*s  %s\n" "$nmx" "${name[i]}" "$acmx" "${acct[i]}" \
"$ammx" "${amt[i]}" "${trn[i]}" >> "$ofn"
done
(you can remove the extra space between each field in the final two printf statements if you only want a single space between them -- looked better with 2 to me)
Output to log.txt
$ cat log.txt
The tansaction detials for today are 35
Please check the 5 biggest transactions below
-----------------------------------------------------------------------------------
Client Name    Account Number  Ammount  Tran Time
Michael Press  20484           602117   11.41.02
Adam West      164121          50152    11.41.06
John Smith     15113           411700   11.41.07
Leo Anderson   2115116         350056   11.41.07
Wayne Clark    451987          296503   11.41.08
Look things over and let me know if you have any questions.
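The same measure-then-print idea can also be sketched more compactly by letting awk do both passes. This is an alternative under stated assumptions, not the method of the answer above: the function name tabulate_csv is hypothetical, and fields are assumed to contain no embedded commas.

```shell
#!/usr/bin/env bash
# tabulate_csv (hypothetical helper): read comma-separated lines on stdin and
# print them as left-aligned columns separated by two spaces.
tabulate_csv() {
    awk -F, '
        {
            # first pass: remember every cell and the widest cell per column
            for (i = 1; i <= NF; i++) {
                cell[NR, i] = $i
                if (length($i) > width[i]) width[i] = length($i)
            }
            nf[NR] = NF
        }
        END {
            # second pass: pad every column except the last one
            for (r = 1; r <= NR; r++) {
                line = ""
                for (i = 1; i < nf[r]; i++)
                    line = line sprintf("%-" width[i] "s  ", cell[r, i])
                print line cell[r, nf[r]]
            }
        }'
}

printf '%s\n' 'Client Name,Account Number,Amount,Tran Time' \
    'Michael Press,20484,602117,11.41.02' \
    'Adam West,164121,50152,11.41.06' | tabulate_csv
```

In the larger script, the free-text lines above the dashes would still be written with echo, and only the heading plus the data lines would be piped through the function before being appended to log.txt.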

bash grep -e to array in a loop

I have a text with repeated data patterns, and grep keeps getting all matches without stop.
for ((count = 1; count !=17; count++)); do # 17 times
xuz1[count]=`grep -e "1 O1" $out_file | cut -c10-29`
xuz2[count]=`grep -e "2 O2" $out_file | cut -c10-29`
xuz3[count]=`grep -e "3 O3" $out_file | cut -c10-29`
echo ${xuz1[count]}
echo ${xuz2[count]}
echo ${xuz3[count]}
done
data looks like:
some text.....
Text....
.....
1 O1 111111 111111 111111
2 O2 222211 222211 222211
3 O3 643653 652346 757686
some text.....
1 O1 111122 111122 111122
2 O2 222222 222222 222222
3 O3 343653 652346 757683
some text.....
1 O1 111333 111333 111333
2 O2 222333 222333 222333
3 O3 343653 652346 757684
.
.
.
And result I'm getting:
xuz1[1] = 111111 111111 111111
xuz2[1] = 222211 222211 222211
xuz3[1] = 643653 652346 757686
xuz1[2] = 111111 111111 111111
xuz2[2] = 222211 222211 222211
xuz3[2] = 643653 652346 757686
...
looking for result like this:
xuz1[1]=111111 111111 111111
xuz2[1]=222211 222211 222211
xuz3[1]=343653 652346 757683
xuz1[2]=111122 111122 111122
xuz2[2]=222222 222222 222222
xuz3[2]=343653 652346 757684
I also tried "grep -m 1 -e".
Which way should I go?
For now I have ended up with one line:
grep -A4 -e "1 O1" $out_file | cut -c10-29
The "some text....." part is a huge block of text.
A little bash script with a single grep is enough
grep -E '^[0-9]+ +O[0-9]+ +' "$out_file" |
while read -r idx oidx cols; do
if ((idx == 1)); then
let ++i
name=xuz$i
let j=1
fi
echo "$name[$j]=$cols"
let ++j
done
You haven't really described what you want, but I guess something like this.
awk '! /^[1-9][0-9]* O[0-9] / { n++; m=0; if (NR>1) print ""; next }
{ print "xuz" ++m "[" n "]=" substr($0, 10) }' "$out_file"
If the regex doesn't match, we assume we are looking at one of the "some text" pieces, and that this starts a new record. Increment n and reset m. Otherwise, print the output for this item within this record.
If some text could be more than one line, you will need a minor change, but I hope this should be enough at least to send you in the right direction.
You can do this in pure Bash, too, though this is going to be highly inefficient - you would expect a Bash while read loop to be at least a hundred times slower than Awk, and the code is markedly less idiomatic and elegant.
while read -r m x result; do
case $m::$x in
[1-9]::O[1-9])
printf 'xuz%d[%d]=%s\n' "$m" "$n" "$result";;
*)
# If n is unset, don't print an empty line
printf '%s' "${n+$'\n'}"
let n++;;
esac
done <"$out_file"
I would aggressively challenge any requirement to do this in pure Bash. If it's for homework, the requirement is unrealistic, and a core skill for shell script authors is to understand the limits of the shell and the strengths of the common support tools like Awk. The Awk language is virtually guaranteed to be available wherever you have a shell, in particular a heavy shell like Bash. (In a limited e.g. embedded environment, a limited shell like Dash would make more sense. Then e.g. the let keyword won't be available, though it should not be hard to make this script properly portable.)
The case statement accepts glob patterns, not regular expressions, so the pattern here is slightly less general (we accept one positive digit in the first field).
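If full regular expressions are wanted in pure Bash, the [[ ... =~ ... ]] operator is one option. A minimal sketch, assuming Bash 3.2 or later; the function name is_data_row is hypothetical:

```shell
#!/usr/bin/env bash
# is_data_row (hypothetical helper): succeed if the two arguments look like
# "<positive integer>" and "O<digits>", e.g. 12 and O3.
is_data_row() {
    # Keeping the regex in a variable avoids quoting surprises inside [[ ]]
    local re='^[1-9][0-9]*[[:space:]]+O[0-9]+$'
    [[ "$1 $2" =~ $re ]]
}

is_data_row 12 O3 && echo "data row"
is_data_row some text || echo "not a data row"
```

Unlike the glob pattern in the case statement, this accepts multi-digit numbers in both fields.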
Thank you all for participating in the discussion.
*** This is a home project to help my wife extract data from research calculations; the speed-up is around 400 times ***
The file the data is extracted from contains around 2000 lines.
The needed data blocks look like this,
and they are repeated 10-20 times in the file.
uiyououy COORDINATES
NR ATOM CCCCC X Y Z
1 O1 8.00 0.000000000 0.882236820 -0.789494235
2 O2 8.00 0.000000000 -1.218250722 -1.644061652
3 O3 8.00 0.000000000 1.218328524 0.400260050
4 O4 8.00 0.000000000 -0.882314622 2.033295837
Text text text text
tons of text
To extract the 4 lines I used the expression below:
# grep 4 lines at once to txt
grep -A4 --no-group-separator -e "1 O1" $from_file | cut -c23-64 > xyz_temp.txt
# del empty lines from xyz txt
sed -i '/^[ \t]*$/d' xyz_temp.txt
Next is to convert the strings into numbers (bc -l is used for the arithmetic):
while IFS= read line
do
IFS=' ' read -r -a arr_line <<< "$line"
# break line of xyz into 3 numbers
s1=$(echo "${arr_line[0]}" \* 0.529177249 | bc -l)
# some math convertion
s2=$(echo "${arr_line[1]}" \* 0.529177249 | bc -l)
s3=$(echo "${arr_line[2]}" \* 0.529177249 | bc -l)
#-------to array non sorted ------------
arr[$n]=${n}";"${from_file}";"${gd_}";"${frt[count_4s]}";"${n4}";"${s1}";"${s2}";"${s3}
echo ${arr[n]}
#--------------------------------------------
done <"$from_file_txt"
Sort the array:
IFS=$'\n' sorted=($(sort -t \; -k4 -k5 -g <<<"${arr[*]}"))
# -t separator ';' -k column -g generic * to get new line output
#-k4 -k5 sort by column 4 then5
#printf "%s\n" "${sorted[*]}"
unset IFS
The last part combines the data into the result view:
echo "$n"
n2=1
n42=1
count_4s2=1
i=0
echo "============================== sorted =============================="
################### loop for empty 4s lines
printf "%s" ";" ";" ";" ";" ";" "${count_4s2}" ";"
printf "%s\n"
printf "%s\n" "${sorted[i]}"
while [ $i -lt $((n-2)) ]
do
i=$((i+1))
if [ "$n42" = "4" ] # 1234
then n42=0
count_4s2=$((count_4s2+1))
printf "%s" ";" ";" ";" ";" ";" "${count_4s2}" ";"
printf "%s\n"
fi
#--------------------------------------------
n2=$((n2+1))
n42=$((n42+1))
printf "%s\n" "${sorted[i]}"
done ############# while
#00000000000000000000000000000000000000
printf "%s\n"
echo ==END===END===END==
Output looks like this
============================== sorted ==============================
;;;;;1;
17;A-13_A1+.out;1.3;0.4;1;0;.221176355474853043;-.523049776514580244
18;A-13_A1+.out;1.3;0.4;2;0;-.550350051428402955;-.734584881824005358
19;A-13_A1+.out;1.3;0.4;3;0;.665269869069959489;.133910683627893251
20;A-13_A1+.out;1.3;0.4;4;0;-.336096173116409577;1.123723974181515102
;;;;;2;
13;A-13_A1+.out;1.3;0.45;1;0;.279265277182782148;-.504490787956469897
14;A-13_A1+.out;1.3;0.45;2;0;-.583907412327951988;-.759310392973448167
15;A-13_A1+.out;1.3;0.45;3;0;.662538493711206290;.146829200993661293
16;A-13_A1+.out;1.3;0.45;4;0;-.357896358566036450;1.116971979936256771
;;;;;3;
9;A-13_A1+.out;1.3;0.5;1;0;.339333719743262501;-.482029749553797105
10;A-13_A1+.out;1.3;0.5;2;0;-.612395507070451545;-.788968880150283253
11;A-13_A1+.out;1.3;0.5;3;0;.658674809217196345;.163289820251690233
12;A-13_A1+.out;1.3;0.5;4;0;-.385613021360830052;1.107708808923212876
==END===END===END==
*Note: some code might not be shown here.
The next step is to paste it into Excel with ; as the separator.

Bash script, command - output to array, then print to file

I need advice on how to achieve this output:
myoutputfile.txt
Tom Hagen 1892
State: Canada
Hank Moody 1555
State: Cuba
J.Lo 156
State: France
output of mycommand:
/usr/bin/mycommand
Tom Hagen
1892
Canada
Hank Moody
1555
Cuba
J.Lo
156
France
I'm trying to achieve this with the following shell script:
IFS=$'\r\n' GLOBIGNORE='*' :; names=( $(/usr/bin/mycommand) )
for name in ${names[@]}
do
#echo $name
echo ${name[0]}
#echo ${name:0}
done
Thanks
Assuming you can always rely on the command to output groups of 3 lines, one option might be
/usr/bin/mycommand |
while read name;
read year;
read state; do
echo "$name $year"
echo "State: $state"
done
An array isn't really necessary here.
One improvement could be to exit the loop if you don't get all three required lines:
while read name && read year && read state; do
# Guaranteed that name, year, and state are all set
...
done
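A complete, runnable version of that loop might look like the sketch below. The function name format_records is hypothetical, and printf stands in for /usr/bin/mycommand, which is not available here:

```shell
#!/usr/bin/env bash
# format_records (hypothetical helper): read groups of three lines
# (name, year, state) from stdin and print them in the requested layout.
format_records() {
    local name year state
    # The loop stops as soon as one of the three reads fails,
    # so an incomplete trailing group is silently dropped
    while IFS= read -r name && IFS= read -r year && IFS= read -r state; do
        printf '%s %s\nState: %s\n' "$name" "$year" "$state"
    done
}

# Stand-in for the real command output
printf '%s\n' 'Tom Hagen' 1892 Canada 'Hank Moody' 1555 Cuba | format_records
```

With the real command, the call would be /usr/bin/mycommand | format_records > myoutputfile.txt.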
An easy one-liner (not tuned for performance):
/usr/bin/mycommand | xargs -d '\n' -L3 printf "%s %s\nState: %s\n"
It reads 3 lines at a time from the pipe and then passes them to a new instance of printf which is used to format the output.
If you have whitespace at the beginning (it looks like that in your example output), you may need to use something like this:
/usr/bin/mycommand | sed -e 's/^\s*//g' | xargs -d '\n' -L3 printf "%s %s\nState: %s\n"
#!/bin/bash
COUNTER=0
/usr/bin/mycommand | while read LINE
do
if [ $COUNTER = 0 ]; then
NAME="$LINE"
COUNTER=$(($COUNTER + 1))
elif [ $COUNTER = 1 ]; then
YEAR="$LINE"
COUNTER=$(($COUNTER + 1))
elif [ $COUNTER = 2 ]; then
STATE="$LINE"
COUNTER=0
echo "$NAME $YEAR"
echo "State: $STATE"
fi
done
chepner's pure bash solution is simple and elegant, but slow with large input files (loops in bash are slow).
Michael Jaros' solution is even simpler, if you have GNU xargs (verify with xargs --version), but also does not perform well with large input files (external utility printf is called once for every 3 input lines).
If performance matters, try the following awk solution:
/usr/bin/mycommand | awk '
{ ORS = (NR % 3 == 1 ? " " : "\n")
gsub("^[[:blank:]]+|[[:blank:]]*\r?$", "") }
{ print (NR % 3 == 0 ? "State: " : "") $0 }
' > myoutputfile.txt
NR % 3 gives the position of each input line within its group of 3 consecutive lines: it returns 1 for the 1st line, 2 for the 2nd, and 0(!) for the 3rd.
{ ORS = (NR % 3 == 1 ? " " : "\n") determines ORS, the output-record separator, based on that index: a space for line 1, and a newline for lines 2 and 3; the space ensures that line 2 is appended to line 1 with a space when using print.
gsub("^[[:blank:]]+|[[:blank:]]*\r?$", "") strips leading and trailing whitespace from the line - including, if present, a trailing \r, which your input seems to have.
{ print (NR % 3 == 0 ? "State: " : "") $0 } prints the trimmed input line, prefixed by "State: " only for every 3rd input line, and implicitly followed by ORS (due to use of print).
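To make the NR % 3 arithmetic concrete, this small sketch prints the value next to each input line. The function name show_group_index and the sample data are for illustration only:

```shell
#!/usr/bin/env bash
# show_group_index (hypothetical helper): prefix each stdin line with NR % 3.
show_group_index() {
    awk '{ print NR % 3, $0 }'
}

printf '%s\n' 'Tom Hagen' 1892 Canada 'Hank Moody' 1555 Cuba | show_group_index
# 1 Tom Hagen
# 2 1892
# 0 Canada
# 1 Hank Moody
# 2 1555
# 0 Cuba
```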
