Bash: Parse CSV with quotes, commas and newlines

Say I have the following csv file:
id,message,time
123,"Sorry, This message
has commas and newlines",2016-03-28T20:26:39
456,"It makes the problem non-trivial",2016-03-28T20:26:41
I want to write a bash command that will return only the time column. i.e.
time
2016-03-28T20:26:39
2016-03-28T20:26:41
What is the most straightforward way to do this? You can assume the availability of standard Unix utilities such as awk, gawk, cut, and grep.
Note the double quotes, which escape the embedded commas and newline characters and make trivial attempts such as
cut -d , -f 3 file.csv
futile.

As chepner said, you are encouraged to use a programming language that can parse CSV.
Here is an example in Python:
import csv

with open('a.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, quotechar='"')
    for row in reader:
        print(row[-1])  # row[-1] gives the last column
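The same logic also fits in a single shell command (a minimal sketch, assuming Python 3 is available and the sample is saved as file.csv):
python3 -c 'import csv, sys
for row in csv.reader(sys.stdin):
    print(row[-1])' < file.csv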

As said here, you can combine the GNU awk fix-up with a second awk pass:
gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file.csv \
| awk -F, '{print $NF}'
To remove only those newlines that are inside double-quoted strings, leaving the ones outside them alone, use GNU awk (for RT):
gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file
This works by splitting the file along " characters and removing newlines in every other block.
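On the sample file, this first stage rejoins the quoted record onto a single line (the embedded newline is removed outright, not replaced by a space):
id,message,time
123,"Sorry, This messagehas commas and newlines",2016-03-28T20:26:39
456,"It makes the problem non-trivial",2016-03-28T20:26:41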
Then use awk to split the columns and display the last one.
Output
time
2016-03-28T20:26:39
2016-03-28T20:26:41

CSV is a format that needs a proper parser (i.e. it can't be parsed with regular expressions alone). If you have Python installed, use its csv module instead of plain Bash.
If not, consider csvkit, which has a lot of powerful tools for processing CSV files from the command line.
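For instance, csvkit's csvcut selects columns by name; a minimal sketch (assuming csvkit is installed):
csvcut -c time file.csv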
See also:
https://unix.stackexchange.com/questions/7425/is-there-a-robust-command-line-tool-for-processing-csv-files

Another awk alternative, using FS. A record with an embedded newline has an unbalanced quote, so its first line splits into an even number of "-separated fields; !(NF%2) detects this and getline appends the continuation line before the record is processed:
$ awk -F'"' '!(NF%2){getline remainder; $0=$0 OFS remainder}
             NR>1{sub(/,/,"",$NF); print $NF}' file
2016-03-28T20:26:39
2016-03-28T20:26:41

I ran into something similar when attempting to deal with lspci -m output, but there the embedded newlines would need to be escaped first (though IFS=, should work here, since it abuses bash's quote evaluation).
Here's an example
f:13.3 "System peripheral" "Intel Corporation" "Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder" -r01 "Super Micro Computer Inc" "Device 0838"
And the only reasonable way I can find to bring that into bash is along the lines of:
# echo 'f:13.3 "System peripheral" "Intel Corporation" "Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder" -r01 "Super Micro Computer Inc" "Device 0838"' | { eval array=($(cat)); declare -p array; }
declare -a array='([0]="f:13.3" [1]="System peripheral" [2]="Intel Corporation" [3]="Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder" [4]="-r01" [5]="Super Micro Computer Inc" [6]="Device 0838")'
#
Not a full answer, but might help!

Vanilla bash script
Save this code as parse_csv.sh and give it execute permission (chmod +x parse_csv.sh):
#!/bin/bash
# vim: ts=4 sw=4 hidden nowrap
# #copyright Copyright © 2021 Carlos Barcellos <carlosbar at gmail.com>
# #license https://www.gnu.org/licenses/lgpl-3.0.en.html
if [ "$1" = "-h" -o "$1" = "--help" -o "$1" = "-v" ]; then
    echo "parse csv 0.1"
    echo ""
    echo "parse_csv.sh [csv file] [delimiter]"
    echo "    csv file     csv file to parse; default stdin"
    echo "    delimiter    delimiter to use. default is comma"
    exit 0
fi
delim=,
if [ $# -ge 1 ]; then
    [ -n "$1" ] && file="$1"
    [ -n "$2" -a "$2" != "\"" ] && delim="$2"
fi
processLine() {
    # fast path: no quotes on the line, just split on the delimiter
    if [[ ! "$1" =~ \" ]]; then
        (
            IFS="$delim"
            fields=($1)
            echo "${fields[@]}"
        )
        return 0
    fi
    # slow path: scan character by character, tracking quote state
    under_scape=0
    fields=()
    acc=
    for (( x=0; x < ${#1}; x++ )); do
        if [ "${1:x:1}" = "${delim:0:1}" -o $((x+1)) -ge ${#1} ] && [ $under_scape -ne 1 ]; then
            [ "${1:x:1}" != "${delim:0:1}" ] && acc="${acc}${1:x:1}"
            fields+=("$acc")
            acc=
        elif [ "${1:x:1}" = "\"" ]; then
            if [ $under_scape -eq 1 ] && [ "${1:x+1:1}" = "\"" ]; then
                acc="${acc}${1:x:1}"    # doubled quote: keep a literal "
            else
                under_scape=$((!under_scape))
            fi
            [ $((x+1)) -ge ${#1} ] && fields+=("$acc")
        else
            acc="${acc}${1:x:1}"
        fi
    done
    echo "${fields[@]}"
    return 0
}
while read -r line; do
    processLine "$line"
done < "${file:-/dev/stdin}"
Then use: parse_csv.sh "csv file". To print only the last column, change the echo "${fields[@]}" to echo "${fields[-1]}".

Perl to the rescue! Use the Text::CSV_XS module to handle CSV.
perl -MText::CSV_XS=csv -we 'csv(in => $ARGV[0],
on_in => sub { $_[1] = [ $_[1][-1] ] })
' -- file.csv
the csv subroutine processes the CSV
in specifies the input file; $ARGV[0] contains the first command-line argument, i.e. file.csv here
on_in specifies code to run for each row. It gets the current row as the second argument, i.e. $_[1]. We just set the whole row to the contents of its last column.
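Run against the sample file, this prints exactly the desired output:
time
2016-03-28T20:26:39
2016-03-28T20:26:41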

I think you are overthinking it.
$: echo time; grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}$' file
time
2016-03-28T20:26:39
2016-03-28T20:26:41
If you want to check for that comma, just to be sure:
$: echo time; sed -En '/,[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}$/{ s/.*,//; p; }' file
time
2016-03-28T20:26:39
2016-03-28T20:26:41

csvquote is designed for exactly this kind of thing. It sanitizes the file (reversibly) and allows awk to depend on commas being field separators and newlines being record separators.
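A minimal sketch of that workflow (assuming csvquote is installed; csvquote -u reverses the substitution afterwards):
csvquote file.csv | awk -F, '{print $NF}' | csvquote -u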

This simply skips any line containing This, so it only works for this exact sample:
awk -F, '!/This/{print $NF}' file
time
2016-03-28T20:26:39
2016-03-28T20:26:41

Related


Bash: Reading Output to a String with Special Characters

I'm using TShark to read TCP streams of a PCAP into a file of a set format. My code:
#!/bin/bash
OUT="*/temp/Temp.txt"
NEW="\"REQ:"
i=0
echo "Generating conversations..."
echo "" > $OUT
while [ "$COUNT" != 1 ]
do
    BLOCK="$(tshark -r */browser.pcap -q -z follow,tcp,ascii,$i)"
    SUB=$(echo "$BLOCK" | sed -n '5p')
    PORT=${SUB##*:}
    BLOCK="${BLOCK//$'\t'/\"RES:}"
    BLOCK=$(echo "$BLOCK" | tail -n +6)
    BLOCK=$(echo "$BLOCK" | head -n -1)
    COUNT=$(echo "$BLOCK" | wc -l)
    BLOCK=$(echo "$BLOCK" | awk '{print $j"\""}')
    j=1
    while [ $j -lt $(($COUNT+2)) ]
    do
        CHECK=$(echo "$BLOCK" | sed $j'q;d')
        PREF=${CHECK:0:5}
        if [ "$PREF" != "\"RES:" ]; then
            CHECK=$NEW$CHECK
            BLOCK=$(echo "$BLOCK" | sed $j's/.*/'$CHECK'/')
        fi
        j=$(($j+1))
    done
    if [ "$COUNT" != 1 ]; then
        echo "" >> $OUT
        echo "\$" >> $OUT
        echo "tag = \"gen."$i"\"" >> $OUT
        echo "port = \""$PORT"\"" >> $OUT
        echo "base = \"TCP\"" >> $OUT
        echo "payloads:" >> $OUT
        echo "$BLOCK" >> $OUT
        echo "Generated conversation "$i
    fi
    i=$(($i+1))
done
echo "Generation complete!"
echo "Generation complete!"
When I run this, I get the following error for each conversation read:
> sed: -e expression #1, char 18: unterminated `s' command
I believe the problem lies in the call to TShark on line 9. Originally I used the "raw" argument for the command, which outputs raw hex data. This worked and output correctly. However, my task requires outputting ASCII data. Changing "raw" to "ascii" (both recognized by TShark) causes the aforementioned errors. I believe this is because the ASCII data in the read packets contains special characters; a small piece of data generated by line 9 in command line is:
..7.<.......Y.|.$.......2...W...v.'#
My question is: are the special characters in the ASCII data I'm parsing causing the sed errors? If so, how could I make bash ignore them? Thanks!
Edit: I am ultimately trying to get the output of this TShark command, which looks like this...
===================================================================
Follow: tcp,raw
Filter: tcp.stream eq 4
Node 0: 10.211.55.3:58733
Node 1: 157.127.239.146:80
47455420687474703a2f2f73656d696e617270726f6a656374732e6f72672f6373732e7068703f7374796c6573686565743d393620485454502f312e310d0a486f73743a2073656d696e617270726f6a656374732e6f72670d0a557365722d4167656e743a204d6f7a696c6c612f352e3020285831313b204c696e7578207838365f36343b2072763a33382e3029204765636b6f2f32303130303130312046697265666f782f33382e300d0a4163636570743a20746578742f6373732c2a2f2a3b713d302e310d0a4163636570742d4c616e67756167653a20656e2d55532c656e3b713d302e350d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a526566657265723a20687474703a2f2f73656d696e617270726f6a656374732e6f72672f632f74736861726b2d666f6c6c6f772d7463702d73747265616d0d0a436f6f6b69653a205f5f6366647569643d646564613432383039663566623634356461663239333963366235336565653764313433373734383236323b206d7962625b6c61737476697369745d3d313433373734383333353b206d7962625b6c6173746163746976655d3d313433373734383333353b207369643d31663739303463373761383761656234363537306131636161316462336161310d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a0d0a
485454502f312e3120323030204f4b0d0a446174653a204672692c203234204a756c20323031352031343a33313a303420474d540d0a436f6e74656e742d547970653a20746578742f6373730d0a582d506f77657265642d42793a205048502f352e342e31360d0a5365727665723a20636c6f7564666c6172652d6e67696e780d0a43462d5241593a20323062303533396434326436313365332d4c41580d0a436f6e74656e742d456e636f64696e673a20677a69700d0a436f6e74656e742d4c656e6774683a203134320d0a4167653a20300d0a5669613a20312e31206e657070737730390d0a0d0a1f8b08000000000000036c8cbd0a03211084ebf52916ac13f2db689bcb6b04bd15919caeac060e42de3d981469325f37df305bcf4ee896436b2e067c2af06ebe47e14721837aba0eac8299171683faf88955e05928c8a6733578a82b365e12a1be9c063fefb977ceff27d511a5120d9eeb6a1564273195efe37e37aa970278030000ffff0300cc348afaa1000000
47455420687474703a2f2f7777772e676f6f676c652d616e616c79746963732e636f6d2f616e616c79746963732e6a7320485454502f312e310d0a486f73743a207777772e676f6f676c652d616e616c79746963732e636f6d0d0a557365722d4167656e743a204d6f7a696c6c612f352e3020285831313b204c696e7578207838365f36343b2072763a33382e3029204765636b6f2f32303130303130312046697265666f782f33382e300d0a4163636570743a202a2f2a0d0a4163636570742d4c616e67756167653a20656e2d55532c656e3b713d302e350d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a526566657265723a20687474703a2f2f73656d696e617270726f6a656374732e6f72672f632f74736861726b2d666f6c6c6f772d7463702d73747265616d0d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a49662d4d6f6469666965642d53696e63653a205468752c203039204a756c20323031352032333a35303a353620474d540d0a0d0a
485454502f312e3120333034204e6f74204d6f6469666965640d0a446174653a204672692c203234204a756c20323031352031343a33303a353520474d540d0a457870697265733a204672692c203234204a756c20323031352031353a35313a343120474d540d0a43616368652d436f6e74726f6c3a207075626c69632c206d61782d6167653d373230300d0a566172793a204163636570742d456e636f64696e670d0a436f6e6e656374696f6e3a20636c6f73650d0a5669613a20312e31206e657070737730390d0a0d0a
===================================================================
...into a custom format for a program to read. The above output is in the working raw hex data format. The custom format looks like this for the corresponding conversation:
$
tag = "gen.4"
port = "58733"
base = "TCP"
payloads:
"REQ:47455420687474703a2f2f73656d696e617270726f6a656374732e6f72672f6373732e7068703f7374796c6573686565743d393620485454502f312e310d0a486f73743a2073656d696e617270726f6a656374732e6f72670d0a557365722d4167656e743a204d6f7a696c6c612f352e3020285831313b204c696e7578207838365f36343b2072763a33382e3029204765636b6f2f32303130303130312046697265666f782f33382e300d0a4163636570743a20746578742f6373732c2a2f2a3b713d302e310d0a4163636570742d4c616e67756167653a20656e2d55532c656e3b713d302e350d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a526566657265723a20687474703a2f2f73656d696e617270726f6a656374732e6f72672f632f74736861726b2d666f6c6c6f772d7463702d73747265616d0d0a436f6f6b69653a205f5f6366647569643d646564613432383039663566623634356461663239333963366235336565653764313433373734383236323b206d7962625b6c61737476697369745d3d313433373734383333353b206d7962625b6c6173746163746976655d3d313433373734383333353b207369643d31663739303463373761383761656234363537306131636161316462336161310d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a0d0a"
"RES:485454502f312e3120323030204f4b0d0a446174653a204672692c203234204a756c20323031352031343a33313a303420474d540d0a436f6e74656e742d547970653a20746578742f6373730d0a582d506f77657265642d42793a205048502f352e342e31360d0a5365727665723a20636c6f7564666c6172652d6e67696e780d0a43462d5241593a20323062303533396434326436313365332d4c41580d0a436f6e74656e742d456e636f64696e673a20677a69700d0a436f6e74656e742d4c656e6774683a203134320d0a4167653a20300d0a5669613a20312e31206e657070737730390d0a0d0a1f8b08000000000000036c8cbd0a03211084ebf52916ac13f2db689bcb6b04bd15919caeac060e42de3d981469325f37df305bcf4ee896436b2e067c2af06ebe47e14721837aba0eac8299171683faf88955e05928c8a6733578a82b365e12a1be9c063fefb977ceff27d511a5120d9eeb6a1564273195efe37e37aa970278030000ffff0300cc348afaa1000000"
"REQ:47455420687474703a2f2f7777772e676f6f676c652d616e616c79746963732e636f6d2f616e616c79746963732e6a7320485454502f312e310d0a486f73743a207777772e676f6f676c652d616e616c79746963732e636f6d0d0a557365722d4167656e743a204d6f7a696c6c612f352e3020285831313b204c696e7578207838365f36343b2072763a33382e3029204765636b6f2f32303130303130312046697265666f782f33382e300d0a4163636570743a202a2f2a0d0a4163636570742d4c616e67756167653a20656e2d55532c656e3b713d302e350d0a4163636570742d456e636f64696e673a20677a69702c206465666c6174650d0a526566657265723a20687474703a2f2f73656d696e617270726f6a656374732e6f72672f632f74736861726b2d666f6c6c6f772d7463702d73747265616d0d0a436f6e6e656374696f6e3a206b6565702d616c6976650d0a49662d4d6f6469666965642d53696e63653a205468752c203039204a756c20323031352032333a35303a353620474d540d0a0d0a"
"RES:485454502f312e3120333034204e6f74204d6f6469666965640d0a446174653a204672692c203234204a756c20323031352031343a33303a353520474d540d0a457870697265733a204672692c203234204a756c20323031352031353a35313a343120474d540d0a43616368652d436f6e74726f6c3a207075626c69632c206d61782d6167653d373230300d0a566172793a204163636570742d456e636f64696e670d0a436f6e6e656374696f6e3a20636c6f73650d0a5669613a20312e31206e657070737730390d0a0d0a"
You can tell bash to not interpret metacharacters by quoting the variable expansion:
sed $j's/.*/'"$CHECK"'/'
In fact, there is no reason to use single quotes in the above, so you could just double-quote the entire command argument:
sed "${j}s/.*/$CHECK/"
However, neither of the above will tell sed to avoid interpreting special characters in the replacement part of the s command, so if $CHECK contains a /, then that will prematurely terminate the replacement.
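A quick hypothetical demonstration of that failure mode:
CHECK='a/b'
echo x | sed "1s/.*/$CHECK/"    # fails: unknown option to `s'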
So the question really is, is there a better way of accomplishing this:
BLOCK=$(echo "$BLOCK" | sed $j's/.*/'$CHECK'/')
Apparently, the goal is to replace line $j of the value of $BLOCK with the value of $CHECK. One way to do this, using awk:
BLOCK="$(awk -v repl="$CHECK" 'NR==$j{print repl;next}1')"
Notes:
Although I didn't fix it in my example, it is very bad style to use ALL CAPS for shell variables. Normally, shell variables in ALL CAPS are reserved for known exported variables used by bash or system utilities (e.g. $PATH, $IFS, $TERM). Your own variables should be lower-case to avoid conflicts.
The full loop that the command is excerpted from could probably be all implemented more efficiently and more cleanly (and more understandably) in awk. Based on the sample output, the following would probably work:
echo "Generating conversations..."
i=0
while
tshark -r */browser.pcap -q -z follow,tcp,ascii,$i |
awk -v idx=$i -v '
NR==4 { n = split($0, a, /:/); port = a[n]; }
NR<6 { next; }
/^=========/ { exit port != 0; }
port { print "$"
printf "tag = \"gen.%d\"" idx
print "port = \"%s\"" port
print "base = \"TCP\""
print "payloads:"
port = 0
}
/^\t/ { printf "\"RES:%s\"" substr($0, 2) "\""; next; }
{ printf "\"REQ:%s\"" $0 "\""; }
' >> $OUT;
do
echo "Generated conversation "$i
done
echo "Generation complete!"
I didn't try it. It may well be buggy. I don't understand the termination condition, so I just made a guess. I'm not sure if you really meant to extract the port number from line 5 (as in the code) or line 4 (as in the example.)

Using bash, separate servers into separate file depending on even or odd numbers

The output comes from a command I run from our netscaler. It outputs the following ... One thing to note is that the middle two numbers change but the even/odd criteria is always on the last digit. We never have more than 2 digits, so we'll never hit 10.
WC-01-WEB1
WC-01-WEB4
WC-01-WEB3
WC-01-WEB5
WC-01-WEB8
I need to populate two files called "even" and "odd". If I were dealing with bare numbers I could figure it out, but having the number inside a string is throwing me off.
Example code, but I'm missing the part where I match the string:
if [ $even_servers -eq 0 ]
then
    echo $line >> evenfile
else
    echo $line >> oddfile
fi
This is a simple awk command:
awk '/[02468]$/{print > "evenfile"}; /[13579]$/{print > "oddfile"}' input.txt
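Given the sample input above, that produces:
$ cat evenfile
WC-01-WEB4
WC-01-WEB8
$ cat oddfile
WC-01-WEB1
WC-01-WEB3
WC-01-WEB5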
There must be a better way.
How about this version:
for v in `cat <my_file>`; do
    export type=`echo $v | awk -F 'WEB' '{print $2%2}'`
    if [ $type -eq 0 ]; then
        echo $v >> evenfile
    else
        echo $v >> oddfile
    fi
done
I assume your list of servers is stored in the filename <my_file>. The basic idea is to tokenize on WEB using awk and process the chars after WEB to determine even-ness. Once this is known, we export the value to a variable type and use this to selectively dump to the appropriate file.
For the case when the name is the output of another command:
export var=`<another command>`; export type=`echo $var | awk -F 'WEB' '{print $2%2}'`; if [ $type -eq 0 ]; then echo $var >> evenfile ; else echo $var >> oddfile; fi;
Replace <another command> with your perl script.
As always, grep is your friend:
grep "[02468]$" input_file > evenfile
grep "[13579]$" input_file > oddfile
I hope this helps.

Variables from file

A text file has the following structure:
paa pee pii poo puu
baa bee bii boo buu
gaa gee gii goo guu
maa mee mii moo muu
Reading it line by line in a script is done with:
while read LINE; do
    ACTION
done < FILE
I'd need to get parameters 3 and 4 of each line into variables for ACTION. If this was manual input, $3 and $4 would do the trick. I assume awk is the tool, but I just can't wrap my head around the syntax. Halp?
read does this just fine. Pass it multiple variables and it will split on $IFS into that many fields.
while read -r one two three four five; do
action "$three" "$four"
done <file
I added the -r option because that is usually what you want. The default behavior is a legacy oddity of limited use.
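Applied to the sample file, with echo standing in for the real action:
$ while read -r one two three four five; do echo "$three $four"; done < FILE
pii poo
bii boo
gii goo
mii moo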
Thanks tripleee. In the meantime I managed a suitably versatile solution:
#!/bin/bash
if [ ! $1 ]; then
    echo "Which inputfile?"
    exit
elif [ ! $2 -o ! $3 ]; then
    echo "Two position parameters required"
    exit
fi
if [ -f outfile ]; then
    mv outfile outfile.old
fi
while read -a LINE; do
    STRING="${LINE[@]}"
    if [ "${LINE[$2-1]}" == "${LINE[$3-1]}" ]; then             # remove comment for strings
#   if [ "${LINE[$(($2-1))]}" -eq "${LINE[$(($3-1))]}" ]; then  # remove comment for integers
        echo $STRING >> outfile
    fi
done < $1

How to printf a variable length line in fixed length chunks?

I need to analyze (with grep) and print (with some formatting) the content of an app's log.
This log contains text data in lines of variable length. What I need is, after some grepping, to loop over each line of this output and print it with a maximum fixed length of 50 characters. If a line is longer than 50 chars, it should print a newline and then continue with the rest on the following line, and so on until the line is completed.
I tried to use printf to do this, but it's not working and I don't know why. It just outputs the lines in the same fashion as echo, without any consideration of the printf formatting, though the \t character (tab) works.
function printContext
{
    str="$1"
    log="$2"
    tmp="/tmp/deluge/$$"
    rm -f $tmp
    echo ""
    echo -e "\tLog entries for $str :"
    ln=$(grep -F "$str" "$log" &> "$tmp" ; cat "$tmp" | wc -l)
    if [ $ln -gt 0 ];
    then
        while read line
        do
            printf "\t%50s\n" "$line"
        done < $tmp
    fi
}
What's wrong? I know that I can write a substring routine to accomplish this task, but printf should be handy for stuff like this.
Instead of:
printf "\t%50s\n" "$line"
use
printf "\t%.50s\n" "$line"
to truncate your line to 50 characters only.
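Note that %.50s truncates at 50 characters rather than wrapping. If the rest of each long line should continue on following lines, as the question describes, the standard fold utility does the chunking; a minimal sketch:
fold -w 50 "$tmp" | sed 's/^/\t/'    # GNU sed turns \t into a tab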
I'm not sure about printf, but seeing as how Perl is installed everywhere, how about a simple one-liner?
echo $ln | perl -ne ' while( m/.{1,50}/g ){ print "$&\n" } '
Here's a clunky bash-only way to break the string into 50-character chunks:
# assumes the string to wrap is in $y
i=0
chars=50
while [[ -n "${y:$((chars*i)):$chars}" ]]; do
    printf "\t%s\n" "${y:$((chars*i)):$chars}"
    ((i++))
done
