How to split a string on delimiter and append a constant value to all elements - bash

I have a file with each line in below format
KeyA=ValA1,ValA2,ValA3...ValAn
KeyB=ValB1,ValB2,ValB3....ValBn
I have multiples lines in that file with varying number of values for each line.
My task is to print each value followed by its key, one pair per line. Expected sample output:
ValA1 KeyA
ValA2 KeyA
ValA3 KeyA
ValB1 KeyB
ValB2 KeyB
ValB3 KeyB
What I tried is :
while read -r line; do
KEY=$(echo $line | cut -d '=' -f 1)
VALUES=$(echo $line | cut -d '=' -f 2)
for VAL in $VALUES;do
echo $VAL $KEY
done
done < file.txt
I am able to achieve the expected output, but I am supposed to complete this without using the for loop.
Can someone suggest another solution?

Line-based text files should not be parsed with shell loops: the shell interprets a script one line at a time as it is read, which is extremely inefficient for bulk jobs. Use a dedicated text processor such as awk or perl instead.
awk -F'[=,]' '{k=$1; for(f=2;f<=NF;f++) print $f, k}' file
-F'[=,]' - Fields are delimited by a single comma/equals
{...} - with no condition, this action will be performed on every line
k=$1 - set k to Field 1
for(f=2;f<=NF;f++) - iterate over all remaining fields (NF = Number of Fields)
print $f, k - print the field, a space, and the value of k

I got this solution. First substitute = and , with a space. Then read each line with xargs and run a small script that saves the first argument (i.e. the key) and prints it while iterating over all the others:
<inputfile tr '[=,]' ' ' |
xargs -l sh -c 't="$1"; shift; printf "$t %s\n" "$@"' --
On my second try I did the following, where I don't substitute = with a space, so values that contain = don't get split up.
while IFS== read -r key vals; do
printf "%s" "$vals" |
xargs -d, printf "$key %s\n"
done <inputfile
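For comparison, the idea behind the second attempt (split on the first = only, so that values containing = survive) can also be expressed in a single awk call. This is a minimal sketch rather than a drop-in replacement; the function name split_pairs and the sample data are made up for illustration.

```shell
# split_pairs is a hypothetical helper name; it reads Key=V1,V2,... lines
# from the given files (or standard input) and prints "value key" pairs.
split_pairs() {
    awk '{
        eq = index($0, "=")                 # position of the first "="
        if (eq == 0) next                   # skip lines without a key
        key = substr($0, 1, eq - 1)
        n = split(substr($0, eq + 1), vals, ",")
        for (i = 1; i <= n; i++) print vals[i], key
    }' "$@"
}

# Sample data (invented): the second value contains an "=" on purpose.
pairs_out=$(printf '%s\n' 'KeyA=ValA1,x=y,ValA3' 'KeyB=ValB1' | split_pairs)
printf '%s\n' "$pairs_out"
```

Because the key is cut off with index and substr, only the remainder of the line is split on commas, and x=y comes through as a single value.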

Related

Error in bash script: arithmetic error

I wrote a simple script to extract text from a bunch of files (*.out) and add two lines at the beginning and a line at the end. Then I combine the extracted text with another file to create a new file. The script is here.
#!/usr/bin/env bash
#A simple bash script to extract text from *.out and create another file
for f in *.out; do
#In the following line, n is a number which is extracted from the file name
n=$(echo $f | cut -d_ -f6)
t=$((2 * $n ))
#To extract the necessary text/data
grep " B " $f | tail -${t} | awk 'BEGIN {OFS=" ";} {print $1, $4, $5, $6}' | rev | column -t | rev > xyz.xyz
#To add some text as the first, second and last lines.
sed -i '1i -1 2' xyz.xyz
sed -i '1i $molecule' xyz.xyz
echo '$end' >> xyz.xyz
#To combine the extracted info with another file (ea_input.in)
cat xyz.xyz ./input_ea.in > "${f/abc.out/pqr.in}"
done
./script.sh: line 4: (ls file*.out | cut -d_ -f6: syntax error: invalid arithmetic operator (error token is ".out) | cut -d_ -f6")
How I can correct this error?
In bash, when you use:
$(( ... ))
it treats the contents of the brackets as an arithmetic expression, returning the result of the calculation, and when you use:
$( ... )
it executes the contents of the brackets and returns the output.
So, to fix your issue, it should be as simple as replacing line 4 with:
n=$(ls $f | cut -d_ -f6)
This replaces the outer double brackets with single, and removes the additional brackets around ls $f which should be unnecessary.
The arithmetic error can be avoided by adding spaces between parentheses. You are already using var=$((arithmetic expression)) correctly elsewhere in your script, so it should be easy to see why $( ((ls "$f") | cut -d_ -f6)) needs a space. But the subshells are completely superfluous too; you want $(ls "$f" | cut -d_ -f6). Except ls isn't doing anything useful here, either; use $(echo "$f" | cut -d_ -f6). Except the shell can easily, albeit somewhat clumsily, extract a substring with parameter substitution; "${f#*_*_*_*_*_}". Except if you're using Awk in your script anyway, it makes more sense to do this - and much more - in Awk as well.
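The three mechanisms discussed here (command substitution, arithmetic expansion and parameter expansion) can be compared side by side in a short sketch. The file name below is invented; it only assumes that the number sits in the sixth _-separated field, as in the question.

```shell
# Hypothetical file name with the number in the sixth _-separated field.
f='run_a_b_c_d_12_abc.out'

# Command substitution: run a pipeline and capture its output.
n=$(printf '%s\n' "$f" | cut -d_ -f6)

# Arithmetic expansion: evaluate an expression.
t=$((2 * n))

# Parameter expansion only, no external commands:
rest=${f#*_*_*_*_*_}    # drop the first five fields and their underscores
n2=${rest%%_*}          # drop everything from the next underscore on

printf 'n=%s t=%s n2=%s\n' "$n" "$t" "$n2"
```

The parameter-expansion variant needs two steps because the pattern only removes the leading fields; the trailing part has to be trimmed separately.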
Here is an attempt at refactoring most of the processing into Awk.
for f in *.out; do
awk 'BEGIN {OFS=" " }
# Extract 6th _-separated field from input filename
FNR==1 { split(FILENAME, f, "_"); t=2*f[6] }
# If input matches regex, add to array b
/ B / { b[++i] = $1 OFS $4 OFS $5 OFS $6 }
# If array size reaches t, start overwriting old values
i==t { i=0; m=t }
END {
# Print two prefix lines
print "$molecule"; print -1, 2;
# Handle array smaller than t: print from index 1
if (!m) { m=i; i=0 }
# Print starting from oldest values (index i + 1)
for(j=1; j<=m; j++) {
# Wrap to beginning of array at end
if(i+j > t) i-=t
print b[i+j]; }
print "$end" }' "$f" |
rev | column -t | rev |
cat - ./input_ea.in > "${f/foo.out/bar.in}"
done
Notice also how we avoid using a temporary file (this would certainly have been avoidable without the Awk refactoring, too) and how we take care to quote all filename variables in double quotes.
The array b contains (up to) the latest t values from matching lines; we collect these into an array which is constrained to never contain more than t values by wrapping the index i back to the beginning of the array when we reach index t. This "circular array" avoids keeping too many values in memory, which would make the script slow if the input file contains many matches.
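The circular-array idea can be isolated into a smaller sketch. This is a simplified variant, not the script above: it uses modulo indexing instead of resetting the index, and the helper name last_matches and the sample lines are invented for illustration.

```shell
# last_matches is a hypothetical helper: print the last COUNT lines that
# match the regular expression PATTERN. COUNT must be at least 1.
last_matches() {
    count=$1
    pattern=$2
    shift 2
    awk -v t="$count" -v pat="$pattern" '
        $0 ~ pat { buf[n % t] = $0; n++ }      # overwrite the oldest slot
        END {
            start = (n > t) ? n - t : 0        # oldest entry still stored
            for (j = start; j < n; j++) print buf[j % t]
        }' "$@"
}

ring_out=$(printf '%s\n' 'a B 1' 'skip' 'a B 2' 'a B 3' 'a B 4' | last_matches 2 ' B ')
printf '%s\n' "$ring_out"
```

At most COUNT lines are held in memory at any time, which is the property the paragraph above describes.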

bash string manipulation - regex match with delimiter

I have a string like this:
zone=INTERNET|status=good|routed=special|location=001|resp=user|switch=not set|stack=no|dswres=no|CIDR=10.10.10.0/24|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=enable|inheritPingBeforeAssign=true|locationInherited=true|gateway=10.10.10.100|inheritDefaultDomains=true|inheritDefaultView=true|inheritDNSRestrictions=true|name=SCB-INET-A
The order inside the delimiter | can be random - that means the key-value pairs can be randomly ordered in the string.
I want an output string like the following:
"INTERNET","10.10.10.0/24","SCB-INET-A"
All values in the output are values from the key-value string above
Does anyone know how I can solve this with awk or sed?
Given your input is a variable var:
var="zone=INTERNET|status=good|routed=special|location=001|resp=user|switch=not set|stack=no|dswres=no|CIDR=10.10.10.0/24|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=enable|inheritPingBeforeAssign=true|locationInherited=true|gateway=10.10.10.100|inheritDefaultDomains=true|inheritDefaultView=true|inheritDNSRestrictions=true|name=SCB-INET-A"
echo "$var" | tr "|" "\n" | sed -n -r "s/(zone|name|gateway)=(.*)/\"\2\"/p"
"INTERNET"
"10.10.10.100"
"SCB-INET-A"
Using another 2 pipes inserts commas and removes line breaks:
SOFAR | tr "\n" "," | sed 's/,$//'
"INTERNET","10.10.10.100","SCB-INET-A"
Whenever you have name -> value pairs in your input the best approach is to create an array of those mappings (f[] below) and then access the values by their names:
$ cat tst.awk
BEGIN { RS="|"; FS="[=\n]"; OFS="," }
{ f[$1] = "\"" $2 "\"" }
END { print f["zone"], f["CIDR"], f["name"] }
$ awk -f tst.awk file
"INTERNET","10.10.10.0/24","SCB-INET-A"
The above will work efficiently (i.e. literally orders of magnitude faster than a shell loop) and portably using any awk in any shell on any UNIX box, unlike all of the other answers so far, which all rely on non-POSIX functionality. Unlike some of the other answers, it does full string matching instead of partial regexp matching, so it is extremely robust and will not produce bad output given partial matches. It also will not interpret any input characters (e.g. escape sequences and/or globbing chars), as some of the other answers do, and instead will just robustly reproduce them as-is in the output.
If you need to enhance it to print any extra field values just add them as , f["<field name>"] to the print statement and if you need to change the output format or do anything else it's all absolutely trivial too.
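As one possible way of making the list of wanted fields a parameter, the same name-to-value mapping can be wrapped in a shell function. This is only a sketch under assumptions: the helper name pick_keys and its comma-separated key argument are invented here, and the input is assumed to be a single line on standard input.

```shell
# pick_keys is a hypothetical helper: read one "k=v|k=v|..." string on
# standard input and print the values of the requested keys, quoted and
# comma-separated, in the order the keys are listed.
pick_keys() {
    awk -v want="$1" '
        BEGIN { RS = "|"; n = split(want, keys, ",") }
        {
            sub(/\n$/, "")                   # last record carries the newline
            eq = index($0, "=")
            if (eq) f[substr($0, 1, eq - 1)] = substr($0, eq + 1)
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s\"%s\"", (i > 1 ? "," : ""), f[keys[i]]
            print ""
        }'
}

picked=$(printf '%s\n' 'zone=INTERNET|status=good|CIDR=10.10.10.0/24|name=SCB-INET-A' |
    pick_keys 'zone,CIDR,name')
printf '%s\n' "$picked"
```

Splitting each record at the first = with index and substr, rather than through FS, keeps values that themselves contain = intact.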
Using awk:
var="zone=INTERNET|status=good|routed=special|location=001|resp=user|switch=not set|stack=no|dswres=no|CIDR=10.10.10.0/24|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=enable|inheritPingBeforeAssign=true|locationInherited=true|gateway=10.10.10.100|inheritDefaultDomains=true|inheritDefaultView=true|name=SCB-INET-A|inheritDNSRestrictions=true"
awk -v RS='|' -v ORS=',' -F= '$1~/zone|gateway|name/{print "\"" $2 "\""}' <<<"$var" | sed 's/,$//'
"INTERNET","10.10.10.100","SCB-INET-A"
The input record separator RS is set to |.
The input field separator FS is set to =.
The output record separator ORS is set to ,.
$1~/zone|gateway|name/ selects the parameters to extract. The print statement wraps the parameter value in double quotes.
The sed statement is to remove the annoying last , (that the print statement is adding).
One more solution using Bash. It is not the shortest, but I hope it is the most readable and therefore the most maintainable.
#!/bin/bash
# Function split_key_val()
# selects values from a string with key-value pairs
# IN: string_with_key_value_pairs wanted_key_1 [wanted_key_2] ...
# OUT: result
function split_key_val {
local KEY_VAL_STRING="$1"
local RESULT
# read the string with key-value pairs into array
IFS=\| read -r -a ARRAY <<< "$KEY_VAL_STRING"
#
shift
# while there are wanted-keys ...
while [[ -n $1 ]]
do
WANTED_KEY="$1"
# Search the array for the wanted-key
for KEY_VALUE in "${ARRAY[@]}"
do
# the key is the part before "="
KEY=$(echo "$KEY_VALUE" |cut --delimiter="=" --fields=1)
# the value is the part after "="
VALUE=$(echo "$KEY_VALUE" |cut --delimiter="=" --fields=2)
if [[ $KEY == $WANTED_KEY ]]
then
# if result is empty; result= found value...
if [[ -z $RESULT ]]
then
# (quote the damned quotes)
RESULT="\"${VALUE}\""
else
# ... else add a comma as a separator
RESULT="${RESULT},\"${VALUE}\""
fi
fi # key == wanted-key
done # searched whole array
shift # prepare for next wanted-key
done
echo "$RESULT"
return 0
}
STRING="zone=INTERNET|status=good|routed=special|location=001|resp=user|switch=not set|stack=no|dswres=no|CIDR=10.10.10.0/24|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=enable|inheritPingBeforeAssign=true|locationInherited=true|gateway=10.10.10.100|inheritDefaultDomains=true|inheritDefaultView=true|inheritDNSRestrictions=true|name=SCB-INET-A"
split_key_val "$STRING" zone CIDR name
The result is:
"INTERNET","10.10.10.0/24","SCB-INET-A"
without using more sophisticated text editing tools (as an exercise!)
$ tr '|' '\n' <file | # make it columnar
egrep '^(zone|CIDR|name)=' | # get exact key matches
cut -d= -f2 | # get values
while read line; do echo '"'$line'"'; done | # quote values
paste -sd, # flatten with comma
will give
"INTERNET","10.10.10.0/24","SCB-INET-A"
You can also replace the while statement with xargs printf '"%s"\n'.
Not using sed or awk but the Bash Arrays feature.
line="zone=INTERNET|sta=good|CIDR=10.10.10.0/24|a=1 1|...=...|name=SCB-INET-A"
echo "$line" | tr '|' '\n' | {
declare -A vars
while read -r item ; do
if [ -n "$item" ] ; then
vars["${item%%=*}"]="${item#*=}"
fi
done
echo "\"${vars[zone]}\",\"${vars[CIDR]}\",\"${vars[name]}\"" ; }
One advantage of this method is that you always get your fields in order independent of the order of fields in the input line.

Search file A for a list of strings located in file B and append the value associated with that string to the end of the line in file A

This is a bit complicated, well I think it is..
I have two files, File A and file B
File A contains delay information for a pin and is in the following format
AD22 15484
AB22 9485
AD23 10945
File B contains a component declaration that needs this information added to it and is in the format:
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
So what I am trying to achieve is the following output
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='15484';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='10945';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='9485';
There is no order to the pin numbers in file A or B
So I'm assuming the following needs to happen
open file A, read first line
search file B for first string field in the line just read
once found in file B at the end of the line add the text "\nPIN_DELAY='"
add the second string field of the line read from file A
add the following text at the end "';"
repeat by opening file A, read the second line
I'm assuming it will be a combination of sed and awk commands and I'm currently trying to work it out but think this is beyond my knowledge. Many thanks in advance as I know it's complicated..
FILE2=`cat file2`
FILE1=`cat file1`
TMPFILE=`mktemp XXXXXXXX.tmp`
FLAG=0
for line in $FILE1;do
echo $line >> $TMPFILE
for line2 in $FILE2;do
if [ $FLAG == 1 ];then
echo -e "PIN_DELAY='$(echo $line2 | awk -F " " '{print $1}')'" >> $TMPFILE
FLAG=0
elif [ "`echo $line | grep $(echo $line2 | awk -F " " '{print $1}')`" != "" ];then
FLAG=1
fi
done
done
mv $TMPFILE file1
This works for me. You can also add a trap to remove the temporary file if the user sends SIGINT.
awk to the rescue...
$ awk -vq="'" 'NR==FNR{a[$1]=$2;next} {print; for(k in a) if(match($0,k)) {print "PIN_DELAY=" q a[k] q ";"; next}}' keys data
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='15484';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='10945';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='9485';
Explanation: scan the first file for key/value pairs. For each line in the second (data) file, print the line; if the line matches a key, also print that key's value in the requested format. Single quotes in awk are a little tricky; setting a q variable is one way of handling them.
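Since match($0,k) treats each key as a regular expression, a short pin name can also match inside a longer one. A variant that compares whole pin names is sketched below; the helper name add_delays and the sample data are invented, and the sketch assumes pin names are separated by commas inside the parentheses, as in the question.

```shell
# add_delays is a hypothetical helper: DELAY_FILE holds "pin delay" pairs,
# PART_FILE holds the component declarations.
add_delays() {
    awk -v q="'" '
        NR == FNR { delay[$1] = $2; next }   # first file: build the lookup
        {
            print
            n = split($0, part, /[(),]/)     # isolate each pin name
            for (i = 1; i <= n; i++)
                if (part[i] in delay) {
                    print "PIN_DELAY=" q delay[part[i]] q ";"
                    break
                }
        }' "$1" "$2"
}

# Invented sample data: AD2 is a prefix of AD22 on purpose.
delays=$(mktemp) && parts=$(mktemp) || exit 1
printf '%s\n' 'AD2 7' 'AD22 15484' > "$delays"
printf '%s\n' "'DXN_0':" "PIN_NUMBER='(AD22,0,0)';" > "$parts"
delay_out=$(add_delays "$delays" "$parts")
printf '%s\n' "$delay_out"
rm -f "$delays" "$parts"
```

With exact lookups there is no need to pad the parentheses with extra commas before matching.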
FINAL Script for my application, A big thank you to all that helped..
#!/usr/bin/sh
# script created by Adam with a LOT of help from users on stackoverflow
# must pass $1 file (package file from Xilinx)
# must pass $2 file (chips.prt file from the PCB design office)
# remove these temp files, throws error if not present tho, whoops!!
rm DELAYS.txt CHIP.txt OUTPUT.txt
# BELOW::create temp files for the code thanks to Glastis@stackoverflow https://stackoverflow.com/users/5101968/glastis I now know how to do this
DELAYS=`mktemp DELAYS.txt`
CHIP=`mktemp CHIP.txt`
OUTPUT=`mktemp OUTPUT.txt`
# BELOW::grep input file 1 (pkg file from Xilinx) for lines containing a delay in the form of n.n and use TAIL to remove something (can't remember), sed to remove blanks and replace with single space, sed to remove space before \n, use awk to print columns 3,9,10 and feed into awk again to calculate delay provided by fedorqui@stackoverflow https://stackoverflow.com/users/1983854/fedorqui
# In awk, NF refers to the number of fields on the current line. Since $n refers to the field number n, with $(NF-1) we refer to the penultimate field.
# {...}1 do stuff and then print the resulting line. 1 evaluates as True and anything True triggers awk to perform its default action, which is to print the current line.
# ($(NF-1) + $NF)/2 * 141 perform the calculation: (penultimate + last) / 2 * 141
# {$(NF-1)=sprintf( ... ) assign the result of the previous calculation to the penultimate field. Using sprintf with %.0f we make sure the rounding is performed, as described above.
# {...; NF--} once the calculation is done, we have its result in the penultimate field. To remove the last column, we just say "hey, decrease the number of fields" so that the last one gets "removed".
grep -E -0 '[0-9]\.[0-9]' $1 | tail -n +2 | sed -e 's/[[:blank:]]\+/ /g' -e 's/\s\n/\n/g' | awk '{print ","$3",",$9,$10}' | awk '{$(NF-1)=sprintf("%.0f", ($(NF-1) + $NF)/2 * 169); NF--}1' >> $DELAYS
# remove blanks in part file and add additional commas (,) so that the following awk command works properly
cat $2 | sed -e "s/[[:blank:]]\+//" -e "s/(/(,/g" -e 's/)/,)/g' >> $CHIP
# this awk command is provided by karakfa@stackoverflow https://stackoverflow.com/users/1435869/karakfa Explanation: scan the first file for key/value pairs. For each line in the second data file print the line, for any matching key print value of the key in the requested format. Single quotes in awk is little tricky, setting a q variable is one way of handling it. https://stackoverflow.com/questions/32458680/search-file-a-for-a-list-of-strings-located-in-file-b-and-append-the-value-assoc
awk -vq="'" 'NR==FNR{a[$1]=$2;next} {print; for(k in a) if(match($0,k)) {print "PIN_DELAY=" q a[k] q ";"; next}}' $DELAYS $CHIP >> $OUTPUT
# remove the additional commas (,) added in earlier before ) and after ( and you are done..
cat $OUTPUT | sed -e 's/(,/(/g' -e 's/,)/)/g' >> chipsd.prt

Bash Text file formatting

I have some files with the following format:
555584280113;01-04-2013 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
552185022741;01-04-2013 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
5511965271852;01-04-2013 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
5511980644500;01-04-2013 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
553186398559;01-04-2013 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
555584280113;01-04-2013 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
558487839822;01-04-2013 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz
I need to add a 10-digit sequence number at the beginning of each line, remove the prefix 55 from the second column (which I have done with a simple sed 's/^55//g') and reformat the date, so that the result looks like this:
0000000001;555584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;552185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;5511965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;5511980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;553186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;555584280113;01-04-2013 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
I have the date part in a separate way:
cat file.txt | cut -d\; -f2 | awk '{print $1}' |awk -v OFS="-" -F"-" '{print $3$2$1}'
And it works, but I don't know how to put all of them together, the sequence + sed for the prefix + change the date format. The sequence part I'm not even sure how to do it.
Thanks for the help.
awk is one of the best tools out there for text parsing and formatting. Here is one way of meeting your requirements:
awk '
BEGIN { FS = OFS = ";" }
{
printf "%010d;", NR
$1 = substr($1,3)
split($2, tmp, /[- ]/)
$2=tmp[3]tmp[2]tmp[1]" "tmp[4]
}1' file
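We set the input and output field separators to ;
We use printf to print the zero-padded 10-digit sequence number (NR) as a new first column
We use the substr function to remove the first two characters of column 1
We use the split function to break the date and time into parts, then reassemble them as YYYYMMDD followed by the time
The trailing 1 is an always-true pattern, so awk prints the modified record.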
Output:
0000000001;5584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;2185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;11965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;11980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;3186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;5584280113;20130401 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000007;8487839822;20130401 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz
If the name of the input file is input, then the following command removes the 55, adds a 10-digit line number, and rearranges the date. With GNU sed:
nl -nrz -w10 -s\; input | sed -r 's/55//; s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/'
If one is using Mac OSX (or another OS without GNU sed), then a slight change is required:
nl -nrz -w10 -s\; input | sed -E 's/55//; s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/'
Sample output:
0000000001;5584280113;20130401 00:00:11;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000002;2185022741;20130401 00:00:13;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000003;11965271852;20130401 00:00:14;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000004;11980644500;20130401 00:00:22;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000005;3186398559;20130401 00:00:31;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000006;5584280113;20130401 00:00:41;0,22;889;30008;1501;sms;/xxx/yyy/zzz
0000000007;8487839822;20130401 00:01:09;0,22;889;30008;1501;sms;/xxx/yyy/zzz
How it works: nl is a handy *nix utility for adding line numbers. -w10 tells nl that we want 10 digit line numbers. -nrz tells nl to pad the line numbers with zeros, and -s\; tells nl to add a semicolon after the line number. (We have to escape the semicolon so that the shell ignores it.)
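The remaining changes are handled by sed. The sed command s/55// removes the first occurrence of 55 on the line. Because nl has already prepended the line number, that first occurrence falls inside the number itself once it contains 55 (line 55, for example); anchoring the pattern to the preceding semicolon avoids this. The rearrangement of the date is handled by s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/.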
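A slightly more defensive form of the same pipeline is sketched below, assuming a sed that accepts -E. The wrapper name renumber is invented; the only change to the commands above is that the 55 prefix is matched together with the semicolon that nl inserts, so a 55 inside the line number is left alone.

```shell
# renumber is a hypothetical wrapper around the nl | sed pipeline; the only
# change is that the 55 prefix is matched together with the preceding ";".
renumber() {
    nl -nrz -w10 -s';' "$@" |
        sed -E 's/;55/;/; s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3\2\1/'
}

renum_out=$(printf '%s\n' '555584280113;01-04-2013 00:00:11;0,22' | renumber)
printf '%s\n' "$renum_out"
```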
You could actually use a Bash loop to do this.
i=0
while read f1 f2; do
((++i))
IFS=\; read n d <<< $f1
d=${d:6:4}${d:3:2}${d:0:2}
printf "%010d;%d;%d %s\n" $i $n $d $f2
done < file.txt

Use "cut" in shell script without space as delimiter

I'm trying to write a script that reads the file content below and extract the value in the 6th column of each line, then print each line without the 6th column. The comma is used as the delimiter.
Input:
123,456,789,101,145,5671,hello world,goodbye for now
223,456,789,101,145,5672,hello world,goodbye for now
323,456,789,101,145,5673,hello world,goodbye for now
What I did was
#!/bin/bash
for i in `cat test_input.txt`
do
COLUMN=`echo $i | cut -f6 -d','`
echo $i | cut -f1-5,7- -d',' >> test_$COLUMN.txt
done
The output I got was
test_5671.txt:
123,456,789,101,145,hello
test_5672.txt:
223,456,789,101,145,hello
test_5673.txt:
323,456,789,101,145,hello
The rest of "world, goodbye for now" was not written into the output files, because it seems like the space between "hello" and "world" was used as a delimiter?
How do I get the correct output
123,456,789,101,145,hello world,goodbye for now
It's not a problem with the cut command but with the for loop you're using. For the first loop run the variable i will only contain 123,456,789,101,145,5671,hello.
If you insist on reading the input file line by line (not very efficient), you'd better use a read loop like this:
while IFS= read -r i
do
...
done < test_input.txt
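The difference between the two loop styles can be made visible by counting iterations over one sample line from the question. This is just an illustrative sketch; the variable names are arbitrary.

```shell
line='123,456,789,101,145,5671,hello world,goodbye for now'

# Unquoted expansion in a for loop: the shell splits on whitespace.
for_count=0
for word in $line; do
    for_count=$((for_count + 1))
done

# read consumes one whole line per iteration.
read_count=0
while IFS= read -r record; do
    read_count=$((read_count + 1))
done <<EOF
$line
EOF

printf 'for: %s iterations, read: %s iterations\n' "$for_count" "$read_count"
```

The for loop sees four whitespace-separated words, which is why cut only ever received the text up to "hello" on its first pass.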
echo '123,456,789,101,145,5671,hello world,goodbye for now' | while IFS=, read -r one two three four five six seven eight rest
do
echo "$six"
echo "$one,$two,$three,$four,$five,$seven,$eight${rest:+,$rest}"
done
Prints:
5671
123,456,789,101,145,hello world,goodbye for now
See the man bash Parameter Expansion section for the :+ syntax (essentially it outputs a comma and the $rest if $rest is defined and non-empty).
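A tiny sketch of the :+ expansion on its own, using an invented helper name join_rest, shows both cases:

```shell
# join_rest is a hypothetical helper that mirrors "$eight${rest:+,$rest}".
join_rest() {
    printf '%s\n' "$1${2:+,$2}"
}

with_rest=$(join_rest 'now' 'and more')   # second argument is non-empty
without_rest=$(join_rest 'now' '')        # second argument is empty
printf '%s\n' "$with_rest" "$without_rest"
```

The comma only appears when there is something to put after it, so lines with exactly eight fields do not end in a stray separator.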
Also, you shouldn't use for to loop over file contents.
As ktf mentioned, your problem is not with cut but with the way you're passing the lines into cut. The solution he/she has provided should work.
Alternatively, you could achieve the same behaviour with a line of awk:
awk -F, '{for(i=1;i<=NF;i++) {if(i!=6) printf "%s%s",$i,(i==NF)?"\n":"," > "test_"$6".txt"}}' test_input.txt
For clarity, here's a verbose version:
awk -F, ' # "-F,": using comma as field separator
{ # for each line in file
for(i=1;i<=NF;i++) { # for each column
sep = (i == NF) ? "\n" : "," # column separator
outfile = "test_"$6".txt" # output file
if (i != 6) { # skip sixth column
printf "%s%s", $i, sep > outfile
}
}
}' test_input.txt
An easy method is to use the tr command to convert the space character into #, run the cat loop, and then translate the # back into a space.

Resources