Search file A for a list of strings located in file B and append the value associated with that string to the end of the line in file A - bash

This is a bit complicated, or at least I think it is.
I have two files, file A and file B.
File A contains delay information for each pin, in the following format:
AD22 15484
AB22 9485
AD23 10945
File B contains a component declaration that needs this information added to it and is in the format:
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
So what I am trying to achieve is the following output
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='15484';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='10945';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='9485';
There is no order to the pin numbers in file A or B
So I'm assuming the following needs to happen
open file A, read first line
search file B for first string field in the line just read
once found in file B at the end of the line add the text "\nPIN_DELAY='"
add the second string field of the line read from file A
add the following text at the end "';"
repeat by opening file A, read the second line
I'm assuming it will be a combination of sed and awk commands and I'm currently trying to work it out but think this is beyond my knowledge. Many thanks in advance as I know it's complicated..

FILE2=`cat file2`
FILE1=`cat file1`
TMPFILE=`mktemp XXXXXXXX.tmp`
FLAG=0
for line in $FILE1;do
echo $line >> $TMPFILE
for line2 in $FILE2;do
if [ $FLAG == 1 ];then
echo -e "PIN_DELAY='$(echo $line2 | awk -F " " '{print $1}')'" >> $TMPFILE
FLAG=0
elif [ "`echo $line | grep $(echo $line2 | awk -F " " '{print $1}')`" != "" ];then
FLAG=1
fi
done
done
mv $TMPFILE file1
Works for me. You can also add a trap to remove the temporary file if the user sends SIGINT.

awk to the rescue...
$ awk -vq="'" 'NR==FNR{a[$1]=$2;next} {print; for(k in a) if(match($0,k)) {print "PIN_DELAY=" q a[k] q ";"; next}}' keys data
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='15484';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='10945';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='9485';
Explanation: scan the first file for key/value pairs. For each line in the second (data) file, print the line; if any key matches the line, print that key's value in the requested format. Single quotes inside an awk program are a little tricky; setting a q variable is one way of handling them.
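One caveat with match($0,k): the key is treated as a regular expression, so it can also match part of a longer pin name (a made-up key AD2 would match a line containing AD22). Below is a minimal sketch that compares whole pin names instead. It assumes the PIN_NUMBER list is delimited by parentheses and commas as in the question; the file names keys.txt and data.txt and the extra AD2 entry are hypothetical, added only for illustration.

```shell
# Hypothetical sample inputs in a scratch directory
workdir=$(mktemp -d)
printf '%s\n' 'AD2 111' 'AD22 15484' 'AB22 9485' > "$workdir/keys.txt"
printf '%s\n' "'DXN_0':" "PIN_NUMBER='(AD22,0,0)';" \
              "'VREFN_0':" "PIN_NUMBER='(AB22,0,0)';" > "$workdir/data.txt"

exact_out=$(awk -v q="'" '
  NR == FNR { delay[$1] = $2; next }        # first file: pin name -> delay
  { print }                                 # second file: echo every line
  /^PIN_NUMBER=/ {
      n = split($0, part, /[(),]/)          # break the list into whole pin names
      for (i = 2; i <= n; i++)
          if (part[i] in delay) {           # exact lookup, no regex matching
              print "PIN_DELAY=" q delay[part[i]] q ";"
              break
          }
  }
' "$workdir/keys.txt" "$workdir/data.txt")

printf '%s\n' "$exact_out"
rm -rf "$workdir"
```

With this lookup the AD22 line gets the delay 15484 and the AD2 entry is never picked up by accident.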

FINAL Script for my application, A big thank you to all that helped..
#!/usr/bin/sh
# script created by Adam with a LOT of help from users on stackoverflow
# must pass $1 file (package file from Xilinx)
# must pass $2 file (chips.prt file from the PCB design office)
# remove these temp files; -f avoids an error if they are not present
rm -f DELAYS.txt CHIP.txt OUTPUT.txt
# BELOW::create temp files for the code thanks to Glastis#stackoverflow https://stackoverflow.com/users/5101968/glastis I now know how to do this
DELAYS=`mktemp DELAYS.txt`
CHIP=`mktemp CHIP.txt`
OUTPUT=`mktemp OUTPUT.txt`
# BELOW::grep input file 1 (pkg file from Xilinx) for lines containing a delay in the form of n.n and use TAIL to remove something (can't remember), sed to remove blanks and replace with single space, sed to remove space before \n, use awk to print columns 3,9,10 and feed into awk again to calculate delay provided by fedorqui#stackoverflow https://stackoverflow.com/users/1983854/fedorqui
# In awk, NF refers to the number of fields on the current line. Since $n refers to the field number n, with $(NF-1) we refer to the penultimate field.
# {...}1 do stuff and then print the resulting line. 1 evaluates as True and anything True triggers awk to perform its default action, which is to print the current line.
# ($(NF-1) + $NF)/2 * 141 performs the calculation: (penultimate + last) / 2 * 141
# {$(NF-1)=sprintf( ... ) assign the result of the previous calculation to the penultimate field. Using sprintf with %.0f we make sure the rounding is performed, as described above.
# {...; NF--} once the calculation is done, we have its result in the penultimate field. To remove the last column, we just say "hey, decrease the number of fields" so that the last one gets "removed".
grep -E -0 '[0-9]\.[0-9]' $1 | tail -n +2 | sed -e 's/[[:blank:]]\+/ /g' -e 's/\s\n/\n/g' | awk '{print ","$3",",$9,$10}' | awk '{$(NF-1)=sprintf("%.0f", ($(NF-1) + $NF)/2 * 169); NF--}1' >> $DELAYS
# remove blanks in part file and add additional commas (,) so that the following awk command works properly
cat $2 | sed -e "s/[[:blank:]]\+//" -e "s/(/(,/g" -e 's/)/,)/g' >> $CHIP
# this awk command is provided by karakfa#stackoverflow https://stackoverflow.com/users/1435869/karakfa Explanation: scan the first file for key/value pairs. For each line in the second data file print the line, for any matching key print value of the key in the requested format. Single quotes in awk is little tricky, setting a q variable is one way of handling it. https://stackoverflow.com/questions/32458680/search-file-a-for-a-list-of-strings-located-in-file-b-and-append-the-value-assoc
awk -vq="'" 'NR==FNR{a[$1]=$2;next} {print; for(k in a) if(match($0,k)) {print "PIN_DELAY=" q a[k] q ";"; next}}' $DELAYS $CHIP >> $OUTPUT
# remove the additional commas (,) added in earlier before ) and after ( and you are done..
cat $OUTPUT | sed -e 's/(,/(/g' -e 's/,)/)/g' >> chipsd.prt

Related

How to find content in a file and replace the adjacent value

Using bash, how do I find a string and update the string next to it? For example, pass the value
my.site.com|test2.spin:80
proxy_pass.map
my.site2.com test2.spin:80
my.site.com test.spin:8080;
Expected output is to update proxy_pass.map with
my.site2.com test2.spin:80
my.site.com test2.spin:80;
I tried using awk
awk '{gsub(/^my\.site\.com\s+[A-Za-z0-9]+\.spin:8080;$/,"my.site2.comtest2.spin:80"); print}' proxy_pass.map
but it does not seem to work. Is there a better way to approach the problem?
One awk idea, assuming spacing needs to be maintained:
awk -v rep='my.site.com|test2.spin:80' '
BEGIN { split(rep,a,"|") # split "rep" variable and store in
site[a[1]]=a[2] # associative array
}
$1 in site { line=$0 # if 1st field is in site[] array then make copy of current line
match(line,$1) # find where 1st field starts (in case 1st field does not start in column #1)
newline=substr(line,1,RSTART+RLENGTH-1) # save current line up through matching 1st field
line=substr(line,RSTART+RLENGTH) # strip off 1st field
match(line,/[^[:space:];]+/) # look for string that does not contain spaces or ";" and perform replacement, making sure to save everything after the match (";" in this case)
newline=newline substr(line,1,RSTART-1) site[$1] substr(line,RSTART+RLENGTH)
$0=newline # replace current line with newline
}
1 # print current line
' proxy_pass.map
This generates:
my.site2.com test2.spin:80
my.site.com test2.spin:80;
If the input looks like:
$ cat proxy_pass.map
my.site2.com test2.spin:80
my.site.com test.spin:8080;
This awk script generates:
my.site2.com test2.spin:80
my.site.com test2.spin:80;
NOTES:
if multiple replacements need to be performed I'd suggest placing them in a file and having awk process said file first
the 2nd match() is hardcoded based on OP's example; depending on actual file contents it may be necessary to expand on the regex used in the 2nd match()
once satisfied with the result, the original input file can be updated in a couple of ways: a) if using GNU awk, then awk -i inplace -v rep....; or b) save the result to a temp file and then mv the temp file to proxy_pass.map
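The first note (keeping several replacements in a file that awk reads first) could be sketched roughly as follows. This is a simplified variant, not the script above: it assumes one site|target pair per line in a hypothetical replacements.txt, and it does not preserve the original spacing, because assigning to $2 makes awk rebuild the line with single spaces.

```shell
# Hypothetical sample files in a scratch directory
workdir=$(mktemp -d)
printf '%s\n' 'my.site.com|test2.spin:80' 'other.site.com|test9.spin:81' \
    > "$workdir/replacements.txt"
printf '%s\n' 'my.site2.com test2.spin:80' 'my.site.com test.spin:8080;' \
    > "$workdir/proxy_pass.map"

map_out=$(awk '
  FNR == NR { split($0, a, "|"); site[a[1]] = a[2]; next }  # first file: site -> new target
  $1 in site {
      semi = ($2 ~ /;$/) ? ";" : ""     # keep a trailing ";" if the old value had one
      $2 = site[$1] semi                # rebuilds the line with single spaces
  }
  { print }
' "$workdir/replacements.txt" "$workdir/proxy_pass.map")

printf '%s\n' "$map_out"
rm -rf "$workdir"
```

To update the real file, the output would be written to a temp file and moved over proxy_pass.map, as described in the last note.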
If the number of spaces between the columns is not significant, a simple
proxyf=proxy_pass.map
tmpf=$$.txt
awk '$1 == "my.site.com" { $2 = "test2.spin:80;" } {print}' <$proxyf >$tmpf && mv $tmpf $proxyf
should do. If you need the columns to be lined up nicely, you can replace the print by a suitable printf .... statement.
With your shown samples and attempts, please try the following awk code. A shell variable named var stores the value my.site.com|test2.spin:80 and is passed to the awk program, where the awk variable var1 receives its value.
In the BEGIN section, split() breaks the value of var1 into an array named arr using | as the separator; num is the number of elements produced. A for loop then steps through arr two elements at a time and builds the array arr2, using element i as the key and element i+1 as its value.
In the main block, if $1 is a key in arr2 the program prints arr2's value for it; otherwise it prints $2 unchanged.
##Shell variable named var is being created here...
var="my.site.com|test2.spin:80"
awk -v var1="$var" '
BEGIN{
num=split(var1,arr,"|")
for(i=1;i<=num;i+=2){
arr2[arr[i]]=arr[i+1]
}
}
{
print $1,(($1 in arr2)?arr2[$1]:$2)
}
' Input_file
Or, if you want to preserve the spacing between the 1st and 2nd fields, try the following slight tweak of the above code. Written and tested with your shown samples only.
awk -v var1="$var" '
BEGIN{
num=split(var1,arr,"|")
for(i=1;i<=num;i+=2){
arr2[arr[i]]=arr[i+1]
}
}
{
match($0,/[[:space:]]+/)
print $1 substr($0,RSTART,RLENGTH) (($1 in arr2)?arr2[$1]:$2)
}
' Input_file
NOTE: This program can take multiple values separated by | in shell variable to be passed and checked on in awk program. But it considers that it will be in format of key|value|key|value... only.
#!/bin/sh -x
f1=$(echo "my.site.com|test2.spin:80" | cut -d'|' -f1)
f2=$(echo "my.site.com|test2.spin:80" | cut -d'|' -f2)
echo "${f1}%${f2};" >> proxy_pass.map
tr '%' '\t' < proxy_pass.map >> p1
cat > ed1 <<EOF
$
-1
d
wq
EOF
ed -s p1 < ed1
mv -v p1 proxy_pass.map
rm -v ed1
This might work for you (GNU sed):
<<<'my.site.com|test2.spin:80' sed -E 's#\.#\\.#g;s#^(\S+)\|(\S+)#/^\1\\b/s/\\S+/\2/2#' |
sed -Ef - file
Build a sed script from the input arguments and apply it to the input file.
The input arguments are first prepared so that their metacharacters (in this case the .'s) are escaped.
Then the first argument is used to build a match address and the second is used as the replacement value in a substitution command.
The result is piped into a second sed invocation that takes the sed script and applies it to the input file.
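To see what the first sed actually produces, it can help to print the generated script instead of piping it onward. A small sketch, assuming GNU sed (for -E with \S and \b) and using printf in place of the bash here-string:

```shell
# Build the sed script from the argument and capture it rather than running it
generated=$(printf '%s\n' 'my.site.com|test2.spin:80' |
  sed -E 's#\.#\\.#g;s#^(\S+)\|(\S+)#/^\1\\b/s/\\S+/\2/2#')

printf '%s\n' "$generated"
```

The printed line is an address built from the escaped site name followed by an s command that replaces the second run of non-space characters. Note that \S+ also covers a trailing ; on that field, so it is worth checking the result against the real file before overwriting it.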

Bash: Separating a file by blank lines and assigning to a list

So I have a file, for example:
a

b
c

d
I'd like to make a list out of the lines with data in this file. An empty line would be the separator, so the list for the above file would be
First element = a
Second element = b
c
Third element = d
Replace blank lines with ", " and then remove the newline characters:
cat <file> | sed 's/^$/, /' | tr -d '\n'
The following awk would do:
awk 'BEGIN{RS="";ORS=",";FS="\n";OFS=""}($1=$1)' file
This adds an extra , at the end. You can get rid of that in the following way:
awk 'BEGIN{RS="";ORS=",";FS="\n";OFS=""}
{$1=$1;s=s $0 ORS}END{sub(ORS"$","",s); print s}' file
The drawback of this slight modification to drop the last ORS (i.e. the comma) is that the full output now has to be held in memory. In that case you could just as well take the more boring, less elegant route of storing the whole file in memory:
awk '{s=s $0 RS}END{gsub(/\n\n/,",",s);gsub(/\n/,"",s); print s}' file
The following sed does exactly the same. Store the full file in memory and process it.
sed ':a;N;$!ba;s/\n\n/,/g;s/\n//g' <file>
There is, however, a way to play it a bit more clever with awk.
awk 'BEGIN{RS=OFS="";FS="\n"}{$1=$1; print (NR>1?",":"")$0}' file
It depends on what you need to do with that data.
With perl, you have a one-liner:
$ perl -00 -lnE 'say "element $. = $_"' file.txt
element 1 = a
element 2 = b
c
element 3 = d
But clearly you need to process the elements in some way, and I suspect Perl is not your cup of tea.
With bash you could do:
elements=()
n=0
while IFS= read -r line; do
[[ $line ]] && elements[n]+="$line"$'\n' || ((n++))
done < file.txt
# strip the trailing newline from each element
elements=("${elements[@]/%$'\n'/}")
# and show what's in the array
declare -p elements
declare -a elements='([0]="a" [1]="b
c" [2]="d")'
$ awk -v RS= '{print "Element " NR " = " $0}' file
Element 1 = a
Element 2 = b
c
Element 3 = d
If you really want to say First Element instead of Element 1 then enjoy the exercise :-).

Error in bash script: arithmetic error

I wrote a simple script to extract text from a bunch of files (*.out) and add two lines at the beginning and one line at the end. I then combine the extracted text with another file to create a new file. The script is below.
#!/usr/bin/env bash
#A simple bash script to extract text from *.out and create another file
for f in *.out; do
#In the following line, n is a number which is extracted from the file name
n=$(echo $f | cut -d_ -f6)
t=$((2 * $n ))
#To extract the necessary text/data
grep " B " $f | tail -${t} | awk 'BEGIN {OFS=" ";} {print $1, $4, $5, $6}' | rev | column -t | rev > xyz.xyz
#To add some text as the first, second and last lines.
sed -i '1i -1 2' xyz.xyz
sed -i '1i $molecule' xyz.xyz
echo '$end' >> xyz.xyz
#To combine the extracted info with another file (ea_input.in)
cat xyz.xyz ./input_ea.in > "${f/abc.out/pqr.in}"
done
./script.sh: line 4: (ls file*.out | cut -d_ -f6: syntax error: invalid arithmetic operator (error token is ".out) | cut -d_ -f6")
How can I correct this error?
In bash, when you use:
$(( ... ))
it treats the contents of the brackets as an arithmetic expression, returning the result of the calculation, and when you use:
$( ... )
it executes the contents of the brackets and returns the output.
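A short sketch of the difference between the two forms, using a made-up file name:

```shell
# Command substitution: run a pipeline and capture what it prints
n=$(printf '%s\n' 'a_b_c_d_e_7_abc.out' | cut -d_ -f6)

# Arithmetic expansion: evaluate an expression using that value
t=$((2 * n))

echo "n=$n t=$t"
```

Here n receives the sixth underscore-separated field of the name and t is twice that number.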
So, to fix your issue, it should be as simple as to replace line 4 with:
n=$(ls $f | cut -d_ -f6)
This replaces the outer double brackets with single, and removes the additional brackets around ls $f which should be unnecessary.
The arithmetic error can be avoided by adding spaces between parentheses. You are already using var=$((arithmetic expression)) correctly elsewhere in your script, so it should be easy to see why $( ((ls "$f") | cut -d_ -f6)) needs a space. But the subshells are completely superfluous too; you want $(ls "$f" | cut -d_ -f6). Except ls isn't doing anything useful here, either; use $(echo "$f" | cut -d_ -f6). Except the shell can easily, albeit somewhat clumsily, extract a substring with parameter substitution; "${f#*_*_*_*_*_}". Except if you're using Awk in your script anyway, it makes more sense to do this - and much more - in Awk as well.
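The parameter substitution mentioned above can be sketched like this. The file name is hypothetical, and the second expansion (trimming everything after the sixth field) is an addition for illustration, since the first one alone leaves the rest of the name attached:

```shell
f='run_a_b_c_d_12_abc.out'     # hypothetical file name with the number in field 6
rest=${f#*_*_*_*_*_}           # drop the first five _-separated fields
n=${rest%%_*}                  # keep only what precedes the next underscore
t=$((2 * n))
echo "rest=$rest n=$n t=$t"
```

No external process (ls, echo, cut) is needed for this.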
Here is an attempt at refactoring most of the processing into Awk.
for f in *.out; do
awk 'BEGIN {OFS=" " }
# Extract 6th _-separated field from input filename
FNR==1 { split(FILENAME, f, "_"); t=2*f[6] }
# If input matches regex, add to array b
/ B / { b[++i] = $1 OFS $4 OFS $5 OFS $6 }
# If array size reaches t, start overwriting old values
i==t { i=0; m=t }
END {
# Print two prefix lines
print "$molecule"; print -1, 2;
# Handle array smaller than t
if (!m) m=i
# Print starting from oldest values (index i + 1)
for(j=1; j<=m; j++) {
# Wrap to beginning of array at end
if(i+j > t) i-=t
print b[i+j]; }
print "$end" }' "$f" |
rev | column -t | rev |
cat - ./input_ea.in > "${f/foo.out/bar.in}"
done
Notice also how we avoid using a temporary file (this would certainly have been avoidable without the Awk refactoring, too) and how we take care to quote all filename variables in double quotes.
The array b contains (up to) the latest t values from matching lines; we collect these into an array which is constrained to never contain more than t values by wrapping the index i back to the beginning of the array when we reach index t. This "circular array" avoids keeping too many values in memory, which would make the script slow if the input file contains many matches.
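The circular-array idea can be shown in isolation. This is a stripped-down sketch, not the script above: it keeps the last t input lines by indexing the array with the line number modulo t, so old slots are overwritten as new lines arrive.

```shell
last_three=$(printf '%s\n' 1 2 3 4 5 6 7 | awk -v t=3 '
  { buf[NR % t] = $0 }                        # overwrite the oldest slot
  END {
      start = (NR > t) ? NR - t + 1 : 1       # first line number still held
      for (k = start; k <= NR; k++)
          print buf[k % t]                    # replay in original order
  }')

printf '%s\n' "$last_three"
```

Only t entries are ever stored, regardless of how many lines are read.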

Print text between two lines (from list of line numbers in file) in Unix [closed]

Closed 9 years ago.
I have a sample file which has thousands of lines.
I want to print text between two line numbers in that file. I don't want to input line numbers manually, rather I have a file which contains list of line numbers between which text has to be printed.
Example : linenumbers.txt
345|789
999|1056
1522|1366
3523|3562
I need a shell script which will read line numbers from this file and print the text between each range of lines into a separate (new) file.
That is, it should print lines between 345 and 789 into a new file, say File1.txt, and print text between lines 999 and 1056 into a new file, say File2.txt, and so on.
Considering that your target file has only thousands of lines, here is a quick and dirty solution:
awk -F'|' '{system("sed -n \""$1","$2"p\" targetFile > file"NR)}' linenumbers.txt
targetFile is your file containing thousands of lines.
The one-liner does not require linenumbers.txt to be sorted.
The one-liner allows the line ranges in linenumbers.txt to overlap.
After running the command above, you will have n files named fileX, where n is the number of rows in linenumbers.txt and X runs from 1 to n. You can change the filename pattern as you want.
Here's one way using GNU awk. Run like:
awk -f script.awk numbers.txt file.txt
Contents of script.awk:
BEGIN {
# set the field separator
FS="|"
}
# for the first file in the arguments list
FNR==NR {
# add the row number and field one as keys to a multidimensional array with
# a value of field two
a[NR][$1]=$2
# skip processing the rest of the code
next
}
# for the second file in the arguments list
{
# for every element in the array's first dimension
for (i in a) {
# for every element in the second dimension
for (j in a[i]) {
# ensure that the first field is treated numerically
j+=0
# if the line number is greater than or equal to the first field
# and less than or equal to the second field
if (FNR>=j && FNR<=a[i][j]) {
# print the line to a file with the suffix of the first file's
# line number (the first dimension)
print > "File" i
}
}
}
}
Alternatively, here's the one-liner:
awk -F "|" 'FNR==NR { a[NR][$1]=$2; next } { for (i in a) for (j in a[i]) { j+=0; if (FNR>=j && FNR<=a[i][j]) print > "File" i } }' numbers.txt file.txt
If you have an 'old' awk, here's the version with compatibility. Run like:
awk -f script.awk numbers.txt file.txt
Contents of script.awk:
BEGIN {
# set the field separator
FS="|"
}
# for the first file in the arguments list
FNR==NR {
# add the row number and field one as a key to a pseudo-multidimensional
# array with a value of field two
a[NR,$1]=$2
# skip processing the rest of the code
next
}
# for the second file in the arguments list
{
# for every element in the array
for (i in a) {
# split the element in to another array
# b[1] is the row number and b[2] is the first field
split(i,b,SUBSEP)
# if the line number is greater than or equal to the first field
# and less than or equal to the second field
if (FNR>=b[2] && FNR<=a[i]) {
# print the line to a file with the suffix of the first file's
# line number (the first pseudo-dimension)
print > "File" b[1]
}
}
}
Alternatively, here's the one-liner:
awk -F "|" 'FNR==NR { a[NR,$1]=$2; next } { for (i in a) { split(i,b,SUBSEP); if (FNR>=b[2] && FNR<=a[i]) print > "File" b[1] } }' numbers.txt file.txt
I would use sed to process the sample data file because it is simple and swift. This requires a mechanism for converting the line numbers file into the appropriate sed script. There are many ways to do this.
One way uses sed to convert the set of line numbers into a sed script. If everything was going to standard output, this would be trivial. With the output needing to go to different files, we need a line number for each line in the line numbers file. One way to give line numbers is the nl command. Another possibility would be to use pr -n -l1. The same sed command line works with both:
nl linenumbers.txt |
sed 's/ *\([0-9]*\)[^0-9]*\([0-9]*\)|\([0-9]*\)/\2,\3w file\1.txt/'
For the given data file, that generates:
345,789w file1.txt
999,1056w file2.txt
1522,1366w file3.txt
3523,3562w file4.txt
Another option would be to have awk generate the sed script:
awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt
If your version of sed will allow you to read its script from standard input with -f - (GNU sed does; BSD sed does not), then you can convert the line numbers file into a sed script on the fly, and use that to parse the sample data:
awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt |
sed -n -f - sample.data
If your system supports /dev/stdin, you can use one of:
awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt |
sed -n -f /dev/stdin sample.data
awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt |
sed -n -f /dev/fd/0 sample.data
Failing that, use an explicit script file:
awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt > sed.script
sed -n -f sed.script sample.data
rm -f sed.script
Strictly, you should deal with ensuring the temporary file name is unique (mktemp) and removed even if the script is interrupted (trap):
tmp=$(mktemp sed.script.XXXXXX)
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15
awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt > $tmp
sed -n -f $tmp sample.data
rm -f $tmp
trap 0
The final trap 0 allows your script to exit successfully; omit it, and your script will always exit with status 1.
I've ignored Perl and Python; either could be used for this in a single command. The file management is just fiddly enough that using sed seems simpler. You could also use just awk, either with a first awk script writing an awk script to do the heavy duty work (trivial extension of the above), or having a single awk process read both files and produce the required output (harder, but far from impossible).
If nothing else, this shows that there are many possible ways of doing the job. If this is a one-off exercise, it really doesn't matter very much which you choose. If you will be doing this repeatedly, then choose the mechanism that you like. If you're worried about performance, measure. It is likely that converting the line numbers into a command script is a negligible cost; processing the sample data with the command script is where the time is taken. I would expect sed to excel at that point; I've not measured to confirm that it does.
You could do the following
# myscript.sh
linenumbers="linenumbers.txt"
somefile="afile"
while IFS=\| read start end ; do
echo "sed -n '$start,${end}p;${end}q;' $somefile > $somefile-$start-$end"
done < $linenumbers
Run it like so: sh myscript.sh
sed -n '345,789p;789q;' afile > afile-345-789
sed -n '999,1056p;1056q;' afile > afile-999-1056
sed -n '1522,1366p;1366q;' afile > afile-1522-1366
sed -n '3523,3562p;3562q;' afile > afile-3523-3562
Then, when you're happy, run sh myscript.sh | sh
EDIT Added William's excellent points on style and correctness.
EDIT Explanation
The basic idea is to get a script to generate a series of shell commands that can be checked for correctness first before being executed by "| sh".
sed -n '345,789p;789q;' means: run sed without echoing each line (-n), with two commands. The first prints (p) lines 345 to 789; the second quits (q) at line 789. Quitting on the last line of the range saves sed from reading the rest of the input file.
The while loop reads from the $linenumbers file using read. Given more than one variable name, read populates each with a field from the input; fields are normally separated by whitespace, and if there are too few variable names, read puts the remaining data into the last variable.
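The print-then-quit pattern is easy to try on a tiny input. A sketch with six made-up lines and the range 2 to 4:

```shell
# Print lines 2-4, then stop reading at line 4
picked=$(printf '%s\n' one two three four five six | sed -n '2,4p;4q')
printf '%s\n' "$picked"
```

Lines five and six are never processed, which is the saving the q command buys on a large file.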
You can put the following in at your shell prompt to understand that behaviour.
ls -l | while read first rest ; do
echo $first XXXX $rest
done
Try adding another variable, second, to the above to see what happens; it should be obvious.
The problem is that your data is delimited by | characters. That is where William's suggestion of IFS=\| comes in: with IFS changed for the read, the input is split on | and we get the desired result.
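A minimal sketch of that behaviour, feeding two of the sample ranges straight into the loop instead of reading them from a file:

```shell
ranges=$(printf '%s\n' '345|789' '999|1056' |
  while IFS='|' read -r start end; do
      # IFS applies only to this read, so each line splits on "|"
      echo "start=$start end=$end"
  done)

printf '%s\n' "$ranges"
```

The -r option is an addition here; it only stops read from treating backslashes specially.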
Others can feel free to edit,correct and expand.
To extract the first field from 345|789 you can e.g use awk
awk -F'|' '{print $1}'
Combine that with the answers received from your other question and you will have a solution.
This might work for you (GNU sed):
sed -r 's/(.*)\|(.*)/\1,\2w file-\1-\2.txt/' linenumbers.txt | sed -nf - file

Use "cut" in shell script without space as delimiter

I'm trying to write a script that reads the file content below and extract the value in the 6th column of each line, then print each line without the 6th column. The comma is used as the delimiter.
Input:
123,456,789,101,145,5671,hello world,goodbye for now
223,456,789,101,145,5672,hello world,goodbye for now
323,456,789,101,145,5673,hello world,goodbye for now
What I did was
#!/bin/bash
for i in `cat test_input.txt`
do
COLUMN=`echo $i | cut -f6 -d','`
echo $i | cut -f1-5,7- -d',' >> test_$COLUMN.txt
done
The output I got was
test_5671.txt:
123,456,789,101,145,hello
test_5672.txt:
223,456,789,101,145,hello
test_5673.txt:
323,456,789,101,145,hello
The rest, "world,goodbye for now", was not written to the output files; it seems the space between "hello" and "world" was used as a delimiter?
How do I get the correct output, for example:
123,456,789,101,145,hello world,goodbye for now
It's not a problem with the cut command but with the for loop you're using. For the first loop run the variable i will only contain 123,456,789,101,145,5671,hello.
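The splitting can be made visible by counting what each kind of loop sees for one sample line. A small sketch, assuming the default IFS:

```shell
input='123,456,789,101,145,5671,hello world,goodbye for now'

# Unquoted expansion in a for loop: split on whitespace into separate words
for_count=0
for word in $input; do
    for_count=$((for_count + 1))
done

# A read loop: one iteration per line, spaces preserved
read_count=0
while IFS= read -r line; do
    read_count=$((read_count + 1))
done <<EOF
$input
EOF

echo "for saw $for_count items, read saw $read_count line"
```

The for loop sees four items because the line contains three spaces, while the read loop sees the single line intact.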
If you insist on reading the input file line by line (not very efficient), you'd better use a read loop like this:
while IFS= read -r i
do
...
done < test_input.txt
echo '123,456,789,101,145,5671,hello world,goodbye for now' | while IFS=, read -r one two three four five six seven eight rest
do
echo "$six"
echo "$one,$two,$three,$four,$five,$seven,$eight${rest:+,$rest}"
done
Prints:
5671
123,456,789,101,145,hello world,goodbye for now
See the man bash Parameter Expansion section for the :+ syntax (essentially it outputs a comma and the $rest if $rest is defined and non-empty).
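The effect of the :+ form can be checked on its own. A sketch with made-up values:

```shell
rest=''
short="a,b${rest:+,$rest}"     # rest is empty: nothing is appended

rest='c,d'
long="a,b${rest:+,$rest}"      # rest is non-empty: a comma and its value are appended

echo "$short"
echo "$long"
```

This is why the loop above does not print a dangling comma when a line has exactly eight fields.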
Also, you shouldn't use for to loop over file contents.
As ktf mentioned, your problem is not with cut but with the way you're passing the lines into cut. The solution he/she has provided should work.
Alternatively, you could achieve the same behaviour with a line of awk:
awk -F, '{for(i=1;i<=NF;i++) {if(i!=6) printf "%s%s",$i,(i==NF)?"\n":"," > ("test_"$6".txt")}}' test_input.txt
For clarity, here's a verbose version:
awk -F, ' # "-F,": using comma as field separator
{ # for each line in file
for(i=1;i<=NF;i++) { # for each column
sep = (i == NF) ? "\n" : "," # column separator
outfile = "test_"$6".txt" # output file
if (i != 6) { # skip sixth column
printf "%s%s", $i, sep > outfile
}
}
}' test_input.txt
An easy method is to use the tr command to convert the space character into # and, after the cat command, translate it back into a space.
