Use "cut" in shell script without space as delimiter - shell

I'm trying to write a script that reads the file content below, extracts the value in the 6th column of each line, and then prints each line without the 6th column. The comma is used as the delimiter.
Input:
123,456,789,101,145,5671,hello world,goodbye for now
223,456,789,101,145,5672,hello world,goodbye for now
323,456,789,101,145,5673,hello world,goodbye for now
What I did was
#!/bin/bash
for i in `cat test_input.txt`
do
COLUMN=`echo $i | cut -f6 -d','`
echo $i | cut -f1-5,7- -d',' >> test_$COLUMN.txt
done
The output I got was
test_5671.txt:
123,456,789,101,145,hello
test_5672.txt:
223,456,789,101,145,hello
test_5673.txt:
323,456,789,101,145,hello
The rest of "world, goodbye for now" was not written into the output files, because it seems like the space between "hello" and "world" was used as a delimiter?
How do I get the correct output?
123,456,789,101,145,hello world,goodbye for now

It's not a problem with the cut command but with the for loop you're using: the shell word-splits the output of cat on whitespace, including the space in "hello world", so for the first loop run the variable i will only contain 123,456,789,101,145,5671,hello.
If you insist on reading the input file line by line (not very efficient), you'd better use a read loop like this:
while IFS= read -r i
do
...
done < test_input.txt
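For completeness, here's a minimal sketch of the whole script rewritten around such a read loop; the here-doc stands in for the question's test_input.txt:

```shell
#!/bin/bash
# Sample input, standing in for the question's test_input.txt
cat > test_input.txt <<'EOF'
123,456,789,101,145,5671,hello world,goodbye for now
223,456,789,101,145,5672,hello world,goodbye for now
EOF

# IFS= and -r keep each line intact: no word splitting, no backslash mangling
while IFS= read -r line; do
    column=$(echo "$line" | cut -f6 -d',')
    echo "$line" | cut -f1-5,7- -d',' >> "test_${column}.txt"
done < test_input.txt
```

Each output file now keeps the full "hello world,goodbye for now" tail, because cut sees the whole line rather than a space-split word.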

echo '123,456,789,101,145,5671,hello world,goodbye for now' | while IFS=, read -r one two three four five six seven eight rest
do
echo "$six"
echo "$one,$two,$three,$four,$five,$seven,$eight${rest:+,$rest}"
done
Prints:
5671
123,456,789,101,145,hello world,goodbye for now
See the man bash Parameter Expansion section for the :+ syntax (essentially it outputs a comma and the $rest if $rest is defined and non-empty).
Also, you shouldn't use for to loop over file contents.

As ktf mentioned, your problem is not with cut but with the way you're passing the lines into cut. The solution he/she has provided should work.
Alternatively, you could achieve the same behaviour with a line of awk:
awk -F, '{for(i=1;i<=NF;i++) {if(i!=6) printf "%s%s",$i,(i==NF)?"\n":"," > "test_"$6".txt"}}' test_input.txt
For clarity, here's a verbose version:
awk -F, ' # "-F,": using comma as field separator
{ # for each line in file
for(i=1;i<=NF;i++) { # for each column
sep = (i == NF) ? "\n" : "," # column separator
outfile = "test_"$6".txt" # output file
if (i != 6) { # skip sixth column
printf "%s%s", $i, sep > outfile
}
}
}' test_input.txt

An easy method is to use the tr command to convert the space character into # first, and after the processing is done translate it back into a space.
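A sketch of that idea, with the loud assumption that '#' never occurs in the real data:

```shell
# Sample input
cat > test_input.txt <<'EOF'
123,456,789,101,145,5671,hello world,goodbye for now
EOF

# Protect spaces before the for loop word-splits, restore them afterwards.
# Assumes '#' never appears in the actual data.
tr ' ' '#' < test_input.txt > protected.txt
for i in $(cat protected.txt); do
    column=$(echo "$i" | cut -f6 -d',')
    echo "$i" | cut -f1-5,7- -d',' | tr '#' ' ' >> "test_${column}.txt"
done
```

The while-read loop above is still the cleaner fix; this only papers over the word splitting.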

Related

How to find content in a file and replace the adjacent value

Using bash, how do I find a string and update the string next to it? For example, I pass the value
my.site.com|test2.spin:80
proxy_pass.map
my.site2.com test2.spin:80
my.site.com test.spin:8080;
Expected output is to update proxy_pass.map with
my.site2.com test2.spin:80
my.site.com test2.spin:80;
I tried using awk
awk '{gsub(/^my\.site\.com\s+[A-Za-z0-9]+\.spin:8080;$/,"my.site2.comtest2.spin:80"); print}' proxy_pass.map
but it does not seem to work. Is there a better way to approach the problem?
One awk idea, assuming spacing needs to be maintained:
awk -v rep='my.site.com|test2.spin:80' '
BEGIN { split(rep,a,"|") # split "rep" variable and store in
site[a[1]]=a[2] # associative array
}
$1 in site { line=$0 # if 1st field is in site[] array then make copy of current line
match(line,$1) # find where 1st field starts (in case 1st field does not start in column #1)
newline=substr(line,1,RSTART+RLENGTH-1) # save current line up through matching 1st field
line=substr(line,RSTART+RLENGTH) # strip off 1st field
match(line,/[^[:space:];]+/) # look for string that does not contain spaces or ";" and perform replacement, making sure to save everything after the match (";" in this case)
newline=newline substr(line,1,RSTART-1) site[$1] substr(line,RSTART+RLENGTH)
$0=newline # replace current line with newline
}
1 # print current line
' proxy_pass.map
This generates:
my.site2.com test2.spin:80
my.site.com test2.spin:80;
If the input looks like:
$ cat proxy_pass.map
my.site2.com test2.spin:80
my.site.com test.spin:8080;
This awk script generates:
my.site2.com test2.spin:80
my.site.com test2.spin:80;
NOTES:
if multiple replacements need to be performed I'd suggest placing them in a file and having awk process said file first
the 2nd match() is hardcoded based on OP's example; depending on actual file contents it may be necessary to expand on the regex used in the 2nd match()
once satisfied with the result, the original input file can be updated in a couple of ways ... a) if using GNU awk then awk -i inplace -v rep.... or b) save the result to a temp file and then mv the temp file to proxy_pass.map
If the number of spaces between the columns is not significant, a simple
proxyf=proxy_pass.map
tmpf=$$.txt
awk '$1 == "my.site.com" { $2 = "test2.spin:80;" } {print}' <$proxyf >$tmpf && mv $tmpf $proxyf
should do. If you need the columns to be lined up nicely, you can replace the print by a suitable printf .... statement.
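For example, a hypothetical printf variant that left-justifies the first column in a 20-character field (the width 20 is an arbitrary choice, not anything from the question):

```shell
# Sample input, matching the question's proxy_pass.map
cat > proxy_pass.map <<'EOF'
my.site2.com test2.spin:80
my.site.com test.spin:8080;
EOF

proxyf=proxy_pass.map
tmpf=$$.txt
# %-20s pads the first column to 20 characters so the second column lines up
awk '$1 == "my.site.com" { $2 = "test2.spin:80;" } { printf "%-20s%s\n", $1, $2 }' <"$proxyf" >"$tmpf" &&
  mv "$tmpf" "$proxyf"
```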
With your shown samples and attempts, please try the following awk code. A shell variable named var stores the value my.site.com|test2.spin:80 and is passed into the awk program, where it is available as the awk variable var1.
In the BEGIN section, the split function splits var1 into an array named arr on the | separator; num receives the total number of elements produced by split. A for loop then runs up to num building an associative array named arr2, taking each element at index i as a key and the element at i+1 as its value (so odd positions are keys and the following items are values).
In the main block of the awk program, if $1 is present in arr2 then arr2's value is printed, else $2 is printed, as per the requirement.
##Shell variable named var is being created here...
var="my.site.com|test2.spin:80"
awk -v var1="$var" '
BEGIN{
num=split(var1,arr,"|")
for(i=1;i<=num;i+=2){
arr2[arr[i]]=arr[i+1]
}
}
{
print $1,(($1 in arr2)?arr2[$1]:$2)
}
' Input_file
OR, in case you want to maintain the spacing between the 1st and 2nd field(s), try the following slight tweak of the above code. Written and tested with your shown samples only.
awk -v var1="$var" '
BEGIN{
num=split(var1,arr,"|")
for(i=1;i<=num;i+=2){
arr2[arr[i]]=arr[i+1]
}
}
{
match($0,/[[:space:]]+/)
print $1 substr($0,RSTART,RLENGTH) (($1 in arr2)?arr2[$1]:$2)
}
' Input_file
NOTE: This program can take multiple values separated by | in the shell variable to be passed and checked in the awk program, but it assumes they always come in the format key|value|key|value... only.
#!/bin/sh -x
f1=$(echo "my.site.com|test2.spin:80" | cut -d'|' -f1)
f2=$(echo "my.site.com|test2.spin:80" | cut -d'|' -f2)
echo "${f1}%${f2};" >> proxy_pass.map
tr '%' '\t' < proxy_pass.map >> p1
cat > ed1 <<EOF
$
-1
d
wq
EOF
ed -s p1 < ed1
mv -v p1 proxy_pass.map
rm -v ed1
This might work for you (GNU sed):
<<<'my.site.com|test2.spin:80' sed -E 's#\.#\\.#g;s#^(\S+)\|(\S+)#/^\1\\b/s/\\S+/\2/2#' |
sed -Ef - file
Build a sed script from the input arguments and apply it to the input file.
The input arguments are first prepared so that their metacharacters (in this case the .'s) are escaped.
Then the first argument is used to prepare a match command and the second is used as the value to be replaced in a substitution command.
The result is piped into a second sed invocation that takes the sed script and applies it to the input file.
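To see the script that the first sed builds, run it by itself (GNU sed, since \S and \b are GNU extensions):

```shell
# Show the intermediate sed script generated from the argument pair
printf '%s\n' 'my.site.com|test2.spin:80' |
  sed -E 's#\.#\\.#g;s#^(\S+)\|(\S+)#/^\1\\b/s/\\S+/\2/2#'
```

It should print /^my\.site\.com\b/s/\S+/test2\.spin:80/2, i.e. "on lines starting with my.site.com, replace the 2nd run of non-space characters".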

bash string manipulation - regex match with delimiter

I have a string like this:
zone=INTERNET|status=good|routed=special|location=001|resp=user|switch=not set|stack=no|dswres=no|CIDR=10.10.10.0/24|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=enable|inheritPingBeforeAssign=true|locationInherited=true|gateway=10.10.10.100|inheritDefaultDomains=true|inheritDefaultView=true|inheritDNSRestrictions=true|name=SCB-INET-A
The order inside the delimiter | can be random - that means the key-value pairs can be randomly ordered in the string.
I want an output string like the following:
"INTERNET","10.10.10.0/24","SCB-INET-A"
All values in the output are values from the key-value string above
Does anyone know how I can solve this with awk or sed?
Given your input is a variable var:
var="zone=INTERNET|status=good|routed=special|location=001|resp=user|switch=not set|stack=no|dswres=no|CIDR=10.10.10.0/24|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=enable|inheritPingBeforeAssign=true|locationInherited=true|gateway=10.10.10.100|inheritDefaultDomains=true|inheritDefaultView=true|inheritDNSRestrictions=true|name=SCB-INET-A"
echo "$var" | tr "|" "\n" | sed -n -r "s/(zone|name|gateway)=(.*)/\"\2\"/p"
"INTERNET"
"10.10.10.100"
"SCB-INET-A"
Appending another 2 pipes inserts commas and removes the line breaks (SOFAR stands for the pipeline above):
SOFAR | tr "\n" "," | sed 's/,$//'
"INTERNET","10.10.10.100","SCB-INET-A"
Whenever you have name -> value pairs in your input the best approach is to create an array of those mappings (f[] below) and then access the values by their names:
$ cat tst.awk
BEGIN { RS="|"; FS="[=\n]"; OFS="," }
{ f[$1] = "\"" $2 "\"" }
END { print f["zone"], f["CIDR"], f["name"] }
$ awk -f tst.awk file
"INTERNET","10.10.10.0/24","SCB-INET-A"
The above will work efficiently (i.e. literally orders of magnitude faster than a shell loop) and portably using any awk in any shell on any UNIX box, unlike all of the other answers so far which all rely on non-POSIX functionality. It does full string matching instead of partial regexp matching, like some of the other answers, so it is extremely robust and will not result in bad output given partial matches. It also will not interpret any input characters (e.g. escape sequences and/or globbing chars), like some of your other answers do, and instead will just robustly reproduce them as-is in the output.
If you need to enhance it to print any extra field values just add them as , f["<field name>"] to the print statement and if you need to change the output format or do anything else it's all absolutely trivial too.
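For example, to also print the gateway, extend the END print with f["gateway"]; sketched here as a one-shot pipeline (with a shortened input line) instead of a script file:

```shell
# Same mapping technique, with f["gateway"] added to the print
printf '%s\n' 'zone=INTERNET|CIDR=10.10.10.0/24|gateway=10.10.10.100|name=SCB-INET-A' |
awk 'BEGIN { RS="|"; FS="[=\n]"; OFS="," }
     { f[$1] = "\"" $2 "\"" }
     END { print f["zone"], f["CIDR"], f["gateway"], f["name"] }'
```

This prints "INTERNET","10.10.10.0/24","10.10.10.100","SCB-INET-A".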
Using awk:
var="zone=INTERNET|status=good|routed=special|location=001|resp=user|switch=not set|stack=no|dswres=no|CIDR=10.10.10.0/24|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=enable|inheritPingBeforeAssign=true|locationInherited=true|gateway=10.10.10.100|inheritDefaultDomains=true|inheritDefaultView=true|name=SCB-INET-A|inheritDNSRestrictions=true"
awk -v RS='|' -v ORS=',' -F= '$1~/zone|gateway|name/{print "\"" $2 "\""}' <<<"$var" | sed 's/,$//'
"INTERNET","10.10.10.100","SCB-INET-A"
The input record separator RS is set to |.
The input field separator FS is set to =.
The output record separator ORS is set to ,.
$1~/zone|gateway|name/ filters the parameters to extract. The print statement adds double quotes around the parameter value.
The sed statement removes the annoying trailing , (which the print statement adds).
One more solution using Bash. Not the shortest, but I hope it is the most readable and therefore the most maintainable.
#!/bin/bash
# Function split_key_val()
# selects values from a string with key-value pairs
# IN: string_with_key_value_pairs wanted_key_1 [wanted_key_2] ...
# OUT: result
function split_key_val {
local KEY_VAL_STRING="$1"
local RESULT
# read the string with key-value pairs into array
IFS=\| read -r -a ARRAY <<< "$KEY_VAL_STRING"
#
shift
# while there are wanted-keys ...
while [[ -n $1 ]]
do
WANTED_KEY="$1"
# Search the array for the wanted-key
for KEY_VALUE in "${ARRAY[@]}"
do
# the key is the part before "="
KEY=$(echo "$KEY_VALUE" |cut --delimiter="=" --fields=1)
# the value is the part after "="
VALUE=$(echo "$KEY_VALUE" |cut --delimiter="=" --fields=2)
if [[ $KEY == $WANTED_KEY ]]
then
# if result is empty; result= found value...
if [[ -z $RESULT ]]
then
# (quote the damned quotes)
RESULT="\"${VALUE}\""
else
# ... else add a comma as a separator
RESULT="${RESULT},\"${VALUE}\""
fi
fi # key == wanted-key
done # searched whole array
shift # prepare for next wanted-key
done
echo "$RESULT"
return 0
}
STRING="zone=INTERNET|status=good|routed=special|location=001|resp=user|switch=not set|stack=no|dswres=no|CIDR=10.10.10.0/24|allowDuplicateHost=disable|inheritAllowDuplicateHost=true|pingBeforeAssign=enable|inheritPingBeforeAssign=true|locationInherited=true|gateway=10.10.10.100|inheritDefaultDomains=true|inheritDefaultView=true|inheritDNSRestrictions=true|name=SCB-INET-A"
split_key_val "$STRING" zone CIDR name
The result is:
"INTERNET","10.10.10.0/24","SCB-INET-A"
without using more sophisticated text editing tools (as an exercise!)
$ tr '|' '\n' <file | # make it columnar
egrep '^(zone|CIDR|name)=' | # get exact key matches
cut -d= -f2 | # get values
while read line; do echo '"'$line'"'; done | # quote values
paste -sd, # flatten with comma
will give
"INTERNET","10.10.10.0/24","SCB-INET-A"
you can also replace the while statement with xargs printf '"%s"\n'
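For reference, the xargs variant as a complete sketch (the here-doc is a shortened stand-in for the real input line):

```shell
# Shortened sample input
cat > file <<'EOF'
zone=INTERNET|status=good|CIDR=10.10.10.0/24|name=SCB-INET-A
EOF

tr '|' '\n' < file |             # make it columnar
  grep -E '^(zone|CIDR|name)=' | # get exact key matches
  cut -d= -f2 |                  # get values
  xargs printf '"%s"\n' |        # quote values
  paste -sd,                     # flatten with comma
```

(grep -E is the modern spelling of egrep.) Note that xargs word-splits its input, so this shortcut only holds while the selected values contain no spaces.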
Not using sed or awk but the Bash Arrays feature.
line="zone=INTERNET|sta=good|CIDR=10.10.10.0/24|a=1 1|...=...|name=SCB-INET-A"
echo "$line" | tr '|' '\n' | {
declare -A vars
while read -r item ; do
if [ -n "$item" ] ; then
vars["${item%%=*}"]="${item##*=}"
fi
done
echo "\"${vars[zone]}\",\"${vars[CIDR]}\",\"${vars[name]}\"" ; }
One advantage of this method is that you always get your fields in order independent of the order of fields in the input line.

Find and Replace with awk

I have this value, cutted from .txt:
,Request Id,dummy1,dummy2,dummyN
I am trying to find and replace the space with "_", like this:
#iterator to read lines of txt
#if conditions
trim_line=$(echo "$user" | awk '{gsub(" ", "_", $0); print}')
echo $trim_line
but the echo is showing:
Id,dummy1,dummy2,dummyN
Expected output:
,Request_Id,dummy1,dummy2,dummyN
Where is my bug?
EDIT:
The echo of user is not what I expected; it is:
Id,dummy1,dummy2,dummyN
And should be:
,Request Id,dummy1,dummy2,dummyN
To do this operation I am using:
for user in $(cut -d: -f1 $FILENAME)
do (....) find/replace
You can try bash search and replace substring :
echo $user
,Request Id,dummy1,dummy2,dummyN
echo ${user// /_} ## For all the spaces
,Request_Id,dummy1,dummy2,dummyN
echo ${user/ /_} ## For first match
This will replace all the blank spaces with _. Note that two / are used after user: this performs the search and replace over the whole string. If you put only one /, the replacement is done only on the first match.
Your problem is your use of a for loop to read the contents of your file. The shell splits the output of your command substitution $(cut -d: -f1 $FILENAME) on white space and you have one in the middle of your line, so it breaks.
Use a while read loop to read the file line by line:
while IFS=: read -r col junk; do
col=${col// /_}
# use $col here
done < "$FILENAME"
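A runnable sketch of that loop, with hypothetical sample data standing in for $FILENAME (note ${col// /_} is a bash-only expansion):

```shell
# Hypothetical sample data standing in for $FILENAME
cat > users.txt <<'EOF'
,Request Id,dummy1,dummy2,dummyN:ignored-part
EOF

# IFS=: splits each line on ":" only, so the space survives into $col
while IFS=: read -r col junk; do
    col=${col// /_}    # bash-only: replace every space with "_"
    echo "$col"
done < users.txt
```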
As others have mentioned, there's no need to use an external tool to make the substitution.
...That said, if you don't plan on doing something different (e.g. executing other commands) with each line, then the best option is to use awk:
awk -F: '{ gsub(/ /, "_", $1); print $1 }' "$FILENAME"
The output of this command is the first column of your input file, with the substitution made.
If your data is already in an environment variable, the fastest way is to directly use built-in bash replacement feature:
echo "${user// /_}"
With awk, set the separator as , or the space character will be interpreted as the separator.
echo ",Request Id,dummy1,dummy2,dummyN" | awk -F, '{gsub(" ", "_", $0); print}'
,Request_Id,dummy1,dummy2,dummyN
note: if it's just to replace a character in a raw string (no tokens, no fields), bash, sed and tr are best suited.
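For instance, all three of these produce the underscored line:

```shell
line=',Request Id,dummy1,dummy2,dummyN'

# Three equivalent ways to turn the spaces into underscores
echo "$line" | tr ' ' '_'      # tr: pure character translation
echo "$line" | sed 's/ /_/g'   # sed: global substitution
echo "${line// /_}"            # bash parameter expansion, no subprocess
```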

sed - removing last comma from listed value after doing a replace

I'm using sed to replace the newlines \n in my file with ',', which works fine; however, I don't want the , after the last item.
How can I remove this?
Example:
sed 's/\n/,/g' myfile.out > myfile.csv
Output:
1,2,3,4,5,6,
Well you can use labels:
$ cat file
1
2
3
4
5
6
$ sed ':a;N;s/\n/,/;ba' file
1,2,3,4,5,6
You can also use paste command:
$ paste -sd, file
1,2,3,4,5,6
Consider jaypal singh's paste solution, which is the most efficient and elegant.
An awk alternative, which doesn't require reading the entire file into memory first:
awk '{ printf "%s%s", sep, $0; sep = "," }' myfile.out > myfile.csv
If the output should have a trailing newline (thanks, Ed Morton):
awk '{ printf "%s%s", sep, $0; sep = "," } END { printf "\n" }' myfile.out > myfile.csv
For the first input line, sep, due to being an uninitialized variable, defaults to the empty string, effectively printing just $0, the input line.
Setting sep to "," after the first print ensures that all remaining lines have a , prepended.
END { printf "\n" } prints a trailing newline after all input lines have been processed. (print "" would work too, given that print appends the output record separator (ORS), which defaults to a newline).
The net effect is that , is only placed between input lines, so the output won't have a trailing comma.
You could add a second s command after the first: sed -z 's/\n/,/g ; s/,$//'. This removes the comma at the end. (The -z option is from GNU sed, and I needed it to get the first s command working.)
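As a self-contained sketch (GNU sed's -z is required, because plain sed strips the \n before each line reaches the script):

```shell
# Sample input: one number per line
printf '%s\n' 1 2 3 4 5 6 > myfile.out

# -z reads NUL-separated records, so the whole file becomes one record
# and the embedded \n's are matchable; the second s drops the trailing
# comma left over from the final newline
sed -z 's/\n/,/g ; s/,$//' myfile.out > myfile.csv
```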

Search file A for a list of strings located in file B and append the value associated with that string to the end of the line in file A

This is a bit complicated, well I think it is..
I have two files, File A and file B
File A contains delay information for a pin and is in the following format
AD22 15484
AB22 9485
AD23 10945
File B contains a component declaration that needs this information added to it and is in the format:
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
So what I am trying to achieve is the following output
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='15484';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='10945';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='9485';
There is no order to the pin numbers in file A or B
So I'm assuming the following needs to happen
open file A, read first line
search file B for first string field in the line just read
once found in file B at the end of the line add the text "\nPIN_DELAY='"
add the second string filed of the line read from file A
add the following text at the end "';"
repeat by opening file A, read the second line
I'm assuming it will be a combination of sed and awk commands and I'm currently trying to work it out but think this is beyond my knowledge. Many thanks in advance as I know it's complicated..
FILE2=`cat file2`
FILE1=`cat file1`
TMPFILE=`mktemp XXXXXXXX.tmp`
FLAG=0
for line in $FILE1;do
echo $line >> $TMPFILE
for line2 in $FILE2;do
if [ $FLAG == 1 ];then
echo -e "PIN_DELAY='$(echo $line2 | awk -F " " '{print $1}')'" >> $TMPFILE
FLAG=0
elif [ "`echo $line | grep $(echo $line2 | awk -F " " '{print $1}')`" != "" ];then
FLAG=1
fi
done
done
mv $TMPFILE file1
Works for me; you can also add a trap to remove the tmp file if the user sends SIGINT.
awk to the rescue...
$ awk -vq="'" 'NR==FNR{a[$1]=$2;next} {print; for(k in a) if(match($0,k)) {print "PIN_DELAY=" q a[k] q ";"; next}}' keys data
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='15484';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='10945';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='9485';
Explanation: scan the first file for key/value pairs. For each line in the second data file print the line, and for any matching key print the value of the key in the requested format. Single quotes in awk are a little tricky; setting a q variable is one way of handling it.
FINAL script for my application. A big thank you to all that helped..
# ! /usr/bin/sh
# script created by Adam with a LOT of help from users on stackoverflow
# must pass $1 file (package file from Xilinx)
# must pass $2 file (chips.prt file from the PCB design office)
# remove these temp files, throws error if not present tho, whoops!!
rm DELAYS.txt CHIP.txt OUTPUT.txt
# BELOW::create temp files for the code thanks to Glastis#stackoverflow https://stackoverflow.com/users/5101968/glastis I now know how to do this
DELAYS=`mktemp DELAYS.txt`
CHIP=`mktemp CHIP.txt`
OUTPUT=`mktemp OUTPUT.txt`
# BELOW::grep input file 1 (pkg file from Xilinx) for lines containing a delay in the form of n.n and use TAIL to remove something (can't remember), sed to remove blanks and replace with single space, sed to remove space before \n, use awk to print columns 3,9,10 and feed into awk again to calculate delay provided by fedorqui#stackoverflow https://stackoverflow.com/users/1983854/fedorqui
# In awk, NF refers to the number of fields on the current line. Since $n refers to the field number n, with $(NF-1) we refer to the penultimate field.
# {...}1 do stuff and then print the resulting line. 1 evaluates as True and anything True triggers awk to perform its default action, which is to print the current line.
# $(NF-1) + $NF)/2 * 141 perform the calculation: `(penultimate + last) / 2 * 141
# {$(NF-1)=sprintf( ... ) assign the result of the previous calculation to the penultimate field. Using sprintf with %.0f we make sure the rounding is performed, as described above.
# {...; NF--} once the calculation is done, we have its result in the penultimate field. To remove the last column, we just say "hey, decrease the number of fields" so that the last one gets "removed".
grep -E -0 '[0-9]\.[0-9]' $1 | tail -n +2 | sed -e 's/[[:blank:]]\+/ /g' -e 's/\s\n/\n/g' | awk '{print ","$3",",$9,$10}' | awk '{$(NF-1)=sprintf("%.0f", ($(NF-1) + $NF)/2 * 169); NF--}1' >> $DELAYS
# remove blanks in part file and add additional commas (,) so that the following awk command works properly
cat $2 | sed -e "s/[[:blank:]]\+//" -e "s/(/(,/g" -e 's/)/,)/g' >> $CHIP
# this awk command is provided by karakfa#stackoverflow https://stackoverflow.com/users/1435869/karakfa Explanation: scan the first file for key/value pairs. For each line in the second data file print the line, for any matching key print value of the key in the requested format. Single quotes in awk is little tricky, setting a q variable is one way of handling it. https://stackoverflow.com/questions/32458680/search-file-a-for-a-list-of-strings-located-in-file-b-and-append-the-value-assoc
awk -vq="'" 'NR==FNR{a[$1]=$2;next} {print; for(k in a) if(match($0,k)) {print "PIN_DELAY=" q a[k] q ";"; next}}' $DELAYS $CHIP >> $OUTPUT
# remove the additional commas (,) added in earlier before ) and after ( and you are done..
cat $OUTPUT | sed -e 's/(,/(/g' -e 's/,)/)/g' >> chipsd.prt
