Assign a line to a variable name in Unix bash

I'm currently reading in a file of three letter strings in unix and was wondering how I would go about making the lines variables so that I can grep them in the code...
My idea goes something like this:
#!/bin/bash
IFS=''
while read line
do
code=$(line)
#This would be where I want to assign the line a variable
grep "$code" final.txt > deptandcourse.txt
#This is where I would want to grep according to that three letter string
done < strings.txt
Sample file (strings.txt):
ABC
BCA
BDC
I would like to put these letters in the variable line and then grep the file (final.txt) first for 'ABC', then 'BCA', then 'BDC'

line is a variable you've set to contain the contents of each line of the file you're reading from throughout the loop, so you don't need to reassign it to another variable. See this page for more information on using read in a loop.
Also, it looks like you might want to append to deptandcourse.txt with >> as using the > redirect will overwrite the file each time.
Maybe this is what you want:
while read -r line
do
    grep "$line" final.txt >> deptandcourse.txt
done < strings.txt
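If leading or trailing whitespace in strings.txt matters, a slightly more defensive sketch also clears IFS for the read (as your own IFS='' line was attempting):
while IFS= read -r line
do
    grep "$line" final.txt >> deptandcourse.txt
done < strings.txt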
As @JohnZwinck suggested in his comment:
grep -f strings.txt final.txt > deptandcourse.txt
which seems to be the best solution.
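If the three-letter codes should match literally rather than as regular expressions, grep's fixed-string mode can be combined with that; a small sketch (assuming your grep supports the common -w word-match option):
grep -Fwf strings.txt final.txt > deptandcourse.txt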
You could also use awk to accomplish the same thing:
awk 'FNR==NR {        # first file (strings.txt): remember each string
    a[$0]             # use the line itself as an array key
    next
}
{                     # second file (final.txt): print matching lines
    for (i in a)
        if ($0 ~ i)
            print
}' strings.txt final.txt > deptandcourse.txt

Related

Grep a line from a file and replace a substring and append the line to the original file in bash?

This is what I want to do.
For example, my file contains many lines, say:
ABC,2,4
DEF,5,6
GHI,8,9
I want to copy the second line, replace a substring EF (all occurrences) to make it XY, and add this line back to the file so the file looks like this:
ABC,2,4
DEF,5,6
GHI,8,9
DXY,5,6
How can I achieve this in bash?
EDIT: I want to do this in general and not necessarily for the second line. I want to grep for EF, and do the substitution in whatever line is returned.
Here's a simple Awk script.
awk -F, -v pat="EF" -v rep="XY" 'BEGIN { OFS=FS }
$1 ~ pat { x = $1; sub(pat, rep, x); y = $0; sub($1, x, y); a[++n] = y }
1
END { for(i=1; i<=n; i++) print a[i] }' file
The -F , says to use comma as the input field separator (internal variable FS) and in the BEGIN block, we also set that as the output field separator (OFS).
If the first field matches the pattern, we copy it into x and substitute pat with rep there; then, in a copy y of the whole line $0, we substitute the first field with the new result and append y to the array a.
1 is a shorthand to say "print the current input line".
Finally, in the END block, we output the values we have collected into a.
This could be somewhat simplified by hardcoding the pattern and the replacement, but I figured it's more useful to make it modular so that you can plug in whatever values you need.
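For the sample file in the question, the script should print:
ABC,2,4
DEF,5,6
GHI,8,9
DXY,5,6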
While this all could be done in native Bash, it tends to get a bit tortured; spending the 30 minutes or so that it takes to get a basic understanding of Awk will be well worth your time. Perhaps tangentially see also while read loop extremely slow compared to cat, why? which explains part of the rationale for preferring to use an external tool like Awk over a pure Bash solution.
You can use the sed command:
sed '
/EF/H         # copy all matching lines
${            # on the last line
  p           # print it
  g           # paste the copied lines
  s/EF/XY/g   # replace all occurrences
  s/^\n//     # get rid of the extra newline
}'
As a one-liner:
sed '/EF/H;${p;g;s/EF/XY/g;s/^\n//}' file.csv
If ed is available/acceptable, something like:
#!/bin/sh
ed -s file.txt <<-'EOF'
$kx
g/^.*EF.*,.*/t'x
'x+;$s/EF/XY/
,p
Q
EOF
Or as a one-liner:
printf '%s\n' '$kx' "g/^.*EF.*,.*/t'x" "'x+;\$s/EF/XY/" ,p Q | ed -s file.txt
Change Q to w if in-place editing is needed.
Remove the ,p to silence the output.
Using BASH:
#!/bin/bash
src="${1:-f.dat}"
rep="${2:-XY}"
declare -a new_lines
while read -r line ; do
    if [[ "$line" == *EF* ]] ; then
        new_lines+=("${line/EF/${rep}}")
    fi
done <"$src"
printf "%s\n" "${new_lines[@]}" >> "$src"
Contents of f.dat before:
ABC,2,4
DEF,5,6
GHI,8,9
Contents of f.dat after:
ABC,2,4
DEF,5,6
GHI,8,9
DXY,5,6
Following on from the great answer by @tripleee, you can create a variation that uses a single call to sub() by outputting all records before the substitution is made, then add the updated record to the array to be output with the END rule, e.g.
awk -F, '1; /EF/ {sub(/EF/,"XY"); a[++n]=$0} END {for(i=1;i<=n;i++) print a[i]}' file
Example Use/Output
An expanded input, based on your answer to my comment below the question, where all occurrences of EF are replaced with XY in all records, e.g.
$ cat file
ABC,2,4
DEF,5,6
GHI,8,9
EFZ,3,7
Use and output:
$ awk -F, '1; /EF/ {sub(/EF/,"XY"); a[++n]=$0} END {for(i=1;i<=n;i++) print a[i]}' file
ABC,2,4
DEF,5,6
GHI,8,9
EFZ,3,7
DXY,5,6
XYZ,3,7
Let me know if you have questions.

Read columns from a file into variables and use them to substitute values in another file

I have the following file:
input.txt
b73_chr10 w22_chr9
w22_chr7 w22_chr10
w22_chr8 w22_chr8
I have written the following code (given below) to read the first and second columns and substitute the values of the first column with the values in the second column in the output.conf file. For example, I would like to replace b73_chr10 with w22_chr9, w22_chr7 with w22_chr10, and w22_chr8 with w22_chr8, and keep doing this for all the values till the end.
value1=$(echo $line| awk -F\ '{print $1}' input.txt)
value2=$(echo $line| awk -F\ '{print $2}' input.txt)
sed -i '.bak' 's/$value1/$value2/g' output.conf
cat output.conf
output.conf
<rules>
<rule>
condition =between(b73_chr10,w22_chr1)
color = ylgn-9-seq-7
flow=continue
z=9
</rule>
<rule>
condition =between(w22_chr7,w22_chr2)
color = blue
flow=continue
z=10
</rule>
<rule>
condition =between(w22_chr8,w22_chr3)
color = vvdblue
flow=continue
z=11
</rule>
</rules>
I tried the commands (as above), but it leaves a blank file for me. Can anybody point out where I went wrong?
I suspect that sed by itself is the wrong tool for this. You can however do what you're asking in bash alone:
#!/usr/bin/env bash
# Declare an associative array (requires bash 4)
declare -A repl=()
# Step through our replacement file, recording it to an array.
while read this that; do
    repl["$this"]="$that"
done < inp1
# Read the input file, replacing strings noted in the array.
while read line; do
    for string in "${!repl[@]}"; do
        line="${line/$string/${repl[$string]}}"
    done
    echo "$line"
done < circos.conf
This approach of course is oversimplified and therefore shouldn't be used verbatim -- you'll want to make sure you're only editing the lines that you really want to edit (verifying that they match /condition =between/ for example). Note that because this solution uses an associative array (declare -A ...), it depends on bash version 4.
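For example, the replacement loop could be guarded so that only the condition lines are touched (a sketch of the same approach, not a drop-in script):
while read line; do
    if [[ $line == *"condition =between"* ]]; then
        for string in "${!repl[@]}"; do
            line="${line/$string/${repl[$string]}}"
        done
    fi
    echo "$line"
done < circos.conf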
If you were to solve this with awk, the same basic principle would apply:
#!/usr/bin/awk -f
# Collect the translations from the first file.
NR==FNR { repl[$1]=$2; next }
# Step through the input file, replacing as required.
{
    for ( string in repl ) {
        sub(string, repl[string])
    }
}
# And print.
1
You'd run this with the first argument being the translation file, and the second being the input file:
$ ./thisscript translations.txt circos.conf
Before you read the better solution(s), a small explanation what you did wrong.
A fixed version of your script would be
while read -r line; do
    value1=$(echo "$line" | awk -F" " '{print $1}')
    value2=$(echo "$line" | awk -F" " '{print $2}')
    sed -i "s/$value1/$value2/g" circos.conf
done < input.txt
What are the changes here?
Added while read -r line; do ... done < input.txt: your "$line" was never initialised.
awk with -F" " and not -F\ : you have whitespace in between.
awk without input.txt: awk should read from the pipe, not from the file.
sed with double quotes: the variables must be evaluated.
What's wrong with this solution?
First you must hope that the values from input.txt are sed-friendly (no slashes or other special characters).
And when you use this for large files, you will keep on looping. awk can handle the looping itself; you should avoid nesting awk in a shell loop.
When the input.txt is limited, you might want something like
sed -i -e 's/b73_chr10/w22_chr9/g' \
-e 's/w22_chr7/w22_chr10/g' \
-e 's/w22_chr8/w22_chr8/g' circos.conf
And now the comment of @alvits makes sense. Put all those sed commands in a sed-command file. When you can't change the format of input.txt, you can rewrite it in the script, but using an array as in the solution of @Ghoti is better.
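For instance, such a sed-command file could be generated from input.txt (a sketch, assuming the values contain no sed metacharacters like / or &):
awk '{ print "s/" $1 "/" $2 "/g" }' input.txt > subs.sed
sed -f subs.sed circos.conf > circos.tmp && mv circos.tmp circos.conf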

Search file A for a list of strings located in file B and append the value associated with that string to the end of the line in file A

This is a bit complicated, well I think it is..
I have two files, File A and file B
File A contains delay information for a pin and is in the following format
AD22 15484
AB22 9485
AD23 10945
File B contains a component declaration that needs this information added to it and is in the format:
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
So what I am trying to achieve is the following output
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='15484';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='10945';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='9485';
There is no order to the pin numbers in file A or B
So I'm assuming the following needs to happen
open file A, read first line
search file B for first string field in the line just read
once found in file B, at the end of that line add the text "\nPIN_DELAY='"
add the second string field of the line read from file A
add the following text at the end: "';"
repeat by opening file A and reading the second line
I'm assuming it will be a combination of sed and awk commands, and I'm currently trying to work it out, but I think this is beyond my knowledge. Many thanks in advance as I know it's complicated.
FILE2=`cat file2`
FILE1=`cat file1`
TMPFILE=`mktemp XXXXXXXX.tmp`
FLAG=0
for line in $FILE1;do
    echo $line >> $TMPFILE
    for line2 in $FILE2;do
        if [ $FLAG == 1 ];then
            echo -e "PIN_DELAY='$(echo $line2 | awk -F " " '{print $1}')'" >> $TMPFILE
            FLAG=0
        elif [ "`echo $line | grep $(echo $line2 | awk -F " " '{print $1}')`" != "" ];then
            FLAG=1
        fi
    done
done
mv $TMPFILE file1
Works for me. You can also add a trap to remove the tmp file if the user sends SIGINT.
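Such a trap might look like this (a sketch, placed right after the mktemp call):
trap 'rm -f "$TMPFILE"' INT  # remove the temp file if the user presses Ctrl-C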
awk to the rescue...
$ awk -vq="'" 'NR==FNR{a[$1]=$2;next} {print; for(k in a) if(match($0,k)) {print "PIN_DELAY=" q a[k] q ";"; next}}' keys data
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='15484';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='10945';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='9485';
Explanation: scan the first file for key/value pairs. For each line in the second data file, print the line, and for any matching key print the value of the key in the requested format. Single quotes in awk are a little tricky; setting a q variable is one way of handling them.
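Another common idiom, shown here only as an alternative sketch, is the octal escape \047, which stands for a single quote inside an awk string:
awk 'BEGIN { q = "\047"; print "PIN_DELAY=" q "15484" q ";" }'
# prints: PIN_DELAY='15484';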
FINAL script for my application. A big thank you to all who helped.
#!/usr/bin/sh
# script created by Adam with a LOT of help from users on stackoverflow
# must pass $1 file (package file from Xilinx)
# must pass $2 file (chips.prt file from the PCB design office)
# remove these temp files, throws error if not present tho, whoops!!
rm DELAYS.txt CHIP.txt OUTPUT.txt
# BELOW::create temp files for the code thanks to Glastis@stackoverflow https://stackoverflow.com/users/5101968/glastis I now know how to do this
DELAYS=`mktemp DELAYS.txt`
CHIP=`mktemp CHIP.txt`
OUTPUT=`mktemp OUTPUT.txt`
# BELOW::grep input file 1 (pkg file from Xilinx) for lines containing a delay in the form of n.n and use TAIL to remove something (can't remember), sed to remove blanks and replace with single space, sed to remove space before \n, use awk to print columns 3,9,10 and feed into awk again to calculate delay provided by fedorqui@stackoverflow https://stackoverflow.com/users/1983854/fedorqui
# In awk, NF refers to the number of fields on the current line. Since $n refers to the field number n, with $(NF-1) we refer to the penultimate field.
# {...}1 do stuff and then print the resulting line. 1 evaluates as True and anything True triggers awk to perform its default action, which is to print the current line.
# $(NF-1) + $NF)/2 * 141 perform the calculation: `(penultimate + last) / 2 * 141
# {$(NF-1)=sprintf( ... ) assign the result of the previous calculation to the penultimate field. Using sprintf with %.0f we make sure the rounding is performed, as described above.
# {...; NF--} once the calculation is done, we have its result in the penultimate field. To remove the last column, we just say "hey, decrease the number of fields" so that the last one gets "removed".
grep -E -0 '[0-9]\.[0-9]' $1 | tail -n +2 | sed -e 's/[[:blank:]]\+/ /g' -e 's/\s\n/\n/g' | awk '{print ","$3",",$9,$10}' | awk '{$(NF-1)=sprintf("%.0f", ($(NF-1) + $NF)/2 * 169); NF--}1' >> $DELAYS
# remove blanks in part file and add additional commas (,) so that the following awk command works properly
cat $2 | sed -e "s/[[:blank:]]\+//" -e "s/(/(,/g" -e 's/)/,)/g' >> $CHIP
# this awk command is provided by karakfa@stackoverflow https://stackoverflow.com/users/1435869/karakfa Explanation: scan the first file for key/value pairs. For each line in the second data file print the line, for any matching key print value of the key in the requested format. Single quotes in awk is little tricky, setting a q variable is one way of handling it. https://stackoverflow.com/questions/32458680/search-file-a-for-a-list-of-strings-located-in-file-b-and-append-the-value-assoc
awk -vq="'" 'NR==FNR{a[$1]=$2;next} {print; for(k in a) if(match($0,k)) {print "PIN_DELAY=" q a[k] q ";"; next}}' $DELAYS $CHIP >> $OUTPUT
# remove the additional commas (,) added in earlier before ) and after ( and you are done..
cat $OUTPUT | sed -e 's/(,/(/g' -e 's/,)/)/g' >> chipsd.prt

How to quickly delete the lines in a file that contain items from a list in another file in BASH?

I have a file called words.txt containing a list of words. I also have a file called file.txt containing a sentence per line. I need to quickly delete any lines in file.txt that contain one of the lines from words.txt, but only if the match is found somewhere between { and }.
E.g. file.txt:
Once upon a time there was a cat.
{The cat} lived in the forest.
The {cat really liked to} eat mice.
E.g. words.txt:
cat
mice
Example output:
Once upon a time there was a cat.
The second and third lines are removed because "cat" is found on those two lines and the words are also between { and }.
The following script successfully does this task:
while read -r line
do
    sed -i "/{.*$line.*}/d" file.txt
done < words.txt
This script is very slow. Sometimes words.txt contains several thousand items, so the while loop takes several minutes. I attempted to use the sed -f option, which seems to allow reading a file, but I cannot find any manuals explaining how to use this.
How can I improve the speed of the script?
An awk solution:
awk 'NR==FNR{a["{[^{}]*"$0"[^{}]*}"]++;next}{for(i in a)if($0~i)next;b[j++]=$0}END{printf "">FILENAME;for(i=0;i in b;++i)print b[i]>FILENAME}' words.txt file.txt
It converts file.txt directly to have the expected output.
Once upon a time there was a cat.
Uncondensed version:
awk '
NR == FNR {
    a["{[^{}]*" $0 "[^{}]*}"]++
    next
}
{
    for (i in a)
        if ($0 ~ i)
            next
    b[j++] = $0
}
END {
    printf "" > FILENAME
    for (i = 0; i in b; ++i)
        print b[i] > FILENAME
}
' words.txt file.txt
If the files are expected to get so large that awk may not be able to handle them, we can only redirect to stdout; we may not be able to modify the file directly:
awk '
NR == FNR {
    a["{[^{}]*" $0 "[^{}]*}"]++
    next
}
{
    for (i in a)
        if ($0 ~ i)
            next
}
1
' words.txt file.txt
You can use grep to match the two files like this:
grep -vf words.txt file.txt
Note that this ignores the { and } constraint; the answers below address it.
I think that using the grep command should be way faster. For example:
grep -f words.txt -v file.txt
The -f option makes grep use the words.txt file as matching patterns.
The -v option reverses the matching, i.e. keeps the lines that do not match any of the patterns.
It doesn't solve the {} constraint, but that is easily fixed, for example by adding the brackets to the pattern file (or to a temporary file created at runtime).
I think this should work for you:
sed -e 's/.*/{.*&.*}/' words.txt | grep -vf- file.txt > out ; mv out file.txt
This basically just modifies words.txt on the fly (each word w becomes the pattern {.*w.*}) and uses the result as a pattern file for grep.
In pure native bash (4.x):
#!/bin/bash
# ^-- MUST start with a /bin/bash shebang, NOT /bin/sh
readarray -t words <words.txt              # read words into array (bash 4)
IFS='|'                                    # use | as delimiter when expanding $*
words_re="[{].*(${words[*]}).*[}]"         # form a regex matching all words
while read -r; do                          # for each line in file...
    if ! [[ $REPLY =~ $words_re ]]; then   # ...check whether it matches...
        printf '%s\n' "$REPLY"             # ...and print it if not.
    fi
done <file.txt
Native bash is somewhat slower than awk, but this still is a single-pass solution (O(n+m), whereas the sed -i approach was O(n*m)), making it vastly faster than any iterative approach. (With the sample words.txt, words_re becomes [{].*(cat|mice).*[}].)
You could do this in two steps:
Wrap each word in words.txt with {.* and .*}:
awk '{ print "{.*" $0 ".*}" }' words.txt > wrapped.txt
Use grep with inverse match:
grep -v -f wrapped.txt file.txt
This would be particularly useful if words.txt is very large, as a pure-awk approach (storing all the entries of words.txt in an array) would require a lot of memory.
If you would prefer a one-liner and would like to skip creating the intermediate file, you could do this:
awk '{ print "{.*" $0 ".*}" }' words.txt | grep -v -f - file.txt
The - is a placeholder which tells grep to read the patterns from stdin.
Update:
If the size of words.txt isn't too big, you could do the whole thing in awk:
awk 'NR==FNR{a[$0]++;next}{p=1;for(i in a){if ($0 ~ "{.*" i ".*}") { p=0; break}}}p' words.txt file.txt
expanded:
awk 'NR==FNR { a[$0]++; next }
{
    p = 1
    for (i in a) {
        if ($0 ~ "{.*" i ".*}") { p = 0; break }
    }
}p' words.txt file.txt
The first block builds an array containing each line in words.txt. The second block runs for every line in file.txt. A flag p controls whether the line is printed. If the line matches the pattern, p is set to false. When the p outside the last block evaluates to true, the default action occurs, which is to print the line.
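With the sample words.txt and file.txt from the question, this should print only:
Once upon a time there was a cat.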

How to append to lines in a file that do not contain a specific pattern using shell script

I have a flat file as follows:
11|aaa
11|bbb|NO|xxx
11|ccc
11|ddd|NO|yyy
For lines that do not contain |NO|, I would like to add the string |YES| at the end. So my file should look like:
11|aaa|YES|
11|bbb|NO|xxx
11|ccc|YES|
11|ddd|NO|yyy
I am using AIX, and the sed -i option for in-place replacement is not available. Hence, currently I'm using the following code to do this:
#Get the lines that do not contain |NO|
LINES=`grep -v "|NO|" file`
for i in $LINES
do
    sed "/$i/{s/$/|YES|/;}" file > temp
    mv temp file
done
The above works; however, as my file contains over 40000 lines, it takes about 3 hours to run. I believe it is taking so much time because it has to search for each line and write to a temp file. Is there a faster way to achieve this?
This will be quick:
sed '/NO/!s/$/|YES|/' filename
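Since -i is not available on your AIX sed, redirect to a temporary file and move it back, in the same spirit as your original loop (a sketch):
sed '/NO/!s/$/|YES|/' file > temp && mv temp file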
If temp.txt is your file, try:
awk '$0 !~ /NO/ {print $0 "|YES|"} $0 ~ /NO/ {print}' temp.txt
Simple with awk. Put the code below into a script and run it with awk -f script file > temp
/\|NO\|/ { print; next; } # just print anything which contains |NO| and read next line
{ print $0 "|YES|"; } # For any other line (no pattern), print the line + |YES|
I'm not sure about awk regexps; if it doesn't work, try to remove the two \ in the first pattern.
