How to check for column content in Bash - bash

I am stuck with this problem.
Using Bash, I have to check whether a .txt file has data in two columns and, if not, the annotations have to be emptied.
The data is a text file like this:
#pacId locusName Best-hit-arabi-name arabi-defline
23158591 Lus10000002.g AT1G75330.1 ornithine carbamoyltransferase
23170978 Lus10000003.g AT1G14540.1 Peroxidase superfamily protein
I have to Empty annotations with no "Best-hit" & "arabi-defline" columns
I am thinking of writing a while loop that reads each line, but I don't know what code would check whether the columns are empty.
Thanks for helping me out !

I have to Empty annotations with no "Best-hit" & "arabi-defline" columns
I'll assume that you mean:
I have to remove the lines that don't contain a value for Best-hit and arabi-defline
If that's the case, here is a simple solution using awk:
awk '{if ($3 && $4){print $0}}' test.txt
I think awk is a better fit than bash in this case but you can also do it using bash with something like:
while read -r pacId locusName bHAN aD; do [[ $bHAN && $aD ]] && echo "${pacId} ${locusName} ${bHAN} ${aD}"; done < test.txt
Of course, if you want a separator other than the default blanks, you can override IFS. For a tab it has to be written as $'\t', because '\t' in plain single quotes is a literal backslash followed by t:
while IFS=$'\t' read -r pacId locusName bHAN aD; do [[ $bHAN && $aD ]] && printf '%s\t%s\t%s\t%s\n' "$pacId" "$locusName" "$bHAN" "$aD"; done < test.txt
Same thing for awk, you'll just have to use -F to change the default separator.
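One caveat with the bash loop on tab-separated data: tab counts as IFS whitespace, so read collapses consecutive tabs and an empty third column can go unnoticed. A minimal sketch of the awk variant with -F, assuming the file is tab-separated (the function name keep_annotated is made up for illustration):

```shell
#!/usr/bin/env bash
# Sketch: keep only rows whose 3rd and 4th tab-separated columns are non-empty.
# keep_annotated is a hypothetical helper name; it reads stdin and writes stdout.
keep_annotated() {
  # Comparing against "" (rather than testing truthiness) also keeps a literal 0.
  awk -F'\t' '$3 != "" && $4 != ""'
}

# Demo: the second row has no Best-hit and no defline, so it is dropped.
printf '%s\t%s\t%s\t%s\n' \
  23158591 Lus10000002.g AT1G75330.1 'ornithine carbamoyltransferase' \
  23158592 Lus10000004.g '' '' \
  | keep_annotated
```

Because -F'\t' splits on every single tab, empty fields stay in place and a multi-word defline remains one field.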

Related

shell script compare file with multiple line pattern

I have a file which is created after some manual configuration.
I need to check this file automatically with a shell script.
The file looks like this:
eth0;eth0;1c:98:ec:2a:1a:4c
eth1;eth1;1c:98:ec:2a:1a:4d
eth2;eth2;1c:98:ec:2a:1a:4e
eth3;eth3;1c:98:ec:2a:1a:4f
eth4;eth4;48:df:37:58:da:44
eth5;eth5;48:df:37:58:da:45
eth6;eth6;48:df:37:58:da:46
eth7;eth7;48:df:37:58:da:47
I want to compare it to a pattern like this:
eth0;eth0;*
eth1;eth1;*
eth2;eth2;*
eth3;eth3;*
eth4;eth4;*
eth5;eth5;*
eth6;eth6;*
eth7;eth7;*
If I would only have to check this pattern I could run this loop:
c=0
while [ $c -le 7 ]
do
if [ "$(grep "eth"${c}";eth"${c}";*" current_mapping)" ];
then
echo "eth$c ok"
fi
(( c++ ))
done
There are 6 or more different patterns possible. A pattern could also look like this for example (depending and specific configuration requests):
eth4;eth0;*
eth5;eth1;*
eth6;eth2;*
eth7;eth3;*
eth0;eth4;*
eth1;eth5;*
eth2;eth6;*
eth3;eth7;*
So I don't think I can run a standard grep per line command in a loop. The eth numbers are not consistently the same.
Is it possible somehow to compare the whole file to pattern like it would be possible with grep for a single line?
Assuming file is your data file and patt is the file that contains the pattern above, you can use grep -f together with sed in a process substitution that replaces * with .* and ? with . to turn the pattern into a workable regex.
grep -f <(sed 's/\*/.*/g; s/?/./g' patt) file
eth0;eth0;1c:98:ec:2a:1a:4c
eth1;eth1;1c:98:ec:2a:1a:4d
eth2;eth2;1c:98:ec:2a:1a:4e
eth3;eth3;1c:98:ec:2a:1a:4f
eth4;eth4;48:df:37:58:da:44
eth5;eth5;48:df:37:58:da:45
eth6;eth6;48:df:37:58:da:46
eth7;eth7;48:df:37:58:da:47
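Note that grep -f prints every line that matches any of the patterns, so on its own it does not confirm that all lines matched or that they appear in the pattern's order. A minimal sketch of an all-or-nothing check, assuming line N of the data must match glob pattern N (the function name file_matches_patterns is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: succeed only if line N of the data file matches glob pattern N.
# file_matches_patterns is a hypothetical name; arguments: data file, pattern file.
file_matches_patterns() {
  local data=$1 patterns=$2 line pat
  # Files with different line counts can never match one-to-one.
  [ "$(wc -l < "$data")" -eq "$(wc -l < "$patterns")" ] || return 1
  # Read both files in lockstep on separate file descriptors.
  while IFS= read -r line <&3 && IFS= read -r pat <&4; do
    # Unquoted right-hand side: $pat is used as a glob, so * matches any text.
    [[ $line == $pat ]] || return 1
  done 3< "$data" 4< "$patterns"
  return 0
}
```

With one pattern file per allowed configuration, something like `file_matches_patterns current_mapping patt && echo ok` could be tried against each of them in turn.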
I have now written this loop and it does the job (current_mapping being the file with the content in the first code block of the question). I would have to create arrays with the different patterns and use a case for every pattern. I was just wondering if there is something like grep for multiple lines that could do the same without writing this loop.
array=("eth0;eth0;*" "eth1;eth1;*" "eth2;eth2;*" "eth3;eth3;*" "eth4;eth4;*" "eth5;eth5;*" "eth6;eth6;*" "eth7;eth7;*")
c=1
while [ $c -le 8 ]
do
if [ ! "$(sed -n "${c}"p current_mapping | grep "${array[$c-1]}")" ];
then
echo "somethings wrong"
fi
(( c++ ))
done
Try any:
grep -P '(eth[0-9]);\1'
grep -E '(eth[0-9]);\1'
sed -n '/\(eth[0-9]\);\1/p'
awk -F';' '$1 == $2'
These are just the commands; apply them to a pipe or a file.
Updated the answer after the question was edited.
As we can see the task requirements are as follows:
a file (a set of lines) formatted like ethN;ethM;MAC
examine each line for equality ethN and ethM
if they are equal, output a string ethN ok
If I understand the task correctly we can achieve this using the following code without loops:
awk -F';' '$1 == $2 { print $1, "ok" }'

Changing words in text files using multiple dictionaries

I have a bunch of files which need to be translated using custom dictionaries. Each file contains a line indicating which dictionary to use. Here's an example:
*A:
!
=1
*>A_intro
1r
=2
1r
=3
1r
=4
1r
=5
2A:maj
*-
In the file above, *A: indicates to use dictA.
I can translate this part easily using the following syntax:
sed -f dictA < myfile
My problem is that some files require a change of dictionary half way in the text. For example:
*B:
1B:maj
2E:maj/5
2B:maj
2E:maj/5
*C:
2F:maj/5
2C:maj
2F:maj/5
2C:maj
*-
I would like to write a script to automate the translation process. Using this example, I would like the script to read the first line, select dictB, use dictB to translate each line until it reads *C:, select dictC, and then keep going.
Thanks @Cyrus. That was useful. Here's what I ended up doing.
#!/bin/bash
# [[ ... =~ ... ]] is a bash feature, so the script needs bash rather than sh.
key="sedDictNull.txt"
while read -r line || [ -n "$line" ] ## Makes sure that the last line is read. See http://stackoverflow.com/questions/12916352/shell-script-read-missing-last-line
do
  if [[ $line =~ ^\*[Aa]:$ ]]
  then
    key="sedDictA.txt"
  elif [[ $line =~ ^\*[Aa]#:$ ]]
  then
    key="sedDictA#.txt"
  fi
  # Translate the current line with whichever dictionary is active.
  printf '%s\n' "$line" | sed -f "$key"
done < "$1"
I assume your "dictionaries" are really sed scripts that search and replace, like this:
s/2C/nothing/;
s/2B/something/;
You could reorganize these scripts into sections, like this:
/^\*B:/, /^\*[^B]/ {
s/1B/whatever/;
s/2B/something/;
}
/^\*C:/, /^\*[^C]/ {
s/2C/nothing/;
s/2B/something/;
}
And, of course, you could do that on the fly:
for dict in B C
do echo "/^\\*$dict:/, /^\\*[^$dict]/ {"
cat dict.$dict
echo "}"
done | sed -f- dict.in
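If your sed does not read a script from standard input with -f-, the same idea can be sketched with a temporary script file. This assumes the dictionaries are sed scripts named dict.B, dict.C and so on in the current directory, as in the loop above; the function name translate is made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: wrap each dictionary in an address range and run a single sed pass.
# translate is a hypothetical name; usage: translate INPUT KEY [KEY...]
# Assumes dictionary files called dict.<KEY> exist in the current directory.
translate() {
  local input=$1 dict script
  shift
  script=$(mktemp) || return 1
  for dict in "$@"; do
    # Range: from the "*KEY:" marker up to the next "*" line for another key.
    printf '/^\\*%s:/,/^\\*[^%s]/ {\n' "$dict" "$dict"
    cat "dict.$dict"
    printf '}\n'
  done > "$script"
  sed -f "$script" "$input"
  rm -f "$script"
}
```

For the example file this would be called as `translate dict.in B C`. The line that switches dictionaries closes one range and opens the next, so each block is translated only by its own dictionary.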

appending text to specific line in file bash

So I have a file that contains some lines of text separated by ','. I want to create a script that counts how many parts a line has, and if the line contains 16 parts I want to add a new one. So far it's working great. The only thing that is not working is appending the ',' at the end. See my example below:
Original file:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
Expected result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
This is my code:
while read p; do
if [[ $p == "HEA"* ]]
then
IFS=',' read -ra ADDR <<< "$p"
echo ${#ADDR[@]}
arrayCount=${#ADDR[@]}
if [ "${arrayCount}" -eq 16 ];
then
sed -i "/$p/ s/\$/,xx/g" $f
fi
fi
done <$f
Result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
What am I doing wrong? I'm sure it's something small but I can't find it.
It can be done using awk:
awk -F, 'NF==16{$0 = $0 FS "xx"} 1' file
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
-F, sets input field separator as comma
NF==16 is the condition that says execute block inside { and } if # of fields is 16
$0 = $0 FS "xx" appends xx at end of line
1 is an always-true pattern with no action, so awk applies its default action and prints the (possibly modified) line
To do it with sed instead:
Use the ${line_number}s/..../..../ form to target a specific line; you need to find out the line number first.
Use the special character & to stand for the matched string.
The sed statement should look like the following (with the comma before xx and the file to edit):
sed -i "${line_number}s/.*/&,xx/" "$f"
I would prefer to leave it to you to play around with it, but if you like I can give you a full working sample.
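For reference, the line-number approach can be sketched as follows. This assumes GNU sed (for -i without a backup suffix) and uses awk only to find the line numbers; the function name append_xx is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: append ",xx" to every line that has exactly 16 comma-separated fields.
# append_xx is a hypothetical name; sed -i without a suffix assumes GNU sed.
append_xx() {
  local file=$1 n
  # awk reports the matching line numbers first; sed then edits each one in place.
  # Appending text never adds or removes lines, so the numbers stay valid.
  for n in $(awk -F, 'NF == 16 { print NR }' "$file"); do
    sed -i "${n}s/.*/&,xx/" "$file"
  done
}
```

Addressing by line number avoids using the line's own text as a regex, which is what `/$p/` does in the question's loop.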

how to validate if data has a trailing "/"

I have a file containing various information. The fields are delimited by |. One of the fields contains a directory. For example :
blah|blah|blah|/usr/local/etc/|blah|blah
I need to validate that the path field does not end with a "/". I'm using ksh. Any suggestions?
thanks.
Assuming the directory is always in the 4th field
line=0
while IFS='|' read -rA fields; do
let line++
[[ ${fields[3]} == */ ]] && echo line $line: ends with a slash
done < filename
Not ksh, but this is a natural job for awk:
awk -F\| '$4 ~ /\/$/ {
print "Trailing slash in line "NR":", $4
}' ${file:?}
Try this:
re='(/[^/|]+)+(\||$)'
if [[ $line =~ $re ]]
My shell syntax is rusty, so this might need a little massaging into shape
Don't forget special paths like / (root).
The code below leaves / (root) as it is:
echo "blah|blah|blah|/usr/local/etc/|blah|blah|
blah|blah|blah|/|blah|blah
blah|blah|blah|.|blah|blah
blah|blah|blah|/usr/local/etc|blah|blah" |
sed '
/\/|/ {
/|\/|/ !s/\/|/|/
}'
Explanation:
/\/|/ selects the lines in which "/|" appears
/|\/|/ ! restricts that to lines in which "|/|" does not appear (so a bare root path is left alone)
s/\/|/|/ replaces "/|" with "|" (only when both tests succeed)
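The same clean-up can also be sketched in awk, which makes it easy to touch only the path field. This assumes, as in the first answer, that the directory is always the 4th field; the function name strip_trailing_slash is made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: strip one trailing "/" from the 4th |-separated field, but leave a
# bare "/" (the root directory) untouched.
# strip_trailing_slash is a hypothetical name; it reads stdin and writes stdout.
strip_trailing_slash() {
  awk 'BEGIN { FS = OFS = "|" } $4 != "/" { sub(/\/$/, "", $4) } 1'
}

# Demo on the sample line from the question.
printf '%s\n' 'blah|blah|blah|/usr/local/etc/|blah|blah' | strip_trailing_slash
```

Setting OFS to the same character as FS keeps the delimiters intact when awk rebuilds the line after the field is changed.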

Processing a tab delimited file with shell script processing

Normally I would use Python/Perl for this procedure, but I find myself (for political reasons) having to pull this off in a bash shell.
I have a large tab delimited file that contains six columns and the second column is integers. I need to shell script a solution that would verify that the file indeed is six columns and that the second column is indeed integers. I am assuming that I would need to use sed/awk here somewhere. Problem is that I'm not that familiar with sed/awk. Any advice would be appreciated.
Many thanks!
Lilly
gawk:
BEGIN {
FS="\t"
}
(NF != 6) || ($2 != int($2)) {
exit 1
}
Invoke as follows:
if awk -f colcheck.awk somefile
then
  # a branch needs at least one command; a comment alone is a syntax error
  echo "is valid"
else
  echo "is not valid"
fi
Well you can directly tell awk what the field delimiter is (the -F option). Inside your awk script you can tell how many fields are present in each record with the NF variable.
Oh, and you can check the second field with a regex. The whole thing might look something like this:
awk < thefile -F\\t '
{ if (NF != 6 || $2 ~ /[^0123456789]/) print "Format error, line " NR; }
'
That's probably close but I need to check the regex because Linux regex syntax variation is so insane. (edited because grrrr)
here's how to do it with awk
awk 'NF!=6||$2+0!=$2{print "error"}' file
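The numeric comparison above also accepts values such as 1.5, and without -F the line is split on any whitespace. A stricter sketch, assuming tab-separated input and that "integer" means optional minus sign plus digits (the function name check_columns is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every line has six tab-separated fields and an
# integer in column 2. check_columns is a hypothetical name; argument: the file.
check_columns() {
  awk -F'\t' '
    NF != 6 || $2 !~ /^-?[0-9]+$/ {
      # Report each offending line on stderr and remember that we saw one.
      printf "line %d: bad format\n", NR > "/dev/stderr"
      bad = 1
    }
    END { exit bad }   # exit status 0 when no bad line was seen, 1 otherwise
  ' "$1"
}
```

It can be used like the gawk version: `if check_columns somefile; then echo valid; fi`.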
Pure Bash:
infile='column6.dat'
lno=0
while read -a line ; do
((lno++))
if [ ${#line[@]} -ne 6 ] ; then
echo -e "line $lno has ${#line[@]} elements"
fi
if ! [[ ${line[1]} =~ ^[0-9]+$ ]] ; then
echo -e "line $lno column 2 : not an integer"
fi
done < "$infile"
Possible output:
line 19 has 5 elements
line 36 column 2 : not an integer
line 38 column 2 : not an integer
line 51 has 3 elements

Resources