How to check for column content in Bash

I am stuck with this problem.
Using Bash, we have to check whether the .txt file has data in two columns, and if not, the annotations have to be emptied.
The data is a txt file as follows:
#pacId locusName Best-hit-arabi-name arabi-defline
23158591 Lus10000002.g AT1G75330.1 ornithine carbamoyltransferase
23170978 Lus10000003.g AT1G14540.1 Peroxidase superfamily protein
I have to empty annotations with no "Best-hit" & "arabi-defline" columns.
I am thinking of writing a while loop that reads each line, but I don't know what the code would be to check whether the columns are empty.
Thanks for helping me out!

I have to Empty annotations with no "Best-hit" & "arabi-defline" columns
I'll assume that you mean:
I have to remove the lines that don't contain a value for Best-hit and arabi-defline
If that's the case, here is a simple solution using awk:
awk '{if ($3 && $4){print $0}}' test.txt
I think awk is a better fit than bash in this case, but you can also do it in bash with something like:
while read -r pacId locusName bHAN aD; do [[ $bHAN && $aD ]] && echo "${pacId} ${locusName} ${bHAN} ${aD}"; done < test.txt
Of course, if you want to use a separator other than blanks, you can override IFS like this:
while IFS=$'\t' read -r pacId locusName bHAN aD; do [[ $bHAN && $aD ]] && printf '%s\t%s\t%s\t%s\n' "$pacId" "$locusName" "$bHAN" "$aD"; done < test.txt
The same goes for awk: use -F to change the default separator.
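One caveat with the read loop: tab counts as IFS whitespace, so read collapses consecutive tabs and an empty third column is not seen as empty. A minimal sketch of the awk route for tab-separated input follows. It assumes single tabs between columns; keep_annotated is a hypothetical helper name, and the last row of the example is made up to show a line being dropped.

```shell
#!/bin/bash
# Sketch: assumes the columns are separated by single tabs.
# keep_annotated is a hypothetical helper name.
keep_annotated() {
    # Keep comment/header lines, and rows whose 3rd and 4th fields are non-empty.
    awk -F'\t' '/^#/ || ($3 != "" && $4 != "")' "$@"
}

# Example: the last row (made up) has no annotation and is dropped.
printf '%s\t%s\t%s\t%s\n' \
    '#pacId' 'locusName' 'Best-hit-arabi-name' 'arabi-defline' \
    '23158591' 'Lus10000002.g' 'AT1G75330.1' 'ornithine carbamoyltransferase' \
    '23170979' 'Lus10000004.g' '' '' |
    keep_annotated
```

Comparing against the empty string rather than testing $3 && $4 also keeps rows whose value happens to be 0.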

Related

shell script compare file with multiple line pattern

I have a file which is created after some manual configuration.
I need to check this file automatically with a shell script.
The file looks like this:
eth0;eth0;1c:98:ec:2a:1a:4c
eth1;eth1;1c:98:ec:2a:1a:4d
eth2;eth2;1c:98:ec:2a:1a:4e
eth3;eth3;1c:98:ec:2a:1a:4f
eth4;eth4;48:df:37:58:da:44
eth5;eth5;48:df:37:58:da:45
eth6;eth6;48:df:37:58:da:46
eth7;eth7;48:df:37:58:da:47
I want to compare it to a pattern like this:
eth0;eth0;*
eth1;eth1;*
eth2;eth2;*
eth3;eth3;*
eth4;eth4;*
eth5;eth5;*
eth6;eth6;*
eth7;eth7;*
If I only had to check this pattern I could run this loop:
c=0
while [ $c -le 7 ]
do
if [ "$(grep "eth"${c}";eth"${c}";*" current_mapping)" ];
then
echo "eth$c ok"
fi
(( c++ ))
done
There are 6 or more different patterns possible. A pattern could also look like this, for example (depending on specific configuration requests):
eth4;eth0;*
eth5;eth1;*
eth6;eth2;*
eth7;eth3;*
eth0;eth4;*
eth1;eth5;*
eth2;eth6;*
eth3;eth7;*
So I don't think I can run a standard grep per line command in a loop. The eth numbers are not consistently the same.
Is it possible somehow to compare the whole file to a pattern, like it would be possible with grep for a single line?

Assuming file is your data file and patt is the file that contains the pattern above, you can use grep -f together with sed in a process substitution that replaces * with .* and ? with . to turn the globs into a workable regex.
grep -f <(sed 's/\*/.*/g; s/?/./g' patt) file
eth0;eth0;1c:98:ec:2a:1a:4c
eth1;eth1;1c:98:ec:2a:1a:4d
eth2;eth2;1c:98:ec:2a:1a:4e
eth3;eth3;1c:98:ec:2a:1a:4f
eth4;eth4;48:df:37:58:da:44
eth5;eth5;48:df:37:58:da:45
eth6;eth6;48:df:37:58:da:46
eth7;eth7;48:df:37:58:da:47
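Note that grep -f prints every line that matches any of the patterns, so it does not verify that line N of the data matches line N of the pattern file. If a positional check is what is needed, one possible sketch in bash (4 or later, for mapfile) is shown here; file_matches_globs is a hypothetical name, and the pattern file is assumed to contain shell globs as in the question.

```shell
#!/bin/bash
# Sketch (bash 4+ for mapfile); file_matches_globs is a hypothetical name.
# Succeeds only if both files have the same number of lines and every
# data line matches the glob on the same line of the pattern file.
file_matches_globs() {
    local data_file=$1 pattern_file=$2 i
    local -a lines globs
    mapfile -t lines < "$data_file"
    mapfile -t globs < "$pattern_file"
    (( ${#lines[@]} == ${#globs[@]} )) || return 1
    for i in "${!globs[@]}"; do
        # Unquoted right-hand side: treated as a glob, not a literal string.
        [[ ${lines[i]} == ${globs[i]} ]] || return 1
    done
    return 0
}

# Usage, with the file names assumed in the answer above:
# file_matches_globs file patt && echo "mapping ok"
```

With one pattern file per allowed configuration, the function can be called once per candidate until one succeeds.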

I wrote this loop now and it does the job (current_mapping being the file with the content in the first code block of the question). I would have to create arrays with different patterns and use a case for every pattern. I was just wondering if there is something like grep for multiple lines that could do the same without writing this loop.
array=("eth0;eth0;*" "eth1;eth1;*" "eth2;eth2;*" "eth3;eth3;*" "eth4;eth4;*" "eth5;eth5;*" "eth6;eth6;*" "eth7;eth7;*")
c=1
while [ $c -le 8 ]
do
if [ ! "$(sed -n "${c}"p current_mapping | grep "${array[$c-1]}")" ];
then
echo "somethings wrong"
fi
(( c++ ))
done

Try any:
grep -P '(eth[0-9]);\1'
grep -E '(eth[0-9]);\1'
sed -n '/\(eth[0-9]\);\1/p'
awk -F';' '$1 == $2'
These are only the commands; apply them to a pipe or a file.
Updated the answer after the question was edited.
As we can see the task requirements are as follows:
a file (a set of lines) formatted like ethN;ethM;MAC
check each line for equality of ethN and ethM
if they are equal, output a string ethN ok
If I understand the task correctly we can achieve this using the following code without loops:
awk -F';' '$1 == $2 { print $1, "ok" }'

Changing words in text files using multiple dictionaries

I have a bunch of files which need to be translated using custom dictionaries. Each file contains a line indicating which dictionary to use. Here's an example:
*A:
!
=1
*>A_intro
1r
=2
1r
=3
1r
=4
1r
=5
2A:maj
*-
In the file above, *A: indicates to use dictA.
I can translate this part easily using the following syntax:
sed -f dictA < myfile
My problem is that some files require a change of dictionary half way in the text. For example:
*B:
1B:maj
2E:maj/5
2B:maj
2E:maj/5
*C:
2F:maj/5
2C:maj
2F:maj/5
2C:maj
*-
I would like to write a script to automate the translation process. Using this example, I would like the script to read the first line, select dictB, use dictB to translate each line until it reads *C:, select dictC, and then keep going.

Thanks @Cyrus. That was useful. Here's what I ended up doing.
#!/bin/bash
key="sedDictNull.txt"
while read -r line || [ -n "$line" ] ## Makes sure that the last line is read. See http://stackoverflow.com/questions/12916352/shell-script-read-missing-last-line
do
if [[ $line =~ ^\*[Aa]:$ ]]
then
key="sedDictA.txt"
elif [[ $line =~ ^\*[Aa]#:$ ]]
then
key="sedDictA#.txt"
fi
printf '%s\n' "$line" | sed -f "$key"
done < "$1"

I assume your "dictionaries" are really sed scripts that search and replace, like this:
s/2C/nothing/;
s/2B/something/;
You could reorganize these scripts into sections, like this:
/^\*B:/, /^\*[^B]/ {
s/1B/whatever/;
s/2B/something/;
}
/^\*C:/, /^\*[^C]/ {
s/2C/nothing/;
s/2B/something/;
}
And, of course, you could do that on the fly:
for dict in B C
do echo "/^\\*$dict:/, /^\\*[^$dict]/ {"
cat dict.$dict
echo "}"
done | sed -f- dict.in
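To make the generated script concrete, here is a small self-contained sketch of the same idea. The dictionary contents, the file names under the temporary directory, and the build_script helper are all made up for illustration; the combined script is written to a file rather than piped into sed.

```shell
#!/bin/bash
# Sketch with hypothetical dictionaries: each dict.X file holds plain
# sed substitutions, as assumed in the answer above.
workdir=$(mktemp -d)
printf 's/1B/one-B/\ns/2E/two-E/\n' > "$workdir/dict.B"
printf 's/2F/two-F/\ns/2C/two-C/\n' > "$workdir/dict.C"
printf '%s\n' '*B:' '1B:maj' '2E:maj/5' '*C:' '2F:maj/5' '2C:maj' '*-' > "$workdir/dict.in"

# Wrap every dictionary in a range that starts at its own "*X:" marker
# and ends at the next marker line that names something else.
build_script() {
    local dir=$1 dict
    shift
    for dict in "$@"; do
        printf '/^\\*%s:/,/^\\*[^%s]/{\n' "$dict" "$dict"
        cat "$dir/dict.$dict"
        printf '}\n'
    done
}

build_script "$workdir" B C > "$workdir/all.sed"
translated=$(sed -f "$workdir/all.sed" "$workdir/dict.in")
printf '%s\n' "$translated"
rm -rf "$workdir"
```

Each line is passed through sed once in a single process, instead of starting one sed per input line.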

appending text to specific line in file bash

So I have a file that contains some lines of text separated by ','. I want to create a script that counts how many parts a line has, and if the line contains 16 parts I want to add a new one. So far it's working great. The only thing that is not working is appending the ',' at the end. See my example below:
Original file:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
Expected result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
This is my code:
while read p; do
if [[ $p == "HEA"* ]]
then
IFS=',' read -ra ADDR <<< "$p"
echo ${#ADDR[@]}
arrayCount=${#ADDR[@]}
if [ "${arrayCount}" -eq 16 ];
then
sed -i "/$p/ s/\$/,xx/g" $f
fi
fi
done <$f
Result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
What am I doing wrong? I'm sure it's something small, but I can't find it.

It can be done using awk:
awk -F, 'NF==16{$0 = $0 FS "xx"} 1' file
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
-F, sets input field separator as comma
NF==16 is the condition that says execute block inside { and } if # of fields is 16
$0 = $0 FS "xx" appends xx at end of line
1 is an always-true pattern with no action, so awk applies its default action and prints the (possibly modified) line

To do this with sed:
Use the ${line_number}s/..../..../ form to target a specific line; you need to find out the line number first.
Use the special character & to refer to the matched string.
The sed statement should look like the following:
sed -i "${line_number}s/.*/&,xx/" "$f"
I would prefer to leave it to you to play around with it, but if you prefer I can give you a full working sample.
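One way the line-number approach could be filled in is sketched here. It is not the answerer's sample: append_to_16_field_lines is a hypothetical name, and the file is rewritten through a temporary copy rather than with sed -i.

```shell
#!/bin/bash
# Sketch: append_to_16_field_lines is a hypothetical helper name.
append_to_16_field_lines() {
    local file=$1 line line_number=0
    local -a fields targets=()
    # First pass: remember the numbers of the lines with exactly 16 fields.
    while IFS= read -r line || [ -n "$line" ]; do
        line_number=$((line_number + 1))
        IFS=',' read -ra fields <<< "$line"
        if [ "${#fields[@]}" -eq 16 ]; then
            targets+=("$line_number")
        fi
    done < "$file"
    # Second pass: address each remembered line by number and append ,xx.
    for line_number in "${targets[@]}"; do
        sed "${line_number}s/\$/,xx/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
}

# Usage: append_to_16_field_lines "$f"
```

Addressing by line number avoids using the line's own text as a regex, and the file is only modified after the read loop has finished.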

how to validate if data has a trailing "/"

I have a file containing various information. The fields are delimited by |. One of the fields contains a directory. For example :
blah|blah|blah|/usr/local/etc/|blah|blah
I need to validate that the path field does not end with a "/". I'm using ksh. Any suggestions?
thanks.

Assuming the directory is always in the 4th field
line=0
while IFS='|' read -rA fields; do
let line++
[[ ${fields[3]} == */ ]] && echo line $line: ends with a slash
done < filename

Not ksh, but this is a natural job for awk:
awk -F\| '$4 ~ /\/$/ {
print "Trailing slash in line "NR":", $4
}' ${file:?}

Try this:
re='(/[[:alnum:]_]+)+(\||$)'
if [[ $line =~ $re ]]
My shell syntax is rusty, so this might need a little massaging into shape

Don't forget special paths like / (root).
The code below leaves / (root) as it is:
echo "blah|blah|blah|/usr/local/etc/|blah|blah|
blah|blah|blah|/|blah|blah
blah|blah|blah|.|blah|blah
blah|blah|blah|/usr/local/etc|blah|blah" |
sed '
/\/|/ {
/|\/|/!s/\/|/|/
}'
Explanation:
/\/|/ selects the lines where "/|" appears
/|\/|/! restricts that to lines where "|/|" does not appear (a bare root path)
s/\/|/|/ replaces "/|" with "|" (only when both tests succeed)
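If the goal is only to report offending lines rather than rewrite them, the root exception can also be folded into the awk check. A minimal sketch, assuming the directory is always the 4th field; report_trailing_slashes is a hypothetical name, and the function syntax used is plain POSIX, so it should also load in ksh.

```shell
#!/bin/bash
# Sketch: report_trailing_slashes is a hypothetical helper name.
# Assumes the directory is always the 4th |-delimited field.
report_trailing_slashes() {
    # Flag paths ending in "/" but accept the root directory "/" itself.
    awk -F'|' '$4 != "/" && $4 ~ /\/$/ { print "line " NR ": " $4 }' "$@"
}

# Usage: report_trailing_slashes filename
```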

Processing a tab delimited file with shell script processing

Normally I would use Python/Perl for this procedure, but I find myself (for political reasons) having to pull this off using a bash shell.
I have a large tab delimited file that contains six columns and the second column is integers. I need to shell script a solution that would verify that the file indeed is six columns and that the second column is indeed integers. I am assuming that I would need to use sed/awk here somewhere. Problem is that I'm not that familiar with sed/awk. Any advice would be appreciated.
Many thanks!
Lilly

gawk:
BEGIN {
FS="\t"
}
(NF != 6) || ($2 != int($2)) {
exit 1
}
Invoke as follows:
if awk -f colcheck.awk somefile
then
# is valid
else
# is not valid
fi

Well you can directly tell awk what the field delimiter is (the -F option). Inside your awk script you can tell how many fields are present in each record with the NF variable.
Oh, and you can check the second field with a regex. The whole thing might look something like this:
awk < thefile -F\\t '
{ if (NF != 6 || $2 ~ /[^0123456789]/) print "Format error, line " NR; }
'
That's probably close but I need to check the regex because Linux regex syntax variation is so insane. (edited because grrrr)

Here's how to do it with awk:
awk 'NF!=6||$2+0!=$2{print "error"}' file

Pure Bash:
infile='column6.dat'
lno=0
while IFS=$'\t' read -r -a line ; do
((lno++))
if [ ${#line[@]} -ne 6 ] ; then
echo -e "line $lno has ${#line[@]} elements"
fi
if ! [[ ${line[1]} =~ ^[0-9]+$ ]] ; then
echo -e "line $lno column 2 : not an integer"
fi
done < "$infile"
Possible output:
line 19 has 5 elements
line 36 column 2 : not an integer
line 38 column 2 : not an integer
line 51 has 3 elements
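For comparison, the same per-line report can be produced by a single awk call that also sets an exit status. This is only a sketch: check_tsv is a hypothetical name, and the choice to accept a leading minus sign is an assumption about what counts as an integer here.

```shell
#!/bin/bash
# Sketch: check_tsv is a hypothetical helper name.
# Returns 0 if every line has 6 tab-separated fields and an integer
# (optionally negative) in column 2; otherwise reports each problem
# and returns 1.
check_tsv() {
    awk -F'\t' '
        NF != 6            { printf "line %d has %d fields\n", NR, NF; bad = 1; next }
        $2 !~ /^-?[0-9]+$/ { printf "line %d column 2: not an integer\n", NR; bad = 1 }
        END                { exit bad + 0 }
    ' "$@"
}

# Usage: check_tsv column6.dat && echo "file is valid"
```

Because the separator is set to a literal tab, empty columns still count as fields, which the read-based loop cannot guarantee.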
