How to check for column content in Bash - bash
I am stuck with this problem.
Using Bash, we have to check whether the .txt file has data in two of its columns, and if not, the annotations have to be emptied.
Data is a txt file as follows :
#pacId locusName Best-hit-arabi-name arabi-defline
23158591 Lus10000002.g AT1G75330.1 ornithine carbamoyltransferase
23170978 Lus10000003.g AT1G14540.1 Peroxidase superfamily protein
I have to empty the annotations with no "Best-hit" and "arabi-defline" columns.
I am thinking of writing a while loop that reads each line, but I don't know what code would check whether the columns are empty.
Thanks for helping me out !
I have to Empty annotations with no "Best-hit" & "arabi-defline" columns
I'll assume that you mean:
I have to remove the lines that don't contain values for the Best-hit and arabi-defline columns
So if that's the case, here is a simple solution using awk:
awk '{if ($3 && $4){print $0}}' test.txt
I think awk is a better fit than bash in this case, but you can also do it in bash with something like:
while read -r pacId locusName bHAN aD; do [[ $bHAN && $aD ]] && echo "${pacId} ${locusName} ${bHAN} ${aD}"; done < test.txt
Of course, if you want to change the default separator to something other than any whitespace, you can just override IFS like this:
while IFS=$'\t' read -r pacId locusName bHAN aD; do [[ $bHAN && $aD ]] && echo -e "${pacId}\t${locusName}\t${bHAN}\t${aD}"; done < test.txt
Same thing for awk, you'll just have to use -F to change the default separator.
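For completeness, a minimal sketch of that awk equivalent, assuming test.txt is tab-delimited (the file name and layout are taken from the question):

```shell
# Keep only lines whose 3rd and 4th tab-separated fields are non-empty.
# -F '\t' sets the input field separator, mirroring IFS=$'\t' in the bash loop.
awk -F '\t' '$3 != "" && $4 != ""' test.txt
```

Because the condition alone is true for the lines we want, awk's default action (print the line) does the rest.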
Related
shell script compare file with multiple line pattern
I have a file which is created after some manual configuration. I need to check this file automatically with a shell script. The file looks like this:

eth0;eth0;1c:98:ec:2a:1a:4c
eth1;eth1;1c:98:ec:2a:1a:4d
eth2;eth2;1c:98:ec:2a:1a:4e
eth3;eth3;1c:98:ec:2a:1a:4f
eth4;eth4;48:df:37:58:da:44
eth5;eth5;48:df:37:58:da:45
eth6;eth6;48:df:37:58:da:46
eth7;eth7;48:df:37:58:da:47

I want to compare it to a pattern like this:

eth0;eth0;*
eth1;eth1;*
eth2;eth2;*
eth3;eth3;*
eth4;eth4;*
eth5;eth5;*
eth6;eth6;*
eth7;eth7;*

If I only had to check this pattern I could run this loop:

c=0
while [ $c -le 7 ]
do
  if [ "$(grep "eth"${c}";eth"${c}";*" current_mapping)" ]; then
    echo "eth$c ok"
  fi
  (( c++ ))
done

There are 6 or more different patterns possible. A pattern could also look like this, for example (depending on specific configuration requests):

eth4;eth0;*
eth5;eth1;*
eth6;eth2;*
eth7;eth3;*
eth0;eth4;*
eth1;eth5;*
eth2;eth6;*
eth3;eth7;*

So I don't think I can run a standard grep-per-line command in a loop. The eth numbers are not consistently the same. Is it possible somehow to compare the whole file to a pattern, like it would be possible with grep for a single line?
Assuming file is your data file and patt is the file that contains the above pattern, you can use grep -f in conjunction with sed in a process substitution that replaces * with .* and ? with . to make it a workable regex:

grep -f <(sed 's/\*/.*/g; s/?/./g' patt) file
eth0;eth0;1c:98:ec:2a:1a:4c
eth1;eth1;1c:98:ec:2a:1a:4d
eth2;eth2;1c:98:ec:2a:1a:4e
eth3;eth3;1c:98:ec:2a:1a:4f
eth4;eth4;48:df:37:58:da:44
eth5;eth5;48:df:37:58:da:45
eth6;eth6;48:df:37:58:da:46
eth7;eth7;48:df:37:58:da:47
I wrote this loop now and it does the job (current_mapping being the file with the content in the first code block of the question). I would have to create arrays with different patterns and use a case for every pattern. I was just wondering if there is something like grep for multiple lines that could do the same without writing this loop.

array=("eth0;eth0;*" "eth1;eth1;*" "eth2;eth2;*" "eth3;eth3;*" "eth4;eth4;*" "eth5;eth5;*" "eth6;eth6;*" "eth7;eth7;*")
c=1
while [ $c -le 8 ]
do
  if [ ! "$(sed -n "${c}"p current_mapping | grep "${array[$c-1]}")" ]; then
    echo "somethings wrong"
  fi
  (( c++ ))
done
Try any of these:

grep -P '(eth[0-9]);\1'
grep -E '(eth[0-9]);\1'
sed -n '/\(eth[0-9]\);\1/p'
awk -F';' '$1 == $2'

These are commands only; apply them to a pipe or a file. Updated the answer after the question was edited. As we can see, the task requirements are as follows:

a file (a set of lines) formatted like ethN;ethM;MAC
examine each line for equality of ethN and ethM
if they are equal, output a string "ethN ok"

If I understand the task correctly, we can achieve this with the following code, without loops:

awk -F';' '$1 == $2 { print $1, "ok" }'
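If the goal is a single pass/fail verdict for the whole file rather than per-line output, the same field-comparison test can drive an exit status instead. A minimal sketch, assuming the file is named current_mapping as in the question:

```shell
# Exit non-zero as soon as any line's first two ';'-delimited fields differ.
if awk -F';' '$1 != $2 { exit 1 }' current_mapping; then
    echo "mapping ok"
else
    echo "somethings wrong"
fi
```

This makes the check easy to drop into a larger script: the if branches on awk's exit status rather than on captured output.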
Changing words in text files using multiple dictionaries
I have a bunch of files which need to be translated using custom dictionaries. Each file contains a line indicating which dictionary to use. Here's an example:

*A:
! =1
*>A_intro
1r
=2
1r
=3
1r
=4
1r
=5
2A:maj
*-

In the file above, *A: indicates to use dictA. I can translate this part easily using the following syntax:

sed -f dictA < myfile

My problem is that some files require a change of dictionary halfway through the text. For example:

*B:
1B:maj
2E:maj/5
2B:maj
2E:maj/5
*C:
2F:maj/5
2C:maj
2F:maj/5
2C:maj
*-

I would like to write a script to automate the translation process. Using this example, I would like the script to read the first line, select dictB, use dictB to translate each line until it reads *C:, select dictC, and then keep going.
Thanks @Cyrus. That was useful. Here's what I ended up doing.

#!/bin/bash
key="sedDictNull.txt"
while read -r line || [ -n "$line" ]  ## Makes sure that the last line is read. See http://stackoverflow.com/questions/12916352/shell-script-read-missing-last-line
do
  if [[ $line =~ ^\*[Aa]:$ ]]
  then
    key="sedDictA.txt"
  elif [[ $line =~ ^\*[Aa]#:$ ]]
  then
    key="sedDictA#.txt"
  fi
  echo "$line" | sed -f "$key"
done < "$1"
I assume your "dictionaries" are really sed scripts that search and replace, like this:

s/2C/nothing/; s/2B/something/;

You could reorganize these scripts into sections, like this:

/^\*B:/, /^\*[^B]/ {
  s/1B/whatever/;
  s/2B/something/;
}
/^\*C:/, /^\*[^C]/ {
  s/2C/nothing/;
  s/2B/something/;
}

And, of course, you could do that on the fly:

for dict in B C
do
  echo "/^\\*$dict:/, /^\\*[^$dict]/ {"
  cat dict.$dict
  echo "}"
done | sed -f- dict.in
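To make the range-address idea concrete, here is a minimal end-to-end sketch; the input file, section markers, and replacement rules below are invented for illustration, not taken from the real dictionaries:

```shell
# A tiny input where *B: and *C: mark dictionary switches, as in the question.
printf '*B:\n2B:maj\n*C:\n2C:maj\n*-\n' > dict.in

# One sed script: each /start/,/end/ range only rewrites its own section.
# The end address /^\*[^X]/ matches the next section marker, closing the range.
sed '
  /^\*B:/, /^\*[^B]/ { s/2B:maj/B-translated/ }
  /^\*C:/, /^\*[^C]/ { s/2C:maj/C-translated/ }
' dict.in
```

Note that the line that ends one range (*C: here) also starts the next one, so the switchover happens on the marker line itself, which is exactly the behavior the question asks for.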
appending text to specific line in file bash
So I have a file that contains some lines of text separated by ','. I want to create a script that counts how many parts a line has, and if the line contains 16 parts I want to add a new one. So far it's working great. The only thing that is not working is appending the ',xx' at the end. See my example below.

Original file:

a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a

Expected result:

a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx

This is my code:

while read p; do
  if [[ $p == "HEA"* ]]
  then
    IFS=',' read -ra ADDR <<< "$p"
    echo ${#ADDR[@]}
    arrayCount=${#ADDR[@]}
    if [ "${arrayCount}" -eq 16 ]; then
      sed -i "/$p/ s/\$/,xx/g" $f
    fi
  fi
done <$f

Result:

a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx

What am I doing wrong? I'm sure it's something small but I can't find it.
It can be done using awk:

awk -F, 'NF==16{$0 = $0 FS "xx"} 1' file
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx

-F, sets the input field separator to a comma
NF==16 is the condition: execute the block inside { and } if the number of fields is 16
$0 = $0 FS "xx" appends xx at the end of the line
1 is the default awk action, which means print the line
To do it with sed, the answer lies along the following lines:

Use the ${line_number} s/..../..../ format; to target a specific line, you need to find out the line number first.
Use the special character & to denote the matched string.

The sed statement should look like the following:

sed -i "${line_number}s/.*/&,xx/"

I would prefer to leave it to you to play around with it, but if you would prefer I can give you a full working sample.
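Putting those pieces together, one hedged sketch of a full working sample: let awk report the numbers of the 16-field lines, then target each with the ${line_number}s/.*/&,xx/ form. The file name file is an assumption, and -i is used in its GNU sed form:

```shell
# Find the line numbers of lines with exactly 16 comma-separated fields,
# then append ",xx" to each of those lines in place. Appending does not
# add or remove lines, so the collected line numbers stay valid.
for line_number in $(awk -F, 'NF == 16 { print NR }' file); do
    sed -i "${line_number}s/.*/&,xx/" file
done
```

Because the line is addressed by number rather than by content, this sidesteps the question's original problem of the line's text being reused as a regex.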
how to validate if data has a trailing "/"
I have a file containing various information. The fields are delimited by |. One of the fields contains a directory. For example : blah|blah|blah|/usr/local/etc/|blah|blah I need to validate that the path field does not end with a "/". I'm using ksh. Any suggestions? thanks.
Assuming the directory is always in the 4th field:

line=0
while IFS='|' read -rA fields; do
  let line++
  [[ ${fields[3]} == */ ]] && echo "line $line: ends with a slash"
done < filename
Not ksh, but this is a natural job for awk: awk -F\| '$4 ~ /\/$/ { print "Trailing slash in line "NR":", $4 }' ${file:?}
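If the check has to live inside a plain shell loop, a case pattern on the extracted field works in both ksh and bash; the 4th-field position and the file name filename are assumptions carried over from the question:

```shell
# Split each '|'-delimited line into named fields; 'rest' soaks up
# everything after the 4th field. The */ glob matches a trailing slash.
while IFS='|' read -r f1 f2 f3 path rest; do
    case $path in
        */) echo "trailing slash: $path" ;;
    esac
done < filename
```

This avoids regex entirely: the */ glob in case is all that is needed for a suffix test.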
Try this:

if [[ $line =~ (/[[:alnum:]_]+)+(\||$) ]]

My shell syntax is rusty, so this might need a little massaging into shape.
Don't forget special paths like / (root). I keep the / (root) in the code below:

echo "blah|blah|blah|/usr/local/etc/|blah|blah
blah|blah|blah|/|blah|blah
blah|blah|blah|.|blah|blah
blah|blah|blah|/usr/local/etc|blah|blah" \
| sed "
  /\/\|/ {
    /\|\/\|/ !s/\/|/|/
  }"

Explanation:

/\/\|/       treat lines where a "/|" appears
/\|\/\|/ !   treat lines where "|/|" does not appear (tested only when the previous address matched)
s/\/|/|/     replace "/|" by "|" (when both tests succeed)
Processing a tab delimited file with shell script processing
Normally I would use Python/Perl for this procedure, but I find myself (for political reasons) having to pull this off using a Bash shell. I have a large tab-delimited file that contains six columns, and the second column is integers. I need to shell-script a solution that verifies that the file indeed has six columns and that the second column is indeed integers. I am assuming that I would need to use sed/awk here somewhere. The problem is that I'm not that familiar with sed/awk. Any advice would be appreciated. Many thanks! Lilly
gawk:

BEGIN { FS="\t" }
(NF != 6) || ($2 != int($2)) { exit 1 }

Invoke as follows:

if awk -f colcheck.awk somefile
then
  # is valid
else
  # is not valid
fi
Well, you can directly tell awk what the field delimiter is (the -F option). Inside your awk script you can tell how many fields are present in each record with the NF variable. Oh, and you can check the second field with a regex. The whole thing might look something like this:

awk < thefile -F\\t '
{
  if (NF != 6 || $2 ~ /[^0123456789]/)
    print "Format error, line " NR;
}
'

That's probably close, but I need to check the regex because Linux regex syntax variation is so insane. (edited because grrrr)
Here's how to do it with awk:

awk 'NF!=6||$2+0!=$2{print "error"}' file
Pure Bash:

infile='column6.dat'

lno=0
while read -a line ; do
  ((lno++))
  if [ ${#line[@]} -ne 6 ] ; then
    echo -e "line $lno has ${#line[@]} elements"
  fi
  if ! [[ ${line[1]} =~ ^[0-9]+$ ]] ; then
    echo -e "line $lno column 2 : not an integer"
  fi
done < "$infile"

Possible output:

line 19 has 5 elements
line 36 column 2 : not an integer
line 38 column 2 : not an integer
line 51 has 3 elements