Execute command on each line in a file - bash

I have a list of ids in one file that I want to use to grep their information from a second file. My output only shows the information for the last id, and I can't figure out how to tweak my code so that it outputs the info for every line, not just the last one.
My command:
for i in $(cat my_ids.txt);
do
for name in $i;
do
class=$(grep -A 25 $name id_info.txt | grep -E "tf_class");
family=$(grep -A 25 $name id_info.txt | grep -E "tf_family");
echo -e "$name\n\class\n\family";
done
done
I only get the information lines I need for the last id. I need them for each id, and I don't know how else to tweak this. I also tried removing the second for loop, but it gave exactly the same output.
Sample input from my_ids.txt:
MA0052.4
MA0602.1
MA0497.1
MA0786.1
MA0515.1
Sample input from id_info.txt:
AC MA0052.4
XX
ID MEF2A
XX
DE MA0052.4 MEF2A ; From JASPAR
PO A C G T
01 5075.0 2119.0 3651.0 5317.0
02 4033.0 1960.0 4493.0 5676.0
03 1984.0 10919.0 1007.0 2252.0
04 627.0 2974.0 236.0 12325.0
05 12437.0 1013.0 1066.0 1646.0
06 13132.0 253.0 610.0 2167.0
07 14680.0 141.0 506.0 835.0
08 14453.0 231.0 241.0 1237.0
09 14956.0 173.0 202.0 831.0
10 441.0 349.0 215.0 15157.0
11 15582.0 50.0 422.0 108.0
12 2566.0 1060.0 11104.0 1432.0
13 7709.0 4039.0 1605.0 2809.0
14 6171.0 3523.0 1810.0 4658.0
15 5254.0 3812.0 2479.0 4617.0
XX
CC tax_group:vertebrates
CC tf_family:Regulators of differentiation
CC tf_class:MADS box factors
CC pubmed_ids:25217591
CC uniprot_ids:Q02078
CC data_type:ChIP-seq
AC MA0602.1
XX
ID Arid5a
XX
DE MA0602.1 Arid5a ; From JASPAR
PO A C G T
01 18.0 43.0 23.0 17.0
02 16.0 32.0 3.0 48.0
03 85.0 3.0 7.0 5.0
04 96.0 0.0 1.0 2.0
05 6.0 0.0 1.0 93.0
06 93.0 1.0 1.0 6.0
07 2.0 1.0 1.0 96.0
08 4.0 9.0 4.0 83.0
09 23.0 3.0 52.0 22.0
10 34.0 35.0 18.0 12.0
11 29.0 13.0 27.0 31.0
12 57.0 8.0 19.0 16.0
13 29.0 18.0 26.0 27.0
14 34.0 23.0 15.0 27.0
XX
CC tax_group:vertebrates
CC tf_family:ARID-related
CC tf_class:ARID
CC pubmed_ids:25215497
CC uniprot_ids:Q3U108
CC data_type:PBM
XX
AC MA0497.1
XX
ID MEF2C
XX
DE MA0497.1 MEF2C ; From JASPAR
PO A C G T
01 705.0 321.0 676.0 507.0
02 733.0 151.0 573.0 752.0
03 431.0 196.0 822.0 760.0
04 382.0 1412.0 78.0 337.0
05 0.0 985.0 0.0 1224.0
06 1616.0 256.0 74.0 263.0
07 1706.0 32.0 241.0 230.0
08 2107.0 0.0 87.0 15.0
09 2131.0 0.0 2.0 76.0
10 2135.0 0.0 4.0 70.0
11 56.0 62.0 0.0 2091.0
12 2177.0 0.0 32.0 0.0
13 389.0 120.0 1671.0 29.0
14 975.0 836.0 148.0 250.0
15 1009.0 450.0 126.0 624.0
XX
CC tax_group:vertebrates
CC tf_family:Regulators of differentiation
CC tf_class:MADS box factors
CC pubmed_ids:7559475
CC uniprot_ids:Q06413
CC data_type:ChIP-seq
XX
AC MA0786.1
XX
ID POU3F1
XX
DE MA0786.1 POU3F1 ; From JASPAR
PO A C G T
01 1034.0 126.0 322.0 1437.0
02 505.0 186.0 128.0 2471.0
03 2471.0 7.0 26.0 21.0
04 44.0 53.0 21.0 2471.0
05 37.0 13.0 2471.0 232.0
06 170.0 2471.0 413.0 1119.0
07 1423.0 1.0 21.0 1048.0
08 2471.0 103.0 130.0 284.0
09 2471.0 20.0 25.0 63.0
10 259.0 95.0 128.0 2471.0
11 382.0 302.0 620.0 1167.0
12 1510.0 478.0 452.0 961.0
XX
CC tax_group:vertebrates
CC tf_family:POU domain factors
CC tf_class:Homeo domain factors
CC pubmed_ids:1361172
CC uniprot_ids:Q03052
CC data_type:HT-SELEX
XX
AC MA0515.1
XX
ID Sox6
XX
DE MA0515.1 Sox6 ; From JASPAR
PO A C G T
01 4.0 139.0 50.0 56.0
02 0.0 221.0 0.0 28.0
03 161.0 0.0 0.0 88.0
04 0.0 0.0 0.0 249.0
05 0.0 0.0 0.0 249.0
06 0.0 0.0 249.0 0.0
07 0.0 0.0 0.0 249.0
08 0.0 115.0 5.0 129.0
09 4.0 112.0 0.0 133.0
10 14.0 76.0 31.0 128.0
XX
CC tax_group:vertebrates
CC tf_family:SOX-related factors
CC tf_class:High-mobility group (HMG) domain factors
CC pubmed_ids:21985497
CC uniprot_ids:P40645
CC data_type:ChIP-seq
XX
Example of the output I get when I run this as a bash script:
MA0052.4
MA0602.1
MA0497.1
MA0786.1
MA0515.1 CC tf_class:High-mobility group (HMG) domain factors CC tf_family:SOX-related factors
Desired output:
MA0602.1 CC ARID CC ARID-related
MA0497.1 CC MADS box factors CC Regulators of differentiation
MA0786.1 CC Homeo domain factors CC POU domain factors
MA0515.1 CC tf_class:High-mobility group (HMG) domain factors CC tf_family:SOX-related factors
Another code snippet I tried, but its output only contains the id names and nothing more, probably because I am getting the syntax wrong somehow (ran this in a terminal):
while IFS= read -r line; do class=$(grep -A 25 $line id_infoc.txt | grep -E "tf_class"); family=$(grep -A 25 $line id_info.txt | grep -E "tf_family"); echo -e "$line\n\class\n\family"; done < my_ids.txt

Try this script:
#! /usr/bin/env bash
while read -r id; do
name="$id"
class=$( grep -A 25 "$name" id_info.txt | grep -E "tf_class")
family=$(grep -A 25 "$name" id_info.txt | grep -E "tf_family")
echo -e "${name}\n${class}\n${family}"
done <"my_ids.txt"

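Ignoring style, the bug in your code is that you use \class and \family instead of $class and $family. With echo -e, \c means "produce no further output", so everything after the id (including the newline) is swallowed, which is why you only see the ids.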
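A quick way to see why only the ids appear: with `echo -e`, the escape `\c` means "produce no further output", so `\class` cuts off everything after the id, including the trailing newline. The values below are copied from the sample data purely for illustration:

```shell
#!/usr/bin/env bash
name="MA0052.4"
class="CC tf_class:MADS box factors"
family="CC tf_family:Regulators of differentiation"

# Buggy: "\c" inside "\class" tells echo -e to stop, so only the id is printed
buggy=$(echo -e "$name\n\class\n\family")

# Fixed: expand the variables; printf sidesteps echo -e portability issues
fixed=$(printf '%s\n%s\n%s\n' "$name" "$class" "$family")

printf 'buggy output:\n%s\n' "$buggy"
printf 'fixed output:\n%s\n' "$fixed"
```

Using printf rather than echo -e is a choice, not a requirement; the echo -e version in the script above also works once the variables are spelled with $.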
Invoking grep multiple times as you do will be a bit inefficient if the file is large and there are many ids to check.
A straightforward solution in awk that only needs to read each file once might be:
awk '
function do_print () {
if (name in ids)
printf("%s\n%s\n%s\n",name,class,family)
name=family=class=""
}
# read ids into an array
NR==FNR { ids[$0]; next }
# start of a section
/^AC / { do_print(); name=$2; next }
# other candidate values found
/^CC tf_family:/ { family=$0; next }
/^CC tf_class:/ { class=$0; next }
# maybe print final section
END { do_print() }
' my_ids.txt id_info.txt
To strip the tf_family: and tf_class: prefixes, the two regex patterns can be replaced by sub() calls:
sub(/^CC tf_family:/,"CC ") { family=$0; next }
sub(/^CC tf_class:/,"CC ") { class=$0; next }
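Putting the sub() variant together and printing each matching record on one line, as in the desired output in the question, might look like the sketch below. The sample files are a trimmed, hypothetical excerpt of my_ids.txt and id_info.txt (matrix rows omitted), written to a temporary directory so the block runs on its own:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf '%s\n' MA0602.1 MA0515.1 > "$tmp/my_ids.txt"
cat > "$tmp/id_info.txt" <<'EOF'
AC MA0052.4
CC tf_family:Regulators of differentiation
CC tf_class:MADS box factors
AC MA0602.1
CC tf_family:ARID-related
CC tf_class:ARID
AC MA0515.1
CC tf_family:SOX-related factors
CC tf_class:High-mobility group (HMG) domain factors
EOF

result=$(awk '
function do_print() {
    if (name in ids)
        printf("%s %s %s\n", name, class, family)   # one line per wanted id
    name = family = class = ""
}
NR == FNR { ids[$0]; next }                 # first file: collect the ids
/^AC /    { do_print(); name = $2; next }   # a new record starts
sub(/^CC tf_family:/, "CC ") { family = $0; next }
sub(/^CC tf_class:/, "CC ")  { class = $0; next }
END { do_print() }                          # flush the last record
' "$tmp/my_ids.txt" "$tmp/id_info.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Each file is still read only once; only the printf format differs from the answer's version.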

Related

concat two files side-by-side, append difference between fields, and print in tabular format

Suppose I have two files as below. I need to concatenate them side by side and put the difference between the last fields in a new column.
a.txt
a 2019 66
b 2020 50
c 2018 48
b.txt
a 2019 50
b 2019 40
c 2018 45
Desired output:
a 2019 66 a 2019 50 16
b 2020 50 b 2019 40 10
c 2018 48 c 2018 45 3
I tried:
awk -F, -v OFS=" " '{$7=$3-$6}1' file3.txt
it prints
a 2019 66 a 2019 50 0
b 2020 50 b 2019 40 0
c 2018 48 c 2018 45 0
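Can you also help with printing it in tabular format?
Your awk command is fine except for -F,: it makes the comma the field separator, so each whole line becomes $1 and $3-$6 evaluates to 0. You should also paste the two files together first: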
$ paste a.txt b.txt | awk '{print $0,$3-$6}' | column -t
a 2019 66 a 2019 50 16
b 2020 50 b 2019 40 10
c 2018 48 c 2018 45 3
To do it within a single awk command, try the following:
awk 'FNR==NR{a[FNR]=$0;b[FNR]=$NF;next} {print a[FNR],$0,b[FNR]-$NF}' a.txt b.txt | column -t
Output will be as follows.
a 2019 66 a 2019 50 16
b 2020 50 b 2019 40 10
c 2018 48 c 2018 45 3
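If you would rather stay in bash, the two files can also be read in lockstep on separate file descriptors. This sketch assumes both files have the same number of lines and exactly three whitespace-separated fields per line; it uses the sample data from the question:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf 'a 2019 66\nb 2020 50\nc 2018 48\n' > "$tmp/a.txt"
printf 'a 2019 50\nb 2019 40\nc 2018 45\n' > "$tmp/b.txt"

# fd 3 reads a.txt and fd 4 reads b.txt; the loop stops when either file runs out
result=$(
  while read -r k1 y1 v1 <&3 && read -r k2 y2 v2 <&4; do
    echo "$k1 $y1 $v1 $k2 $y2 $v2 $(( v1 - v2 ))"
  done 3<"$tmp/a.txt" 4<"$tmp/b.txt"
)
printf '%s\n' "$result"
rm -rf "$tmp"
```

Piping the result through column -t gives the same aligned layout as the answers above.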

Issue with including if statement in bash script

I am pretty new to bash scripting. I have the bash script below, and I want to include an if statement so that when the month (ij) equals 09, the day "i" only runs from 01 to 30. I tried several ways, but none worked.
How can I include an if statement in the code below to achieve this? Any help is appreciated.
Thanks.
#!/bin/bash
for ii in 2007
do
for i in 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 #Day of the Month
do
for ij in 09 10 # Month
do
for j in 0000 0100 0200 0300 0400 0500 0600 0700 0800 0900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 2300
do
cdo cat DAS_0125_H.A${ii}${ij}${i}.${j}.002_var.nc outfile_${ii}${ij}${i}.nc
done
done
done
done
The smallest change is adding a continue for day 31 in month 9.
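You must test "09" as a string (or as 10#09), because in an arithmetic context bash treats a leading 0 as octal, and 09 is not a valid octal number.
(I also changed cdo ... into echo cdo ..., so the commands are printed instead of run.)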
for ii in 2007
do
for i in 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 #Day of the Month
do
for ij in 09 10 # Month
do
if [[ "${ij}" == "09" ]] && [[ "${i}" == "31" ]]; then continue; fi
for j in 0000 0100 0200 0300 0400 0500 0600 0700 0800 0900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 2300
do
echo "cdo cat DAS_0125_H.A${ii}${ij}${i}.${j}.002_var.nc outfile_${ii}${ij}${i}.nc"
done
done
done
done
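The same skip can be written as an arithmetic test if the 10# prefix is used; without it, bash rejects 09 with a "value too great for base" error. `skip_day` is a hypothetical helper name used only for this sketch:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when this (zero-padded) month/day pair should be skipped
skip_day() {
  local month=$1 day=$2
  # 10# forces base 10; a bare 08 or 09 would be rejected as invalid octal
  (( 10#$month == 9 && 10#$day == 31 ))
}

for month in 09 10; do
  for day in 30 31; do
    if skip_day "$month" "$day"; then continue; fi
    echo "processing ${month}/${day}"
  done
done
```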
It is easier to read if you loop over the values with brace expansion such as {01..31} (zero-padded ranges need bash 4 or later). You do not want to use for ((i=1;i<=31;i++)), because it loses the leading zeroes.
Also use descriptive variable names.
for year in 2007
do
for day in {01..31} # Day of the Month
do
for month in {09,10} # Month
do
if [[ "${month}" == "09" ]] && [[ "${day}" == "31" ]]; then continue; fi
for hour in {00..23}
do
echo cdo cat DAS_0125_H.A${year}${month}${day}.${hour}00.002_var.nc outfile_${year}${month}${day}.nc
done
done
done
done
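To handle any month rather than special-casing September, one option is a small lookup of month lengths. This is a sketch: `days_in_month` is a hypothetical helper, it ignores leap years (fine for 2007, which is not one), and it only echoes the cdo commands, as the answer above does:

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of days in a zero-padded month (non-leap year assumed)
days_in_month() {
  case $1 in
    04|06|09|11) echo 30 ;;
    02)          echo 28 ;;
    *)           echo 31 ;;
  esac
}

year=2007
for month in 09 10; do
  # seq -w pads to equal width, so 1..30 becomes 01..30
  for day in $(seq -w 1 "$(days_in_month "$month")"); do
    for hour in {00..23}; do
      echo cdo cat "DAS_0125_H.A${year}${month}${day}.${hour}00.002_var.nc" "outfile_${year}${month}${day}.nc"
    done
  done
done
```

The month loop is outermost here; the hours for each outfile are still visited in order, so each daily file is assembled the same way.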
When the files already exist, you can consider
ls DAS_0125_H.A2007{09,10}{01..31}.{00..23}00.002_var.nc |
sed -r 's/.*([0-9]{8}).*/cdo cat & outfile_\1.nc/'
ls prints an error for names that do not exist (such as day 31 of September) and skips them. Once the list shows the commands you want, you can execute them with
source <(ls DAS_0125_H.A2007{09,10}{01..31}.{00..23}00.002_var.nc |
sed -r 's/.*([0-9]{8}).*/cdo cat & outfile_\1.nc/')

how to add 0 digit to a single symbol hex value where it is missed, bash

I have a file with the following content:
$ cat somefile
28 46 5d a2 26 7a 192 168 2 2
0 15 e c8 a8 a3 192 168 100 3
54 4 2b 8 c 26 192 168 20 3
As you can see, the values in the first six columns are in hex and the values in the last four columns are decimal. I just want to add a leading 0 to every single-digit hexadecimal value.
Thanks in advance.
This one should work out for you:
while read -r -a line
do
hex=("${line[@]:0:6}")
printf "%02x " "${hex[@]/#/0x}"
echo "${line[@]:6:4}"
done < somefile
Example:
$ cat somefile
28 46 5d a2 26 7a 192 168 2 2
0 15 e c8 a8 a3 192 168 100 3
54 4 2b 8 c 26 192 168 20 3
$ while read -r -a line
> do
> hex=("${line[@]:0:6}")
> printf "%02x " "${hex[@]/#/0x}"
> echo "${line[@]:6:4}"
> done < somefile
28 46 5d a2 26 7a 192 168 2 2
00 15 0e c8 a8 a3 192 168 100 3
54 04 2b 08 0c 26 192 168 20 3
Here is a way with awk if that is an option:
awk '{for(i=1;i<=6;i++) if(length($i)<2) $i=0$i}1' file
Test:
$ cat file
28 46 5d a2 26 7a 192 168 2 2
0 15 e c8 a8 a3 192 168 100 3
54 4 2b 8 c 26 192 168 20 3
$ awk '{for(i=1;i<=6;i++) if(length($i)<2) $i=0$i}1' file
28 46 5d a2 26 7a 192 168 2 2
00 15 0e c8 a8 a3 192 168 100 3
54 04 2b 08 0c 26 192 168 20 3
Please try this too, if it helps (bash version 4.1.7(1)-release)
#!/bin/bash
while read -r line;do
arr=($line)
i=0
for num in "${arr[@]}";do
if [ $i -lt 6 ];then
if [ ${#num} -eq 1 ];then
arr[i]='0'${arr[i]};
fi
fi
i=$((i+1))
done
echo "${arr[*]}"
done<your_file
This might work for you (GNU sed):
sed 's/\b\S\s/0&/g' file
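Finds a single non-space character followed by whitespace and prepends a 0. Note that this also pads single-digit values in the decimal columns: the 2 in the ninth field of the first line becomes 02.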
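If the decimal columns must stay untouched, a plain match on "one character followed by whitespace" is too broad, because single-digit decimal fields match as well. One way around that with GNU sed is to anchor the match at the start of the line, allow it to skip at most five already-padded hex fields, and repeat the substitution until nothing changes. A sketch, assuming single spaces between fields (the `:a;...;ta` label syntax is GNU-specific):

```shell
#!/usr/bin/env bash
input='28 46 5d a2 26 7a 192 168 2 2
0 15 e c8 a8 a3 192 168 100 3
54 4 2b 8 c 26 192 168 20 3'

# ^(([0-9a-fA-F]{2} ){0,5}) skips at most five padded hex fields, so only
# fields 1-6 can match; "\10" is backreference \1 followed by a literal 0.
# t jumps back to label a for as long as a substitution succeeds.
result=$(printf '%s\n' "$input" | sed -E ':a;s/^(([0-9a-fA-F]{2} ){0,5})([0-9a-fA-F]) /\10\3 /;ta')
printf '%s\n' "$result"
```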

CRC16 and data communications

Hi, I have been trying to calculate a CRC for a device I want to write a software interface for. For simplicity, X is the device and Y is the hardware controller. I am looking for a nudge in the right direction: I am fairly sure I am on the correct track, just a little confused on a few points.
When the device is idle, it sends the following data strings every 2 seconds or so, and they look like they are counting up in hex. I assume the 2 bytes between the | | are the CRC; (XX) is the varying byte.
X: 96 10 01 E1 (E4) 01 FF 10 17 | F7 EC | 10 06 E1 96 FE
X: 96 10 01 E1 (E6) 01 FF 10 17 | 7F FA | 10 06 E1 96 FE
X: 96 10 01 E1 (E8) 01 FF 10 17 | C7 9B | 10 06 E1 96 FE
X: 96 10 01 E1 (EA) 01 FF 10 17 | 4F 8D | FE 10 06 E1 96 FE
X: 96 10 01 E1 (EC) 01 FF 10 17 | D7 B6 | FE 10 06 E1 96 FE
X: 96 10 01 E1 (EE) 01 FF 10 17 | 5F A0 | FE 10 06 E1 96 FE
Using reveng with reveng -w 16 -s and the above sets of data I get:
width=16 poly=0x1021 init=0x1e69 refin=true refout=true xorout=0x0000 check=0x3da6 name=(none)
When I intercept a command from the controller, I get:
X: 96 10 01 E1 (EE) 01 FF 10 17 | 5F A0 | FE 10 06 E1 96 FE -- Last line before command
Y: E1 10 01 96 (22) 05 01 C0 A8 35 00 10 17 |0B B8| FE 10 06 96 E1 FE
where (22) is the modifier and |0B B8| is the CRC. How is the 22 derived from the E4? Is it another CRC?
When I sent the same command several times I intercepted the following:
Y: E1100196220501C0A8350010170BB8FE100696E1FE
Y: E11001962A0501C0A835001017C1C7FE100696E1FE
Y: E11001962E0501C0909400101753C8FE100696E1FE
Y: E1100196300501809094001017C3EEFE100696E1FE
Y: E1100196360501C090940010170D48FE100696E1FE
Y: E11001962A0501C09094001017B6F7FE100696E1FE
Y: E11001962A0501C09094001017B6F7FE100696E1FE
Using reveng with reveng -w 16 -s and the above sets of data I get:
width=16 poly=0x1021 init=0xd313 refin=true refout=true xorout=0x0000 check=0x295f name=(none)
The polynomial is the same, but init and check differ. Sorry for the long post; here is a summary of my questions:
1) Is it common for, say, the device to use the same polynomial as the controller but a different init and check?
2) Is the constant counting strings from the device used to offset the variable byte used to calculate the checksum? If so what is this mechanism called and what methods could be used to derive the relationship between the count and the byte?
3) Am I on the right track or have I got lost along the way?
Thanks for taking the time to read this and would really appreciate a kick in the right direction.
Drop the first byte from your X and Y sequences, and then you'll get the same model for both:
width=16 poly=0x1021 init=0xffff refin=true refout=true xorout=0xffff check=0x906e name="X-25"
To wit:
% reveng -w 16 -s 100196220501C0A8350010170BB8 1001962A0501C0A835001017C1C7 1001962E0501C0909400101753C8 100196300501809094001017C3EE 100196360501C090940010170D48 1001962A0501C09094001017B6F7
width=16 poly=0x1021 init=0xffff refin=true refout=true xorout=0xffff check=0x906e name="X-25"
% reveng -w 16 -s 1001E1E401FF1017F7EC 1001E1E601FF10177FFA 1001E1E801FF1017C79B 1001E1EA01FF10174F8D 1001E1EC01FF1017D7B6 1001E1EE01FF10175FA0
width=16 poly=0x1021 init=0xffff refin=true refout=true xorout=0xffff check=0x906e name="X-25"
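To check frames by hand, the CRC-16/X-25 model that reveng reports (poly 0x1021, init 0xffff, reflected in and out, xorout 0xffff) can be computed with plain bash arithmetic. This is a sketch, not tested against the device: `crc16_x25` is a hypothetical helper that takes the message bytes as a hex string (first byte already dropped, CRC bytes excluded) and prints the CRC as four hex digits. X-25/HDLC framing conventionally sends the CRC low byte first, so compare the result with the two bytes between the bars in reversed order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: CRC-16/X-25 of a hex string such as "1001E1E401FF1017"
crc16_x25() {
  local hex=${1// /}       # allow "10 01 E1 ..." with spaces
  local crc=0xFFFF i bit byte
  for (( i = 0; i < ${#hex}; i += 2 )); do
    byte=$(( 16#${hex:i:2} ))
    (( crc ^= byte ))      # refin=true: feed the byte into the low end
    for (( bit = 0; bit < 8; bit++ )); do
      if (( crc & 1 )); then
        (( crc = (crc >> 1) ^ 0x8408 ))   # 0x8408 is 0x1021 bit-reversed
      else
        (( crc >>= 1 ))
      fi
    done
  done
  printf '%04X\n' $(( crc ^ 0xFFFF ))    # xorout=0xffff
}

crc16_x25 313233343536373839   # ASCII "123456789"; prints 906E, the catalogue check value
```

The input is not validated; an odd number of hex digits will silently treat the last digit as a full byte.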

bash sequence 00 01 ... 10

In bash, with
$ echo {1..10}
1 2 3 4 5 6 7 8 9 10
I can get a sequence of numbers, but in some cases I need
01 02 03 ... 10
How can I get this?
And how can I get
001 002 ... 010 011 .. 100
This will work in any shell on a machine that has coreutils installed (thanks commenters for correcting me):
seq -w 1 10
and
seq -w 1 100
Explanation:
the option -w will:
Equalize the widths of all numbers by padding with zeros as necessary.
seq [-w] [-f format] [-s string] [-t string] [first [incr]] last
prints a sequence of numbers, one per line (default), from first (default 1), to near last as possible, in increments of incr (default 1). When first is larger than last the default incr is -1.
Use the seq command with the -f parameter:
seq -f "%02g" 0 10
results:
00
01
02
03
04
05
06
07
08
09
10
seq -f "%03g" 0 10
results:
000
001
002
003
004
005
006
007
008
009
010
printf "%02d " {1..10} ; echo
Output:
01 02 03 04 05 06 07 08 09 10
Similarly:
printf "%03d " {1..100} ; echo
In bash 4.0 and later, simply:
echo {01..10}
And:
echo {001..100}
for i in {01..99}; do
echo $i
done
will return :
01
02
03
04
05
06
07
08
09
10
...
Replacing 01 with 001 and 99 with 999 or 100 will do what you expect also.
$ printf "%02d " {0..10}; echo
00 01 02 03 04 05 06 07 08 09 10
$ printf "%03d " {0..100}; echo
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050 051 052 053 054 055 056 057 058 059 060 061 062 063 064 065 066 067 068 069 070 071 072 073 074 075 076 077 078 079 080 081 082 083 084 085 086 087 088 089 090 091 092 093 094 095 096 097 098 099 100
Just vary the field width in the format string (2 and 3 in this case) and of course the brace expansion range. The echo is there just for cosmetic purposes, since the format string does not contain a newline itself.
printf is a shell builtin, but you likely also have a version from coreutils installed, which can be used in-place.
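Building on that, the field width can be derived from the upper bound instead of being hard-coded. `zpad_seq` is a hypothetical helper name, not a standard command; it starts at 1 and assumes a non-negative integer upper bound:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print 1..last, zero-padded to the width of "last"
zpad_seq() {
  local last=$1
  local width=${#last}            # number of characters in the upper bound
  local i
  for (( i = 1; i <= 10#$last; i++ )); do   # 10# so "010" is not read as octal
    printf "%0${width}d\n" "$i"
  done
}

zpad_seq 10 | paste -sd' ' -      # prints: 01 02 03 04 05 06 07 08 09 10
```

Passing 100 gives 001 through 100, and passing 010 gives 001 through 010, since the width comes from the string as typed.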
awk only:
awk 'BEGIN { for (i=1; i<=10; i++) printf("%02d ", i); print "" }'
The following will work in bash
echo {01..10}
EDIT: Seeing the other answers, I wanted to add this for the case where the commands need to work in any shell:
yes | head -n 100 | awk '{printf( "%03d ", NR )}' ##for 001...100
or
yes | head -n 10 | awk '{printf( "%02d ", NR )}' ##for 01..10
echo 0{0..9}
You can get: 00 01 02 03 04 05 06 07 08 09
echo 0{0..9} 1{0..9}
You can get: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19
echo 00{0..9} 0{10..99}
You can get 000 .. 099
There are so many ways to do this! My personal favorite is:
yes | grep y | sed 100q | awk '{printf( "%03d ", NR )}'; echo
Clearly, neither the sed nor the grep are necessary (the grep being far more trivial, since if you omit the sed you need to change the awk), but they contribute to the overall satisfaction of the solution! The final echo is not really necessary either, but it's always nice to have a trailing newline.
Another nice option is:
yes | nl -ba | tr ' ' 0 | sed 100q | cut -b 4-6
Or (less absurdly):
yes '' | sed ${top-100}q | nl -ba -w ${width-3} -n rz
As commented by favoretti, seq is your friend.
But there is a caveat:
seq -w uses the second argument to set the format it will use.
Thus, the command seq -w 1 9 will print the sequence 1 2 3 4 5 6 7 8 9
To print the sequence 01 .. 09 you need to do the following:
seq -w 1 09
Or, for clarity's sake, use the same format on both ends, for instance:
seq -w 000 010 for the series 000 001 002 ... 010
And you can also use a step argument, which also works in reverse:
seq -w 10 -1 01 for 10, 09, 08, ... 01, or seq -w 01 2 10 for 01, 03, 05, 07, 09
