Can I use Entrez Direct to query multiple nucleotide accession version identifiers against a database without using epost? - bioinformatics

I have downloaded a hit table from NCBI BLAST (nucleotide BLAST against the nucleotide collection database, using the megablast program) and used awk and sort to order it by accession version identifier:
awk -F "\t" 'NF>1{print}' unsorted_input.txt | sort -k2 > sorted_output.txt
I then used Entrez Direct to look up the subject organism of each alignment from its accession version identifier:
awk -F "\t" 'NF>1{print $2}' unsorted_input.txt | epost -db nucleotide | efetch -format docsum | xtract -pattern DocumentSummary -element Organism | sort | paste sorted_output.txt - > final_output.txt
This command extracted the subject organism for some alignments but not all. I noticed that the alignments for which epost did not work could be retrieved by querying them individually with esearch:
esearch -db nucleotide -query "accession_version_identifier" | efetch -format docsum | xtract -pattern DocumentSummary -element Organism
So I tried this approach in a loop, using the accession version identifier (second column) of each line to extract the subject organism name:
while IFS=$'\t' read -r -a myArray
do
esearch -db nucleotide -query "${myArray[1]}" | efetch -format docsum | xtract -pattern DocumentSummary -element Organism > "output.txt"
done < input.txt
However, this only returned the subject organism of the first row. How can I apply this to every row, storing all subject organisms in the same file?
The first few lines of the input file can be found below. It is tab delimited:
ce1e013e-c4c5-47f9-b041-521ee293c4f0 AB002282.1 91.217 649 24 22 41 676 8 636 0.0 854
c10d7882-cc00-4ee2-8643-9b27fef66e83 AB828191.1 84.615 117 9 6 118 228 17668 17781 5.16e-19 108
c10d7882-cc00-4ee2-8643-9b27fef66e83 AB828191.1 84.615 117 9 6 118 228 20740 20853 5.16e-19 108
c10d7882-cc00-4ee2-8643-9b27fef66e83 AB828191.1 84.615 117 9 6 118 228 23812 23925 5.16e-19 108
c10d7882-cc00-4ee2-8643-9b27fef66e83 AB828191.1 84.615 117 9 6 118 228 26884 26997 5.16e-19 108
c10d7882-cc00-4ee2-8643-9b27fef66e83 AB828191.1 84.615 117 9 6 118 228 29956 30069 5.16e-19 108
c10d7882-cc00-4ee2-8643-9b27fef66e83 AB828191.1 85.345 116 9 6 118 228 33027 33139 1.11e-20 113
c10d7882-cc00-4ee2-8643-9b27fef66e83 AB828191.1 87.000 100 7 5 132 228 14613 14709 5.16e-19 108
8e8ac3f3-63f6-4519-ad25-287a25169f87 AB850654.1 88.262 4660 175 260 16 4401 103840 108401 0.0 5232
c4233926-9f23-46c4-bc4d-5702f47885bd AB850654.1 89.958 4272 119 235 1 4042 104203 108394 0.0 5227
876d8f20-9d36-4207-8754-0924d99a6c46 AC019188.6 91.855 221 4 7 3 210 78509 78290 1.39e-75 296

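I have fixed the problem. esearch apparently reads standard input when it is not a terminal, so inside the loop it consumed the remaining lines of input.txt; piping echo into it gives it its own input, and >> appends to output.txt instead of overwriting it on every iteration: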
while IFS=$'\t' read -r -a myArray
do
echo | esearch -db nucleotide -query "${myArray[1]}" | efetch -format docsum | xtract -pattern DocumentSummary -element Title,Organism >> output.txt
done < input.txt
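The loop problem can be reproduced without network access. The sketch below assumes that esearch, like many command-line filters, reads standard input whenever it is not a terminal; greedy_lookup is a hypothetical stand-in for it that simply drains stdin. Inside while ... done < input.txt, such a command inherits the loop's input and swallows the remaining rows, so read hits end-of-file after the first iteration. Redirecting each lookup from /dev/null (the same idea as echo | esearch) leaves the rows for read.

```shell
#!/usr/bin/env bash
# greedy_lookup: hypothetical stand-in for a command that reads stdin
# (assumed behaviour of esearch); it lets this run without network access.
greedy_lookup() {
  cat > /dev/null            # drain stdin, as a stdin-reading command would
  echo "looked up: $1"
}

# The body inherits the loop's stdin: the first lookup drains the file,
# so only the first row is ever processed.
loop_inheriting_stdin() {
  local -a cols
  while IFS=$'\t' read -r -a cols; do
    greedy_lookup "${cols[1]}"
  done < "$1"
}

# Each lookup gets its own empty stdin, so read sees every row.
loop_isolated_stdin() {
  local -a cols
  while IFS=$'\t' read -r -a cols; do
    greedy_lookup "${cols[1]}" < /dev/null
  done < "$1"
}
```

With a three-row, tab-delimited file, loop_inheriting_stdin prints a single "looked up:" line, while loop_isolated_stdin prints one line per row.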

Related

how to make a table from the columns of other tables in bash?

Hello, I have 50 tables in TSV format, all with the same column names in the same order, e.g.:
cat sample1.tsv | head -4
name coverage ID bases reads length
vir1 0.535 3rf 1252 53 11424
vir2 0.124 2ds 7534 152 63221
vir3 0.643 6tf 3341 73 21142
I want to build a table from the "reads" column (the 5th column) of the 50 tables. The name column has the same values in the same order in all 50 tables.
Desired output:
cat reads_table.tsv | head -4
names sample1 sample2 sample3 sample4 sample5 sample50
vir1 53 742 42 242 42 342
vir2 152 212 512 21 74 41
vir3 73 13 172 42 142 123
I was thinking of saving the reads column (the 5th column in all tables) to an array and using the paste command to paste the columns into a new, empty file called "reads_table.tsv", but I don't know how to do this in bash.
This is what I tried in a first instance:
for i in *.tsv
do
reads=$(awk '{print $5}' $i)
sed -i 's/$/\t$reads/' $i >> reads_table.tsv
done
Created some input files to match OP's expected output:
$ head sample*.tsv
==> sample1.tsv <==
name coverage ID bases reads length
vir1 0.535 3rf 1252 53 11424
vir2 0.124 2ds 7534 152 63221
vir3 0.643 6tf 3341 73 21142
==> sample2.tsv <==
name coverage ID bases reads length
vir1 0.535 3rf 1252 742 11424
vir2 0.124 2ds 7534 212 63221
vir3 0.643 6tf 3341 13 21142
==> sample3.tsv <==
name coverage ID bases reads length
vir1 0.535 3rf 1252 42 11424
vir2 0.124 2ds 7534 512 63221
vir3 0.643 6tf 3341 172 21142
==> sample4.tsv <==
name coverage ID bases reads length
vir1 0.535 3rf 1252 242 11424
vir2 0.124 2ds 7534 21 63221
vir3 0.643 6tf 3341 42 21142
==> sample5.tsv <==
name coverage ID bases reads length
vir1 0.535 3rf 1252 42 11424
vir2 0.124 2ds 7534 74 63221
vir3 0.643 6tf 3341 142 21142
==> sample50.tsv <==
name coverage ID bases reads length
vir1 0.535 3rf 1252 342 11424
vir2 0.124 2ds 7534 41 63221
vir3 0.643 6tf 3341 123 21142
One awk idea:
awk '
BEGIN { FS=OFS="\t" }
FNR==NR { lines[FNR]=$1 } # save 1st column ("name") of 1st file
FNR==1 { split(FILENAME,a,".") # 1st row of each file: split FILENAME
lines[FNR]=lines[FNR] OFS a[1] # save FILENAME (sans ".tsv")
next
}
{ lines[FNR]=lines[FNR] OFS $5 } # rest of rows in file: append the 5th column to our output lines
END { for (i=1;i<=FNR;i++) # loop through rows and ...
print lines[i] # print the associated line to stdout
}
' $(find . -name "sample*.tsv" -printf "%f\n" | sort -V ) > reads_table.tsv
NOTES:
the find/sort is required to ensure the files are fed to awk in version-sort order (e.g., sample3.tsv comes before sample21.tsv)
the -printf "%f\n" removes the leading ./ from the filename (otherwise we could remove it in the awk script)
the -V option tells sort to run a Version sort
This generates:
name sample1 sample2 sample3 sample4 sample5 sample50
vir1 53 742 42 242 42 342
vir2 152 212 512 21 74 41
vir3 73 13 172 42 142 123
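The paste-based idea from the question can also work. A minimal sketch, assuming tab-delimited files with one header row, the same order of the name column in every file, and the reads count in column 5: each file's reads column goes to a temporary file whose header is replaced by the sample name, and paste then joins those files side by side. build_reads_table is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# build_reads_table FILE...  (hypothetical helper; writes the table to stdout)
# Assumes tab-delimited input, one header row, identical row order in every
# file, and the reads count in column 5.
build_reads_table() {
  local tmpdir f label i=0
  tmpdir=$(mktemp -d)
  # First column: the name column of the first file, header renamed to "names".
  awk -F'\t' 'NR == 1 { print "names"; next } { print $1 }' "$1" > "$tmpdir/col_00000"
  for f in "$@"; do
    i=$((i + 1))
    label=${f##*/}          # drop any directory part
    label=${label%.tsv}     # drop the extension
    # Next column: the reads column, header replaced by the sample label.
    awk -F'\t' -v h="$label" 'NR == 1 { print h; next } { print $5 }' "$f" \
      > "$tmpdir/$(printf 'col_%05d' "$i")"
  done
  # Zero-padded names make the glob expand in argument order; paste joins with tabs.
  paste "$tmpdir"/col_*
  rm -rf "$tmpdir"
}
```

Call it with the files in version-sort order, e.g. build_reads_table $(printf '%s\n' sample*.tsv | sort -V) > reads_table.tsv; this relies on word splitting, so file names must not contain spaces. Unlike the awk answer, it runs one awk per file, which is negligible for 50 files.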

MITE (legacy pipeline) used instead of DSB (uops cache) when jump is not quite aligned on 32 bytes

This question used to be a part of this (now updated) question, but it seems like it should be another question, since it didn't help to get an answer to the other one.
My starting point is a loop doing 3 independent additions:
for (unsigned long i = 0; i < 2000000000; i++) {
asm volatile("" : "+r" (a), "+r" (b), "+r" (c), "+r" (d)); // prevents C compiler from optimizing out adds
a = a + d;
b = b + d;
c = c + d;
}
When this loop is not unrolled, it executes in 1 cycle per iteration, which is expected: it contains 4 instructions (the 3 additions and the macro-fused increment/jump), all of which can execute in one cycle on ports 0, 1, 5 and 6. When the loop is unrolled, performance is surprising: it tends to be 25% slower than the non-unrolled version, probably because of uop scheduling, as suggested in the comments on the previous question.
In this question, I'm not asking about performance, but rather about why in some cases uops come from the MITE (legacy pipeline) and in other cases from the DSB (uop cache). (Note that I'm using a Skylake with the LSD (Loop Stream Detector) disabled.)
Experimentally, when the jump is not quite aligned on 32 bytes, uops are issued from the MITE rather than the DSB. ("Not quite 32 bytes" really means from 2 bytes before to 3 bytes after a 32-byte boundary. Put another way, starting from a 32-byte-aligned jump, it means adding 1 to 3 bytes of padding, or removing 1 or 2 bytes of padding.)
Compiling the C code above with Clang and (manually) unrolling it one time produces the following assembly code:
movl $2000000000, %esi
.p2align 4, 0x90
.LBB0_1:
addl %edi, %edx # 1
addl %edi, %ecx
addl %edi, %eax
addl %edi, %edx # 2
addl %edi, %ecx
addl %edi, %eax
addq $-2, %rsi
jne .LBB0_1
This code executes 2 cycles/iteration, as expected, and most uops are delivered by the DSB. Adding one byte of padding before the loop causes the loop to execute in 3 cycles/iteration, and all the uops are now delivered by the MITE.
In an effort to understand what is happening, I changed the align directive to .p2align 7 (thus aligning the loop on 128 bytes), and added some padding before the loop, thus changing the loop alignment. The results are as follows (long snippet ahead; explanations below):
| Padding | Jump offset | Cycles | MITE uops | DSB uops | DSB miss | DSB miss penalty |
| ------- | ----------- | ----------------- | ------------- | ------------- | ------------- | ---------------- |
| 0 | 16 | 2 453 942 151 | 1 589 440 | 7 000 531 761 | 73 681 | 33 419 |
| 1 | 17 | 2 454 623 799 | 2 002 088 | 7 000 493 234 | 107 433 | 28 686 |
| 2 | 18 | 2 454 010 264 | 1 611 181 | 7 000 580 070 | 72 372 | 34 963 |
| 3 | 19 | 2 455 016 743 | 1 531 428 | 7 001 271 720 | 76 240 | 42 493 |
| 4 | 20 | 2 454 056 088 | 1 592 150 | 7 000 571 537 | 71 691 | 29 677 |
| 5 | 21 | 2 455 111 497 | 1 701 204 | 7 001 068 440 | 85 117 | 41 744 |
| 6 | 22 | 2 454 558 860 | 2 081 244 | 7 000 362 980 | 105 388 | 29 829 |
| 7 | 23 | 2 454 351 179 | 1 765 720 | 7 000 472 785 | 81 903 | 39 022 |
| 8 | 24 | 2 454 470 296 | 2 045 062 | 7 000 337 694 | 107 763 | 30 750 |
| 9 | 25 | 2 454 395 853 | 1 748 525 | 7 000 560 730 | 82 773 | 37 030 |
| 10 | 26 | 2 453 920 970 | 1 500 801 | 7 000 562 016 | 70 144 | 36 559 |
| 11 | 27 | 2 453 748 551 | 1 485 784 | 7 000 530 064 | 66 535 | 32 019 |
| 12 | 28 | 2 453 973 841 | 1 601 708 | 7 000 562 754 | 72 601 | 31 970 |
| 13 | 29 | 2 454 749 106 | 2 085 092 | 7 000 539 751 | 109 862 | 30 977 |
| 14 | 30 | **3 003 289 033** | 7 001 845 873 | 358 240 | 1 000 075 874 | 37 506 |
| 15 | 31 | **4 003 748 994** | 7 002 171 254 | 372 672 | 1 000 086 939 | 39 679 |
| 16 | 32 | **3 003 810 021** | 7 002 294 170 | 295 736 | 1 000 114 704 | 28 974 |
| 17 | 33 | **3 002 912 972** | 7 001 752 747 | 350 755 | 1 000 071 698 | 32 249 |
| 18 | 34 | **3 003 392 542** | 7 001 941 076 | 360 439 | 1 000 076 887 | 45 663 |
| 19 | 35 | **3 003 040 266** | 7 001 759 091 | 343 693 | 1 000 072 685 | 38 703 |
| 20 | 36 | 2 453 764 603 | 1 511 899 | 7 000 546 442 | 66 912 | 32 996 |
| 21 | 37 | 2 454 889 754 | 1 946 579 | 7 000 713 787 | 102 922 | 31 852 |
| 22 | 38 | 2 454 700 423 | 1 961 612 | 7 000 581 288 | 100 281 | 30 364 |
| 23 | 39 | 2 454 398 236 | 1 974 415 | 7 000 350 258 | 103 015 | 30 855 |
| 24 | 40 | 2 452 285 702 | 1 562 028 | 7 000 416 473 | 67 622 | 38 783 |
| 25 | 41 | 2 454 500 700 | 2 013 917 | 7 000 384 154 | 102 906 | 31 165 |
| 26 | 42 | 2 454 666 446 | 1 928 032 | 7 000 572 245 | 99 613 | 35 813 |
| 27 | 43 | 2 453 929 241 | 1 565 110 | 7 000 588 419 | 70 027 | 31 336 |
| 28 | 44 | 2 453 852 431 | 1 595 897 | 7 000 633 247 | 71 735 | 35 984 |
| 29 | 45 | 2 454 664 111 | 2 039 338 | 7 000 534 894 | 105 225 | 30 043 |
| 30 | 46 | 2 454 523 184 | 1 876 338 | 7 000 592 928 | 88 020 | 48 456 |
| 31 | 47 | 2 454 091 130 | 1 560 821 | 7 000 631 532 | 70 150 | 37 773 |
| 32 | 48 | 2 453 813 400 | 1 535 557 | 7 000 556 686 | 70 196 | 33 268 |
| 33 | 49 | 2 453 772 578 | 1 501 716 | 7 000 526 938 | 67 747 | 33 492 |
| 34 | 50 | 2 455 308 730 | 1 643 047 | 7 001 287 728 | 80 148 | 43 035 |
| 35 | 51 | 2 453 790 620 | 1 506 869 | 7 000 529 450 | 66 903 | 35 315 |
| 36 | 52 | 2 453 509 109 | 1 534 817 | 7 000 405 227 | 67 344 | 30 526 |
| 37 | 53 | 2 453 516 412 | 1 469 184 | 7 000 430 367 | 65 040 | 30 686 |
| 38 | 54 | 2 453 851 033 | 1 556 722 | 7 000 581 363 | 69 098 | 36 605 |
| 39 | 55 | 2 454 916 648 | 2 089 549 | 7 000 572 462 | 111 448 | 30 435 |
| 40 | 56 | 2 455 089 502 | 1 991 232 | 7 000 799 155 | 104 559 | 30 724 |
| 41 | 57 | 2 454 744 425 | 2 002 307 | 7 000 532 096 | 105 221 | 32 393 |
| 42 | 58 | 2 454 543 686 | 1 960 042 | 7 000 500 103 | 101 409 | 27 943 |
| 43 | 59 | 2 453 893 848 | 1 561 182 | 7 000 607 528 | 73 192 | 33 645 |
| 44 | 60 | 2 453 989 634 | 1 629 949 | 7 000 556 378 | 74 704 | 34 821 |
| 45 | 61 | 2 453 879 092 | 1 551 181 | 7 000 561 022 | 70 233 | 36 191 |
| 46 | 62 | **3 003 015 120** | 7 001 772 138 | 348 243 | 1 000 073 404 | 35 333 |
| 47 | 63 | **4 004 092 512** | 7 002 359 576 | 380 452 | 2 000 097 711 | 50 376 |
| 48 | 64 | **2 234 898 441** | 109 006 411 | 7 893 398 716 | 109 108 | 35 075 |
| 49 | 65 | **3 003 182 414** | 7 001 843 757 | 357 954 | 2 000 075 494 | 36 281 |
| 50 | 66 | **3 003 280 054** | 7 001 876 384 | 358 097 | 2 000 075 630 | 39 301 |
| 51 | 67 | **3 004 086 641** | 7 002 384 321 | 307 480 | 2 000 114 067 | 32 242 |
| 52 | 68 | 2 461 587 458 | 15 841 141 | 6 986 174 099 | 70 725 | 29 985 |
| 53 | 69 | 2 454 704 936 | 2 019 734 | 7 000 530 774 | 123 110 | 32 717 |
| 54 | 70 | **2 629 777 063** | 639 698 105 | 6 362 945 524 | 121 313 | 29 648 |
| 55 | 71 | 2 452 517 518 | 21 196 356 | 6 980 899 385 | 5 689 504 | 27 618 |
| 56 | 72 | 2 457 056 675 | 79 539 769 | 6 922 550 909 | 23 953 203 | 32 238 |
| 57 | 73 | 2 453 966 239 | 1 486 894 | 7 000 608 597 | 72 506 | 36 799 |
| 58 | 74 | 2 461 391 665 | 53 426 497 | 6 948 932 999 | 13 034 546 | 37 883 |
| 59 | 75 | 2 454 091 521 | 1 537 438 | 7 000 613 720 | 73 256 | 38 003 |
| 60 | 76 | 2 550 237 671 | 312 611 365 | 6 689 536 750 | 62 278 250 | 41 078 |
| 61 | 77 | 2 454 371 129 | 1 915 411 | 7 000 545 114 | 107 086 | 30 133 |
| 62 | 78 | 2 462 015 450 | 32 874 270 | 6 969 244 698 | 5 296 338 | 37 506 |
| 63 | 79 | 2 453 810 530 | 1 588 073 | 7 000 489 720 | 70 291 | 36 915 |
| 64 | 80 | 2 453 510 981 | 1 521 322 | 7 000 384 678 | 67 219 | 30 114 |
| 65 | 81 | 2 454 659 220 | 1 531 897 | 7 001 004 411 | 74 567 | 41 201 |
| 66 | 82 | 2 453 984 834 | 1 570 182 | 7 000 624 664 | 72 914 | 39 483 |
| 67 | 83 | 2 454 127 882 | 1 638 057 | 7 000 590 289 | 75 623 | 33 755 |
| 68 | 84 | 2 453 781 071 | 1 575 812 | 7 000 535 270 | 74 337 | 34 094 |
| 69 | 85 | 2 453 947 163 | 1 595 272 | 7 000 545 139 | 71 584 | 38 966 |
| 70 | 86 | 2 453 948 945 | 1 594 376 | 7 000 552 806 | 71 096 | 34 265 |
| 71 | 87 | 2 453 888 591 | 1 540 673 | 7 000 536 024 | 71 123 | 33 350 |
| 72 | 88 | 2 453 838 422 | 1 539 740 | 7 000 540 957 | 71 776 | 33 191 |
| 73 | 89 | 2 454 013 271 | 1 532 577 | 7 000 534 226 | 69 794 | 32 287 |
| 74 | 90 | 2 453 959 044 | 1 549 283 | 7 000 562 495 | 71 483 | 35 739 |
| 75 | 91 | 2 454 357 932 | 2 062 771 | 7 000 290 377 | 111 481 | 28 864 |
| 76 | 92 | 2 454 258 445 | 1 937 218 | 7 000 338 810 | 101 760 | 27 475 |
| 77 | 93 | 2 454 156 149 | 1 738 764 | 7 000 400 563 | 82 207 | 38 130 |
| 78 | 94 | **3 003 245 905** | 7 001 947 715 | 356 496 | 1 000 078 668 | 38 983 |
| 79 | 95 | **4 003 498 969** | 7 002 106 621 | 361 236 | 1 000 087 167 | 41 197 |
| 80 | 96 | **3 003 440 683** | 7 001 915 914 | 340 975 | 1 000 081 844 | 36 174 |
| 81 | 97 | **3 003 192 020** | 7 001 848 864 | 354 371 | 1 000 076 474 | 37 465 |
| 82 | 98 | **3 004 231 542** | 7 002 423 726 | 327 973 | 1 000 119 668 | 34 498 |
| 83 | 99 | **3 003 204 122** | 7 001 869 410 | 341 860 | 1 000 075 913 | 34 005 |
| 84 | 100 | 2 453 903 936 | 1 509 757 | 7 000 577 662 | 70 586 | 38 383 |
| 85 | 101 | 2 454 444 592 | 1 649 275 | 7 000 764 725 | 76 185 | 37 481 |
| 86 | 102 | 2 455 551 786 | 2 094 483 | 7 000 919 108 | 115 683 | 33 599 |
| 87 | 103 | 2 454 090 830 | 1 644 299 | 7 000 554 367 | 76 131 | 37 986 |
| 88 | 104 | 2 452 263 286 | 1 982 058 | 7 000 594 326 | 105 011 | 32 747 |
| 89 | 105 | 2 453 938 066 | 1 552 994 | 7 000 560 184 | 71 781 | 38 307 |
| 90 | 106 | 2 453 839 657 | 1 591 329 | 7 000 534 174 | 71 493 | 32 464 |
| 91 | 107 | 2 456 284 290 | 1 721 752 | 7 001 608 059 | 87 228 | 62 810 |
| 92 | 108 | 2 453 706 579 | 1 577 941 | 7 000 431 429 | 70 517 | 33 684 |
| 93 | 109 | 2 453 714 638 | 1 484 598 | 7 000 514 337 | 66 443 | 34 239 |
| 94 | 110 | 2 453 814 023 | 1 619 443 | 7 000 418 813 | 74 924 | 34 831 |
| 95 | 111 | 2 453 734 759 | 1 502 260 | 7 000 447 611 | 66 790 | 36 660 |
| 96 | 112 | 2 456 304 117 | 1 636 949 | 7 001 903 454 | 87 894 | 45 984 |
| 97 | 113 | 2 454 764 375 | 2 032 245 | 7 000 503 166 | 111 873 | 36 308 |
| 98 | 114 | 2 453 930 372 | 1 641 970 | 7 000 527 807 | 75 164 | 36 817 |
| 99 | 115 | 2 453 596 195 | 1 577 533 | 7 000 528 820 | 74 424 | 35 428 |
| 100 | 116 | 2 453 774 301 | 1 490 781 | 7 000 546 047 | 71 040 | 31 462 |
| 101 | 117 | 2 453 808 290 | 1 472 783 | 7 000 563 094 | 68 497 | 30 214 |
| 102 | 118 | 2 453 927 668 | 1 578 700 | 7 000 547 988 | 72 499 | 36 894 |
| 103 | 119 | 2 453 881 334 | 1 538 221 | 7 000 556 688 | 73 651 | 38 630 |
| 104 | 120 | 2 454 620 311 | 2 049 316 | 7 000 459 876 | 110 210 | 30 452 |
| 105 | 121 | 2 453 793 013 | 1 553 815 | 7 000 448 812 | 70 690 | 35 146 |
| 106 | 122 | 2 453 516 549 | 1 477 303 | 7 000 369 210 | 66 462 | 32 381 |
| 107 | 123 | 2 453 679 941 | 1 558 433 | 7 000 399 585 | 71 027 | 37 700 |
| 108 | 124 | 2 453 984 832 | 1 591 183 | 7 000 558 547 | 74 810 | 32 532 |
| 109 | 125 | 2 453 972 231 | 1 585 644 | 7 000 573 173 | 73 159 | 39 583 |
| 110 | 126 | **3 003 167 043** | 7 001 793 152 | 341 345 | 1 000 076 047 | 41 811 |
| 111 | 127 | **4 004 031 670** | 7 002 344 014 | 394 950 | 2 000 094 647 | 42 345 |
| 112 | 128 | **2 017 184 284** | 2 397 032 | 7 999 676 604 | 97 555 | 23 614 |
| 113 | 129 | **3 003 231 942** | 7 001 876 887 | 355 548 | 2 000 078 108 | 35 462 |
| 114 | 130 | **3 003 073 797** | 7 001 763 748 | 343 879 | 2 000 073 914 | 36 604 |
| 115 | 131 | **3 003 066 183** | 7 001 799 239 | 334 265 | 2 000 076 089 | 37 578 |
| 116 | 132 | 2 459 437 822 | 11 831 880 | 6 990 241 198 | 69 673 | 31 901 |
| 117 | 133 | 2 453 833 994 | 1 520 407 | 7 000 579 352 | 72 385 | 39 387 |
| 118 | 134 | 2 453 582 104 | 1 508 309 | 7 000 462 005 | 70 623 | 30 954 |
| 119 | 135 | 2 453 607 456 | 1 520 805 | 7 000 426 804 | 69 833 | 35 969 |
| 120 | 136 | 2 453 516 773 | 218 632 117 | 6 783 760 256 | 64 474 484 | 29 161 |
| 121 | 137 | 2 454 656 532 | 2 135 434 | 7 000 368 481 | 121 168 | 29 070 |
| 122 | 138 | 2 464 943 252 | 76 396 888 | 6 926 141 929 | 18 701 369 | 29 401 |
| 123 | 139 | 2 454 713 076 | 1 945 881 | 7 000 526 215 | 113 113 | 32 864 |
| 124 | 140 | 2 459 197 278 | 17 602 061 | 6 984 668 329 | 3 270 690 | 39 930 |
| 125 | 141 | 2 453 811 452 | 1 546 333 | 7 000 539 142 | 71 850 | 32 204 |
| 126 | 142 | 2 453 943 973 | 1 557 203 | 7 000 570 909 | 74 167 | 34 542 |
| 127 | 143 | 2 453 989 607 | 1 490 927 | 7 000 599 022 | 67 774 | 32 994 |
| 128 | 144 | 2 455 332 089 | 1 619 032 | 7 001 303 644 | 83 418 | 43 983 |
Padding is how many bytes of padding were added before the loop. Jump offset is the alignment of the jump: the jump starts 16 bytes after the start of the loop, so its value is always padding+16 (the column just makes the pattern easier to see). Cycles is the number of cycles taken to execute the program. MITE uops is the number of uops delivered by the MITE, and DSB uops the number delivered by the DSB. DSB miss is the number of DSB misses, and DSB miss penalty is the number of penalty cycles due to DSB-to-MITE switches. These numbers were obtained with perf stat -e idq.dsb_uops,idq.mite_uops,frontend_retired.dsb_miss,dsb2mite_switches.penalty_cycles,cycles.
For the loop unrolled once, performance varies a lot depending on whether uops are delivered by the MITE or the DSB. However, for the same loop unrolled 4 times, exactly the same MITE/DSB pattern can be observed, yet it barely affects performance:
| Padding | Jump offset | Cycles | MITE uops | DSB uops | DSB miss | DSB miss penalty |
| ------- | ----------- | ----------------- | ------------- | ------------- | ------------- | ---------------- |
| 0 | 34 | 2 443 059 894 | 6 404 874 796 | 324 866 | 557 007 | 58 270 |
| 1 | 35 | 2 469 823 874 | 6 402 845 671 | 359 397 | 242 913 | 44 004 |
| 2 | 36 | 2 509 831 578 | 2 428 288 | 6 400 917 619 | 126 454 | 35 718 |
| 3 | 37 | 2 516 899 098 | 2 183 357 | 6 401 715 038 | 115 461 | 42 722 |
| 4 | 38 | 2 535 785 420 | 3 596 045 | 6 405 592 898 | 193 088 | 145 459 |
| 5 | 39 | 2 536 888 998 | 4 544 195 | 6 407 929 337 | 270 307 | 141 847 |
| 6 | 40 | 2 514 898 947 | 3 500 301 | 6 404 310 391 | 168 683 | 103 966 |
| 7 | 41 | 2 497 731 601 | 2 860 409 | 6 402 485 570 | 136 201 | 70 007 |
| 8 | 42 | 2 519 396 945 | 3 373 375 | 6 405 438 970 | 180 768 | 96 499 |
| 9 | 43 | 2 519 959 317 | 3 038 180 | 6 401 766 682 | 163 982 | 57 217 |
| 10 | 44 | 2 518 862 677 | 2 556 957 | 6 400 557 326 | 127 913 | 33 141 |
| 11 | 45 | 2 505 211 679 | 1 982 925 | 6 400 617 993 | 95 689 | 33 755 |
| 12 | 46 | 2 520 256 213 | 1 764 948 | 6 401 331 329 | 79 917 | 49 950 |
| 13 | 47 | 2 528 859 616 | 2 865 395 | 6 402 516 447 | 156 550 | 51 970 |
| 14 | 48 | 2 526 844 155 | 2 334 728 | 6 402 255 285 | 122 589 | 49 508 |
| 15 | 49 | 2 526 623 614 | 2 617 350 | 6 401 419 706 | 141 028 | 39 374 |
| 16 | 50 | 2 508 159 432 | 2 293 737 | 6 400 708 049 | 110 325 | 38 407 |
| 17 | 51 | 2 505 715 666 | 2 646 431 | 6 401 083 574 | 137 684 | 41 563 |
| 18 | 52 | 2 499 124 059 | 2 407 547 | 6 400 350 409 | 127 750 | 33 880 |
| 19 | 53 | 2 519 671 512 | 2 875 080 | 6 401 825 044 | 151 559 | 45 711 |
| 20 | 54 | 2 519 382 271 | 2 178 986 | 6 400 787 103 | 94 733 | 44 873 |
| 21 | 55 | 2 494 177 992 | 1 953 404 | 6 400 469 971 | 94 724 | 32 348 |
| 22 | 56 | 2 488 166 104 | 1 865 899 | 6 400 788 908 | 89 963 | 32 295 |
| 23 | 57 | 2 473 667 778 | 1 883 684 | 6 400 516 105 | 88 080 | 31 822 |
| 24 | 58 | 2 491 983 809 | 1 964 243 | 6 401 141 418 | 95 559 | 38 009 |
| 25 | 59 | 2 523 682 312 | 2 179 584 | 6 402 528 236 | 115 550 | 51 286 |
| 26 | 60 | 2 468 826 280 | 1 568 693 | 6 400 555 529 | 69 083 | 39 205 |
| 27 | 61 | 2 468 128 275 | 2 474 660 | 6 400 400 765 | 128 799 | 32 787 |
| 28 | 62 | 2 461 792 136 | 6 401 675 319 | 325 130 | 91 908 | 31 537 |
| 29 | 63 | 2 413 473 869 | 6 401 891 263 | 308 719 | 474 886 616 | 30 068 |
| 30 | 64 | 2 442 178 183 | 2 412 150 | 6 800 327 022 | 137 335 | 33 005 |
| 31 | 65 | 2 512 670 489 | 6 402 475 993 | 321 507 | 82 884 937 | 30 439 |
| 32 | 66 | 2 438 295 147 | 6 402 583 033 | 320 775 | 193 935 | 32 813 |
| 33 | 67 | 2 465 431 142 | 6 402 487 498 | 300 367 | 192 554 | 29 581 |
| 34 | 68 | 2 510 544 922 | 1 664 395 | 6 400 550 345 | 79 102 | 35 757 |
| 35 | 69 | 2 492 243 510 | 2 598 101 | 6 400 252 944 | 137 725 | 30 489 |
| 36 | 70 | 2 477 042 696 | 2 701 036 | 6 400 305 241 | 157 174 | 29 164 |
| 37 | 71 | 2 514 818 722 | 1 666 562 | 6 400 550 483 | 79 761 | 42 464 |
| 38 | 72 | 2 458 949 815 | 2 697 410 | 6 400 122 020 | 148 539 | 30 023 |
| 39 | 73 | 2 473 858 051 | 1 653 601 | 6 400 523 949 | 76 190 | 40 743 |
| 40 | 74 | 2 437 856 049 | 2 644 658 | 6 400 220 386 | 146 309 | 27 825 |
| 41 | 75 | 2 502 432 002 | 1 700 199 | 6 400 535 604 | 79 871 | 43 243 |
| 42 | 76 | 2 493 675 148 | 2 622 476 | 6 400 171 037 | 153 333 | 31 309 |
| 43 | 77 | 2 484 286 254 | 1 700 755 | 6 400 512 732 | 80 362 | 50 028 |
| 44 | 78 | 2 494 745 100 | 2 713 187 | 6 400 363 559 | 159 990 | 31 604 |
| 45 | 79 | 2 525 806 102 | 3 195 503 | 6 401 041 048 | 193 130 | 66 443 |
| 46 | 80 | 2 525 084 219 | 2 901 188 | 6 400 857 107 | 171 471 | 48 662 |
| 47 | 81 | 2 525 023 891 | 2 503 546 | 6 400 362 906 | 151 389 | 31 424 |
| 48 | 82 | 2 516 945 604 | 1 818 682 | 6 400 778 875 | 83 134 | 41 091 |
| 49 | 83 | 2 503 330 074 | 2 295 094 | 6 400 936 466 | 127 778 | 37 184 |
| 50 | 84 | 2 515 257 599 | 1 998 408 | 6 401 086 057 | 103 812 | 36 661 |
| 51 | 85 | 2 515 704 687 | 2 203 920 | 6 400 816 810 | 103 168 | 48 042 |
| 52 | 86 | 2 521 414 196 | 2 112 029 | 6 401 272 207 | 101 158 | 52 608 |
| 53 | 87 | 2 516 900 368 | 1 597 896 | 6 400 570 586 | 73 608 | 40 296 |
| 54 | 88 | 2 471 915 311 | 1 991 994 | 6 400 413 877 | 92 759 | 35 733 |
| 55 | 89 | 2 478 161 240 | 2 757 067 | 6 400 671 792 | 141 983 | 42 998 |
| 56 | 90 | 2 468 575 551 | 1 893 460 | 6 400 361 170 | 91 596 | 32 235 |
| 57 | 91 | 2 516 481 566 | 1 936 691 | 6 400 335 059 | 97 668 | 25 221 |
| 58 | 92 | 2 482 788 158 | 2 873 305 | 6 400 470 197 | 157 875 | 35 177 |
| 59 | 93 | 2 472 664 516 | 3 482 867 | 6 401 550 404 | 199 835 | 49 199 |
| 60 | 94 | 2 522 537 958 | 5 604 672 405 | 800 614 280 | 35 268 930 | 12 965 365 |
| 61 | 95 | 2 521 875 392 | 5 604 350 958 | 800 642 890 | 34 500 749 | 12 985 188 |
| 62 | 96 | 2 475 386 582 | 6 006 074 137 | 400 581 950 | 27 625 952 | 8 251 826 |
| 63 | 97 | 2 480 407 320 | 6 007 748 529 | 400 687 290 | 21 488 812 | 8 386 755 |
| 64 | 98 | 2 451 562 172 | 6 406 359 632 | 369 366 | 687 803 | 59 309 |
| 65 | 99 | 2 469 472 059 | 6 407 104 495 | 365 022 | 821 782 | 63 981 |
| 66 | 100 | 2 525 647 143 | 2 627 685 | 6 404 609 372 | 148 635 | 53 376 |
| 67 | 101 | 2 533 208 849 | 4 294 575 | 6 405 154 176 | 224 516 | 174 959 |
| 68 | 102 | 2 522 792 300 | 2 297 167 | 6 404 309 702 | 128 216 | 62 867 |
| 69 | 103 | 2 528 134 912 | 3 877 072 | 6 405 083 855 | 204 813 | 147 178 |
| 70 | 104 | 2 480 455 890 | 2 144 317 | 6 401 555 634 | 102 192 | 34 375 |
| 71 | 105 | 2 457 138 962 | 2 871 586 | 6 400 323 955 | 138 739 | 46 120 |
| 72 | 106 | 2 476 839 093 | 2 554 822 | 6 400 518 957 | 127 515 | 32 344 |
| 73 | 107 | 2 522 202 654 | 2 698 007 | 6 401 714 270 | 136 845 | 39 610 |
| 74 | 108 | 2 529 648 028 | 2 591 016 | 6 402 573 048 | 124 463 | 77 588 |
| 75 | 109 | 2 504 833 699 | 2 099 386 | 6 400 941 244 | 102 431 | 33 246 |
| 76 | 110 | 2 509 193 033 | 2 244 590 | 6 402 859 463 | 118 633 | 44 816 |
| 77 | 111 | 2 526 808 490 | 3 075 036 | 6 401 267 531 | 168 516 | 50 367 |
| 78 | 112 | 2 525 662 170 | 2 076 530 | 6 401 870 313 | 109 810 | 44 704 |
| 79 | 113 | 2 523 356 566 | 1 647 814 | 6 400 602 452 | 74 710 | 39 700 |
| 80 | 114 | 2 490 947 127 | 2 618 819 | 6 400 769 586 | 139 588 | 38 773 |
| 81 | 115 | 2 525 323 899 | 2 433 800 | 6 401 805 576 | 113 498 | 77 057 |
| 82 | 116 | 2 528 753 531 | 3 317 116 | 6 402 358 198 | 151 306 | 132 752 |
| 83 | 117 | 2 517 309 668 | 1 923 449 | 6 401 356 394 | 89 733 | 79 670 |
| 84 | 118 | 2 519 588 707 | 1 620 560 | 6 400 866 891 | 74 881 | 53 689 |
| 85 | 119 | 2 487 765 769 | 2 620 064 | 6 400 321 476 | 134 480 | 33 623 |
...
For both loops (the one unrolled once and the one unrolled 4 times), note the exception when the jump is aligned exactly on 64 bytes: in that case macro-fusion does not happen (documented in the Intel Optimization Manual, Section 2.5.2.1 SandyBridge Legacy Pipeline), and, for some reason, this causes uops to be delivered by the DSB rather than the MITE.
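The alignment window can be checked mechanically. One hypothesis worth testing (an assumption, not something the counters above establish) is the microcode mitigation Intel released for the Skylake-family "JCC erratum", under which a jump, or a macro-fused pair ending in a jump, is not cached in the DSB if it crosses a 32-byte boundary or ends on one. The sketch assumes the usual encodings of the unrolled-once loop (2-byte addl, 4-byte addq with an 8-bit immediate, 2-byte jne with an 8-bit displacement), which put the fused addq+jne pair at loop offsets 12 to 17, and lists the paddings for which that pair touches a 32-byte boundary.

```shell
#!/usr/bin/env bash
# Assumed byte layout of the unrolled-once loop:
#   6 x addl %edi,%reg -> 2 bytes each, loop offsets 0..11
#   addq $-2,%rsi      -> 4 bytes,      loop offsets 12..15
#   jne .LBB0_1 (rel8) -> 2 bytes,      loop offsets 16..17
# so the macro-fused addq+jne pair spans bytes [padding+12, padding+17].

# Succeeds if the inclusive byte range [first, last] crosses a 32-byte
# boundary or its last byte is the final byte of a 32-byte block.
touches_32b_boundary() {
  local first=$1 last=$2
  (( first / 32 != last / 32 || (last + 1) % 32 == 0 ))
}

# Prints, space-separated, every padding in [0, max] for which the fused
# pair touches a 32-byte boundary.
affected_paddings() {
  local max=$1 p out=()
  for ((p = 0; p <= max; p++)); do
    if touches_32b_boundary $((p + 12)) $((p + 17)); then
      out+=("$p")
    fi
  done
  echo "${out[*]}"
}

affected_paddings 128
```

It prints 14 15 16 17 18 19 46 47 48 49 50 51 78 79 80 81 82 83 110 111 112 113 114 115. Every row of the first table that takes 3 or 4 billion cycles is in this list; the two listed paddings that stay fast, 48 and 112, place the jne at offset 64 and 128, the case above where macro-fusion does not happen, and the jne on its own does not cross or end on a 32-byte boundary.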
Question: What causes uops to be delivered by the MITE rather than the DSB when the alignment of the jump instruction is close to 32 bytes?

Generate combinations of elements with echo

I need to write a simple script that generates all possible permutations (with repetition) of a set of elements stored in a variable, in groups of n elements (with n as a parameter). The easiest solution that came to mind was to use several nested loops, depending on the selected group length. But I thought it would be more elegant to take advantage of brace expansion, which lets the echo command generate combinations, that is:
echo {1,2}{1,2}
11 12 21 22
Using this method, I'm trying to find a general way to do it, taking as input parameters the list of elements (for example {1,2}) and the number of elements. It would be something like this:
set={1,2,3,4}
group=3
for ((i=0; i<$group; i++));
do
repetition=$set$repetition
done
So in this particular case, at the end of the loop the repetition variable has the value {1,2,3,4}{1,2,3,4}{1,2,3,4}. But I can't find a way to use this variable to produce the combinations with the echo command. I've tried several things, like:
echo $repetition
echo $(echo $repetition)
I'm stuck on this; I'd appreciate any tips or help.
You can use:
bash -c "echo $repetition"
111 112 113 114 121 122 123 124 131 132 133 134 141 142 143 144 211 212 213 214 221 222 223 224 231 232 233 234 241 242 243 244 311 312 313 314 321 322 323 324 331 332 333 334 341 342 343 344 411 412 413 414 421 422 423 424 431 432 433 434 441 442 443 444
Alternatively, use eval instead of bash -c: eval echo "$repetition"
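Building on the eval approach, the whole task from the question can be wrapped in one function that builds the brace pattern from a list of elements and a group size. A minimal sketch; sequences is a hypothetical name, and it assumes the elements contain no spaces, commas, braces or other shell metacharacters, because the pattern goes through eval. At least two elements are needed, since bash does not expand a brace group without a comma.

```shell
#!/usr/bin/env bash
# sequences N ELEMENT...  (hypothetical helper)
# Prints every length-N sequence (repetition allowed) over the elements by
# building a pattern such as {1,2}{1,2} and letting brace expansion run on it.
sequences() {
  local n=$1
  shift
  local IFS=,                 # makes "$*" below join the elements with commas
  local group="{$*}" pattern="" i
  for ((i = 0; i < n; i++)); do
    pattern+=$group           # e.g. {1,2,3,4}{1,2,3,4}{1,2,3,4} for n=3
  done
  eval "echo $pattern"        # the second parse performs the brace expansion
}
```

For example, sequences 2 1 2 prints 11 12 21 22, and sequences 3 1 2 3 4 prints the same 64 words as the bash -c answer.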
If you need k-combinations for all k, this combination script can help:
#!/bin/bash
POWER=$((2**$#))
BITS=`seq -f '0' -s '' 1 $#`
while [ $POWER -gt 1 ];do
POWER=$(($POWER-1))
BIN=`bc <<< "obase=2; $POWER"`
MASK=`echo $BITS | sed -e "s/0\{${#BIN}\}$/$BIN/" | grep -o .`
POS=1; AWK=`for M in $MASK;do
[ $M -eq 1 ] && echo -n "print \\$\${POS};"
POS=$(($POS+1))
done;echo`
awk -v ORS=" " "{$AWK}" <<< "$@" | sed 's/ $//'
done
Example:
./combination ⚪ ⛔ ⚫
⚪ ⛔ ⚫
⚪ ⛔
⚪ ⚫
⚪
⛔ ⚫
⛔
⚫
The empty set is there too, trust me.

Iterate over two items in bash

Given a file named integers that contains one integer per line, for instance:
1
39
77
109
137
169
197
229
261
293
One can iterate over the file using the following code:
while read a
do
echo "$a"
done < integers
However, I'm looking for an elegant solution in which the loop reads two integers at a time but advances by only one line per step, such that:
while #some funny commands
do
echo "$a | $b"
done < integers
results in:
1 | 39
39 | 77
77 | 109
109 | 137
137 | 169
169 | 197
197 | 229
229 | 261
261 | 293
{
read a
while read b; do
echo "$a | $b"
a=$b
done
} < file
Output:
1 | 39
39 | 77
77 | 109
109 | 137
137 | 169
169 | 197
197 | 229
229 | 261
261 | 293
Use a variable to store the previous value:
prev=
while read line; do
[[ ! -z $prev ]] && echo $prev "|" $line;
prev=$line;
done <file
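If a shell loop is not required, the same sliding window can be written in awk (an alternative to the loops above, not part of either answer): keep the previous line in a variable and, from the second line on, print it next to the current one. print_pairs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# print_pairs FILE  (hypothetical helper)
# For every line after the first, print "previous | current".
print_pairs() {
  awk 'NR > 1 { print prev " | " $0 } { prev = $0 }' "$1"
}
```

print_pairs integers prints the same nine lines as the loops above.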

find all values sequentially from given to target value in columns, bash

I have a data.txt file with six columns:
cat data.txt
17 245 1323 17.7777 10.2222 61.1111
19 232 1232 19.9999 19.9999 68.8888
13 133 1233 13.3333 13.3333 63.3333
17 177 1678 17.7777 17.7777 69.9999
12 122 2325 12.2222 11.333 64.4444
18 245 1323 18.8888 12.4444 68.8888
12 222 1222 12.2222 19.9999 61.1111
14 245 1323 14.4444 13.5555 68.8888
I would like to find all the distinct values in column 4 from 12.2222 to 18.8888, in ascending order.
Expected result:
echo ${minValsCol4[@]}
12.2222 13.3333 14.4444 17.7777 18.8888
And likewise the values in column 6 from 63.3333 to 68.8888.
Expected result:
echo ${minValsCol6[@]}
63.3333 64.4444 68.8888
Any solution in awk?
Thanks.
Using awk and sort -nu:
awk -v col=4 -v start=12.2222 -v end=18.8888 '$col>=start && $col<=end{
print $col}' data.txt | sort -nu
12.2222
13.3333
14.4444
17.7777
18.8888
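To end up with the bash arrays shown in the question (minValsCol4, minValsCol6), the awk | sort -nu pipeline above can be wrapped in a small function and read with mapfile (bash 4 or later). range_values is a hypothetical helper name; the sketch assumes whitespace-separated numeric columns and a locale whose decimal separator is a period, since sort -n honours the locale.

```shell
#!/usr/bin/env bash
# range_values FILE COL START END  (hypothetical helper)
# Prints the distinct values of column COL within [START, END], sorted numerically.
range_values() {
  local file=$1 col=$2 start=$3 end=$4
  awk -v col="$col" -v start="$start" -v end="$end" \
    '$col >= start && $col <= end { print $col }' "$file" | sort -nu
}
```

For example, mapfile -t minValsCol4 < <(range_values data.txt 4 12.2222 18.8888) followed by echo "${minValsCol4[@]}" prints 12.2222 13.3333 14.4444 17.7777 18.8888.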
