Open Text File and count how many times each IP is listed - powershell-4.0

I've got a text file with a lot of IP addresses. Each IP can appear in the file many times, so I'm trying to write (or get help writing) a PowerShell script that will do the following:
1 - Open the text file and count the number of times each IP is listed, and
2 - If the total for an IP is greater than 20, append that IP to another file called Banned.txt
After this script runs, I will manually delete the file (it is created by another program), and after a few days I will run this script again.
Here's my best attempt:
$BadIP = Get-Content -Path G:\ips.txt
foreach ($ip in $BadIP) {
$matches = Select-String -InputObject $BadIP -Pattern $ip -AllMatches
$a = $matches.Matches.Count
$ip + " " + $a
}
The above code works, but it shows the same IP again and again. The result looks like this:
197.3.11.26 1
188.128.119.172 1
64.71.18.150 1
212.92.107.105 51
95.213.162.132 1
212.92.107.105 51
212.92.107.105 51
212.92.123.202 48
64.74.185.234 1
212.92.107.105 51
212.92.107.105 51
212.92.115.227 45
212.92.123.202 48
212.92.107.105 51
212.92.115.227 45
212.92.123.202 48
As you can see, 212.92.107.105 is shown many times.
Update: Working code:
$BadIP = Get-Content -Path G:\ips.txt
foreach ($ip in $BadIP) {
$matches = Select-String -InputObject $BadIP -Pattern $ip -AllMatches
$a = $matches.Matches.Count
if ($a -gt 10) {
Add-Content -Path G:\NewBanned.txt -Value $ip -Force
}
}
$B = Get-Content -Path G:\NewBanned.txt | sort | Get-Unique
Set-Content -Path G:\C.txt -Value $B -Force
197.3.11.26
188.128.119.172
64.71.18.150
212.92.107.105
95.213.162.132
212.92.107.105
212.92.107.105
212.92.123.202
64.74.185.234
212.92.107.105
212.92.107.105
212.92.115.227
212.92.123.202
212.92.107.105
212.92.115.227
212.92.123.202
79.175.133.67
212.92.123.202
212.92.107.105
212.92.115.227
212.92.123.202
212.92.107.105
186.202.182.154
212.92.115.227
23.97.190.194
212.92.123.202
116.247.79.114
212.92.107.125
212.92.107.105
95.154.89.97
212.92.115.227
212.92.123.202
212.92.107.105
37.61.220.2
212.92.115.227
216.210.86.226
212.92.107.105
212.92.115.227
212.92.123.202
212.92.107.105
212.92.115.227
212.92.123.202
212.92.123.202
212.92.115.227
212.92.107.105
212.92.123.202
212.92.115.227
212.92.107.105
212.92.123.202
212.92.115.227
212.92.107.105
212.92.123.202
103.196.30.114
212.92.115.227
212.92.107.105
212.92.123.202
212.92.115.227
212.92.107.105
212.92.123.202
212.92.115.227
212.92.107.105
72.43.207.8
212.92.123.202
52.176.111.87
51.15.147.173
212.92.115.227
212.92.107.105
212.92.123.202
212.92.115.227
212.92.107.105
212.92.123.202
51.15.147.173
212.92.115.227
212.92.123.202
212.92.107.105
80.241.45.18
51.15.147.173
212.92.115.227
212.92.123.202
212.92.107.105
212.92.115.227
212.92.123.202
212.92.107.105
51.15.147.173
212.92.123.202
212.92.115.227
212.92.107.105
51.15.147.173
212.92.123.202
212.92.107.105
212.92.115.227
121.122.140.41
61.222.127.100
187.216.131.254
138.201.35.26
212.92.123.202
212.92.107.105
212.92.115.227
51.15.147.173
162.250.124.186
212.92.123.202
212.92.107.105
212.92.115.227
212.92.123.202
51.15.147.173
212.92.107.105
212.92.115.227
212.92.123.202
212.92.107.105
51.15.147.173
212.92.115.227
212.92.123.202
198.71.53.228
212.92.107.105
212.92.115.227
212.92.123.202
51.15.147.173
212.92.107.105
212.92.115.227
212.92.123.202
121.122.140.41
212.92.107.105
51.15.147.173
212.92.115.227
212.92.123.202
212.92.107.105
212.92.115.227
212.92.123.202
51.15.147.173
212.92.107.105
212.92.123.202
212.92.115.227
212.92.107.105
51.15.147.173
212.92.123.202
212.92.115.227
138.201.35.26
212.92.107.105
212.92.123.202
51.15.147.173
124.97.39.218
212.92.115.227
212.92.107.105
212.92.123.202
212.92.115.227
51.15.147.173
212.92.107.105
212.92.123.202
212.92.115.227
212.92.107.105
212.118.13.124
212.92.123.202
212.92.115.227
212.92.107.105
212.92.123.202
83.37.113.40
212.92.115.227
212.92.107.105
212.92.123.202
212.92.107.105
212.92.115.227
212.92.123.202
78.110.10.245
212.92.107.105
212.92.115.227
212.92.123.202
38.110.28.23
212.92.107.105
212.92.115.227
212.92.123.202
212.92.107.105
212.92.115.227
212.92.123.202
212.92.107.105
212.92.123.242
212.92.123.202
212.92.115.227
212.92.107.105
212.92.123.202
212.92.115.227
212.92.107.105
212.92.115.227
212.92.107.105
212.92.123.202
88.250.238.17
212.92.115.227
212.92.107.105
88.250.238.17
212.92.123.202
212.92.115.227
212.92.107.105
208.98.255.113
124.97.39.218
177.84.35.46
1.174.101.155

Let PowerShell do the heavy lifting for you. The Group-Object cmdlet groups identical input objects and automatically counts them:
Get-Content 'G:\NewBanned.txt' |
Group-Object |
Select-Object Name, Count
Output to a text file like this:
Get-Content 'G:\NewBanned.txt' |
Group-Object |
ForEach-Object { '{0} {1}' -f $_.Name, $_.Count } |
Set-Content 'G:\C.txt'
Output to a CSV file like this:
Get-Content 'G:\NewBanned.txt' |
Group-Object |
Select-Object @{n='IPAddress';e={$_.Name}}, Count |
Export-Csv 'G:\C.csv' -NoType
The calculated property in the latter is just to replace "Name" with "IPAddress" as the column title.
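The question also asks for a threshold step (keep only IPs seen more than 20 times), which the pipelines above do not cover. As a language-neutral illustration of the same group-and-count idea, here is a minimal Python sketch. The function name `ips_over_threshold` and its default threshold are assumptions made for this example and are not part of the original answer.

```python
from collections import Counter


def ips_over_threshold(lines, threshold=20):
    """Return the IPs that occur more than `threshold` times, sorted as strings."""
    # Count every non-empty line once; Counter plays the role of Group-Object.
    counts = Counter(line.strip() for line in lines if line.strip())
    # Keep only the IPs whose count is strictly greater than the threshold.
    return sorted(ip for ip, n in counts.items() if n > threshold)


if __name__ == "__main__":
    sample = ["212.92.107.105"] * 51 + ["197.3.11.26"]
    for ip in ips_over_threshold(sample):
        print(ip)
```

Feeding it the lines of the log file and appending the returned IPs to Banned.txt would cover both steps from the question.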


MITE (legacy pipeline) used instead of DSB (uops cache) when jump is not quite aligned on 32 bytes

This question used to be a part of this (now updated) question, but it seems like it should be another question, since it didn't help to get an answer to the other one.
My starting point is a loop doing 3 independent additions:
for (unsigned long i = 0; i < 2000000000; i++) {
asm volatile("" : "+r" (a), "+r" (b), "+r" (c), "+r" (d)); // prevents C compiler from optimizing out adds
a = a + d;
b = b + d;
c = c + d;
}
When this loop is not unrolled, it executes in 1 cycle per iteration (which is to be expected: it contains 4 instructions: the 3 additions and the macro-fused increment/jump, all of which can be executed in one cycle on ports 0, 1, 5 and 6). When the loop is unrolled, performance is surprising and tends to be 25% slower than the non-unrolled version, which is probably due to uop scheduling, as suggested in the comments of the previous question.
In this question, I'm not asking about performance, but rather about why, in some cases, uops come from the MITE (legacy pipeline) and, in other cases, from the DSB (uop cache). (Note that I'm using a Skylake with the LSD (Loop Stream Detector) disabled.)
Experimentally, when the jump is not quite aligned on 32 bytes, uops are issued from the MITE rather than the DSB. ("Not quite aligned on 32 bytes" really means from 2 bytes before to 3 bytes after a 32-byte boundary. Put another way, starting from a 32-byte aligned jump, it means adding 1 to 3 bytes of padding, or removing 1 or 2 bytes of padding.)
Compiling the C code above with Clang and (manually) unrolling it one time produces the following assembly code:
movl $2000000000, %esi
.p2align 4, 0x90
.LBB0_1:
addl %edi, %edx # 1
addl %edi, %ecx
addl %edi, %eax
addl %edi, %edx # 2
addl %edi, %ecx
addl %edi, %eax
addq $-2, %rsi
jne .LBB0_1
This code executes 2 cycles/iteration, as expected, and most uops are delivered by the DSB. Adding one byte of padding before the loop causes the loop to execute in 3 cycles/iteration, and all the uops are now delivered by the MITE.
In an effort to understand what is happening, I changed the align directive to .p2align 7 (thus aligning the loop on 128 bytes), and added some padding before the loop, thus changing the loop alignment. The results are as follows (long snippet ahead; explanations below):
| Padding | Jump offset | Cycles | MITE uops | DSB uops | DSB miss | DSB miss penalty |
| ------- | ----------- | ----------------- | ------------- | ------------- | ------------- | ---------------- |
| 0 | 16 | 2 453 942 151 | 1 589 440 | 7 000 531 761 | 73 681 | 33 419 |
| 1 | 17 | 2 454 623 799 | 2 002 088 | 7 000 493 234 | 107 433 | 28 686 |
| 2 | 18 | 2 454 010 264 | 1 611 181 | 7 000 580 070 | 72 372 | 34 963 |
| 3 | 19 | 2 455 016 743 | 1 531 428 | 7 001 271 720 | 76 240 | 42 493 |
| 4 | 20 | 2 454 056 088 | 1 592 150 | 7 000 571 537 | 71 691 | 29 677 |
| 5 | 21 | 2 455 111 497 | 1 701 204 | 7 001 068 440 | 85 117 | 41 744 |
| 6 | 22 | 2 454 558 860 | 2 081 244 | 7 000 362 980 | 105 388 | 29 829 |
| 7 | 23 | 2 454 351 179 | 1 765 720 | 7 000 472 785 | 81 903 | 39 022 |
| 8 | 24 | 2 454 470 296 | 2 045 062 | 7 000 337 694 | 107 763 | 30 750 |
| 9 | 25 | 2 454 395 853 | 1 748 525 | 7 000 560 730 | 82 773 | 37 030 |
| 10 | 26 | 2 453 920 970 | 1 500 801 | 7 000 562 016 | 70 144 | 36 559 |
| 11 | 27 | 2 453 748 551 | 1 485 784 | 7 000 530 064 | 66 535 | 32 019 |
| 12 | 28 | 2 453 973 841 | 1 601 708 | 7 000 562 754 | 72 601 | 31 970 |
| 13 | 29 | 2 454 749 106 | 2 085 092 | 7 000 539 751 | 109 862 | 30 977 |
| 14 | 30 | **3 003 289 033** | 7 001 845 873 | 358 240 | 1 000 075 874 | 37 506 |
| 15 | 31 | **4 003 748 994** | 7 002 171 254 | 372 672 | 1 000 086 939 | 39 679 |
| 16 | 32 | **3 003 810 021** | 7 002 294 170 | 295 736 | 1 000 114 704 | 28 974 |
| 17 | 33 | **3 002 912 972** | 7 001 752 747 | 350 755 | 1 000 071 698 | 32 249 |
| 18 | 34 | **3 003 392 542** | 7 001 941 076 | 360 439 | 1 000 076 887 | 45 663 |
| 19 | 35 | **3 003 040 266** | 7 001 759 091 | 343 693 | 1 000 072 685 | 38 703 |
| 20 | 36 | 2 453 764 603 | 1 511 899 | 7 000 546 442 | 66 912 | 32 996 |
| 21 | 37 | 2 454 889 754 | 1 946 579 | 7 000 713 787 | 102 922 | 31 852 |
| 22 | 38 | 2 454 700 423 | 1 961 612 | 7 000 581 288 | 100 281 | 30 364 |
| 23 | 39 | 2 454 398 236 | 1 974 415 | 7 000 350 258 | 103 015 | 30 855 |
| 24 | 40 | 2 452 285 702 | 1 562 028 | 7 000 416 473 | 67 622 | 38 783 |
| 25 | 41 | 2 454 500 700 | 2 013 917 | 7 000 384 154 | 102 906 | 31 165 |
| 26 | 42 | 2 454 666 446 | 1 928 032 | 7 000 572 245 | 99 613 | 35 813 |
| 27 | 43 | 2 453 929 241 | 1 565 110 | 7 000 588 419 | 70 027 | 31 336 |
| 28 | 44 | 2 453 852 431 | 1 595 897 | 7 000 633 247 | 71 735 | 35 984 |
| 29 | 45 | 2 454 664 111 | 2 039 338 | 7 000 534 894 | 105 225 | 30 043 |
| 30 | 46 | 2 454 523 184 | 1 876 338 | 7 000 592 928 | 88 020 | 48 456 |
| 31 | 47 | 2 454 091 130 | 1 560 821 | 7 000 631 532 | 70 150 | 37 773 |
| 32 | 48 | 2 453 813 400 | 1 535 557 | 7 000 556 686 | 70 196 | 33 268 |
| 33 | 49 | 2 453 772 578 | 1 501 716 | 7 000 526 938 | 67 747 | 33 492 |
| 34 | 50 | 2 455 308 730 | 1 643 047 | 7 001 287 728 | 80 148 | 43 035 |
| 35 | 51 | 2 453 790 620 | 1 506 869 | 7 000 529 450 | 66 903 | 35 315 |
| 36 | 52 | 2 453 509 109 | 1 534 817 | 7 000 405 227 | 67 344 | 30 526 |
| 37 | 53 | 2 453 516 412 | 1 469 184 | 7 000 430 367 | 65 040 | 30 686 |
| 38 | 54 | 2 453 851 033 | 1 556 722 | 7 000 581 363 | 69 098 | 36 605 |
| 39 | 55 | 2 454 916 648 | 2 089 549 | 7 000 572 462 | 111 448 | 30 435 |
| 40 | 56 | 2 455 089 502 | 1 991 232 | 7 000 799 155 | 104 559 | 30 724 |
| 41 | 57 | 2 454 744 425 | 2 002 307 | 7 000 532 096 | 105 221 | 32 393 |
| 42 | 58 | 2 454 543 686 | 1 960 042 | 7 000 500 103 | 101 409 | 27 943 |
| 43 | 59 | 2 453 893 848 | 1 561 182 | 7 000 607 528 | 73 192 | 33 645 |
| 44 | 60 | 2 453 989 634 | 1 629 949 | 7 000 556 378 | 74 704 | 34 821 |
| 45 | 61 | 2 453 879 092 | 1 551 181 | 7 000 561 022 | 70 233 | 36 191 |
| 46 | 62 | **3 003 015 120** | 7 001 772 138 | 348 243 | 1 000 073 404 | 35 333 |
| 47 | 63 | **4 004 092 512** | 7 002 359 576 | 380 452 | 2 000 097 711 | 50 376 |
| 48 | 64 | **2 234 898 441** | 109 006 411 | 7 893 398 716 | 109 108 | 35 075 |
| 49 | 65 | **3 003 182 414** | 7 001 843 757 | 357 954 | 2 000 075 494 | 36 281 |
| 50 | 66 | **3 003 280 054** | 7 001 876 384 | 358 097 | 2 000 075 630 | 39 301 |
| 51 | 67 | **3 004 086 641** | 7 002 384 321 | 307 480 | 2 000 114 067 | 32 242 |
| 52 | 68 | 2 461 587 458 | 15 841 141 | 6 986 174 099 | 70 725 | 29 985 |
| 53 | 69 | 2 454 704 936 | 2 019 734 | 7 000 530 774 | 123 110 | 32 717 |
| 54 | 70 | **2 629 777 063** | 639 698 105 | 6 362 945 524 | 121 313 | 29 648 |
| 55 | 71 | 2 452 517 518 | 21 196 356 | 6 980 899 385 | 5 689 504 | 27 618 |
| 56 | 72 | 2 457 056 675 | 79 539 769 | 6 922 550 909 | 23 953 203 | 32 238 |
| 57 | 73 | 2 453 966 239 | 1 486 894 | 7 000 608 597 | 72 506 | 36 799 |
| 58 | 74 | 2 461 391 665 | 53 426 497 | 6 948 932 999 | 13 034 546 | 37 883 |
| 59 | 75 | 2 454 091 521 | 1 537 438 | 7 000 613 720 | 73 256 | 38 003 |
| 60 | 76 | 2 550 237 671 | 312 611 365 | 6 689 536 750 | 62 278 250 | 41 078 |
| 61 | 77 | 2 454 371 129 | 1 915 411 | 7 000 545 114 | 107 086 | 30 133 |
| 62 | 78 | 2 462 015 450 | 32 874 270 | 6 969 244 698 | 5 296 338 | 37 506 |
| 63 | 79 | 2 453 810 530 | 1 588 073 | 7 000 489 720 | 70 291 | 36 915 |
| 64 | 80 | 2 453 510 981 | 1 521 322 | 7 000 384 678 | 67 219 | 30 114 |
| 65 | 81 | 2 454 659 220 | 1 531 897 | 7 001 004 411 | 74 567 | 41 201 |
| 66 | 82 | 2 453 984 834 | 1 570 182 | 7 000 624 664 | 72 914 | 39 483 |
| 67 | 83 | 2 454 127 882 | 1 638 057 | 7 000 590 289 | 75 623 | 33 755 |
| 68 | 84 | 2 453 781 071 | 1 575 812 | 7 000 535 270 | 74 337 | 34 094 |
| 69 | 85 | 2 453 947 163 | 1 595 272 | 7 000 545 139 | 71 584 | 38 966 |
| 70 | 86 | 2 453 948 945 | 1 594 376 | 7 000 552 806 | 71 096 | 34 265 |
| 71 | 87 | 2 453 888 591 | 1 540 673 | 7 000 536 024 | 71 123 | 33 350 |
| 72 | 88 | 2 453 838 422 | 1 539 740 | 7 000 540 957 | 71 776 | 33 191 |
| 73 | 89 | 2 454 013 271 | 1 532 577 | 7 000 534 226 | 69 794 | 32 287 |
| 74 | 90 | 2 453 959 044 | 1 549 283 | 7 000 562 495 | 71 483 | 35 739 |
| 75 | 91 | 2 454 357 932 | 2 062 771 | 7 000 290 377 | 111 481 | 28 864 |
| 76 | 92 | 2 454 258 445 | 1 937 218 | 7 000 338 810 | 101 760 | 27 475 |
| 77 | 93 | 2 454 156 149 | 1 738 764 | 7 000 400 563 | 82 207 | 38 130 |
| 78 | 94 | **3 003 245 905** | 7 001 947 715 | 356 496 | 1 000 078 668 | 38 983 |
| 79 | 95 | **4 003 498 969** | 7 002 106 621 | 361 236 | 1 000 087 167 | 41 197 |
| 80 | 96 | **3 003 440 683** | 7 001 915 914 | 340 975 | 1 000 081 844 | 36 174 |
| 81 | 97 | **3 003 192 020** | 7 001 848 864 | 354 371 | 1 000 076 474 | 37 465 |
| 82 | 98 | **3 004 231 542** | 7 002 423 726 | 327 973 | 1 000 119 668 | 34 498 |
| 83 | 99 | **3 003 204 122** | 7 001 869 410 | 341 860 | 1 000 075 913 | 34 005 |
| 84 | 100 | 2 453 903 936 | 1 509 757 | 7 000 577 662 | 70 586 | 38 383 |
| 85 | 101 | 2 454 444 592 | 1 649 275 | 7 000 764 725 | 76 185 | 37 481 |
| 86 | 102 | 2 455 551 786 | 2 094 483 | 7 000 919 108 | 115 683 | 33 599 |
| 87 | 103 | 2 454 090 830 | 1 644 299 | 7 000 554 367 | 76 131 | 37 986 |
| 88 | 104 | 2 452 263 286 | 1 982 058 | 7 000 594 326 | 105 011 | 32 747 |
| 89 | 105 | 2 453 938 066 | 1 552 994 | 7 000 560 184 | 71 781 | 38 307 |
| 90 | 106 | 2 453 839 657 | 1 591 329 | 7 000 534 174 | 71 493 | 32 464 |
| 91 | 107 | 2 456 284 290 | 1 721 752 | 7 001 608 059 | 87 228 | 62 810 |
| 92 | 108 | 2 453 706 579 | 1 577 941 | 7 000 431 429 | 70 517 | 33 684 |
| 93 | 109 | 2 453 714 638 | 1 484 598 | 7 000 514 337 | 66 443 | 34 239 |
| 94 | 110 | 2 453 814 023 | 1 619 443 | 7 000 418 813 | 74 924 | 34 831 |
| 95 | 111 | 2 453 734 759 | 1 502 260 | 7 000 447 611 | 66 790 | 36 660 |
| 96 | 112 | 2 456 304 117 | 1 636 949 | 7 001 903 454 | 87 894 | 45 984 |
| 97 | 113 | 2 454 764 375 | 2 032 245 | 7 000 503 166 | 111 873 | 36 308 |
| 98 | 114 | 2 453 930 372 | 1 641 970 | 7 000 527 807 | 75 164 | 36 817 |
| 99 | 115 | 2 453 596 195 | 1 577 533 | 7 000 528 820 | 74 424 | 35 428 |
| 100 | 116 | 2 453 774 301 | 1 490 781 | 7 000 546 047 | 71 040 | 31 462 |
| 101 | 117 | 2 453 808 290 | 1 472 783 | 7 000 563 094 | 68 497 | 30 214 |
| 102 | 118 | 2 453 927 668 | 1 578 700 | 7 000 547 988 | 72 499 | 36 894 |
| 103 | 119 | 2 453 881 334 | 1 538 221 | 7 000 556 688 | 73 651 | 38 630 |
| 104 | 120 | 2 454 620 311 | 2 049 316 | 7 000 459 876 | 110 210 | 30 452 |
| 105 | 121 | 2 453 793 013 | 1 553 815 | 7 000 448 812 | 70 690 | 35 146 |
| 106 | 122 | 2 453 516 549 | 1 477 303 | 7 000 369 210 | 66 462 | 32 381 |
| 107 | 123 | 2 453 679 941 | 1 558 433 | 7 000 399 585 | 71 027 | 37 700 |
| 108 | 124 | 2 453 984 832 | 1 591 183 | 7 000 558 547 | 74 810 | 32 532 |
| 109 | 125 | 2 453 972 231 | 1 585 644 | 7 000 573 173 | 73 159 | 39 583 |
| 110 | 126 | **3 003 167 043** | 7 001 793 152 | 341 345 | 1 000 076 047 | 41 811 |
| 111 | 127 | **4 004 031 670** | 7 002 344 014 | 394 950 | 2 000 094 647 | 42 345 |
| 112 | 128 | **2 017 184 284** | 2 397 032 | 7 999 676 604 | 97 555 | 23 614 |
| 113 | 129 | **3 003 231 942** | 7 001 876 887 | 355 548 | 2 000 078 108 | 35 462 |
| 114 | 130 | **3 003 073 797** | 7 001 763 748 | 343 879 | 2 000 073 914 | 36 604 |
| 115 | 131 | **3 003 066 183** | 7 001 799 239 | 334 265 | 2 000 076 089 | 37 578 |
| 116 | 132 | 2 459 437 822 | 11 831 880 | 6 990 241 198 | 69 673 | 31 901 |
| 117 | 133 | 2 453 833 994 | 1 520 407 | 7 000 579 352 | 72 385 | 39 387 |
| 118 | 134 | 2 453 582 104 | 1 508 309 | 7 000 462 005 | 70 623 | 30 954 |
| 119 | 135 | 2 453 607 456 | 1 520 805 | 7 000 426 804 | 69 833 | 35 969 |
| 120 | 136 | 2 453 516 773 | 218 632 117 | 6 783 760 256 | 64 474 484 | 29 161 |
| 121 | 137 | 2 454 656 532 | 2 135 434 | 7 000 368 481 | 121 168 | 29 070 |
| 122 | 138 | 2 464 943 252 | 76 396 888 | 6 926 141 929 | 18 701 369 | 29 401 |
| 123 | 139 | 2 454 713 076 | 1 945 881 | 7 000 526 215 | 113 113 | 32 864 |
| 124 | 140 | 2 459 197 278 | 17 602 061 | 6 984 668 329 | 3 270 690 | 39 930 |
| 125 | 141 | 2 453 811 452 | 1 546 333 | 7 000 539 142 | 71 850 | 32 204 |
| 126 | 142 | 2 453 943 973 | 1 557 203 | 7 000 570 909 | 74 167 | 34 542 |
| 127 | 143 | 2 453 989 607 | 1 490 927 | 7 000 599 022 | 67 774 | 32 994 |
| 128 | 144 | 2 455 332 089 | 1 619 032 | 7 001 303 644 | 83 418 | 43 983 |
Padding represents how many bytes of padding were added before the loop. Jump offset represents the alignment of the jump: it occurs 16 bytes after the start of the loop, and its value is thus always padding+16 (but having a column for it helps with visualizing). Cycles is the number of cycles taken to execute the program. MITE uops is the number of uops delivered by the MITE. DSB uops is the number of uops delivered by the DSB. DSB miss is the number of DSB misses. DSB miss penalty is the number of penalty cycles due to DSB-to-MITE switches. Those numbers were obtained using perf stat -e idq.dsb_uops,idq.mite_uops,frontend_retired.dsb_miss,dsb2mite_switches.penalty_cycles,cycles.
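To make the pattern in the table above easier to check, here is a small Python sketch that encodes it as a predicate. It only describes the measurements for the loop unrolled once (MITE delivery when the jump sits from 2 bytes before to 3 bytes after a 32-byte boundary, except when it is exactly 64-byte aligned) and does not explain them. The function names and the `loop_body_bytes` parameter are assumptions made for this illustration.

```python
def jump_offset(padding, loop_body_bytes=16):
    """Offset of the jump: it sits 16 bytes after the (padded) loop start."""
    return padding + loop_body_bytes


def in_slow_window(offset):
    """True if the table shows MITE delivery (3 to 4 billion cycles) at this offset."""
    # Observed exception: a jump aligned exactly on 64 bytes stays on the DSB.
    if offset % 64 == 0:
        return False
    # Observed window: 2 bytes before to 3 bytes after a 32-byte boundary.
    return offset % 32 in (30, 31, 0, 1, 2, 3)


if __name__ == "__main__":
    slow = [jump_offset(p) for p in range(129) if in_slow_window(jump_offset(p))]
    print(slow)
```

The offsets it reports match the rows with roughly 3 or 4 billion cycles; the milder slowdowns at a few other offsets (for example 70) are not modeled.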
In the case of the loop unrolled once, performance varies quite a lot depending on whether uops are delivered by the MITE or the DSB. However, in the case of the same loop unrolled 4 times, the exact same MITE/DSB pattern can be observed, but it barely affects performance:
| Padding | Jump offset | Cycles | MITE uops | DSB uops | DSB miss | DSB miss penalty |
| ------- | ----------- | ----------------- | ------------- | ------------- | ------------- | ---------------- |
| 0 | 34 | 2 443 059 894 | 6 404 874 796 | 324 866 | 557 007 | 58 270 |
| 1 | 35 | 2 469 823 874 | 6 402 845 671 | 359 397 | 242 913 | 44 004 |
| 2 | 36 | 2 509 831 578 | 2 428 288 | 6 400 917 619 | 126 454 | 35 718 |
| 3 | 37 | 2 516 899 098 | 2 183 357 | 6 401 715 038 | 115 461 | 42 722 |
| 4 | 38 | 2 535 785 420 | 3 596 045 | 6 405 592 898 | 193 088 | 145 459 |
| 5 | 39 | 2 536 888 998 | 4 544 195 | 6 407 929 337 | 270 307 | 141 847 |
| 6 | 40 | 2 514 898 947 | 3 500 301 | 6 404 310 391 | 168 683 | 103 966 |
| 7 | 41 | 2 497 731 601 | 2 860 409 | 6 402 485 570 | 136 201 | 70 007 |
| 8 | 42 | 2 519 396 945 | 3 373 375 | 6 405 438 970 | 180 768 | 96 499 |
| 9 | 43 | 2 519 959 317 | 3 038 180 | 6 401 766 682 | 163 982 | 57 217 |
| 10 | 44 | 2 518 862 677 | 2 556 957 | 6 400 557 326 | 127 913 | 33 141 |
| 11 | 45 | 2 505 211 679 | 1 982 925 | 6 400 617 993 | 95 689 | 33 755 |
| 12 | 46 | 2 520 256 213 | 1 764 948 | 6 401 331 329 | 79 917 | 49 950 |
| 13 | 47 | 2 528 859 616 | 2 865 395 | 6 402 516 447 | 156 550 | 51 970 |
| 14 | 48 | 2 526 844 155 | 2 334 728 | 6 402 255 285 | 122 589 | 49 508 |
| 15 | 49 | 2 526 623 614 | 2 617 350 | 6 401 419 706 | 141 028 | 39 374 |
| 16 | 50 | 2 508 159 432 | 2 293 737 | 6 400 708 049 | 110 325 | 38 407 |
| 17 | 51 | 2 505 715 666 | 2 646 431 | 6 401 083 574 | 137 684 | 41 563 |
| 18 | 52 | 2 499 124 059 | 2 407 547 | 6 400 350 409 | 127 750 | 33 880 |
| 19 | 53 | 2 519 671 512 | 2 875 080 | 6 401 825 044 | 151 559 | 45 711 |
| 20 | 54 | 2 519 382 271 | 2 178 986 | 6 400 787 103 | 94 733 | 44 873 |
| 21 | 55 | 2 494 177 992 | 1 953 404 | 6 400 469 971 | 94 724 | 32 348 |
| 22 | 56 | 2 488 166 104 | 1 865 899 | 6 400 788 908 | 89 963 | 32 295 |
| 23 | 57 | 2 473 667 778 | 1 883 684 | 6 400 516 105 | 88 080 | 31 822 |
| 24 | 58 | 2 491 983 809 | 1 964 243 | 6 401 141 418 | 95 559 | 38 009 |
| 25 | 59 | 2 523 682 312 | 2 179 584 | 6 402 528 236 | 115 550 | 51 286 |
| 26 | 60 | 2 468 826 280 | 1 568 693 | 6 400 555 529 | 69 083 | 39 205 |
| 27 | 61 | 2 468 128 275 | 2 474 660 | 6 400 400 765 | 128 799 | 32 787 |
| 28 | 62 | 2 461 792 136 | 6 401 675 319 | 325 130 | 91 908 | 31 537 |
| 29 | 63 | 2 413 473 869 | 6 401 891 263 | 308 719 | 474 886 616 | 30 068 |
| 30 | 64 | 2 442 178 183 | 2 412 150 | 6 800 327 022 | 137 335 | 33 005 |
| 31 | 65 | 2 512 670 489 | 6 402 475 993 | 321 507 | 82 884 937 | 30 439 |
| 32 | 66 | 2 438 295 147 | 6 402 583 033 | 320 775 | 193 935 | 32 813 |
| 33 | 67 | 2 465 431 142 | 6 402 487 498 | 300 367 | 192 554 | 29 581 |
| 34 | 68 | 2 510 544 922 | 1 664 395 | 6 400 550 345 | 79 102 | 35 757 |
| 35 | 69 | 2 492 243 510 | 2 598 101 | 6 400 252 944 | 137 725 | 30 489 |
| 36 | 70 | 2 477 042 696 | 2 701 036 | 6 400 305 241 | 157 174 | 29 164 |
| 37 | 71 | 2 514 818 722 | 1 666 562 | 6 400 550 483 | 79 761 | 42 464 |
| 38 | 72 | 2 458 949 815 | 2 697 410 | 6 400 122 020 | 148 539 | 30 023 |
| 39 | 73 | 2 473 858 051 | 1 653 601 | 6 400 523 949 | 76 190 | 40 743 |
| 40 | 74 | 2 437 856 049 | 2 644 658 | 6 400 220 386 | 146 309 | 27 825 |
| 41 | 75 | 2 502 432 002 | 1 700 199 | 6 400 535 604 | 79 871 | 43 243 |
| 42 | 76 | 2 493 675 148 | 2 622 476 | 6 400 171 037 | 153 333 | 31 309 |
| 43 | 77 | 2 484 286 254 | 1 700 755 | 6 400 512 732 | 80 362 | 50 028 |
| 44 | 78 | 2 494 745 100 | 2 713 187 | 6 400 363 559 | 159 990 | 31 604 |
| 45 | 79 | 2 525 806 102 | 3 195 503 | 6 401 041 048 | 193 130 | 66 443 |
| 46 | 80 | 2 525 084 219 | 2 901 188 | 6 400 857 107 | 171 471 | 48 662 |
| 47 | 81 | 2 525 023 891 | 2 503 546 | 6 400 362 906 | 151 389 | 31 424 |
| 48 | 82 | 2 516 945 604 | 1 818 682 | 6 400 778 875 | 83 134 | 41 091 |
| 49 | 83 | 2 503 330 074 | 2 295 094 | 6 400 936 466 | 127 778 | 37 184 |
| 50 | 84 | 2 515 257 599 | 1 998 408 | 6 401 086 057 | 103 812 | 36 661 |
| 51 | 85 | 2 515 704 687 | 2 203 920 | 6 400 816 810 | 103 168 | 48 042 |
| 52 | 86 | 2 521 414 196 | 2 112 029 | 6 401 272 207 | 101 158 | 52 608 |
| 53 | 87 | 2 516 900 368 | 1 597 896 | 6 400 570 586 | 73 608 | 40 296 |
| 54 | 88 | 2 471 915 311 | 1 991 994 | 6 400 413 877 | 92 759 | 35 733 |
| 55 | 89 | 2 478 161 240 | 2 757 067 | 6 400 671 792 | 141 983 | 42 998 |
| 56 | 90 | 2 468 575 551 | 1 893 460 | 6 400 361 170 | 91 596 | 32 235 |
| 57 | 91 | 2 516 481 566 | 1 936 691 | 6 400 335 059 | 97 668 | 25 221 |
| 58 | 92 | 2 482 788 158 | 2 873 305 | 6 400 470 197 | 157 875 | 35 177 |
| 59 | 93 | 2 472 664 516 | 3 482 867 | 6 401 550 404 | 199 835 | 49 199 |
| 60 | 94 | 2 522 537 958 | 5 604 672 405 | 800 614 280 | 35 268 930 | 12 965 365 |
| 61 | 95 | 2 521 875 392 | 5 604 350 958 | 800 642 890 | 34 500 749 | 12 985 188 |
| 62 | 96 | 2 475 386 582 | 6 006 074 137 | 400 581 950 | 27 625 952 | 8 251 826 |
| 63 | 97 | 2 480 407 320 | 6 007 748 529 | 400 687 290 | 21 488 812 | 8 386 755 |
| 64 | 98 | 2 451 562 172 | 6 406 359 632 | 369 366 | 687 803 | 59 309 |
| 65 | 99 | 2 469 472 059 | 6 407 104 495 | 365 022 | 821 782 | 63 981 |
| 66 | 100 | 2 525 647 143 | 2 627 685 | 6 404 609 372 | 148 635 | 53 376 |
| 67 | 101 | 2 533 208 849 | 4 294 575 | 6 405 154 176 | 224 516 | 174 959 |
| 68 | 102 | 2 522 792 300 | 2 297 167 | 6 404 309 702 | 128 216 | 62 867 |
| 69 | 103 | 2 528 134 912 | 3 877 072 | 6 405 083 855 | 204 813 | 147 178 |
| 70 | 104 | 2 480 455 890 | 2 144 317 | 6 401 555 634 | 102 192 | 34 375 |
| 71 | 105 | 2 457 138 962 | 2 871 586 | 6 400 323 955 | 138 739 | 46 120 |
| 72 | 106 | 2 476 839 093 | 2 554 822 | 6 400 518 957 | 127 515 | 32 344 |
| 73 | 107 | 2 522 202 654 | 2 698 007 | 6 401 714 270 | 136 845 | 39 610 |
| 74 | 108 | 2 529 648 028 | 2 591 016 | 6 402 573 048 | 124 463 | 77 588 |
| 75 | 109 | 2 504 833 699 | 2 099 386 | 6 400 941 244 | 102 431 | 33 246 |
| 76 | 110 | 2 509 193 033 | 2 244 590 | 6 402 859 463 | 118 633 | 44 816 |
| 77 | 111 | 2 526 808 490 | 3 075 036 | 6 401 267 531 | 168 516 | 50 367 |
| 78 | 112 | 2 525 662 170 | 2 076 530 | 6 401 870 313 | 109 810 | 44 704 |
| 79 | 113 | 2 523 356 566 | 1 647 814 | 6 400 602 452 | 74 710 | 39 700 |
| 80 | 114 | 2 490 947 127 | 2 618 819 | 6 400 769 586 | 139 588 | 38 773 |
| 81 | 115 | 2 525 323 899 | 2 433 800 | 6 401 805 576 | 113 498 | 77 057 |
| 82 | 116 | 2 528 753 531 | 3 317 116 | 6 402 358 198 | 151 306 | 132 752 |
| 83 | 117 | 2 517 309 668 | 1 923 449 | 6 401 356 394 | 89 733 | 79 670 |
| 84 | 118 | 2 519 588 707 | 1 620 560 | 6 400 866 891 | 74 881 | 53 689 |
| 85 | 119 | 2 487 765 769 | 2 620 064 | 6 400 321 476 | 134 480 | 33 623 |
...
For both loops (the one unrolled once, and the one unrolled 4 times), note the exception when the jump is aligned exactly on 64 bytes: in such cases, macro-fusion does not happen (documented in Intel Optimization manual, Section 2.5.2.1 SandyBridge Legacy Pipeline), and for some reason, this causes uops to be delivered by the DSB rather than the MITE, as the DSB uops column shows for those rows.
Question: What causes uops to be delivered by the MITE rather than the DSB when the alignment of the jump instruction is close to 32 bytes?

How to convert this to a file?

I am making a request to an external API; this route should return a PDF file. The problem is that I don't know how to convert the response I'm getting into a PDF file. I'm using Laravel, and I want to convert it to something more readable for the front end. Any suggestions?
%PDF-1.4
%��͵
1 0 obj << /Type /Catalog /PageLayout /SinglePage /PageMode /UseNone /Pages 2 0 R /ViewerPreferences << /NonFullScreenPageMode /UseNone >> >> endobj
2 0 obj << /Type /Pages /Count 3 /Kids [ 16 0 R 22 0 R 28 0 R ] /Resources 3 0 R >> endobj
3 0 obj << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >> endobj
4 0 obj << /Producer (PDF::API2 2.026 [linux]) >> endobj
5 0 obj << /Type /Font /Subtype /TrueType /BaseFont /Verdana,Bold /Encoding << /Type /Encoding /BaseEncoding /WinAnsiEncoding /Differences [ 0 /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /space /exclam /quotedbl /numbersign /dollar /percent /ampersand /quotesingle /parenleft /parenright /asterisk /plus /comma /hyphen /period /slash /zero /one /two /three /four /five /six /seven /eight /nine /colon /semicolon /less /equal /greater /question /at /A /B /C /D /E /F /G /H /I /J /K /L /M /N /O /P /Q /R /S /T /U /V /W /X /Y /Z /bracketleft /backslash /bracketright /asciicircum /underscore /grave /a /b /c /d /e /f /g /h /i /j /k /l /m /n /o /p /q /r /s /t /u /v /w /x /y /z /braceleft /bar /braceright /asciitilde /bullet /Euro /bullet /quotesinglbase /florin /quotedblbase /ellipsis /dagger /daggerdbl /circumflex /perthousand /Scaron /guilsinglleft /OE /bullet /Zcaron /bullet /bullet /quoteleft /quoteright /quotedblleft /quotedblright /bullet /endash /emdash /tilde /trademark /scaron /guilsinglright /oe /bullet /zcaron /Ydieresis /space /exclamdown /cent /sterling /currency /yen /brokenbar /section /dieresis /copyright /ordfeminine /guillemotleft /logicalnot /hyphen /registered /macron /degree /plusminus /twosuperior /threesuperior /acute /mu /paragraph /periodcentered /cedilla /onesuperior /ordmasculine /guillemotright /onequarter /onehalf /threequarters /questiondown /Agrave /Aacute /Acircumflex /Atilde /Adieresis /Aring /AE /Ccedilla /Egrave /Eacute /Ecircumflex /Edieresis /Igrave /Iacute /Icircumflex /Idieresis /Eth /Ntilde /Ograve /Oacute /Ocircumflex /Otilde /Odieresis /multiply /Oslash /Ugrave /Uacute /Ucircumflex /Udieresis /Yacute /Thorn /germandbls /agrave /aacute /acircumflex /atilde /adieresis /aring /ae /ccedilla /egrave 
/eacute /ecircumflex /edieresis /igrave /iacute /icircumflex /idieresis /eth /ntilde /ograve /oacute /ocircumflex /otilde /odieresis /divide /oslash /ugrave /uacute /ucircumflex /udieresis /yacute /thorn /ydieresis ] >> /FirstChar 32 /FontDescriptor 6 0 R /LastChar 255 /Name /VeBoCDW~1532540977 /Widths [ 341 402 587 867 710 1271 862 332 543 543 710 867 361 479 361 689 710 710 710 710 710 710 710 710 710 710 402 402 867 867 867 616 963 776 761 723 830 683 650 811 837 545 555 770 637 947 846 850 732 850 782 710 681 812 763 1128 763 736 691 543 689 543 867 710 710 667 699 588 699 664 422 699 712 341 402 670 341 1058 712 686 699 699 497 593 455 712 649 979 668 650 596 710 543 710 867 710 710 710 332 710 587 1048 710 710 710 1777 710 543 1135 710 691 710 710 332 332 587 587 710 710 1000 710 963 593 543 1067 710 596 736 341 402 710 710 710 710 543 710 710 963 597 849 867 479 963 710 587 867 597 597 710 721 710 361 710 597 597 849 1181 1181 1181 616 776 776 776 776 776 776 1093 723 683 683 683 683 545 545 545 545 830 846 850 850 850 850 850 867 850 812 812 812 812 736 734 712 667 667 667 667 667 667 1018 588 664 664 664 664 341 341 341 341 679 712 686 686 686 686 686 867 686 712 712 712 712 650 699 650 ] >> endobj
6 0 obj << /Type /FontDescriptor /Ascent 1005 /AvgWidth 625 /CapHeight 727 /Descent -209 /Flags 262176 /FontBBox [ -73 -207 1707 1000 ] /FontName /Verdana,Bold /ItalicAngle 0 /MaxWidth 1707 /MissingWidth 300 /StemH 0 /StemV 0 /XHeight 548 >> endobj
7 0 obj << /Type /Font /Subtype /TrueType /BaseFont /Verdana /Encoding << /Type /Encoding /BaseEncoding /WinAnsiEncoding /Differences [ 0 /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /space /exclam /quotedbl /numbersign /dollar /percent /ampersand /quotesingle /parenleft /parenright /asterisk /plus /comma /hyphen /period /slash /zero /one /two /three /four /five /six /seven /eight /nine /colon /semicolon /less /equal /greater /question /at /A /B /C /D /E /F /G /H /I /J /K /L /M /N /O /P /Q /R /S /T /U /V /W /X /Y /Z /bracketleft /backslash /bracketright /asciicircum /underscore /grave /a /b /c /d /e /f /g /h /i /j /k /l /m /n /o /p /q /r /s /t /u /v /w /x /y /z /braceleft /bar /braceright /asciitilde /bullet /Euro /bullet /quotesinglbase /florin /quotedblbase /ellipsis /dagger /daggerdbl /circumflex /perthousand /Scaron /guilsinglleft /OE /bullet /Zcaron /bullet /bullet /quoteleft /quoteright /quotedblleft /quotedblright /bullet /endash /emdash /tilde /trademark /scaron /guilsinglright /oe /bullet /zcaron /Ydieresis /space /exclamdown /cent /sterling /currency /yen /brokenbar /section /dieresis /copyright /ordfeminine /guillemotleft /logicalnot /hyphen /registered /macron /degree /plusminus /twosuperior /threesuperior /acute /mu /paragraph /periodcentered /cedilla /onesuperior /ordmasculine /guillemotright /onequarter /onehalf /threequarters /questiondown /Agrave /Aacute /Acircumflex /Atilde /Adieresis /Aring /AE /Ccedilla /Egrave /Eacute /Ecircumflex /Edieresis /Igrave /Iacute /Icircumflex /Idieresis /Eth /Ntilde /Ograve /Oacute /Ocircumflex /Otilde /Odieresis /multiply /Oslash /Ugrave /Uacute /Ucircumflex /Udieresis /Yacute /Thorn /germandbls /agrave /aacute /acircumflex /atilde /adieresis /aring /ae /ccedilla /egrave 
/eacute /ecircumflex /edieresis /igrave /iacute /icircumflex /idieresis /eth /ntilde /ograve /oacute /ocircumflex /otilde /odieresis /divide /oslash /ugrave /uacute /ucircumflex /udieresis /yacute /thorn /ydieresis ] >> /FirstChar 32 /FontDescriptor 8 0 R /LastChar 255 /Name /VerdCDX~1532540977 /Widths [ 351 393 458 818 635 1076 726 268 454 454 635 818 363 454 363 454 635 635 635 635 635 635 635 635 635 635 454 454 818 818 818 545 1000 683 685 698 770 632 574 775 751 420 454 692 556 842 748 787 603 787 695 683 616 731 683 988 685 615 685 454 454 454 818 635 635 600 623 520 623 595 351 623 632 274 344 591 274 972 632 606 623 623 426 520 394 632 591 818 591 591 525 634 454 634 818 545 635 545 268 635 458 818 635 635 635 1521 683 454 1069 545 685 545 545 268 268 458 458 545 635 1000 635 976 520 454 981 545 525 615 351 393 635 635 635 635 454 635 635 1000 545 644 818 454 1000 635 541 818 541 541 635 639 635 363 635 541 545 644 1000 1000 1000 545 683 683 683 683 683 683 984 698 632 632 632 632 420 420 420 420 775 748 787 787 787 787 787 818 787 731 731 731 731 615 605 620 600 600 600 600 600 600 955 520 595 595 595 595 274 274 274 274 611 632 606 606 606 606 606 818 606 632 632 632 632 591 623 591 ] >> endobj
8 0 obj << /Type /FontDescriptor /Ascent 1005 /AvgWidth 563 /CapHeight 727 /Descent -209 /Flags 40 /FontBBox [ -49 -206 1446 1000 ] /FontName /Verdana /ItalicAngle 0 /MaxWidth 1446 /MissingWidth 300 /StemH 0 /StemV 0 /XHeight 545 >> endobj
9 0 obj << /Type /XObject /Subtype /Image /BitsPerComponent 1 /ColorSpace [ /Indexed /DeviceRGB 1 10 0 R ] /DecodeParms [ << /BitsPerComponent 1 /Colors 1 /Columns 118 /Predictor 15 >> ] /Filter [ /FlateDecode ] /Height 50 /Length 104 /Name /PxCDY /Width 118 >> stream
(�cX��u�+W'O�)��0��G���:t�~�b�w)����q#�5�Li#�o�RT�
Try this: when you return the response from a function, it should be sent like this. I don't know your exact code, but I have used this and it works properly.
$file = 'file.pdf';
$path = storage_path($file);
return Response::make(file_get_contents($path), 200, [
'Content-Type' => 'application/pdf',
'Content-Disposition' => 'inline; filename="'.$file.'"'
]);

Create a file according to sorted content

I have a list of more than 100,000 records.
For example, if the values from 21 to 84 are consecutive, they should be written as 21-84; if they are not consecutive, as with 84 and 87, they should be written as 84,87, separated by a comma.
Each line must begin with the value 11111.
The values from the list go in columns 21 to 80, with a trailing comma at the end of each line.
Each row must be at most 80 characters long.
Here is the input file.
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
87
85
86
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
108
111
109
112
110
113
115
114
117
116
118
124
125
120
122
123
126
132
127
133
128
130
131
135
136
137
138
139
140
141
142
143
144
145
146
148
147
149
150
151
152
153
154
155
156
158
157
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
184
183
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
214
Here is the desired output file.
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
111111 116,118,124-125,120,122-123,126,132,127,133,128,130-131,
111111 135-146,148,147,149-156,158,157,159-182,184,183,185-212,214,
thanks in advance
Presented without explanation: check the man pages for the commands used and come back with questions:
awk '
function printrange() { print start (start == last ? "" : "-" last) }
NR == 1 {start=last=$1; next}
$1 == last+1 {last=$1; next}
{printrange(); start=last=$1}
END {printrange()}
' file | paste -sd" " | fold -sw 60 | tr ' ' ',' | sed 's/^/111111 /'
111111 21-84,87,85-86,88-106,108,111,109,112,110,113,115,114,117,
111111 116,118,124-125,120,122-123,126,132,127,133,128,130-131,
111111 135-146,148,147,149-156,158,157,159-182,184,183,185-212,214
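For readers who would rather see the run-collapsing step spelled out, here is a minimal C++ sketch of what the awk part of the pipeline does. The function name collapse_runs is made up for illustration; joining the tokens, folding to 60 columns and adding the prefix are still left to paste, fold, tr and sed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collapse runs of consecutive integers, taken in input order, into
// "start-last" tokens. A value that does not continue the current run
// closes it and starts a new one; a run of length one is emitted as "start".
std::vector<std::string> collapse_runs(const std::vector<long>& values) {
    std::vector<std::string> tokens;
    if (values.empty()) {
        return tokens;
    }
    long start = values[0];
    long last = values[0];
    // Emit the run that is currently open.
    auto flush = [&] {
        tokens.push_back(start == last
                             ? std::to_string(start)
                             : std::to_string(start) + "-" + std::to_string(last));
    };
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] == last + 1) {
            last = values[i];          // still consecutive: extend the run
        } else {
            flush();                   // gap or out-of-order value: close the run
            start = last = values[i];
        }
    }
    flush();                           // close the final run
    return tokens;
}
```

Like the awk script, this does not sort the input, which is why 87 and 85-86 come out as separate tokens in the sample output.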

Why am I getting numbers larger than 1000 when I %1000 a number generated by 64 bit mersenne twister engine?

I'm trying to generate a Zobrist key for transposition tables in my chess engine.
Here's how I'm generating the 64-bit numbers,
as shown here: How to generate 64 bit random numbers?
typedef unsigned long long U64;
std::random_device rd;
std::mt19937_64 mt(rd());
std::uniform_int_distribution<U64> dist(std::llround(std::pow(2,61)),
std::llround(std::pow(2,62)));
rand function:
U64 ZobristH::random64()
{
U64 ranUI = dist(mt);
return ranUI;
}
To make sure I'm generating random enough numbers, I'm using a test distribution function I found online that looks like this (I will later put the data into Excel and look at the distribution):
int sampleSize = 2000;
int distArray[sampleSize];
int t = 0;
while (t < 10)
{
for (int i = 0; i < 10000; i++)
{
distArray[(int)(random64() % (sampleSize / 2))]++;
}
t++;
}
for (int i = 0; i < sampleSize; i++)
{
std::cout << distArray[i] << ", ";
}
the results I'm getting look a little something like this:
416763345, 417123246, 7913280, 7914356, 417726722, 417726718, 19, 83886102,
77332499, 14
Are these the decimal representation of binary numbers below 1000? Or am I doing something completely wrong?
Okay, I did this to check the distribution of the random numbers; you can run this short program to generate a text file and see what values you are getting. Instead of a function call I used a lambda inside the for loop, and instead of storing the values in the array I wrote each value to the text file twice, once as val and once as val++.
#include <iostream>
#include <fstream>
#include <iomanip>
#include <random>
#include <cmath> // std::pow, std::llround
#include <functional> // may not need - included in almost all of my apps
#include <algorithm> // same as above
typedef unsigned long long U64;
int main( int argc, char** argv ) {
std::random_device rd;
std::mt19937_64 mt( rd() );
std::uniform_int_distribution<U64> dist( std::llround( std::pow( 2, 61 ) ),
std::llround( std::pow( 2, 62 ) ) );
auto lambda = [&] { return dist(mt); };
const int sampleSize = 2000;
// int distArray[sampleSize];
int t = 0;
std::ofstream file( "samples.txt" );
while ( t < 10 ) {
file << "Sample: " << (t+1) << "\n";
for ( int i = 0; i < 10000; i++ ) {
auto val = static_cast<int>( (lambda() % (sampleSize / 2)) );
file << std::setw(5) << i << ": " << std::setw(6) << val << "\t"
<< std::setw(6) << val++ << "\n";
// distArray[...]
}
file << "\n\n";
t++;
}
file.close();
/* for ( int i = 0; i < sampleSize; i++ ) {
std::cout << distArray[i] << "\n";
}*/
// Quick & Dirty Way TO Pause The Console
std::cout << "\nPress any key and enter to quit.\n";
char c;
std::cin >> c;
return 0;
}
Then check out the text file that this program generates; if you scroll through it you will see the distribution. Judging by the output, the second column is the value returned by val++, i.e. the original remainder, and the first column is val after the increment has taken effect (the order in which the two operands are evaluated is not guaranteed before C++17). The largest value I have seen is 1,000 in the first column and 999 in the second, which is what % 1000 should give. I've built and run this for both 32-bit and 64-bit platform versions and have seen similar results: the values do indeed follow a uniform distribution.
samples.txt: a short excerpt, about 1,000 entries from the first sample set
Sample: 1
0: 342 341
1: 517 516
2: 402 401
3: 741 740
4: 238 237
5: 557 556
6: 35 34
7: 572 571
8: 205 204
9: 353 352
10: 301 300
11: 65 64
12: 223 222
13: 647 646
14: 185 184
15: 535 534
16: 97 96
17: 843 842
18: 716 715
19: 294 293
20: 485 484
21: 648 647
22: 406 405
23: 250 249
24: 245 244
25: 915 914
26: 888 887
27: 986 985
28: 345 344
29: 493 492
30: 654 653
31: 860 859
32: 921 920
33: 526 525
34: 793 792
35: 503 502
36: 939 938
37: 802 801
38: 142 141
39: 806 805
40: 540 539
41: 778 777
42: 787 786
43: 884 883
44: 109 108
45: 842 841
46: 794 793
47: 279 278
48: 821 820
49: 112 111
50: 438 437
51: 402 401
52: 69 68
53: 396 395
54: 196 195
55: 655 654
56: 859 858
57: 674 673
58: 417 416
59: 331 330
60: 632 631
61: 210 209
62: 641 640
63: 737 736
64: 838 837
65: 592 591
66: 562 561
67: 883 882
68: 750 749
69: 726 725
70: 253 252
71: 660 659
72: 57 56
73: 401 400
74: 919 918
75: 851 850
76: 345 344
77: 25 24
78: 300 299
79: 781 780
80: 695 694
81: 220 219
82: 378 377
83: 471 470
84: 281 280
85: 945 944
86: 536 535
87: 407 406
88: 431 430
89: 745 744
90: 32 31
91: 389 388
92: 358 357
93: 582 581
94: 820 819
95: 622 621
96: 459 458
97: 233 232
98: 594 593
99: 509 508
100: 260 259
101: 152 151
102: 148 147
103: 137 136
104: 945 944
105: 244 243
106: 968 967
107: 54 53
108: 420 419
109: 58 57
110: 678 677
111: 715 714
112: 780 779
113: 834 833
114: 241 240
115: 669 668
116: 722 721
117: 608 607
118: 805 804
119: 155 154
120: 220 219
121: 520 519
122: 740 739
123: 184 183
124: 198 197
125: 247 246
126: 115 114
127: 520 519
128: 457 456
129: 864 863
130: 659 658
131: 511 510
132: 718 717
133: 119 118
134: 588 587
135: 113 112
136: 518 517
137: 164 163
138: 375 374
139: 866 865
140: 382 381
141: 526 525
142: 621 620
143: 680 679
144: 147 146
145: 712 711
146: 408 407
147: 486 485
148: 7 6
149: 203 202
150: 741 740
151: 290 289
152: 810 809
153: 960 959
154: 449 448
155: 683 682
156: 997 996
157: 454 453
158: 131 130
159: 427 426
160: 157 156
161: 3 2
162: 427 426
163: 554 553
164: 806 805
165: 228 227
166: 431 430
167: 174 173
168: 845 844
169: 121 120
170: 397 396
171: 770 769
172: 17 16
173: 761 760
174: 736 735
175: 629 628
176: 772 771
177: 417 416
178: 739 738
179: 226 225
180: 301 300
181: 217 216
182: 746 745
183: 344 343
184: 607 606
185: 927 926
186: 428 427
187: 385 384
188: 287 286
189: 537 536
190: 705 704
191: 649 648
192: 127 126
193: 252 251
194: 160 159
195: 390 389
196: 282 281
197: 66 65
198: 659 658
199: 844 843
200: 358 357
201: 360 359
202: 872 871
203: 495 494
204: 695 694
205: 988 987
206: 969 968
207: 641 640
208: 799 798
209: 30 29
210: 109 108
211: 675 674
212: 345 344
213: 309 308
214: 807 806
215: 283 282
216: 457 456
217: 193 192
218: 972 971
219: 330 329
220: 914 913
221: 508 507
222: 624 623
223: 254 253
224: 342 341
225: 69 68
226: 918 917
227: 551 550
228: 148 147
229: 645 644
230: 905 904
231: 503 502
232: 980 979
233: 881 880
234: 137 136
235: 202 201
236: 808 807
237: 988 987
238: 497 496
239: 506 505
240: 576 575
241: 671 670
242: 874 873
243: 217 216
244: 808 807
245: 741 740
246: 14 13
247: 206 205
248: 894 893
249: 180 179
250: 4 3
251: 27 26
252: 62 61
253: 203 202
254: 392 391
255: 868 867
256: 673 672
257: 881 880
258: 664 663
259: 831 830
260: 293 292
261: 916 915
262: 860 859
263: 487 486
264: 642 641
265: 161 160
266: 881 880
267: 233 232
268: 423 422
269: 12 11
270: 398 397
271: 993 992
272: 323 322
273: 878 877
274: 114 113
275: 42 41
276: 58 57
277: 398 397
278: 878 877
279: 64 63
280: 873 872
281: 841 840
282: 506 505
283: 412 411
284: 545 544
285: 887 886
286: 17 16
287: 504 503
288: 350 349
289: 772 771
290: 16 15
291: 597 596
292: 553 552
293: 25 24
294: 324 323
295: 242 241
296: 580 579
297: 479 478
298: 702 701
299: 640 639
300: 173 172
301: 918 917
302: 678 677
303: 714 713
304: 258 257
305: 97 96
306: 304 303
307: 80 79
308: 394 393
309: 940 939
310: 985 984
311: 651 650
312: 42 41
313: 179 178
314: 672 671
315: 915 914
316: 160 159
317: 332 331
318: 887 886
319: 370 369
320: 850 849
321: 730 729
322: 395 394
323: 889 888
324: 114 113
325: 505 504
326: 381 380
327: 578 577
328: 762 761
329: 896 895
330: 793 792
331: 295 294
332: 488 487
333: 599 598
334: 182 181
335: 25 24
336: 623 622
337: 396 395
338: 898 897
339: 981 980
340: 645 644
341: 806 805
342: 205 204
343: 404 403
344: 234 233
345: 36 35
346: 659 658
347: 285 284
348: 62 61
349: 608 607
350: 632 631
351: 825 824
352: 585 584
353: 685 684
354: 14 13
355: 828 827
356: 720 719
357: 871 870
358: 88 87
359: 716 715
360: 879 878
361: 650 649
362: 464 463
363: 898 897
364: 930 929
365: 194 193
366: 997 996
367: 105 104
368: 776 775
369: 398 397
370: 962 961
371: 434 433
372: 954 953
373: 548 547
374: 989 988
375: 943 942
376: 229 228
377: 866 865
378: 554 553
379: 567 566
380: 379 378
381: 564 563
382: 738 737
383: 468 467
384: 660 659
385: 693 692
386: 784 783
387: 739 738
388: 662 661
389: 474 473
390: 545 544
391: 958 957
392: 703 702
393: 316 315
394: 571 570
395: 95 94
396: 497 496
397: 672 671
398: 676 675
399: 821 820
400: 368 367
401: 7 6
402: 817 816
403: 221 220
404: 839 838
405: 578 577
406: 635 634
407: 453 452
408: 70 69
409: 764 763
410: 78 77
411: 968 967
412: 295 294
413: 483 482
414: 392 391
415: 23 22
416: 389 388
417: 678 677
418: 150 149
419: 863 862
420: 677 676
421: 676 675
422: 455 454
423: 405 404
424: 126 125
425: 753 752
426: 821 820
427: 328 327
428: 773 772
429: 596 595
430: 645 644
431: 829 828
432: 377 376
433: 444 443
434: 813 812
435: 395 394
436: 794 793
437: 641 640
438: 98 97
439: 827 826
440: 824 823
441: 681 680
442: 736 735
443: 288 287
444: 560 559
445: 781 780
446: 556 555
447: 327 326
448: 820 819
449: 859 858
450: 686 685
451: 919 918
452: 267 266
453: 128 127
454: 583 582
455: 446 445
456: 783 782
457: 712 711
458: 378 377
459: 367 366
460: 52 51
461: 316 315
462: 780 779
463: 398 397
464: 435 434
465: 788 787
466: 380 379
467: 235 234
468: 748 747
469: 429 428
470: 91 90
471: 675 674
472: 853 852
473: 674 673
474: 277 276
475: 179 178
476: 264 263
477: 511 510
478: 514 513
479: 979 978
480: 845 844
481: 728 727
482: 904 903
483: 874 873
484: 750 749
485: 659 658
486: 376 375
487: 713 712
488: 393 392
489: 538 537
490: 896 895
491: 879 878
492: 347 346
493: 819 818
494: 210 209
495: 707 706
496: 869 868
497: 319 318
498: 832 831
499: 498 497
500: 71 70
501: 290 289
502: 861 860
503: 295 294
504: 888 887
505: 515 514
506: 222 221
507: 661 660
508: 813 812
509: 969 968
510: 547 546
511: 900 899
512: 58 57
513: 805 804
514: 428 427
515: 453 452
516: 23 22
517: 969 968
518: 718 717
519: 775 774
520: 395 394
521: 521 520
522: 522 521
523: 465 464
524: 317 316
525: 216 215
526: 254 253
527: 696 695
528: 677 676
529: 21 20
530: 318 317
531: 301 300
532: 142 141
533: 877 876
534: 486 485
535: 981 980
536: 516 515
537: 254 253
538: 328 327
539: 385 384
540: 2 1
541: 405 404
542: 387 386
543: 794 793
544: 48 47
545: 641 640
546: 814 813
547: 981 980
548: 354 353
549: 281 280
550: 561 560
551: 683 682
552: 247 246
553: 739 738
554: 370 369
555: 799 798
556: 680 679
557: 915 914
558: 638 637
559: 254 253
560: 705 704
561: 320 319
562: 640 639
563: 487 486
564: 47 46
565: 852 851
566: 749 748
567: 419 418
568: 300 299
569: 507 506
570: 141 140
571: 972 971
572: 895 894
573: 988 987
574: 279 278
575: 268 267
576: 392 391
577: 530 529
578: 679 678
579: 855 854
580: 246 245
581: 645 644
582: 624 623
583: 417 416
584: 203 202
585: 30 29
586: 9 8
587: 585 584
588: 573 572
589: 471 470
590: 504 503
591: 290 289
592: 588 587
593: 230 229
594: 351 350
595: 651 650
596: 615 614
597: 502 501
598: 352 351
599: 472 471
// 600 - 699 omitted to make space to fit answer
700: 247 246
701: 894 893
702: 809 808
703: 382 381
704: 81 80
705: 574 573
706: 507 506
707: 508 507
708: 569 568
709: 947 946
710: 384 383
711: 14 13
712: 627 626
713: 951 950
714: 825 824
715: 657 656
716: 206 205
717: 598 597
718: 300 299
719: 266 265
720: 909 908
721: 206 205
722: 126 125
723: 841 840
724: 586 585
725: 348 347
726: 100 99
727: 361 360
728: 695 694
729: 556 555
730: 66 65
731: 5 4
732: 686 685
733: 488 487
734: 149 148
735: 622 621
736: 476 475
737: 488 487
738: 371 370
739: 331 330
740: 965 964
741: 141 140
742: 396 395
743: 917 916
744: 31 30
745: 924 923
746: 283 282
747: 369 368
748: 519 518
749: 830 829
750: 688 687
751: 374 373
752: 41 40
753: 418 417
754: 766 765
755: 854 853
756: 453 452
757: 521 520
758: 640 639
759: 185 184
760: 41 40
761: 125 124
762: 723 722
763: 341 340
764: 142 141
765: 754 753
766: 459 458
767: 899 898
768: 166 165
769: 374 373
770: 572 571
771: 304 303
772: 352 351
773: 235 234
774: 879 878
775: 736 735
776: 576 575
777: 56 55
778: 102 101
779: 170 169
780: 208 207
781: 135 134
782: 919 918
783: 599 598
784: 37 36
785: 997 996
786: 922 921
787: 502 501
788: 29 28
789: 173 172
790: 54 53
791: 601 600
792: 535 534
793: 64 63
794: 723 722
795: 491 490
796: 685 684
797: 58 57
798: 272 271
799: 261 260
800: 81 80
801: 149 148
802: 129 128
803: 712 711
804: 377 376
805: 151 150
806: 514 513
807: 14 13
808: 838 837
809: 347 346
810: 517 516
811: 442 441
812: 264 263
813: 883 882
814: 447 446
815: 140 139
816: 195 194
817: 841 840
818: 218 217
819: 858 857
820: 28 27
821: 222 221
822: 223 222
823: 906 905
824: 873 872
825: 492 491
826: 826 825
827: 738 737
828: 307 306
829: 185 184
830: 525 524
831: 449 448
832: 646 645
833: 686 685
834: 942 941
835: 433 432
836: 881 880
837: 824 823
838: 641 640
839: 290 289
840: 897 896
841: 4 3
842: 124 123
843: 679 678
844: 524 523
845: 424 423
846: 282 281
847: 625 624
848: 414 413
849: 647 646
850: 129 128
851: 395 394
852: 720 719
853: 318 317
854: 262 261
855: 402 401
856: 413 412
857: 139 138
858: 549 548
859: 472 471
860: 162 161
861: 605 604
862: 67 66
863: 980 979
864: 465 464
865: 912 911
866: 219 218
867: 648 647
868: 619 618
869: 331 330
870: 625 624
871: 360 359
872: 425 424
873: 935 934
874: 89 88
875: 641 640
876: 535 534
877: 404 403
878: 966 965
879: 27 26
880: 281 280
881: 637 636
882: 57 56
883: 152 151
884: 156 155
885: 813 812
886: 340 339
887: 181 180
888: 921 920
889: 306 305
890: 101 100
891: 178 177
892: 417 416
893: 845 844
894: 904 903
895: 295 294
896: 346 345
897: 654 653
898: 357 356
899: 929 928
900: 195 194
901: 499 498
902: 377 376
903: 727 726
904: 570 569
905: 853 852
906: 71 70
907: 580 579
908: 642 641
909: 889 888
910: 559 558
911: 134 133
912: 324 323
913: 120 119
914: 991 990
915: 6 5
916: 708 707
917: 347 346
918: 929 928
919: 454 453
920: 636 635
921: 218 217
922: 739 738
923: 715 714
924: 87 86
925: 782 781
926: 670 669
927: 845 844
928: 79 78
929: 730 729
930: 58 57
931: 216 215
932: 711 710
933: 898 897
934: 871 870
935: 388 387
936: 389 388
937: 944 943
938: 927 926
939: 88 87
940: 617 616
941: 940 939
942: 948 947
943: 927 926
944: 646 645
945: 125 124
946: 615 614
947: 846 845
948: 705 704
949: 998 997
950: 304 303
951: 346 345
952: 675 674
953: 783 782
954: 129 128
955: 69 68
956: 17 16
957: 646 645
958: 559 558
959: 62 61
960: 807 806
961: 571 570
962: 54 53
963: 297 296
964: 771 770
965: 972 971
966: 829 828
967: 786 785
968: 650 649
969: 101 100
970: 705 704
971: 690 689
972: 365 364
973: 304 303
974: 82 81
975: 776 775
976: 495 494
977: 586 585
978: 556 555
979: 77 76
980: 640 639
981: 161 160
982: 910 909
983: 46 45
984: 43 42
985: 162 161
986: 514 513
987: 654 653
988: 668 667
989: 126 125
990: 254 253
991: 133 132
992: 398 397
993: 993 992
994: 357 356
995: 298 297
996: 519 518
997: 904 903
998: 382 381
999: 28 27
1000: 19 18
1001: 939 938
1002: 868 867
1003: 888 887
1004: 576 575
1005: 183 182
1006: 174 173
1007: 679 678
1008: 831 830
1009: 464 463
1010: 876 875
1011: 738 737
1012: 447 446
1013: 385 384
1014: 271 270
1015: 38 37
1016: 28 27
1017: 451 450
1018: 162 161
1019: 847 846
1020: 430 429
1021: 849 848
1022: 207 206
1023: 196 195
1024: 42 41
1025: 709 708
1026: 557 556
1027: 173 172
1028: 788 787
1029: 160 159
1030: 535 534
1031: 555 554
1032: 252 251
1033: 111 110
1034: 476 475
1035: 780 779
1036: 44 43
1037: 190 189
1038: 443 442
1039: 655 654
1040: 7 6
1041: 845 844
1042: 856 855
1043: 274 273
1044: 933 932
1045: 336 335
1046: 185 184
1047: 580 579
1048: 807 806
1049: 286 285
1050: 409 408
1051: 347 346
1052: 461 460
1053: 624 623
1054: 378 377
1055: 903 902
1056: 483 482
1057: 838 837
1058: 809 808
1059: 919 918
1060: 544 543
1061: 458 457
1062: 121 120
1063: 192 191
1064: 126 125
1065: 843 842
1066: 927 926
1067: 390 389
1068: 567 566
1069: 1000 999
Entry 1069 is the first occurrence in this sample set to reach 1,000. I've run this about a dozen times in both 32-bit and 64-bit modes and did not see any value go above 1,000.
I'm not sure but I think that this line in your code is what is giving you your problem(s):
distArray[(int)(random64() % (sampleSize / 2))]++;
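One likely cause, which the question does not confirm: distArray is a local array that is never initialised, so each ++ increments whatever happened to be in that memory, and the numbers printed are the counters, not the random values themselves. Only indices 0 to 999 are ever incremented (sampleSize / 2 is 1000), yet all 2000 elements are printed. Below is a minimal sketch of a zero-initialised histogram, assuming a std::vector in place of the raw array. The name bucket_counts and its parameters are hypothetical, the seed parameter is there only to make runs repeatable, and 1ULL << 61 stands in for std::llround(std::pow(2, 61)).

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw `draws` values from a 64-bit Mersenne Twister and count how often each
// remainder modulo `buckets` occurs. The vector is created with every counter
// set to 0, so the counts never start from indeterminate values.
std::vector<int> bucket_counts(std::size_t buckets, std::size_t draws,
                               std::uint64_t seed) {
    std::mt19937_64 mt(seed);
    // Same range as in the question: [2^61, 2^62].
    std::uniform_int_distribution<std::uint64_t> dist(1ULL << 61, 1ULL << 62);
    std::vector<int> counts(buckets, 0);
    for (std::size_t i = 0; i < draws; ++i) {
        ++counts[dist(mt) % buckets];  // index is always in [0, buckets)
    }
    return counts;
}
```

Calling bucket_counts(1000, 100000, seed) mirrors the 10 x 10000 draws in the question; with counters that start at zero, every entry should land somewhere near 100 and the entries should add up to 100000.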

Transpose specific column using a bash script

I have a huge file with the following pattern (which repeats thousands of times throughout the file):
#
# Output from 'compseq'
# The Expected frequencies are calculated on the (false) assumption that every
# word has equal frequency.
#
# The input sequences are:
# s21_contig00001
Word size 4
Total count 49466
#
# Word Obs Count Obs Frequency Exp Frequency Obs/Exp Frequency
#
AAAA 573 0.0115837 0.0039062 2.9654308
AAAC 301 0.0060850 0.0039062 1.5577568
AAAG 305 0.0061659 0.0039062 1.5784579
AAAT 345 0.0069745 0.0039062 1.7854688
AACA 227 0.0045890 0.0039062 1.1747867
AACC 113 0.0022844 0.0039062 0.5848057
AACG 321 0.0064893 0.0039062 1.6612623
AACT 109 0.0022035 0.0039062 0.5641046
AAGA 222 0.0044879 0.0039062 1.1489104
AAGC 339 0.0068532 0.0039062 1.7544172
AAGG 196 0.0039623 0.0039062 1.0143533
AAGT 169 0.0034165 0.0039062 0.8746210
AATA 129 0.0026079 0.0039062 0.6676101
AATC 226 0.0045688 0.0039062 1.1696115
AATG 286 0.0057817 0.0039062 1.4801278
AATT 196 0.0039623 0.0039062 1.0143533
ACAA 211 0.0042656 0.0039062 1.0919824
ACAC 91 0.0018396 0.0039062 0.4709497
ACAG 103 0.0020822 0.0039062 0.5330530
ACAT 167 0.0033761 0.0039062 0.8642704
ACCA 80 0.0016173 0.0039062 0.4140218
ACCC 72 0.0014555 0.0039062 0.3726196
ACCG 217 0.0043869 0.0039062 1.1230340
ACCT 52 0.0010512 0.0039062 0.2691141
ACGA 322 0.0065095 0.0039062 1.6664376
ACGC 201 0.0040634 0.0039062 1.0402297
ACGG 252 0.0050944 0.0039062 1.3041685
ACGT 202 0.0040836 0.0039062 1.0454049
ACTA 35 0.0007076 0.0039062 0.1811345
ACTC 75 0.0015162 0.0039062 0.3881454
ACTG 87 0.0017588 0.0039062 0.4502487
ACTT 169 0.0034165 0.0039062 0.8746210
AGAA 158 0.0031941 0.0039062 0.8176930
AGAC 91 0.0018396 0.0039062 0.4709497
AGAG 100 0.0020216 0.0039062 0.5175272
AGAT 84 0.0016981 0.0039062 0.4347228
AGCA 230 0.0046497 0.0039062 1.1903125
AGCC 185 0.0037399 0.0039062 0.9574253
AGCG 350 0.0070756 0.0039062 1.8113452
AGCT 218 0.0044071 0.0039062 1.1282093
AGGA 144 0.0029111 0.0039062 0.7452392
AGGC 148 0.0029920 0.0039062 0.7659402
AGGG 109 0.0022035 0.0039062 0.5641046
AGGT 52 0.0010512 0.0039062 0.2691141
AGTA 64 0.0012938 0.0039062 0.3312174
AGTC 88 0.0017790 0.0039062 0.4554239
AGTG 105 0.0021227 0.0039062 0.5434035
AGTT 109 0.0022035 0.0039062 0.5641046
ATAA 136 0.0027494 0.0039062 0.7038370
ATAC 100 0.0020216 0.0039062 0.5175272
ATAG 64 0.0012938 0.0039062 0.3312174
ATAT 154 0.0031132 0.0039062 0.7969919
ATCA 242 0.0048922 0.0039062 1.2524158
ATCC 172 0.0034771 0.0039062 0.8901468
ATCG 431 0.0087131 0.0039062 2.2305422
ATCT 84 0.0016981 0.0039062 0.4347228
ATGA 311 0.0062871 0.0039062 1.6095096
ATGC 230 0.0046497 0.0039062 1.1903125
ATGG 213 0.0043060 0.0039062 1.1023329
ATGT 167 0.0033761 0.0039062 0.8642704
ATTA 110 0.0022237 0.0039062 0.5692799
ATTC 166 0.0033558 0.0039062 0.8590951
ATTG 216 0.0043666 0.0039062 1.1178587
ATTT 345 0.0069745 0.0039062 1.7854688
CAAA 392 0.0079246 0.0039062 2.0287066
CAAC 206 0.0041645 0.0039062 1.0661060
CAAG 272 0.0054987 0.0039062 1.4076740
CAAT 216 0.0043666 0.0039062 1.1178587
CACA 81 0.0016375 0.0039062 0.4191970
CACC 131 0.0026483 0.0039062 0.6779606
CACG 139 0.0028100 0.0039062 0.7193628
CACT 105 0.0021227 0.0039062 0.5434035
CAGA 57 0.0011523 0.0039062 0.2949905
CAGC 303 0.0061254 0.0039062 1.5681074
CAGG 67 0.0013545 0.0039062 0.3467432
CAGT 87 0.0017588 0.0039062 0.4502487
CATA 127 0.0025674 0.0039062 0.6572595
CATC 326 0.0065904 0.0039062 1.6871386
CATG 182 0.0036793 0.0039062 0.9418995
CATT 286 0.0057817 0.0039062 1.4801278
CCAA 215 0.0043464 0.0039062 1.1126835
CCAC 87 0.0017588 0.0039062 0.4502487
CCAG 100 0.0020216 0.0039062 0.5175272
CCAT 213 0.0043060 0.0039062 1.109
CCCA 106 0.0021429 0.0039062 0.5485788
CCCC 135 0.0027291 0.0039062 0.6917
CCCG 212 0.0042858 0.0039062 1.096
CCCT 109 0.0022035 0.0039062 0.56
CCGA 276 0.0055796 0.0039062 1.42
CCGC 382 0.0077225 0.0039062 1.97
CCGG 294 0.0059435 0.0039062 1.521
CCGT 252 0.0050944 0.0039062 1.304
CCTA 36 0.0007278 0.0039062 0.1863098
CCTC 153 0.0030930 0.0039062 0.7918166
CCTG 67 0.0013545 0.0039062 0.3467432
CCTT 196 0.0039623 0.0039062 1.0143533
CGAA 328 0.0066308 0.0039062 1.6974892
CGAC 319 0.0064489 0.0039062 1.6509117
CGAG 241 0.0048720 0.0039062 1.2472405
CGAT 431 0.0087131 0.0039062 2.2305422
CGCA 247 0.0049933 0.0039062 1.2782922
CGCC 465 0.0094004 0.0039062 2.4065014
CGCG 358 0.0072373 0.0039062 1.8527473
CGCT 350 0.0070756 0.0039062 1.8113452
CGGA 283 0.0057211 0.0039062 1.4646019
CGGC 492 0.0099462 0.0039062 2.546
CGGG 212 0.0042858 0.0039062 1.09
CGGT 217 0.0043869 0.0039062 1.1230
CGTA 136 0.0027494 0.0039062 0.703
CGTC 381 0.0077023 0.0039062 1.971
CGTG 139 0.0028100 0.0039062 0.7193628
CGTT 321 0.0064893 0.0039062 1.6612623
CTAA 44 0.0008895 0.0039062 0.2220
CTAC 42 0.0008491 0.0039062 0.2173614
CTAG 12 0.0002426 0.0039062 0.063
CTAT 64 0.0012938 0.0039062 0.331
CTCA 131 0.0026483 0.0039062 0.676
CTCC 160 0.0032345 0.0039062 0.825
CTCG 241 0.0048720 0.0039062 1.2472405
CTCT 100 0.0020216 0.0039062 0.5175272
CTGA 143 0.0028909 0.0039062 0.74
CTGC 168 0.0033963 0.0039062 0.867
CTGG 100 0.0020216 0.0039062 0.51
CTGT 103 0.0020822 0.0062 0.5330
CTTA 61 0.0012332 0.0039062 0.3156916
CTTC 288 0.0058222 0.0039062 1.493
CTTG 272 0.0054987 0.0039062 1.4040
CTTT 305 0.0061659 0.0032 1.579
GAAA 399 0.0080661 0.0039062 2.064
GAAC 211 0.0042656 0.0039062 1.024
GAAG 288 0.0058222 0.0062 1.4904783
GAAT 166 0.0033558 0.0039062 0.8590951
GACA 188 0.0038006 0.0062 0.9729511
GACC 132 0.0026685 0.0039062 0.6831359
GACG 381 0.0077023 0.0039062 1.9717786
GACT 88 0.0017790 0.0039062 0.4554239
GAGA 117 0.0023653 0.0039062 0.6055068
GAGC 287 0.0058020 0.0039062 1.4853030
GAGG 153 0.0030930 0.0039062 0.7918166
GAGT 75 0.0015162 0.0039062 0.384
GATA 137 0.0027696 0.0039062 0.709
GATC 240 0.0048518 0.0039062 1.242
GATG 326 0.0065904 0.0039062 1.6876
GATT 226 0.0045688 0.0039062 1.16
GCAA 344 0.0069543 0.0039062 1.785
GCAC 151 0.0030526 0.0039062 0.781
GCAG 168 0.0033963 0.0039062 0.867
GCAT 230 0.0046497 0.0039062 1.195
GCCA 260 0.0052561 0.0039062 1.307
GCCC 186 0.0037602 0.0039062 0.9006
GCCG 492 0.0099462 0.0039062 2.5438
GCCT 148 0.0029920 0.0039062 0.7602
GCGA 367 0.0074192 0.0039062 1.8993248
GCGC 470 0.0095015 0.0039062 2.4323778
GCGG 382 0.0077225 0.0039062 1.9769539
GCGT 201 0.0040634 0.0039062 1.0402297
GCTA 54 0.0010917 0.0039062 0.2794647
GCTC 287 0.0058020 0.0039062 1.4853030
GCTG 303 0.0061254 0.0039062 1.5681074
GCTT 339 0.0068532 0.0039062 1.7544172
GGAA 295 0.0059637 0.0039062 1.5267052
GGAC 138 0.0027898 0.0039062 0.7141875
GGAG 160 0.0032345 0.0039062 0.8280435
GGAT 172 0.0034771 0.0039062 0.8901468
GGCA 250 0.0050540 0.0039062 1.2938180
GGCC 186 0.0037602 0.0039062 0.9626006
GGCG 465 0.0094004 0.0039062 2.4065014
GGCT 185 0.0037399 0.0039062 0.9574253
GGGA 169 0.0034165 0.0039062 0.874610
GGGC 186 0.0037602 0.0039062 0.962606
GGGG 135 0.0027291 0.0039062 0.6986617
GGGT 72 0.0014555 0.0039062 0.372196
GGTA 45 0.0009097 0.0039062 0.2328872
GGTC 132 0.0026685 0.39062 0.6831359
GGTG 131 0.0026483 0.39062 0.6779606
GGTT 113 0.0022844 0.39062 0.584857
GTAA 93 0.0018801 0.39062 0.48133
GTAC 86 0.0017386 0.39062 0.4450734
GTAG 42 0.0008491 0.0039062 0.2173614
GTAT 100 0.0020216 0.0039062 0.5175272
GTCA 241 0.0048720 0.0039062 1.2472405
GTCC 138 0.0027898 0.0039062 0.7141875
GTCG 319 0.0064489 0.0039062 1.6509117
GTCT 91 0.0018396 0.0039062 0.4709497
GTGA 127 0.0025674 0.0039062 0.6572595
GTGC 151 0.0030526 0.0039062 0.7814661
GTGG 87 0.0017588 0.0039062 0.4502487
GTGT 91 0.0018396 0.0039062 0.4709497
GTTA 52 0.0010512 0.0039062 0.2691141
GTTC 211 0.0042656 0.0039062 1.0919824
GTTG 206 0.0041645 0.0039062 1.0660
GTTT 301 0.0060850 0.0039062 1.558
TAAA 160 0.0032345 0.0039062 0.825
TAAC 52 0.0010512 0.0039062 0.261
TAAG 61 0.0012332 0.0039062 0.31
TAAT 110 0.0022237 0.0039062 0.569
TACA 76 0.0015364 0.0039062 0.397
TACC 45 0.0009097 0.0039062 0.23
TACG 136 0.0027494 0.0039062 0.70
TACT 64 0.0012938 0.0039062 0.33
TAGA 37 0.0007480 0.0039062 0.19
TAGC 54 0.0010917 0.0039062 0.2794647
TAGG 36 0.0007278 0.0039062 0.1863098
TAGT 35 0.0007076 0.0039062 0.1811345
TATA 60 0.0012130 0.0039062 0.3105163
TATC 137 0.0027696 0.0039062 0.7090123
TATG 127 0.0025674 0.0039062 0.6572595
TATT 129 0.0026079 0.0039062 0.6676101
TCAA 316 0.0063882 0.0039062 1.6353859
TCAC 127 0.0025674 0.0039062 0.6572595
TCAG 143 0.0028909 0.0039062 0.7400639
TCAT 311 0.0062871 0.0039062 1.6095096
TCCA 169 0.0034165 0.0039062 0.8746210
TCCC 169 0.0034165 0.0039062 0.8746210
TCCG 283 0.0057211 0.0039062 1.4646019
TCCT 144 0.0029111 0.0039062 0.7452392
TCGA 354 0.0071564 0.0039062 1.8320463
TCGC 367 0.0074192 0.0039062 1.8993248
TCGG 276 0.0055796 0.0039062 1.4283750
TCGT 322 0.0065095 0.0039062 1.6664376
TCTA 37 0.0007480 0.0039062 0.1914851
TCTC 117 0.0023653 0.0039062 0.6055068
TCTG 57 0.0011523 0.0039062 0.2949905
TCTT 222 0.0044879 0.0039062 1.1489104
TGAA 283 0.0057211 0.0039062 1.4646019
TGAC 241 0.0048720 0.0039062 1.2472405
TGAG 131 0.0026483 0.0039062 0.6779606
TGAT 242 0.0048922 0.0039062 1.2524158
TGCA 166 0.0033558 0.0039062 0.8590951
TGCC 250 0.0050540 0.0039062 1.2938180
TGCG 247 0.0049933 0.0039062 1.2782922
TGCT 230 0.0046497 0.0039062 1.1903125
TGGA 169 0.0034165 0.39062 0.8746210
TGGC 260 0.0052561 0.39062 1.3455707
TGGG 106 0.0021429 0.39062 0.5485788
TGGT 80 0.0016173 0.39062 0.4140218
TGTA 76 0.0015364 0.39062 0.3933207
TGTC 188 0.0038006 0.39062 0.9729511
TGTG 81 0.0016375 0.39062 0.4191970
TGTT 227 0.0045890 0.39062 1.1747867
TTAA 110 0.0022237 0.39062 0.5692799
TTAC 93 0.0018801 0.39062 0.4813003
TTAG 44 0.0008895 0.39062 0.2277120
TTAT 136 0.0027494 0.39062 0.7038370
TTCA 283 0.0057211 0.39062 1.4646019
TTCC 295 0.0059637 0.39062 1.5267052
TTCG 328 0.0066308 0.39062 1.6974892
TTCT 158 0.0031941 0.39062 0.8176930
TTGA 316 0.0063882 0.39062 1.6353859
TTGC 344 0.0069543 0.39062 1.7802935
TTGG 215 0.0043464 0.39062 1.1126835
TTGT 211 0.0042656 0.39062 1.0919824
TTTA 160 0.0032345 0.0039062 0.8280435
TTTC 399 0.0080661 0.0039062 2.0649335
TTTG 392 0.0079246 0.0039062 2.0287066
TTTT 573 0.0115837 0.0039062 2.9654308
Other 0 0.0000000 0.0000000 10000000000.0000000
#
# Output from 'compseq'
#
# The Expected frequencies are calculated on the (false) assumption that every
# word has equal frequency.
#
# The input sequences are:
# s21_contig00002
Word size 4
Total count 29078
#
# Word Obs Count Obs Frequency Exp Frequency Obs/Exp Frequency
#
AAAA 364 0.0125181 0.0039062 3.2046221
AAAC 202 0.0069468 0.0039062 1.7783892
AAAG 170 0.0058463 0.0039062 1.4966641
AAAT 227 0.0078066 0.0039062 1.9984868
AACA 178 0.0061215 0.0039062 1.5670954
AACC 87 0.0029920 0.0039062 0.7659399
AACG 168 0.0057776 0.0039062 1.4790563
AACT 82 0.0028200 0.39062 0.7219204
AAGA 146 0.0050210 0.39062 1.2853704
AAGC 188 0.0064654 0.39062 1.6551345
AAGG 142 0.0048834 0.39062 1.2501548
AAGT 87 0.0029920 0.39062 0.7659399
AATA 150 0.0051585 0.39062 1.3205860
AATC 153 0.0052617 0.39062 1.3469977
AATG 160 0.0055024 0.0039062 1.4086251
AATT 140 0.0048146 0.0039062 1.2325469
ACAA 183 0.0062934 0.0039062 1.6111149
ACAC 72 0.0024761 0.0039062 0.6338813
ACAG 92 0.0031639 0.0039062 0.8099594
ACAT 122 0.0041956 0.0039062 1.0740766
ACCA 71 0.0024417 0.0039062 0.6250774
ACCC 46 0.0015820 0.0039062 0.4049797
ACCG 122 0.0041956 0.0039062 1.0740766
ACCT 42 0.0014444 0.0039062 0.3697641
ACGA 138 0.0047459 0.0039062 1.2149391
ACGC 89 0.0030607 0.0039062 0.7835477
ACGG 102 0.0035078 0.0039062 0.8979985
ACGT 82 0.0028200 0.0039062 0.7219204
ACTA 40 0.0013756 0.0039062 0.3521563
ACTC 46 0.0015820 0.0039062 0.4049797
ACTG 64 0.0022010 0.0039062 0.5634500
ACTT 87 0.0029920 0.0039062 0.7659399
AGAA 140 0.0048146 0.0039062 1.2325469
AGAC 56 0.0019259 0.0039062 0.4930188
AGAG 61 0.0020978 0.0039062 0.5370383
AGAT 77 0.0026481 0.0039062 0.6779008
AGCA 145 0.0049866 0.0039062 1.2765665
AGCC 103 0.0035422 0.0039062 0.9068024
AGCG 170 0.0058463 0.0039062 1.4966641
AGCT 86 0.0029576 0.0039062 0.7571360
AGGA 118 0.0040581 0.0039062 1.0388610
AGGC 91 0.0031295 0.0039062 0.8011555
AGGG 84 0.0028888 0.0039062 0.7395282
AGGT 42 0.0014444 0.0039062 0.3697641
AGTA 47 0.0016163 0.0039062 0.4137836
AGTC 46 0.0015820 0.0039062 0.4049797
AGTG 62 0.0021322 0.0039062 0.5458422
AGTT 82 0.0028200 0.0039062 0.7219204
ATAA 120 0.0041268 0.0039062 1.0564688
ATAC 86 0.0029576 0.0039062 0.7571360
ATAG 76 0.0026137 0.0039062 0.6690969
ATAT 170 0.0058463 0.0039062 1.4966641
ATCA 141 0.0048490 0.0039062 1.2413508
ATCC 117 0.0040237 0.0039062 1.0300571
ATCG 204 0.0070156 0.0039062 1.7959970
ATCT 77 0.0026481 0.0039062 0.6779008
ATGA 197 0.0067749 0.0039062 1.7343696
ATGC 122 0.0041956 0.0039062 1.0740766
ATGG 147 0.0050554 0.0039062 1.2941743
ATGT 122 0.0041956 0.0039062 1.0740766
ATTA 85 0.0029232 0.0039062 0.7483321
ATTC 153 0.0052617 0.0039062 1.3469977
ATTG 138 0.0047459 0.0039062 1.2149391
ATTT 227 0.0078066 0.0039062 1.9984868
CAAA 234 0.0080473 0.0039062 2.0601142
CAAC 136 0.0046771 0.0039062 1.1973313
CAAG 155 0.0053305 0.0039062 1.3646055
CAAT 138 0.0047459 0.0039062 1.2149391
CACA 81 0.0027856 0.0039062 0.7131164
CACC 88 0.0030263 0.0039062 0.7747438
CACG 72 0.0024761 0.0039062 0.6338813
CACT 62 0.0021322 0.0039062 0.5458422
CAGA 52 0.0017883 0.0039062 0.4578032
CAGC 152 0.0052273 0.0039062 1.3381938
CAGG 55 0.0018915 0.0039062 0.4842149
CAGT 64 0.0022010 0.0039062 0.5634500
CATA 108 0.0037141 0.0039062 0.9508219
CATC 194 0.0066717 0.0039062 1.7079579
CATG 126 0.0043332 0.0039062 1.1092922
CATT 160 0.0055024 0.0039062 1.4086251
CCAA 144 0.0049522 0.0039062 1.2677626
CCAC 71 0.0024417 0.0039062 0.6250774
CCAG 63 0.0021666 0.0039062 0.5546461
CCAT 147 0.0050554 0.0039062 1.2941743
CCCA 77 0.0026481 0.0039062 0.6779008
CCCC 94 0.0032327 0.0039062 0.8275672
CCCG 81 0.0027856 0.0039062 0.7131164
CCCT 84 0.0028888 0.0039062 0.7395282
CCGA 110 0.0037829 0.0039062 0.9684297
CCGC 167 0.0057432 0.0039062 1.4702524
CCGG 110 0.0037829 0.0039062 0.9684297
CCGT 102 0.0035078 0.0039062 0.8979985
CCTA 49 0.0016851 0.0039062 0.4313914
CCTC 90 0.0030951 0.0039062 0.7923516
CCTG 55 0.0018915 0.0039062 0.4842149
CCTT 142 0.0048834 0.0039062 1.2501548
CGAA 162 0.0055712 0.0039062 1.4262329
CGAC 101 0.0034734 0.0039062 0.8891946
CGAG 96 0.0033015 0.0039062 0.8451750
CGAT 204 0.0070156 0.0039062 1.7959970
CGCA 94 0.0032327 0.0039062 0.8275672
CGCC 183 0.0062934 0.0039062 1.6111149
CGCG 120 0.0041268 0.0039062 1.0564688
CGCT 170 0.0058463 0.0039062 1.4966641
CGGA 116 0.0039893 0.0039062 1.0212532
CGGC 171 0.0058807 0.0039062 1.5054681
CGGG 81 0.0027856 0.0039062 0.7131164
CGGT 122 0.0041956 0.0039062 1.0740766
CGTA 61 0.0020978 0.0039062 0.5370383
CGTC 110 0.0037829 0.0039062 0.9684297
CGTG 72 0.0024761 0.0039062 0.6338813
CGTT 168 0.0057776 0.0039062 1.4790563
CTAA 47 0.0016163 0.0039062 0.4137836
CTAC 46 0.0015820 0.0039062 0.4049797
CTAG 20 0.0006878 0.0039062 0.1760781
CTAT 76 0.0026137 0.0039062 0.6690969
CTCA 70 0.0024073 0.0039062 0.6162735
CTCC 109 0.0037485 0.0039062 0.9596258
CTCG 96 0.0033015 0.0039062 0.8451750
CTCT 61 0.0020978 0.0039062 0.5370383
CTGA 71 0.0024417 0.0039062 0.6250774
CTGC 97 0.0033359 0.0039062 0.8539790
CTGG 63 0.0021666 0.0039062 0.5546461
CTGT 92 0.0031639 0.0039062 0.8099594
CTTA 69 0.0023729 0.0039062 0.6074696
CTTC 169 0.0058120 0.0039062 1.4878602
CTTG 155 0.0053305 0.0039062 1.3646055
CTTT 170 0.0058463 0.0039062 1.4966641
GAAA 247 0.0084944 0.0039062 2.1745650
GAAC 126 0.0043332 0.0039062 1.1092922
GAAG 169 0.0058120 0.0039062 1.4878602
GAAT 153 0.0052617 0.0039062 1.3469977
GACA 110 0.0037829 0.0039062 0.9684297
GACC 60 0.0020634 0.0039062 0.5282344
GACG 110 0.0037829 0.0039062 0.9684297
GACT 46 0.0015820 0.0039062 0.4049797
GAGA 93 0.0031983 0.0039062 0.8187633
GAGC 107 0.0036798 0.0039062 0.9420180
GAGG 90 0.0030951 0.0039062 0.7923516
GAGT 46 0.0015820 0.0039062 0.4049797
GATA 80 0.0027512 0.0039062 0.7043125
GATC 112 0.0038517 0.0039062 0.9860376
GATG 194 0.0066717 0.0039062 1.7079579
GATT 153 0.0052617 0.0039062 1.3469977
GCAA 172 0.0059151 0.0039062 1.5142720
GCAC 73 0.0025105 0.0039062 0.6426852
GCAG 97 0.0033359 0.0039062 0.8539790
GCAT 122 0.0041956 0.0039062 1.0740766
GCCA 146 0.0050210 0.0039062 1.2853704
GCCC 81 0.0027856 0.0039062 0.7131164
GCCG 171 0.0058807 0.0039062 1.5054681
GCCT 91 0.0031295 0.0039062 0.8011555
GCGA 151 0.0051929 0.0039062 1.3293899
GCGC 160 0.0055024 0.0039062 1.4086251
GCGG 167 0.0057432 0.0039062 1.4702524
GCGT 89 0.0030607 0.0039062 0.7835477
GCTA 57 0.0019602 0.0039062 0.5018227
GCTC 107 0.0036798 0.0039062 0.9420180
GCTG 152 0.0052273 0.0039062 1.3381938
GCTT 188 0.0064654 0.0039062 1.6551345
GGAA 188 0.0064654 0.0039062 1.6551345
GGAC 66 0.0022698 0.0039062 0.5810578
GGAG 109 0.0037485 0.0039062 0.9596258
GGAT 117 0.0040237 0.0039062 1.0300571
GGCA 133 0.0045739 0.0039062 1.1709196
GGCC 70 0.0024073 0.0039062 0.6162735
GGCG 183 0.0062934 0.0039062 1.6111149
GGCT 103 0.0035422 0.0039062 0.9068024
GGGA 115 0.0039549 0.0039062 1.0124493
GGGC 81 0.0027856 0.0039062 0.7131164
GGGG 94 0.0032327 0.0039062 0.8275672
GGGT 46 0.0015820 0.0039062 0.4049797
GGTA 46 0.0015820 0.0039062 0.4049797
GGTC 60 0.0020634 0.0039062 0.5282344
GGTG 88 0.0030263 0.0039062 0.7747438
GGTT 87 0.0029920 0.0039062 0.7659399
GTAA 70 0.0024073 0.0039062 0.6162735
GTAC 52 0.0017883 0.0039062 0.4578032
GTAG 46 0.0015820 0.0039062 0.4049797
GTAT 86 0.0029576 0.0039062 0.7571360
GTCA 103 0.0035422 0.0039062 0.9068024
GTCC 66 0.0022698 0.0039062 0.5810578
GTCG 101 0.0034734 0.0039062 0.8891946
GTCT 56 0.0019259 0.0039062 0.4930188
GTGA 87 0.0029920 0.0039062 0.7659399
GTGC 73 0.0025105 0.0039062 0.6426852
GTGG 71 0.0024417 0.0039062 0.6250774
GTGT 72 0.0024761 0.0039062 0.6338813
GTTA 51 0.0017539 0.0039062 0.4489992
GTTC 126 0.0043332 0.0039062 1.1092922
GTTG 136 0.0046771 0.0039062 1.1973313
GTTT 202 0.0069468 0.0039062 1.7783892
TAAA 118 0.0040581 0.0039062 1.0388610
TAAC 51 0.0017539 0.0039062 0.4489992
TAAG 69 0.0023729 0.0039062 0.6074696
TAAT 85 0.0029232 0.0039062 0.7483321
TACA 100 0.0034390 0.0039062 0.8803907
TACC 46 0.0015820 0.0039062 0.4049797
TACG 61 0.0020978 0.0039062 0.5370383
TACT 47 0.0016163 0.0039062 0.4137836
TAGA 43 0.0014788 0.0039062 0.3785680
TAGC 57 0.0019602 0.0039062 0.5018227
TAGG 49 0.0016851 0.0039062 0.4313914
TAGT 40 0.0013756 0.0039062 0.3521563
TATA 114 0.0039205 0.0039062 1.0036454
TATC 80 0.0027512 0.0039062 0.7043125
TATG 108 0.0037141 0.0039062 0.9508219
TATT 150 0.0051585 0.0039062 1.3205860
TCAA 164 0.0056400 0.0039062 1.4438407
TCAC 87 0.0029920 0.0039062 0.7659399
TCAG 71 0.0024417 0.0039062 0.6250774
TCAT 197 0.0067749 0.0039062 1.7343696
TCCA 131 0.0045051 0.0039062 1.1533118
TCCC 115 0.0039549 0.0039062 1.0124493
TCCG 116 0.0039893 0.0039062 1.0212532
TCCT 118 0.0040581 0.0039062 1.0388610
TCGA 164 0.0056400 0.0039062 1.4438407
TCGC 151 0.0051929 0.0039062 1.3293899
TCGG 110 0.0037829 0.0039062 0.9684297
TCGT 138 0.0047459 0.0039062 1.2149391
TCTA 43 0.0014788 0.0039062 0.3785680
TCTC 93 0.0031983 0.0039062 0.8187633
TCTG 52 0.0017883 0.0039062 0.4578032
TCTT 146 0.0050210 0.0039062 1.2853704
TGAA 205 0.0070500 0.0039062 1.8048009
TGAC 103 0.0035422 0.0039062 0.9068024
TGAG 70 0.0024073 0.0039062 0.6162735
TGAT 141 0.0048490 0.0039062 1.2413508
TGCA 92 0.0031639 0.0039062 0.8099594
TGCC 133 0.0045739 0.0039062 1.1709196
TGCG 94 0.0032327 0.0039062 0.8275672
TGCT 145 0.0049866 0.0039062 1.2765665
TGGA 131 0.0045051 0.0039062 1.1533118
TGGC 146 0.0050210 0.0039062 1.2853704
TGGG 77 0.0026481 0.0039062 0.6779008
TGGT 71 0.0024417 0.0039062 0.6250774
TGTA 100 0.0034390 0.0039062 0.8803907
TGTC 110 0.0037829 0.0039062 0.9684297
TGTG 81 0.0027856 0.0039062 0.7131164
TGTT 178 0.0061215 0.0039062 1.5670954
TTAA 86 0.0029576 0.0039062 0.7571360
TTAC 70 0.0024073 0.0039062 0.6162735
TTAG 47 0.0016163 0.0039062 0.4137836
TTAT 120 0.0041268 0.0039062 1.0564688
TTCA 205 0.0070500 0.0039062 1.8048009
TTCC 188 0.0064654 0.0039062 1.6551345
TTCG 162 0.0055712 0.0039062 1.4262329
TTCT 140 0.0048146 0.0039062 1.2325469
TTGA 164 0.0056400 0.0039062 1.4438407
TTGC 172 0.0059151 0.0039062 1.5142720
TTGG 144 0.0049522 0.0039062 1.2677626
TTGT 183 0.0062934 0.0039062 1.6111149
TTTA 118 0.0040581 0.0039062 1.0388610
TTTC 247 0.0084944 0.0039062 2.1745650
TTTG 234 0.0080473 0.0039062 2.0601142
TTTT 364 0.0125181 0.0039062 3.2046221
Other 0 0.0000000 0.0000000 10000000000.0000000
I would like to capture the first and third columns (Word and Obs Frequency) of the first block (which begins at the line containing only # and ends at the line containing "Other") and transpose them. From each of the following blocks, I just want to transpose the Obs Frequency column and place it under the first transposed row. The output file should look like this:
AAAA AAAC AAAG AAAT AACA AACC AACG AACT AAGA AAGC AAGG AAGT AATA AATC AATG AATT ACAA ACAC ACAG ACAT ACCA ACCC ACCG ACCT ACGA ACGC ACGG ACGT ACTA ACTC ACTG ACTT AGAA AGAC AGAG AGAT AGCA AGCC AGCG AGCT AGGA AGGC AGGG AGGT AGTA AGTC AGTG AGTT ATAA ATAC ATAG ATAT ATCA ATCC ATCG ATCT ATGA ATGC ATGG ATGT ATTA ATTC ATTG ATTT CAAA CAAC CAAG CAAT CACA CACC CACG CACT CAGA CAGC CAGG CAGT CATA CATC CATG CATT CCAA CCAC CCAG CCAT CCCA CCCC CCCG CCCT CCGA CCGC CCGG CCGT CCTA CCTC CCTG CCTT CGAA CGAC CGAG CGAT CGCA CGCC CGCG CGCT CGGA CGGC CGGG CGGT CGTA CGTC CGTG CGTT CTAA CTAC CTAG CTAT CTCA CTCC CTCG CTCT CTGA CTGC CTGG CTGT CTTA CTTC CTTG CTTT GAAA GAAC GAAG GAAT GACA GACC GACG GACT GAGA GAGC GAGG GAGT GATA GATC GATG GATT GCAA GCAC GCAG GCAT GCCA GCCC GCCG GCCT GCGA GCGC GCGG GCGT GCTA GCTC GCTG GCTT GGAA GGAC GGAG GGAT GGCA GGCC GGCG GGCT GGGA GGGC GGGG GGGT GGTA GGTC GGTG GGTT GTAA GTAC GTAG GTAT GTCA GTCC GTCG GTCT GTGA GTGC GTGG GTGT GTTA GTTC GTTG GTTT TAAA TAAC TAAG TAAT TACA TACC TACG TACT TAGA TAGC TAGG TAGT TATA TATC TATG TATT TCAA TCAC TCAG TCAT TCCA TCCC TCCG TCCT TCGA TCGC TCGG TCGT TCTA TCTC TCTG TCTT TGAA TGAC TGAG TGAT TGCA TGCC TGCG TGCT TGGA TGGC TGGG TGGT TGTA TGTC TGTG TGTT TTAA TTAC TTAG TTAT TTCA TTCC TTCG TTCT TTGA TTGC TTGG TTGT TTTA TTTC TTTG TTTT
s21_contig00001 0.0125181 0.0069468 0.0058463 0.0078066 0.0061215 0.0029920 0.0057776 0.0028200 0.0050210 0.0064654 0.0048834 0.0029920 0.0051585 0.0052617 0.0055024 0.0048146 0.0062934 0.0024761 0.0031639 0.0041956 0.0024417 0.0015820 0.0041956 0.0014444 0.0047459 0.0030607 0.0035078 0.0028200 0.0013756 0.0015820 0.0022010 0.0029920 0.0048146 0.0019259 0.0020978 0.0026481 0.0049866 0.0035422 0.0058463 0.0029576 0.0040581 0.0031295 0.0028888 0.0014444 0.0016163 0.0015820 0.0021322 0.0028200 0.0041268 0.0029576 0.0026137 0.0058463 0.0048490 0.0040237 0.0070156 0.0026481 0.0067749 0.0041956 0.0050554 0.0041956 0.0029232 0.0052617 0.0047459 0.0078066 0.0080473 0.0046771 0.0053305 0.0047459 0.0027856 0.0030263 0.0024761 0.0021322 0.0017883 0.0052273 0.0018915 0.0022010 0.0037141 0.0066717 0.0043332 0.0055024 0.0049522 0.0024417 0.0021666 0.0050554 0.0026481 0.0032327 0.0027856 0.0028888 0.0037829 0.0057432 0.0037829 0.0035078 0.0016851 0.0030951 0.0018915 0.0048834 0.0055712 0.0034734 0.0033015 0.0070156 0.0032327 0.0062934 0.0041268 0.0058463 0.0039893 0.0058807 0.0027856 0.0041956 0.0020978 0.0037829 0.0024761 0.0057776 0.0016163 0.0015820 0.0006878 0.0026137 0.0024073 0.0037485 0.0033015 0.0020978 0.0024417 0.0033359 0.0021666 0.0031639 0.0023729 0.0058120 0.0053305 0.0058463 0.0084944 0.0043332 0.0058120 0.0052617 0.0037829 0.0020634 0.0037829 0.0015820 0.0031983 0.0036798 0.0030951 0.0015820 0.0027512 0.0038517 0.0066717 0.0052617 0.0059151 0.0025105 0.0033359 0.0041956 0.0050210 0.0027856 0.0058807 0.0031295 0.0051929 0.0055024 0.0057432 0.0030607 0.0019602 0.0036798 0.0052273 0.0064654 0.0064654 0.0022698 0.0037485 0.0040237 0.0045739 0.0024073 0.0062934 0.0035422 0.0039549 0.0027856 0.0032327 0.0015820 0.0015820 0.0020634 0.0030263 0.0029920 0.0024073 0.0017883 0.0015820 0.0029576 0.0035422 0.0022698 0.0034734 0.0019259 0.0029920 0.0025105 0.0024417 0.0024761 0.0017539 0.0043332 0.0046771 0.0069468 0.0040581 0.0017539 0.0023729 0.0029232 0.0034390 0.0015820 
0.0020978 0.0016163 0.0014788 0.0019602 0.0016851 0.0013756 0.0039205 0.0027512 0.0037141 0.0051585 0.0056400 0.0029920 0.0024417 0.0067749 0.0045051 0.0039549 0.0039893 0.0040581 0.0056400 0.0051929 0.0037829 0.0047459 0.0014788 0.0031983 0.0017883 0.0050210 0.0070500 0.0035422 0.0024073 0.0048490 0.0031639 0.0045739 0.0032327 0.0049866 0.0045051 0.0050210 0.0026481 0.0024417 0.0034390 0.0037829 0.0027856 0.0061215 0.0029576 0.0024073 0.0016163 0.0041268 0.0070500 0.0064654 0.0055712 0.0048146 0.0056400 0.0059151 0.0049522 0.0062934 0.0040581 0.0084944 0.0080473 0.0125181
s21_contig00002 0.0125181 0.0069468 0.0058463 0.0078066 0.0061215 0.0029920 0.0057776 0.0028200 0.0050210 0.0064654 0.0048834 0.0029920 0.0051585 0.0052617 0.0055024 0.0048146 0.0062934 0.0024761 0.0031639 0.0041956 0.0024417 0.0015820 0.0041956 0.0014444 0.0047459 0.0030607 0.0035078 0.0028200 0.0013756 0.0015820 0.0022010 0.0029920 0.0048146 0.0019259 0.0020978 0.0026481 0.0049866 0.0035422 0.0058463 0.0029576 0.0040581 0.0031295 0.0028888 0.0014444 0.0016163 0.0015820 0.0021322 0.0028200 0.0041268 0.0029576 0.0026137 0.0058463 0.0048490 0.0040237 0.0070156 0.0026481 0.0067749 0.0041956 0.0050554 0.0041956 0.0029232 0.0052617 0.0047459 0.0078066 0.0080473 0.0046771 0.0053305 0.0047459 0.0027856 0.0030263 0.0024761 0.0021322 0.0017883 0.0052273 0.0018915 0.0022010 0.0037141 0.0066717 0.0043332 0.0055024 0.0049522 0.0024417 0.0021666 0.0050554 0.0026481 0.0032327 0.0027856 0.0028888 0.0037829 0.0057432 0.0037829 0.0035078 0.0016851 0.0030951 0.0018915 0.0048834 0.0055712 0.0034734 0.0033015 0.0070156 0.0032327 0.0062934 0.0041268 0.0058463 0.0039893 0.0058807 0.0027856 0.0041956 0.0020978 0.0037829 0.0024761 0.0057776 0.0016163 0.0015820 0.0006878 0.0026137 0.0024073 0.0037485 0.0033015 0.0020978 0.0024417 0.0033359 0.0021666 0.0031639 0.0023729 0.0058120 0.0053305 0.0058463 0.0084944 0.0043332 0.0058120 0.0052617 0.0037829 0.0020634 0.0037829 0.0015820 0.0031983 0.0036798 0.0030951 0.0015820 0.0027512 0.0038517 0.0066717 0.0052617 0.0059151 0.0025105 0.0033359 0.0041956 0.0050210 0.0027856 0.0058807 0.0031295 0.0051929 0.0055024 0.0057432 0.0030607 0.0019602 0.0036798 0.0052273 0.0064654 0.0064654 0.0022698 0.0037485 0.0040237 0.0045739 0.0024073 0.0062934 0.0035422 0.0039549 0.0027856 0.0032327 0.0015820 0.0015820 0.0020634 0.0030263 0.0029920 0.0024073 0.0017883 0.0015820 0.0029576 0.0035422 0.0022698 0.0034734 0.0019259 0.0029920 0.0025105 0.0024417 0.0024761 0.0017539 0.0043332 0.0046771 0.0069468 0.0040581 0.0017539 0.0023729 0.0029232 0.0034390 0.0015820 
0.0020978 0.0016163 0.0014788 0.0019602 0.0016851 0.0013756 0.0039205 0.0027512 0.0037141 0.0051585 0.0056400 0.0029920 0.0024417 0.0067749 0.0045051 0.0039549 0.0039893 0.0040581 0.0056400 0.0051929 0.0037829 0.0047459 0.0014788 0.0031983 0.0017883 0.0050210 0.0070500 0.0035422 0.0024073 0.0048490 0.0031639 0.0045739 0.0032327 0.0049866 0.0045051 0.0050210 0.0026481 0.0024417 0.0034390 0.0037829 0.0027856 0.0061215 0.0029576 0.0024073 0.0016163 0.0041268 0.0070500 0.0064654 0.0055712 0.0048146 0.0056400 0.0059151 0.0049522 0.0062934 0.0040581 0.0084944 0.0080473 0.0125181
Importantly, the identifier of each block with the pattern "21_contig" located under the statement "The input sequences are:" should be placed in the first column, replacing "Obs Frequency".
This seems to work too (save this code as transpose.awk):
/^# +s21_contig[0-9]+/ {
    if (source) print_results()
    source = $2
}
/^[ACGT]+ / {
    if (!($1 in key))
    {
        key[$1] = 1
        seq[++nkeys] = $1
    }
    obs[$1] = $3
}
END { print_results() }
function print_results(    i)
{
    if (printed_header == 0)
    {
        pad = " "
        for (i = 1; i <= nkeys; i++)
        {
            printf "%s%s", pad, seq[i]
            pad = " "
        }
        printf "\n"
        printed_header++
    }
    printf "%s ", source
    for (i = 1; i <= nkeys; i++)
        printf " %-9s", obs[seq[i]]
    printf "\n"
    delete obs
}
Run the script as:
awk -f transpose.awk data
On the given data:
AAAA AAAC AAAG AAAT AACA AACC AACG AACT AAGA AAGC AAGG AAGT AATA AATC AATG AATT ACAA ACAC ACAG ACAT ACCA ACCC ACCG ACCT ACGA ACGC ACGG ACGT ACTA ACTC ACTG ACTT AGAA AGAC AGAG AGAT AGCA AGCC AGCG AGCT AGGA AGGC AGGG AGGT AGTA AGTC AGTG AGTT ATAA ATAC ATAG ATAT ATCA ATCC ATCG ATCT ATGA ATGC ATGG ATGT ATTA ATTC ATTG ATTT CAAA CAAC CAAG CAAT CACA CACC CACG CACT CAGA CAGC CAGG CAGT CATA CATC CATG CATT CCAA CCAC CCAG CCAT CCCA CCCC CCCG CCCT CCGA CCGC CCGG CCGT CCTA CCTC CCTG CCTT CGAA CGAC CGAG CGAT CGCA CGCC CGCG CGCT CGGA CGGC CGGG CGGT CGTA CGTC CGTG CGTT CTAA CTAC CTAG CTAT CTCA CTCC CTCG CTCT CTGA CTGC CTGG CTGT CTTA CTTC CTTG CTTT GAAA GAAC GAAG GAAT GACA GACC GACG GACT GAGA GAGC GAGG GAGT GATA GATC GATG GATT GCAA GCAC GCAG GCAT GCCA GCCC GCCG GCCT GCGA GCGC GCGG GCGT GCTA GCTC GCTG GCTT GGAA GGAC GGAG GGAT GGCA GGCC GGCG GGCT GGGA GGGC GGGG GGGT GGTA GGTC GGTG GGTT GTAA GTAC GTAG GTAT GTCA GTCC GTCG GTCT GTGA GTGC GTGG GTGT GTTA GTTC GTTG GTTT TAAA TAAC TAAG TAAT TACA TACC TACG TACT TAGA TAGC TAGG TAGT TATA TATC TATG TATT TCAA TCAC TCAG TCAT TCCA TCCC TCCG TCCT TCGA TCGC TCGG TCGT TCTA TCTC TCTG TCTT TGAA TGAC TGAG TGAT TGCA TGCC TGCG TGCT TGGA TGGC TGGG TGGT TGTA TGTC TGTG TGTT TTAA TTAC TTAG TTAT TTCA TTCC TTCG TTCT TTGA TTGC TTGG TTGT TTTA TTTC TTTG TTTT
s21_contig00001 0.0115837 0.0060850 0.0061659 0.0069745 0.0045890 0.0022844 0.0064893 0.0022035 0.0044879 0.0068532 0.0039623 0.0034165 0.0026079 0.0045688 0.0057817 0.0039623 0.0042656 0.0018396 0.0020822 0.0033761 0.0016173 0.0014555 0.0043869 0.0010512 0.0065095 0.0040634 0.0050944 0.0040836 0.0007076 0.0015162 0.0017588 0.0034165 0.0031941 0.0018396 0.0020216 0.0016981 0.0046497 0.0037399 0.0070756 0.0044071 0.0029111 0.0029920 0.0022035 0.0010512 0.0012938 0.0017790 0.0021227 0.0022035 0.0027494 0.0020216 0.0012938 0.0031132 0.0048922 0.0034771 0.0087131 0.0016981 0.0062871 0.0046497 0.0043060 0.0033761 0.0022237 0.0033558 0.0043666 0.0069745 0.0079246 0.0041645 0.0054987 0.0043666 0.0016375 0.0026483 0.0028100 0.0021227 0.0011523 0.0061254 0.0013545 0.0017588 0.0025674 0.0065904 0.0036793 0.0057817 0.0043464 0.0017588 0.0020216 0.0043060 0.0021429 0.0027291 0.0042858 0.0022035 0.0055796 0.0077225 0.0059435 0.0050944 0.0007278 0.0030930 0.0013545 0.0039623 0.0066308 0.0064489 0.0048720 0.0087131 0.0049933 0.0094004 0.0072373 0.0070756 0.0057211 0.0099462 0.0042858 0.0043869 0.0027494 0.0077023 0.0028100 0.0064893 0.0008895 0.0008491 0.0002426 0.0012938 0.0026483 0.0032345 0.0048720 0.0020216 0.0028909 0.0033963 0.0020216 0.0020822 0.0012332 0.0058222 0.0054987 0.0061659 0.0080661 0.0042656 0.0058222 0.0033558 0.0038006 0.0026685 0.0077023 0.0017790 0.0023653 0.0058020 0.0030930 0.0015162 0.0027696 0.0048518 0.0065904 0.0045688 0.0069543 0.0030526 0.0033963 0.0046497 0.0052561 0.0037602 0.0099462 0.0029920 0.0074192 0.0095015 0.0077225 0.0040634 0.0010917 0.0058020 0.0061254 0.0068532 0.0059637 0.0027898 0.0032345 0.0034771 0.0050540 0.0037602 0.0094004 0.0037399 0.0034165 0.0037602 0.0027291 0.0014555 0.0009097 0.0026685 0.0026483 0.0022844 0.0018801 0.0017386 0.0008491 0.0020216 0.0048720 0.0027898 0.0064489 0.0018396 0.0025674 0.0030526 0.0017588 0.0018396 0.0010512 0.0042656 0.0041645 0.0060850 0.0032345 0.0010512 0.0012332 0.0022237 0.0015364 0.0009097 
0.0027494 0.0012938 0.0007480 0.0010917 0.0007278 0.0007076 0.0012130 0.0027696 0.0025674 0.0026079 0.0063882 0.0025674 0.0028909 0.0062871 0.0034165 0.0034165 0.0057211 0.0029111 0.0071564 0.0074192 0.0055796 0.0065095 0.0007480 0.0023653 0.0011523 0.0044879 0.0057211 0.0048720 0.0026483 0.0048922 0.0033558 0.0050540 0.0049933 0.0046497 0.0034165 0.0052561 0.0021429 0.0016173 0.0015364 0.0038006 0.0016375 0.0045890 0.0022237 0.0018801 0.0008895 0.0027494 0.0057211 0.0059637 0.0066308 0.0031941 0.0063882 0.0069543 0.0043464 0.0042656 0.0032345 0.0080661 0.0079246 0.0115837
s21_contig00002 0.0125181 0.0069468 0.0058463 0.0078066 0.0061215 0.0029920 0.0057776 0.0028200 0.0050210 0.0064654 0.0048834 0.0029920 0.0051585 0.0052617 0.0055024 0.0048146 0.0062934 0.0024761 0.0031639 0.0041956 0.0024417 0.0015820 0.0041956 0.0014444 0.0047459 0.0030607 0.0035078 0.0028200 0.0013756 0.0015820 0.0022010 0.0029920 0.0048146 0.0019259 0.0020978 0.0026481 0.0049866 0.0035422 0.0058463 0.0029576 0.0040581 0.0031295 0.0028888 0.0014444 0.0016163 0.0015820 0.0021322 0.0028200 0.0041268 0.0029576 0.0026137 0.0058463 0.0048490 0.0040237 0.0070156 0.0026481 0.0067749 0.0041956 0.0050554 0.0041956 0.0029232 0.0052617 0.0047459 0.0078066 0.0080473 0.0046771 0.0053305 0.0047459 0.0027856 0.0030263 0.0024761 0.0021322 0.0017883 0.0052273 0.0018915 0.0022010 0.0037141 0.0066717 0.0043332 0.0055024 0.0049522 0.0024417 0.0021666 0.0050554 0.0026481 0.0032327 0.0027856 0.0028888 0.0037829 0.0057432 0.0037829 0.0035078 0.0016851 0.0030951 0.0018915 0.0048834 0.0055712 0.0034734 0.0033015 0.0070156 0.0032327 0.0062934 0.0041268 0.0058463 0.0039893 0.0058807 0.0027856 0.0041956 0.0020978 0.0037829 0.0024761 0.0057776 0.0016163 0.0015820 0.0006878 0.0026137 0.0024073 0.0037485 0.0033015 0.0020978 0.0024417 0.0033359 0.0021666 0.0031639 0.0023729 0.0058120 0.0053305 0.0058463 0.0084944 0.0043332 0.0058120 0.0052617 0.0037829 0.0020634 0.0037829 0.0015820 0.0031983 0.0036798 0.0030951 0.0015820 0.0027512 0.0038517 0.0066717 0.0052617 0.0059151 0.0025105 0.0033359 0.0041956 0.0050210 0.0027856 0.0058807 0.0031295 0.0051929 0.0055024 0.0057432 0.0030607 0.0019602 0.0036798 0.0052273 0.0064654 0.0064654 0.0022698 0.0037485 0.0040237 0.0045739 0.0024073 0.0062934 0.0035422 0.0039549 0.0027856 0.0032327 0.0015820 0.0015820 0.0020634 0.0030263 0.0029920 0.0024073 0.0017883 0.0015820 0.0029576 0.0035422 0.0022698 0.0034734 0.0019259 0.0029920 0.0025105 0.0024417 0.0024761 0.0017539 0.0043332 0.0046771 0.0069468 0.0040581 0.0017539 0.0023729 0.0029232 0.0034390 0.0015820 
0.0020978 0.0016163 0.0014788 0.0019602 0.0016851 0.0013756 0.0039205 0.0027512 0.0037141 0.0051585 0.0056400 0.0029920 0.0024417 0.0067749 0.0045051 0.0039549 0.0039893 0.0040581 0.0056400 0.0051929 0.0037829 0.0047459 0.0014788 0.0031983 0.0017883 0.0050210 0.0070500 0.0035422 0.0024073 0.0048490 0.0031639 0.0045739 0.0032327 0.0049866 0.0045051 0.0050210 0.0026481 0.0024417 0.0034390 0.0037829 0.0027856 0.0061215 0.0029576 0.0024073 0.0016163 0.0041268 0.0070500 0.0064654 0.0055712 0.0048146 0.0056400 0.0059151 0.0049522 0.0062934 0.0040581 0.0084944 0.0080473 0.0125181
The code prints the [ACGT] code sequences in the order in which they're encountered, adding new values as necessary. If a value is missing from one of the sources, it will show as a blank field in the output. The heading line corresponds to the list of [ACGT] code sequences as of the end of the first set of data; the code never tries to print the heading a second time.
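The blank-field behaviour can be seen on a tiny input. The following is only a sketch under assumptions: the two-block sample is made up, the `transpose` and `emit_row` names and the `id` header label are hypothetical, and the output is tab-separated rather than padded. The pattern matching and bookkeeping follow the script above.

```shell
#!/bin/sh
# Hypothetical sample: the second block has no row for the word AAAC.
sample='# s21_contig00001
AAAA 10 0.50 0.25 2.0
AAAC 10 0.50 0.25 2.0
# s21_contig00002
AAAA 20 1.00 0.25 4.0'

# transpose: reads blocks on stdin, prints one tab-separated row per block.
transpose() {
    awk '
    # A new block header: flush the previous block, remember the new id.
    /^# +s21_contig[0-9]+/ { if (source) emit_row(); source = $2 }
    # A data row: record the word on first sight, store its Obs Frequency.
    /^[ACGT]+ / {
        if (!($1 in key)) { key[$1] = 1; seq[++nkeys] = $1 }
        obs[$1] = $3
    }
    END { emit_row() }
    function emit_row(   i) {
        # The header is printed once, with the words known at that point.
        if (!printed_header++) {
            printf "%s", "id"
            for (i = 1; i <= nkeys; i++) printf "\t%s", seq[i]
            printf "\n"
        }
        printf "%s", source
        for (i = 1; i <= nkeys; i++) printf "\t%s", obs[seq[i]]
        printf "\n"
        # Clear the per-block values element by element (portable form).
        for (i in obs) delete obs[i]
    }'
}

printf '%s\n' "$sample" | transpose
```

The row for s21_contig00002 ends in an empty field after the last tab, because AAAC was never seen in that block.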
This awk script should do what you need:
Content of script.awk:
/^# +s21_contig/ { sequence[++seqcnt] = $2 }
{ map[sequence[seqcnt], $1] = $3 }
/^[ACGT]+ / && !seen[$1]++ { words[++wordcnt] = $1 }
END {
    for (word=1; word<=wordcnt; word++) {
        printf "\t%s", words[word]
    }
    print ""
    for (seqnum=1; seqnum<=seqcnt; seqnum++) {
        printf "%s ", sequence[seqnum];
        for (word=1; word<=wordcnt; word++) {
            printf "%s%s", map[sequence[seqnum],words[word]], (word==wordcnt ? RS : FS)
        }
    }
}
Run it like:
awk -f script.awk file
Output:
AAAA AAAC AAAG AAAT AACA AACC AACG AACT AAGA AAGC AAGG AAGT AATA AATC AATG AATT ACAA ACAC ACAG ACAT ACCA ACCC ACCG ACCT ACGA ACGC ACGG ACGT ACTA ACTC ACTG ACTT AGAA AGAC AGAG AGAT AGCA AGCC AGCG AGCT AGGA AGGC AGGG AGGT AGTA AGTC AGTG AGTT ATAA ATAC ATAG ATAT ATCA ATCC ATCG ATCT ATGA ATGC ATGG ATGT ATTA ATTC ATTG ATTT CAAA CAAC CAAG CAAT CACA CACC CACG CACT CAGA CAGC CAGG CAGT CATA CATC CATG CATT CCAA CCAC CCAG CCAT CCCA CCCC CCCG CCCT CCGA CCGC CCGG CCGT CCTA CCTC CCTG CCTT CGAA CGAC CGAG CGAT CGCA CGCC CGCG CGCT CGGA CGGC CGGG CGGT CGTA CGTC CGTG CGTT CTAA CTAC CTAG CTAT CTCA CTCC CTCG CTCT CTGA CTGC CTGG CTGT CTTA CTTC CTTG CTTT GAAA GAAC GAAG GAAT GACA GACC GACG GACT GAGA GAGC GAGG GAGT GATA GATC GATG GATT GCAA GCAC GCAG GCAT GCCA GCCC GCCG GCCT GCGA GCGC GCGG GCGT GCTA GCTC GCTG GCTT GGAA GGAC GGAG GGAT GGCA GGCC GGCG GGCT GGGA GGGC GGGG GGGT GGTA GGTC GGTG GGTT GTAA GTAC GTAG GTAT GTCA GTCC GTCG GTCT GTGA GTGC GTGG GTGT GTTA GTTC GTTG GTTT TAAA TAAC TAAG TAAT TACA TACC TACG TACT TAGA TAGC TAGG TAGT TATA TATC TATG TATT TCAA TCAC TCAG TCAT TCCA TCCC TCCG TCCT TCGA TCGC TCGG TCGT TCTA TCTC TCTG TCTT TGAA TGAC TGAG TGAT TGCA TGCC TGCG TGCT TGGA TGGC TGGG TGGT TGTA TGTC TGTG TGTT TTAA TTAC TTAG TTAT TTCA TTCC TTCG TTCT TTGA TTGC TTGG TTGT TTTA TTTC TTTG TTTT
s21_contig00001 0.0115837 0.0060850 0.0061659 0.0069745 0.0045890 0.0022844 0.0064893 0.0022035 0.0044879 0.0068532 0.0039623 0.0034165 0.0026079 0.0045688 0.0057817 0.0039623 0.0042656 0.0018396 0.0020822 0.0033761 0.0016173 0.0014555 0.0043869 0.0010512 0.0065095 0.0040634 0.0050944 0.0040836 0.0007076 0.0015162 0.0017588 0.0034165 0.0031941 0.0018396 0.0020216 0.0016981 0.0046497 0.0037399 0.0070756 0.0044071 0.0029111 0.0029920 0.0022035 0.0010512 0.0012938 0.0017790 0.0021227 0.0022035 0.0027494 0.0020216 0.0012938 0.0031132 0.0048922 0.0034771 0.0087131 0.0016981 0.0062871 0.0046497 0.0043060 0.0033761 0.0022237 0.0033558 0.0043666 0.0069745 0.0079246 0.0041645 0.0054987 0.0043666 0.0016375 0.0026483 0.0028100 0.0021227 0.0011523 0.0061254 0.0013545 0.0017588 0.0025674 0.0065904 0.0036793 0.0057817 0.0043464 0.0017588 0.0020216 0.0043060 0.0021429 0.0027291 0.0042858 0.0022035 0.0055796 0.0077225 0.0059435 0.0050944 0.0007278 0.0030930 0.0013545 0.0039623 0.0066308 0.0064489 0.0048720 0.0087131 0.0049933 0.0094004 0.0072373 0.0070756 0.0057211 0.0099462 0.0042858 0.0043869 0.0027494 0.0077023 0.0028100 0.0064893 0.0008895 0.0008491 0.0002426 0.0012938 0.0026483 0.0032345 0.0048720 0.0020216 0.0028909 0.0033963 0.0020216 0.0020822 0.0012332 0.0058222 0.0054987 0.0061659 0.0080661 0.0042656 0.0058222 0.0033558 0.0038006 0.0026685 0.0077023 0.0017790 0.0023653 0.0058020 0.0030930 0.0015162 0.0027696 0.0048518 0.0065904 0.0045688 0.0069543 0.0030526 0.0033963 0.0046497 0.0052561 0.0037602 0.0099462 0.0029920 0.0074192 0.0095015 0.0077225 0.0040634 0.0010917 0.0058020 0.0061254 0.0068532 0.0059637 0.0027898 0.0032345 0.0034771 0.0050540 0.0037602 0.0094004 0.0037399 0.0034165 0.0037602 0.0027291 0.0014555 0.0009097 0.0026685 0.0026483 0.0022844 0.0018801 0.0017386 0.0008491 0.0020216 0.0048720 0.0027898 0.0064489 0.0018396 0.0025674 0.0030526 0.0017588 0.0018396 0.0010512 0.0042656 0.0041645 0.0060850 0.0032345 0.0010512 0.0012332 0.0022237 0.0015364 0.0009097 
0.0027494 0.0012938 0.0007480 0.0010917 0.0007278 0.0007076 0.0012130 0.0027696 0.0025674 0.0026079 0.0063882 0.0025674 0.0028909 0.0062871 0.0034165 0.0034165 0.0057211 0.0029111 0.0071564 0.0074192 0.0055796 0.0065095 0.0007480 0.0023653 0.0011523 0.0044879 0.0057211 0.0048720 0.0026483 0.0048922 0.0033558 0.0050540 0.0049933 0.0046497 0.0034165 0.0052561 0.0021429 0.0016173 0.0015364 0.0038006 0.0016375 0.0045890 0.0022237 0.0018801 0.0008895 0.0027494 0.0057211 0.0059637 0.0066308 0.0031941 0.0063882 0.0069543 0.0043464 0.0042656 0.0032345 0.0080661 0.0079246 0.0115837
s21_contig00002 0.0125181 0.0069468 0.0058463 0.0078066 0.0061215 0.0029920 0.0057776 0.0028200 0.0050210 0.0064654 0.0048834 0.0029920 0.0051585 0.0052617 0.0055024 0.0048146 0.0062934 0.0024761 0.0031639 0.0041956 0.0024417 0.0015820 0.0041956 0.0014444 0.0047459 0.0030607 0.0035078 0.0028200 0.0013756 0.0015820 0.0022010 0.0029920 0.0048146 0.0019259 0.0020978 0.0026481 0.0049866 0.0035422 0.0058463 0.0029576 0.0040581 0.0031295 0.0028888 0.0014444 0.0016163 0.0015820 0.0021322 0.0028200 0.0041268 0.0029576 0.0026137 0.0058463 0.0048490 0.0040237 0.0070156 0.0026481 0.0067749 0.0041956 0.0050554 0.0041956 0.0029232 0.0052617 0.0047459 0.0078066 0.0080473 0.0046771 0.0053305 0.0047459 0.0027856 0.0030263 0.0024761 0.0021322 0.0017883 0.0052273 0.0018915 0.0022010 0.0037141 0.0066717 0.0043332 0.0055024 0.0049522 0.0024417 0.0021666 0.0050554 0.0026481 0.0032327 0.0027856 0.0028888 0.0037829 0.0057432 0.0037829 0.0035078 0.0016851 0.0030951 0.0018915 0.0048834 0.0055712 0.0034734 0.0033015 0.0070156 0.0032327 0.0062934 0.0041268 0.0058463 0.0039893 0.0058807 0.0027856 0.0041956 0.0020978 0.0037829 0.0024761 0.0057776 0.0016163 0.0015820 0.0006878 0.0026137 0.0024073 0.0037485 0.0033015 0.0020978 0.0024417 0.0033359 0.0021666 0.0031639 0.0023729 0.0058120 0.0053305 0.0058463 0.0084944 0.0043332 0.0058120 0.0052617 0.0037829 0.0020634 0.0037829 0.0015820 0.0031983 0.0036798 0.0030951 0.0015820 0.0027512 0.0038517 0.0066717 0.0052617 0.0059151 0.0025105 0.0033359 0.0041956 0.0050210 0.0027856 0.0058807 0.0031295 0.0051929 0.0055024 0.0057432 0.0030607 0.0019602 0.0036798 0.0052273 0.0064654 0.0064654 0.0022698 0.0037485 0.0040237 0.0045739 0.0024073 0.0062934 0.0035422 0.0039549 0.0027856 0.0032327 0.0015820 0.0015820 0.0020634 0.0030263 0.0029920 0.0024073 0.0017883 0.0015820 0.0029576 0.0035422 0.0022698 0.0034734 0.0019259 0.0029920 0.0025105 0.0024417 0.0024761 0.0017539 0.0043332 0.0046771 0.0069468 0.0040581 0.0017539 0.0023729 0.0029232 0.0034390 0.0015820 
0.0020978 0.0016163 0.0014788 0.0019602 0.0016851 0.0013756 0.0039205 0.0027512 0.0037141 0.0051585 0.0056400 0.0029920 0.0024417 0.0067749 0.0045051 0.0039549 0.0039893 0.0040581 0.0056400 0.0051929 0.0037829 0.0047459 0.0014788 0.0031983 0.0017883 0.0050210 0.0070500 0.0035422 0.0024073 0.0048490 0.0031639 0.0045739 0.0032327 0.0049866 0.0045051 0.0050210 0.0026481 0.0024417 0.0034390 0.0037829 0.0027856 0.0061215 0.0029576 0.0024073 0.0016163 0.0041268 0.0070500 0.0064654 0.0055712 0.0048146 0.0056400 0.0059151 0.0049522 0.0062934 0.0040581 0.0084944 0.0080473 0.0125181
Doing it in one pass is a challenge. This script accomplishes that. The formatting of output was set to match the sample provided, but can easily be adjusted. The comments inline below explain the script operation.
#!/bin/bash
[ -f "$1" ] || {
    printf "\n Error: insufficient input, file '%s' not found.\n\n" "$1"
    exit 1
}
## this script requires the header row to be equal for each sequence
key="${2:-s21}"         # key to identify sequence ( 3 chars ) default "s21"
currentseq=""           # variable to hold sequence
declare -i needhdr=0    # flag to control print header (0 = still printing it)
declare -i seqcnt=0     # sequence count
declare -a obsfarray    # array to hold Obs Frequency
## make single pass through data file
while read -r word obscnt obsfreq expfreq oefreq || [ -n "$word" ]; do
    ## capture inputseq from obscnt
    if [ "z${obscnt:0:3}" = "z${key}" ]; then
        # if sequence count > 0 the header is already printed and the data is ready to print
        if [ $seqcnt -gt 0 ]; then
            needhdr=1                           # header is complete, stop printing it
            printf "\n%s" "$inputseq"           # print newline followed by input sequence
            for i in "${obsfarray[@]}"; do      # print the Obs Frequency values
                printf " %s" "$i"
            done
            unset obsfarray                     # unset the array for next sequence
        fi
        inputseq="${obscnt}"                    # set the inputseq value from obscnt
        ((seqcnt++))                            # increment the seqcnt
    fi
    ## print header, capture obsfreq values
    # test that first char is A C G T
    if [ "z${word:0:1}" = "zA" ] || [ "z${word:0:1}" = "zC" ] ||
       [ "z${word:0:1}" = "zG" ] || [ "z${word:0:1}" = "zT" ]; then
        if [ "z${word:1:1}" != "zo" ]; then     # get rid of pesky 'Total'
            [ $needhdr -eq 0 ] && printf " %s" "$word"  # print header
            obsfarray+=( "$obsfreq" )           # fill Obs Frequency array
        fi
    fi
    currentseq="$inputseq"                      # keep current seq to test for new value
done <"$1"
# print final sequence and Obs Frequency array
printf "\n%s" "$inputseq"
for i in "${obsfarray[@]}"; do
    printf " %s" "$i"
done
printf "\n"                                     # terminate the last row
unset obsfarray
exit 0
output (showing 5 values each row):
$ ./dna.sh dat/dna.dat
AAAA AAAC AAAG AAAT AACA <snip>
s21_contig00001 0.0115837 0.0060850 0.0061659 0.0069745 0.0045890 <snip>
s21_contig00002 0.0125181 0.0069468 0.0058463 0.0078066 0.0061215 <snip>
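
The script depends on how `read` splits each line into fields: on a block header line the identifier lands in the second variable, and on a data row the third variable holds the Obs Frequency. A minimal sketch of just that step follows. The `classify` function name and the two sample lines are made up for illustration, and a POSIX `case` is used here in place of the bash substring test.

```shell
#!/bin/sh
# classify: show which variable each field of a line ends up in.
classify() {
    # "|| [ -n "$word" ]" keeps a last line that lacks a trailing newline.
    while read -r word obscnt obsfreq _rest || [ -n "$word" ]; do
        case "$obscnt" in
            # Header line "# s21_contig...": the id is the second field.
            s21*) printf 'id=%s\n' "$obscnt" ;;
            # Data row: first field is the word, third is Obs Frequency.
            *)    printf 'word=%s freq=%s\n' "$word" "$obsfreq" ;;
        esac
    done
}

printf '%s\n' '# s21_contig00001' 'AAAA 364 0.0125181 0.0039062 3.2046221' | classify
```

This prints `id=s21_contig00001` followed by `word=AAAA freq=0.0125181`.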

Resources