Shell: Output the contents of each folder to its own CSV column (macOS)
I am trying to output the contents of each folder into its own column of a CSV file.
ls -R is great, but it outputs everything into one column:
/Folder1:
1a.jpg
2a.jpg
3a.jpg
4a.jpg
5a.jpg
/Folder2:
1b.jpg
2b.jpg
3b.jpg
4b.jpg
5b.jpg
/Folder3:
1c.jpg
2c.jpg
3c.jpg
4c.jpg
5c.jpg
/Folder4:
1d.jpg
2d.jpg
3d.jpg
4d.jpg
5d.jpg
/Folder5:
1e.jpg
2e.jpg
3e.jpg
4e.jpg
5e.jpg
But I am trying to output every folder into a new column:
/Folder1: ,/Folder2: ,/Folder3: ,/Folder4: ,/Folder5:
1a.jpg ,1b.jpg ,1c.jpg ,1d.jpg ,1e.jpg
2a.jpg ,2b.jpg ,2c.jpg ,2d.jpg ,2e.jpg
3a.jpg ,3b.jpg ,3c.jpg ,3d.jpg ,3e.jpg
4a.jpg ,4b.jpg ,4c.jpg ,4d.jpg ,4e.jpg
5a.jpg ,5b.jpg ,5c.jpg ,5d.jpg ,5e.jpg
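A minimal Python sketch of one way to do this, assuming you pass the folder paths in explicitly and only want regular, non-hidden files listed (hidden entries such as macOS's .DS_Store are skipped, like plain ls does). `itertools.zip_longest` turns the per-folder lists into rows and pads shorter folders with empty cells. The function name `folders_to_csv` is hypothetical.

```python
import csv
import os
from itertools import zip_longest


def folders_to_csv(folders, out):
    """Write one CSV column per folder: a "<folder>:" header, then its file names.

    `out` is any writable text file-like object (an open file, io.StringIO, ...).
    """
    columns = []
    for folder in folders:
        with os.scandir(folder) as entries:
            # Regular, non-hidden files only, sorted by name (roughly what `ls` shows).
            names = sorted(
                e.name for e in entries
                if e.is_file() and not e.name.startswith(".")
            )
        columns.append([f"{folder}:"] + names)

    writer = csv.writer(out)
    # zip_longest transposes the columns into rows, filling short columns with "".
    for row in zip_longest(*columns, fillvalue=""):
        writer.writerow(row)
```

For example: `with open("folders.csv", "w", newline="") as f: folders_to_csv(["/Folder1", "/Folder2", "/Folder3", "/Folder4", "/Folder5"], f)`. Opening the file with `newline=""` follows the `csv` module's advice for files it writes to.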
Related
Why does sequentially labelling the files in a folder create extra files?
I am currently preprocessing data and need to name all the image files in my folder sequentially. However, when I try to do it with the code below, it ends up producing more files, which appear to be earlier transformations of the images (previously inverted/cropped copies):
ls | cat -n | while read n f; do mv "$f" "$n.png"; done
I wanted the files in my folder to just be numbered sequentially, without creating any additional files. The modifications to the images were made using ImageMagick on Linux. The output of ls is as below: 000001560000.png~ 000001900000.png~~~~ 000002260000.png~ 000001560000.png~~ 000001900000.png~~~~~ 000002260000.png~~ 000001560000.png~~~ 000001910000.png~ 000002260000.png~~~ 000001560000.png~~~~ 000001910000.png~~ 000002260000.png~~~~ 000001560000.png~~~~~ 000001910000.png~~~ 000002260000.png~~~~~ 000001570000.png~ 000001910000.png~~~~ 000002270000.png~ 000001570000.png~~ 000001910000.png~~~~~ 000002270000.png~~ 000001570000.png~~~ 000001920000.png~ 000002270000.png~~~ 000001570000.png~~~~ 000001920000.png~~ 000002270000.png~~~~ 000001570000.png~~~~~ 000001920000.png~~~ 000002270000.png~~~~~ 000001580000.png~ 000001920000.png~~~~ 000002280000.png~ 000001580000.png~~ 000001920000.png~~~~~ 000002280000.png~~ 000001580000.png~~~ 000001930000.png~ 000002280000.png~~~ 000001580000.png~~~~ 000001930000.png~~ 000002280000.png~~~~ 000001580000.png~~~~~ 000001930000.png~~~ 000002280000.png~~~~~ 000001590000.png~ 000001930000.png~~~~ 000002290000.png~ 000001590000.png~~ 000001930000.png~~~~~ 000002290000.png~~ 000001590000.png~~~ 000001940000.png~ 000002290000.png~~~ 000001590000.png~~~~ 000001940000.png~~ 000002290000.png~~~~ 000001590000.png~~~~~ 000001940000.png~~~ 000002290000.png~~~~~ 000001600000.png~ 000001940000.png~~~~ 000002300000.png~ 000001600000.png~~ 000001940000.png~~~~~ 000002300000.png~~ 000001600000.png~~~ 000001950000.png~ 000002300000.png~~~ 000001600000.png~~~~ 000001950000.png~~ 000002300000.png~~~~ 000001600000.png~~~~~
000001950000.png~~~ 000002300000.png~~~~~ 000001610000.png~ 000001950000.png~~~~ 000002310000.png~ 000001610000.png~~ 000001950000.png~~~~~ 000002310000.png~~ 000001610000.png~~~ 000001960000.png~ 000002310000.png~~~ 000001610000.png~~~~ 000001960000.png~~ 000002310000.png~~~~ 000001610000.png~~~~~ 000001960000.png~~~ 000002310000.png~~~~~ 000001620000.png~ 000001960000.png~~~~ 000002320000.png~ 000001620000.png~~ 000001960000.png~~~~~ 000002320000.png~~ 000001620000.png~~~ 000001970000.png~ 000002320000.png~~~ 000001620000.png~~~~ 000001970000.png~~ 000002320000.png~~~~ 000001620000.png~~~~~ 000001970000.png~~~ 000002320000.png~~~~~ 000001630000.png~ 000001970000.png~~~~ 000002330000.png~ 000001630000.png~~ 000001970000.png~~~~~ 000002330000.png~~ 000001630000.png~~~ 000001980000.png~ 000002330000.png~~~ 000001630000.png~~~~ 000001980000.png~~ 000002330000.png~~~~ 000001630000.png~~~~~ 000001980000.png~~~ 000002330000.png~~~~~ 000001640000.png~ 000001980000.png~~~~ 000002340000.png~ 000001640000.png~~ 000001980000.png~~~~~ 000002340000.png~~ 000001640000.png~~~ 000001990000.png~ 000002340000.png~~~ 000001640000.png~~~~ 000001990000.png~~ 000002340000.png~~~~ 000001640000.png~~~~~ 000001990000.png~~~ 000002340000.png~~~~~ 000001650000.png~ 000001990000.png~~~~ 000002350000.png~ 000001650000.png~~ 000001990000.png~~~~~ 000002350000.png~~ 000001650000.png~~~ 000002000000.png~ 000002350000.png~~~ 000001650000.png~~~~ 000002000000.png~~ 000002350000.png~~~~ 000001650000.png~~~~~ 000002000000.png~~~ 000002350000.png~~~~~ 000001660000.png~ 000002000000.png~~~~ 000002360000.png~ 000001660000.png~~ 000002000000.png~~~~~ 000002360000.png~~ 000001660000.png~~~ 000002010000.png~ 000002360000.png~~~ 000001660000.png~~~~ 000002010000.png~~ 000002360000.png~~~~ 000001660000.png~~~~~ 000002010000.png~~~ 000002360000.png~~~~~ 000001670000.png~ 000002010000.png~~~~ 000002370000.png~ 000001670000.png~~ 000002010000.png~~~~~ 000002370000.png~~ 000001670000.png~~~ 000002020000.png~ 
000002370000.png~~~ 000001670000.png~~~~ 000002020000.png~~ 000002370000.png~~~~ 000001670000.png~~~~~ 000002020000.png~~~ 000002370000.png~~~~~ 000001680000.png~ 000002020000.png~~~~ 000002380000.png~ 000001680000.png~~ 000002020000.png~~~~~ 000002380000.png~~ 000001680000.png~~~ 000002030000.png~ 000002380000.png~~~ 000001680000.png~~~~ 000002030000.png~~ 000002380000.png~~~~ 000001680000.png~~~~~ 000002030000.png~~~ 000002380000.png~~~~~ 000001690000.png~ 000002030000.png~~~~ 000002390000.png~ 000001690000.png~~ 000002030000.png~~~~~ 000002390000.png~~ 000001690000.png~~~ 000002040000.png~ 000002390000.png~~~ 000001690000.png~~~~ 000002040000.png~~ 000002390000.png~~~~ 000001690000.png~~~~~ 000002040000.png~~~ 000002390000.png~~~~~ 000001700000.png~ 000002040000.png~~~~ 000002400000.png~ 000001700000.png~~ 000002040000.png~~~~~ 000002400000.png~~ 000001700000.png~~~ 000002050000.png~ 000002400000.png~~~ 000001700000.png~~~~ 000002050000.png~~ 000002400000.png~~~~ 000001700000.png~~~~~ 000002050000.png~~~ 000002400000.png~~~~~ 000001710000.png~ 000002050000.png~~~~ 000002410000.png~ 000001710000.png~~ 000002050000.png~~~~~ 000002410000.png~~ 000001710000.png~~~ 000002060000.png~ 000002410000.png~~~ 000001710000.png~~~~ 000002060000.png~~ 000002410000.png~~~~ 000001710000.png~~~~~ 000002060000.png~~~ 000002410000.png~~~~~ 000001720000.png~ 000002060000.png~~~~ 000002420000.png~ 000001720000.png~~ 000002060000.png~~~~~ 000002420000.png~~ 000001720000.png~~~ 000002070000.png~ 000002420000.png~~~ 000001720000.png~~~~ 000002070000.png~~ 000002420000.png~~~~ 000001720000.png~~~~~ 000002070000.png~~~ 000002420000.png~~~~~ 000001730000.png~ 000002070000.png~~~~ 10.png 000001730000.png~~ 000002070000.png~~~~~ 11.png 000001730000.png~~~ 000002080000.png~ 12.png 000001730000.png~~~~ 000002080000.png~~ 13.png 000001730000.png~~~~~ 000002080000.png~~~ 14.png 000001740000.png~ 000002080000.png~~~~ 15.png 000001740000.png~~ 000002080000.png~~~~~ 16.png 000001740000.png~~~ 
000002090000.png~ 17.png 000001740000.png~~~~ 000002090000.png~~ 18.png 000001740000.png~~~~~ 000002090000.png~~~ 19.png 000001750000.png~ 000002090000.png~~~~ 1.png 000001750000.png~~ 000002090000.png~~~~~ 20.png 000001750000.png~~~ 000002100000.png~ 21.png 000001750000.png~~~~ 000002100000.png~~ 22.png 000001750000.png~~~~~ 000002100000.png~~~ 23.png 000001760000.png~ 000002100000.png~~~~ 24.png 000001760000.png~~ 000002100000.png~~~~~ 25.png 000001760000.png~~~ 000002110000.png~ 26.png 000001760000.png~~~~ 000002110000.png~~ 27.png 000001760000.png~~~~~ 000002110000.png~~~ 28.png 000001770000.png~ 000002110000.png~~~~ 29.png 000001770000.png~~ 000002110000.png~~~~~ 2.png 000001770000.png~~~ 000002120000.png~ 30.png 000001770000.png~~~~ 000002120000.png~~ 31.png 000001770000.png~~~~~ 000002120000.png~~~ 32.png 000001780000.png~ 000002120000.png~~~~ 33.png 000001780000.png~~ 000002120000.png~~~~~ 34.png 000001780000.png~~~ 000002130000.png~ 35.png 000001780000.png~~~~ 000002130000.png~~ 36.png 000001780000.png~~~~~ 000002130000.png~~~ 37.png 000001790000.png~ 000002130000.png~~~~ 38.png 000001790000.png~~ 000002130000.png~~~~~ 39.png 000001790000.png~~~ 000002140000.png~ 3.png 000001790000.png~~~~ 000002140000.png~~ 40.png 000001790000.png~~~~~ 000002140000.png~~~ 41.png 000001800000.png~ 000002140000.png~~~~ 42.png 000001800000.png~~ 000002140000.png~~~~~ 43.png 000001800000.png~~~ 000002150000.png~ 44.png 000001800000.png~~~~ 000002150000.png~~ 45.png 000001800000.png~~~~~ 000002150000.png~~~ 46.png 000001810000.png~ 000002150000.png~~~~ 47.png 000001810000.png~~ 000002150000.png~~~~~ 48.png 000001810000.png~~~ 000002160000.png~ 49.png 000001810000.png~~~~ 000002160000.png~~ 4.png 000001810000.png~~~~~ 000002160000.png~~~ 50.png 000001820000.png~ 000002160000.png~~~~ 51.png 000001820000.png~~ 000002160000.png~~~~~ 52.png 000001820000.png~~~ 000002170000.png~ 53.png 000001820000.png~~~~ 000002170000.png~~ 54.png 000001820000.png~~~~~ 000002170000.png~~~ 55.png 
000001830000.png~ 000002170000.png~~~~ 56.png 000001830000.png~~ 000002170000.png~~~~~ 57.png 000001830000.png~~~ 000002180000.png~ 58.png 000001830000.png~~~~ 000002180000.png~~ 59.png 000001830000.png~~~~~ 000002180000.png~~~ 5.png 000001840000.png~ 000002180000.png~~~~ 60.png 000001840000.png~~ 000002180000.png~~~~~ 61.png 000001840000.png~~~ 000002190000.png~ 62.png 000001840000.png~~~~ 000002190000.png~~ 63.png 000001840000.png~~~~~ 000002190000.png~~~ 64.png 000001850000.png~ 000002190000.png~~~~ 65.png 000001850000.png~~ 000002190000.png~~~~~ 66.png 000001850000.png~~~ 000002200000.png~ 67.png 000001850000.png~~~~ 000002200000.png~~ 68.png 000001850000.png~~~~~ 000002200000.png~~~ 69.png 000001860000.png~ 000002200000.png~~~~ 6.png 000001860000.png~~ 000002200000.png~~~~~ 70.png 000001860000.png~~~ 000002210000.png~ 71.png 000001860000.png~~~~ 000002220000.png~ 72.png 000001860000.png~~~~~ 000002220000.png~~ 73.png 000001870000.png~ 000002220000.png~~~ 74.png 000001870000.png~~ 000002220000.png~~~~ 75.png 000001870000.png~~~ 000002220000.png~~~~~ 76.png 000001870000.png~~~~ 000002230000.png~ 77.png 000001870000.png~~~~~ 000002230000.png~~ 78.png 000001880000.png~ 000002230000.png~~~ 79.png 000001880000.png~~ 000002230000.png~~~~ 7.png 000001880000.png~~~ 000002230000.png~~~~~ 80.png 000001880000.png~~~~ 000002240000.png~ 81.png 000001880000.png~~~~~ 000002240000.png~~ 82.png 000001890000.png~ 000002240000.png~~~ 83.png 000001890000.png~~ 000002240000.png~~~~ 84.png 000001890000.png~~~ 000002240000.png~~~~~ 85.png 000001890000.png~~~~ 000002250000.png~ 86.png 000001890000.png~~~~~ 000002250000.png~~ 8.png 000001900000.png~ 000002250000.png~~~ 9.png 000001900000.png~~ 000002250000.png~~~~ 000001900000.png~~~ 000002250000.png~~~~~ The output of the second ls is: 100.png 148.png 196.png 244.png 292.png 340.png 388.png 436.png 484.png 101.png 149.png 197.png 245.png 293.png 341.png 389.png 437.png 485.png 102.png 150.png 198.png 246.png 294.png 342.png 390.png 
438.png 486.png 103.png 151.png 199.png 247.png 295.png 343.png 391.png 439.png 487.png 104.png 152.png 200.png 248.png 296.png 344.png 392.png 440.png 488.png 105.png 153.png 201.png 249.png 297.png 345.png 393.png 441.png 489.png 106.png 154.png 202.png 250.png 298.png 346.png 394.png 442.png 490.png 107.png 155.png 203.png 251.png 299.png 347.png 395.png 443.png 491.png 108.png 156.png 204.png 252.png 300.png 348.png 396.png 444.png 492.png 109.png 157.png 205.png 253.png 301.png 349.png 397.png 445.png 493.png 110.png 158.png 206.png 254.png 302.png 350.png 398.png 446.png 494.png 111.png 159.png 207.png 255.png 303.png 351.png 399.png 447.png 495.png 112.png 160.png 208.png 256.png 304.png 352.png 400.png 448.png 496.png 113.png 161.png 209.png 257.png 305.png 353.png 401.png 449.png 497.png 114.png 162.png 210.png 258.png 306.png 354.png 402.png 450.png 498.png 115.png 163.png 211.png 259.png 307.png 355.png 403.png 451.png 499.png 116.png 164.png 212.png 260.png 308.png 356.png 404.png 452.png 500.png 117.png 165.png 213.png 261.png 309.png 357.png 405.png 453.png 501.png 118.png 166.png 214.png 262.png 310.png 358.png 406.png 454.png 502.png 119.png 167.png 215.png 263.png 311.png 359.png 407.png 455.png 503.png 120.png 168.png 216.png 264.png 312.png 360.png 408.png 456.png 504.png 121.png 169.png 217.png 265.png 313.png 361.png 409.png 457.png 505.png 122.png 170.png 218.png 266.png 314.png 362.png 410.png 458.png 506.png 123.png 171.png 219.png 267.png 315.png 363.png 411.png 459.png 507.png 124.png 172.png 220.png 268.png 316.png 364.png 412.png 460.png 508.png 125.png 173.png 221.png 269.png 317.png 365.png 413.png 461.png 509.png 126.png 174.png 222.png 270.png 318.png 366.png 414.png 462.png 510.png 127.png 175.png 223.png 271.png 319.png 367.png 415.png 463.png 511.png 128.png 176.png 224.png 272.png 320.png 368.png 416.png 464.png 512.png 129.png 177.png 225.png 273.png 321.png 369.png 417.png 465.png 513.png 130.png 178.png 226.png 274.png 322.png 
370.png 418.png 466.png 514.png 131.png 179.png 227.png 275.png 323.png 371.png 419.png 467.png 515.png 132.png 180.png 228.png 276.png 324.png 372.png 420.png 468.png 516.png 133.png 181.png 229.png 277.png 325.png 373.png 421.png 469.png 517.png 134.png 182.png 230.png 278.png 326.png 374.png 422.png 470.png 87.png 135.png 183.png 231.png 279.png 327.png 375.png 423.png 471.png 88.png 136.png 184.png 232.png 280.png 328.png 376.png 424.png 472.png 89.png 137.png 185.png 233.png 281.png 329.png 377.png 425.png 473.png 90.png 138.png 186.png 234.png 282.png 330.png 378.png 426.png 474.png 91.png 139.png 187.png 235.png 283.png 331.png 379.png 427.png 475.png 92.png 140.png 188.png 236.png 284.png 332.png 380.png 428.png 476.png 93.png 141.png 189.png 237.png 285.png 333.png 381.png 429.png 477.png 94.png 142.png 190.png 238.png 286.png 334.png 382.png 430.png 478.png 95.png 143.png 191.png 239.png 287.png 335.png 383.png 431.png 479.png 96.png 144.png 192.png 240.png 288.png 336.png 384.png 432.png 480.png 97.png 145.png 193.png 241.png 289.png 337.png 385.png 433.png 481.png 98.png 146.png 194.png 242.png 290.png 338.png 386.png 434.png 482.png 99.png 147.png 195.png 243.png 291.png 339.png 387.png 435.png 483.png
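The loop renames every name that ls prints, and ls also prints the `~` backup files, so each backup likely gets a number of its own, which would explain why earlier inverted/cropped versions show up as numbered .png files. Below is a hedged Python sketch of the renaming step, assuming the `~` files should be left out and only names ending exactly in `.png` renumbered. It renames in two passes through temporary names so a target like `2.png` can never overwrite a file that has not been renamed yet. The function name `number_pngs` and the `.renaming-N.tmp` temporary names are hypothetical.

```python
import os


def number_pngs(folder):
    """Rename each *.png in `folder` to 1.png, 2.png, ... in sorted-name order.

    Only names ending exactly in ".png" are touched, so backup copies such as
    "000001560000.png~" are left alone. Returns the (old, new) name pairs.
    """
    originals = sorted(
        name for name in os.listdir(folder)
        if name.endswith(".png") and os.path.isfile(os.path.join(folder, name))
    )

    # Pass 1: move every file to a temporary name first, so that a target such
    # as "2.png" can never clobber a source file that is still waiting its turn.
    temps = []
    for i, name in enumerate(originals):
        tmp = f".renaming-{i}.tmp"  # hypothetical temporary name
        os.rename(os.path.join(folder, name), os.path.join(folder, tmp))
        temps.append(tmp)

    # Pass 2: give each file its final sequential name.
    renamed = []
    for n, (name, tmp) in enumerate(zip(originals, temps), start=1):
        new_name = f"{n}.png"
        os.rename(os.path.join(folder, tmp), os.path.join(folder, new_name))
        renamed.append((name, new_name))
    return renamed
```

Try it on a copy of the folder first: any .png files already numbered by an earlier run (1.png, 2.png, ...) would be included and renumbered as well.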
Unix/bash/Shell: How to Find Files from a List and Merge Them into One File
I would like to merge specific files (XXXXXXX_Abstract_TOC.txt, XXXXXXX_Chapter1.txt, XXXXXXX_Chapter2.txt, XXXXXXX_Chapter3.txt, XXXXXXX_Chapter4.txt, XXXXXXX_Conclusion.txt) into one file, based on specific numbers that come from a text file (/util_files/list_NRPs.txt). Note: each X is a digit [0-9].
list_NRPs.txt contains the following:
0030001
0030002
0030004
...
In the /All_Files folder, I have the following files:
0030001_Abstract_TOC.txt 0030001_Chapter1.txt 0030001_Chapter2.txt 0030001_Chapter3.txt 0030001_Chapter4.txt 0030001_Conclusion.txt
0030002_Abstract_TOC.txt 0030002_Chapter1.txt 0030002_Chapter2.txt 0030002_Chapter3.txt 0030002_Chapter4.txt 0030002_Conclusion.txt
0030004_Abstract_TOC.txt 0030004_Chapter1.txt 0030004_Chapter2.txt 0030004_Chapter3.txt 0030004_Chapter4.txt 0030004_Conclusion.txt
...
For each XXXXXXX in list_NRPs.txt, I would like to merge XXXXXXX_Abstract_TOC.txt, XXXXXXX_Chapter1.txt, XXXXXXX_Chapter2.txt, XXXXXXX_Chapter3.txt, XXXXXXX_Chapter4.txt and XXXXXXX_Conclusion.txt into XXXXXXX_All.txt. Afterwards, the /All_Files folder would contain:
0030001_Abstract_TOC.txt 0030001_Chapter1.txt 0030001_Chapter2.txt 0030001_Chapter3.txt 0030001_Chapter4.txt 0030001_Conclusion.txt 0030001_All.txt
0030002_Abstract_TOC.txt 0030002_Chapter1.txt 0030002_Chapter2.txt 0030002_Chapter3.txt 0030002_Chapter4.txt 0030002_Conclusion.txt 0030002_All.txt
0030004_Abstract_TOC.txt 0030004_Chapter1.txt 0030004_Chapter2.txt 0030004_Chapter3.txt 0030004_Chapter4.txt 0030004_Conclusion.txt 0030004_All.txt
...
I would like to start with cat ../util_files/list_NRPs.txt | xargs, but I do not know how to proceed. How can I do that?
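You can use globbing to concatenate the files whose names start with each number listed in list_NRPs.txt:
while read -r ch; do
    cat "/All_Files/$ch"* > "/All_Files/${ch}_All.txt"
done < /util_files/list_NRPs.txt
The glob expands in alphabetical order (Abstract_TOC, Chapter1 to Chapter4, Conclusion), which happens to be the order you want. If you run the loop a second time, the glob also matches the existing ${ch}_All.txt files, so remove them first.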
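For comparison, a Python sketch of the same loop, assuming the directory layout and the six file-name suffixes from the question. Listing the suffixes explicitly fixes the order of the parts and keeps a second run from reading an existing _All.txt file back in; missing parts are simply skipped, much as a glob just matches fewer files. The name `merge_chapters` is hypothetical.

```python
import os

# The parts to concatenate, in the order they should appear in XXXXXXX_All.txt.
SUFFIXES = ["_Abstract_TOC.txt", "_Chapter1.txt", "_Chapter2.txt",
            "_Chapter3.txt", "_Chapter4.txt", "_Conclusion.txt"]


def merge_chapters(list_path, files_dir):
    """For each ID listed in list_path, write <ID>_All.txt in files_dir from its parts.

    Blank lines in the list are ignored and missing parts are skipped.
    Returns the paths of the files written.
    """
    written = []
    with open(list_path) as ids:
        for line in ids:
            nrp = line.strip()
            if not nrp:
                continue  # ignore blank lines
            out_path = os.path.join(files_dir, nrp + "_All.txt")
            with open(out_path, "w") as out:
                for suffix in SUFFIXES:
                    part = os.path.join(files_dir, nrp + suffix)
                    if os.path.exists(part):
                        with open(part) as f:
                            out.write(f.read())
            written.append(out_path)
    return written
```

With the paths from the question, the call would be `merge_chapters("/util_files/list_NRPs.txt", "/All_Files")`.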
How to load a CSV (comma-separated) file into an HBase table using Flume?
I want to load a CSV (just comma-separated) file into my HBase table. I have already tried with the help of some articles I found online, but so far I can only load each entire row (line) as one value in HBase, i.e. all the values of a row are stored in a single column. Instead, I want to split each row on the comma (,) delimiter and store the values in different columns of the HBase table's column family. Please help me solve this; any suggestions are appreciated. Below are my current input file, agent configuration file and HBase output.
1) Input file
8600000US00601,00601,006015-DigitZCTA,0063-DigitZCTA,11102
8600000US00602,00602,006025-DigitZCTA,0063-DigitZCTA,12869
8600000US00603,00603,006035-DigitZCTA,0063-DigitZCTA,12423
8600000US00604,00604,006045-DigitZCTA,0063-DigitZCTA,33548
8600000US00606,00606,006065-DigitZCTA,0063-DigitZCTA,10603
2) Agent configuration file
agent.sources = spool
agent.channels = fileChannel2
agent.sinks = sink2
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /home/cloudera/Desktop/flume
agent.sources.spool.fileSuffix = .completed
agent.sources.spool.channels = fileChannel2
#agent.sources.spool.deletePolicy = immediate
agent.sinks.sink2.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.sink2.channel = fileChannel2
agent.sinks.sink2.table = sample
agent.sinks.sink2.columnFamily = s1
agent.sinks.sink2.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.sink1.serializer.regex = "\"([^\"]+)\""
agent.sinks.sink2.serializer.regexIgnoreCase = true
agent.sinks.sink1.serializer.colNames =col1,col2,col3,col4,col5
agent.sinks.sink2.batchSize = 100
agent.channels.fileChannel2.type=memory
3) HBase output
hbase(main):009:0> scan 'sample'
ROW COLUMN+CELL
1431064328720-0LalKGmSf3-1 column=s1:payload, timestamp=1431064335428, value=8600000US00602,00602,006025-DigitZCTA,0063-DigitZCTA,12869
1431064328720-0LalKGmSf3-2 column=s1:payload, timestamp=1431064335428, value=8600000US00603,00603,006035-DigitZCTA,0063-DigitZCTA,12423
1431064328720-0LalKGmSf3-3 column=s1:payload, timestamp=1431064335428, value=8600000US00604,00604,006045-DigitZCTA,0063-DigitZCTA,33548
1431064328721-0LalKGmSf3-4 column=s1:payload, timestamp=1431064335428, value=8600000US00606,00606,006065-DigitZCTA,0063-DigitZCTA,10603
4 row(s) in 0.0570 seconds
hbase(main):010:0>
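One detail in the configuration above may explain the output: the serializer.regex and serializer.colNames lines are set on sink1, but the sink is named sink2, so RegexHbaseEventSerializer is probably running with its defaults, which, if I read them correctly, are a catch-all regex and a single column called payload; that would match the s1:payload cells in the scan. The configured regex also only matches double-quoted text, which this input does not contain. The Python sketch below only illustrates, outside Flume, how a regex with one capture group per comma-separated field lines up with the five colNames; the pattern is an assumption, not taken from the Flume documentation.

```python
import re

# Assumed pattern: exactly five comma-separated fields, one capture group each.
FIELD_REGEX = re.compile(r"([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)")
# Column names taken from the question's colNames setting.
COL_NAMES = ["col1", "col2", "col3", "col4", "col5"]


def split_to_columns(line):
    """Map one input line to {column name: value}; None if the whole line doesn't match."""
    match = FIELD_REGEX.fullmatch(line.strip())
    if match is None:
        return None
    return dict(zip(COL_NAMES, match.groups()))


print(split_to_columns("8600000US00601,00601,006015-DigitZCTA,0063-DigitZCTA,11102"))
```

On the Flume side, the corresponding (untested) change would be to put the regex and colNames settings under agent.sinks.sink2.serializer instead of agent.sinks.sink1.serializer.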
Error in writing to a file
I have written a Python script that calls Unix sort using the subprocess module. I am trying to sort a table based on two columns (2 and 6). Here is what I have done:
sort_bt=open("sort_blast.txt",'w+')
sort_file_cmd="sort -k2,2 -k6,6n {0}".format(tab.name)
subprocess.call(sort_file_cmd,stdout=sort_bt,shell=True)
The output file, however, contains an incomplete line, which produces an error when I parse the table, but when I check the entry in the input file given to sort, the line looks perfect. I guess there is some problem when sort tries to write the result to the specified file, but I am not sure how to solve it. The line looks like this in the input file:
gi|191252805|ref|NM_001128633.1| Homo sapiens RIMS binding protein 3C (RIMBP3C), mRNA gnl|BL_ORD_ID|4614 gi|124487059|ref|NP_001074857.1| RIMS-binding protein 2 [Mus musculus] 103 2877 3176 846 941 1.0102e-07 138.0
In the output file, however, only gi|19125 is printed. How do I solve this? Any help will be appreciated.
Ram
Using subprocess to call an external sorting tool seems quite silly, considering that Python has a built-in way of sorting items. Looking at your sample data, it appears to be structured data with a | delimiter. Here's how you could open that file and write its lines out in sorted order:
from functools import cmp_to_key

def custom_sorter(first, second):
    """A custom comparison function that orders lines by the values in the 2nd and 6th columns."""
    # First, break each line into a list by splitting on the pipe character.
    first_items, second_items = first.split('|'), second.split('|')
    if len(first_items) >= 6 and len(second_items) >= 6:
        # We have enough items to compare.
        first_key = (first_items[1], first_items[5])
        second_key = (second_items[1], second_items[5])
        if first_key > second_key:
            return 1
        elif first_key < second_key:
            return -1
        else:  # They are the same, so order doesn't matter.
            return 0
    else:
        return 0

with open(src_file_path, 'r') as src_file:
    data = src_file.read()  # Read in the source file all at once. Hope the file isn't too big!

with open(dst_sorted_file_path, 'w') as dst_sorted_file:
    # Python 3's sorted() has no cmp= argument, so wrap the comparison function with cmp_to_key.
    for line in sorted(data.splitlines(), key=cmp_to_key(custom_sorter)):  # Sort the data on the fly
        dst_sorted_file.write(line + '\n')  # splitlines() drops the newlines, so add them back.
FYI, this code may need some jiggling; I didn't test it too well.
What you see is probably the result of trying to write to the file from multiple processes simultaneously. To emulate this shell command:
sort -k2,2 -k6,6n ${tabname} > sort_blast.txt
in Python:
from subprocess import check_call

with open("sort_blast.txt", 'wb') as output_file:
    check_call("sort -k2,2 -k6,6n".split() + [tab.name], stdout=output_file)

You can also write it in pure Python, e.g. for a small input file:
def custom_key(line):
    fields = line.split()  # split line on any whitespace
    return fields[1], float(fields[5])  # Python uses zero-based indexing

with open(tab.name) as input_file, open("sort_blast.txt", 'w') as output_file:
    L = input_file.read().splitlines()  # read from the input file
    L.sort(key=custom_key)  # sort it
    output_file.write("\n".join(L) + "\n")  # write to the output file, ending with a newline as sort does

If you need to sort a file that does not fit in memory, see "Sorting text file by using Python".
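One caveat about the pure-Python key, offered as a sketch under assumptions: `sort -n` treats a field that does not start with a number as 0, while `float()` raises `ValueError`. If the sample line really is whitespace-separated as shown, its sixth whitespace-separated field is `protein`, so `custom_key` would fail on it. The hypothetical `lenient_key` below only approximates `sort -n`: it reads a leading number (so, like `sort -n`, it ignores exponents such as e-07) and compares column 2 by Python's code-point order rather than the locale's collation.

```python
import re

# Optional blanks, optional minus sign, then digits with an optional decimal part.
_LEADING_NUMBER = re.compile(r"\s*(-?(?:\d+\.?\d*|\.\d+))")


def sort_n_value(field):
    """Roughly how `sort -n` reads a field: its leading number, or 0 if there is none."""
    match = _LEADING_NUMBER.match(field)
    return float(match.group(1)) if match else 0.0


def lenient_key(line):
    """Like custom_key, but never raises on a short or non-numeric 6th field."""
    fields = line.split()  # split on any whitespace, as sort does by default
    second = fields[1] if len(fields) > 1 else ""
    sixth = fields[5] if len(fields) > 5 else ""
    return second, sort_n_value(sixth)
```

It drops into the same place as before, e.g. `L.sort(key=lenient_key)`.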
How to replace the last comma in a line with a string in Unix
I am trying to insert a string in every line except the first and last lines of a file, but I have not been able to get it done. Can anyone give me a clue on how to achieve this? Thanks in advance.
How can I insert the string xxxxx after the last comma in each line (except for the first and last rows) using Unix tools?
Original file
00,SRI,BOM,FF,000004,20120808030100,20120907094412,"GTEXPR","SRIVIM","8894-7577","SRIVIM#GTEXPR."
10,SRI,FF,NMNN,3112,NMNSME,U,NM,GEB,,230900,02BLYPO
10,SRI,FF,NMNN,3112,NMNSME,U,NM,TCM,231040,231100,01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,UPW,231240,231300,01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,UFG,231700,231900,01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,FTG,232140,232200,01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,BOR,232340,232400,01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,BAY,232640,232700,01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,RWD,233400,,01
10,SRI,FF,BUN,0800,NMJWJB,U,NM,CCL,,101400,02CHLSU
10,SRI,FF,BUN,0800,NMJWJB,U,NM,PAR,101540,101700,01
10,SRI,FF,BUN,0800,NMJWJB,U,NM,MCE,101840,101900,01
10,SRI,FF,BUN,0800,NMJWJB,U,NM,SSS,102140,102200,09
10,SRI,FF,BUN,0800,NMJWJB,U,NM,FSS,102600,,01
10,SRI,FF,BUN,0802,NMJWJB,U,NM,CCL,,103700,01CHLSU
10,SRI,FF,BUN,0802,NMJWJB,U,NM,PAR,103940,104000,01
10,SRI,FF,BUN,0802,NMJWJB,U,NM,MCE,104140,104200,01
10,SRI,FF,BUN,0802,NMJWJB,U,NM,SSS,104440,104500,09
10,SRI,FF,BUN,0802,NMJWJB,U,NM,FSS,105000,,01
10,SRI,FF,BUN,3112,NMNSME,U,NM,GEB,,230900,02BLYSU
10,SRI,FF,BUN,3112,NMNSME,U,NM,TCM,231040,231100,01
10,SRI,FF,BUN,3112,NMNSME,U,NM,UPW,231240,231300,01
10,SRI,FF,BUN,3112,NMNSME,U,NM,UFG,231700,231900,01
10,SRI,FF,BUN,3112,NMNSME,U,NM,FTG,232140,232200,01
10,SRI,FF,BUN,3112,NMNSME,U,NM,BOR,232340,232400,01
10,SRI,FF,BUN,3112,NMNSME,U,NM,BAY,232640,232700,01
10,SRI,FF,BUN,3112,NMNSME,U,NM,RWD,233400,,01
99,SRI,FF,28
Expected file
00,SRI,BOM,FF,000004,20120808030100,20120907094412,"GTEXPR","SRIVIM","8894-7577","SRIVIM#GTEXPR."
10,SRI,FF,NMNN,3112,NMNSME,U,NM,GEB,,230900,xxxxx02BLYPO
10,SRI,FF,NMNN,3112,NMNSME,U,NM,TCM,231040,xxxxx231100,01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,UPW,231240,xxxxx231300,01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,UFG,231700,xxxxx231900,01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,FTG,232140,xxxxx232200,01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,BOR,232340,xxxxx232400,01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,BAY,232640,xxxxx232700,01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,RWD,233400,,xxxxx01
10,SRI,FF,BUN,0800,NMJWJB,U,NM,CCL,,101400,xxxxx02CHLSU
10,SRI,FF,BUN,0800,NMJWJB,U,NM,PAR,101540,101700,xxxxx01
10,SRI,FF,BUN,0800,NMJWJB,U,NM,MCE,101840,101900,xxxxx01
10,SRI,FF,BUN,0800,NMJWJB,U,NM,SSS,102140,102200,xxxxx09
10,SRI,FF,BUN,0800,NMJWJB,U,NM,FSS,102600,,xxxxx01
10,SRI,FF,BUN,0802,NMJWJB,U,NM,CCL,,103700,xxxxx01CHLSU
10,SRI,FF,BUN,0802,NMJWJB,U,NM,PAR,103940,104000,xxxxx01
10,SRI,FF,BUN,0802,NMJWJB,U,NM,MCE,104140,104200,xxxxx01
10,SRI,FF,BUN,0802,NMJWJB,U,NM,SSS,104440,104500,xxxxx09
10,SRI,FF,BUN,0802,NMJWJB,U,NM,FSS,105000,,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,GEB,,230900,xxxxx02BLYSU
10,SRI,FF,BUN,3112,NMNSME,U,NM,TCM,231040,231100,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,UPW,231240,231300,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,UFG,231700,231900,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,FTG,232140,232200,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,BOR,232340,232400,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,BAY,232640,232700,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,RWD,233400,,xxxxx01
99,SRI,FF,28
awk can be quite useful for manipulating data files like this one. Here's a one-liner that does more-or-less what you want. It prepends the string "xxxxx" to the twelfth field of each input line that has at least twelve fields. The first and last lines are left unchanged because they have fewer than twelve fields (11 and 4).
$ awk 'BEGIN{FS=OFS=","}NF>11{$12="xxxxx"$12}{print}' 16006747.txt
00,SRI,BOM,FF,000004,20120808030100,20120907094412,"GTEXPR","SRIVIM","8894-7577","SRIVIM#GTEXPR."
10,SRI,FF,NMNN,3112,NMNSME,U,NM,GEB,,230900,xxxxx02BLYPO
10,SRI,FF,NMNN,3112,NMNSME,U,NM,TCM,231040,231100,xxxxx01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,UPW,231240,231300,xxxxx01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,UFG,231700,231900,xxxxx01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,FTG,232140,232200,xxxxx01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,BOR,232340,232400,xxxxx01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,BAY,232640,232700,xxxxx01
10,SRI,FF,NMNN,3112,NMNSME,U,NM,RWD,233400,,xxxxx01
10,SRI,FF,BUN,0800,NMJWJB,U,NM,CCL,,101400,xxxxx02CHLSU
10,SRI,FF,BUN,0800,NMJWJB,U,NM,PAR,101540,101700,xxxxx01
10,SRI,FF,BUN,0800,NMJWJB,U,NM,MCE,101840,101900,xxxxx01
10,SRI,FF,BUN,0800,NMJWJB,U,NM,SSS,102140,102200,xxxxx09
10,SRI,FF,BUN,0800,NMJWJB,U,NM,FSS,102600,,xxxxx01
10,SRI,FF,BUN,0802,NMJWJB,U,NM,CCL,,103700,xxxxx01CHLSU
10,SRI,FF,BUN,0802,NMJWJB,U,NM,PAR,103940,104000,xxxxx01
10,SRI,FF,BUN,0802,NMJWJB,U,NM,MCE,104140,104200,xxxxx01
10,SRI,FF,BUN,0802,NMJWJB,U,NM,SSS,104440,104500,xxxxx09
10,SRI,FF,BUN,0802,NMJWJB,U,NM,FSS,105000,,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,GEB,,230900,xxxxx02BLYSU
10,SRI,FF,BUN,3112,NMNSME,U,NM,TCM,231040,231100,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,UPW,231240,231300,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,UFG,231700,231900,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,FTG,232140,232200,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,BOR,232340,232400,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,BAY,232640,232700,xxxxx01
10,SRI,FF,BUN,3112,NMNSME,U,NM,RWD,233400,,xxxxx01
99,SRI,FF,28
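The awk one-liner keys on the field count. If you would rather apply the question's rule by position (skip the first and last lines, insert after the last comma of every other line), here is a Python sketch; the function name `mark_last_comma` is hypothetical. On this input every middle line has twelve fields, so its last comma sits just before field 12 and the result should match the awk output.

```python
def mark_last_comma(lines, marker="xxxxx"):
    """Return a copy of `lines` with `marker` inserted right after the last comma
    of every line except the first and the last. Lines without a comma are kept as-is.
    """
    result = list(lines)
    for i in range(1, len(result) - 1):  # skip the first and last lines
        pos = result[i].rfind(",")
        if pos != -1:
            result[i] = result[i][: pos + 1] + marker + result[i][pos + 1:]
    return result
```

To use it on the file, read it with `splitlines()`, pass the list in, and write the result back joined with newlines.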