The simplest way to delete a section of text, n times - bash
I have a file bigger than 4 GB, which is bad news for me because I can't open it in Notepad++ and use the macro feature to record and repeat a process to the end of the file.
What I'd like to do is, say, keep the first 20 lines of text, then delete the next 80, and repeat that process to the end of the file.
What would be the easiest way to do this?
I'm looking at these files on a Linux server, so running a script of some kind would be the easiest way, or maybe someone knows a way to do this in vi? (Hence the lame tagging.)
Thanks in advance
awk can do this fairly easily:
awk '(NR-1)%100 < 20' bigfile.txt
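(NR-1)%100 gives each line's position (0-99) within its 100-line block, and a pattern with no action prints the line whenever the pattern is true, so only positions 0-19, i.e. the first 20 lines of every block, survive. You can sanity-check it against a generated sequence before letting it loose on the 4 GB file:

awk '(NR-1)%100 < 20' <(seq 200)

This prints 1 through 20 followed by 101 through 120.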
I would go with the awk solution, but here's one way you could do the same thing with sed:
seq 20 | sed 's/$/~100p/' | sed -nf - bigfile.txt
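The first sed turns the numbers 1-20 into a small sed script, one print command per line, and the second sed runs that script from standard input (-f -) in quiet mode (-n). The generated script uses the first~step address form, which is a GNU sed extension:

$ seq 20 | sed 's/$/~100p/'
1~100p
2~100p
...
20~100p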
Testing:
seq 20 | sed 's/$/~100p/' | sed -nf - <(seq 200)
Output:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
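If you are on GNU sed, the deletion can probably also be written directly as a range, combining a first~step address with the addr,+N form; a sketch I have not tested on a large file:

sed '21~100,+79d' bigfile.txt

This should delete lines 21-100 of every 100-line block, which is the same as keeping the first 20.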
Related
Matching column numbers from two different txt files
I have two text files which are of different sizes. The first one, example1.txt, has only one column of numbers:

101
102
103
104
111
120
120
125
131
131
131
131
131
131

And the second text file, example2.txt, has two columns:

101 3
102 3
103 3
104 4
104 4
111 5
120 1
120 1
125 2
126 2
127 2
128 2
129 2
130 2
130 2
130 2
131 10
131 10
131 10
131 10
131 10
131 10
132 10

The first column in example1.txt is a subset of column one in example2.txt. The second column in example2.txt holds the values associated with the first column. What I want to do is get the associated second-column values for the entries of example1.txt, as given in example2.txt. I have tried but couldn't figure it out yet. Any suggestions or solutions in bash or awk would be appreciated. The result would therefore be:

101 3
102 3
103 3
104 4
111 5
120 1
120 1
125 2
131 10
131 10
131 10
131 10
131 10
131 10

UPDATE: I have been trying to do the column matching like this:

awk -F'|' 'NR==FNR{c[$1]++;next};c[$1] > 0' example1.txt example2.txt > output.txt

In both files the first column is in ascending order, but the frequency of the same numbers may not match. For example, 104 appears once in example1.txt but twice in example2.txt. The important thing is that the associated second-column value should stay the same for example1.txt too. Just see the expected output at the end.
$ awk 'NR==FNR{a[$1]++; next} ($1 in a) && b[$1]++ < a[$1]' f1 f2
101 3
102 3
103 3
104 4
111 5
120 1
120 1
125 2
131 10
131 10
131 10
131 10
131 10
131 10

This solution doesn't make use of the fact that the first column is in ascending order; perhaps some optimization could be done based on that.

($1 in a) && b[$1]++ < a[$1] is the main difference from your solution: it checks that the field exists and that its count doesn't exceed the count from the first file. Also, it's not clear why you set the field separator to | since there is no such character in the samples given.
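If you want to try it without saving the samples to disk first, the data from the question can be fed in with process substitution (just a quick test sketch; f1 and f2 are replaced by the two generated streams):

awk 'NR==FNR{a[$1]++; next} ($1 in a) && b[$1]++ < a[$1]' \
    <(printf '%s\n' 101 102 103 104 111 120 120 125 131 131 131 131 131 131) \
    <(printf '%s %s\n' 101 3 102 3 103 3 104 4 104 4 111 5 120 1 120 1 125 2 \
                       126 2 127 2 128 2 129 2 130 2 130 2 130 2 \
                       131 10 131 10 131 10 131 10 131 10 131 10 132 10)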
How to replace a list of numbers in one column with random numbers from the other column in a BASH environment
I have a tab-delimited file with two columns like this:

5 6 14 22 23 25 27 84 85 88 89 94 95 98 100    6 94
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406    205 284 307 406
2 10 13 40 47 58    2 13 40 87

and the desired output should be:

5 6 14 22 23 25 27 84 85 88 89 94 95 98 100    14 27
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406    6 209 299 305
2 10 13 23 40 47 58 87    10 23 40 58

I would like to replace the numbers in the 2nd column with random numbers taken from the 1st column, producing a 2nd column with the same count of numbers. I mean, e.g., if there are four numbers in the 2nd column for row x, the output must have four random numbers from the 1st column for that row, and so on. I'm trying to create two arrays with AWK's split and replace every number in the 2nd column with numbers from the 1st column, but not in a random way. I have seen the rand() function but I don't know exactly how to join these two things in a script. Is it possible to do this in a BASH environment, or are there other better ways to do it? Thanks in advance
awk to the rescue!

$ awk -F'\t' '
  function shuf(a,n) {for(i=1;i<n;i++) {j=i+int(rand()*(n+1-i)); t=a[i]; a[i]=a[j]; a[j]=t}}
  function join(a,n,x,s) {for(i=1;i<=n;i++) {x=x s a[i]; s=" "} return x}
  BEGIN{srand()}
  {an=split($1,a," "); shuf(a,an);
   bn=split($2,b," ");
   delete m; delete c; j=0;
   for(i=1;i<=bn;i++) m[b[i]];
   # pull elements from a up to the required sample size,
   # not intersecting with the previous sample set
   for(i=1;i<=an && j<bn;i++) if(!(a[i] in m)) c[++j]=a[i];
   cn=asort(c);
   print $1 FS join(c,cn)}' file

5 6 14 22 23 25 27 84 85 88 89 94 95 98 100    85 94
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406    20 205 294 295
2 10 13 23 40 47 58 87    10 13 47 87

Shuffle the input array (standard algorithm), then take the required number of elements as a sample; the additional requirement is that the sample not intersect the existing sample set. The helper map m holds the existing sample set and is used for the in tests. The rest should be easy to read.
Assuming that there is a tab delimiting the two columns, and each column is a space-delimited list:

awk 'BEGIN{srand()} {n=split($1,a," "); m=split($2,b," "); printf "%s\t",$1; for (i=1;i<=m;i++) printf "%d%c", a[int(rand() * n) +1], (i == m) ? "\n" : " " }' FS=\\t input
Try this:

# This can be an external file of course
# Note COL1 and COL2 separated by a hard TAB
cat <<EOF > d1.txt
5 6 14 22 23 25 27 84 85 88 89 94 95 98 100    6 94
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406    205 284 307 406
2 10 13 40 47 58    2 13 40 87
EOF

# Loop to read each line; note we convert the TAB to ':', though IFS could have been used
cat d1.txt | sed 's/\t/:/' | while read LINE
do
    # Get the 1st column data
    COL1=$( echo ${LINE} | cut -d':' -f1 )
    # Get col1 number of items
    NUM_COL1=$( echo ${COL1} | wc -w )
    # Get col2 number of items
    NUM_COL2=$( echo ${LINE} | cut -d':' -f2 | wc -w )
    # Now split col1 items into an array
    read -r -a COL1_NUMS <<< "${COL1}"
    COL2=" "
    # This loop runs once for each COL2 item
    COUNT=0
    while [ ${COUNT} -lt ${NUM_COL2} ]
    do
        # Generate a random number to use as the random index for COL1
        COL1_IDX=${RANDOM}
        let "COL1_IDX %= ${NUM_COL1}"
        NEW_NUM=${COL1_NUMS[${COL1_IDX}]}
        # Check for duplicate
        DUP_FOUND=$( echo "${COL2}" | grep ${NEW_NUM} )
        if [ -z "${DUP_FOUND}" ]
        then
            # Not a duplicate: increment the loop counter and do the next one
            let "COUNT = COUNT + 1"
            # Add the random COL1 item to COL2
            COL2="${COL2} ${COL1_NUMS[${COL1_IDX}]}"
        fi
    done
    # Sort COL2
    COL2=$( echo ${COL2} | tr ' ' '\012' | sort -n | tr '\012' ' ' )
    # Print
    echo ${COL1} :: ${COL2}
done

Output:

5 6 14 22 23 25 27 84 85 88 89 94 95 98 100 :: 88 95
6 8 17 20 193 205 209 284 294 295 299 304 305 307 406 :: 20 299 304 305
2 10 13 40 47 58 :: 2 10 40 58
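For comparison, if GNU coreutils' shuf is available, the per-row sampling can be done with a much shorter loop. This is only a sketch: it assumes the two columns are separated by a tab in a file named file, and unlike the gawk answer above it simply samples from column 1 without replacement, without excluding values that already appear in column 2:

while IFS=$'\t' read -r col1 col2; do
    n=$(wc -w <<< "$col2")                                    # how many numbers to draw
    sample=$(tr ' ' '\n' <<< "$col1" | shuf -n "$n" | sort -n | tr '\n' ' ')
    printf '%s\t%s\n' "$col1" "${sample% }"                   # trim the trailing space
done < file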
Remove rows that have a specific numeric value in a field
I have a very bulky file of about 1M lines like this:

4001 168991 11191 74554 60123 37667 125750 28474
8 145 25 101 83 51 124 43
2985 136287 4424 62832 50788 26847 89132 19184
3 129 14 101 88 61 83 32
1 14 10 12 7 13 4
6136 158525 14054 100072 134506 78254 146543 41638
1 40 4 14 19 10 35 4
2981 112734 7708 54280 50701 33795 75774 19046
7762 339477 26805 148550 155464 119060 254938 59592
1 22 2 12 10 6 17 2
6 136 16 118 184 85 112 56
1 28 1 5 18 25 40 2
1 26 2 19 28 6 18 3
4071 122584 14031 69911 75930 52394 89733 30088
1 9 1 3 4 3 11 2
14 314 32 206 253 105 284 66

I want to remove rows that have a value less than 100 in the second column. How to do this with sed?
I would use awk to do this. Example:

awk '$2 >= 100' file.txt

This will only display the rows from file.txt whose second column ($2) is greater than or equal to 100.
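Plain awk has no in-place option, so to actually modify the file you would normally write to a temporary file and move it back (gawk 4.1+ also offers -i inplace). A small sketch, with the temporary file name chosen arbitrarily:

awk '$2 >= 100' file.txt > file.tmp && mv file.tmp file.txt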
Use the following approach:

sed -E '/^\w+\s+([0-9]{1,2}|[0][0-9]+)\b/d' /tmp/test.txt

(replace /tmp/test.txt with your current file path)

([0-9]{1,2}|[0][0-9]+) - matches either a number from 0 to 99 OR a number with a leading zero (e.g. 012, 00982)
d - delete the pattern space
-E (--regexp-extended) - use extended regular expressions rather than basic regular expressions

To remove the matched lines in place, use the -i option:

sed -i -E '/^\w+\s+([0-9]{1,2}|[0][0-9]+)\b/d' /tmp/test.txt
convert comma separated list in text file into columns in bash
I've managed to extract data (from an HTML page) that goes into a table, and I've isolated the columns of said table into a text file that contains the lines below:

[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]

Each bracketed list of numbers represents a column. What I'd like to do is turn these lists into actual columns that I can work with in different data formats. I'd also like to be sure to include the blank parts of these lists too (i.e., "[,,,]").

This is basically what I'm trying to accomplish:

30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
. . . .
. . . .
. . . .

I'm parsing data from a web page, and ultimately planning to make the process as automated as possible so I can easily work with the data after I output it to a nice format. Does anyone know how to do this, or have any suggestions or thoughts on scripting it?
Since you have your lists in python, just do it in python:

l=[["30", "30", "32"], ["28","6","6"], ["-7", "", ""], ["0", "", ""]]
for i in zip(*l):
    print "\t".join(i)

produces

30 28 -7 0
30 6
32 6
awk based solution:

awk -F, '{gsub(/\[|\]/, ""); for (i=1; i<=NF; i++) a[i]=a[i] ? a[i] OFS $i : $i} END {for (i=1; i<=NF; i++) print a[i]}' file
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
..........
..........
Another solution, but it works only for a file with 4 lines:

$ paste \
  <(sed -n '1{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
  <(sed -n '2{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
  <(sed -n '3{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
  <(sed -n '4{s,\[,,g;s,\],,g;s|,|\n|g;p}' t)
30      28      -7      0
30      6
32      6
35      50      43      3
34      58      71      5
43      56      30      1.5
52      64      23      1
68      87      28      1.5
88      99      13      0.5
97      110     13      0.5
105     116     10      0
107     119     11      0.5
107     120     12      0.5
105     117     11      0.5
101     114     13      0.5
93      113     22      1
88      103     17      0.5
80      82      3       0
69      6               -0.5
55      47      -15     -0.5
                -20     2.5

                38
                71

Updated: or another version with preprocessing:

$ sed 's|\[||;s|\][,]\?||' t >t2
$ paste \
  <(sed -n '1{s|,|\n|g;p}' t2) \
  <(sed -n '2{s|,|\n|g;p}' t2) \
  <(sed -n '3{s|,|\n|g;p}' t2) \
  <(sed -n '4{s|,|\n|g;p}' t2)
If a file named data contains the data given in the problem (exactly as defined above), then the following bash command line will produce the output requested:

$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T

Example:

$ cat data
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]

$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
30    28    -7    0
30    6     43    3
32    6     71    5
35    50    30    1.5
34    58    23    1
43    56    28    1.5
52    64    13    0.5
68    87    13    0.5
88    99    10    0
97    110   11    0.5
105   116   12    0.5
107   119   11    0.5
107   120   13    0.5
105   117   22    1
101   114   17    0.5
93    113   3     0
88    103   -15   -0.5
80    82    -20   -0.5
69    6     38    2.5
55    47    71
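Note that rs ships with the BSDs and macOS but is not installed by default on most Linux distributions; on Debian or Ubuntu it should be available as a package (package name assumed here, check your distribution):

sudo apt-get install rs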
using sort command in shell scripting
I execute the following code:

for i in {1..12}; do
    printf "%s %s\n" "${edate1[$i]}" "${etime1[$i]}"
done

(I retrieve the values of edate1 and etime1 from my database and store them in arrays, which works fine.)

I receive the output as:

97 16
97 16
97 12
107 16
97 16
97 16
97 16
97 16
97 16
97 16
97 16
100 15

I need to sort the first column using the sort command. Expected output:

107 16
100 16
97 12
97 16
97 16
97 16
97 16
97 16
97 16
97 16
97 16
97 15
This is what I did to find your solution:

Copy your original input to in.txt, then run this code, which uses awk, sort, and paste:

awk '{print $1}' in.txt | sort -g -r -s > tmp.txt
paste tmp.txt in.txt | awk '{print $1 " " $3}' > out.txt

Then out.txt matches the expected output in your original post.

To see how it works, look at this:

$ paste tmp.txt in.txt
107     97 16
100     97 16
97      97 12
97      107 16
97      97 16
97      97 16
97      97 16
97      97 16
97      97 16
97      97 16
97      97 16
97      100 15

So you're getting the first column sorted, then the original columns in place. Awk makes it easy to print out the columns (fields) you're interested in, i.e. the first and third.
This is the best and simplest way to sort your data:

<OUTPUT> | sort -nrk1

where <OUTPUT> is whatever command produces your two-column output. See sort's documentation to learn more about what it can do.
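Applied to the loop from the question, that would look something like this (a sketch that simply pipes the loop's combined output through sort):

for i in {1..12}; do
    printf "%s %s\n" "${edate1[$i]}" "${etime1[$i]}"
done | sort -nrk1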