I do not want to wait for Oracle DataDump expdb to finish writing to dump file.
So I start reading data from the moment it's created.
Then I write this data to another file.
It worked ok - file sizes are the same (the one that OracleDump created and the one my data monitoring script created).
But when I run cmp it shows difference in 27 bytes:
cmp -l ora.dmp monitor_10k_rows.dmp
3 263 154
4 201 131
5 174 173
6 103 75
48 64 70
58 0 340
64 0 1
65 0 104
66 0 110
541 60 61
545 60 61
552 60 61
559 60 61
20508 0 15
20509 0 157
20510 0 230
20526 0 10
20532 0 15
20533 0 225
20534 0 150
913437 0 226
913438 0 37
913454 0 10
913460 0 1
913461 0 104
913462 0 100
ls -al ora.dmp
-rw-r--r-- 1 oracle oinstall 999424 Jun 20 11:35 ora.dmp
python -c 'print 999424-913462'
85962
od ora.dmp -j 913461 -N 1
3370065 000100
3370066
od monitor_10k_rows.dmp -j 913461 -N 1
3370065 000000
3370066
Even if I extract more data the difference is still 27 bytes but different addresses/values:
cmp -l ora.dmp monitor_30k_rows.dmp
3 245 134
4 222 264
5 377 376
6 54 45
48 36 43
57 0 2
58 0 216
64 0 1
65 0 104
66 0 120
541 60 61
545 60 61
552 60 61
559 60 61
20508 0 50
20509 0 126
20510 0 173
20526 0 10
20532 0 50
20533 0 174
20534 0 120
2674717 0 226
2674718 0 47
2674734 0 10
2674740 0 1
2674741 0 104
2674742 0 110
Some writes are the same.
Is there a way know addresses of bytes which will differ?
ls -al ora.dmp
-rw-r--r-- 1 bicadmin bic 2760704 Jun 20 11:09 ora.dmp
python -c 'print 2760704-2674742'
85962
How can update my monitored copy after DataDump updated the original at adress 2674742 using Python for example?
Exact same thing happens if I use COMPRESSION=DATA_ONLY option.
Update: Figured how to sync bytes that differ between 2 files:
def patch_file(fn, diff):
for line in diff.split(os.linesep):
if line:
addr, to_octal, _ = line.strip().split()
with open(fn , 'r+b') as f:
f.seek(int(addr)-1)
f.write(chr(int (to_octal,8)))
diff="""
3 157 266
4 232 276
5 272 273
6 16 25
48 64 57
58 340 0
64 1 0
65 104 0
66 110 0
541 61 60
545 61 60
552 61 60
559 61 60
20508 15 0
20509 157 0
20510 230 0
20526 10 0
20532 15 0
20533 225 0
20534 150 0
913437 226 0
913438 37 0
913454 10 0
913460 1 0
913461 104 0
913462 100 0
"""
patch_file(f3,diff)
wrote a patch using Python:
addr=[3 , 4 , 5 , 6 , 48 , 58 , 64 , 65 , 66 , 541 , 545 , 552 , 559 , 20508 , 20509 , 20510 , 20526 , 20532 , 20533 , 20534 ]
last_range=[85987, 85986, 85970, 85964, 85963, 85962]
def get_bytes(addr):
out =[]
with open(f1 , 'r+b') as f:
for a in addr:
f.seek(a-1)
data= f.read(1)
hex= binascii.hexlify(data)
binary = int(hex, 16)
octa= oct(binary)
out.append((a,octa))
return out
def patch_file(fn, bytes_to_update):
with open(fn , 'r+b') as f:
for (a,to_octal) in bytes_to_update:
print (a,to_octal)
f.seek(int(a)-1)
f.write(chr(int (to_octal,8)))
if 1:
from_file=f1
fsize=os.stat(from_file).st_size
bytes_to_read = addr + [fsize-x for x in last_range]
bytes_to_update = get_bytes(bytes_to_read)
to_file =f3
patch_file(to_file,bytes_to_update)
The reason I do dmp file monitoring is because it cuts backup time in half.
The goal is to remove 22 from the root node and re-balance the tree.
First I remove 22, and replace it by its in-order successor 28.
Secondly I rebalance the resulting tree, by moving the empty node to the left. The resulting tree is below.
Is moving 28 up the right procedure, and did I balance the left side correctly in the end?
22,34
/ | \
16 28 37
/ \ / \ / \
15 21 25 33 35 43
[28],34
/ | \
16 * 37
/ \ / \ / \
15 21 25 33 35 43
34
/ \
16,28 37
/ | \ / \
15 21,25 33 35 43
Thanks!
To delete 22 from
22,34
/ | \
16 28 37
/ \ / \ / \
15 21 25 33 35 43 ,
we replace it by its in-order successor 25, leaving a hole (*).
25,34
/ | \
16 28 37
/ \ / \ / \
15 21 * 33 35 43
We can't fix the hole by borrowing, so we merge its parent into its sibling, moving the hole up.
25,34
/ | \
16 * 37
/ \ | / \
15 21 28,33 35 43
The hole has two siblings now, so we can redistribute one of the parent's keys down.
34
/ \
16,25 37
/ | \ / \
15 21 28,33 35 43
(I'm working from this set of lecture notes. Don't bother memorizing the details here unless it's for an exam. Even then... I really wish data structure courses did not emphasize balanced search trees to the degree that they do.)
I've managed to extract data (from an html page) that goes into a table, and I've isolated the columns of said table into a text file that contains the lines below:
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
Each bracketed list of numbers represents a column. What I'd like to do is turn these lists into actual columns that I can work with in different data formats. I'd also like to be sure to include that blank parts of these lists too (i.e., "[,,,]")
This is basically what I'm trying to accomplish:
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
. . . .
. . . .
. . . .
I'm parsing data from a web page, and ultimately planning to make the process as automated as possible so I can easily work with the data after I output it to a nice format.
Anyone know how to do this, have any suggestions, or thoughts on scripting this?
Since you have your lists in python, just do it in python:
l=[["30", "30", "32"], ["28","6","6"], ["-7", "", ""], ["0", "", ""]]
for i in zip(*l):
print "\t".join(i)
produces
30 28 -7 0
30 6
32 6
awk based solution:
awk -F, '{gsub(/\[|\]/, ""); for (i=1; i<=NF; i++) a[i]=a[i] ? a[i] OFS $i: $i}
END {for (i=1; i<=NF; i++) print a[i]}' file
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
..........
..........
Another solution, but it works only for file with 4 lines:
$ paste \
<(sed -n '1{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '2{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '3{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '4{s,\[,,g;s,\],,g;s|,|\n|g;p}' t)
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
68 87 28 1.5
88 99 13 0.5
97 110 13 0.5
105 116 10 0
107 119 11 0.5
107 120 12 0.5
105 117 11 0.5
101 114 13 0.5
93 113 22 1
88 103 17 0.5
80 82 3 0
69 6 -0.5
55 47 -15 -0.5
-20 2.5
38
71
Updated: or another version with preprocessing:
$ sed 's|\[||;s|\][,]\?||' t >t2
$ paste \
<(sed -n '1{s|,|\n|g;p}' t2) \
<(sed -n '2{s|,|\n|g;p}' t2) \
<(sed -n '3{s|,|\n|g;p}' t2) \
<(sed -n '4{s|,|\n|g;p}' t2)
If a file named data contains the data given in the problem (exactly as defined above), then the following bash command line will produce the output requested:
$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
Example:
cat data
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
$ sed -e 's/[//' -e 's/]//' -e 's/,/ /g' <data | rs -T
30 28 -7 0
30 6 43 3
32 6 71 5
35 50 30 1.5
34 58 23 1
43 56 28 1.5
52 64 13 0.5
68 87 13 0.5
88 99 10 0
97 110 11 0.5
105 116 12 0.5
107 119 11 0.5
107 120 13 0.5
105 117 22 1
101 114 17 0.5
93 113 3 0
88 103 -15 -0.5
80 82 -20 -0.5
69 6 38 2.5
55 47 71