I have the following issue:
I'm trying to index the same table, with the same data and the same config, on two servers.
The first is my local machine (not a very powerful one).
The second is an Amazon EC2 general-purpose M1 medium instance.
The indexing times differ by almost a factor of two:
Local:
collected 84208 docs, 7.6 MB
sorted 54.9 Mhits, 100.0% done
total 84208 docs, 7646878 bytes
total 27.188 sec, 281252 bytes/sec, 3097.17 docs/sec
total 3 reads, 0.177 sec, 99835.0 kb/call avg, 59.2 msec/call avg
total 1013 writes, 7.735 sec, 679.4 kb/call avg, 7.6 msec/call avg
Amazon:
collected 84208 docs, 7.6 MB
sorted 54.9 Mhits, 100.0% done
total 84208 docs, 7646878 bytes
total 52.111 sec, 146740 bytes/sec, 1615.92 docs/sec
total 3 reads, 1.270 sec, 99833.9 kb/call avg, 423.4 msec/call avg
total 1010 writes, 6.980 sec, 680.8 kb/call avg, 6.9 msec/call avg
Does anybody have a clue what the reason for such a difference could be?
Do I need to enable some specific option for the Sphinx server on Amazon?
Looking at an Oracle Statspack report, with snapshots taken exactly before and after a batch program, is it possible to say how many commits have been done on the database? How many transactions have been done?
In the Load Profile section you have the number of transactions per second and the number of rollbacks per second:
Load Profile            Per Second   Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~          ------------   ---------------   --------   --------
DB time(s):                    0.4               0.0       0.01       0.06
DB CPU(s):                     0.2               0.0       0.00       0.03
Redo size:                20,766.2           2,092.1
Logical reads:               548.5              55.3
Block changes:               138.3              13.9
Physical reads:               22.9               2.3
Physical writes:              16.5               1.7
User calls:                    7.0               0.7
Parses:                        9.2               0.9
Hard parses:                   0.0               0.0
W/A MB processed:              1.8               0.2
Logons:                        0.1               0.0
Executes:                     62.6               6.3
Rollbacks:                     0.0               0.0
Transactions:                  9.9
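To turn those rates into totals, multiply them by the elapsed time of the snapshot interval (printed in the report header). Purely as an illustration, assuming a hypothetical one-hour window: 9.9 transactions/s × 3,600 s ≈ 35,640 transactions, and 0.0 rollbacks/s gives essentially 0 rollbacks, so practically all of those transactions ended in commits.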
I've recently studied in my syllabus that Kb refers to kilobits whereas KB refers to kilobytes. I've also studied that Kb is typically used for transfer speeds whereas KB is used for file sizes. So, according to what I've studied, I should only be able to download a 1 MB file in 8 seconds at a speed of 1 Mbps, since 1 MB equals 8 Mb. But I can download that file in just 1 second at a speed of 1 Mbps. How is that possible?
You are correct until the last statement.
You can download a 1 MB file in 8 sec at 1 Mb/s or 1 sec at 1 MB/s.
8 Mb = 1 MB.
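Worked out: 1 MB = 8 Mb, so at 1 Mb/s the download takes 8 Mb ÷ 1 Mb/s = 8 s, and at 1 MB/s (i.e. 8 Mb/s) it takes 1 s. If the file really arrives in 1 second, the most likely explanation is that the speed you are reading is being reported in megabytes per second, not megabits per second.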
We have got the TOTAL row:
Label: 10
Average: 1288
Median: 1278
90%: 1525
95%: 1525
99%: 1546
Min: 887
Max: 1546
Throughput: 6.406149903907751
KB/sec: 39.21264413837284
What does KB/sec mean? Please help me understand it.
According to the Glossary
KB/s (Aggregate Report)
Throughput is measured in bytes and represents the amount of data that the Virtual users received from the server. The Throughput KPI is measured in kilobytes (KB) per second.
So basically it is the average amount of data received by JMeter from the application under test per second.
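You can sanity-check it against the other columns of your report: with a Throughput of about 6.41 requests per second and 39.21 KB/sec, each sample carried roughly 39.21 × 1024 ÷ 6.41 ≈ 6,270 bytes (about 6.1 KB) of response data on average, assuming JMeter's 1 KB = 1024 bytes.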
KB/sec is the speed of a connection.
KB means kilobyte and sec means per second.
You get faster speeds in MB/sec, which is megabytes per second, and even faster speeds in GB/sec, which is gigabytes per second.
1000 KB = 1 MB
1000 MB = 1 GB
Hope this helps :)
In Hive I am running a query:
select ret[0],ret[1],ret[2],ret[3],ret[4],ret[5],ret[6] from (select combined1(extra) as ret from log_test1) a ;
Here ret[0], ret[1], ret[2], ... are domain, date, IP, etc. This query is doing heavy writes to disk.
iostat result on one of the boxes in the cluster:
avg-cpu: %user %nice %system %iowait %steal %idle
20.65 0.00 1.82 57.14 0.00 20.39
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdb 0.00 0.00 0.00 535.00 0.00 23428.00 87.58 143.94 269.11 0.00 269.11 1.87 100.00
My mapper is basically stuck on disk IO. I have a 3-box cluster. My YARN configuration is:
Mapper memory (mapreduce.map.memory.mb) = 2 GB
I/O Sort Memory Buffer = 1 GB
I/O Sort Spill Percent = 0.8
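(In raw mapred-site.xml terms these should correspond to mapreduce.map.memory.mb=2048, mapreduce.task.io.sort.mb=1024 and mapreduce.map.sort.spill.percent=0.8.)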
The counters of my job are:
FILE: Number of bytes read 0
FILE: Number of bytes written 2568435
HDFS: Number of bytes read 1359720216
HDFS: Number of bytes written 19057298627
Virtual memory (bytes) snapshot 24351916032
Total committed heap usage (bytes) 728760320
Physical memory (bytes) snapshot 2039455744
Map input records 76076426
Input split bytes 2738
GC time elapsed (ms) 55602
Spilled Records 0
As I understand it, the mapper should initially write everything to RAM, and when RAM (the I/O Sort Memory Buffer) gets full, it should spill the data to disk. But as I am seeing, Spilled Records = 0 and the mapper is not even using all of its RAM, yet there is still very heavy disk writing.
Even when I am running the query
select combined1(extra) from log_test1;
I am getting the same heavy disk IO writes.
What can be the reason for this heavy disk writing, and how can I reduce it? In this case disk IO is becoming the bottleneck for my mapper.
It may be that your subquery is being written to disk before the second stage of the processing takes place. You should use EXPLAIN to examine the execution plan.
You could try rewriting your subquery as a CTE: https://cwiki.apache.org/confluence/display/Hive/Common+Table+Expression
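For example, the same query rewritten as a CTE (just a sketch, reusing the combined1 UDF and log_test1 table from the question) would look something like:
with a as (select combined1(extra) as ret from log_test1)
select ret[0], ret[1], ret[2], ret[3], ret[4], ret[5], ret[6] from a;
Running EXPLAIN on both versions should show whether the inner query gets its own file-sink stage.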
I have just read an article by Rico Mariani about the performance of memory access under different locality, architecture, alignment and density conditions.
The author built an array of varying size containing a doubly linked list with an int payload, which was shuffled to a certain percentage. He experimented with this list and found some consistent results on his machine.
Quoting one of the result table:
Pointer implementation with no changes
sizeof(int*)=4 sizeof(T)=12
  shuffle      0%     1%    10%    25%    50%   100%
     1000    1.99   1.99   1.99   1.99   1.99   1.99
     2000    1.99   1.85   1.99   1.99   1.99   1.99
     4000    1.99   2.28   2.77   2.92   3.06   3.34
     8000    1.96   2.03   2.49   3.27   4.05   4.59
    16000    1.97   2.04   2.67   3.57   4.57   5.16
    32000    1.97   2.18   3.74   5.93   8.76  10.64
    64000    1.99   2.24   3.99   5.99   6.78   7.35
   128000    2.01   2.13   3.64   4.44   4.72   4.80
   256000    1.98   2.27   3.14   3.35   3.30   3.31
   512000    2.06   2.21   2.93   2.74   2.90   2.99
  1024000    2.27   3.02   2.92   2.97   2.95   3.02
  2048000    2.45   2.91   3.00   3.10   3.09   3.10
  4096000    2.56   2.84   2.83   2.83   2.84   2.85
  8192000    2.54   2.68   2.69   2.69   2.69   2.68
 16384000    2.55   2.62   2.63   2.61   2.62   2.62
 32768000    2.54   2.58   2.58   2.58   2.59   2.60
 65536000    2.55   2.56   2.58   2.57   2.56   2.56
The author explains:
This is the baseline measurement. You can see the structure is a nice round 12 bytes and it will align well on x86. Looking at the first column, with no shuffling, as expected things get worse and worse as the array gets bigger until finally the cache isn't helping much and you have about the worst you're going to get, which is about 2.55ns on average per item.
But something quite strange can be seen around 32k items:
The results for shuffling are not exactly what I expected. At small sizes, it makes no difference. I expected this because basically the entire table is staying hot in the cache and so locality isn't mattering. Then as the table grows you see that shuffling has a big impact at about 32000 elements. That's 384k of data. Likely because we've blown past a 256k limit.
Now the bizarre thing is this: after this the cost of shuffling actually goes down, to the point that later on it hardly matters at all. Now I can understand that at some point shuffled or not shuffled really should make no difference because the array is so huge that runtime is largely gated by memory bandwidth regardless of order. However... there are points in the middle where the cost of non-locality is actually much worse than it will be at the endgame.
What I expected to see was that shuffling caused us to reach maximum badness sooner and stay there. What actually happens is that at middle sizes non-locality seems to cause things to go very very bad... And I do not know why :)
So the question is: What might have caused this unexpected behavior?
I have thought about this for some time, but found no good explanation. The test code looks fine to me. I don't think CPU branch prediction is the culprit in this instance, as it should be observable far earlier than 32k items, and show a far slighter spike.
I have confirmed this behavior on my box; it looks pretty much exactly the same.
I figured it might be caused by forwarding of CPU state, so I changed the order of row and/or column generation - almost no difference in the output. To make sure, I generated data for a larger continuous sample. For ease of viewing, I put it into Excel.
Another independent run for good measure showed a negligible difference.
I put my best theory here: http://blogs.msdn.com/b/ricom/archive/2014/09/28/performance-quiz-14-memory-locality-alignment-and-density-suggestions.aspx#10561107 but it's just a guess, I haven't confirmed it.
Mystery solved! From my blog:
Ryan (Mon, Sep 29 2014, 9:35 AM):
Wait - are you concluding that completely randomized access is the same speed as sequential for very large cases? That would be very surprising!!
What's the range of rand()? If it's 32k that would mean you're just shuffling the first 32k items and doing basically sequential reads for most items in the large case, and the per-item avg would become very close to the sequential case. This matches your data very well.
Mon, Sep 29 2014, 10:57 AM:
That's exactly it!
The rand function returns a pseudorandom integer in the range 0 to RAND_MAX (32767). Use the srand function to seed the pseudorandom-number generator before calling rand.
I need a different random number generator!
I'll redo it!
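For anyone reproducing this, a minimal sketch of the fix (not the original benchmark code, just an illustration): build the shuffled order with a generator whose range covers the whole array instead of rand(), whose RAND_MAX of 32767 can only ever pick from the first 32768 slots.
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const std::size_t count = 1000000;        // far beyond RAND_MAX (32767)
    std::vector<std::size_t> order(count);
    std::iota(order.begin(), order.end(), 0); // 0, 1, 2, ... sequential order

    // std::mt19937_64 covers the full index range, so every element can move;
    // with rand() % count all swap targets would land in [0, 32767].
    std::mt19937_64 rng(12345);
    std::shuffle(order.begin(), order.end(), rng);

    std::printf("first shuffled index: %zu\n", order[0]);
    return 0;
}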