I'd like to extract the values for MemTotal, MemFree, MemAvailable, SwapTotal and SwapFree from /proc/meminfo in Golang. The closest I've gotten so far is to use fmt.Sscanf(), which gives me the values I want one at a time, but I'm also getting many lines of zeros in the output. Here's the code I'm using:
package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    f, e := os.Open("/proc/meminfo")
    if e != nil {
        panic(e)
    }
    defer f.Close()
    s := bufio.NewScanner(f)
    for s.Scan() {
        var n int
        fmt.Sscanf(s.Text(), "MemFree: %d kB", &n)
        fmt.Println(n)
    }
}
Which gives me the following results:
0
11260616
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
So the first question: is there a way to limit the results to the one (non-zero) value I'm after? Or is there a better way to approach this altogether?
My /proc/meminfo file looks like this:
MemTotal: 16314336 kB
MemFree: 11268004 kB
MemAvailable: 13955820 kB
Buffers: 330284 kB
Cached: 2536848 kB
SwapCached: 0 kB
Active: 1259348 kB
Inactive: 3183140 kB
Active(anon): 4272 kB
Inactive(anon): 1578028 kB
Active(file): 1255076 kB
Inactive(file): 1605112 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4194304 kB
SwapFree: 4194304 kB
Dirty: 96 kB
Writeback: 0 kB
AnonPages: 1411704 kB
Mapped: 594408 kB
Shmem: 6940 kB
KReclaimable: 151936 kB
Slab: 253384 kB
SReclaimable: 151936 kB
SUnreclaim: 101448 kB
KernelStack: 17184 kB
PageTables: 25060 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 12351472 kB
Committed_AS: 6092984 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 40828 kB
VmallocChunk: 0 kB
Percpu: 5696 kB
AnonHugePages: 720896 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 230400 kB
DirectMap2M: 11235328 kB
DirectMap1G: 14680064 kB
Note that s.Scan() reads the input line by line. If a line does not match the format string given to fmt.Sscanf, your program outputs 0, because var n int is declared inside the loop and so starts at zero on every iteration. My suggestion is to check the first result returned by fmt.Sscanf, i.e., the number of items successfully parsed. If that result is 1, you have a match and can output the value. See working example here: https://go.dev/play/p/RtBKusGg8wV
EDIT: I tried to stay as close as possible to your code. There may be further issues, as the unit of measurement may vary according to the man pages. It may be good enough for your use case, however, if the values in question are always output in "kB" on your systems.
I'd like to extract the values for MemTotal, MemFree, MemAvailable, SwapTotal and SwapFree from /proc/meminfo in Golang.
When I look at the values you provided from /proc/meminfo I think of a map: key/value pairs using items from the first column as keys and items from the second column as values.
To keep it simple, you could use map[string]string initially to collect, then convert where needed later to a specific type.
From there, you could use the comma ok idiom to check whether values are available for the specific data you'd like to retrieve.
If you don't care about the specific values and just want anything that is non-zero, you could filter the key/value pairs before putting them in the map by checking that the value is not zero. I'd recommend explicitly trimming spaces before any comparisons you make.
EDIT: Note, I ran into issues using other approaches and eventually switched to using a bufio.Scanner to process the file I was working with (also in the /proc filesystem).
I would like to show some experimental results about RocksDB Put performance: put throughput with two threads is lower than with a single thread. This is weird, because RocksDB uses the default skiplist as its memtable, and this data structure supports concurrent writes.
Here is my testing code.
uint64_t nthread = 2;
uint64_t nkeys = 16000000;
std::thread threads[nthread];
std::atomic<uint64_t> idx(1000000);
for (int t = 0; t < nthread; t++) {
    threads[t] = std::thread([db, &idx, nthread, nkeys, &write_option_disable] {
        WriteBatch batch;
        for (int i = 0; i < nkeys / nthread; i++) {
            std::string key = "WVERIFY" + std::to_string(idx.fetch_add(1));
            std::string value = "MOCK";
            auto ikey = rocksdb::Slice(key);
            auto ivalue = rocksdb::Slice(value);
            db->Put(write_option_disable, ikey, ivalue);
        }
        return 0;
    });
}
for (auto& t : threads) {
    t.join();
}
Besides, here are the results I got.
// Single thread
Uptime(secs): 8.4 total, 8.3 interval
Flush(GB): cumulative 1.170, interval 1.170
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 1.17 GB write, 143.35 MB/s write, 0.00 GB read, 0.00 MB/s read, 8.1 seconds
Interval compaction: 1.17 GB write, 144.11 MB/s write, 0.00 GB read, 0.00 MB/s read, 8.1 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
Block cache LRUCache#0x564742515ea0#7011 capacity: 8.00 MB collections: 1 last_copies: 0 last_secs: 2e-05 secs_since: 8
Block cache entry stats(count,size,portion): Misc(1,0.00 KB,0%)
** File Read Latency Histogram By Level [default] **
** DB Stats **
Uptime(secs): 8.4 total, 8.3 interval
Cumulative writes: 16M writes, 16M keys, 16M commit groups, 1.0 writes per commit group, ingest: 1.63 GB, 199.80 MB/s
Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 16M writes, 16M keys, 16M commit groups, 1.0 writes per commit group, ingest: 1669.88 MB, 200.85 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
// 2 threads
Uptime(secs): 31.4 total, 31.4 interval
Flush(GB): cumulative 0.183, interval 0.183
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.67 GB write, 21.84 MB/s write, 0.97 GB read, 31.68 MB/s read, 10.2 seconds
Interval compaction: 0.67 GB write, 21.87 MB/s write, 0.97 GB read, 31.72 MB/s read, 10.2 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
Block cache LRUCache#0x5619fb7bbea0#6183 capacity: 8.00 MB collections: 1 last_copies: 0 last_secs: 1.9e-05 secs_since: 31
Block cache entry stats(count,size,portion): Misc(1,0.00 KB,0%)
** File Read Latency Histogram By Level [default] **
** DB Stats **
Uptime(secs): 31.4 total, 31.4 interval
Cumulative writes: 16M writes, 16M keys, 11M commit groups, 1.4 writes per commit group, ingest: 0.45 GB, 14.67 MB/s
Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 16M writes, 16M keys, 11M commit groups, 1.4 writes per commit group, ingest: 460.94 MB, 14.69 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
===========================update==========================
These are my RocksDB settings.
DB* db;
Options options;
BlockBasedTableOptions table_options;
rocksdb::WriteOptions write_option_disable;
write_option_disable.disableWAL = true;
// Optimize RocksDB. This is the easiest way to get RocksDB to perform well
options.IncreaseParallelism();
options.OptimizeLevelStyleCompaction();
// create the DB if it's not already present
options.create_if_missing = true;
The atomic idx shared between the two threads can introduce non-trivial overhead. Try inserting random values from each thread, and maybe increase the number of threads.
Consider the cache system with the following properties:
Cache (direct mapped cache):
- Cache size 128 bytes, block size 16 bytes (2^4 bytes)
- Tag/Valid bits for cache blocks are as follows:
Block index - 0 1 2 3 4 5 6 7
Tag         - 0 6 7 0 5 3 1 3
Valid       - 1 0 0 1 0 0 0 1
Find the tag, block index, block offset, and cache hit/miss for the memory addresses 0x7f6 and 0x133.
I am not sure how to solve.
Since the cache size is 128 bytes, the cache has 128/16 = 8 blocks, and hence the block index is 3 bits.
Since the block size is 16 bytes, the block offset is 4 bits.
Address bits are 12 for 0x7f6 = 0111 1111 0110:
Offset = 0110 = 6
Index = 111 = 7
Tag = 01111 = 0xf
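One way to check the arithmetic is a small Go sketch that takes the low 4 bits as the block offset (16-byte blocks), the next 3 bits as the block index (8 blocks), and the remaining bits as the tag, then compares against the tag/valid table from the question. The names decode and hit are my own, and the 12-bit address width is assumed from the two addresses given.

```go
package main

import "fmt"

const (
	offsetBits = 4 // 16-byte blocks: 2^4
	indexBits  = 3 // 128/16 = 8 blocks: 2^3
)

// line is one cache block's bookkeeping: its stored tag and valid bit.
type line struct {
	tag   uint32
	valid bool
}

// cache mirrors the tag/valid table from the question, by block index.
var cache = [8]line{
	{0, true}, {6, false}, {7, false}, {0, true},
	{5, false}, {3, false}, {1, false}, {3, true},
}

// decode splits an address into tag, block index and block offset.
func decode(addr uint32) (tag, index, offset uint32) {
	offset = addr & (1<<offsetBits - 1)
	index = (addr >> offsetBits) & (1<<indexBits - 1)
	tag = addr >> (offsetBits + indexBits)
	return tag, index, offset
}

// hit reports whether addr maps to a valid block whose tag matches.
func hit(addr uint32) bool {
	tag, index, _ := decode(addr)
	l := cache[index]
	return l.valid && l.tag == tag
}

func main() {
	for _, a := range []uint32{0x7f6, 0x133} {
		tag, index, offset := decode(a)
		fmt.Printf("0x%03x: tag=0x%x index=%d offset=%d hit=%v\n",
			a, tag, index, offset, hit(a))
	}
	// Prints:
	// 0x7f6: tag=0xf index=7 offset=6 hit=false
	// 0x133: tag=0x2 index=3 offset=3 hit=false
}
```

Both accesses land on valid blocks (index 7 holds tag 3, index 3 holds tag 0), but the stored tags differ from the address tags, so under these assumptions both are misses.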
We had a system with a 3-node Cassandra 2.0.6 ring. Over time, the application load on that system increased until a limit where the ring could not handle it anymore, causing the typical node overload failures.
We doubled the size of the ring, and recently even added one more node, to try to handle the load, but there're still only 3 nodes taking all the load; but not the original 3 nodes of the initial ring.
We did the bootstrap + cleanup process described in the adding nodes guide. We also tried repairs on each node after not seeing much improvements in the ring load. Our load is 99.99% writes on this system.
Here's a chart of the cluster load illustrating the issue:
The highest load tables have a high cardinality on the partition key that I'd expect distributes well over vnodes.
Edit: nodetool info
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns   Host ID             Rack
UN  x.y.z.92   56.83 GB   256     13.8%  x-y-z-b53e8ab55e0a  rack1
UN  x.y.z.253  136.87 GB  256     15.2%  x-y-z-bd3cf08449c8  rack1
UN  x.y.z.70   69.84 GB   256     14.2%  x-y-z-39e63dd017cd  rack1
UN  x.y.z.251  74.03 GB   256     14.4%  x-y-z-36a6c8e4a8e8  rack1
UN  x.y.z.240  51.77 GB   256     13.0%  x-y-z-ea239f65794d  rack1
UN  x.y.z.189  128.49 GB  256     14.3%  x-y-z-7c36c93e0022  rack1
UN  x.y.z.99   53.65 GB   256     15.2%  x-y-z-746477dc5db9  rack1
Edit: tpstats (node highly loaded)
Pool Name               Active  Pending  Completed  Blocked  All time blocked
ReadStage               0       0        11591287   0        0
RequestResponseStage    0       0        283211224  0        0
MutationStage           32      405875   349531549  0        0
ReadRepairStage         0       0        3591       0        0
ReplicateOnWriteStage   0       0        0          0        0
GossipStage             0       0        3246983    0        0
AntiEntropyStage        0       0        72055      0        0
MigrationStage          0       0        133        0        0
MemoryMeter             0       0        205        0        0
MemtablePostFlusher     0       0        94915      0        0
FlushWriter             0       0        12521      0        0
MiscStage               0       0        34680      0        0
PendingRangeCalculator  0       0        14         0        0
commitlog_archiver      0       0        0          0        0
AntiEntropySessions     1       1        1          0        0
InternalResponseStage   0       0        30         0        0
HintedHandoff           0       0        1957       0        0
Message type      Dropped
RANGE_SLICE       0
READ_REPAIR       196
PAGED_RANGE       0
BINARY            0
READ              0
MUTATION          31663792
_TRACE            24409
REQUEST_RESPONSE  4
COUNTER_MUTATION  0
How could I further troubleshoot this issue?
You need to run nodetool cleanup on the previous nodes that were part of the ring. Nodetool cleanup will remove the partition keys that the node currently does not own.
Seems like after the addition of the nodes, the keys have not been deleted hence causing the load to be higher on the previous nodes.
Try running
nodetool cleanup
on the previous nodes
Based on the following output of the !address -summary command, I think I have a native memory leak. In order to determine the call stack where these allocations are happening, I am following the article at http://www.codeproject.com/KB/cpp/MemoryLeak.aspx
0:000> !address -summary
TEB 7efdd000 in range 7efdb000 7efde000
TEB 7efda000 in range 7efd8000 7efdb000
TEB 7efd7000 in range 7efd5000 7efd8000
TEB 7efaf000 in range 7efad000 7efb0000
TEB 7efac000 in range 7efaa000 7efad000
ProcessParametrs 00441b78 in range 00440000 00540000
Environment 004407f0 in range 00440000 00540000
-------------------- Usage SUMMARY --------------------------
TotSize ( KB) Pct(Tots) Pct(Busy) Usage
551a000 ( 87144) : 04.16% 14.59% : RegionUsageIsVAD
5b8d3000 ( 1499980) : 71.53% 00.00% : RegionUsageFree
2cc3000 ( 45836) : 02.19% 07.68% : RegionUsageImage
4ff000 ( 5116) : 00.24% 00.86% : RegionUsageStack
0 ( 0) : 00.00% 00.00% : RegionUsageTeb
1c040000 ( 459008) : 21.89% 76.87% : RegionUsageHeap
0 ( 0) : 00.00% 00.00% : RegionUsagePageHeap
1000 ( 4) : 00.00% 00.00% : RegionUsagePeb
0 ( 0) : 00.00% 00.00% : RegionUsageProcessParametrs
0 ( 0) : 00.00% 00.00% : RegionUsageEnvironmentBlock
Tot: 7fff0000 (2097088 KB) Busy: 2471d000 (597108 KB)
0:000> !heap -s
LFH Key : 0x7fdcf95f
Termination on corruption : DISABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
00440000 00000002 453568 436656 453568 62 54 32 0 0 LFH
006b0000 00001002 64 16 64 4 2 1 0 0
002b0000 00041002 256 4 256 2 1 1 0 0
00620000 00001002 64 16 64 5 2 1 0 0
00250000 00001002 64 16 64 4 2 1 0 0
007d0000 00041002 256 4 256 0 1 1 0 0
005c0000 00001002 1088 388 1088 7 17 2 0 0 LFH
02070000 00041002 256 4 256 1 1 1 0 0
02270000 00041002 256 144 256 0 1 1 0 0 LFH
04e10000 00001002 3136 1764 3136 384 36 3 0 0 LFH
External fragmentation 21 % (36 free blocks)
-----------------------------------------------------------------------------
But when I run the !heap -p -a command, I don't get any call stack, just the following. Any ideas how to get the call stack of the allocation's source?
0:000> !heap -p -a 0218e008
address 0218e008 found in
_HEAP # 4e10000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
0218e000 001c 0000 [00] 0218e008 000d4 - (busy)
You should use Deleaker. It's a powerful tool for debugging.
Use valgrind for Linux and Deleaker for Windows.
If you don't get a call stack from !heap -p -a:
The reason may be that you have not used gflags correctly.
Remember to use the correct image name, including .exe.
Try starting it interactively and going to the image tab; that might be easier.
Try with page heap; that also gives a call stack.
I know nothing about Windows, but at least on Unix systems a debugger (like gdb on Linux) is useful to understand callstacks.
And you could also circumvent some of your issues by using e.g. Boehm's conservative garbage collector. On many systems you can also hunt memory leaks with the help of valgrind
I created a basic TCP server that reads incoming binary data in protocol buffer format, and writes a binary message as a response. I would like to benchmark the roundtrip time.
I tried iperf, but could not make it send the same input file multiple times. Is there another benchmark tool that can send a binary input file repeatedly?
If you have access to a Linux or Unix machine [1], you should use tcptrace. All you need to do is loop through your binary traffic test while capturing to a file with wireshark or tcpdump.
After you have that .pcap file [2], analyze it with tcptrace -xtraffic <pcap_filename> [3]. This will generate two text files, and the average RTT stats for all connections in that pcap are shown at the bottom of the one called traffic_stats.dat.
[mpenning@Bucksnort tcpperf]$ tcptrace -xtraffic willers.pcap
mod_traffic: characterizing traffic
1 arg remaining, starting with 'willers.pcap'
Ostermann's tcptrace -- version 6.6.1 -- Wed Nov 19, 2003
16522 packets seen, 16522 TCP packets traced
elapsed wallclock time: 0:00:00.200709, 82318 pkts/sec analyzed
trace file elapsed time: 0:03:21.754962
Dumping port statistics into file traffic_byport.dat
Dumping overall statistics into file traffic_stats.dat
Plotting performed at 15.000 second intervals
[mpenning@Bucksnort tcpperf]$
[mpenning@Bucksnort tcpperf]$ cat traffic_stats.dat
Overall Statistics over 201 seconds (0:03:21.754962):
4135308 ttl bytes sent, 20573.672 bytes/second
4135308 ttl non-rexmit bytes sent, 20573.672 bytes/second
0 ttl rexmit bytes sent, 0.000 bytes/second
16522 packets sent, 82.199 packets/second
200 connections opened, 0.995 conns/second
11 dupacks sent, 0.055 dupacks/second
0 rexmits sent, 0.000 rexmits/second
average RTT: 67.511 msecs <------------------
[mpenning@Bucksnort tcpperf]$
The .pcap file used in this example was a capture I generated when I looped through an expect script that pulled data from one of my servers. This was how I generated the loop...
#!/usr/bin/python
from subprocess import Popen, PIPE
import time

# Launch the test 200 times, one second apart, while the capture is running.
for ii in range(0, 200):
    # willers.exp is an expect script
    Popen(['./willers.exp'], stdin=PIPE, stdout=PIPE, stderr=PIPE)
    time.sleep(1)
You can adjust the sleep time between loops based on your server's accept() performance and the duration of your tests.
END NOTES:
1. A Knoppix Live-CD will do
2. Filtered to only capture test traffic
3. tcptrace is capable of very detailed per-socket stats if you use other options...
================================
[mpenning@Bucksnort tcpperf]$ tcptrace -lr willers.pcap
1 arg remaining, starting with 'willers.pcap'
Ostermann's tcptrace -- version 6.6.1 -- Wed Nov 19, 2003
16522 packets seen, 16522 TCP packets traced
elapsed wallclock time: 0:00:00.080496, 205252 pkts/sec analyzed
trace file elapsed time: 0:03:21.754962
TCP connection info:
200 TCP connections traced:
TCP connection 1:
host c: myhost.local:44781
host d: willers.local:22
complete conn: RESET (SYNs: 2) (FINs: 1)
first packet: Tue May 31 22:52:24.154801 2011
last packet: Tue May 31 22:52:25.668430 2011
elapsed time: 0:00:01.513628
total packets: 73
filename: willers.pcap
c->d: d->c:
total packets: 34 total packets: 39
resets sent: 4 resets sent: 0
ack pkts sent: 29 ack pkts sent: 39
pure acks sent: 11 pure acks sent: 2
sack pkts sent: 0 sack pkts sent: 0
dsack pkts sent: 0 dsack pkts sent: 0
max sack blks/ack: 0 max sack blks/ack: 0
unique bytes sent: 2512 unique bytes sent: 14336
actual data pkts: 17 actual data pkts: 36
actual data bytes: 2512 actual data bytes: 14336
rexmt data pkts: 0 rexmt data pkts: 0
rexmt data bytes: 0 rexmt data bytes: 0
zwnd probe pkts: 0 zwnd probe pkts: 0
zwnd probe bytes: 0 zwnd probe bytes: 0
outoforder pkts: 0 outoforder pkts: 0
pushed data pkts: 17 pushed data pkts: 33
SYN/FIN pkts sent: 1/1 SYN/FIN pkts sent: 1/0
req 1323 ws/ts: Y/Y req 1323 ws/ts: Y/Y
adv wind scale: 6 adv wind scale: 1
req sack: Y req sack: Y
sacks sent: 0 sacks sent: 0
urgent data pkts: 0 pkts urgent data pkts: 0 pkts
urgent data bytes: 0 bytes urgent data bytes: 0 bytes
mss requested: 1460 bytes mss requested: 1460 bytes
max segm size: 792 bytes max segm size: 1448 bytes
min segm size: 16 bytes min segm size: 32 bytes
avg segm size: 147 bytes avg segm size: 398 bytes
max win adv: 40832 bytes max win adv: 66608 bytes
min win adv: 5888 bytes min win adv: 66608 bytes
zero win adv: 0 times zero win adv: 0 times
avg win adv: 14035 bytes avg win adv: 66608 bytes
initial window: 32 bytes initial window: 40 bytes
initial window: 1 pkts initial window: 1 pkts
ttl stream length: 2512 bytes ttl stream length: NA
missed data: 0 bytes missed data: NA
truncated data: 0 bytes truncated data: 0 bytes
truncated packets: 0 pkts truncated packets: 0 pkts
data xmit time: 1.181 secs data xmit time: 1.236 secs
idletime max: 196.9 ms idletime max: 196.9 ms
throughput: 1660 Bps throughput: 9471 Bps
RTT samples: 18 RTT samples: 24
RTT min: 43.8 ms RTT min: 0.0 ms
RTT max: 142.5 ms RTT max: 7.2 ms
RTT avg: 68.5 ms RTT avg: 0.7 ms
RTT stdev: 35.8 ms RTT stdev: 1.6 ms
RTT from 3WHS: 80.8 ms RTT from 3WHS: 0.0 ms
RTT full_sz smpls: 1 RTT full_sz smpls: 3
RTT full_sz min: 142.5 ms RTT full_sz min: 0.0 ms
RTT full_sz max: 142.5 ms RTT full_sz max: 0.0 ms
RTT full_sz avg: 142.5 ms RTT full_sz avg: 0.0 ms
RTT full_sz stdev: 0.0 ms RTT full_sz stdev: 0.0 ms
post-loss acks: 0 post-loss acks: 0
segs cum acked: 0 segs cum acked: 9
duplicate acks: 0 duplicate acks: 1
triple dupacks: 0 triple dupacks: 0
max # retrans: 0 max # retrans: 0
min retr time: 0.0 ms min retr time: 0.0 ms
max retr time: 0.0 ms max retr time: 0.0 ms
avg retr time: 0.0 ms avg retr time: 0.0 ms
sdv retr time: 0.0 ms sdv retr time: 0.0 ms
================================
You can always stick a shell loop around a program like iperf. Also, assuming iperf (or a program like ttcp) can read from a file, and thus from stdin, a shell loop could cat the file N times into iperf/ttcp.
If you want a program which sends a file, waits for your binary response, and then sends another copy of the file, you probably are going to need to code that yourself.
You will need to measure the time in the client application for a roundtrip time, or monitor the network traffic going from, and coming to, the client to get the complete time interval. Measuring the time at the server will exclude any kernel level delays in the server and all the network transmission times.
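As a rough illustration of timing the roundtrip in the client, here is a minimal Go sketch. It sends the same payload repeatedly over one connection and times each request/response pair. The names measureRTT and startEchoServer are hypothetical, the echo server is only a local stand-in for the real protobuf server, and the response is assumed to have a known fixed length.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// measureRTT opens one TCP connection to addr, then sends payload and waits
// for respLen bytes of reply, rounds times. It returns one duration per round.
func measureRTT(addr string, payload []byte, respLen, rounds int) ([]time.Duration, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	resp := make([]byte, respLen)
	rtts := make([]time.Duration, 0, rounds)
	for i := 0; i < rounds; i++ {
		start := time.Now()
		if _, err := conn.Write(payload); err != nil {
			return rtts, err
		}
		// Block until the whole response has arrived.
		if _, err := io.ReadFull(conn, resp); err != nil {
			return rtts, err
		}
		rtts = append(rtts, time.Since(start))
	}
	return rtts, nil
}

// startEchoServer is a stand-in for the server under test: it echoes every
// byte back. It returns the listening address and a function to stop it.
func startEchoServer() (addr string, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func() {
				defer c.Close()
				io.Copy(c, c)
			}()
		}
	}()
	return ln.Addr().String(), func() { ln.Close() }, nil
}

func main() {
	addr, stop, err := startEchoServer()
	if err != nil {
		panic(err)
	}
	defer stop()

	// In a real test the payload would be read from the binary input file.
	payload := []byte("stand-in for the serialized request")
	rtts, err := measureRTT(addr, payload, len(payload), 5)
	if err != nil {
		panic(err)
	}
	for i, d := range rtts {
		fmt.Printf("round %d: %v\n", i+1, d)
	}
}
```

Against a real server, replace the echo stand-in with the server's address and set respLen to the expected response size (or add length-prefix framing if responses vary in size).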
Note that TCP performance will go down as the load goes up. If you're going to test under heavy load, you need professional tools that can scale to thousands (or even millions in some cases) of new connections per second or concurrent established TCP connections.
I wrote an article about this on my blog (feel free to remove if this is considered advertisement, but I think it's relevant to this thread): http://synsynack.wordpress.com/2012/04/09/realistic-latency-measurement-in-the-application-layers
As a very simple high-level tool, netcat comes to mind: something like time (nc hostname 1234 < input.binary | head -c 100), assuming the response is 100 bytes long.