How to get system info with Ruby?

I'd like to get my Pi's system info: CPU usage, CPU temperature, RAM usage, uptime, and the available disk size. I know how to do this in Python, but it won't work in Ruby. Can someone please tell me how I can achieve this? It has to be Ruby, because I need it for my SiriProxy and the plugin is written in Ruby.
Thanks in advance!
This is the Python script:
#!/usr/bin/env python
import os, time

# Return CPU temperature as a character string
def getCPUtemperature():
    res = os.popen('vcgencmd measure_temp').readline()
    return(res.replace("temp=","").replace("'C\n",""))

# Return RAM information (unit=kb) in a list
# Index 0: total RAM
# Index 1: used RAM
# Index 2: free RAM
def getRAMinfo():
    p = os.popen('free')
    i = 0
    while 1:
        i = i + 1
        line = p.readline()
        if i==2:
            return(line.split()[1:4])

# Return % of CPU used by user as a character string
def getCPUuse():
    return(str(os.popen("top -n1 | awk '/Cpu\(s\):/ {print $2}'").readline().strip()))

# Return information about disk space as a list (unit included)
# Index 0: total disk space
# Index 1: used disk space
# Index 2: remaining disk space
# Index 3: percentage of disk used
def getDiskSpace():
    p = os.popen("df -h /")
    i = 0
    while 1:
        i = i + 1
        line = p.readline()
        if i==2:
            return(line.split()[1:5])

# CPU information
CPU_temp = getCPUtemperature()
CPU_usage = getCPUuse()

# RAM information
# Output is in kb, here I convert it to Mb for readability
RAM_stats = getRAMinfo()
RAM_total = round(int(RAM_stats[0]) / 1000,1)
RAM_used = round(int(RAM_stats[1]) / 1000,1)
RAM_free = round(int(RAM_stats[2]) / 1000,1)

# Disk information
DISK_stats = getDiskSpace()
DISK_total = DISK_stats[0]
DISK_free = DISK_stats[1]
DISK_perc = DISK_stats[3]

These method definitions should help you. I'm not in a position to test them, but they should be pretty close if not spot on.
def get_cpu_temperature
  # vcgencmd prints e.g. temp=42.3'C - strip everything but the number
  %x{vcgencmd measure_temp}.lines.first.sub(/temp=/, '').sub(/'C\n/, '')
end

def get_ram_info
  # Second line of `free` holds total/used/free in kB
  %x{free}.lines.to_a[1].split[1, 3]
end

def get_cpu_use
  # Second field of top's Cpu(s) line is the user CPU percentage
  %x{top -n1}.lines.find { |line| /Cpu\(s\):/.match(line) }.split[1]
end

def get_disk_space
  # Second line of `df -h /` holds total/used/available/use%
  %x{df -h /}.lines.to_a[1].split[1, 4]
end
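For completeness, here's how the bottom half of the Python script might look on the Ruby side. This is an untested sketch with the same assumptions as the methods above (a Raspberry Pi where vcgencmd, free, top and df are available); the /proc/uptime read covers the uptime value you asked about and is a Linux-specific assumption:

cpu_temp  = get_cpu_temperature
cpu_usage = get_cpu_use

# free reports kB; convert to MB for readability
ram_total, ram_used, ram_free = get_ram_info.map { |kb| (kb.to_i / 1000.0).round(1) }

disk_total, disk_used, disk_free, disk_perc = get_disk_space

# Uptime in seconds, read straight from the kernel (Linux-only)
uptime_seconds = File.read('/proc/uptime').split.first.to_f

puts "CPU:    #{cpu_usage}% at #{cpu_temp}'C"
puts "RAM:    #{ram_used}/#{ram_total} MB (#{ram_free} MB free)"
puts "Disk:   #{disk_used}/#{disk_total} used (#{disk_perc})"
puts "Uptime: #{(uptime_seconds / 3600).round(1)} hours"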

Related

Improve code result speed by multiprocessing

I'm self-studying Python and this is my first program.
I'm working on analyzing logs from our servers. Usually I need to analyze a full day of logs. I created a script (this is an example with simplified logic) just to check the speed. With straightforward sequential code, analyzing 20 million rows takes about 12-13 minutes. I need 200 million rows in 5 minutes.
What I tried:
Multiprocessing (I ran into an issue with shared memory, which I think I fixed). But the result was 300K rows = 20 sec, no matter how many processes I used. (PS: I also need to control the number of processes in advance.)
Threading (I found that it doesn't give any speedup: 300K rows = 2 sec, but the plain code is also 300K = 2 sec).
Asyncio (I suspect the script is slow because it needs to read many files). The result was the same as threading: 300K = 2 sec.
In the end I think all three of my scripts are incorrect and didn't work properly.
PS: I try to avoid specialized Python modules (like pandas) because they would make it harder to run on different servers. Better to stick to the standard library.
Please help me check the first one - multiprocessing.
import csv
import os
from multiprocessing import Process, Queue, Value, Manager

file = {"hcs.log", "hcs1.log", "hcs2.log", "hcs3.log"}

def argument(m, a, n):
    proc_num = os.getpid()
    a_temp_m = a["vod_miss"]
    a_temp_h = a["vod_hit"]
    with open(os.getcwd() + '/' + m, newline='') as hcs_1:
        hcs_2 = csv.reader(hcs_1, delimiter=' ')
        for j in hcs_2:
            if j[3].find('MISS') != -1:
                a_temp_m[n] = a_temp_m[n] + 1
            elif j[3].find('HIT') != -1:
                a_temp_h[n] = a_temp_h[n] + 1
    a["vod_miss"][n] = a_temp_m[n]
    a["vod_hit"][n] = a_temp_h[n]

if __name__ == '__main__':
    procs = []
    manager = Manager()
    vod_live_cuts = manager.dict()
    i = "vod_hit"
    ii = "vod_miss"
    cpu = 1
    n = 1
    vod_live_cuts[i] = manager.list([0] * cpu)
    vod_live_cuts[ii] = manager.list([0] * cpu)
    for m in file:
        proc = Process(target=argument, args=(m, vod_live_cuts, (n-1)))
        procs.append(proc)
        proc.start()
        if n >= cpu:
            n = 1
            proc.join()
        else:
            n += 1
    [proc.join() for proc in procs]
    [proc.close() for proc in procs]
I expect each file to be processed by an independent process via def argument, and finally all results to be saved in the dict vod_live_cuts. For each process I added an independent list to the dict; I thought this would help with cross-process access to this parameter. But maybe it's the wrong way :(
Using IPC is costly, so only use "shared objects" for saving the final result, not for intermediate results while parsing the file.
Limiting the number of processes is done by using a multiprocessing.Pool; the following code uses it to reach the maximum hard-disk speed. You only need to post-process the results.
You can only parse data as fast as your HDD can read it (typically 30-80 MB/s), so if you need to improve performance further you should use an SSD or RAID0 for higher disk speed; you cannot get much faster than this without changing your hardware.
import csv
import os
from multiprocessing import Manager, Pool

file = {"hcs.log", "hcs1.log", "hcs2.log", "hcs3.log"}

def argument(m, a):
    proc_num = os.getpid()
    a_temp_m_n = 0  # make it local to the process,
    a_temp_h_n = 0  # as shared lists use IPC
    with open(os.getcwd() + '/' + m, newline='') as hcs_1:
        hcs_2 = csv.reader(hcs_1, delimiter=' ')
        for j in hcs_2:
            if j[3].find('MISS') != -1:
                a_temp_m_n = a_temp_m_n + 1
            elif j[3].find('HIT') != -1:
                a_temp_h_n = a_temp_h_n + 1
    a["vod_miss"].append(a_temp_m_n)
    a["vod_hit"].append(a_temp_h_n)

if __name__ == '__main__':
    manager = Manager()
    vod_live_cuts = manager.dict()
    i = "vod_hit"
    ii = "vod_miss"
    cpu = 1
    vod_live_cuts[i] = manager.list()
    vod_live_cuts[ii] = manager.list()
    with Pool(cpu) as pool:
        tasks = []
        for m in file:
            task = pool.apply_async(argument, args=(m, vod_live_cuts))
            tasks.append(task)
        for task in tasks:
            task.get()
    print(list(vod_live_cuts[i]))
    print(list(vod_live_cuts[ii]))

Fast and furious random reading of huge files

I know this question isn't new, but I haven't found anything useful. In my case I have a 20 GB file and I need to read random lines from it. I have a simple file index that contains line numbers and the corresponding seek offsets. I also disabled buffering when reading, so that only the needed line is read.
And this is my code:
import io
import numpy as np

def create_random_file_gen(file_path, batch_size=0, dtype=np.float32, delimiter=','):
    index = load_file_index(file_path)
    if (batch_size > len(index)) or (batch_size == 0):
        batch_size = len(index)
    # np.random.random_integers is deprecated and its upper bound is inclusive,
    # which would allow an out-of-range index; randint excludes the upper bound
    lines_indices = np.random.randint(0, len(index), batch_size)
    with io.open(file_path, 'rb', buffering=0) as f:
        for line_index in lines_indices:
            f.seek(index[line_index])
            line = f.readline(2048)
            yield __get_features_from_line(line, delimiter, dtype)
The problem is that it's extremely slow: reading 5000 lines takes 89 seconds on my Mac (reading from an SSD). Here is the code I used for testing:

features_gen = tedlium_random_speech_gen(5000)  # just a wrapper for the function given above
i = 0
for feature, cls in features_gen:
    if i % 1000 == 0:
        print("Got %d features" % i)
    i += 1
print("Total %d features" % i)
I've read a bit about memory-mapping files, but I don't really understand how it works: how does the mapping work in essence, and will it speed up the process or not?
So the main question is: what are the possible ways to speed up the process? The only way I see right now is to read blocks of lines at random rather than individual lines.

Improving an algorithm for substring search when reading ZIP files

So I have a ZIP reader library, and I read ZIP files by first figuring out where the EOCD record is (the standard way "from the tail"). I have to look for a pattern that is roughly this:
4byte_magic_number, fixed_n_bytes, 2_bytes_of_comment_size, comment
The byte size of the comment is provided in the 2_bytes_of_comment_size field. Just scanning for the magic number is insufficient. I eager-read a substantial portion at the tail of the file - basically the maximum size the ZIP EOCD record can be - and then look for this pattern in there.
So far, I've come up with this:
def locate_eocd_signature(in_str)
  # We have to scan from the _very_ tail. We read the very minimum size
  # the EOCD record can have (up to and including the comment size), using
  # a sliding window. Once our end offset matches the comment size we found our
  # EOCD marker.
  eocd_signature_int = 0x06054b50
  unpack_pattern = 'VvvvvVVv'
  minimum_record_size = 22
  end_location = minimum_record_size * -1
  loop do
    # If the window is nil, we have rolled off the start of the string, nothing to do here.
    # We use negative values because if we used positive slice indices
    # we would have to detect the rollover ourselves
    break unless window = in_str[end_location, minimum_record_size]
    window_location = in_str.bytesize + end_location
    unpacked = window.unpack(unpack_pattern)
    # If we found the signature, pick up the comment size, and check if the size of the window
    # plus that comment size is where we are in the string. If we are - bingo.
    if unpacked[0] == eocd_signature_int && comment_size = unpacked[-1]
      assumed_eocd_location = in_str.bytesize - comment_size - minimum_record_size
      # if the comment size is where we should be at - we found our EOCD
      return assumed_eocd_location if assumed_eocd_location == window_location
    end
    end_location -= 1 # Shift the window back, by one byte, and try again.
  end
end
but it just screams ugly at me. Is there a better way to do something like this? Is there a pack specifier that says "all the bytes in binary until the end of the string" that I don't know of? Then I could tack that onto the end of the pack specifier, for example... A bit at a loss here.
In the end I opted for the following optimization. First, I made a method for finding all the indices of a given substring in a string - there is no stdlib builtin for this.
def all_indices_of_substr_in_str(of_substring, in_string)
  last_i = 0
  found_at_indices = []
  while last_i = in_string.index(of_substring, last_i)
    found_at_indices << last_i
    last_i += of_substring.bytesize
  end
  found_at_indices
end
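For example, a quick sanity check of the helper:

found = all_indices_of_substr_in_str('ab', 'abcabcab')
p found  # => [0, 3, 6]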
Then, we use it to "latch" onto the offsets in our buffer where our signature was found.
def locate_eocd_signature(in_str)
  eocd_signature = 0x06054b50
  eocd_signature_str = [eocd_signature].pack('V')
  unpack_pattern = 'VvvvvVVv'
  minimum_record_size = 22
  str_size = in_str.bytesize
  indices = all_indices_of_substr_in_str(eocd_signature_str, in_str)
  indices.each do |check_at|
    maybe_record = in_str[check_at..str_size]
    # If the record is smaller than the minimum - we will never recover anything
    break if maybe_record.bytesize < minimum_record_size
    # Now we check if the record ends with the combination
    # of the comment size and an arbitrary byte string of that size.
    # If it does - we found our match
    *_unused, comment_size = maybe_record.unpack(unpack_pattern)
    if (maybe_record.bytesize - minimum_record_size) == comment_size
      return check_at # Found the EOCD marker location
    end
  end
  # If we haven't caught anything, return nil deliberately instead of returning the last statement
  nil
end
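For reference, this is roughly how the method can be driven. A sketch, not part of the original code: max_eocd_area is my own name for the largest span the EOCD record can occupy (22 bytes of fixed fields plus a comment of at most 0xFFFF bytes):

# Read the largest possible EOCD area from the tail of the archive
# and search it: 22 fixed bytes plus up to 65535 bytes of comment.
max_eocd_area = 22 + 0xFFFF

File.open('archive.zip', 'rb') do |zip|
  read_size = [zip.size, max_eocd_area].min
  zip.seek(zip.size - read_size)
  tail = zip.read(read_size)
  if offset_in_tail = locate_eocd_signature(tail)
    # Translate the offset inside the tail buffer back into a file offset
    puts "EOCD record starts at byte #{zip.size - read_size + offset_in_tail}"
  else
    puts "No EOCD record found - probably not a ZIP file"
  end
end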

Ruby takes double the amount of RAM it should

I was trying things out on my home server and wrote a little Ruby program that fills up the RAM by a given amount. But I actually have to halve the number of bytes I want to put into the RAM. Am I missing something here, or is this a bug?
Here the code:
class RAM
  def initialize
    @b = ''
  end

  def fill_ram(size)
    puts 'Choose if you want to set the size in bytes, megabytes or gigabytes.'
    answer = ''
    valid = ['bytes', 'megabytes', 'gigabytes']
    until valid.include?(answer)
      answer = gets.chomp.downcase
      if answer == 'bytes'
        size = size * 0.5
      elsif answer == 'megabytes'
        size = size * 1024 * 1024 * 0.5
      elsif answer == 'gigabytes'
        size = size * 1024 * 1024 * 1024 * 0.5
      else
        puts 'Please choose between bytes, megabytes or gigabytes.'
      end
    end
    size1 = size
    if @b.bytesize != 0
      size1 = size + @b.bytesize
    end
    until @b.bytesize == size1
      @b << '0' * size
    end
    size = 0
  end

  def clear_ram
    exit
  end

  def read_ram
    puts 'At the moment this program fills ' + @b.bytesize.to_s + ' bytes of RAM'
  end
end
Just imagine the * 0.5 on each line weren't there.
I tested it in IRB: I created a new RAM object and filled it with 1000 megabytes of data. It actually filled the RAM with 2000 megabytes, so I added the * 0.5 to each line, but that can't be the real solution.
When I run it I get:
Choose if you want to set the size in bytes, megabytes or gigabytes.
bytes
At the moment this program fills 512 bytes of RAM
I think the problem is a missing check for the encoding.
I ran my test in US-ASCII (one character = 1 byte).
If you run it in UTF-16, you have an explanation for your problem.
Can you run the following code to check your encoding:

p Encoding.default_internal
p Encoding.default_external
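To illustrate the point, here is a quick sketch you can paste into IRB: the same character occupies a different number of bytes depending on the string's encoding, so the characters appended and the bytesize measured can diverge:

one = '0'.encode(Encoding::US_ASCII)
two = '0'.encode(Encoding::UTF_16LE)

p one.bytesize  # => 1
p two.bytesize  # => 2 - the same single character, twice the bytes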
After reading the comment:
The result of your script depends on the parameter passed to RAM.fill_ram. How do you start your script, and how often do you call RAM.fill_ram?
Please provide the full code.
I called my example with
r = RAM.new
r.fill_ram(1024)
r.read_ram

fastest way to create huge files?

I need to create a huge file filled with anything. I'm doing it this way, but it takes too long:
exit 1 unless ARGV.length > 0
File.open("file-#{ARGV[0]}M.txt", 'w') do |f|
  (ARGV[0].to_i * 1048576).times { f.write(1) }
end
What's the best way of doing that (in a platform-independent way)?
On *nix, use dd:

system("dd if=/dev/zero of=#{f} bs=1 count=0 seek=#{ARGV[0]}M")

If you want some content (instead of zeros) in the file, use /dev/random for if instead of /dev/zero. If you want a non-sparse file, use bs=#{ARGV[0]}M count=1 and omit seek.
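A pure-Ruby way to get the same sparse-file effect, if a sparse file is acceptable (my own sketch, not from the answer above; whether the file is actually stored sparsely depends on the filesystem):

# Extend the file to the target size without writing any data.
# Most filesystems will not allocate blocks for the hole.
size_mb = ARGV[0].to_i
File.open("file-#{size_mb}M.txt", 'wb') do |f|
  f.truncate(size_mb * 1024 * 1024)
end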
Universal method:

# Create a 1M fill buffer
fills = '1' * 1048576
File.open("file-#{ARGV[0]}M.txt", 'w') do |f|
  ARGV[0].to_i.times { f.write(fills) }
end
It is similar to the one you have, but it writes 1 MB at a time. Writing one byte at a time creates a lot of seek-and-write overhead for the hard disk; writing 1 MB at a time is much faster. If you have an even faster hard drive, you can try increasing the 1M buffer to 16M.
A pure Ruby option:

n = ARGV[0] or exit 1
File.open("file-#{n}M.txt", 'w') do |f|
  contents = "x" * (1024 * 1024)
  n.to_i.times { f.write(contents) }
end
