I have a file on the order of a few hundred MB to compress. I don't need to read through the file myself, so I am free either to shell out to the system or to use Zlib as explained in this SO question.
I am inclined towards system, because then my Ruby process doesn't have to read the file and bloat its memory; I can just run the well-known gzip command through system. I also get the exit status, so I know how it went.
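For concreteness, the system route I have in mind is roughly this (a minimal sketch; the path is just a placeholder):

path = "/data/big_file.log"

# gzip replaces big_file.log with big_file.log.gz in place.
# system returns true on a zero exit status, false otherwise,
# and $?.exitstatus holds the raw code.
if system("gzip", path)
  puts "compressed #{path}"
else
  abort "gzip failed with status #{$?.exitstatus}"
end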
Anything I am missing? Is there a best practice around this? Any pitfalls?
If you use the system command you can't intervene in the compression. You won't be able to redirect the compressed output to a socket, provide an external progress bar, combine it into a custom tar archive, etc. These things can matter when compressing large files.
Please look at the following example using ruby-zstds (zstd generally compresses both better and faster than gzip today).
require "socket"
require "zstds"
require "minitar"
TCPSocket.open "google.com", 80 do |socket|
writer = ZSTDS::Stream::Writer.new socket
begin
Minitar::Writer.open writer do |tar|
tar.add_file_simple "file.txt" do |tar_writer|
File.open "file.txt", "r" do |file|
tar_writer.write(file.read(512)) until file.eof?
end
end
tar.add_file_simple "file2.txt" ...
end
ensure
writer.close
end
end
We read file.txt in a streaming way, add it to the tar archive, and send compressed portions to google immediately. We never need to store a compressed file on disk.
Context:
I'm on Linux. I am writing a disk usage extension for Sensu. Extensions must be non-blocking, because they are included in the agent's main code loop. They also must be very lightweight, because they may be triggered as often as once every 10 seconds, or even down to once per second.
So I cannot spawn a new executable to gather disk usage information. From within Ruby, I can only do stuff like File.open() on /proc and /sys and so on, read the content, parse it, file.close(), then print the result. Repeat.
I've found the sys-filesystem gem, which appears to have everything I need. But I'd rather not force extensions to depend on gems, if it can be avoided. I'll use the gem if it turns out to be the best way, but is there a good alternative? Something that doesn't require a ton of coding?
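For reference, the gem route would look roughly like this (a minimal sketch based on sys-filesystem's documented API; the mount point is a placeholder):

require 'sys/filesystem'

# Stat a mount point; "/" is just a placeholder.
stat = Sys::Filesystem.stat('/')

# Total/free space in MB, computed from the block counts.
mb_total = stat.blocks * stat.block_size / 1024 / 1024
mb_free  = stat.blocks_available * stat.block_size / 1024 / 1024

puts "#{mb_free} MB free of #{mb_total} MB"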
The information can be accessed via the statfs system call:
http://man7.org/linux/man-pages/man2/statfs.2.html
There is a Ruby interface to this documented here:
http://ruby-doc.org/core-trunk/File/Statfs.html
While I am developing my application I need to run a lot of math over and over again, tweaking it, re-running it, and observing the results.
The math is done on arrays loaded from large files (many megabytes). Not huge, but the problem is that each time I run my script it first has to load the files into arrays, which takes a long time.
I was wondering if there is anything external that works like an array, in that I know the location of the data and can fetch it directly, without having to reload everything each run.
I don't know much about databases, except that they don't seem to work the way I need: they aren't ordered and appear to have to search through everything. Perhaps an in-memory database is still a possibility?
If anyone has a solution it would be great to hear it.
Side question: isn't it also possible to have user-entered scripts that my Ruby program runs, so the main Ruby program can keep running indefinitely? I still don't know anything about user-entered options and how that would work, though.
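On that side question, one possible shape for it (a rough sketch; the loader and the calc.rb convention are placeholders, not from the question):

# main.rb -- long-running process: load the big arrays once,
# then re-run a user-edited script against them on demand.
DATA_ARRAYS = load_arrays_somehow   # hypothetical loader, run only once

loop do
  print "press Enter to re-run calc.rb (or type 'quit'): "
  break if gets.strip == 'quit'

  begin
    load 'calc.rb'   # calc.rb reads DATA_ARRAYS and prints its results
  rescue StandardError, SyntaxError => e
    puts "calc.rb failed: #{e.class}: #{e.message}"
  end
end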
Use Marshal:
# save an array to a file (binary mode, since Marshal output is binary)
File.open('array', 'wb') { |f| f.write Marshal.dump(my_array) }

# load the array back from the file
my_array = File.open('array', 'rb') { |f| Marshal.load(f.read) }
Your OS will keep the file cached between saves and loads, even between runs of separate processes using the data.
So let's say pretty often a script runs that opens a browser and does web things:
require 'watir-webdriver'
$browser = Watir::Browser.new(:firefox, :profile => "botmode")
=> #<Watir::Browser:0x7fc97b06f558 url="about:blank" title="about:blank">
It could end gracefully with a $browser.close, or it could crash sooner and leave behind a memory-hungry Firefox process, unnoticed until these accumulate and slow the server to a crawl.
My question is twofold:
What is a good practice to ensure that, even if the script fails anywhere and exits immediately with an error, the subprocess always gets cleaned up? (I already have lots of short begin-rescue-end blocks peppered around for other, unrelated small tests.)
More importantly, can I simply remember this Watir::Browser:0x7fc97b06f558 object address or PID somehow and re-assign it to another $browser variable in a whole new Ruby process, for example irb? I.e. can an orphaned browser on webdriver be re-attached later in another program using watir-webdriver on the same machine? From irb I could then get in and re-attach to the browser left behind by the crashed Ruby script, to examine the website it was on, check what went wrong, what elements are different than expected, etc.
Another hugely advantageous use of the latter would be to avoid the overhead of potentially hundreds of browser startups and shutdowns per day; it would be best to keep one alive as a sort of daemon. The first run would attempt to reuse a previous browser object using my specially prepared botmode profile, and otherwise create one. Then I would deliberately not call $browser.close at the end of my script. If nothing else, I run an at job at the end of the day to kill the Xvfb :99 display that Firefox runs inside (giving Firefox no choice but to die with it, if it is still running). Yes, I am aware of the standalone Selenium jar, but I am trying to avoid that Java service footprint too.
Apologies if this is more of a basic Ruby question; I just wasn't sure how to phrase it and keep getting irrelevant search results.
I don't think you can just remember the variable from another process. But one solution might be to create a master process and run your script in a loop in a thread, periodically checking the browser's running state. I'm using something similar in my acceptance tests with Cucumber + Watir. It would look something like this:
require 'rubygems'
require 'firewatir' # or watir

@browser = FireWatir::Firefox.new

t = Thread.new do
  @browser.goto "http://google.com"
  # call more browser actions here
end

while not_exit?
  if t.stop?
    # error occurred in thread, restart or exit
  end
  unless browser_live?
    # browser was killed for some reason, restart or exit
  end
end

@browser.close
not_exit? can be implemented with a trap handler for Ctrl+C.
browser_live? can check whether a Firefox process is still running by looking at the process listing (a sketch follows below).
It is quite tricky, but it might work for you.
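For example, a minimal browser_live? could shell out to pgrep (this helper is my own assumption, not part of the original answer):

# Returns true if at least one process whose command line matches
# "firefox" exists; system returns true when pgrep exits with status 0.
def browser_live?
  system('pgrep', '-f', 'firefox', out: File::NULL)
end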
You can use DRb like this:
The browser pool process:
require 'drb'
require 'watir'
browser = Watir::Browser.new :chrome
DRb.start_service 'druby://127.0.0.1:9395', browser
gets
and then, from the test script, use this browser:
require 'drb'
browser = DRbObject.new_with_uri 'druby://127.0.0.1:9395'
browser.goto 'stackoverflow.com'
I'm pretty sure that at the point Ruby exits, any handles or pointers to something like a browser object become invalid, so reusing one in a later Ruby process is likely not a good approach. In addition, I might be wrong on this, but WebDriver does not seem very good at connecting to an already running browser process. For your approach to work, everything would really need to be wrapped by some master process that calls all the tests, and wait a second, that's starting to sound like a framework, which you might already be using (or perhaps should be) in the first place.
So a better solution is probably to look at whatever framework you are using to run your tests and investigate its capability for 'setup/teardown' actions (which go by different names) that run before and after each test, a group of tests, or all tests. Going this way is good because most frameworks are designed to let you run any single test, or any set of tests, that you want. And if your tests are well designed, they can be run singly without expecting the system to have been left in some perfect state by a prior test; these setup/teardown actions are designed to support exactly that.
As an example, Cucumber has this at the feature level with the idea of a 'background', which is basically intended as a way to DRY out scenarios by defining common steps to run before each scenario in a feature file (such as navigating to and logging into your site). This could include a call to a series of steps that check whether a browser object exists and, if not, create one. However, you'd need to put that in every feature file, which starts to become rather non-DRY.
Fortunately, Cucumber also provides a way to do this in one place via Hooks. You can define hooks to run before steps, in the event of specific conditions, 'before' and 'after' each scenario, as well as code that runs once before any scenario, and code defined to run 'at_exit', where you could close the browser after all scenarios have run.
If I were using Cucumber, I'd look at some code in env.rb that runs at the start to create a browser, complemented by at_exit code to close it. Then perhaps also code in a Before hook that checks that the browser is still there and re-creates it if needed, and maybe logout actions in an After hook. Leave things like logging in to the individual scenarios, or to a Background block if all scenarios in a feature log in as the same sort of user.
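To make that concrete, a rough sketch of such an env.rb (the hook names are Cucumber's, but the browser-recovery logic here is just an illustration, not code from the answer):

# features/support/env.rb -- illustrative sketch only
require 'watir-webdriver'

def ensure_browser
  # (re)create the shared browser if it is missing or its window is gone
  unless $browser && $browser.exists?
    $browser = Watir::Browser.new :firefox
  end
  $browser
end

ensure_browser # create the browser once, before any scenario runs

at_exit do
  $browser.close if $browser && $browser.exists?
end

Before do
  ensure_browser # recover if the browser died during an earlier scenario
end

After do
  # per-scenario cleanup, e.g. log out or clear cookies
end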
Not so much a solution but a workaround for part 1 of my question, using pkill. Posting here since it turned out to be a lot less trivial than I had hoped.
After the Ruby script exits, its spawned processes (which may no longer belong to the same PID tree at all, like firefox-bin) share a predictable "session leader", which in my case turned out to be the parent of the bash shell that called rubyprogram.rb. That parent is available as $PPID in Bash, for when you have to go higher than $$.
Thus to really clean up unwanted heavyweight processes eg. after a ruby crash:
#!/bin/bash
# This is the script that wraps on top of Ruby scripts
./ruby_program_using_watirwebdriver_browser.rb myparams & # spawn ruby in background but keep going below:
sleep 11 # give Ruby a chance to launch its web browser
pstree -panu $$ # prints out a process tree starting under Bash, the parent of Ruby. Firefox may not show!
wait # now wait for Ruby to exit or crash
pkill -s $PPID firefox-bin # should only kill the firefox-bin processes started above, not elsewhere on the system
# Another way without pkill, will also print out what's getting killed if anything:
awk '$7=="firefox-bin" && $3=="'$PPID'" {print $1}' <(ps x -o pid,pgid,sess,ppid,tty,time,comm) | xargs -rt kill
OPTIONAL
And since I use a dedicated Xvfb Xwindows server just for webdriving on DISPLAY :99, I can also count on xkill:
timeout 1s xwininfo -display :99 -root -all |awk '/("Navigator" "Firefox")/ {print $1}' |xargs -rt xkill -display :99 -id
# the timeout is in case xkill decides to wait for user action, when window id was missing
Just an update on part 2 of my question.
It seems one CAN serialize a Watir::Browser object with YAML, and because it's text-based the contents were quite interesting to me (e.g. some things I've only dreamed of tweaking, hidden inside private elements of private classes... but that's a separate topic).
Deserializing from YAML is still trouble. While I haven't tested beyond the first try, it gives me some kind of regexp parse error; I'm not sure what that's about.
(More on that at how to serialize an object using TCPServer inside? )
Meanwhile, even attempting to serialize with Marshal, which is also built into Ruby but stores in a binary format, results in a very reasonable-sounding error about not being able to dump a TCPServer object (apparently contained within the Watir::Browser pointed to by $browser).
All in all I'm not surprised by these results, but I'm still pretty confident there is a way, until Watir arrives at something more native (like PersistentWebdriver, or how it used to be in the days of jssh when you could simply attach to an already running browser with the right extension).
Until then, if serialization and deserialization back to a working object gets too thorny, I'll resort to daemonizing a portion of my Ruby to keep objects persistent and spare the frequent and costly setups and teardowns. I did take a look at some established (unit testing) frameworks, but none seem to fit well within my overall software structure; I'm not web testing, after all.
I have a script that looks up the GeoIP locations of various IPs. It runs daily, and I expect around 50,000 IPs to look up.
I have a GeoIP system set up; I just would like to avoid having to run wget 50,000 times per report.
What I was thinking is that there must be some way to have wget open a connection to the URL and then pass it the IPs, so that it doesn't have to re-establish the connection each time.
Any help will be much appreciated.
If you give wget several addresses at once, with consecutive addresses belonging to the same HTTP/1.1 (Connection: keep-alive) supporting server, wget will re-use the already-established connection.
If there are too many addresses to list on the command line, you can write them to a file and use the -i/--input-file= option (and, per UNIX tradition, -i-/--input-file=- reads standard input).
There is, however, no way to preserve a connection across different wget invocations.
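If you would rather stay inside the Ruby script, Net::HTTP from the standard library gives the same effect: every request made inside one Net::HTTP.start block reuses a single keep-alive connection (the host, URL scheme, and input file below are placeholders, not from the question):

require 'net/http'

ips = File.readlines('ips.txt').map(&:chomp)   # placeholder input file

# One TCP connection is opened and reused for every request in the block.
Net::HTTP.start('geoip.example.com', 80) do |http|
  ips.each do |ip|
    response = http.get("/lookup?ip=#{ip}")    # placeholder URL scheme
    puts "#{ip}: #{response.body}"
  end
end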
You could also write a threaded Ruby script to run wget on multiple input files simultaneously to speed the process up. So if you have 5 files containing 10,000 addresses each, you could use this script:
#!/usr/bin/ruby
# Start one wget per input file; each thread downloads one list of addresses.
threads = []

for file in ARGV
  threads << Thread.new(file) do |filename|
    system("wget -i #{filename}")
  end
end

threads.each { |thrd| thrd.join }
Each of these threads would use one connection to download all of the addresses in its file, so the following command needs only 5 connections to the server to fetch all 50,000 addresses:
./fetch.rb "list1.txt" "list2.txt" "list3.txt" "list4.txt" "list5.txt"
You could also write a small program (in Java or C or whatever) that sends the list of IPs in a single POST request, with the server returning an object with data about all of them. That shouldn't be too slow either.