I've written a filter and use its register-function to load an external CSV-file and fill a bunch of hash-tables. The filter-function then accesses the hash-tables and adds fields to the event. While that's working nicely, the downside is that it only loads once and I'd need to restart logstash to trigger the reload after a change in the CSV-file. Maybe I should add that the filter is currently consuming events coming from three different file inputs.
Writing an input doesn't seem to solve it as the input is not tied to the filter in some way. Therefore, my plan is to somehow reload the CSV-file every few hours or at a particular time and somehow block the entire filter during that, i.e. pause incoming events. That sounds like a weird thing to do and I'm not sure whether or not logstash is actually meant to be used like this.
I'm a newbie regarding Ruby and actually I'm quite amazed that the filter is working this nice. As Google let me down on the entire issue I'm hoping that anyone on here has experience with this, can post a link to an example or can point me to another way of solving this.
For educational purposes I looked into the source of logstash and noticed that I could actually understand what's going on and things are much less complicated than I had thought.
There is a function filterworker in pipeline.rb and a class filterworker and I don't know which one is actually used, but my findings seem to be true for both.
Basically all filters seem to run in one thread in case it's not configured otherwise. This means that I can reload the file anywhere in the filter-function and the entire processing for all filters is paused (input and output might still do something, but that's handled by the queue for the events holding maximum 20 entries).
Therefore, this seems to do it for me:
public
def register
#config_files_read_timestamps = {}
read_config_files
end # def register
def filter(event)
# return nothing unless there's an actual filter event
return unless filter?(event)
read_config_files
:
# filter_matched should go in the last line of our successful code
filter_matched(event)
end # def filter
private
def read_config_files
read_marker_file
:
end
def check_for_changed_file?(filename)
mtime = File.mtime(filename)
#config_files_read_timestamps[filename] ||= Time.at(0)
if #config_files_read_timestamps[filename] < mtime
#config_files_read_timestamps[filename] = mtime
return true
end
end
def read_marker_file
if !check_for_changed_file?("markers.txt")
return
end
:
end
Obviously I don't need a separate thread for the parsing. It would become necessary if I plan to start the reload at a specific time. In that case I'd have to join the thread and then continue with event handling.
Let me know if there could be improvements...
Is there a way to capture and store (or write to a file) the values returned in the Response? (Checkpoint values)
Using HP UFT 11.52
Thanks,
Lynn
I figured it out. In UFT API under Standard Activities, there are File function modules including "Write to File". I added the module to the test, set the path and other properties, passed the variable to the file and it worked! Couldn't be easier.
I mentioned this on my other answer , you can also write it programatically if you have dynamic array response please refer below:
https://stackoverflow.com/a/28012383/3972994
After running a test, in the test folder, you can find a Snapshots/LastIteration directory.
In it you can find the return value for each step saved in a txt file.
Pay attention that if you data drive the step, only the last iteration will be saved to file.
However, in the Test's log (Test dir/Log/vtd_user.log) you can find all the iterations persisted
Thanks,
Yossi
You do not need to use the standard activities if you do this
var iResponse = this.Activity.responsebody;
System.IO.File.WriteLines(#"directorypath&FileName);
the above will write the response to the file and rewrite it for every run
I'm trying to use the method recreate_versions! but I'm using the method from the wiki to create unique filenames. The problem is that when I run recreate_versions! it changes the filenames but it doesn't update them on the mounted object itself. How could I refresh these URL's?
A solution that works when dealing with caching is to save the mounted object after recreating versions:
Example:
avatar.image.recreate_versions!
avatar.save!
This way you can keep using unique filenames even when recreating versions and properly handle caching.
Here is what worked for me. It uses the filename if it already exists. So they don't change when you recreate_versions!
def filename
if original_filename
if model && model.read_attribute(:avatar).present? #or whatever you call your column
model.read_attribute(:avatar)
else
# create new filename however you're doing it
end
end
end
Assuming I have a directory structure similar to:
path_to_file/one/index.html
How can I set my sinatra app to be routed to
mysite.com/path_to_file/one/
and have the previously mentioned file to render? path_to_file will always stay the same, but there will be different folders (two, three, etc.) inside it.
I've tried the following:
get '/path_to_file/:number' do
File.read(File.join('path_to_file', "#{params[:number]}", "index.html"))
end
but then the e.g. javascript file linked from index.html doesn't render correctly.
Got it!
get '/path_to_file/:number/:file' do
File.read(File.join('path_to_file', "#{params[:number]}", "#{params[:file]}"))
end
get '/path_to_file/:number' do
File.read(File.join('path_to_file', "#{params[:number]}", "index.html"))
end
Order is important, since if these two methods are reversed, get '/path_to_file/:number' becomes a superset of get '/path_to_file/:number/:file'.
Just a thought, but you could setup your server software, Apache, nginx, whatever it is you're using, to serve .css and .js and image files from a different location.
I have a Mechanize based Ruby script to scrape a website. I am hoping to speed it up by caching the downloaded HTML pages locally to make the whole "tweak output -> run -> tweak output" cycle quicker. I would prefer not to have to install an external cache on the machine just for this script. The ideal solution would plugin to Mechanize and transparently cache fetched pages, images and so on.
Anyone know of a library that will do this? Or another way of achieving the same outcome (script runs much quicker second time round)?
A good way of doing this type of thing is to use the (AWESOME) VCR gem.
Here's an example of how you would do it:
require 'vcr'
require 'mechanize'
# Setup VCR's configs. The cassette library directory is where
# all of your "recordings" are saved as YAML files.
VCR.configure do |c|
c.cassette_library_dir = 'vcr_cassettes'
c.hook_into :webmock
end
# Make a request...
# The first time you do this it will actually make the call out
# Subsequent calls will read the cassette file instead of hitting the network
VCR.use_cassette('google_homepage') do
a = Mechanize.new
a.get('http://google.com/')
end
As you can see... VCR records the communication as a YAML file on the first run:
mario$ find tester -mindepth 1 -maxdepth 3
tester/vcr_cassettes
tester/vcr_cassettes/google_homepage.yml
If you want to have VCR create new versions of the cassettes, just delete the corresponding file.
I'm not sure that caching the pages is going to help that much. What will help more is to have a record of previously visited URLs so you don't revisit them repeatedly. The page caching is moot because you should have already grabbed the important information when you saw the page the first time so all you need to do is check to see if you've seen it already. If you have, grab the summary information you care about and manipulate it as necessary.
I used to write analytical spiders using Perl's Mechanize. Ruby's Mechanize is based on it. Storing the previously visited URLs in SOME sort of cache was useful, like a hash, but, because apps crash or hosts go down mid-session, all the previous results would be gone. A real disk-based database was essential at that point.
I like Postgres, but even SQLite is a good choice. Whatever you use, get the important information on the drive where it can survive a restart or crash.
Something else I'd recommend, is use a YAML file for configuration of your app. Put every parameter that is likely to be changed during the app's run in there. Then, write the app so it periodically checks that file's modification time and reloads it if there's been a change. That way, you can adjust its run-time behavior on the fly. I had to write a spider to analyze a Fortune 50 corporation's multiple-websites several years ago. The app ran for three weeks spidering many different sites tied to that corporation, and because I could tweak the regex used to control which pages the app processed, I could fine tune it without shutting down that app.
If you store some information about the page after the first request, you can rebuild the page later without having to re-request it from the server.
# 1) store the page information
# uri: a URI instance
# response: a hash of response headers
# body: a string
# code: the HTTP response code
page = agent.get(url)
uri, response, body, code = [page.uri, page.response, page.body, page.code]
# 2) rebuild the page, given the stored information
page = Mechanize::Page.new(uri, response, body, code, agent)
I've used this technique in spiders/scrapers so that the code can be tweaked without having to re-request all the pages. e.g.:
# agent: a Mechanize instance
# storage: must respond to [] and []=, and must accept and return arbitrary ruby objects.
# for in-memory storage, you could use a Hash.
# or, you could write something that is backed by a filesystem, mongodb, riak, redis, s3, etc...
# logger: a Logger instance
class Foobar < Struct.new(:agent, :storage, :logger)
def get_cached(uri)
cache_key = "_cache/#{uri}"
if args = storage[cache_key]
logger.debug("getting (cached) #{uri}")
uri, response, body, code = args
page = Mechanize::Page.new(uri, response, body, code, agent)
agent.send(:add_to_history, page)
page
else
logger.debug("getting (UNCACHED) #{uri}")
page = agent.get(uri)
storage[cache_key] = [page.uri, page.response, page.body, page.code]
page
end
end
end
Which you could use like this:
require 'logger'
require 'pp'
require 'rubygems'
require 'mechanize'
storage = {}
foo = Foobar.new(Mechanize.new, storage, Logger.new(STDOUT))
foo.get_cached("http://ifconfig.me/ua")
foo.get_cached("http://ifconfig.me/ua")
foo.get_cached("http://ifconfig.me/ua")
foo.get_cached("http://ifconfig.me/encoding")
foo.get_cached("http://ifconfig.me/encoding")
pp storage
Which prints the following information:
D, [2013-10-19T14:13:32.019291 #18107] DEBUG -- : getting (UNCACHED) http://ifconfig.me/ua
D, [2013-10-19T14:13:36.375649 #18107] DEBUG -- : getting (cached) http://ifconfig.me/ua
D, [2013-10-19T14:13:36.376822 #18107] DEBUG -- : getting (cached) http://ifconfig.me/ua
D, [2013-10-19T14:13:36.376910 #18107] DEBUG -- : getting (UNCACHED) http://ifconfig.me/encoding
D, [2013-10-19T14:13:52.830416 #18107] DEBUG -- : getting (cached) http://ifconfig.me/encoding
{"_cache/http://ifconfig.me/ua"=>
[#<URI::HTTP:0x007fe4ac94d098 URL:http://ifconfig.me/ua>,
{"date"=>"Sat, 19 Oct 2013 19:13:33 GMT",
"server"=>"Apache",
"vary"=>"Accept-Encoding",
"content-encoding"=>"gzip",
"content-length"=>"87",
"connection"=>"close",
"content-type"=>"text/plain"},
"Mechanize/2.7.2 Ruby/2.0.0p247 (http://github.com/sparklemotion/mechanize/)\n",
"200"],
"_cache/http://ifconfig.me/encoding"=>
[#<URI::HTTP:0x007fe4ac99d2a0 URL:http://ifconfig.me/encoding>,
{"date"=>"Sat, 19 Oct 2013 19:13:48 GMT",
"server"=>"Apache",
"vary"=>"Accept-Encoding",
"content-encoding"=>"gzip",
"content-length"=>"42",
"connection"=>"close",
"content-type"=>"text/plain"},
"gzip,deflate,identity\n",
"200"]}
How about writing pages out to files, each page in an individual file, and separating the tweak and run cycles?