Ruby Gems with Persistent Data

I want to create a Ruby application (not Rails). It is a console app which will need to persist some data. I'm using PStore as the database, and I want to deploy this application as a gem.
My question is: where does my data live?
Currently I've created a data directory as a sibling of the bin directory in a standard gem layout. I would therefore expect the gem to store its data "inside itself" after it gets deployed. But when I do a local gem install to test, I find that the data is being stored local to the project files, not somewhere inside the gems directory.
Of course it could be that I simply misunderstand what "rake install_gem" is doing. Also, I'm vaguely worried about whether, if I need to sudo to install the gem, it will actually be able to create the data file "inside itself" in the gem directory.
Can someone clarify this a little bit?
Thank you.
John Schank
#makevoid - thanks for the reply. Here is the entirety of my main script, in the /bin directory. (I added it to the main question because I'm not familiar with how to format content in a comment, and the pasted code looked awful.)
#!/usr/bin/env ruby
$LOAD_PATH.unshift File.dirname(__FILE__) + '/../lib'
require 'yaml/store'
require 'timesheet'

begin
  command_hash = TimesheetParser.parse
  store = YAML::Store.new("data/time_entries.yaml")
  tl = TimeLog.new(store)
  ts = Timesheet.new(tl)
  ts.process(command_hash)
rescue Exception => e
  raise if command_hash[:debug]
  puts e.message
end

On Linux there are two commonly used locations for storing variable data.
/home/user/.application
If every user needs its own storage, this is usually done in the user's home directory. The path for your storage in the user's home directory should be
ENV["HOME"] + "/." + $application_name
/var/lib/application
If all users share the storage, or the application is intended to be run by only one user (most daemons), /var is the right place to store all kinds of data:
/var/log for logs
/var/run for PIDs
/var/lock for lock files
/var/www for HTTP servers
/var/tmp for unimportant but persistent data
/var/lib for all other data
The path for your storage in /var should be
"/var/lib/" + $application_name
Make sure the permissions for this directory are such that you don't have to run your application as root.
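To make that concrete, here is a minimal Ruby sketch of the two conventions above; the application name, the file name, and the choice of per-user storage are assumptions made up for the example, not part of the original answer.

require 'fileutils'
require 'pstore'

app_name = "timesheet"   # hypothetical application name

# Per-user storage: a dot-directory under the user's home.
user_data_dir = File.join(ENV["HOME"], ".#{app_name}")

# Shared storage: a directory under /var/lib, which should be created at
# install time with permissions that don't force the app to run as root.
system_data_dir = File.join("/var/lib", app_name)

# For a per-user console app, the home-directory variant is the usual choice.
data_dir = user_data_dir
FileUtils.mkdir_p(data_dir)

store = PStore.new(File.join(data_dir, "time_entries.pstore"))
store.transaction { |s| s[:entries] ||= [] }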

You definitely don't want to store data inside the gem directory. The expected behaviour is that users can uninstall and reinstall gems without any problems. If you have data in your gem's installed directory, uninstalling the gem will destroy that data and piss off your users.
johannes has the right ideas for use on Linux. For a Mac the specific directories would be a little different. The same goes for Windows. You'll need to research what the right places are for each platform you want to target and have your code conditionally switch storage locations depending on what kind of host it runs on.
Don't forget to let users override your defaults. Giving them a way to do that will make them very happy :)
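As a rough illustration of both points (platform-dependent defaults plus a user override), here is a hedged Ruby sketch; the override variable name, the application name, and the per-platform paths are assumptions for the example, so verify the actual conventions of each platform you target.

require 'rbconfig'

# Returns a data directory, honoring a hypothetical override variable first.
def data_dir(app_name = "timesheet")
  override = ENV["TIMESHEET_DATA_DIR"]   # hypothetical override variable
  return File.expand_path(override) if override && !override.empty?

  case RbConfig::CONFIG['host_os']
  when /darwin/
    File.join(ENV["HOME"], "Library", "Application Support", app_name)
  when /mswin|mingw|cygwin/
    File.join(ENV["APPDATA"] || ENV["HOME"], app_name)
  else
    File.join(ENV["HOME"], ".#{app_name}")
  end
end

Note that ENV["HOME"] may not be set on Windows at all, so a real implementation would want a more careful fallback than this sketch.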

Related

Can I replace %USERPROFILE% and still get KNOWNFOLDERIDs from the registry?

We're developing an open source Python library that runs on Linux, macOS, and Windows, but we don't have much experience with or exposure to Windows on the developer team. The way we set up and run our test suite works fine under Linux and Mac, but is suboptimal on Windows.
Our tests set up a new directory in a temporary location, place a fake .gitconfig with relevant configurations inside it, and have the relevant HOME environment variables point to this location as the home directory in order to pick up the configurations during testing.
The code is shortened and can't be run, but hopefully illustrates the gist of what we do:
with make_tempfile(mkdir=True) as new_home:
    pass
for v, val in get_home_envvars(new_home).items():
    set_envvar(v, val)
if not os.path.exists(new_home):
    os.makedirs(new_home)
with open(os.path.join(new_home, '.gitconfig'), 'w') as f:
    f.write("""\
[user]
    name = Tester
    email = test@example.com
[more configs for testing]
    exc = 1
""")
where get_home_envvars() makes sure that the $HOME env variable points to the new, temporary test home. On Windows, since Python 3.8, os.path no longer queries the $HOME variable to determine a user's home, but USERPROFILE [1][2], so we've just overwritten this variable with the temporary test home:
def get_home_envvars(new_home):
    environ = os.environ
    out = {'HOME': new_home}
    if on_windows:
        # requires special handling, since it has a number of relevant variables
        # and also Python changed its behavior and started to respect USERPROFILE only
        # since python 3.8: https://bugs.python.org/issue36264
        out['USERPROFILE'] = new_home
        out['HOMEDRIVE'], out['HOMEPATH'] = splitdrive(new_home)
    return {v: val for v, val in out.items() if v in os.environ}
However, we have now discovered that this breaks our test setup on Windows, with tests "bleeding" their caches, cookie databases, etc. into the places where we perform our unit tests, and with this creating files and directories that break our test assumptions.
I have a very limited understanding of what exactly happens, but my current hypothesis is this: our library determines the appropriate locations for caches, logs, cookies, etc. upon start by using appdirs [3], which does so by querying the "special folder" IDs/CSIDLs that Windows has [4]. This information is determined from the Windows registry, which is found based on USERPROFILE. To quote one specific reply in the Python bug tracker to this change:
This is unfortunate. Modifying USERPROFILE is highly unusual. USERPROFILE is the location of the user's "NTUSER.DAT" registry hive and local application data ("AppData\Local"), including "UsrClass.dat" (the "Software\Classes" registry hive). It's also the default location for a user's known shell folders and home directory. Modifying USERPROFILE shouldn't cause problems with any of this, but I'm not completely at ease with it.
After our test suite setup is done, we start new processes that run our tests. The new processes only get to see the new USERPROFILE, and appdirs returns the paths it finds after sending them through normpath, which unfortunately interprets the empty string returned by _get_win_folder for a CSIDL that can no longer be found as a relative path (.):
# snippet from appdirs source code
path = os.path.normpath(_get_win_folder("CSIDL_COMMON_APPDATA"))
And based on this, we end up configuring the current working directory of each test as the place for user data, user caches, etc.
My question is: how could I fix this? Based on my probably incomplete understanding, I currently think it ultimately boils down to the question of how to treat or mock USERPROFILE. I need it to point to a registry in order to derive the "special folder" IDs (be it with appdirs or a more modern replacement of it), but I also need it to point to the fake home with the test-specific Git configurations. I believe the latter requires overwriting USERPROFILE on Python 3.8 and newer. I'm wondering if there is a way to copy or mock the registry and place it under the new home? Set the relevant CSIDLs/KNOWNFOLDERIDs in some other way? Hardcode other temporary locations to use as cache directories etc.? Or maybe there is a more clever way to run a test suite under Windows that does not require a fake home?
I would be very grateful to learn from more experienced Windows developers what to do, or also what not to do. Many thanks in advance.
[1] https://docs.python.org/3.11/library/os.path.html#os.path.expanduser
[2] https://bugs.python.org/issue36264
[3] https://github.com/ActiveState/appdirs
[4] https://learn.microsoft.com/en-us/windows/win32/shell/csidl

Simple Local Database Solution for Ruby?

I'm attempting to write a simple Ruby/Nokogiri scraper to get event information from multiple pages and then output it to a CSV that is attached to an email sent out weekly.
I have completed the scraping components and the CSV component and it's working perfectly. However, I now realize that I need to know when new events are added, which means I need some sort of database. Ideally I would just store this locally.
I've dabbled a bit with using the ruby gem 'sequel', but the data does not seem to persist beyond the running of the program. Do I need to download some database software to work with 'sequel'? Also I'm not using the Rails framework, just Ruby.
Any and all guidance is deeply appreciated!
I'm guessing you did Sequel.sqlite, as in the first example in the Sequel README, which creates an in-memory SQLite database. To create a database in your filesystem instead of memory, just pass it a path, e.g.:
Sequel.sqlite("./my-database.db")
This is, of course, assuming that you have the sqlite3 gem installed. If the given file doesn't exist, it will be created.
This is covered in the Sequel docs.
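For instance, a minimal file-backed setup might look like the following sketch; the file name, table name, and columns are made up for the example.

require 'date'
require 'sequel'

# Opens (or creates) an SQLite database file on disk.
DB = Sequel.sqlite("./events.db")

# Create the table only if it doesn't already exist.
DB.create_table? :events do
  primary_key :id
  String :title
  Date :starts_on
end

events = DB[:events]

# Insert only events we haven't recorded yet, so re-runs can spot new ones.
if events.where(title: "Example event").count.zero?
  events.insert(title: "Example event", starts_on: Date.today)
end

Because the data lives in ./events.db, it persists across runs of the script.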

Ruby gem CLI tool, how should I save user settings?

I am currently making a CLI tool gem that downloads files from some service and saves them to a specified folder.
I was wondering, what would be the best way to store user settings for it?
For example, the folder to download the files to, the api access token and secret, that kind of thing.
I wouldn't want to ask a user for that input on every run.
I would read the API stuff from environment variables. Let the users decide if they want to enter it every time or set the variable in a .bashrc or .bash_profile file.
And I would ask for the download folder every time.
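A rough sketch of that approach; the variable names and the prompt are assumptions made up for the example.

# Hypothetical environment variable names.
api_token  = ENV["MYTOOL_API_TOKEN"]
api_secret = ENV["MYTOOL_API_SECRET"]

if api_token.nil? || api_secret.nil?
  abort "Please set MYTOOL_API_TOKEN and MYTOOL_API_SECRET (e.g. in ~/.bashrc)."
end

# Ask for the download folder on every run, defaulting to the current directory.
print "Download folder [#{Dir.pwd}]: "
answer = $stdin.gets.to_s.strip
download_dir = answer.empty? ? Dir.pwd : File.expand_path(answer)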

Where should a gem store log files?

I am building a ruby gem that should output a logfile. Where is it a good practice to store log files?
I am extracting this functionality from a Rails website I am building, and there I could simply log in the log/ directory.
Ideally, make the path configurable (.rc file, switch, rails/rack config, whatever).
If it's a Rack middleware, add the possibility to specify it in the constructor's arguments.
If no log path is provided, fallback to detecting a log directory. (I vaguely remember it being config.paths['log'] in Rails, but be sure that config actually points to something before using that in your gem if it can be used outside of Rails.)
And if all else fails, log to nowhere...
Also, allow logging to be disabled if you enable it by default. Not everyone wants logs.
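A hedged sketch of that fallback chain using Ruby's standard Logger; the configuration option, the Rails detection via Rails.root, and the log file name are assumptions for illustration, not taken from the answer.

require 'logger'

# Build a logger from an explicitly configured path, a detected Rails log
# directory, or nowhere (File::NULL) if neither is available.
def build_logger(configured_path = nil)
  path =
    if configured_path
      configured_path
    elsif defined?(Rails) && Rails.respond_to?(:root) && Rails.root
      Rails.root.join("log", "my_gem.log").to_s   # hypothetical file name
    else
      File::NULL   # log to nowhere
    end

  Logger.new(path)
end

Disabling logging then simply means passing no path outside of Rails, or passing File::NULL explicitly.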

WindowsAzure: Is it possible to set directory permissions within the web.config?

A PHP script of mine wants to write into a log folder; the resulting error is:
Unable to open the log file "E:\approot\framework\log/dev.log" for writing.
When I set the write permissions for the WebRole user RD001... manually, it works fine.
Now I want to set the folder permissions automatically. Is there an easy way to get it done?
Please note that I'm very new to IIS and the stuff around, I would appreciate precise answers, thx.
Short/Technical Response:
You could probably set permissions on a particular folder using full trust and a startup task. However, you'd need to account for a stateless OS and changing drive letters (possible, though not likely) in this script, which would make it difficult. Also, local storage is not persisted, so you'd have no way to ensure this data survived a reboot.
Recommendation: Don't write local, read below ...
EDIT: Got to thinking about this, and while I still recommend against it, there is a third option: you can allocate local storage in the service config and then access it from PHP using a DLL reference, which will give you access to that folder. Please remember local storage is not persisted, so it's gone after a reboot.
Service Config for local:
http://blogs.mscommunity.net/blogs/dadamec/archive/2008/12/11/azure-reading-and-writing-with-localstorage.aspx
Accessing config from php:
http://phpazure.codeplex.com/discussions/64334?ProjectName=phpazure
Long / Detailed Response:
In Azure, you really are encouraged to approach things as a platform and not as "software on a server". What I mean there is that ideas such as "write something to a local log file" are somewhat incompatible with the cloud "idea". Depending on your usage, you could (and should) convert this script to output this data to some cloud-based or external storage, vs just placing it on the disk.
I would suggest modifying this script to leverage the PHP Azure SDK and write these log entries out to table or blob storage in Azure. If this sounds good, please provide the PHP and I can give an exact example.
The main reason for that (besides pushing the cloud idea) is that in Azure, you cannot assume the host machine ("role instance") will maintain an OS state, so while you can set some things such as folder permissions, you can't rely on them sticking that way. You have no real way to guarantee those permissions won't be reset when the fabric has to update your role and react to some lower level problem. For example, a hard-drive cage on the rack where your current instance lives could fail. If the failure were bad enough, the Fabric controller would need to rebuild your instance. When that happens, your code is moved to an entirely different server, so the need would arise to re-set those permissions. Also, depending on the changes, the E:\ could all of a sudden need to be the F:\ or X:\ drive and you wouldn't know.
It's much better to pretend (at some level) that your application is running "in Azure" and not "on a server in Azure", so you make no assumptions about the hosting environment. Anything you need outside of your code (data, logs, audits, etc.) should be stored somewhere you can control (Azure Storage, an external call-out, etc.).
