Can the Trains config file be specified dynamically or relative to the running script path?

Suppose I have a server where many users run different experiments, possibly with different Trains Servers.
I know about the TRAINS_CONFIG_FILE environment variable, but I wonder if this can be made more flexible in one of the following ways:
1. Specifying the Trains config file dynamically, i.e. at runtime of the training script?
2. Storing a config file in each of the training repos and specifying its path relative to the running script path (instead of relative to ~/)?

Disclaimer: I'm a member of Allegro Trains team
Loading of the configuration is done at import time. This means that if you set the OS environment variable before importing the package, you should be fine:
import os
os.environ['TRAINS_CONFIG_FILE'] = '~/repo/trains.conf'
from trains import Task
The configuration file is loaded based on the current working directory. This means that if you set os.environ['TRAINS_CONFIG_FILE'] = 'trains.conf', the trains.conf file will be loaded from the working directory at the time the import happens (usually the folder your script is executed from). This means you can keep the file as part of the repository and always set TRAINS_CONFIG_FILE to point to it.
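For example, a minimal sketch of option 2 (assuming a trains.conf committed next to the training script) that resolves the path relative to the script itself, so it works regardless of the working directory:
import os
# resolve the config relative to this script's location, not the CWD
os.environ['TRAINS_CONFIG_FILE'] = os.path.join(
    os.path.dirname(os.path.abspath(__file__)), 'trains.conf')
from trains import Task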
A few notes:
What is the use case for different configuration files?
Notice that when running with trains-agent, this method will override the configuration that the trains-agent passes to the code.

Related

Can I replace %USERPROFILE% and still get KNOWNFOLDERIDs from the registry?

We're developing an open source Python library that runs on Linux, macOS, and Windows, but we don't have much experience with or exposure to Windows in the developer team. The way we set up and run our test suite works fine under Linux and Mac, but is suboptimal on Windows.
Our tests set up a new directory in a temporary location, place a fake .gitconfig with relevant configurations inside it, and have the relevant HOME environment variables point to this location as the home directory in order to pick up the configurations during testing.
The code is shortened and can't be run, but hopefully illustrates the gist of what we do:
with make_tempfile(mkdir=True) as new_home:
    pass
for v, val in get_home_envvars(new_home).items():
    set_envvar(v, val)
if not os.path.exists(new_home):
    os.makedirs(new_home)
with open(os.path.join(new_home, '.gitconfig'), 'w') as f:
    f.write("""\
[user]
name = Tester
email = test@example.com
[more configs for testing]
exc = 1
""")
where get_home_envvars() makes sure that the $HOME env variable points to the new, temporary test home. On Windows, since Python 3.8, os.path no longer queries the $HOME variable to determine a user's home, but USERPROFILE [1][2], so we've simply overwritten this variable with the temporary test home:
def get_home_envvars(new_home):
    environ = os.environ
    out = {'HOME': new_home}
    if on_windows:
        # requires special handling, since it has a number of relevant variables
        # and also Python changed its behavior and started to respect USERPROFILE only
        # since python 3.8: https://bugs.python.org/issue36264
        out['USERPROFILE'] = new_home
        out['HOMEDRIVE'], out['HOMEPATH'] = splitdrive(new_home)
    return {v: val for v, val in out.items() if v in environ}
However, we have now discovered that this breaks our test setup on Windows, with tests "bleeding" their caches, cookie databases, etc. into the places where we perform our unit tests, thereby creating files and directories that break our test assumptions.
I have a very limited understanding of what exactly happens, but my current hypothesis is this: our library determines the appropriate locations for caches, logs, cookies, etc. upon start by using appdirs [3], which does so by querying the "special folder" IDs/CSIDLs that Windows has [4]. This information is kept in the Windows registry, which is found based on USERPROFILE. To quote one specific reply to this change in the Python bug tracker:
This is unfortunate. Modifying USERPROFILE is highly unusual. USERPROFILE is the location of the user's "NTUSER.DAT" registry hive and local application data ("AppData\Local"), including "UsrClass.dat" (the "Software\Classes" registry hive). It's also the default location for a user's known shell folders and home directory. Modifying USERPROFILE shouldn't cause problems with any of this, but I'm not completely at ease with it.
After our test suite setup is done, we start new processes that run our tests. The new processes only get to see the new USERPROFILE, and appdirs returns the paths it finds after sending them through normpath, which unfortunately interprets the empty string returned by _get_win_folder for a CSIDL that can no longer be found as a relative path (.):
# snippet from appdirs source code
path = os.path.normpath(_get_win_folder("CSIDL_COMMON_APPDATA"))
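The surprising part here is standard os.path behavior rather than anything appdirs-specific; a quick illustration:
import os
# an empty lookup result collapses to the relative path "."
print(os.path.normpath(""))  # prints: .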
And based on this, we end up configuring the current working directory of each test as the place for user data, user caches, etc.
My question is: how could I fix this? Based on my probably incomplete understanding, I currently think it ultimately boils down to the question of how to treat or mock USERPROFILE. I need to have it point to a registry in order to derive the "special folder" IDs (be it with appdirs or a more modern replacement) - but I also need it to point to the fake home with test-specific Git configurations. I believe the latter requires overwriting USERPROFILE on Python 3.8 and newer. I'm wondering: is there a way to copy or mock the registry and place it under the new home? Set the relevant CSIDLs/KNOWNFOLDERIDs in some other way? Hardcode other temporary locations to use as cache directories, etc.? Or maybe there is a more clever way to run a test suite under Windows that does not require a fake home?
I would be very grateful to learn from more experienced Windows developers what to do, and also what not to do. Many thanks in advance.
[1] https://docs.python.org/3.11/library/os.path.html#os.path.expanduser
[2] https://bugs.python.org/issue36264
[3] https://github.com/ActiveState/appdirs
[4] https://learn.microsoft.com/en-us/windows/win32/shell/csidl

How does a Gradle task explicitly mark itself as having altered its output, or as up to date, for tasks that depend on it?

I am creating a rather custom task that processes a number of input files and outputs a different number of output files.
I want to check the dates of the input files against the existing output files, and might also look at the content of the input files, to determine whether the task is up to date or needs to be invoked. What properties do I need to set, and where and when (in a doFirst, in the main action, or elsewhere), to give the dependency checker and task executor the right state so that dependent tasks are either forced to build or not, as appropriate?
Also, is there any documentation on standard library utilities for things like checking file dates and getting lists of files - utilities that are as easy to use as in Ruby's rake?
How do I specify the inputs and outputs of the task? Especially as the outputs will not be known until the source is parsed and the output directory is scanned for what already exists.
A sample that does this in a larger project with dependent tasks would be really nice :)
What properties do I need to set, and where and when (in a doFirst, in the main action, or elsewhere), to give the dependency checker and task executor the right state so that dependent tasks are either forced to build or not, as appropriate?
Ideally this should be done as a custom task type. None of this logic should be in any of the Gradle files at all. Either have the logic in a dedicated plugin project that gets published somewhere which you can then reference in the project, or have the logic in buildSrc.
What you are trying to develop is what is known as an incremental task: https://docs.gradle.org/current/userguide/custom_tasks.html#incremental_tasks
These are used heavily throughout Gradle which makes the incremental build of Gradle possible: https://docs.gradle.org/current/userguide/more_about_tasks.html#sec:up_to_date_checks
How do I specify the inputs and outputs of the task? Especially as the outputs will not be known until the source is parsed and the output directory is scanned for what already exists.
Once you have your tasks defined and whatever else you need, in your main Gradle files you would configure them as you would any other plugin or task.
The two links above should be enough to help get you started.
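To make that concrete, here is a minimal sketch of a custom task type in buildSrc (Groovy; the task name, property names, and paths are made up for illustration). Note that nothing is set in doFirst: you declare inputs and outputs, and Gradle's fingerprinting decides whether the task and its dependents run:
// buildSrc/src/main/groovy/ProcessSources.groovy
import org.gradle.api.DefaultTask
import org.gradle.api.file.ConfigurableFileCollection
import org.gradle.api.file.DirectoryProperty
import org.gradle.api.tasks.InputFiles
import org.gradle.api.tasks.OutputDirectory
import org.gradle.api.tasks.TaskAction

abstract class ProcessSources extends DefaultTask {
    // Gradle fingerprints these files; if none changed since the last
    // successful run, the task is reported UP-TO-DATE and skipped.
    @InputFiles
    abstract ConfigurableFileCollection getSources()

    // Declaring the whole directory as output covers the case where the
    // exact set of output files is only known after parsing the sources.
    @OutputDirectory
    abstract DirectoryProperty getOutputDir()

    @TaskAction
    void run() {
        File outDir = outputDir.get().asFile
        sources.each { File src ->
            new File(outDir, src.name + '.out').text = src.text.toUpperCase()
        }
    }
}

// build.gradle: wiring it up; dependents that consume outputDir as an
// input get correct ordering and up-to-date checking for free
tasks.register('processSources', ProcessSources) {
    sources.from fileTree('src/data')
    outputDir = layout.buildDirectory.dir('generated')
}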
As for a small example, I developed a Gradle plugin that generates files based on some input that is not known until it's configured. The 'custom' task type just extends the provided JavaExec. The custom task is Wsdl2java. Then, based on user configuration, tasks get registered using the input file from the user. Since I reused built-in task types, I know for sure that no extra work will be done, and I can rely on Gradle doing the heavy lifting. There's also a test to ensure that the configuration cache works as expected: ConfigurationCacheFunctionalTests.
As I mentioned earlier, the links above should be enough to get you started.

Detect if ZONEINFO fails in Go

In Go, you can specify a specific zoneinfo.zip file to be used by adding a ZONEINFO environment variable that points to the specific file you'd like to use for time zone information. This is great as it allows me to ensure that the versions of the IANA time zone database that I'm using on my front-end and my back-end are the same.
However, there does not appear to be any way to detect whether use of the specified zone info file has failed. Looking at the source code (https://golang.org/src/time/zoneinfo.go), it looks like any errors using the specified file fail quietly, and Go then proceeds to check the default OS locations or the default $GOROOT location to pull time zone information from there. This is not the behavior I would prefer, as I would like to know with certainty that I am using my specified version of the zone info.
I've thought of the following in terms of solutions, but I'm not happy with any of them.
I can check that the environment variable is set myself, but this is at best a partial solution as it doesn't tell me if the file is actually usable by LoadLocation.
I can ensure none of the backup locations for zone info exist. This seems a bit extreme and means that I have to be extremely careful about the environment the code is running in, in both dev and production settings.
Does anyone know of a more elegant way to ensure that I am using the zoneinfo.zip file specified by my ZONEINFO environment variable?
Update: To address this problem I took inspiration from @chuckx's answer below and put together a Go package that takes the guesswork out of which time zone database is being used. The readme includes instructions on how to get the correct version of the time zone database using a Go installation.
Maybe consider not relying on the environment variable?
If you're not averse to distributing the unzipped fileset, you can easily use LoadLocationFromTZData(name string, data []byte). The second argument is the contents of an individual timezone file.
For reference, the functionality for processing a zip file is found in the unexported function loadTzinfoFromZip().
Step-by-step approach extracted from @Slotherooo's comment:
1. Make a local version of time.loadTzinfoFromZip(zipfile, name string) ([]byte, error).
2. Use that method to extract the []byte for the desired location from a zoneinfo.zip file.
3. Use time.LoadLocationFromTZData() exclusively instead of time.LoadLocation().
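A minimal sketch of those three steps (the zip path, zone name, and the loadTZFromZip helper name are illustrative; the helper just mirrors what the stdlib's unexported loadTzinfoFromZip does):
package main

import (
	"archive/zip"
	"fmt"
	"io"
	"log"
	"time"
)

// loadTZFromZip reads one zone's data out of the given zoneinfo.zip and
// hands it to LoadLocationFromTZData, which returns an error instead of
// silently falling back to OS or $GOROOT locations.
func loadTZFromZip(zipPath, name string) (*time.Location, error) {
	r, err := zip.OpenReader(zipPath)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	for _, f := range r.File {
		if f.Name != name {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		defer rc.Close()
		data, err := io.ReadAll(rc)
		if err != nil {
			return nil, err
		}
		return time.LoadLocationFromTZData(name, data)
	}
	return nil, fmt.Errorf("zone %s not found in %s", name, zipPath)
}

func main() {
	loc, err := loadTZFromZip("zoneinfo.zip", "America/New_York")
	if err != nil {
		log.Fatal(err) // fails loudly, unlike time.LoadLocation's fallback
	}
	fmt.Println(time.Now().In(loc))
}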

Elasticsearch - how to store scripts in config/scripts directory

I'm trying to experiment with using scripts in the config/scripts directory. The Elasticsearch docs here say this:
Save the contents of the script as a file called config/scripts/my_script.groovy on every data node in the cluster:
This seems like it's probably really easy, but I'm afraid I don't understand how exactly to put a Groovy file "on every data node in the cluster". Would this normally be done through the command line somehow, or can it be done by manually moving the Groovy file (in Finder on OS X, for example)? I have a test index, but when I look at the file structure on the nodes I'm confused about where to put the Groovy file. Help, pretty please.
You just need to copy the file to each server running Elasticsearch. If you're just running Elasticsearch on your computer, go to the folder you've installed it into and copy the file into config/scripts there (you may have to create the folder first). It doesn't matter how the file gets there.
You should see an entry in the logs (or the console if you are running in the foreground) along the lines of
compiling script file [/path/to/elasticsearch/config/scripts/my_script.groovy]
This won't show up straight away - by default Elasticsearch checks for new/updated scripts every 60 seconds (you can change this with the watcher.interval setting).
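If you want faster feedback while experimenting, that interval can be lowered in elasticsearch.yml (the value here is just an example):
# elasticsearch.yml
watcher.interval: 5s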
Since file scripts are deprecated (elastic/elasticsearch#24552 & elastic/elasticsearch#24555), this approach is not going to work anymore.
The API is the only way.
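For reference, a minimal sketch of storing a script through the stored scripts API instead (the script id and body here are made up; on modern versions the language would be painless):
curl -X PUT "localhost:9200/_scripts/my_script" -H 'Content-Type: application/json' -d'
{
  "script": {
    "lang": "painless",
    "source": "doc[\"my_field\"].value * 2"
  }
}'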

Including a log4j.properties file in my jar, but different properties file at execution time (optionally)

I want to include a log4j.properties file in my Maven build, but be able to use a different properties file at execution time (using cron on Unix).
Any ideas?
You want to be able to change properties per environment.
There are a number of approaches to address this issue.
1. Create a directory in each environment containing the environment-specific files (log4j.properties in your example), and add that directory to the classpath in each environment.
2. Use the filtering ability plus the profile ability of Maven to populate log4j.properties with the correct values at build time.
3. Use a build server (Jenkins, for example), which essentially does point 2 for you.
Each of these approaches has its own drawbacks. I am currently using a somewhat weird combination of 2 & 3 because of Jenkins limitations.
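For the cron case specifically, a further option (assuming log4j 1.x): the log4j.configuration system property lets each invocation point at an external file, overriding whatever log4j.properties is bundled in the jar. The paths below are illustrative:
# crontab entry: run nightly with an environment-specific log4j config
0 2 * * * /usr/bin/java -Dlog4j.configuration=file:/etc/myapp/log4j.properties -jar /opt/myapp/app.jar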
