Dockerfile: COPY a list of files when the list is taken from a local file - bash

I've got a file containing a list of paths that I need to copy with Dockerfile's COPY command on docker build.
My use case is this: I've got a Python requirements.txt file which pulls in multiple other requirements files from around the project with -r PATH.
Now, I want to COPY all the requirements files alone, run pip install, and only then copy the rest of the project (for layer caching and such). So far I haven't managed to do so with the COPY command.
No need for help fetching the paths from the file - I've managed that. I just want to know whether it can be done at all, and if so, how.
Thanks!

Not possible in the sense that the COPY directive allows it out of the box; however, if you know the extensions, you can use a wildcard in the path, such as COPY folder*something*name somewhere/.
For simple requirements.txt fetching, that could be:
# but you need to distinguish the files somehow,
# otherwise they'll overwrite each other and only the last one is kept
# e.g. rename package/requirements.txt to package-requirements.txt
# and it won't be an issue
COPY */requirements.txt ./
RUN for item in requirement*; do pip install -r "$item"; done
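If the renaming is acceptable, a hypothetical pre-build step on the host could do it before docker build runs (assumes one requirements.txt per top-level package directory):
for f in */requirements.txt; do
  # package/requirements.txt -> package-requirements.txt
  cp "$f" "${f%/*}-requirements.txt"
done
COPY *-requirements.txt ./ then picks up one uniquely named file per package.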
But if it gets a bit more complex (as in collecting only specific files, by some custom pattern, etc.), then no. For that case, simply use templating: either a plain f-string or format() call, or switch to Jinja. Create a Dockerfile.tmpl (or whatever you'd want to name the template), collect the paths, insert them into the templated Dockerfile, dump the result to a file, and run docker build on it afterwards.
Example:
# Dockerfile.tmpl
FROM alpine
{{replace}}
# organize files into coherent structures so you don't have too many COPY directives
files = {
    "pattern1": [...],
    "pattern2": [...],
    # ...
}

# read the template
with open("Dockerfile.tmpl", "r") as file:
    text = file.read()

# build one COPY directive per group
insert = "\n".join(
    f"COPY {' '.join(values)} destination/{key}/"
    for key, values in files.items()
)

# write the rendered Dockerfile
with open("Dockerfile", "w") as file:
    file.write(text.replace("{{replace}}", insert))
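Assuming the snippet above is saved as render_dockerfile.py (a hypothetical name) and the placeholder lists are filled in, the build becomes a two-step affair:
python render_dockerfile.py
docker build .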

You might want to do this, for example (the value of an ARG is expanded in COPY sources):
FROM ...
ARG files
COPY ${files} ./
and run with
docker build --build-arg files="$(tr '\n' ' ' < list_of_files_to_copy.txt)" .

Related

Temp file not being deleted

I'm trying to create a temporary file in my pipeline, then use that file in another rule.
For example, I have two rules in a .smk file:
#Unzip adapter trimmed fastq file
rule unzip_fastq:
    input:
        '{sample}.adapterTrim.round2.fastq.gz'
    output:
        temp('{sample}.adapterTrim.round2.fastq')
    conda:
        '../envs/rep_element.yaml'
    shell:
        'gunzip -c {input[0]} > {output[0]}'

#Run bowtie2 to align to rep elements and parse output
rule parse_bowtie2_output_realtime:
    input:
        '{sample}.adapterTrim.round2.fastq'
    output:
        'rep_element_pipeline/{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam'
    params:
        bt2=config["ref"]["bt2_index_path"], eid=config["ref"]["enst2id"]
    conda:
        '../envs/rep_element.yaml'
    shell:
        'perl ../scripts/parse_bowtie2_output_realtime_includemultifamily.pl '
        '{input[0]} {params.bt2} {output[0]} {params.eid}'
{sample}.adapterTrim.round2.fastq is used once and should ultimately be deleted upon completion. However, I'm finding that this file is uploaded to Amazon S3, even with the addition of temp(). I'm also finding that this file is removed locally, but still persists on S3.
Am I doing this correctly? '{sample}.adapterTrim.round2.fastq' is not currently listed in the rule all of the Snakefile.
We ultimately need to prevent this file from being uploaded to S3, so if there is a way to specify not to upload this file in the rule, that would be useful.
It seems that the snippet in the question is not consistent with actual usage, since for S3 files one would need to wrap the file names in remote().
However, as a general solution, the documentation contains the following:
The remote() wrapper is mutually-exclusive with the temp() and protected() wrappers.
Hence, if you intend to use a temp file, make sure it's not wrapped in remote, or explicitly wrap the file in local.
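For instance, a minimal sketch of what that could look like (the bucket name and provider setup are assumptions, following the snakemake.remote.S3 API):
from snakemake.remote.S3 import RemoteProvider as S3RemoteProvider
S3 = S3RemoteProvider()

rule unzip_fastq:
    input:
        # fetched from S3 via remote()
        S3.remote('mybucket/{sample}.adapterTrim.round2.fastq.gz')
    output:
        # plain local temp() file, never wrapped in remote(), so never uploaded
        temp('{sample}.adapterTrim.round2.fastq')
    shell:
        'gunzip -c {input[0]} > {output[0]}'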

SQL*Loader without .dat extension

Oracle's sqlldr defaults to a .dat extension, which I want to override. I don't want to rename the file. When I googled, I found a few answers suggesting a trailing dot, like data='fileName.', which is not working. Please share your ideas.
The error message is: fileName.dat is not found.
SQL*Loader has a default extension for each of its input files:
data = .dat
log = .log
control = .ctl
bad = .bad
parfile = .par
But you have to pass the file name without the apostrophe and the dot:
sqlldr user/pass@db control=control data=data
SQL*Loader will add the extensions itself: control.ctl, data.dat.
Nevertheless, I do not understand why you do not want to specify the extension.
You can't, at least in Unix/Linux environments. In Windows you can use the trailing-period trick, specifying either INFILE 'filename.' in the control file or DATA=filename. on the command line. Windows file name handling allows that; you can for instance do DIR filename. at a command prompt and it will list the file with no extension (as will DIR filename). But you can't do that with *nix, from a shell prompt or anywhere else.
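For example, on Windows only (file names are hypothetical; note the trailing period on the data file):
sqlldr user/password@db control=file.ctl data=filename.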
You said you don't want to copy or rename the file. Temporarily renaming it might be the simplest solution, but as you may have a reason not to do that even briefly, you could instead create a hard or soft link (with a .dat extension) to the file, and use that link as the target instead. You could wrap that in a shell script that takes the file name as an argument:
# set variable from correct positional parameter; if you pass in the control
# file name or other options, this might not be $1 so adjust as needed
# if the temporary file won't be in the same directory, this needs to be the full path
filename=$1

# optionally check file exists, is readable, etc. but overkill for demo
# can also check temporary file does not already exist - stop or remove

# create soft link somewhere it won't impact any other processes
ln -s "${filename}" "/tmp/${filename##*/}.dat"

# run SQL*Loader with soft link as target
sqlldr user/password@db control=file.ctl data="/tmp/${filename##*/}.dat"

# clean up
rm -f "/tmp/${filename##*/}.dat"
You can then call that as:
./scriptfile.sh /path/to/filename
If you can create the link in the same directory then you only need to pass the file name, but if it's somewhere else - which may be necessary depending on why renaming isn't an option, and desirable either way - then you need to pass the full path of the data file so the link works. (If the temporary file will be in the same filesystem you could use a hard link, and you wouldn't have to pass the full path then either, but it's still cleaner to do so.)
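A hypothetical hard-link variant, creating the link right next to the original file so both stay on the same filesystem:
# hard link alongside the data file (same filesystem required)
ln "${filename}" "${filename}.dat"
sqlldr user/password@db control=file.ctl data="${filename}.dat"
rm -f "${filename}.dat"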
As you haven't shown your current command line options you may have to adjust that to take into account anything else you currently specify there rather than in the control file, particularly which positional argument is actually the data file path.
I have the same issue. I get a monthly download of reference data used in a medical application, and the 485 downloaded files (~2 GB) don't have file extensions. Unless I can load without file extensions, I have to copy the files with .dat and load from there.

Add extra file into the rpm building process

I have the source code of an application that supports adding Python plugins. I have written a Python script and want to build a custom rpm that includes my script by default, so that I do not have to add it separately after the rpm installation.
Now, as far as I understand, there are two parts to this:
Adding the file to the source code.
Listing that file in the .spec file.
How do I know where to put the file in the source? How do I specify the path where I want my script to be copied? The spec file contains text like:
%if %{with_python}
%files python
%{_mandir}/man5/collectd-python*
%{_libdir}/%{name}/python.so
# Something like this?
# %{_libdir}/%{name}/gearman.py
# %{_libdir}/%{name}/redis.py
%endif
You need to know where to place your script file on the target installation (e.g. /usr/lib/myApp/plugins/myNiceScript.py).
In the spec file (in the %install section) you have to copy your script under %{buildroot} into the target directory (which has to be created first):
%install
...
# in case the dir does not exist:
mkdir -p %{buildroot}/usr/lib/myApp/plugins
cp whereitis/myNiceScript.py %{buildroot}/usr/lib/myApp/plugins
At the end you have to define the file flags in the %files section, e.g. if your file has to be mode 644 under root:
%files
...
%defattr(644,root,root)
/usr/lib/myApp/plugins/myNiceScript.py
If your plugins directory is to be created during installation you need to define these flags too:
%defattr(755,root,root)
%dir /usr/lib/myApp/plugins
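As a variant, the per-file %attr macro can set the flags inline instead of switching %defattr back and forth (same hypothetical paths as above):
%dir %attr(755,root,root) /usr/lib/myApp/plugins
%attr(644,root,root) /usr/lib/myApp/plugins/myNiceScript.py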

Using CMake, how can I concat files and install them

I'm new to CMake and I have a problem that I cannot figure out a solution to. I'm using CMake to compile a project with a bunch of optional sub-dirs and it builds shared library files as expected; that part seems to be working fine. Each of these sub-dirs contains an sql file, and I need to concatenate all the selected sql files into one sql header file and install the result. So, one file like:
sql_header.sql
sub_dir_A.sql
sub_dir_C.sql
sub_dir_D.sql
If I did this directly in a makefile I might do something like the following (only smarter, to deal with only the selected sub-dirs):
cat sql_header.sql > "${INSTALL_PATH}/somefile.sql"
cat sub_dir_A.sql >> "${INSTALL_PATH}/somefile.sql"
cat sub_dir_C.sql >> "${INSTALL_PATH}/somefile.sql"
cat sub_dir_D.sql >> "${INSTALL_PATH}/somefile.sql"
I have sort of figured out pieces of this. For example, I can use:
LIST(APPEND PACKAGE_SQL_FILES "some_file.sql")
which I assume I can place in each of the sub-dirs' CMakeLists.txt files to collect the file names. And I can create a macro like:
CAT(IN "${PACKAGE_SQL_FILES}" OUT "${INSTALL_PATH}/somefile.sql")
But I am lost as to what happens between when CMake initially runs and when make install runs. Maybe there is a better way to do this. I need this to work on both Windows and Linux.
I would be happy with some hints to point me in the right direction.
You can create the concatenated file mainly using CMake's file and function commands.
First, create a cat function:
function(cat IN_FILE OUT_FILE)
  file(READ ${IN_FILE} CONTENTS)
  file(APPEND ${OUT_FILE} "${CONTENTS}")
endfunction()
Assuming you have the list of input files in the variable PACKAGE_SQL_FILES, you can use the function like this:
# Prepare a temporary file to "cat" to:
file(WRITE somefile.sql.in "")

# Call the "cat" function for each input file
foreach(PACKAGE_SQL_FILE ${PACKAGE_SQL_FILES})
  cat(${PACKAGE_SQL_FILE} somefile.sql.in)
endforeach()

# Copy the temporary file to the final location
configure_file(somefile.sql.in somefile.sql COPYONLY)
The reason for writing to a temporary is so the real target file only gets updated if its content has changed. See this answer for why this is a good thing.
You should note that if you're including the subdirectories via the add_subdirectory command, the subdirs all have their own scope as far as CMake variables are concerned. In the subdirs, using list will only affect variables in the scope of that subdir.
If you want to create a list available in the parent scope, you'll need to use set(... PARENT_SCOPE), e.g.
set(PACKAGE_SQL_FILES
    ${PACKAGE_SQL_FILES}
    ${CMAKE_CURRENT_SOURCE_DIR}/some_file.sql
    PARENT_SCOPE)
All this so far has simply created the concatenated file in the root of your build tree. To install it, you probably want to use the install(FILES ...) command:
install(FILES ${CMAKE_BINARY_DIR}/somefile.sql
        DESTINATION ${INSTALL_PATH})
So, whenever CMake runs (either because you manually invoke it or because it detects changes when you do "make"), it will update the concatenated file in the build tree. Only once you run "make install" will the file finally be copied from the build root to the install location.
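Putting the pieces together, a minimal top-level sketch could look like this (directory names and INSTALL_PATH are assumptions; each subdirectory's CMakeLists.txt appends to PACKAGE_SQL_FILES with PARENT_SCOPE as shown above):
cmake_minimum_required(VERSION 3.5)
project(sql_concat)

function(cat IN_FILE OUT_FILE)
  file(READ ${IN_FILE} CONTENTS)
  file(APPEND ${OUT_FILE} "${CONTENTS}")
endfunction()

# each of these appends its .sql file to PACKAGE_SQL_FILES
add_subdirectory(sub_dir_A)
add_subdirectory(sub_dir_C)
add_subdirectory(sub_dir_D)

# start from the header, then append every collected file
file(WRITE somefile.sql.in "")
cat(${CMAKE_CURRENT_SOURCE_DIR}/sql_header.sql somefile.sql.in)
foreach(PACKAGE_SQL_FILE ${PACKAGE_SQL_FILES})
  cat(${PACKAGE_SQL_FILE} somefile.sql.in)
endforeach()
configure_file(somefile.sql.in somefile.sql COPYONLY)

install(FILES ${CMAKE_BINARY_DIR}/somefile.sql DESTINATION ${INSTALL_PATH})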
As of CMake 3.18, the CMake command line tool can concatenate files using cat. So, assuming a variable PACKAGE_SQL_FILES containing the list of files, you can run the cat command using execute_process:
# Concatenate the sql files into a variable 'FINAL_FILE'.
execute_process(COMMAND ${CMAKE_COMMAND} -E cat ${PACKAGE_SQL_FILES}
                OUTPUT_VARIABLE FINAL_FILE
                WORKING_DIRECTORY ${CMAKE_CURRENT_LIST_DIR})
# Write out the concatenated contents to 'final.sql.in'
# (quoted so semicolons in the content survive):
file(WRITE final.sql.in "${FINAL_FILE}")
The rest of the solution is similar to Fraser's response. You can use configure_file so the resultant file is only updated when necessary.
configure_file(final.sql.in final.sql COPYONLY)
You can still use install in the same way to install the file:
install(FILES ${CMAKE_CURRENT_BINARY_DIR}/final.sql
        DESTINATION ${INSTALL_PATH})

Directory dependencies with rake

I'm using rake to copy a directory as so:
file copied_directory => original_directory do
  # copy directory
end
This works fine, except when something inside of original_directory changes. The problem is that the mod date doesn't change on the enclosing directory, so rake doesn't know to copy the directory again. Is there any way to handle this? Unfortunately my current setup does not allow me to set up individual dependencies for each individual file inside of original_directory.
You could use rsync to keep the 2 directories in sync as shown here: http://asciicasts.com/episodes/149-rails-engines
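In Rake terms, that could be as simple as shelling out (a minimal sketch; assumes rsync is installed and the two path variables are defined as in the question):
task :sync_dir do
  # -a preserves metadata, --delete removes files that vanished from the source
  sh "rsync -a --delete #{original_directory}/ #{copied_directory}/"
end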
You don't need to know the files to depend on them:
file copied_directory => FileList[original_directory, original_directory + "/**/*"]
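Fleshed out into a full task, that could look like the following sketch (FileUtils is part of Ruby's standard library; the rm_rf guards against stale files lingering in the copy):
require 'fileutils'

file copied_directory => FileList[original_directory, original_directory + "/**/*"] do
  FileUtils.rm_rf(copied_directory)
  FileUtils.cp_r(original_directory, copied_directory)
end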
