keep directory structure when generating files - go

I have those thrift interfaces:
./thrift/a/a1.thrift
./thrift/a/a2.thrift
./thrift/b/b1.thrift
./thrift/b/b2.thrift
where a1.thrift includes a2, b1, b2 (with include "thrift/a/a2.thrift")
I generate the Go files for all those with thrift -r --gen go:package_prefix=work -I . --out . thrift/a/a1.thrift
It outputs:
./a1/constants.go
./a1/ttypes.go
./a2/...
./b1/...
./b2/...
How can I tell thrift to output into the following layout instead?
./a/a1/...
./a/a2/...
./b/b1/...
./b/b2/...
Note that I could move those files by hand, but first there are many of them, and second in Go the package has to match the directory, so I would also need to edit the files. For example, the generated Go file for a1 imports a2 as work/a2 and not work/a/a2.

Use namespaces. Add a line similar to the following on top of each IDL file:
namespace go a.a1 // whatever you need, but exactly one per IDL file
Running
thrift -r --gen go a1.thrift
creates files under
gen-go/a/a1/*
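For the layout in the question, that would mean something like the following (a sketch; the exact namespaces are up to you, as long as they mirror the directory layout you want):

// thrift/a/a1.thrift
namespace go a.a1

// thrift/a/a2.thrift
namespace go a.a2

// thrift/b/b1.thrift
namespace go b.b1

// thrift/b/b2.thrift
namespace go b.b2

Keeping the package_prefix option from the question, the generated a1 code should then import a2 as work/a/a2, which is what you want.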

dockerfile copy list of files, when list is taken from a local file

I've got a file containing a list of paths that I need to copy with the Dockerfile COPY command on docker build.
My use case: I have a Python requirements.txt file which pulls in multiple other requirements files from around the project with -r PATH.
Now, I want to COPY all of the requirements files alone, run pip install, and only then copy the rest of the project (for caching and such). So far I haven't managed to do this with the COPY command.
I don't need help fetching the paths from the file - I've managed that - I just want to know whether this is possible, and if so, how?
Thanks!
It's not possible in the sense that the COPY directive supports it out of the box. However, if you know the extensions, you can use a wildcard in the path, such as COPY folder*something*name somewhere/.
For simple requirements.txt fetching that could be:
# but you need to distinguish the files somehow, otherwise they
# all land as ./requirements.txt and overwrite each other, keeping
# only the last one; e.g. rename package/requirements.txt to
# package/package-requirements.txt and it won't be an issue
COPY */*requirements.txt ./
RUN for item in *requirements*.txt; do pip install -r "$item"; done
But if it gets a bit more complex (collecting only specific files, matching some custom pattern, etc.), then no. For that case, use templating instead: either a simple f-string, the format() function, or Jinja. Create a Dockerfile.tmpl (or whatever you want to name the template file), collect the paths, insert them into the templated Dockerfile, and once it's ready, dump it to a file and run docker build on it afterwards.
Example:
# Dockerfile.tmpl
FROM alpine
{{replace}}
# organize files into coherent structures so you don't have
# too many COPY directives
files = {
    "pattern1": [...],
    "pattern2": [...],
    ...
}

with open("Dockerfile.tmpl", "r") as file:
    text = file.read()

insert = "\n".join([
    f"COPY {' '.join(values)} destination/{key}/"
    for key, values in files.items()
])

with open("Dockerfile", "w") as file:
    file.write(text.replace("{{replace}}", insert))
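With a hypothetical mapping such as files = {"base": ["requirements.txt"], "api": ["api/requirements.txt", "api/dev-requirements.txt"]} (paths invented for illustration), the rendered Dockerfile would come out roughly as:

# Dockerfile (generated)
FROM alpine
COPY requirements.txt destination/base/
COPY api/requirements.txt api/dev-requirements.txt destination/api/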
You might also want to do something like this:
FROM ...
ARG files
COPY $files ./
and run it with
docker build --build-arg files="$(cat list_of_files_to_copy.txt)" .

Make: Dependency on newest file in directory

For a small project, I have the following workflow:
compile code and generate ./data and ./images
run code, which will write many files to ./data
generate images from the data files, place them in ./images
generate a video from the images
I have written a makefile which can run the code, and compile it first if necessary. But I don't know how to implement the dependencies of steps 3 and 4, and currently I run those targets manually.
So, is there a way to check whether e.g. the newest file in ./data is newer than the newest file in ./images? It's not necessary to do this on a file-by-file basis, and the total number of data/image files is not known.
A directory's timestamp is updated whenever a file in it is added, removed, or renamed (though not when an existing file's contents are modified in place), so you can usually use the timestamp on the directory itself for dependencies.
images : data
	# commands to generate the images
Alternatively, if there is a mapping between the files in the two directories, you could do something like:
images/%.img: data/%.dat
	# command to generate one image
which would prevent reprocessing data that's already been handled.
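Putting it together, a minimal Makefile sketch for the whole workflow (the sim binary, plot.sh script, and the ffmpeg invocation are invented placeholders for whatever actually produces your data, images, and video):

# step 4: images -> video
video.mp4: images
	ffmpeg -y -i images/%04d.png video.mp4

# step 3: data -> images; touch bumps the directory timestamp explicitly
images: data
	./plot.sh data images
	touch images

# step 2: run the code, which writes files into ./data
data: sim
	./sim
	touch data

# step 1: compile
sim: sim.c
	cc -o sim sim.c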

Using CMake, how can I concat files and install them

I'm new to CMake and I have a problem I cannot figure out a solution to. I'm using CMake to build a project with a bunch of optional sub-dirs, and it builds the shared library files as expected. That part seems to be working fine. Each of these sub-dirs contains an SQL file, and I need to concatenate all the selected SQL files, preceded by a header, into one SQL file and install the result. So, one file like:
sql_header.sql
sub_dir_A.sql
sub_dir_C.sql
sub_dir_D.sql
If I did this directly in a makefile I might do something like the following, only smarter, to deal with just the selected sub-dirs:
cat sql_header.sql > "${INSTALL_PATH}/somefile.sql"
cat sub_dir_A.sql >> "${INSTALL_PATH}/somefile.sql"
cat sub_dir_C.sql >> "${INSTALL_PATH}/somefile.sql"
cat sub_dir_D.sql >> "${INSTALL_PATH}/somefile.sql"
I have sort of figured out pieces of this, like I can use:
LIST(APPEND PACKAGE_SQL_FILES "some_file.sql")
which I assume I can place in each of the sub-dirs' CMakeLists.txt files to collect the file names. And I can create a macro like:
CAT(IN "${PACKAGE_SQL_FILES}" OUT "${INSTALL_PATH}/somefile.sql")
But I am lost about what happens between when CMake initially runs and when make install runs. Maybe there is a better way to do this. I need this to work on both Windows and Linux.
I would be happy with some hints to point me in the right direction.
You can create the concatenated file mainly using CMake's file and function commands.
First, create a cat function:
function(cat IN_FILE OUT_FILE)
  file(READ ${IN_FILE} CONTENTS)
  file(APPEND ${OUT_FILE} "${CONTENTS}")
endfunction()
Assuming you have the list of input files in the variable PACKAGE_SQL_FILES, you can use the function like this:
# Prepare a temporary file to "cat" to:
file(WRITE somefile.sql.in "")

# Call the "cat" function for each input file
foreach(PACKAGE_SQL_FILE ${PACKAGE_SQL_FILES})
  cat(${PACKAGE_SQL_FILE} somefile.sql.in)
endforeach()

# Copy the temporary file to the final location
configure_file(somefile.sql.in somefile.sql COPYONLY)
The reason for writing to a temporary file first is that the real target file only gets updated if its content has changed. See this answer for why this is a good thing.
You should note that if you're including the subdirectories via the add_subdirectory command, the subdirs all have their own scope as far as CMake variables are concerned. In the subdirs, using list will only affect variables in the scope of that subdir.
If you want to create a list available in the parent scope, you'll need to use set(... PARENT_SCOPE), e.g.
set(PACKAGE_SQL_FILES
    ${PACKAGE_SQL_FILES}
    ${CMAKE_CURRENT_SOURCE_DIR}/some_file.sql
    PARENT_SCOPE)
All this so far has simply created the concatenated file in the root of your build tree. To install it, you probably want to use the install(FILES ...) command:
install(FILES ${CMAKE_BINARY_DIR}/somefile.sql
        DESTINATION ${INSTALL_PATH})
So, whenever CMake runs (either because you manually invoke it or because it detects changes when you do "make"), it will update the concatenated file in the build tree. Only once you run "make install" will the file finally be copied from the build root to the install location.
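To make that sequencing concrete, here is a minimal top-level CMakeLists.txt sketch tying the pieces together (the project name and sub-dir names are invented for illustration):

cmake_minimum_required(VERSION 3.0)
project(SqlConcat)

function(cat IN_FILE OUT_FILE)
  file(READ ${IN_FILE} CONTENTS)
  file(APPEND ${OUT_FILE} "${CONTENTS}")
endfunction()

# Each selected sub-dir appends its .sql file to PACKAGE_SQL_FILES
# using set(... PARENT_SCOPE), as shown above.
add_subdirectory(sub_dir_A)
add_subdirectory(sub_dir_C)
add_subdirectory(sub_dir_D)

# Runs at configure time, i.e. whenever CMake itself runs:
file(WRITE ${CMAKE_BINARY_DIR}/somefile.sql.in "")
cat(${CMAKE_SOURCE_DIR}/sql_header.sql ${CMAKE_BINARY_DIR}/somefile.sql.in)
foreach(PACKAGE_SQL_FILE ${PACKAGE_SQL_FILES})
  cat(${PACKAGE_SQL_FILE} ${CMAKE_BINARY_DIR}/somefile.sql.in)
endforeach()
configure_file(${CMAKE_BINARY_DIR}/somefile.sql.in
               ${CMAKE_BINARY_DIR}/somefile.sql COPYONLY)

# Runs only at "make install" time:
install(FILES ${CMAKE_BINARY_DIR}/somefile.sql
        DESTINATION ${INSTALL_PATH})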
As of CMake 3.18, the CMake command line tool can concatenate files using cat. So, assuming a variable PACKAGE_SQL_FILES containing the list of files, you can run the cat command using execute_process:
# Concatenate the sql files into a variable 'FINAL_FILE'.
execute_process(COMMAND ${CMAKE_COMMAND} -E cat ${PACKAGE_SQL_FILES}
                OUTPUT_VARIABLE FINAL_FILE
                WORKING_DIRECTORY ${CMAKE_CURRENT_LIST_DIR})
# Write out the concatenated contents to 'final.sql.in'.
file(WRITE final.sql.in ${FINAL_FILE})
The rest of the solution is similar to Fraser's response. You can use configure_file so the resultant file is only updated when necessary.
configure_file(final.sql.in final.sql COPYONLY)
You can still use install in the same way to install the file:
install(FILES ${CMAKE_CURRENT_BINARY_DIR}/final.sql
        DESTINATION ${INSTALL_PATH})

Mahout - Naive Bayes

I tried deploying the 20 Newsgroups example with Mahout, and it seems to work fine. Out of curiosity I would like to dig deeper into the model statistics.
For example, the bayes-model directory contains the following sub-directories:
trainer-tfIdf trainer-thetaNormalizer trainer-weights
which contain part-0000 files. I would like to read the contents of those files for better understanding, but the cat command doesn't seem to work; it prints garbage.
Any help is appreciated.
Thanks
The 'part-00000' files are created by Hadoop, and are in Hadoop's SequenceFile format, containing values specific to Mahout. You can't open them as text files, no. You can find the utility class SequenceFileDumper in Mahout that will try to output the content as text to stdout.
As to what those values are to begin with, they're intermediate results of the multi-stage Hadoop-based computation performed by Mahout. You can read the code to get a better sense of what these are. The "tfidf" directory for example contains intermediate calculations related to term frequency.
You can read part-0000 files using Hadoop's filesystem -text option. Just go into the Hadoop directory and type the following:
`bin/hadoop dfs -text /Path-to-part-file/part-m-00000`
part-m-00000 will be printed to STDOUT.
If it gives you an error, you might need to set the HADOOP_CLASSPATH variable. For example, if after running it you get
text: java.io.IOException: WritableName can't load class: org.apache.mahout.math.VectorWritable
then add the corresponding class to the HADOOP_CLASSPATH variable
export HADOOP_CLASSPATH=/src/mahout/trunk/math/target/mahout-math-0.6-SNAPSHOT.jar
That worked for me ;)
In order to read part-00000 (sequence) files you need to use the "seqdumper" utility. Here's an example I used for my experiments:
MAHOUT_HOME$ bin/mahout seqdumper -s ~/clustering/experiments-v1/t14/tfidf-vectors/part-r-00000 -o ~/vectors-v2-1010
-s is the sequence file you want to convert to plain text
-o is the output file

Join multiple Coffeescript files into one file? (Multiple subdirectories)

I've got a bunch of .coffee files that I need to join into one file.
I have folders set up like a rails app:
/src/controller/log_controller.coffee
/src/model/log.coffee
/src/views/logs/new.coffee
Coffeescript has a command that lets you join multiple coffeescripts into one file, but it only seems to work with one directory. For example this works fine:
coffee --output app/controllers.js --join --compile src/controllers/*.coffee
But I need to be able to include a bunch of subdirectories kind of like this non-working command:
coffee --output app/all.js --join --compile src/*/*.coffee
Is there a way to do this? Is there a UNIXy way to pass in a list of all the files in the subdirectories?
I'm using terminal in OSX.
They all have to be joined in one file because otherwise each separate file gets compiled & wrapped with this:
(function() { }).call(this);
Which breaks the scope of some function calls.
From the CoffeeScript documentation:
-j, --join [FILE] : Before compiling, concatenate all scripts together in the order they were passed, and write them into the specified file. Useful for building large projects.
So, you can achieve your goal at the command line (I use bash) like this:
coffee -cj path/to/compiled/file.js file1 file2 file3 file4
where file1 - fileN are the paths to the coffeescript files you want to compile.
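To make that "UNIXy" (per the question), you can let the shell collect the file list for you. A sketch, with the caveat that find returns files in arbitrary order and concatenation order can matter:
coffee -cj app/all.js $(find src -name '*.coffee')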
You could write a shell script or Rake task to combine them together first, then compile. Something like:
find . -type f -name '*.coffee' -print0 | xargs -0 cat > output.coffee
Then compile output.coffee
Adjust the paths to your needs. Also make sure that the output.coffee file is not in the same path you're searching with find or you will get into an infinite loop.
http://man.cx/find | http://www.rubyrake.org/tutorial/index.html
Additionally you may be interested in these other posts on Stackoverflow concerning searching across directories:
How to count lines of code including sub-directories
Bash script to find a file in directory tree and append it to another file
Unix script to find all folders in the directory
I've just published an alpha release of CoffeeToaster; I think it may help you.
http://github.com/serpentem/coffee-toaster
The easiest way is to use the coffee command-line tool:
coffee --output public --join --compile app
app is my working directory holding multiple subdirectories, and public is where the joined output .js file will be placed. This process is easy to automate if you're writing your app in Node.js.
This helped me (-o output directory, -j join to project.js, -cw compile and watch coffeescript directory in full depth):
coffee -o web/js -j project.js -cw coffeescript
Use cake to compile them all into one (or more) resulting .js file(s). The Cakefile serves as configuration and controls the order in which your coffee scripts are compiled - quite handy with bigger projects.
Cake is quite easy to install and set up; invoking cake from vim while you are editing your project is then simply
:!cake build
and you can refresh your browser and see results.
As I'm also busy learning the best way of structuring files and using coffeescript in combination with backbone and cake, I have created a small project on github to keep as a reference for myself; maybe it will help you too around cake and some basic things. All compiled files are in the www folder so that you can open them in your browser, and all source files (except for the cake configuration) are in the src folder. In this example, all .coffee files are compiled and combined into one output .js file which is then included in the html.
Alternatively, you could use the --bare flag, compile to JavaScript, and then perhaps wrap the JS if necessary. But this would likely create problems; for instance, if you have one file with the code
i = 0
foo = -> i++
...
foo()
then there's only one var i declaration in the resulting JavaScript, and i will be incremented. But if you moved the foo function declaration to another CoffeeScript file, then its i would live in the foo scope, and the outer i would be unaffected.
So concatenating the CoffeeScript is a wiser solution, but there's still potential for confusion there; the order in which you concatenate your code is almost certainly going to matter. I strongly recommend modularizing your code instead.
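One common way to modularize (a sketch; the App namespace object is an invented convention, not part of CoffeeScript itself) is to hang shared state off a single global, so each file's compiled wrapper no longer breaks scope:

# counter.coffee
window.App or= {}
App.i = 0
App.foo = -> App.i++

# main.coffee - can live in any other file or directory
window.App or= {}
App.foo()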
