Extract multiple objects with the same name from a library - static-libraries

I have a .a file that contains a number of objects that share the same name (utils.o for example).
How can I extract these objects when the ar utility only operates on the name?

man ar reveals this modifier:
N Uses the count parameter. This is used if there are multiple entries in the archive with the same name.
Extract or delete instance count of the given name from the archive.
So, for example, you could use ar xN 5 libfoo.a utils.o to extract the 5th utils.o object from your archive...
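If you need every duplicate rather than one particular instance, you can count the copies with ar t and loop over the index, renaming as you go so each extraction isn't clobbered by the next. A minimal shell sketch, assuming GNU ar and the libfoo.a/utils.o names from the example above:
n=$(ar t libfoo.a | grep -Fxc utils.o)   # how many members are named exactly utils.o
for i in $(seq 1 "$n"); do
  ar xN "$i" libfoo.a utils.o            # extract the i-th instance into ./utils.o
  mv utils.o "utils.$i.o"                # rename before the next extraction overwrites it
done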

Related

What would be the most efficient way to check that a name is within a list of possible words?

For example, let's imagine that you have the following name ['Michael', 'Scott'] and you want to check that most of the words that make up the name are contained in the following list of names and surnames:
['Pam', 'Michael', 'Jin', 'Schrute', ...]
PS: The list is very large, greater than 10,000 words
Store those 10k+ words in a hash set (std::unordered_set in C++, for example). Each membership test then costs expected O(1) hash-table probes plus the time to hash the query string, which is linear in its length, so checking every word of a name against the list stays fast no matter how large the list grows.
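Since the rest of this page leans on the shell, here is the same idea as a quick shell sketch: bash 4 associative arrays are hash tables, giving the same expected O(1) lookups. The file wordlist.txt (one name per line) and the majority threshold are assumptions for illustration:
declare -A known
while IFS= read -r w; do known["$w"]=1; done < wordlist.txt      # build the hash set
name=(Michael Scott)
hits=0
for part in "${name[@]}"; do
  [[ ${known[$part]+x} ]] && hits=$((hits + 1))                  # O(1) membership test
done
(( hits * 2 > ${#name[@]} )) && echo "match" || echo "no match"  # "most of the words" = more than half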

How to concatenate many files using their basenames?

I study genetic data from 288 fish samples (Fish_one, Fish_two ...)
I have four files per fish, each with a different suffix.
e.g. for sample_name Fish_one:
file 1 = "Fish_one.1.fq.gz"
file 2 = "Fish_one.2.fq.gz"
file 3 = "Fish_one.rem.1.fq.gz"
file 4 = "Fish_one.rem.2.fq.gz"
I would like to apply the following concatenation instructions to all my samples, perhaps using a text file containing a list of all the sample_names that could be fed to a loop?
cp sample_name.1.fq.gz sample_name.fq.gz
cat sample_name.2.fq.gz >> sample_name.fq.gz
cat sample_name.rem.1.fq.gz >> sample_name.fq.gz
cat sample_name.rem.2.fq.gz >> sample_name.fq.gz
In the end, I would have only one file per sample, ideally in a different folder.
I would be very grateful to receive a bit of help on this one, even though I'm sure the answer is quite simple for a non-novice!
Many thanks,
Noé
In the first place, the name of the cat command is mnemonic for "concatenate". It accepts multiple command-line arguments naming sources to concatenate together to the standard output, which is exactly what you want to do. It is poor form to use a cp and three cats where a single cat would do.
In the second place, although you certainly could use a file of name stems to drive the operation you describe, you likely don't need to go to the trouble of creating or maintaining such a file: globbing will probably do the job satisfactorily. As long as there aren't any name stems that need to be excluded, I'd go with something like this:
for f in *.rem.1.fq.gz; do
  stem=${f%.rem.1.fq.gz}
  cat "$stem".{1,2,rem.1,rem.2}.fq.gz > "${other_dir}/${stem}.fq.gz"
done
That recognizes the groups present in the current working directory by the members whose names end with .rem.1.fq.gz. It extracts the common name stem from that member's name, then concatenates the four members to the correspondingly-named output file in the directory identified by ${other_dir}. It relies on brace expansion to form the arguments to cat, so as to minimize code and (IMO) improve clarity.
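That said, if you would rather drive the loop from a list of sample names as the question suggests, a read loop works too. A minimal sketch, assuming a hypothetical samples.txt with one sample_name per line and an existing output directory merged/:
while IFS= read -r stem; do
  cat "$stem".{1,2,rem.1,rem.2}.fq.gz > "merged/$stem.fq.gz"   # four inputs, one output per sample
done < samples.txt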

MapReduce One-to-one processing of multiple input files

Please clarify:
I have a set of input files (say 10) with specific names. I run a word-count job on all the files at once (the input path is the folder). I expect 10 output files with the same names as the input files, i.e. the counts for input File1 should be stored in a separate output file named "file1", and so on for every file.
There are two approaches you can take to achieve multiple outputs:
Use the MultipleOutputs class: see the API documentation (https://hadoop.apache.org/docs/r2.6.3/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html), and for a worked implementation see http://appsintheopen.com/posts/44-map-reduce-multiple-outputs
Another option is LazyOutputFormat; it is used in conjunction with MultipleOutputs. For more about its implementation, see https://ssmolen.wordpress.com/2014/07/09/hadoop-mapreduce-write-output-to-multiple-directories-depending-on-the-reduce-key/
I feel that using LazyOutputFormat in conjunction with the MultipleOutputs class is the better approach.
Set the number of reduce tasks equal to the number of input files; this creates the same number of output files.
Add a file prefix to each map output key (word). E.g., when you meet the word "cat" in the file named "file0.txt", emit the key "0_cat", or "file0_cat", or anything else that is unique to "file0.txt". Use the map context to get the current input file name.
Override the default Partitioner to make sure that all map output keys with prefix "0_" or "file0_" go to the first partition, all keys with prefix "1_" or "file1_" go to the second, and so on.
In the reducer, strip the "x_" or "filex_" prefix from the output key and use it as the output file name (using MultipleOutputs). Otherwise, if you don't want MultipleOutputs, you can easily map output files back to input files by checking your Partitioner code (e.g., part-00000 will be partition 0's output).

netcdf time dimension missing CDO

I downloaded a netcdf file containing 4-5 variables but it has only 2 dimensions (lat and lon).
Time is missing and this does not allow me to merge timesteps or do anything useful.
Is there any way to fix this hopefully by using CDO?
There are 100 netcdf files (all without a time dimension) and I want to merge them along a new time dimension.
Let me expound a bit on the answer by @Ales:
ncecat is a command-line tool in the NCO package that concatenates multiple netcdf files lacking a record dimension (often named "time"). The concatenated file gains a new record dimension, named 'record' by default.
Let's say you have netcdf files A.nc, B.nc, and C.nc, all with dimensions lat x lon. You can concatenate them using ncecat -O A.nc B.nc C.nc new_file.nc. new_file.nc will have dimensions record x lat x lon, where record has a size of 3 (the number of files you concatenated). The -O option is not strictly necessary, but makes ncecat create (and overwrite) the output file, new_file.nc or whatever you want to call it, without prompting.
You can use the command:
ncecat -O -u time in.nc out.nc
to create a new time dimension from scratch.
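Since the question involves 100 files, note that ncecat accepts any number of inputs in one invocation. A minimal sketch, with hypothetical file names, assuming the shell's sort order matches the intended time order:
ncecat -O -u time file_*.nc merged.nc   # all inputs concatenated along a new 'time' dimension
The new time dimension simply indexes the inputs (0 through 99 here); if you need real time coordinate values, you would still have to write them afterwards, for example with ncap2.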

Querying a 'column' in core data

I have a core data entity named "Folder". Each "Folder" has a 1-to-many relationship with the entity "File", and each file contains the field "filename".
What is a succinct way of producing an array of all of the filenames for a given folder?
I expected it to be something like:
NSManagedObject* folder = [self getSomeFolder];
NSArray* files = [folder valueForKey:@"files.@unionOfSet.filename"];
... but i've been having no luck getting it to go, and Apple's set operations guide has got me stumped.
Your solution is mostly correct, but you need to use -valueForKeyPath: instead of -valueForKey:. -valueForKey: treats the whole string as a single key; only -valueForKeyPath: splits the path on the dots and evaluates collection operators.
