Formatting multiple json files recursively - bash

This is a theoretical question about minimizing side effects in bash scripting.
I recently used a simple mechanism for formatting a bunch of json files, in a nested directory structure...
for f in `find ./ -name *json`; do echo $f ; python -mjson.tool $f > /tmp/1 && cp /tmp/1 $f ; done.
The mechanism is simply to
format each file using python's mjson.tool,
write it to a tmp location, and
then rewrite it back in place.
Is there a way to do this which is more elegant, i.e. with minimal side effects? I'm assuming bash experts have a better way of doing this sort of thing .

Unix tools working on a streaming basis -- they don't store all of the contents of the files in memory at once. Therefore, you have to use an intermediary location since you would be overwriting a file that is currently being read from.
You may consider that your snippet isn't fault tolerant. If you make a mistake, you would have just overwritten all your data. You should store the output in a new location, verify, then move to overwrite. :)

Using Eclipse IDE we can achieve formatting for multiple JSON files
Import the files into eclipse and select the files (you wish to format) or folder(for all the files) and right click -> source -> format

I was looking for something similar and just noticed I can select all JSON files I have in my VSCode file panel and CTRL + Click > "Format". Works like magic for a one-off operation, it's formatting the files in-place.
VSCode format in action

Related

How to remove a reoccuring word/character and what comes after, from the filenames of multiple files?

I have several folders of video files where, due to the download manager I use, they are all named in the following format "FILENAME.mp4; filename= FILENAME.mp4" All I've been trying to do is to remove everything after (and including) ".mp4; filename". However, I haven't found a way to do this.
I have tried some free software (such as Renamer, Namechanger, Name Munger for Mac, Transnomino) but I failed to do what I need to.
I'm working on Mac OSX 10.13.6.
Any help with this issue would be appreciated.
You can achieve it using Terminal. Go to the folder where you want to rename files using this cd command, for example:
cd ~/Documents/Videos
And run this command to rename all files recursively:
find . -iname "*.mp4;*" | sed -E 's/(\.[^\.]*)(\.mp4)(.*)/mv "\1\2\3" "\1\2"/' | sh
This command will keep only FILENAME.mp4 part from FILENAME.mp4; filename= FILENAME.mp4 file name
I used to extensively use a windows Rename tool called Renamer 6.0, and it had a "pattern rename" facility called "Multi change" that could have handled this.
In the context of that tool it would be asking for a source pattern like %a= %b and a destination pattern (like %b), everything after the = would be stored in %b variable and then renaming the file to just %b would lose everything after the =
See if your preferred rename tool has a similar facility?
If your tool supports regex, then find: .*?=(.*) and replace with $1
I'm also minded that asking this question on https://unix.stackexchange.com/ might elicit some help crafting a shell script that will perform this rename (though also plenty of shell capable people here, one of them may see it - it's just that it's not quite as hardcore programmer-y a question as most).
If you're willing to learn/use java, then that could be another good way to get the problem solved. It would (at a guess) look something like this:
for (final File f : new File("C:\\temp").listFiles()) {
if (f.isFile()) {
string n = f.getName();
if (n.contains("=")) {
f.renameTo(new File(n.substring(n.indexOf("=")+1));
}
}
}

Shell script to verify data packages

I need to make shell script to check my algorithms with loads of data(tests packages saved in .in files, every package contains folder with .in file and the other one with .out file where supposed to be correct result)
Sometimes It's about 1000 files in one packages so there's no point of doing it manually. I need some kind of loop which opens this .in file then redirect input of my c++ program and also redirect output of this program(save result to .out files) But the point is I can't get this language as quick as I need.
And I would like this script to compare results of my algorithm to .out files from packages
for f in ExternalIn/*.in; do//part of code which opens process with my algorithm and compare its .out file to .out file from package
Skipping checks for missing files, whitespace-safety, etc., you probably need something like:
for f in ExternalIn/*.in; do
# diff the result of my_cpp_app eating file.in with file.out
# and store the comparison result in file.diff
diff ${f/.in/.out} <(my_cpp_app <$f 2>/dev/null) > ${f/.in/.diff}
done
Although I would probably do it with find / xargs pipeline which is not only safer but also allows parallel execution.
Or even write a Makefile for this and use make, which after all is a tool for exactly this kind of work.

Append part of folder name to all .gz within

I have a folder of data folders with the following structure:
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data2.gz
sampleName2-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz
I want to modify all the data.gz within each sample folder by appending the sample name but not the random numbers to get:
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/sampleName1_data1.gz
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/sampleName1_data2.gz
sampleName2-randomNumbers/subfolder1/subfolder2/subfolder3/sampleName2_data1.gz
It seems like this should be a simple mv for loop but I haven't been able to figure out how to pull part of a folder name using basename.
for i in */Data/Intensities/BaseCalls/*.gz; do mv $i "fastq""/"${i%%-*}"."`basename $i`; done
I couldn't figure out how to make the files stay in their original folder but for my purposes it works to have all the files go to a new folder ("fastq")
I suppose the "sampleName" part doesn't include dashes. In that case, use the standard pattern removal expansion: %%. That is, suppose your full path (relative to directory root) is stored in $path, just do ${path%%-*} to extract the "sampleName" part. Search for %% in the Bash Reference Manual for more details. As a simple example:
> path=sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz
> echo ${path%%-*}
sampleName1
Otherwise, you could also use more advanced substring extraction based on regex. See BashFAQ/100 or Manipulating Strings from the TLDP Advanced Bash Scripting Guide.
Update. Here's the full command to perform the job described, and it is entirely native to the shell:
for file in */Data/Intensities/BaseCalls/*.gz; do
mv "$file" "${file%/*}/${file%%-*}_${file##*/}"
done

Create new files from existing ones but change their extension

In shell, what is a good way to duplicating files in an existing directory so that the result gives the same file but with a different extension? So taking something like:
path/view/blah.html.erb
And adding:
path/view/blah.mobile.erb
So that in the path/view directory, there would be:
path/view/blah.html.erb
path/view/blah.mobile.erb
I'd ideally like to perform this at a directory level and not create the file if it already has both extensions but that isn't necessary.
You can do:
cd /path/view/
for f in *.html.erb; do
cp "$f" "${f/.html./.mobile.}"
done
PS: This replaces first instance of .html. with .mobile., syntax is bash specific (let me know if you're not using BASH).

What is the fastest way to unzip textfiles in Matlab during a function?

I would like to scan text of textfiles in Matlab with the textscan function. Before I can open the textfile with fid = fopen('C:\path'), I need to unzip the files first. The files have the extension: *.gz
There are thousands of files which I need to analyze and high performance is important.
I have two ideas:
(1) Use an external program an call it from the command line in Matlab
(2) Use a Matlab 'zip'toolbox. I have heard of gunzip, but don't know about its performance.
Does anyone knows a way to unzip these files as quick as possible from within Matlab?
Thanks!
You could always try the Matlab unzip() function:
unzip
Extract contents of zip file
Syntax
unzip(zipfilename)
unzip(zipfilename, outputdir)
unzip(url, ...)
filenames = unzip(...)
Description
unzip(zipfilename) extracts the archived contents of zipfilename into the current folder and sets the files' attributes, preserving the timestamps. It overwrites any existing files with the same names as those in the archive if the existing files' attributes and ownerships permit it. For example, files from rerunning unzip on the same zip filename do not overwrite any of those files that have a read-only attribute; instead, unzip issues a warning for such files.
Internally, this uses Java's zip library org.apache.tools.zip. If your zip archives each contain many text files it might be faster to drop down into Java and extract them entry by entry, without explicitly unzipped files. look at the source of unzip.m to get some ideas, and also the Java documentation.
I've found 7zip-commandline(Windows) / p7zip(Unix) to be somewhat speedier for this.
[edit]From some quick testing, it seems making a system call to gunzip is faster than using MATLAB's native gunzip. You could give that a try as well.
Just write a new function that imitates basic MATLAB gunzip functionality:
function [] = sunzip(fullfilename,output_dir)
if ~exist('output_dir','var'), output_dir = fileparts(fullfilename); end
app_path = '/usr/bin/7za';
switches = ' e'; %extract files ignoring directory structure
options = [' -o' output_dir];
system([app_path switches options '_' fullfilename]);
Then use it as you would use gunzip:
sunzip('/data/time_1000.out.gz',tmp_dir);
With MATLAB's toc timer, I get the following extraction times with 6 uncompressed 114MB ASCII files:
gunzip: 10.15s
sunzip: 7.84s
worked well, just needed a minor change to Max's syntax calling the executable.
system([app_path switches ' ' fullfilename options ]);

Resources