concatenate fastq files in a directory

I have a file uploader, resumable.js, which takes a file, breaks it into 1 MB 'chunks', and then sends the file over 1 MB at a time. So after an upload I have a directory with thousands, sometimes millions, of individual fastq files. I can concatenate all of these 'chunks' back into the file's original state with this line of code:
cat file_name.* > merged.fastq
How would I go about concatenating the chunks back into the original file without manually running this command on the command line? Should I set up a bash script to handle this, maybe a cron job? Any ideas are greatly appreciated.
ANSWER: For what it's worth, I used this npm module and it works great.
https://www.npmjs.com/package/joiner
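
If you would rather stay in bash, here is a minimal sketch rather than a definitive solution. It assumes the chunks are named `<prefix>.<N>` with a purely numeric suffix (for example `file_name.1`, `file_name.2`, ...), which may not match how your upload handler names them. The function name `merge_chunks` is made up for illustration. One thing to watch with `cat file_name.*` is that the shell expands the glob in lexical order, so `file_name.10` comes before `file_name.2`; the sketch sorts the suffixes numerically instead.

```bash
#!/usr/bin/env bash
# Merge numbered chunks "<dir>/<prefix>.<N>" into one file, in numeric order.
# ASSUMPTION: chunks end in ".<number>"; adapt this to your own naming scheme.
merge_chunks() {
    local dir=$1 prefix=$2 out=$3 f n
    : > "$out" || return 1              # start from an empty output file
    while read -r n; do
        cat "$dir/$prefix.$n" >> "$out" || return 1
    done < <(
        for f in "$dir/$prefix".*; do
            n=${f##*.}                  # text after the last dot
            # keep only purely numeric suffixes (skips e.g. merged.fastq)
            if [[ $n =~ ^[0-9]+$ ]]; then printf '%s\n' "$n"; fi
        done | sort -n                  # 1, 2, 10 instead of 1, 10, 2
    )
}

# Example call (paths are placeholders):
# merge_chunks /path/to/uploads file_name /path/to/uploads/merged.fastq
```

A script like this could be called from cron or from whatever server-side code learns that the upload has finished; the second option avoids merging a file that is still being uploaded.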

Related

How do I use grep command to search in .bz2.gz.bz2 file?

Basically, I have a .bz2.gz.bz2 file which, on extraction, gives a .bz2.gz file, which in turn extracts to a .bz2 file. Inside this .bz2 file is the txt file that I want to search with grep. I have searched for a solution, but I only found the bzgrep command, which searches a plain .bz2 file, not the nested .gz.bz2 layers, so it gives me no results.
Is there a Unix command that will recursively search through archives nested inside archives and return results only when it reaches the txt file inside?
P.S.: the txt file may be nested up to 10 levels deep. I want the command to recursively find the txt file and search it for the required string. Each archive contains nothing but another archive until the level of the txt file.

I'm not sure I fully understand, but maybe this will help:
for i in /path/to/each/*.tar.bz2; do
    tar -xvjf "$i" -C /path/to/save/in && rm "$i"
done
This extracts every `tar.bz2` archive into the target directory and removes each archive once it has been extracted successfully.

Thanks for sharing your question.
There are a couple of strange things about it, though:
It makes no sense to have a .bz2.gz.bz2 file. Did you create this file yourself? If so, I'd advise you to reconsider doing it that way.
Also, you mention a .bz2 that apparently contains different archives, but a .bz2 can only contain a single file by design. So if it contains archives, it is probably a .tar.bz2 file in which the tar file holds the actual archives.
To answer your question: why can't you write a simple shell script that unpacks your .bz2.gz.bz2 into a .bz2.gz, then into a .bz2 file, and then runs your bzgrep command on that file?
I do not understand where exactly you are getting stuck.
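
The "simple shell script" suggested above might look something like the following. This is only a sketch: it assumes every layer is a plain bzip2 or gzip stream (no tar in between) and that the file name lists the layers in order, as in `.bz2.gz.bz2`. The function names `peel` and `nested_grep` are made up for illustration. It decompresses through a pipe, so nothing is unpacked to disk.

```bash
#!/usr/bin/env bash
# Decompress stdin one layer at a time, guided by the extensions left in $1.
# ASSUMPTION: only .bz2 and .gz layers occur, and the name reflects them.
peel() {
    case $1 in
        *.bz2) bzip2 -dc | peel "${1%.bz2}" ;;  # strip the outer bzip2 layer
        *.gz)  gzip -dc  | peel "${1%.gz}"  ;;  # strip the outer gzip layer
        *)     cat ;;                           # no layers left: plain text
    esac
}

# Search a nested-compressed file for a pattern.
nested_grep() {
    local pattern=$1 file=$2
    peel "$(basename "$file")" < "$file" | grep -e "$pattern"
}

# Example call (file name is a placeholder):
# nested_grep 'search string' data.txt.bz2.gz.bz2
```

Because the recursion follows the file name, the same function handles any nesting depth, including the 10 levels mentioned in the question.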

Bash script behaving differently for different files

I have a bash script that uses awk to process some files that I have downloaded. If I run the script on any of the downloaded files, it does not work properly. However, if I copy the contents of a file into a newly created one, it seems to work as expected. Could it have anything to do with the settings of the files?
I have two files, hotel_12313.dat and hotel_99999.dat. The first one was downloaded and the second one was created by me. If I copy the data from the first file into the second one and run the script on both of them, the output is different.

When was a file used by another program

I have a folder with a series of 750 csv files. I wrote a Stata program that runs through each of these files and performs a single task. The files are quite big, and my program has been running for more than 12 hours.
Is there a way to know which of the csv files was the last one used by Stata? I would like to know whether the code is anywhere near finishing. I thought that sorting the 750 files by "last used" would do the trick, but it does not.
Next time I should be more careful about signalling how the process is going...
Thank you

From the OS X terminal, cd to the directory containing the CSV files, and run the command
ls -lut | head
which should show your files sorted by access time, most recent first, limited to the first ten lines of output. (Note the lowercase -u: on OS X, ls -u uses the access time, while uppercase -U uses the creation time.)

On the most basic level you can use display and log your session:
clear
set more off
local myfiles file1 file2 file3
forvalues i = 1/3 {
    display "processing file`i'"
    // <do-something-with-file>
}
See also help log.

How to tar a folder while files inside the folder might being written by some other process

I am trying to create a script for a cron job. I have a folder of around 8 GB containing thousands of files. I am trying to write a bash script which first tars the folder and then transfers the tarred file to an FTP server.
But I am not sure what happens if some other process is accessing or writing to files inside the folder while tar is archiving it.
It is fine for me if the tarred file does not contain the changes made while tar was archiving the folder.
Please suggest the proper way. Thanks.

tar will happily archive whatever it can. But you may get some surprises when untarring, because tar records the size of each file before archiving it.
A very unpleasant surprise: if a file is truncated while it is being archived, tar will pad it with NUL characters to match its recorded size, which can have very unpleasant side effects. In some cases tar will say nothing when untarring and silently add as many NUL characters as it needs to match the size (in fact, on Unix it doesn't even need to do that: the OS does it, see "sparse files"). In other cases, if the truncation occurred while the file was being archived, tar will complain about an unexpected end of file when untarring (it expected XXX bytes but read fewer), but will still say that the file should be XXX bytes (and Unix OSes will then create it as a sparse file, with NUL characters magically appended at the end to match the expected size when you read it).
(To see the NUL characters, an easy way is to run less thefile, or cat -v thefile | more on a very old Unix, and look for any ^@.)
On the other hand, if files are only appended to (logs, etc.), the side effect is less problematic: you will only miss the latest bits of them (which you say you're OK with), without the unpleasant "fill with NUL characters" side effect. tar may complain when untarring the file, but it will untar it.

I think tar fails (and so does not create the archive) when a file is modified while it is being archived. As Etan said, the solution depends on what you ultimately want in the tarball.
To avoid a tar failure, you can simply copy the folder elsewhere before calling tar. But in that case you cannot be confident in the consistency of the backed-up directory. The copy is NOT an atomic operation, so some files will be up to date while others will be outdated. Whether that is a severe issue depends on your situation.
If you can, I suggest you control how these files are created. For example: "only recent files are appended to; files older than 1 day are never changed". In that case you can easily back up only the old files, and the backup will be consistent.
More generally, you either have to accept losing the latest data AND being inconsistent (each file is backed up at a different time), or you have to act at a different level. I suggest:
Configure the software that produces the data so that it guarantees a consistent state
Or use OS/virtualization features. For example, some virtual storage systems can take a consistent snapshot of the storage.
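
The copy-before-tar idea could be sketched roughly as follows. It is an illustration under assumptions, not a tested backup procedure: the function name `snapshot_tar` is made up, the staging copy needs as much free space as the folder itself, and, as noted above, the copy is not atomic, so files written during the copy may still be captured in different states. The FTP transfer is left out.

```bash
#!/usr/bin/env bash
# Copy a folder to a temporary staging area, then tar the (now quiet) copy.
# ASSUMPTION: there is enough free space for a full copy of the folder.
snapshot_tar() {
    local src=$1 out=$2 staging name status
    name=$(basename "$src")
    staging=$(mktemp -d) || return 1
    # -a keeps permissions and timestamps; nothing writes to the copy.
    if ! cp -a "$src" "$staging/$name"; then
        rm -rf "$staging"
        return 1
    fi
    tar -czf "$out" -C "$staging" "$name"
    status=$?
    rm -rf "$staging"                   # always clean up the staging copy
    return "$status"
}

# Example call (paths are placeholders):
# snapshot_tar /path/to/folder /path/to/backup.tar.gz
```

Since tar only ever reads the staging copy, it should not see files changing underneath it, which is the failure mode this answer is trying to avoid.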

MeshLab: processing multiple files in meshlabserver

I'm new to using meshlabserver and meshlab in general.
I created the .mlx file and tried running a meshlabserver command for one file, and it worked. How do I write a command that processes hundreds of files?
Thanks in Advance.

I've just created a batch file with the necessary loops that calls meshlabserver with the .mlx file for each input. Be aware, however, that the resulting files will be saved in the same directory as meshlabserver.exe.
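
For anyone not on Windows, the same loop could be sketched in bash. This is a hedged example, not the batch file from the answer above: it assumes the inputs are `.ply` files, that `meshlabserver` is on your PATH, and that it accepts `-i` (input), `-o` (output) and `-s` (filter script). The function name `process_meshes` and the `MESHLABSERVER` override variable are made up for illustration. Giving `-o` a full path puts the results in a directory of your choosing.

```bash
#!/usr/bin/env bash
# Run one .mlx filter script over every .ply mesh in a directory.
# ASSUMPTIONS: inputs are .ply files; meshlabserver takes -i, -o and -s.
process_meshes() {
    local in_dir=$1 out_dir=$2 script=$3 mesh
    # Hypothetical override so a different binary (or a stub) can be used.
    local bin=${MESHLABSERVER:-meshlabserver}
    mkdir -p "$out_dir" || return 1
    for mesh in "$in_dir"/*.ply; do
        [ -e "$mesh" ] || continue      # glob matched nothing
        "$bin" -i "$mesh" -o "$out_dir/$(basename "$mesh")" -s "$script" \
            || return 1                 # stop at the first failure
    done
}

# Example call (paths are placeholders):
# process_meshes ./meshes ./processed ./filter.mlx
```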
