I have 575 bz2 files with average size 3G and need to convert them to .gz format to make them compatible with a downstream pipeline.
$ ll -h | head
total 1.4T
drwxrws---+ 1 dz33 dcistat 24K Aug 23 09:21 ./
drwxrws---+ 1 dz33 dcistat 446 Aug 22 11:57 ../
-rw-rw---- 1 dz33 dcistat 2.0G Aug 22 11:38 DRR091550_1.fastq.bz2
-rw-rw---- 1 dz33 dcistat 2.0G Aug 22 11:38 DRR091550_2.fastq.bz2
-rw-rw---- 1 dz33 dcistat 2.0G Aug 22 11:38 DRR091551_1.fastq.bz2
-rw-rw---- 1 dz33 dcistat 2.0G Aug 22 11:38 DRR091551_2.fastq.bz2
-rw-rw---- 1 dz33 dcistat 1.9G Aug 22 11:38 DRR091552_1.fastq.bz2
-rw-rw---- 1 dz33 dcistat 1.9G Aug 22 11:38 DRR091552_2.fastq.bz2
-rw-rw---- 1 dz33 dcistat 1.8G Aug 22 11:38 DRR091553_1.fastq.bz2
$ ll | wc -l
575
For a single file I probably can do bzcat a.bz2 | gzip -c >a.gz, but I am wondering how to convert them entirely with one command or loop in bash/linux.
Do them simply and fast in parallel with GNU Parallel:
parallel --dry-run 'bzcat {} | gzip -c > {.}.gz' ::: *bz2
Sample Output
bzcat a.bz2 | gzip -c > a.gz
bzcat b.bz2 | gzip -c > b.gz
bzcat c.bz2 | gzip -c > c.gz
If you like how it looks, remove the --dry-run. Maybe add a progress meter with --bar or --progress.
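If GNU Parallel is not installed, `xargs -P` can give similar parallelism. This is a minimal sketch, assuming an xargs that supports `-0` and `-P` (GNU and BSD both do); the function name `convert_all` and the default of 4 jobs are illustrative choices, not part of the answer above.

```shell
#!/bin/bash
# Convert every .bz2 file in the current directory to .gz, running up to
# "$1" conversions at once (default 4, an arbitrary illustrative value).
convert_all() {
  jobs=${1:-4}
  set -- *.bz2
  # An unmatched glob stays literal, so stop if there is nothing to do.
  [ -e "$1" ] || return 0
  # NUL-separated names survive spaces and newlines in file names.
  printf '%s\0' "$@" |
    xargs -0 -n 1 -P "$jobs" \
      sh -c 'bzip2 -dc -- "$1" | gzip -c > "${1%.bz2}.gz"' _
}
```

Calling `convert_all 8` in the directory with the .bz2 files would run eight conversions at a time; a sensible job count depends on the number of cores and on disk throughput.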
In a terminal, change to the directory containing the .bz2 files, then run the following command:
for f in *.bz2; do bzcat "$f" | gzip -c >"${f%.*}.gz"; done
This will process each file, one at a time, and give each .gz file the base name of its .bz2 file.
Example: DRR091550_1.fastq.bz2 will become DRR091550_1.fastq.gz.
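With 575 large files, a run may get interrupted, so it can help to make the loop restartable. The following is a sketch only, assuming bash (for `set -o pipefail`); the function name `convert_sequential` and the `.part` suffix are made up for illustration.

```shell
#!/bin/bash
# Convert each .bz2 file in turn. Outputs that already exist are skipped,
# and a .gz only appears under its final name once the pipeline succeeded.
# The body runs in a subshell so that pipefail does not leak to the caller.
convert_sequential() (
  set -o pipefail                      # a failing bzip2 fails the whole pipeline
  for f in *.bz2; do
    [ -e "$f" ] || continue            # the glob matched nothing
    out=${f%.bz2}.gz
    [ -e "$out" ] && continue          # already converted on an earlier run
    if bzip2 -dc -- "$f" | gzip -c > "$out.part"; then
      mv -- "$out.part" "$out"
    else
      echo "failed: $f" >&2
      rm -f -- "$out.part"
    fi
  done
)
```

Rerunning `convert_sequential` after an interruption picks up where the previous run stopped, because finished files are skipped and partial ones were never renamed.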
Related
When I execute the ls -l -h command, I get output as shown in the image below.
How can the number of the items in a folder be included in the output?
Update
The current output looks like this
total 41M
-rw-r--r-- 1 root root 41M Dec 20 09:56 completed_projects.bson
-rw-r--r-- 1 root root 213 Dec 20 09:57 completed_projects.metadata.json
drwxrwxr-x 2 adipster adipster 4.0K Jun 16 13:22 contents
-rw-rw-r-- 1 adipster adipster 13 Jun 16 13:20 file.py
drwxrwxr-x 4 adipster adipster 4.0K Jun 16 13:22 folder
drwxrwxr-x 2 adipster adipster 4.0K Jun 16 13:21 items
But I'd like to have another column indicating the number of items in each folder, like this:
total 41M
-rw-r--r-- 1 root root 41M Dec 20 09:56 completed_projects.bson
-rw-r--r-- 1 root root 213 Dec 20 09:57 completed_projects.metadata.json
drwxrwxr-x 2 adipster adipster 4.0K Jun 16 13:22 contents 235
-rw-rw-r-- 1 adipster adipster 13 Jun 16 13:20 file.py
drwxrwxr-x 4 adipster adipster 4.0K Jun 16 13:22 folder 19
drwxrwxr-x 2 adipster adipster 4.0K Jun 16 13:21 items 5
where the numbers at the far right represent the number of items in each folder.
You can do something like this:
echo -n "Number of files in folder is: " && ls | wc -l && ls -l
The output should be something like this:
Number of files in folder is: 3
total 279K
-rwxr-xr-x 1 user users 19K Jun 16 00:17 a
-rwxr-xr-x 1 user users 5K Jun 16 00:17 b
-rwxr-xr-x 1 user users 255K Jun 16 00:17 c
You can omit the echo statement. As a note, -n is echo's no-newline flag.
GNU sed can execute the constructed replacement as a shell command when you add the e flag to the s command.
We only count the contents of subdirectories, which are recognized by the leading d:
ls -l | sed -r 's/d(.*) ([^ ]*)/printf "d%s %-20s%s\n" "\1" \2 $(ls \2| wc -l)/e'
EDIT: Solution for directories with spaces in their name.
Parsing ls should be avoided. When you try to fix the above command for directory names with spaces, you might try
# Don't do this
ls -l | sed -r 's/d(.{,48}) (.*)/printf "d%s %-20s%s\n" "\1" "\2" $(ls "\2"| wc -l)/e'
It is time to write a script. Perhaps with find or something like
#!/bin/bash
for i in *; do
printf "%-70s %s\n" "$(/bin/ls -ld "$i")" "$(/bin/ls -d "$i"/* 2>/dev/null| wc -l)"
done
The wc in the subdir will count wrong when filenames have newlines.
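One way around the newline problem is to let the shell count glob matches instead of piping names through wc. This is a sketch, not a drop-in replacement: the function names `count_items` and `list_with_counts` are invented here, and like the `*` in the script above, the count ignores hidden files.

```shell
#!/bin/bash
# Print the number of non-hidden entries in directory "$1".
count_items() {
  set -- "$1"/*
  # An unmatched glob is left as the literal pattern, which does not exist.
  if [ -e "$1" ] || [ -L "$1" ]; then
    echo "$#"
  else
    echo 0
  fi
}

# List the current directory like `ls -ld`, appending a count to directories.
list_with_counts() {
  for entry in *; do
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    if [ -d "$entry" ]; then
      printf '%s %s\n' "$(command ls -ld -- "$entry")" "$(count_items "$entry")"
    else
      command ls -ld -- "$entry"
    fi
  done
}
```

Because each match is one positional parameter, a file name containing a newline still counts as a single item.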
ls() { command ls "$@" | tee >(echo "$(wc -l) items"); }
That uses an output process substitution to run the little "echo" script on its stdin while also displaying stdin (thanks to tee). This way, you don't have to run ls twice.
Usual caveat: output will be incorrect when there's a file with a newline in the name.
I have a folder like this:
drwxrwsr-x+ 1 dz33 dcistat 212 Sep 22 13:34 ./
drwxrwsr-x+ 1 dz33 dcistat 46 Sep 7 13:51 ../
-rw-rw---- 1 qg25 dcistat 542 Sep 15 13:55 createsamplelist.R
-rwxrwxr-x 1 dz33 dcistat 3717 Sep 7 14:15 Freedman-HuEx1.0v2-Analysis.Rnw*
drwxrws---+ 1 dz33 dcistat 0 Sep 22 13:34 Gao/
-rw-rw---- 1 qg25 dcistat 530 Sep 14 17:04 .log
-rwxrwxr-x 1 dz33 dcistat 154 Sep 7 13:44 Makefile*
-rwxrwx--x 1 qg25 dcistat 1191 Sep 15 09:04 pacaroma.R*
-rw-rw---- 1 qg25 dcistat 1741 Sep 14 17:23 pacaroma.Rout
-rw-rw---- 1 qg25 dcistat 4426 Sep 15 16:54 pacmeap.R
-rw-rw---- 1 qg25 dcistat 3230 Sep 14 17:15 .RData
-rw-rw---- 1 qg25 dcistat 0 Sep 14 17:04 .txt
My question is how to move all files belonging to user qg25 into the directory Gao/.
find -maxdepth 1 -user qg25 -exec mv {} Gao/ \;
find /PATH_NAME -user qg25 -exec mv -t /NEW_PATH_NAME {} +
This should do the trick but I would test it on some dummy data.
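For testing on dummy data, the find invocation can be wrapped in a small function. This is a sketch under assumptions: it relies on GNU find (`-maxdepth`) and GNU mv (`-t`), the name `move_owned` is invented for illustration, and `-type f` is added here so that directories such as Gao/ itself are never moved.

```shell
#!/bin/bash
# Move regular files directly under "$1" that are owned by user "$2"
# (a login name or numeric UID) into directory "$3".
move_owned() {
  find "$1" -maxdepth 1 -type f -user "$2" -exec mv -t "$3" -- {} +
}
```

For the folder in the question this would be called as `move_owned . qg25 Gao/`. Replacing `-exec mv -t "$3" -- {} +` with `-print` turns it into a dry run that only lists what would be moved.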
Issue: Folders are treated as files
My Code:
#!/bin/bash
for var in $(ls)
do
echo $var
if [ -e $var ]
then
echo "This is a file"
else
echo "This is not a file"
fi
done
echo All Done
Current Contents of the Root Folder:
-rw-r--r-- 1 stibo stibo 401 Sep 17 2015 id_rsa.pub
drwxrwxr-x 24 stibo stibo 4096 Jul 26 09:25 step
-rwxr-xr-x 1 stibo stibo 51 Jul 27 12:51 test.txt
drwxrwxrwx 2 stibo stibo 4096 Aug 2 10:32 deletionFile
-rwxrw-r-- 1 stibo stibo 225 Aug 2 12:32 deletionScript.vi
-rw-rw-r-- 1 stibo stibo 235 Aug 2 12:33 logdetails.txt
-rw-rw-r-- 1 stibo stibo 123 Aug 2 12:42 path1.txt
-rw-rw-r-- 1 stibo stibo 285 Aug 2 16:18 path2.txt
drwxrwxr-x 2 stibo stibo 4096 Aug 3 10:14 archival
-rw-rw-r-- 1 stibo stibo 0 Aug 3 13:42 ls
-rw-rw-r-- 1 stibo stibo 164732 Aug 3 14:11 messages
-rw-rw-r-- 1 stibo stibo 164732 Aug 3 14:11 wtmp
drwxrwxrwx 2 stibo stibo 4096 Aug 3 14:21 backup
-rwxrwxr-x 1 stibo stibo 160 Aug 4 15:34 newScript.vi
-rw-rw-r-- 1 stibo stibo 160 Aug 4 15:41 Code.txt
-rw-rw-r-- 1 stibo stibo 0 Aug 4 15:43 Details.txt
There are 4 folders and 12 files.
But when I run the script, everything is reported as a file, even the folders.
Can you please let me know where I am going wrong?
With -e you've only tested whether a given entry exists, which is true for files and directories alike. Use -d to test for directories and -f to test for regular files.
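A sketch of how the script could look with those tests, assuming the goal is just to label each entry; the function name `classify_entries` is made up here. It also loops over `*` rather than `$(ls)`, so names with spaces stay in one piece.

```shell
#!/bin/bash
# Label every non-hidden entry of the current directory.
classify_entries() {
  for entry in *; do
    # Skip the literal "*" that is left over when the directory is empty.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    if [ -d "$entry" ]; then
      echo "$entry: directory"
    elif [ -f "$entry" ]; then
      echo "$entry: regular file"
    else
      echo "$entry: other"
    fi
  done
}
```

Note that both -d and -f follow symbolic links, so a link to a directory is reported as a directory.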
I want to add a new column at the beginning of every row. Command used:
tree -afispugD --inodes
I want to put a new column which will be the name of the file.
Example:
119801433 -rwxr--r-- u1915811 alum 1252 Apr 1 21:06 ./file
119802284 -rw-r--r-- u1915811 alum 1255 Mar 20 10:25 ./text.txt
119865862 drwxr-xr-x u1915811 alum 4096 Feb 27 10:20 ./S3/folder2
To this:
file 119801433 -rwxr--r-- u1915811 alum 1252 Apr 1 21:06 ./file
text.txt 119802284 -rw-r--r-- u1915811 alum 1255 Mar 20 10:25 ./text.txt
folder2 119865862 drwxr-xr-x u1915811 alum 4096 Feb 27 10:20 ./S3/folder2
PS: I have to do it because tree command doesn't show names :(
All you need is:
$ awk -F'/' '{print $NF,$0}' file
file 119801433 -rwxr--r-- u1915811 alum 1252 Apr 1 21:06 ./file
text.txt 119802284 -rw-r--r-- u1915811 alum 1255 Mar 20 10:25 ./text.txt
folder2 119865862 drwxr-xr-x u1915811 alum 4096 Feb 27 10:20 ./S3/folder2
or if you want to use some specific spacing in the output use printf instead of print:
$ awk -F'/' '{printf "%-10s%s\n",$NF,$0}' file
file 119801433 -rwxr--r-- u1915811 alum 1252 Apr 1 21:06 ./file
text.txt 119802284 -rw-r--r-- u1915811 alum 1255 Mar 20 10:25 ./text.txt
folder2 119865862 drwxr-xr-x u1915811 alum 4096 Feb 27 10:20 ./S3/folder2
or, since this is a simple substitution on a single line, you could use sed instead of awk:
$ sed 's/\(.*\/\(.*\)\)/\2 \1/' file
file 119801433 -rwxr--r-- u1915811 alum 1252 Apr 1 21:06 ./file
text.txt 119802284 -rw-r--r-- u1915811 alum 1255 Mar 20 10:25 ./text.txt
folder2 119865862 drwxr-xr-x u1915811 alum 4096 Feb 27 10:20 ./S3/folder2
This one will also work if the file has whitespace in its name or if it's a symbolic link:
tree -afispugD --inodes | awk '{FS="./"; ORS=""; printf("%-60s%s\n",$NF,$0)}'
As long as there are no spaces in the filenames, this should work:
tree -afispugD --inodes | awk '{printf("%-30s%s\n",$NF,$0)}'
Using only grep and sed, is there a way I can transform the output of ls -l * into this:
-rw-r--r-- agenda.txt
-rw-r--r-- annuaire.txt
Thanks!
Seeing that you have already got your "answer", here's one of the simpler solutions:
ls -l | tr -s " "| cut -d" " -f1,8-
@OP, sed is "powerful", but sometimes, simplicity is more powerful.
Side note: Don't parse file names like that.
ls -l | sed 's/[ ]+//g' | sed 's/ [0-9].*:.[0-9]/ /g'
ls -altrh| sed -E 's/ +.+ / / g'
Or you can go with ssed which supports Perl Regular Expressions.
I solved your problem using the ssed program, which you can install on any POSIX system. ssed stands for super sed.
So I ran ls -altrh in my home directory.
telsa:~ mahmoh$ ls -altrh
total 136
drwxr-xr-x 5 root admin 170B Jun 24 00:27 ../
drwx------+ 4 mahmoh staff 136B Jun 24 00:27 Pictures/
drwx------+ 3 mahmoh staff 102B Jun 24 00:27 Music/
drwx------+ 3 mahmoh staff 102B Jun 24 00:27 Movies/
drwx------+ 3 mahmoh staff 102B Jun 24 00:27 Desktop/
-rw------- 1 mahmoh staff 3B Jun 24 00:27 .CFUserTextEncoding
drwxr-xr-x+ 5 mahmoh staff 170B Jun 24 00:27 Public/
drwx------+ 5 mahmoh staff 170B Jun 24 02:19 Documents/
-rw-r--r--# 1 mahmoh staff 15K Jun 24 02:19 .DS_Store
drwx------# 36 mahmoh staff 1.2K Jun 24 14:48 Library/
-rw-r--r-- 1 mahmoh staff 279B Jun 24 15:27 .profile~
-rw-r--r--# 1 mahmoh staff 14K Jun 24 15:29 .vimrc
-rw-r--r-- 1 mahmoh staff 279B Jun 24 15:30 .profile
drwx------ 2 mahmoh staff 68B Jun 24 15:46 .Trash/
drwxr-xr-x 3 mahmoh staff 102B Jun 24 20:26 .mplayer/
-rw------- 1 mahmoh staff 3.5K Jun 24 22:11 .bash_history
-rw------- 1 mahmoh staff 42B Jun 24 23:25 .lesshst
-rw-r--r-- 1 mahmoh staff 3.6K Jun 24 23:39 temp
-rw-r--r-- 1 mahmoh staff 3.3K Jun 24 23:43 rtorrent.rc~
drwxr-xr-x 5 mahmoh staff 170B Jun 24 23:52 torrents/
-rw-r--r-- 1 mahmoh staff 3.3K Jun 24 23:56 .rtorrent.rc~
-rw------- 1 mahmoh staff 3.7K Jun 24 23:56 .viminfo
-rw-r--r-- 1 mahmoh staff 3.3K Jun 24 23:56 .rtorrent.rc
drwxr-xr-x+ 25 mahmoh staff 850B Jun 24 23:56 ./
drwx------+ 10 mahmoh staff 340B Jun 24 23:58 Downloads/
Now watch.
telsa:~ mahmoh$ ls -altrh| ssed -R -e 's/ +.+ / / g'
total 136
drwxr-xr-x ../
drwx------+ Pictures/
drwx------+ Music/
drwx------+ Movies/
drwx------+ Desktop/
-rw------- .CFUserTextEncoding
drwxr-xr-x+ Public/
drwx------+ Documents/
-rw-r--r--# .DS_Store
drwx------# Library/
-rw-r--r-- .profile~
-rw-r--r--# .vimrc
-rw-r--r-- .profile
drwx------ .Trash/
drwxr-xr-x .mplayer/
-rw------- .bash_history
-rw------- .lesshst
-rw-r--r-- temp
-rw-r--r-- rtorrent.rc~
drwxr-xr-x torrents/
-rw-r--r-- .rtorrent.rc~
-rw------- .viminfo
-rw-r--r-- .rtorrent.rc
drwxr-xr-x+ ./
drwx------+ Downloads/
ls -l | sed 's/^\([^\t ]\+\)\(.*:.[^ \t]\+\)\(.\+\)/\1 \3/'
Here is a working command. The slightly tricky thing is that ls -l will print the year for files that are older than some time (6 months) and hh:mm for newer files.
ls -l | sed 's/ .*[0-9]* .*[A-Z][a-z][a-z] [ 0-9][0-9] \{1,2\}[0-9][0-9]:*[0-9][0-9] / /'
For the following example
drwxr-xr-x 39 root root 1024 Feb 19 08:58 /
the starting .* will match 39 root root 1024 and then the rest of the regular expression matches month name (so you might restrict a-z to fewer characters) followed by year or hh:mm.
Why not use awk instead of sed? awk is built for stuff like this.
See this manual page for more about fields in awk.
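A minimal sketch of what the awk approach might look like (it steps outside the grep-and-sed constraint of the question). The function name `perms_and_name` is made up, and the sketch assumes file names without spaces, because `$NF` is only the last whitespace-separated field.

```shell
#!/bin/bash
# Read `ls -l` output on stdin and keep only the first field (permissions)
# and the last field (name), skipping the "total" summary line.
perms_and_name() {
  awk '$1 != "total" { print $1, $NF }'
}
```

It would be used as `ls -l | perms_and_name`.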
Like this?
ls -l | sed 's/ [0-9].*:.[0-9] / /' | less
Transforms
-rw-r--r-- 1 tomislav tomislav 609 2009-11-26 10:32 Test.class
-rw-r--r-- 1 tomislav tomislav 46 2009-12-14 12:16 test.groovy
into
-rw-r--r-- Test.class
-rw-r--r-- test.groovy