Finding which files are taking up the most space - macOS

In a Mac terminal, I want to find out which files are the biggest in my project.
I try:
du -h | sort
But this sorts by path first and only then by file size within each path. How do I sort just by file size?
Thanks

Try:
du -scm * | sort -n
If you want to have it as a nice zsh function, you can use this:
function dudir () { du -scm ${1:-*(ND)} | sort -n }
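For example, running it inside a project directory might look like this (the directory names and sizes here are purely illustrative; du -scm reports megabytes and appends a total line, which sort -n pushes to the bottom):
$ cd ~/project
$ dudir
1       .git
15      assets
87      build
103     total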

Sort numerically and in reverse:
$ du -sk * | sort -nr
190560 find_buggy_pos.out
126676 DerivedData
29460 fens.txt
11108 cocos2d_html.tar.gz
484 ccore.log
164 ccore.out
16 a.out.dSYM
12 x
12 p
12 o
12 a.out
4 x.txt
4 trash.c
4 test2.cpp
4 test.cpp
4 stringify.py
4 ptest.c
4 o.cpp
4 mismatch.txt
4 games.pgn

It appears that you want to list files by size. Try:
find . -type f -printf "%s %p\n" | sort -n
(By default, du doesn't list counts for files. Use the -a or --all option to list counts for files as well.)
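Note that -printf is a GNU find extension; the BSD find that ships with macOS doesn't have it. A rough equivalent there uses BSD stat, where %z is the size in bytes and %N the file name:
find . -type f -exec stat -f "%z %N" {} + | sort -nr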

On OS X the following works:
find . -maxdepth 1 -type f -exec du -k {} \; | sort -nr
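Using the + terminator instead of \; batches many files into each du invocation, which is noticeably faster in large directories:
find . -maxdepth 1 -type f -exec du -k {} + | sort -nr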

Use the -k option:
du -sk * | sort -n
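If you only care about the largest few entries, cap the output with head, e.g. the ten biggest:
du -sk * | sort -nr | head -10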

Related

Moving all files from subfolders to main folders with duplicate file names

I've been trying to write a little script to sort image files on my Linux server.
I tried multiple solutions found all over StackExchange, but none of them meets my requirements.
Explanation:
Each photo_folder is filled with images (various extensions).
Mostly, the images are already in this folder.
But sometimes, as in the example below, images are hidden in one or more photo_subfolder directories, and the file names are often the same in each of them, such as 1.jpg, 2.jpg...
Basically, I would like to move all image files from each photo_subfolder into its photo_folder, renaming any duplicated file names before merging them together.
Example:
|parent_folder
| |photo_folder
| | |photo_subfolder1
| | | 1.jpg
| | | 2.jpg
| | | 3.jpg
| | |photo_subfolder2
| | | 1.jpg
| | | 2.jpg
| | | 3.jpg
| | |photo_subfolder3
| | | 1.jpg
| | | 2.jpg
| | | 3.jpg
Expectation:
|parent_folder
| |photo_folder
| | 1_a.jpg
| | 2_a.jpg
| | 3_a.jpg
| | 1_b.jpg
| | 2_b.jpg
| | 3_b.jpg
| | 1_c.jpg
| | 2_c.jpg
| | 3_c.jpg
Note that the file names are just an example; they could be anything.
Thank you!
You can replace the / of the subdirectories with another character, e.g. _, and then cp/mv the original file to the parent directory.
I'll recreate an example of your directory tree here; it's very simple, but I hope it can be adapted to your case. Note that I am using bash.
#!/bin/bash
bd=parent
mkdir "${bd}"
for i in $(seq 3); do
    mkdir -p "${bd}/photoset_${i}/subset_${i}"
    for j in $(seq 5); do
        touch "${bd}/photoset_${i}/${j}.jpg"
        touch "${bd}/photoset_${i}/${j}.png"
        touch "${bd}/photoset_${i}/subset_${i}/${j}.jpg"
        touch "${bd}/photoset_${i}/subset_${i}/${j}.gif"
    done
done
Here is the script that will cp the files from the subdirectories to the parent directory. Basically:
find all the files recursively in the subdirectories and loop over them
use sed to replace / with _ and store the result in a variable new_filepath (I also remove the initial parent_, but this is optional)
copy (or move) the old filepath into parent under the file name new_filepath
for xtension in jpg png gif; do
    while IFS= read -r -d '' filepath; do
        new_filepath=$(echo "${filepath}" | sed 's#/#_#g')
        cp "${filepath}" "${bd}/${new_filepath}"
    done < <(find "${bd}" -type f -name "*.${xtension}" -print0)
done
ls "${bd}"
If you also want to remove the additional parent_ prefix from new_filepath, you can replace the assignment above with:
new_filepath=$(echo "${filepath}" | sed 's#/#_#g' | sed "s/${bd}_//g")
I assumed that you define all the possible extensions in the script. Otherwise, to find all the extensions in the directory tree, you can use the following snippet from a previous answer:
find . -type f -name '*.*' | sed 's|.*\.||' | sort -u
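Putting the two pieces together, the discovered extensions can drive the copy loop directly. A sketch, assuming the same bd variable as above; -mindepth 2 is a GNU find option and keeps the loop from re-matching the copies it has just placed in the parent directory:
for xtension in $(find "${bd}" -mindepth 2 -type f -name '*.*' | sed 's|.*\.||' | sort -u); do
    while IFS= read -r -d '' filepath; do
        new_filepath=$(echo "${filepath}" | sed 's#/#_#g')
        cp "${filepath}" "${bd}/${new_filepath}"
    done < <(find "${bd}" -mindepth 2 -type f -name "*.${xtension}" -print0)
done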

Concatenate many files into one file without the header

I have three csv files (with the same name, e.g. A_bestInd.csv) that are located in different subfolders. I want to concatenate all of them into one file (e.g. All_A_bestInd.csv). To do that, I did the following:
{ find . -type f -name A_bestInd.csv -exec cat '{}' \; ; } >> All_A_bestInd.csv
The result of this command is the following:
Class Conf 1 2 3 4 //header of file1
A Reduction 5 1 2 1
A Reduction 1 8 1 10
Class Conf 1 2 3 4 //header of file2
A No_red 2 1 3 2
A No_red 3 6 1 9
Class Conf 1 2 3 4 //header of file3
A Reduction 5 5 8 9
A Reduction 7 2 1 11
As you can see, the issue is that the header of each file is copied. How can I change my command to keep only one header and drop the rest?
Use tail +2 (POSIX form: tail -n +2) to trim the header from each file.
find . -type f -name A_bestInd.csv -exec tail +2 {} \; >> All_A_bestInd.csv
To keep just one header you could combine it with head -1:
{ find . -type f -name A_bestInd.csv -exec head -1 {} \; -quit
find . -type f -name A_bestInd.csv -exec tail +2 {} \; ; } >> All_A_bestInd.csv
There are solutions with tail +2 and awk, but it seems to me the classic way to print all but the first line of a file is sed: sed -e 1d. So:
find . -type f -name A_bestInd.csv -exec sed -e 1d '{}' \; >> All_A_bestInd.csv
Use awk to filter out header lines from all files but the first (unless you have thousands of them):
find . -type f -name 'A_bestInd.csv' -exec awk 'NR==1 || FNR>1' {} + > 'All_A_bestInd.csv'
NR==1 || FNR>1 means: if the current line is the first line of the whole input, or it is anything after the first line of the current file, print it.
$ cat A_bestInd.csv
Class Conf 1 2 3 4 //header of file3
A Reduction 5 5 8 9
A Reduction 7 2 1 11
$
$ cat foo/A_bestInd.csv
Class Conf 1 2 3 4 //header of file1
A Reduction 5 1 2 1
A Reduction 1 8 1 10
$
$ cat bar/A_bestInd.csv
Class Conf 1 2 3 4 //header of file2
A No_red 2 1 3 2
A No_red 3 6 1 9
$
$ find . -type f -name 'A_bestInd.csv' -exec awk 'NR==1 || FNR>1' {} + > 'All_A_bestInd.csv'
$
$ cat All_A_bestInd.csv
Class Conf 1 2 3 4 //header of file1
A Reduction 5 1 2 1
A Reduction 1 8 1 10
A Reduction 5 5 8 9
A Reduction 7 2 1 11
A No_red 2 1 3 2
A No_red 3 6 1 9
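The caveat about thousands of files exists because find then splits the argument list across several awk invocations; NR restarts in each one, so an extra header slips through. In that case the head/tail combination shown earlier in this thread is the safer route:
{ find . -type f -name 'A_bestInd.csv' -exec head -1 {} \; -quit
find . -type f -name 'A_bestInd.csv' -exec tail +2 {} \; ; } >> All_A_bestInd.csv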

Find most frequent line in file in bash

Suppose I have a file like the following:
Abigail 85
Kaylee 25
Kaylee 25
kaylee
Brooklyn
Kaylee 25
kaylee 25
I would like to find the most repeated line; the output must be just the line.
I've tried
sort list | uniq -c
but I need clean output: just the most repeated line (in this example, Kaylee 25).
Kaizen ~
$ sort zlist | uniq -c | sort -r | head -1 | xargs | cut -d" " -f2-
Kaylee 25
Does this help?
IMHO, none of these answers will sort the results correctly. The reason is that sort, without the -n option, will sort like "1 10 11 2 3 4" instead of "1 2 3 4 10 11". So, add -n like so:
sort zlist | uniq -c | sort -n -r | head -1
You can then, of course, pipe that to either xargs or sed as described earlier.
With awk:
awk '{a[$0]++; if(m<a[$0]){ m=a[$0];s[m]=$0}} END{print s[m]}' t.lis
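Spelled out with comments, that one-liner works like this (same logic, just expanded):
awk '{
    a[$0]++                    # count occurrences of each line
    if (m < a[$0]) {           # did this line set a new maximum?
        m = a[$0]              # remember the highest count so far
        s[m] = $0              # and the line that reached it
    }
} END { print s[m] }' t.lis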
$ sort list | uniq -c | sort -nr | head -1 | awk '{$1=""}1'
Kaylee 25
Is this what you're looking for?

Bash: Limit output of ls and grep

Let me present an example and then try to explain my problem:
noob#noob:~/Downloads$ ls | grep srt$
Elementary - 01x01 - Pilot.LOL.English.HI.C.orig.Addic7ed.com.srt
Haven - 01x01 - Welcome to Haven.DVDRip.SAiNTS.English.updated.Addic7ed.com.srt
Haven - 01x01 - Welcome to Haven.FQM.English.HI.C.updated.Addic7ed.com.srt
Supernatural - 08x01 - We Need to Talk About Kevin.LOL.English.HI.C.updated.Addic7ed.com.srt
The Big Bang Theory - 06x02 - The Decoupling Fluctuation.LOL.English.HI.C.orig.Addic7ed.com.srt
Torchwood - 1x01 - Everything changes.0TV.English.orig.Addic7ed.com.srt
Torchwood - 1x01 - Everything changes.divx.English.updated.Addic7ed.com.srt
Now I only want to delete the first four results of the above command. Normally, if I had to delete all the files, I would do ls | grep srt$ | xargs -I {} rm {}, but in this case I only want to delete the top four.
So, how can I limit the output of ls and grep, or what alternative way would achieve this?
You can pipe your commands to head -n to limit the output to n lines:
ls | grep 'srt$' | head -4
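To actually delete those four, feed the list back through xargs the same way as in the question (the -I {} form keeps the spaces in the file names intact):
ls | grep 'srt$' | head -4 | xargs -I {} rm {}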
$ for i in `seq 1 345`; do echo $i ;done | sed -n '1,4p'
1
2
3
4
geee: ~
$ for i in `seq 1 345`; do echo $i ;done | sed -n '335,360p'
335
336
337
338
339
340
341
342
343
344
345
If you don't have too many files, you can use a bash array:
matching_files=( *.srt )
rm "${matching_files[@]:0:4}"

Script using find to count lines of code

I'm trying to create a shell script that will count the number of lines of code in one folder.
I got this:
h=find . -type f -name \*.[h]* -print0 | xargs -0 cat | wc -l
m=find . -type f -name \*.[m]* -print0 | xargs -0 cat | wc -l
expr $m + $h
But when I try to run it, I get this:
lines-of-code: line 6: .: -t: invalid option
.: usage: . filename [arguments]
0
lines-of-code: line 7: .: -t: invalid option
.: usage: . filename [arguments]
0
+
I know I have to do something to make it run on the specific folder I'm in. Is this even possible?
DDIYS (don't do it yourself): use cloc instead. An excellent tool written in Perl that does the counting for you, as well as other things. It recognizes more than 80 languages.
Example output:
prompt> cloc perl-5.10.0.tar.gz
4076 text files.
3883 unique files.
1521 files ignored.
http://cloc.sourceforge.net v 1.50  T=12.0 s (209.2 files/s, 70472.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Perl                          2052         110356         130018         292281
C                              135          18718          22862         140483
C/C++ Header                   147           7650          12093          44042
Bourne Shell                   116           3402           5789          36882
Lisp                             1            684           2242           7515
make                             7            498            473           2044
C++                             10            312            277           2000
XML                             26            231              0           1972
yacc                             2            128             97           1549
YAML                             2              2              0            489
DOS Batch                       11             85             50            322
HTML                             1             19              2             98
-------------------------------------------------------------------------------
SUM:                          2510         142085         173903         529677
-------------------------------------------------------------------------------
Quote the commands like:
h=$(find . -type f -name '*.[h]*' -print0 | xargs -0 cat | wc -l)
Please also have a look at sloccount for counting lines of code. You can install it on Debian/Ubuntu with sudo apt-get install sloccount.
For this specific problem, I have a different solution:
find . -type f -print0 | wc --files0-from=-
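Note that --files0-from is a GNU wc option (the BSD wc on macOS doesn't have it). Restricted to the .h/.m files from the question, that approach looks like this:
find . -type f \( -name '*.[h]*' -o -name '*.[m]*' \) -print0 | wc -l --files0-from=-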
Maybe I misunderstood the question, but does this work for you?
wc -l *.[mh]*
Now it works!
h=$(find . -type f -name \*.[h]* -print0 | xargs -0 cat | wc -l)
m=$(find . -type f -name \*.[m]* -print0 | xargs -0 cat | wc -l)
expr $m + $h
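A small modernization: bash arithmetic expansion avoids the external expr call entirely:
echo $((m + h))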
