Concatenated file weighs less than the sum of the files - filesize

I ran these commands to concatenate the files into one file:
$ ls -1 | wc -l
16916
$ ls -1 *.txt | wc -l
16916
$ ls -lh | head -1
total 93M
$ cat *.txt > ../nectar_3.txt
$ ls -lh ../nectar_3.txt
-rw-r--r-- 1 llopis llopis 52M May 25 16:03 ../nectar_3.txt
Why is the resulting file size half of the sum of the sizes of all the files? The only explanation I can find is rounding in the ls -lh command, but I couldn't find anything to confirm it (ls -lk outputs almost the same: 92.76953125M).

The total is rounded: it counts the disk blocks allocated to the files, not their byte sizes, so it is not guaranteed to match the sum of the sizes.
Simple example:
marc@panic$ ls -lk
total 24
-rw-r--r-- 1 marc marc 6000 May 25 08:39 test1.txt
-rw-r--r-- 1 marc marc 7000 May 25 08:39 test2.txt
-rw-r--r-- 1 marc marc 8000 May 25 08:39 test3.txt
Three simple files, total size = 21,000 bytes, yet the total shows 24, because each file is rounded up to a whole number of filesystem blocks (8 KiB per file here).
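To compare the two figures directly, a minimal sketch along these lines may help. It recreates the three files in a scratch directory (the tmpdir and byte_total names are only for illustration), sums their byte sizes, and then asks du for the allocated space. The du figure depends on the filesystem, so no particular value is assumed here.

```shell
# Create three files of known sizes in a scratch directory.
tmpdir=$(mktemp -d)
head -c 6000 /dev/zero > "$tmpdir/test1.txt"
head -c 7000 /dev/zero > "$tmpdir/test2.txt"
head -c 8000 /dev/zero > "$tmpdir/test3.txt"

# Sum of the byte sizes: this is what a concatenated file would weigh.
byte_total=$(cat "$tmpdir"/*.txt | wc -c | tr -d ' ')
echo "bytes: $byte_total"

# Disk space actually allocated, in 1 KiB units; varies with the filesystem.
du -sk "$tmpdir"
```

The byte sum is the number to compare against the size of the concatenated file; the ls total and the du figure describe disk usage instead.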


Bash - Version Numbers in Filenames. How to list latest versions only?

I have a directory of versioned files. The version of each file is indicated within its filename, e.g. "_v1".
Example
List of files shown by ls:
123_FileA_v1.txt
123_FileA_v2.txt
132_FileB_v1.txt
I want to run a command to see only the latest versions:
123_FileA_v2.txt
132_FileB_v1.txt
My first attempt was to list files by mtime using
ls -ltr
But in my case this doesn't give reliable results. I really want to take the versions from the filenames.
What would be the best way to do it?
This will do it:
ls | awk -F '_' '!prefixes[$1]++'
Hope it helps!
Edit:
If you want to see specific info you can do:
ls | awk -F '_' '!prefixes[$1]++' | xargs ls -lh
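This will work as long as there are no spaces in your filenames. Note that awk keeps the first name it sees for each prefix, so the latest version must be listed first (for example with ls -r).
Edit:
As requested by @PaulHodges, here is the sample output: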
$ ls -lh
total 0
drwxr-xr-x 5 Matias-Barrios Matias-Barrios 160B Feb 27 11:40 .
drwxr-xr-x 106 Matias-Barrios Matias-Barrios 3.3K Feb 27 11:39 ..
-rw-r--r-- 1 Matias-Barrios Matias-Barrios 0B Feb 27 11:40 132_FileB_v1.txt
-rw-r--r-- 1 Matias-Barrios Matias-Barrios 0B Feb 27 11:40 123_FileA_v2.txt
-rw-r--r-- 1 Matias-Barrios Matias-Barrios 0B Feb 27 11:40 123_FileA_v1.txt
$ ls | awk -F '_' '!prefixes[$1]++'
.
..
132_FileB_v1.txt
123_FileA_v2.txt
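The awk one-liner keeps whichever file is listed first for each prefix. A variant that compares the version numbers instead could be sketched as below. The function name latest_versions is hypothetical. The sketch assumes every name has the form <prefix>_v<number>.<extension>, and it groups files by everything before "_v" rather than by the first field only.

```shell
# Read file names on stdin; print the highest-numbered version of each file.
latest_versions() {
  awk '
    match($0, /_v[0-9]+\./) {
      key = substr($0, 1, RSTART - 1)                # e.g. 123_FileA
      ver = substr($0, RSTART + 2, RLENGTH - 3) + 0  # digits between "_v" and "."
      if (!(key in best) || ver > best[key]) {
        best[key] = ver
        file[key] = $0
      }
    }
    END { for (key in file) print file[key] }
  ' | sort
}
```

It can be used as ls | latest_versions. Because the versions are compared as numbers, _v10 ranks above _v2, which a purely alphabetical listing would not guarantee.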
You could do something like
(
PATTERN="[0-9]{3}_[^_]*"
# Build the list of unique prefixes, then print the last file of each one.
for prefix in $(find . | grep -Eo "$PATTERN" | sort -u);
do
  ls "$prefix"* | tail -1;
done
)
It will print
123_FileA_v2.txt
132_FileB_v1.txt
What happens here?
The surrounding parentheses ( ) run the code in a subshell, so it can be copied and pasted as one unit.
The variable PATTERN holds a regular expression that matches a file prefix: three digits, an underscore, and everything up to the next underscore.
The for prefix in $(find . | grep -Eo "$PATTERN" | sort -u) loop generates a list of unique file prefixes.
The ls "$prefix"* lists all files with the same prefix in alphanumerical order.
The | tail -1 shows only the last entry of the former ls "$prefix"*.
Edit
I decided to use find . instead of ls *. With that I hope to circumvent the issues with ls *. Please correct me, if I'm wrong!

OSX How to have ls -l sort in alphabetical order and list directories and files together

I want my ls -l command to list both files and directories together rather than separating them. I also want a case insensitive list. For example, the following commands create the directories a and C and also the file b.txt:
% mkdir a C
% touch b.txt
Then I list them
tyler@Tylers-MacBook-Pro test % ls -l
total 0
drwxr-xr-x 2 tyler staff 64 Feb 12 12:06 C
drwxr-xr-x 2 tyler staff 64 Feb 12 12:06 a
-rw-r--r-- 1 tyler staff 0 Feb 12 12:06 b.txt
Note how the order is C, a, b.txt. I want it to list: a, b.txt, C (like this):
tyler@Tylers-MacBook-Pro test % ls -l
total 0
drwxr-xr-x 2 tyler staff 64 Feb 12 12:06 a
-rw-r--r-- 1 tyler staff 0 Feb 12 12:06 b.txt
drwxr-xr-x 2 tyler staff 64 Feb 12 12:06 C
How do I get this case-insensitive list that doesn't separate files and directories?
Combined with sort, this should be what's required:
ls -l | sort -f -k 9,9
-f -k 9,9 means sort case-insensitively (-f) by the 9th column (-k 9,9).
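As a rough check of the command above, the sketch below recreates the three entries from the question in a scratch directory (demo and listing are illustrative names). It adds two assumptions that are not in the answer: tail -n +2 drops the "total" line so that it is not sorted in with the entries, and LC_ALL=C is set so that, on typical systems, the date takes three fields and the name is the 9th column.

```shell
# Scratch directory with the same three entries as in the question.
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/C"
touch "$demo/b.txt"

# Drop the "total" line, then sort case-insensitively on the name column.
listing=$(LC_ALL=C ls -l "$demo" | tail -n +2 | LC_ALL=C sort -f -k 9,9)
printf '%s\n' "$listing"
```

Names that contain spaces spill into further columns, so only the part of the name before the first space takes part in the sort.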

Zip two files with the same content, but the final md5sum is different

I ran the following on my Mac:
$ echo "dgrgrrgrgrg" > test1.txt
After a few seconds, I copied test1.txt:
$ cp test1.txt test2.txt
$ ls -l
total 16
-rw-r--r-- 1 hqfy staff 12 Mar 31 10:18 test1.txt
-rw-r--r-- 1 hqfy staff 12 Mar 31 10:19 test2.txt
Now check the md5sum:
$ md5 *.txt
MD5 (test1.txt) = 8bab5a3e202c901499d83cb25d5a8c80
MD5 (test2.txt) = 8bab5a3e202c901499d83cb25d5a8c80
It's obvious that test1.txt and test2.txt have the same md5sum. Now I zip these two files:
$ zip -X test1.zip test1.txt
adding: test1.txt (deflated 8%)
$ zip -X test2.zip test2.txt
adding: test2.txt (deflated 8%)
$ ls -l
total 32
-rw-r--r-- 1 hqfy staff 12 Mar 31 10:18 test1.txt
-rw-r--r-- 1 hqfy staff 127 Mar 31 10:22 test1.zip
-rw-r--r-- 1 hqfy staff 12 Mar 31 10:19 test2.txt
-rw-r--r-- 1 hqfy staff 127 Mar 31 10:23 test2.zip
The sizes of test1.zip and test2.zip are the same, but when I check the md5sum:
$ md5 *.zip
MD5 (test1.zip) = af8783f96ce98aef717ecf6229ffb07e
MD5 (test2.zip) = 59e752a03a2930adbe7f30b9cbf14561
I've googled it and tried zip with the -X option, but it did not work in my case. How can I create the two zip files with the same md5sum?
Quoting from the zip man page:
With -X, zip strips all old fields and only includes the Unicode and
Zip64 extra fields (currently these two extra fields cannot be
disabled).
So, a different md5sum is expected when zipping (even with -X).
I know that this question is very old, but I may have an answer for you:
The timestamps of the two files (which are clearly different) are included in the .zip file, and so are the file names. That is why the md5sums are different. If the stored names and timestamps are made identical, the md5sums should be the same.
Also note that macOS adds a folder (__MACOSX) to a zip file that contains extra metadata and such. That may also be the issue.
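One way to try this out is sketched below, assuming the Info-ZIP zip command is installed. The directory names a and b and the pinned date are arbitrary choices for illustration. Since an archive also records the name of each member, the sketch stores the file under the same name in both archives and sets both modification times to the same instant before zipping with -X.

```shell
# Two scratch directories holding a file with the same name and content.
workdir=$(mktemp -d)
mkdir "$workdir/a" "$workdir/b"
echo "dgrgrrgrgrg" > "$workdir/a/test.txt"
cp "$workdir/a/test.txt" "$workdir/b/test.txt"

# Pin both modification times to the same arbitrary instant.
touch -t 202001010000 "$workdir/a/test.txt" "$workdir/b/test.txt"

# Zip from inside each directory so the stored member name is identical.
(cd "$workdir/a" && zip -X -q ../a.zip test.txt)
(cd "$workdir/b" && zip -X -q ../b.zip test.txt)

# cmp -s exits 0 only when the two archives are byte-for-byte identical.
if cmp -s "$workdir/a.zip" "$workdir/b.zip"; then
  zip_result="identical"
else
  zip_result="different"
fi
echo "$zip_result"
```

If the two archives compare as identical, their md5sums match as well. Differences in file permissions or in the zip version used would still show up as different bytes.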

concatenate grep output to an echo statement in UNIX

I am trying to output the number of directories in a given path on a SINGLE line. My desire is to output this:
X-many directories
Currently, with my Bash script, I get this:
X-many
directories
Here's my code:
ARGUMENT=$1
ls -l $ARGUMENT | egrep -c '^drwx'; echo -n "directories"
How can I fix my output? Thanks
I suggest
echo "$(ls -l "$ARGUMENT" | egrep -c '^drwx') directories"
This uses the shell's feature of final newline removal for command substitution.
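The newline removal can be seen in isolation with a small sketch like the following, where printf stands in for the counting command (the variable names are illustrative only).

```shell
# printf emits "3" followed by a newline, much as grep -c would.
count=$(printf '3\n')

# The substitution dropped the trailing newline, so the text stays on one line.
line="$count directories"
echo "$line"
```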
Do not parse ls output to count directories, as you can get wrong results if file or directory names contain special characters.
To count directories use:
shopt -s nullglob
arr=( "$ARGUMENT"/*/ )
echo "${#arr[@]} directories"
The / at the end of the glob makes sure it matches only directories in the "$ARGUMENT" path.
shopt -s nullglob makes the glob expand to nothing if it matches no directory in the given argument.
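For shells without nullglob or arrays, the same glob idea could be sketched with a plain loop. The function name count_dirs is hypothetical. The -d test takes over the role of nullglob: when nothing matches, the pattern is left as a literal string, which is not a directory and is therefore skipped.

```shell
# Count the directories directly inside the path given as $1.
count_dirs() {
  n=0
  for d in "$1"/*/; do
    # An unmatched glob stays literal and fails this test.
    if [ -d "$d" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

A call such as echo "$(count_dirs "$ARGUMENT") directories" prints the count on one line. As with the array version, hidden directories are not matched by the glob.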
As an alternative solution:
$ bc <<< "$(find /etc -maxdepth 1 -type d | wc -l)-1"
116
Another one:
$ count=0; while read curr_line; do count=$((count+1)); done < <(ls -l ~/etc | grep ^d); echo ${count}
116
This works correctly with spaces in the folder name:
$ ls -la
total 20
drwxrwxr-x 5 alex alex 4096 Jun 30 18:40 .
drwxr-xr-x 11 alex alex 4096 Jun 30 16:41 ..
drwxrwxr-x 2 alex alex 4096 Jun 30 16:43 asdasd
drwxrwxr-x 2 alex alex 4096 Jun 30 16:43 dfgerte
drwxrwxr-x 2 alex alex 4096 Jun 30 16:43 somefoler with_space
$ count=0; while read curr_line; do count=$((count+1)); done < <(ls -l ./ | grep ^d); echo ${count}
3

Mass replace characters in filenames from terminal?

I have about 50 files in a directory that contain spaces, apostrophes, etc. How can I go about mass-renaming them to remove the apostrophes and replace spaces with underscores?
I can do
ls | grep '*.txt' | xargs ....
but I'm not sure what to do in the xargs bit
I use ren-regexp, which is a Perl script that lets you mass-rename files very easily.
You'd do something like ren-regexp 's/ /_/g' *.txt.
$ ls -l
total 16
-rw-r--r-- 1 marc marc 7 Apr 11 21:18 That's a wrap.txt
-rw-r--r-- 1 marc marc 6 Apr 11 21:18 What's the time.txt
$ ren-regexp "s/\'//g" "s/ /_/g" *.txt
That's a wrap.txt
1 Thats a wrap.txt
2 Thats_a_wrap.txt
What's the time.txt
1 Whats the time.txt
2 Whats_the_time.txt
$ ls -l
total 16
-rw-r--r-- 1 marc marc 7 Apr 11 21:18 Thats_a_wrap.txt
-rw-r--r-- 1 marc marc 6 Apr 11 21:18 Whats_the_time.txt
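If ren-regexp is not available, the same two substitutions could be sketched with a plain shell loop. The function name rename_clean is hypothetical, and the sketch only touches *.txt files directly inside the directory it is given.

```shell
# Remove apostrophes and turn spaces into underscores in *.txt names under $1.
rename_clean() {
  for f in "$1"/*.txt; do
    # Skip the literal pattern when no file matches.
    [ -e "$f" ] || continue
    base=${f##*/}
    new=$(printf '%s' "$base" | tr -d "'" | tr ' ' '_')
    # Only rename when the cleaned name actually differs.
    if [ "$base" != "$new" ]; then
      mv -- "$f" "$1/$new"
    fi
  done
}
```

Calling rename_clean . processes the current directory. Plain mv overwrites an existing file that already has the cleaned name, so it is worth checking for clashes first.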
