How do I find the size of a very small file? - shell

Suppose I have a very simple ASCII file that only contains
11111111
Now I want a command that tells me how many bytes it really has, not how many bytes the system allocated for it. I tried
ls -s
and
du
but they only output
4
I think that's the number of blocks the system allocates for this file. How can I use a command to find the size of such a small file?

With GNU du, use du -b (short for --apparent-size --block-size=1) to see the size of the file in bytes.
$ du -b file
9 file

wc -c will do:
$ echo "11111111" > file
$ wc -c file
9 file

You can use the stat command to get information on a file. For instance, the size of file in bytes is:
$ echo "11111111" > file
$ stat -c %s file
9
Type man stat to see all of the other useful things it can tell you about a file. (stat -c is GNU syntax; on BSD and macOS the equivalent is stat -f %z file.)
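du -b and stat -c %s are GNU extensions, so they may be missing on macOS or BSD. Here is a minimal portable sketch that relies only on POSIX wc -c; the function name file_size is hypothetical.

```shell
#!/bin/sh
# file_size FILE: print the size of FILE in bytes (its apparent size,
# not the blocks allocated on disk). Uses only POSIX wc -c.
file_size() {
    # Reading via "<" makes wc print only the count, without the file name;
    # BSD/macOS wc pads the count with spaces, so strip them.
    wc -c < "$1" | tr -d ' '
}

# Demo: eight 1s plus the newline that echo appends = 9 bytes.
tmp=$(mktemp)
echo "11111111" > "$tmp"
file_size "$tmp"    # prints 9
rm -f "$tmp"
```

Redirecting the file into wc (rather than passing it as an argument) is what keeps the file name out of the output, so the result can be used directly in arithmetic.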

Related

Convert between byte count and "human-readable" string

Is there a shell command that simply converts back and forth between a number string in bytes and the "human-readable" number string offered by some commands via the -h option?
To clarify the question: ls -l without the -h option (some output suppressed)
> ls -l
163564736 file1.bin
13209 file2.bin
gives the size in bytes, while with the -h option (some output suppressed)
> ls -lh
156M file1.bin
13K file2.bin
the size is human readable in kilobytes and megabytes.
Is there a shell command that simply turns 163564736 into 156M and 13209 into 13K and also does the reverse?
Use numfmt (part of GNU coreutils).
To human-readable:
echo "163564736" | numfmt --to=iec
From human-readable:
echo "156M" | numfmt --from=iec
There is no standard (cross-platform) tool for this, but a solution using awk is described here.
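A minimal sketch of the awk approach, assuming POSIX awk and IEC (1024-based) units; the helper names to_iec and from_iec are hypothetical. Unlike numfmt, to_iec always rounds up to a whole unit (numfmt shows one decimal place for values below 10).

```shell
#!/bin/sh
# to_iec BYTES: 163564736 -> 156M (1024-based units, rounded up to a whole unit).
to_iec() {
    echo "$1" | awk '{
        n = $1; i = 0
        split("K M G T P E", unit, " ")
        while (n >= 1024 && i < 6) { n /= 1024; i++ }   # pick the largest unit that fits
        if (i == 0) print n                              # under 1K: plain bytes
        else { r = int(n); if (r < n) r++; print r unit[i] }  # round up (away from zero)
    }'
}

# from_iec SIZE: 156M -> 163577856 (accepts an optional K/M/G/T/P/E suffix).
from_iec() {
    echo "$1" | awk '{
        s = toupper($1); mult = 1
        p = index("KMGTPE", substr(s, length(s), 1))    # 0 if the last char is a digit
        if (p > 0) {
            s = substr(s, 1, length(s) - 1)             # strip the suffix
            for (k = 0; k < p; k++) mult *= 1024
        }
        printf "%.0f\n", s * mult
    }'
}

to_iec 13209      # prints 13K
from_iec 156M     # prints 163577856
```

Note that the round trip is lossy: 163564736 becomes 156M, which converts back to 163577856.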

Command to list all file types and their average size in a directory

I am working on a specific project where I need to work out the make-up of a large extract of documents so that we have a baseline for performance testing.
Specifically, I need a command that can recursively go through a directory and, for each file type, inform me of the number of files of that type and their average size.
I've looked at solutions like:
Unix find average file size,
How can I recursively print a list of files with filenames shorter than 25 characters using a one-liner? and https://unix.stackexchange.com/questions/63370/compute-average-file-size, but nothing quite gets me to what I'm after.
This du and awk combination should work for you (note that du reports disk usage in blocks, not bytes, so the averages are in blocks):
du -a mydir/ | awk -F'[.[:space:]]' '/\.[a-zA-Z0-9]+$/ { a[$NF]+=$1; b[$NF]++ }
END{for (i in a) print i, b[i], (a[i]/b[i])}'
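If you need averages in bytes rather than blocks, here is a minimal sketch that measures each file with wc -c instead of du, assuming POSIX find and awk. The helper name ext_stats is hypothetical, and "file type" is assumed to mean the text after the last dot of the file name.

```shell
#!/bin/sh
# ext_stats DIR: for each file extension under DIR, print the extension, the
# number of files and their average size in bytes. Files without an
# extension (including dotfiles like .bashrc) are grouped as "(none)".
ext_stats() {
    find "$1" -type f -exec wc -c {} + |
    awk '
        # wc prints an "N total" line after each batch of several files; skip it.
        NF == 2 && $2 == "total" { next }
        {
            size = $1
            name = $NF                    # last word of the path holds the extension
            sub(/.*\//, "", name)         # strip the directory part
            ext = "(none)"
            if (match(name, /\.[^.]+$/) && RSTART > 1) ext = substr(name, RSTART + 1)
            count[ext]++
            total[ext] += size
        }
        END { for (e in count) printf "%s %d %.1f\n", e, count[e], total[e] / count[e] }
    '
}
```

For example, ext_stats mydir | sort prints one "extension count average" line per type. The output order of awk's for-in loop is unspecified, hence the sort.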
To give you something to start with: the script below lists each file and its size, one per line.
#!/usr/bin/env bash
DIR=ABC
cd "$DIR" || exit 1
find . -type f | while IFS= read -r line
do
    # size=$(stat --format="%s" "$line")   # on systems with GNU stat
    size=$(perl -e 'print -s $ARGV[0], "\n"' "$line")   # suggested by Mark Setchell for macOS; untested there
    echo "$size $line"
done
Output sample
123 ./a.txt
23 ./fds/afdsf.jpg
The rest is your homework: from the above output it should be easy to get the file types and their average sizes.
You could perhaps use du:
du -a -c *.txt
Sample output:
104 M1.txt
8 in.txt
8 keys.txt
8 text.txt
8 wordle.txt
136 total
The output is in 512-byte blocks on BSD/macOS (GNU du uses 1024-byte blocks by default); use -k for kilobytes or -m for megabytes.

Contradicting redirected data in Linux shell

I am new to shell scripting
I entered the command
$ ls -l >out.txt
then I looked at the output
$ vi out.txt
and the contents of the file were
total 8
-rw-rw-r-- 1 arun arun 0 May 5 19:55 out.txt
Then I ran
$ ls -l
total 12
-rw-rw-r-- 1 arun arun 54 May 5 19:55 out.txt
Why is there a discrepancy between the output I received on the terminal and the output that was saved in the file out.txt?
The first time you ran ls, out.txt was empty.
The second time you ran ls, out.txt contained the results of ls, hence not empty.
As soon as the shell parsed the command and saw that stdout was redirected to out.txt, it created (or truncated) out.txt in your directory with a size of 0 bytes, before running ls. When you ran ls -l again later, out.txt already had some content, and ls showed that size.
When you ran
ls -l >out.txt
the sequence of events was:
1. Open the file out.txt for writing. Initially, the file size is 0 bytes.
2. Run ls -l, which sees the empty file out.txt.
3. Write the output of ls -l to out.txt.
After step 3, out.txt is a 54-byte file, which you observe with your second invocation of ls -l.
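You can watch the truncation happen before the command starts with a small sketch, assuming a POSIX sh; redirect_demo is a hypothetical helper that works in a throwaway temporary directory.

```shell
#!/bin/sh
# redirect_demo: show that "cmd > out.txt" creates/truncates out.txt
# before cmd starts running.
redirect_demo() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        echo "old contents" > out.txt      # give the file a non-zero size first
        # The shell truncates out.txt to 0 bytes *before* starting the inner sh,
        # so the inner wc -c reads an empty file and reports 0.
        sh -c 'wc -c < out.txt' > out.txt
        tr -d ' ' < out.txt                # prints 0 (strips BSD/macOS wc padding)
        ls                                 # prints out.txt: the shell created it
    )
    rm -rf "$dir"
}

redirect_demo
```

The same thing happens with ls -l > out.txt: by the time ls lists the directory, out.txt already exists and is empty.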
... because your first command put data into out.txt. Its size is necessarily larger after that.

Extract part of a filename shell script

In bash I would like to extract part of many filenames and save that output to another file.
The files are formatted as coffee_{SOME NUMBERS I WANT}.freqdist.
#!/bin/sh
for f in $(find . -name 'coffee*.freqdist')
That code will find all the coffee_{SOME NUMBERS I WANT}.freqdist files. Now, how do I make an array containing just {SOME NUMBERS I WANT} and write that to a file?
I know that to write to a file one would end the line with the following.
> log.txt
I'm missing the middle part though of how to filter the list of filenames.
You can do it natively in bash as follows:
filename=coffee_1234.freqdist
tmp=${filename#*_}
num=${tmp%.*}
echo "$num"
This is a pure bash solution. No external commands (like sed) are involved, so this is faster.
Append these numbers to a file using:
echo "$num" >> file
(You will need to delete/clear the file before you start your loop.)
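Putting the pieces together, here is a minimal sketch of the whole loop using the same parameter expansions (which also work in POSIX sh). The helper name extract_numbers is hypothetical, and it assumes the names contain no newlines.

```shell
#!/bin/sh
# extract_numbers DIR OUTFILE: for every coffee_*.freqdist file under DIR, write the
# part between the first "_" and ".freqdist" to OUTFILE, one per line.
extract_numbers() {
    find "$1" -type f -name 'coffee_*.freqdist' | while IFS= read -r path; do
        name=${path##*/}          # drop the directory part
        tmp=${name#*_}            # drop everything up to the first "_"
        num=${tmp%.freqdist}      # drop the ".freqdist" suffix
        printf '%s\n' "$num"
    done > "$2"                   # one redirection for the whole loop
}
```

For example, extract_numbers . log.txt. Because the whole loop is redirected once, log.txt is truncated a single time at the start, so there is no need to clear it beforehand as there is when appending with >> inside the loop.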
If the intention is just to write the numbers to a file, you do not need the find command:
ls coffee*.freqdist
coffee112.freqdist coffee12.freqdist coffee234.freqdist
The following prints just the numbers, and its output can then be redirected to a file:
$ ls coffee*.freqdist | sed 's/coffee\(.*\)\.freqdist/\1/'
112
12
234
Guru.
The previous answers have indicated some necessary techniques. This answer organizes the pipeline in a simple way that might apply to other jobs as well. (If your sed doesn't support ‘;’ as a separator, replace ‘;’ with ‘|sed’.)
$ ls */c*; ls c*
fee/coffee_2343.freqdist
coffee_18z8.x.freqdist coffee_512.freqdist coffee_707.freqdist
$ find . -name 'coffee*.freqdist' | sed 's/.*coffee_//; s/[.].*//' > outfile
$ cat outfile
512
18z8
2343
707

Can the leading bytes of an input stream be copied to another file without closing the input stream?

I'm basically asking why:
head -c 2 > /tmp/first-two-bytes
cat /tmp/first-two-bytes -
doesn't copy the first two bytes of stdin to /tmp/first-two-bytes and then dump the entire contents of stdin to stdout.
[Edit] Just to be clear, here's what happens on my machine:
$ uname -a
Darwin Myles-Byrnes-iMac.local 11.3.0 Darwin Kernel Version 11.3.0: Thu Jan 12 18:47:41 PST 2012; root:xnu-1699.24.23~1/RELEASE_X86_64 x86_64
$ echo "hello, world" | (head -c 2 > /tmp/first-two-bytes; cat /tmp/first-two-bytes -)
he$ cat /tmp/first-two-bytes
he$
Your commands do exactly what they should. Remember that a stream is not a file: whatever is read from the stream is removed from it. There's no rewinding (unless you implement it yourself using a buffer in your app, but then it is a property of the app, not of the stream). The first command reads 2 bytes from stdin. The second outputs the file from /tmp and then the "entire contents of stdin", but by the time it runs, stdin no longer contains the bytes the previous command already consumed.
As you can see from the following command, the behaviour is just as you describe:
$ echo "hello, world" | (head -c 2 > /tmp/first-two-bytes; cat /tmp/first-two-bytes -)
hello, world
$ cat /tmp/first-two-bytes
he$
Note that the last $ is the prompt.
Each byte in a stream can only be read once. head may be implemented to read only 2 bytes, or it may read a whole buffer (possibly the entire stream) and output only the first 2 bytes. With the latter implementation, stdin is exhausted before cat ever sees any data.
If you want something like head that never reads more than 2 bytes from the input stream and is maximally portable, you probably want dd: just replace head -c 2 with dd bs=2 count=1 (add 2>/dev/null to hide dd's transfer statistics).
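A minimal sketch of the dd variant, assuming the input arrives through a pipe; the helper name tee_first_two is hypothetical. dd issues a single read() of at most bs bytes, so it cannot over-consume the stream the way a buffering head can. On a pipe, that single read may also return fewer than 2 bytes if less data is available yet.

```shell
#!/bin/sh
# tee_first_two FILE: copy the first 2 bytes of stdin into FILE, then write the
# whole input (including those 2 bytes) to stdout.
tee_first_two() {
    dd bs=2 count=1 of="$1" 2>/dev/null   # one read of at most 2 bytes; stats hidden
    cat "$1" -                             # the saved bytes, then the rest of stdin
}
```

For example, echo "hello, world" | tee_first_two /tmp/first-two-bytes prints the full line, while /tmp/first-two-bytes contains only "he".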
If I understand correctly, you want to write only the first two bytes from standard input into a special file and then print the entire input (including the first two bytes) to standard output.
Here's how I'd do it:
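first=1
while IFS= read -r -d '' -n 2 c2 || [ -n "$c2" ]; do
    if [ "$first" = 1 ]; then
        printf '%s' "$c2" > tmp.txt   # save only the first chunk
        first=0
    fi
    printf '%s' "$c2"                 # print each chunk unchanged (no added newline)
done < input.txt
This reads the input two characters at a time (bytes, if run under LC_ALL=C), writes only the first chunk to tmp.txt, and prints every chunk to standard output. It needs bash, since read -n and -d are not POSIX, and NUL bytes in the input are dropped.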
