bash list file (ls) and find a number - bash

I have these files in my directory:
ls -l /toto/
total 0
brw-rw---- 1 tata par 112, 24 Apr 16 13:08 file1
brw-rw---- 1 tata par 112, 23 Apr 16 13:08 file2
My bash script has to verify that the number 112 is present in every line.
for f in $(ls -l /toto/);
do
fff=`grep "112" $f`
echo $fff
done
result:
grep: tata: No such file or directory
grep: 112: No such file or directory
grep: file1: No such file or directory
Why? How? Thanks.

The files listed in your question are block devices (the b as the first character of the permissions field tells you that).
This means 112 and 24 are the major and minor device numbers of the first file, in decimal notation.
The Unix command stat can be used to produce a file listing that uses a custom format (as opposed to ls that knows only a couple of fixed formats).
The command line you need is:
stat --format "%t %n" /toto/*
The %t format specifier prints the major device number of a device file, in hexadecimal notation. %n prints the file name (we include it for debugging).
112 in decimal is 0x70 in hexadecimal. The command above should print:
70 file1
70 file2
Now you can pipe it through grep '^70 ' and then to wc -l to count the number of lines that start with 70 (70 followed by a space):
stat --format "%t %n" /toto/* | grep '^70 ' | wc -l
If you want to know whether all files in the /toto/ directory have major number 112, compare the number produced by the command above against the number produced by the next command, which counts the files and directories in /toto/:
ls -1 /toto/ | wc -l
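Putting the two counts together, a minimal sketch of the whole check could look like this (it assumes GNU stat; grep -c is equivalent to grep | wc -l):
total=$(ls -1 /toto/ | wc -l)
matching=$(stat --format '%t %n' /toto/* | grep -c '^70 ')
if [ "$matching" -eq "$total" ]; then
    echo "all files have major number 112"
else
    echo "some files have a different major number"
fi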
If you also want to know which files have a different major number, run this command:
stat --format "%t %n" /toto/* | grep -v '^70 '
It filters out the lines that start with 70 and displays only the files that have a different major number (together with their major number in hex).
If it doesn't display anything then all the files in the /toto/ directory have major number 112.
Remark: the command above will also list regular files, directories and other files that are not devices (only device files have major/minor numbers).
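To avoid that caveat you could restrict the check to block devices up front; a rough sketch (assuming GNU find and GNU stat):
find /toto -maxdepth 1 -type b -exec stat --format '%t %n' {} + | grep -v '^70 '
If this prints nothing, every block device directly under /toto/ has major number 112.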

Finding files that are *not* hard links or under hard links directory via a shell script

I would like to find all files that are not hard links or under a hard-linked directory.
I found this awesome SO answer, but the command below does not handle the hard-linked directory case!
find /1 -type f -links 1 -print
for example:
/1/2/3/test.txt
/1/A/3/test.txt
If 2 is a hard link to A, then we expect to find only one test.txt file.
One more example, from Android:
$ adb shell ls -li /data/data/com.android.nfc |grep files
4243 drwxrwx--x 2 nfc nfc 3488 2022-06-13 11:08 files
$ adb shell ls -li /data/user/0/com.android.nfc |grep files
4243 drwxrwx--x 2 nfc nfc 3488 2022-06-13 11:08 files
$ adb shell ls -li /data/data/com.android.nfc/files/service_state.xml
5877 -rw------- 1 nfc nfc 100 2022-06-13 11:08 /data/data/com.android.nfc/files/service_state.xml
$ adb shell ls -li /data/user/0/com.android.nfc/files/service_state.xml
5877 -rw------- 1 nfc nfc 100 2022-06-13 11:08 /data/user/0/com.android.nfc/files/service_state.xml
Systems that support unrestricted hard links to directories are rare, but a similar situation can be created using bind mounts. (See What is a bind mount?.)
Try this Shellcheck-clean code to list files under the current directory that do not have multiple paths (caused by bind mounts or links to directories):
#! /bin/bash -p
shopt -s lastpipe
declare -A devino_of_file
declare -A count_of_devino
find . -type f -printf '%D.%i-%p\0' \
    | while IFS= read -r -d '' devino_path; do
        devino=${devino_path%%-*}
        path=${devino_path#*-}
        devino_of_file[$path]=$devino
        count_of_devino[$devino]=$(( ${count_of_devino[$devino]-0}+1 ))
    done

for path in "${!devino_of_file[@]}"; do
    devino=${devino_of_file[$path]}
    (( ${count_of_devino[$devino]} == 1 )) && printf '%s\n' "$path"
done
shopt -s lastpipe ensures that variables set in the while loop in the pipeline persist after the pipeline completes. It requires Bash 4.2 (released in 2011) or later.
The code uses "devino" values. The devino value for a path consists of the device number and inode number for the path, separated by a . character. A devino string should uniquely identify a file on a system, independent of any path to it.
The devino_of_file associative array maps paths to the corresponding devino values.
The count_of_devino associative array maps devino strings to counts of the number of paths found to them.
See BashFAQ/001 (How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?) for an explanation of while IFS= read -r -d '' ....
When all files in the directory tree have been processed, every path whose devino value has a count of 1 (meaning that no other path has been found to the same file) is printed.
The code that populates the associative arrays can handle arbitrary paths (including ones that contain spaces or newlines) but the output will be useless if any of the paths contain newlines (because of the '%s\n' format string).
Alternative paths caused by symlinks are automatically avoided because find doesn't follow symlinks by default. The code should still work if the -follow option to find is used though. (It's easier to test with symlinks than with directory hardlinks or bind mounts.)
Note that Bash code runs slowly; it is interpreted in a fairly laborious way. The code above is likely to be too slow if the directory tree being processed contains a large number of files. For example, it processes files at a rate of around 10 thousand per second on my test VM.
From comments on the previous edit of this answer, it seems that the duplication is being caused because some files appear in two different places in the filesystem due to bind mounts.
That being the case, the original code you used produces technically correct output. However it is listing some relevant files more than once (because they have multiple names):
find /1 -type f -links 1 -print
A mounted filesystem is uniquely identified by its device number. A file is uniquely identified within that filesystem by its inode number. So a file can be uniquely identified on a particular host by the (device#,inode#) tuple. (GNU) find can provide these tuples along with filenames, as @pjh's answer shows:
find /1 -type f -links 1 -printf '%D.%i %p\0'
A simple (GNU) awk script can filter the output so that only one path is listed for each unique (device#,inode#):
find /1 -type f -links 1 -printf '%D.%i %p\0' |
gawk -v RS='\0' '!id[$1]++ && sub(/^[0-9.]+ /,"")'
This uses the common awk idiom !x[y]++ which evaluates to true only when the element y is inserted into the array x (it is inserted with value 0 the first time y is seen and the value is incremented thereafter; !0 is true).
The (device#,inode#) prefix is deleted by sub().
awk implicitly prints processed records if the "pattern" evaluates to true, i.e. when a (device#,inode#) tuple is first seen and the prefix is successfully stripped. The (GNU) find output is delimited by nulls rather than newlines, so the (GNU) awk script sets the input record separator RS to null as well.
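If the !id[$1]++ idiom is unfamiliar, a throwaway example (not from the answer) shows it in isolation; it prints each distinct input line only the first time it appears:
printf 'a\nb\na\nc\na\n' | awk '!seen[$0]++'
This prints a, b and c, one per line.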
Forgive the humor in the comment, but I don't think you understand your question.
What I mean by that is that when you create a file, it's a link.
$: date > file1
$: ls -l file1 # note the 2nd field - the "number of hard links"
-rw-r--r--. 1 P2759474 518 29 Jun 13 17:34 file1
You think of file1 as the file, but it's ...complicated, lol.
The date command above creates output. The redirection tells "the system" that you want that data in "a file", so it allocates space on the disk, writes the data to that space, and creates an inode that defines the "file".
A "hard link" is basically just a link to that data. It's the same "file" with another name if you make another link. Editing either edits both (all, if you make several), because they are the same file.
$: date >file1
$: ln file1 file2
$: diff file?
$: cat file1
Mon Jun 13 17:30:22 GMT 2022
$: date >file2
$: diff file?
$: cat file1
Mon Jun 13 17:31:06 GMT 2022
Now, a symlink is another file of another kind with a different inode, containing the name of the file it "links" to symbolically, but a hard link is the file. ls -i will show you the inode index number, in fact.
$: date >file1
$: ln file1 file2
$: diff file?
$: cat file2
Mon Jun 13 17:34:41 GMT 2022
$: ls -li file? # note the 1st and 3rd fields
24415801 -rw-r--r--. 2 paul 518 29 Jun 13 17:34 file1
24415801 -rw-r--r--. 2 paul 518 29 Jun 13 17:34 file2
$: rm file2
$: ls -li file? # note the 1st and 3rd fields
24415801 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:34 file1
Let's make a different file with that name and compare again.
$: date >file2
$: cat file? # not linked now
Mon Jun 13 17:34:41 GMT 2022
Mon Jun 13 17:41:23 GMT 2022
$: diff file? # now they differ
1c1
< Mon Jun 13 17:34:41 GMT 2022
---
> Mon Jun 13 17:41:23 GMT 2022
$: ls -li file? # and have different inodes, one link each
24415801 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:34 file1
24419687 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:41 file2
If I had copied the original data, the diff would have been empty, but it would still be a different inode, so a different file, and I could have edited them independently.
And a symlink -
$: ln -s file1 file3
$: diff file1 file3
$: ls -li file?
24415801 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:34 file1
24419687 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:41 file2
24419696 lrwxrwxrwx. 1 P2759474 518 5 Jun 13 17:44 file3 -> file1
Opening a symlink will usually open the file it targets, but that can depend on what tool you are using, so be aware of the differences.
You cannot create a hard link to a file on a separate filesystem, because a hard link only works within a single filesystem. You can use a symlink instead.
What you might be looking for is
for f in *; do [[ -f "$f" ]] && echo "$f"; done
or something like that.
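If you also want to skip symbolic links in that listing, a small variation (my sketch, not part of the original suggestion) adds a test with -L:
for f in *; do [[ -f "$f" && ! -L "$f" ]] && echo "$f"; done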
Hope that helps.

printing output of ls | wc -l in a txt file with two columns

I have many directories with different numbers of CSV files in each of them. I am interested in getting a list of those directories with their respective number of CSV files. So I have this bash loop:
for i in 21 22 23
do
ls my/directory/name$i/*csv | wc -l
done
The question is, how do I get a txt file with an output like this?
name21 3
name22 5
name23 2
Where the first column corresponds to the name of each directory and the second one corresponds to the number of CSV files in that directory.
You can echo the names and results of your command to output.txt:
for i in 21 22 23
do
echo name$i $(ls my/directory/name$i/*csv | wc -l)
done > output.txt
https://mywiki.wooledge.org/ParsingLs explains a number of scenarios where using ls and parsing its output will produce wrong or unexpected results. You are much better off e.g. using an array and counting the number of elements in it.
for i in 21 22 23; do
files=( my/directory/name$i/*csv )
echo "name$i ${#files[#]}"
done >output.txt
Perhaps also notice the redirection after done, which avoids opening and closing the same file repeatedly inside the loop.
If you want to be portable to POSIX sh which doesn't have arrays, try a function.
count () {
echo "$#"
}
for i in 21 22 23; do
echo "name$i $(count my/directory/name$i/*csv)"
done >output.txt
For a robustness test, try creating files with problematic names like
touch my/directory/name21/"file name
with newline".csv my/directory/name22/'*'.csv

subset ls based on position

I have a list of files in a folder that need to be piped through to more commands. If I know the position of the files when using ls -v file_*.nc, is it possible to remove/ignore files based on their position? So if ls -v file_*.nc returns 300 files and I want files 8, 73, and 151 removed from the pipe, I could do something like ls -v file_*.nc | {remove 8,73,151} | do other stuff.
I don't want to delete/move the files, I just don't want them piped through to the next command.
If you want to filter files out of the input, as you asked ("is it possible to remove/ignore files"),
you can use grep -v <PATTERN>; the -v option inverts the match, excluding the lines that match.
input files
ls -v1 txt*
txt
txt-1
txt-2
txt-3
txt-4
txt-5
txt-6
txt-7
txt-8
txt-9
txt-10
Then ignore any file whose name contains a 7, 8 or 9:
ls -v txt* | grep -v '[789]'
txt
txt-1
txt-2
txt-3
txt-4
txt-5
txt-6
txt-10
removed/ignored
txt-7
txt-8
txt-9
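If you need to drop entries by their position in the ls -v output (8, 73 and 151 in the original question) rather than by a name pattern, one possible sketch filters on the line number with awk:
ls -v file_*.nc | awk 'NR != 8 && NR != 73 && NR != 151'
Whatever comes next in the pipeline then never sees those three entries.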

Not empty file, but "wc -l" outputs 0

I have a non-empty file (even a big one, 400 KB) that I can read with less.
But if I try to output the number of lines with wc -l /path/to/file it outputs 0.
How can it be possible?
You can verify for yourself that the file contains no newline/linefeed (ASCII 10) characters, which would result in wc -l reporting 0 lines.
First, count the characters in your file:
wc -c /path/to/file
You should get a non-zero value.
Now, filter out everything that isn't a newline:
tr -dc '\n' < /path/to/file | wc -c
You should get back 0.
Or, delete the newlines and count what is left:
tr -d '\n' < /path/to/file | wc -c
You should get back the same value as in step 1.
wc -l counts the number of '\n' characters in the file. Could it be that your file does not contain any?
Here is a GNU example implementation of wc (from the cflow manual):
https://www.gnu.org/software/cflow/manual/html_node/Source-of-wc-command.html
look for COUNT(c) macro.
Here's one way it's possible. Make a 400k file with just nulls in it:
dd if=/dev/zero bs=1024 count=400 of=/tmp/nulls ; ls -log /tmp/nulls
Output shows the file exists:
400+0 records in
400+0 records out
409600 bytes (410 kB, 400 KiB) copied, 0.00343425 s, 119 MB/s
-rw-rw-r-- 1 409600 Feb 28 11:12 /tmp/nulls
Now count the lines:
wc -l /tmp/nulls
0 /tmp/nulls
It is possible if the HTML file is minified: the newline characters would have been removed during minification of the content.
Check with the file command:
file filename.html
filename.html: HTML document text, UTF-8 Unicode text, with very long lines, with no line terminators
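If you still want a line-like count for such a file, note that GNU grep and GNU awk both treat a final unterminated line as a line, so (a rough sketch) either of these reports at least 1 for a non-empty file with no newline characters:
grep -c '' filename.html
awk 'END { print NR }' filename.html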

Command to list all file types and their average size in a directory

I am working on a specific project where I need to work out the make-up of a large extract of documents so that we have a baseline for performance testing.
Specifically, I need a command that can recursively go through a directory and, for each file type, inform me of the number of files of that type and their average size.
I've looked at solutions like:
Unix find average file size,
How can I recursively print a list of files with filenames shorter than 25 characters using a one-liner? and https://unix.stackexchange.com/questions/63370/compute-average-file-size, but nothing quite gets me to what I'm after.
This du and awk combination should work for you:
du -a mydir/ | awk -F'[.[:space:]]' '/\.[a-zA-Z0-9]+$/ { a[$NF]+=$1; b[$NF]++ }
END{for (i in a) print i, b[i], (a[i]/b[i])}'
To give you something to start with: with the script below you will get each file and its size, line by line.
#!/usr/bin/env bash
DIR=ABC
cd "$DIR" || exit
find . -type f | while read -r line
do
    # size=$(stat --format="%s" "$line")   # for systems that have the stat command
    size=$(perl -e 'print -s $ARGV[0],"\n"' "$line")   # @Mark Setchell provided the command, but I have no OSX system to test it.
    echo "$size $line"
done
Output sample
123 ./a.txt
23 ./fds/afdsf.jpg
Then it is your homework: with the above output, it should be easy to get each file type and its average size.
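If you want to go one step further, a possible (untested) sketch groups the find output by extension with awk and prints the count and average size per extension; it assumes GNU find (for -printf) and filenames without spaces or newlines, and treats the extension as whatever follows the last dot:
find . -type f -printf '%s %f\n' | awk '{
    ext = "(none)"
    if (match($2, /\.[A-Za-z0-9]+$/)) ext = substr($2, RSTART + 1)
    sum[ext] += $1; cnt[ext]++
}
END { for (e in cnt) printf "%s %d %.1f\n", e, cnt[e], sum[e] / cnt[e] }'
Each output line is: extension, number of files, average size in bytes.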
You can use "du" maybe:
du -a -c *.txt
Sample output:
104 M1.txt
8 in.txt
8 keys.txt
8 text.txt
8 wordle.txt
136 total
The output is in disk blocks (1024-byte units by default for GNU du, 512-byte for POSIX du), but you can switch the unit with -k or -m.
