UNIX ls -l date format varies - shell

I am using SunOS 5.10. I would like to redirect the output of "ls -l" into a file that can be loaded into a database. However, the date format varies. Below is a sample of ls -l output. Why do the files ls_txt.sh and nohup.out show a time instead of a year?
-rw-rw-r-- 1 gilmog other 57 Jul 25 2017 fnd2.txt
-rw-rw-r-- 1 gilmog other 702 Jan 24 2018 handySh
-rw-rw-r-- 1 gilmog other 189 Nov 7 23:20 ls_txt.sh
-rw------- 1 gilmog other 3915 Sep 12 03:58 nohup.out
-rw-rw-r-- 1 gilmog other 1655 Jan 24 2018 npiFn.sas

Caution: do not parse the output of ls. Its output is meant for human consumption, to understand the contents of the filesystem. If you want a program to know time information about a file, use stat [1].
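For example, if GNU coreutils stat happens to be installed (stock Solaris 10 does not ship it, so this is an assumption), a minimal sketch that writes one machine-readable record per file could look like this. The function name and the "|" delimiter are illustrative choices, not anything standard:

```shell
#!/usr/bin/env bash
# Print one "name|size|mtime" record per file, where mtime is seconds since
# the Unix epoch. Assumes GNU coreutils stat (-c); stock Solaris 10 lacks it.
# The "|" delimiter is arbitrary and assumes no file name contains "|" or a
# newline.
file_records() {
  stat -c '%n|%s|%Y' -- "$@"
}
```

Something like file_records * > files.psv then gives the database a single, locale-independent timestamp format for every file, whatever its age.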
Now, with that out of the way, I'll answer your question. The time varies because that's how it's defined to work. From the POSIX documentation on ls:
The field shall contain the appropriate date and timestamp of when the file was last modified. In the POSIX locale, the field shall be the equivalent of the output of the following date command:
date "+%b %e %H:%M"
if the file has been modified in the last six months, or:
date "+%b %e  %Y"
(where two <space> characters are used between %e and %Y) if the file has not been modified in the last six months or if the modification date is in the future, except that, in both cases, the final <newline> produced by date shall not be included and the output shall be as if the date command were executed at the time of the last modification date of the file rather than the current time. When the LC_TIME locale category is not set to the POSIX locale, a different format and order of presentation of this field may be used.
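To make the quoted rule concrete, here is a small sketch that picks the format ls would use from a file's modification time and the current time (both in epoch seconds). The function name is made up, and "six months" is approximated as half an average Gregorian year; the exact cutoff is implementation-specific:

```shell
#!/usr/bin/env bash
# Print the date(1) format that POSIX-locale ls -l would use for a file,
# given its mtime and the current time, both in epoch seconds.
# Six months is approximated as 31556952 / 2 seconds (half an average
# Gregorian year); real implementations may use a slightly different cutoff.
ls_date_format() {
  local mtime=$1 now=$2
  local six_months=$(( 31556952 / 2 ))
  if (( mtime > now || now - mtime >= six_months )); then
    # Old or future-dated: show the year (two spaces before %Y, per POSIX).
    printf '%s\n' '+%b %e  %Y'
  else
    # Modified within the last six months: show hour and minute instead.
    printf '%s\n' '+%b %e %H:%M'
  fi
}
```

With GNU date, date -d "@$mtime" "$(ls_date_format "$mtime" "$(date +%s)")" should print roughly what ls -l shows in the POSIX locale, which is why ls_txt.sh and nohup.out get a time while the older files get a year.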
This definition makes a horrible mess for a program to parse. So, to reiterate: do not parse ls output.
[1] If you don't have stat on your Solaris box, then you might just have to rely on ls. I'm sorry. The command for that is approximately ls -siv -# -/ c -%all z.

Related

Finding files that are *not* hard links or under hard links directory via a shell script

I would like to find all files that are neither hard links nor located under a hard-linked directory.
I found this helpful SO answer, but the command below does not handle files under a hard-linked directory:
find /1 -type f -links 1 -print
For example:
/1/2/3/test.txt
/1/A/3/test.txt
If 2 is a hard link to A, we expect find to report only one test.txt file.
One more example, from Android:
$ adb shell ls -li /data/data/com.android.nfc |grep files
4243 drwxrwx--x 2 nfc nfc 3488 2022-06-13 11:08 files
$ adb shell ls -li /data/user/0/com.android.nfc |grep files
4243 drwxrwx--x 2 nfc nfc 3488 2022-06-13 11:08 files
$ adb shell ls -li /data/data/com.android.nfc/files/service_state.xml
5877 -rw------- 1 nfc nfc 100 2022-06-13 11:08 /data/data/com.android.nfc/files/service_state.xml
$ adb shell ls -li /data/user/0/com.android.nfc/files/service_state.xml
5877 -rw------- 1 nfc nfc 100 2022-06-13 11:08 /data/user/0/com.android.nfc/files/service_state.xml
Systems that support unrestricted hard links to directories are rare, but a similar situation can be created using bind mounts. (See What is a bind mount?.)
Try this Shellcheck-clean code to list files under the current directory that do not have multiple paths (caused by bind mounts or links to directories):
#! /bin/bash -p
shopt -s lastpipe
declare -A devino_of_file
declare -A count_of_devino
find . -type f -printf '%D.%i-%p\0' \
| while IFS= read -r -d '' devino_path; do
devino=${devino_path%%-*}
path=${devino_path#*-}
devino_of_file[$path]=$devino
count_of_devino[$devino]=$(( ${count_of_devino[$devino]-0}+1 ))
done
for path in "${!devino_of_file[@]}"; do
devino=${devino_of_file[$path]}
(( ${count_of_devino[$devino]} == 1 )) && printf '%s\n' "$path"
done
shopt -s lastpipe ensures that variables set in the while loop in the pipeline persist after the pipeline completes. It requires Bash 4.2 (released in 2011) or later.
The code uses "devino" values. The devino value for a path consists of the device number and inode number for the path, separated by a . character. A devino string should uniquely identify a file on a system, independent of any path to it.
The devino_of_file associative array maps paths to the corresponding devino values.
The count_of_devino associative array maps devino strings to counts of the number of paths found to them.
See BashFAQ/001 (How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?) for an explanation of while IFS= read -r -d '' ....
When all files in the directory tree have been processed, all paths whose devino value has a count of 1 (meaning that no other path has been found to the same file) are printed.
The code that populates the associative arrays can handle arbitrary paths (including ones that contain spaces or newlines) but the output will be useless if any of the paths contain newlines (because of the '%s\n' format string).
Alternative paths caused by symlinks are automatically avoided because find doesn't follow symlinks by default. The code should still work if the -follow option to find is used though. (It's easier to test with symlinks than with directory hardlinks or bind mounts.)
Note that Bash code runs very slowly. It is interpreted in a very laborious way. The code above is likely to be too slow if the directory tree being processed has large numbers of files. For example, it processes files at a rate of around 10 thousand per second on my test VM.
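To see what a devino value looks like for a single path, a minimal sketch assuming GNU coreutils stat is available (its %d and %i directives report the same st_dev and inode values as find's %D and %i):

```shell
#!/usr/bin/env bash
# Print the "devino" (device number, ".", inode number) of a path, in the
# same form as find -printf '%D.%i'. Assumes GNU coreutils stat:
# %d = device number in decimal, %i = inode number.
devino() {
  stat -c '%d.%i' -- "$1"
}
```

Two hard-linked names for the same file should print identical values, while a separate copy should not; that equality is exactly what the count_of_devino array relies on.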
From comments on the previous edit of this answer, it seems that the duplication is being caused because some files appear in two different places in the filesystem due to bind mounts.
That being the case, the original command you used produces technically correct output. However, it lists some files more than once (because they have multiple names):
find /1 -type f -links 1 -print
A mounted filesystem is uniquely identified by its device number. A file is uniquely identified within that filesystem by its inode number. So a file can be uniquely identified on a particular host by the (device#,inode#) tuple. (GNU) find can provide these tuples along with filenames, as @pjh's answer shows:
find /1 -type f -links 1 -printf '%D.%i %p\0'
A simple (GNU) awk script can filter the output so that only one path is listed for each unique (device#,inode#):
find /1 -type f -links 1 -printf '%D.%i %p\0' |
gawk -v RS='\0' '!id[$1]++ && sub(/^[0-9.]+ /,"")'
This uses the common awk idiom !x[y]++ which evaluates to true only when the element y is inserted into the array x (it is inserted with value 0 the first time y is seen and the value is incremented thereafter; !0 is true).
The (device#,inode#) prefix is deleted by sub().
awk implicitly prints a processed record when the "pattern" evaluates to true, i.e. when a (device#,inode#) tuple is seen for the first time and the prefix is successfully stripped. The (GNU) find output is delimited by null characters rather than newlines, so the (GNU) awk script also sets the input record separator RS to null.
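If gawk is not available, the same "first time this key is seen" idea can be sketched in pure Bash (4.0 or later, for associative arrays). The function name is hypothetical; it reads the same NUL-delimited "DEVINO PATH" records that the find command above produces:

```shell
#!/usr/bin/env bash
# Pure-Bash analogue of the gawk filter: read NUL-delimited "DEVINO PATH"
# records on stdin and print only the first path seen for each DEVINO.
# Requires Bash 4+ for the associative array.
first_path_per_devino() {
  local -A seen=()
  local rec devino path
  while IFS= read -r -d '' rec; do
    devino=${rec%% *}   # text before the first space
    path=${rec#* }      # text after the first space (may contain spaces)
    if [[ -z ${seen[$devino]+set} ]]; then
      seen[$devino]=1
      printf '%s\n' "$path"
    fi
  done
}
```

Usage would be find /1 -type f -links 1 -printf '%D.%i %p\0' | first_path_per_devino. Like the awk version, it is much slower than gawk on large trees.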
Forgive the humor in the comment, but I don't think you understand your question.
What I mean by that is that when you create a file, it's a link.
$: date > file1
$: ls -l file1 # note the 2nd field - the "number of hard links"
-rw-r--r--. 1 P2759474 518 29 Jun 13 17:34 file1
You think of file1 as the file, but it's ...complicated, lol.
The date command above creates output. The redirection tells "the system" that you want that data in "a file", so it allocates space on the disk, writes the data to that space, and creates an inode that defines the "file".
A "hard link" is basically just a link to that data. It's the same "file" with another name if you make another link. Editing either edits both (all, if you make several), because they are the same file.
$: date >file1
$: ln file1 file2
$: diff file?
$: cat file1
Mon Jun 13 17:30:22 GMT 2022
$: date >file2
$: diff file?
$: cat file1
Mon Jun 13 17:31:06 GMT 2022
Now, a symlink is a different kind of file with its own inode, containing the name of the file it "links" to symbolically, whereas a hard link is the file. In fact, ls -i will show you the inode number.
$: date >file1
$: ln file1 file2
$: diff file?
$: cat file2
Mon Jun 13 17:34:41 GMT 2022
$: ls -li file? # note the 1st and 3rd fields
24415801 -rw-r--r--. 2 paul 518 29 Jun 13 17:34 file1
24415801 -rw-r--r--. 2 paul 518 29 Jun 13 17:34 file2
$: rm file2
$: ls -li file? # note the 1st and 3rd fields
24415801 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:34 file1
Let's make a different file with that name and compare again.
$: date >file2
$: cat file? # not linked now
Mon Jun 13 17:34:41 GMT 2022
Mon Jun 13 17:41:23 GMT 2022
$: diff file? # now they differ
1c1
< Mon Jun 13 17:34:41 GMT 2022
---
> Mon Jun 13 17:41:23 GMT 2022
$: ls -li file? # and have different inodes, one link each
24415801 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:34 file1
24419687 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:41 file2
If I had copied the original data, the diff would have been empty, but the copy would still have a different inode, so it would be a different file, and I could have edited the two independently.
And a symlink -
$: ln -s file1 file3
$: diff file1 file3
$: ls -li file?
24415801 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:34 file1
24419687 -rw-r--r--. 1 P2759474 518 29 Jun 13 17:41 file2
24419696 lrwxrwxrwx. 1 P2759474 518 5 Jun 13 17:44 file3 -> file1
Opening a symlink will usually open the file it targets, but this can depend on the tool you are using, so be aware of the differences.
You cannot create a hard link to a file on a separate filesystem, because it doesn't work that way. You can use a symlink.
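One way to check the "same file" idea from a script is Bash's -ef test, which is true when two paths refer to the same device and inode. A minimal sketch (the function name is made up); note that -ef follows symlinks, so -L is needed to tell a symlink apart from a hard link:

```shell
#!/usr/bin/env bash
# Classify two paths: -ef is true when both resolve to the same device and
# inode. Because -ef follows symlinks, -L is used to spot the symlink case.
describe_pair() {
  if [[ $1 -ef $2 ]]; then
    if [[ -L $1 || -L $2 ]]; then
      echo "same inode via symlink"
    else
      echo "same inode"
    fi
  else
    echo "different inodes"
  fi
}
```

With the file1/file2/file3 setup above (before file2 was removed), describe_pair file1 file2 would report "same inode" and describe_pair file1 file3 "same inode via symlink", while a cp copy would report "different inodes".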
What you might be looking for is
for f in *; do [[ -f "$f" ]] && echo "$f"; done
or something like that.
Hope that helps.

How I can format a date in shell

I want to convert a custom date format DD-%%%-YYYY to a standard one: YYYYMMDD
Possible values of %%% are:
Jan Fev Mar Avr Mai Jun Jui Aou Sep Oct Nov Dec
Assuming the input is a bash variable, how do I transform it to the standard format?
Example:
$ fr_date='09-Aou-2018'
$ # [transformation]
# sql_date should now contain 20180809
$ echo "$sql_date"
20180809
You can use the date utility.
fr_date='09-Aug-2018'
sql_date="$(date --date="$fr_date" +%Y%m%d)"
echo "$sql_date"
20180809
Please also refer to the date man page for more information.
Note that date does not understand localized (e.g. French) month names, so the input must use English ones. Consider storing dates as plain Unix epoch values instead.
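Following the suggestion to store Unix epoch values, a minimal sketch assuming GNU date (which accepts -d with "@SECONDS"); the function names are illustrative. -u is used in both directions so the conversion does not depend on the local time zone or DST:

```shell
#!/usr/bin/env bash
# Convert a date string GNU date can parse into epoch seconds (UTC), and
# epoch seconds back into YYYYMMDD. Assumes GNU date.
to_epoch()     { date -u -d "$1" +%s; }
epoch_to_sql() { date -u -d "@$1" +%Y%m%d; }
```

For example, to_epoch 09-Aug-2018 gives 1533772800, and epoch_to_sql 1533772800 gives back 20180809.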
Solution 1: Rewrite the French month names in English, then use date to read and format the result:
Pure bash:
tmp=${fr_date/Fev/Feb} tmp=${tmp/Avr/Apr} tmp=${tmp/Mai/May}
tmp=${tmp/Jui/Jul} tmp=${tmp/Aou/Aug}
sql_date=$(date +%Y%m%d -d "$tmp")
With sed:
tmp=$(sed 's/Fev/Feb/;s/Avr/Apr/;s/Mai/May/;s/Jui/Jul/;s/Aou/Aug/' <<<"$fr_date")
sql_date=$(date +%Y%m%d -d "$tmp")
Solution 2: Assign to each month its corresponding number:
#!/bin/bash
# Requires bash 4 for associative arrays
declare -A month_map=(
[Jan]=01 [Fev]=02 [Mar]=03 [Avr]=04 [Mai]=05 [Jun]=06
[Jui]=07 [Aou]=08 [Sep]=09 [Oct]=10 [Nov]=11 [Dec]=12
)
IFS=- read -r day month year <<<"$fr_date"
sql_date=$year${month_map[$month]}$day
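If associative arrays are not available (for example an older Bash 3 on a server), the same mapping can be sketched with a case statement. The function name is hypothetical, and it assumes the DD-Mon-YYYY input and French abbreviations from the question:

```shell
#!/usr/bin/env bash
# Variant of Solution 2 without associative arrays (works in Bash 3).
# Input: DD-Mon-YYYY using the French abbreviations from the question.
fr_to_sql() {
  local day month year mm
  IFS=- read -r day month year <<<"$1"
  case $month in
    Jan) mm=01 ;; Fev) mm=02 ;; Mar) mm=03 ;; Avr) mm=04 ;;
    Mai) mm=05 ;; Jun) mm=06 ;; Jui) mm=07 ;; Aou) mm=08 ;;
    Sep) mm=09 ;; Oct) mm=10 ;; Nov) mm=11 ;; Dec) mm=12 ;;
    *) printf 'Unknown month: %s\n' "$month" >&2; return 1 ;;
  esac
  printf '%s%s%s\n' "$year" "$mm" "$day"
}
```

sql_date=$(fr_to_sql "$fr_date") then yields 20180809 for 09-Aou-2018, and an unknown abbreviation fails loudly instead of silently producing a bad date.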

bash substring of command instead of variable?

I can do this in bash:
foo=bar
echo ${foo:0:2}
which prints 'ba' (the first two characters of 'bar').
Now I want to do the same with a script/command output instead of a variable, like so:
echo ${$(date):0:10}
But then I get an error: "bad substitution".
Of course I can use an intermediary variable:
foo=$(date)
echo ${foo:0:10}
But is there a way to do this directly?
P.S. The date command is just an example; this is not about generating a date string in a particular format, but about the general concept of taking a substring of an arbitrary shell command's output.
No, Bash syntax doesn't allow nesting a command substitution inside a parameter expansion such as ${...:0:10}; substring expansion only works on parameters (variables). You can use an external utility such as cut instead:
date | cut -c 1-10
Wed Jun 13
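If the goal is just to keep the call site on one line, a small helper can hide the intermediate variable. This is a hypothetical function, not a Bash feature; it still uses a variable internally:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and print a substring of its output.
# Usage: substr_of OFFSET LENGTH COMMAND [ARGS...]
substr_of() {
  local offset=$1 length=$2 out
  out=$("${@:3}") || return   # run the command; propagate its failure
  printf '%s\n' "${out:offset:length}"
}
```

For example, substr_of 0 2 echo bar prints ba, and substr_of 0 10 date prints the same as the cut example.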
To replace date:
$ date
Wed Jun 13 14:57:38 EEST 2018
you can use printf:
$ printf "%(%a %b %d%n)T"
Wed Jun 13
See man strftime for more format specifiers.
If you want the output Wed Jun 13 for today's date (which this is), then
$ date +'%a %b %e'
Wed Jun 13
See the manual for date and/or for strftime on your system.

How can I iterate over .log files, process them through awk, and replace with output files with different extensions?

Let's say that we have multiple .log files in a directory on the production Unix machine (SunOS):
For example:
ls -tlr
total 0
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 file2017-01.log
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 file2016-02.log
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 todo2015-01.log
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 fix20150223.log
The purpose is to use nawk to extract specific information from the logs (parse them) and "transform" them into .csv files, in order to load them into ORACLE tables afterwards.
The nawk script has been tested and works like a charm, but how can I automate this with a bash script that does the following:
1) For a list of given files in this path
2) nawk (to do my extraction of specific data/info from the log file)
3) Output separately each file to a unique .csv to another directory
4) remove the .log files from this path
What concerns me is that the load stamp/timestamp at the end of each file name is different. I have implemented a script that works only for the latest date (e.g. last month), but I want to load all the historical data and I am a bit stuck.
To visualize, my desired/target output is this:
bash-4.4$ ls -tlr
total 0
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 file2017-01.csv
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 file2016-02.csv
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 todo2015-01.csv
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 fix20150223.csv
How can this bash script be written? The load only needs to run once, since it's historical data, as mentioned.
Any help would be extremely useful.
An implementation written for readability rather than terseness might look like:
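#!/usr/bin/env bash
shopt -s nullglob  # if there are no .log files, skip the loop instead of processing a literal '*.log'
for infile in *.log; do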
outfile=${infile%.log}.csv
if awk -f yourscript <"$infile" >"$outfile"; then
rm -f -- "$infile"
else
echo "Processing of $infile failed" >&2
rm -f -- "$outfile"
fi
done
To understand how this works, see:
Globbing -- the mechanism by which *.log is replaced with a list of files with that extension.
The Classic for Loop -- The for infile in syntax, used to iterate over the results of the glob above.
Parameter expansion -- The ${infile%.log} syntax, used to expand the contents of the infile variable with any .log suffix pruned.
Redirection -- the syntax used in <"$infile" and >"$outfile", opening stdin and stdout attached to the named files; or >&2, redirecting logs to stderr. (Thus, when we run awk, its stdin is connected to a .log file, and its stdout is connected to a .csv file).
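The question also asks for the .csv files to go to another directory. A sketch of that variation, under the assumption that your nawk program lives in a file (the function and parameter names are made up; on Solaris you may want nawk in place of awk):

```shell
#!/usr/bin/env bash
# Sketch: convert every *.log in SRC_DIR into a same-named .csv in DEST_DIR,
# removing each .log only after awk succeeds on it. AWK_PROG stands in for
# your own nawk/awk program file.
# Usage: convert_logs SRC_DIR DEST_DIR AWK_PROG
convert_logs() {
  local src_dir=$1 dest_dir=$2 awk_prog=$3 infile base outfile old_nullglob
  mkdir -p -- "$dest_dir" || return 1
  old_nullglob=$(shopt -p nullglob)   # remember the caller's setting
  shopt -s nullglob                   # no matches -> loop body never runs
  for infile in "$src_dir"/*.log; do
    base=${infile##*/}                # strip the directory part
    outfile=$dest_dir/${base%.log}.csv
    if awk -f "$awk_prog" <"$infile" >"$outfile"; then
      rm -f -- "$infile"
    else
      printf 'Processing of %s failed\n' "$infile" >&2
      rm -f -- "$outfile"
    fi
  done
  eval "$old_nullglob"                # restore the caller's setting
}
```

Because the file name is kept and only the extension changes, the differing timestamps in the names (file2017-01, fix20150223, ...) need no special handling.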

Finding the difference between two time stamp with date in it

I would like to get a simple shell script to find the difference in two times.
Example:
Tue May 9 10:38:17 BST 2017
-rw-rw-r-- 1 unikix unikix 1387 Feb 17 11:34 ABC
The first one is the system date and the other one is the time extracted from the output of ls -l.
I also need to check the time format (12-hour vs 24-hour clock) before finding the difference.
Try the following:
DATE1="Tue May 9 10:38:17 BST 2017"
# ls -l shows no year for recent files; GNU date then assumes the current year
DATE2="Feb 17 11:34"
sec1=$(date --date "${DATE1}" +"%s")
sec2=$(date --date "${DATE2}" +"%s")
diff=$((${sec1} - ${sec2}))
echo "Difference is ${diff} sec"
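If the real goal is a file's age, the ls -l parsing (and the 12-hour vs 24-hour question, since POSIX-locale ls -l uses %H:%M anyway) can be avoided by asking for the modification time directly. A minimal sketch, assuming GNU date and GNU stat; the function name is illustrative:

```shell
#!/usr/bin/env bash
# Print how many seconds ago a file was last modified.
# Assumes GNU date (+%s) and GNU stat (-c %Y = mtime in epoch seconds).
file_age_seconds() {
  local now mtime
  now=$(date +%s) || return 1
  mtime=$(stat -c %Y -- "$1") || return 1
  echo $(( now - mtime ))
}
```

For example, file_age_seconds ABC prints the difference between now and ABC's modification time, without having to guess the missing year.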
