I have the following content in the ${f} variable (as a string):
-rw-r--r-- 1 ftp ftp 407 Jul 20 04:46 abc.zip
-rw-r--r-- 1 ftp ftp 427 Jul 20 05:59 def.zip
-rw-r--r-- 1 ftp ftp 427 Jul 20 06:17 ghi.zip
-rw-r--r-- 1 ftp ftp 427 Jul 20 06:34 jkl.zip
-rw-r--r-- 1 ftp ftp 227820 Jul 20 08:47 mno.zip
What I would like to get is only the file names out of it, such as:
abc.zip
def.zip
ghi.zip
jkl.zip
mno.zip
You can use awk to print the last field on each line, and use a <<< here-string to feed the variable's value in as input.
awk '{print $NF}' <<<"$f"
Note that this won't work if any of the filenames contain spaces; you'll just get the part after the last space. Unfortunately, parsing ls output is not trivial.
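If the names might contain spaces, one workaround is to strip the leading fields instead of printing the last one. A minimal sketch, assuming the usual eight ls -l fields before the name, as in the listing above:
sed -E 's/^([^ ]+ +){8}//' <<<"$f"   # keep everything after the 8th field, spaces and all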
I would like to sort the files of a directory using stat -c %n%Y <directory_path>*. This command gives you the name of the file concatenated with the time of last modification. The problem is that I don't know how to take the last characters (the last modification time) in order to sort the files with a pipe; I guess it would be something like stat -c %n%Y <directory_path>* | sort. I have already read the stat and sort manuals and other filter manuals like find or cut, and I still don't have a solution. Any suggestions are appreciated, thank you.
If you separate the file and time with a space then you can sort the second column using -k 2:
rseaman@Ubuntu-PC:~/temp/20180310/49211913$ ls -l
total 12
-rw-rw-r-- 1 rseaman rseaman 0 Mar 10 18:42 file0
-rw-rw-r-- 1 rseaman rseaman 37 Mar 10 18:10 file1
-rw-rw-r-- 1 rseaman rseaman 22 Mar 10 18:10 file2
-rw-rw-r-- 1 rseaman rseaman 19 Mar 10 18:13 file3
rseaman@Ubuntu-PC:~/temp/20180310/49211913$ stat -c "%n %Y" * | sort -n -k 2
file1 1520705401
file2 1520705411
file3 1520705612
file0 1520707323
You can then remove the space afterwards if you wish with | tr -d ' ', but this will mangle any filenames which contain a space.
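If you only want the file names in modification order, a minimal sketch (assuming GNU stat as above, with the time printed first so it is easy to strip) would be:
stat -c '%Y %n' * | sort -n | cut -d ' ' -f 2-   # drop the leading timestamp, keep the name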
You can list the files in a directory (sorted by modification time) with a simple command:
ll -trh
OR
ls -ltrh
-t sort by modification time
-r reverse order while sorting
-h with -l, print sizes in human readable format
I have two text files (new.txt and old.txt) which contain recursive directory listings.
new.txt
338465485 16 drwxr-x--- 26 encqa2 encqa2 16384 Nov 13 06:04 ./
338465486 4 drwxr-x--- 4 encqa2 encqa2 4096 Sep 19 08:38 ./excalibur
338465487 8 drwxr-x--- 3 encqa2 encqa2 8192 Nov 11 14:33 ./excalibur/data_in
338465488 4 drwxr-x--- 2 encqa2 encqa2 4096 Nov 9 23:16 ./excalibur/data_in/archive
old.txt
338101011 40 drwxr-x--- 26 encqa2 encqa2 36864 Nov 13 06:05 ./
338101012 4 drwxr-x--- 4 encqa2 encqa2 4096 Dec 14 2016 ./manual
338101013 4 drwxr-x--- 2 encqa2 encqa2 4096 Aug 25 2016 ./manual/sorted
338101014 4 drwxr-x--- 2 encqa2 encqa2 4096 Aug 25 2016 ./manual/archive
338101015 4 drwxr-x--- 4 encqa2 encqa2 4096 Aug 25 2016 ./adp
338101016 4 drwxr-x--- 6 encqa2 encqa2 4096 Aug 25 2016 ./adp/0235
What I need is for it to give me only the directories, i.e. the expected output after diff should be:
./
./excalibur
./excalibur/data_in
./excalibur/data_in/archive
./excalibur/archive
./shares
./shares/data_in
./shares/data_in/archive
./shares/sorted
Please provide me the command.
If I understand correctly, you want to get those lines from the two text files which differ, but from these lines you want to output only the directory names, not the full information.
If you do a
diff {old,new}.txt
the differing lines are marked in the output with either a '>' or a '<' in the first column, so you get the desired lines by grepping for these characters:
diff {old,new}.txt | grep '^[<>]' | ....
Now you need only the file names. This is easiest if you know for sure that your paths won't contain any spaces. In this case, you can, for instance, pipe your data into:
... | grep -oE ' [^ ]+$' | cut -d ' ' -f 2 | ...
If however the file names can contain spaces, you need to follow a different strategy. For instance, if you know that the number of characters in each line up to the file name is always the same, you can use cut -c .... to select the last portion of the line. Otherwise, you would need to process each line using a regular expression which describes the portion you want to throw away. I would use Perl or Ruby in this case, because I'm most familiar with them, but it can also be done with other tools - Zsh, awk, sed.
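For example, a sketch in Perl, assuming the find -ls style layout shown above (a diff marker followed by ten fields before each path):
... | perl -pe 's/^[<>]\s+(\S+\s+){10}//' | ...   # strip the marker and the ten leading fields, keep the path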
After this, you need to remove duplicates. These may occur for instance if a line differs between new.txt and old.txt not in the filename part, but in the file information part. This can be done by finally piping everything into
.... | sort -u
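Putting the pieces together, a minimal sketch (assuming paths without spaces) might look like:
diff {old,new}.txt | grep '^[<>]' | grep -oE ' [^ ]+$' | cut -d ' ' -f 2 | sort -u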
Let's say that we have multiple .log files in a directory on the prod Unix machine (SunOS):
For example:
ls -tlr
total 0
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 file2017-01.log
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 file2016-02.log
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 todo2015-01.log
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 fix20150223.log
The purpose here is that, via nawk, I extract specific info from the logs (parse the logs) and "transform" them into .csv files in order to load them into Oracle tables afterwards.
Although the nawk script has been tested and works like a charm, how could I automate this with a bash script that does the following:
1) For a list of given files in this path
2) Run nawk (to do my extraction of specific data/info from each log file)
3) Output each file separately to a unique .csv in another directory
4) Remove the .log files from this path
What concerns me is that the loadstamp/timestamp at the end of each filename is different. I have implemented a script that works only for the latest date (e.g. last month), but I want to load all the historical data and I am a bit stuck.
To visualize, my desired/target output is this:
bash-4.4$ ls -tlr
total 0
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 file2017-01.csv
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 file2016-02.csv
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 todo2015-01.csv
-rw-r--r-- 1 21922 21922 0 Sep 10 13:15 fix20150223.csv
How could this bash script be achieved, please? The loading will only take place once; it's historical, as mentioned.
Any help would be extremely useful.
An implementation written for readability rather than terseness might look like:
#!/usr/bin/env bash
for infile in *.log; do
  outfile=${infile%.log}.csv
  if awk -f yourscript <"$infile" >"$outfile"; then
    rm -f -- "$infile"
  else
    echo "Processing of $infile failed" >&2
    rm -f -- "$outfile"
  fi
done
To understand how this works, see:
Globbing -- the mechanism by which *.log is replaced with a list of files with that extension.
The Classic for Loop -- The for infile in syntax, used to iterate over the results of the glob above.
Parameter expansion -- The ${infile%.log} syntax, used to expand the contents of the infile variable with any .log suffix pruned.
Redirection -- the syntax used in <"$infile" and >"$outfile", opening stdin and stdout attached to the named files; or >&2, redirecting logs to stderr. (Thus, when we run awk, its stdin is connected to a .log file, and its stdout is connected to a .csv file).
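Since the question asks for the .csv files to end up in another directory, a small variation of the script above (with a hypothetical /path/to/csv as the destination) might look like:
#!/usr/bin/env bash
outdir=/path/to/csv              # assumption: replace with the real destination directory
mkdir -p -- "$outdir"
for infile in *.log; do
  outfile=$outdir/${infile%.log}.csv
  if awk -f yourscript <"$infile" >"$outfile"; then
    rm -f -- "$infile"
  else
    echo "Processing of $infile failed" >&2
    rm -f -- "$outfile"
  fi
done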
I'm using this bash CGI:
#!/usr/bin/sh
echo "Content-type: text/html"
echo ""
echo `ls -al`
And it produces, for example:
total 52 drwxrwxrwx. 2 root root 4096 Feb 2 18:34 . drwxr-xr-x. 8 root
root 4096 Feb 2 17:58 .. -rw-r--r--. 1 root root 36310 Feb 2 17:45
dds.jpg -rw-rw-rw-. 1 user user 50 Feb 2 18:03 dds_panel.htm
-rwxrwxrwx. 1 user user 460 Feb 2 18:34 test-cgi.cgi
In a terminal they appear each neatly on a single line but in the browser they appear all on the same line. What's the best way to keep the formatting?
If you do not need any html formatting, simply change the content-type to text/plain.
If you need html formatting, your output should contain a complete html page. In this case surround your output with <pre>, replace newlines with <br>, or convert your output into something like a list or table.
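For instance, a minimal sketch of the CGI wrapping the listing in <pre> might look like:
#!/usr/bin/sh
echo "Content-type: text/html"
echo ""
echo "<html><body><pre>"
ls -al
echo "</pre></body></html>"
Note that calling ls -al directly, instead of echo `ls -al`, also keeps the newlines intact, since echoing the unquoted command substitution collapses the whitespace.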
I want to remove duplicate entries from a text file, e.g:
kavitha= Tue Feb 20 14:00 19 IST 2012 (duplicate entry)
sree=Tue Jan 20 14:05 19 IST 2012
divya = Tue Jan 20 14:20 19 IST 2012
anusha=Tue Jan 20 14:45 19 IST 2012
kavitha= Tue Feb 20 14:00 19 IST 2012 (duplicate entry)
Is there any possible way to remove the duplicate entries using a Bash script?
Desired output
kavitha= Tue Feb 20 14:00 19 IST 2012
sree=Tue Jan 20 14:05 19 IST 2012
divya = Tue Jan 20 14:20 19 IST 2012
anusha=Tue Jan 20 14:45 19 IST 2012
You can use sort -u to sort and drop duplicate lines in one step:
$ sort -u input.txt
Or use awk:
$ awk '!a[$0]++' input.txt
This sed one-liner deletes duplicate, consecutive lines from a file (emulating uniq). The first line in a set of duplicate lines is kept; the rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'
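Note that, like uniq, this only removes adjacent duplicates; for input like the sample above, where the duplicate lines are not consecutive, you could sort first (a sketch):
sort input.txt | sed '$!N; /^\(.*\)\n\1$/!P; D'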
A Perl one-liner similar to the awk solution above:
perl -ne 'print if ! $a{$_}++' input
This variation removes trailing whitespace before comparing:
perl -lne 's/\s*$//; print if ! $a{$_}++' input
This variation edits the file in-place:
perl -i -ne 'print if ! $a{$_}++' input
This variation edits the file in-place, and makes a backup input.bak:
perl -i.bak -ne 'print if ! $a{$_}++' input
This might work for you:
cat -n file.txt |
sort -u -k2,7 |
sort -n |
sed 's/.*\t/ /;s/\([0-9]\{4\}\).*/\1/'
or this:
awk '{line=substr($0,1,match($0,/[0-9][0-9][0-9][0-9]/)+3);sub(/^/," ",line);if(!dup[line]++)print line}' file.txt