awk 'uniq' on a range of columns - sorting

I'm trying to filter all duplicates out of a list while ignoring the first n columns, preferably using awk (but I'm open to other implementations).
I've found a solution for a fixed number of columns, but since I don't know how many columns there will be, I need a range.
For clarity:
What I'm trying to achieve is an alias for history that filters out duplicates but leaves the history_id intact, preferably without changing the order.
The history is in this form
ID DATE HOUR command
5612 2019-07-25 11:58:30 ls /var/log/schaubroeck/audit/2019/May/
5613 2019-07-25 12:00:22 ls /var/log/schaubroeck/
5614 2019-07-25 12:11:30 ls /etc/logrotate.d/
5615 2019-07-25 12:11:35 cat /etc/logrotate.d/samba
5616 2019-07-25 12:11:49 cat /etc/logrotate.d/named
So this command works for commands up to four arguments long, but I need to replace the fixed columns by a range to account for all cases:
history | awk -F "[ ]" '!keep[$4 $5 $6 $7]++'
I feel @kvantour is getting me on the right path, so I tried:
history | awk '{t=$0;$1=$2=$3=$4="";k=$0;$0=t}_[k]++' | grep cd
But this still yields duplicate lines:
1102 2017-10-27 09:05:07 cd /tmp/
1109 2017-10-27 09:07:03 cd /tmp/
1112 2017-10-27 09:07:15 cd nagent-rhel_64/
1124 2017-11-07 16:38:50 cd /etc/init.d/
1127 2017-12-29 11:13:26 cd /tmp/
1144 2018-06-21 13:04:26 cd /etc/init.d/
1161 2018-06-28 09:53:21 cd /etc/init.d/
1169 2018-07-09 16:33:52 cd /var/log/
1179 2018-07-10 15:54:32 cd /etc/init.d/

You can use sort:
history | sort -u -k4
-u keeps only one line per unique key.
-k4 builds the sort key from the fourth column through the end of the line.
Running this on
1102 2017-10-27 09:05:07 cd /tmp/
1109 2017-10-27 09:07:03 cd /tmp/
1112 2017-10-27 09:07:15 cd nagent-rhel_64/
1124 2017-11-07 16:38:50 cd /etc/init.d/
1127 2017-12-29 11:13:26 cd /tmp/
1144 2018-06-21 13:04:26 cd /etc/init.d/
1161 2018-06-28 09:53:21 cd /etc/init.d/
1169 2018-07-09 16:33:52 cd /var/log/
1179 2018-07-10 15:54:32 cd /etc/init.d/
yields:
1124 2017-11-07 16:38:50 cd /etc/init.d/
1112 2017-10-27 09:07:15 cd nagent-rhel_64/
1102 2017-10-27 09:05:07 cd /tmp/
1169 2018-07-09 16:33:52 cd /var/log/
EDIT: If you want to keep the original order, apply a second, numeric sort on the ID:
history | sort -u -k4 | sort -n

The command you propose will not work as you expect. Imagine you have two lines like:
a b c d 12 13 1
x y z d 1 21 31
Both lines will be considered duplicates, because the fields are concatenated without a separator and the resulting array key is d12131 for both.
This is probably what you are interested in:
$ history | awk '{t=$0;$1=$2=$3="";k=$0;$0=t}!_[k]++'
Here we store the original record in the variable t, then remove the first three fields by assigning them empty values. That assignment rebuilds the record $0, which we save as the key k, and afterwards we restore the record from t. The check !_[k]++ then uses the key k, which holds all fields except the first three, so only the first line seen for each key is printed.
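As a minimal sketch, the same idea can be wrapped in a function and run on part of the sample from the question. The function name dedup_history and the array name seen are illustrative and not part of the original answer.

```shell
# Print only the first line for each distinct command, ignoring the
# first three fields (ID, date, time). Input order is preserved.
dedup_history() {
  awk '{ t = $0; $1 = $2 = $3 = ""; k = $0; $0 = t } !seen[k]++'
}

printf '%s\n' \
  '1102 2017-10-27 09:05:07 cd /tmp/' \
  '1109 2017-10-27 09:07:03 cd /tmp/' \
  '1112 2017-10-27 09:07:15 cd nagent-rhel_64/' \
  '1124 2017-11-07 16:38:50 cd /etc/init.d/' \
  '1127 2017-12-29 11:13:26 cd /tmp/' | dedup_history
```

On this input only the lines with IDs 1102, 1112 and 1124 are printed. Because the key is rebuilt with the output field separator, runs of blanks inside the command collapse to single spaces in the key.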
Note: setting the field separator with -F" " does not set it to a single space, but to any sequence of blanks (spaces and tabs). This is also the default behaviour. If you want a single space as the separator, use -F"[ ]".
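The difference between the two separators can be checked with a line that contains two consecutive spaces. This is only an illustrative sketch; the function names are made up for the example.

```shell
# Count fields using the default-style separator (any run of blanks).
count_fields_blank_runs() {
  awk -F' ' '{ print NF }'
}

# Count fields when every single space is a separator.
count_fields_single_space() {
  awk -F'[ ]' '{ print NF }'
}

printf 'a  b\n' | count_fields_blank_runs     # two fields: "a" and "b"
printf 'a  b\n' | count_fields_single_space   # three fields: "a", "" and "b"
```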

Terminal Piping and Writing to File

I am trying to copy the first two items in my 'Downloads' directory using only the terminal.
I open up zsh, cd into my 'Downloads' directory and start typing.
The below reflects what is shown in the terminal:
% ls -lt | head -3
file1.csv
file2.csv (exactly the files I want)
% ls -lt | head -3 > ToBeCopied.txt
% vim ToBeCopied.txt
total 24625744
-rw-r--r-- 1 Aaron staff 0 22 Apr 15:28 ToBeCopied.txt
-rw-r--r--@ 1 Aaron staff 42042 22 Apr 15:16 file1.csv
What happened to file2.csv?

How to sync the modification date of folders within two directories that are the same?

I have a Dropbox folder on one computer with all the original modification dates. Recently, after I transferred my data onto another computer, a .DS_Store issue changed the "Date Modified" of some folders to today. I am trying to write a script that takes the original modification date of a folder, finds the corresponding folder on my new computer, and changes its date using touch. The idea is to use stat and touch -mt to do this. Does anyone have any suggestions or better thoughts? Thanks.
Use one folder as the reference for another with --reference=SOURCE (a GNU touch option; the short form -r is also accepted by the BSD touch shipped with macOS):
$ cd "$(mktemp --directory)"
$ touch -m -t 200112311259 ./first
$ touch -m -t 200201010000 ./second
$ ls -l | sed "s/${USER}/user/g"
total 0
-rw-r--r-- 1 user user 0 Dec 31 2001 first
-rw-r--r-- 1 user user 0 Jan 1 2002 second
$ touch -m --reference=./first ./second
$ ls -l | sed "s/${USER}/user/g"
total 0
-rw-r--r-- 1 user user 0 Dec 31 2001 first
-rw-r--r-- 1 user user 0 Dec 31 2001 second
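To apply this to a whole tree, one option is to walk the reference tree and touch every directory that also exists in the target. The following is a sketch under assumptions: both trees must be reachable from the same machine (for example via a mounted drive), directory names must not contain newlines, and the function name sync_dir_mtimes is hypothetical.

```shell
# Copy the modification time of every directory under "$1" onto the
# directory with the same relative path under "$2", if it exists.
sync_dir_mtimes() {
  src=$1
  dst=$2
  # List directories relative to the source root, then touch each match.
  (cd "$src" && find . -type d -print) | while IFS= read -r rel; do
    if [ -d "$dst/$rel" ]; then
      touch -m -r "$src/$rel" "$dst/$rel"
    fi
  done
}
```

It would be called as sync_dir_mtimes /path/to/original /path/to/copy, where both paths are placeholders. Changing a directory's timestamp does not modify its parent, so the order in which directories are visited does not matter.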

mkdir doesn't do path expansion

So I have folder aa
$ mkdir aa
and path expansion for ls command works like this:
$ ls -la a*
total 0
drwxr-xr-x 1 a a 0 Mar 29 08:41 ./
drwxr-xr-x 1 a a 0 Dec 31 1979 ../
$ ls -la a?
total 0
drwxr-xr-x 1 a a 0 Mar 29 08:41 ./
drwxr-xr-x 1 a a 0 Dec 31 1979 ../
But "the same" for mkdir shows an error:
$ mkdir a*/bb
mkdir: cannot create directory 'a*/bb': No such file or directory
$ mkdir a?/bb
mkdir: cannot create directory 'a?/bb': No such file or directory
Where can I read why this difference in behavior happens and is there simple trick to let mkdir be "smarter" for behavior like in ls?
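This does not work, because wildcard expansion is done by the shell before the argument is passed to mkdir, and a pattern only expands to paths that already exist. bash tries to expand a*/bb, finds no match because aa/bb does not exist yet, and by default passes the pattern through unchanged. mkdir therefore receives the literal string a*/bb and fails because there is no directory named a*. You can also try e.g.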
echo a*/bb
or as you did before
ls -la a*/bb
Neither command finds a match either: echo prints the pattern a*/bb unchanged, and ls reports that a*/bb does not exist.
Now I realize how stupid that question was. Probably I wanted something like this for expansion to work:
mkdir "$(ls -d a?)"/bb
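Try the following (note that in bash with default settings, if the pattern matches nothing it is passed literally, and -p then creates a directory literally named a* or a?):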
mkdir -p a*/aa
mkdir -p a?/aa
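One way to get the intended behaviour is to let the shell expand only the part of the path that already exists and append the new name afterwards. This is a minimal sketch; the function name make_bb is made up for the example, and the names a? and bb are taken from the question.

```shell
# Create a "bb" subdirectory inside every existing directory matching a?
make_bb() {
  for d in a?/; do
    # With no match the pattern stays literal, so skip non-directories.
    [ -d "$d" ] || continue
    mkdir -p "${d}bb"
  done
}
```

The trailing slash in a?/ restricts the match to directories, and each match already ends in a slash, so "${d}bb" becomes for example aa/bb.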

cd command fails when directory is extracted from windows file

I have a text file on Windows that contains a lot of directory names that I need to extract.
I extracted one directory and tried to cd to it in a shell script, but the cd command failed with the message cd: /VAR/GPIO/: No such file or directory.
I have confirmed that the directory exists on my local PC and that the path is correct (though it is relative). I have also searched a lot, and it seems that some special Windows characters exist in the extracted text. I tried to see them with cat -A check and the result is ^[[m^[[K^[[m^[[KVAR/GPIO/$
I don't even know what the ^[[m or ^[[K mean.
Could you please help me with this problem? I use Cygwin on Windows 7 64-bit.
Below is my related code for review:
templt_dir=$(cat temp | grep -m 1 "$templt_name" |head -1 | sed -n "s#$templt_name##p" | sed -n "s#\".*##p")
echo $templt_dir ###comment, it runs output: /VAR/GPIO/, that's correct!
cd $templt_dir ###comment, cd error prompts
cat temp | grep -m 1 "$templt_name" |head -1 | sed -n "s#$templt_name##p" | sed -n "s#\".*##p" > check ###comment, for problem checking
Below is the content of the check file:
$ cat -A check
^[[m^[[K^[[m^[[KVAR/GPIO/$
To confirm my directory is correct, below is the results of ls -l on /VAR:
$ ls VAR -l
total 80K
drwxrwx---+ 1 Administrators Domain Users 0 Jun 24 11:11 Analog/
drwxrwx---+ 1 Administrators Domain Users 0 Jun 24 11:37 Communication/
drwxrwx---+ 1 Administrators Domain Users 0 Jun 24 11:10 GPIO/
drwxrwx---+ 1 Administrators Domain Users 0 Jun 24 11:11 HumanInterface/
drwxrwx---+ 1 Administrators Domain Users 0 Jun 24 11:11 Memory/
drwxrwx---+ 1 Administrators Domain Users 0 Jun 24 11:11 PWM/
drwxrwx---+ 1 Administrators Domain Users 0 Jun 24 11:10 Security/
drwxrwx---+ 1 Administrators Domain Users 0 Jun 24 11:11 System/
drwxrwx---+ 1 Administrators Domain Users 0 Jun 25 16:25 Timers/
drwxrwx---+ 1 Administrators Domain Users 0 Jun 24 11:10 UniversalDevice/
The error message cd: /VAR/GPIO/: No such file or directory indicates that the name stored in $templt_dir doesn’t exist.
This is because the string contains non-printing ANSI escape sequences (the ^[[m and ^[[K shown by cat -A). They are invisible when the string is echoed to the terminal, but they are still part of the name passed to cd.
You need to remove these characters from the string containing the directory.
I found the following sed substitution in this Unix and Linux answer:
sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]//g"
which you should include in your pipe command:
templt_dir=$(grep -m 1 "$templt_name" temp | sed -n "s#$templt_name##p; s#\".*##p" | sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]//g")
Note: I concatenated your two sed substitutions into one command and removed the unnecessary cat. I also removed the redundant head -1, since grep -m 1 should output only one line. You can probably combine all the sed substitutions into one: sed -r "s#$templt_name##; s#\".*##; s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]//g" (the -n sed option and p sed command can be left out if there’s only one line being processed, but I can’t test this without having the original file).
Other ways of using sed to strip ANSI escape sequences are listed at Remove color codes (special characters) with sed.
However, a better long-term fix would be to modify the process which creates the text file listing the directories to not include ANSI Escape codes in its output.
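The stripping step can be tried in isolation on the exact bytes shown by cat -A. The sketch below is a variant, not the command from the linked answer: it builds the ESC character with printf because \x1B is a GNU sed extension, and it uses a slightly broader pattern (any digits and semicolons before m or K). The function name strip_ansi is illustrative.

```shell
# Remove ANSI "ESC [ ... m" and "ESC [ ... K" sequences from stdin.
strip_ansi() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*[mK]//g"
}

# Reproduce the content of the check file and clean it.
printf '\033[m\033[K\033[m\033[KVAR/GPIO/\n' | strip_ansi
```

This prints VAR/GPIO/ with no escape sequences. If the sequences turn out to come from coloured grep output (an assumption, the question does not say), running grep with --color=never may avoid them in the first place.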

How to use a shell command to check SVN checked-out files and create links in a directory

Basically, my question is how to use bash shell commands to do the following automatically, so that I can track modified files easily:
list the svn checked-out files
create links to those files in a directory called "change"
laptop$ svn status -q
M rcms/src/config/ta_show.c
M rcms/src/config/ta_config.c
laptop$ cd change
laptop$ ln -s ../rcms/src/config/ta_show.c ta_show.c
laptop$ ln -s ../rcms/src/config/ta_config.c ta_config.c
laptop$ ls
lrwxrwxrwx 1 root root 59 Nov 27 12:24 ta_show.c -> ../rcms/src/config/ta_show.c
lrwxrwxrwx 1 root root 59 Nov 27 12:24 ta_config.c -> ../rcms/src/config/ta_config.c
I am thinking of using a shell command like the one below:
$ svn status -q | sed 's/M //' | xargs -I xxx ln -s ***BETWEEN REAL FILE AND BASE FILENAME***
There are two things you need to take care of:
skipping the empty lines between the files in the svn status output
extracting the file name
This awk one-liner can do it:
awk '$0{x=$2;gsub(".*/","",x);print "ln -s ../"$2" "x}'
If you pipe your svn status output into the line above, it prints the ln -s command lines for you.
If you want the ln -s lines to be executed, you can either pipe the output to sh (svn status|awk ...|sh) or replace the print with system.
Finally, here is the output as an example:
kent$ echo "M rcms/src/config/ta_show.c
M rcms/src/config/ta_config.c"|awk '$0{x=$2;gsub(".*/","",x);print "ln -s ../"$2" "x}'
ln -s ../rcms/src/config/ta_show.c ta_show.c
ln -s ../rcms/src/config/ta_config.c ta_config.c
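As an alternative to generating commands and piping them to sh, the links can be created directly in a loop. This is a sketch under assumptions: each status line has a single status letter followed by the path, as in the example above, the target directory sits directly inside the working copy root, and the function name link_modified is hypothetical.

```shell
# Read "svn status -q" style lines from stdin and create, inside the
# directory "$1", a relative symlink to each listed file.
link_modified() {
  dest=$1
  mkdir -p "$dest"
  while read -r status path; do
    # Skip empty lines and lines without a path.
    [ -n "$path" ] || continue
    ln -s "../$path" "$dest/$(basename "$path")"
  done
}
```

From the working copy root it would be used as svn status -q | link_modified change. Quoting the variables keeps paths with spaces intact, which the generated ln -s command lines would not.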
