How to check if a UID is unique using Bash?

I'm trying to write a simple Bash script to check whether there is a non-unique UID in /etc/passwd.
I already wrote a script that displays all the UIDs, but I can't figure out how to compare them and return the non-unique one if it exists.
Here is the script I wrote:
#!/bin/bash
passwd="$(cat /etc/passwd)"
while read -r line; do
    IFS=':' read -ra arrayPasswd <<< "$line"
    echo "${arrayPasswd[2]}"    # field 3 of each passwd entry is the UID
done <<< "$passwd"
How could I compare them and return the non-unique one in case there is one?

If you have a script that outputs all of your UIDs, you can use the sort and uniq programs to get the desired result:
$ your-script | sort | uniq -d
The uniq program reads lines and collapses consecutive repeats. The -d flag makes it print only the duplicated lines and suppress the unique ones. You need sort first so that any repetitions become consecutive.
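Putting it together with the UID extraction from your script, the whole check can be a single pipeline (a sketch; cut -d: -f3 pulls the third field, the UID, straight out of /etc/passwd):
$ cut -d: -f3 /etc/passwd | sort -n | uniq -d
Any UID this prints appears on more than one line of /etc/passwd; no output means every UID is unique.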

With awk:
$ awk -F: 'a[$3]++==1 { print $3 }' /etc/passwd
Explanation:
In an associative array (a.k.a. dictionary), keep a count of occurrences of each UID (field 3).
The test a[$3]++==1 is true the second time a UID is seen, so each duplicated UID is printed exactly once.
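As a quick illustration, here are two minimal, hypothetical passwd-style entries that share UID 1001 (only field 3 matters to the script):
$ printf 'alice:x:1001:1001::/home/alice:/bin/bash\nbob:x:1001:1001::/home/bob:/bin/bash\n' | awk -F: 'a[$3]++==1 { print $3 }'
1001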

Related

Print the nth column from a CSV file, where myScript.sh reads the table and the column names are passed combined in one argument like "col1,col2,col3"

If I have a script which takes three arguments like so:
./myScript.sh path file col1,col3
and if file is like:
id,role,salary
05,engineer,45000
How would I split $3 into separate variables (note that this could be any number of variables, if I had a larger CSV file) in order to print only the columns named in $3?
I've tried saving $3 to a variable and using tr and an array to try to match the array index to the column header number, but I failed to do this. What is the simplest, most beginner-friendly approach to resolving this? It would be straightforward if the script took the columns as separate arguments, but having them combined in one argument is complicating this quite a bit for me.
Expected output:
id,salary
05,45000
If the order of the columns must stay the same:
#!/bin/bash
path="$1"
fname="$2"
cols="$3"
header=($(head -1 "$fname" | sed 's/,/ /g'))    # read the header names into an array
for i in "${!header[@]}"; do
    cols=$(echo "$cols" | sed "s/${header[$i]}/$((i+1))/g")    # replace each column name with its 1-based position
done
cut -d',' -f"$cols" "$fname"
If you need more flexibility e.g. define the order of columns, just change the last part of the script with this:
for i in "${!header[@]}"; do
    cols=$(echo "$cols" | sed -e "s/${header[$i]}/\$$((i+1))/g")    # replace each column name with an awk field reference
done
awk -F, "{print(${cols//,/\",\"})}" "$fname"
Output:
$ ./so.sh <path> input.txt id,salary
id,salary
05,45000
With the awk method, you can do stuff like
$ ./so.sh <path> input.txt id,salary,id,salary
id,salary,id,salary
05,45000,05,45000
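For reference, here is how the second loop rewrites cols for the sample header id,role,salary when the script is called with id,salary (a quick trace of the substitutions):
cols=id,salary    # initial value of $3
cols=$1,salary    # after id is replaced with $1
cols=$1,$3        # after salary is replaced with $3
# awk then effectively runs: awk -F, '{print($1","$3)}' input.txt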

Bash: sort rows within a file by timestamp

I am new to bash scripting and I have written a script that matches a regex and prints the matching lines to a file.
However, each line contains multiple columns, one of which is the timestamp column, in the form YYYYMMDDHHMMSSTTT (down to the millisecond) as shown below.
20180301050630663,ABC,,,,,,,,,,
20180301050630664,ABC,,,,,,,,,,
20180301050630665,ABC,,,,,,,,,,
20180301050630666,ABC,,,,,,,,,,
20180301050630667,ABC,,,,,,,,,,
20180301050630668,ABC,,,,,,,,,,
20180301050630663,ABC,,,,,,,,,,
20180301050630665,ABC,,,,,,,,,,
20180301050630661,ABC,,,,,,,,,,
20180301050630662,ABC,,,,,,,,,,
My code is written as follows:
awk -F "," -v OFS="," '{if($2=="ABC"){print}}' < "$i" >> "$filename"
How can I modify my code such that it can sort the rows by timestamp (YYYYMMDDHHMMSSTTT) in ascending order before printing to file?
You can use a very simple sort command, e.g.
sort yourfile
If you want to ensure sort only looks at the datestamp, you can tell sort to use only the first comma-separated field as the sorting criterion, e.g.
sort -t, -k1,1 yourfile
Example Use/Output
With your data save in a file named log, you could do:
$ sort -t, -k1,1 log
20180301050630661,ABC,,,,,,,,,,
20180301050630662,ABC,,,,,,,,,,
20180301050630663,ABC,,,,,,,,,,
20180301050630663,ABC,,,,,,,,,,
20180301050630664,ABC,,,,,,,,,,
20180301050630665,ABC,,,,,,,,,,
20180301050630665,ABC,,,,,,,,,,
20180301050630666,ABC,,,,,,,,,,
20180301050630667,ABC,,,,,,,,,,
20180301050630668,ABC,,,,,,,,,,
Let me know if you have any problems.
Just add a pipeline.
awk -F "," '$2=="ABC"' < "$i" |
sort -n >> "$filename"
In the general case, to sort on column 234, try sort -t, -k234,234n
Notice also the quoting around "$i", like you already have around "$filename", and the simplifications of the Awk script.
If you are using gawk you can do:
$ awk -F "," -v OFS="," '$2=="ABC"{a[$1]=$0}    # keep lines whose 2nd field is "ABC", keyed by timestamp (a later duplicate timestamp overwrites an earlier one)
    END{                                         # after reading everything, set the traversal order
        PROCINFO["sorted_in"] = "#ind_num_asc"   # GNU awk: iterate indices in ascending numeric order
        for (e in a) print a[e]                  # print the stored lines in timestamp order
    }' file
An alternative is to use sed and sort:
sed -n '/^[0-9]*,ABC,/p' file | sort -t, -k1 -n
Keep in mind that both of these methods are unrelated to the shell used. Bash is just executing the tools (sed, awk, sort, etc) that are otherwise part of the OS.
The sort could also be written in pure Bash, but it would be long and slow.

Concatenate files based on numeric sort of name substring in awk w/o header

I am interested in concatenating many files together based on the number in their names, and also removing the first line of each.
e.g. chr1_smallfiles then chr2_smallfiles then chr3_smallfiles.... etc (each without the header)
Note that chr10_smallfiles needs to come after chr9_smallfiles -- that is, this needs to be numeric sort order.
When I run the two commands awk and ls -v1 separately, each does its job properly, but when I put them together it doesn't work. Please help, thanks!
awk 'FNR>1' | ls -v1 chr*_smallfiles > bigfile
The issue is with the way that you're trying to pass the list of files to awk. At the moment, you're piping the output of awk to ls, which makes no sense.
Bear in mind that, as mentioned in the comments, ls is a tool for interactive use, and in general its output shouldn't be parsed.
If sorting weren't an issue, you could just use:
awk 'FNR > 1' chr*_smallfiles > bigfile
The shell will expand the glob chr*_smallfiles into a list of files, which are passed as arguments to awk. For each filename argument, all but the first line will be printed.
Since you want to sort the files, things aren't quite so simple. If you're sure the full range of files exist, just replace chr*_smallfiles with chr{1..99}_smallfiles in the original command.
Using some Bash-specific and GNU sort features, you can also achieve the sorting like this:
printf '%s\0' chr*_smallfiles | sort -z -n -k1.4 | xargs -0 awk 'FNR > 1' > bigfile
printf '%s\0' prints each filename followed by a null-byte
sort -z sorts records separated by null-bytes
-n -k1.4 does a numeric sort, starting from the 4th character (the numeric part of the filename)
xargs -0 passes the sorted, null-separated output as arguments to awk
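If you want to see what the sorted list looks like before it is handed to awk, you can make the NUL separators visible with tr (just a quick sanity check, not part of the final command):
$ printf '%s\0' chr*_smallfiles | sort -z -n -k1.4 | tr '\0' '\n'
The file names should come out in ascending numeric order, with chr9_smallfiles before chr10_smallfiles.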
Otherwise, if you want to go through the files in numerical order, and you're not sure whether all the files exist, then you can use a shell loop (although it'll be significantly slower than a single awk invocation):
for file in chr{1..99}_smallfiles; do # 99 is the maximum file number
[ -f "$file" ] || continue # skip missing files
awk 'FNR > 1' "$file"
done > bigfile
You can also use tail to concatenate all the files without header
tail -q -n+2 chr*_smallfiles > bigfile
In case you want to concatenate the files in a natural sort order as described in your question, you can pipe the result of ls -v1 to xargs using
ls -v1 chr*_smallfiles | xargs -d $'\n' tail -q -n+2 > bigfile
(Thanks to Charles Duffy) xargs -d $'\n' sets the delimiter to a newline \n in case the filename contains white spaces or quote characters
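To illustrate the natural (version) sort that -v1 gives, with the file names from the question the listing comes out with chr9_smallfiles before chr10_smallfiles, something like:
$ ls -v1 chr*_smallfiles
chr1_smallfiles
chr2_smallfiles
...
chr9_smallfiles
chr10_smallfiles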
Using a bash 4 associative array to extract only the numeric substring of each filename; sort those individually; and then retrieve and concatenate the full names in the resulting order:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "Requires bash 4.0 or newer" >&2; exit 1;; esac
# when this is done, you'll have something like:
# files=( [1]=chr_smallfiles1.txt
# [10]=chr_smallfiles10.txt
# [9]=chr_smallfiles9.txt )
declare -A files=( )
for f in chr*_smallfiles.txt; do
files[${f//[![:digit:]]/}]=$f
done
# now, emit those indexes (1, 10, 9) to "sort -n -z" to sort them as numbers
# then read those numbers, look up the filenames associated, and pass to awk.
while read -r -d '' key; do
awk 'FNR > 1' <"${files[$key]}"
done < <(printf '%s\0' "${!files[@]}" | sort -n -z) >bigfile
You can do it with a for loop like the one below, which works for me:
for file in chr*_smallfiles
do
tail -n +2 "$file" >> bigfile
done
How does it work? The for loop picks up all the files in the current directory matching the wildcard pattern chr*_smallfiles and assigns each file name to the variable file; tail -n +2 "$file" then outputs every line of that file except the first and appends them to bigfile. So finally all the files are merged (except the first line of each) into one file, bigfile.
Just for completeness, how about a sed solution?
for file in chr*_smallfiles
do
sed -n '2,$p' "$file" >> bigfile
done
Hope it helps!

Shell cut delimiter before last

I'm trying to cut a string (the name of a file) from which I have to get a variable part of the name.
The thing is, I have to put that part into a shell variable; up to that point it's OK.
Here are examples of the file names I have to handle:
NAME_OF_THE_FILE_VARIABLEiWANTtoGET_DATE
NAMEfile_VARIABLEiWANT_DATE
NAME_FILE_VARIABLEiWANT_DATE
The position of the part I want can change, but it will always be the one before last. The delimiter is "_".
Is there a way to count the size of the array to get size-1 or something like that?
OBS: when I cut strings I always use something like this:
VARIABLEiWANT=`echo "$FILENAME" | cut -f 1 -d "_"`
awk -F'_' '{print $(NF-1)}' file
or you have a string
awk -F'_' '{print $(NF-1)}' <<< "$FILENAME"
Save the output of the above one-liner into your variable.
IFS=_ read -r -a array <<< "$FILENAME"
variable_i_want=${array[${#array[@]}-2]}
It's a bit of a mess visually, but it's more efficient than starting a new process. ${#array[@]} is the number of elements read from FILENAME, so the indices of the array range from 0 to ${#array[@]}-1.
As of bash 4.3, though, you can use a negative index instead of computing it.
variable_i_want=${array[-2]}
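A quick sketch with one of the sample names from the question, showing both forms:
FILENAME="NAME_FILE_VARIABLEiWANT_DATE"
IFS=_ read -r -a array <<< "$FILENAME"
echo "${#array[@]}"                # 4 fields: NAME FILE VARIABLEiWANT DATE
echo "${array[${#array[@]}-2]}"    # VARIABLEiWANT (index 4-2=2)
echo "${array[-2]}"                # VARIABLEiWANT (bash 4.3+)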
If you need POSIX compatibility (no arrays), then
tmp=${FILENAME%_${FILENAME##*_}} # FILENAME with last field removed
variable_i_want=${tmp##*_} # last field of tmp
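Traced with the same sample name, the two expansions work out like this:
FILENAME="NAME_FILE_VARIABLEiWANT_DATE"
# ${FILENAME##*_}   -> DATE                       (last field)
# tmp               -> NAME_FILE_VARIABLEiWANT    (FILENAME with _DATE removed)
# ${tmp##*_}        -> VARIABLEiWANT              (last field of tmp)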
Just got it... I found someone using it with cat; I got it to work with echo... and rev. I didn't really understand the rev part, but I think it reverses things around the delimiter.
CODIGO=`echo "$ARQ_NAME" | rev | cut -d "_" -f 2 | rev `
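For what it's worth, rev reverses the characters of each line, so the field that was one-before-last becomes field 2, which cut can grab; the trailing rev turns the result back around. A quick trace with one of the sample names:
ARQ_NAME="NAME_FILE_VARIABLEiWANT_DATE"
echo "$ARQ_NAME" | rev                            # ETAD_TNAWiELBAIRAV_ELIF_EMAN
echo "$ARQ_NAME" | rev | cut -d "_" -f 2          # TNAWiELBAIRAV
echo "$ARQ_NAME" | rev | cut -d "_" -f 2 | rev    # VARIABLEiWANT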

Loop over list of files to merge according their names

Files in the directory look like that:
A_1_email.txt
A_1_phone.txt
A_2_email.txt
A_2_phone.txt
B_1_email.txt
B_1_phone.txt
B_2_email.txt
B_2_phone.txt
What I want:
To merge files A_1_email.txt and A_1_phone.txt; to merge files B_1_email.txt and B_1_phone.txt and so on.
What I mean by that: if the first two fields of the file names match (for example A to A; 1 to 1), then merge the files.
How I tried to do this:
ls * | cut -d "_" -f 1-2 | sort | uniq -c | awk '{print $2}' > names && for name in
$(cat names); do
And I am lost here; I really don't know how I should go on from there.
The following are based on @MichaelJ.Barber's answer (which had the excellent idea of using join), but with the specific intention to avoid the dangerous practice of parsing the output of ls:
# Simple loop: avoids subshells, pipelines
for file in *_email.txt; do
if [[ -r "$file" && -r "${file%_*}_phone.txt" ]]; then
join "$file" "${file%_*}_phone.txt"
fi
done
or
##
# Use IFS and a function to avoid contaminating the global environment.
joinEmailPhone() {
local IFS=$'\n'
local -x LC_COLLATE=C # Ensure glob expansion sorting makes sense.
# According to `man (1) bash`, globs expand sorted "alphabetically".
# If we use LC_COLLATE=C, we don't need to sort again.
# Use an awk test (!seen[$0]++) to ensure uniqueness and a parameter expansion instead of cut
awk '!seen[$0]++{ printf("join %s_email.txt %s_phone.txt\n", $1, $1) }' <<< "${*%_*}" | sh
}
joinEmailPhone *
But in all probability (again assuming LC_COLLATE=C) you can get away with:
printf 'join %s %s\n' * | sh
I'll assume that the files all have tab-separated name-value pairs, where the value is email or phone as appropriate. If that's not the case, do some pre-sorting or otherwise modify as appropriate.
ls *_{email,phone}.txt |
cut -d "_" -f1-2 | # could also do this with variable expansion
sort -u |
awk '{ printf("join %s_email.txt %s_phone.txt\n", $1, $1) }' |
sh
What this does is to identify the unique prefixes for the files and use 'awk' to generate shell commands for joining the pairs, which are then piped into sh to actually run the commands.
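With the file names from the question, dropping the final | sh shows the commands that would be run, something like:
$ ls *_{email,phone}.txt | cut -d "_" -f1-2 | sort -u | awk '{ printf("join %s_email.txt %s_phone.txt\n", $1, $1) }'
join A_1_email.txt A_1_phone.txt
join A_2_email.txt A_2_phone.txt
join B_1_email.txt B_1_phone.txt
join B_2_email.txt B_2_phone.txt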
You may use printf '%s\n' *_{email,phone}.txt | ... instead of ls *_{email,phone}.txt | ... in the given scenario, i.e. when no newline characters are to be expected in the file path names. At least one external command less!
Use a for loop to iterate over the email files, using the read command
with the proper value of IFS to split the file name into the necessary parts.
Note that this does use one non-POSIX feature that bash provides, namely
using a here-string (<<<) to pass the value of $email to the read command.
for email in *_email.txt; do
    IFS=_ read -r fst snd rest <<< "$email"    # e.g. fst=A, snd=1, rest=email.txt
    phone=${fst}_${snd}_phone.txt
    # merge $email and $phone
done
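One possible way to fill in the merge step (a sketch only; it assumes the pairs can be merged with join as in the first answer, and the merged file name is just an illustrative choice):
for email in *_email.txt; do
    IFS=_ read -r fst snd rest <<< "$email"
    phone=${fst}_${snd}_phone.txt
    [ -r "$phone" ] && join "$email" "$phone" > "${fst}_${snd}_merged.txt"
done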
