I am writing a script to filter GIDs greater than or equal to 1000 and GIDs less than 1000. The purpose is to separate local groups from non-local groups (groups coming from AD) in a file.
There is a file called groups.out which contains group names and GIDs, in any order. Below is a sample of the file, which contains local groups, non-local groups, and GIDs.
1098052
1098051
domain users
fuse
gdm
haldaemon
and here is the logic I want to apply:
Read line by line from the file.
If the line is a number:
if the number is greater than or equal to 1000, append it to the file
else if the number is less than 1000, discard it
else if an error occurs, append the error to a file, break the loop, and exit.
If the line is a string, look up the GID of the string/group:
if the GID is greater than or equal to 1000, append it to the file
else if the GID is less than 1000, discard it
else if an error occurs, append the error to a file, break the loop, and exit.
I want to repeat this in the loop line by line, and if an error occurs anywhere the loop should break and the entire script should exit.
After successful execution of the loop it should print success; if any error occurs, it should exit and append the errors to the file.
Below is my uncooked code with many parts missing. There are also many errors around the greater-than / equal tests, so you can ignore those.
fileA="groups.out"
value=1000
re='[a-z]'
num='[0-9]'
while IFS= read lineA
do
    group=$(getent group "$lineA" | awk -F: '{print $3}')
    # ------ Don't know how to check if a number or string -----
    if [ "$group" -gt "$value" ]; then
        echo "$lineA" >> ldapgroups.out 2>> error.out
    elif [ "$group" -lt "$value" ]; then
        echo "$lineA" >> /dev/null 2>> error.out
    else
        echo " FAILED"
        exit 1
    fi
#!/bin/bash
fileA="groups.out"
value=1000
num='^[0-9]+$'
while IFS= read -r lineA
do
    # check if the line is numbers only
    if [[ $lineA =~ $num ]]; then
        echo "This is a number"
        echo "$lineA"
        # check if $lineA is greater than or equal to 1000
        if [[ $lineA -ge $value ]]; then
            # write it to the file named numbers.out
            echo "number is greater than or equal to 1000, writing to file"
            echo "$lineA" >> numbers.out
        else
            echo "less than 1000, skipping"
        fi
    # if it's not a number it's a group name, so no need to check it with a regex
    else
        # do whatever you want with group names here ...
        echo "string"
        echo "$lineA"
    fi
# this is where you feed the file to the while loop
done < "$fileA"
This is a corrected version of your script; it should get you going.
chmod +x scriptfile and use bash scriptfile to run it, or schedule it in crontab.
Since your information about how to match group names with GIDs isn't sufficient, I left that part out of the script, but you should be able to finish it with the information provided in the other parts.
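As a starting point for that part, here is a minimal sketch of the group-name branch (assuming getent can resolve the names; ldapgroups.out and error.out are the file names from your pseudocode):

# inside the else branch above, for lines that are group names
gid=$(getent group "$lineA" | awk -F: '{print $3}')
if [ -z "$gid" ]; then
    # lookup failed: log the error and stop the whole script
    echo "no GID found for group: $lineA" >> error.out
    exit 1
elif [ "$gid" -ge "$value" ]; then
    echo "$lineA" >> ldapgroups.out
fi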
This really looks like you want two separate scripts. Finding numbers in a particular range is simple with Awk.
awk '!/[^0-9]/ && ($1 >= 1000)' groups.out
The regular expression selects all-numeric input lines (or more properly, it excludes lines which contain a non-numeric character anywhere within them), and the numeric comparison requires the first field to be 1000 or more. (The default action of Awk is to print the entire line when the conditions in your script are true, so we can omit the action and rely on the implicit {print}.)
If you also want to extract the numbers which are less than 1000 to a separate file, the change should be obvious.
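For example, to send the numbers below 1000 to a separate file (small.out is an illustrative name):

awk '!/[^0-9]/ && ($1 < 1000)' groups.out > small.out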
For the non-numeric values, we can do
grep '[^0-9]' groups.out |
xargs -d '\n' getent group |
awk -F : '$3 >= 1000 { print $3 }'
Here group names the database for getent to query, and -d '\n' makes xargs pass each input line as a single argument, so group names containing spaces (like domain users) stay intact.
Several of the branches in your pseudocode seem superfluous. It's not clear in what situation you would expect an error to occur, or how the action you specify in the error situation would help you diagnose or recover from the error (write access denied, disk full?), so I have not spent any energy on trying to implement those parts.
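Putting the two pieces together into one script (ldapgroups.out is the output file name from your pseudocode; whether you print the group name, $1, or the GID, $3, is up to you):

#!/bin/bash
# numeric lines with a value of 1000 or more
awk '!/[^0-9]/ && ($1 >= 1000)' groups.out > ldapgroups.out
# group names whose GID is 1000 or more (assumes getent can resolve them)
grep '[^0-9]' groups.out |
xargs -d '\n' getent group |
awk -F : '$3 >= 1000 { print $1 }' >> ldapgroups.out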
Related
I have a file which contains various data (date, time, speed, distance from the front, distance from the back). The file looks like this, just with more rows:
2003.09.23.,05:05:21:64,134,177,101
2009.03.10.,17:46:17:81,57,102,57
2018.01.05.,00:30:37:04,354,145,156
2011.07.11.,23:21:53:43,310,125,47
2011.06.26.,07:42:10:30,383,180,171
I'm trying to write a simple Bash program which prints the dates and times when the 'distance from the front' is less than the provided parameter ($1).
So far I wrote:
#!/bin/bash
if [ $# -eq 0 -o $# -gt 1 ]
then
    echo "wrong number of parameters"
fi
i=0
fdistance=()
input='auto.txt'
while IFS= read -r line
do
    year=${line::4}
    month=${line:5:2}
    day=${line:8:2}
    hour=${line:12:2}
    min=${line:15:2}
    sec=${line:18:2}
    hthsec=${line:21:2}
    fdistance=$(cut -d, -f 4)
    if [ "$fdistance[$i]" -lt "$1" ]
    then
        echo "$year[$i]:$month[$i]:$day[$i],$hour[$i]:$min[$i]:$sec[$i]:$hthsec[$i]"
    fi
    i=`expr $i + 1`
done < "$input"
but this gives the error "whole expression required" and doesn't work at all.
If you have the option of using awk, the entire process can be reduced to:
awk -F, -v dist=150 '$4<dist {split($1,d,"."); print d[1]":"d[2]":"d[3]","$2}' file
In the example above, any record with a distance (field 4, $4) less than the dist variable value takes the date field (field 1, $1) and splits it with split() into the array d on ".", where the first 3 elements will be year, mo, day. It then simply prints those three elements separated by ":" (which eliminates the stray "." at the end of the field). The time (field 2, $2) is output unchanged.
Example Use/Output
With your sample data in file, you can do:
$ awk -F, -v dist=150 '$4<dist {split($1,d,"."); print d[1]":"d[2]":"d[3]","$2}' file
2009:03:10,17:46:17:81
2018:01:05,00:30:37:04
2011:07:11,23:21:53:43
Which provides the records in the requested format where the distance is less than 150. If you call awk from within your script you can pass the 150 in from the 1st argument to your script.
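For instance, wrapped in a script (a sketch; auto.txt is the input file from your question):

#!/bin/bash
# pass the script's first argument through to awk's dist variable
awk -F, -v dist="$1" '$4<dist {split($1,d,"."); print d[1]":"d[2]":"d[3]","$2}' auto.txt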
You can also accomplish this task by substituting a ':' for each '.' in the first field with gsub() and outputting a substring of the first field with substr() that drops the last character, e.g.
awk -F, -v dist=150 '$4<dist {gsub(/[.]/,":",$1); print substr($1,1,length($1)-1) "," $2}' file
(same output)
While parsing the data is a great exercise for learning string handling in shell or bash, in practice awk will be orders of magnitude faster than a shell script. Processing a million-line file, the difference in runtime can be seconds with awk compared to minutes (or hours) with a shell script.
If this is an exercise to learn string handling in your shell, just put this in your hip pocket for later: awk is the real Swiss Army knife for text processing. (Well worth the effort to learn.)
Would you try the following:
#!/bin/bash
if (( $# != 1 )); then
    echo "usage: $0 max_distance_from_the_front" >&2    # output the error message to stderr
    exit 1
fi
input="auto.txt"
while IFS=, read -r mydate mytime speed fdist bdist; do    # split the csv and assign variables
    mydate=${mydate%.}; mydate=${mydate//./:}              # reformat the date string
    if (( fdist < $1 )); then                              # if the front distance is less than $1
        echo "$mydate,$mytime"                             # then print the date and time
    fi
done < "$input"
Sample output with the same parameter as Keldorn:
$ ./test.sh 130
2009:03:10,17:46:17:81
2011:07:11,23:21:53:43
There are a few odd things in your script:
Why is fdistance an array? It is not necessary (and here done wrong), since the file is read line by line.
What is the cut in the line fdistance=$(cut -d, -f 4) supposed to cut? What is its input?
(Note: when the parameters are invalid, it is better to end the script right away. Added in the example below.)
Here is a working version (apart from the parsing of the date, but that is not what your question was about so I skipped it):
#!/usr/bin/env bash
if [ $# -eq 0 -o $# -gt 1 ]
then
    echo "wrong number of parameters"
    exit 1
fi
input='auto.txt'
while IFS= read -r line
do
    fdistance=$(echo "$line" | awk '{split($0,a,","); print a[4]}')
    if [ "$fdistance" -lt "$1" ]
    then
        echo "$line"
    fi
done < "$input"
Sample output:
$ ./test.sh 130
2009.03.10.,17:46:17:81,57,102,57
2011.07.11.,23:21:53:43,310,125,47
$
In the code below I am attempting to prompt the user to search for: 1) a number, 2) an even or odd number, 3) a big or small number. If the user's number exists and they have entered "odd" and "small" at the above prompts, then I wish to simply output all of the numbers within document.txt. P.S. I know this does not make much sense, as you would expect a specific search for odd and small numbers rather than just echoing all numbers from the file, but this is what I'm doing.
#!/bin/bash
file1=document.txt
read -p 'Enter the number to be searched for: ' num
read -p 'Type "even" for an even number match or "odd" for an odd number match: ' num_type
read -p 'Type "big" for a big number or "small" for a small number: ' num_size
if grep "$num" $file1 && [[ "${num_type}" == "odd" ]] && [[ "${num_size}" == "small" ]]; then
    echo $(grep "$num" "$file1") result
else
    echo "lol"
fi
The issue I have: if the above prompts are answered correctly (e.g. the num exists, and the user enters odd and small), the script runs fine. However, when the user does not enter odd and small, the script runs the same way except that the word "lol" is simply added to the bottom of the list of numbers, where I am aiming to have the word "lol" as the only output. Any help would be greatly appreciated.
Probably just run the grep once and then decide what to do with the output.
#!/bin/bash
file1=document.txt
read -r -p 'Enter the number to be searched for: ' num
read -r -p 'Type "even" for an even number match or "odd" for an odd number match: ' num_type
read -r -p 'Type "big" for a big number or "small" for a small number: ' num_size
if [[ "${num_type}" == "odd" ]] && [[ "${num_size}" == "small" ]] && result=$(grep "$num" "$file1"); then
    echo "$result result"
else
    echo "lol"
fi
I reordered the if conditions so we don't run grep at all if we don't need to.
Notice also the use of read -r.
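A quick illustration of what -r protects against: without it, read treats backslashes as escape characters and strips them.

$ read var <<< 'C:\temp'; echo "$var"
C:temp
$ read -r var <<< 'C:\temp'; echo "$var"
C:\temp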
grep "$num" $file1
outputs the matched lines, as grep does by default. Silence it with -q:
if grep -q "$num" $file1
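Putting it together, the test might look like this (a sketch, keeping your variable names):

if grep -q "$num" "$file1" && [[ "${num_type}" == "odd" ]] && [[ "${num_size}" == "small" ]]; then
    grep "$num" "$file1"    # print the matching lines only when all conditions hold
else
    echo "lol"
fi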
What I want to do is assign the 3rd field (fields are separated by :) from each line in Nurses.txt to a variable and compare it with another string which is manually given by the user when they run the script.
Nurses.txt has this content in it:
12345:Ana Correia:CSLisboa:0:1
98765:Joao Vieira:CSPorto:0:1
54321:Joana Pereira:CSSantarem:0:1
65432:Jorge Vaz:CSSetubal:0:1
76543:Diana Almeida:CSLeiria:0:1
87654:Diogo Cruz:CSBraga:0:1
32198:Bernardo Pato:CSBraganca:0:1
21654:Maria Mendes:CSBeja:0:1
88888:Alice Silva:CSEvora:0:1
96966:Gustavo Carvalho:CSFaro:0:1
And this is the script I have so far, add_nurses.sh:
#!/bin/bash
CS=$(awk -F "[:]" '{print $3}' nurses.txt)
if [["$CS" == "$3"]] ;
then
    echo "Error. There is already a nurse registered in that zone";
else
    echo "There are no nurses registered in that zone";
fi
When I try to run the script and give it some arguments as shown here:
./add_nurses "Ana Correia" 12345 "CSLisboa" 0
It's supposed to return "Error. There is already a nurse registered in that zone", but instead it just tells me I have an output error on line 6...
A simpler and shorter way to do this job is
if grep -q "^[^:]*:[^:]*:$3:" nurses.txt; then
    echo "Error. There is already a nurse registered in that zone"
else
    echo "There are no nurses registered in that zone"
fi
The grep call can be simplified as grep -Fq ":$3:" if there is no risk of collision with other fields.
Alternatively, in pure bash without using any external command line utilities:
#!/bin/bash
while IFS=: read -r id name region rest && [[ $region != "$3" ]]; do
    :    # no-op: the loop simply advances until the region matches or the file ends
done < nurses.txt
if [[ $region = "$3" ]]; then
    echo "Error. There is already a nurse registered in that zone"
else
    echo "There are no nurses registered in that zone"
fi
An alternative way to read the colon separated file would not need awk at all, just bash built-in commands:
read to read from a file into variables
with the -r option to prevent backslash interpretation
IFS as Internal Field Separator to specify the colon : as field separator
#!/bin/bash
# parse parameters into variables
add_nurse=$1
add_id=$2
add_zone=$3
# read the colon-separated file; setting IFS=: only for the read command
# leaves the global IFS (space, tab, newline) untouched
while IFS=: read -r id nurse zone d1 d2; do
    echo "Nurse: $nurse (ID $id)" "Registered Zone: $zone" "$d1" "$d2"
    if [ "$nurse" == "$add_nurse" ] ; then
        echo "Found specified nurse '$add_nurse' already registered for zone '$zone'."
        exit 1
    fi
    if [ "$zone" == "$add_zone" ] ; then
        echo "Found another nurse '$nurse' already registered for specified zone '$add_zone'."
        exit 1
    fi
done < nurses.txt
# no records found matching nurse or zone
echo "No nurse is registered for the specified zone."
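With the sample nurses.txt from the question, a run might look like this:

$ ./add_nurses.sh "Ana Correia" 12345 "CSLisboa" 0
Nurse: Ana Correia (ID 12345) Registered Zone: CSLisboa 0 1
Found specified nurse 'Ana Correia' already registered for zone 'CSLisboa'.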
See also:
bash - Read cells in csv file - Unix & Linux Stack Exchange
Judging the user input (matching the arguments against fields from nurses.txt) to determine whether there is indeed a nurse in a given zone, per the OP's description, I came up with this solution.
#!/usr/bin/env bash
shopt -s extglob    # enable extended globbing for the @(...) pattern below
user_input=("$@")
mapfile -t text_input < <(awk -F':' '{print $2, $1, $3, $4}' nurses.txt)
pattern_from_text_input=$(IFS='|'; printf '%s' "@(${text_input[*]})")
if [[ ${user_input[*]} == $pattern_from_text_input ]]; then
    printf 'Error. There is already a nurse "%s" registered in that zone!\n' "$1" >&2
else
    printf 'There is no nurse "%s" registered in that zone.\n' "$1"
fi
Run the script with the debug flag -x, e.g.
bash -x ./add_nurses ....
to see what the script is actually doing.
The script will work with the sample arguments in the given order; otherwise an option parser might be required.
It requires bash 4+ because of mapfile (aka readarray). For completeness, a while read loop with an array assignment is an alternative to mapfile:
while read -r lines; do
    text_input+=("$lines")
done < <(awk -F':' '{print $2, $1, $3, $4}' nurses.txt)
First, the content of $CS is a list of items, not a single item, so to compare the input against all the items you need to iterate over them. Otherwise you will never get a true condition.
Second, the test [[ "$CS" == "$3" ]] needs spaces around the brackets; written as [["$CS" == "$3"]], bash treats [["$CS" as a command name rather than evaluating a test.
I updated your script, to make it work for the case you described above
#!/bin/bash
CS=$(awk -F "[:]" '{print $3}' nurses.txt)
for item in $CS
do
    [ "$item" == "$3" ] && echo "Error. There is already a nurse registered in that zone" && exit 1
done
echo "There are no nurses registered in that zone"
Output
➜ $ ./add_nurses.sh "Ana Correia" 12345 "CSLisboa" 0
Error. There is already a nurse registered in that zone
➜ $ ./add_nurses.sh "Ana Correia" 12345 "CSLisboadd" 0
There are no nurses registered in that zone
As already stated in comments and answer:
use single brackets with space inside to test variables: [ "$CS" == "$3" ]
if using awk to get the 3rd field of the colon-separated file, it actually returns a column of multiple values (one per line): verify the output with echo "$CS"
So you must use a loop to test each element.
If you iterate over each value of the 3rd (zone) column you can apply almost the same if-test. The only differences are the consequences:
in the case where a value does not match, you continue with the next value
if a value matches, you can leave the loop, and also the bash script
#!/bin/bash
# array declaration follows the pattern: array=(elements)
CS_array=($(awk -F "[:]" '{print $3}' nurses.txt))
# view how the awk output looks: like an array ?!
echo "$CS_array"
# use a for-each loop to check each string element of the array
for CS in "${CS_array[@]}"
do
    # your existing if with corrected test brackets
    if [ "$CS" == "$3" ]
    then
        echo "Error. There is already a nurse registered in that zone"
        # exit to break the loop if a nurse was found
        exit 1
    # no else needed, only a 'not found' after the loop finishes without a match
    fi
done
echo "There are no nurses registered in that zone"
Notice how the array is passed to the loop:
the "" (double quotes) around it are used to get each element as a string, even one containing spaces (like a nurse's name might)
the ${} (dollar curly-braces) enclose an expression with more than just a variable name
the expression CS_array[@] will get each element ([@]) from the array (CS_array)
You could also experiment with the array (different attributes):
echo "${#CS_array[*]}" # size of array with prepended hash
echo "${CS_array[*]}" # word splitting based on $IFS
echo "${CS_array[0]}" # first element of the array, 0 based
Detailed tutorial on arrays in bash: A Complete Guide on How To Use Bash Arrays
See also:
Loop through an array of strings in Bash?
I have the following list:
COX1
COX1
COX1
COX1
COX1
Cu-oxidase
Cu-oxidase_3
Cu-oxidase_3
Fer4_NifH
and I want to check whether COX1 and Cu-oxidase are in the list; if so, I want to print xyz. If Cu-oxidase_3 and Fer4_NifH are in the list too (independent of whether the first two are in the list), it should print abc.
This is what I could script so far:
if grep 'COX1' file.txt; then echo xyz; else exit 0; fi
but it is of course incomplete.
Any solution to that?
ideally my output would be:
xyz
abc
Awk lets you easily search for multiple regular expressions and print something else than the matched string itself. (grep can easily search for multiple patterns, too, but it will print the match or its line number or file name, not some arbitrary string.)
The following assumes that you have a single token per line. This assumption makes the script really simple, though it would also not be hard to support other scenarios.
awk '{ a[$1]++ }
END { if (("COX1" in a) && ("Cu-oxidase" in a)) print "xyz";
if (("Cu-oxidase_3" in a) && ("Fer4_NifH" in a)) print "abc" }' file.txt
This builds an associative array of each token (actually the first whitespace-separated token on each line) and then at the end, when it has read every line in the file, checks whether the sought tokens exist as keys in the array.
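With the sample list above, where all four tokens are present, both lines are printed:

$ awk '{ a[$1]++ }
      END { if (("COX1" in a) && ("Cu-oxidase" in a)) print "xyz";
            if (("Cu-oxidase_3" in a) && ("Fer4_NifH" in a)) print "abc" }' file.txt
xyz
abc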
Performing a single pass over the input file is a big win, especially if you have a large input file and many patterns. Just for completeness, the syntax for performing multiple passes with grep is very straightforward:
if grep -qx 'COX1' file.txt && grep -qx 'Cu-oxidase' file.txt
then
    echo xyz
fi
which can be further abbreviated to
grep -qx 'COX1' file.txt && grep -qx 'Cu-oxidase' file.txt && echo xyz
Notice the -x switch to require the whole line to match (otherwise the regex 'Cu-oxidase' would also match on the Cu-oxidase_3 lines).
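With the sample list above:

$ grep -c 'Cu-oxidase' file.txt
3
$ grep -cx 'Cu-oxidase' file.txt
1

The first count includes the two Cu-oxidase_3 lines; with -x only the exact Cu-oxidase line matches.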
Below is a very verbose way to achieve this. There are ways to write the same with fewer ifs and fewer greps, but I really wanted to show you the logic:
you run a grep command, check its return value with $?, and finally act on the conditions.
# default values
HAS_COX1=0
HAS_CUOX=0
HAS_CUO3=0
HAS_FER4=0
# run silently grep
grep -q 'COX1' file.txt
# check for return value and set variable accordingly
if [ $? -eq 0 ]; then HAS_COX1=1; fi
# same as above
grep -q 'Cu-oxidase' file.txt
if [ $? -eq 0 ]; then HAS_CUOX=1; fi
grep -q 'Cu-oxidase_3' file.txt
if [ $? -eq 0 ]; then HAS_CUO3=1; fi
grep -q 'Fer4_NifH' file.txt
if [ $? -eq 0 ]; then HAS_FER4=1; fi
if [ $HAS_COX1 -eq 1 ]; then
    if [ $HAS_CUOX -eq 1 ]; then
        echo 'xyz'
        exit 0
    fi
fi
if [ $HAS_CUO3 -eq 1 ]; then
    if [ $HAS_FER4 -eq 1 ]; then
        echo 'abc'
        exit 0
    fi
fi
echo 'None of the checks were matched'
exit 1
Beware: this code is untested, so there might be bugs ☺
The code isn't perfect, as it cannot print both 'xyz' and 'abc' when both conditions are met (but that would be an easy fix with the syntax I provide). Also $HAS_CUOX will be set to 1 whenever $HAS_CUO3 is found (no boundary checking in the grep regex).
You could take that code further by using a single grep for each set of conditions to check, using something like 'COX1\|Cu-oxidase' as the regex for grep. And also fix the minor issues I mentioned above.
ideally my output would be:
xyz
abc
You added your expected output after I wrote the above script, but given the elements I gave you, you should be able to figure out how to improve it (basically removing the exit 0 where I placed them, and doing exit 1 when no output has been given).
Or just remove all exits as a dirty solution.
I am writing a script to manipulate a text file.
First thing I want to do is check if duplicate entries exist and if so, ask the user whether we wants to keep or remove them.
I know how to display duplicate lines if they exist, but what I want to learn is just to get a yes/no answer to the question "Do duplicates exist?"
It seems uniq will return 0 whether or not duplicates were found, as long as the command completed without issues.
What is that command that I can put in an if-statement just to tell me if duplicate lines exist?
My file is very simple, it is just values in single column.
I'd probably use awk to do this but, for the sake of variety, here is a brief pipe to accomplish the same thing:
$ { sort | uniq -d | grep . -qc; } < noduplicates.txt; echo $?
1
$ { sort | uniq -d | grep . -qc; } < duplicates.txt; echo $?
0
sort + uniq -d make sure that only duplicate lines (which don't have to be adjacent) get printed to stdout, and grep . -c counts those lines, emulating wc -l, with the useful side effect that it returns 1 if it doesn't match (i.e. a zero count). -q just silences the output so it doesn't print the line count, letting you use it quietly in your script.
has_duplicates()
{
    {
        sort | uniq -d | grep . -qc
    } < "$1"
}
if has_duplicates myfile.txt; then
    echo "myfile.txt has duplicate lines"
else
    echo "myfile.txt has no duplicate lines"
fi
You can use awk combined with the boolean || operator:
# Ask the question if awk found a duplicate
awk 'a[$0]++{exit 1}' test.txt || (
    echo -n "remove duplicates? [y/n] "
    read -r answer
    # Remove duplicates if the answer was "y". I'm using `[`, the shorthand
    # for the test command. Check `help [`
    # (note: uniq only removes adjacent duplicates; sort first if yours may not be)
    [ "$answer" == "y" ] && uniq test.txt > test.uniq.txt
)
The block after the || will only get executed if the awk command returns 1, meaning it found duplicates.
However, for a basic understanding I'll also show an example using an if block
awk 'a[$0]++{exit 1}' test.txt
# $? contains the return value of the last command
if [ $? != 0 ] ; then
    echo -n "remove duplicates? [y/n] "
    read -r answer
    # check the answer
    if [ "$answer" == "y" ] ; then
        uniq test.txt > test.uniq.txt
    fi
fi
However, the [] are not just brackets like in other programming languages. [ is a synonym for the test bash builtin command, and ] is its last argument. You need to read help [ in order to understand.
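For example, these two lines are equivalent, and the spaces around [ and ] are mandatory because they are arguments to a command:

if [ "$answer" = "y" ]; then echo removing; fi
if test "$answer" = "y"; then echo removing; fi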
A quick bash solution:
#!/bin/bash
INPUT_FILE=words
declare -A a
while read -r line ; do
    [ "${a[$line]}" = 'nonempty' ] && duplicates=yes && break
    a[$line]=nonempty
done < $INPUT_FILE
[ "$duplicates" = yes ] && echo -n "Keep duplicates? [Y/n]" && read -r keepDuplicates
removeDuplicates() {
    sort -u $INPUT_FILE > $INPUT_FILE.tmp
    mv $INPUT_FILE.tmp $INPUT_FILE
}
[ "$keepDuplicates" != "Y" ] && removeDuplicates
The script reads line by line from INPUT_FILE and stores each line in the associative array a as the key, with the string nonempty as the value. Before storing the value, it first checks whether it is already there; if it is, it has found a duplicate, so it sets the duplicates flag and breaks out of the cycle.
Later it checks whether the flag is set and asks the user whether to keep the duplicates. If they answer anything other than Y, the removeDuplicates function is called, which uses sort -u to remove the duplicates. ${a[$line]} evaluates to the value of the associative array a for the key $line. [ "$duplicates" = yes ] is bash builtin syntax for a test; if the test succeeds, whatever follows && is evaluated.
But note that the awk solutions will likely be faster so you may want to use them if you expect to process bigger files.
You can do uniq=yes/no using this awk one-liner:
awk '!seen[$0]{seen[$0]++; i++} END{print (NR>i)?"no":"yes"}' file
awk keeps an array of unique lines called seen.
Every time we meet a new line, we record it in seen and increment the counter i.
Finally, in the END block, we compare the total number of records (NR) with the number of unique records (i).
If NR>i is true, there are duplicate records and we print no; otherwise it prints yes.
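For example, with the file names from the first answer (duplicates.txt contains at least one repeated line, noduplicates.txt contains none):

$ awk '!seen[$0]{seen[$0]++; i++} END{print (NR>i)?"no":"yes"}' duplicates.txt
no
$ awk '!seen[$0]{seen[$0]++; i++} END{print (NR>i)?"no":"yes"}' noduplicates.txt
yes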