How to compare a file to a list in Linux with one line of code? - bash

Hey, so I've got another predicament that I am stuck in. I wanted to see approximately how many Indian people are using the Stampede computer. So I set up a text file in vim that has about 50 of the most common surnames in India, and I want to compare those names to the user name list.
So far this is the code I have
getent passwd | cut -f 5 -d: | cut -f 2 -d' '
getent passwd gets the user list; each entry is going to look like this
tg827313:x:827313:8144474:Brandon Williams
the cut commands will get just the last name, so the output for the example will be
Williams
Now, I know grep can be used to compare against files, but how do I use it to compare the getent passwd output with my file?

To count how many of the last names of computer users appear in the file namefile, use:
getent passwd | cut -f 5 -d: | cut -f 2 -d' ' | grep -wFf namefile | wc -l
How it works
getent passwd | cut -f 5 -d: | cut -f 2 -d' '
This is your code which I will assume works as intended for you.
grep -wFf namefile
This selects names that match a line in namefile. The -F option tells grep not to treat the names as regular expressions; they are taken as fixed strings. The -f option tells grep to read the strings from the file namefile. -w tells grep to match whole words only.
wc -l
This returns a count of the lines in the output.
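For a quick self-contained check, here is the same pipeline run against a few made-up passwd-style lines. The sample users and file names below are hypothetical, and awk '{print $NF}' stands in for the second cut so that multi-word full names still yield the last word:

```shell
# Hypothetical stand-in for the output of `getent passwd`.
printf '%s\n' \
  'tg827313:x:827313:8144474:Brandon Williams' \
  'ab120001:x:120001:8144474:Priya Sharma' \
  'cd120002:x:120002:8144474:Arjun Kumar Patel' > passwd.sample

# Hypothetical surname list standing in for namefile.
printf '%s\n' Sharma Patel Singh > namefile.sample

# cut -f 5 -d: extracts the full name, awk '{print $NF}' takes its last
# word (the surname), grep -wFf matches against the list, wc -l counts.
cut -f 5 -d: passwd.sample | awk '{print $NF}' | grep -wFf namefile.sample | wc -l
# prints 2  (Sharma and Patel match; Williams does not)
```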

Related

Shell: Counting lines per column while ignoring empty ones

I am trying to simply count the lines in the .CSV per column, while at the same time ignoring empty lines.
I use below and it works for the 1st column:
cat /path/test.csv | cut -d, -f1 | grep . | wc -l >> ~/Desktop/Output.csv
#Outputs: 8
And below for the 2nd column:
cat /path/test.csv | cut -d, -f2 | grep . | wc -l >> ~/Desktop/Output.csv
#Outputs: 6
But when I try to count the 3rd column, it simply outputs the total number of lines in the whole .CSV.
cat /path/test.csv | cut -d, -f3 | grep . | wc -l >> ~/Desktop/Output.csv
#Outputs: 33
#Should be: 19?
I've also tried to use awk instead of cut, but get the same issue.
I have also tried creating a new file, thinking maybe it had some spaces in the lines; still the same.
Can someone clarify what the difference is between reading columns 1-2 and the rest?
20355570_01.tif,,
20355570_02.tif,,
21377804_01.tif,,
21377804_02.tif,,
21404518_01.tif,,
21404518_02.tif,,
21404521_01.tif,,
21404521_02.tif,,
,22043764_01.tif,
,22043764_02.tif,
,22095060_01.tif,
,22095060_02.tif,
,23507574_01.tif,
,23507574_02.tif,
,,23507574_03.tif
,,23507804_01.tif
,,23507804_02.tif
,,23507804_03.tif
,,23509247_01.tif
,,23509247_02.tif
,,23509247_03.tif
,,23527663_01.tif
,,23527663_02.tif
,,23527663_03.tif
,,23527908_01.tif
,,23527908_02.tif
,,23527908_03.tif
,,23535506_01.tif
,,23535506_02.tif
,,23535562_01.tif
,,23535562_02.tif
,,23535636_01.tif
,,23535636_02.tif
That happens when the input file has DOS line endings (\r\n). Fix your file using dos2unix and your command will work for the 3rd column too.
dos2unix /path/test.csv
Or, you can remove the \r at the end while counting non-empty columns using awk:
awk -F, '{sub(/\r/,"")} $3!=""{n++} END{print n}' /path/test.csv
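The symptom is easy to reproduce with a hypothetical three-line CRLF file (the file name below is made up):

```shell
# Three CSV lines with DOS line endings; only the last has a real 3rd field.
printf 'a.tif,,\r\n,b.tif,\r\n,,c.tif\r\n' > crlf.sample.csv

# The naive count sees the stray \r as content, so every line "matches".
cut -d, -f3 crlf.sample.csv | grep -c .    # prints 3

# Stripping the trailing \r first gives the real answer.
awk -F, '{sub(/\r$/,"")} $3!=""{n++} END{print n+0}' crlf.sample.csv   # prints 1
```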
The problem is in the grep command: written that way, it returns 33 lines when you count the 3rd column.
It's better instead to use the following command to count number of lines in .CSV for each column (example below is for the 3rd column):
cat /path/test.csv | cut -d , -f3 | grep -cve '^\s*$'
This returns the exact number of non-empty lines for each column and avoids piping into wc.
See previous post here:
count (non-blank) lines-of-code in bash
edit: I think oguz ismail found the actual reason in their answer. If they are right and your file has Windows line endings, you can use one of the following commands without having to convert the file.
cut -d, -f3 yourFile.csv | tr -d '\r' | grep -c .
cut -d, -f3 yourFile.csv | grep -c $'[^\r]' # bash only
old answer: Since I cannot reproduce your problem with the provided input, I'll take a wild guess:
The "empty" fields in the last column contain spaces. A field containing a space is not empty, although it looks empty because you cannot see the spaces.
To count only fields that contain something other than a space, adapt your regex from . (any character) to [^ ] (any character other than a space).
cut -d, -f3 yourFile.csv | grep -c '[^ ]'

Set User Name and Password from Txt file using bash

I have an env.txt file in the following format:
lDRIVER={ODBC Driver 13 for SQL Server};
PORT=1433;
SERVER=serveename;
DATABASE=db;
UID=username;
PWD=password!
I have a git bash script (.sh) that requires the UID and PWD from that file. I was thinking about locating them by line number (last and second-to-last). How do I do this, or is there a better way (say, searching for UID and PWD and assigning the bash variables that way)?
There's lots of ways to do this. You could use awk which I would personally use since it's sort of like an x-acto knife for this type of thing:
uid=$(awk -F"[=;]" '/UID/{print $2}' env.txt)
pwd=$(awk -F"[=;]" '/PWD/{print $2}' env.txt)
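For instance, with a hypothetical env.txt in the question's format (file name and values made up here), the two awk calls pick out exactly the value between the = and the ;:

```shell
# Hypothetical env.txt matching the question's format.
cat > env.sample.txt <<'EOF'
lDRIVER={ODBC Driver 13 for SQL Server};
PORT=1433;
SERVER=serveename;
DATABASE=db;
UID=username;
PWD=password!
EOF

# Split on either = or ; so that $2 is the bare value.
uid=$(awk -F'[=;]' '/UID/{print $2}' env.sample.txt)
pwd=$(awk -F'[=;]' '/PWD/{print $2}' env.sample.txt)
echo "$uid"   # username
echo "$pwd"   # password!
```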
Or grep and sed. sed is nice because it lets you be very specific about the piece of info you want to cut from the line, but it's regex, which has its learning curve:
uid=$(grep "UID" env.txt | sed -r 's/^.*=([^;]*);?$/\1/' )
pwd=$(grep "PWD" env.txt | sed -r 's/^.*=([^;]*);?$/\1/' )
As #JamesK noted in the comments you can use sed and have it do the search instead of grep. This is super nice and I would definitely choose this instead of the grep | sed.
uid=$(sed -nr '/UID/s/^.*=([^;]*);?$/\1/p' env.txt)
pwd=$(sed -nr '/PWD/s/^.*=([^;]*);?$/\1/p' env.txt)
Or grep and cut. Bleh... we can all do better, but sometimes we just want to grep and cut and not have to think about it:
uid=$(grep "UID" env.txt | cut -d"=" -f2 | cut -d";" -f1)
pwd=$(grep "PWD" env.txt | cut -d"=" -f2 | cut -d";" -f1)
I definitely wouldn't go by line number, though. That looks like an odbc.ini file, and the order in which the parameters are listed in each odbc entry is irrelevant.
First rename PWD to something like PASSWORD. PWD is a special variable used by the shell. Even better is to use lowercase variable names for all your own variables.
When the password has no special characters (spaces, $, etc.), you can
source env.txt
When the password has something special, consider editing the env.txt:
lDRIVER="{ODBC Driver 13 for SQL Server}"
PORT="1433"
SERVER="serveename"
DATABASE="db"
UID="username"
PASSWORD="password!"
When you are only interested in lowercase uid and pwd, consider selecting only the interesting fields and changing the keywords to lowercase
source <(sed -rn '/^(UID|PWD)=/ s/([^=]*)/\L\1/p' env.txt)
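Here is that trick end-to-end, assuming bash (for the process substitution) and GNU sed (for -r and \L); the file name is hypothetical:

```shell
# Hypothetical env.txt in the question's format.
cat > env2.sample.txt <<'EOF'
lDRIVER={ODBC Driver 13 for SQL Server};
PORT=1433;
SERVER=serveename;
DATABASE=db;
UID=username;
PWD=password!
EOF

# Keep only the UID/PWD lines, lowercase the keys, then source the result.
# The trailing ';' on the UID line is harmless: bash reads it as a command
# separator after the assignment.
source <(sed -rn '/^(UID|PWD)=/ s/([^=]*)/\L\1/p' env2.sample.txt)
echo "$uid"   # username
echo "$pwd"   # password!
```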

Bash Script - get User Name given UID

How, given a user ID as a parameter, do I find out the corresponding user name? The problem is to write a bash script that somehow uses the /etc/passwd file.
The UID is the 3rd field in /etc/passwd; based on that, you can use:
awk -v val="$1" -F ":" '$3==val{print $1}' /etc/passwd
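A self-contained way to try the awk approach, using a hypothetical passwd-style sample file instead of the real /etc/passwd:

```shell
# Two sample entries in /etc/passwd format (made up).
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'alice:x:1001:1001:Alice:/home/alice:/bin/bash' > passwd.uid.sample

# Field 3 is the UID; print field 1 (the username) when it matches.
awk -v val=1001 -F: '$3==val{print $1}' passwd.uid.sample   # prints alice
```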
4 ways to achieve what you need:
http://www.digitalinternals.com/unix/linux-get-username-from-uid/475/
Try this:
grep ":$1:" /etc/passwd | cut -f 1 -d ":"
This greps for the UID within /etc/passwd.
Alternatively you can use the getent command:
getent passwd "$1" | cut -f 1 -d ":"
It then does a cut and takes the first field, delimited by a colon. This first field is the username.
You might find the SS64 pages for cut and grep useful:
http://ss64.com/bash/grep.html
http://ss64.com/bash/cut.html

using cut on a line having multiple instances of the same delimiter - unix

I am trying to write a generic script which can have different file name inputs.
This is just a small part of my bash script.
For example, let's say folder 444-55 has 2 files:
qq.filter.vcf
ee.filter.vcf
I want my output to be -
qq
ee
I tried this and it worked -
ls /data2/delivery/Stack_overflow/1111_2222_3333_23/secondary/444-55/*.filter.vcf | sort | cut -f1 -d "." | xargs -n 1 basename
But let's say I have a folder like this -
/data2/delivery/Stack_overflow/de.1111_2222_3333_23/secondary/444-55/*.filter.vcf
My script's output would then be
de
de
How can I make it generic?
Thank you so much for your help.
Something like this in a script will "cut" it:
for i in /data2/delivery/Stack_overflow/1111_2222_3333_23/secondary/444-55/*.filter.vcf
do
basename "$i" | cut -f1 -d.
done | sort
advantages:
it does not parse the output of ls, which is frowned upon
it cuts after having applied the basename treatment, and the cut ignores the full path.
it also sorts last so it's guaranteed to be sorted according to the prefix
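To see why this is robust, here is the loop run against a hypothetical directory tree whose path contains a dot (all names below are made up):

```shell
# Recreate the problematic layout: a dot in a directory name.
mkdir -p de.1111_2222/secondary/444-55
touch de.1111_2222/secondary/444-55/qq.filter.vcf \
      de.1111_2222/secondary/444-55/ee.filter.vcf

# basename strips the directory part first, so the later cut only ever
# sees "qq.filter.vcf" / "ee.filter.vcf", never the dotted path.
for i in de.1111_2222/secondary/444-55/*.filter.vcf
do
  basename "$i" | cut -f1 -d.
done | sort
# ee
# qq
```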
Just move the basename call earlier in the pipeline:
printf "%s\n" /data2/delivery/Stack_overflow/1111_2222_3333_23/secondary/444-55/*.filter.vcf |
xargs -n 1 basename |
sort |
cut -f1 -d.

How to echo a string after sorting it by maximum length

So I am just learning bash, and right now I am trying to write one-line commands to solve some problems. Right now I am listing all the users on Stampede and trying to get the length and name of the longest user name.
So this is where I am at
getent passwd | cut -f 1 -d: | wc -L
getent passwd - (to get the userid list), the cut command to get the first field (the userid), and then wc -L to get the longest length. Now I am trying to figure out how to echo that name. So any input on that would be awesome, thank you!
To get the name of the user with the longest name, use:
getent passwd | awk -F: '{longest=length($1)>length(longest)?$1:longest} END{print longest}'
How it works
-F:
Tell awk to use a colon as the field separator.
longest=length($1)>length(longest)?$1:longest
For every line of input, this statement is executed. It assigns to the variable longest the result of a ternary statement:
length($1)>length(longest)?$1:longest
This statement tests the condition length($1)>length(longest). Here, length($1) is the length of the name of the current user and length(longest) is the length of the longest name seen previously. If the current name is longer, the ternary expression returns the current name, $1. Otherwise, it returns the previous longest name, longest.
END{print longest}
After we have finished reading the file, this prints the name that was the longest.
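Run against a few hypothetical passwd-style lines (sample data made up), the awk one-liner keeps whichever first field is longest:

```shell
# Three fake passwd entries; "alexander" has the longest username.
printf '%s\n' \
  'bob:x:1:1::/home/bob:/bin/sh' \
  'alexander:x:2:2::/home/alexander:/bin/sh' \
  'eve:x:3:3::/home/eve:/bin/sh' |
awk -F: '{longest=length($1)>length(longest)?$1:longest} END{print longest}'
# alexander
```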
One way to do this that doesn't stray too far from what you have is to use sed to output only lines that are at least as long as the length you get from your command above:
getent passwd | cut -f 1 -d: | sed -n /^.\\{`getent passwd | cut -f 1 -d: | wc -L`\\}/p
This will output all users which tie for the longest length.
You can make it a little nicer by storing the list of names in a variable:
u=`getent passwd | cut -f 1 -d:`; sed -n /^.\\{`wc -L <<< "$u"`\\}/p <<< "$u"
Another way of doing it:
getent passwd | cut -f 1 -d: | perl -ne 'print length($_).":$_"'| sort -n| cut -f 2 -d:
We create a list of <length>:<username> pairs with perl, sort numerically by length, and print only the username.
