Parsing CSV file in bash script [duplicate] - bash

I am trying to parse a CSV file containing a typical access control matrix in a shell script. My sample CSV file would be
"user","admin","security"
"user1","x",""
"user2","","x"
"user3","x","x"
I would be using this list in order to create files in their respective folders. The problem is how do I get it to store the values of column 2/3 (admin/security)? The output I'm trying to achieve is to group/sort all users that have admin/security rights and create files in their respective folders. (My idea is to probably store all admin/security users into different files and run from there.)
The environment does not allow me to use any Perl or Python programs. However any awk or sed commands are greatly appreciated.
My desired output would be
$ cat sample.csv
"user","admin","security"
"user1","x",""
"user2","","x"
"user3","x","x"
$ cat security.csv
user2
user3
$ cat admin.csv
user1
user3

If you can use cut(1) (which you almost certainly can on any Unix-like system), you can use
cut -d , -f (n) (file)
where n is the column you want.
You can use a range of columns (2-3) or a list of columns (1,3).
This will leave the quotes but you can use a sed command or something light-weight for that.
$ cat sample.csv
"user","admin","security"
"user1","x",""
"user2","","x"
"user3","x","x"
$ cut -d , -f 2 sample.csv
"admin"
"x"
""
"x"
$ cut -d , -f 3 sample.csv
"security"
""
"x"
"x"
$ cut -d , -f 2-3 sample.csv
"admin","security"
"x",""
"","x"
"x","x"
$ cut -d , -f 1,3 sample.csv
"user","security"
"user1",""
"user2","x"
"user3","x"
Note that this won't work for general CSV files (it doesn't handle quoted fields that contain commas), but it should work for files like the example, with simple usernames and x's.
If you just want to grab the list of usernames, awk is pretty much the tool made for the job, and an answer below already covers it well, so I won't repeat it.
But a grep solution might be quicker and more lightweight.
The grep solution:
grep '^\([^,]\+,\)\{N\}"x"'
where N is the Nth column, with the users being column 0. (\+ in a basic regular expression is a GNU grep extension.)
$ grep '^\([^,]\+,\)\{1\}"x"' sample.csv
"user1","x",""
"user3","x","x"
$ grep '^\([^,]\+,\)\{2\}"x"' sample.csv
"user2","","x"
"user3","x","x"
from there on you can use cut to get the first column:
$ grep '^\([^,]\+,\)\{1\}"x"' sample.csv | cut -d , -f 1
"user1"
"user3"
and sed 's/"//g' to get rid of quotes:
$ grep '^\([^,]\+,\)\{1\}"x"' sample.csv | cut -d , -f 1 | sed 's/"//g'
user1
user3
$ grep '^\([^,]\+,\)\{2\}"x"' sample.csv | cut -d , -f 1 | sed 's/"//g'
user2
user3

Something to get you started (please note this will not work for csv files with embedded commas and you will have to use a csv parser):
awk -F, '
NR > 1 {
    gsub(/"/, "", $0)                    # strip the quotes; awk re-splits the fields
    if ($2 != "" && $3 != "")
        print $1 " has both privileges"
    print $1 > "file"                    # every user name is also written to "file"
}' csv
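Building on that starting point, here is a minimal sketch of how the per-privilege files from the question could be produced in one awk pass. The function name split_privileges is made up for illustration, and the sketch assumes the same simple format as the sample (every field quoted, no embedded commas, an "x" marking a granted privilege). Output files are named after the header columns and written to the current directory.

```shell
# Hypothetical helper: write one <column>.csv per privilege column,
# each listing the users that have an "x" in that column.
split_privileges() {
    # $1: input CSV in the simple format shown in the question
    awk -F, '
        { gsub(/"/, "") }                     # strip quotes; awk re-splits $0
        NR == 1 {                             # header row: remember column names
            for (i = 2; i <= NF; i++) name[i] = $i
            next
        }
        {                                     # data rows: one output file per column
            for (i = 2; i <= NF; i++)
                if ($i == "x") print $1 > (name[i] ".csv")
        }
    ' "$1"
}
```

Running split_privileges sample.csv in an empty working directory should leave admin.csv and security.csv behind, matching the desired output in the question.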

Related

How to Write A Second Column in Bash in an Existing txt file

I need to extract the ID name of a parent directory and put that in a tab-delimited text file. Then I need to extract names of the contents of that folder and put it in the same row as that ID name I first extracted. Essentially, Column 1 should list the directory name from parent, Column 2 should list the name first file in that directory, Column 3 should be the name of the next file, and so on and so forth.
/path/to/folder/ID/
pwd | xargs echo | awk -F "/" '{print $n; exit}' >> Text.txt
where 'n' is the location of the desired parent folder (in this case, ID). This works fine, and writes something like "ID001" to my Text.txt file.
I try the same little hack again, using my pwd as my input to xargs, listing out the contents of that folder, and writing the names to my Text.txt file:
pwd | xargs echo | awk -F "/" '{print $7; exit}' >> Text.txt | pwd | xargs echo | xargs ls | xargs echo >> Text.txt
But instead of
ID001 file1 file2
I get
file1 file2
ID001
Which is mostly to be expected, given the commands. I am confused as to why my file names are being appended to the first row and not to the last row. The only related article I could find was this for writing a specific column to a CSV, but it wasn't quite what I was looking for.
This find plus awk pipeline MAY be what you're trying to do:
$ ls tmp
a b
$ find tmp -print | awk '{sub("^[^/]+/",""); printf "%s%s", sep, $0; sep="\t"} END{print ""}'
tmp a b
YMMV if your file names contain tabs or newlines of course.
You probably want to split that into several commands, which is easier to follow.
You can put the commands in a bash script.
Example scenario
$ pwd
/Users/pa357856/test/tmp/foo
$ ls
file1.txt file2.txt
commands -
$ parentDIR=`pwd | xargs echo | awk -F "/" '{print $6}'`
$ filesList=`ls`
$ echo "$parentDIR" $filesList >> test.txt   # $filesList is left unquoted so the newlines from ls collapse to spaces
Result -
$ cat test.txt
foo file1.txt file2.txt
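If the row really has to be tab-delimited, as the question asks, a glob avoids parsing ls output altogether. The following is only a sketch: dir_row is a hypothetical name, and it assumes file names without tabs or newlines.

```shell
# Hypothetical helper: print "<dirname><TAB><file1><TAB><file2>..." for one directory.
dir_row() {
    dir=${1:-$PWD}
    tab=$(printf '\t')
    row=$(basename "$dir")
    for f in "$dir"/*; do
        [ -e "$f" ] || continue        # an unmatched glob stays literal; skip it
        row="$row$tab${f##*/}"         # append the file name without its path
    done
    printf '%s\n' "$row"
}
```

Usage would be something like dir_row /path/to/folder/ID001 >> Text.txt, which appends a single line per directory.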

Extract specific string from line with standard grep,egrep or awk

I'm trying to extract a specific string from grep output.
uci show minidlna
produces a large list
.
.
.
minidlna.config.enabled='1'
minidlna.config.db_dir='/mnt/sda1/usb/db'
minidlna.config.enable_tivo='1'
minidlna.config.wide_links='1'
.
.
.
So I tried to narrow down what I wanted by running
uci show minidlna | grep -oE '\bdb_dir=\S+'
this narrows the output to
db_dir='/mnt/sda1/usb/db'
What I want is to output only
/mnt/sda1/usb/db
without the quotes and without the leading "db_dir=", so I can run rm /mnt/sda1/usb/db/file.db
I've used the answers found here
How to extract string following a pattern with grep, regex or perl
and that's as close as I got.
EDIT: After using Ed Morton's awk command I needed to pass the output to the rm command.
I used:
| ( read DB; rm "$DB/files.db" )
read DB stores the output in the variable DB.
( ... ) groups the commands in a subshell.
rm "$DB/files.db" deletes the file files.db.
Is this what you're trying to do?
$ awk -F"'" '/db_dir/{print $2}' file
/mnt/sda1/usb/db
That will work in any awk in any shell on every UNIX box.
If that's not what you want then edit your question to clarify your requirements and post more truly representative sample input/output.
Using sed with some effort to avoid single quotes:
sed -n 's/^minidlna.config.db_dir=\s*\S\(\S*\)\S\s*$/\1/p' input
Well, so you end up having a string like db_dir='/mnt/sda1/usb/db'.
I would first remove the quotes by piping this to
.... | tr -d "'"
Now you end up with a string like db_dir=/mnt/sda1/usb/db.
Say you have this string stored in a variable named confstr, then
${confstr##*=}
gives you just /mnt/sda1/usb/db, since the pattern *= matches everything from the start up to an equals sign, and ## removes the longest such match from the front of the value.
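The same kind of parameter expansion can also strip the quotes, so tr is not strictly needed. A minimal sketch follows; config_value is a hypothetical name, and it uses # (shortest match) rather than ## so that a value containing "=" would be kept intact.

```shell
# Hypothetical helper: print the value part of a key='value' string.
config_value() {
    value=${1#*=}          # drop everything up to and including the first "="
    value=${value#\'}      # drop a leading single quote, if any
    value=${value%\'}      # drop a trailing single quote, if any
    printf '%s\n' "$value"
}

config_value "db_dir='/mnt/sda1/usb/db'"   # prints /mnt/sda1/usb/db
```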
I would do this:
Once you have extracted the line above into file.txt (or piped it into this command), split the fields on the quote character. Use printf to generate the rm command and pass it to bash to execute.
$ awk -F"'" '{printf "rm %s/file.db\n", $2}' file.txt | bash
rm: /mnt/sda1/usb/db/file.db: No such file or directory
With your original command:
$ uci show minidlna | grep -oE '\bdb_dir=\S+' | \
awk -F"'" '{printf "rm %s/file.db\n", $2}' | bash
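An alternative to generating a command string and piping it into bash is to capture the directory in a variable first and call rm directly, which keeps odd characters in the path from being re-parsed by a second shell. This is a sketch under assumptions: db_dir_from_config is a made-up helper that reads uci-style lines on standard input.

```shell
# Hypothetical helper: print the quoted value of the first db_dir line on stdin.
db_dir_from_config() {
    awk -F"'" '/\.db_dir=/ { print $2; exit }'
}

# Intended use (left commented out because it deletes a file):
# db=$(uci show minidlna | db_dir_from_config)
# [ -n "$db" ] && rm -- "$db/file.db"
```

The -n test guards against running rm on /file.db when no db_dir line was found.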

How to capture first column values of a command?

I am new to shell scripting. I am trying to write a script that is supposed to run a command and use a for loop to capture the first column of the output for further processing.
command: tst get files
output of this command is something like
NAME COUNT ADMIN
FileA.txt 30 adminA
FileB.txt 21 local
FileC.txt 9 local
FileD.txt 90 adminA
Here is what I have tried so far : UPDATED also want to run additional commands
#!/bin/bash
for f in $(tst get files)
do
echo "FILE :[${f}]"
tst setprimary ${f} && tst get dataload
done
the output I am seeing is something like
FILE :[NAME]
FILE :[COUNT]
FILE :[ADMIN]
FILE :[FileA.txt]
FILE :[30]
FILE :[adminA]
FILE :[FileB.txt]
FILE :[21]
FILE :[local]
FILE :[FileC.txt]
FILE :[9]
FILE :[local]
FILE :[FileD.txt]
FILE :[90]
FILE :[adminA]
I am looking for an output something like
FILE :[FileA.txt]
FILE :[FileB.txt]
FILE :[FileC.txt]
FILE :[FileD.txt]
What should I modify in the shell script to only capture NAME column values? Am I executing the tst get files command correctly in the for loop or is there a better way to execute a command and loop thru the results?
EDIT (Samuel Kirschner): you can do without the for loop entirely and just use awk to print the lines you're interested in
tst get files | awk 'NR > 1 {print "FILE :[" $1 "]"}'
If you want to keep the for loop for some reason and just extract the file name from the lines while skipping the header, you have a few choices. Awk is probably the easiest because of the NR builtin variable (which counts lines) and automatic field-splitting ($1 refers to the first field in the line, for instance), but you can use sed and cut as well.
You can use awk 'NR > 1 {print $1}' to get the first column (splitting on any run of whitespace and skipping the first line) or sed 1d | cut -d$'\t' -f1. Note that $'\t' is bash-specific syntax for a literal tab character (tab is also cut's default delimiter, so -d can be omitted). If the output is padded with spaces rather than delimited by tabs, the sed ... | cut ... example will not work.
i.e.
#!/bin/bash
for f in $(tst get files | awk 'NR > 1 {print $1}')
do
echo "FILE :[${f}]"
done
or
#!/bin/bash
for f in $(tst get files | sed 1d | cut -d$'\t' -f1)
do
echo "FILE :[${f}]"
done
To avoid unnecessary splitting in the for loop, it's best to set IFS to something specific before the loop, so that 'a file with whitespace.txt' is not broken up, and to restore it afterwards.
OLD_IFS=$IFS
IFS=$'\n\t'
for f in $(tst get files | sed 1d | cut -d$'\t' -f1)
do
echo "FILE :[${f}]"
done
IFS=$OLD_IFS
You can just do:
tst get files | awk 'NR > 1 { printf "FILE :[%s]\n", $1 }'
Update: To answer extended problem as per comments below by OP:
while read -r file _; do
tst setprimary "$file" && tst get dataload
done < <(tst get files | sed 1d)
Or perl:
tst ... | perl -lanE 'say "File: [$F[0]]" if $.>1'
the variable $. contains the current line number
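If the names are needed more than once, they can be collected into a bash array first and looped over afterwards. The sketch below is only an illustration: tst_get_files_demo is a stand-in that prints the sample output from the question, because the real tst command is not available here.

```shell
#!/bin/bash
# Stand-in for "tst get files", only so the sketch can run anywhere.
tst_get_files_demo() {
    printf '%s\n' 'NAME COUNT ADMIN' \
        'FileA.txt 30 adminA' 'FileB.txt 21 local' \
        'FileC.txt 9 local' 'FileD.txt 90 adminA'
}

names=()
# read puts the first field in "name" and discards the rest of the line in "_";
# tail -n +2 drops the header row.
while read -r name _; do
    names+=("$name")
done < <(tst_get_files_demo | tail -n +2)

for f in "${names[@]}"; do
    printf 'FILE :[%s]\n' "$f"
done
```

Process substitution is used instead of a pipe into while so that the array is filled in the current shell rather than in a subshell.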

awk for different delimiters piped from xargs command

I run an xargs command invoking bash shell with multiple commands. I am unable to figure out how to print two columns with different delimiters.
The command I ran is below
cd /etc/yp
cat "$userlist" | xargs -I {} bash -c "echo -e 'For user {} \n'
grep -w {} auto_*home|sed 's/:/ /' | awk '{print \$1'\t'\$NF}'
grep -w {} passwd group netgroup |cut -f1 -d ':'|sort|uniq;echo -e '\n'"
the output I get is
For user xyz
auto_homeabc.jkl.com:/rtw2kop/xyz
group
netgroup
passwd
I need a tab after the auto_home(since it is a filename) like in
auto_home abc.jkl.com:/rtw2kop/xyz
The entry from auto_home file is below
xyz -rw,intr,hard,rsize=32768,wsize=32768 abc.jkl.com:/rtw2kop/xyz
How do I use awk to print the first field (auto_home) and the last field (abc.jkl.com:/rtw2kop/xyz), given that I pipe from the grep command into awk? The '\t' isn't working in the above awk command.
If I understand what you are attempting correctly, then I suggest this approach:
while read user; do
echo "For user $user"
awk -v user="$user" '$1 == user { print FILENAME "\t" $NF }' auto_home
awk -F: -v user="$user" '$1 == user { print FILENAME }' passwd group netgroup | sort -u
done < "$userlist"
The basic trick is the read loop, which will read a line into the variable $user from the file named in $userlist; after that, it's all straightforward awk.
I took the liberty of changing the selection criteria slightly; it looked as though you wanted to select for usernames, not strings anywhere in the line. This way, only lines will be selected in which the first token is equal to the currently inspected user, and lines in which other tokens are equal to the username but not the first are discarded. I believe this to be what you want; if it is not, please comment and we can work it out.
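In the 1st awk command, the '\t' never reaches awk as a string: the single quotes around it close and reopen the awk program, so the inner shell turns \t into a bare t, which awk treats as an empty variable. Inside the double-quoted bash -c argument, write \"\\t\" instead, so that awk sees "\t". (The \n passed to echo -e sits inside single quotes and should not need extra escaping.)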
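To see the quoting in isolation, here is a small sketch that pushes a tab through the same xargs / bash -c nesting. The function name demo_tab_quoting is made up, and the printf inside it fakes one line of grep -w output using the auto_home entry from the question.

```shell
# Hypothetical demo: get a real tab out of awk inside xargs ... bash -c "...".
demo_tab_quoting() {
    printf '%s\n' xyz |
        xargs -I {} bash -c "printf '%s\n' 'auto_home:{} -rw,intr,hard abc.jkl.com:/rtw2kop/{}' | sed 's/:/ /' | awk '{print \$1 \"\\t\" \$NF}'"
}
```

After the outer shell processes the double-quoted argument, awk receives {print $1 "\t" $NF}, so the file name and the last field come out separated by a tab.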

BASH Subtracting Files on Key line by line

I just want to subtract one CSV file from another, but not by comparing whole lines. Instead, I'd like to check whether the lines match in one field.
e.g. the first file
EMAIL;NAME;SALUTATION;ID
foo#bar.com;Foo;Mr;1
bar#foo.com;Bar;Ms;2
and the second file
EMAIL;NAME
foo#bar.com;Foo
the resultfile should be
EMAIL;NAME;SALUTATION;ID
bar#foo.com;Bar;Ms;2
I think you know what I mean ;)
How is that possible in bash? It's easy for me to do this in Java, but I'd really like to learn how to do it in bash. I can also subtract by comparing whole lines using sort:
#!/bin/bash
echo "Subtracting files..."
sort "/tmp/list1.csv" "/tmp/list2.csv" "/tmp/list2.csv" | uniq -u >> /tmp/subList.csv
echo "Files successfully subtracted."
But the lines aren't identical tuples, so I have to compare the lines by key.
Any suggestions? Thanks a lot. Nils
One possible solution coming to my mind is this one (working with bash):
grep -v -f <(cut -d ";" -f1 /tmp/list2.csv) /tmp/list1.csv
That means:
cut -d ";" -f1 /tmp/list2.csv: Extract the first column of the second file.
grep -f some_file: Use a file as pattern source.
<(some_command): This is a process substitution. It executes the command and feeds the output to a named pipe which then can be used as file input to grep -f.
grep -v: Print only the lines not matching the pattern(s).
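One caveat with the grep approach is that the patterns are unanchored regular expressions, and the header line of the second file also acts as a pattern, so the header of the first file is filtered out as well. A possible alternative is an exact-match anti-join on the first field in awk. This is a sketch only: subtract_by_key is a hypothetical name, and it assumes both files are semicolon-delimited, non-empty, and start with a header row.

```shell
# Hypothetical helper: print the rows of file $1 whose first field
# does not occur as a first field in file $2; the header of $1 is kept.
subtract_by_key() {
    awk -F';' '
        NR == FNR { if (FNR > 1) seen[$1]; next }   # pass 1 ($2): collect keys, skip header
        FNR == 1 || !($1 in seen)                    # pass 2 ($1): header plus unmatched rows
    ' "$2" "$1"
}
```

For the sample files, subtract_by_key /tmp/list1.csv /tmp/list2.csv should print the header followed by the bar#foo.com line.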
Update: the solution to the question, via join and awk.
join --header -1 1 -2 1 -t";" --nocheck-order -v 1 1.csv 2.csv | awk 'NR==1 {sub(/;[^;]+$/, ""); print; next} 1'
These were the inverse answers:
$ join -1 1 -2 1 -t";" --nocheck-order -o 1.1,1.2,1.3,1.4 1.csv 2.csv
EMAIL;NAME;SALUTATION;ID
foo#bar.com;Foo;Mr;1
join to the rescue.
Or the skipping of printing the NAME field without -o:
$ join -1 1 -2 1 -t";" --nocheck-order 1.csv 2.csv | awk 'BEGIN {FS=";" ; OFS=";"} {$NF=""; print }'
(But it still prints an extra ; after the last field.)
HTH
