I'm a beginner-to-intermediate bash scripter and I'm not very familiar with working with CSV files through the terminal.
After the hours of research I've sunk into this, I'm guessing sed or awk will be my best bet; I'm just not certain of the best way to accomplish this.
The CSV is as follows:
Owner,id,permission.deleted,permission.displayName,permission.domain,permission.emailAddress,permission.id,permission.photoLink,permission.role,permission.type
owner#domain.com,some_file_id,False,Display Name,domain.com,writer#domain.com,permissionidnumber,,writer,user
owner#domain.com,some_file_id,False,Display Name,domain.com,owner#domain.com,permissionidnumber,url,owner,user
My goal is to remove from the original CSV any lines where the owner is granted permissions.
Ideally, I'd like something along the lines of "If Column A (Owner) matches Column F (permission.emailAddress), delete the line"
Desired Output - Replace existing CSV with:
Owner,id,permission.deleted,permission.displayName,permission.domain,permission.emailAddress,permission.id,permission.photoLink,permission.role,permission.type
owner#domain.com,some_file_id,False,Display Name,domain.com,writer#domain.com,permissionidnumber,,writer,user
The command I'm running uses the CSV to read and re-grant the permissions. I'm removing the owner's line because they retain ownership anyway, and if I try to grant it to them again they receive an email; I'm trying to avoid spamming my users.
If I can't match two columns within the CSV and delete the line from there, I can probably grab the owner#domain.com address and set it to a variable to use instead, if that's easier. I just have to run this against ~100 unique users, so the more I can automate, the better.
Using any awk in any shell on every Unix box, the following will execute orders of magnitude faster than a shell read loop, with far simpler and briefer code:
awk -F, '$1 != $6' file
For example:
$ awk -F, '$1 != $6' file
Owner,id,permission.deleted,permission.displayName,permission.domain,permission.emailAddress,permission.id,permission.photoLink,permission.role,permission.type
owner#domain.com,some_file_id,False,Display Name,domain.com,writer#domain.com,permissionidnumber,,writer,user
To modify the original file in place with GNU awk, use:
awk -i inplace -F, '$1!=$6' file
or, with any awk:
awk -F, '$1!=$6' file > tmp && mv tmp file
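Since this has to run against ~100 users, here is a minimal wrapper sketch, assuming one CSV per user in the current directory (the *.csv glob and the .tmp suffix are illustrative, not from the question):
for f in *.csv; do
    awk -F, '$1 != $6' "$f" > "$f.tmp" && mv -- "$f.tmp" "$f"   # same tmp-and-move idea as above
done
And if you would rather drive it from a variable, as the question suggests, awk accepts one via -v (the address here is just the sample value):
owner='owner#domain.com'
awk -F, -v o="$owner" '$6 != o' file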
Maybe awk.
$: cat x
1 2 3 4 5 6 7
3 4 3 4 5 3 7
5 6 3 4 5 6 7
7 8 3 4 5 6 7
9 0 3 4 5 9 7
$: awk '$1 == $6 { next } 1' x
1 2 3 4 5 6 7
5 6 3 4 5 6 7
7 8 3 4 5 6 7
When the first and sixth fields match, next skips to the following record; the trailing 1 is an always-true pattern whose default action is to print, so every other line passes through.
I have two large files, around 7 GB each. I would like to print the lines of the second file whose first column appears in the first file but whose second column differs. The two files are sorted but can have different numbers of lines.
The first file looks like this: (1.txt)
5 5
6 6
7 7
8 8
9 9
The second file looks like this: (2.txt):
3 3
4 4
5 5
6 6
7 4
8 4
9 9
The output should look like this:
7 4
8 4
Right now I have this one-liner, but I am not sure if it can go faster:
mawk 'NR==FNR{a[$1]=$2; next} ($1 in a) && a[$1]!=$2' 1.txt 2.txt
If the files are sorted on the join key (the first column), the easiest (and fastest) will be:
$ join 1.txt 2.txt | awk '$2!=$3{print $1,$3}'
7 4
8 4
Unlike the array-based one-liner, join streams both sorted inputs, so it never has to hold a whole 7 GB file in memory.
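One caveat: join expects its inputs sorted lexically on the join field (as by sort -k1,1). If your real keys are numbers of varying width, a re-sort sketch like this should keep join happy (process substitution is bash-specific):
join <(sort -k1,1 1.txt) <(sort -k1,1 2.txt) | awk '$2!=$3{print $1,$3}'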
How would one go about using sed to extract blocks of n lines from a file, skipping m lines between blocks?
Say my textfile looks like this:
myfile.dat:
1
2
3
4
5
6
7
8
9
10
Say that I want to extract blocks of three lines and then skip two lines, throughout the entire file, such that my output looks like this:
output.dat:
1
2
3
6
7
8
Any suggestions on how one could achieve this with sed?
Edit:
For my example I could just have used
sed -n 'p;n;p;n;p;n;n' myfile.dat > output.dat
or with GNU sed (not preferred due to portability)
sed '1~5b;2~5b;3~5b;d' myfile.dat > output.dat
However, I typically want to print blocks of 2450 lines from a file with 49,002,450 lines, such that my output file contains 247,450 lines.
This might work for you (GNU sed):
sed -n '1~5,+2p' file
Starting at line 1, this prints every fifth line and the two lines that follow it.
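The same GNU sed form scales to the 2450-line blocks from the edit; a sketch, assuming the skip length is also 2450 so the period is 4900 (substitute your own numbers):
sed -n '1~4900,+2449p' myfile.dat > output.dat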
An alternative:
sed -n 'N;N;p;n;n' file
The two N commands append the next two lines to the pattern space, p prints all three, and the two n commands read past the two lines to skip.
In your case the below would work. It uses a range pattern: printing starts on a line where NR%5==1 and stops on the next line where NR%5==3:
awk 'NR%5==1, NR%5==3' myfile.dat
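The awk version generalizes the same way; a parameterized sketch, again assuming 2450 lines printed and 2450 skipped:
awk -v n=2450 -v m=2450 '(NR-1) % (n+m) < n' myfile.dat > output.dat
With n=3 and m=2 this reproduces the sample output above.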
If I have a text file with the following form
1 1
1 3
3 4
2 2
5 7
...
Is there a Linux command that can give me the following result?
1 3
3 4
5 7
...
So, I want to delete the lines 1 1 and 2 2.
Yes, you can use something like:
awk '$1!=$2{print}' inputfilename
or the slightly less verbose (thanks to ooga):
awk '$1!=$2' inputfilename
which uses the "missing action means print" feature of awk.
Both these awk commands print lines where the columns don't match, and throw away everything else.
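With the sample input above saved as inputfilename, a quick run looks like:
$ awk '$1!=$2' inputfilename
1 3
3 4
5 7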
Let's say I have a file like:
thing1(space)thing2(space)thing3(space)thing4
E.g.
1 apple 3 4
3 banana 3 8
3 pear 11 12
13 cheeto 15 16
Can I only show lines where thing3 is greater than 3? (i.e. pear and cheeto)
I can easily do this in Python, but can we do this in the shell? Maybe with awk? I'm still researching this.
You can do that easily with awk, if that's an option available to you, by saying:
awk '$3>3' inputFile
$ cat file
1 apple 3 4
3 banana 3 8
3 pear 11 12
13 cheeto 15 16
$ awk '$3>3' file
3 pear 11 12
13 cheeto 15 16
awk by default splits each line into fields delimited by whitespace and assigns them to variables that you reference by column number: $1, $2, and so on. In your case you need $3. Since the constant 3 is numeric and the field looks numeric, $3>3 is a numeric comparison, not a string one.
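If the threshold isn't fixed, you can pass it in as an awk variable; a small sketch (the min name is arbitrary):
awk -v min=3 '$3 > min' inputFile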