grep search string and copy line from tsv to another file - bash

I have a TSV file with thousands of tab-delimited lines, and I need to search for someone's name and then copy the entire line to a separate file, over and over. Can anyone help? Thanks!!

The question is vague, but this should generally work:
grep "some-name" *.tsv > output

It sounds very simple:
grep "someone's name" tsv-file > separate-file
What's the catch? Is the name in one or two fields? Middle initials?
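If you need to do this repeatedly for different names, or want to match the name only in one tab-separated column, a couple of sketches along the same lines (the file names, the name, and the column number are just placeholders):
# Append results from repeated searches to a single output file.
grep "First Person" data.tsv >> matches.tsv
grep "Second Person" data.tsv >> matches.tsv
# Or match an exact name in a specific column (here, column 2).
awk -F'\t' '$2 == "Jane Doe"' data.tsv >> matches.tsv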

Related

Bash: Identifying file based on part of filename

I have a folder containing paired files with names that look like this:
PB3999_Tail_XYZ_1234.bam
PB3999_PB_YWZ_5524.bam
I want to pass the files into a for loop as such:
for input in `ls PB*_Tail_.bam`; do tumor=${input%_Tail_*.bam}; $gatk Mutect2 -I $input -I$tumor${*}; done
The issue is, I can't seem to get the syntax right for the tumor input. I want it to recognise the paired file by the first part of the name PB3999_PB while ignoring the second half of the file name _YWZ_5524 that does not match.
Thank you for any help!
I just replaced ${*} with * in the script from the question, added the _PB_ suffix to the prefix, and renamed the variables.
for tailfname in PB*_Tail_*.bam; do
    pairprefix="${tailfname%_Tail_*.bam}"
    echo command with "${tailfname}" "${pairprefix}"_PB_*.bam
done
Hope this helps. The variable name tumor sounds scary; I hope the right files end up paired.
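If you want to go beyond the echo placeholder, a minimal sketch of the real call could look like the following. The -O output flag and the assumption that every _Tail_ file has exactly one _PB_ partner are mine, not from the question, so adjust to your GATK setup.
#!/usr/bin/env bash
# Sketch only: assumes $gatk points at your GATK wrapper and that every
# PB*_Tail_*.bam has exactly one matching PB*_PB_*.bam partner.
shopt -s nullglob
for tailfname in PB*_Tail_*.bam; do
    pairprefix="${tailfname%_Tail_*.bam}"       # e.g. PB3999
    partners=( "${pairprefix}"_PB_*.bam )       # candidate paired file(s)
    if (( ${#partners[@]} != 1 )); then
        echo "Skipping ${tailfname}: found ${#partners[@]} partner(s)" >&2
        continue
    fi
    "$gatk" Mutect2 -I "${tailfname}" -I "${partners[0]}" -O "${pairprefix}_mutect2.vcf.gz"
done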
I'm trying to fully understand what you want to do here.
If you want to extract just the first two parts, this should do:
echo "PB3999_Tail_XYZ_1234.bam" | cut -d '_' -f 1-2
That returns just the "PB3999_Tail" part.

Using grep/sed for any length patterns between two words

I have a directory called mydirectory with a bunch of text files containing the words 'SAVE' and 'ME!' many times. I want to print every line matching this specific pattern: 'SAVE', followed by anything (any number of characters), followed by 'ME', followed by a non-zero number of '!' characters.
To do this, I came up with sed -n '/SAVE/,/ME!\{1\}/p' mydirectory/*, but this did not work. Does anybody know how to do this? I can only use sed and grep for this.
File:
SAVE US OR JUST ME!!
BRAINSSSSSSSSSSSSS
SAVE US OR JUST ME
BRAINSSSSSSSSSSSSS
SAVE ME!
BRAINNNNNNNNNNNNS
SAVE ME
Desired Output
SAVE US OR JUST ME!!
SAVE ME!
$ grep -E '^SAVE.*ME!+$' file
output:
SAVE US OR JUST ME!!
SAVE ME!
The ^ and $ anchor the pattern to the beginning and end of the line, which I guess is what you want.
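If you have to use sed, or want to run it across the whole directory as in the question, something along these lines should be equivalent:
# sed with basic regular expressions: \{1,\} means "one or more" exclamation marks.
sed -n '/^SAVE.*ME!\{1,\}$/p' mydirectory/*
# grep across the directory; -h drops the file-name prefix from the output.
grep -hE '^SAVE.*ME!+$' mydirectory/*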

Implementation issue in Cascading while reading data from HDFS

Suppose I have these files in hdfs directory
500/Customer/part-001
500/Customer/part-002
500/Customer/part-003
Is it possible to check which part file a tuple is coming from?
Note: I have researched but found nothing.
Your question is not very clear.
Let's say your output is in the following layout and the delimiter is ';':
id;name;age
1;Jordan;22
2;Nathan;33
and so on
You could use awk or grep (or both) to get the record.
For example, if you want to search for the record Nathan, try:
grep -r "Nathan" part*
The above command will search for the string "Nathan"; if the string is present in any part file, the first entry (word) in the output will be the name of that file.
If you don't want the file name, you could use
grep -hr "Nathan" part*
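If you also want to pick out a specific field rather than the whole line, an awk sketch (using the ';' delimiter and column layout assumed above):
# FILENAME is a built-in awk variable holding the current input file's name,
# so this prints which part file each matching record came from.
awk -F';' '$2 == "Nathan" { print FILENAME ": " $0 }' part*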
Please be clearer when asking questions.
I found out how to determine which part file a tuple comes from. I solved my problem using the code below.
String fileName = flowProcess.getProperty("cascading.source.path").toString();
Thanks,

AIX script for file information

I have a file, on an AIX server, with multiple record entries in the format below:
Name(ABC XYZ) Gender(Male)
AGE(26) BDay(1990-12-09)
My problem is that I want to extract the name and the birthday from the file for all the records, and list them like below:
ABC XYZ 1990-12-09
Can someone please help me with the scripting?
Something like this maybe:
awk -F"[()]" '/Name/ && /Gender/{name=$2} /BDay/{print name,$4}' file.txt
That says... "treat opening and closing parentheses as field separators. If you see a line go by that contains Name and Gender, save the second field in the variable name. If you see a line go by that contains BDay, print out the most recently saved name and also the fourth field on the current line."
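For a quick sanity check, you can feed it the two sample lines from the question (the sample.txt file name here is just an example):
# Build a tiny test file from the sample record, then run the one-liner.
printf 'Name(ABC XYZ) Gender(Male)\nAGE(26) BDay(1990-12-09)\n' > sample.txt
awk -F"[()]" '/Name/ && /Gender/{name=$2} /BDay/{print name, $4}' sample.txt
# Expected output: ABC XYZ 1990-12-09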

Remove all lines from a given text file based on a given list of IDs

I have a list of IDs like so:
11002
10995
48981
And a tab delimited file like so:
11002 Bacteria;
10995 Metazoa
I am trying to delete all lines in the tab-delimited file containing one of the IDs from the ID list file. For some reason the following won't work and just returns the same complete tab-delimited file without any line removed whatsoever:
grep -v -f ID_file.txt tabdelimited_file.txt > New_tabdelimited_file.txt
I also tried numerous other combinations with grep, but currently I'm drawing a blank here.
Any idea why this is failing?
Any help would be greatly appreciated.
Since you tagged this with awk, here is one way of doing it:
awk 'BEGIN{FS=OFS="\t"}NR==FNR{ids[$1]++;next}!($1 in ids)' idFile tabFile > new_tabFile
BTW your grep command is correct. Just double-check that your ID file isn't formatted for Windows (CRLF line endings).
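If the ID file did come from Windows, one way to check and fix it before re-running grep (file names as in the question):
# Show invisible characters; trailing ^M means CRLF (Windows) line endings.
cat -v ID_file.txt | head
# Strip the carriage returns, then retry the original grep.
tr -d '\r' < ID_file.txt > ID_file.unix.txt
grep -v -f ID_file.unix.txt tabdelimited_file.txt > New_tabdelimited_file.txt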
