There has been a data breach, and I need to find the paths of all files on a file server that contain email addresses.
I was trying:
grep -lr --include='*.{csv,xls,xlsx,txt}' "\b[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" * >output.txt
But this returns nothing.
I would be grateful for any suggestions. Thanks!
Your grep command is almost correct; a few small glitches keep it from working.
First, your pattern uses extended-regex features such as {2,6}, so you need grep's extended-regex option -E. Note also that the pattern matches # where an email address has @; you almost certainly want @ there.
Next, as explained in this answer to another question, your --include pattern will not work in zsh. You need to put your closing quote before the braces, so the shell can expand them, as follows: --include='*.'{csv,xls,xlsx,txt}
Finally, if you want to search every file on the server, run the command on the root directory / rather than on *, which only covers the files and directories in your current working directory.
So your grep command should be:
grep -Elr --include='*.'{csv,xls,xlsx,txt} "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" /
Some points to take into account:
you will not detect emails in Excel files (.xls and .xlsx), since those are binary formats that grep cannot parse.
email pattern matching is rather hard; there are many special cases in address syntax. The pattern you are currently using will catch almost all addresses, but not all of them.
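As an aside, if the --include brace syntax keeps tripping you up across shells, a sketch of a more portable alternative is to let find select the files and hand them to grep with null-terminated names (the regex is the one from the question, with @ in place of #; \b is a GNU grep extension):

```shell
# Select the file types with find (portable across shells), then grep them.
# -print0 / -0 keep filenames containing spaces or newlines intact.
find / \( -name '*.csv' -o -name '*.txt' \) -print0 \
  | xargs -0 grep -lE '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b' > output.txt
```

This still skips the binary .xls/.xlsx formats, for the reason noted above.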
Related
I have several files inside a directory, some which contain the word "sweet". I would like to use grep to find the files which contain the exact word and then move them to a different folder.
This is my code:
mv `grep -lir 'sweet' ~/directory1/` ~/directory2
However, there are some files containing "sweets", "sweeter" or "Sweet", and my command moves those as well, whereas I want the match to be exactly "sweet".
Please help, thanks.
Using grep -lrwF works: -w matches whole words only (ruling out "sweets" and "sweeter"), -F treats the pattern as a fixed string rather than a regex, and dropping the -i makes the match case-sensitive, ruling out "Sweet". See the comment thread with Shawn for details.
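A sketch of the move itself, null-terminating the file list so names with spaces survive (-Z, xargs -r and mv -t are GNU extensions, so this assumes GNU grep, findutils and coreutils):

```shell
# -l list matching files, -r recurse, -w whole words only, -F fixed string,
# -Z terminate each filename with NUL; xargs -0 reads NUL-separated input
# and -r skips running mv entirely when nothing matched.
grep -lrwFZ 'sweet' ~/directory1/ | xargs -0 -r mv -t ~/directory2/
```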
I have got this: gsutil ls -d gs://mystorage/*123*,
which gives me all files matching the pattern "123".
I wonder if I could add a condition like >123 and <127, to grab all files whose names contain 124, 125 or 126.
Besides *, gsutil supports other special wildcard characters.
You can use these wildcards to match your file names, but keep in mind that you are working with strings of characters rather than numbers, so the solution is not entirely straightforward. Here is a guide to regular expressions that explains, in a general way, how to work with digits.
For your specific question, you would end up with something like:
gsutil ls -d gs://mystorage/*12[456]*
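An alternative sketch uses shell brace expansion instead of a character class: the braces are expanded by your shell (a bash/zsh feature, not a gsutil one) before gsutil runs, so gsutil simply receives several wildcard URLs. This is handy when the names you want do not fit a single character class, e.g. 124, 125 and 131:

```shell
# The shell turns this into three separate arguments:
#   gs://mystorage/*124*  gs://mystorage/*125*  gs://mystorage/*126*
gsutil ls -d gs://mystorage/*{124,125,126}*
```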
I have a pile of 50 .rar files on a web server and I want to download them all.
The names of the files have nothing in common other than the .rar extension.
I wanted to try aria2 to download them all at once, but I think I need to write a script to collect the addresses of all the .rar files first.
I have no idea how to start writing the script. Any hint will be appreciated.
You can try wget with the -A option in your shell script:
wget -r "https://foo/" -P /tmp -A "*.rar"
Here is an explanation of what -A does:
Specify comma-separated lists of file name suffixes or patterns to accept or reject (see Types of Files). Note that if any of the wildcard characters, ‘*’, ‘?’, ‘[’ or ‘]’, appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix. In this case, you have to enclose the pattern into quotes to prevent your shell from expanding it, like in ‘-A "*.mp3"’ or ‘-A '*.mp3'’.
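If you would still rather use aria2, one sketch is to scrape the link list yourself and feed it to aria2c. "https://foo/" is the placeholder URL from above, and this assumes the index page lists absolute hrefs ending in .rar; relative links would need the base URL prepended:

```shell
# Pull the page, keep only href="...rar" attributes, strip the wrapper,
# then let aria2 download the whole list (possibly in parallel).
curl -s "https://foo/" \
  | grep -oE 'href="[^"]*\.rar"' \
  | sed 's/^href="//; s/"$//' > rar-urls.txt
aria2c -i rar-urls.txt
```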
I am trying to redirect emails that match a particular pattern to a shell script which will create files containing the texts, with datestamped filenames.
First, here is the routine from .procmailrc that hands the emails off to the script:
:0c:
* Subject: ^Ingest_q.*
| /home/myname/procmail/process
and here is the script 'process':
#!/bin/bash
DATE=`date +%F_%N`
FILE=/home/myname/procmail/${DATE}_email.txt
while IFS= read -r line
do
echo "$line" 1>>"$FILE";
done
I have gotten very frustrated with this because I can pipe text to this script on the command line and it works fine:
mybox-248: echo 'foo' | process
mybox-249: ls
2013-07-31_856743000_email.txt process
The file contains the word 'foo.'
I have been trying to get an email text to get output as a date-stamped file for hours now, and nothing has worked.
(I've also turned logging on in my .procmailrc and that isn't working either -- I'm not trying to ask a second question by mentioning that, just wondering if that might provide some hint as to what I might be doing wrong ...).
Thanks,
GB
Quoting your attempt:
:0c:
* Subject: ^Ingest_q.*
| /home/myname/procmail/process
The regex is wrong: ^ only matches at the beginning of a line, so it cannot occur after Subject:. Try this instead:
:0c:process.lock
* ^Subject: Ingest_q
| /home/myname/procmail/process
I also specified a named lockfile; I do not believe Procmail can infer a lock file name from just a script name. Since multiple email messages might be delivered at the same time, and you don't want their output intermingled in one file, using a lock file is required here.
Finally, the trailing .* in the regex is completely redundant, so I removed it.
(The olde Procmail mini-FAQ also addresses both of these issues.)
I realize your recipe is probably just a quick test before you start on something bigger, but the entire recipe invoking the process script can be completely replaced by something like
MAILDIR=/home/myname/procmail
DATE=`date +%F_%N`
:0c:
${DATE}_email.txt
This will generate Berkeley mbox format, i.e. each message should have a From_ pseudo-header before the real headers. If you are not sure whether that is already the case, you should probably invoke Procmail as procmail -Yf- to make it so (otherwise there is no reliable way to tell where one message ends and the next begins; this applies both to your original solution and to this replacement).
Because Procmail sees the file name you are delivering to, it can infer a lockfile name now, as a minor bonus.
Using MAILDIR to specify the directory is the conventional way to do this, but you can specify a complete path to an mbox file if you prefer, of course.
If I want an alias to do "rgrep pattern *" to search all files from my current location down through any subdirectories, what alias for rgrep can I add to my bashrc file?
I would also like it to ignore errors and only report positive hits.
In order for it to ignore errors (such as "Permission denied"), you'll probably need to use a function instead of an alias:
rgrep () { grep -r "$@" 2>/dev/null; }
How about:
alias rgrep="grep -r"
This will only show 'positive hits', i.e. lines that contain the pattern you specify.
Small piece of advice, however: you might want to get used to just using grep -r directly. You'll then find it much easier if you ever need to use someone else's workstation, for instance!
Edit: you want to match patterns in file names, not in their contents (and in directory names too). So how about this instead:
alias rgrep="find | grep"
By default, find will find all files and directories, so then it's just a case of passing that list to grep to find the pattern you're looking for.
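For comparison, find can also do the name matching itself, without grep. Note the difference: -name matches only the last path component, whereas find | grep matches anywhere in the printed path; 'pattern' below is a placeholder:

```shell
# Quote the glob so the shell does not expand it before find sees it.
find . -name '*pattern*'
```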