Parsing .eml files, checking conditions and printing out specific lines - bash

I am trying to parse .eml files (thousands of files in a folder), check for specific text in each file, and, if it is there, print the matched text plus some other specific lines as one line per file into a text file.
I am using a Linux terminal to execute a command and have managed to check the condition; however, the command only prints out the file name and the matched text.
How can I modify this command to extract specific lines if the condition is matched?
for i in ./*.eml
do
cat "$i"| egrep -o "[0-9]+.from.[0-9]+" | awk -v a="$i" '{ if ( $1 = $3 ) print a, $1, "from", $3}' >> temp.txt
done
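One way to get there (a sketch, not a tested answer: the /^Subject:/ pattern is just a stand-in for whichever "specific lines" you actually want, and the split assumes the matched text is space-separated around the word "from", as your egrep output suggests) is to let awk scan the whole file itself, so the egrep is no longer needed. Remember the lines of interest as you go and print everything together when the condition matches:
for i in ./*.eml
do
awk -v a="$i" '
/^Subject:/ { subj = $0 }                      # remember the most recent line you want to reprint
match($0, /[0-9]+.from.[0-9]+/) {              # the same pattern you used with egrep -o
split(substr($0, RSTART, RLENGTH), p, " ")     # p[1]=first number, p[2]="from", p[3]=second number
if (p[1] == p[3]) print a, p[1], "from", p[3], subj
}' "$i" >> temp.txt
done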

Related

AWK remove blank lines and append empty columns to all csv files in the directory

Hi, I am looking for a way to combine all of the commands below:
1. Remove blank lines in the csv file (comma delimited)
2. Add multiple empty columns to each line, up to the 100th column
3. Perform actions 1 & 2 on all the files in the folder
I am still learning and this is the best I could get:
awk '!/^[[:space:]]*$/' x.csv > tmp && mv tmp x.csv
awk -F"," '($100="")1' OFS="," x.csv > tmp && mv tmp x.csv
They work individually, but I don't know how to put them together, and I am looking for a way to run them over all the files in the directory.
Looking for concrete AWK code or shell script calling AWK.
Thank you!
An example input would be:
a,b,c
x,y,z
Expected output would be:
a,b,c,,,,,,,,,,
x,y,z,,,,,,,,,,
You can combine them in one script, without any loops:
$ awk 'BEGIN{FS=OFS=","} FNR==1{close(f); f=FILENAME".updated"} NF{$100=""; print > f}' files...
It won't overwrite the original files; the output for each input goes to FILENAME.updated instead.
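If, after checking the .updated files, you do want them to replace the originals, a small follow-up loop takes care of that (my addition, assuming the inputs are the *.csv files in the current directory):
for f in *.csv
do
[ -f "$f.updated" ] && mv "$f.updated" "$f"
done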
You can pipe the output of the first to the other:
awk '!/^[[:space:]]*$/' x.csv | awk -F"," '($100="")1' OFS="," > new_x.csv
If you wanted to run the above on all the files in your directory, you would do:
shopt -s nullglob
for f in yourdirectory/*.csv; do
awk '!/^[[:space:]]*$/' "${f}" | awk -F"," '($100="")1' OFS="," > "yourdirectory/new_${f##*/}"
done
The shopt -s nullglob is so that an empty directory won't give you a literal yourdirectory/*.csv; see any good reference on looping through files matching a pattern.
With a recent enough GNU awk (4.1 or later, which added the inplace extension) you could:
$ gawk -i inplace 'BEGIN{FS=OFS=","}/\S/{NF=100;$1=$1;print}' *
Explained:
$ gawk -i inplace ' # using GNU awk and in-place file editing
BEGIN {
FS=OFS="," # set delimiters to a comma
}
/\S/ { # select records containing at least one non-whitespace character (\S is a gawk-specific regex operator)
NF=100 # set the field count to 100 which truncates fields above it
$1=$1 # edit the first field to rebuild the record to actually get the extra commas
print # output records
}' *
Some test data (the file also contains two empty records that are invisible below: one completely empty, and one holding just a space and a tab, so you can't see them, but trust me):
$ cat file
1,2,3
1,2,3,4,5,6,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
Output of cat file after the execution of the GNU awk program:
1,2,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,3,4,5,6,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100

How to combine awk and sed in while read line from text file to pull parts and rearrange the output

I have text files that have a source path + filename and the destination path.
What I need is to pull the destination path, append just the filename from the same line, and then prefix a system command to it.
I am nesting a while loop within a for loop to crawl through a directory of text files, first to stage the files, then to get the hash using digest, and finally to write the results to a text file.
Each line in the text file looks like this.
/folder/folder/folder/file.jpg /folder/folder/folder/xxxxx/
I can get the destination path or the file name but it is giving me fits trying to get them together.
I need it to combine into /folder/folder/folder/xxxxx/file.jpg.
Then I need to add a stage command: stage /folder/folder/folder/xxxxx/file.jpg
This gets the path:
for file in 10*.txt; do awk '{print $2}' "$file"; done
And this gets the file name:
for file in 10*.txt; do TIF=$(awk -F/ '{print $6}' "$file"); echo "$TIF"; done
But when I try to combine them using awk, sed, cut, or anything else I can Google, it only pulls the first one in the statement.
Assuming that your input file has tab-separated fields and there are no space characters in any of your file/path data, try this (printf is used here so the tab separator survives copy-paste):
printf '%s\t%s\n' "/folder/folder/folder/file.jpg" "/folder/folder/folder/xxxxx/" \
| awk -F'\t' '{n=split($1,fileArr,"/"); print "stage " $2 fileArr[n]}'
output
stage /folder/folder/folder/xxxxx/file.jpg
This will then work with
awk -F'\t' '{n=split($1,fileArr,"/"); print "stage " $2 fileArr[n]}' file
Review the output to be sure all files will be processed correctly. If so, you can then pass the output to bash and all files will be processed (staged?), i.e.
awk -F'\t' '{n=split($1,fileArr,"/"); print "stage " $2 fileArr[n]}' file | bash
IHTH
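If you would rather not generate shell commands at all, a plain bash read loop does the same job directly (a sketch under the same assumption of whitespace-separated fields with no spaces inside the paths; stage is your own command and file is the input from above):
while read -r src dest
do
stage "${dest}${src##*/}"   # dest already ends in /, so just append the bare filename
done < file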
You can use sed with # as the delimiter, which avoids having to escape all the slashes.
First match the last word before the whitespace (a run of characters without a slash); it is captured as \1.
The path after the whitespace, including its trailing slash, is captured as \2.
echo '/folder/folder/folder/file.jpg /folder/folder/folder/xxxxx/' |
sed -r 's#.*/([^/]*)\s+(.*)#stage \2\1#'
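As with the awk version, once you have reviewed the generated commands you can run the substitution over the whole input file and hand the result to the shell:
sed -r 's#.*/([^/]*)\s+(.*)#stage \2\1#' file | bash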

Awk only works on final file

I am attempting to process many .csv files using the following loop:
for i in *.csv
do
dos2unix $i
awk '{print $0, FILENAME}' $i>new_edit.csv
done
This script should append the file name to the end of each line, and it works. However, looking at the output, new_edit.csv only contains data from one of the .csv files entered.
wc -l new_edit.csv
Indicates that my awk is only processing lines from one of my csv files. How can I make my awk process every file?
Instead of using > you should use >> as the appending redirector. You could also replace the whole loop with:
$ awk '{sub(/\r/,"",$NF); print $0, FILENAME}' *.csv > new_edit.csv
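The sub(/\r/,"",$NF) call strips the trailing carriage return from each line, so the separate dos2unix step is no longer needed.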
The following program should help you.
Since you were using the redirect operator >, which overwrites the content of the file on every iteration, only the last file's output survived. If you replace it with the append redirect operator >>, the loop will process all the files and append their content to the new file:
#!/bin/bash
for i in *.csv
do
awk '{print $0, FILENAME}' "$i" >> new_edit.csv
done
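One caveat that applies to both versions (my note, not part of either answer): new_edit.csv itself matches the *.csv glob, so running the command a second time would feed the previous output back in as input. Writing the result under a name the glob cannot match sidesteps that, for example:
awk '{sub(/\r$/,""); print $0, FILENAME}' *.csv > new_edit.txt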

How do I write an awk print command in a loop?

I would like to write a loop that creates a separate output file holding the first column of each input file.
So I wrote
for i in $(\ls -d /home/*paired.isoforms.results)
do
awk -F"\t" {print $1}' $i > $i.transcript_ids.txt
done
As an example if there were 5 files in the home directory named
A_paired.isoforms.results
B_paired.isoforms.results
C_paired.isoforms.results
D_paired.isoforms.results
E_paired.isoforms.results
I would like to print the first column of each of these files into a separate output file, i.e. I would like to have 5 output files called
A.transcript_ids.txt
B.transcript_ids.txt
C.transcript_ids.txt
D.transcript_ids.txt
E.transcript_ids.txt
or any other name as long as it is 5 different names and I can still link them back to the original files.
I understand that there is a problem with the double usage of $ in both the awk command and the loop, but I don't know how to change that.
Is it possible to write a command like this in a loop?
This should do the job:
for file in /home/*paired.isoforms.results
do
base=${file##*/}
base=${base%%_*}
awk -F"\t" '{print $1}' $file > $base.transcript_ids.txt
done
I assume that there can be spaces in the first field, since you set the delimiter explicitly to tab. This runs awk once per file. There are ways to do it running awk once for all files, but I'm not convinced the benefit is significant. You could consider using cut instead of awk '{print $1}', too. Note that using ls as you did is less satisfactory than using globbing directly; it runs afoul of file names with oddball characters (spaces, tabs, etc.) in them.
You can do that entirely in awk:
awk -F"\t" '{split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; print $1 > out}' *_paired.isoforms.results
If your input files don't have names as indicated in the question, you'd have to split on something else (as well as use a different pattern match for the input files).
My original answer is actually doing extra name resolution every time something is printed. Here's a version that only updates the output filename when FILENAME changes:
awk -F"\t" 'FILENAME!=lf{split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; lf=FILENAME} {print $1 > out}' *_paired.isoforms.results

How to copy a .c file to a numbered listing

I simply want to copy my .c file into a line-numbered listing file. Basically generate a .prn file from my .c file. I'm having a hard time finding the right bash command to do so.
Do you mean nl?
nl -ba filename.c
The -ba means to number all lines, not just non-empty ones.
awk '{print FNR ":" $0}' file1 file2 ...
is one way.
FNR is the per-file record number (the current line number within each input file).
You can change the ":" per your needs.
$0 means "the-whole-line-of-input"
Or you can do
cat -n file1 file2 ....
IHTH
On my Linux system, I occasionally use pr -tn to prefix line numbers for listings. The -t option suppresses headers and footers; -n says to prefix line numbers and accepts optional format and digit specifiers; see the man page. Anyhow, to print file xyz.c to xyz.prn with line numbering, use:
pr -tn xyz.c > xyz.prn
Note, this is not as compact and handy as cat -n xyz.c > xyz.prn (using cat -n as suggested in a previous answer), but pr has numerous other options, and I most often use it when I want to both number the lines and put them into multiple columns, or print multiple files side by side. E.g., for a two-column numbered listing use:
pr -2 -tn xyz.c > xyz.prn
I think shellter has the right idea. However, if your require output written to files with prn extensions, here's one way:
awk '{ sub(/\.c$/, "", FILENAME); print FNR ":" $0 > FILENAME ".prn" }' file1.c file2.c ...
To perform this on all files in the present working directory:
for i in *.c; do awk '{ sub(/\.c$/, "", FILENAME); print FNR ":" $0 > FILENAME ".prn" }' "$i"; done
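If you prefer nl from the first answer, the same batch job can be written as (my sketch, using the same .c-to-.prn naming):
for i in *.c; do nl -ba "$i" > "${i%.c}.prn"; done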
