Parsing .eml files, checking conditions and printing out specific lines - bash
I am trying to parse .eml files (thousands of files in a folder), check each one for specific text and, if it's there, print that text together with other specific lines to a text file, one line per file.
I am using a Linux terminal and have managed to check the condition, but the command only prints the file name and the matched text.
How can I modify this command to extract specific lines when the condition is matched?
for i in ./*.eml
do
    # keep only matches where both numbers are equal (== compares; a single = would assign)
    grep -Eo "[0-9]+.from.[0-9]+" "$i" | awk -v a="$i" '{ if ($1 == $3) print a, $1, "from", $3 }' >> temp.txt
done
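One possible way to pull extra lines out of each matching file is to let awk remember them while it scans the file and print everything in the END block. This is only a sketch under assumptions: the "other specific lines" are taken to be the Subject: header (the question does not say which lines are wanted), the separator around "from" is assumed to be a single space, and summarize_eml is a made-up name.

```shell
# Sketch: print one line per .eml file that contains "N from N" (same number twice).
# Assumption: the extra line wanted is the first Subject: header.
summarize_eml() (
  for f in "$1"/*.eml; do
    [ -e "$f" ] || continue              # skip the literal pattern if nothing matches
    awk -v name="${f##*/}" '
      /^Subject:/ && subj == "" { subj = $0 }          # remember the first Subject: line
      hit == "" && match($0, /[0-9]+ from [0-9]+/) {   # first "N from M" on this line
        split(substr($0, RSTART, RLENGTH), p, " ")
        if (p[1] == p[3]) hit = p[1] " from " p[3]     # keep it only when N equals M
      }
      END { if (hit != "") print name, hit, subj }     # one output line per matching file
    ' "$f"
  done
)

# Demo on two throwaway files.
tmp=$(mktemp -d)
printf 'Subject: Report\nDelivered 5 from 5 messages\n' > "$tmp/a.eml"
printf 'Subject: Other\n3 from 7 delivered\n' > "$tmp/b.eml"
summarize_eml "$tmp"    # prints: a.eml 5 from 5 Subject: Report
rm -rf "$tmp"
```

To collect the results the way the original loop does, something like summarize_eml . >> temp.txt should work.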
Related
AWK remove blank lines and append empty columns to all csv files in the directory
Hi, I am looking for a way to combine the commands below so that they:

1. Remove blank lines in the csv file (comma delimited)
2. Add multiple empty columns to each line, up to the 100th column
3. Perform actions 1 and 2 on all the files in the folder

I am still learning, and this is the best I could get:

    awk '!/^[[:space:]]*$/' x.csv > tmp && mv tmp x.csv
    awk -F"," '($100="")1' OFS="," x.csv > tmp && mv tmp x.csv

They work individually, but I don't know how to put them together, and I am looking for a way to run them on all the files in the directory. I am looking for concrete AWK code or a shell script calling AWK. Thank you!

An example input would be:

    a,b,c
    x,y,z

Expected output would be:

    a,b,c,,,,,,,,,,
    x,y,z,,,,,,,,,,
You can combine them in one script without any loops:

    $ awk 'BEGIN{FS=OFS=","} FNR==1{close(f); f=FILENAME".updated"} NF{$100=""; print > f}' files...

It won't overwrite the original files.
You can pipe the output of the first command into the second:

    awk '!/^[[:space:]]*$/' x.csv | awk -F"," '($100="")1' OFS="," > new_x.csv

If you wanted to run the above on all the files in your directory, you would do:

    shopt -s nullglob
    for f in yourdirectory/*.csv; do
        awk '!/^[[:space:]]*$/' "${f}" | awk -F"," '($100="")1' OFS="," > "${f%/*}/new_${f##*/}"
    done

The shopt -s nullglob is there so that an empty directory won't give you a literal *.csv. The output name puts the new_ prefix on the base name, so each result is written next to its source file. Quoted from a good source about looping through files.
With a recent enough GNU awk you could do:

    $ gawk -i inplace 'BEGIN{FS=OFS=","}/\S/{NF=100;$1=$1;print}' *

Explained:

    $ gawk -i inplace '    # using GNU awk and in-place file editing
    BEGIN {
        FS=OFS=","         # set delimiters to a comma
    }
    /\S/ {                 # gawk-specific regex operator that matches any character that is not a space
        NF=100             # set the field count to 100, which truncates fields above it
        $1=$1              # edit the first field to rebuild the record and actually get the extra commas
        print              # output records
    }' *

Some test data (the first empty record is empty, the second empty record has a space and a tab, trust me bro):

    $ cat file
    1,2,3

    1,2,3,4,5,6,

    1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101

Output of cat file after the execution of the GNU awk program:

    1,2,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    1,2,3,4,5,6,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
    1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
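For anyone who wants to try the idea on a small scale first, here is a sketch that does both steps in one awk pass with a configurable column count. The function name pad_csv and the .updated suffix (borrowed from the answer above) are just illustrative choices. It differs from the answers in two hedged ways: it skips whitespace-only lines as well as empty ones, and it only pads lines that have fewer than n fields, so an existing n-th field is left alone.

```shell
# Sketch: drop blank lines and pad each record to n comma-separated fields.
# Usage: pad_csv N file.csv ...   (writes file.csv.updated next to each input)
pad_csv() (
  n=$1; shift
  awk -v n="$n" 'BEGIN { FS = OFS = "," }
    FNR == 1 { if (out != "") close(out); out = FILENAME ".updated" }  # new input file, new output
    /[^ \t\r]/ {                      # line has at least one non-blank character
      if (NF < n) $n = ""             # assigning field n creates the missing empty fields
      print > out
    }' "$@"
)

# Demo with 5 columns instead of 100 so the result is easy to read.
tmp=$(mktemp -d)
printf 'a,b,c\n\n \t\nx,y,z\n' > "$tmp/x.csv"
pad_csv 5 "$tmp/x.csv"
cat "$tmp/x.csv.updated"    # prints a,b,c,, and x,y,z,,
rm -rf "$tmp"
```

Once the output looks right, pad_csv 100 *.csv would cover every CSV file in the current directory without touching the originals.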
How to combine awk and sed in while read line from text file to pull parts and rearrange the output
I have text files that contain a source path + filename and a destination path. What I need is to pull the destination path, append just the filename from the source path, and then put a system command in front of it. I am nesting a while loop within a for loop to crawl through a directory of text files, first to stage the files, then to get the hash using digest, then to write the results to a text file. Each line in the text file looks like this:

    /folder/folder/folder/file.jpg /folder/folder/folder/xxxxx/

I can get the destination path or the file name, but it is giving me fits trying to get them together. I need to combine them into /folder/folder/folder/xxxxx/file.jpg and then add a stage command:

    stage /folder/folder/folder/xxxxx/file.jpg

This gets the path:

    for file in 10*.txt; do cat "$file" | awk '{print $2}'; done

And this gets the file name:

    for file in 10*.txt; do TIF=`cat "$file" | awk '{print $6}' FS=/`; echo $TIF; done

But when I try to combine them using awk, sed, cut or anything else I can Google, it only pulls the first one in the statement.
Assuming that your input file has tab-separated fields and there are no space chars in any of your file/path data, try this:

    printf '/folder/folder/folder/file.jpg\t/folder/folder/folder/xxxxx/\n' \
    | awk '-F\t' '{n=split($1,fileArr,"/"); print "stage " $2 fileArr[n]}'

output

    stage /folder/folder/folder/xxxxx/file.jpg

This will then work with

    awk '-F\t' '{n=split($1,fileArr,"/"); print "stage " $2 fileArr[n]}' file

Review the output to be sure all files will be processed correctly. If so, you can then pass the output to bash and all files will be processed (staged?), i.e.

    awk '-F\t' '{n=split($1,fileArr,"/"); print "stage " $2 fileArr[n]}' file | bash

IHTH
You can use sed with the delimiter #. First match the last word (a string without a slash) before the whitespace; it will be stored in \1. Store the path (after the whitespace) in \2. The destination path already ends in a slash, so the replacement joins \2 and \1 directly:

    echo '/folder/folder/folder/file.jpg /folder/folder/folder/xxxxx/' | sed -r 's#.*/([^/]*)\s+(.*)#stage \2\1#'
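The same rearrangement can also be done without awk or sed, using a while read loop and shell parameter expansion. This is a sketch, not a drop-in script: it assumes tab-separated lines and a destination path that ends in a slash, stage_commands is a made-up name, and it only prints the stage commands rather than running them.

```shell
# Sketch: turn "source<TAB>destination/" lines into "stage destination/filename" commands.
# Assumption: fields are separated by a tab and the destination ends in "/".
stage_commands() (
  tab=$(printf '\t')
  cat "$@" | while IFS=$tab read -r src dst; do
    printf 'stage %s%s\n' "$dst" "${src##*/}"   # ${src##*/} strips everything up to the last slash
  done
)

# Demo with the sample line from the question.
tmp=$(mktemp -d)
printf '/folder/folder/folder/file.jpg\t/folder/folder/folder/xxxxx/\n' > "$tmp/10a.txt"
stage_commands "$tmp"/10*.txt    # prints: stage /folder/folder/folder/xxxxx/file.jpg
rm -rf "$tmp"
```

Printing first and executing later keeps the review step suggested above: check the output, then pipe it to a shell once it looks right.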
Awk only works on final file
I am attempting to process many .csv files using the following loop:

    for i in *.csv
    do
        dos2unix $i
        awk '{print $0, FILENAME}' $i>new_edit.csv
    done

This script should append the file name to the end of each line, and it works. However, the output new_edit.csv only contains data from one of the .csv files entered.

    wc -l new_edit.csv

indicates that my awk is only processing lines from one of my csv files. How can I make my awk process every file?
Instead of > you should use >>, the appending redirection operator. You could also replace the whole loop with a single command that strips the trailing carriage return itself:

    $ awk '{sub(/\r$/,""); print $0, FILENAME}' *.csv > new_edit.csv
The following program should help you. You were using the redirect operator >, which overwrites the contents of the file on every iteration. If we replace it with the append redirect operator >>, the loop processes all files and appends their contents to the new file:

    #!/bin/bash
    for i in *.csv
    do
        awk '{print $0, FILENAME}' "$i" >> new_edit.csv
    done
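A third option, sketched here under assumptions, is to redirect the whole loop once instead of redirecting inside it. The output file is then opened a single time, so nothing is overwritten between iterations and a rerun does not keep appending to old results. The name tag_lines is made up, and the output is deliberately written outside the *.csv pattern so a later run does not pick it up as input.

```shell
# Sketch: append each file's name to its lines, with one redirection for the whole loop.
# Usage: tag_lines DIR OUTFILE   (OUTFILE should not match DIR/*.csv)
tag_lines() (
  for f in "$1"/*.csv; do
    [ -e "$f" ] || continue
    # strip a trailing CR (DOS line ending), then print the line and the file name
    awk -v name="${f##*/}" '{ sub(/\r$/, ""); print $0, name }' "$f"
  done > "$2"
)

# Demo: one file with DOS line endings, one with Unix line endings.
tmp=$(mktemp -d)
printf '1,2\r\n' > "$tmp/a.csv"
printf '3,4\n' > "$tmp/b.csv"
tag_lines "$tmp" "$tmp/new_edit.txt"
cat "$tmp/new_edit.txt"    # prints "1,2 a.csv" then "3,4 b.csv"
rm -rf "$tmp"
```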
How do I write an awk print command in a loop?
I would like to write a loop that creates an output file for each input file, containing the first column of that input file. So I wrote

    for i in $(\ls -d /home/*paired.isoforms.results)
    do
        awk -F"\t" '{print $1}' $i > $i.transcript_ids.txt
    done

As an example, if there were 5 files in the home directory named

    A_paired.isoforms.results
    B_paired.isoforms.results
    C_paired.isoforms.results
    D_paired.isoforms.results
    E_paired.isoforms.results

I would like to print the first column of each of these files into a separate output file, i.e. I would like to have 5 output files called

    A.transcript_ids.txt
    B.transcript_ids.txt
    C.transcript_ids.txt
    D.transcript_ids.txt
    E.transcript_ids.txt

or any other name, as long as it is 5 different names and I can still link them back to the original files. I understand that there is a problem with the double usage of $ in both the awk and the loop command, but I don't know how to change that. Is it possible to write a command like this in a loop?
This should do the job:

    for file in /home/*paired.isoforms.results
    do
        base=${file##*/}
        base=${base%%_*}
        awk -F"\t" '{print $1}' "$file" > "$base.transcript_ids.txt"
    done

I assume that there can be spaces in the first field, since you set the delimiter explicitly to tab. This runs awk once per file. There are ways to do it running awk once for all files, but I'm not convinced the benefit is significant. You could consider using cut instead of awk '{print $1}', too. The single quotes around the awk program keep the shell from touching awk's $1, so there is no clash with the shell's own $file.

Note that using ls as you did is less satisfactory than using globbing directly; it runs afoul of file names with oddball characters (spaces, tabs, etc.) in the name.
You can do that entirely in awk:

    awk -F"\t" '{split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; print $1 > out}' *_paired.isoforms.results

If your input files don't have names as indicated in the question, you'd have to split on something else (as well as use a different pattern match for the input files).

My original answer is actually doing extra name resolution every time something is printed. Here's a version that only updates the output filename when FILENAME changes:

    awk -F"\t" 'FILENAME!=lf{split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; lf=FILENAME} {print $1 > out}' *_paired.isoforms.results
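For completeness, the cut variant mentioned above might look like the sketch below. It assumes tab-separated input (the default delimiter for cut) and file names of the form X_paired.isoforms.results; first_columns is a made-up name. Since no awk program is involved, the only $ in the loop belongs to the shell.

```shell
# Sketch: write the first tab-separated column of each input file to its own output file.
# Usage: first_columns DIR   (outputs land in DIR as <prefix>.transcript_ids.txt)
first_columns() (
  cd "$1" || exit 1
  for file in *_paired.isoforms.results; do
    [ -e "$file" ] || continue
    base=${file%%_*}                              # A_paired.isoforms.results -> A
    cut -f1 "$file" > "$base.transcript_ids.txt"
  done
)

# Demo with two small tab-separated files.
tmp=$(mktemp -d)
printf 't1\t10\nt2\t20\n' > "$tmp/A_paired.isoforms.results"
printf 't3\t30\n' > "$tmp/B_paired.isoforms.results"
first_columns "$tmp"
cat "$tmp/A.transcript_ids.txt"    # prints t1 and t2, one per line
rm -rf "$tmp"
```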
How to copy a .c file to a numbered listing
I simply want to copy my .c file into a line-numbered listing file. Basically generate a .prn file from my .c file. I'm having a hard time finding the right bash command to do so.
Do you mean nl?

    nl -ba filename.c

The -ba means to number all lines, not just non-empty ones.
    awk '{print FNR ":" $0}' file1 file2 ...

is one way. FNR is the record number within the current file (i.e. the current line number per file). You can change the ":" per your needs. $0 means "the whole line of input".

Or you can do cat -n file1 file2 ....

IHTH
On my Linux system, I occasionally use pr -tn to prefix line numbers for listings. The -t option suppresses headers and footers; -n says to prefix line numbers. -n allows optional format and digit specifiers; see the man page. Anyhow, to print file xyz.c to xyz.prn with line numbering, use:

    pr -tn xyz.c > xyz.prn

Note, this is not as compact and handy as cat -n xyz.c > xyz.prn (using cat -n as suggested in a previous answer), but pr has numerous other options, and I most often use it when I want to both number the lines and put them into multiple columns or print multiple files side by side. E.g. for a 2-column numbered listing use:

    pr -2 -tn xyz.c > xyz.prn
I think shellter has the right idea. However, if you require output written to files with prn extensions, here's one way (the output name is parenthesized because an unparenthesized concatenation after > is not parsed the same way by every awk):

    awk '{ sub(/\.c$/, "", FILENAME); print FNR ":" $0 > (FILENAME ".prn") }' file1.c file2.c ...

To perform this on all files in the present working directory:

    for i in *.c; do awk '{ sub(/\.c$/, "", FILENAME); print FNR ":" $0 > (FILENAME ".prn") }' "$i"; done
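If nl is available, the same .prn listings can be produced with a small loop. This sketch assumes the sources end in .c; make_listings is a made-up name. It uses nl -ba so that blank lines are numbered too.

```shell
# Sketch: write a numbered listing NAME.prn for every NAME.c given as an argument.
make_listings() {
  for src in "$@"; do
    nl -ba "$src" > "${src%.c}.prn"    # ${src%.c} drops the .c suffix
  done
}

# Demo: a three-line file whose second line is blank.
tmp=$(mktemp -d)
printf 'int main(void) {\n\n}\n' > "$tmp/xyz.c"
make_listings "$tmp"/*.c
cat "$tmp/xyz.prn"    # three numbered lines, including the blank one
rm -rf "$tmp"
```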