Bash script reading lines in every file, copying specific values to a new file

I want to write a script to help me with my work.
Problem: I have many files in one directory containing data, and from every file I need specific values copied into a new file.
The data files can look like this:
Name abc $desV0
Start MJD56669 opCMS v2
End MJD56670 opCMS v2
...
valueX 0.0456 RV_gB
...
valueY 12063.23434 RV_gA
...
What the script should do is copy valueX and the value that follows it, as well as valueY and its following value, into a new file on one line, and then add the name of the source data file to that line. Additionally, the value of valueY should only contain everything before the dot.
The result should look like this:
valueX 0.0456 valueY 12063 name_of_sourcefile
This is what I have so far:
for file in $(find -maxdepth 0 -type f -name *.wt); do
for line in $(cat $file | grep -F vb); do
cp $line >> file_done
done
done
But that doesn't work at all. I also have no idea how to get the data onto ONE line in the new file.
Can anyone help me?

I think you can simplify your script a lot using awk:
awk '/valueX/{x=$2}/valueY/{print "valueX",x,"valueY",$2,FILENAME}' *.wt > file_done
This goes through every .wt file in the current directory. When "valueX" is matched, the value is saved to the variable x. When "valueY" is matched, the line is printed.
This assumes that the line containing "valueX" always comes before the one containing "valueY". If that isn't a valid assumption, the script can easily be changed.
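For example, a minimal sketch of such a change (untested) collects both values and prints once both have been seen in a file, in either order:
awk 'FNR==1 { x = y = "" }                                # reset at the start of each file
     /valueX/ { x = $2 }
     /valueY/ { y = $2 }
     x != "" && y != "" { print "valueX", x, "valueY", y, FILENAME; x = y = "" }' *.wt > file_done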
To print only the integer part of "valueY", you can use printf instead of print:
awk '/valueX/{x=$2}/valueY/{printf "valueX %s valueY %d %s\n",x,$2,FILENAME}' *.wt > file_done
%d is the format specifier for an integer, so everything after the decimal point is dropped.
If your requirements are more complex and you need to use find, you should use -exec rather than looping through the results, to avoid problems with awkward file names:
find -maxdepth 1 -iname "5*.par" ! -iname "*_*" -exec \
awk '/valueX/{x=$2}/valueY/{printf "valueX %s valueY %d %s\n",x,$2,"{}"}' '{}' \; > file_done

Don't fight. I'm really thankful for your help, and especially for the fast answers.
This is my final solution, I think:
#!/bin/bash
for file in $(find . -maxdepth 1 -iname "5*.par" ! -iname "*_*"); do
awk '/TASC/{x=$2}/START/{printf "TASC %s MJD %d %s\n",x,$2,FILENAME}' "$file" >> mjd_vs_tasc
done
Many thanks again to you guys.

Try something like the below:
egrep "valueX|valueY" *.wt | awk -vRD="\n" -vORS=" " -F':| ' '{if (NR%2==0) {print $2, $3, $1} else {print $2, $3}}' > $file.new.txt

Related

Input folder / output folder for each file in AWK [duplicate]

This question already has answers here:
Redirecting stdout with find -exec and without creating new shell
(3 answers)
Closed last month.
I am trying to run (several) awk scripts over a list of files and would like to get each file as an output in a different folder. I have already tried several ways but cannot find the solution. The output in the output folder is always a single file called {} which includes the content of all the files from the input folder.
Here is my code:
input_folder="/path/to/input"
output_folder="/path/to/output"
find $input_folder -type f -exec awk '! /rrsig/ && ! /dskey/ {print $1,";",$5}' {} >> $output_folder/{} \;
Can you please give me a hint as to what I am doing wrong?
The code is called in a .sh script.
I'd probably opt for a (slightly) less complicated find | xargs, e.g.:
find "${input_folder}" -type f | xargs -r \
awk -v outd="${output_folder}" '
FNR==1 { close(outd "/" outf); outf=FILENAME; sub(/.*\//,"",outf) }
! /rrsig/ && ! /dskey/ { print $1,";",$5 > (outd "/" outf) }'
NOTE: the commas in $1,";",$5 will insert spaces between $1, ;, and $5; if the spaces are not desired then use $1 ";" $5 (i.e., remove the commas)
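If you'd rather keep the find -exec approach from the linked duplicate, a small sh wrapper can compute the per-file output path. This is a sketch, assuming the output files should simply reuse the input basenames:
find "${input_folder}" -type f -exec sh -c '
  outd=$1; shift                                        # first argument is the output folder
  for f in "$@"; do
    awk '\''! /rrsig/ && ! /dskey/ {print $1,";",$5}'\'' "$f" > "$outd/${f##*/}"
  done
' sh "${output_folder}" {} +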

Extracting Number From a File

I'm trying to write a script (with bash) that looks for a word (for example "SOME(X) WORD:") and prints the rest of the line, which is effectively some numbers with "-" in front. To clarify, an example line that I'm looking for in a file is:
SOME(X) WORD: -1.0475392439 ANOTHER.W= -0.0590214433
I want to extract the number after "SOME(X) WORD:", so "-1.0475392439" for this example. I have a similar script to this which extracts the number from the following line (both lines are from the same input file):
A-DESIRED RESULT W( WORD) = -9.68765465413
And the script for this is,
local output="$1"
local ext="log"
local word="W( WORD)"
cd $dir
find "${output}" -type f -name "*.${ext}" -exec awk -v ptn="${word}" 'index($0,ptn) {print $NF,FILENAME}' {} +
But when I change the local word variable from "W( WORD)" to "SOME(X) WORD", it captures "-0.0590214433" instead of "-1.0475392439", meaning it takes the last number on the line. How can I find a solution to this? Thanks in advance!
As you have seen, print $NF outputs the last field of the line. Please modify the find line as follows:
find "${output}" -type f -name "*.${ext}" -exec awk -v ptn="${word}" 'index($0, ptn) {if (match($0, /-[0-9]+\.[0-9]+/)) print substr($0, RSTART, RLENGTH), FILENAME}' {} +
Then it will output the first number in the line.
Please note it assumes the number always starts with the - sign.
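If the leading - is not guaranteed, or you want specifically the number that follows the pattern rather than the first negative number on the line, a sketch along these lines (plain POSIX awk) anchors the search after the matched word and allows an optional sign:
find "${output}" -type f -name "*.${ext}" -exec awk -v ptn="${word}" '
  {
    i = index($0, ptn)                              # position of the pattern, 0 if absent
    if (i == 0) next
    rest = substr($0, i + length(ptn))              # text after the pattern
    if (match(rest, /-?[0-9]+\.[0-9]+/))            # first (optionally signed) decimal number
      print substr(rest, RSTART, RLENGTH), FILENAME
  }' {} +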

Multiple grep separator and display file information

I want to grep for multiple pieces of information in files with multiple separators, and display the file information, with only one command.
./WBL-FILE-S-1-execution79065.html
./WBL-FILE-S-1-execution79066.html
./WBL-FILE-S-1-execution79067.html
If I do:
find . -type f -name "*WBL-FILE*" | xargs grep "Fichier lu"
I get results like:
./WBL-FILE-S-1-execution79065.html:<td title="Message">Fichier lu /opt/data/in/bl/000334_iwel1C010116730.blc.TRT</td>
./WBL-FILE-S-1-execution79065.html:<td title="Message">Fichier lu /opt/data/in/bl/000312_iwel1C010116727.blc.TRT</td>
./WBL-FILE-S-1-execution74707.html:<td title="Message">Fichier lu /opt/data/in/bl/000420_iwel1C010116284.blc.TRT</td>
The goal is to get the date of the file, the filename, the XXXXXX_iwel number, and the CXXXXXXXXX number.
Example :
2021-07-13 13:47 WBL-FILE-S-1-execution79065.html 000334 010116730
2021-07-13 14:48 WBL-FILE-S-1-execution79065.html 000312 010116727
2021-07-14 14:49 WBL-FILE-S-1-execution74707.html 000420 010116284
I almost succeeded in extracting the different parts, but after that, I can't get the "ls" (date) information for the original file.
Is there a way to do that with only a one-line combination of commands?
Thank you
If you want to add the file's date, grep alone won't cut it anymore. Also, extracting XXXXXX_iwel and CXXXXXXXXX and printing these numbers on the same line is not possible with grep alone.
Therefore I would switch to perl:
perl -nle 'use POSIX "strftime";
BEGIN { sub mtime { strftime "%Y-%m-%d %H:%M:%S", localtime((stat $ARGV)[9]) } }
/Fichier lu.*?(\d+)_iwel.*?C(\d+)/ && print join " ", mtime, $ARGV, $1, $2'
Since all your files are in the same directory, you can use
perl ... *WBL-FILE*
For a recursive file search, use find -exec instead of find | xargs. This is not only more efficient, but also safer in case some filenames contain whitespace or special symbols like "'\.
find -type f -name '*WBL-FILE*' -exec perl ... {} +
For each file, you can display the information you need with one awk command.
awk 'match($0, /Fichier lu.*[^0-9]([0-9]*)_iwel[^C]*C([0-9]*)/, array) { date_command="date +\"%Y-%m-%d %H:%M:%S\" --date #$(stat -c %Y " FILENAME ")"; date_command | getline formatted_date; close(date_command); print formatted_date, FILENAME, array[1], array[2]}' /path/to/file
It can be rewritten like this for clarity:
awk 'match($0, /Fichier lu.*[^0-9]([0-9]*)_iwel[^C]*C([0-9]*)/, array) {
date_command="date +\"%Y-%m-%d %H:%M:%S\" --date #$(stat -c %Y " FILENAME ")";
date_command | getline formatted_date;
close(date_command);
print formatted_date, FILENAME, array[1], array[2]
}'
Basically it does 3 things:
It matches all lines containing Fichier lu and captures the numbers from XXXXXX_iwel and CXXXXXXXXX into an array
It calls a command line to get the modification date of the file with the desired format
It prints all the information you want on the same line
You can plug it after find of course.
find . -name "*WBL-FILE*" | xargs awk 'match($0, /Fichier lu.*[^0-9]([0-9]*)_iwel[^C]*C([0-9]*)/, array) { date_command="date +\"%Y-%m-%d %H:%M:%S\" --date #$(stat -c %Y " FILENAME ")"; date_command | getline formatted_date; close(date_command); print formatted_date, FILENAME, array[1], array[2]}'
Result:
2021-07-28 10:45:50 ./WBL-FILE-S-1-execution79065.html 000334 010116730
2021-07-28 10:45:50 ./WBL-FILE-S-1-execution79065.html 000312 010116727
2021-07-28 10:46:41 ./WBL-FILE-S-1-execution74707.html 000420 010116284
Side notes
I used the three-argument form of the match function, which is a GNU Awk (gawk) extension. If you don't have it, it's still possible, but it requires another way to capture the string (see the sketch below).
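For reference, a portable sketch of the capture step using only POSIX match()/substr(), shown without the date step for brevity:
awk '/Fichier lu/ && match($0, /[0-9]+_iwel/) {
       iwel = substr($0, RSTART, RLENGTH - 5)                         # digits before "_iwel"
       if (match($0, /C[0-9]+/))
         print FILENAME, iwel, substr($0, RSTART + 1, RLENGTH - 1)    # digits after the "C"
     }' /path/to/file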
The trickiest part is probably the command for getting the date, because we need to build a string for the command, then call it, then store the result in a variable. It's a bit messy. It also requires a two-step process: get the date in Epoch time (i.e. the number of seconds since 1970-01-01) and then format this value in the YYYY-MM-DD HH:MM:SS format. On the other hand, you can adapt these steps very easily. For instance you can display the date in another format by changing the +\"%Y-%m-%d %H:%M:%S\" string sent to date. Or you can display the creation date instead of the last modification date by changing the -c %Y option sent to stat.
The command is not robust to filenames and folders containing whitespace. To fix this, first you may use an ugly syntax, replacing $(stat -c %Y " FILENAME ")" with $(stat -c %Y '"'"'" FILENAME "'"'"')" during the date call. Yikes. This is due to how we build the string in one line. Secondly, you may use either of the following commands to make sure filenames are passed correctly (to simplify, let's say the awk script is stored in the AWKSTRING variable).
find . -name "*WBL-FILE*" -print0 | xargs -0 awk "$AWKSTRING"
find . -name "*WBL-FILE*" -exec awk "$AWKSTRING" {} \;
find . -name "*WBL-FILE*" -exec awk "$AWKSTRING" {} +
The latter is probably a bit more efficient than the others, but not all versions of find support it.
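Alternatively, a sketch that sidesteps both the messy string-building and the whitespace problem is to compute the date in the shell (GNU date -r prints a file's modification time) and pass it into awk per file; the capture below still uses gawk's three-argument match:
find . -name "*WBL-FILE*" -type f -exec sh -c '
  for f in "$@"; do
    mtime=$(date -r "$f" +"%Y-%m-%d %H:%M:%S")          # modification time of the file (GNU date)
    awk -v d="$mtime" '\''match($0, /Fichier lu.*[^0-9]([0-9]*)_iwel[^C]*C([0-9]*)/, array) {
      print d, FILENAME, array[1], array[2]
    }'\'' "$f"
  done
' sh {} +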

Bash: Sort within a directory, concatenate contents of file with smallest entry in new file

I have a set of directories RUN1, RUN2, etc.
In each directory, there is a set of files. In each file, there are two numbers. For example (these are saved as .csv, even though there are clearly no commas here):
RUN1
mod_1.csv
2.32e-00
1.2e-01
mod_b.csv
4.53e-00
1.1e-01
RUN2
mod_a.csv
1.23e-01
0.5e-02
mod3.csv
1.67e-00
0.4e-01
etc.
I would like to do this:
For each directory:
For each file in a directory:
Sort files by the first entry
Save contents and path of file with smallest value of first entry.
For example, above, this would result in a new file containing:
2.32e-00 1.2e-01 ./RUN1/mod_1.csv
1.23e-01 0.5e-02 ./RUN2/mod_a.csv
I started out by trying this:
#!/bin/bash
resultfile="best_results.txt"
for d in $(find . -type d -name 'RUN*' | sort);
do
find "$d" -type f -name 'mod*' -exec awk '{print $0, FILENAME}' {} \; >> "$resultfile"
done
But it gives me both values from all files, like this:
2.32e-00 ./RUN1/mod_1.csv
1.2e-01 ./RUN1/mod_1.csv
4.53e-00 ./RUN1/mod_b.csv
1.1e-01 ./RUN1/mod_b.csv
1.23e-01 ./RUN2/mod_a.csv
0.5e-02 ./RUN2/mod_a.csv
1.67e-00 ./RUN2/mod_3.csv
0.4e-01 ./RUN2/mod_3.csv
Then I figured I needed to use head, but this modification:
find "$d" -type f -name 'mod*' -exec awk '{print $0, FILENAME}' {} \; | head -1 >> "$resultfile"
gave me:
find: `awk' terminated by signal 13
I think I need another sort, and probably head, but I can't quite put this together.
EDIT (for clarity):
I want to look through all of the files in a directory, find the file with the smallest first number, and write the values of that file and the file path to a new file. Then, proceed to the next directory and do the same. In my example:
Directory RUN1 contains files mod_1.csv and mod_b.csv. File mod_1.csv has the smallest first value. I want to write its contents and file path on one line:
2.32e-00 1.2e-01 ./RUN1/mod_1.csv
to a file.
Directory RUN2 contains files, mod_a.csv and mod3.csv. File mod_a.csv has the smallest first value. I want to write its contents and file path on one line:
1.23e-01 0.5e-02 ./RUN2/mod_a.csv
So that the new file looks like this:
2.32e-00 1.2e-01 ./RUN1/mod_1.csv
1.23e-01 0.5e-02 ./RUN2/mod_a.csv
I understand that this was not clear in my pre-edit question. Please ask any questions you have! I'm not sure how to make this more clear.
From the looks of it, you'll probably want to remove the newlines during the find:
resultfile="best_results.txt"
for d in $(find . -type d -name 'RUN*');
do
find "$d" -type f -name 'mod*' -exec awk '{printf "%s ",$0} END {print "", FILENAME}' {} \;
done | sort >> "$resultfile"
The sorting would normally be done at the very end (once all the results are returned from stdout); however, it's unclear how exactly you expect it to be sorted. You could probably get rid of the for loop if you really wanted, since using something such as the following should work similarly:
find RUN* -type f -name 'mod*' -exec awk '{printf "%s ",$0} END {print "", FILENAME}' {} \; | sort -k 2,2 >> "$resultfile"
Use the -k option with sort to specify which column(s) to sort by.
Result (using sort -k 2,2):
1.67e-00 0.4e-01 RUN2/mod3.csv
1.23e-01 0.5e-02 RUN2/mod_a.csv
4.53e-00 1.1e-01 RUN1/mod_b.csv
2.32e-00 1.2e-01 RUN1/mod_1.csv
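If the goal (per the question's edit) is specifically the one file with the smallest first value in each RUN directory, a sketch that tracks the per-directory minimum in a single awk pass, assuming each file holds exactly two numbers:
find RUN* -type f -name 'mod*' -exec awk '
  FNR == 1 { v1 = $1 }                                 # first number in the file
  FNR == 2 { split(FILENAME, p, "/")                   # p[1] is the RUN directory
             if (!(p[1] in min) || v1 + 0 < min[p[1]]) {
               min[p[1]]  = v1 + 0                     # numeric value for comparison
               best[p[1]] = v1 " " $1 " " FILENAME     # original numbers plus the path
             }
           }
  END      { for (d in best) print best[d] }
' {} + > best_results.txt
With a very large number of files, find may invoke awk more than once and a directory could then appear more than once; piping the output through sort also puts the RUN directories in order.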

Recursively check length of directory name

I need to determine if there are any directory names > 31 characters in a given directory (i.e. look underneath that root).
I know I can use something like find /path/to/root/dir -type d >> dirnames.txt
This will give me a text file of complete paths.
What I need is to get the actual number of characters in each directory name. Not sure if parsing the above results with sed or awk makes sense. Looking for ideas/thoughts/suggestions/tips on how to accomplish this. Thanks!
This short script does it all in one go, i.e. finds all directory names and then outputs any which are greater than 31 characters in length (along with their length in characters):
for d in `find /path/to/root/dir -type d -exec basename {} \;` ; do
len=$(echo $d | wc -c)
if [ $len -gt 31 ] ; then
echo "$d = $len characters"
fi
done
Using your dirnames.txt file created by your find cmd, you can then sort the data by length of pathname, i.e.
awk '{print length($0) "\t" $0}' dirnames.txt | sort +0nr -1 > dirNamesWithSize.txt
This will present the longest path names (based on the value of length) at the top of the file.
I hope this helps.
Try this
find . -type d -exec bash -c '[ $(wc -c <<<"${1##*/}") -gt 32 ] && echo "${1}"' -- {} \; 2>/dev/null
The one bug, which I consider minor, is that it will over-count directory name length by 1 every time.
If what you wanted was the whole path rather than the last path component, then use this:
find . -type d | sed -e '/.\{32,\}/!d'
This version also has a bug, but only when file names have embedded newlines.
The output of both commands is a list of file names which match the criteria. Counting the length of each one is trivial from there.
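For completeness, a sketch that avoids the off-by-one entirely by using bash's ${#var} length expansion on the last path component:
find . -type d -exec bash -c '
  for d in "$@"; do
    name=${d##*/}                                  # last path component only
    if (( ${#name} > 31 )); then
      printf "%s = %s characters\n" "$d" "${#name}"
    fi
  done
' bash {} +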
