Extracting Number From a File - bash

I'm trying to write a script (with bash) that looks for a word (for example "SOME(X) WORD:") and prints the rest of the line, which is effectively some numbers with "-" in front. To clarify, an example of a line that I'm looking for in a file is:
SOME(X) WORD: -1.0475392439 ANOTHER.W= -0.0590214433
I want to extract the number after "SOME(X) WORD:", so "-1.0475392439" for this example. I have a similar script which extracts the number from the following line (both lines are from the same input file):
A-DESIRED RESULT W( WORD) = -9.68765465413
And the script for this is:
local output="$1"
local ext="log"
local word="W( WORD)"
cd $dir
find "${output}" -type f -name "*.${ext}" -exec awk -v ptn="${word}" 'index($0,ptn) {print $NF,FILENAME}' {} +
But when I change the local word variable from "W( WORD)" to "SOME(X) WORD", it captures "-0.0590214433" instead of "-1.0475392439", meaning it takes the last number in the line. How can I solve this? Thanks in advance!

As you have seen, print $NF outputs the last field of the line. Modify the find line as follows:
find "${output}" -type f -name "*.${ext}" -exec awk -v ptn="${word}" 'index($0, ptn) {if (match($0, /-[0-9]+\.[0-9]+/)) print substr($0, RSTART, RLENGTH), FILENAME}' {} +
Then it will output the first number in the line.
Please note it assumes the number always starts with the - sign.
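If the leading minus is not guaranteed, making it optional with -? in the regular expression keeps the rest of the approach unchanged (a sketch of the same command, with the question's ${output}, ${ext} and ${word} variables):
find "${output}" -type f -name "*.${ext}" -exec awk -v ptn="${word}" 'index($0, ptn) {if (match($0, /-?[0-9]+\.[0-9]+/)) print substr($0, RSTART, RLENGTH), FILENAME}' {} +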

Input folder / output folder for each file in AWK

I am trying to run (several) awk scripts over a list of files and would like to get one output file per input file in a different folder. I have already tried several ways but cannot find the solution. The output in the output folder is always a single file called {} which contains the content of all the files from the input folder.
Here is my code:
input_folder="/path/to/input"
output_folder="/path/to/output"
find $input_folder -type f -exec awk '! /rrsig/ && ! /dskey/ {print $1,";",$5}' {} >> $output_folder/{} \;
Can you please give me a hint as to what I am doing wrong?
The code is called in a .sh script.
I'd probably opt for a (slightly) less complicated find | xargs, e.g.:
find "${input_folder}" -type f | xargs -r \
awk -v outd="${output_folder}" '
FNR==1 { close(outd "/" outf); outf=FILENAME; sub(/.*\//,"",outf) }
! /rrsig/ && ! /dskey/ { print $1,";",$5 > (outd "/" outf) }'
NOTE: the commas in $1,";",$5 will insert spaces between $1, ; and $5; if the spaces are not desired then use $1 ";" $5 (i.e., remove the commas)
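As for why the original command misbehaves: the >> redirection is parsed by the calling shell before find ever runs, so find never sees it, and everything is appended to the single literal file {} in the output folder. If you would rather keep find -exec, a small per-file shell wrapper moves the redirection inside a child shell (a sketch, reusing the question's variables):
find "$input_folder" -type f -exec sh -c '
    outd=$1; shift                       # first argument is the output folder
    for f; do                            # remaining arguments are the input files
        awk '\''! /rrsig/ && ! /dskey/ { print $1,";",$5 }'\'' "$f" > "$outd/${f##*/}"
    done
' _ "$output_folder" {} +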

Multiple grep separator and display file information

I want to grep for several pieces of information in files, using multiple separators, and display file information, all with only one command.
./WBL-FILE-S-1-execution79065.html
./WBL-FILE-S-1-execution79066.html
./WBL-FILE-S-1-execution79067.html
If I do:
find . -type f -name "*WBL-FILE*" | xargs grep "Fichier lu"
I get results like:
./WBL-FILE-S-1-execution79065.html:<td title="Message">Fichier lu /opt/data/in/bl/000334_iwel1C010116730.blc.TRT</td>
./WBL-FILE-S-1-execution79065.html:<td title="Message">Fichier lu /opt/data/in/bl/000312_iwel1C010116727.blc.TRT</td>
./WBL-FILE-S-1-execution74707.html:<td title="Message">Fichier lu /opt/data/in/bl/000420_iwel1C010116284.blc.TRT</td>
The goal is to get the file's date, the filename, the XXXXXX_iwel number, and the CXXXXXXXXX number.
Example :
2021-07-13 13:47 WBL-FILE-S-1-execution79065.html 000334 010116730
2021-07-13 14:48 WBL-FILE-S-1-execution79065.html 000312 010116727
2021-07-14 14:49 WBL-FILE-S-1-execution74707.html 000420 010116284
I almost succeeded in extracting the different parts, but after that, I can't get the "ls" (date) information for the original file.
Is there a way to do that with a one-line combination of commands?
Thank you
If you want to add the file's date, grep alone won't cut it anymore. Nor can grep alone extract XXXXXX_iwel and CXXXXXXXXX and print both numbers on the same line.
Therefore I would switch to perl:
perl -nle 'use POSIX "strftime";
BEGIN { sub mtime { strftime "%Y-%m-%d %H:%M:%S", localtime((stat $ARGV)[9]) } }
/Fichier lu.*?(\d+)_iwel.*?C(\d+)/ && print join " ", mtime, $ARGV, $1, $2'
Since all your files are in the same directory, you can use
perl ... *WBL-FILE*
For a recursive file search, use find -exec instead of find | xargs. This is not only more efficient, but also safer in case some filenames contain whitespace or special symbols like "'\.
find -type f -name '*WBL-FILE*' -exec perl ... {} +
For each file, you can display the information you need with one awk command.
awk 'match($0, /Fichier lu.*[^0-9]([0-9]*)_iwel[^C]*C([0-9]*)/, array) { date_command="date +\"%Y-%m-%d %H:%M:%S\" --date @$(stat -c %Y " FILENAME ")"; date_command | getline formatted_date; close(date_command); print formatted_date, FILENAME, array[1], array[2]}' /path/to/file
It can be rewritten like this for clarity:
awk 'match($0, /Fichier lu.*[^0-9]([0-9]*)_iwel[^C]*C([0-9]*)/, array) {
    date_command = "date +\"%Y-%m-%d %H:%M:%S\" --date @$(stat -c %Y " FILENAME ")"
    date_command | getline formatted_date
    close(date_command)
    print formatted_date, FILENAME, array[1], array[2]
}'
Basically it does 3 things:
It matches all lines including Fichier lu and captures the numbers of XXXXXX_iwel and CXXXXXXXXX into an array
It calls a command line to get the modification date of the file with the desired format
It prints all the information you want on the same line
You can plug it after find of course.
find . -name "*WBL-FILE*" | xargs awk 'match($0, /Fichier lu.*[^0-9]([0-9]*)_iwel[^C]*C([0-9]*)/, array) { date_command="date +\"%Y-%m-%d %H:%M:%S\" --date @$(stat -c %Y " FILENAME ")"; date_command | getline formatted_date; close(date_command); print formatted_date, FILENAME, array[1], array[2]}'
Result:
2021-07-28 10:45:50 ./WBL-FILE-S-1-execution79065.html 000334 010116730
2021-07-28 10:45:50 ./WBL-FILE-S-1-execution79065.html 000312 010116727
2021-07-28 10:46:41 ./WBL-FILE-S-1-execution74707.html 000420 010116284
Side notes
I used the three-argument form of the match function, which is specific to GNU Awk, also known as gawk. If you don't have it, it's still possible, but it requires another way to capture the strings.
The trickiest part is probably the command for getting the date, because we need to build a command string, call it, and then store the result in a variable. It's a bit messy. It also requires a two-step process: get the date in Epoch time (i.e. the number of seconds since 1970-01-01) and then format this value as YYYY-MM-DD HH:MM:SS. On the other hand, you can adapt these steps very easily. For instance, you can display the date in another format by changing the +\"%Y-%m-%d %H:%M:%S\" string sent to date. Or you can display the creation date instead of the last modification date by changing the -c %Y option sent to stat.
The command is not robust to filenames and folders containing whitespace. To fix this, first you may use an ugly syntax to replace $(stat -c %Y " FILENAME ")" with $(stat -c %Y '"'"'" FILENAME "'"'"')" during the date call. Yikes. This is due to how we build the string in one line. Secondly, you may use either of these commands to make sure filenames are passed correctly (to simplify, let's say the awk script is stored in the AWKSTRING variable).
find . -name "*WBL-FILE*" -print0 | xargs -0 awk "$AWKSTRING"
find . -name "*WBL-FILE*" -exec awk "$AWKSTRING" {} \;
find . -name "*WBL-FILE*" -exec awk "$AWKSTRING" {} +
The latter is probably a bit more efficient than the others, but not all versions of find support it.
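Finally, since gawk is already required for the three-argument match, the shell-out can be avoided entirely: the bundled filefuncs extension provides stat(), and strftime() is built in (a sketch, assuming a gawk built with dynamic extension support):
gawk '@load "filefuncs"
match($0, /Fichier lu.*[^0-9]([0-9]*)_iwel[^C]*C([0-9]*)/, array) {
    stat(FILENAME, fdata)                # fills fdata["mtime"] with epoch seconds
    print strftime("%Y-%m-%d %H:%M:%S", fdata["mtime"]), FILENAME, array[1], array[2]
}' ./*WBL-FILE*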

Sorting issue in Bash Script

I have a whole file full of filenames that is outputted from the find command below:
find "$ARCHIVE" -type f -name *_[0-9][0-9] | sed 's/_[0-9][0-9]$//' > temp
I am now trying to sort these file names and count them to find out which one appears the most. The problem I am having with this is whenever I execute:
sort -g temp
It prints all the sorted file names to the command line and I am unsure why. Any help with this issue would be greatly appreciated!
sort writes its result to standard output by default, which is why the sorted names appear on your terminal; to count them and find the most frequent one, pipe it further. You may need this:
sort temp | uniq -c | sort -nr
First we sort temp, then uniq -c prefixes each distinct line with its number of occurrences, and finally sort -nr compares those counts numerically (-n) and reverses the result (-r), so the most frequent line comes first.
Example file:
/home/user/testfiles/405/prob405823
/home/user/testfiles/405/prob405823
/home/user/testfiles/527/prob527149
/home/user/testfiles/518/prob518433
Output:
2 /home/user/testfiles/405/prob405823
1 /home/user/testfiles/527/prob527149
etc.
Resources:
Linux / Unix Command: sort
uniq(1) - Linux man page
You could do everything after the find in one awk command (this one uses GNU awk 4.*):
find "$ARCHIVE" -type f -name *_[0-9][0-9] |
awk '
{ cnt[gensub(/_[0-9][0-9]$/,"","")]++ }
END {
PROCINFO["sorted_in"] = "#val_num_desc"
for (file in cnt) {
print cnt, file
}
}
'
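If gawk is not available, the same count can be had portably by combining the question's sed step with the pipeline from the first answer:
find "$ARCHIVE" -type f -name '*_[0-9][0-9]' |
sed 's/_[0-9][0-9]$//' |
sort | uniq -c | sort -nr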

bash script reading lines in every file copying specific values to newfile

I want to write a script to help me with my work.
Problem: I have many files in one directory containing data, and I need specific values from every file copied into a new file.
The data files can look like this:
Name abc $desV0
Start MJD56669 opCMS v2
End MJD56670 opCMS v2
...
valueX 0.0456 RV_gB
...
valueY 12063.23434 RV_gA
...
What the script should do is copy valueX and the value following it, and also valueY and the value following it, into a new file, all on one line, and then add the name of the source data file to that line. Additionally, the value of valueY should only contain everything before the dot.
The result should look like this:
valueX 0.0456 valueY 12063 name_of_sourcefile
This is what I have so far:
for file in $(find -maxdepth 0 -type f -name *.wt); do
    for line in $(cat $file | grep -F vb); do
        cp $line >> file_done
    done
done
But that doesn't work at all. I also have no idea how to get the data in ONE line in the newfile.
Can anyone help me?
I think you can simplify your script a lot using awk:
awk '/valueX/{x=$2}/valueY/{print "valueX",x,"valueY",$2,FILENAME}' *.wt > file_done
This goes through every file in the current directory. When "valueX" is matched, the value is saved to the variable x. When "valueY" is matched, the line is printed.
This assumes that the line containing "valueX" always comes before the one containing "valueY". If that isn't a valid assumption, the script can easily be changed (see the sketch below).
To print only the integer part of "valueY", you can use printf instead of print:
awk '/valueX/{x=$2}/valueY/{printf "valueX %s valueY %d %s\n",x,$2,FILENAME}' *.wt > file_done
%d is the format specifier for an integer.
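If the valueX-before-valueY assumption does not hold, a sketch that collects both values and prints once per file via gawk's ENDFILE rule works regardless of their order (ENDFILE is gawk-specific):
gawk '
    /valueX/ { x = $2 }
    /valueY/ { y = $2 }
    ENDFILE  { printf "valueX %s valueY %d %s\n", x, y, FILENAME; x = y = "" }
' *.wt > file_done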
If your requirements are more complex and you need to use find, you should use -exec rather than looping through the results, to avoid problems with awkward file names:
find -maxdepth 1 -iname "5*.par" ! -iname "*_*" -exec \
awk '/valueX/{x=$2}/valueY/{printf "valueX %s valueY %d %s\n",x,$2,FILENAME}' {} +
Don't fight. I'm really thankful for your help and especially the fast answers.
I think this is my final solution:
#!/bin/bash
for file in $(find . -maxdepth 1 -iname "5*.par" ! -iname "*_*"); do
    awk '/TASC/{x=$2}/START/{printf "TASC %s MJD %d %s\n",x,$2,FILENAME}' "$file" >> mjd_vs_tasc
done
Many thanks again to you guys.
Try something like this:
egrep "valueX|valueY" *.wt | awk -v ORS=" " -F':| ' '{if (NR%2==0) {print $2, $3, $1} else {print $2, $3}}' > file_done

listing of files in a directory

I need to list all the files in a directory like:
/home/rk/a.root /home/rk/b.root /home/rk/c.root
for that I am using
$ls | gawk 'BEGIN{ORS=" "}{print "/home/rk/"$1}'
But there are 2000 files in that directory, and I need to list the first 100 on one line, then the next 100 on the next line, and so on.
Also, before each line I need to add "hadd result.root".
try this:
find /home/rk -type f | xargs -n 100
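To also prepend the requested "hadd result.root" text, echo can carry it as initial arguments, with xargs appending up to 100 filenames per invocation (a sketch):
find /home/rk -type f | xargs -n 100 echo hadd result.root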
Use printf instead of print to prevent automatically adding newlines. Then declare a counter variable in the BEGIN{ } section, increment it for every file, and when (counter % 100) == 0, print a newline and/or the per-line prefix.
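A sketch of that counter approach, combined with the hadd prefix and the /home/rk/ path from the question:
ls | awk '
    {
        if (count % 100 == 0) {          # start a new line every 100 files
            if (count > 0) printf "\n"
            printf "hadd result.root"    # per-line prefix from the question
        }
        printf " /home/rk/%s", $1
        count++
    }
    END { if (count > 0) printf "\n" }'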
