Grepping a specific string from a file in script - bash

I have the following file (sample filename: 2015_09_22_processedPartnumList.txt, location: /a/b/c/itemreport):
DataLoader_trace_2015_09_22_02_01_32.0956.log:INFO: 2015-09-22
Data Processing Starts : 12345678
I just want to get all the ids from the above file, i.e. 12345678 ... (each id on a separate line, not comma separated), into a file /a/b/c/d/ids_`date +%d_%m_%Y_%H_%M_%S`.log
I have written the following script, but the file I am getting is empty, without any exception or error being shown, so it is very difficult for me to identify the problem. Please tell me what is wrong in the script.
LOGDIR=/a/b/logdir
tr=`date +%p`
echo $tr
if [ $tr = "PM" ]; then
    date=`date +%Y-%m-%d`
    echo "considering today's date for grepping logs"
else
    date=`date -d '1 day ago' +%Y-%m-%d`
    echo "considering yesterday's date for grepping logs as job run is delayed"
fi
ITEM_FILE=/a/b/c/d/ids_`date +%d_%m_%Y_%H_%M_%S`.log
After implementing the grep with PCRE, I am getting this, and no ids are being copied into the new file.

If your grep supports PCRE, you can do:
grep -Po '.*:\s\K\d+$' /a/b/c/itemreport/2015_09_22_processedPartnumList.txt \
>/apps/feeds/out/catalog/ItemPartnumbers_"$(date '+%d_%m_%Y_%H_%M_%S')".log
.*:\s will match up to the space after the :, \K will discard everything matched so far, and \d+$ will match our desired portion, i.e. the digits up to the end of the line.
Example:
% grep -Po '.*:\s\K\d+$' 2015_09_22_processedPartnumList.txt \
>ItemPartnumbers_"$(date '+%d_%m_%Y_%H_%M_%S')".log
% cat ItemPartnumbers_09_11_2015_11_30_49.log
13982787
14011550
13984790
13984791
14176509
14902623
14924193
14924194
13982787
46795670
46795671
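If your grep does not support -P, a hedged POSIX sed equivalent (assuming the id is always the all-digit field after the last ': ' on its line) would be:
sed -n 's/.*: \([0-9]\{1,\}\)$/\1/p' /a/b/c/itemreport/2015_09_22_processedPartnumList.txt \
>ItemPartnumbers_"$(date '+%d_%m_%Y_%H_%M_%S')".log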

That's not a very good solution, but it works:
cat your\ file | cut -d ':' -f2-2 | tr -d INFO
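Note that tr -d INFO deletes every individual I, N, F and O character rather than the literal string INFO. A hedged, slightly cleaner variant of the same idea (assuming the ids only appear on the "Data Processing Starts" lines) would be:
grep 'Data Processing Starts' your\ file | awk -F': ' '{print $NF}'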

Related

How to send shell script output in a tabular form and send the mail

I have a shell script which gives a few lines as output. Below is the output I am getting from the shell script. The script flow is: first it checks whether the file exists; if it does, it should give me the file name and modified date, and if it does not, it should give me the file name and "Not Found", all in a tabular form, and send the result by email. It should also add a header to the output.
CMC_daily_File.xlsx Not Found
CareOneHMA.xlsx Jun 11
Output
File Name Modified Date
CMC_daily_File.xlsx Not Found
CareOneHMA.xlsx Jun 11
UPDATE
sample of script
#!/bin/bash
if [ -e /saddwsgnas/radsfftor/coffe/COE_daily_File.xlsx ]; then
    cd /sasgnas/radstor/coe/
    ls -la COE_daily_File.xlsx | awk '{print $9, $6"_"$7}'
else
    echo "CMC_COE_daily_File.xlsx Not_Found"
fi
Output
CMC_COE_daily_File.xlsx Jun_11
I thought I might offer you some options with a slightly modified script. I use the stat command to obtain the file modification time in a more expansive format, and I specify an arbitrary, pre-defined spacer character to divide the column data. That way, you can focus on displaying the content in its original, untampered form. This also allows the formatted reporting of filenames which contain spaces without affecting the logic for formatting/aligning columns. The column command is told about that spacer character and adjusts the width of each column to its widest content. (I only wish that it also allowed you to specify a column divider character to be printed, but that is not part of its features/functions.)
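As a minimal standalone sketch of that spacer idea (the second timestamp here is made up purely for illustration), column -t -s '|' aligns whatever lands on either side of the divider:
printf '%s\n' 'File Name|Modified Date' \
    'CMC_daily_File.xlsx|Not Found' \
    'CareOneHMA.xlsx|2022-06-11 09:15:00' | column -t -s '|'
This prints the two columns padded out to the widest entry in each.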
I also added the extra AWK action, on the chance that you might be interested in making the results stand out more.
#!/bin/sh
#QUESTION: https://stackoverflow.com/questions/74571967/how-to-send-shell-script-output-in-a-tablular-form-and-send-the-mail

SPACER="|"

SOURCE_DIR="/saddwsgnas/radsfftor/coe"
SOURCE_DIR="."

{
    printf "File Name${SPACER}Modified Date\n"
    #for file in COE_daily_File.xlsx
    for file in test_55.sh awkReportXmlTagMissingPropertyFieldAssignment.sh test_54.sh
    do
        if [ -e "${SOURCE_DIR}/${file}" ]; then
            cd "${SOURCE_DIR}"
            #ls -la "${file}" | awk '{print $9, $6"_"$7}'
            echo "${file}${SPACER}"$(stat --format "%y" "${file}" | cut -f1 -d\. | awk '{ print $1, $2 }' )
        else
            echo "${file}${SPACER}Not Found"
        fi
    done
} | column -x -t -s "|" |
awk '{
    ### Refer to:
    #   https://man7.org/linux/man-pages/man4/console_codes.4.html
    #   https://www.ecma-international.org/publications-and-standards/standards/ecma-48/
    if( NR == 1 ){
        printf("\033[93;3m%s\033[0m\n", $0) ;
    }else{
        print $0 ;
    } ;
}'
Without that last awk command, the output session for that script was as follows:
ericthered#OasisMega1:/0__WORK$ ./test_55.sh
File Name                                         Modified Date
test_55.sh                                        2022-11-27 14:07:15
awkReportXmlTagMissingPropertyFieldAssignment.sh  2022-11-05 21:28:00
test_54.sh                                        2022-11-27 00:11:34
ericthered#OasisMega1:/0__WORK$
With that last awk command, you get the same table, but the header line is highlighted (the \033[93;3m sequence renders it in bright yellow italics).

sed extract data only between time range

I have a file with the following entries:
MySql-DataBase-2020-09-22_183748.zip
MySql-DataBase-2020-09-22_184023.zip
MySql-DataBase-2020-09-23_205331.zip
MySql-DataBase-2020-09-23_205606.zip
MySql-DataBase-2020-09-24_200123.zip
MySql-DataBase-2020-09-24_200358.zip
MySql-DataBase-2020-09-25_115839.zip
MySql-DataBase-2020-09-25_120115.zip
MySql-DataBase-2020-09-26_094608.zip
MySql-DataBase-2020-09-26_094843.zip
MySql-DataBase-2020-09-27_122523.zip
MySql-DataBase-2020-09-27_122758.zip
MySql-DataBase-2020-10-01_230024.zip
MySql-DataBase-2020-10-01_230300.zip
MySql-DataBase-2020-10-02_120944.zip
MySql-DataBase-2020-10-02_121219.zip
MySql-DataBase-2020-10-03_151414.zip
MySql-DataBase-2020-10-03_151649.zip
MySql-DataBase-2020-10-04_211059.zip
MySql-DataBase-2020-10-04_211334.zip
MySql-DataBase-2020-10-05_064049.zip
MySql-DataBase-2020-10-05_064324.zip
I want to extract the files which are between today's date and 3 days back.
For today's date 2020/10/05, 3 days back is 2020/10/02.
The output must be:
MySql-DataBase-2020-10-02_120944.zip
MySql-DataBase-2020-10-02_121219.zip
MySql-DataBase-2020-10-03_151414.zip
MySql-DataBase-2020-10-03_151649.zip
MySql-DataBase-2020-10-04_211059.zip
MySql-DataBase-2020-10-04_211334.zip
MySql-DataBase-2020-10-05_064049.zip
I tried using this command to get the date 3 days back:
date --date='-3 day' '+%Y/%m/%d'
And then used this command to get the output between the date range:
sed -n '/3day=date --date='-3 day' '+%Y/%m/%d'/,/date/p' s.txt
I am getting this error:
sed: -e expression #1, char 20: unterminated address regex
Please help me to fix this issue. I'll be using this in a bash script.
Using a process substitution to generate the dates and feed that to grep as the patterns to search for:
grep -F -f <(for d in {0..3}; do date -d "$d days ago" "+%F"; done) file
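For a run on 2020-10-05, the process substitution expands to these four fixed-string patterns, one per line (%F is shorthand for %Y-%m-%d), and -F -f tells grep to match any line containing one of them:
2020-10-05
2020-10-04
2020-10-03
2020-10-02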
sed -n "/$(date --date='-3 day' '+%Y-%m-%d')/,/$(date +'%y-%m-%d')/p"
MySql-DataBase-2020-10-02_120944.zip
MySql-DataBase-2020-10-02_121219.zip
MySql-DataBase-2020-10-03_151414.zip
MySql-DataBase-2020-10-03_151649.zip
MySql-DataBase-2020-10-04_211059.zip
MySql-DataBase-2020-10-04_211334.zip
MySql-DataBase-2020-10-05_064049.zip
Notice the use of double quotes at the outermost level, so that the command substitutions are expanded, and that the same date format is used for both boundaries.
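One hedged caveat on the sed range approach: if no line actually contains the start date (for example, if no backup was taken three days ago), the range never opens and nothing is printed. An alternative sketch that compares the date embedded in each filename as a string, assuming the YYYY-MM-DD part always starts at character 16 of these names and that you have GNU date:
awk -v from="$(date -d '-3 day' '+%Y-%m-%d')" -v to="$(date '+%Y-%m-%d')" \
    '{ d = substr($0, 16, 10) } d >= from && d <= to' s.txt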

Shell script to fetch log (json format) file between date and timestamp

The log file folder structure is \Mainfolder\folder1\year(2020)\month(07)\date(24)*.json.
Ex: \Mainfolder\folder1\2020\07\24\filename.json.
The .json file is getting created every hour, like 00:00:00_00:59:59.json, 01:00:00_01:59:59.json and so on.
I have to search in the .json files with the following inputs.
My current inputs are a keyword and a start date. Currently I'm taking that date and keyword and I am able to get the output into a file.
Current script for your reference:
#!/bin/bash
set +x
DTE=$(date "+%d-%m-%Y-v%H%m%s")
Date=$1 #yyyy/mm/dd
Keyword=$2 #keyword in string
Start_Time=$3 #hh:mm
End_Time=$4 #hh:mm
BKT=bucketpath/mainfolder/
output=$(gsutil cat -h gs://bucketpath/mainfolder/"$Date"/* | egrep "$Keyword")
echo $output >> $"/tmp/folder/logoutput-$DTE"
gsutil cp -r /tmp/folder/logoutput-$DTE gs://bucketpath/mainfolder/
I have to add an end date, Start_Time and End_Time, search in the .json files, and get the output into a file like above.
I tried to use awk and sed, but I'm unable to get the output.
Could anyone help me with this script, please? Thanks in advance.
I prepared the following script to collect the logs between the date and timestamp ranges along with a keyword. My log file is in .json format.
The reason for posting it here is that it might help someone who is looking for a similar script.
#!/bin/bash
set +x
DTE=$(date "+%d-%m-%Y-v%H%m%s")
startdate=$1
enddate=$2
start_time=$3
end_time=$4
keyword=$5
BKT=storage/folder
i=$start_time
i1=$(sed 's/.\{3\}$//' <<< "$i")
j=$end_time
j1=$(sed 's/.\{3\}$//' <<< "$j")
curr="$startdate"
while true; do
    echo "$curr"
    [ "$curr" \< "$enddate" ] || break
    output=$(gsutil cat -h gs://storage/folder/"$curr"/"$i1:00:00_$j1:59:59*" | sed -n '/"timestamp":"[^"]*T'$i':/,/"timestamp":"[^"]*T'$j':/p' | grep "$keyword")
    echo "$output" >> "/tmp/folder/mylog-$DTE"
    curr=$( date +%Y/%m/%d --date "$curr +1 day" )
done
gsutil cp -r /tmp/folder/mylog-$DTE gs://storage/folder/
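A usage sketch, assuming the script is saved as fetchlogs.sh (a hypothetical name) and made executable; the dates are given as YYYY/MM/DD to match the bucket layout, the times as hh:mm, and ERROR is just an example keyword:
./fetchlogs.sh 2020/07/24 2020/07/26 01:00 03:00 "ERROR"
# matching lines land in /tmp/folder/mylog-<timestamp> and are then copied back to gs://storage/folder/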

Alternating output in bash for loop from two grep

I'm trying to search through files and extract two pieces of relevant information every time they appear in the file. The code I currently have:
#!/bin/bash
echo "Utilized reads from ustacks output" > reads.txt
str1="utilized reads:"
str2="Parsing"
for file in /home/desaixmg/novogene/stacks/sample01/conda_ustacks.o*; do
    reads=$(grep $str1 $file | cut -d ':' -f 3)
    samples=$(grep $str2 $file | cut -d '/' -f 8)
    echo $samples $reads >> reads.txt
done
It is putting all the matches for a file on one line (the files have varying numbers of instances of these phrases), giving me one row of output per file:
PopA_15.fq 1081264
PopA_16.fq PopA_17.fq 1008416 554791
PopA_18.fq PopA_20.fq PopA_21.fq 604610 531227 595129
...
I want it to match each instance (i.e. the 1st instance of both greps next to each other):
PopA_15.fq 1081264
PopA_16.fq 1008416
PopA_17.fq 554791
PopA_18.fq 604610
PopA_20.fq 531227
PopA_21.fq 595129
...
How do I do this? Thank you
Assuming that your Input_file is the same as the sample shown, with an even number of columns on each line where the first half are PopA values and the second half are the digit values, the following awk may help:
awk '{for(i=1;i<=(NF/2);i++){print $i,$((NF/2)+i)}}' Input_file
Output will be as follows.
PopA_15.fq 1081264
PopA_16.fq 1008416
PopA_17.fq 554791
PopA_18.fq 604610
PopA_20.fq 531227
PopA_21.fq 595129
In case you want to pass the output of a command to the awk command, you could do your_command | awk '...'; there is no need to add Input_file to the above awk command.
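For instance, a hedged sketch reusing the two greps from the question inside its loop, piping straight into the awk instead of echoing the flattened line to reads.txt first:
reads=$(grep $str1 $file | cut -d ':' -f 3)
samples=$(grep $str2 $file | cut -d '/' -f 8)
echo $samples $reads | awk '{for(i=1;i<=(NF/2);i++){print $i,$((NF/2)+i)}}' >> reads.txt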
This is what ended up working for me; any tips for more efficient code are definitely welcome:
#!/bin/bash
echo "Utilized reads from ustacks output" > reads.txt
str1="utilized reads:"
str2="Parsing"
for file in /home/desaixmg/novogene/stacks/sample01/conda_ustacks.o*; do
    reads=$(grep $str1 $file | cut -d ':' -f 3)
    samples=$(grep $str2 $file | cut -d '/' -f 8)
    paste <(echo "$samples" | column -t) <(echo "$reads" | column -t) >> reads.txt
done
This provides the desired output described above.
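As one such tip, and only as a hedged sketch: paste can read the two greps directly through process substitution, which pairs line 1 with line 1, line 2 with line 2, and so on, dropping the intermediate variables (the columns just won't be width-aligned the way column -t makes them):
for file in /home/desaixmg/novogene/stacks/sample01/conda_ustacks.o*; do
    paste -d ' ' <(grep "$str2" "$file" | cut -d '/' -f 8) \
                 <(grep "$str1" "$file" | cut -d ':' -f 3) >> reads.txt
done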

Finding the file name in a directory with a pattern

I need to find the latest file - filename_YYYYMMDD in the directory DIR.
The below is not working, as the field position shifts each time because of the spaces in between (occurring mostly at the file size field, as it differs every time).
Please suggest if there is another way.
report=`ls -ltr $DIR/filename_* 2>/dev/null | tail -1 | cut -d " " -f9`
You can use awk to print the last field, like below:
report=`ls -ltr $DIR/filename_* 2>/dev/null | tail -1 | awk '{print $NF}'`
cut may not be an option here.
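Since only the name is wanted, another hedged option is to skip the long listing entirely and let ls sort by modification time itself, so there are no whitespace columns to cut at all (this still assumes the filenames contain no newlines):
report=$(ls -tr $DIR/filename_* 2>/dev/null | tail -1)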
If I understand, you want to loop through each file in the directory and find the largest 'YYYYMMDD' value and the filename associated with that value. You can use simple POSIX parameter expansion with substring removal to isolate the 'YYYYMMDD' and compare it against a value initialized to zero, updating the latest variable to hold the largest 'YYYYMMDD' as you loop over all files in the directory, and storing the name of the file each time you find a larger 'YYYYMMDD'.
For example, you could do something like:
#!/bin/sh
name=
latest=0
for i in *; do
    test "${i##*_}" -gt "$latest" && { latest="${i##*_}"; name="$i"; }
done
printf "%s\n" "$name"
Example Directory
$ ls -1rt
filename_20120615
filename_20120612
filename_20120115
filename_20120112
filename_20110615
filename_20110612
filename_20110115
filename_20110112
filename_20100615
filename_20100612
filename_20100115
filename_20100112
Example Use/Output
$ name=; latest=0; \
> for i in *; do \
> test "${i##*_}" -gt "$latest" && { latest="${i##*_}"; name="$i"; }; \
> done; \
> printf "%s\n" "$name"
filename_20120615
Where the script selects filename_20120615 as the file with the greatest 'YYYYMMDD' of all files in the directory.
Since the script uses only tools provided by the shell itself, it doesn't need to spawn subshells for each pipe or utility it calls.
Give it a test and let me know if that is what you intended, if your intent was different, or if you have any further questions.
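One small hedged refinement: if the directory can also contain names that do not end in _YYYYMMDD, restricting the glob keeps the numeric test from choking on them (using the $DIR variable from the question):
name=
latest=0
for i in "$DIR"/filename_*; do
    test "${i##*_}" -gt "$latest" && { latest="${i##*_}"; name="$i"; }
done
printf "%s\n" "$name"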
