Unix script to run count query for specific date range - shell

I am getting a GC overhead error when running a count over a date range, because there is a huge amount of data to pull. I need logic that runs the query in chunks (for example, one query per 30 days) without missing any data, and sums the results at the end.
I tried running the query for every 30 days, but with that approach there is a chance of missing the count for a few days.
Currently I have the code below and it runs successfully, but it is a very time-consuming process (one query per day), so I need help changing it to run per month or per some specific date range instead.
fb_TEST=0
while [ "${PART_START_DATE}" -le "${RUN_START_DAY}" ]
do
    day_count=$(hive -S -e "use ${DATABASE};set hive.cli.print.header=false;select count(*) from fb_wrk_tab where date = '${PART_START_DATE}';")
    fb_TEST=$(( fb_TEST + day_count ))
    PART_START_DATE=$(date -d "${PART_START_DATE} 1 days" +%Y%m%d)
    echo "fb_TEST count is ${fb_TEST}" >> "${LOG_FILE}"
done
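A 30-day-window version of this loop can be sketched as below (assuming GNU date and the same DATABASE, fb_wrk_tab and RUN_START_DAY names as above; fb_count_range is an illustrative name). Each window covers 30 days inclusive, and the next window starts the day after the previous one ends, so no date is skipped or double-counted:

```shell
# Sum counts over [start, end] in 30-day windows instead of day by day.
fb_count_range() {
    win_start=$1
    last=$2
    total=0
    while [ "${win_start}" -le "${last}" ]; do
        # Window end: 29 days after the start (30 days inclusive),
        # capped at the overall end date so the last partial window runs too.
        win_end=$(date -d "${win_start} +29 days" +%Y%m%d)
        [ "${win_end}" -gt "${last}" ] && win_end=${last}
        cnt=$(hive -S -e "use ${DATABASE};set hive.cli.print.header=false;select count(*) from fb_wrk_tab where date between '${win_start}' and '${win_end}';")
        total=$(( total + cnt ))
        # Next window begins the day after this one ends: no gap, no overlap.
        win_start=$(date -d "${win_end} +1 day" +%Y%m%d)
    done
    echo "${total}"
}
```

Usage: `fb_TEST=$(fb_count_range "${PART_START_DATE}" "${RUN_START_DAY}")`. This issues one Hive query per 30 days rather than per day, assuming the date column compares correctly with `between` on YYYYMMDD strings.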

Related

Get the line from a file whose time range matches the server's current date and time, using shell script

I am a beginner in scripting. I am trying to set a TPS value for a system according to the server time when it executes the script. I have a CSV file with StartTime, EndTime and TPS columns covering 00:00 to 23:59, as follows:
StartTime,EndTime,TPS
.....,.....,...
.....,.....,...
11:30,12:00,100
12:00,12:45,200
12:45,13:30,520
.....,.....,...
.....,.....,...
23:40,23:50,920
23:50,23:59,250
Time gaps are not uniform.
If the current server time is 11:35, I want to choose the line "11:30,12:00,100" and write it to a separate file (since 11:35 lies between 11:30 and 12:00). The chosen line should also be deleted from the initial CSV file.
#Current time (HH:MM) into a variable; a format string is more robust than cutting the default date output
TS=$(date +%H:%M)
echo "Current time = $TS"
Writing the relevant line to a separate file and removing that line from the initial file is fine for me.
If TS=11:35, I want to get the output "11:30,12:00,100" from that CSV file.
I am struggling to code how to find that matching line.
Partial answer:
Your data list seems improperly designed for full coverage: as written, there can be one-minute gaps at interval boundaries that are never captured.
To get proper coverage of all activity, the first time period needs to end with ${EndTime_1} ... and ... the next time period needs to start with ${StrtTime_2}=${EndTime_1}.
When scanning, you should specify range as
if [[ ! "${LineTime}" < "${StrtTime_x}" && "${LineTime}" < "${EndTime_x}" ]]
then
...{action}...
fi
Note: the lower bound is inclusive (not-less-than, i.e. >=) but the upper bound is strictly less-than, thereby ensuring a time that falls exactly on a boundary never matches two lines. Zero-padded HH:MM values compare correctly with bash's string operators inside [[ ]], whereas the numeric -le/-lt operators would fail on values containing a colon.
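A concrete sketch of this matching logic (the file names and the function name extract_tps_line are illustrative): awk copies the one line whose [StartTime, EndTime) window contains the given time to matched.csv and rewrites the CSV without it. Zero-padded HH:MM values compare correctly as strings:

```shell
# Move the line whose time window contains $2 from CSV file $1 to matched.csv.
extract_tps_line() {
    awk -F, -v t="$2" '
        NR == 1 { print > "rest.csv"; next }                      # keep the header
        $1 <= t && t < $2 && !found { print > "matched.csv"; found = 1; next }
        { print > "rest.csv" }
    ' "$1"
    mv rest.csv "$1"
}
```

Usage: `extract_tps_line tps.csv "$(date +%H:%M)"`. The `!found` guard keeps at most one line even if ranges accidentally overlap.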

How to make squeue display time limits in hours only?

When viewing submitted jobs managed by Slurm, I would like to have the time limit column (specified by %l) to show only hours, instead of the usual days-hours:minutes:seconds format. This is the command I am currently using:
squeue --format="%.6i %.5P %.25j %.8u %.8T %.10M %.5l %.15b %.5C %.6D %R" --sort=+i --me
and this is the example output:
276350 qgpu jobname username RUNNING 1:14:14 1-00:00:00 gres:gpu:v100:1 18 1 s31n02
So, in this case, I would like the elapsed time to remain as is (1:14:14), but the time limit to change from 1-00:00:00 to 24. Is there a way to do it?
This is simply how Slurm displays times; elapsed time will eventually be shown the same way (days-hours:minutes:seconds) once it passes 23:59:59.
You can use a wrapper script to convert it into a different format. Or, if you know the time limit is no more than a day, just set it to 23:59:00 by using --time=1439 (minutes):
salloc -N1 --time=1439 bash
Using your squeue command:
166 mypartition interactive jyvet RUNNING 7:36 23:59:00 N/A 1 1 mynode
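The wrapper-script conversion mentioned above might look like this minimal sketch: limit_hours (an illustrative name) turns a %l value in Slurm's [days-]hours:minutes:seconds format into whole hours:

```shell
# Convert a Slurm time limit such as "1-00:00:00" or "23:59:00" to hours.
limit_hours() {
    t=$1
    case ${t} in
        *-*) days=${t%%-*}; t=${t#*-} ;;   # split the optional "days-" prefix
        *)   days=0 ;;
    esac
    hours=${t%%:*}
    echo $(( days * 24 + 10#${hours} ))    # 10# stops "08" being read as octal
}
```

A wrapper would then rewrite the %l column of each squeue output line through this function.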

Unix shell scripting: date format syntax

I'm trying to get yesterday's date, but it's not working on an HP-UX server.
Prev_date=$(date +"y%m%d" -d "1 day ago")
With this I am still getting the current date only:
20210811
Could you please help with this?
You missed a percent sign in front of the 'y'; this works fine for me:
echo $(date +"%y%m%d" -d "1 day ago")
You can use the command below if you want:
date --date='1 days ago' '+%Y-%m-%d'
It will give a result like:
2021-09-06
I prefer this format since most of my scripts include SQL queries for data fetches, and the date is needed to filter data on a daily basis.

Time upload part of bash script and output in MM:SS

I have a function which uploads a file to a server. I want to measure how long that part of the script takes and display the result in the output, e.g.:
echo "$file uploaded in XXXM:XXs"
I can display the number of seconds, with:
my_upload_stuff here
echo "$file uploaded in $SECONDS"
Which, as I understand it, displays the number of seconds since the script started (which is fine for what I need), but this is as far as I can get.
I'm pulling my remaining hair out trying to figure this out; it seems far harder than I imagined. I've been all around the houses but nothing works. I confess my bash skills are newbie-level...
You could use the time command if it is installed, but you would have to use the external /usr/bin/time (and install it if absent), because the bash built-in time does not have options to produce the desired output.
Alternatively, you can perform the calculation explicitly, like this:
t1=$(date +%s)
<put upload_command here>
t2=$(date +%s)
echo "Time elapsed for upload: $(( ( t2 - t1 ) / 60 )) minutes and $(( ( t2 - t1 ) % 60 )) seconds"
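To get the requested MM:SS formatting, the remainder arithmetic can be wrapped in a small helper (mmss is an illustrative name); it works equally well with the t2 - t1 difference above or with bash's built-in $SECONDS:

```shell
# Format a number of seconds as zero-padded MM:SS.
mmss() {
    printf '%02d:%02d' $(( $1 / 60 )) $(( $1 % 60 ))
}
# e.g.:  echo "$file uploaded in $(mmss ${SECONDS})"
```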

DataStage execute shell script to sleep in a loop sequence job

Currently, I have a sequence job in DataStage.
Here is the flow:
StartLoop Activity --> UserVariables Activity --> Job Activity --> Execute Command --> Endloop Activity
The job runs every 30 minutes (8 AM - 8 PM) to get near-real-time data. The first loop iteration loads data from 8 PM of the previous day to 8 AM of the current day; the other iterations load the data from the last 30 minutes.
The UserVariables Activity passes variables (SQL statements) that filter the data retrieved in the Job Activity. On the first iteration it passes variable A (SQL statement 1); from the second iteration on, it passes variable B (SQL statement 2).
For the Execute Command I currently set 'sleep 1800' so the job sleeps 30 minutes to end each loop iteration. But I realized this drifts by the running time of each iteration. Knowing nothing about shell scripting, I searched for solutions and wrote the script below to sleep until the next :00 or :30 minute mark (a 0-1 minute delay is fine).
The shell script is below, I ran it fine on my system but no success on making it as part of the job.
#!/bin/bash
# Sleep until the minute hand next reaches :00 or :30.
minute=$(date +%M)
minute=$((10#$minute))   # force base 10: "08"/"09" would otherwise be read as octal
if [ $minute -le 30 ]
then
    sleep $(( (30 - minute) * 60 ))
else
    sleep $(( (60 - minute) * 60 ))
fi
I am now facing two problems that I need your help with.
1. The job runs the first iteration fine with variable A below:
select * from my_table where created_date between trunc(sysdate-1) + 20/24 and trunc(sysdate) + 8/24;
But from the second iteration it failed with the Job Activity with the variable B below:
select * from my_table where created_date between trunc(sysdate-1/48, 'hh') + 30*trunc(to_number(to_char(sysdate-1/48,'MI'))/30)/1440 and trunc(sysdate, 'hh') + 30*trunc(to_number(to_char(sysdate,'MI'))/30)/1440;
In the parallel job, the log said:
INPUT,0: The following SQL statement failed: select * from my_table where created_date between trunc(sysdate-1/48, hh) + 30*trunc(to_number(to_char(sysdate-1/48,MI))/30)/1440 and trunc(sysdate, hh) + 30*trunc(to_number(to_char(sysdate,MI))/30)/1440.
I realized that the parallel job may have failed because the single quotes around hh and MI were stripped.
Is it because the quotes are removed when variables are passed from the UserVariables Activity to the Job Activity? And how can I fix this?
2. How can I make the shell script above part of the job, e.g. as an Execute Command or some other stage? From what I found, the ExecSH Before/After routine activity seems relevant, but after reading the IBM pages I still don't know where to start with it.
Sorry for combining two questions in one long post, but they are closely related, and separating them would require duplicating a lot of context. Thank you!
Try escaping the single-quote characters (precede each with a backslash).
Execute the shell script from an Execute Command activity placed ahead of the Job Activity in the loop.
