In HQL scripts we use "!sh echo ---new line---" to print a line. What is the alternative for printing a line in Impala scripts?

In HQL scripts we use "!sh echo ---new line---" to print a line.
What is the alternative for printing a line in Impala scripts?

You can invoke shell commands from Impala scripts with impala-shell's shell command.
Here is an example of how it works:
script_impala.sql
-- set a variable containing the name of the game
SET hivevar:game=Monopoly;
-- return the list price of the game
SELECT list_price FROM fun.games WHERE name = '${hivevar:game}';
-- return the prices of the game at game shops
SELECT shop, price FROM fun.inventory WHERE game = '${hivevar:game}';
shell hdfs dfs -ls /user;
shell ls -ltr;
shell echo I can echo from impala scripts;
shell cat hex_color.sql;
$ impala-shell -f script_impala.sql
output
Starting Impala Shell without Kerberos authentication
Connected to localhost.localdomain:21000
Server version: impalad version 2.10.0-cdh5.13.3 RELEASE (build 15a453e15865344e75ce0fc6c4c760696d50f626)
Variable GAME set to Monopoly
Query: -- return the list of the game
SELECT list_price FROM fun.games WHERE name = 'Monopoly'
Query submitted at: 2020-06-04 23:29:19 (Coordinator: http://localhost.localdomain:25000)
Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=a94afc7556843cf7:8b19809d00000000
+------------+
| list_price |
+------------+
| 19.99 |
+------------+
Fetched 1 row(s) in 0.15s
Query: -- return the prices of the game ate game shops
SELECT shop, price FROM fun.inventory WHERE game = 'Monopoly'
Query submitted at: 2020-06-04 23:29:19 (Coordinator: http://localhost.localdomain:25000)
Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=c64d1c8950e5e1f8:15d410cb00000000
+-----------+-------+
| shop | price |
+-----------+-------+
| Dicey | 17.99 |
| Board 'Em | 25.00 |
+-----------+-------+
Fetched 2 row(s) in 0.14s
Found 5 items
drwxrwxrwt - mapred hadoop 0 2019-04-29 18:19 /user/history
drwxrwxrwx - hive hive 0 2019-04-29 18:19 /user/hive
drwxr-xr-x - hue hue 0 2019-11-25 10:19 /user/hue
drwxr-xr-x - spark spark 0 2019-04-29 18:19 /user/spark
drwxrwxrwx - training supergroup 0 2020-05-28 11:33 /user/training
--------
Executed in 2.10s
total 56
-rw-rw-r-- 1 training training 61 Sep 25 2019 hex_color.sql
-rw-rw-r-- 1 training training 115 Sep 25 2019 color_from_rgb.sql~
-rw-rw-r-- 1 training training 58 Sep 25 2019 hex_color_impala.sql
-rwxr-xr-x 1 training training 449 Sep 25 2019 email_results.sh
-rw-rw-r-- 1 training training 1166 Sep 25 2019 zero_air_time.csv
-rw-r--r-- 1 training training 261 Sep 26 2019 change_background.sh~
-rwxr-xr-x 1 training training 262 Sep 26 2019 change_background.sh
-rw------- 1 training training 2966 Sep 26 2019 ChangeVMDesktopColor.txt
-rw------- 1 training training 3279 Sep 26 2019 Hive&ImpalaInScripts&Applications.txt~
-rw------- 1 training training 3278 Sep 26 2019 Hive&ImpalaInScripts&Applications.txt
-rw-rw-r-- 1 training training 449 Sep 26 2019 email_resuts.sh
-rw-rw-r-- 1 training training 120 Apr 27 11:38 color_from_rgb.sql
-rw-rw-r-- 1 training training 397 Jun 4 23:24 game_prices.sql~
-rw-rw-r-- 1 training training 395 Jun 4 23:29 game_prices.sql
--------
Executed in 0.00s
I can echo from impala scripts
--------
Executed in 0.00s
SELECT hex FROM wax.crayons WHERE color = '${hivevar:color}'
--------
Executed in 0.01s
As you can see, you only have to write the following in your script:
shell <command>;
You can even invoke hdfs dfs commands:
shell hdfs dfs -<command>;
The example above prints the output to the console. If you want to write the output to a file instead, redirect all of impala-shell's output like this:
$ impala-shell -f script_impala.sql >> /home/..../my_file.csv
Another option is to redirect only the echo commands from inside the script, as follows:
script_impala.sql
-- set a variable containing the name of the game
SET hivevar:game=Monopoly;
-- return the list price of the game
SELECT list_price FROM fun.games WHERE name = '${hivevar:game}';
-- return the prices of the game at game shops
SELECT shop, price FROM fun.inventory WHERE game = '${hivevar:game}';
shell hdfs dfs -ls /user;
shell ls -ltr;
shell echo I can echo from impala scripts >> /home/..../my_file.csv;
shell echo How are you? >> /home/..../my_file.csv;
shell echo I am feeling very good, how about you? >> /home/..../my_file.csv;
shell cat hex_color.sql;
And don't forget the semicolon ";" at the end of each line.
I hope this helps.
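If you have many queries, writing the separator lines by hand gets tedious. Below is a minimal bash sketch, assuming impala-shell's shell command behaves as shown above, that generates a script with a shell echo line after every query. The function name make_impala_script and the separator text are hypothetical; the separator uses = rather than -, on the assumption that a -- inside the line might be parsed as the start of a SQL comment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write an Impala script where every query is followed by
# a `shell echo` separator line (the counterpart of HQL's `!sh echo`).
# Usage: make_impala_script OUTFILE "QUERY 1" "QUERY 2" ...
make_impala_script() {
    local out=$1 q
    shift
    : > "$out"    # create or truncate the output file
    for q in "$@"; do
        printf '%s;\n' "$q" >> "$out"
        # '=' instead of '-': '--' might be read as a SQL comment (assumption)
        printf '%s\n' 'shell echo ==== new line ====;' >> "$out"
    done
}
```

For example, make_impala_script script_impala.sql "SELECT 1" "SELECT 2" followed by impala-shell -f script_impala.sql should print the separator after each result.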


bash console output print layout

I want my console output to be tidy and human-readable.
Here is how it looks now:
====================== Sat Apr 16 12:57:17 EDT 2022 ======================
==========================================================================
====================== Leopard - Download from S3 ======================
==========================================================================
==========================================================================
====================== Leopard - Decompressing ======================
==========================================================================
total 1349872
drwxr-xr-x 2 root root 12288 Apr 16 12:57 .
drwxrwxrwt. 4 root root 102 Apr 16 12:57 ..
-rw-r--r-- 1 root root 185070885 Apr 16 12:03 asdasdasd.sql.gz
-rw-r--r-- 1 root root 40344632 Apr 16 12:03 asdasdas.sql.gz
-rw-r--r-- 1 root root 26631 Apr 16 12:03 asdad.sql.gz
-rw-r--r-- 1 root root 1679 Apr 16 12:03 asdasd.sql.gz
-rw-r--r-- 1 root root 1237 Apr 16 12:03 asd.sql.gz
-rw-r--r-- 1 root root 5241900 Apr 16 12:03 asdasd.sql.gz
-rw-r--r-- 1 root root 1144 Apr 16 12:03 asdasasd.sql.gz
-rw-r--r-- 1 root root 489312 Apr 16 12:03 asdasd.sql.gz
-rw-r--r-- 1 root root 1138 Apr 16 12:03 asdasdasd.sql.gz
==========================================================================
====================== NewYorkCity - Download from S3 ======================
==========================================================================
==========================================================================
====================== NewYorkCity - Unloading SSL Example ======================
==========================================================================
total 1349872
drwxr-xr-x 2 root root 12288 Apr 16 12:57 .
drwxrwxrwt. 4 root root 102 Apr 16 12:57 ..
-rw-r--r-- 1 root root 185070885 Apr 16 12:03 asdasdasd.sql.gz
-rw-r--r-- 1 root root 40344632 Apr 16 12:03 asdasdas.sql.gz
-rw-r--r-- 1 root root 26631 Apr 16 12:03 asdad.sql.gz
-rw-r--r-- 1 root root 1679 Apr 16 12:03 asdasd.sql.gz
-rw-r--r-- 1 root root 1237 Apr 16 12:03 asd.sql.gz
-rw-r--r-- 1 root root 5241900 Apr 16 12:03 asdasd.sql.gz
-rw-r--r-- 1 root root 1144 Apr 16 12:03 asdasasd.sql.gz
-rw-r--r-- 1 root root 489312 Apr 16 12:03 asdasd.sql.gz
-rw-r--r-- 1 root root 1138 Apr 16 12:03 asdasdasd.sql.gz
I want all the === lines to be the same length, with the text always centered and 1 space on each side.
I'd appreciate any assistance here :)
UPDATE / EDIT:
The original script is something like this:
eecho () { echo ==========================================================================; }
echo_stage () {
    START=1
    END=11
    for (( c=$START; c<=$END; c++ ))
    do
        printf == '-%.0s'
    done
    echo -n " " $1 " "
    for (( c=$START; c<=$END; c++ ))
    do
        printf == '-%.0s'
    done
    echo
}
stage() {
    eecho
    echo_stage "$1" "$2"
    eecho
}
print_date () { echo "======================" $(date) "======================"; }
reload_db() {
    print_date
    rm -rf /var/tmp/db
    mkdir -p /var/tmp/db
    stage "DB - Download from S3"
    aws s3 sync s3://db-backup/latest/ /var/tmp/db --profile=papilon --quiet
    stage "DB - Decompressing"
    pigz -d /var/tmp/db/*
    stage "DB - Restoring Data"
    cd /var/tmp/db
    stage "DB - Restoring Tables"
    for i in `ls -1 *.sql | grep -v "_view.sql"`;do echo $i;mysql db < $i;done
    stage "DB - Restoring Views"
    for i in `ls -1 *.sql | grep "_view.sql"`;do echo $i;mysql db < $i;done
    stage "DB - Clean up"
    rm -rf /var/tmp/db
    print_date
}
reload_db
This awk filter will size your === padding correctly and center the titles if you pipe your command output through it:
# cmd |
awk '
BEGIN {a[1] = "="}
NF>1 && $1~/^=+$/ && $NF~/^=+$/ {
    sub(/^=+/, "")
    sub(/=+$/, "")
    title_len=length($0)
    pad = ""
    for (i=1; i<=(74-title_len)/2; ++i) {
        pad=pad"="
    }
    $0 = pad $0 pad a[title_len%2]
}
1'
This leaves the ls -l output untouched and only changes the titles. It assumes the solid === lines are hard-coded to 74 characters (the width doesn't change with the terminal size). The array a is used to add an extra = when 74 - title_len is an odd number.
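Because the width is hard-coded, one possible variation is to pass it in with awk -v. The sketch below is a variant of the filter above, not a drop-in copy: center_titles is a hypothetical name, it also redraws the solid === lines at the chosen width, and titles longer than the width are left unpadded.

```bash
#!/usr/bin/env bash
# Sketch: center === titles to a chosen width (default 74), e.g.
#   reload_db | center_titles "$(tput cols)"
center_titles() {
    awk -v w="${1:-74}" '
    # solid separator lines: redraw them at width w
    /^=+$/ {
        bar = ""
        for (i = 0; i < w; i++) bar = bar "="
        print bar
        next
    }
    # titles: lines whose first and last fields are runs of =
    NF > 1 && $1 ~ /^=+$/ && $NF ~ /^=+$/ {
        sub(/^=+/, "")
        sub(/=+$/, "")
        left = int((w - length($0)) / 2)
        right = w - length($0) - left   # odd leftovers go to the right bar
        bar = ""
        for (i = 0; i < right; i++) bar = bar "="
        $0 = substr(bar, 1, left) $0 bar
    }
    1'
}
```

Lines that are not titles or separators, such as the ls -l output, pass through unchanged via the trailing 1.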
Example output:
====================== Sat Apr 16 12:57:17 EDT 2022 ======================
==========================================================================
====================== Leopard - Download from S3 ======================
==========================================================================
==========================================================================
======================= Leopard - Decompressing ========================
==========================================================================
total 1349872
drwxr-xr-x 2 root root 12288 Apr 16 12:57 .
drwxrwxrwt. 4 root root 102 Apr 16 12:57 ..
-rw-r--r-- 1 root root 185070885 Apr 16 12:03 asdasdasd.sql.gz
-rw-r--r-- 1 root root 40344632 Apr 16 12:03 asdasdas.sql.gz
-rw-r--r-- 1 root root 26631 Apr 16 12:03 asdad.sql.gz
-rw-r--r-- 1 root root 1679 Apr 16 12:03 asdasd.sql.gz
-rw-r--r-- 1 root root 1237 Apr 16 12:03 asd.sql.gz
-rw-r--r-- 1 root root 5241900 Apr 16 12:03 asdasd.sql.gz
-rw-r--r-- 1 root root 1144 Apr 16 12:03 asdasasd.sql.gz
-rw-r--r-- 1 root root 489312 Apr 16 12:03 asdasd.sql.gz
-rw-r--r-- 1 root root 1138 Apr 16 12:03 asdasdasd.sql.gz
==========================================================================
==================== NewYorkCity - Download from S3 ====================
==========================================================================
==========================================================================
================= NewYorkCity - Unloading SSL Example ==================
==========================================================================
total 1349872
drwxr-xr-x 2 root root 12288 Apr 16 12:57 .
drwxrwxrwt. 4 root root 102 Apr 16 12:57 ..
-rw-r--r-- 1 root root 185070885 Apr 16 12:03 asdasdasd.sql.gz
-rw-r--r-- 1 root root 40344632 Apr 16 12:03 asdasdas.sql.gz
-rw-r--r-- 1 root root 26631 Apr 16 12:03 asdad.sql.gz
-rw-r--r-- 1 root root 1679 Apr 16 12:03 asdasd.sql.gz
-rw-r--r-- 1 root root 1237 Apr 16 12:03 asd.sql.gz
-rw-r--r-- 1 root root 5241900 Apr 16 12:03 asdasd.sql.gz
-rw-r--r-- 1 root root 1144 Apr 16 12:03 asdasasd.sql.gz
-rw-r--r-- 1 root root 489312 Apr 16 12:03 asdasd.sql.gz
-rw-r--r-- 1 root root 1138 Apr 16 12:03 asdasdasd.sql.gz
Since you have now posted your script, I will add a new answer in pure bash.
title() {
    local text pad
    (( ${#1} > 70 )) && { echo "$1"; return; }
    text=${1:+ }$1${1:+ }
    pad=$( eval "printf %.1s ={1..$(( ( 74 - ${#text} ) / 2 ))}" )
    echo "$pad$text$pad$( (( ${#text} % 2 )) && printf = )"
}
For title 'foo bar', this function prints foo bar (or any string of up to 70 characters) with a space on either side, centered and padded to 74 columns with =. A string longer than 70 characters is too long to pad, so it's printed as is.
================================ foo bar =================================
With no argument, or an empty argument, it prints a solid line of 74 =:
==========================================================================
You can swap = for any single ASCII character.
You can swap 74 for any even number N (also change the 70 to N - 4). For an odd N, also change && printf to || printf in the last line.
You can call title once for the date, or three times for the larger three-line banners (see banner below).
Explanation:
${#text} is bash for "length of $text".
${1:+ } expands to a space, unless $1 is empty or unset. This allows us to add spaces to either end of the string, or exclude them for an empty string.
={1..10} expands to =1 =2 =3 ... =10, and printf %.1s prints the first character of each string. Combining these lets us repeat a character (=) N times.
But we can't normally use a variable (or arithmetic) in {1..10}, so we need eval.
You have probably heard that eval is bad and a security risk. That's often true, but here we are not passing any unknown data (such as user input) to eval, so it's safe from code injection (${#text} always expands to a single number).
So we make two bars of length (74 - text-length) / 2, adding another = to the second bar if the text length is odd.
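If you would rather avoid eval entirely, bash's printf -v with a * field width can build the bar instead. This is a sketch of an alternative technique, not the function above; the optional second argument (the total width, default 74) is a hypothetical addition. Because the extra = is computed rather than toggled, odd widths work without switching && to ||.

```bash
#!/usr/bin/env bash
# Sketch: eval-free variant of title().
# Optional $2 = total width (default 74); hypothetical extension.
title() {
    local width=${2:-74} text left right bar
    # too long to pad: print as is
    if (( ${#1} > width - 4 )); then
        printf '%s\n' "$1"
        return
    fi
    text=${1:+ }$1${1:+ }                  # a space either side, unless empty
    left=$(( (width - ${#text}) / 2 ))
    right=$(( width - ${#text} - left ))   # gets the extra = when the split is odd
    printf -v bar '%*s' "$right" ''        # $right spaces...
    bar=${bar// /=}                        # ...turned into =
    printf '%s%s%s\n' "${bar:0:left}" "$text" "$bar"
}
```

Called as title 'foo bar', it should give the same 74-column line as the eval version: 32 = on the left and 33 on the right.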
I also made a few changes to your script that you might consider. Apart from the date and titles, these have nothing to do with the padding. The title and banner functions will also work with your old script.
title() {
    local text pad
    (( ${#1} > 70 )) && { echo "$1"; return; }
    text="${1:+ }$1${1:+ }"
    pad=$( eval "printf %.1s ={1..$(( ( 74 - ${#text} ) / 2 ))}" )
    echo "$pad$text$pad$( (( ${#text} % 2 )) && printf = )"
}
banner() {
    title
    title "$1"
    title
}
reload_db() {
    local i
    title "$(date)"
    rm -rf /var/tmp/db || exit 1
    mkdir -p /var/tmp/db || exit 1
    banner 'DB - Download from S3'
    aws s3 sync s3://db-backup/latest/ /var/tmp/db --profile=papilon --quiet
    banner 'DB - Decompressing'
    pigz -d /var/tmp/db/*
    banner 'DB - Restoring Data'
    cd /var/tmp/db || exit 1
    banner 'DB - Restoring Tables'
    GLOBIGNORE='*_view.sql'
    for i in *.sql; do
        echo "$i"
        mysql db < "$i"
    done
    GLOBIGNORE=
    banner 'DB - Restoring Views'
    for i in *_view.sql; do
        echo "$i"
        mysql db < "$i"
    done
    banner 'DB - Clean up'
    rm -rf /var/tmp/db
    title "$(date)"
}
At the very least, you should exit early if cd, mkdir, or the first rm fails. Also, looping over an unquoted ls command substitution is a bad idea; use glob expansion (or find) instead.
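As a sketch of the find route mentioned above (an alternative to GLOBIGNORE), the function below restores table scripts first and view scripts second. restore_dir and run_sql are hypothetical names: run_sql stands in for mysql db so it can be swapped out for a dry run, and sort -z assumes a sort that supports NUL-separated input (GNU coreutils does).

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for `mysql db`; redefine it (e.g. as `cat >/dev/null`) for a dry run.
run_sql() { mysql db; }

# Restore table scripts (*.sql except *_view.sql) first, then views.
restore_dir() {
    local dir=$1 f
    # -print0 with read -d '' keeps names containing spaces intact
    while IFS= read -r -d '' f; do
        printf '%s\n' "$f"
        run_sql < "$f"
    done < <(find "$dir" -maxdepth 1 -type f -name '*.sql' ! -name '*_view.sql' -print0 | LC_ALL=C sort -z)
    while IFS= read -r -d '' f; do
        printf '%s\n' "$f"
        run_sql < "$f"
    done < <(find "$dir" -maxdepth 1 -type f -name '*_view.sql' -print0 | LC_ALL=C sort -z)
}
```

Because run_sql reads from its own redirection, it cannot swallow the file list that feeds the while loop.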
You could also concatenate the SQL scripts, but this only works if all the commands end with a semicolon (see Run multiple sql files in mysql batch):
banner 'DB - Restoring Tables'
GLOBIGNORE='*_view.sql'
printf '%s\n' *.sql
cat *.sql | mysql db
GLOBIGNORE=
banner 'DB - Restoring Views'
printf '%s\n' *_view.sql
cat *_view.sql | mysql db

grep only files generated in a particular hour

I am trying to grep for some patterns in a set of files under a folder, like below.
I then have to perform the remaining operations on the output.
The output, main.log, is huge (almost 50k lines), because there are 30 to 40 files starting with server02.log. The script based on this output takes forever to complete.
Is there a way to take only the files whose names start with server02.log. and that were generated between
20:00:00 and 21:00:00?
ls -lrth server02.log.*
-rw-r--r-- 1 user user 1.9M Apr 15 20:20 server02.log.2020
-rw-r--r-- 1 user user 1.7M Apr 15 20:30 server02.log.2030
-rw-r--r-- 1 user user 1.6M Apr 15 20:41 server02.log.2041
-rw-r--r-- 1 user user 1.9M Apr 15 20:50 server02.log.2050
-rw-r--r-- 1 user user 2.1M Apr 15 21:00 server02.log.2100
-rw-r--r-- 1 user user 1.4M Apr 15 21:10 server02.log.2110
-rw-r--r-- 1 user user 1.9M Apr 15 21:20 server02.log.2120
-rw-r--r-- 1 user user 656K Apr 15 21:29 server02.log.2129
-rw-r--r-- 1 user user 4.6M Apr 15 21:40 server02.log.2140
-rw-r--r-- 1 user user 1.9M Apr 15 21:50 server02.log.2150
-rw-r--r-- 1 user user 1.7M Apr 15 21:59 server02.log.2159
-rw-r--r-- 1 user user 724K Apr 15 22:09 server02.log.2209
-rw-r--r-- 1 user user 1.3M Apr 15 22:20 server02.log.2220
-rw-r--r-- 1 user user 1.1M Apr 15 22:29 server02.log.2229
-rw-r--r-- 1 user user 1.7M Apr 15 22:41 server02.log.2241
-rw-r--r-- 1 user user 1.5M Apr 15 22:49 server02.log.2249
-rw-r--r-- 1 user user 2.4M Apr 15 23:01 server02.log.2301
-rw-r--r-- 1 user user 1.4M Apr 15 23:10 server02.log.2310
-rw-r--r-- 1 user user 585K Apr 15 23:19 server02.log.2319
-rw-r--r-- 1 user user 858K Apr 15 23:30 server02.log.2330
-rw-r--r-- 1 user user 892K Apr 15 23:40 server02.log.2340
-rw-r--r-- 1 user user 698K Apr 15 23:49 server02.log.2349
grep -E "###Update |###Initiate |###Re-Initiate " server02.log.* >> main.log
Based on the comments, I changed my code as below:
#!/bin/bash
DIR="."
d=$(date +%Y-%m-%d);
log_dir="logs/$d"
PREFIX="$log_dir/srv_02.log"
#PREFIX="srv_02.log"
echo "prefix value is $PREFIX"
START_HOUR="06"
for F in "$( find "$DIR" -name "${PREFIX}*" -printf '%Tc %p\n' | grep "\ ${START_HOUR}:" )"; do
echo "F value is $F"
grep -E "###Update |###Initiate |###Re-Initiate" "$F" >> main.log
done
error:
prefix value is logs/2021-04-16/srv_02.log
find: warning: Unix filenames usually don't contain slashes (though pathnames do). That means that '-name `logs/2021-04-16/srv_02.log*'' will probably evaluate to false all the time on this system. You might find the '-wholename' test more useful, or perhaps '-samefile'. Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ `logs/2021-04-16/osbpd_srv_02.log*''.
F value is
grep: : No such file or directory
This solution looks for files in the given directory whose names match the given prefix and that were last modified during the specified hour (find reports the modification time, not a creation time).
#!/bin/bash
d=$(date +%Y-%m-%d)
DIR="logs/$d"
PREFIX="srv_02.log"
#PREFIX=server02.log
echo "prefix value is $PREFIX"
START_HOUR="06"
# %TH prints the hour (00..23) of each file's modification time (GNU find)
find "$DIR" -type f -name "${PREFIX}*" -printf '%TH %p\n' |
while read -r hour F; do
    [ "$hour" = "$START_HOUR" ] || continue
    echo "$F"
    # grep -E "###Update |###Initiate |###Re-Initiate Assignment Milestone|###Complete Assignment Milestone|###Cancel Assignment Milestone|###Suspend Assignment Milestone|###Resume Assignment Milestone" "$F" >> main.log
done
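For an arbitrary window such as 20:00:00 to 21:00:00, find's -newermt test (available in GNU find and recent BSD find) can select files by modification time directly, and -exec grep ... {} + runs grep over all matches in as few invocations as possible. The function name grep_window and the example timestamps are assumptions; -H keeps the filename prefix that the original grep printed for multiple files.

```bash
#!/usr/bin/env bash
# Sketch: grep only server02.log.* files modified after START and not after END.
# Usage: grep_window DIR 'YYYY-MM-DD HH:MM:SS' 'YYYY-MM-DD HH:MM:SS' >> main.log
grep_window() {
    local dir=$1 start=$2 end=$3
    # -newermt T: modified more recently than T; ! -newermt T: not more recently than T
    find "$dir" -maxdepth 1 -type f -name 'server02.log.*' \
        -newermt "$start" ! -newermt "$end" \
        -exec grep -HE '###Update |###Initiate |###Re-Initiate ' {} +
}
```

For the listing in the question, something like grep_window . '2021-04-15 20:00:00' '2021-04-15 21:00:00' >> main.log would scan only the 20:xx files (plus one stamped exactly 21:00:00, if any) instead of all of them.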

TCL/Expect - exec - how to execute program with parameters

I am experimenting with the Tcl exec command in tclsh, and here are my results:
% set show_me_dir "ls"
ls
% exec $show_me_dir
VboxSharedFolder
% set show_me_dir "ls -la"
ls -la
% exec $show_me_dir
couldn't execute "ls -la": no such file or directory
% set show_me_dir {ls -la}
ls -la
% exec $show_me_dir
couldn't execute "ls -la": no such file or directory
% ls -la
total 141
d---------+ 1 wakatana Domain Users 0 Jan 22 19:12 .
d---------+ 1 wakatana Domain Users 0 Apr 16 2014 ..
----------+ 1 wakatana Domain Users 20214 Jan 23 18:43 .bash_history
----------+ 1 wakatana Domain Users 1494 Apr 15 2014 .bash_profile
----------+ 1 wakatana Domain Users 7593 Jan 22 19:03 .bashrc
d---------+ 1 wakatana Domain Users 0 Jan 15 14:56 VboxSharedFolder
%
Can somebody please explain how I can execute a command with arguments?
Edit:
The following example from the article Expanding a list of parameters in Tcl and eval was a big eye-opener about what is going on here:
The variable $action is only expanded into the string "piemiddle apple" AFTER the command line has been split into its individual parameters:
% set action {piemiddle apple}
% set $action
can't read "piemiddle apple": no such variable
Result: the set command "sees" one argument, equivalent to:
% set {piemiddle apple}
The expand operator allows you to specify that a variable is to be expanded BEFORE the command line is split into individual parameters:
% set action {piemiddle apple}
% set {*}$action
apple
Result: the set command "sees" two arguments, equivalent to:
% set piemiddle apple
In earlier versions of Tcl, the eval command was the recommended alternative and it remains available today.
% set action {piemiddle apple}
% eval set $action
apple
More examples that demonstrate the expansion operator:
% set {*}"name Linus"
Linus
% puts $name
Linus
%
%
% set distro Ubuntu
Ubuntu
% set {*}"linux $distro"
Ubuntu
% puts $linux
Ubuntu
%
%
Finally, the discovery: exec needs the command as its first argument, the first command option as its second argument, and so on.
% exec "ls" "-la"
total 137
d---------+ 1 wakatana Domain Users 0 Jan 22 19:12 .
d---------+ 1 wakatana Domain Users 0 Apr 16 2014 ..
----------+ 1 wakatana Domain Users 20214 Jan 23 18:43 .bash_history
----------+ 1 wakatana Domain Users 1494 Apr 15 2014 .bash_profile
----------+ 1 wakatana Domain Users 7593 Jan 22 19:03 .bashrc
d---------+ 1 wakatana Domain Users 0 Jan 15 14:56 VboxSharedFolder
%
%
% exec "ls -la"
couldn't execute "ls -la": no such file or directory
The safest way to build a command for exec is to use Tcl's list command. For example:
% set tcl_version
8.5
% set cmd [list ls -l tmp]
ls -l tmp
% eval exec $cmd
total 32
-rw-r--r-- 1 pynexj staff 1176 Jan 23 23:24 file.txt
-rw-r--r-- 1 pynexj staff 1176 Jan 23 23:24 foo-1.dat
-rw-r--r-- 1 pynexj staff 1176 Jan 23 23:24 foo-2.dat
-rw-r--r-- 1 pynexj staff 1176 Jan 23 23:24 foo-3.dat
% exec {*}$cmd
total 32
-rw-r--r-- 1 pynexj staff 1176 Jan 23 23:24 file.txt
-rw-r--r-- 1 pynexj staff 1176 Jan 23 23:24 foo-1.dat
-rw-r--r-- 1 pynexj staff 1176 Jan 23 23:24 foo-2.dat
-rw-r--r-- 1 pynexj staff 1176 Jan 23 23:24 foo-3.dat
%
Note that {*} is new syntax in Tcl 8.5 that can help reduce the use of eval.
For example, with the ls command you can do:
exec ls -lsa {*}[glob *.cpp]
Please have a look at What does {*} do in TCL?

Please help me how to delete the files in one folder which are more than 60 days old in UNIX

I know how to delete files that are more than 60 days old, but I have to satisfy the conditions below. Please help me get a correct script to automate this.
I have the files below for each day, on a monthly basis, and I have these files for the last 3 years.
vtm_data_12month_20140301.txt
vtm_data_12month_20140301.control
vtm_mtd_20130622.txt
vtm_mtd_20130622.control
vtm_ytd_20131031.txt
vtm_ytd_20131031.control
I'd like to write a script that finds all files more than 60 days old and deletes them, except for the last file of each month.
For example, for January I want to keep the last (latest) file, vtm_data_12month_20140131.txt, and delete the other 30 files. The issue is that I might only have received files up to January 30th; in that case I should not delete that latest file, but I still have to delete the rest.
Please advise me how to achieve this with a shell script. Your response is highly appreciated.
There are many ways to do this. The two primary approaches are either (1) to use the actual file date to determine whether files are removed, or (2) to use the date embedded in the filename as the file date. Both have advantages and pitfalls. What you seem to be asking for is approach (2): remove files whose embedded date is more than 60 days older than the latest date embedded in the filenames.
As you have indicated, you may have a number of files with dates clustered close to the end of the month, so you may need to adjust the end date. Rather than just having the script parse the filenames for the maximum date string, you can prompt for the end date to measure 60 days back from. Otherwise, just scan each embedded date, find the maximum, and subtract 60 days from there. The following script prompts for an end_date.
In fact, the following script contains code to remove files by both methods (sample data follows). The code that removes files based on the actual file modification date ((1) above) is commented out below the code that uses the embedded date. Look over the script and make sure you understand what it does; it is fairly well commented. NOTE: the actual rm command is commented out to prevent accidents (even though the script requires you to enter YES to confirm removal). Uncomment the rm line to actually remove files. Drop a comment if you have questions:
#!/bin/bash
oifs="$IFS"    # save current IFS (internal field separator) (default ' \t\n')
IFS=$'\n'      # set IFS to only break on newlines
## prompt for path containing files & read
printf "\n enter the path to files to remove (no ending '/'): "
read -r rmpath
## validate directory
[ -d "$rmpath" ] || { printf "\nerror: bad path '%s'\n\n" "$rmpath"; exit 1; }
## prompt for ending date of files to keep
printf "\n enter the _end_ date of files to keep 'yyyymmdd' : "
read -r enddatestr
IFS="$oifs"    # reset IFS to original
enddt=$(date -d "$enddatestr" +%s)     # get enddt in seconds since epoch (GNU date)
enddt=$((enddt - (60 * 24 * 3600)))    # subtract 60 days
declare -a rmarray
## Using embedded filename date
mdate=$(date -d "@$enddt" +%Y%m%d)     # get mdate string to compare to filename
## fill rmarray with file dates older than mdate
for i in $(find "$rmpath" -maxdepth 1 -type f); do
    ffname="${i##*/}"      # full filename component
    fname=${ffname%.*}     # filename w/o extension
    fdate="${fname##*_}"   # get file date string
    ## if fdate is an 8-digit date before mdate, add to remove array
    [[ $fdate =~ ^[0-9]{8}$ ]] && [ "$mdate" -gt "$fdate" ] && rmarray+=( "$i" )
done
# ### Using actual file modification date
# tgtfile=/tmp/tgt_$(date +%s)   # tmp filename to measure against
#
# ## create temp file to measure against with find & set trap to remove
# touch -t $(date -d "@${enddt}" +%Y%m%d%H%M.%S) "$tgtfile" &&
# trap 'rm -rf "$tgtfile"' 0
#
# ## fill array with filenames to remove
# rmarray=( $(find "$rmpath" -maxdepth 1 -type f ! -newer "$tgtfile") )
## verify files are contained in rmarray
[ "${#rmarray[@]}" -lt 1 ] && {
    printf "\n No files matched the dates for removal.\n\n"
    exit 1
}
## print files that will be removed
printf "\n ** the following files will be removed **\n\n"
for i in "${rmarray[@]}"; do
    ls -al "$i"
done
## prompt for actual removal
printf "\n Continue with ACTUAL removal (YES to remove) : "
read -r ans
if [ "$ans" = "YES" ]; then
    for i in "${rmarray[@]}"; do
        # rm "$i"   # NOTE: 'rm' is commented, uncomment to really delete
        :           # no-op keeps the loop body valid while 'rm' is commented out
    done
else
    printf "\n You entered '%s' (not YES), no removal performed.\n\n" "$ans"
fi
exit 0
test directory:
$ls -l dat/fstst
total 0
-rw-r--r-- 1 david david 0 Nov 27 01:10 vtm_data_12month_20140301.control
-rw-r--r-- 1 david david 0 Nov 27 01:10 vtm_data_12month_20140301.txt
-rw-r--r-- 1 david david 0 Nov 27 01:10 vtm_mtd_20130622.control
-rw-r--r-- 1 david david 0 Nov 27 01:10 vtm_mtd_20130622.txt
-rw-r--r-- 1 david david 0 Nov 27 01:10 vtm_ytd_20131031.control
-rw-r--r-- 1 david david 0 Nov 27 01:10 vtm_ytd_20131031.txt
use:
$ bash rmfiles_60days.sh
enter the path to files to remove (no ending '/'): dat/fstst
enter the _end_ date of files to keep 'yyyymmdd' : 20140301
** the following files will be removed **
-rw-r--r-- 1 david david 0 Nov 27 01:10 dat/fstst/vtm_mtd_20130622.txt
-rw-r--r-- 1 david david 0 Nov 27 01:10 dat/fstst/vtm_ytd_20131031.control
-rw-r--r-- 1 david david 0 Nov 27 01:10 dat/fstst/vtm_ytd_20131031.txt
-rw-r--r-- 1 david david 0 Nov 27 01:10 dat/fstst/vtm_mtd_20130622.control
Continue with ACTUAL removal (YES to remove) : YES
result:
$ ls -l dat/fstst
total 0
-rw-r--r-- 1 david david 0 Nov 27 01:10 vtm_data_12month_20140301.control
-rw-r--r-- 1 david david 0 Nov 27 01:10 vtm_data_12month_20140301.txt
The following is an example using the actual file date:
test directory:
$ls -l dat/tst
total 324
-rw-r--r-- 1 david david 74 Sep 9 01:23 1.txt
-rw-r--r-- 1 david david 74 Sep 9 01:23 2.txt
-rw-r--r-- 1 david david 201 Aug 1 03:47 3line.dat
-rw-r--r-- 1 david david 205 Aug 1 03:35 3line.dat.sav
-rw-r--r-- 1 david david 88 Aug 13 04:05 catfile.txt
-rw-r--r-- 1 david david 39 Jul 4 14:40 comma
-rw-r--r-- 1 david david 291 Sep 23 03:00 createfile.txt
-rw-r--r-- 1 david david 11 Jul 17 03:54 data.dat
-rw-r--r-- 1 david david 8 Jul 17 03:54 datb.dat
-rw-r--r-- 1 david david 369 Oct 2 14:25 dia.txt
-rw-r--r-- 1 david david 36 Nov 6 15:51 dicta.dat
-rw-r--r-- 1 david david 23895 Sep 9 17:14 dna.dat
-rw-r--r-- 1 david david 243 Nov 4 23:07 domain.dat
-rw-r--r-- 1 david david 276 Nov 23 00:32 ecread.dat
(snip)
use:
$ bash rmfiles_60days.sh
enter the path to files to remove (no ending '/'): dat/tst
enter the _end_ date of files to keep 'yyyymmdd' : 20141031
** the following files will be removed **
-rw-r--r-- 1 david david 205 Aug 1 03:35 dat/tst/3line.dat.sav
-rw-r--r-- 1 david david 29 Jun 29 02:23 dat/tst/f1f2.dat
-rw-r--r-- 1 david david 8 Jul 17 03:54 dat/tst/datb.dat
-rw-r--r-- 1 david david 60 Jul 27 23:24 dat/tst/vowels.txt
-rw-r--r-- 1 david david 134 Aug 11 00:32 dat/tst/outfile.txt
-rw-r--r-- 1 david david 4622 Jun 26 02:49 dat/tst/single.xml
-rw-r--r-- 1 david david 99 Jul 4 14:51 dat/tst/hostnm
-rw-r--r-- 1 david david 115 Aug 7 01:35 dat/tst/ltags.txt
-rw-r--r-- 1 david david 122 Aug 29 11:11 dat/tst/hh.dat
-rw-r--r-- 1 david david 509 Jul 21 17:28 dat/tst/orders.txt
-rw-r--r-- 1 david david 205 Jun 27 01:06 dat/tst/table.html
(snip)
Continue with ACTUAL removal (YES to remove) : YES
result:
$ ls -l dat/tst
total 168
-rw-r--r-- 1 david david 74 Sep 9 01:23 1.txt
-rw-r--r-- 1 david david 74 Sep 9 01:23 2.txt
-rw-r--r-- 1 david david 291 Sep 23 03:00 createfile.txt
-rw-r--r-- 1 david david 369 Oct 2 14:25 dia.txt
-rw-r--r-- 1 david david 36 Nov 6 15:51 dicta.dat
-rw-r--r-- 1 david david 23895 Sep 9 17:14 dna.dat
-rw-r--r-- 1 david david 243 Nov 4 23:07 domain.dat
-rw-r--r-- 1 david david 276 Nov 23 00:32 ecread.dat
-rw-r--r-- 1 david david 93 Nov 2 21:43 empdata.dat
(snip)
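Neither method above keeps the latest file of each month, which the question asks for. Below is a minimal bash 4 sketch of that rule, assuming filenames of the form name_YYYYMMDD.ext as in the question. It only prints the files that would be removed (nothing is deleted), keeps the latest date per name, month and extension, and takes the cutoff as a yyyymmdd string. list_removable is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: print files whose embedded date (name_YYYYMMDD.ext) is before CUTOFF,
# except the latest file of each month for each name/extension.
# Nothing is deleted. Requires bash 4+ (associative arrays).
list_removable() {
    local dir=$1 cutoff=$2 f base stem ext fdate key
    local -A latest=()
    local -a files=()
    for f in "$dir"/*_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].*; do
        [ -f "$f" ] || continue
        base=${f##*/}                         # vtm_mtd_20130622.txt
        ext=${base##*.}                       # txt
        stem=${base%.*}                       # vtm_mtd_20130622
        fdate=${stem##*_}                     # 20130622
        key="${stem%_*}|${fdate:0:6}|$ext"    # name | yyyymm | extension
        # remember the latest day seen for this name/month/extension
        if [[ -z ${latest[$key]} || $fdate > ${latest[$key]} ]]; then
            latest[$key]=$fdate
        fi
        files+=("$f")
    done
    for f in "${files[@]}"; do
        base=${f##*/}; ext=${base##*.}; stem=${base%.*}; fdate=${stem##*_}
        key="${stem%_*}|${fdate:0:6}|$ext"
        # older than the cutoff and not the latest file of its month -> removable
        if (( fdate < cutoff )) && [[ $fdate != "${latest[$key]}" ]]; then
            printf '%s\n' "$f"
        fi
    done
}
```

With GNU date, list_removable dat/fstst "$(date -d '60 days ago' +%Y%m%d)" prints the candidates; review the list before passing it to rm.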

How do I batch renumber files in sequence?

I have a sequence of image files that look like this:
image-149454.jpg
image-149455.jpg
I have some other images that I want to append to the end of the sequence, where it left off, but currently they are numbered from 0 (i.e. image-000000 to image-010000).
What script could I use to rename those new images starting from a given number, in this case 149456, and counting up from there?
I would try something like this. You just have to adjust the offset and add a prefix in front of the name, as you need.
x=1                                  # starting number: adjust the offset, e.g. x=149456
for i in *.jpg; do
    temp=$(printf "%08d.jpg" "$x")   # zero-padding, since you seem to want it
    mv -- "$i" "$temp"
    x=$((x + 1))
done
Here's an untested solution in Python. It renames all of the .jpg files in the second directory so that their numbers continue after the highest number in the first directory.
To use it, run: script.py dir1 dir2
This assumes the original files (with the higher numbers, up to 149455) are in dir1 and the new files (starting from 000000) are in dir2:
import os
import re
import sys

max_image = 0
# check that both the current (higher numbers) and other (lower numbers) directories are given
if len(sys.argv) == 3:
    current_dir, other_dir = sys.argv[1], sys.argv[2]
    for name in os.listdir(current_dir):  # first dir is current
        if name.endswith(".jpg"):
            # for all jpgs, keep the highest sequence number
            m = re.search(r"-(\d+)", name)
            if m:
                max_image = max(max_image, int(m.group(1)))
    for name in os.listdir(other_dir):  # second dir is other
        if name.endswith(".jpg"):
            m = re.search(r"-(\d+)", name)
            if m:
                number = int(m.group(1))
                # continue after the max from the current folder (+1 so 000000 does not reuse it)
                new_name = "image-" + str(number + max_image + 1) + ".jpg"
                os.rename(os.path.join(other_dir, name), os.path.join(other_dir, new_name))
Here's a pure bash solution that uses the last image's sequence number as your starting point (per your request):
#!/bin/bash
# grab the highest sequence number (sort first, then take the last line)
last_seq=$(ls image-*.jpg | sort | tail -n 1 | cut -c7-)
last_seq=${last_seq%.jpg}   # remove the trailing .jpg
if [ -z "$last_seq" ]; then
    echo "Unable to obtain the last image sequence number"
    exit 1
fi
for image in image-0*.jpg; do
    [ -f "$image" ] || break            # in case no files match image-0*.jpg
    last_seq=$((10#$last_seq + 1))      # 10# avoids octal interpretation of leading zeros
    mv -v "$image" "image-$(printf "%06d.jpg" "$last_seq")"
done
Here's how it worked locally:
before the run:
~/tmp › ls -l
total 80
-rw-r--r-- 1 dyoung staff 0 Dec 12 10:20 image-00000.jpg
-rw-r--r-- 1 dyoung staff 0 Dec 12 10:20 image-01000.jpg
-rw-r--r-- 1 dyoung staff 0 Dec 12 10:05 image-14908.jpg
-rw-r--r-- 1 dyoung staff 0 Dec 12 10:05 image-14909.jpg
-rw-r--r-- 1 dyoung staff 0 Dec 12 10:05 image-14910.jpg
-rw-r--r-- 1 dyoung staff 418 Dec 12 10:24 testh.sh
during the run:
~/tmp › sh ./testh.sh
image-00000.jpg -> image-14911.jpg
image-01000.jpg -> image-14912.jpg
after the run:
~/tmp › ls -l
total 80
-rw-r--r-- 1 dyoung staff 0 Dec 12 10:05 image-14908.jpg
-rw-r--r-- 1 dyoung staff 0 Dec 12 10:05 image-14909.jpg
-rw-r--r-- 1 dyoung staff 0 Dec 12 10:05 image-14910.jpg
-rw-r--r-- 1 dyoung staff 0 Dec 12 10:20 image-14911.jpg
-rw-r--r-- 1 dyoung staff 0 Dec 12 10:20 image-14912.jpg
-rw-r--r-- 1 dyoung staff 429 Dec 12 12:25 testh.sh
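A variation that avoids parsing ls (it finds the highest number numerically with a glob) and moves the new images into the original directory. The two-directory layout, the image- prefix and the six-digit padding are assumptions taken from the question; append_sequence is a hypothetical name, and mv -n refuses to overwrite an existing file.

```bash
#!/usr/bin/env bash
# Sketch: continue the image-NNNNNN.jpg sequence in directory $1 with the
# images from directory $2 (numbered from 0), moving them into $1.
append_sequence() {
    local cur=$1 new=$2 f n max=-1
    # highest existing number, compared numerically (10# avoids octal)
    for f in "$cur"/image-*.jpg; do
        [ -f "$f" ] || continue
        n=${f##*/image-}
        n=${n%.jpg}
        [[ $n =~ ^[0-9]+$ ]] || continue
        if (( 10#$n > max )); then max=$((10#$n)); fi
    done
    if (( max < 0 )); then
        echo "no numbered images found in $cur" >&2
        return 1
    fi
    # new files are renamed in glob (sorted) order, continuing after max
    for f in "$new"/image-*.jpg; do
        [ -f "$f" ] || continue
        max=$((max + 1))
        mv -n -- "$f" "$cur/$(printf 'image-%06d.jpg' "$max")"
    done
}
```

For the question's data, append_sequence originals new_images would turn image-000000.jpg into image-149456.jpg, image-000001.jpg into image-149457.jpg, and so on.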
