I have a multi-level partitioned Hive table, and I need to delete the partition folders that are older
than a certain number of years.
The multi-level partitions look like this:
/data/warehouse/suite/catalyst/site/company=abc/year=2019/month=08
/data/warehouse/suite/catalyst/site/company=cde/year=2018/month=05
/data/warehouse/suite/catalyst/site/company=cde/year=2017/month=11
/data/warehouse/suite/catalyst/site/company=cde/year=2016/month=11
If I want to delete the partitions older than 2 years, that means /year=2017/month=11 and /year=2016/month=11 need to be deleted. How can that be done?
Please help; thanks in advance.
ALTER TABLE mytable drop if exists partition (year<='2017')
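Building on that form (assuming, as above, a table named mytable with string-typed partition columns), a hedged sketch that computes the "two years ago" cutoff from the shell instead of hard-coding the year:
# cutoff is the current year minus 2; the comparator in DROP PARTITION should also
# remove matching partitions under every company= value above the year level
cutoff_year=$(( $(date +%Y) - 2 ))
hive -e "ALTER TABLE mytable DROP IF EXISTS PARTITION (year <= '${cutoff_year}')"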
You cannot control partition deletion the way you are expecting.
You can do it the Unix way instead, which is more reliable:
hive -S -e "show partitions test" > tmp.txt

# cutoff = two years back from today, as YYYYMM (keep the month zero-padded
# so the numeric comparison below works)
curr_year=$(( $(date +'%Y') - 2 ))
curr_mon=$(date +'%m')
cur_part=${curr_year}${curr_mon}
echo "$cur_part"

while read -r line
do
    part_year=$(echo "$line" | cut -d '=' -f 2 | grep -o -E '[0-9]+')
    part_mon=$(echo "$line" | cut -d '=' -f 3 | grep -o -E '[0-9]+')
    part_part=${part_year}${part_mon}
    echo "$part_part"
    if [[ $part_part -lt $cur_part ]]
    then
        echo "$part_year , $part_mon"
        hive --hivevar year="$part_year" --hivevar month="$part_mon" \
             -e 'ALTER TABLE test DROP IF EXISTS PARTITION (year="${hivevar:year}", month="${hivevar:month}")'
    fi
done < tmp.txt
Partitions before running the script:
> show partitions test;
OK
year=2016/month=12
year=2017/month=11
year=2017/month=12
year=2018/month=12
Partitions after:
> show partitions test;
OK
year=2017/month=12
year=2018/month=12
I have tested this and it works fine.
Related
I use youtube-dl to archive specific blogs, and a custom bash script (called tvify) to organize the content into Plex-ready filenames for later replay via my home Plex server.
Archiving the content works fine unless a blogger posts more than one video on the same date - if that happens my script creates more than one file for a given month/date and Plex sees a duplicate episode. In the Plex app, the files get stuffed together as distinct 'versions' of the same episode. The result is that the description of the video no longer matches its contents, and only one 'version' appears unless I open an additional sub-menu.
The videos get downloaded by youtube-dl kicked off from a cron job, and that downloader script runs the script below to format their filenames and put them into the appropriate folders for 'seasons'.
The season is the year when the video was released, and the episode is the combination of the month and date in MMDD format.
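So, for example, a video released on 2019-08-05 maps to season s2019 and episode e0805; in isolation the mapping looks roughly like this (the date value is hypothetical, pulled in practice from ffprobe's date tag):
release_date="20190805"            # hypothetical value
season="s${release_date:0:4}"      # -> s2019
episode="e${release_date:4:4}"     # -> e0805
echo "$season $episode"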
Below is my 'tvify' script, which helps perform the filename manipulation and stuffs the file into the proper folder for the season.
#!/bin/bash
mySuff="$1"
echo mySuff="$mySuff"
if [ -z "$1" ]; then
mySuff="*.mp4"
fi
for i in $mySuff
do
prb=`ffprobe -- "$i" 2>&1`
myDate=`echo "$prb" | grep -E 'date\s+:' | cut -d ':' -f 2`
myartist=`echo "$prb" | grep -E 'artist\s+:' | cut -d ':' -f 2`
myTitle=`echo "$prb" | grep -E 'title\s+:' | cut -d ':' -f 2 | sed 's/\//_/g'`
cwd_stub=`pwd | awk -F'/' '{print $NF}'`
if [ -d "s${myDate:1:4}" ]; then echo "Directory found" > /dev/null; else mkdir "s${myDate:1:4}"; fi
[ -d "s${myDate:1:4}" ] && mv -- "$i" "s${myDate:1:4}/${myartist[#]:1} - s${myDate:1:4}e${myDate:5:8} - ${myTitle[#]:1:40} _$i" || mv -- "$i" "${myartist[#]:1} - s${myDate:1:4}e${myDate:5:8} - ${myTitle[#]:1:40} _$i"
done
How can I modify that script to identify if a conflicting year/MMDD file exists, and if so, append an appropriate suffix to the episode number so that plex will interpret them as distinct episodes?
I ended up implementing an array, counting the number of elements in the array, and using that to append the integer:
#!/bin/bash
mySuff="$1"
echo mySuff="$mySuff"
if [ -z "$1" ]; then
mySuff="*.mp4"
fi
for i in $mySuff
do
prb=`ffprobe -- "$i" 2>&1`
myDate=`echo "$prb" | grep -E 'date\s+:' | cut -d ':' -f 2`
myartist=`echo "$prb" | grep -E 'artist\s+:' | cut -d ':' -f 2`
myTitle=`echo "$prb" | grep -E 'title\s+:' | cut -d ':' -f 2 | sed 's/\//_/g'`
cwd_stub=`pwd | awk -F'/' '{print $NF}'`
readarray -t conflicts < <(find . -maxdepth 2 -iname "*s${myDate:1:4}e${myDate:5:8}*" -type f -printf '%P\n')
[ ${#conflicts[@]} -gt 0 ] && _inc=${#conflicts[@]} || _inc=
if [ -d "s${myDate:1:4}" ]; then echo "Directory found" > /dev/null; else mkdir "s${myDate:1:4}"; fi
[ -d "s${myDate:1:4}" ]
&& mv -- "$i" "s${myDate:1:4}/${myartist[#]:1} - s${myDate:1:4}e${myDate:5:8}$_inc - ${myTitle[#]:1:40} _$i"
|| mv -- "$i" "${myartist[#]:1} - s${myDate:1:4}e${myDate:5:8}$_inc - ${myTitle[#]:1:40} _$i"
done
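The heart of the change is the conflict count: find lists any file that already uses the same sYYYYeMMDD code, and the array length becomes the numeric suffix. A minimal standalone sketch of that idea (the episode code is hardcoded purely for illustration):
epcode="s2019e0805"   # hypothetical episode code
readarray -t conflicts < <(find . -maxdepth 2 -iname "*${epcode}*" -type f -printf '%P\n')
[ ${#conflicts[@]} -gt 0 ] && _inc=${#conflicts[@]} || _inc=
echo "next file gets the code: ${epcode}${_inc}"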
I assign a keyword to a variable and need to awk lines out of a file using this variable in a loop. The file has millions of lines.
I have tried the code below.
DEVICE="DEV2"
while read -r line
do
echo $line
X_keyword=`echo $line | cut -d ',' -f 2 | grep -w "X" | cut -d '=' -f2`
echo $X_keyword
done <<< "$(grep -w $DEVICE $config)"
log="Dev2_PRT.log"
while read -r file
do
VALUE=`echo $file | cut -d '|' -f 1`
HEADER=`echo $VALUE | cut -c 1-4`
echo $file
if [[ $HEADER = 'PTR:' ]]; then
VALUE=`echo $file | cut -d '|' -f 4`
echo $VALUE
XCOORD+=($VALUE)
((X++))
fi
done <<< "awk /$X_keyword/ $log"
Expected result: the log file contains lots of lines like these:
PTR:1|2|3|4|X_keyword
PTR:1|2|3|4|Y_rest .....
Filter on X_keyword and get field number 4.
Unfortunately your shell script is simply the wrong approach to this problem (see https://unix.stackexchange.com/q/169716/133219 for some of the reasons why), so you should set it aside and start over.
To demonstrate the solution, let's create a sample input file:
$ seq 10 | tee file
1
2
3
4
5
6
7
8
9
10
and a shell variable to hold a regexp that's a character list of the chars 5, 6, or 7:
$ var='[567]'
Now, given the above input, here is how to g/re/p with the pattern in a variable and count how many lines match:
$ awk -v re="$var" '$0~re{print; c++} END{print "---" ORS c+0}' file
5
6
7
---
3
If that's not all you need then please edit your question to clarify your requirements and provide concise, testable sample input and expected output.
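That said, applied to the log format shown in the question (PTR lines with pipe-separated fields and the keyword in the last one), a hedged sketch of the same -v idea that prints field 4 of matching lines might look like:
# -F'|' splits on the pipes; -v passes the shell variable to awk safely
X_keyword="X_keyword"     # hypothetical: whatever was parsed out of the config earlier
awk -F'|' -v kw="$X_keyword" '/^PTR:/ && $NF ~ kw { print $4 }' Dev2_PRT.log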
I have a script that looks something like this:
#!/bin/bash$
#x=new value$
#y=old value$
$
export PATH=/xxx/xxx/xxx:$PATH$
$
#get the difference of two files$
diff --side-by-side --suppress-common-lines file.txt file1.txt | tr -d "|,<,>,'\t'" | sed 's/ /:/g' | sed 's/^://' > diff.txt$
cat diff.txt$
$
#get the values$
for i in `cat diff.txt`; do$
plug_x=`echo $i | cut -d ":" -f1`$
echo "the value of jenkins plugin is $plug_x"$
ver_x=`echo $i | cut -d ":" -f2`$
echo "the value of jenkins version is $ver_x"$
plug_y=`echo $i | cut -d ":" -f3`$
echo "the value of db plugin is $plug_y"$
ver_y=`echo $i | cut -d ":" -f4`$
echo "the value of db version is $ver_y"$
if [ -z "$ver_y" ] && [ -z "$ver_x" ] ;$
then $
echo "the plugin is newly added"$
#newly added plugin should be updated in the db$
# mysql -u root -ppassword -h server --local-infile db << EOFMYSQL$
#update the table with the new version$
#EOFMYSQL$
else$
echo "the plugin has changes"$
mysql -u root -ppassword -h server --local-infile db << EOFMYSQL$
insert into table (xxx, xxx) values('$ver_x','$plug_x');$
$
EOFMYSQL $
fi$
done$
But when I run this script it says:
Syntax error: end of file unexpected (expecting "fi")
The fi is there; I can't figure out why it is throwing the error. The error does not occur when I only have echo statements in the script.
I would suggest that if you are just running that one query to insert into the table, you echo the query and pipe it to the mysql command; in my opinion that is a simpler and more efficient choice.
#!/bin/bash
diff --side-by-side --suppress-common-lines test1.txt test2.txt | tr -d "|,<,>,'\t'" | sed 's/ /:/g' | sed 's/^://' > ./diff.txt
for i in `cat ./diff.txt`; do
plug_x=`echo $i | cut -d ":" -f1`
echo "the value of jenkins plugin is $plug_x"
ver_x=`echo $i | cut -d ":" -f2`
echo "the value of jenkins version is $ver_x"
plug_y=`echo $i | cut -d ":" -f3`
echo "the value of db plugin is $plug_y"
ver_y=`echo $i | cut -d ":" -f4`
echo "the value of db version is $ver_y"
if [ -z "$ver_y" ] && [ -z "$ver_x" ]
then
echo "the plugin is newly added"
else
echo "the plugin has changes"
echo "insert into table (xxx, xxx) values('$ver_x','$plug_x')" | mysql -u root -ppassword -h server --local-infile db
fi
done
That should do the job. Or, if you'd like to use a while loop, you can certainly do so:
#!/bin/bash
diff --side-by-side --suppress-common-lines test1.txt test2.txt | tr -d "|,<,>,'\t'" | sed 's/ /:/g' | sed 's/^://' > ./diff.txt
cat ./diff.txt | while read i
do
plug_x=`echo $i | cut -d ":" -f1`
echo "the value of jenkins plugin is $plug_x"
ver_x=`echo $i | cut -d ":" -f2`
echo "the value of jenkins version is $ver_x"
plug_y=`echo $i | cut -d ":" -f3`
echo "the value of db plugin is $plug_y"
ver_y=`echo $i | cut -d ":" -f4`
echo "the value of db version is $ver_y"
if [ -z "$ver_y" ] && [ -z "$ver_x" ]
then
echo "the plugin is newly added"
else
echo "the plugin has changes"
echo "insert into table (xxx, xxx) values('$ver_x','$plug_x')" | mysql -u root -ppassword -h server --local-infile db
fi
done
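As an aside, the trailing $ markers in the question look like cat -e output, and they show the here-document terminator as "EOFMYSQL " with a trailing space; a terminator that is not exactly the delimiter, alone on its line, never closes the heredoc, which is a classic cause of the "end of file unexpected" error. If you prefer to keep the heredoc form, a hedged sketch of the corrected block (values hypothetical):
#!/bin/bash
ver_x="1.2.3"; plug_x="some-plugin"   # hypothetical values for illustration
# the closing EOFMYSQL must be alone on its line, with no trailing whitespace
mysql -u root -ppassword -h server --local-infile db <<EOFMYSQL
insert into table (xxx, xxx) values('$ver_x','$plug_x');
EOFMYSQL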
Ok, here's the problem.
I have a plaintext list of IP addresses that I'm blocking on my servers, growing more and more unwieldy every day (added 3000+ entries today alone).
It's already been sorted for duplicates so that's not a problem. What I'd like to do is write a script to go through it and consolidate the entries a bit better for mass blocking.
For example, take this:
2.132.35.104
2.132.79.240
2.132.99.87
2.132.236.34
2.132.245.30
And turn it into this:
2.132.0.0/16
Any suggestions on how to code that in a bash script?
UPDATE: I've partly worked out how to do what I need. Converting to /24 is easy, as follows:
cat /usr/local/blocks/blocks.txt | while read line; do
oc1=`echo "$line" | cut -d '.' -f 1`
oc2=`echo "$line" | cut -d '.' -f 2`
oc3=`echo "$line" | cut -d '.' -f 3`
oc4=`echo "$line" | cut -d '.' -f 4`
echo "$oc1.$oc2.$oc3.0/24" >> twentyfour.srt
done
sort -u twentyfour.srt > twentyfour.txt
rm -f twentyfour.srt
ori=`cat /usr/local/blocks/blocks.txt | wc -l`
new=`cat twentyfour.txt | wc -l`
echo "$ori"
echo "$new"
That reduced it down from 4,452 entries to 4,148 entries.
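(As an aside, the whole /24 step can be done in one pass; a hedged equivalent of the loop above:)
# same /24 collapse as the loop above, as a single pipeline
awk -F. '{ print $1"."$2"."$3".0/24" }' /usr/local/blocks/blocks.txt | sort -u > twentyfour.txt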
Instead of having:
109.86.9.93
109.86.26.77
109.86.55.225
109.86.70.224
109.86.87.199
109.86.89.202
109.86.95.248
109.86.100.19
109.86.110.43
109.86.145.216
109.86.152.86
109.86.155.238
109.86.156.54
109.86.187.91
109.86.228.86
109.86.234.51
109.86.239.61
I now have:
109.86.100.0/24
109.86.110.0/24
109.86.145.0/24
109.86.152.0/24
109.86.155.0/24
109.86.156.0/24
109.86.187.0/24
109.86.228.0/24
109.86.234.0/24
109.86.239.0/24
109.86.26.0/24
109.86.55.0/24
109.86.70.0/24
109.86.87.0/24
109.86.89.0/24
109.86.9.0/24
109.86.95.0/24
All well and good. But there are 17 entries in the 109.86 range. Where the first two octets match more than, say, 5 of the /24 entries, I'd like to reduce them to a single /16.
That's where I'm stuck.
UPDATE 2:
For Steve: Here's the block list for today. And here's the result so far. Apparently it's not removing the near-duplicate entries from twentyfour that are in sixteen.
I wish I could tell you this is a simple filter. However, all of the 2.0.0.0/8 network is registered to RIPE NCC. There are just too many different ranges of blocked IP addresses; it's easier to narrow down the scope of visitors you do want versus those you don't.
You could also use various tools to block attacks automatically.
Use this map to identify which is which: https://www.iana.org/numbers
Here's a script I just made for you. With it you can create the major block lists for each of the primary registries: AfriNIC, APNIC, ARIN, LACNIC, and RIPE.
create_tables_by_registry.sh
Just run this script, then run the generated registry .sh files (e.g. ripe.sh).
#!/bin/bash
# Author: Steve Kline
# Date: 03-04-2014
# Designed and tested to run on properly on CentOS 6.5
#Grab Updated IANA Address Space Assignments only if Newer Version
wget -N https://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.txt
assigned=ipv4-address-space.txt
arrayregistry=( afrinic apnic arin lacnic ripe )
for registry in "${arrayregistry[@]}"
do
#Clean up the ipv4-address-space.txt file and keep useable IPs
grep "$registry" $assigned | sed 's/\/8/\.0\.0\.0\/8/g'| colrm 15 > $registry-tmp1.txt
ip=($(cat $registry-tmp1.txt))
echo "#!/bin/bash" > $registry.sh
for ip in "${ip[@]}"
do
echo $ip | sed -e 's/" "//g' > $registry-tmp2.txt
#INSERT OR MODIFY YOUR COMPATIBLE FIREWALL RULES HERE
#This section creates the country to block.
echo "iptables -A INPUT -s $ip -j DROP" >> $registry.sh
chmod +x $registry.sh
done
rm $registry-tmp1.txt -f
rm $registry-tmp2.txt -f
done
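A hedged usage sketch, assuming the generated per-registry scripts land in the current directory:
chmod +x create_tables_by_registry.sh
./create_tables_by_registry.sh     # writes afrinic.sh apnic.sh arin.sh lacnic.sh ripe.sh
sudo ./ripe.sh                     # loads the iptables DROP rules for the RIPE /8 blocks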
Ok! Well I'm back, a little insane here and a little nutty there... I think I helped figure this out for you. I'm sure you can piece together a modification to better fit your needs.
#MODIFY FOR YOUR LIST OF IP ADDRESSES
BADIPS=block.ip
twentyfour=./twentyfour.ips #temp file for all IPs converted to twentyfour net ids
sixteen=./sixteen.ips #temp file for sixteen bit
twentyfourlst1=./twentyfour1.txt #temp file for 24 bit IDs
twentyfourlst2=./twentyfour2.txt #temp file for 24 bit IDs filtered by 16 bit IDs that match
sixteenlst=./sixteen.txt #temp file for parsed sixteenbit
#MODIFY FOR YOUR OUTPUT OF CIDR ADDRESSES
finalfile=./blockips.list #Final file post-merge
cat $BADIPS | while read line; do
oc1=`echo "$line" | cut -d '.' -f 1`
oc2=`echo "$line" | cut -d '.' -f 2`
oc3=`echo "$line" | cut -d '.' -f 3`
oc4=`echo "$line" | cut -d '.' -f 4`
echo "$oc1.$oc2.$oc3.0/24" >> $twentyfour
echo "$oc1.$oc2.0.0/16" >> $sixteen
done
awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>4){print i,a[i]}}}' $sixteen | sed 's/ [0-9]\| [0-9][0-9]\| [0-9][0-9][0-9]//g' > $sixteenlst
sort -u $twentyfour > twentyfour.txt
# THIS FINDS NEAR DUPLICATES MATCHING FIRST TWO OCTETS
cat $sixteenlst | while read line; do
oc1=`echo "$line" | cut -d '.' -f 1`
oc2=`echo "$line" | cut -d '.' -f 2`
oc3=`echo "$line" | cut -d '.' -f 3`
oc4=`echo "$line" | cut -d '.' -f 4`
grep "\b$oc1.$oc2\b" twentyfour.txt >> duplicates.txt
done
#THIS REMOVES THE NEAR DUPLICATES FROM THE TWENTYFOUR FILE
fgrep -vw -f duplicates.txt twentyfour.txt > twentyfourfinal.txt
#THIS MERGES BOTH RESULTS
cat twentyfourfinal.txt $sixteenlst > $finalfile
sort -u $finalfile
ori=`cat $BADIPS | wc -l`
new=`cat $finalfile | wc -l`
echo "$ori"
echo "$new"
#LAST MIN CLEANUP
rm -f $twentyfour $twentyfourlst $sixteen $sixteenlst duplicates.txt twentyfourfinal.txt
Going back to fix: I noted a problem; the original version was unsuccessful. It used:
grep "$oc1.$oc2" twentyfour.txt > duplicates.txt
For example, the old script had bad results with the test IP range below; the updated version above does exactly what is intended: it matches the octets exactly, not anything merely similar.
192.168.1.1
192.168.2.50
192.168.5.23
192.168.14.10
192.168.10.5
192.168.24.25
192.165.20.10
10.192.168.30
5.76.10.20
5.76.20.30
5.76.250.10
5.76.34.10
5.76.50.30
95.76.30.1 - Old script matched this to 5.76
20.20.5.5
20.20.10.10
20.20.16.50
20.20.205.20
20.20.60.20
205.20.16.20 - not a problem
20.205.150.150 - Old script matched this to 20.20
220.20.16.0 - Also failed without adding -w parameter to the last grep to only match exact strings.
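A quick way to see the difference the word boundaries make (hedged illustration, GNU grep):
# without anchors, "5.76" also matches 95.76.30.1 (and the "." matches any character)
printf '%s\n' 95.76.30.1 5.76.10.20 | grep "5.76"        # matches both lines
printf '%s\n' 95.76.30.1 5.76.10.20 | grep "\b5\.76\b"   # matches only 5.76.10.20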
I've been trying to get this to work for the last week and cannot figure out why it's not working. I get mixed results typing directly into the terminal, but I keep getting syntax error messages when running from the .sh. I'm using Ubuntu 11.10.
It looks like part of the mount command gets pushed to the next line, not allowing it to complete properly. I have no idea why this is happening or how to prevent it from going to the second line.
I have several lines defined as follows in mounts.txt, which gets read by mount-drives.sh below.
I'm calling it with sudo, so it shouldn't be a permissions issue.
Thanks for taking a look; let me know if additional info is needed.
mounts.txt
mountname,//server/share$,username,password,
mount-drives.sh --- original (updated version below)
#!/bin/bash
while read LINE;
do
# split lines up using , to separate variables
name=$(echo $LINE | cut -d ',' -f 1)
path=$(echo $LINE | cut -d ',' -f 2)
user=$(echo $LINE | cut -d ',' -f 3)
pass=$(echo $LINE | cut -d ',' -f 4)
echo $name
echo $path
echo $user
echo $pass
location="/mnt/test/$name/"
if [ ! -d $location ]
then
mkdir $location
fi
otherstuff="-o rw,uid=1000,gid=1000,file_mode=0777,dir_mode=0777,username=$user,password=$pass"
mount -t cifs $otherstuff $path $location
done < "/path/to/mounts.txt";
mount-drives.sh ---updated
#!/bin/bash
while read LINE
do
name=$(echo $LINE | cut -d ',' -f 1)
path=$(echo $LINE | cut -d ',' -f 2)
user=$(echo $LINE | cut -d ',' -f 3)
pass=$(echo $LINE | cut -d ',' -f 4)
empty=$(echo $LINE | cut -d ',' -f 5)
location="/mount/test/$name/"
if [ ! -d $location ]
then
mkdir $location
fi
mounting="mount -t cifs $path $location -o username=$user,password=$pass,rw,uid=1000,gid=1000,file_mode=0777,dir_mode=0777"
$mounting
echo $mounting >> test.txt
done < "/var/www/MediaCenter/mounts.txt"
A stab in the dark (after reading the comments): "$pass" is picking up a stray carriage return because mounts.txt was created on Windows and has Windows line endings. Try changing the echo $pass line to:
echo ---${pass}---
and see if it all shows up correctly.
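If they do show a stray carriage return (the usual giveaway is the closing --- jumping back to the start of the line), a hedged fix is to strip the CRs before the loop ever reads the file:
# remove Windows line endings so $pass no longer carries a trailing \r
tr -d '\r' < /path/to/mounts.txt > /path/to/mounts.unix.txt
# ...or convert the file in place once, if dos2unix is installed:
# dos2unix /path/to/mounts.txt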
There's a lot here that could stand improvement. Consider the following -- far more compact, far more correct -- approach:
while IFS=, read -u 3 -r name path user pass empty _; do
  location=/mnt/test/$name     # target mount point, as in the original script
  mkdir -p "$location"
cmd=( mount \
-t cifs \
-o "rw,uid=1000,gid=1000,file_mode=0777,dir_mode=0777,username=$user,password=$pass" \
"$path" "$location" \
)
printf -v cmd_str '%q ' "${cmd[@]}" # generate a string corresponding with the command
echo "$cmd_str" >>test.txt # append that string to our output file
"${cmd[@]}" # run the command in the array
done 3<mounts.txt
Unlike the original, this will work correctly even if your path or location values contain whitespace.