How to check the string length using grep? - bash

I have a lot of Teradata SQL files. An example file looks like this:
create multiset volatile table abcde_fghijk_lmnop as(
select
a.oppnl3_budssstr as nip,
from T45_BACKJJU_33KUT.BRANDFO9 a
) with data on commit preserve rows;
create multiset volatile table mari_lee as(
select
b.getter3,
from maleno_fugi75_pratq b
) with data on commit preserve rows;
create multiset table blabla1 as (
select
a.atomic94,
from b4ty7_manto.pretyu59_bxcx a
) with data on commit preserve rows;
CREATE multiset table blablabla2 AS (
SELECT
a.prompter_to12
FROM tresh_old44 a
) WITH data on commit preserve rows;
CREATE multiset table blablablabla3 AS (
SELECT
c.future_opt86
FROM GFTY_133URO c
) WITH data on commit preserve rows;
I want to write a grep command that checks the length of each table name, which must not exceed 10 characters.
I have written a few greps, but none of them works, and I don't know why. What have I done wrong?
for f in /path/to/sql/files/*.sql; do
if grep -q ' table \{1,10\}' "$f"; then
echo "correct length of table name $f"
fi
done
Other greps I tried:
if grep -q ' table \{1,10\} as ' "$f"; then
if grep -q ' table \[[:alnum:]]\{1,10\} ' "$f"; then
if grep -q ' table\[[:space:]][[:alnum:]]\{1,10\} ' $f; then

There are a couple of problems with your attempts. Firstly, it looks like you're escaping the [ in some of your bracket expressions, which means that the [ will be interpreted as a literal character instead. Secondly, you need to take care to match 1 to 10 legal characters, followed by a different character.
This pattern does what you want (I removed the -q so that you can see which table definitions match):
$ grep ' table [[:alnum:]_]\{1,10\}[^[:alnum:]_]' file
create multiset volatile table mari_lee as(
create multiset table blabla1 as (
CREATE multiset table blablabla2 AS (
This pattern matches 1 to 10 alphanumeric characters or underscores, followed by a different character, meaning that the longer table names no longer match.
As it appears that the casing is inconsistent, you should probably also use the -i switch to grep, to enable case-insensitive matching. Otherwise, any definitions that use "TABLE" would not match.
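Putting it together, the original loop with the corrected pattern might look like this (the temp directory and the two sample lines are illustrative, not the asker's actual files):

```shell
# Build a small sample file, then test each .sql file in the directory.
dir=$(mktemp -d)
cat > "$dir/a.sql" <<'EOF'
create multiset volatile table abcde_fghijk_lmnop as(
CREATE multiset table blablabla2 AS (
EOF
for f in "$dir"/*.sql; do
  # -i: match TABLE/table alike; bracket expressions are NOT escaped
  if grep -qi ' table [[:alnum:]_]\{1,10\}[^[:alnum:]_]' "$f"; then
    echo "correct length of table name $f"
  fi
done
```

Only the second sample line matches ("blablabla2" is exactly 10 characters; the 18-character name on the first line cannot be followed by a non-word character within 10 positions).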

Use grep with a word boundary to list only the valid table names:
grep -E 'table +.{1,10}\b' "$f"
create multiset volatile table mari_lee as(
create multiset table blabla1 as (
CREATE multiset table blablabla2 AS (
To suppress the output, use -q and check the exit status instead:
grep -qE 'table +.{1,10}\b' "$f"
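For example, wired into the asker's loop (the temp file below stands in for one of the .sql files):

```shell
# Illustrative: a one-line stand-in for a real SQL file
tmp=$(mktemp)
echo 'create multiset table blabla1 as (' > "$tmp"
# -q prints nothing; the if tests grep's exit status
if grep -qE 'table +.{1,10}\b' "$tmp"; then
  echo "correct length of table name $tmp"
fi
```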

Related

bash / sed / awk Remove or gsub timestamp pattern from text file

I have a text file like this:
1/7/2017 12:53 DROP TABLE table1
1/7/2017 12:53 SELECT
1/7/2017 12:55 --UPDATE #dat_recency SET
Select * from table 2
into table 3;
I'd like to remove all of the timestamp patterns (M/D/YYYY HH:MM, M/DD/YYYY HH:MM, MM/D/YYYY HH:MM, MM/DD/YYYY HH:MM). I can find the patterns using grep but can't figure out how to use gsub. Any suggestions?
DESIRED OUTPUT:
DROP TABLE table1
SELECT
--UPDATE #dat_recency SET
Select * from table 2
into table 3;
You can use this sed command to remove the date/time stamps from the start of each line:
sed -i.bak -E 's~([0-9]{1,2}/){2}[0-9]{4} [0-9]{2}:[0-9]{2} *~~' file
cat file
DROP TABLE table1
SELECT
--UPDATE #dat_recency SET
Select * from table 2
into table 3;
Using the default whitespace separator, set the first and second columns to the empty string, then print the whole line.
awk '/^[0-9]/{$1=$2="";gsub(/^[ \t]+|[ \t]+$/, "");print} !/^[0-9]/{print}' sample.csv
The command checks whether each line starts with a digit; if it does, it replaces the first two columns with empty strings, trims the leftover leading spaces, and prints the result; otherwise it prints the original line unchanged.
output:
DROP TABLE table1
SELECT
--UPDATE #dat_recency SET
Select * from table 2
into table 3;
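For comparison, the same stripping can be written with a single sub() and awk's default print action (`1`). The sample below is a two-line excerpt of the input; the /tmp path is illustrative:

```shell
cat > /tmp/ts_sample.txt <<'EOF'
1/7/2017 12:53 DROP TABLE table1
Select * from table 2
EOF
# sub() only fires on lines that start with a timestamp; 1 prints every line
awk '{sub(/^[0-9]+\/[0-9]+\/[0-9]+ [0-9]+:[0-9]+ */,"")}1' /tmp/ts_sample.txt
```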

Hive change column name without knowing column data type

I want to change the column name of a Hive table without changing its datatype.
I tried the query below, but it requires the datatype, which I don't know.
ALTER TABLE test CHANGE a a1 INT;
I would like to prefix all my columns with SALES_, irrespective of their column types.
Input Table
emp_id(int) emp_name(string) salary(double)
Output Table
sales_emp_id(int) sales_emp_name(string) sales_salary(double)
Thanks in advance.
Well, altering a column name in Hive with the ALTER TABLE command requires its datatype.
As a workaround, you can perform the commands below.
1) Create a new table with your new column names:
create table newTable (sales_emp_id int, sales_emp_name string, sales_salary double);
2) Insert into the new table from the old table:
insert into newTable select * from oldtable;
3) Now you may drop your old table:
drop table oldtable;
The above approach works if creating a new table is acceptable to you.
Alternatively, if you use a shell script, something like below (note there must be no space after the = in the assignments, the command substitutions need $(...), and the sed that prepends the prefix must use double quotes so $PREFIX expands):
while read line; do
  SOURCE_TABLENAME=$(echo "$line" | awk -F" " '{print $1}')
  TARGET_TABLENAME=$(echo "$line" | awk -F" " '{print $2}')
  LOC=$(echo "$line" | awk -F" " '{print $3}')
  PREFIX="emp_"
  S=$(hive -e "desc $SOURCE_TABLENAME")
  VAL=$(echo "$S" | sed 's/\(\(\w\w*\W*\)\{2\}\)/\1\n/g' | sed 's/$/,/g' | sed "s/^/$PREFIX/")
  STATEMENT="CREATE TABLE $TARGET_TABLENAME ($VAL) AS SELECT * FROM $SOURCE_TABLENAME LOCATION '$LOC'"
  hive -e "$STATEMENT"
  hive -e "drop table $SOURCE_TABLENAME"
done < INPUT_FILE.txt
INPUT_FILE.txt
source_table target_table location (all inputs separated by space)
Without creating a new table, you can use REPLACE COLUMNS in Hive to change all the column names. The command looks like this:
ALTER TABLE table_name REPLACE COLUMNS (sales_emp_id INT,sales_emp_name STRING,sales_salary DOUBLE);
Now you can use the describe command to check the column names
describe table_name;

Pass external variable to xidel in bash loop script

I am trying to parse an HTML page with xidel, using XPath.
The page has a table with multiple rows and columns.
I need to get values from each row from columns 2 and 5 (IP and port) and store them in csv-like file.
Here is my script
#!/bin/bash
for (( i = 2; i <= 100; i++ ))
do
xidel http://www.vpngate.net/en/ -e '//*[@id="vg_hosts_table_id"]/tbody/tr["'$i'"]/td[2]/span[1]' >> "$i".txt #get value from column 2 (IP)
xidel http://www.vpngate.net/en/ -e '//*[@id="vg_hosts_table_id"]/tbody/tr["'$i'"]/td[5]' >> "$i".txt #get value from column 5 (port)
sed -i ':a;N;$!ba;s/\n/^/g' "$i".txt #replace newline with custom delimiter
sed -i '/\s/d' "$i".txt #remove blanks
cat "$i".txt >> ip_port_list #create list
zip -m ips.zip "$i".txt #archive unneeded texts
done
Performance is not an issue.
When I manually increment each tr it works perfectly, but not with the variable from the loop.
I want to receive a pair of values from each row.
Right now I get only partial data, or even an empty file.
I need to get values from each row from columns 2 and 5 (IP and port) and store them in csv-like file.
xidel -s "https://www.vpngate.net/en/" -e '
(//table[@id="vg_hosts_table_id"])[3]//tr[not(td[@class="vg_table_header"])]/concat(
td[2]/span[@style="font-size: 10pt;"],
",",
extract(
td[5],
"TCP: (\d+)",
1
)
)
'
220.218.70.177,443
211.58.36.54,995
1.239.223.190,1351
[...]
153.207.18.229,1542
(//table[@id="vg_hosts_table_id"])[3]: Select the 3rd table of its
kind. The one you want.
//tr[not(td[@class="vg_table_header"])]: Select all rows, except the header rows.
td[2]/span[@style="font-size: 10pt;"]: Select the 2nd column and the <span> that contains just the IP address.
extract(td[5],"TCP: (\d+)",1): Select the 5th column and extract (with a regex) the numerical value after "TCP: ".
Maybe this xidel line will come in handy:
xidel -q http://www.vpngate.net/en/ -e '//*[@id="vg_hosts_table_id"]/tbody/tr[*]/concat(td[2]/span[1],",",substring-after(substring-before(td[5],"UDP:"),"TCP: "))'
This will only do one fetch (so the admins of vpngate won't block you), and it also produces CSV output (ip,port). Hopefully that is what you were looking for.

how to select arguments from text file in bash and loop over them

I have a text file in the format below, and I want to write a bash script that stores the column names (adastatus, type, bodycomponent, ...) in a variable, say x1.
# col_name data_type comment
adastatus string None
type string None
bodycomponent string None
bodytextlanguage string None
copyishchar string None
Then, for each of the column names in x1, I want to run a loop:
alter table tabelname change x1(i) x1(i) DOUBLE;
How about:
#!/bin/sh
for i in `cut -f1 yourfile.txt`
do
SQL="alter table tablename change $i $i DOUBLE"
sql_command $SQL
done
awk '$1 !~ /^#/ {if ($1) print $1}' in.txt | \
xargs -I % echo "alter table tabelname change % % DOUBLE"
Replace echo with the command needed to run the alter statement (from @Severun's answer it sounds like sql_command).
Using awk, this matches only input lines that do not start with # (ignoring leading whitespace) and are non-empty, then prints the first whitespace-separated token, i.e., the first column value of each line.
xargs invokes the target command once for each column name, substituting the column name for % - note that % as a placeholder was arbitrarily chosen via the -I option.
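A quick self-contained run of that pipeline (the /tmp path and the three-line sample are illustrative; "tabelname" is kept as spelled in the question):

```shell
cat > /tmp/cols.txt <<'EOF'
# col_name data_type comment
adastatus string None
type string None
EOF
# The comment line is skipped; each remaining first column becomes one statement
awk '$1 !~ /^#/ {if ($1) print $1}' /tmp/cols.txt |
  xargs -I % echo "alter table tabelname change % % DOUBLE"
```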
Try:
#!/bin/bash
while read col1 _ _
do
[[ "$col1" =~ \#.* ]] && continue # skip comments
[[ -z "$col1" ]] && continue # skip empty lines
echo alter table tabelname change ${col1}\(i\) ${col1}\(i\)
done < input.txt
Output:
$ ./c.sh
alter table tabelname change adastatus(i) adastatus(i)
alter table tabelname change type(i) type(i)
alter table tabelname change bodycomponent(i) bodycomponent(i)
alter table tabelname change bodytextlanguage(i) bodytextlanguage(i)
alter table tabelname change copyishchar(i) copyishchar(i)
Change echo to a more appropriate command.

get the table names from file using UNIX Script

I have a sample file, shown below. It is an SQL*Loader control file:
LOAD DATA
APPEND
INTO TABLE XXWIN_TMP_LOADER_TAB
( seq POSITION(1:10) INTEGER EXTERNAL
,h_record POSITION(11:20) CHAR
,h_file_name POSITION(21:55) CHAR
)
APPEND
INTO TABLE XXWIN_SQL_LOADER_TAB
( seq POSITION(1:10) INTEGER EXTERNAL
,h_record POSITION(11:20) CHAR
,h_file_name POSITION(21:55) CHAR
)
APPEND
INTO TABLE XXWIN_SQL_LOADER_TAB
( seq POSITION(1:10) INTEGER EXTERNAL
,h_record POSITION(11:20) CHAR
,h_file_name POSITION(21:55) CHAR
)
I would like to select all the table names occurring in the file that start with 'XX_' and end with '_TAB', and store them in an array using a UNIX script.
Please advise.
Thanks,
Arun
If the file syntax is not changing (the table names start with XX, not XX_):
tnames=`grep -o "TABLE XX[^ ].*_TAB" <file_name> | sed 's/TABLE //g'`
for tn in $tnames; do echo $tn; done
Change the <file_name> to the name of the file.
You don't say which shell, but since sh doesn't support arrays I'm assuming Bash.
tables=($(sed -n '/TABLE /s/.*TABLE \(XX[^ ]*TAB\) *$/\1/p' inputfile))
for table in "${tables[@]}"
do
echo "$table"
done
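A self-contained run of that approach (the control-file snippet and the /tmp path are illustrative; the sed substitution anchors with a leading .* so the INTO prefix is consumed and only the table name is printed):

```shell
cat > /tmp/ctl.txt <<'EOF'
APPEND
INTO TABLE XXWIN_TMP_LOADER_TAB
( seq POSITION(1:10) INTEGER EXTERNAL
APPEND
INTO TABLE XXWIN_SQL_LOADER_TAB
EOF
# -n with /p prints only lines where the substitution succeeded
tables=($(sed -n '/TABLE /s/.*TABLE \(XX[^ ]*TAB\) *$/\1/p' /tmp/ctl.txt))
for table in "${tables[@]}"; do
  echo "$table"
done
```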
