I have a sample.hql file which contains the lines below:
desc db.table1;
desc db.table2;
desc db.table3;
I am trying to run it from a shell command.
I want to find out whether a particular column is present in a table or not.
For example, if col_1 is present in table1, the output should say col_1 is found in db.table1.
I am not sure how to find it.
I am executing the below command:
hive -f sample.hql | grep -q "<column_name>"
But I am not sure how to get the db and table name from each line as it executes.
You can make grep show context before (-B) and after (-A) a match. The command below would show you the 10 lines before each match; note that the -q from your command has to go, since -q suppresses all output. This will likely get the job done quick and dirty.
hive -f sample.hql | grep -B 10 "<column_name>"
If you wanted to be a little more careful, you might try a loop instead that feeds the lines to hive one at a time. If it finds the column it will echo the line it found the column in. (The '&&' only runs the echo if the previous command was successful; a while/read loop is used rather than for i in $(cat ...), since the latter splits on every word rather than on lines.)
#!/bin/bash
while IFS= read -r line; do
    hive -e "$line" | grep -q "<column_name>" && echo "$line"
done < sample.hql
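If you want the output phrased as in the question (col_1 is found in db.table1) rather than echoing the whole desc line, you could strip the desc keyword and the trailing semicolon from each line first. A minimal sketch, assuming every line follows the desc db.table; pattern shown above:
#!/bin/bash
col="col_1"    # the column you are looking for
while IFS= read -r line; do
    # "desc db.table1;" -> "db.table1"
    table=$(echo "$line" | sed -e 's/^desc //' -e 's/;[[:space:]]*$//')
    hive -e "$line" | grep -q "$col" && echo "$col is found in $table"
done < sample.hql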
I'm creating a Unix shell script to execute an Impala query, and I need to get the output log of the query. For example, I tried the below:
output_log = echo $(impala-shell -i $node -q "select name from impaladb.impalatbl" -o output_file)
Output:
+--------+
| name |
+--------+
| tom |
| mike |
+--------+
Fetched 2 row(s) in 0.83s
Here I'm getting the two names as output in both output_file and output_log. But I need the "Fetched 2 row(s) in 0.83s" log line in the output_log variable. How can I get it?
I'm not familiar with Impala, so I'm not confident that what you are doing is the most efficient way to query it. However, you're trying to focus on a specific line of output; that I can answer.
There are many ways to do this. Maybe the most straightforward is grep (note the 2>&1: as explained further down, impala-shell writes the status line to stderr):
output_log=$(impala-shell -i $node -q "select name from impaladb.impalatbl" -o output_file 2>&1 | grep Fetch)
Try this:
Solution 1:
output_log=$(nohup impala-shell -k --ssl -i $node --verbose --delimited --query="select count(*) as cnt from impaladb.impalatbl" 2>/dev/null)
echo $output_log
Solution 2:
impala-shell -k --ssl -i $node --verbose --delimited --query="select count(*) as cnt from impaladb.impalatbl" -o output_file
output_log=$(head output_file)
echo $output_log
I solved the problem.
The way it works is that impala-shell sends the query output to one stream (stdout) and the other information about the query to another (stderr).
Hence all you have to do is:
impala-shell -i $node -q "select name from impaladb.impalatbl" 2>output_file
The 2> will send the stderr output containing the "Fetched 1 row in 10seconds" line to the output file. Now you can grep it or do whatever you want.
Suppose you want to store the same output in the output_log variable; then use:
output_log=$(impala-shell -i $node -q "select name from impaladb.impalatbl" 2>&1)
Here 2>&1 redirects stderr to stdout, which is what gets captured when assigning the value to the variable.
For more information on this, just put 2>&1 in a Google search and learn more about it.
Hope it helps you!!
Some additional observations
2>&1 redirects stderr to stdout, but stdout also carries the query output, so when you store it in a variable it will contain the query output as well as the extra information like "fetched 1 row in 3seconds".
But when we use
2>a.txt, only the stderr output gets redirected, so a.txt will contain only information like "starting impala.....fetched 1 row in 2 seconds", and you can then grep the file and put that in a variable.
I just wanted to highlight this slight difference I observed between storing in a file and storing in a variable.
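Combining the two observations: if you want only the status line in the variable, with none of the query output, you can discard stdout and keep only stderr in the pipeline. A minimal sketch using the same query as above:
# stdout (the query rows) is discarded; only stderr (the status lines) reaches grep
output_log=$(impala-shell -i $node -q "select name from impaladb.impalatbl" 2>&1 >/dev/null | grep Fetched)
echo $output_log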
I am trying to find common names across two files, where one file name is generated dynamically. But when I try to give the filename using the $ sign, it's not getting replaced; I tried echo and then eval, but I get an error: syntax error near unexpected token '('.
The code is as below:
hive -e "use $1;show tables;">$1.txt
eval $(echo "comm -12 <(sort -u hub_table_list) <(sort -u $1.txt) >result.txt")
The hive command runs successfully; a file is created with the parameter name, and it contains the table names.
All help appreciated.
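For what it's worth, this "unexpected token '('" error usually means the <(...) process substitutions are being parsed by a shell that does not support them (sh instead of bash), or that they are going through an extra round of splitting and evaluation via the echo/eval combination. A minimal sketch that runs the command directly under bash, avoiding the eval entirely (filenames taken from the question):
#!/bin/bash
# process substitution <(...) requires bash, not sh
hive -e "use $1; show tables;" > "$1.txt"
comm -12 <(sort -u hub_table_list) <(sort -u "$1.txt") > result.txt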
I would like to read a file to EOF which has multiple SQL queries separated by a pipe or double pipe. I chose pipes because tab, \n and space could already be present in the queries. The queries can span multiple lines. Each query needs to be read into a variable, and inside the while loop I would like to run a command/SQL query on a DB using that query-string variable. The loop has to do this for all query strings created from the file.
Example file Contents:
select * from blah;||select column from table1;||select column7 from table4;EOF
This is my script so far:
while IFS=$'||' read -r querystring
do
run some shell command on ${querystring}
done < myqueryfile.sql
This is how much I've got so far. Thoughts/comments appreciated.
If the SQL file contains statements that span multiple lines, like:
select *
from blah;||
select column
from table1;||
select column7
from table4;
You could use gawk/awk to isolate the statement blocks, placing the separator pipes on their own line. This produces output that is easy to process:
gawk -v RS="[|][|]" -v ORS="\n||\n" '{ print $0 }' file | while read -r line
do
    if [ "$line" == "||" ]; then
        run some shell command on "$statement"
        statement=""
    else
        statement="$statement"$'\n'"$line"    # $'\n' is a real newline; a plain "\n" would stay literal
    fi
done
No temp files are required ;-)
You could use awk:
/tmp ❯❯❯ cat /tmp/a
select * from blah;||select column from table1;||select column7 from table4
/tmp ❯❯❯ awk 'BEGIN {FS="[|][|]"} {for(i=1;i<=NF;i++)printf "run some shell command on %s\n",$i}' /tmp/a
run some shell command on select * from blah;
run some shell command on select column from table1;
run some shell command on select column7 from table4
/tmp ❯❯❯ awk 'BEGIN {FS="[|][|]"} {for(i=1;i<=NF;i++)printf "run some shell command on %s\n",$i}' /tmp/a |sh -
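If you'd rather keep everything in bash and avoid awk, the same splitting can be done with parameter expansion. A minimal sketch, assuming the whole file fits comfortably in memory:
#!/bin/bash
queries=$(<myqueryfile.sql)                    # slurp the whole file
while [ -n "$queries" ]; do
    querystring=${queries%%||*}                # everything before the first ||
    run some shell command on "$querystring"
    [ "$queries" = "$querystring" ] && break   # last query: no delimiter left
    queries=${queries#*||}                     # drop the processed query and its ||
done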
How do I delete the output for one big table inside a mysqldump with lots of tables in it?
I have a dump of a database that is 6 GB in size, but 90% of it is a single logging table, "cache_entries", that I no longer need in my backup.
How can I easily remove the part of the dump that describes that large logging table?
I found this:
http://gtowey.blogspot.de/2009/11/restore-single-table-from-mysqldump.html
Example:
grep -n 'Table structure' dump.sql
and then for example:
sed -n '40,61 p' dump.sql > t2.sql
But how can I change that for my needs?
You could use sed's 'n1,n2 d' to remove a range of lines.
I guess in your case you do want to have the table in question, but don't want the data?
Change the grep command to include "Dumping data for table":
grep -n 'Table structure\|Dumping data for table' dump.sql
19:-- Table structure for table `t1`
37:-- Dumping data for table `t1`
47:-- Table structure for table `t2`
66:-- Dumping data for table `t2`
76:-- Table structure for table `t3`
96:-- Dumping data for table `t3`
Now, if you don't want the data for t2, you could use:
sed '66,75 d' dump.sql > cleandump.sql
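If you'd rather not look the line numbers up by hand each time, the same deletion can be expressed with sed pattern addresses instead. A sketch, assuming the dump contains comment lines exactly like the ones in the grep output above (here dropping the data for cache_entries while keeping its structure):
sed '/-- Dumping data for table `cache_entries`/,/-- Table structure for table/{/-- Table structure for table/!d}' dump.sql > cleandump.sql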
I found this bash script that splits a dump of one database into separate files, one per table, using csplit (which splits a file into sections determined by context lines):
#!/bin/bash
####
# Split MySQL dump SQL file into one file per table
# based on http://blog.tty.nl/2011/12/28/splitting-a-database-dump
####
if [ $# -ne 1 ] ; then
    echo "USAGE $0 DUMP_FILE"
    exit 1
fi
# split at every "-- Table structure for table" comment line
csplit -s -ftable $1 "/-- Table structure for table/" {*}
mv table00 head
for FILE in `ls -1 table*`; do
    # the table name sits between backticks (\x60) on the first line
    NAME=`head -n1 $FILE | cut -d$'\x60' -f2`
    cat head $FILE > "$NAME.sql"
done
rm head table*
Source: gist.github.com/1608062
and a bit enhanced:
How do I split the output from mysqldump into smaller files?
Once you have a separate file for each table (the script above names them <tablename>.sql), you can delete the unwanted ones and glue the rest together if needed with
cat *.sql > glued_sqldump.sql
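For this question's dump, that might look like the following (a sketch, assuming the split produced a file named cache_entries.sql for the logging table):
rm cache_entries.sql              # drop the unwanted logging table
cat *.sql > glued_sqldump.sql     # reassemble everything else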
You need to find the CREATE TABLE statement for your table, and then find the next CREATE TABLE statement; say they are at lines n1 and n2. You can then delete everything from n1 up to (but not including) n2 with sed as above: sed 'n1,m d' dump.sql > new.sql, where m is n2-1.
You can just grep for CREATE TABLE and note the line numbers as prework.
Here is a demo:
ubuntu@ubuntu:~$ grep -n [34] a.txt
3:3
4:4
ubuntu@ubuntu:~$ cat a.txt
1
2
3
4
5
6
ubuntu@ubuntu:~$ grep [34] a.txt
3
4
ubuntu@ubuntu:~$ sed '3,4d' a.txt > b.txt
ubuntu@ubuntu:~$ cat b.txt
1
2
5
6
ubuntu@ubuntu:~$
I have to list the filenames available in a folder along with each file's line count. That will give me a kind of two-column data.
Now I have to insert these records into an Oracle table having two columns (col1, col2).
Can I write a shell script which will do both?
I have already found how to write the first part, i.e.:
wc -l *| egrep -v " total$" | awk '{print $2 " " $1}' > output.txt
Now, how do I insert the data in output.txt into the Oracle table?
In version 9i Oracle gave us external tables. These objects allow us to query data in OS files through SELECT statements. This is pretty cool. Even cooler, in 11.0.1.7 we can associate a shell script with an external table to generate its OS file. Check out Adrian Billington's article on listing files with the external table preprocessor in 11g. Your shell script is an ideal candidate for the preprocessor functionality.
If you need to know the contents of the directory now, for whatever purpose, you can simply SELECT from the external table. If you want to keep a permanent record of the file names you can issue an INSERT INTO ... SELECT * FROM external_table;. This statement could be run automatically using a database job.
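A minimal sketch of what that could look like, driven from a shell script via sqlplus (the directory object, external table, and column names here are illustrative assumptions, not taken from the article):
#!/bin/bash
sqlplus -s user/password@oracle_database <<'EOF'
-- the directory object must point at the folder holding output.txt
-- (creating it requires the appropriate privilege)
CREATE OR REPLACE DIRECTORY data_dir AS '/path/to/folder';

CREATE TABLE file_counts_ext (
    fname      VARCHAR2(200),
    line_count NUMBER
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY WHITESPACE
    )
    LOCATION ('output.txt')
);

-- permanent record of the file names and line counts
INSERT INTO your_table (col1, col2) SELECT * FROM file_counts_ext;
COMMIT;
EXIT;
EOF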
You have two choices.
I tested both with this table structure:
SQL> create table tbl_files(fileName varchar(20), lineCount number);
The first choice is to generate a SQL script containing INSERT commands, and run it with the sqlplus command line utility. I modified your shell script a little (the sprintf("%c", 39) produces a single-quote character, which is awkward to embed directly in the awk program):
wc -l *| egrep -v " total$" | awk '{q=sprintf("%c", 39); print "INSERT INTO TBL_FILES(fileName, lineCount) VALUES (" q$2q ", " $1 ");";}' > sqlplusfile.sql
After running this script, the file "sqlplusfile.sql" has this content:
INSERT INTO TBL_FILES(fileName, lineCount) VALUES ('File1.txt', 10);
INSERT INTO TBL_FILES(fileName, lineCount) VALUES ('File2.txt', 20);
INSERT INTO TBL_FILES(fileName, lineCount) VALUES ('File3.txt', 30);
Now you can run the sqlplus command directly with this file as a parameter:
sqlplus username/password@oracle_database @sqlplusfile.sql
After running this script, the table looks like this:
SQL> select * from tbl_files;
FILENAME LINECOUNT
-------------------- ----------
File1.txt 10
File2.txt 20
File3.txt 30
Note: a "commit;" must be present at the end of the file "sqlplusfile.sql"; otherwise you will not see the data in the table.
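A small sketch of appending that (plus an exit, which is commonly added so sqlplus terminates instead of waiting at the prompt when run from a script):
echo "commit;" >> sqlplusfile.sql
echo "exit;"   >> sqlplusfile.sql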
The second choice is to use the sqlldr command line tool (this tool is part of the Oracle Client installation).
Again, a slightly modified version of your script:
wc -l *| egrep -v " total$" | awk '{print "\""$2"\"" "," $1}' > data.txt
The content of the file "data.txt" looks like this:
"File1.txt",10
"File2.txt",20
"File3.txt",30
In the same directory I created the file "settings.ctl" with this content:
LOAD DATA
INFILE data.txt
INSERT
INTO TABLE TBL_FILES
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
(fileName, lineCount)
Now you can run this command, which loads data into the database:
sqlldr userid=user/passwd@oracle_database control=settings.ctl
The sqlldr utility is the better choice, but it is not present in some Oracle Client installations.
You can load text data into Oracle with its command-line tool SQL*Loader. There is too much to describe here about how to use SQL*Loader; start by reading this web page:
http://www.orafaq.com/wiki/SQL*Loader_FAQ
I do not know Oracle, but it appears that the SQL syntax is quite similar to MySQL.
In essence, you would do what you have done here with one minor change:
wc -l *| egrep -v " total$" | awk '{print $2 "|" $1}' > output.txt
You would write a SQL*Loader control file called thesql.ctl that would look like the following:
LOAD DATA
INFILE output.txt
INTO TABLE yourtable
FIELDS TERMINATED BY '|'
(col1, col2)
Then, in your script (this is where I am hazy), have a line like this:
sqlldr <user>/<pw>@<db> control=thesql.ctl
I found some help from a couple of different sources - here and here.
Hope this helps!