I have three columns in a CSV: Client Name, Save Set Name and Status. Some clients have two Status values, both Failed and Success. I want to filter only those clients whose status is only Failed; clients that have two entries, both Failed and Success, should be omitted.
When I use the command below, it also gives me clients whose status was Success later on. I want only the clients that are Failed and were never Success, not even once.
cat "$pwd"/Daily-Failed.csv|egrep -i 'failed|Interrupted'|awk -F',' '{print $2,$3,$9}'|sort -u > "$pwd"/Final-Failed/Failed.csv
(edit) Or with newlines:
cat "$pwd"/Daily-Failed.csv|
egrep -i 'failed|Interrupted'|
awk -F',' '{print $2,$3,$9}'|
sort -u > "$pwd"/Final-Failed/Failed.csv
Please find the input and desired output below.
Input:
Client Name,Save Set,Status
Star,D:/,Failed
Star,C:/,Failed
Moon,C:/,Failed
Galaxy,D:/,Failed
Sun,D:/,Failed
Star,C:/,Success
Sun,D:/,Success
Output "Client Name","Save Set",Status
Galaxy,D:/,Failed
Moon,C:/,Failed
Star,D:/,Failed
I want to filter only those clients whose status is only Failed; clients that have two entries, both Failed and Success, should be omitted.
Looking at your sample input (which really needs to be text in your question, not an image), I'm going to assume that both the Client Name and Save Set columns matter: you have (Star, C:/) with both success and failure rows, and (Star, D:/) with just a failure, and the latter shows up in your output, which is the only way that makes sense given your stated goal. On the other hand, you also have two (Sun, D:/) rows, one success and one failure, and that pair shows up in your output even though it doesn't meet your criteria any way you look at it...
Anyways, this sort of grouping and filtering of tabular data screams database, and I like to script sqlite to make it do all the work in such cases:
#!/bin/sh
filename=Daily-Failed.csv
sqlite3 -batch -csv -header <<EOF
.import '${filename}' tbl
SELECT *
FROM tbl
GROUP BY "Client Name", "Save Set"
HAVING count(*) = 1 AND Status = 'Failed'
EOF
after taking the data in your image and turning it into a CSV file Daily-Failed.csv looking like
Client Name,Save Set,Status
Star,D:/,Failed
Star,C:/,Failed
Moon,C:/,Failed
Galaxy,D:/,Failed
Sun,D:/,Failed
Star,C:/,Success
Sun,D:/,Success
that script will output
"Client Name","Save Set",Status
Galaxy,D:/,Failed
Moon,C:/,Failed
Star,D:/,Failed
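If you would rather stay with an awk pipeline like the one in your question, here is a rough sketch of the same grouping. It assumes the simplified three-column layout shown above (your real Daily-Failed.csv seems to have more columns, given the $2,$3,$9 in your awk, so the field numbers would need adjusting) and that the Status field has no stray whitespace; it keeps a (Client Name, Save Set) pair only if every row for that pair is Failed:
awk -F',' 'NR > 1 {
    key = $1 FS $2                      # group on Client Name + Save Set
    rows[key]++
    if ($3 == "Failed") failed[key]++
}
END {
    for (key in rows)
        if (rows[key] == failed[key])   # no Success row for this pair
            print key FS "Failed"
}' Daily-Failed.csv | sort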
Recently I started to work with DB2 and created a few databases.
To drop a single DB I use db2 drop db demoDB; is there a way to drop all DBs at once?
Thanks
Building on the previous answer, this set of lines does the same without creating a script.
db2 list db directory | tail -n +6 | sed 'N;N;N;N;N;N;N;N;N;N;N;s/\n/ /g' | awk '/Indirect/ {print "db2 drop database "$7}' | source /dev/stdin
This filters the local databases and executes the generated output.
(Only works in an English-locale environment.)
First, I don't think there is a native DB2 way to do that, but here is what I usually do. To start, the way to see all the databases on your instance is one of the following:
db2 list db directory
db2 list active databases
Depending on your need (all DBs or just the active DBs).
I'm sure there are more DB lists you can get (see the DB2 user guide).
The way I usually drop all my DBs is with a shell script:
1. Create a new script using 'vi db2_drop_all.sh' or any other way you like.
2. Paste the code:
#!/bin/bash -x
for db_name in $(db2 list db directory | grep Database | \
                 grep name | cut -d= -f2); do
    db2 drop db $db_name || true
done
exit 0
3. Save the changes.
4. Run the script (after you have switched to your instance, of course): sh db2_drop_all.sh
Notice that in step 2 you can change the list of DBs as you wish (for example to db2 list active databases), as in the sketch below.
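For instance, a sketch of the same loop restricted to the currently active databases (assuming the output of db2 list active databases also contains "Database name =" lines, and noting that an active database usually has to be deactivated before it can be dropped):
for db_name in $(db2 list active databases | grep Database | \
                 grep name | cut -d= -f2); do
    db2 deactivate db $db_name      # a database cannot be dropped while it is active
    db2 drop db $db_name || true
done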
Hope it helped you. :)
I have an external Hive table on top of a Parquet file.
CREATE EXTERNAL TABLE parquet_test LIKE avro_test STORED AS PARQUET LOCATION 'hdfs://myParquetFilesPath';
I want to get the row count of the table using a shell script.
I tried the following command:
myVar=$(hive -S -e "select count(*) from parquet_test;")
echo $myVar
I added -S to run Hive in silent mode, but I still get the whole MapReduce log along with the count in the myVar variable. How do I get only the count?
I don't have access to any of the configuration files to change the logging level. Is there any other way?
Finally found a workaround.
First I flushed the query result into a file, then read the answer from that file.
The file contains only the result of the query.
(hive -S -e " INSERT OVERWRITE LOCAL DIRECTORY '/home/test/result/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select count(*) from parquet_test;")
Then read the file into a variable:
var=$(hdfs dfs -tail /home/test/result/)
echo $var
Thank you
myVar=$(eval "hive -S -e 'select count(*) from parquet_test;' ")
echo $myVar
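If what pollutes myVar is the MapReduce progress log, a minimal variation (assuming, as is typical for the Hive CLI, that the job log goes to stderr while the query result goes to stdout) is to discard stderr:
myVar=$(hive -S -e "select count(*) from parquet_test;" 2>/dev/null)
echo $myVar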
I am working on a test in which I must find the number of partitions of a table and check that it is right. If I use show partitions TableName I get all the partitions by name, but I want just the number of partitions, something along the lines of show count(partitions) TableName (which merely returns OK, by the way, so it's no good), and get e.g. 12.
Is there any way to achieve this?
Using Hive CLI
$ hive --silent -e "show partitions <dbName>.<tableName>;" | wc -l
--silent is to enable silent mode
-e tells hive to execute the quoted query string
You could use:
select count(distinct <partition key>) from <TableName>;
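For example, for a hypothetical table logs partitioned by a single dt column, this counts the distinct partition values (which matches the partition count only when the table is partitioned by that one column):
select count(distinct dt) from logs;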
By using the command below you will get all the partitions, and at the end it also shows the number of fetched rows. That number of rows is the number of partitions.
SHOW PARTITIONS [db_name.]table_name [PARTITION(partition_spec)];
You can use the WebHCat interface to get information like this. It has the benefit that you can run the command from anywhere the server is accessible. The result is JSON; use a JSON parser of your choice to process the results.
In this example of piping the WebHCat results to Python, only the number 24 is returned, representing the number of partitions for this table. (The server name is the name node.)
curl -s 'http://*myservername*:50111/templeton/v1/ddl/database/*mydatabasename*/table/*mytablename*/partition?user.name=*myusername*' | python -c 'import sys, json; print(len(json.load(sys.stdin)["partitions"]))'
24
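If jq is available, the same count can be pulled out without Python (a sketch, using the same placeholder URL):
curl -s 'http://*myservername*:50111/templeton/v1/ddl/database/*mydatabasename*/table/*mytablename*/partition?user.name=*myusername*' | jq '.partitions | length'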
In Scala you can do the following:
sql("show partitions <table_name>").count()
I used the following:
beeline -silent --showHeader=false --outputformat=csv2 -e 'show partitions <dbname>.<tablename>' | wc -l
Use the following syntax:
show create table <table name>;
Can we have multiple SQL*Plus connections in a shell script?
I have written a shell script that copies the data of tables from one database to another using the COPY command of SQL*Plus. I don't have the privilege to create a database link, so I am using the COPY command.
I need to copy the data of around 50 tables. When the dataset is small, it runs and copies the data of all the tables, but when the dataset is huge, it gets stuck and I get a "session inactive" message on the Unix machine.
I thought of splitting the statements and wrote it as below, but I am getting the errors "SP2-0042: unknown command "END1" - rest of line ignored." and "SP2-0042: unknown command "END" - rest of line ignored."
#!/bin/bash
export ORACLE_HOME=/ora00/app/oracle/product/9.2.0.8
export PATH=$PATH:$ORACLE_HOME/bin
args=$#
if [ $args == 1 ]
then
echo "Shell script started"
else
echo "Wrong number of arguments"
exit 1
fi
time_start=`date +%H%M%S`
echo $time_start
sqlplus -s srcUN/srcPwd@srcSID <<END1
COPY from srcUN/srcPwd@srcSID to dstUN/dstPwd@dstSID INSERT tab1 USING SELECT * FROM tab1 WHERE col1 = $1;
COPY from srcUN/srcPwd@srcSID to dstUN/dstPwd@dstSID INSERT tab2 USING SELECT * FROM tab2 WHERE col1 = $1;
    END1
sqlplus -s srcUN/srcPwd@srcSID <<END2
COPY from srcUN/srcPwd@srcSID to dstUN/dstPwd@dstSID INSERT tab3 USING SELECT * FROM tab3 WHERE col1 = $1;
    END2
#END
Could you please help me resolve this?
Thanks,
Savitha
The problem is that END1 and END2 are not recognized as the end of the input redirection because they have leading whitespace.
Remove all the whitespace on these two lines and it should work.
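Alternatively, if you want to keep the terminators indented, you can use the <<- form of the heredoc, which strips leading tab characters (tabs only, not spaces) from the body lines and from the terminator line. A minimal sketch (the indentation must be real tabs):
sqlplus -s srcUN/srcPwd@srcSID <<-END1
	COPY from srcUN/srcPwd@srcSID to dstUN/dstPwd@dstSID INSERT tab1 USING SELECT * FROM tab1 WHERE col1 = $1;
	END1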