Remove or gsub timestamp pattern from text file (bash / sed / awk)

I have a text file like this:
1/7/2017 12:53 DROP TABLE table1
1/7/2017 12:53 SELECT
1/7/2017 12:55 --UPDATE #dat_recency SET
Select * from table 2
into table 3;
I'd like to remove all of the timestamp patterns (M/D/YYYY HH:MM, M/DD/YYYY HH:MM, MM/D/YYYY HH:MM, MM/DD/YYYY HH:MM). I can find the patterns using grep but can't figure out how to use gsub. Any suggestions?
DESIRED OUTPUT:
DROP TABLE table1
SELECT
--UPDATE #dat_recency SET
Select * from table 2
into table 3;

You can use this sed command to remove the date/time stamps from the start of each line:
sed -i.bak -E 's~([0-9]{1,2}/){2}[0-9]{4} [0-9]{2}:[0-9]{2} *~~' file
cat file
DROP TABLE table1
SELECT
--UPDATE #dat_recency SET
Select * from table 2
into table 3;

Using the default whitespace field separator, set the first and second columns to the empty string, then print the whole line:
awk '/^[0-9]/{$1=$2="";gsub(/^[ \t]+|[ \t]+$/, "")} {print}' sample.csv
The command checks whether each line starts with a digit. If it does, the first two columns are replaced with empty strings and the leading whitespace this leaves behind is stripped. Note that the final {print} is unconditional, so both modified and untouched lines are output.
output:
DROP TABLE table1
SELECT
--UPDATE #dat_recency SET
Select * from table 2
into table 3;
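Since the question asks about gsub specifically: awk's gsub() can delete the stamp directly. This is a sketch (sample.txt is a stand-in file name) using a loose `[0-9]+`-style pattern so it works in any POSIX awk, not just GNU awk:

```shell
# Stand-in input: two of the lines from the question
cat > sample.txt <<'EOF'
1/7/2017 12:53 DROP TABLE table1
Select * from table 2
EOF
# gsub() deletes every match; the pattern is anchored to line start,
# so at most one timestamp (plus trailing spaces) is removed per line
out=$(awk '{gsub(/^[0-9]+\/[0-9]+\/[0-9]+ [0-9]+:[0-9]+ +/, "")} 1' sample.txt)
printf '%s\n' "$out"
```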

Related

Export hql output to csv in beeline

I am trying to export my hql output to csv in beeline using the command below:
beeline -u "jdbc:hive2://****/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"?tez.queue.name=devices-jobs --outputformat=csv2 -e "use schema_name; select * from table_name where open_time_new>= '2020-07-13' and open_time_new < '2020-07-22'" > filename.csv
The problem is that some column values in the table contain commas, which pushes part of the value into the next column.
For eg:
| abcd | as per data,outage fault,xxxx.
| xyz |as per the source,ghfg,hjhjg.
The above data will get saved as 4 columns instead of 2.
Need help!
Try the approach with local directory:
insert overwrite local directory '/tmp/local_csv_report'
row format delimited fields terminated by "," escaped by '\\'
select *
from table_name
where open_time_new >= '2020-07-13'
and open_time_new < '2020-07-22'
This will create several csv files under your local /tmp/local_csv_report directory, so a simple cat afterwards will merge the results into a single file.
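For instance, the merge step might look like this (the part files below are stand-ins for what Hive writes, and a temp directory replaces /tmp/local_csv_report):

```shell
# Stand-in part files in place of Hive's output (000000_0, 000001_0, ...)
dir=$(mktemp -d)
printf 'a,1\n' > "$dir/000000_0"
printf 'b,2\n' > "$dir/000001_0"
# The shell expands the glob in sorted order, so the parts concatenate
# in the order Hive numbered them
cat "$dir"/* > "$dir/report.csv"
```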

bash: separate blocks of lines between pattern x and y

I have a similar question to this one (Sed/Awk - pull lines between pattern x and y); however, in my case I want to output each block of lines to an individual file (named after the first pattern).
Input example:
-- filename: query1.sql
-- sql comments goes here or else where
select * from table1
where id=123;
-- eof
-- filename: query2.sql
insert into table1
(id, date) values (1, sysdate);
-- eof
I want the bash script to generate 2 files: query1.sql and query2.sql with the following content:
query1.sql:
-- sql comments goes here or else where
select * from table1
where id=123;
query2.sql:
insert into table1
(id, date) values (1, sysdate);
Thank you
awk '/-- filename/{if(f)close(f); f=$3;next} !/eof/&&/./{print $0 >> f}' input
Brief explanation,
/-- filename/{if(f)close(f); f=$3;next}: locate the record containing the filename, close the previous file if one is open, and assign the new name to f
!/eof/&&/./{print $0 >> f}: if the following lines neither contain 'eof' nor are empty, append them to the corresponding file.
This might work for you (GNU sed):
sed -r '/-- filename: (\S+)/!d;s##/&/,/-- eof/{//d;w \1#p;s/.*/}/p;d' file |
sed -nf - file
Create a sed script from the input file and run it against the input file
N.B. Two lines are needed for each query as the program for the query must be surrounded by braces and the w command must end in a newline.
Using GNU awk to handle multiple open files for you:
awk '/^-- eof/{f=0} f{print > out} /^-- filename/{out=$3; f=1}' file
or with any awk:
awk '/^-- eof/{f=0} f{print > out} /^-- filename/{close(out); out=$3; f=1}' file
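A quick self-contained check of the portable version against the sample input, run in a scratch directory (the only change is an if(out) guard so close() is never called before a file has been opened):

```shell
# Work in a scratch directory so the generated files land somewhere safe
cd "$(mktemp -d)"
cat > input <<'EOF'
-- filename: query1.sql
-- sql comments goes here or else where
select * from table1
where id=123;
-- eof
-- filename: query2.sql
insert into table1
(id, date) values (1, sysdate);
-- eof
EOF
# f flags whether we are inside a block; out is the current output file
awk '/^-- eof/{f=0} f{print > out} /^-- filename/{if(out)close(out); out=$3; f=1}' input
```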

How to export a Hive table into a CSV file including header?

I used this Hive query to export a table into a CSV file.
hive -f mysql.sql
where mysql.sql contains:
insert overwrite local directory '/LocalPath/'
row format delimited fields terminated by ','
select * from Mydatabase.Mytable limit 100;
cat /LocalPath/* > /LocalPath/table.csv
However, it does not include table column names.
How do I export the column names in the csv?
show tablename ?
You should add set hive.cli.print.header=true; before your select query to get column names as the first row of your output. The output would look as Mytable.col1, Mytable.col2 ....
If you don't want the table name with the column names, use set hive.resultset.use.unique.column.names=false;. The first row of your output would then look like col1, col2 ...
Invoking the hive command line with the parameters suggested in the other answer here works for a plain select. So, you can extract the column names and create the csv first, as follows:
hive -S --hiveconf hive.cli.print.header=true --hiveconf hive.resultset.use.unique.column.names=false --database Mydatabase -e 'select * from Mytable limit 0;' > /LocalPath/table.csv
Post which you can have the actual data extraction part run, except this time, remember to append to the csv:
cat /LocalPath/* >> /LocalPath/table.csv ## From your question with >> for append
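The two-step flow, with stand-in files in place of the real hive output, is essentially:

```shell
dir=$(mktemp -d)
# Step 1: the header-only query (limit 0) writes just the column names
echo 'col1,col2' > "$dir/table.csv"
# Step 2: the data extraction appends to it (note >>, not >)
printf '1,a\n2,b\n' > "$dir/000000_0"
cat "$dir"/000000_0 >> "$dir/table.csv"
```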

How to use sed to insert a line before each line in a file with the original line's content surrounded by a string?

I am trying to use sed (GNU sed version 4.2.1) to insert, before each line in a file, a line with that line's content surrounded by a string.
Input:
truncate table ALPHA;
truncate table BETA;
delete from TABLE_CHARLIE where ID=1;
Expected Result:
SELECT 'truncate table ALPHA;' from dual;
truncate table ALPHA;
SELECT 'truncate table BETA;' from dual;
truncate table BETA;
SELECT 'delete from TABLE_CHARLIE where ID=1;' from dual;
delete from TABLE_CHARLIE where ID=1;
I have tried to make use of the ampersand (&) special character, but this does not seem to work: if I put anything after the ampersand in the replacement string, the output is wrong.
Attempt 1:
sed -e "s/\(.*\)/SELECT '&\n&/g" input.txt
output:
SELECT 'truncate table ALPHA;
truncate table ALPHA;
SELECT 'truncate table BETA;
truncate table BETA;
SELECT 'delete from TABLE_CHARLIE where ID=1;
delete from TABLE_CHARLIE where ID=1;
With the preceding code, I get the SELECT ' as expected, but once I attempt to add ' from dual; to the right side of string, things get out of whack.
Attempt 2:
sed -e "s/\(.*\)/SELECT '&' from dual;\n&/g" input.txt
output:
' from dual;cate table ALPHA;
truncate table ALPHA;
' from dual;cate table BETA;
truncate table BETA;
SELECT 'delete from TABLE_CHARLIE where ID=1;' from dual;
You can take advantage of the hold space to temporarily store the original line.
sed "h;s/.*/SELECT '&' from dual;/;p;g" input.txt
or more readably:
sed "
h
s/.*/SELECT '&' from dual;/
p
g" input.txt
Here's a breakdown of the command.
First, each line of the input is placed in the pattern space.
The h command copies the contents of the pattern space to the hold space.
The s command performs a substitution on the pattern space. The & represents whatever was matched. This command leaves the hold space unaffected.
The p command outputs the contents of the pattern space to standard output.
The g command copies the contents of the hold space to the pattern space.
By default, the contents of the pattern space are written to standard output before reading the next input line.
As Glenn Jackman points out, you can replace p;g with G. This builds up a two-line value in the pattern space that is then printed once, rather than printing two separate pattern spaces.
sed "h;s/.*/SELECT '&' from dual;/;G" input.txt
Also, you can add comments to the sed command so that you can understand what the line noise does later :), if this is in a script.
sed "
# The input line is first copied to the pattern space
h # Copy the pattern space to the hold space
s/.*/SELECT '&' from dual;/ # Modify the pattern space
p # Print the (modified) pattern space
g # Copy the hold space to the pattern space
# The output of the pattern space (the original input line) is now printed
" input.txt
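A self-contained check of the hold-space approach (input.txt here holds one line from the question; & in the replacement stands for the matched line):

```shell
printf 'truncate table ALPHA;\n' > input.txt
# h saves the line, s wraps it in the SELECT, G appends the saved original
out=$(sed "h;s/.*/SELECT '&' from dual;/;G" input.txt)
printf '%s\n' "$out"
```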
If you're looking for an alternative to sed, these work:
awk '{printf "SELECT '\''%s'\'' from dual;\n%s\n", $0, $0}' file
perl -lpe "print qq{SELECT '\$_' from dual;}" file
Your second attempt works on both 4.2.1 and 4.2.2 versions of sed. I reproduced the same broken output when I saved your input file with Windows line endings (carriage return and line feed).
Use this command on your input file before running your sed command:
tr -d '\15\32' < winfile.txt > unixfile.txt
Or as you suggest, simply by using the dos2unix utility.
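To see the failure mode and the fix end to end (winfile.txt/unixfile.txt are stand-in names):

```shell
# A line saved with Windows line endings: CR (\r, octal 15) before the newline
printf 'truncate table ALPHA;\r\n' > winfile.txt
# Delete CR and the DOS end-of-file marker Ctrl-Z (octal 32)
tr -d '\15\32' < winfile.txt > unixfile.txt
```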
Here's how to do it with awk:
awk -v PRE="SELECT '" -v SU="' from dual;" '{print PRE $0 SU; print}' file

Merge all the data within the (.....) in one line in shell script

I am new to shell scripting and need some help. I have a SQL file like:
SELECT DISTINCT F1.COL1,
F1.COL5 ADDRESS ,
COALESCE(COL1,
COL2,
COL3,
COL4),
F1.COL7
FROM TABLE1 F1
I need the parenthesised part printed on one line, like:
SELECT DISTINCT F1.COL1,
F1.COL5 ADDRESS ,
COALESCE(COL1,COL2,COL3,COL4),
F1.COL7
FROM TABLE1 F1
Thanks
With sed :
sed '/(/{:a;N;s/^ *//;s/\n *//;/)/!{ba}}' file
To edit file in place, add the -i option :
sed -i '/(/{:a;N;s/^ *//;s/\n *//;/)/!{ba}}' file
Every line containing ( starts a join: the following lines are appended, with their leading whitespace stripped, until a line containing ) is reached.
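If GNU sed is not available, the same join can be sketched in awk, buffering from a line containing ( until one containing ). This is an alternative sketch, not the answer's command:

```shell
cat > query.sql <<'EOF'
SELECT DISTINCT F1.COL1,
F1.COL5 ADDRESS ,
COALESCE(COL1,
COL2,
COL3,
COL4),
F1.COL7
FROM TABLE1 F1
EOF
awk '
  buf      { sub(/^ */, ""); buf = buf $0         # continue the join
             if (/\)/) { print buf; buf = "" }    # ) ends the join
             next }
  /\(/ && !/\)/ { buf = $0; next }                # ( with no ) starts it
  { print }
' query.sql > joined.sql
```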