how to remove last lines from CSV file - bash

I am inserting an SQL query result into a CSV file. At the end of the file, two rows get added as shown below. Can anyone please tell me how to delete these two rows (the blank row and '168 rows selected.')?
Also, it would be great if these rows didn't get inserted into the CSV file in the first place while inserting the SQL query result.
pvm:0a12dd82,pvm:0a12dd84,TechnicalErrorProcess,21-JUN-19 07.01.58.560000 AM,pi_halted,38930,1
pvm:0a12dd77,pvm:0a12dd79,TechnicalErrorProcess,20-JUN-19 12.36.27.384000 PM,pi_halted,1572846,1
pvm:0a12dd6t,pvm:0a12dd6v,TechnicalErrorProcess,20-JUN-19 12.05.22.145000 PM,pi_halted,38929,1
pvm:0a12dd4h,pvm:0a12dd4l,TechnicalErrorProcess,17-JUN-19 07.11.43.522000 AM,pi_halted,9973686,1
168 rows selected.

For MSSQL, add the following before the select query:
set nocount on; select ...
I'm not sure if that will work for other databases.
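For illustration, a hedged sketch of where this goes, assuming the CSV is produced with sqlcmd (server, database, table, and column names are placeholders):
sqlcmd -S myserver -d mydb -W -s"," -Q "set nocount on; select col1, col2, col3 from my_table" -o result.csv
Here -s"," sets the column separator, -W trims trailing spaces, and -o writes the output file.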

Filter the output of the command above to exclude the last two lines.
I see two ways of doing it with a bash command:
head --lines=-2
Or
sed -e '/rows selected/d' -e '/^ *$/d'

You can specify a negative number for the -n parameter of the head command:
-n, --lines=[-]K
print the first K lines instead of the first 10; with the leading '-', print all but the last K lines of each file
so:
head -n -2 input-file.txt > output-file.txt
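Putting it together, a minimal sketch (input-file.txt stands in for the generated CSV) that strips the last two lines and then replaces the original file:
head -n -2 input-file.txt > output-file.txt && mv output-file.txt input-file.txt
Note that the negative line count is a GNU coreutils extension; BSD/macOS head does not support it, in which case the sed variant above is the safer choice.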

Related

Populate a value in a particular column in csv

I have a folder with 50 Excel sheets in CSV format. I have to populate a particular value, say "XYZ", in a particular column of all the sheets in that folder.
I am new to Unix and have looked at a couple of pages on this. Can anyone please provide a sample script to begin with?
For example :
Let's say column C in this case:
A B C
ASFD 2535
BDFG 64486
DFGC 336846
I want to update column C to value "XYZ".
Thanks.
I would export those files into CSV format
- with semicolon as field separator
- optionally leaving out the column descriptions (otherwise see the note below)
Then the following combination of a shell script and sed could more or less already do the trick:
#! /bin/sh
for i in *.csv
do
sed -i -e 's/$/;XYZ/' "$i"
done
-i means to edit the file in place; here it is used to append the value to every line.
-e specifies the regular expression for the substitution.
You might want to use a similar script that puts "C" instead of "XYZ" in the 1st line, if the CSV files also contain column descriptions; see the sketch below.
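A hedged sketch of such a variant, assuming the first line of every file is the header row and "C" is the desired header for the new column:
#! /bin/sh
for i in *.csv
do
sed -i -e '1s/$/;C/' -e '2,$s/$/;XYZ/' "$i"
done
The first expression appends ";C" to the header line only; the second appends ";XYZ" to every data line.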

parse CSV, Group all rows containing string at 5th field, export each group of rows to file with filename <group>_someconstant.csv

Need this in bash.
In a linux directory, I will have a CSV file. Arbitrarily, this file will have 6 rows.
Main_Export.csv
1,2,3,4,8100_group1,6,7,8
1,2,3,4,8100_group1,6,7,8
1,2,3,4,3100_group2,6,7,8
1,2,3,4,3100_group2,6,7,8
1,2,3,4,5400_group3,6,7,8
1,2,3,4,5400_group3,6,7,8
I need to parse this file's 5th field (first four chars only) and take each row with 8100 (for example) and put those rows in a new file. Same with all other groups that exist, across the entire file.
Each new file can only contain the rows for its group (one file with the rows for 8100, one file for the rows with 3100, etc.)
Each filename needs to have that group# prepended to it.
The first four characters could be any numeric value, so I can't check these against a list - there are like 50 groups, and maintenance can't be done on this if a group # changes.
When parsing the fifth field, I only care about the first four characters.
So we'd start with: Main_Export.csv and end up with four files:
Main_Export_$date.csv (unchanged)
8100_filenameconstant_$date.csv
3100_filenameconstant_$date.csv
5400_filenameconstant_$date.csv
I'm not sure of the rules of the site, or whether I have to try this myself first and then post what I have. I'll come back once I have an idea, but I'm at a total loss. Reading up on awk right now.
If I have understood your problem correctly, this is very easy...
You can just do:
$ awk -F, '{fifth=substr($5, 1, 4) ; print > (fifth "_mysuffix.csv")}' file.csv
or just:
$ awk -F, '{print > (substr($5, 1, 4) "_mysuffix.csv")}' file.csv
And you will get several files like:
$ cat 3100_mysuffix.csv
1,2,3,4,3100_group2,6,7,8
1,2,3,4,3100_group2,6,7,8
or...
$ cat 5400_mysuffix.csv
1,2,3,4,5400_group3,6,7,8
1,2,3,4,5400_group3,6,7,8
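If you also need the date in the filenames, as in the question (the "_filenameconstant_" part is the question's placeholder), a hedged sketch that passes the date in from the shell:
$ d=$(date +%Y%m%d)
$ awk -F, -v d="$d" '{print > (substr($5, 1, 4) "_filenameconstant_" d ".csv")}' Main_Export.csv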

Deleting the lines which match the output of another unix command

I have a file as below
cat file
a 1
a 2
b 3
I want to delete the 'a 1' row and the 'a 2' row, as their first column is the same.
I tried cat file | uniq -f 1 and I'm getting the desired output. But I want to delete those lines from the file itself.
awk 'NR==FNR{a[$1]++;next}a[$1]==1{print}' file file
This one-liner works for your needs, whether your file is sorted or not.
Some explanation:
The one-liner processes the file twice. The 1st pass records in a hash table (key: 1st column, value: number of occurrences) how often each 1st-column value appears; the 2nd pass checks whether the count for the 1st column is 1 and, if so, prints the line, because those lines are unique with respect to column 1.
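The command above prints to standard output; to actually remove the lines from the file, a minimal sketch (assuming overwriting the file via a temporary copy is acceptable):
awk 'NR==FNR{a[$1]++;next}a[$1]==1{print}' file file > file.tmp && mv file.tmp file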

bash script - seeking advice on extracting columns from csv file and how to skip to next line in current while loop

I'm tasked by my company to write a bash script to extract column(s) from a given CSV file. The CSV file contains multiple columns, but one of the columns is rather tricky to extract.
Example columns of the CSV:
(some columns)...Column1,Column2,TrickyColumn,Column3,...(more columns)
Example column data:
...1,2,"fk,this,column",3,....
I looked this up on the internet; most suggestions are to use awk or cut with the delimiter set to a comma to extract the data, but some records contain commas inside that tricky column, which messes up my column extraction. I tried using the quotation mark as a delimiter, but some records contain those in other places as well.
Here is how I do it:
while read -r line
#read each csv record into line
do
commaPosition=$(expr index "$line" ,) #find the position of the first comma
extractedColumn=${line:0:$((commaPosition-1))} #get the column data by using a substring
column1="$extractedColumn" #store the column in a variable for later use
line=${line:$commaPosition} #substring and overwrite the line record, so now the line is: 2,"fk,this,column",3,...
commaPosition=$(expr index "$line" ,)
extractedColumn=${line:0:$((commaPosition-1))}
column2="$extractedColumn"
line=${line:$commaPosition} #line now = "fk,this,column",3,....
line=$(rev <<< "$line") #reverse the line to better extract the tricky column
#reversed line = "3,"nmuloc,siht,kf"
commaPosition=$(expr index "$line" ,)
extractedColumn=${line:0:$((commaPosition-1))}
column3=$(rev <<< "$extractedColumn") #reverse the reversed column back to a readable state and store it
line=${line:$commaPosition} #line now = "nmuloc,siht,kf"
extractedColumn="$line" #extract the tricky column last and reverse it to store
trickyColumn=$(rev <<< "$extractedColumn")
done < csv
As in the code above, I grab the tricky column by extracting the columns in front of it and behind it before getting the tricky column itself. But now I've run into another problem.
Most of the time the records in the CSV file are presented one per line, which is fine for column extraction. But some records span multiple lines, which breaks my script. Like this:
...1,2,"fk,
this,
column",3,....
as opposed to a single line:
...1,2,"fk,this,column",3,....
So the questions are:
Is there any way to extract the columns more efficiently?
How do I modify the script in case a record spans multiple lines?
Forgive my poor English, and thanks for any help given :)
I hope this helps you to solve the problem of the new lines. Using awk you can define what the field separator FS is (in this case a , or a ,\n) and what the record separator RS is (in this case any character that is not a , followed by a \n).
Reference: http://www.thegeekstuff.com/2010/01/8-powerful-awk-built-in-variables-fs-ofs-rs-ors-nr-nf-filename-fnr/
-------------------------------test.csv
(each csv line ends with a blank space and then a linebreak. For example, there is a blank space after the 6, the 12 and the 18)
1,2,3,4,5,6
7,8,9,
10,11,12
13,
14,
15,
16,
17,
18
--------------------------------------------
Command:
awk 'BEGIN { RS="[^,]\n";FS=",|,\n";ORS="\n";OFS=","} { print $1,$2,$3,$4,$5,$6}' test.csv
Output:
1,2,3,4,5,6
7,8,9,10,11,12
13,14,15,16,17,18
For the problem of a , inside a quoted string (i.e. one that does not delimit a new field), I guess you could replace FS with a more complex regex that detects those cases.
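For example, a minimal sketch assuming GNU awk 4.0 or later, whose FPAT variable describes what a field looks like instead of what separates fields (here the tricky column is assumed to be the 3rd field, matching the excerpt in the question):
awk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } { print $3 }' file.csv
With the input 1,2,"fk,this,column",3 this prints "fk,this,column" (quotes included). It does not, by itself, handle records that span multiple lines.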

How to find the number of columns within a row key in hbase

How to find the number of columns within a row key in hbase (since a row can have many columns)
I don't think there's a direct way to do that as each row can have a different number of columns and they may be spread over several files.
If you don't want to bring the whole row to the client to perform the count there, you can write an endpoint coprocessor (HBase's version of a stored procedure, if you like) to perform the calculation on the region server side and only return the result. You can read a little about coprocessors here.
There is a simple way:
Use hbase shell to scan through the table and write the output to an intermediate text file. Because the hbase shell output puts each column of a row onto a new line, we can just count the lines in the text file (minus the first 6 lines, which are hbase shell standard output, and the last 2 lines).
echo "scan 'mytable', {STARTROW=>'mystartrow', ENDROW=>'myendrow'}" | hbase shell > row.txt
wc -l row.txt
Make sure to select the appropriate row keys, as the borders are not inclusive.
If you are only interested into specific columns (families), apply the filters in the hbase shell command above (e.g. FamilyFilter, ColumnRangeFilter, ...).
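For example, a hedged sketch of such a filtered scan (mycf is an assumed column family name; the filter string follows the HBase shell filter language):
echo "scan 'mytable', {STARTROW=>'mystartrow', ENDROW=>'myendrow', FILTER=>\"FamilyFilter(=, 'binary:mycf')\"}" | hbase shell > row.txt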
Thanks to #user3375803; actually, you don't have to use an external txt file. Because I cannot comment on your answer, I leave my answer below:
echo "scan 'mytable', {STARTROW=>'mystartrow', ENDROW=>'myendrow'}" | hbase shell | wc -l | awk '{print $1-8}'
