add row number to the last column using awk or bash - bash

Input file format:
name id department
xyz 20 cic
abc 25 cis
Output should look like:
name id department
xyz 20 cic 1
abc 25 cis 2
Note: all the fields are tab separated.
Appreciate any help!!

$ awk -F'\t' 'NR>1{$0=$0"\t"NR-1} 1' file
name id department
xyz 20 cic 1
abc 25 cis 2

You should try this:
awk '{printf "%s\t%s\n",$0,NR}' File_name
Explanation:
$0 = the entire current line
NR = the current record (line) number
%s = printf format specifier for a string
\t = horizontal tab
\n = newline
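Note that the command above appends NR to every line, including the header. If the header should stay unnumbered, as in the expected output, one variation (a sketch, assuming tab-separated input as in the question) is:

```shell
# Print the header line as-is, then append NR-1 (the data-row index,
# tab-separated) to every subsequent line.
awk 'NR==1{print; next} {printf "%s\t%s\n", $0, NR-1}' file
```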

A variation on Ed Morton's answer:
awk -F'\t' -v OFS='\t' 'NR>1 { $(NF+1)=NR-1} 1' file
This sets the output field separator using the -v option, then simply adds a new field to the current record by setting $(NF+1).
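Since the question also allows plain bash, the same result can be sketched with a while-read loop (assuming a tab-separated file named file):

```shell
# Read the file line by line; print the header unchanged, then append
# a tab and a running counter to every data line.
n=0
while IFS= read -r line; do
    if [ "$n" -eq 0 ]; then
        printf '%s\n' "$line"
    else
        printf '%s\t%s\n' "$line" "$n"
    fi
    n=$((n + 1))
done < file
```

For large files the awk versions will be considerably faster than a bash loop.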

Related

BASH - Split file into several files based on conditions

I have a file (input.txt) with the following structure:
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
...
I would like to split this file into multiple files (day.txt; month.txt; ...). Each new text file would contain all "header" lines (the ones starting with >) and their content (the lines between two header lines).
day.txt would therefore be:
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
and month.txt:
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
I cannot use split -l in this case because the number of lines is not the same for each category (day, month, etc.). However, each sub-category has the same number of lines (=3).
EDIT: As per the OP's request, adding one more solution now.
awk -F'[>_]' '/^>/{file=$2".txt"} {print > file}' Input_file
Explanation:
awk -F'[>_]' ' ##Creating field separator as > or _ in current lines.
/^>/{ file=$2".txt" } ##Searching a line which starts with > if yes then creating a variable named file whose value is 2nd field".txt"
{ print > file } ##Printing current line to variable file(which will create file name of variable file's value).
' Input_file ##Mentioning Input_file name here.
The following awk may help you with the same.
awk '/^>day/{file="day.txt"} /^>month/{file="month.txt"} {print > file}' Input_file
You can set the record separator to > and then just set the file name based on the category given by $1.
$ awk -v RS=">" 'NF {f=$1; sub(/_.*$/, ".txt", f); printf ">%s", $0 > f}' input.txt
$ cat day.txt
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
$ cat month.txt
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
Here's a generic solution for >name_number format
$ awk 'match($0, /^>[^_]+_/){k = substr($0, RSTART+1, RLENGTH-2);
if(!(k in a)){close(op); a[k]; op=k".txt"}}
{print > op}' ip.txt
match($0, /^>[^_]+_/) if line matches >name_ at start of line
k = substr($0, RSTART+1, RLENGTH-2) save the name portion
if(!(k in a)) if the key is not found in array
a[k] add key to array
op=k".txt" output file name
close(op) close the previously opened file, in case there are too many output files to keep open
print > op print input record to filename saved in op
Since each subcategory is composed of the same number of lines, you can use grep's -A / --after-context option to specify the number of lines to match after a header.
So if you know in advance the list of categories, you just have to grep the headers of their subcategories to redirect them with their content to the correct file :
lines_by_subcategory=3 # number of lines *after* a subcategory's header
for category in "month" "day"; do
grep ">$category" -A $lines_by_subcategory input.txt >> "$category.txt"
done
Note that this isn't the most efficient solution, as it must scan the input once for each category. Other solutions could instead read the content and redirect each subcategory to its respective file in a single pass.
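For illustration, such a single-pass approach can also be written in plain bash (a sketch, assuming headers of the form >name_number in input.txt; note that >> appends, so stale .txt files from earlier runs should be removed first):

```shell
# Track the current output file, switching whenever a header line
# (starting with >) is seen; every line goes to the current file.
out=""
while IFS= read -r line; do
    case $line in
        ">"*)
            out=${line#>}         # drop the leading >
            out="${out%%_*}.txt"  # keep the category name, add .txt
            ;;
    esac
    printf '%s\n' "$line" >> "$out"
done < input.txt
```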

Parsing key value in a csv file using shell script

Given csv input file
Id Name Address Phone
---------------------
100 Abc NewYork 1234567890
101 Def San Antonio 9876543210
102 ghi Chicago 7412589630
103 GHJ Los Angeles 7896541259
How do we grep/command for the value using the key?
If the key is 100, the expected output is NewYork.
You can try this:
grep 100 filename.csv | cut -d, -f3
Output:
NewYork
This will search the whole file for the string 100 and return the value in the 3rd comma-separated column of each matching row. Note that it matches 100 anywhere on the line, not just in the Id column, and assumes the file really is comma-separated.
With GNU grep (assuming fixed-width columns):
grep -Po '^100.....\K...........' file
or shorter:
grep -Po '^100.{5}\K.{11}' file
Output:
NewYork
Awk splits lines by whitespace sequences (by default).
You could use that to write a condition on the first column.
In your example input, it looks like not CSV but fixed-width columns (except the header). If that's the case, then you can extract the name of the city as a substring:
awk '$1 == 100 { print substr($0, 9, 11); }' input.csv
Here 9 is the starting position of the city column, and 11 is its length.
If on the other hand your input file is not what you pasted, but really CSV (comma separated values), and if there are no other embedded commas or newline characters in the input, then you can write like this:
awk -F, '$1 == 100 { print $3 }' input.csv
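If the key isn't fixed, the same idea can be parameterized with a shell variable (a sketch; assumes the file really is comma-separated and is named input.csv):

```shell
# Pass the key into awk with -v and compare it against the first field;
# print the third field (the city) of any matching row.
key=100
awk -F, -v k="$key" '$1 == k { print $3 }' input.csv
```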

How to format the output from a database in shell scripting

I have written a shell script that extracts records from a database. Below is the result, in the file xyz.txt:
NAME
COUNT(*)
Ben 7
Tim 4
BPNAME
COUNT(*)
Mark 7
Jhon 4
But how do I format it as below, so I can send an email displaying the same?
NAME COUNT
Ben 7
Tim 4
Mark 7
Jhon 4
awk solution:
awk 'BEGIN{ print "NAME","COUNT" }!/NAME|COUNT/{ print $1,$2 }' xyz.txt | column -t
The output:
NAME COUNT
Ben 7
Tim 4
Mark 7
Jhon 4
You can achieve this task with awk alone, formatting included, instead of piping to column -t as already suggested.
In the BEGIN section of awk you can always print a header. For formatted printing you can switch from print to printf.
After the BEGIN section has been executed, printf the 1st and 2nd columns only if the line does not match (!) the pattern NAME or (|) COUNT.
$ awk 'BEGIN{ printf "NAME\tCOUNT\n" } !/NAME|COUNT/ { printf "%s\t%s\n", $1,$2 }' file

Remove the lines in the file which have only a number in shell script

I have one file which contains a sequence number on every line. I want to remove the lines which have only a number.
I tried (to no avail):
$ cat -n input_file > output_file
My file contains:
1 name
2
3 Age
4
5 state
6 city
I want the output as:
1 name
3 Age
5 state
6 city
A simple awk formula would do:
cat input_file | awk ' ($2 != "") { print $0 } '
Edit: Cleaner way from Tom's comment
awk ' ($2 != "") { print $0 } ' input_file
The easiest way would be to use grep and look for lines containing any letters.
testfile.txt:
1 name
2
3 Age
4
5 State
6 city
Then try:
grep '[a-zA-Z]' testfile.txt
1 name
3 Age
5 State
6 city
Starting with this file:
name
Age
state
city
You can skip the empty lines and add the numbers like this:
awk 'NF { print NR, $0 }' file
When the line contains any non-blank characters (i.e. anything other than spaces or tabs), print the line number followed by the contents of the line.
If the numbers are in the input file already, you can use this:
awk 'NF > 1' file
This prints any line with more than one field.
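Matching the question title literally ("remove the lines which have only a number"), a grep sketch that drops purely numeric lines would also work:

```shell
# Drop lines that consist solely of digits, optionally surrounded
# by whitespace; everything else passes through unchanged.
grep -vE '^[[:space:]]*[0-9]+[[:space:]]*$' input_file
```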

Print lines whose 1st and 4th column differ

I have a file with a bunch of lines of this form:
12 AAA 423 12 BBB beta^11 + 3*beta^10
18 AAA 1509 18 BBB -2*beta^17 - beta^16
18 AAA 781 12 BBB beta^16 - 5*beta^15
Now I would like to print only lines where the 1st and the 4th column differ (the columns are space-separated) (the values AAA and BBB are fixed). I know I can do that by getting all possible values in the first column and then use:
for i in $values; do
cat file.txt | grep "^$i" | grep -v " $i BBB"
done
However, this runs through the file as many times as there are distinct values in the first column. Is there a way to do that in a single pass? I think I can handle the comparison; my main problem is that I have no idea how to extract the space-separated columns.
This is something quite straightforward for awk:
awk '$1 != $4' file
With awk, you refer to the first field with $1, the second with $2 and so on. This way, you can compare the first and the fourth with $1 != $4. If this is true (that is, $1 and $4 differ), awk performs its default action: print the current line.
For your sample input, this works:
$ awk '$1 != $4' file
18 AAA 781 12 BBB beta^16 - 5*beta^15
Note you can define a different field separator with -v FS="...". This way, you can tell awk that your lines contain fields tab / comma / ... separated. All together it would be like this: awk -v FS="\t" '$1 != $4' file.
