Adjusting column padding in bash - bash

Any idea how can I put the output as the following?
Input:
1 GATTT
2 ATCGT
Desired output:
1 GATTT
2 ATCGT
I tried the following and it did not work
cut -c7,1-6,8-

$ awk -v OFS='\t' '{print $1,$2}' input
1 GATTT
2 ATCGT
or
$ awk '{print $1 "\t" $2}' input

SED can also be used:
sed "s/[:digit:]* .*/ &/g" input
1 GATTT
2 ATCGT

I'm assuming that the original whitespace were 6 spaces based on your cut command. The easiest way to knock this out with simple bash commands is using a tab for separation on the output.
echo " 1 GATTT" | cut -d ' ' -f 7- | tr ' ' '\t'
The cut command makes the delimeter a space character and takes from field 7 on. Then the tr (translate) command converts the remaining space to a tab.

Related

Count number of Special Character in Unix Shell

I have a delimited file that is separated by octal \036 or Hexadecimal value 1e.
I need to count the number of delimiters on each line using a bash shell script.
I was trying to use awk, not sure if this is the best way.
Sample Input (| is a representation of \036)
Example|Running|123|
Expected output:
3
awk -F'|' '{print NF-1}' file
Change | to whatever separator you like. If your file can have empty lines then you need to tweak it to:
awk -F'|' '{print (NF ? NF-1 : 0)}' file
You can try
awk '{print gsub(/\|/,"")}'
Simply try
awk -F"|" '{print substr($3,length($3))}' OFS="|" Input_file
Explanation: Making field separator -F as | and then printing the 3rd column by doing $3 only as per your need. Then setting OFS(output field separator) to |. Finally mentioning Input_file name here.
This will work as far as I know
echo "Example|Running|123|" | tr -cd '|' | wc -c
Output
3
This should work for you:
awk -F '\036' '{print NF-1}' file
3
-F '\036' sets input field delimiter as octal value 036
Awk may not be the best tool for this. Gnu grep has a cool -o option that prints each matching pattern on a separate line. You can then count how many matching lines are generated for each input line, and that's the count of your delimiters. E.g. (where ^^ in the file is actually hex 1e)
$ cat -v i
a^^b^^c
d^^e^^f^^g
$ grep -n -o $'\x1e' i | uniq -c
2 1:
3 2:
if you remove the uniq -c you can see how it's working. You'll get "1" printed twice because there are two matching patterns on the first line. Or try it with some regular ascii characters and it becomes clearer what the -o and -n options are doing.
If you want to print the line number followed by the field count for that line, I'd do something like:
$grep -n -o $'\x1e' i | tr -d ':' | uniq -c | awk '{print $2 " " $1}'
1 2
2 3
This assumes that every line in the file contains at least one delimiter. If that's not the case, here's another approach that's probably faster too:
$ tr -d -c $'\x1e\n' < i | awk '{print length}'
2
3
0
0
0
This uses tr to delete (-d) all characters that are not (-c) 1e or \n. It then pipes that stream of data to awk which just counts how many characters are left on each line. If you want the line number, add " | cat -n" to the end.

Subtract length element two columns

I've a file from which I get two columns: cut -d $'\t' -f 4,5 file.txt
Now I would like to get the difference in length of each element between column 1 and 2.
Input from cut command
A T
AA T
AC TC
A CT
What I would expect
0
1
0
-1
Using awk.
awk ' {print length($1) - length($2)} ' cutoutput.txt
Or awk on the original file you can simply do:
awk ' {print length($4) - length($5)} ' file.txt
You probably can do this only with awk without using cut. Since you don't have the original input file, I would use the following with a | to your cut command:
cut -d $'\t' -f 4,5 file.txt | \
awk '{for (i=1;i<NF;i++) s=length($i)-length($NF); printf s"\n"}'

Find unique words

Suppose there is one file.txt in which below content text is written:
ABC/xyz
ABC/xyz/rst
EFG/ghi
I need to write a shell script that can extract the first unique word before the first /.
So as output, I want ABC and EFG to be written in one file.
You can extract the first word with cut (slash as delimiter), then pipe to sort with the -u (for "unique") option:
$ cut -d '/' -f 1 file.txt | sort -u
ABC
EFG
To get the output into a file, just redirect by appending > filename to the command. (Or pipe to tee filename to see the output and get it in a file.)
Try this :
cat file.txt | tr -s "/" ' ' | awk -F " " '{print $1}' | sort | uniq > outfile.txt
Another interesting variation:
awk -F'/' '{print $1 |" sort -u" }' file.txt > outfile.txt
Not that it matters here, but being able to pipe and redirect within awk can be very handy.
Another easy way:
cut -d"/" -f1 file.txt|uniq > out.txt
You can use a mix of cut and sort like so:
cut -d '/' -f 1 file.txt | sort -u > newfile.txt
The first line grabs any string until a slash / and outputs it into newfile.txt.
The second line sorts the text, removing any duplicate strings you might have.

Formatting numbers to fixed width

I have this output:
30.1.2003
3.3.2003
25.12.2003
I want to make print each value except the year in two digits (might have leading values). i.e
30.01.2003
03.03.2003
25.12.2003
You can use printf:
echo 30.1.2003 | tr . ' ' | xargs printf '%02d.%02d.%04d\n'
awk -F. -v OFS="." '{for(i=1;i<=NF;i++)$i=(length($i)<2?"0":"")$i}7' file
if your output was from some process, do :
yourApp|awk -F. -v OFS="." '{for(i=1;i<=NF;i++)$i=(length($i)<2?"0":"")$i}7'

how awk takes the result of a unix command as a parameter?

Say there is an input file with tabs delimited field, the first field is integer
1 abc
1 def
1 ghi
1 lalala
1 heyhey
2 ahb
2 bbh
3 chch
3 chchch
3 oiohho
3 nonon
3 halal
3 whatever
First, i need to compute the counts of the unique values in the first field, that will be:
5 for 1, 2 for 2, and 6 for 3
Then I need to find the max of these counts, in this case, it's 6.
Now i need to pass "6" to another awk script as a parmeter.
I know i can use command below to get a list of count:
cut -f1 input.txt | sort | uniq -c | awk -F ' ' '{print $1}' | sort
but how do i get the first count number and pass it to the next awk command as a parameter not as an input file?
This is nothing very specific for awk.
Either a program can read from stdin, then you can pass the input with a pipe:
prg1 | prg2
or your program expects input as parameter, then you use
prg2 $(prg1)
Note that in both cases prg1 is processed before prg2.
Some programs allow both possibilities, while a huge amount of data is rarely passed as argument.
This AWK script replaces your whole pipeline:
awk -v parameter="$(awk '{a[$1]++} END {for (i in a) {if (a[i] > max) {max = a[i]}}; print max}' inputfile)" '{print parameter}' otherfile
where '{print parameter}' is a standin for your other AWK script and "otherfile" is the input for that script.
Note: It is extremely likely that the two AWK scripts could be combined into one which would be less of a hack than doing it in a way such as that outlined in your question (awk feeding awk).
You can use the shell's $() command substitution:
awk -f script -v num=$(cut -f1 input.txt | sort | uniq -c | awk -F ' ' '{print $1}' | sort | tail -1) < input_file
(I added the tail -1 to ensure that at most one line is used.)

Resources