bash: replace tabs with spaces for alignment with column - bash

I am trying to display .tsv files nicely aligned as columns, while limiting the display to the current screen width. I can do this in the following way; it works in general, but it will fail if the input contains the particular character that I pass to column as the separator. My current solution works as follows:
bash$ cat sample.tsv | tr '\t' '#' | column -n -t -s # | cut -c-`tput cols`
I tried using the tab character itself directly but could not make it work. With column's default options, any whitespace (not just tabs) is used as a separator, so that does not work for me either. I would be thankful for any better alternative to the above.
PS:
A sample is shown below
bash:~$ cat sample.tsv
Sl Name Number Status
1 W Jhon +1 234 4454 y
2 M Walter +2 232 453 n
3 S M Ray +1 343 453 y
bash:~$ cat sample.tsv | tr '\t' '#' | column -n -t -s # | cut -c-`tput cols`
Sl  Name      Number       Status
1   W Jhon    +1 234 4454  y
2   M Walter  +2 232 453   n
3   S M Ray   +1 343 453   y
bash:~$ cat sample.tsv | column -n -t | cut -c-`tput cols`
Sl  Name  Number  Status
1   W     Jhon    +1      234  4454  y
2   M     Walter  +2      232  453   n
3   S     M       Ray     +1   343   453  y
bash:~$

You can tell column to use the tab character as the column delimiter with -s:
column -t -s $'\t' -n sample.tsv
Sl  Name      Number       Status
1   W Jhon    +1 234 4454  y
2   M Walter  +2 232 453   n
3   S M Ray   +1 343 453   y
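To also limit the display to the screen width, as in the question, the same approach combines with the original cut step; a sketch, no longer needing tr or a placeholder character such as #, so input containing # is handled correctly:
column -t -s $'\t' -n sample.tsv | cut -c-"$(tput cols)"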

Related

A UNIX Command to Find the Name of the Student who has the Second Highest Score

I am new to Unix programming. Could you please help me solve this question.
For example, if the input file has the content below:
RollNo Name Score
234 ABC 70
567 QWE 12
457 RTE 56
234 XYZ 80
456 ERT 45
The output will be
ABC
I tried something like this
sort -k3,3 -rn -t" " | head -n2 | awk '{print $2}'
Using awk
awk 'NR>1{arr[$3]=$2} END {n=asorti(arr,arr_sorted); print arr[arr_sorted[n-1]]}'
Demo:
$cat file.txt
RollNo Name Score
234 ABC 70
567 QWE 12
457 RTE 56
234 XYZ 80
456 ERT 45
$awk 'NR>1{arr[$3]=$2} END {n=asorti(arr,arr_sorted); print arr[arr_sorted[n-1]]}' file.txt
ABC
$
Explanation:
NR>1 --> Skip the first record (the header line)
{arr[$3]=$2} --> Create an associative array with the score as index and the name as value
END <-- executed after the last record has been read
n=asorti(arr,arr_sorted) <-- Sort array arr on its index values (i.e. the scores) and store them in arr_sorted. n = number of elements in the array
print arr[arr_sorted[n-1]] <-- n-1 points to the second-last value in arr_sorted (i.e. the second-highest score), and arr[...] prints the corresponding name
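Note that asorti sorts the indices as strings by default, so a score like 9 would sort after 80. With gawk 4.0 or later you can request a numeric index sort explicitly; a variant of the same answer:
awk 'NR>1{arr[$3]=$2} END {n=asorti(arr, arr_sorted, "@ind_num_asc"); print arr[arr_sorted[n-1]]}' file.txt
ABC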
Your attempt is almost correct; it just needs one more step to pick out the second line instead of the first two.
Try this... it will work:
sort -k3,3 -rn -t" " file.txt | head -n2 | tail -n1 | awk '{print $2}'
Keeping head -n2 and adding tail -n1 leaves only the second-highest-scoring line, from which awk prints the name.
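Equivalently, awk can pick the second line of the sorted output directly; a sketch of the same idea, assuming the data is in file.txt as in the demo above:
sort -k3,3 -rn file.txt | awk 'NR==2 {print $2}'
ABC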

How to sum up on a column

I have a file that looks like this :
A B C 128 D
Z F R 18 -
M W A 1 -
T B D 21 P
Z F R 11 -
L W A 10 D
I am looking for a way to sum up column 4 for the lines whose column 5 is D; in this example that would be 128 + 10 = 138.
I managed to sum up the whole 4th column with this command:
cat file.txt |awk '{total+= $4} END {print total}'
You just omitted the pattern to select which lines your action applies to.
awk '$5 == "D" {total += $4} END {print total}' file.txt
In awk, the pattern is applied to each line of input, and if the pattern matches, the action is applied. If there is no pattern (as in your attempt), the line is unconditionally processed.
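If you later want a total for every value in column 5, not just D, a small sketch that keeps one running sum per group:
awk '{sums[$5] += $4} END {for (g in sums) print g, sums[g]}' file.txt
This prints one line per distinct value of column 5 with its total, in arbitrary order (the same grouping idea as the datamash answer below).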
A solution with datamash and sort:
cat file.txt | sort -k5 | datamash -W -g5 sum 4
sort -k5 sorts according to the 5th column.
datamash uses -W to specify that whitespace is the separator, -g5 to group by 5th column, and finally sum 4 to get sum of 4th column.
It gives this output:
- 30
D 138
P 21
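If you only want the total for D, as in the question, you can filter first and skip the grouping; a sketch, still using datamash for the sum:
awk '$5 == "D"' file.txt | datamash -W sum 4
138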

Count duplicated couple of lines

I have a configuration file with this format:
cod 11
loc1 23
pto1 33
loc2 55
pto2 66
cod 12
loc1 55
pto1 66
loc2 88
pto2 77
...
I want to count how many times a pair of numbers appears in a loc/pto sequence (independently of the loc/pto number). In the example, the pair 55/66 appears 2 times (once as loc1/pto1 and once as loc2/pto2).
I have googled around and tried some combinations of grep, uniq and awk, but I only managed to count single duplicated lines or numbers. I read the man pages of those commands without finding any clue relevant to my problem.
You could use the following:
$ sort file | uniq -f1 -dc
2 loc1 55
2 pto1 66
-f1 skips the first field when comparing lines
-dc prints each duplicated line once, together with its count
Despite no visible effort on the part of the OP, this was an interesting question to work out.
awk '{array[NR]=$2} END {for (i=1 ; i < NR ; i++) print array[i] "," array[i+1]}' file | sort | uniq -c
Output-
1 11,23
1 12,55
1 23,33
1 33,55
2 55,66
1 66,12
1 66,88
1 88,77
The output tells you that 55 is followed by 66 twice. Other pairs only occur once.
Explanation-
I define an array in awk whose elements are the ith number in the second column. The part after the END concatenates the ith and (i+1)th elements. Then sort | uniq -c shows whether these pairs occur more than once.
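If you want to count the loc/pto pairs themselves rather than all consecutive values, a sketch that pairs each loc value with the pto value on the line that follows it (the key names loc and pto are taken from the sample configuration; adjust the patterns if your real keys differ):
awk '$1 ~ /^loc/ {loc = $2; next} $1 ~ /^pto/ {pairs[loc "/" $2]++} END {for (p in pairs) print pairs[p], p}' file
2 55/66
1 23/33
1 88/77
The order of the summary lines is arbitrary.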
If you want to know how many times a duplicate number appeared in the file:
awk '{print $2}' <filename> | sort | uniq -dc
Output:
2 55
2 66
If you want to know how many times a number appeared in the file regardless of being duplicate or not:
awk '{print $2}' <filename> | sort | uniq -c
Output:
1 11
1 12
1 23
1 33
2 55
2 66
1 77
1 88
If you want to print the full lines whose second column holds a duplicated value:
awk '{print $2}' <filename> | sort | uniq -d | grep -F -f - <filename>
Output:
loc2 55
pto2 66
loc1 55
pto1 66
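Note that grep -F matches the numbers anywhere on the line, so a value that also appears in another column (or as a substring of a longer number) could be picked up too. A sketch that restricts the check to the second column by reading the file twice with awk:
awk 'NR == FNR {count[$2]++; next} count[$2] > 1' <filename> <filename>
The first pass counts each second-column value; the second pass prints the lines whose value occurred more than once.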

How to grep two columns from a single file

cat Error00
4  0 375
4 2001 21
4 2002 20
cat Error01
4 0 465
4 2001 12
4 2002 40
4 2016 1
I want output as below
4 0 375 465
4 2001 21 12
4 2002 20 40
4 2016 - 1
I am using the query below. The problem is that I am not able to make grep handle two fields because of the space between them.
Please suggest how I can get rid of this.
keylist=$(awk '{print $1,$2}' Error0[0-1] | sort | uniq)
for key in ${keylist} ; do
echo ${key}
val_a=$(grep "^${key}" Error00 | awk '{print $3}') ;val_a=${val_a:---}
val_b=$(grep "^${key}" Error01 | awk '{print $1,$2}') ; val_b=${val_b:--- --}
echo $key ${val_a} >>testreport
done
I am getting the output as below:
4 375 465
0
4 21 12
2001
4 20 20
2002
4 - 1
2016
A single awk one liner can handle this easily:
awk 'FNR==NR{a[$1,$2]=$3;next}{print $1,$2,(a[$1,$2]?a[$1,$2]:"-"),$3}' err0 err1
4 0 375 465
4 2001 21 12
4 2002 20 40
4 2016 - 1
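For readability, here is the same logic as the one-liner above, spelled out with comments (a sketch, functionally equivalent):
awk '
    FNR == NR {        # first file (err0): remember column 3, keyed by columns 1 and 2
        a[$1,$2] = $3
        next
    }
    {                  # second file (err1): print the key, the stored value (or "-"), and column 3
        print $1, $2, (a[$1,$2] ? a[$1,$2] : "-"), $3
    }
' err0 err1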
For formatted output you can use printf instead of print, as Jonathan Leffler suggests:
printf "%s %-6s %-6s %s\n",$1,$2,(a[$1,$2]?a[$1,$2]:"-"),$3
4 0      375    465
4 2001   21     12
4 2002   20     40
4 2016   -      1
However a general solution is to use column -t for a nice table output:
awk '{....}' err0 err1 | column -t
4  0     375  465
4  2001  21   12
4  2002  20   40
4  2016  -    1
grep is not really the right tool for this job. You can either play with awk or Perl (or Python, or …), or you can use join. However, join only joins on a single column at a time, and you appear to need to join on two columns. So, we're going to have to massage the data so that it will work with join. I'm about to assume you're using bash and so have process substitution available. You can do the job without, but it is fiddlier and involves temporary files (and traps to clean them up, etc).
The key to the join will be to replace the blank between the first two columns with a colon (or any other convenient character — control-A would work fine too), then join the files on column 1. The inputs must be sorted; in the output, the colon must be replaced with a blank again.
$ join -o 0,1.2,2.2 -a 1 -a 2 -e '-' \
> <(sed 's/  */:/' Error00 | sort) \
> <(sed 's/  */:/' Error01 | sort) |
> sed 's/:/ /'
4 0 375 465
4 2001 21 12
4 2002 20 40
4 2016 - 1
$
The 's/  */:/' operation replaces the first sequence of one or more blanks with a colon; the input data has two blanks between the 4 and the 0 in the first line of Error00. The input to join must be in sorted order of the joining field, here the first field. The output is the join field, the second column of Error00 and the second column of Error01 (remembering that means the second column after the first two have been fused by the colon). If there's an unmatched line in the first file, generate an output line (-a 1); ditto for the second file; and for the missing fields, insert a dash (-e '-'). The final sed removes the colon that was added.
If you want the data formatted, pipe it through awk.
$ join -o 0,1.2,2.2 -a 1 -a 2 -e '-' \
> <(sed 's/  */:/' Error00 | sort) \
> <(sed 's/  */:/' Error01 | sort) |
> sed 's/:/ /' |
> awk '{printf("%s %-6s %-6s %s\n", $1, $2, $3, $4)}'
4 0      375    465
4 2001   21     12
4 2002   20     40
4 2016   -      1
$

Counting the number of 10-digit numbers in a file

I need to count the total number of instances in which a 10-digit number appears within a file. All of the numbers have leading zeros, e.g.:
This is some text. 0000000001
Returns:
1
If the same number appears more than once, it is counted again, e.g.:
0000000001 This is some text.
0000000010 This is some more text.
0000000001 This is some other text.
Returns:
3
Sometimes there are no spaces between the numbers, but each continuous string of 10 digits should be counted:
00000000010000000010000000000100000000010000000001
Returns:
5
How can I determine the total number of 10-digit numbers appearing in a file?
Try this:
grep -o '[0-9]\{10\}' inputfilename | wc -l
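A quick check against the run-together example from the question (a sketch, piping the string in instead of reading a file):
$ printf '%s\n' 00000000010000000010000000000100000000010000000001 | grep -o '[0-9]\{10\}' | wc -l
5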
The last requirement - that you need to count multiple numbers per line - excludes grep; as far as I know, it can only count per line.
Edit: Obviously, I stand corrected by Nate :) grep's -o option is exactly what I was looking for.
You can however do this easily with sed like this:
$ cat mkt.sh
sed -r -e 's/[^0-9]/./g' -e 's/[0-9]{10}/num /g' -e 's/[0-9.]//g' $1
$ for i in *.txt; do echo --- $i; cat $i; echo --- number count; ./mkt.sh $i|wc -w; done
--- 1.txt
This is some text. 0000000001
--- number count
1
--- 2.txt
0000000001 This is some text.
0000000010 This is some more text.
0000000001 This is some other text.
--- number count
3
--- 3.txt
00000000010000000010000000000100000000010000000001
--- number count
5
--- 4.txt
1 2 3 4 5 6 6 7 9 0
11 22 33 44 55 66 77 88 99 00
123456789 0
--- number count
0
--- 5.txt
1.2.3.4.123
1234567890.123-AbceCMA-5553///q/\1231231230
--- number count
2
$
This might work for you:
cat <<! >test.txt
0000000001 This is some text.
0000000010 This is some more text.
0000000001 This is some other text.
00000000010000000010000000000100000000010000000001
1 a 2 b 3 c 4 d 5 e 6 f 7 g 8 h 9 i 0 j
12345 67890 12 34 56 78 90
!
sed 'y/X/ /;s/[0-9]\{10\}/\nX\n/g' test.txt | sed '/X/!d' | sed '$=;d'
8
"I need to count the total number of instances in which a 10-digit number appears within a file. All of the numbers have leading zeros"
So I think this might be more accurate:
$ grep -o '0[0-9]\{9\}' filename | wc -l
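An awk alternative that counts the same matches as the first grep -o answer, assuming an awk with POSIX interval support such as gawk (gsub returns the number of non-overlapping substitutions it made on each line):
awk '{n += gsub(/[0-9]{10}/, "")} END {print n}' filename
Against the three examples in the question this prints 1, 3 and 5 respectively.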
