Format output in bash

I have output in bash that I would like to format. Right now my output looks like this:
1scom.net 1
1stservicemortgage.com 1
263.net 1
263.sina.com 1
2sahm.org 1
abac.com 1
abbotsleigh.nsw.edu.au 1
abc.mre.gov.br 1
ableland.freeserve.co.uk 1
academicplanet.com 1
access-k12.org 1
acconnect.com 1
acconnect.com 1
accountingbureau.co.uk 1
acm.org 1
acsalaska.net 1
adam.com.au 1
ada.state.oh.us 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
aecom.yu.edu 1
aecon.com 1
aetna.com 1
agedwards.com 1
ahml.info 1
The problem with this is that none of the numbers on the right line up. I would like them to look like this:
1scom.net               1
1stservicemortgage.com  1
263.net                 1
263.sina.com            1
2sahm.org               1
Would there be any way to make them look like this without knowing exactly how long the longest domain is? Any help would be greatly appreciated!
The code that outputted this is:
grep -E -o -r "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" $ARCHIVE | sed 's/.*@//' | uniq -ci | sort | sed 's/^ *//g' | awk ' { t = $1; $1 = $2; $2 = t; print; } ' > temp2

ALIGNMENT:
Just use cat with the column command and that's it:
cat /path/to/your/file | column -t
For more details on the column command, refer to http://manpages.ubuntu.com/manpages/natty/man1/column.1.html
EDITED:
View file in terminal:
column -t < /path/to/your/file
(as noted by anishsane)
Export to a file:
column -t < /path/to/your/file > /output/file
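Applied to the pipeline from the question, that could look like this (a sketch, not the only way: it sorts before uniq -c so repeated domains collapse, swaps the columns with awk as before, and pipes through column -t; it assumes the same $ARCHIVE variable as the question). Note that the original pipeline ran uniq -ci before sort, which is why the scattered adelphia.net lines were never collapsed; uniq only merges adjacent duplicates.
grep -E -o -r "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" "$ARCHIVE" | sed 's/.*@//' | sort -f | uniq -ci | awk '{ print $2, $1 }' | column -t > temp2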

Related

Write the frequency of each number in column into next column in bash

I have a column with different numbers in each row of a text file. Now I want the frequency of each number in a new column, with duplicate rows removed, so that each unique number appears in the first column and its frequency in the second column.
Input:
0.32832977
0.31647876
0.31482627
0.31447645
0.31447645
0.31396809
0.31281157
0.312004
0.31102326
0.30771822
0.30560062
0.30413213
0.30373717
0.29636685
0.29622422
0.29590765
0.2949896
0.29414582
0.28841901
0.28820667
0.28291832
0.28243792
0.28156429
0.28043638
0.27872239
0.27833349
0.27825573
0.27669023
0.27645657
0.27645657
0.27645657
0.27645657
Output:
0.32832977 1
0.31647876 1
0.31482627 1
0.31447645 2
0.31396809 1
0.31281157 1
0.312004 1
0.31102326 1
0.30771822 1
0.30560062 1
0.30413213 1
0.30373717 1
0.29636685 1
0.29622422 1
0.29590765 1
0.2949896 1
0.29414582 1
0.28841901 1
0.28820667 1
0.28291832 1
0.28243792 1
0.28156429 1
0.28043638 1
0.27872239 1
0.27833349 1
0.27825573 1
0.27669023 1
0.27645657 4
I tried this command, but it doesn't seem to work:
awk -F '|' '{freq[$1]++} END{for (i in freq) print freq[i], i}' file
Using awk is overkill here, IMO; the standard tools will do the job just fine:
sort -n file | uniq -c | sort
Output:
1 0.32832977
2 0.31447645
4 0.27645657
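Note that uniq -c puts the count first; if you want the value first and its frequency second, as in the desired output, you can swap the fields with awk (a sketch):
sort -n file | uniq -c | awk '{ print $2, $1 }'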
For completeness, this would be the awk solution (no need to set the input field separator to | if your sample input is representative).
awk '{f[$0]++} END{for (i in f) print i, f[i]}' input.txt
0.28820667 1
0.30560062 1
0.312004 1
0.28156429 1
0.28291832 1
0.29636685 1
0.31447645 2
0.30373717 1
0.31482627 1
:
You can, however, set the output field separator to | or, as I did here, to a tab character to format the output:
awk '{f[$0]++} END{OFS="\t"; for (i in f) print i, f[i]}' input.txt
0.28820667 1
0.30560062 1
0.312004 1
0.28156429 1
0.28291832 1
0.29636685 1
0.31447645 2
0.30373717 1
0.31482627 1
:
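Since for (i in f) visits the keys in an unspecified order, you can pipe the result through sort to put the numbers back in descending order (a sketch):
awk '{f[$0]++} END{OFS="\t"; for (i in f) print i, f[i]}' input.txt | sort -rn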

select multiple patterns with grep

I have a file that looks like this:
t # 3-7, 1
v 0 104
v 1 92
v 2 95
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-8, 1
v 0 94
v 1 13
v 2 19
v 3 5
u 0 1 2
u 0 2 2
u 0 3 2
t # 3-9, 1
v 0 94
v 1 13
v 2 19
v 3 7
u 0 1 2
u 0 2 2
u 0 3 2
t corresponds to the header of each block.
I would like to extract multiple patterns from the file and output the transactions that contain all of the required patterns.
I tried the following code:
ps | grep -e 't\|u 0 1 2' file.txt
and it works well to extract the header and the pattern 'u 0 1 2'. However, when I add one more pattern, the output lists only the headers starting with t #. My modified code looks like this:
ps | grep -e 't\|u 0 1 2 && u 0 2 2' file.txt
I tried sed and awk solutions, but they do not work for me as well.
Thank you for your help!
Olha
Use | as the separator before the third alternative, just like the second alternative.
grep -E 't|u 0 1 2|u 0 2 2' file.txt
Also, it doesn't make sense to specify a filename and also pipe ps to grep. If you provide filename arguments, it doesn't read from the pipe (unless you use - as a filename).
You can use grep with multiple -e expressions to grep for more than one thing at a time:
$ printf '%d\n' {0..10} | grep -e '0' -e '5'
0
5
10
Expanding on @kojiro's answer, you'll want to use an array to collect arguments:
mapfile -t lines < file.txt
for line in "${lines[@]}"
do
    arguments+=(-e "$line")
done
grep "${arguments[@]}"
You'll probably need a condition within the loop to check whether the line is one you want to search for, but that's it.
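A sketch of what that loop might look like with such a condition, assuming the patterns live in a separate file (patterns.txt is a hypothetical name, not from the question):
mapfile -t lines < patterns.txt   # hypothetical file with one pattern per line
arguments=()
for line in "${lines[@]}"
do
    # assumed filter: skip blank lines and comment lines
    [[ -n $line && $line != \#* ]] && arguments+=(-e "$line")
done
grep "${arguments[@]}" file.txt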

How to find sum of elements in column inside of a text file (Bash)

I have a log file with lots of unnecessary information. The only important part of that file is a table which describes some statistics. My goal is to have a script which will accept a column name as an argument and return the sum of all the elements in the specified column.
Example log file:
.........
Skipped....
........
WARNING: [AA[409]: Some bad thing happened.
--- TOOL_A: READING COMPLETED. CPU TIME = 0 REAL TIME = 2
--------------------------------------------------------------------------------
----- TOOL_A statistics -----
--------------------------------------------------------------------------------
NAME      Attr1  Attr2  Attr3  Attr4  Attr5
--------------------------------------------------------------------------------
AAA       885    0      0      0      0
AAAA2     1      0      2      0      0
AAAA4     0      0      2      0      0
AAAA8     0      0      2      0      0
AAAA16    0      0      2      0      0
AAAA1     0      0      2      0      0
AAAA8     0      0      23     0      0
AAAAAAA4  0      0      18     0      0
AAAA2     0      0      14     0      0
AAAAAA2   0      0      21     0      0
AAAAA4    0      0      23     0      0
AAAAA1    0      0      47     0      0
AAAAAA1   2      0      26     0
NOTE: Some notes
......
Skipped ......
The expected usage: script.sh Attr1
Expected output:
888
I've tried to find something with sed/awk but failed to figure out a solution.
tldr;
$ cat myscript.sh
#!/bin/sh
logfile=${1}
attribute=${2}
field=$(grep -o "NAME.\+${attribute}" ${logfile} | wc -w)
sed -nre '/NAME/,/NOTE/{/NAME/d;/NOTE/d;s/\s+/\t/gp;}' ${logfile} | \
cut -f${field} | \
paste -sd+ | \
bc
$ ./myscript.sh mylog.log Attr3
182
Explanation:
assign command-line arguments ${1} and ${2} to the logfile and attribute variables, respectively.
with wc -w, count the number of words in the line that contains both NAME and ${attribute}, and assign that count (the field index) to field
with sed
suppress automatic printing (-n) and enable extended regular expressions (-r)
find lines between the NAME and NOTE lines, inclusive
delete the lines that match NAME and NOTE
translate each contiguous run of whitespace to a single tab and print the result
cut using the field index
paste all numbers as an infix summation
evaluate the infix summation via bc
Quick and dirty (without any other spec)
awk -v CountCol=2 '/^[^[:blank:]]/ && NF == 6 { S += $( CountCol) } END{ print S + 0 }' YourFile
With a column name:
awk -v ColName='Attr1' '/NAME/ && NF == 6 { for(i=1;i<=NF;i++){ if ($i == ColName) CountCol = i } } /^[^[:blank:]]/ && NF == 6 && CountCol { S += $(CountCol) } END{ print S + 0 }' YourFile
You should add a header/trailer filter to avoid noisy lines (a flag suits this perfectly), but lacking information about the structure needed to set such a flag, I use a simple field count (assuming text fields evaluate to 0, so they don't change the sum when counted).
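A sketch of such a flag, assuming the table always runs from the NAME header line to the NOTE trailer, as in the sample log:
awk -v CountCol=2 '
    /^NAME/ { inTable = 1; next }            # header row: start summing below it
    /^NOTE/ { inTable = 0 }                  # trailer row: stop summing
    inTable && NF == 6 { S += $(CountCol) }  # NF check skips the dashed rule lines
    END { print S + 0 }
' YourFile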
$ awk -v col='Attr3' '/NAME/{for (i=1;i<=NF;i++) f[$i]=i} col in f{sum+=$(f[col]); if (!NF) {print sum+0; exit} }' file
182

Paste multiple files while excluding first column

I have a directory with 100 files of the same format:
> S43.txt
Gene S43-A1 S43-A10 S43-A11 S43-A12
DDX11L1 0 0 0 0
WASH7P 0 0 0 0
C1orf86 0 15 0 1
> S44.txt
Gene S44-A1 S44-A10 S44-A11 S44-A12
DDX11L1 0 0 0 0
WASH7P 0 0 0 0
C1orf86 0 15 0 1
I want to make a giant table containing all the columns from all the files. However, when I do this:
paste S88.txt S89.txt | column -d '\t' >test.merge
Naturally, the file contains two 'Gene' columns.
How can I paste ALL the files in the directory at once?
How can I exclude the first column from all the files after the first one?
Thank you.
If you're using bash, you can use process substitution in paste:
paste S43.txt <(cut -d ' ' -f2- S44.txt) | column -t
Gene     S43-A1  S43-A10  S43-A11  S43-A12  S44-A1  S44-A10  S44-A11  S44-A12
DDX11L1  0       0        0        0        0       0        0        0
WASH7P   0       0        0        0        0       0        0        0
C1orf86  0       15       0        1        0       15       0        1
(cut -d$'\t' -f2- S44.txt) reads all but the first column of S44.txt.
To do this for all the files matching S*txt, use this snippet:
arr=(S*txt)
file="${arr[0]}"
for f in "${arr[@]:1}"; do
    paste "$file" <(cut -d$'\t' -f2- "$f") > _file.tmp && mv _file.tmp file.tmp
    file=file.tmp
done
# Clean up final output:
column -t file.tmp
Use join with the --nocheck-order option:
join --nocheck-order S43.txt S44.txt | column -t
(the column -t command to make it pretty)
However, as you say you want to join all the files, and join only takes 2 at a time, you should be able to do this (assuming your shell is bash):
tmp=$(mktemp)
files=(*.txt)
cp "${files[0]}" result.file
for file in "${files[#]:1}"; do
join --nocheck-order result.file "$file" | column -t > "$tmp" && mv "$tmp" result.file
done
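For a small, fixed set of files you could also skip the loop and use one paste call with several process substitutions (a sketch; S45.txt is a hypothetical third file, and cut -f2- assumes the files are tab-separated):
paste S43.txt <(cut -f2- S44.txt) <(cut -f2- S45.txt) | column -t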

sum of column in text file using shell script

I have a file like this:
1814 1
2076 2
2076 1
3958 1
2076 2
2498 3
2858 2
2858 1
1818 2
1814 1
2423 1
3588 12
2026 2
2076 1
1814 1
3576 1
2005 2
1814 1
2107 1
2810 1
I would like to generate a report like this:
1814 3
2076 6
3958 1
2858 3
Basically, calculate the total for each unique value in column 1.
Using awk:
awk '{s[$1] += $2} END{ for (x in s) print x, s[x] }' input
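The for (x in s) loop prints the keys in an unspecified order; to get a report sorted by key, like the Bash version below, pipe it through sort (a sketch):
awk '{s[$1] += $2} END{ for (x in s) print x, s[x] }' input | sort -n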
Pure Bash:
declare -a sum
while read key val ; do
((sum[key]+=val))
done < "$infile"
for key in "${!sum[@]}"; do
printf "%4d %4d\n" $key ${sum[$key]}
done
The output is sorted:
1814    4
1818    2
2005    2
2026    2
2076    6
2107    1
2423    1
2498    3
2810    1
2858    3
3576    1
3588   12
3958    1
Perl solution:
perl -lane '$s{$F[0]} += $F[1] }{ print "$_ $s{$_}" for keys %s' INPUT
Note that the output is different from the one you gave.
sum totals for each primary key (integers only)
for key in $(cut -d' ' -f1 test.txt | sort -u)
do
    echo $key $(echo $(grep "^$key " test.txt | cut -d' ' -f2 | tr '\n' +)0 | bc)
done
simply sum a column of integers
echo $(cut -d' ' -f2 test.txt | tr '\n' +)0 | bc
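For the simple column sum, awk can also do it in one pass without bc (a sketch):
awk '{ s += $2 } END { print s + 0 }' test.txt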
