Move Second Column of Each Row to a New Next Row - bash

I can move every second row into the second column of the previous row by:
awk '{printf "%s%s",$0,(NR%2?FS:RS)}' file > newfile
But I can't do it the other way around. What I have is:
1 a
2 b
3 c
I need
1
a
2
b
3
c
I have checked several similar column-row shifting questions, but couldn't figure out my case. Thanks!

You can use this awk command with OFS='\n' to set the output field separator to a newline, after forcing awk to rewrite each record with the $1=$1 trick:
awk '{$1=$1} 1' OFS='\n' file
1
a
2
b
3
c
You can also use grep -o:
grep -Eo '\w+' file
1
a
2
b
3
c
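For completeness, here is a plain awk alternative that avoids the $1=$1 trick; a minimal sketch, assuming every line has exactly two fields:
awk '{print $1; print $2}' file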

Just use xargs, one argument at a time:
xargs -n1 <file
1
a
2
b
3
c
From the xargs man page:
-n max-args, --max-args=max-args
Use at most max-args arguments per command line. Fewer than max-args arguments will be used if the size (see the -s option) is exceeded, unless the -x option is given, in which case xargs will exit.
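Incidentally, the same option covers the opposite direction: applied to the one-item-per-line output (saved here to a hypothetical file named split.txt), two arguments per line rebuild the original rows:
xargs -n2 <split.txt
1 a
2 b
3 c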

You can use tr:
tr ' ' '\n' < file
or sed:
sed -r 's/ /\n/g' file
You get:
1
a
2
b
3
c

Related

awk to do group by sum of column

I have this csv file and I am trying to write a shell script that calculates the sum of a column after doing a group by on it. The column number is the 11th (STATUS).
My script is
awk -F, 'NR>1{arr[$11]++}END{for (a in arr) print a, arr[a]}' $f > $parentdir/outputfile.csv;
The expected file output is
COMMITTED 2
but the actual output is just 2.
It prints only the count and not the group name. If I delete the other columns and run the same query it works fine, but not with the sample data below:
FILE NAME;SEQUENCE NR;TRANSACTION ID;RUN NUMBER;START EDITCREATION;END EDITCREATION;END COMMIT;EDIT DURATION;COMMIT DURATION;HAS DEPENDENCY;STATUS;DETAILS
Buldhana_Refinesource_FG_IW_ETS_000001.xml;1;4a032127-b20d-4fa8-9f4d-7f2999c0c08f;1;20180831130210345;20180831130429638;20180831130722406;140;173;false;COMMITTED;
Buldhana_Refinesource_FG_IW_ETS_000001.xml;2;e4043fc0-3b0a-46ec-b409-748f98ce98ad;1;20180831130722724;20180831130947144;20180831131216693;145;150;false;COMMITTED;
Change the FS to ; in your script:
awk -F';' 'NR>1{arr[$11]++}END{for (a in arr) print a, arr[a]}' file
COMMITTED 2
You're using the wrong field separator. Use
awk -F\;
The ; must be escaped so the shell passes it through as a literal. Apart from this, your approach seems OK.
Besides awk, you may also use
tail -n +2 $f | cut -f11 -d\; | sort | uniq -c
or
datamash --header-in -t \; -g 11 count 11 < $f
to do the same thing.

count patterns in a csv file from another csv file in bash

I have two csv files
File A
ID
1
2
3
File B
ID
1
1
1
1
3
2
3
What I want to do is count how many times an ID in File A shows up in File B, and save the result in a new file C (which is in csv format). For example, 1 in File A shows up 4 times in File B. So in the new file C, I should have something like
File C
ID,Count
1,4
2,1
3,2
Originally I was thinking of using grep -f, but it seems like it only works with .txt files. Unfortunately, File A and B are both in csv format. So now I am thinking maybe I could use a for loop to get each ID from File A individually and use grep -c to count it. Any idea will be helpful.
Thanks in advance!
You can use this awk command:
awk -v OFS=, 'FNR==1{next} FNR==NR{a[$1]; next} $1 in a{freq[$1]++}
END{print "ID", "Count"; for (i in freq) print i, freq[i]}' fileA fileB
ID,Count
1,4
2,1
3,2
You could use join, sort, uniq and process substitution <(command) creatively:
$ join -2 2 <(sort A) <(sort B | uniq -c) | sort -n > C
$ cat C
ID 1
1 4
2 1
3 2
And if you really want the header to be ID Count, you could replace that 1 with Count using sed before writing to file C, by adding:
... | sed 's/\(ID \)1/\1Count/' > C
to get
ID Count
1 4
2 1
3 2
and if you really want commas as separators instead of spaces, replace the spaces with commas using tr by also adding:
... | tr \ , > C
to get
ID,Count
1,4
2,1
3,2
You could of course ditch the tr and use the sed like this instead:
... | sed 's/\(ID \)1/\1Count/;s/ /,/' > C
And the output would be like above.
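For reference, here is the whole thing assembled into one pipeline (the same commands as above, just combined):
join -2 2 <(sort A) <(sort B | uniq -c) | sort -n | sed 's/\(ID \)1/\1Count/;s/ /,/' > C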

Combine two lines from different files when the same word is found in those lines

I'm new to bash, and I want to combine two lines from different files when the same word is found in those lines.
E.g.:
File 1:
organism 1
1 NC_001350
4 NC_001403
organism 2
1 NC_001461
1 NC_001499
File 2:
NC_001499 » Abelson murine leukemia virus
NC_001461 » Bovine viral diarrhea virus 1
NC_001403 » Fujinami sarcoma virus
NC_001350 » Saimiriine herpesvirus 2 complete genome
NC_022266 » Simian adenovirus 18
NC_028107 » Simian adenovirus 19 strain AA153
I wanted an output like:
File 3:
organism 1
1 NC_001350 » Saimiriine herpesvirus 2 complete genome
4 NC_001403 » Fujinami sarcoma virus
organism 2
1 NC_001461 » Bovine viral diarrhea virus 1
1 NC_001499 » Abelson murine leukemia virus
Is there any way to get anything like that output?
You can get something pretty similar to your desired output like this:
awk 'NR == FNR { a[$1] = $0; next }
{ print $1, ($2 in a ? a[$2] : $2) }' file2 file1
This reads in each line of file2 into an array a, using the first field as the key. Then for each line in file1 it prints the first field followed by the matching line in a if one is found, else the second field.
If the spacing is important, then it's a little more effort but totally possible.
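For instance, a minimal sketch that leaves file1's lines untouched and only appends the matching description (assuming the ID is the first field of file2 and the second field of file1):
awk 'NR == FNR { desc[$1] = substr($0, length($1) + 1); next }
$2 in desc { $0 = $0 desc[$2] } 1' file2 file1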
For a more Bash 4 style solution:
declare -A descriptions
while read -r line; do
    # » is a multibyte character, which cut -d cannot use as a delimiter,
    # so split with parameter expansion instead
    name=${line%% »*}
    description=${line#*»}
    descriptions[$name]=" »$description"
done < file2
while read -r line; do
    name=$(echo "$line" | cut -d ' ' -f 2)
    if [[ -n "$name" && -n "${descriptions[$name]}" ]]; then
        echo "${line}${descriptions[$name]}"
    else
        echo "$line"
    fi
done < file1
We could create a sed script from the second file and apply it to the first file. It is straightforward: we use the sed s command to construct another sed s command from each line, and store the result in a variable for later use:
sc=$(sed -rn 's#^\s*(\w+)([^\w]+)(.*)$#s/\1/\1\2\3/g;#g; p;' file2)
sed "$sc" file1
The first command looks weird because we use # as the delimiter in the outer sed s command and the more common / in the inner one.
Do an echo "$sc" to study the inner one. It just captures the parts of each line of file2 in different groups and then combines the captured strings into an s/find/replace/g; where
find is \1
replace is \1\2\3
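For the sample file2 above, echo "$sc" should show one generated command per line, along the lines of:
s/NC_001499/NC_001499 » Abelson murine leukemia virus/g;
s/NC_001461/NC_001461 » Bovine viral diarrhea virus 1/g;
...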
You want to rebuild file2 into a sed-command file.
sed 's#^ *\(\w\+\) \(.*\)#s/\1/\1 \2/#' File2
You can use process substitution to use the result without storing it in a temp file.
sed -f <(sed 's#^ *\(\w\+\) \(.*\)#s/\1/\1 \2/#' File2) File1

Linux bash grouping

I have this file:
count,name
1,B1
1,B1
1,B3
1,B3
1,B2
1,B2
1,B2
and I routinely have to get counters on the total per group. The first number is always one; the only important thing is the group. I wrote a Java program to do it for me. The output would be
B1: 2
B2: 3
B3: 2
The format is not important, just the counters per group name.
I was wondering, can this be done in bash? awk? sed?
Well, it is very simple to solve with sort and uniq:
$ sort file | uniq -c
2 1,B1
3 1,B2
2 1,B3
Then, if you need the proper formatting, you may use cut to strip the first column, and awk to print the result:
$ cut -d ',' -f 2 file | sort | uniq -c | awk '{printf "%s: %d\n", $2, $1}'
B1: 2
B2: 3
B3: 2
With awk, I would write
awk -F, 'NR>1 {n[$2]++} END {OFS=":";for (x in n) print x, n[x]}' file
assuming you actually have a header line in the file.
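This should print something like the following (the for (x in n) traversal order is unspecified, and OFS=":" leaves no space after the colon):
B1:2
B2:3
B3:2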

How can awk take the result of a unix command as a parameter?

Say there is an input file with tab-delimited fields, where the first field is an integer:
1 abc
1 def
1 ghi
1 lalala
1 heyhey
2 ahb
2 bbh
3 chch
3 chchch
3 oiohho
3 nonon
3 halal
3 whatever
First, I need to compute the counts of the unique values in the first field, which will be:
5 for 1, 2 for 2, and 6 for 3
Then I need to find the max of these counts, in this case, it's 6.
Now I need to pass "6" to another awk script as a parameter.
I know I can use the command below to get a list of counts:
cut -f1 input.txt | sort | uniq -c | awk -F ' ' '{print $1}' | sort
but how do I get the first count number and pass it to the next awk command as a parameter, not as an input file?
This is nothing specific to awk.
Either a program reads from stdin, in which case you can pass the input with a pipe:
prg1 | prg2
or the program expects its input as a parameter, in which case you use
prg2 $(prg1)
Note that in both cases prg1 runs before prg2.
Some programs allow both possibilities, though a huge amount of data is rarely passed as an argument.
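A concrete illustration of both styles, with ordinary shell commands standing in for prg1 and prg2:
date | wc -c        # wc reads date's output from stdin
echo $(date)        # the shell runs date first, then hands its output to echo as arguments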
This AWK script replaces your whole pipeline:
awk -v parameter="$(awk '{a[$1]++} END {for (i in a) {if (a[i] > max) {max = a[i]}}; print max}' inputfile)" '{print parameter}' otherfile
where '{print parameter}' is a stand-in for your other AWK script and "otherfile" is the input for that script.
Note: It is extremely likely that the two AWK scripts could be combined into one, which would be less of a hack than the awk-feeding-awk approach outlined in your question.
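A sketch of that combined, single-pass version, with { print max } again standing in for the second script's real logic (this assumes the second script only needs the maximum count):
awk 'NR == FNR { if (++count[$1] > max) max = count[$1]; next }
{ print max }' inputfile otherfile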
You can use the shell's $() command substitution:
awk -f script -v num=$(cut -f1 input.txt | sort | uniq -c | awk -F ' ' '{print $1}' | sort -n | tail -1) < input_file
(I added -n so the counts sort numerically, and tail -1 to ensure that at most one line is used.)
