Check a value and tag according to that value, appending in the same row, using shell - bash

I have a file like this:
NUMBER|05-1-2016|05-2-2016|05-3-2016|05-4-2016|
0000000 | 0 | 225.993 | 0 | 324|
0003450 | 89| 225.993 | 0 | 324|
0005350 | 454 | 225.993 | 54 | 324|
In the example there are four dates in the header.
I want to check the value under each date for field 1 ('NUMBER') and tag the values accordingly using shell.
For example, if the value is between 0 and 100, tag 'L'; if greater than 100, tag 'H'.
So the output should look like:
NUMBER|05-1-2016|05-2-2016|05-3-2016|05-4-2016|05-1-2016|05-2-2016|05-3-2016|05-4-2016|
0000000 | 0 | 225.993 | 0 | 324| L | H | L | H|
0003450 | 89| 225.993 | 0 | 324|L | H | L | H|
0005350 | 454 | 225.993 | 54 | 324|H | H | L | H|

A quick and dirty example that:
sets the input and output field separators (-F and OFS below) to |,
prints the header (the record with NR==1),
for all others prints fields 1-5, and then calls the function lh for fields 2-5,
defines the function lh, which returns L for values < 100 and H for all others.
Code:
awk -F \| '
BEGIN { OFS = "|" }
NR == 1 { print }
NR > 1  { print $1, $2, $3, $4, $5, lh($2), lh($3), lh($4), lh($5) }
function lh(val) { return (val < 100) ? "L" : "H" }
' file.txt
Alternative function lh:
function lh(val) {
    result = "";
    if (val < 100) {
        result = "L";
    } else {
        result = "H";
    }
    return result;
}
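If the number of date columns can vary, the same idea generalizes by looping over the fields instead of hard-coding $2 through $5. A sketch, assuming every row ends with a trailing | (so the last field is empty and the per-date values sit in fields 2 through NF-1); the spacing around the tags will differ slightly from the sample output:
awk -F \| '
BEGIN { OFS = "|" }
{
    out = $0
    for (i = 2; i < NF; i++)                   # fields 2..NF-1 hold the per-date values
        out = out (NR == 1 ? $i : lh($i)) OFS  # repeat the date headers on row 1, tags elsewhere
    print out
}
function lh(val) { return (val + 0 < 100) ? "L" : "H" }   # val+0 forces a numeric comparison
' file.txt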

Related

How to delete a few rows of data from a text file using shell scripting, based on some conditions

I have a text file with more than 100k rows. The data below is a sample of the text file I have. I want to apply some conditions to this data and delete some rows. The text file does not actually have headers (ID, NAME, Code-1, Code-2, Code-3); they are shown only for reference. How can I achieve this with shell scripting?
Input test file:
| ID | NAME | Code-1 | code-2 | code-3 |
| $$ | 5HF | 1E | N | Y |
| $$ | 2MU | 3C | N | Y |
| $$ | 32E | 3C | N | N |
| AB | 3CH | 3C | N | N |
| MK | A1M | AS | P | N |
| $$ | Y01 | 01 | F | Y |
| $$ | BG0 | 0G | F | N |
Conditions:
if code-2 = 'N' and code-1 is not one of ('3C', '3B', '32', '31', '3D'), then keep only rows with ID = '$$'
if code-2 = 'N' and code-1 is one of ('3C', '3B', '32', '31', '3D'), then accept any ID (but accept ID = '$$' only if code-3 = 'Y')
if code-2 != 'N', then accept all other IDs (and ID = '$$' only if code-3 = 'Y')
Output:
| ID | NAME | Code-1 | code-2 | code-3 |
| $$ | 5HF | 1E | N | Y |
| $$ | 2MU | 3C | N | Y |
| AB | 3CH | 3C | N | N |
| MK | A1M | AS | P | N |
| $$ | Y01 | 01 | F | Y |
You're encouraged to demonstrate your own efforts when asking questions, but I do understand this question could be complicated if you are new to Bash. Here is my solution using awk. It took 0.545s to process 137k lines on my computer (with moderate specs).
awk '{
    ID = $2; NAME = $4; CODE1 = $6; CODE2 = $8; CODE3 = $10;
    if (CODE2 == "N") {
        if (CODE1 ~ /(3C|3B|32|31|3D)/) {
            if (ID == "$$") {
                if (CODE3 == "Y") {
                    print;
                }
            }
            else {
                print;
            }
        }
        else {
            if (ID == "$$") {
                print;
            }
        }
    }
    else {
        if (ID == "$$") {
            if (CODE3 == "Y") {
                print;
            }
        }
        else {
            print;
        }
    }
}' file
Note it has certain restrictions:
a) It splits fields on spaces, not on |. It will work with your exact input format, but won't work with input rows that lack the surrounding spaces, e.g.
|$$|32E|3C|N|N|
|AB|3CH|3C|N|N|
b) For the same reason, the command will generate incorrect results if a column value contains extra spaces, e.g.
| $$ | 32E FOO | 3C | N | N |
| AB | 3CH BBT | 3C | N | N |
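If you need to handle both layouts, a sketch that splits on the | itself (allowing optional surrounding spaces) is more robust; it also anchors the Code-1 test so that a value like 31X can't accidentally match 31. With this FS, $1 is the empty field before the leading |, so $2 is ID, $4 is Code-1, $5 is code-2 and $6 is code-3:
awk -F' *\\| *' '
$5 == "N" {
    if ($4 ~ /^(3C|3B|32|31|3D)$/) {
        if ($2 != "$$" || $6 == "Y") print   # any ID; $$ only with code-3 = Y
    } else if ($2 == "$$") print             # code-1 not in the list: only $$ rows
    next
}
$2 != "$$" || $6 == "Y"                      # code-2 != N: all IDs; $$ only with code-3 = Y
' file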

Sort multiple tables inside a Markdown file with text interspersed between them

There is a Markdown file with headings, text, and unsorted tables. I want to programmatically sort each table by ID (the 3rd column) in descending order, preferably using PowerShell or Bash. Each table should remain in its place in the file.
# Heading
Text
| Col A | Col B | ID |
|---------|---------|----|
| Item 1A | Item 1B | 8 |
| Item 2A | Item 2B | 9 |
| Item 3A | Item 3B | 6 |
# Heading
Text
| Col A | Col B | ID |
|---------|---------|----|
| Item 4A | Item 4B | 3 |
| Item 5A | Item 5B | 2 |
| Item 6A | Item 6B | 4 |
I have no control over how the Markdown file is generated. Truly.
Ideally the file would remain in Markdown after the sort for additional processing. However, I explored these options without success:
Convert to JSON and sort (the solutions I tried didn't agree with tables)
Convert to HTML and sort (I only found JavaScript solutions)
This script alone, while helpful, would need to be modified to parse through the Markdown file (I'm having trouble finding understandable guidance on how to run a script on content between two strings)
The reason for command line (and not JavaScript on the HTML, for example) is that this transformation will take place in an Azure Release Pipeline. It is possible to add an Azure Function to the pipeline, which would allow me to run JavaScript code in the cloud, and I will pursue that if all else fails. I want to exhaust command-line options first because I am not very familiar with JavaScript or how to pass content between Functions and releases.
Thank you for any ideas.
By modifying the referenced script, how about this:
flush() {
    printf "%s\n" "${lines[@]:0:2}"
    printf "%s\n" "${lines[@]:2}" | sort -t \| -nr -k 4
    lines=()
}
while IFS= read -r line; do
    if [[ ${line:0:1} = "|" ]]; then
        lines+=("$line")
    else
        (( ${#lines[@]} > 0 )) && flush
        echo "$line"
    fi
done < input.md
(( ${#lines[@]} > 0 )) && flush
Output:
# Heading
Text
| Col A | Col B | ID |
|---------|---------|----|
| Item 2A | Item 2B | 9 |
| Item 1A | Item 1B | 8 |
| Item 3A | Item 3B | 6 |
# Heading
Text
| Col A | Col B | ID |
|---------|---------|----|
| Item 6A | Item 6B | 4 |
| Item 4A | Item 4B | 3 |
| Item 5A | Item 5B | 2 |
BTW, if Perl is an option for you, here is an alternative:
perl -ne '
sub flush {
    print splice(@ary, 0, 2);   # print the two header lines
    # sort the table rows, keying on the ID, via a Schwartzian transform
    print map  { $_->[0] }
          sort { $b->[1] <=> $a->[1] }
          map  { [$_, (split(/\s*\|\s*/))[3]] }
          @ary;
    @ary = ();
}
# main loop
if (/^\|/) {              # table section
    push(@ary, $_);
} else {                  # other section
    &flush if $#ary >= 0;
    print;
}
END {
    &flush if $#ary >= 0;
}
' input.md
Hope this helps.
If it's possible to identify the Markdown tables, a small awk (or bash/python/perl) script can filter the output. It assumes each table has 2 header lines. Note that asorti() with a custom comparison function requires GNU awk (gawk).
gawk -F'|' '
function cmp_id(i1, v1, i2, v2) {
    return v1 - v2
}
function show() {
    if (n == 0) return
    asorti(k, d, "cmp_id")
    # for (i = 1; i <= n; i++) print i, k[i], d[i]
    # Print the first 2 original header rows, followed by the sorted data lines
    print s[1]; print s[2]
    for (i = 1; i <= n; i++) if (d[i] >= 3) print s[d[i]]
    delete s; delete k            # clear the buffers so a shorter next table cannot pick up stale rows
    n = 0
}
# Capture tables
/^\|/ { s[++n] = $0; k[n] = $4; next }
n > 0 { show() }
{ print }
END { show() }
' input.md
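If gawk isn't available, a portable alternative is to buffer each table and pipe its data rows through an external sort. A sketch under the same assumptions (two header lines per table, the ID in the 4th |-separated field); fflush() keeps awk's own output from being reordered against the sort child's:
awk '
function flush() {
    if (n == 0) return
    print s[1]; print s[2]                 # header and separator rows
    fflush()                               # emit them before the sort child writes
    cmd = "sort -t \"|\" -nr -k 4"
    for (i = 3; i <= n; i++) print s[i] | cmd
    close(cmd)                             # runs the sort, emitting the data rows
    n = 0
}
/^\|/ { s[++n] = $0; next }                # buffer table lines
{ flush(); print }                         # non-table line: flush any pending table first
END { flush() }
' input.md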

awk command to print multiple columns using for loop

I have a single file in which the 1st and 2nd columns contain an item code and name, and the 3rd to 12th columns contain its consumption quantity for each of 10 days.
Now I need to convert that into 10 different files. In each, the 1st and 2nd columns should be the same item code and item name, and the 3rd column should contain the consumption quantity of one day.
input file:
Code | Name | Day1 | Day2 | Day3 |...
10001 | abcd | 5 | 1 | 9 |...
10002 | degg | 3 | 9 | 6 |...
10003 | gxyz | 4 | 8 | 7 |...
I need the output in different files as
file 1:
Code | Name | Day1
10001 | abcd | 5
10002 | degg | 3
10003 | gxyz | 4
file 2:
Code | Name | Day2
10001 | abcd | 1
10002 | degg | 9
10003 | gxyz | 8
file 3:
Code | Name | Day3
10001 | abcd | 9
10002 | degg | 6
10003 | gxyz | 7
and so on....
I wrote code like this:
awk 'BEGIN { FS = "\t" } ; {print $1,$2,$3}' FILE_NAME > file1;
awk 'BEGIN { FS = "\t" } ; {print $1,$2,$4}' FILE_NAME > file2;
awk 'BEGIN { FS = "\t" } ; {print $1,$2,$5}' FILE_NAME > file3;
and so on...
Now I need to write it within a 'for' or 'while' loop, which would be faster...
I don't know the exact code; maybe something like this:
for (( i=3; i<=NF; i++)) ; do awk 'BEGIN { FS = "\t" } ; {print $1,$2,$i}' input.tsv > $i.tsv; done
Kindly help me to get the output as I explained.
If you absolutely need to use a loop in Bash, then your loop can be fixed like this:
for ((i = 3; i <= 10; i++)); do awk -v field=$i 'BEGIN { FS = "\t" } { print $1, $2, $field }' input.tsv > file$i.tsv; done
But it would be really better to solve this using pure awk, without shell at all:
awk -v FS='\t' '
NR == 1 {
    for (i = 3; i < NF; i++) {
        fn = "file" (i - 2) ".txt";
        print $1, $2, $i > fn;
        print "" >> fn;
    }
}
NR > 2 {
    for (i = 3; i < NF; i++) {
        fn = "file" (i - 2) ".txt";
        print $1, $2, $i >> fn;
    }
}' inputfile
That is, when you're on the first record,
create the output files by writing the header line and a blank line (as specified in your question).
For the 3rd and later records, append to the files.
Note that the code in your question suggests that the fields in the file are separated by tabs, but the example files seem to use | padded with a variable number of spaces. It's not clear which one is your actual case. If it's really tab-separated, then the above code will work. If in fact it's like the example input, then change the first line to this:
awk -v OFS=' | ' -v FS='[ |]+' '
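For reference, here is that pipe-delimited variant in full. A sketch assuming the | layout shown in the question, with no blank line after the header (hence a plain NR == 1 pattern rather than the tab version's record numbering):
awk -v OFS=' | ' -v FS='[ |]+' '
NR == 1 {
    for (i = 3; i <= NF; i++)
        print $1, $2, $i > ("file" (i - 2) ".txt")   # create the files with a header line
    next
}
{
    for (i = 3; i <= NF; i++)
        print $1, $2, $i >> ("file" (i - 2) ".txt")  # append the data rows
}' inputfile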
bash + cut solution:
input.tsv test content:
Code | Name | Day1 | Day2 | Day3
10001 | abcd | 5 | 1 | 9
10002 | degg | 3 | 9 | 6
10003 | gxyz | 4 | 8 | 7
day_splitter.sh script:
#!/bin/bash
n=$(head -1 "$1" | awk -F'|' '{print NF}')   # total number of fields
for ((i = 3; i <= n; i++))
do
    fn="Day"$((i - 2))                       # file name containing `Day` number
    cut -d'|' -f1,2,"$i" "$1" > "$fn.txt"
done
Usage:
bash day_splitter.sh input.tsv
Results:
$cat Day1.txt
Code | Name | Day1
10001 | abcd | 5
10002 | degg | 3
10003 | gxyz | 4
$cat Day2.txt
Code | Name | Day2
10001 | abcd | 1
10002 | degg | 9
10003 | gxyz | 8
$cat Day3.txt
Code | Name | Day3
10001 | abcd | 9
10002 | degg | 6
10003 | gxyz | 7
In pure awk:
$ awk 'BEGIN{FS=OFS="|"}{for(i=3;i<=NF;i++) {f="file" (i-2); print $1,$2,$i >> f; close(f)}}' file
Explained:
$ awk '
BEGIN {
FS=OFS="|" } # set delimiters
{
for(i=3;i<=NF;i++) { # loop the consumption fields
f="file" (i-2) # create the filename
print $1,$2,$i >> f # append to target file
close(f) } # close the target file
}' file
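The close() on every row is there so the program never holds more than one output file open at a time; some awks limit the number of simultaneously open files. With only ten day columns that limit is no concern, so a variant that keeps the files open for the whole run (awk closes them all at exit) avoids the repeated open/close cost:
$ awk 'BEGIN{FS=OFS="|"}{for(i=3;i<=NF;i++) print $1,$2,$i >> ("file" (i-2))}' file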

How to align text in columns (centered) without removing delimiter?

I would like to align my source into centered columns...
Source:
IP | ASN | Prefix | AS Name | CN | Domain | ISP
109.228.12.96 | 8560 | 109.228.0.0/18 | ONEANDONE | DE | fasthosts.com | Fast Hosts LTD
Goal:
IP | ASN | Prefix | AS Name | CN | Domain | ISP
109.228.12.96 | 8560 | 109.228.0.0/18 | ONEANDONE | DE | fasthosts.com | Fast Hosts LTD
I tried different things with the column command... but I get double spaces inside:
cat Source.txt | sed 's/ *| */#| /g' | column -s '#' -t
IP | ASN | Prefix | AS Name | CN | Domain | ISP
109.228.12.96 | 8560 | 109.228.0.0/18 | ONEANDONE | DE | fasthosts.com | Fast Hosts LTD
Is there a way to use column without removing the delimiter...or another solution?
Thanks in advance for your help!
You can also do everything in awk. Save the program to pr.awk and run:
awk -f pr.awk input.dat
BEGIN {
    FS = "|"
    ARGV[2] = "pass=2"   # a trick to read the file two times
    ARGV[3] = ARGV[1]
    ARGC = 4
    pass = 1
}
function trim(s) {
    sub(/^[[:space:]]+/, "", s)   # remove leading
    sub(/[[:space:]]+$/, "", s)   # and trailing whitespace
    return s
}
pass == 1 {
    for (i = 1; i <= NF; i++) {
        field = trim($i)
        len = length(field)
        w[i] = len > w[i] ? len : w[i]   # find the maximum width
    }
}
pass == 2 {
    line = ""
    for (i = 1; i <= NF; i++) {
        field = trim($i)
        s = i == NF ? field : sprintf("%-" w[i] "s", field)
        sep = i == 1 ? "" : " | "
        line = line sep s
    }
    print line
}
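The question title says "centered", while the program above left-justifies. If you actually want each field centered, the pass == 2 block can pad on both sides instead; a sketch along the same lines, reusing trim() and the widths collected in w:
function rep(n,    r) { r = ""; while (n-- > 0) r = r " "; return r }
function center(s, width,    pad, left) {
    pad = width - length(s)
    left = int(pad / 2)                  # put the smaller half of the padding on the left
    return rep(left) s rep(pad - left)
}
pass == 2 {
    line = ""
    for (i = 1; i <= NF; i++) {
        sep = i == 1 ? "" : " | "
        line = line sep center(trim($i), w[i])
    }
    print line
}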
column has an input separator option -s and also an output separator option -o,
so the call looks like:
cat file | column -t -s '|' -o '|'

Split a column into separate columns based on value

I have a tab-delimited file that looks as follows:
cat myfile.txt
gives:
1 299
1 150
1 50
1 57
2 -45
2 62
3 515
3 215
3 -315
3 -35
3 3
3 6789
3 34
5 66
5 1334
5 123
I'd like to use Unix commands to produce a tab-delimited file in which, based on the values in column #1, each column of the output file holds all the corresponding values of column #2
(I'm using the separator "|" here instead of a tab only to illustrate my desired output file):
299 | -45 | 515 | 66
150 | 62 | 215 | 1334
50 | | -315 |
57 | | -35 |
| | 3 |
The corresponding headers (1, 2, 3, 5; based on the column #1 values) would be a nice addition to the code (as shown below), but the main request is to split the information of the first file into separate columns. Thanks!
1 | 2 | 3 | 5
299 | -45 | 515 | 66
150 | 62 | 215 | 1334
50 | | -315 |
57 | | -35 |
| | 3 |
Here's a one-liner that matches your output. It builds a string $ARGS containing as many process substitutions as there are unique values in the first column. Then $ARGS is used as the argument for the paste command:
HEADERS=$(cut -f 1 file.txt | sort -n | uniq); ARGS=""; for h in $HEADERS; do ARGS+=" <(grep ^"$h"$'\t' file.txt | cut -f 2)"; done; echo $HEADERS | tr ' ' '|'; eval "paste -d '|' $ARGS"
Output:
1|2|3|5
299|-45|515|66
150|62|215|1334
50||-315|
57||-35|
||3|
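The eval is needed because the process substitutions are built up inside a string. If you'd rather avoid eval, the same approach works with ordinary temporary files collected in an array; a sketch, using the same hypothetical file.txt:
#!/bin/bash
tmpdir=$(mktemp -d)
headers=$(cut -f 1 file.txt | sort -n | uniq)
files=()
for h in $headers; do
    grep "^$h"$'\t' file.txt | cut -f 2 > "$tmpdir/$h"   # one column file per key
    files+=("$tmpdir/$h")
done
echo $headers | tr ' ' '|'   # header row, e.g. 1|2|3|5
paste -d'|' "${files[@]}"
rm -r "$tmpdir"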
You can use gnu-awk (arrays of arrays and PROCINFO["sorted_in"] are gawk-specific):
awk '
BEGIN { max = 0 }
{
    d[$1][length(d[$1]) + 1] = $2;
    if (length(d[$1]) > max)
        max = length(d[$1]);
}
END {
    PROCINFO["sorted_in"] = "@ind_num_asc";
    line = "";
    flag = 0;
    for (j in d) {
        line = line (flag ? "\t|\t" : "") j;
        flag = 1;
    }
    print line;
    for (i = 1; i <= max; ++i) {
        line = "";
        flag = 0;
        for (j in d) {
            line = line (flag ? "\t|\t" : "") d[j][i];
            flag = 1;
        }
        print line;
    }
}' file.txt
you get
1 | 2 | 3 | 5
299 | -45 | 515 | 66
150 | 62 | 215 | 1334
50 | | -315 |
57 | | -35 |
| | 3 |
Or, you can use Python (note this is Python 2: it relies on izip_longest and print statements)... for example, in split2Columns.py
import sys
import collections
from itertools import izip_longest  # Python 2

records = [line.split() for line in open(sys.argv[1])]
records_dict = collections.defaultdict(list)
for key, val in records:
    records_dict[key].append(val)

print "\t|\t".join(records_dict.keys())
print "\n".join("\t|\t".join(map(str, l)) for l in izip_longest(*records_dict.values(), fillvalue=""))
python split2Columns.py file.txt
you get same result
@Jose Ricardo Bustos M. - thanks for your answer! Unfortunately I couldn't install gnu-awk on my Mac, but based on your answer I've done something similar using awk:
HEADERS=$(cut -f 1 try.txt | awk '!x[$0]++');
H=( ${HEADERS// / });
MAXUNIQNUM=$(cut -f 1 try.txt |uniq -c|awk '{print $1}'|sort -nr|head -1);
awk -v header="${H[*]}" -v max=$MAXUNIQNUM \
'BEGIN {
    split(header, headerlist, " ");
    for (q = 1; q <= length(headerlist); q++) {
        counter[q] = 1;
    }
}
{
    for (z = 1; z <= length(headerlist); z++) {
        if (headerlist[z] == $1) {
            arr[counter[z], headerlist[z]] = $2;
            counter[z]++
        }
    }
}
END {
    for (x = 1; x <= max; x++) {
        for (y = 1; y <= length(headerlist); y++) {
            printf "%s\t", arr[x, headerlist[y]];
        }
        printf "\n"
    }
}' try.txt
This uses an array to keep track of the column headings, uses them to name temporary files, and pastes everything together at the end:
#!/bin/bash
infile=$1
filenames=()
idx=0
while read -r key value; do
    if [[ "${filenames[$idx]}" != "$key" ]]; then
        (( ++idx ))
        filenames[$idx]="$key"
        echo -e "$key\n----" > "$key"
    fi
    echo "$value" >> "$key"
done < "$infile"
paste "${filenames[@]}"
rm "${filenames[@]}"
