Extract URLs (multiple lines) from texttable - bash

My source:
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
| positives | total | scan_date | url |
+===========+=======+======================+==================================================================================+
| 4 | 65 | 2015-09-21 23:29:33 | http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/ |
| | | | prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
| 1 | 64 | 2015-09-17 19:28:50 | http://thebackpack.fr/ |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
| 1 | 64 | 2015-09-17 08:44:16 | http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/ |
| | | | prettyphoto/images/prettyPhoto/light_rounded/ |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
I would like to extract the full URLs (Full URL in one line):
hxxp://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt
hxxp://thebackpack.fr/
hxxp://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/
The multiple lines URL is my problem. I tried for example: awk '{print $9}'
Thanks in advance for your help!

You can use this awk command:
awk -F '[[:blank:]]*\\|[[:blank:]]*' 'NR<3 || NF<5{next}
$2{if (url) print url; url=$5; next}
{url=url $5}
END{print url}' file
Output:
http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt
http://thebackpack.fr/
http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/

Related

Use AWK with delimiter to print specific columns

My file looks as follows:
+------------------------------------------+---------------+----------------+------------------+------------------+-----------------+
| Message | Status | Adress | Changes | Test | Calibration |
|------------------------------------------+---------------+----------------+------------------+------------------+-----------------|
| Hello World | Active | up | 1 | up | done |
| Hello Everyone Here | Passive | up | 2 | down | none |
| Hi there. My name is Eric. How are you? | Down | up | 3 | inactive | done |
+------------------------------------------+---------------+----------------+------------------+------------------+-----------------+
+----------------------------+---------------+----------------+------------------+------------------+-----------------+
| Message | Status | Adress | Changes | Test | Calibration |
|----------------------------+---------------+----------------+------------------+------------------+-----------------|
| What's up? | Active | up | 1 | up | done |
| Hi. I'm Otilia | Passive | up | 2 | down | none |
| Hi there. This is Marcus | Up | up | 3 | inactive | done |
+----------------------------+---------------+----------------+------------------+------------------+-----------------+
I want to extract a specific column using AWK.
I can use CUT to do it; however when the length of each table varies depending on how many characters are present in each column, I'm not getting the desired output.
cat File.txt | cut -c -44
+------------------------------------------+
| Message |
|------------------------------------------+
| Hello World |
| Hello Everyone Here |
| Hi there. My name is Eric. How are you? |
+------------------------------------------+
+----------------------------+--------------
| Message | Status
|----------------------------+--------------
| What's up? | Active
| Hi. I'm Otilia | Passive
| Hi there. This is Marcus | Up
+----------------------------+--------------
or
cat File.txt | cut -c 44-60
+---------------+
| Status |
+---------------+
| Active |
| Passive |
| Down |
+---------------+
--+--------------
| Adress
--+--------------
| up
| up
| up
--+--------------
I tried using AWK but I don't know how to add 2 different delimiters which would take care of all the lines.
cat File.txt | awk 'BEGIN {FS="|";}{print $2,$3}'
Message Status
------------------------------------------+---------------+----------------+------------------+------------------+-----------------
Hello World Active
Hello Everyone Here Passive
Hi there. My name is Eric. How are you? Down
Message Status
----------------------------+---------------+----------------+------------------+------------------+-----------------
What's up? Active
Hi. I'm Otilia Passive
Hi there. This is Marcus Up
The output I'm looking for:
+------------------------------------------+
| Message |
|------------------------------------------+
| Hello World |
| Hello Everyone Here |
| Hi there. My name is Eric. How are you? |
+------------------------------------------+
+----------------------------+
| Message |
|----------------------------+
| What's up? |
| Hi. I'm Otilia |
| Hi there. This is Marcus |
+----------------------------+
or
+------------------------------------------+---------------+
| Message | Status |
|------------------------------------------+---------------+
| Hello World | Active |
| Hello Everyone Here | Passive |
| Hi there. My name is Eric. How are you? | Down |
+------------------------------------------+---------------+
+----------------------------+---------------+
| Message | Status |
|----------------------------+---------------+
| What's up? | Active |
| Hi. I'm Otilia | Passive |
| Hi there. This is Marcus | Up |
+----------------------------+---------------+
or random other columns
+------------------------------------------+----------------+------------------+
| Message | Adress | Test |
|------------------------------------------+----------------+------------------+
| Hello World | up | up |
| Hello Everyone Here | up | down |
| Hi there. My name is Eric. How are you? | up | inactive |
+------------------------------------------+----------------+------------------+
+----------------------------+---------------+------------------+
| Message |Adress | Test |
|----------------------------+---------------+------------------+
| What's up? |up | up |
| Hi. I'm Otilia |up | down |
| Hi there. This is Marcus |up | inactive |
+----------------------------+---------------+------------------+
Thanks in advance.
One idea using GNU awk:
awk -v fldlist="2,3" '
BEGIN { fldcnt=split(fldlist,fields,",") } # split fldlist into array fields[]
{ split($0,arr,/[|+]/,seps) # split current line on dual delimiters "|" and "+"
for (i=1;i<=fldcnt;i++) # loop through our array of fields (fldlist)
printf "%s%s", seps[fields[i]-1], arr[fields[i]] # print leading separator/delimiter and field
printf "%s\n", seps[fields[fldcnt]] # print trailing separator/delimiter and terminate line
}
' File.txt
NOTES:
requires GNU awk for the 4th argument to the split() function (seps == array of separators; see gawk string functions for details)
assumes our field delimiters (|, +) do not show up as part of the data
the input variable fldlist is a comma-delimited list of columns that mimics what would be passed to cut (eg, when a line starts with a delimiter then field #1 is blank)
For fldlist="2,3" this generates:
+------------------------------------------+---------------+
| Message | Status |
|------------------------------------------+---------------+
| Hello World | Active |
| Hello Everyone Here | Passive |
| Hi there. My name is Eric. How are you? | Down |
+------------------------------------------+---------------+
+----------------------------+---------------+
| Message | Status |
|----------------------------+---------------+
| What's up? | Active |
| Hi. I'm Otilia | Passive |
| Hi there. This is Marcus | Up |
+----------------------------+---------------+
For fldlist="2,4,6" this generates:
+------------------------------------------+----------------+------------------+
| Message | Adress | Test |
|------------------------------------------+----------------+------------------+
| Hello World | up | up |
| Hello Everyone Here | up | down |
| Hi there. My name is Eric. How are you? | up | inactive |
+------------------------------------------+----------------+------------------+
+----------------------------+----------------+------------------+
| Message | Adress | Test |
|----------------------------+----------------+------------------+
| What's up? | up | up |
| Hi. I'm Otilia | up | down |
| Hi there. This is Marcus | up | inactive |
+----------------------------+----------------+------------------+
For fldlist="4,3,2" this generates:
+----------------+---------------+------------------------------------------+
| Adress | Status | Message |
+----------------+---------------|------------------------------------------+
| up | Active | Hello World |
| up | Passive | Hello Everyone Here |
| up | Down | Hi there. My name is Eric. How are you? |
+----------------+---------------+------------------------------------------+
+----------------+---------------+----------------------------+
| Adress | Status | Message |
+----------------+---------------|----------------------------+
| up | Active | What's up? |
| up | Passive | Hi. I'm Otilia |
| up | Up | Hi there. This is Marcus |
+----------------+---------------+----------------------------+
Say that again? (fldlist="3,3,3"):
+---------------+---------------+---------------+
| Status | Status | Status |
+---------------+---------------+---------------+
| Active | Active | Active |
| Passive | Passive | Passive |
| Down | Down | Down |
+---------------+---------------+---------------+
+---------------+---------------+---------------+
| Status | Status | Status |
+---------------+---------------+---------------+
| Active | Active | Active |
| Passive | Passive | Passive |
| Up | Up | Up |
+---------------+---------------+---------------+
And if you make the mistake of trying to print the '1st' column, ie, fldlist="1":
+
|
|
|
|
|
+
+
|
|
|
|
|
+
If GNU awk is available, please try markp-fuso's nice solution.
If not, here is a posix-compliant alternative:
#!/bin/bash
# define bash variables
cols=(2 3 6) # bash array of desired columns
col_list=$(IFS=,; echo "${cols[*]}") # create a csv string
awk -v cols="$col_list" '
NR==FNR {
if (match($0, /^[|+]/)) { # the record contains a table
if (match($0, /^[|+]-/)) # horizontally ruled line
n = split($0, a, /[|+]/) # split into columns
else # "cell" line
n = split($0, a, /\|/)
len = 0
for (i = 1; i < n; i++) {
len += length(a[i]) + 1 # accumulated column position
pos[FNR, i] = len
}
}
next
}
{
n = split(cols, a, /,/) # split the variable `cols` on comma into an array
for (i = 1; i <= n; i++) {
col = a[i]
if (pos[FNR, col] && pos[FNR, col+1]) {
printf("%s", substr($0, pos[FNR, col], pos[FNR, col + 1] - pos[FNR, col]))
}
}
print(substr($0, pos[FNR, col + 1], 1))
}
' file.txt file.txt
Result with cols=(2 3 6) as shown above:
+---------------+----------------+-----------------+
| Status | Adress | Calibration |
+---------------+----------------+-----------------|
| Active | up | done |
| Passive | up | none |
| Down | up | done |
+---------------+----------------+-----------------+
+---------------+----------------+-----------------+
| Status | Adress | Calibration |
+---------------+----------------+-----------------|
| Active | up | done |
| Passive | up | none |
| Up | up | done |
+---------------+----------------+-----------------+
It detects the column width in the 1st pass then splits the line on the column position in the 2nd pass.
You can control the columns to print with the bash array cols which is assigned at the beginning of the script. Please assign the array to the list of desired column numbers in increasing order. If you want to use the bash variable in different way, please let me know.

How to fix newline character in csv exported in shell script?

I want to fix this below issue in csv file using unix. I don't have access to source so i have to fix with this csv file alone. I need to desired output. is it achievable. Please help.
I have tried this below code but it doesn't work.
perl -p00e 's/\n|/|/g' test.csv
Issue:
DATECODE|SUBCLASSCODE|SUBCLASS_NAME|CLASS
2021-05-25|2202|Bras|1310
2021-05-25|1119|No Longer in Use - Depleted by 2019 Reclass|0805
2021-05-25|0949|No Longer in Use - Depleted by 2021 Reclass|0231
2021-05-25|1928|Fishing Gloves|1155
2021-05-25|1604|Training FW|1080
2021-05-25|0894|Hunting Waders|0894
2021-05-25|1873|Small Game|0326
2021-05-25|9950|EVENT
REGISTRATION FEE|9950
2021-05-25|0476|Regular Golf Gloves|0476
2021-05-25|1366|
Shorts|0988
2021-05-25|1914|Wade Shoes|0894
2021-05-25|0537|No Longer in Use - Depleted by 2019 Reclass|0537
2021-05-25|1635|Pickleball FW|
0021
2021-05-25|0679|Case Sunglasses|0679
2021-05-25|1544|Sandals|0001
2021-05-25|
1527|Golf/Tennis Accessories|1059
2021-05-25|1582|Lifestyle FW|0502
Desired Result:
DATECODE|SUBCLASSCODE|SUBCLASS_NAME|CLASS
2021-05-25|2202|Bras|1310
2021-05-25|1119|No Longer in Use - Depleted by 2019 Reclass|0805
2021-05-25|0949|No Longer in Use - Depleted by 2021 Reclass|0231
2021-05-25|1928|Fishing Gloves|1155
2021-05-25|1604|Training FW|1080
2021-05-25|0894|Hunting Waders|0894
2021-05-25|1873|Small Game|0326
2021-05-25|9950|EVENT REGISTRATION FEE|9950
2021-05-25|0476|Regular Golf Gloves|0476
2021-05-25|1366|Shorts|0988
2021-05-25|1914|Wade Shoes|0894
2021-05-25|0537|No Longer in Use - Depleted by 2019 Reclass|0537
2021-05-25|1635|Pickleball FW|0021
2021-05-25|0679|Case Sunglasses|0679
2021-05-25|1544|Sandals|0001
2021-05-25|1527|Golf/Tennis Accessories|1059
2021-05-25|1582|Lifestyle FW|0502
You can fix the output fairly simply with awk using 3-rules. Specifically, you will check that each line begins with a date in your format and ends (e.g. the 4th field $4) with 4-digits. If so, just print the line (rule 1). If not, and the line begins with a date in your format, just output without a '\n' so you can append the next line to it (rule 2). If you have reach a line that satisfies neither rule 1 or rule 2, it is the end of the previous line, just output with a '\n' to complete the previous line (rule 3).
That can be done with:
awk -F'|' '
NF==4 && $4~/^[[:digit:]]{4}$/ { print; next }
$1~/[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}/ {
printf "%s",$0
next
}
{ print }
' f.csv
Example Use/Output
With your input file in f.csv you would obtain:
$ awk -F'|' '
> NF==4 && $4~/^[[:digit:]]{4}$/ { print; next }
> $1~/[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}/ {
> printf "%s",$0
> next
> }
> { print }
> ' f.csv
DATECODE|SUBCLASSCODE|SUBCLASS_NAME|CLASS
2021-05-25|2202|Bras|1310
2021-05-25|1119|No Longer in Use - Depleted by 2019 Reclass|0805
2021-05-25|0949|No Longer in Use - Depleted by 2021 Reclass|0231
2021-05-25|1928|Fishing Gloves|1155
2021-05-25|1604|Training FW|1080
2021-05-25|0894|Hunting Waders|0894
2021-05-25|1873|Small Game|0326
2021-05-25|9950|EVENTREGISTRATION FEE|9950
2021-05-25|0476|Regular Golf Gloves|0476
2021-05-25|1366|Shorts|0988
2021-05-25|1914|Wade Shoes|0894
2021-05-25|0537|No Longer in Use - Depleted by 2019 Reclass|0537
2021-05-25|1635|Pickleball FW|0021
2021-05-25|0679|Case Sunglasses|0679
2021-05-25|1544|Sandals|0001
2021-05-25|1527|Golf/Tennis Accessories|1059
2021-05-25|1582|Lifestyle FW|0502
Which is the output you specified.
You can write it in condensed form with one rule per-line as:
awk -F'|' '
NF==4 && $4~/^[[:digit:]]{4}$/ { print; next }
$1~/[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}/ { printf "%s",$0; next }
{ print }
' f.csv
Look things over and let me know if you have further questions.
You have also very simple solution
perl -pe 's/\n/ /g;s/2021-/\n2021-/g;s/\| */|/g' input.txt
gives you
+------------+--------------+---------------------------------------------+--------+
| DATECODE | SUBCLASSCODE | SUBCLASS_NAME | CLASS |
+------------+--------------+---------------------------------------------+--------+
| 2021-05-25 | 2202 | Bras | 1310 |
| 2021-05-25 | 1119 | No Longer in Use - Depleted by 2019 Reclass | 0805 |
| 2021-05-25 | 0949 | No Longer in Use - Depleted by 2021 Reclass | 0231 |
| 2021-05-25 | 1928 | Fishing Gloves | 1155 |
| 2021-05-25 | 1604 | Training FW | 1080 |
| 2021-05-25 | 0894 | Hunting Waders | 0894 |
| 2021-05-25 | 1873 | Small Game | 0326 |
| 2021-05-25 | 9950 | EVENT REGISTRATION FEE | 9950 |
| 2021-05-25 | 0476 | Regular Golf Gloves | 0476 |
| 2021-05-25 | 1366 | Shorts | 0988 |
| 2021-05-25 | 1914 | Wade Shoes | 0894 |
| 2021-05-25 | 0537 | No Longer in Use - Depleted by 2019 Reclass | 0537 |
| 2021-05-25 | 1635 | Pickleball FW | 0021 |
| 2021-05-25 | 0679 | Case Sunglasses | 0679 |
| 2021-05-25 | 1544 | Sandals | 0001 |
| 2021-05-25 | 1527 | Golf/Tennis Accessories | 1059 |
| 2021-05-25 | 1582 | Lifestyle FW | 0502 |
+------------+--------------+---------------------------------------------+--------+

Match string in file1 with string in file2

my data examples are
1.txt
MTQZ3CODT0SQKGE3QE6B | j t | j | t | 22312 | stimpy | EST | 8 | 20 | text | list | 0 | | 2002-08-22 13:07:05
2.txt
MTQZ3CODT0SQKGE3QE6B | joe#example.com
desired output
joe#example.com | j t | j | t | 22312 | stimpy | EST | 8 | 20 | text | list | 0 | | 2002-08-22 13:07:05
I suppose to match & replace 1st column from 1.txt
with 2nd column in 2.txt
so far i did try :
awk 'BEGIN { while((getline < "file2.txt") > 0) a[$1]=$3 } { $1 = a[$1] } 1' file1.txt
Its work well but after 12hours of running i just finalise only 1GB looks very slow
INFO: file1.txt=7GB file2.txt=4GB my memory 16GB
I'm not sure what cause the slowly thing but i hope if there's another fast way then i'm using of awk
will be helpfull.
Thanks!!
Note: I'm running out of memory is there another way to do it
and that's to not have an array at all?
Also in my case lines are randomly and not in the same lines!
$ join <(sort 2.txt) <(sort 1.txt) | cut -d' ' -f3-
joe#example.com | j t | j | t | 22312 | stimpy | EST | 8 | 20 | text | list | 0 | | 2002-08-22 13:07:05
If that's not all you need then edit your question to provide more truly representative sample input/output including cases that this doesn't work for.
You may use this awk:
awk -F ' *\\| *' -v OFS=' | ' '
FNR == NR {
map[$1]=$2
next
}
$1 in map {
$1 = map[$1]
} 1' 2.txt 1.txt
joe#example.com | j t | j | t | 22312 | stimpy | EST | 8 | 20 | text | list | 0 | | 2002-08-22 13:07:05

Find references to files, recursively

In a project where XML/JS/Java files can contain references to other such files, I'd like to be able to have a quick overview of what has to be carefully checked, when one file has been updated.
So, it means I need to eventually have a look at all files referencing the modified one, and all files referencing files which refer to the modified one, etc. (recursively on matched files).
For one level, it's quite simple:
grep -E -l -o --include=*.{xml,js,java} -r "$FILE" . | xargs -n 1 basename
But how can I automate that to match (grand-(grand-))parents?
And how can that be, maybe, made more readable? For example, with a tree structure?
For example, if the file that interests me is called modified.js...
show-referring-files-to modified.js
... I could wish such an output:
some-file-with-ref-to-modified.xml
|__ a-file-referring-to-some-file-with-ref-to-modified.js
another-one-with-ref-to-modified.xml
|__ a-file-referring-to-another-one-with-ref-to-modified.js
|__ a-grand-parent-file-having-ref-to-ref-file.xml
|__ another-file-referring-to-another-one-with-ref-to-modified.js
or any other output (even flat) which allows for quickly checking which files are potentially impacted by a change.
UPDATE -- Results of current proposed answer:
ahmsff.js
|__ahmsff.xml
| |__ahmsd.js
| | |__ahmsd.xml
| | | |__ahmst.xml
| | | | |__BESH.java
| |__ahru.js
| | |__ahru.xml
| | | |__ahrut.xml
| | | | |__ashrba.js
| | | | | |__ashrba.xml
| | | | | | |__STR.java
| | |__ahrufrp.xml
| | | |__ahru.js
| | | | |__ahru.xml
| | | | | |__ahrut.xml
| | | | | | |__ashrba.js
| | | | | | | |__ashrba.xml
| | | | | | | | |__STR.java
| | | | |__ahrufrp.xml
| | | | | |__ahru.js
| | | | | | |__ahru.xml
| | | | | | | |__ahrut.xml
| | | | | | | | |__ashrba.js
| | | | | | | | | |__ashrba.xml
| | | | | | | | | | |__STR.java
| | | | | | |__ahrufrp.xml
(...)
I'd use a shell function (for the recursion) inside an shell script:
Assuming the filenames are unique have no characters that need escaping in them:
File: /usr/local/bin/show-referring-files-to
#!/bin/sh
get_references() {
grep -F -l --include=*.{xml,js,java} -r "$1" . | grep -v "$3" | while read -r subfile; do
#read each line of the grep result into the variable subfile
subfile="$(basename "$subfile")"
echo "$2""$subfile"
get_references "$subfile" ' '"$2" "$3"'\|'"$subfile"
done
}
while test $# -gt 0; do
#loop so more than one file can be given as argument to this script
echo "$1"
get_references "$1" '|__' "$1"
shift
done
There still are lots of performance enhancements possible.
Edit: Added $3 to prevent infinite-loop.

Can't iterate over array in Bash

I need to add a new column with a (ordinal) number after the last column in my table.
Both input and output files are .CSV tables.
Incoming table has more then 500 000 lines (rows) of data and 7 columns, e.g. https://www.dropbox.com/s/g2u68fxrkttv4gq/incoming_data.csv?dl=0
Incoming CSV table (this is just an example, so "|" and "-" are here for the sake of clarity):
| id | Name |
-----------------
| 1 | Foo |
| 1 | Foo |
| 1 | Foo |
| 4242 | Baz |
| 4242 | Baz |
| 4242 | Baz |
| 4242 | Baz |
| 702131 | Xyz |
| 702131 | Xyz |
| 702131 | Xyz |
| 702131 | Xyz |
Result CSV (this is just an example, so "|" and "-" are here for the sake of clarity):
| id | Name | |
--------------------------
| 1 | Foo | 1 |
| 1 | Foo | 2 |
| 1 | Foo | 3 |
| 4242 | Baz | 1 |
| 4242 | Baz | 2 |
| 4242 | Baz | 3 |
| 4242 | Baz | 4 |
| 702131 | Xyz | 1 |
| 702131 | Xyz | 2 |
| 702131 | Xyz | 3 |
| 702131 | Xyz | 4 |
First column is ID, so I've tried to group all lines with the same ID and iterate over them. Script (I don't know bash scripting, to be honest):
FILE=$PWD/$1
# Delete header and extract IDs and delete non-unique values. Also change \n to ♥, because awk doesn't properly work with it.
IDS_ARRAY=$(awk -v FS="|" '{for (i=1;i<=NF;i++) if ($i=="\"") inQ=!inQ; ORS=(inQ?"♥":"\n") }1' $FILE | awk -F'|' '{if (NR!=1) {print $1}}' | awk '!seen[$0]++')
for id in $IDS_ARRAY; do
# Group $FILE by $id from $IDS_ARRAY.
cat $FILE | grep $id >> temp_mail_group.csv
ROW_GROUP=$PWD/temp_mail_group.csv
# Add a number after each row.
# NF+1 — add a column after last existing.
awk -F'|' '{$(NF+1)=++i;}1' OFS="|", $ROW_GROUP >> "numbered_mails_$(date +%Y-%m-%d).csv"
rm -f $PWD/temp_mail_group.csv
done
Right now this script works almost like I want to, except that it thinks that (for example) ID 2834 and 772834 are the same.
UPD: Although I marked one answer as approved it does not assign correct values to some groups of records with the same ID (right now I don't see a pattern).
You can do everything in a single script:
gawk 'BEGIN { FS="|"; OFS="|";}
/^-/ {print; next;}
$2 ~ /\s*id\s*/ {print $0,""; next;}
{print "", $2, $3, ++a[$2];}
'
$1 is the empty field before the first | in the input. I use an empty output column "" to get the leading |.
The trick is ++a[$2] which takes the second field in each row (= the ID column) and looks for it in the associative array a. If there is no entry, the result is 0. By pre-incrementing, we start with 1 and add 1 every time the ID reappears.
Every time you write a loop in shell just to manipulate text you have the wrong approach. The guys who invented shell also invented awk for shell to call to manipulate text - don't disappoint them :-).
$ awk '
BEGIN{ w = 8 }
{
if (NR==1) {
val = sprintf("%*s|",w,"")
}
else if (NR==2) {
val = sprintf("%*s",w+1,"")
gsub(/ /,"-",val)
}
else {
val = sprintf(" %-*s|",w-1,++cnt[$2])
}
print $0 val
}
' file
| id | Name | |
----------------------
| 1 | Foo | 1 |
| 1 | Foo | 2 |
| 1 | Foo | 3 |
| 42 | Baz | 1 |
| 42 | Baz | 2 |
| 42 | Baz | 3 |
| 42 | Baz | 4 |
| 70 | Xyz | 1 |
| 70 | Xyz | 2 |
| 70 | Xyz | 3 |
| 70 | Xyz | 4 |
An awk way
Without considering the dotted line being extended.
awk 'NR>2{$0=$0 (++a[$2])"|"}1' file
output
| id | Name |
-------------
| 1 | Foo |1|
| 1 | Foo |2|
| 1 | Foo |3|
| 42 | Baz |1|
| 42 | Baz |2|
| 42 | Baz |3|
| 42 | Baz |4|
| 70 | Xyz |1|
| 70 | Xyz |2|
| 70 | Xyz |3|
| 70 | Xyz |4|
Here's a way to do it with pure Bash:
inputfile=$1
prev_id=
while IFS= read -r line ; do
printf '%s' "$line"
IFS=$'| \t\n' read t1 id name t2 <<<"$line"
if [[ $line == -* ]] ; then
printf '%s\n' '---------'
elif [[ $id == 'id' ]] ; then
printf ' Number |\n'
else
if [[ $id != "$prev_id" ]] ; then
id_count=0
prev_id=$id
fi
printf '%2d |\n' "$(( ++id_count ))"
fi
done <"$inputfile"

Resources