How to iterate in a file using a keyword in bash

in some file there is some content like:
scenario1{
user_range:="1..100"
ip_low:="192.168.1.1"
ip_high:=192.168.1.100
...
}
scenario2{
user_range:="101..200"
ip_low:="192.168.2.1"
ip_high:=192.168.2.100"
...
}
...
I want to replace some values using sed -i, but I can't figure out how to iterate over the keyword "scenario" in order to change the user_ranges and IPs for the whole file.

awk to the rescue! (GNU awk, since this relies on the RT variable.)
$ awk -v RS='\n}' 'BEGIN{OFS="\n"}
{from=250*c+1; to=250*(++c);
sub(/:=.*/,":=\""from".."to"\"",$2)}
{print $0 RT}' file
scenario1{
user_range:="1..250"
ip_low:="192.168.1.1"
ip_high:=192.168.1.100
...
}
scenario2{
user_range:="251..500"
ip_low:="192.168.2.1"
ip_high:=192.168.2.100"
...
}
IP addresses can be handled similarly if there is a regular pattern.
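For example, assuming each scenario's subnet is 192.168.<scenario number>.x (purely an assumption for illustration), the same record-splitting trick can rewrite both IP fields:
$ awk -v RS='\n}' 'BEGIN{OFS="\n"}
{n=++c;                                      # scenario counter
 sub(/:=.*/,":=\"192.168." n ".1\"",$3);     # ip_low field
 sub(/:=.*/,":=\"192.168." n ".100\"",$4)}   # ip_high field
{print $0 RT}' file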

If you insist on using sed, you may find it easier to convert your file to a CSV format first:
tr '\n' ',' <testfile | tr '}' '\n' | tr -d '{' | sed 's/^,*//;s/,*$//' >csvfile
Since this results in one scenario per line, it will be much easier to use sed.
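For instance, on the one-scenario-per-line csvfile you can now target a stanza by name (a hypothetical edit for illustration):
sed -i '/^scenario2,/ s/user_range:="[^"]*"/user_range:="251..500"/' csvfile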

It is quite easy to separate the values with plain bash. I assume that the order of the key-value pairs and the number of lines per stanza stay the same (just for demonstration purposes):
while read -r line
do
    scenario=${line//\{/}
    read -r user_range
    read -r ip_low
    read -r ip_high
    read -r endchar   # consumes the closing "}"
    # here you can insert every piece of code you need
    # to change your variables
    cat <<EOF
$scenario{
$user_range
$ip_low
$ip_high
}
EOF
done <file_like_your_example >new_file
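For example, to renumber the user ranges in blocks of 250 (a hypothetical modification), you could insert this where the comments are:
n=$((n+1))                                               # stanza counter
user_range="user_range:=\"$(( (n-1)*250+1 ))..$(( n*250 ))\""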

Related

How to get values in a line while looping line by line in a file (shell script)

I have a file which looks like this (file.txt)
{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here
{"key":"TJHJHJHDH568","rule":squid:111-some_random_text_here
{"key":"YUUUIGIDH566","rule":squid:111-some_random_text_here
{"key":"HJHHIGIDH568","rule":squid:111-some_random_text_here
{"key":"ATYUGUIDH556","rule":squid:111-some_random_text_here
{"key":"QfgUIGIDH568","rule":squid:111-some_random_text_here
I want to loop through this line by line and extract the key values,
so the result should be like:
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
So I wrote code like this to loop line by line and extract the value between {"key":" and ","rule": because the key value is between these two patterns.
while read p; do
echo $p | sed -n "/{"key":"/,/","rule":,/p"
done < file.txt
But this is not working. Can someone help me figure this out? Thanks in advance.
Your sample input is almost valid JSON. You could tweak it to make it valid and then extract the values with jq, with something like:
sed -e 's/squid/"squid/' -e 's/$/"}/' file.txt | jq -r .key
Or, if your actual input really is valid JSON, then just use jq:
jq -r .key file.txt
If the "random-txt" may include double quotes, making it difficult to massage the input to make it valid json, perhaps you want something like:
awk '{print $4}' FS='"' file.txt
or
sed -n '/{"key":"\([^"]*\).*/s//\1/p' file.txt
or
while IFS=\" read open_brace key colon val _; do echo "$val"; done < file.txt
For the shown data, you can try this awk (with the regex "[:,]" as the field separator, the key becomes the second field):
awk -F '"[:,]"' '{print $2}' file
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
With the given example you can simply use
cut -d'"' -f4 file.txt
Assumptions:
there may be other lines in the file so we need to focus on just the lines with "key" and "rule"
the only text between "key" and "rule" is the desired string (eg, squid never shows up between the two patterns of interest)
Adding some additional lines:
$ cat file.txt
{"key":"AJGUIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"TJHJHJHDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"YUUUIGIDH566","rule":squid:111-some_random_text_here
ignore this line}
{"key":"HJHHIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
{"key":"ATYUGUIDH556","rule":squid:111-some_random_text_here
ignore this line}
{"key":"QfgUIGIDH568","rule":squid:111-some_random_text_here
ignore this line}
One sed idea:
$ sed -nE 's/^(.*"key":")([^"]*)(","rule".*)$/\2/p' file.txt
AJGUIGIDH568
TJHJHJHDH568
YUUUIGIDH566
HJHHIGIDH568
ATYUGUIDH556
QfgUIGIDH568
Where:
-E - enable extended regex support (and capture groups without needing to escape the parentheses)
-n - suppress printing of pattern space
^(.*"key":") - [1st capture group] everything from start of line up to and including "key":"
([^"]*) - [2nd capture group] everything that is not a double quote (")
(","rule".*)$ - [3rd capture group] everything from ",rule" to end of line
\2/p - replace the line with the contents of the 2nd capture group and print

convert a file content using shell script

Hello everyone, I'm a beginner in shell coding. On a daily basis I need to convert a file's data to another format. I usually do it manually with a text editor, but I often make mistakes, so I decided to write an easy script that can do the work for me.
The file's content looks like this:
/release201209
a1,a2,"a3",a4,a5
b1,b2,"b3",b4,b5
c1,c2,"c3",c4,c5
to this:
a2>a3
b2>b3
c2>c3
The script should ignore the first line and print the second and third values separated by '>'.
I'm halfway there; here is my code:
#!/bin/bash
#while Loops
i=1
while IFS=\" read t1 t2 t3
do
test $i -eq 1 && ((i=i+1)) && continue
echo $t1|cut -d\, -f2 | { tr -d '\n'; echo \>$t2; }
done < $1
The problem with my code is that the last line isn't printed unless the file ends with a newline (\n).
Also, I want the echo output to go into a new CSV file (I tried to redirect standard output to my new file, but only the last echo was printed there).
Can someone please help me out? Thanks in advance.
Rather than treating the double quotes as a field separator, it seems cleaner to just delete them (assuming that is valid). E.g.:
$ < input tr -d '"' | awk 'NR>1{print $2,$3}' FS=, OFS=\>
a2>a3
b2>b3
c2>c3
If you cannot just strip the quotes as in your sample input, but those quotes are escaping commas, you could hack together a solution, but you would be better off using a proper CSV parsing tool (e.g. Perl's Text::CSV).
Here's a simple pipeline that will do the trick:
sed '1d' data.txt | cut -d, -f2-3 | tr -d '"' | tr ',' '>'
Here, we're just removing the first line (as desired), selecting fields 2 & 3 (based on a comma field separator), removing the double quotes and mapping the remaining , to >.
Use this Perl one-liner:
perl -F',' -lane 'next if $. == 1; print join ">", map { tr/"//d; $_ } @F[1,2]' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in the -F option.
-F',' : Split into @F on comma, rather than on whitespace.
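With the sample file saved as in_file, this produces:
$ perl -F',' -lane 'next if $. == 1; print join ">", map { tr/"//d; $_ } @F[1,2]' in_file
a2>a3
b2>b3
c2>c3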
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

bash remove/change values from one field with a loop

I have a file where the 10th column (as viewed in Excel) contains prices.
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"5000",19.50,justin,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"75,000",19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"100,000",19.50,selena,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"5500",19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"50,000",19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"350,000",19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"50000",19.50,bieber,20160506,0,,N,E,,,,,,
When it goes to CSV, the quotes and the commas stay.
I need to pick out the column that is surrounded by quotes (I use grep -o),
and then after clearing the commas, I get rid of the quotes.
I can't use a quote or a comma as the awk delimiter because the prices get broken up into different fields.
cat /tmp/wowmom | awk -F ',' '{print $10}'
"5000"
"75
"100
"5500"
"50
"350
"50000"
while read line
do
clean_price=$(grep -o '".*"' $line)
echo "$clean_price" | tr -d',' > cleanprice1
echo "cleanprice1" | tr -d'"' > clearnprice2
done </tmp/wowmom
I get "No such file or directory" errors on the grep, though:
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"5000",19.50,justin,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"75,000",19.50,bieber,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"100,000",19.50,selena,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"50,000",19.50,gomez,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"350,000",19.50,bieber,20160506,0,,N,E,,,,,,:No such file or directory
I want to somehow isolate the value within the quotes with grep -o, take the commas out of the number, then use awk to take the quotes out of field 10.
I am doing this manually right now; it is a surprisingly long job - there are thousands of lines in this file.
You can use FPAT with gnu-awk for this (FPAT describes the contents of a field - here either a quoted string followed by a comma, or a run of non-commas - rather than the separator):
awk -v FPAT='"[^"]+",|[^,]*' '{gsub(/[",]+/, "", $10)} 1' OFS=, file
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5000,19.50,justin,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,75000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,100000,19.50,selena,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5500,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,350000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,bieber,20160506,0,,N,E,,,,,,
You are using the wrong tool here.
sed -r 's/^(([^,]+,){9})"([^,]+),?([^,]+)"/\1\3\4/' file.csv > newfile.csv
The regular expression captures the first nine fields into the first backreference (and, as a side effect, the last of the nine fields into the second), the part of the number before the thousands-separator comma into the third, and the rest of the number into the fourth; the substitution then glues them back together without the skipped quotes and comma.
If you have numbers with more than one thousands separator (i.e. above one million), you will need a slightly more complex script.
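A sketch of such a script (GNU sed; the loop strips every comma inside the quoted tenth field, then the final substitution drops the quotes):
sed -r ':a; s/^(([^,]+,){9}"[^"]*),/\1/; ta; s/^(([^,]+,){9})"([^"]*)"/\1\3/' file.csv > newfile.csv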
In terms of what's wrong with your original script: the second argument to grep is the name of a file to search, not the string to search in. You can use a here string (in Bash) or pipe the string to grep, but again, this is not how you do it properly.
grep -o '"[^"]*"' <<<"$line"
or
printf '%s' "$line" | grep -o '"[^"]*"'
Notice also the quotes - omitting quotes is a common newbie error; you can get away with it for a while, and then it bites you.
A pure Bash solution:
while IFS=\" read -r l n r; do
printf '%s\n' "$l${n//,/}$r"
done < input_file.txt
If you're looking for perl:
#!perl
use strict;
use warnings;
use Text::CSV;
use autodie;
my $csv = Text::CSV->new({binary=>1, eol=>"\n"});
my $filename = shift @ARGV;
open my $fh, "<", $filename;
while (my $row = $csv->getline($fh)) {
$row->[9] =~ s/,//g;
$csv->print(*STDOUT, $row);
}
close $fh;
demo:
$ perl csv.pl file
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5000,19.50,justin,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,75000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,100000,19.50,selena,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5500,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,350000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,bieber,20160506,0,,N,E,,,,,,

Bash command to extract characters in a string

I want to write a small script to generate the location of a file in an NGINX cache directory.
The format of the path is:
/path/to/nginx/cache/d8/40/32/13febd65d65112badd0aa90a15d84032
Note how the last 6 characters of the hash (d8, 40, 32) are represented in the path.
As input I give the md5 hash (13febd65d65112badd0aa90a15d84032), and I want to generate the output: d8/40/32/13febd65d65112badd0aa90a15d84032
I'm sure sed or awk will be handy, but I don't know yet how...
This awk can do it:
awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}'
Explanation
BEGIN{FS=""; OFS="/"}. FS="" sets the input field separator to be "", so that every char will be a different field. OFS="/" sets the output field separator as /, for print matters.
print ... $(NF-1)$NF, $0 prints the penultimate field and the last one all together; then, the whole string. The comma is "filled" with the OFS, which is /.
Test
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' <<< "13febd65d65112badd0aa90a15d84032"
d8/40/32/13febd65d65112badd0aa90a15d84032
Or with a file:
$ cat a
13febd65d65112badd0aa90a15d84032
13febd65d65112badd0aa90a15f1f2f3
$ awk 'BEGIN{FS=""; OFS="/"}{print $(NF-5)$(NF-4), $(NF-3)$(NF-2), $(NF-1)$NF, $0}' a
d8/40/32/13febd65d65112badd0aa90a15d84032
f1/f2/f3/13febd65d65112badd0aa90a15f1f2f3
With sed:
echo '13febd65d65112badd0aa90a15d84032' | \
sed -n 's/\(.*\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)\)$/\2\/\3\/\4\/\1/p;'
Having GNU sed, you can simplify the pattern using the -r option: then you won't need to escape the {} and () any more. Using ~ as the regex delimiter allows using the path separator / without escaping it:
sed -nr 's~(.*([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2}))$~\2/\3/\4/\1~p;'
Output:
d8/40/32/13febd65d65112badd0aa90a15d84032
Explained simply, the pattern matches
(whole: (chars n-5,n-4) (chars n-3,n-2) (chars n-1,n))
anchored at the end of the line, and replaces it with
pair1/pair2/pair3/whole
You can use a regular expression to separate each of the last 3 bytes from the rest of the hash.
hash=13febd65d65112badd0aa90a15d84032
[[ $hash =~ (..)(..)(..)$ ]]
new_path="/path/to/nginx/cache/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}/$hash"
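With the sample hash this yields:
$ echo "$new_path"
/path/to/nginx/cache/d8/40/32/13febd65d65112badd0aa90a15d84032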
Base="/path/to/nginx/cache/"
echo '13febd65d65112badd0aa90a15d84032' | \
sed "s|\(.*\(..\)\(..\)\(..\)\)|${Base}\2/\3/\4/\1|"
# or
# sed "s|.*\(..\)\(..\)\(..\)$|${Base}\1/\2/\3/&|"
Assuming the input is a valid MD5 string (and nothing else).
First of all - thanks to all of the responders - this was extremely quick!
I also did my own scripting in the meantime, and came up with this solution:
Run this script with the URL you're looking for as a parameter (www.example.com/article/76232?q=hello, for example):
#!/bin/bash
path=$1
md5=$(echo -n "$path" | md5sum | cut -f1 -d' ')
p3=${md5:0-2:2}
p2=${md5:0-4:2}
p1=${md5:0-6:2}
echo "/path/to/nginx/cache/$p1/$p2/$p3/$md5"
This assumes the NGINX cache has a key structure of 2:2:2.
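For reference, the matching nginx directive would look something like this (zone name and size are illustrative):
proxy_cache_path /path/to/nginx/cache levels=2:2:2 keys_zone=mycache:10m;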

How to convert HHMMSS to HH:MM:SS Unix?

I tried to convert HHMMSS to HH:MM:SS and I am able to do it successfully, but my script takes 2 hours to complete because of the file size. Is there a better (faster) way to complete this task?
Data File
data.txt
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,071600,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,072200,072200,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TAB,072600,072600,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,073200,073200,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,073500,073500,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,MRO,073700,073700,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,CPT,073900,073900,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,074400,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,090200,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,090900,090900,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,091500,091500,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TAB,091900,091900,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,092500,092500,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,092900,092900,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,MRO,093200,093200,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,CPT,093500,093500,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,094500,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,CPT,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,MRO,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TAB,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,,170100,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,CPT,170400,170400,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,MRO,170700,170700,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,171000,171000,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,171500,171500,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TAB,171900,171900,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,172500,172500,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,172900,172900,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,173500,173500,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,174100,,
My code: script.sh
#!/bin/bash
awk -F"," '{print $5}' Data.txt > tmp.txt # print first line first string before , to tmp.txt i.e. all Numbers will be placed into tmp.txt
sort tmp.txt | uniq -d > Uniqe_number.txt # unique values be stored to Uniqe_number.txt
rm tmp.txt # removes tmp file
while read line; do
echo $line
cat Data.txt | grep ",$line," > Numbers/All/$line.txt # grep the number and create one file per number
awk -F"," '{print $5","$4","$7","$8","$9","$10","$11}' Numbers/All/$line.txt > Numbers/All/tmp_$line.txt
mv Numbers/All/tmp_$line.txt Numbers/Final/Final_$line.txt
done < Uniqe_number.txt
ls Numbers/Final > files.txt
dos2unix files.txt
bash time_replace.sh
When you execute the above script, it will call the time_replace.sh script.
My Code for time_replace.sh
#!/bin/bash
for i in `cat files.txt`
do
while read aline
do
TimeDep=`echo $aline | awk -F"," '{print $6}'`
#echo $TimeDep
finalTimeDep=`echo $TimeDep | awk '{for(i=1;i<=length($0);i+=2){printf("%s:",substr($0,i,2))}}'|awk '{sub(/:$/,"")};1'`
#echo $finalTimeDep
##########
TimeAri=`echo $aline | awk -F"," '{print $7}'`
#echo $TimeAri
finalTimeAri=`echo $TimeAri | awk '{for(i=1;i<=length($0);i+=2){printf("%s:",substr($0,i,2))}}'|awk '{sub(/:$/,"")};1'`
#echo $finalTimeAri
sed -i 's/',$TimeDep'/',$finalTimeDep'/g' Numbers/Final/$i
sed -i 's/',$TimeAri'/',$finalTimeAri'/g' Numbers/Final/$i
############################
done < Numbers/Final/$i
done
Any better solution?
Appreciate any help.
Thanks
Sri
If there's a large quantity of files, then the pipelines are probably what impact performance more than anything else. Although processes can be cheap, if you're doing a huge amount of processing, then cutting down the number of times you pass data through a pipeline can reap dividends.
So you're probably going to be better off writing the entire script in awk (or perl). For example, awk can send output to an arbitrary file, so the while loop in your first script could be replaced with an awk script that does this. You also don't need to use a temporary file.
I assume the sorting is just for tracking progress easily as you know how many numbers there are. But if you don't care for the sorting, you can simply do this:
#!/bin/sh
awk -F ',' '
{
    print $5","$4","$7","$8","$9","$10","$11 > ("Numbers/Final/Final_" $5 ".txt")
}' datafile.txt
ls Numbers/Final > files.txt
Alternatively, if you need the sorting, you can do sort -t, -k5,5 -k4,4 -k10,10 (or whichever fields your sort keys actually need to be).
As for formatting the datetime, awk also does functions, so you could actually have an awk script that looks like the one below. This would replace both of your scripts above whilst retaining the same functionality (at least, as far as I can make out with a quick analysis). (Note! Untested, so it may contain vague syntax errors.)
#!/usr/bin/awk -f
BEGIN {
    FS=","
}
function formattime(t)
{
    if (t == "") return t   # leave empty time fields untouched
    return substr(t,1,2) ":" substr(t,3,2) ":" substr(t,5,2)
}
{
    print $5","$4","$7","$8","$9","formattime($10)","formattime($11) > ("Numbers/Final/Final_" $5 ".txt")
}
which you can save, chmod 700, and call directly as:
./dostuff.awk filename
Other awk options include changing fields in situ, so if you want to keep the entire original file but with formatted datetimes, you can do a modification of the above. Add OFS="," to the BEGIN block (so the rebuilt lines stay comma-separated) and change the print block to:
{
    $10 = formattime($10)
    $11 = formattime($11)
    print $0
}
If this doesn't do everything you need it to, hopefully it gives some ideas that will help.
It's not clear what all your sorting and uniq-ing is for. I'm assuming your data file has only one entry per line, and you need to change the 10th and 11th comma-separated fields from HHMMSS to HH:MM:SS.
while IFS=, read -r -a line ; do
    echo -n "${line[0]},${line[1]},${line[2]},${line[3]},"
    echo -n "${line[4]},${line[5]},${line[6]},${line[7]},"
    echo -n "${line[8]},"
    if [ -n "${line[9]}" ]; then
        echo -n "${line[9]:0:2}:${line[9]:2:2}:${line[9]:4:2}"
    fi
    echo -n ","
    if [ -n "${line[10]}" ]; then
        echo -n "${line[10]:0:2}:${line[10]:2:2}:${line[10]:4:2}"
    fi
    echo ","
done < data.txt
The operative part is the ${variable:offset:length} construct that lets you extract substrings out of a variable.
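For example:
$ t=072200; echo "${t:0:2}:${t:2:2}:${t:4:2}"
07:22:00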
In Perl, that's close to child's play:
#!/usr/bin/env perl
use strict;
use warnings;
use English( -no_match_vars );
local($OFS) = ",";
while (<>)
{
my(@F) = split /,/;
$F[9] =~ s/(\d\d)(\d\d)(\d\d)/$1:$2:$3/ if defined $F[9];
$F[10] =~ s/(\d\d)(\d\d)(\d\d)/$1:$2:$3/ if defined $F[10];
print @F;
}
If you don't want to use English, you can write local($,) = ","; instead; it controls the output field separator, which we set to a comma. The code reads each line in the file, splits it up on the commas, takes fields 9 and 10 (counting from zero), and (if they're defined) inserts colons between the pairs of digits. I'm sure a 'Code Golf' solution could be made a lot shorter, but this is semi-legible if you know any Perl.
This will be quicker by far than the script, not least because it doesn't have to sort anything, but also because all the processing is done in a single process in a single pass through the file. Running multiple processes per line of input, as in your code, is a performance disaster when the files are big.
The output on the sample data you gave is:
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,,07:16:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:22:00,07:22:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TAB,07:26:00,07:26:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:32:00,07:32:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:35:00,07:35:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,MRO,07:37:00,07:37:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,CPT,07:39:00,07:39:00,
10,SRI,AA,20091210,8503,ABCXYZ,D,N,TMP,07:44:00,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,,09:02:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:09:00,09:09:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:15:00,09:15:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TAB,09:19:00,09:19:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:25:00,09:25:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:29:00,09:29:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,MRO,09:32:00,09:32:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,CPT,09:35:00,09:35:00,
10,SRI,AA,20091210,8505,ABCXYZ,D,N,TMP,09:45:00,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,CPT,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,MRO,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TAB,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8506,ABCXYZ,U,N,TMP,,,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,,17:01:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,CPT,17:04:00,17:04:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,MRO,17:07:00,17:07:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:10:00,17:10:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:15:00,17:15:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TAB,17:19:00,17:19:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:25:00,17:25:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:29:00,17:29:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:35:00,17:35:00,
10,SRI,AA,20091210,8510,ABCXYZ,U,N,TMP,17:41:00,,
