Sed search and replace in every line in file, first row - bash

I'm trying to develop a shell script which scans a CSV file (example below) and replaces the epoch timestamp with a readable timestamp.
The CSV file looks like this (the first column is the epoch time in milliseconds):
1464007935578, 852111
1464007935600, 852111
1464007935603, 852111
1464007935603, 1900681
My script:
#!/bin/bash
file=xxx.csv
while read line; do
epoch=`echo $line | awk -F',' '{print $1}'`
miliSec=$(($epoch % 1000 ))
timeWithoutMiliSec=`echo "$(date +"%T" -d#$(($epoch / 1000)))"`
fullTime=`echo "$timeWithoutMiliSec,$miliSec"`
echo $line | awk -F',' '{print $1}'|sed -i 's/[0-9]*/'$fullTime'/g' $file
done <$file
Desired output
12:52:15,255 852111
12:52:15,257 852111
12:52:15,259 852111
12:52:15,261 1900681
But when I run the script it gets stuck, creates many files whose names start with sed*, and I have to kill it to make it stop.
I think the problem might be the while loop combined with sed, since both take $file as an argument.
Can someone please advise on this issue?
Thanks a lot!

Use awk's strftime() function:
$ awk -F, -v OFS=, '{print strftime("%F", $1/1000), $2}' file
2016-05-23, 852111
2016-05-23, 852111
2016-05-23, 852111
2016-05-23, 1900681
This uses %F, which is equivalent to specifying '%Y-%m-%d' (the ISO 8601 date format).
Note I have to divide by 1000 because your input has the timestamps in milliseconds.
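Since the desired output above actually wants the time of day plus the millisecond remainder rather than the date, here is a minimal sketch of the same idea done in plain shell, avoiding the sed-in-a-loop problem entirely. It assumes GNU date (-d "@epoch") and is pinned to UTC so the result is reproducible; adjust TZ for local time:

```shell
# Sketch: epoch-ms CSV -> "HH:MM:SS,mmm value", one date call per line.
cat > sample.csv <<'EOF'
1464007935578, 852111
1464007935600, 852111
EOF

while IFS=', ' read -r epoch rest; do
    secs=$((epoch / 1000))    # whole seconds for date
    ms=$((epoch % 1000))      # millisecond remainder
    printf '%s,%03d %s\n' "$(TZ=UTC date -d "@$secs" +%T)" "$ms" "$rest"
done < sample.csv
```

Note this prints the real millisecond remainder (578, 600, ...) from the sample data, not the ascending values shown in the question's desired output.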

Related

Reformat date in text file (.csv) with sed and date

This is the input .csv file
"item1","10/11/2017 2:10pm",1,2, ...
"item2","10/12/2017 3:10pm",3,4, ...
.
.
.
Now, I want to convert the second column (date) to this specific format
date -d '10/12/2017 2:10pm' +'%Y/%m/%d %H:%M:%S', so that "10/12/2017 2:10pm" converts to "2017/10/12 14:10:00"
Expected output file:
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
.
.
.
I know it can be done with bash or python, but I want to do it with a one-line command. Any ideas? Is there a way to pass the date result to sed?
One-liner awk approach.
awk -F',' '{gsub(/"/,"",$2); cmd="date -d\""$2"\" +\\\"%Y/%m/%d\\ %T\\\"";
cmd |getline $2; close(cmd) }1' OFS=, infile #>>outfile
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
This prints the changes to your terminal; redirect the output to a file if you need to record it, or use FILENAME to write the output back to the input infile itself.
awk -F',' '{gsub(/"/,"",$2); cmd="date -d\""$2"\" +\\\"%Y/%m/%d\\ %T\\\"";
cmd |getline $2; close(cmd); print >FILENAME }' OFS=, infile
Or use a GNU awk implementation, which supports the -i inplace option for in-place replacement; see "awk" save modifications in place.
You can do it in one line, but that raises the question: "How long a line do you want?" Since you tagged this 'shell' and not bash, etc., you are somewhat limited in your string handling. POSIX shell provides enough to do what you want, but it isn't the speediest remedy. You will either end up with an awk or sed solution that calls date, or a shell solution that calls awk or sed to parse the old date from the original file and feeds the result to date to get your new date. You will have to work out which provides the most efficient remedy.
As far as the one-liner goes, you can do something similar to the following while remaining POSIX compliant. It uses awk to get the 2nd field from the file and pipes the result to a while loop, which uses expr length "$field" to get the length, then expr substr "$field" "2" <length - 2> to chop the double quotes off the original date olddt, then date -d "$olddt" +'%Y/%m/%d %H:%M:%S' to get newdt, and finally sed -i "s;$olddt;$newdt;" to perform the substitution in place. Your one-liner (shown with line continuations for readability):
$ awk -F, '{print $2}' timefile.txt |
while read -r field; do
olddt="$(expr substr "$field" "2" "$(($(expr length "$field") - 2))")";
newdt=$(date -d "$olddt" +'%Y/%m/%d %H:%M:%S');
sed -i "s;$olddt;$newdt;" timefile.txt; done
Example Input File
$ cat timefile.txt
"item1","10/11/2017 2:10pm",1,2, ...
"item2","10/12/2017 3:10pm",3,4, ...
Resulting File
$ cat timefile.txt
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
There are probably faster ways to do it, but this is a reasonable length one-liner (relatively speaking).
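As a side note, the expr length/expr substr quote-stripping can also be done with POSIX parameter expansion, which saves two subprocesses per line. A minimal sketch (printing old -> new instead of editing in place; assumes GNU date for -d parsing):

```shell
# Sketch: strip the surrounding double quotes with ${var#\"}/${var%\"}
# and reformat each date with GNU date.
awk -F, '{print $2}' timefile.txt |
while read -r field; do
    olddt=${field#\"}; olddt=${olddt%\"}   # drop leading/trailing quote
    newdt=$(date -d "$olddt" '+%Y/%m/%d %H:%M:%S')
    printf '%s -> %s\n' "$olddt" "$newdt"
done
```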
Revised less ugly sed method:
sed 's/^.*,"\|",.*//g;h;s#.*#date "+%Y/%m/%d %T" -d "&"#e;H;g;s#\n\|$#,#g;s/^/s,/' input.csv | sed -f - input.csv
Spread out (it works the same):
sed 's/^.*,"\|",.*//g
h;
s#.*#date "+%Y/%m/%d %T" -d "&"#e;
H;
g;
s#\n\|$#,#g;
s/^/s,/' input.csv | sed -f - input.csv
Output:
"item1","2017/10/11 14:10:00",1,2, ...
"item2","2017/10/12 15:10:00",3,4, ...
How it works:
The first sed block uses the evaluate command to run date, the output of which is used to generate some new sed substitute commands. To show the new s commands, temporarily replace the shell script | pipe with a # comment:
s,10/11/2017 2:10pm,2017/10/11 14:10:00,
s,10/12/2017 3:10pm,2017/10/12 15:10:00,
These are piped to the second sed.
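To see those generated commands yourself, just run the first sed without the pipe to the second one (not part of the original answer; like the one-liner itself, this relies on GNU sed's e flag and GNU date):

```shell
# Sketch: inspect the sed "s" commands the first stage generates.
printf '"item1","10/11/2017 2:10pm",1,2, ...\n"item2","10/12/2017 3:10pm",3,4, ...\n' > input.csv
sed 's/^.*,"\|",.*//g;h;s#.*#date "+%Y/%m/%d %T" -d "&"#e;H;g;s#\n\|$#,#g;s/^/s,/' input.csv
```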

Convert date to timestamp in bash (with milliseconds)

I have CSV file in the following format
20170102 00:00:00.803,
20170102 00:00:01.265,
20170102 00:00:05.818,
I've managed to add slashes with
sed -r 's#(.{4})(.{2})(.{2})(.{2})(.{2})#\1/\2/\3 \4:\5:#' file.csv > newfile.csv
as below, to enable conversion to a timestamp
2017/01/02 0:0::00:00.803
2017/01/02 0:0::00:01.265
2017/01/02 0:0::00:05.818
But after using
cat newfile.csv | while read line ; do echo $line\;$(date -d "$t" "+%s%N") ; done > nextfile.csv
I got :
2017/01/02 0:0::00:00.803,1499727600000000000
2017/01/02 0:0::00:01.265,1499727600000000000
2017/01/02 0:0::00:05.818,1499727600000000000
There's probably something wrong with my data, but I'm too much of a beginner to figure out what's missing. It would be very much appreciated if you could drop me some sed/awk magic. Thanks!
EDIT: I need to have a timestamp with milliseconds, but all I got for now is just zeros (how typical)
Not sure if this is what you are after, but you could just parse the line yourself, without date, to form the date stamp. (Incidentally, your loop passes "$t", which is unset, to date, so date -d "" returns today's midnight for every line; that is why all your timestamps are identical.)
awk '{ print substr($0,1,4)"/"substr($0,5,2)"/"substr($0,7,2)" "substr($0,10,2)":"substr($0,13,2)":"substr($0,16) }' dates.csv
We use awk to pull out the parts of the line concerning day, month, year, etc. (the substr function) and then use print to output the data in the required format.
gawk solution:
awk -F',' '{ match($1,/^([0-9]{4})([0-9]{2})([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})\.([0-9]{3})/,a);
print mktime(sprintf("%d %d %d %d %d %d",a[1],a[2],a[3],a[4],a[5],a[6]))*1000 + a[7] }' file.csv
The output:
1483308000803
1483308001265
1483308005818
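If gawk's mktime() isn't available, the same result can be sketched with GNU date's %s%3N (epoch seconds plus three digits of the fractional part) after reshaping the stamp into a form date accepts. This is an alternative sketch, not from the original answer; it is pinned to UTC for reproducibility, whereas the output above reflects the answerer's local timezone:

```shell
# Sketch: "YYYYMMDD hh:mm:ss.mmm," -> epoch milliseconds via GNU date.
printf '20170102 00:00:00.803,\n20170102 00:00:01.265,\n' > file.csv

while read -r ts; do
    ts=${ts%,}                       # drop the trailing comma
    # 20170102 ... -> 2017-0102 -> 2017-01-02 ...
    d=$(printf '%s\n' "$ts" | sed 's#^\(....\)\(..\)\(..\)#\1-\2-\3#')
    TZ=UTC date -d "$d" +%s%3N       # seconds + milliseconds
done < file.csv
```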
The original format is accepted by date as a timestamp, so you need not sed it. I believe you want "date,milliseconds since 1970-01-01 00:00:00 UTC" in your output. Try this in bash.
generateoutput.sh
#!/bin/bash
while read -r line
do
echo -n $line,
echo `date -d "$line" "+%s%N"` / 1000000 | bc
done < <(sed 's/,//g' $1)
Run it as ./generateoutput.sh timestamp.csv, where timestamp.csv is your file in the original format.
bc (the basic calculator) converts the nanoseconds to milliseconds here.
Parsing large files line by line is bound to take time.
If you need better performance, split your original file. I suggest doing this in a new directory
split -l 100000 -d <filename>
Run generateoutput.sh in parallel for each of these files and tee -a the output:
ls -l x* | awk '{print $9}' | xargs -n1 -P4 generateoutput.sh | tee -a output.csv

How can I retrieve numeric value from text file in shell script?

The content below is in a text file called test.txt. How can I retrieve the pending and completed count values in a shell script?
<p class="pending">Count: 0</p>
<p class="completed">Count: 0</p>
Here's what I tried:
#!/bin/bash
echo
echo 'Fetching job page and write to Jobs.txt file...'
curl -o Jobs.txt https://cms.test.com
completestatus=`grep "completed" /home/Jobs.txt | awk -F '<p|</p>' '{print $2 }' | awk '{print $4 }'`
echo $completestatus
if [ "$completestatus" == 0 ]; then
grep and awk can almost always be combined into one awk command, and two awk commands can almost always be combined into one as well.
This solves your immediate problem (using a little awk type casting trickery).
completedStatus=$(echo '<p class="pending">Count: 0</p>
<p class="completed">Count: 0</p>' \
| awk -F : '/completed/{var=$2+0.0;print var}' )
echo completedStatus=$completedStatus
The output is
completedStatus=0
Note that you can combine grep and awk with
awk -F : '/completed/' test.txt
filters to just the completed line , output
<p class="completed">Count: 0</p>
When I added your -F argument, the output didn't change, i.e.
awk -F'<p|</p>' '/completed/' test.txt
output
<p class="completed">Count: 0</p>
So I relied on using : as the field separator (-F). Now with a print of $2:
awk -F : '/completed/{print $2}' test.txt
output
0</p>
When performing a calculation, awk will read a value "looking" for a number at the front; if it finds one, it reads digits until it hits a non-numeric character (or runs out of data). So ...
awk -F : '/completed/{var=$2+0.0;print var}' test.txt
output
0
Finally we arrive at the solution above, wrap the code in a modern command-substitution, i.e. $( ... cmds ....) and send the output to the completedStatus= assignment.
In case you're thinking that the +0.0 addition is what is being output, you can change your file to show completed count = 10, and the output will be 10.
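That coercion is easy to demonstrate in isolation; awk reads the leading digits and stops at the first non-numeric character:

```shell
# awk coerces " 10</p>" to the number 10 when used arithmetically.
printf 'Count: 10</p>\n' | awk -F: '{print $2 + 0}'
```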
IHTH
another awk
completedStatus=$(awk -F'[ :<]' '/completed/{print $(NF-1)}' file)
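The same field-splitting trick works for both classes; as a sketch building on that one-liner, you can land each count in its own shell variable (the counts 3 and 7 here are made-up sample values):

```shell
# Sketch: pull both counts out of the HTML snippet into shell variables.
printf '<p class="pending">Count: 3</p>\n<p class="completed">Count: 7</p>\n' > test.txt
pending=$(awk -F'[ :<]' '/pending/{print $(NF-1)}' test.txt)
completed=$(awk -F'[ :<]' '/completed/{print $(NF-1)}' test.txt)
echo "pending=$pending completed=$completed"
```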
If I got you right, you just want to extract pending or completed and the value. If that is the case, then using sed, please check out the script below.
#!/bin/bash
file="$1"
echo "Simple"
cat $1 |sed 's/^.*=\"\([a-z]*\)\">Count: \([0-9]\)<.*$/\1=\2/g'
echo "Pipe Separated"
cat $1 |sed 's/^.*=\"\([a-z]*\)\">Count: \([0-9]\)<.*$/\1|\2/g'
echo "CSV Style or comma separeted"
cat $1 |sed 's/^.*=\"\([a-z]*\)\">Count: \([0-9]\)<.*$/\1,\2/g'

format date in file using awk

Content of the file is
Feb-01-2014 one two
Mar-02-2001 three four
I'd like to format the first field (the date) to %Y%m%d format
I'm trying to use a combination of awk and the date command, but somehow this is failing even though I get the feeling I'm almost there:
cat infile | awk -F"\t" '{$1=system("date -d " $1 " +%Y%m%d");print $1"\t"$2"\t"$3}' > test
This prints out date's usage page, which makes me think the date command is triggered properly but there is something wrong with the argument. Do you see the issue somewhere? I'm not that familiar with awk.
You don't need date for this; it's simply rearranging the date string:
$ awk 'BEGIN{FS=OFS="\t"} {
split($1,t,/-/)
$1 = sprintf("%s%02d%s", t[3], (match("JanFebMarAprMayJunJulAugSepOctNovDec",t[1])+2)/3, t[2])
}1' file
20140201 one two
20010302 three four
You can use:
while read -r a _; do
date -d "$a" '+%Y%m%d'
done < file
20140201
20010302
system() returns the exit code of the command.
Instead:
cat infile | awk -F"\t" '{"date -d " $1 " +%Y%m%d" | getline d;print d"\t"$2"\t"$3}'
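A variant of that getline approach, sketched here rather than taken from the original answer, also close()s the command (important when the file has many lines) and normalizes the hyphens so date's parser sees the unambiguous "Feb 01 2014" form (assumes GNU date):

```shell
# Sketch: getline from a date pipeline, closing it after each line.
printf 'Feb-01-2014\tone\ttwo\nMar-02-2001\tthree\tfour\n' > infile
awk -F'\t' '{ d1 = $1; gsub(/-/, " ", d1)          # Feb-01-2014 -> Feb 01 2014
              cmd = "date -d \"" d1 "\" +%Y%m%d"
              cmd | getline d; close(cmd)          # read result, close pipe
              print d "\t" $2 "\t" $3 }' infile
```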
$ awk '{var=system("date -d "$1" +%Y%m%d | tr -d \"\\n\"");printf "%s\t%s\t%s\n", var, $2, $3}' file
201402010 one two
200103020 three four
(Note the stray 0 appended to each date: system() prints the command's output directly and returns its exit status, so var holds 0, not the date. The getline form above avoids this.)

Adding a single date to the first column of a file

I have a file that looks like
1234-00AA12 .02
5678-11BB34 .03
In a bash script I have an expression like
day=$(...)
that greps a date in the format YYYY/MM/DD (if this matters), let's say 2014/01/21 for specificity.
I want to produce the following:
2014/01/21,1,1,1234,00AA12,.02
2014/01/21,1,1,5678,11BB34,.03
(The first column is the day, the second and third columns are fixed as "1").
After a bit of googling I tried:
cat file|awk -F "-" '{split($2,array," "); printf "%s,%s,%s,%s,%s,%s\n",$day,"1","1",$1,array[1],array[2]}'> output.csv
but $day isn't working with awk.
Any help would be appreciated.
Try this awk:
awk -v d=$(date '+%Y/%m/%d') '{print d,1,1,$1,$2}' OFS=, file
2014/02/07,1,1,1234-00AA12,.02
2014/02/07,1,1,5678-11BB34,.03
$ awk -v day="$day" 'BEGIN{FS="[ -]";OFS=","} {print day,1,1,$1,$2,$3}' file
2014/01/21,1,1,1234,00AA12,.02
2014/01/21,1,1,5678,11BB34,.03
awk doesn't understand shell variables; you need to pass them in:
awk -vdd="$day" -F "-" '{split($2,array," "); printf "%s,%s,%s,%s,%s,%s\n",dd,"1","1",$1,array[1],array[2]}'
Moreover, rather than saying:
cat file | awk '...'
avoid the useless use of cat:
awk '...' file
With bash
day="2014/01/21"
(
IFS=,
while IFS=" -" read -ra fields; do
new=( "$day" 1 1 "${fields[@]}" )
echo "${new[*]}"
done < file
)
2014/01/21,1,1,1234,00AA12,.02
2014/01/21,1,1,5678,11BB34,.03
I run the while loop in a subshell just to keep changes to IFS localized.
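The same idea also works without arrays or IFS juggling for output, as a minimal sketch using plain positional reads (assuming every line has exactly the two fields shown):

```shell
# Sketch: split each line on space/hyphen and print a fixed CSV layout.
day="2014/01/21"
printf '1234-00AA12 .02\n5678-11BB34 .03\n' > file
while IFS=' -' read -r a b c; do
    printf '%s,1,1,%s,%s,%s\n' "$day" "$a" "$b" "$c"
done < file
```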
