When using sed to replace timestamps with human-readable dates, the timestamp is always replaced with the epoch date plus the value placed after the "\".
I have this working with a Perl example but would prefer to use sed. I have tried various escape sequences and quoting styles ("'` etc.).
sed -re "s/([0-9]{10})/$(date -d @\1)/g" mac.txt
input (one string):
834|task|3||1561834555|Ods|12015|info|Task HMI starting 837|task|3||1561834702|Nailsd|5041|info|Configured with engine 6000.8403 (/opt/NAI/LinuxShield/engine/lib/liblnxfv.so), dats 9297.0000 (/opt/NAI/LinuxShield/engine/dat), 197 extensions, 0 extra drivers
Expect date conversion but results are:
834|task|3||Wed Dec 31 19:00:01 EST 1969|Ods|12015|info|Task HMI starting 837|task|3||Wed Dec 31 19:00:01 EST 1969|Nailsd|5041|info|Configured with engine 6000.8403 (/opt/NAI/LinuxShield/engine/lib/liblnxfv.so), dats 9297.0000 (/opt/NAI/LinuxShield/engine/dat), 197 extensions, 0 extra drivers 838|task.
basically:
this is what is being called:
$(date -d @\1) instead of $(date -d @\1561834555)
sed never sees the $(date -d @\1): the shell has already executed that command substitution before sed launches.
You could do something like this:
sed -Ee 's/([0-9]{10})/$(date -d @\1)/g' -e 's/^/echo "/' -e 's/$/"/' mac.txt | sh
(note the single quotes, preventing the shell from doing any expansions)
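To see where the epoch date comes from, it helps to look at what the shell actually hands to sed. The following is a minimal sketch, assuming GNU date (for -d @SECONDS) and forcing UTC and a fixed output format so the result is predictable; the variable names are only for illustration.

```shell
# Assumption: GNU date, which accepts -d @SECONDS.
# Inside $(...), the unquoted \1 loses its backslash, so date is asked for "@1",
# i.e. one second after the epoch. The sed back-reference never gets involved.
expanded=$(TZ=UTC date -d @\1 '+%Y-%m-%d %H:%M:%S')
echo "date printed: $expanded"

# This is the complete script that sed receives from the double-quoted version.
script="s/([0-9]{10})/$(TZ=UTC date -d @\1 '+%Y-%m-%d %H:%M:%S')/g"
echo "sed receives: $script"
```

In the asker's time zone (EST) the same instant is the "Wed Dec 31 19:00:01 EST 1969" that appears in the output.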
However, it's much more sensible to use a language that has date facilities built-in. GNU awk:
gawk -F '|' -v OFS='|' '{
for (i=1; i<=NF; i++)
if ($i ~ /^[0-9]{10}$/)
$i = strftime("%c", $i)
print
}' mac.txt
Perl will probably be installed:
perl -MPOSIX=strftime -pe 's{\b(\d{10})\b}{ strftime("%c", localtime $1) }ge' mac.txt
# or, this is more precise as it only substitutes *fields* that consist of a timestamp
perl -MPOSIX=strftime -F'\|' -lape '$_ = join "|", map { s/^(\d{10})$/ strftime("%c", localtime $1) /ge; $_ } @F' mac.txt
I have a txt file with 1000 rows of various epoch times (1396848990 = Sun Apr 6 22:36:30 PDT 2014). How can I count the number of rows that fall between 8 PM and midnight?
You can use awk to do the following:
awk 'int(strftime("%H", $1)) >= 20 {print $1}' $input_file | wc -l
It uses strftime() to convert each Unix epoch timestamp to its hour (%H), casts that to an integer with int(), and compares it to 20. If the hour is 20 or later, the timestamp is printed.
On the outside, wc can take care of counting the lines printed.
Of course, you can count with awk, too:
awk 'int(strftime("%H", $1)) >= 20 {n+=1} END{print n}' $input_file
It will silently initialize the variable n with zero and print the result at the end.
Edit: strftime() seems to exist in GNU awk:
$ awk -V
GNU Awk 4.1.3, API: 1.1 (GNU MPFR 3.1.4, GNU MP 6.1.0)
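Because strftime() is a gawk extension, a fallback for systems with a different awk can be useful. The following is a sketch, assuming GNU date for -d @SECONDS; the function name count_evening is made up here. Note that the hour is computed in the current time zone (TZ), so the same file can give different counts on machines in different zones.

```shell
# Hypothetical helper: count epoch timestamps (first column of stdin)
# whose local hour is 20 or later. Assumes GNU date (-d @SECONDS).
count_evening() {
    n=0
    while read -r ts rest; do
        case $ts in
            ''|*[!0-9]*) continue ;;   # skip blank or non-numeric lines
        esac
        hour=$(date -d "@$ts" +%H)
        # [ ... -ge ... ] compares in decimal, so "08" and "09" are safe here
        if [ "$hour" -ge 20 ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Usage would be count_evening < inputfile. It starts one date process per line, so for large files the awk version is the better choice.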
While awk surely is faster and easier to read, I think you can do this using only (modern) bash.
Something like this should work:
let counter=0; (while read -ra epoch; \
do [ $(printf "%(%H)T" ${epoch[0]}) -ge 20 ] && \
let "counter++"; done; echo $counter) <inputfile
It works like Pavel's answer but uses bash's printf to format the date (probably still strftime behind the curtains).
The %(...)T format of printf requires bash 4.2 or later; reading into an array (-a) and let counter++ are available in much older versions.
If your date command supports -d (--date) option, how about:
from=$(date +%s -d "Apr 6 20:00:00 PDT 2014")
to=$(date +%s -d "Apr 7 0:00:00 PDT 2014")
awk -v from=$from -v to=$to '$0 >= from && $0 <= to' file | wc -l
I assume that there may be a better way to do it but the only one I came up with was using AWK.
I have a file with name convention like following:
testfile_2016_03_01.txt
Using one command, I am trying to shift the date back by one day, to testfile_20160229.txt
I started from using:
file=testfile_2016_03_01.txt
IFS="_"
arr=($file)
datepart=$(echo ${arr[1]}-${arr[2]}-${arr[3]} | sed 's/.txt//')
date -d "$datepart - 1 days" +%Y%m%d
the above works fine, but I really wanted to do it in AWK. The only thing I found was how to use "date" inside AWK
new_name=$(echo ${file##.*} | awk -F'_' ' {
"date '+%Y%m%d'" | getline date;
print date
}')
echo $new_name
OK, so two things happen here. For some reason $4 still contains .txt even though I thought I had removed it with ##.*
And the main problem is I don't know how to pass the variables to that date, the below doesn't work
`awk -F'_' '{"date '-d "2016-01-01"' '+%Y%m%d'" | getline date; print date}')
Ideally I want 2016-01-01 to be built from variables coming from the file name ($2-$3-$4) and then subtract 1 day, but I think I'm juggling too many single and double quotes here and I'm losing track.
Equivalent awk command:
file='testfile_2016_03_01.txt'
echo "${file%.*}" |
awk -F_ '{cmd="date -d \"" $2"-"$3"-"$4 " -1 days\"" " +%Y%m%d";
cmd | getline date; close(cmd); print date}'
20160229
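The switch from ${file##.*} to ${file%.*} in the command above is what gets rid of the stray .txt. ## removes the longest matching prefix, and the pattern .* only matches a string that begins with a dot, so the original expansion removed nothing. A short illustration (the variable names are just for the demo):

```shell
file=testfile_2016_03_01.txt

unchanged=${file##.*}   # prefix pattern ".*" needs a leading dot: no match, nothing removed
stem=${file%.*}         # shortest suffix matching ".*" is removed
ext=${file##*.}         # longest prefix matching "*." is removed

echo "$unchanged"
echo "$stem"
echo "$ext"
```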
With GNU awk for time functions:
$ file=testfile_2016_03_01.txt
$ awk -v file="$file" 'BEGIN{ split(file,d,/[_.]/); print strftime(d[1]"_%Y%m%d."d[5],mktime(d[2]" "d[3]" "d[4]" 12 00 00")-(24*60*60)) }'
testfile_20160229.txt
This might work for you:
file='testfile_2016_03_01.txt'
IFS='_.' read -ra a <<< "$file"
date -d "${a[1]}${a[2]}${a[3]} -1 day" "+${a[0]}_%Y%m%d.${a[4]}"
Now I have the code to work on this file type:
cat myfile.txt
XSAP_SM1_100 COR-REV-SAPQ-P09 - 10/14/2013 -
SCHEDULE XSAP_SM1_100#COR-REV-SAPQ-P09 TIMEZONE Europe/Paris
ON RUNCYCLE RULE1 "FREQ=WEEKLY;BYDAY=WE"
EXCEPT RUNCYCLE CALENDAR2 FR1DOFF -1 DAYS
EXCEPT RUNCYCLE SIMPLE3 11/11/2011
AT 0530
:
XSAP_SM1_100#CORREVSAPQP09-01
AT 0640 TIMEZONE Europe/Paris
XSAP_SM1_100#CORREVSAPQP09-02
AT 0645 TIMEZONE Europe/Paris
Code is
awk 'BEGIN { RS=":"; FS="\n"}
NR==2 {
for(i=1;i<=NF;++i) {
if($i !~ /^$/) {
split($i,tmp,"#")
i=i+1
split($i,tmp2," ")
printf "\"%s\",\"%s\",\"%s\"\n", tmp[1],tmp[2],tmp2[2]
}
}
}'
But I have another file type. I'll be running this command on thousands of files in a for loop; I have consolidated them for now, and only the type below does not work as expected.
cat testing.txt
ODSSLT_P09 COR-ODS-SMT9-B01 - 12/29/2015 -
SCHEDULE ODSSLT_P09#COR-ODS-SMT9-B01 TIMEZONE UTC
ON RUNCYCLE RULE1 "FREQ=DAILY;"
AT 0505
PRIORITY 11
:
ODSSLT_P09#CORODSSMT9001-01
UNTIL 2355 TIMEZONE Asia/Shanghai
EVERY 0100
ODSSLT_P09#CORODSSMT9001-02
AT 2355
EVERY 0100
ODSSLT_P09#CORODSSMT9001-03
ODSSLT_P09#CORODSSMT9001-04
UNTIL 2355 TIMEZONE Asia/Shanghai
EVERY 0100
EOF
Expected output for this file:
"ODSSLT_P09","CORODSSMT9001-01",""
"ODSSLT_P09","CORODSSMT9001-02","2355"
"ODSSLT_P09","CORODSSMT9001-03",""
"ODSSLT_P09","CORODSSMT9001-04",""
Actual output from the code is
| grep -v -i -w -E
"CONFIRMED|DEADLINE|DAY|DAYS|EVERY|NEEDS|OPENS|PRIORITY|PROMPT|UNTIL|AWSBIA291I|END|FOLLOWS" |
awk 'BEGIN { RS=":"; FS="\n"}
NR==2 {for(i=1;i<=NF;++i) {
if($i !~ /^$/) {
split($i,tmp,"#")
i=i+1
split($i,tmp2," ")
printf "\"%s\",\"%s\",\"%s\"\n", tmp[1],tmp[2],tmp2[2]
}}}'
output just gives:
"ODSSLT_P09","CORODSSMT9001-01",""
"AT 2355","",""
"ODSSLT_P09","CORODSSMT9001-04",""
The best solution would be a small awk program doing everything (awk will loop through the input, so write something without a while).
Since you have tagged with ksh and not bash or linux, I do not trust your version of awk.
First try joining the lines and split again except for the AT. I hope no lines will have the string EOL, so I will join with an EOL marker.
sed 's/$/EOL/' myfile.txt |
tr -d "\n" |
sed -e 's/EOLAT/ AT/g' -e 's/EOL/\n/g'
Perhaps your sed version will not understand the \n, in that case replace it with a real newline.
I know what I want to do with the sed output, so I will filter before sed and change the sed commands.
foundcolon="0";
grep -E "^:$|XSAP|AT" myfile.txt |
sed 's/$/EOL/' |
tr -d "\n" |
sed -e 's/EOLAT//g' -e 's/EOL/\n/g' -e 's/#/ /g' |
while read -r xsap corr numm rest_of_line; do
if [ "${foundcolon}" = "0" ]; then
if [ "${xsap}" = ":" ]; then
foundcolon="1"
fi
continue
fi
printf '"%s","%s","%s"\n' "${xsap}" "${corr}" "${numm}";
done
Using another sed option, sed -e '/address1/,/address2/ d' will make it even more simple:
grep -E "^:$|XSAP|AT" myfile.txt |
sed 's/$/EOL/' |
tr -d "\n" |
sed -e 's/EOLAT//g' -e 's/EOL/\n/g' -e '1,/^:$/ d' -e 's/#/ /g' |
while read -r xsap corr numm rest_of_line; do
printf '"%s","%s","%s"\n' "${xsap}" "${corr}" "${numm}";
done
Here's a more or less pure awk solution, which produces literally the
requested output for the given input file. It suffers from having no
knowledge of the problem domain.
awk '
/^:/ { start=1; next }
! start {next}
$1 == "AT" {
split(last,a,/#/)
printf "\"%s\",\"%s\",\"%s\"\n", a[1], a[2], $2
last=""
next
}
{
last=$0
}' data
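For the second file type, where a job may have no AT line at all, one option is to remember the current job and emit it when the next job (or the end of input) arrives. The following is a sketch under assumptions drawn only from the two samples: job lines are the ones containing # after the : line, and the start time is the second word of a line whose first word is AT. parse_jobs and emit are names made up for this example.

```shell
# Hypothetical helper: print "workstation","job","AT time" for every job
# after the ":" line; the time is empty when the job has no AT line.
parse_jobs() {
    awk '
        # Print the remembered job (if any) and reset the state.
        function emit() {
            if (job != "") {
                split(job, a, "#")
                printf "\"%s\",\"%s\",\"%s\"\n", a[1], a[2], at
            }
            job = ""
            at = ""
        }
        /^:/       { started = 1; next }      # jobs begin after the colon line
        !started   { next }                   # skip the schedule header
        /#/        { emit(); job = $1; next } # a new job closes the previous one
        $1 == "AT" { at = $2 }                # remember the start time
        END        { emit() }                 # do not forget the last job
    ' "$@"
}
```

Called as parse_jobs testing.txt, this yields one line per job, including the jobs that only carry UNTIL or EVERY lines, so the grep -v pre-filter is not needed.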
using java:
File file = new File("C:/Users/Administrator/Desktop/es.txt");
List<String> lines = FileUtils.readLines(file, "utf-8");
for (String line : lines) {
String[] arr = line.split("\\u007C\\u001C");
System.out.println(arr.length);
System.out.println(Arrays.toString(arr));
}
how can I do it in shell(awk, tr, or sed)?
I've tried this, but it didn't work:
awk -F\u007c\u001c '{print $1}' es.txt
Thanks.
Obviously, U+007C and U+001C are plain old 7-bit ASCII characters, so splitting on those doesn't actually require any Unicode support (apart from possibly handling any ASCII-incompatible Unicode encoding in the files you are manipulating; but your question indicates that your data is in UTF-8, so that does not seem to be the case here. UTF-16 would require the splitting tool to be specifically aware of and compatible with the encoding).
Assuming your question can be paraphrased as "if I know the numeric Unicode code point I want to split on, how do I pass that to a tool which is capable of splitting on it", my recommendation would be Perl.
perl -CSD -aF'\N{U+1f4a9}' -nle 'print $F[0]' es.txt
using U+1F4A9 as the separator. (Perl's arrays are zero-based, so $F[0] corresponds to Awk's $1. The -a option requests field splitting to the array @F; normally, Perl does not explicitly split the input into fields.) If the hex code for the code point you want to use as the field separator is in a shell variable, use double quotes instead of single, obviously.
PIPE='007C'
FS='001C'
perl -CSD -aF"\N{U+$PIPE}\N{U+$FS}" -nle 'print $F[0]' es.txt
Alternatively, if the tool you want to use handles UTF-8 transparently, you can use the ANSI C quoting facility of Bash to specify the separator. Unicode support seems only to have been introduced in Bash 4.2 so e.g. Debian Squeeze (currently oldoldstable) does not have it.
awk -F$'\U0001f4a9' '{print $1}' es.txt # or $'\u007c' for 4-digit code points
However, because the quoting facility is a form of single quotes, you can't (easily) have the separator's code point value in a variable.
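If the separator really is U+007C followed by U+001C, no Unicode handling is needed at all, because both are single bytes. Here is a sketch in plain shell and awk: the separator byte is produced by printf's octal escape (\034 is 0x1C), and the pipe is put in a bracket expression so awk does not read it as regex alternation. first_field is a hypothetical name.

```shell
# Hypothetical helper: print the first field of each line, where fields are
# separated by the two-byte sequence "|" followed by 0x1C (FILE SEPARATOR).
first_field() {
    fs_char=$(printf '\034')                    # the raw 0x1C byte
    awk -F "[|]${fs_char}" '{ print $1 }' "$@"  # multi-character FS is a regex
}
```

Usage would be first_field es.txt, which corresponds to arr[0] in the Java snippet.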
gawk 4.1.3
[root@test /tmp]$ more a
\u8BF7\u5C06\u60A8\u8981\u8F6C\u6362\u7684\u6C49\u6587\u8981\u8F6C\u5185\u5BB9\u
7C98\u8D34\u5728\u8FD9\u91CC\u3002
[root@test /tmp]$ awk -F '.u8981..8F6C' '{print $1}' a
\u8BF7\u5C06\u60A8
[root@test /tmp]$ awk -F '.u8981..8F6C' '{print $2}' a
\u6362\u7684\u6C49\u6587
[root@test /tmp]$ awk -F '.u8981..8F6C' '{print $3}' a
\u5185\u5BB9\u7C98\u8D34\u5728\u8FD9\u91CC\u3002
Pure bash:
As your question is tagged shell there is a pure bash way:
declare -a out=()
pnt=0
while IFS= read -d '' -n1 char ;do
LANG=C LC_ALL=C printf -v val %d "'$char"
(( val == 195 )) && out[pnt]+= &&
printf -v out[pnt+1] "%s" "${char}" &&
((pnt+=2)) ||
printf -v out[pnt] "%s%s" "${out[pnt]}" "${char}"
done <<<'Il est déjà très tard!'
When the submitted string contains UTF-8 characters and a trailing newline, this creates an array of 7 strings:
declare -p out
declare -a out=([0]="Il est d" [1]="é" [2]="j" [3]="à" [4]=" tr" [5]="è" [6]=$'s tard!\n')
or
cat -n <(printf -- "<%s>\n" "${out[@]@Q}")
1 <'Il est d'>
2 <'é'>
3 <'j'>
4 <'à'>
5 <' tr'>
6 <'è'>
7 <$'s tard!\n'>
Counting from one, the even-numbered fields (array indexes 1, 3, 5) are separators and the odd-numbered fields are content.
As a function:
splitOnUnicod () {
local -n out=$1
out=()
local -i pnt=0 cval
local char
while IFS= read -d '' -rn1 char; do
LANG=C LC_ALL=C printf -v cval %d "'$char";
((cval==195)) && out[pnt]+= && printf -v out[++pnt] %s "$char" && pnt+=1 || printf -v out[pnt] %s%s "${out[pnt]}" "$char";
done
}
Then
splitOnUnicod myvar <<<"Généralités"
declare -p myvar
declare -a myvar=([0]="G" [1]="é" [2]="n" [3]="é" [4]="ralit" [5]="é" [6]=$'s\n')
splitOnUnicod myvar < <(printf "Iñès.")
declare -p myvar
declare -a myvar=([0]="I" [1]="ñ" [2]="" [3]="è" [4]="s.")
Here ñ and è are the separators; they sit at array indexes 1 and 3 (the even-numbered fields when counting from one).
paste <(printf %s\\n "${!myvar[@]}") <(printf %s\\n "${myvar[@]}")
0 I
1 ñ
2
3 è
4 s.
I have a file (as one often does) with dates in *nix time as seconds from the Epoch, followed by a message and a final "thread" field I am wanting to select. All separated with a '|' as exported from a sqlite DB...
e.g
1306003700|SENT|21
1277237887|SENT|119
1274345263|SENT|115
1261168663|RECV|21
1306832459|SENT|80
1306835346|RECV|80
Basically, I can use sed easily enough to select and print lines that match the "thread" field and print the respective times with messages, thus:
> cat file | sed -n "s/^\([0-9]*\)\|\(.*\)\|80$/\1 : \2/p"
1306832459 : SENT
1306835346 : RECV
But what I really want to do is also pass the time field through the unix date command, so:
> cat file | sed -n "s/^\([0-9]*\)\|\(.*\)\|80$/`date -r \1` : \2/p"
But this doesn't seem to work - even though it seems to accept it. It just prints out the same (start of Epoch) date:
Thu 1 Jan 1970 01:00:01 BST : SENT
Thu 1 Jan 1970 01:00:01 BST : RECV
How/can I evaluate/interpolate the back reference \1 to the date command?
Maybe sed isn't the way to match these lines (and format the output in one go)...
awk is perfect for this.
awk -F"|" '$3 == '80' { print system("date -r " $1), ":", $2 }' myfile.txt
This should work. (I can't guarantee that the system call is right, though; I didn't test it.)
This pure bash loop
wanted=80
(IFS=\|; while read sec message thread
do
[[ $thread == $wanted ]] && echo $(date -r $sec) : $message
done) < datafile.txt
prints
Tue May 31 11:00:59 CEST 2011 : SENT
Tue May 31 11:49:06 CEST 2011 : RECV
For safety, you can put the variables in double quotes.
Perl is handy here:
perl -MPOSIX -F'\|' -lane '
next unless $F[2] == "80";
print(strftime("%Y-%m-%d %T", localtime $F[0]), " : ", $F[1])
' input.file
This might work for you:
sed -n 's/^\([0-9]*\)|\(.*\)|80$/echo "$(date -d @\1) : \2"/p' file | sh
or if you have GNU sed:
sed -n 's/^\([0-9]*\)|\(.*\)|80$/echo "$(date -d @\1) : \2"/ep' file
Using awk:
awk -F'|' '/80$/{printf("echo $(date -d @%s) : %s;",$1,$2);}' /path/to/file | sh
The date command will be executed before the sed command.
It may be easiest to break out perl or python for this job, or you can use some kind of bash loop otherwise.
Inspired by Rafe's answer above:
awk -F"|" '$3 == '80' { system("date -d @" $1 " | tr -d \"\n\""); print " :", $2 }' myfile.txt
For my version of date, it seems that the option is -d rather than -r, and that it requires a leading @. The other change is that system() executes the command (so date itself does the printing, newline and all, hence the tr) and returns the exit code. We don't want to print the exit code (0), so the system call is moved outside of any awk print.
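An alternative to system() is to read the command's output back into awk with getline, which avoids both the stray exit code and the tr. The following is a sketch, assuming GNU date (-d @SECONDS; BSD and macOS use date -r SECONDS instead). format_thread is a hypothetical name, and the fixed date format is chosen only to keep the output predictable.

```shell
# Hypothetical helper: print "DATE : MESSAGE" for rows whose third field equals $1.
# Assumes GNU date (-d @SECONDS).
format_thread() {
    thread=$1
    shift
    awk -F'|' -v want="$thread" '
        # Only purely numeric timestamps are passed on to the shell command.
        $3 == want && $1 ~ /^[0-9]+$/ {
            cmd = "date -d @" $1 " \"+%Y-%m-%d %H:%M:%S\""
            cmd | getline stamp   # read the single line that date prints
            close(cmd)            # release the pipe before the next row
            print stamp " : " $2
        }' "$@"
}
```

Usage would be format_thread 80 myfile.txt. The dates are rendered in the current time zone, as with the other answers.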