Parsing timestamp using sed and embedded command - shell

There's a file with some lines containing some text and either date or time stamp:
...
string1-20141001
string2-1414368000000
string3-1414454400000
...
I want to quickly convert time stamps to dates, like this:
$ date -d @1414368000 +"%Y%m%d"
20141027
and I want to do this dynamically with sed or some similar command line tool. For testing I unsuccessfully use this:
$ echo "something-1414454400000" | sed "s/-\(..........\)...$/-$(date -d @\\1 +'%Y%m%d')/"
date: invalid date '@\\1'
something-
but echoing seems to be working:
$ echo "something-1414454400000" | sed "s/-\(..........\)...$/-$(echo \\1)/"
something-1414454400
so what could be done?

It's interesting what's happening here. Some pointers:
Always single-quote your regex for sed, if possible, when using BASH (etc), especially if it contains special characters like $. Inside double quotes the shell expands $(...) itself, which is why date is being run (with -d @\\1) before sed even gets involved.
Your "working" echo example isn't, actually (I believe): echo \\1 produces \1 (and as above, will do so before sed even gets invoked). This then happens to be valid sed replacement syntax, so sed substitutes your group from the LHS, which is why the output looks about right.
Note that by using -r, you can use easier / more advanced regex syntax.
Hard to say exactly what to do without a bit more context, but to fix the immediate problems, try something like:
echo "something-1414454400000" | sed -re 's/-([0-9]{10}).+/-$(date -d @\1 +"%Y%m%d")/'
which produces: something-$(date -d @1414454400 +"%Y%m%d") (a command substitution that a shell can then expand)
Or for a more complete solution, you can change the regex to produce a shell command directly, and pipe it:
echo "something-1414454400000" | sed -re 's/(.*-)([0-9]{10}).+/echo \1$(date -d @\2 \"+%Y%m%d\")/' | sh
..producing something-20141028 (in a UTC time zone; note that the format needs %m for the month, since %M is minutes)
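A minimal sketch of the same "generate commands, then pipe to sh" idea that also copes with the lines in the sample file that already hold a date. The function name stamps_to_dates is hypothetical, GNU date is assumed for -d @N, and -u is added only so the result does not depend on the local time zone. Because each input line is pasted into a shell command, this is only reasonable for input without quotes, backticks or dollar signs.

```shell
# Hypothetical helper: turn "text-<13-digit ms timestamp>" into "text-YYYYMMDD".
# Assumes GNU date and input lines free of shell metacharacters.
stamps_to_dates() {
    sed -E \
        -e 's/^(.*-)([0-9]{10})[0-9]{3}$/printf "%s%s\\n" "\1" "$(date -u -d @\2 +%Y%m%d)"/' \
        -e t \
        -e 's/.*/printf "%s\\n" "&"/' | sh
}

# The first expression rewrites timestamp lines into a printf + date command,
# "t" skips the rest when that substitution happened, and the last expression
# wraps every other line in a plain printf so it passes through unchanged.
printf '%s\n' string1-20141001 string2-1414368000000 string3-1414454400000 |
    stamps_to_dates
```

The single quotes around each sed expression are what keep the shell from expanding $(date ...) before sed runs.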

You can do this in BASH:
while read -r p; do
if [[ "$p" =~ ^(.+-)([0-9]{10}).{3}$ ]]; then
echo -n "${BASH_REMATCH[1]}"
date -d "@${BASH_REMATCH[2]}" +"%Y%m%d"
else
echo "$p"
fi
done < file
OUTPUT:
string1-20141001
string2-20141026
string3-20141027

awk -F- 'BEGIN { OFS=FS }
$2 ~ /^[0-9]{13}$/ {
cmd = "date -d@" $2/1000 " +%Y%m%d"
cmd | getline t; close(cmd); $2=t }1' file

Try this command. It runs on your inputs, but note that the timestamp is hard-coded, so every line receives the same date:
sed -E "s,(.*)-(.*),\1-`date -d @1414368000 +'%Y%m%d'`,g" file

Related

Modify bash variable with sed

Why doesn't the following bash script work? I would like it to output two lines like this:
XXXXXXX
YYYYYYY
It works if I change the sed line to use a filename instead of the variable, but I want to use the variable.
#!/bin/bash
input=$(echo -e '=======\n-------\n')
for sym in = -; do
if [ "$sym" == '-' ]; then
replace=Y
else
replace=X
fi
printf "%s\n" "s/./$replace/g"
done | sed -f- <<<"$input"
The main problem is that you're giving sed two sources to read standard input from: the for loop that is fed through the pipe, and the variable coming through the here-string. As it turns out, the here-string gets precedence and sed complains that there are extra characters after a command (= is a command).
Instead of a here-string, you could use process substitution:
for sym in = -; do
if [ "$sym" == '-' ]; then
replace=Y
else
replace=X
fi
printf "%s\n" "s/./$replace/g"
done | sed -f- <(printf '%s\n' '=======' '-------')
You'll notice that the output isn't what you want, though, namely
YYYYYYY
YYYYYYY
This is because the sed script you end up with looks like this:
s/./X/g
s/./Y/g
No matter what you do first, the last command replaces everything with Y.
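One way to get the intended output is to make each generated command match only its own symbol, and to hand the script to sed with -e so that standard input stays free for the data. This is a sketch, not the original poster's code; the function name make_script is hypothetical.

```shell
# Hypothetical helper: emit one substitution per symbol, each limited to that symbol.
make_script() {
    for sym in = -; do
        if [ "$sym" = '-' ]; then
            replace=Y
        else
            replace=X
        fi
        printf '%s\n' "s/$sym/$replace/g"
    done
}

# The generated script is "s/=/X/g" and "s/-/Y/g", one command per line.
printf '%s\n' '=======' '-------' | sed -e "$(make_script)"
```

With the pattern narrowed from . to the symbol itself, the second command no longer overwrites what the first one produced.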

Bad Substitution error with pdfgrep as variable?

I'm using a bash script to parse information from a PDF and use it to rename the file (with the help of pdfgrep). However, after some working, I'm receiving a "Bad Substitution" error with line 5. Any ideas on how to reformat it?
shopt -s nullglob nocaseglob
for f in *.pdf; do
id1=$(pdfgrep -i "ID #: " "$f" | grep -oE "[M][0-9][0-9]+")
id2=$(pdfgrep -i "Second ID: " "$f" | grep -oE "[V][0-9][0-9]+")
$({ read dobmonth; read dobday; read dobyear; } < (pdfgrep -i "Date Of Birth: " "$f" | grep -oE "[0-9]+"))
# Check id1 is found, else do nothing
if [ ${#id1} ]; then
mv "$f" "${id1}_${id2}_${printf '%02d-%02d-%04d\n' "$dobmonth" "$dobday" "$dobyear"}.pdf"
fi
done
There are several unrelated bugs in this code; a corrected version might look like the following:
#!/usr/bin/env bash
shopt -s nullglob nocaseglob
for f in *.pdf; do
id1=$(pdfgrep -i "ID #: " "$f" | grep -oE "[M][0-9][0-9]+") || continue
id2=$(pdfgrep -i "Second ID: " "$f" | grep -oE "[V][0-9][0-9]+") || continue
{ read dobmonth; read dobday; read dobyear; } < <(pdfgrep -i "Date Of Birth: " "$f" | grep -oE "[0-9]+")
printf -v date '%02d-%02d-%04d' "$dobmonth" "$dobday" "$dobyear"
mv -- "$f" "${id1}_${id2}_${date}.pdf"
done
< (...) isn't meaningful bash syntax. If you want to redirect from a process substitution, you should use the redirection syntax < and the process substitution <(...) separately.
$(...) generates a subshell -- a separate process with its own memory, such that variables assigned in that subprocess aren't exposed to the larger shell as a whole. Consequently, if you want the contents you set with read to be visible, you can't have them be in a subshell.
${printf ...} isn't meaningful syntax. Perhaps you wanted a command substitution? That would be $(printf ...), not ${printf ...}. However, it's more efficient to use printf -v varname 'fmt' ..., which avoids the overhead of forking off a subshell altogether.
Because we put the || continues on the id1=$(... | grep ...) command, we no longer need to test whether id1 is nonempty: The continue will trigger and cause the shell to continue to the next file should the grep fail.
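The subshell point can be checked without pdfgrep. The sketch below uses a here-document in place of the process substitution so that it only needs the date fields as literal text; the variable name leaked is made up for the demonstration.

```shell
# A variable assigned inside $(...) exists only in that subshell.
unset leaked
: "$(leaked=inside)"
printf 'after command substitution: %s\n' "${leaked:-<unset>}"

# A brace group runs in the current shell, so the variables set by read survive.
{ read -r dobmonth; read -r dobday; read -r dobyear; } <<EOF
05
21
1996
EOF
printf 'after brace group: %s-%s-%s\n' "$dobmonth" "$dobday" "$dobyear"
```

The first printf reports the variable as unset, while the second prints the three values that read stored.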
Do what Charles suggests with regard to creating the new file name, but you might consider a different approach to parsing the PDF file to reduce how many pdfgreps, pipes and greps you're running on each file. I don't have pdfgrep on my system, nor do I know what your input file looks like, but if we use this input file:
$ cat file
foo
ID #: M13
foo
Date Of Birth: 05 21 1996
foo
Second ID: V27
foo
and grep -E in place of pdfgrep then here's how I'd get the info from the input file by just reading it once with pdfgrep and parsing that output with awk instead of reading it multiple times with pdfgrep and using multiple pipes and greps to extract the info you need:
$ grep -E -i '(ID #|Second ID|Date Of Birth): ' file |
awk -F': +' '{f[$1]=$2} END{print f["ID #"], f["Second ID"], f["Date Of Birth"]}'
M13 V27 05 21 1996
Given that, you can use the same read approach to save the output in variables (or an array). You may need to adjust the awk command depending on what your pdfgrep output actually looks like.
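A sketch of that last step, still using grep -E as a stand-in for pdfgrep and the sample input shown above. The function name extract_fields and the variable newname are hypothetical. The date parts are joined as strings rather than with %02d, because bash's printf reads numbers with a leading zero such as 08 as octal.

```shell
# Hypothetical helper: print "ID1 ID2 MM DD YYYY" from text on stdin.
extract_fields() {
    grep -E -i '(ID #|Second ID|Date Of Birth): ' |
        awk -F': +' '{f[$1]=$2} END{print f["ID #"], f["Second ID"], f["Date Of Birth"]}'
}

sample=$(printf '%s\n' foo 'ID #: M13' foo 'Date Of Birth: 05 21 1996' foo 'Second ID: V27' foo)

# read splits the single output line on whitespace into five variables.
read -r id1 id2 dobmonth dobday dobyear <<EOF
$(printf '%s\n' "$sample" | extract_fields)
EOF

newname="${id1}_${id2}_${dobmonth}-${dobday}-${dobyear}.pdf"
printf '%s\n' "$newname"
```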

bash script command output execution doesn't assign full output when using backticks

I have often used backticks (``) to capture the output of a command in a variable, but with the following code I am not getting the right output.
#!/bin/bash
export XLINE='($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER'
echo 'Original XLINE'
echo $XLINE
echo '------------------'
echo 'Extract all word with $ZWP'
#works fine
echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP
echo '------------------'
echo 'Assign all word with $ZWP to XVAR'
#XVAR doesn't get all the values
export XVAR=`echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP` #fails
echo "$XVAR"
and i get:
Original XLINE
($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER
------------------
Extract all word with $ZWP
ZWP_SCRIP_NAME
ZWP_LT_RSI_TRIGGER
ZWP_RTIMER
------------------
Assign all word with $ZWP to XVAR
ZWP_RTIMER
Why doesn't XVAR get all the values?
However, if I use $() instead of `` to capture the output, it works fine. Why do backticks not work?
Having GNU grep you can use this command:
XVAR=$(grep -oP '\$\KZWP[A-Z_]+' <<< "$XLINE")
If you pass -P, grep uses Perl-compatible regular expressions. The key here is the \K escape sequence. Basically the regex matches $ZWP followed by one or more uppercase characters or underscores. The \K after the $ removes the $ itself from the match, while its presence is still required to match the whole pattern. Call it a poor man's lookbehind if you want, I like it! :)
Btw, grep -o prints each match on its own line instead of printing the whole lines that match the pattern.
If you don't have GNU grep or you care about portability you can use awk, like this:
XVAR=$(awk -F'$' '{sub(/[^A-Z_].*/, "", $2); print $2}' RS=',' <<< "$XLINE")
First, the smallest change that makes your code "work":
echo "$XLINE" | tr '$' '\n' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP_
The use of tr replaces a sed expression that didn't actually do what you thought it did -- try looking at its output to see.
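The difference between the two substitution styles comes from how backslashes are treated: inside backticks, a backslash followed by $ is consumed before the command runs, even within single quotes, so sed receives s/$/\n/g (match end of line) instead of s/\$/\n/g. A small sketch that makes this visible, with made-up variable names old and new:

```shell
# Capture the same single-quoted sed expression with both substitution styles.
old=`printf '%s' 's/\$/X/'`
new=$(printf '%s' 's/\$/X/')

printf 'backticks:    %s\n' "$old"   # the backslash before $ is gone
printf 'dollar-paren: %s\n' "$new"   # the expression is passed through intact
```

This is why the $() form of the original command splits the line at each dollar sign, while the backtick form only appends a newline at the end of the line.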
One sane alternative would be to rely on GNU grep's -o option. If you can't do that...
zwpvars=( ) # create a shell array
zwp_assignment_re='[$](ZWP_[[:alnum:]_]+)(.*)' # ...and a regex
content="$XLINE"
while [[ $content =~ $zwp_assignment_re ]]; do
zwpvars+=( "${BASH_REMATCH[1]}" ) # found a reference
content=${BASH_REMATCH[2]} # stuff the remaining content aside
done
printf 'Found variable: %s\n' "${zwpvars[@]}"

Speed up bash filter function to run commands consecutively instead of per line

I have written the following filter as a function in my ~/.bash_profile:
hilite() {
export REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g")
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
exit 0
}
to find lines of anything piped into it matching a regular expression, and highlight matches using ANSI escape codes on a VT100-compatible terminal.
For example, the following finds and highlights the strings bin, U or 1 which are whole words in the last 10 lines of /etc/passwd:
tail /etc/passwd | hilite "\b(bin|[U1])\b"
However, the script runs very slowly as each line forks an echo, egrep and sed.
In this case, it would be more efficient to do egrep on the entire input, and then run sed on its output.
How can I modify my function to do this? I would prefer to not create any temporary files if possible.
P.S. Is there another way to find and highlight lines in a similar way?
sed can do a bit of grepping itself: if you give it the -n flag (or #n instruction in a script) it won't echo any output unless asked. So
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
could be simplified to
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
EDIT:
Here's the whole function:
hilite() {
REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g");
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
}
That's all there is to it - no while loop, reading, grepping, etc.
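A variation on the same idea, offered as a sketch rather than a drop-in replacement: with sed -E the pattern can stay in extended syntax, so the step that escapes |, ( and ) is not needed. It assumes the pattern contains no / characters, and it builds the escape character with printf instead of relying on \x1b support in sed.

```shell
# Print only the lines that match the extended regex in $1, with every match
# wrapped in reverse-video escape codes. Assumes $1 contains no "/".
hilite() {
    esc=$(printf '\033')
    sed -E -n "s/$1/${esc}[7m&${esc}[0m/gp"
}

printf '%s\n' 'root:x:0' 'bin:x:1' | hilite 'bin|daemon'
```

Only the second input line is printed, since -n suppresses default output and the p flag prints lines where a substitution happened.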
If your egrep supports --color, just put this in .bash_profile:
hilite() { command egrep --color=auto "$@"; }
(Personally, I would name the function egrep; hence the usage of command).
I think you can replace the whole while loop with simply
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
because sed reads stdin line by line, so you don't need read.
I'm not sure if running egrep and piping to sed is faster than using sed alone, but you can always compare using time.
Edit: added -n and p to sed to print only highlighted lines.
Well, you could simply do this:
egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
But I'm not sure that it'll be that much faster ; )
Just for the record, this is a method using a temporary file:
hilite() {
export REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g")
export FILE=$2
if [ -z "$FILE" ]
then
export FILE=~/tmp
echo -n > $FILE
while read line
do
echo $line >> $FILE
done
fi
egrep "$1" $FILE | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
return $?
}
which also takes a file/pathname as the second argument, for cases like
cat /etc/passwd | hilite "\b(bin|[U1])\b"

Substitution with sed + bash function

My question seems general, but I can't find any answers.
In a sed command, how can you replace the matched pattern with a value returned by a simple bash function?
For instance, I created the following function :
function parseDates(){
#Some process here with $1 (the pattern found)
return "dateParsed;
}
and the following sed command:
myCatFile=`sed -e "s/[0-3][0-9]\/[0-1][0-9]\/[0-9][0-9]/& parseDates &\}/p" myfile`
I found that the character '&' represents the current match; I'd like it to be passed to my bash function, and the whole match to be replaced by the match followed by dateParsed.
Does anybody have an idea?
Thanks
You can use the "e" flag of the s command (a GNU sed extension) like this:
cat t.sh
myecho() {
echo ">>hello,$1<<"
}
export -f myecho
sed -e "s/.*/myecho &/e" <<END
ni
END
you can see the result without "e":
cat t.sh
myecho() {
echo ">>hello,$1<<"
}
export -f myecho
sed -e "s/.*/myecho &/" <<END
ni
END
Agree with Glenn Jackman.
If you want to use bash function in sed, something like this :
sed -rn 's/^([[:digit:].]+)/`date -d @&`/p' file |
while read -r line; do
eval echo "$line"
done
My file here begins with a unix timestamp (e.g. 1362407133.936).
Bash function inside sed (maybe for other purposes):
multi_stdin(){ #Makes function accept variable or stdin (via pipe)
[[ -n "$1" ]] && echo "$*" || cat -
}
sans_accent(){
multi_stdin "$@" | sed '
y/àáâãäåèéêëìíîïòóôõöùúûü/aaaaaaeeeeiiiiooooouuuu/
y/ÀÁÂÃÄÅÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜ/AAAAAAEEEEIIIIOOOOOUUUU/
y/çÇñÑߢÐð£Øø§µÝý¥¹²³ªº/cCnNBcDdLOoSuYyY123ao/
'
}
eval $(echo "Rogério Madureira" | sed -n 's#.*#echo & | sans_accent#p')
or
eval $(echo "Rogério Madureira" | sed -n 's#.*#sans_accent &#p')
Rogerio Madureira
And if you need to keep the output into a variable:
VAR=$( eval $(echo "Rogério Madureira" | sed -n 's#.*#echo & | sans_accent#p') )
echo "$VAR"
Do it step by step. (You could also use an alternate delimiter, such as "|" instead of "/".)
function parseDates(){
#Some process here with $1 (the pattern found)
echo "dateParsed"
}
value=$(parseDates)
sed -n "s|[0-3][0-9]/[0-1][0-9]/[0-9][0-9]|& $value &|p" myfile
Note the use of double quotes instead of single quotes, so that $value can be interpolated.
I'd like to know if there's a way to do this too. However, for this particular problem you don't need it. If you surround the different components of the date with ()s, you can back reference them with \1 \2 etc and reformat however you want.
For instance, let's reverse 03/04/1973:
echo 03/04/1973 | sed -e 's/\([0-9][0-9]\)\/\([0-9][0-9]\)\/\([0-9][0-9][0-9][0-9]\)/\3\/\2\/\1/g'
sed -e 's#[0-3][0-9]/[0-1][0-9]/[0-9][0-9]#& $(parseDates &)#' myfile |
while read -r line; do
eval echo "$line"
done
You can glue together a sed-command by ending a single-quoted section, and reopening it again.
sed -n 's|[0-3][0-9]/[0-1][0-9]/[0-9][0-9]|& '$(parseDates)' &|p' datefile
However, in contrast to other examples, a function in bash can't return strings; it can only print them:
function parseDates(){
# Some process here with $1 (the pattern found)
echo dateParsed
}
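Putting these pieces together, here is a sketch that calls a shell function once per matching line without eval. The body of parseDates (reordering dd/mm/yy into yy-mm-dd) and the function name annotate_dates are assumptions made for illustration; grep -o is assumed to be available, and only the first date on each line is handled.

```shell
# Hypothetical implementation: turn dd/mm/yy into yy-mm-dd and print it.
parseDates() {
    printf '%s\n' "$1" | sed -E 's#([0-9]{2})/([0-9]{2})/([0-9]{2})#\3-\2-\1#'
}

# For each input line, find the first date, call parseDates on it in the
# shell, and splice the printed result in after the match.
annotate_dates() {
    while IFS= read -r line; do
        match=$(printf '%s\n' "$line" |
            grep -oE '[0-3][0-9]/[0-1][0-9]/[0-9][0-9]' | head -n 1)
        if [ -n "$match" ]; then
            printf '%s\n' "$line" | sed "s#$match#& $(parseDates "$match")#"
        else
            printf '%s\n' "$line"
        fi
    done
}

printf '%s\n' 'born 03/04/73 here' 'no date' | annotate_dates
```

The function runs in the shell and its output is pasted into the sed replacement, so sed never needs to know about it. This forks several processes per line, so it suits small files better than large ones.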

Resources