Shell Script String Manipulation - bash

I'm trying to replace an epoch timestamp within a string, with a human readable timestamp. I know how to convert the epoch to the time format I need (and have been doing so manually), though I'm having trouble figuring out how to replace it within the string (via script).
The string is a file name, such as XXXX-XXX-2011-01-25-3.6.2-record.pb.1296066338.gz (the epoch is the 1296066338 part).
I've been converting the timestamp with the following gawk code:
gawk 'BEGIN{print strftime("%Y%m%d.%k%M",1296243507)}'
I'm generally unfamiliar with bash scripting. Can anyone give me a nudge in the right direction?
thanks.

You can use this
date -d '@1296066338' +'%Y%m%d.%k%M'
in case you don't want to invoke awk.
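If you would rather not spawn any external command at all, Bash's printf builtin can format an epoch by itself. This is a minimal sketch, assuming Bash 4.2 or later. It uses %H (zero-padded hour) instead of the %k from the question, and it pins TZ to UTC only to make the result reproducible:

```shell
#!/bin/bash
# Assumes Bash 4.2+, where printf understands %(strftime-format)T.
epoch=1296066338
# TZ=UTC applies to this one command only; drop it to get local time.
stamp=$(TZ=UTC printf '%(%Y%m%d.%H%M)T' "$epoch")
echo "$stamp"
```

In UTC this epoch falls on 2011-01-26 at 18:25, so the stamp reads 20110126.1825.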

Are all filenames the same format? Specifically, "." + epoch + ".gz"?
If so, you can use a number of different routes. Here's one with sed:
$ echo "XXXX-XXX-2011-01-25-3.6.2-record.pb.1296066338.gz" | sed 's/.*\.\([0-9]\+\)\.gz/\1/'
1296066338
So that extracts the epoch, then send it to your gawk command. Something like:
#!/bin/bash
...
epoch=$( echo "XXXX-XXX-2011-01-25-3.6.2-record.pb.1296066338.gz" | sed 's/.*\.\([0-9]\+\)\.gz/\1/' )
readable_timestamp=$( gawk "BEGIN{print strftime(\"%Y%m%d.%k%M\",${epoch})}" )
Then use whatever method you want to replace the number in the filename. You can send it through sed again, but instead of saving the epoch, you would want to save the other parts of the filename.
EDIT:
For good measure, a working sample on my machine:
#!/bin/bash
filename="XXXX-XXX-2011-01-25-3.6.2-record.pb.1296066338.gz"
epoch=$( echo ${filename} | sed 's/.*\.\([0-9]\+\)\.gz/\1/' )
readable_timestamp=$( gawk "BEGIN{print strftime(\"%Y%m%d.%k%M\",${epoch})}" )
new_filename=$( echo ${filename} | sed "s/\(.*\.\)[0-9]\+\(\.gz\)/\1${readable_timestamp}\2/" )
echo ${new_filename}
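For the final replacement step, Bash's pattern substitution is one alternative to a second sed call. A small sketch, assuming the epoch and the readable timestamp are already in variables (the timestamp value here is only an illustrative placeholder):

```shell
#!/bin/bash
filename='XXXX-XXX-2011-01-25-3.6.2-record.pb.1296066338.gz'
epoch=1296066338
readable_timestamp=20110126.1825   # placeholder value for illustration
# ${var/pattern/replacement} replaces the first match; the dots on both
# sides keep the match anchored to a whole dot-separated field.
new_filename=${filename/.$epoch./.$readable_timestamp.}
echo "$new_filename"
```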

You can use Bash's string manipulation and AWK's variable passing to avoid having to make any calls to sed or do any quote escaping.
#!/bin/bash
file='XXXX-XXX-2011-01-25-3.6.2-record.pb.1296066338.gz'
base=${file%.*.*} # XXXX-XXX-2011-01-25-3.6.2-record.pb
epoch=${file#"$base".} # 1296066338.gz
epoch=${epoch%.*} # 1296066338
# we can extract the extension similarly, unless it's safe to assume it's ".gz"
humantime=$(gawk -v "epoch=$epoch" 'BEGIN{print strftime("%Y%m%d.%k%M",epoch)}')
newname=$base.$humantime.gz
echo "$newname"
Result:
XXXX-XXX-2011-01-25-3.6.2-record.pb.20110126.1225.gz
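To apply this to many files, the same expansions can be wrapped in a function. The sketch below is an illustration, not the answerer's code: humanize_name is a hypothetical name, it assumes GNU date (for -d @epoch), it forces UTC with -u so the output does not depend on the machine, and it uses %H rather than %k because %k pads single-digit hours with a space, which would end up in the file name.

```shell
#!/bin/bash
# Hypothetical helper: turn NAME.<epoch>.EXT into NAME.<YYYYmmdd.HHMM>.EXT.
# Assumes GNU date; -u pins the conversion to UTC.
humanize_name() {
    local file=$1
    local ext=${file##*.}      # gz
    local rest=${file%.*}      # everything before the extension
    local epoch=${rest##*.}    # 1296066338
    local base=${rest%.*}      # XXXX-XXX-2011-01-25-3.6.2-record.pb
    local stamp
    # Refuse names whose second-to-last field is not purely numeric.
    [[ $epoch =~ ^[0-9]+$ ]] || return 1
    stamp=$(date -u -d "@$epoch" +'%Y%m%d.%H%M') || return 1
    printf '%s.%s.%s\n' "$base" "$stamp" "$ext"
}

humanize_name 'XXXX-XXX-2011-01-25-3.6.2-record.pb.1296066338.gz'
```

A loop such as for f in *.gz; do mv -- "$f" "$(humanize_name "$f")"; done would then do the renaming, ideally after a dry run with echo.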

Related

I need to know how to access the characters that I want from an argument from the command line

I need to know how to access to any character I want from an argument from the command line. Something like the cut command in bash, but from the command line.
For example if I write a date in the format dd/mm/yyyy, and I only want the characters dd, mm, and yyyy. How do I do that?
I've tried with the read command, but I don't know how to use it so well.
Bash provides a few ways to extract substrings (from command-line input, or from any variable).
Extract by fixed positions
# Process command line
input=$1
# Extract by position: ${var:offset:length}
dd=${input:0:2}
mm=${input:3:2}
yyyy=${input:6:4}
echo "dd=$dd, mm=$mm, yyyy=$yyyy"
Extract by splitting (will work for cases like 1/2/2019)
input=$1
# extract first segment (DD)
dd=${input%%/*}
# Strip first segment from input
input=${input#*/}
# Extract second segment (MM)
mm=${input%%/*}
input=${input#*/}
yyyy=${input%/*}
echo "dd=$dd, mm=$mm, yyyy=$yyyy"
You probably want to set IFS for read, though the question isn't totally clear.
IFS=/ read dd mm yyyy <<< "31/12/2019"
echo "$yyyy-$mm-$dd" # -> 2019-12-31
Thanks to all of you, your answers were very helpful! I've solved my problem, sorry for not answering earlier. I did this
date=$1
day=$2
dd=$(echo $date | awk -F / '{print $1}')
mm=$(echo $date | awk -F / '{print $2}')
yyyy=$(echo $date | awk -F / '{print $3}')
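None of the approaches above checks that the argument really looks like a date. If that matters, Bash's regex match can validate and split in one step. A sketch, with parse_date as a hypothetical function name; it only checks the shape of the input, not whether the day and month are in range:

```shell
#!/bin/bash
# Hypothetical helper: validate d/m/yyyy and set dd, mm, yyyy on success.
parse_date() {
    local re='^([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})$'
    [[ $1 =~ $re ]] || return 1
    # BASH_REMATCH[n] holds the text matched by the n-th parenthesized group.
    dd=${BASH_REMATCH[1]}
    mm=${BASH_REMATCH[2]}
    yyyy=${BASH_REMATCH[3]}
}

if parse_date "31/12/2019"; then
    echo "$yyyy-$mm-$dd"
else
    echo "not a dd/mm/yyyy date" >&2
fi
```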

How can I generate multiple counts from a file without re-reading it multiple times?

I have large files of HTTP access logs and I'm trying to generate hourly counts for a specific query string. Obviously, the correct solution is to dump everything into splunk or graylog or something, but I can't set all that up at the moment for this one-time deal.
The quick-and-dirty is:
for hour in 0{0..9} {10..23}
do
grep $QUERY $FILE | egrep -c "^\S* $hour:"
# or, alternately
# egrep -c "^\S* $hour:.*$QUERY" $FILE
# not sure which one's better
done
But these files average 15-20M lines, and I really don't want to parse through each file 24 times. It would be far more efficient to parse the file and count each instance of $hour in one go. Is there any way to accomplish this?
You can ask grep to output the matching part of each line with -o and then use uniq -c to count the results:
grep "$QUERY" "$FILE" | grep -o "^\S* [0-2][0-9]:" | sed 's/^\S* //' | uniq -c
The sed command is there to keep only the two digit hour and the colon, which you can also remove with another sed expression if you want.
Caveats: this solution works with GNU grep and GNU sed, and will produce no output, rather than "0", for hours with no log entries. Kudos to @EdMorton for pointing these issues out in the comments, and other issues that were fixed in the answer above.
Assuming the timestamp appears with a space before the 2-digit hour, then a colon after:
gawk -v patt="$QUERY" '
$0 ~ patt && match($0, / ([0-9][0-9]):/, m) {
print > (m[1] "." FILENAME)
}
' "$FILE"
This will create 24 files.
Requires GNU awk for the 3-arg form of match()
This is probably what you really need, using GNU awk for the 3rd arg to match() and making assumptions about what your input might look like, what your QUERY variable might contain, and what the output should look like:
awk -v query="$QUERY" '
match($0, " ([0-9][0-9]):.*"query, a) { cnt[a[1]+0]++ }
END {
for (hr=0; hr<=23; hr++) {
printf "%02d = %d\n", hr, cnt[hr]
}
}
' "$FILE"
By the way, don't use all upper case for non-exported shell variables; see Correct Bash and shell script variable capitalization.
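If GNU awk is not available, the same single-pass count can be done with the two-argument match() and RSTART, which POSIX awk provides. This is a sketch under assumptions: count_by_hour is a hypothetical name, the sample log lines are made up, and the query is matched as a literal substring anywhere on the line (not as a regex, and not necessarily after the hour).

```shell
#!/bin/bash
# Hypothetical helper: count lines containing a fixed string, grouped by hour.
# Assumes each line looks like "<field> HH:MM:SS ...", as the question's regex implies.
# Reads the named files, or stdin when no file is given.
count_by_hour() {
    local query=$1
    shift
    # index() is a literal substring test, so regex metacharacters in the
    # query are harmless. Note that awk -v still interprets backslash escapes.
    awk -v query="$query" '
        index($0, query) && match($0, / [0-9][0-9]:/) {
            # RSTART points at the space; the hour is the next two characters.
            cnt[substr($0, RSTART + 1, 2) + 0]++
        }
        END {
            for (hr = 0; hr <= 23; hr++)
                printf "%02d = %d\n", hr, cnt[hr]
        }
    ' "$@"
}

# Made-up sample input, fed on stdin.
count_by_hour 'q=foo' <<'EOF'
2019-01-01 00:05:01 GET /search?q=foo
2019-01-01 00:17:44 GET /other
2019-01-01 13:02:10 GET /search?q=foo
2019-01-01 13:59:59 GET /search?q=foo
EOF
```

Every hour gets a line, including hours with a count of zero, and the file is read only once.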

convert date string with hour and minutes to seconds in bash

I have a sample string that contains YYYYMMDDHHMM format.
I am trying to convert it into seconds but getting below error.
[vagrant@CentOS-Seven ~]$ export SAMPLE_DATE=201812051147
[vagrant@CentOS-Seven ~]$ echo $SAMPLE_DATE
201812051147
[vagrant@CentOS-Seven ~]$ date -d $SAMPLE_DATE
date: invalid date ‘201812051147’
[vagrant@CentOS-Seven ~]$
According to this answer, date has no option to specify the input format. Therefore you have to convert your input into a format that date accepts. The sed command below converts 201812051147 to 2018-12-05 11:47.
To convert a date to seconds, use the output format +%s.
$ input=201812051147
$ date -d "$(sed -E 's/(....)(..)(..)(..)(..)/\1-\2-\3 \4:\5/' <<< "$input")" +%s
1544010420
Please note that the output depends on your system's time zone. You can change the time zone by setting the TZ environment variable.
By the way: you don't have to export your variable in this case. Also, the convention for naming variables in bash is to use lowercase letters. By convention only special variables are written in all caps, so using all-caps names could lead to name collisions with those variables.
Apparently separating date and time parts with a space is enough, so:
$ echo 201812051147 |
sed -E 's/(.{8})(.{4})/\1 \2/' |
date -f - +"%s"
Output:
1544003220
You can use the "touch" command to assign that date to a file and then print it using "date", try this:
touch -t 201812051147 tmpfile ; date -r tmpfile +%s ; rm -f tmpfile
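The reshaping can also be done with Bash substring expansion instead of sed. A sketch, assuming GNU date; to_epoch is a hypothetical name, and -u makes date read the input as UTC so the result does not depend on the machine's time zone:

```shell
#!/bin/bash
# Hypothetical helper: convert YYYYMMDDHHMM to seconds since the epoch.
# Assumes GNU date; -u interprets the timestamp as UTC.
to_epoch() {
    local s=$1
    # Accept exactly twelve digits, nothing else.
    [[ $s =~ ^[0-9]{12}$ ]] || return 1
    # ${s:offset:length} slices the string into "YYYY-MM-DD HH:MM".
    date -u -d "${s:0:4}-${s:4:2}-${s:6:2} ${s:8:2}:${s:10:2}" +%s
}

to_epoch 201812051147
```

Interpreted as UTC, 2018-12-05 11:47 is 1544010420. The 7200-second gap between the two outputs quoted above would be consistent with those commands having been run in time zones two hours apart.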

Add file date to file name in bash

I'm looking for a programmatic way to add a file's date to the filename. I'm on a Mac, Yosemite (10.10).
Using Bash, I have put a fair amount of effort into this, but just too new to get there so far. Here's what I have so far:
#!/bin/bash
(IFS='
'
for x in `ls -l | awk '{print$9" "$7"-"$6"-"$9}'`
do
currentfilename=$(expr "$x" : '\($substring\)')
filenamewithdate=$(expr "$x" : '.*\($substring\)')
echo "$currentfilename"
echo "$filenamewithdate"
done)
The idea here is to capture detailed ls output, use awk to pick out the columns with the filename ($9) and the date fields ($7 and $6), then loop over that output to get the current filename and the new filename with the date, so the file can be moved from one to the other with mv. The awk statement adds a space to separate the current filename from the new one. The echo statements are there for now to test whether I am able to parse the awk output. I just don't know what to use for $substring to get the parts of the string that are needed.
I have much more to learn about Bash scripting, so I hope you'll bear with me as I learn. Any guidance?
Thanks.
Looking at the stat man page, you'd want:
for file in *; do
filedate=$(stat -t '%Y-%m-%dT%H:%M:%S' -f '%Sm' "$file")
newfile="$file-$filedate"
echo "current: $file -> new: $newfile"
done
Adjust the datetime format to taste.
You could save a line with
for file in *; do
newfile=$(stat -t '%Y-%m-%dT%H:%M:%S' -f '%N-%Sm' "$file")
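The stat options above are the BSD/macOS ones; GNU stat on Linux uses different flags. A sketch of a helper that works on both and only prints the mv commands, so nothing is renamed until the output looks right. file_date is a hypothetical name, and probing stat --version to detect GNU coreutils is an assumption of this sketch:

```shell
#!/bin/bash
# Hypothetical helper: print a file's modification date as YYYY-MM-DD.
file_date() {
    if stat --version >/dev/null 2>&1; then
        # GNU coreutils (typical Linux): date -r reads the file's mtime.
        date -r "$1" +'%Y-%m-%d'
    else
        # BSD/macOS stat: %Sm is the mtime as a string, formatted by -t.
        stat -f '%Sm' -t '%Y-%m-%d' "$1"
    fi
}

# Dry run: print the mv commands instead of executing them.
for file in *; do
    [ -f "$file" ] || continue    # skip directories and an unmatched glob
    echo mv -- "$file" "$file-$(file_date "$file")"
done
```

Removing the echo turns the dry run into the actual rename.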

Extract part of file name with multiple sections

I am trying to extract part of a file name to compare with other file names, as it is the only part that does not change. Here is the pattern and an example:
clearinghouse.doctype.payer.transID.processID.date.time
EMDEON.270.60054.1234567890123456789.70949996.20120925.014606403
All sections have a fixed length, except clearinghouse and doctype, which can vary in length.
The part of the filename that I need for comparison is the transID.
What would be the cleanest, shortest way to do this in a shell script?
Thanks
There are lots of ways to do this; the easiest tool for simple tasks is the cut command. Tell cut which character to use as a delimiter and which fields to print. Here is the command that does what you want:
file=EMDEON.270.60054.1234567890123456789.70949996.20120925.014606403
transitId=$(echo $file | cut -d. -f4)
Awk can do the same thing, and allows you to do much more complicated logic as well.
file=EMDEON.270.60054.1234567890123456789.70949996.20120925.014606403
transitId=$(echo $file | awk -F. '{print $4}')
You can split the filename apart using the read command with an appropriate value for IFS.
filename="EMDEON.270.60054.1234567890123456789.70949996.20120925.014606403"
IFS="." read clHouse doctype payer transID procID dt tm <<< "$filename"
echo $transID
Since you only want the transaction ID, it's overkill to assign every part to a specific variable. Use a single dummy variable for the other fields:
# You only need one variable after transID to swallow the rest of the input without
# splitting it up.
IFS="." read _ _ _ transID _ <<< "$filename"
or just read each part into a single array and access the proper element:
IFS="." read -a parts <<< "$filename"
transID="${parts[3]}"
You can do this with a parameter expansion:
$ foo=EMDEON.270.60054.1234567890123456789.70949996.20120925.014606403
$ bar=${foo%.[0-9]*.[0-9]*.[0-9]*}
$ echo "${bar##*.}"
1234567890123456789
tranid=$(echo "$file_name" | perl -F'\.' -ane 'print $F[3]')
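Since the goal is to compare file names by transID, the array approach can be wrapped in a small function. A sketch: trans_id is a hypothetical name, the second file name is invented for illustration, and the check for exactly seven fields assumes that no section contains a dot of its own.

```shell
#!/bin/bash
# Hypothetical helper: print the transID (4th dot-separated field) of a name
# shaped like clearinghouse.doctype.payer.transID.processID.date.time.
trans_id() {
    local -a parts
    IFS=. read -r -a parts <<< "$1"
    # Reject names that do not have exactly seven sections.
    (( ${#parts[@]} == 7 )) || return 1
    printf '%s\n' "${parts[3]}"
}

a=EMDEON.270.60054.1234567890123456789.70949996.20120925.014606403
b=OTHER.271.60054.1234567890123456789.70950001.20120926.020000000   # made-up name

if [[ $(trans_id "$a") == "$(trans_id "$b")" ]]; then
    echo "same transaction"
fi
```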

Resources