The file has the following data, and I want to remove the 'T' and the '-07:00' from the timestamp field:
407358186|2014-05-16T08:14:00-07:00|993827047
407358186|2014-05-15T08:58:00-07:00|993335621
407358186|2014-05-13T06:13:00-07:00|992181538
407358186|2014-05-11T19:58:00-07:00|991523532
Expected output:
407358186|2014-05-16 08:14:00|993827047
407358186|2014-05-15 08:58:00|993335621
407358186|2014-05-13 06:13:00|992181538
407358186|2014-05-11 19:58:00|991523532
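You can use sed:
sed -e 's/T/ /' -e 's/-07:00//' file
The first expression replaces the first T on each line (the separator between date and time) with a space, and the second deletes the -07:00 offset. Running a single sed with two expressions avoids the extra cat and the second pipe.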
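If the offset were not always -07:00 and you still only wanted to drop it, a slightly more general pattern can remove any +HH:MM or -HH:MM suffix from the timestamp field. This is a minimal sketch, assuming a sed that supports -E (GNU and BSD sed both do) and the three-field pipe-delimited layout shown above; strip_offset is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# strip_offset (hypothetical helper): reads pipe-delimited lines on stdin,
# turns the ISO 8601 'T' separator into a space and drops any +HH:MM/-HH:MM
# offset. Only a timestamp sitting between two '|' characters is touched.
strip_offset() {
  sed -E 's/[|]([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})[+-][0-9]{2}:[0-9]{2}[|]/|\1 \2|/'
}

printf '%s\n' '407358186|2014-05-16T08:14:00-07:00|993827047' | strip_offset
```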
Since you're removing the -07:00, I'm assuming you are not interested in any timezone-sensitive data; if you were, you wouldn't remove it. That is why I feel justified in hardcoding '-07:00' into the answer.
Handling timestamps from other timezones would require somewhat more involved code.
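If you did want to keep the actual instant in time rather than just drop the offset, one option is to let date normalize each timestamp to UTC. This is a sketch, assuming GNU coreutils date (its -d option parses ISO 8601 strings with a numeric offset) and the three-field layout above; to_utc is a hypothetical helper name. It starts one date process per line, so it will be slow on large files:

```shell
#!/usr/bin/env bash
# to_utc (hypothetical helper): converts the timestamp in field 2 to UTC,
# keeping fields 1 and 3 unchanged. Requires GNU date for -d.
to_utc() {
  local id ts rest
  while IFS='|' read -r id ts rest; do
    printf '%s|%s|%s\n' "$id" "$(date -u -d "$ts" +'%Y-%m-%d %H:%M:%S')" "$rest"
  done
}

printf '%s\n' '407358186|2014-05-11T19:58:00-07:00|991523532' | to_utc
```

Note that converting to UTC can move a timestamp to the next calendar day, as it does for this evening entry.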
I have a file with numerical data, and I want to extract the correct string using variables read from another file.
I already have the code that reads in the variables.
The problem is that the variable can occur at different positions within the string; I only want the string that has the variable on the right-hand side, i.e. in the last 8 characters.
e.g.
grep 0335439 foobar.txt
00032394850033543984
00043245845003354390
00060224460033543907
00047444423700335439
In this case it's the last line.
I have tried to write something using ${str: -8}, but then I lose the data in front.
I found this command:
grep -Eo '^.{12}(0335439)' foobar.txt
This works; however, when I use it in my script with a variable in place of the number, it doesn't: grep -Eo '^.{12}($string)' foobar.txt
I have tried without brackets but it still does not work.
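For reference, the likely reason the variable version fails is that the shell does not expand $string inside single quotes, so grep searches for the literal text $string. This is a sketch of one way around it, assuming string holds only digits (no regex metacharacters): use double quotes, and anchor the pattern with $ so that only lines ending in the value match. The sample data is copied from the grep output above, and the temporary file stands in for foobar.txt:

```shell
#!/usr/bin/env bash
string=0335439

# Sample lines from the question, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' 00032394850033543984 00043245845003354390 \
  00060224460033543907 00047444423700335439 > "$tmp"

# Double quotes let the shell expand $string; \$ becomes a literal $,
# which grep treats as the end-of-line anchor.
result=$(grep -E "${string}\$" "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```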
Update:
In this case the string is always 20 characters long, so counting from the left-hand side is OK for me, but you are correct that it was not the answer to the original question. I tried to say this with code in a comment, but pasting it into the comment box removed the formatting.
Since you only want the lines that have the variable on the right-hand side, i.e. in the last 8 characters, a non-regex approach using awk is better suited for this job:
s='00335439'
awk -v n=8 -v kw="$s" 'substr($0, length($0)-n+1) == kw' file
00047444423700335439
Here we pass n=8 to awk, and substr($0, length($0)-n+1) returns the last n characters of the line (awk string positions start at 1, hence the +1). These are then compared against the variable kw, which is set from $s on the command line.
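For comparison, the ${str: -8} idea from the question also works in pure Bash, as long as you only test the suffix and still print the whole line. This is a minimal sketch; last8_match is a hypothetical helper name, and the sample lines are the ones from the question:

```shell
#!/usr/bin/env bash
# last8_match (hypothetical helper): prints the lines from stdin whose
# last 8 characters equal $1.
last8_match() {
  local line
  while IFS= read -r line; do
    # The space before -8 matters: ${line:-8} would be a default-value expansion.
    [[ ${line: -8} == "$1" ]] && printf '%s\n' "$line"
  done
  return 0
}

s='00335439'
printf '%s\n' 00032394850033543984 00043245845003354390 \
  00060224460033543907 00047444423700335439 | last8_match "$s"
```

Usage with a real file: last8_match "$s" < foobar.txt. For large files the awk version will be considerably faster than a shell read loop.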
date +'%A %B %d' | sed -e 's/\(^\|[^[:digit:]]\+\)0\+\([[:digit:]]\)/\1\2/g'
I like the output of the above command, which strips leading zeroes off days of the month produced by the date command, in the case of numerals less than 10. It's the only way I've thus far found of producing single digit dates from the date command's output for the day of the month, which otherwise would be 01, 02, 03, etc.
A couple of questions in this regard. Is there a more elegant way of accomplishing the stated goal of stripping off zeroes? I do know about date's %e switch and would like to use it, but with numerals 10 and greater it has the undesirable effect of losing the space between the month name and the date (so, July 2 but July10).
The second question regards the larger intended goal of arriving at such an incantation. I'm putting together a script that will scrape some data from a web page. The best way of locating the target data on the page is by searching on the current date. But the site uses only single digits for the first 9 days of the month, thus the need to strip off leading zeroes. So what's the best way of getting this complex command into a variable so I can call it within my script? Would a variable within a variable be called for here?
RESOLUTION
I'll sort of answer my own question here, though it was really input from Renaud Pacalet (below) that enabled me to resolve the matter. His input revealed that I had not understood the man page very well, particularly the part where it says "date pads numeric fields with zeroes," and, below that, "- (hyphen) do not pad the field." Had I understood those statements better, I would have realized that there is no need for the complex sed line through which I piped the date output in the title of this posting: had I used %-d instead of just %d, there would have been no leading zeroes in front of numerals less than 10, and so no need to call sed (or tr, as suggested below by LMC) to strip them off. In light of that, the answer to the second question, about putting that incantation into a variable, becomes elementary: var=$(date +'%A %B %-d') is all that is needed.
I may go ahead and mark Renaud Pacalet's response as the solution since, even though I did not implement all of his suggestions in the latest incarnation of my script, it proved crucial in clarifying key requirements of the task.
If your date utility supports it (the one from GNU coreutils does) you can use:
date +'%A %B %-d'
The - tells date to not pad the numeric field. Demo:
$ date -d"2021/07/01" +'%A %B %-d'
Thursday July 1
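Where a date implementation does not support the - flag, one portable fallback is to keep %d and strip a single leading zero with shell parameter expansion. This is a sketch; format_day is a hypothetical helper name, and it assumes the weekday and month names contain no spaces (true for English locales):

```shell
#!/usr/bin/env bash
# format_day (hypothetical helper): takes "Weekday Month DD" and removes
# one leading zero from the day, so "01" becomes "1" but "10" is unchanged.
format_day() {
  local weekday month day
  read -r weekday month day <<< "$1"
  printf '%s %s %s\n' "$weekday" "$month" "${day#0}"
}

# A single date call, so weekday, month and day always come from the same instant.
today=$(format_day "$(date +'%A %B %d')")
echo "$today"
```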
Not sure I understand your second question, but if you want to pass this command to a shell script (I do not really understand why you would do that), you can use the eval shell command:
$ cat foo.sh
#!/usr/bin/env bash
foo="$(eval "$1")"
echo "$foo"
$ ./foo.sh 'date -d"2021/07/01" +"%A %B %-d"'
Thursday July 1
Please pay attention to how the double (") and single (') quotes are used. And of course, you will have to add to this example script whatever is needed to handle errors, prevent misuse, and so on.
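As for keeping the command "in a variable": in Bash, a function is usually a simpler and safer container for a command than a string passed to eval, because no quoting has to survive a second round of parsing. This is a minimal sketch, assuming GNU date (for -d and %-d); today_label is a hypothetical name:

```shell
#!/usr/bin/env bash
# today_label (hypothetical helper): prints a date such as "Thursday July 1".
# Any extra arguments (e.g. -d 2021-07-01) are passed straight through to date.
today_label() {
  date "$@" +'%A %B %-d'
}

label=$(today_label)   # today's date in the format used by the site
echo "Searching the page for: $label"
```

The script can then call today_label wherever it needs the search string, or capture it once as shown.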
Note that many text-search utilities support one form or another of extended regular expressions, so tolerating these leading zeros or spaces can be as easy as:
grep -E 'Thursday[[:space:]]+July[[:space:]]+0*1([^0-9]|$)' foo.txt
This would match any line of foo.txt containing
Thursday<1 or more spaces>July<1 or more spaces><0 or more zeros>1<not followed by another digit>
I'm trying to extract an attribute value from an HTML node that I already have in a variable.
I'm currently using Zsh but I'm trying to make it work in Bash as well.
The current variable has the value:
<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>
and I would like to get the value of data-count (in this case 0, but could be any length integer).
I have tried using cut, sed and variable expansion as explained in this question, but I haven't managed to adapt the regexes, or maybe it has to be done differently for Zsh.
There is no reason why sed would not work in this situation. For your specific case, I would do something like this:
sed 's/.*data-count="\([0-9]*\)".*/\1/g' file_name.txt
Basically, sed looks for a pattern that contains data-count=, saves the digits inside the escaped parentheses \(...\) into \1, and then prints \1 in place of the match (which is the full line, thanks to the surrounding .*).
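Since the HTML is already in a variable rather than a file, the same sed command can read it from a here-string. Alternatively, plain parameter expansion, which behaves the same in Bash and Zsh, avoids starting sed at all. This is a sketch, assuming data-count appears once and its value is double-quoted:

```shell
#!/usr/bin/env bash
html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'

# 1) The sed command from above, fed from the variable with a here-string.
count_sed=$(sed 's/.*data-count="\([0-9]*\)".*/\1/' <<< "$html")

# 2) No external command: drop everything up to and including data-count=" ...
tmp=${html#*data-count=\"}
# ... then drop everything from the next double quote onwards.
count_pe=${tmp%%\"*}

echo "$count_sed $count_pe"
```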
Could you please try the following?
awk 'match($0,/data-count="[^"]*"/){print substr($0,RSTART+12,RLENGTH-13)}' Input_file
Explanation: awk's match function looks for data-count=" followed by everything up to the closing double quote. If a match is found, the built-in variables RSTART and RLENGTH are set to its start position and length, and substr() then prints only the part between the quotes (skipping the 12 characters of data-count=" and dropping the closing quote).
With sed, you could try the following:
sed 's/.*data-count="\([^"]*\).*/\1/' Input_file
Explanation: this uses sed's back-references. The regex captures everything after data-count=" up to the next double quote into the first group. Because the s (substitution) command matches the whole line (thanks to the leading and trailing .*), replacing it with \1 leaves only the captured value.
As was said before, to be on the safe side and handle any syntactically valid HTML tag, a proper HTML parser is strongly advised. But if you know in advance what the general format of your HTML element will look like, the following Zsh hack might come in handy:
Assume that your variable is called "html"
html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'
First adapt it a bit:
htmlx="tag ${html%??}"
This will add the string tag in front and remove the final />
Now make an associative array:
declare -A fields
fields=( ${=$(tr = ' ' <<<$htmlx)} )
The tr turns the equal sign into a space and the ${= handles word splitting. You can now access the values of your attributes by, say,
echo $fields[data-count]
Note that this still has the surrounding double quotes. You can easily remove them with
echo ${${fields[data-count]%?}#?}
Of course, once you do this hack, you have access to all attributes in the same way.
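The ${=...} splitting and the $fields[...] subscript syntax above are Zsh-specific. For Bash, a comparable sketch fills an associative array using the =~ operator and the BASH_REMATCH array, assuming every attribute is written as name="value" with no embedded double quotes:

```shell
#!/usr/bin/env bash
html='<span class="alter" fill="#ffedf0" data-count="0" data-more="none"/>'

declare -A fields
re='([[:alnum:]_-]+)="([^"]*)"'   # one name="value" pair
rest=$html
# Match the leftmost pair, store it, then cut the string just past that pair.
while [[ $rest =~ $re ]]; do
  fields[${BASH_REMATCH[1]}]=${BASH_REMATCH[2]}
  rest=${rest#*"${BASH_REMATCH[0]}"}
done

echo "${fields[data-count]}"
```

Unlike the Zsh version, the stored values do not include the surrounding quotes.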
I've inherited a Laravel system with a single large log file that is currently around 17 GB in size. I'm now rotating future log files monthly; however, I need to split the existing log by month.
The date is formatted as yyyy-mm-dd hh:mm:ss ("[2018-06-28 13:32:05]"). Does anybody know how I could perform the split using only Bash scripting (e.g. using awk, sed, etc.)?
The input file name is laravel.log. I'd like output files to have format such as laravel-2018-06.log.
Help much appreciated.
Since the information you provide is a bit sparse, I will go with the following assumptions:
each log entry is a single line;
somewhere on each line there is always a string of the form [yyyy-mm-dd hh:mm:ss]; if there are several, we take the first;
your log file is sorted chronologically.
The regex that matches your date (written as an awk string, hence the doubled backslashes) is:
\\[[0-9]{4}(-[0-9]{2}){2} ([0-9]{2}:){2}[0-9]{2}\\]
or a bit less strict
\\[[-:0-9 ]{19}\\]
So we can use this in combination with match(s, ere) to get the desired string (interval expressions such as {4} need an awk that supports them, e.g. GNU awk 4.0 or later):
awk 'BEGIN { ere = "\\[[0-9]{4}(-[0-9]{2}){2} ([0-9]{2}:){2}[0-9]{2}\\]" }
     # lines without a timestamp (e.g. stack traces) go to the current month
     match($0, ere) { fname = "laravel-" substr($0, RSTART+1, 7) ".log" }
     fname != oname { close(oname); oname = fname }
     { print > oname }' laravel.log
Since your file is rather large, you might want to test this first on a subset that covers a couple of months:
$ head -n 10000 laravel.log > laravel.head.log
$ awk '{...}' laravel.head.log    # the awk program above
$ md5sum laravel.head.log
$ cat laravel-*.log | md5sum
If the two checksums do not match, something went wrong.
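Outside of an awk string, single backslashes are enough, which makes it easy to check the regex against a real log line before running the split. This is a sketch using GNU grep -E; the sample log line is made up, and month_of is a hypothetical helper that prints the yyyy-mm part of the first timestamp it finds:

```shell
#!/usr/bin/env bash
# month_of (hypothetical helper): prints yyyy-mm from the first
# [yyyy-mm-dd hh:mm:ss] timestamp found on stdin.
month_of() {
  grep -Eo '\[[0-9]{4}(-[0-9]{2}){2} ([0-9]{2}:){2}[0-9]{2}\]' | head -n 1 | cut -c2-8
}

printf '%s\n' '[2018-06-28 13:32:05] local.ERROR: example message' | month_of
```

The cut -c2-8 mirrors substr($0, RSTART+1, 7) in the awk program: it skips the opening bracket and keeps the seven characters of yyyy-mm.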
I was sent a large list of URLs in an Excel spreadsheet, each unique according to a certain GET variable in the string (whose value is a number 5 to 7 digits long). I have to run some queries on our databases based on those numbers, and I don't want to go through the hundreds of entries weeding out the numbers one by one. What Bash commands can be used to parse out the number from each line (it's the only number in each line) and consolidate them into one line with all the numbers, comma-separated?
A sample (shortened) listing of the CSV spreadsheet includes:
http://www.domain.com/view.php?fDocumentId=123456
http://www.domain.com/view.php?fDocumentId=223456
http://www.domain.com/view.php?fDocumentId=323456
http://www.domain.com/view.php?fDocumentId=423456
DocumentId=523456
DocumentId=623456
DocumentId=723456
DocumentId=823456
....
...
The change of format was intentional, as they decided to simply reduce it down to the variable name and value after a few rows. The change of the get variable from fDocumentId to just DocumentId was also intentional. Ideal output would look similar to:
123456,223456,323456,423456,523456,623456,723456,823456
EDIT: My apologies, I did not notice that halfway through the list they decided to get froggy and change things around. When saved as CSV, certain rows appear as:
"DocumentId=098765 COMMENT, COMMENT"
DocumentId=898765 COMMENT
DocumentId=798765- COMMENT
"DocumentId=698765- COMMENT, COMMENT"
There are several other entries that look similar to the rows above. Each COMMENT stands for a single string of upper-case characters, no longer than 3 characters.
Assuming the variable is always on its own and last on the line, how about just taking whatever is to the right of the =?
sed -r "s/.*=([0-9]+)$/\1/" testdata | paste -sd","
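EDIT: OK, with the new information, you'll have to edit the regex a bit:
sed -nr 's/.*DocumentId=([0-9]+).*/\1/p' testdata | paste -sd","
Here the digits after DocumentId (or fDocumentId, since the leading .* also swallows the f) are captured, and -n combined with the p flag prints only the lines where the substitution succeeded, so stray lines such as headers don't end up in the list. Works for the data you've presented so far, at least.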
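If you'd rather stay entirely inside Bash, the =~ operator and the BASH_REMATCH array can do the same capture without sed or paste. This is a sketch under the same assumption as the sed version (the first run of digits after DocumentId= is the ID); extract_ids is a hypothetical name:

```shell
#!/usr/bin/env bash
# extract_ids (hypothetical helper): reads lines on stdin, keeps the digits
# after DocumentId= (this also matches fDocumentId=) and prints them
# comma-separated on one line. Lines without an ID are skipped.
extract_ids() {
  local line
  local re='DocumentId=([0-9]+)'
  local ids=()
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      ids+=("${BASH_REMATCH[1]}")
    fi
  done
  # "${ids[*]}" joins the elements with the first character of IFS.
  local IFS=,
  printf '%s\n' "${ids[*]}"
}
```

Usage: extract_ids < testdata. Leading zeros such as in 098765 are kept, because the IDs are handled as strings.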
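Simpler than this :)
cut -d "=" -f 2 file.csv | paste -sd ","
This handles the original format, where the number is the last thing on each line; rows with trailing comments (as in the edit) would keep their comment text.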
If you're not completely committed to bash, the Swiss Army Chainsaw will help:
perl -lne 's/.*=//; s/ .*//; s/-//; push @ids, $_; END { print join ",", @ids }' < YOUR_ORIGINAL_FILE
That cuts everything up to and including an =, then everything after a space, then removes any dashes; the -l switch strips the newlines, and the END block joins the collected numbers with commas, so there is no trailing comma. Run on the sample rows above, it returns
123456,223456,323456,423456,523456,623456,723456,823456,098765,898765,798765,698765