StackOverflow.
I have a collection of notes from work. I keep them as markdown files, and had been naming them with the date, month and day first, then the year - for example, today's is titled 06132017.md
I am coming up on a year at work, so I have quite a few of these files. I wish to change the naming convention from month/day first to year first, so that I can sort them alphabetically and easily find dates I need.
So 06132017.md would become 20170613.md - this would keep 2016 and 2017 from mixing in alpha order. Is there a command I can run on a folder to do this?
If you have the Perl rename utility, it's rather simple to do:
$ prename 's/^(....)(....)(\.md)$/$2$1$3/' *.md
06132017.md renamed as 20170613.md
The dots match any character, the parentheses group, and $N on the replacement side inserts the characters captured in the Nth group.
Or just in Bash:
$ for x in ????????.md ; do mv -v "$x" "${x:4:4}${x:0:4}.md" ; done
'06132017.md' -> '20170613.md'
${var:n:m} takes a substring of length m, starting at position n from variable var.
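As a rough sketch of the same substring idea, the swap can be wrapped in a small function and tried as a dry run first. The function name to_year_first is made up for illustration, and the loop assumes the files sit in the current directory and are named with exactly eight digits plus .md.

```shell
#!/usr/bin/env bash
# to_year_first NAME: turn MMDDYYYY.md into YYYYMMDD.md (prints the new name).
# The function name is hypothetical; it is not part of any tool.
to_year_first() {
    local name=$1
    local stem=${name%.md}               # strip the extension, e.g. 06132017
    # ${stem:4:4} is the year, ${stem:0:4} is month plus day
    printf '%s%s.md\n' "${stem:4:4}" "${stem:0:4}"
}

# Dry run: print the planned renames without touching any file.
for f in [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].md; do
    [ -e "$f" ] || continue              # the glob matched nothing
    echo mv -v -- "$f" "$(to_year_first "$f")"
done
```

Once the printed commands look right, dropping the echo performs the renames. Running it a second time would swap the halves again, so it should only be run once per folder.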
I'm looking for a bit of help here. I'm a complete newbie!
I need to look in a file for a code matching the pattern A00000_00_A and append a count to it, so the first time it appears it is replaced with A00000_00_A_001, second time A00000_00_A_002 etc. The output needs to be written back to the same file. Each file only contains 1 code, but it appears multiple times.
After some digging I have found-
perl -pi -e 's/Q\d{4,5}'_'\d{2}_./$&.'_'.++$A /ge' /users/documents/*.xml
but the issue is the counter does not reset in each file.
That is, the output of the first file is say Q00390_01_A_1 to Q00390_01_A_7, while the second file is Q00391_01_A_8 to Q00391_01_A_10.
What I want is Q00390_01_A_1 to Q00390_01_A_7 in the first file and Q00391_01_A_1 to Q00391_01_A_2 in the second.
Does anyone have any idea on how to edit the above code to make it do that? I'm a total newbie so ideally an edit to what I have would be brilliant. Thanks
cd /users/documents/
for f in *.xml;do
perl -pi -e 's/facs=.(Q|M)\d{4,5}_\d{2}_\w/$& . "_" . sprintf("%04d", ++$A)/ge' "$f"
done
This matches the string facs= and any character, then "Q" or "M" followed by either four or five digits, then an underscore, then two digits, another underscore, and a word character. The entire match is then concatenated with an underscore and the value of $A zero padded to four digits.
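If a single perl call over all files is preferred, the counter can also be reset by perl itself. This is only a sketch: the function name number_codes and the variable $n are made up, the pattern is a guess at the A00000_00_A shape from the question, and %03d is used to get the _001 style asked for. It relies on eof (without parentheses) being true on the last line of each input file.

```shell
#!/usr/bin/env bash
# number_codes FILE...: append _001, _002, ... to every code, in place,
# restarting the count for each file. Name and pattern are assumptions.
number_codes() {
    # s///ge evaluates the replacement as Perl code for every match;
    # "$n = 0 if eof" resets the counter after the last line of each file.
    perl -pi -e 's/[A-Z]\d{4,5}_\d{2}_[A-Z]/$& . "_" . sprintf("%03d", ++$n)/ge; $n = 0 if eof' "$@"
}
```

It would be called as number_codes /users/documents/*.xml. Since -i rewrites the files, trying it on copies first is a good idea.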
date +'%A %B %d' | sed -e 's/\(^\|[^[:digit:]]\+\)0\+\([[:digit:]]\)/\1\2/g'
I like the output of the above command, which strips leading zeroes off days of the month produced by the date command, in the case of numerals less than 10. It's the only way I've thus far found of producing single digit dates from the date command's output for the day of the month, which otherwise would be 01, 02, 03, etc.
A couple of questions in this regard. Is there a more elegant way of accomplishing the stated goal of stripping off zeroes? I do know about date's %e switch and would like to use it, but with numerals 10 and greater it has the undesirable effect of losing the space between the month name and the date (so, July 2 but July10).
The second question regards the larger intended goal of arriving at such an incantation. I'm putting together a script that will scrape some data from a web page. The best way of locating the target data on the page is by searching on the current date. But the site uses only single digits for the first 9 days of the month, thus the need to strip off leading zeroes. So what's the best way of getting this complex command into a variable so I can call it within my script? Would a variable within a variable be called for here?
RESOLUTION
I'll sort of answer my own question here, though it is really input from Renaud Pacalet (below) that enabled me to resolve the matter. His input revealed to me that I had not understood the man page very well, particularly the part where it says "date pads numeric fields with zeroes," and below that where it is written "- (hyphen) do not pad the field." Had I understood those statements better, I would have realized that there is no need for the complex sed line through which I piped the date output at the top of this posting: had I used %-d there instead of just %d, there would have been no leading zeroes in front of numerals less than 10 and so no need to call sed (or tr, as suggested below by LMC) to strip them off. In light of that, the answer to the second question about putting that incantation into a variable becomes elementary: var=$(date +'%A %B %-d') is all that is needed.
I may go ahead and mark Renaud Pacalet's response as the solution since, even though I did not implement all of his suggestions into the latest incarnation of my script, it proved crucial in clarifying key requirements of the task.
If your date utility supports it (the one from GNU coreutils does) you can use:
date +'%A %B %-d'
The - tells date to not pad the numeric field. Demo:
$ date -d"2021/07/01" +'%A %B %-d'
Thursday July 1
Not sure I understand your second question but if you want to pass this command to a shell script (I do not really understand why you would do that), you can use the eval shell command:
$ cat foo.sh
#!/usr/bin/env bash
foo="$(eval "$1")"
echo "$foo"
$ ./foo.sh 'date -d"2021/07/01" +"%A %B %-d"'
Thursday July 1
Please pay attention to the usage of double (") and single (') quotes. And of course, you will have to add to this example script whatever is needed to handle errors, avoid misuse...
Note that many string comparison utilities support one form or another of extended regular expressions. So getting rid of these leading zeros or spaces can be as easy as:
grep -E 'Thursday\s+July\s+0*1' foo.txt
This would match any line of foo.txt containing
Thursday<1 or more spaces>July<1 or more spaces><0 or more zeros>1
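To tie the two ideas together, the search pattern could be generated by date itself. This is a sketch under the assumption that GNU date is available (for -d and %-d); the function name date_pattern is made up. It also adds a check that no further digit follows the day, so that day 1 does not match 10.

```shell
#!/usr/bin/env bash
# date_pattern [DATE]: print an ERE for "Weekday Month Day" that tolerates
# extra spaces and an optional leading zero. Requires GNU date (-d, %-d).
# The function name is hypothetical.
date_pattern() {
    LC_ALL=C date -d "${1:-today}" \
        +'%A[[:space:]]+%B[[:space:]]+0*%-d([^[:digit:]]|$)'
}

pattern=$(date_pattern 2021-07-01)
# Three sample lines; only the ones for July 1 should be printed.
printf '%s\n' 'Thursday July 1' 'Thursday  July 01, 2021' 'Thursday July 10' |
    grep -E "$pattern"
```

Everything in the format string that is not a % sequence is copied through by date unchanged, which is why the bracket expressions end up in the pattern.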
Is there a quick and clever way to remove various timestamps from multiple files with different names? The timestamp format always remains the same, although the values differ. An example of my files would be...
A_BB_CC_20180424_134312
A_B_20180424_002243
AA_CC_DD_E_20180424_223422
C_DD_W_E_D_20180423_000001
with the expected output
A_BB_CC
A_B
AA_CC_DD_E
C_DD_W_E_D
Notice the last file has a different timestamp, I don't mind if this is a day specific timestamp removal or all, or two variations. My problem is I can't think of the code for an ever changing time value :(
Thanks in advance
EDIT - Adding edit in to show why this is not a duplicate as Tripleee thinks. His duplicate link is for files with the same prefix, my question is about files with different names so the answer is different.
Using the %% parameter expansion, which removes the longest matching suffix from the filename:
for i in /my/path/*; do mv "$i" "${i%%_2018*}"; done
This relies on every timestamp starting with 2018.
Using awk:
for i in /my/path/*; do mv "$i" "$(awk -v FS=_ 'NF-=2' OFS="_" <<< "$i")"; done
This awk script uses _ as the field separator. It prints the filename without the last 2 fields, which hold the timestamp.
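A variant that does not depend on the year is to spell out the timestamp shape as a glob. This is a sketch that assumes every timestamp is exactly _YYYYMMDD_HHMMSS at the end of the name; the function name strip_timestamp is made up.

```shell
#!/usr/bin/env bash
# strip_timestamp NAME: remove one trailing _YYYYMMDD_HHMMSS from NAME.
# Names without such a suffix are printed unchanged. Hypothetical helper.
strip_timestamp() {
    # % removes the shortest suffix matching: _ 8 digits _ 6 digits
    printf '%s\n' "${1%_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9]}"
}

# Dry run over the current directory: print the mv commands only.
for f in ./*_[0-9]*_[0-9]*; do
    [ -e "$f" ] || continue                  # the glob matched nothing
    new=$(strip_timestamp "$f")
    [ "$new" = "$f" ] || echo mv -- "$f" "$new"
done
```

If two files differ only in their timestamp they would end up with the same name, so the printed commands are worth checking before the echo is removed.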
In order to rename a set of files and apply regular expressions in the renaming process you can use the rename command.
So in your example:
rename 's#_[0-9]*_[0-9]*##' *_[0-9]*
This renames all files in the current directory ending with _ followed by digits.
It cuts away all _ followed by digits followed by _ followed by digits.
I know this question has been asked, but I can't find more than one solution, and it does not work for me. Essentially, I'm looking for a bash script that will take a file list that looks like this:
image1.jpg
image2.jpg
image3.jpg
And then make a copy of each one, but number it sequentially backwards. So, the sequence would have three new files created, being:
image4.jpg
image5.jpg
image6.jpg
And yet, image4.jpg would have been an untouched copy of image3.jpg, and image5.jpg an untouched copy of image2.jpg, and so on. I have already tried the solution outlined in this stackoverflow question with no luck. I am admittedly not very far down the bash scripting path, and if I take the chunk of code in the first listed answer and make a script, I always get "2: Syntax error: "(" unexpected" over and over. I've tried changing the syntax with the ( around a bit, but no success ever. So, either I am doing something wrong or there's a better script around.
Sorry for not posting this earlier, but the code I'm using is:
image=( image*.jpg )
MAX=${#image[*]}
for i in ${image[*]}
do
num=${i:5:3} # grab the digits
compliment=$(printf '%03d' $(echo $MAX-$num | bc))
ln $i copy_of_image$compliment.jpg
done
And I'm taking this code and pasting it into a file with nano, and adding !#/bin/bash as the first line, then chmod +x script and executing in bash via sh script. Of course, in my test runs, I'm using files appropriately titled image1.jpg - but I was also wondering about a way to apply this script to a directory of jpegs, not necessarily titled image(integer).jpg - in my file keeping structure, most of these are a single word, followed by a number, then .jpg, and it would be nice to not have to rewrite the script for each use.
Perhaps something like this. It will work well for something like script image*.jpg where the wildcard matches a set of files which follow a regular pattern with monotonically increasing numbers of the same length, and less ideally with a less regular subset of the files in the current directory. It simply assumes that the last file's digit index plus one through the total number of file names is the range of digits to loop over.
#!/bin/sh
# Extract number from final file name
eval lastidx=\$$#
tmp=${lastidx#*[!0-9][0-9]}
lastidx=${lastidx#${lastidx%[0-9]$tmp}}
tmp=${lastidx%[0-9][!0-9]*}
lastidx=${lastidx%${lastidx#$tmp[0-9]}}
num=$(expr $lastidx + $#)
width=${#lastidx}
for f; do
pref=${f%%[0-9]*}
suff=${f##*[0-9]}
# Maybe show a warning if pref, suff, or width changed since the previous file
printf "cp '$f' '$pref%0${width}i$suff'\\n" $num
num=$(expr $num - 1)
done |
sh
This is sh-compatible; the expr stuff and the substring extraction up front is ugly but Bourne-compatible. If you are fine with the built-in arithmetic and string manipulation constructs of Bash, converting to that form should be trivial.
(To be explicit, ${var%foo} returns the value of $var with foo trimmed off the end, and ${var#foo} does similar trimming from the beginning of the value. Regular shell wildcard matching operators are available in the expression for what to trim. ${#var} returns the length of the value of $var.)
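A small illustration of those operators on a single value may help. The function name trim_demo and the sample file name are made up; the doubled forms %% and ## remove the longest match instead of the shortest.

```shell
#!/bin/sh
# trim_demo NAME: print four views of NAME, one per line. Hypothetical helper.
trim_demo() {
    name=$1
    printf '%s\n' "${name%.*}"    # shortest ".*" trimmed from the end
    printf '%s\n' "${name%%.*}"   # longest ".*" trimmed from the end
    printf '%s\n' "${name##*.}"   # longest "*." trimmed from the beginning
    printf '%s\n' "${#name}"      # length of the value
}

trim_demo image007.tar.gz
```

For image007.tar.gz this prints image007.tar, image007, gz and 15.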
Maybe your real test data runs from 001 to 300, but here you have image1, image2, image3, so you need to extract one digit, not three, from the filename: num=${i:5:1}
Integer arithmetic can be done in bash without calling bc.
${#image[@]} is more robust than ${#image[*]}, but it shouldn't make a difference here.
I didn't consult a dictionary, but isn't a compliment something for your girlfriend? The opposite is complement, isn't it? :)
The other command made links; to make copies, call cp.
Code:
#!/bin/bash
image=( image*.jpg )
MAX=${#image[@]}
for i in "${image[@]}"
do
num=${i:5:1}
complement=$((2*$MAX-$num+1))
cp "$i" "image$complement.jpg"
done
Most important: If it is bash, call it with bash. Best: do a shebang (as you did), make it executable and call it by ./name . Calling it with sh name will force the wrong interpreter. If you don't make it executable, call it bash name.
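For files that are not literally called image1.jpg, the same arithmetic can be combined with a prefix and suffix split. This is a sketch, assuming bash, that the files are numbered 1 to N, and that each name has a single run of digits; the function name plan_reverse_copies is made up. It only prints the plan and copies nothing.

```shell
#!/usr/bin/env bash
# plan_reverse_copies FILE...: print "source -> target" for each file,
# numbering the copies backwards after the last original. Hypothetical helper.
plan_reverse_copies() {
    local files=( "$@" )          # keep each name as one array element
    local total=${#files[@]}      # number of files
    local f pref suff num
    for f in "${files[@]}"; do    # quoted [@] survives spaces in names
        pref=${f%%[0-9]*}         # text before the first digit
        suff=${f##*[0-9]}         # text after the last digit
        num=${f#"$pref"}; num=${num%"$suff"}
        # 10# forces base 10 so a value like 08 is not read as octal
        printf '%s -> %s%d%s\n' "$f" "$pref" "$(( 2 * total - 10#$num + 1 ))" "$suff"
    done
}
```

Called as plan_reverse_copies image*.jpg with three files, it pairs image1.jpg with image6.jpg, image2.jpg with image5.jpg and image3.jpg with image4.jpg. Because it uses arrays and local, it has to be run with bash, not sh.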
I was sent a large list of URLs in an Excel spreadsheet, each unique according to a certain GET variable in the string (whose value is a number 5 to 7 digits long). I have to run some queries on our databases based on those numbers, and don't want to go through the hundreds of entries weeding out the numbers one by one. What Bash commands can be used to parse the number out of each line (it's the only number in each line) and consolidate the results into one line with all the numbers, comma separated?
A sample (shortened) listing of the CSV spreadsheet includes:
http://www.domain.com/view.php?fDocumentId=123456
http://www.domain.com/view.php?fDocumentId=223456
http://www.domain.com/view.php?fDocumentId=323456
http://www.domain.com/view.php?fDocumentId=423456
DocumentId=523456
DocumentId=623456
DocumentId=723456
DocumentId=823456
....
...
The change of format was intentional, as they decided to simply reduce it down to the variable name and value after a few rows. The change of the get variable from fDocumentId to just DocumentId was also intentional. Ideal output would look similar to:
123456,223456,323456,423456,523456,623456,723456,823456
EDIT: my apologies, I did not notice that halfway through the list they decided to get froggy and change things around. When saved as CSV, certain rows appear as:
"DocumentId=098765 COMMENT, COMMENT"
DocumentId=898765 COMMENT
DocumentId=798765- COMMENT
"DocumentId=698765- COMMENT, COMMENT"
With several other entries that look similar to any of the above rows. COMMENT can be replaced with a single string of (upper-case) characters no longer than 3 characters in length per COMMENT
Assuming the variable is always on its own, and last on the line, how about just taking whatever is on the right of the =?
sed -r "s/.*=([0-9]+)$/\1/" testdata | paste -sd","
EDIT: Ok, with the new information, you'll have to edit the regex a bit:
sed -r "s/.*f?DocumentId=([0-9]+).*/\1/" testdata | paste -sd","
Here anything after DocumentId or fDocumentId will be captured. Works for the data you've presented so far, at least.
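Another way to get the same result is to let grep print only the matching part and then join the lines. This is a sketch, assuming a grep that supports -o (GNU and BSD grep do); the function name extract_ids is made up. Matching on DocumentId= also covers fDocumentId=, and anything after the digits is ignored.

```shell
#!/bin/sh
# extract_ids FILE: print every DocumentId value in FILE as one
# comma-separated line. Hypothetical helper.
extract_ids() {
    grep -oE 'DocumentId=[0-9]+' "$1" |   # keep only the name=value part
        cut -d= -f2 |                      # drop the name, keep the number
        paste -sd, -                       # join all lines with commas
}
```

Usage would be extract_ids file.csv. Leading zeros such as 098765 are kept because the values are never treated as numbers.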
Simpler than this :)
cat file.csv | cut -d "=" -f 2 | xargs
If you're not completely committed to bash, the Swiss Army Chainsaw will help:
perl -ne '{$_=~s/.*=//; $_=~s/ .*//; $_=~s/-//; chomp $_ ; print "$_," }' < YOUR_ORIGINAL_FILE
That cuts everything up to and including an =, then everything after a space, then removes any dashes. Run on the above input, it returns
123456,223456,323456,423456,523456,623456,723456,823456,098765,898765,798765,698765,