How to specify range using bases greater than 10 in MacOS Terminal - macos

I am using
curl -0 https://cdn.(website).com/[range].(extension) -o "#1.(extension)"
to download files in a range of filename values.
How can I specify the range in a base higher than base 10 (I'm aiming for base 36 to encompass 0-9, a-z)?
ex. [000-zzz]

I don't know of an easy way to print numbers in a nonstandard base in bash, but you could use bash's brace expansion to generate the range textually. You can nest brace expansions, so {{0..9},{a..z}} will give you the digits followed by the lowercase letters. Then stack three of those to get all three-item sequences.
Note that this is a different feature from curl's bracket and brace expansion, so you can't use it in the output filename (the #1 in your example). But you can make a shell for loop, and use the shell variable to create the output filename. Something like this:
for itemnum in {{0..9},{a..z}}{{0..9},{a..z}}{{0..9},{a..z}}; do
curl -0 "https://cdn.(website).com/${itemnum}.(extension)" -o "${itemnum}.(extension)"
done
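If the range should not run exactly from 000 to zzz, brace expansion gets awkward. A minimal sketch of an alternative, assuming bash: count in decimal and convert each value to base 36 by hand. The helper name to_base36 and its width argument are made up for illustration. Bash can read base-36 literals such as 36#zzz in arithmetic, but has no built-in way to print them.

```shell
# Hypothetical helper: print a non-negative integer in base 36,
# left-padded with zeros to at least the given width (default 1).
to_base36() {
  local n=$1 width=${2:-1} d
  local digits=0123456789abcdefghijklmnopqrstuvwxyz out=""
  while (( n > 0 )); do
    d=$(( n % 36 ))              # value of the lowest base-36 place
    out="${digits:d:1}$out"      # prepend the matching digit character
    n=$(( n / 36 ))
  done
  while (( ${#out} < width )); do
    out="0$out"
  done
  printf '%s\n' "$out"
}

# Example: every value from 00x to 012 (bash reads 36#... as base 36)
for (( i = 36#00x; i <= 36#012; i++ )); do
  to_base36 "$i" 3
done
```

Inside the loop, the value printed by to_base36 can be captured with $(...) and used for both the URL and the output filename, just like ${itemnum} above.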

Related

How to single out the first bit and retaining the last bit of the filename using grep(find)

Greeting
I am writing a bash script to convert a decimal number taken from a file name (e.g. 023-124.grf) to binary. I only need to convert the last 3 digits of the name, without touching the first part
(the part I care about looks something like this: 124.grf)
I have already tried cut, but it only seems to work on the contents of a text file. As for grep, I am still trying to figure out how to use it, since I am relatively new to bash
Is there a way to single out the first bit of the filename?
Well, I'm not sure you completely specified your problem, but luckily, even a very general variation of it can be solved fairly easily, considering that grep allows you to match both digit and non-digit characters.
So to match "the last 3 consecutive digits that are not followed by a digit" in any text (even if it looks like "234_blablabla_lololol_343123_blablabla_abc.ext" or "blabla_987123", rather than "555-123.ext"), you could literally translate the quoted definition to a regular expression, and get "123", by using [0-9] to match a digit and [^0-9] to match a non-digit. The latter narrows the match down to the last digits present in the text, by stating that only non-digits may (optionally) follow them.
E.g.:
echo 234_blablabla_lololol_343999_blablabla_abc.txt | grep -o '[0-9][0-9][0-9][^0-9]*$' | grep -o '^...'
999
Of course, there are many other ways to do this. For instance, grep has a -P flag to enable the most powerful kind of regular expression syntax it supports, namely Perl regex. With this, you can avoid a lot of redundant code.
E.g. with Perl regex, you can shorten repeats of the same regex unit ("atom"):
[0-9][0-9][0-9] -> [0-9]{3}
It even provides shorthands for common "character classes". One of these is "decimal digit", a shorthand for [0-9], denoted as \d:
[0-9]{3} -> \d{3}
You could also use lookaheads and lookbehinds to fetch your 3 digits in one pass, alleviating the need of grepping for the first 3 characters afterwards (the grep '^...' part), but I can't be bothered to look up the particular syntax for that in grep right now.
Sadly, it would take more thought to generalize the above definition of "the last 3 consecutive digits that are not followed by a digit" into simply "the last 3 consecutive digits". As it stands, the regular expression does not match file names where the last run of 3 digits is followed by a digit anywhere later in the name, such as "blabla_12_blabla_123_blabla_56.ext", but I am optimistic that your naming convention does not allow that.
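For reference, a sketch of the one-pass lookahead variant mentioned above. It assumes GNU grep built with -P (PCRE) support, which the stock grep on macOS does not have. The function name last3 is only illustrative.

```shell
# Hypothetical helper: print the run of 3 digits that is followed
# only by non-digits up to the end of the text.
# Assumes GNU grep with -P (Perl-compatible regex) support.
last3() {
  # \d{3}      three decimal digits
  # (?=\D*$)   lookahead: only non-digits may follow, up to end of line
  printf '%s\n' "$1" | grep -oP '\d{3}(?=\D*$)'
}

last3 234_blablabla_lololol_343999_blablabla_abc.txt
last3 555-123.ext
```

Because the lookahead is not part of the match, -o prints just the three digits (999 and 123 for the two calls above), and the second grep is no longer needed.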
You can use bash primitives to separate out the desired portion of the name. There's probably a slicker way to get the binary conversion of the decimal number, but I like dc:
$ name=023-124.grf
$ base=${name%.*}
$ echo "$base"
023-124
$ suffix=${base##*-}
$ echo $suffix
124
$ echo "$suffix" 2 o p | dc
1111100
$ new_name="${base%%-*}-$(echo $suffix 2 o p | dc).${name##*.}"
$ echo "$new_name"
023-1111100.grf
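If dc is not available, the conversion can also be done with bash arithmetic alone. A minimal sketch, assuming non-negative integers; the function name to_binary is made up for illustration. The 10# prefix is there so that a value with leading zeros, such as 023, is read as decimal rather than octal.

```shell
# Hypothetical helper: print a non-negative decimal integer in binary.
to_binary() {
  local n=$(( 10#$1 )) bits=""
  if (( n == 0 )); then
    echo 0
    return
  fi
  while (( n > 0 )); do
    bits="$(( n % 2 ))$bits"   # prepend the lowest bit
    n=$(( n / 2 ))
  done
  echo "$bits"
}

name=023-124.grf
base=${name%.*}
new_name="${base%%-*}-$(to_binary "${base##*-}").${name##*.}"
echo "$new_name"
```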

How to keep/remove numbers in a variable in shell?

I have a variable such as:
disk=/dev/sda1
I want to extract:
only the non numeric part (i.e. /dev/sda)
only the numeric part (i.e. 1)
I'm gonna use it in a script where I need the disk and the partition number.
How can I do that in shell (bash and zsh mostly)?
I was thinking about using Shell parameters expansions, but couldn't find working patterns in the documentation.
Basically, I tried:
echo ${disk##[:alpha:]}
and
echo ${disk##[:digit:]}
But neither worked. Both returned /dev/sda1
With bash and zsh and Parameter Expansion:
disk="/dev/sda12"
echo "${disk//[0-9]/} ${disk//[^0-9]/}"
Output:
/dev/sda 12
The expansions kind-of work the other way round. With [:digit:] you will match only a single digit. You need to match everything up until, or from a digit, so you need to use *.
The following looks ok:
$ echo ${disk%%[0-9]*} ${disk##*[^0-9]}
/dev/sda 1
To use [:digit:] you need double brackets, because the character class is [:class:] and it itself has to be inside [ ]. That's why I prefer 0-9, less typing*. The following is the same as above:
echo ${disk%%[[:digit:]]*} ${disk##*[^[:digit:]]}
* - Theoretically they may be not equal, as [0-9] can be affected by the current locale, so it may be not equal to [0123456789], but to something different.
You have to be careful when using patterns in parameter substitution. These patterns are not regular expressions but pathname expansion patterns, or glob patterns.
The idea is to remove the last number, so you want to make use of Remove matching suffix pattern (${parameter%%word}). Here we remove the longest instance of the matched pattern described by word. Representing a single digit is easily done with the pattern [0-9]; multi-digit numbers, however, are harder. For these you need to use extended glob expressions:
*(pattern-list): Matches zero or more occurrences of the given patterns
So if you want to remove the last number, you use:
$ shopt -s extglob
$ disk="/dev/sda1"
$ echo "${disk#${disk%%*([0-9])}}" "${disk%%*([0-9])}"
1 /dev/sda
$ disk="/dev/dsk/c0t2d0s0"
$ echo "${disk#${disk%%*([0-9])}}" "${disk%%*([0-9])}"
0 /dev/dsk/c0t2d0s
We have to use ${disk#${disk%%*([0-9])}} to remove the prefix. It essentially finds the trailing number, removes it, and then strips the remainder from the front of the original value, so that only the number is left.
You can also make use of pattern substitution (${parameter/pattern/string}) with the anchors # and % to anchor the pattern to the beginning or the end of the parameter, respectively (see man bash for more information). This is equivalent to the previous solution:
$ shopt -s extglob
$ disk="/dev/sda1"
$ echo "${disk/${disk/%*([0-9])}/}" "${disk/%*([0-9])}"
1 /dev/sda
$ disk="/dev/dsk/c0t2d0s0"
$ echo "${disk/${disk/%*([0-9])}/}" "${disk/%*([0-9])}"
0 /dev/dsk/c0t2d0s
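Another option, sketched here under the assumption that bash is the target shell, is the regex operator =~, which captures both parts in one step and needs no extglob. The names split_disk, disk_base and disk_num are made up for illustration. zsh reports captured groups differently by default, so this sketch is bash-only.

```shell
# Hypothetical helper: split a device path into its base name and its
# trailing number. Sets the variables disk_base and disk_num.
split_disk() {
  local re='^(.*[^0-9])([0-9]+)$'
  if [[ $1 =~ $re ]]; then
    disk_base=${BASH_REMATCH[1]}   # everything up to the last non-digit
    disk_num=${BASH_REMATCH[2]}    # the trailing run of digits
  else
    disk_base=$1                   # no trailing number found
    disk_num=""
  fi
}

split_disk /dev/sda12
echo "$disk_base $disk_num"
split_disk /dev/dsk/c0t2d0s0
echo "$disk_base $disk_num"
```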

Bash wildcard pattern using `seq`

I am trying the following command:
ls myfile.h1.{`seq -s ',' 3501 3511`}*
But ls raises the error:
ls: cannot access myfile.h1.{3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511}*: No such file or directory
Seems like ls is thinking the entire line is a filename and not a wildcard pattern. But if I just copy that command ls myfile.h1.{3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511}* in the terminal I get the listing as expected.
Why does typing out the command in full work, but not the usage with seq?
seq is not needed in your case; try
$ ls myfile.h1.{3501..3511}
If you want to use seq, I would suggest using its format option
$ ls $(seq -f 'myfile.h1.%g' 3501 3511)
but I don't think there is any reason to do so.
UPDATE:
Note that I didn't notice the globbing in the original post. With that, brace expansion is still the preferred way
$ ls myfile.h1.{3501..3511}*
perhaps even factoring the common digits out, if your bash supports zero padding
$ ls myfile.h1.35{01..11}*
if not, you can at least factor the 3 out
$ ls myfile.h1.3{501..511}*
Note that the seq alternative above produces exact names only, so it won't match the trailing *.
Other answer has more details...
karakfa's answer, which uses a literal sequence brace expansion expression, is the right solution.
As for why your approach didn't work:
Bash's brace expansion {...} only works with literal expressions - neither variable references nor, as in your case, command substitutions (`...`, or, preferably, $(...)) work[1] - for a concise overview, see this answer of mine.
With careful use of eval, however, you can work around this limitation; to wit:
from=3501 to=3511
# CAVEAT: Only do this if you TRUST that $from and $to contain
# decimal numbers only.
eval ls "myfile.h1.{$from..$to}*"
@ghoti suggests the following improvement in a comment to make the use of eval safe here:
# Use parameter expansion to remove all non-digit characters from the values
# of $from and $to, thus ensuring that they either contain only a decimal
# number or the empty string; this expansion happens *before* eval is invoked.
eval ls "myfile.h1.{${from//[^0-9]/}..${to//[^0-9]/}}*"
As for how your command was actually evaluated:
Note: Bash applies 7-8 kinds of expansions to a command line; only the ones that actually come into play here are discussed below.
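first, brace expansion is attempted, but it finds no literal list or sequence between the braces, so the word is left untouched. Then the command in command substitution `seq -s ',' 3501 3511` is executed, and replaced by its output (also note the trailing , which BSD/macOS seq emits; GNU seq omits it):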
3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511,
the result then forms a single word with its prefix, myfile.h1.{ and its suffix, }*, yielding:
myfile.h1.{3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511,}*
pathname expansion (globbing) is then applied to the result - in your case, since no files match, it is left as-is (by default; shell options shopt -s nullglob or shopt -s failglob could change that).
finally, literal myfile.h1.{3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511,}* is passed to ls, which - because it doesn't refer to an existing filesystem item - results in the error message you saw.
[1] Note that the limitation only applies to sequence brace expansions (e.g., {1..3}); list brace expansions (e.g., {1,2,3}) are not affected, because no up-front interpretation (interpolation) is needed; e.g. {$HOME,$USER} works, because brace expansion results in expanding the list to the separate words $HOME and $USER, which are only later expanded.
Historically, sequence brace expansions were introduced later, at a time when the order of shell expansions was already fixed.
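If the bounds have to come from variables and eval feels too risky, a plain loop can stand in for the brace expansion. A minimal sketch, assuming bash and integer bounds; the function name collect_range is hypothetical. The function body runs in a subshell, so turning on nullglob there does not leak into the calling shell.

```shell
# Hypothetical helper: print every existing file whose name starts with
# <prefix><number>, for each number from <from> to <to>.
# Assumes <from> and <to> are plain decimal integers.
collect_range() (
  shopt -s nullglob            # a number with no matching files yields nothing
  prefix=$1 from=$2 to=$3
  for (( i = from; i <= to; i++ )); do
    for f in "$prefix$i"*; do  # the trailing * is a normal glob here
      printf '%s\n' "$f"
    done
  done
)

from=3501 to=3511
collect_range myfile.h1. "$from" "$to"
```

Unlike the ls call in the question, this prints nothing (rather than an error) when no file matches.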

Match a range of file names with variable end, in a Bash script

Let's say I have a number of files named file1, file2, file3, and so on. I'm trying to find a way to match the first N files, in a Bash script, where N is a variable. Here are the options I've considered so far:
Brace expansion, i.e. file{1..3}, doesn't allow variable end. In other words, file{1..$N} doesn't work.
A range expression can be used to match numeric characters. It allows a variable end, i.e. file[1-$N], but this only works as long as N is at most 9.
$(seq 1 $N) can be used to create a sequence of numbers, but it doesn't help since the problem is to match a sequence of numbers in a file name. Were the files named simply 1, 2, 3, and so on, this would work.
Here is another solution. I'm not advocating it, but then again there can be legitimate uses for eval ;) ...also I think not being able to use a variable in a range is an annoying/less intuitive shortcoming.
N=5
eval echo {1..$N}
So you could do
eval ls file{1..$N}
I found a solution using extended globs. They need to be enabled with the shopt -s extglob command. @(...) can be used to match any of a set of patterns separated by the | character, e.g. file@(1|2|3). Now I just need to generate the number sequence with | as the separator character instead of a newline:
shopt -s extglob
range=$(seq 1 $N)
ls file@(${range//$'\n'/|})
Could you simply do,
for file01.txt, file02.txt, file345.txt, file678.txt...
cat file*.txt > file_all.txt
or am I missing the point?
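For completeness, a loop-based sketch that needs neither eval nor extglob, assuming bash and files named exactly prefix plus number as in the question. The function name first_n is made up. One side effect worth noting: the names come out in numeric order, whereas a glob sorts file10 before file2.

```shell
# Hypothetical helper: print the names <prefix>1 .. <prefix>N that exist,
# in numeric order. Assumes N is a plain decimal integer.
first_n() {
  local prefix=$1 n=$2 i
  for (( i = 1; i <= n; i++ )); do
    if [[ -e $prefix$i ]]; then
      printf '%s\n' "$prefix$i"
    fi
  done
}

N=3
first_n file "$N"
```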

Allowing punctuation characters in directory and file names in bash

What techniques or principles should I use in a bash script to handle directories and filenames that are allowed to contain as many as possible of
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
and space?
I guess / is not a valid filename or directory name character in most linux/unix systems?
So far I have had problems with !, ;, |, (a space character) and ' in filenames.
You are right, / is not valid, and neither is the null-byte \0. There is no way around that limitation (besides file system hacking).
All other characters can be used in file names, including such surprising characters as a newline \n or a tab \t. There are many ways to enter them so that the shell does not understand them as special characters. I will give just a pragmatic approach.
You can enter most of the printable characters by using the singlequote ' to quote them:
date > 'foo!bar["#$%&()*+,-.:;<=>?@[\]^_`{|}~'
Of course, you cannot enter a singlequote this way, but for this you can use the doublequote ":
date > "foo'bar"
If you need to have both, you can end one quotation and start another:
date > "foo'bar"'"bloh'
Alternatively you also can use the backslash \ to escape the special character directly:
date > foo\"bar
The backslash also works as an escape character within doublequotes; it does not work that way within singlequotes (there it is a simple character without special meaning).
If you need to enter non-printable characters like a newline, you can use the dollar-singlequote notation:
date > $'foo\nbar'
This is valid in bash, but not necessarily in all other shells. So take care!
Finally, it can make sense to use a variable to keep your strange name (in order not to have to spell it out directly):
strangeName=$(xxd -r <<< "00 41 42 43 ff 45 46")
date > "$strangeName"
This way you can keep the shell code readable.
BUT in general it is not a good idea to have such characters in file names because a lot of scripts cannot handle such files properly.
Writing fool-proof scripts is not easy. The most basic rule is to put variable expansions in doublequotes:
for i in *
do
cat "$i" | wc -l
done
This will solve 99% of the issues you are likely to encounter.
If you are using find to find directory entries which can contain special characters, you should use -print0 to separate the output not by newlines but by null-bytes. Other programs like xargs often can understand a list of null-byte separated file names.
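A minimal sketch of that null-byte approach, assuming bash (read -d '' and process substitution are not available in plain sh). The function name list_files_quoted is only illustrative.

```shell
# Hypothetical helper: list all regular files below a directory, one per
# line, shell-quoted so that spaces, newlines and the like stay visible.
list_files_quoted() {
  local f
  # -print0 ends each name with a null-byte; read -d '' splits on it,
  # so names containing spaces or newlines arrive intact.
  while IFS= read -r -d '' f; do
    printf '%q\n' "$f"
  done < <(find "$1" -type f -print0)
}

list_files_quoted .
```

The same null-separated stream can be handed to xargs with its -0 option instead of a read loop.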
If your file name can start with a dash - it often can be mistaken as an option. Some programs allow giving the special option -- to state that all following arguments are no options. The more general approach is to use a name which does not start with a dash:
for i in *
do
cat ./"$i" | wc -l
done
This way, a file named -n will not run cat -n but cat ./-n which will not be understood as the option -n given to cat (which would mean "number lines").
Always quote your variable substitutions. I.e. not cp $source $target, but cp "$source" "$target". This way they won't be subject to word splitting and pathname expansion.
Specify "--" before positional arguments to file operation commands. I.e. not cp "$source" "$target", but cp -- "$source" "$target". This prevents interpreting file names starting with dash as options.
And yes, "/" is not a valid character for file/directory names.
