Bash: delete chars in a file, only on the first line

I am making a script to copy a lot of files from one path to another.
Every file on the path has, on the first line, a lot of "garbage" until the word "Return-Path".
File content example:
§°ç§°*é*é*§°ç§°çççççççReturn-PathOTHERTHINGS
REST
OF
THE
FILE
EOF
Probably sed or awk could help with this.
THE PROBLEM:
I want the whole content of the file, except for anything before "Return-Path", and it should be stripped ONLY on the first line, like this:
Return-PathOTHERTHINGS
REST
OF
THE
FILE
EOF
Important thing: anything before Return-Path is "binary"; in fact the files are detected as binary...
How to solve?

Ok, it's a new day, and now I do feel like coding this for you :-)
The algorithm is described in my other answer to your same question.
#!/bin/bash
################################################################################
# behead.sh
# Mark Setchell
#
# Utility to remove stuff preceding specified string near start of binary file
#
# Usage: behead.sh <infile> <outfile>
################################################################################
IN="$1"
OUT="$2"
SEARCH="Return-Path"
for i in {0..80}; do
    str=$(dd if="$IN" bs=1 count=${#SEARCH} iseek=$i 2> /dev/null)
    if [ "$str" == "$SEARCH" ]; then
        # The following line will go faster if you exchange "bs" and "iseek"
        # parameters, because it will work in bigger blocks, it just looks
        # wrong, so I haven't done it.
        dd if="$IN" of="$OUT" bs=1 iseek=$i 2> /dev/null
        exit $?
    fi
done
echo "String not found, sorry."
exit 1
You can test it works like this:
#
# Create binary with 15 bytes of bash, then "Return-Path", then entire bash in file "bashed"
(dd if=/bin/bash bs=1 count=15 2>/dev/null; echo -n 'Return-Path'; cat /bin/bash) > bashed
#
# Chop off junk at start of "bashed" and save in "restored"
./behead.sh bashed restored
#
# Check the restored "bash" is exactly 11 bytes longer than original,
# as it has "Return-Path" at the beginning
ls -l bashed restored
If you save my script as "behead.sh" you will need to make it executable like this:
chmod +x behead.sh
Then you can run it like this:
./behead.sh inputfile outputfile
By the way, there is no concept of "a line" in a binary file, so I have assumed the first 80 characters - you are free to change it, of course!

Try:
sed '1s/.*Return-Path/Return-Path/'
This command substitutes anything before "Return-Path" with "Return-Path" only on the first line.
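You can dry-run it on fabricated input before touching the real files, e.g.:
printf 'garbage\001\002Return-PathOTHERTHINGS\nREST\nOF\nTHE\nFILE\n' |
    sed '1s/.*Return-Path/Return-Path/'
Two caveats: .* is greedy, so if "Return-Path" occurred twice on the first line, everything before the second occurrence would be stripped; and since the junk is binary, some sed implementations may choke on odd bytes (GNU sed generally copes).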

I don't feel like coding this at this minute, but can give you a hint maybe. "Return-Path" is 11 characters. You can get 11 characters from a file at offset "n" with
dd if=file bs=1 count=11 iseek=n
So if you do a loop with "n" starting at zero and increasing till the result matches "Return-Path" you can calculate how many bytes you need to remove off the front. Then you can do that with another "dd".
Alternatively, have a look at running the file through "xxd", editing that with "sed" and then running it back through "xxd" the other way with "xxd -r".
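For completeness, a rough sketch of that xxd round-trip, using the plain hex mode (-p), which is easier to edit. Note that .* is greedy, so this cuts at the last occurrence of the marker, which is fine when "Return-Path" appears only once near the start; $IN and $OUT are the same in/out filenames as in the script above:
# "Return-Path" in hex is 52657475726e2d50617468
xxd -p "$IN" | tr -d '\n' |
    sed 's/^.*52657475726e2d50617468/52657475726e2d50617468/' |
    xxd -r -p > "$OUT"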

Related

Redirecting the result files to different variable file names

I have a folder with, say, ten data files I01.txt, ..., I10.txt. Each file, when executed using the command ./a.out, gives me five output files, namely f1.txt, f2.txt, ..., f5.txt.
I have written a simple bash program to execute all the files and save the on-screen output to a variably named file using the command
./cosima_peaks_444_temp_muuttuva -args > $counter-res.txt.
Using this, I am able to save the on-screen output to a file. But the five files f1 to f5 are overwritten with the results of the last file run, in this case I10, and the results of the first nine files are lost.
So I want to save the output of each I*.txt file (f1 ... f5) to a different file such that, when the program executes I01.txt using ./a.out, it stores the output of the files
f1>var1-f1.txt , f2>var1-f2.txt... f5 > var1-f5.txt
and then repeats the same for I02 (f1>var2-f1.txt ...).
#!/bin/bash
# echo "for looping over all the .txt files"
echo -e "Enter the name of the file or q to quit "
read dir
if [[ $dir = q ]]
then
    exit
fi
filename="$dir*.txt"
counter=0
if [[ $dir == I ]]
then
    for f in $filename; do
        echo "output of $filename"
        ((counter++))
        ./cosima_peaks_444_temp_muuttuva $f -m202.75 -c1 -ng0.5 -a0.0 -b1.0 -e1.0 -lg > $counter-res.txt
        echo "counter $counter"
    done
fi
If I understand correctly, you want to pass files l01.txt, l02.txt, ... to a.out and save the output of each execution to a separate file like f01.txt, f02.txt, .... You could use a short script that reads each file named l*.txt in the directory and passes the name to a.out, redirecting the output to a file fN.txt (where N is the number from the lN.txt filename). This presumes you are passing each filename to a.out and that a.out is not reading the entire directory automatically.
for i in l*.txt; do
    num=$(sed 's/^l\(.*\)[.]txt/\1/' <<<"$i")
    ./a.out "$i" > "f${num}.txt"
done
(note: the character after s/^ in the sed pattern is a lowercase 'L', and the \1 in the replacement is the digit one, i.e. the first backreference)
note: if you do not want the same N from the filename (with its leading '0'), then you can trim the leading '0' from the N value for the output filename.
(you can use a counter as you have shown in your edited post, but you have no guarantee of the sort order of the filenames used by the loop unless you explicitly sort them)
note: this presumes NO spaces, embedded newlines, or other odd characters in the filenames. If your lN.txt names can have odd characters or spaces, feeding a while loop with find avoids the odd-character issues, as sketched below.
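For example (a sketch, assuming a.out takes the filename as its argument as above; -print0 and read -d '' keep the NUL-separated names intact):
find . -maxdepth 1 -name 'l*.txt' -print0 |
while IFS= read -r -d '' f; do
    f=${f#./}                                   # strip the leading ./ that find adds
    num=$(sed 's/^l\(.*\)[.]txt$/\1/' <<<"$f")
    ./a.out "$f" > "f${num}.txt"
done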
With f1 - f5 Created Each Run
You know the format for the output file name, so you can test for the existence of an existing file name and set a prefix or suffix to provide unique names. For example, if your first pass creates filenames 'pass1-f01.txt', 'pass1-f02.txt', then you can check for that pattern (in several ways) and increment your 'passN' prefix as required:
for f in "$filename"; do
num=$(sed 's/l*\(.*\)[.]txt/\1/' <<<"$f")
count=$(sed 's/^0*//' <<<"$num")
while [ -f "pass${count}-f${num}.txt" ]; do
((count++))
done
./a.out "$f" > "pass${count}-f${num}.txt"
done
Give that a try and let me know if that isn't closer to what you need.
(note: the use of the herestring (<<<) is bash-only, if you need a portable solution, pipe the output of echo "$var" to sed, e.g. count=$(echo "$num" | sed 's/^0*//') )
I replaced your cosima_peaks_444_temp_muuttuva with a function myprog.
The OP asked for more explanation, so I put in a lot of comments:
# This function makes 5 output files for testing the construction
function myprog {
    # Fill the test output file f1.txt with the input filename and a datestamp
    echo "Output run $1 on $(date)" > f1.txt
    # The original prog makes 5 output files, so I copy the new testfile 4 times
    cp f1.txt f2.txt
    cp f1.txt f3.txt
    cp f1.txt f4.txt
    cp f1.txt f5.txt
}

# Use the number in the inputfile for making a unique filename and move the output
function move_output {
    # The parameter ${1} is filled with something like I03.txt
    # You can get the number with a sed action, but it is more efficient to use
    # bash functions, even in 2 steps.
    # First step: cut off from the end as much as possible (%%) starting with a dot.
    Inumber=${1%%.*}
    # Step 2: remove the I from the Inumber (that is filled with something like "I03").
    number=${Inumber#I}
    # Move all outputfiles from the last run
    for outputfile in f*txt; do
        # Put the number in front of the original name
        mv "${outputfile}" "${number}_${outputfile}"
    done
}

# Start the main processing. You will perform the same logic for all input files,
# so make a loop over all files. I guess all input files start with an "I",
# followed by 2 characters (a number), and .txt. No need to use ls for listing those.
for input in I??.txt; do
    # Call the dummy prog above with the name of the input file as a parameter
    myprog "${input}"
    # Now finally the show starts.
    # Call the function for moving the 5 outputfiles to another name.
    move_output "${input}"
done
I guess you have the source code of this a.out binary. If so, I would modify it so that it outputs to several fds instead of several files. Then you can solve this very cleanly using redirects:
./a.out 3> fileX.1 4> fileX.2 5> fileX.3
and so on for every file you want to output. Writing to a file or to a (redirected) fd is equivalent in most programs (notable exception: memory mapped I/O, but that is not commonly used for such scripts - look for mmap calls)
Note that this is not very esoteric, but a very well known technique that is regularly used to separate output (stdout, fd=1) from errors (stderr, fd=2).
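As an illustration, a hypothetical stand-in for such a program, written as a shell script just to show the mechanics:
#!/bin/bash
# Each "result stream" goes to its own descriptor; the caller must
# open fds 3-5 via redirections or these echos fail with "Bad file descriptor".
echo "first result set" >&3
echo "second result set" >&4
echo "third result set" >&5
Run as ./a.out 3> fileX.1 4> fileX.2 5> fileX.3 and each line lands in its own file.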

Examining realtime data timestamp and redirecting output with bash

I have been using socat to pull ASCII streams over UDP and write them to files. The following is one such line.
socat UDP-RECV:$UDP_PORT,reuseaddr - | cat >> $INSTRUMENT_$JDAY_RAW &
Each stream being received already has its data timestamped by the sender using ts (part of moreutils) with the year, Julian day, hour, min, second, and msec. If the Julian day changes, the JDAY variable on the receiving end doesn't get reinitialized and cat merrily keeps piping data into the same file with yesterday's timestamp.
Here is an example of the udp stream being received by socat. It is being recorded at 20hz.
2015 317 06 34 43 303 winch680 000117.9 00000000 00000000.0
2015 317 06 34 43 353 winch680 000117.5 00000000 00000000.0
Is there some way in bash I can take each line received by socat, examine the jday timestamp field, and change the output file according to that timestamp?
You may parse the input stream using the read built-in in bash. You may obtain further information with $ help read. It normally separates tokens using whitespace.
The variables $INSTRUMENT, and $JDAY have to be defined before that cat command is launched, because cat will open the file before it starts writing to it.
If $JDAY and $INSTRUMENT are somehow to be extracted from each line, you can use the following bash snippet (assuming lines read by socat look like <INSTRUMENT> <JDAY> <TS> yaddi yadda ...):
function triage_per_day () {
    while read INSTRUMENT JDAY TS REST; do
        echo "$TS $REST" >> "${INSTRUMENT}_${JDAY}_RAW";
    done
}
triage_per_day < <(socat UDP-RECV:"${UDP_PORT}",reuseaddr -)
If you want to get fancier, you can use file handles to help bash run a bit faster. You can use file descriptor redirections to keep outputting to the same file as long as the day is the same. This will minimize the number of file opens and closes bash has to do.
function triage_per_day () {
    local LAST_JDAY=init
    exec 5>&1    # save stdout
    exec 1>&2    # echos are sent to stderr until JDAY is redefined
    while read INSTRUMENT JDAY TS REST; do
        if [[ "$JDAY" != "$LAST_JDAY" ]]; then
            # we need to change output file
            # send stdout to file in append mode
            exec 1>>"${INSTRUMENT}_${JDAY}_RAW"
            LAST_JDAY="${JDAY}"
        fi
        echo "$TS $REST"
    done
    exec 1>&5    # restore stdout
    exec 5>&-    # close stdout copy
}
triage_per_day < <(socat UDP-RECV:"${UDP_PORT}",reuseaddr -)
If you wish to tokenize your lines over different characters than whitespace, say ',' commas, you can locally modify the special variable IFS:
function extract_ts () {
    local IFS=,    # special bash variable: internal-field-separator
    # $REST will contain everything from the third token on. It is a good
    # practice to specify one more name than your last token of interest.
    while read TOK1 TS REST; do
        echo "timestamp is $TS";
    done
}
If you need fancier processing of each line to extract timestamps and other fields, you may instead execute external programs (python/perl/cut/awk/grep, etc.), but this will be much slower than simply sticking with bash builtins like read or echo. If you have to do this and speed is an issue, consider switching your script to a language that gives you the expressiveness you need. You may also wish to look into bash pattern substitution in the manual if you need fancy regular expressions.
function extract_ts () {
    # store each line in the variable $LINE
    while read LINE; do
        TS="$(echo "$LINE" | ...)";
        echo "Timestamp is $TS";
    done
}
Recommended practices
Also, I should mention that it is good practice to surround your bash variables with double quotes (as in the answer) if you intend to use them as filename parameters. This is especially true if the names contain spaces or special characters, as could be expected from a filename derived from dates or times. In cases where your variables expand to nothing (due to human or programming error), positional parameters will go missing, sometimes with bad repercussions.
Consider:
# copy two files to the directory (bad)
$ cp file1 file2 $MYDIR
If $MYDIR is undefined, then this command amounts to overwriting file2 with the contents of file1. Contrast this with cp file1 file2 "$MYDIR" which will fail early because the target "" does not exist.
Another source of problems that I see in your question is variable names followed by underscores _, like $INSTRUMENT_$JDAY_RAW. Those names should be surrounded with curly braces { }.
INSTRUMENT=6
BAR=49
echo $INSTRUMENT_$BAR # prints '49', but you may have expected 6_49
Because _ is a valid character in variable names, bash will attempt to greedily 'glue' the '_' after INSTRUMENT to match the longest valid variable name possible, which would be $INSTRUMENT_. This variable is undefined, however, and expands to the empty string, so you're left with the rest, $BAR. This example can be correctly rewritten as:
INSTRUMENT=6
BAR=49
echo ${INSTRUMENT}_${BAR} # prints 6_49
or even better (avoiding future surprises if values ever change)
echo "${INSTRUMENT}_${BAR}" # prints 6_49
Not with cat. You'll need a [not bash] script (e.g. perl/python or C program).
Replace:
socat UDP-RECV:$UDP_PORT,reuseaddr - | cat >> $INSTRUMENT_$JDAY_RAW &
With:
socat UDP-RECV:$UDP_PORT,reuseaddr - | myscript &
Where myscript looks like:
while (1) {
    get_data_from_socat_on_stdin();
    if (jdaynew != jdayold) {
        close_output_file();
        jdayold = jdaynew;
    }
    if (output_file_not_open)
        open_output_file(jdaynew);
    write_data_to_output_file();
}
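If you'd rather not write a separate program, a close equivalent of that pseudocode can be had with a one-line awk filter. This is only a sketch, assuming the Julian day is field 2 and the instrument name is field 7, as in the sample lines, and that your awk has fflush (gawk and mawk do):
socat UDP-RECV:"${UDP_PORT}",reuseaddr - |
    awk '{ file = $7 "_" $2 "_RAW"; print >> file; fflush(file) }' &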
This is the code that worked for me.
The input udp stream looks like this:
2015 317 06 34 43 303 winch680 000117.9 00000000 00000000.0
#!/bin/bash
# This code creates a function which reads the fields in the
# udp stream into a table
# and uses the fields in the table to determine output.
UDP_PORT=5639
function DATAOUT () {
    # HR added to the read list so the names line up with the sample's ten fields
    while read YR JDY HR MIN SEC MSEC INST TENS SPEED LINE; do
        echo "$YR $JDY $HR $MIN $SEC $MSEC $INST $TENS $SPEED $LINE" >> "${INST}_${JDY}_RAW";
    done
}
DATAOUT < <(socat udp-recv:${UDP_PORT},reuseaddr -)

Bash: Extract user path (/home/userID) from read line containing full path and replace with "~"

I'm constructing a bash script file a bit at a time. I'm learning as I go. But I can't find anything online to help me at this point: I need to extract a substring from a large string, and the two methods I found using ${} (curly brackets) just won't work.
The first, ${x#y}, doesn't do what it should.
The second, ${x:p} or ${x:p:n}, keeps reporting bad substitution.
It only seems to work with constants.
The ${#x} returns a string length as text, not as a number, meaning it does not work with either ${x:p} or ${x:p:n}.
Fact is, it seems really hard to get bash to do much math at all, except for the for statements. But that is just counting, and this isn't a task for a for loop.
I've consolidated my script file here to help you all understand what it is that I am doing. It's for working with PureBasic source files, but you only have to change grep's "--include=" argument and it can search other types of text files instead.
#!/bin/bash
home=$(echo ~)    # Copy the user's path to a variable named home
len=${#home}      # Showing how to find the length. Problem is, this is treated
                  # as a string, not a number. Can't find a way to make it over
                  # into a number.
echo $home "has length of" $len "characters."
read -p "Find what: " what    # Intended to search PureBasic (*.pb?) source files for text matches
grep -rHn $what $home --include="*.pb*" --exclude-dir=".cache" --exclude-dir=".gvfs" > 1.tmp
while read line    # this checks for and reads the next line
do                 # the closing 'done' has the file to be read appended with "<"
    a0=$line       # this is each line as read
    a1=$(echo "$a0" | awk -F: '{print $1}')    # this gets the full path before the first ':'
    echo $a0       # Shows full line
    echo $a1       # Shows just full path
    q1=${line#a1}
    echo $q1       # FAILED! No reported problem, but failed to extract $a1 from $line.
    q1=${a0#a1}
    echo $q1       # FAILED! No reported problem, but failed to extract $a1 from $a0.
    break          # Can't do a 'read -n 1', as it just reads 1 char from the next line.
                   # Can't do a pause, because it doesn't exist. So just run from the
                   # terminal so that after break we can see what's on the screen.
    len=${#a1}     # Can get the length of $a1, but only as a string
    # q1=${line:len}     # Right command, wrong variable
    # q1=${line:$len}    # Right command, right variable, but wrong variable type
    # q1=${line:14}      # Constants work, but all $home's aren't 14 characters long
done < 1.tmp
The following works:
x="/home/user/rest/of/path"
y="~${x#/home/user}"
echo $y
Will output
~/rest/of/path
If you want to use "/home/user" inside a variable, say prefix, you need to use $ after the #, i.e., ${x#$prefix}, which I think is your issue.
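Spelled out with a variable (hypothetical values):
prefix="/home/user"
x="/home/user/rest/of/path"
echo "~${x#$prefix}"    # prints ~/rest/of/path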
The help I got was most appreciated. I got it done, and here it is:
#!/bin/bash
len=${#HOME}    # Showing how to find the length. Problem is, this is treated
                # as a string, not a number. Can't find a way to make it over
                # into a number.
echo $HOME "has length of" $len "characters."
while :
do
    echo
    read -p "Find what: " what    # Intended to search PureBasic (*.pb?) source files for text matches
    a0=""; > 0.tmp; > 1.tmp
    grep -rHn $what $HOME --include="*.pb*" --exclude-dir=".cache" --exclude-dir=".gvfs" >> 0.tmp
    while read line    # this checks for and reads the next line
    do                 # the closing 'done' has the file to be read appended with "<"
        a1=$(echo $line | awk -F: '{print $1}')    # this gets the full path before the first ':'
        a2=${line#$a1":"}    # remove path and first colon from rest of line
        if [[ $a0 != $a1 ]]
        then
            echo >> 1.tmp
            echo $a1":" >> 1.tmp
            a0=$a1
        fi
        echo " "$a2 >> 1.tmp
    done < 0.tmp
    cat 1.tmp | less
done
What I don't have yet is an answer as to whether variables can be used in place of constants in the dollar-sign, curly-bracket form where you use colons to mark that you want a substring returned. If it requires constants, then the only choice might be to generate a child script using the variables (which would appear as constants in the child), execute it, then return the results in an environment variable or a temporary file. I did stuff like that with MSDOS a lot. The limitation there is that you have to make the produced file executable as well, using "chmod +x filename", or call it using "/bin/bash filename".
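For the record, the offset and length in ${x:p:n} are evaluated as arithmetic expressions, so plain variables do work there, with or without a leading dollar sign; a quick demonstration with made-up values:
line="/home/user/project/main.pb:12:Match"
len=10                  # e.g. a length obtained from ${#a1}
echo "${line:len}"      # prints /project/main.pb:12:Match
echo "${line:$len:8}"   # prints /project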
Another bash limitation I found is that you cannot use "sudo" in the script without discontinuing execution of the present script. I guess a way around that is to use sudo to call /bin/bash on a child script that you produced. I assume that when the child completes, you return to the parent script where you stopped, unless you did "sudo -i", "sudo -su", or some other variation where you become superuser; then you likely need an "exit" to drop the superuser overlay.
If you exit the child script still as superuser, would typing "exit" put you back to completing the parent script? I suspect so, which makes for some interesting scenarios.
Another question: If doing a "while read line", what can you do in bash to check for a keyboard key press? The "read" option is already taken while in this loop.

Use first 3 characters of a filename as a variable in shell script

this is my first post so hopefully I will make my question clear.
I am new to shell scripts and my task with this one is to add a new value to every line of a csv file. The value that needs to be added is based on the first 3 digits of the filename.
A bit of background: the csv files I am receiving are eventually loaded into partitioned Oracle tables. The start of the file name (e.g. BATTESTFILE.txt) contains the partitioned site, so I need to write a script that takes the first 3 characters of the filename (in this example BAT) and adds them to the end of each line of the file.
The closest I have got so far is when I stripped the code to the bare basics of what I need to do:
build_files()
{
    OLDFILE=${filename[@]}.txt
    NEWFILE=${filename[@]}.NEW.txt
    ABSOLUTE='path/scripts/'
    FULLOLD=$ABSOLUTE$OLDFILE
    FULLNEW=$ABSOLUTE$NEWFILE
    sed -e s/$/",${j}"/ "${FULLOLD}" > "${FULLNEW}"
}
set -A site 'BAT'
set -A filename 'BATTESTFILE'
for j in ${site[@]}; do
    for i in ${filename[@]}; do
        build_files ${j}
    done
done
Here I have set up an array site as there will be 6 'sites', and this will make it easy to add additional sites to the code as the files come through to me. The same is to be said for the filename array.
This code works, but it isn't as automated as I need. One of my most recent attempts is below:
build_files()
{
    OLDFILE=${filename[@]}.txt
    NEWFILE=${filename[@]}.NEW.txt
    ABSOLUTE='/app/dss/dsssis/sis/scripts/'
    FULLOLD=$ABSOLUTE$OLDFILE
    FULLNEW=$ABSOLUTE$NEWFILE
    sed -e s/$/",${j}"/ "${FULLOLD}" > "${FULLNEW}"
}
set -A site 'BAT'
set -A filename 'BATTESTFILE'
for j in ${site[@]}; do
    for i in ${filename[@]}; do
        trust=echo "$filename" | cut -c1-3
        echo "$trust"
        if ["$trust" = 'BAT']; then
            ${j} = 'BAT'
        fi
        build_files ${j}
    done
done
I found the code trust=echo "$filename" | cut -c1-3 in another question on Stack Overflow as I was researching, but it doesn't seem to work for me. I added the echo to test what trust was holding, but it was empty.
I am getting 2 errors back:
Line 17 - BATTESTFILE: not found
Line 19 - test: ] missing
Sorry for the long-winded question. Hopefully it contains helpful info and shows the steps I have taken. Any questions, comment away. Any help or guidance is very much appreciated. Thanks.
When you are new to shells, try avoiding arrays.
In an if statement, use spaces before and after the [ and ] characters.
Get used to surrounding your shell variables with {}, like ${trust}.
I do not know how you fill your array; when the array is hardcoded, try replacing it with
SITE=file1
SITE="${SITE} file2"
And you must tell unix you want the right side evaluated, with $(..) (better than backticks):
trust=$(echo "${filename}" | cut -c1-3)
Some guidelines and syntax help can be found at Google
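Putting those points together, the loop from the question might look like this (a sketch only; it keeps the question's own variables and BAT test):
for j in ${site[@]}; do
    for i in ${filename[@]}; do
        trust=$(echo "${i}" | cut -c1-3)    # capture the pipeline's output
        echo "${trust}"
        if [ "${trust}" = 'BAT' ]; then     # spaces around [ and ] are required
            j='BAT'                         # plain name on the left, no ${} and no spaces
        fi
        build_files "${j}"
    done
done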
Just use shell parameter expansion:
$ var=abcdefg
$ echo "${var:0:3}"
abc
Assuming you're using a reasonably capable shell like bash or ksh, for example
Just in case it is useful for anyone else now or in the future: I got my code to work as desired by using the below. Thanks to Walter A for his answer to my main problem of getting the first 3 characters from the filename and using them as a variable.
This gave me the desired output of taking the first 3 characters of the filename, and adding them to the end of each line in my csv file.
## Get the current directory and file name, create a new file name
build_files()
{
    OLDFILE=${i}.txt
    NEWFILE=${i}.NEW.txt
    ABSOLUTE='/app/dss/dsssis/sis/scripts/'
    FULLOLD=$ABSOLUTE$OLDFILE
    FULLNEW=$ABSOLUTE$NEWFILE
    ## Take the 3 characters from the filename and
    ## add them onto the end of each line in the csv file.
    sed -e s/$/";${j}"/ "${FULLOLD}" > "${FULLNEW}"
}
## Loop to take the first 3 characters from the file names held in
## an array to be added into the new file above
set -A filename 'BATTESTFILE'
for i in ${filename[@]}; do
    trust=$(echo "${i}" | cut -c1-3)
    echo "${trust}"
    j="${trust}"
    echo "${i} ${j}"
    build_files ${i} ${j}
done
Hope this is useful for someone else.
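As an aside, the echo | cut pipeline above could likely be replaced with the parameter expansion from the earlier answer, saving a subshell per file (bash and ksh93 support it):
trust=${i:0:3}    # first 3 characters of ${i}, same result as cut -c1-3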

BASH script: Downloading consecutive numbered files with wget

I have a web server that saves the log files of a web application, numbered. A file name example would be:
dbsclog01s001.log
dbsclog01s002.log
dbsclog01s003.log
The last 3 digits are the counter and it can sometimes go up to 100.
I usually open a web browser, browse to the file like:
http://someaddress.com/logs/dbsclog01s001.log
and save the files. This of course gets a bit annoying when you get 50 logs.
I tried to come up with a BASH script for using wget and passing
http://someaddress.com/logs/dbsclog01s*.log
but I am having problems with the script.
Anyway, does anyone have a sample of how to do this?
thanks!
#!/bin/sh
if [ $# -lt 3 ]; then
    echo "Usage: $0 url_format seq_start seq_end [wget_args]"
    exit
fi
url_format=$1
seq_start=$2
seq_end=$3
shift 3
printf "$url_format\n" `seq $seq_start $seq_end` | wget -i- "$@"
Save the above as seq_wget, give it execution permission (chmod +x seq_wget), and then run, for example:
$ ./seq_wget http://someaddress.com/logs/dbsclog01s%03d.log 1 50
Or, if you have Bash 4.0, you could just type
$ wget http://someaddress.com/logs/dbsclog01s{001..050}.log
Or, if you have curl instead of wget, you could follow Dennis Williamson's answer.
curl seems to support ranges. From the man page:
URL
    The URL syntax is protocol dependent. You'll find a detailed description in RFC 3986.
    You can specify multiple URLs or parts of URLs by writing part sets within braces as in:
        http://site.{one,two,three}.com
    or you can get sequences of alphanumeric series by using [] as in:
        ftp://ftp.numericals.com/file[1-100].txt
        ftp://ftp.numericals.com/file[001-100].txt (with leading zeros)
        ftp://ftp.letters.com/file[a-z].txt
    No nesting of the sequences is supported at the moment, but you can use several ones next to each other:
        http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html
    You can specify any amount of URLs on the command line. They will be fetched in a sequential manner in the specified order.
    Since curl 7.15.1 you can also specify step counter for the ranges, so that you can get every Nth number or letter:
        http://www.numericals.com/file[1-100:10].txt
        http://www.letters.com/file[a-z:2].txt
You may have noticed that it says "with leading zeros"!
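So, applied to the URLs in the question, something like this should fetch the whole zero-padded series (assuming the files really exist under those names; -O saves each file under its remote name):
curl -O "http://someaddress.com/logs/dbsclog01s[001-100].log"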
You can use echo type sequences in the wget url to download a string of numbers...
wget http://someaddress.com/logs/dbsclog01s00{1..3}.log
This also works with letters: {a..z} {A..Z}
Not sure precisely what problems you were experiencing, but it sounds like a simple for loop in bash would do it for you.
for i in {1..999}; do
    wget -k http://someaddress.com/logs/dbsclog01s$i.log -O your_local_output_dir_$i;
done
You can use a combination of a for loop in bash with the printf command (of course modifying echo to wget as needed):
$ for i in {1..10}; do echo "http://www.com/myurl`printf "%03d" $i`.html"; done
http://www.com/myurl001.html
http://www.com/myurl002.html
http://www.com/myurl003.html
http://www.com/myurl004.html
http://www.com/myurl005.html
http://www.com/myurl006.html
http://www.com/myurl007.html
http://www.com/myurl008.html
http://www.com/myurl009.html
http://www.com/myurl010.html
Interesting task, so I wrote a full script for you (combining several answers and more). Here it is:
#!/bin/bash
# fixed vars
URL=http://domain.com/logs/    # URL address 'till logfile name
PREF=logprefix                 # logfile prefix (before number)
POSTF=.log                     # logfile suffix (after number)
DIGITS=3                       # how many digits the logfile's number has
DLDIR=~/Downloads              # download directory
TOUT=5                         # timeout for quit
# code
for ((i=1; i<10**DIGITS; ++i))
do
    file=$PREF`printf "%0${DIGITS}d" $i`$POSTF    # local file name
    dl=$URL$file                                  # full URL to download
    echo "$dl -> $DLDIR/$file"                    # monitoring, can be commented
    wget -T $TOUT -q "$dl" -O "$DLDIR/$file"
    if [ "$?" -ne 0 ]    # test if we finished
    then
        exit
    fi
done
At the beginning of the script you can set the URL, log file prefix and suffix, how many digits are in the numbering part, and the download directory. The loop will download all logfiles it finds, and automatically exit on the first non-existent one (using wget's timeout).
Note that this script assumes that logfile indexing starts with 1, not zero, as you mentioned in the example.
Hope this helps.
Here you can find a Perl script that looks like what you want
http://osix.net/modules/article/?id=677
#!/usr/bin/perl
$program="wget"; #change this to proz if you have it ;-)
my $count=1; #the lesson number starts from 1
my $base_url= "http://www.und.nodak.edu/org/crypto/crypto/lanaki.crypt.class/lessons/lesson";
my $format=".zip"; #the format of the file to download
my $max=24; #the total number of files to download
my $url;
for($count=1; $count<=$max; $count++) {
    if($count<10) {
        $url=$base_url."0".$count.$format; #insert a '0' and form the URL
    }
    else {
        $url=$base_url.$count.$format; #no need to insert a zero
    }
    system("$program $url");
}
I just had a look at the wget manpage discussion of 'globbing':
By default, globbing will be turned on if the URL contains a globbing character. This option may be used to turn globbing on or off permanently.
You may have to quote the URL to protect it from being expanded by your shell. Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix FTP servers (and the ones emulating Unix "ls" output).
So wget http://... won't work with globbing.
Check to see if your system has seq, then it would be easy:
for i in $(seq -f "%03g" 1 10); do wget "http://.../dbsclog${i}.log"; done
If your system has the jot command instead of seq:
for i in $(jot -w "http://.../dbsclog%03d.log" 10); do wget $i; done
Oh! this is a similar problem I ran into when learning bash to automate manga downloads.
Something like this should work:
for a in `seq 1 999`; do
    if [ ${#a} -eq 1 ]; then
        b="00"
    elif [ ${#a} -eq 2 ]; then
        b="0"
    else
        b=""    # reset the padding, or 3-digit numbers would keep the previous prefix
    fi
    echo "$a of 999"
    wget -q http://site.com/path/fileprefix$b$a.jpg
done
Late to the party, but a real easy solution that requires no coding is to use the DownThemAll Firefox add-on, which has the functionality to retrieve ranges of files. That was my solution when I needed to download 800 consecutively numbered files.
