I have a loop in a bash script. It runs a programme that by default outputs a text file when it works, and no file if it doesn't. I'm running it a large number of times (> 500K), so I want to merge the output files, row by row. If one iteration of the loop creates a file, I want to take the LAST line of that file, append it to a master output file, then delete the original so I don't end up with 1000s of files in one directory. The loop I have so far is:
oFile=/path/output/outputFile_
oFinal=/path/output.final
for counter in {101..200}
do
    $programme $counter -out $oFile$counter
    if [ -s $oFile$counter ]   ## This returns TRUE if the file isn't empty, right?
    then
        out=$(tail -1 $oFile$counter)
        final=$out$oFile$counter
        $final >> $oFinal
    fi
done
However, it doesn't work properly: it seems not to capture output for all the files I expect. Is the conditional wrong?
You can be clever and pass the programme a process substitution instead of a "real" file:
oFinal=/path/output.final
for counter in {101..200}
do
$programme $counter -out >(tail -n 1)
done > $oFinal
$programme will treat the process substitution as a file, and every line written to it will be processed by tail. Only the last line survives, and tail writes it to the loop's stdout, which done > $oFinal redirects into the final file.
Testing: my "programme" outputs two lines if the given counter is even, and nothing otherwise:
$ cat programme
#!/bin/bash
if (( $1 % 2 == 0 )); then
    {
        echo ignore this line
        echo $1
    } > "$2"
fi
$ ./programme 101 /dev/stdout
$ ./programme 102 /dev/stdout
ignore this line
102
So this loop should output only the even numbers between 101 and 200:
$ for counter in {101..200}; do ./programme $counter >(tail -1); done
102
104
[... snipped ...]
198
200
Success.
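For completeness, the original loop can also be repaired directly. The bug is in the last two lines: final=$out$oFile$counter glues the captured line onto a filename, and $final >> $oFinal then tries to execute that string as a command instead of writing it. A minimal sketch of the direct fix, keeping the original variable names:

oFile=/path/output/outputFile_
oFinal=/path/output.final

for counter in {101..200}
do
    $programme $counter -out "$oFile$counter"
    if [ -s "$oFile$counter" ]                    ## true if the file exists and is non-empty
    then
        tail -n 1 "$oFile$counter" >> "$oFinal"   ## append its last line to the master file
        rm "$oFile$counter"                       ## then delete the per-run file
    fi
done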
I'm trying to create a shell script that will create a batch of files of a specified amount. When the amount is reached, the script stops. When the script is re-executed, the files pick up from the last file created. So if the script creates files 1-10 on the first run, the next execution should create 11-20, and so on.
#!/bin/bash
NAME=XXXX
valid=true
NUMBER=1
while [ $NUMBER -le 5 ];
do
    touch $NAME$NUMBER
    ((NUMBER++))
    echo $NUMBER + "batch created"
    if [ $NUMBER == 5 ];
    then
        break
    fi
    touch $NAME$NUMBER
    ((NUMBER+5))
    echo "batch complete"
done
Based on my comment above and your description, you can write a script that will create 10 numbered files (by default) each time it is run, starting with the next available number. As mentioned, rather than using a raw, unpadded number, it's better for general sorting and listing to use zero-padded numbers, e.g. 001, 002, ...
If you just use 1, 2, ... then you end up with odd sorting each time you cross a power of 10. Consider the first 12 files numbered 1..12 without padding. A general listing sort would produce:
file1
file11
file12
file2
file3
file4
...
where file11 and file12 sort before file2. Adding leading zeros with printf -v avoids the problem.
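For example, printf -v writes the formatted result straight into a variable instead of to stdout:

$ printf -v num "%03d" 7
$ echo "$num"
007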
Taking that into account, and allowing the user to change the prefix (first part of the file name) by giving it as an argument, and also change the number of new files to create by passing the count as the 2nd argument, you could do something like:
#!/bin/bash
prefix="${1:-file_}" ## beginning of filename
number=1 ## start number to look for
ext="txt" ## file extension to add
newcount="${2:-10}" ## count of new files to create
printf -v num "%03d" "$number" ## create 3-digit start number
fname="$prefix$num.$ext" ## form first filename
while [ -e "$fname" ]; do ## while filename exists
number=$((number + 1)) ## increment number
printf -v num "%03d" "$number" ## form 3-digit number
fname="$prefix$num.$ext" ## form filename
done
while ((newcount--)); do ## loop newcount times
touch "$fname" ## create filename
((! newcount)) && break; ## newcount 0, break (optional)
number=$((number + 1)) ## increment number
printf -v num "%03d" "$number" ## form 3-digit number
fname="$prefix$num.$ext" ## form filename
done
Running the script without arguments will create the first 10 files, file_001.txt - file_010.txt. Run a second time, it would create 10 more files file_011.txt to file_020.txt.
To create a new group of 5 files with the prefix of list_, you would do:
bash scriptname list_ 5
Which would result in the 5 files list_001.txt to list_005.txt. Running again with the same options would create list_006.txt to list_010.txt.
Since the scheme above with 3 digits is limited to 1000 files max (if you include 000), there isn't a big need to get the starting number from the last file written (bash can count to 1000 quite fast). However, if you used 7 digits, for 10 million files, then you would want to parse the last number with ls -1 | tail -n 1 (or version sort and choose the last file). Something like the following would do:
number=$(ls -1 "$prefix"* | tail -n 1 | grep -o '[1-9][0-9]*')
(note: that is ls -(one) not ls -(ell))
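Alternatively, here is a small sketch that avoids parsing ls output altogether by letting the shell glob do the sorting (this relies on the zero-padding keeping lexical order numeric, and on bash 4.3+ for the negative array index):

files=( "$prefix"*."$ext" )     ## glob expands in sorted order
last=${files[-1]}               ## highest-numbered file
num=${last#"$prefix"}           ## strip the prefix...
num=${num%."$ext"}              ## ...and the extension
number=$(( 10#$num + 1 ))       ## force base-10 so leading zeros aren't read as octal

If no file exists yet, the unmatched glob stays literal, so a real script would enable shopt -s nullglob and check that the array is non-empty first.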
Let me know if that is what you are looking for.
I have a large directory of data files which I am in the process of manipulating to get them in a desired format. They each begin and end 15 lines too soon, meaning I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence.
To begin, I have written the following code to separate the relevant data into easy chunks:
#!/bin/bash
destination='media/user/directory/'
for file1 in `ls $destination*.ascii`
do
    echo $file1
    file2="${file1}.end"
    file3="${file1}.snip"
    sed -e '16,$d' $file1 > $file2
    sed -e '1,15d' $file1 > $file3
done
This worked perfectly, so the next step is the world's simplest cat command:
cat $file3 $file2 > outfile
However, what I need to do is to stitch file2 to the previous file3. The files in the directory are named sequentially by date, one per day:
*_20090412T235945_20090413T235944_* ### April 13
*_20090413T235945_20090414T235944_* ### April 14
So I need to take the 15 lines snipped off the April 14 example above and paste it to the end of the April 13 example.
This doesn't have to be part of the original code; in fact, it would probably be best if it weren't. I was just hoping someone would be able to help me get this going.
Thanks in advance! If there is anything I have been unclear about that needs further explanation, please let me know.
"I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence."
If I understand what you want correctly, it can be done with one line of code:
awk 'NR==1 || FNR==16{close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
When this has run, the files file1.new, file2.new, and file3.new will be in the new form with the lines transferred. Of course, you are not limited to three files: you may specify as many as you like on the command line.
Example
To keep our example short, let's just strip the first 2 lines instead of 15. Consider these test files:
$ cat file1
1
2
3
$ cat file2
4
5
6
7
8
$ cat file3
9
10
11
12
13
14
15
Here is the result of running our command:
$ awk 'NR==1 || FNR==3{close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
$ cat file1.new
1
2
3
4
5
$ cat file2.new
6
7
8
9
10
$ cat file3.new
11
12
13
14
15
As you can see, the first two lines of each file have been transferred to the preceding file.
How it works
awk implicitly reads each file line-by-line. The job of our code is to choose which new file a line should be written to based on its line number. The variable f will contain the name of the file that we are writing to.
NR==1 || FNR==16{f=FILENAME ".new"}
When we are reading the first line of the first file, NR==1, or when we are reading the 16th line of whatever file we are on, FNR==16, we update f to be the name of the current file with .new added to the end.
For the short example, which transferred 2 lines instead of 15, we used the same code but with FNR==16 replaced with FNR==3.
print>f
This prints the current line to file f.
(If this was a shell script, we would use >>. This is not a shell script. This is awk.)
Using a glob to specify the file names
destination='media/user/directory/'
awk 'NR==1 || FNR==16{close(f); f=FILENAME ".new"} {print>f}' "$destination"*.ascii
Your task is not that difficult at all. You want to gather a list of all _end files in the directory (using a for loop and globbing, NOT looping over the output of ls). Once you have all the end files, you simply parse the dates out of each name using parameter expansion with substring removal, say into d1 and d2 for date1 and date2 in:
stuff_20090413T235945_20090414T235944_end
      |  d1  |        |  d2  |
Then you simply subtract 1 from d1 (into, say, date0 or d0) and construct the previous filename out of d0 and d1, using _snip instead of _end. (Note that the plain d1 - 1 arithmetic only works for consecutive days within a month; crossing a month boundary would need real date arithmetic, e.g. GNU date -d.) Then just test for the existence of the previous _snip filename, and if it exists, paste your info from the current _end file onto the previous _snip file, e.g.
#!/bin/bash

for i in *end; do                ## find all _end files
    d1="${i#*stuff_}"            ## isolate first date in filename
    d1="${d1%%T*}"
    d2="${i%T*}"                 ## isolate second date
    d2="${d2##*_}"
    d0=$((d1 - 1))               ## subtract 1 from first date to get snip date
    prev="${i/$d1/$d0}"          ## create previous 'snip' filename
    prev="${prev/$d2/$d1}"
    prev="${prev%end}snip"
    if [ -f "$prev" ]            ## test that previous snip file exists
    then
        printf "paste to : %s\n" "$prev"
        printf "    from : %s\n\n" "$i"
    fi
done
Test Input Files
$ ls -1
stuff_20090413T235945_20090414T235944_end
stuff_20090413T235945_20090414T235944_snip
stuff_20090414T235945_20090415T235944_end
stuff_20090414T235945_20090415T235944_snip
stuff_20090415T235945_20090416T235944_end
stuff_20090415T235945_20090416T235944_snip
stuff_20090416T235945_20090417T235944_end
stuff_20090416T235945_20090417T235944_snip
stuff_20090417T235945_20090418T235944_end
stuff_20090417T235945_20090418T235944_snip
stuff_20090418T235945_20090419T235944_end
stuff_20090418T235945_20090419T235944_snip
Example Use/Output
$ bash endsnip.sh
paste to : stuff_20090413T235945_20090414T235944_snip
    from : stuff_20090414T235945_20090415T235944_end

paste to : stuff_20090414T235945_20090415T235944_snip
    from : stuff_20090415T235945_20090416T235944_end

paste to : stuff_20090415T235945_20090416T235944_snip
    from : stuff_20090416T235945_20090417T235944_end

paste to : stuff_20090416T235945_20090417T235944_snip
    from : stuff_20090417T235945_20090418T235944_end

paste to : stuff_20090417T235945_20090418T235944_snip
    from : stuff_20090418T235945_20090419T235944_end
(of course replace stuff_ with your actual prefix)
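When the pairing looks right, the two printf lines can be swapped for the actual append, e.g.:

    if [ -f "$prev" ]
    then
        cat "$i" >> "$prev"      ## append the snipped lines to the previous file
    fi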
Let me know if you have questions.
You could store the previous $file3 value in a variable (and skip the very first iteration, where $prev is still empty, with a -n check):
#!/bin/bash
destination='media/user/directory/'
prev=""
for file1 in $destination*.ascii
do
echo $file1
file2="${file1}.end"
file3="${file1}.snip"
sed -e '16,$d' $file1 > $file2
sed -e '1,15d' $file1 > $file3
if [ -z "$prev" ]; then
cat $prev $file2 > outfile
fi
prev=$file3
done
I am absolutely new to bash scripting, but I need to perform some task with it. I have a file with just one column of numbers (6,250,000 of them). I need to extract 100 at a time, put them into a new file, and submit each 100 to another program. I think this should be some kind of loop going through my file 100 numbers at a time and submitting them to the program.
Let's say the numbers in the file look like this:
1.6435
-1.2903
1.1782
-0.7192
-0.4098
-1.7354
-0.4194
0.2427
0.2852
I need to feed each of those 62500 output files to a program which has a parameter file. I was doing something like this:
lossopt()
{
cat<<END>temp.par
Parameters for LOSSOPT
***********************
START OF PARAMETERS:
lossin.out \Input file with distribution
1 \column number
lossopt.out \Output file
-3.0 3.0 0.01 \xmin, xmax, xinc
-3.0 1
0.0 0.0
0.0 0.0
3.0 0.12
END
}
for i in {1..62500}
do
    sed -n 1,100p ./rearnum.out > ./lossin.out
    echo temp.par | ./lossopt >> lossopt.out
    rm lossin.out
    cut -d " " -f 101- rearnum.out > rearnum.out
done
rearnum.out is my big initial file.
If you need to split it into files containing 100 lines each, I'd use split -l 100 <source> which will create a lot of files named like xaa, xab, xac, ... each of which contain at most 100 lines of the source file (the last file may contain fewer). If you want the names to start with something other than x you can give the prefix those names should use as the last argument to split as in split -l 100 <source> OUT which will now give files like OUTaa, OUTab, ...
Then you can loop over those files and process them however you like. If you need to run a script with them you could do something like
for file in OUT*; do
<other_script> "$file"
done
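Applied to the question's rearnum.out, that could look like the sketch below. It assumes temp.par has already been written as in the question, and that the parameter file points lossopt at lossin.out; -a 5 widens the suffixes, since the default two letters cannot name 62,500 chunks on every split implementation:

split -l 100 -a 5 rearnum.out part_      ## 62,500 chunks of 100 lines each
for f in part_*; do
    cp "$f" lossin.out                   ## the input name the parameter file expects
    echo temp.par | ./lossopt >> lossopt.out
    rm "$f" lossin.out                   ## clean up as we go
done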
You can still use a read loop and redirection:
#!/bin/bash

fnbase=${1:-file}       ## filename prefix
increment=${2:-100}     ## lines per output file
declare -i count=0
declare -i fcount=1
fname="$(printf "%s_%08d" "$fnbase" "$fcount")"

while read -r line; do
    ((count == 0)) && :> "$fname"       ## truncate the file before the first write
    ((count++))
    echo "$line" >> "$fname"
    ((count % increment == 0)) && {     ## increment reached: rotate to the next file
        count=0
        ((fcount++))
        fname="$(printf "%s_%08d" "$fnbase" "$fcount")"
    }
done

exit 0
use/output
$ bash script.sh yourprefix <yourfile
Which will take yourfile with many thousands of lines and write every 100 lines out to yourprefix_00000001 -> yourprefix_99999999 (the default prefix gives file_00000001, etc.). Each new filename is truncated to 0 lines before writing begins.
Again you can specify on the command line the number of lines to write to each file. E.g.:
$ bash script.sh yourprefix 20 <yourfile
Which will write 20 lines per file to yourprefix_00000001 -> yourprefix_99999999
Even though it may look clumsy to a bash professional, I will take the risk and post my own answer to my question.
cat<<END>temp.par
Parameters for LOSSOPT
***********************
START OF PARAMETERS:
lossin.out \Input file with distribution
1 \column number
lossopt.out \Output file
-3.0 3.0 0.01 \xmin, xmax, xinc
-3.0 1
0.0 0.0
0.0 0.0
3.0 0.12
END
for i in {1..62500}
do
    sed -n 1,100p ./rearnum.out > ./lossin.out   ## copy the next 100 numbers
    echo temp.par | ./lossopt >> sdis.out        ## feed them to the program
    rm lossin.out
    tail -n +101 rearnum.out > temp              ## drop those 100 lines from the big file
    mv temp rearnum.out
done
This script sequentially "eats" the big initial file and feeds the "pieces" to the external program. After it takes one portion of 100 numbers, it deletes that portion from the big file. The process repeats until the big file is empty. It is not an elegant solution, but it worked for me.
I have a file whose contents look like this:
|T1234
010000000000
02123456878
05122345600000000000000
07445678920000000000000
09000000000123000000000
10000000000000000000000
.T1234
|T798
013457829
0298365799
05600002222222222222222
09348977722220000000000
10000057000004578933333
.T798
Here one complete batch starts with |T and ends with .T.
The file above contains 2 batches.
I want to delete a batch based on its record 10 line (positions 1-2 contain 10): if positions 3 through 20 are all 0, the whole batch should be deleted.
Please let me know how I can achieve this with a shell script, syncsort, sed, or awk.
I am still a little unclear about exactly what you want, but I think I understand enough to give you an outline of a bash solution. The part I was unclear on is exactly which line contains the leading 10 and the remaining 0's, but it looks like that is the last line in each batch. Not knowing exactly how you want the batch with the matching 10 handled, I have simply written the remaining wanted batch(es) out to a file called newbatch.txt in the current working directory.
The basic outline of the script is to read each batch into a temporary array. If the 10-and-zeros match is found during the read, a flag is set to delete the batch. After the last line of a batch is read, the script checks the flag: if it is set, it simply reports which batch is being deleted; if not, it writes the batch to ./newbatch.txt.
Let me know if your requirements are different, but this should be fairly close to a solution. The code is fairly well commented. If you have questions, just drop a comment.
#!/bin/bash

ifn=${1:-dat/batch.txt}     # input filename
ofn=./newbatch.txt          # output filename
:>"$ofn"                    # truncate output file

declare -i bln=0            # batch line number
declare -i delb=0           # delete batch flag
declare -a ba               # temporary batch array

[ -r "$ifn" ] || {          # test input file readable
    printf "error: file not readable. usage: %s filename\n" "${0//*\//}"
    exit 1
}

## read each line in input file
while read -r line || test -n "$line"; do
    printf " %d %s\n" $bln "$line"
    ba+=( "$line" )                     # add line to array
    ## if chars 1-2 == 10 and the tail of the record is all zeros...
    if [[ ${line:0:2} == 10 && ${line:3} == 00000000000000000000 ]]; then
        delb=1                          # set delete flag
    fi
    ((bln++))                           # increment line number
    ## if the line starts with '.' we are at the end of a batch
    if [[ ${line:0:1} == . ]]; then
        ## if the delete batch flag is set
        if [ $delb -eq 1 ]; then
            ## do nothing (but show batch no. to delete)
            printf " => deleting batch : %s\n" "${ba[0]}"
        ## if delb not set, then write the batch to the output file
        else
            printf "%s\n" "${ba[@]}" >> "$ofn"
        fi
        ## reset line no., flag, and unset the array
        bln=0
        delb=0
        unset ba
    fi
done <"$ifn"

exit 0
Output (to stdout)
$ bash batchdel.sh
0 |T1234
1 010000000000
2 02123456878
3 05122345600000000000000
4 07445678920000000000000
5 09000000000123000000000
6 10000000000000000000000
7 .T1234
=> deleting batch : |T1234
0 |T798
1 013457829
2 0298365799
3 05600002222222222222222
4 09348977722220000000000
5 10000057000004578933333
6 .T798
Output (to newbatch.txt)
$ cat newbatch.txt
|T798
013457829
0298365799
05600002222222222222222
09348977722220000000000
10000057000004578933333
.T798
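Since you also mention awk: the same filter can be written as a short awk sketch. This assumes, like the bash version above, that a batch is dropped when a line starts with 10 and everything after the first two characters is zeros:

awk '/^\|T/ { buf = ""; del = 0 }                  # batch start: reset buffer and flag
     { buf = buf $0 ORS }                          # accumulate the current line
     /^10/ && substr($0, 3) ~ /^0+$/ { del = 1 }   # record 10 with only zeros after "10"
     /^\.T/ { if (!del) printf "%s", buf }         # batch end: print only kept batches
' batch.txt > newbatch.txt

Each batch is buffered until its closing .T line and is only written out if no all-zero record 10 was seen in between.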
Suppose I have a file (sizes.txt)
daveclark#foo.com 0 23252 0
mikeclark#foo.com 0 45131 1
clark#foo.com 0 55235 0
joeclark#bar.net 33632 1
maryclark#bar.net 0 55523 0
clark#bar.net 0 99356 0
Now I have another file (users.txt)
clark#foo.com
clark#bar.net
What I want to do is find each line in sizes.txt for the specific email addresses in users.txt, using a loop, bash, or a one-liner on CentOS. Here's the key point: I need to find the lines that contain exactly clark#foo.com and then clark#bar.net, meaning the result should be one line for each.
The simplest way that comes to mind...
for i in `cat users.txt`; do grep $i sizes.txt; done
...but this does not work, because processing the first line of users.txt returns the lines containing daveclark#foo.com, mikeclark#foo.com, and clark#foo.com. I explicitly want the line containing "clark#foo.com" (the third line of sizes.txt). Processing the second line of users.txt has the same problem (it returns the maryclark#bar.net and clark#bar.net lines). I know this has to be something totally simple that I'm overlooking.
What you are looking for is an exact, whole-word match with grep. In your case that is the -w option.
So
for i in $(cat users.txt); do
    grep -w "^$i" sizes.txt
done
should do the trick.
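As a side note, the loop can be avoided entirely if your grep supports -f, which reads all the patterns from users.txt in one pass. A sketch, adding -F so the addresses are matched literally (a dot in a regular expression would otherwise match any character):

grep -wFf users.txt sizes.txt

Here -f users.txt supplies one pattern per line, -F makes them fixed strings, and -w still enforces the whole-word match, so daveclark#foo.com cannot match the pattern clark#foo.com.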
Cheers.
You can try something like this using only bash built-in functions and syntax:
while read -r user ; do
    while read -r s_user s_column_2 s_column_3 s_column_4 ; do
        [ "${s_user}" = "${user}" ] && printf "%b\t%b\t%b\t%b\n" "${s_user}" "${s_column_2}" "${s_column_3}" "${s_column_4}"
    done < sizes.txt
done < users.txt
This nested while loop can be slow with a big sizes.txt, since the whole file is re-read once per user. In those cases you could use awk instead, as in the sketch below.
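For example, a minimal awk sketch, assuming the email address is always the first whitespace-separated field of sizes.txt:

awk 'NR==FNR { users[$1]; next } $1 in users' users.txt sizes.txt

While awk reads users.txt (where NR==FNR), it stores each address as an array key; for sizes.txt it prints a line only when the first field is an exact key, so both files are read just once and substring matches like daveclark#foo.com cannot slip through.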