I want to delete a batch from file - shell

I have a file and contents are like :
|T1234
010000000000
02123456878
05122345600000000000000
07445678920000000000000
09000000000123000000000
10000000000000000000000
.T1234
|T798
013457829
0298365799
05600002222222222222222
09348977722220000000000
10000057000004578933333
.T798
Here one complete batch starts with a |T line and ends with a .T line, so the file above contains 2 batches.
I want to edit this file and delete a whole batch when its record 10 (identified by positions 1-2) contains 0 in every position from 3 through 20.
Please let me know how I can achieve this with a shell script, syncsort, sed, or awk.

I am still a little unclear about exactly what you want, but I think I have it enough to give you an outline on a bash solution. The part I was unclear on is exactly which line contained the first two characters of 10 and remaining 0's, but it looks like that is the last line in each batch. Not knowing exactly how you wanted the batch (with the matching 10) handled, I have simply written the remaining wanted batch(es) out to a file called newbatch.txt in the current working directory.
The basic outline of the script is to read each batch into a temporary array. If during the read, the 10 and 0's match is found, it sets a flag to delete the batch. After the last line is read, it checks the flag, if set simply outputs the batch number to delete. If the flag is not set, then it writes the batch to ./newbatch.txt.
Let me know if your requirements are different, but this should be fairly close to a solution. The code is fairly well commented. If you have questions, just drop a comment.
#!/bin/bash
ifn=${1:-dat/batch.txt} # input filename
ofn=./newbatch.txt # output filename
:>"$ofn" # truncate output filename
declare -i bln=0 # batch line number
declare -i delb=0 # delete batch flag
declare -a ba # temporary batch array
[ -r "$ifn" ] || { # test input file readable
printf "error: file not readable. usage: %s filename\n" "${0//*\//}"
exit 1
}
## read each line in input file
while read -r line || test -n "$line"; do
printf " %d %s\n" $bln "$line"
ba+=( "$line" ) # add line to array
## if chars 1-2 == 10 and chars 3-20 are all 0
if [ "${line:0:2}" = 10 ] && [ "${line:2:18}" = 000000000000000000 ]; then
delb=1 # set delete flag
fi
((bln++)) # increment line number
## if the line starts with '.'
if [ "${line:0:1}" = '.' ]; then
## if the delete batch flag is set
if [ $delb -eq 1 ]; then
## do nothing (but show batch no. to delete)
printf " => deleting batch : %s\n" "${ba[0]}"
## if delb not set, then write the batch to output file
else
printf "%s\n" "${ba[@]}" >> "$ofn"
fi
## reset line no., flags, and unset array.
bln=0
delb=0
unset ba
fi
done <"$ifn"
exit 0
Output (to stdout)
$ bash batchdel.sh
0 |T1234
1 010000000000
2 02123456878
3 05122345600000000000000
4 07445678920000000000000
5 09000000000123000000000
6 10000000000000000000000
7 .T1234
=> deleting batch : |T1234
0 |T798
1 013457829
2 0298365799
3 05600002222222222222222
4 09348977722220000000000
5 10000057000004578933333
6 .T798
Output (to newbatch.txt)
$ cat newbatch.txt
|T798
013457829
0298365799
05600002222222222222222
09348977722220000000000
10000057000004578933333
.T798
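Since the question also mentions awk, the same buffer-then-decide idea can be sketched there. This is only a sketch, assuming every batch opens with a |T line and closes with a .T line; the function name filter_batches is made up for illustration and is not part of the answer above.

```bash
#!/bin/bash
# filter_batches FILE (hypothetical helper)
# Prints every batch except those whose record 10 has 0 in positions 3-20.
filter_batches() {
    awk '
        index($0, "|T") == 1 { buf = ""; drop = 0 }   # header line: start a new batch
        { buf = buf $0 "\n" }                          # buffer every line of the batch
        substr($0, 1, 2) == "10" && substr($0, 3, 18) == "000000000000000000" {
            drop = 1                                   # record 10 is all zeros: mark the batch
        }
        index($0, ".T") == 1 { if (!drop) printf "%s", buf; buf = "" }  # trailer: flush or discard
    ' "$1"
}
```

It could be used as filter_batches dat/batch.txt > newbatch.txt, which should leave only the |T798 batch for the sample data.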


shell script to create multiple files, incrementing from last file upon next execution

I'm trying to create a shell script that will create multiple files (or a batch of files) of a specified amount. When the amount is reached, script stops. When the script is re-executed, the files pick up from the last file created. So if the script creates files 1-10 on first run, then on the next script execution should create 11-20, and so on.
#!/bin/bash
NAME=XXXX
valid=true
NUMBER=1
while [ $NUMBER -le 5 ];
do
touch $NAME$NUMBER
((NUMBER++))
echo $NUMBER + "batch created"
if [ $NUMBER == 5 ];
then
break
fi
touch $NAME$NUMBER
((NUMBER+5))
echo "batch complete"
done
Based on my comment above and your description, you can write a script that will create 10 numbered files (by default) each time it is run, starting with the next available number. As mentioned, rather than just use a raw-unpadded number, it's better for general sorting and listing to use zero-padded numbers, e.g. 001, 002, ...
If you just use 1, 2, ... then you end up with odd sorting each time you reach a power of 10. Consider the first 12 files numbered 1...12 without padding. A general listing sort would produce:
file1
file11
file12
file2
file3
file4
...
Where 11 and 12 are sorted before 2. Adding leading zeros with printf -v avoids the problem.
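A quick way to see the effect of the padding, as a small sketch (the helper name padded_names is hypothetical):

```bash
#!/bin/bash
# padded_names N...: print "file" followed by each number zero-padded to 3 digits
padded_names() {
    local n num
    for n in "$@"; do
        printf -v num '%03d' "$n"    # store the padded number in the variable num
        printf 'file%s\n' "$num"
    done
}

# Sorting the padded names keeps them in numeric order
padded_names 1 2 11 12 | sort
```

With padding, file002 sorts before file011 and file012, which is the order you would expect.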
Taking that into account, and allowing the user to change the prefix (first part of the file name) by giving it as an argument, and also change the number of new files to create by passing the count as the 2nd argument, you could do something like:
#!/bin/bash
prefix="${1:-file_}" ## beginning of filename
number=1 ## start number to look for
ext="txt" ## file extension to add
newcount="${2:-10}" ## count of new files to create
printf -v num "%03d" "$number" ## create 3-digit start number
fname="$prefix$num.$ext" ## form first filename
while [ -e "$fname" ]; do ## while filename exists
number=$((number + 1)) ## increment number
printf -v num "%03d" "$number" ## form 3-digit number
fname="$prefix$num.$ext" ## form filename
done
while ((newcount--)); do ## loop newcount times
touch "$fname" ## create filename
((! newcount)) && break; ## newcount 0, break (optional)
number=$((number + 1)) ## increment number
printf -v num "%03d" "$number" ## form 3-digit number
fname="$prefix$num.$ext" ## form filename
done
Running the script without arguments will create the first 10 files, file_001.txt - file_010.txt. Run a second time, it would create 10 more files file_011.txt to file_020.txt.
To create a new group of 5 files with the prefix of list_, you would do:
bash scriptname list_ 5
Which would result in the 5 files list_001.txt to list_005.txt. Running again with the same options would create list_006.txt to list_010.txt.
Since the scheme above with 3 digits is limited to 1000 files max (if you include 000), there isn't a big need to get the number from the last file written (bash can count to 1000 quite fast). However, if you used 7-digits, for 10 million files, then you would want to parse the last number with ls -1 | tail -n 1 (or version sort and choose the last file). Something like the following would do:
number=$(ls -1 "$prefix"* | tail -n 1 | grep -o '[1-9][0-9]*')
(note: that is ls -(one) not ls -(ell))
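An alternative that avoids parsing ls output is to let the shell glob do the sorting. The following is only a sketch: next_number is a hypothetical helper, and it assumes the numbers are zero-padded to a fixed width so that glob order matches numeric order. The 10# prefix forces base 10, since a value such as 010 would otherwise be read as octal.

```bash
#!/bin/bash
# next_number PREFIX EXT (hypothetical helper)
# Prints the number that follows the highest existing PREFIX<digits>.EXT file.
next_number() {
    local prefix=$1 ext=$2 last="" f
    for f in "$prefix"[0-9]*."$ext"; do
        [ -e "$f" ] && last=$f          # glob results are sorted, so the last match wins
    done
    if [ -z "$last" ]; then
        echo 1                          # nothing found: start at 1
        return
    fi
    last=${last#"$prefix"}              # strip the prefix
    last=${last%."$ext"}                # strip the extension
    echo $((10#$last + 1))              # force base 10 so 010 is not read as octal
}
```

For example, number=$(next_number "$prefix" "$ext") could replace the counting loop at the top of the script.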
Let me know if that is what you are looking for.

Bash: checking substring increments with modular arithmetic

I have a list of files with file names that contain a substring of 6 numbers that represents HHMMSS, HH: 2 digits hour, MM: 2 digits minutes, SS: 2 digits seconds.
If the list of files is ordered, the increments should be in steps of 30 minutes, that is, the first substring should be 000000, followed by 003000, 010000, 013000, ..., 233000.
I want to check that no file is missing iterating the list of files and checking that neither of these substrings is missing. My approach:
string_check=000000
for file in ${file_list[@]}; do
if [[ ${file:22:6} == $string_check ]]; then
echo "Ok"
else
echo "Problem: an hour (file) is missing"
exit 99
fi
string_check=$((string_check+3000)) #this is the key line
done
And the previous to the last line is the key. It should be formatted to 6 digits, I know how to do that, but I want to add time like a clock, or, in more specific words, modular arithmetic modulo 60. How can that be done?
Assumptions:
all 6-digit strings are of the format xx[03]000 (ie, the minutes must be exactly 00 or 30, with no seconds)
if there are strings like xx1529 ... these will be ignored (see 2nd half of answer - use of comm - to address OP's comment about these types of strings being an error)
Instead of trying to do a bunch of mod 60 math for the MM (minutes) portion of the string, we can use a sequence generator to generate all the desired strings:
$ for string_check in {00..23}{00,30}00; do echo $string_check; done
000000
003000
010000
013000
... snip ...
230000
233000
While OP should be able to add this to the current code, I'm thinking we might go one step further and look at pre-parsing all of the filenames, pulling the 6-digit strings into an associative array (ie, the 6-digit strings act as the indexes), eg:
unset myarray
declare -A myarray
for file in "${file_list[@]}"
do
myarray[${file:22:6}]+=" ${file}" # in case multiple files have same 6-digit string
done
Using the sequence generator as the driver of our logic, we can pull this together like such:
for string_check in {00..23}{00,30}00
do
[[ -z "${myarray[${string_check}]}" ]] &&
echo "Problem: (file) '${string_check}' is missing"
done
NOTE: OP can decide if the process should finish checking all strings or if it should exit on the first missing string (per OP's current code).
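If the clock-style arithmetic from the question is still wanted, one way to sketch it is to convert HHMM to minutes since midnight, add 30, and wrap at 1440 (24 * 60). The function name next_slot is hypothetical, and the sketch assumes the seconds are always 00; the 10# prefix keeps values such as 08 and 09 from being treated as invalid octal.

```bash
#!/bin/bash
# next_slot HHMMSS (hypothetical helper): print the slot 30 minutes later, wrapping at midnight
next_slot() {
    local hhmmss=$1
    local hh=$((10#${hhmmss:0:2}))                 # hours, forced to base 10
    local mm=$((10#${hhmmss:2:2}))                 # minutes, forced to base 10
    local total=$(( (hh * 60 + mm + 30) % 1440 ))  # minutes since midnight, modulo one day
    printf '%02d%02d00\n' $((total / 60)) $((total % 60))
}
```

In the original loop, the key line would then become string_check=$(next_slot "$string_check").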
One idea for using comm to compare the 2 lists of strings:
# display sequence generated strings that do not exist in the array:
comm -23 <(printf "%s\n" {00..23}{00,30}00) <(printf "%s\n" "${!myarray[@]}" | sort)
# OP has commented that strings not like 'xx[03]000' should generate an error;
# display strings (extracted from file names) that do not exist in the sequence
comm -13 <(printf "%s\n" {00..23}{00,30}00) <(printf "%s\n" "${!myarray[@]}" | sort)
Where:
comm -23 - display only the lines from the first 'file' that do not exist in the second 'file' (ie, missing sequences of the format xx[03]000)
comm -13 - display only the lines from the second 'file' that do not exist in the first 'file' (ie, filenames with strings not of the format xx[03]000)
These lists could then be used as input to a loop, or passed to xargs, for additional processing as needed; keeping in mind the comm -13 output will display the indices of the array, while the associated contents of the array will contain the name of the original file(s) from which the 6-digit string was derived.
This is easy to do with a POSIX shell, using only built-ins:
#!/usr/bin/env sh
# Print an x for each glob matched file, and store result in string_check
string_check=$(printf '%.0sx' ./*[0-2][0-9][03]000*)
# Now string_check length reflects the number of matches
if [ ${#string_check} -eq 48 ]; then
echo "Ok"
else
echo "Problem: an hour (file) is missing"
exit 99
fi
Alternatively:
#!/usr/bin/env sh
if [ "$(printf '%.0sx' ./*[0-2][0-9][03]000*)" \
= 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' ]; then
echo "Ok"
else
echo "Problem: an hour (file) is missing"
exit 99
fi

Creating a progress bar for BASH script exporting system log files

Essentially, for a set number of system logs pulled and exported, I need to indicate the script's progress by printing the character "#". This should eventually create a progress bar with a width of 60, something like: ############################################# Additionally, I need the characters to build from left to right to indicate the progression of the script.
The Question/Problem that this code was based off of goes as follows: "Use a separate invocation of wevtutil el to get the count of the number of logs and scale this to,say, a width of 60."
SYSNAM=$(hostname)
LOGDIR=${1:-/tmp/${SYSNAM}_logs}
i=0
LOGCOUNT=$( wevtutil el | wc -l )
x=$(( LOGCOUNT/60 ))
wevtutil el | while read ALOG
do
ALOG="${ALOG%$'\r'}"
printf "${ALOG}:\r"
SAFNAM="${ALOG// /_}"
SAFNAM="${SAFNAM//\//-}"
wevtutil epl "$ALOG" "${SYSNAM}_${SAFNAM}.evtx"
done
I've attempted methods such as using echo -ne "#", and printf "#%0.s" however the issue that I encounter is that the "#" characters gets printed with each instance of the name of the log file being retrieved; also the pattern is printed vertically rather than horizontally.
LOGCOUNT=$( wevtutil el | wc -l )
x=$(( LOGCOUNT/60 ))
echo -ne "["
for i in {1..60}
do
if [[ $(( x*i )) != $LOGCOUNT ]]
then
echo -ne "#"
#printf '#%0.s'
fi
done
echo "]"
printf "\n"
echo "Transfer Complete."
echo "Total Log Files Transferred: $LOGCOUNT"
I tried previously integrating this code into the first block but no luck. But something tells me that I don't need to establish a whole new loop, I keep thinking that the first block of code only needs a few lines of modification. Anyhow sorry for the lengthy explanation, please let me know if anything additional is needed for assistance--Thank you.
For the sake of this answer I'm going to assume the desired output is a 2-liner that looks something like:
$ statbar
file: /bin/cygdbusmenu-qt5-2.dll
[######## ]
The following may not work for everyone as it comes down to individual terminal attributes and how they can(not) be manipulated by tput (ie, ymmv) ...
For my sample script I'm going to loop through the contents of /bin, printing the name of each file as I process it, while updating the status bar with a new '#' after each 20 files:
there are 719 files under my /bin so there should be 35 #'s in my status bar (I add an extra # at the end once processing has completed)
we'll use a few tput commands to handle cursor/line movement, plus erasing previous output from a line
for printing the status bar I've pre-calculated the number of #'s and then use 2 variables ... $barspace for spaces, $barhash for #'s; for each 20 files I strip a space off $barspace and add a single # to $barhash; by (re)printing these 2x variables every 20x files I get the appearance of a moving status bar
Putting this all together:
$ cat statbar
clear # make sure we have plenty of room to display our status bar;
# if we're at the bottom of the console/window and we cause the
# windows to 'scroll up' then 'tput sc/rc' will not work
tput sc # save pointer/reference to current terminal line
erase=$(tput el) # save control code for 'erase (rest of) line'
# init some variables; get a count of the number of files so we can pre-calculate the total length of our status bar
modcount=20
filecount=$(find /bin -type f | wc -l)
# generate a string of filecount/20+1 spaces (35+1 for my particular /bin)
barspace=
for ((i=1; i<=(filecount/modcount+1); i++))
do
barspace="${barspace} "
done
barhash= # start with no #'s for this variable
filecount=0 # we'll re-use this variable to keep track of # of files processed so need to reset
while read -r filename
do
filecount=$((filecount+1))
tput rc # return cursor to previously saved terminal line (tput sc)
# print filename (1st line of output); if shorter than previous filename we need to erase rest of line
printf "file: %s%s\n" "${filename}" "${erase}" # pass the name as an argument so a '%' in it is not treated as a format
# print our status bar (2nd line of output) on the first and every ${modcount} pass through loop;
if [ ${filecount} -eq 1 ]
then
printf "[${barhash}${barspace}]\n"
elif [[ $((filecount % ${modcount} )) -eq 0 ]]
then
# for every ${modcount}th file we ...
barspace=${barspace:1:100000} # strip a space from barspace
barhash="${barhash}#" # add a '#' to barhash
printf "[${barhash}${barspace}]\n" # print our new status bar
fi
done < <(find /bin -type f | sort -V)
# finish up the status bar (should only be 1 space left to 'convert' to a '#')
tput rc
printf "file: -- DONE --\n"
if [ ${#barspace} -gt 0 ]
then
barspace=${barspace:1:100000}
barhash="${barhash}#"
fi
printf "[${barhash}${barspace}]\n"
NOTE: While testing I had to periodically reset my terminal in order for the tput commands to function properly, eg:
$ reset
$ statbar
I couldn't get the above to work on any of the (internet) fiddle sites (basically having problems getting tput to work with the web-based 'terminals').
Here's a gif displaying the behavior ...
NOTES:
the script does print every filename to stdout but since this script isn't actually doing anything with the files in question a) the printfs occur quite rapidly and b) the video/gif only captures a (relatively) few fleeting images ("Duh, Mark!" ?)
the last printf "file: -- DONE --\n" was added after I created the gif, and I'm being lazy by not generating and uploading a new gif
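For the fixed width of 60 that the question asks for, a single-line variant that redraws the bar with a carriage return is another option. This is only a sketch: draw_bar is a hypothetical helper, it assumes the total is greater than 0, and it relies on the terminal honoring \r.

```bash
#!/bin/bash
# draw_bar DONE TOTAL [WIDTH] (hypothetical helper)
# Redraws a one-line bar in place; WIDTH defaults to 60 columns.
draw_bar() {
    local done_n=$1 total=$2 width=${3:-60}
    local filled=$(( done_n * width / total ))      # scale progress to the bar width
    local hashes spaces
    printf -v hashes '%*s' "$filled" ''             # $filled spaces ...
    printf -v spaces '%*s' "$(( width - filled ))" ''
    printf '\r[%s%s]' "${hashes// /#}" "$spaces"    # ... turned into '#' characters
}

# Usage idea inside the export loop (i counts logs processed so far):
#   i=$((i + 1)); draw_bar "$i" "$LOGCOUNT"
# followed by a final 'echo' after the loop to move to a new line.
```

Because the bar is reprinted on the same line, the log names would need to be printed elsewhere (or not at all) so they do not interleave with the bar.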

Make cat command to operate recursively looping through a directory

I have a large directory of data files which I am in the process of manipulating to get them in a desired format. They each begin and end 15 lines too soon, meaning I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence.
To begin, I have written the following code to separate the relevant data into easy chunks:
#!/bin/bash
destination='media/user/directory/'
for file1 in `ls $destination*.ascii`
do
echo $file1
file2="${file1}.end"
file3="${file1}.snip"
sed -e '16,$d' $file1 > $file2
sed -e '1,15d' $file1 > $file3
done
This worked perfectly, so the next step is the worlds simplest cat command:
cat $file3 $file2 > outfile
However, what I need to do is to stitch file2 to the previous file3. Look at this screenshot of the directory for better understanding.
See how these files are all sequential over time:
*_20090412T235945_20090413T235944_* ### April 13
*_20090413T235945_20090414T235944_* ### April 14
So I need to take the 15 lines snipped off the April 14 example above and paste it to the end of the April 13 example.
This doesn't have to be part of the original code, in fact it would be probably best if it weren't. I was just hoping someone would be able to help me get this going.
Thanks in advance! If there is anything I have been unclear about and needs further explanation please let me know.
"I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence."
If I understand what you want correctly, it can be done with one line of code:
awk 'NR==1 || FNR==16{close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
When this has run, the files file1.new, file2.new, and file3.new will be in the new form with the lines transferred. Of course, you are not limited to three files: you may specify as many as you like on the command line.
Example
To keep our example short, let's just strip the first 2 lines instead of 15. Consider these test files:
$ cat file1
1
2
3
$ cat file2
4
5
6
7
8
$ cat file3
9
10
11
12
13
14
15
Here is the result of running our command:
$ awk 'NR==1 || FNR==3{close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
$ cat file1.new
1
2
3
4
5
$ cat file2.new
6
7
8
9
10
$ cat file3.new
11
12
13
14
15
As you can see, the first two lines of each file have been transferred to the preceding file.
How it works
awk implicitly reads each file line-by-line. The job of our code is to choose which new file a line should be written to based on its line number. The variable f will contain the name of the file that we are writing to.
NR==1 || FNR==16{f=FILENAME ".new"}
When we are reading the first line of the first file, NR==1, or when we are reading the 16th line of whatever file we are on, FNR==16, we update f to be the name of the current file with .new added to the end.
For the short example, which transferred 2 lines instead of 15, we used the same code but with FNR==16 replaced with FNR==3.
print>f
This prints the current line to file f.
(If this was a shell script, we would use >>. This is not a shell script. This is awk.)
Using a glob to specify the file names
destination='media/user/directory/'
awk 'NR==1 || FNR==16{close(f); f=FILENAME ".new"} {print>f}' "$destination"*.ascii
Your task is not that difficult at all. You want to gather a list of all _end files in the directory (using a for loop and globbing, NOT looping on the results of ls). Once you have all the end files, you simply parse the dates using parameter expansion with substring removal, say into d1 and d2 for date1 and date2 in:
stuff_20090413T235945_20090414T235944_end
      |  d1  |        |  d2  |
then you simply subtract 1 from d1 into say date0 or d0 and then construct a previous filename out of d0 and d1 using _snip instead of _end. Then just test for the existence of the previous _snip filename, and if it exists, paste your info from the current _end file to the previous _snip file. e.g.
#!/bin/bash
for i in *end; do ## find all _end files
d1="${i#*stuff_}" ## isolate first date in filename
d1="${d1%%T*}"
d2="${i%T*}" ## isolate second date
d2="${d2##*_}"
d0=$((d1 - 1)) ## subtract 1 from first, get snip d1
prev="${i/$d1/$d0}" ## create previous 'snip' filename
prev="${prev/$d2/$d1}"
prev="${prev%end}snip"
if [ -f "$prev" ] ## test that prev snip file exists
then
printf "paste to : %s\n" "$prev"
printf " from : %s\n\n" "$i"
fi
done
Test Input Files
$ ls -1
stuff_20090413T235945_20090414T235944_end
stuff_20090413T235945_20090414T235944_snip
stuff_20090414T235945_20090415T235944_end
stuff_20090414T235945_20090415T235944_snip
stuff_20090415T235945_20090416T235944_end
stuff_20090415T235945_20090416T235944_snip
stuff_20090416T235945_20090417T235944_end
stuff_20090416T235945_20090417T235944_snip
stuff_20090417T235945_20090418T235944_end
stuff_20090417T235945_20090418T235944_snip
stuff_20090418T235945_20090419T235944_end
stuff_20090418T235945_20090419T235944_snip
Example Use/Output
$ bash endsnip.sh
paste to : stuff_20090413T235945_20090414T235944_snip
from : stuff_20090414T235945_20090415T235944_end
paste to : stuff_20090414T235945_20090415T235944_snip
from : stuff_20090415T235945_20090416T235944_end
paste to : stuff_20090415T235945_20090416T235944_snip
from : stuff_20090416T235945_20090417T235944_end
paste to : stuff_20090416T235945_20090417T235944_snip
from : stuff_20090417T235945_20090418T235944_end
paste to : stuff_20090417T235945_20090418T235944_snip
from : stuff_20090418T235945_20090419T235944_end
(of course replace stuff_ with your actual prefix)
Let me know if you have questions.
You could store the previous $file3 value in a variable (and use a -n check to skip the first iteration, when there is no previous file yet):
#!/bin/bash
destination='media/user/directory/'
prev=""
for file1 in "$destination"*.ascii
do
echo "$file1"
file2="${file1}.end"
file3="${file1}.snip"
sed -e '16,$d' "$file1" > "$file2"
sed -e '1,15d' "$file1" > "$file3"
if [ -n "$prev" ]; then
cat "$prev" "$file2" > outfile # note: outfile is overwritten on every iteration
fi
prev=$file3
done
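The same "remember the previous file" idea can also be sketched without intermediate .end and .snip files, using head and tail. The function name stitch_files and the .new suffix are assumptions for illustration; the result is meant to match what the awk one-liner above produces.

```bash
#!/bin/bash
# stitch_files N FILE... (hypothetical helper)
# Moves the first N lines of each file to the end of the previous file,
# writing the results to FILE.new and leaving the originals untouched.
stitch_files() {
    local n=$1 prev="" f
    shift
    for f in "$@"; do
        if [ -z "$prev" ]; then
            cat "$f" > "$f.new"                    # first file keeps its own leading lines
        else
            head -n "$n" "$f" >> "$prev.new"       # first N lines go to the previous file
            tail -n +"$((n + 1))" "$f" > "$f.new"  # the rest starts this file
        fi
        prev=$f
    done
}
```

For the real data this would be called as stitch_files 15 "$destination"*.ascii, relying on the glob returning the files in date order.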

if output file from loop isn't empty, copy last line bash

I have a loop, in a bash script. It runs a programme that by default outputs a text file when it works, and no file if it doesn't. I'm running it a large number of times (> 500K) so I want to merge the output files, row by row. If one iteration of the loop creates a file, I want to take the LAST line of that file, append it to a master output file, then delete the original so I don't end up with 1000s of files in one directory. The Loop I have so far is:
oFile=/path/output/outputFile_
oFinal=/path/output.final
for counter in {101..200}
do
$programme $counter -out $oFile$counter
if [ -s $oFile$counter ] ## This returns TRUE if file isn't empty, right?
then
out=$(tail -1 $oFile$counter)
final=$out$oFile$counter
$final >> $oFinal
fi
done
However, it doesn't work properly, as it seems to not return all the files I want. So is the conditional wrong?
You can be clever and pass the programme a process substitution instead of a "real" file:
oFinal=/path/output.final
for counter in {101..200}
do
$programme $counter -out >(tail -n 1)
done > $oFinal
$programme will treat the process substitution as a file, and all the lines written to it will be processed by tail, so only the last line of each run reaches $oFinal.
Testing: my "programme" outputs 2 lines if the given counter is even
$ cat programme
#!/bin/bash
if (( $1 % 2 == 0 )); then
{
echo ignore this line
echo $1
} > $2
fi
$ ./programme 101 /dev/stdout
$ ./programme 102 /dev/stdout
ignore this line
102
So, this loop should output only the even numbers between 101 and 200
$ for counter in {101..200}; do ./programme $counter >(tail -1); done
102
104
[... snipped ...]
198
200
Success.
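If the programme cannot write to a process substitution, the loop from the question can be kept with a small fix: the line $final >> $oFinal tries to run the contents of $final as a command instead of writing it to the file. A minimal sketch, assuming the programme accepts its output path after -out as in the question (run_and_collect is a hypothetical name):

```bash
#!/bin/bash
# run_and_collect PROGRAMME FINAL START END (hypothetical helper)
# Runs PROGRAMME for each counter and appends the last line of any
# non-empty output to FINAL, reusing a single temporary file.
run_and_collect() {
    local programme=$1 final=$2 start=$3 end=$4 counter tmp
    tmp=$(mktemp) || return 1
    for ((counter = start; counter <= end; counter++)); do
        rm -f "$tmp"                           # make sure no stale output is left over
        "$programme" "$counter" -out "$tmp"
        if [ -s "$tmp" ]; then                 # true only if the file exists and is not empty
            tail -n 1 "$tmp" >> "$final"       # append just the last line
        fi
    done
    rm -f "$tmp"
}
```

Reusing one temporary file also avoids ending up with thousands of per-run files in one directory.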
