Sed getting stuck in infinite loop after removing entry from file - bash

Howdie do,
I'm currently working on a script that will take a list of IPs, store them in a LIST variable, loop through that list, and compare the IPs to those in two text files. If the IP is duplicated in both files, it will remove the IP from one of the files.
The two files that contain the duplicates:
cat jeremy
209.240.105.0
cat jeremy2
209.240.105.0
Now the code is pretty simple:
LIST="$(cat /STORAGE/ips | awk -F ':' '{print $1}')"
for I in $LIST
do
DUP1=$(grep -rwl "$I" /STORAGE/jeremy/ | awk -F '/' '{print $4}' | sed 2d)
DUP2=$(grep -rwl "$I" /STORAGE/jeremy/ | awk -F '/' '{print $4}' | sed 1d)
cat $DUP1 | while read IP; do sed -i "/^${IP}$/d" $DUP2 ; done
done
That actually works and removes the duplicate IP from the $DUP2 file as it should, but it seems to get stuck in an infinite loop.
I noticed this because after I run the script, it removes the duplicate as it should, but the script just keeps running.
If you press enter while the script is spinning its wheels, it spits out:
sed: no input files
sed: no input files
But you can clearly see the duplicate IP has been removed:
[/STORAGE/jeremy]# cat jeremy
[/STORAGE/jeremy]# cat jeremy2
209.240.105.0
So it does its job, but the sed command seems to be stuck in a loop. I've only today really started to learn more about sed and its capabilities, but is there an equivalent to break; like in C++ or C#?
I just need sed to break out of the while read loop.
The input and output files are posted below. Also, this is not a duplicate question: I did raise a question earlier about this script, but that was just to get a better understanding of how to use the regex with sed and awk.
IP Input file that generates $LIST
209.240.105.0:255.255.255.255:209.240.105.0
209.240.105.1:255.255.255.255:209.240.105.1
The two files that I'm testing on just contain a list of one IP at the time:
Test file #1 jeremy:
209.240.105.0
Test file #2 jeremy2:
209.240.105.0
Once the script runs, it should only remove the IP from the Test File #2:
Test file #1 jeremy:
209.240.105.0
Test file #2 jeremy2:
Which the script currently does; it's just that I have to kill the script manually instead of it breaking out of the while read loop.
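A minimal guard sketch, assuming the apparent hang is cat being invoked with no file argument: when grep matches nothing on a later pass, $DUP1 and $DUP2 come back empty, cat $DUP1 blocks waiting on stdin, and each press of Enter hands sed an empty file list (hence "sed: no input files"). Skipping those iterations, and reading the file directly instead of piping cat, should let the loop finish on its own:
if [ -z "$DUP1" ] || [ -z "$DUP2" ]; then
    continue    # no duplicate found for this IP, move on
fi
while read -r IP; do
    sed -i "/^${IP}$/d" "$DUP2"
done < "$DUP1"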

Let's start with this, which uses GNU awk for "\<" word delimiters:
gawk -F':' '
NR==FNR { gsub(/\./,"\\."); ips["\\<" $1 "\\>"]; next }
{
    for (ip in ips) {
        if ( match($0,ip) ) {
            print ip, FILENAME, RSTART, RLENGTH
        }
    }
}
' /STORAGE/ips /STORAGE/jeremy/* |
sort
That should print, for each IP address, the file name(s) it occurs in plus the character position it first occurs in on each line, and the length of the IP address.
Does it?
Once you post some sample input and expected output we can go further.
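If the end goal is simply to drop from jeremy2 every line that also appears in jeremy, a standard awk idiom handles the comparison in one pass (a sketch, run from /STORAGE/jeremy and not tested against your real data):
# collect every line of jeremy, then keep only the jeremy2 lines not seen there
awk 'NR==FNR { seen[$0]; next } !($0 in seen)' jeremy jeremy2 > jeremy2.tmp &&
mv jeremy2.tmp jeremy2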

Related

How to combine awk and sed in while read line from text file to pull parts and rearrange the output

I have text files that have a source path + filename and the destination path.
What I need is to pull the destination path then add just the filename from the line then add a system command to it.
I am nesting a while loop within a for loop to crawl through a directory of text files to first stage files then get the hash using digest then write the results to a text file.
Each line in the text file looks like this.
/folder/folder/folder/file.jpg /folder/folder/folder/xxxxx/
I can get the destination path or the file name but it is giving me fits trying to get them together.
I need it to combine into /folder/folder/folder/xxxxx/file.jpg.
Then I need to add a stage command, stage /folder/folder/folder/xxxxx/file.jpg
This gets the path:
for file in ls 10*.txt; do cat $file | awk '{print $2}'; done
And this gets the file name:
for file in ls 10*.txt; do TIF=`cat $file | awk '{print $6}' FS=/`; echo $TIF; done
But when I try to combine them using awk, sed, cut or anything else I can Google, it only pulls the first one in the statement.
Assuming that your input file has tab-separated fields and there are no space chars in any of your file/path data, try this:
echo "/folder/folder/folder/file.jpg /folder/folder/folder/xxxxx/" \
| awk '-F\t' '{n=split($1,fileArr,"/"); print "stage " $2 fileArr[n]}'
output
stage /folder/folder/folder/xxxxx/file.jpg
This will then work with
awk '-F\t' '{n=split($1,fileArr,"/"); print "stage " $2 fileArr[n]}' file
Review the output to be sure all files will be processed correctly. If so, you can then pass the output to bash and all files will be processed (staged?), i.e.
awk '-F\t' '{n=split($1,fileArr,"/"); print "stage " $2 fileArr[n]}' file | bash
IHTH
You can use sed with # as the delimiter.
First match the last word (a string without a slash) before the whitespace; it will be stored in \1.
The path (after the whitespace) is stored in \2.
echo '/folder/folder/folder/file.jpg /folder/folder/folder/xxxxx/' |
sed -r 's#.*/([^/]*)\s+(.*)#stage \2/\1#'
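As with the awk version, once the output looks right you can run this over the real file and, if you like, hand the generated stage commands to the shell, mirroring the approach above (a sketch, assuming the same whitespace-separated layout):
sed -r 's#.*/([^/]*)\s+(.*)#stage \2/\1#' file | bash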

How to parse the output of `ls -l` into multiple variables in bash?

There are a few answers on this topic already, but pretty much all of them say that it's bad to parse the output of ls -l, and therefore suggest other methods.
However, I'm using ncftpls -l, and so I can't use things like shell globs or find – I think I have a genuine need to actually parse the ls -l output. Don't worry if you're not familiar with ncftpls, the output returns in exactly the same format as if you were just using ls -l.
There is a list of files at a public remote ftp directory, and I don't want to burden the remote server by re-downloading each of the desired files every time my cronjob fires. I want to check, for each one of a subset of files within the ftp directory, whether the file exists locally; if not, download it.
That's easy enough, I just use
tdy=`date -u '+%Y%m%d'`_
# Today's files
for i in $(ncftpls 'ftp://theftpserver/path/to/files' | grep ${tdy}); do
if [ ! -f $i ]; then
ncftpget "ftp://theftpserver/path/to/files/${i}"
fi
done
But I came upon the issue that sometimes the cron job will download a file that hasn't finished uploading, and so when it fires next, it skips the partially downloaded file.
So I wanted to add a check to make sure that for each file that I already have, the local file size matches the size of the same file on the remote server.
I was thinking along the lines of parsing the output of ncftpls -l and using awk, something like
for i in $(ncftpls -l 'ftp://theftpserver/path/to/files' | awk '{print $9, $5}'); do
...
x=filesize # somehow get the file size and the filename
y=filename # from $i on each iteration and store in variables
...
done
but I can't seem to get both the filename and the filesize from the server into local variables on the same iteration of the loop; $i alternates between $9 and $5 in the awk string with each iteration.
If I could manage to get the filename and filesize into separate variables with each iteration, I could simply use stat -c "%s" $i to get the local size and compare it with the remote size. Then it's a simple ncftpget on each remote file that I don't already have. I tinkered with syncing programs like lftp too, but didn't have much luck and would rather do it this way.
Any help is appreciated!
A for loop splits when it sees any whitespace like space, tab, or newline, so IFS needs to be set before the loop (there are a lot of questions about ...).
IFS=$'\n' && for i in $(ncftpls -l 'ftp://theftpserver/path/to/files' | awk '{print $9, $5}'); do
echo $i | awk '{print $NF}' # filesize
echo $i | awk '{NF--; print}' # filename
# you may have spaces in filenames, so it is better to use the last column for awk
done
The better way, I think, is to use while instead of for, so:
ls -l | while read i
do
echo $i | awk '{print $9, $5}'
#split them if you want
x=$(echo "$i" | awk '{print $5}')
y=$(echo "$i" | awk '{print $9}')
done
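If parsing the ls -l-style listing really is unavoidable, another sketch (assuming the listed names contain no spaces and that columns 5 and 9 really are size and name, as in the question) lets read split the two awk columns straight into separate variables, which avoids the alternating-$i problem entirely:
tdy=$(date -u '+%Y%m%d')_
ncftpls -l 'ftp://theftpserver/path/to/files' |
awk -v d="$tdy" 'NF >= 9 && $9 ~ d { print $5, $9 }' |
while read -r size name; do
    # fetch if the file is missing locally or the local size differs from the remote one
    if [ ! -f "$name" ] || [ "$(stat -c '%s' "$name")" != "$size" ]; then
        ncftpget "ftp://theftpserver/path/to/files/${name}"
    fi
done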

Making bash output a certain word from a .txt file

I have a question on Bash:
Like the title says, I require bash to output a certain word, depending on where it is in the file. In my explicit example I have a simple .txt file.
I already found out that you can count the number of words within a file with the command:
wc -w < myFile.txt
An output example would be:
78501
There certainly is also a way to make "cat" show only word number x. Something like:
cat myFile.txt | wordno. 3125
desired-word
Note that I will welcome any command that gets this done, not only cat.
Alternatively or in addition, I would be happy to know how you can make certain characters in a file show, based on their place in it. Something like:
cat myFile.txt | characterno. 2342
desired-character
I already know how you can achieve this with a variable:
a="hello, how are you"
echo ${a:9:1}
w
The only problem is that a variable can only be so long; if it's as long as a whole .txt file, it won't work.
I look forward to your answers!
You could use awk for this job: it splits the string at spaces and prints the part at position wordnumber, and tr is used to remove newlines.
cat myFile.txt | tr -d '\n' | awk -v wordnumber=5 '{ print $wordnumber }'
And if you want, for example, the 5th character, you could do it like so:
head -c 5 myFile.txt | tail -c 1
Since you have NOT shown samples of Input_file or expected output, I couldn't test it, but you could simply do this with awk as follows, as an example:
awk 'FNR==1{print substr($0,2342,1);next}' Input_file
Here we tell awk to look only at the 1st line (FNR==1), and in substr we tell awk to start at character 2342; the 1 that follows means take only 1 character from that position. You could increase that value or keep it as per your need.
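As a quick sanity check against the string from the question (substr is 1-based, so character 10 here corresponds to the ${a:9:1} example above):
echo "hello, how are you" | awk '{print substr($0,10,1)}'
w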
With gawk:
awk 'BEGIN{RS="[[:space:]]+"} NR==12345' file
or
gawk 'NR==12345' RS="[[:space:]]+" file
I'm setting the record separator to a sequence of whitespace characters, which includes newlines, and then printing the 12345th record.
To improve the average performance you can exit the script once the match is found:
gawk 'BEGIN{RS="[[:space:]]+"}NR==12345{print;exit}' file

Bash awk append to same line

There are numerous posts about removing leading white space and appending an entry to a single existing line in a file using awk. None of my attempts work - just three examples here of the many I have tried.
Say I have a file called $log with a single line
a:b:c
and I want to add a fourth entry,
awk '{ print $4"d" }' $log | tee -a $log
output seems to be a newline
a:b:c:
d
whereas, I want all on the same line;
a:b:c:d
try
BEGIN { FS = ":" } ; awk '{ print $4"d" }' $log | tee -a $log
or this, to avoid a new line:
awk 'BEGIN { ORS=":" }; { print $4"d" }' $log | tee -a $log
no change
a:b:c:
d
awk is placing a space after c: and then writing d to the next line.
EDIT: | tee -a $log appears to be necessary to write the additional string to the file.
$log contains 39 variables and was generated using awk without | tee -a
odd...
The actual command to write $40 to the single line entries
awk '{ print $40"'$imagedir'" }' $log
output
+ awk '{ print $40"/home/geoland/Asterism-DEVEL/DSO" }'
/home/geoland/.asterism/log
but this does not write to the $log file.
How should I append d to the same line, without leading white space, using awk? I'm also looking at sed, xargs and other alternatives.
Using awk:
awk '{ print $0":d" }' file
Using sed:
sed 's/$/:d/' file
Using only bash:
while IFS= read -r line; do
echo "$line:d"
done < file
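With the question's a:b:c line, each of the three should print:
a:b:c:d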
Using sed:
$ echo a:b:c | sed 's,\(^.*$\),\1:d,'
a:b:c:d
Thanks all... This is the solution I went with. I also needed to write the entire line to a perpetual log file because the log file is overwritten at each new process instance.
I will further investigate an awk solution.
logname=$imagedir/log_$name
while IFS=: read -r line; do
echo "$line$imagedir"
done < $log | tee $logname
This places $imagedir directly behind the last IFS ':' separator
There is probably room for refinement.
I too am not entirely sure what you're trying to do here.
Your command line, awk '{ print $4"d" }' $log | tee -a $log is problematic in a number of ways.
First, your awk script tries to print the 4th field, which is empty. Unless you say otherwise, fields are separated by whitespace, and the string a:b:c has no whitespace. So awk prints "d". And tee -a appends to your existing logfile, so what you're seeing is the original data, along with the d printed by awk. That's totally expected.
Second, you have tee appending to the same file that awk is in the process of reading. This won't make an endless loop, as awk should stop reading the input file after whatever was the last byte when the file was opened, but it does mean you may have repeated data there.
Your other attempts, aside from some syntactical errors, all suffer from the same assumption that $4 means something that it does not.
The following awk snippet sets the input and output field separators to :, then sets the 4th field to "d", then prints the line.
$ echo "a:b:c" | awk 'BEGIN{FS=OFS=":"} {$4="d"} 1'
a:b:c:d
Is that what you want?
If you really do need to append this data to an existing log file, you can do so with tee -a or simple >> redirection. Just bear in mind that awk will only see the content of the file as of the time it was run, and by appending, you are not replacing lines.
One other thing. If you are actually hoping to use the content of the shell variable $imagedir inside awk, you should pass the variable in rather than exiting your quotes. For example:
$ echo "a:b:c" | awk -v d="foo/bar" 'BEGIN{FS=OFS=":"} {$4=d} 1'
a:b:c:foo/bar
sed "s|$|$imagedir|" file | tee newfile
This does the trick: read 'file' and write the contents of 'file', with the substitution, to a 'new file', so that the image directory can be read by a secondary standalone process.
Because the variable is a directory with several / characters, these would need to be escaped so they are not interpreted as sed delimiters; I had difficulty with this using a variable.
A neater option was to use an alternative delimiter (not to be confused with the pipe that follows).
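A usage sketch with the question's own names (the imagedir value is taken from the trace shown earlier, and pairing it with $log and $logname here is only illustrative):
imagedir=/home/geoland/Asterism-DEVEL/DSO
sed "s|$|$imagedir|" "$log" | tee "$logname"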

awk execute same command on different files one by one

Hi, I have 30 txt files in a directory, each containing 4 columns.
How can I execute the same command on each file one by one and direct the output to a different file?
The command I am using is below, but it's being applied to all the files and giving a single output. All I want is to process each file one by one and direct each output to a new file.
start=$1
patterns=''
for i in $(seq -43 -14); do
patterns="$patterns /cygdrive/c/test/kpi/SIGTRAN_Load_$(exec date '+%Y%m%d' --date="-${i} days ${start}")*"; done
cat /cygdrive/c/test/kpi/*$patterns | sed -e "s/\t/,/g" -e "s/ /,/g"| awk -F, 'a[$3]<$4{a[$3]=$4} END {for (i in a){print i FS a[i]}}'| sed -e "s/ /0/g"| sort -t, -k1,2> /cygdrive/c/test/kpi/SIGTRAN_Load.csv
Something like this
for fileName in /path/to/files/foo*.txt
do
mangleFile "$fileName"
done
will mangle a list of files you give via globbing. If you want to generate the file name patterns as in your example, you can do it like this:
for i in $(seq -43 -14)
do
for fileName in /cygdrive/c/test/kpi/SIGTRAN_Load_"$(exec date '+%Y%m%d' --date="-${i} days ${start}")"*
do
mangleFile "$fileName"
done
done
This way the code stays much more readable, even if shorter solutions may exist.
The mangleFile of course then will be the awk call or whatever you would like to do with each file.
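For example, mangleFile could wrap the question's own pipeline and write one output file per input; this is only a sketch, and the per-file "$1.csv" output name is an assumption:
mangleFile () {
    # run the question's pipeline on a single file and write a per-file csv
    sed -e "s/\t/,/g" -e "s/ /,/g" "$1" |
        awk -F, 'a[$3]<$4{a[$3]=$4} END {for (i in a){print i FS a[i]}}' |
        sed -e "s/ /0/g" | sort -t, -k1,2 > "$1.csv"
}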
Use the following idiom:
for file in *
do
./your_shell_script_containing_the_above.sh $file > some_unique_id
done
You need to run a loop on all the matching files:
for i in /cygdrive/c/test/kpi/*$patterns; do
tr '[:space:]\n' ',\n' < "$i" | awk -F, 'a[$3]<$4{a[$3]=$4} END {for (i in a){print i FS a[i]}}'| sed -e "s/ /0/g"| sort -t, -k1,2 > "/cygdrive/c/test/kpi/SIGTRAN_Load-$i.csv"
done
PS: I haven't tried to refactor your piped commands much; they can probably be shortened too.
