How to check if a file have been complete and reach to EOF? - bash

My collaborator was processing a large batch of files, but some of the output files seem to be interrupted before they were completed. It seems that these incomplete files do not have the end of the file character (EOF). I would like to do a script in batch to loop through all of these files and check if the EOF character is there for every one of the ~500 files. Can you give me any idea of how to do this? Which command can I use to know if a file has EOF character at the end?
I am not sure if there is supposed to be a special character at the end of the files when they are complete, but normal files looks like this
my_user$ tail CHSA0011.fastq
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
#HS40_15367:8:1106:6878:29640/2
TGATCCATCGTGATGTCTTATTTAAGGGGAACGTGTGGGCTATTTAGGCTTTATGACCCTGAAGTAGGAACCAGA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
#HS40_15367:8:1202:14585:48098/1
TGATCCATCGTGATGTCTTATTTAAGGGGAACGTGTGGGCTATTTAGGCTTTATGACCCTGAAGTAGGAACCAGA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
my_user$
But when I do tail tho thse interrupted files they look like:
my_user$ tail IST-MES1.fastq
#HS19_13305:3:1115:13001:3380/2
GTGGAGACGAGGTTTCACCATGTTGGCCAGGCTGGTCTCGAGCTCCTGACCTCAAGTGATCCGTCTGCCTTGGCC
+
#B#FFFFFHHHHFHHIJJJJJIIJJJJJJJIJJJJGIIJJGIIGIIJJJJFDHHIJFHGIGHIHHHFFFFFFEEE
#HS19_13305:3:1106:5551:75750/2
CGAGGTTTCACCATGTTGGCCAGGCTGGTCTCGAGCTCCTGACCTCAAGTGATCCGTCTGCCTTGGCCCCCCAAA
+
CCCFFADFHHHHHJJIJJJJJJJJJJJJEGGIJGGHIIJIIIIIIJJJJDEGGIJJJGIIIJJIJJJHHHFDDDD
#HS19_13305:3:2110:17731:73616/2
CGAGGTTTCACCATGTTGGCCAGGCTGmy_user$
As you can see, in normal files my_user$ is displayed one line below the end of the file. But in these interrupted ones my_user$ is next to the end of the files. Maybe it just because the file does not end with a line breaker \n ?
I am sorry if the question is a bit confusing,
cheers,
Guillermo

Yes, the difference is because in the first case the file ends with \n (new line).
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
my_user$
In this case it doesn't have a new line so the next thing it prints is your use (actually your PS1)
CGAGGTTTCACCATGTTGGCCAGGCTGmy_user$
You can try this:
echo "CCCFFADFHHHHH" # <--- implicitly includes newline at the end
echo -n "CCCFFADFHHHHH" # <--- does not include newline at the end
There are actually two endline options, \r and \n and there are different standards according to your OS. I will assume you are working on linux and only \n is used. So in this example the newline character is 0x0a (number 10) in the ascii map.
If you want to know the last char of each file, you can do:
echo -n "CCCFFADFHHHHH" > uglyfile.txt
echo "CCCFFADFHHHHH" > nicefile.txt
for file in *.txt; do
echo -n "$file ends with: 0x";
tail -c 1 $file | xxd -p;
done;
If you want to know which files end with a char that is not a newline, you can do:
echo -n "CCCFFADFHHHHH" > uglyfile.txt
echo "CCCFFADFHHHHH" > nicefile.txt
for file in *.txt; do
lastchar_hex=`tail -c 1 $file | xxd -p`
if [[ $lastchar_hex != '0a' ]]; then
echo "File $file does not end with newline"
fi;
done;

Related

Using sed in order to change a specific character in a specific line

I'm a beginner in bash and here is my problem. I have a file just like this one:
Azzzezzzezzzezzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
I try in a script to edit this file.ABC letters are unique in all this file and there is only one per line.
I want to replace the first e of each line by a number who can be :
1 in line beginning with an A,
2 in line beginning with a B,
3 in line beginning with a C,
and I'd like to loop this in order to have this type of result
Azzz1zzz5zzz1zzz...
Bzzz2zzz4zzz5zzz...
Czzz3zzz6zzz3zzz...
All the numbers here are random int variables between 0 and 9. I really need to start by replacing 1,2,3 in first exec of my loop, then 5,4,6 then 1,5,3 and so on.
I tried this
sed "0,/e/s/e/$1/;0,/e/s/e/$2/;0,/e/s/e/$3/" /tmp/myfile
But the result was this (because I didn't specify the line)
Azzz1zzz2zzz3zzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
I noticed that doing sed -i "/A/ s/$/ezzz/" /tmp/myfile will add ezzz at the end of A line so I tried this
sed -i "/A/ 0,/e/s/e/$1/;/B/ 0,/e/s/e/$2/;/C/ 0,/e/s/e/$3/" /tmp/myfile
but it failed
sed: -e expression #1, char 5: unknown command: `0'
Here I'm lost.
I have in a variable (let's call it number_of_e_per_line) the number of e in either A, B or C line.
Thank you for the time you take for me.
Just apply s command on the line that matches A.
sed '
/^A/{ s/e/$1/; }
/^B/{ s/e/$2/; }
# or shorter
/^C/s/e/$3/
'
s command by default replaces the first occurrence. You can do for example s/s/$1/2 to replace the second occurrence, s/e/$1/g (like "Global") replaces all occurrences.
0,/e/ specifies a range of lines - it filters lines from the first up until a line that matches /e/.
sed is not part of Bash. It is a separate (crude) programming language and is a very standard command. See https://www.grymoire.com/Unix/Sed.html .
Continuing from the comment. sed is a poor choice here unless all your files can only have 3 lines. The reason is sed processes each line and has no way to keep a separate count for the occurrences of 'e'.
Instead, wrapping sed in a script and keeping track of the replacements allows you to handle any file no matter the number of lines. You just loop and handle the lines one at a time, e.g.
#!/bin/bash
[ -z "$1" ] && { ## valiate one argument for filename provided
printf "error: filename argument required.\nusage: %s filename\n" "./$1" >&2
exit 1
}
[ -s "$1" ] || { ## validate file exists and non-empty
printf "error: file not found or empty '%s'.\n" "$1"
exit 1
}
declare -i n=1 ## occurrence counter initialized 1
## loop reading each line
while read -r line || [ -n "$line" ]; do
[[ $line =~ ^.*e.*$ ]] || continue ## line has 'e' or get next
sed "s/e/1/$n" <<< "$line" ## substitute the 'n' occurence of 'e'
((n++)) ## increment counter
done < "$1"
Your data file having "..." at the end of each line suggests your files is larger than the snippet posted. If you have lines beginning 'A' - 'Z', you don't want to have to write 26 separate /match/s/find/replace/ substitutions. And if you have somewhere between 3 and 26 (or more), you don't want to have to rewrite a different sed expression for every new file you are faced with.
That's why I say sed is a poor choice. You really have no way to make the task a generic task with sed. The downside to using a script is it will become a poor choice as the number of records you need to process increase (over 100000 or so just due to efficiency)
Example Use/Output
With the script in replace-e-incremental.sh and your data in file, you would do:
$ bash replace-e-incremental.sh file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...
To Modify file In-Place
Since you make multiple calls to sed here, you need to redirect the output of the file to a temporary file and then replace the original by overwriting it with the temp file, e.g.
$ bash replace-e-incremental.sh file > mytempfile && mv -f mytempfile file
$ cat file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...

Writing a script for large text file manipulation (iterative substitution of duplicated lines), weird bugs and very slow.

I am trying to write a script which takes a directory containing text files (384 of them) and modifies duplicate lines that have a specific format in order to make them not duplicates.
In particular, I have files in which some lines begin with the '#' character and contain the substring 0:0. A subset of these lines are duplicated one or more times. For those that are duplicated, I'd like to replace 0:0 with i:0 where i starts at 1 and is incremented.
So far I've written a bash script that finds duplicated lines beginning with '#', writes them to a file, then reads them back and uses sed in a while loop to search and replace the first occurrence of the line to be replaced. This is it below:
#!/bin/bash
fdir=$1"*"
#for each fastq file
for f in $fdir
do
(
#find duplicated read names and write to file $f.txt
sort $f | uniq -d | grep ^# > "$f".txt
#loop over each duplicated readname
while read in; do
rname=$in
i=1
#while this readname still exists in the file increment and replace
while grep -q "$rname" $f; do
replace=${rname/0:0/$i:0}
sed -i.bu "0,/$rname/s/$rname/$replace/" "$f"
let "i+=1"
done
done < "$f".txt
rm "$f".txt
rm "$f".bu
done
echo "done" >> progress.txt
)&
background=( $(jobs -p) )
if (( ${#background[#]} ==40)); then
wait -n
fi
done
The problem with it is that its impractically slow. I ran it on a 48 core computer for over 3 days and it hardly got through 30 files. It also seemed to have removed about 10 files and I'm not sure why.
My question is where are the bugs coming from and how can I do this more efficiently? I'm open to using other programming languages or changing my approach.
EDIT
Strangely the loop works fine on one file. Basically I ran
sort $f | uniq -d | grep ^# > "$f".txt
while read in; do
rname=$in
i=1
while grep -q "$rname" $f; do
replace=${rname/0:0/$i:0}
sed -i.bu "0,/$rname/s/$rname/$replace/" "$f"
let "i+=1"
done
done < "$f".txt
To give you an idea of what the files look like below are a few lines from one of them. The thing is that even though it works for the one file, it's slow. Like multiple hours for one file of 7.5 M. I'm wondering if there's a more practical approach.
With regard to the file deletions and other bugs I have no idea what was happening Maybe it was running into memory collisions or something when they were run in parallel?
Sample input:
#D00269:138:HJG2TADXX:2:1101:0:0 1:N:0:CCTAGAAT+ATTCCTCT
GATAAGGACGGCTGGTCCCTGTGGTACTCAGAGTATCGCTTCCCTGAAGA
+
CCCFFFFFHHFHHIIJJJJIIIJJIJIJIJJIIBFHIHIIJJJJJJIJIG
#D00269:138:HJG2TADXX:2:1101:0:0 1:N:0:CCTAGAAT+ATTCCTCT
CAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCG
Sample output:
#D00269:138:HJG2TADXX:2:1101:1:0 1:N:0:CCTAGAAT+ATTCCTCT
GATAAGGACGGCTGGTCCCTGTGGTACTCAGAGTATCGCTTCCCTGAAGA
+
CCCFFFFFHHFHHIIJJJJIIIJJIJIJIJJIIBFHIHIIJJJJJJIJIG
#D00269:138:HJG2TADXX:2:1101:2:0 1:N:0:CCTAGAAT+ATTCCTCT
CAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCG
Here's some code that produces the required output from your sample input.
Again, it is assumed that your input file is sorted by the first value (up to the first space character).
time awk '{
#dbg if (dbg) print "#dbg:prev=" prev
if (/^#/ && prev!=$1) {fixNum=0 ;if (dbg) print "prev!=$1=" prev "!=" $1}
if (/^#/ && (prev==$1 || NR==1) ) {
prev=$1
n=split($1,tmpArr,":") ; n++
#dbg if (dbg) print "tmpArr[6]="tmpArr[6] "\tfixNum="fixNum
fixNum++;tmpArr[6]=fixNum;
# magic to rebuild $1 here
for (i=1;i<n;i++) {
tmpFix ? tmpFix=tmpFix":"tmpArr[i]"" : tmpFix=tmpArr[i]
}
$1=tmpFix ; $0=$0
print $0
}
else { tmpFix=""; print $0 }
}' file > fixedFile
output
#D00269:138:HJG2TADXX:2:1101:1:0 1:N:0:CCTAGAAT+ATTCCTCT
GATAAGGACGGCTGGTCCCTGTGGTACTCAGAGTATCGCTTCCCTGAAGA
+
CCCFFFFFHHFHHIIJJJJIIIJJIJIJIJJIIBFHIHIIJJJJJJIJIG
#D00269:138:HJG2TADXX:2:1101:2:0 1:N:0:CCTAGAAT+ATTCCTCT
CAAGTCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCG
I've left a few of the #dbg:... statements in place (but they are now commented out) to show how you can run a small set of data as you have provided, and watch the values of variables change.
Assuming a non-csh, you should be able to copy/paste the code block into a terminal window cmd-line and replace file > fixFile at the end with your real file name and a new name for the fixed file. Recall that awk 'program' file > file (actually, any ...file>file) will truncate the existing file and then try to write, SO you can lose all the data of a file trying to use the same name.
There are probably some syntax improvements that will reduce the size of this code, and there might be 1 or 2 things that could be done that will make the code faster, but this should run very quickly. If not, please post the result of time command that should appear at the end of the run, i.e.
real 0m0.18s
user 0m0.03s
sys 0m0.06s
IHTH
#!/bin/bash
i=4
sort $1 | uniq -d | grep ^# > dups.txt
while read in; do
if [ $((i%4))=0 ] && grep -q "$in" dups.txt; then
x="$in"
x=${x/"0:0 "/$i":0 "}
echo "$x" >> $1"fixed.txt"
else
echo "$in" >> $1"fixed.txt"
fi
let "i+=1"
done < $1

Displaying only single most recent line of a command's output

How can I print a command output like one from rm -rv * in a single line ? I think it would need \r but I can't figure out how.
I would need to have something like this :
From:
removed /path/file1
removed /path/file2
removed /path/file3
To : Line 1 : removed /path/file1
Then : Line 1 : removed /path/file2
Then : Line 1 : removed /path/file3
EDIT : I may have been misunderstood, I want to have the whole process beeing printing in a single same line, changing as the command outputs an another line (like removed /path/file123)
EDIT2 : The output is sometimes too long to be display in on line (very long path). I would need something that considers that problem too :
/very/very/very/long/path/to/a/very/very/very/far/file/with-a-very-very-very-long-name1
/very/very/very/long/path/to/a/very/very/very/far/file/with-a-very-very-very-long-name2
/very/very/very/long/path/to/a/very/very/very/far/file/with-a-very-very-very-long-name3
Here's a helper function:
shopt -s checkwinsize # ensure that COLUMNS is available w/ window size
oneline() {
local ws
while IFS= read -r line; do
if (( ${#line} >= COLUMNS )); then
# Moving cursor back to the front of the line so user input doesn't force wrapping
printf '\r%s\r' "${line:0:$COLUMNS}"
else
ws=$(( COLUMNS - ${#line} ))
# by writing each line twice, we move the cursor back to position
# thus: LF, content, whitespace, LF, content
printf '\r%s%*s\r%s' "$line" "$ws" " " "$line"
fi
done
echo
}
Used as follows:
rm -rv -- * 2>&1 | oneline
To test this a bit more safely, one might use:
for f in 'first line' 'second line' '3rd line'; do echo "$f"; sleep 1; done | oneline
...you'll see that that test displays first line for a second, then second line for a second, then 3rd line for a second.
If you want a "status line" result that is showing the last line output by the program where the line gets over-written by the next line when it comes out you can send the output for the command through a short shell while loop like this:
YourCommand | while read line ; do echo -n "$line"$' ...[lots of spaces]... \r' ; done
The [Lots of spaces] is needed in case a shorter line comes after a longer line. The short line needs to overwrite the text from the longer line or you will see residual characters from the long line.
The echo -n $' ... \r' sends a literal carriage return without a line-feed to the screen which moves the position back to the front of the line but doesn't move down a line.
If you want the text from your command to just be output in 1 long line, then
pipe the output of any command through this sed command and it should replace the carriage returns with spaces. This will put the output all on one line. You could change the space to another delimiter if desired.
your command | sed ':rep; {N;}; s/\n/ /; {t rep};'
:rep; is a non-command that marks where to go to in the {t rep} command.
{N;} will join the current line to the next line.
It doesn't remove the carriage return but just puts the 2 lines in the buffer to be used for following commands.
s/\n/ /; Says to replace the carriage return character with a space character. They space is between the second and third/ characters.
You may need to replace \r\n depending on if the file has line feeds. UNIX files don't unless they came from a pc and haven't been converted.
{t rep}; says that if the match was found in the s/// command then go to the :rep; marker.
This will keep joining lines, removing the \n, then jumping to :rep; until there are no more likes to join.

Unix one-liner to swap/transpose two lines in multiple text files?

I wish to swap or transpose pairs of lines according to their line-numbers (e.g., switching the positions of lines 10 and 15) in multiple text files using a UNIX tool such as sed or awk.
For example, I believe this sed command should swap lines 14 and 26 in a single file:
sed -n '14p' infile_name > outfile_name
sed -n '26p' infile_name >> outfile_name
How can this be extended to work on multiple files? Any one-liner solutions welcome.
If you want to edit a file, you can use ed, the standard editor. Your task is rather easy in ed:
printf '%s\n' 14m26 26-m14- w q | ed -s file
How does it work?
14m26 tells ed to take line #14 and move it after line #26
26-m14- tells ed to take the line before line #26 (which is your original line #26) and move it after line preceding line #14 (which is where your line #14 originally was)
w tells ed to write the file
q tells ed to quit.
If your numbers are in a variable, you can do:
linea=14
lineb=26
{
printf '%dm%d\n' "$linea" "$lineb"
printf '%d-m%d-\n' "$lineb" "$linea"
printf '%s\n' w q
} | ed -s file
or something similar. Make sure that linea<lineb.
If you want robust in-place updating of your input files, use gniourf_gniourf's excellent ed-based answer
If you have GNU sed and want to in-place updating with multiple files at once, use
#potong's excellent GNU sed-based answer (see below for a portable alternative, and the bottom for an explanation)
Note: ed truly updates the existing file, whereas sed's -i option creates a temporary file behind the scenes, which then replaces the original - while typically not an issue, this can have undesired side effects, most notably, replacing a symlink with a regular file (by contrast, file permissions are correctly preserved).
Below are POSIX-compliant shell functions that wrap both answers.
Stdin/stdout processing, based on #potong's excellent answer:
POSIX sed doesn't support -i for in-place updating.
It also doesn't support using \n inside a character class, so [^\n] must be replaced with a cumbersome workaround that positively defines all character except \n that can occur on a line - this is a achieved with a character class combining printable characters with all (ASCII) control characters other than \n included as literals (via a command substitution using printf).
Also note the need to split the sed script into two -e options, because POSIX sed requires that a branching command (b, in this case) be terminated with either an actual newline or continuation in a separate -e option.
# SYNOPSIS
# swapLines lineNum1 lineNum2
swapLines() {
[ "$1" -ge 1 ] || { printf "ARGUMENT ERROR: Line numbers must be decimal integers >= 1.\n" >&2; return 2; }
[ "$1" -le "$2" ] || { printf "ARGUMENT ERROR: The first line number ($1) must be <= the second ($2).\n" >&2; return 2; }
sed -e "$1"','"$2"'!b' -e ''"$1"'h;'"$1"'!H;'"$2"'!d;x;s/^\([[:print:]'"$(printf '\001\002\003\004\005\006\007\010\011\013\014\015\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\177')"']*\)\(.*\n\)\(.*\)/\3\2\1/'
}
Example:
$ printf 'line 1\nline 2\nline 3\n' | swapLines 1 3
line 3
line 2
line 1
In-place updating, based on gniourf_gniourf's excellent answer:
Small caveats:
While ed is a POSIX utility, it doesn't come preinstalled on all platforms, notably not on Debian and the Cygwin and MSYS Unix-emulation environments for Windows.
ed always reads the input file as a whole into memory.
# SYNOPSIS
# swapFileLines lineNum1 lineNum2 file
swapFileLines() {
[ "$1" -ge 1 ] || { printf "ARGUMENT ERROR: Line numbers must be decimal integers >= 1.\n" >&2; return 2; }
[ "$1" -le "$2" ] || { printf "ARGUMENT ERROR: The first line number ($1) must be <= the second ($2).\n" >&2; return 2; }
ed -s "$3" <<EOF
H
$1m$2
$2-m$1-
w
EOF
}
Example:
$ printf 'line 1\nline 2\nline 3\n' > file
$ swapFileLines 1 3 file
$ cat file
line 3
line 2
line 1
An explanation of #potong's GNU sed-based answer:
His command swaps lines 10 and 15:
sed -ri '10,15!b;10h;10!H;15!d;x;s/^([^\n]*)(.*\n)(.*)/\3\2\1/' f1 f2 fn
-r activates support for extended regular expressions; here, notably, it allows use of unescaped parentheses to form capture groups.
-i specifies that the files specified as operands (f1, f2, fn) be updated in place, without backup, since no optional suffix for a backup file is adjoined to the -i option.
10,15!b means that all lines that do not (!) fall into the range of lines 10 through 15 should branch (b) implicitly to the end of the script (given that no target-label name follows b), which means that the following commands are skipped for these lines. Effectively, they are simply printed as is.
10h copies (h) line number 10 (the start of the range) to the so-called hold space, which is an auxiliary buffer.
10!H appends (H) every line that is not line 10 - which in this case implies lines 11 through 15 - to the hold space.
15!d deletes (d) every line that is not line 15 (here, lines 10 through 14) and branches to the end of the script (skips remaining commands). By deleting these lines, they are not printed.
x, which is executed only for line 15 (the end of the range), replaces the so-called pattern space with the contents of the hold space, which at that point holds all lines in the range (10 through 15); the pattern space is the buffer on which sed commands operate, and whose contents are printed by default (unless -n was specified).
s/^([^\n]*)(.*\n)(.*)/\3\2\1/ then uses capture groups (parenthesized subexpressions of the regular expression that forms the first argument passed to function s) to partition the contents of the pattern space into the 1st line (^([^\n]*)), the middle lines ((.*\n)), and the last line ((.*)), and then, in the replacement string (the second argument passed to function s), uses backreferences to place the last line (\3) before the middle lines (\2), followed by the first line (\1), effectively swapping the first and last lines in the range. Finally, the modified pattern space is printed.
As you can see, only the range of lines spanning the two lines to swap is held in memory, whereas all other lines are passed through individually, which makes this approach memory-efficient.
This might work for you (GNU sed):
sed -ri '10,15!b;10h;10!H;15!d;x;s/^([^\n]*)(.*\n)(.*)/\3\2\1/' f1 f2 fn
This stores a range of lines in the hold space and then swaps the first and last lines following the completion of the range.
The i flag edits each file (f1,f2 ... fn) in place.
With GNU awk:
awk '
FNR==NR {if(FNR==14) x=$0;if(FNR==26) y=$0;next}
FNR==14 {$0=y} FNR==26 {$0=x} {print}
' file file > file_with_swap
The use of the following helper script allows using the power of find ... -exec ./script '{}' l1 l2 \; to locate the target files and to swap lines l1 & l2 in each file in place. (it requires that there are no identical duplicate lines within the file that fall within the search range) The script uses sed to read the two swap lines from each file into an indexed array and passes the lines to sed to complete the swap by matching. The sed call uses its "matched first address" state to limit the second expression swap to the first occurrence. An example use of the helper script below to swap lines 5 & 15 in all matching files is:
find . -maxdepth 1 -type f -name "lnum*" -exec ../swaplines.sh '{}' 5 15 \;
For example, the find call above found files lnumorig.txt and lnumfile.txt in the present directory originally containing:
$ head -n20 lnumfile.txt.bak
1 A simple line of test in a text file.
2 A simple line of test in a text file.
3 A simple line of test in a text file.
4 A simple line of test in a text file.
5 A simple line of test in a text file.
6 A simple line of test in a text file.
<snip>
14 A simple line of test in a text file.
15 A simple line of test in a text file.
16 A simple line of test in a text file.
17 A simple line of test in a text file.
18 A simple line of test in a text file.
19 A simple line of test in a text file.
20 A simple line of test in a text file.
And swapped the lines 5 & 15 as intended:
$ head -n20 lnumfile.txt
1 A simple line of test in a text file.
2 A simple line of test in a text file.
3 A simple line of test in a text file.
4 A simple line of test in a text file.
15 A simple line of test in a text file.
6 A simple line of test in a text file.
<snip>
14 A simple line of test in a text file.
5 A simple line of test in a text file.
16 A simple line of test in a text file.
17 A simple line of test in a text file.
18 A simple line of test in a text file.
19 A simple line of test in a text file.
20 A simple line of test in a text file.
The helper script itself is:
#!/bin/bash
[ -z $1 ] && { # validate requierd input (defaults set below)
printf "error: insufficient input calling '%s'. usage: file [line1 line2]\n" "${0//*\//}" 1>&2
exit 1
}
l1=${2:-10} # default/initialize line numbers to swap
l2=${3:-15}
while IFS=$'\n' read -r line; do # read lines to swap into indexed array
a+=( "$line" );
done <<<"$(sed -n $((l1))p "$1" && sed -n $((l2))p "$1")"
((${#a[#]} < 2)) && { # validate 2 lines read
printf "error: requested lines '%d & %d' not found in file '%s'\n" $l1 $l2 "$1"
exit 1
}
# swap lines in place with sed (remove .bak for no backups)
sed -i.bak -e "s/${a[1]}/${a[0]}/" -e "0,/${a[0]}/s/${a[0]}/${a[1]}/" "$1"
exit 0
Even though I didn't manage to get it all done in a one-liner I decided it was worth posting in case you can make some use of it or take ideas from it. Note: if you do make use of it, test to your satisfaction before turning it loose on your system. The script currently uses sed -i.bak ... to create backups of the files changed for testing purposes. You can remove the .bak when you are satisfied it meets your needs.
If you have no use for setting default lines to swap in the helper script itself, then I would change the first validation check to [ -z $1 -o -z $2 -o $3 ] to insure all required arguments are given when the script is called.
While it does identify the lines to be swapped by number, it relies on the direct match of each line to accomplish the swap. This means that any identical duplicate lines up to the end of the swap range will cause an unintended match and failue to swap the intended lines. This is part of the limitation imposed by not storing each line within the range of lines to be swapped as discussed in the comments. It's a tradeoff. There are many, many ways to approach this, all will have their benefits and drawbacks. Let me know if you have any questions.
Brute Force Method
Per your comment, I revised the helper script to use the brute forth copy/swap method that would eliminate the problem of any duplicate lines in the search range. This helper obtains the lines via sed as in the original, but then reads all lines from file to tmpfile swapping the appropriately numbered lines when encountered. After the tmpfile is filled, it is copied to the original file and tmpfile is removed.
#!/bin/bash
[ -z $1 ] && { # validate requierd input (defaults set below)
printf "error: insufficient input calling '%s'. usage: file [line1 line2]\n" "${0//*\//}" 1>&2
exit 1
}
l1=${2:-10} # default/initialize line numbers to swap
l2=${3:-15}
while IFS=$'\n' read -r line; do # read lines to swap into indexed array
a+=( "$line" );
done <<<"$(sed -n $((l1))p "$1" && sed -n $((l2))p "$1")"
((${#a[#]} < 2)) && { # validate 2 lines read
printf "error: requested lines '%d & %d' not found in file '%s'\n" $l1 $l2 "$1"
exit 1
}
# create tmpfile, set trap, truncate
fn="$1"
rmtemp () { cp "$tmpfn" "$fn"; rm -f "$tmpfn"; }
trap rmtemp SIGTERM SIGINT EXIT
declare -i n=1
tmpfn="$(mktemp swap_XXX)"
:> "$tmpfn"
# swap lines in place with a tmpfile
while IFS=$'\n' read -r line; do
if ((n == l1)); then
printf "%s\n" "${a[1]}" >> "$tmpfn"
elif ((n == l2)); then
printf "%s\n" "${a[0]}" >> "$tmpfn"
else
printf "%s\n" "$line" >> "$tmpfn"
fi
((n++))
done < "$fn"
exit 0
If the line numbers to be swapped are fixed then you might want to try something like the sed command in the following example to have lines swapped in multiple files in-place:
#!/bin/bash
# prep test files
for f in a b c ; do
( for i in {1..30} ; do echo $f$i ; done ) > /tmp/$f
done
sed -i -s -e '14 {h;d}' -e '15 {N;N;N;N;N;N;N;N;N;N;G;x;d}' -e '26 G' /tmp/{a,b,c}
# -i: inplace editing
# -s: treat each input file separately
# 14 {h;d} # first swap line: hold ; suppress
# 15 {N;N;...;G;x;d} # lines between: collect, append held line; hold result; suppress
# 26 G # second swap line: append held lines (and output them all)
# dump test files
cat /tmp/{a,b,c}
(This is according to Etan Reisner's comment.)
If you want to swap two lines, you can send it through twice, you could make it loop in one sed script if you really wanted, but this works:
e.g.
test.txt: for a in {1..10}; do echo "this is line $a"; done >> test.txt
this is line 1
this is line 2
this is line 3
this is line 4
this is line 5
this is line 6
this is line 7
this is line 8
this is line 9
this is line 10
Then to swap lines 6 and 9:
sed ':a;6,8{6h;6!H;d;ba};9{p;x};' test.txt | sed '7{h;d};9{p;x}'
this is line 1
this is line 2
this is line 3
this is line 4
this is line 5
this is line 9
this is line 7
this is line 8
this is line 6
this is line 10
In the first sed it builds up the hold space with lines 6 through 8.
At line 9 it prints line 9 then prints the hold space (lines 6 through 8) this accomplishes the first move of 9 to place 6. Note: 6h; 6!H avoids a new line at the top of the pattern space.
The second move occurs in the second sed script it saves line 7 to the hold space, then deletes it and prints it after line 9.
To make it quasi-generic you can use variables like this:
A=3 && B=7 && sed ':a;'${A}','$((${B}-1))'{'${A}'h;'${A}'!H;d;ba};'${B}'{p;x};' test.txt | sed $(($A+1))'{h;d};'${B}'{p;x}'
Where A and B are the lines you want to swap, in this case lines 3 and 7.
if, you want swap two lines, to create script "swap.sh"
#!/bin/sh
sed -n "1,$((${2}-1))p" "$1"
sed -n "${3}p" "$1"
sed -n "$((${2}+1)),$((${3}-1))p" "$1"
sed -n "${2}p" "$1"
sed -n "$((${3}+1)),\$p" "$1"
next
sh swap.sh infile_name 14 26 > outfile_name

replace first line of files by another made by changing path from unix to windows

I am trying to do a bash script that:
loop over some files : OK
check if the first line matches this pattern (#!f:\test\python.exe) : OK
create a new path by changing the unix style to windows style : KO
Precisely,
From: \c\tata\development\tools\virtualenvs\test2\Scripts\python.exe
I want to get: c:\tata\development\tools\virtualenvs\test2\Scripts\python.exe
insert the new line by appending #! and the new path : KO
Follow is my script but I'm really stuck!
for f in $WORKON_HOME/$env_name/$VIRTUALENVWRAPPER_ENV_BIN_DIR/*.py
do
echo "----"
echo file=$f >&2
FIRSTLINE=`head -n 1 $f`
echo firstline=$FIRSTLINE >&2
unix_path=$WORKON_HOME/$env_name/$VIRTUALENVWRAPPER_ENV_BIN_DIR/python.exe
new_path=`echo $unix_path | awk '{gsub("/","\\\")}1'`
echo new_path=$new_path >&2
# I need to change the new_path by removing the first \ and adding : after the first letter => \c -> c:
new_line="#!"$new_path
echo new_line=$new_line >&2
case "$FIRSTLINE" in
\#!*python.exe* )
# Rewrite first line
sed -i '1s,.*,'"$new_line"',' $f
esac
done
Output:
file=/c/tata/development/tools/virtualenvs/test2/Scripts/pip-script.py
firstline=#!f:\test\python.exe
new_path=\c\tata\development\tools\virtualenvs\test2\Scripts\python.exe
new_line=#!\c\tata\development\tools\virtualenvs\test2\Scripts\python.exe
Line that is written in the file: (some weird characters are written I do not know why...)
#!tatadevelopment oolsirtualenvs est2Scriptspython.exe
Line I am expecting:
#!c:\tata\development\tools\virtualenvs\test2\Scripts\python.exe
sed is interpreting the backslashes and characters following them as escapes, so you're getting, e.g. tab. You need to escape the backslashes.
sed -i "1s,.*,${new_line//\\/\\\\}," "$f"

Resources