I have a script running in Linux which is simply multiple function calls whose output is redirected into a file. I have a function which produces an overview of important information that I would like to append to the top of the file for easy viewing.
The problem is that I cannot simply call this overview function first, because it depends on the previous functions.
Is there an easier way to do this without creating a temp file? This is a fairly large file, and that would take pretty long.
If you're using something like a Perl script, it should be possible to first leave some space at the top for the overview, then write all your data.
After this, reopen the file in read/write update mode (+< in Perl, so the existing contents are not truncated), move the filehandle to the desired position early in the file using the seek() or sysseek() functions, then write the overview data there.
Help on Perl functions can be obtained here: http://perldoc.perl.org/perlfunc.html
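The same "reserve space, then overwrite in place" idea can also be sketched with plain shell tools; this is only a rough illustration, and report.txt, run_the_other_functions and $overview are placeholder names:
# reserve a fixed-width header line up front (200 bytes is an arbitrary guess at the overview size)
printf '%-200s\n' '' > report.txt
run_the_other_functions >> report.txt
# later, overwrite the reserved space in place; conv=notrunc tells dd not to truncate the rest of the file
# (the overview must fit inside the reserved 200 bytes)
printf '%s' "$overview" | dd of=report.txt conv=notrunc 2>/dev/null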
First of all, perhaps you meant prepend, not append.
In Linux, suppose you want:
file1 -> BottomContent
file2 -> TopContent
You could use:
$ cat file2 file1 > finalfile; rm file[12]
@SandeepY has one reasonable solution.
Any time you modify a file in Unix, you're using a system that is opening your original file and a new file, so there's (almost always) a temporary file involved, whether you can see it or not.
That being said, since you specified that a function is providing the output, another solution is to use a process group to "marshal" your output into one stream, and redirect that into your file.
mv mainFile mainFile.tmp
{
    myFunc
    cat mainFile.tmp
} > mainFile && /bin/rm mainFile.tmp
As you seem to need this regularly, it should be easy to turn this into a function, replacing mainFile with "$1".
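A minimal sketch of such a function (prepend_overview is a placeholder name, and myFunc stands for whatever generates the overview):
prepend_overview() {
    mv "$1" "$1.tmp" || return
    {
        myFunc
        cat "$1.tmp"
    } > "$1" && /bin/rm "$1.tmp"
}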
IHTH
I have two folders where the 1st has 19 .fa files and the 2nd has 37096 .fa files
Files in the 1st folder are named BF_genomea[a-s].fa, and files in the 2nd are named [1-37096]ZF_genome.fa
I have to run this process where lastz filein1stfolder filein2ndfolder [arguments] > outputfile.axt, so that I run every file in the 1st folder against every file in the 2nd folder.
Any sort of output file naming would serve, as long as it identifies which particular combination of parent files each output came from, and the files have the extension .axt.
This is what I have done so far:
for file in /tibet/madzays/finch_data/BF_genome_split/*.fa; do for otherfile in /tibet/madzays/finch_data/ZF_genome_split/*.fa; name="${file##*/}"; othername="${otherfile##*/}"; lastz $file $otherfile --step=19 --hspthresh=2200 --gappedthresh=10000 --ydrop=3400 --inner=2000 --seed=12of19 --format=axt --scores=/tibet/madzays/finch_data/BFvsZFLASTZ/HoxD55.q > /home/madzays/qsub/test/"$name""$othername".axt; done; done
As I said in a comment, the inner loop is missing a do keyword (for otherfile in pattern; do <-- right there). Is this in the form of a script file? If so, you should add a shebang as the first line to tell the OS how to run the script. And break it into multiple lines and indent the contents of the loops, to make it easier to read (and easier to spot problems like the missing do).
Off the top of my head, I see one other thing I'd change: the output filenames are going to be pretty ugly, just the two input files mashed together with ".axt" on the end (along the lines of "BF_genomeac.fa14ZF_genome.fa.axt"). I'd parse the IDs out of the input filenames and then use them to build a more reasonable output filename convention. Something like this:
#!/bin/bash
for file in /tibet/madzays/finch_data/BF_genome_split/*.fa; do
    for otherfile in /tibet/madzays/finch_data/ZF_genome_split/*.fa; do
        name="${file##*/}"
        tmp="${name#BF_genomea}"                # remove filename prefix
        id="${tmp%.*}"                          # remove extension to get the ID
        othername="${otherfile##*/}"
        otherid="${othername%ZF_genome.fa}"     # just have to remove a suffix here
        lastz "$file" "$otherfile" --step=19 --hspthresh=2200 --gappedthresh=10000 --ydrop=3400 --inner=2000 --seed=12of19 --format=axt --scores=/tibet/madzays/finch_data/BFvsZFLASTZ/HoxD55.q > "/home/madzays/qsub/test/BF${id}_${otherid}ZF.axt"
    done
done
The code can be translated almost directly from your requirements:
base=/tibet/madzays/finch_data
for b in {a..s}
do
    for z in {1..37096}
    do
        lastz $base/BF_genome_split/BF_genomea${b}.fa $base/ZF_genome_split/${z}ZF_genome.fa --hspthresh=2200 --gappedthresh=10000 --ydrop=3400 --inner=2000 --seed=12of19 --format=axt --scores=$base/BFvsZFLASTZ/HoxD55.q > /home/madzays/qsub/test/${b}-${z}.axt
    done
done
Note that one-liners easily lead to errors, like missing do keywords, which are then hard to track down from the error message (error in line 1).
I am looking for a quick and dirty one-liner to sync only certain settings in remote config files. Need to preserve what's unique and sync generic settings. Example:
Config1.conf:
HOSTNAME=COMP1
IP=10.10.13.10
LOCATION=SITE_A
BUILDING=DEPT_IT
ROOM=COMP_LAB1
Remote-Config2.txt:
HOSTNAME=COMP2
IP=10.10.13.11
LOCATION=FOO
BUILDING=BAR
ROOM=BAZ
I need to sync (copy/replace) only the bottom 3 lines over ssh. The line numbers are predictable, by the way: always lines 4, 5 and 6 in this case.
Here's a working idea that is missing one piece (a standard replacement for the non-standard utility I used to replace the vars in the local conf):
for var in $(ssh root@10.10.8.12 'sed -n "4,6p" /etc/conf1.conf'); do <missing piece> ${var/=*}=${var/*=} local-conf.conf; done
So this uses variable expansion and a non-standard utility, but it needs something like a sed or Perl routine to replace the info in the local conf.
Update
The last line of code actually works. Tested and works! However -- the missing piece is a custom non-standard utility. I'm asking if someone can think of something, using standard Linux tools, to replace that.
One solution would be to take the left side and match, then replace the right side. This is basically what that utility does. Looks for the variable in the conf then sets it. Using variable expansion is one way (shown).
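For what it's worth, a plain-sed sketch of what that missing piece could look like (untested; it assumes GNU sed's -i, and that the values contain no |, & or backslash characters):
for var in $(ssh root@10.10.8.12 'sed -n "4,6p" /etc/conf1.conf'); do
    # replace the whole KEY=... line in the local conf with the remote KEY=VALUE line
    sed -i "s|^${var%%=*}=.*|${var}|" local-conf.conf
done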
Here's an alternative solution that does not require the command to have special knowledge of the file contents:
Take a copy of the files you want to sync. Then, in the copy, deliberately vandalise (arbitrarily modify) the lines you do not want synced. It doesn't matter what they say as long as there are the same number of lines and they'll never match the actual file contents. Have some fun. This becomes your base version. Your example might look like this:
HOSTNAME=foo
IP=bar
LOCATION=SITE_A
BUILDING=DEPT_IT
ROOM=COMP_LAB1
rsync the remote files into a temporary location. This is the remote version.
For each file, take a three-way diff.
diff3 -3 <localfile> <basefile> <remotefile>
The output of diff3 is an "ed script" that describes what edits to make to the local file so that it would look like the remote file.
The -3 option tells it to only output the non-conflicting differences. This is why we vandalised the base files in the first place: so those lines would have conflicts.
Once you have the ed script for a file, you can visually check it, if you choose, and then apply the update using patch:
cat <ed-script> | patch --ed <localfile>
So, to do this recursively, you might have:
cd $localdir
for file in `find . -type f`; do
diff3 -3 "$file" "$basedir/$file" "$remotedir/$file" | patch --ed "$file"
done
You probably need to add some checks that the base and remote files actually exist.
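One way to add those checks, reusing the same variable names (just a sketch):
cd "$localdir"
find . -type f | while read -r file; do
    # skip anything that has no base or remote counterpart
    [ -f "$basedir/$file" ] && [ -f "$remotedir/$file" ] || continue
    diff3 -3 "$file" "$basedir/$file" "$remotedir/$file" | patch --ed "$file"
done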
I need a text processing tool that can perform search and replace operations PER LINE on HUGE TEXT FILES (>0.5 GB). It can be either Windows or Linux based. (I don't know if there is anything like a streamreader/writer in Linux, but I have a feeling that it would be the ideal solution. The editors I have tried so far load the whole file into memory.)
Bonus question: a tool that can MERGE two huge texts on a per line basis, separated with e.g. tabs
Sounds like you want sed. For example,
sed 's/foo/bar/' < big-input-file > big-output-file
should replace the first occurrence of foo by bar in each line of big-input-file, writing the results to big-output-file.
Bonus answer: I just learned about paste, which seems to be exactly what you want for your bonus question.
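For example, something like this should join the two files line by line, separated by a tab (the file names are placeholders; paste streams its inputs rather than loading them into memory):
paste big-input-1 big-input-2 > merged-output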
'sed' is built into Linux/Unix, and is available for Windows. I believe that it only loads a buffer at a time (not the whole file) -- you might try that.
What would you be trying to do with the merge -- interleaved in some way, rather than just concatenating?
Add: interleave.pl
use strict;
use warnings;
my $B;
open INA, $ARGV[0];
open INB, $ARGV[1];
while (<INA>) {
print $_;
$B = <INB>;
print $B;
}
close INA;
close INB;
run: perl interleave.pl fileA fileB > mergedFile
Note that this is a very bare-bones utility. It does not check if the files exist, and it expects that the files have the same number of lines.
I would use Perl for this. It is easy to read a file line by line, it has great search/replace available using regular expressions, and it will enable you to merge, since you can make your Perl script aware of both files.
I found myself quite stumped. I am trying to output data from a script to a file.
However, I need to keep only the last 10 values, so simply appending won't work.
The main script returns one line, so I save it to a file. I use tail to get the last 10 lines and process them, but then I get to the point where the file is too big, because I keep appending lines to it (the script outputs a line every minute or so, which grows the log quite fast).
I would like to limit the number of writes that I do on that script, so I can always have only the last 10 lines, discarding the rest.
I have thought about different approaches, but they all involve a lot of activity, like creating temp files, deleting the original file and creating a new file with just the last 10 entries; it feels inelegant and amateurish.
Is there a quick and clean way to query a file, so I can add lines until I hit 10 lines, and then start to delete the oldest lines and add the new ones at the bottom?
Maybe things are easier than I think, and there is a simple solution that I cannot see.
Thanks!
In general, it is difficult to remove data from the start of a file. The only way to do it is to overwrite the file with the tail that you wish to keep. It isn't that ugly to write, though. One fairly reasonable hack is to do:
{ rm file; tail -9 > file; echo line 10 >> file; } < file
This will retain the last 9 lines and add a 10th line. There is a lot of redundancy, so you might like to do something like:
append() { test -f "$1" && { rm "$1"; tail -9 > "$1"; } < "$1"; cat >> "$1"; }
And then invoke it as:
echo 'the new 10th line' | append file
Please note that this hack of using redirecting input to the same file as the later output is a bit fragile and obscure. It is entirely possible for the script to be interrupted and delete the file! It would be safer and more maintainable to explicitly use a temporary file.
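A sketch of that safer, temp-file variant (logfile and $new_line are placeholder names):
tail -n 9 logfile > logfile.tmp &&
printf '%s\n' "$new_line" >> logfile.tmp &&
mv logfile.tmp logfile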
Problem: I have two folders (one is the Delta folder, where the files get updated, and the other is the Original folder, where the original files exist). Every time a file updates in the Delta folder, I need to merge the file from the Original folder with the updated file from the Delta folder.
Note: although the file names in the Delta folder and the Original folder are the same, the content in the files may be different. For example:
$ cat Delta_Folder/1.properties
account.org.com.email=New-Email
account.value.range=True
$ cat Original_Folder/1.properties
account.org.com.email=Old-Email
account.value.range=False
range.list.type=String
currency.country=Sweden
Now I need to merge Delta_Folder/1.properties with Original_Folder/1.properties, so my updated Original_Folder/1.properties will be:
account.org.com.email=New-Email
account.value.range=True
range.list.type=String
currency.country=Sweden
The solution I opted for is:
find all *.properties files in the Delta folder and save the list to a temp file (delta-files.txt).
find all *.properties files in the Original folder and save the list to a temp file (original-files.txt).
then I get the list of files that appear in both folders and put those in a loop.
then I loop over each file to read each line from a property file (1.properties).
then I read each line (delta-line="account.org.com.email=New-Email") from a property file of the Delta folder and split the line on the delimiter "=" into two string variables.
(delta-line-string1=account.org.com.email; delta-line-string2=New-Email;)
then I read each line (orig-line="account.org.com.email=Old-Email") from a property file of the Original folder and split the line on the delimiter "=" into two string variables.
(orig-line-string1=account.org.com.email; orig-line-string2=Old-Email;)
if delta-line-string1 == orig-line-string1 then update $orig-line with $delta-line
i.e.:
if account.org.com.email == account.org.com.email then replace
account.org.com.email=Old-Email in Original_Folder/1.properties with
account.org.com.email=New-Email
Once the loop finishes all lines in a file, it moves on to the next file. The loop continues until it has finished all the files in the list.
For looping I used for loops, for splitting the line I used awk, and for replacing content I used sed.
Overall it works fine, but it takes a long time (about 4 minutes) to finish each file, because it goes through three loops for every line, splitting the line, finding the variable in the other file, and replacing the line.
I am wondering if there is any way to reduce the loops so that the script executes faster.
With paste and awk:
File 2:
$ cat /tmp/l2
account.org.com.email=Old-Email
account.value.range=False
currency.country=Sweden
range.list.type=String
File 1:
$ cat /tmp/l1
account.org.com.email=New-Email
account.value.range=True
The command + output:
paste /tmp/l2 /tmp/l1 | awk '{print $NF}'
account.org.com.email=New-Email
account.value.range=True
currency.country=Sweden
range.list.type=String
Or with a single awk command, if sorting is not important:
awk -F'=' '{arr[$1]=$2}END{for (x in arr) {print x"="arr[x]}}' /tmp/l2 /tmp/l1
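Applied to the files from the question, that could look like this (the .merged name is just a placeholder; the delta file is listed last so its values win):
awk -F'=' '{arr[$1]=$2} END{for (x in arr) print x"="arr[x]}' Original_Folder/1.properties Delta_Folder/1.properties > 1.properties.merged &&
mv 1.properties.merged Original_Folder/1.properties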
I think your two main options are:
Completely reimplement this in a more featureful language, like perl.
While reading the delta file, build up a sed script. For each line of the delta file, you want a sed instruction similar to:
s/account.org.com.email=.*$/account.org.com.email=value_from_delta_file/g
That way you don't loop through the original files a bunch of extra times. Don't forget to escape &, / and \ as mentioned in this answer.
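A hedged sketch of that approach in the shell (file names follow the example in the question; it assumes the property values contain no characters special to sed, and uses | as the s/// delimiter):
# build one sed script from the delta file, then apply it to the original file in a single pass
while IFS='=' read -r key value; do
    printf 's|^%s=.*|%s=%s|\n' "$key" "$key" "$value"
done < Delta_Folder/1.properties > update.sed
sed -i -f update.sed Original_Folder/1.properties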
Is using a database at all an option here?
Then you would only have to write code for extracting data from the Delta files (assuming that can't be replaced by a database connection).
It just seems like this is going to keep getting more complicated and slower as time goes on.