Create an applescript from a perl and a ruby script? - applescript

I download protein sequences from http://ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi#mainform, and the download is saved with the filename FASTA.FA. Each protein in the file has one description line, followed by the protein sequence, which is wrapped onto a new line after every 70 characters.
Example:
>CAA47401 B/Yamagata/16/88 1988// NA
MLPSTIQTLTLFLTSGGVLLSLYVSASLSYLLYSDILLKFSPTEITAPKVPLDCANASNVQAVNRSATKG
MTLLLSEPEWTYPRLSCQGSTFQKALLISPHRFGESRGNSAPLIIREPFIACGPKECKHFALTHYAAQPG
>AAB26739 Influenza B virus 1973// NA
MLPSTIQTLTLFLTSGGVLLSLYVSASLSYLLYSDILLKFSPTKITAPTMSLDCANVSNVQAVNRSATKE
DVPCIGIEMVHDGGKETWHSAATAIYCLMGSGQLLWDIVTGVAMAL
I have a ruby script that converts this to a file that fits Excel better, where the first line gets one cell and the entire protein sequence gets another cell (the script puts a tab between them, and Excel puts tab-separated values into separate cells).
This is my script:
ruby -e 'first_line = true; while line = STDIN.gets; line.chomp!;
if line =~ /^>/; puts unless first_line; print line[1..-1]; print "\t";
else; print line; end; first_line = false; end; puts' < ~/Downloads/FASTA.fa > ~/Downloads/Sequences.xls
On the website where I download the files, you can change how the first line is formatted. I put a "+" between each description field, and I then have a perl script that converts each "+" to a tab (some descriptions contain spaces, so I can't use a space as the separator).
perl -p -i -e "s/\+/\t/g" ~/Downloads/Sequences.xls
These two hacks successfully create a nice Excel file for me, and I have made an Automator program from these two scripts that sits in my dock.
However, now my group wants me to create an AppleScript out of this. If I have understood this correctly, it's not as simple as typing "do shell script" and pasting in the script; you have to format the script itself so that AppleScript understands it. Could anyone please help me create this?
Thanks!

I'd say you should make the ruby script into a proper .rb file, and while you're at it, use Ruby to do the tab-replacement (or use the Spreadsheet gem, if you want fancier Excel output, though that might require some setup on other people's computers). Or do it all in Perl instead. Just to save you the trouble of spinning up two runtimes for such trivial search/replace stuff. For that matter you can do it all in sed, I'm sure.
Anyway, once you have the script as a file, you can make a new script in AppleScript Editor and save it as a script bundle. Then you can include the scripts "inside" your AppleScript. That way, you know where the scripts are, and they're much, much easier to edit if need be. Then you can use do shell script to invoke the script with the proper arguments. Simplified example:
-- get script path
set rb to POSIX path of (path to me) & "Contents/Resources/Scripts/convert.rb"
-- run script
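-- "quoted form of" escapes the path safely, even if it contains spaces or apostrophes
do shell script "ruby " & quoted form of rb & " < inputfile.fa > outputfile.xls"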
If you want, you can even get some drag-and-drop conversion going instead of hard-coding a path to ~/Downloads/FASTA.fa. If someone forgets to move or delete an old download, the script will keep converting that one and overwriting the same xls, since a new download would be named "FASTA-1.fa" or something similar. So avoid hard-coded paths if you can.
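The single-script approach suggested above could look roughly like the following. This is a minimal sketch, not the asker's actual script: the function name fasta_to_tsv is made up for illustration, and it assumes the "+" separators appear only in the description lines.

```ruby
# Hypothetical helper: turns FASTA text into tab-separated rows,
# one row per protein (description fields, then the full sequence).
def fasta_to_tsv(fasta_text)
  rows = []
  fasta_text.each_line do |line|
    line = line.chomp
    next if line.empty?
    if line.start_with?(">")
      # Description line: drop the ">" and turn "+" separators into tabs.
      rows << [line[1..-1].gsub("+", "\t"), String.new]
    elsif rows.any?
      # Sequence line: append to the sequence of the current protein.
      rows.last[1] << line
    end
  end
  rows.map { |description, sequence| "#{description}\t#{sequence}\n" }.join
end

sample = ">CAA47401+B/Yamagata/16/88+1988//+NA\nMLPSTIQTLT\nMTLLLSEPEW\n"
print fasta_to_tsv(sample)
```

In a convert.rb file invoked as in the AppleScript example, the last line would instead be something like print fasta_to_tsv(STDIN.read), so the shell redirections still supply the input and output files.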

Related

Building .exe via shell causes extra character in file name

I'm trying to create a shell script via cygwin that will automatically build an executable and run it. It's a very simple format of
#!/bin/bash
gcc test.c -o hello
./hello.exe
When I enter the 2nd and 3rd lines separately, everything works normally. However, if I save those 3 lines into a .sh file, the resulting .exe built has some extra character added in that will always throw off the last line.
hello.exe
I can't even replicate the file name because no tool, including the character map/MS word/other ASCII tools online will give me any result. Some online tool gave me the ASCII result &#61453, but as far as I can tell that doesn't correspond to anything meaningful. How can I avoid this problem in my shell script?
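Very likely you have Windows line endings (CRLF) in the .sh file, so the carriage return at the end of the gcc line becomes part of the output file name. Make sure the file uses Unix line endings (LF).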
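One way to check for and remove those line endings is sketched below in Ruby. The helper names crlf? and to_unix_line_endings are invented for this example; tools such as dos2unix or an editor's line-ending setting do the same job.

```ruby
# Hypothetical helpers for spotting and removing Windows line endings.
def crlf?(text)
  text.include?("\r\n")
end

def to_unix_line_endings(text)
  text.gsub("\r\n", "\n")
end

script = "#!/bin/bash\r\ngcc test.c -o hello\r\n./hello.exe\r\n"
puts crlf?(script)                        # the stray "\r" is what ends up in the file name
puts crlf?(to_unix_line_endings(script))
```

To fix a file in place, something like File.binwrite(path, to_unix_line_endings(File.binread(path))) should work; binary mode avoids any automatic newline translation.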

How do I read a file in Ruby using a specific script?

I am doing a series of tutorials on how to code in Ruby. I want to read a .txt file using this formula:
filename = ARGV.first
prompt = "> "
txt = File.open(filename)
puts "Here's your file: #{filename}"
puts txt.read()
puts "I'll also ask you to type it again:"
print prompt
file_again = STDIN.gets.chomp()
txt_again = File.open(file_again)
puts txt_again.read()
The text file reads:
This is stuff I typed into a file. It is really cool stuff.
Lots and lots of fun to have in here.
The name of the text file is ex15_sample.txt. I tried the above formula, and nothing seems to work. I have a hard time understanding how to use both ARGV and STDIN.gets.chomp.
What should I do? I ask that you use the formula above; this stuff is a little confusing, so for now, just use the formula above.
The script works. You're not explaining how you're trying to run the script or what errors you're seeing, so it's a bit hard to help you.
If you have a text file named ex15_sample.txt in the same directory as your script (let's call it script.rb), and if you have Ruby set up properly, then if you run it with
$ ruby script.rb ex15_sample.txt
everything should work fine.
If you're trying to change the first line to always use ex15_sample.txt, be sure to put it in quotes:
filename = "ex15_sample.txt" # Without the quotes, you'll get an error.
Again, it's hard to help you without knowing exactly how you're running the script or what errors you're getting.
Update: It seems your issue is that you aren't clear on how to run a Ruby script. The simplest way is to go to your system's command prompt and type ruby, then a space, then the name of the file with the Ruby script in it. If your script is in a file named script.rb, you would type ruby script.rb. That won't work if your script is in a file with a different name. If the script is in a file named read-a-file.rb, then you need to type ruby read-a-file.rb.
This particular script wants a command line argument after the file name. If the text file you want to read is in a file named ex15_sample.txt, then you need to type that after the script name. In the previous example, the command would become ruby read-a-file.rb ex15_sample.txt. That will only work if the files are in the same directory (a.k.a. folder).
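To make the role of ARGV more visible, here is a small sketch that reports what went wrong instead of failing with a stack trace. The helper name read_named_file and the messages are made up for this example; the original script does not contain them.

```ruby
# Hypothetical helper: returns the file's text, or a hint about what went wrong.
def read_named_file(args)
  filename = args.first # ARGV.first is the first word typed after the script name
  return "Usage: ruby script.rb FILENAME" if filename.nil?
  return "No such file: #{filename}" unless File.exist?(filename)

  File.read(filename)
end

puts read_named_file(ARGV)
```

Run as ruby script.rb ex15_sample.txt, this prints the file; run without an argument, it prints the usage hint. STDIN.gets, by contrast, waits for a line typed while the script is already running.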

Get Input for a bash script by capturing it from a VIM session

I am creating a new CLI application where I want to get some sensitive input from the user. Since this input can be quite descriptive and the information is a bit sensitive, I want to let the user enter a command like this:
app new entry
after which I want to give the user a VIM session where they can write this descriptive input. When they exit the VIM session, the input should be captured by my script and used for further processing.
Can someone tell me a way (probably some hidden VIM feature, since I am always amazed by them) to do this without creating any temporary file? As explained in a comment below, I would prefer a somewhat in-memory file, since the information can be a bit sensitive. I would like to process it first via my script and only then write it to disk in an encrypted manner.
Git actually does this: when you type git commit, a new Vim instance is created and a temporary file is used in this instance. In that file, you type your commit message.
Once Vim gets closed again, the content of the temporary file is read and used by Git. Afterwards, the temporary file gets deleted again.
So, to get what you want, you need the following steps:
create a unique temporary file (Create a tempfile without opening it in Ruby)
open Vim on that file (Ruby, Difference between exec, system and %x() or Backticks)
wait until Vim gets terminated again (also contained in the above SO thread)
read the temporary file (How can I read a file with Ruby?)
delete the temporary file (Deleting files in ruby)
That's it.
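The steps above can be sketched in Ruby as follows. This is only an illustration under stated assumptions: the function name capture_from_editor is made up, and the approach does write the text to a temporary file on disk, which is exactly what the question hoped to avoid.

```ruby
require "tempfile"

# Hypothetical helper: opens the given editor command on a fresh temporary
# file, waits for it to exit, and returns whatever was saved.
def capture_from_editor(*editor_command)
  Tempfile.create("entry") do |file|
    file.close # the editor opens the file by path itself
    system(*editor_command, file.path) or raise "editor exited with an error"
    File.read(file.path)
  end # Tempfile.create deletes the file when the block ends
end

# Example call (opens an interactive Vim session, so it is left commented out):
# text = capture_from_editor("vim")
```

Kernel#system blocks until the editor exits, which covers the "wait until Vim gets terminated" step without extra code.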
You can make the shell create a file descriptor attached to your function and have vim write there. You need to split the script into two parts, one that calls vim and one that processes its input:
# First script
…
vim --cmd $'aug ScriptForbidReading\nau BufReadCmd /proc/self/fd/* :' --cmd 'aug END' >(second-script)
Notes:
second-script might actually be a function defined in first script (at least in zsh). This also requires bash or zsh (tested only on the latter).
Requires *nix, maybe won’t work on some OSes considered to be *nix.
BufReadCmd is needed because vim hangs when trying to read write-only descriptor.
It is suggested that you set filetype (if needed) right away, without using ftdetect plugins: in case your script is not the only one which will use this method.
Zsh will wait for second-script to finish, so you may continue script right after vim command in case information from second-script is not needed (it would be hard to get from there).
Second script will be launched from a subshell. Thus no variable modifications will be seen in code running after vim call.
Second script will receive whatever vim saves on standard input. Parent standard input is not directly accessible, but using </dev/tty will probably work.
This is for zsh/bash script. Nothing will really prevent you from using the same idea in ruby (it is likely more convenient and does not require splitting into two scripts), but I do not know ruby enough to say how one can create file descriptors in ruby.
Using vim for this seems like overkill.
The highline ruby gem might do what you need:
require 'highline'
irb> pw = HighLine.new.ask('info: ') {|q| q.echo = false }
info:
=> "abc"
The user's text is not displayed when you set echo to false.
This is also safer than creating a file and then deleting it, because then you'd have to ensure that the delete was secure (overwriting the file several times with random data so it can't be recovered; see the shred or srm utilities).

Is there a script I can run to remove all the hard (carriage) returns in a .txt file?

I have a .txt file (Mac OS X Snow Leopard) that has a lot of text. At the end of each paragraph, there is a hard return that moves the next paragraph onto another line. This is causing some issues with getting the content into my db, so I am wondering if there is any way I can remove the hard returns. Is there some sort of script I can run? I am really hoping I don't have to go through and take the hard returns out manually.
To recap, here is what it looks like now:
This is some text. Text is what this is.
And then this is the next paragraph that is on a different line.
And this is what I would like to get to:
This is some text. Text is what this is. And then this is the next paragraph that is on a different line.
For all several thousand lines in my .txt file.
Thanks!
EDIT:
The text I am dealing with in my txt file is actually HTML:
 <span class="text">1 </span> THis is where my text is<br/>
And when I run the cat command in terminal like mentioned below, only the first is there. Everything else is missing...
In a terminal:
cat myfile.txt | tr -d '\r' > file2.txt
There's probably a more efficient way to do this, since the "tr -d '\r'" is the active ingredient, but that's the idea.
I normally just use an editor with good Regular Expression support. TextWrangler is great.
An end of line in TextWrangler is \r, so to remove it, just search for \r and replace it with a space. TBH, I always wondered how it handles CRLF-encoded files, but somehow it works.
I believe you can do this with AppleScript. Unfortunately I'm not familiar with it; however, the following should help you accomplish this (it's for a different problem, but it will lead you in the direction you need to go): http://macscripter.net/viewtopic.php?id=18762
Alternatively if you didn't want to do this with Applescript and have Excel installed (or access to a machine with it) then the following should help: http://www.mrexcel.com/forum/showthread.php?t=474054
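In a Linux terminal, tr -d "\r\n" < file.txt > newfile.txt will do. Note that this deletes every carriage return and newline, so the text is joined without spaces. Modify the \r\n part to remove the desired characters.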
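If the paragraphs should be joined with a space rather than run together, a short Ruby sketch could look like this. The function name join_paragraphs is invented for illustration, and it assumes the whole file fits comfortably in memory.

```ruby
# Hypothetical helper: replaces every hard return (CRLF, CR, or LF)
# with a single space and drops blank lines.
def join_paragraphs(text)
  text.split(/\r\n|\r|\n/).reject(&:empty?).join(" ")
end

sample = "This is some text. Text is what this is.\n" \
         "And then this is the next paragraph that is on a different line.\n"
puts join_paragraphs(sample)
```

Splitting on all three line-ending styles matters because older Mac files may use a bare carriage return as the line separator.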

key logging in unix

I am a newbie to unix scripting, I want to do following and I have little clue how to proceed.
I want to log the input and output of certain set of commands, given on the terminal, to a trace file. I should be able to switch it on and off.
E.g.
switch trace on
user:echo Hello World
user:Hello World
switch trace off
Then the content of the trace log file, e.g. trace.log, should be
echo Hello World
Hello World
One thing that I can think to do is to use set -x, redirecting its output to some file, but couldn't find a way to do that. I did man set, or man -x but I found no entry. Maybe I am being too naive, but some guidance will be very helpful.
I am using bash shell.
See script(1), "make typescript of terminal session". To start a new transcript in file xyz: script xyz. To add on to an existing transcript in file xyz: script -a xyz.
There will be a few overhead lines, like Script started on ... and Script done on ..., which you could use awk or sed to filter out on printout. In the Linux (util-linux) version of script, the -t switch records timing data that scriptreplay can use for a realtime playback.
I think there might have been a recent question regarding how to display a transcript in less, and although I can't find it, this question and this one address some of the same issues of viewing a file that contains control characters. (Captured transcripts often contain ANSI control sequences and usually contain Returns as well as Linefeeds.)
Update 1 A Perl program script-declutter is available to remove special characters from script logs.
The program is about 45 lines of code found near the middle of the link. Save those lines of code in a file called script-declutter, in a subdirectory that's on your PATH (for example, $HOME/bin if that's on your search path, else (eg) /usr/local/bin) and make the file executable. After that, a command like
script-declutter typescript > out
will remove most special characters from file typescript,
while directing the result to file out.
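For a rough idea of what such a cleanup involves, here is a small Ruby sketch. It is not the script-declutter program mentioned above: the function name clean_typescript is made up, and it only strips ANSI CSI escape sequences, carriage returns, and the two overhead lines, so other control characters (backspaces, for example) are left in place.

```ruby
# Matches ANSI CSI sequences such as "\e[1m" or "\e[0;32m".
ANSI_CSI = /\e\[[0-?]*[ -\/]*[@-~]/

# Hypothetical helper: removes escape sequences, carriage returns,
# and the "Script started on"/"Script done on" overhead lines.
def clean_typescript(text)
  text.gsub(ANSI_CSI, "")
      .delete("\r")
      .lines
      .reject { |line| line.start_with?("Script started on", "Script done on") }
      .join
end

raw = "Script started on Mon\n\e[1mecho\e[0m Hello World\r\nHello World\r\nScript done on Mon\n"
print clean_typescript(raw)
```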
