Directory argument for Bowtie2 - bioinformatics

I'm using the software Bowtie 2 which aligns genome sequences. I have all my indexes in a directory miniReference1.
When I call Bowtie2 with the -x <dir> option I get the error that my index is not a Bowtie 2 index. What am I doing wrong?

The -x argument takes the path to the basename of the index, not a directory. That means that if the index files are named miniReference1.{suffix} and sit inside a folder also called miniReference1, the argument must be -x path/to/miniReference1/miniReference1. There is no need to set the BOWTIE2_INDEXES environment variable; just use the plain path.
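As a rough illustration of the basename rule, here is a minimal shell sketch. The helper name bt2_basename is made up for this example, and it assumes a small index whose files end in .1.bt2 (large indexes use a .bt2l suffix and are not handled). It creates empty placeholder files rather than a real index, so it only shows how the -x value is derived.

```shell
# Hypothetical helper: print the value to pass to -x for an index directory.
bt2_basename() {
  for f in "$1"/*.1.bt2; do
    case $f in
      *.rev.1.bt2) continue ;;   # skip the reverse index file
    esac
    [ -e "$f" ] || continue      # the glob matched nothing
    printf '%s\n' "${f%.1.bt2}"  # strip the final .1.bt2
    return 0
  done
  return 1
}

# Placeholder layout mirroring the question (empty files, not a real index).
tmp=$(mktemp -d)
mkdir "$tmp/miniReference1"
for s in 1 2 3 4 rev.1 rev.2; do
  touch "$tmp/miniReference1/miniReference1.$s.bt2"
done

index=$(bt2_basename "$tmp/miniReference1")
echo "$index"   # path ending in miniReference1/miniReference1
rm -rf "$tmp"
```

The printed value is what would be handed to bowtie2 as -x "$index", alongside your usual read and output options.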

I had a similar issue running bowtie 2.5.0 against index files built by bowtie 2.4.5. The error message shows readU: No such file or directory even though the files are there.
I rebuilt the indexes using bowtie 2.5.0 and the issue was resolved. So I'd suggest rebuilding the indexes using the same version as the aligner, and then running it again.

I'm quoting the relevant part of the manual:
-x           The basename of the index for the reference genome. The basename is the name of any of the index files up to but not including the final .1.bt2 / .rev.1.bt2 / etc. bowtie2 looks for the specified index first in the current directory, then in the directory specified in the BOWTIE2_INDEXES environment variable. [Emphasis by me.]
After reading this, I assume that what was missing is to tell BowTie2 the directory it is supposed to look in by setting the environment variable BOWTIE2_INDEXES. In bash you would do that by issuing the command
export BOWTIE2_INDEXES=miniReference1
If that doesn't help try to provide the absolute path. It is unclear to me whether Bowtie2 expects a Posix path name like /c/Users/raghdad/Documents/project-x/sequences/miniReference1 or a Windows path name like C:\Users\.... (pwd in your terminal gives you the Posix name for the current directory).
The -x file base name argument, miniReference1, is correct but does not name a directory; instead, it tells BowTie2 the set of indexes you want to work with, which is identified by the common "base name" of the files (the part up to the first dot). The reason for this scheme is probably that you could have different sets of indexes in the same directory, and each set would have a distinct and unique base name. Of course your way of putting each set in its own directory appears to be much cleaner, unless there is a reason to assess several index sets in one go.


Duplicated output when using: find `pwd` .

I am trying to find some files and get the absolute path.
If I use: find `pwd` .
I get the files with absolute path but I also get them from ./
If I use: find `pwd` then I just get the files once.
Why is that happening?
Arguments given to find which precede any options, actions or arguments thereto are parsed as locations from which to start a search. (The POSIX standard doesn't require that find operate at all when not passed at least one such location, though GNU's version does so anyhow by treating . as a default starting location if none are given).
When you instruct find to start from the same location twice by passing it two different paths that refer to the same place, you're thus telling it to run two separate searches starting at the same place -- so if the set of files doesn't change between when the first search runs and the second one, you get the same results twice.
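The effect can be reproduced in a scratch directory. This is a small sketch using a temporary directory and two made-up file names; it counts the lines find prints with one and with two starting points.

```shell
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt"
old_pwd=$PWD
cd "$tmp"

# Two starting points that name the same directory: two searches.
twice=$(find "$(pwd)" . -type f | wc -l | tr -d ' ')
# One starting point: one search, absolute paths only.
once=$(find "$(pwd)" -type f | wc -l | tr -d ' ')

echo "two starting points: $twice lines; one starting point: $once lines"
cd "$old_pwd"
rm -rf "$tmp"
```

With two files present, the first command lists each file twice (once with an absolute path, once relative to .), while the second lists each file once.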

autoconf: how do I substitute the library prefix?

CLISP's interface to PARI is configured with the configure.in containing AC_LIB_LINKFLAGS([pari]) from lib-link.m4.
The build process also requires the Makefile to know where the datadir of PARI is located. To this end, Makefile.in has
prefix = @LIBPARI_PREFIX@
DATADIR = @datadir@
and expects to find $(DATADIR)/pari/pari.desc (normally
/usr/share/pari/pari.desc or /usr/local/share/pari/pari.desc).
This seems to work on Mac OS X where PARI is installed by homebrew in /usr/local (and LIBPARI_PREFIX=/usr/local), but not on Ubuntu, where PARI is in /usr, and LIBPARI_PREFIX is empty.
How do I insert the location of the PARI's datadir into the Makefile?
PS. I also asked this on the autoconf mailing list.
PPS. In response to @BrunoHaible's suggestion, here is the meager attempt at debugging on Linux (where LIBPARI_PREFIX is empty).
$ bash -x configure 2>&1 | grep found_dir
+ found_dir=
+ eval ac_val=$found_dir
+ eval ac_val=$found_dir
You are trying to use $(prefix) in an unintended way. In an Autotools-based build system, the $(prefix) represents a prefix to the target installation location of the software you're building. By setting it in your Makefile.in, you are overriding the prefix that configure will try to assign. However, since you appear not to have any installation targets anyway, at least at that level, that's probably more an issue of poor form than a cause for malfunction.
How do I insert the location of the PARI's datadir into the Makefile?
I'd recommend computing or discovering the needed directory in your configure script, and exporting it to the generated Makefile via its own output variable. Let's take the second part first, since it's simple. In configure.in, having in some manner located the wanted data directory and assigned it to a variable
DATADIR=...
you would make an output variable of it via the AC_SUBST macro:
AC_SUBST([DATADIR])
Since you are using only Autoconf, not Automake, you would then manually receive that into your Makefile by changing the assignment in your Makefile.in:
DATADIR = @DATADIR@
Now, as for locating the data directory in the first place, you have to know what you're trying to implement before you can implement it. From your question and followup comments, it seems to me that you want this:
1. Use a data directory explicitly specified by the user if there is one. Otherwise,
2. look for a data directory relative to the location of the shared library. If it's not found there then
3. (optional) look under the prefix specified to configure, or specifically in the specified datadir (both of which may come from the top-level configure). Finally, if it still has not been found then
4. look in some standard locations.
To create a configure option by which the user can specify a custom data directory, you would probably use the AC_ARG_WITH macro, maybe like this:
AC_ARG_WITH([pari-datadir], [AS_HELP_STRING([--with-pari-datadir],
[explicitly specifies the PARI data directory])],
[], [with_pari_datadir=''])
Thanks to @BrunoHaible, we see that although the Gnulib manual does not document it, the macro's internal documentation specifies that if AC_LIB_LINKFLAGS locates libpari then it will set LIBPARI_PREFIX to the library directory prefix. You find that that does work when the --with-libpari option is used to give it an alternative location to search, so I suggest working with that. You certainly can try to debug AC_LIB_LINKFLAGS to make it set LIBPARI_PREFIX in all cases in which the lib is found, but if you don't want to go to that effort then you can work around it (see below).
Although the default or specified installation prefix is accessible in configure as $prefix, I would suggest instead going to the specified $datadir. That is slightly tricky, however, because by default it refers to the prefix indirectly. Thus, you might do this:
eval "datadir_expanded=${datadir}"
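To see why the indirection matters, here is a small standalone sketch. It assumes Autoconf's usual defaults, where datadir is '${datarootdir}' and datarootdir is '${prefix}/share'; the prefix value is picked for illustration. A single eval resolves only one level, so the sketch repeats the eval until no ${...} reference is left.

```shell
# Values mimicking Autoconf's defaults (assumed here for illustration).
prefix=/usr/local
datarootdir='${prefix}/share'
datadir='${datarootdir}'

# One eval resolves one level only: the result still mentions ${prefix}.
eval "one_level=${datadir}"

# Re-evaluate until no ${...} reference remains.
datadir_expanded=$datadir
while case $datadir_expanded in *'${'*) true ;; *) false ;; esac; do
  eval "datadir_expanded=\"$datadir_expanded\""
done

echo "after one eval:  $one_level"
echo "fully expanded:  $datadir_expanded"
```

In a real configure run, prefix may still hold its placeholder value at this point, so a default might have to be substituted before expanding.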
Finally, you might hardcode a set of prefixes such as /usr and /usr/local.
Following on from all the foregoing, then, your configure.in might do something like this:
DATADIR=
for d in \
${with_pari_datadir} \
${LIBPARI_PREFIX:+${LIBPARI_PREFIX}/share/pari} \
${datadir_expanded}/pari \
/usr/local/share/pari \
/usr/share/pari
do
AS_IF([test -r "$[]d/pari.desc"], [DATADIR="$[]d"; break])
done
AS_IF([test x = "x$DATADIR"], [AC_MSG_ERROR([Could not identify PARI data directory])])
AC_SUBST([DATADIR])
Instead of guessing the location of datadir, why don't you ask PARI/GP where its datadir is located? Namely,
$ echo "default(datadir)" | gp -qf
"/usr/share/pari"
does the trick.

Path to bash source?

We're seeing a situation where this:
% . setup.sh
sources a different file (in a different directory) than
% . ./setup.sh
Is there some sort of path that affects the '.' command?
Arguments to source that don't contain a / are subject to PATH lookup.
If bash is not in POSIX mode, and it cannot find the requested file on your PATH, then the current directory is searched as well (which can lead to the impression that path lookup is not performed in the first place).
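A quick way to observe the lookup is a sketch like the following. The directory names on_path and work and the variable which_setup are invented for the demonstration; each copy of setup.sh records which one was sourced.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/on_path" "$tmp/work"
echo 'which_setup=on_path' > "$tmp/on_path/setup.sh"
echo 'which_setup=work' > "$tmp/work/setup.sh"

old_path=$PATH
old_pwd=$PWD
PATH="$tmp/on_path:$PATH"
cd "$tmp/work"

# No slash in the argument: the file is looked up on PATH first.
. setup.sh
from_plain=$which_setup

# A slash in the argument: no PATH lookup, the named file is used.
. ./setup.sh
from_dot_slash=$which_setup

cd "$old_pwd"
PATH=$old_path
rm -rf "$tmp"
echo "plain: $from_plain, dot-slash: $from_dot_slash"
```

Here the plain form picks up the copy found on PATH, while the ./ form picks up the copy in the current directory.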

Bash/shell/OS interpretation of . and .. — can I define ...?

How do . and .., as paths (vs. ranges, e.g., {1..10}, which I'm not concerned with), really work? I know what they do, and use them all the time, but don't fully grasp how/where they're interpreted. Does the shell handle them? The interpreting process? The OS?
The reason why I'm asking is that I'd like to be able to use ... to refer to ../.., .... to refer to ../../.., etc. (up to some small finite number; I don't need bash to process an arbitrarily large number of dots). I.e., if my current directory is /tmp/let/me/out, and I call cd ..., my resulting current directory should be /tmp/let. I don't particularly care if ... etc. show up in ls -a output like . and .. do, but I would like to be able to call cat /tmp/let/me/out/..../phew.txt to print the contents of /tmp/phew.txt.
Pointers to relevant documentation appreciated as well as direct answers. This kind of syntax question is very hard to Google.
I'm using bash 4.3.42, by the way, with the autocd and globstar shell options.
. and .. are genuine directory names. They are not shortcuts, aliases, or anything fake.
They happen to point to the same inode as the other name you use. A file or directory can have several names pointing to the same inode, these are usually known as hard links, to distinguish them from symbolic (or soft) links.
If you are on Linux or OS X you can use stat to look at most of the inode metadata - it is what ls looks at. You will see there is an inode number. If you stat . and stat current-directory-name you will see that number is the same.
The one thing that is not held in the inode is the filename - that is held in the directory.
So . and .. reside in the directory on the file system, they are not a figment of the shell's imagination. So, for example, I can use . and .. quite happily from C.
I doubt you can change them - personally I have never tried and I never will. You would have to change what these filenames linked to by editing the directory. If you managed it you would probably do irreparable damage to your file system.
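The inode comparison described above can be tried in a scratch directory. This is a minimal sketch; the directory names parent and child and the helper inode_of are made up, and it uses ls -di instead of stat because stat's options differ between Linux and OS X.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/parent/child"

# Print only the inode number of a path.
inode_of() { ls -di "$1" | awk '{print $1}'; }

child_by_name=$(inode_of "$tmp/parent/child")
child_by_dot=$(inode_of "$tmp/parent/child/.")
parent_by_name=$(inode_of "$tmp/parent")
parent_by_dotdot=$(inode_of "$tmp/parent/child/..")

echo "child:  $child_by_name (by name)  $child_by_dot (via .)"
echo "parent: $parent_by_name (by name)  $parent_by_dotdot (via ..)"
rm -rf "$tmp"
```

The two numbers on each line should match, since . and .. are just additional names for the same directories.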
I write this to clarify what has already been written before.
In many file systems a DIRECTORY is a file; a special type of file that the file system identifies as being distinctly a directory.
A directory file contains a list of names that map to files on the disk.
A file, including a directory, does not have an intrinsic name associated with it (not true in all file systems). The name of a file exists only in a directory.
The same file can have an entry in multiple directories (hard link). The same file can then have multiple names and multiple paths.
In such file systems, every directory contains ENTRIES for the NAMES "." and "..". These entries are maintained by the file system.
The name "." links to its own directory.
The name ".." links to the parent directory EXCEPT for the top level directory where it links to itself (. and .. thus link to the same directory file).
So when you use "." and ".." as in /dir1/dir2/../dir3/./dir4/whatever,
"." and ".." are processed in the exact same way as "dir1" and "dir2".
This translation is done by the file system; not the shell.
cd ...
Does not work because there is no entry for "..." (at least not normally).
You can create a directory called "..." if you want.
You can actually achieve something like this, though this is an ugly hack:
You can run a command before every command entered to bash, and after every command. For that you trap the DEBUG pseudo signal and set a command to PROMPT_COMMAND, respectively.
trap 'ln -s ../.. ... &>/dev/null || true' DEBUG
PROMPT_COMMAND='rm ...'
With this, it seems like there's an additional entry in the current directory:
pwd
# /tmp/crazy-stuff
ls -a
# . .. ... foo
ls -a .../tmp/crazy-stuff
# . .. ... foo
This only works in the current directory, though, because the symbolic link is deleted after each command invocation. Thus ls foo/bar/... won't work this way.
Another ugly hack would be to "override" mkdir such that it populates every new directory with these symbolic links.
See also the comments on the second answer here, particularly Eliah's: https://askubuntu.com/questions/327126/what-is-a-dot-only-named-folder
Much in the same way that when you cd into some directory subdir, you're actually following a pointer that points to that directory, .. is a pointer added by the OS that points to the parent directory, and I'd imagine . works the same way.

What does slash dot refer to in a file path?

I'm trying to install a grunt template on my computer but I'm having issues. I realized that perhaps something different is happening because of the path given by the Grunt docs, which is
%USERPROFILE%\.grunt-init\
What does that . mean before grunt-init?
I've tried to do the whole import manually but it also isn't working
git clone https://github.com/gruntjs/grunt-init-gruntfile.git "C:\Users\Imray\AppData\Roaming\npm\grunt-init\"
I get a message:
fatal: could not create work tree dir 'C:\Users\Imray\AppData\Roaming\npm\.grunt-init"'.: Invalid argument
Does it have to do with this /.? What does it mean?
The \ (that's a backslash, not a slash) is a directory delimiter. The . is simply part of the directory name.
.grunt-init and grunt-init are two distinct names, both perfectly valid.
On Unix-like systems, file and directory names starting with . are hidden by default, which is why you'll often see such names for things like configuration files.
The . is part of a directory name. Filenames can contain dots. The \ is a separator between directory names.
Typically, files or directories starting with . are considered "hidden" and/or used for storing metadata. In particular, shell wildcard expansion skips over files whose names start with a dot.
For example if you wrote ls -d * then it would not show any files or directories beginning with . (including . and .., the current and parent directories).
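The wildcard behaviour can be checked with a short sketch on a Unix-like shell. The file names are placeholders, and the pattern .[!.]* is one common way to match dot-names while leaving out . and .. themselves.

```shell
tmp=$(mktemp -d)
touch "$tmp/visible.txt" "$tmp/.hidden"
mkdir "$tmp/.grunt-init"
old_pwd=$PWD
cd "$tmp"

# A bare * skips every name that starts with a dot.
plain=$(echo *)
# Dot-names have to be asked for explicitly.
dotted=$(echo .[!.]*)

echo "plain glob:  $plain"
echo "dotted glob: $dotted"
cd "$old_pwd"
rm -rf "$tmp"
```

Only visible.txt shows up for the bare *, while .grunt-init and .hidden appear once the pattern itself starts with a dot.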
Linux hides files and directories whose names begin with a dot, unless you use the -a (for "all") option when listing directory contents. If this convention is not followed on Windows, your example is probably just a carryover.
It may well be that something behind the scenes expects that name to match exactly later on. While I like tools such as installers to just do what I tell them, I realize that keeping the default value is the most tested path.
Directories starting with a dot are invisible by default on *nix systems. They are typically used for configuration files and similar in a user's home directory.
A \ before a " has a special meaning on Windows: it escapes the quote, so the " becomes part of the argument. The error occurs because Windows won't let you create a file or directory whose name contains ".
