The tutorial gives this command:
~~~
./bw \
-hmmdir en-us \
-moddeffn en-us/mdef.txt \
-ts2cbfn .ptm. \
-feat 1s_c_d_dd \
-svspec 0-12/13-25/26-38 \
-cmn current \
-agc none \
-dictfn cmudict-en-us.dict \
-ctlfn arctic20.fileids \
-lsnfn arctic20.transcription \
-accumdir .
~~~
But I checked my feat.params and it has this content:
~~~
-lowerf 130
-upperf 6800
-nfilt 25
-transform dct
-lifter 22
-feat 1s_c_d_dd
-svspec 0-12/13-25/26-38
-agc none
-cmn current
-varnorm no
-model ptm
-cmninit 40,3,-1
~~~
I don't know how I should configure these options. I am trying to configure an acoustic model for continuous speech.
I got my model from here:
https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/cmusphinx-en-us-ptm-5.2.tar.gz/download
I tried to adapt the command above like this:
./bw -hmmdir en-us -moddeffn en-us/mdef.txt -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none -dictfn cmudict-en-us.dict -ctlfn robot_train.fileids -lsnfn robot_train.transcription -accumdir -lda feature_transform .
But I get these error messages:
INFO: main.c(229): Compiled on Mar 22 2018 at 12:54:02
ERROR: "cmd_ln.c", line 607: Unknown argument name 'feature_transform'
ERROR: "cmd_ln.c", line 704: Failed to parse arguments list
ERROR: "cmd_ln.c", line 753: Failed to parse arguments list, forced exit
I then switched to this model (instead of the PTM model that the tutorial links to): https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/cmusphinx-en-us-5.2.tar.gz/download
Then I removed the -lda feature_transform from my command, and it worked!
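A possible explanation for the "Unknown argument name" error, offered as a guess rather than a confirmed diagnosis: in the failing command, -accumdir is followed directly by -lda, and the "." sits at the very end. If bw reads its arguments strictly as flag/value pairs (an assumption that fits the message but is not checked against the bw source), then -lda becomes the value of -accumdir and feature_transform lands where a flag name is expected. The Python sketch below only illustrates that pairing; pair_flags and suspicious are hypothetical helper names, not part of any Sphinx tool.

```python
import shlex

def pair_flags(command):
    """Split a command line into (flag, value) pairs after the program name.

    Assumption: every option is a flag followed by exactly one value.
    Returns (pairs, leftover); leftover is a trailing token with no partner, or None.
    """
    tokens = shlex.split(command)[1:]  # drop the program name
    pairs = [(tokens[i], tokens[i + 1]) for i in range(0, len(tokens) - 1, 2)]
    leftover = tokens[-1] if len(tokens) % 2 else None
    return pairs, leftover

def suspicious(pairs):
    """Return pairs whose flag lacks a leading '-' or whose value starts with '-'."""
    return [(f, v) for f, v in pairs if not f.startswith("-") or v.startswith("-")]

# The failing command from the post, unchanged.
cmd = (
    "./bw -hmmdir en-us -moddeffn en-us/mdef.txt -ts2cbfn .cont. "
    "-feat 1s_c_d_dd -cmn current -agc none -dictfn cmudict-en-us.dict "
    "-ctlfn robot_train.fileids -lsnfn robot_train.transcription "
    "-accumdir -lda feature_transform ."
)

pairs, leftover = pair_flags(cmd)
for flag, value in suspicious(pairs):
    print(f"check this pair: {flag!r} {value!r}")
```

Under that assumption, writing "-accumdir . -lda feature_transform" would keep every flag next to its own value. A value that legitimately starts with "-" would also be flagged by this check, so treat its output as a hint only.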
I'd like to lemmatize and POS tag text according to the FAQ. This command provided in the FAQ works correctly:
$ java -cp "*:lib/*" edu.stanford.nlp.tagger.maxent.MaxentTagger \
-model models/english-left3words-distsim.tagger \
-textFile samsawme.txt -outputFormat inlineXML \
-outputFormatOptions lemmatize -sentenceDelimiter newline
Output:
<?xml version="1.0" encoding="UTF-8"?>
<pos>
<sentence id="0">
<word wid="0" pos="NNP" lemma="Sam">Sam</word>
<word wid="1" pos="VBD" lemma="see">saw</word>
<word wid="2" pos="PRP" lemma="I">me</word>
<word wid="3" pos="." lemma=".">.</word>
</sentence>
</pos>
However, if I add the -tokenize false flag, and use instead a tokenized version of the text file, the lemmas disappear from the XML file:
Contents of samsawme_tokenized.txt:
Sam saw me .
Command:
$ java -cp "*:lib/*" edu.stanford.nlp.tagger.maxent.MaxentTagger \
-model models/english-left3words-distsim.tagger \
-textFile samsawme_tokenized.txt -outputFormat inlineXML \
-outputFormatOptions lemmatize -sentenceDelimiter newline \
-tokenize false # !!!
Output:
<?xml version="1.0" encoding="UTF-8"?>
<pos>
<sentence id="0">
<word wid="0" pos="NNP">Sam</word>
<word wid="1" pos="VBD">saw</word>
<word wid="2" pos="PRP">me</word>
<word wid="3" pos=".">.</word>
</sentence>
</pos>
Is there any workaround for including lemmas when tagging pre-tokenized but not necessarily lemmatized text?
I have a file called macse.cmd which contains 1000 commands to execute, 1 command per line.
I want to use parallel to execute 30 at a time. I don't care in what order they are executed as long as all are.
I tried "parallel -j 30 ./macse.cmd" but this caused them to run 1 by 1 and I am not even sure how to stop them.
Adrian
p.s.
Commands look like:
java -jar -Xmx5000m ~/programs/macse_v1.01b.jar -prog alignSequences -seq M715_2100035271/all_unaligned.fasta -out_NT M715_2100035271/aligned_nt.fasta -out_AA M715_2100035271/aligned_aa.fasta
java -jar -Xmx5000m ~/programs/macse_v1.01b.jar -prog alignSequences -seq M715_100078281/all_unaligned.fasta -out_NT M715_100078281/aligned_nt.fasta -out_AA M715_100078281/aligned_aa.fasta
java -jar -Xmx5000m ~/programs/macse_v1.01b.jar -prog alignSequences -seq M715_510001221/all_unaligned.fasta -out_NT M715_510001221/aligned_nt.fasta -out_AA M715_510001221/aligned_aa.fasta
java -jar -Xmx5000m ~/programs/macse_v1.01b.jar -prog alignSequences -seq M715_100094159/all_unaligned.fasta -out_NT M715_100094159/aligned_nt.fasta -out_AA M715_100094159/aligned_aa.fasta
So it's only the M715_ number that changes between commands.
Is the command always the same? As in
echo "A"
echo "B"
echo "C"
?
Then you should change it to:
"A"
"B"
"C"
and run: parallel -j 30 -a macse.cmd echo, where echo is of course your actual command.
parallel -j 30 < ./macse.cmd
or:
parallel java -jar -Xmx5000m ~/programs/macse_v1.01b.jar -prog alignSequences -seq {}/all_unaligned.fasta -out_NT {}/aligned_nt.fasta -out_AA {}/aligned_aa.fasta ::: M*/
Walk through the tutorial:
man parallel_tutorial
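If GNU parallel is not available on the machine, the same "at most 30 at a time, order does not matter" idea can be sketched in Python. This is not GNU parallel and does not try to match its features; run_commands is a made-up name, and each line of the file is simply handed to the shell as-is.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_commands(lines, jobs=30):
    """Run each non-empty line as a shell command, at most `jobs` at a time.

    Returns a list of (command, exit_code) tuples in input order.
    """
    commands = [line.strip() for line in lines if line.strip()]

    def run_one(command):
        # shell=True so each line is interpreted like a line of a shell script
        return command, subprocess.run(command, shell=True).returncode

    # Threads are enough here: each worker just waits for its child process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, commands))

# Hypothetical usage with the file from the question:
# with open("macse.cmd") as handle:
#     for command, code in run_commands(handle, jobs=30):
#         if code != 0:
#             print("failed:", command)
```

Collecting the exit codes makes it easy to see afterwards which of the 1000 alignments need to be rerun.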
I'm trying to deploy an Oracle application that uses the dbxora.dll file from dbExpress to another machine. I have included the file with the program, but when I run the program and try to execute a query, it returns an error saying that it could not load dbxora.dll.
I have the following all in a folder together:
dbxora.dll
Application.exe
dbxdrivers.ini
dbxconnection.ini
I have exhausted myself looking everywhere I can think of to figure out how exactly to do this. I just can't figure it out.
As requested here are the contents of the dbxdrivers.ini file:
[Installed Drivers]
DBXTrace=1
DBXPool=1
DataSnap=1
ASA=1
ASE=1
DB2=1
Firebird=1
Informix=1
Interbase=1
MSSQL=1
MySQL=1
Odbc=1
Oracle=1
SQLite=1
[DataSnap]
DriverUnit=Data.DBXDataSnap
DriverAssemblyLoader=Borland.Data.TDBXClientDriverLoader,Borland.Data.DbxClientDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
Port=211
[ASA]
DriverUnit=Data.DBXSybaseASA
DriverPackageLoader=TDBXDynalinkDriverLoader,DbxCommonDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXDynalinkDriverLoader,Borland.Data.DbxCommonDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
MetaDataPackageLoader=TDBXSybaseASAMetaDataCommandFactory,DbxSybaseASADriver170.bpl
MetaDataAssemblyLoader=Borland.Data.TDBXSybaseASAMetaDataCommandFactory,Borland.Data.DbxSybaseASADriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
GetDriverFunc=getSQLDriverASA
LibraryName=dbxasa.dll
LibraryNameOsx=libsqlasa.dylib
VendorLib=dbodbc*.dll
VendorLibWin64=dbodbc*.dll
VendorLibOsx=libdbodbc12.dylib
HostName=ServerName
Database=DBNAME
User_Name=user
Password=password
Port=
ConnectionString=
BlobSize=-1
ErrorResourceFile=
LocaleCode=0000
IsolationLevel=ReadCommitted
[ASA TransIsolation]
DirtyRead=0
ReadCommited=1
RepeatableRead=2
[ASE]
DriverUnit=Data.DBXSybaseASE
DriverPackageLoader=TDBXDynalinkDriverLoader,DBXCommonDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXDynalinkDriverLoader,Borland.Data.DbxCommonDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
MetaDataPackageLoader=TDBXSybaseASEMetaDataCommandFactory,DbxSybaseASEDriver170.bpl
MetaDataAssemblyLoader=Borland.Data.TDBXSybaseASEMetaDataCommandFactory,Borland.Data.DbxSybaseASEDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
GetDriverFunc=getSQLDriverASE
LibraryName=dbxase.dll
VendorLib=libct.dll;libcs.dll
VendorLibWin64=libsybct64.dll;libsybcs64.dll
HostName=ServerName
DataBase=Database Name
User_Name=user
Password=password
BlobSize=-1
TDS Packet Size=512
Client HostName=
Client AppName=
ErrorResourceFile=
LocaleCode=0000
IsolationLevel=ReadCommitted
[ASE TransIsolation]
DirtyRead=0
ReadCommited=1
RepeatableRead=2
[DBXPool]
DelegateDriver=True
DriverName=DBXPool
DriverUnit=Data.DBXPool
DriverPackageLoader=TDBXPoolDriverLoader,DBXCommonDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXPoolDriverLoader,Borland.Data.DbxCommonDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
[DBXTrace]
DelegateDriver=True
DriverName=DBXTrace
DriverUnit=Data.DBXTrace
DriverPackageLoader=TDBXTraceDriverLoader,DBXCommonDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXTraceDriverLoader,Borland.Data.DbxCommonDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
[AutoCommit]
False=0
True=1
[BlockingMode]
False=0
True=1
[WaitOnLocks]
False=1
True=0
[CommitRetain]
False=0
True=1
[OS Authentication]
False=0
True=1
[Multiple Transaction]
False=0
True=1
[Trim Char]
False=0
True=1
[SQLDialect]
1=0
2=1
3=2
[DB2]
DriverUnit=Data.DBXDb2
DriverPackageLoader=TDBXDynalinkDriverLoader,DBXCommonDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXDynalinkDriverLoader,Borland.Data.DbxCommonDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
MetaDataPackageLoader=TDBXDb2MetaDataCommandFactory,DbxDb2Driver170.bpl
MetaDataAssemblyLoader=Borland.Data.TDBXDb2MetaDataCommandFactory,Borland.Data.DbxDb2Driver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
GetDriverFunc=getSQLDriverDB2
LibraryName=dbxdb2.dll
VendorLib=db2cli.dll
VendorLibWin64=db2cli64.dll
Database=DBNAME
User_Name=user
Password=password
BlobSize=-1
ErrorResourceFile=
LocaleCode=0000
IsolationLevel=ReadCommitted
Decimal Separator=.
[DB2 TransIsolation]
DirtyRead=0
ReadCommited=1
RepeatableRead=2
[Firebird]
DriverUnit=Data.DBXFirebird
DriverPackageLoader=TDBXDynalinkDriverLoader,DbxCommonDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXDynalinkDriverLoader,Borland.Data.DbxCommonDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
MetaDataPackageLoader=TDBXFirebirdMetaDataCommandFactory,DbxFirebirdDriver170.bpl
MetaDataAssemblyLoader=Borland.Data.TDBXFirebirdMetaDataCommandFactory,Borland.Data.DbxFirebirdDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
GetDriverFunc=getSQLDriverINTERBASE
LibraryName=dbxfb.dll
LibraryNameOsx=libsqlfb.dylib
VendorLib=fbclient.dll
VendorLibWin64=fbclient.dll
VendorLibOsx=/Library/Frameworks/Firebird.framework/Firebird
BlobSize=-1
CommitRetain=False
Database=database.fdb
ErrorResourceFile=
LocaleCode=0000
Password=masterkey
RoleName=RoleName
ServerCharSet=
SQLDialect=3
IsolationLevel=ReadCommitted
User_Name=sysdba
WaitOnLocks=True
Trim Char=False
[Informix]
DriverUnit=Data.DBXInformix
DriverPackageLoader=TDBXDynalinkDriverLoader,DBXCommonDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXDynalinkDriverLoader,Borland.Data.DbxCommonDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
MetaDataPackageLoader=TDBXInformixMetaDataCommandFactory,DbxInformixDriver170.bpl
MetaDataAssemblyLoader=Borland.Data.TDBXInformixMetaDataCommandFactory,Borland.Data.DbxInformixDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
GetDriverFunc=getSQLDriverINFORMIX
LibraryName=dbxinf.dll
LibraryNameOsx=libsqlinf.dylib
VendorLib=isqlb09a.dll
VendorLibWin64=isqlt09a.dll
VendorLibOsx=libifcli.dylib
HostName=ServerName
DataBase=Database Name
User_Name=user
Password=password
BlobSize=-1
ErrorResourceFile=
LocaleCode=0000
IsolationLevel=ReadCommitted
Trim Char=False
[Informix TransIsolation]
DirtyRead=0
ReadCommited=1
RepeatableRead=2
[Interbase]
DriverUnit=Data.DBXInterBase
DriverPackageLoader=TDBXDynalinkDriverLoader,DbxCommonDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXDynalinkDriverLoader,Borland.Data.DbxCommonDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
MetaDataPackageLoader=TDBXInterbaseMetaDataCommandFactory,DbxInterBaseDriver170.bpl
MetaDataAssemblyLoader=Borland.Data.TDBXInterbaseMetaDataCommandFactory,Borland.Data.DbxInterBaseDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
GetDriverFunc=getSQLDriverINTERBASE
LibraryName=dbxint.dll
LibraryNameOsx=libsqlib.dylib
VendorLib=GDS32.DLL
VendorLibWin64=ibclient64.dll
VendorLibOsx=libgds.dylib
BlobSize=-1
CommitRetain=False
Database=database.gdb
ErrorResourceFile=
LocaleCode=0000
Password=masterkey
RoleName=RoleName
ServerCharSet=
SQLDialect=3
IsolationLevel=ReadCommitted
User_Name=sysdba
WaitOnLocks=True
Trim Char=False
[IBToGo]
DriverUnit=Data.DBXInterBase
DriverPackageLoader=TDBXDynalinkDriverLoader,DbxCommonDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXDynalinkDriverLoader,Borland.Data.DbxCommonDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
MetaDataPackageLoader=TDBXInterbaseMetaDataCommandFactory,DbxInterBaseDriver170.bpl
MetaDataAssemblyLoader=Borland.Data.TDBXInterbaseMetaDataCommandFactory,Borland.Data.DbxInterBaseDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
GetDriverFunc=getSQLDriverINTERBASE
LibraryName=dbxint.dll
LibraryNameOsx=libsqlib.dylib
VendorLib=ibtogo.dll
VendorLibWin64=ibtogo64.dll
VendorLibOsx=libibtogo.dylib
BlobSize=-1
CommitRetain=False
Database=database.gdb
ErrorResourceFile=
LocaleCode=0000
Password=masterkey
RoleName=RoleName
ServerCharSet=
SQLDialect=3
IsolationLevel=ReadCommitted
User_Name=sysdba
WaitOnLocks=True
Trim Char=False
AutoUnloadDriver=True
[Interbase TransIsolation]
ReadCommited=1
RepeatableRead=2
[MSSQL]
SchemaOverride=%.dbo
DriverUnit=Data.DBXMSSQL
DriverPackageLoader=TDBXDynalinkDriverLoader,DBXCommonDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXDynalinkDriverLoader,Borland.Data.DbxCommonDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
MetaDataPackageLoader=TDBXMsSqlMetaDataCommandFactory,DbxMSSQLDriver170.bpl
MetaDataAssemblyLoader=Borland.Data.TDBXMsSqlMetaDataCommandFactory,Borland.Data.DbxMSSQLDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
GetDriverFunc=getSQLDriverMSSQL
LibraryName=dbxmss.dll
VendorLib=sqlncli10.dll
VendorLibWin64=sqlncli10.dll
HostName=ServerName
DataBase=Database Name
User_Name=user
Password=password
BlobSize=-1
ErrorResourceFile=
LocaleCode=0000
IsolationLevel=ReadCommitted
OS Authentication=False
Prepare SQL=False
[MSSQL9]
SchemaOverride=%.dbo
DriverUnit=DBXMSSQL
DriverPackageLoader=TDBXDynalinkDriverLoader,DBXCommonDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXDynalinkDriverLoader,Borland.Data.DbxCommonDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
MetaDataPackageLoader=TDBXMsSqlMetaDataCommandFactory,DbxMSSQLDriver170.bpl
MetaDataAssemblyLoader=Borland.Data.TDBXMsSqlMetaDataCommandFactory,Borland.Data.DbxMSSQLDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
GetDriverFunc=getSQLDriverMSSQL
LibraryName=dbxmss9.dll
VendorLib=sqlncli.dll
VendorLibWin64=sqlncli.dll
HostName=ServerName
DataBase=Database Name
User_Name=user
Password=password
BlobSize=-1
ErrorResourceFile=
LocaleCode=0000
IsolationLevel=ReadCommitted
OS Authentication=False
Prepare SQL=False
[MSSQL TransIsolation]
DirtyRead=0
ReadCommited=1
RepeatableRead=2
[MYSQL]
DriverUnit=Data.DBXMySQL
DriverPackageLoader=TDBXDynalinkDriverLoader,DbxCommonDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXDynalinkDriverLoader,Borland.Data.DbxCommonDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
MetaDataPackageLoader=TDBXMySqlMetaDataCommandFactory,DbxMySQLDriver170.bpl
MetaDataAssemblyLoader=Borland.Data.TDBXMySqlMetaDataCommandFactory,Borland.Data.DbxMySQLDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
GetDriverFunc=getSQLDriverMYSQL
LibraryName=dbxmys.dll
LibraryNameOsx=libsqlmys.dylib
VendorLib=LIBMYSQL.dll
VendorLibWin64=libmysql.dll
VendorLibOsx=libmysqlclient.dylib
BlobSize=-1
Database=DBNAME
ErrorResourceFile=
HostName=ServerName
LocaleCode=0000
Password=password
User_Name=user
Compressed=False
Encrypted=False
[Odbc]
DriverUnit=Data.DBXOdbc
DriverPackageLoader=TDBXOdbcDriverLoader,DBXOdbcDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXOdbcDriverLoader,Borland.Data.DbxOdbcDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
MetaDataPackageLoader=TDBXOdbcMetaDataCommandFactory,DbxOdbcDriver170.bpl
MetaDataAssemblyLoader=Borland.Data.TDBXOdbcMetaDataCommandFactory,Borland.Data.DbxOdbcDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
[Oracle]
DriverUnit=Data.DBXOracle
DriverPackageLoader=TDBXDynalinkDriverLoader,DBXCommonDriver170.bpl
DriverAssemblyLoader=Borland.Data.TDBXDynalinkDriverLoader,Borland.Data.DbxCommonDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
MetaDataPackageLoader=TDBXOracleMetaDataCommandFactory,DbxOracleDriver170.bpl
MetaDataAssemblyLoader=Borland.Data.TDBXOracleMetaDataCommandFactory,Borland.Data.DbxOracleDriver,Version=17.0.0.0,Culture=neutral,PublicKeyToken=91d62ebb5b0d1b1b
GetDriverFunc=getSQLDriverORACLE
LibraryName=dbxora.dll
LibraryNameOsx=libsqlora.dylib
VendorLib=oci.dll
VendorLibWin64=oci.dll
VendorLibOsx=libociei.dylib
DataBase=Database Name
User_Name=user
Password=password
BlobSize=-1
ErrorResourceFile=
LocaleCode=0000
IsolationLevel=ReadCommitted
RowsetSize=20
OS Authentication=False
Multiple Transaction=False
Trim Char=False
Decimal Separator=.
[Oracle TransIsolation]
DirtyRead=0
ReadCommited=1
RepeatableRead=2
[Sqlite]
DriverUnit=Data.DbxSqlite
DriverPackageLoader=TDBXSqliteDriverLoader,DBXSqliteDriver170.bpl
MetaDataPackageLoader=TDBXSqliteMetaDataCommandFactory,DbxSqliteDriver170.bpl
Also, here is the error message I am getting on Windows XP:
The application or DLL \Application\Path\dbxora.dll is not a valid windows image. Please check this against your installation diskette
Then, on both Windows XP and Windows 7, I get this:
Unable to load dbxora.dll (Error Code 193). It may be missing from the system path
After searching for hours and finally looking up the error code, I found the cause.
I was using the 64-bit version of the DLL instead of the 32-bit one. After correcting this, it works perfectly.
Thanks to everyone for all their help.
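For anyone who hits the same error: Windows error 193 is ERROR_BAD_EXE_FORMAT, which is what a 32-bit process reports when it is handed a 64-bit DLL (or the other way around). The bitness of a DLL can be read from its PE header before deploying. Below is a minimal Python sketch of that check; the function names are hypothetical, and only the two common machine codes are named.

```python
import struct

# COFF machine codes for the two common Windows targets
MACHINE_NAMES = {0x014C: "32-bit (x86)", 0x8664: "64-bit (x64)"}

def pe_machine(data):
    """Return the COFF machine field of a PE image given its raw bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # Offset 0x3C of the DOS header holds the file offset of the PE header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The 2-byte machine field follows the 4-byte PE signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine

def describe(path):
    """Return a readable bitness label for the DLL or EXE at `path`."""
    with open(path, "rb") as handle:
        machine = pe_machine(handle.read())
    return MACHINE_NAMES.get(machine, hex(machine))

# Hypothetical usage: compare the driver with the application that loads it.
# print(describe("dbxora.dll"), describe("Application.exe"))
```

If the two labels differ, the application will not be able to load the driver, whatever the search path is.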
I defined my own input format as follows, which prevents file splitting:
import org.apache.hadoop.fs.*;
import org.apache.hadoop.mapred.TextInputFormat;
public class NSTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}
I compiled this using Eclipse into a class file, NSTextInputFormat.class, and copied it to the client from which the job is launched. I used the following command to launch the job and pass the above class as the input format.
hadoop jar $HADOOP_HOME/hadoop-streaming.jar -Dmapred.job.queue.name=unfunded -input 24222910/framefile -input 24225109/framefile -output Output -inputformat NSTextInputFormat -mapper ExtractHSV -file ExtractHSV -file NSTextInputFormat.class -numReduceTasks 0
This fails saying:
-inputformat : class not found : NSTextInputFormat
Streaming Job Failed!
I set the PATH and CLASSPATH variables to the directory containing NSTextInputFormat.class, but that still does not work. Any pointers would be helpful.
There are a few gotchas here that can get you if you are not familiar with Java.
-inputformat (and the other command-line options that expect class names) expects a fully qualified class name; otherwise it expects to find the class in some org.apache.hadoop... namespace. So you must include a package name in your .java file:
package org.example.hadoop;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.mapred.TextInputFormat;
public class NSTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}
And then specify the full name on the command line:
-inputformat org.example.hadoop.NSTextInputFormat
When you build the jar file the .class file must also be in a directory structure that mirrors the package name. I'm sure this is Java Packaging 101, but if you are using Hadoop Streaming then you probably aren't too familiar with Java in the first place. Passing the -d option to javac will tell it to compile the input files into .class files in directories that match the package name.
javac -classpath `hadoop classpath` -d ./output NSTextInputFormat.java
The compiled .class file will be written to ./output/org/example/hadoop/NSTextInputFormat.class. You will need to create the output directory but the other sub-directories will be created for you. The jar file can then be created like so:
jar cvf myjar.jar -C ./output/ .
And you should see some output similar to this:
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/example/(in = 0) (out= 0)(stored 0%)
adding: org/example/hadoop/(in = 0) (out= 0)(stored 0%)
adding: org/example/hadoop/NSTextInputFormat.class(in = 372) (out= 252)(deflated 32%)
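If you want to double-check the layout without reading the listing by eye, the rule "package name maps to directory path" is easy to test, since a jar is an ordinary zip archive. The Python sketch below is only an illustration; class_entry and jar_has_class are hypothetical helpers, and the in-memory archive stands in for myjar.jar.

```python
import io
import zipfile

def class_entry(qualified_name):
    """Map a fully qualified class name to its expected path inside a jar."""
    return qualified_name.replace(".", "/") + ".class"

def jar_has_class(jar, qualified_name):
    """Return True if the jar (a path or file object) holds the class at that path."""
    with zipfile.ZipFile(jar) as archive:
        return class_entry(qualified_name) in archive.namelist()

# Demo with an in-memory archive standing in for myjar.jar
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr(class_entry("org.example.hadoop.NSTextInputFormat"), b"")

print(jar_has_class(demo, "org.example.hadoop.NSTextInputFormat"))  # True
print(jar_has_class(demo, "NSTextInputFormat"))                     # False
```

The second call shows why the unqualified name fails: there is no NSTextInputFormat.class at the top level of the archive.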
Bundle the input format and mapper class into a jar (myjar.jar) and add the -libjars myjar.jar option to the command line:
hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-libjars myjar.jar \
-Dmapred.job.queue.name=unfunded \
-input 24222910/framefile \
-input 24225109/framefile \
-output Output \
-inputformat org.example.hadoop.NSTextInputFormat \
-mapper ExtractHSV \
-numReduceTasks 0