I need to control the sub-path of the different files managed by MultipleOutputFormat based on the reducer key.
I basically want to set the sub-path of each file based on the key given to the reducer.
I can change the file name by overriding the generateFileNameForKeyValue method of MultipleOutputFormat, but how can I also change the sub-path of these files?
I mean with just overriding the generateFileNameForKeyValue, I get
mySetJobConfigOutputPath/fileNameBasedKey1.dat
/fileNameBasedKey2.dat
/fileNameBasedKey3.dat
...
but I want the files to be organized like this:
mySetJobConfigOutputPath/path0ConfiguredInsideReducerBasedOnKey/fileNameBasedKey1.dat
/path1ConfiguredInsideReducerBasedOnKey/fileNameBasedKey2.dat
/fileNameBasedKey3.dat
/path2ConfiguredInsideReducerBasedOnKey/fileNameBasedKey8.dat
As shown, both the sub-path and the file name are determined by the key inside the reducer.
I know how to configure the file name, but I was wondering whether I can also configure the sub-path of each file under the mySetJobConfigOutputPath folder.
I found out that I can also override the getInputFileBasedOutputFileName method and add the sub-path there.
@Override
protected String getInputFileBasedOutputFileName(JobConf conf, String name)
{
    // Your logic goes here: add the sub-path to the name and return the result.
    return name; // placeholder so the stub compiles; return the sub-path plus the name instead
}
You should still implement generateFileNameForKeyValue to derive the leaf file name from the key.
UPDATE: Basically this explains it all http://www.infoq.com/articles/HadoopOutputFormat
I need to upload and download files in an S3 bucket with the user metadata organisationID, OrganisationType and fileType. The file names will be the same (file.pdf, file.xls), but their contents will differ based on organisationID, OrganisationType and fileType.
If organisationID = 1, OrganisationType = A, fileType = P, then I need to download file.pdf.
If organisationID = 1, OrganisationType = A, fileType = X, then I need to download file.xls.
I have the following doubts: how can I have the same key name in the bucket, given that the file name is the same in different scenarios?
How can I download a file based on user metadata, without the key?
One way to do this is to mimic a hierarchical layout. You could make the key name for organisationID=1, OrganisationType=A, fileType=P 1/A/P/file.pdf. Similarly, you could make the key name for organisationID=1, OrganisationType=A, fileType=X 1/A/X/file.xls. Another way is to embed the metadata in the key name without mimicking a directory structure, like 1_A_P_file.pdf. Given the metadata and the known naming logic, you can recreate the key name and retrieve the appropriate file. Strip the metadata from the file name before returning the file to the end user.
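The hierarchical naming scheme described above can be sketched as follows. This is only an illustration of the key-naming logic: the helper names build_key and strip_metadata are hypothetical, and no S3 client calls are shown.

```python
def build_key(organisation_id, organisation_type, file_type, filename):
    """Compose a hierarchical object key such as '1/A/P/file.pdf'."""
    return "/".join([str(organisation_id), organisation_type, file_type, filename])


def strip_metadata(key):
    """Drop the metadata prefix and return only the original file name."""
    return key.rsplit("/", 1)[-1]


# The same metadata always yields the same key, so the object can be
# retrieved later without storing the key anywhere else.
pdf_key = build_key(1, "A", "P", "file.pdf")
xls_key = build_key(1, "A", "X", "file.xls")
```

The resulting string would then be passed as the object key to whichever S3 client you use for the upload and the download.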
I am trying to write a Groovy script that obtains the source file location. I already tried using
SOurceDataStore.getLogicalSchema().getTechnology().getInternalName()
This only returns "FILE", the technology being used.
If I wanted to obtain this path:
How can I achieve that?
Method getSchemaName() of class OdiPhysicalSchema can be used to retrieve the schema (or path for the file technology).
Since it looks like you want to retrieve it from a datastore, you should first find the model, then the logical schema, then the physical schema linked to it through a specific context. You would need an IOdiContextFinder to get the OdiContext object. The code would be something like this:
context = ((IOdiContextFinder)odiInstance.getTransactionalEntityManager().getFinder(OdiContext.class)).findByCode("MY_CONTEXT");
filepath = SOurceDataStore.getModel().getLogicalSchema().getPhysicalSchema(context).getSchemaName();
Trying to save an h2o model with some specific name that differs from the model's model_id field, but trying something like...
h2o.save_model(model=model,
path='/some/path/then/filename',
force=False)
just creates a dir/file structure like
some
|__path
    |__then
        |__filename
            |__<model_id>
as opposed to
some
|__path
    |__then
        |__filename
Is this possible to do from the save_model method?
I hesitate to simply change the model_id before calling the save method, because the model names have timestamps appended to them to avoid name collisions with other models that may be on the h2o cluster. I am trying to remove these timestamps when saving to disk. Simplifying the name on the cluster before saving would create a window in which a naming collision can occur if other processes are also attempting to save such a model (with, say, a different timestamp).
Any way to get this behavior or other common alternatives / workarounds?
This is currently not possible; however, I created a feature request here. There is a related question here which shows a solution for R (it could be adapted to Python). The workaround is to rename the file manually using a few lines of R/Python code.
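A minimal sketch of that rename workaround in Python, using only the standard library. The helper name rename_saved_model and the example file names are hypothetical, and the demo uses an empty stand-in file rather than a real h2o model.

```python
import os
import tempfile


def rename_saved_model(saved_path, new_name):
    """Rename the file at saved_path to new_name inside the same directory."""
    target = os.path.join(os.path.dirname(saved_path), new_name)
    if os.path.exists(target):
        # Refuse to overwrite, mirroring force=False in the question.
        raise FileExistsError(target)
    os.rename(saved_path, target)
    return target


# Demo with a stand-in file whose name carries a timestamp suffix.
with tempfile.TemporaryDirectory() as tmp:
    stand_in = os.path.join(tmp, "my_model_20180101_120000")
    open(stand_in, "w").close()
    renamed = rename_saved_model(stand_in, "my_model")
    demo_name = os.path.basename(renamed)
    demo_exists = os.path.exists(renamed)
```

In real use, the first argument would be the path of the file that h2o.save_model wrote (to my knowledge the call returns that path, but check this against your h2o version). Only the file on disk is renamed, so the model_id on the cluster stays untouched.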
I am sorry if this has been answered before but all my searching is not coming up with a result.
I would like to place files directly into the target path, without a UUID folder being generated and the file placed inside it. I know that files with the same name could exist, which is why I change the file name in the onChange event before uploading.
I have tried to modify the handler.php but either I am not editing the correct lines or something else is going on.
After long and tiring hours of trying to figure this out, I have found a workaround.
If you send a blank UUID to the script, it will not create the folder and will just place the file in the folder that you told the endpoint to put items in. I am not sure if this is how the script is supposed to work, but it works for me.
I do not have to worry about files that are named the same, as I also have the script change the file name before it gets uploaded by prepending a unique string to it.
callbacks:
{
    onSubmit: function(id, name) {
        this.setUuid(id, "")
        console.log("onSubmit called")
    }
}
I have several tables in my application that are displaying lists of objects from classes. I have properties to represent each value in those classes as defined in the following example
...
Public Property Cod() As Int32
    Get
        Return _codigo
    End Get
    Set(ByVal Value As Int32)
        _codigo = Value
    End Set
End Property
...
My code hides every column, and then I use the "formatarCol" method to state which columns I want shown. The second parameter is the column to show (it must have the same name as one of the properties mentioned above), and the third parameter is the name to be displayed for that column.
...
Utilidades.formatarCol(.Columns, "Cod", "Cod")
Utilidades.formatarCol(.Columns, "Estab", "Estabelecimento")
Utilidades.formatarCol(.Columns, "Sel", "Sel.")
...
Everything is working fine but I was trying to rename some of the properties. If I rename said properties I have to go to each table and manually change each string. I can't just use a replace all because different classes may have properties with the same name.
I was hoping that someone had a suggestion on how to get the Property's name instead of manually adding a string so that if I need to rename a Property again I won't have to manually go through every column where it's used and change the string.
You can rename a property via refactoring:
Right-click the property in code -> Refactor -> Rename
This renames the property in its own class only, together with all occurrences of that property.
To access the name of a property in a class, you can use NameOf.
Example:
NameOf([Namespace to class].Cod) ' returns "Cod"