Hive: MetadataTypedColumnsetSerDe upload jar - hadoop

I need to load an artifact containing MetadataTypedColumnsetSerDe, but I can't find it anywhere in open repositories (as opposed to DynamicSerDe, for example).
Does anybody know where I can find this jar?

Related

What is the best way to inject different robots.txt into war file based on target?

I'd like to include a robots.txt file in a WAR file but use a different version based on the destination (e.g. sandbox or production).
What is the best way to achieve this with Maven? My first thought is to:
Create two different robots.txt files in the source code: one called production.robots.txt and another called sandbox.robots.txt
Use https://coderplus.github.io/copy-rename-maven-plugin/rename-mojo.html to rename the appropriate file to robots.txt during the build process
Use the Maven WAR plugin configuration to exclude the other file (a sketch of this setup follows below)
Is there a more elegant way? Note: we're using GitLab CI/CD, though I don't think that is too pertinent, assuming it is best to keep this process solely within the Maven build cycle.
Thanks!
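For what it's worth, here is a minimal sketch of the approach described above, using one Maven profile per target. The profile id, file paths, and plugin version are illustrative and not verified against a real build:

<!-- Sketch only: goes inside <profiles> in the pom.xml.
     Note that the rename mojo mutates the source tree
     (src/main/webapp), which is one reason to look for
     something more elegant. -->
<profile>
  <id>production</id>
  <build>
    <plugins>
      <plugin>
        <groupId>com.coderplus.maven.plugins</groupId>
        <artifactId>copy-rename-maven-plugin</artifactId>
        <version>1.0</version>
        <executions>
          <execution>
            <id>rename-robots</id>
            <phase>process-resources</phase>
            <goals>
              <goal>rename</goal>
            </goals>
            <configuration>
              <sourceFile>src/main/webapp/production.robots.txt</sourceFile>
              <destinationFile>src/main/webapp/robots.txt</destinationFile>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <artifactId>maven-war-plugin</artifactId>
        <configuration>
          <!-- keep the unused target-specific variants out of the WAR -->
          <packagingExcludes>*.robots.txt</packagingExcludes>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>

A sandbox profile would be identical except for the sourceFile. Building with mvn package -P production would then produce a WAR containing only the production robots.txt.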

Spring Cloud Data Flow running out of disk space

We have a large number of tasks (~30) kicked off by SCDF on PCF; however, we are running into disk space issues with SCDF. The issue appears to be due to SCDF downloading artifacts each time a task is invoked.
The artifacts in our case are downloaded from a REST endpoint https://service/{artifact-name-version.jar} (which in turn serves them from an S3 repository).
Every time a task is invoked, SCDF appears to download the artifact (to the ~tmp/spring-cloud-deployer directory) and verify the SHA1 hash to make sure it's the latest before it launches the task on PCF.
The downloaded artifacts never get cleaned up.
It's not desirable to download artifacts each time and fill up disk space in ~tmp/ of the SCDF instance on PCF.
Is there a way to tell SCDF not to download an artifact if it already exists?
Also, can someone please explain the mechanism of artifact download, SHA1 hash comparison, and task launching (and the various options around it)?
Thanks!
SCDF downloads the artifacts at the server side for the following reasons:
1) Metadata (application properties) retrieval - if you have an explicit metadata resource, then only that is downloaded
2) The corresponding deployer (local, CF) eventually downloads the artifact before it sends the deployment/launch request
The hash value is used for unique temp file creation when the artifact is downloaded.
Is there a way to tell SCDF not to download an artifact if it already exists?
HTTP-based (or any explicit URL-based, other than Maven or Docker) artifacts are always downloaded, because the resource at a specific URL can be replaced with some other resource, and we don't want to use the cache in that case.
Also, we recently deprecated the cache cleanup mechanism, as it wasn't being used effectively.
If your use case (with this specific disk space limitation that can't handle caching multiple artifacts) requires this cache cleaning feature, please create a GitHub request here.
We were also considering removing HTTP-based artifacts after they are deployed/launched. It looks like it is worth revisiting that now.
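One workaround implied by the Maven/Docker exception above: if the same jar can also be resolved by Maven coordinates (for example, from the S3-backed repository exposed as a Maven repository), registering the task with a maven:// URI lets the deployer resolve it through the local Maven repository cache instead of re-downloading an HTTP resource on every launch. A sketch using the SCDF shell, with made-up coordinates and names:

dataflow:> app register --name my-task-app --type task --uri maven://com.example:my-task:1.0.0
dataflow:> task create my-task --definition "my-task-app"
dataflow:> task launch my-task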

Artifactory/Jenkins Upload Spec using classifier?

I'm looking to use the upload spec to upload artifacts to Artifactory from a Jenkins job. I'd additionally like to be able to attach a classifier to the artifact so that it can be referenced through Maven.
Do you know if there's a way to do this? I'd like to use the upload spec for simplicity and ideally avoid having to get my hands too dirty with Maven.
I realised the answer to my own question: you assign a repository layout to your repo, for example maven-2-default. This describes the file path format and how it relates to the version number, classifier, etc.
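As a concrete illustration (the repository name, coordinates, and classifier below are invented for the example): with a maven-2-default layout the classifier is encoded in the file name, so an upload file spec only has to target the matching path:

{
  "files": [
    {
      "pattern": "target/myapp-1.0.0-sources.jar",
      "target": "libs-release-local/com/example/myapp/1.0.0/myapp-1.0.0-sources.jar"
    }
  ]
}

Because the layout maps <artifactId>-<version>-<classifier>.jar back to Maven coordinates, Maven clients can then resolve this as com.example:myapp:1.0.0 with the sources classifier.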
You can use file specs to upload to or download from Artifactory in a Jenkins Freestyle job (simply check the Generic-Artifactory Integration option) or in a Jenkins Pipeline job.
HTH,
Or

Why doesn't the Nexus index list snapshot jars?

The Nexus index lists the pom, zip, test.jar, sources.jar, and docs.zip for my snapshot but doesn't list the jar artifact.
The Nexus storage view shows the jar.
Maven users pulling dependencies from this instance regularly download the jar.
Is there a way to get it to show in the search and index with a download link, or is the default artifact unlisted on purpose (I guess because I should always access it via the POM)?
You can; it works for me.
I don't see the snapshot jars in the Browse Index tab, but I do see them in the Browse Storage tab, and I can also see them in my web browser, at something like this:
https://repository.apache.org/content/groups/snapshots/commons-beanutils/commons-beanutils/1.8.4-SNAPSHOT/
I use:
Sonatype Nexus™ Open Source Edition, Version 1.9.2
Samuel and I discussed the issue in the other answer. It seems this is working as designed, probably due to the transient nature of snapshots.
You can find the snapshot jars (or default artifacts) via the full URL and by browsing the storage, but they do not show up in the index or in the web UI search.

What's the proper way to access the filesystem from a bundle independent of the launcher?

I have a few resources (log files, database files, separate configuration files, etc.) that I would like to be able to access from my OSGi bundles. Up until now, I've been using a relative file path to access them. However, now my same bundles are running in different environments (plain old Felix and Glassfish).
Of course, the working directories are different and I would like to be able to use a method where the directory is known and deterministic. From what I can tell, the working directory for Glassfish shouldn't be assumed and isn't spec'ed (glassfish3/glassfish/domains/domain1/config currently).
I could try to embed these files in the bundles themselves, but then they would not be easily accessible. For instance, I want it to be easy to find the log files without having to explode a cached bundle to access them. Also, I don't know that I can give my H2 JDBC driver a URL to something inside a bundle.
A good method is to store persistent files in a subdirectory of the current working directory (System.getProperty("user.dir")) or of the user's home directory (System.getProperty("user.home")).
Temporary and bundle-specific files should be stored in the bundle's data area (BundleContext.getDataFile(String)). Uninstalling the bundle will then automatically clean them up. If different bundles need access to the same files, use a service to pass this information.
The last option: really long-lived, critically important files like major databases should be stored in /var (or the Windows equivalent). In those cases I would point out the location with Config Admin.
In general it is a good idea to deliver the files in a bundle and expand them to their proper place. This makes managing the system easier.
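A minimal sketch of the data-area approach, picking up the H2 case from the question (the directory and file names are illustrative; error handling omitted):

import java.io.File;
import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;

public class Activator implements BundleActivator {

    @Override
    public void start(BundleContext context) {
        // Persistent storage area managed by the framework for this bundle;
        // it is removed automatically when the bundle is uninstalled.
        // getDataFile returns null if the framework has no file system support.
        File dbDir = context.getDataFile("h2");
        if (dbDir != null) {
            dbDir.mkdirs();
            // The resulting path is a plain file system path, so it can be
            // handed to the H2 JDBC driver directly:
            String jdbcUrl = "jdbc:h2:" + new File(dbDir, "app").getAbsolutePath();
            System.out.println("Using database at " + jdbcUrl);
        }
    }

    @Override
    public void stop(BundleContext context) {
    }
}

The same location is deterministic across launchers (Felix, Glassfish, etc.) because the framework, not the working directory, decides where the data area lives.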
You have some options here. The first is to use the Configuration Admin service to specify a configuration directory, so you can access files if you have to.
For log files I recommend OPS4J Pax Logging. It allows you to simply use a logging API like slf4j, and Pax Logging does the log management. It can be configured using a log4j-style config.
I think you should install the DB as a bundle too. For example, I use Derby a lot in smaller projects. Derby can simply be started as a bundle and then manages the database files itself. I'm not sure about H2, but I guess it could work similarly.
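For illustration, the bundle code then stays backend-agnostic; a minimal sketch (the class name is made up):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Worker {
    // Pax Logging supplies the slf4j implementation at runtime;
    // where the log files end up is decided by its log4j-style
    // configuration, not by this bundle.
    private static final Logger log = LoggerFactory.getLogger(Worker.class);

    public void run() {
        log.info("Worker started");
    }
}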
