Is there a way to get the Spark tracking URL other than mining log files for the log output? - hadoop

I have a Scala application that creates a Spark Session, and I have set up health checks that use the Spark REST API. The Spark application itself runs on Hadoop YARN. The REST API URL is currently retrieved by reading the Spark logging generated when the Spark Session is created. This works most of the time, but there are some edge cases in my application where it doesn't work so well.
Does anyone know of another way to get this tracking URL?

"You can do this by reading the yarn.resourcemanager.webapp.address value from YARN's config and the application ID (which is exposed both in an event sent on the listener bus, and an existing SparkContext method."
Copied the paragraph above as is from the developer's response found at: https://issues.apache.org/jira/browse/SPARK-20458
UPDATE:
I did try the solution and got pretty close. Here's some Scala/Spark code to build that URL:
@transient val ssc: StreamingContext = StreamingContext.getActiveOrCreate(rabbitSettings.checkpointPath, CreateStreamingContext)
// Update yarn logs URL in Elasticsearch
YarnLogsTracker.update(
ssc.sparkContext.uiWebUrl,
ssc.sparkContext.applicationId,
"test2")
And the YarnLogsTracker object goes something like this:
object YarnLogsTracker {
private def recoverURL(u: Option[String]): String = u match {
case Some(a) => a.split(":").take(2).mkString(":")
case None => ""
}
def update(rawUrl: Option[String], rawAppId: String, tenant: String): Unit = {
val logUrl = s"${recoverURL(rawUrl)}:8042/node/containerlogs/container${rawAppId.substring(11)}_01_000002/$tenant/stdout/?start=-4096"
...
Which produces something like this: http://10.99.25.146:8042/node/containerlogs/container_1516203096033_91164_01_000002/test2/stdout/?start=-4096
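For readers who prefer Java, the same string manipulation can be sketched as below. This is only an illustration of the transformation above: the class and method names are hypothetical, and the port 8042 and the "_01_000002" suffix are copied from the Scala snippet, so they carry the same assumptions (first application attempt, second container, NodeManager web UI on that port on the driver's host).

```java
/** Sketch: derive a container-log URL from a Spark UI URL and a YARN application ID. */
final class ContainerLogUrl {
    private static final String APP_PREFIX = "application_";

    static String build(String uiWebUrl, String appId, String tenant) {
        if (!appId.startsWith(APP_PREFIX)) {
            throw new IllegalArgumentException("Unexpected application ID: " + appId);
        }
        // Keep scheme and host, drop the Spark UI port: "http://host:4040" -> "http://host"
        String[] parts = uiWebUrl.split(":");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected UI URL: " + uiWebUrl);
        }
        String base = parts[0] + ":" + parts[1];
        // "application_<ts>_<seq>" -> "container_<ts>_<seq>_01_000002" (suffix assumed, as in the Scala code)
        String containerId = "container_" + appId.substring(APP_PREFIX.length()) + "_01_000002";
        return base + ":8042/node/containerlogs/" + containerId + "/" + tenant + "/stdout/?start=-4096";
    }

    public static void main(String[] args) {
        System.out.println(build("http://10.99.25.146:4040", "application_1516203096033_91164", "test2"));
    }
}
```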

I've discovered a "reasonable" way to obtain this. Obviously, the best way would be for Spark libraries to expose the ApplicationReport that they're already fetching to the launcher application directly, since they go to the trouble of setting delegation tokens, etc. However, this seems unlikely to happen.
This approach is two-pronged. First, it attempts to build a YarnClient itself, in order to fetch the ApplicationReport, which will have the authoritative tracking URL. However, in my experience, this can fail (e.g. if the job was run in CLUSTER mode with a --proxy-user in a Kerberized environment, it will not be able to authenticate to YARN properly). Second, if that lookup fails, it falls back to constructing the URL from YARN configuration properties.
In my case, I'm calling this helper method from the driver itself, and reporting the result back to my launcher application on the side. However, in principle, any place where you have the Hadoop Configuration available should work (including, possibly, your launcher application). You can obviously use either "prong" of this implementation (or both) depending on your needs and tolerance for complexity, extra processing, etc.
/**
* Given a Hadoop {@link org.apache.hadoop.conf.Configuration} and appId, use the YARN API (via an
* {@link YarnClient} instance) to get the application report, which includes the trackingUrl. If this fails,
* then as a fallback, it attempts to "guess" the URL by looking at various YARN configuration properties,
* and assumes that the URL will be something like: <pre>[yarnWebUI:port]/proxy/[appId]</pre>.
*
* @param hadoopConf the Hadoop {@link org.apache.hadoop.conf.Configuration}
* @param appId the YARN application ID
* @return the app trackingUrl, either retrieved using the {@link YarnClient}, or manually constructed using
* the fallback approach
*/
public static String getYarnApplicationTrackingUrl(org.apache.hadoop.conf.Configuration hadoopConf, String appId) {
LOG.debug("Attempting to look up YARN url for applicationId {}", appId);
YarnClient yarnClient = null;
try {
// do not attempt to fail over on authentication error (ex: running with proxy-user and Kerberos)
hadoopConf.set("yarn.client.failover-max-attempts", "0");
yarnClient = YarnClient.createYarnClient();
yarnClient.init(hadoopConf);
yarnClient.start();
final ApplicationReport report = yarnClient.getApplicationReport(ConverterUtils.toApplicationId(appId));
return report.getTrackingUrl();
} catch (YarnException | IOException e) {
LOG.warn(
"{} attempting to get report for YARN appId {}; attempting to use manually constructed fallback",
e.getClass().getSimpleName(),
appId,
e
);
String baseYarnWebappUrl;
String protocol;
if ("HTTPS_ONLY".equals(hadoopConf.get("yarn.http.policy"))) {
// YARN is configured to use HTTPS only, hence return the https address
baseYarnWebappUrl = hadoopConf.get("yarn.resourcemanager.webapp.https.address");
protocol = "https";
} else {
baseYarnWebappUrl = hadoopConf.get("yarn.resourcemanager.webapp.address");
protocol = "http";
}
return String.format("%s://%s/proxy/%s", protocol, baseYarnWebappUrl, appId);
} finally {
if (yarnClient != null) {
yarnClient.stop();
}
}
}
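The fallback "prong" can also be tried in isolation. The sketch below mirrors the logic of the catch block above, but takes a plain Map as a stand-in for the Hadoop Configuration so that it runs without Hadoop on the classpath. The class name is hypothetical, and unlike the method above it throws when the address property is missing instead of formatting a null into the URL.

```java
import java.util.Map;

/** Sketch: build the YARN proxy tracking URL from configuration values only. */
final class TrackingUrlFallback {
    static String build(Map<String, String> conf, String appId) {
        // Same policy check as above: HTTPS_ONLY selects the https address and scheme
        boolean httpsOnly = "HTTPS_ONLY".equals(conf.get("yarn.http.policy"));
        String key = httpsOnly
                ? "yarn.resourcemanager.webapp.https.address"
                : "yarn.resourcemanager.webapp.address";
        String address = conf.get(key);
        if (address == null || address.isEmpty()) {
            // Assumption of this sketch: fail loudly rather than return "http://null/proxy/..."
            throw new IllegalStateException("Missing configuration property: " + key);
        }
        return String.format("%s://%s/proxy/%s", httpsOnly ? "https" : "http", address, appId);
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of("yarn.resourcemanager.webapp.address", "rm.example.com:8088");
        System.out.println(build(conf, "application_1516203096033_91164"));
    }
}
```

The host name rm.example.com:8088 is a placeholder; in a real driver the values would come from the Hadoop Configuration shown above.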

Related

Fabric8 customResourceDefinitions test

I am working on a Fabric8 unit test and am trying to create a CRD against KubernetesServer.
import io.fabric8.kubernetes.api.model.apiextensions.v1.CustomResourceDefinition;
public class TestCertManagerService {
@Rule
public KubernetesServer server = new KubernetesServer();
@Test
@DisplayName("Should list all CronTab custom resources")
public void testCronTabCrd() throws IOException {
// Given
//server.expect().get().withPath("/apis/stable.example.com/v1/namespaces/default/crontabs").andReturn(HttpURLConnection.HTTP_OK, ?????).once();
KubernetesClient client = server.getClient();
CustomResourceDefinition cronTabCrd = client.apiextensions().v1().customResourceDefinitions()
.load(new BufferedInputStream(new FileInputStream("src/test/resources/crontab-crd.yml")))
.get();
client.apiextensions().v1().customResourceDefinitions().create(cronTabCrd);
}
}
When I ran it, I got the following error
TestCertManagerService > testCronTabCrd FAILED
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://localhost:60690/apis/apiextensions.k8s.io/v1/customresourcedefinitions.
at app//io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:694)
at app//io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:673)
at app//io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:626)
at app//io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:566)
at app//io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:527)
at app//io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:510)
at app//io.fabric8.kubernetes.client.dsl.base.BaseOperation.listRequestHelper(BaseOperation.java:136)
at app//io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:505)
at app//io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:494)
at app//io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:87)
at app//com.ibm.si.qradar.cp4s.service.certmanager.TestCertManagerService.testCronTabCrd(TestCertManagerService.java:94)
I have a few questions:
(1) In this case I am using the v1() interface, but I have seen example code that uses v1beta1(). What decides which version to use? By the way, I am using version 5.9.0 of the kubernetes-client library.
(2) In my code, I commented out this line:
server.expect().get().withPath("/apis/stable.example.com/v1/namespaces/default/crontabs").andReturn(HttpURLConnection.HTTP_OK, ?????).once();
What is this statement for? In my case I want to load a CRD and then create a CR, so what should "?????" be in the statement?
Any ideas about the stack trace? How can I fix it?
Thanks in advance.
From the code which you shared, it looks like you're using Fabric8 Kubernetes Mock Server in expectations mode. Expectations mode requires the user to set the REST API expectations. So the code shown below is setting some expectations from Mock Server viewpoint.
// Given
server.expect().get()
.withPath("/apis/stable.example.com/v1/namespaces/default/crontabs")
.andReturn(HttpURLConnection.HTTP_OK, getCronTabList())
.once();
These are the expectations set:
The mock server expects a GET request at this URL: /apis/stable.example.com/v1/namespaces/default/crontabs. From the URL we can tell that it targets a resource in the stable.example.com API group with version v1, in the default namespace, with crontabs as the plural name.
When this URL is hit, the andReturn() method defines the response code and the response body. The first argument is the response code (200 in this case) and the second is the response body (a list object of CronTab resources, which the mock server serializes and sends as the response).
This request is expected only .once(). If the KubernetesClient created by the mock server requests this endpoint more than once, the test fails. To allow the endpoint to be hit more than once, use the .times(..) method instead.
But in your test I see you're loading a CustomResourceDefinition from YAML and creating it which doesn't seem to match the expectations you set earlier. If you're writing a test about creating a CustomResourceDefinition, it should look like this:
@Test
@DisplayName("Should Create CronTab CRD")
void testCronTabCrd() throws IOException {
// Given
KubernetesClient client = server.getClient();
CustomResourceDefinition cronTabCrd = client.apiextensions().v1()
.customResourceDefinitions()
.load(new BufferedInputStream(new FileInputStream("src/test/resources/crontab-crd.yml")))
.get();
server.expect().post()
.withPath("/apis/apiextensions.k8s.io/v1/customresourcedefinitions")
.andReturn(HttpURLConnection.HTTP_OK, cronTabCrd)
.once();
// When
CustomResourceDefinition createdCronTabCrd = client.apiextensions().v1()
.customResourceDefinitions()
.create(cronTabCrd);
// Then
assertNotNull(createdCronTabCrd);
}
By the way, if you don't like setting REST expectations, the Fabric8 Kubernetes Mock Server also has a CRUD mode, which mocks a real Kubernetes API server. You can enable it like this:
@Rule
public KubernetesServer server = new KubernetesServer(true, true);
then use it in test like this:
@Test
@DisplayName("Should Create CronTab CRD")
void testCronTabCrd() throws IOException {
// Given
KubernetesClient client = server.getClient();
CustomResourceDefinition cronTabCrd = client.apiextensions().v1()
.customResourceDefinitions()
.load(new BufferedInputStream(new FileInputStream("src/test/resources/crontab-crd.yml")))
.get();
// When
CustomResourceDefinition createdCronTabCrd = client.apiextensions().v1()
.customResourceDefinitions()
.create(cronTabCrd);
// Then
assertNotNull(createdCronTabCrd);
}
I added CustomResourceLoadAndCreateTest and CustomResourceLoadAndCreateCrudTest tests in my demo repository: https://github.com/r0haaaan/kubernetes-mockserver-demo

Plaid Link with Java API - Why am I getting 'client_id must be a properly formatted, non-empty string' error

I'm using the official Plaid Java API to make a demo application. I've got the back end working in Sandbox, with public tokens generated by their /sandbox/public_token/create endpoint.
Now, I'm trying to modify the front-end from Plaid's quickstart project to talk with my back end, so I can start using the development tier to work with my IRL bank account.
I'm implementing the basic first step - generating a link_token. However, when the front end calls my controller, I get the following error:
ErrorResponse{displayMessage='null', errorCode='INVALID_FIELD', errorMessage='client_id must be a properly formatted, non-empty string', errorType='INVALID_REQUEST', requestId=''}
This is my current iteration on trying to generate a link_token:
public LinkTokenResponse generateLinkToken() throws IOException {
List<String> plaidProducts = new ArrayList<>();
plaidProducts.add("transactions");
List<String> countryCodes = new ArrayList<>();
countryCodes.add("US");
countryCodes.add("CA");
Response<LinkTokenCreateResponse> response =
plaidService.getClient().service().linkTokenCreate(new LinkTokenCreateRequest(
new LinkTokenCreateRequest.User("test_user_ID"),
"test client",
plaidProducts,
countryCodes,
"en"
).withRedirectUri("")).execute();
try {
ErrorResponse errorResponse = plaidService.getClient().parseError(response);
System.out.println(errorResponse.toString());
} catch (Exception e) {
// deal with it. you didn't even receive a well-formed JSON error response.
}
return new LinkTokenResponse(response.body().getLinkToken());
}
I modeled this after how it seems to work in the Plaid Quickstart's example. I do not see client ID being set explicitly anywhere in there, or anywhere else in Plaid's Java API. I'm at a bit of a loss.
I'm not super familiar with the Java Plaid library specifically, but when using the Plaid client libraries, the client ID is generally set when initializing the client instance. From there, it is automatically included in any calls you make from that client.
You can see the client ID being set in the Java Quickstart here:
https://github.com/plaid/quickstart/blob/master/java/src/main/java/com/plaid/quickstart/QuickstartApplication.java#L67
PlaidClient.Builder builder = PlaidClient.newBuilder()
.clientIdAndSecret(configuration.getPlaidClientID(), configuration.getPlaidSecret());
switch (configuration.getPlaidEnv()) {
case "sandbox":
builder = builder.sandboxBaseUrl();
break;
case "development":
builder = builder.developmentBaseUrl();
break;
case "production":
builder = builder.productionBaseUrl();
break;
default:
throw new IllegalArgumentException("unknown environment: " + configuration.getPlaidEnv());
}
PlaidClient plaidClient = builder.build();

Can anyone tell me the Java utility to download documents to your local PC from Content Engine in filenet?

Hello, I am trying to write a Java utility to download documents from the FileNet Content Engine to a local PC. Can anyone help me out?
You should read about the FileNet P8 CE API; you can start here:
You have to know that the FileNet Content Engine has two types of interface that can be used to connect to it: RMI (via EJB) and SOAP. A command-line app like the one you are planning to write can connect only by SOAP (I am not sure that this is still true for the newest versions, but what is definitely true is that it is much easier to set up the SOAP connection than the EJB one), so you have to read the part of the documentation that explains how to establish a connection to your Content Engine this way.
On the link above, you can see that first of all you have to collect the required jars for the SOAP connection: please check the "Required for a Content Engine Java API CEWS transport client" section for the file names.
After you collect them, you will need a SOAP WSDL URL and a proper user and password. The user must have the read-properties and read-content rights on the documents you would like to download. You also need to know the ObjectStore name and the identifier or the location of your documents.
Now we have to continue using this Setting Up a Thick Client Development Environment link (I opened it from the page above).
Here you have to scroll down to the "CEWS transport protocol (non-application-server dependent)" section.
Here you can see that you have to create a jaas.conf file with the following content:
FileNetP8WSI {
com.filenet.api.util.WSILoginModule required;
};
This file must be passed as the following JVM argument when you run the class we will create:
java -cp %CREATE_PROPER_CLASSPATH% -Djava.security.auth.login.config=jaas.conf DownloadClient
Now, in the top-right corner of the page, you can see links that describe what to do in order to get a connection, like "Getting Connection", "Retrieving an EntireNetwork Object" etc. I used that snippet to create the class below for you.
import java.io.InputStream;
import java.util.Iterator;
import javax.security.auth.Subject;
import com.filenet.api.collection.ContentElementList;
import com.filenet.api.core.Connection;
import com.filenet.api.core.ContentTransfer;
import com.filenet.api.core.Document;
import com.filenet.api.core.Domain;
import com.filenet.api.core.Factory;
import com.filenet.api.core.ObjectStore;
import com.filenet.api.util.Id;
import com.filenet.api.util.UserContext;

public class DownloadClient {
    public static void main(String[] args) throws Exception {
        String uri = "http://filenetcehost:9080/wsi/FNCEWS40MTOM";
        String userId = "ceadmin";
        String password = "password";
        String osName = "Test";
        UserContext uc = UserContext.get();
        // Get the connection
        Connection conn = Factory.Connection.getConnection(uri);
        // the last value (JAAS stanza name) must match the name of the login module in jaas.conf
        Subject subject = UserContext.createSubject(conn, userId, password, "FileNetP8WSI");
        // set the subject on the current thread (thread-local) before any call that reaches the server
        uc.pushSubject(subject);
        try {
            // Get the default domain and the object store
            Domain domain = Factory.Domain.getInstance(conn, null);
            ObjectStore os = Factory.ObjectStore.fetchInstance(domain, osName, null);
            // from now, we are connected to FileNet CE, and objectStore "Test"
            //https://www.ibm.com/support/knowledgecenter/en/SSNW2F_5.2.0/com.ibm.p8.ce.dev.ce.doc/document_procedures.htm
            // fetchInstance (rather than the fetchless getInstance) so that the document's properties are loaded
            Document doc = Factory.Document.fetchInstance(os, new Id("{F4DD983C-B845-4255-AC7A-257202B557EC}"), null);
            // because in FileNet a document can have more than one associated content element
            // (e.g. it stores single-page TIFFs and handles them as a multi-page document), we have to
            // get the content elements and iterate over the list.
            ContentElementList docContentList = doc.get_ContentElements();
            Iterator iter = docContentList.iterator();
            while (iter.hasNext()) {
                ContentTransfer ct = (ContentTransfer) iter.next();
                // Get the content of the element.
                InputStream stream = ct.accessContentStream();
                // now you have an InputStream to the document content; you can save it to a local file,
                // or do what you want with it, just do not forget to close the stream at the end.
                stream.close();
            }
        } finally {
            uc.popSubject();
        }
    }
}
This code just shows how you can implement such a thick client. I wrote it just now from the documentation, so it is not production code, but once you specify the packages to import and handle the exceptions, it will probably work.
You have to specify the right URL, user, password and docId, of course, and you have to implement the copy from the InputStream to a FileOutputStream, e.g. by using Commons IO or Java NIO.
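The copy step mentioned above can be sketched with java.nio.file.Files.copy from the standard library. In this minimal example, the ContentSaver class and the temporary file name are hypothetical, and a ByteArrayInputStream stands in for the stream returned by accessContentStream(), so the sketch runs without a Content Engine.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Sketch: save a content stream to a local file and close the stream afterwards. */
final class ContentSaver {
    static long save(InputStream content, Path target) throws IOException {
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = content) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for ct.accessContentStream()
        InputStream fake = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        Path target = Files.createTempFile("filenet-doc-", ".bin");
        long written = save(fake, target);
        System.out.println(written + " bytes written to " + target);
    }
}
```

In the download loop, each content element would need its own target path, for example one derived from the element's position in the list.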

ETW EventSource not logging events on Windows Server

I wrote an ETW EventSource using the Microsoft EventSource Library 1.1.25 on NuGet. The purpose of the EventSource is to send events to a custom event log for a security application we maintain. The code works locally, but we cannot get events to be written to the event log on the server.
The EventSource is named (something similar to) Company-Security and sends events to the Admin channel. Locally on my development machine, I can register the EventSource manifest with wevtutil and see the Company-Security folder with the Admin log underneath in Windows Event Viewer. When I run the application, the events are recorded in the event log.
However, when I deploy the application to the test server (running Windows Server 2012), event logging does not work. The log is created and visible in Event Viewer after I register the manifest with wevtutil, though the name is slightly different: a folder named Company-Security/Admin is created, with a log named Company-Security/Admin inside the folder. I can also run PerfView on the server and see the events created. However, nothing is ever written to the event log. I have also put some debug statements in the EventSource code and can see that the EventSource's IsEnabled() returns true.
Below are code snippets of the base class and the implementation class of the eventsource I wrote.
I've researched and can't find any explanation as to why event logging does not work on the server, but works on the development machine. I assume I am missing something, but not sure what.
Abstract Base Class:
public abstract class SecurityEventsBase : EventSource {
protected unsafe void WriteEvent(int eventId, long arg1, string arg2, string arg3) {
if (IsEnabled()) {
if (arg2 == null) {
arg2 = "[not provided]";
}
if (arg3 == null) {
arg3 = "[not provided]";
}
fixed (char* arg2Ptr = arg2) {
fixed (char* arg3Ptr = arg3) {
EventSource.EventData* dataDesc = stackalloc EventSource.EventData[3];
dataDesc[0].DataPointer = (IntPtr)(&arg1);
dataDesc[0].Size = 8;
dataDesc[1].DataPointer = (IntPtr)arg2Ptr;
dataDesc[1].Size = (arg2.Length + 1) * 2;
dataDesc[2].DataPointer = (IntPtr)arg3Ptr;
dataDesc[2].Size = (arg3.Length + 1) * 2;
WriteEventCore(eventId, 3, dataDesc);
}
}
}
}
}
EventSource Class:
[EventSource(Name="Company-Security",LocalizationResources="Events.Properties.Resources")]
public sealed class AuthorizationEvents : SecurityEventsBase {
public static AuthorizationEvents Log = new AuthorizationEvents();
[Event(2000,Level=EventLevel.Informational,Channel=EventChannel.Admin,Message="User '{1}' ({0}) logged in successfully from IP Address {2}")]
public void Login(long UserId, string UserName, string IPAddress) {
if (IsEnabled()) {
WriteEvent(2000, UserId, UserName, IPAddress);
}
}
// additional events would follow here
}
I finally resolved this problem. It had to do with permissions on the folder in which the manifest and binary manifest resource files were stored.
I found this StackOverflow answer, which helped me resolve the problem: https://stackoverflow.com/a/13090615/5202678
I had to grant the local Users group Read & Execute privileges on the folder the manifest files were stored in. Once I did this, events immediately started being recorded in the Event Log.

Session management for a RESTful Web Service using Jersey

I am developing a RESTful web service using Jersey that sits between my Android and iPhone apps and MySQL. I also use Hibernate to map the data to the database.
I have a sessionId (key). It is generated when the user logs in to the system.
In User class:
public Session daoCreateSession() {
if (session == null) {
session = new Session(this);
} else {
session.daoUpdate();
}
return session;
}
In Session Class:
Session(User user) {
this.key = UUID.randomUUID().toString();
this.user = user;
this.date = new Date();
}
void daoUpdate() {
this.key = UUID.randomUUID().toString();
this.date = new Date();
}
When the user signs in to the system successfully, I send this sessionId to the mobile app client. Then, when I want to get some information from the database based on the logged-in user, I check this session key as authentication in the REST services for every request.
For example, for the list of projects that the user is involved in, I use client.GET(SERVER_ADDRESS/project/get/{SessionID})
instead of client.GET(SERVER_ADDRESS/project/get/{username}).
And if it is not a valid session key, I send a 403 Forbidden code back to the client.
You can also take a look here
The thing is, I am not sure about my approach. What do you think are the cons of this approach for Jersey and a mobile app?
I still don't know if the session key approach is a good idea in my case.
If you want to use SessionId then it should have a validation time, like this:
private static final int MINUTES = 90;
public boolean isValid() {
return System.currentTimeMillis() - date.getTime() < 1000 * 60 * MINUTES;
}
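To make the expiry check easier to reason about and to test, the current time can be passed in instead of read from the system clock. The sketch below is one way to do that under stated assumptions: the SessionToken class and its method names are hypothetical and are not the Session class from the question, while the 90-minute limit is taken from the snippet above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;

/** Sketch: a session key that is only valid for a fixed time after creation. */
final class SessionToken {
    static final Duration MAX_AGE = Duration.ofMinutes(90);

    private final String key = UUID.randomUUID().toString();
    private final Instant createdAt;

    SessionToken(Instant createdAt) {
        this.createdAt = createdAt;
    }

    String key() {
        return key;
    }

    /** True while strictly less than MAX_AGE has elapsed since creation. */
    boolean isValidAt(Instant now) {
        return Duration.between(createdAt, now).compareTo(MAX_AGE) < 0;
    }

    public static void main(String[] args) {
        SessionToken token = new SessionToken(Instant.now());
        System.out.println(token.key() + " valid now: " + token.isValidAt(Instant.now()));
    }
}
```

A REST resource would call isValidAt(Instant.now()) on each request and return 403 when it yields false, as described in the question.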
This is a solved problem - servlet containers like Tomcat already do session management, and can distribute session state to other containers in the cluster either by broadcasting over TCP, or by using a shared data source like memcache.
I'd suggest reading up on what's already available, rather than inadvertently reinventing the wheel. Additionally, this is going to become an incredibly hot table if your application proves popular. How will you clear out old session IDs?

Resources