I am new to Flume NG. I have to write a program that can transfer a text file to another program (an agent). I know the client must know the agent's details, i.e. host IP, port number, etc., and that a source, a sink, and a channel must be defined. I just want to transfer a log file to the server. My client code is as follows.
import java.nio.charset.Charset;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class MyRpcClientFacade {
    private RpcClient client;
    private String hostname;
    private int port;

    public void init(String hostname, int port) {
        this.hostname = hostname;
        this.port = port;
        this.client = RpcClientFactory.getDefaultInstance(hostname, port);
    }

    public void sendDataToFlume(String data) {
        Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));
        try {
            client.append(event);
        } catch (EventDeliveryException e) {
            // Rebuild the client if delivery fails
            client.close();
            client = RpcClientFactory.getDefaultInstance(hostname, port);
        }
    }

    public void cleanUp() {
        client.close();
    }
}
The code above can send only String data to the specified process, but I have to send files. Also, please tell me whether the source, channel, and sink have to be written on the server, and if so, how to configure and write them. Please give a small sample of a source, sink, and channel.
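For reference, one rough sketch of sending a file with the client above is to read it line by line so each line becomes one Flume event; the host, port, and file path below are placeholders:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FileSender {
    public static void main(String[] args) throws IOException {
        MyRpcClientFacade client = new MyRpcClientFacade();
        client.init("localhost", 41414); // placeholder host and port

        // Send the file one line per Flume event
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get("/var/log/test.log"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                client.sendDataToFlume(line);
            }
        } finally {
            client.cleanUp();
        }
    }
}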
Actually, you just have to run a Flume agent on each node and provide a configuration file describing its behavior.
For instance, if your node reads a file (tailing each new line and sending it as an event to the channel) and forwards the contents through an RPC socket, your configuration will look like:
# sources/sinks/channels list
<Agent>.sources = <Name Source1>
<Agent>.sinks = <Name Sink1>
<Agent>.channels = <Name Channel1>
# Channel attribution to a source
<Agent>.sources.<Name Source1>.channels = <Name Channel1>
# Channel attribution to a sink (note: "channel" is singular for sinks)
<Agent>.sinks.<Name Sink1>.channel = <Name Channel1>
# Configuration (sources, channels and sinks)
# Source properties : <Name Source1>
<Agent>.sources.<Name Source1>.type = exec
<Agent>.sources.<Name Source1>.command = tail -F test
# Channel properties : <Name Channel1>
<Agent>.channels.<Name Channel1>.type = memory
<Agent>.channels.<Name Channel1>.capacity = 1000
<Agent>.channels.<Name Channel1>.transactionCapacity = 1000
# Sink properties : <Name Sink1>
<Agent>.sinks.<Name Sink1>.type = avro
<Agent>.sinks.<Name Sink1>.channel = <Name Channel1>
<Agent>.sinks.<Name Sink1>.hostname = <HOST NAME or IP>
<Agent>.sinks.<Name Sink1>.port = <PORT NUMBER>
Then you will have to set up a second agent on the receiving host, which listens on an Avro source on the same port and processes the events however you want to store them.
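For illustration, the receiving agent's configuration could look roughly like this (the agent and component names are placeholders, and the logger sink is just the simplest way to see events arriving):

# Receiving agent: listens on an Avro source and just logs the events
collector.sources = AvroIn
collector.channels = MemCh
collector.sinks = LogOut

collector.sources.AvroIn.type = avro
collector.sources.AvroIn.bind = 0.0.0.0
collector.sources.AvroIn.port = <PORT NUMBER>
collector.sources.AvroIn.channels = MemCh

collector.channels.MemCh.type = memory
collector.channels.MemCh.capacity = 1000
collector.channels.MemCh.transactionCapacity = 1000

collector.sinks.LogOut.type = logger
collector.sinks.LogOut.channel = MemCh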
I hope it helps ;)
Related
I'm creating a POC to store files in Azure, following the steps in https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-directory-file-acl-dotnet. In the snippet below, creating the directory fails with the message "No such host is known. (securedfstest02.blob.core.windows.net:443)". I'd appreciate any suggestion to work around this issue.
using Azure;
using Azure.Storage;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace DataLakeHelloWorld
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
            try
            {
                CreateFileClientAsync_DirectoryAsync().Wait();
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
            }
        }

        static async Task CreateFileClientAsync_DirectoryAsync()
        {
            // Make StorageSharedKeyCredential to pass to the serviceClient
            string storageAccountName = "secureblobtest02";
            string storageAccountKey = "mykeyredacted";
            string dfsUri = "https://" + storageAccountName + ".dfs.core.windows.net";
            StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(storageAccountName, storageAccountKey);

            // Create DataLakeServiceClient using StorageSharedKeyCredentials
            DataLakeServiceClient serviceClient = new DataLakeServiceClient(new Uri(dfsUri), sharedKeyCredential);

            // Create a DataLake filesystem
            DataLakeFileSystemClient filesystem = serviceClient.GetFileSystemClient("my-filesystem");
            if (!await filesystem.ExistsAsync())
                await filesystem.CreateAsync();

            // Create a DataLake directory (GetDirectoryClient, so it isn't created twice)
            DataLakeDirectoryClient directory = filesystem.GetDirectoryClient("my-dir");
            if (!await directory.ExistsAsync())
                await directory.CreateAsync();

            // Create a DataLake file inside the directory
            DataLakeFileClient file = directory.GetFileClient("my-file");
            if (!await file.ExistsAsync())
                await file.CreateAsync();

            // Verify we created one file (advance the enumerator before reading Current)
            var response = filesystem.GetPathsAsync();
            IAsyncEnumerator<PathItem> enumerator = response.GetAsyncEnumerator();
            if (await enumerator.MoveNextAsync())
                Console.WriteLine(enumerator.Current?.Name);

            // Cleanup
            await filesystem.DeleteAsync();
        }
    }
}
Update:
In your question you mention Azure Data Lake, but the error shows the host securedfstest02.blob.core.windows.net.
Azure Data Lake Storage uses .dfs.core.windows.net, whereas Azure Blob Storage uses .blob.core.windows.net. When using Blob-service operations against ADLS you would have to change the endpoint accordingly.
Please note the URI templates in the official MS docs.
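Schematically, the two endpoints differ only in the service segment (the account name is a placeholder):

https://<storage-account>.dfs.core.windows.net    (Data Lake Storage Gen2, DFS endpoint)
https://<storage-account>.blob.core.windows.net   (Blob Storage endpoint)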
I have used the same code and was able to create the directory; I just swapped in my own ADLS credentials. I have not configured any additional permissions, and my ADLS account allows access from all networks. You might want to check whether yours is configured for a specific network by default, or whether the firewall allows your client IP.
using Azure;
using Azure.Storage;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace DataLakeHelloWorld
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Starting....");
            try
            {
                Console.WriteLine("Executing...");
                CreateFileClientAsync_DirectoryAsync().Wait();
                Console.WriteLine("Done");
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
            }
        }

        static async Task CreateFileClientAsync_DirectoryAsync()
        {
            // Make StorageSharedKeyCredential to pass to the serviceClient
            string storageAccountName = "kteststarageeadls";
            string storageAccountKey = "6fAe+P8LRe8LH0Ahxxxxxxxxx5ma17Slr7SjLy4oVYSgj05m+zWZuy5X8p4/Bbxxx8efzCj/X+On/Fwmxxxo7g==";
            string dfsUri = "https://" + storageAccountName + ".dfs.core.windows.net";
            StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(storageAccountName, storageAccountKey);

            // Create DataLakeServiceClient using StorageSharedKeyCredentials
            DataLakeServiceClient serviceClient = new DataLakeServiceClient(new Uri(dfsUri), sharedKeyCredential);

            // Create a DataLake filesystem
            DataLakeFileSystemClient filesystem = serviceClient.GetFileSystemClient("my-filesystem");
            if (!await filesystem.ExistsAsync())
                await filesystem.CreateAsync();

            // Create a DataLake directory (GetDirectoryClient, so it isn't created twice)
            DataLakeDirectoryClient directory = filesystem.GetDirectoryClient("my-dir");
            if (!await directory.ExistsAsync())
                await directory.CreateAsync();

            // Create a DataLake file inside the directory
            DataLakeFileClient file = directory.GetFileClient("my-file");
            if (!await file.ExistsAsync())
                await file.CreateAsync();

            // Verify we created one file (advance the enumerator before reading Current)
            var response = filesystem.GetPathsAsync();
            IAsyncEnumerator<PathItem> enumerator = response.GetAsyncEnumerator();
            if (await enumerator.MoveNextAsync())
                Console.WriteLine(enumerator.Current?.Name);

            // Cleanup
            //await filesystem.DeleteAsync();
        }
    }
}
I've edited the storage account key; it's shown for reference only.
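If network rules do turn out to be the issue, one way to inspect or relax the account's default network action is via the Azure CLI; this assumes you have the CLI installed, and the account and resource-group names are placeholders:

az storage account show --name <account> --resource-group <rg> --query networkRuleSet.defaultAction
az storage account update --name <account> --resource-group <rg> --default-action Allow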
When I call
import org.apache.http.impl.nio.bootstrap.*;
import java.net.InetSocketAddress;
HttpServer server = ServerBootstrap.bootstrap()
        .setListenerPort(0)
        // ...
        .create();
server.start();
How do I get the actual port number assigned to the server?
I tried
int port = ((InetSocketAddress) server.getEndpoint().getAddress()).getPort();
But that just returned 0.
HttpComponents hides a lot of its internals, including exactly what you are looking for. A workaround is to reserve a port outside the library and then pass it in:
int port;
try (ServerSocket socket = new ServerSocket()) {
    socket.setReuseAddress(false);
    socket.bind(new InetSocketAddress(0));
    port = socket.getLocalPort();
}
HttpServer server = ServerBootstrap.bootstrap()
        .setListenerPort(port)
        // ...
        .create();
server.start();
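Note that this has a small race window: another process could claim the port between the ServerSocket closing and the server binding it. An alternative, if I recall the HttpCore 4.4 NIO API correctly (treat this as an unverified sketch), is to wait for the listener endpoint to finish binding before asking for its address, since getAddress() only reflects the real ephemeral port once binding has completed:

HttpServer server = ServerBootstrap.bootstrap()
        .setListenerPort(0)
        // ...
        .create();
server.start();
// ListenerEndpoint#waitFor() blocks until the endpoint is actually bound
// (it throws InterruptedException, so declare or handle it).
server.getEndpoint().waitFor();
int port = ((InetSocketAddress) server.getEndpoint().getAddress()).getPort();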
I have configured the SendGrid API for the email service in my Spring Boot app, and it's working fine. I wanted to change the from address to "no-reply@xyz.com" instead of "apikey", but I couldn't.
I also tried it using JavaMailSender, but no luck.
Could anyone please let me know how?
public void sendEmailUsingSendgrid(EmailRequest emailRequest) throws IOException {
    String text = getEmailTemplate(emailRequest);
    SendGrid sg = new SendGrid(sendGridApi);
    sg.addRequestHeader("X-Mock", "true");
    Request request = new Request();
    Mail mail = new Mail();
    mail.setFrom(new Email(emailRequest.getFr()));
    mail.setSubject(emailRequest.getSbjt());
    mail.addContent(new Content("text/html", text));
    List<String> mailList = Arrays.asList(emailRequest.getTo());
    for (String to : mailList) {
        Personalization p1 = new Personalization();
        p1.addTo(new Email(to));
        mail.addPersonalization(p1);
    }
    mail.setReplyTo(new Email("noreply@xyz.com"));
    request.setMethod(Method.POST);
    request.setEndpoint("mail/send");
    request.setBody(mail.build());
    sg.api(request);
}
Properties
# SENDGRID
sendgrid-api-key=SG.xxxxxxxxxxxxxxxxxxxxxxxx (redacted)
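For what it's worth, with SendGrid's v3 Mail helper the From header is simply whatever you pass to setFrom(); "apikey" only appears as the literal SMTP username when sending through SendGrid's SMTP relay (e.g. via JavaMailSender), and it is never a sender address. A minimal sketch, where the address and display name are placeholders:

// The From header comes from setFrom(); the optional second
// constructor argument is the display name shown to recipients.
mail.setFrom(new Email("no-reply@xyz.com", "XYZ Notifications"));

Note that the from address or its domain must also be verified under Sender Authentication in the SendGrid dashboard, or SendGrid will reject the send.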
I'm trying to write a custom Nifi processor which will take in the contents of the incoming flow file, perform some math operations on it, then write the results into an outgoing flow file. Is there a way to dump the contents of the incoming flow file into a string or something? I've been searching for a while now and it doesn't seem that simple. If anyone could point me toward a good tutorial that deals with doing something like that it would be greatly appreciated.
The Apache NiFi Developer Guide documents the process of creating a custom processor very well. In your specific case, I would start with the Component Lifecycle section and the Enrich/Modify Content pattern. Any other processor which does similar work (like ReplaceText or Base64EncodeContent) would be good examples to learn from; all of the source code is available on GitHub.
Essentially you need to implement the onTrigger() method in your processor class, read the flowfile content and parse it into your expected format, perform your operations, and then re-populate the resulting flowfile content. Your source code will look something like this:
@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    final ComponentLog logger = getLogger();
    AtomicBoolean error = new AtomicBoolean();
    AtomicReference<String> result = new AtomicReference<>(null);
    // This uses a lambda in place of a callback for InputStreamCallback#process()
    session.read(flowFile, in -> {
        long start = System.nanoTime();
        // Read the flowfile content into a String (IOUtils is org.apache.commons.io)
        // TODO: May need to buffer this if the content is large
        try {
            final String contents = IOUtils.toString(in, StandardCharsets.UTF_8);
            result.set(new MyMathOperationService().performSomeOperation(contents));
            long stop = System.nanoTime();
            if (logger.isDebugEnabled()) {
                final long durationNanos = stop - start;
                DecimalFormat df = new DecimalFormat("#.###");
                logger.debug("Performed operation in " + durationNanos + " nanoseconds (" + df.format(durationNanos / 1_000_000_000.0) + " seconds).");
            }
        } catch (Exception e) {
            error.set(true);
            logger.error(e.getMessage() + " Routing to failure.", e);
        }
    });
    if (error.get()) {
        session.transfer(flowFile, REL_FAILURE);
    } else {
        // Again, a lambda takes the place of OutputStreamCallback#process()
        FlowFile updatedFlowFile = session.write(flowFile, out -> {
            final String resultString = result.get();
            final byte[] resultBytes = resultString.getBytes(StandardCharsets.UTF_8);
            out.write(resultBytes);
            out.flush();
        });
        session.transfer(updatedFlowFile, REL_SUCCESS);
    }
}
Daggett is right that the ExecuteScript processor is a good place to start, because it shortens the development lifecycle (no building NARs, deploying, and restarting NiFi to use it); once you have the correct behavior, you can easily copy/paste the code into the generated skeleton and deploy it once.
I am beginning to use the IBM Watson IoT C# sample code as per
https://github.com/ibm-watson-iot/iot-csharp/blob/master/docs/Gateway.rst
However, "An invalid IP address was specified." is thrown when the gateway constructor is called with the org id.
I'm using an org id of 'p3wg4w' (set in config and accessed via the string property Globals.WatsonOrgID).
My code is:
private static void InitGatewayClient()
{
    if (gw == null)
    {
        gw = new GatewayClient(Globals.WatsonOrgID,
            Globals.WatsonGatewayDeviceType,
            Globals.WatsonGatewayDeviceID,
            Globals.WatsonAuthMethod,
            Globals.WatsonToken);
        gw.commandCallback += processCommand;
        gw.errorCallback += processError;
        gw.connect();
        Console.WriteLine("Gateway connected");
        Console.WriteLine("publishing gateway events..");
    }
}
Has anyone seen this before?
First check whether you can reach the broker host, e.g.:
telnet p3wg4w.messaging.internetofthings.ibmcloud.com 8883
The library doesn't use a raw IP address to create the connection; it builds the host name from your org id and the variables below:
public static string DOMAIN = ".messaging.internetofthings.ibmcloud.com";
public static int MQTTS_PORT = 8883;
So I can only think that your firewall is blocking the connection.
I've used the sample below and it worked just fine for me:
https://github.com/ibm-watson-iot/iot-csharp/blob/master/sample/Gateway/SampleGateway.cs