I am working on a Spark Streaming job in which the incoming stream is joined with an existing Hive table. I have created a singleton HiveContext. When the HiveContext fetches table data from Hive, Spark logs a warning, and after a few days the warning turns into an error.
18/03/10 15:55:28 INFO parquet.ParquetRelation$$anonfun$buildInternalScan$1$$anon$1: Input split: ParquetInputSplit{part: hdfs://nameservice1/user/hive/warehouse/iot.db/iotdevice/part-r-00000-931d1d81-af03-41a4-b659-81a883131289.gz.parquet start: 0 end: 5695 length: 5695 hosts: []}
18/03/10 15:55:28 WARN security.UserGroupInformation: PriviledgedActionException as:svc-ra-iotloaddev (auth:SIMPLE) cause:org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
18/03/10 15:55:28 WARN kms.LoadBalancingKMSClientProvider: KMS provider at [https://iotserver9009.kd.iotserver.com:16000/kms/v1/] threw an IOException [org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]!!
The job eventually stops after a few days.
Here is the code for creating the HiveContext:
@transient private var instance: HiveContext = _

def getHiveContext(sparkContext: SparkContext, propertiesBroadcast: Broadcast[Properties]): HiveContext = {
  synchronized {
    val configuration = new Configuration
    configuration.addResource("/etc/hadoop/conf/hdfs-site.xml")
    UserGroupInformation.setConfiguration(configuration)
    UserGroupInformation.getCurrentUser.setAuthenticationMethod(AuthenticationMethod.KERBEROS)

    val secure = propertiesBroadcast.value.getProperty("kerberosSecurity").toBoolean

    if (instance == null) {
      UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        propertiesBroadcast.value.getProperty("hadoop.kerberos.principal"),
        sparkContext.getConf.get("spark.yarn.keytab"))
        .doAs(new PrivilegedExceptionAction[HiveContext]() {
          @Override
          def run(): HiveContext = {
            System.setProperty("hive.metastore.uris", propertiesBroadcast.value.getProperty("hive.metastore.uris"))
            if (secure) {
              System.setProperty("hive.metastore.sasl.enabled", "true")
              System.setProperty("hive.metastore.kerberos.keytab.file", sparkContext.getConf.get("spark.yarn.keytab"))
              System.setProperty("hive.security.authorization.enabled", "false")
              System.setProperty("hive.metastore.kerberos.principal", propertiesBroadcast.value.getProperty("hive.metastore.kerberos.principal"))
              System.setProperty("hive.metastore.execute.setugi", "true")
            }
            instance = new HiveContext(sparkContext)
            instance.setConf("spark.sql.parquet.writeLegacyFormat", "true")
            instance.sparkContext.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
            instance.setConf("hive.exec.dynamic.partition", "true")
            instance.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
            instance
          }
        })
    }

    UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      propertiesBroadcast.value.getProperty("hadoop.kerberos.principal"),
      sparkContext.getConf.get("spark.yarn.keytab"))
      .doAs(new PrivilegedExceptionAction[HiveContext]() {
        @Override
        def run(): HiveContext = {
          instance
        }
      })
  }
}
I usually solve these problems by running kinit before starting the job.
The klist command will display your valid Kerberos tickets.
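If running kinit by hand is impractical for a long-running streaming job, a programmatic relogin from the keytab can keep the TGT fresh. Below is a minimal sketch of that idea (my own, not from the original post), assuming the same principal and keytab properties used in getHiveContext above; checkTGTAndReloginFromKeytab only re-logs in when the ticket is close to expiry, so it is cheap to call often.

import org.apache.hadoop.security.UserGroupInformation

// Hypothetical helper: call ensureLogin() at the start of each micro-batch or on a timer.
// The first call logs in from the keytab; later calls renew the TGT from the keytab only
// when it is near expiry and are otherwise a no-op.
object KerberosRelogin {
  @volatile private var loggedIn = false

  def ensureLogin(principal: String, keytabPath: String): Unit = synchronized {
    if (!loggedIn) {
      UserGroupInformation.loginUserFromKeytab(principal, keytabPath)
      loggedIn = true
    } else {
      UserGroupInformation.getLoginUser.checkTGTAndReloginFromKeytab()
    }
  }
}

In the code above it could be invoked at the top of getHiveContext, for example with KerberosRelogin.ensureLogin(propertiesBroadcast.value.getProperty("hadoop.kerberos.principal"), sparkContext.getConf.get("spark.yarn.keytab")), before the HiveContext touches HDFS or the metastore.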
We are trying to connect to Presto using Java code and execute some queries. The catalog we are using is MySQL.
Presto is installed on a Linux server and has been started there; the Presto CLI works fine on Linux.
MySQL is also installed on the Linux machine, and we are able to access MySQL from Windows using DbVisualizer.
I created a MySQL connector catalog for Presto, and I can successfully query MySQL data through the Presto CLI with presto --server localhost:8080 --catalog mysql --schema tutorials.
Running the Java code on the Windows machine, I am able to access MySQL directly and execute queries, but we are unable to query data through Presto: when we try to run a query, it gives us "Error Executing Query". In the example below, I have used the JDBC jar from Trino.
package testdbPresto;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class PrestoJdbc {
    public static void main(String args[]) throws SQLException, ClassNotFoundException {
        try {
            // connect to the mysql server tutorials database here
            Class.forName("com.facebook.presto.jdbc.PrestoDriver");
            String url = "jdbc:trino://35.173.241.37:8080/mysql/tutorials";
            Properties properties = new Properties();
            properties.setProperty("user", "root");
            properties.setProperty("password", "Redcar88!");
            properties.setProperty("SSL", "true");
            Connection connection = DriverManager.getConnection(url, properties);
            Statement statement = connection.createStatement();

            // select two columns of the mysql author table
            String sql = "select auth_id, auth_name from mysql.tutorials.author";
            ResultSet resultSet = statement.executeQuery(sql);

            // Extract data from result set
            while (resultSet.next()) {
                // Retrieve by column name
                String name = resultSet.getString("auth_name");
                // Display values
                System.out.println("name : " + name);
            }

            // Clean-up environment
            resultSet.close();
            statement.close();
            connection.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Output:
java.sql.SQLException: Error executing query
at io.trino.jdbc.TrinoStatement.internalExecute(TrinoStatement.java:274)
at io.trino.jdbc.TrinoStatement.execute(TrinoStatement.java:227)
at io.trino.jdbc.TrinoStatement.executeQuery(TrinoStatement.java:76)
at testdbPresto.PrestoJdbc.main(PrestoJdbc.java:29)
Caused by: java.io.UncheckedIOException: javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
at io.trino.jdbc.$internal.client.JsonResponse.execute(JsonResponse.java:154)
at io.trino.jdbc.$internal.client.StatementClientV1.<init>(StatementClientV1.java:110)
at io.trino.jdbc.$internal.client.StatementClientFactory.newStatementClient(StatementClientFactory.java:24)
at io.trino.jdbc.QueryExecutor.startQuery(QueryExecutor.java:46)
at io.trino.jdbc.TrinoConnection.startQuery(TrinoConnection.java:728)
at io.trino.jdbc.TrinoStatement.internalExecute(TrinoStatement.java:239)
... 3 more
Caused by: javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
at sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:448)
at sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:174)
at sun.security.ssl.SSLTransport.decode(SSLTransport.java:110)
at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1279)
at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1188)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:401)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:373)
at io.trino.jdbc.$internal.okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:299)
at io.trino.jdbc.$internal.okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:268)
at io.trino.jdbc.$internal.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:160)
at io.trino.jdbc.$internal.okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:256)
at io.trino.jdbc.$internal.okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:134)
at io.trino.jdbc.$internal.okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:113)
at io.trino.jdbc.$internal.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at io.trino.jdbc.$internal.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at io.trino.jdbc.$internal.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.trino.jdbc.$internal.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at io.trino.jdbc.$internal.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at io.trino.jdbc.$internal.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.trino.jdbc.$internal.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at io.trino.jdbc.$internal.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at io.trino.jdbc.$internal.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:125)
at io.trino.jdbc.$internal.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at io.trino.jdbc.$internal.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.trino.jdbc.$internal.client.OkHttpUtil.lambda$basicAuth$1(OkHttpUtil.java:85)
at io.trino.jdbc.$internal.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at io.trino.jdbc.$internal.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.trino.jdbc.$internal.client.OkHttpUtil.lambda$userAgent$0(OkHttpUtil.java:71)
at io.trino.jdbc.$internal.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at io.trino.jdbc.$internal.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.trino.jdbc.$internal.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:200)
at io.trino.jdbc.$internal.okhttp3.RealCall.execute(RealCall.java:77)
at io.trino.jdbc.$internal.client.JsonResponse.execute(JsonResponse.java:131)
... 8 more
This is quite an old question, but it might still be relevant.
You are trying to connect to Trino with the Presto JDBC driver. PrestoSQL was rebranded as Trino, so in order to access Trino via JDBC you should use the Trino JDBC driver.
Add the Trino JDBC dependency to your classpath.
If you use Maven, add this dependency to the pom:
<dependency>
    <groupId>io.trino</groupId>
    <artifactId>trino-jdbc</artifactId>
    <version>${trino-jdbc.version}</version>
</dependency>
Then use the following driver
Class.forName("io.trino.jdbc.TrinoDriver");
Here is code that works with Trino.
import java.sql.DriverManager
import java.util.Properties
import kotlin.system.exitProcess

fun main() {
    val trinoUrl = "jdbc:trino://myDomain:443"
    val properties = Properties()
    properties.setProperty("user", "noUserS")
    // properties.setProperty("password", "noPass")
    properties.setProperty("SSL", "true")

    DriverManager.getConnection(trinoUrl, properties).use { trinoConn ->
        trinoConn.createStatement().use { statement ->
            statement.connection.catalog = "catalog1"
            statement.connection.schema = "default"
            println("Executing query...")
            statement.executeQuery("""
                select
                    restaurantId,
                    type,
                    time
                from table1
                where time > CURRENT_TIMESTAMP - INTERVAL '1' hour
                """.trimIndent()
            ).use { resultSet ->
                val list = mutableListOf<Map<String, String>>()
                while (resultSet.next()) {
                    val data = mapOf(
                        "restaurantId" to resultSet.getString("restaurantId"),
                        "type" to resultSet.getString("type"),
                        "time" to resultSet.getString("time")
                    )
                    list.add(data)
                }
                println("Records returned: ${list.size}")
                println(list)
            }
        }
    }
    exitProcess(0)
}
It is Kotlin, but it's easy to understand. The .use { .. } block is the equivalent of try-with-resources in Java.
Hope this helps.
Folks, I'm new to this whole data-streaming process, but I was able to build and submit a Flink job that reads some CSV data from Kafka, aggregates it, and then puts it in Elasticsearch.
I was able to do the first two parts and print my aggregation to STDOUT. But when I added the code to write to Elasticsearch, nothing seems to be happening there (no data is being added). I looked at the Flink job manager log, and it looks fine (no errors) and says:
2020-03-03 16:18:03,877 INFO
org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7ApiCallBridge
- Created Elasticsearch RestHighLevelClient connected to [http://elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local:9200]
Here is my code at this point:
/*
 * This Scala source file was generated by the Gradle 'init' task.
 */
package flinkNamePull

import java.time.LocalDateTime
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer010, FlinkKafkaProducer010}
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.{DataTypes, Table}
import org.apache.flink.table.api.scala.StreamTableEnvironment
import org.apache.flink.table.descriptors.{Elasticsearch, Json, Schema}

object Demo {

  /**
   * MapFunction to generate Transfers POJOs from parsed CSV data.
   */
  class TransfersMapper extends RichMapFunction[String, Transfers] {
    private var formatter = null

    @throws[Exception]
    override def open(parameters: Configuration): Unit = {
      super.open(parameters)
      //formatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
    }

    @throws[Exception]
    override def map(csvLine: String): Transfers = {
      //var splitCsv = csvLine.stripLineEnd.split("\n")(1).split(",")
      var splitCsv = csvLine.stripLineEnd.split(",")

      val arrLength = splitCsv.length
      val i = 0
      if (arrLength != 13) {
        for (i <- arrLength + 1 to 13) {
          if (i == 13) {
            splitCsv = splitCsv :+ "0.0"
          } else {
            splitCsv = splitCsv :+ ""
          }
        }
      }
      var trans = new Transfers()
      trans.rowId = splitCsv(0)
      trans.subjectId = splitCsv(1)
      trans.hadmId = splitCsv(2)
      trans.icuStayId = splitCsv(3)
      trans.dbSource = splitCsv(4)
      trans.eventType = splitCsv(5)
      trans.prev_careUnit = splitCsv(6)
      trans.curr_careUnit = splitCsv(7)
      trans.prev_wardId = splitCsv(8)
      trans.curr_wardId = splitCsv(9)
      trans.inTime = splitCsv(10)
      trans.outTime = splitCsv(11)
      trans.los = splitCsv(12).toDouble

      return trans
    }
  }

  def main(args: Array[String]) {
    // Create streaming execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    // Set properties per KafkaConsumer API
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "kafka.kafka:9092")
    properties.setProperty("group.id", "test")

    // Add Kafka source to environment
    val myKConsumer = new FlinkKafkaConsumer010[String]("raw.data3", new SimpleStringSchema(), properties)
    // Read from beginning of topic
    myKConsumer.setStartFromEarliest()

    val streamSource = env
      .addSource(myKConsumer)

    // Transform CSV (with a header row) per Kafka event into a Transfers object
    val streamTransfers = streamSource.map(new TransfersMapper())

    // create a TableEnvironment
    val tEnv = StreamTableEnvironment.create(env)
    println("***** NEW EXECUTION STARTED AT " + LocalDateTime.now() + " *****")

    // register a Table
    val tblTransfers: Table = tEnv.fromDataStream(streamTransfers)
    tEnv.createTemporaryView("transfers", tblTransfers)

    tEnv.connect(
      new Elasticsearch()
        .version("7")
        .host("elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local", 9200, "http") // required: one or more Elasticsearch hosts to connect to
        .index("transfers-sum")
        .documentType("_doc")
        .keyNullLiteral("n/a")
    )
      .withFormat(new Json().jsonSchema("{type: 'object', properties: {curr_careUnit: {type: 'string'}, sum: {type: 'number'}}}"))
      .withSchema(new Schema()
        .field("curr_careUnit", DataTypes.STRING())
        .field("sum", DataTypes.DOUBLE())
      )
      .inUpsertMode()
      .createTemporaryTable("transfersSum")

    val result = tEnv.sqlQuery(
      """
        |SELECT curr_careUnit, sum(los)
        |FROM transfers
        |GROUP BY curr_careUnit
        |""".stripMargin)

    result.insertInto("transfersSum")

    // Elasticsearch elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local:9200
    env.execute("Flink Streaming Demo Dump to Elasticsearch")
  }
}
I'm not sure how to debug this beast... Can somebody help me figure out why the Flink job is not adding data to Elasticsearch? :(
From my Flink cluster, I'm able to query Elasticsearch just fine (manually) and add records to my index:
curl -XPOST "http://elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local:9200/transfers-sum/_doc" -H 'Content-Type: application/json' -d'{"curr_careUnit":"TEST123","sum":"123"}'
A kind soul on the Flink mailing list pointed out that it could be Elasticsearch buffering my records... Well, it was. ;)
I have added the following options to the Elasticsearch connector:
.bulkFlushMaxActions(2)
.bulkFlushInterval(1000L)
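By default the Table API Elasticsearch connector buffers writes in a bulk processor, so with low traffic nothing reaches the index until a flush is triggered. For reference, here is a sketch of where those two options would sit on the connect(...) descriptor from the question (my reconstruction, not the exact code from the answer):

tEnv.connect(
  new Elasticsearch()
    .version("7")
    .host("elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local", 9200, "http")
    .index("transfers-sum")
    .documentType("_doc")
    .keyNullLiteral("n/a")
    .bulkFlushMaxActions(2)   // flush after every two buffered actions
    .bulkFlushInterval(1000L) // and at least once per second
)
  .withFormat(new Json().jsonSchema("{type: 'object', properties: {curr_careUnit: {type: 'string'}, sum: {type: 'number'}}}"))
  .withSchema(new Schema()
    .field("curr_careUnit", DataTypes.STRING())
    .field("sum", DataTypes.DOUBLE()))
  .inUpsertMode()
  .createTemporaryTable("transfersSum")

Such tiny flush thresholds are convenient for verifying the pipeline; for real workloads a larger batch size is usually preferable.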
Flink Elasticsearch Connector 7 using Scala
Please find a working and detailed answer which I have provided here.
I am trying to connect to AWS DocumentDB with the async MongoClient.
I created a DocumentDB cluster in AWS and can successfully connect via the command line over SSH.
I went over the example here, created a MongoClient, and successfully connected and inserted events.
But when I tried to create a com.mongodb.async.client.MongoClient, the connection failed with the following error:
No server chosen by WritableServerSelector from cluster description
ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE,
serverDescriptions=[ServerDescription{address=aws-cluster:27017,
type=UNKNOWN, state=CONNECTING,
exception={com.mongodb.MongoSocketReadTimeoutException: Timeout while
receiving message}, caused by
{io.netty.handler.timeout.ReadTimeoutException}}]}. Waiting for 30000
ms before timing out.
ClusterSettings clusterSettings = ClusterSettings.builder()
        .applyConnectionString(new ConnectionString(connectionString)).build();

List<MongoCredential> credentials = new ArrayList<>();
credentials.add(
        MongoCredential.createCredential(
                mongoUserName,
                mongoDBName,
                mongoPassword));

MongoClientSettings settings = MongoClientSettings.builder()
        .credentialList(credentials)
        .clusterSettings(clusterSettings)
        .streamFactoryFactory(new NettyStreamFactoryFactory())
        .writeConcern(WriteConcern.ACKNOWLEDGED)
        .build();

com.mongodb.async.client.MongoClient mongoClient = MongoClients.create(settings);
MongoDatabase testDB = mongoClient.getDatabase("myDB");
MongoCollection<Document> collection = testDB.getCollection("test");

Document doc = new Document("name", "MongoDB").append("type", "database");

// **trying to insert a document => here I got the error**
collection.insertOne(doc, new SingleResultCallback<Void>() {
    @Override
    public void onResult(final Void result, final Throwable t) {
        System.out.println("Inserted!");
    }
});
Do you have any idea why this happens?
I solved it by using a URI:
String uri = "mongodb://<username>:<Password>@<hostname>:27017/?ssl=true&ssl_ca_certs=cert";

MongoClientSettings settings = MongoClientSettings.builder()
        .streamFactoryFactory(new NettyStreamFactoryFactory())
        .applyConnectionString(new ConnectionString(uri))
        .build();

com.mongodb.async.client.MongoClient mongoClient = MongoClients.create(settings);
I encountered a similar error; for me it was related to the TLS configuration.
I disabled TLS in DocumentDB: https://docs.aws.amazon.com/documentdb/latest/developerguide/security.encryption.ssl.html
In my case I had to restart the cluster after disabling TLS (it was not needed for the use case). After the restart, the connection was established successfully.
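Once TLS is disabled on the cluster, the ssl parameters can simply be dropped from the connection string used in the earlier answer. A minimal sketch of that, written in Scala against the same builder calls shown above; the import paths assume the 3.x async Java driver, and the hostname and credentials are placeholders:

import com.mongodb.{ConnectionString, MongoClientSettings}
import com.mongodb.async.client.{MongoClient, MongoClients}
import com.mongodb.connection.netty.NettyStreamFactoryFactory

// Hypothetical connection once TLS has been disabled on the DocumentDB cluster.
val uri = "mongodb://<username>:<password>@<hostname>:27017/?ssl=false"

val settings = MongoClientSettings.builder()
  .streamFactoryFactory(new NettyStreamFactoryFactory())
  .applyConnectionString(new ConnectionString(uri))
  .build()

val mongoClient: MongoClient = MongoClients.create(settings)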
The Spring web server does not start when we specify the RPC security management configuration in the node.conf file.
We get the error "Unresolved reference: proxy" while running PartyAServer.
Below is my server configuration for the PartyA node:
task runPartyAServer(type: JavaExec) {
    classpath = sourceSets.main.runtimeClasspath
    main = 'com.example.server.Server'
    environment "server.port", "10022"
    environment "config.rpc.host", "localhost"
    environment "config.rpc.port", "10006"
}
I am able to start the node with the following node A configuration, but I am facing the error while running the PartyA server.
node {
    name "O=PartyA,L=London,C=GB"
    advertisedServices = ["com.example"]
    p2pPort 10005
    rpcPort 10006
    cordapps = ["$corda_release_group:corda-finance:$corda_release_version",
                "com.example:java-source:$version",
                "com.example:base:$version"]
}
Below is my node.conf for the PartyA node:
extraAdvertisedServiceIds=[
    "com.example"
]
myLegalName="O=PartyA,L=London,C=GB"
networkMapService {
    address="localhost:10002"
    legalName="O=Controller,L=London,C=GB"
}
p2pAddress="localhost:10005"
rpcAddress="localhost:10006"
rpcUsers=[]
security = {
    authService = {
        dataSource = {
            type = "DB",
            passwordEncryption = "SHIRO_1_CRYPT",
            connection = {
                jdbcUrl = "jdbc:oracle:thin:@172.16.105.21:1521:SFMS"
                username = "abinay"
                password = "abinay"
                driverClassName = "oracle.jdbc.OracleDriver"
            }
        }
        options = {
            cache = {
                expireAfterSecs = 120
                maxEntries = 10000
            }
        }
    }
}
Without a username and password, how will the nodeRPCConnection (proxy) be established with the following code?
@PostConstruct
public void initialiseNodeRPCConnection() {
    NetworkHostAndPort rpcAddress = new NetworkHostAndPort(host, rpcPort);
    CordaRPCClient rpcClient = new CordaRPCClient(rpcAddress);
    rpcConnection = rpcClient.start(username, password);
    proxy = rpcConnection.getProxy();
    staticMap.put("proxy", proxy);
}
This strikes me as more likely to be an Oracle connection issue.
I'd start by writing some Java code just to ensure that you can connect to the Oracle DB, and then focus on getting that to work in the CorDapp environment; for example, something along the lines of the sketch below.
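A standalone connectivity check could look roughly like this (a sketch in Scala; the JDBC calls are identical in Java), reusing the data source settings from the node.conf above:

import java.sql.DriverManager

// Minimal check that the Oracle DB from security.authService.dataSource is reachable
// with the configured credentials, independently of Corda.
object OracleConnectionCheck {
  def main(args: Array[String]): Unit = {
    Class.forName("oracle.jdbc.OracleDriver")
    val connection = DriverManager.getConnection(
      "jdbc:oracle:thin:@172.16.105.21:1521:SFMS", "abinay", "abinay")
    try {
      println("Connected: " + !connection.isClosed)
    } finally {
      connection.close()
    }
  }
}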
There's a good developer example of this (in both Java and Kotlin): https://github.com/corda/samples-java/tree/master/Basic/flow-database-access and https://github.com/corda/samples-kotlin/tree/master/Basic/flow-database-access
Best of luck on this!
I am new to Spark and Scala. I am trying to run an example found on Google, and I am encountering the following exception when running the program.
The exception is:
17/05/25 11:13:42 ERROR ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Error starting Twitter stream - java.lang.IllegalStateException: Authentication credentials are missing.
The code that I am executing is as follows:
PrintTweets.scala
package example

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import org.apache.log4j.Level
import Utilities._

object PrintTweets {
  def main(args: Array[String]) {
    // Configure Twitter credentials using twitter.txt
    setupTwitter()

    val appName = "TwitterData"
    val conf = new SparkConf()
    conf.setAppName(appName).setMaster("local[3]")
    val ssc = new StreamingContext(conf, Seconds(5))
    //val ssc = new StreamingContext("local[*]", "PrintTweets", Seconds(10))
    setupLogging()

    // Create a DStream from Twitter using our streaming context
    val tweets = TwitterUtils.createStream(ssc, None)

    // Now extract the text of each status update into RDD's using map()
    val statuses = tweets.map(status => status.getText())

    statuses.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
Utilities.scala
package example

import org.apache.log4j.Level
import java.util.regex.Pattern
import java.util.regex.Matcher

object Utilities {
  /** Makes sure only ERROR messages get logged to avoid log spam. */
  def setupLogging() = {
    import org.apache.log4j.{Level, Logger}
    val rootLogger = Logger.getRootLogger()
    rootLogger.setLevel(Level.ERROR)
  }

  /** Configures Twitter service credentials using twitter.txt in the main workspace directory */
  def setupTwitter() = {
    import scala.io.Source
    for (line <- Source.fromFile("../twitter.txt").getLines) {
      val fields = line.split(" ")
      if (fields.length == 2) {
        System.setProperty("twitter4j.oauth." + fields(0), fields(1))
      }
    }
  }

  /** Retrieves a regex Pattern for parsing Apache access logs. */
  def apacheLogPattern(): Pattern = {
    val ddd = "\\d{1,3}"
    val ip = s"($ddd\\.$ddd\\.$ddd\\.$ddd)?"
    val client = "(\\S+)"
    val user = "(\\S+)"
    val dateTime = "(\\[.+?\\])"
    val request = "\"(.*?)\""
    val status = "(\\d{3})"
    val bytes = "(\\S+)"
    val referer = "\"(.*?)\""
    val agent = "\"(.*?)\""
    val regex = s"$ip $client $user $dateTime $request $status $bytes $referer $agent"
    Pattern.compile(regex)
  }
}
When I check using print statements, I find that the exception happens at the line
val tweets = TwitterUtils.createStream(ssc, None)
I am providing credentials in the twitter.txt file, which is read properly by the program. When I don't place twitter.txt in the appropriate directory it shows an explicit error, and it shows an explicit unauthorized-access error when I give blank keys for the consumer key, secret, etc. in twitter.txt.
If you need more details about the error or the software versions, let me know.
Thanks,
Madhu.
I could reproduce the issue with your code, and I believe this is your problem: you may not have configured twitter.txt properly. Your twitter.txt file should look like this:
consumerKey your_consumerKey
consumerSecret your_consumerSecret
accessToken your_accessToken
accessTokenSecret your_accessTokenSecret
I hope it helps.
After changing the twitter.txt file syntax to the following, with a single space between key and value, it worked:
consumerKey your_consumerKey
consumerSecret your_consumerSecret
accessToken your_accessToken
accessTokenSecret your_accessTokenSecret
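The single space matters because setupTwitter() in Utilities.scala splits each line with line.split(" ") and only keeps lines that produce exactly two fields. A slightly more tolerant variant (my own sketch, not from the original posts) splits on any run of whitespace instead and could be dropped into object Utilities:

import scala.io.Source

/** Hypothetical, more forgiving version of setupTwitter(): splits on any whitespace run
  * and skips blank lines, so extra spaces or tabs between key and value still work. */
def setupTwitterLenient(path: String = "../twitter.txt"): Unit = {
  for (line <- Source.fromFile(path).getLines if line.trim.nonEmpty) {
    val fields = line.trim.split("\\s+", 2)
    if (fields.length == 2) {
      System.setProperty("twitter4j.oauth." + fields(0), fields(1).trim)
    }
  }
}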