IllegalStateException when trying to run spark streaming with twitter - spark-streaming

I am new to Spark and Scala. I am trying to run an example I found online, and I am encountering the following exception when running this program.
Exception is:
17/05/25 11:13:42 ERROR ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Error starting Twitter stream - java.lang.IllegalStateException: Authentication credentials are missing.
The code I am executing is as follows:
PrintTweets.scala
package example

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import org.apache.log4j.Level
import Utilities._

object PrintTweets {
  def main(args: Array[String]) {
    // Configure Twitter credentials using twitter.txt
    setupTwitter()

    val appName = "TwitterData"
    val conf = new SparkConf()
    conf.setAppName(appName).setMaster("local[3]")
    val ssc = new StreamingContext(conf, Seconds(5))
    //val ssc = new StreamingContext("local[*]", "PrintTweets", Seconds(10))

    setupLogging()

    // Create a DStream from Twitter using our streaming context
    val tweets = TwitterUtils.createStream(ssc, None)

    // Now extract the text of each status update into RDD's using map()
    val statuses = tweets.map(status => status.getText())
    statuses.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
Utilities.scala
package example

import org.apache.log4j.Level
import java.util.regex.Pattern
import java.util.regex.Matcher

object Utilities {
  /** Makes sure only ERROR messages get logged to avoid log spam. */
  def setupLogging() = {
    import org.apache.log4j.{Level, Logger}
    val rootLogger = Logger.getRootLogger()
    rootLogger.setLevel(Level.ERROR)
  }

  /** Configures Twitter service credentials using twitter.txt in the main workspace directory. */
  def setupTwitter() = {
    import scala.io.Source
    for (line <- Source.fromFile("../twitter.txt").getLines) {
      val fields = line.split(" ")
      if (fields.length == 2) {
        System.setProperty("twitter4j.oauth." + fields(0), fields(1))
      }
    }
  }

  /** Retrieves a regex Pattern for parsing Apache access logs. */
  def apacheLogPattern(): Pattern = {
    val ddd = "\\d{1,3}"
    val ip = s"($ddd\\.$ddd\\.$ddd\\.$ddd)?"
    val client = "(\\S+)"
    val user = "(\\S+)"
    val dateTime = "(\\[.+?\\])"
    val request = "\"(.*?)\""
    val status = "(\\d{3})"
    val bytes = "(\\S+)"
    val referer = "\"(.*?)\""
    val agent = "\"(.*?)\""
    val regex = s"$ip $client $user $dateTime $request $status $bytes $referer $agent"
    Pattern.compile(regex)
  }
}
When I check using print statements, I find the exception is happening at this line:
val tweets = TwitterUtils.createStream(ssc, None)
I am providing the credentials in the twitter.txt file, which is read properly by the program. When I don't place twitter.txt in the appropriate directory it shows an explicit error, and it shows an explicit "unauthorized access" error when I leave the consumer key and secret blank in twitter.txt.
If you need more details about the error or the software versions, let me know.
Thanks,
Madhu.

I could reproduce the issue with your code, and I believe this is your problem: you might not have configured twitter.txt properly. Your twitter.txt file should look like this:
consumerKey your_consumerKey
consumerSecret your_consumerSecret
accessToken your_accessToken
accessTokenSecret your_accessTokenSecret
I hope it helps.
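For clarity, with that file layout setupTwitter() calls System.setProperty once per line, and twitter4j reads those twitter4j.oauth.* properties when TwitterUtils.createStream starts the receiver. The net effect of a correctly parsed twitter.txt is equivalent to this sketch (placeholder values):

// Equivalent effect of a correctly parsed twitter.txt (placeholder values).
// twitter4j picks these system properties up when the receiver starts, so if
// any of them is missing you get "Authentication credentials are missing".
System.setProperty("twitter4j.oauth.consumerKey", "your_consumerKey")
System.setProperty("twitter4j.oauth.consumerSecret", "your_consumerSecret")
System.setProperty("twitter4j.oauth.accessToken", "your_accessToken")
System.setProperty("twitter4j.oauth.accessTokenSecret", "your_accessTokenSecret")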

After changing the twitter.txt file syntax to the following, with a single space between key and value, it worked (a more tolerant parsing sketch follows below):
consumerKey your_consumerKey
consumerSecret your_consumerSecret
accessToken your_accessToken
accessTokenSecret your_accessTokenSecret
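A note on why the spacing mattered: the split(" ") in setupTwitter() only yields two fields when there is exactly one space between key and value. Splitting on any run of whitespace makes the parsing tolerant of extra spaces or tabs; a minimal sketch, assuming the same twitter.txt layout:

// Sketch of a more forgiving setupTwitter(): line.trim.split("\\s+") tolerates
// multiple spaces or tabs between the key and the value.
def setupTwitter(): Unit = {
  import scala.io.Source
  for (line <- Source.fromFile("../twitter.txt").getLines) {
    val fields = line.trim.split("\\s+")
    if (fields.length == 2) {
      System.setProperty("twitter4j.oauth." + fields(0), fields(1))
    }
  }
}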

Related

How to test FastAPI with Oracle, SQLAlchemy?

I have a FastAPI application where I use SQLAlchemy and stored procedures.
Now I want to test my endpoints as shown in the documentation:
import pytest
from fastapi.testclient import TestClient
from fastapi import FastAPI
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from ..dependencies import get_db
import cx_Oracle

host = 'xxxx'
port = 1111
sid = 'FUU'
user = 'bar'
password = 'fuubar'
sid = cx_Oracle.makedsn(host, port, sid=sid)
database_url = 'oracle://{user}:{password}@{sid}'.format(
    user=user,
    password=password,
    sid=sid,
)

engine = create_engine(database_url, connect_args={"check_same_thread": False})
TestingSessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

app = FastAPI()
init_router(app)


@pytest.fixture()
def session():
    db = TestingSessionLocal()
    try:
        yield db
    finally:
        db.close()


@pytest.fixture()
def client(session):
    # Dependency override
    def override_get_db():
        try:
            yield session
        finally:
            session.close()

    app.dependency_overrides[get_db] = override_get_db
    yield TestClient(app)


def test_index(client):
    res = client.get("/")
    assert res.text
    assert res.status_code == 200


def test_search_course_by_verid_exist():
    response = client.get(
        'search', params={"search_query": "1111", "semester": "S2022"})
    # course exist
    assert response.status_code == 200
I've tried both creating a new app and importing the existing app from main.py via
from ..main import app
The method is in my courses router.
@router.get("/search", status_code=status.HTTP_200_OK)
async def search_course(
    response: Response,
    search_query: Union[str, None] = None,
    semester: Union[int, None] = None,
    db: Session = Depends(get_db),
):
    .....
    return response
The index test already fails, returning assert 400 == 200. For the second test (test_search_course_by_verid_exist) I get
AttributeError: 'function' object has no attribute 'get'
My main.py has some middleware settings like:
app.add_middleware(
    SessionMiddleware, secret_key="fastAPI"
)  # , max_age=300 this should match Login action timeout in token-settings of a realm
app.add_middleware(
    TrustedHostMiddleware,
    allowed_hosts=settings.ALLOWED_HOSTS,
)

# MIDDLEWARE
@app.middleware("http")
async def check_route(request: Request, call_next):
    ....
I'm clueless about what I'm missing, or whether things are just different with cx_Oracle.
I've tried changing the test client from the FastAPI one to the Starlette one. I've tried not overriding the db and just importing the original db settings (which are basically the same). But nothing works.
I'm not sure if this is the proper way to test a FastAPI application: https://fastapi.tiangolo.com/tutorial/testing/
Why didn't you declare client as:
client = TestClient(app)
?
I don't know if this was the root problem, but naming my fixtures solved it and the db connection is working.
conftest.py
@pytest.fixture(name="db_session", scope="session")
def db_session(app: FastAPI) -> Generator[TestingSessionLocal, Any, None]:
I also created the app fixture:
@pytest.fixture(name="app", scope="session")
def app() -> Generator[FastAPI, Any, None]:
    """
    Create a fresh database on each test case.
    """
    _app = start_application()
    yield _app

Flink is not adding any data to Elasticsearch but no errors

Folks, I'm new to this whole data streaming process, but I was able to build and submit a Flink job that reads some CSV data from Kafka, aggregates it, and then puts it in Elasticsearch.
I was able to do the first two parts and print my aggregation to STDOUT. But when I added the code to write it to Elasticsearch, it seems nothing is happening there (no data is being added). I looked at the Flink job manager log and it looks fine (no errors) and says:
2020-03-03 16:18:03,877 INFO org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7ApiCallBridge - Created Elasticsearch RestHighLevelClient connected to [http://elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local:9200]
Here is my code at this point:
/*
* This Scala source file was generated by the Gradle 'init' task.
*/
package flinkNamePull
import java.time.LocalDateTime
import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer010, FlinkKafkaProducer010}
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.{DataTypes, Table}
import org.apache.flink.table.api.scala.StreamTableEnvironment
import org.apache.flink.table.descriptors.{Elasticsearch, Json, Schema}
object Demo {
  /**
   * MapFunction to generate Transfers POJOs from parsed CSV data.
   */
  class TransfersMapper extends RichMapFunction[String, Transfers] {
    private var formatter = null

    @throws[Exception]
    override def open(parameters: Configuration): Unit = {
      super.open(parameters)
      //formatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
    }

    @throws[Exception]
    override def map(csvLine: String): Transfers = {
      //var splitCsv = csvLine.stripLineEnd.split("\n")(1).split(",")
      var splitCsv = csvLine.stripLineEnd.split(",")
      val arrLength = splitCsv.length
      val i = 0
      if (arrLength != 13) {
        for (i <- arrLength + 1 to 13) {
          if (i == 13) {
            splitCsv = splitCsv :+ "0.0"
          } else {
            splitCsv = splitCsv :+ ""
          }
        }
      }
      var trans = new Transfers()
      trans.rowId = splitCsv(0)
      trans.subjectId = splitCsv(1)
      trans.hadmId = splitCsv(2)
      trans.icuStayId = splitCsv(3)
      trans.dbSource = splitCsv(4)
      trans.eventType = splitCsv(5)
      trans.prev_careUnit = splitCsv(6)
      trans.curr_careUnit = splitCsv(7)
      trans.prev_wardId = splitCsv(8)
      trans.curr_wardId = splitCsv(9)
      trans.inTime = splitCsv(10)
      trans.outTime = splitCsv(11)
      trans.los = splitCsv(12).toDouble
      return trans
    }
  }
  def main(args: Array[String]) {
    // Create streaming execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    // Set properties per KafkaConsumer API
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "kafka.kafka:9092")
    properties.setProperty("group.id", "test")

    // Add Kafka source to environment
    val myKConsumer = new FlinkKafkaConsumer010[String]("raw.data3", new SimpleStringSchema(), properties)
    // Read from beginning of topic
    myKConsumer.setStartFromEarliest()

    val streamSource = env
      .addSource(myKConsumer)

    // Transform CSV (with a header row per Kafka event) into a Transfers object
    val streamTransfers = streamSource.map(new TransfersMapper())

    // create a TableEnvironment
    val tEnv = StreamTableEnvironment.create(env)

    println("***** NEW EXECUTION STARTED AT " + LocalDateTime.now() + " *****")

    // register a Table
    val tblTransfers: Table = tEnv.fromDataStream(streamTransfers)
    tEnv.createTemporaryView("transfers", tblTransfers)

    tEnv.connect(
      new Elasticsearch()
        .version("7")
        .host("elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local", 9200, "http") // required: one or more Elasticsearch hosts to connect to
        .index("transfers-sum")
        .documentType("_doc")
        .keyNullLiteral("n/a")
    )
      .withFormat(new Json().jsonSchema("{type: 'object', properties: {curr_careUnit: {type: 'string'}, sum: {type: 'number'}}}"))
      .withSchema(new Schema()
        .field("curr_careUnit", DataTypes.STRING())
        .field("sum", DataTypes.DOUBLE())
      )
      .inUpsertMode()
      .createTemporaryTable("transfersSum")

    val result = tEnv.sqlQuery(
      """
        |SELECT curr_careUnit, sum(los)
        |FROM transfers
        |GROUP BY curr_careUnit
        |""".stripMargin)
    result.insertInto("transfersSum")

    // Elasticsearch elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local:9200
    env.execute("Flink Streaming Demo Dump to Elasticsearch")
  }
}
I'm not sure how I can debug this beast... Wondering if somebody can help me figure out why the Flink job is not adding data to Elasticsearch :(
From my Flink cluster, I'm able to query Elasticsearch just fine (manually) and add records to my index:
curl -XPOST "http://elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local:9200/transfers-sum/_doc" -H 'Content-Type: application/json' -d'{"curr_careUnit":"TEST123","sum":"123"}'
A kind soul on the Flink mailing list pointed out the fact that it could be Elasticsearch buffering my records... Well, it was. ;)
I have added the following options to the Elasticsearch connector (see the sketch below for where they go):
.bulkFlushMaxActions(2)
.bulkFlushInterval(1000L)
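For reference, here is roughly where those options fit in the Table API Elasticsearch descriptor from the question; this is only a sketch, and values this small force near-immediate flushes, which is useful for debugging rather than production throughput:

// Sketch: the same connector definition as in the question, plus bulk-flush
// options so buffered actions are written out quickly.
tEnv.connect(
  new Elasticsearch()
    .version("7")
    .host("elasticsearch-elasticsearch-coordinating-only.default.svc.cluster.local", 9200, "http")
    .index("transfers-sum")
    .documentType("_doc")
    .keyNullLiteral("n/a")
    .bulkFlushMaxActions(2)   // flush after every two buffered actions
    .bulkFlushInterval(1000L) // and at least once per second
)
  .withFormat(new Json().jsonSchema("{type: 'object', properties: {curr_careUnit: {type: 'string'}, sum: {type: 'number'}}}"))
  .withSchema(new Schema()
    .field("curr_careUnit", DataTypes.STRING())
    .field("sum", DataTypes.DOUBLE())
  )
  .inUpsertMode()
  .createTemporaryTable("transfersSum")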
Flink Elasticsearch Connector 7 using Scala
Please find a working and detailed answer which I have provided here.

Spring boot stackdriver logging is textPayload and not jsonPayload

I have a log filter that logs out essential request information for debugging and log analytics, but the resulting text payload is really hard to read.
I don't want to have to copy and paste this text payload into a text editor every single time. Is there a way to make Stackdriver print this as collapsible JSON instead?
More info:
- GKE pod
@Component
class LogFilter : WebFilter {
    private val logger = LoggerFactory.getLogger(LogFilter::class.java)

    override fun filter(exchange: ServerWebExchange, chain: WebFilterChain): Mono<Void> {
        return chain
            .filter(exchange)
            .doAfterTerminate {
                val request = exchange.request
                val path = request.uri.path
                val routesToExclude = listOf("actuator")
                var isExcludedRoute = false
                for (r in routesToExclude) { if (path.contains(r)) { isExcludedRoute = true; break; } }
                if (!isExcludedRoute) {
                    val startTime = System.currentTimeMillis()
                    val statusCode = exchange.response.statusCode?.value()
                    val requestTime = System.currentTimeMillis() - startTime
                    val msg = "Served $path as $statusCode in $requestTime msec"
                    val requestPrintMap = mutableMapOf<Any, Any>()
                    requestPrintMap["method"] = if (request.method != null) {
                        request.method.toString()
                    } else "UNKNOWN"
                    requestPrintMap["path"] = path.toString()
                    requestPrintMap["query_params"] = request.queryParams
                    requestPrintMap["headers"] = request.headers
                    requestPrintMap["status_code"] = statusCode.toString()
                    requestPrintMap["request_time"] = requestTime
                    requestPrintMap["msg"] = msg
                    logger.info(JSONObject(requestPrintMap).toString())
                }
            }
    }
}
What you will need to do is customize Fluentd in GKE. Pretty much it means creating a Fluentd DaemonSet for logging instead of the default logging method.
Once that is done, you can set up structured logging to send jsonPayload logs to Stackdriver Logging.
The default Stackdriver logging agent configuration for Kubernetes will detect single-line JSON and convert it to jsonPayload. You can configure Spring to log as single-line JSON (e.g., via JsonLayout [1]) and let the logging agent pick up the JSON object (see https://cloud.google.com/logging/docs/agent/configuration#process-payload).
[1] Some of the JSON field names are different (e.g., JsonLayout uses "level" for the log level, while the Stackdriver logging agent recognizes "severity"), so you may have to override addCustomDataToJsonMap to fully control the resulting log entries.
See also GKE & Stackdriver: Java logback logging format?

Gmail API is overriding the custom Message-ID header while sending emails

We are manually setting a custom Message-ID header while sending emails using a Java MimeMessage. The Message-ID format follows the RFC822 standard. However, when sending the mail via the Gmail API, the Message-ID header gets overwritten with a new one from Gmail.
If we instead use JavaMail and send the email via SMTP, the custom Message-ID is retained by Gmail.
Is there a way to have a custom Message-ID while sending email via the Gmail API?
I have checked the following question, but I am not sure if it's still the case: (RFC822 Message-Id in new Gmail API)
[UPDATE]
EmailMimeMessage.scala
package utils.email

import javax.mail._
import javax.mail.internet._
import play.api.Logger

class EmailMimeMessage(session: Session, messageId: String) extends MimeMessage(session) {
  @throws(classOf[MessagingException])
  override def updateMessageID(): Unit = {
    Logger.info(s"[EmailMimeMessage] before sending add message id: $messageId")
    setHeader("Message-ID", messageId)
  }
}
GmailApiService.scala
package utils.email
import java.io.ByteArrayOutputStream
import java.util.Properties
import javax.mail.Session
import javax.mail.internet.{InternetAddress, MimeMessage}
import com.google.api.client.auth.oauth2.{BearerToken, Credential}
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport
import com.google.api.client.http.HttpTransport
import com.google.api.client.json.JsonFactory
import com.google.api.client.json.jackson2.JacksonFactory
import com.google.api.client.repackaged.org.apache.commons.codec.binary.Base64
import com.google.api.services.gmail.Gmail
import scala.util.Try
case class EmailToBeSent(
  to_email: String,
  from_email: String,
  from_name: String,
  reply_to_email: String,
  subject: String,
  textBody: String,
  htmlBody: String,
  message_id: String
)

object GmailApiService {
  private val APPLICATION_NAME: String = "Gmail API Java Quickstart"
  private val JSON_FACTORY: JsonFactory = JacksonFactory.getDefaultInstance
  private val HTTP_TRANSPORT: HttpTransport = GoogleNetHttpTransport.newTrustedTransport()

  def createEmail(emailToBeSent: EmailToBeSent): Try[MimeMessage] = Try {
    val props = new Properties()
    val session = Session.getDefaultInstance(props, null)
    val email = new EmailMimeMessage(session, emailToBeSent.message_id)
    email.setFrom(new InternetAddress(emailToBeSent.from_email))
    email.addRecipient(javax.mail.Message.RecipientType.TO, new InternetAddress(emailToBeSent.to_email))
    email.setSubject(emailToBeSent.subject)
    email.setText(emailToBeSent.textBody)
    email
  }

  def createMessageWithEmail(email: MimeMessage) = Try {
    val baos = new ByteArrayOutputStream
    email.writeTo(baos)
    val encodedEmail = Base64.encodeBase64URLSafeString(baos.toByteArray)
    val message = new com.google.api.services.gmail.model.Message()
    message.setRaw(encodedEmail)
    message
  }

  def sendGmailService(emailToBeSent: EmailToBeSent, accessToken: String) = Try {
    val credential = new Credential(BearerToken.authorizationHeaderAccessMethod)
      .setAccessToken(accessToken)
    val service = new Gmail.Builder(HTTP_TRANSPORT, JSON_FACTORY, credential).setApplicationName(APPLICATION_NAME).build
    val user = "me"
    val message = createEmail(emailToBeSent) flatMap { email => createMessageWithEmail(email) }
    val sentMessage = service.users().messages().send(user, message.get).execute()
    sentMessage
  }
}
On calling GmailApiService.sendGmailService as follows (with Message-ID "<1495728783999.123.456.local@examplegmail.com>"), in the sent email the Message-ID is overwritten by Gmail with something like "YYfdasCAN=-fdas432HFD43FD_THD@mail.gmail.com":
val emailToBeSent = EmailToBeSent(
  to_email = "mary_to@gmail.com",
  from_email = "john_from@examplegmail.com",
  from_name = "John Doe",
  reply_to_email = "john_from@gmail.com",
  subject = "How are you ?",
  textBody = "Hey, how are you ?",
  htmlBody = "<strong>Hey, how are you ?</strong>",
  message_id = "<1495728783999.123.456.local@examplegmail.com>",
  in_reply_to_id = None,
  sender_email_settings_id = 0
)
val sentMsg = GmailApiService.sendGmailService(emailToBeSent, GOOGLE_OAUTH_ACCESS_TOKEN).get
My answer from 2014 is still correct: RFC822 Message-Id in new Gmail API
Gmail always sets the RFC822 Message-Id header on outgoing emails.
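If you need the Message-ID that Gmail actually assigned (for threading or correlating replies), one option is to fetch the sent message back in metadata format and read the header. A sketch against the same Gmail Java client, using a hypothetical helper name and not tested against a live account:

import scala.collection.JavaConverters._
import com.google.api.services.gmail.Gmail

// Hypothetical helper (not part of the question's code): fetch the sent message
// back in "metadata" format and read the Message-ID header Gmail stored on it.
def fetchAssignedMessageId(service: Gmail, sentGmailId: String): Option[String] = {
  val msg = service.users().messages()
    .get("me", sentGmailId)
    .setFormat("metadata")
    .setMetadataHeaders(java.util.Arrays.asList("Message-ID"))
    .execute()
  Option(msg.getPayload)
    .flatMap(p => Option(p.getHeaders))
    .flatMap(_.asScala.find(_.getName.equalsIgnoreCase("Message-ID")))
    .map(_.getValue)
}

// e.g. fetchAssignedMessageId(service, sentMessage.getId)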

Headers disappear in integration test on REST service

I have an integration test in my Grails 3.2.2 application that is supposed to check that CORS support is operational. When I start the application and use something like Paw or Postman to do a request, the breakpoint I have set in CorsFilter shows that my headers are set properly. But when I do the same request from an integration test using RestBuilder with the following code:
void "Test request http OPTIONS"() {
    given: "JSON content request"

    when: "OPTIONS are requested"
    def rest = new RestBuilder()
    def optionsUrl = url(path)
    def resp = rest.options(optionsUrl) {
        header 'Origin', 'http://localhost:4200'
        header 'Access-Control-Request-Method', 'GET'
    }

    then: "they are returned"
    resp.status == HttpStatus.SC_OK
    !resp.json
}
The breakpoint in CorsFilter shows that both headers are null. And the weird thing is that when I put a breakpoint in RestTemplate, right before the request is executed, the headers are there.
I don't get how those headers can disappear. Any idea?
I was working on this problem recently, and while I don't know where RestBuilder is suppressing the Origin header, I did come up with a workaround for testing that Grails' CORS support is operating as configured: using HTTPBuilder instead of RestBuilder to invoke the service.
After adding org.codehaus.groovy.modules.http-builder:http-builder:0.7.1 as a testCompile dependency in build.gradle, and with grails.cors.allowedOrigins set to http://localhost, the following tests both worked as desired:
import geb.spock.GebSpec
import grails.test.mixin.integration.Integration
import groovyx.net.http.HTTPBuilder
import groovyx.net.http.HttpResponseException
import groovyx.net.http.Method

@Integration
class ExampleSpec extends GebSpec {

    def 'verify that explicit, allowed origin works'() {
        when:
        def http = new HTTPBuilder("http://localhost:${serverPort}/todo/1")
        def result = http.request(Method.GET, "application/json") { req ->
            headers.'Origin' = "http://localhost"
        }

        then:
        result.id == 1
        result.name == "task 1.1"
    }

    def 'verify that explicit, disallowed origin is disallowed'() {
        when:
        def http = new HTTPBuilder("http://localhost:${serverPort}/todo/1")
        http.request(Method.GET, "application/json") { req ->
            headers.'Origin' = "http://foobar.com"
        }

        then:
        HttpResponseException e = thrown()
        e.statusCode == 403
    }
}
I had the same problem. After some research I found http://hc.apache.org/; it supports sending the 'Origin' header and OPTIONS requests.
import grails.test.mixin.integration.Integration
import grails.transaction.Rollback
import groovy.util.logging.Slf4j
import org.apache.http.client.HttpClient
import org.apache.http.client.methods.HttpOptions
import org.apache.http.impl.client.MinimalHttpClient
import org.apache.http.impl.conn.BasicHttpClientConnectionManager
import spock.lang.Specification

@Integration
@Rollback
@Slf4j
class CorsIntegrationSpec extends Specification {

    def 'call with origin'() {
        when:
        def response = call(["Origin": "test", "Content-Type": "application/json"])

        then:
        response != null
        response.getStatusLine().getStatusCode() == 200
        response.containsHeader("Access-Control-Allow-Origin")
        response.containsHeader("Access-Control-Allow-Credentials")
        response.containsHeader("Access-Control-Allow-Headers")
        response.containsHeader("Access-Control-Allow-Methods")
        response.containsHeader("Access-Control-Max-Age")
    }

    private call(Map<String, String> headers) {
        HttpOptions httpOptions = new HttpOptions("http://localhost:${serverPort}/authz/token")
        headers.each { k, v ->
            httpOptions.setHeader(k, v)
        }
        BasicHttpClientConnectionManager manager = new BasicHttpClientConnectionManager()
        HttpClient client = new MinimalHttpClient(manager)
        return client.execute(httpOptions)
    }
}
