Airflow SimpleHttpOperator for HTTPS - https

I'm trying to use SimpleHttpOperator for consuming a RESTful API. But, As the name suggests, it only supporting HTTP protocol where I need to consume a HTTPS URI. so, now, I have to use either "requests" object from Python or handle the invocation from within the application code. But, It may not be a standard way. so, I'm looking for any other options available to consume HTTPS URI from within Airflow. Thanks.

I dove into this and am pretty sure that this behavior is a bug in airflow. I have created a ticket for it here:
https://issues.apache.org/jira/browse/AIRFLOW-2910
For now, the best you can do is override SimpleHttpOperator as well as HttpHook in order to change the way that HttpHook.get_conn works (to accept https). I may end up doing this, and if I do I'll post some code.
Update:
Operator override:
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.exceptions import AirflowException
from operators.https_support.https_hook import HttpsHook
class HttpsOperator(SimpleHttpOperator):
def execute(self, context):
http = HttpsHook(self.method, http_conn_id=self.http_conn_id)
self.log.info("Calling HTTP method")
response = http.run(self.endpoint,
self.data,
self.headers,
self.extra_options)
if self.response_check:
if not self.response_check(response):
raise AirflowException("Response check returned False.")
if self.xcom_push_flag:
return response.text
Hook override
from airflow.hooks.http_hook import HttpHook
import requests
class HttpsHook(HttpHook):
def get_conn(self, headers):
"""
Returns http session for use with requests. Supports https.
"""
conn = self.get_connection(self.http_conn_id)
session = requests.Session()
if "://" in conn.host:
self.base_url = conn.host
elif conn.schema:
self.base_url = conn.schema + "://" + conn.host
elif conn.conn_type: # https support
self.base_url = conn.conn_type + "://" + conn.host
else:
# schema defaults to HTTP
self.base_url = "http://" + conn.host
if conn.port:
self.base_url = self.base_url + ":" + str(conn.port) + "/"
if conn.login:
session.auth = (conn.login, conn.password)
if headers:
session.headers.update(headers)
return session
Usage:
Drop-in replacement for SimpleHttpOperator.

This is a couple of months old now, but for what it is worth I did not have any issue with making an HTTPS call on Airflow 1.10.2.
In my initial test I was making a request for templates from sendgrid, so the connection was set up like this:
Conn Id : sendgrid_templates_test
Conn Type : HTTP
Host : https://api.sendgrid.com/
Extra : { "authorization": "Bearer [my token]"}
and then in the dag code:
get_templates = SimpleHttpOperator(
task_id='get_templates',
method='GET',
endpoint='/v3/templates',
http_conn_id = 'sendgrid_templates_test',
trigger_rule="all_done",
xcom_push=True
dag=dag,
)
and that worked. Also notice that my request happens after a Branch Operator, so I needed to set the trigger rule appropriately (to "all_done" to make sure it fires even when one of the branches is skipped), which has nothing to do with the question, but I just wanted to point it out.
Now to be clear, I did get an Insecure Request warning as I did not have certificate verification enabled. But you can see the resulting logs below
[2019-02-21 16:15:01,333] {http_operator.py:89} INFO - Calling HTTP method
[2019-02-21 16:15:01,336] {logging_mixin.py:95} INFO - [2019-02-21 16:15:01,335] {base_hook.py:83} INFO - Using connection to: id: sendgrid_templates_test. Host: https://api.sendgrid.com/, Port: None, Schema: None, Login: None, Password: XXXXXXXX, extra: {'authorization': 'Bearer [my token]'}
[2019-02-21 16:15:01,338] {logging_mixin.py:95} INFO - [2019-02-21 16:15:01,337] {http_hook.py:126} INFO - Sending 'GET' to url: https://api.sendgrid.com//v3/templates
[2019-02-21 16:15:01,956] {logging_mixin.py:95} WARNING - /home/csconnell/.pyenv/versions/airflow/lib/python3.6/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
[2019-02-21 16:15:05,242] {logging_mixin.py:95} INFO - [2019-02-21 16:15:05,241] {jobs.py:2527} INFO - Task exited with return code 0

I was having the same problem with HTTP/HTTPS when trying to set the connections using environment variables (although it works when i set the connection on the UI).
I've checked the issue #melchoir55 opened (https://issues.apache.org/jira/browse/AIRFLOW-2910) and you don't need to make a custom operator for that, the problem is not that HttpHook or HttpOperator can't use HTTPS, the problem is the way get_hook parse the connection string when dealing with HTTP, it actually understand that the first part (http:// or https://) is the connection type.
In summary, you don't need a custom operator, you can just set the connection in your env as the following:
AIRFLOW_CONN_HTTP_EXAMPLE=http://https%3a%2f%2fexample.com/
Instead of:
AIRFLOW_CONN_HTTP_EXAMPLE=https://example.com/
Or set the connection on the UI.
It is not a intuitive way to set up a connection but I think they are working on a better way to parse connections for Ariflow 2.0.

In Airflow 2.x you can use https URLs by passing https for schema value while setting up your connection and can still use SimpleHttpOperator like shown below.
my_api = SimpleHttpOperator(
task_id="my_api",
http_conn_id="YOUR_CONN_ID",
method="POST",
endpoint="/base-path/end-point",
data=get_data,
headers={"Content-Type": "application/json"},
)

Instead of implementing HttpsHook, we could just put one line of codes into HttpsOperator(SimpleHttpOperator)#above as follows
...
self.extra_options['verify'] = True
response = http.run(self.endpoint,
self.data,
self.headers,
self.extra_options)
...

in Airflow 2, the problem is been resolved.
just check out that :
host name in Connection UI Form, don't end up with /
'endpoint' parameter of SimpleHttpOperator starts with /

I am using Airflow 2.1.0,and the following setting works for https API
In connection UI, setting host name as usual, no need to specify 'https' in schema field, don't forget to set login account and password if your API server request ones.
Connection UI Setting
When constructing your task, add extra_options parameter in SimpleHttpOperator, and put your CA_bundle certification file path as the value for key verify, if you don't have a certification file, then use false to skip verification.
Task definition
Reference: here

Related

Fetch emails through IMAP with proxy of form user:password:host:port

I have code to login to my email account to fetch recent emails:
def fetchRecentEmail(emailAddr, emailPassword, timeout=120):
host = fetch_imap_server(emailAddr) # e.g. 'outlook.office365.com'
with IMAP4_SSL(host) as session:
status, _ = session.login(emailAddr, emailPassword)
if status == 'OK':
# fetch most recent message
status, messageData = session.select("Inbox")
:
I'm trying to tweak it to go through a proxy.
ref: How can I fetch emails via POP or IMAP through a proxy?
ref: https://gist.github.com/sstevan/efccf3d5d3e73039c21aa848353ff52f
In each of the above resources, the proxy is of clean form IP:PORT.
However my proxy is of the form USER:PASS:HOST:PORT.
The proxy works:
USER = 'Pp7fwti5n-res-any-sid-' + random8Digits()
PASS = 'abEDxts7v'
HOST = 'gw.proxy.rainproxy.io'
PORT = 5959
proxy = f'{USER}:{PASS}#{HOST}:{PORT}'
proxies = {
'http': 'http://' + proxy,
'https': 'http://' + proxy
}
response = requests.get(
'https://ip.nf/me.json',
proxies=proxies, timeout=15
)
The following code looks like it should work, but errors:
HOST = 'outlook.office365.com'
IMAP_PORT = 963
PROXY_TYPE = 'http' # rainproxies are HTTP
mailbox = SocksIMAP4SSL(
host=HOST,
port=IMAP_PORT,
proxy_type=PROXY_TYPE,
proxy_addr=URL,
proxy_port=PORT,
username=USER,
password=PASS
)
emailAddress, emailPassword = EMAIL.split(',')
mailbox.login(emailAddress, emailPassword)
typ, data = mailbox.list()
print(typ)
print(data)
I needed to add a timeout arg/param in 2 places to get the code to run:
def _create_socket(self, timeout=None):
sock = SocksIMAP4._create_socket(self, timeout)
server_hostname = self.host if ssl.HAS_SNI else None
return self.ssl_context.wrap_socket(
sock, server_hostname=server_hostname
)
def open(self, host='', port=IMAP4_PORT, timeout=None):
SocksIMAP4.open(self, host, port, timeout)
Rather confusing that nobody else seems to have flagged that in the gist.
But it still won't work.
If I use any number other than 443 for IMAP_PORT I get this error:
GeneralProxyError: Socket error: 403: Forbidden
[*] Note: The HTTP proxy server may not be supported by PySocks (must be a CONNECT tunnel proxy)
And if I use 443, while I now get no error, mailbox = SocksIMAP4SSL( never completes.
So I am still far from a working solution.
I am hoping to run this code simultaneously on 2 CPU cores, so I don't understand the implications of using port 443. Is that going to mean that no other process on my system can use that port? And if this code is using this port simultaneously in two processes, does this mean that there will be a conflict?
Maybe you can try monkeypatching socket.socket with PySocket.
import socket
import socks
socks.set_default_proxy(socks.SOCKS5, HOST, PORT, True, USER, PASS)
socket.socket = socks.socksocket
Then check if your IMAP traffic is going through a given proxy.

MIP SDK fails to protect files

I'm using MIP file sample command line interface to apply labelling.
When trying to apply a label that has protection set, i got "Label requires ad-hoc protection, but protection has not yet been set" error.
Therefore, I tried protecting the file using "--protect" option and got the following error message:
"Something bad happened: The service didn't accept the auth token. Challenge:['Bearer resource="https://aadrm.com", realm="", authorization="https://login.windows.net/common/oauth2/authorize"'], CorrelationId=ce732e4a-249a-47ec-a7c2-04f4d68357da, CorrelationId.Description=ProtectionEngine, CorrelationId=6ff992dc-91b3-4610-a24d-d57e13902114, CorrelationId.Description=FileHandler"
This is my auth.py file:
def main(argv):
client_id = str(argv[0])
tenant_id = str(argv[1])
secret = str(argv[2])
authority = "https://login.microsoftonline.com/{}".format(tenant_id)
app = msal.ConfidentialClientApplication(client_id, authority=authority, client_credential=secret)
result = None
scope = ["https://psor.o365syncservice.com/.default"]
result = app.acquire_token_silent(scope, account=None)
if not result:
logging.info("No suitable token exists in cache. Let's get a new one from AAD.")
result = app.acquire_token_for_client(scopes=scope)
if "access_token" in result:
sys.stdout.write(result['access_token'])
else:
print(result.get("error"))
print(result.get("error_description"))
print(result.get("correlation_id")) # You may need this when reporting a bug
if __name__ == '__main__':
main(sys.argv[1:])
I tried to change the scope to ["https://aadrm.com/.default"] and then I was able to protect the file, but when I try getting the file status or try applying label on it I get the same error message with bad auth token.
These are the permissions as configured in azure portal:
Thank you
I think that scope you have is incorrect: https://psor.o365syncservice.com/.default
It should be https://syncservice.o365syncservice.com/.default.
A good way to handle this is to just append .default to whatever resource the AcquireToken() call gets in the resource param. Something like this.

HAproxy+Lua: Return requests if validation fails from Lua script

We are trying to build an incoming request validation platform using HAProxy+Lua.
Our use-case is to create a LUA scripts that will essentially make a socket call to a Validation API, and based on the response
from Validation API we want to redirect the request to a backend API, and if the validation fails we would want to return
the request right from the LUA script. For example, for 200 response we would want to redirect the request to backend api, and for 404 we would want
to return the request. From the documentation, I understand that there are various default functions available
with Lua-Haproxy integration.
core.register_action() --> I'm using this. Take TXN as input
core.register_converters() --> Essentially used for string manipulations.
core.register_fetches() --> Takes TXN as input and returns string; Mainly used for representing dynamic backend profiles in haproxy config
core.register_init() --> Used for initialization
core.register_service() --> You have to return the response mandatorily while using this function, which doesn't satisfy our requirements
core.register_task() --> For using normal functions. No mandatory input class. TXN is required to fetch header details from request
I have tried all of the functions from above list, I understand that core.register_service is basically to return a response from the
Lua script. However, what is problematic is, we must send the response from the LUA script and it will not redirect the request to BACKEND.
Currently, I am using core.register_action to interrupt the requests, but I'm not able to return the request using this function. Here's
what my code looks like:
local http_socket = require("socket.http")
local pretty_print = require("pl.pretty")
function add_http_request_header(txn, name, value)
local headerName = name
local headerValue = value
txn.http:req_add_header(headerName, headerValue)
end
function call_validation_api()
local request, code, header = http_socket.request {
method = "GET", -- Validation API Method
url = "http://www.google.com/" -- Validation API URL
}
-- Using core.log; Print in some cases is a blocking operation http://www.arpalert.org/haproxy-lua.html#h203
core.Info( "Validation API Response Code: " .. code )
pretty_print.dump( header )
return code
end
function failure_response(txn)
local response = "Validation Failed"
core.Info(response)
txn.res:send(response)
-- txn:close()
end
core.register_action("validation_action", { "http-req", "http-res" }, function(txn)
local validation_api_code = call_validation_api()
if validation_api_code == 200 then
core.Info("Validation Successful")
add_http_request_header(txn, "test-header", "abcdefg")
pretty_print.dump( txn.http:req_get_headers() )
else
failure_response(txn) --->>> **HERE I WANT TO RETURN THE RESPONSE**
end
end)
Following is the configuration file entry:
frontend http-in
bind :8000
mode http
http-request lua.validation_action
#Capturing header of the incoming request
capture request header test-header len 64
#use_backend %[lua.fetch_req_params]
default_backend app
backend app
balance roundrobin
server app1 127.0.0.1:9999 check
Any help is much appreciated in achieving this functionality. Also, I understand that SOCKET call from Lua script is a blocking call, which is opposite to HAProxy's default nature of keep-alive connection. Please feel free to suggest any other utility to achieve this functionality, if you have already used it.
Ok I have figured out the answer to this question:
I created 2 backends for success and failure of requests, and based on the response I am returning 2 different strings. In "failure_backend", I have called a different service, which essentially is a core.register_service and can return the response. I'm pasting code for both the configuration file and lua script
HAProxy conf file:
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
# to have these messages end up in /var/log/haproxy.log you will
# need to:
#
# 1) configure syslog to accept network log events. This is done
# by adding the '-r' option to the SYSLOGD_OPTIONS in
# /etc/sysconfig/syslog
#
# 2) configure local2 events to go to the /var/log/haproxy.log
# file. A line like the following can be added to
# /etc/sysconfig/syslog
#
# local2.* /var/log/haproxy.log
#
log 127.0.0.1 local2
maxconn 4000
user haproxy
group haproxy
daemon
#lua file load
lua-load /home/aman/coding/haproxy/http_header.lua
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
retries 3
timeout http-request 90s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend http-in
bind :8000
mode http
use_backend %[lua.validation_fetch]
default_backend failure_backend
#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend success_backend
balance roundrobin
server app1 172.23.12.94:9999 check
backend failure_backend
http-request use-service lua.failure_service
# For displaying HAProxy statistics.
frontend stats
bind :8888
default_backend stats
backend stats
stats enable
stats hide-version
stats realm Haproxy Statistics
stats uri /haproxy/stats
stats auth aman:rjil#123
Lua script:
local http_socket = require("socket.http")
local pretty_print = require("pl.pretty")
function add_http_request_header(txn, name, value)
local headerName = name
local headerValue = value
txn.http:req_add_header(headerName, headerValue)
end
function call_validation_api()
local request, code, header = http_socket.request {
method = "GET", -- Validation API Method
url = "http://www.google.com/" -- Validation API URL
}
-- Using core.log; Print in some cases is a blocking operation http://www.arpalert.org/haproxy-lua.html#h203
core.Info( "Validation API Response Code: " .. code )
pretty_print.dump( header )
return code
end
function failure_response(txn)
local response = "Validation Failed"
core.Info(response)
return "failure_backend"
end
-- Decides back-end based on Success and Failure received from validation API
core.register_fetches("validation_fetch", function(txn)
local validation_api_code = call_validation_api()
if validation_api_code == 200 then
core.Info("Validation Successful")
add_http_request_header(txn, "test_header", "abcdefg")
pretty_print.dump( txn.http:req_get_headers() )
return "success_backend"
else
failure_response(txn)
end
end)
-- Failure service
core.register_service("failure_service", "http", function(applet)
local response = "Validation Failed"
core.Info(response)
applet:set_status(400)
applet:add_header("content-length", string.len(response))
applet:add_header("content-type", "text/plain")
applet:start_response()
applet:send(response)
end)

Simulate session using WithServer

I am trying to port tests from using FakeRequest to using WithServer.
In order to simulate a session with FakeRequest, it is possible to use WithSession("key", "value") as suggested in this post: Testing controller with fake session
However when using WithServer, the test now looks like:
"render the users page" in WithServer {
val users = await(WS.url("http://localhost:" + port + "/users").get)
users.status must equalTo(OK)
users.body must contain("Users")
}
Since there is no WithSession(..) method available, I tried instead WithHeaders(..) (does that even make sense?), to no avail.
Any ideas?
Thanks
So I found this question, which is relatively old:
Add values to Session during testing (FakeRequest, FakeApplication)
The first answer to that question seems to have been a pull request to add .WithSession(...) to FakeRequest, but it was not applicable to WS.url
The second answer seems to give me what I need:
Create cookie:
val sessionCookie = Session.encodeAsCookie(Session(Map("key" -> "value")))
Create and execute request:
val users = await(WS.url("http://localhost:" + port + "/users")
.withHeaders(play.api.http.HeaderNames.COOKIE -> Cookies.encodeCookieHeader(Seq(sessionCookie))).get())
users.status must equalTo(OK)
users.body must contain("Users")
Finally, the assertions will pass properly, instead of redirecting me to the login page
Note: I am using Play 2.4, so I use Cookies.encodeCookieHeader, because Cookies.encode is deprecated

Sending An HTTP Request using Intersystems Cache

I have the following Business Process defined within a Production on an Intersystems Cache Installation
/// Makes a call to Merlin based on the message sent to it from the pre-processor
Class sgh.Process.MerlinProcessor Extends Ens.BusinessProcess [ ClassType = persistent, ProcedureBlock ]
{
Property WorkingDirectory As %String;
Property WebServer As %String;
Property CacheServer As %String;
Property Port As %String;
Property Location As %String;
Parameter SETTINGS = "WorkingDirectory,WebServer,Location,Port,CacheServer";
Method OnRequest(pRequest As sgh.Message.MerlinTransmissionRequest, Output pResponse As Ens.Response) As %Status
{
Set tSC=$$$OK
Do ##class(sgh.Utils.Debug).LogDebugMsg("Packaging an HTTP request for Saved form "_pRequest.DateTimeSaved)
Set dateTimeSaved = pRequest.DateTimeSaved
Set patientId = pRequest.PatientId
Set latestDateTimeSaved = pRequest.LatestDateTimeSaved
Set formName = pRequest.FormName
Set formId = pRequest.FormId
Set episodeNumber = pRequest.EpisodeNumber
Set sentElectronically = pRequest.SentElectronically
Set styleSheet = pRequest.PrintName
Do ##class(sgh.Utils.Debug).LogDebugMsg("Creating HTTP Request Class")
set HTTPReq = ##class(%Net.HttpRequest).%New()
Set HTTPReq.Server = ..WebServer
Set HTTPReq.Port = ..Port
do HTTPReq.InsertParam("DateTimeSaved",dateTimeSaved)
do HTTPReq.InsertParam("HospitalNumber",patientId)
do HTTPReq.InsertParam("Episode",episodeNumber)
do HTTPReq.InsertParam("Stylesheet",styleSheet)
do HTTPReq.InsertParam("Server",..CacheServer)
Set Status = HTTPReq.Post(..Location,0) Quit:$$$ISERR(tSC)
Do ##class(sgh.Utils.Debug).LogDebugMsg("Sent the following request: "_Status)
Quit tSC
}
}
The thing is when I check the debug value (which is defined as a global) all I get is the number '1' - I have no idea therefore if the request has succeeded or even what is wrong (if it has not)
What do I need to do to find out
A) What is the actual web call being made?
B) What the response is?
There is a really slick way to get the answer the two questions you've asked, regardless of where you're using the code. Check the documentation out on the %Net.HttpRequest object here: http://docs.intersystems.com/ens20102/csp/docbook/DocBook.UI.Page.cls?KEY=GNET_http and the class reference here: http://docs.intersystems.com/ens20102/csp/documatic/%25CSP.Documatic.cls?APP=1&LIBRARY=ENSLIB&CLASSNAME=%25Net.HttpRequest
The class reference for the Post method has a parameter called test, that will do what you're looking for. Here's the excerpt:
method Post(location As %String = "", test As %Integer = 0, reset As %Boolean = 1) as %Status
Issue the Http 'post' request, this is used to send data to the web server such as the results of a form, or upload a file. If this completes correctly the response to this request will be in the HttpResponse. The location is the url to request, e.g. '/test.html'. This can contain parameters which are assumed to be already URL escaped, e.g. '/test.html?PARAM=%25VALUE' sets PARAM to %VALUE. If test is 1 then instead of connecting to a remote machine it will just output what it would have send to the web server to the current device, if test is 2 then it will output the response to the current device after the Post. This can be used to check that it will send what you are expecting. This calls Reset automatically after reading the response, except in test=1 mode or if reset=0.
I recommend moving this code to a test routine to view the output properly in terminal. It would look something like this:
// To view the REQUEST you are sending
Set sc = request.Post("/someserver/servlet/webmethod",1)
// To view the RESPONSE you are receiving
Set sc = request.Post("/someserver/servlet/webmethod",2)
// You could also do something like this to parse your RESPONSE stream
Write request.HttpResponse.Data.Read()
I believe the answer you want to A) is in the Server and Location properties of your %Net.HttpRequest object (e.g., HTTPReq.Server and HTTPReq.Location).
For B), the response information should be in the %Net.HttpResponse object stored in the HttpResponse property (e.g. HTTPReq.HttpResponse) after your call is completed.
I hope this helps!
-Derek
(edited for formatting)
From that code sample it looks like you're using Ensemble, not straight-up Cache.
In that case you should be doing this HTTP call in a Business Operation that uses the HTTP Outbound Adapter, not in your Business Process.
See this link for more info on HTTP Adapters:
http://docs.intersystems.com/ens20102/csp/docbook/DocBook.UI.Page.cls?KEY=EHTP
You should also look into how to use the Ensemble Message Browser. That should help with your logging needs.

Resources