Logstash extracting values from sp_executesql - elasticsearch

We're tracking and shipping our SQL Server procedure timeouts into Elasticsearch, so we can visualize them in Kibana in order to spot issues.
Some of our SQL queries are parameterized and use sp_executesql.
Would it be possible to extract its parameters and their values from the query?
For instance:
EXEC sp_executesql N'EXEC dbo.MySearchProcedure @UserId=@p0,@SearchPhrase=@p1'
, N'@p0 int,@p1 nvarchar(max)'
, @p0 = 11111
, @p1 = N'denmark';
And get this result out of it:
{
  "Procedure": "dbo.MySearchProcedure",
  "Statement": "EXEC sp_executesql N'EXEC dbo.MySearchProcedure @UserId=@p0,@SearchPhrase=@p1', N'@p0 int,@p1 nvarchar(max)', @p0 = 11111, @p1 = N'denmark';",
  "Parameters": {
    "UserId": 11111,
    "SearchPhrase": "denmark"
  }
}

Sounds like a job for the ruby{} filter. First, locate all the parameter/placeholder pairs in the inner statement (@UserId=@p0, probably using Ruby's String#scan), then locate the placeholder assignments (@p0 = 11111, using scan again), then create a new field combining the two (UserId=11111). In the ruby filter, that boils down to calls like:
event.set('UserId', 11111)
(On Logstash versions before 5.0, the event API is event['UserId'] = 11111 instead.)
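A fuller sketch of the code that would go inside the filter's code option. This is a sketch, not a drop-in: it assumes the raw statement sits in the message field, the regexes assume the @Name=@pN / @pN = value shapes from the example above, and values land as strings (convert as needed).
stmt = event.get('message')
# The procedure name is the first dotted identifier following an EXEC
event.set('Procedure', stmt[/exec\s+([\w\[\]]+\.[\w\[\]]+)/i, 1])
# Map each procedure parameter to its placeholder: [['UserId', 'p0'], ['SearchPhrase', 'p1']]
pairs = stmt.scan(/@(\w+)=@(p\d+)/)
# Map each placeholder to its raw value: { 'p0' => '11111', 'p1' => "N'denmark'" }
values = Hash[stmt.scan(/@(p\d+)\s*=\s*([^,;]+)/)]
# Combine the two, stripping the N'...' wrapper from string literals
pairs.each do |name, placeholder|
  raw = values[placeholder] or next
  event.set("[Parameters][#{name}]", raw.strip.sub(/\AN?'(.*)'\z/, '\1'))
end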

Related

How to fetch sql query results in airflow using JDBC operator

I have configured a JDBC connection in Airflow connections. My task, part of a DAG, looks like below and contains a select statement. Triggering the DAG succeeds, but the query results are not printed in the log. How can I fetch the results of the query using the JDBC operator?
dag = DAG(dag_id='test_azure_sqldw_v1',
          default_args=default_args, schedule_interval=None,
          dagrun_timeout=timedelta(seconds=120))
sql = "select count(*) from tablename"
azure_sqldw = JdbcOperator(task_id='azure_sqldw', sql=sql,
                           jdbc_conn_id="cdf_sqldw", autocommit=True, dag=dag)
The operator does not print the results to the log; it just runs the query.
If you want to fetch the results and do something with them, you need to use the hook:
from pprint import pprint

from airflow.operators.python import PythonOperator
from airflow.providers.jdbc.hooks.jdbc import JdbcHook

def func(jdbc_conn_id, sql, **kwargs):
    """Fetch the query results into a DataFrame and print them."""
    pprint(kwargs)
    hook = JdbcHook(jdbc_conn_id=jdbc_conn_id)
    df = hook.get_pandas_df(sql=sql)  # autocommit is not a get_pandas_df argument
    print(df.to_string())

run_this = PythonOperator(
    task_id='task',
    python_callable=func,
    op_kwargs={'jdbc_conn_id': 'cdf_sqldw', 'sql': 'select count(*) from tablename'},
    dag=dag,
)
You can also create a custom operator that performs the action you need.

Fantastic elastic search plugin, no search results

The query generated by the Fantastic ES plugin always returns no results for a search. The SQL statement it generates:
SELECT SQL_CALC_FOUND_ROWS wp_posts.*
FROM wp_posts
WHERE 1=1
AND wp_posts.ID IN (0)
AND wp_posts.post_type IN ('post', 'page', 'attachment', 'events', 'testimonies', 'leadership')
AND (wp_posts.post_status = 'publish' OR wp_posts.post_author = 1 AND wp_posts.post_status = 'private')
ORDER BY wp_posts.post_type ASC
LIMIT 0, 500
It always has wp_posts.ID IN (0), which would seem to always be false. All the posts are in Elasticsearch and I am able to run queries on the data from the command line. I am new to Elasticsearch and this plugin; maybe I am missing something simple?

ActiveRecord Subquery Inner Join

I am trying to convert a "raw" PostGIS SQL query into a Rails ActiveRecord query. My goal is to replace two sequential ActiveRecord queries (each taking ~1ms) with a single ActiveRecord query taking ~1ms. Using the SQL below with ActiveRecord::Base.connection.execute I was able to validate the reduction in time.
Thus, my direct request is for help converting this query into an ActiveRecord query (and the best way to execute it).
SELECT COUNT(*)
FROM "users"
INNER JOIN (
SELECT "centroid"
FROM "zip_caches"
WHERE "zip_caches"."postalcode" = '<postalcode>'
) AS "sub" ON ST_Intersects("users"."vendor_coverage", "sub"."centroid")
WHERE "users"."active" = 1;
Note that the value <postalcode> is the only variable data in this query. Obviously, there are two models here, User and ZipCache; User has no direct relation to ZipCache.
The current two step ActiveRecord query looks like this.
zip = ZipCache.select(:centroid).where(postalcode: '<postalcode>').limit(1).first
User.where{st_intersects(vendor_coverage, zip.centroid)}.count
Disclaimer: I've never used PostGIS.
First, in your current two-step version, it seems you've missed the WHERE "users"."active" = 1 part.
Here is what I'd do:
First add an active scope on User (for reusability):
scope :active, -> { where(active: 1) }
Then, for the actual query, you can build the subquery without executing it and use it in a joins on the User model, such as:
subquery = ZipCache.select(:centroid).where(postalcode: '<postalcode>')
User.active
.joins("INNER JOIN (#{subquery.to_sql}) sub ON ST_Intersects(users.vendor_coverage, sub.centroid)")
.count
This allows minimal raw SQL while keeping everything in one query.
In any case, check the actual SQL request in your console/log by setting the logger level to debug, as sketched below.
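For instance, in a Rails console (a minimal sketch; in an application you would normally configure this in the environment config):
ActiveRecord::Base.logger = Logger.new($stdout)   # echo every SQL statement to the console
ActiveRecord::Base.logger.level = Logger::DEBUG   # DEBUG level includes the generated queries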
The amazing tool scuttle.io is perfect for converting these sorts of queries:
User.select(Arel.star.count).where(User.arel_table[:active].eq(1)).joins(
  User.arel_table.join(ZipCache.arel_table).on(
    Arel::Nodes::NamedFunction.new(
      'ST_Intersects', [
        User.arel_table[:vendor_coverage], ZipCache.arel_table[:centroid]
      ]
    )
  ).join_sources
)
(Note that this version joins zip_caches directly, so you would still need to add the postalcode condition that the original subquery applied.)

Formatting Postgres row_to_json response for query

I have the following Postgres query:
"SELECT \"responses\".\"index\", \"responses\".\"created_at\",
ROUND(AVG(\"responses\".\"numeric\")) AS numeric
FROM \"responses\"
WHERE \"responses\".\"time\" = '#{time}'
GROUP BY \"responses\".\"index\", \"responses\".\"created_at\""
I'm trying to output the response as json using row_to_json. I can use:
"select row_to_json(row)
from (
SELECT \"responses\".\"index\", \"responses\".\"created_at\",
ROUND(AVG(\"responses\".\"numeric\")) AS numeric
FROM \"responses\"
WHERE \"responses\".\"time\" = '#{time}'
GROUP BY \"responses\".\"index\", \"responses\".\"created_at\"
) row"
Which will give me:
{"row_to_json"=>"{\"index\":1,\"created_at\":\"2014-07-12 03:51:00\",\"numeric\":3}"}
However I don't really want the response nested in the row_to_json hash. Is there a simple way to remove that so I just return:
"{\"index\":1,\"created_at\":\"2014-07-12 03:51:00\",\"numeric\":3}"
You should use the array_to_json and array_agg functions.
For example:
SELECT array_to_json(array_agg(row_to_json(row))) FROM ...
This returns a proper JSON array.
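If you only want the bare JSON string without the wrapping hash, you can also just read the first column of the first row on the Ruby side. A minimal sketch, assuming the pg gem with a connection object named conn (the name is an assumption):
result = conn.exec(sql)        # sql is the row_to_json query shown above
json = result.getvalue(0, 0)   # first row, first column: the JSON string itself, no "row_to_json" key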
References:
http://hashrocket.com/blog/posts/faster-json-generation-with-postgresql
http://reefpoints.dockyard.com/2014/05/27/avoid-rails-when-generating-json-responses-with-postgresql.html

How to write a query with two ? placeholders in sequence?

I am using a NamedParameterJdbcTemplate, but found that this problem is in the underlying JdbcTemplate class, so I will show the problem as it occurs with the JdbcTemplate (so let's not worry about the safety of the SQL query here).
Here's what I am trying to achieve:
String sql = "SELECT * FROM clients ORDER BY ? ?";
return jdbcTemplate.query(sql,
        new Object[] { "name", "ASC" },
        new ClientResultSetExtractor());
I expected the first place-holder to be replaced with "name" and the second with "ASC", which would create the valid SQL query:
SELECT * FROM clients ORDER BY name ASC
But unfortunately, running that jdbc query does not work:
ERROR: syntax error at or near "$2" at character 35
STATEMENT: SELECT * FROM clients ORDER BY $1 $2
What am I doing wrong?
EDIT
I had assumed the problem was the two placeholders in sequence, but even when I remove the first one, it still won't accept just the last one, which should tell the query whether to sort in ASC or DESC order. Is this a bug, and if not, why the heck is this not acceptable?
You're trying to use parameters incorrectly.
Parameters are not column names or SQL statement keywords; they're data content (e.g., WHERE LastName = ? is a valid parameterized statement, WHERE ? = 'Smith' is not). If you need a dynamic ORDER BY, validate the column name and direction against a whitelist and concatenate them into the SQL yourself, as sketched below.
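The whitelist idea, sketched in Ruby to match the rest of this page (the names are illustrative; with JdbcTemplate the equivalent is building the SQL string before binding any data parameters):
ALLOWED_COLUMNS = %w[name created_at].freeze   # columns callers may sort on
ALLOWED_DIRECTIONS = %w[ASC DESC].freeze
def clients_order_by(column, direction)
  # Fall back to safe defaults instead of interpolating raw user input
  col = ALLOWED_COLUMNS.include?(column) ? column : 'name'
  dir = ALLOWED_DIRECTIONS.include?(direction) ? direction : 'ASC'
  "SELECT * FROM clients ORDER BY #{col} #{dir}"
end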
