This is for a project on Bot Framework Composer (not the SDK, so I'm using the built-in telemetry export settings).
I am looking for the best way to store event logs from bot conversations for analysis. From what I've researched, the recommended method is to go through Application Insights, which I activated and tested. The data I require all seems to be captured in the customEvents table.
The issue is that I need to be able to manipulate the data for analysis, but in Application Insights it is read-only (apart from possibly purging via the API). I need to be able to add tables, edit text, etc. I have a lot of experience with PostgreSQL, so that's my first choice for bot log storage.
So my question is: what is an efficient way to get the customEvents table from Application Insights into a Postgres database? From what I see, Application Insights only exports to Azure Storage, but that does not have a database option. And if I understand some of the suggested pipelines, they copy data to storage and then copy it to a database. Isn't that a lot of storage cost, as the same data will be in Application Insights, storage blobs AND Postgres?
What is the best pipeline? The goal is a non-redundant pipeline that transfers the event data in 'customEvents' to a Postgres table with the same columns.
(If there is a way to redirect the data that goes to customEvents in Application Insights directly to a Postgres table, that would be perfect too.)
There is no way to redirect data from Application Insights directly into a Postgres table.
The first solution, as you know, is continuous export to Azure Storage. Blob storage does not cost very much, and you can clear out old data periodically to reduce the cost further.
Another way is to use the Application Insights query API. To do that, you need to write your own logic that queries the custom events from Application Insights and then inserts them into your DB from your own code.
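A minimal sketch of that second approach, assuming the query REST endpoint at `api.applicationinsights.io` with an API key in the `x-api-key` header and the usual tables/columns/rows response shape. The helper names and the `custom_events` table are hypothetical, and dynamic columns such as `customDimensions` may arrive as strings or nested objects, so the sketch stores nested values as JSON text.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.applicationinsights.io/v1/apps"


def fetch_custom_events(app_id, api_key, kusto_query):
    """Run a query against the Application Insights query API; return parsed JSON."""
    query_string = urllib.parse.urlencode({"query": kusto_query})
    url = f"{API_ROOT}/{app_id}/query?{query_string}"
    request = urllib.request.Request(url, headers={"x-api-key": api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def rows_from_response(payload):
    """Turn the tables/columns/rows response shape into a list of dicts."""
    table = payload["tables"][0]
    names = [column["name"] for column in table["columns"]]
    return [dict(zip(names, row)) for row in table["rows"]]


def _to_sql(value):
    # Nested values (e.g. customDimensions) are stored as JSON text.
    return json.dumps(value) if isinstance(value, (dict, list)) else value


def build_insert(table_name, rows):
    """Build a parameterized INSERT and its parameter tuples (for executemany)."""
    if not rows:
        return None, []
    columns = list(rows[0])
    column_list = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table_name} ({column_list}) VALUES ({placeholders})"
    params = [tuple(_to_sql(row[name]) for name in columns) for row in rows]
    return sql, params
```

With a Postgres driver that uses `%s` placeholders (psycopg2, for example), the result can be passed to `cursor.executemany(sql, params)`. To avoid copying the same events twice, the query could filter on a watermark, e.g. `customEvents | where timestamp > datetime(...)`, using the newest timestamp already stored.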
We have developed an internal CRM and have used it for the last few months. Now we have decided to open it to the public as a SaaS project, and I'm wondering what the best way is to upgrade the database structure, which is currently designed for only one company, so that it can manage multiple paying customers.
At the moment the planned solution is to add a "customer" field to every table in the database and upgrade the backend logic to use this field.
Are there more elegant solutions to this problem?
The database is MySQL and the backend is built with Laravel.
CRM Data can be very sensitive and you need to be extremely careful not to "leak" data to wrong customers.
For an existing app, I would argue for a system that creates a fresh DB for each customer.
You would have one codebase that connects to a customer-specific DB.
This way you don't need to change too much in your current DB structure, but "just" implement the mechanism that selects the correct DB according to the customer account.
This is how I would do it:
In any case, going from an internal app to a SaaS platform is a massive paradigm change, and you should identify the steps needed to achieve the desired result.
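To make the "one codebase, one database per customer" idea concrete, here is a rough sketch of the selection mechanism. It is written in Python with in-memory SQLite purely so it runs on its own; the registry, the `contacts` table, and the function names are all hypothetical, and in the real app the registry would hold per-customer MySQL connection settings.

```python
import sqlite3

# Hypothetical registry: customer account -> database location.
# Each ":memory:" connection is a separate, isolated database.
TENANT_DATABASES = {
    "acme": ":memory:",
    "globex": ":memory:",
}

_connections = {}


def connection_for(customer):
    """Return the connection to exactly one customer's database."""
    if customer not in TENANT_DATABASES:
        raise KeyError(f"unknown customer: {customer}")
    if customer not in _connections:
        conn = sqlite3.connect(TENANT_DATABASES[customer])
        # Same schema for every customer; only the database differs.
        conn.execute("CREATE TABLE IF NOT EXISTS contacts (name TEXT)")
        _connections[customer] = conn
    return _connections[customer]


def add_contact(customer, name):
    conn = connection_for(customer)
    conn.execute("INSERT INTO contacts (name) VALUES (?)", (name,))
    conn.commit()


def list_contacts(customer):
    cursor = connection_for(customer).execute("SELECT name FROM contacts ORDER BY name")
    return [name for (name,) in cursor]
```

Because the queries never mention a customer column, a forgotten `WHERE` clause cannot leak another customer's rows; the only place isolation can go wrong is the lookup in `connection_for`.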
Are there any concerns with using Snowflake as the data repository for a web API from an enterprise architecture perspective?
I think the question to ask is how you are going to use the data. It is not clear what you mean by a web API data repository. If you are talking about the API interaction data, then Snowflake is not the right choice; you should look for a transactional data store for such use cases. However, if you want to derive insights and analytics from that data, you can ingest the transactional data into Snowflake and build your analytics layer on top of it. But then the question is why you would want to do that, since most API products already have an analytics engine built in.
So I want to create a live dashboard (probably a Node-based app with a React front end). This dashboard will display performance data for a series of websites, using data gathered with Google's Lighthouse performance audit tool.
The Lighthouse tool publishes a JSON file with a bunch of keys and values for performance analytics.
I will use something like d3 or chart.js to eventually render this data.
My issue is with how to provide this "live" data to the web front end.
Here is my idea so far (I just need to know if it is viable):
A Jenkins job will run my dockerised script, which uses the Lighthouse SDK: I give it a site and it returns a JSON performance report.
The jenkins job will put the json file into an S3 bucket.
A lambda will be triggered each time an item is added to the S3 bucket
The lambda will extract the desired values from the json report and write these to dynamo db
Dynamo DB stream will be used to get the latest values from the dynamo table.
The web front end will query the DynamoDB streams and render the data into charts and graphs.
Can you see this process working? Would this give me a sort of "live" data feed? The idea is that the performance reports will be created multiple times during the day.
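For the extraction step in that pipeline (the Lambda that pulls the desired values out of the report), a rough sketch could look like the following. The field names (`requestedUrl`, `fetchTime`, `categories.performance.score`, `audits[...].numericValue`) are assumed from the Lighthouse report format and the chosen audit ids are only examples, so check them against an actual report.

```python
# Audit ids assumed to be of interest; adjust to the metrics you chart.
AUDIT_IDS = ("first-contentful-paint", "largest-contentful-paint", "speed-index")


def extract_metrics(report):
    """Flatten a Lighthouse JSON report (already parsed) into one small record."""
    audits = report.get("audits", {})
    categories = report.get("categories", {})
    record = {
        "url": report.get("requestedUrl"),
        "fetch_time": report.get("fetchTime"),
        "performance_score": categories.get("performance", {}).get("score"),
    }
    for audit_id in AUDIT_IDS:
        # Missing audits become None instead of raising.
        record[audit_id] = audits.get(audit_id, {}).get("numericValue")
    return record
```

The resulting flat dict is what would be written to the table, one item per report.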
I don't think the DynamoDB stream will work the way you think, unless I'm totally misunderstanding something about DynamoDB streams. How would DynamoDB push streaming data to a web browser?
I would recommend having the Lambda function add a timestamp to each record it inserts into DynamoDB. Have the timestamp field be the sort key for the primary index of the table.
Next have another Lambda function that queries the DynamoDB table for the latest record(s) using the timestamp field. Expose that Lambda function via API Gateway.
Finally have the web front-end make API calls to the endpoint you created in API Gateway to retrieve the latest performance data.
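A sketch of those two Lambda pieces, under some assumptions that are not in the answer above: the table is called `lighthouse-results`, its partition key is `site`, and the sort key is an ISO 8601 `timestamp` string (fixed-width UTC strings sort chronologically). The query uses boto3's `Table.query` in descending order with a limit.

```python
from datetime import datetime, timezone

TABLE_NAME = "lighthouse-results"  # hypothetical table name


def build_item(site, metrics, now=None):
    """Attach the partition key and a sortable UTC timestamp to the metrics."""
    now = now or datetime.now(timezone.utc)
    item = dict(metrics)
    item["site"] = site
    item["timestamp"] = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return item


def latest_for_site(site, limit=1):
    """Return the newest item(s) for one site, newest first."""
    # Imported here so the rest of the module works without AWS libraries.
    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    response = table.query(
        KeyConditionExpression=Key("site").eq(site),
        ScanIndexForward=False,  # descending sort-key order
        Limit=limit,
    )
    return response["Items"]
```

The API Gateway handler would call `latest_for_site` and return the items as JSON. Note that boto3's resource API expects `Decimal` rather than `float` for numeric attributes, so numeric metrics need converting before the item is written.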
"live" can mean different things to different people and for infrequently changing data (a few times a day is not frequent compared to an interactive chat) the overhead of managing sockets, etc. might not be worth it compared to simply refreshing the page.
I don't see why you need Dynamo here; you can just read from S3 directly and perhaps use versioning on objects to track the different stats for each run.
If you genuinely want browser-based notifications, you can look into AWS IoT: have a Lambda subscribed to the S3 bucket where the results are written, which extracts the values and publishes them to IoT, which in turn can expose a web socket for your browser-based app.
We have an app and we want to log how users interact with it. For example, are they using the pages we expect them to? I don't want to log this in the app itself, as it would be very hard for me to then get this information off the device. Each page interacts with web services, so I was planning to log that interaction.
I have had some thoughts on this
* as the webservice is being called add a logging table to the database - problem here could be performance impact
* use log4j async mode to log these details.
Does anyone have any other suggestions on how to do this? I'm reading The Lean Startup at the moment (very good so far) and this sort of thing seems fundamental to it, so I'm wondering if there are any other tips.
Thanks
Since no one answered this for a couple months, I thought a couple pointers might help you...
* Use mobile analytics tools:
  * Fabric.io
  * Google Analytics for Mobile Apps
  * Flurry
  * Amazon Mobile Analytics
  * appsee
* Have the server record what users access (that's the approach you're considering). To offload the overhead, there are a couple of tactics you could employ (mix 'n match as you will):
  * Use async mechanisms (async operations in the server, such as Futures; log4j async mode; async databases; etc).
  * Use a separate database.
  * Use a NoSQL database only to write accesses. Later on you process that information in a separate analytics application.
* Have the client (mobile app) record the actions and send them in bulk to the server once in a while (as frequently as you need / want / can afford).
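As an illustration of the async tactic, here is a small Python sketch using the standard library's `QueueHandler` and `QueueListener`, which follow the same idea as log4j's async mode: the request thread only puts the record on a queue, and a background thread does the slow write. `ListHandler` is a hypothetical stand-in for a real file or database handler.

```python
import logging
import logging.handlers
import queue


class ListHandler(logging.Handler):
    """Stand-in for a slow target (file, database): keeps messages in memory."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def make_async_logger(name, target_handler):
    """Return a logger whose records are handled on a background thread."""
    log_queue = queue.Queue(-1)  # unbounded queue
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # The web service thread only enqueues the record.
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    # The listener thread drains the queue into the real handler.
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return logger, listener
```

In a web service call this would be used as `logger.info("user=%s page=%s", user_id, page)`; call `listener.stop()` on shutdown so queued records are flushed.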
Cheers
I am looking for some recommendations on a good data store for activity feeds. The goal is to have a Twitter/Facebook-type feed log consisting of the various activities users can do throughout our website. The "wall" or "feed" would be updated via AJAX, showing what the users of the website are currently doing. It will be written to often, and then the most recent entries will be displayed on the site.
(e.g. John Smith recommended Jane Smith's article 2 seconds ago)
We currently are storing the feeds in MySQL but performance has been poor and I'm concerned with hindering performance throughout the rest of the website if we are constantly hitting the database to grab the most recent user activity as well as writing the feeds.
Any recommendations would be greatly appreciated!
Make use of the best caching solutions like memcache to increase performance. Other than scaling, there are no performance-increasing possibilities for an activity feed.
I would vote for using http://redis.io/ or http://www.mongodb.org/ as an alternative to MySQL for short-term, almost live activity feeds across a site. And a cron job to dump history of activities into MySQL for record keeping.
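The shape of that setup can be sketched in plain Python. This is an in-memory stand-in, not Redis code: the capped newest-first list plays the role of a Redis list maintained with `LPUSH` and `LTRIM` and read with `LRANGE`, and `drain_for_archive` is a hypothetical hook for what the cron job would copy into MySQL.

```python
from collections import deque


class RecentActivityFeed:
    """Capped, newest-first feed plus a backlog waiting to be archived."""

    def __init__(self, max_items=100):
        # deque(maxlen=...) drops the oldest entry once the cap is reached.
        self._items = deque(maxlen=max_items)
        self._pending = []  # entries not yet written to long-term storage

    def record(self, actor, action):
        entry = {"actor": actor, "action": action}
        self._items.appendleft(entry)  # newest first
        self._pending.append(entry)

    def latest(self, count=10):
        """What the AJAX endpoint would return."""
        return list(self._items)[:count]

    def drain_for_archive(self):
        """Hand over everything recorded since the last drain, oldest first."""
        pending, self._pending = self._pending, []
        return pending
```

Reads for the wall only ever touch the small capped list, while the full history leaves in batches, which is what keeps the feed traffic off the main database.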
A look at Tumblr's or Twitter's architecture can point you in the right direction as well.
You should take a microservices approach and separate the datastore that holds the users' actions from the one that holds the actual data.
Pub/Sub is the right approach for handling the big stream of users' actions.
Use Kafka or the Google Cloud Pub/Sub service for a scalable data pipeline; both are designed to absorb that kind of load.
Independently consume the messages from Kafka into a database such as MySQL or Google BigQuery for whatever analytics you need.