I've been experimenting with querying raw accelerometer data from an Android Wear (Moto360) watch, but it seems to drain too much battery (about 15% per hour). Presumably this is because the Moto360 does not support batching of sensor data. I was wondering whether the Google Fit Sensors API can be used to intelligently store accelerometer data (x, y, z) in the Fitness Store. So far I've only come across storing activity types, not raw sensor data. I am looking for a battery-efficient solution.
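This doesn't address the Fitness Store part, but for what it's worth: on watches whose sensor hardware does support batching, the usual battery lever is the maxReportLatencyUs argument to SensorManager.registerListener, which lets the sensor hub queue events instead of waking the CPU per sample. A minimal sketch (the class name, sampling values, and buffer handling are illustrative, and whether the Moto360 honours the latency hint is exactly the open question):

```java
import android.content.Context;
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;

public class BatchedAccelerometer implements SensorEventListener {

    private final SensorManager sensorManager;
    private final Sensor accelerometer;

    public BatchedAccelerometer(Context context) {
        sensorManager = (SensorManager) context.getSystemService(Context.SENSOR_SERVICE);
        accelerometer = sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
    }

    public void start() {
        // 50 Hz sampling, but ask the hardware to hold events for up to 10 s
        // before waking the CPU; the device falls back to per-event delivery
        // if it has no sensor FIFO (which may be the case on the Moto360).
        int samplingPeriodUs = 20_000;       // 50 Hz
        int maxReportLatencyUs = 10_000_000; // 10 s batching window
        sensorManager.registerListener(this, accelerometer, samplingPeriodUs, maxReportLatencyUs);
    }

    public void stop() {
        sensorManager.unregisterListener(this);
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        float x = event.values[0], y = event.values[1], z = event.values[2];
        // buffer (x, y, z, event.timestamp) here and flush in bulk
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { }
}
```

You can check Sensor.getFifoMaxEventCount() at runtime; if it returns 0, the device has no hardware FIFO and the latency hint won't help.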
I'm looking for a high-throughput in-memory database for storing binary chunks of between 1.5 MB and 3 MB (images).
The use case is a live video stream computer vision inference pipeline, where multiple deep models run inference on 720p video at 25 FPS in real time. Our current solution is Amazon FSx for Lustre, which can handle the task (average throughput is 180 MB/s). The models run in their own K8s pods and read the decoded video frames from FSx. The problem is that it takes a long time to set up for each run and it isn't optimal: to increase throughput you also have to pay for extra space, which we don't really need, since the storage is temporary and most of the time fewer than 1,000 frames are stored at once. Ideally, we would have an in-memory database on an instance that can be spun up quickly and can sustain very high throughput (up to 500 MB/s).
I've tested Redis and Memcached as alternatives, but both fail to achieve similar performance, which I assume is due to the large chunk sizes (as far as I know, both are designed for many small values rather than a smaller number of large ones); a rough throughput probe is sketched below.
Any suggestions on what else to test or in what direction to look would be very helpful.
Thank you!
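For what it's worth, here is a rough way to measure the large-value throughput you actually get out of Redis, assuming a local instance and the Jedis client (key names, chunk size, and iteration count are arbitrary):

```java
import redis.clients.jedis.Jedis;
import java.util.Random;

public class RedisChunkBench {
    public static void main(String[] args) {
        byte[] frame = new byte[2 * 1024 * 1024]; // ~2 MB, roughly one decoded frame
        new Random(42).nextBytes(frame);

        try (Jedis jedis = new Jedis("localhost", 6379)) {
            int n = 500;
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                byte[] key = ("frame:" + i).getBytes();
                jedis.set(key, frame);   // write one chunk
                jedis.get(key);          // read it back
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            double mb = 2.0 * n * frame.length / (1024.0 * 1024.0); // bytes written + read
            System.out.printf("Moved %.0f MB in %.2f s (%.1f MB/s)%n", mb, seconds, mb / seconds);
        }
    }
}
```

If the numbers come out far below what the network path should allow, pipelining several frames per round trip or connecting over a Unix domain socket may be worth testing before ruling Redis out.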
You could take a look at eXtremeDB. I work for the vendor (McObject), so hopefully this won't get flagged as 'commercial' since you asked for ideas. eXtremeDB has been used for facial and fingerprint recognition in access control systems. Not exactly the same use case, but perhaps similar enough to warrant a look.
I am working on a Hyperledger application that stores sensor data from IoT devices.
I am using HLF v1.4 with Raft. Each IoT device will provide JSON data at fixed intervals, which gets stored in Hyperledger. I have worked with HLF v1.3, which doesn't scale very well.
With v1.4, I am planning to start with a two-organization setup with 5 peers per organization.
But the limiting factor seems to be that, as the number of blocks increases with new transactions, querying the network takes longer.
What steps can be taken to scale HLF with v1.4 onwards?
What kind of server specs (RAM, CPUs, etc.) should be used for good performance when selecting a server, e.g. an EC2 instance?
You can change your block size: if you increase the block size, the number of blocks will be reduced. For better query and invoke performance, you can also limit how much data you store on the blockchain. Computation speed matters in blockchain too; with more compute, TPS may improve. Try instance types like t3.medium, or larger ones such as t3.large.
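The answer doesn't spell out how to limit the data stored on the blockchain, so treat this as an assumption: a common pattern is to keep the bulk JSON payload in an off-chain store and submit only a compact record plus a digest to the chaincode. A plain-Java sketch of building such an on-chain record (field names and the JSON shape are made up):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class OnChainRecordBuilder {

    /** Builds the small JSON string that would actually be submitted to chaincode. */
    public static String buildOnChainRecord(String deviceId, long timestamp, String fullJsonPayload)
            throws NoSuchAlgorithmException {
        // The full payload goes to an off-chain store (S3, a database, ...);
        // only its SHA-256 digest and the fields you query on go on-chain.
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        byte[] digest = sha256.digest(fullJsonPayload.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b & 0xff));

        return String.format(
                "{\"deviceId\":\"%s\",\"ts\":%d,\"payloadHash\":\"%s\"}",
                deviceId, timestamp, hex);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String payload = "{\"temperature\":21.4,\"humidity\":0.61}";
        System.out.println(buildOnChainRecord("device-42", System.currentTimeMillis(), payload));
    }
}
```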
In my app I collect a lot of metrics: hardware/native system metrics (such as CPU load, available memory, swap memory, network IO in terms of packets and bytes sent/received, etc.), JVM metrics (garbage collections, heap size, thread utilization, etc.), and app-level metrics (instrumentations that only have meaning for my app, e.g. number of orders per minute).
Throughout the week, month, and year I see trends/patterns in these metrics. For instance, when cron jobs all kick off at midnight I see CPU and disk thrashing as reports are being generated.
I'm looking for a way to assess/evaluate metrics as healthy/normal vs unhealthy/abnormal but that takes these patterns into consideration. For instance, if CPU spikes around (+/- 5 minutes) midnight each night, that should be considered "normal" and not set off alerts. But if CPU pins during a "low tide" in the day, say between 11:00 AM and noon, that should definitely cause some red flags to trigger.
I have the ability to store my metrics in a time-series database, if that helps kickstart this analytical process, but I don't have the foggiest clue as to what algorithms, methods, and strategies I could leverage to establish these cyclical "baselines" that act as a function of time. Obviously, such a system would need to be pre-seeded or even trained with historical data that is mapped to normal/abnormal values (which is why I'm leaning towards a time-series DB as the underlying store), but this is new territory for me and I don't even know what to begin Googling to get back meaningful/relevant/educated solution candidates in the search results. Any ideas?
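One simple place to start (useful search terms: seasonal baseline, time-of-day profiling, z-score anomaly detection, Holt-Winters) is to bucket your history by hour of day, keep a running mean and standard deviation per bucket, and flag any sample that lands more than a few standard deviations from its bucket's mean. A rough sketch for a single metric, with made-up class names:

```java
import java.util.HashMap;
import java.util.Map;

/** Per-hour-of-day baseline for one metric; "normal" depends on the time of day. */
public class HourlyBaseline {

    private static class Stats {
        long count;
        double mean;
        double m2; // running sum of squared deviations (Welford's algorithm)

        void add(double x) {
            count++;
            double delta = x - mean;
            mean += delta / count;
            m2 += delta * (x - mean);
        }

        double stdDev() {
            return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
        }
    }

    private final Map<Integer, Stats> byHour = new HashMap<>();

    /** Feed historical samples: hourOfDay in [0, 23], value is e.g. CPU load. */
    public void train(int hourOfDay, double value) {
        byHour.computeIfAbsent(hourOfDay, h -> new Stats()).add(value);
    }

    /** A sample is "abnormal" if it is more than k standard deviations from that hour's mean. */
    public boolean isAbnormal(int hourOfDay, double value, double k) {
        Stats s = byHour.get(hourOfDay);
        if (s == null || s.count < 30) return false; // not enough history to judge
        double std = s.stdDev();
        return std > 0 && Math.abs(value - s.mean) > k * std;
    }
}
```

With this, the midnight CPU spike pulls up the midnight bucket's baseline and stops alerting, while the same load at 11:30 AM still trips the threshold; hour-of-week buckets (168 of them) capture weekday/weekend patterns the same way.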
You could categorize each metric (CPU load, available memory, swap memory, network IO) by day and time as good or bad.
Come up with a dataset for a given time frame containing the metric values and whether they are good or bad. Train a model using 70% of the data, with the good/bad labels included.
Then test the trained model on the other 30% of the data, without the labels, to see whether you get the predicted results (good/bad) from the model. You could use a classification algorithm.
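To make the 70/30 idea concrete, here is a minimal sketch of the split-and-evaluate loop on synthetic labeled samples; the "classifier" is just a learned threshold standing in for whatever classification algorithm (from a library such as Weka or Smile, say) you actually plug in:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Minimal 70/30 train/test evaluation of a trivial threshold "classifier". */
public class TrainTestSplitDemo {

    record Sample(double cpuLoad, int hourOfDay, boolean bad) { }

    public static void main(String[] args) {
        // Stand-in labeled data: high CPU outside the midnight hour is "bad".
        Random rnd = new Random(1);
        List<Sample> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            int hour = rnd.nextInt(24);
            double load = hour == 0 ? 60 + rnd.nextGaussian() * 20 : 20 + rnd.nextGaussian() * 10;
            data.add(new Sample(load, hour, hour != 0 && load > 50));
        }

        Collections.shuffle(data, rnd);
        int cut = (int) (data.size() * 0.7);
        List<Sample> train = data.subList(0, cut);
        List<Sample> test = data.subList(cut, data.size());

        // "Training": learn one threshold from the labeled 70%
        // (a real model would come from a classification library instead).
        double threshold = train.stream()
                .filter(Sample::bad)
                .mapToDouble(Sample::cpuLoad)
                .min().orElse(50.0);

        // Evaluate on the held-out 30% by comparing predictions to the labels.
        long correct = test.stream()
                .filter(s -> (s.hourOfDay() != 0 && s.cpuLoad() >= threshold) == s.bad())
                .count();
        System.out.printf("Accuracy on held-out 30%%: %.1f%%%n", 100.0 * correct / test.size());
    }
}
```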
Happy New Year and best wishes!
We are collecting a large amount of GPS positions for analytics purposes, which we would like to store and process (2-3 GB of data daily) using Heroku / Amazon services, and we are looking for a suitable solution. We initially thought about a system where the data is uploaded directly to Amazon S3, a worker dyno constantly processes it and loads the GPS positions into a Heroku PostGIS database, and another worker dyno is used on demand to compute analytics output on the fly. We have also heard about Amazon Elastic MapReduce, which works directly with raw data in S3 without a PostGIS database. We need your guidance.
What are your recommendations for this kind of need for storing and processing data (Heroku add-ons, architectures, etc.)? What do you think of the two alternatives listed above?
Many thanks
It is difficult to give a precise answer, as the details of your processing are not clear: do you need per-user analytics, per-region analytics, analytics across days, etc.?
I can point you to some related services:
Amazon Kinesis - a new service targeted at exactly such use cases (Internet of Things-like). You can PUT your readings from various sources (including directly from the mobile devices) and read them on the server side (see the sketch after this list).
Amazon DynamoDB - a NoSQL DB for which AWS recently added a geospatial library: http://www.allthingsdistributed.com/2013/09/dynamodb-geospatial.html and http://aws.typepad.com/aws/2013/09/new-geo-library-for-dynamodb-.html
RDS with PostgreSQL - PostgreSQL is very good for GIS calculations, and with RDS it is even easier to manage, as most of the needed DBA work (installation, updates, backup, restore, etc.) is done by the RDS service.
S3 - THE place to store your data for batch processing. Note that it is best to have larger files for most processing systems such as EMR. You can have a connector that reads the data from Kinesis and stores it into S3 (see the GitHub example: https://github.com/awslabs/amazon-kinesis-connectors/tree/master/src/main/java/com/amazonaws/services/kinesis/connectors/s3).
Amazon EMR - the cluster management service that makes running jobs like Hadoop jobs much easier. You can find a presentation about using EMR for geospatial analytics in the re:Invent session BDT201 (slides and video).
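As a concrete illustration of the Kinesis option, here is a minimal producer sketch assuming the AWS SDK for Java (v1); the stream name, partition key, and JSON shape are made up:

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class GpsToKinesis {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        // One GPS fix as a small JSON record.
        String reading = "{\"deviceId\":\"car-17\",\"ts\":1700000000000,"
                       + "\"lat\":48.8566,\"lon\":2.3522,\"speed\":12.4}";

        PutRecordRequest request = new PutRecordRequest();
        request.setStreamName("gps-positions");
        // Partitioning by device keeps each device's readings ordered within a shard.
        request.setPartitionKey("car-17");
        request.setData(ByteBuffer.wrap(reading.getBytes(StandardCharsets.UTF_8)));

        kinesis.putRecord(request);
        kinesis.shutdown();
    }
}
```

On the consumer side, the Kinesis connector mentioned above (or EMR, per the later comment) can then batch the records into larger S3 objects for processing.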
You should also consider pre-processing the data to limit the number of redundant records. Most of your positions are likely to be at the same location; in other words, the device will be sitting still much of the time.
One approach is to store a new position only if its speed is greater than 0, or if its speed is 0 but the last stored position was still moving. That way you store only the first location after the device stops moving. There will be noise in the GPS speed, so you will not get rid of every resting position.
Another option is to store a position only when it is some minimum distance from the previously stored position.
You can always return a result for any requested time by finding the closest record before the requested timestamp.
If you use the distance-based compression, set the required distance to at least the expected RMS error of the GPS device, likely about 5 m at minimum; use a longer distance if you can stand it.
Doing the math for the distance between geo locations can be computationally expensive, so pre-calculate a delta lat/lon threshold to compare against incoming positions to speed that up.
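A sketch of that distance-threshold filter, using a flat-earth approximation with pre-computed per-degree scale factors so there is no trig or square root per incoming fix (the class name and API are illustrative):

```java
/**
 * Keeps a GPS fix only when it has moved at least `thresholdMeters` from the
 * last stored fix. Uses a flat-earth approximation with scale factors that are
 * recomputed only when a new reference point is stored.
 */
public class DistanceFilter {

    private static final double METERS_PER_DEGREE_LAT = 111_320.0;

    private final double thresholdSquared;
    private double lastLat = Double.NaN;
    private double lastLon = Double.NaN;
    private double metersPerDegreeLon; // depends on latitude

    public DistanceFilter(double thresholdMeters) {
        this.thresholdSquared = thresholdMeters * thresholdMeters;
    }

    /** Returns true if the fix should be stored (and updates the reference point). */
    public boolean accept(double lat, double lon) {
        if (Double.isNaN(lastLat)) {
            store(lat, lon);
            return true;
        }
        double dy = (lat - lastLat) * METERS_PER_DEGREE_LAT;
        double dx = (lon - lastLon) * metersPerDegreeLon;
        if (dx * dx + dy * dy >= thresholdSquared) {
            store(lat, lon);
            return true;
        }
        return false; // still within the noise radius of the last stored fix
    }

    private void store(double lat, double lon) {
        lastLat = lat;
        lastLon = lon;
        metersPerDegreeLon = METERS_PER_DEGREE_LAT * Math.cos(Math.toRadians(lat));
    }
}
```

new DistanceFilter(5.0) would match the suggested ~5 m minimum; raise the threshold if you can tolerate coarser tracks.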
EMR launched a Kinesis connector, so one could process such a dataset using familiar tools from the Hadoop ecosystem. Did you see http://aws.typepad.com/aws/2014/02/process-streaming-data-with-kinesis-and-elastic-mapreduce.html ?
My question is similar to this. I need a data structure to store and access a large amount of time-series data. In my case the insert rate is very high: 10-100k inserts per second. Data items are tuples containing a timestamp, a sensor id, and a sensor value, and I have a very large number of sensors. In my case, values older than some point in time must be erased.
I need to query the dataset by sensor id and time range. All the data must be stored in external memory; there is no way to fit it in main memory.
I already know about the TSB-tree, but the TSB-tree is hard to implement and there is no guarantee that it will do the job. I suspect that the TSB-tree doesn't behave very well under a high insert rate.
Is there any alternative? Maybe something like an LSM-tree, but for multidimensional data?
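For what it's worth, one way to get LSM-tree behaviour for this shape of data without building one is to put an existing LSM store (RocksDB, LevelDB) behind a composite (sensor id, timestamp) key encoded big-endian, so byte order matches logical order; a time-range query for one sensor is then a short iterator scan, and expiry can be a range delete of old timestamps per sensor. A rough sketch assuming the RocksDB Java bindings (org.rocksdb) and non-negative sensor ids:

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.RocksIterator;
import java.nio.ByteBuffer;

public class SensorStore implements AutoCloseable {

    static { RocksDB.loadLibrary(); }

    private final Options options;
    private final RocksDB db;

    public SensorStore(String path) throws RocksDBException {
        options = new Options().setCreateIfMissing(true);
        db = RocksDB.open(options, path);
    }

    /** Big-endian (sensorId, timestamp) so lexicographic byte order == logical order. */
    private static byte[] key(int sensorId, long timestamp) {
        return ByteBuffer.allocate(12).putInt(sensorId).putLong(timestamp).array();
    }

    public void put(int sensorId, long timestamp, double value) throws RocksDBException {
        db.put(key(sensorId, timestamp), ByteBuffer.allocate(8).putDouble(value).array());
    }

    /** Scan one sensor over [from, to); inserts hit the memtable, so writes stay fast. */
    public void scan(int sensorId, long from, long to) {
        byte[] end = key(sensorId, to);
        try (RocksIterator it = db.newIterator()) {
            for (it.seek(key(sensorId, from)); it.isValid(); it.next()) {
                byte[] k = it.key();
                if (compare(k, end) >= 0) break;
                long ts = ByteBuffer.wrap(k, 4, 8).getLong();
                double value = ByteBuffer.wrap(it.value()).getDouble();
                System.out.println(sensorId + " " + ts + " " + value);
            }
        }
    }

    private static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    @Override
    public void close() {
        db.close();
        options.close();
    }
}
```

This only illustrates the key layout; sustaining 10-100k inserts/s would still mean tuning write buffers and compaction, and expiring old data cheaply usually means range deletes or time-partitioned databases you can drop wholesale.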
Because you're using external memory, you may want to read through the chapter on B-trees in Henrik Jonsson's thesis. B-trees themselves are a very popular way to index data in external memory and you should be able to find implementations in any language, and Jonsson discusses how to adapt them to store time-series data.