How can I make my server accept the data sent by a CC3200 over the MQTT protocol? I have the CC3200 publishing values successfully to my server's IP address, but I don't know what I should do to make the server dump those incoming values into its database. I am using XAMPP for the server functionality.
Any suggestions? I am using the HiveMQ broker.
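For context, the missing piece in a setup like this is usually a small bridge process that subscribes to the broker and writes each incoming message into the database. Below is a minimal sketch of such a bridge, assuming Python with the paho-mqtt 1.x and PyMySQL packages, a hypothetical topic cc3200/values, and a readings table in a MySQL database served by XAMPP; none of these names come from the question.

```python
# Minimal MQTT-to-MySQL bridge sketch.
# Assumptions (not from the question): broker on localhost:1883, topic
# "cc3200/values", a database "sensordata" with a table
# readings(value DOUBLE, received_at TIMESTAMP).
import pymysql
import paho.mqtt.client as mqtt

db = pymysql.connect(host="localhost", user="root", password="", database="sensordata")

def on_connect(client, userdata, flags, rc):
    # Subscribe once the connection to the broker is established.
    client.subscribe("cc3200/values")

def on_message(client, userdata, msg):
    # Dump every incoming payload into the readings table.
    with db.cursor() as cur:
        cur.execute(
            "INSERT INTO readings (value, received_at) VALUES (%s, NOW())",
            (msg.payload.decode(),),
        )
    db.commit()

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883)
client.loop_forever()
```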
If your primary goal is to have some telemetry data from CC3200 stored in the database, I would suggest that you take a look at this webinar. You can configure Kaa server to use one of multiple existing log appenders to publish your data to Spark, Cassandra, MongoDB, HDFS, Couchbase, etc. There are several major benefits of doing data collection with Kaa:
All of the data is structured end-to-end. You define the telemetry data model in the Kaa UI, which translates into Avro-compatible schemas and generates object bindings in the Kaa SDK. Instead of writing boilerplate code for data marshalling, you just invoke SDK functions like this: kaa_logging_add_record(kaa_client_get_context(kaa_client)->log_collector, log_record); where log_record is a structure auto-generated by Kaa based on your data model. On the other end, in your analytics system, you receive structured data that you can immediately start processing and querying; there is no need for custom interpretation code, as it is auto-generated for you.
You can write to several destinations simultaneously: for example, save telemetry data into HDFS for warehousing, send to Spark for stream analytics, and push to your custom data processing/visualization service with REST. All of this is configurable by adding log appenders through the Kaa administrative UI.
Kaa takes care of the data delivery reliability and consistency. You can set up one or more reliable log appenders. It is not until all of the configured reliable appenders acknowledge a successful write that the client is instructed to remove the local data copy.
The Kaa server is scalable and reliable out of the box. There is no single point of failure in the cluster. You can add more server capacity on the fly by spinning up more nodes: they register with ZooKeeper and the cluster automatically rebalances the load. If a node fails, the clients automatically migrate to the remaining nodes.
Kaa is transport agnostic, so you can plug in pretty much any transport protocol implementation you like, including MQTT. The default protocol is similar to MQTT in the amount of overhead it introduces.
The integration instructions specifically for CC3200 are being prepared for the upcoming 0.8.0 release here.
Disclaimer: I work for the company behind the Kaa open-source IoT platform.
FIWARE offers context providers to fetch data from external sources for entities that are queried through the Context Broker.
With QuantumLeap, historical data can be stored in a time series database such as CrateDB.
Is it possible to combine these two concepts? When querying for historical data in a QuantumLeap setup, could some data instead be fetched from another database via a registered context provider (or a similar proxy implementation)? Preferably using out-of-the-box FIWARE components without too much custom magic.
You can always send the information to QuantumLeap using the proper protocol. For this purpose, the FIWARE ecosystem defines IoT Agents that translate common transport protocols and payload formats into the NGSI protocol. Additionally, there is another component, FIWARE Draco, that can be used to translate information from one database to another. In the case of QuantumLeap, as with any other FIWARE component, the normal way of working is through a subscription to the source of data; this is the main reason the FIWARE Orion Context Broker always appears in these architectures, since its purpose is to notify subscribers of any update to the context information.
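As an illustration of the subscription-based pattern described above, here is a minimal sketch that registers an NGSIv2 subscription in Orion so that entity updates are notified to QuantumLeap. The entity type "Sensor", the host names, and the ports are assumptions for the example, not values from the original answer.

```python
# Sketch: create an NGSIv2 subscription in Orion so that every update to
# entities of a (hypothetical) type "Sensor" is notified to QuantumLeap,
# which then persists the samples as time series (e.g. in CrateDB).
import requests

ORION = "http://localhost:1026"                   # assumed Orion Context Broker endpoint
QL_NOTIFY = "http://quantumleap:8668/v2/notify"   # QuantumLeap notification endpoint

subscription = {
    "description": "Feed Sensor updates into QuantumLeap",
    "subject": {"entities": [{"idPattern": ".*", "type": "Sensor"}]},
    "notification": {"http": {"url": QL_NOTIFY}},
    "throttling": 1,
}

resp = requests.post(f"{ORION}/v2/subscriptions", json=subscription)
resp.raise_for_status()
print("Subscription created:", resp.headers.get("Location"))
```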
What I have:
A lot of different microservices managed by different teams. All of them persist data in an Aerospike database.
What I want to achieve:
I'm building a new microservice that relies on data handled by other services. I want to listen for changes in entities, but unfortunately those microservices don't put anything on a message queue; they only have the usual REST APIs, so I can't just subscribe to events.
The idea is to listen to the database's transaction log (event log/commit log/WAL). This approach is also used in various Event Sourcing systems, but I can't find any Aerospike API that would stream this log. So the question: does Aerospike provide any similar functionality, perhaps under a different name?
Aerospike, in its Enterprise Edition, has a feature called the change notification framework, which may fit your requirements. It informs an external agent about all write operations. This is built on top of the XDR functionality, which is meant for replication across data centers using a digestlog.
If you are not planning to go with Enterprise, you should consider having your own message queue in front of Aerospike.
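A minimal sketch of the "message queue in front of Aerospike" idea: the service that owns the write publishes a change event to Kafka alongside each Aerospike put, so that other microservices can subscribe to the changes. The namespace, set, topic, and record layout below are hypothetical, and the Python aerospike and kafka-python clients are assumed.

```python
# Sketch: emit a change event to Kafka next to every Aerospike write.
# Assumptions: namespace "test", set "users", topic "user-changes",
# local Aerospike and Kafka instances.
import json
import aerospike
from kafka import KafkaProducer

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def save_user(user_id, bins):
    key = ("test", "users", user_id)
    client.put(key, bins)                                          # persist in Aerospike as before
    producer.send("user-changes", {"id": user_id, "bins": bins})   # emit change event for subscribers
    producer.flush()

save_user("42", {"name": "Alice", "status": "active"})
```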
I need a mechanism to send data from Node-RED to be stored in HDFS (Hadoop).
I prefer the data to be streamed. I am thinking about using the 'websocket out' node to write the data and a Flume agent to read it.
I am new to node-red.
Could you please let me know whether I am heading in the right direction, and clarify with some details if I am not? Any alternative approach would also be fine.
Update: Node-RED offers a 'bluemixhdfs' node, but it is exclusively tied to IBM Bluemix, whereas I am using vanilla Hadoop.
I recently had a similar issue in a small project of mine, so I will explain my approach.
A little background: In the application, I had to do some processing on real-time streaming data from different data sources. At the same time, I also needed to store the streaming data for future processing.
I used the Apache Kafka message broker as an integration agent between Node-RED and HDFS (and also for the Apache Spark Streaming processing engine).
In Node-RED, I used Kafka node to publish streaming data from different data sources to separate topics in Kafka.
[Figure: Node-RED flow with streaming data sources and Apache Kafka]
An HDFS Sink Connector, a Kafka Connect component, is then used to store the streaming data in HDFS.
[Figure: Flow architecture for Node-RED to HDFS and Spark Streaming using the Kafka message broker]
This approach can also be adopted when many streaming data sources (IoT sensors, stock market data, social media data, weather APIs, etc.) are to be connected in a single flow using Node-RED, and you want to use HDFS to store the data for further processing.
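For reference, a Kafka Connect HDFS Sink configuration for this kind of flow could look roughly like the sketch below. Confluent's HDFS connector running in standalone mode is assumed, and the topic name and NameNode address are placeholders, not values from the answer.

```properties
# Sketch of a Kafka Connect HDFS Sink connector configuration.
# Assumptions: topic "sensor-data" published from Node-RED, NameNode at
# hdfs://namenode:8020 — adjust names to your environment.
name=node-red-hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=sensor-data
hdfs.url=hdfs://namenode:8020
flush.size=1000
```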
I'm afraid that I'm not a Hadoop expert and so probably can't provide an answer directly. However it looks like Kafka supports websockets and this should be reasonably performant.
Depending on your architecture though, you should pay some attention to websocket security. Unless NR and Hadoop are both on a private secured network, websockets may be tricky to secure properly.
I think that websocket performance would be reasonable as long as the data size per transaction isn't too large (kB rather than GB). You will need to do some testing though as there are too many factors influencing the performance of Node-RED to easily predict whether it will have the performance you require.
Node-RED supports a great many types of connectivity so if websockets don't work in your architecture, there are plenty of others such as UNIX pipes, TCP or UDP connections.
I have gone through the official documentation at https://www.elastic.co/blog/found-interfacing-elasticsearch-picking-client
But it does not give any benchmarks or performance numbers to help choose among the clients. And I am finding it non-trivial to set up a TransportClient or a NodeClient because the documentation for those is also really sparse, with little to no examples whatsoever.
So if someone has already done some benchmarking on choosing a client, I would really appreciate it; I'd rather focus on tuning an established client than on evaluating which client to choose.
Our application is a write-heavy application and we plan to have a 50-shard, 50-replica ES cluster for that.
All of those clients are fine for querying, and they all have their pros and cons (the list below is not exhaustive):
A Node client provides a single hop into the cluster, but since it also becomes part of the cluster, it can induce too much chatter within the cluster
A Transport client is not part of the cluster, hence requires a two-hop roundtrip, and communicates with a single node at a time in a round-robin fashion (from the list provided during its construction)
Jest is basically the missing client for the ES REST interface
If you feel like you don't need everything Jest has to offer and simply want to interact with a few endpoints, you might as well create your own REST client using Spring's RestTemplate, Apache HttpClient, etc.
If you're going to have a write-heavy application, I suggest you don't use any of those clients at all. The main reason is that they are all synchronous in nature, and if any component of your architecture or the network were to fail for some reason, you'd lose data, which might not be an option for you.
If you have plenty of data to ingest, you normally go the asynchronous way, i.e. storing your data in a temporary (yet durable) queue (Kafka, Redis, JMS, etc) and then let another process stream it to ES. There are many ways to do that, but a very simple one is to use Logstash for that.
Whether you decide to store your data in Kafka, JMS, or Redis, you can then let Logstash consume it and stream it to ES, i.e. you let Logstash worry about the heavy write part, which it does very well. That can be achieved very easily (see the pipeline sketch after this list) with
a kafka or redis or stomp input
a few filters to massage your data
an elasticsearch output to forward the resulting data to ES via the bulk endpoint.
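A minimal Logstash pipeline along these lines might look like the sketch below; the Kafka topic, index name, and date filter are assumptions for illustration, not part of the original answer.

```
# Sketch of a Logstash pipeline for the Kafka -> Elasticsearch path described
# above (assumed topic "app-events", local Kafka and ES; adjust to your setup).
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["app-events"]
    codec => "json"
  }
}
filter {
  # massage the data here, e.g. normalize timestamps
  date {
    match => ["timestamp", "ISO8601"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "app-events-%{+YYYY.MM.dd}"
  }
}
```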
With that kind of well-tuned setup, you can handle very heavy write loads without needing to worry about which client to use and how to tune it. The question is still open for querying, but since the write part is paramount in your case, you need to make it solid, and the only serious way is to go asynchronous and let a well-developed and tested ETL (such as Logstash, fluentd, etc.) do it for you.
UPDATE
It is worth noting that as of ES 5.0, there will be a new Java REST client available.
I have different environments across a few cloud providers: Windows servers and Linux servers on Rackspace, AWS, etc. There is a firewall between these and the internal network.
I need to build a real-time setup where all newly generated IIS and Apache logs are synced to an internal big data environment.
I know there are tools like Splunk or Sumo Logic that might help, but we are required to implement this with open-source technologies. Because of the firewall, I am assuming I can only pull the logs rather than have the cloud providers push them.
Can anyone share the rule of thumb or common architecture for syncing large volumes of logs in NRT (near real time)? I have heard of Apache Flume and Kafka and am wondering whether those are required, or whether it is just a matter of using something like rsync.
You can use rsync to get the logs, but you can't analyze them the way Spark Streaming or Apache Storm does.
You can go ahead with one of these two options.
Apache Spark Streaming + Kafka
OR
Apache Storm + Kafka
Have a look at this article about integration approaches of these two options.
Have a look at this presentation, which covers an in-depth analysis of Spark Streaming and Apache Storm.
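To make the first option more concrete, here is a minimal sketch of a Spark Streaming job reading from Kafka. It assumes Spark 2.x with the spark-streaming-kafka-0-8 package on the classpath and a hypothetical topic carrying the shipped log lines; none of these names come from the answer.

```python
# Sketch of the "Spark Streaming + Kafka" option.
# Assumptions: Kafka topic "weblogs" carrying the shipped IIS/Apache log lines,
# a local Kafka broker, and 10-second micro-batches.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="NRTLogProcessing")
ssc = StreamingContext(sc, 10)  # 10-second batch interval

stream = KafkaUtils.createDirectStream(
    ssc, ["weblogs"], {"metadata.broker.list": "localhost:9092"}
)

# Each record is a (key, value) pair; count log lines per batch as a
# placeholder for real analysis.
stream.map(lambda kv: kv[1]) \
      .count() \
      .pprint()

ssc.start()
ssc.awaitTermination()
```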
Performance depends on your use case. Spark Streaming is 40x faster than Storm processing. But if you add "reliability" as a key criterion, then data should be moved into HDFS first before being processed by Spark Streaming, which reduces the final throughput.
Reliability Limitations: Apache Storm
Exactly once processing requires a durable data source.
At least once processing requires a reliable data source.
An unreliable data source can be wrapped to provide additional guarantees.
With durable and reliable sources, Storm will not drop data.
Common pattern: Back unreliable data sources with Apache Kafka (minor latency hit traded for 100% durability).
Reliability Limitations: Spark Streaming
Fault tolerance and reliability guarantees require HDFS-backed data source.
Moving data to HDFS prior to stream processing introduces additional latency.
Network data sources (Kafka, etc.) are vulnerable to data loss in the event of a worker node failure.