Connect to torrent "swarm" or DHT with ruby

Connect to torrent "swarm" or DHT with ruby - ruby

I'm probably lacking in some fundamental understanding of how BitTorrent, DHT, and the "swarm" work, as I'm not even sure if DHT and the "swarm" are one and the same.
However, I'm trying to find peers, the number of peers, and some statistics about a torrent from its magnet link (and hash).
I've looked for some libraries to accomplish this, but they seem either outdated or unrelated, or just bencode things.
How do I go about connecting and requesting information? A brief explanation would be delightful.

official specs: http://www.bittorrent.org/beps/bep_0000.html
non-official spec: http://wiki.theory.org/BitTorrentSpecification
BEncoding is a data serialization format used in bittorrent for several purposes
The DHT is a global, decentralized, iterative, UDP-based lookup system which can be used to locate clients participating in a particular swarm based on an infohash, which can be either obtained directly from a magnet link or calculated from a .torrent metadata file.
If you have the announce URL for a tracker (an optional part of the torrent file or the magnet link) you can obtain client addresses directly from a tracker.
Once you have obtained client addresses for a particular swarm you can connect to them - or they will connect to you if you have announced yourself to the DHT/a responsible tracker - using the bittorrent wire protocol, which is basically an asynchronous, binary messaging protocol.
To get a fully-functioning, state-of-the-art bittorrent client you have to implement the following:
a DHT node (bencoding over UDP)
tracker announce and scrape protocol (uses bencoding over HTTP and a custom binary prtocol over UDP)
the bittorrent wire protocol (custom binary protocol over TCP with an optional extension layer. some of the messages are bencoded. A congestion-avoiding UDP-based transport protocol called µTP also exists as alternative to TCP)
a torrent file parser (bencoding, obviously)
general stuff like queueing active torrents, file management, highly concurrent network IO
This is a lot of work which to my knowledge has not been done in ruby. So either you have a lot ahead of yourself or you might want to use a bittorrent library written in a different language (e.g. libtorrent) or interface with a client providing a web service (e.g. transmission).

Related

Using gRPC as a IoT protocol instead of LWM2M/CoAP

I have been toying with the idea of using gRPC for 'IoT' type devices; not very constrained things like sensors; more like single board computer inbuilt devices like robots, drones and the like. What is needed a interface between device and centralized controller as the devices are developed separately and possible by other companies. So a versioned interface language is a must; and it should not be just in a word document; something programmable like a header file or WSDL or IDL or ProtocolBuffer. Also between device and controller the behavior should be specified for common use case like registration, re-registration etc.This can be in word file or in some non technical document.
The (rpc) interface specification in Protocol Buffer (ver 3) along with the efficient implementation via gRPC is leading me to choose this over CoAP/ LWM2M (Leshan Java and C++ implementation).
Having used both LWM2M and grPC, I would say that gRPC is more developer friendly; interface definition and implementation is fast,compared to OMA LWM2M process.Of course there is no Observer-Notify in gRPC, but for that MQTT should suffice.
Strictly one cannot compare LWM2M to gRPC. LWM2M is not just the interface but defines behavior in lot of IoT cases like BootStrap, Registration, KeepAlive , SW Upgrade etc and its universal HTTP like GET, PUT on an URL type addressable resource makes it very complete. However most of these behaviors can be custom defined with some effort.
Some of the IoT things which we plan to orchestrate are far from little brained devices like bulbs, and more like robots. Has anyone used gRPC for similar purposes. Any success of failure stories to share

The thing I would miss in gRPC for IoT are the MQTT MQ capabilities like queueing of messages, broker bridging QoS Parameter. Or for CoAP the Out of Band messages over SMS Transport. In this context gRPC is "just" an RPC framework which assumes to be always connected over TCP.

We were thinking in the same solution as Joe's, which includes CoAP + Protobuf:
Protobuf & its IDL are a really helpful way to define and govern our APIs in a centralized way. Google's api-linter as well as AIPs are high quality resources for keeping everything sane.
CoAP for transport: Lightweight protocol for IoT devices. It's available in practically every RTOS/embedded/platform and it's adapted to the low bandwitdh we usually face in this world. Being the protocol chosen by Thread and the support Project Connected Home Over IP is giving to it does also help.
Use Protobuf for the messages no matter they are Request/Response or even event messages pushed by the IoT device. An already solved problem by nanopb which is a protobuf wire compatible C code generator for embedded systems.
Generate our own stubs based on the IDL to create our own wrapper over CoAP and nanopb code. Unary calls are supported and also server streaming by leveraging this to the CoAP observable mechanisms.

I have taken the plunge and used it in a project with connected 'Devices'; These are small computers like Raspberry-pi. Overall it has been good experience; and languages used are C++ and Java mainly and also JavaScript in Node.js . We have used this as Dockerized microservices; Load Balancing is what we have not done; and I read that HTTP/2 based load balancing is tricky; Will update that part; planning to use Kubernetes for that. Overall Container technology with versioned interfaces - GRPC seems like a good fit for (micro) services

I use an esp32 and R_Pi with CoAP and protobufs. As far as I know gRPC is not supported for esp32/8266. I'm pretty happy with it, but didn't do any concrete testing against lwm2m. Implementation is here

Is there a framework for writing a decentral application?

If I want to write a node for a P2P application (like Bitcoin, Bitorrent, etc.) there are a lot of parts that are the same:
I need to bootstrap to the network (discover other peers)
I need to manage a list of peers, and monitor their states
I need to retrieve lists of more peers from my neighbour peers
Etc, etc.
Since I don't want to re-invent the wheel, is their a framework that I could as a sort of base library to build on?

You mention both bitcoin and bittorrent, which are quite different, so I'm assuming you don't want to be bound to any specific protocol or even serialization format.
And yet you mention peer-discovery and stats-management which are high-level concerns, be built on top of some network protocol.
But the protocol dictates how such a mechanism would work.
It sortof sounds like you're asking if there are pre-built roofs that would fit on skyscrapers just as well as on a wood cabin.
So if you actually want to design your own protocol you probably should look more at the foundation first.
which language do you want to use
what IO / event processing libraries are available
what protocol parsers and serializers are available
do you aim for throughput? low memory footprint? low latency? minimal amount of programmer-hours spent?
what kind of security is needed? heavy crypto use at the protocol level will need a trustworthy crypto library (don't roll your own!)
what kind of auxiliary things do you need (where does the data go? filesystem? databases? do you need a UI?)
Alternatively, depending how one interprets your question, if you want to write a client for a specific network then you should simply look for a library implementing the core concepts of that specific network while freeing you up to implement the rest of the application.
In bittorrent's case such an example would be libtorrent

Existing Event Driven Network Protocols

I am building a set of programs that consist of multiple clients and a single server.
The clients are frequently pushing small packets of data to the server, which will validate the information (returning an error if the data is invalid), and process the received information. The information may then incur the firing of events, which clients will be subscribed to, allowing for clients to be instantly (or as close as possible) notified (along with a small amount of data).
I have some ideas about how to do this, but I am trying to avoid creating a protocol of my own, mainly as I'm sure it would take forever and I would probably make a few errors. So I was wondering if there are any existing protocols that I could implement into my system that would provide such functionality.
The number of clients will initially be quite small, but will be growing over time to potentially include 1000's of clients (with their own subscriptions), and several front end servers (each one handling a subset of subscriptions) parsing the information back and forth with back end servers for improved capability.
So, if anyone knows of any existing protocols that implement these requirements and functionality, that would be fantastic.
EDIT
I am currently looking at the XMPP protocol, and the JXTA protocol suite (for reference, and implement with another language). Both seem quite good and provide the necessary connectivity, but I have not had the opportunity to test each of them out in my environment, or if they are even suitable for what I am attempting.
Additionally, some of the network clients will be outside of the local network and operating over WAN. Security is not so much of an issue, but I need to take into account the increase latency of this, and firewall rules (local to the connection that is hosting the application and ISP firewalls) that could be blocking certain ports or transport protocols (I have read some text that said that some ISPs where blocking UDP packets, but not sure of how wide this goes. I can do it at home, the office, mobile, friends houses, etc and have yet to experience it myself).

I'm sorry if the following is not exactly what you're after but I am slightly confused by your use of the word 'protocol'. I understand a protocol to be a 'communication specification' only, where the implementation is left entirely to you. If that is the case I always find the the following graphic usefull, link.
If on the other hand you are looking for a solution which allows you to easily implement the networking side of your application, helping save time, then checkout the following network libraries, which implement their own custom protocol:
NetworkComms.Net
Lidgren
ZeroMQ
Disclaimer: I'm a developer for NetworkComms.Net

Faye vs. Socket.IO (and Juggernaut)

Socket.IO seems to be the most popular and active WebSocket emulation library. Juggernaut uses it to create a complete pub/sub system.
Faye is also popular and active, and has its own javascript library, making its complete functionality comparable to Juggernaut. Juggernaut uses node for its server, and Faye can use either node or rack. Juggernaut uses Redis for persistence (correction: it uses Redis for pub/sub), and Faye only keeps state in memory.
Is everything above accurate?
Faye says it implements Bayeux -- i think Juggernaut does not do this -- is that because Juggernaut is lower level (IE, I can implement Bayeux using Juggernaut)
Could Faye switch to using the Socket.IO browser javascript library if it wanted to? Or do their javascript libraries do fundamentally different things?
Are there any other architectural/design/philosophy differences between the projects?

Disclosure: I am the author of Faye.
Regarding Faye, everything you've said is true.
Faye implements most of Bayeux, the only thing missing right now is service channels, which I've yet to be convinced of the usefulness of. In particular Faye is designed to be compatible with the CometD reference implementation of Bayeux, which has a large bearing on the following.
Conceptually, yes: Faye could use Socket.IO. In practise, there are some barriers to this:
I've no idea what kind of server-side support Socket.IO requires, and the requirement that the Faye client (there are server-side clients in Node and Ruby, remember) be able to talk to any Bayeux server (and the Faye server to any Bayeux client) may be deal-breaker.
Bayeux has specific requirements that servers and clients support certain transport types, and says how to negotiate which one to use. It also specifies how they are used, for example how the Content-Type of an XHR request affects how its content is interpreted.
For some types of error handling I need direct access to the transport, for example resending messages when a client reconnects after a Node WebSocket dies.
Please correct me if I've got any of this wrong - this is based on a cursory scan of the Socket.IO documentation.
Faye is just pub/sub, it's just based on a slightly more complex protocol and has a lot of niceties built in:
Server- and client-side extensions
Wildcard pattern-matching on channel routes
Automatic reconnection, e.g. when WebSockets die or the server goes offline
The client works in all browsers, on phones, and server-side on Node and Ruby
Faye probably looks a lot more complex compared to Juggernaut because Juggernaut delegates more, e.g. it delegates transport negotiation to Socket.IO and message routing to Redis. These are both fine decisions, but my decision to use Bayeux means I have to do more work myself.
As for design philosophy, Faye's overriding goal is that it should work everywhere the Web is available and should be absolutely trivial to get going with. I'ts really simple to get started with but its extensibility means it can be customized in quite powerful ways, for example you can turn it into a server-to-client push service (i.e. stop arbitrary clients pushing to it) by adding authentication extensions.
There is also work underway to make it more flexible on the server side. I'm looking at adding clustering support, and making the core pub-sub engine pluggable so you could use Faye as a stateless web frontend for another pub-sub system like Redis or AMQP.
I hope this has been helpful.

AFAIK, yes, apart from the fact Juggernaut only uses Redis for Pubsub, not persistence. Also means client libraries in most languages have already been written (since it just needs a Redis adapter).
Juggernaut doesn't implement Bayeux, but rather has a very simple custom JSON protocol
I don't know, but probably
Juggernaut is very simple, and designed to be that way. Although I haven't used Faye, from the docs it looks like it has a lot more features than just PubSub. Being built on top of Socket.IO has it advantages too, Juggernaut's supported in practically every browser, both desktop and mobile.
I'll be really interested in what Faye's author has to say. As I say, I haven't used it and it would be great to know how it compares to Juggernaut. It's probably the case of using the best tool for the job. If it's pubsub you need, Juggernaut does that very well.

Faye certainly could.
Another example of a similar project on top of Socket.IO:
https://github.com/aaronblohowiak/Push-It

How to extract information from client/server communication with no documentation?

What are methods for undocumented client/server communication to be captured and analyzed for information you want and then have your program looking for this information in real time? For example, programs that look at online game client/server communication and get information and use it to do things like show location on a 3rd party map, etc.

Wireshark will allow you to inspect communication between the client-server (assuming you're running one of them on your machine). As you want to perform this snooping in your own application, look at WinPcap. Being able to reverse engineer the protocol is a whole other kettle of fish, mind.

In general, wireshark is an excellent recommendation for traffic/protocol analysis- however, you seem to be looking for something else:
For example, programs that look at online game client/server communication and get information and use it to do things like show location on a 3rd party map, etc.
I assume you are referring to multiplayer games and game servers?
If so, these programs are usually using a dedicated service connection to query the corresponding server for positional updates and other meta information on a different port, they don't actually intercept or inspect client/server communciations at realtime, and they don't really interfere with these updates, either.
So, you'll find that most game servers provide support for a very simply passive connection (i.e. output only), that's merely there for getting certain runtime state, which in turn is often simply polled by a corresponding external script/webpage.
Similarly, there's often also a dedicated administration interface provided on a different port, as well as another one that publishes server statistics, so that these can be easily queried for embedding neat stats in webpages.
Depending on the type of game server, these may offer public/anonymous use, or they may require certain credentials to access such a data port.
More complex systems will also allow you to subscribe only to specific state and updates, so that you can dynamically configure what data you are interested in.
So, even if you had complete documentation on the underlying protocol, you wouldn't really be able to directly inspect client/server communications without being in between these communications. This can however not be easily achieved. In theory, this would basically require a SOCKS proxy server to be set up and used by all clients, so that you can actually inspect the communications going on.
Programs like wireshark will generally only provide useful information for communications going on on your own machine/network, and will not provide any information about communications going on in between machines that you do not have access to.
In other words, even if you used wireshark to a) reverse engineer the protocol, b) come up with a way to inspect the traffic, c) create a positional map - all this would only work for those communications that you have access to, i.e. those that are taking place on your own machine/network. So, a corresponding online map would only show your own position.
Of course, there's an alternative: you can emulate a client, so that you are being provided with server-side updates from other clients, this will mostly have to be in spectator mode.
This in turn would mean that you are a passive client that's just consuming server-side state, but not providing any.
So that you can in turn use all these updates to populate an online map or use it for whatever else is on your mind.
This will however require your spectator/client to be connected to the server all the time, possibly taking up precious game server slots.
Some game servers provide dedicated spectator modes, so that you can observe the whole game play using a live feed. Most game servers will however automatically kick spectators after a certain idle timeout.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio