I recently had to implement an FTP client (in active mode). Something I found remarkable in RFC 959 is that the port number has to be split into two 8-bit values for the PORT command.
An example: when using port 20000 on the client, the number has to be split at the byte boundary. 20000 base 10 = 0100111000100000 base 2. This is split into 01001110 and 00100000, which are 78 and 32 respectively. These two numbers are then sent as plain-text decimal digits.
Is there any reason why the standard chose this approach? It seems odd from both an efficiency and an ease-of-debugging standpoint.
Is there any reason why the standard chose this approach?
This is likely lost in history, but the typical IP:port notation used today was probably not established at that time (this was way before HTTP and the syntax of URLs), so encoding a sockaddr_in, with its 4-byte IP and 2-byte port, as a sequence of 6 numbers delimited by commas probably made some sense.
It seems odd from both an efficiency and an ease-of-debugging standpoint.
FTP is a text-based protocol. Efficiency was obviously not a design criterion - otherwise it would have been done all in binary. Having a sequence of 6 bytes instead of IP:port is fine for debugging if the layer where the debugging is done is C code and you are effectively dealing with 6-byte addressing (4-byte IP, 2-byte port) in the form of a sockaddr_in struct.
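For concreteness, here is a minimal sketch (Python, purely illustrative; the IP address is just an example) of how a client builds the PORT argument: the four IP octets plus the port split into its high and low byte, all sent as comma-separated decimal text.

```python
# Minimal sketch of building the PORT argument from RFC 959:
# PORT h1,h2,h3,h4,p1,p2 where p1/p2 are the high and low byte of the port.

def port_command(ip: str, port: int) -> str:
    hi, lo = port >> 8, port & 0xFF              # 20000 -> 78, 32
    return "PORT {},{},{}\r\n".format(ip.replace(".", ","), hi, lo)

print(port_command("192.168.1.10", 20000))       # PORT 192,168,1,10,78,32
```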
I have one more query about the IPsec anti-replay window, using an example. The window size is 64, so the window initially covers sequence numbers 1 to 64. Suppose the receiver has received every sequence number except seq no 3, and then receives seq no 68, so the window shifts 4 to the right: Top = 68, Bottom = 5. Now, in this case,
the first question is:
Will the window shift by 4? I think yes, but I'd like input on this.
If yes, what happens to the slot for seq no 3, which was never received (never marked)? If seq no 3 arrives later, then seq no < Bottom, so the packet should be dropped, right? Could someone please share their input on this?
NOTE: I am using odp-dpdk as the data-plane engine here; Linux does not come into play.
I didn't quite understand your first question, but yes, the bottom limit becomes 5 now. If you receive a packet with sequence number 3 after that, then the packet will be dropped.
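To make the window arithmetic concrete, here is a simplified sketch (Python, not odp-dpdk code; class and variable names are my own) of the usual 64-entry anti-replay check:

```python
WINDOW = 64

class AntiReplay:
    def __init__(self):
        self.top = 0    # highest sequence number accepted so far
        self.mask = 0   # bit i set => sequence number (top - i) was seen

    def check(self, seq: int) -> bool:
        """Return True if the packet is accepted, False if it is dropped."""
        if seq > self.top:                               # window slides right
            shift = seq - self.top
            self.mask = ((self.mask << shift) | 1) & ((1 << WINDOW) - 1)
            self.top = seq
            return True
        if seq <= self.top - WINDOW:                     # below Bottom: drop
            return False
        bit = 1 << (self.top - seq)
        if self.mask & bit:                              # already seen: drop
            return False
        self.mask |= bit                                 # mark as seen
        return True

ar = AntiReplay()
for s in range(1, 69):          # receive 1..68, but never receive 3
    if s != 3:
        ar.check(s)
print(ar.top)                   # 68, so Bottom = 68 - 64 + 1 = 5
print(ar.check(3))              # False: 3 < Bottom, the late packet is dropped
```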
There's no retransmission mechanism in IPsec; the upper-layer protocols need to take care of missing packets. For example, TCP will retransmit a packet that hasn't been acknowledged within a time frame. At the IPsec layer, that packet will simply get encrypted and transmitted again; IPsec won't even care that it's a retransmission.
I have two machines, one Mac and one Linux box, on the same local network. I tried transferring files using one of them as an HTTP server, and it turned out the download speeds were quite different depending on which one was the server. With the Mac as the server, the download speed was around 3MB/s, but the other way around it was about 12MB/s. Then I used iperf3 to test the speed between them and got a similar result:
When Mac was the server and Linux the client:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 28.7 MBytes 2942 KBytes/sec 1905 sender
[ 5] 0.00-10.00 sec 28.4 MBytes 2913 KBytes/sec receiver
When Linux was the server and Mac the client:
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 162 MBytes 16572 KBytes/sec sender
[ 4] 0.00-10.00 sec 161 MBytes 16526 KBytes/sec receiver
I asked a friend to do the download test for me, and he told me the speeds were both around 1MB/s on his two Macs, which was far from the router's capacity. How could this happen?
This isn't going to be much of an answer, but it will probably be long enough that it is not going to fit into a comment.
Your observation of "bogus TCP header length" is very interesting; I have never seen it in a capture before, so I wanted to check out exactly what it means. It turns out that the Wireshark TCP protocol dissector can't make any sense of the segment, because the value in the TCP header-length field is less than the minimum TCP header length.
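Concretely, the check the dissector performs is simple; this little sketch (Python, purely illustrative, function names are mine) reproduces it. The TCP data-offset field lives in the upper 4 bits of byte 12 of the TCP header and is counted in 32-bit words.

```python
def tcp_header_length(tcp_header: bytes) -> int:
    data_offset_words = tcp_header[12] >> 4   # upper nibble of byte 12
    return data_offset_words * 4              # header length in bytes

def looks_bogus(tcp_header: bytes) -> bool:
    """Wireshark flags 'bogus TCP header length' when this is below 20 bytes."""
    return tcp_header_length(tcp_header) < 20

corrupted = bytes(12) + bytes([0x30]) + bytes(7)  # data offset = 3 -> 12 bytes
print(looks_bogus(corrupted))                     # True
```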
So it seems you have an invalid TCP segment. The only two causes I know of are that it was somehow erroneously constructed (i.e. a bug or an intrusion attempt) or that it was corrupted.
I have certainly created plenty of invalid segments when working with raw sockets, and I have seen plenty of forged segments that were not standards conforming, but this doesn't seem likely to be the case in your situation.
So, based on my experience, it seems most likely that it was somehow corrupted. Although if it was a transmitted packet in the capture, then you are actually sending an invalid segment; in what follows, I'm assuming it was a received segment.
So where could it have been corrupted? The first mystery is that you are seeing it at all in a capture. If it had been corrupted in the network, the Frame Check Sequence (FCS, a CRC) shouldn't match, and it should have been discarded.
However, it is possible to configure your NIC/driver to deliver frames with an invalid FCS. On Linux you would check/configure these settings with ethtool; the relevant parameters are rx-fcs and rx-all (sorry, I don't know how to do this on a Mac). If those are both "off", your NIC/driver should not be handing you frames with an invalid FCS, and hence they wouldn't appear in a capture.
Since you are seeing the segments with an invalid TCP header length in your capture, and assuming your NIC/Driver is configured to drop segments with an invalid FCS, then your NIC saw a valid segment on the wire, and the segment was either corrupted before the FCS was calculated by a transmitter (usually done in the NIC), or corrupted after the FCS was validated by the receiving NIC.
In both these cases, there is a DMA transfer over a bus (e.g. PCI-e) between CPU memory and the NIC. I'm guessing there is a hardware problem causing corruption here, but I'm not so confident in this guess as I have little information to go on.
You might try getting a capture on both ends to compare what is transmitted to what is received (particularly in the case of segments with invalid TCP header lengths). You can match segments in the captures using the ID field in the IP header (assuming that doesn't get corrupted as well).
Good luck figuring it out!
Text frames (opcode = 0x01) and binary frames (opcode = 0x02) are defined in RFC 6455. What's the difference between them, and which one is faster?
For some background, and perhaps more familiarity from other realms: HTTP/1 relied on an unstructured plaintext protocol, while HTTP/2 allowed for faster processing of messages through binary framing. Similarly, SMTP is a text protocol, while TCP is a binary protocol. To sum up, text data is readable ASCII ('Hello'), while binary data, although a confusing term, has no readable representation. In JavaScript, socket.send(new ArrayBuffer(8)) is one example of a basic binary object that can be sent; it allocates a contiguous memory area of 8 bytes and pre-fills it with zeroes.
Hopefully this provides some good context.
0x01 is a hexadecimal number (the 0x prefix denotes hex; it represents the integer 1). The opcode is a 4-bit field that tells the receiver (client or server) what type of "frame" it will receive: UTF-8 text is denoted by 0x01 and raw binary by 0x02.
Raw binary will generally be faster. Imagine an architecture in which the client talks over a WebSocket to a proxy, which passes the data on to a TCP server.
If we send textual data (opcode 0x01) over the WebSocket, the received data must be translated at the proxy, and before the TCP server's response reaches the client it may have to be translated back into textual data. Encoding binary data as ASCII text via Base64 also increases the size of the message by roughly a third.
With an opcode of 0x02, we can skip two steps in the whole request/response cycle and reduce the size of the message we are passing. In short, we skip the interpretation. Moreover, the WebSocket spec has UTF-8 validation rules for text frames. The difference between text and binary might be 2x in favor of binary.
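To make the size penalty tangible, here's a quick check (Python, my own illustration): Base64 turns every 3 bytes into 4 ASCII characters.

```python
import base64
import os

payload = os.urandom(3000)            # pretend this is binary application data
encoded = base64.b64encode(payload)   # what you'd have to put in a text frame
print(len(payload), len(encoded))     # 3000 4000 -> about 33% overhead
```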
I was just wondering why a byte is 8 bits. Specifically, if we talk about the ASCII character set, all of its symbols can be represented in just 7 bits, leaving one spare bit (given that 8 bits make 1 byte). So suppose there is a big company in which everyone has agreed to use only the ASCII character set and nothing else (and this company doesn't have to deal with the outside world). Couldn't the developers in this company write software that treats 7 bits as 1 byte, and hence save one precious bit per character? If they did, they could save, for instance, 10 bits for every 10 bytes (where 1 byte is 7 bits again), and ultimately lots and lots of precious space. The hardware (hard disk, processor, memory) used in this company would specifically know that it needs to store and group 7 bits as 1 byte. If this were done globally, couldn't it revolutionise the future of computers? Can such a system be developed in reality?
Won't this be more efficient?
A byte is not necessarily 8 bits. A byte is a unit of digital information whose size is processor-dependent. Historically, the size of a byte equalled the size of a character as specified by the character encoding the processor supported. For example, a processor that supports Binary-Coded Decimal (BCD) characters defines a byte to be 4 bits, and a processor that supports ASCII defines a byte to be 7 bits. The reason for using the character size to define the size of a byte is to make programming easier, considering that a byte has always (as far as I know) been the smallest addressable unit of data storage. If you think about it, you'll find that this is indeed very convenient.
A byte was defined to be 8 bits in the extremely successful IBM S/360 computer family, which used an 8-bit character encoding called EBCDIC. IBM, through its S/360 computers, introduced several crucially important computing techniques that became the foundation of all later processors, including the ones we use today. In fact, the term byte was coined by Werner Buchholz, a computer scientist at IBM.
When Intel introduced its first 8-bit processor (the 8008), a byte was defined to be 8 bits even though the instruction set didn't directly support any character encoding, thereby breaking the pattern. The processor did, however, provide numerous instructions that operate on packed (4-bit) and unpacked (8-bit) BCD-encoded digits. In fact, the whole x86 instruction set was conveniently designed around 8-bit bytes, and the fact that 7-bit ASCII characters fit in 8-bit bytes came as a free, additional advantage. As usual, the byte is the smallest addressable unit of storage. I would also mention that in digital circuit design it's convenient for the number of wires or pins to be a power of 2, so that every possible value that appears as input or output has a use.
Later processors continued to use 8-bit bytes because it makes it much easier to develop newer designs based on older ones, and it helps keep newer processors compatible with older ones. So, instead of changing the size of the byte, the register, data bus and address bus sizes were doubled each time (we have now reached 64 bits). This doubling allowed existing digital circuit designs to be reused easily, significantly reducing processor design costs.
The main reason why it's 8 bits and not 7 is that it needs to be a power of 2.
Also: imagine what nibbles would look like in 7-bit bytes...
A power of 2 is also ideal (and fast) for conversion to and from hexadecimal.
Update:
What advantage do we get if we have power of 2... Please explain
First, let's distinguish between a BYTE and an ASCII character. They are two different things.
A byte is used to store and process digital information (numbers) in an optimized way, whereas a character is (or should be) only meant for interacting with us humans, because we find it hard to read binary (although in these days of big data, fast internet and big clouds, even servers have started talking to each other in text (XML, JSON), but that's a whole different story...).
As for a byte being a power of 2, the short answer:
The advantage of having powers of 2, is that data can easily be aligned efficiently on byte- or integer-boundaries - for a single byte that would be 1, 2, 4 and 8 bits, and it gets better with higher powers of 2.
Compare that to 7-bit ASCII (or a 7-bit byte): 7 is a prime number, which means only 1-bit and 7-bit values could be stored in an aligned form.
Of course there are a lot more reasons one could think of (for example the layout and structure of the logic gates and multiplexers inside CPUs/MCUs).
Say you want to control the input or output pins on a multiplexer: with 2 control lines (bits) you can address 4 pins, with 3 you can address 8, with 4 -> 16, and so on; the same goes for address lines. So the more you look at it, the more sense it makes to use powers of 2. It seems to be the most efficient model.
As for optimized 7-bit ASCII:
Even on a system with 8-bit bytes, 7-bit ASCII can easily be compacted with some bit-shifting. A class with an operator[] could be created, without any need for 7-bit bytes (and of course, simple compression would do even better).
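For illustration, here is a rough sketch of that idea (in Python rather than C++, and the class name is mine): pack 7 bits per character into ordinary 8-bit bytes and index into the packed data.

```python
class Packed7Bit:
    """Stores ASCII text using 7 bits per character, indexable like a string."""

    def __init__(self, text: str):
        self.length = len(text)
        bits = 0
        for i, ch in enumerate(text):
            code = ord(ch)
            if code > 0x7F:
                raise ValueError("not a 7-bit ASCII character")
            bits |= code << (7 * i)                   # append 7 bits per char
        self.data = bits.to_bytes((7 * self.length + 7) // 8, "little")

    def __getitem__(self, i: int) -> str:
        if not 0 <= i < self.length:
            raise IndexError(i)
        bits = int.from_bytes(self.data, "little")
        return chr((bits >> (7 * i)) & 0x7F)          # extract the i-th group

packed = Packed7Bit("HELLO WORLD!")   # 12 characters -> 11 bytes instead of 12
print(len(packed.data), packed[4])    # 11 O
```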
I'm not even sure if this is possible but I think it's worth asking anyway.
Say we have 100 devices in a network. Each device has a unique ID.
I want to tell a group of these devices to do something by broadcasting only one packet (A packet that all the devices receive).
For example, if I wanted to tell devices 2, 5, 75, 116 and 530 to do something, I would have to broadcast this: 2-5-75-116-530
But this packet can get pretty long if I wanted, for example, 95 of the devices to do something!
So I need a method to reduce the length of this packet.
After thinking for a while, I came up with an idea:
What if I used only prime numbers as device IDs? Then I could broadcast the product of the device IDs of the group I need, and every device would check whether the remainder of the received number divided by its own device ID is 0.
For example, if I wanted devices 2, 3, 5 and 7 to do something, I would broadcast 2*3*5*7 = 210, each device would calculate 210 mod its own ID, and only the devices with IDs 2, 3, 5 and 7 would get 0, so they know they should do something.
But this method is not efficient, because the 100th prime number is 541, the broadcast number can get really big, and the mod calculation can get really hard (the devices have 8-bit processors).
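Just to quantify "really big", here is a quick sketch (Python, only to show the scale) of the product when all 100 prime IDs are addressed at once:

```python
def first_primes(n):
    """First n primes by simple trial division (fine for n = 100)."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

ids = first_primes(100)
product = 1
for p in ids:
    product *= p

print(ids[-1])               # 541, the 100th prime
print(product.bit_length())  # several hundred bits: hopeless on an 8-bit CPU
```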
So I just need a method for the devices to determine if they should do something or ignore the received packet. And I need the packet to be as short as possible.
I tried my best to explain the question; if it's still vague, please tell me and I'll explain more.
You can just use a bit string in which every bit represents a device. Then, you just need a bitwise AND to tell if a given machine should react.
You'd need one bit per device, which would be, for example, 32 bytes for 256 devices. Admittedly, that's a little wasteful if you only need one machine to react, but it's pretty compact if you need, say, 95 devices to respond.
You mentioned that you need the device id to be <= 4 bytes, but that's no problem: 4 bytes = 32 bits = enough space to store 2^32 device ids. For example, the device id for the 101st machine (if you start at 0) is just 100 (0b01100100), which fits in 1 byte. You would use that id to pick which byte of the packet to look at (floor(100 / 8) = 12, i.e. the 13th byte) and bitwise AND that byte against the mask 1 << (100 % 8) = 1 << 4 = 0b00010000.
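Here is a minimal sketch (Python, function names are mine) of that scheme end to end: one bit per device in the payload, and a constant-time test on each device.

```python
def build_payload(device_ids, total_devices=256):
    """One bit per device, devices numbered from 0; returns the broadcast body."""
    payload = bytearray((total_devices + 7) // 8)   # 32 bytes for 256 devices
    for dev in device_ids:
        payload[dev // 8] |= 1 << (dev % 8)         # set that device's bit
    return bytes(payload)

def should_react(payload, my_id):
    """Run on each device: is my bit set in the broadcast?"""
    return bool(payload[my_id // 8] & (1 << (my_id % 8)))

packet = build_payload({2, 5, 75, 100})
print(len(packet))                # 32 bytes, no matter how many targets
print(should_react(packet, 100))  # True  (byte 12, mask 0b00010000)
print(should_react(packet, 3))    # False
```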
As cobarzan said, you can also use a hybrid scheme allowing for individual addressing. In that scenario, you could use the first bit as a signal to indicate multiple- or single-machine addressing. As noted, that requires more processing, and it means the first byte can only store 7 machine signals rather than 8.
Like Ed Cottrell suggested, a bit string would do the job. If the machines are labeled {1,...,n}, there are 2^n - 1 possible subsets (assuming you never send a request with no intended target). So you need a data structure able to hold the signature of every possible subset, whatever you decide the signature to be, and n bits (one per machine) is the best one can do regarding the size of such a data structure. The evaluation performed on the machines takes constant time (machine with label l just looks at the l-th bit).
But one could go for a hybrid scheme. Say you have a task for one device only; it would be a pity to send n bits (all 0s except one). So you can add one extra bit T which indicates the type of packet: T is set to 0 if you are sending a bit string of length n as described above, and to 1 if you are using a more compact scheme (i.e. fewer bits). In the case of just one machine that needs to perform the task, you could directly send the label of the machine (which is O(log n) bits long). This approach reduces the size of the packet whenever fewer than O(n/log n) machines need to perform the task. Evaluation on the machines is more expensive, though.
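As a rough sketch of that hybrid idea (Python; the layout is my own simplification, spending a whole flag byte on T instead of a single bit, and two bytes on the label):

```python
def encode(target_ids, n=256):
    """T=1: one target, send its label. T=0: send the full n-bit string."""
    targets = set(target_ids)
    if len(targets) == 1:
        (label,) = targets
        return bytes([1]) + label.to_bytes(2, "big")    # 3 bytes total
    bits = bytearray((n + 7) // 8)
    for dev in targets:
        bits[dev // 8] |= 1 << (dev % 8)
    return bytes([0]) + bytes(bits)                     # 1 + n/8 bytes

def should_react(packet, my_id):
    """Run on each device to decide whether the packet addresses it."""
    if packet[0] == 1:                                  # single-label mode
        return int.from_bytes(packet[1:3], "big") == my_id
    body = packet[1:]                                   # bit-string mode
    return bool(body[my_id // 8] & (1 << (my_id % 8)))

print(len(encode({42})), len(encode({2, 5, 75, 100})))                  # 3 33
print(should_react(encode({42}), 42), should_react(encode({42}), 43))   # True False
```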