MY ENVIRONMENT
I'm sending commands to a camera simulator device made by Vivid Engineering (Model CLS-211) over RS-232 from my laptop, which runs CentOS 7.
ISSUE
I have installed two different serial terminal programs (minicom and GtkTerm). With either one I can send commands to the device over and over without problems, including a command that dumps its memory contents. The CLS-211 needs several configuration commands to be set up for a specific test. I want to automate this process, so I wrote a Ruby script that writes a list of commands to the CLS-211. However, it appears that I am not terminating each command correctly, or that the CLS-211 requires a specific terminator/signal that my Ruby script does not send. I'm confused why this works from a serial terminal but not from a Ruby script. I've configured the serial port settings per the manual (you'll see them in my scripts below), so I'm confident they are not the issue. The manual only mentions HyperTerminal, which I can't use because I'm on Linux. The manufacturer said other serial terminals should work just fine, but HyperTerminal is the only one they have tested. When I asked for feedback on the issue, they simply said: "We don't use Linux but it shouldn't be much different, good luck".
TROUBLESHOOTING
I have verified, as far as I can tell, that my "send.rb" script works by writing a "read.rb" script that reads back what I sent. For this loopback test I connected pin 2 (RX) to pin 3 (TX) on the RS-232 cable. Below are my two scripts and the output from running this test.
## Filename send.rb
require 'serialport'
ser = SerialPort.new("/dev/ttyS0", 9600, 8, 1, SerialPort::NONE)
ser.write "LVAL_LO 5\r\n"
ser.write "LVAL_HI 6\r\n"
ser.write "FVAL_LO 7\r\n"
ser.write "FVAL_HI 8\r\n"
ser.close
## Filename read.rb
require 'serialport'
ser = SerialPort.new("/dev/ttyS0", 9600, 8, 1, SerialPort::NONE)
while true do
  printf("%c", ser.getc)
end
ser.close  # never reached; stop the script with Ctrl-C
I just found out that I can't post more than two links because my reputation is too low. Anyway, the output is just the following:
username@hostname $ ruby read.rb
LVAL_LO 5
LVAL_HI 6
FVAL_LO 7
FVAL_HI 8
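The loopback test proves that the bytes leave the port. It cannot reveal a terminator problem, though, because printing each byte with %c makes "\r\n" and "\r" look identical on screen. The minimal sketch below spells out control bytes so you can compare the exact terminator each tool sends. It is written in Python rather than Ruby, and the `visible` helper and the `<CR>`/`<LF>` labels are hypothetical choices for illustration. Reading from the port is left out; you pass in whatever bytes were read.

```python
# Spell out control bytes so line terminators can be compared byte-for-byte.
# The <CR>/<LF> labels are an arbitrary choice for this sketch.
CONTROL_NAMES = {0x0D: "<CR>", 0x0A: "<LF>"}


def visible(data: bytes) -> str:
    """Return data as text with CR, LF and other control bytes made visible."""
    out = []
    for b in data:
        if b in CONTROL_NAMES:
            out.append(CONTROL_NAMES[b])
        elif b < 0x20 or b == 0x7F:
            out.append(f"<0x{b:02X}>")  # any other control character
        else:
            out.append(chr(b))
    return "".join(out)


print(visible(b"LVAL_LO 5\r\n"))  # LVAL_LO 5<CR><LF>
```

In the same loopback setup, you could feed it the bytes captured while typing a command in GtkTerm and the bytes written by send.rb. That would show directly whether the two tools end lines differently.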
I connected the CLS-211 and dumped its memory contents by issuing the DUMP command in GtkTerm, and this works fine. The output below shows the first four parameters: LVAL_LO, LVAL_HI, FVAL_LO, and FVAL_HI. I'm only showing these four to keep this thread short instead of listing all of them. Since I can't include more than two links as a new user, I'm typing out what the GtkTerm output looks like instead:
CLS-211 initializing, please wait
............
ready
CLS211 Camera Link Simulator CLI
Vivid Engineering
Rev 3.0
DUMP
LVAL_LO = 0x0020 / 32
LVAL_HI = 0x0100 / 256
FVAL_LO = 0x0002 / 2
FVAL_HI = 0x0100 / 256
In the output above you can see that the system boots as expected. After I typed the DUMP command, it printed the memory contents: LVAL_LO = 32, LVAL_HI = 256, FVAL_LO = 2, and FVAL_HI = 256. As I mentioned before, I can also change a specific parameter by typing a command into GtkTerm. The output below shows that after I typed LVAL_LO 5 and then DUMP, LVAL_LO had changed to 5 as expected. I can reproduce this with every command in GtkTerm.
Again, since I can't post more than two links, here is the output as text:
LVAL_LO 5
DUMP
LVAL_LO = 0x0005 / 5
LVAL_HI = 0x0100 / 256
FVAL_LO = 0x0002 / 2
FVAL_HI = 0x0100 / 256
At this point everything was working as expected, so I wanted to see whether my Ruby script could do the same. I ran the "send.rb" script shown above, then opened GtkTerm and issued a DUMP command to check whether the values had been taken. Before I ran "send.rb", the values in the CLS-211's memory were LVAL_LO = 32, LVAL_HI = 256, FVAL_LO = 2, and FVAL_HI = 256. After running "send.rb", opening GtkTerm and issuing DUMP, the CLS-211 replied with "invalid entry". When I issued DUMP again, it dumped the memory contents and showed that LVAL_LO had changed correctly but the other three values had not.
[Screenshot: "Almost Successful"]
At this point I assumed that the first command was being received and written to the CLS-211's memory correctly but the others were not, most likely because there was no delay between them. So I added a 1-second delay between each ser.write call and changed "send.rb" to the following:
## Filename send.rb
require 'serialport'
ser = SerialPort.new("/dev/ttyS0", 9600, 8, 1, SerialPort::NONE)
ser.write "LVAL_LO 9\r\n"
sleep(1)
ser.write "LVAL_HI 10\r\n"
sleep(1)
ser.write "FVAL_LO 11\r\n"
sleep(1)
ser.write "FVAL_HI 12\r\n"
sleep(1)
ser.close
The following is the result of running "send.rb" again with the above changes, opening up GtkTerm, and executing the DUMP command to verify memory.
[Screenshot: "Added sleep(1)"]
Nothing really changed. The script took longer to run and the first value did change, but, as before, the last three values were not received and saved to the CLS-211's memory.
CONCLUSION
How can I continue troubleshooting this issue? What terminator does GtkTerm append to each command I send, and is it different from the "...\r\n" that my Ruby script "send.rb" sends? I'm totally lost and out of ideas for what to try next.
[EDIT/UPDATE 10/09/17]
I'm so stupid. The one termination character I forgot to try by itself was the carriage return "\r". Ending each command with a carriage return alone fixed the issue. I'm curious what drives a manufacturer's choice of termination character(s) for a serial command; I would have expected a predefined standard. For completeness, here is the code that communicates correctly with the CLS-211. Basically, I removed the "\n" and kept the "\r", that was it.
## Filename send.rb
require 'serialport'
ser = SerialPort.new("/dev/ttyS0", 9600, 8, 1, SerialPort::NONE)
ser.write "LVAL_LO 9\r"
sleep(1)
ser.write "LVAL_HI 10\r"
sleep(1)
ser.write "FVAL_LO 11\r"
sleep(1)
ser.write "FVAL_HI 12\r"
sleep(1)
ser.close
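One way to see why "\r\n" fails while "\r" works is to assume that the firmware ends a command at CR and keeps any LF as the first character of the next command. That is an assumption, not documented CLS-211 behaviour. The toy Python model below, with hypothetical names, reproduces every symptom reported above:
- The first command is accepted.
- The remaining three are rejected.
- The sleeps make no difference.
- The LF left over after the last command makes the first DUMP typed in GtkTerm come back as "invalid entry".

```python
# Toy model, NOT documented CLS-211 behaviour: a command ends at CR (0x0D),
# and an LF (0x0A) is kept as the first character of the next command.
KNOWN_COMMANDS = {"LVAL_LO", "LVAL_HI", "FVAL_LO", "FVAL_HI", "DUMP"}


def parse_commands(stream: bytes):
    """Return ([(command_text, verdict), ...], leftover_bytes)."""
    *complete, leftover = stream.split(b"\r")  # last piece has no CR yet
    results = []
    for raw in complete:
        text = raw.decode("ascii")
        name = text.split(" ", 1)[0]
        verdict = "ok" if name in KNOWN_COMMANDS else "invalid entry"
        results.append((text, verdict))
    return results, leftover


if __name__ == "__main__":
    # What the first send.rb writes: every command ends in "\r\n".
    results, leftover = parse_commands(
        b"LVAL_LO 5\r\nLVAL_HI 6\r\nFVAL_LO 7\r\nFVAL_HI 8\r\n")
    print(results)   # only LVAL_LO is "ok"; the other three start with "\n"
    print(leftover)  # b'\n' stays buffered in the device
    # The next DUMP typed in GtkTerm therefore arrives as "\nDUMP".
    print(parse_commands(leftover + b"DUMP\r")[0])
    # With "\r" alone, as in the fixed send.rb, every command is recognised.
    print(parse_commands(b"LVAL_LO 9\rLVAL_HI 10\rFVAL_LO 11\rFVAL_HI 12\r")[0])
```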
Related
I'm having trouble copying files to an ESP32 device running MicroPython via rshell (remote MicroPython shell). The problem seems to occur randomly. Sometimes I manage to send all the files (about 20 of them) with rsync, and sometimes I can't copy even one file with cp. There is no error message and the script doesn't crash; it just stops working and freezes the console. I've tried with and without the -a parameter (I'm only sending .py files). The last thing printed in debug mode is the code to be run on the microcontroller, shown below, and it just stops there. I couldn't find any pattern. I've tried another ESP32 device and another Windows PC, with the same results. I even tried lowering the default buffer size from 32 to 16, with no improvement. The worst part is that it sometimes works fine, so I can't get consistent results (not even consistently bad ones). It stops on random files, not always the same one.
The rsync command (with the --mirror parameter) is very helpful, and I can't just copy all the files by hand.
EDIT: I just tested on a Mac and it works fine. I guess it's a Windows-only problem...
Adding /pyboard/protocol/parser.py
----- About to send 2269 bytes of code to the pyboard -----
def recv_file_from_host(src_file, dst_filename, filesize, dst_mode='wb'):
    """Function which runs on the pyboard. Matches up with send_file_to_remote."""
    import sys
    import ubinascii
    import os
    if False:
        try:
            import pyb
            usb = pyb.USB_VCP()
        except:
            try:
                import machine
                usb = machine.USB_VCP()
            except:
                usb = None
        if usb and usb.isconnected():
            # We don't want 0x03 bytes in the data to be interpreted as a Control-C
            # This gets reset each time the REPL runs a line, so we don't need to
            # worry about resetting it ourselves
            usb.setinterrupt(-1)
    try:
        with open(dst_filename, dst_mode) as dst_file:
            bytes_remaining = filesize
            if not False:
                bytes_remaining *= 2 # hexlify makes each byte into 2
            buf_size = 32
            write_buf = bytearray(buf_size)
            read_buf = bytearray(buf_size)
            while bytes_remaining > 0:
                # Send back an ack as a form of flow control
                sys.stdout.write('\x06')
                read_size = min(bytes_remaining, buf_size)
                buf_remaining = read_size
                buf_index = 0
                while buf_remaining > 0:
                    if False:
                        bytes_read = sys.stdin.buffer.readinto(read_buf, read_size)
                    else:
                        bytes_read = sys.stdin.readinto(read_buf, read_size)
                    if bytes_read > 0:
                        write_buf[buf_index:bytes_read] = read_buf[0:bytes_read]
                        buf_index += bytes_read
                        buf_remaining -= bytes_read
                if False:
                    dst_file.write(write_buf[0:read_size])
                else:
                    dst_file.write(ubinascii.unhexlify(write_buf[0:read_size]))
                if hasattr(os, 'sync'):
                    os.sync()
                bytes_remaining -= read_size
        return True
    except:
        return False
output = recv_file_from_host(None, '/protocol/parser.py', 1467)
if output is None:
print("None")
else:
print(output)
-----
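The debug output shows the transfer protocol. With the buffered path disabled (the `if False:` branches), the host sends the file hexlified, so the board expects 2 × 1467 = 2934 characters for parser.py. The board writes an ACK byte (0x06) and then reads up to 32 characters before acknowledging again.

Below is a simplified host-side counterpart. `hex_chunks` and `send_with_acks` are hypothetical names, not rshell's actual send_file_to_remote. The sketch shows why a stall on either side hangs silently: if an ACK is lost, a blocking `read_ack` never returns and nothing raises an error. That would be consistent with the freeze described, though it does not prove the cause.

```python
import binascii


def hex_chunks(data: bytes, buf_size: int = 32) -> list:
    """Hexlify data and cut it into buf_size pieces, matching what the board's
    loop reads between ACKs when the buffered path is disabled."""
    hexed = binascii.hexlify(data)  # each input byte becomes 2 ASCII hex digits
    return [hexed[i:i + buf_size] for i in range(0, len(hexed), buf_size)]


def send_with_acks(data: bytes, read_ack, write, buf_size: int = 32) -> int:
    """Host side: wait for the board's 0x06 ACK before every chunk.

    read_ack() must return one byte. If it blocks forever (lost ACK, board
    stuck), this loop blocks forever too, with no error message."""
    sent = 0
    for chunk in hex_chunks(data, buf_size):
        ack = read_ack()
        if ack != b"\x06":
            raise IOError(f"expected ACK 0x06 from board, got {ack!r}")
        write(chunk)
        sent += 1
    return sent


if __name__ == "__main__":
    data = bytes(1467)  # same size as parser.py in the debug output
    acks = iter([b"\x06"] * len(hex_chunks(data)))  # fake board that always ACKs
    received = []
    print(send_with_acks(data, lambda: next(acks), received.append))  # prints 92
```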
I had the same problem. When trying
C:\> cp src\main.py /pyboard/
the cmd freezes.
When I copied using the following instead
C:\> cp src/main.py /pyboard/
there were no issues, so maybe rshell has some problems when there are backslashes ("\") in the path.
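If the backslash really is the trigger, one workaround when scripting rshell from Windows is to convert local paths to forward slashes before building the command. The minimal sketch below uses Python's pathlib. `to_rshell_path` is a hypothetical helper, and it only sidesteps the symptom rather than fixing rshell.

```python
from pathlib import PureWindowsPath


def to_rshell_path(path: str) -> str:
    """Turn a Windows-style path such as src\\main.py into src/main.py."""
    return PureWindowsPath(path).as_posix()


print(to_rshell_path(r"src\main.py"))  # src/main.py
```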
I have the following code to download a torrent from a magnet URI:
import time
import libtorrent as lt
#lt.storage_mode_t(0) ## tried this, didnt work
ses = lt.session()
params = { 'save_path': "/save/here"}
ses.listen_on(6881,6891)
ses.add_dht_router("router.utorrent.com", 6881)
#ses = lt.session()
link = "magnet:?xt=urn:btih:395603fa..hash..."
handle = lt.add_magnet_uri(ses, link, params)
while (not handle.has_metadata()):
    time.sleep(1)
handle.pause() # got meta data paused, and set priority
handle.file_priority(0, 1)
handle.file_priority(1, 0)
handle.file_priority(2, 0)
print(handle.file_priorities())
#output is [1,0,0]
#i checked no files written into disk yet.
handle.resume()
while (not handle.is_finished()):
    time.sleep(1) #wait until download
It works. However, this specific torrent has 3 files: file 0 (2 KB), file 1 (300 MB), and file 2 (2 KB).
As the code shows, file 0 has priority 1, while the rest have priority 0 (i.e. don't download).
The problem is that when file 0 finishes downloading, I want the download to stop there. Instead, it sometimes downloads file 1 partially: sometimes 100 MB or 200 MB, sometimes a couple of KB, and sometimes the entire file.
So my question is: how can I make sure that only file 0 is downloaded, and not files 1 and 2?
EDIT: I added a check for whether I have the metadata, then set the priorities and resumed the torrent. However, it still partially downloads the second file.
This happens because of the race between adding the torrent (which starts the download) and setting the file priorities.
To avoid this you can set the file priorities along with adding the torrent, something like this:
p = parse_magnet_uri(link)
p['file_priorities'] = [1, 0, 0]
handle = ses.add_torrent(p)
UPDATE:
You don't need to know the number of files: it's OK to provide file priorities for more files than end up being in the torrent file, and the extra ones are simply ignored. However, if you don't want to download anything from the swarm (except the metadata/.torrent), a better way is to set the flag_upload_mode flag. See the documentation.
p = parse_magnet_uri(link)
p['flags'] |= add_torrent_params_flags_t.flag_upload_mode
handle = ses.add_torrent(p)
According to "Working with Unix Processes", the first file I open should get descriptor 3, because 0, 1, and 2 are allocated to STDIN, STDOUT, and STDERR.
But on Ubuntu, I put this code in a file:
passwd = File.open('/etc/passwd')
puts passwd.fileno
and got 7 instead of 3. If I run the same code in irb, I get 9. Is Ubuntu different in some way? How can I find out what the descriptors that are already open are?
To see which files any process has open, check in /proc/XX/fd where XX is the pid of the process you are interested in. For the current process, you can look in /proc/self/fd.
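The same check can be scripted. Below is a Linux-only sketch in Python (rather than Ruby) that maps every entry in /proc/self/fd to what it points at. The descriptors that sit between 2 and the number you got back belong to the process itself: they are either inherited from its parent or opened by the interpreter. Listing them shows which is which. `open_fds` is a hypothetical helper name.

```python
import os


def open_fds(pid="self"):
    """Map each open descriptor of a process to its target (Linux /proc only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # os.listdir's own descriptor for fd_dir shows up in the listing but
            # is already closed by now, so readlink on it fails; skip it.
            continue
    return targets


if __name__ == "__main__":
    with open("/etc/passwd") as passwd:
        print("passwd fileno:", passwd.fileno())
        for fd, target in sorted(open_fds().items()):
            print(fd, "->", target)
```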
We have a bash script (job wrapper) that writes to a file, launches a job, and then, at job completion, appends information about the job to the file. The wrapper runs on one of several thousand batch nodes. The problem has only cropped up on several batch machines (I believe RHEL6) accessing one NFS server, plus at least one known instance of a different batch job on a different batch node using a different NFS server. In all cases, only one client host writes to the files in question. Some jobs take hours to run, others take minutes.
In the period in which this has occurred, there seem to be 10-50 failures out of 100,000+ jobs.
Here is what I believe is effectively a distilled version of the job wrapper:
#!/bin/bash
## cwd is /nfs/path/to/jobwd
## This file is /nfs/path/to/jobwd/job_wrapper
gotEXIT()
{
    ## end of script, however gotEXIT is called because we trap EXIT
    END="EndTime: `date`\nStatus: Ended"
    echo -e "${END}" >> job_info
    cat job_info | sendmail jobtracker@example.com
}
trap gotEXIT EXIT
function jobSetVar { echo "job.$1: $2" >> job_info; }
export -f jobSetVar
MSG="${email_metadata}\n${job_metadata}"
echo -e "${MSG}\nStatus: Started" | sendmail jobtracker@example.com
echo -e "${MSG}" > job_info
## At the job's end, the output from `time` command is the first non-corrupt data in job_info
/usr/bin/time -f "Elapsed: %e\nUser: %U\nSystem: %S" -a -o job_info job_command
## 10-360 minutes later…
RC=$?
echo -e "ExitCode: ${RC}" >> job_info
So I think there are two possibilities:
echo -e "${MSG}" > job_info
This command throws out corrupt data.
/usr/bin/time -f "Elapsed: %e\nUser: %U\nSystem: %S" -a -o job_info job_command
This corrupts the existing data, then outputs its own data correctly.
However, some jobs (but not all) call jobSetVar, and that output doesn't end up corrupt.
So I dug into time.c (from GNU time 1.7) to see when the file is opened. In summary, time.c effectively does this:
FILE *outfp;
int main (int argc, char** argv) {
    const char **command_line;
    RESUSE res;
    /* internally, getargs opens "job_info", so outfp = fopen ("job_info", "a") */
    command_line = getargs (argc, argv);
    /* run_command doesn't care about outfp */
    run_command (command_line, &res);
    /* internally, summarize calls fprintf and putc on outfp FILE pointer */
    summarize (outfp, output_format, command_line, &res);
    fflush (outfp);
    /* ...then exits with the command's exit status */
}
So time keeps FILE *outfp (the job_info handle) open for the entire duration of the job. It writes the summary at the end of the job and then doesn't appear to close the file explicitly (is that even necessary after fflush?). I have no idea whether bash also has the file open concurrently.
EDIT:
A corrupted file typically consists of a corrupted part followed by a non-corrupted part, which may look like this.
The corrupted section, which comes before the non-corrupted section, is typically mostly 0x00 bytes, with maybe some cyclic garbage mixed in:
Here's an example hexdump:
40000000 00000000 00000000 00000000
00000000 00000000 C8B450AC 772B0000
01000000 00000000 C8B450AC 772B0000
[ 361 x 0x00]
Then, at the 409th byte, it continues with the non-corrupted section:
Elapsed: 879.07
User: 0.71
System: 31.49
ExitCode: 0
EndTime: Fri Dec 6 15:29:27 PST 2013
Status: Ended
Another file looks like this:
01000000 04000000 805443FC 9D2B0000 E04144FC 9D2B0000 E04144FC 9D2B0000
[96 x 0x00]
[Repeat above 3 times ]
01000000 04000000 805443FC 9D2B0000 E04144FC 9D2B0000 E04144FC 9D2B0000
Followed by the non corrupted section:
Elapsed: 12621.27
User: 12472.32
System: 40.37
ExitCode: 0
EndTime: Thu Nov 14 08:01:14 PST 2013
Status: Ended
Other files have much more random corruption, but quite a few were cyclic, similar to the ones above.
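To compare corrupted files systematically, it may help to measure the prefix in front of the intact data in each job_info. The minimal Python sketch below assumes, as in the examples above, that the intact part starts at the first "Elapsed:" line written by time. `split_corruption` is a hypothetical helper.

For the first example it would report 408 bytes in front, meaning the clean data starts at the 409th byte. In an intact file that prefix is just the MSG text and contains no NULs. Comparing the prefix length with the length of the MSG sent in the first email could show whether exactly MSG's region was overwritten.

```python
def split_corruption(data: bytes, marker: bytes = b"Elapsed:"):
    """Return (prefix_len, nul_count, tail) for a job_info file.

    prefix_len is the number of bytes before the first `marker`, nul_count how
    many of those are 0x00, and tail the readable remainder. If the marker is
    missing, the whole file is treated as prefix."""
    start = data.find(marker)
    if start < 0:
        start = len(data)
    prefix = data[:start]
    return start, prefix.count(b"\x00"), data[start:]


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            prefix_len, nuls, _ = split_corruption(f.read())
        print(f"{path}: {prefix_len} bytes before 'Elapsed:', {nuls} of them NUL")
```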
EDIT 2: The first email, sent by the echo -e statement, goes through fine. The last email is never sent, because the corruption destroys the email metadata. So MSG isn't corrupted at that point. We assume job_info isn't corrupt at that point either, but we haven't been able to verify that yet. This is a production system that hasn't had major code modifications, and I have verified through an audit that no jobs that would touch this file ran concurrently. The problem seems to be fairly recent (the last 2 months), but it's possible it happened before and slipped through. This error prevents reporting, which means the affected jobs are considered failed. They are usually resubmitted, but one user in particular has ~9-hour jobs, for whom this error is especially frustrating. I wish I had more information or a way to reproduce this at will, but I was hoping somebody has seen a similar problem, especially recently. I don't manage the NFS servers, but I'll ask the admins what updates the NFS servers (RHEL6, I believe) were running at the time of these issues.
Well, the emails corresponding to the corrupt job_info files should tell you what was in MSG (which will probably be business as usual). You may want to check how NFS is being run: there's a remote possibility that you are running NFS over UDP without checksums, which could explain some corrupt data. I also hear that UDP/TCP checksums are not strong enough and data can still end up corrupt; maybe you are hitting such a problem (I have seen corrupt packets slip through a network stack at least once before, and I'm quite sure some checksumming was going on). Presumably MSG goes out as a single packet, and there might be something about it that makes a checksum collision with the garbage you see more likely. Of course, it could also be an NFS bug (client or server), a server-side filesystem bug, a bad piece of RAM... the possibilities are almost endless here (although the fact that it's always MSG that gets corrupted makes some of those quite unlikely). The problem might be related to seeking (which happens during the append). You could also have a bug elsewhere in the system that causes multiple clients to open the same job_info file, making it a jumble.
You can also try using a separate file for the 'time' output and merging it into job_info at the end of the script. That may help isolate the problem further.
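Here is a sketch of the separate-file suggestion. Let time write to its own file instead of appending to job_info, e.g. `-o job_info.time` instead of `-a -o job_info` (the side-file name is hypothetical). Then append that file to job_info only after the job finishes, so job_info is never held open by a long-running writer. A minimal Python version of the merge step:

```python
import os
import shutil
import tempfile


def append_file(dst_path: str, src_path: str) -> None:
    """Append all of src_path to dst_path, opening each file only for the copy."""
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    # Demo in a scratch directory; in the wrapper both files would sit in the
    # job's working directory on NFS.
    with tempfile.TemporaryDirectory() as workdir:
        job_info = os.path.join(workdir, "job_info")
        time_out = os.path.join(workdir, "job_info.time")  # hypothetical side file
        with open(job_info, "w") as f:
            f.write("Status: Started\n")
        with open(time_out, "w") as f:
            f.write("Elapsed: 879.07\nUser: 0.71\nSystem: 31.49\n")
        append_file(job_info, time_out)
        with open(job_info) as f:
            print(f.read(), end="")
```

In the bash wrapper itself, the equivalent is a single `cat job_info.time >> job_info` after the time command.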
The shell opens job_info for writing, outputs MSG, and should then close its file descriptor before launching the main job. The 'time' program opens the same file for appending as a stream, and I suspect the seek over NFS is not done correctly, which may cause that garbage. I can't explain why, but normally this should not happen (and does not). Such rare occurrences may point to a race condition somewhere. It could be caused by out-of-sequence packet delivery (due to a network latency spike), by retransmits that duplicate data, or by a bug somewhere. At first glance I would suspect a bug, but that bug may be triggered by some network behavior, e.g. an unusually large delay or a spike in packet loss.
File access between different processes is serialized by the kernel, but as an additional safeguard it may be worth adding some artificial delays, for example sleep calls between outputs.
The network is not transparent, especially a large one. There can be WAN optimization devices, which are known to cause application issues sometimes. CIFS and NFS are good candidates for WAN optimization with local caching of filesystem operations. It might be worth asking the network admins about recent changes.
Another thing to try, although it can be difficult given how rarely this happens, is capturing the relevant NFS sessions with tcpdump or Wireshark. In really tough cases we capture simultaneously on the client and server side and then compare the protocol exchanges to prove whether or not the network is working correctly. That's a whole topic in itself and requires thorough preparation and luck, but it's usually the last resort of desperate troubleshooting :)
It turns out this was actually another issue altogether, apparently to do with an out-of-date page being written to disk.
A bug fix was supplied to the linux-nfs implementation:
http://www.spinics.net/lists/linux-nfs/msg41357.html
I've had success with LuaSocket's TCP facility, but I'm having trouble with its FTP module. I always get a timeout when trying to retrieve a (small) file. I can download the file just fine using Firefox or ftp in passive mode (on Ubuntu Dapper Linux).
I thought I might need LuaSocket to use passive FTP, but then I found that it seems to do that by default. The file I'm trying to retrieve can be accessed with passive FTP from other programs on my machine, but not in active mode. I found some talk about "hacking" passive-mode support into LuaSocket, and that discussion implies that later versions stopped using passive mode. My version seems to use passive mode anyway, though (I'm using 2.0.1; the newest is 2.0.2, which does not appear to have any changes relevant to my use case). I'm a little confused about how that post relates to my situation, partly because it's very old and LuaSocket's source now bears little resemblance to the code in that discussion.
I've boiled my code down to this:
local ftp = require "socket.ftp"
ftp.TIMEOUT = 10
print(ftp.get("ftp://ftp.us.dell.com/app/dpart.txt"))
This gives me a timeout. I ran it under strace on Linux (similar to truss on Solaris). Here's an abridged transcript:
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
recv(3, "230-Welcome to the Dell FTP site."..., 8192, 0) = 971
send(3, "pasv\r\n", 6, 0) = 6
recv(3, 0x8089a58, 8192, 0) = -1 EAGAIN (Resource temporarily unavailable)
select(4, [3], NULL, NULL, {9, 999934}) = 0 (Timeout)
I also tried connecting to another site, which requires a password that I can't post here. In that case the results were slightly different: the trace looked like the one above, but with select() succeeding at the end, followed by this:
recv(3, "227 Entering Passive Mode (123,456,789,0,12,34)\r\n", 8192, 0) = 49
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(4, {sa_family=AF_INET, sin_port=htons(12345), sin_addr=inet_addr("123.456.789.0")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(5, [4], [4], NULL, {9, 999694}) = 0 (Timeout)
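For reference, here is what the client has to do with a successful 227 reply, as in the second trace. The six numbers are h1,h2,h3,h4,p1,p2, and the data connection goes to h1.h2.h3.h4 on port p1 × 256 + p2 (RFC 959). Below is a minimal Python parser; `parse_pasv` is a hypothetical name. Note that the addresses and ports in the traces above are redacted placeholders (123.456.789.0 is not a valid address), so they won't satisfy this formula.

```python
import re

# Matches the "(h1,h2,h3,h4,p1,p2)" part of a 227 Entering Passive Mode reply.
PASV_TUPLE = re.compile(rb"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv(reply: bytes):
    """Return (host, port) from a 227 reply, per RFC 959's PASV format."""
    if not reply.startswith(b"227"):
        raise ValueError(f"not a 227 reply: {reply!r}")
    match = PASV_TUPLE.search(reply)
    if match is None:
        raise ValueError(f"no address tuple in reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(n) for n in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


print(parse_pasv(b"227 Entering Passive Mode (192,168,1,20,12,34)\r\n"))
# ('192.168.1.20', 3106)
```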
Compare this to the trace of my "ftp" program in passive mode (which works fine, though note that it does not set the sockets to nonblocking like LuaSocket does):
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 6
write(5, "PASV\r\n", 6) = 6
read(3, "227 Entering Passive Mode (123,456,789,0,12,34)\r\n", 1024) = 51
connect(6, {sa_family=AF_INET, sin_port=htons(12345), sin_addr=inet_addr("123.456.789.0")}, 16) = 0
So I've tried LuaSocket against these two different FTP sites with different but similar failures. I also tried it from another machine where active FTP works, and it didn't have any better luck there (presumably because LuaSocket is always using passive mode, from what I can tell by reading the source in socket/ftp.lua).
So can anyone here make the LuaSocket two-liner at the top work? Note that on my machine, active FTP to Dell's site doesn't work (I can connect but as soon as I do ls it disconnects), so if you get LuaSocket to work please also note whether active FTP to Dell's site from another program works on your machine.
Hm. It looks like the problem is that LuaSocket sends "pasv" in lower case. I'm going to try to figure out a workaround.
Hm. Nope, it looks quite elegantly welded shut. The easiest thing to do is probably to copy that particular file to its equivalent place in a hierarchy in an earlier path in LUA_PATH. That is, (usually) make a local copy of the file, e.g. path/to/your/project/socket/ftp.lua.
Then edit the local file:
- self.try(self.tp:command("user", user or USER))
+ self.try(self.tp:command("USER", user or USER))
- self.try(self.tp:command("pass", password or PASSWORD))
+ self.try(self.tp:command("PASS", password or PASSWORD))
- self.try(self.tp:command("pasv"))
+ self.try(self.tp:command("PASV"))
- self.try(self.tp:command("port", arg))
+ self.try(self.tp:command("PORT", arg))
- local command = sendt.command or "stor"
+ local command = sendt.command or "STOR"
- self.try(self.tp:command("cwd", dir))
+ self.try(self.tp:command("CWD", dir))
- self.try(self.tp:command("type", type))
+ self.try(self.tp:command("TYPE", type))
- self.try(self.tp:command("quit"))
+ self.try(self.tp:command("QUIT"))
Perversely, a navelnaut expedition using getfenv, getmetatable, etc. didn't seem to be worth it. I consider this a serious problem with LuaSocket's design.
It's worth noting that RFC 959 uses all-caps commands. (Probably because it's from the 7-bit ASCII era.)
Note that the server is failing to follow the FTP specification, which states that commands are case-insensitive. See RFC 959, section 5.3:
"The command codes are four or fewer alphabetic characters. Upper and lower case alphabetic characters are to be treated identically. Thus, any of the following may represent the retrieve command:
RETR Retr retr ReTr rETr"
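Under that section, a compliant server normalizes the command code before matching it, so LuaSocket's lowercase "pasv" would be accepted. Below is a minimal sketch of such matching in Python. `parse_ftp_command` is a hypothetical helper, not any particular server's code. A server that compared the raw text case-sensitively would accept only "PASV", which is why the upper-case edit above works around it.

```python
def parse_ftp_command(line: str):
    """Split a raw FTP control line into (COMMAND, argument).

    The command code is upper-cased so that RETR, Retr, retr, ReTr and rETr
    all compare equal, as RFC 959 section 5.3 requires."""
    line = line.rstrip("\r\n")
    code, _, argument = line.partition(" ")
    return code.upper(), argument


for raw in ["pasv\r\n", "ReTr app/dpart.txt\r\n", "QUIT\r\n"]:
    print(parse_ftp_command(raw))
```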
This problem is now fixed; the question and the first answer were a great help.
LuaSocket is correct per RFC 959 (the first comment here is not right about upper case; see RFC 959 section 5.2).
At least the Microsoft FTP server is not compliant. There might be others.
The solution is to change pasv to PASV, which is a workaround for a server with case-sensitive commands. Details are on the Lua mailing list, whose archive will be web-accessible in a few days.
(edit line 59 of ftp.lua)