#emc-devel | Logs for 2007-10-01

[20:06:15] <alex_joni> seems like the problem mgouget reports on the lists is the same we had for halui iirc
[20:10:05] <alex_joni> hi again
[20:10:07] <mgouget> Hello
[20:10:24] <alex_joni> mgouget: cradek noticed the same thing happening to halui running along with axis or another gui
[20:10:43] <alex_joni> the issue is that multiple gui's running at once can lead to stepping on each other's toes
[20:10:46] <mgouget> I think I have a problem, and I am not sure it can be solved.
[20:11:03] <alex_joni> the fix I would try is this:
[20:11:07] <alex_joni> http://cvs.linuxcnc.org/cgi-bin/cvsweb.cgi/emc2/src/emc/usr_intf/halui.cc.diff?r1=1.50;r2=1.51
[20:12:15] <mgouget> In my test program, I start at serial_number+10000000
[20:12:38] <alex_joni> yes, but you still use : emcCommandSerialNumber = emcStatus->echo_serial_number;
[20:12:50] <cradek> it's wrong to do that
[20:13:02] <cradek> you will always conflict then!
[20:14:06] <mgouget> emcCommandSerialNumber = emcStatus->echo_serial_number + SERIAL_ISOLATION; with SERIAL_ISOLATION=10000000
[20:14:23] <alex_joni> mgouget: I don't see that in http://www.gouget.org/emc/testnml.c
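The serial-number scheme discussed above can be sketched as follows: seed the counter once from the echoed serial plus an isolation offset, then only ever increment it, instead of copying echo_serial_number again before each command (which cradek says will always conflict). This is a minimal sketch; `serial_source`, `serial_seed` and `serial_take` are hypothetical names and are not taken from testnml.c or emc2.

```c
/* Hypothetical per-client serial allocator (illustration only). */
typedef struct {
    int next;   /* last serial number handed out */
} serial_source;

/* Seed once at startup: echoed serial plus a per-client offset,
 * e.g. SERIAL_ISOLATION = 10000000 as in the discussion. */
void serial_seed(serial_source *s, int echo_serial, int isolation)
{
    s->next = echo_serial + isolation;
}

/* Take the serial for the next outgoing command. The counter is never
 * re-synchronised from the status buffer, so it cannot fall back into
 * the range another client is using. */
int serial_take(serial_source *s)
{
    s->next += 1;
    return s->next;
}
```

Two clients seeded with different offsets then draw from disjoint ranges, as long as neither sends enough commands to reach the other's range.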
[20:15:19] <mgouget> I tested with very different serial numbers; it seems that when a command is started, only *one* DONE message is sent
[20:15:44] <alex_joni> are you using waitreceived() ?
[20:15:58] <alex_joni> or wait_done?
[20:16:10] <cradek> if any of your MDI commands take more than 3 seconds (emcTimeout) you will stop waiting and send a new message
[20:16:28] <mgouget> Correct, this is a mod I did afterwards, but it changes nothing. I have put the up-to-date version there now
[20:17:39] <mgouget> Done. prog is: http://www.gouget.org/emc/testnml.c
[20:17:40] <alex_joni> mgouget: any reason to use EMC_WAIT_DONE ?
[20:18:11] <cradek> alex_joni: without WAIT_DONE, switching modes will abort an mdi move that's still in progress
[20:18:58] <mgouget> alex_joni: Not for things that can be checked in status, like estop or mode. But I need it for mdi.
[20:19:00] <SWPadnos> so here's a funny thing. neither emc2-cvs-build-dep nor emc2-dev depends on cvs
[20:19:30] <cradek> you do not need cvs to build emc2
[20:20:01] <SWPadnos> that's true, though emc2-cvs-build-dep kinda makes you think the person will get it from cvs (I know that's not necessary, but still)
[20:20:22] <cradek> firefox is nice for browsing cvsweb...
[20:20:30] <SWPadnos> yep
[20:20:38] <alex_joni> mgouget: ok..
[20:20:55] <SWPadnos> and I suspect one could use wget to get a tarfile as well
[20:21:05] <cradek> yep my thoughts exactly
[20:21:07] <alex_joni> mgouget: so what's the issue you're seeing now?
[20:21:24] <alex_joni> * alex_joni does the wget all the time
[20:21:27] <mgouget> alex_joni: In fact, I am only interested in emcCommandWaitDone(). BUT *sometimes* the waitreceived is eaten too.
[20:21:30] <alex_joni> for each release
[20:21:44] <alex_joni> mgouget: try to be a bit more specific
[20:21:54] <cradek> SWPadnos: it may be pedantic but I don't think cvs should be a build-dep
[20:21:58] <SWPadnos> I'm finally doing an install on one of these embedded PCs, and I'm just noticing everything that needs to be done to be able to build EMC2
[20:22:13] <SWPadnos> that's fine. it makes sense because you can get the source through other means
[20:22:23] <mgouget> alex_joni: I don't know if it is possible to run 2 GUIs at the same time *reliably*
[20:22:28] <SWPadnos> it just seemed funny, since "cvs" is in the name of the package
[20:22:37] <SWPadnos> (emc2-cvs-build-dep)
[20:22:37] <cradek> SWPadnos: is any stuff missing from the wiki page?
[20:22:53] <SWPadnos> dunno, I'm doing it from memory / as needed
[20:22:56] <cradek> SWPadnos: (yeah I agree, and I'm not sure I like that package or its name)
[20:22:58] <SWPadnos> I can check though
[20:23:05] <mgouget> alex_joni: I need to get an ack for *all* my commands.
[20:23:20] <SWPadnos> emc2-dev-build-dep might make more sense
[20:23:24] <SWPadnos> or less - dunno
[20:23:31] <alex_joni> mgouget: I can understand that
[20:23:41] <alex_joni> mgouget: what you're saying is that sometimes it doesn't come?
[20:23:50] <cradek> mgouget: did you see above where I said that commands that take more than 3 seconds won't be waited on?
[20:24:10] <alex_joni> cradek: except if he uses a negative timeout
[20:24:18] <cradek> emcTimeout = 3.0;
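The timeout rule described above can be sketched like this: with a positive timeout (3.0 seconds here) the wait is abandoned once that much time has passed, while a negative timeout means the caller keeps waiting. This is a minimal sketch; `wait_expired` is a hypothetical helper, not the emc2 implementation, and treating zero the same as negative is an assumption.

```c
#include <stdbool.h>

/* Hypothetical helper (illustration only): decides whether a polling
 * wait loop should give up.
 *   timeout > 0  : give up once `elapsed` seconds reach the timeout
 *   timeout <= 0 : never give up (zero handled like negative, assumed)
 */
bool wait_expired(double timeout, double elapsed)
{
    if (timeout <= 0.0) {
        return false;
    }
    return elapsed >= timeout;
}
```

A polling loop would call this after each sleep interval with the accumulated wait time, so an MDI move longer than the timeout ends the wait before the move itself is done.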
[20:24:32] <mgouget> alex_joni: I started to dive deep in libnml, but things get very quickly very complex.
[20:25:27] <mgouget> yes, that is correct, but I log all the acks I get, and sometimes, one is missing, even a long time after.
[20:25:54] <alex_joni> mgouget: did you get a chance to notice a pattern when an ack is missing?
[20:26:01] <alex_joni> (like doing stuff to the other GUI?)
[20:27:14] <mgouget> alex_joni: yes, it seems that, when I switch to manual, tkemc sends an nml command, and I get the ack from the tkemc command instead of mine.
[20:27:47] <alex_joni> hmm, and your ack never comes?
[20:27:55] <alex_joni> or maybe you just intercepted the wrong one?
[20:28:32] <mgouget> No, I log *all* acks, and mine does NOT come, even long after.
[20:28:58] <alex_joni> can you tell me what NML commands were sent?
[20:29:07] <alex_joni> something like this:
[20:29:18] <alex_joni> (myapp): switch to manual
[20:29:20] <mgouget> Just try to start mini, and when started, run testnml.
[20:29:24] <alex_joni> (tkemc): whatever
[20:29:33] <alex_joni> (emc): ack whatever
[20:29:36] <mgouget> The commands are at line 596.
[20:29:54] <alex_joni> mgouget: that's a bit hard.. I'm in sweden right now, about 1500km from my emc pc ;)
[20:31:01] <mgouget> alex_joni: OK :) I see my problem as a showstopper, so I have all the time needed to solve it. It can wait some days...
[20:31:37] <alex_joni> if you can give me more data, I can look at the code in emctask
[20:31:46] <alex_joni> maybe I can spot something obvious
[20:32:17] <mgouget> Anyway; commands are: dr_mdi(); dr_mdicmd("g0 x0.1");
[20:32:17] <mgouget> dr_mdicmd("g0 x0");
[20:32:17] <mgouget> dr_manual();
[20:32:53] <mgouget> I will put a log of the output.
[20:33:16] <alex_joni> perfect
[20:36:13] <mgouget> test is running....
[20:38:38] <mgouget> alex_joni: ok, log is at: http://www.gouget.org/emc/testnml.log
[20:40:54] <alex_joni> hmm
[20:43:13] <alex_joni> I don't understand this:
[20:43:14] <alex_joni> Sending sendMdi()
[20:43:14] <alex_joni> emcCommandWaitDone(10000013): Got serial:12 status:1
[20:43:17] <alex_joni> sendMdi() done
[20:45:28] <mgouget> OK: dr_mdicmd(), line 502, sends an mdi command. It first checks if we are in mdi mode, and if not, calls sendMdi() to switch to mdi mode. Then it calls sendMdiCmd(cmd).
[20:46:34] <alex_joni> I got that..
[20:46:49] <alex_joni> but sendMdi() sends a command with serial 10000013
[20:47:05] <alex_joni> right?
[20:47:21] <alex_joni> then emcCommandWaitDone will have to wait for an ack for 10000013
[20:47:47] <mgouget> yes, because at init, line 582, I have: emcCommandSerialNumber = emcStatus->echo_serial_number + SERIAL_ISOLATION;
[20:47:53] <alex_joni> ok, I understood that..
[20:47:56] <alex_joni> but:
[20:48:01] <alex_joni> emcCommandWaitDone(10000013): Got serial:12 status:1
[20:48:19] <alex_joni> that doesn't sound like it got the ack for 10000013
[20:48:50] <alex_joni> yet the next line is 'sendMdi() done'
[20:48:51] <mgouget> I think that this comes from "mini" which is running
[20:49:15] <alex_joni> I am also sure the serial:12 is not a message from you
[20:49:33] <alex_joni> but (your) code moves on, without waiting for the 10000013 ack
[20:49:41] <alex_joni> or am I missing something obvious here?
[20:50:22] <alex_joni> ah, I think I know what happens...
[20:50:45] <alex_joni> can you add another printf to emcCommandWaitDone() ?
[20:50:52] <alex_joni> if (emcStatus->status == RCS_DONE) {
[20:51:08] <mgouget> Yes, I don't log when emcCommandWaitDone returns OK, the higher command logs a "done" when everything is OK.
[20:51:27] <alex_joni> printf("emcCommandWaitDone(%d): Got right? serial: %d status: %d\n", ...)
[20:51:43] <mgouget> OK, I do that, run it and put it on the web
[20:51:48] <alex_joni> perfect
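The situation alex_joni points at (status DONE seen with serial 12 while waiting for 10000013) can be illustrated with a check that only accepts a status snapshot whose echoed serial matches the command that was sent. This is a minimal sketch, not the emc2 code: `status_snapshot` and `classify_ack` are hypothetical, `ST_DONE = 1` mirrors the RCS_DONE value reported in the log, and the `ST_ERROR` value is an assumption.

```c
/* Status codes for the sketch. ST_DONE = 1 matches "RCS_DONE is 1"
 * from the log; ST_ERROR = 3 is an assumed value. */
enum { ST_DONE = 1, ST_ERROR = 3 };

/* Hypothetical stand-in for the two status fields discussed here. */
typedef struct {
    int echo_serial_number;  /* serial of the command being reported on */
    int status;              /* ST_DONE, ST_ERROR, or something else */
} status_snapshot;

/* Returns  1 if our command is done,
 *         -1 if our command failed,
 *          0 if the snapshot is about another command or still running. */
int classify_ack(const status_snapshot *s, int my_serial)
{
    if (s->echo_serial_number != my_serial) {
        return 0;  /* an ack for someone else's command, e.g. another GUI */
    }
    if (s->status == ST_DONE) {
        return 1;
    }
    if (s->status == ST_ERROR) {
        return -1;
    }
    return 0;
}
```

A check like this only avoids accepting the wrong ack. It cannot help if the expected serial never appears in the status buffer at all, which is what mgouget reports.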
[20:59:25] <mgouget> OK, log is there: http://www.gouget.org/emc/testnml1.log
[21:01:05] <mgouget> RCS_DONE is 1
[21:01:07] <alex_joni> mgouget: can you try to increase the timeout?
[21:01:46] <alex_joni> what I notice here is the following:
[21:01:57] <alex_joni> you always get a good response, then a bad one
[21:02:10] <alex_joni> the bad one (every second one) has the serial wrong by one
[21:02:35] <mgouget> OK, I can raise it to 6s, but it won't change anything, as for example serial 10000016 does not appear afterwards
[21:03:15] <alex_joni> mgouget: I suspect it's a bug incrementing the message number somewhere
[21:03:46] <mgouget> in testnml or in emc?
[21:04:03] <alex_joni> hmm
[21:05:55] <alex_joni> can you try another thing for me?
[21:06:05] <alex_joni> try increasing the message number by 2 in sendMdi()
[21:06:08] <mgouget> I don't think so, for example, you will never see Got serial:10000019 status:1
[21:06:29] <mgouget> OK, I add 3
[21:06:40] <alex_joni> before sending it out
[21:07:06] <alex_joni> emcCommandSerialNumber += 3;
[21:07:20] <alex_joni> mode_msg.serial_number = emcCommandSerialNumber;
[21:08:54] <mgouget> log is running...
[21:10:52] <alex_joni> mgouget: 2-3 are enough
[21:11:02] <mgouget> OK, log is there: http://www.gouget.org/emc/testnml2.log
[21:11:48] <alex_joni> ??
[21:12:34] <mgouget> mea culpa, I added 3 to sendMdi(), not sendMdiCmd() :(
[21:12:42] <alex_joni> ;)
[21:12:57] <mgouget> extraball....
[21:13:51] <alex_joni> brb
[21:16:19] <mgouget> OK, log: http://www.gouget.org/emc/testnml3.log
[21:18:34] <alex_joni> back
[21:19:58] <alex_joni> well, not sure what I can suggest to you
[21:20:26] <alex_joni> it certainly looks like the problem isn't on your end
[21:20:48] <alex_joni> maybe you can catch a log with the commands mini sends at the same time
[21:20:48] <mgouget> I think that the problem is deep-rooted, emc has never been intended for having more than *1* supervisor :(
[21:21:00] <alex_joni> mgouget: I'm not so sure
[21:21:10] <alex_joni> emc was always intended to have more than one GUI
[21:21:21] <alex_joni> I think
[21:21:58] <mgouget> Do you know if a test program for libnml is available?
[21:22:20] <alex_joni> nope
[21:22:32] <alex_joni> I don't
[21:22:58] <alex_joni> what I would do is drop an email to fred proctor, maybe he knows something ;)
[21:23:03] <mgouget> who picks the messages sent?
[21:23:56] <alex_joni> sorry?
[21:25:00] <mgouget> in emc, is it "task" that picks the messages?
[21:25:17] <alex_joni> I don't understand 'picks'
[21:26:16] <alex_joni> but the idea is the following
[21:26:27] <alex_joni> we have: emcCommand, emcStatus and emcError
[21:26:42] <mgouget> I send a request, through nml, to *someone*; is this someone in the directory "src/emc/task" ?
[21:27:01] <alex_joni> milltask (aka emctaskmain.cc & co) is the one that receives emcCommand, and sets emcStatus & emcError
[21:27:06] <SWPadnos> the receiver could be anywhere
[21:27:17] <alex_joni> mgouget: the receiver is src/emc/task
[21:27:27] <alex_joni> SWPadnos: software-wise (logic)
[21:27:37] <SWPadnos> ah
[21:27:50] <mgouget> Ok, so I have to instrument it, adding logs...
[21:27:50] <alex_joni> mgouget: look at emc.nml
[21:29:38] <mgouget> I have looked, it is the "plumbing" for nml
[21:30:51] <mgouget> seems like "emcsvr.cc" is interesting...
[21:31:45] <alex_joni> hmm, I read something in the docs just now
[21:32:11] <alex_joni> there is the MP config option for buffers
[21:32:24] <alex_joni> that says how many processes can connect to a buffer
[21:32:25] <mgouget> ah ah...
[21:32:30] <alex_joni> emcCommand has 16 defined
[21:32:46] <alex_joni> but both xemc and keystick use the 'cnum' value of 10
[21:32:56] <alex_joni> I would try this:
[21:33:02] <alex_joni> add your own process list:
[21:33:20] <alex_joni> P myproc emcCommand LOCAL localhost W 0 10.0 0 11
[21:33:26] <alex_joni> (with 'cnum' = 11)
[21:33:51] <mgouget> ??? sorry?
[21:34:05] <alex_joni> mgouget: you have the testnml.c file
[21:34:07] <alex_joni> right?
[21:34:14] <mgouget> yes
[21:34:16] <alex_joni> it's local to the rest of emc (I assume)
[21:34:21] <alex_joni> for now..
[21:34:25] <mgouget> yes
[21:34:47] <alex_joni> ok, right now both mini and testnml connect to the buffers using the process definition called 'xemc'
[21:34:56] <mgouget> yes
[21:35:14] <alex_joni> I suppose/hope that is wrong :)
[21:35:44] <alex_joni> and I hope that by connecting using a different process definition you won't have these problems
[21:35:56] <alex_joni> so we need to edit emc.nml, to add another process definition
[21:36:06] <alex_joni> then change testnml.c to use the new process definition
[21:37:26] <alex_joni> mgouget: http://pastebin.ca/722264
[21:37:38] <mgouget> Ok, understood, I change the emc.nml in the configs/sim directory
[21:38:18] <alex_joni> * alex_joni wants to know if this works
[21:38:27] <alex_joni> this could mean we will change emc.nml before 2.2.0
[21:38:49] <alex_joni> (have a line for each GUI, and have them all fixed to read their line, not always 'xemc')
[21:39:41] <mgouget> Not everything is perfectly clear, but it looks like I have at least a pointer :))
[21:39:49] <alex_joni> hmm.. I see that in emc1 emcJavaGui had cnum=8, keystick cnum=9, xemc cnum=10
[21:41:08] <SWPadnos> wouldn't they still share the same status channel though?
[21:41:35] <alex_joni> SWPadnos: they still use the same channels
[21:41:51] <alex_joni> they just have different ID's (no idea what that does though)
[21:42:04] <SWPadnos> hmmm
[21:42:07] <SWPadnos> dunno either :)
[21:42:26] <alex_joni> oh, "c_num is the connection number for this process. c_num must be between 0 and n-1 where n is the number of total_connections specified on the buffer line. Update February-2005: This is not used unless the BufferType is GLOBMEM or mutex=mao is specified and autocnum is not specified."
[21:44:03] <alex_joni> mgouget: if that doesn't work, another thing to try is adding 'queue' to the 'B emcStatus SHMEM localhost 10240 0 0 2 16 1002 TCP=5005 xdr' line
[21:45:09] <mgouget> ok;
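The process line alex_joni proposes, and the c_num rule he quotes from the docs, can be checked with a small parser. This is a minimal sketch under assumptions: the field order (name, buffer, type, host, ops, server, timeout, master, c_num) is assumed for the P line shown in the log, numeric server/timeout/master fields are assumed, and `process_line`, `parse_process_line` and `c_num_in_range` are hypothetical names that are not part of libnml.

```c
#include <stdio.h>

/* Hypothetical view of one "P ..." line from an NML config file.
 * Field order is an assumption based on the line shown in the log. */
typedef struct {
    char name[32];
    char buffer[32];
    char type[16];
    char host[64];
    char ops[8];
    int server;
    double timeout;
    int master;
    int c_num;
} process_line;

/* Returns 0 if `line` parses as a P line with all nine fields, else -1. */
int parse_process_line(const char *line, process_line *out)
{
    int n = sscanf(line, "P %31s %31s %15s %63s %7s %d %lf %d %d",
                   out->name, out->buffer, out->type, out->host, out->ops,
                   &out->server, &out->timeout, &out->master, &out->c_num);
    return n == 9 ? 0 : -1;
}

/* The quoted rule: c_num must be between 0 and n-1, where n is the
 * total_connections value on the buffer line. */
int c_num_in_range(int c_num, int total_connections)
{
    return c_num >= 0 && c_num < total_connections;
}
```

With the line from the log, c_num parses as 11, which is within range for a buffer allowing 16 connections. As the quoted documentation notes, c_num may not be used at all depending on the buffer type and mutex options.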
[21:48:53] <alex_joni> there are some additional docs here: http://www.isd.mel.nist.gov/projects/rcslib/
[21:48:55] <mgouget> Ok, changed xemc to myproc in testnml, added lines in emc.nml, and started. It works exactly as before.
[21:51:12] <alex_joni> :/
[21:51:17] <alex_joni> try the queue now
[21:51:26] <alex_joni> you don't need to recompile anything for that
[21:51:28] <mgouget> doing it
[21:53:30] <mgouget> Nothing works at all, I just get an ERROR status each time :(
[21:53:49] <mgouget> I have restarted mini before.
[21:54:03] <alex_joni> oopsy :P
[21:54:16] <mgouget> and BTW mini does not work either.
[21:55:18] <mgouget> BUT I got this msg in mini:
[21:55:23] <mgouget> libnml/cms/cms_in.cc 1484: CMS: emcStatus message queue is full.
[21:55:23] <mgouget> libnml/cms/cms_in.cc 1485: (continued) CMS: Message requires 8336 bytes but only 0 bytes are left.
[21:55:47] <alex_joni> well.. that's about what I had for you
[21:55:57] <alex_joni> mgouget: I'm sorry, not sure what to advise you next
[21:56:15] <alex_joni> you could try to email will shackleford
[21:56:25] <alex_joni> he wrote the stuff, so he should know
[21:56:36] <mgouget> Thanks, at least, I have some pointers, and the *uncertainty* that it *might* be possible :) :)
[21:56:38] <alex_joni> http://www.isd.mel.nist.gov/projects/rcslib/NMLcfg.html <- email at the end of the page
[21:56:59] <alex_joni> * alex_joni goes to bed
[21:57:02] <alex_joni> good night all
[21:57:19] <mgouget> good night in sweden, and THANKS
[21:57:35] <alex_joni> too bad it didn't help
[22:08:07] <mgouget> Bye!
[22:42:41] <jepler> yay, home again
[22:43:22] <SWPadnos> yay
[22:43:35] <SWPadnos> can you try checking out/updating magma CVS?
[22:43:53] <SWPadnos> it seems their server is down
[22:47:44] <jepler> SWPadnos: I'll try
[22:48:00] <SWPadnos> ok - whenever. just want to see if it's on my end
[22:49:09] <jepler> cvs [checkout aborted]: unrecognized auth response from cvs.gna.org: This service is temporarily unavailable. Please try later.
[22:49:22] <SWPadnos> ok, it's not just me then :) thanks
[22:49:56] <jepler> actually I just ran 'cvs co' again and I'm getting files
[22:50:01] <jepler> so either it's intermittent or it just came back?
[22:50:12] <SWPadnos> hmm
[22:51:20] <SWPadnos> ok. still not working here
[22:52:58] <jepler> after that checkout completed, I got the 'unavailable' response to 4 attempts to 'cvs up'
[23:26:56] <jepler> while case $# in 0) break ;; esac
[23:26:58] <jepler> do
[23:27:05] <jepler> </clevar bash scripts>
[23:48:05] <SWPadnos> wow - it's working. yay!
[23:48:15] <SWPadnos> (after only 372 or so attempts)
[23:48:54] <SWPadnos> and it seemed really fast
[23:50:38] <SWPadnos> hmmm
[23:53:43] <LawrenceG> in a cvs make, sudo make install, the library ~/emc2-trunk/tcl/hal.so does not get copied into /usr/share/emc/tcl/hal.so which breaks machine/show hal config command
[23:54:01] <LawrenceG> worth a look before 2.2
[23:54:32] <jmkasunich__> make install has a lot of issues, we've been kind of ignoring it
[23:54:43] <jmkasunich__> I suppose we should try to fix it before 2.2
[23:55:11] <jmkasunich__> most users use packages, most devs do run-in-place
[23:55:26] <LawrenceG> I have several machines running very recent cvs versions so I can look for issues before 2.2
[23:55:39] <LawrenceG> just getting a lathe package going
[23:55:39] <jmkasunich__> thanks
[23:56:01] <LawrenceG> my homemade spindle encoder seems to work a treat
[23:56:08] <jmkasunich__> you have the shoptask, right?
[23:56:11] <LawrenceG> yes
[23:56:14] <jmkasunich__> I saw that encoder, pretty cool
[23:56:19] <jmkasunich__> (I have a shoptask too)
[23:56:36] <LawrenceG> I have cut air with the threading program
[23:56:51] <jmkasunich__> air sure machines nice
[23:57:08] <jmkasunich__> the chips always break just so, and are easy to clean up
[23:57:16] <LawrenceG> need to fine tune the .ini file a bit..... yes air makes for long tool life
[23:58:27] <LawrenceG> I ran the lathe at 250, 650 and 950 rpm and encoder signals look great in halscope and on a real scope
[23:59:36] <LawrenceG> the only bitch will be changing belts.... will require removing and recalibrating slot encoders
[23:59:51] <jmkasunich__> ouch