#emc-devel | Logs for 2008-03-19

[02:29:28] <elson> hello, all!
[02:29:54] <SWPadnos> hmmm. jmkasunich - there are two places where a "realtime error" can be printed. one does it once, the other will do 'something' 10 times
[02:30:02] <SWPadnos> hi elson
[02:30:19] <elson> I've got a "twilight zone" situation, which I'll send to the mail list, but I need to run latency-test, and it gets an error.
[02:30:21] <SWPadnos> how's life in MO?
[02:30:22] <elson> Hi, Steve!
[02:30:32] <elson> In MO, right now, it is WET!
[02:30:37] <SWPadnos> heh
[02:30:54] <SWPadnos> that could be better (or worse) than COLD! like we have
[02:30:56] <jmkasunich> SWPadnos: I think the one most people are seeing only once is the one that only appears once
[02:31:04] <SWPadnos> that may be so
[02:31:10] <jmkasunich> and they are jumping to the wrong conclusion
[02:31:48] <elson> Yeah, well, that's MY "twilight Zone", too. But, my system only does it when COLD??!!
[02:31:52] <SWPadnos> I had taken a look at one of them, but didn't see where it could be delayed to avoid printing errors if there's a startup glitch
[02:32:19] <SWPadnos> got a sticky axis?
[02:32:35] <elson> No, that won't cause a realtime delay!
[02:32:38] <SWPadnos> heh
[02:32:57] <SWPadnos> oh - so you get a RT delay problem when the PC is cold?
[02:33:46] <elson> I just updated to the trunk, as of Sunday, and now I am getting realtime delays which I never got before, and at the same time, one or more axes get their hardware encoder count trashed. But, only when cold? HuH???
[02:34:04] <SWPadnos> I agree
[02:34:05] <elson> Yeah, I am STUMPED by that one!
[02:34:07] <SWPadnos> Huh?
[02:34:10] <SWPadnos> :)
[02:34:31] <SWPadnos> so if you turn on the PC and wait a while before starting EMC, it's fine?
[02:34:54] <SWPadnos> or is it the external hardware that needs to warm up?
[02:35:10] <elson> OK, I can see one possibility, an EPP/1284 port timeout. Something throws off the timing, and these 10 us timeouts bog down the PPMC driver.
[02:35:44] <elson> Yeah, COULD be external hdw, that's why I want to run the latency test and clear or indict the computer.
[02:36:13] <SWPadnos> I can see something if there's a problem that's on a threshold - cold caps might pull the oscillator a little one way or the other
[02:36:27] <SWPadnos> but you'd have to be really close to the margin for that to be a problem
[02:36:47] <elson> Anyway, the latency-test program gives "Line 136: halrun: command not found" but the file is clearly there in the scripts dir.
[02:37:03] <SWPadnos> did you source emc-environment?
[02:37:44] <SWPadnos> (I'm assuming run-in-place for the CVS checkout, but that there's an installed version as well)
[02:38:03] <elson> Well, if you don't clear the timeout every bus cycle, then it can stay timed-out for a while. Flipping the bus direction has code to reset the timeout bit, otherwise I don't test it, as it slows everything down a lot.
[02:38:55] <SWPadnos> is this a USC/UPC system or PPMC?
[02:39:06] <elson> Ummm, yeah, I probably forgot a step. I did the CVS checkout, ./configure --enable-run-in-place,
[02:39:15] <elson> then make, and make setuid
[02:39:24] <SWPadnos> `. scripts/emc-environment`
[02:39:37] <elson> It is the PPMC on my Bridgeport.
[02:39:42] <SWPadnos> then stuff will work in that shell
[02:39:50] <SWPadnos> do that in any shell you want to use with that checkout
[02:40:09] <elson> OK, I will try that, thanks, Steve!
[02:40:16] <SWPadnos> hmmm - ok. that makes it a little harder to do the hairdryer / cold spray trick ;)
[02:41:29] <elson> I thought this might have been an ESD problem for a while, but it is practically raining INSIDE now and still doing it. So, not likely.
[02:41:59] <SWPadnos> what kind of temperatures are you talking about for it to work/not work?
[02:42:30] <elson> I let it run for about 5 hours Sunday evening without failure, after about 3 failures in the first ten minutes after powering up.
[02:42:43] <elson> Something around 65 F.
[02:43:28] <elson> It is still about 65 F, but the equipment warms itself just a little. Probably the computer warms up inside a LOT more than the PPMC card cage.
[02:44:13] <SWPadnos> do you have separate control of power to the PC vs. the PPMC?
[02:44:33] <elson> Yes, I can pull 120 V plugs and power in any sequence.
[02:44:52] <SWPadnos> ok, that suggests a plan of attack at any rate
[02:45:31] <SWPadnos> ice packs on the PPMC, turn on the PC for 20 minutes or so (run glxgears or do repetitive kernel compiles just to heat it up faster)
[02:45:36] <elson> Well, I wanted to put any real time problems to rest first. This machine has on-mobo video, and that may be a problem.
[02:46:16] <SWPadnos> hmmm. I wonder if there's an overrun count somewhere in HAL
[02:46:42] <elson> This thing is quirky and intermittent enough as to make proving I have fixed it very tough.
[02:47:50] <elson> Oh, the overrun numbers I got in the dmesg were "normal 715757-724311, last time 1054259". Does that mean 1.05 ms?
[02:48:20] <SWPadnos> there's a parameter called motion.servo.overruns
[02:48:37] <SWPadnos> it gets incremented any time there's an overrun, so you can look at that with halscope
[02:48:38] <elson> Oh, that sounds interesting!
[02:49:39] <SWPadnos> it may go up by 5 each time though - there's a rolling buffer of 5 periods and it may trigger every time the anomaly is in the buffer
[02:50:07] <SWPadnos> (I'm not looking at the code carefully enough to tell at the moment :) )
[02:50:43] <SWPadnos> those numbers are clock cycles, so it's 1.05 ms if your CPU clock is 1GHz
[02:52:02] <elson> Ah, so on a 600 MHz machine, it is WORSE! Almost 2 ms! But wait, what about the 715xxx number. Or, is that the count of a full ms period at a 600 MHz clock?
[02:52:15] <SWPadnos> there are also motion.servo.last-period and if CPU_KHZ is available, motion.servo.last-period-ns
[02:52:35] <SWPadnos> that sounds high, by about 115k
[02:52:50] <SWPadnos> you sure it's not a 733?
[02:54:10] <elson> Very good, I just logged in and checked, it is indeed a 730 MHz CPU.
[02:54:41] <SWPadnos> heh
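The cycle-count arithmetic discussed above can be sketched in a few lines of Python. This is only an illustration: the 733 MHz clock is SWPadnos's guess from the log, the 1 ms servo period is an assumption (it is not stated in the conversation), and the function names are made up for this example.

```python
# Convert CPU cycle counts (as printed in dmesg) into time.
# Assumptions: fixed CPU clock; 1 ms servo period (not stated in the log).

def cycles_to_ms(cycles, cpu_hz):
    """Duration in milliseconds of `cycles` ticks of a clock running at `cpu_hz`."""
    return cycles / cpu_hz * 1000.0

def implied_cpu_mhz(cycles_per_period, period_s=0.001):
    """CPU clock in MHz implied by a cycle count, if the period is exactly `period_s`."""
    return cycles_per_period / period_s / 1e6

NORMAL_LOW, NORMAL_HIGH, LAST = 715757, 724311, 1054259  # numbers from the log

# How long was the "last time" period at different clock speeds?
for mhz in (1000, 733, 600):
    print(mhz, "MHz:", round(cycles_to_ms(LAST, mhz * 1e6), 3), "ms")

# What clock would the "normal" range imply for a 1 ms period?
print(round(implied_cpu_mhz(NORMAL_LOW), 1), "to",
      round(implied_cpu_mhz(NORMAL_HIGH), 1), "MHz")
```

At 1 GHz the long period works out to about 1.05 ms, at 733 MHz to about 1.44 ms, and at 600 MHz to about 1.76 ms, which is why the "normal" count of roughly 716k to 724k looked too high for a 600 MHz machine.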
[02:55:17] <elson> OK, well, I need to go check a few things and start up the latency test.
[02:55:37] <SWPadnos> if you start EMC, you'll get that count of overruns
[02:55:52] <elson> Thanks for the info. I'll send a note in the mail list when I get something figured out.
[02:55:53] <SWPadnos> just stick a halmeter on it and do the things you'd do during a latency test
[02:56:02] <elson> Yes.
[02:56:03] <SWPadnos> cool. good luck chasing the ghost
[02:56:18] <elson> Boy, they just get stranger and stranger!
[02:56:21] <SWPadnos> heh
[02:56:26] <SWPadnos> that's a good sign, usually
[02:56:36] <SWPadnos> (means all the easy stuff is already figured out)
[02:56:39] <elson> Unless you're the one who has to find it!
[02:56:50] <SWPadnos> don't forget dewpoint!
[02:56:52] <SWPadnos> :)
[02:57:35] <elson> Yeah, that occurred to me, but I don't THINK that's it. Dewpoint is high now, was low last week. Both times did it cold.
[03:03:42] <cradek> I wonder if elson ever saw jmk's blog post
[03:04:00] <cradek> I bet that's one of the more intricate setups using ppmc
[03:17:59] <SWPadnos> I wonder if JonE has been to Stuart's shop?
[03:18:36] <cradek> not to my knowledge (I'd probably know)
[03:18:41] <SWPadnos> heh
[03:20:03] <cradek> man is it only tuesday?
[03:20:10] <cradek> I can't believe it
[03:20:17] <SWPadnos> yeah - pretty amazing
[03:20:25] <SWPadnos> it feels like Thursday or Friday already
[03:20:33] <cradek> I agree
[03:26:59] <fenn> the earth must have passed through a time portal
[03:27:26] <fenn> or maybe it's the whole spring break thing
[03:27:30] <cradek> or I stayed up too late last night (seems more likely the problem)
[03:27:46] <cradek> hmm, maybe I should go to bed.
[03:29:10] <SWPadnos> heh - but it's early there!
[12:12:37] <Guest532> Guest532 is now known as skunkworks_
[12:49:06] <jepler> hi skunkworks_
[12:59:54] <jepler> An error occurred. Dammit. Error was: You have to choose something to select by.
[13:01:12] <skunkworks_> Hi jepler.
[13:01:56] <skunkworks_> I thought the overrun popup said something to the effect that 'all other errors will be suppressed..'
[13:02:02] <cradek_> cradek_ is now known as cradek
[13:02:14] <skunkworks_> Am I thinking of a different error?
[13:02:31] <skunkworks_> (I don't have emc on anything here right now)
[13:05:36] <cradek> there are now two errors. I think one of them (the older one) says that
[13:08:28] <jepler> in fact, the new error is printed once at RTAPI_MSG_ERR and 9 more times at RTAPI_MSG_WARN, so it will appear the same from the user's viewpoint (axis only pops up one dialog per run)
[13:43:15] <cradek> it seems like a lot of people get it (and ignore it)
[13:43:50] <skunkworks_> and wonder why they have stalling/smoothness issues.
[13:45:23] <cradek> it's really hard to know how much RT problem is too much RT problem
[13:52:00] <alex_joni> right.. emc1 had 100% overrun before reporting anything
[13:52:01] <alex_joni> iirc
[13:54:03] <cradek> if (emcmotDebug->cur_time - emcmotDebug->last_time > 10 * emcmotConfig->servoCycleTime) { reportError("controller missed realtime deadline."); IGNORE_REALTIME_ERRORS = 1; }
[13:54:07] <cradek> ^^ emc1
[13:54:34] <cradek> it had to miss more than ten servo cycles to report a problem!
[13:54:40] <alex_joni> ouch :/
[13:54:52] <cradek> (I would get this error on my laptop)
[13:54:53] <alex_joni> that's what? 1k pulses?
[13:55:08] <cradek> what do you mean pulses? :-)
[13:55:13] <alex_joni> steps
[13:55:21] <cradek> steps? I still don't understand
[13:55:47] <cradek> (I bet when that line was written there were no steps)
[13:55:54] <alex_joni> if 10 servo cycles have passed, about 50 times more base_period cycles have passed
[13:56:09] <alex_joni> so 250 lost steps before it reports
[14:00:40] <skunkworks_> bah - you would hardly notice ;)
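alex_joni's estimate can be reproduced with a short calculation. This is a sketch of one plausible reading, not something confirmed in the log: it assumes each step pulse occupies two base periods (one high, one low, i.e. no doublestep), which is what turns 500 missed base periods into 250 steps. The function name and parameter names are hypothetical.

```python
# Rough count of step pulses that could be lost before the emc1 check fires.
# Assumption (not stated in the log): one step needs two base periods.

def missed_steps(servo_cycles_missed, base_per_servo, base_periods_per_step=2):
    """Steps that fit into the base periods spanned by the missed servo cycles."""
    base_periods = servo_cycles_missed * base_per_servo
    return base_periods // base_periods_per_step

# 10 servo cycles, about 50 base periods per servo cycle (figures from the log)
print(missed_steps(10, 50))  # 250
```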
[14:26:19] <skunkworks_> damn emc2 sucks http://lists.ourproject.org/pipermail/bdi4emc-help/2008-January/000359.html
[14:27:48] <alex_joni> skunkworks_: looking up long forgotten projects?
[14:28:31] <skunkworks_> Sometimes I get bored.. I like to keep up with the Joneses.
[14:30:13] <cradek> wow, a full 20% of the messages on that list in January were denigrating EMC2 (but I guess that's only 1 message)
[14:31:54] <skunkworks_> Heh
[14:32:14] <skunkworks_> my 866 only got 40khz pulse rate.
[14:32:25] <alex_joni> with doublestep?
[14:32:29] <skunkworks_> :) yes
[14:33:00] <skunkworks_> that was safe - I might have been able to get maybe 45 but 50 was throwing rt errors
[14:33:27] <skunkworks_> after running for a while.
[14:34:07] <skunkworks_> that was this http://www.electronicsam.com/images/866mhzyum.JPG
[14:35:15] <skunkworks_> I think that is more than enough for most machines..
[14:35:40] <cradek> that looks like .25us, which is only 4kHz?
[14:36:46] <alex_joni> bbl
[14:37:33] <skunkworks_> time base is set to 10us*2.5 = 25us = 40khz
[14:37:48] <skunkworks_> did I do that right?
[14:38:19] <cradek> doh, I was looking at the wrong mark on the knob
[14:38:47] <cradek> AND my math was wrong
[14:38:52] <cradek> wow, what a massive failure
[14:38:55] <skunkworks_> heh :) that's ok
[14:40:27] <skunkworks_> that would make our z axis be able to run at 240ipm (input scale of 10000) not that it would go that fast.
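The scope reading and the feed-rate figure above follow from two small formulas, sketched here in Python. The numbers (10 us/div timebase, 2.5 divisions, input scale of 10000 steps per inch) come from the log; the function names are invented for this example.

```python
# From scope period to step rate to maximum feed.

def step_rate_hz(period_us):
    """Pulse frequency in Hz for a pulse period given in microseconds."""
    return 1e6 / period_us

def max_feed_ipm(rate_hz, steps_per_inch):
    """Feed in inches per minute reachable at `rate_hz` steps per second."""
    return rate_hz / steps_per_inch * 60.0

period_us = 10 * 2.5               # 10 us/div timebase, 2.5 divisions per pulse
rate = step_rate_hz(period_us)     # 40 kHz
print(rate, max_feed_ipm(rate, 10000))  # 40000.0 240.0
```

For comparison, a 0.25 us period would correspond to 4 MHz rather than 4 kHz, which is the math slip cradek mentions.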
[15:23:38] <skunkworks_> I love how he says mach turns circles around emc2. He has a toxic personality.
[15:34:33] <SWPadnos> which "he"?
[15:37:29] <skunkworks_> Oh - I think we know Who 'he' is..
[15:37:33] <skunkworks_> *all
[15:37:50] <SWPadnos> heh - I guess I need to finish reading through the list or something :)
[15:40:36] <SWPadnos> hmmm. must be on cnczone or something
[15:40:43] <SWPadnos> (which I don't read)
[15:41:02] <skunkworks_> sorry - didn't actually post the link http://lists.ourproject.org/pipermail/bdi4emc-help/2007-July/000345.html
[15:41:09] <SWPadnos> oh, him :)
[15:48:03] <jepler> (hi paul)
[15:49:09] <skunkworks_> ;)
[15:52:16] <cradek> haha
[15:52:52] <SWPadnos> I'm sure it's true. 2.0.4 on BDI may actually be worse than mach :)
[15:53:15] <SWPadnos> I mean, it's on the web, so it *must* be true
[15:53:43] <cradek> "knocks the socks of" sounds like a euphemism
[15:53:55] <cradek> a relative of "knocks boots with" maybe?
[15:54:07] <SWPadnos> heh -"is hanging around the ankles of" ?
[15:58:00] <skunkworks_> hopping http://cia.vc/stats/project/tuxcnc
[15:58:09] <skunkworks_> * skunkworks_ shuts up now.
[19:38:52] <jepler> on hardy, dpkg-shlibdeps prints out lots of warnings
[19:38:52] <jepler> dpkg-shlibdeps: warning: debian/emc2/usr/bin/m5i20cfg shouldn't be linked with libgcc_s.so.1 (it uses none of its symbols).
[19:38:56] <jepler> dpkg-shlibdeps: warning: symbol _Z23SET_MOTION_CONTROL_MODEid used by debian/emc2/usr/lib/librs274.so found in none of the libraries.
[19:39:13] <jepler> I understand the librs274.so warnings, and they're not indicative of a problem
[19:39:22] <jepler> (the program which links librs274.so must define those functions)
[19:39:32] <jepler> I'm less sure about the "shouldn't be linked with" messages..
[19:39:37] <jepler> oh well
[20:00:38] <jepler> yuck
[20:00:39] <jepler> Starting EMC2...
[20:00:39] <jepler> alloc: invalid block: 0x7fc43ad24f78: 0 0
[20:00:39] <jepler> /usr/bin/emc: line 594: 9624 Aborted (core dumped) $EMCDISPLAY -ini $INIFILE $EMCDISPLAYARGS $EXTRA_ARGS
[20:00:42] <jepler> Shutting down and cleaning up EMC2...
[20:01:00] <cradek> ouch
[20:01:13] <fenn> try that with halrun or /etc/init.d/realtime and see if you get the same result?
[20:01:32] <jepler> that's in axis (I think) .. the basic realtime stuff worked fine
[20:03:22] <jepler> rtai latency test, emc latency test, emc test suite
[20:03:45] <jepler> how odd, "wish" programs on hardy get antialiased text, but python-tk programs don't.
[20:04:22] <fenn> maybe the library versions between axis and hal don't match (stale libraries)
[20:05:17] <fenn> oh, it's a tcl thing
[20:06:59] <fenn> "it seems that this problem arises from Tcl being
[20:07:00] <fenn> compiled without thread support."
[20:13:05] <jepler> 'alloc: invalid block' is a result of this?
[20:13:33] <fenn> well, alloc: invalid block is a tcl error
[20:13:43] <jepler> so it is
[20:33:47] <jepler> it might be due to the dodgy way that I create the togl Tcl package from within a Python module, too
[20:35:17] <jepler> at any rate, that's all I have to do to trigger it
[20:35:18] <jepler> $ python -c 'import Tkinter, _togl; t = Tkinter.Tk(); _togl.install(t)'
[20:35:18] <jepler> alloc: invalid block: 0x7fbe4d287f78: b0 0
[20:43:44] <jepler> aha
[20:43:44] <jepler> $ ldd /usr/lib/python2.5/lib-dynload/_tkinter.so | grep libtk8.
[20:43:44] <jepler> libtk8.4.so.0 => /usr/lib/libtk8.4.so.0 (0x00007ff27b37e000)
[20:43:49] <jepler> $ ldd lib/python/_togl.so | grep libtk8.
[20:43:49] <jepler> libtk8.5.so.0 => /usr/lib/libtk8.5.so.0 (0x00007fc9320f4000)
[20:44:03] <jepler> _togl and _tkinter are being linked with different versions of libtk
[20:44:48] <jepler> *facepalm*
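The check jepler did by hand with ldd and grep can be sketched as a small Python script. This is only an illustration: it is written for Python 3 (the log itself uses Python 2.5), the helper names are made up, and the two sample strings are the ldd lines quoted in the log. The `ldd()` helper is defined but not called here, so the example runs without the actual shared objects.

```python
import re
import subprocess

# Matches "libtk8.4.so", "libtk8.5.so", ... and captures the version number.
LIBTK_RE = re.compile(r"libtk(\d+\.\d+)\.so")

def libtk_versions(ldd_output):
    """Return the set of libtk versions named in a piece of ldd output."""
    return set(LIBTK_RE.findall(ldd_output))

def ldd(path):
    """Run ldd on a shared object and return its text output."""
    return subprocess.run(["ldd", path], capture_output=True,
                          text=True, check=True).stdout

# ldd lines copied from the log
TKINTER_LDD = "libtk8.4.so.0 => /usr/lib/libtk8.4.so.0 (0x00007ff27b37e000)"
TOGL_LDD = "libtk8.5.so.0 => /usr/lib/libtk8.5.so.0 (0x00007fc9320f4000)"

tkinter_tk = libtk_versions(TKINTER_LDD)
togl_tk = libtk_versions(TOGL_LDD)
if tkinter_tk != togl_tk:
    print("libtk mismatch:", sorted(tkinter_tk), "vs", sorted(togl_tk))
else:
    print("both modules link the same libtk:", sorted(tkinter_tk))
```

On a real system one would pass the paths from the log to `ldd()` and feed the results to `libtk_versions()`; two extension modules loaded into the same interpreter but linked against different libtk versions is the situation that produced the "alloc: invalid block" abort here.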
[20:45:56] <fenn> * fenn prances
[20:46:18] <jepler> what are you happy about? something besides my misfortune, I hope.
[20:46:31] <fenn> i was on the right track
[20:46:38] <jepler> yeah, pointing me at tcl helped
[20:46:52] <fenn> and it was stale libraries after all
[20:53:30] <jepler> removing {tcl,tk}8.5-dev and installing 8.4-dev seems to fix it..
[20:56:09] <jepler> yay axis runs
[20:56:18] <jepler> framerate is not great running on the wrong side of DSL :-P
[21:09:57] <skunkworks_> even a broken clock is right twice a day... ;)