#emc-devel | Logs for 2008-11-24

[00:26:01] <jepler> alex_joni: I glanced at it but didn't pay much attention -- is that the combination we have on hardy?
[00:26:53] <jepler> or is it only on ibex
[00:27:47] <jepler> looks like hardy is on 4.2, not 4.3
[00:27:54] <jepler> so I'll skip worrying about this one
[02:26:32] <jmkasunich> jepler: cradek: either of you here?
[02:29:07] <jmkasunich> I'm planning to revert much of my hacking about in hal_lib and a couple other places (alias stuff)
[02:29:51] <jmkasunich> from what I can see, "revert" really means "do an ordinary commit, after making the working copy match the version you are reverting to"
[02:29:51] <cradek> I'm here
[02:30:05] <cradek> yes that's about all you can do
[02:30:18] <jmkasunich> ok
[02:30:24] <cradek> did it go wrong?
[02:30:35] <jmkasunich> the approach I was using was stupid
[02:30:36] <SWPadnos> I had a couple of questions about completion and unalias
[02:30:38] <SWPadnos> heh
[02:30:48] <jmkasunich> I only realised it when I saw how many other things needed to change
[02:31:05] <cradek> ah
[02:31:08] <jmkasunich> the new plan is to actually change the name stored in the pin, and store the old name elsewhere
[02:31:27] <jmkasunich> that way everybody that needs a pin list (halcmd show, scope, meter, etc) can just do what they've always done
[02:32:02] <SWPadnos> in this case, the alias masks the original name for most operations
[02:32:10] <jmkasunich> the only code that needs to even know aliases exist is in hal_lib, and in halcmd show alias
[02:32:12] <cradek> ah, that sounds simpler
[02:32:13] <SWPadnos> (comment, not criticism)
[02:32:24] <SWPadnos> also the completion code for unalias I think
[02:32:39] <cradek> so you don't really need to revert, you just are going to continue with a different scheme
[02:32:40] <jmkasunich> yeah
[02:32:53] <jmkasunich> well, I committed a couple days of work on the other scheme
[02:33:07] <SWPadnos> but find_pin_by_name still has to match either the original or the alias
[02:33:16] <jmkasunich> I want to back up, and note it as such in the logs
[02:33:21] <jmkasunich> SWPadnos: yes, and it will
[02:33:25] <jmkasunich> that is part of hal_lib
[02:33:50] <SWPadnos> ok, so unalias is the only place where the dual-list merge has to be done (assuming that unalias should match either)
[02:34:09] <jmkasunich> I see no reason to do the dual-list merge
[02:34:26] <SWPadnos> or just leave the completions out of order, and check both the originals and aliases in the one run through the list
[02:34:30] <jmkasunich> unless the completion library needs the list to be sorted
[02:34:33] <jmkasunich> right
[02:34:49] <SWPadnos> I don't think it's necessary for it to be sorted, but I think it's better for the user when it is
[02:35:25] <SWPadnos> and the completion function isn't a loop, it's called repeatedly (like strtok) and returns one name at a time
[02:35:36] <jmkasunich> eww
[02:35:38] <SWPadnos> yeah
[02:35:53] <SWPadnos> there's a flag that gets passed in when it's a new completion set
[02:35:57] <jmkasunich> how the heck do we manage the mutex in that case?
[02:36:04] <SWPadnos> there's a global flag that gets set when the list is over
[02:36:08] <jmkasunich> I sure hope we aren't holding the mutex while the user is typing
[02:36:22] <SWPadnos> I don't know - maybe that was the thing I was going to look at :)
[02:36:34] <SWPadnos> completion only happens when you do the tab/double-tab
[02:36:47] <SWPadnos> not while typing (we don't have look-ahead, thank god)
[02:36:50] <jmkasunich> I see
[02:36:53] <SWPadnos> though it would be cool :)
[02:37:29] <jmkasunich> does the completion lib give us the already typed chars, and we return the next thing that matches, or do we just return the next thing period, and it does the matching?
[02:37:42] <SWPadnos> you only return matches
[02:37:57] <SWPadnos> it passes in the current match string
[02:38:07] <jmkasunich> what is the completion lib called anyway? I should google/manpage it
[02:38:11] <SWPadnos> err
[02:38:17] <jepler> readline
[02:38:18] <SWPadnos> readline?
[02:39:10] <jepler> if you're writing a function like 'pin_generator', you return full names, but only ones that match the prefix you're given
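The strtok-like calling convention described above can be sketched as a readline generator. readline calls it with state == 0 at the start of each new completion set and a nonzero state on later calls, and it expects one malloc()ed full name per call, then NULL when there are no more matches. The name list and the body of `pin_generator` are hypothetical (only the function name comes from the log), and readline itself is not linked, so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the pin list that halcmd would walk. */
static const char *pin_names[] = {
    "axis.0.motor-pos-cmd",
    "axis.0.motor-pos-fb",
    "axis.1.motor-pos-cmd",
    "iocontrol.0.emc-enable-in",
    NULL
};

/* readline frees each returned string, so hand back a heap copy. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Same shape as readline's rl_compentry_func_t:
 *   char *generator(const char *text, int state)
 * Each call resumes where the previous one stopped and returns the
 * next full name that starts with `text`, or NULL when done. */
char *pin_generator(const char *text, int state)
{
    static size_t next;  /* index of the next name to examine */
    static size_t len;   /* length of the prefix being completed */

    if (state == 0) {    /* first call for a new completion set */
        next = 0;
        len = strlen(text);
    }
    while (pin_names[next] != NULL) {
        const char *name = pin_names[next++];
        if (strncmp(name, text, len) == 0)
            return dup_string(name);  /* prefix matched: return full name */
    }
    return NULL;         /* no more matches */
}
```

Since readline sorts the returned matches itself by default (SWPadnos confirms this a few lines further down), the generator can simply return names in list order.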
[02:39:42] <SWPadnos> there's an option to sort horizontally instead of vertically
[02:39:50] <SWPadnos> that implies to me that the returns are sorted by readline
[02:39:56] <jepler> yes I think that's true
[02:40:02] <SWPadnos> I guess I could test
[02:43:11] <jepler> cradek: looks like we need 6-rotary-axis support to keep up with the latest uses of servo motors. can you have it finished by wednesday? http://hackaday.com/2008/11/23/cubear-berkeleys-rubiks-cube-solver/
[02:44:22] <cradek> sure, we'll have to call them !,@,#
[02:44:33] <SWPadnos> yes, it does sort (yay!)
[02:46:19] <jepler> cradek: # is used already
[02:46:28] <jepler> I propose `!@
[02:46:40] <cradek> oh, #, right
[02:46:48] <cradek> %^&
[02:46:57] <cradek> ha, ` would be lovely
[02:47:01] <jepler> we could upgrade the interpreter to unicode and use circled-A, circled-B, circled-C
[02:48:39] <SWPadnos> just switch to greek, alpha beta gamma
[02:49:10] <SWPadnos> which is even better since gamma douesn't start with a "c"
[02:49:14] <fenn> just use a different IP address for each axis
[02:49:14] <SWPadnos> doesn't
[02:49:21] <SWPadnos> IPV6 of course
[02:49:24] <fenn> of course
[02:50:17] <jepler> http://blogs.sun.com/jbeck/date/20041001#rm_rf_protection
[02:51:00] <jepler> pah, standards people
[02:51:21] <jepler> pah, crazy people who care what standards say
[02:51:37] <jepler> hi garage_seb
[02:52:21] <garage_seb> hi jepler
[02:52:33] <cradek> hi seb
[02:52:41] <cradek> I talked a lot earlier to your invisible self
[02:52:42] <garage_seb> dude i found some sweet music
[02:53:28] <garage_seb> http://www.archive.org/details/superjam2008-06-13.mk4
[02:53:41] <jepler> what's an mk4 file?
[02:53:45] <garage_seb> les claypool and gogol bordelly doing tom waits covers
[02:54:15] <garage_seb> cradek: i'll go ask my invisible self what you said, hold on
[02:54:29] <cradek> tell him hi for me
[02:59:04] <SWPadnos> jmkasunich, how will the alias list work now? will "alias" in the pin struct be the SHM_OFFSET of the actual alias string?
[02:59:23] <SWPadnos> err - alias struct that is
[02:59:34] <garage_seb> cradek: i bet it's an hm2 bug, the invisible seb is always messing up my stuff
[02:59:57] <jmkasunich> no
[03:00:03] <jmkasunich> name in the pin struct will contain the alias
[03:00:15] <jmkasunich> alias in the pin struct will point to a struct that contains the old name
[03:00:17] <SWPadnos> sure, I'm wondering where the other name will be :)
[03:00:17] <cradek> haha
[03:00:40] <jmkasunich> don't code anything right now, let me get my shit straight first
[03:00:50] <SWPadnos> ok
[03:00:58] <jmkasunich> I might wind up calling that struct field "old_name" instead of alias
[03:01:01] <jmkasunich> or something
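A sketch of the pin layout jmkasunich outlines: `name` holds the current name (the alias, once one is set), and a separate field points to a record holding the original name, with a null value meaning "no alias". The type names, `NAME_LEN`, and `find_pin_by_either_name` are hypothetical; the real hal_priv.h structures differ (HAL links its shared-memory lists through offsets rather than plain pointers).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 41   /* assumed name size for this sketch */

/* Record holding the name a pin had before it was aliased. */
typedef struct {
    char name[NAME_LEN + 1];
} oldname_t;

/* Pared-down pin record with only the naming fields. */
typedef struct pin {
    struct pin *next;            /* list link, sorted by current name */
    oldname_t *oldname;          /* NULL means the pin has no alias */
    char name[NAME_LEN + 1];     /* current name: the alias, if set */
} pin_t;

/* Lookup that accepts either the current name or the original one.
 * Code that only walks the list and prints p->name (show, scope,
 * meter) sees the alias without knowing aliases exist. */
pin_t *find_pin_by_either_name(pin_t *list, const char *name)
{
    for (pin_t *p = list; p != NULL; p = p->next) {
        if (strcmp(p->name, name) == 0)
            return p;
        if (p->oldname != NULL && strcmp(p->oldname->name, name) == 0)
            return p;
    }
    return NULL;
}
```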
[03:01:07] <SWPadnos> whichever
[03:01:33] <SWPadnos> I think completion needs to complete both names, so I can get some of that framework done at least
[03:01:49] <SWPadnos> (tracking whether the alias has been returned or not, whether there is one ...)
[03:02:00] <jmkasunich> right now I'm just trying to get back to a good starting point
[03:02:21] <SWPadnos> ok, don't let me stop you
[03:02:23] <SWPadnos> :)
[03:02:41] <jmkasunich> well, I'm preparing a commit that will remove most of my work and some of yours
[03:02:57] <jmkasunich> then I'll get the few pieces I want to keep from cvs and add them back
[03:03:00] <jmkasunich> and go on from there
[03:03:08] <SWPadnos> ok - there wasn't much there from me other than some text string arrays and stuff
[03:06:02] <SWPadnos> garage_seb, did I understand correctly that the minimum unit of travel for the stepgen quadrature mode is one complete 4-phase quadrature cycle?
[03:06:16] <jmkasunich> I noticed that too - seems odd to me
[03:06:20] <SWPadnos> yeah
[03:06:26] <garage_seb> it is odd, and peter offered to change it
[03:06:38] <SWPadnos> one advantage is that you're always in phase when you turn the thing off then back on
[03:06:42] <SWPadnos> but it's weird
[03:06:50] <garage_seb> that was his reason - it was easy for him to implement
[03:07:04] <jmkasunich> it means you'll zoom thru four states very fast, even when the step rate is slow
[03:07:09] <garage_seb> jmkasunich: no
[03:07:09] <jmkasunich> that means you need fast optos, etc
[03:07:23] <jmkasunich> no?
[03:07:24] <SWPadnos> DLL-locked?
[03:07:37] <garage_seb> the hm2 stepgen has a 48-bit accumulator
[03:07:46] <garage_seb> it increments at a controllable frequency
[03:07:54] <garage_seb> when the bottom 32 bits overflow, it steps
[03:08:05] <SWPadnos> steps 1 phase?
[03:08:21] <garage_seb> for quadrature mode, bits 30 and 31 are converted to Gray code and emitted when they change
[03:08:25] <garage_seb> SWPadnos: yes
[03:08:35] <garage_seb> so it just multiplies your requested step rate by 4
[03:08:38] <SWPadnos> uh
[03:08:54] <jmkasunich> it should be using bits 32 and 33
[03:09:09] <SWPadnos> can it stop on anything other than 00?
[03:09:22] <garage_seb> SWPadnos: if you command 0 step rate, it'll stop wherever it is
[03:09:49] <SWPadnos> ok, so it sounds like jmkasunich has it right - the rate is wrong but it will move in non-x4 increments
[03:09:59] <garage_seb> the current driver wants it to move a certain number of "integral" steps, so it'll never command it to stop except when the quad output is 00
[03:10:14] <SWPadnos> ok, I get it now
[03:10:17] <garage_seb> right after the bottom 32 bits overflow into a "normal step"
[03:10:21] <SWPadnos> the driver should be changed then :)
[03:10:35] <SWPadnos> or the stepgen
[03:10:39] <garage_seb> it'd be easier to change the firmware to look at 32 and 33 i think
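To see why the bit choice matters, here is a small software model of the accumulator garage_seb describes: a 48-bit counter advanced by a rate value each clock, one step per overflow of the low 32 bits, and a quadrature state taken from two accumulator bits and Gray-coded. The function names and the rate values in the check are assumptions for illustration; this is not the hostmot2 firmware.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_MASK ((UINT64_C(1) << 48) - 1)   /* 48-bit accumulator */

/* Gray-code the two accumulator bits starting at bit `shift`. */
unsigned quad_state(uint64_t acc, unsigned shift)
{
    unsigned phase = (unsigned)((acc >> shift) & 3u);
    return phase ^ (phase >> 1);             /* 00, 01, 11, 10 sequence */
}

/* Run the accumulator for `ticks` clocks at `rate` counts per clock.
 * Reports how many whole steps (overflows out of bit 31) occurred and
 * how many quadrature output transitions the chosen bit pair produced. */
void run_stepgen(uint64_t rate, unsigned ticks, unsigned shift,
                 uint64_t *steps, unsigned *quad_edges)
{
    uint64_t acc = 0;
    unsigned last = quad_state(acc, shift);

    *quad_edges = 0;
    for (unsigned t = 0; t < ticks; t++) {
        acc = (acc + rate) & ACC_MASK;
        unsigned now = quad_state(acc, shift);
        if (now != last) {
            (*quad_edges)++;
            last = now;
        }
    }
    *steps = acc >> 32;   /* valid while the 48-bit counter has not wrapped */
}
```

With a rate of 2^28 for 32 clocks, two steps are commanded. Taking the quadrature state from bits 30 and 31 produces eight transitions (a full four-state cycle per step, the 4x multiplication noted above), while bits 32 and 33 produce exactly two, one state per step.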
[03:10:53] <SWPadnos> that sounds like the more correct fix
[03:10:54] <garage_seb> it just didnt seem important to me at the time, i'm more interested in encoder velocity right now
[03:11:09] <SWPadnos> have at it then - just wondering :)
[03:11:15] <garage_seb> i'll bring it up with peter after velocity estimation kicks ass ;-)
[03:11:21] <SWPadnos> good plan
[03:11:26] <CIA-38> EMC: jmkasunich TRUNK * emc2/src/hal/hal_priv.h: reverting to version 1.31 - the approach I was taking to 'alias' is stupid - starting over
[03:11:26] <CIA-38> EMC: jmkasunich TRUNK * emc2/src/hal/utils/halcmd.c: reverting to version 1.123 - the approach I was taking for 'alias' is stupid, starting over
[03:11:26] <CIA-38> EMC: jmkasunich TRUNK * emc2/src/hal/hal_lib.c: reverting to version 1.62 - the approach I was taking for 'alias' is stupid, starting over
[03:11:29] <CIA-38> EMC: jmkasunich TRUNK * emc2/src/hal/utils/halcmd_commands.c: reverting to version 1.35 - the approach I was taking for 'alias' is stupid, starting over
[03:12:25] <jmkasunich> there - I had to take out some stuff I'm gonna be putting right back in, but this compiles
[03:13:08] <SWPadnos> it almost seems that it wasn't too hard to do that
[03:13:39] <jmkasunich> I just fetched the old versions (from cvsweb) and plopped them into my working copy
[03:13:56] <jmkasunich> then queued up all four commits (since each message is unique)
[03:14:07] <SWPadnos> oh - how do you do that?
[03:14:36] <jmkasunich> four shells, then hit return in each one - that was just so the farm doesn't try to build a half-and-half mixture
[03:14:41] <SWPadnos> heh
[03:15:13] <SWPadnos> I was wondering what the CVS queue command was
[03:15:19] <jmkasunich> heh
[03:15:19] <SWPadnos> :)
[03:19:14] <cradek> garage_seb: do you think just reordering hm2_read is the answer?
[03:19:38] <garage_seb> that code needs a pretty major reorg
[03:19:44] <garage_seb> i'm working on it now
[03:19:46] <cradek> uh-oh
[03:19:48] <cradek> ok
[03:19:59] <garage_seb> i'm merging hm2_encoder_read into hm2_encoder_process_tram
[03:20:05] <garage_seb> all the information it needs will be available there
[03:20:11] <garage_seb> it'll be simpler
[03:20:24] <garage_seb> also i'm switching to a rawcounts scheme like the software encoder
[03:20:30] <garage_seb> should fix the vel blip on index
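A minimal sketch of the raw-count idea, assuming it works roughly like this: the hardware count is never reset, index only latches a new zero offset for position, and velocity is computed from raw-count differences, so rebasing position on index cannot show up as a velocity blip. It also clears index-enable in the same update that rebases position. The struct, its fields, and `enc_update` are hypothetical, not the actual hostmot2 or software-encoder code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoder state; field names are illustrative only. */
typedef struct {
    int32_t raw_counts;   /* free-running count, never reset */
    int32_t prev_raw;     /* raw count at the previous update */
    int32_t zero_offset;  /* raw count latched at the last index */
    int     index_enable; /* armed by the user, cleared on index */
    double  scale;        /* counts per machine unit */
    double  position;
    double  velocity;
} enc_t;

/* One servo-period update. `index_seen` and `index_raw` stand in for
 * whatever the hardware reports about an index event this period. */
void enc_update(enc_t *e, int32_t new_raw, int index_seen,
                int32_t index_raw, double period_s)
{
    e->raw_counts = new_raw;
    if (e->index_enable && index_seen) {
        e->zero_offset = index_raw;  /* rebase position ...           */
        e->index_enable = 0;         /* ... and clear index-enable too */
    }
    /* Velocity uses raw deltas, so rebasing position does not affect it. */
    e->velocity = (double)(e->raw_counts - e->prev_raw) / e->scale / period_s;
    e->position = (double)(e->raw_counts - e->zero_offset) / e->scale;
    e->prev_raw = e->raw_counts;
}
```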
[03:20:34] <cradek> neato
[03:20:39] <garage_seb> gimme 30 or 60 minutes
[03:20:59] <cradek> just let me know when I can test for you.
[03:23:13] <CIA-38> EMC: jmkasunich TRUNK * emc2/src/hal/hal_lib.c: restore version 1.66 - name length checks are not part of the 'alias' work, just happened to be added right in the middle
[03:24:02] <jmkasunich> revert 1.63, 1.64, 1.65, keep 1.66, revert 1.67 and 1.68.... whew
[03:26:45] <garage_seb> mmm cvs
[03:27:16] <jmkasunich> in this case, I do like CVS's file-by-file approach
[03:27:25] <jmkasunich> I picked the versions of each file that I wanted
[03:39:07] <SWPadnos> should I un-revert things that I see need restoring?
[03:39:22] <jmkasunich> not yet
[03:39:25] <SWPadnos> ok
[03:40:00] <jmkasunich> I want to keep everything kind of in sync
[03:40:19] <SWPadnos> ok
[03:40:32] <jmkasunich> for example, I reverted halcmd.c, even tho everything you did will be used - but it wouldn't compile without the changes to halcmd_commands.c
[03:40:45] <SWPadnos> I just noticed that the halcmd.c diff reverted only stuff which will still be needed
[03:41:07] <jmkasunich> right - I'll put it back with patch at the proper time
[03:41:13] <SWPadnos> hmmm. was the commit message truncated?
[03:41:40] <jmkasunich> dunno, but it certainly could have been - there is an upper limit to the length
[03:41:47] <SWPadnos> oh, alias - I see it now :)
[03:41:47] <jmkasunich> cvsweb can show complete diffs
[03:42:01] <SWPadnos> I hadn't noticed do_alias_cmd in the changes
[03:42:13] <jmkasunich> yeah, thats the only thing that broke it
[03:42:39] <SWPadnos> ok. let me know when you're ready. I'll be afk for a few minutes
[03:42:42] <jmkasunich> one alternative would be to put a do-nothing version of do_alias_cmd in halcmd_commands, so the halcmd.c code can be restored
[03:43:02] <jmkasunich> I really won't be ready for any collaborative stuff this evening
[03:43:43] <jmkasunich> although....
[03:44:40] <jmkasunich> if you want to put a dummy do_alias_cmd in halcmd_commands.c and then restore halcmd.c, you could do that at any time
[03:44:47] <jmkasunich> just make sure it builds together
[04:35:23] <jmkasunich> SWPadnos: at the hal_lib api level, unalias is done by calling alias("name", NULL)
[04:35:42] <SWPadnos> ok
[04:35:45] <jmkasunich> I detect "name" not found, and report error
[04:36:06] <SWPadnos> it seems that the alias command should be more or less the same (in halcmd), since it's still alias(name, alias)
[04:36:09] <jmkasunich> I will accept the call if "name" matches either old name or alias
[04:36:13] <SWPadnos> ok
[04:36:30] <jmkasunich> what if "name" matches the old name, and there is no alias?
[04:36:40] <jmkasunich> should I return success, or complain?
[04:36:57] <SWPadnos> no error I think
[04:37:08] <jmkasunich> thats what I was thinking
[04:37:17] <SWPadnos> you're talking about alias(pin_name_that_has_no_alias, NULL) ?
[04:37:21] <jmkasunich> yes
[04:37:39] <SWPadnos> ok, no error - there should be a way to guarantee that a pin is unaliased
[04:37:53] <jmkasunich> the way I'm coding it, it will do a little busywork
[04:38:16] <SWPadnos> if (found_pin->alias==0) return HAL_SUCCESS;
[04:38:20] <jmkasunich> unlink the pin from the list, then realise nothing needs done, and rescan the list to put it in the right place (which is where it was)
[04:39:00] <SWPadnos> would short-circuiting that make the function not take the lock?
[04:39:02] <jmkasunich> I'm not coding the "remove an alias" as a separate branch, 90% of what it does is the same as "add an alias"
[04:39:18] <SWPadnos> ok, so you're adding nothing as the alias
[04:39:27] <jmkasunich> if there is an alias, I need to unlink from list, change name back to orig, and relink in the right place
[04:39:33] <SWPadnos> hmmm
[04:39:40] <jmkasunich> the flow is:
[04:39:43] <SWPadnos> there needs to be a flag that says the pin is aliased
[04:39:52] <SWPadnos> oh, maybe not
[04:39:57] <jmkasunich> there is - oldname == 0 means no alias
[04:40:06] <jmkasunich> the flow is:
[04:40:15] <jmkasunich> test a bunch of stuff (duplicate names, etc)
[04:40:33] <jmkasunich> find pin and unlink it (those are related - the list manipulation uses prev pointers, etc)
[04:41:11] <jmkasunich> manipulate names (depends on operation, might mean copy name to oldname, and alias to name, or might mean copy oldname to name, and discard oldname struct)
[04:41:22] <jmkasunich> relink pin in list in the proper place, using name
[04:42:06] <jmkasunich> in the case we're talking about, "manipulate" winds up a no-op
[04:42:18] <SWPadnos> you can short circuit the duplicate checks when alias==NULL
[04:42:25] <jmkasunich> I do
[04:43:59] <SWPadnos> it seems like it should be easy to skip the list manipulation, but I'd have to see the code before I'd argue with you about it :)
[04:44:25] <SWPadnos> I suspect you've thought of everything I'm likely to think of off the top of my head (and a bit more)
[04:44:26] <jmkasunich> well, you don't know if the pin to unalias has an alias, until you've found it in the list
[04:44:31] <SWPadnos> sure
[04:44:45] <jmkasunich> I mostly wanted to know if that case should be an error, warning, or silent accept
[04:45:07] <jmkasunich> the rest of my rambling was implementation details - IOW, why I was asking the first question
[04:45:07] <SWPadnos> find_pin_by_name, then if (pin->alias == 0 && new_alias == NULL) return HAL_SUCCESS
[04:45:12] <SWPadnos> heh
[04:45:38] <SWPadnos> yeah, no errors if the pin is found and the end result is that there's no alias
[04:45:59] <jmkasunich> the problem with find_pin_by_name is that the very next step is to unlink it from the list (in most cases), and since the lists are linked only one way, that means I have to start at the beginning again
[04:46:17] <jmkasunich> I don't want to traverse the list twice
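The validate, find-and-unlink, rename, relink flow can be sketched on a sorted, singly linked list. Walking the list with a pointer to the previous link (`node_t **pp`) lets the search and the unlink happen in one traversal with no prev field in the node. The sketch also covers the cases settled above: `alias == NULL` removes an alias, the lookup accepts either the old name or the alias, duplicate checks are skipped when removing, and unaliasing a pin that has no alias succeeds silently. `node_t`, `set_alias`, `link_sorted`, `NAME_LEN`, the use of malloc(), and the 0/-1 return codes are all hypothetical; hal_lib works in shared memory under its mutex and returns HAL_* codes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 41  /* assumed name size for this sketch */

typedef struct node {
    struct node *next;
    char *oldname;              /* NULL means "no alias" */
    char name[NAME_LEN + 1];    /* current name (the alias, if set) */
} node_t;

static int matches(const node_t *n, const char *name)
{
    return strcmp(n->name, name) == 0 ||
           (n->oldname != NULL && strcmp(n->oldname, name) == 0);
}

/* Insert n, keeping the list sorted by current name. */
void link_sorted(node_t **head, node_t *n)
{
    node_t **pp = head;
    while (*pp != NULL && strcmp((*pp)->name, n->name) < 0)
        pp = &(*pp)->next;
    n->next = *pp;
    *pp = n;
}

/* alias == NULL removes any alias. Returns 0 on success, -1 on error. */
int set_alias(node_t **head, const char *name, const char *alias)
{
    node_t **pp, *n;

    if (alias != NULL) {                       /* validate */
        if (strlen(alias) > NAME_LEN)
            return -1;
        for (n = *head; n != NULL; n = n->next)
            if (matches(n, alias))
                return -1;                     /* duplicate name */
    }
    for (pp = head; *pp != NULL && !matches(*pp, name); pp = &(*pp)->next)
        ;                                      /* find ... */
    if (*pp == NULL)
        return -1;                             /* name not found */
    n = *pp;
    *pp = n->next;                             /* ... and unlink, one pass */

    if (n->oldname != NULL) {                  /* drop any existing alias */
        strcpy(n->name, n->oldname);
        free(n->oldname);
        n->oldname = NULL;
    }
    if (alias != NULL) {                       /* apply the new alias */
        n->oldname = malloc(strlen(n->name) + 1);
        if (n->oldname == NULL) {
            link_sorted(head, n);              /* keep original name */
            return -1;
        }
        strcpy(n->oldname, n->name);
        strcpy(n->name, alias);
    }
    link_sorted(head, n);                      /* relink by current name */
    return 0;
}
```

In the no-alias case the node is unlinked and relinked at the same spot, which is the harmless busywork jmkasunich mentions.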
[04:46:39] <SWPadnos> oh, I fhought they were doubly linked
[04:46:41] <SWPadnos> thought
[04:46:46] <jmkasunich> the "right" way to do this (one right way anyhow) would be doubly linked
[04:47:02] <jmkasunich> right now only the list of functs in a thread is double-linked
[04:47:18] <jmkasunich> because you might want to insert at the 2nd slot from the end, which would be a pita otherwise
[04:47:18] <SWPadnos> ok, so hal_unlink_pin finds then unlinks the pin, but doesn't de-allocate it?
[04:47:22] <SWPadnos> sure
[04:47:35] <jmkasunich> hal_unlink_pin has nothing to do with this
[04:47:42] <SWPadnos> ok
[04:47:48] <jmkasunich> it unlinks a pin from a signal
[04:48:00] <jmkasunich> the unlink I've been talking about is related to linked list management
[04:48:05] <jmkasunich> (overloading words sucks)
[04:48:20] <SWPadnos> oh right - nevermind
[04:50:08] <jmkasunich> ok, I have find-n-unlink and name-manipulation coded
[04:50:22] <jmkasunich> relink-in-right-place next
[05:01:28] <jmkasunich> ok, I think I've got it
[05:01:31] <jmkasunich> no way to test yet tho
[05:01:45] <jmkasunich> I'm gonna commit the hal_lib.c and hal_priv.h changes
[05:02:05] <jmkasunich> then we can look at the halcmd changes (which should be a LOT simpler than before)
[05:02:11] <jmkasunich> "then" = "tomorrow"
[05:05:05] <SWPadnos> sounds good to me
[05:05:07] <SWPadnos> see you
[05:08:21] <jmkasunich> we shouldn't have to change show pin at all
[05:08:28] <jmkasunich> just add alias, unalias, and show alias
[05:08:37] <CIA-38> EMC: jmkasunich TRUNK * emc2/src/hal/ (hal_lib.c hal_priv.h): implementation of 'alias' for pins (params will follow after this is tested). Next step is to modify halcmd to invoke this code, and to show the results
[05:08:43] <jmkasunich> likewise, shouldn't have to modify scope or meter
[05:10:30] <jmkasunich> goodnight
[05:15:00] <garage_seb> cradek: http://highlab.com/~seb/encoder-no-glitch.png
[05:15:32] <SWPadnos> those dips are index glitches?
[05:15:46] <garage_seb> dips in vel?
[05:15:49] <SWPadnos> yes
[05:16:01] <garage_seb> have nothing to do with index
[05:16:02] <SWPadnos> (sorry - looks good for index_enable vs. position output :) )
[05:16:14] <garage_seb> i was turning the shaft with my fingers, maybe that was it
[05:16:26] <SWPadnos> ok. there just happens to be one at the index edge ...
[05:16:27] <garage_seb> or maybe vel estimation is still buggy
[05:16:38] <garage_seb> well they're all over
[05:16:43] <SWPadnos> that's truwe
[05:16:46] <SWPadnos> -w
[05:16:58] <garage_seb> vel's got problems, but i think the index problem is licked
[05:17:04] <SWPadnos> great
[05:17:09] <SWPadnos> !
[05:17:25] <garage_seb> it took me a while to realize that one of my two encoders with index doesnt actually output index...
[05:17:26] <garage_seb> sigh
[05:17:29] <garage_seb> hardware, man....
[05:17:34] <SWPadnos> heh
[05:20:18] <SWPadnos> ok, real bedtime now. night
[05:20:29] <garage_seb> goodnight SWPadnos
[05:28:06] <CIA-38> EMC: seb TRUNK * emc2/src/hal/drivers/mesa-hostmot2/ (TODO encoder.c hostmot2.c hostmot2.h):
[05:28:06] <CIA-38> EMC: This refactors the encoder code a bunch:
[05:28:06] <CIA-38> EMC: * position and index-enable now reset to 0 at the same time
[05:28:06] <CIA-38> EMC: * velocity doesnt glitch when index happens
[05:28:06] <CIA-38> EMC: It still needs work on low-speed velocity estimation.
[10:30:59] <CIA-38> EMC: cmorley TRUNK * emc2/src/hal/classicladder/config_gtk.c: switch to combo boxes for portname and serial port speed so options are obvious.Portname is not editable anymore though..it locks the system- need to fix that...
[12:07:06] <CIA-38> EMC: bigjohnt v2_2_branch * emc2/docs/src/gcode/main.lyx: removed calling files with O word
[12:10:23] <alex_joni> BigJohnT: hi
[12:10:35] <alex_joni> you also sent a commit for XY/XZ planes.. was that intentional?
[12:11:30] <BigJohnT> darn no
[12:11:48] <BigJohnT> I was still working on that
[12:13:06] <BigJohnT> that should be ok
[12:14:21] <BigJohnT> it is just lacking a couple of figures
[12:19:08] <alex_joni> ok :)
[13:00:58] <CIA-38> EMC: bigjohnt TRUNK * emc2/docs/src/gcode/images/ (G17.odg G17.png G18.odg G18.png): add images and source files
[13:06:23] <CIA-38> EMC: bigjohnt v2_2_branch * emc2/docs/src/gcode/main.lyx: add images to G17-18
[13:07:26] <BigJohnT> one more to go alex_joni but not now... I'm off to bring my Dad his breakfast...
[16:34:19] <cradek> hi seb
[16:34:25] <alex_joni> hi seb
[16:36:05] <alex_joni> cradek: probably an autoconnect :)
[16:37:39] <cradek> oh well
[16:38:08] <alex_joni> odd error:
[16:38:10] <alex_joni> "Ran out of GART memory (for 1048576)!
[16:38:26] <alex_joni> Please consider adjusting GARTSize option. "
[16:38:26] <jepler> ow my gart
[16:38:43] <alex_joni> seems that's some ATI + mesa bug
[16:42:35] <seb_kuzminsky> hi fellas
[16:42:40] <alex_joni> hey seb_kuzminsky
[16:42:49] <seb_kuzminsky> i was watching the IT Crowd season opener :-D
[16:42:57] <alex_joni> seb_kuzminsky: was wondering if you can eyeball the buildbot waterfall for the failed tests
[16:43:06] <seb_kuzminsky> http://thepiratebay.org/torrent/4529786/The.IT.Crowd.S03E01.WS.PDTV.XviD-RiVER.%5BVTV%5D.avi
[16:43:08] <alex_joni> for some reason they seem to fail since rev. 27 or so
[16:43:35] <seb_kuzminsky> i think it's because the buildslaves run in VMs
[16:43:45] <alex_joni> but it used to run.. iirc
[16:43:54] <alex_joni> now they all fail with some strange error
[16:44:49] <seb_kuzminsky> i'll see if i can extract the info from the buildslaves
[16:58:29] <seb_kuzminsky> brb
[17:00:50] <seb_kuzminsky> i'm a dummy
[17:08:46] <alex_joni> err..
[17:10:32] <CIA-38> EMC: jepler TRUNK * emc2/docs/src/hal/intro.lyx: reflect switch to double-precision
[18:18:20] <CIA-38> EMC: seb TRUNK * emc2/scripts/runtests: This adds a "-v" option to runtests, so the buildbot can show us what happened.
[18:37:20] <seb_kuzminsky> alex_joni: ok, now we can see the error output from runtests:
[18:37:21] <seb_kuzminsky> http://emc2-buildbot.colorado.edu/buildbot-admin/builders/dapper-x86-trunk-realtime-rip/builds/100/steps/runtests/logs/stdio
[18:37:44] <seb_kuzminsky> not sure why it's erroring that way just yet...
[18:37:52] <alex_joni> * alex_joni looks
[18:38:21] <alex_joni> Authorization Required ;)
[18:38:39] <seb_kuzminsky> oops, take the -admin off the first dir in the url
[18:39:01] <seb_kuzminsky> realtime's not running when the tests start
[18:39:05] <alex_joni> yeah, figured it out
[18:39:26] <alex_joni> hmm.. that's odd
[18:40:13] <alex_joni> seb_kuzminsky: I suspected (but I am probably wrong) that hal can't load as the HAL magic key has changed lately
[18:40:36] <alex_joni> if there was a running/loaded HAL then halcmd can't unload it (unless manual intervention is present: e.g. -R)
[18:40:37] <seb_kuzminsky> i think it's a race condition
[18:40:49] <seb_kuzminsky> i can run the tests fine on my computer, just not on the VM buildslaves
[18:40:58] <alex_joni> but they used to run..
[18:41:06] <alex_joni> in the first 27 commits it worked
[18:54:07] <alex_joni> seb_kuzminsky: is it possible to trigger an lsmod from one of the runtests?
[18:54:28] <alex_joni> or maybe you can ssh to one of the buildbots and check if there is some HAL/RTAPI/RTAI loaded?
[18:55:02] <alex_joni> seb_kuzminsky: ah, another thing.. the grid link doesn't work (http://emc2-buildbot.colorado.edu/buildbot/)
[18:57:17] <seb_kuzminsky> i'm changing the buildslaves to export their consoles via VNC, then i'll log in and take a look
[18:57:28] <seb_kuzminsky> maybe the realtime environment got left running or something
[19:00:08] <alex_joni> one thing I can imagine is a race condition
[19:00:39] <alex_joni> what happens if there is a runtest going on when a new commit is triggered?
[19:00:50] <alex_joni> does the old one finish doing its thing before a new one starts?
[19:01:35] <alex_joni> seb_kuzminsky: maybe try adding a 'halcmd -R' 'halcmd unload all' 'realtime stop' before running the tests..
[19:03:17] <seb_kuzminsky> alex_joni: i've set the buildbot up to only do one build at a time per slave, so they should never collide
[19:03:40] <seb_kuzminsky> what's a vnc viewer that can connect over a unix domain socket?
[19:05:36] <alex_joni> I use realvnc usually..
[19:06:16] <alex_joni> vnc4server / xvnc4viewer
[19:06:41] <alex_joni> afaik you can tunnel it through ssh
[19:07:28] <alex_joni> seb_kuzminsky: http://en.wikipedia.org/wiki/Comparison_of_remote_desktop_software
[19:08:07] <alex_joni> but I think a regular ssh should provide enough clues as well..
[19:08:56] <seb_kuzminsky> my xvnc4viewer doesnt do it
[19:09:10] <seb_kuzminsky> i run "xvnc4viewer unix:/home/seb/tmp/.qemu-vnc"
[19:09:30] <seb_kuzminsky> it says "unable to resolve host by name: Connection timed out (110)"
[19:09:30] <alex_joni> can you ssh to the VM?
[19:09:46] <seb_kuzminsky> you can set it up that way but it's not how i have it set up currently
[19:10:08] <alex_joni> ah, ok.. if it's too much of a pita don't sweat it
[19:10:26] <alex_joni> it's not like you have more important things to do :)
[19:11:31] <seb_kuzminsky> i'll run vnc over tcp on the loopback, hold on
[19:16:11] <seb_kuzminsky> hold on, real life intrudes
[19:17:26] <alex_joni> * alex_joni grabs a bite
[20:19:51] <seb_kuzminsky> alex_joni: runtests runs halcmd in the VM, and halcmd is segfaulting
[20:20:16] <alex_joni> hmm..
[20:20:39] <alex_joni> any loaded modules?
[20:21:26] <alex_joni> * alex_joni starts one of his vm's up
[20:22:46] <seb_kuzminsky> the dmesg looks like it's loading the right stuff at the right time
[20:23:19] <seb_kuzminsky> the error now is different though, after i restarted the VMs with VNC enabled, so maybe it's a new problem...
[20:23:47] <seb_kuzminsky> http://emc2-buildbot.colorado.edu/buildbot/builders/hardy-x86-trunk-realtime-rip/builds/96/steps/runtests/logs/stdio
[20:25:40] <alex_joni> odd
[20:25:50] <alex_joni> this is standard hardy install.. right?
[20:25:59] <seb_kuzminsky> off the emc2 live cd
[20:26:10] <seb_kuzminsky> apt-get update, install some stuff for buildbot
[20:26:15] <seb_kuzminsky> nothing weird
[20:26:34] <seb_kuzminsky> running "by hand" on that vm, as that user, in the buildbot's build dir, works fine
[20:26:55] <cradek> wrong path (version mismatch?)
[20:27:03] <alex_joni> it does a source..
[20:27:15] <alex_joni> and halcmd should complain not segfault :/
[20:27:26] <cradek> it does, in my experience
[20:27:40] <seb_kuzminsky> it's something weird that the VM is doing, or that buildbot is doing
[20:27:43] <seb_kuzminsky> gotta be
[20:27:53] <alex_joni> seb_kuzminsky: did you try running the buildbot command by hand?
[20:28:02] <seb_kuzminsky> prolly buildbot, because when i do the bb command by hand it works fine
[20:28:03] <alex_joni> e.g. '/bin/sh -c /bin/bash -c 'source scripts/emc-environment && runtests -v'
[20:28:10] <seb_kuzminsky> right, that work
[20:28:12] <seb_kuzminsky> works
[20:29:34] <alex_joni> I notice the env has changed a lot
[20:29:39] <alex_joni> http://emc2-buildbot.colorado.edu/buildbot/builders/hardy-x86-trunk-realtime-rip/builds/26/steps/runtests/logs/stdio
[20:29:46] <alex_joni> vs. the one you linked earlier
[20:30:04] <seb_kuzminsky> those old runs ran as me, with my env
[20:30:13] <seb_kuzminsky> the new ones are run as "farmer", with a much simpler login
[20:30:43] <jepler> can you get a core file, or use strace to find out the last syscall before it dies?
[20:31:13] <cradek> is farmer's shell bash?
[20:31:13] <seb_kuzminsky> i haven't set up Try with buildbot, so anything i want it to do must be checked in
[20:31:14] <jepler> print the output of ulimit -a to make sure that there's not some memory limitation for buildbot that doesn't apply to you, particularly locked memory
[20:31:31] <seb_kuzminsky> maybe i can enable coredumps for the whole buildbot session
[20:32:21] <seb_kuzminsky> max locked memory, kbytes, 20480
[20:32:25] <seb_kuzminsky> max memory unlimited
[20:32:49] <seb_kuzminsky> hold on, i'll enable coredumps
[20:34:38] <seb_kuzminsky> hm, except now it's running with the full login environment, not the restricted autostart environment
[20:34:49] <seb_kuzminsky> let's see what it does, then i'll restart it the system startup way
[20:35:16] <alex_joni> hmm.. runnning the abs test locked my VM
[20:35:25] <alex_joni> it never did that by running only emc2
[20:35:38] <alex_joni> * alex_joni was running dapper
[20:35:45] <seb_kuzminsky> i think these vms are more trouble than they're worth, for this application
[20:35:57] <seb_kuzminsky> anyone have a computer or two to run a real buildslave?
[20:36:00] <cradek> s/for this.*$//
[20:36:26] <alex_joni> hmm.. I have 2 servers at work not currently running..
[20:36:46] <alex_joni> 2 x dual-core Xeon's @ 3GHz
[20:37:18] <alex_joni> wonder if they are any RT good.. (all SCSI inside)
[20:40:17] <seb_kuzminsky> ok so it works when *i* start buildbot, but apparently not when the init scripts do it
[20:40:45] <alex_joni> oh
[21:13:32] <seb_kuzminsky> jepler: you got it right, it's a memlock ulimit issue
[21:13:56] <seb_kuzminsky> the buildbot when started by init at system boot time has only 32 kb max locked memory
[21:21:47] <seb_kuzminsky> on the 8.04 livecd, the memlock limit is set from /etc/security/limits.conf by pam_limits
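A quick way to see the limit a process actually inherited, however it was started (a login that goes through pam_limits or an init script that does not), is getrlimit(RLIMIT_MEMLOCK). It reports the same soft limit as `ulimit -l`, but in bytes rather than KiB. The helper name `memlock_allows` is hypothetical, and this is not the check halcmd performs.

```c
#define _GNU_SOURCE  /* expose RLIMIT_MEMLOCK under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Returns 1 if the soft RLIMIT_MEMLOCK allows locking `need` bytes,
 * 0 if it does not, and -1 if the limit cannot be read. */
int memlock_allows(rlim_t need)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit(RLIMIT_MEMLOCK)");
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;                   /* no limit at all */
    return rl.rlim_cur >= need;     /* soft limit, in bytes */
}
```

Checking something like this early and printing the soft limit when it is too small would turn a later shared-memory failure into a readable message; how much locked memory HAL actually needs is not established in this log.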
[21:21:59] <seb_kuzminsky> i think pam only gets involved when someone logs in, right?
[21:26:12] <alex_joni> I thought halcmd detects a memory limit in TRUNK
[21:28:54] <seb_kuzminsky> http://emc2-buildbot.colorado.edu/buildbot/builders/hardy-x86-trunk-realtime-rip/builds/101/steps/environment/logs/stdio
[21:31:41] <jepler> alex_joni: yeah, I too thought that I'd fixed that to give an error message
[21:32:51] <alex_joni> I do remember that you committed a fix.. but can't find it atm
[21:44:31] <jepler> $ (ulimit -l 32; scripts/halrun show)
[21:44:31] <jepler> RTAPI: ERROR: failed to map shmem
[21:44:34] <jepler> that's what I get in 2.2
[21:45:13] <jepler> (well, a lot more messages than that, but that's the first one)
[21:46:07] <seb_kuzminsky> aha coredump, hold on
[21:58:34] <seb_kuzminsky> the segfault is in init_hal_data, line 1534
[21:58:40] <seb_kuzminsky> oops 2534
[21:59:05] <seb_kuzminsky> it's the first time it touches hal_data after the rtapi_shmem_getptr in hal_init()
[21:59:55] <seb_kuzminsky> rtapi_shmem_getptr returned a non-NULL pointer, because the retval is RTAPI_SUCCESS
[22:00:07] <seb_kuzminsky> but then it segfaults on the first access to the memory pointed to
[22:00:30] <jepler> $ (ulimit -l 32; ../scripts/halrun show)
[22:00:30] <jepler> RTAPI: ERROR: failed to map shmem
[22:00:35] <jepler> wfm
[22:00:43] <jepler> this is breezy, the only rt system I have handy right now
[22:01:00] <seb_kuzminsky> could be, mine's hardy
[22:02:00] <alex_joni> * alex_joni is off to bed
[22:02:02] <alex_joni> good night all
[22:02:17] <seb_kuzminsky> goodnight alex
[22:02:46] <jepler> seb_kuzminsky: can you tell what 'mem_id' was?
[22:02:55] <jepler> in hal_init
[22:03:31] <seb_kuzminsky> it's 1
[22:04:05] <seb_kuzminsky> shmem_addr_array[1] is 0xffffffff...
[22:04:26] <seb_kuzminsky> rtai_malloc must have returned it that way
[22:04:39] <jepler> oh, so -1 is the error return from rtai_malloc?
[22:04:40] <jepler> that's inventive!
[22:04:45] <seb_kuzminsky> lol
[22:06:33] <seb_kuzminsky> rtai_shm.h claims rtai_malloc returns 0 on failure and a valid address on success
[22:08:08] <seb_kuzminsky> it's a bug in rtai
[22:08:35] <seb_kuzminsky> rtai_shm.h line 202
[22:08:51] <seb_kuzminsky> it gets an error from mmap (MAP_FAILED), but still returns it
[22:09:18] <seb_kuzminsky> there should be a "return NULL;
[22:09:23] <seb_kuzminsky> between 207 and 208
[22:09:29] <seb_kuzminsky> whew
[22:10:07] <seb_kuzminsky> i wonder why it works for you jepler
[22:10:17] <jepler> breezy has a much older rtai on it
[22:10:22] <seb_kuzminsky> sure
[22:10:29] <jepler> must be a newish bug
[22:10:36] <seb_kuzminsky> i'll go see if they've fixed it yet
[22:10:43] <seb_kuzminsky> but first: lunch
[22:10:48] <seb_kuzminsky> bbl
[22:13:05] <jepler> doesn't look that way. http://cvs.gna.org/cvsweb/magma/base/include/rtai_shm.h?rev=1.11;cvsroot=rtai
[22:13:57] <jepler> though I have no idea which sort of lava-related cvs project I should be in
[22:16:20] <jepler> tempting to work around it by changing our test, though
[22:16:20] <jepler> - if (shmem_addr_array[shmem_id] == NULL) {
[22:16:21] <jepler> + if (shmem_addr_array[shmem_id] == NULL
[22:16:21] <jepler> + || shmem_addr_array[shmem_id] == (void*)-1) {
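The underlying pattern is the classic mmap() one: mmap() signals failure with MAP_FAILED, which is ((void *) -1), not NULL, so a wrapper that passes that value through unchanged defeats callers that only test for NULL. A defensive check in the spirit of the diff above, plus the wrapper-side fix seb suggests (return NULL instead), might look like this. `shmem_ptr_is_valid` and `map_anon_or_null` are hypothetical helpers, and no real RTAI calls are used.

```c
#define _GNU_SOURCE  /* expose MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Caller side: treat both NULL and the (void *)-1 sentinel as failure. */
int shmem_ptr_is_valid(const void *p)
{
    return p != NULL && p != (const void *)-1;
}

/* Wrapper side: never pass MAP_FAILED upward; return NULL instead,
 * so a plain NULL test is enough for every caller. */
void *map_anon_or_null(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}
```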
[23:31:30] <CIA-38> EMC: seb TRUNK * emc2/src/rtapi/rtai_ulapi.c:
[23:31:30] <CIA-38> EMC: rtai_malloc() in RTAI 3.6.1 can indicate failure by returning NULL or by returning (void*)(-1).
[23:31:30] <CIA-38> EMC: One of emc2's three calls to rtai_malloc() checked for both, this commit
[23:31:30] <CIA-38> EMC: makes all three check for both. (And adds comments describing the quirk.)
[23:31:31] <CIA-38> EMC: This wont make the buildbot happy, but it'll stop halcmd segfaulting.
[23:37:24] <seb_kuzminsky> jepler: we reached the same conclusion
[23:37:45] <seb_kuzminsky> why do you think that's not the way to go?
[23:40:47] <seb_kuzminsky> what i really want to know is: how do you set ulimit globally, even for processes that dont go through pam?
[23:41:02] <seb_kuzminsky> that seems useful for machine manufacturers like smithy
[23:41:24] <seb_kuzminsky> they want: turn on the computer, it starts emc2 and axis (or some other ui)
[23:41:29] <seb_kuzminsky> without necessarily a login
[23:46:19] <seb_kuzminsky> ok that's better, the buildbot doesnt get segfaults in halcmd in trunk now
[23:46:22] <seb_kuzminsky> still fails tho
[23:46:33] <seb_kuzminsky> because it's not logged in, so pam_limits dont matter
[23:46:52] <seb_kuzminsky> i need to set "ulimit -l" globally... how to do that?
[23:55:32] <seb_kuzminsky> i emailed a bug report to the rtai mailing list, we'll see what happens