#emc | Logs for 2011-02-01

[01:21:34] <jmkasunich> hi guys
[01:22:27] <jepler> hi jmk!
[01:22:33] <cradek> hi
[01:22:49] <jmkasunich> I saw the discussion about periodically updating vs not for hal pins
[01:22:49] <cradek> testing testing testing!
[01:22:58] <jmkasunich> and the debate about what should happen when you link a pin
[01:23:48] <jmkasunich> and some questions about halcmd net
[01:23:59] <jmkasunich> regarding net - I still want to backport it
[01:24:05] <cradek> good
[01:24:13] <jepler> I would like to see net in 2.1 as well
[01:24:25] <jmkasunich> along with the assorted bugfixes like buffer overflows, and the changes to the behavior of loadusr -w -W
[01:25:07] <cradek> did you see that alex and I would like no more new features after this weekend
[01:25:09] <jmkasunich> I didn't put it in 2.1 yet because, well, I dunno, because I thought that leaving it in trunk would somehow mean we'd have more confidence in it after a few days
[01:25:15] <jmkasunich> yes
[01:25:35] <jmkasunich> I guess it will be a busy weekend then ;-)
[01:25:37] <jepler> and we do -- we noticed the bug that rtapi_app demonstrated so well
[01:25:51] <jepler> (the exit and become ready at the same moment bug)
[01:25:54] <jmkasunich> yeah
[01:26:01] <cradek> cool
[01:26:09] <jmkasunich> otoh, it wasn't a bug before I changed loadusr
[01:26:14] <jepler> ssssh
[01:26:18] <cradek> grr
[01:26:29] <jmkasunich> actually, it was a bug
[01:27:10] <jepler> my goal for the weekend is to get the driver for that futurlec 8255 board ready
[01:27:13] <jmkasunich> because it would have continued happily on whether rtapi_app ended after creating the component, or it ended because something prevented it from creating the component
[01:27:49] <jmkasunich> my goal for the weekend is to get that cursed printf out of stepgen, by providing a better way to get that functionality
[01:28:07] <jepler> that might be a 2.2-only item?
[01:28:10] <jmkasunich> after hours of talking to SWP, and sleeping on it, I think there is something that maybe we can all live with
[01:28:32] <jmkasunich> cradek was bound and determined yesterday to stick the print into 2.1, even if we planned on taking it out for 2.2
[01:28:37] <cradek> you're talking about changing it in trunk right?
[01:28:49] <jmkasunich> no
[01:28:52] <cradek> (well it's in)
[01:29:10] <jmkasunich> if you want to stick that print in 2.1, I want to stick something better in 2.1 instead
[01:29:27] <jmkasunich> if you are content without the print in 2.1, then we can defer error handling to 2.2
[01:30:18] <jepler> I am guessing that the better error handling needs more than the weekend to be developed and debugged
[01:30:33] <jmkasunich> what I have in mind is pretty transparent
[01:30:52] <jmkasunich> add to the hal API: hal_set_errorcode(int comp_id, int instance, int errorcode);
[01:30:55] <cradek> will it be able to pop up a message in the guis?
[01:31:18] <jmkasunich> and "hal_get_errorcode()"
[01:31:54] <jmkasunich> yes
[01:32:24] <jmkasunich> when the errorcode goes non-zero, some code (either in user space, or maybe in motmod) can print the message
[01:32:47] <jmkasunich> in general the message would be "component 'stepgen.0' issued error N"
[01:33:01] <jmkasunich> but if there are specific errors that we want to print other messages for, we can
[01:33:31] <jmkasunich> if stepgen error 1 is "maxvel unachievable", you can print a book of EMC specific hints if you want
[01:33:51] <jmkasunich> (when you notice that the error code is from stepgen, and the value is 1)
[01:34:07] <jepler> this means that the app or book that translates (stepgen, 0, 1) to a message has to know about all the components that will ever exist
[01:34:12] <jepler> so that it can give human readable errors
[01:34:16] <jmkasunich> no
[01:34:27] <jepler> it means that the error can't offer error-specific information, such as "Requested: 100, attainable: 75"
[01:34:29] <jmkasunich> the man page for a component would explain what its error codes mean
[01:34:35] <jmkasunich> that is true
[01:34:41] <jepler> "Error 37 [OK]" is terrible user interface
[01:34:54] <jmkasunich> but I have real heartburn with doing printf style formatting in a realtime thread
[01:35:14] <jmkasunich> I can't emphasize enough how ugly that is (IMO)
[01:35:46] <cradek> I would really like to debate this some more, then do it in the trunk, and get on with the 2.1 release now.
[01:35:51] <jmkasunich> I don't know how much of the me/swp discussion you guys lurked thru
[01:36:01] <jmkasunich> and I certainly don't want to start again
[01:36:21] <jepler> I'd like to find a solution that gives a good "user experience" and satisfies you
[01:36:30] <SWPadnos> hi ;)
[01:36:36] <jmkasunich> printfs in the realtime code will never satisfy me
[01:36:37] <jepler> something that puts together the printf-like arguments from realtime and formats in userspace, for instance
[01:37:03] <jepler> e.g., struct { char *format; union int_or_char args[15]; }
[01:37:08] <jmkasunich> args are a problem because they'd be non-atomic
[01:37:45] <jmkasunich> in any case, the message itself shouldn't come (directly) from the component
[01:37:47] <jepler> you're already talking about storing 3 values
[01:38:04] <jmkasunich> none of which needs to be that large - they'd easily fit in 32 bits
[01:38:21] <jmkasunich> 16 bit component ID, 8 bit instance number, 8 bit error code
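A minimal sketch of the 32-bit packing described here, so that one aligned 32-bit store can publish the whole error report. The field order, the type name `hal_errword_t`, and the helper names are assumptions for illustration; none of them are part of the HAL API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout (an assumption, not HAL's):
 * bits 31..16 component ID, bits 15..8 instance, bits 7..0 error code. */
typedef uint32_t hal_errword_t;

/* Pack the three fields into one word; a single word can be written
 * with one store instead of three separate ones. */
hal_errword_t pack_error(unsigned comp_id, unsigned instance, unsigned errorcode)
{
    assert(comp_id <= 0xFFFFu && instance <= 0xFFu && errorcode <= 0xFFu);
    return ((uint32_t)comp_id << 16) | ((uint32_t)instance << 8) | (uint32_t)errorcode;
}

/* Unpack helpers for whatever user-space code formats the message. */
unsigned error_comp_id(hal_errword_t w)  { return (w >> 16) & 0xFFFFu; }
unsigned error_instance(hal_errword_t w) { return (w >> 8) & 0xFFu; }
unsigned error_code(hal_errword_t w)     { return w & 0xFFu; }
```

The realtime side would only call something like `pack_error()` and store the result; turning the word into text stays outside the realtime thread.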
[01:38:57] <jmkasunich> back to "message shouldn't come from comp" - thats a key point I want to make
[01:39:16] <jmkasunich> what you might want to tell the user about "maxvel unattainable" isn't always going to be the same
[01:39:29] <jmkasunich> you may want a very emc specific message, or a more generic one
[01:40:04] <cradek> I'm concerned right now about only the exact version of hal that's in the 2.1 branch of emc's cvs.
[01:40:22] <jmkasunich> is the stepgen printf in there? or only in trunk?
[01:40:39] <cradek> you can do whatever you like on trunk afaic.
[01:40:45] <cradek> there's lots of time to work on it
[01:40:59] <jmkasunich> is the stepgen printf in 2.1? or only in trunk?
[01:41:01] <cradek> that message is in the 2.1 branch.
[01:41:11] <jmkasunich> then I have a problem with that
[01:41:27] <jmkasunich> I don't want to release that approach, I think its the wrong way to do things
[01:41:40] <jmkasunich> and once released, it will be a lot harder to replace
[01:42:03] <jmkasunich> hence my desire to come up with something better in two days
[01:42:05] <SWPadnos> I don't think users will care how they get their informative error messages
[01:42:16] <jmkasunich> I would much rather come up with something better in two weeks
[01:42:30] <SWPadnos> whether it's terrible programming or beautiful error handling for 2.1 may be irrelevant
[01:42:57] <jepler> nothing commits us to issuing the warning in exactly that way in 2.2 and beyond, even if we issue the warning that way in 2.1.
[01:43:08] <jmkasunich> just inertia
[01:43:18] <SWPadnos> the inertia is with developers, not users though
[01:43:23] <cradek> you can avoid the inertia by changing it in trunk right now
[01:43:25] <jmkasunich> I suppose I'm being a bit of a dick about this whole thing
[01:43:28] <jepler> that won't be a problem -- I see how committed you are to coming up with something better during 2.2!
[01:43:29] <SWPadnos> heh
[01:44:07] <jmkasunich> I think the reason I have such an issue with it is that it doesn't fit my mental model of hal
[01:44:25] <SWPadnos> I think the single error code for all of HAL is no better than individual prints from RT code, it's just a different kind of bad
[01:44:35] <jmkasunich> (and that's the same reason I have a problem with the "copy the value from dummy to sig on a link" and "only update pins when the widget is changed by the user")
[01:44:45] <jmkasunich> all of those things break the basic concept of hal
[01:45:02] <SWPadnos> I'd disagree with you on that, but I don't want to interrupt the error handling discussion :)
[01:45:13] <jmkasunich> which is simple parts that process signals rather than sending messages
[01:45:33] <jmkasunich> a hal comp should do the same thing over and over
[01:46:20] <SWPadnos> lets revisit that in a little bit
[01:46:45] <jmkasunich> re: single error code for all of hal - I won't argue that, an error code for each component would be better
[01:46:53] <jmkasunich> but harder to implement in two days
[01:47:07] <SWPadnos> sure. that's why I think we should leave the print in stepgen, and design something way better for 2.2
[01:47:21] <jepler> I've got the "revert rtapi_print_msg" change ready to check in
[01:47:25] <jmkasunich> that's why I think we should leave the print out of stepgen, and design something way better for 2.2 ;-)
[01:47:31] <SWPadnos> hacking an error system is no good - you have to design then implement
[01:47:34] <SWPadnos> heh
[01:47:46] <jmkasunich> ah-HA!
[01:47:59] <jmkasunich> what you said also applies to the printf!!!
[01:48:23] <jmkasunich> and that is my point - it is a hack, applied for the purpose of solving one (rather common) user error
[01:48:41] <SWPadnos> well, let's look at the "take it out" side for a sec. right now the plan (such as it is) is to release 2.2 "shortly after fest"
[01:48:43] <SWPadnos> heh
[01:48:59] <jmkasunich> don't worry about take it out vs leave it in
[01:48:59] <SWPadnos> the normal error reporting method right now is rtapi_print_msg ...
[01:49:04] <SWPadnos> used all over the place in HAL
[01:49:04] <jepler> there -- now we have to do something better in 2.2
[01:49:08] <jmkasunich> you guys are wearing me down
[01:50:07] <SWPadnos> if it'll only be there for ~4 months, and there's a plan to replace it, and there won't be as many lusers asking for help in the interim - well, draw your own conclusion :)
[01:50:55] <jmkasunich> like I said you are wearing me down
[01:51:04] <jmkasunich> I like jeff's approach
[01:51:24] <SWPadnos> heh -whatever works :)
[01:51:27] <jmkasunich> ok, lets make a deal - with it out of trunk, I can rest assured that we'll do something better for the future
[01:51:41] <cradek> yays!
[01:51:47] <jmkasunich> with it in the branch, we'll have fewer clueless idiots asking stupid questions
[01:51:56] <SWPadnos> that's the hope anyway
[01:52:24] <SWPadnos> actually, it's a good test case. if there's no decline in stupid questions, we can just avoid the whole error handling question entirely
[01:52:49] <jmkasunich> SWPadnos: "the normal error handling is rtapi_print" "used all over hal" NOT in the RT code it ain't
[01:53:14] <SWPadnos> ok - it's all over the kernel modules though, so there is precedent
[01:53:14] <jmkasunich> during pin export and such is not the issue, we can print all we want
[01:53:24] <SWPadnos> yep - understood
[01:53:48] <SWPadnos> ok - back to the signal thing, if everyone's happy with the error handling plan
[01:53:56] <jmkasunich> just one more bit of info
[01:54:00] <SWPadnos> ok
[01:54:01] <jmkasunich> I ran some tests last night
[01:54:16] <jmkasunich> stepgen average execution time 3000-4000 clocks
[01:54:20] <jmkasunich> worst case 14000
[01:54:25] <jmkasunich> printing the message 24000
[01:54:37] <jmkasunich> ok, I'm done
[01:55:44] <SWPadnos> heh. (at least it only runs once, when it can't be generating steps ...)
[01:56:03] <SWPadnos> though other stuff could be going on in that thread, which would get screwed up ...
[01:56:06] <jmkasunich> hush you
[01:56:13] <SWPadnos> ok - I'm don now, too
[01:56:14] <SWPadnos> donw
[01:56:15] <SWPadnos> done
[01:56:17] <jmkasunich> ;-)
[01:56:19] <SWPadnos> finished
[01:56:26] <SWPadnos> ok - the signal thing
[01:57:02] <SWPadnos> I think it's a bug that the dummy value isn't written to the signal at writer connect time because there could be a long delay before the value gets updated, even in RT code
[01:57:03] <jmkasunich> people want to be able to have user space comps that don't regularly write to their outputs
[01:57:14] <jmkasunich> not in RT code
[01:57:23] <SWPadnos> that's one piece of it, yes
[01:57:39] <SWPadnos> ok - RT code that runs once every 2 seconds gets put in control of stepgen ...
[01:57:56] <SWPadnos> there's a pause of up to 2 seconds before the new connection has any effect
[01:58:03] <jmkasunich> then the assumption is that the signal that is being passed has a bandwidth well under 0.5Hz
[01:58:22] <SWPadnos> I don't see the advantage of not copying the value
[01:58:36] <jmkasunich> we can debate the user side (and I am willing to consider compromise, after we discuss some details and other issues)
[01:58:44] <jmkasunich> but on the RT side, HAL _IS_ a sampled system
[01:59:02] <jmkasunich> outputs should be updated on every sample (and that should always be enough)
[01:59:46] <SWPadnos> ok, but aside from that philosophical issue, is there actually any advantage to leaving out the one line of code that would accomplish everything everyone wants?
[01:59:59] <jmkasunich> if you change signal connections, nothing is gonna happen until the code that reads those signals runs
[02:00:18] <jmkasunich> SWPadnos: ok, lets skip the philosophy and go right to the details
[02:00:21] <jmkasunich> one line won't do it
[02:00:29] <SWPadnos> hold on - I have a hardware analogy that may convince you :)
[02:00:35] <jmkasunich> you are talking about copying dummy to *sig
[02:00:44] <SWPadnos> yes
[02:00:54] <jmkasunich> what about when you disconnect a pin from one sig and connect it to another?
[02:01:08] <SWPadnos> this is only done with writers
[02:01:15] <SWPadnos> (so it's 2 lines)
[02:01:16] <jmkasunich> should the old value of the old sig be copied to the new one?
[02:01:42] <jmkasunich> what about when you disconnect a pin from a signal and don't do anything else with it (for a while)
[02:01:51] <SWPadnos> ok - I see that issue. it's more than one line, but not insurmountable
[02:01:56] <jmkasunich> should the old value of the sig be copied to the dummy?
[02:02:04] <SWPadnos> well, there's no "reconnect" function - you disconnect then connect, right?
[02:02:54] <jmkasunich> you can linksp foo bar.0.out, even if pin bar.0.out is already connected to signal blat
[02:03:02] <jmkasunich> that results in an unlink immediately followed by a link
[02:03:14] <jmkasunich> dunno if it ever connects to the dummy in the middle (but I think it does)
[02:03:27] <SWPadnos> ok, still two discrete operations, done by two separate functions (called in sequence)
[02:03:56] <jmkasunich> I think so, would need to double check the code
[02:03:57] <SWPadnos> it probably does, unless the unlink and link code is duplicated
[02:04:03] <jmkasunich> right
[02:04:41] <SWPadnos> one short aside - when you disconnect an input pin, does the signal value get copied to the dummy?
[02:04:48] <jmkasunich> I don't think so
[02:04:57] <SWPadnos> ok
[02:05:25] <jmkasunich> that would be like having a huge capacitor on the input - disconnect it, and the value doesn't go to zero, it stays where it was
[02:06:02] <jmkasunich> this almost needs a little chart
[02:06:12] <jmkasunich> pin type and action, what do you do?
[02:06:23] <SWPadnos> ok - there's an assumption that the input can change with infinite slew rate (to the default value) without bad consequences
[02:06:26] <SWPadnos> right
[02:06:27] <jmkasunich> input pin, link - do nothing, the pin will read the signal
[02:06:47] <jmkasunich> input pin, unlink - connect to dummy, leave dummy value wherever it was?
[02:06:58] <SWPadnos> sounds good so far
[02:07:15] <jmkasunich> (normally zero or whatever the default for the component is, but if you did a setp to the unconnected pin, you'll get the setp value instead)
[02:07:16] <SWPadnos> I think io should be treated like input in both cases as well
[02:07:24] <SWPadnos> right
[02:07:51] <jmkasunich> yeah IO is one case where every write _is_ an explicit result of something happening in the module
[02:08:07] <SWPadnos> right, and there needs to be some interlock or it gets screwed up
[02:08:21] <jmkasunich> actually, input unlink is non-obvious
[02:08:29] <jmkasunich> because of the setp while unlinked thing
[02:08:36] <SWPadnos> well, either way is OK
[02:09:00] <SWPadnos> if you don't copy, you get a step function, but then again it's assumed that the value is valid (is that a bad assumption?)
[02:09:23] <jmkasunich> its the same value that was going into the component before it was linked
[02:09:28] <SWPadnos> if you do copy, then as you pointed out, things may keep acting as though they were being driven (think stepgen enable, for instance)
[02:09:31] <jmkasunich> and probably the default value chosen by that component
[02:09:54] <jmkasunich> right - I think the right choice for "input unlink" is don't copy
[02:09:59] <SWPadnos> right
[02:10:04] <jmkasunich> ok, inputs are done
[02:10:08] <jmkasunich> now outputs
[02:10:24] <jmkasunich> (we think we know what to do about io, but lets revisit that after we are happy with outputs)
[02:10:35] <SWPadnos> so for outputs, I'd say that it's valid to copy in both cases
[02:10:40] <SWPadnos> reasons:
[02:10:40] <jmkasunich> output link - copy dummy to *sig?
[02:10:54] <SWPadnos> yep. and copy sig to dummy at unlink
[02:11:20] <SWPadnos> reason being that the component did write its output. it's a HAL implementation detail that the data went somewhere else
[02:11:37] <SWPadnos> so changing where the pin is connected shouldn't change the value of the pin
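A minimal single-threaded sketch of the policy worked out in this exchange: input and io pins are only retargeted, while output pins carry their last value across a link or unlink. The type and function names (`pin_t`, `signal_t`, `pin_link`, `pin_unlink`) are hypothetical and simplified to `double` values; this is not the hal_lib implementation, and it makes no attempt to be safe against a realtime thread writing the pin at the same moment.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PIN_IN, PIN_OUT, PIN_IO } pin_dir_t;   /* hypothetical names */

typedef struct { double data; } signal_t;             /* the "wire" holds the value */

typedef struct {
    pin_dir_t dir;
    double *ptr;     /* what the component dereferences, as in *(comp->pin) */
    double dummy;    /* private storage used while the pin is unlinked */
} pin_t;

/* A new pin starts unlinked, pointing at its own dummy. */
void pin_init(pin_t *p, pin_dir_t dir, double initial)
{
    p->dir = dir;
    p->dummy = initial;
    p->ptr = &p->dummy;
}

/* Link: an output pin drives the signal with the value it last wrote;
 * input and io pins just start reading the signal. */
void pin_link(pin_t *p, signal_t *s)
{
    assert(p != NULL && s != NULL);
    if (p->dir == PIN_OUT)
        s->data = *p->ptr;
    p->ptr = &s->data;
}

/* Unlink: an output pin keeps the value it last wrote; input and io pins
 * fall back to whatever the dummy already held (default or setp value). */
void pin_unlink(pin_t *p)
{
    assert(p != NULL);
    if (p->dir == PIN_OUT)
        p->dummy = *p->ptr;
    p->ptr = &p->dummy;
}
```

With these rules, changing where an output pin is connected never changes the value seen through that pin, while an unlinked input snaps back to its default.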
[02:12:04] <jmkasunich> I think you've convinced me
[02:12:12] <SWPadnos> yay! :)
[02:12:25] <jmkasunich> if it was a real component, the output would immediately drive the signal as soon as you connected it
[02:12:37] <jmkasunich> because real components work in continuous time, not discrete
[02:12:45] <SWPadnos> right. it's the wires that have the memory in HAL, which isn't quite like the real world
[02:13:08] <jmkasunich> existing hal RT comps work in discrete, and by writing every time they get the same result (at least any readers in the same thread will see the same result)
[02:13:32] <jmkasunich> with the copies, even readers in faster threads would get the expected result
[02:13:33] <SWPadnos> userspace just makes that discrete time more noticeable to - err - users ;)
[02:13:36] <SWPadnos> right
[02:14:06] <SWPadnos> you wanted to revisit IO?
[02:14:25] <SWPadnos> (I'm happy treating IO as a reader in this context)
[02:14:44] <jmkasunich> I think we decided correctly the first time around
[02:14:55] <jmkasunich> I will implement that
[02:15:08] <jmkasunich> cradek: is this a feature or a bugfix? ;-)
[02:15:10] <SWPLinux> ok. next question - is this something that should be in 2.1? (I think so)
[02:15:12] <SWPLinux> heh
[02:15:47] <jmkasunich> hmm...
[02:15:54] <jmkasunich> I think I see a race
[02:16:14] <jmkasunich> pin is unlinked (pointing at dummy)
[02:16:20] <jmkasunich> it runs, writes 1.0 to dummy
[02:16:33] <jmkasunich> somebody does halcmd unlink
[02:16:45] <jmkasunich> hal_lib copies 1.0 from dummy to temp
[02:16:53] <SWPLinux> sure. you have two pieces of data to change - the pointer and the value
[02:16:58] <jmkasunich> rt thread interrupts the unlink
[02:17:08] <jmkasunich> rt comp writes new value to the signal
[02:17:16] <jmkasunich> unlink resumes, and writes the old value to the signal
[02:17:41] <SWPLinux> in the case of an unlink, change the pointer first, then the dummy value
[02:17:55] <SWPLinux> the dummy shouldn't matter as much, even if it's about to get reconnected
[02:18:24] <jmkasunich> none of this matters much if its a "proper" component that writes its outputs all the time
[02:18:35] <SWPLinux> right
[02:18:37] <jmkasunich> but if its one that writes on change only, you could have the wrong data for a long time
[02:19:03] <SWPLinux> on link, read dummy value, write dummy to signal, then change pin pointer
[02:19:21] <jmkasunich> thats what bugs me - its so much easier to guarantee a robust system if you just write your outputs periodically
[02:19:33] <jmkasunich> is that so much to ask from a component author?
[02:19:36] <SWPLinux> heh
[02:20:31] <jmkasunich> there are NO races in the current system - the order of functions in a thread precisely and unambiguously determines what gets written when
[02:20:52] <jmkasunich> by allowing a user space thread (the unlink or link command) to write a signal, you introduce the possibility of races that didn't exist before
[02:21:25] <jmkasunich> in fact, I misspoke
[02:21:35] <jmkasunich> its not the order of functions in a thread at all
[02:21:45] <jmkasunich> its the fact that hal enforces only one writer per signal
[02:21:58] <SWPLinux> yes
[02:22:04] <jmkasunich> unlink() becomes the second writer and all that goes out the window
[02:22:24] <SWPLinux> there's also the double-read approach
[02:22:46] <SWPLinux> which is based on the assumption that several instructions in a row will execute in userspace, before a second RT interrupt
[02:23:05] <jmkasunich> I'm not seeing that helping, am I missing something?
[02:23:15] <jmkasunich> assume an unlink
[02:23:35] <jmkasunich> and the comp (which only writes when someone clicks it) gets clicked in the middle of the operation
[02:23:44] <SWPLinux> it can't
[02:23:56] <jmkasunich> how do we ensure that dummy has the latest value that the comp wrote after the unlink is done?
[02:24:04] <jmkasunich> it can't what?
[02:24:05] <SWPLinux> well, maybe it can. Linux is a preemptive multitasker, after all ...
[02:24:15] <jmkasunich> yeah, it is ;-)
[02:24:33] <SWPLinux> read it again
[02:24:45] <jmkasunich> in fact, you can't guarantee that a RT thread doesn't run 100 times in the middle of an unlink command
[02:24:54] <SWPLinux> it does get more complex than a one-liner, it's true
[02:25:22] <jmkasunich> unfortunately you've now convinced me that it is a beneficial thing to do
[02:25:28] <SWPLinux> unlink procedure:
[02:25:29] <SWPLinux> heh
[02:25:41] <SWPLinux> read signal value
[02:25:45] <SWPLinux> read dummy pin value
[02:25:48] <SWPLinux> change pointer
[02:26:14] <jmkasunich> number your steps, so we can come back to them
[02:26:19] <jmkasunich> 1) read signal value
[02:26:21] <SWPLinux> heh - even better
[02:26:22] <SWPLinux> ok
[02:26:25] <jmkasunich> 2) read dummy
[02:26:26] <SWPLinux> start over
[02:26:29] <jmkasunich> 3) change pointer
[02:26:40] <jmkasunich> 4) steal underpants
[02:26:44] <SWPLinux> ok, except that I'm about to reorder them ;)
[02:26:44] <jmkasunich> 5) profit!
[02:26:52] <SWPLinux> 4a) ...
[02:27:05] <SWPLinux> (there's always a ... before profit!)
[02:27:24] <jmkasunich> even for calvin?
[02:27:41] <SWPLinux> hmmm. it would be good to have a special dummy location to point pins
[02:27:51] <SWPLinux> well, maybe the pin dummy works for that
[02:28:04] <SWPLinux> ok. here goes:
[02:28:10] <SWPLinux> 1) read signal
[02:28:19] <SWPLinux> 2) write that to dummy location
[02:28:22] <SWPLinux> (pin dummy data)
[02:28:28] <SWPLinux> 3) change pointer
[02:28:49] <SWPLinux> 4) read signal again
[02:29:31] <SWPLinux> 5) read dummy
[02:30:25] <SWPLinux> 6) if (dummy != first signal read), done (pin has updated the dummy value)
[02:31:11] <SWPLinux> 7) write second signal value to dummy
[02:31:47] <SWPLinux> there's still a race between 6 and 7, but we are unlinking here, so there is guaranteed to be some time before the value is needed again
[02:32:05] <jmkasunich> "some time" doesn't matter
[02:32:31] <jmkasunich> the whole reason for this is to allow components that may wait a very long "some time" between updating their outputs
[02:32:44] <SWPLinux> yeah. we need a test-and-set atomic instruction
[02:32:52] <jmkasunich> given that, we need to get the copy right _every_ time
[02:33:12] <SWPLinux> which does exist on some CPUs (cmpxchg8b or some such)
[02:33:59] <jmkasunich> yeah, but suddenly we're diving into /arch/asm/bitops.h or some such - I don't want to go there
[02:34:17] <SWPLinux> even worse - I'm not sure it's exported beyond some spinlock usage or something
[02:37:34] <SWPLinux> hmmm. "I'm trying to think, but nothing happens"
[02:37:39] <jmkasunich> * jmkasunich wants to write a polite note to component authors asking them to periodically update their outputs
[02:37:58] <SWPLinux> well, that's only part of the problem ...
[02:38:25] <jmkasunich> no, its all of the problem
[02:39:02] <SWPLinux> no, because you still have the issue that HAL holds pin values in the "wires" rather than in the pins
[02:39:07] <jmkasunich> if you _don't_ copy, the net result is that the link simply has no effect until the next time the comp writes its output
[02:40:15] <jmkasunich> since links are asynchronous anyway, and done by user space code that can be deferred because of other things (preemption), there is no difference between a link that is deferred because halcmd got preempted, and one that got "deferred" until the next time the comp does a write
[02:40:45] <jmkasunich> as long as the "next time" isn't a long time
[02:41:20] <jmkasunich> hmm, there is another issue on unlink
[02:41:40] <jmkasunich> we've been worrying about copying the signal value to dummy for later use during a link
[02:41:45] <jmkasunich> but what about the signal itself?
[02:41:58] <jmkasunich> when its disconnected, it will retain its previous value
[02:42:14] <SWPLinux> yeah. I have a solution to that, I think
[02:42:19] <SWPLinux> actually to both problems
[02:42:36] <SWPLinux> it totally changes the way pins and signals are handled in HAL though :)
[02:43:12] <SWPLinux> and I'm not sure it can work with IO pins (multiple writers) - it'll bear more thought
[02:43:20] <jmkasunich> are you gonna suggest that the storage be part of the output pin instead of the signal?
[02:43:43] <SWPLinux> yes. change which one has the double dereference and the single dereference
[02:44:19] <SWPLinux> it does slow down pin cod though, which isn't good
[02:44:21] <SWPLinux> code
[02:45:03] <jmkasunich> does it?
[02:45:06] <SWPLinux> I dunno
[02:45:20] <SWPLinux> the signal becomes a pointer to the actual writer pin
[02:45:24] <SWPLinux> data
[02:45:39] <jmkasunich> today, reading a pin is: x = *(comp->pin)
[02:45:51] <jmkasunich> writing a pin is *(comp->pin) = x
[02:46:14] <jmkasunich> the sig never comes into play - the link command magically alters comp->pin to point at sig->data
[02:46:19] <SWPLinux> I think it would be a **, but I'm not sure
[02:46:26] <SWPLinux> right
[02:46:35] <SWPLinux> so the same thing would happen here, I guess
[02:47:07] <SWPLinux> right - you need the second dereference so you can change the writer of the signal
[02:47:09] <jmkasunich> if there are 10 readers and 1 writer, at the moment all 11 pointers aim at sig->data
[02:47:23] <jmkasunich> and unlinking or linking a pin to that signal means altering that pins pointer only
[02:47:30] <SWPLinux> hmmm - right. ok
[02:47:50] <SWPLinux> but unlinking from the writer means updating all the reader pointers
[02:47:55] <jmkasunich> (if the 10 reader pointers pointed at the writer, then changing to a different writer would mean changing all 10 reader pointers)
[02:48:04] <SWPLinux> right :)
[02:48:06] <jmkasunich> bigtime non-atomic operation there
[02:48:17] <SWPLinux> right. that's why the double dereference
[02:48:27] <jmkasunich> oh
[02:48:34] <SWPLinux> the pins point to the signal, which is a pointer to the writer (or a dummy)
[02:48:40] <jmkasunich> comp->pin->sig->writer_pin.data
[02:48:44] <SWPLinux> yep
[02:48:54] <SWPLinux> the advantage is that writers never change
[02:49:05] <SWPLinux> they just point to the data (or are the data ...)
[02:49:12] <jmkasunich> are the data I think
[02:49:16] <SWPLinux> yep
[02:49:29] <jmkasunich> that eliminates the copy
[02:49:33] <SWPLinux> exactly
[02:49:42] <jmkasunich> the dummy is part of the signal struct
[02:49:47] <SWPLinux> the only problem is io pins, though that may not be a problem
[02:49:49] <SWPLinux> right
[02:49:59] <SWPLinux> this may actually significantly reduce the HAL memory footprint
[02:50:04] <jmkasunich> and is used for signals that have no writers (which includes tri-state signals - IO)
[02:50:11] <SWPLinux> yep
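A minimal sketch of the double-dereference layout being proposed: the writer pin is the data, the signal holds a pointer to the current writer's value (or to a dummy inside the signal struct), and reader pins point at the signal. All names here (`signal2_t`, `writer_pin_t`, `reader_pin_t`, and the functions) are hypothetical, values are simplified to `double`, and unlinked reader pins are left out. Whether the pointer store in `signal_set_writer()` is atomic on a given platform is an assumption.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double *src;     /* points at the writer pin's value, or at dummy */
    double dummy;    /* used while the signal has no writer */
} signal2_t;

typedef struct { double value; } writer_pin_t;     /* the writer pin *is* the data */
typedef struct { signal2_t *sig; } reader_pin_t;   /* readers point at the signal */

/* A fresh signal has no writer, so it reads from its own dummy. */
void signal_init(signal2_t *s)
{
    s->dummy = 0.0;
    s->src = &s->dummy;
}

/* Changing (or removing) the writer touches one pointer, however many
 * readers exist, and copies no values. Pass NULL to detach the writer. */
void signal_set_writer(signal2_t *s, writer_pin_t *w)
{
    assert(s != NULL);
    s->src = (w != NULL) ? &w->value : &s->dummy;
}

/* Reading costs one extra dereference: pin -> signal -> writer data. */
double reader_get(const reader_pin_t *r)
{
    return *r->sig->src;
}
```

The trade-off matches the one raised in the discussion: readers pay a second dereference, and in exchange the link and unlink paths no longer need a copy.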
[02:50:38] <jmkasunich> I don't see it reducing the memory footprint much
[02:50:52] <SWPLinux> true - not much
[02:51:02] <SWPLinux> every writer would lose a pointer
[02:51:06] <jmkasunich> gets rid of pin->dummy, but that is small compared to the rest of the pin data, notably the name
[02:51:36] <jmkasunich> the actual structs that are part of components wouldn't change size at all
[02:51:43] <SWPLinux> true
[02:51:52] <jmkasunich> reader or I/O pins would have a pointer, writer pins would have a value
[02:52:17] <SWPLinux> sure, which means that the data can come out of the component. so it's a small win there
[02:52:19] <jmkasunich> that right there is ugly in terms of converting legacy components - .comp ones would be easy, change .comp and recompile
[02:52:43] <jmkasunich> no, the size of the comp struct remains the same
[02:52:50] <SWPLinux> heh - that's the problem with using non-OO languages to make OO-like systems :)
[02:53:07] <jmkasunich> :-P
[02:53:10] <SWPLinux> heh
[02:53:55] <SWPLinux> ok, a quick grep for *( and -> yields 47391202 occurrences in hal/ :)
[02:54:22] <jmkasunich> liar
[02:54:25] <SWPLinux> heh
[02:54:48] <jmkasunich> if you had said 10000, I might have believed you
[02:54:59] <SWPLinux> I wanted to avoid that, actually :)
[02:55:20] <jmkasunich> I like ->
[02:55:44] <jmkasunich> I am far more comfortable with pointers than with objects
[02:55:48] <jmkasunich> they don't hide what they are
[02:55:51] <SWPLinux> yeah -like that would be a useful search term
[02:56:40] <SWPLinux> yeah. in most cases where things don't hide what they are, you have to do a lot of editing if you want to change what they are ...
[02:57:10] <jmkasunich> philosophy again
[02:57:15] <SWPLinux> in the case of comp, it looks like C (even though it's macro-ized for readability), so things can get rewritten pretty easily
[02:57:37] <jmkasunich> I'd rather be able to dive down to lower levels of abstraction, OO folks want to focus on higher levels
[02:57:53] <SWPLinux> oh, I think that's reality - I'm not saying which way is better (most of my code is in assembly language ...)
[02:58:01] <SWPLinux> right
[02:58:14] <jmkasunich> anyway, I think that we can both agree that the comp approach is nicer
[02:58:18] <SWPLinux> at the low levels, you have to know exactly what's happening
[02:58:31] <SWPLinux> yep
[02:58:38] <jmkasunich> but in my case, thats because even tho its hidden, I still KNOW what is under there
[02:58:39] <SWPLinux> it helps a lot in many circumstances
[02:58:42] <SWPLinux> yep
[02:58:45] <jmkasunich> if I didn't know, I'd be less happy about it
[02:59:15] <jmkasunich> anyway, enough philosophy
[02:59:20] <SWPLinux> ok
[02:59:28] <jmkasunich> a while back, we were convinced that the copy was a good thing
[02:59:48] <SWPLinux> well, that having pins not get changed by HAL is a good thing
[02:59:56] <jmkasunich> then we (or at least I) became convinced that doing it in a race-proof manner is non-trivial to impossible
[03:00:00] <SWPLinux> yes
[03:00:35] <jmkasunich> there are ways, such as the double indirection thing, but they definitely are NOT options for 2.1
[03:00:56] <SWPLinux> certainly not
[03:01:03] <jmkasunich> so, what do we do?
[03:01:07] <jmkasunich> (for 2.1)
[03:01:21] <jmkasunich> I'm tempted to ask awallin to make his widgets periodic
[03:01:41] <SWPLinux> well, since a race-less copy is probably out of the question, I'd say leave it and make the widgets periodic
[03:01:43] <SWPLinux> :)
[03:02:10] <jepler> I think making them periodic is not hard, but don't quote me on that
[03:02:15] <SWPLinux> the way jepler made the hal module, it's trivial - an update function gets called every 100 ms
[03:02:23] <SWPLinux> it was easy for the dials anyway
[03:03:03] <SWPLinux> the problem was that awallin's components have a lot more work to do, so it looks like it's ugly to do it 10 times a second when nothing is changing
[03:03:07] <jmkasunich> event handler: if (user tweaks something) { localout = value_from_widget(); *pin = localout; }
[03:03:17] <jmkasunich> update: *pin = localout;
[03:03:36] <jmkasunich> the only thing you do ten times a second is an assignment
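A minimal Python sketch of the pattern jmkasunich outlines above: the event handler copies the widget value into a local and writes the pin, and the periodic update only rewrites the pin from that local. The names (`DialWidget`, `on_user_change`, the pin name `dial.out`) are hypothetical, and a plain dict stands in for the HAL component; this is not the pyvcp code.

```python
class DialWidget:
    """Toy widget that keeps a local copy of its output value."""

    def __init__(self, comp, pin_name, initial=0.0):
        self.comp = comp            # assumption: dict standing in for the HAL component
        self.pin_name = pin_name
        self.localout = initial
        self.comp[pin_name] = initial

    def on_user_change(self, value_from_widget):
        # Event handler: remember the value locally, then write the pin.
        self.localout = value_from_widget
        self.comp[self.pin_name] = self.localout

    def update(self):
        # Periodic refresh: the only work is one assignment.
        self.comp[self.pin_name] = self.localout


comp = {}
dial = DialWidget(comp, "dial.out")
dial.on_user_change(3.5)
comp["dial.out"] = 0.0      # something else overwrites the pin value
dial.update()               # the next periodic refresh restores it
print(comp["dial.out"])     # 3.5
```

The refresh costs one assignment per pin, however expensive the widget's event handling is.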
[03:03:48] <SWPLinux> possibly an assignment to many pins though
[03:04:06] <jmkasunich> python isn't that slow
[03:04:06] <SWPLinux> the particular one that was a problem was a set of radio buttons with multiple bit outputs
[03:04:10] <SWPLinux> nope
[03:04:19] <jmkasunich> you can probably assign to 100 pins in the time it takes to make one tkinter call
[03:04:39] <SWPLinux> well, you have the same multiple-writer problem with a UI though
[03:04:51] <jmkasunich> ?
[03:04:53] <SWPLinux> there's the periodic update (like RT kernel code)
[03:05:00] <SWPLinux> and the event handler (like userspace)
[03:05:16] <SWPLinux> if the update is non-atomic, you don't want to interleave the two
[03:05:25] <jmkasunich> but I think in reality there is a polling loop, and only one thread of execution
[03:05:34] <SWPLinux> probably
[03:05:38] <SWPLinux> dunno though
[03:05:48] <jmkasunich> otherwise every GUI app in the world would be filled with races
[03:05:58] <jepler> that's true of Tk
[03:06:16] <SWPLinux> it's pretty common to separate UI and other stuff into multiple threads so the UI doesn't freeze when something is going on
[03:06:28] <jepler> usually you have one thread, and code is called based on timers or events, and returns after doing a bit of work
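The single-threaded model jepler describes can be sketched with a toy event loop in Python. This is an illustration only, not Tk's implementation: `ToyEventLoop`, `call_at` and `run_until` are hypothetical names, and time is a virtual millisecond counter. In Tkinter the rescheduling step would normally be done with the widget method `after(ms, func)`.

```python
import heapq


class ToyEventLoop:
    """One thread, one queue: callbacks run one at a time, in time order."""

    def __init__(self):
        self.now = 0          # virtual time in milliseconds
        self._queue = []      # heap of (when, sequence, callback)
        self._seq = 0         # tie-breaker so callbacks are never compared

    def call_at(self, when, callback):
        heapq.heappush(self._queue, (when, self._seq, callback))
        self._seq += 1

    def run_until(self, end_time):
        while self._queue and self._queue[0][0] <= end_time:
            when, _, callback = heapq.heappop(self._queue)
            self.now = when
            callback()        # runs to completion before the next event


log = []
loop = ToyEventLoop()


def periodic_update():
    log.append(("update", loop.now))
    loop.call_at(loop.now + 100, periodic_update)   # re-arm the 100 ms timer


def mouse_click():
    log.append(("click", loop.now))


loop.call_at(100, periodic_update)
loop.call_at(150, mouse_click)
loop.run_until(300)
print(log)
# [('update', 100), ('click', 150), ('update', 200), ('update', 300)]
```

Because each callback returns before the next one starts, the timer handler and the click handler never interleave.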
[03:06:30] <SWPLinux> (one for UI, one for "other stuff")
[03:06:56] <jmkasunich> basically, we want a "refresh" event
[03:07:00] <SWPLinux> ok, and the timeout is just another event? (in python)
[03:07:14] <jepler> axis is an example of how you *can* add threads to Tk if you want. The position is logged from the NML structure in a separate thread which means it doesn't lose track during a (potentially 100ms long) redraw of the screen
[03:07:31] <jepler> SWPLinux: yeah, something like that
[03:07:36] <jmkasunich> but you had to explicitly add that thread
[03:07:37] <SWPLinux> ok - good enough
[03:08:01] <jepler> jmkasunich: yep
[03:08:16] <jmkasunich> by default, if you are handling a mouse click event, you don't have to worry about a keypress event on another (or the same) widget running concurrently
[03:08:25] <jepler> SWPLinux: I shy away from saying that is "in python", because Python has interfaces to many different GUI toolkits, and they may not all have the same model
[03:08:33] <jmkasunich> that would make GUI programmers nuts I would think
[03:08:58] <SWPLinux> jepler: ok - just wondering how it's likely to work with halmodule
[03:09:35] <jepler> jmkasunich: don't worry -- even with just one thread, there are *still* a lot of ways that the (in)experienced programmer can shoot himself in the foot
[03:09:44] <jepler> basically they all boil down to recursively entering the event loop
[03:09:54] <jepler> (I don't think that pyvcp does this)
[03:11:25] <jmkasunich> the fact that recursively entering the loop is considered a foot-injuring incident tells me that you don't normally have concurrent event handling (if your code was written to handle concurrent events, then re-entering the loop wouldn't be so bad, as long as it was bounded)
[03:12:05] <jmkasunich> so...
[03:12:14] <SWPLinux> interestingly enough, I had a problem with events being asynchronous in a program I wrote (on Windows). The right-click event handler caused a popup menu to be displayed, and did something depending on what menu item the user clicked. the function would always exit without doing anything. as it turns out, the call to pop up the menu just created an event, and I had to use a separate...
[03:12:15] <SWPLinux> ...function to wait for a selection (trackpopup, as it happens) before continuing
[03:12:52] <jmkasunich> we need "refresh" or "update" or "timeout" events (whatever you want to call them)
[03:13:04] <SWPLinux> they're already there - jepler is smart :)
[03:13:09] <jmkasunich> and short simple handlers for that event for each widget in pyvcp
[03:13:38] <jmkasunich> actually, the normal event handlers should be written like:
[03:13:40] <SWPLinux> is this short enough:
[03:13:41] <SWPLinux> def update(self,pycomp):
[03:13:44] <SWPLinux>     self.pycomp[self.halpin] = self.faccumul8r
[03:14:02] <jmkasunich> query the widget; write to a local; call the refresh handler;
[03:14:42] <jmkasunich> SWPadnos: yes
[03:14:46] <SWPLinux> mouse/key actions cause events, so the internal value can easily be changed then
[03:14:52] <jmkasunich> right
[03:15:11] <jmkasunich> and that event should call the refresh, so we don't wait 100mS to write to the pin
[03:15:15] <SWPLinux> anyway - it's all there, but awallin was having some specific problem with a component or two
[03:15:33] <SWPLinux> the other components already work this way
[03:15:34] <jmkasunich> (the event handler could explicitly write the local and the pin, but it seems better to reuse the refresh handler)
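A hedged Python sketch of the handler layout discussed above, applied to the radio-button case with several bit outputs: the event handler only stores the selection in a local and then calls the same refresh routine the timer uses. `RadioGroup`, `on_select` and the pin names are hypothetical, and a dict again stands in for the HAL component.

```python
class RadioGroup:
    """Toy radio-button group driving one bit pin per choice."""

    def __init__(self, comp, pin_names):
        self.comp = comp                  # assumption: dict standing in for HAL pins
        self.pin_names = list(pin_names)
        self.selected = 0                 # local copy of the widget state
        self.update()

    def update(self):
        # Refresh handler: write every pin from the local state.
        for index, name in enumerate(self.pin_names):
            self.comp[name] = (index == self.selected)

    def on_select(self, index):
        # Event handler: store locally, then reuse the refresh handler
        # so the pins change immediately instead of up to 100 ms later.
        self.selected = index
        self.update()


comp = {}
radio = RadioGroup(comp, ["radio.a", "radio.b", "radio.c"])
radio.on_select(2)
print(comp)   # {'radio.a': False, 'radio.b': False, 'radio.c': True}
```

Keeping all pin writes inside `update` means there is a single place that touches the pins, whether the call comes from the timer or from a user event.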
[03:16:09] <SWPLinux> sure. if we can assume that a particular widget will not interrupt itself, then there aren't any race issues
[03:16:49] <jmkasunich> so, have we decided what do to (for 2.1)?
[03:17:00] <jmkasunich> s/do to/to do/
[03:17:15] <SWPLinux> I think "leave it alone and fix the errant components" is the best bet for 2.1
[03:17:19] <jmkasunich> agreed
[03:17:26] <jmkasunich> hope awallin agrees too
[03:17:32] <SWPLinux> I do think it's an error that should be addressed, either in 2.2 or in the refactor
[03:17:36] <jmkasunich> yes
[03:18:10] <SWPLinux> did someone start a wiki page with things in that category?
[03:18:19] <jmkasunich> tasks for trunk: error communication to user space and the user himself, and race free online unlink and link with well defined behavior
[03:18:45] <jmkasunich> ohh, HGR is having a sidewalk sale tomorrow ;-)
[03:18:51] <jmkasunich> free lunch
[03:18:57] <SWPLinux> heh
[03:19:03] <jmkasunich> 8am to 2pm
[03:19:15] <SWPLinux> hey - there's an eBay seller near you - "chuckwhole"
[03:19:17] <jmkasunich> dammit, I intend to sleep till at least 11, and then I have some chores that need done before 1
[03:19:27] <SWPLinux> has some nice industrial stuff
[03:19:35] <jmkasunich> I'm sure there's more than one, cleveland is a big city
[03:19:38] <SWPLinux> heh
[03:19:40] <jmkasunich> or do you mean very near
[03:19:54] <SWPLinux> damn - I forgot a couple of auctions this afternoon
[03:23:13] <jmkasunich> ok, I need to decide what kind of actual accomplishment I'm going to make tonight
[03:23:15] <jmkasunich> (besides talking)
[03:23:22] <SWPLinux> check!
[03:23:23] <jmkasunich> at least this time we reached some decisions
[03:23:32] <SWPLinux> yes. good ones, I think
[03:23:37] <jmkasunich> I guess backporting halcmd net is a start (that and the other fixes)
[03:23:52] <SWPLinux> http://wiki.linuxcnc.org/cgi-bin/emcinfo.pl/emcinfo.pl?Emc2.1.0
[03:25:32] <jepler> I'm so sick of documentation, but I'll try to work on some of those HAL component manpages if I can stand it
[03:25:49] <jmkasunich> jepler: I will be doing stepgen and freqgen
[03:25:58] <jepler> jmkasunich: OK. pwmgen?
[03:26:04] <jmkasunich> that two
[03:26:07] <jmkasunich> too
[03:26:10] <SWPLinux> it's not on the list, but I'm trying to make some quickstart guides
[03:26:52] <jmkasunich> jepler: if you want to do something right away - can you figure out which hal components do and don't have manpages, and put the list on the wiki?
[03:26:51] <jepler> I added some stuff to the "stepper config" lyx file earlier today that I did not backport yet
[03:27:01] <jepler> I used 'net' and didn't want to backport as-is if net wouldn't be in 2.1
[03:27:09] <jepler> jmkasunich: yes I'll put a list of the ones that don't
[03:27:13] <jmkasunich> ok
[03:27:22] <jmkasunich> I should have net backported in less than an hour
[03:27:37] <jmkasunich> (I want to review the diff carefully, then commit it, no real coding to be done)
[03:27:48] <jmkasunich> then I'll start on manpages
[03:28:07] <jmkasunich> ls
[03:33:28] <jepler> list added to the wiki page
[03:35:43] <jmkasunich> tanks
[03:36:23] <jepler> jmkasunich: is the non-blocks version of 'estop' called 'estop_latch'? Or are they different?
[03:36:45] <jmkasunich> eww...
[03:36:49] <jmkasunich> I'm not sure
[03:36:50] <SWPLinux> I think I fussed and pouted enough to have the name changed :)
[03:37:07] <jmkasunich> neither one of those is very nice (they smell of kluge)
[03:37:17] <SWPLinux> the functionality may have changed as a result of discussions with jonE as well
[03:37:23] <jmkasunich> yeah
[03:37:25] <SWPLinux> (though I'm not sure)
[03:37:55] <jmkasunich> btw, I'm not committing to man pages for the blocks versions of things
[03:38:12] <jepler> I was going to make the blocks manpage mostly a pointer to the new manpages and a note that blocks is deprecated...
[03:38:20] <jmkasunich> I like that idea
[03:39:08] <SWPLinux> the two estops are the same
[03:39:23] <jmkasunich> other than their names?
[03:39:26] <SWPLinux> yep
[03:39:42] <SWPLinux> the _latch was added because it doesn't actually stop anything, I think
[03:40:00] <jmkasunich> yeah, its just part of the estop system
[03:40:25] <SWPLinux> heh - "part of the stop system", if I want to be pedantic ;)
[03:40:45] <jepler> DESCRIPTION
[03:40:45] <jepler> Most of the items available in blocks are the same as in the individual
[03:40:48] <jepler> components, named below. blocks is deprecated and should not be used
[03:40:51] <jepler> in new HAL configurations. blocks may be removed from emc2 as soon as
[03:40:54] <jepler> version 2.2.0.
[03:41:08] <SWPLinux> (I was just in a long discussion about semi-S2 safety regs, which are at least as paranoid as anything in the machine industry :) )
[03:41:11] <jmkasunich> s/soon/early/
[03:41:27] <SWPLinux> looks good to me
[03:41:30] <jepler> OK
[03:41:55] <SWPLinux> should blocks be capitalized at the start of sentences, or left lowercase since it's "computerspeak"?
[03:42:22] <cradek> this geek says it's "misspelled" unless it's lowercase
[03:42:23] <jmkasunich> if you are referring to the component, I would not capitalize
[03:42:41] <jepler> I'll move on to documenting something more useful now :)
[03:42:43] <jmkasunich> if you are referring to the things in it ("the available blocks") then yes
[03:42:51] <SWPLinux> ok by me - ws curious about this particular situation
[03:42:54] <SWPLinux> s/ws/was/
[03:43:43] <jmkasunich> wow, more than I expected without manpages
[03:43:48] <jepler> I'm volunteering to do these manpages: blocks counter debounce weighted_sum
[03:44:02] <jmkasunich> ok
[03:44:08] <jmkasunich> add encoder to my list
[03:44:30] <jepler> pid is probably an important one too
[03:44:42] <jmkasunich> and supply (I think I'll rewrite that one as a .comp, it will be self documenting that way)
[03:44:43] <jepler> this list doesn't include any hardware drivers, I made it on a 'sim' machine
[03:45:25] <jmkasunich> hmm... some things like parport and motenc are nicely documented in the lyx
[03:45:43] <jepler> yeah some of these have quite extensive documentation in lyx
[03:45:44] <jmkasunich> I can't help thinking that we're missing something by having manpages and lyx as independent things
[03:45:52] <cradek> I'm going to head home - goodnight guys
[03:45:59] <jepler> don't forget your brownies
[03:46:00] <jmkasunich> goodnight chris
[03:46:06] <SWPLinux> see you later
[03:46:10] <cradek> thanks, I won't
[03:46:30] <jmkasunich> lyx has the advantage of allowing pictures and such
[03:46:37] <jmkasunich> man is more "at your fingertips"
[03:46:45] <jepler> yeah -- I like man for that reason
[03:46:58] <jepler> no "start a pdf viewer" or webbrowser
[03:49:48] <jmkasunich> it would be nice if the lyx could import the manpage for the basics (pins, etc) and add more detail - additional text and/or drawings
[03:49:59] <jmkasunich> sometimes a pic is worth a thousand words
[03:50:23] <jmkasunich> like the stepping types, and the definition of stepspace, dirhold, etc
[03:51:53] <jepler> it's tough -- especially for the autogenerated stuff
[03:52:01] <jmkasunich> yeah
[03:52:10] <jmkasunich> for now, I'll be content with manpages
[03:52:46] <jmkasunich> it's a copout, but for modules that have nice detailed descriptions in the lyx, the manpage will probably be very brief, and reference the lyx
[03:53:24] <jepler> I don't mind that at all
[03:59:04] <jmkasunich> is the variable memory length for halscope a backport candidate?
[04:00:28] <jmkasunich> nah - too big of a diff
[04:04:50] <jmkasunich> jepler: is the comp in trunk the same as the one in 2.1?
[04:05:01] <jmkasunich> * jmkasunich is being lazy, I should just diff it
[04:05:02] <jepler> jmkasunich: I think there may be differences now
[04:05:17] <jepler> in fact I'm pretty sure it's different
[04:05:23] <jmkasunich> I was wondering if any of the docs should be backported
[04:05:26] <jmkasunich> sounds like no
[04:05:39] <jepler> earlier I had to backport the "pins with default values" change from TRUNK to v2_1_branch
[04:07:50] <jepler> these revisions are new features that are not on v2_1_branch:
[04:07:51] <jepler> revision 1.11
[04:07:52] <jepler> date: 2006/12/26 03:49:05; author: jepler; state: Exp; lines: +27 -5
[04:07:52] <jepler> improve error handling. introduce 'unsigned' and 'signed' as names for the HAL
[04:07:55] <jepler> types, and deprecate 's32' and 'u32'.
[04:07:55] <jepler> revision 1.14
[04:07:58] <jepler> date: 2007/01/07 21:20:40; author: jepler; state: Exp; lines: +63 -5
[04:08:00] <jepler> comp can now build userspace components
[04:08:06] <jepler> other recent changes were bugfixes and have been backported
[04:08:33] <jepler> I'm still not sure whether 1.11 is the right thing to do, incidentally
[04:08:34] <jepler> 'night
[04:08:44] <jmkasunich> ok, short answer - I will not backport any changes to comp.lyx that you haven't already backported
[04:08:56] <jmkasunich> goodnight
[04:09:12] <SWPLinux> see you Jeff
[05:56:37] <jmkasunich> damn...
[05:56:44] <jmkasunich> waitpid can only wait for its own children
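The waitpid restriction can be demonstrated from Python on a POSIX system, since `os.waitpid` wraps the same system call. This is only an illustration of the limitation, not the halcmd code; `can_wait_for` is a hypothetical helper, and the process's own pid is used as a convenient example of a pid that is not one of its children.

```python
import os
import subprocess
import sys


def can_wait_for(pid):
    """Return True if waitpid() accepts this pid, False if it is not our child."""
    try:
        os.waitpid(pid, os.WNOHANG)   # WNOHANG: poll, do not block
        return True
    except ChildProcessError:         # errno ECHILD
        return False


# A real child: it blocks reading stdin until we close the pipe.
child = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdin.read()"],
    stdin=subprocess.PIPE,
)
child_result = can_wait_for(child.pid)     # our own child: allowed
own_result = can_wait_for(os.getpid())     # not a child of ours: ECHILD
child.communicate()                        # close stdin and reap the child
print(child_result, own_result)            # True False
```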
[14:02:43] <jepler> jmkasunich: whee -- waituser is cool
[14:02:58] <jepler> I had wanted much the same thing
[14:03:13] <jepler> looks like the implementation was not bad either
[14:03:17] <alex_joni> heh
[14:03:26] <alex_joni> was?
[14:03:31] <jepler> is?
[14:03:42] <alex_joni> ok.. thought you want to change it :D
[14:03:43] <jepler> good morning alex
[14:04:12] <alex_joni> hi jeff
[14:09:37] <alex_joni> bbl
[15:19:01] <lerneaen_hydra_> lerneaen_hydra_ is now known as lerneaen_hydra
[16:20:57] <jepler> cradek: I'm kinda thinking that the separate -doc package is not a good idea, if it means upgraders will lose the PDF documentation
[16:21:04] <jepler> (the "launcher in the main package" bug aside)
[16:23:40] <jepler> that said, the docs are about 3x as big as the main package
[16:24:09] <jepler> 4528 emc2-docs_2.1.0~alpha0_all.deb 1628 emc2-sim_2.1.0~alpha0_amd64.deb
[16:36:26] <cradek> 5-6 MB is getting to be pretty big for dialups, but I also prefer to have them together
[16:42:59] <lerneaen_hydra> aren't EMC's dependencies quite meaty? if so 5-6 MB shouldn't be that much (compared to the rest)
[16:44:17] <cradek> a bigger issue might be that all updates will be that size - dialup folks already have to make special provisions for the initial install, and they can get all the dependencies then
[16:45:56] <cradek> but if we can assume only a few bugfix updates per year (like 2.0 had), maybe it doesn't matter at all
[16:47:38] <jepler> 6 megabytes is just 25 minutes at around 30kbps
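The size and download-time figures above can be checked with a few lines of Python. Two assumptions are made here: a megabyte is taken as 1,000,000 bytes, and the numbers 4528 and 1628 quoted for the two packages are treated as sizes in the same unit (probably kilobytes). `download_minutes` is a hypothetical helper.

```python
def download_minutes(size_bytes, link_bits_per_second):
    """Ideal transfer time in minutes, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second / 60


six_mb = 6 * 1000 * 1000                             # assumption: 1 MB = 10**6 bytes
print(round(download_minutes(six_mb, 30_000), 1))    # 26.7
print(round(download_minutes(six_mb, 32_000), 1))    # 25.0

docs_size, sim_size = 4528, 1628                     # figures quoted for the two .deb files
print(round(docs_size / sim_size, 2))                # 2.78
```

So "25 minutes at around 30kbps" and "about 3x as big" both hold as round figures.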
[16:48:37] <cradek> ok
[16:48:40] <cradek> I'm not worried then
[17:59:30] <tomp_> many options allowed by tkinter do not work with pyvcp. (ref = http://infohost.nmt.edu/tcc/help/pubs/tkinter/radiobutton.html) eg: <relief>RIDGE</relief> for radiobutton
[18:00:19] <tomp_> i see the values are known to the parser, but somehow they're disallowed.
[18:01:24] <tomp_> vcpparse.py has the disallowed options in w_command

#emc-devel | Logs for 2007-01-20