[00:00:01] <_petev> ok
[00:00:25] <jmkasunich> the PID output can (and does) swing rapidly, _especially_ when you give it a step input
[00:00:35] <_petev> yes, I see that
[00:00:56] <jmkasunich> it will go to the rail at first, then when you get close, it will almost as rapidly drop back down to whatever is needed
[00:01:29] <_petev> yes, I will see it drop to the other rail to decel, then it oscillates a bit
[00:01:29] <jmkasunich> if the drive has an accel ramp, its own speed ref will ramp up while the PID output is at the rail, then when the PID drops down, the drive can't follow, and overshoots
[00:02:08] <jmkasunich> you are introducing a lag inside the position loop
[00:02:40] <_petev> so what you are saying is the accel on the PID output will be more than the machine limits
[00:03:07] <jmkasunich> it can be, especially if you are applying a step to the position command
[00:03:22] <_petev> I understand for the step case, but that's just for tuning
[00:03:28] <_petev> how about normal operation?
[00:03:47] <_petev> are you saying it's normal for the ddt of the PID to be more than the INI accel limit?
[00:03:54] <jmkasunich> in normal operation, the position command will obey machine constraints
[00:04:12] <_petev> so for normal operation, it should not present a problem?
[00:04:17] <jmkasunich> since a well tuned position loop will track the command, the actual position, vel, and accel will also obey constraints
[00:04:26] <_petev> right
[00:04:43] <jmkasunich> that is one possible issue with using steps to tune
[00:04:55] <_petev> so it appears well tuned, except for the I behavior, but I still see issues
[00:05:10] <jmkasunich> you wind up optimizing for the "step drive the loop into saturation, then it recovers" case, which you hopefully will never see in practice
[00:05:13] <_petev> I need to try and tune one of the other axes as well
[00:05:57] <_petev> I would think the same values would work since the belts are off, but the X, Y, seem to have more audible noise
[00:06:05] <_petev> I need to scope them and see what's going on
[00:06:11] <jmkasunich> yep
[00:06:17] <jmkasunich> can't tune without a scope
[00:06:27] <_petev> the Z drives had about 36mV offset on the DAC
[00:06:32] <_petev> all the others were close to 0
[00:06:38] <_petev> so that may be something
[00:06:54] <_petev> I offset it in the drive, but it seems suspicious from the behavior
[00:06:54] <jmkasunich> an I term would deal with that automatically
[00:07:16] <_petev> yes, but I can't get I to work for both static and dynamic conditions
[00:07:25] <jmkasunich> the I term you mean?
[00:07:30] <_petev> yes
[00:07:46] <jmkasunich> have you tried small steps?
[00:07:48] <_petev> if I add enough to correct static errors, the dynamic behavior becomes unstable
[00:07:52] <_petev> yes
[00:07:54] <jmkasunich> like 1/2 of a motor revolution?
[00:08:13] <_petev> the step I'm using now is about 1/3 rev
[00:08:16] <jmkasunich> ok
[00:08:29] <jmkasunich> what kind of times are you achieving?
[00:08:32] <_petev> and pretty low freq
[00:08:34] <jmkasunich> under 100mS I hope
[00:08:39] <_petev> plenty of settling time
[00:08:52] <_petev> under 100ms for what, step response?
[00:08:55] <jmkasunich> yes
[00:09:03] <_petev> closer to 50-60 ms
[00:09:13] <jmkasunich> thats about what I was expecting
[00:09:36] <_petev> but it still gets an FE on home operation
[00:09:40] <jmkasunich> I strongly urge you to remove as many limits as possible in the drive
[00:09:53] <_petev> ok, I will take them out and try it
[00:09:56] <jmkasunich> I don't even want to talk about homing until the drives are tuned
[00:10:02] <_petev> then I want to look at the other axis too
[00:10:04] <_petev> sure
[00:10:16] <_petev> but it does appear tuned now, except for I
[00:10:27] <_petev> I will see if removing drive limits helps with I
[00:10:46] <_petev> I also want to take a quick look at the PID block diagram
[00:10:48] <jmkasunich> if you don't achieve an acceptable steady state error shortly after the step, then its not tuned
[00:11:01] <jmkasunich> checking the mazak gains now...
[00:11:09] <_petev> it's pretty good on the dynamic case with low I
[00:11:13] <_petev> or even no I
[00:11:21] <_petev> but not on a static case
[00:11:32] <_petev> it won't drive the error to 0 without much more I
[00:12:12] <jmkasunich> pgain 4000 igain 6400 dgain 15
[00:12:18] <jmkasunich> no FF0, no FF1
[00:12:23] <_petev> is that mazak?
[00:12:26] <jmkasunich> 0.000035 deadband
[00:12:30] <jmkasunich> yes
[00:12:56] <_petev> with an I gain like that the static case is good, but dynamic is unstable
[00:13:05] <_petev> let me try without drive limits
[00:13:44] <jmkasunich> the ferror limits are set fairly high (0.050 and 0.010) but ISTR that the actual error was under 0.002 during the accel and decel of a rapid, and under 0.0001 pretty much all the rest of the time
[00:14:11] <_petev> do u see any reason why machine limits can't be obeyed when stopping due to FE?
[00:14:47] <jmkasunich> by definition, when you have an FE, the drives are no longer tracking the commanded position
[00:14:48] <_petev> it seems like worst case the pos should just freeze, then at least it's only a vel step and not a pos step
[00:15:02] <_petev> yes, but they are presumably very close
[00:15:10] <_petev> as you just got an FE
[00:15:15] <jmkasunich> I guess I don't understand the question
[00:15:40] <jmkasunich> when the machine gets an FE, the drives are disabled
[00:15:47] <_petev> it seems like changing the cmd-pos to the fb-pos just causes more spikes out of the PID and abuses the machine
[00:16:03] <jmkasunich> do you have the PID enable hooked up?
[00:16:45] <jmkasunich> if you do, as soon as the machine FEs out, the enable turns off, the PID outputs go to zero, and it coasts to a stop (or ramps at the drive limit, if you don't disable the drives)
[00:16:46] <_petev> maybe there is some delay before the drives react, because they see the large spikes from PID, this is what got me looking for the large accels in the first place
[00:16:58] <jmkasunich> what large spikes
[00:17:08] <jmkasunich> the pid output should be going to zero
[00:17:19] <_petev> when FE happens, the ddt of the motor-pos is very large
[00:17:25] <_petev> it makes a pos step
[00:17:32] <_petev> and the drives were seeing this
[00:17:36] <jmkasunich> the PIDs output should be zero!
[00:17:51] <jmkasunich> are you listening to me? do you have the PID enable hooked up?
[00:17:55] <_petev> I will add the enable
[00:17:58] <_petev> I heard you
[00:18:03] <_petev> just telling you what I see
[00:18:13] <_petev> the drives say there is an accel violation
[00:18:23] <jmkasunich> the enable is the only thing that will save your machine if you have an encoder violation
[00:18:30] <_petev> and we determined yesterday it coincided with the FE assertion
[00:18:30] <jmkasunich> s/violation/failure
[00:19:05] <_petev> yes, but maybe we need to disable a cycle before stopping the motion
[00:19:23] <jmkasunich> if enable is hooked up, the PID will be disabled the instant the FE occurs, and it won't make a damn bit of difference what the pos-cmd does
[00:19:25] <_petev> right now the large step happens at the same time as the FE
[00:19:36] <_petev> I will add amp enable and see where that is
[00:20:01] <_petev> hmm, I will check that too then
[00:20:10] <_petev> I know the enables from axis are going to the PIDs
[00:20:19] <_petev> so maybe the timing is a cycle off
[00:20:35] <_petev> or maybe the drives have a delay on shutting down
[00:20:46] <jmkasunich> the drives might, the PIDs don't
[00:20:49] <jmkasunich> if they do its a bug
[00:21:10] <_petev> I was thinking more the relation between FE and enable from axis
[00:21:13] <_petev> not the PIDs
[00:21:24] <jmkasunich> you should be able to scope pos-cmd, amp-enable (from motion, goes to PID and at your discretion to the amps), and pid output
[00:21:37] <_petev> yes, I will do that
[00:21:58] <jmkasunich> the PID output and thus the DACs should go to zero the instant the FE happens if the PID enables are connected
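A minimal sketch of the behavior described above: a position PID whose output clamps at the rail on a large step and is forced to zero while the enable is off. This is not the actual HAL PID component; the struct, the field names, and the choice to clear the integrator on disable are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified PID state (not the real HAL pid component). */
typedef struct {
    double pgain, igain, dgain;
    double max_output;   /* symmetric output limit, i.e. "the rail" */
    double integ;        /* accumulated integral of the error */
    double prev_error;   /* error from the previous period, for the D term */
} pid_sketch_t;

/* One servo-period update. Returns the clamped output, or zero when disabled. */
double pid_sketch_update(pid_sketch_t *pid, bool enable,
                         double command, double feedback, double dt)
{
    double error = command - feedback;
    double deriv, out;

    if (!enable) {
        /* Disabled (for example after a following error): output is zero
           no matter what the position command does. Clearing the
           integrator here is an assumption of this sketch. */
        pid->integ = 0.0;
        pid->prev_error = error;
        return 0.0;
    }

    pid->integ += error * dt;
    deriv = (error - pid->prev_error) / dt;
    pid->prev_error = error;

    out = pid->pgain * error + pid->igain * pid->integ + pid->dgain * deriv;

    /* A large step in the command drives the output to the rail. */
    if (out > pid->max_output)  out = pid->max_output;
    if (out < -pid->max_output) out = -pid->max_output;
    return out;
}
```

Calling `pid_sketch_update()` once per servo period with `enable` wired to the amp-enable signal gives the "output goes to zero the instant the FE occurs" behavior, independent of what happens to the position command afterwards.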
[00:22:07] <_petev> hopefully it won't be much of an issue as I don't want to be getting FEs all the time ;-)
[00:22:19] <jmkasunich> the amps will do a max-torque decel toward zero, until they see the disable (if its connected)
[00:22:26] <_petev> I will check the HAL file too
[00:22:41] <_petev> I'm pretty sure the thread order is good, but it can't hurt to check again
[00:28:22] <petev> was just looking at the PID block diagram
[00:28:31] <petev> it looks pretty standard
[00:28:59] <petev> what do you think about a IIR low pass on the PID sum output and maybe a notch filter too
[00:29:24] <petev> might be good for taking out resonances and reducing the high freq gain from D
[00:51:57] <jmkasunich> feel free to build that stuff into a new PID component
[00:54:44] <petev> do you see any issues with the phase shift an IIR filter would introduce?
[00:57:07] <jmkasunich> depends on the ratio between filter corner frequency and system bandwidth
[00:57:13] <jmkasunich> its just another thing to tune
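As a rough illustration of the filter idea, here is a first-order IIR low-pass that could sit on a PID sum output. The type and function names are hypothetical and nothing like this is claimed to exist in the HAL PID component. For the analog prototype of this filter, the phase lag at frequency f is atan(f / corner): about 5.7 degrees at one tenth of the corner and 45 degrees at the corner, which is why the ratio between corner frequency and loop bandwidth matters.

```c
/* Hypothetical first-order IIR low-pass: y += alpha * (x - y). */
typedef struct {
    double alpha;  /* smoothing factor, 0 < alpha < 1 */
    double y;      /* filter state, which is also the last output */
} lowpass_sketch_t;

/* Derive alpha from a corner frequency (Hz) and the sample period (s). */
void lowpass_init(lowpass_sketch_t *f, double corner_hz, double dt)
{
    const double pi = 3.14159265358979323846;
    double rc = 1.0 / (2.0 * pi * corner_hz);  /* equivalent RC time constant */

    f->alpha = dt / (rc + dt);
    f->y = 0.0;
}

/* Run one sample through the filter and return the filtered value. */
double lowpass_update(lowpass_sketch_t *f, double x)
{
    f->y += f->alpha * (x - f->y);
    return f->y;
}
```

With a 1 ms servo period and a 100 Hz corner, the first output after a unit step is only a fraction of the input; that delay is the lag being added inside the position loop.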
[00:57:35] <petev> do you have any plots of the phase shift of any of the PID configs?
[00:57:44] <petev> would be useful to see what they look like
[00:57:53] <petev> maybe I should make a matlab model?
[00:58:00] <jmkasunich> Bode!
[00:58:20] <jmkasunich> the I term rolls off at 20dB/decade, and has 90 degrees lag
[00:58:32] <jmkasunich> the P term has constant gain and 0 phase shift
[00:58:39] <jmkasunich> at zero freq, I dominates
[00:58:56] <petev> what about the FF?
[00:58:58] <jmkasunich> at some point, they cross over, and phase shift goes from 90 lagging to zero
[00:59:16] <jmkasunich> FF isn't inside the loop (because feedback doesn't modify it)
[00:59:24] <jmkasunich> superposition says you can model it separately
[00:59:30] <jmkasunich> which is why I tune without it
[00:59:42] <petev> ok, so kind of like a flat bottom V?
[01:00:00] <jmkasunich> what, the phase lag?
[01:00:09] <petev> no, bode gain
[01:00:43] <jmkasunich> bode gain is infinite at DC, slopes down as frequency goes up, then levels off when Pgain begins to dominate
[01:00:57] <petev> then back up with D, no?
[01:00:59] <jmkasunich> the Pgain/Igain ratio determines that level-off point
[01:01:04] <jmkasunich> true
[01:01:23] <jmkasunich> hadn't thought about D that much
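The Bode description above can be put into numbers with a small sketch. It assumes an ideal parallel-form PI controller, P + I/(jw), and ignores the D term and any scaling inside the real PID component; the function names are made up for this example. The I term has gain igain/w, so it equals the P term at w = igain/pgain, which is the level-off point set by the Pgain/Igain ratio.

```c
/* Frequency (rad/s) where the I term's gain, igain / w, equals pgain.
   Below this the I term dominates (about 90 degrees of lag), above it
   the P term dominates (about 0 degrees). */
double pi_corner_rad_s(double pgain, double igain)
{
    return igain / pgain;
}

/* Squared magnitude of P + I/(jw): pgain^2 + (igain / w)^2.
   Returned squared so the sketch does not need the math library. */
double pi_gain_squared(double pgain, double igain, double w)
{
    double i_part = igain / w;

    return pgain * pgain + i_part * i_part;
}
```

At the corner the squared gain is twice `pgain * pgain` (3 dB above the P plateau) and the phase lag is 45 degrees. Scaling both gains by the same factor leaves `pi_corner_rad_s()` unchanged, which matches the remark that only the ratio has to be right.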
[01:02:00] <petev> with the motor in the loop, that adds another integrator, no?
[01:02:24] <jmkasunich> I recall from the Mazak that once I had the Pgain/Igain RATIO correct (determined by amp bandwidth, motor inertia, etc), I could change the absolute value of both of them together and it worked nicely
[01:02:49] <petev> you mean ratio wise or offset wise?
[01:03:02] <petev> I found I could keep the ratio the same with P/D
[01:03:03] <jmkasunich> the drive/motor combo is an integrator, with at least one additional pole (roll off point)
[01:03:46] <jmkasunich> I think the key was getting the P/I ratio (and the gain of 90 degrees of phase margin) lined up with the amp/motor's pole (and loss of 90 degrees)
[01:04:20] <petev> what would it take to make halscope plot one input against another instead of time?
[01:04:43] <petev> might be nice to make a sweep gen with two siggens
[01:04:55] <petev> could probably see the bode response directly
[01:06:00] <jepler> halscope doesn't do that, but if you don't need user-friendly software you can gather the data with halsampler and graph it with gnuplot or the like
[01:06:30] <jmkasunich> you could probably even FFT it ;-)
[01:06:46] <petev> true
[01:07:17] <petev> just put in an impulse and sample it
[01:18:06] <cradek> anyone want to dig into this max-axes thing?
[01:18:12] <jmkasunich> yeah
[01:18:29] <cradek> I've got 2h11 and no power pack, so let's get going
[01:18:48] <jmkasunich> http://cvs.linuxcnc.org/cgi-bin/cvsweb.cgi/emc2/src/emc/motion/motion.c.diff?r1=1.91;r2=1.92
[01:19:00] <jmkasunich> thats when it was added
[01:19:25] <jmkasunich> prior to that, there were always the same number of joints, and all were inited
[01:20:36] <jmkasunich> http://cvs.linuxcnc.org/cgi-bin/cvsweb.cgi/emc2/src/emc/motion/control.c.diff?r1=1.90;r2=1.91
[01:20:38] <cradek> ok I see
[01:20:44] <jmkasunich> equivalent commit in control.c
[01:21:11] <jmkasunich> lots of places where he just changed EMCMOT_MAX_AXIS to EMCMOT_MAX_JOINTS
[01:21:45] <jmkasunich> Its still upper case, does that mean its still a compile time constant?
[01:21:55] <jmkasunich> (and he's still looping over all the joints)
[01:22:07] <jmkasunich> or is that the insmod param now?
[01:22:39] <jmkasunich> looks like the only thing changed in control.c is AXIS to JOINTS
[01:23:46] <jmkasunich> ditto command.c
[01:24:45] <jmkasunich> +/* userdefined number of max joints. default is EMCMOT_MAX_AXIS(=8),
[01:24:45] <jmkasunich> + but can be altered at motmod insmod time */
[01:24:45] <jmkasunich> +extern int EMCMOT_MAX_JOINTS;
[01:24:54] <jmkasunich> evil... its a variable, but all uppercase
[01:25:10] <cradek> hmm
[01:25:48] <jmkasunich> so, pretty much every "loop thru joints" loop in the motion controller now loops thru a smaller number of joints
[01:26:01] <cradek> ok I wonder what we have to do to make that "ok"
[01:26:40] <jmkasunich> free mode stuff should be fine, joints are decoupled there
[01:26:57] <jmkasunich> anything that takes an emcPose and converts it will have issues I bet
[01:27:08] <jmkasunich> kins for example
[01:27:12] <cradek> jepler points out that #axes and #joints don't have to be the same
[01:27:22] <cradek> so it's a big mess
[01:27:27] <jmkasunich> yep
[01:27:41] <jmkasunich> the joints/axes thing has always been a big mess
[01:27:50] <jmkasunich> big enough that we are afraid to tackle it
[01:28:42] <jepler> - emcmotConfig->numAxes = EMCMOT_MAX_AXIS;
[01:28:42] <jepler> + emcmotConfig->numAxes = EMCMOT_MAX_JOINTS;
[01:28:46] <jmkasunich> control.c line 1793
[01:29:16] <jmkasunich> jepler: ?
[01:30:07] <jmkasunich> control.c line 1797, I bet the higher numbered elements of "positions" don't get initialized
[01:30:08] <jepler> it just struck me as strange
[01:30:31] <jmkasunich> he changed that name from AXIS to JOINT everywhere it appeared
[01:30:32] <jepler> I mean, it's "right" in some sense
[01:30:34] <jepler> right
[01:30:43] <jmkasunich> but JOINT is a variable now
[01:30:46] <jepler> right
[01:30:57] <cradek> maybe we should revert the change except for the pin exporting
[01:31:01] <cradek> was that the only real goal?
[01:31:18] <jmkasunich> I think
[01:31:21] <jepler> and the copying-to-HAL
[01:31:30] <jepler> I think that first, alex tried to just skip the pin export
[01:31:33] <jmkasunich> but that might not be easy, there have been a lot of other changes since then
[01:31:38] <jepler> it crashed immediately, at the copy-to-hal step
[01:31:48] <jmkasunich> well duh
[01:31:49] <jepler> there's also stuff like reading the limit switches from HAL
[01:31:59] <jepler> so you have to put the logic at least at that level
[01:32:14] <jmkasunich> if we want to "just" get rid of the HAL, some loops will have to be modified, and others left alone
[01:32:40] <jmkasunich> but, I think only a few need changed back, like the positions[] one
[01:32:58] <jmkasunich> so instead of reverting the whole thing, and then re-applying the hal parts
[01:33:11] <jmkasunich> why not start where we are, and only revert the bad parts
[01:33:19] <cradek> ok if we can identify them
[01:33:28] <jmkasunich> forward and reverse kins are candidates
[01:33:35] <cradek> wonder how many of us can see the problems currently - i'll try
[01:33:47] <jmkasunich> cartesian poses _always_ have 6 coords, regardless of the number of joints
[01:33:58] <SWPadnos> I think part of the idea with that patch was to rename all the stuff that deals with joints, so that it's easier to see what the hell is going on when the "real work" gets done later
[01:34:10] <jmkasunich> so transformations from joint to cartesian space have to fake out the missing joints somehow
[01:34:20] <jmkasunich> SWPadnos: yes
[01:34:32] <jmkasunich> another reason not to revert the whole thing
[01:34:47] <SWPadnos> right
[01:35:21] <jmkasunich> although it terribly offends me that the "max joints" variable is all uppercase
[01:35:30] <jmkasunich> I think EMCMOT_MAX_JOINTS should be the constant 8
[01:35:38] <jmkasunich> and emcmot_num_joints should be the variable
[01:36:05] <jepler> I can agree with that, but if we make that change let's do it as a separate commit from the substantive changes
[01:36:10] <jmkasunich> right
[01:36:16] <jmkasunich> I'm tempted to do it right now
[01:36:32] <jmkasunich> it won't take long, and will help clarify while we do the rest
[01:36:33] <cradek> if so, you will break any reverse patching we might want to do
[01:36:43] <jmkasunich> oh
[01:36:48] <jmkasunich> drat
[01:37:01] <cradek> I'd say it would be best to swallow that (justified) impulse for a bit
[01:37:23] <jmkasunich> there are places where we are going to want to loop 0 to 8, and other places where we will want to loop 0 to num_joints
[01:37:46] <jmkasunich> for example, when loading "positions[]" before doing kins
[01:38:06] <jmkasunich> I want to do a 0-8 loop, writing zeros, then a 0-num_joints loop writing actual feedback values
[01:38:24] <jmkasunich> (right now it only does the latter, which I think is why non-initialized stuff appears)
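The two-loop idea can be sketched as follows. The constant name, function name, and parameters are hypothetical stand-ins, not the actual code in control.c: first zero every slot up to the compile-time maximum, then overwrite only the slots that belong to configured joints.

```c
#define SKETCH_MAX_JOINTS 8   /* compile-time array size (hypothetical name) */

/* Fill positions[] before calling kinematics.
   joint_feedback holds num_joints valid entries; the remaining slots of
   positions[] are zeroed so kins never reads uninitialized values. */
void load_positions(double positions[SKETCH_MAX_JOINTS],
                    const double *joint_feedback, int num_joints)
{
    int n;

    /* Loop 0..8: every element gets a defined value. */
    for (n = 0; n < SKETCH_MAX_JOINTS; n++) {
        positions[n] = 0.0;
    }
    /* Loop 0..num_joints: only configured joints carry real feedback. */
    for (n = 0; n < num_joints && n < SKETCH_MAX_JOINTS; n++) {
        positions[n] = joint_feedback[n];
    }
}
```

With three configured joints, slots 3 through 7 read back as zero instead of whatever happened to be in memory.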
[01:38:44] <cradek> crap, I get beautiful zeroes for everything on my -simulator build
[01:38:51] <cradek> this is what you saw last night too iirc
[01:38:58] <jmkasunich> yeah, no error on the sim
[01:39:17] <cradek> that's an extra challenge
[01:39:24] <jmkasunich> strangely tho, I also saw no error on the 46 hour old checkout
[01:39:49] <jmkasunich> since alex's change dates from February, that doesn't make sense
[01:40:10] <jmkasunich> unless alex's change opened the door to the bug, but a later change actually made it happen
[01:40:42] <jmkasunich> anyway, positions[0-7] NEEDs to be inited before calling kins
[01:40:45] <jmkasunich> I'm gonna do that now
[01:41:17] <jmkasunich> I'll use EMCMOT_MAXJOINTS_CONSTANT for the '8'
[01:41:23] <jmkasunich> so we can fix the names later
[01:48:54] <jepler> valgrind (which is not always right when it comes to emc :-P) thinks that uninitialized values are used in the following places: http://pastebin.ca/471681
[01:49:06] <jepler> not sure if that is any help or not
[01:49:55] <jmkasunich> can't hurt, I'll take a look at those
[01:53:09] <jmkasunich> does valgrind do cross-file checks?
[01:53:44] <jmkasunich> unless I threw my line numbers off (a distinct possibility), 459 is this:
[01:53:46] <jmkasunich> if (abs_ferror > joint->ferror_high_mark) {
[01:53:58] <jmkasunich> motion.c line 931 is
[01:54:05] <jmkasunich> joint->ferror_high_mark = 0.0;
[01:56:21] <jepler> yes, it works across files -- it works by keeping track of which addresses have been written with defined values, and propagates "defined" bits around as the program is running
[01:56:42] <jepler> but like I said it's not always right when it comes to emc -- I don't think it understands shared memory blocks and those are pretty central to emc
[01:57:06] <jmkasunich> oh, its a runtime thing...
[01:57:10] <jepler> yes
[02:01:11] <jmkasunich> I'm very surprised it got upset about ferror_high_mark
[02:01:44] <jmkasunich> motion.c and control.c are linked together, running as the same process (in sim, I assume that's what you used for valgrind)
[02:02:01] <jmkasunich> the struct resides in shmem, but both the use and the init are from the same process
[02:03:33] <jmkasunich> oops
[02:04:02] <cradek> oops?
[02:04:37] <jmkasunich> typo
[02:04:46] <jmkasunich> in the code I committed
[02:05:11] <jmkasunich> I'm surprised the compiler didn't warn me
[02:05:18] <jmkasunich> "statement has no effect" or something
[02:08:12] <SWPadnos> heh - perfectly valid C code there
[02:08:27] <jmkasunich> yes, but stupid
[02:08:40] <SWPadnos> yeah
[02:08:45] <jmkasunich> and I swear I've had the compiler warn me about similar stupid but valid code before
[02:09:06] <SWPadnos> in a flurry of compile messages, would you notice?
[02:09:18] <jmkasunich> yes
[02:09:18] <SWPadnos> I'm not sure I would
[02:09:28] <jmkasunich> I have a little script called build
[02:09:41] <SWPadnos> ah - looks for warnings?
[02:09:48] <jmkasunich> that invokes make and make setuid with options that keep it quiet
[02:09:53] <jmkasunich> so warnings show up better
[02:10:03] <SWPadnos> ah
[02:12:25] <cradek> bbl
[02:43:23] <jmkasunich> hmm
[02:43:35] <jmkasunich> it looks like my positions[] thing might have fixed it?
[02:43:52] <jmkasunich> I'm not sure, because I don't know how consistently it fails
[02:44:32] <jmkasunich> but its working now
[02:56:57] <jmkasunich> cradek: when you get home can you test it?
[02:57:50] <cradek> back
[02:57:55] <cradek> I'll go try
[03:07:27] <cradek> yes! I think that fixed it
[03:07:32] <jmkasunich> yay!
[03:08:24] <cradek> even if I give it a starting C value in position.txt, it works
[03:08:33] <jmkasunich> cool
[03:08:37] <cradek> and it gets zeroed when I exit
[03:10:20] <SWPadnos> hmmm. is that a good thing?
[03:10:32] <cradek> yes, it's an undefined joint
[03:10:40] <SWPadnos> zeroing it, I mean
[03:10:54] <jmkasunich> yes, its an undefined joint ;-)
[03:11:01] <cradek> the only case it should be there is if we just removed that joint from the machine config
[03:11:33] <SWPadnos> sorry - I was thinking of the possibility of an integrator (erroneously) pointing multiple configs at the same positions.txt file
[03:11:50] <SWPadnos> one that uses a spindle as a C axis, for example
[03:11:56] <SWPadnos> or whatever
[03:12:17] <SWPadnos> so I guess erroneously could be intentionally :)
[03:13:04] <jmkasunich> if a joint doesn't exist in a machine config, we need to have it be zero throughout EMC, to avoid problems
[03:13:20] <jmkasunich> reusing a positions file for machines with different number of joints is wrong
[03:13:44] <SWPadnos> I agree. I think it's the integrator's problem if they want to use multiple configurations for the same machine
[03:14:15] <SWPadnos> that will be a problem either using multiple files, or zeroing unused axes/joints, so it doesn't really matter
[03:14:40] <cradek> jmkasunich: thanks for finding the problem - I feared it was much worse than this
[03:15:21] <jmkasunich> so did I
[03:15:29] <jmkasunich> I tried testing it just on a whim
[03:16:06] <SWPadnos> is this in response to the problem with the soft limit error persisting across runs, or was it something else?
[03:16:08] <jmkasunich> the thing that I fixed was clearly wrong, but the path from there to the G28 error was by no means clear, so there could have been some other problem
[03:16:17] <jmkasunich> no, this was G28 broken
[03:16:19] <SWPadnos> ok
[03:16:24] <SWPadnos> I missed that part
[03:16:47] <jmkasunich> it was spotted yesterday while cradek was testing the soft limits fix, but has actually been present for a while
[03:17:03] <cradek> I think the g28 problem was GET_EXTERNAL_POSITION reading unzeroed a,b,c out of the emcStatus
[03:17:06] <jmkasunich> maybe a couple days, maybe since february
[03:17:21] <SWPadnos> ah, ok. I think I dimly remember some discussions about G28, which may or may not have been related :)
[03:23:27] <jmkasunich> goodnight all
[03:23:36] <SWPadnos> night
[03:24:46] <cradek> bye
[16:51:36] <petev> jmkasunich, u there?
[18:39:24] <jmkasunich> hi guys
[18:39:34] <SWPadnos> hi jmkasunich
[18:40:04] <jmkasunich> what are you doing inside on such a fine day?
[18:40:12] <SWPadnos> I just got in ;)
[18:40:24] <SWPadnos> I think the wife and I may go out and do some "green-up" stuff
[18:40:28] <jmkasunich> ditto - just finished mowing the weeds
[18:41:31] <jmkasunich> I think there might be a few blades of grass between the ground ivy and the dandelions
[18:41:55] <SWPadnos> survival of the fittest or something ;)
[18:56:21] <alex_joni> hi guys
[18:56:28] <alex_joni> was there something I borked?
[18:57:39] <jmkasunich> not really
[18:58:03] <alex_joni> that sounds like half of it :)
[18:58:06] <jmkasunich> remember when you made MAX_JOINTS a variable, so it wouldn't export all those HAL pins
[18:58:13] <jmkasunich> axis.7.foo, etc
[18:58:14] <alex_joni> yeah..
[18:58:41] <jmkasunich> there was one place where a loop needed to run all the way up to 8, to init all members of an array, not just the ones that had joints defined
[19:02:44] <alex_joni> right, I thought I caught those :/
[19:14:06] <jmkasunich> all but one of them ;-)
[19:14:20] <alex_joni> at least till next time
[19:14:28] <alex_joni> ;-)
[19:16:38] <jmkasunich> I just remembered one other thing from that change you did, that I want to change
[19:16:50] <jmkasunich> EMCMOT_MAX_JOINTS is now a variable, so it shouldn't be uppercase
[19:17:01] <jmkasunich> * jmkasunich fires up the search/replace editor
[19:17:51] <alex_joni> sed?
[19:18:20] <jmkasunich> I'll probably use a regular editor, so I can make sure I don't modify more than I want to
[19:18:40] <alex_joni> mcedit search&replace works great for me
[19:32:36] <jmkasunich> grrr
[19:32:46] <alex_joni> * alex_joni hides quickly
[19:32:50] <jmkasunich> another place where there is joint/axis weirdness
[19:33:07] <jmkasunich> in command.c: EMCMOT_SET_NUM_AXES command
[19:33:16] <jmkasunich> is that axes, or joints?
[19:33:44] <jmkasunich> if joints, it shouldn't actually set anything, since we're now getting num_joints from the insmod command line
[19:34:09] <jmkasunich> it should verify that the number requested from userspace matches the number passed in the insmod parameter
[19:34:14] <jmkasunich> I think
[19:34:25] <alex_joni> it shouldn't match
[19:34:31] <jmkasunich> why not?
[19:34:37] <alex_joni> joints != axes
[19:35:01] <jmkasunich> but is that call actually doing axes, or is it doing joints?
[19:35:58] <alex_joni> it's doing axes
[19:36:24] <jmkasunich> ok, then we shouldn't be checking that number against MAX_JOINTS, we should be checking it against MAX_AXIS
[19:37:16] <alex_joni> darn, I hate having this large lag (about 5 seconds now)
[19:37:48] <alex_joni> it's a really twisted way this comes from the ini to motion
[19:38:19] <alex_joni> I tried to fix it twice, but failed each time.. and I somehow scrapped my changes
[19:38:34] <jmkasunich> well right now I'm doing small steps
[19:38:54] <jmkasunich> I'm changing EMCMOT_MAX_JOINTS (which has been a variable since your change) to num_joints
[19:39:04] <alex_joni> I did so too.. but I came to a shaky ladder, which I needed to replace
[19:39:18] <jmkasunich> there will still be EMCMOT_MAX_JOINTS, which is the constant 8
[19:39:37] <alex_joni> right.. sounds sane
[19:39:37] <jmkasunich> there is also EMCMOT_MAX_AXIS, which right now is 8, but it should probably be 6
[19:39:51] <jmkasunich> can't fix that until we evaluate everywhere it is used
[19:42:02] <alex_joni> EMCMOT_MAX_AXIS is not really meaningful
[19:43:01] <alex_joni> I mean limiting the number of axes is something that doesn't work right..
[19:43:13] <jmkasunich> true
[19:43:27] <jmkasunich> the constant is still needed, because its used to declare some arrays
[19:43:45] <jmkasunich> right now its used to declare some arrays which should really be using EMCMOT_MAX_JOINTS instead
[19:43:52] <alex_joni> if you define XYZ but have a g-code with A in it.. it will simply hang
[19:47:01] <jmkasunich> first step is to separate every usage of EMCMOT_MAX_* into either AXIS or JOINTS
[19:47:23] <alex_joni> I think task should send EMCMOT_MAX_JOINT
[19:47:30] <jmkasunich> send where?
[19:47:42] <jmkasunich> to motmod?
[19:47:50] <alex_joni> yeah
[19:48:01] <alex_joni> with the number of joints coming from the ini
[19:48:05] <jmkasunich> motmod gets the number of joints from the insmod param
[19:48:16] <jmkasunich> because thats the only way it can know the right number of HAL pins to export
[19:48:30] <alex_joni> and the kins (just like jeff's gantrykins) should make the assignments
[19:48:32] <jmkasunich> btw, the number you are talking about is NOT EMCMOT_MAX_JOINTS
[19:48:56] <jmkasunich> EMCMOT_MAX_JOINTS (and every other all caps name) is a #define constant - in this case, 8
[19:49:18] <alex_joni> right.. I meant the EMCMOT command
[19:49:56] <jmkasunich> EMCMOT_SET_NUM_AXIS you mean?
[19:50:04] <jmkasunich> should be changed to EMCMOT_SET_NUM_JOINTS ?
[19:50:56] <jmkasunich> lag lag
[19:51:10] <alex_joni> yeah, that's what I meant.. although I have no idea what good that call does
[19:51:16] <jmkasunich> none at this point
[19:51:25] <jmkasunich> we could rename it EMCMOT_CHECK_NUM_JOINTS
[19:51:26] <alex_joni> or ever..
[19:51:40] <jmkasunich> and it could compare the num_joints from user space to the one passed as an insmod param
[19:51:42] <jmkasunich> just a safety check
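A sketch of what such a safety check could look like. The function name, its parameters, and the return convention are assumptions for illustration; this is not the existing command handler in command.c.

```c
#define SKETCH_MAX_JOINTS 8   /* compile-time upper bound (hypothetical name) */

/* Compare the joint count requested from user space against the count
   the module was loaded with. Returns 0 when they agree, -1 otherwise.
   Nothing is set here; the load-time value stays authoritative. */
int check_num_joints(int requested, int configured)
{
    if (requested <= 0 || requested > SKETCH_MAX_JOINTS) {
        return -1;   /* outside what the arrays can hold */
    }
    if (requested != configured) {
        return -1;   /* ini file and halfile disagree */
    }
    return 0;
}
```

If the halfile takes its joint count from the same ini entry that task reads, the check always passes; it only trips when the two sources drift apart.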
[19:51:49] <alex_joni> yeah, that might be something.. although I envision it coming from the same ini param
[19:52:00] <jmkasunich> if the halfile is done right, it does
[19:52:35] <alex_joni> insmod ... num_joints=[KINS]NUM_JOINTS
[19:52:44] <jmkasunich> yep
[19:53:17] <jmkasunich> actually the insmod param is max_joints right now
[19:53:25] <jmkasunich> which I think is wrong, it should be num_joints
[19:53:43] <jmkasunich> * jmkasunich fixes that too
[19:53:54] <alex_joni> * alex_joni is forgetful..
[19:53:59] <alex_joni> damn lag
[19:54:22] <alex_joni> this really annoys me
[20:02:31] <jmkasunich> ok, changed max_joints to num_joints in a bazillion sample configs
[20:02:39] <jmkasunich> now back to the source
[20:05:55] <alex_joni> * alex_joni sighs
[20:06:17] <alex_joni> think I'll head to bed.. I'll be around tomorrow
[20:06:23] <jmkasunich> ok
[20:06:26] <alex_joni> (from home)
[20:06:40] <alex_joni> good night all
[20:15:16] <jmkasunich> I don't think we'll ever really solve the axis-joint mess until we give joints names
[20:15:42] <jmkasunich> strings, like "knee", "saddle", "table", "quill", "cross-slide", "shoulder", "elbow", etc
[20:15:53] <jmkasunich> (the last two are for puma robots...)
[20:16:02] <SWPadnos> it's too bad motmod needs information back from kins - it would be so nice to have motion just output XYZABC, and have kins HAL modules that take those in, and output N joints, dependent on the kins
[20:16:36] <jmkasunich> yeah, free mode makes a mess of that
[20:16:52] <SWPadnos> I think free mode would just be in kins at that point
[20:17:01] <SWPadnos> since it's just joints (right?)
[20:17:08] <jmkasunich> maybe
[20:17:32] <SWPadnos> but the problem is that motion needs to know if it's allowed to do the next pose, which needs feedback from kins ...
[20:17:35] <jmkasunich> but EMCMOT commands like JOG_INCR would somehow have to get passed out of motmod then
[20:18:24] <SWPadnos> hmmm
[20:18:28] <SWPadnos> kins has two tasks:
[20:18:38] <SWPadnos> 1) to translate cartesian space to joint space
[20:18:59] <SWPadnos> 2) to tell the TP (in motion) what it can/can't do, based on the machine configuration
[20:19:23] <SWPadnos> number 2 requires back and forth function calls between motion and kins (like now)
[20:19:25] <jmkasunich> I'm not so sure it handles 2 very well
[20:19:50] <SWPadnos> true, 1 is handled this way, and 2 is poorly handled the same way (now)
[20:19:54] <jmkasunich> also, 1 gets split: 1a) transform from cartesian to joint 1b) transform from joint to cartesian
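For the simplest possible case, the 1a/1b split can be sketched as a pair of functions for a machine where each joint maps straight onto one Cartesian coordinate. The struct and function names are hypothetical and do not reproduce the real EMC kinematics interface or its pose type.

```c
/* Hypothetical 6-coordinate Cartesian pose. */
typedef struct {
    double x, y, z, a, b, c;
} pose_sketch_t;

/* 1b) joint space -> Cartesian space (forward kinematics). */
void trivial_forward(const double joints[6], pose_sketch_t *pose)
{
    pose->x = joints[0];
    pose->y = joints[1];
    pose->z = joints[2];
    pose->a = joints[3];
    pose->b = joints[4];
    pose->c = joints[5];
}

/* 1a) Cartesian space -> joint space (inverse kinematics). */
void trivial_inverse(const pose_sketch_t *pose, double joints[6])
{
    joints[0] = pose->x;
    joints[1] = pose->y;
    joints[2] = pose->z;
    joints[3] = pose->a;
    joints[4] = pose->b;
    joints[5] = pose->c;
}
```

Because the pose always has six coordinates, a machine with fewer joints has to supply defined values (zeros, in this sketch's intended use) for the unused slots before calling `trivial_forward()`.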
[20:20:14] <SWPadnos> "this way" means by effectively linking the machine <-> world coordinate transforms into motion
[20:20:19] <SWPadnos> sire
[20:20:20] <SWPadnos> sure
[20:20:50] <SWPadnos> but, once motion knows what it can and can't do, there's absolutely no reason why it has to output the joint parameters
[20:21:03] <SWPadnos> err - joint commands (on HAL pins)
[20:21:14] <jmkasunich> what about free mode?
[20:21:44] <SWPadnos> I guess it all depends on how we can get control commands down from userspace into kins
[20:22:02] <SWPadnos> right now, that's a pretty well-defined thing - it all goes through motion
[20:22:09] <jmkasunich> EMCMOT_SET_ARC is a command in coord mode, that requires a move in cartesian space, transformed to joint and output.... EMCMOT_JOG_CONT is a command in free mode that requires a move in joint space
[20:22:32] <jmkasunich> both are passed thru the same shmem struct, and handled by the same big switch statement in command.c
[20:22:38] <jmkasunich> and I think that is appropriate
[20:23:32] <SWPadnos> well, it's possible to have kins export a "num_joints" HAL pin, and motion to have a "jog_joint_num" and "jog_command" pin
[20:23:40] <SWPadnos> or something like that
[20:23:43] <jmkasunich> eww
[20:23:49] <SWPadnos> yeah - not pretty
[20:24:02] <jmkasunich> I'm a hal proponent, but not for things like that
[20:24:05] <SWPadnos> but I haven't thought through the whole thing yet (I'm not sure I can, to be honest ;) )
[20:24:37] <jmkasunich> a big part of the problem is simply what I was getting at when I said we need to name the joints
[20:24:44] <SWPadnos> remember, we have both HAL connections *and* "load-time linking"
[20:25:06] <SWPadnos> right - I think part of the problem at the lower level is that it's all FUBAR'ed at the high level already
[20:25:08] <jmkasunich> too much of the code was written by people who weren't making the axis-joint distinction
[20:25:28] <jmkasunich> that runs thru all levels - GUIs, task, interp, and motion
[20:25:50] <SWPadnos> there can be a function in kins called "jog_axis(int joint_num, float displacement)"
[20:26:05] <jmkasunich> especially GUIs... GUI writers who had trivkins on the brain wanted to spare users from knowing the joints/axes distinction
[20:26:24] <SWPadnos> right - so that system has to be fiddled with to get a "good" solution. otherwise, it's all a patch (which may still be a better solution than now)
[20:26:31] <jmkasunich> jog_axis? wtf? thats part of the problem!
[20:26:46] <SWPadnos> err - sorry, jog_joint ;)
[20:27:24] <SWPadnos> hmmm. somewhat theoretical question for you:
[20:28:42] <SWPadnos> if kins can tell us how far it can move and how much it can accel along each world axis in one servo/TP cycle, do you think it's valid to do a cartesian vector sum to get "true" limits at that point?
[20:28:55] <SWPadnos> in case that wasn't clear, here's an example
[20:29:55] <SWPadnos> at the current position, our oddkins module calculates the max deltaV in +/-X, +/-Y, +/-Z ... that it can attain
[20:30:10] <SWPadnos> (similar for max vel ...)
[20:30:44] <SWPadnos> do you think it's valid to do a vector sum on those to get the equivalent of TRAJ_MAX_{ACCEL,VEL} at that point?
[20:30:53] <jmkasunich> no clue
[20:30:57] <SWPadnos> heh - thanks :)
[20:31:40] <jmkasunich> actually, if traj_max_accel (or vel) is a single number, it should be based on the lowest of the individuals I would think
[20:32:23] <SWPadnos> well, what should happen is that the cartesian motion should be limited by whatever is the most limiting joint, or the TRAJ limit, in that order
[20:32:36] <SWPadnos> (and of course the programmed rate as well ...)
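The "most limiting joint, then the TRAJ limit, then the programmed rate" ordering can be sketched as follows. This assumes trivial kinematics (axis i maps straight to joint i), which sidesteps the open question about odd kinematics; the function name and arguments are hypothetical and not taken from the EMC source.

```python
import math


def max_path_speed(direction, joint_max_vel, traj_max_vel, programmed_rate):
    """Largest speed along `direction` that keeps every joint within its limit.

    Assumes trivial kinematics: component i of the direction is the
    fraction of path speed that joint i has to deliver.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]

    # Start from the trajectory-wide and programmed limits.
    limit = min(traj_max_vel, programmed_rate)
    # Each moving joint caps the path speed at its own limit / its share.
    for share, jmax in zip(unit, joint_max_vel):
        if share != 0.0:
            limit = min(limit, jmax / abs(share))
    return limit


# Move along (3, 4, 0): joint 1 carries 80% of the speed and is the slowest.
print(max_path_speed((3.0, 4.0, 0.0), [10.0, 4.0, 5.0], 8.0, 6.0))
```

Note that this is a per-joint minimum, not a vector sum: the result never exceeds what the weakest participating joint can deliver along that direction.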
[20:55:21] <jmkasunich> halui is FULL of places where "num_axis" should be "num_joints"
[20:55:28] <jmkasunich> what a mess
[21:00:49] <jmkasunich> I should figure out what remains to be done to get the new puter "ready"
[21:01:13] <SWPadnos> hmmm. maybe I should put a little more time into RT+SMP+A64
[21:01:25] <jmkasunich> before I start giving it server-ish duties that don't like reboots
[21:01:44] <SWPadnos> I had the kernel running fine, but couldn't compile RTAI due to module versioning being on in the kernel
[21:01:56] <jmkasunich> I'd like to have the temp and fan sensors working, and experiment a bit with OC'ing
[21:02:06] <SWPadnos> the kernel rebuilt without modversions wouldn't boot
[21:02:12] <jmkasunich> ick
[21:02:15] <SWPadnos> yeah
[21:02:37] <SWPadnos> I'm pretty sure I didn't change anything else, but then again, I may have done anything wrong - modules, modules install, initrd ...
[21:02:46] <jmkasunich> I'm not even gonna attempt RTAI on the new box (actually I should try Chris's experimental one long enough to run a latency test)
[21:03:07] <SWPadnos> if it boots for you - it didn't for me (though that could be due to the number of CPUs)
[21:03:22] <jmkasunich> I have a single dual core, should be OK
[21:03:29] <jmkasunich> I guess thats a worthy experiment
[21:03:44] <jmkasunich> if I could remember how to get the experimental kernel
[21:05:00] <SWPadnos> heh
[21:05:10] <SWPadnos> http://www.linuxcnc.org/experimental/ ?
[21:05:18] <SWPadnos> yep
[21:06:27] <jmk-solo> copy to a handy directory and then run some command that Chris gave me 2 weeks ago when I tested for him
[21:06:33] <jmk-solo> (and I promptly forgot)
[21:07:01] <SWPadnos> download anywhere, then apt-get install *.deb (I think)
[21:07:15] <SWPadnos> in that dir, of course
[21:07:28] <jmk-solo> downloading now
[21:10:20] <jmk-solo> I need better hostnames
[21:10:27] <SWPadnos> heh
[21:10:40] <jmk-solo> naming the box after the case its in... lame
[21:11:27] <jmk-solo> could name them after our cats and dogs... we've had enough over the years
[21:11:40] <jmk-solo> only one of each at the moment
[21:11:53] <SWPadnos> use hillbilly names
[21:11:57] <SWPadnos> just for fun
[21:12:15] <jmk-solo> cletus
[21:12:18] <jmk-solo> jethro
[21:12:39] <jmk-solo> sallymae
[21:12:41] <SWPadnos> then there's maw, paw, grandaddy
[21:12:43] <SWPadnos> ...
[21:14:25] <jmk-solo> I could use able, baker, charlie, etc
[21:14:34] <jmk-solo> kinda boring, but easy
[21:15:23] <SWPadnos> true
[21:15:40] <SWPadnos> at least as good as mine: "main", "Opteron", "laptop" ...
[21:15:59] <SWPadnos> I actually have a couple of real names though - "Thoth" (company computer) and "multimedia"
[21:16:13] <jmk-solo> now that I have a (pseudo) domain jmkasunich.dyndns.org, I wonder if I should be giving each machine full host/domain names?
[21:16:42] <SWPadnos> I'd ask a security guru, but I suspect the answer is no
[21:17:01] <jmk-solo> the router will send all incoming traffic (ssh, http) to one machine anyway, whichever one is designated as the server
[21:17:44] <SWPadnos> if the names aren't reachable from outside, they're probably not necessary
[21:17:50] <jmk-solo> yeah
[21:18:03] <jmk-solo> I do need to figure out how to do proper dns inside
[21:18:18] <jmk-solo> if I want to log into one machine from another right now, I need to use the IP
[21:18:37] <jmk-solo> one thing at a time tho
[21:18:45] <jmk-solo> lets try that experimental kernel
[21:19:26] <jmk-solo> apt-get install isn't it
[21:19:33] <jmk-solo> I think it was a dpkg invocation actually
[21:19:39] <jmk-solo> * jmk-solo reads man page
[21:20:11] <SWPadnos> since you'll have different IPs for internal vs. external traffic, the best bet will likely be to manually populate a HOSTS file on each machine, with the machine name and local IP in it
[21:20:16] <SWPadnos> machine names, that is
[21:20:28] <jmk-solo> as long as local IPs remain the same
[21:20:41] <jmk-solo> which is a matter of getting the linksys router configured properly
[21:21:07] <jmk-solo> I've thought about getting a wireless router, since my work lappy has wireless
[21:21:14] <SWPadnos> ah - it'll try to do that, but will likely get screwed up if you do a lot of plugging in of unknown computers (while the normal ones are off)
[21:21:16] <jmk-solo> its a pain to use it at home otherwise
[21:23:41] <jmk-solo> dpkg -i *.deb seems to be the right command
[21:24:36] <SWPadnos> ok, that'll work
[21:24:45] <SWPadnos> I don't remember - it's been nearly a week since I did it ;)
[21:24:46] <jmk-solo> time to try it
[21:32:52] <SWPadnos> well, either it worked, or ... it didn't ;)
[21:33:17] <SWPadnos> with that lag, I'd say "didn't" (unless you need some other kernel modules or video drivers)
[21:34:39] <jmk-solo> it worked, but I had to switch from the nvidia driver to nv
[21:34:54] <jmk-solo> just started latency, 16uS so far, no "abuse" yet
[21:37:51] <SWPadnos> hmmm.
[21:38:15] <SWPadnos> too bad there's no source package :)
[21:38:37] <SWPadnos> I could try rebuilding after changing NR_CPUS to 4 instead of 2
[21:39:04] <SWPadnos> hmmm, though I have all 64-bit stuff on that machine, so I probably need to stick with the 64-bit kernel or face many other issues
[21:39:36] <jmk-solo> so far only 16.97uS latency
[21:39:53] <jmk-solo> I've built emc, done 'find /', dragged windows around
[21:40:12] <SWPadnos> run glxgears -printfps in a terminal, then do all that other stuff too
[21:40:28] <jmk-solo> jmkasunich@solo:/usr$ glxgears
[21:40:29] <jmk-solo> Xlib: extension "GLX" missing on display ":0.0".
[21:40:29] <jmk-solo> Error: couldn't get an RGB, Double-buffered visual
[21:40:29] <jmk-solo> jmkasunich@solo:/usr$
[21:40:32] <jmk-solo> driver issue maybe?
[21:40:50] <SWPadnos> hmm. maybe the mesa 3d libraries aren't being loaded
[21:40:53] <SWPadnos> strange
[21:40:57] <jmk-solo> I've run glxgears here before, with different kernels
[21:41:08] <SWPadnos> sure, I'm sure it worked on nvidia
[21:41:31] <SWPadnos> actually, if you reboot into the other kernel, I'd be curious to know the scores you get from glxgears
[21:41:46] <jmk-solo> the stock ubuntu SMP kernel?
[21:42:07] <SWPadnos> you can recompile the nvidia module by the way - download from nvidia and the installer will compile if necessary
[21:42:08] <SWPadnos> yes
[21:42:26] <jmk-solo> I had a heck of a fight trying to get the driver to load in the first place
[21:42:29] <SWPadnos> the laptop has a 7800Go / 7950Go or something like that, right?
[21:42:53] <jmk-solo> the work lappy? dunno
[21:43:06] <SWPadnos> oh right - solo is the new machine, not the laptop ...
[21:43:07] <jmk-solo> this box has a 7100GS
[21:43:28] <jmk-solo> the only linux on the lappy is a VM
[21:43:49] <SWPadnos> right - nevermind :)
[21:46:05] <jmk-solo> I just noticed something interesting.... when a compile is running, 'lat max' tends to be small, between 300 and 3000nS, often below 1000
[21:46:27] <jmk-solo> when the compile isn't running, its between 15000 and 16000
[21:47:07] <SWPadnos> interesting
[21:47:16] <jmk-solo> I wonder if when not running it goes to the idle state, and that halts the CPU or does something else that hurts latency?
[21:47:31] <jmk-solo> when compiling, it never goes idle, and could respond to interrupts right away
[21:48:34] <SWPadnos> the RT kernel shouldn't idle
[21:48:50] <SWPadnos> or at least it shouldn't halt or reduce frequency
[21:49:05] <jmk-solo> I got warnings when I booted, that it can't change frequency
[21:49:21] <jmk-solo> (I have cpu freq meters in the panel, I think the warning came from them)
[21:49:27] <SWPadnos> right
[21:49:27] <jmk-solo> they are showing full freq all the time
[21:49:56] <jmk-solo> but there is no denying the latency results, both max and avg latency drop a lot when the machine is busy
[21:50:35] <jmk-solo> RTH| lat min| ovl min| lat avg| lat max| ovl max| overruns
[21:50:35] <jmk-solo> RTD| -1424| -1608| 2614| 15910| 16970| 0
[21:50:35] <jmk-solo> RTD| -1428| -1608| 2622| 15922| 16970| 0
[21:50:35] <jmk-solo> RTD| -1422| -1608| 2532| 15661| 16970| 0
[21:50:35] <jmk-solo> RTD| -1427| -1608| 2615| 15824| 16970| 0
[21:50:35] <jmk-solo> RTD| -1421| -1608| 2610| 15884| 16970| 0
[21:50:37] <jmk-solo> RTD| -1591| -1608| -925| 15704| 16970| 0
[21:50:40] <jmk-solo> RTD| -1594| -1608| -1003| 1380| 16970| 0
[21:50:41] <jmk-solo> RTD| -1584| -1608| -1096| 311| 16970| 0
[21:50:44] <jmk-solo> RTD| -1598| -1608| -953| 892| 16970| 0
[21:50:46] <jmk-solo> RTD| -1569| -1608| -997| 745| 16970| 0
[21:50:48] <jmk-solo> RTD| -1582| -1608| -1021| 520| 16970| 0
[21:50:49] <jmk-solo> RTD| -1487| -1608| -1067| 904| 16970| 0
[21:50:51] <jmk-solo> RTD| -1601| -1608| -1019| 685| 16970| 0
[21:50:54] <jmk-solo> see that big drop?
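The size of the drop in the pasted output can be checked with a small parser. This is a sketch under stated assumptions: the rows are copied from the paste above, the column order follows the RTH header line, and the 10000 ns threshold used to separate the two groups is an arbitrary choice for this example.

```python
LATENCY_LOG = """\
RTD| -1424| -1608| 2614| 15910| 16970| 0
RTD| -1428| -1608| 2622| 15922| 16970| 0
RTD| -1422| -1608| 2532| 15661| 16970| 0
RTD| -1427| -1608| 2615| 15824| 16970| 0
RTD| -1421| -1608| 2610| 15884| 16970| 0
RTD| -1591| -1608| -925| 15704| 16970| 0
RTD| -1594| -1608| -1003| 1380| 16970| 0
RTD| -1584| -1608| -1096| 311| 16970| 0
RTD| -1598| -1608| -953| 892| 16970| 0
RTD| -1569| -1608| -997| 745| 16970| 0
RTD| -1582| -1608| -1021| 520| 16970| 0
RTD| -1487| -1608| -1067| 904| 16970| 0
RTD| -1601| -1608| -1019| 685| 16970| 0
"""


def parse_lat_max(log):
    """Return the 'lat max' column (ns) from every RTD data row."""
    values = []
    for line in log.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if fields[0] != "RTD":
            continue  # skip headers or anything that is not a data row
        # Columns: RTD, lat min, ovl min, lat avg, lat max, ovl max, overruns
        values.append(int(fields[4]))
    return values


def split_by_threshold(values, threshold_ns=10_000):
    """Split readings into (high, low) groups around an assumed threshold."""
    high = [v for v in values if v >= threshold_ns]
    low = [v for v in values if v < threshold_ns]
    return high, low


high, low = split_by_threshold(parse_lat_max(LATENCY_LOG))
print(len(high), sum(high) / len(high))
print(len(low), sum(low) / len(low))
```

Note that 'ovl max' stays at 16970 throughout, since it is the overall maximum for the run; only the per-interval 'lat max' shows the change.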
[21:52:41] <SWPadnos> yep
[21:53:49] <SWPadnos> I wonder if it's something weird like the cache not being filled with as many different things, because there's a CPU hog running (ie, only one process gets a chance to run instead of many)
[21:54:15] <jmk-solo> I have doubts about that
[21:54:54] <jmk-solo> when the machine is idle, the only thing that can mess with cache are various background things, and they won't stop running during a compile
[21:55:24] <SWPadnos> but if they're low priority, then they wouldn't be at the top of the run queue if another high priority task needs the CPU
[21:55:40] <SWPadnos> s/high/higher/
[21:56:12] <jmk-solo> hmm
[21:56:31] <jmk-solo> running "while true ; do echo "foo" >/dev/null ; done"
[21:56:57] <jmk-solo> took the %idle down to 49%, and had the same effect on latency
[21:57:13] <jmk-solo> 90% of the readings are less than 1000nS
[21:57:19] <SWPadnos> you can test it - make a task that just loops and prints out a number every 1000000 loops, set it to a low priority, and see how fast the numbers come up with and without a compile running
[21:57:27] <SWPadnos> interesting
[21:57:47] <cradek> SWPadnos: I'm building a matching linux-source package to put with those now
[21:57:51] <SWPadnos> then renice it for a higher priority, and see what happens :)
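The loop-counting test suggested above could look roughly like this. It is a sketch, not anything run in the log: `spin` is a hypothetical helper, it reports a loop rate over a fixed time window rather than printing forever, and it lowers its own priority with `os.nice` (Unix only) instead of an external renice.

```python
import os
import time


def spin(duration_s, report_every=1_000_000):
    """Busy-loop for duration_s seconds; return (loops, reports printed)."""
    deadline = time.monotonic() + duration_s
    loops = 0
    reports = 0
    while time.monotonic() < deadline:
        loops += 1
        if loops % report_every == 0:
            reports += 1
            print(loops)
    return loops, reports


# Drop to the lowest priority where the platform supports it.
if hasattr(os, "nice"):
    os.nice(19)

window_s = 0.2
loops, _ = spin(window_s)
print(f"{loops / window_s:.0f} loops/s at low priority")
```

Comparing the loops/s figure with and without a compile running shows how much CPU time the low-priority task actually receives.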
[21:58:04] <SWPadnos> cradek, great - I'll see if I can use it - I do have he 64-bit ABI problem though
[21:58:09] <SWPadnos> s/he/the/
[21:58:16] <jmk-solo> two of those bash loops, %idle is now zero, and latency is even better
[21:59:16] <SWPadnos> this is interesting - I had done latency tests way back when on my celeron (touchscreen) machine, and they were very good - like 6500 max
[21:59:44] <SWPadnos> (that was a BDI) when I installed Ubuntu/EMC2, the latency is now in the 11000-12000 range
[22:00:06] <SWPadnos> it's a 500 MHz machine, so the cycles vs. nanoseconds thing would tend to skew the count the other way
[22:00:27] <jmk-solo> reniced both bashes to 19 and latency is still great
[22:00:30] <SWPadnos> the system probably has a lot more crap running in the background
[22:00:56] <cradek> maybe rtai is worse now, or the kernel is
[22:00:59] <SWPadnos> (useless crap on that box - it doesn't even have USB)
[22:01:05] <SWPadnos> it's possible, but I hope not
[22:01:08] <cradek> crap processes shouldn't hurt anything should they?
[22:01:23] <SWPadnos> hmmm - actually, the RTAI tests still print in ns, so it's only 2x worse
[22:01:36] <jmk-solo> cradek: any process can dirty cache
[22:01:38] <SWPadnos> if they evict the RT tasks from cache, they'll most certainly hurt latency
[22:02:04] <SWPadnos> jmk and I did some tests way back when, on the stepgen make_pulses code
[22:02:06] <cradek> ah ok
[22:02:30] <SWPadnos> when we took the period way down (faster), the Tmax went way down
[22:02:31] <jmk-solo> can somebody else try a latency test (SMP or uniprocessor), and then while its running do "while true ; do echo "foo" >/dev/null ; done" and see if that changes the latency?
[22:03:21] <SWPadnos> interesting - the first one may have been bound to CPU 0, which would give you the just-under-50% CPU usage
[22:03:28] <SWPadnos> and most of the cache help
[22:03:58] <cradek> if you really want good latency you can isolate a cpu for rtai with a boot param
[22:04:55] <jmk-solo> it just blows my mind that loading the CPU with a nice 19 process can improve latency
[22:05:24] <SWPadnos> it could be as simple as the kernel looking at more processes to decide on which one should run
[22:05:36] <SWPadnos> (when the obvious one isn't there)
[22:07:06] <jmk-solo> I restarted the latency test... with one nice 19 endless loop running, my ovl max so far is 3924nS
[22:08:45] <SWPadnos> argh. stupid missing /dev/rtf3
[22:11:09] <SWPadnos> ok, it doesn't seem to help on my kiosk PC
[22:11:09] <jmkasunich> doesn't seem to have a beneficial effect on this machine
[22:11:26] <SWPadnos> I wonder if it's a fluke of the SMP kernel
[22:11:36] <jmkasunich> average went up by a few hundred ns I think, max remained pretty similar
[22:11:55] <jmkasunich> maybe
[22:12:32] <jmkasunich> I wonder if having a process running full blast on one cpu tends to keep other processes on the other CPU, instead of bopping back and forth
[22:14:04] <jmk-solo> I am pleasantly surprised with this machine anyway. I was expecting newer = worse latency, but even without the "keep it busy" trick, I topped out at 16970, the sempron is close to 30000
[22:14:55] <cradek> jmk-solo: try booting with isolcpus
[22:14:55] <SWPadnos> this chip has 2M per core of cache, right? (or was it 4M)
[22:15:06] <jmk-solo> 4M shared level 2 cache
[22:15:07] <cradek> SWPadnos: that source package is now uploaded
[22:15:31] <SWPadnos> ok, thanks
[22:24:09] <cradek> Finally it is up to you to exploit such a feature by assigning all of your
[22:24:09] <cradek> tasks to the isolated CPUs, according to your needs, by using:
[22:24:09] <cradek> "rt_task_init_cpuid", "rt_thread_init_cpuid", "rt_task_init_schmod".
[22:24:26] <cradek> hmm looks like an emc change might be needed to do that
[22:24:38] <SWPadnos> I think all RTAPI tasks are explicitly bound to CPU 0
[22:24:48] <SWPadnos> which is fine in the uniprocessor case
[22:24:51] <cradek> oh ok
[22:25:06] <cradek> so if isolcpus=0 boots, it might be what we would want
[22:25:18] <cradek> I'll go try it
[22:25:21] <SWPadnos> I'm relatively sure jmk said that during a discussion about SMP a long time ago
[22:26:23] <jmkasunich> yeah, I think rtapi puts all RT tasks on cpu 0
[22:27:43] <cradek> it boots...
[22:29:05] <cradek> doesn't work though - top shows stuff on both cpus still.
[22:33:44] <SWPadnos> you know - 149M of updates really take a long time to install on a celeron 500 computer
[22:34:20] <jmk-solo> they really should put out new isos for the LTS releases every 6 months or so
[22:34:45] <cradek> that would be nice.
[22:35:14] <jmk-solo> that doesn't help folks who are installed and trying to keep up to date, but prevents the "install, then dl a crapload of updates" thing for new installs
[22:35:27] <SWPadnos> they should figure out a better way of updating than having every computer download the new packages from the net (like an update CD or a very easy to set up local update server)
[22:35:44] <jmk-solo> if you are installed and trying to stay up to date, its not so painful - a package today, 3 tomorrow, 1 a couple days later, etc
[22:35:57] <SWPadnos> assuming the machine is always/often connected to the net ...
[22:36:14] <jmk-solo> if its not its crippled
[22:36:32] <SWPadnos> a machine control has no real need to be on the net, except for software updates ...
[22:36:57] <jmk-solo> true
[22:37:24] <jmk-solo> but I find it very convenient to be able to just search for packages anytime I need something
[22:37:36] <SWPadnos> me too at the moment
[22:37:38] <cradek> a machine control has no need for 99% of the ubuntu updates anyway
[22:37:43] <jmk-solo> dinnertime, bbl
[22:37:49] <SWPadnos> when I get a computer connected to the BP, I may rethink that though
[22:37:52] <SWPadnos> see you
[22:38:07] <jmk-solo> a machine control has no need for 99% of the ubuntu _packages_
[22:38:24] <jmk-solo> but if you got them, its good to keep them updated
[22:38:40] <cradek> hmm, I don't remember the sun setting at 5:30 pm
[22:38:47] <SWPadnos> heh
[22:38:53] <SWPadnos> it's 6:30, silly
[22:38:54] <cradek> nor it being green
[22:39:04] <SWPadnos> tornadoes on the way
[22:39:23] <SWPadnos> or at least bad weather
[22:39:48] <cradek> yep
[22:41:45] <cradek> wow, 3000 nsec latency
[22:41:57] <SWPadnos> got the isolated CPU working?
[22:42:15] <cradek> yes it seems like it can only isolate cpus > #0
[22:42:24] <cradek> so we'd have to tweak emc to use it
[22:42:26] <SWPadnos> heh - that makes sense, actually
[22:42:51] <cradek> well the docs don't say it
[22:42:57] <cradek> but even when I say isolate 0, it isolates 1
[22:43:43] <SWPadnos> no, but since CPU 0 is in many ways the "master" (and the one connected to interrupts), it makes sense that you can't isolate it - the rest of the system must have access to its resources
[22:44:09] <cradek> 3526
[22:45:08] <cradek> with find / and fullscreen GL running
[22:45:23] <cradek> 4261
[22:45:36] <SWPadnos> gah - I've got to change the power supply in that kiosk PC - it's almost as loud as your HP
[22:45:58] <cradek> computers are supposed to be loud. how else do you know they're working?
[22:46:06] <cradek> man I need to go find a different keyboard
[22:46:19] <SWPadnos> sometimes I try to use the monitor to tell me if the computer is working
[22:46:23] <SWPadnos> oh, and the keyboard
[22:46:28] <SWPadnos> :)
[22:46:38] <cradek> amazingly, the machine I'll take to fest this time is even bigger
[22:46:52] <SWPadnos> impossible!
[22:47:06] <cradek> nope
[22:47:10] <SWPadnos> I could bring in my old KS-011B-based system, but it doesn't work, and it's heavy
[22:47:19] <cradek> hmm we should make sure at least one of us brings a CD burner
[22:47:28] <SWPadnos> I'll have one or more
[22:47:40] <cradek> ok I won't mess with it then - I would have to go buy one
[22:48:04] <SWPadnos> ah
[22:48:09] <cradek> 5189
[22:48:16] <cradek> keeps going up - wonder where it will stop
[22:48:21] <SWPadnos> I have a couple of CD-RW drives and a DVD+everything recorder as well
[22:48:27] <cradek> but I guess that's still really good isn't it
[22:48:38] <SWPadnos> it's pretty darned good, yes
[22:52:46] <cradek> looks like it's now stuck there
[22:53:07] <cradek> wow here comes the weather
[22:53:21] <SWPadnos> woosh
[22:54:24] <cradek> http://www.weather.com/maps/maptype/dopplerradarusnational/centraldopplerradar1800_large.html
[22:54:27] <cradek> hey look, red
[22:54:57] <SWPadnos> I guess they're predicting weather for your area
[22:55:04] <cradek> yes I'd say that prediction is right
[22:56:20] <SWPadnos> ok - off to find a DC supply for that kiosk PC (got a picoPSU for it, which may actually allow me to install a CD-ROM in the space formerly occupied by the power supply)
[22:56:40] <SWPadnos> oh wait, it isn't 5.25" deep - damn
[22:56:40] <cradek> I think I found the 0 to change to a 1 in rtapi... this will be interesting
[22:56:55] <SWPadnos> I'd be sure you have >1 CPU first ...
[22:57:00] <SWPadnos> (in code, if possible)
[22:57:10] <cradek> oh it's not like I'm going to check it in
[22:57:14] <SWPadnos> heh
[22:57:19] <SWPadnos> it would be great though
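The "check that there is more than one CPU first" guard might be sketched like this. It is an illustration only: `pick_rt_cpu` is a hypothetical helper, not rtapi code, and using CPU 1 as the preferred realtime CPU simply mirrors the one-number change described above.

```python
import os


def pick_rt_cpu(num_cpus, preferred=1):
    """Return the CPU index for realtime tasks.

    Falls back to CPU 0 when the preferred CPU does not exist
    (uniprocessor machine, or CPU count unknown).
    """
    if num_cpus is None or num_cpus < 1:
        return 0
    return preferred if preferred < num_cpus else 0


# os.cpu_count() may return None if the count cannot be determined.
print(pick_rt_cpu(os.cpu_count()))
```

On a uniprocessor box this degrades to the existing behaviour of putting everything on CPU 0.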
[22:58:05] <cradek> http://pastebin.ca/472949
[22:58:07] <cradek> ^^ final results
[22:59:34] <cradek> wow, I think it actually sounds smoother
[23:04:43] <cradek> http://timeguy.com/cradek-files/emc/periods.png
[23:05:19] <cradek> 1us/div
[23:12:27] <cradek> 10usec base period runs without noticeable slowdown on the gui
[23:13:17] <jepler> cradek: ooh you tried out AC offset
[23:13:21] <jepler> how d'you like it?
[23:13:23] <cradek> yeah it's very nice
[23:13:33] <cradek> that plot would have been a huge pain otherwise
[23:13:47] <cradek> holy crap, weather is happening
[23:13:53] <jepler> yeah I hear it too
[23:16:59] <cradek> wow, 4kHz servo+traj cycle
[23:20:03] <SWPadnos> how low a BASE_PERIOD can you get?
[23:20:13] <cradek> I didn't try any lower than 10usec
[23:20:43] <SWPadnos> that's relatively low ;)
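For scale, the periods and rates quoted above convert as follows. The conversions are plain arithmetic; treating the 5189 ns figure reported earlier as a representative worst-case latency for this configuration is an assumption made for the example.

```python
NS_PER_S = 1_000_000_000


def period_ns_to_hz(period_ns):
    """Convert a thread period in nanoseconds to a rate in Hz."""
    return NS_PER_S / period_ns


def hz_to_period_ns(freq_hz):
    """Convert a rate in Hz to a period in nanoseconds."""
    return NS_PER_S / freq_hz


base_period_ns = 10_000   # the 10 usec base period
servo_hz = 4_000          # the 4 kHz servo+traj cycle
latency_ns = 5_189        # assumed worst case, taken from the earlier readings

print(period_ns_to_hz(base_period_ns))                    # base thread rate, Hz
print(hz_to_period_ns(servo_hz))                          # servo period, ns
print(hz_to_period_ns(servo_hz) / base_period_ns)         # base periods per servo period
print(latency_ns / base_period_ns)                        # latency as a fraction of base period
```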
[23:41:58] <jmk-solo> impressive cradek
[23:43:02] <cradek> jmk-solo: looks like smp would be involved for getting the highest performance from emc
[23:43:44] <jmk-solo> should probably figure out how to do smp well in rtapi
[23:43:52] <jmk-solo> (like putting things on CPU 1)
[23:44:19] <cradek> yeah, I changed that one number
[23:45:48] <cradek> the latency test must figure out which processor to run on, because the numbers got a lot better when I had one isolated - maybe you should look at the source for that
[23:46:08] <jmk-solo> not me boss
[23:46:15] <jmk-solo> I have enough irons in the fire
[23:46:24] <cradek> ok
[23:46:34] <cradek> sorry, I read "should probably" as "I should probably"
[23:46:43] <jmk-solo> "we"
[23:46:47] <cradek> when you meant "I wish someone else would" :-)
[23:47:04] <petev> is emc applying the soft limits to angular axes? If so, what happens if you allow more than one rotation?
[23:47:17] <jmk-solo> is the source for the latency test installed by default, or do I have to go hunting for it
[23:47:37] <cradek> kernel and rtai source are separate - you'd have to hunt a bit
[23:47:59] <jmk-solo> drn
[23:48:01] <jmk-solo> darn
[23:48:06] <cradek> chris@emc:~$ apt-get source rtai-modules-2.6.15-magma
[23:48:17] <jmk-solo> that includes the latency test?
[23:48:43] <cradek> rtai-3.3/testsuite/kern/latency/latency-module.c
[23:52:26] <petev> jmk-solo, I had some success last night
[23:52:38] <jmkasunich> thats good to hear
[23:52:43] <petev> I started looking closely at the PID output during dynamic moves
[23:52:50] <petev> and it was clear something was really wrong
[23:53:10] <petev> it looked like a chopper output, saturating from rail to rail every other few cycles
[23:53:33] <petev> I found a low pass filter in the drives that was buried a few layers down under tuning
[23:53:48] <petev> it was set to 250Hz on the drive I was trying to tune
[23:53:56] <jmkasunich> yuck
[23:54:05] <petev> I think the P and D were way too high to compensate for this
[23:54:15] <petev> and that's why the steady state was so unstable
[23:54:34] <petev> it was set to other values on other drives, which is why they were behaving different
[23:54:52] <petev> the PID now tunes more like what I would expect, and the numbers are much smaller
[23:55:02] <jmkasunich> I told ya... lags inside a loop suck
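A rough feel for how much lag such a filter adds: assuming (the log does not say) that the drive's filter behaves like a first-order low-pass with its corner at 250 Hz, the phase lag at frequency f is atan(f / fc). The function name is hypothetical.

```python
import math


def first_order_phase_lag_deg(freq_hz, corner_hz):
    """Phase lag in degrees of a first-order low-pass filter at freq_hz."""
    return math.degrees(math.atan(freq_hz / corner_hz))


for f in (25.0, 100.0, 250.0):
    print(f, round(first_order_phase_lag_deg(f, 250.0), 1))
```

Any lag of this kind inside the position loop eats phase margin, which is consistent with P and D having to be pushed unusually high to compensate.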
[23:55:11] <petev> the output looks much better too, and it's holding tenths right now
[23:55:32] <petev> I still need to work on I as it winds up on long moves, need to set the limit
[23:55:43] <petev> then I think I can turn it up so it reacts faster
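The "set the limit" idea for the I term can be sketched as a clamp on the integrated error. This is a generic anti-windup illustration with hypothetical names, not the HAL PID component; only the P and I terms are shown.

```python
class PIWithClamp:
    """PI controller whose integrated error is clamped to +/- max_integral."""

    def __init__(self, kp, ki, max_integral):
        self.kp = kp
        self.ki = ki
        self.max_integral = max_integral
        self.integral = 0.0

    def update(self, error, dt):
        """Accumulate error over dt, clamp the integral, return the output."""
        self.integral += error * dt
        # Anti-windup: never let the integral grow past the limit.
        self.integral = max(-self.max_integral,
                            min(self.max_integral, self.integral))
        return self.kp * error + self.ki * self.integral


ctrl = PIWithClamp(kp=1.0, ki=2.0, max_integral=0.5)
out = 0.0
for _ in range(1000):           # a long move with a steady following error
    out = ctrl.update(1.0, 0.01)
print(ctrl.integral, out)       # integral sits at the clamp instead of growing
```

With the integral bounded, a larger I gain can be tried without a long move building up a large correction that then has to unwind at the end.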
[23:56:17] <petev> from the numbers I'm getting now, I'm wondering if the Mazak drives weren't in torque mode?
[23:56:18] <jmkasunich> winds up on long moves? does that mean you have significant error while moving, and it only gets good after you stop?
[23:56:42] <jmkasunich> not sure about the Mazak drives
[23:56:43] <petev> the error is not bad during the move, but it builds up due to the length of the move
[23:56:55] <petev> it's still holding tenths, but I think it can be better
[23:56:59] <jmkasunich> they were NOT in encoder feedback vel mode, I can tell you that
[23:57:05] <jmkasunich> the encoders don't go to the drives
[23:57:20] <jmkasunich> they're either in armature voltage feedback vel mode, or torque mode
[23:57:28] <petev> I was thinking the torque loop in the drive, not the encoders
[23:58:00] <jmkasunich> your drives have vel loopsin the drive, using encoder for vel fb, don't they?
[23:58:07] <jmkasunich> loops in
[23:58:15] <petev> BTW, the hal config script in tkemc has some redefined variables
[23:58:33] <petev> the drives have pos, vel, and torq loops
[23:58:38] <jmkasunich> I know no-tink about no tkemc stuff
[23:58:44] <petev> you can choose which to use at the external interface
[23:59:01] <petev> when you use vel, the pos loop isn't used, etc.
[23:59:09] <jmkasunich> right
[23:59:12] <petev> the encoders are always position feedback
[23:59:21] <petev> they can be scaled, but that's it
[23:59:29] <jmkasunich> you can also get velocity from encoders of course
[23:59:49] <jmkasunich> when you run the drive in vel loop mode, what does it use for velocity feedback?
[23:59:57] <petev> I don't think the drive will do it for you though, you have to differentiate yourself
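Differentiating encoder position to get velocity, as mentioned above, amounts to a finite difference per sample period. A minimal sketch with hypothetical names; the scale (counts per unit) and sample period are made-up example values.

```python
def velocity_from_counts(prev_counts, curr_counts, dt_s, counts_per_unit):
    """Estimate velocity (units/s) from two successive encoder readings."""
    if dt_s <= 0.0:
        raise ValueError("dt_s must be positive")
    delta_units = (curr_counts - prev_counts) / counts_per_unit
    return delta_units / dt_s


# 100 counts in 1 ms at 10000 counts per unit
print(velocity_from_counts(1000, 1100, 0.001, 10000.0))
```

At short sample periods the result is quantized to whole counts per period, so an estimate like this is usually noisy at low speed.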