#emc-devel | Logs for 2007-08-03

[00:08:42] <SWPLinux> cradek, jepler, jmkasunich: have any of you fiddled with isolcpus much?
[00:09:37] <cradek> I tried it, it works
[00:09:48] <SWPLinux> argh - phone
[00:19:46] <SWPLinux> did you see a big difference in latency with isolcpus?
[00:21:21] <cradek> yes
[00:21:29] <SWPLinux> hmmmm
[00:31:45] <SWPLinux> sorry - long phone call :)
[00:32:02] <SWPLinux> I don't see much difference - still ~17000 ns
[00:32:18] <cradek> isolcpus=1?
[00:32:22] <SWPLinux> is there ayes
[00:32:28] <SWPLinux> oops - yes
[00:32:36] <cradek> top shows all the user processes on 0?
[00:32:41] <SWPLinux> isolcpus=1 and also noirqbalance
[00:32:43] <SWPLinux> one sec
[00:32:55] <cradek> I didn't use that other one
[00:33:01] <SWPLinux> yes
[00:33:21] <SWPLinux> I can try without it in abit
[00:33:25] <SWPLinux> or a bit
[00:33:26] <cradek> seems working then
[00:33:34] <cradek> maybe it just doesn't help on yours
[00:33:46] <cradek> are you sure the rtai test is running on cpu 1?
[00:33:47] <SWPLinux> argh. it would help if I would see CPU number instead of CPU % :)
[00:34:58] <SWPLinux> ok, it didn't work
[00:35:33] <cradek> kernel things like kflushd/1 are the only things that should be on cpu 1
[00:35:57] <SWPLinux> ok, it is all kernel stuff it seems
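
One way to see the CPU number rather than CPU % is to read it from `/proc/<pid>/stat`. The sketch below assumes a Linux system and the field layout documented in proc(5), where field 39 (`processor`) is the CPU the task last ran on; the function names are made up for illustration. The `psr` column of `ps -o pid,psr,comm` should report the same value.

```python
def last_cpu_from_stat(stat_line: str) -> int:
    """Return the CPU a task last ran on, given the text of /proc/<pid>/stat."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the LAST ')' before tokenising.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); "processor" is field 39 per proc(5).
    return int(rest[39 - 3])


def last_cpu(pid: int) -> int:
    """Read /proc/<pid>/stat for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return last_cpu_from_stat(f.read())
```

With `isolcpus=1` in effect, ordinary user processes would be expected to report 0 here, and only the realtime test plus per-CPU kernel threads to report 1.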
[00:36:33] <SWPLinux> did you notice that isolcpus is a list? I think that's why everything needed to be put on CPU 1
[00:36:46] <SWPLinux> or was it intentional to isolate #1?
[00:37:01] <cradek> I had two processors, I couldn't isolate 0, so I isolated 1
[00:37:14] <cradek> I don't know how it works with more than 2
[00:37:29] <SWPLinux> you'd have to do isolcpus=3 on a quad box for rtapi to choose the correct one
[00:37:28] <cradek> you don't have more than 2 do you?
[00:37:32] <SWPLinux> not on this machine
[00:37:36] <cradek> yes I think that's right
[00:39:06] <SWPLinux> ok, the latency test does run on CPU 1
[00:39:21] <cradek> but it's no better?
[00:39:38] <SWPLinux> a little worse if anything, but too close to call
[00:39:49] <cradek> huh
[00:39:58] <cradek> want some P3s? I've got a few extras
[00:40:02] <SWPLinux> this is a core 2 duo T5600 CPU
[00:40:04] <SWPLinux> heh
[00:40:21] <SWPLinux> if only I could fit a few into a tiny little enclosure, along with a couple of PCI cards
[00:40:21] <cradek> do you need it better than 17000?
[00:40:27] <SWPLinux> probably
[00:40:34] <cradek> I mean a lot of machines give that and work fine
[00:40:35] <cradek> oh
[00:40:35] <SWPLinux> this is for a 10 kHz update loop
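
To put the numbers in proportion: a 10 kHz loop has a 100,000 ns period. The sketch below assumes the latency figures quoted in this log are in nanoseconds (as the later "1000-2000 ns" readings suggest) and expresses a worst-case latency as a fraction of the loop period. The function name is hypothetical.

```python
def jitter_fraction(loop_hz: float, max_latency_ns: float) -> float:
    """Worst-case latency as a fraction of one loop period."""
    period_ns = 1e9 / loop_hz  # 10 kHz -> 100,000 ns
    return max_latency_ns / period_ns


if __name__ == "__main__":
    # Latency values taken from the discussion; units assumed to be ns.
    for latency_ns in (200, 6000, 17000):
        share = jitter_fraction(10_000, latency_ns)
        print(f"{latency_ns:>6} ns -> {share:.1%} of the period")
```

On that assumption, a 17000 ns spike eats roughly a sixth of each 10 kHz cycle, while a 200 ns figure is negligible.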
[00:40:55] <SWPLinux> loading does seem to help though, as jmk saw before
[00:43:57] <SWPLinux> ok - still got one with a 16000+ number, but lots of lines with 100-200 max
[00:44:29] <cradek> yuck, so you've got a glitchmaker of some kind
[00:44:41] <SWPLinux> hmmm. a triple nested loop from 1-1000 in bash is not very quick to complete
[00:44:40] <SWPLinux> yep
[00:45:29] <SWPLinux> very low - often <100, none above 200, then something in the 5500-6500 range, and even less often something 10k-17k
[00:45:52] <SWPLinux> every 5 seconds. could be USB (this is a USB keyboard too)
[00:46:26] <SWPLinux> I really need to make a version of that test that does a histogram at the end
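
A histogram of this kind could be prototyped offline before touching the realtime test. The sketch below is not the latency test itself: it assumes the samples have already been collected as plain integers in nanoseconds, and the bin edges and sample values are made up to mirror the clusters described above.

```python
import bisect

EDGES_NS = (100, 200, 1000, 5000, 10000, 20000)  # hypothetical bin edges


def latency_histogram(samples_ns, edges=EDGES_NS):
    """Return [(label, count)]; bin i holds edges[i-1] <= s < edges[i]."""
    counts = [0] * (len(edges) + 1)
    for s in samples_ns:
        # bisect_right gives the index of the first edge greater than s.
        counts[bisect.bisect_right(edges, s)] += 1
    labels = [f"< {edges[0]}"]
    labels += [f"{lo}-{hi - 1}" for lo, hi in zip(edges, edges[1:])]
    labels.append(f">= {edges[-1]}")
    return list(zip(labels, counts))


if __name__ == "__main__":
    samples = [80, 95, 150, 180, 5600, 6100, 16700]  # made-up values
    for label, count in latency_histogram(samples):
        print(f"{label:>12} ns  {count:3d}  {'#' * count}")
```

Fixed bins keep the per-sample cost to one binary search and one increment, which is the sort of bookkeeping that could plausibly move into the test's own loop later.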
[00:48:50] <SWPLinux> another item of a little interest - the latency test doesn't use floating point, but there's an option for the latency_test kernel module to use FP if you want
[00:48:58] <SWPLinux> I just don't know how to get that option to the module
[00:50:16] <SWPLinux> well, I'll continue this another day. at least the systems work, even if they're not as good as I had hoped
[00:50:25] <skunkworks> wait - did I meet stuart?
[00:50:30] <SWPLinux> I'll need to stick a 5i20 in one and do some scope tests
[00:50:38] <cradek> he's the guy who brought us pizza
[00:50:44] <SWPLinux> skunkworks: probably, he was around quite a bit
[00:50:49] <cradek> were you there that night?
[00:50:54] <skunkworks> no
[00:51:05] <skunkworks> must have been before thursday night
[00:51:06] <cradek> he was the one with the teenage son
[00:51:17] <SWPLinux> he's the guy that was sitting at the end of the table Jon Elson was at with his little PC with the touch panel
[00:51:28] <skunkworks> ah - ok
[00:51:49] <skunkworks> seemed knowledgeable. tried to give one of the touch panels to chris?
[00:51:54] <SWPLinux> yep
[00:52:02] <SWPLinux> might have succeeded
[00:52:08] <skunkworks> ok - I remember now
[00:52:33] <SWPLinux> it's very hot. I think I should have another ice cream sandwich
[00:52:47] <cradek> it's very hot - I think I should stay inside
[00:53:16] <SWPLinux> I'm inside, and it's very hot. maybe I should go outside
[00:53:20] <cradek> darnit now I want an ice cream sandwich
[00:53:26] <SWPLinux> with my ice cream sandwich
[00:53:28] <SWPLinux> :)
[00:53:31] <cradek> curses
[00:54:08] <SWPLinux> at least I didn't say something like oatmeal with maple and brown sugar
[00:54:32] <cradek> I got some really good granola yesterday
[00:54:44] <SWPLinux> what kind? (a granola cereal?)
[00:54:51] <cradek> cashew raisin and banana in it
[00:54:58] <SWPLinux> nice
[00:55:07] <cradek> no, the simple kind that's just fruit/nuts/oats
[00:55:11] <SWPLinux> possibly a bit sweet though :)
[00:55:38] <cradek> not sweetened at all, just the raisins
[00:55:54] <SWPLinux> hmmm. I think I'll run the latency test without X
[00:55:58] <SWPLinux> see you later
[00:56:02] <cradek> good luck
[00:56:06] <SWPLinux> thanks
[00:56:11] <cradek> try text mode too
[00:56:30] <cradek> he may not realize that he's not in text mode
[00:56:43] <SWPadnos> text vs. fbcon?
[00:56:49] <cradek> yes
[00:56:50] <SWPadnos> ok
[01:30:01] <SWPadnos> well, I've got the test running in text mode (I think - not sure how to make sure it's text and not fbcon), and it's interesting
[01:30:41] <SWPadnos> without a CPU load, the max latency is consistently 16700 or thereabouts - every line that prints has something around there for the max
[01:31:47] <SWPadnos> with a CPU load, the numbers are often in double digits, and mostly stay below a few hundred, except that every 5 seconds or every 15 seconds (sometimes every 20 seconds - it seems to always be a multiple of 5 seconds), there's a number in the high 5000's
[01:32:21] <SWPadnos> and every once in a while - not sure if this is every 60 / 64 seconds, there's also a 16000+ number
[01:32:54] <SWPadnos> I have the test running now, redirected to a file. I'll let it sit for a bit without any keyboard activity or screen updates and see what happens
[03:53:20] <SWPadnos> well, I don't know what causes the latencies to suck so badly when the CPU is unloaded, but I did find out what was causing the 5-second blips - it's kjournald - the journaling daemon for ext3 (and others)
[03:54:43] <SWPadnos> I think someone had noticed this before, and the general attitude was that journaling is more important than a few microsecond blip from time to time
[03:55:25] <cradek> interesting
[03:55:45] <SWPadnos> indeed
[03:56:44] <SWPadnos> it's good news for me, because I can load down the second core with a do-nothing program, and I don't need to write to disk for the most part, so I can use a non-journaling filesystem (which I'd do anyway since I really want to use a flash drive for this)
[03:57:02] <SWPadnos> which leaves me with sub-microsecond latencies pretty consistently
[03:58:19] <cradek> that's some good troubleshooting
[03:58:24] <SWPadnos> thanks
[12:54:01] <steve_stallings> steve_stallings is now known as steves_logging
[14:46:17] <SWPadnos> more latency data: using a Celeron-M 1.866 GHz (the same clock speed as the core 2 duo), I get slightly higher, but much more consistent timing
[14:46:48] <SWPadnos> latencies are between 1000-2000 ns almost all the time. there are no 5000+ spikes from kjournald
[15:13:54] <skunkwork> the high latency you were getting yesterday every 60 seconds or so.. Would that maybe be the intel smi issue?
[15:15:01] <SWPadnos> it wasn't every 60 seconds, it was every 5 seconds
[15:15:13] <SWPadnos> and only 5000 or 15000-ish, not 49723560395
[15:15:18] <SWPadnos> like SMI :)
[15:15:44] <skunkwork> right - thought maybe the computer was fast enough that it wasn't so large ;)
[15:15:45] <SWPadnos> oh - on the UP system, adding a load doesn't help, nor does it hurt
[15:15:47] <SWPadnos> heh
[15:15:55] <SWPadnos> 64 seconds is still 64 seconds :)
[15:16:18] <SWPadnos> hmmm. coffee time. maybe I should look at the max lat now - it's been running for a while
[15:39:34] <SWPadnos> interesting. the low latencies were still there - the max was 2179 when I looked at it. that was with the 2.6.20-SMP experimental kernel. With the 2.6.15-magma, the times are much worse - around 6000
[15:39:58] <SWPadnos> same bootup, same machine, same CPU - just selected the magma kernel in grub
[16:52:26] <SWPadnos> cradek, how difficult do you think it would be to do a comparison of the options used when compiling the stock kernel on the liveCD vs. the SMP 2.6.20 kernel in /experimental?
[16:52:49] <SWPadnos> I know a lot of options were added, so there would be a lot of noise in a straight diff of .configs
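
One way to cut the noise is to parse both `.config` files into dictionaries and report options whose value changed separately from options that exist in only one file. The sketch below assumes the usual `.config` line forms (`CONFIG_X=value` and `# CONFIG_X is not set`); the function names are hypothetical.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value; '# CONFIG_X is not set' is stored as 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def compare_configs(old_text, new_text):
    """Return (changed, added, removed) between two .config texts."""
    old, new = parse_config(old_text), parse_config(new_text)
    changed = {k: (old[k], new[k])
               for k in old.keys() & new.keys() if old[k] != new[k]}
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    return changed, added, removed
```

Looking only at `changed` first would set aside the options that were simply added between kernel versions, which is where most of the noise in a straight diff would come from.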
[17:05:05] <alex_joni> hi guys
[17:05:22] <SWPadnos> hi Alex
[17:05:37] <alex_joni> * alex_joni is in a car driving to a customer :)
[17:05:45] <SWPadnos> get off the internet, fool!
[17:05:50] <alex_joni> why?
[17:05:57] <alex_joni> I'm not driving
[17:06:01] <SWPadnos> heh
[17:06:02] <alex_joni> and it's boring :P
[17:07:27] <alex_joni> right now we're doing 15kmh or so
[17:07:33] <alex_joni> some kind of congestion
[17:07:37] <SWPadnos> ugh. I can almost run that fast
[17:07:49] <SWPadnos> well, I could when I was in the army anyway
[17:08:52] <alex_joni> heh
[17:16:07] <alex_joni> who wants to hear something funny?
[17:16:51] <skunkwork> Me
[17:16:55] <alex_joni> did you see paul's email complaining about compiling 2.1.7 on 2.6.20 ?
[17:16:55] <SWPadnos> oh - me too
[17:17:09] <SWPadnos> I saw the email, but didn't bother to look at the compile logs
[17:17:17] <alex_joni> it fails on rtai_rtapi.c line 128
[17:17:23] <SWPadnos> err - didn't have a chance to look at the logs :)
[17:17:26] <alex_joni> guess who wrote that line :P
[17:17:28] <SWPadnos> heh
[17:17:41] <alex_joni> yabosukz
[17:19:53] <alex_joni> although from the message I think something else is happening (not at line 128)
[17:30:13] <SWPadnos> hmmm. I can compile fine on 2.6.20, though it isn't 2.6.20-11 (it's whatever is in Chris' SMP kernel)
[17:30:34] <SWPadnos> though that is TRUNK, I didn't try with 2.1.7 release
[17:31:06] <alex_joni> it's quite different in TRUNK
[17:32:10] <SWPadnos> I can try a checkout and build on the SMP kernel a little later today, see if I get the same (or similar) problems
[17:32:43] <alex_joni> if it's already fixed in TRUNK.. I wouldn't bother
[17:33:06] <alex_joni> but I ran emc2.TRUNK fine on the smp kernel
[17:33:15] <alex_joni> I think even 2.1.x
[17:33:35] <alex_joni> * alex_joni looks
[17:34:00] <alex_joni> (it's nice to have an SMP vmware)
[17:34:07] <SWPadnos> I think we should really try to get the 2.6.20 kernel on the liveCD - I get latencies in the 1-2 uS range with it
[17:34:33] <SWPadnos> I haven't tried 2.6.20 uniprocessor yet, so there may be something between UP vs. SMP
[17:35:01] <alex_joni> bet it is
[17:35:09] <alex_joni> I only have the 2.6.17 packages here
[17:35:45] <SWPadnos> 2.6.20 is in /experimental - that's what I'm using (haven't built a custom kernel there yet)
[17:36:06] <SWPadnos> I think I'll make a new latency test that has some statistical and/or logging capability
[17:36:18] <SWPadnos> the min/max is great, but it isn't the greatest for analysis
[17:36:53] <SWPadnos> I'll have to see if this hard drive will still boot a laptop first though :)
[17:38:11] <alex_joni> maybe you can use the min/max of HAL threads to analyze it
[17:38:35] <SWPadnos> no - I want a test that runs outside of HAL
[17:38:54] <alex_joni> ROFL.. I plugged in a webcam
[17:38:58] <SWPadnos> like the current test, but with a histogram and/or a several-second log
[17:46:49] <alex_joni> bbl
[17:47:02] <SWPadnos> have fun
[17:51:13] <alex_joni> thanks.. it's going a bit faster now :D
[19:50:55] <SWPadnos> heh - cradek and his bug extraction woes :)
[19:51:03] <cradek> man oh man
[19:51:35] <alex_joni> Man page "oh man" not found