#emc-devel | Logs for 2006-02-17

[01:55:55] <cradek> hi all
[01:55:56] <skunkworks> hey
[01:56:09] <jmkasunich> hi
[01:57:46] <jmkasunich> ok, how do I ssh into skunkworks's broken box?
[01:58:06] <skunkworks> can I post my ip here
[01:58:15] <jmkasunich> better to priv it
[01:58:33] <skunkworks> good
[01:59:11] <skunkworks> I don't think I can start a pm for some reason
[01:59:57] <jmkasunich> try again
[02:00:13] <jmkasunich> (I signed in with my password and turned "unfiltered" on
[02:00:27] <jmkasunich> (freenode has been doing stuff with privmsg to deal with spam
[02:02:38] <cradek> when you have the IP and username, just ssh username@
[02:02:46] <cradek> then you'll get a password prompt
[02:03:58] <jmkasunich> ok, I'm logged into skunkworks's box
[02:04:17] <jmkasunich> it's the installed deb that fails, or a cvs checkout, or both?
[02:05:11] <jmkasunich> tap, tap, is this thing on?
[02:05:42] <jmkasunich> I see a CVS checkout by alex
[02:06:57] <skunkworks> I did the cradek install first - that didn't work - then alex while goofing around installed the source and tried to compile it - I don't think that worked either
[02:07:16] <jmkasunich> I tried to re-run ./configure, and it seems to be hung
[02:07:36] <skunkworks> crap
[02:07:50] <jmkasunich> if I give a ctrl-C, will that kill the ssh session, or go to the remote box?
[02:07:57] <cradek> it'll go to the remote box
[02:08:01] <skunkworks> I cannot reboot it from here. :(
[02:08:04] <cradek> it will work just like a local shell
[02:09:01] <jmkasunich> skunkworks - dunno if the box is hung, maybe just my session
[02:09:04] <jmkasunich> trying another one
[02:09:40] <jmkasunich> box is OK, I have another ssh shell
[02:09:48] <cradek> where did configure stop?
[02:09:54] <skunkworks> good
[02:09:58] <jmkasunich> checking for ranlib
[02:10:06] <cradek> strange
[02:10:14] <jmkasunich> WTF!? the second shell stopped in the middle of "ps -A"
[02:10:27] <jmkasunich> got the header line only
[02:10:38] <cradek> skunkworks: can I have the login information too?
[02:10:40] <jmkasunich> this is probably an ssh problem
[02:10:45] <cradek> let me try it
[02:10:50] <cradek> I use ssh 100 times a day
[02:10:59] <skunkworks> im me
[02:11:06] <skunkworks> pm what ever
[02:11:15] <jmkasunich> skunkworks, I can pass it to him
[02:11:30] <cradek> got it
[02:11:35] <jmkasunich> ok
[02:12:01] <jmkasunich> cradek: first thing, find out what my shells are doing
[02:12:05] <cradek> it's nice and responsive here
[02:12:30] <cradek> nothing - they're just idle at bash
[02:12:31] <skunkworks> poor machine ;)
[02:12:32] <jmkasunich> I have a NAT router, think that might be messing it up?
[02:12:37] <cradek> 12597 ? S 0:00 sshd: sam@pts/2
[02:12:37] <cradek> 12598 pts/2 Ss+ 0:00 -bash
[02:12:37] <cradek> 10932 ? S 0:00 sshd: sam@pts/0
[02:12:37] <cradek> 10933 pts/0 Ss+ 0:00 -bash
[02:12:46] <skunkworks> violated everyway from sunday
[02:12:58] <cradek> no, nat is unlikely to mess up ssh
[02:13:12] <jmkasunich> the configure and ps -A probably ran fine, but my link got fscked somehow
[02:13:19] <cradek> I agree
[02:13:31] <jmkasunich> how do I terminate my login?
[02:13:45] <cradek> RETURN ~ .
[02:14:10] <cradek> configure runs fine for me...
[02:14:17] <cradek> wonder what's wrong between you two
[02:14:21] <jmkasunich> that would be wonderful if I could actually type anything into the shell
[02:14:30] <cradek> you can just type it blind
[02:14:37] <cradek> that's a command to your local ssh
[02:14:42] <jmkasunich> tried, no joy
[02:14:48] <jmkasunich> all caps on the RETURN?
[02:14:56] <cradek> no, sorry, the return key
[02:14:59] <cradek> uh, enter
[02:15:12] <jmkasunich> <return>, then '~', then '.'?
[02:15:17] <cradek> yes
[02:15:28] <cradek> that should terminate the ssh
[02:15:31] <jmkasunich> that worked
[02:15:49] <jmkasunich> ok, both conns closed
[02:16:05] <cradek> I would not expect problems with this... very strange
[02:16:12] <cradek> ssh always works
[02:16:32] <jmkasunich> not for me
[02:16:37] <cradek> 23664 ? D 0:00 /sbin/insmod /usr/realtime-2.6.12-magma/modules/rtai_up.ko
[02:16:40] <cradek> 23665 ? D 0:00 /sbin/insmod /usr/realtime-2.6.12-magma/modules/rtai_up.ko
[02:16:43] <cradek> there are hung insmods
[02:16:51] <jmkasunich> just tried again, logged in, ps -A printed header and nothing else
[02:17:11] <jmkasunich> dmesg?
[02:17:25] <cradek> ...
[02:17:27] <cradek> [ 5035.309916] HAL: thread created
[02:17:27] <cradek> [ 5035.309940] MOTION: setting Traj cycle time to 10000000 nsecs
[02:17:27] <cradek> [ 5035.309947] MOTION: setting Servo cycle time to 1000000 nsecs
[02:17:27] <cradek> [ 5035.309951] MOTION: init_threads() complete
[02:17:29] <cradek> [ 5035.309954] MOTION: init_module() complete
[02:17:37] <cradek> looks ok
[02:17:45] <jmkasunich> so it loaded at least once
[02:17:58] <cradek> there's a motmod etc loaded, I'm going to try to unload them
[02:18:07] <jmkasunich> rtai_up.ko is the rtai uniprocessor scheduler
[02:18:15] <cradek> crap, sudo wants a password
[02:18:24] <skunkworks> should be the same
[02:18:36] <skunkworks> it is the only user on the machine as far as I know
[02:18:36] <cradek> aha
[02:18:43] <jmkasunich> yeah, user pw, not root
[02:18:55] <jmkasunich> we've had problems with that before
[02:19:26] <cradek> ok all modules unloaded nicely
[02:19:31] <jmkasunich> rtai_ksched (or rtai_sched, don't recall) is a symlink to rtai_up (or _smp, or maybe even another one)
[02:19:52] <jmkasunich> you can load using the symlink, but you must unload with the real name, or something like that
[02:19:58] <cradek> right
[02:20:02] <cradek> realtime start works fine
[02:20:26] <cradek> Module Size Used by
[02:20:26] <cradek> hal_lib 24460 0
[02:20:26] <cradek> rtapi 25664 1 hal_lib
[02:20:26] <cradek> rtai_math 25860 0
[02:20:26] <cradek> rtai_sem 14976 1 rtapi
[02:20:28] <jmkasunich> the prob seemed to be with user space access to the HAL shmem
[02:20:28] <cradek> rtai_shm 8192 1 rtapi
[02:20:31] <cradek> rtai_fifos 23500 1 rtapi
[02:20:33] <cradek> rtai_up 69400 4 rtapi,rtai_sem,rtai_shm,rtai_fifos
[02:20:36] <cradek> rtai_hal 20888 5 rtapi,rtai_sem,rtai_shm,rtai_fifos,rtai_up
[02:20:39] <cradek> adeos 14336 2 rtai_up,rtai_hal
[02:20:58] <jmkasunich> looks very normal
[02:21:05] <cradek> I'm going to be useless on this - I wish we could figure out your ssh problem
[02:21:11] <jmkasunich> try halcmd show
[02:21:22] <cradek> Loaded HAL Components:
[02:21:22] <cradek> ID Type Name
[02:21:22] <cradek> 01 User halcmd14546
[02:21:30] <cradek> everything else is empty
[02:21:37] <jmkasunich> as expected
[02:21:42] <skunkworks> do you need your router to port forward the right port to your box?
[02:22:03] <skunkworks> on your end?
[02:22:06] <jmkasunich> could be.... but it establishes the initial connection, then loses it later
[02:22:09] <cradek> skunkworks: no, it's a one-way outgoing connection
[02:22:39] <cradek> skunkworks: no fancy stuff needed like for dcc,ftp,etc
[02:22:52] <skunkworks> how does the info get returned?
[02:23:02] <cradek> the one tcp connection stays open
[02:23:17] <cradek> forever, until you disconnect
[02:23:21] <skunkworks> ah
[02:23:27] <cradek> or, in jmk's case, until he runs a command that gives a lot of output
[02:23:43] <jmkasunich> what's the command to view open connections?
[02:23:52] <cradek> netstat maybe?
[02:24:14] <cradek> jmkasunich: you have ethernet to your nat box, then broadband of some kind?
[02:24:26] <jmkasunich> yes
[02:24:28] <jmkasunich> DSL
[02:25:01] <jmkasunich> I can access the NAT config with a browser (its one of those little plastic boxes, not a computer
[02:26:22] <jmkasunich> netstat says I have a connection (even after the lockup)
[02:26:40] <skunkworks> for alex to get in I had to port forward 22 or 24 to my internal ip (what ever port it was)
[02:26:52] <cradek> yeah 22
[02:28:53] <cradek> jmkasunich: try ping
[02:29:03] <cradek> I think that's right outside skunkworks's machine
[02:29:13] <jmkasunich> works
[02:29:27] <cradek> jmkasunich: try ping -s1300
[02:30:02] <jmkasunich> works
[02:30:14] <jmkasunich> tried pinging his box, no joy
[02:30:17] <cradek> huh
[02:30:28] <cradek> yeah pings are blocked somewhere past this address
[02:30:49] <skunkworks> I think charter cable disabled it.
[02:30:53] <cradek> I'm baffled
[02:31:09] <cradek> wish I could help
[02:31:25] <skunkworks> Also I think my router is set to not respond to pings.
[02:31:32] <jmkasunich> we can do three way troubleshooting ;-/
[02:31:40] <cradek> I'll try
[02:31:43] <skunkworks> I could turn that on temp if that would help
[02:31:52] <cradek> no, I think it wouldn't help
[02:31:55] <jmkasunich> try bin/halcmd loadrt blocks wcomp=1
[02:32:02] <jmkasunich> then bin/halcmd show again
[02:32:53] <cradek> sam@ubuntu:~$ halcmd loadrt blocks wcomp=1
[02:32:53] <cradek> RTAPI: ERROR: version mismatch 0 vs 529
[02:32:53] <cradek> HAL: ERROR: rtapi init failed
[02:32:53] <cradek> halcmd: hal_init() failed
[02:32:53] <cradek> NOTE: 'rtapi' kernel module must be loaded
[02:33:08] <jmkasunich> so we can recreate it
[02:33:19] <jmkasunich> the error came on the loadrt, or the show?
[02:33:27] <cradek> the loadrt
[02:33:33] <cradek> I pasted the prompt and command too
[02:33:40] <jmkasunich> duh
[02:33:51] <jmkasunich> try the show
[02:34:05] <jmkasunich> same prob I bet
[02:34:11] <cradek> sam@ubuntu:~/emc2/src$ halcmd show
[02:34:11] <cradek> RTAPI: ERROR: version mismatch 0 vs 529
[02:34:11] <cradek> HAL: ERROR: rtapi init failed
[02:34:11] <cradek> halcmd: hal_init() failed
[02:34:11] <cradek> NOTE: 'rtapi' kernel module must be loaded
[02:34:25] <jmkasunich> but the very first show worked
[02:34:42] <jmkasunich> scripts/realtime stop, we'll try again from the top
[02:35:01] <jmkasunich> did the stop work?
[02:35:04] <cradek> sam@ubuntu:~/emc2/src$ /etc/init.d/realtime start
[02:35:04] <cradek> sam@ubuntu:~/emc2/src$ halcmd show
[02:35:04] <cradek> Loaded HAL Components:
[02:35:04] <cradek> ID Type Name
[02:35:04] <cradek> 01 User halcmd16517
[02:35:09] <cradek> yes stop and restart gives this
[02:35:15] <jmkasunich> ok, do another show
[02:35:25] <cradek> any number of shows work
[02:35:45] <jmkasunich> but the loadrt fails? and then shows fail too?
[02:35:57] <jmkasunich> cat /proc/rtapi/*
[02:36:16] <cradek> sam@ubuntu:~$ halcmd loadrt blocks wcomp=1
[02:36:16] <cradek> HAL:0: ERROR: Can't find module 'blocks' in /usr/rtlib
[02:36:22] <cradek> eh???
[02:36:34] <jmkasunich> maybe thats the module path thing
[02:36:44] <jmkasunich> lemme try it here
[02:37:19] <jmkasunich> running installed or in place?
[02:37:34] <cradek> installed
[02:37:42] <cradek> what env do I set?
[02:37:48] <jmkasunich> HAL_RTMOD_DIR
[02:38:07] <jmkasunich> but what is strange is that you got farther this time
[02:38:25] <cradek> sam@ubuntu:~$ export HAL_RTMOD_DIR=/usr/realtime-2.6.12-magma/modules/emc2
[02:38:25] <cradek> sam@ubuntu:~$ halcmd loadrt blocks wcomp=1
[02:38:25] <cradek> HAL:0: ERROR: module 'blocks' not loaded
[02:38:37] <cradek> BUT
[02:38:38] <cradek> sam@ubuntu:~$ lsmod|head
[02:38:38] <cradek> Module Size Used by
[02:38:38] <cradek> blocks 18508 0
[02:38:46] <cradek> it DOES load
[02:39:33] <jmkasunich> how does module-helper return status?
[02:39:54] <cradek> it just execvs insmod/rmmod, so their exit codes are returned
[02:40:02] <jmkasunich> ok
[02:40:10] <jmkasunich> damn I miss kate
[02:40:18] <cradek> apt-get install kate
[02:40:34] <jmkasunich> and 47 gigs of dependencies
[02:40:37] <jmkasunich> not now
[02:40:49] <jmkasunich> making and installing TESTING
[02:40:50] <cradek> 22MB
[02:41:04] <jmkasunich> still, not now - I'll manage with gedit
[02:41:48] <cradek> I can build RIP TESTING if you want
[02:42:00] <jmkasunich> hold on
[02:42:34] <jmkasunich> ok, I understand the "not loaded"
[02:42:44] <cradek> ok cool
[02:42:59] <jmkasunich> after forking and invoking module-helper (and waiting for it to return) halcmd does a show comp (internally)
[02:43:07] <jmkasunich> its not seeing the installed module, so it complains
[02:43:26] <jmkasunich> there is something fscked with shared memory
[02:43:37] <cradek> ok
[02:43:41] <jmkasunich> halcmd show still only shows halcmd<somenum>, right>
[02:43:43] <cradek> so this is another symptom of the same problem
[02:43:48] <jmkasunich> s/>/?/
[02:43:57] <cradek> yes
[02:44:02] <cradek> 01 User halcmd16669
[02:44:07] <jmkasunich> cat /proc/rtapi/*
[02:44:29] <cradek> you want it all? it's a lot
[02:44:37] <jmkasunich> just a sec
[02:44:43] <cradek> ******* RTAPI MODULES *******
[02:44:43] <cradek> ID Type Name
[02:44:43] <cradek> 01 RT HAL_LIB
[02:44:43] <cradek> 02 RT HAL_blocks
[02:44:49] <cradek> **** RTAPI SHARED MEMORY ****
[02:44:49] <cradek> ID Users Key Size
[02:44:49] <cradek> RT/UL
[02:44:49] <cradek> 01 2/0 1212238881 65500
[02:45:30] <jmkasunich> where is "realtime" on an install?
[02:45:37] <cradek> /etc/init.d
[02:45:59] <jmkasunich> oh, not in my path
[02:46:04] <cradek> nope
[02:47:15] <jmkasunich> something tells me that when you do halcmd <anything> it opens a different shmem block than the one the RT HAL is using
[02:48:02] <jmkasunich> ok, try this:
[02:48:06] <jmkasunich> halcmd -f
[02:48:14] <jmkasunich> that will open a halcmd, and it will wait for input
[02:48:27] <cradek> sam@ubuntu:~/emc2/src$ halcmd -f
[02:48:27] <cradek> RTAPI: ERROR: version mismatch 0 vs 529
[02:48:27] <cradek> HAL: ERROR: rtapi init failed
[02:48:27] <cradek> halcmd: hal_init() failed
[02:48:27] <cradek> NOTE: 'rtapi' kernel module must be loaded
[02:48:28] <jmkasunich> in another shell, cat /proc/rtapi/shmem
[02:49:02] <jmkasunich> each time it says that, it must be accessing a different shmem or something
[02:49:20] <jmkasunich> try again, see if you can get it to run (give a halcmd: prompt)
[02:49:55] <cradek> no, I tried many times
[02:49:59] <jmkasunich> ok
[02:50:11] <cradek> and now halcmd show doesn't work
[02:50:28] <jmkasunich> realtime stop, clean things up
[02:50:38] <jmkasunich> I have a short RTAI shmem test program, let me find it
[02:51:20] <jmkasunich> https://mail.rtai.org/pipermail/rtai/2005-July/012321.html
[02:51:20] <cradek> ok clean
[02:51:42] <jmkasunich> there's a 20-30 line program embedded in that email message
[02:51:58] <jmkasunich> can you cut/paste it onto sam's box and compile it?
[02:52:18] <cradek> ready
[02:52:28] <jmkasunich> that was fast
[02:52:39] <cradek> I compile like the wind
[02:52:45] <jmkasunich> run it in a shell, it should print 1-30 at 1 second intervals, then exit
[02:52:55] <cradek> root@ubuntu:~# ./a.out
[02:52:55] <cradek> SHM_USR: Allocation failed
[02:53:13] <jmkasunich> oh, rtai isn't loaded
[02:53:16] <jmkasunich> realtime start
[02:53:34] <jmkasunich> (don't need rtapi or hal for this, but do need the rtai modules)
[02:53:37] <cradek> root@ubuntu:~# ./a.out
[02:53:37] <cradek> SHM_USR: incrementing count: was 18546688, now 18546689
[02:53:37] <cradek> SHM_USR: incrementing count: was 18546689, now 18546690
[02:53:37] <cradek> SHM_USR: incrementing count: was 18546690, now 18546691
[02:53:41] <cradek> ...
[02:53:47] <jmkasunich> well thats fscked
[02:54:05] <jmkasunich> maybe not, hang on
[02:54:09] <cradek> I don't see you clearing *p in that program
[02:54:15] <jmkasunich> no
[02:54:26] <jmkasunich> I thought RTAI cleared shmem regions, maybe not
[02:54:29] <cradek> should I add that?
[02:54:30] <jmkasunich> irrelevant anyway
[02:54:33] <cradek> ok
[02:54:33] <jmkasunich> no
[02:54:43] <jmkasunich> because the prog only gets interesting when you run two of them
[02:54:50] <jmkasunich> open another shell
[02:54:50] <cradek> did an old version of rtai clear shmem?? that could be our bug
[02:55:06] <jmkasunich> run in one shell, then start the prog in the other before the first one exits
[02:55:20] <jmkasunich> they both should be accessing and incrementing the same counter
[02:55:39] <cradek> they are
[02:55:47] <cradek> and it started at 0 this time
[02:56:06] <cradek> the last one exited at 60
[02:56:10] <jmkasunich> correct
[02:56:39] <cradek> when I run it again, it starts at 60.
[02:57:01] <jmkasunich> the very first time, it opened some region of memory with a big number in it
[02:57:10] <cradek> ok
[02:57:19] <cradek> emc doesn't depend on the memory starting cleared does it?
[02:57:26] <jmkasunich> second time, a different region (or cleared the region), and then when you ran a second instance, they both accessed the same region
[02:57:27] <cradek> that could explain why we get random failures
[02:57:51] <jmkasunich> nothing in HAL depends on memory clearing (99.9% certain, I'll check in a few mins)
[02:58:36] <jmkasunich> when you ran it a third time, it must have opened the same region again. That's not required (once both instances ended in the previous run, the shmem region is gone)
[02:58:45] <cradek> ok
[02:59:01] <cradek> should I try this thing you describe with three invocations?
[02:59:07] <jmkasunich> sure
[02:59:55] <cradek> seems to work right (they all share the numbers)
[02:59:59] <jmkasunich> ok
[03:00:13] <jmkasunich> I'd be very surprised to see the exact same bug
[03:00:31] <jmkasunich> but that prog is a simple way to test shmem in general
[03:00:45] <jmkasunich> emc's usage is a little more complex
[03:00:57] <jmkasunich> hal_lib.ko opens a shmem region when loaded
[03:01:07] <jmkasunich> each RT hal module opens the same region
[03:01:14] <jmkasunich> as does each non-RT hal module
[03:01:16] <jmkasunich> and halcmd
[03:01:32] <jmkasunich> in the case of halcmd, every invocation opens it, then closes it on program exit
[03:01:36] <cradek> what triggers the version mismatch error?
[03:01:50] <jmkasunich> but since hal_lib.ko is loaded, all invocations refer to that region
[03:01:56] <cradek> ok
[03:02:04] <jmkasunich> there is a magic number and a version number in the shmem block
[03:02:15] <cradek> ok
[03:02:22] <cradek> so we're getting a different block maybe
[03:02:34] <jmkasunich> the very first time it's opened, some global init needs to be done. the init runs if the magic number is missing, and then sets the magic
[03:02:39] <jmkasunich> it also sets the version
[03:03:04] <jmkasunich> subsequent opens of the region check the magic, see it set, know they don't have to do the global init, then they check the version
[03:03:27] <jmkasunich> to make sure you don't run hal components with mismatched hal data structs
[03:03:53] <jmkasunich> for instance if you changed the structure defs and didn't recompile everything, or if you got a binary hal module from somewhere
[03:04:01] <cradek> ok I see now
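A minimal sketch of the magic/version check jmkasunich describes above. This is not the actual RTAPI code: the struct layout, the function name, and the return codes are hypothetical, and the two constants simply reuse the magic and rev values that show up in the debug output pasted in this log.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAGIC 308286473u  /* magic value as printed in the log */
#define DEMO_REV   529u        /* rev code as printed in the log */

enum { DEMO_OK = 0, DEMO_VERSION_MISMATCH = -1 };

/* Hypothetical layout: magic first, version right after it, then data. */
typedef struct {
    uint32_t magic;
    uint32_t rev_code;
    uint32_t payload[16];
} demo_shared_t;

/* Attach to a shared block: the first opener does the global init,
 * later openers only verify the version. */
static int demo_attach(demo_shared_t *data)
{
    assert(data != NULL);
    if (data->magic != DEMO_MAGIC) {
        /* Magic missing: treat the block as uninitialized. */
        data->magic = DEMO_MAGIC;
        data->rev_code = DEMO_REV;
        memset(data->payload, 0, sizeof data->payload);
    }
    /* Magic is set (by us or by an earlier opener): check the version. */
    if (data->rev_code != DEMO_REV)
        return DEMO_VERSION_MISMATCH;
    return DEMO_OK;
}
```

Calling demo_attach() on a fresh block initializes it and returns DEMO_OK. If rev_code is later overwritten with 0 while the magic stays intact, the next call returns DEMO_VERSION_MISMATCH, which has the same shape as the "version mismatch 0 vs 529" error.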
[03:04:13] <cradek> I think I'm not getting any wrong behavior with your test program
[03:04:23] <jmkasunich> I tend to agree
[03:05:11] <jmkasunich> alex made some changes to rtapi_common.h to print some stuff
[03:05:28] <jmkasunich> I wonder if those are in the checkout in ~/sam/emc2?
[03:05:48] <jmkasunich> he prints if magic is found, etc
[03:06:09] <cradek> sam@ubuntu:~/emc2.aj$ bin/halcmd show
[03:06:09] <cradek> init_rtapi_data: initial rev_code=529
[03:06:09] <cradek> init_rtapi_data: rtapi_mutex_try() returned -1
[03:06:09] <cradek> init_rtapi_data: assigned rev_code=529
[03:06:09] <cradek> ird: #1 529
[03:06:11] <cradek> ird: #2 529
[03:06:13] <cradek> ird: #4 529
[03:06:16] <cradek> ird: #4 529
[03:06:19] <cradek> init_rtapi_data: rev_code=529
[03:06:21] <cradek> Loaded HAL Components:
[03:06:24] <cradek> ID Type Name
[03:06:26] <cradek> 01 User halcmd20114
[03:07:07] <jmkasunich> where is that ird: coming from?
[03:07:34] <cradek> on subsequent runs, I get different output:
[03:07:40] <cradek> sam@ubuntu:~/emc2.aj$ bin/halcmd show
[03:07:40] <cradek> init_rtapi_data: MAGIC is ok, rev_code=529
[03:07:40] <cradek> Loaded HAL Components:
[03:07:40] <cradek> ID Type Name
[03:07:40] <cradek> 01 User halcmd20129
[03:08:15] <cradek> that print is in init_rtapi_data
[03:08:52] <jmkasunich> ok, it wasn't in the part he pasted into the email
[03:09:02] <cradek> I think it only happens the first time
[03:10:08] <cradek> sam@ubuntu:~/emc2.aj$ bin/halcmd -f
[03:10:08] <cradek> init_rtapi_data: MAGIC is ok, rev_code=529
[03:10:08] <cradek> halcmd:
[03:10:10] <jmkasunich> he prints after each of those for loops?
[03:10:14] <cradek> yes
[03:10:23] <cradek> now I have a halcmd prompt
[03:10:32] <jmkasunich> he was trying to see if it got stomped on by the loops
[03:11:08] <cradek> what did you want me to do with a halcmd: prompt open?
[03:11:41] <jmkasunich> cat /proc/rtapi/shmem
[03:13:54] <jmkasunich> also, cat /proc/rtai/names
[03:15:17] <jmkasunich> still there?
[03:15:39] <jmkasunich> on this box. rtai/names shows three lines with SHMEM in them
[03:16:08] <jmkasunich> one is 12288 bytes, I think that is used by RTAI, one is 65536, that is HAL, and one is 2Meg, dunno what is using that
[03:16:37] <jmkasunich> usage counts on the two small ones are both 2
[03:16:52] <jmkasunich> (this is with blocks loaded)
[03:17:48] <cradek> argh, now I'm having connectivity problems
[03:17:48] <cradek> bear with me if I disappear for a few minutes
[03:17:48] <cradek> fsort of
[03:17:48] <cradek> argh
[03:18:06] <cradek> ok I think I'm back
[03:19:23] <cradek> sam@ubuntu:~$ cat /proc/rtapi/shmem
[03:19:23] <cradek> **** RTAPI SHARED MEMORY ****
[03:19:23] <cradek> ID Users Key Size
[03:19:23] <cradek> RT/UL
[03:19:23] <cradek> 01 1/0 1212238881 65500
[03:19:33] <cradek> Slot Name ID Type RT_Handle Pointer Tsk_PID MEM_Sz USG Cnt
[03:19:36] <cradek> -------------------------------------------------------------------------------
[03:19:39] <cradek> 55 CF$Z86 0x48414c21 SHMEM 0xf8dfe000 0x00000000 0 65536 2
[03:19:42] <cradek> 62 RTGLBF 0x9ac6d9e5 SHMEM 0xf8f1d000 0x00000000 0 2097152 1
[03:19:45] <cradek> 74 PUFUQK 0x90280a48 SHMEM 0xf8c1b000 0x00000000 0 12288 2
[03:19:49] <jmkasunich> that's with the halcmd prompt showing?
[03:20:07] <cradek> yes I think it's still there, but that terminal is hung
[03:20:22] <cradek> 20176 pts/1 SL+ 0:00 bin/halcmd -f
[03:20:26] <cradek> yes it's still running
[03:20:42] <jmkasunich> if the halcmd was still going, then RT/UL under rtapi shared memory should be 1/1
[03:20:52] <jmkasunich> hal_lib on the RT side, and halcmd on the user side
[03:21:09] <cradek> I notice size is also different
[03:21:13] <cradek> 65500 vs 65536
[03:21:32] <jmkasunich> I request a little under 64K in case they're using a slab allocator
[03:21:40] <cradek> ah
[03:21:58] <jmkasunich> if I asked for exactly 64K and they add a few bytes of overhead, all of a sudden you get twice as much
[03:22:10] <cradek> right
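The sizing argument above can be sketched with a toy allocator model. This assumes an allocator that adds a fixed header to each request and then rounds up to a power-of-two bucket; the function names and the 16-byte overhead used in the example are hypothetical, and real allocators differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next power of two. Assumes n >= 1 and that the
 * result fits in size_t. */
static size_t demo_next_pow2(size_t n)
{
    assert(n >= 1);
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Toy model: the allocator adds `overhead` bytes of bookkeeping and then
 * serves the request from a power-of-two sized bucket. */
static size_t demo_bucket_size(size_t request, size_t overhead)
{
    return demo_next_pow2(request + overhead);
}
```

Under this model, demo_bucket_size(65536, 16) lands in the 128K bucket, while demo_bucket_size(65500, 16) still fits in 64K. That matches the 65500 requested in /proc/rtapi/shmem against the 65536 reported in /proc/rtai/names.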
[03:22:31] <jmkasunich> interesting that they have an accurate usage count
[03:23:04] <jmkasunich> I think the magic number and/or the version is getting stomped somehow
[03:23:31] <cradek> when I kill the halcmd, /proc/rtapi/shmem doesn't change
[03:23:48] <cradek> and also when I start another one
[03:23:53] <jmkasunich> yeah, somehow it didn't even know about the halcmd
[03:24:08] <cradek> sam@ubuntu:~/emc2.aj$ bin/halcmd -f
[03:24:08] <cradek> init_rtapi_data: MAGIC is ok, rev_code=529
[03:24:14] <cradek> but the halcmd thinks it's ok
[03:24:58] <cradek> I notice this machine has 1.5GB of ram
[03:25:05] <jmkasunich> wow, thats a lot
[03:25:41] <jmkasunich> I've been looking at code
[03:25:47] <skunkworks> it is a machine in limbo right now - it was a rip for a large image setter
[03:26:13] <jmkasunich> the only way to get the version mismatch message is for the magic number in the RTAPI region to be OK but the version to be munged
[03:26:24] <cradek> interesting
[03:26:39] <jmkasunich> especially since the version is immediately after the magic in the struct
[03:26:50] <cradek> seems pretty unlikely to get that magic # by chance
[03:27:17] <jmkasunich> yeah
[03:27:41] <jmkasunich> although after you've loaded rtapi once, there is at least one memory location that contains the magic
[03:27:54] <jmkasunich> (maybe - I think I might actually clear it on removal)
[03:28:43] <jmkasunich> nope (probably should tho, as long as I can be _SURE_ I'm the last one holding it)
[03:30:07] <jmkasunich> if the magic gets messed up, that isn't pretty either
[03:30:33] <cradek> you void* in rtapi_shmem_getptr can point anywhere in 1.5GB right? I think you can get 4GB with a four byte pointer?
[03:30:34] <jmkasunich> because the next time you load halcmd, it will redo the global init, stomping on rtapi internal data
[03:30:48] <cradek> youR
[03:31:36] <jmkasunich> should be able to point anywhere
[03:31:56] <cradek> ok
[03:32:03] <cradek> just thinking all that ram makes this machine unusual
[03:32:08] <jmkasunich> yeah
[03:32:26] <jmkasunich> rtapi does a lot of housekeeping
[03:32:40] <jmkasunich> most of which I haven't looked at in a couple years
[03:32:57] <cradek> obviously because it has always worked until now...
[03:33:12] <jmkasunich> except for the previous shared memory strangeness
[03:33:32] <cradek> that must have been before my time
[03:33:35] <jmkasunich> which was repeatable on multiple boxes, depended only on the kernel/rtai
[03:33:48] <jmkasunich> (that email with the test program in it)
[03:33:49] <skunkworks> I could pull some of the ram out of it tomorrow
[03:33:58] <cradek> yeah this one's pretty special because we know everything is the same as our boxes
[03:34:12] <cradek> skunkworks: it's only a shot in the dark...
[03:34:26] <cradek> skunkworks: in case you haven't noticed, I don't know what I'm talking about here :-)
[03:34:39] <skunkworks> I am hanging on for dear life
[03:34:42] <jmkasunich> you know, once something stomps on the block that rtapi uses for its data, things can get messy
[03:35:01] <jmkasunich> that is the 12288 block, not the 64K one
[03:35:28] <jmkasunich> (that is also where the magic and version codes are - this isn't even a HAL thing, it is either RTAPI or RTAI itself)
[03:36:03] <jmkasunich> (or bad memory, but what are the odds of us getting the same bad memory every time we ask for a block of shmem
[03:36:23] <cradek> also seems unlikely
[03:36:28] <cradek> if the box is stable otherwise
[03:36:41] <jmkasunich> cradek: look in dmesg or /var/log/messages, and see if any of alex's messages are in there
[03:36:55] <jmkasunich> that code he patched is common to both RT and user space
[03:38:47] <cradek> Feb 15 18:21:18 ubuntu kernel: [ 4871.207439] init_rtapi_data: initial rev_code=529
[03:38:51] <cradek> Feb 15 18:21:18 ubuntu kernel: [ 4871.207443] init_rtapi_data: rtapi_mutex_try() returned 0
[03:38:54] <cradek> Feb 15 18:21:18 ubuntu kernel: [ 4871.207447] init_rtapi_data: assigned rev_code=529
[03:38:57] <cradek> Feb 15 18:21:18 ubuntu kernel: [ 4871.207451] ird: #1 529
[03:38:59] <cradek> Feb 15 18:21:18 ubuntu kernel: [ 4871.207454] ird: #2 529
[03:39:02] <cradek> Feb 15 18:21:18 ubuntu kernel: [ 4871.207459] ird: #4 529
[03:39:04] <cradek> Feb 15 18:21:18 ubuntu kernel: [ 4871.207461] ird: #4 529
[03:39:07] <cradek> Feb 15 18:21:18 ubuntu kernel: [ 4871.207464] init_rtapi_data: rev_code=529
[03:39:09] <cradek> Feb 15 18:21:18 ubuntu kernel: [ 4871.207487] RTAPI: Init complete
[03:39:18] <jmkasunich> yesterday?
[03:39:22] <cradek> just a couple that look like these
[03:39:26] <cradek> yes that's yesterday
[03:39:54] <cradek> 0200 alex time? Was he really working on it then?
[03:40:14] <jmkasunich> dunno
[03:40:17] <skunkworks> he ended around 1:00 his time
[03:40:27] <cradek> right, he was in germany
[03:40:36] <skunkworks> said he had to go to bed - had to catch a plane - yes germany
[03:40:42] <skunkworks> :0
[03:41:15] <jmkasunich> interesting that the initial rev code is correct
[03:41:23] <cradek> jmkasunich: so if I get it to fail again, will I get more info with alex's debug output?
[03:41:55] <jmkasunich> that line is only executed if the magic does NOT match, which means (I thought) that we had an uninitialized shmem block
[03:42:16] <jmkasunich> actually, most of alex's output is normal
[03:42:26] <skunkworks> logger_devel: bookmark
[03:42:26] <skunkworks> See http://solaris.cs.utt.ro/irc/irc.freenode.net:6667/emcdevel/2006-02-17#T03-42-26
[03:42:27] <cradek> oh hey
[03:42:32] <cradek> sam@ubuntu:~/emc2.aj$ /etc/init.d/realtime start
[03:42:32] <cradek> sam@ubuntu:~/emc2.aj$ bin/halcmd show
[03:42:32] <cradek> init_rtapi_data: initial rev_code=529
[03:42:32] <cradek> init_rtapi_data: rtapi_mutex_try() returned 0
[03:42:32] <cradek> init_rtapi_data: assigned rev_code=529
[03:42:34] <cradek> ird: #1 529
[03:42:37] <cradek> ird: #2 0
[03:42:39] <cradek> ird: #4 0
[03:42:42] <cradek> ird: #4 0
[03:42:44] <cradek> init_rtapi_data: rev_code=0
[03:42:46] <jmkasunich> !wow!
[03:42:47] <cradek> RTAPI: ERROR: version mismatch 0 vs 529
[03:42:57] <cradek> that's repeatable, it does it over and over
[03:43:31] <jmkasunich> ok, I don't have alex's code
[03:43:43] <jmkasunich> where are the ird: #1 and #2?
[03:43:56] <jmkasunich> I'm looking at the unmodified rtapi_common.c
[03:44:01] <cradek> for (n = 0; n <= RTAPI_MAX_SHMEMS; n++) {
[03:44:04] <cradek> around this loop
[03:44:08] <cradek> rtapi_common.h
[03:44:30] <jmkasunich> #1 is after the tasks loop, before the shmem loop? and #2 is after the shmem loop?
[03:44:40] <cradek> yes
[03:45:47] <jmkasunich> that loop is clearing stuff that is pretty far away from the rev_code
[03:46:20] <jmkasunich> and holding a mutex while it does it...
[03:46:22] <cradek> obviously not in this case
[03:46:35] <cradek> one of those pointers is wrong?
[03:46:48] <jmkasunich> which pointers?
[03:47:20] <cradek> oh it's all inside the struct
[03:47:22] <cradek> hmm
[03:47:47] <cradek> can rtapi_print do %p?
[03:47:54] <jmkasunich> I think so
[03:48:05] <jmkasunich> this is halcmd causing this, right?
[03:48:07] <cradek> let me add some
[03:48:09] <cradek> yes
[03:48:10] <cradek> halcmd show
[03:48:18] <jmkasunich> rtapi.ko is loaded the whole time
[03:48:25] <jmkasunich> so magic should be set
[03:48:30] <cradek> rtapi 25664 1 hal_lib
[03:48:34] <jmkasunich> and this code should _NOT_ be running
[03:48:45] <cradek> oh!
[03:49:25] <cradek> are you positive it doesn't run for you?
[03:49:30] <cradek> can you put this same printf in yours?
[03:49:42] <jmkasunich> yes (similar anyway)
[03:52:30] <jmkasunich> I'm gonna print "data" too
[03:57:03] <skunkworks> Ok - I am going to have to call it a night. Is this going well?
[03:57:18] <cradek> skunkworks: well, we see things that look wrong, which is good
[03:57:24] <jmkasunich> heh
[03:57:25] <skunkworks> seems like you are getting closer
[03:57:36] <cradek> thanks for letting us play with your machine
[03:57:46] <jmkasunich> yeah, thanks
[03:57:54] <skunkworks> use the machine as long as you like or until it locks up ;)
[03:58:12] <cradek> it seems solid enough
[03:58:32] <jmkasunich> what is your email? (in case we want to ask you to remove some ram or something tomorrow)
[03:58:35] <skunkworks> I will talk to you guys tomorrow to see if you want me to change anything (memory)
[03:59:01] <skunkworks> you can email me at samcoinc@gmail.com
[03:59:18] <jmkasunich> ok
[03:59:47] <skunkworks> good luck - good night
[04:00:01] <jmkasunich> Feb 16 22:59:28 localhost kernel: [ 9048.888960] rtapi_init (RT): calling global init, data e0ae2000
[04:00:01] <jmkasunich> Feb 16 22:59:28 localhost kernel: [ 9048.888964] init_rtapi_data(): start
[04:00:01] <jmkasunich> Feb 16 22:59:28 localhost kernel: [ 9048.888967] ird: data: e0ae2000 magic 0
[04:00:01] <jmkasunich> Feb 16 22:59:28 localhost kernel: [ 9048.888970] ird: magic not right, initial rev 0
[04:00:02] <jmkasunich> Feb 16 22:59:28 localhost kernel: [ 9048.888975] ird: #1 data: e0ae2000 magic 308286473 rev 529
[04:00:02] <jmkasunich> Feb 16 22:59:28 localhost kernel: [ 9048.888981] ird: #2 data: e0ae2000 magic 308286473 rev 529
[04:00:03] <jmkasunich> Feb 16 22:59:28 localhost kernel: [ 9048.888984] rtapi_init (RT): rev code OK
[04:00:07] <jmkasunich> Feb 16 22:59:28 localhost kernel: [ 9048.888998] RTAPI: Init complete
[04:00:13] <jmkasunich> thats on the realtime side
[04:00:28] <jmkasunich> john@ke-main-ubuntu:~/emcdev/emc2testing/src$ halcmd show
[04:00:28] <jmkasunich> rtapi_init (RT): calling global init, data 0xb7f0c000
[04:00:28] <jmkasunich> init_rtapi_data(): start
[04:00:28] <jmkasunich> ird: data: 0xb7f0c000 magic 308286473
[04:00:28] <jmkasunich> init_rtapi_data: MAGIC is ok, rev_code=529
[04:00:30] <jmkasunich> rtapi_init (RT): rev code OK
[04:00:32] <jmkasunich> Loaded HAL Components:
[04:00:36] <jmkasunich> on the user side
[04:01:49] <cradek> data->rev_code at 0xb7f16004 val 529
[04:01:49] <cradek> data->rev_code at 0xb7f16004 val 529
[04:01:49] <cradek> data->shmem_array[n].bitmap[m] at 0xb7f17004
[04:01:49] <cradek> data->rev_code at 0xb7f16004 val 0
[04:01:49] <cradek> data->shmem_array[n].bitmap[m] at 0xb7f17008
[04:01:51] <cradek> data->rev_code at 0xb7f16004 val 0
[04:02:05] <cradek> that bitmap line is the one that nukes it
[04:02:09] <cradek> data->shmem_array[n].bitmap[m] = 0;
[04:02:23] <cradek> it writes a zero to b7f17004
[04:02:30] <cradek> but the value at b7f16004 is nuked
[04:02:36] <cradek> one bit different
[04:02:40] <jmkasunich> bad address line?
[04:02:53] <jmkasunich> time to run memtest86 all night
[04:03:03] <cradek> yeah maybe so
[04:03:09] <cradek> but none of this should be running at all?
[04:03:26] <jmkasunich> ?
[04:03:49] <cradek> you said something about this whole block of code shouldn't be running?
[04:04:03] <jmkasunich> well, it runs if magic is busted
[04:04:22] <jmkasunich> magic is at b7f16000
[04:04:30] <jmkasunich> and probably vulnerable to the same thing
[04:04:38] <cradek> right before rev_code?
[04:04:41] <jmkasunich> yes
[04:05:09] <cradek> let me see if b7f17000 is written to
[04:05:33] <cradek> I would be surprised/thrilled if this was a "simple" hardware problem
[04:05:40] <jmkasunich> first time, magic is wrong (expected), so the init code sets magic, then runs the rest of the init loop which busts magic and rev (or maybe only one of them, intermittently)
[04:05:57] <cradek> ok I see
[04:05:58] <jmkasunich> next time, if magic is ok but rev is wrong, we get the rev mismatch
[04:06:15] <jmkasunich> if magic is wrong, we re-run the init code and fsck up rtapi's accounting
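The failure modes jmkasunich describes can be modelled in a few lines. This is a simplified sketch, not the actual rtapi code: the magic and rev values are the ones printed in the log, while the dictionary layout, the `attach` function and the `users` counter are hypothetical stand-ins for rtapi's real bookkeeping.

```python
RTAPI_MAGIC = 308286473   # magic value printed in the log
REV_CODE = 529            # rev code printed in the log

def attach(data):
    """Hypothetical model of one process attaching to the shared block."""
    if data["magic"] != RTAPI_MAGIC:
        # Magic missing: treat the block as fresh and (re)initialize it,
        # wiping whatever bookkeeping was already there.
        data["magic"] = RTAPI_MAGIC
        data["rev_code"] = REV_CODE
        data["users"] = 0
        outcome = "initialized"
    elif data["rev_code"] != REV_CODE:
        # Magic looks fine but the rev code was clobbered.
        return "rev mismatch"
    else:
        outcome = "attached"
    data["users"] += 1
    return outcome

# Healthy memory: the first attach initializes, the second just attaches.
healthy = {"magic": 0, "rev_code": 0, "users": 0}
first = attach(healthy)
second = attach(healthy)

# One existing user, then a stray write zeroes rev_code or magic.
rev_clobbered = {"magic": RTAPI_MAGIC, "rev_code": 0, "users": 1}
magic_clobbered = {"magic": 0, "rev_code": REV_CODE, "users": 1}
rev_result = attach(rev_clobbered)
magic_result = attach(magic_clobbered)

print(first, second, rev_result, magic_result, magic_clobbered["users"])
```

In the `magic_clobbered` case the block is re-initialized while someone is still using it, so the model ends up counting one user instead of two, which is the kind of broken accounting described above.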
[04:06:36] <cradek> data->rev_code at 0xb7f34004 val 529
[04:06:41] <cradek> data->shmem_array[n].bitmap[m] at 0xb7f35004
[04:06:41] <cradek> data->rev_code at 0xb7f34004 val 0
[04:06:55] <cradek> different location this time, but still one bit different
[04:07:12] <cradek> SAME BIT
[04:07:15] <jmkasunich> yes
[04:07:37] <cradek> data->rev_code at 0xb7f7e004 val 529
[04:07:39] <cradek> data->shmem_array[n].bitmap[m] at 0xb7f7f004
[04:07:39] <cradek> data->rev_code at 0xb7f7e004 val 0
[04:07:47] <cradek> same bit
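The "same bit" observation can be checked by XOR-ing each pair of addresses from the pastes above. A minimal sketch using only the addresses in the log; the helper name `differing_bits` is made up for illustration.

```python
# Address pairs copied from the log:
# (address whose value got zeroed, address that was actually written).
pairs = [
    (0xb7f16004, 0xb7f17004),
    (0xb7f34004, 0xb7f35004),
    (0xb7f7e004, 0xb7f7f004),
]

def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    x = a ^ b
    return [i for i in range(x.bit_length()) if (x >> i) & 1]

for clobbered, written in pairs:
    print(hex(clobbered), hex(written), differing_bits(clobbered, written))
```

Each pair differs only in bit 12, that is, the two addresses are exactly 0x1000 (4096) bytes apart.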
[04:07:57] <cradek> smoking gun??
[04:08:10] <jmkasunich> looks smokey to me
[04:08:17] <cradek> "somewhat smokey"
[04:08:20] <cradek> haha
[04:08:46] <jmkasunich> its amazing that nothing else crashes
[04:08:51] <cradek> no kidding
[04:09:01] <cradek> I can happily build emc over and over
[04:09:13] <jmkasunich> unless RTAI allocates shmem from one end of memory, and linux allocates from the other
[04:09:43] <cradek> 0xb8000000 is pretty high
[04:09:58] <jmkasunich> do top, see if its using any swap
[04:10:07] <jmkasunich> I bet with all that ram, it never gets full
[04:10:14] <cradek> no swap in use
[04:10:15] <jmkasunich> so Linux never uses those addys
[04:10:49] <jmkasunich> back in the day of RTLinux, you needed to reserve space at end of phys memory for RT shmem
[04:10:55] <cradek> these addresses are at 3GB
[04:11:08] <jmkasunich> you don't any more, but I bet RTAI still allocates shmem from the end
[04:11:34] <cradek> but the bit that seems wrong is really low. Problems would show up everywhere.
[04:11:48] <cradek> every 4k
[04:12:10] <jmkasunich> he could have a bad DIMM or intermittent socket pin
[04:12:28] <cradek> yeah I guess dram works in strange ways
[04:12:28] <jmkasunich> so first 1G is fine, last 512M has a prob every 4K
[04:12:45] <jmkasunich> (assuming 3x512M dimms)
[04:13:03] <cradek> you know what
[04:13:09] <cradek> I could reboot it into memtest86
[04:13:29] <jmkasunich> and let him get the results tomorrow?
[04:13:30] <cradek> it could run all night, you send him an email describing what to look for, and he could report in the morning
[04:13:49] <jmkasunich> waitaminnit - how can you do that
[04:13:59] <jmkasunich> reboot, yes, but reboot <foo>?
[04:14:12] <cradek> I can use my powers only for good, not for evil
[04:14:24] <cradek> well this is assuming there's no CD or floppy in the machine
[04:14:46] <cradek> I would just set the "default" grub boot entry to memtest
[04:14:47] <jmkasunich> maybe we should just have him run memtest when he gets there?
[04:14:57] <jmkasunich> oh, I see
[04:15:06] <jmkasunich> except...
[04:15:12] <jmkasunich> never mind
[04:15:40] <jmkasunich> (was wondering "how will he get it out of memtest", then realized he has 30 seconds or whatever at the grub menu)
[04:15:40] <cradek> there's no floppy in it
[04:15:58] <cradek> yeah he just has to use the menu
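Changing the default boot entry amounts to editing the `default N` line in GRUB legacy's menu.lst so that N is the zero-based index of the memtest `title` entry. The sketch below does that on a string; the sample menu text, kernel paths and the function name `set_default_entry` are assumptions for illustration, not the contents of the machine being discussed.

```python
def set_default_entry(menu_text, wanted):
    """Point the 'default' line at the first title containing `wanted`."""
    lines = menu_text.splitlines()
    # Boot entries are counted from 0 in the order their 'title' lines appear.
    titles = [ln for ln in lines if ln.split()[:1] == ["title"]]
    for index, title in enumerate(titles):
        if wanted in title:
            break
    else:
        raise ValueError(f"no boot entry matching {wanted!r}")
    return "\n".join(
        f"default {index}" if ln.split()[:1] == ["default"] else ln
        for ln in lines
    )

# Hypothetical menu.lst contents, for illustration only.
sample_menu = """default 0
timeout 30

title Ubuntu, kernel 2.6.12-magma
kernel /boot/vmlinuz-2.6.12-magma root=/dev/hda1 ro

title Ubuntu, memtest86+
kernel /boot/memtest86+.bin
"""

new_menu = set_default_entry(sample_menu, "memtest")
print(new_menu)
```

The timeout line is left alone, so whoever is at the console still gets the menu and can pick the normal kernel again.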
[04:16:26] <cradek> what the heck, I'm going to do it
[04:16:31] <jmkasunich> ok
[04:16:36] <cradek> he'll know in the morning
[04:16:40] <jmkasunich> hours of memtest are a good thing
[04:16:50] <cradek> if there's red(?) on the screen it's bad
[04:16:58] <cradek> I think the errors are red iirc
[04:18:22] <jmkasunich> he did say the machine was used, maybe it just has "old" - memory needs reseated in sockets or something
[04:18:46] <cradek> I bet memtest will show this easily
[04:18:51] <jmkasunich> yeah
[04:18:53] <cradek> but memtest takes a long time on this much ram
[04:18:59] <cradek> I have a 6G machine at work
[04:19:05] <jmkasunich> I just wish there was a way we could see the results
[04:19:06] <cradek> it takes many hours per cycle
[04:19:17] <jmkasunich> hmm
[04:19:19] <cradek> yeah, but that's impossible until morning
[04:19:21] <jmkasunich> don't reboot it yet
[04:19:25] <cradek> ok
[04:19:35] <jmkasunich> remember my shmem test program?
[04:19:39] <cradek> yes
[04:19:51] <jmkasunich> so we make it get a big block (64K or so)
[04:19:56] <jmkasunich> and do our own little memtest
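A "little memtest" aimed at this particular symptom could write to each offset in the block and check that the offset 0x1000 away is left alone. The sketch below runs against an ordinary `bytearray` and against a simulated faulty memory, since real shared memory is not available here; the `AliasedMemory` class (which simply ignores address bit 12) and the function name `find_aliases` are hypothetical.

```python
BLOCK_SIZE = 64 * 1024   # "a big block (64K or so)"
STRIDE = 0x1000          # the address bit that differed in the log

class AliasedMemory:
    """Simulated fault: one address bit is ignored, so two offsets share a cell."""
    def __init__(self, size, dead_bit):
        self._cells = bytearray(size)
        self._dead_bit = dead_bit

    def __getitem__(self, i):
        return self._cells[i & ~self._dead_bit]

    def __setitem__(self, i, value):
        self._cells[i & ~self._dead_bit] = value

def find_aliases(mem, size, stride):
    """Return offsets where a write also changed the byte `stride` away."""
    bad = []
    for i in range(size):
        mem[i] = 0x55                  # background pattern
    for i in range(size):
        mem[i] = 0xAA                  # disturb one byte
        partner = i ^ stride
        if partner < size and mem[partner] != 0x55:
            bad.append(i)              # the partner byte changed too
        mem[i] = 0x55                  # restore the background
    return bad

good_errors = find_aliases(bytearray(BLOCK_SIZE), BLOCK_SIZE, STRIDE)
bad_errors = find_aliases(AliasedMemory(BLOCK_SIZE, STRIDE), BLOCK_SIZE, STRIDE)
print(len(good_errors), len(bad_errors))
```

On the simulated fault every offset is flagged, because each byte shares a cell with its partner; on healthy memory the list is empty.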
[04:20:11] <cradek> ahhhh
[04:20:26] <cradek> when I make -j on emc, all the gccs crash with a seg fault
[04:20:30] <cradek> it's ram
[04:20:48] <jmkasunich> -j runs em in parallel?
[04:20:59] <cradek> yes, all of the files at once
[04:21:12] <cradek> there are a thousand kernel oopses in the dmesg now
[04:21:16] <jmkasunich> smoking gun
[04:21:35] <cradek> [105492.130112] EIP: 0060:[<c0138a84>] Not tainted VLI
[04:21:35] <cradek> [105492.130115] EFLAGS: 00010202 (2.6.12-magma)
[04:21:35] <cradek> [105492.130133] EIP is at page_add_anon_rmap+0x18/0x5c
[04:21:45] <cradek> yep, smoke pouring out everywhere
[04:21:48] <jmkasunich> damned impressive troubleshooting sir!
[04:21:53] <cradek> haha
[04:21:57] <cradek> it's why I get the big bucks
[04:22:00] <cradek> oh, wait
[04:22:06] <jmkasunich> * jmkasunich takes off his hat and makes a sweeping bow
[04:22:14] <cradek> same to you
[04:22:18] <jmkasunich> I bow to the master
[04:22:21] <cradek> bah
[04:22:31] <jmkasunich> (now I'll just send you the hard ones ;-)
[04:22:39] <cradek> I'm going to reboot it before it crashes
[04:22:44] <cradek> it's probably quite fucked now
[04:23:03] <jmkasunich> into memtest? or better to not mess with menu.lst while its unstable?
[04:23:10] <cradek> yeah, into memtest
[04:23:18] <cradek> it seems ok as long as I don't run gcc :-)
[04:23:20] <cradek> vi is small
[04:23:33] <cradek> ok here it goes
[04:23:37] <cradek> any last words?
[04:23:42] <jmkasunich> nope
[04:24:10] <LawrenceG> use the force luke....
[04:24:16] <cradek> ok it's done
[04:24:41] <cradek> ha I forgot we were spamming a public channel all this time
[04:25:11] <cradek> I sure expected to find a software problem...
[04:25:15] <cradek> I'm happy it's not
[04:25:15] <LawrenceG> I like to see the masters at work
[04:25:20] <jmkasunich> ditto
[04:25:39] <jmkasunich> this is twice that we've had strange rtapi stuff, and it turned out to be something else
[04:25:58] <cradek> I'm glad it's not my rtai build... that's a pain in the neck
[04:27:51] <jmkasunich> so do you want to talk about setupconfig and configs/common? or do you want to go to sleep?
[04:28:57] <cradek> I think you pretty much answered my question
[04:29:08] <cradek> I think we should undo that mess while we can, I don't like it
[04:29:26] <jmkasunich> which part of the mess, the whole common/ thing?
[04:29:30] <cradek> yes
[04:29:40] <cradek> the fact that you can't copy a config to a different directory and have it work
[04:29:59] <cradek> the sample configs are there FOR copying
[04:30:18] <jmkasunich> "the fact that you can't copy it unless you use a special tool"
[04:30:26] <cradek> right
[04:31:23] <jmkasunich> I really hate the idea of client.nml, server.nml, emc.nml in every fscking sample dir tho
[04:31:38] <jmkasunich> maintenance nightmare
[04:31:53] <cradek> well let's look at this a different way
[04:32:04] <cradek> say we change the inis at install time to use an absolute path to common
[04:32:14] <cradek> suddenly, you can copy a sample config
[04:32:38] <cradek> if we want to update something in common, we can - do we want that to affect the previously copied configs, or just the samples?
[04:32:47] <jmkasunich> depends
[04:32:53] <jmkasunich> (NEFS)
[04:33:07] <cradek> I don't speak your crazy moon-language
[04:33:15] <cradek> I mean, what's NEFS?
[04:33:23] <jmkasunich> if the "something" is an NML file (which is rarely ever changed by the user) then we probably want to fix everybody
[04:33:50] <cradek> ok, another approach
[04:33:54] <jmkasunich> if its a file that they've modified, we don't want to stomp on their mods
[04:34:07] <cradek> the deb updater will NOT nuke a changed config file without asking
[04:34:15] <cradek> I tagged everything in /etc/emc2 as config files
[04:34:18] <jmkasunich> thats why setupconfig copies everything out of common into their dir when you do a new
[04:34:30] <cradek> so if some dummy edits a sample config, it won't get overwritten without asking them
[04:34:52] <jmkasunich> nice
[04:34:58] <cradek> ok, yet another approach
[04:35:06] <jmkasunich> that covers the debs, which seem to be (rightly) your focus
[04:35:13] <jmkasunich> not so good for rip or cvs checkout
[04:35:20] <cradek> I don't know or care what's in an nml file, I've never needed to change it
[04:35:34] <jmkasunich> right, it is almost never changed
[04:35:35] <cradek> so if we take NMLFILE=.... out of the inis, let's have emc do a reasonable default thing
[04:35:49] <cradek> if you need to do something different, you can specify an NMLFILE=
[04:35:57] <jmkasunich> I wouldn't go that far
[04:36:01] <cradek> :-)
[04:36:03] <cradek> brainstorming
[04:36:06] <jmkasunich> yeha
[04:36:08] <jmkasunich> yeah
[04:36:10] <cradek> that doesn't solve the core-stepper thing though.
[04:36:27] <jmkasunich> I don't have too much heartburn about the hal files
[04:36:54] <jmkasunich> it kinda sucks if we have multiples
[04:37:01] <jmkasunich> actually core-servo is worse
[04:37:11] <jmkasunich> there are probably only 2 configs that use core-stepper
[04:37:27] <cradek> yeah, it sucks, but it also sucks to not be able to use the normal system tools in a reasonable way to manipulate configs
[04:37:28] <jmkasunich> maybe even only one, in which case it shouldn't be in common anyway
[04:37:37] <jmkasunich> but core-servo is used by multiple configs
[04:37:41] <jmkasunich> agreed
[04:37:42] <cradek> I wish we could minimize both sucks
[04:38:25] <jmkasunich> I could see absolute paths for nmlfiles, and local copies for hal files (even if it means duplication)
[04:39:04] <jmkasunich> looks like the .var and .tbl files are already duplicated in every directory anyway
[04:39:23] <cradek> sometimes I think this whole thing is silly, john
[04:39:29] <cradek> every computer is hooked to one mill
[04:39:34] <cradek> you only need one config
[04:39:37] <jmkasunich> probably because of the desire to have stepper.tbl, ppmc.tbl, foo.tbl instead of emc.tbl
[04:39:38] <cradek> you only ever use one config.
[04:40:05] <jmkasunich> yep
[04:40:14] <jmkasunich> of course, you get 20 samples
[04:40:30] <cradek> sure, that's fine
[04:41:51] <jmkasunich> ok, given that it is silly, what do we do?
[04:42:02] <cradek> that's a good question
[04:42:16] <jmkasunich> drop common/, put everything in each sample config, and let them copy dirs as needed?
[04:43:03] <cradek> maybe dropping common is the first step
[04:43:08] <cradek> I don't know what the second step is, though
[04:43:19] <jmkasunich> drop pickconfig and setupconfig? ;-)
[04:43:40] <jmkasunich> nah, I think they are useful for aunt tillie types, if nobody else
[04:43:43] <cradek> pickconfig is good for trying the different GUIs if nothing else
[04:43:46] <jmkasunich> but simplify as much as possible
[04:44:05] <jmkasunich> you know, that is really the problem
[04:44:08] <cradek> honestly, unless we plan to have a full gui editor, I don't think we gain much from setupconfig
[04:44:14] <cradek> what is?
[04:44:41] <jmkasunich> we have at least three "dimensions" and we're trying to cover them with lots of samples
[04:44:55] <jmkasunich> but with a 3D space, coverage is sparse even with a lot of samples
[04:45:26] <jmkasunich> dimension 1: machine config - simple steppers, medium servo, complex mazak with toolchanger
[04:45:27] <cradek> yeah, I've thought that too
[04:45:56] <jmkasunich> dimension 2: I/O devices - motenc, stg, m5i20, vigilant, parport
[04:46:05] <jmkasunich> dimension 3: UI
[04:46:24] <cradek> worse: 1&2 are hardware, 3 is a user preference
[04:46:31] <cradek> I know ray (and you?) disagree
[04:46:38] <cradek> but I think gui is a user pref.
[04:47:01] <jmkasunich> user as opposed to integrator?
[04:47:18] <cradek> yes
[04:47:34] <jmkasunich> sometimes user and integrator are one and the same ;-)
[04:47:38] <cradek> sure
[04:48:01] <cradek> you know what else I think, and you're going to object: mm/inch is a user preference too
[04:48:16] <jmkasunich> no ;-)
[04:48:21] <jmkasunich> GUI inch/mm, yes
[04:48:26] <jmkasunich> default GUI units, yes
[04:48:29] <cradek> yes
[04:48:30] <jmkasunich> machine configs, no
[04:48:47] <jmkasunich> the values of HAL signals are in either mm or inches, you can't go changing that on a whim
[04:48:57] <cradek> I think it's absurd that we have to rewrite the entire latter to change the former
[04:49:00] <jmkasunich> INPUT_SCALE is either counts/mm or counts/in
[04:49:22] <cradek> (well we fixed that in AXIS)
[04:49:26] <cradek> (I think)
[04:49:31] <jmkasunich> isn't default GUI units set by one line in the ini file?
[04:49:40] <cradek> yeah the user units
[04:49:44] <jmkasunich> ok
[04:49:51] <jmkasunich> not a problem then
[04:49:59] <cradek> there are two possible values: 1 and 0.0393whatever, everything else breaks
[04:50:10] <jmkasunich> that is just fscked
[04:50:13] <cradek> but if you change those, you have to rewrite the rest of the damn ini
[04:50:20] <cradek> yes it is
[04:50:23] <jmkasunich> waitaminnit
[04:50:26] <jmkasunich> those aren't users units
[04:50:29] <jmkasunich> are they?
[04:50:32] <cradek> yes
[04:50:48] <cradek> number of user units per mm
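Taking "number of user units per mm" literally, the two values being discussed are 1 for a millimetre setup and 1/25.4 for an inch setup. A small sketch of that arithmetic; the variable and function names are made up for illustration and are not ini keys.

```python
MM_PER_INCH = 25.4

# User units per mm, under the definition given above.
UNITS_PER_MM_METRIC = 1.0
UNITS_PER_MM_INCH = 1.0 / MM_PER_INCH   # roughly 0.03937

def to_user_units(length_mm, user_units_per_mm):
    """Convert a length in mm to user units."""
    return length_mm * user_units_per_mm

print(UNITS_PER_MM_INCH)
print(to_user_units(50.8, UNITS_PER_MM_INCH))    # 50.8 mm expressed in inches
print(to_user_units(50.8, UNITS_PER_MM_METRIC))  # 50.8 mm expressed in mm
```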
[04:51:34] <jmkasunich> ok, there is units in [TRAJ] and in [AXIS]
[04:51:47] <cradek> yeah I don't know why you have to say it 4 times
[04:51:57] <cradek> those probably aren't even used
[04:52:08] <jmkasunich> cause Fred and Will are academics
[04:52:22] <jmkasunich> and were designing a very flexible program
[04:52:31] <cradek> also you specify LINEAR or ANGULAR for XYZABC, but half the code has hardcoded XYZ=linear ABC=angular
[04:52:57] <cradek> the ini is full of crap that we don't need, and that makes the configuration process obtuse
[04:52:57] <jmkasunich> part of that comes from a failure to distinguish between axes and joints
[04:53:21] <cradek> hmm.
[04:53:22] <jmkasunich> axes - Cartesian space coordinates, three linear, three angles
[04:53:28] <jmkasunich> joints - machine coordinates
[04:53:29] <cradek> I think I'm just complaining now
[04:53:39] <jmkasunich> for a trivkins machine they are the same
[04:53:48] <cradek> yeah I know the difference, but I've written code that doesn't (knowing full well what I was doing)
[04:54:15] <jmkasunich> emc does and does not know the difference
[04:54:20] <cradek> yep
[04:54:21] <jmkasunich> (at the same time!)
[04:54:45] <jmkasunich> if you have trivial kins, then a huge amount of the ini file is redundant
[04:54:53] <jmkasunich> if you have non-trivial kins, you need it
[04:54:53] <cradek> seems we could maybe have a simple ini format and a complex ini format
[04:55:20] <cradek> the simple ini format could be fully specified with one gui form (one screen)
[04:55:56] <cradek> scale, vel, accel * 3, units (pick from 2), gui (pick from 3) ...?
[04:56:38] <cradek> PERIOD (actually you would specify your machine's MHz)
[04:56:41] <jmkasunich> a few others, but I get your point
[04:56:47] <jmkasunich> max feed override
[04:56:54] <cradek> yeah
[04:57:20] <jmkasunich> but you are right, 80% of the ini file is stuff that joe average user never changes
[04:57:29] <cradek> if we had that, even for just steppers, it would be a big step
[04:57:43] <cradek> what percentage of our users have steppers? 90?
[04:58:01] <cradek> if you have servos you have to be much, much more aware of what's going on because you have to tune them
[04:58:27] <cradek> we can probably concentrate on the ease of use for the stepper people.
[04:58:32] <jmkasunich> this gets back to where I was going with my "3D" comments
[04:58:53] <cradek> I follow you
[04:59:06] <jmkasunich> instead of trying to cover the space with samples, provide a wizard/script/whatever that asks questions and generates the ini
[04:59:25] <jmkasunich> stepper/servo? branch based on that
[04:59:37] <cradek> and puts the ONE ini in the place where it goes, wherever that is
[04:59:39] <jmkasunich> which GUI? branch based on that
[04:59:59] <jmkasunich> we still need to support multiples, for weird people
[05:00:26] <cradek> we tend to confuse the tillies and the power users
[05:00:36] <jmkasunich> (like us - we might find ourselves loading somebody elses config to help them, or we want a working config and a sim)
[05:00:37] <jmkasunich> yeah
[05:00:41] <cradek> we want to make it easy for tillie but maximally powerful for the power users
[05:00:46] <cradek> that is not a reasonable goal.
[05:01:06] <cradek> power users may not even want our gui.
[05:01:11] <jmkasunich> and the bias on that scale from tilly to power depends on who you asn (and when)
[05:01:12] <cradek> we don't have to concentrate on ease for them.
[05:01:28] <jmkasunich> s/asn/ask
[05:01:41] <cradek> yes. the person you ask will tend to be on the opposite end of the scale!
[05:01:47] <cradek> (whichever you pick)
[05:01:50] <jmkasunich> heh
[05:02:05] <cradek> probably because it's easier to argue than do
[05:02:06] <cradek> I do it too
[05:02:24] <jmkasunich> unfortunately I think we need to focus on the tillies, because those are the ones that will make us want to commit mayhem
[05:02:36] <cradek> not just because of that
[05:02:50] <jmkasunich> I was quite impressed with Willie Walker the other day
[05:02:56] <jmkasunich> never heard of him before
[05:02:56] <cradek> because tillies are the ones who will go buy a xylotex and just want the thing to work without any screwing around.
[05:03:14] <jmkasunich> he pops up on list with a good description of his problem and what he's already tried to fix it
[05:03:22] <jmkasunich> he responds well to our advice
[05:03:33] <jmkasunich> and he succeeds
[05:03:39] <jmkasunich> we may never hear from him again
[05:03:51] <cradek> not sure I remember him
[05:04:05] <jmkasunich> needed to debounce his limit switches
[05:04:08] <cradek> oh right
[05:04:14] <cradek> he was sure cheery when I tried to help him
[05:04:21] <cradek> a nice guy, I bet
[05:04:36] <jmkasunich> meanwhile there was another guy with limit switch problems, he gave us nothing to work with
[05:04:53] <cradek> there's a whole range of people out there...
[05:05:07] <jmkasunich> I prefer to deal with the smart ones
[05:05:11] <jmkasunich> dumb people annoy me
[05:05:12] <cradek> maybe we have to work on accommodating the ones on the "far" end.
[05:05:37] <jmkasunich> I know..... but :-(
[05:05:44] <cradek> maybe I'll write an ini generator
[05:05:53] <cradek> totally standalone
[05:06:03] <cradek> it just has to write a file, maybe copy some others
[05:06:38] <jmkasunich> it would be nice if it used a template of sorts
[05:06:58] <jmkasunich> so you don't have to rewrite the program to extend it
[05:07:06] <cradek> configure --with-x-maximum-velocity=1.2 --with-x-acceleration=20
[05:07:20] <cradek> sorry, kidding
[05:07:23] <jmkasunich> lol
[05:07:43] <jmkasunich> I was thinking about things like having comments in the generated ini
[05:07:51] <jmkasunich> if you had a template ini
[05:07:55] <jmkasunich> with things like:
[05:08:06] <cradek> if the program is simple (not f-ing tcl) extending it would be as easy as editing a template
[05:08:48] <jmkasunich> GUI = {choices:axis, tkemc, mini;descriptions: new and fancy, blue, fscking huge window}
[05:08:59] <cradek> hehehe
[05:09:10] <jmkasunich> params that you don't need to prompt the user about would just be copied
[05:09:24] <jmkasunich> as would the comments
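The template idea can be sketched as a small renderer: lines carrying a `{choices:...;descriptions:...}` marker are filled in from the user's answers, and every other line, comments included, is copied through unchanged. The marker syntax follows the example typed above; the function names, the sample template lines and the sample value are assumptions for illustration.

```python
import re

# KEY = {field:a, b;field:c, d} marks a line the user must be asked about.
ENTRY_RE = re.compile(r"^(\w+)\s*=\s*\{(.*)\}\s*$")

def parse_prompt(line):
    """Return (key, fields) for a prompt line, or None for a plain line."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    fields = {}
    for part in m.group(2).split(";"):
        name, _, value = part.partition(":")
        fields[name.strip()] = [v.strip() for v in value.split(",")]
    return m.group(1), fields

def render(template_lines, answers):
    """Fill prompt lines from `answers`; copy everything else verbatim."""
    out = []
    for line in template_lines:
        parsed = parse_prompt(line)
        if parsed is None:
            out.append(line)           # comments and fixed params pass through
            continue
        key, fields = parsed
        value = answers[key]
        if value not in fields["choices"]:
            raise ValueError(f"{value!r} is not a valid choice for {key}")
        out.append(f"{key} = {value}")
    return out

# Hypothetical template lines, for illustration only.
template = [
    "# which front end to start",
    "GUI = {choices:axis, tkemc, mini;descriptions:new and fancy, blue, huge window}",
    "MAX_FEED_OVERRIDE = 1.2",
]

generated = render(template, {"GUI": "axis"})
print("\n".join(generated))
```

Because the choices live in the template rather than in the program, adding another GUI would only mean editing the template line.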
[05:09:42] <cradek> I'd prefer to lay out a screen with all the choices in one place though - wizards are irritating for simple things
[05:09:53] <cradek> with that in mind, it's hard to run from a template
[05:09:58] <jmkasunich> yeah
[05:10:05] <cradek> I hate being asked one question at a time
[05:10:11] <jmkasunich> other things wouldn't work for templates either
[05:10:14] <cradek> you should be able to see all related things at once.
[05:10:17] <jmkasunich> scaling - you need to do math
[05:10:40] <jmkasunich> let them fill in things like steps/rev, microstepping, gear ratio, and thread pitch
[05:11:10] <cradek> degrees per step
[05:11:20] <jmkasunich> heh, both are used
[05:11:21] <cradek> (the thing that's written on the motor)
[05:11:23] <cradek> oh
[05:11:31] <cradek> well they could enter either.
[05:12:01] <jmkasunich> radiobutton (x) steps/rev (_) degrees/step
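The scaling math mentioned here is a chain of multiplications from motor steps to distance travelled. A minimal sketch, assuming a stepper driving a leadscrew; the function and parameter names are made up for illustration, and the example numbers (1.8 degree motor, 8x microstepping, direct drive, 20 turns per inch) are hypothetical.

```python
def steps_per_rev_from_degrees(degrees_per_step):
    """Convert the degrees/step figure on the motor label to steps/rev."""
    return 360.0 / degrees_per_step

def steps_per_unit(steps_per_rev, microsteps, motor_revs_per_screw_rev,
                   screw_revs_per_unit):
    """Step pulses needed to move one user unit (inch or mm).

    steps_per_rev            full steps per motor revolution
    microsteps               microsteps per full step (1 for full stepping)
    motor_revs_per_screw_rev gear or belt reduction between motor and screw
    screw_revs_per_unit      screw turns per unit of travel (e.g. TPI)
    """
    return (steps_per_rev * microsteps
            * motor_revs_per_screw_rev * screw_revs_per_unit)

# Hypothetical machine: 1.8 degree motor, 8x microstepping,
# direct drive, 20 turns-per-inch leadscrew.
full_steps = steps_per_rev_from_degrees(1.8)
scale = steps_per_unit(full_steps, 8, 1.0, 20.0)
print(full_steps, scale)
```

Accepting either steps/rev or degrees/step, as suggested above, only changes how the first argument is obtained.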
[05:12:41] <jmkasunich> the idea sounds good for basic machines
[05:12:50] <jmkasunich> a lot harder as things get more complex
[05:13:02] <jmkasunich> like the guy who needed to debounce his limit switches
[05:13:28] <cradek> yeah, that's outside the realm of this hypothetical program.
[05:14:24] <cradek> it would be nice if, without editing files, someone could get some reasonable steps/dirs out the parport.
[05:14:40] <jmkasunich> yeha
[05:14:44] <jmkasunich> yeah
[05:14:57] <cradek> then, that might snag them long enough to figure out how to get their limit switches working
[05:15:05] <cradek> and then they're committed
[05:15:05] <jmkasunich> right now, really the only thing they have to change is scale and maybe vel/acc
[05:15:18] <cradek> xylotex or standard pinout
[05:15:23] <jmkasunich> yes
[05:15:27] <cradek> inch or mm
[05:15:33] <cradek> very basic things
[05:16:11] <jmkasunich> probably one screen, at worst 4 tabs (one for general, one for each axis with scaling, accel, and velocity stuff)
[05:16:54] <jmkasunich> more brainstorming
[05:17:36] <jmkasunich> what if we keep setupconfig, give it the existing new, backup, restore commands
[05:17:57] <cradek> does setupconfig work today?
[05:17:58] <jmkasunich> but new then gives you a choice of copying a template, OR invoking the program you are describing
[05:18:33] <jmkasunich> it did and maybe still does for RIP, but needs fixed to understand paths and permissions for installed
[05:18:49] <jmkasunich> and of course if common goes away a lot of cruft can come out of it
[05:19:02] <cradek> I didn't know it was so close to being done.
[05:19:06] <jmkasunich> anyway, the program you are describing could of course be invoked alone
[05:19:13] <cradek> if we polish it up can we have a release?
[05:19:18] <jmkasunich> backup works (I think) restore no
[05:19:21] <jmkasunich> new works
[05:20:01] <cradek> hey I just had a ridiculous idea
[05:20:12] <cradek> a web-based ini generator
[05:20:14] <jmkasunich> given the time it took to make pickconfig, I suspect it would take a very busy weekend or a week to get setupconfig to a similar level of doneness
[05:20:23] <cradek> you fill out the web form, it gives you your ini to download
[05:20:48] <jmkasunich> how does that compare to the program you were just describing?
[05:20:51] <jmkasunich> both have forms
[05:20:57] <jmkasunich> both generate ini files
[05:21:03] <cradek> it's the same. different type of programming.
[05:21:05] <jmkasunich> which is harder to write?
[05:21:38] <cradek> for me the web is probably easier, but it can't directly manipulate the files on the user's computer.
[05:21:46] <cradek> just a thought.
[05:22:01] <jmkasunich> web is easier?
[05:22:02] <cradek> the benefit is web forms are so familiar to everybody.
[05:22:13] <cradek> probably. not much gui to design.
[05:22:41] <cradek> who am I kidding? they'd both be harder than it seems like they should be.
[05:22:58] <jmkasunich> yep
[05:23:09] <cradek> we need some more volunteers.
[05:23:10] <jmkasunich> thats what happened with setupconfig
[05:23:25] <jmkasunich> the GUI code started to overwhelm the actual "stuff" that it does
[05:23:28] <cradek> 2-3-4 of us aren't enough to make this project
[05:23:38] <cradek> yeah, that's because tcl is awful.
[05:23:50] <jmkasunich> what is better?
[05:24:00] <cradek> I really need to figure that out.
[05:24:17] <cradek> python is better than tcl. gtk+python may be the way to go, I'm not sure.
[05:24:23] <cradek> wxpython might be ok
[05:24:34] <jmkasunich> I'm a C programmer who's been dragged kicking and screaming into GUI stuff
[05:24:37] <cradek> gtk+glade+C seems not hard to use
[05:24:52] <jmkasunich> scope is GTK + C
[05:24:57] <cradek> I'd take lisp+gtk if I had it
[05:24:57] <jmkasunich> I didn't use glade
[05:25:05] <cradek> I'd take anything at ALL over tcl
[05:25:23] <cradek> hmm, except maybe perl
[05:25:43] <jmkasunich> is perl the one where indenting counts? or is that python?
[05:25:51] <cradek> I should write a simple app in each of these and find out which sucks the least
[05:26:01] <cradek> yes python uses indentation where C uses { }
[05:26:25] <cradek> seems strange for only about the first five minutes
[05:26:44] <cradek> of course you need a decent editor to make it easy to work with
[05:26:58] <jmkasunich> I just want us to release the damned thing so I can go back to the stuff I really want to write
[05:27:14] <jmkasunich> VCP and an associated HAL<->NML UI thing
[05:27:40] <cradek> it's a bit silly that you're stuck writing things like setupconfig.
[05:27:51] <jmkasunich> like you said, we need more people
[05:28:00] <cradek> yep
[05:28:00] <jmkasunich> it seems like its always 2-3 people
[05:28:09] <jmkasunich> the names change, but the list never grows
[05:28:15] <cradek> and all in the board now, which is odd
[05:28:16] <jmkasunich> Fred/Will
[05:28:23] <jmkasunich> then Matt/Ray
[05:28:29] <jmkasunich> then Paul/Me
[05:28:39] <jmkasunich> then Alex/Paul/Me
[05:28:52] <jmkasunich> now Alex/You/Jepler/me
[05:28:59] <jmkasunich> (so I guess we are getting better)
[05:29:06] <jmkasunich> there are others as well
[05:29:09] <cradek> I do see a bit of increase there
[05:29:31] <jmkasunich> not so odd really
[05:29:43] <jmkasunich> the people who get elected are those who are seen as getting things done
[05:29:54] <cradek> true.
[05:30:20] <cradek> before I ran, I voted for the ones I saw hanging out on irc answering questions. That was my only metric.
[05:30:40] <jmkasunich> name recognition ;-)
[05:30:50] <cradek> not really - if they didn't care to help people use the software, they didn't get my vote.
[05:31:09] <jmkasunich> my helpfulness varies
[05:31:32] <cradek> you just spent your evening helping
[05:31:52] <jmkasunich> because it seemed there was a bug in code that I consider mine
[05:32:17] <cradek> that's exactly when your help is needed
[05:32:55] <jmkasunich> I must admit that I ignored the complaints about BASE_PERIOD far longer than I should have
[05:33:03] <jmkasunich> that only took one evening to find and fix
[05:33:14] <jmkasunich> but it was at least a week after it was first reported
[05:33:42] <cradek> sometimes it's hard to take seriously the first few reports
[05:34:00] <cradek> bug reporting, etc, is not an exact science
[05:34:04] <jmkasunich> true
[05:34:23] <jmkasunich> the nature of the report makes a huge difference
[05:34:31] <cradek> sometimes a real bug report is mixed up in lots of other things, like gene's recent problem with the makefile
[05:34:57] <jmkasunich> between "jeez, what an idiot, he probably fscked it up" and "hmm, that looks real, and there's enough info here that I might be able to find the problem"
[05:35:06] <cradek> exactly right
[05:35:41] <jmkasunich> it really was a joy working with the bouncy limit switch guy
[05:35:52] <cradek> yeah, I thought so too
[05:36:07] <cradek> we've had several of those
[05:36:22] <jmkasunich> he understood HAL enough to hook up the limits on his own, used scope quite well for someone not accustomed to such things...
[05:36:40] <cradek> they pop in, give a great bug report, get an answer right away (because the report is so good), they're gone
[05:37:45] <cradek> like the guy who reported the accel problem when the axes/traj were different - it made me want to fix it for him, he'd gone to the trouble of figuring out exactly what was going on
[05:38:00] <jmkasunich> yeah
[05:38:15] <cradek> I had known it was somehow wrong for a long time, but never bothered to change the numbers the dozen different ways necessary to figure it out
[05:38:28] <jmkasunich> here is the _other_ guy with limit problems: I have a fresh install of BDI 4.38. I haven't even hooked up the machine
[05:38:28] <jmkasunich> to the computer yet and I am getting the error that says hardware limit
[05:38:28] <jmkasunich> error on axis 0,1, and 2. Nothing has even been hooked up to the
[05:38:28] <jmkasunich> computer yet how can this be. Any thoughts?
[05:39:00] <cradek> ouch.
[05:39:07] <jmkasunich> yeah, I had some thoughts when I read that, but I wasn't gonna share them on a public mailing list
[05:39:09] <cradek> not a good grasp of what might be going on...
[05:39:31] <cradek> I have a programmer at work who's like that
[05:39:44] <cradek> I so want to say "how about you try troubleshooting?"
[05:39:55] <jmkasunich> he might hurt himself
[05:40:10] <cradek> well, he'd give me that look
[05:40:13] <cradek> you know the one
[05:40:18] <cradek> haha
[05:40:21] <cradek> I better get to bed
[05:40:25] <jmkasunich> same here
[05:40:29] <cradek> goodnight
[05:40:30] <jmkasunich> its tomorrow already
[05:40:33] <cradek> did you email skunks?
[05:40:38] <jmkasunich> oops, no
[05:40:48] <jmkasunich> I'll do that now
[05:40:57] <cradek> I'm anxious to hear what he finds in the morning
[05:41:11] <cradek> and I bet he'll have it fixed and working before the end of the day
[05:41:35] <jmkasunich> and then we get to post to the list (so folks know it wasn't rtapi ;-)
[05:41:50] <cradek> ha
[05:42:28] <cradek> or ubuntu!
[05:42:40] <jmkasunich> right
[05:42:50] <cradek> btw there was a security update, so I built new kernel packages
[05:43:05] <cradek> let me know if the update gives you any troubles please
[05:43:27] <jmkasunich> I saw the updates, they're downloaded and installed, but I imagine I won't use them until I reboot
[05:43:35] <cradek> oh done already? cool
[05:43:41] <jmkasunich> (tomorrow, I power down overnight except on the weekends)
[05:44:25] <cradek> ok, goodnight now
[05:44:29] <jmkasunich> night
[05:48:34] <jmkasunich> sent
[13:07:01] <alex_joni> logger_devel, bookmark
[13:07:01] <alex_joni> See http://solaris.cs.utt.ro/irc/irc.freenode.net:6667/emcdevel/2006-02-17#T13-07-01
[13:08:07] <skunkworks> lobber_devel: bookmark
[13:08:21] <skunkworks> oops
[13:08:29] <skunkworks> logger_devel: bookmark
[13:08:29] <skunkworks> See http://solaris.cs.utt.ro/irc/irc.freenode.net:6667/emcdevel/2006-02-17#T13-08-29
[13:29:57] <alex_joni> skunkworks: use Tab expansion, just like in *nix
[13:30:43] <skunkworks> - ok I have no clue what you just said there
[13:31:20] <alex_joni> when you try to execute a command in linux, do you always type the whole name, or do you type the first few letters then hit the 'Tab' key?
[13:31:32] <alex_joni> try it now, type 'sk' and then hit tab
[13:31:45] <alex_joni> your IRC software should expand it to skunkworks
[13:32:04] <skunkworks> that is cool - thanks
[13:32:15] <alex_joni> the same thing works in command line mode
[13:32:21] <alex_joni> for folders, commands, etc
[13:32:47] <skunkworks> so it looks at what is currently up and sees what matches?
[13:33:13] <alex_joni> if there is more than one possible expansion, try pressing Tab twice, it will give you a list of the possible expansions
[13:33:16] <alex_joni> skunkworks: yes
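The Tab expansion alex_joni describes is essentially prefix matching against the known names. A minimal sketch, assuming a case-insensitive match; the function name `complete` and the nick list are illustrative, not taken from any particular IRC client or shell:

```python
def complete(prefix, candidates):
    """Return (completion, alternatives) for a typed prefix.

    If exactly one candidate starts with the prefix, it is the completion.
    Otherwise no completion is made and every match is listed, which is
    roughly what a second Tab press shows in a shell.
    """
    matches = sorted(c for c in candidates if c.lower().startswith(prefix.lower()))
    if len(matches) == 1:
        return matches[0], []
    return None, matches


# Hypothetical list of nicks currently in the channel.
nicks = ["skunkworks", "cradek", "jmkasunich", "alex_joni", "rayh", "SWPadnos"]

print(complete("sk", nicks))  # unique match: expands to the full nick
print(complete("s", nicks))   # ambiguous: nothing expanded, matches listed
```

Real clients differ in the details (mIRC, for example, cycles through matches on repeated Tab presses instead of listing them), so this only illustrates the matching step.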
[13:34:12] <skunkworks> double tabbing doesn't work in mirc
[13:34:20] <skunkworks> multiple tabbing
[13:34:49] <skunkworks> but it is a start - will have to try it in whatever irc client is in linux
[13:36:30] <skunkworks> wow that is neat. It will help my bad spelling ;)
[13:37:25] <skunkworks> ok - I am going to reboot this thing - it was on pass 4 - 0 errors
[13:38:06] <skunkworks> I will take one of the chips out and run emc - was emc broken the last you remember or should I be able to run it?
[13:38:11] <skunkworks> alex_joni?
[13:40:12] <skunkworks> I have the boot screen they were talking about - do I pick the 2.6.12 magma (top one)?
[13:55:28] <alex_joni> sorry, was away
[13:55:40] <alex_joni> yes
[13:55:48] <alex_joni> you pick the 2.6.12-magma
[13:56:19] <alex_joni> skunkworks: you should be able to run emc (from the GUI)
[13:58:33] <skunkworks> ok
[13:58:37] <skunkworks> got side tracked
[14:12:28] <cradek> morning
[14:12:42] <rayh> Hi Chris
[14:12:56] <rayh> Say, TESTING helps a LOT.
[14:13:17] <cradek> great
[14:13:55] <rayh> I got roltek running with both installed and rip.
[14:14:22] <rayh> He's seeing some problems with his cdrom burner yet.
[14:14:23] <cradek> moving TESTING is almost like a mini-release
[14:14:52] <cradek> I'll make new ubuntu packages at the same time, so everyone gets to test the same thing, no matter how they choose to update
[14:15:03] <rayh> I'll try to get an install going here in the next couple of days.
[14:15:22] <cradek> I still haven't tried to fix my CD burner either
[14:15:27] <rayh> Will there be an easy way for someone to see which testing they have.
[14:15:42] <cradek> yes, it'll be in help/about
[14:15:49] <rayh> Ah. Great.
[14:16:11] <alex_joni> hi guys
[14:16:18] <alex_joni> * alex_joni managed to finish stuff for now ;)
[14:16:19] <cradek> help/about also tells if you are NOT running a testing version
[14:16:21] <rayh> Hi alex
[14:16:24] <alex_joni> so I'm back ;)
[14:16:25] <cradek> are you back home?
[14:16:27] <cradek> yay
[14:16:43] <alex_joni> cradek: my main server crashed yesterday, while I was away.. that was a PITA
[14:16:49] <cradek> ouch
[14:16:53] <skunkworks> cradek - I ran emc with the full memory - it didn't start. removed the 512mb stick booted and emc started. Reinstalled the 512mb stick - emc didn't start. I have another 512 I can put in to see.
[14:16:54] <cradek> that always happens when we're away doesn't it
[14:17:08] <cradek> skunkworks: that's great news
[14:17:11] <alex_joni> the hdd with the home's wasn't working any more
[14:17:13] <alex_joni> skunkworks: yay
[14:17:27] <alex_joni> so I guess that makes cradek & jmk a bit smarter than memtest86
[14:17:42] <cradek> skunkworks: it was not just an emc problem - when I loaded the machine it royally crapped out... you could try that again too.
[14:17:44] <alex_joni> cradek: glad you managed to take over from where I left that
[14:17:49] <cradek> be back in a bit
[14:19:23] <alex_joni> cradek: ok
[14:29:43] <alex_joni> rayh: hi there
[14:34:48] <rayh> Back home from your travels?
[14:35:00] <alex_joni> yeah, but only to find a mess over here
[14:35:13] <rayh> Your computers?
[14:35:20] <alex_joni> I arrived last night, and now I just finished my first day to get things back together
[14:35:32] <alex_joni> rayh: our server (and mainly the /home HDD)
[14:35:40] <rayh> Ouch.
[14:35:43] <alex_joni> about 80G
[14:35:56] <alex_joni> of data, but it's recovered, and back operational now
[14:36:06] <rayh> Fantastic.
[14:36:11] <alex_joni> good thing I had a spare server I started to set-up last week
[14:36:21] <alex_joni> so I only had to swap them remotely yesterday ;)
[14:36:30] <rayh> I guess. What went wrong?
[14:36:37] <skunkworks> ok - I just put a different 512 back in and it is not starting
[14:37:11] <alex_joni> skunkworks: that might prove to be interesting
[14:37:37] <alex_joni> can you live with 1G for now?
[14:38:08] <skunkworks> yes - do you think that there is some odd limit in the rt/emc2?
[14:38:19] <alex_joni> cradek: maybe a bug in the kernel accessing stuff over 1G ?
[14:38:26] <alex_joni> skunkworks: it's emc independent
[14:38:36] <alex_joni> even gcc started to crash when using lots of memory
[14:42:50] <alex_joni> skunkworks: can you try and swap the mem chips? the 512 and the 1024, I mean
[14:43:03] <skunkworks> that is what I am doing
[14:43:08] <skunkworks> right as we speak
[14:43:10] <skunkworks> booting
[14:44:49] <alex_joni> ok
[14:44:59] <alex_joni> hi sam_
[14:45:10] <sam_> Starting emc...
[14:45:10] <sam_> HAL: ERROR: pin 'axis.0.motor-pos-cmd' not found
[14:45:10] <sam_> HAL:5: link failed
[14:45:10] <sam_> HAL config file /etc/emc2/sample-configs/sim-AXIS//../common/core_sim.hal failed.
[14:45:10] <sam_> Shutting down and cleaning up EMC...
[14:45:19] <alex_joni> ok, so same thing
[14:45:21] <sam_> yes
[14:45:45] <sam_> so it doesn't seem to be the physical memory but the amount
[14:45:55] <alex_joni> it might be the chipset
[14:46:05] <alex_joni> and the address it tries to write/read to/from
[14:46:32] <alex_joni> because only one bit is always the problem (reading the discussion of cradek & jmk) on the higher address space
[14:46:43] <alex_joni> so if you use less memory you don't get in that address space
[14:46:59] <alex_joni> if you use more, then EMC internal stuff ends up there, and is affected by the bug
[14:47:12] <alex_joni> but I am VERY surprised it doesn't show up in memtest86
[14:48:36] <sam_> odd
[14:49:24] <alex_joni> indeed
[14:49:39] <alex_joni> can you put 2G in?
[14:49:47] <alex_joni> you said you had 2 512 chips
[14:49:49] <sam_> yes
[14:49:56] <sam_> hold on - I will do that
[14:50:03] <alex_joni> try that, it might get emc to work
[14:51:09] <skunkworks> crap - no I can't. I thought there were 3 slots in there but there are only 2
[14:51:23] <alex_joni> well.. that's it
[14:51:29] <alex_joni> use the 2x512 ;)
[14:51:34] <skunkworks> it doesn't shut down correctly either
[14:51:34] <alex_joni> and you have a spare 1G
[14:51:43] <alex_joni> that is expectable
[14:51:48] <skunkworks> the screen goes wacky
[14:51:57] <alex_joni> in order to shutdown an ATX box, you need ACPI in the kernel
[14:52:11] <alex_joni> but ACPI & RT don't mix well
[14:52:20] <alex_joni> so RT kernels have ACPI disabled by default
[14:52:29] <alex_joni> skunkworks: was emc running when you shut down?
[14:52:40] <skunkworks> normally it goes through the text shut down - but with the extra memory and emc crash it doesn't
[14:52:52] <alex_joni> oh, then I'm not worried ;)
[14:52:56] <skunkworks> Probably was - how do you stop it?
[14:53:00] <alex_joni> extra memory means linux might crash too
[14:53:28] <alex_joni> keep the power button pressed for more than 4 seconds
[14:53:35] <alex_joni> or pull the power cord ;)
[14:54:10] <skunkworks> right - I meant before I shut down - I thought you guys were stopping the left over emc stuff
[14:54:23] <alex_joni> you could try this:
[14:54:36] <alex_joni> /usr/bin/emc_module_helper remove motmod
[14:54:47] <alex_joni> /etc/init.d/realtime stop
[14:54:55] <skunkworks> ok
[14:55:32] <alex_joni> * alex_joni heads home..
[14:55:47] <skunkworks> I don't know how old the bios is - could that affect it - if there was a bug?
[14:58:02] <alex_joni> probably not, I suspect a HW problem
[14:58:14] <alex_joni> I'll be back later
[15:03:55] <cradek> skunkworks: I've had a P4 machine with bad cache (on the processor itself)
[15:04:11] <cradek> the problem with your machine is not necessarily in the ram modules
[15:04:27] <cradek> it could be in the processor or on the motherboard too.
[15:08:03] <skunkworks> ok
[15:08:15] <skunkworks> have you run emc on more than 1gb?
[15:08:31] <skunkworks> putting 2 512mb chips works also
[15:08:40] <cradek> not personally
[15:09:34] <cradek> my fast machine has only 512
[15:09:58] <cradek> I know it's a long shot, but do you have another P4 processor you could try?
[15:10:03] <skunkworks> so with this motherboard 1gb is the limit. I am trying to think if I have enough pc133 memory to take my other box to over 1gb
[15:10:13] <skunkworks> I will have to look
[15:10:15] <skunkworks> might
[15:15:35] <cradek> also I noticed that it is running at only 1800MHz when the processor is a 2400
[15:15:48] <cradek> so maybe the motherboard thinks something is wrong
[15:22:37] <rayh> is ubuntu okay with sata drives?
[15:23:21] <cradek> yes, I'm sure it is
[15:24:46] <rayh> Thanks.
[15:25:58] <alex_joni> hello
[15:26:36] <SWP_Away> yes, ubuntu works just fine with SATA
[15:26:49] <SWP_Away> even SATA CD/DVD burners
[15:27:28] <cradek> that's good to know
[15:27:34] <cradek> so far I only have SCSI and regular IDE
[15:27:53] <SWP_Away> in fact, ubuntu is the only OS I've been able to install on my big machine
[15:27:55] <alex_joni> do you guys have a hint for a good server for me?
[15:28:04] <SWP_Away> tried Gentoo 64, XP 64
[15:28:18] <SWP_Away> hosting or to buy?
[15:28:21] <alex_joni> I need some hotswappable, SCSI drives (better with HW Raid), hot-swappable PSU's
[15:28:22] <SWP_Away> SWP_Away is now known as SWPadnos
[15:28:25] <alex_joni> SWPadnos: to buy
[15:28:44] <alex_joni> not very much processor speed, 1G mem is fine
[15:29:01] <alex_joni> not very much =~1-2GHz
[15:29:04] <cradek> alex_joni: I'm happy with my sempron 3300 which was quite cheap but it's fast
[15:29:12] <alex_joni> cradek: for home?
[15:29:14] <SWPadnos> I'm not sure you can get anything that slow ;)
[15:29:14] <cradek> yes
[15:29:38] <alex_joni> cradek: yes, for home it's ok, but I want to replace the normal PC I used as a server
[15:29:45] <cradek> ah
[15:29:52] <cradek> I don't have any idea about server class hardware
[15:30:06] <cradek> my experience is that it's expensive and harder to replace parts on when something goes wrong.
[15:30:26] <alex_joni> yes, but it might be more reliable (I hope)
[15:30:28] <SWPadnos> http://www.retrobox.com, for used server stuff
[15:30:46] <alex_joni> SWPadnos: I'd rather go with something I can get around here
[15:30:52] <alex_joni> so I guess IBM & co.
[15:31:08] <SWPadnos> hotswap PSU will be the kicker, I think
[15:31:18] <SWPadnos> can you get SuperMicro hardware there?
[15:31:18] <skunkworks> I build my servers - I like to use supermicro motherboards
[15:31:31] <alex_joni> dunno..
[15:31:34] <skunkworks> :)
[15:31:49] <SWPadnos> well, the case I have is their SC743-645T
[15:32:21] <SWPadnos> there is a version with hot-swap power supplies, and I think there's a version with a SCSI RAID cage (instead of SATA)
[15:32:53] <SWPadnos> any of their server motherboards will fit. I have the H8DCE
[15:34:33] <alex_joni> hmm.. this looks nice: http://www.supermicro.com/products/system/4U/7044/SYS-7044H-X8R.cfm
[15:34:38] <alex_joni> kinda something I want
[15:34:59] <SWPadnos> that's basically what I have for my new workstation
[15:35:16] <SWPadnos> you can flip it over - the right hand set of drive bays can be rotated
[15:35:57] <alex_joni> I don't want to flip it over, I want to rack-mount it ;)
[15:36:04] <SWPadnos> see if these guys will ship to you: http://www.monarchcomputer.com
[15:36:25] <SWPadnos> oh, well in that case, you can still look at the towers, they can be rackmounted as well
[15:36:34] <SWPadnos> you flip them on their sides ;)
[15:37:15] <jepler> SWPadnos: am I supposed to take retrobox.com seriously? It comes up with a flash-only front page here
[15:37:36] <SWPadnos> only if you want to. that flash page is a new thing
[15:37:51] <jepler> ugh. now it's opened a new window and must be starting java or something
[15:37:52] <SWPadnos> http://www.retrobox.com/rbwww/home/
[15:37:53] <jepler> I hate the web
[15:38:39] <SWPadnos> where else can you get a SCSI array with 1 36G drive + 13 9G drives for $151
[15:39:08] <SWPadnos> only $65 for the 12 x 18G arrays
[15:39:26] <alex_joni> hi jeff
[15:40:41] <alex_joni> SWPadnos: http://www.retrobox.com/rbwww/home/unit_view.asp?id=1483201&bin_id=world
[15:40:46] <alex_joni> that sounds nice
[15:41:08] <jepler> What would you want with a 12 x 18G array? Throughput?
[15:41:16] <SWPadnos> yep. $256 isn't bad ;)
[15:41:40] <SWPadnos> how about 100G of redundant storage with hot spares?
[15:44:54] <alex_joni> that's nice too
[15:44:58] <alex_joni> what's RAID 5?
[15:46:04] <skunkworks> I like raid 5
[15:46:17] <skunkworks> need 3 or more drives of the same size
[15:46:37] <skunkworks> you lose the capacity of one drive but if any one goes bad you keep running
[15:46:44] <SWPadnos> minimum of 3 drives, parity data is striped across all drives
[15:47:06] <SWPadnos> data is completely available if any one drive in the array fails
[15:47:22] <alex_joni> ok, I like that ;)
[15:48:00] <SWPadnos> you can also add drives, and you still lose only one for parity (so 5 drives gives 4x storage + 1 parity)
[15:48:10] <skunkworks> right
[15:48:42] <SWPadnos> and with most SCSI controllers, you can have hot spares - drives that are running but unused. if a drive fails, the spare will automatically be added to the array, and the data rebuilt.
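The RAID 5 capacity rule described above (at least 3 equal drives, one drive's worth of space goes to parity, hot spares sit idle) can be written as a short calculation. This is a rough sketch only; the function name and the `hot_spares` parameter are illustrative and do not correspond to any RAID tool's interface:

```python
def raid5_usable_gb(drive_count, drive_size_gb, hot_spares=0):
    """Usable capacity of a RAID 5 array built from equal-sized drives.

    Hot spares are powered but hold no data until a drive fails, so they
    are subtracted before the parity calculation.
    """
    active = drive_count - hot_spares
    if active < 3:
        raise ValueError("RAID 5 needs at least 3 active drives")
    # One drive's worth of capacity is consumed by parity, spread across all drives.
    return (active - 1) * drive_size_gb


print(raid5_usable_gb(5, 18))                  # 5 drives: 4x storage + 1 parity
print(raid5_usable_gb(12, 18, hot_spares=2))   # 10 active drives, 2 idle spares
```

The array survives the loss of any single active drive; a second failure before the rebuild completes loses the data.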
[15:49:17] <alex_joni> hrmm.. a sample config is $4180
[15:49:22] <SWPadnos> heh
[15:49:32] <skunkworks> (make sure your hardware raid allows for dynamic resizing - so you can add drives and rebuild without wiping clean and starting over)
[15:49:53] <skunkworks> I would think anything new would do that though.
[15:50:22] <alex_joni> hope the 'Adaptec 1662200 2930 Ultra 320 SCSI 64-Bit PCI Host Adaptor' might know that ;)
[15:50:24] <SWPadnos> I don't think it's that common, actually
[15:51:07] <SWPadnos> if you plan to run Linux, then hardware RAID isn't necessarily the best bet
[15:51:16] <alex_joni> RAID levels: 0, 1, 10, 5, 50, JBOD
[15:51:28] <alex_joni> Online RAID Level Migration
[15:51:29] <alex_joni> Online capacity expansion
[15:51:29] <alex_joni> Immediate RAID availability (background initialization)
[15:51:30] <skunkworks> my latest supermicro motherboard with raid allows that - my older motherboards don't (pentium II vintage)
[15:51:34] <alex_joni> SWPadnos: how come?
[15:51:44] <skunkworks> oh - forgot about linux - sorry
[15:51:46] <SWPadnos> the Linux RAID code is pretty efficient
[15:52:00] <SWPadnos> hardware used to be faster, but isn't any more
[15:52:06] <skunkworks> (use novell and micro$oft
[15:52:19] <SWPadnos> CPU speed has improved way faster than drives or RAID controllers
[15:52:30] <skunkworks> interesting
[15:54:10] <SWPadnos> you also get the advantage of being able to RAID any drives - combine SATA + IDE + USB + SCSI in one array
[15:54:30] <skunkworks> that is pretty neat
[15:54:49] <SWPadnos> so you could do something silly like have RAID5 SCSI drives, and mirror the whole array to a single huge capacity IDE drive ;)
[15:55:20] <SWPadnos> some people do that kind of thing for "backup" purposes
[15:55:32] <alex_joni> heh, that really sounds nice
[17:43:36] <rayh> Is there any point to apt-cdrom add with the ubuntu?
[17:44:02] <cradek> I think the CD is already in apt
[17:44:08] <cradek> you mean the install CD?
[17:44:32] <cradek> yeah it's the first line in my sources.list
[17:46:29] <rayh> Okay. Thanks. Installing now.
[17:47:40] <cradek> are you going to install the extra packages over your dialup?
[17:48:31] <cradek> if so I bet you can safely skip the ubuntu OS updates
[17:48:47] <cradek> especially their updated kernel packages, since you won't be using them
[17:48:53] <cradek> that will save you tens of MB
[17:50:51] <cradek> off to lunch
[17:53:37] <rayh> Thanks for the tip.
[18:19:16] <sam_> cradek: alex_joni: ?
[18:31:33] <skunkworks> Ok - I have some interesting info.
[18:33:12] <skunkworks> I just installed ubuntu on a totally different computer (my workstation, a Dell Dimension 3000) that has 1.25gb of memory. Emc crashes upon startup just like the other computer. If I remove the 256mb - emc starts and runs.
[18:34:12] <skunkworks> The plot thickens ;)
[18:35:17] <skunkworks> 2.8ghz pentium 4 with 1.25gb memory
[18:36:31] <skunkworks> thinking now it may not be hardware.
[18:37:13] <skunkworks> it is like people have real jobs or something ;)
[18:38:13] <skunkworks> btw - nice work - I was goofing around with emc2 on the other computer (1.8ghz) and was able to get .0002 period. that is an unreal improvement.
[18:38:58] <skunkworks> I meant .00002 - and I am happy with .00003
[18:39:11] <skunkworks> .00003 is really all I need
[18:40:33] <skunkworks> that gives me 100ipm on my slow axis.
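The link between the period and the top feed rate can be sketched as below. Two things here are assumptions and not stated in the log: that a software step generator needs two periods per step (one with the step line high, one low), and the 10000 steps/inch figure, which is a hypothetical value chosen only because it reproduces the 100 ipm mentioned for a .00003 second period.

```python
def max_feed_ipm(period_s, steps_per_inch, periods_per_step=2):
    """Upper bound on feed rate (inches per minute) for software stepping.

    periods_per_step=2 is an assumption: one period with the step output
    high and one with it low. steps_per_inch depends on the drive, motor
    and screw of the axis in question.
    """
    max_steps_per_second = 1.0 / (period_s * periods_per_step)
    return max_steps_per_second / steps_per_inch * 60.0


# Hypothetical axis scale of 10000 steps/inch.
print(max_feed_ipm(0.00003, 10000))  # about 100 ipm at a 30 microsecond period
print(max_feed_ipm(0.00002, 10000))  # about 150 ipm at a 20 microsecond period
```

Halving the period doubles the achievable step rate, which is why a shorter period matters most on the axis with the finest scale.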
[18:44:47] <rayh_> This is from ubuntu.
[18:44:59] <rayh_> Now to get the emc upgrades.
[18:50:43] <skunkworks> ?
[18:51:23] <rayh_> The following packages have unmet dependencies:
[18:51:23] <rayh_> emc2-axis: Depends: emc2 but it is not going to be installed
[18:51:23] <rayh_> E: Broken packages
[18:51:23] <rayh_> r
[18:51:48] <rayh_> darn
[18:52:23] <rayh_> cradek, You around?
[18:53:28] <skunkworks> are you trying to install ubuntu and then emc2 from cradeks site?
[18:53:46] <rayh_> Ubuntu is installed.
[18:54:04] <rayh_> I've got the ubuntu box online
[18:54:05] <skunkworks> then there is like 50mb of ubuntu updates
[18:54:15] <rayh_> grabbed his installer script
[18:54:33] <rayh_> Cradek thought I could skip those updates.
[18:54:54] <skunkworks> I haven't and have not had a problem (installed it at least 5 times so far)
[18:55:21] <skunkworks> although I have not tried skipping them
[18:55:55] <rayh_> He said much of the updates were kernel which would not get used at all.
[18:56:05] <rayh_> When we installed his kernel.
[18:56:08] <skunkworks> I mean that I have always installed the updates before emc2
[18:56:27] <skunkworks> interesting
[18:57:05] <skunkworks> maybe he has a newer ubuntu install cd?
[18:57:05] <rayh_> I understand. My problem is a dialup. If I can skip, I will.
[18:57:14] <rayh_> 5.10
[18:57:30] <skunkworks> yah - that would be a pain
[18:58:10] <skunkworks> I downloaded it from the ubuntu's site a few days ago - still required 50mb of updates ;)
[18:58:11] <rayh_> Maybe the problem is with alex's server.
[18:58:25] <skunkworks> I just installed it about an hour ago
[18:58:32] <rayh_> ah okay.
[18:58:49] <skunkworks> emc2 that is from cradeks script
[19:00:09] <skunkworks> could you look at the updates and only install the one you need?
[19:00:28] <skunkworks> (I have no clue which one - unmet?)
[19:01:12] <skunkworks> just talking out of my ass - not a linux person
[19:01:58] <rayh_> not much of one either.
[19:02:56] <skunkworks> no one is around.
[19:02:58] <rayh_> ticked the kernel headers and image and 1h55m to go.
[19:03:11] <skunkworks> wow
[19:03:12] <rayh_> That'll be a start.
[19:03:23] <skunkworks> I could email them to you ;)
[19:03:48] <rayh_> uh huh
[19:03:52] <skunkworks> ;)
[19:05:15] <rayh_> I should probably hire a horse or cross-country skier.
[22:42:16] <rayh> Ubuntu is up with EMC. Question about the developer stuff?
[22:43:24] <skunkworks> did it take the updates for it to work?
[22:48:25] <rayh> No.
[22:48:46] <rayh> Because I used static IP addys on the local net.
[22:49:08] <rayh> It didn't uncomment the normal locations for packages needed by emc
[22:49:20] <skunkworks> wow
[22:50:09] <rayh> I was looking for the emc source package name
[23:14:16] <rayh> cradek: What do I need to do to get the emc source stuff for development?