#emc | Logs for 2011-02-01

[00:00:44] <jepler> for the control-line the enable and direction lines are tied -- enable to gnd (always enable) and dir to vcc (always gate A to B)
[00:01:05] <jmkasunich> so they're using it as a unidirectional buffer
[00:01:05] <jepler> for the data lines, the enable is tied to gnd (always enable) but I didn't determine what the direction line is tied to
[00:01:58] <jepler> with only the control-line '245 installed, it also lives through the read function
[00:01:59] <jmkasunich> did you have to unsolder the chips?
[00:02:05] <jepler> no, they're socketed (happily)
[00:02:07] <jmkasunich> not surprising
[00:02:12] <jmkasunich> wow, thats rare these days
[00:02:46] <jmkasunich> I bet the 245 dir pin is driven by some tiger local bus signal
[00:03:14] <jmkasunich> are the 8255's socketed?
[00:03:20] <jepler> yes they are too
[00:03:41] <jepler> you think it would be interesting to unsocket them and put in both 245s?
[00:03:44] <jmkasunich> did you already try pulling them (with the 245 installed)? ISTR you mentioning that but I'm not sure
[00:04:00] <jepler> no, I haven't done that yet
[00:04:00] <jmkasunich> yes, assuming some assumptions are right
[00:04:11] <jmkasunich> assumptions being that the 245 is between the tiger and the 8255s
[00:04:26] <jmkasunich> 245 pointing at empty sockets = nothing happening
[00:04:26] <jepler> I'm pretty sure of that
[00:04:42] <jmkasunich> 8255 pointing back at the tiger when it wants to use the bus = problem
[00:04:52] <jmkasunich> that would point you directly at 245 dir issues
[00:04:54] <jepler> no, the 245 would still drive the tiger data bus even if its inputs were floating
[00:05:08] <jepler> whether or not the 8255s are there
[00:05:32] <jepler> or are you telling me that this will say whether it's 8255 bus or tiger bus?
[00:05:32] <jmkasunich> right - if you still get the problem it exonerates the 8255 and points a finger at the dir line
[00:06:04] <jepler> if it's two things driving one bus, it could be tiger + 245, or it could be 245 + 8255
[00:06:26] <jmkasunich> yeah, but the latter would (I think) not mess up the tiger register reads
[00:06:45] <jmkasunich> you'd have contention on the far side of the 245, the near side would be OK
[00:07:11] <jmkasunich> although I'm a little confused because the registers in the tiger shouldn't be on the local bus at all
[00:09:58] <jepler> I figured over-current somewhere was killing the chip -- in fact, until I saw the configuration registers read back earlier, I assumed the tiger320 was dead and gone, not just a little bit dead
[00:10:10] <jepler> * jepler wonders if he swapped the '245s around between the sockets
[00:10:16] <jepler> (I didn't track which was which when I removed 'em)
[00:10:29] <jmkasunich> they're fairly hard to kill
[00:11:38] <jmkasunich> there are also more "interesting" things that could make it not work when you have contention
[00:12:16] <jmkasunich> for example, if one is driving high and the other low, there will be a pulse of current that could cause ground bounce on one chip or the other and disrupt things that nominally have no connection to the bus where the contention is
[00:13:23] <jepler> this is interesting -- the dir input of the data bus '245 is an output of the control bus '245. If I can believe the silkscreen, it's XRD# which I've taken to mean the 8255-bus version of the tiger's RD# signal
[00:14:22] <jmkasunich> that sounds reasonable
[00:14:31] <jepler> there's another chip I could unsocket: 74HCT139 which is a 2-to-4 decoder. I believe its outputs are connected to the CS# lines of the 8255s
[00:14:42] <jepler> without that one, the 8255s should never drive the data bus
[00:14:43] <jmkasunich> logical
[00:15:02] <jmkasunich> yep - are the 8255's CMOS? if so, you should pull their floating enable inputs high
[00:15:11] <jmkasunich> if TTL they'll be high by themselves
[00:16:49] <jepler> here's a thought: there are local bus addresses that would correspond to the unused output of the '139. that's a way to get nothing to drive the 8255 data bus while having all the chips socketed
[00:17:18] <jmkasunich> yeah
[00:17:28] <jmkasunich> I'm not sure what the idea here is
[00:17:53] <jmkasunich> if you accessing certain addresses is the problem, you can just remove those accesses and see if it stops crashing
[00:18:12] <jepler> that lets me leave everything in the sockets, have nothing driving the 8255 bus, but have the 245 drive the tiger320 bus
[00:18:15] <jmkasunich> how is changing the addresses a better test?
[00:18:24] <jmkasunich> oh, ok
[00:18:50] <jmkasunich> do you have a fairly repeatable crash condition now? or is it still random?
[00:19:08] <jepler> the 8255s are marked "82C55AC" and I don't see a convenient way to tie their CS# lines high but this will work just as well for pointing the finger at the 245 <-> tiger bus
[00:19:20] <jmkasunich> yep
[00:19:26] <jepler> for me, it happens almost instantly after installing the read function
[00:20:00] <jepler> (on a very few occasions I've been able to observe in 'halcmd show' that the read function takes on the order of 2 seconds before it died entirely, but usually the crash is too fast for that)
[00:20:43] <jmkasunich> other than all the annoying reboots, that is good
[00:21:11] <jepler> and the problem now is that I keep bending the card bracket on each insertion .. maybe I should unscrew it
[00:21:14] <jmkasunich> at work we're trying to troubleshoot a problem (in france, on stuff that has been modified by others) that happens between once a day and once a week
[00:21:29] <jepler> sounds like me with the hardy/amd64/rtai kernel
[00:21:36] <jepler> well, except the france part
[00:30:45] <jepler> when RD# is going high, the 8255 will continue driving its outputs for up to 75ns while '245 is guaranteed to start driving its outputs after only 20ns. Does that 55ns matter?
[00:31:23] <jmkasunich> hard to say - it is certainly something that a good designer would try to avoid
[00:31:47] <jmkasunich> the question is, are we dealing with the work of a no-good designer, or a good one who has determined that in this case it's OK?
[00:40:07] <jepler> OK, doing the reads at offset 0xf0 (no 8255 should be driving) does not crash, doing reads at offset 0xc0 (8255 should be driving) does
[00:40:29] <jmkasunich> progress!
[00:40:45] <jmkasunich> these are dip chips?
[00:41:09] <jepler> yes, the 8255s are socketed DIPs
[00:41:14] <jepler> about the only surface part is the tiger320
[00:41:27] <jmkasunich> you don't happen to have a dip clip so you could scope them do you?
[00:41:40] <jepler> nope
[00:42:37] <jmkasunich> I realized that with the crashing and all it would be kind of hard anyway
[00:42:58] <jmkasunich> I was thinking trigger on the RD line, and probe ground, vcc, etc looking for bounces
[00:43:16] <jmkasunich> as well as probing the data lines looking for contention (levels that are neither hi nor low)
[00:43:58] <jmkasunich> it really seems odd that contention on the 8255 side could bust things
[00:44:21] <jmkasunich> when you read from the address that doesn't crash, what do you get? 0xFF?
[00:46:06] <jepler> if the I/Os aren't back-to-back, the machine stays up
[00:46:13] <jepler> (adding a printf of the value input is enough)
[00:46:20] <jmkasunich> hmmm
[00:46:26] <jmkasunich> very very interesting
[00:47:27] <jmkasunich> this has to be more than just local bus contention anyway - to take down the PC it has to somehow propagate to the PCI bus
[00:47:29] <jepler> when I read from the non-crashing address I get 0x00 consistently
[00:47:39] <jmkasunich> a timing thing, or a wait state thing...
[00:48:00] <jepler> when I read from the crashing address, the first time I got 0xff. Then I modified the program to outb() the loop counter to the same address and I got different results
[00:48:25] <jepler> oh and this time it locked even with a print and a delay
[00:48:32] <jmkasunich> which means maybe you are reading back a value from the previous write, floating on the data bus
[00:49:09] <jepler> no, I am not sure about that. if the 8255s ports are in output mode (not sure what mode they're in by default) then this would in principle read back the same value
[00:49:22] <jmkasunich> oh, ok
[00:49:57] <jepler> here are a few lines from the output before the crash. first digit of each pair is value read, the second is the value written just before (stupid order, I know): ff a9 / fe a8 / ff a7 / fe a6 / ff a5 / fe a4 / ff a3 / 45 a2 / ff a1 / fc a0 / ff 9f <crash>
[00:50:30] <jepler> so let me just note that there's a pretty high correlation in last-bits but the rest is less clear
[00:50:51] <jmkasunich> only one bit in the readback is toggling
[00:51:02] <jmkasunich> so "bits" is a bit optimistic
[00:51:20] <jmkasunich> oh, didn't see the 45
[00:51:49] <jepler> sorry, should have starred it or something
[00:51:57] <jmkasunich> should have read it or something
[00:53:12] <jmkasunich> lemme get this right - when you read (only) you get 0xff, and at least if you do a printf it doesn't crash
[00:53:26] <jmkasunich> when you write and read, you get the data you showed me, and it crashes
[00:55:38] <jepler> if I do 256 inb() back to back it crashes immediately. If I have something else (like printf or udelay) it doesn't consistently crash. It did eventually crash when the loop consisted of outb / inb / printf / udelay(1000)
[00:56:06] <jmkasunich> so outb is not a necessary condition for the crash
[00:56:11] <jepler> no, I don't believe it is
[00:56:26] <jmkasunich> consecutive inb's will do it, as will outb immediately followed by inb
[00:56:32] <jepler> before I did any outb(), the read-back from 8255 was consistently 0xff, which is what I would expect having read the datasheet again
[00:56:49] <jmkasunich> try a delay between the outb and the inb, and otherwise exactly the same as the last time
[00:57:05] <jepler> (when a reset condition is set, the digital I/O are set to input mode but there is some kind of pull-up type device)
[00:57:33] <jepler> so if it was working right I'd be reading 0xff all the time
[00:57:39] <jmkasunich> maybe the condition is "any access immediately followed by a read" or even any access followed immediately by any access
[00:59:21] <jepler> I've never had it lock up on reads, and I'm pretty sure that's sam's experience too
[00:59:34] <jmkasunich> on writes you mean?
[00:59:39] <jepler> er yes
[01:00:05] <jmkasunich> so that probably rules out write-write
[01:00:09] <jepler> yes I think so
[01:00:19] <jmkasunich> we know read-read will do it
[01:00:49] <jepler> the functions are fairly symmetrical -- (read, trivial amount of computation) * (up to 9 repetitions) or (trivial amount of computation, write) * (up to 9 repetitions)
[01:00:51] <jmkasunich> and we think write-read will do it, but maybe its really just individual reads plus dumb luck
[01:00:54] <jepler> in hal that is
[01:01:34] <jmkasunich> I've always wondered how I/O reads are handled
[01:01:45] <jmkasunich> the frontside bus is so much faster than PCI
[01:01:58] <jmkasunich> I/O writes can be posted, and the CPU carries on
[01:02:41] <jmkasunich> does the CPU have to halt while a PCI read is happening? or does it do some kind of out-of-order execution while waiting for the data
[01:02:52] <jepler> I don't know about PCI but I've looked carefully through the tiger datasheet and it looks like the device on the host bus has to deliver the response in specific number of PCI bus cycles (I configure for 15 bus cycles, following the sample code)
[01:03:32] <jepler> I suspect but don't know that the CPU doesn't speculate ahead of inb
[01:03:56] <jepler> inb/outb are signs of low-performance hardware, there's no point in making the CPU fast in that case
[01:04:05] <jmkasunich> in the case of our code, it can't speculate very far - I suspect the very next line uses the result of the inb
[01:04:38] <jepler> that's true as well
[01:05:14] <jmkasunich> so the CPU does an inb, and the frontside bus goes into wait states
[01:05:26] <jmkasunich> then the PCI bus cycle starts, and it goes into wait states
[01:05:33] <jmkasunich> then the local bus cycle starts
[01:05:46] <jepler> but the tiger isn't waiting for any signal from the 8255. no matter what happens on the local bus, it just latches that value after XXns and puts it on the pci bus
[01:06:01] <jmkasunich> ok, that was my next question
[01:06:24] <jmkasunich> and that comes to the heart of the weirdness
[01:06:48] <jmkasunich> it seems that no matter what the tiger grabs, it should be able to put that on the PCI bus without bringing down the whole machine
[01:07:36] <jepler> if I read the datasheet right, it simply asserts local bus READ# for XXns and latches the data from the local bus at -10..+5ns from the rising edge of READ#
[01:08:26] <jepler> OK, so you want a loop that is outb(); delay(); inb(); printf(); ?
[01:08:53] <jmkasunich> yeah - maybe add a delay after the printf too, if you think the printf can happen quickly
[01:09:03] <jmkasunich> delay, out, delay, in, print
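
The loop being agreed on here ("delay, out, delay, in, print") could be sketched roughly as below. This is only an illustration, not the program jepler ran: `io_ops`, `run_probe`, and the `fake_*` stand-ins are hypothetical names, and the port I/O is passed in as callbacks so the loop can be dry-run without the card. On x86 Linux the real callbacks would presumably wrap `outb()`/`inb()` from `<sys/io.h>` after a successful `ioperm()`.

```c
#include <stdio.h>

/* Hypothetical abstraction: port I/O and the delay are callbacks, so the
 * loop can be dry-run without the card. */
struct io_ops {
    void (*out8)(unsigned char value, unsigned port);
    unsigned char (*in8)(unsigned port);
    void (*delay)(void);
};

/* One pass is "delay, out, delay, in, print". Returns how many reads
 * differed from `expect`. */
static unsigned long run_probe(const struct io_ops *io, unsigned port,
                               unsigned long n, unsigned char expect)
{
    unsigned long mismatches = 0;
    for (unsigned long i = 0; i < n; i++) {
        unsigned char written = (unsigned char)i;
        io->delay();
        io->out8(written, port);
        io->delay();
        unsigned char v = io->in8(port);
        /* value read first, then value written, as in the log */
        printf("%02x %02x\n", (unsigned)v, (unsigned)written);
        fflush(stdout);  /* push the line out before a possible hard lock */
        if (v != expect)
            mismatches++;
    }
    return mismatches;
}

/* Stand-in callbacks for a dry run: reads always return 0xff. */
static unsigned long fake_delays;
static void fake_out8(unsigned char value, unsigned port) { (void)value; (void)port; }
static unsigned char fake_in8(unsigned port) { (void)port; return 0xff; }
static void fake_delay(void) { fake_delays++; }
static const struct io_ops fake_io = { fake_out8, fake_in8, fake_delay };
```

Passing the I/O in as callbacks is a design choice made for this sketch only; it lets the mismatch counting be checked with `fake_io` before the loop is pointed at real hardware that may lock the machine.
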
[01:09:36] <jmkasunich> the 15 clock delay that you programmed the tiger for (following the example) - can that be made longer?
[01:10:03] <jepler> no, 15 is the highest
[01:10:11] <jmkasunich> ok
[01:10:48] <jmkasunich> (more than one "tested" hardware design has fallen on its face when purchasing got a batch of chips from another vendor and the delays changed)
[01:11:07] <jmkasunich> if you could add more delay it would be an interesting test, but if you can't you can't
[01:11:19] <jepler> with a couple of udelay(100000) it hasn't crashed yet but it does spuriously read back non-0xff
[01:12:11] <jmkasunich> that 100000 is in uS? nS?
[01:12:14] <jepler> us
[01:12:29] <jmkasunich> so 100mS between reads, only 10 per second?
[01:12:33] <jepler> yes
[01:12:58] <jmkasunich> when you had no delay it could do thousands or tens of thousands per second, right?
[01:14:36] <jepler> without delay and print, yes
[01:15:10] <jmkasunich> that muddies the data a bit
[01:15:35] <jmkasunich> you had one run where "out, in, print, delay" crashed it, right?
[01:15:46] <jmkasunich> that might have been after a few hundred cycles
[01:16:31] <jmkasunich> we can't rule out something as simple as "a random timing issue that gives a 1 in 10000 chance of crashing on any read"
[01:16:53] <jepler> yes, I agree.
[01:17:05] <cradek> have you tried it in a (very) different machine?
[01:17:16] <jmkasunich> what if you make the udelay about 20 instead of 100000
[01:17:33] <jmkasunich> 20uS is still very many PCI cycles, and very very many instructions
[01:17:41] <jepler> cradek: I don't have a dev environment, but I did try the livecd on another machine and it locked just like it does on this machine and like it does for sam
[01:18:10] <jmkasunich> ok, 3 machines kind of rules out machine specific stuff
[01:18:13] <cradek> oh right, sam too
[01:18:32] <jepler> (well, sam thought it was correlated with toggling inputs but I don't know what to make of that)
[01:18:42] <jmkasunich> is this two different boards? the one you have and a different one at sam's? or did he send you the one that causes problems?
[01:18:57] <jepler> jmkasunich: he bought 2+ boards and sent me one.
[01:19:02] <jepler> so they are probably from the same batch
[01:19:09] <jmkasunich> but not the same board
[01:19:12] <jepler> but they are not the same board
[01:19:42] <jmkasunich> does their software work?
[01:19:52] <cradek> fascinating question
[01:19:59] <jmkasunich> I dunno if anybody has (or can) try it
[01:20:04] <jepler> I don't have any software
[01:20:16] <jepler> there's a document in thai that shows a delphi program doing some kind of operation with the board...
[01:20:27] <jmkasunich> nor do you likely want to infest your PC with delphi
[01:20:54] <jmkasunich> I was wondering specifically if anybody had tried one of these two boards
[01:21:03] <jmkasunich> if its a batch related timing thing, their SW might break too
[01:21:29] <jmkasunich> sounds like not a practical test
[01:22:07] <jmkasunich> the "out, delay, in, delay, print" run still going?
[01:22:27] <jepler> I changed the program
[01:22:43] <jepler> I'm doing inb() from the no-op port, no delays, prints once every 65536 operations.
[01:23:00] <jepler> it's now done about 83 million and is still going OK
[01:23:02] <jmkasunich> so banging the snot out of it
[01:23:08] <jepler> yes as fast as I can
[01:23:41] <jmkasunich> and you had reliable crashes when doing 256 no-delay bursts of inbs, right?
[01:23:48] <jmkasunich> (to the 8255 address)
[01:23:52] <jepler> yes.
[01:24:40] <jmkasunich> so, we KNOW the 8255 address is needed to make it crash
[01:24:46] <jepler> I feel pretty safe saying that
[01:25:22] <jmkasunich> we're not sure if time between accesses matters
[01:25:55] <jmkasunich> it could be a 1 in 50000 thing and the keeps us from getting the couple hundred K that we'd need to know
[01:25:58] <jepler> now 270 million inb()s from the no-op port .. I am satisfied that this will not crash the machine
[01:26:11] <jmkasunich> I was satisfied at a million or two
[01:26:33] <jmkasunich> 4 or 5 orders of magnitude better than the 256 bursts is pretty significant
[01:27:14] <jmkasunich> from the tiger's point of view there is no difference between the ports, right?
[01:27:36] <jepler> no; it has a 4-bit host address bus
[01:28:07] <jepler> the 8255s take a0, a1 and the '139 decodes a2, a3 into chip selects
[01:28:35] <jepler> when a2=a3=1, there's no 8255 getting a chip select
[01:28:40] <jmkasunich> when you read the no-op port you get zeros, right?
[01:28:57] <jmkasunich> a hundred million of 'em
[01:29:25] <jepler> as far as I saw during that run
[01:29:51] <jmkasunich> unanswerable question - what would happen if the local bus was pulled up and reading the no-op port returned ff?
[01:29:51] <jepler> I thought I saw a 01 scroll by at the beginning of the first run after a mixed outb/inb program but I didn't recreate it and so I didn't mention it
[01:30:31] <jepler> there's actually a spot for a pull-up resistor pack on the 8255-side data bus that isn't populated, but I don't have any SIP resistors to put in there.
[01:32:33] <jmkasunich> which side of the 245 is facing the tiger? A or B?
[01:32:40] <jepler> A
[01:32:51] <jmkasunich> so a low on DIR drives the tiger
[01:33:20] <jepler> yes, I think so -- the signal silkscreened XREAD# is what drives that 245 input.
[01:33:28] <jmkasunich> what is the tiger P/N? (or do you have a URL for the datasheet)
[01:33:40] <jepler> http://www.tjnet.com/software/download/data_sheets/Tiger320_data_sheet.pdf
[01:34:26] <jepler> what will HCT logic do with floating inputs?
[01:34:33] <jmkasunich> they'll float
[01:34:40] <jmkasunich> IOW, nothing predictable
[01:35:04] <jmkasunich> and sometimes things not nice - if it sits halfway between rails the chip can draw excessive current
[01:35:24] <jmkasunich> often the capacitance of the traces can hold the previous value for quite a while
[01:36:05] <jmkasunich> famous oops - write a memcheck that writes a value to an address and then immediately reads back from the same address to check - can pass with no memory installed
[01:36:25] <jepler> outb(0xff); inb() isn't turning up nonzero reads
[01:36:51] <jmkasunich> in this case the 245 is enabled all the time....
[01:37:06] <jmkasunich> I think that means that the far side bus doesn't get a chance to float
[01:37:29] <jmkasunich> you might write ff to it, but then the local bus (tiger side) is driven to zero by the tiger between cycles
[01:37:46] <jmkasunich> that drives the far side to zero, and thats what you read on the next cycle
[01:37:49] <jepler> yeah I can see that
[01:39:47] <jepler> OK, this test crashed after 1690 read()s. let me reboot so I can be sure to get the rest of the details of just what I ran ...
[01:39:49] <jmkasunich> datasheet says you can use memory or IO, and ISTR swp talking about that
[01:40:04] <jepler> yes
[01:40:17] <jepler> I never had luck with memory-mapped access
[01:40:42] <jmkasunich> as in it crashed? or just never got any data back from the 8255s?
[01:40:58] <jepler> outputs never appeared
[01:41:22] <jmkasunich> wow, this is a little bridge
[01:41:30] <jmkasunich> didn't realize the local bus was 8 bits
[01:41:56] <jepler> yeah -- 3 8255s are about its limit of complexity
[01:42:10] <jepler> there's some kind of clocked serial bus too, but I have no use for that
[01:42:22] <jmkasunich> I notice that they set the subsystem ID stuff using pullups and downs on the data bus
[01:43:11] <jepler> yes.
[01:43:57] <jepler> this is the program I ran last: http://emergent.unpy.net/index.cgi-files/sandbox/crashme.c
[01:44:18] <jepler> the internet says that outb() to port 80 gives typical 1usec delays (ISA bus speed, just like parport)
[01:44:34] <jepler> the internet suggests that usleep() in userspace is inaccurate for short delays so I chose this method instead
[01:45:16] <jepler> v was always nonzero and when it crashed the last line on the console was 0xfffff965 meaning that it had done 1690 (+-1) cycles
[01:45:52] <jmkasunich> offset c0 is an 8255, right?
[01:45:55] <jepler> yes
[01:46:05] <jepler> c0, d0, e0 are 8255s, and f0 is nothing
[01:46:23] <jepler> ?0, ?4, ?8, ?c are the 4 registers of each 8255
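
The address layout described above (offsets c0/d0/e0/f0, registers at ?0/?4/?8/?c, a0 and a1 to the 8255s, a2 and a3 through the '139) can be written out as a small decoder. This is a sketch under assumptions: `decode_offset` and `struct decoded` are made-up names, the two-bit shift is inferred from the offsets quoted in the log, and which '139 output goes to which physical 8255 is not stated in the source, so the chip numbers are arbitrary labels.

```c
/* Hypothetical helper; the bit layout is inferred from the offsets in the log. */
struct decoded {
    int chip;  /* 0..2 = one of the three 8255s, -1 = no chip selected */
    int reg;   /* 0..3 = register within the 8255 (3 is the control register) */
};

static struct decoded decode_offset(unsigned offset)
{
    unsigned addr = (offset >> 2) & 0xf;  /* 4-bit host address, a0..a3 */
    struct decoded d;
    d.reg = (int)(addr & 0x3);            /* a0, a1 go straight to the 8255 */
    d.chip = (int)((addr >> 2) & 0x3);    /* a2, a3 are decoded by the '139 */
    if (d.chip == 3)
        d.chip = -1;                      /* a2 = a3 = 1: unused '139 output */
    return d;
}
```

Under this reading, offset 0xcc lands on register 3 of the first 8255 and offset 0xf0 selects nothing, which is the "no-op port" used in the tests.
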
[01:46:47] <jepler> oh I was off a bit in the # of cycles because the count starts at 256
[01:46:49] <jmkasunich> I noticed the shift of address lines
[01:47:42] <jepler> OK, multiplied the delay by 100 and it crashed at ffffff52b -- something like 3000 reads
[01:47:56] <jmkasunich> not a statistically different number
[01:48:24] <jmkasunich> you said v was always non-zero
[01:48:30] <jmkasunich> that means it printed every time?
[01:48:49] <jepler> yes, I suppose it was printing every time
[01:49:07] <jepler> so the difference in delay might not be as great as I thought
[01:49:09] <jmkasunich> the print delay is probably longer than the outb(80) delay
[01:49:52] <jepler> but we know the delay was at least 1ms in the second test that died after 3000 reads
[01:50:06] <jepler> (2000 outb(0x80, 0x80))
[01:50:14] <jepler> probably more like 2ms
[01:50:18] <jmkasunich> yeah - so its unlikely to be the actual timing that matters
[01:50:26] <jmkasunich> more likely just the sheer number of reads
[01:50:59] <jmkasunich> if we're averaging one crash per a couple thousand reads, it might look good in a test that does 10 reads/second
[01:52:20] <jepler> "good" would still be stretching it
[01:52:35] <jmkasunich> well, it could take a couple hundred seconds to show up
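
The arithmetic behind "a couple hundred seconds" can be made explicit with a simple model in which every read fails independently with the same probability. That independence is an assumption made for illustration, not something the tests established, and the function names are hypothetical.

```c
/* Mean time until the first crash if every read fails independently
 * with probability p (an assumed model, not a measured one). */
static double expected_seconds_to_crash(double p, double reads_per_second)
{
    return 1.0 / (p * reads_per_second);
}

/* Probability that n consecutive reads all succeed: (1 - p)^n. */
static double survival_probability(double p, unsigned long n)
{
    double s = 1.0;
    for (unsigned long i = 0; i < n; i++)
        s *= 1.0 - p;
    return s;
}
```

With p = 1/2000 and 10 reads per second, the mean time to the first crash comes out at about 200 seconds, so a short test run at that rate could easily pass by luck.
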
[01:52:42] <jmkasunich> dunno how long your test runs were
[01:52:47] <jepler> "huh, it crashed after 5 minutes -- well, that's windows. "
[01:54:05] <jepler> I don't see what else to try
[01:54:15] <jepler> either this board (and sam's) is bad, or the whole board design is bad
[01:54:31] <jmkasunich> all bus-mastering, interrupts, and power management stuff is disabled, right?
[01:56:14] <jepler> I dunno. I assumed that the power-up settings were OK except for the ones set in the delphi program I read
[01:57:01] <jepler> but even if I could turn off an interrupt that was being spuriously generated, there's still the fact that it doesn't seem to reliably read the 8255 bus
[01:58:33] <jmkasunich> squirrelly
[01:58:45] <jmkasunich> it would be nice to be able to stick a scope on it
[01:59:00] <jmkasunich> but triggering on the crash would be quite a trick
[01:59:49] <jepler> http://emergent.unpy.net/index.cgi-files/sandbox/regs.txt
[01:59:56] <jepler> this is the power-up state
[02:00:19] <jepler> well, it's what I read after booting into linux
[02:00:28] <jmkasunich> will take a bit of datasheet reading to figure out what that all means
[02:00:29] <jepler> as far as I've checked it matches the power-up state from the datasheet
[02:01:10] <jmkasunich> you're only writing to 00, 02, and 03, right?
[02:01:25] <jepler> yes, I believe that's right
[02:01:31] <jepler> The difference between the DMA end address and DMA start address will be the amount of data
[02:01:34] <jepler> that is to be transferred by the DMA. If the start address is “X” and the data transferred is “n”
[02:01:36] <jmkasunich> what is this: outb(0x00, base+offset+3*4);
[02:01:37] <jepler> bytes, the end address will be “X + n - 4”. The minus 4 is needed to point to the very last location
[02:01:40] <jepler> as the first location is 0 and not 1.
[02:02:03] <jepler> that would set the control register of an 8255 to "all ports output" if it worked
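
For reference, a sketch of how a mode-set control word is put together in the standard 8255 mode-definition format: bit 7 set marks the word as a mode-set command, and a 1 in a direction bit means input. In that format "all ports output" in mode 0 is 0x80, while a word with bit 7 clear is treated as a port C bit set/reset command. The helper names below are hypothetical, and whether this matters for the crash is not established anywhere in the log.

```c
/* Bit positions of the 8255 mode-set control word (mode 0 only here). */
enum {
    PPI_MODE_SET   = 0x80,  /* bit 7: 1 = mode-set word */
    PPI_A_IN       = 0x10,  /* bit 4: port A direction, 1 = input */
    PPI_C_UPPER_IN = 0x08,  /* bit 3: port C upper nibble, 1 = input */
    PPI_B_IN       = 0x02,  /* bit 1: port B direction, 1 = input */
    PPI_C_LOWER_IN = 0x01   /* bit 0: port C lower nibble, 1 = input */
};

/* Build a mode 0 control word; each argument is nonzero for input. */
static unsigned char ppi_mode0_word(int a_in, int b_in,
                                    int c_upper_in, int c_lower_in)
{
    unsigned char w = PPI_MODE_SET;
    if (a_in)       w |= PPI_A_IN;
    if (b_in)       w |= PPI_B_IN;
    if (c_upper_in) w |= PPI_C_UPPER_IN;
    if (c_lower_in) w |= PPI_C_LOWER_IN;
    return w;
}

/* Nonzero if the word would be interpreted as a mode-set command. */
static int ppi_is_mode_set(unsigned char w)
{
    return (w & PPI_MODE_SET) != 0;
}
```
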
[02:02:20] <jmkasunich> ok
[02:02:39] <jepler> (I don't think the paste above is pertinent, I just thought it was a terrible explanation of the dma address range)
[02:02:53] <jmkasunich> that it is
[02:03:32] <jmkasunich> this test program is in userspace - but it still causes a hard lock of the entire machine?
[02:04:22] <jepler> yes
[02:04:49] <jmkasunich> _nothing_ on the local bus should be able to do that
[02:05:28] <jepler> I agree
[02:05:29] <SWPadnos> pci_write can cause that, when the wrong registers of the PLX chips are twiddled
[02:05:52] <jmkasunich> oh, we were talking about the difference between the time it takes the 8255 to release the bus and the time it takes the 245 to turn around and start driving after a read...
[02:06:06] <SWPadnos> ok, I didn't read closely enough then
[02:06:27] <jmkasunich> I think thats a non-issue - during the read, <somedata> is coming from the 8255 and going to the tiger
[02:06:46] <jmkasunich> when the 245 turns around, its gonna see <somedata> floating on the tiger bus, and send that back to the 8255
[02:07:01] <jmkasunich> which will match what the 8255 is driving until it releases the bus - no contention
[02:08:10] <jepler> I believe DMA, watchdog, and interrupts are all turned off in the power-on state and in the regs.txt dump .. I don't see anything else I can do there
[02:08:54] <jmkasunich> we may be fighting a losing battle - the card may simply have hardware issues
[02:09:16] <jmkasunich> it would be a good datapoint if sam could somehow test it with their software
[02:09:48] <jepler> I'm done for the night
[02:09:48] <jmkasunich> even a single lockup running their code, and its "this card sucks, don't buy one, and we're not gonna waste our time anymore"
[02:10:01] <jmkasunich> goodnight
[02:10:03] <jepler> jmkasunich, SWPadnos, cradek: thanks for your help with this
[02:10:07] <jepler> especially you, jmkasunich
[02:10:13] <jmkasunich> thanks for trying
[02:10:29] <jmkasunich> you're the one who had to keep hitting reset (thats gotta suck after a while)
[02:10:51] <SWPadnos> oh - you're welcome, for what little I've done. thanks for the (gargantuan) effort you've put in
[05:51:57] <tom1> I've been running the i/o programs for Futurlec's PCI8255 on W2k
[05:52:41] <tom1> There's several programs, all really the same, just developed on VB/Delphi/VC
[05:53:11] <tom1> All have same format, a widget with 3 rows & columns
[05:53:37] <tom1> each is a port, and under the port is a READ and a WRITE btn
[05:53:50] <tom1> each has a text field for display/or entry as case may be
[05:54:26] <tom1> none blows up, reads always return FF, writes don't care what i put into the text fields
[05:55:21] <tom1> i dont have hdwr hooked up but will try to find flatbands, dipswx, leds and stuff to test with
[05:55:50] <tom1> i can also post the software somewhere if any one wants it.
[05:56:14] <tom1> I've asked the futurlec people to release the source of the drivers but havent heard anything
[05:56:20] <tom1> its 2am g'nite
[05:57:00] <tom1> btw: i need this box to test with if i need w2k so my moniker may be diff for a while
[12:19:07] <jepler> skunkworks_: good morning .. you should check the logs
[12:19:09] <jepler> logger_dev: bookmark
[12:19:09] <jepler> Just this once .. here's the log: http://www.linuxcnc.org/irc/irc.freenode.net:6667/emcdevel/2008-05-06.txt
[12:19:18] <jepler> skunkworks_: actually starting at the end of -05-05.txt
[12:21:33] <skunkworks_> jepler: thanks. reading now.
[12:34:21] <skunkworks_> holy smokes - this is going to take some coffee..
[13:09:19] <skunkworks_> jepler: wow
[13:09:37] <skunkworks_> Thanks for all your work. (and every ones..)
[13:10:51] <skunkworks_> I will see if I can get their software working.. (I might have to get it directly from them as I don't know if I can find or have the cd)
[13:13:27] <skunkworks_> oh - tom seems to have tried it.
[13:14:43] <skunkworks_> I think he should try the hal driver and see if his machine locks up. (maybe as jmkasunich had said - was a bad batch)
[13:19:30] <alex_joni> what's a good invocation for find to find files newer than 2 months?
[13:25:04] <skunkworks_> * skunkworks_ would have thought that he copied the cd to his computer at some point.
[13:34:59] <jepler> alex_joni: something like find ... -mtime -60 ?
[13:36:09] <alex_joni> ok, thanks
[13:36:16] <SWPadnos> I wonder if setting that timing register down to 3 or so would help
[13:36:50] <jepler> all their examples use the "1 1" setting for 15 cycles.
[13:36:59] <SWPadnos> yeah, that's true
[13:37:03] <jepler> if the source fragments in the pdf files I've found can be trusted
[13:37:15] <SWPadnos> is that in PCI clocks or some other "local bus clock"?
[13:38:31] <skunkworks_> I feel bad that something that should have been easy turned out to be such a cluster f$ck
[13:40:00] <jepler> SWPadnos: "When a PCI read cycle takes place .. the READ# signal will be active for 3-12 PCI cycles, the actual number of cycles is determined by the setting of bits 4 and 5 in the internal register 0x00."
[13:40:33] <jepler> (yes, it says 12, not 15)
[13:40:35] <SWPadnos> 15 is outside that range ??
[13:41:08] <SWPadnos> I was thinking that the tiger chip may be stupid and drive the data lines while it's pausing before the read
[13:41:33] <SWPadnos> if the 8255 has a 75ns max time to output, then you only need 3 PCI clocks to guarantee that the output is valid
[13:41:52] <SWPadnos> that would reduce bus contention duration if the tiger is dumb
[13:41:55] <jepler> but we determined the problem was on the 8255 side of the '245s
[13:42:35] <SWPadnos> ... if the tiger holds the 245's in the wrong direction during the delay ... :)
[13:43:01] <SWPadnos> the 8255 and '245 can probably sink a lot more current than the tiger anyway
[13:43:07] <jepler> elsewhere, T_p, command pulse width, is documented as taking min 80ns, max 640ns which is 21.12 cycles at 33MHz .. confusing
[13:43:29] <SWPadnos> is that for the serial bus?
[13:43:38] <SWPadnos> doesn't matter really
[13:44:01] <jepler> that's the time READ# or WRITE# is asserted on the local bus
[13:44:09] <jepler> just trying to figure out what 15 "cycles" actually is ..
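
The conversions being juggled here are simple enough to write down. The sketch below assumes the nominal 33 MHz PCI clock used in the log; the function names are made up for illustration.

```c
/* Convert between PCI clock counts and nanoseconds for a clock given in MHz. */
static double cycles_to_ns(double cycles, double clock_mhz)
{
    return cycles * 1000.0 / clock_mhz;
}

static double ns_to_cycles(double ns, double clock_mhz)
{
    return ns * clock_mhz / 1000.0;
}
```

At 33 MHz, 640 ns works out to 21.12 cycles (the figure quoted above), 15 cycles is roughly 455 ns, and 3 cycles is roughly 91 ns, which is just over the 75 ns figure mentioned for the 8255.
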
[13:44:14] <SWPadnos> heh
[13:45:35] <SWPadnos> one of the tests you did had the control '245 plus all the 8255s installed, right (ie, only the "data" '245 removed)?
[13:46:11] <jepler> yes. that setup never locked up.
[13:46:16] <SWPadnos> ok
[13:47:17] <SWPadnos> since the tiger chip got screwed sometimes (reading all FF, etc), it also could have been screwed enough to fubar the PCI bus
[13:47:56] <SWPadnos> I think the windows demo software is strictly user event driven, with the possible exception that there may be a timer used for reading port status
[13:48:20] <jepler> in one of the samples there's a 100ms timer for reads when you place the port in input mode. I don't know which sample tom1 may have been using.
[13:48:21] <SWPadnos> the delphi code also runs in ring 3, so I/O instructions will be slower
[13:49:10] <SWPadnos> a power-related problem probably wouldn't surface as often with a 100ms scan time
[13:49:20] <SWPadnos> caps would have enough time to recharge, etc
[13:51:25] <jepler> I got lockups with delays of ~2ms + printf() to console, didn't test bigger than that
[13:53:32] <SWPadnos> looks like you tried 100ms, and had no lockups
[13:53:44] <SWPadnos> but sometimes got back "non-0xff data"
[15:19:25] <skunkworks_> tomp: have you tried the hal driver for the 8255 card yet?
[15:19:26] <tomp> ello, i tried all 3 examples, bc/vb/delphi. what should i test?
[15:19:57] <tomp> oh, on linux? i thought you needed win$ testing
[15:19:59] <skunkworks_> To rule out a bad batch of cards? (jepler's and mine were bought at the same time)
[15:20:05] <skunkworks_> That too
[15:20:24] <tomp> ok, it's in the car now, can try after a trip
[15:20:33] <tomp> l8r :)
[18:26:46] <SWPadnos> hmmm. always sync before testing modified realtime/kernel code
[18:28:47] <jepler> I've thought about adding a 'sync' in halrun for just that purpose
[18:29:08] <jepler> I've put 'loadusr -W sync' in hal files too
[18:29:27] <SWPadnos> yes, either of those methods would have helped :)
[18:29:55] <SWPadnos> not much is lost. at least it seems that make clean ; make ; sudo make install will fix it
[18:30:09] <jepler> oh I always use RIP
[18:30:23] <SWPadnos> this is for an embedded system using HAL
[18:30:28] <SWPadnos> the power supply thing
[18:30:39] <jepler> oh -- fixing bugs in it or something?
[18:30:44] <jepler> I thought that project was done
[18:31:04] <SWPadnos> it was, until new specs arrived
[18:31:36] <jepler> oh I see
[18:31:47] <jepler> I hope new money arrived with the new specs
[18:32:03] <SWPadnos> it should, once I finish and bill
[18:32:39] <alex_joni> new bill :)
[19:14:22] <jepler> man I should get back to offs
[19:17:29] <alex_joni> jepler: maybe you can join efforts with awallin?
[19:17:58] <jepler> alex_joni: right now I just need to find and fix some bugs
[19:20:11] <jepler> and I need to stop taking on new projects and get back to old ones
[19:21:52] <alex_joni> heh :)
[19:23:46] <skunkworks_> quick - everyone send jepler some new hardware.. ;)
[19:24:51] <SWPadnos> right - I never got around to ordering 7i43s (then Seb Kuzminsky showed up)
[19:25:24] <jepler> yeah and I am thrilled about that
[19:25:47] <SWPadnos> heh
[20:02:26] <SWPadnos> hmmm. I wonder if it's possible to use the parallel port on the 7i43 as an I/O port - ie use USB for communication and the parport for extra I/O
[20:06:46] <tomp2> skunkworks_: could you pastebin a hal file with the desired cfg for the pci_8255?
[20:10:30] <tomp2> what is the cfg syntax for pci_8255? i tried loadrt pci_8255 cfh="0xB001 IIIIOOOOIOIO" count=1
[20:13:54] <alex_joni> tomp2: modinfo /path/to/pci_8255.ko
[20:17:49] <tomp2> thx (modinfo new 2 me )
[20:19:43] <jepler> SWPadnos: interesting question
[20:19:54] <SWPadnos> I asked Pete - we'll see what he says
[20:35:42] <tomp2> modinfo pci_8255 tells me "parm: io:I/O addresses of 8255s (array of int)"
[20:35:42] <tomp2> does it mean base address(es) of the card(s) or of the chips?
[20:35:42] <tomp2> The code says
[20:35:42] <tomp2> // relay off (active-high), cs low
[20:35:42] <tomp2> WRITE(0x10, io[i]+3, 0);
[20:35:43] <tomp2> and since there's only 1 relay, it must be the card addr right?
[20:36:09] <SWPadnos> careful when reading the code
[20:36:22] <tomp2> how so?
[20:36:38] <SWPadnos> there are some global vars, some defines that make those globals look like locals, and local vars with similar or identical names
[20:37:09] <tomp2> ok, so is it the array of card addresses or chip addresses?
[20:37:11] <SWPadnos> and those defines (like comp has, for pointer dereferencing etc) are midway through the code
[20:37:21] <SWPadnos> just look carefully ;)
[20:37:42] <tomp2> wow, thx
[20:38:08] <SWPadnos> that particular write is to a tiger320 control register, and sets the relay and chip select output states before setting the directions to output (on the next line)
[20:38:50] <SWPadnos> in that case, io is the card address, so io[i] would be the i'th card base address
[20:40:14] <SWPadnos> peteW says no, the DB25 can't be used as I/O
[22:16:18] <tomp2> http://imagebin.ca/view/2EKEb16.html loadrt pci_8255 io="111100001010" dir="0xB001"
[22:17:05] <tomp2> and all the fx's addf'd to servo thread
[22:20:41] <tomp2> crap
[22:20:57] <tomp2> http://imagebin.ca/view/2EKEb16.html loadrt pci_8255 dir="111100001010" io="0xB001"
[22:21:43] <tomp2> used ETT's suggested PCI_TREE to determine base address
[22:34:16] <jepler> dir should be of the form 0xfff where each bit specifies a direction for a certain group of pins; io is almost certain to end 00, not 01.
[22:34:56] <jepler> there's also no guarantee that the io address is the same if you ran "PCI_TREE" on a different OS or with the card in a different machine
[22:35:04] <jepler> use lspci -v in linux instead
[22:35:12] <tomp2> PCI_Tree said 0XB001
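One plausible reading of the 0xB001 value is that PCI_TREE shows the raw base address register, in which bit 0 is a flag marking I/O space rather than part of the address. That would explain why the usable port base ends in 00. The sketch below shows that masking, plus a way to pull port bases out of `lspci -v` text. The sample text is hypothetical and both function names are illustrative.

```python
import re

def io_base_from_bar(raw_bar):
    """Strip the two low flag bits of an I/O-space BAR to get the port base."""
    if raw_bar & 0x1 == 0:
        raise ValueError("bit 0 is clear: this looks like a memory BAR, not an I/O BAR")
    return raw_bar & ~0x3

def io_ports_from_lspci(text):
    """Return every 'I/O ports at <hex>' base found in lspci -v output."""
    return [int(m, 16) for m in re.findall(r"I/O ports at ([0-9a-fA-F]+)", text)]

# Hypothetical excerpt in the shape lspci -v prints; not captured from a real card.
SAMPLE = """\
00:0a.0 (hypothetical device line)
\tFlags: medium devsel, IRQ 11
\tI/O ports at b000 [size=256]
"""

print(hex(io_base_from_bar(0xB001)))
print([hex(p) for p in io_ports_from_lspci(SAMPLE)])
```

Both routes give a base ending in 00 for this example. As noted above, the address can still differ between machines and operating systems, so the value read under Linux is the one to pass as io=.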
[22:35:17] <tomp2> i can do that too
[22:35:36] <tomp2> but its now a windows box
[22:36:16] <tomp2> and the 0xffff...
[22:36:24] <tomp2> will try that too
[22:36:27] <jepler> also the realtime threads aren't running, since e.g., ...b6 and ...b6-not are both FALSE -- that would be impossible in normal usage
[22:36:45] <jepler> 0x000 will give all inputs and 0xfff all outputs, or vice versa (not sure)
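A minimal sketch of how a 12-bit dir value breaks down into individual bits. It deliberately does not say which bit value means input or output, nor which pin group each bit controls, since the log leaves both open. The helper name is illustrative.

```python
def set_bits(mask, width=12):
    """Return the positions (0 = least significant) of the bits set in mask."""
    if not 0 <= mask < (1 << width):
        raise ValueError(f"mask must fit in {width} bits")
    return [bit for bit in range(width) if mask & (1 << bit)]

for value in (0x000, 0xFFF, 0x888, 0xA59):
    print(f"{value:#05x} = {value:012b} -> set bits {set_bits(value)}")

# A string of twelve 0/1 digits can be rewritten as hex this way. Whether the
# intended bit order matches the driver's is an open question.
print(hex(int("111100001010", 2)))
```

The last line shows that a pattern such as 111100001010 would be written 0xf0a in the form dir expects, if the leftmost digit is meant as the most significant bit.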
[22:37:17] <tomp2> why arent the rt threads running ( how to check ) ?
[22:37:29] <jepler> I don't know
[22:37:46] <jepler> but the most likely way for a pin and its -not to both be FALSE is if the function is not running
[22:38:13] <tomp2> the indication of fred = !fred is why, ok
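The check described above (a pin and its -not twin should never hold the same value once the read function is running) can be sketched as below. The snapshot dictionary is a hypothetical stand-in for values read off the HAL display, and the helper name is illustrative.

```python
def suspicious_pairs(pins):
    """Return pins whose value equals that of their '-not' counterpart."""
    bad = []
    for name, value in pins.items():
        if name.endswith("-not"):
            continue
        inverted = pins.get(name + "-not")
        if inverted is not None and inverted == value:
            bad.append(name)
    return sorted(bad)

# Hypothetical snapshot: a0 looks stale (both FALSE), a1 looks live.
snapshot = {
    "pci8255.0.0.a0": False,
    "pci8255.0.0.a0-not": False,
    "pci8255.0.0.a1": True,
    "pci8255.0.0.a1-not": False,
}

print(suspicious_pairs(snapshot))
```

Any name in the result suggests the function that updates that pin has not run, which is the inference drawn in the log.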
[22:39:56] <tomp2> got a cfg string that you used handy? for next reboot to linux?
[22:40:16] <jepler> your io= will be different from mine
[22:40:28] <jepler> so I don't see that it will be that helpful
[22:42:50] <tomp2> your dir
[22:43:16] <tomp2> i'll use lspci for io
[22:52:22] <jepler> loadrt pci_8255 io=0xc000 dir=0x888
[22:52:53] <jepler> can you send me or tell me where to get one of these windows demo programs? I think I have a machine with windows that I can put the card in ...
[22:54:26] <tomp2> i can send or upload the files and a card
[22:55:16] <tomp2> i just did loadrt pci_8255 io="0xB000" dir="0xA59"
[22:55:25] <jepler> Ok, that should give a mix of inputs and outputs
[22:55:53] <tomp2> it's running and i got 3 dialogs.. registering pci*255.0.0 .. 91 b0c0 and then 2 more
[22:56:09] <jepler> dialogs?
[22:56:20] <tomp2> i still have pci8255.0.0.a0 and a0-not both false
[22:56:24] <tomp2> yes dialogs
[22:56:37] <tomp2> pop ups with OK buttons
[22:57:11] <tomp2> its running now
[22:57:48] <jepler> oh -- I guess if you're running this in emc2 those diagnostic messages will pop up as dialogs
[22:57:59] <tomp2> k
[22:57:59] <jepler> (I always just use 'halrun', I don't start a full emc .. but do whatever you know best)
[22:58:30] <tomp2> dont you want me to do anything ;) ( what i know best ...)
[22:58:42] <jepler> don't sell yourself short
[22:59:18] <tomp2> lemme see if i got the cd files on the windows partition of that box
[23:03:27] <jepler> the output of 'halcmd show thread' will probably let me tell whether the functions I'm interested in are running
[23:04:44] <tomp2> checking now
[23:13:25] <tomp2> http://pastebin.ca/1009873
[23:13:46] <tomp2> i dont see any mention of 8255
[23:14:10] <tomp2> checking addf's
[23:14:28] <jepler> yeah you'd have to addf it
[23:14:44] <tomp2> (this is sneakernet with thumbdrives)
[23:14:49] <jepler> oh I see
[23:14:56] <jepler> that's about as inconvenient as rebooting every 2 minutes
[23:15:05] <jepler> which has been my experience trying to troubleshoot this thing
[23:15:11] <tomp2> i'm getting the .hal now
[23:20:28] <jepler> I think I may have found the demo programs in exe form
[23:20:46] <jepler> http://www.ett.co.th/downloada.html ET-PC8255
[23:20:53] <jepler> (unless pc8255 and pci8255 are different products, not just typos ..
[23:23:19] <jepler> .. maybe not
[23:27:29] <jepler> no, it's for some older card, not the pci8255
[23:29:57] <jepler> hi skunkworks
[23:30:33] <tomp2> the halcmd show thread http://pastebin.ca/1009887 versus the AXIS hal Show Configuration utility http://imagebin.ca/view/A9PodecW.html
[23:30:41] <tomp2> i did have the addf's commented out
[23:31:08] <tomp2> i did take a picture of the gui showing functions not threads
[23:31:11] <jepler> OK, I can explain what you're seeing in the screenshot
[23:31:20] <jepler> that lists all the available functions, whether they've been added to a thread or not
[23:31:34] <jepler> in fact you can see by the number "0" before the name that those are not on a thread
[23:31:40] <jepler> as opposed to those that show "1" there, which are on a thread
[23:32:05] <tomp2> ok, i can paste the .hal next ( cuz i dont see why they aren't being added )
[23:32:09] <skunkworks> Hi jepler
[23:32:45] <jepler> skunkworks: tomp also has one of these cards in an emc machine and he's trying to help troubleshoot it
[23:32:53] <jepler> it's slowed down by the fact that he has to sneakernet each screenshot or paste
[23:33:01] <skunkworks> yeck :)
[23:33:24] <skunkworks> looks like he needs to add the read functions to a thread :)
[23:33:38] <skunkworks> * skunkworks act like he knows what he is doing
[23:33:45] <jepler> yep I'm pretty sure that's the missing step before he can have his machine lock up like ours do
[23:33:56] <skunkworks> heh
[23:34:30] <jepler> meanwhile I was trying to get ahold of the windows test programs .. thought I'd found them on one of those thai-language websites but it was the wrong thing
[23:35:04] <skunkworks> I saw that on the logger.. I emailed futurlec but I think it would be quicker to get it from tom
[23:35:26] <tomp2> http://pastebin.ca/1009897
[23:35:55] <tomp2> it looks to me as if i added them and they didnt get added
[23:36:22] <jepler> looks like in the screenshots the functions have "-" while your paste has "_"
[23:36:32] <tomp2> will check
[23:36:37] <jepler> though I have to admit I would expect that to error and not continue starting emc
[23:36:56] <tomp2> (dont believe what i type)
[23:43:38] <jepler> here is the hal file I use to lock up my machine: http://emergent.unpy.net/index.cgi-files/sandbox/8255.hal
[23:43:58] <jepler> just put it in your ~, change the I/O address in your favorite editor, open up a terminal, and type 'halrun 8255.hal'
[23:44:05] <jepler> after a few seconds, that reliably locks my machine up
[23:51:06] <tomp2> emc dies, pci_8255,write not found ( i had probs with write_relay & write_all so I cut to the chase and only addf'd write & read )
[23:51:43] <tomp2> should be pci_8255.write not pci_8255.0.write correct?
[23:52:01] <tomp2> urff, reading back...
[23:52:42] <jepler> there are several different write functions .. it is all a bit confusing and over-engineered. pci_8255.write-all writes all connectors and the relay. .write-relay writes the relay open. .0.write, .1.write and .2.write write one 8255 each.
[23:52:59] <jepler> that hal file worked for me exactly as I uploaded it, with emc 2.2.5.
[23:53:06] <jepler> well, "worked" as in locked up the system :-P
[23:53:17] <skunkworks> heh
[23:55:35] <tomp2> ".0.0"? two .0's ? as in pci8255.0.0.write thread from your paste ...will try that
[23:55:48] <jepler> well .. whatever my paste says ..
[23:56:01] <jepler> maybe the functions are for the individual ports of the connectors? I just remember that I made it too complicated.
[23:57:03] <skunkworks> I call it elegant

#emc-devel | Logs for 2008-05-06