anyfoo
Once again, absolutely amazing. Those are more details of a really interesting internal CPU bug than I could have ever hoped for.
Ken, do you think in some future it might be feasible for a hobbyist (even if just a very advanced one like you) to do some sort of precise x-ray imaging that would obviate the need to destructively dismantle the chip? For a chip of that vintage, I mean.
Obviously that's not an issue for the 8086 or 6502, since there are more than plenty around. But if, for example, an engineering sample ever appeared, it would be incredibly interesting to know what might have changed. But if it's the only one, dissecting it could go very wrong and you lose both the chip and the insight it could have given.[1]
Also in terms of footnotes, I always meant to ask: I think they make sense as footnotes, but unlike footnotes in a book or paper (or in this short comment), I cannot just let my eyes jump down and back up, which interrupts flow a little. I've seen at least one website having footnotes on the side, i.e. in the margin next to the text that they apply to, maybe with a little JS or CSS to fully unveil them. Would that work?
[1] Case in point: from that very errata, I don't know how rare 8086s with (C)1978 are, but it's conceivable they could be rare enough that dissolving them to compare the bugfix area isn't desirable.
molticrystal
>some sort of precise x-ray imaging that would obviate the need to destructively dismantle the chip
I don't know about a hobbyist, but are you talking about something along the lines of "Ptychographic X-ray Laminography"? [0] [1]
[0] https://spectrum.ieee.org/xray-tech-lays-chip-secrets-bare
kens
That X-ray technique is very cool. One problem, though, is that it only shows the metal layers. The doping of the silicon is very important, but doesn't show up on X-rays.
anyfoo
I haven't ever looked into it myself, but what you just pasted seems like a somewhat promising answer indeed. Except for the synchrotron part I guess? Maybe?
sbierwagen
>Except for the synchrotron part I guess? Maybe?
They were using 6.2 keV x-rays from the Swiss Light Source, a multi-billion-Euro scientific installation. 6.2 keV isn't especially energetic by x-ray standards (a tungsten-target x-ray tube will do ten times that), so either they needed monochromaticity or the high flux you can only get from a building-sized synchrotron. The fact that the paper says they poured 76 million Grays of ionizing radiation into a volume of 90,000 cubic micrometers over the course of 60 hours suggests the latter. (A fatal dose of whole-body radiation to a human is about 5 Grays. This is not a tomography technique that will ever be applied to a living subject, though there are no doubt some interesting things being done right now with frozen bacteria or viruses.)
drmpeg
When I was at C-Cube Microsystems in the mid 90's, during the bring-up of a new chip they would test fixes with a FIB (Focused Ion Beam). Basically a direct edit of the silicon.
mepian
Intel is still using FIB for silicon debugging.
userbinator
The obvious workaround for this problem is to disable interrupts while you're changing the Stack Segment register, and then turn interrupts back on when you're done. This is the standard way to prevent interrupts from happening at a "bad time". The problem is that the 8086 (like most microprocessors) has a non-maskable interrupt (NMI), an interrupt for very important things that can't be disabled.
Although it's unclear whether the very first revisions of the 8088 (not 8086) with this bug ever ended up in IBM PCs, since those revisions predate the PC by a few years, the original PC and its successors have the ability to disable NMI in the external logic via an I/O port.
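For concreteness, the interrupt-masking workaround is roughly this (a sketch in 8086 assembly; which registers hold the new segment and pointer is just an assumption for illustration):

    cli             ; mask ordinary (maskable) interrupts
    mov ss, ax      ; AX assumed to hold the new stack segment
    mov sp, bx      ; BX assumed to hold the new stack pointer
    sti             ; re-enable maskable interrupts
    ; an NMI arriving between the two MOVs ignores CLI entirely,
    ; which is exactly the hole the die patch was added to close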
_tom_
This was in the early IBM PCs. I know, because I remember I had to replace my 8088 when I got the 8087 floating point coprocessor. I don't recall exactly why this caused it to hit the interrupt bug, but it did.
anyfoo
Maybe because IBM connected the 8087's INT output to the 8086's NMI, so the 8087 became a source of NMIs (and with really trivial stuff like underflow/overflow, forced rounding etc., to boot).
Even if you could disable the NMI with external circuitry, that workaround quickly becomes untenable, especially when it's not fully synchronous.
_tom_
That sounds right. It's been a while.
acomjean
We used to disable interrupts for some processes at work (hpux/pa-risc). It makes them quasi real time, though if something goes wrong you have to reboot to get that cpu back…
psrset was the command; there is very little info on it since HP gave up on HP-UX. I thought it was a good solution for some problems.
They even sent a bunch of us to a red hat kernel internals class to see if Linux had something comparable.
azalemeth
I'd never heard of processor sets (probably because I've never used HPUX in anger), but they sound like a great feature, especially in the days where one CPU had only one execution engine on it. Modern Linux has quite a lot of options for hard real time on SMP systems, some of them Free and some of them not. Of all places, Xilinx has quite a good overview: https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/188424...
kaszanka
For those who are curious, it's the highest bit in port 70h (this port is also used as an index register when accessing the CMOS/RTC): https://wiki.osdev.org/CMOS#Non-Maskable_Interrupts
userbinator
Port 70h is where it went in the AT (which is what most people are probably referring to when they say "PC compatible" these days). On the PC and XT, it's in port A0h.
There's also another set of NMI control bits in port 61h on an AT. It's worth noting that the port 70h and 61h controls are still there in the 2022 Intel 600-series chipsets, almost 40 years later; look at pages "959" and "960" of this: https://cdrdv2.intel.com/v1/dl/getContent/710279?fileName=71...
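In code that looks roughly like this (a hedged sketch in 8086 assembly; the bit positions are as documented on the osdev page above, not something I've re-verified on hardware):

    ; AT and later: bit 7 of port 70h is the NMI mask (1 = NMI disabled);
    ; the low bits still select a CMOS/RTC register, so 80h picks register 0
    mov al, 80h
    out 70h, al     ; NMI masked
    ; PC/XT: the mask lives in port A0h instead (80h enables NMI, 00h
    ; disables it, if I recall the polarity correctly)
    mov al, 00h
    out 0A0h, al    ; NMI masked on a PC or XT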
fnordpiglet
I sincerely wish I were this person. Reading this makes me feel I’ve fundamentally failed in my life decisions.
quickthrower2
Whatever you decide to do next, please be gentle with yourself. If you are in a position to even appreciate the post, you have probably done well!
I get it. And I think I might know why: we are all smart here. Smart enough to do outstandingly well at high school. As time goes on, through university and career, the selection bias reveals itself. We are now among millions of peers, and statistically some of them will be well ahead by some measure. In short: the feeling is to be expected, and it is by design of modern life and high interconnectedness.
toofy
this is a really nice reply.
sometimes in the too often contrarian comment sections, it’s jarring to see someone’s comment recognize the human being on the other end. it’s inspiring.
i’m not the op, but your comment was kinda rad. thanks.
kens
I'm not sure if I should be complimented or horrified by this.
redanddead
why not onboard him into the initiative?
hyperman1
I am always awestruck by Ken's posts, and felt severely underqualified to comment on anything, even as a regular if basic 'Elektor' reader.
But the previous post, about the bootstrap drivers, gave me a chance. I had an hour to spare, the size of the circuit was small enough, so I just stared at it on my cell phone, scrolling back and forth, back and forth. It took a while, but slowly I started picking up some basic understanding. Big shoutout to Ken, BTW, he did everything to make it easy for mortals to follow along.
I am still clearly at the beginner level, and surely can't redo that post on my own, but there is a path forward to learn this.
If I were to redo that post, I'd advise having pen and paper ready, and printing out the circuit a few times to doodle on. And have lots of time to focus.
holoduke
I actually had some eye-openers about how a CPU works in more detail after reading this article. Is there any good material for understanding CPU design for beginners?
johannes1234321
Check out Ben Eater's series on building a breadboard computer: https://youtube.com/playlist?list=PLowKtXNTBypGqImE405J2565d...
In the series he builds all the things more or less from the ground up. And if that isn't enough, in another video series he builds a breadboard VGA graphics card.
Sirened
Hardware is fun! It's never too late :)
prettyStandard
I'm sure you could figure this out if you wanted to.
https://www.reddit.com/r/devops/comments/8wemc2/feeling_unsu...
ajross
I love these so much.
kens: were you able to identify the original bug? I understand the errata well enough, it implies that the processor state is indeterminate between the segment assignment and subsequent update to SP. But this isn't a 2/386 with segment selectors; on the 8086, SS and SP are just plain registers. At least conceptually, whenever you access memory the processor does an implicit shift/add with the relevant segment; there's no intermediate state.
Unless there is? Was there a special optimization that pre-baked the segment offset into SP, maybe? Would love to hear about anything you saw.
kens
I believe this was a design bug: the hardware followed the design but nobody realized that the design had a problem. Specifically, it takes two instructions to move the stack pointer to a different segment (updating the SS and the SP). If you get an interrupt between these two instructions, everything is deterministic and "works". The problem is that the combination of the old SP and the new SS points to an unexpected place in memory, so your stack frame is going to clobber something. The hardware is doing the "right thing" but the behavior is unusable.
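In instruction terms, the window is just the gap between two back-to-back loads (a sketch; the register choices are only illustrative):

    mov ss, ax      ; stack segment now points at the new stack...
                    ; <- interrupt lands here: the CPU pushes FLAGS, CS and IP
                    ;    at new SS : old SP, an address nobody intended to be
                    ;    a stack, clobbering whatever happens to live there
    mov sp, bx      ; ...and the stack pointer only catches up now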
ajross
Huh... well that's disappointing. That doesn't sound like a hardware bug to me at all. There's nothing unhandleable about this circumstance that I can see: any combination of SS/SP points to a "valid" address; code just has to be sure the combination always points to enough memory to store an interrupt frame.
In the overwhelmingly common situation where your stack started with an SP of 0, you just assign SP back to 0 before setting SS and nothing can go wrong (you are, after all, about to move off this stack and don't care about its contents). Even the weirdest setups can be handled with a quick thunk through an SP that overlaps in the same 64k region as the target.
It's a clumsy interface but hardly something that Intel has traditionally shied away from (I mean, good grief, fast forward a half decade and they'll have inflicted protected mode transitions on us). I'm just surprised they felt this was worth a hardware patch.
I guess I was hoping for something juicier.
pm215
The problem is that in general "code" can't do that because the code that's switching stack doesn't necessarily conceptually control the stack it's switching to/from. One SS:SP might point to a userspace process, for instance, and the other SS:SP be for the kernel. Both stacks are "live" in that they still have data that will later be context switched back to. You might be able to come up with a convoluted workaround, but since "switch between stacks" was a design goal (and probably assumed by various OS designs they were hoping to get ports for), the race that meant you can't do it safely was definitely a design bug.
caf
> In the overwhelmingly common situation where your stack started with an SP of 0, you just assign SP back to 0 before setting SS and nothing can go wrong (you are, after all, about to move off this stack and don't care about its contents).
Consider the case when the MSDOS command interpreter is handing off execution to a user program. It can't just set SP to 0 first (well, really 0xFFFF since the stack grows down), because it's somewhere several frames deep in its own execution stack, an execution stack which will hopefully resume when the user program exits.
Programs using the 'tiny' memory model, like all COM files, have SS == CS == DS so new_ss:old_sp could easily clobber the code or data from the loaded file.
munch117
> That doesn't sound like a hardware bug to me at all.
It isn't. It's a rare case of a software bug being worked around in hardware. That alone makes it fascinating.
The natural way of handling it in software would be to disable interrupts temporarily while changing the stack segment and pointer. You could just say that software developers should do that, but there's a performance cost, and more importantly: the rare random failures from the software already out there that didn't do this would give your precious new CPU a reputation for being unstable. Better to just make it work.
duskwuff
> I understand the errata well enough, it implies that the processor state is indeterminate between the segment assignment and subsequent update to SP.
The root cause was effectively a bug in the programming model, not in the implementation -- there was no way to atomically relocate the stack (by updating SS and SP). The processor state wasn't indeterminate between the two instructions that updated SS and SP, but the behavior resulting from an interrupt at that point would almost certainly be unintended, since the CPU would push state onto a stack in the wrong location, potentially overwriting other stack frames, and the architecture as originally designed provided no way to prevent this.
wbl
It's not that the state is indeterminate; it's that a programmer would have some serious trouble ensuring that it was possible to dump registers at that moment.
ajross
No, that would just be a software bug (and indeed, interrupt handlers absolutely can't just arbitrarily trust segment state on 8086 code where app code messes with it, for exactly that reason). The errata is a hardware error: the specified behavior of the CPU after a segment assignment is clear per the docs, but the behavior of the actual CPU is apparently different.
anyfoo
The CPU itself pushes flags and return address on the stack during an interrupt. If it's an NMI, you can't prevent an interrupt between mov/pop ss and mov/pop sp, which most systems will always have a need to do eventually.
Therefore, the CPU has consistent state, but an extra fix was still necessary just to prevent interrupts after a mov/pop ss.
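So on fixed silicon the safe idiom is simply to keep the two loads back to back with nothing in between, relying on that one-instruction inhibit (a sketch; whether NMI is also held off is as discussed above, I haven't verified it myself):

    pop ss          ; on patched chips, interrupts are held off
                    ; until after the next instruction
    pop sp          ; executes inside that shadow, so SS:SP is never
                    ; observed half-updated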
mjw1007
The 8086 didn't have a separate stack for interrupts.
So, as I understand it, the problem isn't that code running in the interrupt handler might push data to a bad place; the problem is that the processor itself would do so when pushing the return address before branching to the interrupt handler.
kklisura
> While reverse-engineering the 8086 from die photos, a particular circuit caught my eye because its physical layout on the die didn't match the surrounding circuitry.
Is there software that builds/reverses circuitry from die photos, or is this all manual work?
speps
Check out the Visual 6502 project for some gruesome details: http://www.visual6502.org/
The slides are a great summary: http://www.visual6502.org/docs/6502_in_action_14_web.pdf
kens
Commercial reverse-engineering places probably have software. But in my case it's all manual.
_Microft
A Twitter thread by the author can be found here:
raphlinus
Or, if you prefer Mastodon: https://mastodon.online/@kenshirriff@oldbytes.space/10941232...
kens
Usually if I post a Twitter thread, people on HN want a blog post instead. I'm not used to people wanting the Twitter thread :)
jiggawatts
Blogs are too readable.
I prefer the challenge of mentally stitching together individual sentences interspersed by ads.
IIAOPSW
Well I've got great news for you (1/4)
bhaak
Twitter threads force the author to be more concise and split the thoughts into smaller pieces. And if there are pictures attached to the tweets, that makes them even better.
It's a very good starting point, getting a summary of the topic before diving into a longer blog post, knowing what to expect and with some knowledge you probably didn't have before.
The way I'm learning best is having some skeleton that can be fleshed out. A concise Twitter thread builds such a skeleton if I didn't have it already.
ilyt
Forcing conciseness on nuanced topics usually does more harm than good. And the format of Twitter is abhorrent.
Like, even your comment would need to be split into 2 twits.
Then again I read pretty fast, so someone writing a slightly longer sentence than absolutely necessary doesn't really bother me in the first place.
kjs3
For Odin's sake, PLEASE don't add the mental complexity of just trying to read a Twitter thread on top of the mental complexity of grokking your posts. This is what blogs are for.
lazzlazzlazz
The Twitter thread is better. You get the full context of other peoples' reactions, comments, references, etc. This is lost with blogs.
It's the same reason why many people go to the Hacker News comments before clicking the article. :)
serf
>The Twitter thread is better.
let's recognize subjectivity a bit here. it's different, you may like it more -- that's great, but prove 'better' a bit better.
i've said this dozens of times, and i'm sure people are getting tired of the sentiment, but Twitter is a worse reading experience than all but the very worst blog formats for long-format content.
ads, forced long-form formatting that breaks flow constantly, absolutely unrelated commentary half of the time (from people that have absolutely no clout or experience with whatever subject), and the forced MeTooism/Whataboutism inherent with social media platforms and their trolling audience.
It gets a lot of things right with communication, and many people love reading long-form things on Twitter -- but really i'm just asking you to convince me here; there is very little 'better' from what I see: help me to realize the offerings.
tl;dr : old man says 'I just don't get IT.' w.r.t. long-form tweeting.
Shared404
I like having the twitter/mastodon thread easily linked, but 10/10 times would rather see the article first.
kelnos
I won't quite say I don't care about other people's reactions (seeing as I'm here on HN, caring about other people's reactions), but Twitter makes it annoying enough to follow the original multi-tweet material such that I greatly prefer the blog post.
MBCook
So this was all to avoid having to re-layout everything and cut new rubylith, right? And at this point was all still done by hand?
I suppose you’d have to re-test everything with either a new layout or this fix, so no real cost to save there?
retrac
> And at this point was all still done by hand?
Sort of. From what I've gathered, at this point Intel was still doing the actual layout with rubylith, yes. But in small modular sections. The sheets were then digitized and stitched into the final whole in software; there wasn't literally a giant 8086 rubylith sheet pieced together by hand, unlike just a few years before with the 8080. But the logic and circuits etc. were on paper, and there was no computer model capable of going from that to layout. The computerized mask was little more than a digital image. So a hardware patch it would have to be, unless you want to redo a lot.
Soon, designers would start creating and editing such masks directly in CAD software. But those were just giant (for the time) images, really, with specialized editors. Significant software introspection and abstraction for handling them came later. I don't think the modern approach of full synthesis from a specification really came into use until the late 80s.
rasz
The 286 was an RTL model hand-converted module by module to a transistor/gate-level schematic. AFAIR the 386 was the first Intel CPU where they fully used synthesis (work out of UC Berkeley, https://vcresearch.berkeley.edu/faculty/alberto-sangiovanni-..., co-founder of a little company called Cadence) instead of manual routing. Everything went through logic optimizers (multi-level logic synthesis) and would most likely be unrecognizable.
I found a paper on this 'Coping with the Complexity of Microprocessor Design at Intel – A CAD History' https://www.researchgate.net/profile/Avinoam-Kolodny/publica...
>In the 80286 design, the blocks of RTL were manually translated into the schematic design of gates and transistors which were manually entered in the schematic capture system which generated netlists of the design.
>Albert proposed to support the research at U.C. Berkeley, introduce the use of multi-level logic synthesis and automatic layout for the control logic of the 386, and to set up an internal group to implement the plan, albeit Alberto pointed out that multi-level synthesis had not been released even internally to other research groups in U.C. Berkeley.
>Only the I/O ring, the data and address path, the microcode array and three large PLAs were not taken through the synthesis tool chain on the 386. While there were many early skeptics, the results spoke for themselves. With layout of standard cell blocks automatically generated, the layout and circuit designers could myopically focus on the highly optimized blocks like the datapath and I/O ring where their creativity could yield much greater impact
>486 design:
A fully automated translation from RTL to layout (we called it RLS: RTL to Layout Synthesis)
No manual schematic design (direct synthesis of gate-level netlists from RTL, without graphical schematics of the circuits)
Multi-level logic synthesis for the control functions
Automated gate sizing and optimization
Inclusion of parasitic elements estimation
Full chip layout and floor planning tools
retrac
Thanks. Fascinating!
marcosdumay
I remember the Pentium was marketed as "the first computer entirely designed on CAD", well into the 90's. But I'm not sure how real that marketing message was.
eric__cartman
That's a good advertisement for 486s because they were baller enough to pull that off.
dboreham
Yes. A hardware patch.
dezgeg
And even today this kind of manual hand-patching is still done for minor bug fixes (instead of 'recompiling' the layout from the RTL code).
One motivator is that if the fixes are simple enough, the modifications can be done such that they only affect the top-most metal layer, so there is no need to remanufacture the masks for other layers, saving time and $$$.
techwiz137
Ah yes, the infamous mov ss, pop ss that caused many a debugger to fail and be detected.
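For anyone who hasn't seen the trick: because that interrupt shadow also defers the single-step trap, a debugger stepping through a pop ss doesn't get its trap until one instruction later, and code can exploit that (a hedged sketch, not tested against any particular debugger):

    push ss
    pop ss          ; the single-step trap for the next instruction is deferred
    pushf           ; runs before a single-stepping debugger regains control,
                    ; so the pushed FLAGS image may still show TF set
    ; inspecting that image (or noticing the skipped step) reveals the debugger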
klelatti
Fantastic work by Ken yet again.
It's striking how many of the early microprocessors shipped with bugs. The 6502 and 8086 both did, and it was an even bigger problem for processors like the Z8000 and 32016, where the bugs did much to hinder their adoption.
And the problem of bugs was a motivation for both the Berkeley RISC team and for Acorn ARM team when choosing RISC.
adrian_b
All current processors ship with known bugs, including all Intel and AMD CPUs, all CPUs with ARM cores, and even every microcontroller I have ever worked with, regardless of manufacturer.
The many bugs (usually from 20 to 100) of each CPU model are enumerated in errata documents with euphemistic names like "Specification Update" of Intel and "Revision Guide" of AMD. Some processor vendors have the stupid policy of providing the errata list only under NDA.
Many of the known bugs have workarounds provided by microcode updates, a few are scheduled to be fixed in a later CPU revision, some affect only privileged code and it is expected that the operating system kernels will include workarounds that are described in the errata document, and many bugs have the resolution "Won't fix", because it is considered that they either affect things that are not essential for the correct execution of a program, e.g. the values of the performance counters or the speed of execution, or because it is considered that those bugs happen only in very peculiar circumstances that are unlikely to occur in any normal program.
I recommend reading some of Intel's "Specification Update" documents to understand the difficulty of designing a bug-free CPU, even if much of the important information about the specific circumstances that cause the bugs is usually omitted.
albert_e
Fascinating!
Are there any good documentary-style videos that dive into such details of chips and microprocessor architectures?
caf
I believe this one-instruction disabling of interrupts is called the 'interrupt shadow'.