Debugging The Hard Stuff

I’m used to working on very difficult problems, like the most complicated electronic circuitry on the planet.

[Image: award from the IBM/Motorola/Apple PowerPC design team]

This award was from the IBM/Motorola/Apple PowerPC design team in Austin, Texas, which developed the brains of Macs and PlayStations for many years.

Microprocessor designs have to be perfect. They can’t make any mistakes, ever. Modern designs contain billions of transistors, and every one of them has to work perfectly – all the time. The way you become a “Debug God” is by analyzing the design through many different approaches. Any single methodology is inadequate and doomed to failure, so I developed and utilized dozens of different methodologies to make sure that no bugs slipped through. During my 20-year career designing microprocessors, no silicon bug was ever found in any part of any design I worked on.

By contrast, finding flaws with adjustments to the temperature record is like shooting fish in a barrel. The work being done would fail any reasonable high school science class. It is a complete joke, and doesn’t even vaguely resemble science.

The mistake people keep making is using the same broken methodology to verify the original broken methodology. In order to find problems, you need to analyze the data from many different angles.

No methodology is perfect, but a straight-up average of raw temperatures is the simplest, cleanest and most effective way to spot trouble. It may be less precise than other methods, but it exposes potential fundamental issues related to accuracy. Once you start gridding, adjusting and infilling, you can no longer see the forest for the trees – and you are lost.
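
For anyone who wants to run this cross-check themselves, here is a minimal sketch of a straight, unweighted average of raw readings in Python. The file name and column names are hypothetical placeholders; point it at whatever raw station export you have. The calculation involves nothing beyond summing and dividing, so it is easy to audit and easy to compare against a heavily processed series.

```python
# Minimal sketch of a straight, unweighted average of raw readings.
# "raw_station_readings.csv" and its columns (year, temp_raw) are
# hypothetical placeholders -- substitute your own raw station export.
import csv
from collections import defaultdict

def straight_average(path):
    """Unweighted mean of all raw readings per year: no gridding,
    no adjustment, no infilling."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            sums[int(row["year"])] += float(row["temp_raw"])
            counts[int(row["year"])] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}

if __name__ == "__main__":
    for year, mean in straight_average("raw_station_readings.csv").items():
        print(year, round(mean, 2))
```

Plot the result next to a published adjusted series and any divergence in trend is immediately visible.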

[Video: https://www.youtube.com/watch?v=nvlTJrNJ5lA]



27 Responses to Debugging The Hard Stuff

  1. David A says:

    “Once you start gridding, adjusting and infilling, you can no longer see the forest for the trees – and you are lost”

    Yes, absolute FUBAR.

  2. Steve Keohane says:

    Making stuff actually work is a different world from sitting at a desk imagining catastrophic results from an ill-formed theory with no measurable basis in reality, and ‘adjusting’ reality to conform to the fantasy, thus proving the fantasy. The desperation is palpable.

  3. tom0mason says:

    Well done, what an awesome piece of technology that was.
    Maybe the IC would have worked better if you averaged the doping levels across the whole device.

  4. gregole says:

    Awesome achievement! Your dedication to this blog and to the truth is a benefit to us all.

  5. Ben Vorlich says:

    Never let a designer test what he has designed; he’ll only confirm that “the algorithm is working as designed”. You have to give it to someone who is paid to break it, who wants to break it, or who has to use it.

    As someone once said of users:
    Engineers are developing better and better idiot-proof systems, but nature is developing better and better idiots.

    • _Jim says:

      That’s the reason rigorous environmental testing (including what was colloquially termed ‘shake and bake’ in the mil contract and manufacturing business) is performed on product destined for service in harsh environments, including space and satellite environs. Even commercial products normally get subjected to internal testing, such as that done by a performance engineering test group, which will hopefully find any ‘weak’ areas before customers in the field begin to phone customer support with issues they have found …

      Once a design is released to manufacturing though, the ‘cost hawks’ begin to converge and shop for the cheapest components available … this can raise other issues in the field if an improper choice of cheaper components results in out-of-tolerance operation beyond what the automated EOL (End of production Line) manufacturing tests screen for at room temperature (speaking of commercial gear; testing for space or mil apps normally includes full shake-and-bake testing of product per contract requirements).

      .

      • emsnews says:

        I used to make computer board prototypes for Texas Instruments way back in the Stone Age.

        I had to drill and finesse everything EXACTLY right or else start all over again. The slightest error in drilling, etc. meant throwing it away and sometimes it took a dozen times to get a perfect specimen.

        The data, the chemistry, the layout, the holes, the various electronic parts had to be PERFECT. No flaws, or if there was a slight flaw, in production it would ‘travel’ over time into worse and worse territory.

        All this had to be done using primitive tools! We used REAL primitive tools. I remember the day my boss came rushing in from Texas waving this small square and yelling, ‘Elaine, you are out of a job!’

        It was the first microchip.

        • PiperPaul says:

          Reminds me of the days before CAD. Making revisions to complex manually-drawn technical drawings (in my case, piping, as in refineries and the like) was so difficult that you made damn sure what you were putting on paper was correct. Often today it seems like ‘Fire, Ready, Aim’.

  6. stpaulchuck says:

    you want to use the actual readings on the thermometers?? HOW DARE YOU!? [/sarc, for those who couldn’t tell]

  7. A C Osborn says:

    The trouble is very few people want to test this particular product, certainly no “Scientists”, that is for sure.
    After all, if they found it lacking, what would they do with that information?

  8. manicbeancounter says:

    Excellent analogy. We can learn a lot from comparing and contrasting with different areas.
    Others that I like are:-

    In the criminal courts, what do you expect would happen if the defense was unable to question the evidence, particularly for data tampering and quality standards? Or if the defense lawyers were denounced for being paid to be biased? Or if guilt was decided by opinion polls of local police officers to see if they trusted the work of their colleagues?

    In the pharmaceutical industry, the scientists are of the highest quality, and have saved millions of lives through the drugs produced. Maybe we should let them decide their own testing procedures and quality control standards. Also pass laws to prevent “science deniers” from suing them when people start dropping dead or getting unpleasant side effects.

    In accountancy, examinations to achieve professional accreditation are rigorous. What right had “inexpert mouths” (Stephen Lewandowsky’s phrase) to question their work and ethical standards post Enron?

  9. Tel says:

    Every generation of Intel 386 and descendants had bugs, but anyhow people just worked around the bugs in software. The market still prefers Intel. Even Apple moved to Intel, eventually.

    Also, I think the ARM took over the PPC position as a cool running, low power device. That left the PPC in a kind of middle ground, not the first choice for either mobile devices or servers. I guess there’s a bit of specialist industrial stuff that PPC is still holding onto. I’m told IBM still use a lot of them.

    That said, more competition is good. Intel probably would never have pushed ARM to where it is if they weren’t feeling pressure from their lack of presence in the embedded area.

    • Microprocessors don’t have a software layer to shield them, and I worked for Intel for many years designing Itanium and i7.

    • _Jim says:

      re: Tel July 13, 2014 at 9:59 pm
      Every generation of Intel 386 and descendants had bugs, but anyhow people just worked around the bugs in software.

      This was documented where? So that the assembler-writing folk and the compiler-writing folk (for C, FORTRAN, etc. compilers) can, you know, write their ‘product’ around those documented ‘deficiencies’ …

      You’re aware some code can still be written in assembler for drivers and the like? So, it would be important to know which specific instructions to avoid, or sequence of instructions to avoid.

      You may be remembering this story:
      – – – – –
      The Pentium Chip Story: A Learning Experience
      by Vince Emery

      June 1994: Intel testers discover a division error in the Pentium chip. Intel managers decide that the error will not affect many people and do not inform anyone outside the company.

      The same month, Dr. Thomas R. Nicely, a professor of mathematics at Lynchburg College, Virginia, notices a small difference in two sets of numbers. He double-checks all his work by computing everything twice, in two different ways. Dr. Nicely spends months successively eliminating possible causes such as PCI bus errors and compiler artifacts.

      Wednesday, October 19: After testing on several 486 and Pentium-based computers, Dr. Nicely is certain that the error is caused by the Pentium processor.

      Monday, October 24: Dr. Nicely contacts Intel technical support. Intel’s contact person duplicates the error and confirms it, but says that it was not reported before.

      Sunday, October 30: After receiving no more information from Intel, Dr. Nicely sends an email message to a few people, announcing his discovery of a “bug” in Pentium processors. (Dr. Nicely’s original email message)

      The speed at which events develop from that email message graphically illustrates the nature of public relations on the Internet. This is how PR works today. Businesses of all kinds, take note.

      That same day, Andrew Schulman, author of Unauthorized Windows 95, receives Dr. Nicely’s email.


      – – – – – – .

      Moral of the story (at least for Intel): ‘Come clean’ and come clean early when it comes to error disclosure …

      .
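
      For context, a minimal sketch of the kind of one-line self-check that Dr. Nicely’s find made famous: the two constants are the widely reported FDIV trigger operands, and the flawed early Pentiums reportedly returned a residual of 256 where a correct divider returns essentially zero.

      ```python
      # Minimal sketch of the classic FDIV self-check. The operands are the
      # widely reported trigger pair; a correct divider returns a residual of
      # 0.0 (or at worst a rounding-level value), while the flawed early
      # Pentiums reportedly returned 256.
      x = 4195835.0
      y = 3145727.0
      print("residual:", x - (x / y) * y)
      ```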

      • You can’t have software workarounds for processors running Windows, because they have to run legacy code.

        • _Jim says:

          That’s the big difference between the Mac crowd and the Windows crowd: the latter crowd (like me) still run legacy DOS apps on occasion to calculate something obscure like the parameters of a Log Periodic antenna with S/W written by somebody back in the ’80s …

    • I once convinced the CEO of ARM to fund me to develop a dual ARM/x86 chip – but his lawyer nixed it because they made most of their money off Intel licensing.

  10. Chewster says:

    Unfortunately excellent designs don’t always lead to excellent final products.
    Many manufacturers lack quality control and huge problems arise from poor layout and final circuit assembly/soldering at the micro-miniature level.

    • _Jim says:

      With today’s modern reflow ovens and self-cleaning fluxes, those issues are moot for all but a few ‘houses’ that might be using out-of-date solder paste or ‘boards’ that didn’t have the proper finish plating; I went down that road of layout and design and manufacturing with an RF customer we had that was basically doing WiMax (before the ‘standard’ was established) on the 1.9 thru 3.7 GHz bands in the mid-to-late 2000s time frame.

  11. Scott says:

    Anyone recognize the name “Route 66 (604 or better)”? I was able to run it easily with a 603ev at 180 MHz with ample RAM and a 1MB L2 cache (biggest available at the time)…but when it first came out, 180 MHz wasn’t available (except for maybe the 604).

    -Scott

  12. squid2112 says:

    I write software/firmware for embedded devices (Gateways & Energy Servers on ARM processors). We implement more testing mechanisms than we do development. We must be perfect when dealing with high energy systems (Terawatt and more). One tiny error and someone can die. I completely agree with your “debugging” methodology as I (we) do the same daily. It is a must. We have zero tolerance at all times.

  13. pwl says:

    However lofty or useful “gridding, adjusting and infilling” might be, it is data fabrication, and the more steps of “gridding, adjusting and infilling” you do, the less connection the result has with the objective reality of Nature. The more a scientist’s work relies on data fabrication, the more likely it is scientific fraud, and if they received monies as a result, financial fraud.

    If the data needs gridding data fabrication, the solution is simple: add more sensors in the areas where they are lacking (or use satellite-based sensors if available). Oh, and just don’t grid. It’s an illusion anyhow, as Nature isn’t gridded. Please show me the regular grid lines that Nature made. Please. There are none; grids are just made because lazy programmers and lazy scientists like things in neat matrices for doing math operations to further fabricate the data. Nature isn’t gridded; learn to program without grids. Grids are one of the problems with climate models: Nature isn’t gridded, but climate models are. Any climate model that uses “gridding” is utter nonsense.

    If the data needs adjusting data fabrication, then don’t do it. If you do adjustment data fabrication, it must be labeled as such, and if it is ever presented as “temperature data” when it’s really fantasy-adjusted data, and you receive money as a result, that is not just scientific fraud but financial fraud as well. Just do not do data adjustments. They are merely a way to avoid the harsh objective reality of Nature by fabricating a fantasy reality as if it’s magically real somehow.

    If the data needs “infilling”, the solution is simple: add more sensors in the areas where they are lacking (or use satellite-based sensors if available). Infilling of data, or as it is more accurately called, fabricating fraudulent data, is scientific fraud.

    Gridding is data fabrication. It’s fantasy.

    Adjusting the data is data fabrication. It’s fantasy.

    Infilling the data is data fabrication. It’s fantasy.

    [The most serious limitation with climate models is that the climate systems are composed of many simple systems that exhibit complex behavior due to the internal randomness that these simple systems generate. Note this internal randomness is a newly discovered form of randomness and is quite distinct from the other forms.

    Therefore these simple systems that exhibit complex behavior due to internal randomness can never be predicted; one can only observe, measure and record their behavior. Yes, these systems can never be predicted. Never.

    When you attempt to apply “classical science” to simple systems that generate internal randomness, you will inevitably fail due to their inherent randomness and thus their inherent unpredictability. This is why classical science is failing to comprehend the climate: the climate is made of many simple systems that generate complex random behavior. Note this is not the randomness of external systems; that also exists in climate systems and is yet another reason climate systems are impossible to predict.

    To learn a lot more about how classical science isn’t up to modeling the climate, and for the proof regarding this new kind of internal randomness that some simple systems generate, see Chapter Two of Stephen Wolfram’s A New Kind Of Science, http://pathstoknowledge.net/2010/08/25/wolframs-a-new-kind-of-science-upsets-climate-models-and-weather-forecasting ]

    “Two important characteristics of maps should be noticed. A map is not the territory it represents, but, if correct, it has a similar structure to the territory, which accounts for its usefulness.” – Alfred Korzybski

  14. Bryan says:

    Okay, but if some areas are overrepresented with sensors relative to other areas, isn’t there some objective way to avoid giving the overrepresented areas too much weight? Can’t this be done without data fabrication? I thought that was what gridding was about.
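
    For what it’s worth, here is a minimal sketch of the kind of grid-cell averaging Bryan is describing, using made-up station coordinates: readings are binned into latitude/longitude cells so that a cluster of nearby stations counts once rather than several times. Real gridded products also weight each cell by its area (roughly the cosine of latitude), which this toy version omits.

    ```python
    # Minimal sketch of grid-cell averaging with made-up stations.
    # Three clustered stations plus one remote station show how a straight
    # average and a gridded average can differ.
    from collections import defaultdict

    def gridded_mean(readings, cell_deg=5.0):
        """readings: iterable of (lat, lon, temp). Average within each
        cell_deg x cell_deg cell first, then average the cell means
        (no area weighting in this toy version)."""
        cells = defaultdict(list)
        for lat, lon, temp in readings:
            cells[(lat // cell_deg, lon // cell_deg)].append(temp)
        cell_means = [sum(v) / len(v) for v in cells.values()]
        return sum(cell_means) / len(cell_means)

    readings = [(41.0, -76.0, 12.0), (41.2, -76.1, 12.2),
                (41.3, -76.2, 12.1), (65.0, -150.0, -2.0)]
    print("straight average:", sum(t for _, _, t in readings) / len(readings))
    print("gridded average: ", gridded_mean(readings))
    ```

    With the three clustered stations collapsed into one cell mean, the cluster no longer dominates the overall average.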
