What to check
- Check power and GND connectivity on all ICs and discretes
- Check Vcc with a scope, not a meter
- Check clock connectivity
- Is the oscillator running: X2 pin should be toggling
- Is the processor out of RESET: Reset pin should be low
- Activity on continuously active signals:
- AD (LSBs are best)
- A (LSBs are best)
- Address and data lines: Are bus lines shorted together? Probe all address and data lines
- All ICs – Chip selects, WR and RD
- Transceivers' voltage levels OK?
- HOST settings
- Port settings (COM, LPT), etc
- Chips in their sockets
- It's common for a pin to bend under the chip and appear to go into the socket
- Not done yet?
- Timing analysis (data sheets, LA etc)
- Analyze data transfers (LA)
- Control registers
- Should be correctly initialized/programmed
- Make sure you are protecting all your registers.
- Make sure you restore protected values: the same number of values, in the reverse order.
- Use RETI not RET: Otherwise your interrupt will execute only once and then mysteriously stop executing
- Enable only one interrupt at a time, and even if it seems to work spend a few minutes checking its operation. This is a lot faster than searching for the cause of some weird problem later when a dozen interrupts are flying around.
- Does the stack balance: the stack just after the ISR return must be identical to the stack just before it
- Observe the behavior to find the bug (determine the bug's symptoms). Often you first need to reproduce it at all (for example, if it was reported by a client from the field) and then make it "happen" often enough (e.g. more often than once a day). It often helps to "overload" the application: activate as many inputs (including communication channels) as possible at the same time and as quickly as possible.
- Gain as much information as possible about it
- Think about possible “reasons” that can cause the problem seen (Round up the usual suspects)
- Generate a hypothesis based on cause and effect
- Generate an experiment to test the hypothesis (what’s wrong and why)
- Fix the bug (Not a quick and dirty fix)
- Provide proof that the change really fixed the problem (Not merely because the symptom has disappeared)
- Look for ways to find any similar problems.
- Stay focused on a single problem at a time.
- Zealot's regard for the scientific method
- Iterative loop of focus, hypothesis, and experiment.
- Never figure anything is working right until proven by repeated experiment
- Problems that mysteriously go away tend to mysteriously come back unless you can prove that the change really fixed the problem
- There is no cosmic conspiracy against you!
- Most problems have simple causes and solutions
- Know the electrical properties of the components in your system: Absolute maximum ratings of currents and voltages; Maximum frequency of operation; ESD
- Analytical thinking: Think about possible “reasons” that can cause the problem seen; Check the input to the system – your design and its schematics (remember GIGO)
- Problem domain identification: Eliminate hardware issues. Then it’s the software’s turn
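The register-protection and stack-balance points in the list above can be illustrated with a small host-side model. This is only a sketch: `cpu_t`, `push`, `pop`, the two ISR functions and `isr_preserves_state` are hypothetical names invented for illustration, and the three "registers" merely stand in for whatever your real ISR saves. It shows that restoring in the reverse order of saving leaves registers and stack depth unchanged, while restoring in the same order does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU model for illustration: three registers and a small stack. */
typedef struct {
    uint8_t acc, b, psw;   /* registers an ISR might clobber */
    uint8_t stack[16];
    uint8_t sp;            /* number of bytes currently on the stack */
} cpu_t;

static void push(cpu_t *c, uint8_t v)
{
    assert(c->sp < sizeof c->stack);
    c->stack[c->sp++] = v;
}

static uint8_t pop(cpu_t *c)
{
    assert(c->sp > 0);
    return c->stack[--c->sp];
}

/* Correct ISR: saves three registers, restores them in reverse order. */
void isr_balanced(cpu_t *c)
{
    push(c, c->acc);
    push(c, c->b);
    push(c, c->psw);

    c->acc = 0xAA; c->b = 0xBB; c->psw = 0xCC;   /* ISR body clobbers registers */

    c->psw = pop(c);
    c->b   = pop(c);
    c->acc = pop(c);
}

/* Buggy ISR: restores in the same order it saved, so values swap registers. */
void isr_wrong_order(cpu_t *c)
{
    push(c, c->acc);
    push(c, c->b);
    push(c, c->psw);

    c->acc = 0xAA; c->b = 0xBB; c->psw = 0xCC;

    c->acc = pop(c);
    c->b   = pop(c);
    c->psw = pop(c);
}

/* Returns 1 if the ISR leaves registers and stack depth exactly as it found them. */
int isr_preserves_state(void (*isr)(cpu_t *))
{
    cpu_t before = { .acc = 1, .b = 2, .psw = 3, .sp = 0 };
    cpu_t after = before;

    isr(&after);
    return after.acc == before.acc && after.b == before.b &&
           after.psw == before.psw && after.sp == before.sp;
}
```

Calling `isr_preserves_state(isr_balanced)` returns 1 and `isr_preserves_state(isr_wrong_order)` returns 0. Note that the buggy version still balances the stack, so a stack-depth check alone would not catch it; the register values have to be compared as well.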
Always design your system with debug in mind!
- provide a few spare pins that you can watch on a scope, or attach a few LEDs; have the software write easily recognisable patterns to them at key points. Pull the spare pins out to a pad. BTW that pad does come in mighty useful if you have to make a "forgotten" connection to a pin.
- if your chip has JTAG debug, provide access to it! For production, you could just not fit the actual connector - but keep the access there.
- have the facility to write debug "trace" messages to a serial port.
- set aside some memory for statistics recording, post-mortem info, etc - and provide a means to access it!
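One possible shape for the "set aside some memory" idea is a small ring buffer of event codes that the firmware writes at key points and that can be read out afterwards. The sketch below is an assumption-laden illustration, not a prescribed design: `trace_log_t`, `trace_record`, `trace_dump`, `g_trace` and the size `TRACE_SLOTS` are all hypothetical, and how the buffer is placed in memory that survives a reset, or read out over a serial port or JTAG, depends on your target.

```c
#include <assert.h>
#include <stdint.h>

#define TRACE_SLOTS 8u   /* assumed size; must be a power of two */

typedef struct {
    uint16_t event[TRACE_SLOTS];  /* the most recent event codes */
    uint32_t total;               /* events recorded so far (assumed not to wrap) */
} trace_log_t;

/* On a real target this would live in RAM reserved for post-mortem data. */
trace_log_t g_trace;

/* Record one event code, overwriting the oldest entry once the buffer is full. */
void trace_record(trace_log_t *t, uint16_t code)
{
    t->event[t->total & (TRACE_SLOTS - 1u)] = code;
    t->total++;
}

/* Copy the retained events, oldest first, into out; returns how many were copied. */
uint32_t trace_dump(const trace_log_t *t, uint16_t *out, uint32_t out_len)
{
    uint32_t kept  = t->total < TRACE_SLOTS ? t->total : TRACE_SLOTS;
    uint32_t first = t->total - kept;   /* sequence number of the oldest kept event */
    uint32_t n = 0;

    assert(out != 0);
    for (uint32_t i = 0; i < kept && n < out_len; i++) {
        out[n++] = t->event[(first + i) & (TRACE_SLOTS - 1u)];
    }
    return n;
}
```

In use, the firmware would call something like `trace_record(&g_trace, 0x0101)` at key points, and a debug command or post-mortem handler would call `trace_dump` and send the result out over the serial port. Keeping `total` alongside the buffer also gives a cheap statistic: how many events occurred, even after the oldest ones were overwritten.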
Keep track of the changes you make to the program. Look for the bug in those parts first.
- Experience shows that most bugs originate in recently developed parts of the program. Use version management programs such as CVS and SVN, which allow you to roll back to, and compare against, older (known good) versions.