Circuit Card Academy

Module 06

Troubleshooting Methodology

Tools and component knowledge are ammunition; methodology is marksmanship. This module is the universal process that makes you efficient (fewest measurements to the answer) and accurate (the answer is actually the root cause). Memorize the sequence; the rest of your career is refining it.

The sequence

1. Understand the complaint → 2. Look → 3. Power-off checks → 4. Power-on: rails → 5. Heartbeat (clock & reset) → 6. Localize (half-split) → 7. Confirm the component → 8. Root-cause → 9. Repair, verify, document

1. Understand the complaint

Read the work order / ATE failure ticket / squawk completely before touching the board. What test failed, with what measured value, against what limit? Intermittent or hard? Any history (repeat offender)? The failure ticket tells you where to look — it does not always tell you what is broken (a "failed resistor" at ICT may be a cracked solder joint, a bent test pin, or a neighbor loading the net — see 09 — Teradyne Machines & Automated Test (ATE)). Define "known good": get the schematic, the test spec (expected voltages/currents/waveforms), and ideally a golden board for comparison.

2. Look (and smell)

Five disciplined minutes under magnification beats an hour of probing. Aerospace shops will give you a stereo microscope — use it. Hunt for:

3. Power-off checks

4. Power-on: rails first

5. Heartbeat: clock and reset

For anything with a processor/FPGA: rails good → clock running (scope) → reset releasing properly (scope, single-shot on power-up). No clock or perpetual reset explains almost any "board is totally dead but rails are fine" complaint. See 07 — Digital Board Troubleshooting.

6. Localize: half-split / signal tracing

The efficiency engine. The board is a signal chain: stimulus in one end, response out the other.

7. Confirm the component

Before condemning a part: isolate it (lift a leg / remove it) and verify it actually fails out of circuit, or prove it with an in-circuit measurement that has no alternative explanation. The joint, the via, and the trace are suspects of equal rank with the component — "R47 open" at the tester is just as often "R47's pad cracked."

8. Root-cause

Ask why it failed before closing:

9. Repair, verify, document

Failure statistics — where to bet first

Industry experience across electronics reliability consistently ranks the usual suspects roughly:

  1. Solder joints and interconnects (joints, vias, traces) — the plurality of all board faults, rising with thermal cycles and vibration (aerospace life is exactly that)
  2. Connectors and contacts — fretting, corrosion, bent pins, cable strain
  3. Capacitors — electrolytics dry out, tantalums short, MLCCs crack
  4. Mechanical/electromechanical — relays, switches, fans
  5. Semiconductors — usually killed by something else: overvoltage, ESD, heat, the shorted cap next door

So: the boring physical stuff first, exotic IC failures last. The microscope and the continuity beeper out-diagnose the logic analyzer most days.

Efficiency habits

Self-check

  1. Recite the 9-step sequence from memory.
  2. Board draws 10× expected current at power-up. Which steps does this prune?
  3. ATE says C31 failed, measuring 2nF vs 100nF expected. Give three explanations beside "bad cap."
  4. What's the fastest way to halve a 12-stage signal chain fault?
  5. Why must a burnt resistor never be just replaced and shipped?

Next: 07 — Digital Board Troubleshooting