Module 07
Digital Board Troubleshooting
Digital boards look intimidating — hundreds of identical-looking ICs — but they obey a brutally simple hierarchy: power → clock → reset → bus activity → logic. Faults at each level produce recognizable signatures. Work the hierarchy top-down and most "dead computer board" mysteries fall in minutes.
1. Logic levels — the vocabulary
A digital signal is a voltage interpreted as 1 or 0 against thresholds:
| Family | Supply | Input reads LOW below | Input reads HIGH above | Notes |
|---|---|---|---|---|
| TTL / LVTTL | 5V / 3.3V | 0.8V | 2.0V | The classic thresholds |
| 5V CMOS (HC etc.) | 5V | ~1.5V (0.3×Vdd) | ~3.5V (0.7×Vdd) | HIGH threshold sits well above TTL's, so a TTL-level output may not register as HIGH |
| 3.3V LVCMOS | 3.3V | 0.8V | 2.0V | Ubiquitous now |
| Lower rails | 2.5/1.8/1.2V | proportional | proportional | Core rails of FPGAs/CPUs |
The space between thresholds is undefined — a signal loitering there (e.g., 1.4V on a 3.3V input) is a defect signature in itself:
- Floating input (broken trace/joint, missing pull resistor) — drifts, picks up noise
- Bus contention — two outputs fighting; also shows as runt pulses and a warm chip
- Dead driver sagging under load, or a partial short
5V-tolerant vs not: driving 5V into a non-tolerant 3.3V input forward-biases the pin's clamp diode into the 3.3V rail and can kill the I/O cell. Check for this whenever a mixed-rail board fails at an interface.
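As a rough illustration of the threshold table, the sketch below classifies a static voltage as LOW, HIGH, or UNDEFINED. The threshold numbers come from the table above; the dictionary keys and function names are made up for this example, and real parts should be checked against their datasheets.

```python
# Thresholds (V_IL, V_IH) copied from the table above.
# The key names are illustrative, not official family designators.
THRESHOLDS = {
    "TTL": (0.8, 2.0),
    "LVTTL": (0.8, 2.0),
    "LVCMOS33": (0.8, 2.0),
}


def cmos_thresholds(vdd):
    """Rule-of-thumb CMOS thresholds from the table: 0.3*Vdd and 0.7*Vdd."""
    return (0.3 * vdd, 0.7 * vdd)


def classify(voltage, v_il, v_ih):
    """Classify a static voltage against an input's LOW/HIGH thresholds."""
    if voltage <= v_il:
        return "LOW"
    if voltage >= v_ih:
        return "HIGH"
    # Between the thresholds the input is not guaranteed to read either way.
    return "UNDEFINED"


if __name__ == "__main__":
    v_il, v_ih = THRESHOLDS["LVCMOS33"]
    print(classify(1.4, v_il, v_ih))                   # the 1.4V example from the text
    print(classify(2.4, *cmos_thresholds(5.0)))        # a TTL-ish high into 5V CMOS
```

A reading that lands in UNDEFINED is only meaningful if the node is static; a toggling signal measured with an averaging meter can land there too.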
2. The signal types you'll meet
- Clock — continuous square wave; the heartbeat. Sources: crystal + IC oscillator circuit, or a packaged oscillator module (4-pin can: power, ground, output, sometimes enable).
- Reset: usually active-low (/RST, RESET̄). A supervisor IC holds it low until rails are stable, then releases. Watchdog timers yank it again if firmware stops petting them.
- Chip select / enable (/CS, /OE, /WE): active-low gating signals. A device that's never selected never answers; a select stuck active causes contention.
- Buses: parallel (address/data, mostly legacy/inter-chip) and serial:
- UART/RS-232/RS-422/485: idle high (logic level), start bit low, async. RS-422/485 are differential pairs, common on aerospace boxes, as are ARINC 429 (aircraft data bus, differential, distinctive bipolar waveform) and MIL-STD-1553 (transformer-coupled differential bus on military platforms). You don't need to master these protocols to repair around them; you need to recognize healthy vs dead activity on them and check the transceivers/transformers.
- I²C: SCL+SDA, open-drain, pull-ups, idle high. Stuck-low line = hung device or missing pull-up.
- SPI: SCLK, MOSI, MISO, /CS — bursts of clock with data, push-pull, fast.
- JTAG (IEEE 1149.1): the test access port (TCK/TMS/TDI/TDO) — boundary scan lets ATE wiggle and read IC pins from inside the chips, which is how testers verify BGA connections nobody can probe. Know which connector it is; the Teradyne may use it (09 — Teradyne Machines & Automated Test (ATE)).
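To make "idle high, start bit low" concrete, here is a minimal sketch of the logic levels one UART frame puts on the line, which is the pattern to look for on the scope. It assumes the common 8N1 framing (8 data bits, no parity, 1 stop bit, LSB first) at logic level; the board in front of you may use different settings, and RS-232 line voltages are inverted relative to these logic levels. Function names are illustrative.

```python
def uart_frame_levels(byte, data_bits=8):
    """Logic levels for one frame in time order (1 = high, 0 = low).

    Assumes no parity and one stop bit; data is sent LSB first.
    """
    if not 0 <= byte < (1 << data_bits):
        raise ValueError("byte does not fit in the given number of data bits")
    start = [0]                                          # start bit pulls the idle-high line low
    data = [(byte >> i) & 1 for i in range(data_bits)]   # LSB first
    stop = [1]                                           # stop bit returns the line to idle high
    return start + data + stop


def bit_time_us(baud):
    """Duration of one bit in microseconds at the given baud rate."""
    return 1_000_000 / baud


if __name__ == "__main__":
    # 0x55 ('U') gives an alternating pattern that is easy to spot on a scope.
    print(uart_frame_levels(0x55))
    print(f"{bit_time_us(9600):.1f} us per bit at 9600 baud")
```

Measuring the narrowest pulse on the scope and comparing it with `bit_time_us` for the usual baud rates is a quick way to judge whether the chatter is plausible serial traffic.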
3. The top-down ritual for a "dead" digital board
- Rails (06 — Troubleshooting Methodology §4): every rail present, in tolerance, clean. Modern boards have power sequencing requirements — rails must come up in order; a sequencer or enable daisy-chain that stalls leaves later rails at 0V with nothing "broken." Check enable pins of dead regulators.
- Clock: scope the oscillator output. Right frequency, healthy amplitude? No clock → oscillator power/enable → crystal and its load caps → replace crystal (mechanically fragile; prime suspect after drops/vibration).
- Reset: scope it through a power cycle (single-shot). Must release. Stuck low → supervisor IC, sagging rail (supervisor is doing its job), or a shorted reset net. Cycling repeatedly → watchdog: processor is crashing — go look at memory bus, core rails under load, firmware integrity.
- Activity: with power+clock+reset good, a working processor does things: bus bursts after release, chip selects strobing, status LEDs, UART chatter. Total silence with a good heartbeat = processor/firmware/memory — on a repair bench this is where boundary scan/functional ATE earns its keep, or where the diode-signature comparison against a golden board (04 — DMM Mastery §5) hunts dead I/O.
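The ritual above can be sketched as a small decision procedure: classify the reset trace, then report the first level of the hierarchy that fails. This is only an illustration of the ordering; the `Observations` fields, the function names, and the idea of feeding in sampled reset levels are assumptions for the example, not part of any test system.

```python
from dataclasses import dataclass


def classify_reset(samples):
    """Classify a reset line from logic levels (0/1) sampled in time order from power-up."""
    if not samples:
        raise ValueError("no samples")
    if 1 not in samples:
        return "stuck_low"
    first_high = samples.index(1)
    # Any return to low after release counts as cycling in this simple sketch.
    if 0 in samples[first_high:]:
        return "cycling"
    return "released"


@dataclass
class Observations:
    rails_ok: bool       # every rail present, in tolerance, clean
    clock_ok: bool       # right frequency, healthy amplitude
    reset_state: str     # "released", "stuck_low", or "cycling"
    activity_seen: bool  # bus bursts, chip selects, LEDs, UART chatter


def next_step(obs):
    """Return the first failing level of the hierarchy and where to look."""
    if not obs.rails_ok:
        return "Rails: check sequencing and the enable pins of dead regulators"
    if not obs.clock_ok:
        return "Clock: check oscillator power/enable, then crystal and load caps"
    if obs.reset_state == "stuck_low":
        return "Reset: check supervisor IC, sagging rail, shorted reset net"
    if obs.reset_state == "cycling":
        return "Reset: watchdog firing; check memory bus, core rails under load, firmware"
    if obs.reset_state != "released":
        raise ValueError(f"unknown reset state: {obs.reset_state!r}")
    if not obs.activity_seen:
        return "Activity: suspect processor/firmware/memory; boundary scan or golden-board comparison"
    return "Hierarchy passes: move on to logic-level fault signatures"


if __name__ == "__main__":
    state = classify_reset([0, 0, 1, 1, 0, 1, 1, 0])
    print(next_step(Observations(True, True, state, False)))
```

The function returns at the first failure on purpose: a fault higher in the hierarchy makes every observation below it unreliable.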
4. Digital failure signatures
| Symptom | Likely causes |
|---|---|
| Output stuck high or low regardless of input | Dead driver stage in IC; shorted net (to rail or ground); input side never toggling — trace upstream first |
| Mid-level voltage (~half rail) on a push-pull line | Contention (two drivers), or measuring a fast signal with a DMM (it averages! — scope it before declaring weirdness) |
| Runt pulses | Contention, weak driver, cracked joint making intermittent contact |
| One bit of a bus dead, others fine | Open trace/via/joint on that line, bent pin, ESD-killed pin |
| Adjacent pins shorted | Solder bridge (rework history?), dendrite, conformal-coat-hidden whisker |
| Works cold, dies warm (or inverse) | Cracked joint/via, marginal IC — freeze spray + heat gun to localize |
| I/O dead only on one connector | ESD/overvoltage entered there — check transceivers and series protection (these interface parts are sacrificial by design; transceiver replacement is the most routine of digital repairs) |
| Random crashes/watchdog resets | Rail ripple/sag under load, marginal clock, failing memory, intermittent joint |
DMM on logic — know its limit: a DMM averages. A 50% duty 3.3V clock reads ~1.65V DC, which looks exactly like contention. Anything that might be toggling gets the scope, not the meter. The DMM's digital jobs are: rails, continuity of bus lines (power off), junction signatures, and stuck-at levels confirmed static by scope first.
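The averaging arithmetic is simple enough to sketch. The function below assumes an ideal rectangular waveform and a meter that reports the true mean; the name `dmm_average` is just for illustration.

```python
def dmm_average(v_high, duty, v_low=0.0):
    """Mean voltage of an ideal rectangular wave: what an averaging DC meter would show."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    return v_low + duty * (v_high - v_low)


if __name__ == "__main__":
    # 50% duty, 3.3V swing: the case described in the text.
    print(f"{dmm_average(3.3, 0.5):.2f} V")
    # An active-low strobe that is low only 10% of the time reads close to the rail.
    print(f"{dmm_average(3.3, 0.9):.2f} V")
```

The second case is the reverse trap: a line that strobes low only briefly can read near the rail on the meter and look stuck high, which is another reason anything that might be toggling gets the scope.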
5. ESD discipline is a digital-board survival rule
Modern CMOS dies at static levels you can't feel (damage threshold far below the ~3kV human perception threshold). Worse than instant death is the walking wounded part: ESD-degraded, passes today, fails on the aircraft. This is why aerospace ESD rules are absolute, not ceremonial — wrist strap, dissipative mat, grounded iron, parts in shielded bags until installation. Full discipline in 10 — Aerospace Standards, ESD, and Workmanship.
6. Repairing around big silicon
You will rarely "fix" a processor — you'll prove the environment around it (rails, clock, reset, interfaces) good, prove its connections good (boundary scan / signature comparison), and replace it only when it's the last suspect standing. BGA replacement is specialist rework (hot-air/IR station, profiles, X-ray verification) — your shop will have a process and a designated station; your diagnostic job is to be sure before that expensive step.
7. Self-check
- Recite the hierarchy. Answer: power → clock → reset → activity → logic.
- A 3.3V line reads 1.6V on the DMM. Name the two very different explanations and the tool that separates them. Answer: a healthy toggling clock averaging out, vs contention/float; the scope separates them.
- Reset pulses low every 1.6 seconds. Answer: watchdog, meaning the processor is crashing; check rails under load, clock, memory.
- I²C bus: SDA permanently low. Answer: hung slave holding it, shorted net, or dead pull-up; do a power-cycle test, then isolate devices.
- Why does ATE use boundary scan on BGA boards? Answer: solder balls are unprobeable and invisible; 1149.1 tests the connections electrically from inside the ICs.