
Safe Refactoring with Property-Based Testing

The scenario is classic: a legacy function f, tangled, difficult to maintain, but whose behavior is correct (or at least accepted as such by system users).

We want to rewrite it as a cleaner, more performant, better structured version f'. The risk is obvious: introducing a subtle regression, a forgotten edge case, a silent divergence.

The traditional approach is to write unit tests covering known cases, but how do we ensure we haven't omitted a case? Property-based testing offers a much stronger guarantee: verifying that f(x) = f'(x) for thousands of randomly generated values.

Legacy Code as Oracle

This property, namely equivalence between the old and new implementation, is disarmingly simple to state.

We don’t describe what the function should do; we simply assert that the new version does exactly the same thing as the old one. The legacy code becomes its own oracle, its own executable specification. This is knowledge extraction: the old code’s behavior is the specification, even when nobody remembers why it works that way.
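The idea fits in a few lines. Here is a minimal sketch in Python, with a hypothetical legacy function and its rewrite; the hand-rolled loop stands in for what a real PBT library (Hypothesis, fast-check, jqwik…) would do for us:

```python
import random

# Hypothetical legacy implementation f: convoluted but accepted as correct.
def legacy_count_vowels(s: str) -> int:
    total = 0
    for ch in s:
        if ch in "aeiou" or ch in "AEIOU":
            total += 1
    return total

# Candidate rewrite f': cleaner and clearer.
def new_count_vowels(s: str) -> int:
    return sum(1 for ch in s if ch.lower() in "aeiou")

# The equivalence property: for any input, f(x) == f'(x).
def check_equivalence(runs: int = 10_000) -> None:
    rng = random.Random(42)  # seeded for reproducibility
    for _ in range(runs):
        s = "".join(rng.choice("aeiouxyzAEIOUXYZ ")
                    for _ in range(rng.randint(0, 30)))
        assert legacy_count_vowels(s) == new_count_vowels(s), \
            f"divergence on {s!r}"

check_equivalence()
```

Note that the property never says what counting vowels *means*; it only says the two versions agree.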

The PBT library generates arbitrary inputs conforming to the function’s domain, submits them to both implementations, and verifies equality of results. If a divergence appears, shrinking identifies the minimal input that distinguishes the two behaviors: this becomes a precise test case revealing exactly where the new implementation deviates.
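Shrinking itself is no magic. A toy version for string inputs (real libraries do this automatically, and far more cleverly) illustrates how a random counterexample collapses to a minimal one; the buggy rewrite here is hypothetical:

```python
# Greedy shrinker: repeatedly drop one character as long as the smaller
# input still exhibits the divergence between the two implementations.
def shrink_string(s: str, diverges) -> str:
    changed = True
    while changed:
        changed = False
        for i in range(len(s)):
            candidate = s[:i] + s[i + 1:]
            if diverges(candidate):
                s = candidate
                changed = True
                break
    return s

def legacy_count(s: str) -> int:
    return sum(ch in "aeiouAEIOU" for ch in s)

def buggy_rewrite(s: str) -> int:
    return sum(ch in "aeiou" for ch in s)  # forgot uppercase vowels

def diverges(s: str) -> bool:
    return legacy_count(s) != buggy_rewrite(s)

found = "xxAqqE"  # imagine a random run stumbled on this input
minimal = shrink_string(found, diverges)
print(minimal)    # shrinks to a single uppercase vowel
```

The minimal input points straight at the bug: the rewrite mishandles uppercase vowels.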

Generator Quality

Implementation requires some precautions.

First, we must define generators that produce inputs representative of the function’s real domain:

  • A function working on users needs a user structure generator
  • A function manipulating trees needs a valid tree generator

The quality of verified refactoring depends directly on generator quality: overly uniform inputs will miss edge cases, malformed inputs will test a domain the function never encounters in production.

Analysis of real data can guide the design of relevant generators.
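As a sketch, here is what a production-informed generator for a user structure might look like (the field names and distributions are invented for illustration): it skews toward realistic values while still injecting the boundary cases real data contains.

```python
import random

rng = random.Random(0)

# Hypothetical user generator. Instead of uniform noise, it mirrors what
# analysis of production data might reveal: realistic name lengths, a small
# fraction of missing/empty emails, ages skewed toward adults, plus
# occasional boundary values (0, 17, 18, 120).
def gen_user() -> dict:
    return {
        "name": "".join(rng.choice("abcdefghijklmnop")
                        for _ in range(rng.randint(1, 12))),
        "email": (rng.choice(["", None]) if rng.random() < 0.1
                  else f"user{rng.randint(0, 10**6)}@example.com"),
        "age": (rng.choice([0, 17, 18, 120]) if rng.random() < 0.2
                else rng.randint(18, 90)),
    }
```

Tuning these weights is exactly the "generator quality" work described above.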

Defining Equality

Equality itself deserves attention:

  • For primitive values, === suffices
  • For complex structures, structural equality is needed
  • For floating-point results, approximate equality with epsilon
  • For functions returning effects, we must compare the produced effects or final values after interpretation
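These cases can be layered into a single comparison helper. A sketch (the name and structure are illustrative, not a library API):

```python
import math

# Equality check layered by result type: approximate for floats,
# structural for dicts/lists, plain equality for primitives.
def results_equal(a, b, eps: float = 1e-9) -> bool:
    if isinstance(a, float) and isinstance(b, float):
        return math.isclose(a, b, rel_tol=eps, abs_tol=eps)
    if isinstance(a, dict) and isinstance(b, dict):
        return (a.keys() == b.keys()
                and all(results_equal(a[k], b[k], eps) for k in a))
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return (len(a) == len(b)
                and all(results_equal(x, y, eps) for x, y in zip(a, b)))
    return a == b  # primitives: strict equality

assert results_equal(0.1 + 0.2, 0.3)          # approximate, not bitwise
assert results_equal({"x": [1, 2]}, {"x": [1, 2]})
```

The equivalence property then becomes `results_equal(f(x), f_prime(x))` rather than a bare comparison.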

If the original function has side effects (mutating global state, writing files), we must either isolate them or capture and compare effect traces.
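One way to capture such traces is to inject the effect as a parameter and record its invocations as data. A sketch, with hypothetical notification functions:

```python
# Two versions of a side-effecting function; the effect (sending an email)
# is injected so tests can record it instead of performing it.
def legacy_notify(users, send):
    for u in users:
        if u["active"]:
            send(u["email"], "hello")

def new_notify(users, send):
    for u in (u for u in users if u["active"]):
        send(u["email"], "hello")

def trace_of(fn, users):
    trace = []
    fn(users, lambda *args: trace.append(args))  # capture effects as data
    return trace

users = [{"email": "a@x", "active": True},
         {"email": "b@x", "active": False}]
assert trace_of(legacy_notify, users) == trace_of(new_notify, users)
```

Comparing traces checks not just final values but the order and content of every effect.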

These complications reflect the inherent difficulties of legacy code; PBT doesn’t make them disappear, it makes them explicit.

A Safety Net for Bold Refactoring

This technique constitutes a remarkably effective safety net for behavior-preserving refactorings.

We can restructure code boldly (extracting functions, introducing abstractions, changing internal data structures) while continuously verifying that observable behavior remains identical.

Each test run explores a vast input space, far beyond what a human could imagine.

When refactoring is complete, when f' has proven its equivalence with f across thousands of cases, we can delete f with relative confidence. The legacy code has served one last time: as the specification of its own replacement.

This approach extends the lifespan of software assets. That ten-year-old codebase doesn’t need to be thrown away: it can be improved piece by piece, with confidence. Complete rewrites are expensive and risky; incremental refactoring secured by PBT is economical and safe.

