ABI Stability in a Cross-Platform C++ SDK

One of the hardest engineering problems I’ve faced isn’t the cloud communication, the threading model, or the cross-platform build — it’s ABI stability.

This article is about what ABI stability actually means in practice, why it’s harder than most engineers expect, and the design decisions we made to get it right. I’m writing it because most articles on this topic are theoretical. This one isn’t.

What ABI Stability Actually Means

ABI — Application Binary Interface — is the contract between your compiled library and the code that links against it. It defines:

  • How functions are called at the binary level (calling conventions, name mangling)
  • How data structures are laid out in memory (field offsets, padding, alignment)
  • How virtual dispatch works (vtable layout)
  • How exceptions propagate across module boundaries

Source compatibility means your users' code still compiles and works against the new headers, but they have to rebuild.

ABI compatibility means your users don’t need to recompile at all. The new library loads and works with old binaries.
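A small illustration of the difference, using a hypothetical configure() function that is not part of the SDK described here. Giving an existing function a new defaulted parameter is source compatible, because every old call site still compiles. It is not ABI compatible: the default is filled in at the call site, and the exported symbol now has a different signature, so binaries linked against configure(int) no longer find it. A sketch of the ABI-preserving alternative keeps the old signature exported and adds an overload:

```cpp
// Hypothetical settings type, for illustration only.
struct Settings {
    int timeout_ms;
    bool verbose;
};

// v1.0 shipped: Settings configure(int timeout_ms);
// Rewriting that declaration as configure(int timeout_ms, bool verbose = false)
// would still compile for every caller, but the symbol for configure(int)
// would vanish from the library, breaking binaries that were already built.

// v1.1, ABI-preserving: the new two-argument function is added alongside...
Settings configure(int timeout_ms, bool verbose) {
    return Settings{timeout_ms, verbose};
}

// ...and the v1.0 signature is still exported, forwarding to the new one.
Settings configure(int timeout_ms) {
    return configure(timeout_ms, false);
}
```

Old binaries keep calling configure(int); code compiled against v1.1 can use either form.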

ABI compatibility is not optional. Enterprise customers deploy your SDK embedded in their product. That product ships to 50,000 enterprise endpoints. You release a patch. You cannot ask the customer to recompile their entire product stack and redeploy it to 50,000 endpoints. The new version of your SDK must load and work, without a single recompile, against binaries built against the old version.

If you break ABI, you break production. Silently. At 3am.

The Ways ABI Breaks — A Practical Taxonomy

1. Adding a data member to a class

// v1.0: users compile against this
class ScanResult {
public:
    int device_id;
    ScanStatus status;       // assumed to be a 4-byte enum
    // sizeof(ScanResult) == 8
};

// v1.1: you add a field
class ScanResult {
public:
    int device_id;
    ScanStatus status;
    uint64_t timestamp_ms;   // ← ABI break
    // sizeof(ScanResult) == 16 (new 8-byte field, 8-byte alignment)
};

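Old binaries reserve 8 bytes for ScanResult; the new library writes 16. The extra 8 bytes land on whatever sits next to the object: stack corruption if it lives on the stack, heap corruption if it was heap-allocated. The crash usually surfaces somewhere completely unrelated and takes hours to diagnose.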
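One lightweight guard against this class of break, offered here as a suggestion rather than something the SDK is described as using, is to pin the published layout with static_assert. An accidental field addition then fails the build instead of shipping. It only catches what you assert on, so it complements a dedicated ABI checker rather than replacing one. The definitions below assume ScanStatus is a 4-byte enum:

```cpp
#include <cstddef>   // offsetof
#include <cstdint>   // std::int32_t

// Assumed definitions matching the v1.0 layout above.
enum class ScanStatus : std::int32_t { Ok, Failed };

struct ScanResult {
    std::int32_t device_id;
    ScanStatus status;
};

// The layout is part of the published ABI: if any of these change, the build breaks.
static_assert(sizeof(ScanResult) == 8, "sizeof(ScanResult) is frozen at v1.0");
static_assert(alignof(ScanResult) == 4, "alignof(ScanResult) is frozen at v1.0");
static_assert(offsetof(ScanResult, device_id) == 0, "device_id offset is frozen");
static_assert(offsetof(ScanResult, status) == 4, "status offset is frozen");
```

Appending timestamp_ms to this struct trips the first two assertions at compile time.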

2. Inserting or reordering virtual functions

// v1.0
class IDeviceScanner {
public:
    virtual ScanResult scan() = 0;         // vtable slot 0
    virtual void cancel() = 0;             // vtable slot 1
};

// v1.1: adding a function in the middle
class IDeviceScanner {
public:
    virtual ScanResult scan() = 0;         // vtable slot 0
    virtual bool isRunning() = 0;          // ← vtable slot 1 now
    virtual void cancel() = 0;             // ← vtable slot 2 now
};

Old binaries call vtable slot 1 expecting cancel(). They get isRunning() instead: the wrong function, with a return type the caller does not expect. This is undefined behaviour that may not crash immediately; it may silently corrupt state and crash much later.
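Real vtable layouts are compiler-specific, but the failure mode can be modelled with a plain array of function pointers indexed by slot. In this toy sketch (not how any compiler literally emits code), an old client has the slot number for cancel() baked in at compile time. Inserting isRunning() in the middle redirects that call; appending it leaves the old slot intact:

```cpp
#include <string>

// Toy model only: each "vtable" is an array of function pointers,
// and a virtual call is an indexed load from that array.
using Slot = std::string (*)();

std::string scan()      { return "scan"; }
std::string cancel()    { return "cancel"; }
std::string isRunning() { return "isRunning"; }

const Slot vtable_v1_0[]          = {scan, cancel};
const Slot vtable_v1_1_inserted[] = {scan, isRunning, cancel};  // inserted in the middle
const Slot vtable_v1_1_appended[] = {scan, cancel, isRunning};  // appended at the end

// A client compiled against v1.0 hard-codes "cancel() lives in slot 1".
constexpr int kCancelSlotInV1_0 = 1;

std::string oldClientCallsCancel(const Slot* vtable) {
    return vtable[kCancelSlotInV1_0]();
}
```

Against vtable_v1_1_inserted the old client ends up in isRunning(); against vtable_v1_1_appended it still reaches cancel().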

What We Do About It — The Design Patterns

After shipping several versions of the SDK across three platforms, here are the patterns that actually work in production.

Pattern 1 — The Opaque Handle / PIMPL

The most powerful ABI stability pattern. Hide everything implementation-specific behind an opaque pointer.

// Public header — this is what users see
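// scanner.h
#include <memory>   // std::unique_ptr
// ScanResult and ScanRequest are assumed to be declared in another SDK header.
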
class Scanner {
public:
    Scanner();
    ~Scanner();

    Scanner(const Scanner&) = delete;
    Scanner& operator=(const Scanner&) = delete;
    Scanner(Scanner&&) noexcept;
    Scanner& operator=(Scanner&&) noexcept;

    ScanResult scan(const ScanRequest& request);
    void cancel();
    bool isRunning() const;

private:
    struct Impl;                    // Forward declare only
    std::unique_ptr<Impl> impl_;    // Opaque pointer
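};

// scanner.cpp: implementation details hidden from users
#include "scanner.h"

#include <atomic>
#include <thread>

struct Scanner::Impl {
    // Everything that can change without breaking ABI lives here
    std::atomic<bool> running_{false};
    std::thread worker_;    // must be joined before Impl is destroyed, or std::terminate is called
    ThreadSafeQueue<ScanRequest> queue_;
    PolicyEngine policy_engine_;
    TelemetryCollector collector_;
    // Add fields here freely: no ABI impact

    ScanResult execute_scan(const ScanRequest& request);  // defined elsewhere in this file (not shown)
};

Scanner::Scanner() : impl_(std::make_unique<Impl>()) {}
Scanner::~Scanner() = default;  // Must be in .cpp where Impl is complete

// The move operations also destroy or transfer an Impl, so they are
// defaulted here, where Impl is complete, rather than in the header.
Scanner::Scanner(Scanner&&) noexcept = default;
Scanner& Scanner::operator=(Scanner&&) noexcept = default;

ScanResult Scanner::scan(const ScanRequest& request) {
    return impl_->execute_scan(request);
}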

Why this works: the public class Scanner never changes size. It is always sizeof(std::unique_ptr<Impl>), which on mainstream standard libraries is one pointer. Users compile against the public header. The Impl struct lives entirely in your translation unit and can change freely between releases, as long as Scanner itself gains no new data members or virtual functions.
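The size claim can be checked mechanically. This is a self-contained, single-file sketch with a hypothetical Widget class standing in for Scanner; its members are placeholders, and header and implementation are merged only so it compiles on its own. The final static_assert relies on std::unique_ptr with the default deleter being one pointer wide, which holds on mainstream standard libraries but is not guaranteed by the standard:

```cpp
#include <memory>
#include <utility>
#include <vector>

// ---- widget.h (public): Impl is only forward-declared ----
class Widget {
public:
    Widget();
    ~Widget();
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;
    int count() const;

private:
    struct Impl;
    std::unique_ptr<Impl> impl_;
};

// ---- widget.cpp (private): Impl can grow from release to release ----
struct Widget::Impl {
    std::vector<int> items{1, 2};
    long long field_added_in_v2 = 0;  // hypothetical later addition
};

Widget::Widget() : impl_(std::make_unique<Impl>()) {}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;
int Widget::count() const { return static_cast<int>(impl_->items.size()); }

// Whatever Impl contains, the object clients allocate stays one pointer wide.
static_assert(sizeof(Widget) == sizeof(void*), "public size must not change");
```

Adding or removing fields in Widget::Impl changes nothing that a client binary can observe.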

The tradeoff: one heap allocation per object, and one extra indirection on every method call. For long-lived objects like Scanner, this is entirely acceptable. For a tight inner loop processing network packets, measure first.

Pattern 2 — Pure Virtual Interfaces with Factory Functions

For polymorphic components, never expose a class hierarchy directly. Expose a pure virtual interface and a factory function.

// Public header — the interface contract
// i_device_scanner.h

class IDeviceScanner {
public:
    virtual ~IDeviceScanner() = default;

    virtual ScanResult scan(const ScanRequest& request) = 0;
    virtual void cancel() = 0;

    // NEVER add virtual functions in the middle.
    // ALWAYS add at the end.
    virtual bool isRunning() const = 0;           // added in v1.1
    virtual ScanMetrics getMetrics() const = 0;   // added in v1.2
};

// Factory — the only way to create implementations
// Returns a raw pointer deliberately: the caller owns it and must release it with destroyScanner()
extern "C" IDeviceScanner* createScanner(const ScannerConfig& config);
extern "C" void destroyScanner(IDeviceScanner* scanner);

Key rules:

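  1. Never insert virtual functions; always append at the end. This is safe only because clients never implement the interface themselves: the factory is the only way to obtain an implementation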
  2. Use extern "C" factory functions — avoids C++ name mangling issues across compiler versions
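  3. Return raw pointers from factory functions. A std::unique_ptr with a custom deleter breaks across module boundaries, because its layout depends on the deleter type and on the standard library each side was built with; a raw pointer paired with destroyScanner() keeps allocation and deallocation in the same module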
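A minimal sketch of what the library side might look like under these rules. The stub types, the concrete DeviceScanner class, and its placeholder logic are hypothetical; only the interface and the two factory signatures come from the header above. The concrete class never appears in a public header, exceptions are caught before they can escape through a C-linkage entry point, and the object is freed by the same module that allocated it:

```cpp
// Hypothetical stand-ins for the SDK's real types, for illustration only.
struct ScanRequest   { int device_id; };
struct ScanResult    { int device_id; bool ok; };
struct ScanMetrics   { int scans_completed; };
struct ScannerConfig { int max_devices; };

// The public interface, as in the header above.
class IDeviceScanner {
public:
    virtual ~IDeviceScanner() = default;
    virtual ScanResult scan(const ScanRequest& request) = 0;
    virtual void cancel() = 0;
    virtual bool isRunning() const = 0;           // added in v1.1
    virtual ScanMetrics getMetrics() const = 0;   // added in v1.2
};

// Library side: internal linkage, invisible to clients.
namespace {
class DeviceScanner final : public IDeviceScanner {
public:
    explicit DeviceScanner(const ScannerConfig& config) : config_(config) {}

    ScanResult scan(const ScanRequest& request) override {
        ++scans_;
        // Placeholder check; real scanning work would happen here.
        return ScanResult{request.device_id, request.device_id < config_.max_devices};
    }
    void cancel() override { running_ = false; }
    bool isRunning() const override { return running_; }
    ScanMetrics getMetrics() const override { return ScanMetrics{scans_}; }

private:
    ScannerConfig config_;
    int scans_ = 0;
    bool running_ = false;  // this sketch scans synchronously, so it is never set
};
}  // namespace

extern "C" IDeviceScanner* createScanner(const ScannerConfig& config) {
    try {
        return new DeviceScanner(config);
    } catch (...) {
        return nullptr;  // never let a C++ exception cross the C-linkage boundary
    }
}

extern "C" void destroyScanner(IDeviceScanner* scanner) {
    delete scanner;  // freed by the module that allocated it
}
```

Client code only ever sees IDeviceScanner* and the two extern "C" functions, so DeviceScanner can gain members in any release.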

Conclusion

ABI stability is one of those problems that seems simple until you ship a library that runs on millions of devices and then get a bug report that says “it worked fine until we upgraded the SDK and now it crashes on a subset of machines with no relevant log output.”

The code in production C++ SDKs that runs quietly and never causes incidents isn’t magic. It’s disciplined application of these patterns, enforced by automated ABI compatibility checks in CI, reviewed by engineers who understand why the rules exist.

That discipline is what separates a library from a product.