From 0 to 8,900 Tests: Our Testing Philosophy

UniAuth's test suite numbers in the high thousands of cases across hundreds of files, and the whole thing runs start to finish on a single developer laptop in tens of seconds. This post shares the patterns and philosophy behind it: how we structure mocks, where we draw the unit/integration boundary, and why we insist that every security fix starts with a failing test. (Exact counts drift with every commit; run npm test -- --reporter=verbose locally for today's number.)

The Mock-DB Pattern

The foundation of our test performance is the mock database. UniAuth's database layer is a thin wrapper around pg.Pool in lib/db.ts. Every test file that touches database code replaces this module with a mock:

// In any test file
vi.mock('@/lib/db', () => import('../helpers/mock-db'))

The mock-db module (tests/helpers/mock-db.ts) provides a pool object with a query method that returns empty results by default. Test cases override it per-test to return specific data:

import { vi } from 'vitest'

const mockQuery = vi.fn().mockResolvedValue({ rows: [], rowCount: 0 })

export default {
  query: mockQuery,
  connect: vi.fn().mockResolvedValue({
    query: mockQuery,
    release: vi.fn(),
  }),
}

export { mockQuery }

This pattern gives us two critical properties:

Speed. No database connections, no teardown, no data pollution between tests. The mock is a pure in-memory function call.
Determinism. Each test controls exactly what the database returns. There are no races between tests inserting or deleting rows, no shared state, and no sensitivity to test execution order.

The trade-off is that the mock does not validate SQL syntax, enforce foreign keys, or test query performance. Those concerns are covered by a separate layer: real-Postgres integration tests.

vi.mock and vi.hoisted: The Initialization Dance

Vitest (like Jest) hoists vi.mock() calls to the top of the file, before any imports. This means the mock is in place before the module under test loads its dependencies. However, the hoisted call now sits above every top-level declaration in the test file, so a factory that references a variable declared there (a shared vi.fn(), for example) can run before that variable has been initialized and fail with a ReferenceError.

The solution is vi.hoisted(), which creates a value that is available in the hoisted scope:

const { mockQuery } = vi.hoisted(() => {
  const mockQuery = vi.fn()
  return { mockQuery }
})

vi.mock('@/lib/db', () => ({
  default: { query: mockQuery },
  query: mockQuery,
}))

Now mockQuery is accessible both in the mock factory (which runs when the mocked module is first imported, before the rest of the test file's top-level code) and in the test body (which runs at execution time). Every test file in UniAuth follows this pattern, and it has become muscle memory for the team.

Unit vs Integration: The Route Handler Boundary

We draw a clear line between unit tests and integration tests based on what they exercise:

Unit tests (tests/unit/) test library modules: lib/oauth.ts, lib/crypto.ts, lib/session-utils.ts, etc. They mock the database and any external dependencies (email, SMS, Redis). They verify function-level behavior: given these inputs, does the function return the correct output or throw the expected error?
Integration tests (tests/integration/) test API route handlers: app/api/oauth/token/route.ts, app/api/auth/login/route.ts, etc. They also mock the database, but they exercise the full request/response cycle: constructing a Request object, calling the route handler, and asserting on the Response status, headers, and body.

The integration tests still use mock-db, not a real database. This means they are technically "integration" only in the sense that they integrate multiple modules (validation, auth, database queries, response formatting) through the route handler — but they do not integrate with external infrastructure. We reserve the term "end-to-end" for tests that hit a running server with a real database.

Why Not Mock at the Route Level?

Some teams mock individual library functions in route handler tests (e.g., mock verifyToken() to return a specific user). We deliberately avoid this because route handlers are the security boundary. If we mock verifyToken(), we are not testing that the route actually calls it, or that it handles the failure case correctly. By mocking only the database, we ensure the route handler's auth logic runs for real — the same code path that will execute in production.
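
As a rough illustration of this boundary, the sketch below fakes only the database and lets the handler's auth check run for real, driving it through a Request and asserting on the Response. Everything in it is hypothetical: the Db interface, verifyToken, makeHandler, the URL, and the token value are stand-ins invented for this example, not UniAuth's actual code. The handler takes its database as a parameter only so the snippet stays self-contained; the real suite swaps the module with vi.mock instead. It assumes a runtime with global Request and Response (Node 18 or later).

```typescript
// Hypothetical stand-ins; none of these names come from the UniAuth codebase.
type UserRow = { id: string; email: string }

interface Db {
  query(sql: string, params: unknown[]): Promise<{ rows: UserRow[] }>
}

// Real auth logic: looks the bearer token up through the (faked) database.
async function verifyToken(db: Db, header: string | null): Promise<UserRow | null> {
  if (!header || !header.startsWith('Bearer ')) return null
  const token = header.slice('Bearer '.length)
  const { rows } = await db.query('SELECT id, email FROM sessions WHERE token = $1', [token])
  return rows.length > 0 ? rows[0] : null
}

// Route handler: the security boundary under test.
function makeHandler(db: Db) {
  return async function GET(req: Request): Promise<Response> {
    const user = await verifyToken(db, req.headers.get('authorization'))
    const headers = { 'content-type': 'application/json' }
    if (!user) {
      return new Response(JSON.stringify({ error: 'unauthorized' }), { status: 401, headers })
    }
    return new Response(JSON.stringify({ email: user.email }), { status: 200, headers })
  }
}

// Fake database: knows exactly one token and records the params of every call.
const dbCalls: unknown[][] = []
const fakeDb: Db = {
  async query(_sql: string, params: unknown[]) {
    dbCalls.push(params)
    const rows: UserRow[] = params[0] === 'good-token' ? [{ id: 'u1', email: 'a@example.com' }] : []
    return { rows }
  },
}

const handler = makeHandler(fakeDb)

// Drive the full request/response cycle, as an integration test would.
async function run(token?: string): Promise<{ status: number; body: unknown }> {
  const headers: Record<string, string> = {}
  if (token) headers['authorization'] = `Bearer ${token}`
  const res = await handler(new Request('http://localhost/api/me', { headers }))
  return { status: res.status, body: await res.json() }
}
```

Because verifyToken is not mocked, a handler that forgot to call it, or that ignored a null result, would show up as a wrong status code here rather than passing silently.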

The signSessionV2/isSafeUrl Mock Propagation Challenge

Some functions are used deep in the call stack and need to be mocked at the module level, not the function level. Two examples that caused us particular headaches:

signSessionV2() is called during session creation, which is called during login, which is called by the login route handler. Mocking it requires mocking lib/crypto.ts, but that module also exports encrypt(), decrypt(), hashToken(), and other functions used throughout the codebase. If you mock the entire module, you break everything. If you mock only signSessionV2(), you need to re-export all the other functions from the real module:

vi.mock('@/lib/crypto', async () => {
  const actual = await vi.importActual('@/lib/crypto')
  return {
    ...actual,
    signSessionV2: vi.fn().mockReturnValue('v2:mock-signature'),
    verifySessionV2: vi.fn().mockReturnValue(true),
    isPQCReady: vi.fn().mockReturnValue(true),
  }
})

isSafeUrl() has the same problem. It is a synchronous function exported from lib/crypto.ts alongside cryptographic functions. Tests for webhook delivery need to mock it (to allow test URLs like http://localhost:9999), but they cannot mock the entire crypto module without breaking token hashing and encryption.

The pattern above (vi.importActual() spread with selective overrides) is our standard solution. It is verbose but explicit, and it survives refactors because the spread passes every export through from the real module, including ones added later; only the functions named in the override list are replaced.

Real-Postgres Concurrency Tests

A subset of our tests require a real PostgreSQL instance. These are gated behind the RUN_DB_TESTS=1 environment variable and are not run in CI by default (they require a Postgres instance with the schema applied). They cover scenarios that cannot be tested with mock-db:

Unique constraint violations. Registering two users with the same email should fail with a specific error, not a generic database error.
Transaction isolation. Concurrent session creation for the same user should not create duplicate sessions due to race conditions.
Foreign key cascades. Deleting a user should cascade to sessions, connected services, and activity logs.
Query performance. SCIM list queries with pagination and filters should use the correct indexes (verified via EXPLAIN ANALYZE).

# Run only the real-DB tests
RUN_DB_TESTS=1 npx vitest run tests/db/

These tests are slow (5-10 seconds each due to setup/teardown) and stateful (they share a database and must clean up after themselves). We keep them separate from the main suite so that the fast feedback loop of the 30-second mock-db suite is never blocked by database availability.
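
One way to get the "specific error, not a generic database error" behavior for the unique-constraint case is to translate the driver error by its SQLSTATE code. The sketch below is an assumption about how that could look, not UniAuth's implementation: EmailTakenError, toDomainError, and the constraint name users_email_key are hypothetical. The external facts it relies on are that PostgreSQL reports unique violations with SQLSTATE 23505, and that node-postgres exposes this as the code property (with the constraint name in constraint) on the error it throws.

```typescript
// Hypothetical domain error; the name is illustrative only.
class EmailTakenError extends Error {
  readonly email: string
  constructor(email: string) {
    super(`email already registered: ${email}`)
    this.name = 'EmailTakenError'
    this.email = email
  }
}

// The subset of fields we read from a node-postgres error.
interface PgErrorLike {
  code?: string
  constraint?: string
}

const UNIQUE_VIOLATION = '23505' // PostgreSQL SQLSTATE for unique_violation

// Translate a driver error into a domain error where we recognize it.
// Anything else is returned unchanged so the caller can rethrow it.
function toDomainError(err: unknown, email: string): unknown {
  if (typeof err !== 'object' || err === null) return err
  const pgErr = err as PgErrorLike
  if (
    pgErr.code === UNIQUE_VIOLATION &&
    pgErr.constraint === 'users_email_key' // assumed constraint name
  ) {
    return new EmailTakenError(email)
  }
  return err
}
```

A real-Postgres test can then insert the same email twice and assert that the second attempt surfaces as the domain error, which is exactly the behavior mock-db cannot prove because it never raises constraint errors on its own.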

Security-Fix-First Testing

Every security fix in UniAuth follows a strict protocol: write the failing test first, then write the fix, then verify the test passes. This is not aspirational — it is enforced in code review. A security PR without a regression test is sent back.

The reason is simple: if you write the fix first and then the test, you risk writing a test that passes for the wrong reason (e.g., the test does not actually exercise the vulnerability). By writing the test first and watching it fail, you prove that the test detects the vulnerability, and then the fix proves that it resolves it.

Example from a recent timing-attack fix in backup code verification:

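// FAILING TEST (before fix)
// Assumes `import crypto from 'node:crypto'` at the top of this file and that
// verifyBackupCode() calls crypto.timingSafeEqual through that same default
// import; a named import in the implementation would not be seen by the spy.
it('should use constant-time comparison for backup codes', () => {
  const spy = vi.spyOn(crypto, 'timingSafeEqual')

  // The old code used === which leaks timing information
  const hash = hashBackupCode('AAAA-BBBB')
  const result = verifyBackupCode('AAAA-BBBB', hash)
  expect(result).toBe(true)

  // Verify the implementation uses timingSafeEqual. This assertion checks
  // the internal implementation, so it failed while the code still used ===.
  expect(spy).toHaveBeenCalled()
  spy.mockRestore()
})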

// FIX: Replace === with crypto.timingSafeEqual() in verifyBackupCode()

// PASSING TEST (after fix) — same test, now green
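
For readers who have not used crypto.timingSafeEqual(), here is a minimal sketch of what the fixed comparison might look like. It is not UniAuth's code: the hashing scheme (hex-encoded SHA-256) is an assumption made only to keep the example self-contained, and the two function names are reused from the test above purely for readability.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto'

// Assumed hashing scheme for illustration: hex-encoded SHA-256 of the code.
function hashBackupCode(code: string): string {
  return createHash('sha256').update(code, 'utf8').digest('hex')
}

// Compare the candidate's digest against the stored digest in constant time.
function verifyBackupCode(candidate: string, storedHash: string): boolean {
  const a = Buffer.from(hashBackupCode(candidate), 'hex')
  const b = Buffer.from(storedHash, 'hex')
  // timingSafeEqual throws when the lengths differ, so reject those up front.
  // Digest lengths are not secret, so this early return reveals nothing useful.
  if (a.length !== b.length) return false
  return timingSafeEqual(a, b)
}
```

The length check matters in practice: a malformed stored hash would otherwise turn a failed verification into an unhandled exception.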

Test Fixture Security Audit

We periodically audit test fixtures for security anti-patterns. Common issues we check for:

Hardcoded secrets in test code. Test files should use process.env.JWT_SECRET = 'test-secret-for-unit-tests', not a production-looking key. This prevents copy-paste accidents where a test secret ends up in production config.
Overly permissive mocks. A mock that returns { role: 'admin' } for any input makes it easy to miss authorization checks. We prefer mocks that return specific roles for specific inputs and reject unexpected calls.
Missing error-path tests. For every auth-related function, we verify that the error path (invalid token, expired session, wrong password) is tested with the same rigor as the success path. An untested error path is a potential bypass.
No real credentials. We run a pre-commit hook that scans for patterns matching API keys, JWTs with real-looking payloads, and base64-encoded secrets. This has caught two near-misses where a developer tested against a staging environment and left the token in a test file.
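
To make the "overly permissive mocks" point concrete, the sketch below shows one possible shape for a strict stand-in: it answers only for inputs the test declared up front and throws on anything else. The names (makeStrictRoleLookup, the Role type, the user IDs) are hypothetical and chosen for this example; in a Vitest suite the same idea could be expressed with vi.fn() and a custom implementation.

```typescript
type Role = 'admin' | 'member'

// Build a lookup that answers only for known user IDs and records every call.
function makeStrictRoleLookup(known: Map<string, Role>) {
  const calls: string[] = []
  function lookup(userId: string): { role: Role } {
    calls.push(userId)
    const role = known.get(userId)
    if (role === undefined) {
      // An unexpected call should fail the test, not silently return "admin".
      throw new Error(`unexpected role lookup for ${userId}`)
    }
    return { role }
  }
  return { lookup, calls }
}

const roles = makeStrictRoleLookup(
  new Map<string, Role>([
    ['user-1', 'member'],
    ['user-2', 'admin'],
  ]),
)
```

With a stand-in like this, an authorization check that queries the wrong user, or that is skipped entirely, changes either the thrown error or the recorded calls, so the test has something to assert on.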

What 8,900 Tests Give Us

The test count itself is not the goal. What the suite gives us is confidence to ship. When we add post-quantum signatures to sessions, the suite tells us whether existing session behavior regressed. When we add multi-tenant SCIM isolation, the suite tells us whether single-tenant SCIM behavior broke. When we fix a timing attack, the suite proves the fix works and does not regress later.

An identity provider that ships bugs erodes trust that takes years to rebuild. Our test suite is our insurance policy against that erosion. It is not perfect — no suite is — but it has caught more regressions than we can count, and it runs fast enough that developers actually run it before every push.