Debugging CI Chrome Crashes

System tests using Cuprite/Ferrum/Chrome were failing in GitLab CI (Docker executor) but passing locally. Chrome either crashed or hung during startup, producing zero diagnostic output.

The Failure

1) Upscale Page System Tests upscale type selection allows selecting crisp upscale type
   Got 0 failures and 4 other errors:

   1.1) Failure/Error: visit '/upscale'
        Timeout::ExitException: execution expired

   1.2) Failure/Error: example.run
        Ferrum::ProcessTimeoutError:
          Browser did not produce websocket url within 60 seconds

CI artifact log/chrome_debug.log: 0 bytes. No Chrome output captured.

Environment

  • CI runner image: registry.rscz.ru/rt/ci-runner:latest (Debian Trixie, Ruby 3.4.8, Node 22.13.0)
  • Chrome: Google Chrome 143.0.7499.192
  • Gems: cuprite 0.17, ferrum 0.17.1
  • Ruby: 3.4.8 (CI), 3.4.5 (local dev)
  • CI services: PostgreSQL 17, OpenSearch 2.11.1 (Java, 512MB heap), MinIO, Redis 7

A reference project (rslogin-ui) used the same CI runner image with no Chrome issues — which made this especially puzzling.

Key Red Flag

The Chrome diagnostic in CI passed (it runs Chrome directly with --no-sandbox), but tests failed. This meant the issue was in how Ferrum/Cuprite was launching Chrome, not Chrome itself.


Root Cause: The Silent Driver Re-registration

After 4 days and 20+ commits, the actual culprit was found: driven_by :cuprite silently discards all browser_options.

The capybara_helper.rb had a correctly configured Capybara.register_driver(:cuprite) block with all the right browser_options (including 'no-sandbox' => nil). None of it mattered because driven_by :cuprite in the RSpec before block silently replaced the entire registration.

How it happened:

  1. We registered a custom :cuprite driver with full browser_options including 'no-sandbox' => nil
  2. In RSpec config we had: driven_by :cuprite
  3. driven_by :cuprite calls ActionDispatch::SystemTesting::Driver.new(:cuprite).use, which calls register, which in turn calls Capybara.register_driver(:cuprite) with its own block
  4. Since driven_by :cuprite was called without options, @options is {} and @screen_size is nil
  5. Our entire browser_options hash was discarded
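
The steps above boil down to "the last registration under a given name wins". Here is a minimal sketch of that behavior, assuming a plain name-to-block registry. DriverRegistry and simulate_reregistration are hypothetical stand-ins written for this post, not Capybara's or Rails' actual code, and the returned hashes only model the options a driver would be built from.

```ruby
# Simplified model of a name => block driver registry.
# DriverRegistry is a hypothetical stand-in, not Capybara's implementation.
class DriverRegistry
  def initialize
    @drivers = {}
  end

  # A second registration under the same name replaces the first block.
  def register_driver(name, &block)
    @drivers[name] = block
  end

  # Calls the stored block to build the driver configuration.
  def build(name)
    @drivers.fetch(name).call
  end
end

# Replays the steps above and returns what the final driver is built from.
def simulate_reregistration
  registry = DriverRegistry.new

  # Step 1: the helper registers :cuprite with the options we want.
  registry.register_driver(:cuprite) do
    { browser_options: { "no-sandbox" => nil } }
  end

  # Steps 2 to 4: a later registration under the same name, built from
  # empty options and a nil screen size.
  options = {}
  screen_size = nil
  registry.register_driver(:cuprite) do
    options.merge(window_size: screen_size)
  end

  registry.build(:cuprite)
end

built = simulate_reregistration
puts "browser_options present? #{built.key?(:browser_options)}"
```

Nothing raises and nothing warns: the first block is simply never called, which is why the problem was invisible from the test output.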

The Fix

Replace driven_by :cuprite with @driver = :cuprite:

config.before(:each, type: :system) do
  # Setting @driver prevents RSpec Rails' SystemExampleGroup from calling
  # driven_by(:selenium_chrome_headless) as a fallback.
  # Capybara.default_driver = :cuprite ensures our registered driver is used.
  @driver = :cuprite
end

Secondary Issues Found

While chasing the main bug, I discovered several other issues that would have caused problems once the driver re-registration was fixed:

Ruby 3.4 Timeout Bug

In Ruby 3.4, Timeout::ExitException inherits directly from Exception, not from Timeout::Error. Our rescue Timeout::Error therefore never caught the timeout, which caused cascading failures.
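
To illustrate why the rescue missed, here is a small sketch. ExitLikeException is a hypothetical stand-in with the ancestry described above (a direct subclass of Exception), so the example does not depend on the internals of the timeout library. The classify helper is also made up for this illustration.

```ruby
require "timeout"

# Hypothetical stand-in with the ancestry described for
# Timeout::ExitException: a direct subclass of Exception.
class ExitLikeException < Exception; end

# Reports which rescue clause, if any, handles the error raised by the block.
def classify
  yield
  :no_error
rescue Timeout::Error
  :rescued_as_timeout_error
rescue Exception # deliberately broad, for demonstration only
  :missed_by_timeout_rescue
end

puts classify { raise Timeout::Error, "execution expired" }
puts classify { raise ExitLikeException, "execution expired" }
```

The first call lands in the Timeout::Error clause. The second falls through to the broad clause, which is the situation our code was in without having such a clause at all.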

Chrome disable-features Override

Passing a string key in browser_options, such as 'disable-features' => 'VizDisplayCompositor', overwrote Ferrum's default string-keyed "disable-features" entry, which included settings that are critical in Docker, such as site-per-process and IsolateOrigins.
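
The override is ordinary Hash#merge behavior. A minimal sketch, assuming the defaults and the user options are combined with a plain merge. DEFAULT_FLAGS and merge_flags are hypothetical names for illustration, and the default value is the feature list quoted later in this post rather than a copy of Ferrum's source.

```ruby
# Stand-in for the library defaults, using the value quoted in this post.
# DEFAULT_FLAGS and merge_flags are hypothetical names for illustration.
DEFAULT_FLAGS = {
  "disable-features" => "site-per-process,IsolateOrigins,TranslateUI"
}.freeze

# Plain Hash#merge: a matching key in user_flags replaces the default.
def merge_flags(user_flags)
  DEFAULT_FLAGS.merge(user_flags)
end

# Same string key as the default: the default value is replaced.
string_keyed = merge_flags({ "disable-features" => "VizDisplayCompositor" })

# Symbol key: a different hash key, so the default entry survives.
symbol_keyed = merge_flags({ "disable-features": "VizDisplayCompositor" })

puts string_keyed["disable-features"]
puts symbol_keyed["disable-features"]
puts symbol_keyed.keys.length
```

With the string key, only VizDisplayCompositor remains. With the symbol key, the hash holds two separate entries and the default list is still intact.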

Debugging Techniques That Helped

  • Monkey-patching gems to log: Prepending a module that wraps Cuprite::Driver#initialize to log the caller and incoming options immediately revealed the re-registration
  • Reading gem source code: Ferrum's process.rb and command.rb showed how Chrome flags are constructed
  • Comparison with working projects: The rslogin-ui project used symbol keys for browser_options, which coexisted with Ferrum's string keys
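
For the first technique, the logging patch can be sketched roughly like this. FakeDriver is a hypothetical stand-in so the example runs without any gems, and OptionLogger is a made-up name. In the real project the prepend target would be the Cuprite driver class (Capybara::Cuprite::Driver), and I am assuming its initialize takes (app, options = {}) like the stand-in does.

```ruby
# Stand-in for the driver class so the sketch runs without gems.
# FakeDriver and OptionLogger are hypothetical names.
class FakeDriver
  attr_reader :options

  def initialize(app, options = {})
    @app = app
    @options = options
  end
end

# Prepended module: its initialize runs first, records who is constructing
# the driver and with which options, then hands over to the original.
module OptionLogger
  LOG = []

  def initialize(app, options = {})
    LOG << { options: options, caller: caller(1, 5) }
    warn "[driver-debug] options=#{options.inspect}"
    super
  end
end

FakeDriver.prepend(OptionLogger)

# Two constructions, as in the bug: one with our options, one without.
FakeDriver.new(:app, { browser_options: { "no-sandbox" => nil } })
FakeDriver.new(:app, { window_size: nil })
```

Seeing a second construction with no browser_options, plus a caller list pointing outside our own helper, is the kind of evidence that made the re-registration obvious.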

Timeline

2026-02-05

Issue discovered

CI tests failing with ProcessTimeoutError. Initial attempts: various Chrome flags, logging options.

2026-02-06

Red herrings chased

Fixed disable-features override, added Chrome wrapper script, improved diagnostics. Tests still failing.

2026-02-08

Breakthrough

Monkey-patched Cuprite::Driver#initialize to log incoming options. Discovered driver re-registration.

2026-02-09

Resolution

Replaced driven_by :cuprite with @driver = :cuprite. All tests passing.


Lessons Learned

driven_by Re-registers Drivers

If you configure Cuprite via Capybara.register_driver(:cuprite), do NOT also call driven_by :cuprite — it silently overwrites your registration with an empty one. Use @driver = :cuprite to prevent rspec-rails' Selenium fallback instead.

Monkey-Patching Reveals Truth

Logging what actually reached Cuprite::Driver#initialize (the caller and the incoming options) revealed the re-registration immediately. Theoretical analysis of Ferrum's key handling, by contrast, wasted 20+ commits.

The Misleading Error

Chrome's error message "Running as root without --no-sandbox" sounds like a flag format issue. The real question was: why isn't the flag reaching Chrome at all?


Current State

The issue is resolved. The fail-fast around(:each) hook now prints Chrome's stderr from ProcessTimeoutError#output and stops the suite in CI if issues recur.

All browser options now properly reach Chrome:

  • 'no-sandbox' => nil — required for running as root in Docker
  • 'disable-features' => 'site-per-process,IsolateOrigins,TranslateUI' — memory-critical for containers
  • process_timeout: 120 — generous timeout for CI resource constraints
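
Put together, a registration carrying these options might look like the sketch below. This is a reconstruction from the options listed above, not a copy of our capybara_helper.rb. The file path in the comment is an assumption, and it needs the capybara and cuprite gems to run.

```ruby
# spec/support/capybara_helper.rb (path is an assumption)
require "capybara/cuprite"

Capybara.register_driver(:cuprite) do |app|
  Capybara::Cuprite::Driver.new(
    app,
    process_timeout: 120, # generous startup timeout for CI
    browser_options: {
      "no-sandbox" => nil, # nil value means a valueless flag
      "disable-features" => "site-per-process,IsolateOrigins,TranslateUI"
    }
  )
end

# Mentioned in the before-hook comment earlier in this post.
Capybara.default_driver = :cuprite
```

This only takes effect as long as nothing re-registers :cuprite afterwards, which is why the before hook sets @driver instead of calling driven_by.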

Total debugging time: ~4 days. Final fix: 1 line change.

#debugging #ci-cd #ruby #testing