Fault Management

Overview

The Fault Management system provides extensible monitoring and response capabilities for spacecraft operations. It monitors configured parameters against yellow and red thresholds, tracks time spent in each fault state, and can automatically trigger safe mode when critical (RED) conditions occur.

Key Features

  • Multi-parameter monitoring: Track multiple spacecraft parameters simultaneously

  • Configurable thresholds: Set yellow (warning) and red (critical) limits for each parameter

  • Spacecraft red limit constraints: Define health and safety pointing constraints with time-based safe mode triggering

  • Bidirectional thresholds: Support for both “below” and “above” threshold types

  • Time tracking: Accumulate duration spent in yellow and red states, or in constraint violations

  • Automatic safe mode: Trigger irreversible safe mode on RED conditions or sustained constraint violations

  • Schema-validated thresholds: Threshold names must match predefined Housekeeping fields

Configuration

Programmatic Configuration

Create and configure fault management using the FaultManagement class:

from conops.common import ACSMode
from conops.config.fault_management import FaultManagement
import rust_ephem

# Create fault management system
fm = FaultManagement()

# Add parameter thresholds
fm.add_threshold("battery_level", yellow=0.5, red=0.4, direction="below")
fm.add_threshold("power_usage", yellow=450.0, red=500.0, direction="above")
fm.add_threshold("recorder_fill_fraction", yellow=0.8, red=0.95, direction="above")

# Add mode-specific threshold (power usage only checked in SCIENCE mode)
fm.add_threshold(
    "power_usage",
    yellow=450.0,
    red=500.0,
    direction="above",
    acs_modes=[ACSMode.SCIENCE]
)

# Add spacecraft red limit constraints
fm.add_red_limit_constraint(
    name="spacecraft_sun_limit",
    constraint=rust_ephem.SunConstraint(min_angle=30.0),
    time_threshold_seconds=300.0,  # 5 minutes
    description="Prevent thermal damage from prolonged sun exposure"
)

fm.add_red_limit_constraint(
    name="spacecraft_earth_limit",
    constraint=rust_ephem.EarthLimbConstraint(min_angle=10.0),
    time_threshold_seconds=600.0,  # 10 minutes
    description="Prevent stray light contamination"
)

# Add monitoring-only constraint (no safe mode trigger)
fm.add_red_limit_constraint(
    name="spacecraft_moon_monitor",
    constraint=rust_ephem.MoonConstraint(min_angle=5.0),
    time_threshold_seconds=None,  # No automatic safe mode
    description="Monitor moon proximity (informational only)"
)

Threshold Parameters

Each threshold requires:

  • name: Unique identifier for the parameter (must match Housekeeping attribute name)

  • yellow: Warning threshold value

  • red: Critical threshold value (must be more severe than yellow)

  • direction: Either "below" or "above"

    • "below": Fault triggered when value ≤ threshold (e.g., battery level)

    • "above": Fault triggered when value ≥ threshold (e.g., temperature, power)
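
The comparison rules above can be sketched as a small standalone function. This is an illustration only, not the FaultManagement implementation: the name classify is hypothetical, and the "nominal" label for the healthy state is taken from the example output shown under Checking Parameters. The sketch also assumes that "more severe" means red is strictly beyond yellow in the configured direction.

```python
def classify(value: float, yellow: float, red: float, direction: str) -> str:
    """Return "nominal", "yellow", or "red" for a single reading.

    Hypothetical helper that mirrors the inclusive comparisons listed above.
    """
    if direction == "below":
        # Lower is worse, so red must sit below yellow.
        if red >= yellow:
            raise ValueError("for 'below', red must be lower than yellow")
        if value <= red:
            return "red"
        if value <= yellow:
            return "yellow"
    elif direction == "above":
        # Higher is worse, so red must sit above yellow.
        if red <= yellow:
            raise ValueError("for 'above', red must be higher than yellow")
        if value >= red:
            return "red"
        if value >= yellow:
            return "yellow"
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    return "nominal"


print(classify(0.45, yellow=0.5, red=0.4, direction="below"))
print(classify(510.0, yellow=450.0, red=500.0, direction="above"))
```

Red is tested before yellow so that a reading beyond both limits is reported at the more severe level.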

ACS Mode Filtering

Thresholds can be restricted to specific Attitude Control System (ACS) modes using the acs_modes parameter. This allows different fault policies for different operational modes.

For example:

  • Only check high power draw during SCIENCE mode

  • Check recorder fill level in all modes except SAFE mode

  • Monitor battery level in all modes (default behavior)

Programmatic Configuration with ACS Modes:

from conops.common import ACSMode
from conops.config.fault_management import FaultManagement

fm = FaultManagement()

# Check battery in all modes (default)
fm.add_threshold("battery_level", yellow=0.5, red=0.4, direction="below")

# Only check high power usage during science operations
fm.add_threshold(
    "power_usage",
    yellow=450.0,
    red=500.0,
    direction="above",
    acs_modes=[ACSMode.SCIENCE]
)

# Check recorder fill limits in multiple modes
fm.add_threshold(
    "recorder_fill_fraction",
    yellow=0.8,
    red=0.95,
    direction="above",
    acs_modes=[ACSMode.SCIENCE, ACSMode.SLEW, ACSMode.SETTLE]
)

Valid threshold names are the predefined Housekeeping fields. Commonly used fields include:

  • battery_level

  • power_usage / power_bus / power_payload

  • panel_illumination

  • recorder_fill_fraction / recorder_alert

  • sun_angle_deg / earth_angle_deg / moon_angle_deg

  • star_tracker_functional_count (automatically configured when star trackers are present)

During fault checking, the current ACS mode is determined from:

  1. housekeeping.acs_mode (preferred)

  2. acs.acsmode (fallback)

Thresholds are evaluated only when the current mode is in the acs_modes list. Omit acs_modes (or set it to None; null in a JSON config) to check in all modes.
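
The mode lookup and filtering rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: Mode is a stand-in for conops.common.ACSMode using member names mentioned on this page, and current_mode and is_active are hypothetical helper names.

```python
from enum import Enum, auto
from types import SimpleNamespace
from typing import Optional, Sequence


class Mode(Enum):
    """Stand-in for conops.common.ACSMode (member names from this page)."""
    SCIENCE = auto()
    SLEW = auto()
    SETTLE = auto()
    SAFE = auto()


def current_mode(housekeeping, acs):
    """Prefer housekeeping.acs_mode, then fall back to acs.acsmode."""
    mode = getattr(housekeeping, "acs_mode", None)
    if mode is None:
        mode = getattr(acs, "acsmode", None)
    return mode


def is_active(acs_modes: Optional[Sequence[Mode]], mode) -> bool:
    """A threshold without acs_modes applies in every mode."""
    return acs_modes is None or mode in acs_modes


# Housekeeping carries no mode here, so the ACS object supplies it.
hk = SimpleNamespace(acs_mode=None)
acs = SimpleNamespace(acsmode=Mode.SLEW)
mode = current_mode(hk, acs)
print(mode.name, is_active([Mode.SCIENCE], mode), is_active(None, mode))
```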

Spacecraft Red Limit Constraints

In addition to threshold-based monitoring, you can define spacecraft-level red limit constraints for health and safety. These are typically looser than instrument constraints (which are optimized for data quality) and exist purely to protect spacecraft hardware.

Red limit constraints can be added programmatically:

from conops.config.fault_management import FaultManagement
import rust_ephem

fm = FaultManagement()

# Add spacecraft sun constraint - thermal protection
fm.add_red_limit_constraint(
    name="spacecraft_sun_limit",
    constraint=rust_ephem.SunConstraint(min_angle=30.0),
    time_threshold_seconds=300.0,  # 5 minutes
    description="Prevent thermal damage from prolonged sun exposure"
)

# Add spacecraft earth constraint - stray light protection
fm.add_red_limit_constraint(
    name="spacecraft_earth_limit",
    constraint=rust_ephem.EarthLimbConstraint(min_angle=10.0),
    time_threshold_seconds=600.0,  # 10 minutes
    description="Prevent stray light contamination"
)

# Add monitoring-only constraint (no safe mode trigger)
fm.add_red_limit_constraint(
    name="spacecraft_moon_monitor",
    constraint=rust_ephem.MoonConstraint(min_angle=5.0),
    time_threshold_seconds=None,  # No automatic safe mode
    description="Monitor moon proximity (informational only)"
)

Red Limit Constraint Parameters

Each red limit constraint requires:

  • name: Unique identifier for the constraint

  • constraint: rust_ephem constraint definition (sun, earth_limb, moon, etc.)

  • time_threshold_seconds: Maximum continuous time in violation before triggering safe mode (set to None in Python, or null in a JSON config, for monitoring only)

  • description: Human-readable description of the constraint purpose

Programmatic Usage

Creating Fault Management

from conops.common import ACSMode
from conops.config.fault_management import FaultManagement
import rust_ephem

# Create fault management system
fm = FaultManagement()

# Add thresholds programmatically
fm.add_threshold("battery_level", yellow=0.5, red=0.4, direction="below")
fm.add_threshold("power_usage", yellow=450.0, red=500.0, direction="above")
fm.add_threshold("recorder_fill_fraction", yellow=0.8, red=0.95, direction="above")

# Add mode-specific threshold
fm.add_threshold(
    "power_usage",
    yellow=450.0,
    red=500.0,
    direction="above",
    acs_modes=[ACSMode.SCIENCE]
)

# Add spacecraft red limit constraints
fm.add_red_limit_constraint(
    name="spacecraft_sun_limit",
    constraint=rust_ephem.SunConstraint(min_angle=30.0),
    time_threshold_seconds=300.0,  # 5 minutes
    description="Prevent thermal damage from prolonged sun exposure"
)

fm.add_red_limit_constraint(
    name="spacecraft_earth_limit",
    constraint=rust_ephem.EarthLimbConstraint(min_angle=10.0),
    time_threshold_seconds=600.0,  # 10 minutes
    description="Prevent stray light contamination"
)

# Add monitoring-only constraint (no safe mode trigger)
fm.add_red_limit_constraint(
    name="spacecraft_moon_monitor",
    constraint=rust_ephem.MoonConstraint(min_angle=5.0),
    time_threshold_seconds=None,  # No automatic safe mode
    description="Monitor moon proximity (informational only)"
)

Checking Parameters

Call check() each simulation cycle to evaluate monitored parameters and red limit constraints:

from datetime import datetime, timezone
from conops.ditl.telemetry import Housekeeping

# Create housekeeping telemetry packet
hk = Housekeeping(
    timestamp=datetime.now(tz=timezone.utc),
    battery_level=battery.battery_level,
    power_usage=power_system.total_draw,
    recorder_fill_fraction=data_system.buffer_usage_fraction,
    ra=current_pointing_ra,
    dec=current_pointing_dec,
    acs_mode=spacecraft_acs.acsmode
)

# Check parameters and constraints
classifications = fm.check(housekeeping=hk, acs=spacecraft_acs)

# classifications = {"battery_level": "yellow", "power_usage": "nominal", ...}
# Red limit constraints are checked automatically using housekeeping data

Retrieving Statistics

Get accumulated time in each fault state and constraint violation statistics:

stats = fm.statistics()

# For threshold-based parameters:
# {
#     "battery_level": {
#         "yellow_seconds": 120.0,
#         "red_seconds": 0.0,
#         "current": "yellow"
#     },
#     "power_usage": {
#         "yellow_seconds": 45.0,
#         "red_seconds": 30.0,
#         "current": "red"
#     }
# }

# For red limit constraints:
# {
#     "spacecraft_sun_limit": {
#         "in_violation": False,
#         "total_violation_seconds": 180.0,
#         "continuous_violation_seconds": 0.0
#     }
# }
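
As a sketch of how these dictionaries might be post-processed, the helper below totals the fault time per entry. It assumes statistics() returns one dict that mixes both entry shapes shown above; the function name total_fault_seconds is hypothetical, and the sample data simply reuses the values from the comments.

```python
def total_fault_seconds(stats: dict) -> dict:
    """Map each monitored name to its total time outside nominal."""
    totals = {}
    for name, entry in stats.items():
        if "yellow_seconds" in entry:
            # Threshold entry: yellow and red time are tracked separately.
            totals[name] = entry["yellow_seconds"] + entry["red_seconds"]
        elif "total_violation_seconds" in entry:
            # Red limit constraint entry.
            totals[name] = entry["total_violation_seconds"]
    return totals


sample_stats = {
    "battery_level": {"yellow_seconds": 120.0, "red_seconds": 0.0, "current": "yellow"},
    "power_usage": {"yellow_seconds": 45.0, "red_seconds": 30.0, "current": "red"},
    "spacecraft_sun_limit": {
        "in_violation": False,
        "total_violation_seconds": 180.0,
        "continuous_violation_seconds": 0.0,
    },
}

for name, seconds in total_fault_seconds(sample_stats).items():
    print(f"{name}: {seconds:.0f} s")
```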

Separating Statistics by Type

To separate threshold-based and constraint-based statistics:

stats = fm.statistics()

# Get red limit constraint stats
constraint_stats = {
    name: data for name, data in stats.items()
    if any(c.name == name for c in fm.red_limit_constraints)
}

threshold_stats = {
    name: data for name, data in stats.items()
    if any(t.name == name for t in fm.thresholds)
}

Integration with QueueDITL

The fault management system is automatically integrated into the QueueDITL simulation loop when configured. It checks parameters after each power update:

from conops.config import MissionConfig
from conops.queue_ditl import QueueDITL

# Load config with fault_management section
config = MissionConfig.from_json("config_with_fault_management.json")

# Initialize defaults (adds battery_level, recorder_fill_fraction, and
# star_tracker_functional_count thresholds if not already present)
config.init_fault_management_defaults()

# Run simulation
ditl = QueueDITL(config, target_queue, begin, end, tle_file)
ditl.run()

# Check fault statistics after simulation
if config.fault_management:
    stats = config.fault_management.statistics()
    print(f"Fault statistics: {stats}")

Safe Mode Behavior

When safe_mode_on_red is true (the default), a parameter reaching the RED state, or a red limit constraint exceeding its time threshold, has the following effects:

  1. Set flag: The safe_mode_requested flag is set to True

  2. DITL checks flag: The QueueDITL loop detects the flag and enqueues an ENTER_SAFE_MODE command

  3. Irreversible operation: Safe mode cannot be exited once entered

  4. Sun pointing: Spacecraft points solar panels at Sun for maximum power

  5. Command queue cleared: All pending commands are discarded

  6. Emergency power: System operates in minimal power configuration
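
The flag-and-poll flow in the first steps can be sketched as below. This is a simplified model, not the QueueDITL implementation: FaultFlagSketch and loop_step are hypothetical names, commands are plain strings, and clearing the queue is collapsed into the same step that enqueues ENTER_SAFE_MODE.

```python
from collections import deque


class FaultFlagSketch:
    """Hypothetical stand-in for the safe-mode flag described above."""

    def __init__(self, safe_mode_on_red: bool = True):
        self.safe_mode_on_red = safe_mode_on_red
        self.safe_mode_requested = False

    def check(self, classifications: dict) -> None:
        # The flag latches: once set, it is never cleared.
        if self.safe_mode_on_red and "red" in classifications.values():
            self.safe_mode_requested = True


def loop_step(fm: FaultFlagSketch, queue: deque, in_safe_mode: bool) -> bool:
    """One loop iteration; returns the (possibly updated) safe-mode state."""
    if fm.safe_mode_requested and not in_safe_mode:
        queue.clear()                    # pending commands are discarded
        queue.append("ENTER_SAFE_MODE")  # the only command left to run
        return True
    return in_safe_mode                  # irreversible: once True, stays True


fm = FaultFlagSketch()
queue = deque(["SLEW", "OBSERVE"])
fm.check({"battery_level": "red", "power_usage": "nominal"})
in_safe_mode = loop_step(fm, queue, in_safe_mode=False)
print(in_safe_mode, list(queue))
```

Passing safe_mode_on_red=False in this sketch leaves the flag unset, so the queue is left untouched even when a parameter is red.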

Red Limit Constraint Triggering

Red limit constraints track continuous violation time. The constraint must be violated continuously for the entire time_threshold_seconds duration before safe mode is triggered. If the constraint is satisfied (even briefly), the continuous violation counter resets to zero.

This allows for:

  • Transient violations: Brief constraint violations during slews or maneuvers

  • Grace periods: Reasonable allowance for operational flexibility

  • Critical protection: Sustained violations that could cause hardware damage trigger safe mode

Set time_threshold_seconds to None (null in a JSON config) to create monitoring-only constraints that track violations but never trigger safe mode.
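
The reset-on-clear behavior can be sketched with a small timer class. This is an illustration under assumptions, not the library's code: ViolationTimer is a hypothetical name, the simulation is assumed to advance in fixed steps, and the trigger is assumed to fire once the continuous time reaches the threshold (the real comparison may be strict).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ViolationTimer:
    """Tracks total and continuous violation time for one constraint."""
    time_threshold_seconds: Optional[float]
    total_violation_seconds: float = 0.0
    continuous_violation_seconds: float = 0.0

    def update(self, in_violation: bool, step_seconds: float) -> bool:
        """Advance one step; return True when safe mode should be requested."""
        if in_violation:
            self.total_violation_seconds += step_seconds
            self.continuous_violation_seconds += step_seconds
        else:
            # Any satisfied step resets the continuous counter.
            self.continuous_violation_seconds = 0.0
        if self.time_threshold_seconds is None:
            return False  # monitoring only
        return self.continuous_violation_seconds >= self.time_threshold_seconds


timer = ViolationTimer(time_threshold_seconds=300.0)
# Four violated minutes, one clear minute, then five violated minutes.
pattern = [True] * 4 + [False] + [True] * 5
triggers = [timer.update(v, step_seconds=60.0) for v in pattern]
print(triggers.index(True), timer.total_violation_seconds)
```

In this run the first four violated steps do not trigger, because the clear step resets the counter before the 300 s threshold is reached; only the second, uninterrupted stretch does.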

Example Configuration File

Complete example configurations are available in:

  • examples/example_config_with_fault_management.json - Threshold-based monitoring

  • examples/example_config_with_red_limits.json - Red limit constraints

  • examples/example_spacecraft_red_limits.py - Programmatic red limit configuration

The threshold-based example demonstrates monitoring of:

  • battery_level: Warning at 50%, critical at 40%

  • power_usage: Warning at 450W, critical at 500W

  • recorder_fill_fraction: Warning at 80%, critical at 95%

The red limit example demonstrates spacecraft health and safety constraints:

  • spacecraft_sun_limit: 30° exclusion zone, 5 minute threshold (thermal protection)

  • spacecraft_earth_limit: 10° exclusion zone, 10 minute threshold (stray light protection)

Event Log

All significant fault management transitions are recorded in an in-memory events list on FaultManagement.

FaultEvent fields:

  • utime – Unix timestamp (float)

  • event_type – One of threshold_transition, constraint_violation, safe_mode_trigger

  • name – Threshold / constraint name

  • cause – Human-readable description

  • metadata – Optional contextual dict (subset of keys; may include current value, thresholds, RA/Dec, violation durations)

Example:

# After running fm.check(...)
for evt in fm.events:
    print(evt)  # Uses concise __str__ representation

Filtering events:

safe_mode_events = [e for e in fm.events if e.event_type == "safe_mode_trigger"]
sun_constraint_events = [e for e in fm.events if e.name == "spacecraft_sun_limit"]

The event log is append-only for the duration of a simulation; clear with fm.events.clear() if needed between runs.
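
For post-run analysis, the log can be summarized with standard-library tools. The sketch below uses a stand-in Event dataclass carrying the FaultEvent fields listed above, so that it runs without the package; the helper names and the sample events are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Stand-in with the FaultEvent fields listed above."""
    utime: float
    event_type: str
    name: str
    cause: str
    metadata: Optional[dict] = None


def count_by_type(events) -> Counter:
    """Number of events per event_type."""
    return Counter(e.event_type for e in events)


def first_safe_mode_trigger(events) -> Optional[Event]:
    """Earliest safe_mode_trigger event, or None if safe mode never fired."""
    triggers = [e for e in events if e.event_type == "safe_mode_trigger"]
    return min(triggers, key=lambda e: e.utime) if triggers else None


# Made-up sample log for illustration.
log = [
    Event(100.0, "threshold_transition", "battery_level", "nominal -> yellow"),
    Event(160.0, "constraint_violation", "spacecraft_sun_limit", "entered violation"),
    Event(220.0, "threshold_transition", "battery_level", "yellow -> red"),
    Event(220.0, "safe_mode_trigger", "battery_level", "RED condition"),
]

print(dict(count_by_type(log)))
print(first_safe_mode_trigger(log).name)
```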

Housekeeping Schema and New Metrics

Threshold names must match predefined Housekeeping fields.

If you need to monitor a new metric, first add it to the Housekeeping model in code, then add thresholds for that new field. Arbitrary extra fields in Housekeeping(...) are not accepted.
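
The two-step workflow (extend the schema, then add the threshold) can be sketched with a plain dataclass. This is only an analogy: HousekeepingSketch is a hypothetical stand-in with a handful of fields, payload_temperature_c is an invented example metric, and the real Housekeeping model may validate names differently.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class HousekeepingSketch:
    """Hypothetical stand-in; the real Housekeeping model has more fields."""
    battery_level: Optional[float] = None
    power_usage: Optional[float] = None
    recorder_fill_fraction: Optional[float] = None
    # Step 1: the new metric is added to the schema first (invented field).
    payload_temperature_c: Optional[float] = None


ALLOWED_NAMES = {f.name for f in fields(HousekeepingSketch)}


def validate_threshold_name(name: str) -> str:
    """Step 2: a threshold name is accepted only if the schema defines it."""
    if name not in ALLOWED_NAMES:
        raise ValueError(f"{name!r} is not a Housekeeping field")
    return name


print(validate_threshold_name("payload_temperature_c"))
try:
    validate_threshold_name("coolant_pressure")
except ValueError as exc:
    print("rejected:", exc)
```

Like the real model, the stand-in rejects unknown keyword arguments at construction time, so a metric cannot be passed in without first being declared.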

API Reference

See conops.config.fault_management for detailed API documentation.

Best Practices

Threshold-Based Monitoring

  • Yellow before Red: Set yellow thresholds as early warnings before critical limits

  • Test thresholds: Validate threshold values don’t cause premature safe mode triggers

  • Monitor statistics: Review accumulated yellow/red time after simulations

  • Battery monitoring: Always include battery_level monitoring for power-critical missions

Red Limit Constraints

  • Looser than science constraints: Red limits should be less restrictive than instrument constraints

  • Hardware protection focus: Design constraints around thermal limits, detector saturation, etc.

  • Appropriate time thresholds: Allow transient violations during normal operations (slews, etc.)

  • Test time thresholds: Verify thresholds don’t trigger during routine maneuvers

  • Use monitoring mode: Set time_threshold_seconds: null to track violations without triggering safe mode

ACS Mode Filtering

  • Mode-appropriate monitoring: Use acs_modes to check parameters only when they matter operationally

  • Operational focus: Monitor high power usage during SCIENCE mode when needed

  • Data safety: Check recorder fill limits in operational modes where data accumulation matters

  • Power monitoring: Monitor battery levels in all modes (power is always critical)

  • Test mode transitions: Verify thresholds behave correctly during mode changes

General

  • Safe mode policy: Consider setting safe_mode_on_red: false for analysis runs where you want to observe fault behavior without intervention

  • Separate concerns: Use thresholds for subsystem health (battery, power, recorder fill), red limits for pointing safety