Fault Management
Overview
The Fault Management system provides extensible monitoring and response capabilities for spacecraft operations. It monitors configured parameters against yellow and red thresholds, tracks time spent in each fault state, and can automatically trigger safe mode when critical (RED) conditions occur.
Key Features
Multi-parameter monitoring: Track multiple spacecraft parameters simultaneously
Configurable thresholds: Set yellow (warning) and red (critical) limits for each parameter
Spacecraft red limit constraints: Define health and safety pointing constraints with time-based safe mode triggering
Bidirectional thresholds: Support for both “below” and “above” threshold types
Time tracking: Accumulate duration spent in yellow and red states, or in constraint violations
Automatic safe mode: Trigger irreversible safe mode on RED conditions or sustained constraint violations
Schema-validated thresholds: Threshold names must match predefined Housekeeping fields
Configuration
Programmatic Configuration
Create and configure fault management using the FaultManagement class:
from conops.common import ACSMode
from conops.config.fault_management import FaultManagement
import rust_ephem
# Create fault management system
fm = FaultManagement()
# Add parameter thresholds
fm.add_threshold("battery_level", yellow=0.5, red=0.4, direction="below")
fm.add_threshold("power_usage", yellow=450.0, red=500.0, direction="above")
fm.add_threshold("recorder_fill_fraction", yellow=0.8, red=0.95, direction="above")
# Add mode-specific threshold (power usage only checked in SCIENCE mode)
fm.add_threshold(
    "power_usage",
    yellow=450.0,
    red=500.0,
    direction="above",
    acs_modes=[ACSMode.SCIENCE]
)
# Add spacecraft red limit constraints
fm.add_red_limit_constraint(
    name="spacecraft_sun_limit",
    constraint=rust_ephem.SunConstraint(min_angle=30.0),
    time_threshold_seconds=300.0,  # 5 minutes
    description="Prevent thermal damage from prolonged sun exposure"
)
fm.add_red_limit_constraint(
    name="spacecraft_earth_limit",
    constraint=rust_ephem.EarthLimbConstraint(min_angle=10.0),
    time_threshold_seconds=600.0,  # 10 minutes
    description="Prevent stray light contamination"
)
# Add monitoring-only constraint (no safe mode trigger)
fm.add_red_limit_constraint(
    name="spacecraft_moon_monitor",
    constraint=rust_ephem.MoonConstraint(min_angle=5.0),
    time_threshold_seconds=None,  # No automatic safe mode
    description="Monitor moon proximity (informational only)"
)
Threshold Parameters
Each threshold requires:
name: Unique identifier for the parameter (must match a Housekeeping attribute name)
yellow: Warning threshold value
red: Critical threshold value (must be more severe than yellow)
direction: Either "below" or "above"
    "below": Fault triggered when value ≤ threshold (e.g., battery level)
    "above": Fault triggered when value ≥ threshold (e.g., temperature, power)
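The direction semantics above can be illustrated with a small standalone function. This is a minimal sketch of the comparison rules as described, not the library's implementation; the function name `classify` is hypothetical, and the state labels follow the "nominal" / "yellow" / "red" strings used elsewhere on this page.

```python
def classify(value: float, yellow: float, red: float, direction: str) -> str:
    """Return "nominal", "yellow", or "red" for a single reading."""
    if direction == "below":
        # Lower is worse: check the more severe (red) limit first.
        if value <= red:
            return "red"
        if value <= yellow:
            return "yellow"
    elif direction == "above":
        # Higher is worse: check the more severe (red) limit first.
        if value >= red:
            return "red"
        if value >= yellow:
            return "yellow"
    else:
        raise ValueError(f"direction must be 'below' or 'above', got {direction!r}")
    return "nominal"


# Battery example: yellow=0.5, red=0.4, direction="below"
battery_states = [classify(v, 0.5, 0.4, "below") for v in (0.8, 0.5, 0.4)]
# Power example: yellow=450.0, red=500.0, direction="above"
power_states = [classify(v, 450.0, 500.0, "above") for v in (300.0, 470.0, 500.0)]
```

Checking the red limit before the yellow one matters because a red reading also satisfies the yellow comparison.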
ACS Mode Filtering
Thresholds can be restricted to specific Attitude Control System (ACS) modes using the acs_modes parameter. This allows different fault policies for different operational modes.
For example:
- Only check high power draw during SCIENCE mode
- Check recorder fill level in all modes except SAFE mode
- Monitor battery level in all modes (default behavior)
Programmatic Configuration with ACS Modes:
from conops.common import ACSMode
from conops.config.fault_management import FaultManagement
fm = FaultManagement()
# Check battery in all modes (default)
fm.add_threshold("battery_level", yellow=0.5, red=0.4, direction="below")
# Only check high power usage during science operations
fm.add_threshold(
    "power_usage",
    yellow=450.0,
    red=500.0,
    direction="above",
    acs_modes=[ACSMode.SCIENCE]
)
# Check recorder fill limits in multiple modes
fm.add_threshold(
    "recorder_fill_fraction",
    yellow=0.8,
    red=0.95,
    direction="above",
    acs_modes=[ACSMode.SCIENCE, ACSMode.SLEW, ACSMode.SETTLE]
)
Valid threshold names are the predefined Housekeeping fields. Commonly used fields include:
battery_level
power_usage / power_bus / power_payload
panel_illumination
recorder_fill_fraction / recorder_alert
sun_angle_deg / earth_angle_deg / moon_angle_deg
star_tracker_functional_count (automatically configured when star trackers are present)
During fault checking, the current ACS mode is determined from:
1. housekeeping.acs_mode (preferred)
2. acs.acsmode (fallback)
Thresholds are only evaluated when the current mode is in the acs_modes list. Omit acs_modes (or set to null) to check in all modes.
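The mode lookup and filtering rule can be sketched in a few lines. This is an illustrative model only: `Mode` is a hypothetical stand-in for `ACSMode`, and `resolve_mode` and `is_active` are names invented for this example rather than part of the library.

```python
from enum import Enum, auto
from typing import Optional, Sequence


class Mode(Enum):  # hypothetical stand-in for conops.common.ACSMode
    SCIENCE = auto()
    SLEW = auto()
    SAFE = auto()


def resolve_mode(housekeeping_mode: Optional[Mode], acs_mode: Optional[Mode]) -> Optional[Mode]:
    """Prefer the housekeeping value; fall back to the ACS value."""
    return housekeeping_mode if housekeeping_mode is not None else acs_mode


def is_active(current: Optional[Mode], acs_modes: Optional[Sequence[Mode]]) -> bool:
    """A threshold without an acs_modes filter applies in every mode."""
    if acs_modes is None:
        return True
    return current in acs_modes


# Housekeeping carries no mode here, so the ACS value is used.
current = resolve_mode(None, Mode.SLEW)
science_only = is_active(current, [Mode.SCIENCE])  # filtered out while slewing
unfiltered = is_active(current, None)              # always evaluated
```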
Spacecraft Red Limit Constraints
In addition to threshold-based monitoring, you can define spacecraft-level red limit constraints for health and safety. These are typically looser than instrument constraints (which are optimized for data quality) and exist purely to protect spacecraft hardware.
Red limit constraints can be added programmatically:
from conops.config.fault_management import FaultManagement
import rust_ephem
fm = FaultManagement()
# Add spacecraft sun constraint - thermal protection
fm.add_red_limit_constraint(
    name="spacecraft_sun_limit",
    constraint=rust_ephem.SunConstraint(min_angle=30.0),
    time_threshold_seconds=300.0,  # 5 minutes
    description="Prevent thermal damage from prolonged sun exposure"
)
# Add spacecraft earth constraint - stray light protection
fm.add_red_limit_constraint(
    name="spacecraft_earth_limit",
    constraint=rust_ephem.EarthLimbConstraint(min_angle=10.0),
    time_threshold_seconds=600.0,  # 10 minutes
    description="Prevent stray light contamination"
)
# Add monitoring-only constraint (no safe mode trigger)
fm.add_red_limit_constraint(
    name="spacecraft_moon_monitor",
    constraint=rust_ephem.MoonConstraint(min_angle=5.0),
    time_threshold_seconds=None,  # No automatic safe mode
    description="Monitor moon proximity (informational only)"
)
Red Limit Constraint Parameters
Each red limit constraint requires:
name: Unique identifier for the constraint
constraint: rust_ephem constraint definition (sun, earth_limb, moon, etc.)
time_threshold_seconds: Maximum continuous time in violation before triggering safe mode (set to null, or None in Python, for monitoring only)
description: Human-readable description of the constraint purpose
Programmatic Usage
Creating Fault Management
from conops.common import ACSMode
from conops.config.fault_management import FaultManagement
import rust_ephem
# Create fault management system
fm = FaultManagement()
# Add thresholds programmatically
fm.add_threshold("battery_level", yellow=0.5, red=0.4, direction="below")
fm.add_threshold("power_usage", yellow=450.0, red=500.0, direction="above")
fm.add_threshold("recorder_fill_fraction", yellow=0.8, red=0.95, direction="above")
# Add mode-specific threshold
fm.add_threshold(
    "power_usage",
    yellow=450.0,
    red=500.0,
    direction="above",
    acs_modes=[ACSMode.SCIENCE]
)
# Add spacecraft red limit constraints
fm.add_red_limit_constraint(
    name="spacecraft_sun_limit",
    constraint=rust_ephem.SunConstraint(min_angle=30.0),
    time_threshold_seconds=300.0,  # 5 minutes
    description="Prevent thermal damage from prolonged sun exposure"
)
fm.add_red_limit_constraint(
    name="spacecraft_earth_limit",
    constraint=rust_ephem.EarthLimbConstraint(min_angle=10.0),
    time_threshold_seconds=600.0,  # 10 minutes
    description="Prevent stray light contamination"
)
# Add monitoring-only constraint (no safe mode trigger)
fm.add_red_limit_constraint(
    name="spacecraft_moon_monitor",
    constraint=rust_ephem.MoonConstraint(min_angle=5.0),
    time_threshold_seconds=None,  # No automatic safe mode
    description="Monitor moon proximity (informational only)"
)
Checking Parameters
Call check() each simulation cycle to evaluate monitored parameters and red limit constraints:
from datetime import datetime, timezone
from conops.ditl.telemetry import Housekeeping
# Create housekeeping telemetry packet
hk = Housekeeping(
    timestamp=datetime.now(tz=timezone.utc),
    battery_level=battery.battery_level,
    power_usage=power_system.total_draw,
    recorder_fill_fraction=data_system.buffer_usage_fraction,
    ra=current_pointing_ra,
    dec=current_pointing_dec,
    acs_mode=spacecraft_acs.acsmode
)
# Check parameters and constraints
classifications = fm.check(housekeeping=hk, acs=spacecraft_acs)
# classifications = {"battery_level": "yellow", "power_usage": "nominal", ...}
# Red limit constraints are checked automatically using housekeeping data
Retrieving Statistics
Get accumulated time in each fault state and constraint violation statistics:
stats = fm.statistics()
# For threshold-based parameters:
# {
# "battery_level": {
# "yellow_seconds": 120.0,
# "red_seconds": 0.0,
# "current": "yellow"
# },
# "power_usage": {
# "yellow_seconds": 45.0,
# "red_seconds": 30.0,
# "current": "red"
# }
# }
# For red limit constraints:
# {
# "spacecraft_sun_limit": {
# "in_violation": False,
# "total_violation_seconds": 180.0,
# "continuous_violation_seconds": 0.0
# }
# }
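As a rough illustration of working with this output, the sketch below summarizes a dictionary shaped like the examples above. It assumes that threshold and constraint entries arrive in one combined dictionary and that they can be told apart by their keys ("current" versus "in_violation"); the sample values are copied from the examples and the variable names are made up for this sketch.

```python
# Sample data copied from the shapes shown above (not real simulation output).
stats = {
    "battery_level": {"yellow_seconds": 120.0, "red_seconds": 0.0, "current": "yellow"},
    "power_usage": {"yellow_seconds": 45.0, "red_seconds": 30.0, "current": "red"},
    "spacecraft_sun_limit": {
        "in_violation": False,
        "total_violation_seconds": 180.0,
        "continuous_violation_seconds": 0.0,
    },
}

# Threshold entries carry a "current" key; constraint entries carry "in_violation".
currently_red = sorted(name for name, s in stats.items() if s.get("current") == "red")
total_red_seconds = sum(s.get("red_seconds", 0.0) for s in stats.values())
violating_now = sorted(name for name, s in stats.items() if s.get("in_violation"))
```

Using `dict.get` keeps the summary tolerant of entries that lack a given key, which is what lets both entry types share one loop.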
Separating Statistics by Type
To separate threshold-based and constraint-based statistics:
stats = fm.statistics()
# Get red limit constraint stats
constraint_stats = {
    name: data for name, data in stats.items()
    if any(c.name == name for c in fm.red_limit_constraints)
}
# Get threshold stats
threshold_stats = {
    name: data for name, data in stats.items()
    if any(t.name == name for t in fm.thresholds)
}
Integration with QueueDITL
The fault management system is automatically integrated into the QueueDITL simulation loop when configured. It checks parameters after each power update:
from conops.config import MissionConfig
from conops.queue_ditl import QueueDITL
# Load config with fault_management section
config = MissionConfig.from_json("config_with_fault_management.json")
# Initialize defaults (adds battery_level, recorder_fill_fraction, and
# star_tracker_functional_count thresholds if not already present)
config.init_fault_management_defaults()
# Run simulation
ditl = QueueDITL(config, target_queue, begin, end, tle_file)
ditl.run()
# Check fault statistics after simulation
if config.fault_management:
    stats = config.fault_management.statistics()
    print(f"Fault statistics: {stats}")
Safe Mode Behavior
When safe_mode_on_red is true (default), any parameter reaching RED state or any red limit constraint exceeding its time threshold will:
Set flag: The safe_mode_requested flag is set to True
DITL checks flag: The QueueDITL loop detects the flag and enqueues an ENTER_SAFE_MODE command
Irreversible operation: Safe mode cannot be exited once entered
Sun pointing: Spacecraft points solar panels at Sun for maximum power
Command queue cleared: All pending commands are discarded
Emergency power: System operates in minimal power configuration
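The sequence above can be modeled with a toy loop. This is a hedged sketch of the described behavior, not QueueDITL's actual code: the class `LoopSketch`, the ordering of the clear and enqueue steps, and the command names other than ENTER_SAFE_MODE are all assumptions made for illustration.

```python
from collections import deque


class LoopSketch:
    """Hypothetical stand-in for safe-mode handling in a simulation loop."""

    def __init__(self, pending):
        self.queue = deque(pending)
        self.in_safe_mode = False

    def step(self, safe_mode_requested: bool) -> None:
        if self.in_safe_mode:
            return  # irreversible: once entered, nothing changes the mode
        if safe_mode_requested:
            self.queue.clear()                    # discard all pending commands
            self.queue.append("ENTER_SAFE_MODE")  # enqueue the safe-mode command
        # This sketch executes only the safe-mode command; others stay queued.
        if self.queue and self.queue[0] == "ENTER_SAFE_MODE":
            self.queue.popleft()
            self.in_safe_mode = True


loop = LoopSketch(["SLEW", "OBSERVE"])  # made-up command names
loop.step(safe_mode_requested=False)
queue_before = list(loop.queue)         # pending commands are untouched
loop.step(safe_mode_requested=True)     # flag seen: queue cleared, safe mode entered
```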
Red Limit Constraint Triggering
Red limit constraints track continuous violation time. The constraint must be violated continuously for the entire time_threshold_seconds duration before safe mode is triggered. If the constraint is satisfied (even briefly), the continuous violation counter resets to zero.
This allows for:
Transient violations: Brief constraint violations during slews or maneuvers
Grace periods: Reasonable allowance for operational flexibility
Critical protection: Sustained violations that could cause hardware damage trigger safe mode
Set time_threshold_seconds to null (None in Python) to create monitoring-only constraints that track violations but never trigger safe mode.
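The reset-on-compliance rule is easiest to see in a small timer model. The sketch below is an assumption-laden illustration rather than the library's code: `ViolationTimer` is a hypothetical class, the field names are borrowed from the statistics output, and triggering at "greater than or equal to" the threshold is a guess about the exact comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ViolationTimer:
    time_threshold_seconds: Optional[float]  # None means monitoring only
    continuous_violation_seconds: float = 0.0
    total_violation_seconds: float = 0.0

    def update(self, in_violation: bool, dt: float) -> bool:
        """Advance by dt seconds; return True if safe mode should be requested."""
        if in_violation:
            self.continuous_violation_seconds += dt
            self.total_violation_seconds += dt
        else:
            # Any compliant step resets the continuous streak to zero.
            self.continuous_violation_seconds = 0.0
        if self.time_threshold_seconds is None:
            return False
        return self.continuous_violation_seconds >= self.time_threshold_seconds


# 60-second steps: four violating, one compliant (reset), then five violating.
timer = ViolationTimer(time_threshold_seconds=300.0)
pattern = [True] * 4 + [False] + [True] * 5
triggers = [timer.update(v, dt=60.0) for v in pattern]
```

In this run the first streak reaches only 240 seconds before the reset, so the trigger fires on the last step, when the second streak reaches 300 seconds. The total violation time keeps accumulating across both streaks.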
Example Configuration File
Complete example configurations are available in:
examples/example_config_with_fault_management.json - Threshold-based monitoring
examples/example_config_with_red_limits.json - Red limit constraints
examples/example_spacecraft_red_limits.py - Programmatic red limit configuration
The threshold-based example demonstrates monitoring of:
battery_level: Warning at 50%, critical at 40%
power_usage: Warning at 450W, critical at 500W
recorder_fill_fraction: Warning at 80%, critical at 95%
The red limit example demonstrates spacecraft health and safety constraints:
spacecraft_sun_limit: 30° exclusion zone, 5 minute threshold (thermal protection)
spacecraft_earth_limit: 10° exclusion zone, 10 minute threshold (stray light protection)
Event Log
All significant fault management transitions are recorded in an in-memory events list on FaultManagement.
FaultEvent fields:
utime – Unix timestamp (float)
event_type – One of threshold_transition, constraint_violation, safe_mode_trigger
name – Threshold / constraint name
cause – Human-readable description
metadata – Optional contextual dict (subset of keys; may include current value, thresholds, RA/Dec, violation durations)
Example:
# After running fm.check(...)
for evt in fm.events:
    print(evt)  # Uses concise __str__ representation
Filtering events:
safe_mode_events = [e for e in fm.events if e.event_type == "safe_mode_trigger"]
sun_constraint_events = [e for e in fm.events if e.name == "spacecraft_sun_limit"]
The event log is append-only for the duration of a simulation; clear with fm.events.clear() if needed between runs.
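For post-run analysis it can help to tally events by type. The sketch below uses a hypothetical `Event` dataclass that mirrors the FaultEvent fields listed above, with made-up sample events, so that it runs on its own; with a real run the same expressions would presumably be applied to fm.events.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Event:  # hypothetical mirror of the FaultEvent fields listed above
    utime: float
    event_type: str
    name: str
    cause: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# Made-up sample events for illustration only.
events = [
    Event(1000.0, "threshold_transition", "battery_level", "nominal -> yellow"),
    Event(1060.0, "constraint_violation", "spacecraft_sun_limit", "entered violation"),
    Event(1360.0, "safe_mode_trigger", "spacecraft_sun_limit", "violation exceeded 300 s"),
]

# Count events per type, then find the earliest safe-mode trigger (None if absent).
counts = Counter(e.event_type for e in events)
first_trigger = min(
    (e for e in events if e.event_type == "safe_mode_trigger"),
    key=lambda e: e.utime,
    default=None,
)
```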
Housekeeping Schema and New Metrics
Threshold names must match predefined Housekeeping fields.
If you need to monitor a new metric, first add it to the Housekeeping model in code, then add thresholds for that new field. Arbitrary extra fields in Housekeeping(...) are not accepted.
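The name check can be pictured as a lookup against the model's declared fields. This is a simplified sketch under stated assumptions: `HousekeepingSketch` is a trimmed, hypothetical stand-in built with a standard-library dataclass (the real Housekeeping model may be implemented differently), and `validate_threshold_name` is an illustrative helper, not a library function.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class HousekeepingSketch:  # hypothetical, trimmed stand-in for Housekeeping
    battery_level: Optional[float] = None
    power_usage: Optional[float] = None
    recorder_fill_fraction: Optional[float] = None


# The set of valid threshold names is derived from the declared fields.
VALID_NAMES = {f.name for f in fields(HousekeepingSketch)}


def validate_threshold_name(name: str) -> str:
    """Return the name unchanged if it is a declared field, else raise."""
    if name not in VALID_NAMES:
        raise ValueError(
            f"{name!r} is not a Housekeeping field; valid names: {sorted(VALID_NAMES)}"
        )
    return name
```

Deriving the valid names from the model means that adding a field to the model is the only step needed before a threshold on that field passes validation, which matches the workflow described above.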
API Reference
See conops.fault_management for detailed API documentation.
Best Practices
Threshold-Based Monitoring
Yellow before Red: Set yellow thresholds as early warnings before critical limits
Test thresholds: Validate threshold values don’t cause premature safe mode triggers
Monitor statistics: Review accumulated yellow/red time after simulations
Battery monitoring: Always include battery_level monitoring for power-critical missions
Red Limit Constraints
Looser than science constraints: Red limits should be less restrictive than instrument constraints
Hardware protection focus: Design constraints around thermal limits, detector saturation, etc.
Appropriate time thresholds: Allow transient violations during normal operations (slews, etc.)
Test time thresholds: Verify thresholds don’t trigger during routine maneuvers
Use monitoring mode: Set time_threshold_seconds: null to track violations without triggering safe mode
ACS Mode Filtering
Mode-appropriate monitoring: Use acs_modes to check parameters only when they matter operationally
Operational focus: Monitor high power usage during SCIENCE mode when needed
Data safety: Check recorder fill limits in operational modes where data accumulation matters
Power monitoring: Monitor battery levels in all modes (power is always critical)
Test mode transitions: Verify thresholds behave correctly during mode changes
General
Safe mode policy: Consider setting safe_mode_on_red: false for analysis runs where you want to observe fault behavior without intervention
Separate concerns: Use thresholds for subsystem health (battery, power, recorder fill), red limits for pointing safety