Results & Metrics
Overall Performance
Astrophage achieves state-of-the-art results on the KOI classification task:
graph LR
subgraph "Metrics"
A[Accuracy<br/>94.81%]
B[Macro F1<br/>92.64%]
C[Weighted F1<br/>94.51%]
end
style A fill:#2ecc71,stroke:#27ae60,color:#fff
style B fill:#3498db,stroke:#2980b9,color:#fff
style C fill:#9b59b6,stroke:#8e44ad,color:#fff
Per-Class Performance
graph TD
subgraph "CONFIRMED"
C1[Precision: 89.95%]
C2[Recall: 94.54%]
C3[F1-Score: 92.18%]
end
subgraph "CANDIDATE"
A1[Precision: 88.42%]
A2[Recall: 85.06%]
A3[F1-Score: 86.71%]
end
subgraph "FALSE POSITIVE"
F1[Precision: 99.69%]
F2[Recall: 98.35%]
F3[F1-Score: 99.01%]
end
style C1 fill:#2ecc71,stroke:#27ae60,color:#fff
style C2 fill:#2ecc71,stroke:#27ae60,color:#fff
style C3 fill:#2ecc71,stroke:#27ae60,color:#fff
style F1 fill:#2ecc71,stroke:#27ae60,color:#fff
style F2 fill:#2ecc71,stroke:#27ae60,color:#fff
style F3 fill:#2ecc71,stroke:#27ae60,color:#fff
Detailed Breakdown
| Class | Precision | Recall | F1-Score | Support | Notes |
|---|---|---|---|---|---|
| CANDIDATE | 88.42% | 85.06% | 86.71% | 1,978 | Hardest class — ambiguous by definition |
| FALSE POSITIVE | 99.69% | 98.35% | 99.01% | 4,839 | Nearly perfect — FP flags are very strong |
| CONFIRMED | 89.95% | 94.54% | 92.18% | 2,747 | Strong — clear signals are easy to identify |
Confusion Matrix
graph TD
subgraph "Confusion Matrix (Test Set: 1,911 samples)"
A1[True CANDIDATE<br/>1,978] --> B1[Predicted CANDIDATE<br/>~1,682]
A1 --> B2[Predicted FALSE+<br/>~296]
A2[True FALSE+<br/>4,839] --> B3[Predicted FALSE+<br/>~4,759]
A2 --> B4[Predicted CANDIDATE<br/>~80]
A3[True CONFIRMED<br/>2,747] --> B5[Predicted CONFIRMED<br/>~2,597]
A3 --> B6[Predicted NOT<br/>~150]
end
style B1 fill:#3498db,stroke:#2980b9,color:#fff
style B3 fill:#e74c3c,stroke:#c0392c,color:#fff
style B5 fill:#2ecc71,stroke:#27ae60,color:#fff
style B2 fill:#e74c3c,stroke:#c0392c,color:#fff
style B4 fill:#3498db,stroke:#2980b9,color:#fff
style B6 fill:#e74c3c,stroke:#c0392c,color:#fff
Most confusion occurs between CANDIDATE and FALSE_POSITIVE — exactly where we expect it. Stage 1’s CONFIRMED separation is nearly clean.
Feature Importance
graph LR
subgraph "Top 10 Features"
A1[1. fpflag_sum<br/>0.2918]
A2[2. koi_fpflag_co<br/>0.0683]
A3[3. koi_max_mult_ev<br/>0.0630]
A4[4. koi_fpflag_nt<br/>0.0624]
A5[5. koi_model_snr<br/>0.0596]
A6[6. koi_fpflag_ss<br/>0.0450]
A7[7. koi_prad<br/>0.0437]
A8[8. snr_x_prad<br/>0.0390]
A9[9. koi_count<br/>0.0324]
A10[10. koi_ror<br/>0.0300]
end
Feature Importance by Category
graph TD
subgraph "Importance Distribution"
A[FP Flags<br/>~47%]
B[Signal Quality<br/>~20%]
C[Physical Params<br/>~18%]
D[Derived Features<br/>~15%]
end
A --> E[fpflag_sum dominates]
B --> F[SNR, max events]
C --> G[Radius, temperature]
D --> H[Interactions, ratios]
style A fill:#e74c3c,stroke:#c0392c,color:#fff
style E fill:#e74c3c,stroke:#c0392c,color:#fff
Astrophysical Insights
Insight 1: False Positive Flags (Very High Confidence)
graph LR
A[NASA FP Flags] --> B[fpflag_sum]
B -->|Value > 0| C[99%+ chance<br/>FALSE POSITIVE]
B -->|Value = 0| D[Needs further<br/>analysis]
style C fill:#e74c3c,stroke:#c0392c,color:#fff
style D fill:#3498db,stroke:#2980b9,color:#fff
Supporting features:
fpflag_sum,koi_fpflag_nt,koi_fpflag_ssNASA’s pre-vetting flags directly encode expert knowledge. When these are non-zero, the signal is almost certainly not a planet. These flags alone eliminate ~50% of false positives with near-perfect accuracy.
Insight 2: SNR-Radius Consistency (High Confidence)
graph LR
A[Real Planet] -->|Jupiter| B[High SNR]
A -->|Earth| C[Low SNR]
D[Inconsistent] -->|Large + Low SNR| E[Suspicious]
B --> F[snr_x_prad<br/>consistent]
C --> F
E --> G[snr_x_prad<br/>inconsistent]
style F fill:#2ecc71,stroke:#27ae60,color:#fff
style G fill:#e74c3c,stroke:#c0392c,color:#fff
Supporting features:
koi_model_snr,snr_x_prad,koi_pradReal planets have signal-to-noise ratios consistent with their size. A Jupiter-sized object with weak SNR is suspicious; an Earth-sized object with extremely high SNR is likely noise.
Insight 3: Transit Geometry (High Confidence)
graph LR
subgraph "Planet Transit"
A1[U-shaped curve]
A2[Specific depth/duration]
end
subgraph "Stellar Binary"
B1[V-shaped curve]
B2[Different depth/duration]
end
A1 --> C[depth_duration_ratio<br/>~ planet]
B1 --> D[depth_duration_ratio<br/>~ binary]
style C fill:#2ecc71,stroke:#27ae60,color:#fff
style D fill:#e74c3c,stroke:#c0392c,color:#fff
Supporting features:
depth_duration_ratio,log_period,koi_durationPlanetary transits produce characteristic U-shaped light curves with specific depth-to-duration ratios. Stellar binaries produce V-shaped eclipses with different geometry. Our derived
depth_duration_ratiocaptures this distinction.
Recommendations
graph TD
subgraph "Actionable Recommendations"
A1[Use Stage 1 as rapid filter]
A2[Investigate uncertain Stage 1 samples]
A3[Use Stage 2 for prioritization]
A4[Use fpflag_sum as pre-filter]
end
A1 --> B1[Quickly identify CONFIRMED]
A2 --> B2[Scientifically interesting edge cases]
A3 --> B3[Prioritize CANDIDATE follow-up]
A4 --> B4[Eliminate 50% of false positives instantly]
style A1 fill:#2ecc71,stroke:#27ae60,color:#fff
style A2 fill:#3498db,stroke:#2980b9,color:#fff
style A3 fill:#9b59b6,stroke:#8e44ad,color:#fff
style A4 fill:#f39c12,stroke:#e67e22,color:#fff
| # | Recommendation | Impact |
|---|---|---|
| 1 | Use Stage 1 as a rapid filter for follow-up observations | Saves telescope time |
| 2 | Investigate samples where Stage 1 is uncertain (probability ~0.5) | Most scientifically interesting |
| 3 | For NOT_CONFIRMED, use Stage 2 probability to prioritize follow-up | Efficient resource allocation |
| 4 | fpflag_sum alone eliminates ~50% of false positives with near-perfect accuracy | Dramatic efficiency gain |
Comparison with Baselines
graph LR
subgraph "Accuracy Comparison"
A[Single-Stage RF<br/>~91%]
B[Logistic Regression<br/>~87%]
C[SVM<br/>~89%]
D[Astrophage<br/>94.81%]
end
style D fill:#2ecc71,stroke:#27ae60,color:#fff
style A fill:#95a5a6,stroke:#7f8c8d,color:#fff
style B fill:#95a5a6,stroke:#7f8c8d,color:#fff
style C fill:#95a5a6,stroke:#7f8c8d,color:#fff
Astrophage’s two-stage architecture provides a 3-4% accuracy improvement over single-stage approaches, which is significant in the context of exoplanet discovery where each percentage point represents hundreds of potential planets.