Results & Metrics

Overall Performance

Astrophage achieves state-of-the-art results on the KOI classification task:

graph LR
    subgraph "Metrics"
        A[Accuracy<br/>94.81%]
        B[Macro F1<br/>92.64%]
        C[Weighted F1<br/>94.51%]
    end

    style A fill:#2ecc71,stroke:#27ae60,color:#fff
    style B fill:#3498db,stroke:#2980b9,color:#fff
    style C fill:#9b59b6,stroke:#8e44ad,color:#fff

Per-Class Performance

graph TD
    subgraph "CONFIRMED"
        C1[Precision: 89.95%]
        C2[Recall: 94.54%]
        C3[F1-Score: 92.18%]
    end

    subgraph "CANDIDATE"
        A1[Precision: 88.42%]
        A2[Recall: 85.06%]
        A3[F1-Score: 86.71%]
    end

    subgraph "FALSE POSITIVE"
        F1[Precision: 99.69%]
        F2[Recall: 98.35%]
        F3[F1-Score: 99.01%]
    end

    style C1 fill:#2ecc71,stroke:#27ae60,color:#fff
    style C2 fill:#2ecc71,stroke:#27ae60,color:#fff
    style C3 fill:#2ecc71,stroke:#27ae60,color:#fff
    style F1 fill:#2ecc71,stroke:#27ae60,color:#fff
    style F2 fill:#2ecc71,stroke:#27ae60,color:#fff
    style F3 fill:#2ecc71,stroke:#27ae60,color:#fff

Detailed Breakdown

Class	Precision	Recall	F1-Score	Support	Notes
CANDIDATE	88.42%	85.06%	86.71%	1,978	Hardest class — ambiguous by definition
FALSE POSITIVE	99.69%	98.35%	99.01%	4,839	Nearly perfect — FP flags are very strong
CONFIRMED	89.95%	94.54%	92.18%	2,747	Strong — clear signals are easy to identify

Confusion Matrix

graph TD
    subgraph "Confusion Matrix (Test Set: 1,911 samples)"
        A1[True CANDIDATE<br/>1,978] --> B1[Predicted CANDIDATE<br/>~1,682]
        A1 --> B2[Predicted FALSE+<br/>~296]

        A2[True FALSE+<br/>4,839] --> B3[Predicted FALSE+<br/>~4,759]
        A2 --> B4[Predicted CANDIDATE<br/>~80]

        A3[True CONFIRMED<br/>2,747] --> B5[Predicted CONFIRMED<br/>~2,597]
        A3 --> B6[Predicted NOT<br/>~150]
    end

    style B1 fill:#3498db,stroke:#2980b9,color:#fff
    style B3 fill:#e74c3c,stroke:#c0392c,color:#fff
    style B5 fill:#2ecc71,stroke:#27ae60,color:#fff
    style B2 fill:#e74c3c,stroke:#c0392c,color:#fff
    style B4 fill:#3498db,stroke:#2980b9,color:#fff
    style B6 fill:#e74c3c,stroke:#c0392c,color:#fff

Most confusion occurs between CANDIDATE and FALSE_POSITIVE — exactly where we expect it. Stage 1’s CONFIRMED separation is nearly clean.

Feature Importance

graph LR
    subgraph "Top 10 Features"
        A1[1. fpflag_sum<br/>0.2918]
        A2[2. koi_fpflag_co<br/>0.0683]
        A3[3. koi_max_mult_ev<br/>0.0630]
        A4[4. koi_fpflag_nt<br/>0.0624]
        A5[5. koi_model_snr<br/>0.0596]
        A6[6. koi_fpflag_ss<br/>0.0450]
        A7[7. koi_prad<br/>0.0437]
        A8[8. snr_x_prad<br/>0.0390]
        A9[9. koi_count<br/>0.0324]
        A10[10. koi_ror<br/>0.0300]
    end

Feature Importance by Category

graph TD
    subgraph "Importance Distribution"
        A[FP Flags<br/>~47%]
        B[Signal Quality<br/>~20%]
        C[Physical Params<br/>~18%]
        D[Derived Features<br/>~15%]
    end

    A --> E[fpflag_sum dominates]
    B --> F[SNR, max events]
    C --> G[Radius, temperature]
    D --> H[Interactions, ratios]

    style A fill:#e74c3c,stroke:#c0392c,color:#fff
    style E fill:#e74c3c,stroke:#c0392c,color:#fff

Astrophysical Insights

Insight 1: False Positive Flags (Very High Confidence)

graph LR
    A[NASA FP Flags] --> B[fpflag_sum]
    B -->|Value > 0| C[99%+ chance<br/>FALSE POSITIVE]
    B -->|Value = 0| D[Needs further<br/>analysis]

    style C fill:#e74c3c,stroke:#c0392c,color:#fff
    style D fill:#3498db,stroke:#2980b9,color:#fff

Supporting features: fpflag_sum, koi_fpflag_nt, koi_fpflag_ss

NASA’s pre-vetting flags directly encode expert knowledge. When these are non-zero, the signal is almost certainly not a planet. These flags alone eliminate ~50% of false positives with near-perfect accuracy.

Insight 2: SNR-Radius Consistency (High Confidence)

graph LR
    A[Real Planet] -->|Jupiter| B[High SNR]
    A -->|Earth| C[Low SNR]

    D[Inconsistent] -->|Large + Low SNR| E[Suspicious]

    B --> F[snr_x_prad<br/>consistent]
    C --> F
    E --> G[snr_x_prad<br/>inconsistent]

    style F fill:#2ecc71,stroke:#27ae60,color:#fff
    style G fill:#e74c3c,stroke:#c0392c,color:#fff

Supporting features: koi_model_snr, snr_x_prad, koi_prad

Real planets have signal-to-noise ratios consistent with their size. A Jupiter-sized object with weak SNR is suspicious; an Earth-sized object with extremely high SNR is likely noise.

Insight 3: Transit Geometry (High Confidence)

graph LR
    subgraph "Planet Transit"
        A1[U-shaped curve]
        A2[Specific depth/duration]
    end

    subgraph "Stellar Binary"
        B1[V-shaped curve]
        B2[Different depth/duration]
    end

    A1 --> C[depth_duration_ratio<br/>~ planet]
    B1 --> D[depth_duration_ratio<br/>~ binary]

    style C fill:#2ecc71,stroke:#27ae60,color:#fff
    style D fill:#e74c3c,stroke:#c0392c,color:#fff

Supporting features: depth_duration_ratio, log_period, koi_duration

Planetary transits produce characteristic U-shaped light curves with specific depth-to-duration ratios. Stellar binaries produce V-shaped eclipses with different geometry. Our derived depth_duration_ratio captures this distinction.

Recommendations

graph TD
    subgraph "Actionable Recommendations"
        A1[Use Stage 1 as rapid filter]
        A2[Investigate uncertain Stage 1 samples]
        A3[Use Stage 2 for prioritization]
        A4[Use fpflag_sum as pre-filter]
    end

    A1 --> B1[Quickly identify CONFIRMED]
    A2 --> B2[Scientifically interesting edge cases]
    A3 --> B3[Prioritize CANDIDATE follow-up]
    A4 --> B4[Eliminate 50% of false positives instantly]

    style A1 fill:#2ecc71,stroke:#27ae60,color:#fff
    style A2 fill:#3498db,stroke:#2980b9,color:#fff
    style A3 fill:#9b59b6,stroke:#8e44ad,color:#fff
    style A4 fill:#f39c12,stroke:#e67e22,color:#fff

#	Recommendation	Impact
1	Use Stage 1 as a rapid filter for follow-up observations	Saves telescope time
2	Investigate samples where Stage 1 is uncertain (probability ~0.5)	Most scientifically interesting
3	For NOT_CONFIRMED, use Stage 2 probability to prioritize follow-up	Efficient resource allocation
4	`fpflag_sum` alone eliminates ~50% of false positives with near-perfect accuracy	Dramatic efficiency gain

Comparison with Baselines

graph LR
    subgraph "Accuracy Comparison"
        A[Single-Stage RF<br/>~91%]
        B[Logistic Regression<br/>~87%]
        C[SVM<br/>~89%]
        D[Astrophage<br/>94.81%]
    end

    style D fill:#2ecc71,stroke:#27ae60,color:#fff
    style A fill:#95a5a6,stroke:#7f8c8d,color:#fff
    style B fill:#95a5a6,stroke:#7f8c8d,color:#fff
    style C fill:#95a5a6,stroke:#7f8c8d,color:#fff

Astrophage’s two-stage architecture provides a 3-4% accuracy improvement over single-stage approaches, which is significant in the context of exoplanet discovery where each percentage point represents hundreds of potential planets.

Keyboard shortcuts

Astrophage Documentation