validmind.ClassifierThresholdOptimization

ClassifierThresholdOptimization

@tags('model_validation', 'threshold_optimization', 'classification_metrics')

@tasks('classification')

def ClassifierThresholdOptimization(dataset: validmind.vm_models.VMDataset, model: validmind.vm_models.VMModel, methods: Optional[List[str]] = None, target_recall: Optional[float] = None) → Dict[str, Union[pd.DataFrame, go.Figure]]:

Analyzes and visualizes different threshold optimization methods for binary classification models.

Purpose

The Classifier Threshold Optimization test identifies optimal decision thresholds using various methods to balance different performance metrics. This helps adapt the model's decision boundary to specific business requirements, such as minimizing false positives in fraud detection or achieving target recall in medical diagnosis.

Test Mechanism

The test implements multiple threshold optimization methods:

  1. Youden's J statistic (maximizing sensitivity + specificity - 1)
  2. F1-score optimization (balancing precision and recall)
  3. Precision-Recall equality point
  4. Target recall achievement
  5. Naive (0.5) threshold

For each method, it computes ROC and PR curves, identifies optimal points, and provides comprehensive performance metrics at each threshold.
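The first two methods in the list can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the helper names (`confusion_counts`, `youden_j`, `f1_at`, `best_threshold`) and the toy data are assumptions made for illustration, and the candidate thresholds are simply the distinct predicted probabilities.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn), predicting positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_j(y_true, y_prob, threshold):
    """Youden's J statistic: sensitivity + specificity - 1."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity + specificity - 1.0


def f1_at(y_true, y_prob, threshold):
    """F1 score of the positive class at the given threshold."""
    tp, fp, _, fn = confusion_counts(y_true, y_prob, threshold)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(y_true, y_prob, score_fn):
    """Pick the candidate threshold that maximizes score_fn.

    Ties resolve to the lowest threshold (an arbitrary choice in this sketch).
    """
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: score_fn(y_true, y_prob, t))


# Hypothetical toy data: 3 positives, 7 negatives.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.7, 0.9]

youden_threshold = best_threshold(y_true, y_prob, youden_j)
f1_threshold = best_threshold(y_true, y_prob, f1_at)
print(youden_threshold, f1_threshold)
```

On this toy data both criteria select 0.55, slightly above the naive 0.5 threshold. On real data the two criteria can disagree, which is why the test reports them side by side.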

Signs of High Risk

  • Large discrepancies between different optimization methods
  • Optimal thresholds far from the default 0.5
  • Poor performance metrics across all thresholds
  • Significant gap between achieved and target recall
  • Unstable thresholds across different methods
  • Extreme trade-offs between precision and recall
  • Threshold optimization showing minimal impact
  • Business metrics not improving with optimization

Strengths

  • Multiple optimization strategies for different needs
  • Visual and numerical results for comparison
  • Support for business-driven optimization (target recall)
  • Comprehensive performance metrics at each threshold
  • Integration with ROC and PR curves
  • Handles class imbalance through various metrics
  • Enables informed threshold selection
  • Supports cost-sensitive decision making

Limitations

  • Assumes the costs of false positives and false negatives are known
  • May need adjustment for highly imbalanced datasets
  • Threshold might not be stable across different samples
  • Cannot handle multi-class problems directly
  • Optimization methods may conflict with business needs
  • Requires sufficient validation data
  • May not capture temporal changes in optimal threshold
  • Single threshold may not be optimal for all subgroups

Arguments

  • dataset: VMDataset containing features and target
  • model: VMModel containing predictions
  • methods: List of methods to compare (default: ['youden', 'f1', 'precision_recall'])
  • target_recall: Target recall value if using 'target_recall' method

Returns

  • Dictionary containing:
    • table: DataFrame comparing different threshold optimization methods (using weighted averages for precision, recall, and f1)
    • figure: Plotly figure showing ROC and PR curves with optimal thresholds
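The "weighted averages" mentioned for the table can be read as support-weighted averages over the classes: each class's precision, recall, and F1 is weighted by that class's share of the true labels. The sketch below illustrates that reading under stated assumptions. The function name `weighted_metrics` and the toy labels are hypothetical, and the library's own computation may differ in detail.

```python
def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, and F1 across all true classes."""
    total = len(y_true)
    weighted = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for cls in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        support = tp + fn  # number of true samples of this class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / support if support else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        weight = support / total
        weighted["precision"] += weight * precision
        weighted["recall"] += weight * recall
        weighted["f1"] += weight * f1
    return weighted


# Hypothetical labels and the predictions obtained at a 0.55 threshold.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

Because the weights follow class support, the weighted recall equals overall accuracy, so on imbalanced data these figures are dominated by the majority class.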

find_optimal_threshold

def find_optimal_threshold(y_true: np.ndarray, y_prob: np.ndarray, method: str = 'youden', target_recall: Optional[float] = None) → Dict[str, Union[str, float]]:

Find the optimal classification threshold using various methods.

Arguments

  • y_true: True binary labels
  • y_prob: Predicted probabilities
  • method: Method to use for finding optimal threshold
  • target_recall: Required if method='target_recall'

Returns

  • Dictionary containing threshold and metrics
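To make the `method` and `target_recall` arguments concrete, here is a hedged, standard-library sketch of a function with a similar shape. It is not `find_optimal_threshold` itself: the name `sketch_find_threshold`, the returned keys, and the selection rules (highest threshold that still meets the target recall, and the candidate where precision and recall are closest) are assumptions for illustration, and the library may choose differently.

```python
from typing import Dict, Optional, Sequence, Union


def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall of the positive class at the given threshold."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    fn = sum(1 for y, p in pairs if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def sketch_find_threshold(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    method: str = "naive",
    target_recall: Optional[float] = None,
) -> Dict[str, Union[str, float]]:
    """Illustrative threshold selection; not the ValidMind implementation."""
    candidates = sorted(set(y_prob))

    if method == "naive":
        threshold = 0.5
    elif method == "precision_recall":
        # Candidate where precision and recall are closest to equal.
        def gap(t):
            precision, recall = precision_recall_at(y_true, y_prob, t)
            return abs(precision - recall)

        threshold = min(candidates, key=gap)
    elif method == "target_recall":
        if target_recall is None:
            raise ValueError("target_recall is required for method='target_recall'")
        # Highest candidate whose recall still meets the target.
        meeting = [
            t for t in candidates
            if precision_recall_at(y_true, y_prob, t)[1] >= target_recall
        ]
        if not meeting:
            raise ValueError("no threshold reaches the requested recall")
        threshold = max(meeting)
    else:
        raise ValueError(f"unsupported method in this sketch: {method!r}")

    precision, recall = precision_recall_at(y_true, y_prob, threshold)
    return {
        "method": method,
        "threshold": threshold,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical toy data: 3 positives, 7 negatives.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.7, 0.9]

naive = sketch_find_threshold(y_true, y_prob, "naive")
balanced = sketch_find_threshold(y_true, y_prob, "precision_recall")
recall_60 = sketch_find_threshold(y_true, y_prob, "target_recall", target_recall=0.6)
print(naive, balanced, recall_60, sep="\n")
```

Raising an error when `target_recall` is missing mirrors the documented requirement that the argument is needed for method='target_recall'. How the real function reports that condition is not stated in this reference.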

© Copyright 2025 ValidMind Inc. All Rights Reserved.
