ssl_framework.strategies#
This module contains strategy classes for label selection and integration in the SSL framework. These provide modular, swappable components for customizing semi-supervised learning behavior.
Selection Strategies#
Selection strategies determine which unlabeled samples to pseudo-label.
ConfidenceThreshold#
- class ssl_framework.strategies.ConfidenceThreshold(threshold=0.95)[source]#
Bases: object
Label selection strategy based on confidence threshold.
Selects unlabeled samples where the maximum predicted probability exceeds a specified threshold.
Example:
from sklearn.linear_model import LogisticRegression
from ssl_framework.main import SelfTrainingClassifier
from ssl_framework.strategies import ConfidenceThreshold

# Select samples with >95% confidence
strategy = ConfidenceThreshold(threshold=0.95)

# Use with SelfTrainingClassifier
ssl_clf = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    selection_strategy=strategy
)
- __init__(threshold=0.95)[source]#
Initialize the confidence threshold strategy.
- Parameters:
threshold (float, default=0.95) – Confidence threshold for selecting pseudo-labels. Samples with max probability > threshold will be selected.
- select_labels(X_unlabeled, y_proba)[source]#
Select samples based on confidence threshold.
- Parameters:
X_unlabeled (ndarray of shape (n_unlabeled_samples, n_features)) – Unlabeled feature data.
y_proba (ndarray of shape (n_unlabeled_samples, n_classes)) – Predicted class probabilities for unlabeled samples.
- Returns:
X_new_labeled (ndarray) – Feature data for newly selected samples.
y_new_labels (ndarray) – Predicted labels for newly selected samples.
indices_to_remove (ndarray) – Indices of samples to remove from unlabeled set.
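The selection rule described above can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the library's source: the function name `select_by_confidence` is hypothetical, and the sketch assumes that pseudo-labels are the column indices of the most probable class.

```python
import numpy as np


def select_by_confidence(X_unlabeled, y_proba, threshold=0.95):
    """Hypothetical stand-in for ConfidenceThreshold.select_labels."""
    # A sample's confidence is its largest predicted class probability.
    confidence = y_proba.max(axis=1)
    # Strict inequality, matching "max probability > threshold".
    indices_to_remove = np.flatnonzero(confidence > threshold)
    X_new_labeled = X_unlabeled[indices_to_remove]
    # Assumption: the pseudo-label is the argmax column index.
    y_new_labels = y_proba[indices_to_remove].argmax(axis=1)
    return X_new_labeled, y_new_labels, indices_to_remove


X = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
proba = np.array([[0.99, 0.01], [0.60, 0.40], [0.02, 0.98]])
X_new, y_new, idx = select_by_confidence(X, proba, threshold=0.95)
print(idx, y_new)
```

With these inputs only the first and third samples clear the threshold. If no sample does, all three returned arrays are empty, so callers may want to treat that as a stopping condition.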
TopKFixedCount#
- class ssl_framework.strategies.TopKFixedCount(k=10)[source]#
Bases: object
Label selection strategy that selects top K most confident samples.
This strategy always selects exactly K samples with the highest maximum predicted probabilities, regardless of confidence threshold.
Example:
from sklearn.linear_model import LogisticRegression
from ssl_framework.main import SelfTrainingClassifier
from ssl_framework.strategies import TopKFixedCount

# Always select top 20 most confident samples
strategy = TopKFixedCount(k=20)

# Use with SelfTrainingClassifier
ssl_clf = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    selection_strategy=strategy
)
- __init__(k=10)[source]#
Initialize the top-K strategy.
- Parameters:
k (int, default=10) – Number of samples to select in each iteration.
- select_labels(X_unlabeled, y_proba)[source]#
Select the K most confident samples.
- Parameters:
X_unlabeled (ndarray of shape (n_unlabeled_samples, n_features)) – Unlabeled feature data.
y_proba (ndarray of shape (n_unlabeled_samples, n_classes)) – Predicted class probabilities for unlabeled samples.
- Returns:
X_new_labeled (ndarray) – Feature data for newly selected samples.
y_new_labels (ndarray) – Predicted labels for newly selected samples.
indices_to_remove (ndarray) – Indices of samples to remove from unlabeled set.
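A minimal sketch of top-K selection follows, again as an assumption-laden illustration rather than the library's code. The name `select_top_k` is hypothetical. The documentation does not say what happens when fewer than K unlabeled samples remain; capping K at the pool size is an assumption made here.

```python
import numpy as np


def select_top_k(X_unlabeled, y_proba, k=10):
    """Hypothetical stand-in for TopKFixedCount.select_labels."""
    confidence = y_proba.max(axis=1)
    # Assumption: never request more samples than the pool contains.
    k = min(k, confidence.shape[0])
    # Stable sort on negated confidence gives descending order,
    # with ties broken by original position.
    indices_to_remove = np.argsort(-confidence, kind="stable")[:k]
    X_new_labeled = X_unlabeled[indices_to_remove]
    y_new_labels = y_proba[indices_to_remove].argmax(axis=1)
    return X_new_labeled, y_new_labels, indices_to_remove


X = np.arange(8, dtype=float).reshape(4, 2)
proba = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.3, 0.7]])
X_new, y_new, idx = select_top_k(X, proba, k=2)
print(idx, y_new)
```

Unlike a threshold, this rule selects samples even when the model is unsure about all of them, which is why the low-confidence second sample would be picked if `k` were raised to 4.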
Integration Strategies#
Integration strategies determine how pseudo-labeled samples are merged into the training set.
AppendAndGrow#
- class ssl_framework.strategies.AppendAndGrow[source]#
Bases: object
Label integration strategy that appends new labels to existing set.
This strategy grows the labeled dataset monotonically by appending newly pseudo-labeled samples to the current labeled set.
Example:
from sklearn.linear_model import LogisticRegression
from ssl_framework.main import SelfTrainingClassifier
from ssl_framework.strategies import AppendAndGrow

# Simply append new samples to labeled set
strategy = AppendAndGrow()
ssl_clf = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    integration_strategy=strategy
)
- integrate_labels(X_labeled, y_labeled, X_new_labeled, y_new_labels, **kwargs)[source]#
Integrate new pseudo-labeled samples by appending them.
- Parameters:
X_labeled (ndarray) – Current labeled feature data.
y_labeled (ndarray) – Current labeled targets.
X_new_labeled (ndarray) – New pseudo-labeled feature data.
y_new_labels (ndarray) – New pseudo-labels.
**kwargs – Additional parameters (ignored).
- Returns:
X_labeled_next (ndarray) – Updated labeled feature data.
y_labeled_next (ndarray) – Updated labeled targets.
sample_weights_next (None) – Sample weights (None for this strategy).
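The append behavior amounts to a row-wise concatenation. The sketch below uses a hypothetical function name, `append_and_grow`, and assumes 2-D feature arrays and 1-D label arrays; it is not taken from the library source.

```python
import numpy as np


def append_and_grow(X_labeled, y_labeled, X_new_labeled, y_new_labels, **kwargs):
    """Hypothetical stand-in for AppendAndGrow.integrate_labels."""
    # Stack new rows under the existing ones; extra kwargs are ignored.
    X_labeled_next = np.vstack([X_labeled, X_new_labeled])
    y_labeled_next = np.concatenate([y_labeled, y_new_labels])
    # No weighting for this strategy.
    return X_labeled_next, y_labeled_next, None


X_lab = np.array([[0.0, 0.0], [1.0, 1.0]])
y_lab = np.array([0, 1])
X_new = np.array([[0.9, 1.1]])
y_new = np.array([1])
X_next, y_next, weights = append_and_grow(X_lab, y_lab, X_new, y_new)
print(X_next.shape, y_next, weights)
```

Because nothing is ever removed, a wrong pseudo-label added in an early iteration stays in the training set for all later iterations.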
FullReLabeling#
- class ssl_framework.strategies.FullReLabeling(X_original, y_original)[source]#
Bases: object
Label integration strategy that re-labels the entire dataset each iteration.
Instead of growing the labeled set monotonically, this strategy discards the current labeled set and rebuilds it each iteration from the original labeled data plus all newly pseudo-labeled samples.
Example:
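from sklearn.linear_model import LogisticRegression
from ssl_framework.main import SelfTrainingClassifier
from ssl_framework.strategies import FullReLabeling

# Re-label from scratch each iteration
# (X_original, y_original are the initial labeled arrays)
strategy = FullReLabeling(X_original, y_original)
ssl_clf = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    integration_strategy=strategy
)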
- __init__(X_original, y_original)[source]#
Initialize the full re-labeling strategy.
- Parameters:
X_original (ndarray) – Original labeled feature data.
y_original (ndarray) – Original labeled targets.
- integrate_labels(X_labeled, y_labeled, X_new_labeled, y_new_labels, **kwargs)[source]#
Integrate labels by concatenating the new pseudo-labeled samples with the original labeled data only.
- Parameters:
X_labeled (ndarray) – Current labeled feature data (ignored).
y_labeled (ndarray) – Current labeled targets (ignored).
X_new_labeled (ndarray) – New pseudo-labeled feature data.
y_new_labels (ndarray) – New pseudo-labels.
**kwargs – Additional parameters (ignored).
- Returns:
X_labeled_next (ndarray) – Original data concatenated with new pseudo-labeled data.
y_labeled_next (ndarray) – Original labels concatenated with new pseudo-labels.
sample_weights_next (None) – Sample weights (None for this strategy).
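To make the contrast with appending concrete, here is a small sketch in which the current labeled set is passed in but deliberately ignored. The class name `FullReLabelingSketch` and its attribute names are hypothetical; only the documented inputs and outputs are taken from the reference above.

```python
import numpy as np


class FullReLabelingSketch:
    """Hypothetical stand-in for FullReLabeling."""

    def __init__(self, X_original, y_original):
        # Keep the trusted, human-labeled data for reuse every iteration.
        self.X_original = np.asarray(X_original)
        self.y_original = np.asarray(y_original)

    def integrate_labels(self, X_labeled, y_labeled, X_new_labeled, y_new_labels, **kwargs):
        # X_labeled and y_labeled are ignored: rebuild from the originals.
        X_next = np.vstack([self.X_original, X_new_labeled])
        y_next = np.concatenate([self.y_original, y_new_labels])
        return X_next, y_next, None


X_orig = np.array([[0.0, 0.0], [1.0, 1.0]])
y_orig = np.array([0, 1])
strategy = FullReLabelingSketch(X_orig, y_orig)

# A labeled set that has already grown to five rows in earlier iterations.
X_grown = np.vstack([X_orig, np.ones((3, 2))])
y_grown = np.array([0, 1, 1, 1, 1])
X_next, y_next, weights = strategy.integrate_labels(
    X_grown, y_grown, np.array([[0.1, 0.2]]), np.array([0])
)
print(X_next.shape, y_next)
```

The result has three rows, not six: the three pseudo-labeled rows accumulated earlier are dropped, so earlier mistakes do not persist unless the selection step picks them again.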
ConfidenceWeighting#
- class ssl_framework.strategies.ConfidenceWeighting[source]#
Bases: object
Label integration strategy that weights samples by their confidence.
Newly pseudo-labeled samples are assigned weights proportional to their confidence, while original labeled samples maintain weight 1.0.
Example:
from sklearn.linear_model import LogisticRegression
from ssl_framework.main import SelfTrainingClassifier
from ssl_framework.strategies import ConfidenceWeighting

# Weight samples by their confidence
strategy = ConfidenceWeighting()
ssl_clf = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    integration_strategy=strategy
)
- integrate_labels(X_labeled, y_labeled, X_new_labeled, y_new_labels, y_proba=None, indices=None)[source]#
Integrate labels with confidence-based weighting.
- Parameters:
X_labeled (ndarray) – Current labeled feature data.
y_labeled (ndarray) – Current labeled targets.
X_new_labeled (ndarray) – New pseudo-labeled feature data.
y_new_labels (ndarray) – New pseudo-labels.
y_proba (ndarray, optional) – Predicted probabilities for all unlabeled samples.
indices (ndarray, optional) – Indices of selected samples in y_proba.
- Returns:
X_labeled_next (ndarray) – Updated labeled feature data.
y_labeled_next (ndarray) – Updated labeled targets.
sample_weights_next (ndarray) – Sample weights with confidence-based weighting.
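One way the weighting could be computed is sketched below for a single call. Several points are assumptions, not documented behavior: the function name `integrate_with_confidence` is hypothetical, the weight is taken to be the maximum predicted probability itself, every row already in `X_labeled` is given weight 1.0, and new samples fall back to weight 1.0 when `y_proba` or `indices` is missing. A real implementation would presumably also carry forward the weights of pseudo-labels added in earlier iterations.

```python
import numpy as np


def integrate_with_confidence(X_labeled, y_labeled, X_new_labeled, y_new_labels,
                              y_proba=None, indices=None):
    """Hypothetical single-call sketch of confidence-based weighting."""
    X_next = np.vstack([X_labeled, X_new_labeled])
    y_next = np.concatenate([y_labeled, y_new_labels])
    if y_proba is not None and indices is not None:
        # Confidence of each selected sample = its largest class probability.
        new_weights = y_proba[indices].max(axis=1)
    else:
        # Assumption: without probabilities, treat new samples as fully trusted.
        new_weights = np.ones(len(y_new_labels))
    # Simplification: all rows already in the labeled set get weight 1.0.
    weights = np.concatenate([np.ones(len(y_labeled)), new_weights])
    return X_next, y_next, weights


X_lab = np.array([[0.0, 0.0], [1.0, 1.0]])
y_lab = np.array([0, 1])
X_unl = np.array([[0.1, 0.0], [0.5, 0.5], [0.9, 1.0]])
proba = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
idx = np.array([0, 2])
X_next, y_next, w = integrate_with_confidence(
    X_lab, y_lab, X_unl[idx], proba[idx].argmax(axis=1), y_proba=proba, indices=idx
)
print(w)
```

The returned weights line up row for row with `X_labeled_next`, so they can be passed to any estimator whose `fit` method accepts a `sample_weight` argument.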
Strategy Combinations#
Mix and match strategies for different behaviors:
Conservative SSL#
High confidence threshold with simple append strategy:
from ssl_framework.strategies import ConfidenceThreshold, AppendAndGrow

ssl_conservative = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    selection_strategy=ConfidenceThreshold(threshold=0.98),
    integration_strategy=AppendAndGrow(),
    max_iter=10
)
Aggressive SSL#
Fixed count selection with confidence weighting:
from ssl_framework.strategies import TopKFixedCount, ConfidenceWeighting

ssl_aggressive = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    selection_strategy=TopKFixedCount(k=50),
    integration_strategy=ConfidenceWeighting(),
    max_iter=5
)
Experimental SSL#
Full re-labeling approach (can be computationally expensive):
from ssl_framework.strategies import TopKFixedCount, FullReLabeling

ssl_experimental = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    selection_strategy=TopKFixedCount(k=10),
    integration_strategy=FullReLabeling(X_original, y_original),
    max_iter=3
)
Custom Strategy Implementation#
To implement your own strategies, follow these interfaces:
Selection Strategy Interface#
class CustomSelectionStrategy:
    def select_labels(self, X_unlabeled, y_proba):
        """Select samples for pseudo-labeling.

        Parameters
        ----------
        X_unlabeled : ndarray of shape (n_unlabeled_samples, n_features)
            Unlabeled feature data.
        y_proba : ndarray of shape (n_unlabeled_samples, n_classes)
            Predicted class probabilities for unlabeled samples.

        Returns
        -------
        X_new_labeled : ndarray
            Feature data for newly selected samples.
        y_new_labels : ndarray
            Predicted labels for newly selected samples.
        indices_to_remove : ndarray
            Indices of samples to remove from unlabeled set.
        """
        # Your selection logic here
        pass
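As one possible way to fill in the selection interface, the sketch below selects samples by the margin between the two most probable classes instead of by the top probability alone. The class `MarginSelection` and its `min_margin` parameter are hypothetical examples, not part of `ssl_framework`, and the code assumes at least two classes.

```python
import numpy as np


class MarginSelection:
    """Hypothetical custom strategy: keep samples with a wide top-two margin."""

    def __init__(self, min_margin=0.5):
        self.min_margin = min_margin

    def select_labels(self, X_unlabeled, y_proba):
        # Sort each row ascending; the last two columns are the top two classes.
        sorted_proba = np.sort(y_proba, axis=1)
        margin = sorted_proba[:, -1] - sorted_proba[:, -2]
        indices_to_remove = np.flatnonzero(margin > self.min_margin)
        X_new_labeled = X_unlabeled[indices_to_remove]
        y_new_labels = y_proba[indices_to_remove].argmax(axis=1)
        return X_new_labeled, y_new_labels, indices_to_remove


X = np.arange(6, dtype=float).reshape(3, 2)
proba = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.10, 0.80],
])
X_new, y_new, idx = MarginSelection(min_margin=0.5).select_labels(X, proba)
print(idx, y_new)
```

Since the class exposes `select_labels` with the documented arguments and return values, it should be usable wherever a `selection_strategy` is expected, provided the framework relies only on that method.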
Integration Strategy Interface#
class CustomIntegrationStrategy:
    def integrate_labels(self, X_labeled, y_labeled, X_new_labeled, y_new_labels, **kwargs):
        """Integrate new pseudo-labeled samples.

        Parameters
        ----------
        X_labeled : ndarray
            Current labeled feature data.
        y_labeled : ndarray
            Current labeled targets.
        X_new_labeled : ndarray
            New pseudo-labeled feature data.
        y_new_labels : ndarray
            New pseudo-labels.
        **kwargs
            Additional parameters (y_proba, indices, etc.).

        Returns
        -------
        X_labeled_next : ndarray
            Updated labeled feature data.
        y_labeled_next : ndarray
            Updated labeled targets.
        sample_weights_next : ndarray or None
            Sample weights (None if not using weighting).
        """
        # Your integration logic here
        pass
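A hedged example of a custom integration strategy: append new samples as usual, but give every pseudo-labeled row a fixed, discounted weight. The class `DiscountedAppend` and its `pseudo_weight` parameter are hypothetical. The sketch assumes that the first call receives only the original labeled data and that later calls receive the set this strategy returned, so the original rows always come first.

```python
import numpy as np


class DiscountedAppend:
    """Hypothetical custom strategy: append, down-weighting all pseudo-labels."""

    def __init__(self, pseudo_weight=0.5):
        self.pseudo_weight = pseudo_weight
        self._n_original = None  # size of the original labeled set

    def integrate_labels(self, X_labeled, y_labeled, X_new_labeled, y_new_labels, **kwargs):
        if self._n_original is None:
            # Assumption: the first call sees only human-labeled data.
            self._n_original = len(y_labeled)
        X_next = np.vstack([X_labeled, X_new_labeled])
        y_next = np.concatenate([y_labeled, y_new_labels])
        # Original rows keep weight 1.0; every pseudo-labeled row is discounted.
        weights = np.full(len(y_next), self.pseudo_weight)
        weights[: self._n_original] = 1.0
        return X_next, y_next, weights


strategy = DiscountedAppend(pseudo_weight=0.5)
X_lab = np.array([[0.0, 0.0], [1.0, 1.0]])
y_lab = np.array([0, 1])

# First iteration adds one pseudo-labeled sample, second adds another.
X1, y1, w1 = strategy.integrate_labels(X_lab, y_lab, np.array([[0.9, 1.0]]), np.array([1]))
X2, y2, w2 = strategy.integrate_labels(X1, y1, np.array([[0.1, 0.0]]), np.array([0]))
print(w1, w2)
```

Tracking the original set size inside the strategy keeps earlier pseudo-labels discounted across iterations, at the cost of making the object stateful, so a fresh instance is needed for each training run.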