ssl_framework.strategies#

This module contains strategy classes for label selection and integration in the SSL framework. These provide modular, swappable components for customizing semi-supervised learning behavior.

Selection Strategies#

Selection strategies determine which unlabeled samples to pseudo-label.

ConfidenceThreshold#

class ssl_framework.strategies.ConfidenceThreshold(threshold=0.95)[source]#

Bases: object

Label selection strategy based on a confidence threshold.

Selects unlabeled samples where the maximum predicted probability exceeds a specified threshold.

Example:

# Assumes scikit-learn's LogisticRegression as the base model
from sklearn.linear_model import LogisticRegression

from ssl_framework.main import SelfTrainingClassifier
from ssl_framework.strategies import ConfidenceThreshold

# Select samples whose top predicted probability exceeds 0.95
strategy = ConfidenceThreshold(threshold=0.95)

# Use with SelfTrainingClassifier
ssl_clf = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    selection_strategy=strategy
)

__init__(threshold=0.95)[source]#

Initialize the confidence threshold strategy.

Parameters:

threshold (float, default=0.95) – Confidence threshold for selecting pseudo-labels. Samples with max probability > threshold will be selected.

select_labels(X_unlabeled, y_proba)[source]#

Select samples based on confidence threshold.

Parameters:
  • X_unlabeled (ndarray of shape (n_unlabeled_samples, n_features)) – Unlabeled feature data.

  • y_proba (ndarray of shape (n_unlabeled_samples, n_classes)) – Predicted class probabilities for unlabeled samples.

Returns:

  • X_new_labeled (ndarray) – Feature data for newly selected samples.

  • y_new_labels (ndarray) – Predicted labels for newly selected samples.

  • indices_to_remove (ndarray) – Indices of samples to remove from unlabeled set.
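
The selection rule described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `select_by_confidence` is hypothetical, and the pseudo-labels are taken to be the column index of the most probable class (the real class may map these indices back to the base model's class labels).

```python
import numpy as np

def select_by_confidence(X_unlabeled, y_proba, threshold=0.95):
    """Hypothetical stand-in for ConfidenceThreshold.select_labels."""
    # Highest predicted class probability for each unlabeled sample
    confidence = y_proba.max(axis=1)
    # Strict ">" comparison, matching the documented rule
    indices_to_remove = np.flatnonzero(confidence > threshold)
    # Assumption: the pseudo-label is the column index of the top class
    y_new_labels = y_proba[indices_to_remove].argmax(axis=1)
    X_new_labeled = X_unlabeled[indices_to_remove]
    return X_new_labeled, y_new_labels, indices_to_remove

X_unlabeled = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
y_proba = np.array([[0.97, 0.03], [0.40, 0.60], [0.02, 0.98]])
X_new, y_new, idx = select_by_confidence(X_unlabeled, y_proba, threshold=0.95)
print(idx, y_new)
```

Only the first and third rows clear the 0.95 threshold here, so the second row stays in the unlabeled pool. If no sample clears the threshold, all three returned arrays are empty.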

TopKFixedCount#

class ssl_framework.strategies.TopKFixedCount(k=10)[source]#

Bases: object

Label selection strategy that selects the top K most confident samples.

This strategy always selects exactly K samples with the highest maximum predicted probabilities, regardless of how confident those predictions are in absolute terms.

Example:

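# Assumes scikit-learn's LogisticRegression as the base model
from sklearn.linear_model import LogisticRegression

from ssl_framework.main import SelfTrainingClassifier
from ssl_framework.strategies import TopKFixedCount

# Always select the 20 most confident samples
strategy = TopKFixedCount(k=20)

# Use with SelfTrainingClassifier
ssl_clf = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    selection_strategy=strategy
)

__init__(k=10)[source]#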

Initialize the top-K strategy.

Parameters:

k (int, default=10) – Number of samples to select in each iteration.

select_labels(X_unlabeled, y_proba)[source]#

Select the K most confident samples.

Parameters:
  • X_unlabeled (ndarray of shape (n_unlabeled_samples, n_features)) – Unlabeled feature data.

  • y_proba (ndarray of shape (n_unlabeled_samples, n_classes)) – Predicted class probabilities for unlabeled samples.

Returns:

  • X_new_labeled (ndarray) – Feature data for newly selected samples.

  • y_new_labels (ndarray) – Predicted labels for newly selected samples.

  • indices_to_remove (ndarray) – Indices of samples to remove from unlabeled set.
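
A minimal NumPy sketch of top-K selection follows. The function name `select_top_k` is hypothetical. The sketch also assumes that `k` is capped at the number of remaining unlabeled samples and that pseudo-labels are column indices of the most probable class; neither detail is confirmed by the documentation above.

```python
import numpy as np

def select_top_k(X_unlabeled, y_proba, k=10):
    """Hypothetical stand-in for TopKFixedCount.select_labels."""
    confidence = y_proba.max(axis=1)
    # Assumption: never ask for more samples than are available
    k = min(k, len(confidence))
    # Sort by descending confidence; a stable sort keeps ties in input order
    indices_to_remove = np.argsort(-confidence, kind="stable")[:k]
    # Assumption: the pseudo-label is the column index of the top class
    y_new_labels = y_proba[indices_to_remove].argmax(axis=1)
    return X_unlabeled[indices_to_remove], y_new_labels, indices_to_remove

X_unlabeled = np.arange(8, dtype=float).reshape(4, 2)
y_proba = np.array([[0.60, 0.40], [0.10, 0.90], [0.80, 0.20], [0.55, 0.45]])
X_new, y_new, idx = select_top_k(X_unlabeled, y_proba, k=2)
print(idx, y_new)
```

Unlike a threshold rule, this picks two samples even though neither exceeds 0.95, which is why a fixed count adds data faster but admits less certain pseudo-labels.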

Integration Strategies#

Integration strategies determine how to integrate pseudo-labeled samples into the training set.

AppendAndGrow#

class ssl_framework.strategies.AppendAndGrow[source]#

Bases: object

Label integration strategy that appends newly pseudo-labeled samples to the existing labeled set.

This strategy grows the labeled dataset monotonically by appending newly pseudo-labeled samples to the current labeled set.

Example:

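# Assumes scikit-learn's LogisticRegression as the base model
from sklearn.linear_model import LogisticRegression

from ssl_framework.main import SelfTrainingClassifier
from ssl_framework.strategies import AppendAndGrow

# Simply append new samples to the labeled set
strategy = AppendAndGrow()

ssl_clf = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    integration_strategy=strategy
)

__init__()[source]#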

Initialize the append-and-grow strategy.

integrate_labels(X_labeled, y_labeled, X_new_labeled, y_new_labels, **kwargs)[source]#

Integrate new pseudo-labeled samples by appending them.

Parameters:
  • X_labeled (ndarray) – Current labeled feature data.

  • y_labeled (ndarray) – Current labeled targets.

  • X_new_labeled (ndarray) – New pseudo-labeled feature data.

  • y_new_labels (ndarray) – New pseudo-labels.

  • **kwargs – Additional parameters (ignored).

Returns:

  • X_labeled_next (ndarray) – Updated labeled feature data.

  • y_labeled_next (ndarray) – Updated labeled targets.

  • sample_weights_next (None) – Sample weights (None for this strategy).
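
The append step amounts to two concatenations. The sketch below shows the documented contract (updated features, updated targets, and `None` for the weights); the function name `append_and_grow` is hypothetical and the body is an assumption about how such a step could be written, not the library's code.

```python
import numpy as np

def append_and_grow(X_labeled, y_labeled, X_new_labeled, y_new_labels, **kwargs):
    """Hypothetical stand-in for AppendAndGrow.integrate_labels."""
    # Stack feature rows and extend the target vector; extra kwargs are ignored
    X_labeled_next = np.vstack([X_labeled, X_new_labeled])
    y_labeled_next = np.concatenate([y_labeled, y_new_labels])
    return X_labeled_next, y_labeled_next, None

X_labeled = np.array([[0.0, 0.0], [1.0, 1.0]])
y_labeled = np.array([0, 1])
X_new_labeled = np.array([[0.9, 0.8]])
y_new_labels = np.array([1])

X_next, y_next, weights = append_and_grow(
    X_labeled, y_labeled, X_new_labeled, y_new_labels
)
print(X_next.shape, y_next, weights)
```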

FullReLabeling#

class ssl_framework.strategies.FullReLabeling(X_original, y_original)[source]#

Bases: object

Label integration strategy that re-labels the entire dataset each iteration.

Instead of growing the labeled set monotonically, this strategy discards pseudo-labels from earlier iterations and rebuilds the training set from the original labeled data plus the samples pseudo-labeled in the current iteration.

Example:

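# Assumes scikit-learn's LogisticRegression as the base model
from sklearn.linear_model import LogisticRegression

from ssl_framework.main import SelfTrainingClassifier
from ssl_framework.strategies import FullReLabeling

# Re-label from scratch each iteration.
# X_original, y_original are the initial labeled arrays, defined by the caller.
strategy = FullReLabeling(X_original, y_original)

ssl_clf = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    integration_strategy=strategy
)

__init__(X_original, y_original)[source]#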

Initialize the full re-labeling strategy.

Parameters:
  • X_original (ndarray) – Original labeled feature data.

  • y_original (ndarray) – Original labeled targets.

integrate_labels(X_labeled, y_labeled, X_new_labeled, y_new_labels, **kwargs)[source]#

Integrate new pseudo-labeled samples by concatenating them with the original labeled data only; the current labeled set is ignored.

Parameters:
  • X_labeled (ndarray) – Current labeled feature data (ignored).

  • y_labeled (ndarray) – Current labeled targets (ignored).

  • X_new_labeled (ndarray) – New pseudo-labeled feature data.

  • y_new_labels (ndarray) – New pseudo-labels.

  • **kwargs – Additional parameters (ignored).

Returns:

  • X_labeled_next (ndarray) – Original data concatenated with new pseudo-labeled data.

  • y_labeled_next (ndarray) – Original labels concatenated with new pseudo-labels.

  • sample_weights_next (None) – Sample weights (None for this strategy).
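
To make the difference from appending concrete, here is a small sketch of the behavior described above. The class name `FullReLabelingSketch` is hypothetical and the body is an assumption consistent with the documented parameters (`X_labeled` and `y_labeled` ignored), not the library's source.

```python
import numpy as np

class FullReLabelingSketch:
    """Hypothetical stand-in for FullReLabeling."""

    def __init__(self, X_original, y_original):
        # Keep a reference copy of the original labeled data
        self.X_original = np.asarray(X_original)
        self.y_original = np.asarray(y_original)

    def integrate_labels(self, X_labeled, y_labeled, X_new_labeled, y_new_labels, **kwargs):
        # X_labeled / y_labeled are deliberately unused: each call starts
        # again from the original data
        X_next = np.vstack([self.X_original, X_new_labeled])
        y_next = np.concatenate([self.y_original, y_new_labels])
        return X_next, y_next, None

X_original = np.array([[0.0, 0.0], [1.0, 1.0]])
y_original = np.array([0, 1])
strategy = FullReLabelingSketch(X_original, y_original)

# Iteration 1: two pseudo-labeled samples
X1, y1, _ = strategy.integrate_labels(
    X_original, y_original, np.array([[0.1, 0.2], [0.9, 0.7]]), np.array([0, 1])
)
# Iteration 2: one pseudo-labeled sample; iteration 1's pseudo-labels are dropped
X2, y2, _ = strategy.integrate_labels(X1, y1, np.array([[0.2, 0.1]]), np.array([0]))
print(len(y1), len(y2))
```

The training set shrinks from four samples to three between the two calls, so a wrong pseudo-label from one iteration does not persist into the next.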

ConfidenceWeighting#

class ssl_framework.strategies.ConfidenceWeighting[source]#

Bases: object

Label integration strategy that weights samples by their confidence.

Newly pseudo-labeled samples are assigned weights proportional to their confidence, while original labeled samples maintain weight 1.0.

Example:

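# Assumes scikit-learn's LogisticRegression as the base model
from sklearn.linear_model import LogisticRegression

from ssl_framework.main import SelfTrainingClassifier
from ssl_framework.strategies import ConfidenceWeighting

# Weight samples by their confidence
strategy = ConfidenceWeighting()

ssl_clf = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    integration_strategy=strategy
)

__init__()[source]#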

Initialize the confidence weighting strategy.

integrate_labels(X_labeled, y_labeled, X_new_labeled, y_new_labels, y_proba=None, indices=None)[source]#

Integrate labels with confidence-based weighting.

Parameters:
  • X_labeled (ndarray) – Current labeled feature data.

  • y_labeled (ndarray) – Current labeled targets.

  • X_new_labeled (ndarray) – New pseudo-labeled feature data.

  • y_new_labels (ndarray) – New pseudo-labels.

  • y_proba (ndarray, optional) – Predicted probabilities for all unlabeled samples.

  • indices (ndarray, optional) – Indices of selected samples in y_proba.

Returns:

  • X_labeled_next (ndarray) – Updated labeled feature data.

  • y_labeled_next (ndarray) – Updated labeled targets.

  • sample_weights_next (ndarray) – Sample weights with confidence-based weighting.
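
One plausible reading of confidence-based weighting is sketched below. Several details are assumptions rather than documented behavior: the function name `integrate_with_confidence` is hypothetical, the weight of a new sample is taken to be its maximum predicted probability, the fallback weight is 1.0 when `y_proba` or `indices` is missing, and every sample already in the labeled set is given weight 1.0 (a real implementation would likely need to remember the weights of earlier pseudo-labels).

```python
import numpy as np

def integrate_with_confidence(X_labeled, y_labeled, X_new_labeled, y_new_labels,
                              y_proba=None, indices=None):
    """Hypothetical stand-in for ConfidenceWeighting.integrate_labels."""
    X_next = np.vstack([X_labeled, X_new_labeled])
    y_next = np.concatenate([y_labeled, y_new_labels])
    # Simplification: all previously labeled samples get weight 1.0
    old_weights = np.ones(len(y_labeled))
    if y_proba is not None and indices is not None:
        # Assumption: weight = top class probability of each selected sample
        new_weights = y_proba[indices].max(axis=1)
    else:
        # Assumption: without probabilities, fall back to unit weights
        new_weights = np.ones(len(y_new_labels))
    return X_next, y_next, np.concatenate([old_weights, new_weights])

X_labeled = np.array([[0.0, 0.0], [1.0, 1.0]])
y_labeled = np.array([0, 1])
y_proba = np.array([[0.9, 0.1], [0.3, 0.7], [0.2, 0.8]])  # all unlabeled samples
indices = np.array([0, 2])                                 # the selected rows
X_new_labeled = np.array([[0.1, 0.1], [0.8, 0.9]])
y_new_labels = np.array([0, 1])

X_next, y_next, weights = integrate_with_confidence(
    X_labeled, y_labeled, X_new_labeled, y_new_labels,
    y_proba=y_proba, indices=indices,
)
print(weights)
```

Weights of this shape can be passed as `sample_weight` to estimators whose `fit` method accepts it, so that less certain pseudo-labels pull less on the model.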

Strategy Combinations#

Mix and match strategies for different behaviors:

Conservative SSL#

High confidence threshold with simple append strategy:

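# Assumes scikit-learn's LogisticRegression as the base model
from sklearn.linear_model import LogisticRegression

from ssl_framework.main import SelfTrainingClassifier
from ssl_framework.strategies import ConfidenceThreshold, AppendAndGrow

ssl_conservative = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    selection_strategy=ConfidenceThreshold(threshold=0.98),
    integration_strategy=AppendAndGrow(),
    max_iter=10
)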

Aggressive SSL#

Fixed count selection with confidence weighting:

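# Assumes scikit-learn's LogisticRegression as the base model
from sklearn.linear_model import LogisticRegression

from ssl_framework.main import SelfTrainingClassifier
from ssl_framework.strategies import TopKFixedCount, ConfidenceWeighting

ssl_aggressive = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    selection_strategy=TopKFixedCount(k=50),
    integration_strategy=ConfidenceWeighting(),
    max_iter=5
)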

Experimental SSL#

Full re-labeling approach (can be computationally expensive):

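# Assumes scikit-learn's LogisticRegression as the base model
from sklearn.linear_model import LogisticRegression

from ssl_framework.main import SelfTrainingClassifier
from ssl_framework.strategies import TopKFixedCount, FullReLabeling

# X_original, y_original are the initial labeled arrays, defined by the caller
ssl_experimental = SelfTrainingClassifier(
    base_model=LogisticRegression(),
    selection_strategy=TopKFixedCount(k=10),
    integration_strategy=FullReLabeling(X_original, y_original),
    max_iter=3
)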

Custom Strategy Implementation#

To implement your own strategies, follow these interfaces:

Selection Strategy Interface#

class CustomSelectionStrategy:
    def select_labels(self, X_unlabeled, y_proba):
        """Select samples for pseudo-labeling.

        Parameters
        ----------
        X_unlabeled : ndarray of shape (n_unlabeled_samples, n_features)
            Unlabeled feature data.
        y_proba : ndarray of shape (n_unlabeled_samples, n_classes)
            Predicted class probabilities for unlabeled samples.

        Returns
        -------
        X_new_labeled : ndarray
            Feature data for newly selected samples.
        y_new_labels : ndarray
            Predicted labels for newly selected samples.
        indices_to_remove : ndarray
            Indices of samples to remove from unlabeled set.
        """
        # Your selection logic here
        pass
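
As one possible way to fill in this interface, the sketch below selects samples by the margin between the two most probable classes instead of the top probability alone. The class name `MarginSelection` and its `min_margin` parameter are hypothetical, and the pseudo-labels are assumed to be column indices of the most probable class. It requires at least two classes.

```python
import numpy as np

class MarginSelection:
    """Hypothetical custom strategy: select by top-1 minus top-2 probability."""

    def __init__(self, min_margin=0.5):
        self.min_margin = min_margin

    def select_labels(self, X_unlabeled, y_proba):
        # Sort each row ascending so the last two columns are top-2 and top-1
        sorted_proba = np.sort(y_proba, axis=1)
        margin = sorted_proba[:, -1] - sorted_proba[:, -2]
        indices_to_remove = np.flatnonzero(margin > self.min_margin)
        # Assumption: the pseudo-label is the column index of the top class
        y_new_labels = y_proba[indices_to_remove].argmax(axis=1)
        return X_unlabeled[indices_to_remove], y_new_labels, indices_to_remove

X_unlabeled = np.arange(6, dtype=float).reshape(3, 2)
y_proba = np.array([
    [0.80, 0.10, 0.10],   # margin 0.70: selected
    [0.40, 0.35, 0.25],   # margin 0.05: ambiguous, skipped
    [0.05, 0.15, 0.80],   # margin 0.65: selected
])
X_new, y_new, idx = MarginSelection(min_margin=0.5).select_labels(X_unlabeled, y_proba)
print(idx, y_new)
```

Because the class exposes `select_labels` with the same arguments and return values as the built-in strategies, it should be usable wherever a `selection_strategy` is expected, assuming the classifier relies only on that method.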

Integration Strategy Interface#

class CustomIntegrationStrategy:
    def integrate_labels(self, X_labeled, y_labeled, X_new_labeled, y_new_labels, **kwargs):
        """Integrate new pseudo-labeled samples.

        Parameters
        ----------
        X_labeled : ndarray
            Current labeled feature data.
        y_labeled : ndarray
            Current labeled targets.
        X_new_labeled : ndarray
            New pseudo-labeled feature data.
        y_new_labels : ndarray
            New pseudo-labels.
        **kwargs
            Additional parameters (y_proba, indices, etc.).

        Returns
        -------
        X_labeled_next : ndarray
            Updated labeled feature data.
        y_labeled_next : ndarray
            Updated labeled targets.
        sample_weights_next : ndarray or None
            Sample weights (None if not using weighting).
        """
        # Your integration logic here
        pass
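
A possible concrete integration strategy is sketched below: it appends new samples and gives every pseudo-label a fixed, discounted weight. The class name `DiscountedAppend` and its `pseudo_weight` parameter are hypothetical. The sketch keeps its own weight vector between calls, which assumes the classifier passes back the labeled set returned by the previous call; that calling convention is not confirmed by the documentation above.

```python
import numpy as np

class DiscountedAppend:
    """Hypothetical custom strategy: append with a fixed pseudo-label weight."""

    def __init__(self, pseudo_weight=0.5):
        self.pseudo_weight = pseudo_weight
        self._weights = None  # weights for the current labeled set

    def integrate_labels(self, X_labeled, y_labeled, X_new_labeled, y_new_labels, **kwargs):
        # First call, or state out of sync: treat all current samples as original
        if self._weights is None or len(self._weights) != len(y_labeled):
            self._weights = np.ones(len(y_labeled))
        new_weights = np.full(len(y_new_labels), self.pseudo_weight)
        self._weights = np.concatenate([self._weights, new_weights])
        X_next = np.vstack([X_labeled, X_new_labeled])
        y_next = np.concatenate([y_labeled, y_new_labels])
        # Return a copy so callers cannot mutate the internal state
        return X_next, y_next, self._weights.copy()

strategy = DiscountedAppend(pseudo_weight=0.5)
X0 = np.array([[0.0, 0.0], [1.0, 1.0]])
y0 = np.array([0, 1])

X1, y1, w1 = strategy.integrate_labels(X0, y0, np.array([[0.9, 0.9]]), np.array([1]))
X2, y2, w2 = strategy.integrate_labels(X1, y1, np.array([[0.1, 0.0]]), np.array([0]))
print(w1, w2)
```

Keeping the weights inside the strategy means pseudo-labels from earlier iterations retain their discount instead of being promoted to weight 1.0 on the next call.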