# Contributing to PySSL
We welcome contributions to PySSL! This guide will help you get started, whether you want to report a bug, request a feature, or contribute code.
## Ways to Contribute
There are many ways to contribute to PySSL:
- Report bugs - Help us identify and fix issues
- Request features - Suggest new functionality
- Improve documentation - Help make PySSL more accessible
- Submit code - Fix bugs or implement new features
- Add tests - Improve test coverage
- Share examples - Show PySSL in action
## Quick Start for Contributors
### 1. Set Up Development Environment
```bash
# Fork the repository on GitHub first
git clone https://github.com/YOUR-USERNAME/pyssl.git
cd pyssl

# Install development dependencies
pip install -e ".[test,docs]"

# Run tests to verify setup
pytest tests/
```
### 2. Create a Branch
```bash
git checkout -b feature/your-feature-name
# or
git checkout -b fix/issue-description
```
### 3. Make Your Changes
- Write clear, well-documented code
- Add tests for new functionality
- Update documentation as needed
### 4. Submit Your Contribution
```bash
# Run tests (and any linting checks the project uses)
pytest tests/

# Commit your changes
git add .
git commit -m "Add feature: brief description"

# Push to your fork
git push origin feature/your-feature-name

# Create a pull request on GitHub
```
## Reporting Bugs
Before reporting a bug, please:
- Check existing issues - Your bug might already be reported
- Update to the latest version - The bug might already be fixed
- Create a minimal example - Help us reproduce the issue
### Bug Report Template
When reporting bugs, please include:
**Bug Description**
A clear description of what the bug is.
**To Reproduce**
```python
# Minimal code example that reproduces the bug
import ssl_framework
# ... your code here
```
**Expected Behavior**
What you expected to happen.
**Actual Behavior**
What actually happened.
**Environment**
- PySSL version: [e.g., 0.1.0]
- Python version: [e.g., 3.9.0]
- Operating System: [e.g., Ubuntu 20.04]
- scikit-learn version: [e.g., 1.3.0]
**Additional Context**
Any other relevant information.
## Requesting Features
We love feature requests! Please:
- Check existing requests - Your idea might already be discussed
- Explain the use case - Help us understand why it's needed
- Suggest an implementation - If you have ideas about how to implement it
### Feature Request Template
**Feature Description**
A clear description of the feature you'd like to see.
**Use Case**
Describe the problem this feature would solve.
**Proposed Solution**
How you envision this feature working.
**Alternatives**
Any alternative solutions you've considered.
**Additional Context**
Any other relevant information.
## Development Guidelines
### Code Style
We follow these conventions:
- PEP 8 for Python code style
- Type hints for all public functions
- Docstrings for all public classes and methods
- Clear variable names - prefer descriptive over concise
Example:
```python
from typing import Tuple

import numpy as np


def select_confident_samples(
    X_unlabeled: np.ndarray,
    y_proba: np.ndarray,
    threshold: float = 0.95,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Select samples above confidence threshold.

    Parameters
    ----------
    X_unlabeled : np.ndarray
        Unlabeled feature data.
    y_proba : np.ndarray
        Predicted probabilities.
    threshold : float, default=0.95
        Confidence threshold.

    Returns
    -------
    X_selected : np.ndarray
        Selected feature data.
    y_selected : np.ndarray
        Selected pseudo-labels.
    indices : np.ndarray
        Selected sample indices.
    """
    # Implementation here
    pass
```
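
One way the stubbed function body could be filled in is sketched below. This is an illustrative assumption, not PySSL's actual implementation: it treats confidence as the largest class probability in each row, uses the column index of that probability as the pseudo-label, and counts a value equal to the threshold as selected.

```python
from typing import Tuple

import numpy as np


def select_confident_samples(
    X_unlabeled: np.ndarray,
    y_proba: np.ndarray,
    threshold: float = 0.95,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Select samples whose top class probability meets the threshold."""
    # Assumption: confidence is the maximum predicted probability per sample.
    confidence = y_proba.max(axis=1)
    # Indices of samples at or above the threshold (inclusive by assumption).
    indices = np.flatnonzero(confidence >= threshold)
    # Assumption: the pseudo-label is the column index of the top probability.
    y_selected = y_proba[indices].argmax(axis=1)
    return X_unlabeled[indices], y_selected, indices


X = np.array([[1, 2], [3, 4], [5, 6]])
proba = np.array([[0.96, 0.04], [0.6, 0.4], [0.02, 0.98]])
X_sel, y_sel, idx = select_confident_samples(X, proba)
print(idx)    # [0 2]
print(y_sel)  # [0 1]
```

Returning the indices alongside the selected rows lets the caller remove those rows from the unlabeled pool without recomputing the mask.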
### Testing
All code contributions must include tests:
- Unit tests for individual functions
- Integration tests for complete workflows
- Edge case tests for boundary conditions
Example test:
```python
import numpy as np

# Import path assumed from the project layout; adjust if it differs.
from ssl_framework.strategies import ConfidenceThreshold


def test_confidence_threshold_selection():
    """Test that ConfidenceThreshold selects correct samples."""
    # Arrange
    X_unlabeled = np.array([[1, 2], [3, 4], [5, 6]])
    y_proba = np.array([[0.96, 0.04], [0.6, 0.4], [0.98, 0.02]])
    strategy = ConfidenceThreshold(threshold=0.95)

    # Act
    X_selected, y_selected, indices = strategy.select_labels(X_unlabeled, y_proba)

    # Assert
    assert len(X_selected) == 2  # Samples 0 and 2 should be selected
    np.testing.assert_array_equal(indices, [0, 2])
```
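
Edge case tests for boundary conditions might look like the sketch below. `select_above_threshold` is a hypothetical stand-in defined here so the example runs on its own; it is not part of PySSL, and the inclusive `>=` comparison is an assumption about the intended behavior.

```python
import numpy as np


def select_above_threshold(y_proba: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical stand-in: indices of rows whose top probability >= threshold."""
    return np.flatnonzero(y_proba.max(axis=1) >= threshold)


def test_no_unlabeled_samples():
    """An empty probability matrix yields an empty selection."""
    # Arrange
    y_proba = np.empty((0, 2))
    # Act
    indices = select_above_threshold(y_proba, threshold=0.95)
    # Assert
    assert indices.size == 0


def test_probability_equal_to_threshold_is_selected():
    """A sample exactly at the threshold sits on the boundary and is kept."""
    # Arrange
    y_proba = np.array([[0.95, 0.05], [0.5, 0.5]])
    # Act
    indices = select_above_threshold(y_proba, threshold=0.95)
    # Assert
    np.testing.assert_array_equal(indices, [0])


def test_nothing_selected_when_all_below_threshold():
    """No sample qualifies when every top probability is too low."""
    # Arrange
    y_proba = np.array([[0.7, 0.3], [0.6, 0.4]])
    # Act
    indices = select_above_threshold(y_proba, threshold=0.95)
    # Assert
    assert indices.size == 0
```

Each test covers one boundary, so a failure points directly at the condition that broke.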
### Documentation
Update documentation for:
- New features - Add to user guide and API reference
- API changes - Update docstrings and examples
- Bug fixes - Note in changelog
Documentation is written in reStructuredText and built with Sphinx.
## Project Structure
Understanding the project layout:
```
pyssl/
├── ssl_framework/          # Main package
│   ├── __init__.py
│   ├── main.py             # SelfTrainingClassifier
│   └── strategies.py       # Selection/Integration strategies
├── tests/                  # Test suite
│   ├── test_main.py
│   └── test_strategies.py
├── docs/                   # Documentation
│   └── source/
├── examples/               # Example scripts and notebooks
├── pyproject.toml          # Project configuration
└── README.md
```
## Running Tests
### Basic Testing
```bash
# Run all tests
pytest tests/

# Run specific test file
pytest tests/test_main.py

# Run with coverage
pytest tests/ --cov=ssl_framework --cov-report=html
```
### Test Types
- Unit tests - Test individual functions/methods
- Integration tests - Test complete workflows
- Strategy tests - Test selection/integration strategies
### Writing Good Tests
- Test the interface, not the implementation
- Use descriptive test names
- Follow the Arrange-Act-Assert pattern
- Test edge cases and error conditions
## Documentation
### Building Documentation
```bash
# Build HTML documentation
sphinx-build -b html docs/source docs/build

# Serve locally
python -m http.server -d docs/build 8000
```
### Documentation Types
- API Reference - Auto-generated from docstrings
- User Guide - Tutorials and how-to guides
- Examples - Jupyter notebooks and scripts
## Pull Request Process
### Before Submitting
- Tests pass - All existing and new tests
- Documentation updated - For new features
- Code style - Follows project conventions
- Clear commit messages - Describe what and why
### PR Review Process
1. Automated checks - Tests, linting, coverage
2. Code review - Maintainer review for correctness
3. Documentation review - Clarity and completeness
4. Final approval - Merge when ready
### PR Guidelines
- Clear title - Summarize the change
- Detailed description - Explain what and why
- Link issues - Reference related issues
- Small, focused changes - Easier to review
Example PR description:
```markdown
## Summary
Adds support for custom confidence thresholds in TopKFixedCount strategy.

## Changes
- Add optional `min_confidence` parameter to TopKFixedCount
- Update tests to cover new functionality
- Add documentation example

## Motivation
Addresses issue #123 where users wanted to combine TopK selection with minimum confidence requirements.

## Testing
- Added unit tests for new parameter
- Verified existing tests still pass
- Tested with real dataset in examples/
```
## Coding Standards
### Python Standards
- Python 3.8+ compatibility
- Type hints for public APIs
- Docstrings following NumPy style
- Error handling with informative messages
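
As an illustration of error handling with informative messages, a parameter check might report both the offending value and the accepted range. The helper name `validate_threshold` is hypothetical and not part of PySSL.

```python
def validate_threshold(threshold: float) -> float:
    """Return the threshold as a float if it is a valid probability.

    Hypothetical helper for illustration only.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise TypeError(
            f"threshold must be a number, got {type(threshold).__name__}"
        )
    # The chained comparison is False for NaN, so NaN is rejected as well.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(
            f"threshold must be in the range [0, 1], got {threshold!r}"
        )
    return float(threshold)


try:
    validate_threshold(1.5)
except ValueError as exc:
    print(exc)  # threshold must be in the range [0, 1], got 1.5
```

Naming the parameter and echoing the bad value means the user can fix the call without opening the source.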
### API Design
- Scikit-learn compatibility - Follow sklearn conventions
- Modular design - Clear separation of concerns
- Backward compatibility - Avoid breaking changes
- Clear interfaces - Well-defined strategy protocols
### Performance
- Efficient NumPy operations - Vectorized computations
- Memory conscious - Handle large datasets appropriately
- Benchmark critical paths - Measure performance impact
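
To illustrate what a vectorized computation means in practice, the sketch below builds the same confidence mask twice, once with a Python loop and once with a single NumPy expression. Both function names are illustrative and not part of PySSL.

```python
import numpy as np


def confident_mask_loop(y_proba: np.ndarray, threshold: float) -> np.ndarray:
    """Row-by-row Python loop: simple, but slow on large arrays."""
    mask = np.zeros(len(y_proba), dtype=bool)
    for i, row in enumerate(y_proba):
        mask[i] = max(row) >= threshold
    return mask


def confident_mask_vectorized(y_proba: np.ndarray, threshold: float) -> np.ndarray:
    """Same result from one vectorized expression evaluated inside NumPy."""
    return y_proba.max(axis=1) >= threshold


rng = np.random.default_rng(0)
# Random positive rows normalized to sum to 1, standing in for probabilities.
raw = rng.random((10_000, 3)) + 1e-9
y_proba = raw / raw.sum(axis=1, keepdims=True)

# Both versions must agree before the faster one replaces the slower one.
assert np.array_equal(
    confident_mask_loop(y_proba, 0.6),
    confident_mask_vectorized(y_proba, 0.6),
)
```

The vectorized version avoids per-row interpreter overhead, which usually matters most when the unlabeled pool is large.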
## Recognition
All contributors are recognized in:
- CONTRIBUTORS.md - List of all contributors
- Release notes - Credit for specific contributions
- Documentation - Author attribution where appropriate
## Community Guidelines
We strive to maintain a welcoming, inclusive community:
- Be respectful - Treat everyone with kindness
- Be constructive - Provide helpful feedback
- Be patient - Everyone is learning
- Assume good intent - Give others the benefit of the doubt
### Getting Help
If you need help:
- Read the docs - Most questions are answered here
- Search issues - Someone might have asked before
- Start a discussion - Use GitHub Discussions for questions
- Contact maintainers - For sensitive issues
## Advanced Contributions
### New Strategy Types
Want to implement a new selection or integration strategy?
1. Study existing strategies - Understand the interface
2. Implement the protocol - Follow method signatures
3. Add comprehensive tests - Cover edge cases
4. Document thoroughly - Include examples
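
The steps above can be sketched as follows. The `select_labels(X_unlabeled, y_proba)` call and its three return values mirror the example test in the Testing section; the `SelectionStrategy` protocol and the `MarginThreshold` class are hypothetical names used for illustration, so check the existing strategies in `ssl_framework/strategies.py` for the actual interface.

```python
from typing import Protocol, Tuple

import numpy as np


class SelectionStrategy(Protocol):
    """Hypothetical protocol: the interface a selection strategy is assumed to expose."""

    def select_labels(
        self, X_unlabeled: np.ndarray, y_proba: np.ndarray
    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        ...


class MarginThreshold:
    """Hypothetical strategy: keep samples whose top-two probability gap is large.

    Assumes y_proba has at least two class columns.
    """

    def __init__(self, min_margin: float = 0.5) -> None:
        self.min_margin = min_margin

    def select_labels(
        self, X_unlabeled: np.ndarray, y_proba: np.ndarray
    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        # Sort each row ascending; the last two columns are the top two classes.
        top_two = np.sort(y_proba, axis=1)[:, -2:]
        margin = top_two[:, 1] - top_two[:, 0]
        indices = np.flatnonzero(margin >= self.min_margin)
        # Pseudo-label: column index of the most probable class.
        y_selected = y_proba[indices].argmax(axis=1)
        return X_unlabeled[indices], y_selected, indices


# The annotation lets a static type checker confirm the class fits the protocol.
strategy: SelectionStrategy = MarginThreshold(min_margin=0.5)
X = np.array([[1, 2], [3, 4], [5, 6]])
y_proba = np.array([[0.9, 0.1], [0.6, 0.4], [0.02, 0.98]])
X_selected, y_selected, indices = strategy.select_labels(X, y_proba)
print(indices)  # [0 2]
```

Because `typing.Protocol` uses structural typing, the new class does not need to inherit from anything; matching the method signature is enough.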
### Performance Improvements
- Profile first - Identify actual bottlenecks
- Benchmark changes - Measure improvement
- Maintain compatibility - Don't break existing code
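
A minimal profiling and benchmarking sketch using only the standard library is shown below. `slow_top_confidence` and `fast_top_confidence` are placeholders for whatever code path is being optimized; they are not PySSL functions, and the input data is synthetic.

```python
import cProfile
import io
import pstats
import timeit


def slow_top_confidence(rows):
    """Placeholder baseline: sorts each row just to read its largest value."""
    return [sorted(row)[-1] for row in rows]


def fast_top_confidence(rows):
    """Placeholder candidate: reads the largest value directly."""
    return [max(row) for row in rows]


# Synthetic input standing in for rows of predicted probabilities.
rows = [[((i * j) % 7) / 7 for j in range(1, 50)] for i in range(1, 2000)]

# Profile first: find out where the time actually goes.
profiler = cProfile.Profile()
profiler.enable()
slow_top_confidence(rows)
profiler.disable()
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())

# Benchmark the change: time both versions on identical input.
slow_seconds = timeit.timeit(lambda: slow_top_confidence(rows), number=20)
fast_seconds = timeit.timeit(lambda: fast_top_confidence(rows), number=20)
print(f"baseline: {slow_seconds:.4f}s, candidate: {fast_seconds:.4f}s")

# Maintain compatibility: the optimized version must return the same results.
assert slow_top_confidence(rows) == fast_top_confidence(rows)
```

Including the timing numbers in the pull request description gives reviewers a concrete basis for judging the change.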
### New Features
- Discuss first - Open an issue to discuss the design
- Start small - Implement a minimal viable version
- Iterate - Refine based on feedback
Thank you for contributing to PySSL!