From 983e5fe11769639659379f1f481ac789a42fcabc Mon Sep 17 00:00:00 2001
From: "Alinson S. Xavier" <git@axavier.org>
Date: Thu, 20 May 2021 08:36:34 -0500
Subject: [PATCH] Add docs-sphinx

---
 docs-sphinx/Makefile           |  14 ++
 docs-sphinx/_static/custom.css |   7 +
 docs-sphinx/about.md           |  58 ++++++++
 docs-sphinx/benchmark.md       | 177 ++++++++++++++++++++++++
 docs-sphinx/conf.py            |  16 +++
 docs-sphinx/customization.md   | 182 ++++++++++++++++++++++++
 docs-sphinx/index.md           |  35 +++++
 docs-sphinx/usage.md           | 246 +++++++++++++++++++++++++++++++++
 8 files changed, 735 insertions(+)
 create mode 100644 docs-sphinx/Makefile
 create mode 100644 docs-sphinx/_static/custom.css
 create mode 100644 docs-sphinx/about.md
 create mode 100644 docs-sphinx/benchmark.md
 create mode 100644 docs-sphinx/conf.py
 create mode 100644 docs-sphinx/customization.md
 create mode 100644 docs-sphinx/index.md
 create mode 100644 docs-sphinx/usage.md

diff --git a/docs-sphinx/Makefile b/docs-sphinx/Makefile
new file mode 100644
index 0000000..c987782
--- /dev/null
+++ b/docs-sphinx/Makefile
@@ -0,0 +1,14 @@
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = .
+BUILDDIR      = _build
+
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs-sphinx/_static/custom.css b/docs-sphinx/_static/custom.css
new file mode 100644
index 0000000..d484dec
--- /dev/null
+++ b/docs-sphinx/_static/custom.css
@@ -0,0 +1,7 @@
+h1.site-logo {
+    font-size: 30px !important;
+}
+
+h1.site-logo small {
+    font-size: 20px !important;
+}
\ No newline at end of file
diff --git a/docs-sphinx/about.md b/docs-sphinx/about.md
new file mode 100644
index 0000000..fb98131
--- /dev/null
+++ b/docs-sphinx/about.md
@@ -0,0 +1,58 @@
+```{sectnum}
+---
+start: 4
+depth: 2
+suffix: .
+---
+```
+
+# About
+
+## Authors
+
+* **Alinson S. Xavier,** Argonne National Laboratory <<axavier@anl.gov>>
+* **Feng Qiu,** Argonne National Laboratory <<fqiu@anl.gov>>
+
+## Acknowledgments
+
+* Based upon work supported by **Laboratory Directed Research and Development** (LDRD) funding from Argonne National Laboratory, provided by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357, and the **U.S. Department of Energy Advanced Grid Modeling Program** under Grant DE-OE0000875.
+
+## References
+
+
+If you use MIPLearn in your research, or the included problem generators, we kindly request that you cite the package as follows:
+
+* **Alinson S. Xavier, Feng Qiu.** *MIPLearn: An Extensible Framework for Learning-Enhanced Optimization*. Zenodo (2020). DOI: [10.5281/zenodo.4287567](https://doi.org/10.5281/zenodo.4287567)
+
+If you use MIPLearn in the field of power systems optimization, we kindly request that you cite the reference below, in which the main techniques implemented in MIPLearn were first developed:
+
+* **Alinson S. Xavier, Feng Qiu, Shabbir Ahmed.** *Learning to Solve Large-Scale Unit Commitment Problems.* INFORMS Journal on Computing (2020). DOI: [10.1287/ijoc.2020.0976](https://doi.org/10.1287/ijoc.2020.0976)
+
+## License
+
+```text
+MIPLearn, an extensible framework for Learning-Enhanced Mixed-Integer Optimization
+Copyright © 2020, UChicago Argonne, LLC. All Rights Reserved.
+
+Redistribution and use in source and binary forms, with or without modification, are permitted
+provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this list of
+   conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright notice, this list of
+   conditions and the following disclaimer in the documentation and/or other materials provided
+   with the distribution.
+3. Neither the name of the copyright holder nor the names of its contributors may be used to
+   endorse or promote products derived from this software without specific prior written
+   permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR
+IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
+AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+```
diff --git a/docs-sphinx/benchmark.md b/docs-sphinx/benchmark.md
new file mode 100644
index 0000000..88b6db5
--- /dev/null
+++ b/docs-sphinx/benchmark.md
@@ -0,0 +1,177 @@
+```{sectnum}
+---
+start: 2
+depth: 2
+suffix: .
+---
+```
+
+# Benchmarks
+
+MIPLearn provides a selection of benchmark problems and random instance generators, covering applications from different fields, that can be used to evaluate new learning-enhanced MIP techniques in a measurable and reproducible way. In this page, we describe these problems, the included instance generators, and we present some benchmark results for  `LearningSolver` with default parameters.
+
+## Preliminaries
+
+### Benchmark challenges
+
+When evaluating the performance of a conventional MIP solver, *benchmark sets*, such as MIPLIB and TSPLIB, are typically used. The performance of newly proposed solvers or solution techniques are typically measured as the average (or total) running time the solver takes to solve the entire benchmark set. For Learning-Enhanced MIP solvers, it is also necessary to specify what instances should the solver be trained on (the *training instances*) before solving the actual set of instances we are interested in (the *test instances*). If the training instances are very similar to the test instances, we would expect a Learning-Enhanced Solver to present stronger perfomance benefits.
+
+In MIPLearn, each optimization problem comes with a set of **benchmark challenges**, which specify how should the training and test instances be generated. The first challenges are typically easier, in the sense that training and test instances are very similar. Later challenges gradually make the sets more distinct, and therefore harder to learn from.
+
+### Baseline results
+
+To illustrate the performance of `LearningSolver`, and to set a baseline for newly proposed techniques, we present in this page, for each benchmark challenge, a small set of computational results measuring the solution speed of the solver and the solution quality with default parameters. For more detailed computational studies, see [references](about.md#references). We compare three solvers:
+
+* **baseline:** Gurobi 9.0 with default settings (a conventional state-of-the-art MIP solver)
+* **ml-exact:** `LearningSolver` with default settings, using Gurobi 9.0 as internal MIP solver
+* **ml-heuristic:** Same as above, but with `mode="heuristic"`
+
+All experiments presented here were performed on a Linux server (Ubuntu Linux 18.04 LTS) with Intel Xeon Gold 6230s (2 processors, 40 cores, 80 threads) and 256 GB RAM (DDR4, 2933 MHz). All solvers were restricted to use 4 threads, with no time limits, and 10 instances were solved simultaneously at a time.
+
+
+
+## Maximum Weight Stable Set Problem
+
+### Problem definition
+
+Given a simple undirected graph $G=(V,E)$ and weights $w \in \mathbb{R}^V$, the problem is to find a stable set $S \subseteq V$ that maximizes $ \sum_{v \in V} w_v$. We recall that a subset $S \subseteq V$ is a *stable set* if no two vertices of $S$ are adjacent. This is one of Karp's 21 NP-complete problems.
+
+### Random instance generator
+
+The class `MaxWeightStableSetGenerator` can be used to generate random instances of this problem, with user-specified probability distributions. When the constructor parameter `fix_graph=True` is provided, one random Erdős-Rényi graph $G_{n,p}$ is generated during the constructor, where $n$ and $p$ are sampled from user-provided probability distributions `n` and `p`. To generate each instance, the generator independently samples each $w_v$ from the user-provided probability distribution `w`. When `fix_graph=False`, a new random graph is generated for each instance, while the remaining parameters are sampled in the same way.
+
+### Challenge A
+
+* Fixed random Erdős-Rényi graph $G_{n,p}$ with $n=200$ and $p=5\%$
+* Random vertex weights $w_v \sim U(100, 150)$
+* 500 training instances, 50 test instances
+
+```python
+MaxWeightStableSetGenerator(w=uniform(loc=100., scale=50.),
+                            n=randint(low=200, high=201),
+                            p=uniform(loc=0.05, scale=0.0),
+                            fix_graph=True)
+```
+
+![alt](figures/benchmark_stab_a.png)
+
+
+## Traveling Salesman Problem
+
+### Problem definition
+
+Given a list of cities and the distance between each pair of cities, the problem asks for the
+shortest route starting at the first city, visiting each other city exactly once, then returning
+to the first city. This problem is a generalization of the Hamiltonian path problem, one of Karp's
+21 NP-complete problems.
+
+### Random problem generator
+
+The class `TravelingSalesmanGenerator` can be used to generate random instances of this
+problem. Initially, the generator creates $n$ cities $(x_1,y_1),\ldots,(x_n,y_n) \in \mathbb{R}^2$,
+where $n, x_i$ and $y_i$ are sampled independently from the provided probability distributions `n`,
+`x` and `y`. For each pair of cities $(i,j)$, the distance $d_{i,j}$ between them is set to:
+$$
+    d_{i,j} = \gamma_{i,j} \sqrt{(x_i-x_j)^2 + (y_i - y_j)^2}
+$$
+where $\gamma_{i,j}$ is sampled from the distribution `gamma`.
+
+If `fix_cities=True` is provided, the list of cities is kept the same for all generated instances.
+The $gamma$ values, and therefore also the distances, are still different.
+
+By default, all distances $d_{i,j}$ are rounded to the nearest integer.  If `round=False`
+is provided, this rounding will be disabled.
+
+### Challenge A
+
+* Fixed list of 350 cities in the $[0, 1000]^2$ square
+* $\gamma_{i,j} \sim U(0.95, 1.05)$
+* 500 training instances, 50 test instances
+
+
+```python
+TravelingSalesmanGenerator(x=uniform(loc=0.0, scale=1000.0),
+                           y=uniform(loc=0.0, scale=1000.0),
+                           n=randint(low=350, high=351),
+                           gamma=uniform(loc=0.95, scale=0.1),
+                           fix_cities=True,
+                           round=True,
+                          )
+```
+
+![alt](figures/benchmark_tsp_a.png)
+
+
+## Multidimensional 0-1 Knapsack Problem
+
+### Problem definition
+
+Given a set of $n$ items and $m$ types of resources (also called *knapsacks*), the problem is to find a subset of items that maximizes profit without consuming more resources than it is available. More precisely, the problem is:
+
+$$
+\begin{align*}
+    \text{maximize}
+        & \sum_{j=1}^n p_j x_j
+        \\
+    \text{subject to}
+        & \sum_{j=1}^n w_{ij} x_j \leq b_i
+        & \forall i=1,\ldots,m \\
+    & x_j \in \{0,1\}
+        & \forall j=1,\ldots,n
+\end{align*}
+$$
+
+### Random instance generator
+
+The class `MultiKnapsackGenerator` can be used to generate random instances of this problem. The number of items $n$ and knapsacks $m$ are sampled from the user-provided probability distributions `n` and `m`. The weights $w_{ij}$ are sampled independently from the provided distribution `w`. The capacity of knapsack $i$ is set to
+
+$$
+    b_i = \alpha_i \sum_{j=1}^n w_{ij}
+$$
+
+where $\alpha_i$, the tightness ratio, is sampled from the provided probability
+distribution `alpha`. To make the instances more challenging, the costs of the items
+are linearly correlated to their average weights. More specifically, the price of each
+item $j$ is set to:
+
+$$
+    p_j = \sum_{i=1}^m \frac{w_{ij}}{m} + K  u_j,
+$$
+
+where $K$, the correlation coefficient, and $u_j$, the correlation multiplier, are sampled
+from the provided probability distributions `K` and `u`.
+
+If `fix_w=True` is provided, then $w_{ij}$ are kept the same in all generated instances. This also implies that $n$ and $m$ are kept fixed. Although the prices and capacities are derived from $w_{ij}$, as long as `u` and `K` are not constants, the generated instances will still not be completely identical.
+
+
+If a probability distribution `w_jitter` is provided, then item weights will be set to $w_{ij} \gamma_{ij}$ where $\gamma_{ij}$ is sampled from `w_jitter`. When combined with `fix_w=True`, this argument may be used to generate instances where the weight of each item is roughly the same, but not exactly identical, across all instances. The prices of the items and the capacities of the knapsacks will be calculated as above, but using these perturbed weights instead.
+
+By default, all generated prices, weights and capacities are rounded to the nearest integer number. If `round=False` is provided, this rounding will be disabled.
+
+
+!!! note "References"
+    * Freville, Arnaud, and Gérard Plateau. *An efficient preprocessing procedure for the multidimensional 0–1 knapsack problem.* Discrete applied mathematics 49.1-3 (1994): 189-212.
+    * Fréville, Arnaud. *The multidimensional 0–1 knapsack problem: An overview.* European Journal of Operational Research 155.1 (2004): 1-21.
+    
+### Challenge A
+
+* 250 variables, 10 constraints, fixed weights
+* $w \sim U(0, 1000), \gamma \sim U(0.95, 1.05)$
+* $K = 500, u \sim U(0, 1), \alpha = 0.25$
+* 500 training instances, 50 test instances
+
+
+```python
+MultiKnapsackGenerator(n=randint(low=250, high=251),
+                       m=randint(low=10, high=11),
+                       w=uniform(loc=0.0, scale=1000.0),
+                       K=uniform(loc=500.0, scale=0.0),
+                       u=uniform(loc=0.0, scale=1.0),
+                       alpha=uniform(loc=0.25, scale=0.0),
+                       fix_w=True,
+                       w_jitter=uniform(loc=0.95, scale=0.1),
+                      )
+```
+
+![alt](figures/benchmark_knapsack_a.png)
+
diff --git a/docs-sphinx/conf.py b/docs-sphinx/conf.py
new file mode 100644
index 0000000..630490d
--- /dev/null
+++ b/docs-sphinx/conf.py
@@ -0,0 +1,16 @@
+project = "MIPLearn"
+copyright = "2020-2021, UChicago Argonne, LLC"
+author = ""
+release = "0.2.0"
+extensions = ["myst_parser"]
+templates_path = ["_templates"]
+exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
+html_theme = "sphinx_book_theme"
+html_static_path = ["_static"]
+html_css_files = ["custom.css"]
+html_theme_options = {
+    "repository_url": "https://github.com/ANL-CEEESA/MIPLearn/",
+    "use_repository_button": True,
+    "extra_navbar": "",
+}
+html_title = f"MIPLearn<br/><small>{release}</small>"
diff --git a/docs-sphinx/customization.md b/docs-sphinx/customization.md
new file mode 100644
index 0000000..cc087fc
--- /dev/null
+++ b/docs-sphinx/customization.md
@@ -0,0 +1,182 @@
+```{sectnum}
+---
+start: 3
+depth: 2
+suffix: .
+---
+```
+
+# Customization
+
+## Customizing solver parameters
+
+### Selecting the internal MIP solver
+
+By default, `LearningSolver` uses [Gurobi](https://www.gurobi.com/) as its internal MIP solver, and expects models to be provided using the Pyomo modeling language. Supported solvers and modeling languages include:
+
+* `GurobiPyomoSolver`: Gurobi with Pyomo (default).
+* `CplexPyomoSolver`: [IBM ILOG CPLEX](https://www.ibm.com/products/ilog-cplex-optimization-studio) with Pyomo.
+* `XpressPyomoSolver`: [FICO XPRESS Solver](https://www.fico.com/en/products/fico-xpress-solver) with Pyomo.
+* `GurobiSolver`: Gurobi without any modeling language.
+
+To switch between solvers, provide the desired class using the `solver` argument:
+
+```python
+from miplearn import LearningSolver, CplexPyomoSolver
+solver = LearningSolver(solver=CplexPyomoSolver)
+```
+
+To configure a particular solver, use the `params` constructor argument, as shown below.
+
+```python
+from miplearn import LearningSolver, GurobiPyomoSolver
+solver = LearningSolver(
+    solver=lambda: GurobiPyomoSolver(
+        params={
+            "TimeLimit": 900,
+            "MIPGap": 1e-3,
+            "NodeLimit": 1000,
+        }
+    ),
+)
+```
+
+
+## Customizing solver components
+
+`LearningSolver` is composed by a number of individual machine-learning components, each targeting a different part of the solution process. Each component can be individually enabled, disabled or customized. The following components are enabled by default:
+
+* `LazyConstraintComponent`: Predicts which lazy constraint to initially enforce.
+* `ObjectiveValueComponent`: Predicts the optimal value of the optimization problem, given the optimal solution to the LP relaxation.
+* `PrimalSolutionComponent`: Predicts optimal values for binary decision variables. In heuristic mode, this component fixes the variables to their predicted values. In exact mode, the predicted values are provided to the solver as a (partial) MIP start.
+
+The following components are also available, but not enabled by default:
+
+* `BranchPriorityComponent`: Predicts good branch priorities for decision variables.
+
+### Selecting components
+
+To create a `LearningSolver` with a specific set of components, the `components` constructor argument may be used, as the next example shows:
+
+```python
+# Create a solver without any components
+solver1 = LearningSolver(components=[])
+
+# Create a solver with only two components
+solver2 = LearningSolver(components=[
+    LazyConstraintComponent(...),
+    PrimalSolutionComponent(...),
+])
+```
+
+### Adjusting component aggressiveness
+
+The aggressiveness of classification components, such as `PrimalSolutionComponent` and `LazyConstraintComponent`, can be adjusted through the `threshold` constructor argument. Internally, these components ask the machine learning models how confident are they on each prediction they make, then automatically discard all predictions that have low confidence. The `threshold` argument specifies how confident should the ML models be for a prediction to be considered trustworthy. Lowering a component's threshold increases its aggressiveness, while raising a component's threshold makes it more conservative.
+
+For example, if the ML model predicts that a certain binary variable will assume value `1.0` in the optimal solution with 75% confidence, and if the `PrimalSolutionComponent` is configured to discard all predictions with less than 90% confidence, then this variable will not be included in the predicted MIP start.
+
+MIPLearn currently provides two types of thresholds:
+
+* `MinProbabilityThreshold(p: List[float])` A threshold which indicates that a prediction is trustworthy if its probability of being correct, as computed by the machine learning model, is above a fixed value.
+* `MinPrecisionThreshold(p: List[float])` A dynamic threshold which automatically adjusts itself during training to ensure that the component achieves at least a given precision on the training data set. Note that increasing a component's precision may reduce its recall.
+
+The example below shows how to build a `PrimalSolutionComponent` which fixes variables to zero with at least 80% precision, and to one with at least 95% precision. Other components are configured similarly.
+
+```python
+from miplearn import PrimalSolutionComponent, MinPrecisionThreshold
+
+PrimalSolutionComponent(
+    mode="heuristic",
+    threshold=MinPrecisionThreshold([0.80, 0.95]),
+)
+```
+
+### Evaluating component performance
+
+MIPLearn allows solver components to be modified, trained and evaluated in isolation. In the following example, we build and
+fit `PrimalSolutionComponent` outside the solver, then evaluate its performance.
+
+```python
+from miplearn import PrimalSolutionComponent
+
+# User-provided set of previously-solved instances
+train_instances = [...]
+
+# Construct and fit component on a subset of training instances
+comp = PrimalSolutionComponent()
+comp.fit(train_instances[:100])
+
+# Evaluate performance on an additional set of training instances
+ev = comp.evaluate(train_instances[100:150])
+``` 
+
+The method `evaluate` returns a dictionary with performance evaluation statistics for each training instance provided,
+and for each type of prediction the component makes. To obtain a summary across all instances, pandas may be used, as below:
+
+```python
+import pandas as pd
+pd.DataFrame(ev["Fix one"]).mean(axis=1)
+```
+```text
+Predicted positive          3.120000
+Predicted negative        196.880000
+Condition positive         62.500000
+Condition negative        137.500000
+True positive               3.060000
+True negative             137.440000
+False positive              0.060000
+False negative             59.440000
+Accuracy                    0.702500
+F1 score                    0.093050
+Recall                      0.048921
+Precision                   0.981667
+Predicted positive (%)      1.560000
+Predicted negative (%)     98.440000
+Condition positive (%)     31.250000
+Condition negative (%)     68.750000
+True positive (%)           1.530000
+True negative (%)          68.720000
+False positive (%)          0.030000
+False negative (%)         29.720000
+dtype: float64
+```
+
+Regression components (such as `ObjectiveValueComponent`) can also be trained and evaluated similarly,
+as the next example shows:
+
+```python
+from miplearn import ObjectiveValueComponent
+comp = ObjectiveValueComponent()
+comp.fit(train_instances[:100])
+ev = comp.evaluate(train_instances[100:150])
+
+import pandas as pd
+pd.DataFrame(ev).mean(axis=1)
+```
+```text
+Mean squared error       7001.977827
+Explained variance          0.519790
+Max error                 242.375804
+Mean absolute error        65.843924
+R2                          0.517612
+Median absolute error      65.843924
+dtype: float64
+```
+
+### Using customized ML classifiers and regressors
+
+By default, given a training set of instantes, MIPLearn trains a fixed set of ML classifiers and regressors, then selects the best one based on cross-validation performance. Alternatively, the user may specify which ML model a component should use through the `classifier` or `regressor` contructor parameters. Scikit-learn classifiers and regressors are currently supported. A future version of the package will add compatibility with Keras models.
+
+The example below shows how to construct a `PrimalSolutionComponent` which internally uses scikit-learn's `KNeighborsClassifiers`. Any other scikit-learn classifier or pipeline can be used. It needs to be wrapped in `ScikitLearnClassifier` to ensure that all the proper data transformations are applied.
+
+```python
+from miplearn import PrimalSolutionComponent, ScikitLearnClassifier
+from sklearn.neighbors import KNeighborsClassifier
+
+comp = PrimalSolutionComponent(
+    classifier=ScikitLearnClassifier(
+        KNeighborsClassifier(n_neighbors=5),
+    ),
+)
+comp.fit(train_instances)
+``` 
diff --git a/docs-sphinx/index.md b/docs-sphinx/index.md
new file mode 100644
index 0000000..04c428f
--- /dev/null
+++ b/docs-sphinx/index.md
@@ -0,0 +1,35 @@
+# MIPLearn
+
+**MIPLearn** is an extensible framework for solving discrete optimization problems using a combination of Mixed-Integer Linear Programming (MIP) and Machine Learning (ML). The framework uses ML methods to automatically identify patterns in previously solved instances of the problem, then uses these patterns to accelerate the performance of conventional state-of-the-art MIP solvers (such as CPLEX, Gurobi or XPRESS).
+
+Unlike pure ML methods, MIPLearn is not only able to find high-quality solutions to discrete optimization problems, but it can also prove the optimality and feasibility of these solutions.
+Unlike conventional MIP solvers, MIPLearn can take full advantage of very specific observations that happen to be true in a particular family of instances (such as the observation that a particular constraint is typically redundant, or that a particular variable typically assumes a certain value). 
+
+For certain classes of problems, this approach has been shown to provide significant performance benefits (see [benchmarks](benchmark.md) and [references](about.md)).
+
+## Features
+
+* **MIPLearn proposes a flexible problem specification format,** which allows users to describe their particular optimization problems to a Learning-Enhanced MIP solver, both from the MIP perspective and from the ML perspective, without making any assumptions on the problem being modeled, the mathematical formulation of the problem, or ML encoding. While the format is very flexible, some constraints are enforced to ensure that it is usable by an actual solver.
+
+* **MIPLearn provides a reference implementation of a *Learning-Enhanced Solver*,** which can use the above problem specification format to automatically predict, based on previously solved instances, a number of hints to accelerate MIP performance. Currently, the reference solver is able to predict: (i) partial solutions which are likely to work well as MIP starts; (ii) an initial set of lazy constraints to enforce; (iii) variable branching priorities to accelerate the exploration of the branch-and-bound tree; (iv) the optimal objective value based on the solution to the LP relaxation. The usage of the solver is very straightforward. The most suitable ML models are automatically selected, trained, cross-validated and applied to the problem with no user intervention.
+
+* **MIPLearn provides a set of benchmark problems and random instance generators,** covering applications from different domains, which can be used to quickly evaluate new learning-enhanced MIP techniques in a measurable and reproducible way.
+
+* **MIPLearn is customizable and extensible**. For MIP and ML researchers exploring new techniques to accelerate MIP performance based on historical data, each component of the reference solver can be individually replaced, extended or customized.
+
+## Site contents
+
+```{toctree}
+---
+maxdepth: 2
+---
+usage.md
+benchmark.md
+customization.md
+about.md
+```
+
+## Source Code
+
+* [https://github.com/ANL-CEEESA/MIPLearn](https://github.com/ANL-CEEESA/MIPLearn)
+
diff --git a/docs-sphinx/usage.md b/docs-sphinx/usage.md
new file mode 100644
index 0000000..9f761f4
--- /dev/null
+++ b/docs-sphinx/usage.md
@@ -0,0 +1,246 @@
+```{sectnum}
+---
+start: 1
+depth: 2
+suffix: .
+---
+```
+
+# Using MIPLearn
+
+## Installation
+
+In these docs, we describe the Python/Pyomo version of the package, although a [Julia/JuMP version](https://github.com/ANL-CEEESA/MIPLearn.jl) is also available. A mixed-integer solver is also required and its Python bindings must be properly installed. Supported solvers are currently CPLEX, Gurobi and XPRESS.
+
+To install MIPLearn, run: 
+
+```bash
+pip3 install --upgrade miplearn==0.2.*
+```
+
+After installation, the package `miplearn` should become available to Python. It can be imported
+as follows:
+
+```python
+import miplearn
+```
+
+## Using `LearningSolver`
+
+The main class provided by this package is `LearningSolver`, a learning-enhanced MIP solver which uses information from previously solved instances to accelerate the solution of new instances. The following example shows its basic usage:
+
+```python
+from miplearn import LearningSolver
+
+# List of user-provided instances
+training_instances = [...] 
+test_instances = [...]
+
+# Create solver
+solver = LearningSolver()
+
+# Solve all training instances
+for instance in training_instances:
+    solver.solve(instance)
+
+# Learn from training instances
+solver.fit(training_instances)
+
+# Solve all test instances
+for instance in test_instances:
+    solver.solve(instance)
+```
+
+In this example, we have two lists of user-provided instances: `training_instances` and `test_instances`. We start by solving all training instances. Since there is no historical information available at this point, the instances will be processed from scratch, with no ML acceleration. After solving each instance, the solver stores within each `instance` object the optimal solution, the optimal objective value, and other information that can be used to accelerate future solves. After all training instances are solved, we call `solver.fit(training_instances)`. This instructs the solver to train all its internal machine-learning models based on the solutions of the (solved) trained instances. Subsequent calls to `solver.solve(instance)` will automatically use the trained Machine Learning models to accelerate the solution process.
+
+
+## Describing problem instances
+
+Instances to be solved by `LearningSolver` must derive from the abstract class `miplearn.Instance`. The following three abstract methods must be implemented:
+
+* `instance.to_model()`, which returns a concrete Pyomo model corresponding to the instance;
+* `instance.get_instance_features()`, which returns a 1-dimensional Numpy array of (numerical) features describing the entire instance;
+* `instance.get_variable_features(var_name, index)`, which returns a 1-dimensional array of (numerical) features describing a particular decision variable.
+
+The first method is used by `LearningSolver` to construct a concrete Pyomo model, which will be provided to the internal MIP solver. The second and third methods provide an encoding of the instance, which can be used by the ML models to make predictions. In the knapsack problem, for example, an implementation may decide to provide as instance features the average weights, average prices, number of items and the size of the knapsack. The weight and the price of each individual item could be provided as variable features. See `src/python/miplearn/problems/knapsack.py` for a concrete example.
+
+An optional method which can be implemented is `instance.get_variable_category(var_name, index)`, which returns a category (a string, an integer or any hashable type) for each decision variable. If two variables have the same category, `LearningSolver` will use the same internal ML model to predict the values of both variables. By default, all variables belong to the `"default"` category, and therefore only one ML model is used for all variables. If the returned category is `None`, ML predictors will ignore the variable.
+
+It is not necessary to have a one-to-one correspondence between features and problem instances. One important (and deliberate) limitation of MIPLearn, however, is that `get_instance_features()` must always return arrays of same length for all relevant instances of the problem. Similarly, `get_variable_features(var_name, index)` must also always return arrays of same length for all variables in each category. It is up to the user to decide how to encode variable-length characteristics of the problem into fixed-length vectors. In graph problems, for example, graph embeddings can be used to reduce the (variable-length) lists of nodes and edges into a fixed-length structure that still preserves some properties of the graph. Different instance encodings may have significant impact on performance.
+
+
+## Describing lazy constraints
+
+For many MIP formulations, it is not desirable to add all constraints up-front, either because the total number of constraints is very large, or because some of the constraints, even in relatively small numbers, can still cause significant performance impact when added to the formulation. In these situations, it may be desirable to generate and add constraints incrementaly, during the solution process itself. Conventional MIP solvers typically start by solving the problem without any lazy constraints. Whenever a candidate solution is found, the solver finds all violated lazy constraints and adds them to the formulation. MIPLearn significantly accelerates this process by using ML to predict which lazy constraints should be enforced from the very beginning of the optimization process, even before a candidate solution is available.
+
+MIPLearn supports two types of lazy constraints: through constraint annotations and through callbacks.
+
+###  Adding lazy constraints through annotations
+
+The easiest way to create lazy constraints in MIPLearn is to add them to the model (just like any regular constraints), then annotate them as lazy, as described below. Just before the optimization starts, MIPLearn removes all lazy constraints from the model and places them in a lazy constraint pool. If any trained ML models are available, MIPLearn queries these models to decide which of these constraints should be moved back into the formulation. After this step, the optimization starts, and lazy constraints from the pool are added to the model in the conventional fashion.
+
+To tag a constraint as lazy, the following methods must be implemented:
+
+* `instance.has_static_lazy_constraints()`, which returns `True` if the model has any annotated lazy constraints. By default, this method returns `False`.
+* `instance.is_constraint_lazy(cid)`, which returns `True` if the constraint with name `cid` should be treated as a lazy constraint, and `False` otherwise.
+* `instance.get_constraint_features(cid)`, which returns a 1-dimensional Numpy array of (numerical) features describing the constraint.
+
+For instances such that `has_lazy_constraints` returns `True`, MIPLearn calls `is_constraint_lazy` for each constraint in the formulation, providing the name of the constraint. For constraints such that `is_constraint_lazy` returns `True`, MIPLearn additionally calls `get_constraint_features` to gather a ML representation of each constraint. These features are used to predict which lazy constraints should be initially enforced.
+
+An additional method that can be implemented is `get_lazy_constraint_category(cid)`, which returns a category (a string or any other hashable type) for each lazy constraint. Similarly to decision variable categories, if two lazy constraints have the same category, then MIPLearn will use the same internal ML model to decide whether to initially enforce them. By default, all lazy constraints belong to the `"default"` category, and therefore a single ML model is used.
+
+!!! warning
+    If two lazy constraints belong to the same category, their feature vectors should have the same length.
+
+### Adding lazy constraints through callbacks
+
+Although convenient, the method described in the previous subsection still requires the generation of all lazy constraints ahead of time, which can be prohibitively expensive. An alternative method is through a lazy constraint callbacks, described below. During the solution process, MIPLearn will repeatedly call a user-provided function to identify any violated lazy constraints. If violated constraints are identified, MIPLearn will additionally call another user-provided function to generate the constraint and add it to the formulation.
+
+To describe lazy constraints through user callbacks, the following methods need to be implemented:
+
+* `instance.has_dynamic_lazy_constraints()`, which returns `True` if the model has any lazy constraints generated by user callbacks. By default, this method returns `False`.
+* `instance.find_violated_lazy_constraints(model)`, which returns a list of identifiers corresponding to the lazy constraints found to be violated by the current solution. These identifiers should be strings, tuples or any other hashable type.
+* `instance.build_violated_lazy_constraints(model, cid)`, which returns either a list of Pyomo constraints, or a single Pyomo constraint, corresponding to the given lazy constraint identifier.
+* `instance.get_constraint_features(cid)`, which returns a 1-dimensional Numpy array of (numerical) features describing the constraint. If this constraint is not valid, returns `None`.
+* `instance.get_lazy_constraint_category(cid)`, which returns a category (a string or any other hashable type) for each lazy constraint, indicating which ML model to use. By default, returns `"default"`.
+
+
+Assuming that trained ML models are available, immediately after calling `solver.solve`, MIPLearn will call `get_constraint_features` for each lazy constraint identifier found in the training set. For constraints such that `get_constraint_features` returns a vector (instead of `None`), MIPLearn will call `get_constraint_category` to decide which trained ML model to use. It will then query the ML model to decide whether the constraint should be initially enforced. Assuming that the ML predicts this constraint will be necessary, MIPLearn calls `build_violated_constraints` then adds the returned list of Pyomo constraints to the model. The optimization then starts. When no trained ML models are available, this entire initial process is skipped, and MIPLearn behaves like a conventional solver.
+
+After the optimization process starts, MIPLearn will periodically call `find_violated_lazy_constraints` to verify if the current solution violates any lazy constraints. If any violated lazy constraints are found, MIPLearn will call the method `build_violated_lazy_constraints` and add the returned constraints to the formulation.
+
+```{tip}
+When implementing `find_violated_lazy_constraints(self, model)`, the current solution may be accessed through `self.solution[var_name][index]`.
+```
+
+## Obtaining heuristic solutions
+
+By default, `LearningSolver` uses Machine Learning to accelerate the MIP solution process, while maintaining all optimality guarantees provided by the MIP solver. In the default mode of operation, for example, predicted optimal solutions are used only as MIP starts.
+
+For more significant performance benefits, `LearningSolver` can also be configured to place additional trust in the Machine Learning predictors, by using the `mode="heuristic"` constructor argument. When operating in this mode, if a ML model is statistically shown (through *stratified k-fold cross validation*) to have exceptionally high accuracy, the solver may decide to restrict the search space based on its predictions. The parts of the solution which the ML models cannot predict accurately will still be explored using traditional (branch-and-bound) methods.  For particular applications, this mode has been shown to quickly produce optimal or near-optimal solutions (see [references](about.md#references) and [benchmark results](benchmark.md)).
+
+
+```{danger}
+The `heuristic` mode provides no optimality guarantees, and therefore should only be used if the solver is first trained on a large and representative set of training instances. Training on a small or non-representative set of instances may produce low-quality solutions, or make the solver incorrectly classify new instances as infeasible.
+```
+
+## Scaling Up
+
+### Saving and loading solver state
+
+After solving a large number of training instances, it may be desirable to save the current state of `LearningSolver` to disk, so that the solver can still use the acquired knowledge after the application restarts. This can be accomplished by using the the utility functions `write_pickle_gz` and `read_pickle_gz`, as the following example illustrates:
+
+```python
+from miplearn import LearningSolver, write_pickle_gz, read_pickle_gz
+
+# Solve training instances
+training_instances = [...]
+solver = LearningSolver()
+for instance in training_instances:
+    solver.solve(instance)
+
+# Train machine-learning models
+solver.fit(training_instances)
+
+# Save trained solver to disk
+write_pickle_gz(solver, "solver.pkl.gz")
+
+# Application restarts...
+
+# Load trained solver from disk
+solver = read_pickle_gz("solver.pkl.gz")
+
+# Solve additional instances
+test_instances = [...]
+for instance in test_instances:
+    solver.solve(instance)
+```
+
+
+### Solving instances in parallel
+
+In many situations, instances can be solved in parallel to accelerate the training process. `LearningSolver` provides the method `parallel_solve(instances)` to easily achieve this:
+
+```python
+from miplearn import LearningSolver
+
+training_instances = [...]
+solver = LearningSolver()
+solver.parallel_solve(training_instances, n_jobs=4)
+solver.fit(training_instances)
+
+# Test phase...
+test_instances = [...]
+solver.parallel_solve(test_instances)
+```
+
+
+### Solving instances from the disk
+
+In all examples above, we have assumed that instances are available as Python objects, stored in memory. When problem instances are very large, or when there is a large number of problem instances, this approach may require an excessive amount of memory. To reduce memory requirements, MIPLearn can also operate on instances that are stored on disk, through the `PickleGzInstance` class, as the next example illustrates.
+
+```python
+import pickle
+from miplearn import (
+    LearningSolver,
+    PickleGzInstance,
+    write_pickle_gz,
+)
+
+# Construct and pickle 600 problem instances
+for i in range(600):
+    instance = MyProblemInstance([...])
+    write_pickle_gz(instance, "instance_%03d.pkl" % i)
+        
+# Split instances into training and test
+test_instances  = [PickleGzInstance("instance_%03d.pkl" % i) for i in range(500)]
+train_instances = [PickleGzInstance("instance_%03d.pkl" % i) for i in range(500, 600)]
+
+# Create solver
+solver = LearningSolver([...])
+
+# Solve training instances 
+solver.parallel_solve(train_instances, n_jobs=4)
+
+# Train ML models
+solver.fit(train_instances)
+
+# Solve test instances 
+solver.parallel_solve(test_instances, n_jobs=4)
+```
+
+
+By default, `solve` and `parallel_solve` modify files in place. That is, after the instances are loaded from disk and solved, MIPLearn writes them back to the disk, overwriting the original files. To discard the modifications instead, use `LearningSolver(..., discard_outputs=True)`. This can be useful, for example, during benchmarks.
+
+## Running benchmarks
+
+MIPLearn provides the utility class `BenchmarkRunner`, which simplifies the task of comparing the performance of different solvers. The snippet below shows its basic usage:
+
+```python
+from miplearn import BenchmarkRunner, LearningSolver
+
+# Create train and test instances
+train_instances = [...]
+test_instances  = [...]
+
+# Training phase...
+training_solver = LearningSolver(...)
+training_solver.parallel_solve(train_instances, n_jobs=10)
+
+# Test phase...
+benchmark = BenchmarkRunner({
+    "Baseline": LearningSolver(...),
+    "Strategy A": LearningSolver(...),
+    "Strategy B": LearningSolver(...),
+    "Strategy C": LearningSolver(...),
+})
+benchmark.fit(train_instances)
+benchmark.parallel_solve(test_instances, n_jobs=5)
+benchmark.write_csv("results.csv")
+```
+
+The method `fit` trains the ML models for each individual solver. The method `parallel_solve` solves the test instances in parallel, and collects solver statistics such as running time and optimal value. Finally, `write_csv` produces a table of results. The columns in the CSV file depend on the components added to the solver.
+
+## Current Limitations
+
+* Only binary and continuous decision variables are currently supported. General integer variables are not currently supported by some solver components.