Alternating optimization is an iterative procedure for optimizing some function jointly over all parameters by alternating restricted optimization over individual parameter subsets.

More precisely, consider minimizing or maximizing \(f:\mathbb{R}^n \to \mathbb{R}\) over the set of feasible points \(x \in X \subseteq \mathbb{R}^n\). The underlying algorithm of alternating optimization is as follows:

1. Assign an initial value for \(x\).

2. Optimize \(f\) with respect to a subset of parameters \(\tilde{x}\) while holding the other parameters constant. (Note that alternating optimization is a generalization of joint optimization, where the only parameter subset would be the whole set of parameters.)

3. Replace the values in \(x\) by the optimal values for \(\tilde{x}\) found in step 2.

4. Repeat from step 2 with another parameter subset.

5. Stop when the process has converged or an iteration limit has been reached.
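The steps above can be sketched in base R. This is a minimal illustration, not the package's implementation; the objective function, bounds, and limits are chosen for this example only. Each pass optimizes one coordinate at a time with `stats::optimize()`.

```r
# Illustrative objective with its minimum at (1, 1)
f <- function(x) (x[1] - 1)^2 + 2 * (x[2] - x[1])^2

x <- c(0, 0)                                     # step 1: initial value
for (iteration in 1:50) {                        # step 5: iteration limit
  x_old <- x
  for (j in seq_along(x)) {                      # steps 2-4: cycle through subsets {1}, {2}
    g <- function(xj) { x[j] <- xj; f(x) }       # f restricted to parameter j
    x[j] <- optimize(g, lower = -5, upper = 5, tol = 1e-8)$minimum
  }
  if (sqrt(sum((x - x_old)^2)) < 1e-6) break     # step 5: convergence check
}
x  # close to c(1, 1)
```

Each inner step is a one-dimensional optimization, which is why alternating optimization only requires a univariate (or lower-dimensional) solver for each subset.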

Alternating optimization can be useful in the following situations:

- When the joint optimization is numerically difficult or not feasible.

- When there is a natural division of the parameters. That is the case, e.g., for likelihood functions, where the parameter space naturally divides into parameter subsets corresponding to linear effects, variances, and covariances with different influence on the likelihood value.

- To improve optimization time in some cases; see (Hu and Hathaway 2002) for an example.

- Compared to joint optimization, alternating optimization may be better at bypassing local optima; see (James C. Bezdek and Hathaway 2002).

Alternating optimization (under certain conditions on \(f\)) can converge to the global optimum. However, the set of possible solutions also contains saddle points (James C. Bezdek et al. 1987).

Bezdek and Hathaway (2003) show that alternating optimization under reasonable assumptions is locally \(q\)-linearly convergent.
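For context, a sequence \(x^{(k)} \to x^*\) converges \(q\)-linearly if there is a rate \(r \in (0, 1)\) such that

\[
\lVert x^{(k+1)} - x^* \rVert \leq r \, \lVert x^{(k)} - x^* \rVert
\]

for all sufficiently large \(k\), i.e., the error eventually shrinks by at least a constant factor in every iteration.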

As an application, we consider minimizing Himmelblau’s function in \(n = 2\) dimensions with parameter constraints \(-5 \leq x_1, x_2 \leq 5\):

```
library("ao")
#> Loading required package: optimizeR
#> Thanks for using {optimizeR} 0.3.2.
himmelblau <- function(x) (x[1]^2 + x[2] - 11)^2 + (x[1] + x[2]^2 - 7)^2
ao(f = himmelblau, p = c(0, 0), partition = list(1, 2),
   base_optimizer = optimizer_optim(lower = -5, upper = 5, method = "L-BFGS-B"))
#> $optimum
#> [1] 1.940035e-12
#>
#> $estimate
#> [1] 3.584428 -1.848126
#>
#> $sequence
#>    iteration partition         time       p1        p2
#> 1          0         0 0.0000000000 0.000000  0.000000
#> 2          1         1 0.0087349415 3.395691  0.000000
#> 3          1         2 0.0003659725 3.395691 -1.803183
#> 4          2         1 0.0003030300 3.581412 -1.803183
#> 5          2         2 0.0002748966 3.581412 -1.847412
#> 6          3         1 0.0003690720 3.584381 -1.847412
#> 7          3         2 0.0002288818 3.584381 -1.848115
#> 8          4         1 0.0002319813 3.584427 -1.848115
#> 9          4         2 0.0001449585 3.584427 -1.848126
#> 10         5         1 0.0001530647 3.584428 -1.848126
#> 11         5         2 0.0001459122 3.584428 -1.848126
#>
#> $time
#> Time difference of 0.01552892 secs
```

The call minimizes `f` by alternately optimizing with respect to each parameter separately, with all parameters initialized at the value 0.
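The reported optimum can be checked by evaluating the objective at the returned estimate (the function is re-defined here so the check is self-contained):

```r
# Himmelblau's function, as defined in the call above
himmelblau <- function(x) (x[1]^2 + x[2] - 11)^2 + (x[1] + x[2]^2 - 7)^2

estimate <- c(3.584428, -1.848126)  # $estimate from the output above
himmelblau(estimate)                # essentially zero, matching $optimum
```

The point \((3.584428, -1.848126)\) is one of the four global minima of Himmelblau’s function, all of which have function value 0.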

The `ao()` function has the following arguments:

`f`
: The function to be optimized, returning a single numeric value. The first argument of `f` must be a numeric vector of the length of `p`, followed by any other arguments specified by the `...` argument.

`p`
: Starting parameter values for the optimization.

`...`
: Additional arguments to be passed to `f`.

`partition`
: A list of vectors of indices of `p`, specifying the partition for the alternating optimization. The default is `as.list(1:length(p))`, i.e., each parameter is optimized separately. Parameter indices can be members of multiple subsets.

`base_optimizer`
: An object of class `optimizer`, which can be specified via `optimizeR::set_optimizer()`. The default optimizer is `stats::optim()`.

`iterations`
: The number of iterations through the partition. The default is `10`.

`tolerance`
: A non-negative numeric value. The function terminates prematurely if the Euclidean distance between the current solution and the one from the last iteration is smaller than `tolerance`. The default is `1e-6`.

`print.level`
: The level of printing during the optimization process. Three values (analogous to `stats::nlm()`) can be specified: `0` (the default) for no printing, `1` to print initial and final details, and `2` for full tracing information.

`plot`
: If `TRUE`, the parameter updates are plotted. The default is `FALSE`.
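Two of these arguments can be illustrated with base R alone: the default value of `partition` and the stopping rule behind `tolerance`. The numeric values below are made up for illustration.

```r
# default partition for a 3-parameter problem: each parameter separately
p <- c(0, 0, 0)
as.list(1:length(p))
# list(1:3) would instead mean joint optimization, and
# list(c(1, 2), c(2, 3)) is a valid partition with parameter 2
# appearing in two subsets

# the tolerance rule compares successive solutions by Euclidean distance
current <- c(3.584428, -1.848126)
last    <- c(3.584427, -1.848126)
sqrt(sum((current - last)^2)) < 1e-6  # FALSE here: distance is exactly 1e-6
```

In the example run above, the distance between iterations 5 and 6 would fall below the default `tolerance` of `1e-6`, which is why the sequence stops after five iterations through the partition.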

Bezdek, James C., and Richard J. Hathaway. 2002. “Some Notes on
Alternating Optimization.” *Proceedings of the 2002 AFSS
International Conference on Fuzzy Systems. Calcutta: Advances in Soft
Computing*. https://doi.org/10.1007/3-540-45631-7_39.

Bezdek, James C., Richard J. Hathaway, Michael J. Sabin, and William T.
Tucker. 1987. “Convergence Theory for Fuzzy c-Means:
Counterexamples and Repairs.” *IEEE Transactions on Systems,
Man, and Cybernetics* 17 (5): 873–77. https://doi.org/10.1109/TSMC.1987.6499296.

Bezdek, James, and Richard Hathaway. 2003. “Convergence of
Alternating Optimization.” *Neural, Parallel and Scientific
Computations* 11 (December): 351–68.

Hu, Yingkang, and Richard J. Hathaway. 2002. “On Efficiency of
Optimization in Fuzzy c-Means.” *Neural, Parallel and
Scientific Computations* 10.