Firefly penalty-based algorithm for bound constrained mixed-integer nonlinear programming

In this article, we extend the firefly algorithm (FA) to solve bound constrained mixed-integer nonlinear programming (MINLP) problems. An exact penalty continuous formulation of the MINLP problem is used. The continuous penalty problem is obtained by relaxing the integrality constraints and by adding to the objective function a penalty term that penalizes integrality constraint violation. Two penalty terms are proposed: one is based on the hyperbolic tangent function and the other on the inverse hyperbolic sine function. We prove that both penalties can be used to define the continuous penalty problem, in the sense that it is equivalent to the MINLP problem. The solutions of the penalty problem are obtained using a variant of the metaheuristic FA for global optimization. Numerical experiments are carried out on a set of benchmark problems to analyze the quality of the obtained solutions and the convergence speed. We show that, in terms of convergence speed, the firefly penalty-based algorithm compares favourably with the penalty algorithm when the deterministic DIRECT or the simulated annealing solvers are invoked.


Introduction
This article aims to analyze the merit, performance-wise, of a penalty approach for globally solving mixed-integer nonlinear programming (MINLP) problems. A continuous relaxation of the MINLP problem is carried out by converting it to a finite sequence of nonlinear programming (NLP) problems with only continuous variables. The problem to be addressed has the form

min f(x)
subject to x ∈ C ⊂ R^n (a compact set)
x_i ∈ R for i ∈ I_c ⊆ I ≡ {1, . . . , n}
x_j ∈ Z for j ∈ I_d ⊆ I (in particular, x_j ∈ {0, 1})
I_c ∩ I_d = ∅ and I_c ∪ I_d = I,    (1)

where f is a nonlinear continuous function. The study carried out in this article assumes that the compact set C is C = {x ∈ R^n : l_i ≤ x_i ≤ u_i, i ∈ I}. Since x_j ∈ Z for j ∈ I_d, the feasible region of problem (1) is W = {x ∈ C : x_j ∈ Z, j ∈ I_d}. This way, |I_c| is the number of continuous variables, |I_d| gives the number of integer variables, and l and u are the vectors of the lower and upper bounds on the variables, respectively.
The continuous relaxation of an MINLP is obtained by relaxing the integrality conditions from x j ∈ Z, j ∈ I d to x j ∈ R, j ∈ I d . Many approaches using continuous relaxation have been devised to solve MINLP problems. [1] The most used and simple approach solves the continuous relaxation of the MINLP problem followed by rounding off the continuous values (of the integer variables) to their nearest integral values. This type of approach does not work well on problems with a large number of variables.
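The relax-and-round approach just described can be sketched in a few lines; `solve_relaxation` is a hypothetical stand-in for any continuous NLP solver and is not a routine used in this article.

```python
import numpy as np

def relax_and_round(solve_relaxation, lower, upper, int_idx):
    """Naive relax-and-round heuristic: solve the continuous relaxation
    of the MINLP, then round the integer variables to their nearest
    integral values and clip the point back to the bounds [l, u].
    `solve_relaxation` is a placeholder for a continuous NLP solver."""
    x = np.asarray(solve_relaxation(lower, upper), dtype=float)
    x[int_idx] = np.round(x[int_idx])   # round only the integer variables
    return np.clip(x, lower, upper)     # keep the point inside [l, u]
```

As noted above, such rounding may leave the point far from a true mixed-integer optimum, which is precisely what motivates the penalty reformulation studied here.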
In general, the presence of integer variables implies that the feasible region W is not convex. We may distinguish convex from nonconvex MINLP problems. If the problem functions are convex, the MINLP problem is called convex; otherwise, it is called nonconvex. A convex MINLP problem is easier to solve than a nonconvex one, since its continuous relaxation is itself a convex problem, and therefore, likely to be tractable, at least in theory. Surveys on integer and mixed-integer programming can be found in [2,3].
Exact and heuristic methods have been proposed to solve MINLP problems. For convex MINLP, some quite effective exact methods that have been devised based on the convexity property include branch-and-bound (BB), [4,5] outer approximation, [6] branch-and-reduce, [7] branch-and-cut, generalized Benders decomposition, LP/NLP-based BB and hybrid methods. [2,8] They are capable of solving large problems with hundreds of variables, although the computational time required to reach a solution may grow exponentially. By contrast, the continuous relaxation of a nonconvex MINLP is itself a global optimization problem, and therefore likely to be NP-hard. Spatial BB has emerged from a combination of the standard BB approach and standard NLP methods, as implemented, for example, in the sBB and BARON solvers. COUENNE (Convex Over- and Under-ENvelopes for Nonlinear Estimation) is a popular spatial BB solver for nonconvex MINLP. [4] The main issues addressed therein are related to bound tightening and branching strategies. When solving nonconvex MINLP, most algorithms have difficulty in reaching a feasible point. This issue has been addressed in the literature by feasibility pumps. [9,10] Recent derivative-free methods for locating a local minimizer of MINLP problems are presented in [11][12][13][14][15]. In the first paper, the generalized pattern search algorithm for linearly constrained (continuous) optimization was extended to mixed variable problems, and the constraints are treated by the extreme barrier approach. In [12,13], pattern search-based methods are used to tackle continuous and discrete variables simultaneously. The approaches in [14,15] use a discrete line search to deal with the discrete variables, although in [15], a simple penalty is used to handle the nonlinear constraints.
Recently, exact penalty approaches have also been extended to general nonlinear integer programming problems. Furthermore, it has been shown that a general class of penalty functions can be used to solve general MINLP problems. [1,[16][17][18][19] When a global solution to a nonconvex NLP problem is required, global optimization solvers capable of converging to a global minimizer may be used, for example, the exact methods BARON, [18,20] αBB [21] and DIRECT. [22][23][24] In the context of penalty function techniques for solving general constrained NLP problems, a diversity of metaheuristic algorithms (such as differential evolution, the electromagnetism-like mechanism, genetic algorithms and the artificial fish swarm algorithm) have appeared in the literature. [25][26][27][28][29] Recently, a metaheuristic algorithm termed the firefly algorithm (FA), which mimics the social behaviour of fireflies based on their flashing and attraction characteristics, has been developed for continuous optimization. [30,31] Several variants of the firefly algorithm already exist in the literature. Improvements have been made to FA aiming to accelerate convergence (see, e.g. [32][33][34]). A Gaussian FA is proposed in [35] and a hybrid FA with harmony search in [36]. Another modified FA is presented in [37], allowing the movement of the brightest firefly along a direction randomly chosen from a set of directions, provided that its brightness improves. A convergence analysis of FA based on parameter selection is presented in [38]. A recent review of firefly algorithms is available in [39].
Our contribution in this article is directed to the area of continuous reformulation of bound constrained MINLP problems by relaxing the integrality conditions and adding suitable penalty terms to the objective function. Two penalty terms are proposed: one is based on the hyperbolic tangent function and the other on the inverse hyperbolic sine function. These proposals satisfy the required assumptions to prove that the bound constrained MINLP problem is equivalent to a continuous penalty relaxation, in the sense that they have the same global minimizers. Furthermore, for the NLP problems, a new fitness-based adaptive scheme is incorporated into the firefly movement equation, within the metaheuristic FA, to globally solve the continuous penalty problem. This new adaptive FA is less dependent on user-provided parameters since only one control parameter is required to switch from global to local search.
The remaining part of this article is organized as follows. Section 2 describes the two proposed continuous penalty terms, including their properties and Section 3 reviews FA and presents the new ideas concerning the fitness-based adaptive firefly movement. Section 4 contains the results of all the numerical experiments and the conclusions are summarized in Section 5.

Penalty function technique
This section is concerned with the exact penalty approach that can be extended to solve MINLP problems. In this context, problem (1) is equivalent to the following continuous reformulation, which is obtained by relaxing the integer constraints on the variables and adding a particular penalty term to the objective function:

min ψ(x; ε) ≡ f(x) + P(x; ε) subject to x ∈ C,    (3)

where ε ∈ (0, ε̄] and P(x; ε) is the penalty term; the equivalence is in the sense that the two problems have the same global minimizers. The resulting penalty function in the NLP problem (3) is termed 'exact' since the two problems have the same global minimizers for a sufficiently small value of the penalty parameter ε. [17,18] The following assumptions about f and the penalty P are considered.

Assumption 2.2: For all x, y ∈ W and for all ε ∈ R⁺, where L̄ > L and p is chosen as in (4). Furthermore, letting S = ∪_{z∈W} S(z), there exists x̄ ∉ S such that P(x̄; ε) ≥ P(x; ε) for all x ∈ C \ S and for all ε > 0. An example of the class of penalty terms that can be used to solve MINLP problems is the 'log'-based penalty (9) proposed in [1,17,18]. In [17], assuming that f satisfies Assumption 2.1, it is shown that penalty (9) satisfies Assumptions 2.2 and 2.3. Then, Theorem 2.4 can be applied, and it has been concluded that there exists ε̄ > 0 such that for any ε ∈ (0, ε̄], problem (1) and problem (3) based on penalty (9) have the same global minimizers.
Different penalty terms may be defined for problem (3), as long as some properties are maintained. Our proposals consider the hyperbolic tangent and the inverse hyperbolic sine functions. For these functions, we prove that Assumptions 2.2 and 2.3 hold.

The hyperbolic tangent penalty
The first newly proposed penalty term for solving MINLP problems via the equivalence between problems (1) and (3) is based on the hyperbolic tangent function, tanh(·), which is differentiable and strictly increasing. The penalty term P_t takes the form given in (10), where ε ∈ R⁺ is the penalty parameter. According to Theorem 2.4, we prove that penalty P_t can be used in problem (3). Property 2.5: For the penalty term in (10), there exists ε̄ > 0 such that for any ε ∈ (0, ε̄], problems (1) and (3) have the same global minimizers.
The proof of this result is closely related to the proof given in [17] and is presented in Appendix 1. Figure 1 shows the behaviour of the hyperbolic tangent penalty (10) as ε decreases (for ε = 1, 0.5, 0.25, 0.125, 0.0625, 0.03125). The figure also shows the 'log' penalty (P) in (9) for ε = 1 and ε = 0.03125. We observe that the penalty in (10) increases faster than the 'log' penalty as ε decreases and the integrality violation increases.
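For illustration, a possible shape of a 'tanh'-based integrality penalty can be sketched as follows; the exact expression is the one given in (10), and the distance-to-nearest-integer form used here is only an assumption that reproduces the qualitative behaviour shown in Figure 1.

```python
import numpy as np

def tanh_penalty(x, int_idx, eps):
    """Hypothetical sketch of a 'tanh'-based integrality penalty:
    sums tanh(d_j / eps) over the integer variables, where d_j is the
    distance from x_j to its nearest integer.  This is an illustrative
    assumption, not the paper's exact expression (10)."""
    d = np.abs(x[int_idx] - np.round(x[int_idx]))  # integrality violation
    return float(np.sum(np.tanh(d / eps)))
```

Decreasing `eps` steepens the penalty, so any non-integral x_j is punished more severely, matching the behaviour described for Figure 1.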

The inverse hyperbolic sine penalty
The inverse hyperbolic sine, asinh(t), is a differentiable function for all real t that is suitable to define the penalty term in problem (3) when solving MINLP problems like (1). The function asinh(t) is nonnegative and strictly increasing on [0, +∞). The penalty term P_a takes the form given in (11), where ε > 0 is the penalty parameter. The behaviour of the inverse hyperbolic sine penalty for ε = 1, 0.5, 0.25, 0.125, 0.0625, 0.03125 is shown in Figure 2. For comparison, the 'log' penalty (P) for ε = 1 and ε = 0.03125 is also displayed. In this case, as ε decreases and the integrality violation increases, the behaviour of penalty (11) is somewhat similar to that of the 'log' penalty. Once again, using Theorem 2.4, we can prove the following. Property 2.6: For the penalty term in (11), there exists ε̄ > 0 such that for any ε ∈ (0, ε̄], problems (1) and (3) have the same global minimizers.
The proof of this result is closely related to the proof given in [17] and is presented in Appendix 1.
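A companion sketch of an 'asinh'-based integrality penalty, under the same assumption as before (a distance-to-nearest-integer argument, which is illustrative and not the paper's exact expression (11)):

```python
import numpy as np

def asinh_penalty(x, int_idx, eps):
    """Hypothetical sketch of an 'asinh'-based integrality penalty:
    sums asinh(d_j / eps) over the integer variables, where d_j is the
    distance from x_j to its nearest integer.  Illustrative assumption
    only, not the paper's exact expression (11)."""
    d = np.abs(x[int_idx] - np.round(x[int_idx]))  # integrality violation
    return float(np.sum(np.arcsinh(d / eps)))
```

Like the 'tanh' version, this term vanishes at mixed-integer points and grows as `eps` shrinks, but without saturating at 1, which is consistent with its 'log'-like behaviour in Figure 2.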

The exact penalty algorithm
We now briefly describe the exact penalty method aiming to find a global minimizer of an MINLP problem; [18] see Algorithm 1. In this article, we implement the exact penalty method with the newly proposed penalty terms (10) and (11), since they satisfy Assumptions 2.2 and 2.3, as shown in Subsections 2.1 and 2.2. To compute a δ(k) approximation to the global minimizer of the penalty problem (3) that satisfies condition (12), for fixed ε(k) and δ(k), we propose the stochastic population-based strategy known as FA. The motivation and advantages for using FA (in Algorithm 2 below) are described in the next section. The penalty parameter is updated as ε(k+1) = σ_ε ε(k), σ_ε ∈ (0, 1), when the computed approximate global minimizer x(k) of the penalty function ψ(x; ε(k)) is not feasible for problem (1) and condition (13) holds, where z(k) ∈ W is the point that minimizes the distance between x(k) and W, obtained by rounding each x_j(k), j ∈ I_d, to the nearest integer and setting z_i(k) = x_i(k) for i ∈ I_c (see [18]). On the other hand, the parameter ε(k) remains unchanged when x(k) is feasible, or when x(k) is infeasible but condition (13) is not satisfied.
The algorithm terminates when a feasible solution is found within δ min of the known global optimum solution. However, if the optimal value is not known, the algorithm may use another condition, for example, one based on the number of iterations, k > k max , for a given threshold value k max (or on the total number of function evaluations, nf > nf max , where nf max is the threshold value).
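The outer loop of the method can be sketched as follows, under stated assumptions: `solve_penalty_problem` is a placeholder for the global search on ψ (here, the FA of the next section), and the paper's condition (13) is simplified to a plain near-integrality test on the iterate.

```python
import numpy as np

def exact_penalty_loop(f, penalty, solve_penalty_problem, int_idx,
                       eps=10.0, sigma_eps=0.1, sigma_min=1e-5, k_max=20):
    """Sketch of the exact penalty method (Algorithm 1).
    `solve_penalty_problem(psi)` stands for the global search on
    psi(x) = f(x) + penalty(x, eps); the update test below simplifies
    condition (13) to a near-integrality check on the iterate."""
    x = None
    for _ in range(k_max):
        psi = lambda v, e=eps: f(v) + penalty(v, e)   # penalty problem (3)
        x = solve_penalty_problem(psi)                # approx. global minimizer
        z = x.copy()
        z[int_idx] = np.round(z[int_idx])             # nearest mixed-integer point
        if np.max(np.abs(x - z)) <= sigma_min:        # x is (nearly) feasible
            break
        eps *= sigma_eps                              # shrink the penalty parameter
    return x, eps
```

For instance, with f(x) = (x − 1.2)² and one integer variable, the loop first finds a non-integral minimizer of ψ, shrinks ε, and then converges to the mixed-integer optimum x = 1.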
The convergence results presented in [18] for a general equivalence framework, where a general constrained NLP problem is considered equivalent to a sequence of penalty problems by adding a suitable penalty term to the objective function (in the sense that they have the same global minimizers), have been extended to the exact penalty method with the penalty term (9), for solving an MINLP problem via an NLP continuous reformulation.
Based on the results reported in [18], we now state the main convergence results of Algorithm 1. We assume that the sequence of iterates {x(k)} is well defined, i.e. the approximate global minimizer x(k) of the penalty function ψ(·) that satisfies condition (12) can be found. Since we use the metaheuristic FA to compute the approximate global minimizer, there is no theoretical guarantee that the approximate solution can be found in finite time. Relying on probability theory, a stochastic convergence may be established for FA. This type of convergence is different from the convergence concept of classical analysis. The probabilistic version of pointwise convergence from elementary real analysis is convergence with probability one. This means that, in the limit, the convergence of FA to an approximate minimizer of the penalty function is guaranteed with probability one. This issue will be addressed in a future study. Lemma 2.7: Let {x(k)} be the sequence of iterates produced by Algorithm 1. Then either • there exists k̄ such that ε(k) = ε̄, for any k ≥ k̄, and every accumulation point x̄ of the sequence {x(k)} is feasible (x̄ ∈ W); or • ε(k) → 0 and every accumulation point x̄ of a subsequence {x(k_i)} is feasible, with k_i belonging to the set of indices in which condition (13) is satisfied.
The following lemma states that the parameter ε is updated only a finite number of times. Lemma 2.8: Let {x(k)} and {ε(k)} be sequences produced by Algorithm 1. Then there exists k̄ such that ε(k) = ε̄, for any k ≥ k̄.
The following result is a consequence of Lemmas 2.7 and 2.8 and of δ(k) → 0. Theorem 2.9: Every accumulation point x̄ of the sequence of iterates {x(k)} produced by Algorithm 1 is a global minimizer of problem (1).
Since our penalty term proposals satisfy Assumptions 2.2 and 2.3, the proofs of Lemmas 2.7 and 2.8 and Theorem 2.9 follow the same arguments used in the proofs of Lemmas 1, 2 and Theorem 2 in [18].

Solving NLP continuous problems
The firefly algorithm is a bio-inspired metaheuristic that is capable of converging to a global solution of an optimization problem. It is inspired by the flashing behaviour of fireflies at night. According to [30,31,37], the three main rules used to construct FA are the following: • all fireflies are unisex, meaning that any firefly can be attracted to any other brighter one; • the brightness of a firefly is determined from the encoded objective function; • attractiveness is directly proportional to brightness but decreases with distance.
The performance of any metaheuristic depends on the diversification (or exploration) and intensification (or exploitation) features of the algorithm. Diversification aims to explore the search space and to compute a diversity of solutions. The exploration aspect of a metaheuristic relies on randomization. Intensification aims to locally exploit a region identified as a promising one. FA is a population-based metaheuristic that does not require any derivative information and its performance does not depend on the properties of the objective function. The balance between local and global searches is crucial for the success of FA. Both searches emerge from the equation of the firefly movement (see Equation (14) below), in particular, they depend on how the control parameters vary. Furthermore, the performance of the algorithm is only moderately affected by the size of the population of fireflies. The algorithm is not dependent on large population sizes to be able to converge to high-quality solutions.

Classical FA
We use the point x = (x_1, x_2, . . . , x_n)^T to represent the location of a firefly in the search space C, and the position of the jth point of a population of m points is represented by x_j ∈ R^n. In the classical FA, a firefly i is attracted to another, brighter firefly j, and the movement may be determined by

x_i = x_i + β (x_j − x_i) + α L(0, 1) σ_i, with β = β_0 exp(−γ r_ij²),    (14)

where the second term on the right-hand side of (14) is due to the attraction, while the third term is due to randomization, with α ∈ (0, 1) being the randomization parameter. Here, L(0, 1) is a random number from the standard Lévy distribution (with a location parameter equal to 0, a scale parameter equal to 1 and a shape parameter equal to 0.5), and the vector σ_i, defined in (15), gives the variation of x_i relative to the position of the best firefly, x_1. The parameter β gives the attractiveness of a firefly and varies with the light intensity seen by adjacent fireflies and the distance r_ij between themselves; β_0 is the attraction parameter when the distance is zero. The parameter γ characterizes the variation of the attractiveness and is crucial to speed up the convergence of the algorithm. In theory, γ could take any value in [0, ∞). When γ → 0, the attractiveness is constant, β = β_0, and a flashing firefly can be seen anywhere in the domain. When γ → ∞, the attractiveness is almost zero in the sight of other fireflies, i.e. the fireflies are short-sighted. In particular, when β_0 = 0, the algorithm behaves like a simple random walk. In summary, the three control parameters of FA are: the randomization parameter α, the attractiveness β and the absorption coefficient γ. [39,40] In the herein presented implementation of FA, we use previously proposed updates for α and γ that decrease with the iteration counter, k_FA, of the algorithm. [41] Thus, α decreases linearly with k_FA, from an upper level α_max to a lower level α_min:

α = α_max − (α_max − α_min) k_FA / k_FA_max,    (16)

where k_FA_max is the threshold number of allowed iterations.
In order to make the attraction term on the right-hand side of (14) dominate the movement mostly at the end of the iterative process, γ is also decreased with k_FA, from γ_max to γ_min, as in (17). Algorithm 2 presents the main steps of a classical FA. We note that the best point of the population is the position of the brightest firefly. When fireflies are ranked according to their fitness, i.e. according to the objective value ψ_k(·) ≡ ψ(·; ε(k)), the best firefly is x_1.
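The classical movement and the linear decrease of α can be sketched as below, under stated assumptions: the Lévy draw is taken as 1/Z² with Z standard normal (one way to sample Lévy(0, 1)), and σ_i is assumed to be x_i − x_1, an illustrative form of (15).

```python
import numpy as np

def fa_move(x_i, x_j, x_best, alpha, gamma, beta0=1.0, rng=None):
    """Sketch of the classical firefly movement (14): attraction
    beta0*exp(-gamma*r^2)*(x_j - x_i) plus a Levy-distributed random
    step scaled by alpha and by sigma_i = x_i - x_best (assumed form
    of (15))."""
    rng = rng or np.random.default_rng()
    r2 = np.sum((x_i - x_j) ** 2)                 # squared distance r_ij^2
    attraction = beta0 * np.exp(-gamma * r2) * (x_j - x_i)
    L = 1.0 / rng.standard_normal() ** 2          # 1/Z^2 ~ Levy(0, 1)
    return x_i + attraction + alpha * L * (x_i - x_best)

def alpha_update(k, k_max, a_max=0.5, a_min=0.001):
    """Linear decrease of the randomization parameter, scheme (16)."""
    return a_max - (a_max - a_min) * k / k_max
```

With alpha = 0 and gamma = 0 the move collapses to a full step towards firefly j, which makes the roles of the two terms easy to check.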

Adaptive FA
An opportune parameter choice is fundamental to increase the overall efficiency of the algorithm. Here, we propose an adaptive FA in the sense that the local search activity relies on the relative variation in fitness between firefly i and each of the other brighter fireflies. The brighter the firefly (relative to firefly i), the higher the attraction, i.e. when ψ_k(x_j) < ψ_k(x_l) < ψ_k(x_i), the attractiveness is higher when firefly i moves towards firefly j than towards firefly l, being equal to one when the movement is towards the brightest firefly x_1. In the context of solving the NLP continuous problem (3), the movement that results from firefly i being attracted to the brighter firefly j is defined by

x_i = x_i + exp(−α) β_ij (x_j − x_i) + α L(0, 1) σ_i, with β_ij = (ψ_k(x_i) − ψ_k(x_j)) / (ψ_k(x_i) − ψ_k(x_1)),    (18)

where the attractiveness β_ij depends on the variation in fitness ψ between firefly i and the brighter firefly j, relative to the difference between firefly i and the brightest firefly of all. Thus, it has the ability to easily adapt to the problem landscape. The randomization term is similar to that of (14). We note that the attraction term of Equation (18) is scaled by the factor exp(−α), which aims to enforce its dominance at the end of the iterative process, while the third term dominates at the beginning of the iterations (see the updating scheme (16) referred to above). The adaptive FA possesses good exploring as well as exploiting capabilities. Unlike the classical and other variants of FA, the adaptive FA has just one control parameter.
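The adaptive movement can be sketched as follows, with the same assumptions on the random term as in the classical sketch (1/Z² Lévy draw, σ_i = x_i − x_best); the fitness-ratio attractiveness follows the description of (18) and assumes firefly i is not itself the brightest.

```python
import numpy as np

def adaptive_move(x_i, x_j, x_best, psi_i, psi_j, psi_best, alpha, rng=None):
    """Sketch of the adaptive firefly movement (18): the attractiveness
    is the relative fitness variation (psi_i - psi_j)/(psi_i - psi_best),
    equal to 1 when the move is towards the brightest firefly, and the
    attraction term is scaled by exp(-alpha) so it dominates late in
    the iterations (assumes psi_i > psi_best, i.e. i is not the best)."""
    rng = rng or np.random.default_rng()
    beta = (psi_i - psi_j) / (psi_i - psi_best)   # in (0, 1], fitness-based
    L = 1.0 / rng.standard_normal() ** 2          # Levy-type random factor
    return x_i + np.exp(-alpha) * beta * (x_j - x_i) + alpha * L * (x_i - x_best)
```

In particular, with alpha = 0 and x_j = x_best, the attractiveness is 1 and the firefly jumps straight onto the brightest position, illustrating the "equal to one" case described above.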

Numerical experiments
Several sets of experiments were carried out aiming to compare the convergence speed and the solution quality produced by the algorithm when tested with both penalty terms P_t(·) in (10) and P_a(·) in (11). Convergence speed is measured by comparing the number of function evaluations needed to reach a certain threshold of the objective function value (available in the literature). Solution quality is measured in terms of the difference between the obtained result and the best known optimal value, f*.
The numerical experiments were carried out on a PC with an Intel Core 2 Duo Processor E7500 at 2.9 GHz and 4 GB of RAM. The algorithms were coded in Matlab Version 8.1 (R2013a).

Experimental setting
During most experiments, the size of the population in the 'FA' and 'adaptive FA' is made to depend on the problem's dimension and is set to m = min{5n, 50}. Twenty-two instances of 14 MINLP problems are used for benchmarking. Seven problems have n = 2 variables, two have n = 4 variables, problem Dixon-Price is tested with n = 2 and n = 4, problem Sum Squares is tested with n = 5 and n = 10, and problems Ackley, Levy and Rastrigin are tested with n = 5, 10 and 20. See the Appendix 2 for the full description of the problems and their acronyms. The parameters in Algorithm 1 are set as follows: ε (1) = 10, δ (1) = 1, δ min = 1e-04, σ ε = 0.1, σ δ = 0.1, p = 1, k max = 20. On the other hand, in Algorithm 2 we set: k FA max = 100, α max = 0.5, α min = 0.001, γ max = 10, γ min = 0.001, and β 0 = 1.

Comparison between 'FA' and 'adaptive FA'
First, a comparison between 'FA' and the herein proposed 'adaptive FA' based on solution quality is presented. Algorithm 1 is terminated when the number of iterations exceeds k_max. Table 1 shows the difference between the objective function value of the best run (out of 10 independent runs) and the known global solution, |f_best − f*|. Both penalty terms (10) and (11) are tested. The results correspond to instances of the 14 problems with small dimensions. We conclude that 'adaptive FA' performs slightly better than 'FA', and the penalty algorithm associated with the hyperbolic tangent penalty (10) produces higher quality solutions in 8 cases against 6 for penalty (11) (among the 30 results of both variants, where 16 are ties).

Population size effect on performance
In this section, we present some results concerning the effect of the size of the population of fireflies on the algorithm performance. We use four instances with n = 5 and test three values of m: 2n, 5n and 10n. Table 2 reports |f_best − f*|. Both proposed penalties (10) and (11) are tested. Based on the results, we may conclude that the quality is higher with larger population sizes, although the difference between using m = 5n and m = 10n is barely noticeable.

Other experimental tests
Finally, we use all the instances mentioned above and analyze the performance of the exact penalty technique in terms of the quality of the obtained solutions. Table 3 shows the difference |f_best − f*| obtained after running each instance 10 times with each tested penalty, using k_max = 20. For these experiments, the penalty algorithm is terminated solely with the condition k > k_max. We also note that the population size is again set to m = min{5n, 50}. For comparison, we present the results for the penalties (10), (11) and (9). We also aim to compare the performance of the algorithm when using different values for the initial ε(1): 10 and 100. Since larger dimensional instances are now included in the test set, we allow the 'adaptive FA' to run for a maximum of 50 iterations. Apart from four instances, BL, BF1, Buk and Him, that perform evenly and extremely well with the three tested penalties, we observe that the quality of the obtained solutions is higher with penalty (10) in 42% of the tested cases (considering the total of 26 sets of runs with different |f_best − f*| values, out of 44 sets) than with penalties (11) (35% of the cases) and (9) (only 23% of the cases). We also observe that the quality is higher when ε(1) = 10 is used (71% against 29% with penalty (11), and 59% against 41% among 17 instances with penalty (9)), while it is a tie when penalty (10) is used. On the other hand, Table 4 shows the average number of function evaluations, 'nf_avg', and the standard deviation of function evaluations, 'St.D.', required to reach a feasible solution within δ_min of the known global optimum solution, i.e. the algorithm terminates when both conditions x(k) ∈ W and f(x(k)) ≤ f* + δ_min (for δ_min = 1.0e−04) are met.
During these experiments, to check whether x(k) ∈ W, we use ‖x(k) − z(k)‖_∞ ≤ σ_min, where z(k) is defined as in (13) and the tolerance error σ_min is set to 1.0e−05.
The goal now is to analyze the convergence speed of Algorithm 1 when using the proposed penalties and ε(1) = 100. For these experiments, we also set k_FA_max = 50. We note that the average number of function evaluations is computed only from the runs that terminated with the conditions x(k) ∈ W and f(x(k)) ≤ f* + δ_min. These are considered successful runs, and the table reports the percentage of successful runs, 'SR' (%). In the table, the character '-' means that only one run out of 10 was successful; thus, nf_avg is the number of function evaluations of that run and St.D. does not apply. The instances where the algorithm was not able to reach a feasible solution within the specified error tolerance, in any run, are identified with '0' in 'SR'. 'Emphasized' values in parentheses correspond to the objective function value of the best run (in the column of the average number of function evaluations) and to a measure of the proximity of the obtained solution x(k) to the known global solution x* (in the column of St.D.).
From the results, we may conclude that the herein proposed penalties (10) and (11) have comparable convergence behaviour. Penalty (10) solves 59% of the tested instances with 90-100% of successful runs (and 73% of instances with 80-100% of successful runs) and has 14% of instances with only unsuccessful runs, while penalty (11) successfully solves 68% of instances with 90-100% of successful runs (and 73% of instances with 80-100% of successful runs), but has 18% of instances with only unsuccessful runs. On the other hand, penalty (9) used in [17,18] is successful in 90-100% of the runs when solving 64% of the instances (and in 80-100% of the runs with 68% of the instances), and produces only unsuccessful runs on 23% of the instances. We note that the penalty algorithm associated with the function (10) performs slightly better than the other two cases in comparison, with an overall average percentage of successful runs of 75%, against 74% of the function (11) and 72% of penalty (9). We now use two well-known global search techniques available in the literature to solve the NLP continuous problems (3) in this exact penalty context. The first is the deterministic and exact global search procedure 'DIRECT'; [23] the second is a point-by-point stochastic global algorithm known as 'simulated annealing' (SA). [42] The function 'simulannealbnd' from the Global Optimization Toolbox™ of Matlab is invoked.
The results for comparison are reported in Table 5. The penalty algorithm is terminated with the conditions

x(k) ∈ W and f(x(k)) ≤ f* + 1.0e−03,    (19)

where the tolerance error σ_min is now set to 1.0e−03, and f(x(k)) is the solution reported by 'DIRECT' and by 'SA', or the best solution (among the m fireflies) obtained by our proposed 'adaptive FA'. For these tests, the initial value ε(1) is set to 100. While we use 'nf_avg' and 'St.D.' as criteria to analyze the convergence speed of our 'adaptive FA' and of 'SA', only the number of function evaluations, 'nf', is required as convergence criterion for 'DIRECT', as long as the obtained solution satisfies conditions (19), identified in the table with '1' in the columns marked with 'Suc'. On the other hand, if one of the conditions in (19) is not satisfied, the penalty algorithm stops after 50 iterations, the run is identified with '0' (unsuccessful run) and the best reported solution is shown 'emphasized' inside parentheses. Table 5 also displays the CPU time, 'T', in seconds, required to achieve the reported solution. For the proposed 'adaptive FA' and for 'SA', the times correspond to the values averaged over the successful runs, 'T_avg' (among 10 runs), or to the time of the best run when all of them are unsuccessful. All the other statistics concerned with the 'adaptive FA' have the same meaning as in the previous table.
For a fair comparison between 'adaptive FA', 'DIRECT' and 'SA' (when invoked in the penalty algorithm context), we use the same stopping criterion. Thus, each algorithm is terminated when the number of penalty function evaluations exceeds a threshold value, npenf_max, set to 5000. We also test 'DIRECT' with the stopping parameter values defined by default. We also run two sets of experiments with 'SA': the first stops the solver when 5000 function evaluations are reached; the second uses the stopping criteria and parameters defined by default in Matlab. We observe that the penalty strategy based on the 'adaptive FA' is capable of solving (i.e. conditions (19) are met for at least one run) 82% of the instances, and has 90-100% of successful runs in 73% of the instances. In general, the unsuccessful runs occur when solving the larger dimensional problems. The total time required to solve the entire set of tested problems is 6.33e+02 sec.
When the solver 'DIRECT' is used to solve the bound constrained NLP continuous penalty problems, we obtain different convergence speeds. The version with the setting npenf max = 5000 is able to successfully solve 55% of the instances (Suc = '1' in the table), while the version with the default parameter values solves only 32%.
The total time required by 'DIRECT' (with the setting npenf_max = 5000) to reach the reported solutions for all the problems is 4.26e+02 sec. The default 'DIRECT' version takes much less time (4.57e+00 sec) since it mostly terminates before a good solution is reached, due to the default parameter values (npenf_max = 20 and maximum number of iterations set to 10).
When the penalty approach invokes the 'SA', we conclude that both tested versions, one based on setting npenf max = 5000 and the other based on default parameters, successfully solve (with 90-100% of successful runs) a small percentage of instances, 27% and 32%, respectively. Overall, the first version produces a solution satisfying the conditions (19) in at least one run for 45% of the instances, against 55% of the second version. The total times to achieve the reported solutions are of the same order of magnitude: 1.73e+03 for 'SA' (with npenf max = 5000) and 3.92e+03 for 'SA' (with stopping default values).
From the comparison between the 'adaptive FA' and 'DIRECT' (with the setting npenf max = 5000), we conclude that the 'adaptive FA' performs better in terms of robustness (73% of successful runs against 55%), although it requires many more function evaluations. In terms of total time to achieve the reported solutions of all instances, the 'adaptive FA' takes only about 1.5 times as long as 'DIRECT'.
When a comparison is made between algorithms that converge successfully to the required solutions, the function evaluations required by the 'adaptive FA' are in general slightly higher than those of 'SA' (with the setting npenf max = 5000). However, the average times with 'SA' are generally higher than those of the 'adaptive FA'. Furthermore, the 'adaptive FA' successfully solves almost three times as many instances as 'SA'.

Conclusions
This article describes a penalty approach for solving MINLP problems that relies on a continuous reformulation of the MINLP problem by converting it to a finite sequence of nonlinear penalty problems with only continuous variables. For the two newly proposed penalty terms, 'tanh'-based and 'asinh'-based, we prove that the penalty problem and the MINLP problem are equivalent in the sense that they have the same minimizers. In the penalty-based algorithm, the global minimizer of the nonlinear penalty function is computed by a firefly algorithm. A variant of FA, coined 'adaptive FA', is proposed aiming to reduce control parameter dependence.
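The outer iteration of the penalty approach described above can be sketched as follows. This is a minimal illustration, not the exact algorithm of the article: `solve_subproblem` stands in for the global solver of the continuous penalty problem (the adaptive FA in this article), and the update rules for the penalty parameters (`eps_factor`, `mu_factor`) are illustrative assumptions, not the schedule used in the paper.

```python
def round_integers(x, I_d):
    """Project the relaxed solution onto the integer lattice on the
    discrete components (illustrative post-processing)."""
    return [round(v) if i in I_d else v for i, v in enumerate(x)]

def penalty_method(f, penalty, solve_subproblem, x0, I_d,
                   eps=1.0, mu=1.0, eps_factor=0.1, mu_factor=10.0,
                   max_outer=10, tol=1e-6):
    """Solve a bound constrained MINLP via a finite sequence of
    continuous penalty problems  min f(x) + mu * P(x; eps)."""
    x = x0
    for _ in range(max_outer):
        # continuous penalty subproblem, solved by a global solver
        phi = lambda x: f(x) + mu * penalty(x, eps, I_d)
        x = solve_subproblem(phi, x)
        # stop when the discrete components are (nearly) integer
        if all(abs(x[j] - round(x[j])) < tol for j in I_d):
            break
        eps *= eps_factor   # sharpen the penalty term
        mu *= mu_factor     # increase its weight
    return round_integers(x, I_d)
```

Any global solver for the bound constrained continuous subproblem (adaptive FA, DIRECT, SA) can be plugged in as `solve_subproblem`, which is precisely what the numerical comparison in this article exercises.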
The numerical experiments carried out to compare the quality of the produced solutions, as well as the convergence speed of the algorithm, show that the hyperbolic tangent penalty produces slightly better results, in terms of solution quality and average percentage of successful runs, than the other two penalties in the comparison. Furthermore, the proposed 'adaptive FA'-penalty algorithm compares favourably with the penalty algorithm when the deterministic DIRECT solver or the stochastic simulated annealing solver is invoked.

Disclosure statement
No potential conflict of interest was reported by the authors.
For Case (ii), using the mean value theorem, relation (A2) is again obtained. When considering Case (iii), we get P^t_j(x_j; ε) − P^t_j(z_j; ε) = 0 by Assumption 2.2. Thus, for ρ and ε satisfying (A1), the required bound holds for all z ∈ C with z_j ∈ Z for j ∈ I_d and z_i = x_i for i ∈ I\I_d, and for all x such that ‖x − z‖_∞ < ρ.
Condition (6) in Assumption 2.3 holds if we define S as the union of the neighbourhoods S(z) = {x ∈ R^n : ‖x − z‖_∞ < ρ} of all z ∈ W.
Furthermore, for all x ∈ C\S and all ε > 0, the required inequality holds since the hyperbolic tangent is strictly increasing and |x̄_l − d̄| ≥ ρ, where d̄ is the feasible integer value nearest to x̄_l.

Proof of Property 2.6: According to Theorem 2.4, to prove that penalty P^a can be used in problem (3), we show that the penalty term (11) satisfies Assumptions 2.2 and 2.3. We assume that f satisfies Assumption 2.1. For all x, y ∈ W,

P^a(x; ε) = Σ_{j∈I_d} asinh(ε) = |I_d| asinh(ε) = P^a(y; ε),

and Assumption 2.2 is satisfied.
To prove that Assumption 2.3 is also satisfied, the behaviour of

P^a_j(x_j; ε) = min_{d_i ∈ Z, l_j ≤ d_i ≤ u_j} asinh(|x_j − d_i|/ε + ε)   (for j ∈ I_d)

in a neighbourhood of a feasible z_j is analysed using the three cases referred to previously (see the proof of Property 2.5). Let ρ > 0 be a sufficiently small value. In Case (i), the mean value theorem gives a lower bound on P^a_j(x_j; ε) − P^a_j(z_j; ε) for ρ and ε chosen to satisfy (A4); similarly for Case (ii). For Case (iii), we obtain P^a_j(x_j; ε) − P^a_j(z_j; ε) = 0, and we may conclude that, for ρ and ε satisfying (A4), we get P^a(x; ε) − P^a(z; ε) ≥ L̄ for all z ∈ C (where z_j ∈ Z for j ∈ I_d, and z_i = x_i for i ∈ I\I_d) and for all x such that ‖x − z‖_∞ < ρ. If S is defined as the union of the neighbourhoods S(z) = {x ∈ R^n : ‖x − z‖_∞ < ρ} of all z ∈ W, then condition (6) in Assumption 2.3 is satisfied.
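As a numerical illustration of Assumption 2.2, the sketch below evaluates the asinh-based penalty term on one discrete component, assuming the form asinh(|x_j − d_i|/ε + ε) minimized over the feasible integers; this form is consistent with the identity P^a(x; ε) = |I_d| asinh(ε) used in the proof, but the exact analytic expression should be taken from Equation (11) in the paper. At any integer-feasible point the penalty collapses to the constant asinh(ε), which is exactly what Assumption 2.2 requires.

```python
import math

def P_a_j(x_j, eps, l_j, u_j):
    """asinh-based penalty on one discrete component: minimum over
    the feasible integers d in [l_j, u_j] of asinh(|x_j - d|/eps + eps)."""
    return min(math.asinh(abs(x_j - d) / eps + eps)
               for d in range(math.ceil(l_j), math.floor(u_j) + 1))

def P_a(x, eps, I_d, l, u):
    """Full penalty term: sum of the componentwise penalties
    over the discrete index set I_d."""
    return sum(P_a_j(x[j], eps, l[j], u[j]) for j in I_d)
```

For x with x_j ∈ Z for all j ∈ I_d, every summand equals asinh(ε), so P^a(x; ε) = |I_d| asinh(ε) independently of which feasible point is chosen, while any non-integer component strictly increases its summand since asinh is strictly increasing.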