Optimisation of Server Selection for Maximising Utility in Erlang-Loss Systems

This paper addresses the server selection problem in Erlang-loss systems (ELSs). We propose a novel approach to server selection in the ELS that uses probabilistic modelling to reflect the practical scenario in which user arrivals vary over time. The proposed framework comprises three stages: i) developing a new method for server selection based on the M/M/n/n queuing model with probabilistic arrivals; ii) combining the server allocation results with utility-maximising server selection to optimise system performance; and iii) designing a heuristic approach to efficiently solve the resulting optimisation problem. Simulation results show that, using this framework, Internet Service Providers (ISPs) can significantly improve QoS, and thus revenue, through optimal server allocation in their data centre networks.
Received on 21 October 2019; accepted on 31 October 2019; published on 05 November 2019


Introduction
Although a variety of technologies [1] have emerged to provide enhanced services, the rapidly growing number of users causes heavy congestion when many users simultaneously access limited network resources, e.g. shared databases or file servers. Users suffering from either limited access or poor performance might stop using the offered services, which reduces revenue for service providers. A common-sense solution is simply to provide more resources; however, this is undesirable because it incurs high cost and wastes resources. Therefore, optimising the available resources for maximal network utility is a task of great importance to network designers.
With the continuing growth in the number of high-bandwidth applications such as video streaming and online gaming, limited spectrum resources can be drained quickly, causing serious traffic problems even on well-designed and well-maintained networks [2,3]. Prioritising diverse types of traffic based on Quality-of-Service (QoS) is critical to managing access to resources [4].
Specifically, QoS priority levels help ensure that applications receive adequate bandwidth for acceptable performance in accordance with the QoS required by users. For instance, the highest QoS priority would be reserved for emergency services such as remote surgery and natural disaster response, while the lowest QoS priority is generally assigned to voice or data calls.
As the population of users with access to Internet services expands rapidly, it is necessary to continuously improve the accessibility of Internet resources by deploying multiple distributed server sites or implementing cloud-based systems. Distributing Internet services across the network allows servers to be placed geographically closer to end users, which improves QoS and also enhances service scalability by sharing the load among several server farms. However, this raises the principal issue of how to direct clients to the most appropriate service locations. Selecting a server at random may compromise QoS, e.g. through long delays caused by distance or server load. A well-designed server selection mechanism should consider several important factors, such as service independence, support for multiple policies, high availability, wide-area load balancing and low overhead. At the same time, service providers should provision the minimum number of servers that guarantees good QoS for all user requests.
In the client-server setting, a well-known queuing system, namely the Erlang-loss system (ELS), has been widely used to model telecommunication applications. In the ELS, users are not allowed to enter the system if all the servers are busy. In other words, the number of users that the ELS can serve simultaneously cannot exceed the number of available servers, and arriving users are blocked if no server is available. The ELS has been considered in various communication systems where the system capacity is restricted by the number of servers.
In brief, the main contributions of our work are as follows:
• We develop a new method for server selection based on the M/M/n/n queuing model with probabilistic arrivals. Specifically, two cases of user interarrival times are considered: i) independent and identically distributed (i.i.d.) and ii) independent and non-identically distributed (i.n.d.) interarrivals. We then analyse and evaluate the blocking probability and server utilisation in both cases.
• We combine the server selection results with a utility-based solution to optimise system performance. An optimisation problem is formulated to find the optimal number of servers that maximises the utility per server (UpS)¹ subject to blocking probability, server utilisation and network latency constraints.
• As this is an NP-hard problem, we design a heuristic approach to solve it efficiently. Simulation results show that the algorithm works well under different network conditions while maximising system performance.
This paper is organised as follows. We survey related work in Section 2 and then present the ELS and its performance analysis in Section 3. In Section 4, we introduce our QoS utility-based model, which we extend to the ELS in Section 5. We evaluate the algorithm in Section 6 and conclude the work in Section 7.
¹ The UpS is defined as the total network utility over the number of utilised servers.

Erlang-Loss System
The M/M/n/n queue is one of the most widely studied systems in queuing theory and has attracted many researchers. One of the earliest studies of the ELS with state-dependent arrival and service rates was carried out by Brumelle in the late 1970s [5]. A CPU-shared system and a birth-death process were considered, where the system state also includes the number of busy servers, the time until the next arrival and the remaining workload of the customers in the system. Krishnan [6] presented a simple and straightforward alternative way to obtain convexity results with respect to the arrival and service rates in the ELS and in the Erlang delay system. In [7], Iversen & Mirtchev proposed a simple Erlang B model that can be used in a tele-traffic system with full accessibility and a generalised Poisson input stream. The idea in [7] was based on a logical continuation of the Poisson distribution and the Erlang B formula. Procedures based on birth-death processes and state-dependent arrival rates were adopted, verifying the simplicity and uniformity of the Erlang B model in representing both heavy and low traffic, which makes the model attractive for traffic modelling, network evaluation and analysis.
A different approach to the M/M/n/n queue can be found in [8], where Yao & Knessl analysed two parallel M/M/n/n queues with n servers in each queue and no waiting room. If both queues have the same occupancy, an arrival is routed to either of them. Asymptotic approximations to the joint steady-state distribution of finding m and n servers occupied in the first and the second queue, respectively, were obtained, along with the minimum allocation of the number of busy servers in the second queue. The Erlang loss model can also be used to tackle congestion in a shared resource, which has been researched extensively, for instance in [9][10][11][12][13]. This problem has been modelled as a single-class or multi-class ELS for both a single link and a network. It is well known that a product-form solution exists for the multi-class ELS model of a single link [10]. Kaufman [9] and Roberts [14] independently discovered a one-dimensional recursive formula for computing the blocking probability, which simplifies the computation. Building on [10], Nilsson et al. [11] proposed a more stable algorithm to compute the blocking probability. For provisioning purposes, Hampshire et al. [12] introduced a valid approach to the issues mentioned above.
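The one-dimensional recursion of Kaufman [9] and Roberts [14] can be sketched as follows for a single link. This is a minimal sketch under stated assumptions: the capacity and the per-call bandwidth demands are integers in basic bandwidth units, each demand is at most the link capacity, and the function name is our own. It does not implement the more stable algorithm of [11].

```python
def kaufman_roberts(capacity, loads, demands):
    """Per-class blocking probabilities on a single multi-class loss link.

    capacity: link capacity C in basic bandwidth units (positive integer)
    loads:    offered load a_k = lambda_k / mu_k of each class, in Erlangs
    demands:  bandwidth units b_k taken by one call of each class (1 <= b_k <= C)
    """
    if any(b < 1 or b > capacity for b in demands):
        raise ValueError("each demand must lie in [1, capacity]")
    # q[j] is the unnormalised probability that exactly j units are occupied.
    q = [0.0] * (capacity + 1)
    q[0] = 1.0
    for j in range(1, capacity + 1):
        # Kaufman-Roberts recursion: j * q(j) = sum_k a_k * b_k * q(j - b_k)
        q[j] = sum(a * b * q[j - b] for a, b in zip(loads, demands) if b <= j) / j
    total = sum(q)
    # A class-k call is blocked when fewer than b_k units are free,
    # i.e. when the occupancy j lies in {C - b_k + 1, ..., C}.
    return [sum(q[capacity - b + 1:]) / total for b in demands]
```

With a single class and b = 1 the recursion reduces to q(j) = a^j/j!, so the result coincides with the Erlang B formula; for example, `kaufman_roberts(2, [2.0], [1])` gives 0.4.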
Examining utilisation and Erlang traffic theory in asynchronous IP-based networks, Kavacky et al. [15] proposed a test network model with a video traffic source. The difference between two link-shaping methods was identified, and start-time fair queuing was shown to be more suitable for video traffic, establishing that the Erlang model is an appropriate choice for loss calculations with variable bit rate (VBR) traffic. The Erlang model has also been applied to packet loss estimation in VoIP networks [16], where Misuth & Baronak presented a way of calculating the input variables, i.e. the traffic load in Erlangs and the number of lines, based on the characteristics of the codec in use and of the link interconnecting the communication nodes. These values help to obtain the packet loss probability, and the resulting estimates were verified by simulations. It was also shown in [16] that buffer utilisation in network nodes could positively influence the measured packet loss probability, and that the Erlang B model could be used to determine an upper bound on the packet loss probability in the general worst-case scenario where no buffer is available.
EAI Endorsed Transactions on Industrial Networks and Intelligent Systems 10 2019 -01 2020 | Volume 7 | Issue 22 | e1
Considering the M/M/n/n queue with two types of arrivals and different priority levels, Smith et al. [17] evaluated different priority schemes and derived the average numbers of primary and secondary users and their associated blocking probabilities. The steady-state probabilities were derived for the case where no priority is given to the primary users, and the derived expressions were simplified into a compact form that leads to straightforward results for the performance metrics. When priority is given to the primary users, secondary users may have to release a resource, which occurs with a certain dropping probability. The ELS is also used in cloud computing and traditional data centres. In 2016, Li [18] studied the common resource capacity management problem in loss network systems with applications to cloud computing. A stochastic optimisation framework was developed to find the optimal capacity for diverse types of resources. Building on the Erlang fixed-point approximation, a new quality-efficiency-driven fixed-point approximation for the blocking probability was obtained, and the optimisation based on this approximation can be solved with an iterative algorithm. In this paper, we design a novel M/M/n/n queuing model for the server selection problem with probabilistic arrivals, including i.i.d. and i.n.d. interarrivals. While the former is generally assumed to simplify the analysis, the latter better reflects practical scenarios in which users arrive at different rates at different times.

Utility Function
The utility function measures user satisfaction as a function of the received QoS. Most work in the literature treats the utility function as a non-decreasing function of the effective transmission rate (bandwidth) [19][20][21] or of the signal-to-interference ratio (SIR) of the connections [22]. This paper instead considers the utility function as a non-increasing function of latency. The function includes a special first region [0, T_min] in which the utility score remains constant, which better reflects how services actually operate. Utility-based optimisation improves overall system performance. However, few, if any, practical solutions have been deployed so far because of their complexity [21,22]. For example, the algorithms proposed in [21] require modification of the TCP stack at the end hosts. The utility function presented in this paper does not require any modification of the TCP stack at the end hosts and can therefore be conveniently deployed in real networks.
Many cloud services run on geographically distributed data centres for better reliability and performance, both of which depend on server selection. Xu & Li [23] considered the emerging problem of joint request mapping and allocation with distributed data centres. A general convex optimisation problem was formulated in which location is linked to performance and cost, and an efficient distributed algorithm was proposed to decompose the large-scale global problem into many sub-problems. Carrera et al. [24] presented a system that automatically allocates resources to clustered web applications. It is based on a utility-driven application distribution algorithm that aims to achieve equalised satisfaction across applications.
Utility-maximising server selection is a fundamental problem because users want to access resources in the most efficient way. Phan et al. [25,26] presented a method for selecting replicated servers distributed over a wide area, allowing applications and network providers to trade off cost against QoS for their users. Compared with closest-server selection, the utility framework proposed in [25] reduces blocking probability while maintaining high utility for users. A polynomial-time optimisation algorithm was developed to allocate user service requests to servers based on utility while satisfying a transit cost constraint, and an efficient low-overhead distributed model was proposed to deal with a small-scale subset of data requirements. Yuan et al. [27] proposed a user-oriented, QoE-driven multimedia service delivery solution in the context of the content delivery network (CDN) architecture. The major improvement of their QoE-based server selection strategy is that it takes both the underlying network conditions and the video quality into account. The experimental results in [27] confirmed that the proposed neural-network-based strategy achieves significant improvements in user perception compared with traditional QoS-based methods. In this paper, we extend the work in [25,26] by considering the M/M/n/n queuing model together with the utility-maximising problem. This helps to identify the optimal number of servers and is thus expected to save considerable resources compared with deploying all available servers in the system. Figure 1 illustrates an ELS model in which users enter the system only when servers are available, so that the number of users being served cannot exceed the number of available servers. If no server is available, arriving users are blocked, i.e. lost to the system.
The ELS is generally described via Kendall's notation as M/M/n/n which represents an n-server queue with Poisson arrivals, exponentially distributed service time, finite system capacity with a maximum of n users in the system, infinite population and first-in first-out discipline.
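The blocking probability of the M/M/n/n queue is given by the Erlang B formula, B(n, ρ) = (ρ^n/n!) / Σ_{j=0}^{n} ρ^j/j!, where ρ = λ/µ is the offered load in Erlangs. A minimal Python sketch, assuming the standard recursive evaluation (which avoids overflow from ρ^n/n! for large n); the function name is our own.

```python
def erlang_b(n: int, rho: float) -> float:
    """Blocking probability of an M/M/n/n queue with offered load rho = lambda / mu.

    Evaluates B(n, rho) = (rho**n / n!) / sum_{j=0}^{n} rho**j / j! through the
    recursion B(0) = 1, B(k) = rho * B(k-1) / (k + rho * B(k-1)), which avoids
    computing large powers and factorials explicitly.
    """
    if n < 0 or rho < 0:
        raise ValueError("n and rho must be non-negative")
    b = 1.0
    for k in range(1, n + 1):
        b = rho * b / (k + rho * b)
    return b
```

For n = 2 and ρ = 2, both the closed form (2²/2!)/(1 + 2 + 2²/2!) and `erlang_b(2, 2.0)` give 0.4 (the latter up to floating-point rounding).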

Erlang-Loss System and Its Performance Analysis
In this paper, we consider two cases of user interarrival times: i) independent and identically distributed (i.i.d.) interarrivals and ii) independent and non-identically distributed (i.n.d.) interarrivals. For brevity, we denote the ELS with i.i.d. interarrivals and the ELS with i.n.d. interarrivals by (S1) and (S2), respectively. Suppose there are T time frames in a day and that arrivals are probabilistic: the probability that user(s) arrive in the i-th time frame, i = 1, 2, ..., T, is $\alpha_i$, where
$$\sum_{i=1}^{T} \alpha_i = 1.$$
Let $\lambda_i$ denote the arrival rate of users within the i-th time frame. The two systems (S1) and (S2) are defined as follows:
• System (S1): the interarrival times of users within the T time frames are i.i.d. and follow an exponential distribution with an identical arrival rate $\lambda_1 = \lambda_2 = \cdots = \lambda_T = \lambda$.
• System (S2): the interarrival times of users within the i-th time frame are independent and follow an exponential distribution with arrival rate $\lambda_i$, where the rates $\lambda_1, \lambda_2, \ldots, \lambda_T$ are not necessarily identical.
In both systems, the service time at a server follows an exponential distribution with service rate $\mu$. Let $\rho_i$, i = 1, 2, ..., T, denote the traffic intensity in the i-th time frame, defined as the ratio of the arrival rate to the service rate in that frame, i.e. the offered load in Erlangs. The traffic intensity of system (S1) with i.i.d. arrivals is thus
$$\rho_i = \frac{\lambda}{\mu} = \rho, \quad i = 1, 2, \ldots, T,$$
while the traffic intensity of system (S2) in the i-th time frame with i.n.d. arrivals is
$$\rho_i = \frac{\lambda_i}{\mu}.$$
In the following, we analyse the blocking probability and server utilisation in order to evaluate the performance of these two systems. In the M/M/n/n queue, the blocking probability is defined as the steady-state probability that there are n users in the system, and server utilisation is defined as the percentage of time that a server is busy serving user requests.
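The steady-state definition of blocking probability can be checked numerically within a single time frame. The sketch below assumes constant rates λ and µ in the frame (so ρ = λ/µ), simulates the birth-death process of the M/M/n/n queue, and measures the fraction of time that all n servers are busy; because arrivals are Poisson, this time fraction is also the probability that an arriving user is blocked, and it should approach the Erlang B value. The function names, horizon and seed are illustrative choices.

```python
import random


def erlang_b(n, rho):
    """Erlang B blocking probability via the standard stable recursion."""
    b = 1.0
    for k in range(1, n + 1):
        b = rho * b / (k + rho * b)
    return b


def simulate_full_fraction(n, lam, mu, horizon=20000.0, seed=1):
    """Fraction of time an M/M/n/n queue spends with all n servers busy.

    Simulates the birth-death process with arrival rate lam (arrivals that find
    all servers busy are lost) and departure rate k * mu in state k.
    """
    if n < 1 or lam <= 0 or mu <= 0:
        raise ValueError("need n >= 1 and positive rates")
    rng = random.Random(seed)
    t, k, time_full = 0.0, 0, 0.0
    while t < horizon:
        departure_rate = k * mu
        total_rate = lam + departure_rate
        dt = min(rng.expovariate(total_rate), horizon - t)  # sojourn time in state k
        if k == n:
            time_full += dt
        t += dt
        if rng.random() * total_rate < departure_rate:
            k -= 1          # a service completion
        elif k < n:
            k += 1          # an admitted arrival; at k == n the arrival is blocked
    return time_full / horizon
```

For instance, `simulate_full_fraction(5, 4.0, 1.0)` should be close to `erlang_b(5, 4.0)` ≈ 0.199, with the gap shrinking as the horizon grows.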

Performance Analysis of System (S1)
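Following the birth-death process in system (S1), the steady-state probability that there are k users, k = 1, 2, ..., n, in the i-th time frame, i = 1, 2, ..., T, denoted by $P^{(1)}_{k,i}$, can be determined by
$$P^{(1)}_{k,i} = \frac{\lambda_i^{k}}{k!\,\mu^{k}}\,P^{(1)}_{0,i} \overset{(a)}{=} \frac{\rho^{k}}{k!}\,P^{(1)}_{0,i}, \tag{3}$$
where $P^{(1)}_{0,i}$ denotes the steady-state probability that system (S1) is idle in the i-th time frame and (a) is due to the i.i.d. arrivals, i.e. $\lambda_i = \lambda$. Noting that $\sum_{j=0}^{n} P^{(1)}_{j,i} = 1$, we arrive at
$$P^{(1)}_{0,i} = \left(\sum_{j=0}^{n} \frac{\rho^{j}}{j!}\right)^{-1}. \tag{4}$$
Substituting (4) into (3), we obtain
$$P^{(1)}_{k,i} = \frac{\rho^{k}/k!}{\sum_{j=0}^{n} \rho^{j}/j!}. \tag{5}$$
Considering T time frames, the probability that there are k users in the system is given by
$$P^{(1)}_{k} = \sum_{i=1}^{T} \alpha_i\,P^{(1)}_{k,i} \overset{(a)}{=} \frac{\rho^{k}/k!}{\sum_{j=0}^{n} \rho^{j}/j!}, \tag{6}$$
where (a) is due to the fact that $\sum_{i=1}^{T} \alpha_i = 1$ and $P^{(1)}_{k,i}$ in (5) is independent of the time frame i.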
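By definition, the blocking probability of system (S1) is thus determined by
$$P^{(1)}_{n} = \frac{\rho^{n}/n!}{\sum_{j=0}^{n} \rho^{j}/j!}, \tag{7}$$
which is the Erlang B formula. Let $U^{(1)}_{A}$ denote the utilisation of all servers in system (S1); $U^{(1)}_{A}$ is also known as the average number of busy servers over T time frames. From (6), we have
$$U^{(1)}_{A} = \sum_{k=1}^{n} k\,P^{(1)}_{k} = \rho\left(1 - P^{(1)}_{n}\right). \tag{8}$$
The server utilisation in system (S1), denoted by $U^{(1)}_{S}$, is therefore given by
$$U^{(1)}_{S} = \frac{U^{(1)}_{A}}{n} = \frac{\rho\left(1 - P^{(1)}_{n}\right)}{n}. \tag{9}$$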
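Equations (5) to (9) can be evaluated directly. A minimal Python sketch, assuming a moderate number of servers (a few dozen, as in the simulations later) so that ρ^k/k! can be computed in floating point without overflow; the function name is our own. It also lets one check numerically that the mean number of busy servers equals the carried traffic ρ(1 − P_n).

```python
from math import factorial


def s1_metrics(n: int, lam: float, mu: float):
    """Return (P_n, U_A, U_S) of system (S1), following (5)-(9).

    Every time frame has the same rate lam, so the frame weights alpha_i
    sum to one and drop out of (6); they are therefore not needed here.
    """
    if n < 1:
        raise ValueError("need at least one server")
    rho = lam / mu                                        # identical traffic intensity
    weights = [rho ** k / factorial(k) for k in range(n + 1)]
    norm = sum(weights)                                   # 1 / P_0 from (4)
    p = [w / norm for w in weights]                       # P_k from (5)-(6)
    p_block = p[n]                                        # (7)
    busy = sum(k * p[k] for k in range(n + 1))            # (8): mean number of busy servers
    return p_block, busy, busy / n                        # (9): U_S = U_A / n
```

For example, with n = 2 and ρ = 2 the sketch yields P_n = 0.4, U_A = 1.2 and U_S = 0.6 (up to rounding).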

Performance Analysis of System (S2)
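Considering system (S2), the steady-state probability that there are k users, k = 1, 2, ..., n, in the i-th time frame, i = 1, 2, ..., T, i.e. $P^{(2)}_{k,i}$, can be similarly given by
$$P^{(2)}_{k,i} = \frac{\lambda_i^{k}}{k!\,\mu^{k}}\,P^{(2)}_{0,i} = \frac{\rho_i^{k}}{k!}\,P^{(2)}_{0,i}. \tag{10}$$
Here, $P^{(2)}_{0,i}$ denotes the steady-state probability that system (S2) is idle in the i-th time frame, which can be deduced as
$$P^{(2)}_{0,i} = \left(\sum_{j=0}^{n} \frac{\rho_i^{j}}{j!}\right)^{-1}. \tag{11}$$
Substituting (11) into (10), we obtain
$$P^{(2)}_{k,i} = \frac{\rho_i^{k}/k!}{\sum_{j=0}^{n} \rho_i^{j}/j!}. \tag{12}$$
Over T time frames, the probability that there are k users in system (S2) is given by
$$P^{(2)}_{k} = \sum_{i=1}^{T} \alpha_i\,P^{(2)}_{k,i} = \sum_{i=1}^{T} \alpha_i\,\frac{\rho_i^{k}/k!}{\sum_{j=0}^{n} \rho_i^{j}/j!}. \tag{13}$$
Unlike (6), this mixture does not simplify further, because $P^{(2)}_{k,i}$ depends on i through $\rho_i$. The blocking probability of system (S2) is therefore obtained by
$$P^{(2)}_{n} = \sum_{i=1}^{T} \alpha_i\,\frac{\rho_i^{n}/n!}{\sum_{j=0}^{n} \rho_i^{j}/j!}. \tag{14}$$
Similarly, the utilisation of all servers in system (S2), i.e. $U^{(2)}_{A}$, can be computed by
$$U^{(2)}_{A} = \sum_{k=1}^{n} k\,P^{(2)}_{k} = \sum_{i=1}^{T} \alpha_i\,\rho_i\left(1 - P^{(2)}_{n,i}\right), \tag{15}$$
and the server utilisation in system (S2), i.e. $U^{(2)}_{S}$, is given by
$$U^{(2)}_{S} = \frac{U^{(2)}_{A}}{n}. \tag{16}$$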
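A corresponding sketch for system (S2) mixes the per-frame truncated-Poisson distributions (12) with the weights α_i, as in (13) to (16). As before, it assumes a moderate n, and the function names are our own; setting all λ_i equal recovers system (S1), which is a convenient sanity check.

```python
from math import factorial


def frame_distribution(n, rho):
    """Occupancy distribution P_{k,i}, k = 0..n, of one time frame, as in (12)."""
    weights = [rho ** k / factorial(k) for k in range(n + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


def s2_metrics(n, lams, alphas, mu):
    """Return (P_n, U_A, U_S) of system (S2), following (13)-(16).

    lams[i] and alphas[i] are the arrival rate and arrival probability of frame i.
    """
    if n < 1 or len(lams) != len(alphas):
        raise ValueError("need n >= 1 and one alpha per arrival rate")
    if abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("the arrival probabilities alpha_i must sum to one")
    p = [0.0] * (n + 1)
    for lam, alpha in zip(lams, alphas):
        frame = frame_distribution(n, lam / mu)     # rho_i = lambda_i / mu
        for k in range(n + 1):
            p[k] += alpha * frame[k]                # (13): weight each frame by alpha_i
    p_block = p[n]                                  # (14)
    busy = sum(k * p[k] for k in range(n + 1))      # (15)
    return p_block, busy, busy / n                  # (16)
```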

QoS Utility-based Model
Inspired by the work in [25] and [26], we consider a QoS utility-based model (QUM) to implement server selection for an ELS subject to QoS requirements. In the QUM, there are two types of service providers: i) Application Service Providers (ASPs), organisations that provide cloud-based applications to their users from their own facilities or another source, and ii) Internet Service Providers (ISPs), who resolve user requests by implementing various resolution algorithms. The ASPs employ a utility function of the service that captures the latency requirements of the services, while the ISPs trade off QoS against traffic cost. Before introducing the utility function in the QUM, let us consider the following example:
Example 1. As shown in Fig. 2, two servers are deployed to provide services for two users following an M/M/2/2 queuing model with a finite system capacity of two users. It is assumed that both User 1 and User 2 require a voice service over the Internet, which can be provided by either Server 1 or Server 2, and that the latency between each user and each server is as shown in Fig. 2. Employing the traditional minimum-distance selection algorithm, the optimal solution is the pairing (User 1, Server 1) and (User 2, Server 2), which yields the minimum total latency of 29 ms. Notice that there is almost no perceptible difference between transmitted audio and real speech if the latency is at most 20 ms [26]. In this scenario, User 1 receives the best QoS, whereas User 2 might experience some disruption in voice quality because its latency is above the safe threshold of 20 ms for voice services. A better solution is therefore for User 1 to obtain the service from Server 2, with a latency of 20 ms that still maintains the best QoS for the voice service. In return, User 2 is served by Server 1 with a latency of 20 ms. With this server selection, both users obtain the best QoS.
The observation in Example 1 leads to the definition of a utility function for each user-service pair, which can be represented by a linear function of the latency. Assume that the k-th server among the n servers serves the j-th service, j = 1, 2, ..., $N_S$, of an arriving i-th user, i = 1, 2, ..., $N_U$, where $N_S$ and $N_U$ denote the numbers of services and users, respectively. We adopt the following definition of the utility function for each user-service pair [26]:
Definition 1. The utility function in a QUM for a pair of the i-th user, i = 1, 2, ..., $N_U$, and the j-th service, j = 1, 2, ..., $N_S$, is given by (17), where $\delta_{ij} \ge 1$ is the priority level of the j-th service requested by the i-th user, $t_{ij}$ is the network latency for the i-th user to obtain the j-th service, and $T^{(l)}_{ij}$ and $T^{(u)}_{ij}$ are the lower and upper latency thresholds, respectively, for the required QoS.

Remark 1. It can be noticed in (17) that, through the reciprocal of the priority level, a higher utility is expected when a user requests a service with a higher priority level. This reflects the practical network perspective: when different users require the same service, the user with the higher priority should achieve a higher utility for the same latency.
Remark 2. Given a fixed service priority level, i.e. a fixed $\delta_{ij}$, the utility function in (17) consists of three cases defined by the two thresholds $T^{(l)}_{ij}$ and $T^{(u)}_{ij}$. The cases up to the upper threshold can be interpreted as follows:
• $t_{ij} \le T^{(l)}_{ij}$: a lower latency brings performance closer to the best achievable; however, such an enhancement is unnoticeable to the user, and thus it can be regarded as the same QoS.
• $T^{(l)}_{ij} < t_{ij} \le T^{(u)}_{ij}$: the network latency is in an acceptable range, providing a QoS ranked from "good" to "poor". A "fair" point can be included for further ranking depending on the service, though it does not change the gradient of the utility function.
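The qualitative shape described in Remark 2 can be sketched as below. This is an illustration under explicit assumptions, not the exact form of (17): it fixes the priority level, normalises the peak utility to 1, and assumes that the utility falls to zero at and beyond $T^{(u)}_{ij}$; how $\delta_{ij}$ enters (17) is not modelled, and the function name is hypothetical.

```python
def latency_utility(t: float, t_low: float, t_up: float) -> float:
    """Hypothetical normalised utility of a user-service pair versus latency t.

    Assumed shape for a fixed priority level: flat at 1 up to t_low, a linear
    decrease to 0 at t_up, and 0 for any latency beyond t_up.
    """
    if not 0 <= t_low < t_up:
        raise ValueError("require 0 <= t_low < t_up")
    if t <= t_low:
        return 1.0                                  # improvement is unnoticeable
    return max(0.0, (t_up - t) / (t_up - t_low))    # linear in the acceptable range
```

With the thresholds used later in the simulations, $T^{(l)}$ = 20 ms and $T^{(u)}$ = 150 ms, `latency_utility(85, 20, 150)` returns 0.5, and every latency up to 20 ms, such as those in the improved assignment of Example 1, receives the full utility of 1.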

QUM-based Server Selection in the ELS
In this section, we first present the network design aspects required for server selection, taking into account various performance measures of both the ELS and the QUM. An optimisation problem is then developed that aims to maximise the UpS through QUM-based server selection in the ELS. For convenience, the notations used throughout the paper are summarised in Table 1 in order of appearance.

Network Design Aspects
When adopting the QUM in the ELS, the QoS constraints of both the ELS and the QUM need to be taken into account. In particular, the following network design aspects are crucial:
Blocking Probability. To guarantee that all users can be served in the ELS, the number of users in the system should not exceed the number of servers; otherwise, arriving users are dropped or blocked. The blocking probability should therefore not exceed a QoS threshold. Let $P_{\mathrm{thre}}$ denote the blocking probability threshold. We then have the following constraint on the blocking probability:
$$P^{(1)}_{n} \le P_{\mathrm{thre}} \ \text{for (S1)}, \qquad P^{(2)}_{n} \le P_{\mathrm{thre}} \ \text{for (S2)},$$
where $P^{(1)}_{n}$ and $P^{(2)}_{n}$ are given by (7) and (14), respectively.
Server Utilisation. Owing to its limited capacity, a server may not be able to devote all of its resources to serving user requests, so its utilisation is subject to a specified QoS constraint. Let $U_{S,\mathrm{thre}}$ denote the server utilisation threshold. The server utilisation in an ELS should then satisfy
$$U^{(1)}_{S} \le U_{S,\mathrm{thre}} \ \text{for (S1)}, \qquad U^{(2)}_{S} \le U_{S,\mathrm{thre}} \ \text{for (S2)},$$
where $U^{(1)}_{S}$ and $U^{(2)}_{S}$ are given by (9) and (16), respectively.

User-Service Allocation.
It is expected that all requests from the i-th user, i ∈ U = {1, 2, . . . , N U }, should be served by at least one server among n available servers.
Each request may be split across servers: the fraction of the i-th user's requests for the j-th service, j ∈ S = {1, 2, ..., $N_S$}, that is processed by the k-th server, k ∈ N = {1, 2, ..., n}, takes a value between 0 and 1. We accordingly require these fractions to sum to one over the n servers for every user-service pair, so that all requests are served.
Network Latency. The latency for the i-th user, i ∈ U, to get the j-th service, j ∈ S, from the k-th server, k ∈ N, should not exceed $T^{(u)}_{ij}$, where $t_{ij}$ denotes the resulting network latency. For the case $T^{(l)}_{ij} \le t_{ij} \le T^{(u)}_{ij}$, the utility function in (17) can be rewritten in terms of an auxiliary parameter $a_{ij}$, which is then used to compute the utility function for the pair of the i-th user and the j-th service. To maintain an acceptable network latency, $a_{ij}$ should satisfy a further constraint written with the positive-part operator $x^{+} \triangleq \max(x, 0)$.

Optimisation Problem for QUM-based Server Selection in the ELS
Among the network design aspects above, user-service allocation is critical in every network, especially when a number of constraints, either explicit or implicit, must be taken into account. With the aim of efficiently allocating users and their required services to an optimal number of servers, we formulate an optimisation problem that maximises the UpS:
$$\max_{n}\ \mathrm{UpS} = \frac{1}{n}\sum_{i \in \mathcal{U}}\sum_{j \in \mathcal{S}} u_{ij} \tag{26}$$
subject to the constraints (C1)-(C7), where (C1) and (C2) are the blocking probability and server utilisation constraints, (C3)-(C6) are the remaining constraints defined in the network design aspects above (user-service allocation, network latency and the auxiliary parameter $a_{ij}$), and (C7) restricts the number of servers to the integer set $n \in \{1, 2, \ldots, N_{\max}\}$, where $N_{\max}$ denotes the maximum number of available servers that can be used in the system. The optimisation problem in (26) is an integer programming problem, since its design variable, the number of servers, is restricted to integer values (see (C7)). As integer programming is NP-hard in general, we introduce a heuristic approach to solve the problem in (26).
We first group the constraints and then consider each group sequentially in the following steps:
• Step 1: With a limited number of available servers, i.e. $N_{\max}$ in constraint (C7), the minimum number of servers $N_{\min}$ satisfying the ELS constraints (C1) and (C2) can be easily found by an iterative search. After this step, the constraints (C1) and (C2) in the optimisation problem in (26) can be removed, whereas the constraint (C7) is replaced by
(C8): $n \in \{N_{\min}, N_{\min} + 1, \ldots, N_{\max}\}$.
• Step 2: Given the limited number of available servers, this step finds the optimal number of servers to maximise the UpS in (26) subject to the constraints (C3)-(C6) and (C8). Notice that although some constraints of the ELS are relaxed, the optimisation problem is still NP-hard, albeit over a smaller integer set for the number of servers. Therefore, an iterative search can be employed, where we evaluate the total utility of all users and services for each number of servers in the range $[N_{\min}, N_{\max}]$ and retain the one achieving the maximum UpS.
For clarity, the proposed two-step iterative search for the optimal number of servers in QUM-based server selection in the ELS is summarised in Algorithm 1.

Algorithm 1
    Step 2: Find the optimal number of servers
    10: Set UpS_max = 0, n_opt = N_min
    11: for n = N_min to N_max (see (C8)) do
    12:     for all i ∈ U and j ∈ S do
    13:         if t
    23:     Compute UpS = Σ_{i,j} u_ij / n
    24:     if UpS ≥ UpS_max then
    25:         UpS_max = UpS
    26:         n_opt = n
    27:     end if
    28: end for
    29: Output: n_opt
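The two-step search can be sketched in Python as follows, assuming system (S2) (system (S1) is the special case of equal λ_i) and constraints of the form P_n ≤ P_thre and U_S ≤ U_S,thre. The callback `total_utility(n)` is a hypothetical stand-in, supplied by the caller, for the per-user-service allocation and the evaluation of the utilities u_ij under (C3)-(C6); all function names are our own, and this is a sketch rather than the authors' MATLAB implementation.

```python
from math import factorial
from typing import Callable, Optional, Sequence, Tuple


def s2_blocking_and_utilisation(n: int, lams: Sequence[float],
                                alphas: Sequence[float], mu: float) -> Tuple[float, float]:
    """Blocking probability (14) and server utilisation (16) of system (S2)."""
    p_block, busy = 0.0, 0.0
    for lam, alpha in zip(lams, alphas):
        rho = lam / mu
        weights = [rho ** k / factorial(k) for k in range(n + 1)]
        p_n = weights[n] / sum(weights)          # per-frame blocking P_{n,i}
        p_block += alpha * p_n
        busy += alpha * rho * (1.0 - p_n)        # per-frame mean number of busy servers
    return p_block, busy / n


def two_step_search(lams: Sequence[float], alphas: Sequence[float], mu: float,
                    p_thre: float, u_thre: float, n_max: int,
                    total_utility: Callable[[int], float]) -> Optional[Tuple[int, int, float]]:
    """Return (N_min, n_opt, UpS_max), or None if no n <= n_max satisfies (C1)-(C2)."""
    # Step 1: smallest n meeting the blocking (C1) and utilisation (C2) limits.
    # Both quantities decrease with n, so every n in [N_min, n_max] is feasible too.
    n_min = None
    for n in range(1, n_max + 1):
        p_block, u_server = s2_blocking_and_utilisation(n, lams, alphas, mu)
        if p_block <= p_thre and u_server <= u_thre:
            n_min = n
            break
    if n_min is None:
        return None
    # Step 2: scan the range (C8) and keep the n with the largest UpS.
    best_n, best_ups = n_min, 0.0
    for n in range(n_min, n_max + 1):
        ups = total_utility(n) / n               # UpS = total utility / number of servers
        if ups >= best_ups:                      # ties favour larger n, as with ">=" in Algorithm 1
            best_n, best_ups = n, ups
    return n_min, best_n, best_ups
```

For example, `two_step_search([28.0, 12.0], [0.5, 0.5], 2.0, 0.1, 0.6, 40, lambda n: 10.0 * min(n, 12))` uses the i.n.d. rates from the simulations with illustrative thresholds and a toy utility, and returns N_min together with the n that maximises the toy UpS.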

Simulation Results
In this section, simulation results of QUM-based server selection in the ELS are presented. The proposed algorithm is implemented and validated using MATLAB. Specifically, we first demonstrate the impacts of the queuing parameters in the ELS, i.e. user arrival rates, user arrival probabilities and service rate, on the optimal number of servers and on the maximum UpS that can be achieved with QUM-based server selection. We then evaluate the performance of the proposed approach for different services with different latency requirements. Different server utilisation requirements are also taken into account to verify the effectiveness of the proposed algorithm. Furthermore, both i.i.d. and i.n.d. interarrivals with probabilistic arrivals are considered in the simulation to show the practicability of QUM-based server selection.

Impacts of User Arrival Rates
Considering the impact of user arrival rates on system performance with the proposed two-step iterative search, Figs. 3a, 3b and 3c plot the minimum number of servers $N_{\min}$, the optimal number of servers $N_{\mathrm{opt}}$ and the UpS, respectively, versus the blocking probability threshold $P_{n,\mathrm{thre}}$. In the simulation, the minimum and the optimal numbers of servers are determined by Steps 1 and 2 of Algorithm 1, respectively. As shown in Figs. 3a and 3b, the minimum and the optimal numbers of required servers decrease monotonically as the blocking probability threshold increases. This is because relaxing the blocking probability in (27) allows a higher utility value in (26), as shown in Fig. 3c. It is also shown that fewer servers are required on average with i.n.d. arrivals than in the i.i.d. case, and that the proposed algorithm achieves a higher UpS than when all servers are deployed. Considerable resources can therefore be saved with probabilistic arrival modelling while enhancing the UpS.
Figure 5. Impacts of service rates
Additionally, it can be observed that a higher user arrival rate does not always mean that more servers are required. In fact, in probabilistic modelling, a lower probability of user arrivals can result in fewer servers even if the users arrive at a higher rate. Furthermore, the proposed algorithm is shown to save a considerable number of servers while achieving a higher UpS in all cases of the arrival rates and their probabilities.

Impacts of Service Rate
We next simulate the impact of service rate on system performance.

Figure 7. Impacts of server utilisation requirements

Impacts of Server Utilisation Requirements
Regarding server utilisation requirements, Figs. 7a and 7b plot the optimal number of servers and the corresponding UpS, respectively, versus the server utilisation threshold $U_{S,\mathrm{thre}}$. The following parameters are used: $T^{(u)}$ = 150 ms, $T^{(l)}$ = 20 ms, δ = 1, µ = 2 users/min, α = [0.5, 0.5], $P_{n,\mathrm{thre}}$ = 0.9 and $N_{\max}$ = 40. As in Fig. 3, the arrival rate is set to λ = 20 users/min for i.i.d. arrivals, and three different sets are used for i.n.d. arrivals, i.e. $(\lambda_1, \lambda_2) \in \{(28, 12), (30, 10), (32, 8)\}$ users/min. It can be observed in Fig. 7 that the optimal number of required servers decreases monotonically as the server utilisation threshold increases. Also, fewer servers are required with probabilistic modelling of i.n.d. arrivals than in the i.i.d. case, and a higher UpS is achieved in all cases than when all servers are deployed. This confirms the advantage of the proposed algorithm in saving resources with probabilistic arrival modelling.
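The sketch below uses the parameter values listed above (µ = 2 users/min, α = [0.5, 0.5], N_max = 40, and the i.i.d. and i.n.d. arrival-rate sets, with the i.i.d. case represented by two frames of 20 users/min each) to compute, for a few illustrative utilisation thresholds, the smallest number of servers whose server utilisation (16) does not exceed the threshold. It assumes the utilisation constraint takes the form U_S ≤ U_S,thre and considers that constraint alone, without blocking, latency or utility terms, so it illustrates only the lower bound on the number of servers imposed in Step 1 and does not reproduce Fig. 7; the thresholds and function names are our own choices.

```python
from math import factorial

MU = 2.0                       # service rate (users/min)
ALPHAS = [0.5, 0.5]            # arrival probabilities of the two time frames
N_MAX = 40                     # maximum number of available servers
SCENARIOS = {                  # arrival rates (users/min) in the two frames
    "i.i.d. (20, 20)": [20.0, 20.0],
    "i.n.d. (28, 12)": [28.0, 12.0],
    "i.n.d. (30, 10)": [30.0, 10.0],
    "i.n.d. (32, 8)": [32.0, 8.0],
}


def server_utilisation(n, lams, alphas, mu):
    """Server utilisation (16): sum_i alpha_i * rho_i * (1 - P_{n,i}) / n."""
    busy = 0.0
    for lam, alpha in zip(lams, alphas):
        rho = lam / mu
        weights = [rho ** k / factorial(k) for k in range(n + 1)]
        busy += alpha * rho * (1.0 - weights[n] / sum(weights))
    return busy / n


def min_servers(lams, u_thre):
    """Smallest n <= N_MAX whose utilisation does not exceed u_thre (None if none)."""
    for n in range(1, N_MAX + 1):
        if server_utilisation(n, lams, ALPHAS, MU) <= u_thre:
            return n
    return None


for name, lams in SCENARIOS.items():
    print(name, [min_servers(lams, u) for u in (0.3, 0.5, 0.7, 0.9)])
```

Because per-server utilisation falls as servers are added, raising the threshold can only lower this bound, which is consistent in direction with the trend reported for Fig. 7.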

Conclusions
In this paper, we have presented a novel utility-maximising server selection approach for an ELS, i.e. an M/M/n/n queuing model. Simulation results show that the proposed algorithm works well under different network conditions, saving a considerable number of servers while achieving high utility. In general, the minimum and the optimal numbers of required servers decrease monotonically as the blocking probability threshold increases. More importantly, probabilistic modelling with i.n.d. arrivals, which better reflects practical scenarios, has been shown to achieve a higher utility with fewer servers than the i.i.d. arrival model. As future work, we will extend the utility function to support more QoS metrics and design an online algorithm that adapts the solution to changes in practical networks in real time.