## An Interview Question: Using Integer Programming

We can solve the Interview Question using a mathematical technique called Integer Programming. Let $d_1, d_2, \ldots, d_N$ be the variables representing diskette 1, diskette 2, diskette 3, etc. The values of the $d_k$ variables can only be 0 or 1. A 0 means the diskette is not used while a 1 means that it is used.

Each file is saved to a certain diskette. We want to know to what diskette $d_i$ a given file $f_j$ is assigned. To represent this, we assign the variable $a_{ij}$ a value of 1 if file $f_j$ is assigned to diskette $d_i$.

We will normalize the file sizes so that if $s_i$ is the size of $f_i$, the $s_i \le 1$. We do this by simply dividing all file sizes by the size of the diskette. For a given diskette $d_i$, the following constraint should be satisfied:

$d_i - s_1a_{i1} - s_2a_{i2} - \ldots - s_N a_{iN} \ge 0$

for diskette $i = 1, 2, \ldots, N$ and $s_i$ are the normalized file sizes of file $f_i$ for $i=1,2,\ldots,N$.

Since each file $f_j$ can only be assigned to one diskette, we have the following constraint:

$a_{1j} + a_{2j} + \ldots + a_{Nj} = 1$

where $a_{1j}$ is the variable representing the “file $f_j$ is in diskette $d_1$“, etc.

Finally, we have to constrain the value of $d_i$ to be either 0 or 1, that is,

$d_i \le 1$

for all $i=1,2,\ldots,N$.

## Integer Programming Formulation

Given the above information, we can formulate the Integer Programming problem as

Minimize:

$d_1 + d_2 + d_3 + \ldots + d_N$

subject to

$\begin{array}{rl} d_1 - s_1a_{11} - s_2a_{12} - s_3a_{13} - \ldots - s_Na_{1N} &\ge 0\\ d_2 - s_1a_{21} - s_2a_{22} - s_3a_{23} - \ldots - s_Na_{2N} &\ge 0\\ :\\ d_N - s_1a_{N1} - s_2a_{N2} - s_3a_{N3} - \ldots - s_Na_{NN} &\ge 0\\ a_{11} + a_{21} + a_{31} + \ldots + a_{N1} &= 1\\ a_{12} + a_{22} + a_{32} + \ldots + a_{N2} &= 1\\ :\\ a_{1N} + a_{2N} + a_{3N} + \ldots + a_{NN} &= 1\\ d_1 &\le 1\\ d_2 &\le 1\\ :\\ d_n &\le 1 \end{array}$

## Solving the Problem

We will use R to solve this Integer Programming Formulation. Please see code below:

library("lpSolve")
NUMFILES=4

# Generate random file sizes between 1 and 10
FileSizes=ceiling(10*runif(NUMFILES))
x = -1*FileSizes/10
l=length(x)

# Each files can be in any of the diskettes. Suppose there are N files,
# to determine if a file j is in diskette i, the value of variable x_ij will
# 1 if file j is in diskette i, and 0 otherwise.
# Here we construct the coefficients of variables x_ij which are the
# sizes of the files (normalized to 1)
zz=c()
for(i in 1:(l-1)){
zz=c(zz,x,rep(0,l*l))
}
zz=c(zz,x)

# Construct the coefficients of the indicator variables representing the
# diskettes d_i
zzmatrix=matrix(zz,ncol=l*l,byrow=T)
CoefficientsOfDiskettes=c();
for(i in 1:l){
ttt=rep(0,l)
ttt[i] = 1
CoefficientsOfDiskettes= c(CoefficientsOfDiskettes,ttt,zzmatrix[i,])
}

# Construct the coefficients of x_ij for constant j. These variables
# satisfy the equation \sum_{i=1}^N x_{ij}
SumOfFileAcrossDiskettes=c()
for(i in 1:l){
ttt=rep(0,l)
ttt[i]=1
SumOfFileAcrossDiskettes=c(SumOfFileAcrossDiskettes,rep(ttt,l))
}

# Prepend Coefficients of variables d_i. The value of these coefficients is 0.
SumOfFileAcrossDiskettesMatrix=matrix(SumOfFileAcrossDiskettes,ncol=l*l,byrow=T)
PrependCoefficientsOfDiskettes=c()
for(i in 1:l){
PrependCoefficientsOfDiskettes=c(PrependCoefficientsOfDiskettes,c(rep(0,l),SumOfFileAcrossDiskettesMatrix[i,]))
}

# Construct coefficients of d_i to construct constraint d_i <= 1
DisketteConstraints=c()
for(i in 1:l){
ttt=rep(0,l)
ttt[i]=1
DisketteConstraints=c(DisketteConstraints,ttt,rep(0,l*l))
}

# Construct matrix input of lpSolve
const.mat=matrix(c(CoefficientsOfDiskettes,PrependCoefficientsOfDiskettes,DisketteConstraints),ncol=l*(l+1),byrow=T)

print("Matrix Coefficients:")
print(const.mat)

# Construct inequalities/equalities
const.dir=c(rep(">=",l),rep("=",l),rep("<=",l))

# Construct Right-Hand side
const.rhs=c(rep(0,l),rep(1,l),rep(1,l))

# Construct Objective Function
objective.in=c(rep(1,l),rep(0,l*l))

# Invoke lpSolve
mylp=lp(direction="min",objective.in=objective.in,const.mat=const.mat,const.dir=const.dir,const.rhs=const.rhs,all.int=T)

# Print Results
print(paste("Number of Diskettes: ", sum(mylp$solution[1:l]))) tz=matrix(mylp$solution,ncol=l,byrow=T)
print("File Sizes: ")
print(FileSizes)
for(i in 2:(l+1)){
files = which(tz[i,] == 1)
if(length(files) > 0){
print(paste("Files in diskette ", i-1))
print(files)
}
}



Most of the code above is setting up the matrix of coefficients. The line 70 then calls on lpSolve to compute the optimal values of the variables

## Program Output

Running this code we get the output

[1] "Matrix Coefficients:"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,]    1    0    0    0   -1 -0.2 -0.1 -0.1    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[2,]    0    1    0    0    0  0.0  0.0  0.0   -1  -0.2  -0.1  -0.1     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[3,]    0    0    1    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0    -1  -0.2  -0.1  -0.1     0   0.0   0.0   0.0
[4,]    0    0    0    1    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0    -1  -0.2  -0.1  -0.1
[5,]    0    0    0    0    1  0.0  0.0  0.0    1   0.0   0.0   0.0     1   0.0   0.0   0.0     1   0.0   0.0   0.0
[6,]    0    0    0    0    0  1.0  0.0  0.0    0   1.0   0.0   0.0     0   1.0   0.0   0.0     0   1.0   0.0   0.0
[7,]    0    0    0    0    0  0.0  1.0  0.0    0   0.0   1.0   0.0     0   0.0   1.0   0.0     0   0.0   1.0   0.0
[8,]    0    0    0    0    0  0.0  0.0  1.0    0   0.0   0.0   1.0     0   0.0   0.0   1.0     0   0.0   0.0   1.0
[9,]    1    0    0    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[10,]    0    1    0    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[11,]    0    0    1    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[12,]    0    0    0    1    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[1] "Number of Diskettes:  2"
[1] "File Sizes: "
[1] 10  2  1  1
[1] "Files in diskette  1"
[1] 2 3 4
[1] "Files in diskette  2"
[1] 1



## Interpreting the Result

Lines 2-14 of the output gives you the matrix of coefficients. Line 15 prints the number of diskettes needed to store the files. Line 17 prints the randomly generated file sizes from 1 to 10. Finally lines 18-21 prints which diskettes contain which files.

The space complexity of this solution is quite substantial. Given $N$ files, we need to specify $N^2 + N$ variables by $3\times N$ equations for a total of $(N^2 + N)\times 3N$ memory space for coefficients.

## Using Linear Programming to Solve the Call Center Problem

Agents cost money. Not only will you pay for their salary but also other benefits in you compensation package. So the less agents you have to do a job, the less overhead, the more profit for you.

In the last article, we solve the call center scheduling using a manual approach. The question now is this: Does our solution give us the least possible cost to operate the call center? We cannot be sure. However, there is a technique which we can use in order to calculate the least possible cost of our schedule subject to the constraints of the required agents per time slot. This technique is called Integer Linear Programming.

Formulation of the Problem

Let $x_i$, $0 \le i \le 335$, be the number of call center agents who comes to work at time slot i. Using this notation, we can calculate the number of call center agents current present in the office at time slot 1:

$S_1=x_0 + 0 + \ldots+ 0 + x_{320} + x_{321} + \ldots + x_{335}$

How did we arrive with this equation? If an agent starts at time slot 320, then he/she will still be in the office at time slot 1. This is however his/her last time slot for the shift. All agents that start after 320 until 336 will also be present at time slot 1. And so is the agent that starts at time slot.

Similarly, the number of agents in time slot 2 is given by

$S_2= x_0 + x_1 + 0 + \ldots + 0 + x_{321} + x_{322} + \ldots + x_{335}$

A similar pattern holds for i from 2 to 335.

Let $\vec{req}$ be the vector of required number of agents per time slot. Then we can specify for each time slot i that the number of agents be greater than or equal to req[i].

The linear programming problem can now be specified by:

Minimize $\sum_{i=0}^{335} x_i$

subject to the following contraints:

We can create programmatically the matrix of these constraints using this algorithm. The code is written in R, which is an open source statistical software.

x=seq(1:336)
y=(x+17) %% 336

bigmatrix=c();
for( i in 1:336){
z=(y - i) %% 336;
tmp=rep(0,336)
for( j in 1:336){
if(z[j] < 18 ){
tmp[j] = 1;
}
}
bigmatrix=c(bigmatrix,tmp);
}

Notice that we start our index from 1 rather than 0. This is because R is a 1-based programming language.

Using the lpSolve package of R, we can compute the minimum of the objective function using the lp function of the library.

bigmatrix=matrix(bigmatrix,nrow=336,byrow=T)
f.obj=rep(1,336);
f.con=bigmatrix;
f.dir=rep(">=",336)
f.rhs=mydata2

print("computing lp");
mylp=lp ("min", f.obj, f.con,
f.dir, f.rhs, int.vec=1:336)

After executing the code above, we get the following result:

The value of the objective function is given by:

> mylp$objval [1] 47 If we compare this to the manual method, the value of the objective function is just the sum of the components of z: > sum(z) [1] 48 This means that using the Linear Programming technique, we were able to come up with a solution with the least cost. ## Manual Approach to Call Center Scheduling In the last post, we defined the problem of call center scheduling. In this post, we are going to present a manual way of solving this problem. Let us first define some terminology. A time slot is a 30 minute interval, e.g. 12:00 am – 12:30 am. Each time slot has a required number of agents that must be in the office. Since there are 24 hours in a day, then there are 48 time slots in a day. each time slot is independent from each other. Therefore, the total number of independent time slots is 48*7 = 336. Let us define the following vectors • a vector $\vec{req}$ of length 336 whose components define the required number of agents in the ith slot. • a vector $\vec{v}$ of length 48*7=336. Each component of this vector represents the number of agents current in the office. • a vector $\vec{z}$ of length 336. This vector represents the number of call center agents that will start in that time slot. A value of 0 means that no agent is to begin working on that time slot. A value of n means n call center agents are expected to begin working on that time slot. Here is how we compute the vector $\vec{v}$[/tex] and [tex]$\vec{z}\$.

for(i in 1:length(req)){
if(v[i]  48*7 = 336

z[i] = z[i] + x;
# since an agent works 9 hours in a day,
# we have 9*2=18 slots
for(j in 0:17){
k=i + j;
l=k %% 337;
v[l] = v[l] + x;
}
}

The vector v is initialized to 0. For each component i of v, we compare it with the value req[i], which is the required number of agents for time slot i. If the value of v[i] is less than req[i], then we add the number of agents to start at that time slot, which is x=req[i] - v[i]. We update the vector z with this value, z[i]=z[i]+x. We then update the vector v by adding the value of x to all components of v starting at i until i+17 because an agent starting at timeslotiwill work 9 hours until i+17.

The code above is actually a code snippet from the program I created using R programming language. To know more about R, click this link.

Running this code in R, we get the following result:

The corresponding value of vector v is:

The difference of v from the required staffing req is :

The matrix form is more convenient:

This means that for slot 1, we 2 more agents rather than the required 1 agent at that time slot. These two agents will not be doing anything productive as far as their job is concerned. They are just there because they need to complete the 9 hours of work as stated in their contracts.

Now, is there a way for us to minimize this unproductivity while still meeting the required number of agents in the center? We will answer this in the next post.

## Call Center scheduling Problem

This problem I got from someone working in a call center. The call center operation in question operates 24 hours a day, 7 days a week. Every 30 minutes, there is a projected number of agents that need to be present in order to handle calls. Here is an example of such a requirement:

The data shown is truncated. The should be 48 rows corresponding to 48 half-hour intervals in a day. The whole data can be downloaded from here.

Assuming that an agent works 9 hours per day, how do you schedule your staff in such a way that you can meet the required number of agents required per 30 minute interval using a minimum number agents.

Approach to solving this problem

Imagine for a moment that you are the manager of this call center and you want to solve this problem manually. Looking at the first row and first column of the data, you’ll see that at 12:00 am to 12:30 am on Mon you need 1 agent to be in the office. Since an agent works 9 hours per day, this agent’s duty will end at 9:00 am. If n is the total number of agents in the office at 9:00, then by the next interval, there will be $n-1$ agents left ( Assuming there is no new agent who comes to work).

Looking down the row under Monday, you will see that a new person is required to be in the office at 6:00 am. You then assign a new person to come to the office at this time.

Further down, another person is required at 8:30 am. However, at 9:00 am, the first person’s shift is up but the requirement is still 3 persons. So you need to assign another person to come to office a 9 am to meet the requirement. Continuing the process in this way, you can ultimately come up with a schedule of your call center agents.

Complication

You have to notice that people working at 3:30 pm will intersect the next days shift. They will leave the office at 12:30 am the next day. The same is true for all agents who come to the office between 3:30 to 12:30 pm. To make matters worse, those agents who come to work Sunday starting at 3:30 pm will have to intersect monday’s shift. This means that the one person you initially scheduled above will not be alone, but will have a companion coming from the sunday’s shift.

Big Question

The big question therefore is this: Is the schedule you came up with using the manual approach the best schedule in terms of minimizing the number of staff?

We will answer this question in our next article.