An Interview Question: Using Integer Programming

We can solve the Interview Question using a mathematical technique called Integer Programming. Let d_1, d_2, \ldots, d_N be the variables representing diskette 1, diskette 2, diskette 3, etc. The values of the d_k variables can only be 0 or 1. A 0 means the diskette is not used while a 1 means that it is used.

Each file is saved to a certain diskette. We want to know to what diskette d_i a given file f_j is assigned. To represent this, we assign the variable a_{ij} a value of 1 if file f_j is assigned to diskette d_i.

We will normalize the file sizes so that if s_i is the size of f_i, the s_i \le 1. We do this by simply dividing all file sizes by the size of the diskette. For a given diskette d_i, the following constraint should be satisfied:

d_i - s_1a_{i1} - s_2a_{i2} - \ldots - s_N a_{iN} \ge 0

for diskette i = 1, 2, \ldots, N and s_i are the normalized file sizes of file f_i for i=1,2,\ldots,N.

Since each file f_j can only be assigned to one diskette, we have the following constraint:

a_{1j} + a_{2j} + \ldots + a_{Nj} = 1

where a_{1j} is the variable representing the “file f_j is in diskette d_1“, etc.

Finally, we have to constrain the value of d_i to be either 0 or 1, that is,

d_i \le 1

for all i=1,2,\ldots,N.

Integer Programming Formulation

Given the above information, we can formulate the Integer Programming problem as

Minimize:

d_1 + d_2 + d_3 + \ldots + d_N

subject to

\begin{array}{rl}  d_1 - s_1a_{11} - s_2a_{12} - s_3a_{13} - \ldots - s_Na_{1N} &\ge 0\\  d_2 - s_1a_{21} - s_2a_{22} - s_3a_{23} - \ldots - s_Na_{2N} &\ge 0\\  :\\  d_N - s_1a_{N1} - s_2a_{N2} - s_3a_{N3} - \ldots - s_Na_{NN} &\ge 0\\  a_{11} + a_{21} + a_{31} + \ldots + a_{N1} &= 1\\  a_{12} + a_{22} + a_{32} + \ldots + a_{N2} &= 1\\  :\\  a_{1N} + a_{2N} + a_{3N} + \ldots + a_{NN} &= 1\\  d_1 &\le 1\\  d_2 &\le 1\\  :\\  d_n &\le 1  \end{array}

Solving the Problem

We will use R to solve this Integer Programming Formulation. Please see code below:

library("lpSolve")
NUMFILES=4

# Generate random file sizes between 1 and 10
FileSizes=ceiling(10*runif(NUMFILES))
x = -1*FileSizes/10
l=length(x)

# Each files can be in any of the diskettes. Suppose there are N files,
# to determine if a file j is in diskette i, the value of variable x_ij will
# 1 if file j is in diskette i, and 0 otherwise.
# Here we construct the coefficients of variables x_ij which are the
# sizes of the files (normalized to 1)
zz=c()
for(i in 1:(l-1)){
  zz=c(zz,x,rep(0,l*l))
}
zz=c(zz,x)

# Construct the coefficients of the indicator variables representing the 
# diskettes d_i
zzmatrix=matrix(zz,ncol=l*l,byrow=T)
CoefficientsOfDiskettes=c();
for(i in 1:l){
 ttt=rep(0,l)
 ttt[i] = 1
 CoefficientsOfDiskettes= c(CoefficientsOfDiskettes,ttt,zzmatrix[i,])
}

# Construct the coefficients of x_ij for constant j. These variables 
# satisfy the equation \sum_{i=1}^N x_{ij}
SumOfFileAcrossDiskettes=c()
for(i in 1:l){
  ttt=rep(0,l)
  ttt[i]=1
  SumOfFileAcrossDiskettes=c(SumOfFileAcrossDiskettes,rep(ttt,l))
}

# Prepend Coefficients of variables d_i. The value of these coefficients is 0. 
SumOfFileAcrossDiskettesMatrix=matrix(SumOfFileAcrossDiskettes,ncol=l*l,byrow=T)
PrependCoefficientsOfDiskettes=c()
for(i in 1:l){
 PrependCoefficientsOfDiskettes=c(PrependCoefficientsOfDiskettes,c(rep(0,l),SumOfFileAcrossDiskettesMatrix[i,]))
}

# Construct coefficients of d_i to construct constraint d_i <= 1
DisketteConstraints=c()
for(i in 1:l){
 ttt=rep(0,l)
 ttt[i]=1
 DisketteConstraints=c(DisketteConstraints,ttt,rep(0,l*l))
}

# Construct matrix input of lpSolve
const.mat=matrix(c(CoefficientsOfDiskettes,PrependCoefficientsOfDiskettes,DisketteConstraints),ncol=l*(l+1),byrow=T)

print("Matrix Coefficients:")
print(const.mat)

# Construct inequalities/equalities
const.dir=c(rep(">=",l),rep("=",l),rep("<=",l))

# Construct Right-Hand side
const.rhs=c(rep(0,l),rep(1,l),rep(1,l))

# Construct Objective Function
objective.in=c(rep(1,l),rep(0,l*l))

# Invoke lpSolve
mylp=lp(direction="min",objective.in=objective.in,const.mat=const.mat,const.dir=const.dir,const.rhs=const.rhs,all.int=T)

# Print Results
print(paste("Number of Diskettes: ", sum(mylp$solution[1:l])))
tz=matrix(mylp$solution,ncol=l,byrow=T)
print("File Sizes: ")
print(FileSizes)
for(i in 2:(l+1)){
   files = which(tz[i,] == 1)
   if(length(files) > 0){
      print(paste("Files in diskette ", i-1))
      print(files)
   }
}

Most of the code above is setting up the matrix of coefficients. The line 70 then calls on lpSolve to compute the optimal values of the variables

Program Output

Running this code we get the output

[1] "Matrix Coefficients:"
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
 [1,]    1    0    0    0   -1 -0.2 -0.1 -0.1    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
 [2,]    0    1    0    0    0  0.0  0.0  0.0   -1  -0.2  -0.1  -0.1     0   0.0   0.0   0.0     0   0.0   0.0   0.0
 [3,]    0    0    1    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0    -1  -0.2  -0.1  -0.1     0   0.0   0.0   0.0
 [4,]    0    0    0    1    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0    -1  -0.2  -0.1  -0.1
 [5,]    0    0    0    0    1  0.0  0.0  0.0    1   0.0   0.0   0.0     1   0.0   0.0   0.0     1   0.0   0.0   0.0
 [6,]    0    0    0    0    0  1.0  0.0  0.0    0   1.0   0.0   0.0     0   1.0   0.0   0.0     0   1.0   0.0   0.0
 [7,]    0    0    0    0    0  0.0  1.0  0.0    0   0.0   1.0   0.0     0   0.0   1.0   0.0     0   0.0   1.0   0.0
 [8,]    0    0    0    0    0  0.0  0.0  1.0    0   0.0   0.0   1.0     0   0.0   0.0   1.0     0   0.0   0.0   1.0
 [9,]    1    0    0    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[10,]    0    1    0    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[11,]    0    0    1    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[12,]    0    0    0    1    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[1] "Number of Diskettes:  2"
[1] "File Sizes: "
[1] 10  2  1  1
[1] "Files in diskette  1"
[1] 2 3 4
[1] "Files in diskette  2"
[1] 1

Interpreting the Result

Lines 2-14 of the output gives you the matrix of coefficients. Line 15 prints the number of diskettes needed to store the files. Line 17 prints the randomly generated file sizes from 1 to 10. Finally lines 18-21 prints which diskettes contain which files.

The space complexity of this solution is quite substantial. Given N files, we need to specify N^2 + N variables by 3\times N equations for a total of (N^2 + N)\times 3N memory space for coefficients.

Advertisements

Published by

Bobby Corpus

Loves anything related to Mathematics, Physics, Computing and Economics.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s