# An Interview Question: Using Integer Programming

We can solve the Interview Question using a mathematical technique called Integer Programming. Let $d_1, d_2, \ldots, d_N$ be the variables representing diskette 1, diskette 2, diskette 3, etc. The values of the $d_k$ variables can only be 0 or 1. A 0 means the diskette is not used while a 1 means that it is used.

Each file is saved to a certain diskette. We want to know to what diskette $d_i$ a given file $f_j$ is assigned. To represent this, we assign the variable $a_{ij}$ a value of 1 if file $f_j$ is assigned to diskette $d_i$.

We will normalize the file sizes so that if $s_i$ is the size of $f_i$, the $s_i \le 1$. We do this by simply dividing all file sizes by the size of the diskette. For a given diskette $d_i$, the following constraint should be satisfied:

$d_i - s_1a_{i1} - s_2a_{i2} - \ldots - s_N a_{iN} \ge 0$

for diskette $i = 1, 2, \ldots, N$ and $s_i$ are the normalized file sizes of file $f_i$ for $i=1,2,\ldots,N$.

Since each file $f_j$ can only be assigned to one diskette, we have the following constraint:

$a_{1j} + a_{2j} + \ldots + a_{Nj} = 1$

where $a_{1j}$ is the variable representing the “file $f_j$ is in diskette $d_1$“, etc.

Finally, we have to constrain the value of $d_i$ to be either 0 or 1, that is,

$d_i \le 1$

for all $i=1,2,\ldots,N$.

## Integer Programming Formulation

Given the above information, we can formulate the Integer Programming problem as

Minimize:

$d_1 + d_2 + d_3 + \ldots + d_N$

subject to

$\begin{array}{rl} d_1 - s_1a_{11} - s_2a_{12} - s_3a_{13} - \ldots - s_Na_{1N} &\ge 0\\ d_2 - s_1a_{21} - s_2a_{22} - s_3a_{23} - \ldots - s_Na_{2N} &\ge 0\\ :\\ d_N - s_1a_{N1} - s_2a_{N2} - s_3a_{N3} - \ldots - s_Na_{NN} &\ge 0\\ a_{11} + a_{21} + a_{31} + \ldots + a_{N1} &= 1\\ a_{12} + a_{22} + a_{32} + \ldots + a_{N2} &= 1\\ :\\ a_{1N} + a_{2N} + a_{3N} + \ldots + a_{NN} &= 1\\ d_1 &\le 1\\ d_2 &\le 1\\ :\\ d_n &\le 1 \end{array}$

## Solving the Problem

We will use R to solve this Integer Programming Formulation. Please see code below:

library("lpSolve")
NUMFILES=4

# Generate random file sizes between 1 and 10
FileSizes=ceiling(10*runif(NUMFILES))
x = -1*FileSizes/10
l=length(x)

# Each files can be in any of the diskettes. Suppose there are N files,
# to determine if a file j is in diskette i, the value of variable x_ij will
# 1 if file j is in diskette i, and 0 otherwise.
# Here we construct the coefficients of variables x_ij which are the
# sizes of the files (normalized to 1)
zz=c()
for(i in 1:(l-1)){
zz=c(zz,x,rep(0,l*l))
}
zz=c(zz,x)

# Construct the coefficients of the indicator variables representing the
# diskettes d_i
zzmatrix=matrix(zz,ncol=l*l,byrow=T)
CoefficientsOfDiskettes=c();
for(i in 1:l){
ttt=rep(0,l)
ttt[i] = 1
CoefficientsOfDiskettes= c(CoefficientsOfDiskettes,ttt,zzmatrix[i,])
}

# Construct the coefficients of x_ij for constant j. These variables
# satisfy the equation \sum_{i=1}^N x_{ij}
SumOfFileAcrossDiskettes=c()
for(i in 1:l){
ttt=rep(0,l)
ttt[i]=1
SumOfFileAcrossDiskettes=c(SumOfFileAcrossDiskettes,rep(ttt,l))
}

# Prepend Coefficients of variables d_i. The value of these coefficients is 0.
SumOfFileAcrossDiskettesMatrix=matrix(SumOfFileAcrossDiskettes,ncol=l*l,byrow=T)
PrependCoefficientsOfDiskettes=c()
for(i in 1:l){
PrependCoefficientsOfDiskettes=c(PrependCoefficientsOfDiskettes,c(rep(0,l),SumOfFileAcrossDiskettesMatrix[i,]))
}

# Construct coefficients of d_i to construct constraint d_i <= 1
DisketteConstraints=c()
for(i in 1:l){
ttt=rep(0,l)
ttt[i]=1
DisketteConstraints=c(DisketteConstraints,ttt,rep(0,l*l))
}

# Construct matrix input of lpSolve
const.mat=matrix(c(CoefficientsOfDiskettes,PrependCoefficientsOfDiskettes,DisketteConstraints),ncol=l*(l+1),byrow=T)

print("Matrix Coefficients:")
print(const.mat)

# Construct inequalities/equalities
const.dir=c(rep(">=",l),rep("=",l),rep("<=",l))

# Construct Right-Hand side
const.rhs=c(rep(0,l),rep(1,l),rep(1,l))

# Construct Objective Function
objective.in=c(rep(1,l),rep(0,l*l))

# Invoke lpSolve
mylp=lp(direction="min",objective.in=objective.in,const.mat=const.mat,const.dir=const.dir,const.rhs=const.rhs,all.int=T)

# Print Results
print(paste("Number of Diskettes: ", sum(mylp$solution[1:l]))) tz=matrix(mylp$solution,ncol=l,byrow=T)
print("File Sizes: ")
print(FileSizes)
for(i in 2:(l+1)){
files = which(tz[i,] == 1)
if(length(files) > 0){
print(paste("Files in diskette ", i-1))
print(files)
}
}



Most of the code above is setting up the matrix of coefficients. The line 70 then calls on lpSolve to compute the optimal values of the variables

## Program Output

Running this code we get the output

[1] "Matrix Coefficients:"
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,]    1    0    0    0   -1 -0.2 -0.1 -0.1    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[2,]    0    1    0    0    0  0.0  0.0  0.0   -1  -0.2  -0.1  -0.1     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[3,]    0    0    1    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0    -1  -0.2  -0.1  -0.1     0   0.0   0.0   0.0
[4,]    0    0    0    1    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0    -1  -0.2  -0.1  -0.1
[5,]    0    0    0    0    1  0.0  0.0  0.0    1   0.0   0.0   0.0     1   0.0   0.0   0.0     1   0.0   0.0   0.0
[6,]    0    0    0    0    0  1.0  0.0  0.0    0   1.0   0.0   0.0     0   1.0   0.0   0.0     0   1.0   0.0   0.0
[7,]    0    0    0    0    0  0.0  1.0  0.0    0   0.0   1.0   0.0     0   0.0   1.0   0.0     0   0.0   1.0   0.0
[8,]    0    0    0    0    0  0.0  0.0  1.0    0   0.0   0.0   1.0     0   0.0   0.0   1.0     0   0.0   0.0   1.0
[9,]    1    0    0    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[10,]    0    1    0    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[11,]    0    0    1    0    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[12,]    0    0    0    1    0  0.0  0.0  0.0    0   0.0   0.0   0.0     0   0.0   0.0   0.0     0   0.0   0.0   0.0
[1] "Number of Diskettes:  2"
[1] "File Sizes: "
[1] 10  2  1  1
[1] "Files in diskette  1"
[1] 2 3 4
[1] "Files in diskette  2"
[1] 1



## Interpreting the Result

Lines 2-14 of the output gives you the matrix of coefficients. Line 15 prints the number of diskettes needed to store the files. Line 17 prints the randomly generated file sizes from 1 to 10. Finally lines 18-21 prints which diskettes contain which files.

The space complexity of this solution is quite substantial. Given $N$ files, we need to specify $N^2 + N$ variables by $3\times N$ equations for a total of $(N^2 + N)\times 3N$ memory space for coefficients.