We can solve the Interview Question using a mathematical technique called Integer Programming. Let be the variables representing diskette 1, diskette 2, diskette 3, etc. The values of the variables can only be 0 or 1. A 0 means the diskette is not used while a 1 means that it is used.

Each file is saved to a certain diskette. We want to know to what diskette a given file is assigned. To represent this, we assign the variable a value of 1 if file is assigned to diskette .

We will normalize the file sizes so that if is the size of , the . We do this by simply dividing all file sizes by the size of the diskette. For a given diskette , the following constraint should be satisfied:

for diskette and are the normalized file sizes of file for .

Since each file can only be assigned to one diskette, we have the following constraint:

where is the variable representing the “file is in diskette “, etc.

Finally, we have to constrain the value of to be either 0 or 1, that is,

for all .

## Integer Programming Formulation

Given the above information, we can formulate the Integer Programming problem as

**Minimize**:

**subject to**

## Solving the Problem

We will use R to solve this Integer Programming Formulation. Please see code below:

library("lpSolve") NUMFILES=4 # Generate random file sizes between 1 and 10 FileSizes=ceiling(10*runif(NUMFILES)) x = -1*FileSizes/10 l=length(x) # Each files can be in any of the diskettes. Suppose there are N files, # to determine if a file j is in diskette i, the value of variable x_ij will # 1 if file j is in diskette i, and 0 otherwise. # Here we construct the coefficients of variables x_ij which are the # sizes of the files (normalized to 1) zz=c() for(i in 1:(l-1)){ zz=c(zz,x,rep(0,l*l)) } zz=c(zz,x) # Construct the coefficients of the indicator variables representing the # diskettes d_i zzmatrix=matrix(zz,ncol=l*l,byrow=T) CoefficientsOfDiskettes=c(); for(i in 1:l){ ttt=rep(0,l) ttt[i] = 1 CoefficientsOfDiskettes= c(CoefficientsOfDiskettes,ttt,zzmatrix[i,]) } # Construct the coefficients of x_ij for constant j. These variables # satisfy the equation \sum_{i=1}^N x_{ij} SumOfFileAcrossDiskettes=c() for(i in 1:l){ ttt=rep(0,l) ttt[i]=1 SumOfFileAcrossDiskettes=c(SumOfFileAcrossDiskettes,rep(ttt,l)) } # Prepend Coefficients of variables d_i. The value of these coefficients is 0. SumOfFileAcrossDiskettesMatrix=matrix(SumOfFileAcrossDiskettes,ncol=l*l,byrow=T) PrependCoefficientsOfDiskettes=c() for(i in 1:l){ PrependCoefficientsOfDiskettes=c(PrependCoefficientsOfDiskettes,c(rep(0,l),SumOfFileAcrossDiskettesMatrix[i,])) } # Construct coefficients of d_i to construct constraint d_i <= 1 DisketteConstraints=c() for(i in 1:l){ ttt=rep(0,l) ttt[i]=1 DisketteConstraints=c(DisketteConstraints,ttt,rep(0,l*l)) } # Construct matrix input of lpSolve const.mat=matrix(c(CoefficientsOfDiskettes,PrependCoefficientsOfDiskettes,DisketteConstraints),ncol=l*(l+1),byrow=T) print("Matrix Coefficients:") print(const.mat) # Construct inequalities/equalities const.dir=c(rep(">=",l),rep("=",l),rep("<=",l)) # Construct Right-Hand side const.rhs=c(rep(0,l),rep(1,l),rep(1,l)) # Construct Objective Function objective.in=c(rep(1,l),rep(0,l*l)) # Invoke lpSolve mylp=lp(direction="min",objective.in=objective.in,const.mat=const.mat,const.dir=const.dir,const.rhs=const.rhs,all.int=T) # Print Results print(paste("Number of Diskettes: ", sum(mylp$solution[1:l]))) tz=matrix(mylp$solution,ncol=l,byrow=T) print("File Sizes: ") print(FileSizes) for(i in 2:(l+1)){ files = which(tz[i,] == 1) if(length(files) > 0){ print(paste("Files in diskette ", i-1)) print(files) } }

Most of the code above is setting up the matrix of coefficients. The line 70 then calls on **lpSolve** to compute the optimal values of the variables

## Program Output

Running this code we get the output

[1] "Matrix Coefficients:" [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [1,] 1 0 0 0 -1 -0.2 -0.1 -0.1 0 0.0 0.0 0.0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 [2,] 0 1 0 0 0 0.0 0.0 0.0 -1 -0.2 -0.1 -0.1 0 0.0 0.0 0.0 0 0.0 0.0 0.0 [3,] 0 0 1 0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 -1 -0.2 -0.1 -0.1 0 0.0 0.0 0.0 [4,] 0 0 0 1 0 0.0 0.0 0.0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 -1 -0.2 -0.1 -0.1 [5,] 0 0 0 0 1 0.0 0.0 0.0 1 0.0 0.0 0.0 1 0.0 0.0 0.0 1 0.0 0.0 0.0 [6,] 0 0 0 0 0 1.0 0.0 0.0 0 1.0 0.0 0.0 0 1.0 0.0 0.0 0 1.0 0.0 0.0 [7,] 0 0 0 0 0 0.0 1.0 0.0 0 0.0 1.0 0.0 0 0.0 1.0 0.0 0 0.0 1.0 0.0 [8,] 0 0 0 0 0 0.0 0.0 1.0 0 0.0 0.0 1.0 0 0.0 0.0 1.0 0 0.0 0.0 1.0 [9,] 1 0 0 0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 [10,] 0 1 0 0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 [11,] 0 0 1 0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 [12,] 0 0 0 1 0 0.0 0.0 0.0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 0 0.0 0.0 0.0 [1] "Number of Diskettes: 2" [1] "File Sizes: " [1] 10 2 1 1 [1] "Files in diskette 1" [1] 2 3 4 [1] "Files in diskette 2" [1] 1

## Interpreting the Result

Lines 2-14 of the output gives you the matrix of coefficients. Line 15 prints the number of diskettes needed to store the files. Line 17 prints the randomly generated file sizes from 1 to 10. Finally lines 18-21 prints which diskettes contain which files.

The space complexity of this solution is quite substantial. Given files, we need to specify variables by equations for a total of memory space for coefficients.