# An Interview Question

I was given this interview question and I’d like to share it to you. The setting is back in the days when the largest size of a hard disk was 1 GB and there were no CD writers yet and the only way to back up your data is through those 1.44 floppy disks. You want to back up your files but you want to minimize the number of floppy disks you need to use. Assume your most important files are in a single directory. How will you distribute the files across your disks in such a way that the number of disks you use is minimized ?

To make this simple, let’s assume the following:

– we will not take into account that every file you copy to the disk has a record of the metadata of the file and stored on the disk as well. This will eat up space as you put more files. For our purposes, we ignore this complexity.
– The size of each file is less than or equal to 1.44

First we need to have a list of those files including the size and sort the list according to size in descending order. If A is the list of files, we can apply this algorithm:

B := list of files to copy to current floppy disk
remaining_size := 1.44 MB
For file in A:
If remaining_size - file.size > 0:
A.remove(file)
Copy all files listed in B to disk
Empty B
Repeat process for remaining files in A


Although there are other better algorithms than the one above, this is the one I managed to to come up during the interview.

We now need to determine how fast our algorithm can run.

## Worst Case Complexity

How slow can this algorithm get ? If for any two files $F_i$ and $F_j$ in A we have $F_i + F_j > 1.44$, then all files will have their own diskette. If this is the case, for each file, our algorithm will execute step 4. For the first disk, it will execute the step $N$ times. For the second disk, it will execute the step $N-1$ times, for the third disk it will execute $N-2$ times, etc. the total number of times it executes step 4 is the total number of comparisons and is equal to the summation:

$\displaystyle \sum_{1=1}^{N} i$

which is equal to

$\displaystyle \frac{N(N+1)}{2}$

Therefore, in the worst case, the complexity is $O(N^2)$.

## Best Case Complexity

The best case is when all files fit in just one diskette. For this, the total number of comparisons is $N$

## Average Case Complexity

On the average, files have different sizes. We now compute the complexity on the assumption that the probability distribution is uniform.

If $k$ is the number of diskettes, the number of comparisons is a sequence of monotonic decreasing numbers $\{ a_1, a_2, a_3, \ldots, a_k \}$ taken at random from the set $\{ 1, 2, \ldots, N\}$. Each of the numbers $a_j$, $j\in \{1, 2, \ldots, k\}$ has a probability $1/N$ of being chosen. Let $X$ be a random variable such that

$\displaystyle Pr(X=a_j) = \frac{1}{N} \text{ for } j=1,2,\ldots,k$

then the number of comparisons $C$ is equal to

$\displaystyle C = \sum_{i=1}^k X = kX$

The expected value of $C$ is given by the

$E[C] = E[kX] = kE[X]$

However, the expected value of X is given by

$\displaystyle E[X] = \sum_{j=1}^N j\cdot Pr(X=j) = \frac{1}{N} \sum_{j=1}^N j = \frac{1}{N}\frac{N(N+1)}{2} = \frac{N+1}{2}$

Therefore,

$\displaystyle E[C] = k\frac{N+1}{2}$

What remains is to determine the average value of $k$, which is the number of diskettes. If $M=1.44$ is the maximum file size, the average file size is $M/2$. The average total file size is then $NM/2$. The average number of diskettes is equal to the average total size divided by size of diskette, that is

$k = \displaystyle \frac{NM}{2}\frac{1}{M} = \frac{N}{2}$

This means that

$\displaystyle E[C] = \frac{N}{2} \frac{N+1}{2} = O(N^2)$

which is the same as the worst case complexity.

There is another way to solve this problem using Integer Programming.