I was given this interview question and I’d like to share it with you. The setting is back in the days when the largest hard disk was 1 GB, there were no CD writers yet, and the only way to back up your data was on 1.44 MB floppy disks. You want to back up your files, but you want to minimize the number of floppy disks you need. Assume your most important files are all in a single directory. How would you distribute the files across your disks so that the number of disks used is minimized?
To make this simple, let’s assume the following:
– Every file copied to a disk also has its metadata recorded on the disk, which eats up additional space as you add more files. For our purposes, we ignore this complexity.
– The size of each file is less than or equal to 1.44 MB.
First, we list the files together with their sizes and sort the list by size in descending order. If A is the sorted list of files, we can apply this algorithm:
1. B := empty list of files to copy to the current floppy disk
2. remaining_size := 1.44 MB
3. For each file in A:
4.     If remaining_size - file.size >= 0:
5.         B.add(file)
6.         A.remove(file)
7.         remaining_size := remaining_size - file.size
8. Copy all files listed in B to the current disk
9. Empty B
10. Repeat the process for the files still remaining in A
Although there are better algorithms than the one above, this is the one I managed to come up with during the interview.
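The pseudocode above can be sketched in Python as follows (the function name `pack_disks` and the sample sizes are mine, not part of the original interview answer). Instead of removing files from A while looping over it, the sketch collects the files that do not fit into a leftover list for the next pass:

```python
DISK_SIZE = 1.44  # capacity of one floppy disk, in MB

def pack_disks(sizes, capacity=DISK_SIZE):
    """Greedily fill one disk at a time from a list of file sizes.

    Returns a list of disks, each disk being a list of file sizes.
    """
    remaining = sorted(sizes, reverse=True)  # largest files first
    disks = []
    while remaining:
        free = capacity   # fresh disk
        disk = []
        leftover = []
        for size in remaining:
            if size <= free:        # the file still fits on the current disk
                disk.append(size)
                free -= size
            else:                   # save it for a later disk
                leftover.append(size)
        disks.append(disk)
        remaining = leftover
    return disks

disks = pack_disks([1.2, 0.3, 0.7, 0.2, 1.0, 0.4])
print(len(disks))   # number of floppy disks used
```

For these six sample sizes the greedy pass pairs large files with small filler files, using three disks.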
We now need to determine how fast our algorithm can run.
Worst Case Complexity
How slow can this algorithm get? If for any two files $a_i$ and $a_j$ in A we have $a_i.\mathrm{size} + a_j.\mathrm{size} > 1.44$, then every file will get its own diskette. If this is the case, our algorithm will execute step 4 for each remaining file on every pass. For the first disk, it will execute the step $n$ times; for the second disk, $n - 1$ times; for the third disk, $n - 2$ times, etc. The total number of times it executes step 4 is the total number of comparisons and is equal to the summation

$$n + (n - 1) + (n - 2) + \cdots + 1 = \sum_{i=1}^{n} i,$$

which is equal to

$$\frac{n(n+1)}{2}.$$

Therefore, in the worst case, the complexity is $O(n^2)$.
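We can sanity-check the $n(n+1)/2$ count by instrumenting the algorithm so it counts how many times the fit test in step 4 runs, then feeding it files that are each too large to share a diskette (the helper name `count_comparisons` is mine, not from the original post):

```python
DISK_SIZE = 1.44  # floppy disk capacity in MB

def count_comparisons(sizes, capacity=DISK_SIZE):
    """Run the greedy packing and return how many times step 4
    (the 'does this file still fit?' comparison) executes."""
    remaining = sorted(sizes, reverse=True)
    comparisons = 0
    while remaining:
        free = capacity
        leftover = []
        for size in remaining:
            comparisons += 1          # step 4: compare file size to free space
            if size <= free:
                free -= size
            else:
                leftover.append(size)
        remaining = leftover
    return comparisons

n = 10
worst = [1.0] * n          # any two of these together exceed 1.44 MB
print(count_comparisons(worst))   # n + (n-1) + ... + 1 = n(n+1)/2 = 55
```

Each pass scans every remaining file but packs only one, so the counts telescope down from $n$ to 1.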
Best Case Complexity
The best case is when all files fit on just one diskette. In that case the algorithm makes a single pass over A, so the total number of comparisons is $n$, and the best-case complexity is $O(n)$.
Average Case Complexity
On average, files have different sizes. We now compute the complexity under the assumption that the probability distribution involved is uniform.
If $m$ is the number of diskettes, the comparison counts for the successive diskettes form a monotonically decreasing sequence of $m$ numbers taken at random from the set $\{1, 2, \ldots, n\}$. Each of the numbers $i$, $1 \le i \le n$, has a probability $1/n$ of being chosen. Let $X_j$ be a random variable such that $X_j$ is the number of comparisons made while filling the $j$th diskette;

then the total number of comparisons is equal to

$$X_1 + X_2 + \cdots + X_m = \sum_{j=1}^{m} X_j.$$

The expected value of this sum is given by the linearity of expectation, and since the $X_j$ are identically distributed,

$$E\!\left[\sum_{j=1}^{m} X_j\right] = \sum_{j=1}^{m} E[X_j] = m\,E[X].$$

However, the expected value of $X$ is given by

$$E[X] = \sum_{i=1}^{n} i \cdot \frac{1}{n} = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2}.$$
What remains is to determine the average value of $m$, which is the number of diskettes. If $s$ is the maximum file size (here $s = 1.44$ MB), the average file size is $s/2$. The average total file size is then $ns/2$. The average number of diskettes is equal to the average total size divided by the size of a diskette, that is,

$$m = \frac{ns/2}{s} = \frac{n}{2}.$$
This means that

$$E\!\left[\sum_{j=1}^{m} X_j\right] = m \cdot \frac{n+1}{2} = \frac{n}{2} \cdot \frac{n+1}{2} = \frac{n(n+1)}{4} = O(n^2),$$

which is the same order as the worst-case complexity.
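The $n(n+1)/4$ estimate can also be checked empirically. The sketch below (my own instrumentation, not from the original post) packs uniformly distributed file sizes and averages the comparison count over a number of trials; the result should be of the same order as $n(n+1)/4$:

```python
import random

DISK_SIZE = 1.44  # floppy disk capacity in MB

def count_comparisons(sizes, capacity=DISK_SIZE):
    """Number of times the fit test (step 4) runs during greedy packing."""
    remaining = sorted(sizes, reverse=True)
    comparisons = 0
    while remaining:
        free = capacity
        leftover = []
        for size in remaining:
            comparisons += 1          # the step-4 comparison
            if size <= free:
                free -= size
            else:
                leftover.append(size)
        remaining = leftover
    return comparisons

random.seed(42)                       # fixed seed for reproducibility
n, trials = 200, 50
avg = sum(count_comparisons([random.uniform(0.0, DISK_SIZE) for _ in range(n)])
          for _ in range(trials)) / trials
print(avg, n * (n + 1) / 4)          # the two should be of the same order
```

The empirical average will not match $n(n+1)/4$ exactly, since the model above only approximates how the greedy passes shrink, but it grows quadratically in $n$ as predicted.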
There is another way to solve this problem using Integer Programming.
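For completeness, here is the standard integer-programming formulation of this (bin-packing) problem; the decision variables $x_{ij}$ and $y_j$ are the usual textbook notation, not something from my interview answer. With $s_i$ the size of file $i$, let $y_j = 1$ if diskette $j$ is used and $x_{ij} = 1$ if file $i$ is placed on diskette $j$:

$$\min \sum_{j=1}^{n} y_j$$

subject to

$$\sum_{j=1}^{n} x_{ij} = 1 \qquad i = 1, \ldots, n,$$

$$\sum_{i=1}^{n} s_i\, x_{ij} \le 1.44\, y_j \qquad j = 1, \ldots, n,$$

$$x_{ij},\, y_j \in \{0, 1\}.$$

The first constraint puts every file on exactly one diskette, and the second prevents any used diskette from exceeding 1.44 MB. Solving this exactly gives the true minimum number of disks, at the cost of solving an NP-hard optimization problem rather than running the simple greedy pass above.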