Proposition: If we are given n keys, where only m of these keys can open a door, then on average we would have to try \frac{n+1}{m+1} keys to open the door.

I previously discussed a similar problem in which I derived that since the probability of finding a correct key on the kth try is

P(X = k) = \displaystyle \frac{\displaystyle \binom{n-m}{k-1}m}{\displaystyle k\binom{n}{k}}

then in theory it would take \frac{n+1}{m+1} tries to open the door because

\displaystyle E[X] = \frac{n+1}{m+1}.
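Before turning to simulation, the formula can be checked numerically. The sketch below (in Python rather than MATLAB, and with a hypothetical helper `pmf(n, m, k)`) sums the probability mass function over its support and confirms that the probabilities sum to 1 and that the mean is \frac{n+1}{m+1}:

```python
from math import comb

# P(X = k): probability that the first correct key appears on draw k,
# for n keys of which m are correct (formula from the post).
def pmf(n, m, k):
    return comb(n - m, k - 1) * m / (k * comb(n, k))

n, m = 15, 3
ks = range(1, n - m + 2)          # k = 1, ..., n-m+1 (worst case: all wrong keys first)
probs = [pmf(n, m, k) for k in ks]
total = sum(probs)
mean = sum(k * p for k, p in zip(ks, probs))
print(total)   # should be 1.0 up to rounding
print(mean)    # should be (n+1)/(m+1) = 4.0
```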

I became interested in modeling this proposition, so I implemented a MATLAB program, try_keys(n,m), in which each key is represented by a number from 1 to n and each correct key is represented by a number from 1 to m. When try_keys(n,m) is executed, it outputs a random sequence of draws that starts with any key and ends with a correct key.

Now, in theory, if I had 15 keys, of which 3 are correct, then on average I would expect to open the door on the 4th try since

\displaystyle \mu_{th}  = \frac{15+1}{3+1} = 4.

For Experiment 1 I ran try_keys(15,3) a total of 10 times

try_keys(15,3)     =    13     7     4     6     5     1
try_keys(15,3)     =     4     2
try_keys(15,3)     =     3
try_keys(15,3)     =     4     5     1
try_keys(15,3)     =    14    10     5     4     2
try_keys(15,3)     =    12     4     3
try_keys(15,3)     =     7     1
try_keys(15,3)     =     2
try_keys(15,3)     =    15    10     6     1
try_keys(15,3)     =     4     9     1

and for Experiment 2 I repeated the same thing

try_keys(15,3)     =    10     8     7     5     6     3
try_keys(15,3)     =    12     2
try_keys(15,3)     =    11     2
try_keys(15,3)     =     6     7     8     1
try_keys(15,3)     =    14     8     5     4     6     9     7    10     3
try_keys(15,3)     =     5     3
try_keys(15,3)     =    13     2
try_keys(15,3)     =     4     2
try_keys(15,3)     =     4     5    10     2
try_keys(15,3)     =    14    10     5     2

So from the first and second experiments, the average numbers of tries needed to open the door,

\mu_{e1} = \frac{1}{10}(6+2+1+3+5+3+2+1+4+3) = 3

\mu_{e2}=\frac{1}{10}(6+2+2+4+9+2+2+2+4+4) = 3.7

are within 25% and 7.5% of \mu_{th}, respectively.
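The two experimental averages can be reproduced from the lengths of the ten printed sequences in each experiment (a small Python sketch, included only for arithmetic bookkeeping):

```python
# Lengths of the ten key sequences printed in each experiment above.
exp1 = [6, 2, 1, 3, 5, 3, 2, 1, 4, 3]
exp2 = [6, 2, 2, 4, 9, 2, 2, 2, 4, 4]

mu_e1 = sum(exp1) / len(exp1)
mu_e2 = sum(exp2) / len(exp2)
print(mu_e1, mu_e2)   # 3.0 3.7
```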

Now, by the law of large numbers, we would expect that if we ran try_keys(15,3) an infinite number of times then the average would converge to 4. To illustrate this law I implemented average_try_keys(N,n,m), which runs try_keys(n,m) N times and computes the average, and plot_average_try_keys(L,n,m), which plots average_try_keys(i*round(L/100),n,m) for integer values of i from 1 to 100. Here are the plots that I got from plot_average_try_keys(L,15,3) using L values 1e2, 1e3, 1e4, 1e5, and 1e6.

And here is the code that I used to generate these plots.

try_keys

function [ s ] = try_keys(n,m)
%There are n keys, which will be
%represented by the numbers 1,2,...,n.
%There are m correct keys, which will be
%represented by the numbers 1,2,...,m,
%where m is less than or equal to n.
if( m > n || n <= 0 || m <= 0)
    error('try_keys: inputs must satisfy 1 <= m <= n');
end

%s is the sequence of keys drawn,
%where s(1) is the first key drawn,
%s(2) the second, and so on. In the
%worst case we draw all n-m incorrect
%keys before a correct one, so at most
%we will try a total of n-m+1 keys.
%s = zeros(1,n-m+1);
%We are assuming that the keys are
%uniformly distributed, so we will be
%using randi(n), but we won't be
%accepting keys which we have already
%tried.
s(1) = randi(n);

%We are done when we find a correct key,
%so we check whether the first key is
%correct. If not, then we draw another
%key, and so on.
done = any(s <= m);
draw = 1;
while( done == 0 )
    isnewkey = 0;
    while( isnewkey == 0 )
        newkey = randi(n);
        if( ~any(s == newkey) )
            s(draw + 1) = newkey;
            isnewkey = 1;
            if( any(s <= m) )
                done = 1;
            else
                draw = draw + 1;
            end
        end
    end
end
end

average_try_keys

function mu = average_try_keys(N,n,m)
s = zeros(1,N);
for i=1:N
    s(i) = size(try_keys(n,m),2);
end
mu = mean(s);
end

plot_average_try_keys

function plot_average_try_keys(L,n,m)
Num_points = 100;
expected_value = (n+1)/(m+1);
y = zeros(1,Num_points);
x = (1:Num_points)*round(L/Num_points);
for i=1:Num_points
    y(i) = average_try_keys(x(i),n,m);
end
plot(x,y,'-ro',x,expected_value*ones(1,Num_points),'-b')
h = legend('averages','expected value','Location','northwest');
set(h,'Interpreter','none')
end
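For readers without MATLAB, here is a rough Python port of the same experiment (a hypothetical translation, not the original code). Shuffling the n keys once and scanning until a key numbered 1 through m appears is equivalent to the rejection loop in try_keys, since both amount to drawing keys uniformly at random without replacement:

```python
import random

def try_keys(n, m):
    """Draw keys without replacement until one of the m correct
    keys (numbered 1..m) is found; return the sequence of draws."""
    keys = list(range(1, n + 1))
    random.shuffle(keys)          # a uniformly random ordering of the keys
    s = []
    for key in keys:
        s.append(key)
        if key <= m:              # keys 1..m are the correct ones
            break
    return s

def average_try_keys(N, n, m):
    return sum(len(try_keys(n, m)) for _ in range(N)) / N

random.seed(0)
avg = average_try_keys(100000, 15, 3)
print(avg)   # close to the theoretical value of 4
```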

Given n possible keys that are all equally likely to be chosen, if only 3 keys are correct, then on average how many keys would I have to try to find a correct one?

First let's try to clarify the problem by picturing the following scenario. We have a bucket with n keys and we would like to open a door, but only 3 keys will work. We randomly choose a key and try to open the door. If the door opens then we are done; but if it doesn't open then we randomly choose another key, and so on. Now if we were to attempt to open this door then we would expect to do so after trying m_1 keys, where m_1 is some number between 1 and n-2. If a friend were to attempt to open this door then we would expect some number m_2 between 1 and n-2, and from a kth friend we would get some number m_k, also between 1 and n-2. Hence, we could argue that on average we would need to try \frac{1}{k} \sum_{i = 1}^k m_i keys to find a correct one, but each m_i is a random variable, so if we were to repeat this experiment then we might get a different average! However, if k is a large number then we would expect any experimental averages to be close to each other, and if we let k approach infinity then these averages would converge to the expected value.

So we are interested in calculating the expected value but first we need to derive a probability mass function which would tell us the probability of finding the correct key on the kth try, P(X = k). Then the expected value will be given by

\displaystyle  E[X] = \sum_{k = 1}^{n-2}k P(X = k) .

For P(X = 1), we are interested in

P(finding correct key on 1st try)

which is given by

P(X = 1) = \frac{\begin{array}{c}\#~of~ ways~ in~ which  \\ 1st~ choice~ could ~be~ a~ correct~ key \end{array}}{\begin{array}{c} \#~ of~ ways~ in~ which \\ 1st~ key~ could~ be~ chosen\\ \end{array}}.

Since there are 3 correct keys then there are 3 ways in which the 1st choice could be a correct key. And, since there are n keys then there are n ways in which 1st key could be chosen. So

P(X = 1) =  \displaystyle \frac{3}{n}.

For P(X = 2), we are interested in

P(finding correct key on 2nd try)

which can be thought of as

P(1st key was not correct and 2nd key is correct)

which is given by

P(X = 2) =  \frac{ \begin{array}{c} \#~of~ ways~ in~ which  \\ 1st~ choice~ could ~be~ incorrect\\ and~2nd~correct \end{array}}{\begin{array}{c} \#~ of~ ways~ in~ which \\ 1st~and~2nd~ keys~ could~ be~ chosen\\ \end{array}}.

Since there are n-3 ways in which the 1st choice could be incorrect and 3 ways in which the 2nd choice could be correct, by the fundamental principle of counting there are (n-3)3 ways in which the 1st key could be incorrect and the 2nd correct. Similarly, since there are n ways in which the 1st key could be chosen and n-1 ways in which the 2nd key could be chosen, there are n(n-1) ways in which the 1st and 2nd keys could be chosen. So,

P(X = 2) = \displaystyle \frac{(n-3)3}{n(n-1)}.

It follows that

P(X = 3) = \displaystyle \frac{(n-3)(n-3 - 1)3}{n(n-1)(n-2)}

so in general

P(X = k) = \displaystyle \frac{(n-3)(n-3 -1) \cdots (n-3 - (k-2))3}{n(n-1)(n-2) \cdots (n - (k-1))}

which can be simplified to

P(X = k)  = \displaystyle \frac{\displaystyle \binom{n-3}{k-1}3}{\displaystyle k\binom{n}{k}}
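To double-check this simplification, we can enumerate every ordering of the keys for a small case, say n = 7 with 3 correct keys, and compare the exact distribution of the first success against the closed form (a Python sketch, not part of the original post):

```python
from itertools import permutations
from math import comb

n, m = 7, 3   # small case: 7 keys, of which keys 1..3 are correct
counts = {}
for perm in permutations(range(1, n + 1)):
    # position of the first correct key in this ordering
    k = next(i for i, key in enumerate(perm, start=1) if key <= m)
    counts[k] = counts.get(k, 0) + 1
total = sum(counts.values())      # n! orderings in all

for k in sorted(counts):
    empirical = counts[k] / total
    formula = comb(n - m, k - 1) * m / (k * comb(n, k))
    print(k, empirical, formula)  # the two columns agree for every k
```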

where the binomial coefficient \binom{N}{i} is defined to be the coefficient of x^i in the polynomial expansion of (1 + x)^N.

So the average number of keys we would have to try to find a correct key is given by

\displaystyle E[X] = \sum_{k=1}^{n-2} \frac{\displaystyle \binom{n-3}{k-1}3}{\displaystyle \binom{n}{k}}

which results in

\displaystyle E[X] =  \frac{n+1}{4}.
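This closed form is easy to verify numerically by summing the series directly for a few values of n (again a Python sketch):

```python
from math import comb

# E[X] = sum_{k=1}^{n-2} C(n-3, k-1) * 3 / C(n, k), which should
# collapse to (n+1)/4 for every n >= 4 (3 correct keys assumed).
def expected_tries(n):
    return sum(comb(n - 3, k - 1) * 3 / comb(n, k) for k in range(1, n - 1))

for n in (4, 10, 15, 50):
    print(n, expected_tries(n), (n + 1) / 4)   # the last two columns match
```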