# Probabilistic Retrieval Models

For further discussion we'll make two important assumptions:

The usefulness of a document depends on the number of documents the user has already seen: the more documents the user has seen, the less useful each additional one is.

The relevance of \(D_i\) to \(Q\) is independent of the other documents \(D_j\) in the collection. Therefore we can estimate relevance for each document separately.
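As a rough illustration of the independence assumption, the sketch below scores every document on its own and then sorts by score. The names `rank_independently` and `toy_relevance_prob` are hypothetical, and the toy score (the fraction of query terms found in the document, for a non-empty query) is only a stand-in for a real estimate of the probability of relevance.

```python
from typing import Callable, List, Tuple


def rank_independently(
    docs: List[str],
    query: str,
    relevance_prob: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each document separately, then sort by descending score."""
    # Each score depends only on (doc, query), never on the other documents.
    scored = [(doc, relevance_prob(doc, query)) for doc in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_relevance_prob(doc: str, query: str) -> float:
    """Hypothetical stand-in: fraction of query terms present in the document."""
    terms = query.lower().split()  # assumes a non-empty query
    words = set(doc.lower().split())
    return sum(term in words for term in terms) / len(terms)


docs = [
    "cooking pasta at home",
    "retrieval of documents",
    "probabilistic retrieval models",
]
ranking = rank_independently(docs, "probabilistic retrieval", toy_relevance_prob)
```

Because the scoring function never looks at the rest of the collection, any per-document estimator could be plugged in here without changing the ranking step.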

## Notation

Let \(R \in \{r, \neg r\}\) be a binary random variable that indicates relevance:

\(r\) represents the event that document \(D\) is relevant

\(\neg r\) represents the event that \(D\) is not relevant

We need to estimate the probability that a document \(D\) is relevant to a query \(Q\). In other words, we need to find:

\(P(R=r|D, Q) \) - the probability that \(D\) is relevant to \(Q\)

\(P(R=\neg r| D, Q)\) - the probability that \(D\) is not relevant to \(Q\)

Applying Bayes' theorem, we can express these probabilities as:

\(P(R=r|D,Q)=\frac{P(D, Q|R=r)P(R=r)}{P(D,Q)}\)

\(P(R=\neg r| D,Q)=\frac{P(D,Q|R=\neg r)P(R=\neg r)}{P(D,Q)}\)
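The two formulas above can be checked numerically with a minimal sketch. It assumes the likelihoods \(P(D,Q|R=r)\), \(P(D,Q|R=\neg r)\) and the prior \(P(R=r)\) are already known, and it expands the denominator with the law of total probability: \(P(D,Q)=P(D,Q|R=r)P(R=r)+P(D,Q|R=\neg r)P(R=\neg r)\). The function name `posterior_relevance` and the input numbers are made up for illustration.

```python
from typing import Tuple


def posterior_relevance(
    lik_rel: float, lik_nonrel: float, prior_rel: float
) -> Tuple[float, float]:
    """Return (P(R=r | D,Q), P(R=not r | D,Q)) via Bayes' theorem.

    lik_rel    -- P(D,Q | R=r), assumed given
    lik_nonrel -- P(D,Q | R=not r), assumed given
    prior_rel  -- P(R=r), assumed given
    """
    prior_nonrel = 1.0 - prior_rel
    # Law of total probability gives the shared denominator P(D,Q).
    evidence = lik_rel * prior_rel + lik_nonrel * prior_nonrel
    if evidence == 0.0:
        raise ValueError("P(D,Q) is zero; the posterior is undefined")
    p_rel = lik_rel * prior_rel / evidence
    p_nonrel = lik_nonrel * prior_nonrel / evidence
    return p_rel, p_nonrel


# Illustrative (made-up) inputs.
p_rel, p_nonrel = posterior_relevance(lik_rel=0.02, lik_nonrel=0.001, prior_rel=0.1)
```

Since both posteriors share the denominator \(P(D,Q)\), they sum to one, and comparing them only requires comparing the numerators.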