An n-gram language model learns probabilities by counting word sequences (n-grams) in a large training corpus. A bigram model, for example, estimates the probability of each word given the single word that precedes it: P(w | prev) = count(prev, w) / count(prev). These learned probabilities are then used to score new sentences and to compute perplexity, a measure of how well the model predicts held-out text (lower is better).
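The sketch below illustrates these three steps in Python: counting bigrams, estimating probabilities, and computing perplexity. The function names, the sentence-boundary markers, and the add-one (Laplace) smoothing used to avoid zero probabilities are illustrative assumptions, not something specified above.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over a corpus given as lists of tokens."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]  # assumed boundary markers
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    """P(word | prev) with add-one smoothing (an assumption; unsmoothed
    counts would assign zero probability to unseen bigrams)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def perplexity(sentence, unigrams, bigrams, vocab_size):
    """Perplexity = exp of the average negative log-probability per bigram."""
    tokens = ["<s>"] + sentence + ["</s>"]
    log_prob = 0.0
    n = 0
    for prev, word in zip(tokens, tokens[1:]):
        log_prob += math.log(bigram_prob(prev, word, unigrams, bigrams, vocab_size))
        n += 1
    return math.exp(-log_prob / n)

# Tiny illustration on a toy corpus
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
unigrams, bigrams = train_bigram(corpus)
vocab_size = len(set(w for s in corpus for w in s)) + 2  # +2 for <s>, </s>
print(perplexity(["the", "dog", "ran"], unigrams, bigrams, vocab_size))
```

A sentence made of frequent bigrams (such as "the cat sat") receives a lower perplexity than one full of pairings never seen in training, which is exactly the behavior the evaluation step relies on.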