The context is the part of a discourse that surrounds a word or passage and can shed light on its meaning. In practice, it is the words or tokens around the word or token you are analysing.
To learn about the meaning of a word, you can look at, and count, the words that surround it (its context). If you do this over a lot of text, you get an idea of what the context of a particular word usually is. You can then compare the context of one word with that of another to see how similar they are. If the contexts are similar, the words probably mean something similar; this is the distributional hypothesis. From the counts of words appearing in the context you can also construct a vector, which works really well because you can then compare words by how similar their vectors are.
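The counting and comparing described above can be sketched in a few lines of Python. This is only a minimal illustration: the toy corpus, the window of two tokens on each side, the function names `context_vectors` and `cosine`, and the choice of cosine similarity as the comparison are all assumptions made for the example, not something fixed by the method.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenisation
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # skip the target word itself
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus, invented purely for illustration.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
]

vectors = context_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
print(f"cat~dog: {sim_cat_dog:.2f}, cat~milk: {sim_cat_milk:.2f}")
```

In this toy corpus "cat" and "dog" occur in nearly the same contexts, so their vectors end up much closer to each other than the vectors of "cat" and "milk".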
This is much like Lesk similarity, where you check the overlap between the glosses of two words in the thesaurus. With context, you do the same thing with the contexts of the words in a corpus instead. Arguably this is better, since it requires no human annotation.
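For comparison, a gloss-overlap measure in the spirit of Lesk could look roughly like the sketch below. The glosses are made up for the example rather than taken from a real thesaurus, and the small stopword list and the name `gloss_overlap` are likewise assumptions.

```python
# Hypothetical stopword list; a real one would be longer.
STOPWORDS = frozenset({"a", "an", "the", "of", "or", "that", "and", "in", "to"})


def gloss_overlap(gloss_a, gloss_b, stopwords=STOPWORDS):
    """Count the distinct non-stopword words shared by two glosses."""
    words_a = set(gloss_a.lower().split()) - stopwords
    words_b = set(gloss_b.lower().split()) - stopwords
    return len(words_a & words_b)


# Invented glosses, for illustration only.
glosses = {
    "car": "a road vehicle with an engine and four wheels",
    "truck": "a large road vehicle with an engine used to carry goods",
    "banana": "a long curved yellow fruit",
}

car_truck = gloss_overlap(glosses["car"], glosses["truck"])
car_banana = gloss_overlap(glosses["car"], glosses["banana"])
print(car_truck, car_banana)
```

The shape is the same as with corpus contexts: collect the words associated with each target word and measure how much they overlap. The difference is that here the associated words come from a human-written resource.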
This only approximates meaning. The computer never really grasps what a word means, but it can tell that two words are likely to mean the same thing (are synonyms) because their vectors are close.
Contexts can be any size you want. You can take an entire document as the context or, for instance, a single sentence.
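To make the effect of context size concrete, here is a small sketch that counts the words co-occurring with a target word, once with the sentence as the context and once with the whole document. The two-sentence document and the helper name `cooccurrence_counts` are invented for the example.

```python
from collections import Counter


def cooccurrence_counts(target, contexts):
    """Count the words that share a context unit with `target`."""
    counts = Counter()
    for context in contexts:
        tokens = context.lower().split()
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    return counts


# A tiny invented document made of two sentences.
sentences = [
    "the bank raised interest rates",
    "customers moved their savings elsewhere",
]
document = " ".join(sentences)

# Context = sentence: only words from the sentence containing "bank" count.
sentence_context = cooccurrence_counts("bank", sentences)

# Context = document: every other word in the document counts.
document_context = cooccurrence_counts("bank", [document])

print(sorted(sentence_context))
print(sorted(document_context))
```

With the sentence as context, "bank" is only associated with words from its own sentence; with the document as context it also picks up "savings" and "customers" from the next sentence.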