One interesting way to evaluate a text is to track the probability of encountering a new word as one progresses through it. In the chart below, I compare Romeo and Juliet, Pride and Prejudice, War and Peace and A Child's History of England. War and Peace follows a trajectory very similar to that of Romeo and Juliet, Austen sits clearly below this curve, and the book aimed at children is the one in which a reader is most likely to encounter novel terms. This latter observation might be explained by the fact that a historical account likely introduces a continuous stream of new characters and locations, whereas fiction tends to focus on a limited number of both.
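In the chart below, the vertical axis indicates the size of the vocabulary (the number of distinct words seen so far) and the horizontal axis represents progress through the book (i.e., words read). A steeper curve therefore means that new words are appearing at a higher rate.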
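For readers who want to reproduce this kind of curve, here is a minimal sketch in Python. It is not necessarily how the chart was produced: the function names (`tokenize`, `vocabulary_growth`, `new_word_rate`) and the tokenization rule (lowercase everything, keep only runs of letters and apostrophes) are my own assumptions, and a different rule would shift the curves somewhat.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words.

    Assumption: a "word" is any run of letters and apostrophes.
    This is a deliberate simplification, not a full tokenizer.
    """
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_growth(words):
    """Return a list whose i-th entry is the number of distinct
    words among the first i + 1 words read."""
    seen = set()
    growth = []
    for word in words:
        seen.add(word)
        growth.append(len(seen))
    return growth


def new_word_rate(growth, window):
    """Estimate the probability of meeting a new word as the fraction
    of new words within each successive, non-overlapping window."""
    rates = []
    for start in range(0, len(growth) - window + 1, window):
        before = growth[start - 1] if start > 0 else 0
        after = growth[start + window - 1]
        rates.append((after - before) / window)
    return rates


# Tiny stand-in for a full book.
sample = "the cat sat on the mat and the dog sat on the cat"
words = tokenize(sample)
growth = vocabulary_growth(words)
print(growth)
print(new_word_rate(growth, 4))
```

Plotting `growth` against its index gives the vocabulary-size curve described above, while `new_word_rate` gives a rough, windowed estimate of the chance that the next word is new. For a full book, a window of a few thousand words is probably a more sensible choice than the 4 used here.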