I am biased, but implementing the chapters of Intro to Information Retrieval in your favorite language, bit by bit, is a really good way to get a feel for the tradeoffs behind different index capabilities.
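To give a flavor of what that looks like, here is a minimal sketch of the sort of thing an early chapter has you build: an inverted index with a boolean AND query. The function names (`build_index`, `intersect`, `search_and`) and the whitespace tokenizer are my own assumptions for illustration, not code from the book.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to a sorted postings list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Assumption: naive whitespace tokenization, no stemming or stop words.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping ids present in both."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(index, query):
    """Return ids of docs containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, []) for term in terms]
    # Intersect the shortest lists first so intermediate results stay small.
    postings.sort(key=len)
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


docs = ["new home sales", "home sales rise", "rise in new listings"]
index = build_index(docs)
print(search_and(index, "home sales"))
```

Even at this size the tradeoffs show up: sorted postings make the merge linear, but phrase queries would need term positions as well, and that makes the index bigger.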
Going to second the rec on "Index". It's a very understandable, well-researched book that a general audience or even a skilled practitioner would enjoy.
Came here to recommend Managing Gigabytes as well. People these days are managing far more than gigabytes but the fundamental ideas remain useful.
Check out the first review on the Amazon page. Norvig read it around the time he started at Google.
Manning also have a book on Lucene, the library that powers Solr and ElasticSearch. IIRC the book covers how Lucene actually works under the hood and would therefore act as a good reference on the subject in general.
Taming Text is about building a question-answering system. It came out about the time Watson came online. It's not a plan but rather a cookbook of experiments using Apache products like Solr and OpenNLP, and it's a great tutorial on how question answering works.
Lucene in Action covers Lucene 3.0 and is from 2010. The current version is 9.4.2. So much has changed.