Do you like to read? Do you think that books should be free? Do you have some spare time?
Project Gutenberg (PG) (www.gutenberg.org) is an organization devoted to making electronic texts available free to everyone. All PG books are free of copyright in the United States. This means that the vast majority of the texts are old. The texts are scanned, digitized, and uploaded to the site in several formats. Most are available as plain text and HTML. Many are available in special formats for e-readers. PG also publishes audio books, musical scores and recordings, and photographs.
There are many ways to help PG, but the easiest is to volunteer with Distributed Proofreaders (DP) (www.pgdp.net). The concept of DP is to distribute the work of preparing an electronic book among a number of individuals, each doing a small part of the project. The book starts with a scanned image that has been processed through an OCR (optical character recognition) program. The OCR text is examined by three rounds of proofreaders, with each page being seen by three different individuals. After proofreading, the text goes to the formatters, who add markup for things like italics, boldface, poetry, block quotations, and chapter headings. After formatting, the entire text goes to the post-processor, who will put the pages together, resolve any issues raised by the proofers or formatters, and prepare a final text and usually also an HTML version of the book, which will be uploaded to Project Gutenberg.
As a beginner with DP, you can volunteer for three types of tasks: proofreading, formatting, and smooth reading. For proofreading or formatting, you must register with DP and get a password. Anyone can do smooth reading.
Beginning proofreaders work in round P1. You will see only one page at a time of the text, and the pages you see may not be consecutive. Your job is to compare the scans with the OCR output and correct the OCR to match the scan. You are also expected to make a comment if you see something you think is a typographical mistake in the original. Many of the books available in P1 are in languages other than English, but that doesn’t mean you can’t work on them! There is a wide range of books to choose from, and you can stay with one book or skip around and try a few pages of several. For example, at the time I’m writing this article, texts in the P1 round include a biography of an admiral, a mathematical text, a history of travels in the western United States from 1748-1846, the 11th edition of Encyclopaedia Britannica, a chemistry text in French written by Lavoisier, and many other fascinating topics. If you are the kind of person, like me, who enjoys correcting the errors of others, you will find proofreading addictive. You will also get a good laugh now and again when the OCR misreads the text in a funny way, as it did this morning when I found a reference to “the virgin Marv” in a book about the Knights Templars.
Formatting will appeal to the type of person who likes to arrange and organize things. As a beginning formatter, you will work in round F1. You will also see one page at a time, with the original scan and the output of the third round of proofreading. Some pages will need no marking at all, others a lot. A large table, for example, may take an hour or more to format. It’s your job to make sure the spacing is correct around the titles, to mark the footnotes and illustrations, and to make sure the lines of a poem are indented correctly. Some formatters even specialize in tables or indices. At this time, there isn’t a lot of work for F1, but there is a big backlog in F2. That’s because to qualify for F2 you must do a certain number of pages in F1 and pass a test. There aren’t enough F2 formatters to keep up with the output from F1.
What if you want to read the whole book instead of getting one page at a time? Then smooth reading is for you. The smooth reader gets the book, usually in plain text format, after the post-processor has put it together, but before it is uploaded to Project Gutenberg. All you have to do is to read the book as you normally would, paying extra attention to possible errors in punctuation or spelling that may have slipped through. You make your notes right in the text, and send the annotated text back to the post-processor. The fun of smooth reading is that you get to see the book before anyone else does, and you get to read things that you perhaps wouldn’t normally read. Examples of books now available for smooth reading are: Old Time Wall Papers, The Camp Fire Girls Solve a Mystery, History of Painting in Italy, The Trial of Oscar Wilde, and A Population Study of the Prairie Vole. There’s something for everyone in smooth reading!
I hope this short description of Project Gutenberg and Distributed Proofreaders has piqued your interest and that you’ll want to check out some of the fascinating old books on these sites.
I thought about editing for Project Gutenberg, but I never really looked further into it. I have enjoyed reading classics like The Age of Innocence, and it is good to know there is a way to give back to the site for those who have the time.
The Virgin Marv reminds me of a similar OCR typo I found in a book I was reading: The King of Spam (rather than Spain).
Since I already spend a lot of time editing manuscripts, volunteering for Project Gutenberg is not right for me, but I am glad that there are people doing this. One of my favorite Project Gutenberg books is a biography of Aaron Burr, which I often return to when I have a spare moment during the day.
It’s good that older books are being preserved and revitalized in this way, and I applaud those who volunteer to do this important work.