Show HN: An open-source, self-hostable synced narration platform for ebooks

smoores.gitlab.io

Hi, I made a thing! This is by far the most work I've ever sunk into a side project; I've been working on this thing for over two years, and I'm super proud of it, even though there's still a lot more to do!

Storyteller is a self-hosted platform for ebooks with synced narration. This is basically self-hosted WhisperSync, for anyone familiar with that Amazon product.

It's currently made up of two self-hostable backend systems and a mobile app for reading and listening to the books it produces. Technically it uses an open spec, EPUB 3's "Media Overlay", for syncing the narration, but very few ebook apps actually support Media Overlays, and even fewer work well and have nice interfaces.
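For anyone unfamiliar with Media Overlays: the sync data lives in a small SMIL file that pairs each text fragment with an audio clip. Below is a minimal sketch of generating one in Python. The file names, fragment ids, and the `build_overlay` helper are made up for illustration, and this is not Storyteller's actual code.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"


def build_overlay(text_href, audio_href, clips):
    """Build a minimal Media Overlay (SMIL) document as a string.

    clips is a list of (fragment_id, begin_seconds, end_seconds) tuples,
    one per sentence. All names here are hypothetical.
    """
    smil = ET.Element("smil", {"xmlns": SMIL_NS, "version": "3.0"})
    body = ET.SubElement(smil, "body")
    for n, (fragment_id, begin, end) in enumerate(clips, start=1):
        # Each <par> pairs one text fragment with one audio clip.
        par = ET.SubElement(body, "par", {"id": f"par{n}"})
        ET.SubElement(par, "text", {"src": f"{text_href}#{fragment_id}"})
        ET.SubElement(par, "audio", {
            "src": audio_href,
            "clipBegin": f"{begin:.3f}s",
            "clipEnd": f"{end:.3f}s",
        })
    return ET.tostring(smil, encoding="unicode")


doc = build_overlay(
    "chapter1.xhtml",
    "chapter1.mp3",
    [("s1", 0.0, 2.5), ("s2", 2.5, 6.0)],
)
print(doc)
```

A reading app that supports Media Overlays plays each clip and highlights the matching element in the XHTML file while it plays.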

The mobile app is available on the Apple App Store as "Storyteller Reader", and I plan to release it for Android as well early next year.

Anyway, I hope someone finds this interesting or useful!


r4victor

Amazing! I made a similar ebook-audiobook aligner years ago: https://github.com/r4victor/syncabook. At that time, I chose to synthesize the text and align two audio sequences because I found text-alignment approaches (including ML-based ones) too compute-intensive and inadequate for long texts. I see Storyteller works by aligning the texts. Could you give some idea of how long it takes to sync a book?

Also, my experience was that audio and text versions are often very different (e.g. the audio having an intro missing from the text). It'd be very interesting to know how well Storyteller handles such cases. Does it require manual audio/text editing or handle the differences automatically?

smoores

Hello! syncabook is awesome, and indeed Storyteller does take "the opposite" approach when it comes to forced alignment.

Others have linked to the docs, where I go into detail about the syncing algorithm, but at a high level:

Storyteller uses Whisper to transcribe the audio to text (this is the most computationally expensive part of the process)

Then we use a Levenshtein-distance-based fuzzy search algorithm to find each chapter in the text (this is attempting to account for the difference between audio and text versions, as you said!)

Then for each chapter, we find the start and end timestamp of each sentence, again using a fuzzy search across the transcription.

In general, Storyteller does a pretty good job; it treats the ebook as the source of truth, which means that at the moment it sometimes misses introductory and ending pieces of the audiobook, though it's on the roadmap to have some support for explicitly triggering those when that happens.
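The fuzzy-search step described above can be sketched roughly as follows. This is an illustration under assumptions, not Storyteller's actual implementation: it assumes the transcription is available as a list of words with timestamps, works at the word level rather than the character level, and the `Word`, `fuzzy_find`, and `sentence_span` names are hypothetical. It uses the standard approximate-substring variant of Levenshtein distance, where a match may start at any position in the transcript at no cost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    start: float  # seconds into the audio
    end: float


def normalize(token: str) -> str:
    """Lowercase and strip punctuation so 'April,' matches 'april'."""
    return "".join(ch for ch in token.lower() if ch.isalnum())


def fuzzy_find(query: list[str], haystack: list[str]) -> tuple[int, int, int]:
    """Find the haystack window [start, end) closest to query.

    Returns (start, end, distance), where distance is the word-level
    Levenshtein distance between query and that window.
    """
    m = len(query)
    # prev[i] = (cost, window_start) for query[:i] against a window
    # ending at the previous haystack position.
    prev = [(i, 0) for i in range(m + 1)]
    best = (m, 0, 0)  # (distance, start, end); empty window costs m
    for j, word in enumerate(haystack, start=1):
        cur = [(0, j)]  # a window may begin anywhere for free
        for i in range(1, m + 1):
            substitution = 0 if query[i - 1] == word else 1
            cur.append(min(
                (prev[i - 1][0] + substitution, prev[i - 1][1]),  # match/replace
                (prev[i][0] + 1, prev[i][1]),       # extra transcript word
                (cur[i - 1][0] + 1, cur[i - 1][1]),  # missing sentence word
            ))
        if cur[m][0] < best[0]:
            best = (cur[m][0], cur[m][1], j)
        prev = cur
    distance, start, end = best
    return start, end, distance


def sentence_span(sentence: str, words: list[Word]) -> Optional[tuple[float, float, int]]:
    """Return (start_seconds, end_seconds, distance) for a sentence, or None."""
    query = [t for t in map(normalize, sentence.split()) if t]
    haystack = [normalize(w.text) for w in words]
    start, end, distance = fuzzy_find(query, haystack)
    if end == start:
        return None  # nothing in the transcript resembled the sentence
    return words[start].start, words[end - 1].end, distance
```

A caller would probably want to reject matches whose distance is large relative to the sentence length, since the search always returns its best guess even when the sentence is missing from the audio.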

NoahKAndrews

The docs say it's usually 1-4 hours depending on the book and the hardware: https://smoores.gitlab.io/storyteller/docs/syncing-books

The docs also have a detailed section about the algorithm that goes into how it auto-handles differences between the audio and the text.

cyberax

One obvious optimization is to sample the audio file at regular intervals and transcribe only part of the audio. Then just interpolate the locations in between. This can speed it up by a couple of orders of magnitude.
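A minimal sketch of the interpolation idea, assuming the sampled transcriptions have already been matched to positions in the text and reduced to sorted (text offset, audio seconds) anchor pairs. The `interpolate_time` helper and the anchor format are hypothetical.

```python
from bisect import bisect_right


def interpolate_time(anchors: list[tuple[int, float]], offset: int) -> float:
    """Estimate the audio time for a text offset by linear interpolation.

    anchors must be sorted by text offset, with strictly increasing offsets.
    Offsets outside the anchored range are clamped to the nearest anchor.
    """
    if not anchors:
        raise ValueError("at least one anchor is required")
    offsets = [text_offset for text_offset, _ in anchors]
    i = bisect_right(offsets, offset)
    if i == 0:
        return anchors[0][1]
    if i == len(anchors):
        return anchors[-1][1]
    (o0, t0), (o1, t1) = anchors[i - 1], anchors[i]
    # Assume a constant reading speed between the two anchors.
    return t0 + (t1 - t0) * (offset - o0) / (o1 - o0)


anchors = [(0, 0.0), (1000, 60.0), (3000, 200.0)]
print(interpolate_time(anchors, 500))
print(interpolate_time(anchors, 2000))
```

The estimate is only as good as the constant-speed assumption between anchors, so pauses or changes in narration pace show up as timing drift.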

smoores

This is true, but it really limits the ability to highlight the current sentence visually while it's being read, which is great for language learning and for reducing cognitive load. I actually spent a lot of time trying to get the timing as precise as I could to make this feel as natural as possible, and I think the effect is really nice!

sphars

This is really neat, it's something I hadn't thought about before. I've started listening to audiobooks on my commute, but I read at night. I currently use audiobookshelf[0] to listen to my audiobooks, and it has support for ebooks as well. I've added a comment[1] to a discussion asking whether audiobookshelf could read the EPUBs your tool creates.

[0]: https://www.audiobookshelf.org/

[1]: https://github.com/advplyr/audiobookshelf/issues/189#issueco...

smoores

> I've started listening to audiobooks on my commute, but I read at night.

This is basically exactly why I started down this road over two years ago. I really wanted to be able to switch back and forth between my audiobooks and their text representations!

Thanks for mentioning Storyteller in that discussion, I'll have to hop in there!

sphars

Looking forward to trying out the Android app when it's available!

rpxio

I absolutely love this. However, my wife and kids all read EPUBs on Kobo e-readers, so I wish we could somehow sync the last page read from Kobo to Storyteller so that we could pick up in the audiobook later. I’m not opposed to installing KOReader on all of our Kobos either if that would be required for syncing… it does look like KOReader doesn’t support EPUB 3 Media Overlays, but it does have a sync feature.

smoores

Thanks! Server-side position syncing and integration with KOReader are both on the roadmap, actually; you're not the first one to bring this up!

whycome

Amazon now has the text (books), and the audio (audible), and it’s absurd that there’s not some sort of sync feature. It would actually encourage people to cross-purchase books. There are so many times that I’m reading an ebook and I want to continue while driving and wish there was some sort of obvious and seamless “handoff” to continue with the audio version.

Infinitesimus

This has existed for a while, it's called Whispersync for Voice. Not available for all titles but it's there

whycome

TIL!

It looks like the feature is only available if you “add on” the audible version when you’re making your ebook purchase? And, for limited titles.

If I just bought a book in audible, there should be a “buy ebook” button in that app! And if I have the book in kindle, it should give me the option to add on the audio book after purchase. Seems like a missed opportunity— there must be a reason for it being so clunky.

Edit: I have not been able to find a single whispersync title. Looks like it’s not enabled at all in Canada? And the US books that have the feature don’t even follow the setup (eg icon) as described on the website (https://www.audible.com/ep/wfs)

smoores

Wow, this really blew up while I wasn't looking! Thank you everyone who's popped in here to ask questions and give feedback. If anyone does spend some time trying to set this up, please don't hesitate to hop into our Gitter channel (https://smoores.gitlab.io/storyteller/docs/say-hi) and say hi or ask for support or give feedback.

0x073

More information would be nice: a link to the iOS app, screenshots, or a list of the features the project has.

Is it an ebook/audiobook library like audiobookshelf with sync, or just sync? (https://www.audiobookshelf.org/)

smoores

That's a great point; I'll try to add some more of these. I definitely meant to link to the app store page from the docs; I actually just updated them to include that.

It's a full ebook/audiobook library, with sync, though I've focused much more on the reader experience so far than the library management experience. Improving the library management experience is on the horizon, though!

joshstrange

Finding the app wasn’t super easy, I do wish they’d link to it from the mobile apps page

https://apps.apple.com/us/app/storyteller-reader/id647446772...

mike986

Super cool project!

> even though there's still a lot more to do

A few have asked on this thread already, but since you're already using AI to transcribe, it would be super cool if we could use AI to generate audio using TTS

I quit Audible (after signing up a few times) because there are very few high-quality audiobooks; even those read by the authors are bad (most of them are not pro narrators)

A good AI would be amazing, as they never get tired speaking for hours, yet maintaining the same energetic voice, intonation and pace.

smoores

It's on the list! https://gitlab.com/smoores/storyteller/-/issues/9

sandreas

This is pretty interesting...

I once wrote a similar thing for building a custom LJSPEECH dataset out of ebook/audiobook combinations to synthesize my favorite narrator voices using coqui-tts and the VITS model and make them "publish" books that never came out as audiobook.

It was able to synchronize the book contents to timestamps, split the spoken word into sentences and create LJSPEECH datasets out of the combinations. I used aeneas[1]; it was a bit finicky to set up, but after a while it was even able to map non-English languages (in my case German) with more than 80% accuracy. Worked out pretty well, the LJSPEECH datasets were good (I still have them here), but the TTS tech was not there yet :-) Maybe it's time to revive this project using newer modelling approaches like XTTS or something...

[1]: https://www.readbeyond.it/aeneas/

vagrantJin

I thought about exactly this a few years back but lacked the technical skills to implement it. There are some great books out there, as you mentioned, but even worse are great books with mediocre narration/production. E.g., A Song of Ice and Fire on Audible is absolutely horrid. The Martian by Andy Weir is fantastic. Can I transplant Wil Wheaton or Greg Tremblay into GOT? Can I have multiple characters narrated by different voices?

Please revisit it if you can.

aedocw

You can do this today, though you would definitely be breaking copyright (you need to strip the DRM from the epub), and if you're cloning someone's voice without their permission you're probably breaking some more laws. You're pretty safe though assuming you don't distribute it or try to make money.

Check out https://github.com/aedocw/epub2tts for creating an audiobook from epub. Take a look in the utils directory for notes about fine-tuning a voice clone. I can tell you I've done some voices that are close enough to the original to be pretty shocking.

Feel free to get in touch if you have any questions, it's pretty fun making your own audiobooks with the reader of your choice!

monkeywork

IMHO the original narration on The Martian by RC Bray is better than Wil's. I enjoyed Wil's work on Ernest Cline's books but RC Bray and Dennis E. Taylor are (for me) top of the mountain when it comes to SF narration.

cyberax

You didn't include the link: https://smoores.gitlab.io/storyteller/

Looks super nice, the next step is to build a fully synced ecosystem for book management.

AnyTimeTraveler

You mean a system like Audiobookshelf[0]? I can highly recommend this, by the way. Works more reliably than any paid service I've ever tried.

[0] https://www.audiobookshelf.org/

cyberax

I'm more interested in something that would unite audiobooks and textual books.

I love to jump between listening (in my car or while walking) and reading. Right now, only Amazon Kindle + Audible provides a good experience, but it's impossible to import your own audiobooks into Audible.

smoores

Yup, this is the goal! Library management and a reader app already exist, though there's definitely work to be done, especially on the library management front.

qwerty456127

What I really want to get from the new era of machine learning we are supposedly going through is human-quality self-hosted text-to-speech and speech-to-text, so I would be able to listen to text ebooks and convert big podcasts and video/audio lecture courses to text, making it easy to search through them and quote phrases from them. Is this it? Whatever I could find so far was either significantly worse than what a human could do or an expensive online service.

bberenberg

Amazing, I’ve been wanting something like this for years. If only Libby would integrate this so it could be used with rented books.

It would be great if you could add a link to the app on the App Store.

smoores

I keep forgetting to do this! Here's a link: https://apps.apple.com/us/app/storyteller-reader/id647446772... and I'll push up a change to the docs right now that includes that.

ck_one

What’s your use case for it?

bberenberg

Audiobooks while running / cooking / other activity where reading doesn’t make sense.

Ebook elsewhere.

timmb

Looks great! Is there an e-ink e-reader it’s compatible with? Would love to abandon the Amazon castle but could not go back to reading on a screen.

jupiter909

Looks like an interesting project.

I do highly suggest that a quick intro demo video and/or screen shots of a tool like this would be beneficial to the project.

smoores

Thanks! I think you're probably right. In the meantime, there are some screenshots on the App Store page for the reader app: https://apps.apple.com/us/app/storyteller-reader/id647446772...
