
sheepscreek

Great work! Thanks for taking the time to build this and open-sourcing it. For me, it scratches a very real itch. I’ve toyed with the idea of a personal AI assistant to act as a second brain and help me prioritize and remember things with human intuition (“you can’t afford to delay X; do X today, and maybe reach out to Z to set expectations for delaying Y?”).

To me, the greatest strength of LLMs is not their knowledge (which is prone to hallucination), but their ability to analyze ambiguous requests with ease and develop a sane action plan - much like a competent human.

On a side note: wouldn’t it be significantly cheaper, and nearly as effective, to use GPT-3.5 by default and reserve GPT-4 for special tasks with explicit instruction (“Use GPT-4 to…”)?

For most chats, GPT 4 would be incredibly wasteful (read: expensive).

Also - it would be very cool to experiment with using GPT-3.5 and GPT-4 in the same conversation! GPT-3.5 could leverage the analysis of GPT-4 and act as the primary communication “chatbot” interface for addressing incremental requests.
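
The hybrid-model idea above can be sketched as a tiny router: default to the cheap model and escalate only when the user explicitly asks. A minimal sketch in Python; the model names and trigger phrases are assumptions for illustration, not anything from the linked project.

```python
# Hypothetical model router: cheap model by default, escalate on an
# explicit "use GPT-4 ..." instruction in the user's message.
DEFAULT_MODEL = "gpt-3.5-turbo"
ESCALATED_MODEL = "gpt-4"

def pick_model(user_message: str) -> str:
    """Return which model to call for this message."""
    text = user_message.lower()
    if "use gpt-4" in text or "use gpt 4" in text:
        return ESCALATED_MODEL
    return DEFAULT_MODEL
```

The chosen model name would then be passed to the chat-completion call; the rest of the conversation state stays the same either way.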

jbellis

GPT4 is so much better than 3.5 at virtually everything that I don't think it's worth trying to figure out which tasks 3.5 is almost adequate for.

Also, the pricing is per token, so even with 4 the cost is close to negligible unless you are loading in a lot of context or your conversation gets very long.
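
The per-token pricing point is easy to sanity-check with a little arithmetic. The per-1K-token prices below are illustrative assumptions from around the time of this thread (OpenAI's actual pricing changes over time):

```python
# Rough per-request cost estimate. Prices are (prompt, completion) in
# USD per 1K tokens -- assumed values, check current pricing.
PRICES = {
    "gpt-3.5-turbo": (0.002, 0.002),
    "gpt-4": (0.03, 0.06),
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated dollar cost of one API request."""
    prompt_price, completion_price = PRICES[model]
    return (prompt_tokens / 1000 * prompt_price
            + completion_tokens / 1000 * completion_price)
```

Under these assumed prices, a short turn (500 prompt tokens in, 300 completion tokens out) is about 3.3 cents on GPT-4 versus 0.16 cents on GPT-3.5, which matches both points in the thread: negligible per chat, but 20x apart at scale.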

literalAardvark

The real problem with 4 is that OpenAI are having trouble keeping up with the demand.

So it might not be an option until 4 turbo comes out in 37 hours or whatever their development cycle is these days.

ericmcer

Seriously their release cycle has been so rapid that I am building stuff with the idea that AI will be better and cheaper by the time I am ready to release it.

Right now it is a bit of a blocker because you can easily get a single prompt to cost you .005c - .01c, which would crush you if you ever had any kind of scale.

hesdeadjim

Yea, I find it a fair bit annoying that I pay $20 and have a 25 message cap every 3 hours.

dror

GPT4 is definitely the future, but GPT3.5 is the present :-).

In addition to being more expensive, GPT4 is a lot slower. For most casual things I use GPT3.5 and upgrade to GPT4 as needed. I've actually had a couple of days where I spent > $1 on GPT4. It's hard to do with everyday chat, but easy to do when you get it to look at/improve large amounts of code.

This is all from the API/CLI not the web interface.

avereveard

This by default is using the zero-shot agent attached to Google search; it burns through tokens.

jjcon

Maybe it is just my niche use cases, but after spending a good few hours with both, 3.5 has actually produced more coherent outputs for me, which is a little confusing. Maybe I need to rethink my prompting or something.

russ

The one exception is probably speed. GPT-4 is noticeably slower than GPT-3.5.

https://twitter.com/natfriedman/status/1639029709395886080?s...

ratg13

>a personal AI assistant to act as a second brain, and help me prioritize and remember things with human intuition

Everyone wants this, but this is not the product.

The current AI offerings are information in --> information out.

It is not meant to keep state long term, and it is not meant to be your friend. It is meant to answer questions with the information available to it.

You can even see in the example screenshot that it is not designed to be asked follow-up questions.

hn_throwaway_99

That's just the case for this bot, but is obviously not the case for ChatGPT.

I have a long running conversation with ChatGPT that I use to keep track of a verbal to-do list. I tell it my items with categories (e.g. work, personal, etc.) and estimated times, and then it outputs my complete task list, grouped by category. I then just tell it when I add tasks or complete tasks, and it continually keeps track of and outputs my current outstanding task list.

I've been using this for weeks now, and since it's all in a single conversation ChatGPT can keep track of the entire state over time.

I don't have access to plugins yet but it would be trivial to implement a personal AI assistant with ChatGPT if it could, for example, look up flight times and prices.

m3kw9

And what happens when your window runs out? It prunes the oldest part of the conversation, which you will need to keep track of. Likely it could be fine, but if there was a task from a long time ago, that could get wiped out.

ratg13

ChatGPT can keep some state for you, but there is a limit to the amount of tokens you can keep going in an instance.

It’s enough to keep a todo list going; it’s not enough to make it your friend / coworker.

If you built what you were describing right now, either the flight questions would push out your todo list, or you would need to build something to keep state yourself.
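
Keeping state yourself, as suggested above, usually means trimming the message history to a token budget while pinning the messages that carry long-lived state (like the to-do list) so they never fall out of the window. A rough sketch, using a crude characters-per-token estimate as a stand-in for a real tokenizer such as tiktoken:

```python
def trim_history(messages, max_tokens,
                 count_tokens=lambda m: len(m["content"]) // 4):
    """Drop the oldest non-pinned messages until the history fits the budget.

    `messages` are dicts like {"role": ..., "content": ..., "pinned": bool};
    the default 4-chars-per-token count is a rough placeholder.
    """
    kept = list(messages)

    def total():
        return sum(count_tokens(m) for m in kept)

    i = 0
    while total() > max_tokens and i < len(kept):
        if kept[i].get("pinned"):
            i += 1          # never evict pinned state (e.g. the to-do list)
        else:
            kept.pop(i)     # evict the oldest evictable message
    return kept
```

With this kind of guard, the "flight questions" would age out of the window while the pinned to-do list survives, instead of the other way around.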

ravenstine

Oh dang, that sounds awesome. So a ChatGPT conversation doesn't have a historical limit? I guess I assumed it would start having to forget things at a certain point.

JanSt

You can use Langchain / Plugins to get around that issue

tudorw

Good point, I'm doing this with ChatGPT. I use a long conversation with 3.5 to help me write prompts for 4 - it's fun. When I hit the rate cap on 4 I'll go back to 3.5 with what 4 came up with, then converse on the topic until my 4 cap lifts. Combine that with being able to ask Bing for things like links and current information, and DALL-E for image-based visualisations, and it makes for an intriguing combination. Bard gets a look in too, but so far seems a little shy compared to 4 or Bing.

umaar

Made something similar recently, but for WhatsApp: https://chatbling.net/

What behaviour would users prefer when uploading a voice message: a) the voice message is transcribed, i.e. speech-to-text? Or b) the voice message is treated as a query, so you receive a text answer to your voice query?

I've done a) for now as mobile devices already let you type with your voice.
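
The two behaviours (a) and (b) differ only in what happens after transcription, so they can share one handler. A sketch with the transcription and answering functions injected; both are hypothetical stand-ins for real Whisper / chat-model calls:

```python
def handle_voice_message(audio, mode, transcribe, answer):
    """Handle an uploaded voice message.

    mode "a": return the transcript (speech-to-text only).
    mode "b": treat the transcript as a query and return the answer.
    `transcribe` and `answer` are injected callables (e.g. a Whisper
    API wrapper and a chat-completion wrapper).
    """
    text = transcribe(audio)
    if mode == "a":
        return text
    return answer(text)
```

Keeping the mode a per-user setting would let people who rely on on-device dictation pick (a) while everyone else gets (b).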

swores

I'd quite like a Twilio script I could host that enables voice-to-voice with ChatGPT over a phone call, but for messaging apps (I'm gonna try yours, though I'd prefer Signal) I'd personally prefer to stick with typing and use Apple's transcription (the default microphone on the iOS keyboard) for any voice stuff - still wanting text back.

This is (in addition to the fact that Apple's works pretty well for me) mostly because that way I get to see the words appear as I'm speaking, and can fix any problems in real time rather than waiting until I've finished leaving a voice note to find out it messed up. With Bing AI chat, for example, trying to use their microphone button just leads to frustration, as it regularly fails to understand me. But maybe Whisper is so good that I'd hardly ever need to care about errors?

I do suspect I'm an outlier in terms of how I use dictation, checking as I go - at least based on family members, who seem to either speak a sentence then look at it, or speak and then send without looking - so for them, off-device transcription would probably be welcome as long as it even slightly improves accuracy rates.

umaar

I see my server has restarted a few times! I imagine it's folks here, since I haven't shared Chat Bling elsewhere yet. Sorry to anyone who started generating images but never received a response. The 'jobs' for image generations are stored entirely in memory, so a server restart loses all of that.

Going forward, I'll explore storing image jobs in Redis or something, which will be more resilient to server crashes.

As for conversation history, I'll continue to keep that in memory for now (messages are evicted after a short time period, or if messages consume too many OpenAI tokens) - even that is lost during a server restart/crash. It feels like quite a big decision to store sensitive chat history in a persistent database, from a privacy standpoint.
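
The in-memory history with time-based eviction described above can be sketched as a small class. This is an assumption about the shape of such a store, not Chat Bling's actual implementation:

```python
import time

class ChatMemory:
    """Per-chat message history held only in RAM, with TTL eviction.

    Everything is lost on restart by design (a privacy trade-off, as
    noted above). A token-budget check could be layered on top.
    """

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}  # chat_id -> list of (timestamp, message)

    def add(self, chat_id, message, now=None):
        now = time.time() if now is None else now
        self._store.setdefault(chat_id, []).append((now, message))

    def get(self, chat_id, now=None):
        """Return fresh messages, dropping anything older than the TTL."""
        now = time.time() if now is None else now
        fresh = [(t, m) for t, m in self._store.get(chat_id, [])
                 if now - t < self.ttl]
        self._store[chat_id] = fresh
        return [m for _, m in fresh]
```

Swapping the dict for Redis with per-key TTLs would survive restarts while keeping the same eviction semantics.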

swores

You could have a default "will be wiped after <x time>" policy / notification up front, plus an option to change it in either direction: one way to "only store this in RAM, not the DB, and wipe it as soon as I close this window - or maybe after an hour of inactivity", the other way to "please never delete (we reserve the right to delete anyway, but will keep for at least Y days/months/whatever)". And also a "delete now" button to override. Then a cron job checking what's due to be deleted and wiping it from the DB/memory?

Of course, it maybe also adds more pressure to keep the server more secure without private conversations being accessible after a reboot...

umaar

Agreed, giving the user a choice would be best here. Something tells me most users would not change it from whatever the default is, but it's still good to expose this as a setting, which should be doable. Thanks for the input!

jimmyjack

How did you get Meta to approve? Been trying for so long.

jaggs

This is very cool. I tested it with a quick reminder request and it seemed to work. I'm a bit terrified by the privacy issue though. Combining OpenAI with WhatsApp seems like a marriage made in hell.

I guess the only solution will be to move to local bots and models on the phone which will interface out only when needed.

djohnston

dude how did you get Meta to approve your WA Business? I couldn't get verified after like two weeks of trying and gave up :(

hombre_fatal

This is the new hello world.

https://github.com/danneu/telegram-chatgpt-bot

https://t.me/god_in_a_bot (demo bot)

I tried building this for WhatsApp but Twilio is weirdly expensive. I don't even think Twilio is cheap for sending 2FA tokens.

moralestapia

I'm also on Twilio and yes it is expensive, a longish call (10 mins) comes to about $1.

They charge you for:

* Time spent using the "Twilio Client" (whatever that means)

* Inbound call time

* Transcription of each audio chunk, billing them at a minimum of 15s per function call

* Every time you use their text-to-speech functions (not even that is free)
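
The "minimum of 15s per chunk" item in the list above is where short audio gets expensive: a 3-second chunk bills like a 15-second one. A sketch of that rounding; the 15-second minimum is taken from the comment, everything else is an assumption:

```python
import math

def transcription_seconds_billed(chunk_seconds, minimum=15):
    """Total billed seconds when each chunk is billed at a per-chunk minimum.

    `chunk_seconds` is an iterable of chunk durations in seconds;
    partial seconds are assumed to round up.
    """
    return sum(max(minimum, math.ceil(s)) for s in chunk_seconds)
```

So ten 3-second utterances bill as 150 seconds of transcription, not 30, which goes some way toward explaining the ~$1 ten-minute call.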

aqme28

It is expensive, so I made a version for myself as a Discord bot https://github.com/alexQueue/GPTBotHost/ (note: code is sketchy. Not really cleaned up for the public)

hombre_fatal

> note: code is sketchy. Not really cleaned up for the public

No worries, you're in good company.

aivisol

Nice work. I have a question though. The example chat window you show has an interaction where the AI explains that it cannot remember the previous question. Isn’t Langchain there for exactly that purpose, or am I missing something?

andag

Can you update the readme with some info privacy-wise? Some info on who I'm sharing my data with?

OpenAI - fair enough, I'm already doing that a fair amount.

pantulis

It seems that you can self-host the thing. Apart from that, you would obviously be sharing info with OpenAI (both GPT and Whisper), Telegram and Google.

aftergibson

I'm guessing on step 3, they meant touch .env, not mkdir.

  mkdir .env and fill the following:

    TELEGRAM_TOKEN=
    OPENAI_API_KEY=
    PLAY_HT_SECRET_KEY=
    PLAY_HT_USER_ID=

marc

For people who are looking for a hosted solution: https://t.me/marcbot

Being able to use voice messages as an interface makes a huge difference. I can just ramble on, sharing my thoughts, and then have GPT turn it into something sensible.

Great for brainstorming, getting your thoughts out on "paper", etc.

mkw5053

I’ve been heavily using chatgpt (gpt 4) on my honeymoon/baby moon/vacation in Spain. Everything from itineraries to asking art history questions in museums. I’ve mainly been using the voice input on my iPhone for chatgpt on a mobile browser and I can’t help but think how useful better voice support will be.

tikkun

I've got an iPhone app in TestFlight beta that has speech-to-text and text-to-speech. Basically a nicer iPhone app for GPT-4; I tried most of the existing ones and none had a particularly nice UX.

Pricing model for now is you just pay exactly what we pay (we just pass on the API costs plus Apple's 30%, no markup). We could add a use-your-own-API-key option too, to avoid Apple's 30%.

If you'd like access, email in profile

golergka

Did you have access to plugins?

m348e912

I don't know what the parent used, but here is an example of how to integrate GPT with your iPhone.

https://twitter.com/mckaywrigley/status/1640414764852711425

mkw5053

Unfortunately I’m still waiting

MetaWhirledPeas

Not as cool, but there for the lazy: install the Bing app on your phone (I guess you need to be accepted into the beta first?). I use it as a slow-thinking alternative to Google Assistant that usually gives much better answers.

throwaway2203

The Bing app isn't as responsive as ChatGPT. I asked it a question about my taxes and it "binged" something weird and gave me a generic non-answer.

titaniczero

I’ve noticed that Bing chat is better if you instruct it not to search anything; that way it will use the model's knowledge. I’ve learned to use the model's knowledge or the web search results summary depending on what I want. But ChatGPT is still way better for model knowledge because it has fewer restrictions.

I wish they would make this distinction clearer in the UI. Most of the time it can answer without resorting to search, I think it would be better if the user explicitly specifies that they need web results.

MetaWhirledPeas

It definitely has a web search bias, no surprise, but that's kind of its superpower too. It lacks the snappy responses that Assistant can give, especially with routine questions like the weather.

rapsey

Are there any offline text-to-speech options that support a wide variety of languages?

floitsch

Did something similar (without voice) that runs on an ESP32. This way I don't need any server or have to keep my desktop machine running.

Supports Telegram and Discord.

https://github.com/floitsch/ai-projects/tree/main/chat

sheepscreek

OP integrated LangChain and the ability to search Google (and a neat way to integrate more agents). That’s the main draw for me in their implementation.

floitsch

Yeah... That would be hard to do in the limited memory of the ESP32. The main issue is the cost of TLS connections.

Heloseaa

Recently did the same as a lightweight alternative in Python: https://github.com/clemsau/telegram-gpt

Looking to make it accessible, cheap and as lean as possible. I'd love to hear potential feature ideas.

Hadriel

Can you choose to use GPT-4?

Heloseaa

I will look into it when I'm granted access to GPT-4. But yeah, I plan to make switching between GPT-3.5 and GPT-4 accessible right in Telegram.


Personal Concierge Using OpenAI's ChatGPT via Telegram and Voice Messages - Hacker News