nilsherzig
Happy to answer any questions and open for suggestions :)
It's basically an LLM with access to a search engine and the ability to query a vector DB.
The top n results from each search query (issued by the LLM) are scraped, split into little chunks, and saved to the vector DB. The LLM can then query this vector DB to get the relevant chunks. This obviously isn't as comprehensive as having a 128k-context LLM just summarize everything, but at least on local hardware it's a lot faster and way more resource friendly. The demo on GitHub runs on a normal consumer GPU (AMD RX 6700 XT) with 12 GB of VRAM.
FezzikTheGiant
If you're open to it, it would be great if you could make a post explaining how you built this. Even if it's brief. Trying to learn more about this space and this looks pretty cool. And ofc, nice work!
Nischalj10
nilsherzig
Guys, I didn't think there would be this much interest in my project haha. I feel kinda bad for just posting it in this state. I would love to make a more detailed post on how it works in the future (keep an eye on the repo?)
FezzikTheGiant
Thanks! As a CS student interested in learning more about this space how do you recommend I get started? I'm pretty early in my education so I kind of want to learn how to drive the car for now and learn how the engine works more formally later, if you know what I mean.
keefle
Wonderful work!
Is it possible to make it use only a subset of the web (only sites that I trust and think are relevant to producing an accurate answer)? Are there ways to make it work offline on pre-installed websites (Wikipedia, some other wikis, and possibly news sites that are archived locally)? And how about other forms of documents, like books and research papers as PDFs?
kidintech
Seconded. I tried to do this many years ago for my dissertation and failed, but this would be a dream of mine.
robertlagrant
Would it not be possible to create a search engine that only crawls certain sites?
nemoniac
LLocalSearch uses SearXNG, which has a feature to blacklist/whitelist sites for various purposes.
nilsherzig
also a great idea to expose this to the frontend. thanks :)
nilsherzig
uhhhh both ideas are great, would you like to turn them into github issues? i will definitely look into both of them :)
mark_l_watson
Your project looks very cool. I had on my ‘list’ to re-learn Typescript (I took a TS course about 5 years ago, but didn’t do anything with it) so I just cloned your repo so I can experiment with it.
EDIT: I just noticed that most of the code is Go. Still going to play with it!
nilsherzig
Thanks :). Yea, only the web part is TypeScript, and I really wouldn't recommend learning from my TypeScript haha
hanniabu
This is awesome. I would love it if there were executable files with these dependencies bundled. That would make it wayyy more accessible, rather than just to those who know how to use the command line and resolve dependencies (yes, even Docker runs into that when fighting the local system).
ziziman
To scrape the websites, do you just blindly cut all of the HTML into fixed-size chunks, or is there some more sophisticated logic to extract the text of interest?
I'm wondering because most news websites now have a lot of polluting elements like popups; would they also go into the database?
totolouis
If you look at the vector handler in his code, he is using the bluemonday sanitizer and doing some "replaceAll" calls.
So I think there may be some useless data in the vectors, but that may not be an issue since the answer is drawn from multiple sources (for simple questions at least).
koeng
What is the search engine that it uses?
nilsherzig
SearXNG, which is a locally running meta search engine combining a lot of different sources (including Google and co).
mmahemoff
This might be more of a searxng question, but doesn't it quickly run up against anti-bot measures? CAPTCHA challenges and Forbidden responses? I can see the manual has some support for dealing with CAPTCHA [1], but in practical terms, I would guess a tool like this can't be used extensively all day long.
I'm wondering if there's a search API that would make the backend seamless for something like this.
d-z-m
any plans to support other backends besides ollama?
nilsherzig
Sure (if they are OpenAI-API compatible I can add them within minutes); otherwise I'm open for pull requests :)
Also, I don't own an Nvidia card or a Windows / macOS machine.
ivolimmen
"normal consumer GPU"... well mine is a 4GB 6600.. so I guess that varies.
nilsherzig
Sorry, it wasn't my intention to gatekeep, but my 300€ card really is on the low end for LLM things.
monkeydust
This looks great, can I get a series of agents with clearly defined roles working together on a problem akin to Autogen?
wg0
In five years' time, by 2030, I foresee that lots of inference will be happening on local machines, with models being downloaded on demand. Think of a Docker registry of AI models, which is pretty much what Hugging Face already is.
This would all be due to optimisations in model inference code and techniques, in hardware, and in the packaging of software like the above.
I don't see the billion-dollar valuations of lots of AI startups out there materialising into anything.
openquery
> I foresee that lots of inference would be happening on local machines with models being downloaded on demand
Why? It's much more efficient to have centralized special purpose hardware to run enormous models and then ship the comparatively small result over the internet.
By analogy, you don't have a search engine running on your phone right?
dns_snek
> Why?
Privacy, security, latency, offline availability, access to local data and services running on the device, just to name a few.
ilc
Big Tech + Countries: Those all sound like great reasons to centralize all access to AIs!
Sammi
You currently can't have a search engine running locally on your phone. Google Search is possibly the single largest C++ program ever built. And never mind the storage needs...
But in a few years we might be able to have LLMs running on our phones that work just as well, if not better. Of course, as you mention, the LLMs running on large servers might still be much more powerful, but the local ones might be powerful enough.
undefined
vachina
A more appropriate analogy would be driving your own car vs. taking the bus.
bufferoverflow
No, a more appropriate analogy would be driving your own billion-dollar super-yacht vs driving your own car.
Will not happen any time soon. Consumer hardware can't even run GPT-4 locally, and won't be able to for a looong time. Each GPT-4 instance runs on 8 A100s. The cost of such a system is ~$81K, not even in the ballpark of what most consumers can afford.
ThinkBeat
"640kb will be enough for everyone." (Gates)
I think that the models will evolve and grow as more powerful compute/hardware comes out.
You may be able to run scaled-down versions of what is state of the art now, but by then the giant models will have grown in size and in required compute.
The 6-year-old models will be retro-computing-ish.
Somewhat like how you can play 6-year-old games on a powerful new PC, but by then the new huge games will no longer play well on your old machine.
lobocinza
There will be demand and supply for both cases.
bufferoverflow
Unfortunately, training is insanely expensive. StabilityAI is struggling to stay alive, and Anthropic wants to spend $100 BILLION building a supercomputer just for training.
That's why I think these private companies will have the best AIs for many decades.
BrutalCoding
That’s a great project you pulled off. From the time I starred it (10-12h ago, I think) to re-checking this post, you've gained 500+ stars lol.
Visualized in a chart with star-history: https://star-history.com/#nilsherzig/LLocalSearch
nilsherzig
Haha, thanks for the chart link. I woke up to 1k more stars than it had yesterday; I'm kinda stressed out.
sroussey
Ah, that’s a nice chart generator. Will have to use it if I ever get any stars, lol.
arflikedog
A while back you commented on my personal project Airdraw which I really appreciated. This looks awesome and you're well on your way to another banger project - looking forward to toying around with this :)
gardenhedge
Did you just happen to see this post today and notice the username?
arflikedog
unironically yes, I used comments to hot fix a bunch of stuff when I first launched. It's a small world and I thought this was a cool moment
nilsherzig
Uhh yes I was really impressed by your project :)
sebzim4500
Whenever I see these projects I always find reading the prompts fascinating.
> Useful for searching through added files and websites. Search for keywords in the text, not whole questions; avoid relative words like "yesterday"; think about what could be in the text.
> The input to this tool will be run against a vector db. The top results will be returned as JSON.
Presumably each clarification is an attempt to fix a bug experienced by the developer, except the fix is in English not in Go.
nilsherzig
Haha yea, pretty much. It's amazing (and frustrating) how much of the program's "performance" depends on these prompts.
htrp
Our current state of the art
also love your last commit
>fix: copilot is stupid and i should not blindly trust it
>https://github.com/nilsherzig/LLocalSearch/commit/9f45e24f15...
Everything wrong with code gen in a nutshell
nilsherzig
Yea, I'm kinda stressed out trying to get it working for everyone haha. I would have caught that under different conditions. I'm a big e2e-tests guy haha
xydac
This is cool. I haven't run it yet, but it seems really promising. I'm thinking how super useful this could be to hook into internal corporate search engines and then get answers from those.
Good to see more of these non-API-key products being built (connected to local LLMs).
nilsherzig
I might try to hook this into our internal confluence, shouldn't be a problem
hubraumhugo
Excellent work! Cool side projects like that will eventually help you get hired by a top startup or may even lead to building your own.
I can only encourage other makers to post their projects on HN and put them out into the world.
nilsherzig
Yea, it's also quite fulfilling to see people liking something you've put some work into :)
keyle
Impressive. I don't think I've seen a local model call upon specialised modules yet (although I can't keep up with everything going on).
I too use the local 7B OpenHermes and it's really good.
nilsherzig
Thanks :). It's just a lot of prompting and string parsing. There are models like "Hermes-2-Pro-Mistral" (the one from the video) which are trained to work with function signatures and output structured text. But in the end it's just strings in > strings out, haha. It's fun (and sometimes frustrating) to use LLMs for flow control (conditions, loops...) inside your programs.
davidcollantes
Got a link for that one? I have found a few with Hermes-2-Mistral in the name.
undefined
keyle
Wow, I didn't know about "Hermes 2 Pro - Mistral 7B", cheers!
nilsherzig
It's my go-to "structured text model" atm. Try starling-lm-beta (7B) for some very impressive chat capabilities. I honestly think it outperforms GPT-3 half the time.
madacol
Have you considered using grammar sampling?
peter_l_downs
I'm just starting to get into downloading and testing models using llama.cpp and I'm curious which model you're actually using, since they seem to come in varying levels of quantization. Is this [0] the model page for the one you're using, or should I be looking somewhere else? What is the actual file name of the model you're using?
[0] https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GG...
windexh8er
Have you looked into tools like CrewAI [0]?
viksit
Curious what hardware you use? And is any of this runnable on an M1 laptop?
keyle
Absolutely, a 7B will run comfortably in 16GB of RAM on most consumer-level hardware. Some of the 40B models run in 32GB, but it depends on the model, I've found (GGUF, fingers crossed, helps).
I ran this originally on an M1 with 32GB; I run it now on an Air M2 with 16GB (and a Mac mini M2 with 32GB), no problem.
I use llama.cpp with a SwiftUI interface (my own), all native, no Python/JS/web scripts.
7B is obviously less capable, but the instant response makes it worth exploring. It's very useful as a Google search replacement that is instantly more valuable, for general questions, than dealing with the hellscape of blog spam ruling Google atm.
Note: for my complex code queries at $dayjob, where time is of the essence, I still use GPT-4 Plus, which is still unmatched imho, at least without running special hardware.
regularfry
I've been occasionally using a 7b Q4 quant on llama.cpp on an 8GB M1. It's usable, if not amazing.
nilsherzig
Depends on your m1 specs, but should definitely be able to run a 7b model (at least with some quantization).
aagha
According to Crunchbase [0], Perplexity has raised over $100M.
You built this in your spare time?
The following things jump out to me:
- How much a hype cycle invites insane amounts of money
- How trash the entire VC world is during a hype cycle
- What an amazing thing ingenuity and passion are
Great job!
fnetisma
This is really neat! I have questions:
The “needs tool usage” and “found the answer” blocks in your infra: how are these decisions made?
Looking at the demo, it takes a little time to return results. Of the search, vector storage, and vector DB retrieval steps, which takes the most time?
nilsherzig
Thanks :)
Die LLM makes these decisions on its own. If it writes a message which contains a tool call (Action: Web search Action Input: weight of a llama) the matching function will be executed and the response returned to the LLM. It's basically chatting with the tool.
You can toggle the log viewer on the top right to get more detail on what it's doing and what is taking time. Timing depends on multiple things:
- the size of the top n articles (generating embeddings for them takes some time)
- the number of matching vector DB responses (reading them takes some time)
dcreater
> Die LLM
You mean the? The German is bleeding through haha
rzzzt
Wolfenstein 3D did it first! And then The Simpsons as well.
bobby_the_whale
[dead]
ldjkfkdsjnv
The big secret about Perplexity is that they haven't done much beyond using off-the-shelf models.
ggnore7452
I've been working on a small personal project similar to this and agree that replicating the overall experience provided by Perplexity.ai, or even improving it for personal use, isn't that challenging. (The concerns of scale or cost are less significant in personal projects. Perplexity doesn't do too much planning or query expansion, nor does it dig super deep into the sources afaik)
I must say, though, that they are doing a commendable job integrating sources like YouTube and Reddit. These platforms benefit from special preprocessing and indeed add value.
nilsherzig
I assume the same; it feels like their product is just summarizing the top n results? I wouldn't need the whole vector DB thing if local models (or hardware) were able to run with a context of this size.
msp26
Pretraining and even finetuning (to a good extent) is overrated and you can create plenty of value without it.
KuriousCat
How did they secure funds in that case?
code51
Simple: looking in the mirror and saying "Google-killer" firmly 3 times every day.
basbuller
That is probably exactly why they got funding. You can sell it as focusing on adding new features and leveraging the best available tools before reinventing the wheel.
They do train their own models now, but for about a year they just forwarded calls to models like GPT-3.5 Turbo. You still have the option to use models not trained by Perplexity.
KuriousCat
I still don't get it. What was the USP here? What is the allure in it for the investors?
hackernewds
Which is why their engagement and model responses suck. The other competitors are far better.
C.ai and Pi come to mind.
The video demo runs a 7B model on a normal gaming GPU. I think it already works quite well (accounting for the limited hardware power). :)