HarHarVeryFunny
jart
> I'm sure this, and other LLM/IDE integration, has its uses, but I'm failing to see how it's really any kind of major productivity boost for normal coding.
The Duomo in Florence took multiple generations to build. Took them forever to figure out how to build a roof for the thing. Would you want to be a builder who focuses your whole life on building a house you can't live in because it has no roof? Or would you simply be proud to be taking part in getting to help lay the foundation for something that'll one day be great?
That's my dream.
HarHarVeryFunny
Well, I'm just commenting on the utility of LLMs, as they exist today, for my (and other related) use cases.
No doubt there will be much better AI-based tools in the future, but I'd argue that if you want to accelerate that future then rather than applying what's available today, it'd make more sense to help develop more capable AI that can be applied tomorrow.
pama
We need the full pipeline of tools. What jart did is helping the future users of AI gain familiarity early.
benreesman
In general I almost never break even when trying to use an LLM for coding. I guess there’s a selection bias because I hate leaving my flow to go interact with some website, so I only end up asking hard questions maybe.
But since I wired Mixtral up to Emacs a few weeks ago I discovered that LLMs are crazy good at Lisp (and protobuf and prisma and other lispy stuff). GPT-4 exhibits the same property (though I think they’ve overdone it on the self-CoT prompting and it’s getting really snippy about burning compute).
My dots are now like recursively self improving.
airstrike
> though I think they’ve overdone it on the self-CoT prompting and it’s getting really snippy about burning compute
hear, hear! I have the exact same impression, probably since the gpt-4-turbo preview rolled out
unshavedyak
Man, I really want to get this working. Any recommendations for how to prompt or where this functionality helps?
ljm
The only thing I've used GPT for is generating commit messages based on my diff, because it's better than me writing 'wip: xyz' and gives me a better idea about what I did before I start tidying up the branch.
Even if I wanted to use it for code, I just can't. And it actually makes code review more difficult when I look at PRs and the only answer I get from the authors is "well, it's what GPT said." Can't even prove that it works right by putting a test on it.
In that sense it feels like shirking responsibility - just because you used an LLM to write your code doesn't mean you don't own it. The LLM won't be able to maintain it for you, after all.
nextaccountic
"it's what GPT said" should be a fireable offense
ljm
I wouldn't go that far; we all want to be lazy. Using it as a crutch and assuming everyone else uses GPT so it's all good - well, nobody is going to understand it any more.
Half of the stuff GPT comes up with in the reviews I could rewrite much more simply and directly, while improving code comprehension.
tmtvl
That may be a bit much, but I'd think it's grounds for sitting down with the person in question to discuss the need for understanding the code they turn in.
lgrapenthin
Have you seen modern React frontend dev in JS? They copy paste about 500-1000 LOC per day and also make occasional modifications. LLMs are very well suited for this kind of work.
HarHarVeryFunny
That does seem like a pretty much ideal use case!
crabbone
Here's where my Emacs is putting the most effort when it comes to completion: shell sessions.
In my line of work (infra / automation) I may not write any new code that's going to be added to some project for later use for days, sometimes weeks.
Most of the stuff I do is root cause analysis of various system failures which require navigating multiple machines, jump hosts, setting up tunnels and reading logs.
So, the places where the lack of completion is the most annoying are, for example, when I have to compare values in some /sys/class/pci_bus/... between two different machines: once I've figured out what file I need on one machine in its sysfs, I don't have the command to read that file on the other machine, and need to retype it entirely (or copy and paste it between terminal windows).
I don't know what this autocompletion backend is capable of. I'd probably have to do some stitching to even get Emacs to autocomplete things in the terminal instead of or in addition to the shell running in it, but, in principle, it's not impossible and could have some merit.
spit2wind
> I'd probably have to do some stitching to even get Emacs to autocomplete things in the terminal instead of or in addition to the shell running in it
I wonder what you mean. The `dabbrev-expand` command (bound to `M-/` by default) will complete the characters before point based on similar strings nearby, starting with strings in the current buffer before the word to complete, and extending its search to other buffers. If you have the sysfs file path in one buffer, it will use that for completion. You may need to use line mode for non-`M-x shell` terminals to use `dabbrev-expand`.
> In my line of work (infra / automation) I may not write any new code that's going to be added to some project for later use for days, sometimes weeks.
>
> Most of the stuff I do is root cause analysis of various system failures which require navigating multiple machines, jump hosts, setting up tunnels and reading logs.
This sounds like an ideal use case for literate programming. Are you using org-mode? Having an org-file with source blocks would store the path string for later completion by the methods described above (as well as document the steps leading to the root cause). You could also make an explicit abbrev for the path (local or global). The document could make a unique reference or, depending on how many and how common the paths are, you could define a set of sequences to use. For example "asdf" always expands to /sys/class/pci_bus/X and "fdsa" expands to something else.
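For concreteness, here's a minimal Emacs Lisp sketch of both ideas — the abbrev name and path below are just placeholder examples, not anything from your setup:

  ;; make sure dabbrev-expand (M-/) also searches other buffers
  ;; (often the default already, but explicit doesn't hurt)
  (setq dabbrev-check-all-buffers t)

  ;; a global abbrev: typing "pcib" then SPC expands to the long path
  ;; (requires abbrev-mode to be enabled in the buffer)
  (define-abbrev global-abbrev-table
    "pcib" "/sys/class/pci_bus/0000:00/")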
Hope that helps or inspires you to come up with a solution that works for you!
crabbone
> This sounds like an ideal use case for literate programming.
No... not at all... Most of the "code" I write in this way is shell commands mixed with all kinds of utilities present on the target systems. It's so "unique" (in a bad way) that there's no point trying to automate it. The patterns that emerge usually don't repeat nearly often enough to merit automation.
Literate programming is the other extreme: it's like carving your code in stone. Too labor intensive to be useful in an environment where you don't even remember the code you wrote the day after and in all likelihood will never need it again.
> will complete the characters before point based on similar strings nearby
They aren't nearby. They are in a different tmux pane. Also, that specific keybinding doesn't even work in terminal buffers, I'd have to remap it to something else to access it.
The larger problem here is that in my scenario Emacs isn't the one driving the completion process (it's the shell running in the terminal), for Emacs to even know those options are available as candidates for autocompletion it needs to read the shell history of multiple open terminal buffers (and when that's inside a tmux session, that's even more hops to go to get to it).
And the problem here, again, is that setting up all these particular interactions between different completion backends would be very tedious for me, but if some automatic intelligence could do it, that'd be nice.
lordgrenville
> once I've figured out what file I need in one machine in its sysfs, I don't have the command to read that file on the other machine, and need to retype it entirely (or copy and paste it between terminal windows).
Tramp?
crabbone
How would Tramp know that I need an item from the history of one session in another? Or maybe I'm not understanding how you want to use it?
ArenaSource
They are really good at writing your print/console.log statements...
imiric
Just what I've been looking for!
Thanks for pushing the tooling of self-hosted LLMs forward, Justine. Llamafiles specifically should become a standard.
Would there be a way of connecting to a remote LLM that's hosted on the same LAN, but not on the same machine? I don't use Apple devices, but do have a capable machine on my network for this purpose. This would also allow working from less powerful devices.
Maybe the Llamafile could expose an API? This steps into LSP territory, and while there is such a project[1], leveraging Llamafiles would be great.
jart
llamafile has an HTTP server mode with an OpenAI API compatible completions endpoint. But Emacs Copilot doesn't use it. The issue with using the API server is it currently can't stream the output tokens as they're generated. That prevents you from pressing ctrl-g to interactively interrupt it, if it goes off the rails, or you don't like the output. It's much better to just be able to run it as a subcommand. Then all you have to do is pay a few grand for a better PC. No network sysadmin toil required. Seriously do this. Even with a $1500 three year old no-GPU HP desktop pro, WizardCoder 13b (or especially Phi-2) is surprisingly quick off the mark.
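If you're curious what that subcommand approach roughly looks like, here's a hypothetical sketch (not the actual Emacs Copilot code; the llamafile name is just the one from this thread, and depending on flags the raw output may echo the prompt back):

  ;; assumes lexical-binding: t, so the lambda can capture `marker`
  (defun my-llm-insert-at-point (prompt)
    "Stream a llamafile's output into the current buffer at point.
  Quitting with C-g kills the subprocess via the unwind-protect."
    (let* ((marker (copy-marker (point) t))
           (proc (make-process
                  :name "llm"
                  :command (list "./wizardcoder-python-34b-v1.0.Q5_K_M.llamafile"
                                 "-p" prompt)
                  :filter (lambda (_proc text)
                            (with-current-buffer (marker-buffer marker)
                              (save-excursion
                                (goto-char marker)
                                (insert text)))))))
      (unwind-protect
          (while (process-live-p proc)
            (accept-process-output proc 0.1))
        (when (process-live-p proc)
          (kill-process proc)))))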
exe34
Hi, I haven't tried this myself, but it seems there's a way? https://github.com/ggerganov/llama.cpp/blob/master/examples/...
The call takes a "stream" boolean: stream: It allows receiving each predicted token in real-time instead of waiting for the completion to finish. To enable this, set to true.
And the response includes: stop: Boolean for use with stream to check whether the generation has stopped (Note: This is not related to stopping words array stop from input options)
Certainly the local web interface has a stop button, and I'm pretty sure that one did work.
But maybe I'm misunderstanding the challenge here?
tarruda
You're right, llama-cpp-python OpenAI compatible endpoint works with `stream:true` and you can interrupt generation anytime by simply closing the connection.
I use this in a private fork of Chatbot-UI, and it just works.
livrem
Llamafiles look a bit scary, like back when StableDiffusion models were distributed as pickled Python files (allowing, in theory, for arbitrary code execution when loading a model) before everyone switched to safetensors (dumb data files that do not execute code). Running a locally installed llama.cpp with a dumb GGUF file seems safer than downloading and running some random executable?
jart
Author here. Thanks for sharing your concern. Mozilla is funding my work on llamafile and Emacs Copilot because Mozilla wants to help users to be able to control their own AI experiences. You can read more about the philosophy of why we're building this and publishing these llamafiles if you check out Mozilla's Trustworthy AI Principles: https://foundation.mozilla.org/en/internet-health/trustworth... Read our recent blog post too: https://future.mozilla.org/blog/introducing-llamafile/ If you get any warnings from Windows Defender, then please file an issue with the Mozilla-Ocho GitHub project, and I'll file a ticket with Microsoft Security Intelligence.
livrem
Local AI is definitely a good thing and I can see why llamafiles can be useful. Sounds great for the use-case of a trusted organization distributing models for easy end-user deployment. But if I am going to be downloading a bunch of different llms to try out from various unknown sources it is a bit scary with executables compared to plain data files.
dkjaudyeqooe
Self-hosted LLMs are the future. Who wants to keep evil money sucking corporate non-profits in the driver's seat?
And more importantly, who wants to pipe all their private stuff through their servers? Given their attitude toward other people's copyrighted works, it's guaranteed to be ingested by their model and queried in god mode by Sam Altman himself, looking for genius algorithms or ideas for his on-the-side startups.
vaxman
Nah, it will be "OpenAI Together With Github Gives You More!" and included with your cell phone bill /s
Let's see if 288GB multi-core M3 processors with 100GbE (on 10m copper!) happen.... but there's always https://huggingface.co/blog/lyogavin/airllm
No coders on StarTrek yoh https://youtu.be/MX95usfB2ZA
regularfry
Also worth knowing about in this space is ellama: https://github.com/s-kostyaev/ellama which uses the LLM package: https://github.com/ahyatt/llm#ollama to talk to ollama, and while ellama doesn't currently support talking over the network to ollama it also doesn't look like that would be a hard thing to add (specifically there are host and port params the underlying function supports but ellama doesn't use).
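If you want to experiment with that before ellama grows the option, something along these lines should work against the llm package directly — though the keyword names here are my guess from memory, so check `make-llm-ollama`'s actual slots before relying on it:

  ;; hypothetical: point the llm package at an ollama instance on another box
  (require 'llm-ollama)
  (setq my-remote-ollama
        (make-llm-ollama :chat-model "mistral"
                         :host "192.168.1.50"   ; LAN machine running ollama
                         :port 11434))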
mark_l_watson
Thanks, that looks good. I will try it! I already have a good Emacs setup with GPT-4 APIs, and a VSCode setup, but in the last few months I have 80% moved to using local LLMs for all my projects where LLMs are an appropriate tool.
btbuildem
I've used ollama in the past, a few more moving parts than a llamafile, but it provides API endpoints out of the box (in a very similar format to openai).
theYipster
I'm running a MacBook Pro M1 Max with 64GB RAM and I downloaded the 34B Q5 model (the large one) and can confirm it works nicely. It's slow, but usable. Note I am running it on my Asahi Fedora Linux partition, so I do not know if or how it is utilizing the GPU. (Asahi has OpenGL support but not Metal.)
My environment is configured with ZSH 5.9. If I invoke the LLM directly as root (via sudo), it loads up quickly into a web server and I can interact with it via a web browser pointed to localhost:8080.
However, when I try to run the LLM from Emacs (after loading the LISP script via M-x ev-b,) I get a "Doing vfork: Exec format error." This is when trying to follow the demo in the Readme by typing C-c C-k after I type the beginning of the isPrime function.
Any ideas as to what's going wrong?
jart
On Asahi Linux you might need to install our binfmt_misc interpreter:
sudo wget -O /usr/bin/ape https://cosmo.zip/pub/cosmos/bin/ape-$(uname -m).elf
sudo chmod +x /usr/bin/ape
sudo sh -c "echo ':APE:M::MZqFpD::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"
sudo sh -c "echo ':APE-jart:M::jartsr::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"
You can also turn any llamafile into a native ELF executable using the https://cosmo.zip/pub/cosmos/bin/assimilate-aarch64.elf program. There's one for x86 users too.
theYipster
That fixed it! Many thanks.
vocx2tx
Unrelated to the plugin but wow the is_prime function in the video demonstration is awful. Even if the input is not divisible by 2, it'll still check it modulo 4, 6, 8, ... which is completely useless. It could be made literally 2x faster by adding a single line of code (a parity check), and then making the loop over odd numbers only. I hope you people using these LLMs are reviewing the code you get before pushing to prod.
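For concreteness, here's the kind of fix being suggested, sketched in Emacs Lisp since we're in an Emacs thread (a single parity check up front, then trial division by odd numbers only up to the square root):

  (defun my-prime-p (n)
    "Return non-nil if N is prime, using trial division by odd divisors only."
    (cond ((< n 2) nil)
          ((= n 2) t)
          ((zerop (% n 2)) nil)          ; the one-line parity check
          (t (let ((i 3) (prime t))
               (while (and prime (<= (* i i) n))
                 (when (zerop (% n i))
                   (setq prime nil))
                 (setq i (+ i 2)))       ; skip even divisors entirely
               prime))))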
throwaway4good
If you really need your own is_prime implementation, a bit of googling would have given you a much better one and some good discussions of the pros and cons of various techniques.
LLM UIs need a lot of work to match that.
jart
If you just run this, without Emacs:
./wizardcoder-python-34b-v1.0.Q5_K_M.llamafile
Then it'll launch the llama.cpp server and open a tab in your browser.
skepticATX
The reviewing that most folks do is a quick glance and a “lgtm”.
If most people actually seriously scrutinized the code (which you should), it'd be apparent that the value proposition of using LLMs is not increased throughput, but better quality code.
If you just accept the output without much scrutiny, sure you’ll increase your throughput, but at the cost of quality and the mental model of the system that you would have otherwise built.
dack
This is great for what it does, but I want a more generic LLM integration that can do this and everything else LLMs do.
For example, one key stroke could be "complete this code", but other keystrokes could be:
- send current buffer to LLM as-is
- send region to LLM
- send region to LLM, and replace with result
I guess there are a few orthogonal features. Getting input into LLM various ways (region, buffer, file, inline prompt), and then outputting the result various ways (append at point, overwrite region, put in new buffer, etc). And then you can build on top of it various automatic system prompts like code completion, prose, etc.
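As a starting point, here's a hedged sketch of just the "send region, replace with result" piece, shelling out to a llamafile the way this thread does elsewhere (the model filename is only an example, and you may want a flag to suppress the echoed prompt if your llamafile build has one):

  (defun my-llm-replace-region (start end instruction)
    "Send the region plus INSTRUCTION to a local LLM and replace it with the reply."
    (interactive "r\nsInstruction: ")
    (let* ((code (buffer-substring-no-properties start end))
           (prompt (concat instruction "\n\n" code))
           (reply (shell-command-to-string
                   (concat "./wizardcoder-python-34b-v1.0.Q5_K_M.llamafile -p "
                           (shell-quote-argument prompt)))))
      (delete-region start end)
      (insert reply)))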
karthink
> Getting input into LLM various ways (region, buffer, file, inline prompt), and then outputting the result various ways (append at point, overwrite region, put in new buffer, etc).
gptel is designed to do this. It also tries to provide the same interface to local LLMs (via ollama, GPT4All etc) and remote services (ChatGPT, Gemini, Kagi, etc).
WhatIsDukkha
Thank you for gptel, its really what I had been looking for in emacs llm.
Great work.
karthink
Glad it's useful.
ParetoOptimal
Gptel as others mentioned, but I can't believe no one linked the impressive and easy to follow demo:
https://www.youtube.com/watch?v=bsRnh_brggM
Lowest friction llm experience I've ever used... You can even use it in the M-x minibuffer prompt.
turboponyy
From elsewhere in this thread:
> Also worth checking out for more general use of LLMs in emacs: https://github.com/karthink/gptel
jart
You're the third person in the last 40 minutes to post a comment in this thread sharing a link to promote that project. https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... It must be a good project.
klibertp
It is, but I guess the reason it's mentioned so much right now is that the author posted a pretty convincing video a few days ago to Reddit: https://youtu.be/bsRnh_brggM (post: https://old.reddit.com/r/emacs/comments/18s45of/every_llm_in...)
From what I see, gptel is more interested in creating the best and least intrusive interface - it doesn't concern itself too much about which model you're using. The plan is to outsource the "connection" (API, local) to the LLM to another package, eventually.
Plankaluel
Super interesting and I will try it out for sure!
But: The mode of operation is quite different from how GitHub CoPilot works, so maybe the name is not very well chosen.
It's somewhat surprising that there isn't more development happening in integrating Large Language Models with Emacs. Given its architecture, Emacs appears to be an ideal platform for such integration, yet most projects haven't been worked on for months. But maybe the crowd that uses Emacs is mostly also the crowd that would be against utilizing LLMs?
jefftk
> maybe the crowd that uses Emacs is mostly also the crowd that would be against utilizing LLMs?
I think a bigger problem is that the crowd that uses emacs is just small. Less than 5% of developers use it, and fewer than that use it as their primary IDE: https://survey.stackoverflow.co/2022/#most-popular-technolog...
(I'm quite sad about this, as someone who pretty much only uses emacs)
safety1st
I'm an emacs believer (the idea of a programmer's text editor really just being a lisp environment makes a ton of sense), but I'm a very part-time user. There are just so many idiosyncrasies that make it hard to get into. No one seems to drink their own kool-aid more fervently than the emacs community; it just feels like "this would make it easier for new users" is never allowed to be a design rationale for anything.
For me things started to get easier once I discovered cua-mode and xclip-mode. I have read some arguments about why these aren't the default, I think those arguments are sensible if you have a PhD in emacs, but for the other 99.99% of humanity they are just big signs that say "go away." It's very silly to me that the defaults haven't evolved and become more usable - the definition of being a power user is that you can and do override lots of defaults anyway, so the defaults should be designed to support new users, not the veterans.
jart
That's because learning how to use Emacs is basically the equivalent of Navy Seals training except for programmers. For Emacs believers, that's a feature, not a bug. The good news is that llamafile is designed to be easy and inclusive for everyone. Emacs users are just one of the many groups I hope will enjoy and benefit from the project.
G3rn0ti
> I think those arguments are sensible if you have a PhD in emacs.
To get that PhD just start reading "Mastering Emacs" by Mickey Petersen: https://www.masteringemacs.org
Many people try learning Emacs by doing, and it's not a bad approach. However, I believe learning the fundamental "theory of editing" will help you grasp this tool's inherent complexity much faster. And it's a fun read, I think.
shrimpx
`xclip-mode` looks like it should definitely be included by default. `cua-mode` is tougher because it messes with the default keybindings, making you type C-x twice (or Shift-C-x) for the large number of keybindings that start with C-x. That might be better for newcomers though, and bring more people to Emacs. Personally I would disable `cua-mode` if it were default.
ParetoOptimal
Warning: This turned into a pretty long response somehow
Doesn't cua mode kind of break the keybindings of emacs?
For instance I use:
- C-c C-c
- C-c C-e
Maybe those get moved to some other prefix?
Also I get the argument that C-v in emacs for paste would be nice, but doesn't that make it harder for you to discover yank-pop aka C-y M-y?
The problem, it seems to me, with using cua-mode medium or long term is that you never start thinking in the system and patterns of Emacs.
I assume that if someone doesn't want to learn different copy/paste commands, they probably also don't want to read Emacs's high-quality Info manuals, which impart deep understanding well.
EDIT: I found a good discussion on this.
Question:
> CUA mode is very close to the workflow I am used to outside Emacs, so I am tempted to activate it.
> But I have learned that Emacs may have useful gems hidden in its ways, and CUA mode seems something that was attached later on.
Parts of response:
> In short, what you “lose” is the added complexity to the key use. Following is more detailed explanation.
> Emacs C-x is prefix key for general commands, and C-c is prefix key of current major mode's commands.
> CUA mode uses C-x for cut and C-c for copy. In order to avoid conflict, cua uses some tricks. Specifically, when there's text selection (that is, region active), then these keys acts as cut and copy.
> But, sometimes emacs commands work differently depending on whether there's a text selection. For example, comment-dwim will act on a text selection if there's one, else just current line. (when you have transient-mark-mode on.) This is a very nice feature introduced since emacs 23 (in year 2009). It means, for many commands, you don't have to first make a selection.
full response: https://emacs.stackexchange.com/a/26878
I suppose it all hinges upon your response to reading this:
> CUA mode is very close to the workflow I am used to outside Emacs,
My response: Workflow outside of emacs?! How can we fix that? Outside of emacs I'm in danger of hearing "you have no power here!".
Typical response: Why can't emacs be more like other programs so I can more easily use it from time to time?
layer8
4.5% of all developers isn’t small in absolute terms. And diversity is a good thing.
weebull
...which is why the 75% using VS code is a bad thing.
jefftk
I'm not saying emacs has a low number of users to dunk on emacs: it's my primary editor! I was responding to:
> It's somewhat surprising that there isn't more development happening in integrating Large Language Models with Emacs
regularfry
https://github.com/s-kostyaev/ellama is active, as is https://github.com/jmorganca/ollama (which it calls for local LLM goodness).
Plankaluel
Thanks! I was not aware of ellama. Maybe the problem is more one of discoverability :D
ParetoOptimal
I thought there were quite a few emacs llm projects.
There's also llm.el, which I've heard has a push to be in core Emacs:
mg
For vim, I use a custom command which takes the currently selected code and opens a browser window like this:
https://www.gnod.com/search/ai#q=Can%20this%20Python%20funct...
So I can comfortably ask different AI engines to improve it.
The command I use in my vimrc:
command! -range AskAI '<,'>y|call system('chromium gnod.com/search/ai#q='.substitute(iconv(@*, 'latin1', 'utf-8'),'[^A-Za-z0-9_.~-]','\="%".printf("%02X",char2nr(submatch(0)))','g'))
So my workflow when I have a question about some part of my code is to highlight it, hit the : key, which puts :'<,'> on the command line, then type AskAI<enter>. All a matter of a second, as it's already in my muscle memory.
nilsherzig
I think (just my experience) that copilot (the vim edition / plugin) uses more than just the current buffer as a context? It seems to improve when I open related files and starts to know function / type signatures from these buffers as well.
mg
That could be. If so, it would be interesting to know how Copilot does that.
For me, just asking LLMs "Can the following function be improved" for a function I just wrote is already pretty useful. The LLM often comes up with a way to make it shorter or more performant.
spenczar5
Yes, the official plugin sends context from recently opened other buffers. It determines what context to send by computing a jaccard similarity score locally. It uses a local 14-dimensional logistic regression model as well for some decisions about when to make a completion request, and what to include.
There are some reverse-engineering teardowns that show this.
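In case it helps picture it, the scoring idea is simple enough to sketch in a few lines of Emacs Lisp — this is just an illustration of Jaccard similarity over word sets, not Copilot's actual tokenizer or feature set:

  (require 'cl-lib)

  (defun my-jaccard-similarity (text-a text-b)
    "Return the Jaccard similarity of the word sets in TEXT-A and TEXT-B."
    (let* ((a (delete-dups (split-string (downcase text-a) "[^[:alnum:]_]+" t)))
           (b (delete-dups (split-string (downcase text-b) "[^[:alnum:]_]+" t)))
           (inter (cl-count-if (lambda (w) (member w b)) a))   ; |A ∩ B|
           (union (- (+ (length a) (length b)) inter)))        ; |A ∪ B|
      (if (zerop union) 0.0 (/ (float inter) union))))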
nilsherzig
I just tried the GPT-4 one; without any modifications it's impressively worse than the current chat model.
bmikaili
For vim I'd recommend dedicated plugins:
aiNohY6g
There's also https://github.com/David-Kunz/gen.nvim which works locally with ollama and e.g. Mistral 7B.
Any experience/comparison between them?
deepsquirrelnet
I don't have experience with gp.nvim, but I liked David Kunz's gen.nvim quite a bit. I ended up forking it into a little pet project so that I could change it a bit more into what I wanted.
I love being able to use ollama, but wanted to be able to switch to using GPT-4 if I needed. I don't really think automatic replacement is very useful because of how often I need to iterate on a response. For me, a better replacement method is to visually highlight in the buffer and hit enter. That way you can iterate with the LLM if needed.
Also a bit more fine control with settings like system message, temperature, etc is nice to have.
bmikaili
Uh sorry, I was gonna link gen.nvim. I found gp to have more functions/modes to use. Gp might be able to support local models using the OpenAI spec; at least I saw an issue in their repo about that.
meitham
That's nice! I would like to do something similar, but my vim sessions are all remote over ssh. Can we make it work without a browser?
mg
Without a browser, I can't think of a solution that is as lean as just putting a line into your vimrc.
I guess you have to decide on an LLM that provides an API and write a command line tool that talks to the API. There probably also are open source tools that do this.
kmarc
just call a reverse-SSH-tunneled open (macos) or xdg-open (linux) as your netrw browser.
I use this daily, works well with gx, :GBrowse, etc
098799
This is quite intriguing, mostly because of the author.
I don't understand very well how llamafiles work, so it looks a little suspicious to just call it every time you want completion (model loading etc.), but I'm sure this is somehow covered within llamafile's design. I wonder about the latency and whether it would be much impacted if a network call were introduced so that you could use a model hosted elsewhere. Say a team uses a bunch of models for development, shares them in a private cluster and uses them for code completion without the necessity of leaking any code to openai etc.
jart
I've just added a video demo to the README. It takes several seconds the first time you do a completion on any given file, since it needs to process the initial system prompt. But it stores the prompt to a foo.cache file alongside your file, so any subsequent completions start generating tokens within a few hundred milliseconds, depending on model size.
098799
Thanks, this showcases the product very well.
Looks like I won't use it though, cause I like how Microsoft's Copilot and its implementations in Emacs work: suggest completions with greyed-out text after the cursor, in one go, without the need to ask for it, and discard it if it doesn't fit. Just accept the completion if you like it. For reference: https://github.com/zerolfx/copilot.el
That, coupled with speed, makes it usable for slightly extended code completion (up to one line of code), especially in a highly dynamic programming languages that have worse completion support.
jart
Fair enough. Myself on the other hand, I want the LLM to think when I tell it to think (by pressing the completion keystroke) and I want to be able to supervise it while it's thinking, and edit out any generated prompt content I dislike. The emacs-copilot project design lets me do that. While it might not be great for VSCode users, I think what I've done is a very appropriate adaptation of Microsoft's ideas that makes it a culture fit for the GNU Emacs crowd, because Emacs users like to be in control.
tarruda
Also not familiar with llamafiles, but if it uses llama.cpp under the hoods, it can probably make use of mmap to avoid fully loading on each run. If the GPU on Macs can access the mmapped file, then it would be fast.
jart
Author here. It does make use of mmap(). I worked on adding mmap() support to llama.cpp back in March, specifically so I could build things like Emacs Copilot. See: https://github.com/ggerganov/llama.cpp/pull/613 Recently I've been working with Mozilla to create llamafile, so that using llama.cpp can be even easier. We've also been upstreaming a lot of bug fixes too!
jhellan
Does anyone else get "Doing vfork: Exec format error"? Final gen. Intel Mac, 32 GB memory. I can run the llamafile from a shell. Tried both wizardcoder-python-13b and phi
jart
Try downloading https://cosmo.zip/pub/cosmos/bin/assimilate-x86_64.macho chmod +x'ing it and running `./assimilate-x86_64.macho foo.llamafile` to turn it into a native binary. It's strange that's happening, because Apple Libc is supposed to indirect execve() to /bin/sh when appropriate. You can also try using the Cosmo Libc build of GNU Emacs: https://cosmo.zip/pub/cosmos/bin/emacs
jwr
I get the same vfork message on Apple Silicon (M3), even though I can run the llamafile from the command line. And I can't find an "assimilate" binary for my machine.
jart
On Silicon I can guarantee that the Cosmo Libc emacs prebuilt binary will have zero trouble launching a llamafile process. https://cosmo.zip/pub/cosmos/bin/emacs You can also edit the `call-process` call so it launches `ape llamafile ...` rather than `llamafile ...` where the native ape interpreter can be compiled by wgetting https://raw.githubusercontent.com/jart/cosmopolitan/master/a... and running `cc -o ape ape-m1.c` and then sudo mv'ing it to /usr/local/bin/ape
jhellan
Thank you
nemoniac
Here's someone else getting something similar.
phissenschaft
I use Emacs for most of my work related to coding and technical writing. I've been running phind-v2-codellama and openhermes using ollama and gptel, as well as github's copilot. I like how you can send an arbitrary region to an LLM and ask for things about it. Of course the UX is in early stage, but just imagine if a foundation model can take all the context (i.e. your orgmode files and open file buffers) and can use tools like LSP.
kramerger
> You need a computer like a Mac Studio M2 Ultra in order to use it. If you have a mere Macbook Pro, then try the Q3 version.
The intersection between people who use Emacs for coding and those who own a Mac Studio Ultra must be minuscule.
Intel MKL + some minor tweaking gets you really excellent LLM performance on a standard PC, and that's without using the GPU.
jart
Do you know how much faster llama.cpp would go on something like an Intel Core i9 (has AVX2 but not AVX512) when it's compiled using `cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx`? Are we talking like 10% faster inference, or 100%?
Right now I'm reasonably certain llamafile is doing about the best job it can be doing on Intel/AMD, supporting SSSE3-only, AVX-only, and AVX2+F16C+FMA microprocessors at runtime. In fact, there's even an issue with the upstream llama.cpp project where they want to get rid of all the external BLAS dependencies. llama.cpp authors claim their quantization trick has actually enabled them to outdistance things like cuBLAS and I'd assume MKL too, which at best, can only operate on f32 and f16. https://github.com/ggerganov/ggml/issues/293
My concern with MKL is also that, judging by the llama.cpp README's brief mention of using it, adding support sounds like it'd entail a lot more than just dynamically linking a couple GEMM functions from libmkl.so/dll/dylib. It sounds like we'd have to go all in on some environment shell script and intel compiler. I also remember MKL being a huge pain on the TensorFlow team, since it's about as proprietary as it gets.
kramerger
Last year we got 10x performance improvements on pytorch stable diffusion although there was more to it than just using MKL.
Not sure how well this works for LLMs, but the hardware is much, much faster than people think - even before using the ML accelerators that some new CPUs have - the software support just seems to be lacking.
shepmaster
What is the upgrade path for a Llamafile? Based on my quick reading and fuzzy understanding, it smushes llama.cpp (smallish, updated frequently) and the model weights (large, updated infrequently) into a single thing. Is it expected that I will need to re-download multiple gigabytes of unchanged models when there's a fix to llama.cpp that I wish to have?
jart
llamafile is designed with the hope of being a permanently working artifact where upgrades are optional. You can upgrade to new llamafile releases in two ways. The first is to redownload the full weights I re-upload to Hugging Face with each release. However, you might have slow Internet. In that case, you don't have to re-download the whole thing to upgrade.
What you'd do instead, is first take a peek inside using:
unzip -vl wizardcoder-python-13b-main.llamafile
[...]
         0  Stored           0   0%  03-17-2022 07:00  00000000  .cosmo
        47  Stored          47   0%  11-15-2023 22:13  89c98199  .args
7865963424  Stored  7865963424   0%  11-15-2023 22:13  fba83acf  wizardcoder-python-13b-v1.0.Q4_K_M.gguf
  12339200  Stored    12339200   0%  11-15-2023 22:13  02996644  ggml-cuda.dll
Then you can extract the original GGUF weights and our special `.args` file as follows:
unzip wizardcoder-python-13b-main.llamafile wizardcoder-python-13b-v1.0.Q4_K_M.gguf .args
You'd then grab the latest llamafile release binary off https://github.com/Mozilla-Ocho/llamafile/releases/ along with our zipalign program, and use it to insert the weights back into the new file:
zipalign -j0 llamafile-0.4.1 wizardcoder-python-13b-v1.0.Q4_K_M.gguf .args
Congratulations. You've just created your first llamafile! It's also worth mentioning that you don't have to combine it into one giant file. It's also fine to just say:
llamafile -m wizardcoder-python-13b-v1.0.Q4_K_M.gguf -p 'write some code'
You can do that with just about any GGUF weights you find on Hugging Face, in case you want to try out other models. Enjoy!
I'm sure this, and other LLM/IDE integration, has its uses, but I'm failing to see how it's really any kind of major productivity boost for normal coding.
I believe average stats for programmer productivity of production-quality, debugged and maybe reusable code are pretty low - around 100 LOC/day, although it's easy to hit 1000 LOC/day or more when building throwaway prototypes/etc.
The difference in productivity between production-quality code and hacking/prototyping comes down to the quality aspect, and for most competent/decent programmers, coding something themselves is going to produce better quality code, that they understand, than copying something from substack or an LLM. The amount of time it'd take to analyze the copied code for correctness, lack of vulnerabilities, or even just decent design for future maintainability (much more of a factor in total lifetime software cost than writing the code in the first place) would seem to swamp any time gained by not having to write the code yourself (which is basically the easiest and least time-consuming part of any non-trivial software project).
I can see the use of LLMs in some learning scenarios, or for cases when writing throwaway code where quality is unimportant, but for production code I think we're still a long way from the point where the output of an LLM is going to be developer-level and doesn't need to be scrutinized/corrected to such a degree that the speed benefit of using it is completely lost!