celestialcheese
Claude 100k v1.3 blew me away.
I gave it the task of extracting a specific column of information, using just the table header text, from a table inside a PDF, with the text extracted using tesseract and no extra layers on top. (For those who haven't tried extracting tables with OCR: it's a non-trivial problem, and the output is a mess.)
Over 40k tokens in context, and it extracted the data at 100% accuracy.
Changing the prompt to target a different column from the same table worked perfectly as well. Changing a character in the table in the OCR context, to test whether it was somehow hallucinating, also accurately extracted the new data.
One of those "jaw to the floor" moments for me.
I did the same task in GPT-4 (limiting the context window to 8k tokens), and it worked, but at ~4x the cost, and without being able to feed it the whole document.
arnaudsm
Using LLMs with 100GB of VRAM to convert PDFs to CSVs is truly depressing, but I am sure many companies will love it.
2023 office software already uses 1000x more resources than 1990s software did. I bet we are ready to do that again.
visarga
Not just PDFs with tables. It works on any semi-structured document with key-value pairs like invoices, purchase orders, receipts, tickets, forms, error messages, logs, etc.
The "Information Extraction from semistructured and unstructured documents" task is seeing a huge leap, just 3 years ago it was very tedious to train a model to solve a single use case. Now they all work.
But if you do make the effort to train a specialised model for a single document type, the narrow model surpasses GPT3.5 and 4.
version_five
Consulting companies are paying juniors > $150k per year to do this kind of thing. In some objective sense, it's absurd, but locally, it makes more sense to use an expensive gpu than an MBA class president. And in 10 years, everyone's phone will have that much compute anyway.
csomar
It's funny, but React/Node/Electron apps will suddenly become minimalist once everyone and his brother starts adding a neural model that consumes 10GB of VRAM to his app.
martythemaniak
You're missing the developer time. You no longer have to spend hours (or days, perhaps weeks, depending on the sources) stringing together random libs, munging and cleaning data, testing, and so on.
arnaudsm
I agree, computers are cheaper than engineers.
But I wonder how much more productive our economies could be if everyone was taught programming the same way we teach reading & writing, and open standards were ubiquitous.
celestialcheese
If you’ve never built PDF or archive document parsing systems, you don’t know true pain.
I see it as incredible. Most PDFs that I see are basically just thin wrappers around image scans of documents that don't exist anywhere else anymore: archives from estates, manuals, etc.
These techniques of using LLMs to clean OCR output are game-changing, because the best-in-class approach before was human-in-the-loop systems that required huge amounts of rewriting to get usable output.
Now LLMs are unlocking previously difficult data sources at significantly lower cost.
SongofEarth
On YouTube there are timer and stopwatch videos with millions of views. People are streaming 1080p video for something that can be implemented locally in 20 lines of code, but does it really matter? It won't make a dent in Google's revenue.
If LLMs are deployed at large enough scale, the convenience really could justify the cost.
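(For what it's worth, the "20 lines" claim holds up. A sketch of a terminal countdown timer in Python, well under that budget:)

```
import sys
import time

def countdown(seconds: int) -> None:
    """Print a mm:ss countdown in place, then announce the end."""
    for remaining in range(seconds, 0, -1):
        mins, secs = divmod(remaining, 60)
        sys.stdout.write(f"\r{mins:02d}:{secs:02d}")
        sys.stdout.flush()
        time.sleep(1)
    print("\rTime's up!")

countdown(300)  # a five-minute timer
```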
yawnxyz
We also had more secretaries, and people who just retyped things all day, in the '90s!
throwaway888abc
It's worth double for the increase in accuracy. Don't make me go back to Amazon Mechanical Turk (poor souls).
anonymouse008
> text extracted using tesseract
You're saying 'the text', without normalizing the rows and columns (basically tab-, space-, or newline-delimited text with sporadic lines per row), was all you needed to send? I still have to normalize my tables even for GPT-4, I guess because I have weird merged rows and columns that attempt to put grouping info on top of the table data itself.
celestialcheese
Exactly. I just sent raw tesseract output, with no formatting or "fix the OCR text" step. So the data looked like:

```
col1col2col3\nrow label\tdatapoint1\tdatapoint2...
```

Very messy.
I don't think this is generalizable to the same 100% accuracy across any OCR output (they can be _really_ bad). I'm still planning on doing a first pass with a better table-OCR system like Textract, Document AI, PaddlePaddle Table, etc., which should improve accuracy.
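For anyone wanting to reproduce this, the whole pipeline is roughly the sketch below. The prompt wording, the `claude-v1-100k` model id, and the 2023-era anthropic SDK usage are my assumptions, not the exact code described above:

```
import anthropic
import pytesseract
from PIL import Image

client = anthropic.Client(api_key="sk-ant-...")

def extract_column(image_path: str, column_header: str) -> str:
    # Raw OCR output, no cleanup or layout reconstruction.
    ocr_text = pytesseract.image_to_string(Image.open(image_path))
    prompt = (
        f"{anthropic.HUMAN_PROMPT} Below is messy OCR output of a table. "
        f"Extract every value from the column whose header is "
        f"'{column_header}'. Return one value per line, nothing else.\n\n"
        f"{ocr_text}{anthropic.AI_PROMPT}"
    )
    resp = client.completion(
        prompt=prompt,
        model="claude-v1-100k",  # assumed model id
        max_tokens_to_sample=2000,
        stop_sequences=[anthropic.HUMAN_PROMPT],
    )
    return resp["completion"]
```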
anonymouse008
That’s still super cool!
Yeah, my use cases are in the really bad category. I've been building parsers for a while, and I've basically given up and resorted to manual "rows of interest, if present" logic. Camelot got so close, but I ended up building my own control layer on top of pdfminer.six to accommodate (I'd recommend Camelot if you're still exploring). It absolutely sucks needing to be so specific out of the gate, but at least the context rarely changes.
swyx
Better: you can do it by copy-pasting from the PDF into GPT on your phone! https://twitter.com/swyx/status/1610247438958481408
anonymouse008
Definitely tried that way too; it didn't work. My tables are pretty dang dumb: merged cells, confidence intervals, weird characters in the cell field that change based on the row values, messing up any simple regex test. It's really a billion-dollar-company problem, but I'm about to punt it to the moon because it's never fully done.
modernpink
What was the dollar cost to do this work? Iterating over a 40k-token context must be expensive.
celestialcheese
~$0.45
nightski
The discourse has made it seem that, with context length, larger is always better. I'm wondering if there is any degradation in the quality of results when the context is scaled this large. Does it scale without loss of performance, or is there a point where, even though you can fit in a lot more information, performance starts to degrade?
phillipcarter
In a brief test, I found that the bigger context window only meant that I could stuff a whole schema into the input. It still hallucinated a value. When I plugged in a call to a vector embedding to only use the top k most "relevant" fields it did exactly what I wanted: https://twitter.com/_cartermp/status/1657037648400117760
YMMV.
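The trick, roughly (a sketch assuming OpenAI's ada-002 embeddings; the function names and k are illustrative):

```
import numpy as np
import openai

def embed(texts: list[str]) -> np.ndarray:
    resp = openai.Embedding.create(input=texts, model="text-embedding-ada-002")
    return np.array([d["embedding"] for d in resp["data"]])

def top_k_fields(query: str, schema_fields: list[str], k: int = 10) -> list[str]:
    # ada-002 vectors come back unit-length, so a dot product is cosine similarity.
    field_vecs = embed(schema_fields)
    query_vec = embed([query])[0]
    scores = field_vecs @ query_vec
    best = np.argsort(scores)[::-1][:k]
    return [schema_fields[i] for i in best]
```

Only those k fields then go into the prompt, instead of the whole schema.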
koboll
The fundamental problem seems to be that it's still slightly sub-GPT-3.5-quality, and even a long context window can't fix that. It will remember things from many, many tokens ago, but it still doesn't reliably produce passable work.
The combination of a GPT-4-quality model and a long context window will unlock a lot of applications that currently rely on somewhat lossy window-prying hacks (e.g. summarizing chunks). But any model quality below that won't move the needle much in terms of what useful work is possible, with the exception of fairly simple summarization and text-analysis tasks.
phillipcarter
Maybe! I certainly look forward to that. Although in my testing GPT-4 also hallucinates a bit (less than GPT-3.5), and the latency is so poor that it's unworkable for our product.
pmoriarty
> The fundamental problem seems to be that it's still slightly sub-GPT-3.5-quality
It really depends on what you use it for.
I've found Claude better than GPT-4, and even than Claude+, at creative writing.
It also tends to give more comprehensive explanations without additional prompting, so I prefer to have it, rather than GPT-3.5 or 4, explain things to me.
It's also free, which is another big win over GPT-4.
dr_dshiv
I find Claude significantly better than 3.5. I’d love to be able to make the case for that with data…
ssd532
I am very impressed with the quality of GPT-4, even the 8k model. However, I have started reaching the limit of what the 8k model can do, and I am eagerly awaiting the release of the 32k model.
The Claude 100k model is nowhere near it in terms of quality, in my experience.
rpcope1
Well, a larger context makes it easier to integrate other tools, like a vector database for information retrieval to jam into the context; the more context, the more potentially relevant information can be added. For models like LLaMA, where the context is (usually) a max of 2k tokens, you're rather limited in how much potentially relevant information you can add when doing complex tasks.
emptysongglass
Any magic tricks to gaining access apart from waiting for months? I've been using GPT-4 and love it but would really love to test that 100k context window with long running chatbots.
famouswaffles
Claude-Instant-100k is available on Poe.com (but only usable as a paying subscriber). Claude-plus-100k isn't up yet but I'm guessing that's a matter of time.
dmix
Nice to see Poe is an actual iOS app for AI chat. Using ChatGPT via the Home Screen “app” is extremely frustrating because it logs you out constantly (maybe due to using Google to auth).
arcastroe
This is the reason I primarily use https://labs.kagi.com/fastgpt . I have it bookmarked as a home screen icon on my phone
systemsignal
If you’re using google login, use a chrome shortcut.
Should keep you logged in for longer and easier to log back in.
costco
I don't have any evidence but I think it's probably done on purpose to make amateur automated free ChatGPT use more annoying.
pmarreck
I use Google to auth on mobile Firefox and I don't get logged out constantly.
heliophobicdude
Perhaps. I don't have those issues from the direct account I have with them.
marcopicentini
Any timeframe when it will be released to the public?
We are in the middle of developing an app, and we are not able to do it with OpenAI's limited context window. We have already submitted the access request.
pmarreck
There are tricks you can do to better utilize the smaller context window, such as sub-summaries and attention tricks. That's how there are already products on the market that consume entire big PDF's and let you query them. Granted, a larger context window would still work better, but it's possible to do.
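One way to read "sub-summaries" is plain map-reduce summarization: summarize each chunk, then summarize the summaries until everything fits in one window. A sketch (`llm()` stands in for whatever completion call you use; the chunk size is illustrative):

```
def summarize_long(text: str, llm, chunk_size: int = 3000) -> str:
    # Map: summarize each fixed-size chunk independently.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    partials = [llm(f"Summarize this passage:\n\n{c}") for c in chunks]
    combined = "\n".join(partials)
    # Reduce: recurse until the combined summaries fit one context window.
    if len(combined) > chunk_size:
        return summarize_long(combined, llm, chunk_size)
    return llm(f"Combine these partial summaries into one:\n\n{combined}")
```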
yawnxyz
It's using "overlapping chunking" methods, and it usually works for generic PDFs. It really falls apart on technical documents, SOPs, and research articles where you need context from chunks far above. Using vector DBs also doesn't work well, because you have to twiddle with window size and overlap, which changes depending on what kind of paper you're uploading. It's a mess and takes too long.
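(For reference, "overlapping chunking" in its simplest form is just fixed-size windows that share some characters with their neighbors, so a sentence cut at one boundary still appears whole in the next chunk. Sizes here are illustrative:)

```
def chunk_with_overlap(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    # Each window starts `size - overlap` characters after the previous one.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```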
marcopicentini
The problem is that making a summary of a 100k-token text costs $2 using Davinci.
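That figure matches Davinci's list price at the time, $0.02 per 1k tokens:

```
tokens = 100_000
price_per_1k = 0.02  # text-davinci-003, USD
print(tokens / 1000 * price_per_1k)  # 2.0 dollars for the input alone
```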
modernpink
What are the commercial applications of mega context window LLMs at current prices? I would guess mainly legal. And what strategies would you rely on to reduce the accumulating costs over the course of a session?
atemerev
I don't understand this "slow rollout" thing among OpenAI's competitors. The chat/instruction models are continuously fine-tuned on real dialogues. To get these dialogues en masse, you need to deploy models to the wide public. Otherwise you will forever be on the losing side, if you can't quickly grab the streams of real-time human-generated content.
People at OpenAI are smart; they understood that quickly. GPT-4 is available nearly everywhere, and lesser models are even free for anyone to use. This required hiring huge teams of moderators, but we are at the land-grab stage, and everyone in the business needs to move fast and break a lot of things. However, GPT-4 and open-source models are the only things I can use: Bard "is not available in my country" (Switzerland), and the first thing the Claude access form asks is whether I am based in the US.
Well, their loss.
dataangel
It's probably the GPUs; they don't have enough capacity to handle more users. My guess is that GPT-4 set off a buying spree. Even for CPUs, I've recently heard that lead times for Sapphire Rapids servers are 2-3 months and for high-end switches 6 months, and those probably have way less demand.
s3p
I think it's cloud limitations. Anthropic probably doesn't have the ability to scale up extremely fast, and accommodating hundreds of millions of users probably isn't as easy for them as it is for OpenAI.
williamcotton
If they are resource-constrained and then opened up the floodgates, resulting in poor performance and timeouts for every user, it seems like it would sour more milk than otherwise.
okdood64
New to ML here, what’s the difference between parameters and context?
sghiassy
Parameters are like the number of neurons in your brain.
Context is how much short-term memory you can retain at any one time (think of how many cards you can remember the order of in a deck of cards).
Closi
Parameters - the number of internal variables/weights in the model.
Context - the length of the input/output buffer (the number of input/output tokens possible).
capableweb
Other answers are already good; just offering one more way to see the difference.
Parameters are set indirectly via training and are kept within the weights of the model itself.
Context is what you, as a user, pass to the model when you're using it; it determines how much text you can actually pass in.
Being able to pass more context means you can (hopefully) make it understand things that weren't part of the initial training.
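A toy illustration of the distinction (a PyTorch sketch; the numbers are arbitrary):

```
import torch.nn as nn

# Parameters: learned weights baked into the model by training.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
model = nn.TransformerEncoder(layer, num_layers=6)
n_params = sum(p.numel() for p in model.parameters())

# Context: just the maximum input length the deployment accepts.
context_window = 2048  # tokens per request

print(f"{n_params:,} parameters, {context_window}-token context")
```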
flerovium
POC or STFU
We can't assess how good it is if it's in closed beta. It's all cherry-picked Twitter demos.
nico
It's also available here in a Google Colab: https://twitter.com/gpt_index/status/1657757847965380610?s=4...
anotheryou
No, you still need to bring your own API key for that.
syntaxing
Is there a trick to getting access? I've been on the waitlist for GPT-4 and Claude for a while. I've been building some proofs of concept with GPT-3.5, but having better models would be a huge help.
gee_m_cee
If you're referring to a paid account: I never received a notification about my GPT-4 waitlist spot. I waited a while for one and then, at the prompting of a colleague, just found a spot in the web UI to sign up. After one false start, it just worked.
pmoriarty
Try going through poe.com. I got access right away.
pr337h4m
Also available on poe.com
wangg
Sharing that this is available on Poe.com from Quora.