
bakkoting

The proposal for native base64 support for Uint8Arrays is mine. I'm glad to see people are interested in using it. (So am I!)

For a status update, for the last year or two the main blocker has been a conflict between a desire to have streaming support and a desire to keep the API small and simple. That's now resolved [1] by dropping streaming support, assuming I can demonstrate a reasonably efficient streaming implementation on top of the one-shot implementation, which won't be hard unless "reasonably efficient" means "with zero copies", in which case we'll need to keep arguing about it.
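For intuition, here's a minimal sketch (not the proposal's API) of how a streaming encoder can sit on top of a one-shot encoder, using Node's Buffer as a stand-in for the one-shot step. Base64 maps every 3 input bytes to 4 output characters, so at most 2 leftover bytes need to carry over between chunks:

```javascript
// Streaming base64 built on a one-shot encoder. Buffer.from(...).toString('base64')
// stands in for the proposed one-shot Uint8Array.prototype.toBase64().
class Base64Stream {
  constructor() {
    this.leftover = new Uint8Array(0);
  }

  // Encode as much of (leftover + chunk) as is a multiple of 3 bytes;
  // carry the remainder (0-2 bytes) to the next call.
  push(chunk) {
    const data = new Uint8Array(this.leftover.length + chunk.length);
    data.set(this.leftover);
    data.set(chunk, this.leftover.length);
    const usable = data.length - (data.length % 3);
    this.leftover = data.subarray(usable);
    return Buffer.from(data.subarray(0, usable)).toString('base64');
  }

  // Encode whatever is left, including padding.
  flush() {
    const out = Buffer.from(this.leftover).toString('base64');
    this.leftover = new Uint8Array(0);
    return out;
  }
}
```

Because each emitted piece covers a whole number of 3-byte groups, concatenating the pieces gives exactly the one-shot result.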

I've also been working on documenting [2] the differences between various base64 implementations in other languages and in JS libraries to ensure we have a decent picture of the landscape when designing this.

With luck, I hope to advance the proposal to stage 3 ("ready for implementations") within the next two meetings of TC39 - so either next month or January. Realistically it will probably take a little longer than that, and of course implementations take a while. But it's moving along.

[1] https://github.com/tc39/proposal-arraybuffer-base64/issues/1...

[2] https://gist.github.com/bakkot/16cae276209da91b652c2cb3f612a...

bsimpson

Thanks for doing this!

I had to convert a ReadableStream to base64 recently, and I was shocked at how much boilerplate it required in 2023:

    if (isReadableStream<Uint8Array>(value)) {
      const chunks = [];

      for await (const chunk of value) {
        chunks.push(chunk);
      }

      if (chunks[0].byteLength) {
        const length = chunks.reduce(
          (agg, next) => agg + next.length, 0
        );

        // Make the same species TypedArray that ReadableStream gave us.
        value = new (chunks[0].constructor as Uint8ArrayConstructor)(length);

        for (let i = 0, offset = 0; i < chunks.length; i++) {
          value.set(chunks[i], offset);
          offset += chunks[i].length;
        }
      } else {
        throw new Error(`Unrecognized readable stream type: ${ chunks[0].constructor.name }`);
      }
    }

    return `data:application/octet-stream;base64,${
      btoa(
        (value as Uint8Array).reduce(
          (result, charCode) => result + String.fromCharCode(charCode),
          ''
        )
      )
    }`;

I had to learn/implement a lot of little details to simply change the encoding from binary to its most common text representation. That's a day I would have rather spent doing product work.

bakkoting

This specific proposal will only really help with the last part - it will let you write

    return `data:application/octet-stream;base64,${(value as Uint8Array).toBase64()}`;
but you'll still have to do the work of reading the stream to a buffer yourself.

There is a _very_ early stage (as in, it's literally just an idea one person had, which may never happen) proposal [1] to do zero-copy ArrayBuffer concatenation, which would further simplify this - once you'd collected the chunks you could `value = new Uint8Array(ArrayBuffer.of(...chunks.map(chunk => chunk.buffer)))` instead of manual concatenation.

Finally, there's the Array.fromAsync proposal [2] and/or async iterator helpers proposal [3] (which I am also working on), which would make it easier to collect the chunks. Putting these together, you'd get something like

    if (isReadableStream<Uint8Array>(value)) {
      const chunks = await Array.fromAsync(value);

      if (chunks[0].byteLength) {
        value = new Uint8Array(ArrayBuffer.of(...chunks.map(chunk => chunk.buffer)));
      } else {
        throw new Error(`Unrecognized readable stream type: ${ chunks[0].constructor.name }`);
      }
    }

    return `data:application/octet-stream;base64,${(value as Uint8Array).toBase64()}`;

[1] https://github.com/jasnell/proposal-zero-copy-arraybuffer-li...

[2] https://github.com/tc39/proposal-array-from-async

[3] https://github.com/tc39/proposal-async-iterator-helpers

bsimpson

It's nice that in an imaginary future that code would be shorter, but it's unfortunate that the conceptual understanding needed to write it isn't. You'd still need to know how to juggle a whole bunch of related concepts - ReadableStream chunks, Uint8Arrays, ArrayBuffers - to write that transformation.

Why does de-chunking a byte array need to be so complicated:

    new Uint8Array(ArrayBuffer.of(...chunks.map(chunk => chunk.buffer)))
especially when chunking is specified by the platform in ReadableStream?

-----

You have made me realize I don't even know what the right venue is to vote on stuff. How should I signal to TC39 that e.g. Array.fromAsync is a good idea?

wayvey

It's good to see that people care about native JS standards. I'm a bit concerned about how Bun seems to be adding its own proprietary APIs to its JS runtime, and I can't help feeling that it could be intentional vendor lock-in. Since they are venture-backed, I'm sure they want some ROI from the project eventually, and if they start charging money it will be harder to switch away due to the proprietary APIs without refactoring your project.

thomasreggi

It's really because we have some wonderful collaboration between vendors now, and they don't want to have to come up with their own APIs.

https://wintercg.org/

pjmlp

Cough. Vercel.

brundolf

The accusation of malice goes a bit far, but I agree it's a problem whether it's intentional or not. Bun takes a slapdash approach to shipping features and APIs, which means things get out fast but also rubs me the wrong way re: long-term implications and vision

eyelidlessness

I’ll vouch for non-malice. Bun’s creator frequently tweets about hypothetical APIs he may want to introduce, often seeking feedback explicitly. The ideas are always earnest “would you find this helpful?” sorts of things. (For what it’s worth, I frequently reply that these hypotheticals are a bad idea when they break expectations etc, at least if I can fit my reasoning in tweet length. I think this has been at least marginally successful in pushing back in some cases.)

The problem, with Bun but really with the ecosystem at large, is that shipping stuff is (still) the de facto way that standards kind of congeal into something resembling actual standards.

brundolf

Yeah. I follow him on twitter too, and it's both impressive and distressingly chaotic the way he spitballs and then settles on APIs in the form of twitter surveys

lemper

> can't help feeling that it could be intentional vendor lock-in.

it's always vendor lock-in. can never trust companies, mate!

jokethrowaway

It's still OSS, not proprietary APIs. Feel free to fork it in the future, once they need to make their money back.

If it's faster than node and the API to do what I need is better (Bun.FileSystemRouter comes to mind), I don't see why not to switch over.

Heck, I would accept even less retro-compatibility to get rid of transpiling, that completely obscene political mess they made with modules, and the treatment of TS as a 2nd-class citizen.

The amount of time I'm still hit by ESM vs CJS issues is insane. Truly a Python-2-vs-3 moment for the Node.js community.

chlorion

The term "proprietary" does not always refer to software licenses, although that's one of the common reasons something is considered proprietary.

SahAssar

How would you define it?

The wiki article for it in the context of software (https://en.wikipedia.org/wiki/Proprietary_software) seems to pretty much place it in opposition to open licenses, but I also have a sort of gut feeling that it could also refer to interfaces in OSS that are not designed to be re-implementable. Not sure about that though.

vendiddy

I had to deal with binary data in a project recently and all the options made my head spin. File, Blob, Buffer, ArrayBuffer, Uint8Array, and so on.

Was very confused on what to use!

mofle

- Blob: Immutable raw data container with a size and MIME type, not directly readable.

- File: Like a Blob, but with additional file-specific properties (e.g., filename).

- ArrayBuffer: Fixed-length raw binary data in JavaScript, not directly accessible.

- Uint8Array: Interface for reading/writing binary data in ArrayBuffer, showing them as 8-bit unsigned integers.

- Buffer: Readable/writable raw binary data container in Node.js (subclass of Uint8Array)
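A quick sketch of how these relate (Blob is a global in browsers and in Node.js 18+):

```javascript
// ArrayBuffer: raw bytes, no direct element access.
const ab = new ArrayBuffer(4);

// Uint8Array: a read/write view over those bytes.
const view = new Uint8Array(ab);
view.set([1, 2, 3, 4]);

// Blob: wraps (a copy of) the data immutably, with an optional MIME type.
// Reading it back requires an async API such as blob.arrayBuffer().
const blob = new Blob([view], { type: 'application/octet-stream' });
```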

bakkoting

Nit: "fixed-length" is no longer true as of very recently [1].

[1] https://github.com/tc39/proposal-resizablearraybuffer

pwdisswordfishc

Now it's bounded-length.

explaininjs

A File is a Blob; every File is `instanceof Blob`.

jongjong

TBH I prefer Buffer as an abstraction. Too bad it didn't make it into the native JS standard. I can't think of many use cases in JS land where Uint8Array, Uint16Array, Uint32Array, Int8Array would be absolutely necessary. Seems more useful for perf optimizations. For WebAssembly? Surely a plain array of numbers can be used in most cases. The main use case for Buffer IMO is to convert between different formats like Base64, hex or for sending raw binary data over a transport; ultimately we just need an object to represent binary. It doesn't seem appropriate for a high level language like JS to concern itself with the details of CPU architecture which warrant thinking of binary in terms of 8-bits, 16-bits or 32-bits in the first place. As an abstraction, it's rather arbitrary to treat these numbers as special. It really seems to come down to a marginal performance optimization.

SuboptimalEng

Typed arrays are essential for web apps that use WebGL and WebGPU. Being able to send this type of data to run computations on the GPU can give you 1000x speed up.

You can see it in action on this WebGL fluid simulator[0] by PavelDoGreat.

[0] https://github.com/PavelDoGreat/WebGL-Fluid-Simulation

ninepoints

This isn't an argument against buffer though. Raw memory is raw memory

mateuspires

This simulator is awesome.

mofle

> I can't think of many use cases in JS land where Uint8Array, Uint16Array, Uint32Array, Int8Array would be absolutely necessary.

Buffer is a subclass of Uint8Array.

tomjakubowski

Sized numerics are useful even in high level, loosely and dynamically typed languages. Two examples are binary (de)serialization and OpenGL. In Python, libraries for those generally use sized numerics.

High level languages will still want to do low level things
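For example, a hypothetical length-prefixed wire format (made up here for illustration) is awkward without sized types but trivial with DataView and Uint8Array:

```javascript
// Hypothetical wire format: a big-endian uint16 length prefix,
// followed by that many payload bytes.
function encodeMessage(payload) {
  const out = new Uint8Array(2 + payload.length);
  new DataView(out.buffer).setUint16(0, payload.length, false); // big-endian
  out.set(payload, 2);
  return out;
}

function decodeMessage(bytes) {
  const len = new DataView(bytes.buffer, bytes.byteOffset).getUint16(0, false);
  return bytes.subarray(2, 2 + len);
}
```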

akira2501

Buffer is a utility. It combines several abstractions that are common in nature into one highly useful class. I quite prefer using it as well.

This is what I feel the Node people get right over the ES people: the ES ideology is so abstracted, and pushes everything out into small utility classes, that you have to create a handful of different objects with odd combinations of methods to get one useful conversion done.

The ES people also seem to have no love for the CLI or for the types of debugging and testing done there. As a result, I almost always choose the Node-created abstractions over the ES-specified ones.

iso8859-1

> For WebAssembly? Surely a plain array of numbers can be used in most cases.

So a list of floats? No! Let's not use floats everywhere just because JavaScript does.

josephcsible

> buffers expose private information through global variables, a potential security risk.

Does JavaScript's security model let you effectively sandbox scripts running in the same context from each other? If not, then why does this matter?

brundolf

Consider a web server serving multiple ongoing requests from different users (with separate permissions). Each of them uses a buffer at some point (either a Buffer or a Uint8Array)

If the buffer doesn't have any cross-contamination with global state, there's no way one user could access another's data (because it's behind an object reference that never comes into the scope of the request logic for the other user). But if it did, and a malicious user found some other kind of vulnerability, they could potentially access data across-scopes

kklisura

Replace buffer in your example with a plain object or just an array - the rest of it still stands, right?

There's something other at play:

> buffers expose private information through global variables, a potential security risk.

This links to following piece of code [1]:

> // Somewhere in your code

> const privateBuf = Buffer.from(privateKey, 'hex');

> // Rogue package can access

> Buffer.from('1').buffer

I've just run it in node, and my god am I shocked!

> const privateBuf = Buffer.from('DEADBEEF', 'hex');

> Buffer.from('1').buffer

> ArrayBuffer { [Uint8Contents]: <2f 00 00 00 00 00 00 00 de ad be ef 00 > ....

[1] https://github.com/nodejs/node/issues/41588#issuecomment-101...

strken

From a cursory read, it matters because all Buffer.from(...) calls use a shared buffer, and buffer over-reads are a much more common vulnerability than easy access to arbitrary memory.

Not a security expert etc.
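The pooling is easy to observe directly in Node (sizes assume the default Buffer.poolSize of 8 KiB):

```javascript
// Small Buffers from Buffer.from()/Buffer.allocUnsafe() are carved out of a
// shared pre-allocated pool, so their underlying ArrayBuffer is larger than
// the Buffer itself and is shared with unrelated allocations.
const pooled = Buffer.from('secret');
console.log(pooled.length, pooled.buffer.byteLength); // e.g. 6 vs 8192

// Buffer.alloc() skips the pool and gets its own zero-filled ArrayBuffer.
const unpooled = Buffer.alloc(6);
console.log(unpooled.buffer.byteLength); // 6
```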

kev946

I find Buffer useful for its conversion functions to/from different encodings, e.g., `Buffer.from(data, 'hex')` or `buf.toString('base64')`. Is there a good way to do this with `Uint8Array`?

mofle

People are working on bringing Base64/Hex conversion to JavaScript: https://github.com/tc39/proposal-arraybuffer-base64

I also provide a package to make the transition easier: https://github.com/sindresorhus/uint8array-extras (Feel free to copy-paste the code if you don't want another dependency)

bsimpson

Thanks for all the utility belts you've provided to the ecosystem.

It's insane to me that something as simple as concatenating an array needs a library, but as I've shown upthread, Uint8Arrays are way too complicated to work with.

ash_gti

You can use `atob` and `btoa` functions for some of that.

esprehn

Those functions are fundamentally broken: https://developer.mozilla.org/en-US/docs/Glossary/Base64#jav...

See the whole section on converting arbitrary binary data and the complex ways to do it.

denzquix

Although those functions operate on "binary strings", not Uint8Arrays, and there is no especially clean way that vanilla JS exposes to convert between the two that I am aware of.
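The usual workaround (helper names are mine) goes through the "binary string" representation that btoa/atob expect, one char code per byte - which is also the slow, allocation-heavy path the proposal aims to replace:

```javascript
// Uint8Array -> base64, via the "binary string" btoa expects.
function uint8ToBase64(bytes) {
  let bin = '';
  for (const b of bytes) bin += String.fromCharCode(b);
  return btoa(bin);
}

// base64 -> Uint8Array, via the "binary string" atob returns.
function base64ToUint8(b64) {
  const bin = atob(b64);
  return Uint8Array.from(bin, c => c.charCodeAt(0));
}
```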

davidmurdoch

Polyfill Buffer.

norman784

I would prefer not to install any dependency if possible; I don't see the necessity of installing a polyfill just to avoid the built-in one because it's Node-specific.

shiomiru

I wonder why you would introduce an extra dependency for the base64 example. It seems more trivial than left pad.

https://github.com/sindresorhus/uint8array-extras/blob/cbf24...

betenoire

> // Required as `btoa` and `atob` don't properly support Unicode: https://developer.mozilla.org/en-US/docs/Glossary/Base64#the...

davidmurdoch

Buffer's `slice` is nice to have in many cases though. The pool makes it generally faster. And `allocUnsafe` is a great feature! I like Buffer.

Etheryte

As discussed in another thread, `subarray()`[0] fills the same purpose.

[0] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
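For reference, `subarray` returns a view over the same memory - which is what Buffer's `slice` does - while `Uint8Array.prototype.slice` makes a copy:

```javascript
const arr = Uint8Array.from([1, 2, 3, 4]);
const view = arr.subarray(1, 3); // shares arr's buffer, like Buffer#slice
const copy = arr.slice(1, 3);    // independent copy

arr[1] = 99;
console.log(view[0]); // 99 - the view sees the write
console.log(copy[0]); // 2  - the copy does not
```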

keithwhor

Personally, I won't make the switch until the utility methods are supported natively. I don't think it's worth the extra package overhead to execute `buf.toString('base64')`. The second that lands, I'm all-in on abandoning buffers.

ravenstine

I agree with using Uint8Array (although it's really ArrayBuffer that's the key difference), but what is up with the author using an NPM package to do something as trivial as checking if an object is a Uint8Array? `Uint8Array.prototype.isPrototypeOf` should be perfectly adequate and doesn't involve adding an attack vector to your application.

mofle

`Uint8Array.prototype.isPrototypeOf` and `instanceof Uint8Array` do not work across realms (frames, Node.js VM, etc).

Feel free to copy-paste the function to your own code base if you don't want the dependency:

    const objectToString = Object.prototype.toString;

    export function isUint8Array(value) {
      return value && objectToString.call(value) === '[object Uint8Array]';
    }

ravenstine

I understand what you're saying, but that's actually in support of my point. This is still extremely trivial code to implement and, from what I can tell, doesn't warrant downloading an NPM package. Have we already forgotten the left-pad fiasco?

This isn't meant as a personal attack on anyone, but we really need to frown upon needless dependencies, especially given the growing number of malicious NPM packages.

lexicality

You're talking to someone who has published well over a thousand packages, many of them tiny.

I suspect your philosophies are irreconcilable.

mofle

No one is forcing you to use it. You can choose to reimplement the code yourself or you can choose to copy-paste the code. I made the package for my own convenience as I need to transition a lot of packages from `Buffer` and I don't want to maintain duplicates of the code in every package. Others are free to use the package or not.

spicykraken

What do you mean by "across realms"?

Is that just another way of saying `Uint8Array.prototype.isPrototypeOf` and `instanceof Uint8Array` are not available in all JS environments?

I guess what I'm asking is the definition of a "Javascript Realm" in case I'm thinking it's something different.

mofle

https://weizmangal.com/2022/10/28/what-is-a-realm-in-js

Examples of this are frames in the browser and the `vm` module in Node.js.

silverwind

> `Uint8Array.prototype.isPrototypeOf` and `instanceof Uint8Array` do not work across realms (frames, Node.js VM, etc).

That sounds like a bug in those implementations.

eyelidlessness

That’s a reasonable intuition, but it’s not a bug. Global scopes are isolated between realms by design, and that applies to built-ins as well as their prototype chains.

paddy_m

Somewhat related, who has played with performant JS dataframe libraries? I mostly want to serialize from pandas/polars to JS in a fast way, with some very minor selection and indexing operations JS side. Ideally I could find a library that does dataframe like operations from reading base64 encoded Buffers/Arrays into their JS objects, then doing the right things.

I'm investigating https://arrow.apache.org/docs/js/ https://github.com/vega/falcon https://github.com/pola-rs/nodejs-polars https://github.com/uwdata/arquero

paddy_m

I added a GitHub issue to my project where this matters literally this morning. I'd love to talk to someone dealing with the same stuff. I have built dataframe -> JS serialization utils so many times over the past decade. I never want to do much in JS, just simple selection and filtering. But you end up needing both sides of the serialization to do it right.

kylebarron

nodejs-polars is node-specific and uses native FFI. polars can be compiled to Wasm but doesn't yet have a js API out of the box.

As for the fastest way to serialize Pandas data to the browser, you should use Parquet; it's the fastest to write on the Python side and read on the JS side, while also being compressed. See https://github.com/kylebarron/parquet-wasm (full disclosure, I wrote this)

thomasreggi

I've been using the native Uint8Array that Deno provides and it's been great overall. From time to time I may need to convert away from a Buffer type and it's a little bit of a headache, but I think the ecosystem is supporting Uint8Array very well, at least on the server.


Goodbye, Node.js Buffer - Hacker News