
Reverse Engineering Google Colab



·June 23, 2022


It’s about a bazillion times easier to reverse engineer colab if you just SSH into it. You can set up a reverse proxy. I used ngrok back in the day, but maybe they blocked it.

The most interesting thing was a custom binary that mounts your Google drive as a folder. I was able to copy it off colab and use it on my own Linux boxes, which was handy in a “oh neat, lookie there” kind of way. I assume it’ll break whenever they update their api, but you’d still be able to just grab the new binary from a random colab instance.

There’s also a custom script they run to set up everything, using Node. It spawns a bunch of stuff that I’ve forgotten. (It was 2019 when I was poking around, and a pandemic has a nice way of wiping one’s memory of ye olden hacking days. Still a bit sad I never got to go to the tensorflow conference.)

Anyway just ssh in and ls -la / and you’ll see one or two interesting folders. You can rsync them down to your box and examine at your leisure.
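If you'd rather poke around from a notebook cell before setting up SSH, here is a rough Python stand-in for `ls -la /`. It's only a sketch: `list_root` is a made-up helper name, it skips the `.` and `..` entries that `ls -la` prints, and which folders turn out to be interesting depends on the instance you land on.

```python
import os
import stat
import time


def list_root(path="/"):
    """Return (mode, size, mtime, name) tuples for each entry in `path`."""
    rows = []
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        try:
            # Do not follow symlinks, so links are reported as links.
            info = entry.stat(follow_symlinks=False)
        except OSError:
            # Skip entries we are not allowed to stat.
            continue
        rows.append((
            stat.filemode(info.st_mode),
            info.st_size,
            time.strftime("%Y-%m-%d %H:%M", time.localtime(info.st_mtime)),
            entry.name,
        ))
    return rows


if __name__ == "__main__":
    for mode, size, mtime, name in list_root("/"):
        print(f"{mode} {size:>12} {mtime} {name}")
```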


It should be noted that this is against their rules, and you might get worse instances if they somehow detect it.


I think they’re just trying to fight abuse. You can do everything from colab that you can do from ssh anyway. It’s just faster to enter commands.

Good catch though. I didn’t know that.

When I originally figured out how to ssh in, I kept it a secret figuring that it’d be a matter of time till they clamped down. Guess it took a few years, or I just missed it. Bunch of us in the ML scene used to do it regularly, since it’s way easier to monitor a training run via tmux.


I think they do shut you out if you try to spin up any process through "unauthorised" means. There have been many projects that offer automated setup of SSH/VNC/VSCode on a colab instance, and my experience has been that colab somehow manages to shut off the connections soon after I start them.


But their other docs explain how to use SSH tunnels to connect to local runtimes:

Not quite the same thing I guess.


I would imagine their threat model pretty much assumes anyone can do anything on that host :-)


Doesn't Pro allow SSH?


I looked at the license agreement, and it says under "5. Restrictions"

> circumvent, reverse-engineer, modify, disable, or otherwise tamper with any security technology that Google uses to protect the Paid Service or encourage or help anyone else to do so;

> access the Paid Service other than by means authorized by Google; or

I'm not sure what exactly they mean by "means authorized by Google".


Just use rclone instead; it's open source, fast, supports nearly every cloud storage provider, and has FUSE support for mounting.
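A minimal sketch of what that could look like, driven from Python so it can sit in a setup script. It assumes rclone is installed and that a remote has already been created with `rclone config`. The remote name `gdrive` and the mount point `/mnt/gdrive` are placeholders, not anything from this thread.

```python
import shutil
import subprocess


def rclone_mount_cmd(remote, mount_point, subpath=""):
    """Build the argv for `rclone mount`, which exposes a remote via FUSE."""
    # --daemon makes rclone detach and keep the mount alive in the background.
    return ["rclone", "mount", f"{remote}:{subpath}", mount_point, "--daemon"]


def mount(remote, mount_point):
    """Start the mount; return False if rclone is not on PATH."""
    if shutil.which("rclone") is None:
        return False
    subprocess.run(rclone_mount_cmd(remote, mount_point), check=True)
    return True


if __name__ == "__main__":
    # Only print the command here; "gdrive" and the path are hypothetical.
    print(" ".join(rclone_mount_cmd("gdrive", "/mnt/gdrive")))
```

Calling `mount("gdrive", "/mnt/gdrive")` would actually start the mount, provided the mount point exists and FUSE is available on the box.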


Impressive work.

Just came here to note that we read all of our in-product feedback submissions as well as GitHub issues:

If you've got feature requests or encounter bugs we appreciate you filing!


Do you have a plan to expose some high-level API endpoints? I have been dreaming about something like `<notebook_url>?runtime=gpu` which executes a Colab notebook without human interference. This can be extremely helpful in CI/CD environments when you have a lot of notebooks to test, e.g. for


Colab by design is made to be interactive. They even introduced CAPTCHA to make sure you don't train long models and go do something else.


No plans at this time; we try to prioritize interactive compute features. But this would be really cool to do! Maybe in the future.


Hey, just wanted to say thanks for PM'ing colab. Great toolbox.


Question: Why does Google not allow children to use Colab?

I can imagine plenty of teenagers interested in programming would like to tinker on Colab. However, Google restricts the service to people 18 and above.


Where are you seeing 18+ restrictions? I went through a lot last year to get us approved for 13+ so we'd be good at least down to middle school ish.


Huh. As someone completely naive about the exact specifics of what goes into these sorts of considerations, what needed to be taken into account?

I can think of the "players interact" element of game ratings (and the fundamentally open-ended nature of that) but not much else. Perhaps it's mostly just moderation/policy?


Sorry, yes, it's 13+. That's still unreasonable, in my opinion. Lots of kids program before the age of 13.


> However, it's incredibly difficult to harness the compute power of Colab for anything beyond Jupyter notebooks. For Machine Learning engineers that want to productionize their models and bring them out of the notebook stage, this is a particularly relevant issue; notebooks, while perfect for exploration, don't play well with more advanced MLOps tools that codify the training process into a formal pipeline.

That isn't what Colab is intended for. Google has better and more productive tools for companies that can foot the bill, and those are getting cheaper over time.

AI Notebooks behave the same in practice as Google Colab, with one-click on/off for model testing plus JupyterLab. If you want to minimize costs via spot instances, you can deploy a Compute Engine VM with the Deep Learning VM image, which also has JupyterLab running on launch if you need that workflow, and saves time by including your framework of choice. A spot VM with a T4 GPU is about $0.18/hour.
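To put the $0.18/hour figure in perspective, here is a back-of-the-envelope calculation. The rate is taken at face value from the comment above and is not a current price. The run lengths are made-up examples, and a real bill would also include things like disk and network egress.

```python
# Rate quoted in the comment above; treat it as an assumption, not a price list.
T4_SPOT_RATE_USD_PER_HOUR = 0.18


def spot_cost(hours, rate=T4_SPOT_RATE_USD_PER_HOUR):
    """Estimated cost in USD for `hours` of runtime, rounded to cents."""
    return round(hours * rate, 2)


if __name__ == "__main__":
    # One hour, a working day, a full day, and a 30-day month of uptime.
    for hours in (1, 8, 24, 24 * 30):
        print(f"{hours:>4} h -> ${spot_cost(hours):.2f}")
```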


pops a terminal inline in the colab notebook on the backing vm. super useful if you get tired of having to shell execute all the time via the cell interface.


There's also colab-ssh [1] that sets up an SSH tunnel (through cloudflared) and allows you to connect from your ssh client in your own terminal.





There's an active effort to (again) implement Swift on Colab:


Maybe some of its features can be pulled into jupyterlab!