
BCHS: OpenBSD, C, httpd and SQLite web stack

dleslie

I'd be fine with this, even totally on-board, if C weren't so awful with respect to text. You don't even have to worry too much about free()ing your malloc()s if you design around short-lived processes. But this is just asking for security concerns among the tangled web of string and input processing your bespoke C routines are likely to develop into.

Pair it with a better, more modern, and safer native-compiled language and get the same effect. Zig, Nim, Go, hell even Carp.

littlestymaar

> Pair it with a better, more modern, and safer native-compiled language and get the same effect. Zig, Nim, Go, hell even Carp.

I love how trollish it is not to talk about Rust in that context.

teaearlgraycold

Maybe they didn’t want to bring forth the Rust stans

FpUser

Your message looks like a perfect example of trolling to me.

scrame

This entire thread is a troll, or a demonstration of Poe's law.

na85

If we'd just rewrite all of the things in Rust we could solve computer bugs forever, and world hunger too.

LeFantome

This having to re-write things is obviously Rust’s fatal flaw.

I cannot wait for the next great language, the one that brings all Rust’s advantages which is a pure superset so that it can still compile existing code. Surely something like that ought to end all these petty language rivalries forever!

ascendantlogic

This is what the Rust Evangelism Strike Force™ told me and by God, I believe them.

WesolyKubeczek

And end mortality!

pietromenna

BRHS does not sound as cool as BGHS. This is the only reason to exclude Rust and prefer Go. :p

the_only_law

I just wish there were better tools for navigating C codebases.

There’s been more than one time where I’m in some large autotools-based project trying to figure something out and there’s a call out to some dependency I have no idea about.

Also, many of the projects lack any sort of documentation or source code commenting. These aren’t someone's pet projects, either: one of them was from a notable name in the open source community and the other was a de-facto driver in a certain hardware space.

pjmlp

Using an IDE like VS/CLion/CDT/etc. would already be of help.

Then there are tools like SourceGraph, CppDepend among others.

Too

Use clang to generate a compilation db. Most IDEs support this format out of the box, otherwise via plugin or routed through youcompleteme.

https://clang.llvm.org/docs/JSONCompilationDatabase.html
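
The compilation database itself is just a JSON array with one entry per translation unit; a minimal hand-written example (the paths here are made up) looks like:

```json
[
  {
    "directory": "/home/user/project",
    "command": "cc -Iinclude -c src/main.c -o build/main.o",
    "file": "src/main.c"
  }
]
```

Tools such as Bear, or CMake with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON, can generate the file automatically from an existing build.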

loeg

Have you tried ctags for navigation? Or clang-lsp?

jhallenworld

Try cscope.

tharne

It does seem sometimes that a lot of folks use C for philosophical rather than practical reasons.

That being said, I love seeing a push for simple stacks like this.

rkagerer

> You don't even have to worry too much about free()ing your malloc()s

*gasp!* Such lack of symmetry... it disturbs something deep in my soul.

vbezhenar

It's just a well known arena allocator pattern, implemented on OS level.

rnkn

Is there a good string-manipulation C library?

avar

The strbuf library that's part of git.git is a pleasure to work with. It's C-string compatible (just a char * / size_t pair), guarantees that the char * member is always NUL-terminated, but can also be used for binary data. It also guarantees that the "buf" member is never NULL: https://github.com/git/git/blob/v2.34.0/strbuf.h#L6-L70

MrBuddyCasino

This looks really well thought out - bookmarked.

pjmlp

Yes, SDS from Redis project.

https://github.com/antirez/sds

However the moment you call into other C libraries, they naturally only expect a char *.

lelanthran

I got tired of running into this problem and decided to simply eat the cost of using `char *` in my string library.

int_19h

That's not really a problem if the only thing they need is direct access to a read-only view of the buffer (i.e. const char*) - then it's no different than C++ and std::string.

lelanthran

> Is there a good string-manipulation C library?

You will have to define "good". My string library[1][2] is "good" for me because:

1. It's compatible with all the usual string functions (doesn't define a new type `string_t` or similar, uses existing `char *`).

2. It does what I want: a) Works on multiple strings so repeated operations are easy, and b) Allocates as necessary so that the caller only has to free, and not calculate how much memory is needed beforehand.

The combination of the above means that many common string operations that I want to do in my programs are both easy to do and easy to visually inspect for correctness in the caller.

Others will say that this is not good, because it still uses and exposes `char *`.

[1] https://github.com/lelanthran/libds/blob/master/src/ds_str.h

[2] Currently the only bug I know of is the quadratic runtime in many of the functions. I intend to fix this at some point.

lanstin

No? Asking for code nav and you get three answers. Asking for this and you get crickets. In the 90s I worked at a place where we embedded Tcl into all the apps and rolled our own templating systems. I had to do a little string stuff in C after a few years of Go, and it sucked. Ugh. buf[len] = '\0';

Using Go, I thought I was getting back to low-level stuff, but this C experience made me appreciate strings in Go. Web servers in C are a crazy bad idea, especially if they are spitting out HTML. Lisp would be better. Node would be better. Go would be better.

foxfluff

> buf[len] = '\0';

So why didn't you use one of the bazillion library functions or third party libraries that terminate strings for you?

I feel like most of the criticism is coming from people who punish themselves by rejecting library functions and then complaining that strings are hard. Doh.

pull_my_finger

Can't vouch for any in particular, but they do exist. https://github.com/oz123/awesome-c#string-manipulation

ainar-g

Considering that it's a stack that uses OpenBSD, my first thought would be Perl, although it's not a language that one could call “modern”, heh. It's included into the base system and has rich libraries for text processing, (Fast)?CGI, HTML, and all that.

pjmlp

If using C is a must, having static analysis as part of CI/CD pipeline and using libraries like SDS should be considered a requirement.

Otherwise, yes, using anything safer, where lack of bounds checking isn't considered a feature, is a much better option.

kloch

I wrote my first web app in 2000 using C/MySQL. It was insanely fast but very awkward to implement. I used C because it was (and still is) the only language I knew well.

At least if you are going to use C, you (should) know to be extremely paranoid about how you process anything received from the user. That doesn't remove the risk but at least you are focused on it.

tinus_hn

This is the same in any language. You can cause security issues in other languages as well if you trust the user/attacker.

teleforce

The SQLite author is an avid Tcl user, and he recently introduced a small, secure and modern CGI-based web framework called Wapp [1],[2].

[1] Wapp - A Web-Application Framework for TCL:

https://wapp.tcl.tk/home

[2] EuroTcl2019: Wapp - A framework for web applications in Tcl (Richard Hipp):

https://www.youtube.com/watch?v=nmgOlizq-Ms

adamrezich

this is very cool. I only have a passing familiarity with Tcl, but I've been building my own toy web framework and this is a fantastic reference! they made a lot of the same choices I made API-wise but the way they went about it is worth studying.

dmux

I'd like to point out that Wapp doesn't necessarily need to be run as a plain old CGI application; I've had success running it with its own built-in web server behind NGINX, for example.

theamk

It seems pretty crazy to write web-facing apps in C, with no memory safety at all.

(They do have "pledge", but even in the most restricted case this still leaves full access to the database.)

rossy

It seems like the database libraries they recommend for security, ksql and sqlbox, mitigate the risk with process separation and RBAC, so the CGI process doesn't have full access to the database.

It's definitely contrary to modern assumptions about web app security, but it's interesting to see web apps that are secure because they use OS security features as they were designed to be used, rather than web apps that do things that are insecure from an OS perspective (like handling requests from multiple users in the same process) but are secure because they do it with safe programming languages.

theamk

ksql exports "ksql_exec", while sqlbox exports "sqlbox_exec" -- both of those allow execution of arbitrary SQL.

So no, the web apps cannot be made secure via OS support alone, because the OS security features are not adequate for high-level problems. Any sort of code exploit allows attacker to trivially access the entire database -- either to read anything, or to overwrite anything.

"pledge" and "unveil" can prevent new processes from being spawned, but they cannot prevent authentication bypass, database dumping, or database deletion.

spudwaffle

How is the overhead of creating a process per-request in this type of system?

jolux

Process-per-request is just infeasible with any significant amount of load.

tyingq

Though the majority of running web servers, load balancers, protocol proxies like php-fpm, etc, are probably written in C :)

aspyct

Yes, but they are...better built than your quick social network poll application thingy with customer's special sauce that you had 5 days to specify, develop and deploy.

C is a tremendous tool, but I don't think it's the best for customer facing web apps.

snvzz

Not to mention databases.

galdosdi

Funny to reflect that there was a time not so long ago when writing web apps (CGI usually) in C wasn't at all unusual (shortly before Perl became much more popular for this). And today, it is indeed kind of crazy.

wk_end

Depends on your definition of "not so long ago" - it's certainly most of the history of the web. The point when Perl, PHP, and Java started to become the dominant web app technologies is about as far from the present day as that point was from the moon landing.

codazoda

The moon landing was 1969, 25 years before PHP was created in 1994. And that was just 28 years ago.

Oops, math checks out. I’m old.

LeFantome

I remember writing CGI scripts in Perl in 1993 ( the year before Netscape ). I am not sure when CGI even became a thing but it could not have been long before that.

Not only was “not so long ago” kind of at the very beginning of meaningful web history but it was also for a very brief moment in time ( if we are talking pre-Perl ). Pre-Perl CGI may have never been a thing though as Perl is older than CGI.

I recall PHP being the next wave after Perl. One could argue it never lost its place even if it now has many neighbours.

Not a Perl advocate by the way though it did generate some pretty magical “dynamic” web pages from text files and directory listings back in the day. Similar story with PHP.

galdosdi

It is true the time when it might have been sane to write CGI in C was very brief. Perl took over almost immediately (and to my chagrin, eventually PHP ate Perl's lunch). I remember reading CGI books that would explain how to do it in either Perl or C, the justification being "in case you need C for performance" but in reality I don't think a ton of C CGI was written. There was definitely some though; I recall poking around in cgi-bin directories and finding some compiled executables (could have been another compiled language like C++) and being disappointed I couldn't view the source like with .pl files.

It really takes you back to a very specific point in time though. A magical time when every year or month software and internet technology would take big leaps and bounds. When you might do things in a way that is very manual and slow compared to today and yet it was amazing at the time.

pjmlp

You mean 1997?

By 1999 I was already using our own version of mod_tcl and unfortunately fixing exploits every now and then in our native libs called by Tcl.

rkeene2

How about a web-facing shell that allows arbitrary code execution? [0]

There's nothing fundamentally insecure about allowing C or any arbitrary code to execute on behalf of a user -- this is basically what cloud computing (especially "serverless") is.

As you identify, though, you need a Controlled Interface (CI) which accounts for this model for all resources and all kinds of resources and many tools do not (yet) allow for it.

[0] https://rkeene.dev/js-repl/?arg=bash

theamk

The big difference is that with bash (python, perl, php etc..) exploits, all you need is to upgrade a package, and you are secure. No need to touch any of the application code.

Compare it with C, where the bugs are likely unique per app, and require non-trivial effort to detect and fix.

Execution of user-specific code by serverless services requires non-trivial isolation, and is predicated on each user having their own separate area. This is not the case with most websites. Take HN for example -- there is shared state (the list of posts) and app-specific logic about who can edit a post (the original owner or a moderator). No OS-based service can enforce this for you.


km

Writing C might be challenging for some, but as others have mentioned, one can use some other language which gives a statically linked binary to place in the httpd chroot. It won’t be BCHS then.

For uptime.is I’ve used a stack which I’ve started calling BLAH because of LISP instead of C.

jamal-kumar

People love to talk all sorts of trash on this kind of stack but it's really quite solid for what it does. If anyone was ever curious what a sizeable codebase in this kind of code would even look like, check out the source code for undeadly.org [1]. Yeah these people may be crazy but they're also OpenBSD developers and we really love to see what we can get away with using nothing other than what's available in the base distribution. I think a lot of what you see being written for production ends up being very similar to this kind of approach, maybe just utilizing rust or golang as the web application backend language if that's what is the more comfortable thing. Nothing but the base system and a single binary, not relying on an entire interpreter stack, sure can be smooth.

There are other examples of this kind of approach too: straight-C Common Gateway Interface web applications in public-facing production use. What comes to mind is cgit [2], the version control web frontend used by the people who write WireGuard. If it's really so crazy, then how come the OpenBSD and WireGuard people - presumably better hackers than you - are just out there doing it?

Other places you see C web application interfaces include embedded devices (SCADA, etc.) and even the web interfaces of routers, which unfortunately ARE crazy, because check out all the security problems! Good thing people at our favorite good old research operating system have done the whole pledge(2) [3] syscall to try and mitigate things when those applications go awry; understanding this part of the whole stack is probably key to seeing how any of it makes sense at all in 2022. It sure would be nicer if those programs just crashed instead of opening up wider holes. Maybe we can hope these mitigations, and the higher code quality that limited-resource device constraints demand, become more widespread.

[1] http://undeadly.org/src/ [2] https://git.zx2c4.com/cgit/ [3] https://learnbchs.org/pledge.html

foxfluff

> If it's really so crazy then how come the openbsd and wireguard people - presumably better hackers than you - are just out there doing it?

Probably precisely because they're better? I can see why people who are struggling with malloc and off-by-ones (https://news.ycombinator.com/item?id=29990985) would think it's crazy.

visireyi

> we really love to see what we can get away with using nothing other than what's available in the base distribution

pkg_add sqlite3

Can't get away.

petee

#include <db.h>

Berkeley DB with a header date of 1994 :) In base, and of course it still works.

SQLite was removed from base, again, in 6.1 (2017) -- https://www.openbsd.org/faq/upgrade61.html

with this BSDCAN '18 pdf briefly explaining the issues (unmaintainable) -- https://www.openbsd.org/papers/bsdcan18-mandoc.pdf

mrweasel

I believe SQLite was in base when BCHS was first presented. That, and you can just grab the single-file C amalgamation of SQLite, no need for a package.

jamal-kumar

OK, to be honest, let me amend that, because you make a valid, if snarky, point!

We like seeing what we can get away with using what's available in the base distribution and a few well-chosen, well-audited packages

rnkn

The Dunning-Kruger effect is stronger in people who spend a lot of time alone, e.g. programmers, which we will now see unfold below.

alexshendi

I propose an amendment to Godwin's Law to include "Dunning-Kruger" , "Dunning-Kruger-effect" and "Dunning-Kruger effect".

rnkn

omg it's like I'm Nostradamus!


petee

Another great stack for writing C (or now python) is https://kore.io which offers quite a few helper features, and its easy to get started

RcouF1uZ4gsC

> How do I pronounce BCHS?

I think the correct pronunciation is “Breaches”. Using C in this place, as others have mentioned, is very, very likely to lead to security issues. Even C++, with its better string handling, would be a step up.

ThinkBeat

I remember writing a lot of early web stuff in Perl/CGI. The "servers" I wrote were fast. Perl had most things you could desire built in already.

Database stuff took a good deal of doing, but with little in terms of abstraction, it was also quite fast.

I would like to see a renaissance of using different protocols than HTTP and different content markup than HTML.

harryvederci

Interesting CGI content linked on there.

I've been reading about / hacking on CGI recently, and it's been kinda fun!

Question: One thing I keep reading is how inefficient it is to start a new process for each incoming connection. Could someone explain to me why that's such a bottleneck? I imagine it being an issue back when CGI was used everywhere, people moving away from CGI, and forgetting about it. But hasn't there been improvements in the meantime? Computers from today can run circles around those from a few decades back. Has everything improved except the speed / efficiency of starting a new process?

(I don't have a computer science background, but I guess you could already tell from the above.)

lelanthran

> Interesting CGI content linked on there.

>

>I've been reading about / hacking on CGI recently, and it's been kinda fun!

>

>Question: One thing I keep reading is how inefficient it is to start a new process for each incoming connection. Could someone explain to me why that's such a bottleneck? I imagine it being an issue back when CGI was used everywhere, people moving away from CGI, and forgetting about it. But hasn't there been improvements in the meantime? Computers from today can run circles around those from a few decades back. Has everything improved except the speed / efficiency of starting a new process?

>

It's not as bad as you think it is; just change the webserver to pre-fork. From this link[1], and the nice summary table in this link[2] - I note the following:

1. Pre-forked servers perform very consistently (little variation before being overwhelmed) and appear at a glance to be less consistent only than epoll.

2. For up to 2000 concurrent requests, the pre-forked server performed either within a negligible margin against the best performer, or was the best performer itself.

3. The threaded solution degraded most gracefully; if a script was monitoring the ratio of successful responses, it would know well beforehand that an imminent failure was coming.

4. The epoll solution is objectively the best, providing both graceful degradation as well as managing to keep up with 15k concurrent requests without complete failure.

With all of the above said, it seems that using CGI with a pre-forked server is the second best option you can choose.

I suppose that you then only have to factor in the execution of the CGI program (don't use Java, C#, Perl, Python, Ruby, etc - very slow startup times).

[1] https://unixism.net/2019/04/linux-applications-performance-i...

[2] https://unixism.net/2019/04/linux-applications-performance-p...

tleb_

Careful: pre-fork as described in the given link has worker processes each handling many requests. This result therefore does not answer the question about the cost of one process per request. The one that does seems to be fork, which is far less efficient (~460 processes spawned per second seems low, though; can we really not do more?).

harryvederci

I'll read those articles you shared, thanks!

Currently the CGI stuff I'm working on is to run stuff on a cheap shared host, so I'll have to check which category of servers that Apache falls in.

Once an application I'm running on a shared host becomes successful enough, I'm probably going to want to move to a different environment, but I'm still interested in what that would mean for performance :)

lelanthran

> Once an application I'm running on a shared host becomes successful enough, I'm probably going to want to move to a different environment, but I'm still interested in what that would mean for performance :)

Depending on what you are doing and what language you are using, a $5/m DO droplet might be sufficient. I once ran a single multi-threaded server, serving a simple binary protocol, and over a 2 day period it handled sustained loads of up to 30k concurrent connections.

To get it that high, I had to up the file descriptor limit on that host.

jim_lawless

It's not just the start-up and shut-down costs. A CGI process might need to obtain connections to databases or other resources that could be pooled and re-used if the process didn't completely terminate.

You might want to look at using FastCGI:

https://en.wikipedia.org/wiki/FastCGI

Basically, the CGI processes stay alive and the servers supporting FastCGI ( like Apache and nginx ) communicate with an existing FastCGI process that's waiting for more work, if available.

harryvederci

Thanks! That's a good point, about re-using connections.

For my current use-case* that wouldn't be an issue, so CGI could probably be OK there, then!

* A side project that uses SQLite (1 file per user), and no other external resources.

tonyarkles

I’m smiling at your question!

Yes, it’s less efficient than having a persistent server, but as all things are, it exists in a spectrum.

The load time for one of these processes is going to be almost trivial. I’m on mobile right now, but I would guess that it would be in a handful of milliseconds, especially when the binary is already in cache (due to other requests).

But if you want to compare this against a lot of the prevailing systems, it'll still probably win on single-request efficiency. Network hops, for example, are frequently quite slow and, if efficiency is your primary metric, should be avoided as much as possible. Things like Serverless go the opposite way and route both your incoming request and your backend database requests through a complex set of hops.

harryvederci

Thanks for your response!

I guess I should do some benchmarks comparing different technologies.

> Things like Serverless go the opposite way and route both your incoming request and your backend database requests through a complex set of hops.

I didn't know about that, thanks. If you know some good resources on the topic, feel free to put them in a reply to this message!

tonyarkles

https://www.johndcook.com/blog/2011/01/12/how-long-computer-... is a decent place to start for thinking about how different timings work for things. It's a bit on the stale side, some things have gotten much faster (e.g. disk "seeks" are dramatically different with NVMe), but a lot of it has stayed similar, and some will never change (packet timing to Europe has a speed-of-light limit for now)

aidenn0

Time a python program that imports a few things and then immediately exits. It's significantly more CPU time than you might think. If you use a language with fast startup times, preforking CGI servers can be quite fast.

Zababa

Lots of opinions but few facts in the comments. I'd love to see an experiment with people using this stack versus their preferred web stack. Is this really slower to develop with? By how much? Is this really insecure? Is this really simpler, faster?

exdsq

I’d wager a good portion of my salary that a skilled BCHS developer is slower than a skilled Django/RoR developer to build a usual web app (with auth, payment gateways, admin panels, etc). Not to say BCHS doesn’t look like a laugh to use.

Zababa

I would too, but I'd like some hard data on this. For example, how much slower? 2 times? 10 times? 100 times? Is this an initial cost or a cost paid on all features? Is maintenance easier? Harder?

da39a3ee

I’d like to love man pages but

- I feel that they are Linux-only. On my macOS system I can’t rely on man x being the man page for the right version of x. I know that in principle there are environment variables that make sure I’m getting the GNU coreutils version or the Homebrew version rather than the system BSD version, but it’s too many moving parts. Furthermore, even if I get it right, I can’t expect people I’m working with or mentoring to get it right, hence I can’t recommend man to them for documentation. God knows about man pages on Windows.

- I feel that a small amount of plain-text documentation should be stored in the executable, not separately. Isn’t it a holdover from the vastly more constrained computing environments of the 70s and 80s that we keep man pages separate from the executable? It’s just asking to get out of sync / incorrectly paired up.

RTFM_PLEASE

OpenBSD pages (https://man.openbsd.org/) absolutely rock and other, proper BSDs do quite well.

Also, man pages are for more than just system utilities (man(1)). Which binary should hold pledge(2) (https://man.openbsd.org/pledge), exactly?

Your man pages should be updated when the associated tool is updated.

You are describing a MacOS issue, with its terrible package management, and frustrating toolchains.

da39a3ee

You seem to be missing my point which is that, as a maintainer of a command line tool, I need to and want to cater to users of all OSs. And in fact, I will allocate my efforts more towards popular OSs. I genuinely am sure that your BSDs are a nice environment, but surely you understand how fringe they are? The majority of my users are MacOS+Windows, with substantial Linux also.

In fact MacOS has an excellent package manager -- it's called homebrew. I don't really want to argue about it but you're the one who made an unjustified assertion about an OS which I bet you don't use. People like you insist that it's bad but no-one who uses it knows why. I maintained my own Linux laptop for 10 years, and for the last 10 years I've used homebrew on a Mac. It has literally never given me any problems! I've never even searched the issues on Github for a problem as far as I can remember.

Honestly I think that the thought processes of most Linux/Unix enthusiasts like you who criticize homebrew are

1. We hate MacOS because childish anti-capitalist ideologies

2. Therefore we will not admit that a nice command-line development environment can be created on MacOS

3. Therefore homebrew is bad


Shared404

> - I feel that they are linux only.

They're actually better on FreeBSD/OpenBSD in my experience. As stupid as it makes me sound, I often struggle to parse Linux man pages, but I've had no trouble with the BSDs' pages on a variety of topics.

> - I feel that a small amount of plain text documentation should be stored in the executable,

Isn't this how --help usually works? I would also rather have more documentation embedded, at least for some executables.

the_only_law

FreeBSD has pretty good documentation.

Comparatively, I’ve found NetBSD documentation to be lacking, although NetBSD seems to take the cake on code quality and legacy architectures (an area I find myself delving into right now).

On the wider discussion of docs, I’ve found Linux kernel documentation to be a pain in the ass, and sometimes even worse than Windows kernel documentation (which I won’t even bother to get into).

josephcsible

> On my MacOS system I can’t rely on man x being the man page for the right version of x.

But isn't that an issue with macOS, not an issue with man pages?

da39a3ee

Yes, it is. But MacOS (and Windows) are popular OSs for laptop users. I'm the maintainer of a command line tool that is reasonably popular and I believe that the majority of my users are MacOS. So the question is quite concrete for me -- should I provide documentation in the form of a man page? I do not currently, for the reasons I gave above (although I made a mistake in saying Linux when I meant Linux and *BSD for which I deserve what I get!) But I'd appreciate it being pointed out if my thinking is wrong here.

enriquto

> should I provide documentation in the form of a man page?

Yes, and this may be a much smaller effort than you suspect. Just by writing the output of --help in a certain format, you can use the "help2man" tool to generate a beautiful man page automatically. Note that your users do not need to have help2man installed; you run it yourself as part of your build process, to create the man page from the latest source code.

It is very likely that if your tool already has a --help option, you don't really need to do anything to have a manpage. Just call help2man from your makefile.
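
A hypothetical make rule for this (the rule and file names are made up; it requires help2man on the build machine only):

```make
# Regenerate the man page from the tool's own --help output at build
# time, so the two can never drift apart.
prog.1: prog
	help2man --no-info --output=$@ ./$<
```

Shipping the generated prog.1 in your release tarball means end users need neither help2man nor make.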

tiffanyh

s/C/Nim/

Why don’t more folks use Nim for web development? It seems like the perfect blend of performance, ergonomics and productivity.

LeFantome

How is NIM doing? When I first read your comment I thought you were talking about Zig ( since that is the language that seems to pop up a lot these days ). It took me a second to catch myself. It feels like I have not heard about NIM in ages.

I am sure the D and V guys are asking themselves the same question.

mrweasel

Last I looked at and played around with Nim, what I felt was missing was a good way of doing templating. Beyond that, I honestly enjoyed working in Nim more than both Go and Rust, which were the other two languages I attempted to learn last year.

Zababa

Because there's already Java/C#/Go/Rust/C++ in that space.

slt2021

Nim/Zig/Crystal - these three languages look the same to me for some reason