Favor real dependencies for unit testing

January 12, 2022


Amen to the idea.

- Prefer real objects, or fakes, over mocks. It will usually make your tests more robust.

- Use mocks when you must: to avoid networking, or other flaky things such as storage.

- Use mocks for “output-only objects”, for example listeners, or when verifying the output of some logging. (But prefer a good fake.)

- Use mocks when you “need to get shit done”, it’s the easiest way to add tests in an area that has almost none, and the code is not designed to be easily testable. But remember this is tech debt, and try to migrate towards real objects over time.

That’s the short advice I’ve given many times, so I might as well comment with it here.
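A minimal Python sketch of the contrast (all names here are made up for illustration): the fake-based test pins down behavior, while the mock-based test also pins down the exact calls, which is what tends to make it brittle.

```python
from unittest import mock

# Hypothetical production code under test.
class UserService:
    def __init__(self, store):
        self.store = store

    def register(self, name):
        self.store.save(name)
        return self.store.count()

# A fake: a real, if simplified, implementation of the store contract.
class InMemoryStore:
    def __init__(self):
        self.items = []

    def save(self, item):
        self.items.append(item)

    def count(self):
        return len(self.items)

# Fake-based test: survives refactors that keep behavior the same.
def test_register_with_fake():
    service = UserService(InMemoryStore())
    assert service.register("alice") == 1

# Mock-based test: pinned to the exact calls made, so it breaks if the
# implementation is restructured even when behavior is unchanged.
def test_register_with_mock():
    store = mock.Mock()
    store.count.return_value = 1
    service = UserService(store)
    assert service.register("alice") == 1
    store.save.assert_called_once_with("alice")
```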


I think the original ideas of mocks (if you go and read Growing Object-Oriented Software, Guided by Tests) had some merit: In that style of TDD, mocks are used to discover (hopefully somewhat stable) interfaces between components, and in theory, it fits with the idea that OOP is about "objects sending messages to each other". I can believe that it's possible to write good systems with this kind of approach.

Unfortunately, in practice, mocks are rarely used like that and most "OOP" designs have horrible boundaries and are really not much about message passing anymore. That leads to brittle mocks where you constantly have to change tests when you change implementation details.

I have also gravitated away from classic OOP and much more towards the "functional core, imperative shell" concept as outlined in the article (although it's difficult to keep this pattern throughout a codebase, especially if you have team members). In such a system you really rarely need mocks.

Agreed that fakes, when you have them, are nicer than mocks, especially when the system to be faked has a large API (i.e. use a redis fake, instead of checking the exact commands you send to redis).

However, for some outside systems, writing a fake can be a lot of effort. In such a case, I think it's totally valid to write a "gateway class" that isn't unit tested (you can cover it with integration tests instead) which exposes a nice API (e.g. "storeFile(...)") and then to use mocks of that class in other tests.
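A sketch of that gateway idea (the class, the SDK call, and the bucket name are all invented): the gateway hides the messy client behind one small method, and other tests mock only that narrow surface.

```python
from unittest import mock

# Hypothetical "gateway class" wrapping an external blob store.
# The gateway itself is covered by integration tests, not unit tests.
class FileGateway:
    def __init__(self, client):
        self.client = client  # e.g. some cloud-storage SDK client

    def store_file(self, name, data):
        # All the messy SDK details live behind this one simple call.
        self.client.put_object(bucket="uploads", key=name, body=data)

# Code under test depends only on the gateway's narrow API.
def export_report(rows, gateway):
    body = "\n".join(",".join(r) for r in rows).encode()
    gateway.store_file("report.csv", body)

def test_export_report():
    gateway = mock.Mock(spec=FileGateway)
    export_report([("a", "b"), ("1", "2")], gateway)
    gateway.store_file.assert_called_once_with("report.csv", b"a,b\n1,2")
```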


In test cases where you extensively involve mocks, more often than not in my experience, you end up testing that your mocks do the thing you told them to.
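A tiny illustration of that trap (hypothetical code): the stub's value flows straight into the assertion, so the test mostly re-states the stub rather than exercising any interesting logic.

```python
from unittest import mock

def total_price(pricer, items):
    return sum(pricer.price_of(i) for i in items)

def test_total_price_tautology():
    pricer = mock.Mock()
    pricer.price_of.return_value = 10
    # We stubbed price_of to return 10, then "verify" that two items
    # cost 20 -- the assertion is largely an echo of the stub itself.
    assert total_price(pricer, ["a", "b"]) == 20
```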


Yeah, particularly the case with a lot of "glue" type code that's really just passing stuff back and forth and not really making any decisions with it. I've always struggled to feel that mock-based testing in this scenario was anything other than busy-work.


Yes, but the coverage numbers look great!


most of the bugs are in the joints of the system, not in the components

it's much easier to write modules that are internally consistent, much harder to be globally consistent across modules. mocks ensure you only test for internal consistency


Wouldn't the integration (or global consistency as you call it) of objects be exactly what was worth testing in that's the hard part?

Generally objects are simple enough that I can reason about them in my head. That's the whole point of encapsulating the state after all. That means tests are really less critical, since a thorough inspection should do. Between components it's MUCH harder to get any sort of coverage in your internal model, so tests that can be repeated become more useful.

I want the tests to fail if the software doesn't work. Not if some object doesn't do what it says it will in a way that doesn't matter to the system.

The hard part is exactly what I want tests to cover.


Mocks don't test the joints, which you have just suggested are where the bugs are.

Fakes get you a lot further without needing more comprehensive integration testing.

Fakes simulate dependencies - so you can test the joints - but can be tested themselves, and most importantly can have conformance tests that validate that the fake acts like what it is faking.
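One common shape for such conformance tests is a single contract-checking function run against every implementation. A hypothetical sketch:

```python
# A fake key-value store with a tiny, real implementation.
class InMemoryKeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

# The conformance ("contract") test: one function, exercised against
# both the fake and the real store, so the fake can't silently drift.
def check_kv_contract(store):
    store.put("k", "v")
    assert store.get("k") == "v"
    assert store.get("missing", "fallback") == "fallback"

# Fast path: run against the fake on every commit. A slower integration
# job would run the same function against the real dependency.
check_kv_contract(InMemoryKeyValueStore())
```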


I have not found storage to be flaky, so I don't mock it. Tmpfile always gives me a unique file, and that's all the fake I need. I don't even look up the various forms of temp file to see which ones don't have a race condition, as in practice they never do (if I was writing encryption or other such production code I would, but for a unit test the odds of a race causing a failure are low enough to ignore)
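For illustration, a test in that spirit using Python's tempfile module: real files on a real filesystem, no mock, and the temporary directory cleans itself up.

```python
import os
import tempfile

# Hypothetical code under test: trivial settings persistence.
def save_settings(path, text):
    with open(path, "w") as f:
        f.write(text)

def load_settings(path):
    with open(path) as f:
        return f.read()

def test_settings_roundtrip():
    # A throwaway directory per test run; removed when the block exits.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "settings.conf")
        save_settings(path, "debug=true")
        assert load_settings(path) == "debug=true"
```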


At minimum, you need to be able to choose where the files are stored in order to use temp files.

I find there is still the problem of complicated setup. Also, if you are going for particular semantics (e.g. cross-process interactions), isolating that for more targeted testing can be helpful.

For me, I look for higher-level abstractions and mock when possible, and don't sweat testing against temp files otherwise. I've had several TDD people jump to wanting to mock each filesystem call. One was for a cross-process storage API. I was trying to get them to just have a Backend interface for reading and writing instead, with tests using an InMemory implementation as a test double.
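A sketch of that Backend idea (names invented): one small interface, a real filesystem implementation in production, and an in-memory double in tests, instead of mocking individual filesystem calls.

```python
from abc import ABC, abstractmethod

# The higher-level abstraction: a small Backend interface for
# reading and writing, rather than mocks of each filesystem call.
class Backend(ABC):
    @abstractmethod
    def read(self, key):
        ...

    @abstractmethod
    def write(self, key, data):
        ...

# The in-memory test double: a real implementation of the interface.
class InMemoryBackend(Backend):
    def __init__(self):
        self._blobs = {}

    def read(self, key):
        return self._blobs[key]

    def write(self, key, data):
        self._blobs[key] = data

# Production would add e.g. a FilesystemBackend with the same
# interface; tests use the in-memory double and never touch the disk.
def test_roundtrip():
    b = InMemoryBackend()
    b.write("doc", b"hello")
    assert b.read("doc") == b"hello"
```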


- Prefer simulators over mocks

We've been running a business-critical system for several years now; creating simulators has been one of our "secrets" that contributed to the success.

Not only do our tests run in an environment as close to production as possible, we also use the simulators for local development, where developers can spin up a functioning system on their own dev machines.


What do you mean by simulators? An anonymized copy of a database or something similar?


It means external services your project depends on in production should:

1. be used directly if possible/practical, if not...

2. ...then it's likely better to write simulators for them as opposed to mocking individual methods on the client in individual tests

There are several factors to consider. External services need to be able to run in isolated, temporary environments with short bootstrap time – if that's not possible, it's better to write a simulator which provides this functionality. Services must provide determinism; if they don't, the simulator should probably be written with control APIs to provide it, etc.

In general the idea can be summarized as opting for the highest level of functionality available, so tests capture as wide a code surface (used in production) as possible.

It complements, not replaces, lower-level methods – i.e. it still makes perfect sense to structure code as a composition of pure functions with unit tests, and to rely on static analysis rather than testing in unit tests what is guaranteed by the static type system, etc.

A side effect of simulators is that you can bootstrap your project locally in a lightweight fashion for development – with all simulated functionality available.

For example, if you're working on a trading application, instead of mocking low-level prices, order-creation calls, etc., it's better to write an exchange simulator and use it – where the full lifecycle of an order will work as expected – in tests and locally, when developing the application.
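To make the idea concrete, here is a toy exchange simulator (entirely hypothetical): orders go through a real open-to-filled lifecycle, and a control API lets tests decide deterministically when fills happen.

```python
# A toy exchange simulator: full order lifecycle, deterministic control.
class ExchangeSimulator:
    def __init__(self):
        self.orders = {}
        self._next_id = 1

    def submit(self, side, qty, price):
        oid = self._next_id
        self._next_id += 1
        self.orders[oid] = {"side": side, "qty": qty,
                            "price": price, "status": "open"}
        return oid

    # Control API: the test, not the market, decides when fills happen,
    # which is what makes the simulator deterministic.
    def fill(self, oid):
        self.orders[oid]["status"] = "filled"

    def status(self, oid):
        return self.orders[oid]["status"]

# An order lives through its lifecycle as it would in production.
sim = ExchangeSimulator()
oid = sim.submit("buy", 100, 9.5)
assert sim.status(oid) == "open"
sim.fill(oid)
assert sim.status(oid) == "filled"
```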


I agree - but it can be hard to get good ones, e.g. if you need to simulate sql query execution.


We spin up a temporary and local MySQL instance per test run with helpers to generate necessary data on-demand. All our tests therefore use real queries on a real db. Due to speed constraints, this db is shared between all the tests you run in that session, so it’s possible to influence tests after yours runs in CI. The reality is that that is pretty easy to detect and it’s caused us to write less brittle code.


SQLite is a great simulation for SQL. You need to limit your SQL to the subset that is supported by both it and your target database, though, which might be a problem.
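For illustration, Python's standard library gives you an in-memory SQLite database, so real queries execute with no server to manage (as long as you stay within the shared SQL subset):

```python
import sqlite3

# An in-memory database: real SQL execution, zero setup, gone when
# the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# The query under test runs for real, instead of being mocked out.
row = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()
assert row == ("alice",)
```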


For simulating SQL query execution, the best simulator is the SQL server itself.


It all sounds great. I agree totally in principle! I am finding that testing my fairly small Go project (static site generator, because the world definitely needs another one) chews up massive amounts of time. So I tend to avoid the testing pass for longer than I should. Any thoughts on that issue?


It isn't immediately obvious to me why a small Go static site generator would require "massive amounts of time" to run its tests, so it's hard to answer what you're doing wrong.

Are your tests perhaps just too darned big? You don't, in general, need to render 5000 pages of something to test your template doesn't crash or something.

It could also just be disk access. Consider trying an in-memory file system, or if you're on linux, look at using /dev/shm which is a RAM disk.

It is also possible you've snuck in a quadratic or worse time algorithm. There's nothing fundamental to the problem of a static site generator that would require such algorithms, but speaking from experience it is an environment where it's easy to loop over the return value of one function, which itself loops over the return value of some other function (quite likely over the same data), which itself loops over the same structure, and it's easy to end up with O(n^3) or O(n^4) without realizing it. It's especially easy to end up with that being a "for each page" type of loop. Static site generation should be O(n), give or take small factors (maybe O(n log n) for some things technically, but at a scale where O(n log n) is practically O(n) anyhow).


Make it CI’s problem. You should only really be running tests regularly for the component you’re currently working on with CI making sure you don’t have accidental regressions.


If you don't mind spending the effort, do a quick profile to see where the tests are taking a long time – is it initializing the real components that could be mocked?

Or are the tests themselves taking a long time because of other factors, such as IO etc?

For IO, maybe there should be an abstraction over the IO API, and you can use an in-memory option instead for testing.


If IO is the problem, I generally solve it by testing smaller data sets. I have not found IO to the local disk to be slow. Of course network IO is bad, but local disks are fine for the size of data in my tests.


I try to write the test first, or at least stub in the test how I think it should work – or at least enough notes to pick it up "tomorrow". Then I have a reference for usage I can build with that in mind.


Thanks all for super helpful feedback. This was a generous gift of your time and brainpower.


Generally the more difficult a test is to write and run, the more useful it is.


As a potential counter-argument, the use of mocks can enable testing of functionality that the current concrete implementation doesn't exercise. It's easier than one would think to accidentally rely on implementation details rather than coding just to the interface (and optionally any documented restrictions to that interface).

They explicitly call out clocks as a source of non-determinism that probably should be mocked, but I'll re-use them as an example anyway because everyone is familiar with them: it's extraordinarily useful for the tests to execute nearly immediately rather than actually waiting on a clock, and rare behavior like a clock running backward, two consecutive timings being identical within the clock's resolution, or whatever other weird artifacts your code should handle are definitely better tested explicitly, which you cannot do without mocking the clock. Other domain-specific interfaces are often similarly able to exhibit a weird edge case that ought to be explicitly tested (rather than accidentally relying on a "nice" implementation) if you really want to unit test the callers and not integration test the coupled system.
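A sketch of the clock example (names are invented): a hand-rolled fake clock makes those rare artifacts trivial to provoke, including a clock that runs backward.

```python
# A fake clock that replays whatever readings the test scripts.
class FakeClock:
    def __init__(self, times):
        self._times = iter(times)

    def now(self):
        return next(self._times)

# Hypothetical code under test: defensive against weird clocks,
# clamping a backward-running clock to zero elapsed time.
def elapsed(clock):
    start = clock.now()
    end = clock.now()
    return max(0.0, end - start)

# Normal case, identical consecutive readings, and a backward clock --
# all tested instantly, with no real waiting.
assert elapsed(FakeClock([1.0, 2.5])) == 1.5
assert elapsed(FakeClock([3.0, 3.0])) == 0.0
assert elapsed(FakeClock([5.0, 4.0])) == 0.0
```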


Also error conditions - telling a fake to give you a specific error before you run the test is no different from telling a mock to do it.


It is different. You tell the mock to give you a specific error with specific information, while you tell the fake to give you the error for a specific condition.

It is subtle and may not be worth the time, but again, it may be.
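A small sketch of that distinction (invented names): with a mock you script the error itself; with a fake you set up the condition, and the fake derives the error from its own (simplified) rules, which is closer to how production actually fails.

```python
from unittest import mock

# Mock: the error is scripted directly.
client = mock.Mock()
client.fetch.side_effect = TimeoutError("scripted")

# Fake: the test sets up a condition, and the error follows from the
# fake's own rules (here, a made-up latency budget).
class FakeServer:
    def __init__(self, latency_ms):
        self.latency_ms = latency_ms

    def fetch(self, key):
        if self.latency_ms > 1000:  # the fake's own timeout rule
            raise TimeoutError(f"{key}: exceeded 1000ms budget")
        return f"value-of-{key}"

# Slow condition -> the fake produces the timeout itself.
timed_out = False
try:
    FakeServer(latency_ms=5000).fetch("orders")
except TimeoutError:
    timed_out = True
assert timed_out

# Fast condition -> normal behavior from the same fake.
assert FakeServer(latency_ms=10).fetch("orders") == "value-of-orders"
```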


The heart of the argument presented is that using mocks in unit tests is problematic because if an interface is changed, that will possibly break every test that involves mocking that interface and that's friction to making changes. This is silly. If you use real implementations and change the interface, you have the exact same problem except that your configuration/setup is also likely to have to change in subtle ways.

Let's just assume the absurd notion that a choice of creating tech debt or changing a common interface is a real choice. If there was a serious change that could not be accounted for with easy test changes, I'm not the only one to see the tests commented out with a "TODO: fix these". Developers tend to be pragmatic.

If you want to change code you will always measure the effect it will have on the code and tests are incidental, not the primary concern. Make it work. Make it right. Make it fast. I want to be able to trigger all code paths and exceptions (make it good). Using a real dependency, I would be left in the unfortunate situation of depending on knowledge of the internals of that dependency. It may not allow me to execute specific paths via pure configuration at all.

I don't think using real dependencies is a good idea at all for unit tests. Integration tests are a different matter, and I do fear that the two are being confused.


Dependencies are why we have functional tests.

Write as much code as you can that has no dependencies. Unit test that code exhaustively. Fake all inputs that don't contain behavior. Mock all interactions that do. Then write functional tests that check that the glue and state management actually work with the real things.

The plumber is still going to run water and check for leaks before they leave, no matter how many certifications the copper piping came with. But that's only at the end of a long process of work and inspections.


Nothing pisses me off like finding a suite of tests that has fakes with logic in them. By the time I find them, the fakes are longer than the tests. Often the commit history shows that this accumulated by accretion, and nobody ever pulled the emergency stop lever. Other times it's people who are wrong-headed about what problems tests are trying to solve (coverage chasers are but one category).


> The heart of the argument presented is that using mocks in unit tests is problematic (...) If you use real implementations

No. As you say, the moment you have a real out-of-process dependency, you no longer have a unit test, but an "integration-with-external world" test.

The heart of the argument is to not mock out internal (in-process) business logic. Instead:

a) do not use mocks (e.g. leveraging Moq); use fakes (aka "simulators"), i.e. proper classes having in-memory implementations of the out-of-process dependencies and crucially

b) replace only out-of-process dependencies, which usually are flaky/nondeterministic. Never replace internal business logic.

If you feel you need to replace internal business logic, you are not following the functional core architectural pattern. After you refactor to functional core you will no longer need to replace any dependencies in your unit test, as there won't be any - you will just call your tested pure method and make assertions on the returned value.
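A miniature example of what that looks like (the business rule is made up): once the logic is a pure function, the unit test needs no doubles of any kind — just arguments in, assertions on the return value.

```python
# Functional core in miniature: the result depends only on the
# arguments, so the test replaces nothing.
def apply_discount(subtotal_cents, loyalty_years):
    rate = min(0.05 * loyalty_years, 0.25)  # 5% per year, capped at 25%
    return round(subtotal_cents * (1 - rate))

assert apply_discount(10000, 0) == 10000
assert apply_discount(10000, 2) == 9000
assert apply_discount(10000, 10) == 7500  # capped at 25%
```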

A special case here is testing "top-level/controller" logic. Here you need to use the in-memory fakes, but there will be only a few, reused across all tests, and such a test will be an "end-to-end internal-business-logic integration test" – but it will still have all the properties of a unit test: it will be fast and deterministic, and you will be able to run it as part of a unit test suite out of the box, with no environment setup necessary.

Further reading: https://enterprisecraftsmanship.com/posts/when-to-mock/

> If you want to change code you will always measure the effect it will have on the code and tests are incidental, not the primary concern

Exactly. Mocks are very brittle and break all the time, causing unnecessary rework. If you instead rely on a small set of fakes of out-of-process dependencies, you will drastically reduce test suite rework and improve signal-to-noise ratio.

> Using a real dependency, I would be left in the unfortunate situation of depending on knowledge of the internals of that dependency.

And what happens if the real dependency changes behavior, but you forget to update your mock? You will have a test running against a mock that simulates obsolete behavior, no longer present in production. In the worst-case scenario the test will be green/passing while there is a bug in production. Will you remember to always comb over your entire test suite and review all mocks to faithfully simulate actual production behavior?

> It may not allow me to execute specific paths via pure configuration at all.

If you write fakes, you will have full control over how to configure them.


> Will you remember to always comb over your entire test suite and review all mocks to faithfully simulate actual production behavior?

This doesn’t make sense. If I write unit tests for my ‘FunctionExecutor’, and it has a dependency with a function ‘shouldExecute’, I don’t care how that function is implemented, only that it returns a boolean. If the implementation of my ‘shouldExecute’ function changes, there is no need to update any other test as long as it still returns a boolean.

I do agree that what you call fakes are better, because you get the extra guarantee of them implementing the same interface (e.g. your compiler will scream bloody murder if you don’t update both).


> I don’t care how that function is implemented, only that it returns a boolean

Let's take your example of "shouldExecute". I assume your unit test operates on some inputs (with the values provided inside the unit test itself, naturally), and "shouldExecute" has potentially some nontrivial logic in it. Say, it reads the value of some environment variable and, if it is right, returns "true".

Now there are two possibilities:

a) your test inputs are made up. For example, you never set the environment variable, and coerce "shouldExecute" to always return "true" anyway. The problem with such a test is that it is fiction, not a test of actual production behavior. Sure, you will test what would happen if the logic determined it should execute with no environment variable set, but this will never happen in production. In production, the lack of the environment variable would result in "shouldExecute" returning "false", and you should test for *that*. So you do care about the details of "shouldExecute", because you need to be aware that it returns "true" only if the appropriate environment variable is set. And if you don't care whether "shouldExecute" returns true or false, then why do you call it in the first place? What are you even testing? I hope your "shouldExecute" doesn't have any side effects you depend on, and I do hope you do not use code coverage as a goal unto itself.

Plus, such a test cannot be used as "executable documentation", because the inputs are simplified to the point of irrelevance, and cannot help in understanding actual program behavior.

b) your test inputs are realistic, and reflect actual production behavior. This means you will have properly set up the environment variable value so that "shouldExecute" returns true. With that, the test is similar to actual production behavior, has good bug-catching ability, and can serve as executable specification. But here again you will have to worry about "shouldExecute" implementation.
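In Python terms, option (b) might look like this (the names mirror the commenter's hypothetical "shouldExecute"): the test sets up the real condition rather than coercing the return value, and also covers the production-relevant negative case.

```python
import os
from unittest import mock

# The hypothetical "shouldExecute": nontrivial logic reading an
# environment variable.
def should_execute():
    return os.environ.get("EXECUTOR_ENABLED") == "1"

def run_if_enabled(task):
    return task() if should_execute() else None

# Realistic inputs: set up the actual condition instead of stubbing
# should_execute to return True.
with mock.patch.dict(os.environ, {"EXECUTOR_ENABLED": "1"}):
    assert run_if_enabled(lambda: "ran") == "ran"

# And the case that actually happens in production when the variable
# is absent: nothing runs.
with mock.patch.dict(os.environ, {}, clear=True):
    assert run_if_enabled(lambda: "ran") is None
```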


Let me offer you another example. Imagine we want to test:

Compile(SourceCode sourceCode) { ValidateSyntax(sourceCode); /* complex logic post-processing the sourceCode here */ }

I could say here "I am testing Compile, and I do not care if ValidateSyntax throws an exception; I will just coerce it to return without throwing an exception".

And then I can write a test that takes as input some simple sourceCode like "blablabla" and claim I have somehow tested the "/* complex logic post-processing the sourceCode here */". But this is silly; in reality such fake input would never survive the validation, and thus testing what happens after is just a waste of time. Hence I need to ensure sourceCode passes the *actual* validation, and hence I need to understand the logic inside the "ValidateSyntax" method.

It gets worse. Now imagine we have thoroughly tested the complex logic while using a mocked-out ValidateSyntax, but then the syntax changes and thus the validate method behaves differently. If I have a mock of the "ValidateSyntax" method I might still feel good – the test is green, the coverage is still high. Except it is all a lie. I run "Compile" on some production data and it blows up. Why? Because the test was working against a "ValidateSyntax" mock that was mocking the result of validating the obsolete syntax, which, now, with the changed behavior, would actually not pass validation and would blow up. Basically my test was telling me my "Compile" method works IF I assume the syntax is still the obsolete one, but it doesn't tell me how my code behaves with the current syntax.

So every time program behavior changes, I need to go through all my mocks that were duplicating that behavior, and see if they are still faithfully reflecting it. Otherwise I risk ending up with a green test suite that tests nonexistent, impossible program executions.


In other words, shouldn't we prefer integration testing, or perhaps partial integration testing. This requires an initial effort of setting up your test framework/environment, but in my experience integration tests provide good value for the time you put into testing.

Rather than a granular test on a single class, test the orchestration of many classes. You end up hitting a big % of the code. As always, it depends on the project. If we're building a rocket ship, you need both granular testing and coarse testing.


This is exactly how I feel as well! Units in isolation certainly deserve testing, but the actual long-lived value comes from testing the public interface of the module/program/etc. In one compiler project I maintain, I stopped writing unit tests years ago in favor of just feeding it input like users would and asserting on the outputs. It gives me both great freedom to refactor quickly, and great confidence that I'm not regressing.

As you say, depends on the project. What I described above is entirely free of side effects - I wouldn't dream of testing a web service this way.

Of course integration is never in conflict with unit testing - they're different and can happily coexist.


People keep saying this all the time, but apart from the fact that nobody can agree on what an "integration test" is (because there's almost always some part of the application flow that you're stubbing out), it just becomes immediately apparent in a code base of sufficient size that "just use integration tests for everything" is only possible if you severely under-test (which usually includes things like not properly testing for error conditions etc.).


What? Nobody is advocating for integration tests to the exclusion of unit tests.

> but apart from the fact that nobody can agree on what an "integration test" is

Not being precise doesn't invalidate a guideline. The ideas that "stubbing less is better" and "testing functionality end to end is good bang-for-buck" aren't crappy because people don't agree on the details.

> it just becomes immediately apparent in a code base of sufficient size that "just use integration tests for everything" is only possible if you severely under-test

Your parent comment said "If we're building a rocket ship, you need both granular testing and coarse testing." Nobody is advocating for integration tests to the exclusion of unit tests. If you write a hash table or a CSV parser, yes you should unit test it.

But for most application-ish functionality you should reach for integration tests first. For example, testing direct message functionality in an app, checking that after a send there's a notification email queued and the recipient inbox endpoint says there's 1 unread will get you really far in 20 lines of code. Is it exhaustive? Of course not. But the simplicity is a huge virtue.

> if you severely under-test (which usually includes things like not properly testing for error conditions etc.)

I advocate "default to integration testing for application functionality". Those focused on unit tests often mock exactly the things most likely to break: integration points between systems. "Unit" tests of systems are often really verbose, prescriptive about internal state, and worst of all don't catch the bits that actually break.


> Nobody is advocating for integration tests to the exclusion of unit tests.

Oh, I got here just now, but well, let me advocate it. (Well, not all unit tests, but most of them.)

The ideal layer to test is the one that gives you visible behavior. You should test there, and compare the behavior with the specification.

Invisible behavior is almost never well defined, and as a consequence any test there has a very high maintenance cost and low-confidence results. Besides, the large freedom of choice there gives it a huge test area. It is a bad thing to test, in general.

Now, of course there are exceptions where the invisible behavior is well defined, or where it has a smaller test area than the visible one. In that case it's well worth testing there. But those are the exceptions.


> I advocate "default to integration testing for application functionality". Those focused on unit tests often mock exactly the things most likely to break: integration points between systems. "Unit" tests of systems are often really verbose, prescriptive about internal state, and worst of all don't catch the bits that actually break.

You're writing this as a comment to an article that explains exactly how to write unit tests that aren't brittle and avoid mocking.

"Only integration tests" vs. "brittle unit tests" is a false dichotomy.


> advocate "default to integration testing for application functionality".

I agree. I've also found that starting at the top of the test pyramid keeps the focus on the goal.

I have sometimes found that starting at too low a level brings a danger of losing sight of the bigger picture.


Nobody can agree on what a unit test is either. I've seen people say that:

* If it uses an xUnit framework it's a unit test.

* That since the whole application constitutes a unit then a test for the whole application is a unit test.

* That anything that doesn't use the UI is a unit test.

Both names should be trashed, IMO. They both lack clear boundaries.


I wholeheartedly agree that people can't agree on what "unit test" means precisely, either (even though I think your specific examples are a bit disingenuous). In particular, classical and London-school / mockist TDD have rather different definitions of it.

That's why it's important to have a well-rounded test strategy with different types of tests that have different purposes, instead of using some blanket approaches.


Great article – it's not the first time I've read this argument against the usage of mocks. I never understood, though, how to solve the problem of the explosion of paths that must be tested (usually exponential).

As an example, consider a single API that uses ~3 services, and these 3 services each have from 1 to 5 further internal or external dependencies underneath (such as time, an external API service, a DB repository, and so on). How can I test this API, the 3 services, and their underlying dependencies without an exponential number of paths to test? I want to be able to cover all the paths of my code, and ensure that each test only tests a single thing (either the API, the service, or the dependency interface); otherwise, it is not a unit test.

I always felt that these kinds of mockless tests work super nicely in situations without any external, or even complex internal, dependencies; otherwise, it becomes very, very hard to test ONLY what I want, and not all the dependencies underneath.

Mocks allow me to stub the behaviour of a service/dependency that I can test in a separate fashion, covering all the paths, and ensuring that each unit test covers a single unit of my code, and not the integration of all my components.


Your dependencies do need to work reliably if you're not going to use mocks. But if they don't, then mocks might be necessary to ensure test reliability. That said if the interaction between your system and its deps is sufficiently unreliable that you need mocks for test reliability, how are you going to have any confidence in the system's production behavior?

Like, if I test one of the workflows of my service, it might involve executing queries and transactions on an underlying database. But these operations have essentially 100% reliability, so the fact that these operations are being "tested" at the same time as my service do not impact test reliability. I gain nothing by mocking them out.


Well, I strongly believe in testing all the queries/transactions (for example, the repository pattern is super helpful to separate concerns and let you test only what concerns your queries and the DB); but those are tested separately from the workflow of your service, if we are talking about unit tests. Why? Because otherwise either you write a gazillion tests, or they are unreliable. As an example, if the service you want to test has various code paths, and your queries have paths of their own (what happens if you don't retrieve any object? if some property is null? and so on), I just think it gets messy quickly.

My solution is to test things separately, at least in unit tests. Obviously, this has pitfalls (it's easy and fun to write green tests, so tests don't always respect the interface of the components, and they don't go red even if something is wrong); but that's where integration testing comes into play.

Have a few code paths where you touch multiple external services, and want to test that everything actually works? Create integration tests that use either a fake or the actual service in a `staging` environment; they take 10x the time to run, but they test the path that is most important for your logic. Obviously, if you have many unit tests, you will have far fewer integration tests, but they serve different scopes, and one cannot substitute for the other!


> I gain nothing by mocking [database queries and transactions] out.

What about the fact that you will need to spin up, migrate and possibly seed a database in your CI pipeline? What about the toll this will take on the execution speed of your test suite? Additionally, consider that you also need to test the behavior of your system when the query fails, and using a mock implementation that always throws an exception is a trivial and reliable way of achieving this.
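A minimal sketch of that last point (the code under test is hypothetical): a mock whose query always throws reaches the failure path trivially and reliably, which is hard to provoke on demand with a real database.

```python
from unittest import mock

# Hypothetical code under test: degrade gracefully when the query fails.
def load_profile(db, user_id):
    try:
        return db.query("SELECT * FROM profiles WHERE id = ?", user_id)
    except ConnectionError:
        return {"id": user_id, "name": "<unavailable>"}

# A mock that always throws: a trivial, reliable way to exercise the
# error-handling branch.
db = mock.Mock()
db.query.side_effect = ConnectionError("db down")
assert load_profile(db, 7) == {"id": 7, "name": "<unavailable>"}
```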

> That said if the interaction between your system and its deps is sufficiently unreliable that you need mocks for test reliability, how are you going to have any confidence in the system's production behavior?

Sometimes your codebase depends on external services which are flaky for reasons beyond your control, just the way it be sometimes. Mocks are useful to ensure the system behaves a certain way when everything goes right as well as when everything goes wrong.

Ultimately the article raises many good points about avoiding mocks if it can be helped, but don't forget a test that only tests the happy path of your system is not very useful. Mock an error in that dependency you expect to always work and understand what would happen, make the necessary provisions.


> What about the fact that you will need to spin up, migrate and possibly seed a database in your CI pipeline?

The database system I use can be configured to start up reasonably quickly, and can be configured to operate on in-memory files to reduce I/O pressure on the CI system. In fact, testing against the full-scale local database is the only supported methodology for this particular RDBMS.
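As a tiny illustration of this style, here is a query tested against a real database engine rather than a mock; `sqlite3` with `:memory:` stands in for whatever RDBMS the project actually uses, and the table and function names are hypothetical.

```python
import sqlite3

def fetch_overdue_ids(conn, cutoff):
    """Return ids of invoices due strictly before `cutoff` (ISO date string)."""
    rows = conn.execute(
        "SELECT id FROM invoices WHERE due_date < ? ORDER BY id", (cutoff,)
    ).fetchall()
    return [r[0] for r in rows]

# An in-memory database: no server to spin up, no I/O pressure on CI.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER, due_date TEXT)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, "2021-01-01"), (2, "2023-01-01")],
)
assert fetch_overdue_ids(conn, "2022-01-01") == [1]
```

The SQL itself is exercised for real, so a typo in the `WHERE` clause fails the test, which a mocked query layer would never catch.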

> Sometimes your codebase depends on external services which are flaky for reasons beyond your control, just the way it be sometimes.

No doubt no doubt. As I mentioned, mocks or fakes might be necessary in a condition like this.

> Ultimately the article raises many good points about avoiding mocks if it can be helped, but don't forget a test that only tests the happy path of your system is not very useful.

My team uses interception and error injection for this case. We still have the real backend, but requests can be forced to fail either before or after executing on the backend.
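One hypothetical way to implement such interception (the class names here are assumptions, not the commenter's actual setup): wrap the real client and force selected calls to fail either before or after they reach the backend.

```python
class FaultInjectingClient:
    """Wraps a real client; injects failures before or after the backend call."""

    def __init__(self, real_client, fail_before=False, fail_after=False):
        self.real = real_client
        self.fail_before = fail_before
        self.fail_after = fail_after

    def request(self, *args, **kwargs):
        if self.fail_before:
            raise ConnectionError("injected failure before backend call")
        result = self.real.request(*args, **kwargs)
        if self.fail_after:
            raise ConnectionError("injected failure after backend call")
        return result

class RealClient:  # stands in for the actual backend client
    def __init__(self):
        self.calls = 0

    def request(self):
        self.calls += 1
        return "ok"

real = RealClient()
flaky = FaultInjectingClient(real, fail_after=True)
try:
    flaky.request()
except ConnectionError:
    pass
# The request still executed on the real backend before the injected failure.
assert real.calls == 1
```

The "fail after" case is the interesting one: it simulates a write that succeeded on the backend but whose response was lost, which pure mocks rarely model.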


I don’t get it. When I unit test, I want to only test what I’m testing. A dependency or the result of a dependency is not what I’m testing. So I mock the result of that dependency to unlink the test from the dependency. If an interface changes I generally want to know anyway, as the test may be different or even obsolete depending on the change.


I hear you, and that's how things work at my present shop.

However, let me give a strong defense of the point.

It turns out that the only thing you can test is a pure function. This is just the nature of a test: we pin down the inputs to a piece of code and we see what outputs it produces. All of the mocking that you are doing is an attempt to turn an impure function into a pure function. Even more extreme setups, where you connect your container to a container running Postgres, are trying to turn that container into a pure function in a different way. You don't need to do any of this if the function is pure in the first place.
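The point in miniature (names are illustrative): the impure version can only be tested by patching the clock, i.e. mocking; once the impure input becomes an argument, the test just passes values.

```python
import datetime

# Impure: testing this requires patching datetime.now (a mock).
def greeting_impure():
    hour = datetime.datetime.now().hour
    return "good morning" if hour < 12 else "good afternoon"

# Pure: the impure input is an argument, so no mock is needed.
def greeting(now):
    return "good morning" if now.hour < 12 else "good afternoon"

assert greeting(datetime.datetime(2022, 1, 12, 9)) == "good morning"
assert greeting(datetime.datetime(2022, 1, 12, 15)) == "good afternoon"
```

The imperative shell then does the one impure thing, `greeting(datetime.datetime.now())`, and needs no unit test of its own.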

Once we've established that "functional core" is a lazier means to the same ends, the question of dependencies still comes up, and the "should I mock dependencies" question becomes "should I promote this internal function call to the main I/O section and pass in its result as an argument?"... And the answer is that doing so changes the language in which this outermost level is written. This outermost shell should probably read like the business logic you are implementing, so you should never promote a call whose result is not part of that business vocabulary.

If that's the logic, then it means that the sort of testing you're doing is extremely particular and fussy, because it suggests that every test should really be constructed at the business level: a story about your product that you want to hold fast even when the internals are changed.

This works really well with domain-driven design, because that says your modules should also be business-level entities, so each module comes with some tests that say "here is how this sort of person interacts with the system." You are always testing the integration of the pure functions, and why would you not? If those pure functions do not integrate together, you want to know about it, and you do not want to find out through a persnickety test that just fixes the inputs and outputs of something with no business relevance. You know what happens in those cases: the developer just rewrites the test to say the opposite of what it used to say, so that it passes now. There is no semantic check on the test output, because there cannot be if it is scoped too small.

I think you can quibble a lot with those details but I think that's the strongest case you can make for it?


I feel like you are still making the case for it. "Should I promote the logic of the dependency to the calling module?" No, I shouldn't. That's why I made it a dependency. It could be a dependency of many modules, or there could be many implementations behind an IoC container or some factory or strategy pattern. Or maybe I do not control the dependency at all, because it is external to our platform. I can understand all too well the engineering bias to do less work. "But now all my tests are broken" may be the correct result.

With that said, without concrete examples to argue about, it's hard to say if we are even disagreeing. The article's examples were wanting, and to what level you break up your code is a hard-fought learning exercise. Some don't care, some say no more than fits on a screen, and at the other end some people use an npm package to find out if a number is even.


When I unit test, there is normally not much going on in the “unit” besides its use of dependencies, so the tests are largely circular (asserting things about mock expectations).

I still have to write them, because Thou Shalt Have Unit Tests. I can’t consolidate the overly-trivial units, because that would not be Architecture Best Practices, and might even be Spaghetti Code.


> besides its use of dependencies

Isn't that sort of the point of the unit test in that case? You test the use of those dependencies:

Write a test where a mocked dependency returns something unexpected and see how your 'unit' responds.

This is why mocks are useful because often you rely on implementation details of your dependencies and only test the happy cases. With a mock you can return whatever edge cases you come up with and ensure your unit handles everything as expected.
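A small sketch of that idea, with hypothetical names (`display_name` and the cache API are assumptions): set the mock's `return_value` to each edge case the real dependency rarely produces and check the unit's response.

```python
from unittest.mock import Mock

def display_name(cache, user_id):
    user = cache.get(user_id)
    if user is None:                              # cache miss
        return "<unknown>"
    return user.get("name") or "<anonymous>"      # missing or empty name

cache = Mock()
for returned, expected in [
    (None, "<unknown>"),
    ({}, "<anonymous>"),
    ({"name": ""}, "<anonymous>"),
    ({"name": "Ada"}, "Ada"),
]:
    cache.get.return_value = returned
    assert display_name(cache, 42) == expected
```

With the real cache you would mostly ever see the last case; the mock lets you pin down the other three on purpose.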


No? The point of a unit test is to tell me something non-obvious about how the function behaves in certain circumstances. When unit testing glue code there is no information in the test result that is not also literally written out in the unit under test (it makes these calls in this order).


The way I think about tests is: what’s the cost of failure here vs the cost of testing?

Sometimes, bugs in this part of the codebase would not be a showstopper, so why test as heavily?

The most important tests by far are smoke-style integration tests for the critical paths through the system. I tend to care much less about other tests in many cases.

Obviously, if I was writing a compiler or database or medical software, that’d be different. But I’m generally writing web applications where if the entire application were to fail for a day, we probably wouldn’t even lose a customer.


This is almost right except for two points:

For fakes, spin up the real thing. If you're not able to model your database transactions deterministically, then your transactions could themselves be flawed, and tests are a great way to catch that.

Deterministic tests are not a goal in and of themselves; controlled non-determinism is valuable. This is popularized in various frameworks under the names of property-based testing and fuzzing, which report the seed that produced a failure, so that while runs don't have the exact same input/output on every invocation, you get better coverage of your test space and can revisit the problematic points at any time. If you're doing numeric simulation, make sure you are using a seedable PRNG, that you log the seed at the start, and that time is an input parameter. Why is this technique valuable? You transitively get increasing code coverage for free through CI and coworkers running the tests, AND you have a sane way to investigate issues.
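A minimal version of this "controlled non-determinism", sketched with Python's stdlib (the `TEST_SEED` variable name and `sort_ints` function are assumptions): pick a fresh seed per run, log it, and allow it to be pinned via the environment to replay a failure.

```python
import os
import random

def sort_ints(xs):
    """Function under test; a stand-in for real logic."""
    return sorted(xs)

# Fresh seed per run unless TEST_SEED is set to reproduce a failure.
seed = int(os.environ.get("TEST_SEED", random.randrange(2**32)))
print(f"TEST_SEED={seed}")          # logged so a failing run can be replayed
rng = random.Random(seed)           # seedable PRNG, independent of global state

for _ in range(100):
    xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
    out = sort_ints(xs)
    # Properties that must hold for any input:
    assert out == sorted(xs)
    assert len(out) == len(xs)
```

Each CI run and each coworker's run explores different inputs, but `TEST_SEED=<logged value>` makes any particular run exactly reproducible.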


"Non-deterministic tests" usually refers to a test whose output depends on the execution environment in a way that cannot be controlled. For example, a multithreaded test with a race condition, or a test that uses the real-time clock.

This is fundamentally different from a test that uses random numbers. Another way to look at it: Deterministic tests can be used in `git blame`. A unit test that uses a specific PRNG algorithm and sets the seed is fine. A test with a race condition is not.

Fuzzing is very useful, but it's not unit testing. If you want to run a fuzzer or some other kind of endless randomized testing in CI, it should be a separate job from the unit tests (IMO).


This article espouses what is known as the "classicist" school of testing and rejects "mockist"/London-style testing. I wholeheartedly agree with it. I have been championing it in my team of 20+ devs for many years now, to great effect. You can read more about it in the excellent book by Vladimir Khorikov from January 2020, titled "Unit Testing Principles, Practices, and Patterns".


If you are testing the externalities, they aren't unit tests, but integration tests.

The better advice is: continue to isolate unit tests away from real dependencies, and ALSO have integration tests that test the way the package connects to dependencies (and the other packages in the software dependent on it).


I would argue that if you need mocks, the code isn’t well suited to unit testing and you’re better off with integration tests.

If you want to unit test it, then refactor in such a way that you no longer need mocks.


I have two services that communicate over the network. Neither does anything useful without data from the other. So I write my tests like this article suggests and inject at the boundaries of the services, the network boundary in this case. I've built tests that now only test my idea of what the services will return. IMO these aren't particularly useful tests: what happens when the remote service starts returning unexpected data or errors due to load problems? I guess my point is that this article gives good advice: prefer testing your actual code and dependencies, but unit testing isn't a replacement for integration testing.


I worked somewhere with a hard rule that you are not allowed to test internal classes.

So if I write my own sort algorithm I am not allowed to unit test that class unless I make it public.

But if the sort algorithm is to solve a specific sub problem in a library there is no need to make it public - it may not make much sense.

So I had to test it "through" its consumer class(es) that are public. For illustration's sake, let's say an MVC controller mocked up to the eyeballs with ORM mocks, logging mocks, etc.

This bugged me because having direct access to a functional core I can quickly amplify the number of test cases against it and find hidden potential bugs much quicker.


I would say that testing only the public contract is generally very helpful.

When the logic gets super complex, I would think that a good compromise is to modularize your code and test the public API of each module. A good heuristic is to ask whether the modules are (or could be) useful by themselves in the future.

You then write tests for a module assuming its dependencies cover their own edge cases.


As I wrote it I thought the same thing! I think some of the problem was the org friction of adding a new module and getting approval to do so. This was a place with about 400 devs, so they couldn't really let anyone freely add modules, and waiting for architecture approval would take too long for a typical ticket.


> So if I write my own sort algorithm I am not allowed to unit test that class unless I make it public.

I think there's a slow movement away from this kind of straitjacket. The compiler sees it all anyway. Class access modifiers serve two purposes.

1. They are a social tool; and this is only if you believe that developers who work on source code, which they can read, cannot be trusted to call methods to do what they need.

2. They are a convenience for opaque modules, so that users who may not have access to the source code, can avoid using APIs that may have no effect or problematic side effects, while also decluttering the API.

Python's access control is purely conventional: a leading underscore (e.g. `_backdoor()`) marks a method as internal, but callers, including tests, can still invoke it. Go allows any code in the same package, including that package's tests, to access any identifier regardless of whether it is exported. In JavaScript, you can use rewire.js to mock methods in imported modules (which are effectively methods in closures).

These solutions are an improvement over the classic rigid class-access pattern that many languages are stuck with, which forces the kind of theatre you have experienced.


In database land you can even freely bring up a number of _proprietary_ databases without needing an account or license. I spin up MySQL, Postgres, Oracle, and SQL Server databases in containers in GitHub Actions for tests. Unfortunately, IBM Db2 seems to require a license key to spin up its container. :(