Unit Testing Principles

Background

I learned about the Unit Testing book through Saša Jurić’s Clarity talk. The entire talk was brilliant but the last 15 minutes especially, when he turned the discussion to testing, were eye-opening. Jurić attributed his style of testing units of behavior instead of units of code to Vladimir Khorikov’s Unit Testing book, so I decided to buy a copy.

I had the book but, truth is, I didn’t plan to read it. It didn’t even make it to my software bookshelf post. After all, I knew what worked for me and what didn’t when it came to tests; there sure was plenty to learn from the book, but I had enough to get by. I’d rather spend my reading time on some other book.

But then I started a new job, joining a new team. What I found there was curious: my new colleagues had been maintaining an extensive test suite, they were very disciplined about it, coverage was high, and every public function on every module had its own test. And yet, this was an ineffective test suite. Tests were a lot of work to write, they broke on the smallest of refactors, and bugs still slipped through the cracks. What’s worse, this wasn’t perceived as a problem; the team didn’t realize they could do better.

I have strong opinions about testing, so I immediately knew what I wanted to change on this project. The problem was that my opinions were that: just opinions—based on experience but ultimately subjective intuitions. And I was the new guy, without reputation credits to spend; I would need something better than my gut feeling to convince the team to change habits, and to justify the effort to my manager. So for a while, I refrained from proposing any changes and started reading the testing book instead.

Right from the introduction, this book proved to be what I was looking for:

This book can help you articulate why the techniques and best practices you’ve been using all along are so helpful. Don’t underestimate this skill. The ability to clearly communicate your ideas to colleagues is priceless. A software developer—even a great one—rarely gets full credit for a design decision if they can’t explain why, exactly, that decision was made. This book can help you transform your knowledge from the realm of the unconscious to something you are able to talk about with anyone.

I come from a mathematical background and strongly believe that guidelines in programming, like theorems in math, should be derived from first principles. I’ve tried to structure this book in a similar way: start with a blank slate by not jumping to conclusions or throwing around unsubstantiated claims, and gradually build my case from the ground up. Interestingly enough, once you establish such first principles, guidelines and best practices often flow naturally as mere implications.

What pleasantly surprised me was that, without trying too hard, this book says a lot about software design. It makes sense when you think about it: if, as the author suggests, we backtrack to the foundation of our discipline, we’ll land on what testing and design have in common: the pursuit of sustainable software.

A good design lends itself to efficient testing, and striving for a good test suite helps arrive at a good design. This is not to say that the code should be adjusted to the tests, nor that the tests should drive the implementation.

This interrelation between design and testing is best illustrated in chapter 7, where the author suggests an ideal structure for the codebase, and shows how to refactor code towards that structure, thus enabling effective tests.

Overcomplicated code should be split into deep domain classes, to be thoroughly unit tested, and wide controllers, exercised by strategic integration tests.

I found this notion interestingly similar to the discussion of module depth from A Philosophy of Software Design.

But where Ousterhout advocates for avoiding shallow modules, Khorikov suggests that there’s a role for such wide (and thin) classes: to orchestrate the pieces involved in any meaningful operation, freeing the domain model to focus on business logic—the program’s essence.

Highlights

Chapter 1: The goal of unit testing

  • The goal of testing is to enable sustainable growth of the software project.
  • Some tests are valuable and contribute a lot to overall software quality. Others don’t. They raise false alarms, don’t help you catch regression errors, and are slow and difficult to maintain.
  • To enable sustainable project growth, you have to exclusively focus on high-quality tests—those are the only type of tests that are worth keeping in the test suite.
  • Coverage metrics are a good negative indicator (low coverage means you’re not testing enough) but a bad positive one (high coverage doesn’t guarantee good testing quality). Targeting a specific coverage number creates a perverse incentive that goes against the goal of unit testing.

Chapter 2: What is a unit test?

  • A unit test is an automated test that:
    • verifies a single unit of behavior,
    • does it quickly,
    • and does it in isolation from other tests.
  • Tests shouldn’t verify units of code. Rather, they should verify units of behavior, something that is meaningful for the problem domain and, ideally, something that a business person can recognize as useful. The number of classes it takes to implement such a unit of behavior is irrelevant.
  • The ubiquitous use of mocks produces tests that couple too tightly to the implementation.
  • Instead of reaching for mocks to test a large, complicated graph of interconnected classes, you should focus on not having such a graph of classes in the first place. More often than not, a large class graph is a result of a code design problem.
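
As a rough sketch of what testing a unit of behavior might look like, consider the example below. The `Store` and `Customer` classes are hypothetical and written only for this illustration. Each test targets one behavior a business person would recognize ("a purchase succeeds when there is enough inventory") and happens to exercise two classes, with no mocks involved.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Store:
    """Hypothetical in-memory inventory; used as a real collaborator."""
    inventory: Dict[str, int] = field(default_factory=dict)

    def add(self, product: str, quantity: int) -> None:
        self.inventory[product] = self.inventory.get(product, 0) + quantity

    def has_enough(self, product: str, quantity: int) -> bool:
        return self.inventory.get(product, 0) >= quantity

    def remove(self, product: str, quantity: int) -> None:
        if not self.has_enough(product, quantity):
            raise ValueError("not enough inventory")
        self.inventory[product] -= quantity


class Customer:
    def purchase(self, store: Store, product: str, quantity: int) -> bool:
        if not store.has_enough(product, quantity):
            return False
        store.remove(product, quantity)
        return True


def test_purchase_succeeds_when_enough_inventory():
    store = Store()
    store.add("shampoo", 10)

    assert Customer().purchase(store, "shampoo", 5) is True
    assert store.inventory["shampoo"] == 5


def test_purchase_fails_when_not_enough_inventory():
    store = Store()
    store.add("shampoo", 10)

    assert Customer().purchase(store, "shampoo", 15) is False
    assert store.inventory["shampoo"] == 10
```

Neither test cares how many classes sit behind `purchase`; merging or splitting them would leave both tests untouched.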

Chapter 4: The four pillars of a good unit test

  • A good unit test has the following four attributes:
    • Protection against regressions
    • Resistance to refactoring
    • Fast feedback
    • Maintainability
  • When there is resistance to refactoring, you become confident that your code changes won’t lead to regressions. Without such confidence, you will be much more hesitant to refactor and much more likely to leave the code base to deteriorate.
  • The more the test is coupled to the implementation details of the system under test (SUT), the more false alarms it generates. You need to make sure the test verifies the end result the SUT delivers: its observable behavior, not the steps it takes to do that.
  • Choose black-box testing over white-box testing by default. If you can’t trace a test back to a business requirement, it’s an indication of the test’s brittleness. Either restructure or delete this test.
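
To make the false-alarm problem concrete, here is a minimal sketch with a hypothetical `MessageRenderer` (all names are invented for illustration). One check inspects how the renderer is built internally; the other only looks at the output. After a behavior-preserving refactor, only the first one breaks.

```python
from dataclasses import dataclass


@dataclass
class Message:
    header: str
    body: str
    footer: str


class MessageRenderer:
    """Hypothetical SUT: renders a message as HTML in three internal steps."""

    def __init__(self) -> None:
        # Implementation detail: how the work is split up internally.
        self._steps = [self._header, self._body, self._footer]

    def render(self, message: Message) -> str:
        return "".join(step(message) for step in self._steps)

    def _header(self, m: Message) -> str:
        return f"<h1>{m.header}</h1>"

    def _body(self, m: Message) -> str:
        return f"<b>{m.body}</b>"

    def _footer(self, m: Message) -> str:
        return f"<i>{m.footer}</i>"


class RefactoredRenderer:
    """Same observable behavior, different internals (no step list)."""

    def render(self, m: Message) -> str:
        return f"<h1>{m.header}</h1><b>{m.body}</b><i>{m.footer}</i>"


def check_structure(renderer) -> None:
    # White-box check: coupled to the steps the SUT takes.
    names = [step.__name__ for step in renderer._steps]
    assert names == ["_header", "_body", "_footer"]


def check_output(renderer) -> None:
    # Black-box check: verifies only the end result.
    html = renderer.render(Message("h", "b", "f"))
    assert html == "<h1>h</h1><b>b</b><i>f</i>"
```

`check_output` passes for both renderers, while `check_structure` raises an error on `RefactoredRenderer` even though nothing a client can observe has changed.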

Chapter 5: Mocks and test fragility

  • For a piece of code to be part of the system’s observable behavior, it has to do one of the following things:
    • Expose an operation that helps the client achieve one of its goals.
    • Expose a state that helps the client achieve one of its goals.
    Any code that does neither of those two things is an implementation detail.
  • Ideally, the system’s public API surface should coincide with its observable behavior, and all its implementation details should be hidden from the eyes of the clients. Such a system has a well-designed API. Making the API well-designed automatically improves unit tests.
  • The way your system talks to the external world forms the observable behavior of that system as a whole. It’s part of the contract your application must hold at all times.
  • The use of mocks is beneficial when verifying the communication pattern between your system and external applications. Conversely, using mocks to verify communications between classes inside your system results in tests that couple to implementation details and therefore fall short of the resistance-to-refactoring metric.
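
One way to read the guideline on mocks is sketched below, using Python's standard `unittest.mock`. The `User`, `UserController`, and `send_email_changed` names are assumptions made up for this example. The mock stands in only for the message bus, which represents communication with an external application; the in-system collaboration between the controller and `User` is verified through its resulting state instead.

```python
from unittest.mock import Mock


class User:
    """Hypothetical domain class; never mocked in tests."""

    def __init__(self, user_id: int, email: str) -> None:
        self.user_id = user_id
        self.email = email

    def change_email(self, new_email: str) -> bool:
        if new_email == self.email:
            return False
        self.email = new_email
        return True


class UserController:
    def __init__(self, bus) -> None:
        # The bus is the boundary to an external application.
        self._bus = bus

    def change_email(self, user: User, new_email: str) -> None:
        if user.change_email(new_email):
            self._bus.send_email_changed(user.user_id, new_email)


def test_changing_email_notifies_external_system():
    bus = Mock()
    user = User(1, "old@example.com")

    UserController(bus).change_email(user, "new@example.com")

    # State is checked directly; only the outgoing message is mocked.
    assert user.email == "new@example.com"
    bus.send_email_changed.assert_called_once_with(1, "new@example.com")


def test_unchanged_email_sends_nothing():
    bus = Mock()
    user = User(1, "same@example.com")

    UserController(bus).change_email(user, "same@example.com")

    bus.send_email_changed.assert_not_called()
```

Neither test asserts that the controller called `User.change_email`; that call is an implementation detail and could be restructured freely.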

Chapter 7: Refactoring toward valuable unit tests

  • All production code can be categorized along two dimensions:
    • Complexity or domain significance.
    • The number of collaborators.
  • This categorization gives us four kinds of code:
    • Trivial code (low complexity/significance, few collaborators): this code shouldn’t be tested at all
    • Domain model and algorithms (high complexity/significance, few collaborators): this code should be unit tested. The resulting unit tests are highly valuable and cheap.
    • Controllers (low complexity/significance, many collaborators): controllers should be briefly tested as part of overarching integration tests.
    • Overcomplicated code (high complexity/significance, many collaborators): this code is hard to test, and as such it’s better to split it into domain/algorithms and controllers.
  • The domain model encapsulates the business logic and the controllers deal with the orchestration of collaborators. You can think of these two responsibilities in terms of code depth versus code width. Your code can be either deep (complex or important) or wide (work with many collaborators), but not both.
  • Getting rid of the overcomplicated code and unit testing only the domain model and algorithms is the path to a highly valuable, easily maintainable test suite. With this approach, you won’t have 100% test coverage, but you don’t need to.
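
The deep-versus-wide split could look roughly like the sketch below. The discount rules, `Order`, `OrderStore`, and `DiscountController` are all hypothetical, chosen only to show the shape: the business decision lives in a pure function with no collaborators, and the controller just wires collaborators together.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Order:
    order_id: int
    total_cents: int
    loyalty_years: int


def discount_cents(order: Order) -> int:
    """Deep: business rules only, no collaborators (rules are made up)."""
    percent = min(order.loyalty_years, 10)  # 1% per year, capped at 10%
    if order.total_cents >= 10_000:
        percent += 5  # extra 5% for large orders
    return order.total_cents * percent // 100


class OrderStore(Protocol):
    """Assumed storage interface; a real one would talk to a database."""

    def load(self, order_id: int) -> Order: ...
    def save_discount(self, order_id: int, cents: int) -> None: ...


class DiscountController:
    """Wide: orchestrates collaborators, makes no business decisions."""

    def __init__(self, store: OrderStore) -> None:
        self._store = store

    def apply_discount(self, order_id: int) -> int:
        order = self._store.load(order_id)
        cents = discount_cents(order)
        self._store.save_discount(order_id, cents)
        return cents


# Unit tests target the deep part: cheap to write, no setup needed.
def test_loyalty_discount_is_capped():
    assert discount_cents(Order(1, 5_000, 3)) == 150
    assert discount_cents(Order(2, 5_000, 25)) == 500


def test_large_orders_get_extra_discount():
    assert discount_cents(Order(3, 10_000, 0)) == 500
    assert discount_cents(Order(4, 20_000, 20)) == 3_000
```

The controller is left for an integration test, since on its own it contains nothing worth unit testing.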

Chapter 8: Why integration testing?

  • Check as many of the business scenario’s edge cases as possible with unit tests; use integration tests to cover one happy path, as well as any edge cases that can’t be covered by unit tests.
  • In the most trivial cases, you might have no unit tests whatsoever. Integration tests retain their value even in simple applications.
  • Try to always have an explicit, well-known place for the domain model in your code base. The explicit boundary makes it easier to tell the difference between unit and integration tests.
  • Layers of indirection negatively affect your ability to reason about the code. This results in a lot of low-value integration tests, that provide insufficient protection against regressions combined with low resistance to refactoring.
  • In most backend systems, you can get away with just three layers: the domain model, application services layer (controllers), and infrastructure layer.
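
The division of labor between unit and integration tests might be sketched as follows. The username rules and the `register` function are hypothetical, and an in-memory SQLite database (from Python's standard `sqlite3` module) stands in for whatever database the application actually uses. Edge cases are pushed down to unit tests of the domain logic, and a single integration test walks the happy path through the storage layer.

```python
import sqlite3


def normalize_username(raw: str) -> str:
    """Domain logic (made-up rules): trim, lowercase, 3-20 alphanumerics."""
    name = raw.strip().lower()
    if not 3 <= len(name) <= 20:
        raise ValueError("username must be 3-20 characters")
    if not name.isalnum():
        raise ValueError("username must be alphanumeric")
    return name


def register(conn: sqlite3.Connection, raw_username: str) -> int:
    """Controller: orchestrates the domain logic and the database."""
    name = normalize_username(raw_username)
    cursor = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cursor.lastrowid


# Unit test: many edge cases, no database involved.
def test_username_edge_cases():
    assert normalize_username("  Alice ") == "alice"
    for bad in ["", "ab", "a" * 21, "bob!", "   "]:
        try:
            normalize_username(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected rejection of {bad!r}")


# Integration test: one happy path through the real storage layer.
def test_register_happy_path():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )

    user_id = register(conn, " Alice ")

    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    assert row == ("alice",)
```

An edge case that only the database can produce, such as a duplicate name hitting the `UNIQUE` constraint, would be a candidate for a second integration test.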
