Over the past year or two, AI has been placed at the center of almost every technical conversation.
Every day, we hear claims that AI will reshape software development, replace many jobs, and allow one person to do the work of an entire team. The demos are impressive enough to make it feel as if mastering AI immediately means multiplying your productivity, and failing to keep up means being left behind.
That narrative affected me deeply. I felt both excited and anxious. Excited because it did feel like a new era was arriving. Anxious because I wondered whether I was already falling behind.
So I started bringing AI more actively into my daily workflow, not just as a toy, but as something I genuinely wanted to use in design, development, and testing. I wanted to know whether it could really help me keep up with this so-called “speed of the era.”
But after putting AI into real work, I gradually realized that there is a long distance between AI in demos and AI in engineering reality.
AI Does Not Understand Context, and It Is Not Just Because It Has Not Read Enough Documents
The project I work on day to day is a very large microservice system. In such a system, a requirement is rarely about changing a single API, page, or table. It may involve multiple services, legacy compatibility, business rules, rollout processes, permissions, payments, risk control, and many implicit conventions that only people who have worked in the system for a long time understand.
During the design phase, I often find AI’s proposals too vague. It can quickly list modules, workflows, APIs, and exception handling. At first glance, the output looks complete. But once I read carefully, I often realize that it has not truly understood the system. It may invent workflows that do not exist, misunderstand service boundaries, or produce a standardized design that cannot really be implemented.
At first, I thought the problem was that I had not provided enough context. Later, I realized it was more complicated than that.
In a real enterprise environment, context is not a clean, complete, always-available thing.
First, not everything is documented. A lot of critical knowledge exists in the memory of senior engineers, in verbal agreements formed after production incidents, or inside some mysterious conditional branch that nobody dares to touch.
Second, even when documentation exists, it may already be outdated. After a system has evolved for years, it is very common for documentation and actual implementation to drift apart.
Some people may say: talk is cheap, show me your code. So why not just let AI read the code?
But in reality, code is not always the truth. Production code only tells us how the system currently runs. It does not necessarily mean the business logic is correct. We once had a payment-related feature that had been online for half a year before a problem was discovered, eventually causing losses of over a million dollars. That incident left a deep impression on me: production code is not automatically the correct answer. It may simply be a problem that has not yet been exposed.
What makes it harder is that code usually tells you “what is happening,” but not “why it is happening.”
You may read in the code that a certain branch is taken when request.type != 2. But you may not know what type = 2 means in business terms, why that branch was added, or whether it was a temporary fix, a legacy compatibility rule, or part of a financial policy. In a system with few comments, abstract naming, and fast-changing business logic, code provides behavioral traces, not complete context.
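To make that concrete, here is a small, entirely hypothetical sketch of the kind of branch described above. The function names and data shapes are my own invention for illustration; the point is that the code states the condition precisely while explaining nothing about its origin.

```python
def route_request(request):
    # The code tells you *what* happens: requests with type 2
    # take a different settlement path. It does not tell you *why*.
    if request["type"] != 2:
        return settle_standard(request)
    # Was this a temporary fix? A legacy compatibility rule?
    # Part of a financial policy? The code alone cannot answer.
    return settle_legacy(request)

def settle_standard(request):
    return {"path": "standard", "id": request["id"]}

def settle_legacy(request):
    return {"path": "legacy", "id": request["id"]}
```

Reading this, an engineer (or an AI) can reproduce the behavior perfectly and still have no idea whether touching the branch is safe.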
So when I say AI does not understand context, I do not simply mean that it has not read the entire codebase. More precisely, AI is facing a complex environment that even human engineers need long-term immersion to understand.
In real engineering, context is not just code and documentation. It is code, history, incidents, organizational memory, business constraints, and bugs that have not yet been discovered.
AI Writes Fast, But Review Is Not Easy
In the development phase, AI shows another kind of contrast.
There is no denying that AI writes code extremely fast. In many cases, it can generate in one hour what might have taken me two or three days. Boilerplate code, field mapping, API adaptation, and simple logic implementation are areas where it performs very well.
But the problem is that I cannot fully trust the code it writes.
AI-generated code still needs review, and that review process is often painful. Its changes are frequently huge, sometimes involving thousands of lines of diff. It seems to treat “finishing the requirement” as the only goal, without enough concern for the scope of change, architectural boundaries, maintainability, or how the system should evolve over time.
This still comes back to the context problem. The less AI understands the system, the more likely it is to use large-scale changes to cover uncertainty. It may quickly implement the feature, but fail to realize whether the change breaks an existing abstraction, bypasses an established process, or introduces branches that will be difficult to maintain later.
A subtle shift then happens: AI appears to have completed the work for me, but in fact it has handed another kind of work back to me.
In the past, I spent most of my time writing code. Now, I spend more time judging code.
After using AI, I write less code, but I carry more responsibility for judgment.
Testing Is Where AI Helps the Least
If AI can still provide some help during design and development, then testing is probably where I currently feel its limitations the most.
In a large microservice system, many features cannot be verified locally. They need to be deployed to a staging environment. They depend on real or realistic data. They require cooperation from Web or mobile apps. They involve logs, traces, metrics, and interactions across multiple services.
In this kind of scenario, AI cannot truly participate in the full validation process.
It can help me list test cases, remind me of edge cases, and generate drafts of test scenarios. But if it cannot access the real environment, operate the Web or mobile app, observe distributed logs, or understand the actual state of the staging environment, then its help remains relatively shallow.
This made me realize that correctness in software engineering does not end with writing a piece of code that looks reasonable. The hard part is proving that it is correct in a real environment.
AI Exposes the Context Debt of a System
After experiencing these gaps, I do not think AI is useless. On the contrary, I have started to understand its value from another angle.
AI is not only a code-writing tool. It is also a mirror. It reflects the context debt that already exists in our systems.
AI keeps asking: What does this service do? What does this field mean? Why does this workflow behave this way? What business rule is behind this condition?
These questions are not only AI's problems. They are also reminders of how much system knowledge has never been properly captured.
These problems existed before. We simply relied on human experience, memory, and communication to keep things barely working. Once AI enters the workflow, because it cannot automatically know all this implicit knowledge, the gaps become much more visible.
From this perspective, providing context to AI is itself a system knowledge audit.
More importantly, AI can not only expose these gaps, but also help repair them. It can generate first drafts of module explanations, call-chain summaries, guesses at field meanings, business workflow documents, and testing checklists from code, APIs, logs, and historical requirements. Humans do not have to write everything from scratch. Instead, we can verify, correct, and supplement the output.
The key is not to “let AI write more documents,” but to build a system knowledge base that is traceable, verifiable, and continuously updated.
In my ideal AI tool, AI should not only read the knowledge base at the beginning of a task. It should also update the knowledge base at the end of a task. After each feature or code change, AI could use the requirement, code diff, test results, and review feedback to check whether related knowledge is still accurate: whether service responsibilities have changed, whether API contracts have changed, whether field meanings have changed, whether business workflows have changed, and whether historical constraints have been broken.
If updates are needed, AI should generate candidate changes with sources attached: which PR, which diff, and which requirement background led to this conclusion. Engineers can then review and confirm whether these changes are accurate.
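As a sketch of what such a source-backed candidate change might look like in practice: every field name below is my own invention, not the schema of any existing tool. The idea is simply that each proposed claim carries its evidence, its uncertainty, and an explicit review status.

```python
from dataclasses import dataclass

@dataclass
class KnowledgeChange:
    """A candidate update to the system knowledge base.

    Hypothetical structure: each claim carries its sources and
    an uncertainty marker, and waits for human review.
    """
    claim: str        # the proposed statement of fact
    sources: list     # PRs, diffs, requirement docs behind the conclusion
    uncertainty: str  # e.g. "low", "medium", "high"
    status: str = "pending_review"

    def approve(self, reviewer: str):
        # Knowledge changes enter a review gate, just like code changes.
        self.status = f"approved_by:{reviewer}"

change = KnowledgeChange(
    claim="Service X no longer owns the refund workflow",
    sources=["PR#1234", "diff:payments/refund", "requirement background doc"],
    uncertainty="medium",
)
change.approve("reviewer_name")
```

The specific shape matters less than the invariant it encodes: no claim enters the knowledge base without attached evidence and a human sign-off.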
In other words, AI should not only consume context. It should also help produce context.
Of course, there are risks. Knowledge base updates must not become piles of polished AI-generated text. AI-generated knowledge changes must be reviewable, source-backed, evidence-based, and marked with uncertainty where necessary. They should enter a review process as serious as code review.
Code review examines implementation. Knowledge review examines interpretation. Implementation errors may surface quickly, while wrong interpretations may remain hidden for a long time. So the real point is not to automatically generate documentation, but to bring knowledge changes into engineering governance.
AI Makes It Easier to “Write It,” and More Important to “Think It Through”
Another value of AI is that it does reduce a lot of manual development labor.
Boilerplate code, repetitive logic, and API adaptation that used to take a lot of time can now be generated quickly by AI. This is not free, because review, validation, and correction are still needed. But it does change how engineers may allocate their time.
If implementation becomes cheaper, engineers should spend more energy on higher-value work: understanding the business, designing boundaries, controlling complexity, and thinking about system evolution.
This is another realization I have come to about AI: AI may not lower the bar for engineers. It may actually raise it.
Because when “writing it” becomes easier, the truly scarce ability becomes judgment.
What should be done, and what should not?
Where should we introduce abstraction, and where should we keep things simple?
Should this requirement also fix some historical problem, or should we control the scope of change?
Could this feature affect payments, permissions, risk control, or user experience after launch?
The AI-generated implementation seems to work, but does it align with the long-term direction of the system?
These questions do not disappear just because AI can write code. On the contrary, they become more important.
Of course, this shift does not happen automatically. If we simply let AI generate large chunks of code and then exhaust ourselves reviewing them, we have only transformed coding labor into review labor. AI lowers the cost of writing code, but it does not necessarily lower the cost of building good software. In some cases, it shifts the cost from coding to understanding, review, validation, and governance.
So AI is more like an amplifier.
If system knowledge is well managed, AI amplifies that ability to accumulate knowledge. If the system is chaotic, AI amplifies the chaos. If engineers have strong judgment, AI amplifies output. If engineers only pursue speed, AI amplifies technical debt. If the team’s process is mature, AI improves efficiency. If the process is fragile, AI creates more uncontrolled changes.
Conclusion
Back to the anxiety I felt at the beginning.
I once felt anxious because of all the hype around AI. I worried that I had not kept up with the times. But after bringing AI into real work, I gradually became calmer.
AI is indeed powerful, but real-world engineering is not that simple.
It can generate code quickly, help understand systems, assist in organizing knowledge, and reduce some repetitive labor. But it still cannot replace engineers in taking responsibility for complex systems. It does not naturally understand history, business context, which code is correct, which documents are outdated, or which pieces of logic are simply problems that have not yet surfaced.
So my attitude toward AI is no longer blind pursuit, nor simple skepticism.
I prefer to see it as a high-output but low-context collaborator. It can help me obtain materials faster, but the final judgment still belongs to humans. It can help me write less code, but it also requires me to think more seriously about the system.
AI gives engineers the possibility of standing higher, but it also exposes whether we truly have the conditions to stand there.
Perhaps the real change is not that AI will do all the work for us, but that it forces us to answer a question again:
When writing code becomes easier and easier, what is the real value of an engineer?