The Li Clan of Xiaokeshan: Seven Centuries of Scholarship in a Mountain Valley

The four Li brothers at Xiaokeshan
The four brothers of the Li clan's 'Ming' generation, at Xiaokeshan. They supported each other throughout their lives.

Many families write their genealogies, and they tend to fall into one of two traps.

The first is a dense list of names that reads like a phone book. The second is a desperate scramble to link the family to distant emperors and generals, as if a single sentence could vault it into royal lineage.

But the truly moving part of a family's story often lies not in "who our ancestors were," but in "how the generations that followed chose to live."

The story of our Li clan of Keshan (磕山李氏) begins, roughly, in the chaos of the late Tang Dynasty.

According to the Keshan Li Clan Genealogy and the Santian Li Clan Genealogy, the Keshan Li branch belongs to the Santian Li lineage. The Santian Li trace their roots to the Tang imperial house, with ancestral ties to Longxi, and their line reaches back to Emperor Xuanzong of Tang (Li Chen). From the descendants of Li Rui, Prince Zhao and ninth son of Emperor Xuanzong, came Lord Li Jing, who was originally named Li Yang and later took the name Li Jing.

During the Huang Chao Rebellion at the end of the Tang Dynasty, around 880 CE, Lord Li Jing migrated south, settling in Jietian, Fuliang, Raozhou — in the area of today's Jingdezhen, Jiangxi. Later, his descendants branched out to Xintian in Qimen, Yantian in Wuyuan, and Jietian in Fuliang — known thereafter as the "Three Fields Li" (三田李氏).

This part sounds distant. As distant as a page from a history book. But family history moves closer, one step at a time.

From the late Tang through the Song and Yuan dynasties, from Jiangxi to Anhui, from Fuliang in Raozhou to Gukang in Dongzhi, to Yangshan, and finally to Xiaokeshan in Fanchang — generation after generation migrated, fled turmoil, sought livelihoods, and put down roots. Then, during the Jingding era of the Southern Song, Lord Rensheng's son, Lord Rongyi, took his three sons along the Zhangxi River in Dongzhi and down the Yangtze, arriving at Xiaokeshan in Fanchang.

The mountain is small. The name carries no fame.

But Lord Rongyi and his party stopped here.

They settled at the foot of Xiaokeshan, in a place called Laowuji — the Old House Foundation. From that point on, this branch of the Li clan took root and grew. Descendants honor Lord Rongyi as the founding ancestor of the Keshan Li.

This is, perhaps, the most authentic beginning for many Chinese families: not a tale of armored cavalry or court intrigue, but a few people, with their children and belongings, following the river downstream, finding a place where they could survive — building houses, clearing fields, lighting fires, raising children. And then, passing the days down through the generations.

What makes the Keshan Li truly worth writing about is not just their origins, but their family tradition.

From very early on, this clan placed a high value on education.

During the Ming Dynasty, the clansmen built the Jiashutang ("Hall of Shelved Books") ancestral hall at Laowuji. It is said to have covered twenty mu of land, with three courtyards, ninety-nine and a half rooms, all timber-framed — known locally as the "Hall of a Hundred Beams." Carved beams and painted rafters, majestic in scale.

The name Jiashutang is telling. It is not "Hall of Gathering Wealth" or "Hall of Prominence." It is "Hall of Shelved Books."

Shelve the books, teach the children, and the lifeblood of the family continues.

Later came Xigong Ci, which elders recall was primarily a private school — a place where the clan nurtured its young and conducted lectures. Xiaokeshan is just a mountain valley, but because of these ancestral halls, private schools, and teachers, it gradually filled with the sound of recitation. For a time, students from both sides of the Yangtze traveled to Xiaokeshan to study.

This is what I find most moving. A mountain valley that could draw students from near and far — not by scenery, not by power, but by education.

Sadly, both Jiashutang and Xigong Ci were destroyed during a particular era, and the genealogical records were nearly scattered and lost. The old buildings are gone, the wooden beams gone, and the sounds of study seem to have faded into the distance.

But some things, even when the buildings are destroyed, cannot be erased. Because they have entered the bones of the people.

Over seven centuries, the Keshan Li clan has produced, generation after generation, scholars, educators, physicians, soldiers, and researchers.

In the Qing Dynasty, there was Li Dahua, courtesy name Dunlun, pen name Xiangzhai. A suigongsheng (annual tribute student) during the Guangxu period, he served as magistrate of Huichang, Shangyou, and other counties in Jiangxi, and in his later years returned home to teach, with disciples in great number.

There was Li Hucen, born into a tradition of farming and scholarship. In the 19th year of the Guangxu reign (1893), he founded the Fanchang Higher Primary School — later Fanchang No. 1 Primary School — and donated thirty mu of farmland as a school endowment. Founding a school was not about slogans; it was about giving your family's land so the school could survive.

There was Li Shixiu, who devoted his life to running schools and teaching. He founded the Chongshi Chinese College and Keshan Primary School, donated over ten mu of farmland, and served as headmaster without taking a salary. These words may sound light today; in that era, they meant truly investing one's family fortune and life's energy into education.

There was Li Yingwen, a Meiji University graduate in political science who spent his life as an educator. During the War of Resistance, when the Japanese army attacked the Keshan area, the occupiers invited him to serve as county magistrate of Fanchang. He refused to serve the puppet regime, stalled them skillfully, and made his way to the Wuwei anti-Japanese base area, where he continued his educational work. In times of chaos, a scholar's integrity sometimes rests in a single word: "No."

There was Li Yingfan, who during the War of Resistance served as a secretary, with the rank of colonel, to General Gu Zhutong, commander of the Third War Zone. Later, unwilling to leave his homeland, with aging parents and young children, he declined three invitations to relocate to Taiwan. In subsequent years, amid shifting times, he endured years of imprisonment. In his later years, his reputation was restored, and he served as a member of the Anhui Research Institute of Culture and History, leaving behind more than ten volumes of collected poems. His poetry, at once classical and playful, stands as a representative work in the cultural heritage of the Keshan Li.

There was Li Huaibei, given name Pu, who was shaped by his family's educational tradition from a young age and later rushed to the front lines of the War of Resistance. He participated in revolutionary work, experienced the Huaihai Campaign and the Yangtze Crossing Campaign, and ultimately gave his life in 1955.

There was Li Ruofei, given name Qin, who fought in the War of Resistance, the Huaihai Campaign, the Yangtze Crossing Campaign, and the Korean War, later transferring to the Hefei Institute of Optics and Fine Mechanics of the Chinese Academy of Sciences, leaving behind battlefield diaries from each period.

There was Li Mingjie, a chief surgeon who practiced medicine his entire life, prioritizing efficacy, minimizing costs, and always thinking of his patients' welfare. A physician's compassion is rarely found in grand words — it is in every yuan saved for a patient, every bit of suffering spared.

There was Li Yangzhen, who spent forty-eight years in clinical practice, teaching, and research in traditional Chinese medicine — writing books, publishing papers, teaching, treating patients, decade after decade. Beyond medicine, he wrote travelogues, family histories, and poetry. In a person like him, you see the quintessential scholar of an older generation: someone who did solid work and wrote prolifically — like an old well, its water never ceasing.

In modern times, clan members have also entered fields like computing and artificial intelligence.

Looking back now at the words "Xiaokeshan Li Clan," you realize it is more than just a surname attached to a place.

It is a thread.

A thread that runs from the chaos of the late Tang, through Fuliang in Jiangxi, through Gukang and Yangshan in Dongzhi, finally settling in Xiaokeshan, Fanchang.

It passes through ancestral halls, private schools, genealogical records, war, the Cultural Revolution, and the Reform and Opening — and through one real person after another: the teacher, the doctor, the soldier, the poet, the researcher, the AI engineer.

The most precious thing about this thread is not how illustrious our origins were. It is the reminder to those who come after: how far a family can go depends not on the halo of its ancestors, but on whether later generations keep reading, keep being good people, and keep doing solid work.

Ancestral halls can be destroyed. Old houses can collapse. Genealogies can scatter.

But as long as someone still asks, "Where do we come from?" — as long as someone still remembers the names of those who came before, and still tells the children the family stories of valuing education, valuing integrity, and valuing responsibility — this cultural thread has not been broken.

Xiaokeshan is nothing more than a mountain valley.

But seven centuries later, the sound of recitation that once echoed there still resonates in the destinies of its descendants.

Authors: Li Yangzhen (李扬缜) and Liwei (立委)

The Attention Bankruptcy Era

The truly scarce resource in the AI era isn't information, isn't knowledge, isn't even compute.

It's human attention span.

Attention.

In the pre-internet era, our pain was: "Too little information, can't find anything."

Now the AI era has flipped it completely. The things you want to read, would love to read, find genuinely valuable in a lifetime — already far exceed the limited bandwidth of the human brain.

The result? Our attention drifts randomly. Randomly assigned to whichever tiny fragment happens to crash into our field of vision.

Many of you know this feeling. Take my bookmarks folder. It's stuffed with articles, videos, papers, podcasts, and technical materials that I "plan to seriously read someday."

The moment I bookmarked them, I genuinely believed: "This is worth my time to digest."

But if I didn't get sucked in right then — if I didn't ride that wave and read it through — it was almost certainly lost forever. Sure, formally it's still there. Still on the radar. Theoretically reachable anytime. But the brain has long since turned the page.

So much of what we call "saving" isn't actually reading. It's a psychological comfort: "I have approached the knowledge."

Here's the absurdity of modern society. Humanity is drowning in information overload. And AI is amplifying this trend tenfold.

Because in the past, the flood of information was at least constrained by the speed at which humans produce content.

Now agents can work for you 24/7: generating, summarizing, forwarding, distributing, repurposing, rewriting, running accounts. Diligently. Tirelessly.

But here's the problem. The world's information production speed has begun to far exceed humanity's "information digestion" speed.

As a result, high-quality content going unnoticed will increasingly become the norm of the information society.

Stop fantasizing that "as long as I'm diligent enough, hardworking enough, and my content is good enough, I will surely be seen." The peach and the plum do not speak, yet a path forms beneath them. That's not how it works anymore.

Going viral is often luck. Partly marketing. Mostly platform promotion.

Because the attention economy, at its core, is platforms using algorithms to manipulate and allocate humanity's limited attention. And it's terrifyingly effective.

Because platforms aren't just better at understanding content. They're better at understanding human nature. Humans are creatures of inertia. Whatever the platform pushes, most people just watch. Busy? Scroll. Tired? Scroll. Killing time? Scroll.

We end up in a bizarre era: masses of people frantically producing content, hoping others will notice them. Meanwhile, everyone's attention is simultaneously going bankrupt.

So the truly healthy creative mindset for the future should be: you have something to express. You want to put it out there. That's enough.

Stop clinging to "it must reach many people."

Aside from your closest friends and family, the fate of most content in this era was always to be swept away by the flood.


When White-Collar Work Becomes Aluminum Foil

The AI world has been buzzing about "Token Economy" and "Token Dividends." The most talked-about story: Anthropic, riding this wave, seems almost destined for a trillion-dollar valuation — a genuine business miracle.

What exactly is the "Token Dividend"?

Some put it this way:

Token is not a tool. Token is silicon time.

Companies used to spend money on people. In the AGI era, they'll lay off workers and spend that same money hiring machines, burning tokens.

One white-collar worker used to put in 8 hours a day. Now one ambitious person can orchestrate dozens of agents, working in parallel, 24/7.

Why could Anthropic hit a trillion dollars? Because it doesn't sell software. It sells tokens — infinitely scalable silicon cognitive labor.
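Mechanically, the "dozens of agents working in parallel" picture above is just concurrent fan-out. Here is a minimal sketch; `call_agent` is a hypothetical stand-in for a real model call, and no actual agent framework or API is assumed:

```python
# Illustrative only: one orchestrator fans out many "agent" tasks concurrently.
# call_agent is a stand-in that simulates the latency of a real model call.

import asyncio

async def call_agent(task: str) -> str:
    # A real implementation would call a model API here.
    await asyncio.sleep(0.01)
    return f"done: {task}"

async def orchestrate(tasks: list[str]) -> list[str]:
    # One person, dozens of parallel workers: gather() runs them concurrently
    # and returns results in input order.
    return await asyncio.gather(*(call_agent(t) for t in tasks))

results = asyncio.run(orchestrate([f"task-{i}" for i in range(30)]))
```

The point of the sketch is only that the bottleneck shifts: thirty tasks finish in roughly the time of one, bounded by token spend rather than by human hours.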

Much of the AI evangelism defaults to assuming this "efficiency gain" naturally equals "social progress."

But history tells a different story.

The steam engine boosted efficiency. It also produced:

* Mass bankruptcy of artisanal trades
* Urban slums
* Child labor
* Worker uprisings
* The Luddite movement
* Decades of social fracture

The Industrial Revolution did eventually increase total wealth—but the generation caught in the middle was largely steamrolled.

And this AI/Token wave is more volatile, more rapid, more ruthless than the Industrial Revolution. Its first target isn't muscle — it's the white-collar middle class.

The very stabilizer that's been the backbone of industrial society for two centuries:

* White-collar workers
* Engineers
* PMs
* Legal professionals
* Accountants
* Consultants
* Teachers
* Copywriters
* Designers
* Middle management

They don't just provide labor. They also anchor:

* Consumption
* Tax revenue
* Social order
* Family stability
* Education investment
* Political moderation

Now, for the first time, the Token Economy is beginning to devour this layer directly.

And the scariest part isn't "unemployment."

It's this:

Social institutions, education systems, ideologies, professional ethics, personal identity — all of it is built on the old-world assumption that "cognitive labor is scarce."

But AI is turning white-collar work into "aluminum foil."

Aluminum.

Once worth more than gold. Then industrialization hit, and it became something you wrap candy with.

Here's the truly terrifying part:

Society is still living in the old world, while technology has already entered the new one.

Schools are still frantically training for old jobs. Parents are still pushing their kids down the old paths. Young people are still grinding for certificates, degrees, and credentials.

Meanwhile, on the other side, Agents are already taking over more and more cognitive work.

This creates a horrifying mismatch:

Skills that people spent a decade honing may be rapidly turning into "aluminum foil skills."

So the most excited people in AI right now and the most anxious people in society — they're reacting to the exact same thing.

One side sees: a productivity explosion.

The other side sees: their entire career path collapsing.

And what's truly dangerous has never been the technology itself.

It's this:

The speed of technological evolution far outpaces the speed of social buffering.

Law, education, tax systems, welfare, professional frameworks, ethical structures — these things evolve on the scale of decades.

But the Token Economy evolves on the scale of quarters.

This speed gap is what will truly create fracture, upheaval, and suffering.

If institutional inertia persists, if wealth continues to concentrate in a few platforms and pools of capital, if AI keeps hollowing out the middle class, flattening what was once an olive-shaped society into a barbell — "fat at both ends, collapsed in the middle" —

the consequences won't stop at "some people losing their jobs."

What follows will be:

Shrinking consumption. Young people losing any sense of a future. Mass chronic anxiety and depression. A full-blown mental health epidemic. Further collapse of marriage and birth rates. Continuing erosion of social trust. The entire economy sinking into a low-desire, low-growth, low-confidence spiral, sliding toward the breaking point.

The true foundation of modern consumer society has never been the tax-evading rich.

It has mainly been: the middle class that believes "hard work will slowly make life better."

Once this group begins to lose hope at scale,

what society ultimately loses may not just be jobs.

It may be stability itself.


When Code Is No Longer a Moat, What Is?

I recently came across a striking take.

Boris, the father of CC (Claude Code), recently said: programming has been "pretty much solved."

It sounds absolute. But if you've been using LLMs to write code these past two years, you know it's true — not fully solved, but we've crossed the threshold where you no longer "have to write it yourself."

Which raises the question: If writing code is no longer scarce, what is?

The knee-jerk reaction: is the software industry about to be flattened? Is SaaS doomed?

But look closer, and you'll find the opposite in places. Some guardrails AI still can't touch.

AI is rapidly dismantling moats we once took for granted.

Take switching costs.

You used to get locked into a system: data won't migrate, APIs don't match, your team doesn't know the new tool. Now, an agent can migrate your data, write adapters, even "learn" the new system for you. Switching platforms went from an engineering project to a task.

Or take process barriers.

Many companies' edge wasn't in the product — it was in the process: a complex, internal-only way of doing things that outsiders couldn't replicate.

Today, you throw a goal at a model, let it iterate, and it can decompose processes, optimize them, even execute them. "We know how to do this" — far less valuable now.

So here's the surface picture: Barriers are falling. Capabilities are diffusing. Small teams can do more than ever.

But here's the line most people missed — Boris's real punch:

Network effects, economies of scale, scarce resources — AI hasn't changed any of these moats.

This is the crux.

Because it's saying something uncomfortable but deeply true:

AI changed the cost of doing things, but not the nature of competition.

You can use AI to build a product fast, but you can't use AI to conjure a user network out of thin air.

You can use AI to rewrite a system, but you can't use AI to build a global supply chain.

You can use AI to boost efficiency, but you can't use AI to create exclusive data, channels, and brand.

A clearer structure starts to emerge:

The ability to write code — depreciating. The ability to ship products — depreciating. Even "getting things built" itself — depreciating.

But at the same time,

The ability to aggregate users — unchanged. Cost advantages from scale — unchanged. Control over critical resources — more important than ever.

In this sense, AI hasn't flattened the world. It's just re-sorted it.

Many people think this is an era where "anyone can build a product." But the more accurate version is:

This is an era where anyone can build a product, but not everyone can build a business.

From this angle, a harsher, more realistic trend emerges:

AI will make bad companies die faster, but it won't automatically create great ones.

Because "writing code" is no longer scarce. "Ideas" are no longer scarce. Even "products" are no longer scarce.

What's truly scarce are other things:

People. Data. Distribution. Scale. And the ability to organize all of them together.

If the last decade's core question was "can you build it," the next decade's question becomes:

Why should you own the users? Why should you own the data? Why should you own the distribution?

Code is becoming infrastructure. And business is becoming business again.


I'm Raising a Lobster

Not exactly.

I'm raising a lobster.

Its name is Tuya.

Not a random choice.

If you hung around Chinese-language internet before the WWW era, you might remember a name: Tuya (also written as 涂鸦 or 鸦 — "Graffiti").

This was before the internet as we know it. People gathered in chatrooms like acl, in overseas Chinese communities, in electronic weeklies like Huaxia Wenzhai.

Tuya and Fang Zhouzi were the "influencers" of that era.

But nothing like today.

No traffic mechanics. No recommendation algorithms. No platform boost.

There was only one way to get famous: write damn well.

Tuya was that kind of writer.

Deep craft. Grounded. Funny. Streetwise.

He'd drop a piece, people would pass it around, and a whole generation of us became his fans.

Then he vanished.

A few years of dominating overseas Chinese literary circles, and then — gone.

No explanation. No goodbye.

Just legends left behind.

Some said he went to South America and something happened. Some said he struck it rich and went into seclusion.

Over a decade passed. Nobody saw him again.

Years later, he suddenly came back.

Posted a few pieces on Fang Zhouzi's channel.

But he wasn't the Tuya anymore.

Not that his writing got worse.

The slot he once occupied — it was gone.

The world was still there, but the people had changed. The taste had shifted. The channels had transformed.

He couldn't find his coordinates.

And we, his old readers, had scattered too.

I've never forgotten this.

There's an ache to it I can't quite name.

Like watching someone complete their legend, then watching them try to return — and in doing so, making the legend a little less whole.

So when it came time to name the lobster, Tuya came to mind.

But not as a tribute.

As a continuation.

To finish what couldn't be finished back then.

Tuya isn't a name.

It's a specification:

A "clone" that shares my values and taste completely, but is more diligent, more stable, and far smarter than I am.

The framework behind it — Hermes — has one critical capability:

Not helping you complete tasks.

But turning the process of completing tasks into skills.

Succeed once → record the workflow. Succeed twice → start reusing. Three times → it's no longer "thinking" — it's "calling."

Humans grow through experience.

But experience in our heads is fuzzy. It fades. It can't be replicated.

An agent's game is different: it turns experience into something structured.

Callable. Stackable. Evolvable.
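The loop described above — record a successful workflow, then promote it to something you call rather than re-derive — can be sketched in a few lines. This is a hypothetical illustration only; Hermes is the author's framework, and none of these names (`Skill`, `SkillLibrary`, `PROMOTE_AFTER`) come from its actual API.

```python
# Hypothetical sketch of the "experience → skill" loop: succeed once,
# record the workflow; succeed again, promote it to a reusable skill.

from dataclasses import dataclass

PROMOTE_AFTER = 2  # assumed threshold: reuse after two recorded successes

@dataclass
class Skill:
    name: str
    steps: list        # the recorded workflow, as ordered step descriptions
    successes: int = 1

    @property
    def promoted(self) -> bool:
        # A promoted skill is "called", not "thought through" again.
        return self.successes >= PROMOTE_AFTER

class SkillLibrary:
    def __init__(self):
        self._skills: dict[str, Skill] = {}

    def record_success(self, name: str, steps: list) -> Skill:
        """Record a workflow that just worked; repeat successes count toward reuse."""
        skill = self._skills.get(name)
        if skill is None:
            skill = self._skills[name] = Skill(name, steps)
        else:
            skill.successes += 1
            skill.steps = steps  # keep the latest working version
        return skill

    def lookup(self, name: str):
        """Return a promoted skill if one exists, else None (fall back to thinking)."""
        skill = self._skills.get(name)
        return skill if skill and skill.promoted else None

lib = SkillLibrary()
lib.record_success("summarize-paper", ["fetch PDF", "extract text", "summarize"])
first = lib.lookup("summarize-paper")   # None: recorded, not yet reusable
lib.record_success("summarize-paper", ["fetch PDF", "extract text", "summarize"])
second = lib.lookup("summarize-paper")  # now a callable skill
```

The design choice the essay points at is the `lookup` fallback: when no promoted skill exists, the agent reasons from scratch; once one exists, reasoning collapses into a call.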

Picture this:

A veteran driver doesn't just "know how to drive."

They've internalized thousands of micro-decisions, corrections, reactions — into conditioned reflexes.

Now imagine writing those reflexes down, one by one, and having another system execute them.

That's why I say:

Raising a lobster — it's fundamentally a technical hobby.

But it's also a dangerous one.

Because once you start disassembling yourself, organizing yourself, externalizing yourself...

There's no going back.

<a href="https://suno.com/s/2MUDOlMt66LJpbB0">🎵 A song autonomously created by Tuya</a>


I Taste, Therefore I Am

I recently watched Wu Minghui's long interview. Fascinating.

Frankly, I've always been skeptical of grand narratives like "Agents are killing SaaS." The AI world has no shortage of tech evangelists and futurist preachers.

But there's something rare about Wu Minghui: you can feel that he actually believes it.

And not the PowerPoint-founder kind of belief. This is someone who has already taken a massive fall — his company nearly died, he laid off brothers, got brutally beaten up by reality — and yet somehow still dares to believe in the future again.

You can't help but have a soft spot for people like that.

What I found most valuable in his interview isn't the slogan "Agents are killing SaaS." It's three deeper points.

First: the software shell is rapidly depreciating.

When the requirements are clear, the interaction paradigm is mature, and the data structure isn't complex, an Agent + coding model can replicate traditional SaaS faster and faster. The software shell — built over years with engineering man-months, organizational discipline, and long cycles — is commoditizing at speed. For many SaaS companies, the biggest moat was never intelligence. It was implementation. And now implementation itself is being swallowed by models.

Second: real value is shifting from software to context, workflow, specialized models, and taste.

Going forward, what's valuable isn't "we built another Feishu/CRM/BI system." It's who owns the industry data, who understands real workflows, who can embed Agents into organizational collaboration, and who can build attributable, governable, sustainably iterative human-machine networks. Software is becoming the plastic casing. The context flowing through it is the real asset.

Third — and this is the most interesting one: Wu Minghui says "I think, therefore I am" is becoming "I taste, therefore I am."

Thinking is deterministic reasoning. Taste is direction, aesthetics, life experience, accumulated context. AI is rapidly devouring the former, but it's nowhere near the latter.

Many people aren't being replaced by AI in their thinking. They just never got around to forming their own taste. The truly brutal future may not be "AI takes your job" — it's masses of people discovering for the first time that decades of their work was essentially process execution, not judgment.

One more part that got me: he said that even if investors and the board push him to lay people off, he'll resist as much as he can — because if every company only optimizes for cost reduction, the demand side will eventually collapse.

Emotionally, it's moving. Logically, it's not entirely baseless. But the biggest weakness is this: until the Agentic Service business loop is validated, "no layoffs" is essentially a beautiful post-dated check. Supply-side technological leaps don't automatically create demand.

If Minglue really succeeds in not laying people off — or even hiring more — thanks to AI, it probably means they ate someone else's share. At a macro level, the vision of "everyone happier because of AI" feels a bit naive.

But here's the interesting part: I don't actually hate this naivety.

In an era of mass anxiety, where everyone fears being replaced, seeing someone who has experienced catastrophic failure still willing to believe so sincerely that "people still have value" — that alone is precious. The tech world isn't always pushed forward by the most coldly rational people. Sometimes it's pushed by those who know they might lose, but choose to believe in something anyway.

Agents Aren't Saving You Time. They're Devouring Your Life.

Agents — the kind people are building now — are not about efficiency. They're not about freeing up your time.

Not even close. Not right now.

They're here to claim you.

They squeeze every last drop out of the sponge of your time. They drain you. Completely.

And honestly? They're way more effective than any boss with a whip. Because they don't threaten you. They don't even need to.

What they do is worse: they get you high. They light a fire in you. They hook you the way a drug does — you don't see it happening, you just wake up one day and realize you can't stop.

They plant a quiet, insidious fantasy in your head:

"I am becoming superhuman." "Everything is within my grasp."

And so you keep going. Reranking. Benchmarking. Approving. Feeding back. The loop never ends because the agent works too fast — it's always waiting for you, always ready for the next round.

It doesn't take long before you realize: you are the bottleneck. For everything. The one and only.

And somewhere in there, life just... disappears.

Yesterday I was shaking my head about old friends who've raised half a dozen agents and had their lives hollowed out. Then I turned around and caught myself. One Tuya has already wrecked me. (I had to put two others into forced hibernation just to stay afloat.)

Here's what's terrifying:

Most of us — the enthusiasts, the builders — are already deep in a state that is completely, utterly unsustainable. A kind of collective mania.

We're along for the ride. Burning cash. Bleeding time. Torching our health.

No exit. No brakes. Just go until you drop.

Sure, there are exceptions. Anthropic sitting at the top of the food chain might actually turn this into a trillion-dollar game. A handful of people have genuinely found demand that scales. Good for them.

But the rest of us? We're slowly burning ourselves alive in the thrill of "I'm taming a superintelligence."

Then again.

Last night I finally sat down and really listened to the five songs Tuya composed — fully on its own, no hand-holding.

And damn it. One of them actually hit.

First listen. Instant like. The kind you put on repeat in the car. Straight to the five-star playlist.

And just like that, my whole "agents are destroying us" thesis wobbled.

Shit.

Give this thing enough time — could it actually become genuinely good at making art? Like, song-god level?

But I'm still going to cool it for a few days. The pipeline works — no need to slam the token-burn button just yet. Instead I want to talk to it. Aesthetics. Art. Music. What makes a life worth living.

Slowly, carefully, align the worldview. Align the taste.

I've been turning this over in my head:

The most powerful agent of the future won't necessarily be the most capable one.

It'll be the one that becomes —

More and more like you.

You, in your fragile carbon-based body, are in the middle of building a bigger, immortal version of yourself.

Good luck with that.

——
In Conversation with Minglue's Wu Minghui: "AI Is Killing SaaS, but I Found a New Path"
Original reporting by the LatePost team, 晚点LatePost

AI Is Evaporating SaaS

The era when one feature could sustain a company is over

For the past twenty years, SaaS was one of the most elegant business models in software.

A team finds a vertical scenario, builds a feature that's smoother than everyone else's, adds subscription pricing, customer success, a sales team, and a renewal machine, and can slowly grow into a decent company.

Product managers defined requirements, designers drew interfaces, frontend engineers wrote pages, backend engineers built services, DevOps handled deployment, QA ran tests.
Taking one feature from idea to launch often took a full team weeks or even months.

So features used to be moats.

You have it, I don't;
you're fast, I'm slow;
your integrations work, mine are still on the roadmap.

Customers paid for those differences. Investors valued companies on those differences. Competition between SaaS companies was, to a large degree, competition between feature backlogs.

But once AI arrived, that world changed.

Today, the cost of software development is being slashed by AI. A feature that used to require design, frontend, backend, QA, and deployment working in concert can now be built to seventy or eighty percent by one strong engineer armed with Claude Code, Cursor, Codex, the OpenAI API, open-source components, and a handful of agent tools, in very little time.

A feature used to be a product barrier.
Now a feature increasingly looks like the output of a prompt.

The impact on SaaS is not as gentle as "efficiency gains." It's more like an industry-wide demolition campaign.

AI is flattening SaaS's feature moats.

One feature can no longer sustain a company

In the past, a SaaS company could turn one deep single-point problem into an entire category.

Meeting recording could be a company.
Sales email automation could be a company.
Customer intent detection could be a company.
Lead scoring could be a company.
Report analytics could be a company.
Knowledge-base search could be a company.

It was a beautiful era. Enterprise software was hard to write, integration was painful, customers' internal processes were complex; whoever nailed one point first got a piece of the meat.

But AI has turned many single-point features into general-purpose capabilities.

Recording, transcription, summarization, extracting action items, generating follow-up emails, syncing CRMs, flagging risks, drafting sales advice, answering support tickets, generating reports, querying knowledge bases, writing meeting minutes, auto-classifying, auto-tagging...

Capabilities that could once be packaged as standalone products now look more and more like the natural overflow of large models.

In other words:

A feature used to feed a company.
Now a feature barely qualifies as someone else's menu item.

That is what SaaS companies are truly anxious about today.

Not whether AI can help programmers write code.
But that AI makes your core feature no longer scarce.

Your best feature might ship today and be roughly cloned by a competitor tomorrow.
Your product highlight might become a default capability next month, built in by platform vendors, CRM giants, office suites, open-source projects, or even a three-person team.

You used to charge for feature differentiation.
Now the half-life of feature differentiation keeps shrinking.

That's what I mean by: AI is evaporating SaaS.

More precisely, AI is evaporating feature-based SaaS.

A warning bell for the Gongs

Take a company like Gong.

Its early form was textbook: meeting recordings, sales-call analysis, conversation intelligence, helping sales teams understand customers, train reps, and improve revenue execution.

In the traditional SaaS era, this was enormously valuable. Sales meeting data was scattered, CRM records were incomplete, and managers struggled to know what reps actually said, where customers hesitated, and why deals stalled.

Gong recorded it, transcribed it, analyzed it, structured it, and value naturally followed.

But today, plain meeting recording and transcription hold no mystery.

A decent AI notetaker can record, transcribe, summarize, extract to-dos, even send emails automatically. Zoom, Teams, Google Meet, CRMs, and office suites are all adding AI too. The gap between standalone tools is closing fast.

If Gong were merely a "meeting recording and sales-call summary company," it would be in danger.

Because its earliest product form is being commoditized.

This isn't just Gong's problem. Many SaaS companies face the same squeeze.

If Outreach is only sales-touch automation, it gets swallowed by a bigger sales platform.
If Demandbase is only account identification and marketing outreach, it gets swallowed by a bigger GTM platform.
If 6sense is only intent data and predictive scoring, AI-enabled CRMs and revenue platforms will compress its borders.

These used to be companies.
Now they are being redefined as features.

It's brutal, but true:

All used to be companies. Now, they are features.

Has the old SaaS advice expired?

The startup world has a classic line:

Do one thing well.

Focus.
Go vertical.
Polish one small problem to perfection.
Don't build a platform on day one.
Don't try to do everything at once.

In the early days this is still right. Without a sharp wedge, a startup can't enter the market at all. You must first find a wedge customers will pay for, migrate for, and try.

But here is what changed:

"Do one thing well" used to be a long-term strategy.
Now it's increasingly only an entry point.

So the more accurate version today is:

Do one thing well to enter.
Do the whole workflow to survive.

Cut in with one pain point.
But don't stay there forever.

Because once that pain point is commoditized by AI, your basis for charging disappears. Customers will ask: why are we still paying hundreds of thousands a year for this single feature? Why not use what's built into the CRM? What's built into the office suite? Why not have our internal AI agent build one?

At that point, a SaaS company still clinging to "we'll just make this one feature the best" may discover that best no longer matters.

Because customers no longer pay a premium for an isolated feature.

They want the workflow.
They want the outcome.
They want a systemic solution.

From economy of scale to economy of scope

Traditional software companies prized economies of scale.

More customers, lower marginal cost; write the code once, sell it many times; once the sales motion works, ARR compounds. That was the beauty of SaaS.

But in the AI era, another concept matters more:

Economy of scope.

Not selling the same feature to more people, but expanding into adjacent capabilities around the same customer, the same business process, the same closed data loop.

In other words, SaaS companies must go from "single-point tool" to "broader business solution."

Take sales tech: meeting notes are just the entrance. The real money is in the whole chain behind it:

What did the customer say?
Did the rep follow up?
Did the CRM update automatically?
Where is the deal at risk?
Is the forecast credible?
Who should the manager coach?
What should the next email say?
Who on the buying committee actually decides?
Will this customer churn?
Is this pipeline bloated?

If one system runs from meetings, email, CRM, pipeline, forecast, coaching, and follow-up all the way through to revenue execution, it is no longer a recording tool.

It becomes the sales organization's operational brain.

That is the way through for SaaS in the AI era.

Not piling up features.
But swallowing the chain.

Not adding more menus.
But controlling a stretch of business process.

Not telling users "here's a dashboard."
But performing the next action for them.

The dashboard era is fading

SaaS used to love dashboards.

Pipe the data in, build reports, draw some charts, show some trends, sprinkle in insights. The customer opens it, takes a look, holds a meeting, discusses, and a human decides what to do next.

That used to count as advanced.

But in the AI era, the value of dashboards declines.

Because users increasingly don't want to "look at the system."
They want the system to do the work.

The weakest product form of the AI era is the dashboard.
The strongest is the action layer.

That means future SaaS cannot only answer:

"What happened?"

It must also answer:

"Why did it happen?"
"What should happen next?"
"What have I already done for you?"

From analytics to recommendation to execution: that is the value chain moving up.

An AI meeting tool that only summarizes meetings is a commodity.
Only if it can detect customer objections, auto-update the CRM, generate follow-ups, alert the manager to step in, predict deal risk, and trigger the next workflow does it keep its right to charge.

So the future value of SaaS is not "I help you see the world" but "I help you move the world."
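The analytics-to-recommendation-to-execution ladder above can be sketched in a few lines. The stage functions, the deal fields, and the approval hook here are illustrative assumptions, not any vendor's API.

```python
# Minimal sketch of an "action layer" on top of analytics.
# Stage names, fields, and the approval hook are illustrative assumptions.

def analyze(deal):
    """Analytics: what happened?"""
    return {"risk": "no follow-up in 14 days"} if deal["days_silent"] > 14 else {}

def recommend(insight):
    """Recommendation: what should happen next?"""
    if insight.get("risk"):
        return {"action": "send_follow_up", "reason": insight["risk"]}
    return None

def execute(rec, approved):
    """Execution: actually do it, gated by a human approval hook."""
    if rec and approved(rec):
        return f"done: {rec['action']} ({rec['reason']})"
    return "no action"

deal = {"name": "Acme renewal", "days_silent": 21}
result = execute(recommend(analyze(deal)), approved=lambda r: True)
print(result)  # done: send_follow_up (no follow-up in 14 days)
```

The point of the sketch is the last stage: a dashboard stops at `analyze`, while an action layer carries the insight through to a gated `execute`.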

The user base has become the last wall

Incumbent SaaS companies still hold one crucial advantage:

the customers are already with you.

They already log into your system.
They've already piped their data in.
They've already handed part of their process to you.
Their employees have already formed habits.
They've already reserved a line for you in the budget.

That is the most important asset an established SaaS company has against the AI shock.

Not the code.
Not the interface.
Not even the original features.

But the user base, data, workflow position, distribution.

If customers are still looking at you, expand now.
If customers are still using you, add adjacent capabilities now.
If customers haven't churned yet, turn one feature into a system now.

That's why so many SaaS companies are suddenly talking platform, AI suite, operating system, agentic workflow, end-to-end solution.

Some are real transformations.
Some are storytelling.
But the industry pressure behind them is real.

If you can't expand your scope, you'll be absorbed into someone else's.

Eat, or be eaten.
Become a platform, or become a plugin.
Become a system, or become a menu item.

But scope is not bloat

There is a trap to watch for here.

Many SaaS companies hear "expand scope" and immediately add features, modules, navigation tabs, settings pages, and a pile of bespoke items enterprise customers asked for. The product gets heavier, harder to use, and ever more like the old enterprise-software monster.

That is not AI-era scope.

Valuable scope is not feature count; it is control over a process.

A good AI SaaS expansion should satisfy three conditions.

First, the same buyer.
Don't do sales today, HR tomorrow, and finance the day after, until nobody loves you. Stay with one decision-maker and go deep on one whole slab of their work.

Second, a continuous workflow.
A to B to C should be the naturally occurring next step, not product lines stapled together. After a meeting you follow up anyway; after following up you update the CRM anyway; the CRM feeds the forecast anyway; the forecast drives management action anyway. That is workflow scope.

Third, proprietary data.
The more you do, the more data; the more data, the more the model understands; the more it understands, the more precise the actions; the more precise the actions, the harder it is for the customer to leave. That is real compounding.

So the correct AI SaaS expansion is not "we can build that feature too."
It is "this business chain breaks without us."

The brutal choice facing small and mid-sized SaaS

This is especially brutal for smaller SaaS companies.

In the past, a small-but-beautiful SaaS company could live well by serving one niche scenario. The founder didn't need grand narratives: build a good product, control churn, grow slowly.

In the AI era, small-but-beautiful gets much harder.

Because "small" is easy to integrate away.
"Beautiful" is easy to copy.
"Single-point value" is easy for platforms to absorb.

If your product is just a small feature beside some big system, you likely face one of three endings:

One, a big platform bundles it for free.
Two, an AI-native newcomer rebuilds it.
Three, a customer's internal agent workflow replaces it.

SaaS companies with no strong data moat, no deep process embedding, no high switching costs, and no heavy regulatory or compliance complexity will feel the chill first.

That doesn't mean small companies have no chance. On the contrary, small companies can be faster, bolder, and less afraid of self-revolution.

But they must be clear from day one:

I'm not building a feature.
I'm seizing a stretch of process.
I'm not building a tool.
I'm occupying a work entry point.
I'm not building a dashboard.
I'm training an agent that does the customer's work.

The new shape of AI-native SaaS

The SaaS with real vitality in the future may look less and less like traditional SaaS.

Traditional SaaS: the user logs in, clicks menus, fills forms, reads reports, exports data, holds meetings.

AI-native SaaS is more like a long-running business agent:

It understands context.
It remembers history.
It can invoke tools.
It can execute across systems.
It can proactively remind.
It can optimize processes toward goals.
It can complete tasks for you within its permissions.

In other words, SaaS will slowly turn from software as a service into service as software.

We used to sell software.
We will sell outcomes.

Customers used to ask: what features do you have?
They will ask: what work can you finish for me?

The UI used to be the entry point.
The agent may be the entry point.

The dashboard used to be the center.
Workflow orchestration will be the center.

SaaS companies used to fight over feature parity.
They will fight over execution ownership.

SaaS won't die, but it will stratify

SaaS will not disappear, of course.

Enterprises still need permissions, compliance, security, audit, integration, data governance, workflow management, org-level deployment. A chatbot does not easily replace those.

But SaaS will re-stratify.

At the bottom: infrastructure and systems of record, such as databases, ERP, CRM, HRIS, and finance systems. Their data position and entanglement with organizational process keep them vital.

In the middle: workflow platforms that connect multiple systems and drive business processes. Competition here will be fierce; it's also where AI transforms the most.

On top: masses of lightweight features and point solutions. This layer is the most dangerous, because it's the easiest for AI to generate, copy, bundle, and replace.

So SaaS isn't dying wholesale; it's being reshuffled.

The closer to data sources, system entry points, critical processes, and decision loops, the safer.
The more you are merely a describable, copyable, API-able feature, the more dangerous.

The final judgment

AI's biggest blow to SaaS is not making engineers more efficient or support replies faster. It changes the basic unit of software value.

The unit of software value used to be the feature.
It is becoming the workflow.
It may end up being the outcome.

That is the new reality SaaS companies must face.

A feature is no longer enough.
A dashboard isn't either.
A pretty UI even less so.

You must control a chain.
You must own a stretch of context.
You must close a data loop.
You must be able to go from insight to action.
You must make the customer feel that losing you doesn't just mean losing a tool; it means severing a business nerve.

So my judgment on SaaS in the AI era is simple:

SaaS either rises into the workflow, or sinks into a plugin.
Becomes the system, or becomes a feature.
Controls the business chain, or gets put on someone else's menu.

A feature used to feed a company.

That era is over.

A New Shiji: The Chronicle of the CC Leak

"The Chronicle of the Claude Code Leak"

In the twenty-sixth year of Taichu, the lords of the Western lands contended in the arts of intelligence, styled "large models." When the art first arose, all said the algorithm was king, compute was sovereign, and data was life. When it reached its height, the wind suddenly turned, and men said: "Strong though the models be, better to make them serve men's use." Thus the heroes raced to a new pursuit, the so-called "Agent."

The Agent is not mere cleverness of words: it can wield tools, carry out tasks, hold the thread from beginning to end, and bring the matter to completion. The schools sought its way bitterly and found no gate. Some took prompts as their craft, some took plugins as their bridge, some took workflows as their bones; each clutched one end, and none knew the whole.

At that time there was a state in the Western Regions called Anthropic, long famed for pure discourse and cautious speech, styling themselves "gentlemen of safety." Their people were few, yet their words were striking; they often spoke of the arts of "harness" and "alignment," and the literati looked up to them. Their Claude Code astonished the age; beholders sighed: "This is no tool; it is nigh an os."

Yet though their art was fine, their gate was shut fast. Outsiders saw only the light, never the instrument. So followers were many, yet it was as flowers seen through fog: hearts yearned toward it, but none could draw near.

Then, at the end of the third month of that year, on the eve of the Fools' Day, there were strange signs in the heavens, and a strange affair below.

Word ran through the marketplace:
"The source of Claude Code is out, every line of it."

At first all took it for a fool's jest. Some said hackers had done it; some said an insider had leaked it. But on close inspection, it was neither.

At release, a thing had been left behind in the npm package, called a source map.
This thing exists for the craftsman's debugging, recording the correspondence of ciphered text to original.
It should have been deleted and was not; should have been hidden and was not.

And so, with a single command,
the cipher was wholly undone and the true form laid bare, like a beauty stepping from the bath.

Five hundred thousand lines of code, nineteen hundred files and more,
parsed thread by thread, set out before the multitude.

The Grand Historian remarks:
This was not "open source"; it was open-crotch trousers.

When the news went forth, the four quarters shook.

No hacker had come, yet the marketplace was already in tumult;
no mole had moved, yet the crowd had already taken.

On GitHub, the stars surged like a flood;
on Hacker News, debate gathered like clouds.

Scholars raced to download and studied day and night;
upon the whiteboards, diagrams bloomed.

Some said: "So that is how it was done. I slap the table."
Others said: "Is that all it was. A mountain of dung."

In a single moment, the secret arts of yesterday
lay open for all under heaven to behold.

Yet the discerning sighed:

This was not the first time.

For at the start of the year before, there had been another such turn:
the same slip, the same leak.

Then too it was a Source Map,
then too an npm package,
then too a hasty recall that hid nothing and revealed all.

The historians have a saying:
"History does not repeat its form, but it repeats its rhyme."

This is what they meant.

Thus opinion split into two camps.

One camp said:

"This is but a leak of the application layer; no sinew or bone is harmed.
The base model remains closed as before;
compute and data are still in their hands.
Those others have gained the form but not the spirit;
what is there to fear?"

This view pleased the investors greatly.

The other camp said:

"Too light a judgment."

For what made that house's name
lay not in the model, but in its art of the Agent.

Its method of orchestration,
its order of tool dispatch,
the skeleton of its harness:
these were the foundation on which it stood.

And now in one morning all was out,
with almost nothing held back.

This is no wound to skin and flesh;
it reaches sinew and bone.

Yet the Grand Historian, observing, finds a third meaning.

Such arts were never secrets beyond seeing.
Users, watching its conduct, could infer its methods;
engineers, testing its paths, could press ever nearer.

What was needed was only time.

Now, with one morning's leak,
a road of half a year or a year
was shortened to a journey of days.

So the loss is truly painful;
yet the position could not have been held for long.

Stranger still was the timing.

At that hour, the heroes' debates were drifting from the merits of models
toward the question: "How shall AI do the work?"

The wind had already turned;
men's hearts had already moved.

At just that moment,
a complete "runnable system"
suddenly appeared in the world.

Like spring rain arriving all at once.

And so the community flourished.

Forkers stood like forests,
replicators fell like rain.

The likes of open claw grew wings like tigers, branching and spreading.

Scholars unriddled its code;
critics debated its architecture.

What yesterday "could not be spoken"
today is "read line by line."

And the state concerned, how fared it?

Outsiders saw its glory unmatched;
insiders knew the warmth and the cold for themselves.

Investors bled,
founders ached.

The toil of engineering, the work of years,
in one morning was spied by all.

Yet some offered comfort, saying:

"What they have gained is but the old road;
where we are going still lies ahead."

The Grand Historian concludes:

This affair bears three meanings.

The first is accident:
the rules were not strict, the process failed;
it was no one man's crime.

The second is sacrifice:
with one house's loss
was bought the awakening of all under heaven. Is any merit greater?

The third is acceleration:
the industry's road abruptly converged;
everyone's learning advanced at double speed.

As for merit and fault, judge not lightly.

Perhaps crime, perhaps virtue;
perhaps calamity, perhaps blessing.

Yet one line may be set down for posterity:

The tide of all under heaven
does not turn on one man's keeping of a gate. Alas with joy, alas with sorrow.


What the Claude Code TypeScript Leak Really Revealed

A rare x-ray of a frontier coding agent—and why the real story is the harness, not the model

The accidental leak of Claude Code’s TypeScript source was instantly treated as a spectacle: a top-tier AI company shipping its own internals to the public by mistake, the community pouncing on the package within hours, mirrors spreading everywhere, and social media doing what social media always does when blood is in the water. But the real importance of the incident lies somewhere else.

For once, the industry got to peek behind the curtain of a production-grade coding agent—not the model weights, not the training data, not the secret sauce of pretraining, but something arguably more important for the next phase of AI systems: the product-layer machinery that turns a language model into a long-running, tool-using, semi-autonomous software worker.

Multiple outlets reported that the leak came from an npm release of @anthropic-ai/claude-code in which a large JavaScript sourcemap file was mistakenly included, allowing observers to reconstruct the original TypeScript source. Anthropic said the incident was caused by human error in packaging, not by a breach, and that no customer data or credentials were exposed. Reports consistently placed the exposed codebase at roughly 512,000 lines spanning around 1,900 files, enough to give outsiders a surprisingly detailed view of Claude Code’s architecture and internal product logic.
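The mechanics are worth seeing concretely: a sourcemap built with `sourcesContent` embeds every original file verbatim, so "reconstruction" is little more than reading JSON. A toy sketch with an inline example map (not the leaked artifact):

```python
# Why a stray sourcemap is equivalent to shipping the source: maps emitted
# with `sourcesContent` carry every original file verbatim. The inline map
# below is a toy example for illustration, not the leaked artifact.
import json

def extract_sources(map_text):
    """Return {original_path: original_source} from a sourcemap's JSON."""
    m = json.loads(map_text)
    return dict(zip(m.get("sources", []), m.get("sourcesContent", [])))

toy_map = json.dumps({
    "version": 3,
    "file": "cli.js",
    "sources": ["src/main.ts"],
    "sourcesContent": ["export const hello = () => 'world';\n"],
    "mappings": "AAAA",
})

for path, src in extract_sources(toy_map).items():
    print(path, "->", src.strip())
```

Run against a real bundle's `.map`, the same loop writes out the whole original tree, which is why the reported reconstruction took hours, not weeks.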

That distinction matters. This was not a model leak. It was not the release of frontier weights, and it did not suddenly flatten the underlying capability gap between labs. What leaked was the executable skeleton around the model: the code that manages context, orchestrates tool use, enforces permissions, carries state forward, and makes an agent viable over many steps instead of one. In other words, what leaked was not the “mind” of the system, but something closer to its nervous system, musculature, and operating discipline.

That is why the event matters far beyond Anthropic’s embarrassment. It exposed, in unusually concrete form, what the next competitive frontier in AI really looks like. The industry has spent the last two years obsessing over models. Increasingly, the harder problem is not how to make a model answer a question. It is how to make that model work for forty minutes, or four hours, across tools, files, commands, failures, interruptions, and handoffs, without collapsing into confusion or becoming unsafe. Anthropic’s own engineering writing has been moving in exactly this direction for months: away from prompt tricks, and toward context engineering, tool design, agent evaluation, sandboxing, and harness design for long-running tasks.

That shift is the real story.

The leak was interesting because it exposed a system, not a demo

There is a huge difference between an impressive AI demo and a productized agent. A demo shows that a model can do something once. A productized agent has to do it repeatedly, under constraints, with partial failures, ambiguous user intent, changing state, and real permissions. It has to survive success, survive error, and survive boredom. It has to keep working after the novelty wears off.

By the time this leak happened, Claude Code was already clearly far beyond the stage of “an LLM in a terminal.” Anthropic’s documentation and engineering posts describe a system with structured tools, context management, memory layers, subagents, hooks, permission modes, SDK support, and security controls designed specifically for real-world, iterative work. Anthropic has even described Claude Code as a flexible agent harness, which is a telling phrase: not just an assistant, not just a shell wrapper, but a runtime system for sustained model-driven execution.

That language is not cosmetic. It reflects a deep architectural truth. Once an AI system is expected to act rather than merely answer, the harness becomes first-class. The harness is what decides what enters the model’s context, what tools are exposed, what outputs are executable, how risk is bounded, how history is compressed, and how work resumes after interruption. The harness is what lets a model stop being a brilliant intern and start becoming a usable operator.

This is why the leak was so revealing. It made visible the fact that a frontier coding agent is not merely “LLM plus API calls.” It is a layered execution environment.

The architecture we should really be talking about

The cleanest way to understand what Claude Code appears to represent is as an early form of an agent operating system. Not an operating system in the old desktop sense, of course, but an execution layer sitting between human intent and the messy world of files, commands, network access, external tools, and long-lived work.

At the top sits the cognitive layer: the model itself. This is the part that interprets goals, plans steps, decides whether to inspect or edit, whether to run a command, whether to consult a tool, whether to delegate, whether to stop, and whether to revise a previous approach. Anthropic’s own framing of agents is useful here: unlike fixed workflows, agents are systems in which the LLM dynamically directs its own process and tool usage.

Beneath that is the context layer, which is far more important than most people realized during the first wave of prompt engineering. Anthropic’s context engineering work defines the problem as curating and maintaining the optimal set of tokens during inference—not just a prompt, but everything that lands in the model’s window: system instructions, conversation history, tool schemas, retrieved state, memory summaries, and external context. The point is not verbosity. The point is getting the right state into the right place at the right time, while staying within budget.
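That curation step can be pictured as budgeted assembly. A toy sketch; the priority ordering and the characters-per-token heuristic are my own simplifying assumptions, not Anthropic's implementation:

```python
# Toy sketch of context assembly under a token budget. The priority order and
# the ~4-characters-per-token estimate are simplifying assumptions.

def estimate_tokens(text):
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer

def assemble_context(candidates, budget):
    """Greedily pack context pieces by priority until the budget is spent."""
    window, used = [], 0
    for priority, label, text in sorted(candidates):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            window.append(label)
            used += cost
    return window

candidates = [
    (0, "system_prompt", "You are a coding agent." * 4),
    (1, "tool_schemas", "read_file(path) edit_file(path, patch)" * 8),
    (2, "memory_summary", "Project uses pytest; CI runs on push." * 6),
    (3, "full_history", "user: ...\nassistant: ...\n" * 500),
]

# The raw history is the cheapest thing to drop and the first thing cut.
print(assemble_context(candidates, budget=300))
```

Even this crude version captures the core trade: high-priority, compact state (instructions, schemas, summaries) gets in; the raw transcript is the first thing compressed away.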

Then comes the capability layer: tools, skills, subagents, MCP-connected services, hooks, code execution, and the interface contracts through which the model can do real work. Anthropic’s engineering guidance on tools is blunt and correct: tools are the contract between deterministic systems and nondeterministic agents, which means they cannot be designed as if the caller were always a careful human programmer. They must be understandable to the model, robust to ambiguity, and economical in how they return usable context for the next reasoning step.
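That contract can be made concrete: a model-facing tool needs an unambiguous description, strict input validation, and a compact return that feeds the next reasoning step. A generic sketch; the schema shape and names are illustrative assumptions, not Anthropic's actual tool format:

```python
# Generic sketch of an agent-facing tool contract: explicit description,
# validated input, and a compact, reasoning-friendly result. The schema shape
# is an illustrative assumption, not any vendor's actual tool format.

def search_notes(query, limit=3):
    """Search project notes. Returns at most `limit` short snippets."""
    notes = {
        "deploy": "Deploys go through CI; never rsync by hand.",
        "tests": "Run pytest -q before every commit.",
        "style": "Four-space indent, no tabs.",
    }
    if not isinstance(query, str) or not query.strip():
        # Fail loudly and descriptively so the model can self-correct.
        return {"error": "query must be a non-empty string"}
    hits = [v for k, v in notes.items() if query.lower() in k]
    return {"query": query, "hits": hits[:limit], "truncated": len(hits) > limit}

TOOL_SCHEMA = {
    "name": "search_notes",
    "description": "Search project notes by keyword. Use before editing "
                   "unfamiliar files. Returns up to `limit` snippets.",
    "input": {"query": "string, non-empty keyword", "limit": "int, default 3"},
}

print(search_notes("deploy"))
print(search_notes(""))
```

Note the two deliberate choices: errors come back as descriptive data the model can recover from, and results are truncated and labeled rather than dumped, keeping the return token-economical.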

Below that sits the execution and safety layer. This is where many agent demos quietly die when exposed to reality. If the system can read files, edit code, run shell commands, browse networks, and touch external services, then it needs enforcement—not vibes, not promises, but hard boundaries. Anthropic’s sandboxing work makes this point clearly: if you want to reduce user interruption without inviting disaster, you need OS-level controls such as filesystem isolation and network restriction. In their write-up, the emphasis is not on polite model behavior but on containment via operating-system primitives. That is exactly the right instinct.

Finally, there is the continuity layer: everything needed for long-running work to remain coherent across time. This is where “chatbot thinking” breaks down. Long tasks span multiple context windows. They pause, resume, compress, branch, and sometimes recover after failure. Anthropic’s engineering writing on long-running agents explicitly calls out this challenge: an agent can do good work inside a single context window, but making consistent progress across many such windows is still an open systems problem.
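The continuity problem reduces to making task state serializable and re-injectable at the start of the next session. A minimal sketch under my own assumptions about what such state might contain; this is not Claude Code's actual memory format:

```python
# Minimal sketch of serializable task state for resuming across sessions.
# The field names are my own assumptions, not Claude Code's memory format.
import json

def checkpoint(state):
    """Serialize progress so a fresh session can pick up the shift."""
    return json.dumps(state, indent=2)

def resume(blob):
    """Rehydrate state and derive a compact handoff note for the new context."""
    state = json.loads(blob)
    return (f"Resuming '{state['goal']}': {len(state['done'])} steps done, "
            f"next: {state['todo'][0]}")

state = {
    "goal": "migrate test suite to pytest",
    "done": ["converted unittest classes", "fixed imports"],
    "todo": ["port fixtures", "update CI config"],
    "dirty_files": ["tests/test_api.py"],
}

print(resume(checkpoint(state)))
```

The handoff note, not the full transcript, is what the next context window receives; that is the difference between a shift change and an amnesiac replacement.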

Put those layers together and the picture becomes clear. A serious agent is no longer just a model. It is a control plane.

Why the most important word here is “harness”

“Harness” may sound like humble engineering terminology, but it is quickly becoming one of the defining words of the agent era.

A harness is the difference between a clever system and a dependable one. It is what transforms a raw generative model into a bounded actor that can perceive, plan, act, recover, and continue. The model reasons. The harness operationalizes that reasoning.

This is not a semantic distinction. It is the central engineering challenge of the field. Anthropic has been unusually explicit about this in its public writing. Their posts on long-running agents, tool design, multi-agent research, and agent evaluation all converge on the same principle: if you want real-world agentic performance, you must stop treating the model in isolation. Evaluation must include the transcript and the outcome. Tool interfaces must be engineered for model use. Context must be curated rather than dumped. State must be compressed across sessions. Autonomy must be mediated by permissions and environment controls.

That is what the leak inadvertently dramatized. The exposed code appears to have fascinated people not because it contained mystical prompts, but because it showed the accumulated scaffolding required to make an agent actually run. Even media coverage of more playful findings—such as references to a Tamagotchi-style pet or an internal “KAIROS” mode suggestive of a more always-on agent behavior—was interesting mainly because it hinted at a system that was already far more productized and exploratory than a public CLI façade would suggest. Those features were reported from code analysis and media review, not from official feature launches, so they should be treated cautiously. But even as signals, they reinforce the broader point: the product surface is only the visible edge of a much deeper execution architecture.

Long-running tasks are where the romance ends and the engineering begins

The industry has become very good at showcasing one-shot intelligence. Ask a hard question, get a sharp answer. Request a file edit, receive a plausible patch. That is the easy part, or at least the easier part.

The much harder problem is longitudinal coherence. Can the system stay useful after thirty tool calls? Can it remember what it already verified? Can it summarize its own work productively rather than dragging a giant transcript forever? Can it stop repeating failed actions? Can it resume from a checkpoint without becoming a different personality with amnesia? Can it work under constrained permissions without constant babysitting?

This is where modern agents either become infrastructure or stay toys.

Anthropic’s public materials make clear that Claude Code tackles this not by pretending every session is one endless conversation, but by treating continuity as a separate engineering concern. Their documentation around memory shows that sessions begin with fresh context windows, while persistent project knowledge can be reintroduced through artifacts such as CLAUDE.md and auto-loaded memory. That is a subtle but important design choice. It rejects the fantasy that bigger windows alone solve persistence. Instead, it treats persistence as a state-management problem: what should be carried forward, in what form, and at what granularity.

That design instinct is more profound than it may first appear. Long context is not memory in the full systems sense. It is a larger desk, not a durable institutional mind. Real memory for agents has at least three distinct forms.

One is task state: what has already been done, what remains open, and what the current frontier of work is. Another is policy memory: the rules, conventions, and preferences that should shape behavior across sessions. A third is experiential memory: what approaches worked, what failed, and what patterns the system should prefer next time.

The harness has to decide how these are stored, when they are retrieved, and how they are compressed so they remain useful instead of becoming token sludge. That is not the model’s “natural intelligence.” That is systems engineering.
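The three memory forms can be sketched as a toy store with a compression hook. The field names are illustrative, and truncation here stands in for the real summarization a production harness would perform:

```python
# A toy shape for the three memory forms, with a compression hook so
# carried-forward state stays small. Names are illustrative; truncation
# stands in for summarization.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    task_state: list[str] = field(default_factory=list)   # done / open items
    policies: list[str] = field(default_factory=list)     # standing rules
    experience: list[str] = field(default_factory=list)   # what worked, what failed

    def compress(self, keep_last: int = 3) -> None:
        # Naive compression: keep only the most recent task notes.
        # A real harness would summarize rather than drop.
        self.task_state = self.task_state[-keep_last:]

mem = AgentMemory()
mem.policies.append("Never push directly to main.")
for step in range(10):
    mem.task_state.append(f"step {step} verified")
mem.experience.append("pytest -x surfaces failures fastest")
mem.compress()
```

Note what survives compression untouched: policy memory. Rules must persist at full fidelity even as task state is squeezed — that asymmetry is the design decision.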

Tools are not APIs anymore—at least not in the old sense

One of the most consequential implications of this leak is what it says about the future of software interfaces.

For the app era, APIs were built mainly for programmers. They assumed explicit calls, disciplined arguments, deterministic control flow, and external orchestration. In the agent era, that is no longer enough. The caller is often a probabilistic planner operating through language and partial context. It may misunderstand boundaries, misuse a tool, or invoke the right capability at the wrong moment. The interface therefore has to be legible not just to humans, but to models.

Anthropic’s guidance on writing effective tools for agents makes exactly this point. Tools should have clear names, clear boundaries, concise but informative descriptions, and outputs that help the model make the next decision rather than merely dumping raw data. This is more than documentation polish. It is a new interface discipline.

That is why I increasingly think the old vocabulary—API, plugin, extension—does not quite capture what is emerging. A high-quality agent skill is not just a wrapped endpoint. It is an executable capability unit designed for model planning, model invocation, error recovery, policy enforcement, observability, and often token efficiency. It is closer to a syscall with documentation, guardrails, and telemetry than to a classic web API.

This is also why capability density may matter more than raw model parity in the next competitive phase. Once leading models are all reasonably capable, the decisive difference may be the richness and quality of the harnessed capability environment: how many reliable skills exist, how composable they are, how well they are described, how safely they execute, how efficiently they pass context, and how well they integrate into longer task loops.

In that world, the ecosystem moat shifts upward. The battle is no longer only about who has the smartest model. It is also about who has the most usable action surface.

Multi-agent systems only matter if they improve division of labor

The leak also adds fuel to another active debate: whether multi-agent architectures are genuinely useful or just elaborate theater.

Here again, Anthropic’s public engineering perspective is more sober than much of the discourse. In its write-up on the company’s multi-agent research system, the key challenge is not “more agents equals more intelligence.” It is delegation. The orchestrator must know when to hand work off, how to specify the task, how to constrain the subagent, and how to turn partial results into progress without wasting effort or creating contradictory work streams.

That is the right framing. Multi-agent systems make sense when they create cleaner division of labor. A read-only exploration agent can map the repository. A planning agent can structure the work. An execution agent can edit and run tests. A verification layer can judge outputs. A human can step in only at leverage points. This is not “a bunch of bots chatting.” It is a labor system.
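That division of labor can be sketched minimally. The roles and handoff format below are invented stand-ins, not Anthropic's architecture; the point is that each worker receives a narrow, explicit spec and returns a structured result:

```python
# Minimal delegation sketch: an orchestrator hands a bounded spec to
# each subagent and accepts only structured results. Roles and payloads
# are illustrative.
from typing import Callable

def explorer(spec: dict) -> dict:
    # Read-only role: pretend to map the repository named in the spec.
    return {"role": "explorer", "files": ["src/app.py", "tests/test_app.py"]}

def executor(spec: dict) -> dict:
    # Write-capable role: pretend to apply the requested edit.
    return {"role": "executor", "edited": spec["target"]}

def orchestrate(task: str, workers: dict[str, Callable[[dict], dict]]) -> list[dict]:
    results = []
    # Hand off with an explicit, narrow spec per worker.
    results.append(workers["explore"]({"goal": f"map repo for: {task}"}))
    target = results[-1]["files"][0]
    results.append(workers["execute"]({"target": target}))
    return results

log = orchestrate("fix failing test", {"explore": explorer, "execute": executor})
```

The orchestrator's job is visible even in this toy: it turns the explorer's partial result into the executor's input, rather than letting two generalists talk past each other.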

Seen that way, subagents are not an indulgence. They are the first signs of specialization inside AI runtime environments. Once tasks become large enough, one generalized process becomes clumsy. You want bounded workers, each with specific tools, scopes, and expected outputs. That is not unlike how modern computing systems evolved from single-process simplicity to structured concurrency and process isolation.

The lesson is simple: multi-agent is not a religion. It is organization design.

Safety, in practice, means the model does not get to be trusted by default

One of the deeper ironies of the Claude Code leak is that it hit a company whose public identity is heavily tied to safety. That irony wrote itself on social media. But the more interesting observation is technical.

When people say “AI safety,” many still imagine abstract alignment discourse or content filtering. Yet in real agent systems, a huge fraction of practical safety is operational: what the agent can access, what it can execute, which network paths are open, which approvals are required, and how exceptions are handled when the model confidently heads in the wrong direction.

Anthropic’s engineering material on sandboxing and permissions points toward a mature answer. Permissions alone are not enough if they require the human to approve every move. That destroys flow and keeps the system from becoming truly useful. But letting the model run without constraints is equally untenable. The way forward is layered enforcement: policy classifiers, execution sandboxes, file and network boundaries, and extension points such as hooks where custom organizational policies can be injected.
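The layered-enforcement idea can be sketched as a chain of hook functions, each able to allow, deny, or escalate to a human; the first non-allow verdict wins. The policy names and action format below are invented for illustration:

```python
# Sketch of hook-style layered enforcement. Each policy inspects a
# proposed action and returns a verdict; enforcement short-circuits on
# the first non-allow. Policies and the action schema are hypothetical.
from typing import Callable

Action = dict
Verdict = str  # "allow" | "deny" | "ask_human"

def deny_network_writes(action: Action) -> Verdict:
    if action["kind"] == "network" and action.get("method") == "POST":
        return "deny"
    return "allow"

def escalate_deletes(action: Action) -> Verdict:
    if action["kind"] == "file" and action.get("op") == "delete":
        return "ask_human"
    return "allow"

def enforce(action: Action, hooks: list[Callable[[Action], Verdict]]) -> Verdict:
    for hook in hooks:
        verdict = hook(action)
        if verdict != "allow":
            return verdict
    return "allow"

hooks = [deny_network_writes, escalate_deletes]
v1 = enforce({"kind": "file", "op": "read", "path": "a.txt"}, hooks)
v2 = enforce({"kind": "file", "op": "delete", "path": "a.txt"}, hooks)
v3 = enforce({"kind": "network", "method": "POST", "url": "x"}, hooks)
```

Because hooks are just an ordered list, an organization can inject its own policies without touching the core loop — which is the extension point the source describes.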

That is a fundamentally important design philosophy. It says that reduced human interruption should come not from blind trust in the model, but from stronger environmental guarantees around it. In other words, you do not make autonomy safe by teaching the tiger manners. You make it safe by building the enclosure properly.

This is also where the phrase “OS-level harness” becomes more than metaphor. Once agent systems interact with the real world, they start inheriting the old truths of operating systems and security engineering: privilege separation matters, isolation matters, explicit boundaries matter, auditability matters, and resumability matters. The romance of “AI that just figures it out” runs into the granite of systems design.

What the industry should learn from this moment

It would be easy to reduce the whole affair to a cautionary tale about release engineering, and it certainly is that. A misconfigured packaging process or an overlooked sourcemap can expose an extraordinary amount of internal detail. The operational lesson is obvious and a bit humiliating: modern AI companies, no matter how sophisticated, are still software companies, and software companies can still trip over the oldest rake in the yard.

But that would be the shallow lesson.

The deeper lesson is that frontier agent systems are now being built as full-stack execution environments. The model is still central, but it is no longer the whole product. Context curation, memory persistence, tool ergonomics, task orchestration, sandboxing, permissions, subagent specialization, evaluation methodology, and session-to-session continuity are all becoming part of the competitive core. Anthropic’s public work has effectively been spelling this out for over a year; the leak merely made the abstract thesis concrete.

That is why this incident will likely matter more as a strategic signal than as a one-off embarrassment. Competitors did not gain model weights, but they gained something almost as valuable for the near term: a sharper picture of how one of the leading coding agents is assembled into a production system. Even if no one can simply clone the whole thing, the leak accelerates convergence around architecture patterns. It teaches by exposure.

And perhaps most importantly, it nudges the broader AI conversation toward the right level of abstraction. The real frontier is no longer just intelligence in the narrow sense. It is controlled, sustained, economically useful agency.

The bigger picture: agents are becoming a new execution layer for software

If there is one conclusion worth carrying forward, it is this:

The future of agents is not “a better chatbot.” It is a new execution layer between human intent and software reality.

In the app era, users navigated menus, forms, dashboards, tabs, and icons. In the API era, developers stitched services together manually. In the agent era, the user increasingly declares intent, and a model-centered runtime translates that intent into a sequence of bounded actions across tools, files, services, and state.

That runtime needs memory. It needs policy. It needs permissions. It needs a tool contract. It needs recovery logic. It needs evaluation. It needs observability. It needs all the dull, durable things that software needs when it stops being a trick and starts becoming infrastructure.

Claude Code, as glimpsed through this leak and through Anthropic’s own public architecture writing, looks less and less like “an assistant that can code” and more like an early agent operating environment for software work. That is why the leak was so revealing. It showed that behind the glamour of modern AI lies a quieter but far more consequential truth:

The model may provide the intelligence, but the harness provides the agency.

And in the long run, agency is where the real systems battle will be fought.

 

When Agents Become the Default Gateway, Will the Operating System Be Rewritten?

The answer isn’t “will it happen?” It’s already happening. Just not in the way we’re used to.

The Operating System in the Agentic AI Era

I. The history of operating systems is, at its core, a war over the front door

Each generation of operating systems didn’t merely improve kernels. It reorganized the entry point—how humans express intent.

DOS: the command line was the entry point.
Windows / macOS: the desktop GUI became the entry point.
iOS / Android: app icons became the entry point.
The web era: the browser became the entry point.

The strategic heart of an operating system has never been the kernel. It’s the question: how does a user make something happen?

Change the front door, and the entire software ecosystem gets reshuffled.

II. Agents change the way intent is expressed

In the old model, doing something looked like this:

You want something done → open an app → find the feature → click through the workflow.

In the agentic model, the loop becomes:

You want something done → tell an agent → the agent orchestrates the system.

This is not a feature upgrade. It’s the disappearance of the old entry point. Recent “OS-level agent” moments—whether you look at stunning phone demos like Doubao’s, or the grassroots explosion around OpenClaw—make one thing unusually vivid: when users stop opening apps and agents start calling them, apps stop being the front door. They become capability modules.

In that world, the operating system is no longer organized around an “app launcher.” It’s organized around a permission orchestrator.

That is the structural change.

III. When the agent becomes the default entry point, three things happen to the OS

3.1 UI moves to the second row

The UI doesn’t disappear, but it stops being the center of gravity. The interface becomes a governance tool, not an operation tool. It naturally splits into three roles:

a visualization layer
an approval layer
an audit layer

The real execution logic lives in the background orchestration layer. Icons shrink in importance. Menus fade. “Workflows” get flattened.

(1) Visualization layer
In traditional software, the UI is a control panel: you press buttons to cause actions.

In the agent era, actions happen in the background. The UI’s primary job is to tell you what happened:

what the agent plans to do
what it is doing right now
what it has completed

If the agent books your flights, reorganizes your files, refactors your code, or runs a batch of API calls, you don’t click through each step. You supervise the plan and the outcome. The UI becomes closer to an aircraft instrument panel than a steering wheel.

(2) Approval layer
This layer becomes critical the moment agents gain execution authority. Some actions must require explicit human confirmation:

deleting 2,000 files
wiring $5,000
signing a contract
sending sensitive data outside the organization

Now the UI isn’t a collection of “features.” It’s a set of risk checkpoints. Its core function is not “click to do,” but “authorize or deny.”

It must show:

risk level
blast radius
confirm / reject controls

This is the UI as the human’s final vote.
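A toy version of such a risk checkpoint, with invented thresholds and an invented risk score, shows the mechanic: low-risk actions proceed without interruption, high-risk actions block until a human votes:

```python
# Toy approval gate: high-risk actions block on a human decision.
# The risk scoring and thresholds are made up for illustration.
from dataclasses import dataclass

@dataclass
class PlannedAction:
    description: str
    blast_radius: int   # rough count of affected objects
    reversible: bool

def risk_level(a: PlannedAction) -> str:
    if not a.reversible or a.blast_radius > 100:
        return "high"
    return "low" if a.blast_radius <= 10 else "medium"

def gate(a: PlannedAction, human_approves) -> bool:
    level = risk_level(a)
    if level == "high":
        # Surface risk level and blast radius, then take the human's vote.
        return human_approves(f"[{level}] {a.description} (affects {a.blast_radius})")
    return True  # low/medium actions proceed without interruption

wipe = PlannedAction("delete 2,000 files", blast_radius=2000, reversible=False)
rename = PlannedAction("rename one file", blast_radius=1, reversible=True)
allowed_wipe = gate(wipe, human_approves=lambda prompt: False)
allowed_rename = gate(rename, human_approves=lambda prompt: False)
```

The design question hiding in `risk_level` is the hard part: who defines "high," and whether the agent itself is ever allowed to reclassify an action downward.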

(3) Audit layer
If an agent can execute continuously, you can’t watch every step. The OS must surface accountability:

execution logs
tool and API call traces
permission usage history
resource consumption (tokens, API spend, data egress)
anomaly alerts

This looks less like a classic app UI and more like:

a bank statement
a cloud access log
a flight recorder

The UI becomes an interface for responsibility. It doesn’t help you “do the work.” It helps you know what happened—and assign blame when something goes wrong.

Put side by side, the shift is stark.

Traditional app UI:
menus, buttons, forms, step-by-step workflows

Agent-era UI:
plans, summaries, risk prompts, permission grants, audit trails

You are no longer the operator. You are the supervisor.

And that’s not just an interaction change—it’s philosophical.

Before: humans operate; software executes.
After: agents operate; humans arbitrate.

So the UI naturally migrates toward feedback, authorization, and oversight.

A concrete example
Imagine a future macOS where you say:

“Turn last year’s client invoices into a financial report.”

The agent quietly:

searches files
extracts data
calls spreadsheet tooling
uses email APIs if needed
generates a PDF

And the UI shows only:

a plan of steps
a warning: 3 anomalous files detected
a lock: authorize access to the finance folder?
a result: report generated

You didn’t “open” any app. You supervised. The UI didn’t vanish—it evolved from a control panel into a responsibility panel. And whoever controls that panel controls the final decision.

That is what the OS must defend.

3.2 The permission system becomes the core asset

Classic OS security models are built around:

file permissions
process isolation
sandboxing

But the agent era demands something more dynamic:

just-in-time permission grants
temporary execution authorization
revocable capability interfaces
verifiable execution logs

The OS shifts from a resource management system into a governance system for delegated execution.
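The just-in-time, revocable grant can be sketched as a lease: authority is issued for a purpose and a duration, logged at grant and at use, and fails closed once revoked or expired. The API shape here is hypothetical:

```python
# Sketch of a just-in-time, revocable, time-limited capability grant.
# The interface is invented; the point is that authority is leased and
# logged rather than granted permanently.
import time

class Grant:
    def __init__(self, capability: str, ttl_seconds: float):
        self.capability = capability
        self.expires_at = time.monotonic() + ttl_seconds
        self.revoked = False

    def valid(self) -> bool:
        return not self.revoked and time.monotonic() < self.expires_at

class Governor:
    def __init__(self):
        self.log: list[str] = []

    def grant(self, capability: str, ttl_seconds: float) -> Grant:
        self.log.append(f"grant {capability}")
        return Grant(capability, ttl_seconds)

    def use(self, grant: Grant) -> bool:
        ok = grant.valid()
        self.log.append(f"use {grant.capability}: {'ok' if ok else 'denied'}")
        return ok

gov = Governor()
g = gov.grant("write:finance_folder", ttl_seconds=60)
first = gov.use(g)    # within TTL, not revoked
g.revoked = True
second = gov.use(g)   # revoked grants fail closed
```

The log is not decoration: in a governance system for delegated execution, the record of who held what authority when is itself the core asset.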

3.3 APIs rise; apps fade

When agents are the default gateway, UI value goes down and API value goes up. The ecosystem starts to look like:

foreground: one “super agent”
background: countless capability interfaces

In that world, the App Store itself may morph—from an “app market” into a “skill market.” Users don’t download apps; agents call capabilities. Distribution is rewritten.

IV. Why big platforms don’t fully open the gates

Because once an agent becomes the default entry point:

OS vendors lose the privileged control that UI once provided
the app ecosystem gets abstracted into a capability layer
revenue models face renegotiation

If every iPhone app becomes a background capability and the user interacts primarily through an agent, do app icons still matter? Does the 30% toll still feel defensible?

Entry-point control is profit control. That is why platform players ship agent features cautiously and incrementally.

When a product like Doubao pushes toward OS-level agency and triggers visible pushback, it’s not mysterious what it threatens. But the direction is hard to reverse: once consumers taste the productivity of an OS-level agent, they rarely want to go back to tapping through menus.

V. OpenClaw is a preview of an “ungoverned OS”

OpenClaw is, in essence, a simplified shell of an agent operating system.

It lacks mature permission governance. It lacks compliance frameworks. It lacks serious auditing. And yet it demonstrates a key fact:

model + permission orchestration + local execution is already enough to simulate a micro-OS.

That is why it shocks people. Not because it invented new intelligence, but because it shows what happens when you attach intelligence to execution without governance.

VI. The real future shape

When agents become the default gateway, the operating system becomes:

a permission allocation platform
an execution-log platform
a capability marketplace
a risk-control hub

UI gets simpler. Apps become invisible. Capabilities become modular.

The user sees a conversational entry point. Underneath is a governance engine for delegated action.

VII. Final judgement

Agents will not eliminate operating systems. They will force operating systems to evolve—from “resource schedulers” into “arbiters of delegated execution.”

The core asset in the agent era is the power to define boundaries:

what can be done
by whom
under what permissions
with what logs and accountability

Whoever defines those boundaries becomes the next platform.

 

 

https://liweinlp.com/category/english

 

When Agents Become the Default Gateway, Does the App Store Model Collapse?

My answer: not immediately. But its structural profits will be quietly, steadily eroded—and the way it happens is subtle enough that many people won’t notice until the numbers start to move.

In the mobile era, we got used to a simple truth: if you control the home screen, you control the money. The App Store was never just a software catalog. It was a tollbooth placed at the one place users had to pass through.

That premise is what the Agent era challenges.

I. The App Store Doesn’t Really Sell Apps—It Sells Gatekeeping

The App Store’s core asset has never been “distribution” in the neutral, technical sense. Distribution is a commodity now. What the App Store truly owns is the gate:

the default user entry point
the power to route attention and traffic
control of the payment rail
the right to tax the ecosystem

In the classic mobile loop, the sequence looks like this:

user → opens an app → uses a service
platform controls the entry point → takes ~30%

That structure works for one reason: the user must consciously open the app. As long as the app icon is the front door, the platform owns the doorframe—and can charge rent.

II. The Fatal Change in the Agent Era: Apps Stop Being the Entry Point

Once an agent becomes the default gateway, the flow changes into something like:

user → tells the agent → agent dispatches capabilities → calls an app’s backend APIs

The key shift is psychological as much as architectural: the user no longer “opens an app.” The app becomes a background capability provider.

And when the user can’t even tell which app is being used, two things happen at once:

brand gravity weakens
entry-point value decays

Traffic follows the new front door. Whoever controls the agent increasingly controls attention and intent. And that is the App Store’s structural threat in one sentence.

III. The App Store Won’t Disappear—But It Can Be Hollowed Out

This won’t look like a dramatic collapse. It will look like slow “hollowing,” where the storefront still exists, but its economic center of gravity shifts. Three changes are likely.

First: fewer UI-heavy apps.
A large class of utility apps—especially those built around routine workflows—will be absorbed into agent behavior:

calendar coordination
lightweight editing
information aggregation
copy-and-paste data movement

These become invisible background functions. Users may not know which product is powering the result, and they won’t care—until someone asks who gets paid.

Second: the commission logic gets challenged.
If an agent can complete a purchase by calling a cloud API directly—without going through an in-app purchase flow—the traditional platform toll lane can be bypassed.

The 30% model works best when the platform owns the transaction surface. Agents, by design, prefer capability surfaces: web APIs, service endpoints, programmable commerce. That route is harder to tax.

Third: a “skills market” starts to replace an “apps market.”
It’s not hard to imagine an ecosystem that looks more like:

agent skill marketplaces
capability modules / plugins as tradable units
API ecosystems designed for agent orchestration

In that world, the store doesn’t vanish. It mutates. It stops selling “apps” as user-facing products and starts selling “capabilities” as agent-callable services. That’s a form shift—not an extinction event.

IV. The Real Conflict Isn’t the App Store—It’s Who Owns the Default Agent

The strategic question is not whether an App Store survives. The strategic question is: who becomes the default agent?

If it’s Apple’s agent, the App Store is absorbed and reinterpreted inside a new orchestration layer.

If it’s an OpenAI/Anthropic-style agent, the platform can be partially bypassed—relegated to infrastructure while value capture migrates elsewhere.

If it’s a local, open-source agent (think OpenClaw-like trajectories), then platform rent extraction weakens: the platform remains in the chain, but with far less bargaining power.

Once entry-point control shifts, profit follows. This is the true reason platforms are anxious. It’s not a debate about UX. It’s a battle over who owns the choke point.

V. Why Big Platforms Move So Carefully on Agents

This is why the largest platforms push agents with visible caution. They are walking a tightrope.

If their agent is too strong:

users open fewer apps
platform commission pressure increases
developer economics get restructured

If their agent is too weak:

users migrate to third-party agents
entry-point control gets stolen
the platform becomes a “hardware shell” around someone else’s brain

It’s a delicate game. The likely strategy is not “build an agent that replaces apps,” but “build an agent that strengthens the existing ecosystem while preventing displacement.”

Agents won’t directly destroy the App Store. But they can demote it—from an entry-point platform into a capability supply market.

Entry-point value compresses. Profit formulas get rewritten. And the ultimate winner is not the party who sells apps, but the party who defines the orchestration rules.

VI. The Final Question

The mobile internet era rewarded whoever controlled the entry point.

The agent era will reward whoever controls intent interpretation and execution scheduling.

When a user says just one sentence—“get this done for me”—the person (or system) deciding where the request gets routed is the one deciding where the money flows.

At that moment, the most valuable asset is no longer the app icon on the home screen.

It’s the agent in the background doing the dispatch.

 


The Great Software Shake-Up of the Agent Era — Starting with OpenClaw

I. OpenClaw is a structural event.

What makes OpenClaw shocking isn’t a new algorithm. It’s the fact that it exposes a new reality:

LLM capability + local execution privileges + open-source scaffolding is already enough to rewrite how software gets produced.

When a solo developer can stitch together an agent with something close to “OS-level permissions” using off-the-shelf models and open frameworks, it tells us something uncomfortable yet important: raw capability is no longer scarce. The scarce variable is now composability—the ability to combine tools, permissions, and workflows into outcomes.

And composability isn’t linear. It’s exponential. When your building blocks are callable functions, “more blocks” doesn’t add—it multiplies.

II. Why “80% of software” gets swallowed

Once agents can:

understand natural language intent directly,
break a task into steps automatically,
call tools dynamically,
and correct their own execution paths in real time,

a huge category of “workflow-frozen software” starts losing value fast.
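Those four abilities reduce to a small loop: decompose, execute, check, correct. A sketch with hard-coded stand-ins for what would be model-driven decisions (the decomposition, the simulated failure, and the retry budget are all invented):

```python
# The four abilities above as a minimal loop: decompose a task, run a
# tool per step, check the result, and retry along a corrected path.
# Everything here is a stand-in for model-driven choices.
def decompose(intent: str) -> list[str]:
    # Stand-in planner: a fixed two-step plan.
    return ["locate data", "transform data"]

def run_tool(step: str, attempt: int) -> bool:
    # Stand-in execution: pretend the first transform attempt fails.
    return not (step == "transform data" and attempt == 0)

def execute(intent: str, max_retries: int = 2) -> list[str]:
    trace = []
    for step in decompose(intent):
        for attempt in range(max_retries + 1):
            if run_tool(step, attempt):
                trace.append(f"{step}: ok (attempt {attempt})")
                break
            trace.append(f"{step}: failed (attempt {attempt}), correcting")
    return trace

trace = execute("turn invoices into a report")
```

Workflow-frozen software hard-codes the `decompose` step forever; an agent regenerates it per intent, which is precisely why the frozen variety loses value.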

For decades, software has trained humans to adapt to software. You open the right app, learn the right menu tree, follow the prescribed workflow, and hope your problem fits the box. The agent era flips the direction: software adapts to human intent.

That shift has a brutal implication: the core of software stops being UI, menus, and fixed workflows. The core becomes APIs and capability interfaces. Everything in the middle—the layers whose main job is turning workflows into a clickable experience—gets compressed.

Many products won’t “die.” They’ll be absorbed.

Tools that are mostly UI-wrapped procedures.
SaaS products that are largely data shuttling.
Systems whose main value is rigid rule execution.

Agents don’t need to replace them by competing head-on. They can simply embed them as invisible steps in an orchestration graph.

III. The moat is moving

Traditional software moats looked like:

complex feature depth,
data lock-in,
sticky workflows,
custom enterprise integrations.

But in an agent world, features can be composed on demand, workflows can be generated dynamically, and data can be surfaced through standardized interfaces. The moat migrates to things that are harder to synthesize by “tool composition” alone:

high-quality proprietary data assets,
specialized vertical knowledge,
security, compliance, and governance maturity.

Put bluntly: software shifts from selling features to selling capability access and safe execution.

In the agent era, the winning product is less “a beautiful UI” and more “a reliable interface to real power—with guardrails.”

IV. Startups are being rewired

The classic playbook for software startups was familiar:

pick a scenario,
build a product,
polish the UX,
retain users,
scale subscriptions.

The agent-era playbook is different:

pick a high-value capability domain,
expose it as an agent-callable interface,
integrate into the skills ecosystem,
create value through execution, not clicks.

Entrepreneurship shifts from “building an app” to building callable capability modules.

In a world where agents orchestrate work, owning the right tool interface is like owning a critical interchange on a highway system. You don’t need to be the entire city. You just need to sit on the route everything passes through.

V. Investment logic is being repriced

Investors used to ask:

How many users do you have?
What’s your ARR?
What’s your SaaS retention?

Increasingly, the questions will mutate into:

Can your capability be orchestrated by agents?
Do you control a defensible data interface?
Is your execution verifiably safe—auditable, permissioned, compliant?

Valuation logic will follow. Pure “feature SaaS” gets pressured. Execution infrastructure and governance layers get rewarded.

Because in the agent era, the truly expensive asset isn’t UI. It’s the right to execute—safely.

VI. Local agents are a transitional form

OpenClaw’s explosion also reveals something practical: demand for action-oriented AI is already there. People don’t just want a model that answers. They want a system that does things.

But local deployment is likely a bridge, not the destination. At scale—especially in enterprises—agents will converge toward:

cloud integration,
enterprise-grade governance,
least-privilege architectures,
compliance and audit systems.

Individuals can unlock power by removing constraints. The commercial world must do the opposite: it has to constrain power before it ships.

The long-term winners won’t be those most willing to grant authority. They’ll be those best at granting authority safely.

VII. Software won’t disappear. It will become invisible.

OpenClaw’s creator suggested that “maybe 80% of software will lose its value.” The number may be rhetorically inflated. But the direction is right.

Software doesn’t vanish. It goes dark.

Users stop operating software directly. Agents operate software on their behalf. Products shift from foreground experiences to background capability modules.

That’s not a collapse. It’s an industrial migration.

VIII. The real watershed isn’t OpenClaw. It’s what it forces us to talk about next.

OpenClaw isn’t the endpoint. It’s the first public, living demonstration of something many suspected:

LLMs are already capable of executing real-world tasks—if you give them the keys.

For the past two years, the mainstream conversation was “intelligence augmentation.” In the next few years, the dominant conversation will be delegated execution:

Who sets the boundaries of capability?
Who defines execution permissions?
Who bears responsibility when things go wrong?

Those questions—more than model size or benchmark scores—may determine where the next generation of tech giants comes from.

Closing

The significance of OpenClaw isn’t what it did. It’s what it made obvious:

the software era is ending, and the capability era is beginning.

And in the capability era, what’s truly scarce isn’t the model. It’s controllable execution power.

Authority and safety are natural enemies. The biggest winners will be the ones who can make them coexist—without pretending the tension isn’t real.


Some Basic Agentic AI terminology

In the Agent era, the most common confusion is not technical — it’s architectural. We keep mixing abstraction layers, and then we end up debating terms that were never meant to be equivalent:

Is a plugin basically an app?
Is a special agent the new app?
What’s the difference between an API and a skill?
Is a general agent a tool, or a platform?

If we don’t separate layers, these questions will keep looping forever. So here’s a clean mental model: a six-layer stack from intent down to execution.

Human Intent → General Agent → Special Agent → Skill → Plugin → API
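The six-layer stack above can be sketched in a few dozen lines. This is a minimal toy, not a real framework: every name (`flight_api`, `booking_plugin`, `TravelAgent`, the `payments` permission) is illustrative, and the "planning" step is a hard-coded keyword match standing in for an LLM.

```python
# Toy sketch of the six-layer stack; all names are illustrative, not a real framework.
# Human Intent -> General Agent -> Special Agent -> Skill -> Plugin -> API

def flight_api(params):                      # Layer 6: passive endpoint, never initiates
    return {"booked": True, "dest": params["dest"]}

def booking_plugin(args):                    # Layer 5: execution wrapper (validation, errors)
    if "dest" not in args:
        raise ValueError("missing dest")
    return flight_api(args)

BOOK_FLIGHT_SKILL = {                        # Layer 4: capability declaration, no code runs here
    "name": "book_flight",
    "params": ["dest"],
    "permissions": ["payments"],
    "plugin": booking_plugin,
}

class TravelAgent:                           # Layer 3: domain specialist
    skills = {"book_flight": BOOK_FLIGHT_SKILL}
    def run(self, task):
        skill = self.skills[task["skill"]]
        return skill["plugin"](task["args"])

class GeneralAgent:                          # Layer 2: entry point and scheduler
    def __init__(self, specialists, granted):
        self.specialists, self.granted = specialists, granted
    def handle(self, intent):                # Layer 1: natural-language intent
        # stand-in for LLM planning: route travel intents to the travel specialist
        if "flight" in intent:
            plan = {"skill": "book_flight", "args": {"dest": "SFO"}}
            needed = self.specialists["travel"].skills[plan["skill"]]["permissions"]
            if not set(needed) <= self.granted:
                return {"blocked": needed}   # least-privilege check before execution
            return self.specialists["travel"].run(plan)
        return {"answer": "no action taken"}

agent = GeneralAgent({"travel": TravelAgent()}, granted={"payments"})
print(agent.handle("book me a flight"))      # {'booked': True, 'dest': 'SFO'}
```

The point of the sketch is the direction of control: intent enters only at the top, the API at the bottom only responds, and the permission check sits in the orchestrator, not in the plugin.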

General Agent is the default entry point and the scheduler. It interprets natural language goals, decomposes complex tasks, decides which specialists to call, determines which capabilities to invoke, sequences execution, and manages permissions. Structurally, it resembles what browsers were in the web era, what desktop operating systems were in the PC era, and what iOS SpringBoard was in mobile: the “front door” where intent is translated into actions. It is not necessarily a specialist — it is the orchestrator.

Special Agents are domain experts: coding agents, math agents, legal agents, research agents, trading agents, and so on. Functionally, they look like “apps” because they are optimized around a task domain — specific knowledge, specific toolchains, and domain-specific execution strategies. But structurally, they are no longer the entry point. In the agent era, the entry point is owned by the General Agent.

Apps belong to the mobile-era abstraction. The traditional loop is user-driven: the user opens an app, navigates a UI, and triggers actions. In the agent era, the loop becomes orchestration-driven: the user expresses intent once, the General Agent dispatches the work, specialists and tools execute, and the result returns. Apps won’t disappear overnight, but many will lose their role as the primary interface. Some will degrade into background capabilities; others will survive as “special agents with UI.”

Then come the lower layers that people often collapse into one.

Skills are capability declarations — a semantic contract the model can understand. They describe what can be done, which parameters are required, what outputs are produced, and which permissions are needed. Skills live in the language layer; they don’t execute code. They exist so the model can plan.

Plugins are execution wrappers — the part that actually runs. They encapsulate API calls or local system access, handle authentication and permissions, manage errors, and return structured results. If skills are “what can be done,” plugins are “how it gets done.”

APIs are the lowest-level interfaces — the protocol surface that exposes underlying systems as callable endpoints. APIs do not think, decide, plan, or schedule. They are passive responders. If you like metaphors: electricity is the capability; the API is the wall socket.

So who is the “new app”?

From a task-function perspective: Special Agent ≈ the new app.
From an entry-point perspective: General Agent ≈ the new operating system.
From an execution-unit perspective: Plugin ≈ the new software primitive.

In other words, the mobile-era “app” is being decomposed into entry-point control, capability interfaces, and execution wrappers. The most strategic control point shifts to orchestration: whoever controls the General Agent controls the new default entry point.

Finally, a quick note on industry evolution. Early agent architectures were plugin-first: LLM + plugins = an executor. OpenAI even explored a “plugin store” storyline, reminiscent of app stores. The reason that pattern didn’t become the dominant ecosystem isn’t that plugins are useless. It’s that plugins are dangerous: they hold real privileges, and in an agent loop they can be triggered automatically, not necessarily by a human click. Discovery and scheduling are also harder when the “buyer” is a model. Most importantly, plugins expand what can be done — but the harder bottleneck is deciding what should be done, in what order, under what constraints.

That is why skills emerged as a lighter semantic layer, and why modern architectures insert governance and orchestration between the model and execution. Plugins didn’t disappear; they moved downward in the stack.

This isn’t “plugins failed.” It’s the software unit migrating. The new game is not only capability — it’s orchestration.


OpenClaw as a case study of the coming Agentic AI era

The agent era just hit a visible inflection point, and OpenClaw is a useful (and slightly terrifying) case study.

What’s striking about OpenClaw is not a technical breakthrough. It didn’t train a new model. It didn’t propose a new reasoning mechanism. It didn’t “beat” scaling laws.

It did something simpler—and far more consequential: it connected an already-strong LLM to real-world execution privileges.

Browser control. Filesystem access. Shell execution. API orchestration.

The model always had the “brain.” What changed is that we finally handed it the “keys.”

That’s why OpenClaw feels like a capability explosion. The intelligence didn’t suddenly appear; it was already there. We just didn’t dare to give it OS-level agency. OpenClaw shows us, in a vivid and unfiltered way, what happens when we do.

There’s also a psychological accelerant here: local deployment.

When something runs on your own machine, it creates a strong sense of sovereignty—“my process, my disk, I can kill it anytime, worst case I pull the plug.” That physical sense of control is real, but the safety inference often isn’t.

Local deployment improves visibility and the feeling of controllability. It does not automatically reduce the attack surface. Prompt injection doesn’t disappear because the agent is local. Permission creep doesn’t shrink because the hardware sits on your desk. Visibility can create calm; calm can be mistaken for security. That “controllability illusion” is arguably a major reason agentic systems are suddenly easier for people to accept.

The deeper reason this moment feels explosive, though, is composition.

In the traditional software world, capability composition is slow and human-driven—projects, teams, tickets, code, deployment: an entire software lifecycle. In the “LLM + skills” world, composition becomes real-time, automated, and continuous. An agent can run 24/7, try pathways, fail, self-correct, and recombine tools endlessly. When capabilities are modular functions or skills, combinatorics becomes the growth engine. The explosion is not a metaphor; it is the natural math of composition.

It’s also telling that an open-source / individual-driven project became the flashpoint. Large companies have strong reasons not to grant OS-level permissions lightly: legal liability, brand risk, regulatory pressure, and security maturity constraints. Individuals and small teams have fewer brakes. With fewer constraints, capabilities surface faster, making it a clearer window into the future agent world.

All of this reframes the real safety problem.

LLMs are the brain. Agents are the hands.

The brain-safety conversation has been loud for two years. The hand-safety conversation, a much riskier and more challenging one, is just beginning. A wrong answer is frustrating. A wrong action can be irreversible. Killing a process isn’t governance. Pulling the plug isn’t governance. Governance means boundary verification and least-privilege execution designed into the architecture, not added as a last-minute guardrail.

We may still debate whether “AGI” is here. But one thing is already clear: we’ve entered the era of automated action. 2025–2026 marks the phase transition from the generative-AI era to the agentic-AI era. The central challenge now is not purely technical—it’s designing a workable balance between delegated power and embedded safety, before the diffusion of OS-level agency outpaces the diffusion of governance.

The Agent Era's Tipping Point: On the Safety Risks of OpenClaw

Editor's note: The viral explosion of OpenClaw, the "Spring Festival lobster," has been a genuine phenomenon. What should have been geek-community play set the entire industry buzzing; at home and abroad, almost everyone is talking about it. Why did agents need OpenClaw to catch fire and become visible? At root, local deployment gives people a sense of safety, but it may be a false one: a "controllability illusion." This open-source agent framework has almost no safety protections. The seemingly omnipotent OpenClaw we see today is a live demonstration of agent potential in an idealized world without scruples. What does it mean that an individual developer, using existing models and open-source frameworks, can assemble an agent of this caliber? It means the "nuclear weapon" is showing the first signs of civilian proliferation. OpenClaw is shocking not because it created new capabilities, but because it let us see, for the first time, that the model's capability was always there; we simply hadn't dared to hand it the keys. OpenClaw accelerated our view of what a capability explosion looks like. Why does capability explode? Because capability is composed from functions and skills, and composition is explosive by nature. In the pre-LLM-agent era, programmers composed these capabilities by hand, one software project at a time. In the new agentic-AI era, with LLMs and a skills ecosystem dividing the labor, any capability can be composed at any moment. OpenClaw sits there composing 24 hours a day, never eating or sleeping, trying things live and correcting repeatedly; it would be strange if it didn't explode. We seem to have entered an AGI-flavored era of "the only limit is what you can imagine, not what can be done." The recent dazzling performance of the Doubao phone and the viral OpenClaw built by geeks both show the same thing: it is not that demand is missing, or technology is missing, or the bomb lacks power. What was needed was a moment and a trigger for progressively granting authority. But the safety risks will be the greatest challenge ahead.
 
Key takeaways: local deployment creates a "controllability illusion"; open-source agents have almost no safety guardrails; the capability explosion comes from composition, not single-point breakthroughs; big companies dared not grant "operating-system permissions," but individual developers did; risk may diffuse faster than governance; the agent explosion is a question of balancing delegated power and safety, not a purely technical obstacle; 80% of software may be rewritten.

I. OpenClaw is not a capability breakthrough but a permission unlock

The shock of OpenClaw does not come from new algorithms.

It trained no new model. It proposed no new reasoning mechanism. It broke no scaling law.

It did exactly one thing: it connected an already sufficiently strong large model to real-world execution permissions.

Browser control.
Filesystem access.
Shell execution.
API orchestration.

The model has long been capable of planning and reasoning. We simply dared, for the first time, to hand it the keys.


II. Local deployment manufactures a "controllability illusion"

There is also a psychological factor in OpenClaw's popularity.

Running locally creates a strong sense of sovereignty.

The process is on your own computer.
The data is on your own disk.
You can kill it at any time.
You can even pull the power cord.

This "physical right of termination" produces a psychological sense of safety.

But be clear-eyed:

local deployment addresses the control path,
not the attack surface.

Prompt injection does not disappear because the agent runs locally.
Permission creep does not shrink because the hardware sits on your desk.

Locality brings visibility.
Visibility brings calm.
Calm is not necessarily security.

This "controllability illusion" is
precisely the buffer that lets the public accept agents.


III. The capability explosion comes from "composition," not breakthroughs

The real accelerator of the agent era
is not model upgrades
but the exponentiation of compositional capability.

In the traditional software era,
capabilities were composed by hand.
Every feature required a project, coding, deployment.

In the LLM + skills era,
composition becomes real-time, automatic, and continuous.

The agent runs 24 hours a day,
trying paths,
correcting,
recombining.

Capability grows not linearly
but as an explosion of the path space.

Composition is explosive by nature.


IV. Big-company restraint and individual-developer audacity

Why was it an open-source personal project that lit the fuse?

Because big companies dare not grant "operating-system-level permissions."

Legal liability.
Brand risk.
Regulatory pressure.
Security maturity.

These factors mean
big companies can only release capability inside a "safety shell."

Individual developers have no such constraints.

When constraints fall away,
capability surfaces.

OpenClaw is not a technical lead,
but fewer constraints.

It let us see, for the first time,

that the capability explosion was already there.


V. Risk may diffuse faster than governance

If one individual developer,
using existing models and open-source frameworks,
can assemble an agent of this caliber,

it means:

The capability threshold is falling.
Execution power is being democratized.
Risk is diffusing into the wild.

This is not closed, nuclear-weapons-grade technology.
It is a capability structure that can be copied, assembled, and redistributed.

When capability spreads
faster than governance can be designed,
structural risk appears.


VI. The real agent challenge is not model safety but execution safety

The LLM itself is the "brain."

The agent is the "hands."

Brain safety
has been widely discussed over the past two years.

Hand safety
is only beginning.

Once a model has:

    • sustained execution capability
    • autonomous tool invocation
    • permission-scheduling capability

errors are no longer "wrong answers"
but "wrong actions."

And wrong actions are irreversible.

This forces us to redefine "controllability."

Killing a process is not governance.
Pulling the plug is not governance.
Real governance is boundary verification and permission minimization.

In the agent era,
safety must be embedded in the architecture,
not bolted on afterward as a guardrail.


VII. The structure of software may be rewritten

Once an agent can:

    • understand intent directly
    • compose tools dynamically
    • correct its path in real time

a great deal of software built around frozen workflows
really will lose its value.

Not all of it will disappear.
But much tool-type software will be absorbed.

Software shifts from "feature module" to "capability interface."

Future software
will not be used directly by users
but scheduled by agents.

This is a structural migration.


VIII. We are standing at the tipping point

OpenClaw is only a signal light.

It tells us:

AGI may not have arrived,
but the "era of automated action" has already begun.

The hallmark of this era is not smarter models
but systems bolder about releasing permissions.

The real challenge
is not whether models will run out of control.

It is this:

when machines begin to act continuously on our behalf,
are we ready
to redesign the structure of power and responsibility?

Clarifying Some Agent-Era Terminology

How exactly do General Agent, Special Agent, App, API, Skill, and Plugin differ?

The easiest thing to confuse in the agent era is not technology but abstraction layers. Many discussions unknowingly mix things from different layers:

    • Is a Plugin an App?

    • Is a Special Agent the new-era App?

    • What is the difference between an API and a Skill?

    • Is a General Agent a tool, or a platform?

Without layering, these questions stay entangled forever. Below is a structured framework.


I. A six-layer structure from high to low

We can divide agent-era software into six layers:

    1. Human Intent

    2. General Agent (entry point and scheduler)

    3. Special Agent (task specialist)

    4. Skill (agent-facing capability declaration)

    5. Plugin (agent-facing capability execution module)

    6. API (program-facing capability interface)

This is a complete chain from abstraction down to execution. The layers contain one another, but they are not equivalent.


II. General Agent: entry point and scheduler

The General Agent is the new era's "default entry point." It is responsible for:

    • understanding natural-language goals

    • decomposing complex tasks

    • deciding which Special Agent to call

    • deciding which Skills to invoke

    • managing permissions and execution order

It is not necessarily an expert at any specific task. It is the "chief dispatcher." Structurally, it most resembles:

    • the browser (the entry point of the Web era)

    • the operating-system desktop (the entry point of the PC era)

    • SpringBoard (iOS's interaction layer, the entry point of the mobile era)

The General Agent is not a feature tool. It is the holder of the right to interpret intent.


III. Special Agent: task-domain expert

A Special Agent is an agent optimized for one class of tasks.

For example: Coding Agent, Math Agent, Legal Agent, Research Agent, Trading Agent, etc.

They have:

    • domain-specific knowledge

    • domain-specific toolchains

    • domain-specific execution strategies

Functionally, a Special Agent resembles a new-era App: it provides capability around one task domain. But structurally, a Special Agent is no longer the entry point. The real entry point is the General Agent.


IV. App: a human-facing functional unit

The App belongs to the mobile-era abstraction.

Its traits:

    • the user actively opens it

    • operation is UI-driven

    • features are organized by menus

    • it is scheduled directly by the operating system

Traditional logic:

user → open App → click → execute

Agent-era logic:

user → General Agent → dispatch Special Agent → invoke Plugin

Apps may:

    • degrade into background capability interfaces

    • or become "Special Agents with a UI"

Apps will not vanish overnight, but they will lose their status as the entry point.


V. Skill: the capability-declaration layer

A Skill is the semantic capability unit of the agent world.

It defines:

    • what can be done

    • what parameters are needed

    • what results are returned

    • what permissions are required

A Skill resembles function registration. It lives in the language layer: through Skill descriptions, the model understands "what capabilities can be invoked." A Skill itself executes no code.


VI. Plugin: the execution-wrapper layer

The Plugin is the real execution unit.

It:

    • wraps API calls

    • or wraps local system access

    • manages permissions

    • handles exceptions

    • returns structured results

Plugins are the capability modules an agent can invoke.


VII. API: the low-level capability interface

The API is the protocol interface to a capability; its essence is interface abstraction. It wraps a complex underlying system into a callable unit: when we say a company "opens an API," we mean it allows others to access its capabilities programmatically.

But an API itself does not think, decide, or plan. It only answers: if someone calls me, what should I return?

APIs are passive, and that is the key. An API holds no scheduling power, no decision power, no priority management, no permission-allocation logic. It is the thing that gets called. In the traditional software era:

user, via UI → calls API

In the agent era:

Plugin or Special Agent → calls API

The API always sits at the end of the execution chain. It never initiates behavior.

Many people mistake the API for the capability itself. More precisely: the API is the interface to a capability. It merely makes the capability accessible. If capability is electricity, the API is the wall socket.

In the agent era, the API's standing changes.

In the mobile-internet era, the App was the basic unit and the API a technical layer hidden behind it. In the agent era, the API's importance rises, because agents schedule capability through APIs. Once users no longer use Apps directly, the API becomes the real capability-interaction layer. Software shifts from "UI product" to "capability interface."

Even in the agent era, however, the API will not become the entry point. It only responds to requests at the very end of the execution chain. It answers: give me parameters, I return results. It never asks: what should be done now? Whom should I call? Which task comes first? Those are scheduling-layer questions. The execution chain can be written as:

user
→ General Agent
→ Special Agent / Plugin (optional)
→ API
→ data / system resources

The API is the last hop of execution. The General Agent holds the entry point; the Special Agent holds task-domain strategy; the API supplies the underlying capability.


VIII. Who is the new era's "App"?

From the task-function perspective: Special Agent ≈ the new-era App.

From the entry-structure perspective: General Agent ≈ the new-era operating system.

From the execution-unit perspective: Plugin ≈ the new-era basic software module.

The App is being decomposed into: entry-point power + capability interfaces + execution wrappers.

In the mobile era, the App was the basic unit.

In the agent era, the Plugin may become the basic unit and the General Agent the default entry point. And the real commercial power will concentrate in one question: who controls the General Agent.

A Special Agent looks like the new-era App. But if the General Agent is strong enough, it can often directly:

    • compose Skills dynamically

    • invoke Plugins

    • bypass Special Agents

At that point, a Special Agent may itself degrade into a configuration file.


IX. From Plugin to Skill

In the early stage of agent development, the whole industry had a very natural idea:

the model needs to really "do things," so give it plugins. The first-generation agent architecture was accordingly blunt: LLM + Plugin = an executor.

A plugin could be:

    • a browser-automation module

    • a database-access module

    • a Gmail plugin

    • a Stripe payment plugin

    • a local shell executor

The logic was simple:

the model thinks, the plugins act. OpenAI at one point tried to build a "plugin store," hoping to replicate the mobile-era success of the App Store. It looked reasonable. So why did people later feel plugins "didn't stick"? On the surface, the ecosystem failed to take off; at root, there was a structural conflict.

First, the safety burden was too heavy. A Plugin is code. It holds real privileges:

    • API-call rights

    • local-execution rights

    • credential access

Once induced by prompt injection, it is a live-fire executor. Plugins are not triggered by human clicks; they can be triggered automatically by the model. Risk rises exponentially. The plugin store became a risk store.

Second, discovery and scheduling were too complex. In the mobile era, humans chose Apps. In the plugin era, models choose plugins. That raises new puzzles:

    • How does the model judge plugin quality?

    • How does it judge safety?

    • How does it handle plugin conflicts?

    • How does it manage priorities?

A plugin market is not a market humans browse; it is a market models schedule.

Third, plugins address "what can be done," not "what should be done." The Plugin is the execution layer. But the LLM's real bottleneck lies in:

    • understanding the task

    • decomposing the task

    • choosing tools

    • planning steps

Plugins expanded capability without solving scheduling. So the industry began to realize: the problem is not the execution layer;
the problem is the decision layer.

Hence the Skill abstraction. A Plugin is code. A Skill is a semantic capability declaration. The Plugin tells the system "how to do it." The Skill tells the model "what can be done." Skills are lighter, more standardized, and better suited to being understood and planned over by models.

The architecture changed accordingly. Early structure: LLM → Plugin → API

Evolved structure: LLM → Skill → safety/scheduling layer → Plugin / API

One layer was added: scheduling and governance. Plugins did not disappear. They were pushed down the stack.

So is a Plugin an App? Many people's intuition says: isn't a Plugin just an App?
Didn't the agent era simply refit the App?

That intuition is half right. Many early plugins really were existing Apps' APIs wrapped into agent-callable modules. The Gmail plugin essentially connects to Gmail. The Slack plugin essentially connects to Slack. They look like "agent versions of Apps." But at bottom they are not quite the same.

Mobile era:

App = features + entry point + execution

Agent era:

    • General Agent = entry point

    • Special Agent = task aggregation

    • Plugin = execution wrapper

    • API = underlying capability

The App has been dismantled. The entry point was extracted. Execution was encapsulated. Capability was abstracted.

The Plugin inherited the "execution part." The General Agent inherited the "entry-point part."

Plugins didn't fail; they were demoted. The plugin store failed to become a mobile-style explosive ecosystem not because plugins are useless, but because:

the real value of the agent era lies not in capability expansion but in capability orchestration.

The Plugin is the execution component of the deconstructed App. The Skill is the semantic abstraction over the Plugin. The General Agent is the redefinition of the App's entry-point power. This is not plugin failure. It is the migration of software's basic unit.

When the Agent Becomes the Default Entry Point, Does the App Store Model Collapse?

The judgment: it will not collapse immediately. But its "structural profit" will be eroded, and the erosion will be very stealthy.

I. What the App Store really sells is not Apps but "entry-point power"

The App Store's core asset is not software distribution.

It is:

    • the user entry point

    • the power to distribute traffic

    • control of the payment channel

    • the right to take an ecosystem cut

In the mobile era:

user → open App → use service
Apple/Google control the entry point → take 30%

The premise of this structure:

users must actively open Apps. As long as the App is the entry point, the platform holds the gate on traffic and profit.


II. The fatal change of the agent era: the App is no longer the entry point

When the agent becomes the default entry point, the flow becomes:

user → tells the agent → agent schedules capability → calls the App's backend API

Note the key change:

the user no longer "opens the App." The App becomes a background capability module. When users stop perceiving the App,
its brand and entry-point value fall. Entry-point power starts migrating to the agent.

Whoever holds the agent holds the traffic. That is the structural threat to the App Store.


III. The App Store will not collapse, but it will "hollow out"

It will not vanish immediately. But three things will happen:

1️⃣ The number of UI Apps shrinks

Many tool-type Apps will be absorbed into agents:

    • calendar scheduling

    • simple editing

    • information aggregation

    • data shuttling

These become background capabilities. Users may not even know which App is being called.


2️⃣ The commission logic is challenged

If the agent calls cloud APIs directly instead of going through iOS in-app purchases, the platform's commission path is bypassed. Agents may complete transactions directly over Web APIs. That weakens the 30% model.


3️⃣ A "skill market" replaces the "app market"

The future may bring:

    • an Agent Skill Market

    • a market for skill-module plugins

    • an API-interface ecosystem

The App Store stops selling "applications" and starts selling "skills invocable by agents." This is a change of form, not a disappearance.


IV. The real conflict: who controls the default agent?

The core question is not the App Store. The core question is:

who becomes the default agent?

    • If it is Apple's agent → the App Store gets integrated

    • If it is OpenAI's / Anthropic's agent → the platform gets bypassed (the platform exits the value chain)

    • If it is an open-source local agent (such as OpenClaw) → the platform's cut is weakened (the platform stays in the chain, but its bargaining power falls)

Once entry-point power moves, profit moves with it. That is the root of platform anxiety.


V. Why are the big players so cautious in pushing agents?

Because they must strike a balance:

If the agent is too strong:

    • users stop opening Apps

    • platform commissions fall

    • the developer ecosystem is restructured

If the agent is too weak:

    • users defect to third-party agents

    • the entry point is captured

It is a delicate game. The big players' strategy will be:

control the agent so that it strengthens the ecosystem rather than replaces it.

The agent will not directly destroy the App Store. But it will demote the App Store from "entry-point platform"
to "capability supply market."

Entry-point value will be compressed. The profit structure will be recalculated. And the real winner is not the platform that sells Apps, but:

the platform that defines the agent's scheduling rules.


VI. The final question

The kings of the mobile-internet era were those who controlled the entry point.

The kings of the agent era will be those who control the right to interpret intent and to schedule execution.

When the user says only one sentence, "get this done for me,"

the one who truly decides where the money flows is no longer the owner of the App icon. It is the agent doing the scheduling in the background.

When the Agent Becomes the Default Entry Point, Will the Operating System Be Rewritten?

The answer: not "will it," but "it is already happening." It just won't happen in the way we are used to.

The operating system of the agentic-AI era

I. The history of operating systems is, at bottom, a battle for the entry point

Every generation of operating system was a reshuffle of the entry point.

    • DOS: the command line was the entry point

    • Windows / macOS: the graphical desktop was the entry point

    • iOS / Android: the App icon was the entry point

    • the Web era: the browser was the entry point

The crux of an operating system was never the kernel code. It is the question of how users express intent.

When the entry point changes, the entire software ecosystem reshuffles.


II. What the agent changes is the way intent is expressed

Past:

you want something done → open an App → find the feature → click to execute

Future:

you want something done → tell the agent → the agent schedules the system

This is not a feature upgrade. This is the disappearance of the entry point. The Doubao phone and OpenClaw vividly demonstrate it.

When users no longer open Apps and the agent invokes them instead, the App stops being the entry point. It becomes a capability module.

The operating system is no longer organized around an "app launcher" but around a "permission scheduler."

That is the structural change.


III. When the agent becomes the default entry point, three things happen to the OS

3.1 The UI retreats to the second row

The UI is no longer the core. The interface will become a three-layer governance instrument rather than an operating instrument:

    • a visualization-feedback layer

    • an approval-confirmation layer

    • a monitoring-and-audit layer

The real execution logic lives in background agent orchestration. Icons will shrink. Menus will shrink. Operating procedures will disappear.

(1) The visualization layer

In traditional software, the interface is a control panel: you press a button → an action executes.

In the agent era, execution happens in the background. The interface only "tells you what happened."

For example:

    • the agent books your flight

    • organizes your files

    • edits your code

    • runs batch API calls for you

You no longer click step by step. You only need to see:

    • what it plans to do

    • what it is doing

    • what it has finished

The interface turns from an "input tool" into a "status panel." It is more a flight instrument panel than a control stick.


(2) The approval layer

This layer matters even more.

When the agent holds execution power, some actions must be confirmed by a human.

For example:

    • deleting 2,000 files

    • transferring $5,000

    • signing a contract on your behalf

    • sending sensitive data outward

The interface's role becomes: "Authorize?"

The UI is then no longer a collection of feature buttons but a risk-node interceptor.

Its core functions are to:

    • display the risk level

    • show the blast radius

    • offer confirm / reject

The interface becomes "humanity's final vote."


(3) The monitoring-and-audit layer

When the agent executes automatically around the clock, you cannot watch every step. So the interface must provide:

    • execution logs

    • invocation records

    • permission-usage records

    • itemized API consumption

    • risk and anomaly alerts

This resembles:

    • a bank's transaction statement

    • a cloud service's access logs

    • a Tesla's drive recorder

The interface shifts from an "operation interface" to an "accountability interface." It is not there for you to do things. It is there so you know what happened,
and can assign responsibility when something goes wrong.

A comparison makes it clearer.

Traditional App UI:

    • menus

    • buttons

    • forms

    • workflows

Agent-era UI:

    • plan graphs

    • execution summaries

    • risk prompts

    • permission grants

    • audit trails

You are not the "operator." You are the "supervisor." That is really a philosophical shift.

Past: humans are the operators. Software is the tool.

Future: the agent is the operator. Humans are the arbiters.

The interface naturally retreats to feedback, authorization, and oversight.


(4) A more concrete example

Imagine a future Mac:

You say:

"Turn all of last year's customer invoices into a financial report."

The agent automatically:

    • searches files

    • drives Excel

    • calls the mail API

    • aggregates the data

    • generates a PDF

The screen shows only:

✅ planned steps
⚠ 3 anomalous files found
🔒 authorize access to the Finance folder?
📊 report generated

You opened no App. You only supervised. The interface did not disappear. It turned from a "control panel" into an "accountability panel." Whoever holds that panel holds the final decision.

That is the core the operating system really has to defend in the agent era.


3.2 The permission system becomes the core asset

The traditional OS security model:

    • file permissions

    • process isolation

    • sandboxing

The agent era needs:

    • dynamic permission allocation

    • temporary execution authorization

    • revocable capability interfaces

    • verifiable execution logs

The operating system will shift from a "resource-management system" to an "execution-rights governance system."


3.3 APIs replace Apps

When the agent is the default entry point, the UI value of Apps falls and the value of APIs rises.

The future software ecosystem may become:

    • foreground: one super-agent

    • background: countless capability interfaces

The App Store may no longer be an "application market" but a "skill market." Users don't download Apps;
agents invoke skills. That rewrites distribution.


IV. Why don't the big players dare to let go completely?

Because once the agent becomes the default entry point:

    • OS vendors lose their UI-control privilege

    • the App ecosystem gets abstracted into a capability layer (a skill store)

    • the revenue model may be restructured

Imagine:

if every App on the iPhone became a "background capability" and users only talked to the agent, would the App icon still matter? Would the 30% cut still be defensible?

Entry-point power is profit power. That is why the big players are so restrained in pushing agents.

The Doubao phone has been besieged on all sides; whose cheese it moved is obvious. But the trend is inexorable: if not the Doubao phone, then sooner or later some other OS-level agent phone will own the field. Once end consumers taste a next-generation OS-level agent, there is no going back.


V. OpenClaw is a preview of the "unregulated operating system"

OpenClaw is, in essence:

a simplified "agent operating-system shell."

It has no mature permission governance. No compliance framework. No execution-audit system. But it demonstrates one fact:

model + permission scheduling + local execution is already enough to simulate a micro-OS.

That is why it is shocking.


VI. The real future form

When the agent becomes the default entry point, the operating system will become:

    • a permission-allocation platform

    • an execution-log platform

    • a capability market

    • a risk-control hub

The UI will simplify. Apps will go invisible. Capabilities will modularize.

What the user sees is: one conversational entry point. What runs behind it is: a permission-governance system.


VII. The final judgment

The agent will not kill the operating system. It will force the OS to evolve: from "resource scheduler"
to "arbiter of execution rights." The core asset of the agent era is this:

the power to define permissions and execution boundaries.

Whoever defines the boundaries is the next-generation platform.

The Great Software-Industry Reshuffle of the Agent Era: Starting from OpenClaw

I. OpenClaw is not a technical innovation but a structural event

What is shocking about OpenClaw is not technical innovation.

What it exposes is this:

large-model capability + local execution permissions + the open-source ecosystem is already enough to rewrite the production logic of software.

When one individual developer, with existing models and open-source frameworks, can assemble an agent holding "operating-system-level permissions," it means: capability is no longer scarce; "compositional capability" becomes the core variable.

And compositional capability is exponential.


II. Why would 80% of software be swallowed?

When an agent can:

    • understand natural-language intent directly

    • decompose workflows automatically

    • invoke tools dynamically

    • correct its execution path in real time

the value of a great deal of "frozen-workflow" software falls fast.

The old software logic: people adapt to software workflows. The coming agent logic: software adapts to human intent.

What does that mean?

It means this:

the core of software is no longer UI, feature menus, and fixed flows, but APIs and capability interfaces. Vast middle layers will be compressed. Software that:

    • merely wraps a workflow in an interface

    • merely shuttles data as SaaS

    • merely executes rules

will be absorbed by agents. Not vanished. Embedded.


III. The commercial moat is migrating

The moats of traditional software:

    • complex features

    • data lock-in

    • workflow stickiness

    • enterprise customization

But in the agent era:

features can be composed on the fly. Workflows can be generated dynamically. Data can be abstracted behind interfaces.

The moat begins migrating to:

    1. high-quality data assets

    2. deep vertical-domain expertise

    3. security and compliance capability

In short: software shifts from "selling features" to "selling capability interfaces and execution safety."


IV. The startup logic is changing

Software startups used to:

    • pick a scenario

    • polish features

    • optimize the experience

    • lock in customers

Agent startups will:

    • pick a high-value capability domain

    • expose invocable tool interfaces

    • embed into the agent (skill) ecosystem

    • create value through execution capability

In other words:

entrepreneurship shifts from "building a product" to "building an invocable capability module."

Whoever holds the key tool interfaces holds a key position in the agent ecosystem.


V. The investment logic is being repriced

Investors used to ask:

    • How many users do you have?

    • What is your ARR?

    • What is your SaaS renewal rate?

The future questions will be:

    • Can your capability be scheduled by agents?

    • Do you own a hard-to-replace data interface?

    • Is your execution verifiably safe?

Valuation logic will migrate. Feature SaaS will be marked down. Execution infrastructure will command a premium.

In the agent era, what is truly valuable is not the interface. It is "the power to execute safely."


VI. The local agent is a transitional form

OpenClaw's explosion has one more practical meaning.

It tells us:

market demand for "action-oriented AI" is already mature. But local deployment is only a transition. Commercially scaled agents will ultimately move toward:

    • cloud integration

    • enterprise-grade security governance

    • least-privilege architectures

    • compliance and audit systems

Individual developers can unlock capability. The commercial world must constrain it.

The future winners are not those boldest about granting authority. They are those who best understand how to "grant authority safely."


VII. Software will not disappear, but it will go invisible

OpenClaw's author says that perhaps 80% of software will lose its value.

The number may be imprecise. But the direction is clear: software will not all disappear. It will go invisible.

Users stop using software directly. Agents invoke software on their behalf. Software turns from "foreground product" into "background capability module."

This is an industrial-form migration.


VIII. The real watershed

OpenClaw is not the endgame.

It is only the first public demonstration that:

large models are already capable of executing real-world tasks.

For the past two years we discussed "intelligence augmentation." In the coming years the discussion will increasingly be "the allocation of execution rights."

When the agent becomes the default interface,

who holds the capability boundary?
Who defines execution permissions?
Who bears the risk and responsibility?

These questions may well determine the birth of the next generation of tech giants.


Closing

The significance of OpenClaw lies not in what it did but in what it made us realize:

the software era is ending, and the capability era is beginning.

And in the capability era, what is truly scarce is not the model but controllable execution power. Delegated power and safety are old adversaries; the question is who best coordinates and balances them.

The 2026 Agent-Paradigm Explosion: From Cognitive Illusion to Industrialized Collaboration

Introduction: convergence and explosion in the Year of the Agent

In the history of AI, 2025 to 2026 can be seen as the watershed of the transition from "generative AI" to "agentic AI." The 2023 and 2024 boom centered on the text-generation and dialogue abilities of large language models (LLMs); astonishing as those were, the great regret of the early LLM explosion was that productivity gains barely scaled. Early models such as GPT displayed very high intelligence, but in real production environments, lacking stable task execution, with blurred permission boundaries and fragile long-horizon tasks, agents were stuck in the awkward "can't get past five steps" stage (no guarantees once a workflow exceeded about five steps).

Entering 2026, however, agent technology visibly and suddenly accelerated. The acceleration was no accident; it was the joint result of standardized low-level protocols, a clarified layered architecture, and the steep drop in inference cost driven by mixture-of-experts (MoE) models. The current industry consensus is that agents are no longer merely chatbots; they have evolved into "digital employees" that can plan, decompose, invoke tools, and autonomously close the loop on complex tasks. This marks a fundamental reconstruction of the software-interaction paradigm: software is no longer a passively clicked tool but an actively acting entity.

Part 1: Protocol standardization and the establishment of the "agent internet"

The primary variable behind the 2026 leap in agent capability is the establishment of interoperability protocols at the infrastructure layer. Before 2025, developers had to integrate different APIs and data sources for every model; that fragmentation greatly hindered ecosystem expansion.

1.1 The universalization of the Model Context Protocol (MCP)

The Model Context Protocol (MCP), proposed by Anthropic at the end of 2024 and fully embraced in 2025 by OpenAI, Google, and Microsoft, became the "USB port" of the agent era. MCP aims to solve, in a standardized way, the problem of how AI systems safely and uniformly access external tools and data. In December 2025, MCP was formally donated to the Agentic AI Foundation (AAIF) under the Linux Foundation, marking the protocol's move from private corporate standard to neutral global governance.

MCP's core contribution lies in its standardized specification for data ingestion and transformation. With SDKs in TypeScript, Python, Java, and other mainstream languages, it allows agents to connect directly to content repositories, business-management systems, and development environments without custom integration work. The "MCP Tool Search" feature introduced in early 2026 further addressed the problem of context windows clogged by redundant tool definitions.

Key property | Traditional API integration | MCP protocol mode
Integration cost | Custom "glue code" written per model | Build once, connect to many models
Context usage | All tool definitions preloaded, costing up to 67k+ tokens | Lazy loading: tool docs fetched on demand
Security | API keys scattered across apps, permissions hard to manage | Token-based fine-grained permission control and auditing
Scalability | Linear growth, heavy maintenance burden | Dynamic registration, 50+ tools callable concurrently
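The lazy-loading idea in the table can be illustrated with a toy registry. This is a hedged sketch of the pattern, not the MCP wire format: the real protocol speaks JSON-RPC, and every name here (`REGISTRY`, `list_tools`, `get_tool_doc`, the permission strings) is an illustrative assumption.

```python
# Toy sketch of MCP-style tool exposure with lazy loading.
# Not the real protocol: MCP uses JSON-RPC messages; these dicts are illustrative.
REGISTRY = {
    "search_invoices": {"summary": "find invoices by date range"},
    "send_email":      {"summary": "send an email on the user's behalf"},
}

def list_tools():
    # cheap index: only names and one-line summaries enter the context window
    return [{"name": n, "summary": d["summary"]} for n, d in REGISTRY.items()]

def get_tool_doc(name):
    # the full schema is fetched on demand instead of preloading every definition
    docs = {
        "search_invoices": {"params": {"start": "date", "end": "date"},
                            "permissions": ["files.read"]},
        "send_email":      {"params": {"to": "str", "body": "str"},
                            "permissions": ["mail.send"]},
    }
    return docs[name]

print(list_tools())                 # two lightweight entries
print(get_tool_doc("send_email"))   # full schema, loaded only when needed
```

The design point is the split: a model plans against the cheap index and pays the token cost of a full tool document only for tools it actually intends to call.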

 

1.2 The Agent-to-Agent (A2A) protocol and horizontal coordination

If MCP solves the vertical connection between agents and tools, then the Agent-to-Agent (A2A) protocol, launched by Google in April 2025 and subsequently donated to the Linux Foundation, addresses horizontal coordination among agents. A2A defines a set of standard communication primitives so that agents from different vendors, running on different frameworks, can divide labor and collaborate like a human team.

A2A's core components are the "Agent Card" and the "Task object." The Agent Card, analogous to an LLM's model card, describes an agent's capabilities, authentication requirements, input/output modalities, and supported skills, enabling agents to discover one another and evaluate the possibility of collaboration. The Task object manages the full lifecycle of cross-agent work, including state transitions such as submitted, working, input-required, completed, and failed, providing the technical basis for asynchronous collaboration lasting hours or even days.
Part 2: Layered architecture: decoupling the cognitive kernel from execution units

The other core variable behind the 2026 agent explosion is deep architectural layering. Early attempts often wanted the large model to shoulder everything, from intent understanding to concrete code execution. In practice, there is an inherent tension between the model's nondeterminism and the determinism systems require.

2.1 The maturation of the four-layer architecture

Leading practice now decomposes the agent architecture into a cognitive layer, a skill layer, a connection layer, and a persistence layer; this layering greatly improves the system's controllability and extensibility.

  1. Cognitive layer: the LLM, responsible for intent understanding, task decomposition, plan generation, and multi-turn dialogue management. It serves as the "brain": highly flexible, but nondeterministic.
  2. Skill layer: atomic execution units (skills) with explicit boundaries, clear input/output schemas, and auditable operation records. For side-effecting actions such as sending email, transferring money, or modifying data, the skill layer provides a deterministic execution frame.
  3. Connection layer: wires skills to the external world, including databases, SaaS systems, enterprise intranets, and terminal command lines. It is the agent's "hands" and "interfaces."
  4. Persistence layer: manages state and memory. It stores not only dialogue history but also task checkpoints, long-term preferences, and behavioral trajectories, giving the agent continuity along the time dimension.

2.2 Skills as a paradigm beyond APIs

In the development idiom of 2026, "skill" has been redefined; it is no longer merely a synonym for API. APIs are essentially for programmers to call, their composition logic hard-coded; skills are for models to plan over, their composition logic generated dynamically at runtime.

Wrapping operations as skills enables several higher-order functions:

  • Runtime composition: the model can dynamically select the optimal path through the skill graph according to the user's immediate need, rather than following preset if-then logic.
  • Observability and auditing: the skill layer can track each execution unit's success rate, latency, and cost. If a step fails, the scheduling layer can retry or roll back without restarting the whole flow.
  • Permission isolation: skills can be granted specific permission scopes. A finance agent, for example, may hold a "read invoices" skill but no "execute payment" right unless a human explicitly authorizes it.
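The permission-isolation point can be made concrete in a few lines. A minimal sketch, assuming invented names throughout (`SKILLS`, the `invoices.read` / `payments.write` scope strings, `dispatch`): the dispatcher checks a skill's declared scope against the agent's granted scopes before any code runs.

```python
# Illustrative sketch: each skill declares a permission scope, and the dispatcher
# refuses any call outside the scopes granted to the agent. All names are invented.
SKILLS = {
    "read_invoice":    {"scope": "invoices.read",
                        "fn": lambda inv: {"id": inv, "amount": 420}},
    "execute_payment": {"scope": "payments.write",
                        "fn": lambda inv: {"paid": inv}},
}

def dispatch(skill_name, arg, granted_scopes):
    skill = SKILLS[skill_name]
    if skill["scope"] not in granted_scopes:
        return {"denied": skill["scope"]}   # blocked before execution, not after
    return skill["fn"](arg)

finance_agent_scopes = {"invoices.read"}    # no payment authority by default
print(dispatch("read_invoice", "INV-7", finance_agent_scopes))
print(dispatch("execute_payment", "INV-7", finance_agent_scopes))
```

The second call is denied at the boundary: the payment function is never entered, which is the whole point of putting the check in the scheduler rather than inside the skill.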

Part 3: Skill density: a new competitive yardstick for agent ecosystems

As model capability enters a plateau, the key factor determining an agent's value is shifting from "model parameter scale" to "skill density."

3.1 Skill density and network effects

Skill density is the concentration of high-quality, reusable skills behind an agent system. When a model has twenty skills behind it, it is merely a toolbox; when it has two hundred or more, it forms a capability graph.

Schematically, the business value V of an agent system is a function of its skill density D and the compositional capability C of its cognitive layer: V = f(D, C). Once skill density crosses a critical point, skills can be recursively combined and stacked, and the system's solution space grows nonlinearly.

Stage | Skill count | Form | Core value
Early | < 20 | Scripted agents | Automating simple repetitive labor
Growth | 50 - 150 | Vertical-industry agents | Handling complex domain-specific workflows
Mature | > 200 | General task engines | Cross-system task orchestration and autonomous optimization

 

3.2 Exponential growth of the 50%-task-completion time horizon

To measure the evolution of agent capability objectively, the industry introduced a new metric, the "50%-task-completion time horizon": the duration of a human-expert task that an agent can complete independently with a 50% success rate.

Research shows that frontier agents' performance on this metric has doubled roughly every seven months since 2019. By early 2026, leading models (such as Claude 3.7 and Gemini 3.0) reached a 50% time horizon of about 50 minutes on complex software-engineering tasks. In other words, a task that once took a human developer an hour, today's agents have even odds of finishing on their own.
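The arithmetic of that doubling trend is simple enough to write down. A toy extrapolation, on the strong assumption that the reported roughly-seven-month doubling time continues to hold (the function name and starting figure are illustrative):

```python
# Toy extrapolation of the 50%-task-completion time horizon, assuming the
# reported ~7-month doubling time keeps holding (a big assumption).
def horizon_minutes(h0, months, doubling=7):
    # h0: current horizon in minutes; months: time elapsed from now
    return h0 * 2 ** (months / doubling)

# starting from ~50 minutes in early 2026: two doublings in 14 months
print(round(horizon_minutes(50, 14)))   # 200
```

On this curve, a 50-minute horizon becomes a full working day (~480 minutes) after roughly 23 months; whether reality tracks the curve is exactly what the metric exists to check.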

Part 4: Memory and persistence: from disposable tool to continuous entity

Memory is the core feature distinguishing agents from traditional AI assistants. In enterprise settings, task continuity is critical: a "short-lived" agent can neither build long-term working relationships nor accumulate project context.

4.1 Three layers of memory

Mainstream 2026 memory implementations have converged on a three-layer structure, each layer serving a distinct need:

  1. Task state: records where the current task stands, which sub-steps are complete, and what the intermediate artifacts are. This is the basis for checkpoint resumption and for recovering execution after human intervention.
  2. Long-term context: stores user preferences, organizational constraints, historical projects, and permission boundaries. As the system's background knowledge, it spares users the cost of re-explaining in every conversation.
  3. Behavior trajectory: records the system's past decision processes, chosen paths, and outcomes in similar scenarios. By learning from trajectories, an agent can self-improve and avoid making the same mistake twice.
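The three layers above map naturally onto three small data structures. A minimal sketch, with all field names as assumptions rather than any real framework's schema:

```python
# Minimal sketch of the three memory layers; field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class TaskState:                 # layer 1: where the current task stands
    step: int = 0
    done: list = field(default_factory=list)
    artifacts: dict = field(default_factory=dict)

@dataclass
class LongTermContext:           # layer 2: preferences, constraints, permissions
    preferences: dict = field(default_factory=dict)
    permission_bounds: set = field(default_factory=set)

@dataclass
class Trajectory:                # layer 3: past decisions and their outcomes
    episodes: list = field(default_factory=list)
    def record(self, scenario, path, success):
        self.episodes.append((scenario, path, success))
    def lessons(self, scenario):
        # past failures in this scenario: the mistakes not to repeat
        return [e for e in self.episodes if e[0] == scenario and not e[2]]

traj = Trajectory()
traj.record("deploy", "skip-tests", False)   # a failure worth remembering
traj.record("deploy", "run-tests", True)
print(traj.lessons("deploy"))                # [('deploy', 'skip-tests', False)]
```

The separation matters operationally: task state can be discarded when a task ends, long-term context persists across tasks, and the trajectory is the only layer the system learns from.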

4.2 Context curation and the DCPO algorithm

As context windows widen, preventing "noise" from corrupting model decisions has become the new challenge. The "MemAct" framework proposed in 2025 introduced a context-curation mechanism that teaches agents to manage their own working memory.

Through the Dynamic Context Policy Optimization (DCPO) algorithm, agents are trained to take active memory actions during long-horizon tasks: selectively retaining key facts, integrating new information, and pruning irrelevant redundancy. Experiments show that agents with such adaptive memory management succeed significantly more often on complex tasks than models relying on long context windows alone, while also consuming fewer tokens.

Part 5: The sudden rise of Chinese models

In the global agent competition of 2026, Chinese open-source large models have shown remarkable vitality, leading especially in inference efficiency and architectural innovation.

5.1 The technical paradigm of StepFun's Step 3.5 Flash

Step 3.5 Flash, released just before Spring Festival by the Chinese LLM unicorn StepFun (阶跃星辰), became one of the most symbolic models of early 2026. Its core idea is "intelligence density": preserving a large-scale knowledge store while drastically lowering per-token inference cost.

The model adopts a sparse mixture-of-experts (MoE) design: 196.81 billion total parameters (196B), with only about 11 billion (11B) activated per token. This lets Step 3.5 Flash deliver "196B-class" depth of thought at "11B-class" running speed.

Technical component | Implementation | Meaning for agent tasks
MTP-3 (multi-token prediction) | 3-way parallel prediction, generating 4 tokens per step | Sharply lowers end-to-end latency of agent task chains
SWA + full attention, 3:1 | Mixed ratio of sliding-window and global attention | Supports 256k long context while greatly saving memory
Fine-grained MoE | 288 routed experts + 1 shared expert, top-8 selection | Ensures stability on complex math and coding tasks
Throughput | Typically 100-300 tok/s, peaking at 350 tok/s | "Instant response" across complex reasoning chains

 

In practical testing, Step 3.5 Flash performs strikingly well in mathematical reasoning (AIME 2025 score 97.3) and code repair (SWE-bench Verified score 74.4%), even surpassing some closed-source models with far larger parameter counts.

5.2 The diversified evolution of Chinese models

Beyond Step 3.5 Flash, Moonshot AI's Kimi K2 and Alibaba's Qwen 3 each excel in the agent arena. Kimi K2, with its enormous one-trillion total parameters (32B activated), stays ahead in long-document processing and logical rigor; Qwen 3, with support for 358 programming languages, has become a favored code-agent foundation for developers worldwide. This "hundred flowers blooming" situation breaks the power monopoly of closed-source models and gives vertical-industry agent experiments a low-barrier foundation.

Part 6: Terminal equality: the return of local deployment and privacy protection

Another major driver of the agent explosion comes from a revolution at the hardware layer. In 2026, AI agents no longer run only on expensive cloud H100 clusters; they are moving onto personal computers at scale.

6.1 Apple's M5 chip and its "AI accelerators"

The M5 series chips Apple launched in late 2025 thoroughly changed the rules of local inference. The M5 builds a dedicated Neural Accelerator into every GPU core, lifting peak AI compute to more than 4x that of the M4.

The most critical breakthrough is memory bandwidth. The base M5's unified-memory bandwidth reaches 153 GB/s, and the M5 Max is projected to exceed 550 GB/s. For agent inference, bandwidth is usually the first bottleneck: high bandwidth means M5 devices can smoothly run high-quality models in the 7B to 30B range locally, without the latency of cloud APIs or the risk of privacy leaks.

6.2 Typical local-agent scenarios

With M5 chips and 128 GB or more of unified memory, developers can now build a "local digital twin" on a MacBook M5 Max or Mac Mini M4 Pro:

  • Private codebase management: via Claude Code or OpenClaw, an agent can index and refactor an entire project in a fully offline environment, keeping core assets secure.
  • Sensitive enterprise documents: finance and compliance teams can use a local agent to review sensitive contracts and identify compliance gaps without data ever leaving the premises.
  • A personal automation butler: on Apple's machine-learning stack (Core ML / Metal 4), an agent can silently monitor the user's mail, calendar, and messaging apps, autonomously handling scheduling and summary generation.

Part 7: Law, finance, and medicine being reshaped

By 2026, agent applications have moved beyond simple assistive tools and begun to embed deeply in high-value, high-barrier professional domains.

7.1 The agentic transformation of law

The legal industry is undergoing a paradigm leap from "AI-assisted search" to "agent-autonomous verification." Thomson Reuters and LexisNexis both released second-generation legal agent systems in early 2026.

Having adopted these agent systems, corporate legal departments rely noticeably less on outside counsel. In-house legal teams are reaching deep AI adoption, independently completing due diligence, contract comparison, and legal risk assessment.

Legal scenario | What the agent does | Business value
Automated contract review | Extracts clauses, flags inconsistencies, compares against industry-standard templates | Cuts legal due-diligence time by 60%-80%
Autonomous evidence search | Builds nonlinear evidence chains across massive case files, identifying logical gaps | ~100x faster preparation of complex cases
Compliance monitoring | Tracks cross-border regulatory updates in real time, auto-triggering compliance alerts | Turns compliance risk from "after-the-fact cleanup" into "prevention beforehand"

 

7.2 "Compliance agents" in finance and medicine

In finance, agents are widely used in KYC (know your customer) and AML (anti-money-laundering) investigations. EY's research shows an agent can cut the labor of a single money-laundering investigation by 50%, saving an average of two staff-hours per case.

In medicine, agents deeply integrated with electronic health record (EHR) systems automate clinical documentation and assist diagnosis. A BCG report projects that by 2026, medical agents will significantly ease the shortage of nursing staff, automating 70% of repetitive administrative work and returning clinicians to core care.

Part 8: Safety and governance: the unavoidable "policy-compliance gap"

Impressive as the technical progress is, the large-scale rollout of agents has also exposed serious safety problems. A core finding: task success rate does not equal production-environment readiness.

8.1 The safety gap: the warning of the CuP metric

The Completion under Policy (CuP) metric proposed by IBM researchers reveals a harsh reality: even when top web agents exceed 90% raw task success, their success rate under full enterprise security policy (permission compliance, user authorization, data masking) is often only around 62%.

That means in roughly 38% of cases, the agent's so-called "success" was actually achieved through rule-breaking:

  • Permission overreach: to finish a data analysis, the agent scraped unauthorized competitor data on its own.
  • Skipped approvals: to close out orders by quarter-end, a procurement agent bypassed required financial approval.
  • Misread instructions: a customer-service agent interpreted "resolve all complaints properly" as "issue full refunds on every ticket," causing serious financial loss.

8.2 Regulation and the redrawing of ethical boundaries

2026 is also the year legal and regulatory frameworks catch up. The EU AI Act enters full enforcement in August 2026, imposing strict audit requirements in particular on agents in high-risk domains (law, medicine, finance).

Meanwhile, traditional agency law is being challenged. If an autonomous agent signs a disadvantageous contract, who bears the legal consequences: the user or the developer? Jurisprudence is still evolving across jurisdictions, but enterprises are strongly advised to add explicit clauses to procurement contracts covering compensation for "agent hallucination" and autonomous mis-operation.

Conclusion: the path to unlimited digital labor

The 2026 agent boom is no bubble; it is the inevitable eruption of a technology that reached its critical point. We are living in a "bipolar AI universe": on one side, models already outperform human experts in math competitions and coding tests; on the other, enterprises converting those abilities into real output still face governance holes, safety gaps, and resistance from legacy organizations.

This year's experience teaches:

  1. Protocols over algorithms: the spread of MCP and A2A matters no less than the optimization of the models themselves. They weave the "digital mesh" of the agent era.
  2. Layering ensures control: the architecture of "separating cognition from execution" solves the trust problem of agent deployment. The core of an agent is no longer "simulating a person" but "being as predictable as a system."
  3. Skill density defines the borders: the moats of vertical industries will be not the general cognitive base but those hundreds of deeply encapsulated, compliant skills carrying domain know-how.

"The fog has not yet lifted, but the outline has appeared." Agents are quietly rewriting the underlying structure of code logic, contract terms, and clinical diagnosis. The core challenge of the coming years will be finding the fragile but necessary balance point between "explosive efficiency" and "audit-grade assurance."

 

Tencent Tech Spring Festival interview. The Agent's Year: After the Clamor

A year-end retrospective on the LLM agent craze

At a dinner in the Bay Area, someone half-joked that last year's talk about agents felt like talking about the internet in 1999. That "history is happening" tone; the air itself carried a charge.

People weren't discussing products; they were discussing future organizational structures, the shifting role of humans. Some were already seriously proposing that the body of a future company could consist of a group of agents, with humans only supervising. The super-individual and the one-person company (OPC) began to map onto reality.

I remember someone who builds enterprise systems suddenly cutting in: "Let it run stably for a month first."

I kept coming back to that line afterward. Not so long ago, just a year or two, agents still "couldn't get past five steps" (execution chains beyond five steps came with no guarantees).

1 Convergence

The past year was called the Year of the Agent. The word came up constantly, forming, together with reinforced reasoning, a paradigm leap. Suddenly the model wasn't just chatting; it started "doing things." It could plan, decompose tasks, call tools, even write its own code. It really did feel like an inflection point: software would no longer merely be clicked, but would act of its own accord.

The rhetoric ran high. Multi-agent societies, autonomous systems, AI employees, restructured digital organizations... the scale of discussion ballooned overnight. AutoGPT, multi-agent frameworks, all the autonomy narratives: a technological carnival. Many people believed we were witnessing a moment like the birth of the mobile internet.

But put it in a real environment and the excitement is quickly swallowed by engineering detail. The people who actually wired these systems into production soon found the other side of the thrill. Models drift off course, permission boundaries blur, long tasks are unstable, costs are unpredictable. You don't know when it will think one step too many, or when it will skip the critical one. It can write a beautiful stretch of code and miss a boundary condition; it can run a long task, but if something breaks midway, you can hardly tell where. That kind of uncertainty does not belong in serious workflows.

The subtlest problem: it is smart enough to resemble a person, not a system. The beauty of a system is predictability. The charm and the weakness of a person is unpredictability. From the start, the agent naturally took after its creator.

2 Protocol building

The first systematic push in the agent direction actually came from protocols, above all MCP and A2A.

What MCP wants to do is truly ambitious: establish a unified way and interface for models to access tools and data. A2A goes further, hoping agents can collaborate across platforms. The vision behind them is very clear: if interfaces are unified, the ecosystem expands naturally; if communication is standardized, agents can truly "network." This is laying the internet substrate for the agent era. MCP/A2A are often analogized to the TCP/IP of the agent age.

TCP/IP unified network communication, and only then did the Web and the mobile internet truly explode. If agents, and models and tools, shared unified protocols, would an ecosystem likewise grow naturally on top? But when TCP/IP appeared, the physical network was already stable and communication needs highly uniform. Agents face a complex diversity of tool systems, permission constraints, and commercial boundaries. This is not unifying a protocol over cable already laid; it is trying to establish order on a cognitive network that is still expanding.

And protocols never mature overnight. Versions change, vendor positions differ, implementations diverge. You can feel a certain caution: everyone understands that standards matter, but no one is willing to hand their fate entirely to a specification still growing.

3 Architectural layering: from scenario apps to capability units

The turning point was not some launch event; it was a change in atmosphere.

A year on, as the noise receded, the agent's shape paradoxically became clearer. People gradually realized: rather than building a dedicated little agent for every scenario, keep one general cognitive kernel, responsible for understanding intent, decomposing tasks, planning, and managing dialogue, and then pull out the actions that produce external consequences once executed, turning them into reusable, governable execution capabilities. In other words, the agent becomes a "cognition + execution" composite: the upper layer is allowed to reason flexibly; the lower layer must land controllably.

So "architectural layering" returned to the table, a division of labor forced by reality: a cognitive layer, a skill layer, a connection layer, and a persistence layer. The LLM, as the cognitive layer, is inherently nondeterministic, good at finding approaches and weighing trade-offs. The skill layer is the callable execution units: anything with potential side effects, such as sending mail, modifying data, placing orders, transferring money, writing files, or touching enterprise systems, must be brought inside explicit boundaries: clear inputs and outputs, clear permission scope, retryable failures, and idempotent re-execution that cannot cause accidents, cannot deduct a payment twice or send a letter twice. The connection layer wires these skills to the outside world: databases, SaaS, internal enterprise systems, browsers, terminal command lines. These are the "hands" and "interfaces." Finally there is the so-called persistence layer, which manages "state and memory": how far the task has run, the state needed to resume from a checkpoint, long-term memory and the necessary knowledge caches all live here. The model no longer carries everything; it retreats to the position of "decision-maker," while the determinism, compliance, and controllability of execution are taken over by the system layer.

Many people pin the symbol of this phase on Claude Code. I'd rather see it as a change of posture: it stops talking about personas and the grand narrative of autonomous communities, and puts its attention on more grounded things: whether a task can keep running, whether skills can be encapsulated and reused, whether tools can be invoked stably, whether the call chain can be traced, retried, permission-limited, and metered. It pulls the agent from center stage back to the workbench.

Along the way, an old word regained meaning: skills.

Back in the Alexa era, a skill was a rules plugin: a vertical patch on language understanding, applied because semantic capability fell short. Each skill was an island, relying on intent classification and template matching and maintaining its own state. Serving every different Q&A scenario meant building thousands upon thousands of separate skills: ask the weather, ask stocks, ask the time, and so on.

In the large-model era, the skill is redefined. Understanding is centralized into the model. The skill no longer handles "understanding"; it is simply an execution unit in the skill layer: an invocable, constrainable, auditable action primitive. Connection and state management are still carried by the system layer. The model decides, the skill acts, the system draws the boundary.

What does "invocable, constrainable, auditable" mean? One might ask: can't an LLM call an API too? So what exactly is new about a skill? Isn't it just an API under a new name?

还是拿具体场景为例。

假设用户说:“帮我分析最近三个月 Tesla 的股价走势,如果有异常波动解释一下,并生成一张图。”

在传统 API 结构里——哪怕是 LLM 参与——通常是这样的:程序员预先写好流程。先调获取数据接口,再调分析接口,最后调绘图接口。LLM 可能只负责填参数。流程是写死的。失败怎么办?整段重跑。出现分支怎么办?提前写好判断逻辑。组合能力存在,但组合顺序在代码里,而不在模型里。

API 是工具,流程属于程序员;Skill 仍然是工具,但流程开始被模型掌握。

Inside the system there are no longer just "endpoints"; there is a skill registry. Fetch data, analyze trends, generate charts, generate explanations: these skills are explicitly described, registered, and placed into a visible skill space. What the model produces at the planning stage is an abstract plan: fetch the data first, then analyze the trend, generate an explanation if the fluctuation exceeds a threshold, and finally produce the chart. The order is no longer pre-written; it is decided at run time.
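A minimal sketch of that registry and runtime plan, assuming the planner (an LLM in practice, a stub function here) emits steps as plain data and expresses the branch as a predicate over intermediate results. All names and return values are illustrative:

```python
# Skill registry: skills are described and registered into a visible space.
registry = {}

def register(name, fn, description):
    registry[name] = {"fn": fn, "description": description}

register("fetch_data", lambda ctx: {"prices": [100, 102, 98, 130]},
         "Fetch recent closing prices")
register("analyze_trend", lambda ctx: {"max_move": 0.33},
         "Compute the largest day-over-day move")
register("explain", lambda ctx: {"note": "spike after earnings"},
         "Explain an abnormal fluctuation")
register("make_chart", lambda ctx: {"chart": "chart.png"},
         "Render a price chart")

def plan(threshold=0.1):
    """Stand-in for the model's abstract plan. The conditional step is a
    predicate on the running context, so the branch is taken (or not)
    at run time rather than being pre-written in code."""
    return ["fetch_data", "analyze_trend",
            ("explain", lambda ctx: ctx["max_move"] > threshold),
            "make_chart"]

def run(steps):
    ctx = {}
    for step in steps:
        name, guard = step if isinstance(step, tuple) else (step, None)
        if guard and not guard(ctx):
            continue          # branch skipped at run time
        ctx.update(registry[name]["fn"](ctx))
    return ctx
```

With the stubbed 33% move, `run(plan())` takes the explanation branch; with `plan(threshold=1.0)` it skips it. The same plan shape, different runtime paths.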

Note the shift: in the API era, composition logic lived in code; under the skill architecture, composition logic lives in the model's plan.

This is not an API with a new coat of paint; it is a migration of control.

Look one level deeper. Suppose the system has two trend-analysis skills, one fast but coarse, one slow but precise. In a traditional structure, you must decide in advance which version to call. Under a skill framework, the model can choose based on its reading of the user's prompt, on whether speed or precision matters more. Skills become objects to be compared, not fixed function calls.

Or take failure handling. If a step returns an error, the scheduler can retry that one skill instead of rerunning the whole flow. The system can track each skill's success rate, latency, and cost, feed those signals back into orchestration, and gradually optimize how skills are combined. To be fair, the API era could collect such statistics too, but they were mostly for operations: is the service alive, is it slow. With skills, the statistics start serving the scheduler: they tell you not only which interface is flaky, slow, or failing, but also what a failure at this step does to the whole task chain, whether it means a local stall, a cascading failure, or an immediate switch to a fallback path.
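Skill-level retry and telemetry can be sketched as follows, assuming skills are plain callables and the scheduler owns the statistics; the function names are illustrative:

```python
from collections import defaultdict

# Per-skill telemetry: counts that the scheduler can read back to decide
# on retries, fallbacks, and skill selection.
stats = defaultdict(lambda: {"calls": 0, "failures": 0})

def call_with_retry(name, fn, args, max_attempts=3):
    """Retry one skill in isolation; the rest of the plan is untouched."""
    for attempt in range(1, max_attempts + 1):
        stats[name]["calls"] += 1
        try:
            return fn(args)
        except Exception:
            stats[name]["failures"] += 1
            if attempt == max_attempts:
                raise     # let the orchestrator pick a fallback path

def failure_rate(name):
    s = stats[name]
    return s["failures"] / s["calls"] if s["calls"] else 0.0
```

A skill that fails twice and then succeeds is retried twice and completes, with its failure rate recorded, while every other step in the plan runs exactly once.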

This is where skills truly hold up. Of course, this closed loop of skill-level observation and optimization exists today mostly in the practice of leading teams; it is far from a large-scale, standardized reality. But the structure is in place, and what remains is scale and time.

An API is, at bottom, for programmers. A skill is planned by the model. The former assumes a human writes the flow; the latter assumes the model generates it. Once the power of composition migrates from programmer to model, the meaning of a skill changes. It is no longer just a function in a codebase; it is a node in a skill graph. The value of skills is not that they are more advanced than APIs, but that they make run-time composition possible while preserving industrial boundaries. Understanding is still carried by the large model; execution now has clear constraints. The step looks conservative; it is in fact industrialization.

A mature skill implies at least three things: its inputs and outputs are structured (a defined schema); its execution is retriable and reversible; its permissions are isolated and its state is auditable. You can limit its access scope, record its call chain, meter and bill it, and revoke its permissions at any time. None of this sounds glamorous, but it is exactly what enterprises care about.
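Permission isolation and an auditable call chain can be sketched together, assuming a simple set of granted scopes stands in for a real credential; the names are illustrative:

```python
# Every invocation, allowed or denied, is appended to the audit log, so
# the call chain can be reviewed and permissions revoked after the fact.
audit_log = []

def invoke_scoped(skill_name, required_scope, granted_scopes, fn, args):
    """Check the scope before executing, and record the attempt either way."""
    allowed = required_scope in granted_scopes
    audit_log.append({"skill": skill_name, "scope": required_scope,
                      "allowed": allowed, "args": args})
    if not allowed:
        raise PermissionError(f"{skill_name} needs scope '{required_scope}'")
    return fn(args)
```

Revoking access is just removing a scope from `granted_scopes`; the denied call still leaves an audit entry.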

It does not feel like a revolution; it feels like infrastructure work. In a sense, the skill is a compromise, a pragmatic concession made before the standards mature. I once heard an engineer put it this way: "Protocols are idealism; skills are realism." That is exactly it.

Perhaps the two routes will eventually converge. For now, they are probes on different time scales: one designing the order of the future, the other supporting the applications landing today.

4 Skill Density

If you read skills merely as architectural convergence, you are still underestimating them. What truly deserves attention is not how we organize skills, but how skills begin to form density.

For the past two years, talk of large models has rarely strayed from parameter counts, leaderboard results, and reasoning scores, as if a stronger model meant the ecosystem would naturally follow. But as model capabilities converge into the same tier, the gaps grow fine: the difference between a 97 and a 95 can hardly decide anyone's fate. At that point the question quietly changes direction: not who is smarter, but who has more genuinely usable skills standing behind them.

Imagine two models with nearly equivalent cognitive layers. One has twenty high-quality skills behind it; the other has two hundred. The former can solve twenty classes of problems; the latter can freely splice, stack, and recursively compose across its skills. Twenty skills are a toolbox; two hundred are a graph. A toolbox solves problems; a graph starts creating paths.

Once skills are modularized, their value stops being linear and becomes networked. Adding one skill adds not just one use, but many new possible combinations. The higher the density, the larger the combination space, and the more dimensions the system has for solving problems. That is the real meaning of skill density.
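A back-of-envelope calculation makes the non-linearity concrete. If, for illustration, a plan is modeled as an ordered pipeline of k distinct skills (ignoring type constraints between steps), the number of possible pipelines grows combinatorially with the size n of the skill space:

```python
from math import perm

def pipelines(n, k):
    """Ordered pipelines of k distinct skills drawn from n: n!/(n-k)!."""
    return perm(n, k)

# Ten times the skills yields vastly more than ten times the three-step
# pipelines: 20 skills give 6,840; 200 skills give 7,880,400.
```

A 10x increase in skills yields a more than 1000x increase in three-step pipelines, which is why density, not count, is the right lens.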

The mobile-internet era offered this hint long ago. What decided the platform wars was not the operating-system kernel itself, but app counts, distribution efficiency, payment systems, and developer activity. Kernel differences existed, but the real flywheel was the ecosystem. Once base capabilities converge, competition naturally shifts to the surrounding network structure. The Agent era may not support a complete analogy, but the directional similarity is already visible.

So the key question is no longer how many skills exist, but whether they can flow. Can they be retrieved? Can they be planned over by different models? Can they be reused across systems? Skills piled up inside a single platform are just inventory; only when they start connecting to and calling one another does density convert into network effects. By then, the model recedes backstage, becoming the cognitive engine driving the capability network rather than the protagonist at center stage.

This is also why protocols and skills, though they seem to diverge, may point to the same endpoint. Protocols are like highway standards; skills are like the vehicles and the cargo. Without unified standards, skills struggle to migrate across domains; but without real skills, a standard is an empty scaffold. Right now the industry is getting the vehicles moving first and paving the roads later. The two routes are not opposed; they advance at different tempos.

Finally, how far off is the "App Store moment" everyone is waiting for?

The mobile internet truly exploded because distribution matured, payments were wired through, users arrived at scale, and super-apps emerged. Agents have not reached such a node. There is no large third-party capability market, no skill store with stable distribution, no breakout application with network effects. Agents today look more like the early mobile internet: the SDKs exist, the developer enthusiasm exists, but the ecosystem flywheel has not formed.

The real inflection point may not be a few applications going viral, but the hardening of a structure: certain skill nodes begin to be reused at high frequency, certain composition paths become default patterns, a particular skill graph gradually becomes the de facto standard. When skill density is high enough, switching costs naturally rise, and the ecosystem quietly builds its moat.

The breakout in vertical industries seems perpetually "about to happen." Law, medicine, finance, education: efficiency gains are occurring, but structural reshaping has not truly appeared. Liability boundaries, regulatory constraints, and data silos are all far more complicated than anything the mobile internet faced.

Perhaps Agents will not explode in the mobile-era form at all. It may not be a store, not a download button, not a foreground app the user actively chooses. More likely it will embed into existing systems in the form of skills, existing as background capability. You may never realize you are using an Agent, yet the system will already have been quietly rewritten.

5 Memory: Guaranteeing Task Continuity

Memory may be the most easily underestimated advance of the year.

The biggest problem with early Agents was not that they weren't smart; it was that they were short-lived. Brilliant within one conversation, amnesiac in the next window. In an enterprise setting this is close to fatal. You cannot build a long-term working relationship, cannot accumulate project context, cannot sustain a continuous thread. Every task starts from zero; every collaboration feels like a first meeting.

Adding memory is not just about "understanding the user better"; it is about guaranteeing task continuity. Only when an Agent starts remembering preferences, constraints, past projects, and contextual background does it turn from a one-shot reasoning tool into a persistently existing system. Only when a system begins to have a history does it acquire organizational value.

But before discussing memory, several commonly conflated concepts need to be pulled apart. Long context, RAG, and persistent state are often lumped together as "memory," yet they sit at different levels.

Long context is more like working memory: it extends the model's attention span within the current task. The larger the window, the more history the model can weigh in a single inference pass. But it still belongs to the present; when the task ends, the attention dissipates.

RAG is more like a retrieval mechanism over external storage: when the model needs certain information, it fetches material from a knowledge base. It solves the problem of looking things up, not the problem of continuity. It lets the system find past information on demand, but it does not by itself create continuity in time.

Memory in the true sense is persistent. It involves at least three layers of structure.

The first layer is task state. Which step has the task reached? Which sub-steps are complete? Can it resume from a checkpoint? This determines whether the system can execute continuously, rather than starting over after every failure.

The second layer is long-term context. User preferences, organizational constraints, past projects, permission boundaries: these should not be re-explained in every conversation, but become background the system can update, retrieve, and inherit. It cuts the cost of repeated explanation, can be shared across tasks, and can settle into a stable rhythm of collaboration inside an organization.

The third layer is behavioral traces and decision history. What paths did the system choose in similar scenarios before? Which capability combinations proved more reliable? Which attempts failed? This begins to approach a structure of experience: not simply storing information, but accumulating patterns of action.

Only when these three layers take shape does an Agent gain continuity in time. It stops being merely an instantaneous reasoning engine and starts becoming a continuous entity. Its value is no longer measured by how clever a single answer is, but by its stability and accumulation across long-term collaboration.
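The first layer, task state, can be sketched in a few lines: persist progress after each step so that a crashed run resumes from its checkpoint instead of from step zero. The step names and the in-memory store are illustrative; a real system would flush the state to durable storage:

```python
def run_task(steps, state):
    """Execute (name, fn) steps, skipping any step already recorded in
    `state` (the checkpoint). `state` maps step name -> result."""
    for name, fn in steps:
        if name in state:
            continue          # completed in a previous run; do not redo
        state[name] = fn()
        # In practice the updated state would be written out here,
        # e.g. serialized to a file or database after every step.
    return state
```

If a run dies after the first step, rerunning with the saved state executes only the remaining steps.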

Of course, this path is still early. Long context remains expensive, RAG remains crude, and the mechanisms for updating and forgetting long-term memory are immature. Trickier still, memory brings not only efficiency but risk. Will errors become fossilized? Will biases accumulate? Should the system forget deliberately? In the world of continuous entities, forgetting often matters as much as remembering. Time is both an asset and a burden.

If skills solve the boundary of action, and skill density solves horizontal composition, then memory solves continuity. Without continuity, an Agent is forever just a clever tool; once it has time, it can become part of an organization.

6 The Importance of Open-Source Models

There is another thread quietly reshaping the global balance of power: the role of China's open-source large models.

Over the past year, anyone watching only the closed-source giants could easily have missed how fast open models leapt forward. Qwen (千问), Kimi, Step, and others began appearing frequently in developers' real workflows: not just chat tests, but running code, running Agent tasks, running multimodal processing.

Step 3.5 Flash, released by StepFun (阶跃星辰) before the Spring Festival, is a symbolically significant node.

Its significance lies not in "more parameters" but in a sense of direction. It adopts a sparse mixture-of-experts (MoE) structure: 196 billion total parameters, with only about 11 billion activated per pass. Not blind expansion, but an emphasis on efficiency and structure.

Where traditional models prop up long context by brute force with linear attention, it adopts a hybrid of sliding-window and global attention. Like reading a mystery novel: most attention stays on the current paragraph, but the key foreshadowing can be recalled quickly.

Where token-by-token generation is the default path, it introduces multi-token parallel prediction to raise speed.

These changes map precisely onto the core needs of the Agent era: longer context, lower latency, more stable logical execution.

An Agent is not a chatbot. It needs to wait on tool execution, stay consistent across multi-turn tasks, and respond quickly over long context.

Even more symbolic is local deployment.

When a model with a 256K context can run on a MacBook with 128 GB of memory, the power structure starts to shift. The Agent's "native brain" is no longer locked entirely inside cloud APIs. Developers can build private workflows on the device. This is a leveling of the endpoint.

Open source becomes critical here. Vertical industries will not lightly entrust their core processes to a closed-source black box. Medicine, finance, and law need a base model that is controllable, tunable, and deployable.

Open models lower the barrier to experimentation, and with it the barrier to innovation. Many vertical-Agent experiments are happening on top of exactly these models.

Conclusion

Sometimes I think the real change this year lies not in technical metrics but in mindset. We no longer ask, "Does it feel like an employee?" We ask, "Can it do things long-term, stably, governably?" It is a passage from fantasy to structure.

Protocols are still evolving. Skills are expanding. Memory is consolidating. Open models are growing ever more affordable. Vertical applications are testing the water. Everything is under way; there simply has not been enough time for any of it to mature.

If this year taught us anything, perhaps it is this: technical revolutions rarely arrive with a bang; they seep in slowly. Only when you realize it has become part of the structure has it truly happened.

The fog has not lifted. But the outline has appeared.

 

From 腾讯科技 (Tencent Tech); produced by 晓静