<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>AI on Hongjiang Bao's Blog</title><link>http://baohongjiang.com/en/tags/ai/</link><description>Recent content in AI on Hongjiang Bao's Blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Thu, 23 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="http://baohongjiang.com/en/tags/ai/index.xml" rel="self" type="application/rss+xml"/><item><title>Standing in 2026 — The Next Stage of How I Use AI</title><link>http://baohongjiang.com/en/p/standing-in-2026-the-next-stage-of-how-i-use-ai/</link><pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate><guid>http://baohongjiang.com/en/p/standing-in-2026-the-next-stage-of-how-i-use-ai/</guid><description>&lt;h2 id="foreword"&gt;Foreword
&lt;/h2&gt;&lt;p&gt;Around this time last year I wrote &lt;a class="link" href="http://baohongjiang.com/en/p/standing-in-2025-looking-back-at-ai-and-looking-forward/" &gt;Standing in 2025 — Looking Back at AI, and Looking Forward&lt;/a&gt;. My take back then was: &lt;strong&gt;AI is an amplifier — own the framework, hand the details to AI.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A year on, that conclusion isn&amp;rsquo;t &lt;em&gt;wrong&lt;/em&gt;. It&amp;rsquo;s just no longer enough.&lt;/p&gt;
&lt;p&gt;Because &amp;ldquo;have AI handle the details&amp;rdquo; has itself splintered into multiple wildly different layers. The productivity gap between people stuck on different layers is widening fast.&lt;/p&gt;
&lt;p&gt;And I&amp;rsquo;ve personally hit a new bottleneck. This post is me writing down how I see that bottleneck and what I think the next stage is.&lt;/p&gt;
&lt;h2 id="my-five-stages-of-using-ai"&gt;My five stages of using AI
&lt;/h2&gt;&lt;p&gt;A quick recap of how my own usage has evolved:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Form&lt;/th&gt;
&lt;th&gt;Representative&lt;/th&gt;
&lt;th&gt;What I do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;L1&lt;/td&gt;
&lt;td&gt;Web chat&lt;/td&gt;
&lt;td&gt;ChatGPT web&lt;/td&gt;
&lt;td&gt;Paste question in, paste answer out&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L2&lt;/td&gt;
&lt;td&gt;IDE plugin&lt;/td&gt;
&lt;td&gt;GitHub Copilot&lt;/td&gt;
&lt;td&gt;AI completes alongside me; I&amp;rsquo;m in the driver&amp;rsquo;s seat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L3&lt;/td&gt;
&lt;td&gt;AI-native IDE&lt;/td&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;AI edits multiple files; I review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L4&lt;/td&gt;
&lt;td&gt;Terminal-native agent&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;AI can touch my whole machine; I confirm via dialogue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L5&lt;/td&gt;
&lt;td&gt;?&lt;/td&gt;
&lt;td&gt;?&lt;/td&gt;
&lt;td&gt;?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The throughline is one thing: &lt;strong&gt;AI&amp;rsquo;s operating boundary keeps expanding, and humans are progressively freed from layer after layer of concrete operation.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Each leap is an order-of-magnitude jump.&lt;/p&gt;
&lt;h2 id="the-bottleneck-top-experts-cant-get-enough-out-of-three-max-20xs-i-cant-even-use-up-one-pro"&gt;The bottleneck: top experts can&amp;rsquo;t get enough out of three Max 20x&amp;rsquo;s; I can&amp;rsquo;t even use up one Pro
&lt;/h2&gt;&lt;p&gt;I just bought Claude&amp;rsquo;s Max 5x. The result: &lt;strong&gt;I can&amp;rsquo;t even fully use it.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Meanwhile, some top developers publicly say they can&amp;rsquo;t get enough capacity out of running three Max 20x accounts at the same time.&lt;/p&gt;
&lt;p&gt;That contrast made me stop and think — same tools, why can they burn through ten times the compute? Experts this strong, even &lt;em&gt;they&lt;/em&gt; don&amp;rsquo;t have enough? Where&amp;rsquo;s the actual gap? &lt;strong&gt;How do I close it?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Thinking it through, the answer is clear:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I&amp;rsquo;m the bottleneck.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For every task I&amp;rsquo;m still going back and forth with Claude Code:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Search for related code first before changing anything&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;That approach isn&amp;rsquo;t right, try a different angle&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Run the tests first&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Confirm before committing&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&amp;rsquo;m running 3–5 Agents in parallel and my brain is already taxed. Human context-switching has a cost; at most you can manage 5–7.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s a strange feeling — &lt;strong&gt;I&amp;rsquo;m holding a rocket but I&amp;rsquo;m still shifting gears one at a time.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="the-next-stage-from-operator-to-legislator"&gt;The next stage: from &amp;ldquo;operator&amp;rdquo; to &amp;ldquo;legislator&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;How do you break this bottleneck?&lt;/p&gt;
&lt;p&gt;After working through Anthropic&amp;rsquo;s public docs, the workflow notes from Boris Cherny (the creator of Claude Code), and a bunch of heavy-user practices, my conclusion is:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The next stage isn&amp;rsquo;t &amp;ldquo;manage more Agents.&amp;rdquo; It&amp;rsquo;s that you stop personally managing Agents at all.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That sounds abstract, but unpack it and it&amp;rsquo;s very concrete:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;L4 (where I am)&lt;/th&gt;
&lt;th&gt;L5 (next stage)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Am I in the loop?&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What do I do?&lt;/td&gt;
&lt;td&gt;Talk, confirm, correct&lt;/td&gt;
&lt;td&gt;Write rules, define acceptance, arbitrate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;When does AI stop?&lt;/td&gt;
&lt;td&gt;When I say stop&lt;/td&gt;
&lt;td&gt;When the rule says it&amp;rsquo;s done&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;If I leave the keyboard for 8 hours&lt;/td&gt;
&lt;td&gt;The system halts, waiting for me&lt;/td&gt;
&lt;td&gt;The system has been making progress for 8 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;The single test for L5: when you walk away from the keyboard for 8 hours and come back, has the system stopped waiting for you, or has it already finished its run?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In L5, the focus of your work fundamentally changes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Write &lt;strong&gt;specs&lt;/strong&gt;, not prompts&lt;/li&gt;
&lt;li&gt;Define acceptance criteria (tests, lint, human-review checkpoints), don&amp;rsquo;t review every step&lt;/li&gt;
&lt;li&gt;Design &lt;strong&gt;hooks and guardrails&lt;/strong&gt;, not confirmation buttons&lt;/li&gt;
&lt;li&gt;Build a &lt;strong&gt;feedback loop&lt;/strong&gt; (failures auto-feed back to the AI to keep iterating), don&amp;rsquo;t manually retry&lt;/li&gt;
&lt;/ul&gt;
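The four bullets above can be sketched as a single loop. This is a hypothetical sketch, not a real agent API: `run_agent` and `accept` are invented stand-in callables (in practice they might shell out to a coding-agent CLI and run the test suite), and the point is that a rule, not a confirmation dialogue, decides when the work is done.

```python
# Hypothetical sketch of the L5 loop: an acceptance rule (`accept`), not a
# human, decides when the agent is done, and failures feed back automatically.
# `run_agent` and `accept` are illustrative stand-ins, not a real agent API.

def drive(task, run_agent, accept, max_rounds=5):
    """Run an agent on `task` until the acceptance rule passes.

    Returns the round number that passed, or None if the budget is
    exhausted -- which is the signal to escalate to a human.
    """
    feedback = None
    for round_no in range(1, max_rounds + 1):
        run_agent(task, feedback)   # agent does the work, prior failures as context
        ok, report = accept()       # acceptance rule: tests, lint, checks...
        if ok:
            return round_no         # the rule says "done" -- no confirmation step
        feedback = report           # auto-feed the failure back; no manual retry
    return None                     # rule never passed: flag red for human judgment
```

A human only ever sees the `None` cases; everything else finishes, or retries, on its own.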
&lt;p&gt;You open your laptop in the morning, and what you see is no longer &amp;ldquo;Claude is waiting on me to approve something.&amp;rdquo; It&amp;rsquo;s: &lt;strong&gt;&amp;ldquo;Out of last night&amp;rsquo;s 12 PRs, 9 already passed automated acceptance, 3 are flagged red for me to judge.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The only thing you do: &lt;strong&gt;look at those 3 reds, and patch the rule that let them turn red.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="im-already-crawling-toward-it-what-l45-actually-feels-like"&gt;I&amp;rsquo;m already crawling toward it: what L4.5 actually feels like
&lt;/h2&gt;&lt;p&gt;L5 sounds far off, but my L4 has actually been transitioning into L4.5.&lt;/p&gt;
&lt;p&gt;The feeling is already different. Claude Code can run remotely, run in the background; I&amp;rsquo;m not staring at every line it writes. Most of my day is spent &lt;strong&gt;looking at the reports it hands me&lt;/strong&gt;, then doing a few things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Judge whether the direction is right&lt;/li&gt;
&lt;li&gt;Decide: keep going, switch approach, or send it back to redo&lt;/li&gt;
&lt;li&gt;Make calls and decisions at key checkpoints&lt;/li&gt;
&lt;li&gt;Handle whatever it can&amp;rsquo;t crack and gets stuck on&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Honestly, I&amp;rsquo;m now more like a remote-control manager than a programmer.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I didn&amp;rsquo;t write the code, but &lt;strong&gt;I set the direction, I drew the boundaries, I judged the quality&lt;/strong&gt;. It&amp;rsquo;s a strange state — the rhythm has changed completely. I get a noticeably larger amount done in a day, but every individual decision carries far more weight. One bad direction can torch hours of downstream output.&lt;/p&gt;
&lt;p&gt;This still isn&amp;rsquo;t L5. L5 is when even the &amp;ldquo;look at the reports&amp;rdquo; step is partly handed off — the rules auto-filter 80% of the content, and I only look at the remaining 20%. But this transition state, L4.5, is already enough to make me feel it: &lt;strong&gt;managing AI feels nothing like writing code yourself. They are two completely different jobs.&lt;/strong&gt;&lt;/p&gt;
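That 80/20 filtering step can be sketched as rule-based triage: each rule is a predicate over an agent's report, and only reports that fail a rule reach the human. A minimal hypothetical sketch, with invented report fields (`tests_passed`, `diff_lines`, `touches_migrations`) rather than any real schema:

```python
# Hypothetical sketch of "rules auto-filter 80%": reports that pass every
# rule are handled automatically; the rest are queued for human judgment.
# The report fields below are invented for illustration.

def triage(reports, rules):
    """Split reports into (auto_handled, needs_human) using rule predicates."""
    auto, manual = [], []
    for report in reports:
        if all(rule(report) for rule in rules):
            auto.append(report)     # every rule passed: no human needed
        else:
            manual.append(report)   # at least one red flag: human judges
    return auto, manual

# Example rules over a report dict (fields are assumptions, not a real schema):
rules = [
    lambda r: r["tests_passed"],              # acceptance suite is green
    lambda r: r["diff_lines"] < 400,          # small enough to trust unreviewed
    lambda r: not r["touches_migrations"],    # risky areas always go to a human
]
```

Patching "the rule that let them turn red" then means editing this list, not re-reviewing every report.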
&lt;h2 id="the-productivity-gap-is-widening"&gt;The productivity gap is widening
&lt;/h2&gt;&lt;p&gt;This is the thing I most want to call out: &lt;strong&gt;the productivity gap between people stuck on different stages is widening at an order-of-magnitude rate.&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;L1&lt;/strong&gt; people use AI to answer questions and look stuff up. Limited gains, but real.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;L2 / L3&lt;/strong&gt; people fold AI into their code-editing flow. Multiple times more efficient than L1.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;L4&lt;/strong&gt; people get AI to run and complete a whole task autonomously. Multiple times more efficient again.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;L5&lt;/strong&gt; people have a dozen tasks running in parallel and only do arbitration and legislation. Another order of magnitude on top.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The thresholds between these layers aren&amp;rsquo;t linear. L1 → L2 is easy: install a plugin. L4 → L5 is &lt;em&gt;very&lt;/em&gt; hard — it requires you to redesign the entire shape of your work. &lt;strong&gt;You stop being &amp;ldquo;the person using the tool&amp;rdquo; and become &amp;ldquo;the person designing the rules for using the tool.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="what-it-asks-of-you-both-knowledge-density-and-total-knowledge"&gt;What it asks of you: both knowledge density &lt;em&gt;and&lt;/em&gt; total knowledge
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the most counter-intuitive part.&lt;/p&gt;
&lt;p&gt;A lot of people assume the AI era lowers the bar — &amp;ldquo;AI handles details, I just need a sketch.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Wrong. The exact opposite.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The AI era &lt;strong&gt;raises&lt;/strong&gt; the bar — and on &lt;strong&gt;two axes at once&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Total knowledge — you need to know more.&lt;/strong&gt; Judging direction, drawing boundaries, arbitrating: every one of those decisions stands on actually understanding the domain. If you only know a tech stack at a surface level, you literally cannot tell that the code AI wrote is bad.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Knowledge density — your judgments per unit time must be higher and sharper.&lt;/strong&gt; In L5, your day looks like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;10 minutes reading a PR summary, decide whether to merge&lt;/li&gt;
&lt;li&gt;5 minutes reading a failure report, decide whether to fix code or fix the rule&lt;/li&gt;
&lt;li&gt;20 minutes writing a spec that defines a feature&amp;rsquo;s acceptance edges&lt;/li&gt;
&lt;li&gt;30 minutes reviewing a new rule, judging whether it&amp;rsquo;ll friendly-fire other tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Every action is a decision; nothing is &amp;ldquo;execution.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In less time you have to make more decisions, more accurately: see at a glance where the AI will derail; while writing the spec, predict where the boundary will explode; the moment you see a new rule, know what it&amp;rsquo;ll friendly-fire.&lt;/p&gt;
&lt;p&gt;These are exactly the abilities AI cannot replace. Because at their core they&amp;rsquo;re &lt;strong&gt;experience + taste + judgment&lt;/strong&gt;, not information processing.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;My read: &lt;strong&gt;second half of this year through the first half of next year&lt;/strong&gt;, the first genuinely native L5 product will land — possibly Anthropic itself, possibly a third party on top of the Claude API. By that point, the gap between people will be wider still.&lt;/p&gt;</description></item><item><title>Standing in 2025 — Looking Back at AI, and Looking Forward</title><link>http://baohongjiang.com/en/p/standing-in-2025-looking-back-at-ai-and-looking-forward/</link><pubDate>Wed, 12 Mar 2025 00:00:00 +0000</pubDate><guid>http://baohongjiang.com/en/p/standing-in-2025-looking-back-at-ai-and-looking-forward/</guid><description>&lt;p&gt;Foreword:
It&amp;rsquo;s now 2025, and AI is white-hot. In this post I want to share my personal take on AI — what I understand, what I expect, and how it&amp;rsquo;s helping my career and life.&lt;/p&gt;
&lt;h2 id="looking-at-the-present-from-the-past-and-predicting-the-future-from-now"&gt;Looking at the present from the past, and predicting the future from now.
&lt;/h2&gt;&lt;h3 id="an-article-from-january-2015-that-predicted-todays-ai"&gt;An article from January 2015 that predicted today&amp;rsquo;s AI
&lt;/h3&gt;&lt;p&gt;On &lt;strong&gt;January 27, 2015&lt;/strong&gt;, an article was published that completely upended my view of AI. Here&amp;rsquo;s the (Chinese-translated) link:
&lt;a class="link" href="https://zhuanlan.zhihu.com/p/19950456" target="_blank" rel="noopener"
&gt;Artificial Intelligence may very well lead to humanity&amp;rsquo;s immortality or extinction — and it&amp;rsquo;s quite possible all of this will happen within our lifetimes&lt;/a&gt; (in Chinese)
If you&amp;rsquo;re interested, please read the whole thing! I strongly recommend it!
In this post I&amp;rsquo;ll just lift two of its conclusions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Human technological progress is exponential!
&lt;img src="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image.png"
width="1120"
height="688"
srcset="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image_hu_c739e075b9c35979.png 480w, http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image_hu_2a9dba521881dfda.png 1024w"
loading="lazy"
alt="alt text"
class="gallery-image"
data-flex-grow="162"
data-flex-basis="390px"
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;AI&amp;rsquo;s growth could leap past human cognition in an instant!
&lt;img src="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-1.png"
width="1376"
height="1124"
srcset="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-1_hu_408c1a35966e6ad.png 480w, http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-1_hu_c11b3f0903ed8ea4.png 1024w"
loading="lazy"
alt="alt text"
class="gallery-image"
data-flex-grow="122"
data-flex-basis="293px"
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At the time, most people would have called this nonsense. After all, in January 2015 OpenAI didn&amp;rsquo;t even exist. Look at the timeline:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;span class="lnt"&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2015 — OpenAI founded, focused on AGI.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2016 — AlphaGo beats Lee Sedol; AI surpasses humans at Go.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2017 — Transformers (Google) ignite the NLP revolution and become the foundation for GPT, BERT, etc.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2019 — GPT-2 (OpenAI) released, showing strong text generation.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2020 — GPT-3 (175B parameters) released, kicking off the AIGC craze.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2022 — ChatGPT (GPT-3.5) released and instantly explodes.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2023 — GPT-4 released, far more capable, multimodal (text + images).
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2023 onward — AI products everywhere.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you read that article, you should be a little stunned right now. Its predictions are almost spot on — incredibly forward-looking!
&lt;strong&gt;Ten years ago, when the internet had only just gone mainstream and most people had only just gotten a smartphone, the author already predicted, accurately, where AI would be a decade later.&lt;/strong&gt;
Once again, I strongly recommend everyone who hasn&amp;rsquo;t read it go read it.&lt;/p&gt;
&lt;p&gt;History has confirmed the article&amp;rsquo;s accuracy. So let&amp;rsquo;s see what that 2015 article predicted for &lt;em&gt;after&lt;/em&gt; 2025.
&lt;img src="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-2.png"
width="1190"
height="979"
srcset="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-2_hu_1a87f551a058d522.png 480w, http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-2_hu_99fd9718dc53ef8b.png 1024w"
loading="lazy"
alt="alt text"
class="gallery-image"
data-flex-grow="121"
data-flex-basis="291px"
&gt;
The article splits AI into three stages: Artificial Narrow Intelligence (ANI), Artificial General Intelligence (AGI), and Artificial Super Intelligence (ASI). I think most of us would agree that we&amp;rsquo;ve now reached the AGI stage.&lt;/p&gt;
&lt;p&gt;My personal take:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The moment ASI arrives, its intelligence will, in an instant, dwarf the sum of all human intelligence — growing at unbelievable, absurd exponential rates. Humanity will then either be extinguished or made immortal.&lt;/li&gt;
&lt;li&gt;The most optimistic estimate: ASI in 2030. Conservative: 2050. Pessimistic: 2080, or never.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;All of today&amp;rsquo;s AI is essentially modeled on the human brain. Look at this from a compute angle:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;As of 2024, the world&amp;#39;s fastest supercomputer (e.g. Frontier) has reached 1.2 EFLOPS (1.2 × 10¹⁸ FLOPS).
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Because the human brain doesn&amp;#39;t run like a computer, direct comparisons are hard, but typical estimates put it at 1 EFLOPS to 100 EFLOPS (10¹⁸ to 10²⁰ FLOPS).
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;As you can see, today&amp;rsquo;s strongest compute is approaching the brain&amp;rsquo;s. I&amp;rsquo;ve always believed that once computer compute exceeds the brain&amp;rsquo;s, machine intelligence will get very close to human intelligence too. And right now something like o3 (whose compute is still nowhere near a brain&amp;rsquo;s) is already far past the vast majority of humans. So what about the future, as compute keeps growing?&lt;/p&gt;
&lt;p&gt;In short: I think AI intelligence will continue to compound exponentially, just as that article predicted, possibly all the way to AGI and beyond.&lt;/p&gt;
&lt;h3 id="an-article-i-wrote-in-december-2023-about-how-i-understand-and-use-ai"&gt;An article I wrote in December 2023 about how I understand and use AI
&lt;/h3&gt;&lt;p&gt;Back in the GPT-3 era I&amp;rsquo;d already started using AI heavily. GPT-4&amp;rsquo;s arrival especially gave my technical chops, vision, and thinking a quantum leap.
Original: &lt;a class="link" href="http://baohongjiang.com/en/p/how-i-use-ai-notes-from-the-field/" &gt;How I Use AI — Notes from the Field&lt;/a&gt;
Now, with even more, even stronger AI products around, my hands-on ability has grown massively, and I&amp;rsquo;m even more convinced that what I wrote then was correct.
That&amp;rsquo;s exactly what the next section is about.&lt;/p&gt;
&lt;h2 id="how-ai-completely-flips-the-way-i-solve-problems-and-think-about-them"&gt;How AI completely flips the way I solve problems and think about them.
&lt;/h2&gt;&lt;p&gt;Let me re-quote my own conclusion:
&lt;strong&gt;Human life is finite. We can&amp;rsquo;t master that much knowledge or that many skills.
AI overturns the traditional learn-then-execute pipeline: master the framework, leave the details to AI, and your efficiency and ceiling go way up.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Some specific points:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;span class="lnt"&gt;12
&lt;/span&gt;&lt;span class="lnt"&gt;13
&lt;/span&gt;&lt;span class="lnt"&gt;14
&lt;/span&gt;&lt;span class="lnt"&gt;15
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;### AI completely changes how we learn knowledge and master skills
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;The traditional model demands you internalize a full system; AI fills in the specific details and dramatically speeds up learning.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;You only need to grasp the &amp;#34;trunks&amp;#34; of the knowledge tree; AI fills in the &amp;#34;leaves.&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;### AI accelerates execution — going from learning to shipping is way faster
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Projects that previously required full mastery before you could start can now be moved forward as long as you know it&amp;#39;s broadly feasible — AI handles the rest.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Examples: building a blog, an AI WeChat public account, an AI chat website — all started as a concept, then AI fleshed out the details and got it shipped.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;### AI is an efficient executor, but the ceiling is still set by you
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;AI helps with problems that already have mature solutions, but on frontier exploratory problems it still struggles.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Your own ability and depth of thought decide the final outcome — AI is just an amplifier.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;### Practice is everything; the best way to use AI is &amp;#34;do it with your hands&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Real growth comes from actually doing it. The tricks come naturally as you go.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&amp;#34;Use it if you can, just dive in&amp;#34; — far more important than talking about technique.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;For me personally, AI now touches &lt;strong&gt;80%+&lt;/strong&gt; of what I do, work and dev included.
DeepSeek R1&amp;rsquo;s reasoning can attack a problem from every angle, and its breadth of knowledge far exceeds mine. If one day I genuinely internalize that kind of structured thinking, I can&amp;rsquo;t even imagine the result.
As I said: practice is everything. Use it deeply and you&amp;rsquo;ll get it; without using it, all the talk is just moonlight on water — useless.&lt;/p&gt;
&lt;h2 id="roundup-of-ai-products-as-of-today"&gt;Roundup of AI products as of today
&lt;/h2&gt;&lt;p&gt;As of writing (March 12, 2025), here&amp;rsquo;s my summary of the popular AI products out there, focusing on the ones I&amp;rsquo;ve personally used: their characteristics, pros, and cons. It covers more than just LLMs. For your reference:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;LLMs&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GPT family
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;o3, o1&lt;/strong&gt; — currently the strongest &lt;em&gt;and&lt;/em&gt; the most expensive reasoning models. Massive context (o1 = 200K), the strongest reasoning and logic right now. Especially good at code analysis and logic. I only call this big brother in when its little siblings can&amp;rsquo;t solve a problem.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;o1-mini, o3-mini, o3-mini-high&lt;/strong&gt; — mini versions with better speed and price/performance trade-offs. For hard code problems, I usually try them first.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;4o&lt;/strong&gt; — the multimodal one. Well-rounded, fast. Reads images directly, parses audio. Speed and price are excellent. My most-used model right now.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;4.5&lt;/strong&gt; — newest GPT, way better creativity and &amp;ldquo;humanness,&amp;rdquo; but seriously expensive. I save it for creative work and copy.&lt;/li&gt;
&lt;li&gt;Other GPT models — all weaker siblings; the only upside is a slightly cheaper price. Not worth discussing.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;DeepSeek family
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;R1&lt;/strong&gt; — currently the strongest Chinese-language reasoning model and a price/performance monster. You&amp;rsquo;ve all seen how strong it is. Cheap, especially good at very Chinese-flavored reasoning problems. Heads up: its coding ability is weaker than other models — really, don&amp;rsquo;t use it for code. I use it for problems with strong Chinese-language texture. Very grounded.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;V3&lt;/strong&gt; — non-reasoning version. I rarely use it.&lt;/li&gt;
&lt;li&gt;Other parameter sizes — DeepSeek&amp;rsquo;s strength is hitting GPT-4o-class performance with the lowest compute and hardware. A lot of websites sneakily pass off 32B / 70B versions as the full 671B. I&amp;rsquo;m calling it out because &lt;strong&gt;R1 32B 4-bit quantized&lt;/strong&gt; is the best model you can run locally on a 4090 24G consumer GPU. From experience, useful for things involving confidentiality.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Claude family
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Claude 3.7&lt;/strong&gt; — multiple variants; the everyday one is Sonnet. Currently the second-best after GPT. Very pleasant to use; many heavy users say its coding ability beats GPT-4o. Great UX — they pioneered Artifacts, and GitHub Copilot added it to its paid plans, which says a lot about its price/performance for code. I use it to cross-check GPT&amp;rsquo;s code.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Grok family
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Grok 3&lt;/strong&gt; — Musk&amp;rsquo;s AI. Reportedly trained with more compute than any other model right now. In code it doesn&amp;rsquo;t have an obvious edge over the others. Worth noting: its content moderation is extremely loose.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Gemini family
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Gemini 2.0&lt;/strong&gt; — Google&amp;rsquo;s. Thanks to Google&amp;rsquo;s infrastructure muscle, very fast. Natively multimodal. Search ability clearly beats other models. (Of course — Google!) I don&amp;rsquo;t use it much; intelligence-wise it&amp;rsquo;s not clearly ahead.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Other / local models
&lt;ul&gt;
&lt;li&gt;On Hugging Face there&amp;rsquo;s a flood of models, plus QwQ from China and others — each with their own strengths. Most are smaller, specialized for a specific domain. With limited bandwidth, I generally don&amp;rsquo;t bother.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Manus
&lt;ul&gt;
&lt;li&gt;Judging from all the noise, it sits somewhere between hype and scam. Once people can actually use it, we&amp;rsquo;ll see whether it&amp;rsquo;s real gold.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;AI image generation&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Stable Diffusion&lt;/strong&gt; — the famous SD. Open source, deeply customizable. Different base models = different styles. Tons of plugins. Easy to deploy and run on a home PC. Downsides: output is fully tied to the base model, and the learning curve is steep.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Midjourney&lt;/strong&gt; — also famous (MJ). Very strong, only available as a service. Wide range of styles. Downsides: little customization, expensive. The polar opposite of SD.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DALL·E 3&lt;/strong&gt; — way behind the above two. The only upside is integration into the native ChatGPT web. Not really useful.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;TTS (text-to-speech)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;VITS family&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;VITS was originally released by a Chinese developer; lots of forks and downstream work. Currently the strongest open-source one is GPT-SoVITS. With just a few minutes of source audio, it can produce highly similar multilingual speech. A home PC is enough to fine-tune and infer.
Some &amp;ldquo;do it with your hands&amp;rdquo; results from me:
Inferred version of my own voice
&lt;audio controls&gt;
&lt;source src="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/my_AI.wav" type="audio/wav"&gt;
&lt;/audio&gt;
WW2 commentary, original voice
&lt;audio controls&gt;
&lt;source src="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/ww2Start.wav" type="audio/wav"&gt;
&lt;/audio&gt;
WW2 commentary, AI-synthesized
&lt;audio controls&gt;
&lt;source src="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/ww2Start_AI.wav" type="audio/wav"&gt;
&lt;/audio&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Other vendors — Microsoft TTS, Google TTS, Douyin, all closed-source. Custom fine-tuning is expensive; otherwise you&amp;rsquo;re stuck with their pretrained voices. That said, the out-of-the-box quality is already excellent!&lt;/li&gt;
&lt;/ul&gt;
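&lt;p&gt;To make &amp;ldquo;a home PC is enough to fine-tune and infer&amp;rdquo; concrete: GPT-SoVITS ships an api.py that runs a small local HTTP inference server. Below is a minimal client sketch for it; the port and parameter names are assumptions based on a typical deployment, so check the api.py of your own version.&lt;/p&gt;

```python
# Sketch: client for a locally running GPT-SoVITS inference server.
# Port and parameter names are assumptions -- verify against the
# api.py shipped with your GPT-SoVITS checkout.
import json
import urllib.request

def build_request(text, ref_wav, prompt_text, url="http://127.0.0.1:9880"):
    """Build the JSON POST request for one synthesis call."""
    body = {
        "refer_wav_path": ref_wav,    # a few seconds of reference audio
        "prompt_text": prompt_text,   # transcript of that reference clip
        "prompt_language": "zh",
        "text": text,                 # what the cloned voice should say
        "text_language": "zh",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )

def synthesize(text, ref_wav, prompt_text, out_path="out.wav"):
    """POST the request and save the returned audio bytes."""
    req = build_request(text, ref_wav, prompt_text)
    with urllib.request.urlopen(req) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```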
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;STT (speech-to-text)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Whisper&lt;/strong&gt; — OpenAI&amp;rsquo;s open-source model, supports 100+ languages. Currently the strongest open-source STT. Runs locally on a home PC.&lt;/li&gt;
&lt;li&gt;Others — Microsoft, Google, iFlytek, etc., all have plenty of APIs.&lt;/li&gt;
&lt;/ul&gt;
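&lt;p&gt;For a sense of how little code local Whisper needs, here is a minimal sketch that turns a recording into SRT-style subtitles. The model name and file path are placeholders, and the timestamp helper is my own:&lt;/p&gt;

```python
# Sketch: local Whisper transcription to SRT-style subtitles.
# "base" and the audio path are placeholders.
def format_timestamp(seconds):
    """Convert float seconds into an SRT timestamp like 01:01:01,500."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def transcribe_to_srt(path, model_name="base"):
    """Run Whisper locally and return the result as SRT text."""
    import whisper  # pip install openai-whisper
    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    blocks = []
    for i, seg in enumerate(result["segments"], start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```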
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Other&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Suno&lt;/strong&gt; — AI music generation, sounds good, but practical use is still weak. Future looks promising.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sora&lt;/strong&gt; — video synthesis is everywhere now, but most output looks weird. Another path is heavily customized SD with image stitching for video. Both are being explored.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Combo / tooling&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GitHub Copilot&lt;/strong&gt; — strongly recommended. Microsoft&amp;rsquo;s flagship. Deeply integrated into IDEs, especially VS Code and Visual Studio. Free tier exists, paid is $10/month. Wraps GPT, Claude, and Gemini. Once it sees your IDE context, it&amp;rsquo;s incredibly convenient.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cursor&lt;/strong&gt; — an AI-first IDE built on VS Code, neck-and-neck with Copilot. Also very convenient. But because it&amp;rsquo;s an IDE rather than a plugin, you give up some flexibility.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Poe&lt;/strong&gt; — an aggregator over multiple AIs; basically a wrapper frontend that calls each vendor&amp;rsquo;s API. Pros: one-stop access, some free quota. Cons: API calls usually fall short of native vendor sites in features and quality.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="my-personal-outlook-on-the-future-of-ai"&gt;My personal outlook on the future of AI
&lt;/h2&gt;&lt;p&gt;I think AI has already permanently changed how I think and how I solve problems.
&lt;del&gt;Of course, if ASI shows up and humanity gets wiped out, none of this matters. So our default assumption has to be that AI carries us up the next tech tier.&lt;/del&gt;
In this era you have to keep up, keep leveling up, keep absorbing new knowledge.
&lt;strong&gt;The only constant in the world is change itself.&lt;/strong&gt;
After watching the overheated boom in the Chinese market, I&amp;rsquo;ve concluded: AI on its own doesn&amp;rsquo;t generate huge value the way some earlier technologies did. It only produces magic when combined with another field: internet development, traditional retail, traditional manufacturing, literature &amp;amp; film, education, and so on.&lt;/p&gt;
&lt;p&gt;Either way, we should all think like founders. Letting AI fill in the details and do the grunt work is the right move. Build out your own knowledge framework, stop sweating the details, and grasp the essence of the problem and its main contradiction.
In the future, I&amp;rsquo;ll go even deeper with AI in everything I do. I also hope to lift my whole team&amp;rsquo;s AI fluency and improve the workflow.
Here are a few practical AI deployments I&amp;rsquo;ve distilled from work — some already shipped, some that I want my team to ship later:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Code dev&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Heavy use of GitHub Copilot to dramatically speed up coding and to auto-catch errors. Especially good for structured code. For unfamiliar libraries and APIs, it&amp;rsquo;s totally possible to be in a state of &lt;em&gt;&amp;ldquo;I don&amp;rsquo;t remember it, but somehow I can use it&amp;rdquo;&lt;/em&gt; / &lt;em&gt;&amp;ldquo;I haven&amp;rsquo;t read the docs, but once I get the framework I can write code right away.&amp;rdquo;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Use git hooks to AI-review every commit. Keeps quality up and obvious mistakes out.&lt;/li&gt;
&lt;li&gt;Standardize project layout. Use the &lt;code&gt;tree&lt;/code&gt; command (Windows) plus AI to keep directory structure clean.&lt;/li&gt;
&lt;li&gt;Reasoning about new requirements. When you face something you&amp;rsquo;ve never built, your first plan often misses corners. Ask AI to analyze: what&amp;rsquo;s the solution path, what should you watch for?&lt;/li&gt;
&lt;/ul&gt;
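&lt;p&gt;The git-hook review idea above can be sketched roughly like this. ask_model is a placeholder for whatever LLM API you wire in, and the OK/REJECT reply convention is my own assumption, not a standard:&lt;/p&gt;

```python
# Sketch of an AI pre-commit review hook. ask_model() is a placeholder
# for a real LLM API call that returns the model's reply as a string.
import subprocess
import sys

def staged_diff():
    """Return the diff of everything currently staged for commit."""
    out = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout

def parse_verdict(review):
    """The model is instructed to start its reply with OK or REJECT."""
    words = review.strip().split(None, 1)
    return bool(words) and words[0].upper() == "OK"

def main(ask_model):
    diff = staged_diff()
    if not diff:
        return 0  # nothing staged, let the commit through
    review = ask_model("Reply OK or REJECT, then one line per issue:\n" + diff)
    if parse_verdict(review):
        return 0
    print(review, file=sys.stderr)
    return 1  # non-zero exit aborts the commit
```

&lt;p&gt;Saved as .git/hooks/pre-commit (with ask_model filled in and main&amp;rsquo;s return value passed to sys.exit), a rejected review stops the commit before it lands.&lt;/p&gt;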
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Game design / config&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Use AI to name variables. Few people pull off &amp;ldquo;faithful, expressive, elegant&amp;rdquo; naming. Things get confusing fast — AI helps a lot.&lt;/li&gt;
&lt;li&gt;Use AI for localization, even as one-click scripts. You get passable multi-language support at low cost.&lt;/li&gt;
&lt;li&gt;Use AI for Excel formulas and number-crunching. Heavy Excel formulas are insane — just ask AI.&lt;/li&gt;
&lt;li&gt;Use AI to debug error messages. Designers usually aren&amp;rsquo;t from a code background and are stumped by errors. Paste the error, ask AI; most of the time you get something actionable. (This applies to anyone touching a project!)&lt;/li&gt;
&lt;li&gt;Use AI to write designer-side tools. Spot pain points in your own task and quickly write a tool to remove them.&lt;/li&gt;
&lt;li&gt;Idea collection. AI can quickly throw out a lot of ideas; you still have to filter and synthesize.&lt;/li&gt;
&lt;/ul&gt;
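&lt;p&gt;The one-click localization idea can stay tiny if the LLM call is injected. This is a sketch only: the (key, source text) row shape is an assumption about your sheet export, and translate stands in for a real API call:&lt;/p&gt;

```python
# Sketch: batch-localize a key/source-text table. translate() is an
# injected placeholder for a real LLM call, which keeps the loop
# testable offline.
def localize(rows, target_lang, translate):
    """rows: list of (key, source_text) pairs. Returns {key: translated}."""
    out = {}
    for key, text in rows:
        if not text.strip():
            out[key] = text  # keep empty cells as-is
        else:
            out[key] = translate(
                f"Translate to {target_lang}, keep placeholders intact: {text}"
            )
    return out
```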
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Art&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Reference and inspiration. Set up SD and Midjourney pipelines. Anything you can&amp;rsquo;t Google up in the right &amp;ldquo;vibe,&amp;rdquo; ask AI to draw. Quickly confirm direction with the requesting team. Build a curated prompt library by style — you can rapidly produce on-brief mockups, and unimportant cutscenes / scene art can ship as-is.
&lt;img src="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/IMG_3637.JPG"
width="1440"
height="1440"
srcset="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/IMG_3637_hu_deb101f2b9da5adb.JPG 480w, http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/IMG_3637_hu_56b09b48e4b0619f.JPG 1024w"
loading="lazy"
alt="alt text"
class="gallery-image"
data-flex-grow="100"
data-flex-basis="240px"
&gt;
&lt;img src="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/IMG_3638.JPG"
width="1024"
height="1024"
srcset="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/IMG_3638_hu_76fae60eaae589b0.JPG 480w, http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/IMG_3638_hu_ee4920c99c95a52.JPG 1024w"
loading="lazy"
alt="alt text"
class="gallery-image"
data-flex-grow="100"
data-flex-basis="240px"
&gt;
&lt;img src="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/IMG_3639.JPG"
width="1024"
height="1024"
srcset="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/IMG_3639_hu_765f13e76441d771.JPG 480w, http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/IMG_3639_hu_4fd654ffc6f5b52.JPG 1024w"
loading="lazy"
alt="alt text"
class="gallery-image"
data-flex-grow="100"
data-flex-basis="240px"
&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Item icons, character concept art&lt;/strong&gt; — already shipped successfully in the past. Tons of icons, even character concepts. AI does the first pass, art polishes — huge efficiency gain.
&lt;img src="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-3.png"
width="1659"
height="939"
srcset="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-3_hu_2a55b21be1c42035.png 480w, http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-3_hu_6c55373286c0191c.png 1024w"
loading="lazy"
alt="alt text"
class="gallery-image"
data-flex-grow="176"
data-flex-basis="424px"
&gt;
&lt;img src="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-4.png"
width="348"
height="417"
srcset="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-4_hu_d75f1c9a0f893500.png 480w, http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-4_hu_548a94bff79da85d.png 1024w"
loading="lazy"
alt="alt text"
class="gallery-image"
data-flex-grow="83"
data-flex-basis="200px"
&gt;
&lt;img src="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-5.png"
width="411"
height="273"
srcset="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-5_hu_98304cfb504941fd.png 480w, http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-5_hu_ba2edfd91e621e9f.png 1024w"
loading="lazy"
alt="alt text"
class="gallery-image"
data-flex-grow="150"
data-flex-basis="361px"
&gt;
&lt;img src="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-6.png"
width="238"
height="279"
srcset="http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-6_hu_7f1075378eaf5293.png 480w, http://baohongjiang.com/p/%E7%AB%99%E5%9C%A82025%E5%B9%B4%E5%9B%9E%E9%A1%BE%E5%92%8C%E5%B1%95%E6%9C%9Bai/image-6_hu_45f26d89b521b3be.png 1024w"
loading="lazy"
alt="alt text"
class="gallery-image"
data-flex-grow="85"
data-flex-basis="204px"
&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Other&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Voiceover. If you need it, GPT-SoVITS does customized VO well.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Two reference videos for understanding AI&amp;rsquo;s origins:
&lt;a class="link" href="https://www.youtube.com/watch?v=RUtvBkibIT8" target="_blank" rel="noopener"
&gt;【Computer History】 NLP from &amp;ldquo;Past&amp;rdquo; to &amp;ldquo;Present&amp;rdquo;&lt;/a&gt;
&lt;a class="link" href="https://www.youtube.com/watch?v=7iNJyEbYDdc&amp;amp;t=408s" target="_blank" rel="noopener"
&gt;Why the Feynman Technique is called the ultimate learning method&lt;/a&gt;&lt;/p&gt;</description></item><item><title>How I Use AI — Notes from the Field</title><link>http://baohongjiang.com/en/p/how-i-use-ai-notes-from-the-field/</link><pubDate>Tue, 26 Dec 2023 00:00:00 +0000</pubDate><guid>http://baohongjiang.com/en/p/how-i-use-ai-notes-from-the-field/</guid><description>&lt;h3 id="gpt--the-greatest-invention-in-history"&gt;GPT — the greatest invention in history
&lt;/h3&gt;&lt;p&gt;Personal opinion. At least, as of right now.
My programming and CS chops have leveled up massively, and I really owe that to GPT, especially GPT-4. Yes, it&amp;rsquo;s gone through many iterations and is far from where it started.&lt;/p&gt;
&lt;h3 id="the-mainstream-ais"&gt;The mainstream AIs
&lt;/h3&gt;&lt;p&gt;A quick rundown of the popular conversational LLMs right now, ordered by my recommendation:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;GPT-4&lt;/strong&gt; — strongest and most useful&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Claude 2&lt;/strong&gt; — up to 100k tokens of context&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New Bing&lt;/strong&gt; — built on GPT-4, free to use, good for search&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPT-3.5&lt;/strong&gt; — cheap, mostly used for high-volume API calls&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bard&lt;/strong&gt; — Google&amp;rsquo;s offering, so-so. Upside: free&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;iFlytek Spark (讯飞星火)&lt;/strong&gt; — decent. Upside: convenient if you&amp;rsquo;re in China&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Baidu ERNIE Bot (百度文心一言)&lt;/strong&gt; — no comment&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Other miscellaneous ones&lt;/strong&gt; — with so many better options, why use junk?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Local LLMs&lt;/strong&gt; — can be deployed locally, can be fine-tuned. Compared with using someone else&amp;rsquo;s product, you have far more creative possibilities.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In actual use, GPT-4 is the most helpful for both work and life, and the best one to use. The downsides are that registration and subscription are annoying, and $20/month is pricey. But I can tell you with full confidence: it is &lt;em&gt;absolutely&lt;/em&gt; worth every cent. Truly insane.
From here on, I&amp;rsquo;ll only talk about GPT-4. Other products aren&amp;rsquo;t worth discussing.&lt;/p&gt;
&lt;h3 id="what-gpt-4-changed-for-me"&gt;What GPT-4 changed for me
&lt;/h3&gt;&lt;p&gt;There are a million articles online about how amazing GPT-4 is and how to use it. I&amp;rsquo;m not going to repeat any of that. I&amp;rsquo;ll just talk about the biggest shifts for me personally.&lt;/p&gt;
&lt;h4 id="a-complete-overhaul-of-how-i-learn-and-how-i-get-things-done"&gt;A complete overhaul of how I learn and how I get things done
&lt;/h4&gt;&lt;p&gt;Traditionally, to write code or learn a new skill, you have to study the whole thing front-to-back, and only after enough projects, enough hitting walls, can you confidently say &amp;ldquo;yeah, I can handle this kind of problem.&amp;rdquo;
The world&amp;rsquo;s knowledge is an ocean — the more you learn, the more you realize you don&amp;rsquo;t know. In one limited human life you can only ever master a tiny sliver. Especially in tech, innovation outpaces your ability to keep up.
Picture all knowledge and skills as a giant tree. Each module is a trunk — say, Python is one trunk; Python&amp;rsquo;s syntax and libraries are the leaves on that trunk.
Normally, to write decent Python software, you have to internalize the trunk and most of the leaves.
Now here&amp;rsquo;s the problem: Python, Go, C#, Java, JS, TS, C++, etc. — that&amp;rsquo;s already a long list. Then there&amp;rsquo;s everything Linux: nginx, ufw, vim, OpenVPN, and on and on. .NET land has another whole stack. That&amp;rsquo;s before you get to algorithms, frameworks, design patterns, Docker, jump servers, and so on.
With this much, even a whole life only gets you mastery of one or two. Everything else, you have to pretend you didn&amp;rsquo;t see.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Human life is finite. We can&amp;rsquo;t master that much.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;GPT-4 totally upends this. It can answer questions and crank out tons of code on demand. (Of course, your professional level still caps the ceiling.)&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;GPT-4 massively speeds up learning, especially looking things up and debugging.&lt;/li&gt;
&lt;li&gt;GPT-4 acts like a junior who completes the work you direct. Your own ability sets the upper bound.&lt;/li&gt;
&lt;li&gt;Plus a huge amount of miscellaneous work.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;GPT-4 is an extension of my ability — it handles the details.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To stick with the tree metaphor: I learn the trunks and the big-picture frame myself. The leaves and details, GPT-4 fills in.
Walking every trunk &lt;em&gt;and&lt;/em&gt; every leaf is too much for me. But touring all the trunks is easy. When I need a specific leaf, I tap GPT-4 to flesh it out.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cover as many trunks as possible. When a problem hits, lean on GPT-4 for the leaves.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This completely overturns my old &amp;ldquo;learn it all first, then do it&amp;rdquo; methodology. Now, as long as I have a sense of all the trunks, I can ship something that&amp;rsquo;s theoretically possible but whose details I don&amp;rsquo;t yet know — and ship it fast.&lt;/p&gt;
&lt;p&gt;A few examples:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Building this blog.&lt;/strong&gt; In theory, set up a website, get a server. The details are massive — see &lt;a class="link" href="http://baohongjiang.com/en/p/the-tech-behind-my-blog/" &gt;The Tech Behind My Blog&lt;/a&gt;. That much technical detail, even though I&amp;rsquo;d never done it before, was tractable because I knew the trunks; the leaves I learned from GPT-4.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Building an AI WeChat public account.&lt;/strong&gt; From an idea, starting with one official OpenAI API example, then a secondary WeChat account driven by simulated PC WeChat clicks, step by step it grew. I really was clueless at the start. Eventually it had Midjourney, Stable Diffusion, GPT-3.5/4, New Bing, voice-recognition chat, a membership system, and more. Tons of system and technical details. I knew it was theoretically doable and had a vague plan; GPT-4 filled in the rest. By the way, the account was called &lt;strong&gt;Xiao Hui Hen Zhi Hui (小慧很智慧)&lt;/strong&gt;. At its peak it had 4k+ subscribers. Costs got too high and I had limited bandwidth, so I stopped maintaining it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Building an AI chat website.&lt;/strong&gt; Started by forking an open-source vue3+express site from GitHub. Later I rewrote the backend in Python (FastAPI), with a database, account auth, etc. Eventually added WeChat Pay, SMS phone verification, and more. WeChat Pay especially is a beast — you also need an ICP-filed mainland China server and domain. But I knew it was possible, and I made it work in the end!&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="http://baohongjiang.com/en/p/building-a-dynamic-multi-server-lan-over-openvpn-tunnels/" &gt;Building a giant LAN over OpenVPN tunnels&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There are many similar cases in everyday work and life. I&amp;rsquo;ll stop listing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GPT-4 completely flips the old &amp;ldquo;learn-it-all-then-start&amp;rdquo; model. Now, if it&amp;rsquo;s theoretically possible, I can move on it immediately and finish at terrifying speed.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Of course, for highly complex, exploratory work, both GPT-4 and I are stuck. Like research-grade new algorithms, or a distributed system that handles tens of millions of concurrent connections.
But for most &lt;strong&gt;tasks someone has already solved before&lt;/strong&gt;, with GPT-4 in the loop, I can do them fairly well.&lt;/p&gt;
&lt;p&gt;As for specific tips and tricks — there&amp;rsquo;s already a flood of those online, and a few sentences won&amp;rsquo;t cover it. If you actually care, just go bang on it. If you can use it, the tricks come naturally as you go. If you can&amp;rsquo;t, no amount of talk will help.&lt;/p&gt;</description></item></channel></rss>