My approach: I pulled data I had crawled in 2023 from a certain Ford-like or Jiang-like site, keeping articles published between 2010 and 2022 (pre-ChatGPT). After filtering out only extremely low-traffic or extremely short works, I randomly sampled nearly 10,000 multi-thousand-word texts as human-written samples.
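The sampling step above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the field names (`year`, `views`, `text`) and the concrete thresholds for "low-traffic" and "short" are assumptions for the sake of the example.

```python
import random

def sample_human_corpus(articles, n=10_000, seed=42):
    """Filter a crawled article list and draw a random human-written sample.

    `articles` is a list of dicts; the keys and thresholds below are
    placeholders, not the author's real schema or cutoffs.
    """
    eligible = [
        a for a in articles
        if 2010 <= a["year"] <= 2022   # pre-ChatGPT publication window
        and a["views"] >= 100          # drop extremely low-traffic works
        and len(a["text"]) >= 2000     # drop extremely short works
    ]
    rng = random.Random(seed)          # fixed seed for reproducibility
    return rng.sample(eligible, min(n, len(eligible)))
```

Fixing the random seed makes the sample reproducible, which matters if the corpus is later reused for comparison against model-generated text.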
This loop runs very hot, on the order of a trillion operations per second out of L1 cache, so you want to avoid any other overhead where possible: precompute values ahead of time, and so on. But the core algorithm is just this; the rest is engineering and optimization on top of it.
We have a more fully-worked example in our test
Agent CLI installed and configured for whichever agent(s) you use: