<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://therepanic.com/blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://therepanic.com/blog/" rel="alternate" type="text/html" /><updated>2026-01-18T00:52:13+03:00</updated><id>https://therepanic.com/blog/feed.xml</id><title type="html">i want to believe</title><subtitle>Notes on things I don&apos;t want to rethink from scratch</subtitle><author><name>i want to believe</name></author><entry><title type="html">tests should become public property</title><link href="https://therepanic.com/blog/2026/01/17/tests-should-become-public-property.html" rel="alternate" type="text/html" title="tests should become public property" /><published>2026-01-17T23:44:01+03:00</published><updated>2026-01-17T23:44:01+03:00</updated><id>https://therepanic.com/blog/2026/01/17/tests-should-become-public-property</id><content type="html" xml:base="https://therepanic.com/blog/2026/01/17/tests-should-become-public-property.html"><![CDATA[<p>I started thinking about this question relatively recently. Competitive programming platforms (like Codeforces, LeetCode, HackerRank, and so on) don’t provide open tests. Mostly they give us a few test cases so we can run our code against them, get a rough idea that it works, and then submit. <em>But this submission will be run through hundreds or thousands of additional tests</em> that the platform keeps private. Sometimes on certain platforms we can see which test failed, but that doesn’t change the essence: <strong>all these tests are company property</strong>. Recently, while reading through an old chat history, I came across a message written several years ago.</p>

<blockquote>
  <p>who said such bullshit? they’re closed. it’s their strategic commodity</p>

  <p>– Unknown user</p>
</blockquote>

<p>And with each passing day I’m starting to believe this more and more. If you really think about it, we can come up with tons of reasons why they’re kept private. <em>Moreover, they might even have more value than the problems themselves</em>. An error in a solution, a specific failed test case, a pattern of attempts: all of this can say a lot about a person’s thinking style. Furthermore, you could plug AI into this task and sell information about all your mistakes and submission attempts to interested parties.</p>

<p>Or is this already happening? More likely yes than no, but what I know for sure is that no matter how open a competitive programming platform might seem, in reality they’re all <strong>heavily centralized</strong> and generally closed.</p>

<hr />

<p>btw, I want to at least try to take a step in the opposite direction and attempt to <a href="https://github.com/therepanic/leetribute">do something towards a possible solution to this problem</a>.</p>]]></content><author><name>i want to believe</name></author><summary type="html"><![CDATA[I started thinking about this question relatively recently. Competitive programming platforms (like Codeforces, LeetCode, HackerRank, and so on) don’t provide open tests. Mostly they give us a few test cases so we can run our code against them, get a rough idea that it works, and then submit. But this submission will be run through hundreds or thousands of additional tests that the platform keeps private. Sometimes on certain platforms we can see which test failed, but that doesn’t change the essence: all these tests are company property. Recently, while reading through an old chat history, I came across a message written several years ago. who said such bullshit? they’re closed. it’s their strategic commodity – Unknown user And with each passing day I’m starting to believe this more and more. If you really think about it, we can come up with tons of reasons why they’re kept private. Moreover, they might even have more value than the problems themselves. An error in a solution, a specific failed test case, a pattern of attempts - all of this can say a lot about a person’s thinking style. Furthermore, you could plug AI into this task and sell information about all your mistakes and submission attempts to interested parties. Or is this already happening? More likely yes than no, but what I know for sure is that no matter how open a competitive programming platform might seem, in reality they’re all heavily centralized and generally closed. 
btw, I want to at least try to take a step in the opposite direction and attempt to do something towards a possible solution to this problem.]]></summary></entry><entry><title type="html">AI-generated output is cache, not data</title><link href="https://therepanic.com/blog/2025/12/20/ai-generated-output-is-cache-not-data.html" rel="alternate" type="text/html" title="AI-generated output is cache, not data" /><published>2025-12-20T00:00:00+03:00</published><updated>2025-12-20T00:00:00+03:00</updated><id>https://therepanic.com/blog/2025/12/20/ai-generated-output-is-cache-not-data</id><content type="html" xml:base="https://therepanic.com/blog/2025/12/20/ai-generated-output-is-cache-not-data.html"><![CDATA[<h2 id="the-scale-of-ai-slop">The scale of AI slop</h2>

<p>The volume of AI-generated content is growing rapidly. By some estimates, <a href="https://journal.everypixel.com/ai-image-statistics">over 34 million images are generated daily</a>. This creates enormous pressure on storage systems. Models for generating not just images but also video are becoming increasingly accessible. The problem is particularly acute on short-form video platforms: even 10–15 seconds of generation is enough to fill feeds. TikTok alone has <a href="https://newsroom.tiktok.com/more-ways-to-spot-shape-and-understand-ai-generated-content?lang=en">labeled over 1.3 billion videos as AI-generated</a>. On YouTube Shorts, <a href="https://www.kapwing.com/blog/ai-slop-report-the-global-rise-of-low-quality-ai-videos/">up to 33% of feed videos are AI slop</a>. The real number is higher, as much content isn’t marked as synthetic. The problem is compounded by accounts mass-generating content for algorithmic promotion.</p>

<h2 id="a-concrete-example">A concrete example</h2>

<p>Let’s say we want to generate an image of a tiger. Using this prompt (ChatGPT Image 1.5):</p>

<blockquote>
  <p>“A majestic tiger standing in a jungle, ultra-detailed, realistic lighting, sharp focus.”</p>
</blockquote>

<p>We get an image:</p>

<p><img src="/blog/assets/images/image.jpg" width="400" /></p>

<p>Typical scenario: the result is generated in seconds, gets regenerated several times, and is published in multiple variants. Now we have several useless 2MB images at the same resolution. Twitter will compress them, but they’ll still be stored <a href="https://blog.x.com/engineering/en_us/a/2012/blobstore-twitter-s-in-house-photo-storage-system">in blob/object storage</a> even when they become logically unused. Even after compression or migration to cold storage, the file remains saved. In my case it compressed to 258 kilobytes, but the total size depends on how much content we generated. Let’s move to a case that makes the problem clearer: generating a 15-second video of a tiger, for example through Sora2, using the same prompt I used for the image.</p>

<p><img src="/blog/assets/images/video.gif" alt="Video1" /></p>

<h2 id="cost-of-storing-slop">Cost of storing slop</h2>

<p>The video is 9.8 megabytes and took a couple of minutes to generate. If we upload it to YouTube, it compresses to 2.1 megabytes. Let’s calculate the cost of storing this. These estimates are optimistic: they don’t account for upload growth, caching, traffic, or potential storage of originals. <a href="https://blog.youtube/press/">YouTube reports over 20 million</a> video uploads per day (≈600 million per month). Even if only a small percentage of uploaded videos are AI-generated, a conservative lower estimate of ~50 million videos per month already implies the following. Slop producers spend little or nothing to distribute such content, while YouTube stores videos indefinitely; even if it moves them to cold storage and compresses them better, they still sit there taking up space. For file size, we can use my compressed video as a baseline (~2.3 MB). So we get:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>5 * 10^7 videos * 2.3 MB
≈ 115 TB per month
≈ 1.38 PB per year
</code></pre></div></div>

<p>I’m also not accounting for viewership: if people watch these videos, some of this content has to be cached, and that cache also costs serious money, as do S3 retrieval operations and traffic. Over a 5-year window, assuming the volume of slop uploads stays constant (it won’t shrink, so this is a rough lower bound), that’s about 6.9 PB. For platforms without their own storage, this means direct and growing costs.</p>

<p>Let’s roughly estimate the cost of storing this much content. Say we used <a href="https://www.cloudflare.com/developer-platform/products/r2/">Cloudflare R2</a>, which doesn’t charge for traffic. Let’s calculate just the storage cost of our AI slop. Taking our hypothetical 115 TB per month, storage accumulates linearly, so the average amount held over the first year is about 690 TB; we get:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>690,000 GB * $0.015
≈ $10,350 / month
≈ $124,200 / year
</code></pre></div></div>

<p>We could use <a href="https://developers.cloudflare.com/r2/pricing/">Infrequent Access</a> purely for storage, and then:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>690,000 GB * $0.01
≈ $6,900 / month
≈ $82,800 / year
</code></pre></div></div>

<p>Now remember that this content only grows. Compression won’t save us on costs. If we used other clouds (AWS/GCP), we’d also pay for egress. YouTube already holds a huge number of videos <a href="https://chazans.com/wp-content/uploads/2025/02/McGradyEtAl2023.pdf">almost no one will ever watch</a>, and AI slop only adds to that pile, increasing the price we pay for storing meaningless content.</p>
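<p>The arithmetic above can be sanity-checked in a few lines. This is just a sketch reproducing the rough figures from the text (50M videos/month at 2.3 MB each, R2 at $0.015/GB-month Standard and $0.01/GB-month Infrequent Access, storage accumulating linearly over the first year); none of it is measured data:</p>

```python
# Sanity check of the back-of-the-envelope storage math.
# All inputs are the rough estimates from the text, not measured data.
MONTHLY_VIDEOS = 50_000_000        # conservative lower estimate of slop uploads
SIZE_MB = 2.3                      # compressed-video baseline

monthly_tb = MONTHLY_VIDEOS * SIZE_MB / 1_000_000      # MB -> TB: 115 TB/month
yearly_pb = monthly_tb * 12 / 1_000                    # ~1.38 PB/year
five_year_pb = yearly_pb * 5                           # ~6.9 PB

# Storage accumulates linearly, so the average amount held over year one
# is half the end-of-year total: ~690 TB = 690,000 GB.
avg_gb = monthly_tb * 12 / 2 * 1_000
standard_monthly = avg_gb * 0.015                      # R2 Standard, $/GB-month
ia_monthly = avg_gb * 0.010                            # R2 Infrequent Access

print(round(monthly_tb), round(yearly_pb, 2), round(five_year_pb, 1))
print(round(standard_monthly), round(ia_monthly))
```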

<h2 id="diagnosis-we-are-storing-cache">Diagnosis: we are storing cache</h2>

<p>From this we can conclude that, in reality, we’re storing cache. Can we equate a Twitter post from an unknown artist that almost no one saw with a meaningless image generated by a bot pursuing its own goals, or an unknown aspiring content creator with a bot churning out batches of meaningless videos? We cannot. AI slop is reproducible, generates little interest, and is disposable, and the problem keeps growing: generation is getting cheaper, while the amount of generated content subsequently uploaded to social media grows linearly. Treating it as a permanent artifact is effectively a data storage anti-pattern.</p>

<h2 id="proposal-prompt-only-storage">Proposal: Prompt-only storage</h2>

<p>Instead of storing synthetic media content, we could store its description: prompt + various generation parameters. For example:</p>

<p>Store as source of truth:</p>
<ul>
  <li>prompt</li>
  <li>model identifier + version</li>
  <li>generation parameters (seed or equivalent, sampler, resolution)</li>
</ul>

<p>Store media output:</p>
<ul>
  <li>as cache</li>
  <li>with TTL</li>
  <li>regenerate on demand</li>
</ul>
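<p>A minimal sketch of what such a record might look like. The field names and the <code>regenerate</code> stub are illustrative, not a real platform API; the stub stands in for calling the generation model again:</p>

```python
# Sketch only: field names and the regenerate() stub are illustrative,
# not a real platform API.
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class GenerationRecord:
    """Source of truth: enough to reproduce the output."""
    prompt: str
    model_id: str            # model identifier + version
    seed: Optional[int]      # or an equivalent determinism handle
    sampler: Optional[str]
    resolution: str

def regenerate(record):
    # Stand-in for calling the generation model again.
    return f"blob/{record.model_id}/{record.seed}"

@dataclass
class CachedMedia:
    """Rendered output treated as cache: evictable, regenerated on demand."""
    record: GenerationRecord
    blob_key: Optional[str]  # object-storage key; None means evicted
    expires_at: float        # TTL deadline, unix seconds

    def get(self, now=None):
        now = time.time() if now is None else now
        if self.blob_key is None or now >= self.expires_at:
            self.blob_key = regenerate(self.record)   # cache miss: re-render
            self.expires_at = now + 365 * 24 * 3600   # fresh one-year TTL
        return self.blob_key
```

The point of the split: losing <code>CachedMedia</code> costs only a regeneration, while losing <code>GenerationRecord</code> loses the content itself.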

<p>Many platforms already ask you to mark content at upload if it’s fully AI-generated (for example, <a href="https://support.tiktok.com/ru/using-tiktok/creating-videos/ai-generated-content">TikTok</a> or <a href="https://support.google.com/youtube/answer/14328491?co=GENIE.Platform%3DAndroid&amp;hl=en">YouTube</a>). That is, we already have a tool for determining whether a video is generated, and it will only improve; where users don’t check the box, platforms are working on detecting it automatically anyway. Upload-time marking would also let us record the model used, for exact reproduction. Let’s try breaking the video I showed above back down into a prompt. Using Gemini 3 Flash, I sent it the video and asked it to produce a prompt. I got something like:</p>

<blockquote>
  <p>“A cinematic, close-up shot of a majestic Bengal tiger standing in a dense, misty tropical jungle…”</p>
</blockquote>

<p>Now we return to Sora2 and regenerate the video. We get a comparable result:</p>

<p><img src="/blog/assets/images/video2.gif" alt="Video2" /></p>

<p>Essentially, we reverse-engineered the meaningless content, obtaining a prompt that reproduces such a video, and did it practically for free. We got a video that’s functionally indistinguishable from the original and serves the same purpose. Now here’s the interesting part: our original video weighs 2.3MB, while the prompt is 568 bytes, just 492 bytes after gzip compression. So, assuming most generated content isn’t viewed, we can calculate and compare:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Prompt-only storage:
35,000,000 * 492 bytes
≈ 17.2 GB / month
≈ 206 GB / year
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Traditional media storage:
≈ 1.38 PB / year
≈ 6.9 PB over 5 years
</code></pre></div></div>
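<p>The gap can be made concrete with the same kind of arithmetic, again using the rough figures from the text (35M items/month, 492-byte gzipped prompts versus the 2.3 MB video baseline):</p>

```python
# Prompt-only vs. media storage, using the rough figures from the text.
MONTHLY_ITEMS = 35_000_000
PROMPT_BYTES = 492                  # gzip-compressed prompt
VIDEO_BYTES = 2.3 * 1_000_000       # compressed-video baseline, in bytes

prompt_gb_per_month = MONTHLY_ITEMS * PROMPT_BYTES / 1e9   # ~17.2 GB
video_tb_per_month = MONTHLY_ITEMS * VIDEO_BYTES / 1e12    # ~80.5 TB
ratio = VIDEO_BYTES / PROMPT_BYTES                         # ~4,675x smaller

print(round(prompt_gb_per_month, 1), round(video_tb_per_month, 1), round(ratio))
```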

<p>The difference is orders of magnitude. Moreover, I fear these 6.9 petabytes aren’t actually 6.9 petabytes, since we don’t know YouTube’s exact video storage model. But if YouTube in some cases really does store not just compressed files but also <a href="https://www.ucdenver.edu/docs/librariesprovider27/ncmf-docs/theses/whitecotton_thesis_fall2017.pdf?sfvrsn=484e97b8_2">the original</a>, imagine the numbers if I took not the 2.3-megabyte constant I declared earlier but the original 9.8 megabytes, or both together. Now mentally calculate how much space all this would take and how much it would cost, how much it will weigh and cost in 5 years, and what happens if we account not just for YouTube but for other platforms, plus the amount of slop created in clouds like Google Cloud or AWS. These numbers blow my mind, but that’s reality. We’ve essentially identified, and proposed a solution to, a crisis that could hypothetically occur in the storage sector, and this isn’t a joke.</p>

<h2 id="trade-offs-and-risks">Trade-offs and risks</h2>

<p>We could keep storing content as we do now, whether images or video. If we discover it’s AI-generated (the user checked the box, or we detected it), we could archive it by reversing it to a prompt, say, a year after upload and only if almost no one has accessed it, then regenerate it on demand if someone wants to view it again later. Besides the prompt, we could also store the generation model; and besides the checkbox, we could ask the user which model they used to generate the content, so it can be reproduced more accurately.</p>
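<p>That policy is easy to state as a predicate. A sketch, where the field names and thresholds are invented for illustration:</p>

```python
# Illustrative archival rule: media -> prompt only for flagged, stale,
# effectively unwatched uploads. Field names and thresholds are invented.
YEAR_SECONDS = 365 * 24 * 3600

def should_archive_to_prompt(video, now):
    is_ai = video["user_flagged_ai"] or video["detector_flagged_ai"]
    stale = (now - video["uploaded_at"]) >= YEAR_SECONDS
    unwatched = video["views_last_year"] <= 10    # "mostly no one accessed it"
    return is_ai and stale and unwatched
```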

<p>For now this solution only applies to AI slop, since it’s inherently meaningless and has only one function: we can store the prompt instead of the entire media file and get the same result. But it’s no secret that AI is scaling, and generation models, including for images and video, are <a href="https://news.mit.edu/2024/ai-generates-high-quality-images-30-times-faster-single-step-0321">becoming significantly faster and higher quality</a>. Soon, I think, we’ll see methods for generating realistic videos with longer context, faster and more cheaply. As generation speeds up, regeneration latency will decrease, making the approach more practical. It should not be used for human-created content or cultural artifacts.</p>

<p>Generated media is cache. Prompts are the source of truth.</p>

<hr />

<p>This post was originally <a href="https://github.com/therepanic/slop-compressing-manifesto">published by me on GitHub</a></p>]]></content><author><name>i want to believe</name></author><summary type="html"><![CDATA[The scale of AI slop]]></summary></entry></feed>