The HAR-file approach is interesting because Big Tech doesn't have to do anything different โ the data export is already there, exposed through dev tools by design. The asymmetry isn't that they have tools we don't, it's that they're systematic about consuming the data and we're not. Almost every tactical advantage in the AI era starts with that same realization: the model isn't the leverage, the harness around the model is. Most marketers โ and most companies โ don't have a harness yet, and the playing field is more level than it feels.
The HAR file move is sneaky good. The 'my machine, my data' framing is what finally clicks it. Most of the hand-wringing about scraping ethics evaporates once you draw the line at your own browser session. And the compute side is way more accessible than people think now, since Claude Code can chew through a 1GB JSON without anyone needing to know what graphlib is. Have you settled on a centrality measure you keep coming back to, or does it depend on the question?
The real broken bargain isnโt just that platforms monetized our content and attention. Itโs that they trained businesses to become dependent on platform-controlled visibility and platform-controlled definitions of influence. Then they slowly narrowed access, reduced organic reach, inserted themselves into every click path, and sold everyone back fragments of the value they helped create.
Great point all around Chris. Weโve been the product for a long time now. We must have been thinking similar thoughts this past week
https://davidarmano.substack.com/p/domain-is-the-new-data
The HAR-file approach is interesting because Big Tech doesn't have to do anything different โ the data export is already there, exposed through dev tools by design. The asymmetry isn't that they have tools we don't, it's that they're systematic about consuming the data and we're not. Almost every tactical advantage in the AI era starts with that same realization: the model isn't the leverage, the harness around the model is. Most marketers โ and most companies โ don't have a harness yet, and the playing field is more level than it feels.
The HAR file move is sneaky good. The 'my machine, my data' framing is what finally clicks it. Most of the hand-wringing about scraping ethics evaporates once you draw the line at your own browser session. And the compute side is way more accessible than people think now, since Claude Code can chew through a 1GB JSON without anyone needing to know what graphlib is. Have you settled on a centrality measure you keep coming back to, or does it depend on the question?
It always comes back to the question, but eigenvector centrality tends to be a good first choice.
The real broken bargain isnโt just that platforms monetized our content and attention. Itโs that they trained businesses to become dependent on platform-controlled visibility and platform-controlled definitions of influence. Then they slowly narrowed access, reduced organic reach, inserted themselves into every click path, and sold everyone back fragments of the value they helped create.