f.lapo.it

A protection against paywalls

I've been using 12ft.io for a while now, and it mostly "just works".

How does it work? The FAQ on the actual website goes like this:

The idea is pretty simple, news sites want Google to index their content so it shows up in search results. So they don't show a paywall to the Google crawler. We benefit from this because the Google crawler will cache a copy of the site every time it crawls it.

All we do is show you that cached, unpaywalled version of the page.


…but this doesn't convince me, as I don't think Google would allow it, so I read it as a simplification of «we impersonate the Google Search User-Agent in order to get the pristine content meant for SEO».

I was very close, but not quite spot on: it does the same thing, but with the Twitterbot User-Agent.
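
To make the idea concrete, here is a minimal sketch of a request that announces itself as Twitterbot. It is only an illustration under my own assumptions, not 12ft.io's actual code: the function names `build_request` and `fetch` are hypothetical, and the only value taken from real data is the `Twitterbot/1.0` string.

```python
import urllib.request

# The User-Agent string is the one observed in my access log;
# everything else in this sketch is illustrative.
TWITTERBOT_UA = "Twitterbot/1.0"


def build_request(url: str, user_agent: str = TWITTERBOT_UA) -> urllib.request.Request:
    """Return a GET request that presents the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download the page body as served to that User-Agent (network call)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Whether a given site actually serves different content to this User-Agent depends entirely on that site's configuration.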

These are the two requests my own website logged when I visited it via 12ft.io:

3.238.205.96 - - [2023-03-23T09:38:21+01:00] "GET /$pagename HTTP/1.1" 200 1037 "-" "Twitterbot/1.0" gz:- TLSv1.3/TLS_AES_256_GCM_SHA384
34.205.92.245 - - [2023-03-23T09:39:29+01:00] "GET /$pagename HTTP/1.1" 200 1657 "-" "Twitterbot/1.0" gz:- TLSv1.3/TLS_AES_256_GCM_SHA384
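
The two log lines above can be picked apart with a short script to see which client addresses presented which User-Agent. This is a sketch: the regular expression is my guess at the layout of these two lines (my log format is a custom one), and the names `parse_line` and `ips_by_agent` are hypothetical.

```python
import re

# Assumed layout: ip, two ignored fields, [timestamp], "request",
# status, size, "referer", "user agent", then extra fields we skip.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE_LINES = [
    '3.238.205.96 - - [2023-03-23T09:38:21+01:00] "GET /$pagename HTTP/1.1" '
    '200 1037 "-" "Twitterbot/1.0" gz:- TLSv1.3/TLS_AES_256_GCM_SHA384',
    '34.205.92.245 - - [2023-03-23T09:39:29+01:00] "GET /$pagename HTTP/1.1" '
    '200 1657 "-" "Twitterbot/1.0" gz:- TLSv1.3/TLS_AES_256_GCM_SHA384',
]


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def ips_by_agent(lines):
    """Group client IPs by the User-Agent they presented, in log order."""
    result = {}
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines in a different format
        result.setdefault(entry["agent"], []).append(entry["ip"])
    return result
```

Calling `ips_by_agent(SAMPLE_LINES)` groups both addresses under the single `Twitterbot/1.0` key.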

Anyway: a good idea which works well against #paywall and will be difficult to filter out (the source IPs belong to Amazon AWS).