Google Analytics vs. Server Stats

Andy

Controlled Chaos
Staff member
For the web nerds among us, this is an interesting data point. We all know Analytics isn't going to show the same traffic as what the server sees--there's bot traffic, people use adblockers, JavaScript may not load, etc. This isn't a new thing. But that discrepancy is probably bigger than you think. Here's my June traffic, according to Google Analytics:
1658346250614.png
Well that's not anything to write home about

And here's what my server logs are reporting:
1658346428115.png
I never claimed to run a busy website


That's a pretty big discrepancy--reporting 2% of visitors and .5% of pageviews is not a lot. A quick check against some other sites I'm hosting shows that this isn't just an issue with one site.

I don't know if this indicates that bot traffic is getting more intense, or if Google Analytics is getting worse. I suspect it's a 60/40 split between the two, especially given that the article I linked had its sites on WP Engine, which attempts to filter bot traffic.
 
Can the server stats show where the visitors are coming from? I am wondering if it's 12,300 page views by Googlebot, or who else is crawling these days.
 
It does do some basic breakdowns. Here's the bot traffic*. Keep in mind this is hits, so just under 20,000 of the total 91,000 for June (I cropped it before because it didn't seem relevant 🤷🏻‍♂️).
1658353668421.png

* I believe it's using the user-agent string for that, so this is just who identified themselves as bots. Malicious bots are probably saying they're Chrome on Windows or something similar.
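To illustrate the footnote above, here is a minimal sketch of user-agent based bot counting. The marker list (`bot`, `crawl`, `spider`, `slurp`) and the function names are assumptions for illustration, not what this particular server stats package uses. It only catches bots that announce themselves, which is exactly the limitation described above.

```python
import re

# Hypothetical marker list: substrings that self-identifying crawlers often
# put in their User-Agent header. Not the list any specific stats tool uses.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def is_self_declared_bot(user_agent: str) -> bool:
    """Return True when the User-Agent string announces itself as a bot."""
    return bool(BOT_PATTERN.search(user_agent))


def count_bot_hits(user_agents):
    """Split User-Agent strings into (self-declared bot hits, other hits)."""
    agents = list(user_agents)
    bot_hits = sum(1 for ua in agents if is_self_declared_bot(ua))
    return bot_hits, len(agents) - bot_hits
```

A bot that sends a stock Chrome user-agent lands in the "other" bucket here, so the bot share from a check like this is a floor, not a total.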
 
Where does Google get its telemetry for these reports? It used to only watch the number of hits to its DNS servers and other data it can see. Is there an agent on your server reporting info to Google?
 
I think you need a JS snippet for Analytics to give the site owner data these days. I think the "tag manager" snippet is mandatory for getting the analytics. They usually also require some DNS record modifications to prove ownership of the site too, or at least a file upload to the root directory, but I could be confusing that with G Suite setup because I usually do them at the same time.
 
@Cory, @Eric has it exactly right--all of the analytics tracking comes from cookies that they place on your system via JavaScript (which is added to every page on our site). It's not relying on their DNS servers for telemetry (as far as I know), and they don't have access to our logs.

Between users that block cookies or JavaScript, users that block GA domains, bots that can't run cookies/JavaScript, and whatever filtering GA does, it's not surprising that there's a big difference between the two sources. I just didn't expect it to be so big. I'm testing alternate analytics systems (for a couple of reasons), so it'll be interesting to see that comparison in a month or so.
 
I wonder if these counters are counting the same thing. Sometimes a web server will count every image or URL served as a 'pageview', and a single page might have 30 to 60 internal elements that might count as a 'pageview' from the server's point of view.

Google, on the other hand, sees the entire page as a page, even if you are serving up four separate WordPress pages with a lazy load. That used to be a trick, back in the day, to get extra page views. Also, does it even count people like me, with Adblock Plus enabled by default?

1659205981285.png
 
So, that's where things get a bit fuzzy--the server does do a pretty good job of breaking down things like "hits" vs "visitors". It's not fully page-aware like an in-page analytics tool would be, though, so I would think it could absolutely be fooled by things like that. I wouldn't think lazy loading (which we're only using for images; not full content) would fool the stats like that, but I've honestly not looked at that.
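For anyone who wants to check the hits-vs-pageviews question against their own logs, here is a rough sketch. It assumes the common/combined access log layout, and the asset extension list, the "status 200 and not an asset" pageview rule, and the IP-plus-user-agent visitor key are all simplifying assumptions of mine, not how any particular stats package defines these terms.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed layout (common/combined log format):
# ip - user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+(?: "[^"]*" "(?P<agent>[^"]*)")?'
)

# Hypothetical list of extensions treated as page assets rather than pages.
ASSET_EXTENSIONS = {
    ".css", ".js", ".png", ".jpg", ".jpeg", ".gif",
    ".svg", ".ico", ".webp", ".woff", ".woff2",
}


def summarize(lines):
    """Count hits, pageviews, and visitors from access log lines."""
    hits = 0
    pageviews = 0
    visitors = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        hits += 1  # every parsed request is a hit, assets included
        path = urlsplit(match.group("path")).path
        suffix = PurePosixPath(path).suffix.lower()
        if match.group("status") == "200" and suffix not in ASSET_EXTENSIONS:
            pageviews += 1
            # Crude visitor key: same IP and user-agent count as one visitor.
            visitors.add((match.group("ip"), match.group("agent")))
    return {"hits": hits, "pageviews": pageviews, "visitors": len(visitors)}
```

With this split, a page that pulls in 30 images and stylesheets produces 31 hits but one pageview, which is the distinction being discussed above.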

Adblockers absolutely prevent analytics from capturing visits, and I think that's a pretty big source of discrepancy (especially with a lot of browsers blocking third-party cookies by default). Google has been trying to rebuild the way cookies work to get around that. That's a big part of the reason I'm trialing some alternate analytics systems.
 
Alright, it's been a while. Let's see how things shook out. BLUF: Google Analytics definitely has a smaller data set, but it's hard to say how much is due to adblockers/third-party cookie blocking and how much is Google playing with their data. Alternative analytics programs exist, but they're expensive and not demonstrably better data-wise (but usually better morally).


First up, raw server stats for Humboldt Makers:
1663046937390.png

And now, Google Analytics for that same time period:
1663047009134.png

Aaand finally Matomo, a self-hosted GA alternative I've been testing:
1663047084873.png

What do these numbers tell us? Well, not much, really. I'm muddying the waters a bit by using 3 data sets, but I believe that Matomo's "visits", GA's "sessions", and the server's "visits" are basically analogous. Pageviews are pretty standard between all three, as are unique visitors. That means that GA is reporting 1% as many visits as the server and .5% as many pageviews, while Matomo is reporting 5% as many visits as the server and 4% as many pageviews. I've been seeing Matomo get blocked by adblockers, although at a lower rate than GA seems to be. This tracks in the data, although there's no way to know how much secret sauce Google is applying before handing their data to me. There are ways to get Matomo to defeat adblockers, which would be an interesting experiment but not one that I'm comfortable running.
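The percentages above are just each tool's count divided by the server's count for the same metric. A tiny sketch of that arithmetic, with a hypothetical function name; plug in the numbers from your own reports, since the raw counts here live in the screenshots:

```python
def coverage_percent(tool_count: int, server_count: int) -> float:
    """Return the tool's count as a percentage of the server's count."""
    if server_count <= 0:
        raise ValueError("server_count must be positive")
    return 100.0 * tool_count / server_count
```

For example, a tool that reports 100 visits against 10,000 in the server logs comes out at 1%.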

Testing out alternative analytics packages is... interesting. There are a lot of different systems available, and they vary really broadly in terms of capabilities, ease of use, and price. However, none of them really compete with Google Analytics--they're typically relatively expensive, not as nice to look at, and have reduced features. Those aren't necessarily bad things, but it's problematic when most alternatives have all three together.

I think I'm going to keep comparing Google to other packages, since I've come this far 🤷‍♂️. I don't really expect to find dramatic differences, nor do I expect to find a really good replacement for Google Analytics. However, I am both curious enough and in a position where I can test the alternatives, so I might as well document it somewhere.
 