I like to have some insight into which pages get visibility on this blog. For most of its existence I used Google Analytics, but I don’t like the prospect of participating in global tracking, so I wanted to get rid of it. I ended up building my own web analytics solution, and using PowerBI to interpret the results.

This post is also an example of an Nginx Ingress configuration that serves two backend applications from the same subdomain.

Overall Design

My idea is to build a JavaScript client that sends a message containing all the information I want. The message is then collected by a .NET Core backend, deployed in a container on Kubernetes. I call it Minilytics, and it’s available on GitHub.

Both the blog and the analytics solution are hosted on the same subdomain (www.feval.ca), so I use an Ingress configuration to route the traffic. Everything going to www.feval.ca/ana/ is directed to Minilytics, and the rest is directed to the blog. An Azure Storage Table is used to store the events. I then analyze the data with PowerBI, which offers a mobile application, allowing me to see all the dirty things you’re doing to my blog from the comfort of my pocket-computer.

Deployment

Minilytics is pure CRUD and doesn’t have any business logic. I don’t anticipate that it will need any updates, so I didn’t bother automating build or tests [1].

In my previous post I detailed how this blog is now deployed; this solution is relatively similar. It has a deployment and a service configured in a separate namespace, minilytics. I created a new Ingress configuration that gets deployed in the minilytics namespace:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/configuration-snippet: proxy_set_header x-forwarded-ip "$remote_addr";
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
  name: minilytics-ingress
  namespace: minilytics
spec:
  rules:
  - host: www.feval.ca # Should be variabilized and use helm instead.
    http:
      paths:
      - backend:
          serviceName: minilytics-service
          servicePort: 80
        path: /ana
  tls:
  - hosts:
    - www.feval.ca
    secretName: tls-secret

The important thing here is the path, /ana, which will just catch the traffic directed there. Interestingly enough, the ingress built for the blog remains unmodified. The backend is also serving the JavaScript client, which is simply included in the head of the page.

There’s not much to the client. It runs on the page’s onload event, after a one-second timeout, so as not to block the page loading. It steals the page you’re looking at, your “userid” (a random number, I just want to see if you ever come back), and your browser information (I’m not sure why, I don’t really care about it), and sends all of that to the backend.
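To make the description concrete, here is a minimal sketch of what such a client could look like. It is not the actual Minilytics client: the endpoint path /ana/events, the storage key minilytics-userid, the use of localStorage and the field names in the message are all assumptions made for illustration.

```javascript
// Hypothetical endpoint: the post only says that traffic under /ana reaches Minilytics.
const ENDPOINT = "/ana/events";
// Hypothetical key under which the random visitor id is kept.
const USER_ID_KEY = "minilytics-userid";

// Return the stored visitor id, or create and store a new random one.
// `storage` is anything with getItem/setItem, such as window.localStorage.
function getOrCreateUserId(storage) {
  let id = storage.getItem(USER_ID_KEY);
  if (id === null) {
    id = String(Math.floor(Math.random() * 1e9));
    storage.setItem(USER_ID_KEY, id);
  }
  return id;
}

// Build the message: the page viewed, the visitor id and the browser information.
function buildEvent(page, userId, userAgent) {
  return { page, userId, userAgent };
}

// Post the event as JSON; errors are swallowed so analytics never breaks the page.
function sendEvent(event) {
  return fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  }).catch(() => {});
}

// Only wire things up when running in a browser.
if (typeof window !== "undefined") {
  window.addEventListener("load", () => {
    // Wait one second after load so tracking does not compete with page rendering.
    setTimeout(() => {
      const userId = getOrCreateUserId(window.localStorage);
      sendEvent(buildEvent(window.location.pathname, userId, navigator.userAgent));
    }, 1000);
  });
}
```

Keeping getOrCreateUserId and buildEvent free of direct browser globals makes them easy to exercise outside a page, for example with a fake storage object.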

Reporting

The report is built with PowerBI Desktop. There’s a native connector for Azure Table Storage, so it’s pretty easy to use.

And there’s a mobile application, which allows me to consult the report directly on my phone.

Results

Now for the interesting part: how do the results compare to Google Analytics?

That’s 20% more over the same period!?

SO AM I OVERESTIMATING??

I don’t think so. Maybe there’s some logic in Google Analytics to filter out some bad requests. But for the most part, more and more browsers are killing Google Analytics (Firefox is blocking it by default), so the benefit of my own solution is more accurate numbers.

And this blog is now free from Google’s dirty hands.

Notes

  1. Shaaaame!