How can I tell how much bot traffic I'm getting in Google Analytics?

I've been doing a lot of work on improving my crawl budget and I want to see if Google Analytics reports similar data to the Search Console Crawl Stats charts.

It seems to me you could log the IPs that fail reCAPTCHA v2 or v3 and combine that with an exclude list in Analytics to filter out bots.

I don't want to use Google Analytics to filter out bots. I want to use Google Analytics to create a view that only includes bots.

Also, I'm not using reCAPTCHA.

There is an option to exclude bot traffic from Google Analytics reports. You can enable it to keep bot and spam traffic out of your GA reports.

I hope that helps.

I know there’s an option to disable bot traffic, but I am trying to do the opposite. I want to capture ONLY bot traffic.

I think what I’m going to do is resort to using a custom parameter in Google Analytics to capture whether it’s a bot or not.

So I have discovered how to do it :)

Since Googlebot ignores Google Analytics, you have to do this server-side. In my case, I created a new Property in my Analytics account to handle Googlebot traffic.

Then, you need to use the Google Analytics Measurement Protocol, which works by sending an HTTP request to https://www.google-analytics.com/collect with parameters of your choosing.

Documentation is available at https://developers.google.com/analytics/devguides/collection/protocol/v1/reference and a reference for all the parameter options is at https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters
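
To make the shape of a Measurement Protocol hit concrete, here is a minimal sketch of how a pageview payload could be assembled. The function name buildPageviewPayload, the property ID 'UA-XXXXX-Y', the client ID, and the example URL are placeholders I made up for illustration; only the parameter names (v, t, tid, cid, dl, dt) come from the protocol reference linked above.

```javascript
// Build a Measurement Protocol (v1) pageview payload as a URL-encoded string.
// buildPageviewPayload is a hypothetical helper, not part of any Google library.
function buildPageviewPayload(trackingId, clientId, params) {
    // v, t, tid and cid are required for every hit; params holds optional fields
    const fields = { v: '1', t: 'pageview', tid: trackingId, cid: clientId, ...params }

    return Object.entries(fields)
        .map(([key, value]) => encodeURIComponent(key) + '=' + encodeURIComponent(value))
        .join('&')
}

// 'UA-XXXXX-Y' and the client ID below are placeholders
const payload = buildPageviewPayload('UA-XXXXX-Y', '555.1234567890', {
    dl: 'https://example.com/some page', // document location (full URL)
    dt: '200 OK'                         // document title
})

console.log(payload)
```

The resulting string would then be sent as the body of a POST request to the /collect endpoint.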

In my specific case, I use Cloudflare CDN, which caches a lot of my pages on the edge for users who are not logged in. Since that's the case, I can't implement the HTTP request from within my PHP code, because it would only execute when the page is not cached.

Instead, I took advantage of Cloudflare Workers to send the HTTP request directly from the edge. In my case, I created a new worker with the following script:

const analyticsId = 'UA-98289-3'

addEventListener('fetch', event => {
    event.passThroughOnException()
    event.respondWith(handleRequest(event))
})

/**
 * Check request object for Googlebot UA to send tracking data
 * @param {FetchEvent} event
 */
async function handleRequest(event) {

    let request = event.request
    let userAgent = request.headers.get('user-agent')
    let response = await fetch(request)

    if (userAgent)
    {
        // If the UA mentions Google (Googlebot and other Google crawlers), track the hit in Analytics
        if (userAgent.includes('Google')) {

            event.waitUntil(analyticsHit(
                {
                    uip: request.headers.get('CF-Connecting-IP'),
                    dl: request.url,
                    dt:
                        response.status + ' ' +
                        response.statusText + ' ' +
                        userAgent
                }
            ))
        }
    }

    // Return the original content
    return response

}

/**
 * Send bot tracking data using Analytics Measurement Protocol
 * @param {Object} tracking
 */
function analyticsHit(tracking) {
    let payload = 'v=1&t=pageview&tid=' + analyticsId

    for (const key in tracking) {
        payload += '&' + key + '=' + encodeURIComponent(tracking[key])
    }

    payload += '&cid=' + [ Math.round(Math.random() * 2147483647), Math.round(+new Date() / 1000.0) ].join('.')

    return fetch('https://www.google-analytics.com/collect', {
        method: 'POST',
        body: payload
    })
}
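
One thing to be aware of: the check userAgent.includes('Google') in the script above matches any user agent containing "Google", which covers other Google crawlers such as AdsBot-Google as well as Googlebot itself. If only Googlebot is wanted, a stricter test might look like the sketch below. The helper name isGooglebot is hypothetical, and user agent strings can be spoofed, so this is only a filter on the claimed identity, not proof that the request came from Google.

```javascript
// Hypothetical helper: true only when the UA contains the "Googlebot" token.
// Note this still matches variants such as Googlebot-Image.
function isGooglebot(userAgent) {
    return typeof userAgent === 'string' && /Googlebot/i.test(userAgent)
}

// Sample user agent strings for comparison
const samples = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'AdsBot-Google (+http://www.google.com/adsbot.html)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
]

for (const ua of samples) {
    // First value: stricter check. Second value: the broader includes('Google') check.
    console.log(isGooglebot(ua), ua.includes('Google'))
}
```

In the worker, swapping userAgent.includes('Google') for isGooglebot(userAgent) would narrow tracking to Googlebot variants only.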

commented: Dani, it's great to see this information. It will be helpful in the future.

One thing I don't understand, though: why do you want to record bot traffic? In any case, you can use a filter in Google Analytics to find traces of your bot traffic.

Thanks @dani, you have provided a huge amount of information here. This forum will rock soon.

But I don't understand one thing: why do you want to record bot traffic?

It can be useful, from an SEO perspective, to keep track of how Googlebot crawls your site. I've since updated this code block to also tell Google Analytics whether the page has been noindexed or not. So now in my Google Analytics, I have views called "Broken Pages", "Indexable Pages", and "Wasted Crawl Budget". The Broken Pages view contains any page that doesn't return 200 OK, so I can find a list of all URLs Googlebot encountered that might be a 301 or a 404, and I can fix those. Wasted Crawl Budget contains any page that Google crawled that was noindexed, so I can optimize my internal linking structure so that Googlebot never has an opportunity to encounter a noindexed page (unless it arrives via an outside backlink).
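
As a rough illustration of how hits could be sorted into those three views, here is a minimal sketch. It assumes noindex is signalled through the X-Robots-Tag response header; if the site uses a robots meta tag instead, the HTML body would have to be inspected. The function name classifyCrawl is hypothetical and not taken from the updated worker.

```javascript
// Classify a crawled response into one of the three views described above.
// Assumption: noindex is signalled via the X-Robots-Tag response header.
function classifyCrawl(status, xRobotsTag) {
    // Anything other than 200 OK (301, 404, 500, ...) counts as broken
    if (status !== 200) return 'Broken Pages'

    // Header may be missing (null), so fall back to an empty string
    const noindexed = /\bnoindex\b/i.test(xRobotsTag || '')

    return noindexed ? 'Wasted Crawl Budget' : 'Indexable Pages'
}

console.log(classifyCrawl(404, null))
console.log(classifyCrawl(200, 'noindex, nofollow'))
console.log(classifyCrawl(200, null))
```

Inside the worker, this could be called as classifyCrawl(response.status, response.headers.get('X-Robots-Tag')) and the result sent along with the hit, for example in a custom dimension parameter such as cd1 (assuming a custom dimension with index 1 has been set up in the property), which the views can then filter on.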

commented: Wow, I never thought of it like that. It's very insightful.