I apologize in advance for this being quite a long post... but any help is great!

I have a website (coded in PHP) which publishes content provided by users in a revenue sharing model.

We pay each contributor a certain amount for every 1,000 "views" that their articles get. In the fine print, a "view" is intended to mean a unique visitor in a 24-hour period. So if you visit an article three times in 24 hours, the contributor only gets 1 view.

At first, we were tracking views simply by storing a cookie in the user's browser. However, the views we recorded were seriously inflated compared to the pageviews recorded by Google Analytics. My guess is that we were counting views from spiders and other robots, which Google does not include (right?).

So we then switched over to using the Google Analytics API to track "unique pageviews" for each article. This works very well. BUT, there is a problem:

Many of the articles we publish are broken up into multiple pages. The URL structure is such that each separate page has a unique URL (e.g., /article/21/page-2/this-is-the-article). In order to track the views for a certain article, we aggregate the unique pageviews from each page to one count for the entire article.
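
To make the aggregation concrete, it works roughly like this (just a sketch: the per-URL counts stand in for whatever the Analytics API returns, and the format of the first-page URL is an assumption):

<?php
// Per-URL unique pageviews, as returned (in some form) by the Analytics API.
$uniquePageviews = array(
    '/article/21/this-is-the-article'        => 120, // assumed first-page URL
    '/article/21/page-2/this-is-the-article' => 95,
    '/article/21/page-3/this-is-the-article' => 80,
);

// Roll the per-page counts up into one count per article ID.
$articleTotals = array();
foreach ($uniquePageviews as $url => $views) {
    if (preg_match('#^/article/(\d+)/#', $url, $m)) {
        $articleId = (int) $m[1];
        if (!isset($articleTotals[$articleId])) {
            $articleTotals[$articleId] = 0;
        }
        $articleTotals[$articleId] += $views;
    }
}

print_r($articleTotals); // article 21 => 295, even if one visitor read all three pages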

So that means that if a visitor reads a three-page article, 3 "views" will still be recorded instead of the intended 1 (because we want to track unique visitors per article, not per page of an article).

The question therefore comes down to:

How do we track unique viewers of a single article, even if the article is split over multiple URLs (pages)?

Any help would be greatly appreciated!

All 9 Replies

1. Check that the visitor is valid; if valid, get/save their IP.
2. $_SERVER['REMOTE_ADDR'] gets the IP of the viewer.
3. Assign a "user_id" or save a cookie for that IP/user.
4. Get today's date as the day of visit, store it in a column in your database, and track their views against that date.
5. What about tomorrow? If the saved date equals the server's current date, they are viewing the page on the same day; if they are not equal, grab the system date and overwrite the previous one (and count a new view). A rough sketch follows below.
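
Something along these lines, as a very rough sketch (the PDO connection details, table name, and column names are all made up):

<?php
// One row per visitor per article, keyed on a cookie (falling back to the IP),
// storing the date of the last counted visit.
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');

$articleId = 21;                                   // whichever article is being viewed
$visitorId = isset($_COOKIE['visitor_id'])
    ? $_COOKIE['visitor_id']
    : $_SERVER['REMOTE_ADDR'];                     // fall back to the visitor's IP
$today = date('Y-m-d');

$stmt = $pdo->prepare('SELECT last_view_date FROM article_views
                        WHERE article_id = ? AND visitor_id = ?');
$stmt->execute(array($articleId, $visitorId));
$lastView = $stmt->fetchColumn();

if ($lastView === false) {
    // First ever visit by this visitor: store today's date and count one view.
    $pdo->prepare('INSERT INTO article_views (article_id, visitor_id, last_view_date)
                   VALUES (?, ?, ?)')
        ->execute(array($articleId, $visitorId, $today));
} elseif ($lastView !== $today) {
    // The saved date is not today's date: overwrite it and count another view.
    $pdo->prepare('UPDATE article_views SET last_view_date = ?
                    WHERE article_id = ? AND visitor_id = ?')
        ->execute(array($today, $articleId, $visitorId));
}
// If the saved date equals today's date, the visitor has already been counted today.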

Ok... that just sounds like a way to track a unique visitor though, regardless of whether or not it is a robot?

If I am going to track visitors with my own php code, I need a way to differentiate between bots and real visitors.

Well, by simple logic, why not assume that one unique view means viewing the first page of an article? If so, just count the unique views on the first page of each article.

A note about search engine bots: most search engines identify themselves in the User-Agent HTTP header. If you write your own unique view counter (cookies will work; it's how Google does it!), simply set it to ignore search engine spiders. Googlebot's user agent string contains "Googlebot", so just skip the view counter when $_SERVER['HTTP_USER_AGENT'] contains "Googlebot".

Read Detecting Search Engine Bots in PHP for more information on search engine spider detection.
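
Putting the two ideas above together, a minimal sketch might look like this (how you derive the article ID and page number from the URL depends on your routing; the values here are placeholders):

<?php
// Only count a view on the first page of an article, and skip known spiders.
$articleId  = 21;   // from your URL routing
$pageNumber = 1;    // from your URL routing

function isSearchEngineBot() {
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    // Googlebot, Bingbot, Yahoo Slurp, Baiduspider, etc. identify themselves
    // in the user agent string.
    return (bool) preg_match('/googlebot|bingbot|slurp|baiduspider/i', $agent);
}

if ($pageNumber === 1 && !isSearchEngineBot()) {
    // Record the unique view here (e.g. the cookie/date check sketched earlier).
}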

Side Note: I would strongly advise against using an IP to track a user. First, if a user is on a large network running through a central router, all the computers on that network would have the same IP address. Large businesses and colleges usually have several IP addresses, but not a unique IP for each computer. This means that if 100 unique people on one network view an article once in a 24 hour period, it will only register as 1 unique view, instead of 100. Beyond this, IP addresses are easily masked. Users can hide behind a proxy or use a client such as Tomato to change IP addresses.

They may share one IP, but they will differ in their cookies, user_ids, or usernames, right?
Tracking the IP is still useful if you want to ban a user by that IP :-)
Don't concentrate on the IP alone; there are many other factors or fields to consider in your database.

I thought of the first idea you mentioned there, only counting views of the first page. But I'm a little worried that this means users who come in via a search engine to, say, the third page won't get counted. Though, at the same time, it might be rare that someone reads the third page without first going back to the first... maybe that will work.

Also, I like the idea of using my own cookie to track visitors, but then you get into the challenge of identifying more malicious bots that might not identify themselves as clearly as Google and the other major search engines.

I checked my statistics, and as a matter of fact about 90% of pageviews for subsequent pages of a multi-page article were coming from previous pages. So this seemed like a good solution, especially since an additional 4-7% of viewers who landed in the middle of an article ultimately went back to view the first page. Overall, just counting first-page views captures most unique article views.

Another suggestion: if you are worried about search engines landing on pages other than the first, you can set up your robots.txt so that spiders are only allowed to crawl the first page of each article. You could also try a simple script, included on every page after the first, that checks whether the referrer is a search engine and adds some JavaScript to manually force Google Analytics to track the user.
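
The referrer idea might look something like this on pages after the first (a sketch that assumes the classic asynchronous Analytics snippet (_gaq) is already loaded on the page; the first-page URL is a placeholder):

<?php
// Included on page 2, 3, ... of an article. If the visitor arrived straight
// from a search engine, log a virtual pageview for the article's first page
// so that the "count the first page only" approach still sees them.
$referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$refHost  = (string) parse_url($referrer, PHP_URL_HOST);

if (preg_match('/(google|bing|yahoo)\./i', $refHost)) {
    $firstPageUrl = '/article/21/this-is-the-article'; // placeholder
    echo '<script type="text/javascript">
            _gaq.push(["_trackPageview", "' . $firstPageUrl . '"]);
          </script>';
}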

If you are satisfied with your solution to the problem, I ask that you mark this thread as solved and continue to post on the DaniWeb forums!

If you are worried about bots or other automated systems, you can put image verification on your page. You could use something like "PHP Contact form with image verification".

When you put a form on your web page, you are susceptible to being spammed by automated systems. To make sure that whoever completes the form is human, you can use image verification. (from hotscripts)
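
For what it's worth, a bare-bones version of the image verification idea can be put together with PHP's GD extension (this is only an illustration, not the hotscripts script itself; it assumes GD is installed and that the form handler compares the submitted code to $_SESSION['captcha_code']):

<?php
// captcha.php -- outputs a PNG of a random code and stores the code in the session.
session_start();

$code = substr(str_shuffle('ABCDEFGHJKLMNPQRSTUVWXYZ23456789'), 0, 5);
$_SESSION['captcha_code'] = $code;

$img = imagecreatetruecolor(120, 40);
$bg  = imagecolorallocate($img, 255, 255, 255);
$fg  = imagecolorallocate($img, 30, 30, 30);
imagefilledrectangle($img, 0, 0, 119, 39, $bg);
imagestring($img, 5, 25, 12, $code, $fg);

header('Content-Type: image/png');
imagepng($img);
imagedestroy($img);

// The form page would embed <img src="captcha.php"> and, on submit, compare the
// user's input to $_SESSION['captcha_code'].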

And how is this at all relevant? Yes, spam bots may attempt a DoS attack with several requests a second, and that would call for a CAPTCHA, but this thread has nothing to do with a contact form.
