May 31, 2012, 12:31 pm UTC
Interview of my alter ego on StatCounter numbers
There is a piece of information from StatCounter making the rounds all over the internet, thanks to KPCB’s Mary Meeker and Liang Wu, who shared their Internet trends for 2012 a couple of days back. A friend’s email also quoted the same source a couple of weeks back and asked for my view on the numbers.
They quoted StatCounter to infer that:
“Rapidly Growing Mobile Internet Usage Surpassed More Highly Monetized Desktop Internet Usage in May, 2012, in India”
I have tried to understand StatCounter and the validity of these numbers by interviewing my alter ego.
*Disclaimer: No offence to StatCounter. These are my alter ego’s personal views and do not necessarily reflect mine, so Juxt is in no way responsible for these views. Technically, you can’t sue my alter ego, but please don’t sue me as an individual either for asking the questions. I am too broke to afford a court case.
Q. Do you know how StatCounter works?
A. Their website FAQ reads:
StatCounter is a web analytics service. Our tracking code is installed on more than 3 million sites globally.
Q. How did they get access to these 3 million websites globally?
A. I am not their spokesperson, but I guess they have that counting thingie/widget that we all see on blogs, or next to the Creative Commons logo on a lot of those three-year-old blogs with five posts.
Q. Who implements that kind of thing?
A. I guess the long-tail websites, and maybe Wiki; not sure though, I have to verify.
Q. Really, long-tail websites! How much of the total Web traffic do they account for in any country or the whole world, or for that matter in India?
A. Someone reputable once said 20%, and made a big theory out of it. Remember that Chris Anderson guy?
Q. What percentage of the 3 million websites is from India?
A. God knows; they never mentioned it anywhere. I guess 10%; okay, maybe 20%; but it can't be 50%.
Q. By the way, globally how many websites are there in total?
A. Oh! The last count was some crazy shit number. The last Netcraft Web server survey reported some 644.2 million websites across the world, out of which only about 169 million sites are active.
Q. Is the 3 million website number representative of these 169 million websites in the world?
A. It is complicated.
1. If they are mostly long-tail sites, then they are not. The top 20% of sites account for 80% of the traffic.
2. Theoretically, they can be representative, provided the 3 million were picked in such a way that the sample covers all kinds of websites in the world. Something like this: all the 169 million sites are divided into many small homogeneous groups, and StatCounter has a randomly picked sample of sites from each of those groups.
3. I doubt that's their method of picking the sample. It is definitely a biased sample, because only a certain kind of people voluntarily implemented the StatCounter widget/JavaScript/etc. on their own websites.
Q. Why don't they weight their data?
A. I also wanted to know that. They hand out some defensive abracadabra like the following in their FAQs section:
We do not impose artificial weightings on our stats – this is a conscious and deliberate decision. Weighting stats means that the stats are only as good as the weighting methodology used. If the weighting data is inaccurate or out of date, then it renders the data completely incorrect. Further, applying a weighting factor to inaccurate data does not turn it into meaningful information – no matter what weighting factors are applied, the geographical spread of the initial stats is very important. For these reasons, we choose not to weigh our data in any way and, instead, we report it as we record it.
The reality is that they don't have a method to weight their data. Who knows the universe of websites in the world? And anyway, how can you divide that universe into small homogeneous units/cells and then compare the sample to the universe, which is the traditional foolproof weighting method I am aware of? That is not practically possible for anyone; well, maybe anyone other than Google, which crawls and indexes all kinds of websites. No bragging here, but if I were a Google employee with access to that kind of data, I could have tried something like that with StatCounter.
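To make the cell-weighting idea concrete, here is a minimal Python sketch of it (often called post-stratification). Everything in it is an assumption for illustration: the two cells ("head" and "long_tail"), the hit counts, and the function name are made up, and the 80/20 traffic split is simply borrowed from the long-tail argument above. It is not StatCounter's method; they say they do not weight at all.

```python
def poststratified_mobile_share(sample, universe_share):
    """Weight each cell's mobile share by that cell's share of real traffic.

    sample:         {cell: (total_hits, mobile_hits)} observed in the panel
    universe_share: {cell: fraction of all traffic that the cell represents}
    """
    weighted = 0.0
    for cell, (total_hits, mobile_hits) in sample.items():
        weighted += universe_share[cell] * (mobile_hits / total_hits)
    return weighted


# Hypothetical panel: heavily skewed towards long-tail sites.
sample = {"head": (200, 40), "long_tail": (800, 480)}
# Assumed universe: the head sites carry 80% of traffic, the long tail 20%.
universe_share = {"head": 0.8, "long_tail": 0.2}

raw_share = sum(m for _, m in sample.values()) / sum(t for t, _ in sample.values())
weighted_share = poststratified_mobile_share(sample, universe_share)

print(f"unweighted mobile share: {raw_share:.0%}")
print(f"weighted mobile share:   {weighted_share:.0%}")
```

With these invented numbers the unweighted panel shows a mobile share of about half, while the weighted figure is much lower. The catch is the one raised above: the sketch only works if someone actually knows `universe_share`.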
Q. By the way, what is it that they call a sample in StatCounter: websites or people/devices using websites?
A. Don’t they call ‘hits’ their sample? Their FAQ reads:
In January 2012, our global sample consisted of over 18.1 billion hits. The ten countries with the largest individual sample sizes are listed below:
· 4.4 billion - United States
· 1.1 billion - Turkey
· 925 million - India
· 893 million - Brazil
Q. What is a ‘hit’?
A. The last time I brushed up on my Internet knowledge, ‘hit’ meant the following:
"Each time a visitor types in your Web address, the browser ‘hits’ the website's server for the document that tells the browser how to display your Web page. As the browser begins to display the information, it will continue to hit the server for any additional elements needed. For example, a logo will represent one hit as the server returns that image to your browser. Any background design constitutes another hit. And a music file represents yet another."
Q. How did they get the PC 50% : Mobile 50% number for India?
A. They say:
With each hit, a user-agent string is sent which allows us to determine the browser and operating system used, and also to establish if the hit came from a mobile device.
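As a rough illustration of what classifying a hit by its user-agent string can look like, here is a small Python sketch. The token list and the function name `is_mobile_hit` are assumptions made up for this example; StatCounter does not publish its rules here, and real detection is far more involved.

```python
# Hypothetical substrings that often appear in mobile user-agent strings.
MOBILE_TOKENS = ("mobile", "android", "iphone", "blackberry", "opera mini")


def is_mobile_hit(user_agent: str) -> bool:
    """Return True if the user-agent string looks like a mobile device."""
    ua = user_agent.lower()
    return any(token in ua for token in MOBILE_TOKENS)


iphone_ua = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Mac OS X) "
    "AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B176 Safari/7534.48.3"
)
desktop_ua = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) "
    "AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.52 Safari/536.5"
)

print(is_mobile_hit(iphone_ua))
print(is_mobile_hit(desktop_ua))
```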
That means it is fairly simple: they just added up the total hits from India (= PC hits + mobile hits) and then applied a very simple formula, (PC hits / total hits) : (mobile hits / total hits), which came out to roughly 50:50.
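The arithmetic is small enough to write down. The Python sketch below uses hypothetical hit counts (the post does not give the real PC and mobile figures for India) and a made-up function name, `platform_shares`.

```python
def platform_shares(pc_hits: int, mobile_hits: int) -> tuple[float, float]:
    """Return (PC %, mobile %) of total hits."""
    total_hits = pc_hits + mobile_hits
    if total_hits == 0:
        raise ValueError("no hits recorded")
    return 100 * pc_hits / total_hits, 100 * mobile_hits / total_hits


# Hypothetical counts, purely for illustration.
pc_pct, mobile_pct = platform_shares(pc_hits=450, mobile_hits=550)
print(f"PC {pc_pct:.1f}% : Mobile {mobile_pct:.1f}%")
```

Note that this is a share of hits, not of people or devices: one heavy user generating many hits counts as much as many light users.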
Q. Oops... you confused the hell out of me... any last crisp comment... you really talk a lot! :)
A. Of the 3 million global sites of StatCounter, whatever percentage is from India is possibly visited by self-obsessed Internet industry people like you and your friends, sitting in some air-conditioned room in a metro, who have already migrated to a mobile Internet device. So, that damn hit is from A’s iPhone, B's BlackBerry, and your Android. Thanks for all the questions. Now I need to rush!
Conclusion: We must not believe all the numbers that we come across from various sources. Numbers never lie, but people do. Any number is only as good as the definitions and the collection/collation method behind it. Please question how a number was arrived at before taking it at face value.
Part 1 of the story was published here.