Hi,

I have always wondered about the architecture of social networking websites like Twitter, Facebook, and Google Plus. Just out of curiosity, I want to know how they manage all those long and short posts (including images, web links, etc.) with comments on them. Do they store them in a database, in XML files, or some mix and match? How do they actually manage such a large amount of data?

Thanks

Read about some of the infrastructure that they use:
https://developers.facebook.com/opensource/

Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, ad hoc querying, and analysis of large datasets.

Hi JorgeM,
Thanks for the reply.
I've checked the FB engineering talks, but it's not what I am looking for. My curiosity is more about how posts and the comments on them are stored, whether in a database or in some XML file, and what the consequences of doing that would be. Or is there some better approach?

I would tend to believe that the data would not be stored in flat XML files. Setting aside the sheer volume of data those sites process, on a smaller scale you would just store the information in a typical relational database.
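On that smaller scale, the relational layout might look something like this. This is a minimal sketch in Python/SQLite with a hypothetical schema, purely for illustration; it is not anyone's actual production design:

```python
import sqlite3

# Hypothetical schema: one row per post, one row per comment,
# with comments pointing back at their post via a foreign key.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE posts (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    body    TEXT NOT NULL,   -- post text; images and links would usually
                             -- live on a file store/CDN, with only their
                             -- URLs kept in the database
    created TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE comments (
    id      INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL REFERENCES posts(id),
    user_id INTEGER NOT NULL,
    body    TEXT NOT NULL,
    created TEXT DEFAULT CURRENT_TIMESTAMP
);
-- an index on post_id keeps "all comments for this post" fast
CREATE INDEX idx_comments_post ON comments(post_id);
""")

# one post with two comments
cur.execute("INSERT INTO posts (user_id, body) VALUES (1, 'Hello world')")
post_id = cur.lastrowid
cur.executemany(
    "INSERT INTO comments (post_id, user_id, body) VALUES (?, ?, ?)",
    [(post_id, 2, "First!"), (post_id, 3, "Nice post")],
)

# fetching a post's comments is a single indexed query
cur.execute("SELECT body FROM comments WHERE post_id = ? ORDER BY id",
            (post_id,))
comment_bodies = [row[0] for row in cur.fetchall()]
print(comment_bodies)  # ['First!', 'Nice post']
```

At modest volumes this normalized design is simple and works well; the trouble only starts at the scale the big sites operate at.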

The problem is that with that amount of data, the traditional relational database model isn't going to perform as well as systems designed to handle what's known as Big Data.
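For illustration only, one common Big Data style alternative is to denormalize: keep a post and all its comments together as a single document keyed by post ID, so a read is one key lookup instead of a join. Here is a toy sketch using a plain dict in place of a real document/key-value store:

```python
import json

# A post and its comments stored as one denormalized document
# (the kind of shape a document store such as MongoDB might hold).
post_doc = {
    "post_id": 42,
    "user_id": 1,
    "body": "Hello world",
    "comments": [
        {"user_id": 2, "body": "First!"},
        {"user_id": 3, "body": "Nice post"},
    ],
}

# A plain dict stands in for the key-value store here;
# the whole thread is one blob keyed by post_id.
store = {post_doc["post_id"]: json.dumps(post_doc)}

# Reading a post plus all of its comments is a single key lookup,
# which is easy to spread across many machines (sharding by key).
loaded = json.loads(store[42])
print(len(loaded["comments"]))  # 2
```

The trade-off is duplication and weaker cross-record queries, which is part of why these systems pair with analysis layers like Hive mentioned above.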


So, performance-wise, will it be effective to store a typical FB-style post with, say, 100 comments in a relational DB, when there are around 50 such posts?