Hey guys, we would be interested in learning from your experience using Linux in Big Data projects. Has anyone used Hadoop, MapR, or Hortonworks on Linux, and what experiences have you had with them? I am more interested in knowing if a certain distribution of Linux is better supported for Hadoop and why? Also would like to know if anyone is using Gluster, and if so, are there any other alternatives similar to Gluster?
I am more interested in knowing if a certain distribution of Linux is better supported for Hadoop and why?
Don't know about Hadoop specifically, but I would imagine that Red Hat Enterprise Linux (Server) is one of the best OSes for this application (and by "best OS", I don't mean just the best Linux distro). Linux is used in the majority of servers ( > 60%) up to supercomputers ( > 90%), and RHEL is the prominent distro. If it's good enough for Google, Amazon, CERN, the US DoD, and most "cloud" providers, it's good enough for you. Another alternative is Fedora, the free community distribution sponsored by Red Hat, which serves as the upstream that RHEL is built from. You get a closely related, robust OS at no cost, but you'll have to do a bit more work to set things up correctly (and get good performance out of your server applications). By buying RHEL, you buy the support and assistance of the experienced professionals at Red Hat.
Novell's SUSE Linux Enterprise Server is also an obvious candidate, but I haven't heard as many good things about it, and these days it seems they are having trouble keeping up. I'm no expert in this at all, and don't work in this business either, but I just keep hearing great things about RHEL's stability, up-time, speed and low TCO. Red Hat also seems more proactive in the emerging cloud solutions.
I would guess that OpenSolaris or Oracle Linux wouldn't be bad choices either.
I'm sure you'll find more expert opinions than mine on this subject though.