I want to measure the throughput at each datanode by timing each read/write operation. It is hard to read through the many functions involved and work out where this actually happens. Could someone list the sequence of calls made while reading/writing a block of data? I am using version 1.0.1. Alternatively, if there is already an API that measures this at the datanode, I could use that information instead.
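For context, in the 1.0.x source the path is roughly: DataXceiverServer accepts the client socket and spawns a DataXceiver thread, whose run() dispatches on the DataTransferProtocol op code; OP_READ_BLOCK goes to readBlock(), which streams the block through BlockSender.sendBlock(), and OP_WRITE_BLOCK goes to writeBlock(), which receives it through BlockReceiver.receiveBlock(). Below is a minimal sketch of the kind of timing wrapper one could add around those dispatch points; the OpTimer class is hypothetical, not part of Hadoop:

```java
// Hypothetical helper for timing a single block op inside DataXceiver.
// Not part of Hadoop; it only illustrates where a measurement could be
// taken around the readBlock()/writeBlock() dispatch in 1.0.x.
public final class OpTimer {
    private final String op;       // e.g. "readBlock" or "writeBlock"
    private final long startNanos; // start of the operation

    private OpTimer(String op) {
        this.op = op;
        this.startNanos = System.nanoTime();
    }

    public static OpTimer start(String op) {
        return new OpTimer(op);
    }

    /** Call from a finally block once the op completes. */
    public void stop(long bytesTransferred) {
        long elapsedMs = (System.nanoTime() - startNanos) / 1000000L;
        double mbPerSec = elapsedMs > 0
            ? (bytesTransferred / (1024.0 * 1024.0)) / (elapsedMs / 1000.0)
            : 0.0;
        // Inside the datanode you would write to its log instead of stdout.
        System.out.printf("%s: %d bytes in %d ms (%.2f MB/s)%n",
                          op, bytesTransferred, elapsedMs, mbPerSec);
    }
}
```

Note that DataXceiver already records per-op latency into the datanode's metrics (the readBlockOp/writeBlockOp rates; names may differ slightly by minor version), so it is worth checking those counters before patching the source.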

The time for any specific read/write operation on a Hadoop cluster and its datanodes can vary significantly. I don't suppose you are running an industrial-strength management tool like Cloudera on your cluster, are you? Tools like that do track those sorts of metrics and can alert you when they exceed specified limits.

No. I cannot assume any metrics logging system like Ganglia is present. This must work on a "vanilla" Hadoop distribution.
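For what it's worth, even a vanilla Hadoop 1.x datanode exposes its internal metrics over the embedded web server (default port 50075), including the per-operation read/write timings mentioned above, so patching may not be necessary. Here is a rough sketch of polling that endpoint; the plain-text /metrics servlet is standard on 1.x daemons, but the exact metric names (readBlockOp/writeBlockOp is what I'd expect) should be verified against your build:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Polls a datanode's built-in metrics servlet and prints the lines that
// mention block read/write operations. Assumes the default datanode web
// port (50075) and the plain-text /metrics servlet shipped with Hadoop
// 1.x; inspect the full output first, since metric names can vary by
// minor version.
public class DataNodeMetricsPoller {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        URL url = new URL("http://" + host + ":50075/metrics");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream()));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // Keep only the block op timing lines.
                if (line.contains("readBlockOp") || line.contains("writeBlockOp")) {
                    System.out.println(line);
                }
            }
        } finally {
            in.close();
        }
    }
}
```

Running it as `java DataNodeMetricsPoller <datanode-host>` against each datanode in turn would give you per-node numbers without any external monitoring system.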
