Hadoop divides data into multiple file blocks and stores them on different machines. By default, all machines are deemed to be on the same rack, so if rack awareness is not configured there is a possibility that Hadoop will place every replica of a block on the same physical rack. This could result in data loss if that rack fails. Although such failures are rare, the risk can be avoided by explicitly configuring Rack Awareness.
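One common way to configure rack awareness is a topology script: Hadoop calls an executable with one or more host names or IP addresses as arguments and reads one rack path per argument from standard output. The following is a minimal sketch of such a script, not a definitive implementation. The addresses, rack names, and the `RACK_BY_HOST` table are hypothetical placeholders; a real cluster would load its own mapping.

```python
#!/usr/bin/env python3
"""Minimal sketch of a Hadoop rack topology script (illustrative only)."""
import sys

# Hypothetical host-to-rack table; replace with your cluster's real layout.
RACK_BY_HOST = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
    "10.0.2.22": "/dc1/rack2",
}

# Rack path returned for hosts missing from the table.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host):
    """Return the rack path for a host name or IP address."""
    return RACK_BY_HOST.get(host, DEFAULT_RACK)


def main(argv):
    # Emit one rack path per argument, in the same order as the arguments.
    for host in argv:
        print(resolve_rack(host))


if __name__ == "__main__":
    main(sys.argv[1:])
```

In recent Hadoop versions the script is typically registered by setting the `net.topology.script.file.name` property in `core-site.xml` to the script's path, and the file must be executable. Check the documentation for your Hadoop version before relying on this, since older releases used a different property name.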