Example that Reduces Contention in a DataStage Job Configuration File

The alternative to the first configuration method is more careful planning of I/O behavior to reduce contention. This is harder than it sounds on our hypothetical 6-way SMP with 4 disks, because the obvious one-to-one correspondence between nodes and disks cannot be set up. Doubling up some nodes on the same disk is unlikely to be good for overall performance, since it creates a hotspot.

We could give every CPU two disks and rotate among them, but that would be little different from the previous strategy. So let’s instead imagine a less constrained environment with two additional disks:

computer host name “fastone”
6 CPUs
6 separate file systems on 6 drives named /fs0, /fs1, /fs2, /fs3, /fs4, /fs5

 

Now a configuration file for this environment might look like this:

 
                    {
                    node "n0" {
                                         pools ""
                                         fastname "fastone"
                                         resource disk "/fs0/ds/data" {pools ""}
                                         resource scratchdisk "/fs0/ds/scratch" {pools ""}
                    }
                    node "node2" {
                                         fastname "fastone"
                                         pools ""
                                         resource disk "/fs1/ds/data" {pools ""}
                                         resource scratchdisk "/fs1/ds/scratch" {pools ""}
                    }
                    node "node3" {
                                         fastname "fastone"
                                         pools ""
                                         resource disk "/fs2/ds/data" {pools ""}
                                         resource scratchdisk "/fs2/ds/scratch" {pools ""}
                    }
                    node "node4" {
                                         fastname "fastone"
                                         pools ""
                                         resource disk "/fs3/ds/data" {pools ""}
                                         resource scratchdisk "/fs3/ds/scratch" {pools ""}
                    }
                    node "node5" {
                                         fastname "fastone"
                                         pools ""
                                         resource disk "/fs4/ds/data" {pools ""}
                                         resource scratchdisk "/fs4/ds/scratch" {pools ""}
                    }
                    node "node6" {
                                         fastname "fastone"
                                         pools ""
                                         resource disk "/fs5/ds/data" {pools ""}
                                         resource scratchdisk "/fs5/ds/scratch" {pools ""}
                    }
                    } /* end of entire config */

 

While this is the simplest scenario for spreading I/O evenly, it is important to realize that no single player (the process running a stage or operator instance on a given partition) can go faster than the single disk it has access to.

 

You could combine strategies by adding a node pool in which disks have a one-to-one association with nodes. These nodes would not be in the default node pool, but in a special pool that you would explicitly assign to stage/operator instances, as sketched below.
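
As a minimal sketch, assuming a hypothetical extra drive mounted at /fs6 and a hypothetical pool name "io1" (neither appears in the example above), such a node could be declared as follows. Because the node lists only the "io1" pool and not the default "" pool, it is used only by stages that are explicitly constrained to that node pool (for example, through the stage's advanced node pool constraint properties):

                    node "node7" {
                                         fastname "fastone"
                                         pools "io1" /* member of the "io1" pool only, not the default "" pool */
                                         resource disk "/fs6/ds/data" {pools ""}
                                         resource scratchdisk "/fs6/ds/scratch" {pools ""}
                    }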

Smaller Configuration Example

Because disk and scratchdisk resources are assigned per node, smaller environments (fewer CPUs and less memory) may need several file systems assigned to each node in order to provide the total disk space that large jobs require.

 

Using a similar server, this time with 4 CPUs and two more drives:

computer host name “fastone”
4 CPUs
8 separate file systems on 8 drives named /fs0, /fs1, /fs2, /fs3, /fs4, /fs5, /fs6, /fs7

 

                    {
                    node "node1" {
                                         fastname "fastone"
                                         pools ""
                                         resource disk "/fs0/ds/data" {pools ""} /* start with fs0 */
                                         resource disk "/fs4/ds/data" {pools ""}
                                         resource scratchdisk "/fs4/ds/scratch" {pools ""} /* start with fs4 */
                                         resource scratchdisk "/fs0/ds/scratch" {pools ""}
                    }
                    node "node2" {
                                         fastname "fastone"
                                         pools ""
                                         resource disk "/fs1/ds/data" {pools ""}
                                         resource disk "/fs5/ds/data" {pools ""}
                                         resource scratchdisk "/fs5/ds/scratch" {pools ""}
                                         resource scratchdisk "/fs1/ds/scratch" {pools ""}
                    }
                    node "node3" {
                                         fastname "fastone"
                                         pools ""
                                         resource disk "/fs2/ds/data" {pools ""}
                                         resource disk "/fs6/ds/data" {pools ""}
                                         resource scratchdisk "/fs6/ds/scratch" {pools ""}
                                         resource scratchdisk "/fs2/ds/scratch" {pools ""}
                    }
                    node "node4" {
                                         fastname "fastone"
                                         pools ""
                                         resource disk "/fs3/ds/data" {pools ""}
                                         resource disk "/fs7/ds/data" {pools ""}
                                         resource scratchdisk "/fs7/ds/scratch" {pools ""}
                                         resource scratchdisk "/fs3/ds/scratch" {pools ""}
                    }
                    } /* end of entire config */

 

The 4-node example above illustrates another concept in configuration file setup: you can assign multiple disk and scratchdisk resources to each node. Notice also that each node's scratchdisk list is reversed relative to its disk list (see the /* start with ... */ comments), so that, where possible, scratch I/O starts on a different drive than data set I/O.

 

Unfortunately, the physical limitations of the available hardware and disk configuration don’t always lend themselves to the “clean” configurations illustrated above.

 

Other configuration file tips:

Consider avoiding the disk(s) that your input files reside on. Those disks will often be hotspots until the input phase is over. If the job is large and complex, this matters less, since the input phase is a proportionally smaller share of the total work.
 

Ensure that the file systems named as disk and scratchdisk resources hit disjoint sets of spindles, even if they are located on a RAID system. Do not trust high-level RAID/SAN monitoring tools; their “cache hit ratios” are often misleading.

Never use NFS file systems for scratchdisk resources. Know what is real and what is NFS: real disks are either directly attached or reachable over a SAN (storage-area network: dedicated, storage-only, low-level protocols).

 

Proper configuration of scratch and resource disks (and the underlying file system and physical hardware architecture) can significantly affect overall job performance. Be careful when using NFS (and, often, SAN) file system space for disk resources. For example, your final result files may need to be written to an NFS disk area, but that does not mean the intermediate data sets created and used temporarily within a multi-job sequence should use that NFS area. It is better to set up a "final" disk pool, constrain the result sequential file or data set to reside there, and let intermediate storage go to local or SAN resources rather than NFS, as sketched below.
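
As a minimal sketch, assuming a hypothetical NFS mount at /nfs/results, one node's resource disks could be split between the default pool (local storage for intermediate data sets) and a "final" disk pool that holds only the NFS area. A data set or file written by a stage constrained to the "final" disk pool then lands on NFS, while everything left in the default pool stays on local disk:

                    node "node1" {
                                         fastname "fastone"
                                         pools ""
                                         resource disk "/fs0/ds/data" {pools ""} /* default pool: local disk for intermediate data sets */
                                         resource disk "/nfs/results/ds/data" {pools "final"} /* hypothetical NFS path, used only when the "final" disk pool is requested */
                                         resource scratchdisk "/fs0/ds/scratch" {pools ""}
                    }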
