Tech Tips For Sun Storage

 
Problem Description: Copy Back Not Starting after Replacing a Faulty Drive in a Sun Storage
2500, 2500-M2, 6140, 6540, 6580, 6780 or FlexLine 380 Array
 
Symptoms 



Use case 1
1. Drive failed by SYSTEM or USER.
2. Reconstruction to the global hot spare (GHS) completes successfully.
3. Failed drive is replaced in the enclosure.
 
Results:

  • For firmware 6.xx.xx.xx:
    The copy back operation starts, provided that no more than two (2) reconstruction or copy back operations (in any combination) are taking place on the system. If that limit is exceeded, the copy back is queued.
  • For firmware 7.10.xx.xx (all revisions) through 7.35.xx.xx (all revisions):
    The copy back operation starts, provided that no more than two (2) reconstruction or copy back operations (in any combination) are taking place on the system. If that limit is exceeded, user intervention is required to trigger the copy back. (See the Solution below.)
  • For firmware 7.50.xx.xx (all revisions) and higher:
    The copy back operation starts, provided that no more than two (2) reconstruction or copy back operations (in any combination) are taking place on the system. If that limit is exceeded, the copy back is queued.

Use case 2
1. Drive failed by SYSTEM or USER.
2. Reconstruction to GHS starts.
3. Failed drive is removed from the system and replaced before the reconstruction completes.
4. Reconstruction completes successfully.
 
Results:

  • For firmware 6.xx.xx.xx:
    The copy back operation starts, provided that no more than two (2) reconstruction or copy back operations (in any combination) are taking place on the system. If that limit is exceeded, the copy back is queued.
  • For firmware 7.10.xx.xx (all revisions) through 7.35.xx.xx (all revisions):
    The copy back operation is neither queued nor started automatically. User intervention is required to trigger the copy back. (See the Solution below.)
  • For firmware 7.50.xx.xx (all revisions) and higher:
    The copy back operation starts, provided that no more than two (2) reconstruction or copy back operations (in any combination) are taking place on the system. If that limit is exceeded, the copy back is queued.

 
Use case 3
1. Drive is pulled from system.
2. Reconstruction to GHS starts and completes.
3. Failed drive is replaced.
Results:

  • For firmware 6.xx.xx.xx:
    The copy back operation starts, provided that no more than two (2) reconstruction or copy back operations (in any combination) are taking place on the system. If that limit is exceeded, the copy back is queued.
  • For firmware 7.xx.xx.xx:
    The copy back operation is neither queued nor started automatically. User intervention is required to trigger the copy back. (See the Solution below.)
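
The results for use cases 1 to 3 can be condensed into a small lookup. The following Python sketch is illustrative only and is not part of CAM or the array firmware. The function names, the `busy` flag (meaning the reconstruction/copy back limit described above is exceeded) and the returned strings are assumptions made for this example. Firmware ranges that the text does not mention return "not covered".

```python
from typing import Tuple


def parse_firmware(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '07.35.67.10'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def copy_back_behavior(use_case: int, firmware: str, busy: bool = False) -> str:
    """Summarize the documented copy back result for use cases 1-3.

    busy is True when the reconstruction/copy back limit is exceeded.
    Returns 'starts', 'queued', 'manual' or 'not covered'.
    """
    if use_case not in (1, 2, 3):
        return "not covered"
    major, minor = parse_firmware(firmware)
    automatic = "queued" if busy else "starts"
    if major == 6:
        return automatic              # 6.xx.xx.xx: starts or is queued
    if major != 7:
        return "not covered"
    if use_case == 3:
        return "manual"               # all 7.xx.xx.xx firmware
    if 10 <= minor <= 35:
        if use_case == 1:
            return "manual" if busy else "starts"
        return "manual"               # use case 2
    if minor >= 50:
        return automatic
    return "not covered"              # range not described in the text


print(copy_back_behavior(2, "07.35.67.10"))             # manual
print(copy_back_behavior(1, "07.60.53.10", busy=True))  # queued
```

A result of "manual" corresponds to the cases that refer to the Solution section.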

Use case 4
1. Drive is bypassed by the array firmware due to a hardware issue.
2. Reconstruction to GHS starts upon the next write failure and completes later.
3. Bypassed drive is removed and replaced.
 
Results:

  • For firmware 07.60.53.10 and later in the 6000 series:
    The copy back operation will not start automatically. This requires user intervention to trigger the copy back. (See the Solution below)
  • For firmware 07.35.67.10 and later in the 2500 series:
    The copy back operation will not start automatically. This requires user intervention to trigger the copy back. (See the Solution below)
  • For firmware 07.77.13.11 and later in the 2500-M2 series:
    The copy back operation will not start automatically. This requires user intervention to trigger the copy back. (See the Solution below)
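
The firmware thresholds for use case 4 can be checked with a short comparison. The Python sketch below is a hypothetical helper, not a CAM feature. It assumes that firmware versions order numerically field by field, and the series labels used as dictionary keys are chosen for this example only.

```python
from typing import Optional, Tuple

# Minimum firmware per array series for which the text documents use case 4.
USE_CASE_4_MINIMUM = {
    "6000": "07.60.53.10",
    "2500": "07.35.67.10",
    "2500-M2": "07.77.13.11",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Convert '07.60.53.10' into (7, 60, 53, 10) for comparison."""
    return tuple(int(part) for part in version.split("."))


def needs_manual_copy_back(series: str, firmware: str) -> Optional[bool]:
    """Return True when the text says manual intervention is required.

    Returns None when the series or firmware level is not covered by the text.
    """
    minimum = USE_CASE_4_MINIMUM.get(series)
    if minimum is None:
        return None
    if version_tuple(firmware) >= version_tuple(minimum):
        return True
    return None


print(needs_manual_copy_back("2500-M2", "07.77.13.11"))  # True
```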

Cause
The copy back function has changed slightly between firmware revisions. Depending on the
firmware and the circumstances, a copy back from the GHS to the replacement drive may not
happen without manual intervention.
When a drive is bypassed by the array firmware (“Use case 4” above) and a replacement drive is
inserted into the same enclosure and slot, the new drive is seen as unassigned. Because drives
are tracked by their World Wide Name (WWN) and the original drive was never failed, the
controller firmware is still waiting for the original drive, which it tracks as not
present/optimal, to be reinserted before it copies back from the GHS. This is normal firmware
behavior.



Solution
Sun Storage Common Array Manager (CAM)

Note: If your array runs firmware level 6.xx.xx.xx, no action is needed after the
drive replacement. The copy back operation starts, provided that no more than two (2)
reconstruction or copy back operations (in any combination) are taking place on the system.
If that limit is exceeded, the copy back is queued.

Using the Browser User Interface (BUI):
1. Select the array in CAM.
2. Click on “Service Advisor”.
3. Expand “Portable Virtual Disk Management” on the left pane.
4. Select “Replace a Disk Drive” then follow the instructions.



Using the Command Line (CLI):
1. Use Sun Storage Common Array Manager (CAM) to confirm that reconstruction jobs are completed before moving forward to the next step.
2. Use the following CAM command line to list the drives needing replacement:

service -d <array-name> -c replace -q list
Location of the ‘service’ command:
Solaris: /opt/SUNWsefms/bin/
Linux: /opt/sun/cam/private/fms/bin/
Windows: C:\Program Files\Sun\Common Array Manager\Component\fms\bin\

 

Example:
/opt/SUNWsefms/bin/service -d st6140c -c replace -q list
Executing the replace command on st6140c
Drives needing replacment:
Tray.85.Drive.02
In use hot spares:
Tray.85.Drive.16
Unassigned drives available for replacment:
Tray.85.Drive.10
Tray.85.Drive.05
Tray.85.Drive.11
Tray.85.Drive.06
Tray.85.Drive.04
Tray.85.Drive.03
Tray.85.Drive.08
Tray.85.Drive.12
Tray.85.Drive.07
Tray.85.Drive.09
Tray.85.Drive.13
Tray.85.Drive.14
Tray.85.Drive.02

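The above example shows that the drive in tray 85, slot 02 (Tray.85.Drive.02) needs to be
replaced. This drive has already been physically replaced, which is why it also appears in the
list of unassigned drives, but the copy back did not start.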
 
3. Use the following CAM command line to manually trigger the copy back:
 

service -d <array-name> -c replace -t <drive_needing_replacement> -q <drive_to_be_used_for_the_replacement>

 

Example:
/opt/SUNWsefms/bin/service -d st6140c -c replace -t t85d02 -q t85d02
Executing the replace command on st6140c
Completion Status: Success

In the above example, we manually trigger the copy back by replacing the drive 85,02 with itself. This drive has already been physically replaced.
 
4. Use CAM to confirm that the copy back from the in-use GHS has started.
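
Steps 2 and 3 can be sketched in Python for administrators who script the check. This is a hedged illustration rather than a supported tool: it parses a listing in the format of the example above and prints the matching replace command without running it. The function names are hypothetical. The rule that a drive listed both as needing replacement and as unassigned has already been swapped is an assumption drawn from the example, as is the short drive name format (Tray.85.Drive.02 becomes t85d02).

```python
import re
from typing import Dict, List

# Section headers are matched by prefix, as they appear in the example output.
SECTION_PREFIXES = (
    ("Drives needing", "needing_replacement"),
    ("In use hot spares", "hot_spares"),
    ("Unassigned drives", "unassigned"),
)


def parse_replace_list(output: str) -> Dict[str, List[str]]:
    """Group the drive names of a 'replace -q list' listing by section."""
    sections: Dict[str, List[str]] = {name: [] for _, name in SECTION_PREFIXES}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        for prefix, name in SECTION_PREFIXES:
            if line.startswith(prefix):
                current = name
                break
        else:
            if current and line.startswith("Tray."):
                sections[current].append(line)
    return sections


def short_name(drive: str) -> str:
    """Convert 'Tray.85.Drive.02' to 't85d02' (format taken from the example)."""
    match = re.fullmatch(r"Tray\.(\d+)\.Drive\.(\d+)", drive)
    if match is None:
        raise ValueError(f"unexpected drive name: {drive!r}")
    return f"t{match.group(1)}d{match.group(2)}"


def copy_back_commands(array: str, output: str,
                       service: str = "/opt/SUNWsefms/bin/service") -> List[List[str]]:
    """Build one replace command per drive that was swapped but not copied back."""
    sections = parse_replace_list(output)
    commands = []
    for drive in sections["needing_replacement"]:
        if drive in sections["unassigned"]:   # assumption: already replaced
            name = short_name(drive)
            commands.append([service, "-d", array, "-c", "replace",
                             "-t", name, "-q", name])
    return commands


SAMPLE = """Executing the replace command on st6140c
Drives needing replacment:
Tray.85.Drive.02
In use hot spares:
Tray.85.Drive.16
Unassigned drives available for replacment:
Tray.85.Drive.10
Tray.85.Drive.02
"""

for command in copy_back_commands("st6140c", SAMPLE):
    print(" ".join(command))
```

The default path is the Solaris location listed in step 2; pass the Linux or Windows location through the `service` parameter on those platforms. Confirm that reconstruction has completed (step 1) before running any command the sketch prints.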