Data Management
Data management is the development and execution of architectures, policies, practices and procedures in order to manage the information lifecycle needs of an enterprise in an effective manner.
Data management experts stress that data life cycle management is not a product, but a comprehensive approach to managing an organization’s data, involving procedures and practices as well as applications.
Why Data Management?
- Poor Data quality.
- Lack of common standards.
- Data Consolidation
- Data loss due to lack of resources
- Metadata, rather than the data itself, is often the problem
- If metadata does not agree, it becomes difficult to transfer the underlying information from one application or division to another.
- Control costs
- Unprotected Data
- Compliance with state and federal regulations like SOX, HIPAA
The need for data management arises from the issues above, and it grows more critical as organizations are increasingly swamped with data.
Benefits of Data Management
Data management addresses the needs and issues above through the following:
- Data Analysis
- Database management system
- Data modeling
- Data migration or movement
- Data Quality assurance
- Data Security
- Metadata management
- Data protection, among others
Data Management Process
Data Management Process is divided into three major categories:
- Change Management
- Performance and capacity planning
- Tiered storage
All of the above are encompassed under Information Life Cycle Management (ILM).
ILM (Information Life Cycle Management)
ILM (Information Life Cycle Management) manages stored data from the time it is created until it is destroyed, saving storage costs.
ILM is a comprehensive approach to managing the flow of an information system’s data and associated metadata from creation and initial storage to the time when it becomes obsolete and is deleted.
ILM involves all aspects of dealing with data. It covers:
- Automating the processes involved.
- Organizing data into separate tiers according to specified policies.
- Automating migration from one tier to another based on those policies. For example, newer data and data that must be accessed more frequently is stored on faster but more expensive storage media, while less critical data is stored on cheaper but slower media.
- Recognizing that the importance of data does not depend solely on its age or how frequently it is accessed.
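The tiering policy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier names ("fast", "capacity", "archive"), the thresholds and the `business_critical` flag are hypothetical and do not come from any particular ILM product.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    name: str
    age_days: int                    # days since the data was created
    reads_per_month: int             # how often the data is accessed
    business_critical: bool = False  # importance that age and access do not capture


def choose_tier(item, hot_reads=100, archive_age_days=365):
    """Pick a storage tier for an item (hypothetical tier names and thresholds)."""
    # Critical or frequently accessed data goes to fast, expensive media.
    if item.business_critical or item.reads_per_month >= hot_reads:
        return "fast"
    # Old data that is no longer read goes to the cheapest, slowest media.
    if item.age_days >= archive_age_days and item.reads_per_month == 0:
        return "archive"
    # Everything else stays on the middle tier.
    return "capacity"


print(choose_tier(DataItem("orders_2009.db", age_days=800, reads_per_month=0)))
```

The `business_critical` flag reflects the last point in the list: an old, rarely read item may still belong on fast storage if the business depends on it.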
Issues/Challenges in Data Management
- Data Migration on Tiered storage system
- Controlling the Cost
- Backup or Archive?
- How can out-of-capacity issues be resolved?
- Why is a storage capacity plan needed?
- Application Capacity
Data Management Tools
Administrators use many tools for Data Management. They are classified based on the tasks managed by them. Some of them are
- Configuration
- Provisioning
- Migration
- Archiving
- Performance Measurement
Configuration Tools:
- Supports setup and operational characteristics.
Ex. HDS (Hitachi Data Systems Inc.) allows users to define, change and reassign logical unit numbers (LUNs) without rebooting, handle virtual LUNs, manage cache, and maintain security.
Provisioning Tools:
- They are used to allocate storage resources to specific users or applications. As with configuration tools, provisioning tools are expanding their compatibility across multiple platforms.
Ex. HP’s Storage Essentials Enterprise Edition software
Veritas Provisioning Manager from Symantec Corp.
Migration tools:
- Handle data movement from one storage platform to another.
Ex. Transparent Data Migration Facility (TDMF) from Softek Storage Solutions Corp
Archiving tools:
- They are similar to migration tools but intended for long-term data retention and they focus on single-instance storage.
Ex. SnapLock from Network Appliance Inc. (NetApp), IBM’s DR500
Monitoring, Measurement & Reporting tools:
- Gain insight into the behaviour and usage of the storage infrastructure, identify problem performance areas and plan for future storage expansions.
Ex. Symantec’s Veritas CommandCentral Storage product
StorageEssentials SRM from HP
Data classification
Data classification allows a corporation to organize its data according to its relative value so that it can be stored on the appropriate tier and more easily retrieved.
Data classification covers four main areas:
- Discovery
- Classification
- Search
- Migration
Features:
- More robust and thorough.
- Able to examine the files and documents for keyword sequences and make contextual decisions about the data.
- Has enough information to draw inferences and make intelligent decisions about the data.
- May be implemented as hardware or software.
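As a rough illustration of examining documents for keyword sequences, the sketch below assigns a label to a piece of text using regular expressions. The labels, the keyword lists and the `classify` function are assumptions made for this example. Real classification products make far richer contextual decisions.

```python
import re

# Hypothetical rules, checked in order: the first matching rule wins.
RULES = [
    ("restricted", re.compile(r"\b(social security number|patient record)\b", re.I)),
    ("confidential", re.compile(r"\b(salary|contract|invoice)\b", re.I)),
]


def classify(text):
    """Return the label of the first rule whose keywords appear in the text."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    # No keyword matched, so treat the document as ordinary data.
    return "general"


for sample in ["Patient record for ward 3", "Q3 invoice attached", "Lunch menu"]:
    print(sample, "->", classify(sample))
```

A label produced this way could then drive the search and migration areas listed above, for example by moving "restricted" documents to a more protected tier.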
Data Protection
Storage professionals must keep a company’s data safe through disaster recovery planning, remote data protection and other security measures.
Various types of data loss and availability risks that require data protection:
- Detectable file deletion or corruption
- Latent data deletion or corruption
- Storage device failure
- Interdependency failure
- Compound failure
- Site failure
Future of Data Management
Technologies emerging and gaining importance in Storage:
Data De-duplication
- Data deduplication (often called “intelligent compression” or “single-instance storage”) is a method of reducing storage needs by eliminating redundant data. It is applied in CAS (content-addressed storage).
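A minimal sketch of the idea, assuming data arrives as a list of byte chunks: each chunk is identified by its SHA-256 digest, so identical chunks are stored only once and the original sequence is kept as a list of digests. The function names `deduplicate` and `restore` are hypothetical, and real systems also handle chunking, persistence and much larger volumes.

```python
import hashlib


def deduplicate(chunks):
    """Store each distinct chunk once, keyed by its content digest."""
    store = {}  # digest -> chunk bytes (one copy per distinct chunk)
    refs = []   # one digest per input chunk, in the original order
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy
        refs.append(digest)
    return store, refs


def restore(store, refs):
    """Rebuild the original chunk sequence from the digests."""
    return [store[digest] for digest in refs]


chunks = [b"header", b"payload", b"header", b"header"]
store, refs = deduplicate(chunks)
print(len(chunks), "chunks in,", len(store), "chunks stored")
```

Addressing a chunk by the digest of its content is also the basic idea behind content-addressed storage.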
Data Compression
- Data compression attempts to reduce the size of a file by removing redundant data within the file. It is applied, for example, in a VTL (virtual tape library).
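The effect is easy to see with Python's standard `zlib` module. The sketch below compresses a highly repetitive block of bytes and checks that it can be restored. The sample data and the `compress_roundtrip` helper are made up for illustration, and the module is not tied to any VTL product.

```python
import zlib


def compress_roundtrip(data):
    """Compress data, verify it decompresses intact, and return both sizes."""
    packed = zlib.compress(data, 9)  # 9 is the highest compression level
    if zlib.decompress(packed) != data:
        raise ValueError("round trip failed")
    return len(data), len(packed)


original_size, packed_size = compress_roundtrip(b"backup block " * 1000)
print(original_size, "bytes before,", packed_size, "bytes after")
```

The more redundancy a file contains, the smaller the compressed result. Data that is already compressed or encrypted usually shrinks very little.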
Data Encryption
- Encryption is used to protect data, preventing unauthorized users from accessing information.
Application Aware storage system
- Application-aware storage is a storage system with built-in intelligence about relevant applications and their utilization patterns.