Data Archiving

February 25, 2014

Introduction to Data Archiving:

Keeping your records archived is very important for business owner of all type of company. Archiving is the process of transferring data they no longer actively to separate data storage location that can be located and accessed when necessary. Data archiving is one of the best solution for the problems that are related to ever-increasing volumes of data. The mean of archiving is not creating backups for inactive data, because achieve retrievals can be selective, while backups are not. On the other hand, backups are oriented to the data store, and archives are application-oriented, so it can be a part of a backup and recovery scheme.

By archiving old data, you can manage uncontrollable data growth and keep your databases smaller and more manageable. It will help you to identify which files are no longer active in your organizations. With smaller tables, you will control index rebuilds times and backup/restore times. You will see performance gains on particular queries that either scan a huge part of the table or clustered index.

As per your business need, one should select either delete the unwanted data, or save the data to a history table and delete that from production database.

Implementing archival process is not tough task…you just need to follow three easy steps that are entailed in this process:

  • Identify data that you want to archive
  • Copying identify data in to history table
  • Clear your production database by deleting the identified data

One don’t need to archive single table every day, but they need to understand different tables in the database are connected to each other through foreign key relationships and then sequence your delete/archive process.

Furthermore, Data backups are used to restore data in case it is corrupted or destroyed, on the other hand data archives protect in active older information that occasionally need to be accessed. These topics give the details about the concept of archiving, identify high-level terminology connected with archiving, as well as define various other concepts associated with setting up and implementing a database data archiving strategy.

The Need for Archiving:

Not one or two there are several factors that require data archiving. First, the volume of data in database always increases. Second, most of this data is not active but must be maintained for business reasons and third, it is also important to preserve all inactive data as inexpensively and effectively as possible.

Databases are increasing at an unexpected rate that’s why data centers are more concerned about the volumes of data. Some inactive data exist in all over due to past dealings, the use of data warehouses, and several other reasons. Inactive data is not more useable but it’s less likely to be accessed.

Several reasons why you should maintain inactive data like:

  • You are emulating government necessities
  • You expect the requirement for creating a future trend analysis
  • You need to keep up an entire history of your clients

When the amounts of database increase, the percentage of inactive data automatically increase, this can create issues like:

  • Performance problems from extra input/output  (I/O) and widespread processing that effects clients
  • Manageability issues because huge objects become difficult identify and size limit are reached.
  • Extensive hardware and storage overheads

Benefits of Archiving

  • There are several benefits associated with archiving – reclaim disk space, reduce storage expense, lower data maintenance and enhance the performance of active data.
  • Interrelated is one of the main benefits of archiving. By recovering disk space that was used by inactive data, you can effectively cut down your storage expenses as well as reduce object maintenance.
  • And another benefit of archiving is that the search query result performance will increase automatically after by removing the inactive data from the database.
  • Lowering costs:
    • Lowering expenses is the most significant benefits of archiving your inactive data. You can store DB2 table archives on less expensive storage media than that holding your active data.
    • If your business active data is stored in high-performance, expensive data storage medium, then this is the perfect solution to establish less expensive storage medium on which to place your archive tables and files. Magnetic tape is best example of this type of medium.
    • Moreover, it improve performance of applications, use less CPU, Input/output, and memory resources, which helps to saving extra cost.
  • Less object maintenance:
    • Decreasing object maintenance is the key feature of inactive data archiving.
    • Archiving DB2 data decrease the overall number of DB2 objects that require maintenance. For instance, when less data exists, utility processing requires less time, cutting down batch processing times.

Initiate by assessing your needs for archiving. Give the answer of the following questions:

  • What data do we require to archive?
    • Our transactions inactive data.
    • In short temporary data which is useful in complete the whole transactions.
  • When do we require archiving this data?
    • Any time according to our requisite.
    • Weekly, Monthly, Quarterly or Yearly.
    • It is also depend on the volume of our PMS Database.
  • How long do we require preserving the archived data?
    • This data is maintained for long times.
    • It should be either march ending for our transaction data.
    • Or it may be many years.
    • Archive data can be save at any place, either on Hard Disk or CD’s
  • Do we require SQL access to the archived data?
    • Yes, but we can archive the data from any data source.
  • Should the archived data be stored in DB table archives or in flat-file archives?
    • We can use XML for store the inactive records.
    • Because the xml is also a file management system.
    • XML also maintains the all relationships between tables and all the data type for the data.
    • And XML is light weight then the other flat files or the any DBMS.
    • It is one type of file structure so it is more readable or easy to store anywhere.
  • Under which conditions and how often will we require retrieving archived data from either table archives or file archives?
    • When we need to refer in active data for reviewing the some old transactions.
    • Or doing some work like final audit (March ending).

Proposed Solutions:

Data Archiving through the Flat-File (Handle by Programmatically):

I prepared one solution that consists of some unique terms and some terms that are common to other areas of technology. To understand the all basic terms of data archiving, you have to understand how these are utilized to define data archiving.

    • If your databases are increasing so large and consist of huge amount of inactive data, then you can move and archive this data to archive tables or to archive flat files (Notepad). By following this, you can not only transform data from highly uses and highly expensive data storage to possibly less expensive data storage.
      • In my proposed solution, I can’t use temporary tables because of the XML file.
      • XML itself is a DBMS or it is the light weight flat file.
      • We can also set the DataSet or DataView directly on this XML file.
      • This facility is not provided into the other flat file.
    • After archiving data, you might want to selectively recover portion or all of the archived data. This process is different from backups, because you recover all data from backups.
      • In this case, we provide the selection for user which data u wants to retrieve.
      • Or because of we are using the XML we can provide the Query like facility on the Archive Data.
    • A table archive stays as a set of DB2 tables that you can archive and access quickly or frequently. Table archives are maintained in the same data store as the active data and provide immediate access to recovery of archived data. Table archives in interactive databases are regularly referred to as history tables.
      • Through the XML we can handle this situation very well.
      • With the main system we can also able to do some work on XML parallel.
      • In short we are doing some operation directly on the flat file. Without the use of XML we can’t do the same.
    • These archives fetch a probable delay in data retrieval; however, the huge numbers of options for storage gives a method for greatly reducing costs. Tape and optical storage gives relatively lower storage expenses than disk space.

Data Archiving through the Stored Procedure:

In this process we have to create fresh data based known as Archive Data. Which table we have to archive same formation and put at over here. Now what we have to do for data archive?

We have just need to call this stored process without any limits, to get all the transact dealings older than 6 months. As you can see and pass all possible date parameter to customize the archiving process. Keep in mind most important thing that doesn’t delete transactions located in the last three months.

As per your requirements, you can set this stored process as a job by utilizing SQL Server Agent, so that it works every week or fortnight.

This type of jobs tends to block the table when they are working. So it is better to calendar these jobs for off-peak hours. As well, keep an eye on the transaction log space use. In case of any issues with log space, prefer archiving in smaller batches.

Case Study for Data Archiving in Symphony – A Hotel Management System:

Now starting from the data archiving operation, we provide data archive facility to only Admin and it may be possible by including only in server installation setup. Not at the client end.

Case Study for Data Archiving in Symphony

 

Now Admin can initiate an operation of archiving the old data of reservation. As per the process flow he just set the filtering criteria for which data we have to cover in archiving operation. Means set the date. Data may be older than 2-3 years.

After Defining the Filter Criteria, our actual Archive process is start. This is complex process, so I try to divide whole process in 3 sub process.

Set the Basic Information Related to Data Archive

In that we will take the information related to the following details:

  • What the Filtered Criteria?
  • When we take the Data Archive?
  • At the time of Archiving what will be the size of archived data?
  • Where you store this archived data? (Physical Location of Archived Data File)
  • Information Related To Media Serial Number.
  • Who perform this archive operation on which date?

We will store this whole information in one table in the Database. And from that database we will retrieve back one archive ID which is unique for every archive operation.

Generate the result set of the affected Data in Archive Operation

After successful completion of Process 1, we will go for process

In this process we will get the list of filtered reservations, through that we will get the list of dependent data from different tables.

After getting all this data we build up the DataSet (Collection of DataTable) and write that whole data in particular binary file. So your data is very secure and user can’t do any update operation on that, because binary format is not readable like an XML document. One another advantage is that it is very less in size. So it is very cost effective with respect to XML data file.

After getting all this data we build up the DataSet (Collection of DataTable) and write that whole data in particular binary file. So your data is very secure and user can’t do any update operation on that, because binary format is not readable like an XML document. One another advantage is that it is very less in size. So it is very cost effective with respect to XML data file.

Binary file Generation Coding is defined below.

public void DeserializeDataSet(DataSet ds)
{
ds.RemotingFormat = SerializationFormat.Binary;
FileStream fs = new FileStream(“MyData.bin”, FileMode.OpenOrCreate);
BinaryFormatter bf = new BinaryFormatter();
bf.Serialize(fs, ds);
fs.Close();
}

Here in above code we pass the DataSet in one function through that dataset we generate our data file with name of “MyData.bin”. bin is extension for binary file. This file is very less than the XML file. It is I think 5 times less than the size of XML datafile with same data.

After creating that file we will write that file in Magnetic media or any media.

Update our DB and do actual delete old or in active data from the Database.

After successful completion of process 2, we will go for process 3.

In this process, we will use the result set of already get dependent data from the Database. It is sub process of Process No 2.

From that first of all Update the all filtered reservation records, make IsArchive field = true and put the ArchiveID = new archive Id which is coming from process 1 execution. After this successful update operation on database we will delete all the dependent records from the child tables.

For successful completion of whole DataArchive operation, the Process 1, 2, and 3 are the main or major requirements of the system. If any process or sub process may be failure, the whole DataArchive operation fails and you have to start from the Process-1.

Database Level Implementation

tra_Transaction_Master
Transaction_ID
Tran_Date_Time
TR_Source
Drawer_ID
EOF_Y_N
tra_Folio_Transactions
Transaction_ID
Reservation_ID
Folio_ID
Room_ID
Account_ID
Amount
CR_DB
Description
isVoid
Void_Reason


CREATE PROC dbo.ArchiveData
(
@CutOffDate datetime = NULL
)
AS
BEGIN
SET NOCOUNT ON
IF @CutOffDate IS NULL
BEGIN
SET @CutOffDate = DATEADD(mm, -6, CURRENT_TIMESTAMP)
END
ELSE
BEGIN
IF @CutOffDate > DATEADD(mm, -3, CURRENT_TIMESTAMP)
BEGIN
RAISERROR (‘Cannot delete transaction from last three months’, 16, 1)
RETURN -1
END
END
BEGIN TRAN
INSERT INTO Archive.dbo.tra_Transaction_Master
SELECT * FROM dbo.tra_Transaction_Master
WHERE Tran_Date_Time < @CutOffDate
IF @@ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR (‘Error occured while copying data to Archive.dbo.tra_Transaction_Master’, 16, 1)
RETURN -1
END
INSERT INTO Archive.dbo.tra_Folio_Transactions
SELECT * FROM dbo.tra_Folio_Transactions
WHERE Transaction_ID IN
(
SELECT Transaction_ID FROM dbo.tra_Transaction_Master
WHERE Tran_Date_Time < @CutOffDate
)
IF @@ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR (‘Error occured while copying data to Archive.dbo.tra_Folio_Transactions’, 16, 1)
RETURN -1
END
DELETE dbo.tra_Folio_Transactions
WHERE Transaction_ID IN
(
SELECT Transaction_ID FROM dbo.tra_Transaction_Master
WHERE Tran_Date_Time < @CutOffDate
)
IF @@ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR (‘Error occured while deleting data from dbo.tra_Folio_Transactions’, 16, 1)
RETURN -1
END
DELETE dbo.tra_Transaction_Master
WHERE Tran_Date_Time < @CutOffDate
IF @@ERROR <> 0
BEGIN
ROLLBACK TRAN
RAISERROR (‘Error occured while deleting data from dbo.tra_Transaction_Master’, 16, 1)
RETURN -1
END
IF @@TRANCOUNT > 0
BEGIN
COMMIT TRAN
RETURN 0
END
END

Share on

Nitin Suvagiya

He is working with Softqube Technologies as Director, since 2009. He has over 15+ years of experience in Microsoft Technologies, working as a CEO and also providing consultation in DevOps as he is DevOps certified. He has good expertise in UX/UI designing in any application.

Let’s Work together!

"*" indicates required fields

Drop files here or
Max. file size: 5 MB, Max. files: 2.
    This field is for validation purposes and should be left unchanged.