Compress Files into Zip with Progress and Updates

J.C. SolvoTerra

There are many examples of compressing files in VB.Net. Many of them rely on third-party libraries, which simply isn't necessary. People were saying that files and archives exceeding 4 GB couldn't be handled... I hate restrictions and decided to "stick it to Microsoft".

To get a better understanding of Zip files and their contents I set about writing my own Zip file generator from the ground up (still using Phil Katz's Zip algorithm). Eventually I successfully built my own Zip and ZIP64 archives byte by byte.

I then looked at my work and thought: now that I have a grasp of the inner workings of a Zip file (with the PKZip algorithm), let's revisit the Compression namespace to make sure I wasn't simply wasting my time. It turns out that all I achieved from this project was a very solid working knowledge of Zip files and how to build them internally, as these rumours about 4 GB limits etc. were totally bogus: the Compression namespace converts large files (on the fly) to Zip64 as required. So...

Here is my compression method, wrapped up nicely in an easy-to-use class. It uses the .Net 4.5 Compression namespace and reads\writes files directly from\to your disk. If you observe your system's performance during compression, no additional memory is used beyond the 1 MiB read buffer.

I hope this helps a lot of people out in the future.

'INCLUDE FOLLOWING REFERENCES Requires .Net 4.5
'System.IO.Compression
'System.IO.Compression.FileSystem

Imports System.IO
Imports System.IO.Compression

Public Class Zipper

    Public Event Progress(Percent As Integer)
    Public Event Complete()
    Public Event StatusChanged(Status As String)


    Private _Cancel As Boolean
    Public Property Cancel As Boolean
        Get

            Return _Cancel

        End Get
        Set(value As Boolean)

            If _Cancel = True Then Exit Property

            _Cancel = value

        End Set
    End Property


    Private _Compression As CompressionLevel
    Public Property CompressionLevel As CompressionLevel
        Get
            Return _Compression
        End Get
        Set(value As CompressionLevel)
            _Compression = value
        End Set
    End Property

    Private _Target As String
    Public Property TargetURL As String
        Get
            Return _Target
        End Get
        Set(value As String)
            _Target = value
        End Set
    End Property

    Private _Source As String
    Public Property SourceURL As String
        Get
            Return _Source
        End Get
        Set(value As String)
            _Source = value
        End Set
    End Property

    Private _IsDir As Boolean
    Public ReadOnly Property SourceIsDirectory As Boolean
        Get
            Return _IsDir
        End Get
    End Property

    Private _Overwrite As Boolean
    Public Property OverwriteTarget As Boolean
        Get
            Return _Overwrite
        End Get
        Set(value As Boolean)
            _Overwrite = value
        End Set
    End Property

    Private _IncludeRootDir As Boolean
    Public Property IncludeRootDir As Boolean
        Get
            Return _IncludeRootDir
        End Get
        Set(value As Boolean)
            _IncludeRootDir = value
        End Set
    End Property

    Private _SessionLength As Int64
    Private _SessionFiles As String()
    Private _RootDir As String

    Public Sub New(Source As String, Target As String, CompressionLevel As CompressionLevel)

        _Overwrite = False
        _IncludeRootDir = True
        _Target = Target
        _Compression = CompressionLevel
        _Cancel = False

        If IsDir(Source) <> 1 Then
            _IsDir = IsDir(Source)
            _Source = Source
        Else
            Throw New Exception("Source file or directory doesn't exist or cannot be accessed.")
        End If

    End Sub

    Private Function GetSessionLength() As Int64

        Dim sLen As Int64 = 0

        For Each SessionFile As String In _SessionFiles
            sLen += New FileInfo(SessionFile).Length
            If Cancel = True Then Exit For
        Next

        Return sLen

    End Function

    Private Function IsDir(Source As String) As Int16

        If File.Exists(Source) Then
            Return 0
        ElseIf Directory.Exists(Source) Then
            Return -1
        Else
            Return 1
        End If

    End Function

    Public Sub Compress()

        RaiseEvent StatusChanged("Gathering Required Information.")

        If SourceIsDirectory Then
            _SessionFiles = Directory.GetFiles(SourceURL, "*", SearchOption.AllDirectories)
        Else
            _SessionFiles = New String() {SourceURL}
        End If

        RaiseEvent StatusChanged("Examining Files.")

        _SessionLength = GetSessionLength()

        If SourceIsDirectory And IncludeRootDir = False Then
            _RootDir = SourceURL & "\"
        Else
            _RootDir = String.Join("\", SourceURL.Split("\").ToArray, _
                                   0, SourceURL.Split("\").ToArray.Length - 1) & "\"
        End If

        RaiseEvent StatusChanged("Compressing.")

        Try
            ZipItUp()
        Catch ex As Exception
            MsgBox(ex.Message)
            Exit Sub
        End Try

        If Cancel = True Then
            RaiseEvent StatusChanged("Cancelled.")
            RaiseEvent Progress(100)
        Else
            RaiseEvent StatusChanged("Complete.")
        End If

        RaiseEvent Complete()

    End Sub

    Private Sub ZipItUp()

        If Cancel = True Then Exit Sub

        Dim BlockSizeToRead As Int32 = 1048576 '1Mib Buffer
        Dim Buffer As Byte() = New Byte(BlockSizeToRead - 1) {}
        Dim BytesRead As Int64, TotalBytesRead As Int64
        Dim LiveProg As Int16 = 0
        Dim PrevProg As Int16 = 0

        If File.Exists(_Target) And OverwriteTarget = False Then
            Throw New Exception("Target File Already Exists.")
        Else
            File.Delete(_Target)
        End If

        Using FS As FileStream = New FileStream(_Target, FileMode.CreateNew, FileAccess.Write)

            Using Archive As ZipArchive = New ZipArchive(FS, ZipArchiveMode.Create)

                Dim Entry As ZipArchiveEntry = Nothing

                For Each SessionFile As String In _SessionFiles

                    Try
                        Using Reader As FileStream = File.Open(SessionFile, FileMode.Open, FileAccess.Read)

                            Entry = Archive.CreateEntry(SessionFile.Replace(_RootDir, ""), _
                                _Compression)

                            Using Writer As Stream = Entry.Open()

                                ' Read up to the full buffer (1 MiB) on each pass
                                While (InlineAssignHelper(BytesRead, _
                                    Reader.Read(Buffer, 0, Buffer.Length))) > 0
                                    Writer.Write(Buffer, 0, BytesRead)
                                    TotalBytesRead += BytesRead

                                    LiveProg = CInt((100 / _SessionLength) * TotalBytesRead)

                                    If LiveProg <> PrevProg Then

                                        PrevProg = LiveProg
                                        RaiseEvent Progress(LiveProg)

                                    End If

                                    If Cancel = True Then Exit While

                                End While

                            End Using
                        End Using

                    Catch Ex As Exception
                        TotalBytesRead += New FileInfo(SessionFile).Length
                        Console.WriteLine(String.Format("Unable to add file to archive: {0} Error:{1}", SessionFile, Ex.Message))
                    End Try

                    If Cancel = True Then Exit For

                Next

            End Using

        End Using

        If Cancel = True Then
            File.Delete(_Target)
        End If

    End Sub

    Private Function InlineAssignHelper(Of T)(ByRef target As T, value As T) As T
        target = value
        Return value
    End Function

End Class
J.C. SolvoTerra 109 Eat, Sleep, Code, Repeat Featured Poster

Note: I wish Chrome would fix my spell chek. Sorry everybody =0)

Here's a quick run down of the class.

The properties:

As most of the property names are self-descriptive I'm not going to go through each one. However, an addition could be made to the setter of each property to prevent changes during compression.

Add a private Boolean member, say _Compressing, to the Zipper class. Set it to True when the Compress method is called and to False when the process is finished. Adding the following check to each setter will then prevent potential errors caused by a user changing a property during compression.

    Private _Source As String
    Public Property SourceURL As String
        Get
            Return _Source
        End Get
        Set(value As String)

            'Add this code
            If _Compressing = True Then Exit Property

            _Source = value

        End Set
    End Property
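
One way to make sure the flag is always cleared again, even if the compression throws, is a Try/Finally block. This is only a minimal sketch: GuardedZipper and the Action parameter passed to Compress are hypothetical stand-ins used for illustration, not part of the Zipper class above.

```vbnet
Imports System

' Hypothetical, stripped-down stand-in for Zipper that shows only the guard flag.
Public Class GuardedZipper

    Private _Compressing As Boolean
    Private _Source As String

    Public Property SourceURL As String
        Get
            Return _Source
        End Get
        Set(value As String)
            ' Ignore changes while a compression run is in progress.
            If _Compressing Then Exit Property
            _Source = value
        End Set
    End Property

    ' "work" is a placeholder for the real compression routine.
    Public Sub Compress(work As Action)
        _Compressing = True
        Try
            work()
        Finally
            ' Runs on success and on failure, so the setters unlock again.
            _Compressing = False
        End Try
    End Sub

End Class
```

With this in place, a property assignment made while Compress is running is silently ignored, and assignments work again as soon as the run ends.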

The Methods

GetSessionLength

    Private Function GetSessionLength() As Int64
        Dim sLen As Int64 = 0
        For Each SessionFile As String In _SessionFiles
            sLen += New FileInfo(SessionFile).Length
            If Cancel = True Then Exit For
        Next
        Return sLen
    End Function

When the Compress method is called, all the file paths are added to the _SessionFiles string array. This method simply iterates through each entry and tallies the total length in bytes of all the files to be read and compressed. This information is later used to report the current progress of the compression process.

IsDir

        Private Function IsDir(Source As String) As Int16
            If File.Exists(Source) Then
                Return 0
            ElseIf Directory.Exists(Source) Then
                Return -1
            Else
                Return 1
            End If
        End Function

This method quite simply returns a value which determines whether the Source object is a file, a folder, or doesn't exist. The value -1 (True) is returned if it is a directory, 0 (False) if it's a file, or 1 if it doesn't exist. This method is used in the Zipper constructor, New.

        If IsDir(Source) <> 1 Then
            _IsDir = IsDir(Source)
            _Source = Source
        Else
            Throw New Exception("Source file or directory doesn't exist or cannot be accessed.")
        End If

As you can see from this code, if the value returned is 1 (that is, anything other than the Boolean-compatible values -1 or 0) an exception is thrown. Otherwise the _IsDir and _Source private members are set, with -1 converting to True and 0 to False.
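
If the -1/0/1 convention feels too cryptic, the same three outcomes could be expressed with an Enum. This is a sketch of an alternative, not what the class above does; SourceKind and Classify are hypothetical names introduced here.

```vbnet
Imports System.IO

' Hypothetical alternative to the Int16 return value: one named value per outcome.
Public Enum SourceKind
    Missing
    FileSource
    DirectorySource
End Enum

Public Module SourceKindDemo

    ' Classifies a path the same way IsDir does, but with self-describing results.
    Public Function Classify(source As String) As SourceKind
        If File.Exists(source) Then
            Return SourceKind.FileSource
        ElseIf Directory.Exists(source) Then
            Return SourceKind.DirectorySource
        Else
            Return SourceKind.Missing
        End If
    End Function

End Module
```

The constructor would then throw on SourceKind.Missing and set _IsDir by comparing against SourceKind.DirectorySource.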

InlineAssignHelper

        Private Function InlineAssignHelper(Of T)(ByRef target As T, _
        value As T) As T
            target = value
            Return value
        End Function

This helper is (I believe) a carry-over from C#, where an assignment can be used as an expression. VB has no built-in equivalent, so the method has to be added manually. It simply sets the Target reference to the argument "Value" and also returns that value. It is used during the compression routine to capture how many bytes were read from a file. In other words, one call delivers the same value in two places: the Target variable and the return value.
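
If you'd rather not carry the helper around, the same read loop can be written with plain VB statements by reading once before the loop and once at the end of each pass. A minimal sketch, assuming a hypothetical CopyInChunks function that is not part of the Zipper class:

```vbnet
Imports System.IO

Public Module CopyLoopDemo

    ' Copies everything from source to destination in buffer-sized chunks
    ' and returns the total number of bytes copied.
    Public Function CopyInChunks(source As Stream, destination As Stream, bufferSize As Integer) As Long
        Dim buffer(bufferSize - 1) As Byte
        Dim total As Long = 0

        ' Prime the loop with the first read.
        Dim bytesRead As Integer = source.Read(buffer, 0, buffer.Length)

        Do While bytesRead > 0
            destination.Write(buffer, 0, bytesRead)
            total += bytesRead
            ' Read the next chunk; 0 means end of stream.
            bytesRead = source.Read(buffer, 0, buffer.Length)
        Loop

        Return total
    End Function

End Module
```

The trade-off is a duplicated Read call in exchange for not needing InlineAssignHelper at all.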

Compress

This is the main Public method for the user, but it doesn't contain the compression routine. Instead, it prepares the information for the main compression method, "ZipItUp".

First, the _SessionFiles array is populated. The user can pass a directory containing sub-directories and files, or a single file. If the source is a directory we use the Directory.GetFiles method to scan for all the files contained within.

_SessionFiles = Directory.GetFiles(SourceURL, "*", SearchOption.AllDirectories)

Other than the source path argument, there is also a search pattern argument (here I used "*" = all files) and a SearchOption argument (SearchOption.AllDirectories = include subfolders).

In this example I haven't made these arguments available to the user, but these options could quite easily be implemented.

The pattern argument works as you might expect: "*.exe" returns all files that end with .exe, etc.
The SearchOption can also be set to TopDirectoryOnly, which will ignore sub-directories.
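
As a rough idea of how those two options could be exposed, here is a small sketch. ListSessionFiles and its parameters are hypothetical names; only Directory.GetFiles and the SearchOption values come from the framework.

```vbnet
Imports System.IO

Public Module FileListDemo

    ' Returns the files under "source" that match "pattern",
    ' optionally descending into sub-directories.
    Public Function ListSessionFiles(source As String, pattern As String, includeSubDirs As Boolean) As String()
        Dim scope As SearchOption = If(includeSubDirs,
                                       SearchOption.AllDirectories,
                                       SearchOption.TopDirectoryOnly)
        Return Directory.GetFiles(source, pattern, scope)
    End Function

End Module
```

In the Zipper class this could be driven by two extra properties, for example a pattern string and an include-subfolders flag, read by Compress when it builds _SessionFiles.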

If the user has selected a single file, I still populate the _SessionFiles array, but only with the individual file.

_SessionFiles = New String() {SourceURL}

Next I call the previously mentioned GetSessionLength method and store its value for later use.

_SessionLength = GetSessionLength()

The last thing I do before calling ZipItUp is set the root directory for the entries in the zip file.

If SourceIsDirectory And IncludeRootDir = False Then
     _RootDir = SourceURL & "\"
Else
     _RootDir = String.Join("\", SourceURL.Split("\").ToArray, _
                  0, SourceURL.Split("\").ToArray.Length - 1) & "\"
End If  

For those familiar with Zip files, you can opt to include or exclude the root directory of the source object or objects. Because I don't want to include the entire file\directory path in the Zip entry name, I store the relevant root path, which is then removed later when creating entries in our Zip file.

For example, if our file to be zipped is "C:\Users\JoeBloggs\Documents\TestFile.doc" I only want the entry name to be "TestFile.doc", so I replace "C:\Users\JoeBloggs\Documents\" with "".

Or, for a directory "C:\Users\JoeBloggs\Documents\My Projects\":

Including the root dir, the zip file may contain several entries like so:

My Projects\File1.doc   
My Projects\File2.doc
My Projects\File3.doc
My Projects\SubDir\File1.doc    

Or, NOT including the root dir, the zip file may contain several entries like so:

File1.doc   
File2.doc
File3.doc
SubDir\File1.doc
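
The prefix logic can be tried out in isolation. The sketch below is a variation on the code above rather than a copy of it: GetRootPrefix and GetEntryName are hypothetical helpers, a trailing backslash on the source is trimmed first (an assumption about the desired behaviour), and only a leading match is stripped instead of using Replace.

```vbnet
Imports System

Public Module EntryNameDemo

    ' Returns the part of the path that should be stripped from every entry name.
    Public Function GetRootPrefix(source As String, sourceIsDirectory As Boolean, includeRootDir As Boolean) As String
        Dim trimmed As String = source.TrimEnd("\"c)
        If sourceIsDirectory AndAlso Not includeRootDir Then
            ' Strip the whole source directory, leaving only its contents.
            Return trimmed & "\"
        End If
        ' Keep the last path segment: strip up to and including the final backslash.
        Return trimmed.Substring(0, trimmed.LastIndexOf("\"c) + 1)
    End Function

    ' Removes the prefix from the start of a full path, if present.
    Public Function GetEntryName(fullPath As String, rootPrefix As String) As String
        If fullPath.StartsWith(rootPrefix, StringComparison.OrdinalIgnoreCase) Then
            Return fullPath.Substring(rootPrefix.Length)
        End If
        Return fullPath
    End Function

End Module
```

For the "My Projects" directory above, including the root dir gives the prefix "C:\Users\JoeBloggs\Documents\" and entry names such as "My Projects\SubDir\File1.doc". Excluding it gives the prefix "C:\Users\JoeBloggs\Documents\My Projects\" and entry names such as "SubDir\File1.doc".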

Once all these details have been recorded we move on to the main compression method, "ZipItUp".

ZipItUp

I start by declaring a few local variables:

BlockSizeToRead is set to 1 MiB. This is the maximum number of bytes we attempt to read at a time.

Dim BlockSizeToRead As Int32 = 1048576 '1Mib Buffer

Buffer is the byte array which will hold the bytes read in each repetition.

Dim Buffer As Byte() = New Byte(BlockSizeToRead - 1) {}

BytesRead and TotalBytesRead are used to build a progress report. BytesRead holds the number of bytes actually read in each repetition, and TotalBytesRead holds the running total for the session.

Dim BytesRead As Int64, TotalBytesRead As Int64

LiveProg and PrevProg are used to compare the current progress with the previous progress, so that the progress is only reported when the percentage has actually changed. Because the percentage is a whole number, very large compression sessions may go a long time between updates on slower machines.

    Dim LiveProg As Int16 = 0
    Dim PrevProg As Int16 = 0

Next I determine whether the user wants to prevent the overwriting of an existing file, or to remove the existing file.

            If File.Exists(_Target) And OverwriteTarget = False Then
                Throw New Exception("Target File Already Exists.")
            Else
                File.Delete(_Target)
            End If

The Main Routine:

    Using FS As FileStream = New FileStream(_Target, _
            FileMode.CreateNew, FileAccess.Write)

First we create a new file stream. It's important to note here that the FileAccess is Write. In my experience, many examples online fail on large files because the archive ends up being built in memory before it is committed to disk (I saw this with FileAccess.ReadWrite). Not only does that fill up your RAM very quickly, which can cause an OutOfMemoryException with larger files, it also renders the progress report useless, because the files are compressed into memory very fast. The stream is then only written to disk when it is closed at End Using. For larger files this can take an additional 20 seconds, so the progress bar sits at 100% whilst the user is left waiting for an undetermined amount of time. (The .NET documentation describes this hold-everything-in-memory behaviour for ZipArchiveMode.Update. ZipArchiveMode.Create, used below, writes each entry out to the stream as it is compressed.)

Next a new archive is created and attached to our file stream.

        Using Archive As ZipArchive = New ZipArchive(FS, _
                ZipArchiveMode.Create)

And a ZipArchiveEntry variable is declared to hold each new entry.

    Dim Entry As ZipArchiveEntry = Nothing

We then begin iterating through the files to be compressed.

    For Each SessionFile As String In _SessionFiles

SessionFile holds the current file to be added to the archive in each repetition. We create a new file stream, this time to read the bytes of the current file to be added to the archive, again ensuring the FileAccess is Read.

    Using Reader As FileStream = File.Open(SessionFile, _
        FileMode.Open, FileAccess.Read)

We now create a new ArchiveEntry handle

    Entry = Archive.CreateEntry(SessionFile.Replace(_RootDir, ""), _
                                    _Compression)

You can now see the _RootDir value in action. The entry name is stripped of the fully qualified path and switched to a relative value using the Replace method. The compression argument is set by the user. The options are Optimal, Fastest or NoCompression.

We then create a stream to our Entry within the archive

    Using Writer As Stream = Entry.Open()

And read the source file's bytes in chunks of up to 1 MiB.

    While (InlineAssignHelper(BytesRead, _
                    Reader.Read(Buffer, 0, Buffer.Length))) > 0

Here you can see the InlineAssignHelper method in use. Its returned value is used by the While statement to see if it has reached the end of the file (Read returns 0). BytesRead is also populated with the same value, which will be used in just a second.

Immediately after we have read a chunk of bytes we write it to the entry stream.

    Writer.Write(Buffer, 0, BytesRead)

And the TotalBytesRead is updated

    TotalBytesRead += BytesRead

Checking and updating the progress:

    LiveProg = CInt((100 / _SessionLength) * TotalBytesRead)

LiveProg holds the current state of the progress. As this is an integer, for larger sessions (in fact, anything over 100 bytes) this value may be recalculated any number of times without the progress actually changing. There's no point in attempting to update a UI object repeatedly unless the value has actually changed.

This next part checks whether the progress has changed and, if so, updates the UI, whilst recording the new progress for later comparison.

     If LiveProg <> PrevProg Then

        PrevProg = LiveProg
        RaiseEvent Progress(LiveProg)

    End If
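
The percentage calculation and the "only report on change" check can be pulled out into two small functions. This is a sketch under a few assumptions: PercentComplete and ShouldReport are hypothetical names, the result is clamped to 0..100, an empty session is treated as complete, and integer division is used, so values are truncated rather than rounded as CInt does above.

```vbnet
Imports System

Public Module ProgressDemo

    ' Whole-number percentage of the session processed so far, clamped to 0..100.
    Public Function PercentComplete(totalBytesRead As Long, sessionLength As Long) As Integer
        ' An empty session has nothing left to do, so report it as finished.
        If sessionLength <= 0 Then Return 100
        Dim pct As Long = (totalBytesRead * 100L) \ sessionLength
        Return CInt(Math.Min(100L, Math.Max(0L, pct)))
    End Function

    ' Returns True (and remembers the new value) only when the percentage has moved.
    Public Function ShouldReport(ByRef previous As Integer, current As Integer) As Boolean
        If current = previous Then Return False
        previous = current
        Return True
    End Function

End Module
```

Inside the read loop this would become a single line along the lines of: If ShouldReport(PrevProg, PercentComplete(TotalBytesRead, _SessionLength)) Then RaiseEvent Progress(PrevProg).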

A Try\Catch clause has been implemented to capture troublesome files. This tends to catch files that are protected or opened by other processes on your system.

    Catch Ex As Exception
        TotalBytesRead += New FileInfo(SessionFile).Length
        Console.WriteLine(String.Format("Unable to add file to archive: {0} Error:{1}", _
            SessionFile, Ex.Message))
    End Try

If an error does occur, it is important to update TotalBytesRead manually at this point. As the file is skipped (normally at the Read statement), the routine's progress won't otherwise be updated accordingly, leaving the user with an inaccurate progress position.

For example, if this session has 10 files, each 100 MiB in length, and one of those files is skipped, TotalBytesRead wouldn't be updated for it in the main loop. From then on the progress would lag by 10%: it would report 70% complete when the session is really 80% through, and it would stop at 90% when the whole process has actually finished.
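
The arithmetic behind that example can be checked with a few lines of code. FinalPercent is a hypothetical helper written only for this illustration. It returns the percentage reported once every file has been visited, with or without the manual adjustment.

```vbnet
Public Module SkippedFileDemo

    ' Percentage reported after the last file, when "skippedCount" files failed.
    ' countSkipped = True mimics adding the skipped file's length in the Catch block.
    Public Function FinalPercent(fileCount As Integer, fileSize As Long,
                                 skippedCount As Integer, countSkipped As Boolean) As Integer
        Dim sessionLength As Long = fileCount * fileSize
        Dim processed As Long = (fileCount - skippedCount) * fileSize
        If countSkipped Then processed += skippedCount * fileSize
        Return CInt((processed * 100L) \ sessionLength)
    End Function

End Module
```

With 10 files of 100 MiB (104857600 bytes) and one skipped file, this returns 90 without the adjustment and 100 with it.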

Cancel

Throughout the code you will see various checks of the Cancel property. Even if an instance of Zipper is running on another thread, the user can set Cancel to True. If this happens, the current process and all subsequent processes will be skipped.
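
To show the pattern on its own, here is a small self-contained sketch of a job that is started on a background thread and cancelled from the calling thread. CancellableWorker and CancelDemo are hypothetical stand-ins, not the Zipper class. The use of Volatile.Read and Volatile.Write is an addition of this sketch to make the cross-thread visibility of the flag explicit; the original class uses a plain Boolean field.

```vbnet
Imports System.Threading

' Hypothetical stand-in for Zipper: only the cancellation behaviour is modelled.
Public Class CancellableWorker

    Private _Cancel As Boolean
    Private _StepsDone As Integer

    Public Property Cancel As Boolean
        Get
            Return Volatile.Read(_Cancel)
        End Get
        Set(value As Boolean)
            ' As in Zipper, a cancel request cannot be withdrawn once made.
            If Volatile.Read(_Cancel) Then Exit Property
            Volatile.Write(_Cancel, value)
        End Set
    End Property

    Public ReadOnly Property StepsDone As Integer
        Get
            Return Volatile.Read(_StepsDone)
        End Get
    End Property

    ' Simulates a long job made of small steps, checking Cancel between steps.
    Public Sub Run(maxSteps As Integer)
        For i As Integer = 1 To maxSteps
            If Cancel Then Exit For
            Thread.Sleep(5)
            Volatile.Write(_StepsDone, i)
        Next
    End Sub

End Class

Public Module CancelDemo

    ' Starts the worker on a background thread, cancels it shortly afterwards,
    ' and returns how many steps it managed to complete.
    Public Function RunAndCancel() As Integer
        Dim worker As New CancellableWorker()
        Dim t As New Thread(Sub() worker.Run(1000))
        t.Start()
        Thread.Sleep(50)
        worker.Cancel = True
        t.Join()
        Return worker.StepsDone
    End Function

End Module
```

The same shape applies to Zipper: call Compress on a worker thread, keep a reference to the instance, and set Cancel = True from a button handler.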

Well, that's pretty much it. I hope this has given you a good insight into my method for compressing files of any size using the Compression namespace in .Net 4.5.

Cristhian_1 0 Newbie Poster

Good morning. What would be the code to decompress the file with the process bar?
Thanks!
