There are many examples of compressing files in VB.Net. Many of them rely on third-party libraries, which just isn't necessary. People were also saying that files and archives exceeding 4 GB couldn't be handled... I hate restrictions and decided to "stick it to Microsoft".

To get a better understanding of Zip files and their contents, I set about writing my own Zip file generator from the ground up (still using Phil Katz's Zip algorithm). Eventually I successfully built my own Zip and Zip64 archives byte by byte.

I then looked at my work and thought: now that I have a grasp of the inner workings of a Zip file (with the PKZip algorithm), let's revisit the Compression namespace to make sure I wasn't simply wasting my time. It turns out that all I achieved from this project was a very solid working knowledge of Zip files and how to build them internally, because the rumours about 4 GB limits were totally bogus: the Compression namespace switches large files to Zip64 on the fly as required. So...

Here is my compression method, wrapped up nicely in an easy-to-use class. It uses the .Net 4.5 Compression namespace and reads/writes files directly from/to your disk. If you observe your system's performance during compression, you'll see that no additional memory is used.

I hope this helps a lot of people out in the future.

Edited by J.C. SolvoTerra: Missed File Access Read

'INCLUDE FOLLOWING REFERENCES Requires .Net 4.5
'System.IO.Compression
'System.IO.Compression.FileSystem

Imports System.IO
Imports System.IO.Compression

Public Class Zipper

Public Event Progress(Percent As Integer)
Public Event Complete()
Public Event StatusChanged(Status As String)

Private _Cancel As Boolean
Public Property Cancel As Boolean
Get

Return _Cancel

End Get
Set(value As Boolean)

If _Cancel = True Then Exit Property

_Cancel = value

End Set
End Property

Private _Compression As CompressionLevel
Public Property CompressionLevel As CompressionLevel
Get
Return _Compression
End Get
Set(value As CompressionLevel)
_Compression = value
End Set
End Property

Private _Target As String
Public Property TargetURL As String
Get
Return _Target
End Get
Set(value As String)
_Target = value
End Set
End Property

Private _Source As String
Public Property SourceURL As String
Get
Return _Source
End Get
Set(value As String)
_Source = value
End Set
End Property

Private _IsDir As Boolean
Public ReadOnly Property SourceIsDirectory As Boolean
Get
Return _IsDir
End Get
End Property

Private _Overwrite As Boolean
Public Property OverwriteTarget As Boolean
Get
Return _Overwrite
End Get
Set(value As Boolean)
_Overwrite = value
End Set
End Property

Private _IncludeRootDir As Boolean
Public Property IncludeRootDir As Boolean
Get
Return _IncludeRootDir
End Get
Set(value As Boolean)
_IncludeRootDir = value
End Set
End Property

Private _SessionLength As Int64
Private _SessionFiles As String()
Private _RootDir As String

Public Sub New(Source As String, Target As String, CompressionLevel As CompressionLevel)

_Overwrite = False
_IncludeRootDir = True
_Target = Target
_Compression = CompressionLevel
_Cancel = False

If IsDir(Source) <> 1 Then
_IsDir = IsDir(Source)
_Source = Source
Else
Throw New Exception("Source file or directory doesn't exist or cannot be accessed.")
End If

End Sub

Private Function GetSessionLength() As Int64

Dim sLen As Int64 = 0

For Each SessionFile As String In _SessionFiles
sLen += New FileInfo(SessionFile).Length
If Cancel = True Then Exit For
Next

Return sLen

End Function

Private Function IsDir(Source As String) As Int16

If File.Exists(Source) Then
Return 0
ElseIf Directory.Exists(Source) Then
Return -1
Else
Return 1
End If

End Function

Public Sub Compress()

RaiseEvent StatusChanged("Gathering Required Information.")

If SourceIsDirectory Then
_SessionFiles = Directory.GetFiles(SourceURL, "*", SearchOption.AllDirectories)
Else
_SessionFiles = New String() {SourceURL}
End If

RaiseEvent StatusChanged("Examining Files.")

_SessionLength = GetSessionLength()

If SourceIsDirectory And IncludeRootDir = False Then
_RootDir = SourceURL & "\"
Else
_RootDir = String.Join("\", SourceURL.Split("\").ToArray, _
0, SourceURL.Split("\").ToArray.Length - 1) & "\"
End If

RaiseEvent StatusChanged("Compressing.")

Try
ZipItUp()
Catch ex As Exception
MsgBox(ex.Message)
Exit Sub
End Try

If Cancel = True Then
RaiseEvent StatusChanged("Cancelled.")
RaiseEvent Progress(100)
Else
RaiseEvent StatusChanged("Complete.")
End If

RaiseEvent Complete()

End Sub

Private Sub ZipItUp()

If Cancel = True Then Exit Sub

Dim BlockSizeToRead As Int32 = 1048576 '1 MiB buffer
Dim Buffer As Byte() = New Byte(BlockSizeToRead - 1) {}
Dim BytesRead As Int64, TotalBytesRead As Int64
Dim LiveProg As Int16 = 0
Dim PrevProg As Int16 = 0

If File.Exists(_Target) And OverwriteTarget = False Then
Throw New Exception("Target File Already Exists.")
Else
File.Delete(_Target)
End If

Using FS As FileStream = New FileStream(_Target, FileMode.CreateNew, FileAccess.Write)

Using Archive As ZipArchive = New ZipArchive(FS, ZipArchiveMode.Create)

Dim Entry As ZipArchiveEntry = Nothing

For Each SessionFile As String In _SessionFiles

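Try

Using Reader As FileStream = File.Open(SessionFile, FileMode.Open, FileAccess.Read)

Entry = Archive.CreateEntry(SessionFile.Replace(_RootDir, ""), _
_Compression)

Using Writer As Stream = Entry.Open()

'Read up to BlockSizeToRead bytes at a time; Read returns 0 at the end of the file
While (InlineAssignHelper(BytesRead, Reader.Read(Buffer, 0, BlockSizeToRead))) > 0

Writer.Write(Buffer, 0, BytesRead)
TotalBytesRead += BytesRead

LiveProg = CInt((100 / _SessionLength) * TotalBytesRead)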

If LiveProg <> PrevProg Then

PrevProg = LiveProg
RaiseEvent Progress(LiveProg)

End If

If Cancel = True Then Exit While

End While

End Using
End Using

Catch Ex As Exception
Console.WriteLine(String.Format("Unable to add file to archive: {0} Error:{1}", SessionFile, Ex.Message))
End Try

If Cancel = True Then Exit For

Next

End Using

End Using

If Cancel = True Then
File.Delete(_Target)
End If

End Sub

Private Function InlineAssignHelper(Of T)(ByRef target As T, value As T) As T
target = value
Return value
End Function

End Class

Note: I wish Chrome would fix my spell chek. Sorry everybody =0)

Here's a quick run down of the class.

# The properties:

As most of the property names are self-descriptive, I'm not going to go through each one. However, an addition could be made to the setter of each property to prevent changes during compression.

If we add a private Boolean member, say _Compressing, to the Zipper class, set it to True when the Compress method is called and back to False when the process finishes, then doing the following in each setter will prevent errors caused by a user changing a property mid-compression.

    Private _Source As String
    Public Property SourceURL As String
        Get
            Return _Source
        End Get
        Set(value As String)

            If _Compressing = True Then Exit Property

            _Source = value

        End Set
    End Property
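
To round out the idea, here is a minimal sketch of where the _Compressing flag could be set and cleared. GuardedSettings and DoWork are hypothetical stand-ins for the Zipper class and its compression routine, and the Try/Finally is my own assumption about how to make sure the flag is always released; it is not part of the original class.

```vbnet
' Sketch only: GuardedSettings and DoWork are hypothetical names.
Public Class GuardedSettings

    Private _Compressing As Boolean
    Private _Source As String

    Public Property SourceURL As String
        Get
            Return _Source
        End Get
        Set(value As String)
            ' Ignore changes while a run is in progress.
            If _Compressing Then Exit Property
            _Source = value
        End Set
    End Property

    Public Sub Compress()
        _Compressing = True
        Try
            DoWork()
        Finally
            ' Always release the guard, even if DoWork throws.
            _Compressing = False
        End Try
    End Sub

    Private Sub DoWork()
        ' Placeholder for the real compression routine.
        ' A change attempted mid-run is ignored by the setter.
        SourceURL = "changed-during-run"
    End Sub

End Class
```

With this in place, setting SourceURL before or after Compress works as normal, while an assignment made during the run is silently dropped.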


# The Methods

## GetSessionLength

    Private Function GetSessionLength() As Int64
        Dim sLen As Int64 = 0
        For Each SessionFile As String In _SessionFiles
            sLen += New FileInfo(SessionFile).Length
            If Cancel = True Then Exit For
        Next
        Return sLen
    End Function


When the Compress method is called, all the file paths are added to the String array _SessionFiles. This method simply iterates through each entry and tallies the total length in bytes of all the files to be read and compressed. This total is used later to report the current progress of the compression process.

## IsDir

    Private Function IsDir(Source As String) As Int16
        If File.Exists(Source) Then
            Return 0
        ElseIf Directory.Exists(Source) Then
            Return -1
        Else
            Return 1
        End If
    End Function


This method quite simply returns a value which determines whether the Source object is a file, a folder, or doesn't exist. The value -1 (True) is returned if it is a directory, 0 (False) if it's a file, or 1 if it doesn't exist. This method is used in the Zipper constructor, New.

    If IsDir(Source) <> 1 Then
        _IsDir = IsDir(Source)
        _Source = Source
    Else
        Throw New Exception("Source file or directory " & _
            "doesn't exist or cannot be accessed.")
    End If


As you can see from this code, if the value returned is anything other than a Boolean value (-1 or 0) an exception is thrown. If the returned value is Boolean then the _IsDir and _Source private members are set.
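
The reason the "not found" code has to be tested before the assignment is that VB converts any non-zero number to True, so 1 would otherwise be read as "directory". A small sketch of that behaviour follows; Describe is a hypothetical helper written for illustration and is not part of the Zipper class.

```vbnet
Module TriStateSketch

    ' Describe is a hypothetical helper that mirrors how the constructor
    ' interprets the Int16 returned by IsDir.
    Function Describe(Code As Int16) As String

        ' 1 is the "not found" code and must be checked before converting.
        If Code = 1 Then Return "missing"

        ' CBool maps 0 to False and any non-zero value (here -1) to True.
        If CBool(Code) Then Return "directory"

        Return "file"

    End Function

End Module
```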

## InlineAssignHelper

    Private Function InlineAssignHelper(Of T)(ByRef target As T, _
        value As T) As T
        target = value
        Return value
    End Function


This method is (I believe) originally from C#: VB has no built-in equivalent of C#'s assignment inside an expression, so the helper has to be added manually (C#-to-VB converters typically generate it). It simply sets the target reference to the value of the argument "value" and also returns that value. It is used during the compression routine to determine how many bytes were read from a file. It basically delivers the same value to two different places (target and returned value) from the same call.
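
For comparison, the same read loop can be written without the helper by reading once before the loop and again at the end of each pass. This is only a sketch of the alternative, not the code used in the class; CopyInChunks is a hypothetical name.

```vbnet
Imports System.IO

Module CopyLoopSketch

    ' Copies Source to Target in chunks and returns the total bytes copied.
    ' CopyInChunks is a hypothetical helper, not part of the Zipper class.
    Function CopyInChunks(Source As Stream, Target As Stream, BlockSize As Integer) As Long

        Dim Buffer(BlockSize - 1) As Byte
        Dim Total As Long = 0

        ' Read once before the loop, then again at the end of each pass.
        Dim BytesRead As Integer = Source.Read(Buffer, 0, Buffer.Length)

        While BytesRead > 0
            Target.Write(Buffer, 0, BytesRead)
            Total += BytesRead
            BytesRead = Source.Read(Buffer, 0, Buffer.Length)
        End While

        Return Total

    End Function

End Module
```

The helper version keeps the Read call in one place; this version trades that for not needing an extra generic function.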

## Compress

This is the main Public method for the user, but it doesn't contain the compression routine. It prepares the information for the main compression method, "ZipItUp".

First, the _SessionFiles array is populated. The user can pass either a directory containing sub-directories and files, or a single file. If the source is a directory, we use the Directory.GetFiles method to scan for all the files contained within.

    _SessionFiles = Directory.GetFiles(SourceURL, "*", SearchOption.AllDirectories)


Other than the source path argument, there is also a search pattern argument (here I used "*" = all files) and a SearchOption argument (SearchOption.AllDirectories = include sub-folders).

In this example I haven't made these arguments available to the user, but these options could quite easily be implemented.

The pattern argument works as you might expect: "*.exe" returns all files that end with .exe, etc.
The SearchOption can also be set to TopDirectoryOnly, which will ignore sub-directories.

If the user has selected a single file, I still populate the _SessionFiles array, but only with the individual file.

    _SessionFiles = New String() {SourceURL}


Next I call the previously mentioned GetSessionLength method and store its value for later use.

    _SessionLength = GetSessionLength()


The last thing I do before calling ZipItUp is set the root directory for the entries in the zip file.

    If SourceIsDirectory And IncludeRootDir = False Then
        _RootDir = SourceURL & "\"
    Else
        _RootDir = String.Join("\", SourceURL.Split("\").ToArray, _
            0, SourceURL.Split("\").ToArray.Length - 1) & "\"
    End If


For those familiar with Zip files, you can opt to include or exclude the root directory of the source object or objects. Because I don't want to include the entire file/directory path in the Zip entry name, I store the relevant root path, which is then removed later when creating entries in our Zip file.

For example, if our file to be zipped is "C:\Users\JoeBloggs\Documents\TestFile.doc", I only want the entry name to be "TestFile.doc", so I replace "C:\Users\JoeBloggs\Documents\" with "".

Or, for a directory "C:\Users\JoeBloggs\Documents\My Projects\":

Including the root dir, the zip file may contain several entries like so:

    My Projects\File1.doc
    My Projects\File2.doc
    My Projects\File3.doc
    My Projects\SubDir\File1.doc


Or NOT including the root dir, the zip file may contain several entries like so:

    File1.doc
    File2.doc
    File3.doc
    SubDir\File1.doc
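
Two details of the entry names above may be worth guarding against. String.Replace removes the root wherever it occurs in the path, not only at the start, and the ZIP specification asks for forward slashes in entry names, so some non-Windows tools may not rebuild folders from backslashed names. The sketch below shows one possible way to handle both; ToEntryName is a hypothetical helper and is not part of the class as posted.

```vbnet
Imports System

Module EntryNameSketch

    ' Returns FullPath relative to RootDir, using "/" as the separator.
    ' ToEntryName is a hypothetical helper, not part of the Zipper class.
    Function ToEntryName(FullPath As String, RootDir As String) As String

        Dim Relative As String = FullPath

        ' Strip the root only when it is a true prefix, ignoring case
        ' because Windows paths are case-insensitive.
        If FullPath.StartsWith(RootDir, StringComparison.OrdinalIgnoreCase) Then
            Relative = FullPath.Substring(RootDir.Length)
        End If

        ' Swap Windows separators for the forward slashes ZIP tools expect.
        Return Relative.Replace("\", "/")

    End Function

End Module
```

For instance, passing "C:\Users\JoeBloggs\Documents\My Projects\SubDir\File1.doc" with the root "C:\Users\JoeBloggs\Documents\" gives "My Projects/SubDir/File1.doc".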


Once all these details have been recorded we move onto the main compression method "ZipItUp"

## ZipItUp

I start by declaring a few local variables:

BlockSizeToRead is set to 1 MiB. This is the maximum number of bytes we attempt to read at a time.

    Dim BlockSizeToRead As Int32 = 1048576 '1 MiB buffer


Buffer is the byte array which will hold the bytes read in each repetition.

    Dim Buffer As Byte() = New Byte(BlockSizeToRead - 1) {}


BytesRead and TotalBytesRead are used to build a progress report. BytesRead holds the number of bytes actually read in each repetition, and TotalBytesRead is the running total across all files.

    Dim BytesRead As Int64, TotalBytesRead As Int64


LiveProg and PrevProg are used to compare the current and previous progress, so that the progress is only updated when the percentage has actually changed. Because the value is a whole percentage, updates on huge compression sessions may not feel responsive enough on some slower machines.

    Dim LiveProg As Int16 = 0
    Dim PrevProg As Int16 = 0


Next I determine whether the user wants to prevent an existing file from being overwritten, or to remove the existing file.

    If File.Exists(_Target) And OverwriteTarget = False Then
        Throw New Exception("Target File Already Exists.")
    Else
        File.Delete(_Target)
    End If


The Main Routine:

    Using FS As FileStream = New FileStream(_Target, _
        FileMode.CreateNew, FileAccess.Write)


First we create a new file stream. It's important to note here that the FileAccess is Write. Many examples online fail with large files simply because the wrong FileAccess has been set; for example, ReadWrite will process the file stream in memory before committing it to disk. Not only does this fill up your RAM very quickly and risk an OutOfMemoryException with larger files, it also renders the progress report useless, because the file is compressed to memory very fast.

On terminating the file stream with End Using, the stream is then written to disk. For larger files this can take an additional 20 seconds, so the progress bar sits at 100% while the user is left waiting for an undetermined amount of time (there are no disk read/write figures to calculate it from).

Next a new archive is created and attached to our file stream.

    Using Archive As ZipArchive = New ZipArchive(FS, _
        ZipArchiveMode.Create)


And a new ZipArchiveEntry variable is declared.

    Dim Entry As ZipArchiveEntry = Nothing


We then begin iterating through the files to be compressed.

    For Each SessionFile As String In _SessionFiles


SessionFile holds the current file to be added to the archive in each repetition. We create a new file stream, this time to read the bytes of the current file to be added to the archive, again ensuring the FileAccess is Read.

    Using Reader As FileStream = File.Open(SessionFile, _
        FileMode.Open, FileAccess.Read)


We now create a new ZipArchiveEntry.

    Entry = Archive.CreateEntry(SessionFile.Replace(_RootDir, ""), _
        _Compression)


You can now see the _RootDir value in action. The entry name is stripped of the fully qualified path and switched to a relative value using the Replace method. The compression argument is set by the user; the options are Optimal, Fastest or NoCompression.

We then open a stream to our entry within the archive.

    Using Writer As Stream = Entry.Open()


And read the source file's bytes in chunks of up to 1 MiB.

    While (InlineAssignHelper(BytesRead, _
        Reader.Read(Buffer, 0, BlockSizeToRead))) > 0


Here you can see the InlineAssignHelper method in use. Its returned value is used by the While statement to see whether it has reached the end of the file. BytesRead is also populated with the same value, which will be used in just a second.

Immediately after we have read a chunk of bytes, we write it to the entry stream and add its length to the running total.

    Writer.Write(Buffer, 0, BytesRead)
    TotalBytesRead += BytesRead


Checking and updating the progress:

    LiveProg = CInt((100 / _SessionLength) * TotalBytesRead)


LiveProg holds the current state of the progress. As this is an integer, for larger sessions (in fact, anything over 100 bytes) this value may be recalculated any number of times without the progress actually changing. There's no point in attempting to update a UI object repeatedly unless the value has actually changed.

This next part checks whether the progress has changed and, if so, updates the UI while recording the new progress for later comparison.

    If LiveProg <> PrevProg Then

        PrevProg = LiveProg
        RaiseEvent Progress(LiveProg)

    End If


A Try/Catch block has been implemented to capture troublesome files; it tends to catch files that are protected or opened by other processes on your system.

    Catch Ex As Exception
        Console.WriteLine(String.Format("Unable to add file to archive: {0} Error:{1}", _
            SessionFile, Ex.Message))
    End Try


If an error does occur, it is important to update TotalBytesRead manually at this point. As the file is skipped (normally at the Read statement), the routine's progress won't be updated accordingly, leaving the user with an inaccurate progress position.

For example, if this session has 10 files, each 100 MiB in length, and one of those files is skipped, TotalBytesRead wouldn't be updated for it in the main loop. The progress would report 70% complete when it's really 80% complete, or 90% complete when the whole process has actually finished.
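
The class as posted only logs the error, so here is a rough sketch of one way the manual update could be done: remember the running total before each file and, if the file fails, count its full length as processed. RunSession and its ProcessFile callback are hypothetical stand-ins for the real loop and read/write routine.

```vbnet
Imports System
Imports System.Collections.Generic

Module SkipAccountingSketch

    ' Lengths holds the size of each file in the session. ProcessFile is a
    ' stand-in for the read/write loop: it returns the bytes handled or throws.
    ' Returns the percentage reported after each file.
    Function RunSession(Lengths As Long(), ProcessFile As Func(Of Long, Long)) As List(Of Integer)

        Dim SessionLength As Long = 0
        For Each Length As Long In Lengths
            SessionLength += Length
        Next

        Dim TotalBytesRead As Long = 0
        Dim Reported As New List(Of Integer)

        For Each Length As Long In Lengths

            ' Remember where this file started.
            Dim TotalBeforeFile As Long = TotalBytesRead

            Try
                TotalBytesRead += ProcessFile(Length)
            Catch Ex As Exception
                ' Count the skipped file as fully processed.
                TotalBytesRead = TotalBeforeFile + Length
            End Try

            ' Skip reporting for an empty session to avoid dividing by zero.
            If SessionLength > 0 Then
                Reported.Add(CInt(100 * TotalBytesRead \ SessionLength))
            End If

        Next

        Return Reported

    End Function

End Module
```

Resetting to TotalBeforeFile + Length rather than simply adding Length also covers the case where a file fails part-way through, after some of its bytes were already counted.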

# Cancel

Throughout the code you will see various references to the Cancel property. Even while an instance of Zipper is running on another thread, the user can set Cancel to True. If this happens, the current process and all subsequent processes will be skipped, and the partially written target file is deleted.

Well, that's pretty much it. I hope this has given you a good insight into my method for compressing files of any size using the Compression namespace in .Net 4.5.

Good morning. What would be the code to decompress the file with the progress bar?
Thanks!
