Windows Path String Management

deceptikon

Another quickie. This is a basic class for working with Windows path strings. It's based on .NET's System.IO.Path class, but doesn't contain any API dependencies (and thus doesn't normalize the paths; it's all straight string handling). Methods that I've personally found to be useful are included.

The class is specific to std::string so as to remain both simple and clean.

#include <stdexcept>
#include <string>

namespace win32 {
    class Path final {
    public:
        static const char AltDirectorySeparator = '/';
        static const char DirectorySeparator = '\\';
        static const char ExtensionSeparator = '.';
        static const char VolumeSeparator = ':';

        static const char ReservedPathChars[];
        static const char ReservedFileChars[];

        static std::string ChangeExtension(const std::string& path, const std::string& extension, bool checked = true);
        static std::string ChangeFileName(const std::string& path, const std::string& filename, bool checked = true);

        static std::string Combine(const std::string& lhs, const std::string& rhs, bool checked = true);

        static std::string GetDirectoryName(const std::string& path, bool checked = true);
        static std::string GetExtension(const std::string& path, bool checked = true);
        static std::string GetFileName(const std::string& path, bool checked = true);
        static std::string GetFileNameWithoutExtension(const std::string& path, bool checked = true);
        static std::string GetRoot(const std::string& path, bool checked = true);

        static bool HasExtension(const std::string& path, bool checked = true);
        static bool HasRoot(const std::string& path, bool checked = true);

        static bool IsValidFileName(const std::string& filename);
        static bool IsValidPath(const std::string& path);
    };

    const char Path::ReservedPathChars[] = {
        // Invalid characters outside the low values
        '"','<', '>', '|',

        // ASCII/Unicode low value characters (0-31)
        '\x00','\x01','\x02','\x03','\x04','\x05','\x06','\x07','\x08','\x09','\x0a',
        '\x0b','\x0c','\x0d','\x0e','\x0f','\x10','\x11','\x12','\x13','\x14','\x15',
        '\x16','\x17','\x18','\x19','\x1a','\x1b','\x1c','\x1d','\x1e','\x1f'
    };

    const char Path::ReservedFileChars[] = {
        // Additional characters not in ReservedPathChars
        ':','*','?','\\','/',

        // Invalid characters outside the low values
        '"','<', '>', '|',

        // ASCII/Unicode low value characters (0-31)
        '\x00','\x01','\x02','\x03','\x04','\x05','\x06','\x07','\x08','\x09','\x0a',
        '\x0b','\x0c','\x0d','\x0e','\x0f','\x10','\x11','\x12','\x13','\x14','\x15',
        '\x16','\x17','\x18','\x19','\x1a','\x1b','\x1c','\x1d','\x1e','\x1f'
    };

    std::string Path::ChangeExtension(const std::string& path, const std::string& extension, bool checked)
    {
        if (checked) {
            if (!IsValidPath(path)) {
                throw std::invalid_argument("Invalid characters in path");
            }

            if (!IsValidFileName(extension)) {
                throw std::invalid_argument("Invalid characters in extension");
            }
        }

        auto end = path.size();

        // Locate the point of insertion for the new extension
        while (--end < std::string::npos) {
            char ch{path[end]};

            if (ch == ExtensionSeparator) {
                // Found an extension, it's safe to replace
                break;
            }
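            else if (ch == DirectorySeparator || ch == AltDirectorySeparator || ch == VolumeSeparator) {
                // No extension found, so the new one is appended to the whole path
                // (the same result as a path with no separators at all)
                end = std::string::npos;
                break;
            }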
        }

        auto new_path = path.substr(0, end);

        if (!path.empty()) {
            if (extension.empty() || extension[0] != ExtensionSeparator) {
                // Apply the extension separator if it's not already there
                new_path += ExtensionSeparator;
            }

            new_path += extension;
        }

        return new_path;
    }

    std::string Path::ChangeFileName(const std::string& path, const std::string& filename, bool checked)
    {
        if (checked) {
            if (!IsValidPath(path)) {
                throw std::invalid_argument("Invalid characters in path");
            }

            if (!IsValidFileName(filename)) {
                throw std::invalid_argument("Invalid characters in file name");
            }
        }

        // GetExtension() already includes the leading extension separator,
        // so it must not be added a second time here.
        return Combine(
            GetDirectoryName(path, false),
            filename + GetExtension(path, false));
    }

    std::string Path::Combine(const std::string& lhs, const std::string& rhs, bool checked)
    {
        if (checked) {
            if (!IsValidPath(lhs)) {
                throw std::invalid_argument("Invalid characters in lhs path");
            }

            if (!IsValidPath(rhs)) {
                throw std::invalid_argument("Invalid characters in rhs path");
            }
        }

        /*
            If either string is empty, return the other
        */
        if (rhs.empty()) {
            return lhs;
        }
        
        if (lhs.empty()) {
            return rhs;
        }

        char ch{lhs.back()};
        auto path = lhs;

        if (ch != DirectorySeparator && ch != AltDirectorySeparator && ch != VolumeSeparator) {
            // Add a directory separator if one isn't already present
            path += DirectorySeparator;
        }

        return path + rhs;
    }

    std::string Path::GetDirectoryName(const std::string& path, bool checked)
    {
        if (checked && !IsValidPath(path)) {
            throw std::invalid_argument("Invalid characters in path");
        }

        auto end = path.size();

        // Locate the most nested directory or volume separator
        while (--end < std::string::npos) {
            char ch{path[end]};

            if (ch == VolumeSeparator || ch == DirectorySeparator || ch == AltDirectorySeparator) {
                // Found a separator, proceed to extract it
                break;
            }
        }

        if (end == std::string::npos) {
            // Failed to find any directory or volume separators
            return "";
        }

        return path.substr(0, end + 1);
    }

    std::string Path::GetExtension(const std::string& path, bool checked)
    {
        if (checked && !IsValidPath(path)) {
            throw std::invalid_argument("Invalid characters in path");
        }

        auto end = path.size();

        // Locate the start of an extension
        while (--end < std::string::npos) {
            char ch{path[end]};

            if (ch == ExtensionSeparator) {
                // Found an extension, proceed to return it
                break;
            }
            else if (ch == DirectorySeparator || ch == AltDirectorySeparator) {
                // No extension found, so the result is an empty string
                return "";
            }
        }

        if (end == std::string::npos || end == path.size() - 1) {
            // Either no extension separator was found at all, or it wasn't
            // a real extension, just a trailing extension separator
            return "";
        }

        return path.substr(end);
    }

    std::string Path::GetFileName(const std::string& path, bool checked)
    {
        if (checked && !IsValidPath(path)) {
            throw std::invalid_argument("Invalid characters in path");
        }

        auto end = path.size();

        // Locate the most nested directory or volume separator
        while (--end < std::string::npos) {
            char ch{path[end]};

            if (ch == DirectorySeparator || ch == AltDirectorySeparator || ch == VolumeSeparator) {
                // Found a separator, proceed to extract the file name
                break;
            }
        }

        if (end == std::string::npos) {
            // No separators found, just return the argument
            return path;
        }

        auto filename = path.substr(end + 1);

        if (checked && !IsValidFileName(filename)) {
            throw std::invalid_argument("Invalid characters in file name");
        }

        return filename;
    }

    std::string Path::GetFileNameWithoutExtension(const std::string& path, bool checked)
    {
        if (checked && !IsValidPath(path)) {
            throw std::invalid_argument("Invalid characters in path");
        }

        std::string filename = GetFileName(path, false);

        if (checked && !IsValidFileName(filename)) {
            throw std::invalid_argument("Invalid characters in file name");
        }

        return filename.substr(0, filename.find_last_of(ExtensionSeparator));
    }

    std::string Path::GetRoot(const std::string& path, bool checked)
    {
        if (checked && !IsValidPath(path)) {
            throw std::invalid_argument("Invalid characters in path");
        }

        auto len = path.size();
        decltype(len) end = 0;

        if (len > 0 && (path[0] == DirectorySeparator || path[0] == AltDirectorySeparator)) {
            // The path is either an absolute directory or a UNC path
            end = 1;

            if (len > 1 && (path[1] == DirectorySeparator || path[1] == AltDirectorySeparator)) {
                // It's a UNC path, so locate the server and share
                int separators_found = 0;

                // Continue until two "directories" are detected, this locates the 
                // pattern of "\\<first>\<second>" that corresponds to a UNC share.
                while (++end < len && separators_found < 2) {
                    if (path[end] == DirectorySeparator || path[end] == AltDirectorySeparator) {
                        ++separators_found;
                    }
                }
            }
            else {
                // It's an absolute directory, so we're done
            }
        }
        else if (len > 1 && path[1] == VolumeSeparator) {
            // The path is rooted at a volume, locate the first non-separator
            end = 2;

            // A volume may or may not have a directory separator, both are valid
            if (len > 2 && (path[2] == DirectorySeparator || path[2] == AltDirectorySeparator)) {
                end = 3;
            }
        }

        return path.substr(0, end);
    }

    bool Path::HasExtension(const std::string& path, bool checked)
    {
        if (checked && !IsValidPath(path)) {
            throw std::invalid_argument("Invalid characters in path");
        }

        auto end = path.size();

        // Try to locate the extension separator
        while (--end < std::string::npos) {
            char ch{path[end]};

            if (ch == ExtensionSeparator) {
                // It's a valid extension only if the separator *isn't* the last character in the string
                return end != path.size() - 1;
            }
            else if (ch == DirectorySeparator || ch == AltDirectorySeparator || ch == VolumeSeparator) {
                // Found a directory or volume, so we're beyond the point where we need to continue
                break;
            }
        }

        return false;
    }

    bool Path::HasRoot(const std::string& path, bool checked)
    {
        if (checked && !IsValidPath(path)) {
            throw std::invalid_argument("Invalid characters in path");
        }

        auto len = path.size();

        if (len > 0) {
            // Non-empty strings may have either a directory or volume root
            if (path[0] == DirectorySeparator || path[0] == AltDirectorySeparator) {
                // It's a directory or UNC root, but there's no need to do further 
                // checks to differentiate between them. Either way the string is rooted.
                return true;
            }
            else if (len > 1) {
                // It's *not* a directory or UNC root, check for a volume
                // because the string is long enough to handle one.
                if (path[1] == VolumeSeparator) {
                    // The separator alone is enough to tell if the string is rooted.
                    return true;
                }
            }
        }

        return false;
    }

    bool Path::IsValidFileName(const std::string& filename)
    {
        return filename.find_first_of(ReservedFileChars, 0, sizeof ReservedFileChars) == std::string::npos;
    }

    bool Path::IsValidPath(const std::string& path)
    {
        return path.find_first_of(ReservedPathChars, 0, sizeof ReservedPathChars) == std::string::npos;
    }
}
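A note on the loop idiom used throughout: `while (--end < std::string::npos)` depends on `size_type` being unsigned, so decrementing past index 0 wraps around to `npos` and ends the loop. The sketch below isolates that behavior in a standalone helper. The name `find_last_separator` is hypothetical and not part of the class above; it only illustrates the scan.

```cpp
#include <string>

// Hypothetical helper (not part of the Path class): returns the index of the
// last '\\', '/' or ':' in path, or std::string::npos when there is none.
std::string::size_type find_last_separator(const std::string& path)
{
    auto end = path.size();

    // size_type is unsigned, so decrementing 0 wraps around to npos (the
    // largest representable value). At that point the condition is false
    // and the loop stops with end == npos, meaning "not found".
    while (--end < std::string::npos) {
        char ch = path[end];

        if (ch == '\\' || ch == '/' || ch == ':') {
            // Found a separator; end now holds its index
            break;
        }
    }

    return end;
}
```

An empty string is handled by the same mechanism: `end` starts at 0, wraps to `npos` on the first decrement, and the loop body never runs.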
mrnutty 761 Senior Poster

Hmm...seems like it's a little more complicated than it needs to be. Some of your algorithms can be simplified using the algorithms in the string library, for example:

 std::string GetExtension(const std::string& filename){
    // _validate(); // your exception validation goes here
    auto extensionPosition = filename.find_last_of('.');
    auto pathSeparatorPosition = filename.find_last_of("/\\");
    // No dot at all means there is no extension
    if(extensionPosition == std::string::npos) return "";
    // A dot that precedes the last separator belongs to a directory name
    if(pathSeparatorPosition != std::string::npos && pathSeparatorPosition > extensionPosition) return "";
    return filename.substr(extensionPosition);
 }

That's a rough idea; I'm not sure it works in one shot, though. Also, since most of these functions are static, it would make more sense to put them in a namespace instead of a class, just to follow the idiom.
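As a rough sketch of the namespace suggestion, the static members could become free functions. The nested namespace `path` and the function name `get_extension` are assumptions made for illustration, not names from the original class, and validation is left out to keep it short.

```cpp
#include <string>

namespace win32 {
namespace path {   // hypothetical namespace standing in for the Path class
    const char ExtensionSeparator = '.';

    // Free-function counterpart of Path::GetExtension, without validation.
    // Returns the extension including the leading '.', or "" if there is none.
    inline std::string get_extension(const std::string& p)
    {
        auto dot = p.find_last_of(ExtensionSeparator);
        auto sep = p.find_last_of("/\\:");

        if (dot == std::string::npos) {
            return "";   // no extension separator at all
        }

        if (sep != std::string::npos && sep > dot) {
            return "";   // the dot belongs to a directory name
        }

        if (dot == p.size() - 1) {
            return "";   // trailing separator only, not a real extension
        }

        return p.substr(dot);
    }
}
}
```

Callers would then write `win32::path::get_extension(p)` rather than `win32::Path::GetExtension(p)`. The trade-off is that a namespace can be reopened and extended elsewhere, while the `final` class keeps the interface closed.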

Another thing is that this can easily be extended to handle non-Win32 environments just by changing the reservedChars, pathSeparator and so on. Anyways, nice job!
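One way this could look, purely as a sketch: move the platform-specific pieces into a small rules type and make the functions templates over it. `WindowsRules`, `PosixRules` and the free function `combine` are hypothetical names, and only the separator handling is shown; reserved characters would need the same treatment.

```cpp
#include <string>

// Hypothetical rule sets; each one supplies the preferred separator and a
// test for characters that already terminate a path component.
struct WindowsRules {
    static constexpr char separator = '\\';
    static bool is_separator(char ch) { return ch == '\\' || ch == '/' || ch == ':'; }
};

struct PosixRules {
    static constexpr char separator = '/';
    static bool is_separator(char ch) { return ch == '/'; }
};

// Counterpart of Path::Combine, parameterized on the rule set (no validation).
template <typename Rules>
std::string combine(const std::string& lhs, const std::string& rhs)
{
    // If either string is empty, return the other
    if (rhs.empty()) {
        return lhs;
    }

    if (lhs.empty()) {
        return rhs;
    }

    std::string path = lhs;

    if (!Rules::is_separator(lhs.back())) {
        // Add a directory separator if one isn't already present
        path += Rules::separator;
    }

    return path + rhs;
}
```

With this shape, `combine<WindowsRules>("C:\\dir", "file.txt")` and `combine<PosixRules>("/usr", "lib")` share one implementation and differ only in the rules they are given.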

deceptikon 1,790 Code Sniper Team Colleague Featured Poster

Thanks for the feedback!

Hmm...seems like it's a little more complicated than it needs to be.

Longer perhaps (length and complexity aren't 1-to-1), but I don't think it's any more complicated than, say, your suggested alternative in this case.

Some of your algorithms can be simplified using the algorithms in the string library

Yes, that's certainly an option. Full disclosure, I was totally expecting the first reply to be all "why didn't you use the standard algorithms, noob?" ;D

In fact, I did start with stuff like that, and chose to go with a more manual approach in later drafts because I found the latter to be cleaner and more readable in this instance. I'd call this a case of recognizing that just because the standard library is there, you don't have to use it if you don't feel it's the best fit.

it would make more sense to put them in a namespace instead of a class, just to follow the idiom.

To which idiom do you refer? There's more than one, and I clearly didn't follow the usual C++ conventions. In fact, one might argue that by taking interface cues from .NET, the "static class" idiom is better here. ;)

Another thing is that this can easily be extended to handle non-Win32 environments just by changing the reservedChars, pathSeparator and so on.

Agreed, and the only reason it's Windows-specific is I was only 90% sure I could get a fully general approach right in the hour I spent on this. But the Windows rules I'm solid on, so it's as you see it.
