module lang::paths::Windows

rascal-0.40.16

Defines the syntax of filesystem and network drive paths on DOS and Windows Systems.

Usage

import lang::paths::Windows;

Dependencies

import IO;
import util::SystemAPI;
import ParseTree;

Description

This syntax definition of Windows file paths and file names formalizes open-source parsers that were written by hand in Java, C++ and C#. Those parsers handle the Windows syntax of file and directory names, as well as shares on local networks (UNC notation). The definition also draws on openly available documentation for Windows and the .NET platform, both for confirmation and for test examples.

The main function of this module, parseWindowsPath:

  • faithfully maps any syntactically correct Windows path to a syntactically correct loc value.
  • throws a ParseError if the path does not comply. Typically, file names ending in spaces do not comply.
  • ensures that if the file exists on system A, then the loc representation resolves to the same file on system A via any IO function.
  • does nothing more: no normalization, no interpretation of . and .., no case changes. That is left to downstream processors of loc values, if necessary. The transformation is purely syntactic and tries to preserve the semantics of the path as much as possible.
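
A minimal sketch of this contract follows. The accepted input and its expected loc are taken from this module's uncSharePath test. The helper isRejected and the path C:\Temp\notes.txt are hypothetical, and the sketch assumes that the ParseError constructor from the standard Exception module is the one being thrown.

```rascal
import lang::paths::Windows;
import Exception;

// Hypothetical helper: true when parseWindowsPath rejects the input.
bool isRejected(str input) {
    try {
        parseWindowsPath(input);
        return false;
    }
    catch ParseError(loc _): {
        return true;
    }
}

// Same input and expected value as the uncSharePath test of this module.
bool acceptsUncShare()
    = parseWindowsPath("\\\\Server2\\Share\\Test\\Foo.txt")
    == |unc://Server2/Share/Test/Foo.txt|;

// The trailing space after "notes.txt" is expected to trigger a ParseError.
bool rejectsTrailingSpace()
    = isRejected("C:\\Temp\\notes.txt ");
```

Wrapping the call in a helper keeps the exception handling in one place, so callers can decide whether a rejected path is an error or merely a signal to try another interpretation.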

Pitfalls

  • Length limitations are not implemented by this parser. This means that overly long names will lead to IO exceptions when they are finally used.
  • The names of drives, files and devices are mapped as-is, without normalization. This means that the resulting loc value may not be a canonical representation of the identified resource. Normalizing loc values is left to a separate function, which is still to be determined (TBD).

syntax WindowsPath

lexical WindowsPath
= unc : Slash Slash Slashes? PathChar* \ "." Slashes PathChar* Slashes WindowsFilePath
| uncDOSDrive : Slash Slash Slashes? DOSDevice Slashes Drive ":" OptionalWindowsFilePath
| uncDOSPath : Slash Slash Slashes? DOSDevice Slashes PathChar* Slashes WindowsFilePath
| absolute : Drive ":" Slashes WindowsFilePath
| driveRelative : Drive ":" WindowsFilePath
| directoryRelative: Slash WindowsFilePath
| relative : WindowsFilePath
;

syntax OptionalWindowsFilePath

lexical OptionalWindowsFilePath
= ()
| Slashes WindowsFilePath
;

syntax DOSDevice

lexical DOSDevice = [.?];

syntax PathChar

lexical PathChar = !([\a00-\a20\< \> : \" | ? * \\ /] - [\ ]);

syntax PathSegment

lexical PathSegment
= current: "."
| parent : ".."
| name : PathChar+ \ ".." \ "."
;

syntax Drive

lexical Drive = [A-Za-z];

syntax Slashes

lexical Slashes = Slash+ !>> [\\/];

syntax Slash

lexical Slash = [\\/];

syntax WindowsFilePath

lexical WindowsFilePath = {PathSegment Slashes}* segments Slashes? [\ .] !<< ();

function parseWindowsPath

Convert a Windows path literal to a source location URI.

loc parseWindowsPath(str input, loc src=|unknown:///|)
  1. parses the path using the WindowsPath grammar.
  2. takes the literal name components using string interpolation "<segment>". This means no decoding or encoding happens at all while extracting the host name, share name and path segment names. All superfluous path separators are skipped as well.
  3. uses loc + str path concatenation, with its builtin character encoding, to construct the URI. This also introduces the right path separators.
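
The effect of steps 2 and 3 can be sketched with inputs from this module's own tests: different separator spellings collapse to the same loc, and characters such as the space or curly braces are percent-encoded by the concatenation. The function names below are hypothetical; the expected values are copied from the tests.

```rascal
import lang::paths::Windows;

// Backslashes, forward slashes and repeated trailing separators all
// lead to the same loc (tests simpleDrivePathC, mixedSlashesDrivePathC
// and trailingSlashesDrivePathC).
bool separatorsAreUnified() {
    loc expected = |file:///C:/Program%20Files/Rascal|;
    return parseWindowsPath("C:\\Program Files\\Rascal") == expected
        && parseWindowsPath("C:\\Program Files/Rascal") == expected
        && parseWindowsPath("C:\\Program Files\\Rascal\\\\") == expected;
}

// Curly braces in a volume name end up as %7B and %7D
// (test uncDOSDeviceVolumeGUIDReference).
bool namesAreEncoded()
    = parseWindowsPath("\\\\.\\Volume{b75e2c83-0000-0000-0000-602f00000000}\\Test\\Foo.txt")
    == |unc://./Volume%7Bb75e2c83-0000-0000-0000-602f00000000%7D/Test/Foo.txt|;
```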

function mapPathToLoc

UNC.

loc mapPathToLoc((WindowsPath) `<Slash _><Slash _><Slashes? _><PathChar* hostName><Slashes _><PathChar* shareName><Slashes _><WindowsFilePath path>`)

function mapPathToLoc

DOS UNC Device Drive.

loc mapPathToLoc((WindowsPath) `<Slash _><Slash _><Slashes? _><DOSDevice dq><Slashes _><Drive drive>:<OptionalWindowsFilePath path>`)

function mapPathToLoc

DOS UNC Device Path.

loc mapPathToLoc((WindowsPath) `<Slash _><Slash _><Slashes? _><DOSDevice dq><Slashes _><PathChar* deviceName><Slashes _><WindowsFilePath path>`)

function deviceIndicator

str deviceIndicator((DOSDevice) `?`)

str deviceIndicator((DOSDevice) `.`)
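
deviceIndicator is listed without a description. Judging only from the module's tests, the ? marker shows up percent-encoded as %3F in the authority of the resulting loc, while the . marker is kept as-is. Whether deviceIndicator itself performs that mapping is an assumption; the sketch below only restates the observable behaviour of parseWindowsPath, under a hypothetical function name.

```rascal
import lang::paths::Windows;

// Expected values are copied from the tests
// uncDOSDevicePathLocalFileQuestion and uncDOSDevicePathLocalFileDot.
bool deviceMarkersBecomeAuthority() {
    return parseWindowsPath("\\\\?\\c:\\windows\\system32\\cmd.exe")
            == |unc://%3F/c:/windows/system32/cmd.exe|
        && parseWindowsPath("\\\\.\\C:\\Test\\Foo.txt")
            == |unc://./C:/Test/Foo.txt|;
}
```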

function mapPathToLoc

DOS UNC Path.

loc mapPathToLoc((WindowsPath) `<Slash _><Slash _><Slashes? _>?<Slashes _><PathChar* shareName><Slashes _><WindowsFilePath path>`)

function mapPathToLoc

Absolute: given the drive and relative to its root.

loc mapPathToLoc((WindowsPath) `<Drive drive>:<Slashes _><WindowsFilePath path>`)

function mapPathToLoc

Drive relative: relative to the current working directory on the given drive.

loc mapPathToLoc((WindowsPath) `<Drive drive>:<WindowsFilePath path>`)

function mapPathToLoc

Directory relative: relative to the root of the current drive.

loc mapPathToLoc((WindowsPath) `<Slash _><WindowsFilePath path>`)

function mapPathToLoc

Relative to the current working directory on the current drive.

loc mapPathToLoc((WindowsPath) `<WindowsFilePath path>`)

function appendPath

loc appendPath(loc root, WindowsFilePath path)

loc appendPath(loc root, (OptionalWindowsFilePath) ``)

loc appendPath(loc root, (OptionalWindowsFilePath) `<Slashes _><WindowsFilePath path>`)
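
appendPath carries no description. One plausible reading, given step 3 of parseWindowsPath, is a fold over the segment names using loc + str. The sketch below illustrates that idea on a plain list[str] instead of a parse tree. appendNames is a hypothetical name, and this is not the module's implementation.

```rascal
// Hypothetical stand-in for appendPath that works on plain strings.
// The reducer starts at root and appends one path segment per name;
// loc + str takes care of the separator and of character encoding.
loc appendNames(loc root, list[str] names)
    = (root | it + name | name <- names);

// Expected to agree with the simpleDrivePathC test of the module.
bool foldBuildsDrivePath()
    = appendNames(|file:///C:/|, ["Program Files", "Rascal"])
    == |file:///C:/Program%20Files/Rascal|;
```

Folding from the root means an empty list of names returns the root unchanged, which matches the role of the empty OptionalWindowsFilePath alternative.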

Tests

test uncSharePath

test bool uncSharePath()
= parseWindowsPath("\\\\Server2\\Share\\Test\\Foo.txt")
== |unc://Server2/Share/Test/Foo.txt|;

test uncDrivePath

test bool uncDrivePath()
= parseWindowsPath("\\\\system07\\C$\\")
== |unc://system07/C$|;

test uncDOSDevicePathLocalFileQuestion

test bool uncDOSDevicePathLocalFileQuestion() {
    loc l = parseWindowsPath("\\\\?\\c:\\windows\\system32\\cmd.exe");

    if (IS_WINDOWS) {
        assert exists(l);
    }

    return l == |unc://%3F/c:/windows/system32/cmd.exe|;
}

test uncDOSDevicePathLocalFileDot

test bool uncDOSDevicePathLocalFileDot() {
    loc l = parseWindowsPath("\\\\.\\C:\\Test\\Foo.txt");

    return l == |unc://./C:/Test/Foo.txt|;
}

test uncDOSDeviceUNCSharePath

test bool uncDOSDeviceUNCSharePath() {
    // the entire UNC namespace is looped back into the DOS Device UNC encoding via
    // the reserved name "UNC":
    loc m1 = parseWindowsPath("\\\\?\\UNC\\Server\\Share\\Test\\Foo.txt");
    loc m2 = parseWindowsPath("\\\\.\\UNC\\Server\\Share\\Test\\Foo.txt");

    return m1 == |unc://%3F/UNC/Server/Share/Test/Foo.txt|
        && m2 == |unc://./UNC/Server/Share/Test/Foo.txt|;
}

test uncDOSDeviceVolumeGUIDReference

test bool uncDOSDeviceVolumeGUIDReference() {
    loc l = parseWindowsPath("\\\\.\\Volume{b75e2c83-0000-0000-0000-602f00000000}\\Test\\Foo.txt");

    return l == |unc://./Volume%7Bb75e2c83-0000-0000-0000-602f00000000%7D/Test/Foo.txt|;
}

test uncDOSDeviceBootPartition

test bool uncDOSDeviceBootPartition() {
    loc l = parseWindowsPath("\\\\.\\BootPartition\\");
    return l == |unc://./BootPartition|;
}

test simpleDrivePathC

test bool simpleDrivePathC()
= parseWindowsPath("C:\\Program Files\\Rascal")
== |file:///C:/Program%20Files/Rascal|;

test mixedSlashesDrivePathC

test bool mixedSlashesDrivePathC()
= parseWindowsPath("C:\\Program Files/Rascal")
== |file:///C:/Program%20Files/Rascal|;

test trailingSlashesDrivePathC

test bool trailingSlashesDrivePathC()
= parseWindowsPath("C:\\Program Files\\Rascal\\\\")
== |file:///C:/Program%20Files/Rascal|;

test simpleDrivePathD

test bool simpleDrivePathD()
= parseWindowsPath("D:\\Program Files\\Rascal")
== |file:///D:/Program%20Files/Rascal|;

test uncNetworkShareOk

test bool uncNetworkShareOk() {
    loc l = parseWindowsPath("\\\\localhost\\ADMIN$\\System32\\cmd.exe");

    if (IS_WINDOWS) {
        return exists(l);
    }
    else {
        return |unc://localhost/ADMIN$/System32/cmd.exe| == l;
    }
}