7. File Access

Introduction

//----------------------------------------------------------------------------------
//testfile = new File('/usr/local/widgets/data')  // unix
testfile = new File('Pleac/data/blue.txt')      // windows
testfile.eachLine{ if (it =~ /blue/) println it }

// Groovy (like Java) uses the File class as an abstraction for
// the path representing a potential file system resource.
// Channels and Streams (along with Reader adn Writer helper
// classes) are used to read and write to files (and other
// things). Files, channels, streams etc are all "normal"
// objects; they can be passed around in your programs just
// like other objects (though there are some restrictions
// covered elsewhere - e.g. you can't expect to pass a File
// object between JVMs on different machines running different
// operating systems and expect them to maintain a meaningful
// value across the different JVMs). In addition to Streams,
// there is also support for random access to files.

// Many operations are available on streams and channels. Some
// return values to indicate success or failure, some can throw
// exceptions, other times both styles of error reporting may be
// available.

// Streams at the lowest level are just a sequence of bytes though
// there are various abstractions at higher levels to allow
// interacting with streams at encoded character, data type or
// object levels if desired. Standard streams include System.in,
// System.out and System.err. Java and Groovy on top of that
// provide facilities for buffering, filtering and processing
// streams in various ways.

// File channels provide more powerful operations than streams
// for reading and writing files such as locks, buffering,
// positioning, concurrent reading and writing, mapping to memory
// etc. In the examples which follow, streams will be used for
// simple cases, channels when more advanced features are
// required. Groovy currently focusses on providing extra support
// at the file and stream level rather than channel level.
// This makes the simple things easy but lets you do more complex
// things by just using the appropriate Java classes. All Java
// classes are available within Groovy by default.

// Groovy provides syntactic sugar over the top of Java's file
// processing capabilities by providing meaning to shorthand
// operators and by automatically handling scaffolding type
// code such as opening, closing and handling exceptions behind
// the scenes. It also provides many powerful closure operators,
// e.g. file.eachLineMatch(pattern){ some_operation } will open
// the file, process it line-by-line, finding all lines which
// match the specified pattern and then invoke some operation
// for the matching line(s) if any, before closing the file.


// this example shows how to access the standard input stream
// numericCheckingScript:
prompt = '\n> '
print 'Enter text including a digit:' + prompt
new BufferedReader(new InputStreamReader(System.in)).eachLine{ line ->
                                               // line is read from System.in
    if (line =~ '\\d') println "Read: $line"   // normal output to System.out
    else System.err.println 'No digit found.'  // this message to System.err
}
//----------------------------------------------------------------------------------

Opening a File

//----------------------------------------------------------------------------------
// test values (change for your os and directories)
inputPath='Pleac/src/pleac7.groovy'; outPath='Pleac/temp/junk.txt'

// For input Java uses InputStreams (for byte-oriented processing) or Readers
// (for character-oriented processing). These can throw FileNotFoundException.
// There are also other stream variants: buffered, data, filters, objects, ...
inputFile = new File(inputPath)
inputStream = new FileInputStream(inputFile)
reader = new FileReader(inputFile)
inputChannel = inputStream.channel

// Examples for random access to a file
file = new RandomAccessFile(inputFile, "rw") // for read and write
channel = file.channel

// Groovy provides some sugar coating on top of Java
println inputFile.text.size()
// => 13496

// For output Java use OutputStreams or Writers. Can throw FileNotFound
// or IO exceptions. There are also other flavours of stream: buffered,
// data, filters, objects, ...
outFile = new File(outPath)
appendFlag = false
outStream = new FileOutputStream(outFile, appendFlag)
writer = new FileWriter(outFile, appendFlag)
outChannel = outStream.channel

// Also some Groovy sugar coating
outFile << 'A Chinese sailing vessel'
println outFile.text.size() // => 24

Opening Files with Unusual Filenames

//----------------------------------------------------------------------------------
// No problem with Groovy since the filename doesn't contain characters with
// special meaning; like Perl's sysopen. Options are either additional parameters
// or captured in different classes, e.g. Input vs Output, Buffered vs non etc.
new FileReader(inputPath)
//----------------------------------------------------------------------------------

Expanding Tildes in Filenames

//----------------------------------------------------------------------------------
// '~' is a shell expansion feature rather than file system feature per se.
// Because '~' is a valid filename character in some operating systems, and Java
// attempts to be cross-platform, it doesn't automatically expand Tilde's.
// Given that '~' expansion is commonly used however, Java puts the $HOME
// environment variable (used by shells to do typical expansion) into the
// "user.home" system property. This works across operating systems - though
// the value inside differs from system to system so you shouldn't rely on its
// content to be of a particular format. In most cases though you should be
// able to write a regex that will work as expected. Also, Apple's
// NSPathUtilities can expand and introduce Tildes on platforms it supports.
path = '~paulk/.cvspass'
name = System.getProperty('user.name')
home = System.getProperty('user.home')
println home + path.replaceAll("~$name(.*)", '$1')
// => C:\Documents and Settings\Paul/.cvspass
//----------------------------------------------------------------------------------

Making Perl Report Filenames in Errors

//----------------------------------------------------------------------------------
// The exception raised in Groovy reports the filename
try {
    new File('unknown_path/bad_file.ext').text
} catch (Exception ex) {
    System.err.println(ex.message)
}
// =>
// unknown_path\bad_file.ext (The system cannot find the path specified)
//----------------------------------------------------------------------------------

Creating Temporary Files

//----------------------------------------------------------------------------------
try {
    temp = File.createTempFile("prefix", ".suffix")
    temp.deleteOnExit()
} catch (IOException ex) {
    System.err.println("Temp file could not be created")
}
//----------------------------------------------------------------------------------

Storing Files Inside Your Program Text

//----------------------------------------------------------------------------------
// no special features are provided, here is a way to do it manually
// DO NOT REMOVE THE FOLLOWING STRING DEFINITION.
pleac_7_6_embeddedFileInfo = '''
Script size is 13731
Last script update: Wed Jan 10 19:05:58 EST 2007
'''
ls = System.getProperty('line.separator')
file = new File('Pleac/src/pleac7.groovy')
regex = /(?ms)(?<=^pleac_7_6_embeddedFileInfo = ''')(.*)(?=^''')/
def readEmbeddedInfo() {
    m = file.text =~ regex
    println 'Found:\n' + m[0][1]
}
def writeEmbeddedInfo() {
    lastMod = new Date(file.lastModified())
    newInfo = "${ls}Script size is ${file.size()}${ls}Last script update: ${lastMod}${ls}"
    file.write(file.text.replaceAll(regex, newInfo))
}
readEmbeddedInfo()
// writeEmbeddedInfo()  // uncomment to make script update itself
// readEmbeddedInfo()   // uncomment to redisplay the embedded info after the update

// => (output when above two method call lines are uncommented)
// Found:
//
// Script size is 13550
// Last script update: Wed Jan 10 18:56:03 EST 2007
//
// Found:
//
// Script size is 13731
// Last script update: Wed Jan 10 19:05:58 EST 2007
//----------------------------------------------------------------------------------

Writing a Filter

//----------------------------------------------------------------------------------
// general pattern for reading from System.in is:
// System.in.readLines().each{ processLine(it) }

// general pattern for a filter which can either process file args or read from System.in is:
// if (args.size() != 0) args.each{
//     file -> new File(file).eachLine{ processLine(it) }
// } else System.in.readLines().each{ processLine(it) }

// note: the following examples are file-related per se. They show
// how to do option processing in scenarios which typically also
// involve file arguments. The reader should also consider using a
// pre-packaged options parser package (there are several popular
// ones) rather than the hard-coded processing examples shown here.

chopFirst = false
columns = 0
args = ['-c', '-30', 'somefile']

// demo1: optional c
if (args[0] == '-c') {
    chopFirst = true
    args = args[1..-1]
}

assert args == ["-30", "somefile"]
assert chopFirst

// demo2: processing numerical options
if (args[0] =~ /^-(\d+)$/) {
    columns = args[0][1..-1].toInteger()
    args = args[1..-1]
}

assert args == ["somefile"]
assert columns == 30

// demo3: multiple args (again consider option parsing package)
args = ['-n','-a','file1','file2']
nostdout = false
append = false
unbuffer = false
ignore_ints = false
files = []
args.each{ arg ->
    switch(arg) {
        case '-n': nostdout    = true; break
        case '-a': append      = true; break
        case '-u': unbuffer    = true; break
        case '-i': ignore_ints = true; break
        default: files += arg
    }
}
if (files.any{ it.startsWith('-')}) {
    System.err.println("usage: demo3 [-ainu] [filenames]")
}
// process files ...
assert nostdout && append && !unbuffer && !ignore_ints
assert files == ['file1','file2']

// find login: print all lines containing the string "login" (command-line version)
//% groovy -ne "if (line =~ 'login') println line" filename

// find login variation: lines containing "login" with line number (command-line version)
//% groovy -ne "if (line =~ 'login') println count + ':' + line" filename

// lowercase file (command-line version)
//% groovy -pe "line.toLowerCase()"


// count chunks but skip comments and stop when reaching "__DATA__" or "__END__"
chunks = 0; done = false
testfile = new File('Pleac/data/chunks.txt') // change on your system
lines = testfile.readLines()
for (line in lines) {
    if (!line.trim()) continue
    words = line.split(/[^\w#]+/).toList()
    for (word in words) {
        if (word =~ /^#/) break
        if (word in ["__DATA__", "__END__"]) { done = true; break }
        chunks += 1
    }
    if (done) break
}
println "Found $chunks chunks"


// groovy "one-liner" (cough cough) for turning .history file into pretty version:
//% groovy -e "m=new File(args[0]).text=~/(?ms)^#\+(\d+)\r?\n(.*?)$/;(0..<m.count).each{println ''+new Date(m[it][1].toInteger())+'  '+m[it][2]}" .history
// =>
// Sun Jan 11 18:26:22 EST 1970  less /etc/motd
// Sun Jan 11 18:26:22 EST 1970  vi ~/.exrc
// Sun Jan 11 18:26:22 EST 1970  date
// Sun Jan 11 18:26:22 EST 1970  who
// Sun Jan 11 18:26:22 EST 1970  telnet home
//----------------------------------------------------------------------------------

Modifying a File in Place with Temporary File

//----------------------------------------------------------------------------------
// test data for below
testPath = 'Pleac/data/process.txt'

// general pattern
def processWithBackup(inputPath, Closure processLine) {
    def input = new File(inputPath)
    def out = File.createTempFile("prefix", ".suffix")
    out.write('') // create empty file
    count = 0
    input.eachLine{ line ->
        count++
        processLine(out, line, count)
    }
    def dest = new File(inputPath + ".orig")
    dest.delete() // clobber previous backup
    input.renameTo(dest)
    out.renameTo(input)
}

// use withPrintWriter if you don't want the '\n''s appearing
processWithBackup(testPath) { out, line, count ->
    if (count == 20) {   // we are at the 20th line
        out << "Extra line 1\n"
        out << "Extra line 2\n"
    }
    out << line + '\n'
}

processWithBackup(testPath) { out, line, count ->
    if (!(count in 20..30)) // skip the 20th line to the 30th
        out << line + '\n'
}
// equivalent to "one-liner":
//% groovy -i.orig -pe "if (!(count in 20..30)) out << line" testPath
//----------------------------------------------------------------------------------

Modifying a File in Place with -i Switch

//----------------------------------------------------------------------------------
//% groovy -i.orig -pe 'FILTER COMMAND' file1 file2 file3 ...

// the following may also be possible on unix systems (unchecked)
//#!/usr/bin/groovy -i.orig -p
// filter commands go here

// "one-liner" templating scenario: change DATE -> current time
//% groovy -pi.orig -e 'line.replaceAll(/DATE/){new Date()}'

//% groovy -i.old -pe 'line.replaceAll(/\bhisvar\b/, 'hervar')' *.[Cchy] (globbing platform specific)

// one-liner for correcting spelling typos
//% groovy -i.orig -pe 'line.replaceAll(/\b(p)earl\b/i, '\1erl')' *.[Cchy] (globbing platform specific)
//----------------------------------------------------------------------------------

Modifying a File in Place Without a Temporary File

//----------------------------------------------------------------------------------
// general pattern
def processFileInplace(file, Closure processText) {
    def text = file.text
    file.write(processText(text))
}

// templating scenario: change DATE -> current time
testfile = new File('Pleac/data/pleac7_10.txt') // replace on your system
processFileInplace(testfile) { text ->
    text.replaceAll(/(?m)DATE/, new Date().toString())
}
//----------------------------------------------------------------------------------

Locking a File

//----------------------------------------------------------------------------------
// You need to use Java's Channel class to acquire locks. The exact
// nature of the lock is somewhat dependent on the operating system.
def processFileWithLock(file, processStream) {
    def random = new RandomAccessFile(file, "rw")
    def lock = random.channel.lock() // acquire exclusive lock
    processStream(random)
    lock.release()
    random.close()
}

// Instead of an exclusive lock you can acquire a shared lock.

// Also, you can acquire a lock for a region of a file by specifying
// start and end positions of the region when acquiring the lock.

// For non-blocking functionality, use tryLock() instead of lock().
def processFileWithTryLock(file, processStream) {
    random = new RandomAccessFile(file, "rw")
    channel = random.channel
    def MAX_ATTEMPTS = 30
    for (i in 0..<MAX_ATTEMPTS) {
        lock = channel.tryLock()
        if (lock != null) break
        println 'Could not get lock, pausing ...'
        Thread.sleep(500) // 500 millis = 0.5 secs
    }
    if (lock == null) {
        println 'Unable to acquire lock, aborting ...'
    } else {
        processStream(random)
        lock.release()
    }
    random.close()
}


// non-blocking multithreaded example: print first line while holding lock
Thread.start{
    processFileWithLock(testfile) { source ->
        println 'First reader: ' + source.readLine().toUpperCase()
        Thread.sleep(2000) // 2000 millis = 2 secs
    }
}
processFileWithTryLock(testfile) { source ->
    println 'Second reader: ' + source.readLine().toUpperCase()
}
// =>
// Could not get lock, pausing ...
// First reader: WAS LOWERCASE
// Could not get lock, pausing ...
// Could not get lock, pausing ...
// Could not get lock, pausing ...
// Could not get lock, pausing ...
// Second reader: WAS LOWERCASE
//----------------------------------------------------------------------------------

Flushing Output

//----------------------------------------------------------------------------------
// In Java, input and output streams have a flush() method and file channels
// have a force() method (applicable also to memory-mapped files). When creating
// PrintWriters and // PrintStreams, an autoFlush option can be provided.
// From a FileInput or Output Stream you can ask for the FileDescriptor
// which has a sync() method - but you wouldn't you'd just use flush().

inputStream = testfile.newInputStream()    // returns a buffered input stream
autoFlush = true
printStream = new PrintStream(outStream, autoFlush)
printWriter = new PrintWriter(outStream, autoFlush)
//----------------------------------------------------------------------------------

Reading from Many Filehandles Without Blocking

//----------------------------------------------------------------------------------
// See the comments in 7.14 about scenarios where non-blocking can be
// avoided. Also see 7.14 regarding basic information about channels.
// An advanced feature of the java.nio.channels package is supported
// by the Selector and SelectableChannel classes. These allow efficient
// server multiplexing amongst responses from a number of potential sources.
// Under the covers, it allows mapping to native operating system features
// supporting such multiplexing or using a pool of worker processing threads
// much smaller in size than the total available connections.
//
// The general pattern for using selectors is:
//
//      while (true) {
//         selector.select()
//         def it = selector.selectedKeys().iterator()
//         while (it.hasNext()) {
//            handleKey(it++)
//            it.remove()
//         }
//      }
//----------------------------------------------------------------------------------

Doing Non-Blocking I/O

//----------------------------------------------------------------------------------
// Groovy has no special support for this apart from making it easier to
// create threads (see note at end); it relies on Java's features here.

// InputStreams in Java/Groovy block if input is not yet available.
// This is not normally an issue, because if you have a potential blocking
// operation, e.g. save a large file, you normally just create a thread
 // and save it in the background.

// Channels are one way to do non-blocking stream-based IO.
// Classes which implement the AbstractSelectableChannel interface provide
// a configureBlocking(boolean) method as well as an isBlocking() method.
// When processing a non-blocking stream, you need to process incoming
// information based on the number of bytes read returned by the various
// read methods. For non-blocking, this can be 0 bytes even if you pass
// a fixed size byte[] buffer to the read method. Non-blocking IO is typically
// not used with Files but more normally with network streams though they
// can when Pipes (couple sink and source channels) are involved where
// one side of the pipe is a file.
//----------------------------------------------------------------------------------

Determining the Number of Bytes to Read

//----------------------------------------------------------------------------------
// Groovy uses Java's features here.
// For both blocking and non-blocking reads, the read operation returns the number
// of bytes read. In blocking operations, this normally corresponds to the number
// of bytes requested (typically the size of some buffer) but can have a smaller
// value at the end of a stream. Java also makes no guarantees about whether
// other streams in general will return bytes as they become available under
// certain circumstances (rather than blocking until the entire buffer is filled.
// In non-blocking operations, the number of bytes returned will typically be
// the number of bytes available (up to some maximum buffer or requested size).
//----------------------------------------------------------------------------------

Storing Filehandles in Variables

//----------------------------------------------------------------------------------
// This just works in Java and Groovy as per the previous examples.
//----------------------------------------------------------------------------------

Caching Open Output Filehandles

//----------------------------------------------------------------------------------
// Groovy uses Java's features here.
// More work has been done in the Java on object caching than file caching
// with several open source and commercial offerings in that area. File caches
// are also available, for one, see:
// http://portals.apache.org/jetspeed-1/apidocs/org/apache/jetspeed/cache/FileCache.html
//----------------------------------------------------------------------------------

Printing to Many Filehandles Simultaneously

//----------------------------------------------------------------------------------
// The general pattern is: streams.each{ stream -> stream.println 'item to print' }
// See the MultiStream example in 13.5 for a coded example.
//----------------------------------------------------------------------------------

Opening and Closing File Descriptors by Number

//----------------------------------------------------------------------------------
// You wouldn't normally be dealing with FileDescriptors. In case were you have
// one you would normally walk through all known FileStreams asking each for
// it's FileDescriptor until you found one that matched. You would then close
// that stream.
//----------------------------------------------------------------------------------

Copying Filehandles

//----------------------------------------------------------------------------------
// There are several concepts here. At the object level, any two object references
// can point to the same object. Any changes made by one of these will be visible
// in the 'alias'. You can also have multiple stream, reader, writer or channel objects
// referencing the same resource. Depending on the kind of resource, any potential
// locks, the operations being requested and the behaviour of third-party programs,
// the result of trying to perform such concurrent operations may not always be
// deterministic. There are strategies for coping with such scenarious but the
// best bet is to avoid the issue.

// For the scenario given, copying file handles, that corresponds most closely
// with cloning streams. The best bet is to just use individual stream objects
// both created from the same file. If you are attempting to do write operations,
// then you should consider using locks.
//----------------------------------------------------------------------------------

Program: netlock

//----------------------------------------------------------------------------------
// locking is built in to Java (since 1.4), so should not be missing
//----------------------------------------------------------------------------------

Program: lockarea

//----------------------------------------------------------------------------------
// Java locking supports locking just regions of files.
//----------------------------------------------------------------------------------