Skip to content

Automating Access Using Windows Script, Part 2

Access Archon #144

The following sample VBS program will show how to extract these values:

Option Explicit

 

Function FindZip (Str)

 

  Dim re, Matches

  Dim Zips(2)

 

  Set re = New RegExp

  re.Pattern = “((d{5})(-(d{4}))?)”

  Set Matches = re.Execute (Str)

  If Matches.Count = 0 Then

    FindZip = Empty

    Exit Function

  End If

 

  Zips(0) = Matches(0).SubMatches(0) ‘ ZipCode

  Zips(1) = Matches(0).SubMatches(1) ‘ Zip5

  Zips(2) = Matches(0).SubMatches(3) ‘ Zip4

 

  FindZip = Zips

 

  Set Matches = Nothing

  Set re = Nothing

 

End Function

 

Sub TestZip (Str)

 

  Dim Zip, Zips

 

  WScript.Echo “Testing string: ” & Str

 

  Zips = FindZip (Str)

 

  If IsEmpty (Zips) Then

    WScript.Echo “No zip code found”

  Else

    For Each Zip in Zips

      WScript.Echo Zip

    Next

  End If

 

  WScript.Echo

 

End Sub

 

TestZip (“Joe Cool, 123 Main Street, Brooklyn, NY 11223-0101”)

TestZip (“No zip here”)

 

The output from cscript will look like this:

Testing string: Joe Cool, 123 Main Street, Brooklyn, NY 11223-0101

11223-0101

11223

0101

 

Testing string: No zip here

No zip code found

Note from Helen:  Depending on what security software you have on your computer, you may get a warning message when running the script; I got one from Microsoft AntiSpyware:

Figure B.  Warning message when running a script

The MSDN VBS library has reference documentation for regular expressions, but I warn you now this documentation is not complete. For example, this documentation does not say that patterns are greedy, but they are. Because of the way MS documents objects, the documentation for Regular Expressions is not all in one area. A good starting point is the Regular Expression object documentation:

http://msdn.microsoft.com/library/en-us/script56/html/05f9ee2e-982f-4727-839e-b1b8ed696d0a.asp

Regular Expression patterns are meagerly documented as part of the Pattern property:

http://msdn.microsoft.com/library/en-us/script56/html/648fb4cf-2968-491c-b9de-51a7dad965f1.asp


Log Records

The log records were in the following format:

date time,ms [<>]seq rectype (tag=value [sep]…)

 

Each record had a fixed header section and a variable detail section. The header included date and time, a direction (< meant in, > meant out), a sequence number and a record type. The detail was a list of tags and values separated by a hex 01, and it was enclosed in parenthesis. Different records had different tags, but not all the tags needed to be extracted.

I created a Log table with columns for all the header fields, as well as columns for each tag that was needed. The tags were just numbers, but to make the querying easier, I made the column names for the tags meaningful. However, this required a map to convert the tags into column names.

To make this tag to column mapping most flexible, I created a KeyMap table. The code would read this KeyMap and use this to decide which tags to extract, and what column to place the value into.


Table Definitions

These are the tables I ended up with in the Access DB.















Monitor

The directories to monitor. Support UNC’s, file names with wildcards, and an Active/Inactive flag.

History

Remember each file that was processed. Track the full path name to the file, the last change date of the file, and when it was processed.

KeyMap

To map each tag (key) to a column name as it appears in the Log table.

Log

Container to hold the resulting log records.


 


FX_LOG Load.vbs – Walkthrough

This is the code that performs all the work.


Initialization

The name of the Access database is defined as a constant at the top of the code. This allows the DB name to be easily changed.

The first object instantiated is the File System Object (FSO). This object is actually part of the scripting environment provided by the Windows Scripting Host. It provides access to directory-type information (such as the list of files in a folder), as well as read/write access to individual files via a TextStream object. Consequently, this will be used extensively.

The code uses ActiveX Data Objects (ADO) to manipulate the Access tables. I prefer ADO to the older Data Access Objects (DAO) because I find ADO more natural. It also allows me to use other database systems (DB2, Oracle) using their native languages without forcing me into a Jet-mapped syntax, and without loading the Jet drivers. Note Access requires Jet, and that these other database systems are not used in this application, so for me, it was just a matter of preference.

ADO uses Recordsets to represent the result of a query (i.e. SQL select statement). My initialization code opens a connection to the Access database (using the Jet OLEDB driver – this is the ADO interface for Jet), then opens all the Recordsets it will need. Some Recordsets will need to be updated, so these are opened as Dynamic with Optimistic Locking. Why this is necessary is beyond the scope of this article, so just trust me.

The KeyMap needs to be searched multiple times with every record, so to improve performance, I load the table into a Scripting Dictionary object called Columns. Dictionaries are associate arrays, where each entry has a key and a value. Dictionaries are designed to be efficient when looking up keys.

I make extensive use of Regular Expressions (see sidebar) in this code. I instantiate all the Regular Expressions, and initialize the ones I can. I broke the log record into two Regular Expressions, one for the header, and one for the detail, for ease of use (the processing was simpler).

The pattern for detail portion is simple, but it might need some explanation. It just sucks out all the characters that are in front of the first equal sign as SubMatch(0), then all the characters after the equal sign until the hex 01 separator as SubMatch(1). It sets the Global property to True so it will find all occurrences of this pattern.

Another Dictionary is created, Data. This will hold the Key/Value pairs from each Log record when it is parsed.

The final initialization step is to open our log file. This log file is generated to help with any troubleshooting. The log file name is generated at run time from the script name. This is a generic technique I use so it can be copied to any program I need.


Main Processing Loop

Each record is read from the Monitor table. If the Active flag on the record is not True, the record is bypassed.

The specified folder is opened and the filename wildcard pattern is converted to a Regular Expression (see the sidebar for WildCards2RegExp). The complete list of files in the folder is “walked”. For each file, it’s compared to the filename pattern (via the Regular Expression) to see if it matches. If not, the file is bypassed.

If the filename matches the pattern, the History is searched to see if the file has already been processed. If the file is new or the WhenLoaded column is null, the file is processed (see ProcessFile).

After all the files in a folder have been processed, the next folder from the Monitor table is processed. Once all folders have been processed, the program ends.


Subroutine ProcessFile

The subroutine takes one parameter which is the log file to be processed. This is passed as an FSO File object.

The file is opened for input and all lines are read. For each line, the Parse routine is called to, you guessed it, parse the line.

After the entire file is processed, the current History record (as located in the main routine) is updated. As you can see, columns in the Recordset are referenced by indexing them by name, as in History(“FileDate”). The Recordset’s Update method performs the actual record update.


Subroutine Parse

The subroutine takes one parameter which is the string to be parsed. The parsing is done in two steps, both using Regular Expressions (RE). The first pattern extracts the fixed portion of the record (a.k.a. the header). Each value is stored in a discrete variable.

The result of the Execute (pattern match) is a Matches collection. This pattern is not global, so it will match only the first occurrence. So our resulting collection will have just a single Match. . So Set Match = FIXRcd1.Execute (Line)(0) sets Match to point to the first result of the pattern match.

This single Match object contains a SubMatches collection where each SubMatch object corresponds to each remembered (parenthesized) value. So each statement like Variable = Match.SubMatches(i) extracts one of the remembered values.

Note that the With Match statement allows me not to have to repeat the Match. qualification for every line. In addition to making the code easier to write, it actually improves performance since the basing address of the Match object is cached.

The second pattern extracts the repeating values from the detail section. Since this is variable, each key/value pair is stored in the Dictionary object called Data (original, I know).

The RemoveAll method clears the Data Dictionary. The detail pattern is a global pattern, so all matches will be returned. I loop through all the matches, and for each match, I extract the remembered key and value. Theses are stored in the Data Dictionary.

The Log file uses the Timestamp as a unique value. If the Timestamp exists, that record will be updated, otherwise a new record is created.

The header fields are all set in the record. Then all the key/value pairs from the Data object are used to update the corresponding columns in the record, and the record is updated.

The Columns Dictionary is used to look up the column name for the corresponding key. If a key is not found in the Columns Dictionary, that key is bypassed. If this happens, a record is written to our log so that possibly unidentified keys can be added.


Function WildCards2RegExp

The function takes a wildcard pattern, converts it to a Regular Expression and returns that as the value of the function. See the sidebar a complete explanation.


Putting It All Together

Extract the zip file to a new folder (I used C:FX_Log). The FX_Log DB as delivered monitors *.txt files in the current directory. I delivered a single source document, fx_2005_12_23.txt. Open the source document in Wordpad (Notepad won’t recognize the LF’s as line endings) and check out what the raw data looks like.

If you open the file in Notepad, you may see some garbage characters – they are not a problem. Those are the hex ’01’ characters that are the separators between the data elements in the detail portion of each record. See FIXRcd2.Pattern for how it’s parsed.

Now open a command prompt, CD to the new folder and run the script:

cscript “FX_Log Load.vbs”

Note from Helen:  I like to just double-click the .vbs file in a Windows Explorer pane – it works the same.  If you go the cmd route, don’t forget the quotes – I did, and I got an error because of the space in the file name.

Open the FX_Log DB in Access and open table Log. These are the Log records after they’ve been parsed. Open the History table to check out what files were processed. You can also check out the settings in the Monitor table and the KeyMap table.

Now open the “FX_Log Load.log” file in Notepad to see the log of what was processed. Notice how the indentation makes the log easier to follow.


Wrapping It Up

There are many complicated topics covered here. I’ve tried to simplify by moving the most complex items to their own sidebars. I hope this helped.

Once conquered, these items provide useful techniques that will aid you in all your future programming. In particular, ADO and Regular Expressions are used in most of the applications I develop. And you can’t beat the Windows Scripting Host for developing quick and not-so-dirty applications – with the ability to use any ActiveX or OLE Control on the machine, it ain’t your grandma’s Basic no more!


Supporting Files

The zip file containing this article, in Word format, plus the supporting file(s), may be downloaded from the Access Archon page of my Web site.  It is accarch144.zip, which is the last entry in the table of Access Archon columns for Access Watch.



















Document Name

Document Type

Place in

FX_Log.mdb

Access 2000 database (can also be used in higher versions of Access)

Wherever you want

FX_Log Load.vbs

Windows Script Host script

Wherever you want

fx_2005_12_23.txt

Sample source document

Wherever you want

 

About this author

Office 2024 - all you need to know. Facts & prices for the new Microsoft Office. Do you need it?

Microsoft Office upcoming support end date checklist.