Data Mining UrlScan 3.0 Logs using LogParser 2.2

We released a new version of UrlScan recently, and one of the great new features in this version is log files that conform to the W3C Extended Log File Format. What this means to administrators is that they can now parse their UrlScan activity using almost any common log utilities, including Microsoft's LogParser 2.2 utility. For anyone that hasn't heard of LogParser, this is a freeware utility from Microsoft that allows you to write SQL-style queries to extract useful information. Eventually I'd like for the following information to show up on the http://learn.iis.net/ web site, but for now I'd like everyone to at least have access to the information.


Getting Started

Downloading and Installing UrlScan and LogParser

The download locations for UrlScan are listed in the "Getting UrlScan" section of the following topic:

http://learn.iis.net/page.aspx/473/

Note: There are separate downloads for 32-bit and 64-bit Windows.

LogParser is available from the following URL:

http://www.iis.net/go/1287

Note: It's a 32-bit application, but you can use it on a 64-bit Windows system.

While it's not essential, copying Logparser.exe to a folder in your PATH will allow you to run LogParser from any folder on your system.

UrlScan Log Files

Locating your UrlScan Log Files

By default UrlScan installs to %WinDir%\System32\Inetsrv\urlscan, and this is where your Urlscan.ini file should be located. (If you installed to a different location, you would need to locate your installation by searching for your Urlscan.ini file.) In the [Options] section of the Urlscan.ini file you should see an entry for LoggingDirectory that is set "Logs" by default, which means that your default directory for UrlScan log files is %WinDir%\System32\Inetsrv\urlscan\Logs.

Log File Names

UrlScan log files are named using the following syntax:

urlscan.mmddyy.log

Where:

mm = Month
dd = Day
yy = Year

Log File Fields

The following table lists the fields that are available:

Field Description
Date Request date in YYYY-MM-DD format
Time Request time in UTC time
c-ip IP address of the client that made the request
s-siteid ID of the web site that received the request
cs-method HTTP method for the request
cs-uri URL for the request
x-action Action taken; for example:
- Logged and allowed
- Rejected
x-reason Reason for action; for example:
- a request header was too long
- content length too long
- disallowed header detected
- disallowed query string sequence detected
- disallowed url sequence detected
- dot in path detected
- extension not allowed
- failed urlscan rules
- high bit character detected
- query string too long
- second pass normalization failure
- url too long
- verb not allowed
x-context Part of request that triggered the action; for example:
- Content-Length
- file extension
- HTTP_ALL_RAW
- HTTP method
- query string
- request headers
- URL
cs-data The request data that triggered the action.
x-control Config control data that caused the trigger.  For example, if a request was rejected because of a deny string from a UrlScan rule, the specific deny string will be listed.

Querying Data

Simple Queries

Selecting Everything

The simplest (and least useful) query that you can issue is the following, which is listed here just to get us started:

logparser.exe "SELECT * FROM urlscan.*.log" -i:w3c

Let's analyze what this query does:

  1. Selects all fields ("SELECT *") from all records in all log files ("FROM urlscan.*.log")
  2. Specifies the log file input type as "W3C Format" ("-i:w3c")

Note: Specifying the input file format as "W3C Format" is essential - without that step you would not be able to parse your UrlScan logs.

Selecting Specific Fields to Query

You can restrict the amount of data that you see by specifying which fields to return:

logparser.exe "SELECT date,time,c-ip,x-action FROM urlscan.*.log" -i:w3c

Here's what this query does:

  1. Selects only four specific fields ("SELECT date,time,c-ip,x-action") from all records in all log files ("FROM urlscan.*.log")
  2. Specifies the log file input type as "W3C Format" ("-i:w3c")

This allows you to greatly reduce the amount of information that you see in your results.

Exporting Logs into a Different Format

As mentioned earlier, selecting all of the records in all of your logs is probably not all that helpful, unless you were interested in parsing your UrlScan information using another application, like Microsoft Excel, Microsoft Access, SQL Server, etc. In which case, you might want to export all of your data into a comma-delimited file like the following example:

logparser.exe "SELECT * INTO UrlScan.csv FROM urlscan.*.log" -i:w3c -o:csv

Here's what this query does:

  1. Selects all fields ("SELECT *") from all records in all log files ("FROM urlscan.*.log")
  2. Inserts from all records into a comma-delimited file ("INTO UrlScan.csv")
  3. Specifies the input file type as "W3C Format" ("-i:w3c")
  4. Specifies the output file type as "CSV Format" ("-o:csv")

You could just as easily export into a tab-separated file using the following syntax:

logparser.exe "SELECT * INTO UrlScan.tsv FROM urlscan.*.log" -i:w3c -o:tsv

Filtering Queries

One of the great things about LogParser and SQL queries in general is that you can filter the amount of data that you see by specifying criteria using a WHERE clause. The following

Filtering based on IP Address

For example, to see all of the requests from a specific client IP address, you could use the following syntax:

logparser.exe "SELECT * FROM urlscan.*.log WHERE c-ip='192.168.1.1'" -i:w3c

Here's what this query does:

  1. Selects all fields ("SELECT *") from all records in all log files ("FROM urlscan.*.log")
  2. Filters the results based on a specific client IP address ("WHERE c-ip='192.168.1.1'")
  3. Specifies the input file type as "W3C Format" ("-i:w3c")

Filtering based on UrlScan Action

You can also filter based on UrlScan rejections using the following syntax:

logparser.exe "SELECT * FROM urlscan.*.log WHERE x-action='Rejected'" -i:w3c

Here's what this query does:

  1. Selects all fields ("SELECT *") from all records in all log files ("FROM urlscan.*.log")
  2. Filters the results where UrlScan rejected the request ("WHERE x-action='Rejected'")
  3. Specifies the input file type as "W3C Format" ("-i:w3c")

Filtering based on Uniqueness

You can filter the amount of data that you see by specifying which fields to return and grouping the data based on uniqueness. This is done by adding the DISTINCT clause as seen in the following syntax:

logparser "SELECT DISTINCT c-ip FROM urlscan.*.log" -i:w3c

Here's what this query does:

  1. Selects only the unique client IP addresses ("SELECT DISTINCT c-ip") from all records in all log files ("FROM urlscan.*.log")
  2. Specifies the input file type as "W3C Format" ("-i:w3c")

Grouping, Tabulating, and Sorting Queries

Counting Requests by Fields

Another great feature of SQL-style queries is the ability to tabulate data. The following example shows how retrieve a count of requests based on the client IP address:

logparser.exe "SELECT c-ip, COUNT(*) FROM urlscan.*.log GROUP BY c-ip" -i:w3c

Here's what this query does:

  1. Selects only the client IP address and request count ("SELECT c-ip, COUNT(*)") from all records in all log files ("FROM urlscan.*.log")
  2. Groups the request count based on the client IP address ("GROUP BY c-ip")
  3. Specifies the input file type as "W3C Format" ("-i:w3c")

Sorting Data by Fields

You can reorder your results into something more useful using the ORDER BY clause, as illustrated in the following example:

logparser.exe "SELECT x-action,x-reason,c-ip,cs-uri FROM urlscan.*.log ORDER BY x-action,x-reason" -i:w3c

Here's what this query does:

  1. Selects four specific fields ("SELECT x-action,x-reason,c-ip,cs-uri") from all records in all log files ("FROM urlscan.*.log")
  2. Sorts the data in ascending order based on the UrlScan action and reason for the action ("ORDER BY x-action,x-reason")
  3. Specifies the input file type as "W3C Format" ("-i:w3c")

Putting it All Together

Counting Requests by Client IP and Sorting Data by Request Count

Starting with the request count query from a previous example, you can rename the request count field to something more useful using the AS clause, as illustrated in the following example:

logparser.exe "SELECT c-ip, COUNT(*) AS c-request-count FROM urlscan.*.log GROUP BY c-ip" -i:w3c

This helps reorder the data based on the count, as shown in the following example:

logparser.exe "SELECT c-ip, COUNT(*) AS c-request-count FROM urlscan.*.log GROUP BY c-ip ORDER BY c-request-count DESC" -i:w3c

Here's what this query does:

  1. Selects only the client IP address and request count ("SELECT c-ip, COUNT(*)") from all records in all log files ("FROM urlscan.*.log")
  2. Groups the request count based on the client IP address ("GROUP BY c-ip")
  3. Sorts the data in descending order based on the request count ("ORDER BY c-request-count DESC")
  4. Specifies the input file type as "W3C Format" ("-i:w3c")

Counting Rejected Requests based on the Reason for Rejections

While somewhat similar to the previous example, the following syntax will filter the results based on rejections and count the number of times that each reason occurs:

logparser.exe "SELECT x-reason, COUNT(*) AS c-request-count FROM urlscan.*.log WHERE x-action='Rejected' GROUP BY x-reason ORDER BY c-request-count DESC" -i:w3c

Here's what this query does:

  1. Selects only the reason and request count ("SELECT x-reason, COUNT(*)") from all records in all log files ("FROM urlscan.*.log")
  2. Filters the results based on rejections ("WHERE x-action='Rejected'")
  3. Groups the request count based on the reason ("GROUP BY x-reason")
  4. Sorts the data in descending order based on the request count ("ORDER BY c-request-count DESC")
  5. Specifies the input file type as "W3C Format" ("-i:w3c")

Splitting Requests into Separate Log Files based on Site ID

UrlScan writes all log entries to a common log file, but you may want to see your data separated by site. The following example splits your UrlScan logs into separate files by site by creating unique log files that are named for each site ID. This example is a little more complex than previous examples, and is written as a batch file.

logparser.exe "SELECT DISTINCT s-siteid INTO siteids.tsv FROM urlscan.*.log" -i:W3C -o:tsv -headers:off

for /f %%a in (siteids.tsv) do (
   logparser.exe "SELECT * INTO W3SVC%%a.log FROM urlscan.*.log WHERE s-siteid='%%a' ORDER BY date,time" -i:W3C -o:w3c
)

Here's what this batch file does:

  1. Creates a tab-separated file that contains a list of unique site IDs
  2. Creates a loop to step through the tab-separated file containing the site IDs
  3. Selects all fields for the specific site ID and creates a new W3C format file for each site

No Comments