Parsing multiple log files

For the first time, I've recently had cause to parse a large number of log files, looking for (and removing) fields and shipping the output into dynamically-named additional files.

The first challenge was to reduce the log files down to just the ones that contained relevant text. Usually I'd have done this with a simple 'grep' (perhaps piped to another grep to narrow further).

I discovered that the 'cmd' suite has a tool 'somewhat' similar to grep called 'findstr', and that I could search multiple files by using a 'for' loop:

> for /F "tokens=*" %A in (listoflogfiles.txt) do findstr /s %A * >>output_%A.txt
(To simplify my life I created a text file containing a line-by-line list of all the files I wanted to parse, via 'dir /b >>listoflogfiles.txt', which let me manually tweak the list.)
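For what it's worth, the intent of that loop (for each file named in a list, keep only the lines containing some search string and write them to a matching output_ file) is easy to sketch in Python. The function name and the "ERROR" pattern below are my own inventions for illustration, not anything from the batch version:

```python
# Loose Python sketch of the cmd loop above: for each filename in the
# list file, keep only lines containing `pattern` and write them to a
# new file prefixed with output_. Names here are illustrative.
from pathlib import Path

def filter_logs(list_file: str, pattern: str) -> None:
    for name in Path(list_file).read_text().splitlines():
        name = name.strip()
        if not name:          # skip blank lines in the list
            continue
        matches = [line for line in Path(name).read_text().splitlines()
                   if pattern in line]
        Path(f"output_{name}").write_text("\n".join(matches) + "\n")
```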

Of interest: 'tokens' specifies which delimited tokens from each line get assigned to variables, and 'tokens=*' grabs the whole line. There's an additional parameter called 'delims' that can be used to specify the delimiter (whitespace by default)... so this actually has some similarities to bash's cut.

Also of interest: findstr's /s switch makes it recursive (it searches all subdirectories).

So anyway. This gave me every line matching the string, but I wanted to limit the output to just the fields useful in my application. The way I've done this historically is with a simple 'while read line' loop in bash, together with something like 'cut -d " " -f 2,3,6- >>$line.txt' (using a space as the delimiter, show me only fields 2, 3, and everything from field 6 onwards, and append to a file named after the input line).
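If it helps to see that field selection outside of bash, here's a small Python sketch of what the cut invocation does; the helper name and sample line are mine. The one gotcha is that cut numbers fields from 1, so fields 2, 3, and 6- become zero-based indices 1, 2, and 5:

```python
# Rough Python equivalent of: cut -d " " -f 2,3,6-
# cut counts fields from 1, so fields 2, 3, and 6-onwards map to
# zero-based list indices 1, 2, and 5: after splitting.
def cut_fields(line: str, delim: str = " ") -> str:
    parts = line.split(delim)
    keep = parts[1:3] + parts[5:]   # fields 2, 3, then 6 onwards
    return delim.join(keep)

print(cut_fields("a b c d e f g h"))  # -> "b c f g h"
```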

At the time I had no way to ship my files onto a Linux machine, and although I'd figured out how to do the cutting with a similar for-loop in DOS (but not, most critically, how to feed said script multiple files automatically), I didn't want to run it dozens of times by hand, so it was recommended I give PowerShell a go. I'll admit I've largely stayed away from PS to date; I basically haven't had a need to use it before. So now I'm getting a crash course, it appears:

PS > foreach ($line in Get-Content .\listoffiles2.txt) { Get-Content $line | ForEach-Object { $_.split(" ")[1,2] -join ' ' } >>Completed_$line}

Having used my dir /b trick to get a new list of working files, this PS one-liner finally let me read in multiple files and, for each one, write a new file (named Completed_ followed by the original filename) containing only fields 1 and 2 of the original, stripping the first column (field 0) and any later columns.

Split is an interesting function. Without the -join addition, it puts each field on its own line. This wasn't very useful, and it took a while to find out how to pull the fields back together. Essentially you run the split and then rejoin only the fields specified (in this case 1 and 2, with a space as the delimiter).
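A Python rendering of the same split-then-rejoin step may make it clearer; the sample log line below is invented:

```python
# The PowerShell expression  $_.split(" ")[1,2] -join ' '  in Python:
# split on spaces, keep elements 1 and 2 (zero-based), rejoin with a space.
line = "2023-01-01 10:00:01 INFO something happened"   # made-up sample line
fields = line.split(" ")
result = " ".join(fields[1:3])   # elements 1 and 2
print(result)  # -> "10:00:01 INFO"
```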

I'm not a natural coder or scripter - most of what I know is pretty basic, and has been driven by necessity (task oriented). Without a serious grounding in any scripting or programming language, I find this sort of thing quite challenging, and I was pleased to get the outputs I wanted; I just wish it hadn't taken so much time to figure out. :| And I'm still not sure what I'd do if I wanted 'all the rest' of the fields, instead of just fields 1 and 2, of the parsed file.
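For the record, in Python 'all the rest' is just an open-ended slice; a quick sketch (sample line invented):

```python
# "All the rest of the fields" as an open-ended slice in Python:
line = "drop keep1 keep2 keep3 keep4"   # made-up sample line
fields = line.split(" ")
rest = " ".join(fields[1:])   # everything from field 1 onwards
print(rest)  # -> "keep1 keep2 keep3 keep4"
```

In PowerShell, as far as I can tell, array ranges need an explicit upper bound, so the equivalent would be something like $fields[1..($fields.Count-1)] -join ' '.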


