Some RegEx stuff

[Image: Regular Expression NFA, by jl_2 via Flickr]

So, I was trying to match the contents of /proc/pid/stat, which gives a line that looks like this:

2321 (squid) S 1 2321 2321 0 -1 4202752 4841 0 0 0 530 577 0 0 20 0 1 0 24716 36311040 4417 18446744073709551615 1 1 0 0 0 0 0 4096 85571 18446744073709551615 0 0 17 0 0 0 1945 0 0

The 10th entry is the number of minor faults and the 12th entry is the number of major faults. As you can see here, Squid has had 4841 minor faults and 0 major faults since it was restarted when I changed my IP address.
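If you only want to check the field numbering, you don't strictly need a regex. The sketch below is a swapped-in alternative, not what I used. `stat_field()` is a hypothetical C helper that walks the line and returns the n-th whitespace-separated field, counting from 1. It assumes the command name in field 2 contains no spaces. That holds for squid, but not for every process.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the n-th (1-based) whitespace-separated field
 * of a /proc/<pid>/stat line into buf.
 * Returns 0 on success, or -1 if the line has fewer than n fields or the
 * field does not fit in buf.
 * Assumption: the command name (field 2) contains no spaces. */
static int stat_field(const char *line, int n, char *buf, size_t buflen)
{
    assert(n >= 1 && buf != NULL && buflen > 0);

    const char *p = line;
    for (int i = 1; ; i++) {
        p += strspn(p, " \t\n");            /* skip the separators */
        if (*p == '\0')
            return -1;                       /* ran out of fields */
        size_t len = strcspn(p, " \t\n");   /* length of the current field */
        if (i == n) {
            if (len >= buflen)
                return -1;                   /* field too long for buf */
            memcpy(buf, p, len);
            buf[len] = '\0';
            return 0;
        }
        p += len;                            /* move past this field */
    }
}
```

With the squid line above, `stat_field(line, 10, buf, sizeof buf)` leaves "4841" in `buf`, and field 12 gives "0".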

So a RegEx seemed the way to go, and my first attempt looked like this:

(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+\\s(\\S)+

But this did not work … well, it matched the line, but the captured groups held the wrong values. [Note: \s matches whitespace, \S matches non-whitespace.]

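I am sure you are all cleverer than me and saw the flaw straight away, but it took me some time to figure it out. In (\\S)+ the group holds a single character and the + repeats the group, so each captured submatch is only one character. POSIX reports the last repetition, so for 4841 you get the 1. What I needed to use was (\\S+). That puts the + inside the group, so the whole field is captured and not just one character.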
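Here is a minimal sketch of the difference. It assumes glibc's `regcomp()`/`regexec()` from `<regex.h>` with `REG_EXTENDED`, because I haven't said which flags I used. `first_group()`, `BUGGY` and `FIXED` are hypothetical names. `[^[:space:]]` is the portable POSIX spelling of `\S`, and glibc also accepts `\S` as a GNU extension.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: match pattern against subject and copy submatch 1
 * into buf. Returns 0 on success, or -1 on compile failure, no match, or
 * overflow. Assumes REG_EXTENDED syntax. */
static int first_group(const char *pattern, const char *subject,
                       char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    regex_t re;
    regmatch_t m[2];                  /* m[0] = whole match, m[1] = group 1 */

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, subject, 2, m, 0);
    regfree(&re);
    if (rc != 0 || m[1].rm_so == -1)
        return -1;

    size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len >= buflen)
        return -1;
    memcpy(buf, subject + m[1].rm_so, len);
    buf[len] = '\0';
    return 0;
}

/* Group wraps ONE character and + repeats the group:
 * only a single character ends up in the submatch. */
static const char *BUGGY = "([^[:space:]])+";

/* + is inside the group: the whole run of non-space characters is captured. */
static const char *FIXED = "([^[:space:]]+)";
```

Run against "4841 0", `BUGGY` captures a single character (per POSIX, the last repetition, so the 1). `FIXED` captures 4841.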

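And… after querying the poorly written GNU RegEx documentation, I found that the nmatch parameter to regexec() should be one bigger than the number of groups you expect to match. This is because pmatch[0] holds the span of the whole match and the groups start at pmatch[1]. In the above case that means 13.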
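Putting it together, here is a sketch of the corrected match with nmatch set to 13. Again this assumes glibc's `<regex.h>` and `REG_EXTENDED`. It uses POSIX bracket classes in place of `\S` and `\s` and adds a `^` anchor. `parse_faults()`, `NGROUPS` and `NMATCH` are hypothetical names. The pattern is built in a loop rather than typed out twelve times.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

#define NGROUPS 12               /* fields captured: 1..12 */
#define NMATCH  (NGROUPS + 1)    /* pmatch[0] holds the whole match */

/* Hypothetical helper: pull minor faults (field 10) and major faults
 * (field 12) out of a /proc/<pid>/stat line. Returns 0 on success, -1
 * otherwise. Assumes REG_EXTENDED and a command name without spaces. */
static int parse_faults(const char *line, long *minflt, long *majflt)
{
    assert(line != NULL && minflt != NULL && majflt != NULL);

    /* "^(\S+)" followed by 11 copies of "\s(\S+)", in POSIX classes */
    char pattern[512] = "^([^[:space:]]+)";
    for (int i = 1; i < NGROUPS; i++)
        strcat(pattern, "[[:space:]]([^[:space:]]+)");

    regex_t re;
    regmatch_t pmatch[NMATCH];
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, line, NMATCH, pmatch, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    /* Groups 10 and 12 live in pmatch[10] and pmatch[12]; strtol stops
     * at the following space, so the field need not be copied first. */
    *minflt = strtol(line + pmatch[10].rm_so, NULL, 10);
    *majflt = strtol(line + pmatch[12].rm_so, NULL, 10);
    return 0;
}
```

For the squid line above, this gives 4841 minor faults and 0 major faults. Passing NMATCH rather than 12 is the point: with only 12 slots, group 12 would have nowhere to go.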