Discussion:
Using `all.SCORE' @ ~/News/all.SCORE [regex syntax]
Harry Putnam
2017-05-27 14:58:02 UTC
all.SCORE:

((mark -100)
("from"
("***@gmail" -101 nil r)
("sina\.com" -101 nil r)
("@aol\\.com" -101 nil r)
("@[0-9]+\\.com>" -101 nil r)
("***@gmail" -101 nil r)
("s[ea]l[el]\\|discount\\|free\\|wholesale\\|paypal" -101 nil r))
("subject"
("~~" -101 nil r)
("~~\\|>>>\\|\\[A-Z\\]\\{4\\}" -101 nil r)
("!!\\|free\\|discount\\|wholesale" -101 nil r)))

I've forgotten how that was generated, but I would like to hand-edit it.

You can see the term `free' in two places: in the last `from' element
and in the last `subject' element.

I want the `free' in the last `from' element to be more restrictive, as
it is hitting quite a few false positives due to network names that
contain free with a dot in various combinations: `free.', `.free' and
`.free.'

This is happening in groups with thousands and thousands of messages
so I don't want to get it wrong... not sure how to re-run it.

So something like (please ignore the elisions (`[...]')):

[...] |[^\.]free[^\.]\\|[...]

But does it need the double backslashes, like:

[...] |\\[^\.\\]free\\[^\.\\] [...]
       ^^    ^^     ^^    ^^
Will that even accomplish what I am after, i.e. to keep `free' in any
of the combinations `.free', `free.' or `.free.' from being down-scored?

Is there a handy way to test the regex?

Is there a handy way to rerun all those messages thru `all.SCORE'?
Ben Bacarisse
2017-05-29 10:34:07 UTC
Post by Harry Putnam
((mark -100)
("from"
("sina\.com" -101 nil r)
("s[ea]l[el]\\|discount\\|free\\|wholesale\\|paypal" -101 nil r))
("subject"
("~~" -101 nil r)
("~~\\|>>>\\|\\[A-Z\\]\\{4\\}" -101 nil r)
("!!\\|free\\|discount\\|wholesale" -101 nil r)))
<snip>
Post by Harry Putnam
I want the `free' in the last `from' element to be more restrictive, as
it is hitting quite a few false positives due to network names that
contain free with a dot in various combinations: `free.', `.free' and
`.free.'
This is happening in groups with thousands and thousands of messages
so I don't want to get it wrong... not sure how to re-run it.
[...] |[^\.]free[^\.]\\|[...]
It's simpler than you think because . does not need \ inside []s. All
you need to add is [^.] on either side.
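A quick way to convince yourself that `[^.]` on either side does what Harry wants is to run the pattern over a few sample strings. The sketch below uses Python's `re` module rather than Emacs, on the assumption that for this particular pattern the two syntaxes coincide. The sample strings are invented, and `re.IGNORECASE` is used because Gnus' lowercase `r` match type is, as far as I know, case-insensitive.

```python
import re

# Ben's suggested pattern. In Emacs syntax it is [^.]free[^.]; this simple
# pattern reads the same in Python's re syntax. IGNORECASE mimics the
# (assumed) case-insensitive behaviour of Gnus' lowercase "r" match type.
FREE_RE = re.compile(r"[^.]free[^.]", re.IGNORECASE)


def is_hit(text: str) -> bool:
    """Return True if the pattern matches anywhere in text (like string-match)."""
    return FREE_RE.search(text) is not None


# Invented sample strings covering the cases from the thread.
SAMPLES = [
    "Buy free stuff now",    # plain word: still matched
    "joe@free.fr",           # free.  : not matched
    "joe@mail.free",         # .free  : not matched
    "joe@mx.free.example",   # .free. : not matched
    "free offer",            # caveat: nothing before "free", so not matched
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(f"{sample!r:24} -> {is_hit(sample)}")
```

One caveat the last sample exposes: `[^.]free[^.]` needs some character on each side, so a `free` at the very start or end of the header is no longer matched. If that matters, an anchored alternation such as `\\(^\\|[^.]\\)free\\([^.]\\|$\\)` (shown in string form) should cover those cases. Treat it as untested against real headers.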
Post by Harry Putnam
[...] |\\[^\.\\]free\\[^\.\\] [...]
       ^^    ^^     ^^    ^^
Will that even accomplish what I am after, i.e. to keep `free' in any
of the combinations `.free', `free.' or `.free.' from being down-scored?
You've added backslashes only where they are not needed! There are two
things going on that require backslashes. First, some elements in a
regexp only mean what you want when preceded by \. So | is just | unless
you write \| to mean an alternative. But then the regexp is being put
into a string, and backslashes need to be doubled inside a string so
that they remain backslashes. So, if you did need [^\.] (you don't),
you'd have to write "... [^\\.] ..." in a string.
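Ben's rule is mechanical: get the regexp right first, then double every backslash when it goes into a string. That makes it easy to sketch as a function. `to_elisp_string` below is a hypothetical Python helper, not anything Gnus provides. It also escapes double quotes, which the Lisp reader would otherwise take as the end of the string.

```python
def to_elisp_string(regexp: str) -> str:
    """Wrap a working Emacs regexp in an Emacs Lisp string literal.

    Every backslash is doubled first so it survives the Lisp reader; double
    quotes are escaped afterwards so the added backslash is not doubled too.
    """
    escaped = regexp.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


if __name__ == "__main__":
    # The regexp as you would type it while testing it interactively:
    working = r"s[ea]l[el]\|discount\|[^.]free[^.]\|wholesale\|paypal"
    print(to_elisp_string(working))
```

For example, `[^\.]` comes out as `"[^\\.]"`, which is the string form Ben gives above.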
Post by Harry Putnam
Is there a handy way to test the regex?
I use highlight mode. Text matching your regexp gets highlighted in
real time. Remember, get the regexp working, then double every \ to put
it into a string.
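Interactive highlighting is the quick check. Because the groups are large, a complementary, non-interactive check may also help. The sketch below runs the old and the proposed `from` regexps over sample headers and flags any header whose result changes. It is written in Python, so the hypothetical helper `emacs_to_python` translates only the `\|` alternation used in these patterns and refuses anything else. The regexps are given as the regexp engine sees them, i.e. with the string-level doubling already removed. The headers are invented placeholders, and case-insensitivity is again an assumption about Gnus' `r` type.

```python
import re

# The "from" regexps as the regexp engine sees them (string doubling undone).
OLD = r"s[ea]l[el]\|discount\|free\|wholesale\|paypal"
NEW = r"s[ea]l[el]\|discount\|[^.]free[^.]\|wholesale\|paypal"


def emacs_to_python(regexp: str) -> str:
    """Translate the tiny subset of Emacs regexp syntax used above.

    Only \\| (alternation) is handled; anything else containing a
    backslash is rejected rather than silently mistranslated.
    """
    translated = regexp.replace(r"\|", "|")
    if "\\" in translated:
        raise ValueError("unsupported Emacs regexp construct: " + regexp)
    return translated


# Assumed: Gnus' lowercase "r" match type is case-insensitive.
old_re = re.compile(emacs_to_python(OLD), re.IGNORECASE)
new_re = re.compile(emacs_to_python(NEW), re.IGNORECASE)

# Hypothetical From headers; replace with real ones copied from the group.
HEADERS = [
    "Joe User <joe@free.fr>",
    "Ann <ann@mx.free.example>",
    "Spammer <deals@example.com> FREE offer",
    "Bulk <sales@example.net>",
]

if __name__ == "__main__":
    for h in HEADERS:
        old_hit = old_re.search(h) is not None
        new_hit = new_re.search(h) is not None
        flag = "  <- changed" if old_hit != new_hit else ""
        print(f"old={old_hit!s:5} new={new_hit!s:5} {h}{flag}")
```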
Post by Harry Putnam
Is there a handy way to rerun all those messages thru `all.SCORE'?
I think just exiting and re-entering the group does that, though I'm
sure there will be some more direct way.
--
Ben.
Harry Putnam
2017-05-30 00:51:47 UTC
Post by Ben Bacarisse
It's simpler than you think because . does not need \ inside []s. All
you need to add is [^.] on either side.
Thanks for the well-aimed tutorial... a great help.