Regular expressions

vladosubu

Dec 1, 2022
Good afternoon! After we started specifying domains in the blacklist as regular expressions, warnings like this began appearing in the logs:

Oct 11 11:37:08 mx3.domain.com pmg-smtp-filter[312196]: WARNING: ^* matches null string many times in regex; marked by <-- HERE in m/^* <-- HERE .cle.co$/ at /usr/share/perl5/PMG/RuleDB/WhoRegex.pm line 103.

For example, we want to block example.com and all of its subdomains.

We specified the regular expression *.example.com

Apparently, that is what causes these warnings in the logs. What is the correct way to write it in our case?

.*example.com

Or

.*\.example\.com

Or something else? What's the right thing to do?
 
Aah, I see now, thanks.
Another question:
Is there a way to automatically convert all the domains in the blacklist to regular expressions?
 
Hey vladosubu,

Just wanted to jump in and add a bit more detail to Stoiko's correct answer, as this is a really common point of confusion in regex that can lead to security loopholes if you're not careful.

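The original warning you got (^* matches null string many times) appeared because * is a quantifier meaning "zero or more of the preceding item". As the log shows (m/^*.cle.co$/), PMG wraps your pattern in ^...$, so *.example.com becomes ^*.example.com$. The * then follows the ^ anchor, which matches a position rather than a character, so there is nothing for it to repeat and the regex engine complains.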
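To see the anchoring problem outside PMG, here is a minimal Python sketch. It assumes, based on the m/^*.cle.co$/ in the log, that PMG wraps each pattern in ^...$; check_pattern is a hypothetical helper, not part of PMG. Python's re module is stricter than Perl here: where Perl only logs a warning, Python refuses to compile the pattern at all.

```python
import re


def check_pattern(user_pattern: str) -> str:
    """Wrap a pattern in ^...$ (as the PMG log suggests) and try to compile it."""
    anchored = f"^{user_pattern}$"
    try:
        re.compile(anchored)
    except re.error as exc:
        # The * directly follows the zero-width ^ anchor, so there is nothing to repeat
        return f"{anchored}: rejected ({exc.msg})"
    return f"{anchored}: ok"


for pattern in ["*.example.com", r".*\.example\.com"]:
    print(check_pattern(pattern))
```

The first pattern is rejected with "nothing to repeat", while the escaped pattern compiles normally.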

Now, regarding the correct pattern for blocking example.com and all its subdomains (like mail.example.com, test.mail.example.com, etc.):

The pattern .*\.example\.com is the safer of the two options you listed.

Here's a breakdown of why:

  • .* : Matches any character (.) zero or more times (*). This covers whatever comes before the domain, such as mail in mail.example.com or test.mail in test.mail.example.com. It does not make the following \. optional, so the bare domain example.com on its own is not matched by this pattern.
  • \.: This is the most critical part. In regex, an unescaped dot . is a wildcard that means "match any single character". Putting a backslash \ before it "escapes" it, which tells the regex engine to treat it as a literal dot. Here it also forces a label boundary: with .*example.com, the .* would happily match the bad in badexample.com, so unrelated domains that merely end in "example.com" would be blocked too.
  • example: Matches the literal text "example".
  • \.: Another escaped dot. Without the escape, .*example.com would also match strings like example-com or exampleAcom, because the wildcard . would match the - or the A.
  • com: Matches the literal text "com".
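The difference between the patterns is easy to check against a few test addresses. This is a sketch in Python, whose re syntax agrees with Perl's for these simple constructs. Assumptions: the rule is applied to the full sender address and anchored at both ends (re.fullmatch), as the ^...$ in the log suggests; the addresses are made up; and the third pattern, .*[@.]example\.com, is a hypothetical variant (not from this thread) that also accepts the bare domain by allowing either @ or . right before example.

```python
import re

# Made-up test addresses
ADDRESSES = [
    "user@example.com",
    "user@mail.example.com",
    "user@test.mail.example.com",
    "user@exampleAcom",
    "user@badexample.com",
]

PATTERNS = {
    "unescaped": r".*example.com",
    "escaped": r".*\.example\.com",
    "variant": r".*[@.]example\.com",  # hypothetical: also covers the bare domain
}


def matches(pattern: str, address: str) -> bool:
    # fullmatch mimics the assumed ^...$ anchoring
    return re.fullmatch(pattern, address) is not None


results = {
    name: [addr for addr in ADDRESSES if matches(pattern, addr)]
    for name, pattern in PATTERNS.items()
}

for name, hits in results.items():
    print(f"{name:10} {PATTERNS[name]:22} {hits}")
```

With these inputs, the unescaped pattern matches every address, including badexample.com and exampleAcom; the escaped one matches only the two subdomain addresses; and the variant additionally matches user@example.com.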
It can be really hard to see these subtle differences just by looking at the text. I've found that visualizing the expression is the best way to be sure. I've started using a tool called RegexVision which builds an "expression tree" that shows you what each component does.

Link: https://flura.top/reg

If you go there and compare .*example.com and .*\.example\.com, you'll see how the first one incorrectly treats the dot as a wildcard, while the second one correctly treats it as a literal period. It's super helpful for building and validating rules like this before putting them into production.
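If you do convert a plain-domain blacklist into regex entries, the pattern text can also be generated and checked with a short script before it goes into production. A rough Python sketch follows; domain_to_regex and validate are hypothetical helpers, the .*[@.] prefix assumes the rule is matched against the full address with ^...$ anchoring, and loading the result into PMG is not covered. re.escape adds the backslashes, so no dot is accidentally left as a wildcard.

```python
import re


def domain_to_regex(domain: str) -> str:
    """Hypothetical helper: pattern for an address at `domain` or any subdomain.

    Assumes the rule is applied to the whole address with ^...$ anchoring.
    """
    return r".*[@.]" + re.escape(domain.strip().lower())


def validate(pattern: str, should_match: list[str], should_not_match: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the pattern behaved as expected."""
    problems = []
    for addr in should_match:
        if re.fullmatch(pattern, addr) is None:
            problems.append(f"missed {addr}")
    for addr in should_not_match:
        if re.fullmatch(pattern, addr) is not None:
            problems.append(f"wrongly matched {addr}")
    return problems


# Made-up blacklist entries for illustration
blacklist = ["example.com", "spam.example.org"]
for domain in blacklist:
    pattern = domain_to_regex(domain)
    good = [f"user@{domain}", f"user@mail.{domain}"]
    bad = [f"user@bad{domain}", f"user@{domain}.evil.net"]
    print(pattern, validate(pattern, good, bad) or "ok")
```

Each generated pattern is printed together with any mismatches, so a rule that misses a subdomain or catches a lookalike domain shows up before it is deployed.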