【Problem】
In Python, while working with regular expressions, I used the following code:
#http://autoexplosion.com/cars/buy/150594.php
foundMainType = re.search("http://autoexplosion\.com/(?<mainType>\w+)/buy/(?<adId>\d+)\.php", itemLink);
It failed with the following error:
Traceback (most recent call last):
  File "E:\Dev_Root\freelance\Elance\projects\40377988_data_mining\40377988_data_mining\40377988_data_mining.py", line 380, in <module>
    main();
  File "E:\Dev_Root\freelance\Elance\projects\40377988_data_mining\40377988_data_mining\40377988_data_mining.py", line 304, in main
    itemInfoDict = processEachItem(itemLink);
  File "E:\Dev_Root\freelance\Elance\projects\40377988_data_mining\40377988_data_mining\40377988_data_mining.py", line 183, in processEachItem
    foundMainType = re.search("http://autoexplosion\.com/(?<mainType>\w+)/buy/(?<adId>\d+)\.php", itemLink);
  File "E:\dev_install_root\Python27\lib\re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "E:\dev_install_root\Python27\lib\re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: syntax error
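The traceback ends inside _compile, which suggests the pattern itself is rejected before any input is examined. A minimal sketch to reproduce this without the rest of the script, assuming the error is catchable as re.error (the exact message text differs between Python versions, so it is only stored here, not compared):

```python
import re

# The pattern exactly as used in the failing call, written as a raw string.
bad_pattern = r"http://autoexplosion\.com/(?<mainType>\w+)/buy/(?<adId>\d+)\.php"

try:
    re.compile(bad_pattern)
    compiled_ok = True
    message = None
except re.error as exc:
    # The pattern is rejected at compile time; no input string is involved.
    compiled_ok = False
    message = str(exc)
```

If compiled_ok ends up False, the problem is in the pattern syntax and not in itemLink.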
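【Solution】
1. I debugged for a long time and still could not find the cause of the error.
2. Later I read the syntax documentation for the re module and found that a named group is written as: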
(?P<name>...)
Similar to regular parentheses, but the substring matched by the group is accessible within the rest of the regular expression via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named. So the group named id in the example below can also be referenced as the numbered group 1.
For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in the regular expression itself (using (?P=id)) and replacement text given to .sub() (using \g<id>).
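The three ways of referring to a named group described in the quoted documentation can be sketched as follows. The sample strings and variable names are made up for illustration only:

```python
import re

# Pattern from the quoted documentation: a group named "id" matching an identifier.
pattern = re.compile(r"(?P<id>[a-zA-Z_]\w*)")

m = pattern.search("  foo_bar = 1")
name_by_label = m.group("id")  # access the group by name
name_by_index = m.group(1)     # the same group is also numbered group 1
end_pos = m.end("id")          # end offset of the named group in the input

# (?P=id) is a backreference by name: it matches the same text the group matched.
doubled = re.search(r"(?P<id>[a-zA-Z_]\w*) (?P=id)", "say hello hello world")

# \g<id> refers to the named group inside replacement text passed to sub().
wrapped = pattern.sub(r"<\g<id>>", "a b")
```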
That is, the correct syntax is:
(?P<xxx>…)
and not:
(?<xxx>…)
So I changed the code to:
#http://autoexplosion.com/cars/buy/150594.php
foundMainType = re.search("http://autoexplosion\.com/(?P<mainType>\w+)/buy/(?P<adId>\d+)\.php", itemLink);
and then it worked.
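For reference, a self-contained sketch of the corrected search. The helper name parse_item_link and the constant ITEM_LINK_RE are hypothetical and not part of the original script. The pattern is written as a raw string so that \. \w and \d reach the regex engine unchanged:

```python
import re

# Corrected pattern using the Python named-group syntax (?P<name>...).
ITEM_LINK_RE = re.compile(
    r"http://autoexplosion\.com/(?P<mainType>\w+)/buy/(?P<adId>\d+)\.php"
)

def parse_item_link(item_link):
    """Return (mainType, adId) from an item link, or None if it does not match."""
    found = ITEM_LINK_RE.search(item_link)
    if found is None:
        return None
    return found.group("mainType"), found.group("adId")

# Sample link taken from the comment in the original code.
result = parse_item_link("http://autoexplosion.com/cars/buy/150594.php")
```

Checking for None before calling group() avoids an AttributeError when a link does not follow the expected layout.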
Note: in C# regular expressions, a named group is written as (?<xxx>…)
This post was moved to this board by koei123 on 2015-6-1 14:53:31