The previous regular-expression-based approach sometimes failed to extract messages properly. Using an XML parser simplifies the code and fixes several messages that were not extracted properly, such as messages containing ", [] or {}.
This also fixes some problems when looking for message sources:
- archived web pages were sometimes used instead of published ones
- messages from gadgets implemented as page templates/OFS files were not extracted.
A few more unit tests for the scripts involved in this process are added.
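
As a rough illustration of why a parser copes with messages containing ", [] or {} where a pattern may not, here is a minimal sketch. It is not the code from this change: the `i18n:translate` markup, the sample template, the naive pattern and the `extract_messages` helper are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: messages are marked with the Zope i18n namespace.
I18N_NS = "http://xml.zope.org/namespaces/i18n"

# Hypothetical template fragment, not taken from the real sources.
SAMPLE = """<div xmlns:i18n="http://xml.zope.org/namespaces/i18n">
  <span i18n:translate="">Say "hello" to ${user}</span>
  <span i18n:translate="">Items [1] and {2}</span>
  <span>not translated</span>
</div>"""


def extract_messages(source):
    """Return the text of every element carrying an i18n:translate attribute."""
    root = ET.fromstring(source)
    attr = "{%s}translate" % I18N_NS
    return [
        element.text.strip()
        for element in root.iter()
        if attr in element.attrib and element.text
    ]


# Hypothetical naive pattern (NOT the one that was replaced): it only accepts
# word characters and spaces, so quotes, brackets and braces defeat it.
naive_messages = re.findall(r'i18n:translate="">([\w ]+)</', SAMPLE)

parsed_messages = extract_messages(SAMPLE)
```

With this sample, the naive pattern finds nothing, while the parser returns both marked messages with their special characters intact.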