Regex find and replace phrase starting with X and ending with Y

I needed to remove many lines from my sitemap.xml file that Google Search Console did not crawl correctly (reported as soft 404s). The lines to remove were similar to the following:

These lines were scattered all over sitemap.xml mixed with other records in between.

The solution:
Using regex to find all matching patterns and remove them (replace with nothing).
The pattern used to find the lines:

* Yes, I know the use of “.” above is not entirely correct, but it works perfectly.

Note the use of two interesting regex matches:

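  • “?” – non-greedy (lazy) match – placed after a quantifier such as “*” or “+”, it makes the match stop at the closest occurrence of the ending pattern. Without it, the regex above would match from the first occurrence of the starting pattern all the way to the last occurrence of the ending pattern, swallowing everything in between
  • “\R” – matches a line break (any newline sequence, such as \n or \r\n) – I had a new line starting right after “<url>”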
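
To make the difference between the greedy and non-greedy versions concrete, here is a minimal Python sketch. It is not the pattern from this post: the sample sitemap, the example.com URLs and the "/old/" prefix are all made up for illustration. Python's standard `re` module does not support “\R”, so an explicit newline alternation stands in for it, and `re.DOTALL` is assumed so that “.” can also match line breaks.

```python
import re

# Hypothetical sitemap fragment; the URLs are invented for this example.
SITEMAP = (
    "<urlset>\n"
    "<url>\n<loc>https://example.com/keep-1</loc>\n</url>\n"
    "<url>\n<loc>https://example.com/old/a</loc>\n</url>\n"
    "<url>\n<loc>https://example.com/keep-2</loc>\n</url>\n"
    "<url>\n<loc>https://example.com/old/b</loc>\n</url>\n"
    "</urlset>\n"
)

# Stand-in for \R, which Python's re module does not provide.
NEWLINE = r"(?:\r\n|\n|\r)"

# Starting pattern: "<url>", a line break, then the unwanted URL prefix.
START = r"<url>" + NEWLINE + r"<loc>https://example\.com/old/"
# Ending pattern: the closing tag of the record.
END = r"</url>"

# ".*?" stops at the closest "</url>"; ".*" runs to the last one.
lazy = re.compile(START + r".*?" + END, re.DOTALL)
greedy = re.compile(START + r".*" + END, re.DOTALL)

# Replace every match with an empty string.
lazy_result = lazy.sub("", SITEMAP)
greedy_result = greedy.sub("", SITEMAP)

print(lazy_result)    # both "keep" records survive
print(greedy_result)  # "keep-2" is swallowed together with the unwanted records
```

Because the starting pattern is anchored to the “<url>” tag immediately before the unwanted “<loc>”, the lazy match cannot begin inside a record that should be kept.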

The replace string was empty, so all the lines I needed to remove were gone, but empty lines were left in my sitemap.xml file in their place. To solve this, I ran another regex find and replace, this time with the following pattern:

and nothing as the replace string. All empty lines gone.
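
A minimal Python sketch of the empty-line cleanup follows. The pattern here is an assumption chosen for illustration, not necessarily the one used in this post, and the helper name `remove_empty_lines` is hypothetical. It treats a line as empty if it contains only spaces or tabs and ends with \n or \r\n.

```python
import re

# With re.MULTILINE, "^" matches at the start of every line, so this
# matches whole lines that hold nothing but optional spaces or tabs.
EMPTY_LINE = re.compile(r"^[ \t]*\r?\n", re.MULTILINE)


def remove_empty_lines(text: str) -> str:
    """Return text with empty (or whitespace-only) lines removed."""
    return EMPTY_LINE.sub("", text)


sample = "<url>\n\n<loc>x</loc>\n   \n\n</url>\n"
cleaned = remove_empty_lines(sample)
print(cleaned)
```

A final line without a trailing line break is left untouched by this pattern, which is usually what you want for the last line of a file.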

Good luck with your replacement adventures!
