I am hoping to get maximum exposure for this problem, so I get a solution.
I have a large XML file with one corrupt line: line 35,185,222.
A very simple sed command prints the broken line, as a single line (this is important!):
sed -n '35185222p' infile.xml
Gives this as output:
<load address='11c1�����ze='08' />
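To see exactly which bytes are damaged, you can pipe the printed line through `od -c`. It shows the line one byte at a time and prints non-printable bytes as octal escapes. The sketch below uses a hypothetical small file, `sample.xml`, with two arbitrary invalid bytes (`\377\376`) standing in for the real corruption. The actual bytes in `infile.xml` are unknown.

```shell
#!/bin/sh
# Hypothetical stand-in for infile.xml: line 2 carries two bytes
# (\377\376) that are not valid UTF-8, mimicking the corrupt line.
printf "<ok/>\n<load address='11c1\377\376ze='08' />\n<ok/>\n" > sample.xml

# Print only line 2 and dump it byte by byte; the invalid bytes
# show up as the octal values 377 and 376.
sed -n '2p' sample.xml | od -c
```

On the real file, the same idea would be `sed -n '35185222p' infile.xml | od -c`.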
But if I change my sed script (sticking it in a file to avoid having to escape the quotes) like so:
35185222s@^.*$@<load address='11c1385b' size='08' />@p
sed -n -f seddy.sed infile.xml
The script fails to match, because sed seems to treat line 35185222 as ending at the first corrupt character.
I know this because a script like:
So how do I fix this?
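Here is a minimal, self-contained sketch that reproduces the mismatch on a small scale and tries two workarounds. The first is the `c` (change line) command mentioned in the update. The second is running sed in the C locale, where every byte counts as a character. It assumes GNU sed and a `C.UTF-8` locale. The sample file, its invalid bytes and line 2 are hypothetical stand-ins for `infile.xml` and line 35185222. In a UTF-8 locale, GNU sed's `.` only matches valid characters, so `^.*` stops at the first invalid byte and `$` cannot match there. That is the likely reason the substitution never fires.

```shell
#!/bin/sh
# Hypothetical stand-in for infile.xml: line 2 has invalid UTF-8 bytes.
printf "<ok/>\n<load address='11c1\377\376ze='08' />\n<ok/>\n" > sample.xml

# The corrected line from the question.
good="<load address='11c1385b' size='08' />"

# 1) Original approach in a UTF-8 locale: with GNU sed, '.' does not match
#    the invalid bytes, so '^.*$' typically fails and nothing is printed.
#    (If C.UTF-8 is not installed, sed falls back to C and this will match.)
LC_ALL=C.UTF-8 sed -n "2s@^.*\$@$good@p" sample.xml

# 2) Same substitution in the C locale: every byte is a character,
#    so '^.*$' matches the whole line and the replacement is printed.
LC_ALL=C sed -n "2s@^.*\$@$good@p" sample.xml

# 3) The 'c' command replaces line 2 outright without matching its
#    contents, so the locale does not matter here.
sed "2c\\
$good" sample.xml > fixed.xml
```

Step 3 writes nothing to the terminal. `fixed.xml` should contain the original lines 1 and 3 with the corrected line 2 between them.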
Update: I have been reading sed & awk but have only just got to the c command, which I could have used. But I was also interested in a real fix, and thanks to Hugh and others in the comments I now know that it comes from specifying the locale with