2011-06-05 19:47:05 Help with RSS problems
John Cook

john@skepticalscience...
121.222.9.229

Our RSS feed goes dodgy intermittently. I've been getting regular emails like this:

Some fraction of your recent posts that go out over RSS seem to be malformed, so that I see the raw html formatting. This is in NetNewsWire for the Mac, and has been going on for a few weeks. For example, in Part 2B of the "Average Anomalies" series, here's what I see:

<p style="text-align: justify;">In <a href="http://www.skepticalscience.com/OfAveragesAndAnomalies_pt_1A.html">Part 1A</a> and <a href="http://www.skepticalscience.com/OfAveragesAndAnomalies_pt_1B.html">Part 1B</a> we looked at how surface temperature trends are calculated, the importance of using Temperature Anomalies as your starting point before doing any averaging, and why this can make our temperature record more robust.</p> <p style="text-align: justify;">In <a href="http://www.skepticalscience.com/OfAveragesAndAnomalies_pt_2A.html">Part 2A</a> and in this Part 2B we will look at a number of the claims made about &lsquo;problems&rsquo; in the record, and how misperceptions about how the record is calculated can lead us to think that it is more fragile than it actually is.

It seems that any unusual character in a blog post title, such as an &, throws the whole feed out. Any tips from our programming boffins on how to avoid this happening?
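One way to confirm that a single bad title is breaking the feed is to run the feed text through a strict XML parser, which rejects a bare & and reports where it found it. The sketch below is in Python for illustration (the site itself runs PHP); the function name `feed_parse_error` and the sample feed strings are hypothetical, and it assumes you already have the feed XML as a string, for example saved from the feed URL.

```python
import xml.etree.ElementTree as ET


def feed_parse_error(feed_xml: str):
    """Return None if feed_xml is well-formed XML, else a description of the error."""
    try:
        ET.fromstring(feed_xml)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the offending spot
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None


# Hypothetical minimal feeds: the second has a bare '&' in an item title.
good = "<rss><channel><item><title>Tom &amp; Jerry</title></item></channel></rss>"
bad = "<rss><channel><item><title>Tom & Jerry</title></item></channel></rss>"

print(feed_parse_error(good))  # None
print(feed_parse_error(bad))   # prints the parser's "not well-formed" message
```

Feed readers differ in how forgiving they are, so a feed that one reader displays oddly may be rejected outright by another; a strict parser like this gives a consistent yes/no answer.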

2011-06-10 07:23:14
oslo

borchinfolab@gmail...
90.149.33.182

The & character causes trouble in many contexts, and specifically in XML.

Translating it to &amp; might do the trick: RSS must be well-formed XML, and a lone & is not allowed there. (Sorry for the slow reply.)
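A blind replace of every & with &amp; would double-escape anything that is already an entity (&amp; would become &amp;amp;). A safer variant, sketched here in Python with a hypothetical function name, escapes only the ampersands that do not already start one of XML's five predefined entities or a numeric character reference.

```python
import re

# Matches an '&' that does NOT already start one of XML's five predefined
# entities (&amp; &lt; &gt; &quot; &apos;) or a numeric character reference.
_BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos);|#[0-9]+;|#x[0-9a-fA-F]+;)")


def escape_bare_ampersands(text: str) -> str:
    """Replace each bare '&' with '&amp;', leaving existing XML entities alone."""
    return _BARE_AMP.sub("&amp;", text)


print(escape_bare_ampersands("Tom & Jerry"))      # Tom &amp; Jerry
print(escape_bare_ampersands("Tom &amp; Jerry"))  # Tom &amp; Jerry
print(escape_bare_ampersands("&#8217;"))          # &#8217;
```

Named HTML entities such as &lsquo; are not predefined in XML, so this sketch escapes their leading & as well; converting those entities to real characters is a separate step.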

2011-06-10 11:12:56 More generic replace
John Cook

john@skepticalscience...
121.222.9.229

Is there some PHP command that could remove all non-compliant characters in one fell swoop, rather than handling individual characters one at a time?

2011-06-10 18:28:29
oslo

borchinfolab@gmail...
90.149.33.182

Actually, there are only three characters to translate (XML also predefines &quot; and &apos;, but those are only needed inside attribute values):

& translated to &amp;
< translated to &lt;
> translated to &gt;
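The three translations can be sketched as follows, in Python for illustration (the function name `xml_escape` is hypothetical). The order matters: & has to be replaced first, or the & inside a freshly inserted &lt; would be escaped again. Python's standard `xml.sax.saxutils.escape()` performs the same three replacements, and PHP's `htmlspecialchars()` does a similar job (it can also escape quote characters, depending on its flags).

```python
from xml.sax.saxutils import escape


def xml_escape(text: str) -> str:
    """Translate &, < and > into their XML entities."""
    # '&' must be handled first; otherwise the '&' inside the newly
    # inserted '&lt;' and '&gt;' would be escaped a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text


title = "Temperature <Anomalies> & Averages"
print(xml_escape(title))  # Temperature &lt;Anomalies&gt; &amp; Averages

# The standard library's escape() gives the same result.
assert xml_escape(title) == escape(title)
```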

No other HTML entities are allowed, so you should probably run html_entity_decode() first (in case the editor creates entities such as &lsquo;). Note that htmlspecialchars_decode() is not enough here, because it only converts &amp;, &lt;, &gt; and the quote entities. Then translate each of the three characters to its entity. Translating < and > is a bit tricky if the content contains HTML (you can probably skip translating them altogether). In that case the HTML itself must be error-free, so if the editor does not produce clean HTML, you have a problem. The text should also be in the UTF-8 character set, but as far as I can tell it is.
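The decode-then-escape approach described above can be sketched in Python as follows, assuming it is applied to plain-text fields such as post titles (the function name `clean_for_rss` is hypothetical). `html.unescape()` plays roughly the role of PHP's `html_entity_decode()`, and `xml.sax.saxutils.escape()` that of `htmlspecialchars()`. Because existing entities are decoded first, running it on text that is already escaped does not double-escape it.

```python
import html
from xml.sax.saxutils import escape


def clean_for_rss(text: str) -> str:
    """Decode all HTML entities, then re-escape only &, < and > for XML."""
    # Step 1: turn every HTML entity (&lsquo;, &amp;, &#8217;, ...) into the
    # actual character, so no HTML-only entity names reach the XML.
    decoded = html.unescape(text)
    # Step 2: escape the characters XML requires in element text.
    return escape(decoded)


raw_title = "&lsquo;Problems&rsquo; in the record & how to fix them"
print(clean_for_rss(raw_title))  # ‘Problems’ in the record &amp; how to fix them

# When writing the feed, encode the result as UTF-8 and declare it, e.g.
# <?xml version="1.0" encoding="UTF-8"?>, so characters like ‘ survive intact.
feed_bytes = clean_for_rss(raw_title).encode("utf-8")
```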