Bulletproof Rich-content Filters

Tue, 18 Dec 2007 11:13:28 GMT
by pdp

It is true that here, at GNUCITIZEN, we try to look more on the offensive side of the things rather then the defensive side. I personally find that perfectly fine and ethical since you need people from both camps. Not, that we are the bad guys, (we are whitehats) but we primarily concentrate on how to break things. As such, we are part of the information security food chain. Some break, others fix. Some of us destroy, others build upon. And there is a lot of value in breaking things. More then you can imagine! It is a simple fact that you don't know how things work without first taking them apart.

Though, once in a while, we try to show how to fix problems by using what we have learned along the way. This is exactly what I am planning to do today. In the this post I will briefly introduce you to some of the concepts that have build up with the time, about how to allow rich user-supplied content such as HTML and still guarantee a bulletproof security. Keep in mind that although the proposed mechanism works perfectly fine, there is always a chance to screw up. In that case you should blame no one but yourself.

Before I continue I must say that I haven't pioneered the here discussed techniques. I don't know who did but what I know is that there are several tools (AntiSamy) that already implement them. Though, I will add my own twist to the overall concept.

Let's glance through the problem: "The fast world of Web2.0 moves towards what is known as _User-supplied Rich Content_, _Data in the Cloud_, etc. In this model the user supplies all the data in a rich way. The user is able to define HTML, execute client-side JavaScript and even execute server-side JavaScript. The problem is that in order for the user to do all of that in a secure manner, there must be an excellent understandings of what is malicious so that the bad input can be sanitized. Think about MySpace, for example. The users can define their own HTML and CSS for their home pages. Of course, MySpace needs to guarantee that the user-supplied HTML wont effect the owner or any other visitor arriving on the personal profile. This is a hard task which often involves a lot of sanitization, blacklistings, whitelistings, filters, etc, etc, etc. It is fair to say that the model is insecure by design. But is there something that we can do about it and will that something guarantee good enough security?"

The simple fact is that it is possible to lockdown a special cases like the one discussed above. Not only it is possible but also it can be made bulletproof and extremely reliable. One of the key problems that rich-content applications face today is that it is very hard to detect malicious input. This is due to the enormous amount of differences between client-side technologies. One type of expression may render in Firefox and at the same time fail in IE. Not only that, but the Web Apps' World is changing so drastically that it is not even feasible anymore to perform security checks based on whatever filtering mechanism you might have implemented.

Some security folks suggest a security model which I believe may work if all vendors start working together. We all know that this will never happen. The proposed model consists of several stages where first of all browsers and client-side technologies become compatible with each other and then a common sandboxing mechanism is invented where the data is clearly separated from the logic. It makes sense but as I said it wont work. So, what can we do on the server-side in order to improve the situation?

Your best chances is to find the secure/compromisable common dominator between client-side technologies and use that as a base. How do we do that? Let's take a look at HTML for example.

Stage1: The Intermediate Format

So in our case we have a scenario where we want to allow the user to upload rich content in a form of HTML but at the same time somehow to sandbox the content in such a way that no malicious activities can be performed. Due to the fact that HTML is based on SGML and both of them allow tags to overlap, i.e it is not XML, it is very hard to make use of regular expressions in order to secure the content. Our best bet is to convert the content to an intermediate format which can be examined.

At the first stage we get junk from the user, which we don't trust, and convert it to something that we can guarantee that is at least machine-readable. The best choice for this machine-readable format is XML, mainly because it is fairly easy to parse and analyse and there are more then enough tools that can deal with this problem. So, we receive the random text from the user and we convert it to XML. How do we convert junk into XML? This is kind of trivial. There are many implementations of the so called HTML Tidy engine. HTML Tidy receives a text and cleans it up to an extend where the result is well formatted XML or XHTML if you like.

Stage1 completed!

Stage2: The Compliance Checks

I bet a lot of your already know where this thing is going. But don't jump of your seats yet. You are probably thinking to parse the XML and do your crazy magic but if you go on this read you are already getting into trouble and probably reversing the good effect of the XML potion. Let's keep it simple. Instead of parsing the XML yourself, like the AntiSamy guys did, you should use tools that already does the same and are already been extensively tested for compatibility and security problems. For this type of purpose I would suggest to use XSD (XML Schema Definition).

In XSD we can define/whitelist the syntax that we think is fine for the user to use. For example, we can define that the user can supply a <p> tag, which may only contain whitespaces, text and <img> tags. On the other hand, each <img> tag may only have one attribute known as src= which is of a type URL. And the URL type may only contain strings that start with http:// or https://. There are a plenty of tools that will allow you to engineer the definition in a graphical way. It is really simple especially when you are using tools like Altove XMLSpy.

So, what do we do with XML and XSD? You are right, we match them. If the XSD matches the XML then the user has passed the check and the content is guaranteed to be none-malicious. There you go!

Things to watch out for

I said that there is always a chance to screw up. And probably you will unless you really know what you are doing. I personally won't allow <img> tags at all unless the src= attribute points to a real image file on a domain that I control. So for example, if you want your users to upload image files and then reference them from their pages with <img>, what you should probably do is to put all the images on a separate domain like static.yourdomain.com and make sure that your XSD matches against it for URL types. This will reduce the impact of CSRF attacks to an extend. It is also important to check the format of the files users can upload and deny write access if the don't match against any of your safe file formats. Etc!

So what does all that mean? It simply means that you can make a bulletproof rich-content filter, but this is just 10% from the whole journey. It also means that, there are 100% bulletproof filters, but not 100% bulletproof security models. The highest you can go with security is about 50% and I've been actually generous. The simple fact is there is nothing on this planet that cannot be hacked. If someone can use it, so attackers can. If you start living with this idea for a while, you will start perceiving the truth as it is.

Archived Comments

Thomas RoesslerThomas Roessler
This sounds like the kind of stuff that the W3C HTML working group should hear. See here for joining instructions: http://www.w3.org/html/wg/#who
pdppdp
Thomas, sure I can, but it will be better if someone transmits the message to the people who need to know about it. :)
tenesttenest
Not that this is a big deal, but XML Spy is by AltovA not AltovE. http://www.altova.com/
Sverre JohansenSverre Johansen
If I understand you correct this parser would drop any message that contains invalid (evil) markup. What you usually want is a parser that drops the invalid parts and produces a safe version. Is there any way of specifying what is legal in a formal way (like XML Schema, as you suggest, or RelaxNG), and then drop the invalid parts?
pdppdp
Sverre, hmmm, this is the hard bit. But I don't see why you have to be so user friendly. If the markup looks different from what you allow you should really say stop to the user and ask them to rephrase their input.
Sverre JohansenSverre Johansen
That would be a usability tradeoff, in my opinion. It is probably OK for small texts such as comments and descriptions, but not that suited for larger content such as a blog post. People are used to being able to write tagsoup and the browser will figure it out for them, it's really the same with security validation. A blogger might not know that the recipe he found for having a floating CSS menu (or whatever), could also be used for phishing. The content could even be something produces by Microsoft Word. It's kind of like how the web works, even if we like it or not :)
pdppdp
yes, I see your point. I don't see why it shouldn't be possible to strip out anything that does not much the XML Definition. Though, I am not aware of how this can be done in practice. It might be the case of converting the XML Definition into XML Stylesheet which only extract the parts that are safe and produce a normalized document. That might be a good solution.
ArkanoiDArkanoiD
I'd go for schema for "compliance" and XPath-based filter for extra filtering rules if there is need in more restructions..
arshanarshan
there are a few problems with using xsd after you turn the input into the predictable xhtml format. if i had not been so rushed and drained in san jose we could have talked about this. a) xsd is a blanket yes/no validation, so the application can't provide the user any feedback on how their input was altered. you may want to prevent any information leakage if you're a hardcore security person, but that's not very usable, especially for the myspace scenario when you want to help the user tune their input. i want the antisamy project to be an enabler - help give the little guy some functionality that has been totally reserved for the big players, and feedback is one way antisamy can differentiate you from others, safely. b) an xsd would have a very difficult time parsing cascading style sheets (which AntiSamy does, safely). if you're attacking antisamy, you can't provide a z-index, position values other than auto/inherit, and other CSS tricks that could lead to phishing. c) since antisamy doesn't retain all the extra functionality that an xsd parser has, it has a lighter footprint and doesn't have any of the xsd parser's attack surface. there are benefits of using xsd, of course: speed, stability, better supported code.