January 13, 2008

Sanitizing FE input in TYPO3

Category: Zachary Davis

By: Zachary Davis

With social networking all the rage, clients want more display of frontend input. Turns out that parseFunc is a good tool for sanitizing the input.

In the last year or so, a good portion of our clients here at Cast Iron Coding have been requesting social networking functionality on their sites. In the current web climate, this means that we often find ourselves relying on frontend users to generate a lot of content on the site, which raises new challenges in a system like TYPO3 that has historically enforced a strict split between the backend and the frontend -- a split in which the backend is the primary entry point for new content. Lately, we've been brainstorming ways to simplify the oft-repeated act of sanitizing frontend user input before saving it to the database and then parsing it before displaying it on the frontend.

Our current approach is to take the model used for RTE transformations -- in which content is transformed once before being saved to the database and then transformed again before being displayed on the frontend -- and use it for content input on the frontend. The flow is pretty straightforward: the user inputs content into a textarea on the frontend (with or without an RTE, some HTML tags allowed). That content is passed through a parseFunc configuration (let's call it lib.parseFunc_feInput) before being saved to the database, only to be passed through another, corresponding parseFunc configuration (lib.parseFunc_outputFeInput) before it is rendered on the frontend. We've tried to build this out in such a way that users are allowed to do things like include a limited set of HTML tags, and even object / embed tags for embedding Flash video, while maintaining strong precautions against JavaScript injection and the like.
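To make the "limited HTML tags, no JavaScript" idea concrete, here is a minimal, language-neutral sketch in Python. It is not TYPO3's parseFunc and not the configuration from our snippets; the allowlist, the helper names (ALLOWED_TAGS, AllowlistSanitizer, sanitize) and the list of safe URL prefixes are all assumptions made up for illustration. It only shows the general shape of an allowlist filter: keep known-good tags and attributes, drop everything else, and escape the text.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attributes permitted on that tag.
ALLOWED_TAGS = {
    "b": set(), "i": set(), "em": set(), "strong": set(),
    "p": set(), "br": set(), "a": {"href"},
}
VOID_TAGS = {"br"}  # tags that never get a closing tag
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")  # assumption


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue  # drops onclick, style, and friends
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # drops javascript: and other unexpected schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.parts.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(escape(data, quote=False))


def sanitize(html_text):
    """Return html_text reduced to the allowlisted subset."""
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


print(sanitize('<p onclick="evil()">Hi <b>there</b><script>alert(1)</script></p>'))
# -> <p>Hi <b>there</b></p>
```

The design choice worth noting is that the filter rebuilds the markup from parsed pieces instead of trying to delete bad substrings, so anything it does not explicitly recognize simply never reaches the output.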

What follows is going to be a bit technical, but may be of interest to TYPO3 extension developers. If you want to see this post with the code inline, you can view it here. Alternately, you can download the code snippets referenced below as a text file. In any case, this won't make much sense unless you follow along in the code.

To begin, take a look at this brief set of userFuncs that we use to expand on what parseFunc is capable of:

See snippet #1

These userFuncs are called by the parseFunc configuration that frontend input is passed through:

See snippet #2

OK -- so that's what we pass the content through on input. At this point, if a user had entered this YouTube snippet on input:

<object width="425" height="355">
<param name="movie"
value="http://www.youtube.com/v/6gmP4nk0EOE&rel=1">
</param>
<param name="wmode" value="transparent"></param>
<embed src="http://www.youtube.com/v/6gmP4nk0EOE&rel=1"
type="application/x-shockwave-flash"
wmode="transparent" width="425" height="355">
</embed>
</object>

It would have been replaced with our custom tag like this:

<youtube>6gmP4nk0EOE</youtube>
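The input-side replacement just described can be sketched roughly as follows. This is a Python illustration, not the PHP userFunc from snippet #1; the function name collapse_youtube_embeds, the regular expression, and the assumed shape of a video ID (letters, digits, underscore, hyphen) are hypothetical and based only on the example above.

```python
import re

# Matches one whole <object>...</object> block that references a YouTube
# /v/<id> URL. The "(?:(?!</object>).)*?" parts keep the match from running
# past the end of the current object block.
YOUTUBE_OBJECT = re.compile(
    r"<object\b[^>]*>"
    r"(?:(?!</object>).)*?"
    r"youtube\.com/v/([A-Za-z0-9_-]+)"  # assumed ID character set
    r"(?:(?!</object>).)*?"
    r"</object>",
    re.IGNORECASE | re.DOTALL,
)


def collapse_youtube_embeds(html_text):
    """Replace each YouTube object/embed block with <youtube>ID</youtube>."""
    return YOUTUBE_OBJECT.sub(
        lambda match: f"<youtube>{match.group(1)}</youtube>", html_text
    )


sample = """<object width="425" height="355">
<param name="movie"
value="http://www.youtube.com/v/6gmP4nk0EOE&rel=1">
</param>
<param name="wmode" value="transparent"></param>
<embed src="http://www.youtube.com/v/6gmP4nk0EOE&rel=1"
type="application/x-shockwave-flash"
wmode="transparent" width="425" height="355">
</embed>
</object>"""

print(collapse_youtube_embeds(sample))
# -> <youtube>6gmP4nk0EOE</youtube>
```

Only the video ID survives this step. Everything else the user pasted (attributes, params, the embed tag) is thrown away, which is what makes it reasonable to accept object / embed markup from the frontend at all.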

The next step is to make sure that this user input is passed through our corresponding parseFunc on output, which looks like this:

See snippet #3

At this point, all those custom tags are replaced with the code for the video. Note also that our output parseFunc closely corresponds with the input parseFunc -- we're doing our best here to limit the user to tags that are appropriate for frontend input while not being dangerous.
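As a rough picture of what the output side does with the custom tag, here is a small Python sketch. Again, this is not snippet #3 and not TYPO3 code; expand_youtube_tags, EMBED_TEMPLATE and the assumed ID character set are hypothetical names and choices for illustration. The embed markup simply mirrors the YouTube snippet quoted earlier in this post.

```python
import re

# Only a bare ID is accepted between the tags (assumed character set).
YOUTUBE_TAG = re.compile(r"<youtube>([A-Za-z0-9_-]+)</youtube>")

# Trusted markup, written by the developer, never by the user.
EMBED_TEMPLATE = (
    '<object width="425" height="355">'
    '<param name="movie" value="http://www.youtube.com/v/{video_id}&rel=1"></param>'
    '<param name="wmode" value="transparent"></param>'
    '<embed src="http://www.youtube.com/v/{video_id}&rel=1" '
    'type="application/x-shockwave-flash" wmode="transparent" '
    'width="425" height="355"></embed>'
    "</object>"
)


def expand_youtube_tags(text):
    """Replace <youtube>ID</youtube> with the full embed markup."""
    return YOUTUBE_TAG.sub(
        lambda match: EMBED_TEMPLATE.format(video_id=match.group(1)), text
    )


html_out = expand_youtube_tags("Watch this: <youtube>6gmP4nk0EOE</youtube>")
print(html_out.startswith("Watch this: <object"))
```

A tag whose content is not a plain ID does not match the pattern and is left untouched here, so the rest of the output filter still has to strip or escape it. The point of the two-step design is that the stored value is tiny and easy to validate, while the potentially dangerous markup lives in a template the user cannot influence.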

As a developer, then, your job is to make sure that all content input on the frontend is passed through lib.parseFunc_feInput before being saved, and that all content that came from the frontend is passed through lib.parseFunc_outputFeInput before being output on the frontend. We typically stick this TypoScript and the corresponding user object in an extension, which makes all of this easily available throughout the frontend. We've come to develop most of our extensions following standard MVC conventions, using TypoScript extensively for outputting values from the database. Our standard approach usually entails creating a cObject, calling start to set its data to the record in question, and then looping through TypoScript to output markers for inclusion in a template. With this approach, it's easy to output a field using this custom parseFunc like this:

bodytext = TEXT
bodytext {
field = bodytext
parseFunc < lib.parseFunc_outputFeInput
}

Likewise, it's easy to find the lib.parseFunc_feInput when you're saving to the database -- just look at
$GLOBALS['TSFE']->tmpl->setup['lib.']['parseFunc_feInput.']

Hope this helps someone out there -- it's still a work in progress and not something we're using in production environments just yet, but it's well on its way. If you have any comments or feedback, I'd love to hear them.

--Zach


comments

comment #1
-julle, January 15, 2008 21:27
Seems like a great approach to a problem we all face more and more. It is also a really good example of what makes TYPO3 so unique to work with: it might be pretty well hidden, but there are very often powerful tools available for the task at hand.

(and thanks for putting something else than mac-hype on buzz)

comment #2
Liquid Ice, January 26, 2008 20:14
The htmLawed PHP script - http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/index.php - may be very good for customized HTML code restriction. Wonder if there's a plugin using this.
