ILIAS
release_5-3 Revision v5.3.23-19-g915713cf615
Functions
tln_tagprint ($tagname, $attary, $tagtype)
htmlfilter.inc: This set of functions allows you to filter HTML in order to remove any malicious tags from it.
tln_casenormalize (&$val)
A small helper function to use with array_walk.
tln_skipspace ($body, $offset)
This function skips any whitespace from the current position within a string and to the next non-whitespace value.
tln_findnxstr ($body, $offset, $needle)
This function looks for the next character within a string.
tln_findnxreg ($body, $offset, $reg)
This function takes a PCRE-style regexp and tries to match it within the string.
tln_getnxtag ($body, $offset)
This function looks for the next tag.
tln_deent (&$attvalue, $regex, $hex=false)
Translates entities into literal values so they can be checked.
tln_defang (&$attvalue)
This function checks attribute values for entity-encoded values and returns them translated into 8-bit strings so we can run checks on them.
tln_unspace (&$attvalue)
Kill any tabs, newlines, or carriage returns.
tln_fixatts ($tagname, $attary, $rm_attnames, $bad_attvals, $add_attr_to_tag, $trans_image_path, $block_external_images)
This function runs various checks against the attributes.
tln_fixurl ($attname, &$attvalue, $trans_image_path, $block_external_images)
tln_fixstyle ($body, $pos, $trans_image_path, $block_external_images)
tln_body2div ($attary, $trans_image_path)
tln_sanitize ($body, $tag_list, $rm_tags_with_content, $self_closing_tags, $force_tag_closing, $rm_attnames, $bad_attvals, $add_attr_to_tag, $trans_image_path, $block_external_images)
HTMLFilter ($body, $trans_image_path, $block_external_images=false)
HTMLFilter ($body, $trans_image_path, $block_external_images = false)
Definition at line 1013 of file htmlfilter.php.
References array, and tln_sanitize().
tln_body2div ($attary, $trans_image_path)
Definition at line 791 of file htmlfilter.php.
Referenced by tln_sanitize().
tln_casenormalize (&$val)
A small helper function to use with array_walk.
Modifies a by-ref value and makes it lowercase.
string | $val | a value passed by-ref. |
Definition at line 69 of file htmlfilter.php.
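As a rough illustration of the array_walk pattern this helper is meant for, the sketch below lowercases every element of a list in place. The function name sketch_casenormalize and the sample list are hypothetical and not part of htmlfilter.php.

```php
<?php
// Hypothetical stand-in for the helper described above (not the library's code).
function sketch_casenormalize(&$val)
{
    // The value arrives by reference, so the caller's array element is changed.
    $val = strtolower($val);
}

$tags = array('SCRIPT', 'Style', 'iFrame');
// array_walk passes each element by reference when the callback asks for it.
array_walk($tags, 'sketch_casenormalize');
```

After the call, every entry of $tags is lowercase, which makes later case-insensitive tag comparisons a plain string match.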
tln_deent (&$attvalue, $regex, $hex = false)
Translates entities into literal values so they can be checked.
string | $attvalue | the by-ref value to check. |
string | $regex | the regular expression to check against. |
boolean | $hex | whether the entities are hexadecimal. |
Definition at line 439 of file htmlfilter.php.
Referenced by tln_defang().
tln_defang (&$attvalue)
This function checks attribute values for entity-encoded values and returns them translated into 8-bit strings so we can run checks on them.
string | $attvalue | A string to run entity check against. |
Skip this if there aren't ampersands or backslashes.
Definition at line 465 of file htmlfilter.php.
References $m, and tln_deent().
Referenced by tln_fixatts(), and tln_fixstyle().
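To make the idea of translating entity-encoded values concrete, here is a minimal sketch that decodes decimal and hexadecimal numeric entities into 8-bit characters so that later checks see the literal text. It is not the library's implementation: the name sketch_decode_numeric_entities is hypothetical, backslash escapes are not handled, and leaving code points above 255 untouched is an assumption.

```php
<?php
// Hypothetical sketch, not the library's tln_deent()/tln_defang().
function sketch_decode_numeric_entities($value)
{
    // Nothing to translate without an ampersand.
    if (strpos($value, '&') === false) {
        return $value;
    }
    // Turn one code point into its 8-bit character; leave anything above 255 as written.
    $to_char = function ($code, $original) {
        return ($code <= 255) ? chr((int) $code) : $original;
    };
    // Hexadecimal form, e.g. &#x6A;
    $value = preg_replace_callback('/&#x([0-9a-f]+);?/i', function ($m) use ($to_char) {
        return $to_char(hexdec($m[1]), $m[0]);
    }, $value);
    // Decimal form, e.g. &#106;
    return preg_replace_callback('/&#(\d+);?/', function ($m) use ($to_char) {
        return $to_char((int) $m[1], $m[0]);
    }, $value);
}
```

With this kind of decoding in place, an attribute value such as &#106;ava&#x73;cript:alert(1) is seen as javascript:alert(1) by whatever checks run afterwards.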
tln_findnxreg ($body, $offset, $reg)
This function takes a PCRE-style regexp and tries to match it within the string.
string | $body | The string to look for needle in. |
integer | $offset | Start looking from here. |
string | $reg | A PCRE-style regex to match. |
Definition at line 127 of file htmlfilter.php.
References array.
Referenced by tln_getnxtag().
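A regex search that starts at a given offset could look roughly like the sketch below. The function name sketch_findnxreg, the % delimiter, and the shape of the return value (position, text skipped before the match, matched text, or false when nothing matches) are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch of an offset-based regex search (not the library's code).
function sketch_findnxreg($body, $offset, $reg)
{
    // Assumption: $reg contains no unescaped '%' characters, since '%' is the delimiter.
    $found = preg_match('%' . $reg . '%s', $body, $m, PREG_OFFSET_CAPTURE, $offset);
    if ($found !== 1) {
        return false;
    }
    // With PREG_OFFSET_CAPTURE each match is array(text, byte position).
    list($match, $pos) = $m[0];
    return array($pos, substr($body, $offset, $pos - $offset), $match);
}
```

For example, searching '<img src=x>' from offset 1 for the first character that cannot be part of a tag name stops at the space after img.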
tln_findnxstr ($body, $offset, $needle)
This function looks for the next character within a string.
It's really just a glorified "strpos", except it catches the failures nicely.
string | $body | The string to look for needle in. |
integer | $offset | Start looking from this position. |
string | $needle | The character/string to look for. |
Definition at line 105 of file htmlfilter.php.
Referenced by tln_getnxtag().
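One way to read "catches the failures nicely" is that the caller never has to deal with strpos returning false. The sketch below takes that approach; the name sketch_findnxstr and the choice of returning the string length when the needle is missing are assumptions, not something this page documents.

```php
<?php
// Hypothetical strpos wrapper (not the library's code).
function sketch_findnxstr($body, $offset, $needle)
{
    $pos = strpos($body, $needle, $offset);
    // Assumption: a missing needle is reported as the end of the string,
    // so callers can keep treating the result as a valid integer offset.
    return ($pos === false) ? strlen($body) : $pos;
}
```

A parser built on such a wrapper can treat "not found" as "ran off the end of the input" without a separate false check.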
tln_fixatts ($tagname, $attary, $rm_attnames, $bad_attvals, $add_attr_to_tag, $trans_image_path, $block_external_images)
This function runs various checks against the attributes.
string | $tagname | String with the name of the tag. |
array | $attary | Array with all tag attributes. |
array | $rm_attnames | See description for tln_sanitize |
array | $bad_attvals | See description for tln_sanitize |
array | $add_attr_to_tag | See description for tln_sanitize |
string | $trans_image_path | |
boolean | $block_external_images |
See if this attribute should be removed.
Remove any backslashes, entities, or extraneous whitespace.
Now let's run checks on the attvalues. I don't expect anyone to comprehend this. If you do, get in touch with me so I can drive to where you live and shake your hand personally. :)
There are two arrays in valary. The first is matches; the second is replacements.
See if we need to append any attributes to this tag.
Definition at line 514 of file htmlfilter.php.
References tln_defang(), tln_fixurl(), and tln_unspace().
Referenced by tln_sanitize().
tln_fixstyle ($body, $pos, $trans_image_path, $block_external_images)
First look for a general BODY style declaration, which would look like so: body {background: blah-blah}, and change it to .bodyclass so we can just assign it to a <div>.
Definition at line 666 of file htmlfilter.php.
References $i, array, tln_defang(), tln_fixurl(), and tln_unspace().
Referenced by tln_sanitize().
tln_fixurl ($attname, &$attvalue, $trans_image_path, $block_external_images)
Replace empty src attributes with the blank image. src is only used for frames, images, and image inputs. Doing a replace should not affect how they work; however, it will stop IE from being kicked off when the src of an img tag is not set.
Definition at line 598 of file htmlfilter.php.
Referenced by tln_fixatts(), and tln_fixstyle().
tln_getnxtag ($body, $offset)
This function looks for the next tag.
string | $body | String where to look for the next tag. |
integer | $offset | Start looking from here. |
We are here:
blah blah <tag attribute="value">
----------^
There are 3 kinds of tags: 1. an opening tag, 2. a closing tag, 3. an XHTML-style content-less tag.
A comment or an SGML declaration.
Assume tagtype 1 for now. If it's type 3, we'll switch values later.
Look for next [-_], which will indicate the end of the tag name.
$match can be any of these:
'>' indicating the end of the tag entirely.
'\s' (whitespace) indicating the end of the tag name.
'/' indicating that this is a type-3 XHTML tag.
Whatever else we find there indicates an invalid tag.
This is an XHTML-style tag with a closing / at the end. Check if it's followed by the closing bracket. If not, then this tag is invalid.
Check if it's whitespace
This is an invalid tag! Look for the next closing ">".
At this point we're here:
<tagname attribute="blah">
--------^
At this point we loop in order to find all attributes.
Non-closed tag.
See if we arrived at a ">" or "/>", which means that we reached the end of the tag.
Yep. So we did.
There are several types of attributes, with optional [:space:] between members.
Type 1: attrname[:space:]=[:space:]'CDATA'
Type 2: attrname[:space:]=[:space:]"CDATA"
Type 3: attrname[:space:]=[:space:]CDATA
Type 4: attrname
We leave types 1 and 2 the same. For type 3 we check for '"' and convert it to "&quot;" if needed, then wrap the value in double quotes. Type 4 we convert into: attrname="yes".
Looks like body ended before the end of tag.
We arrived at the end of the attribute name. Several things are possible here:
'>' means the end of the tag and this is attribute type 4.
'/' if followed by '>' means the same thing as above.
'\s' (whitespace) means a lot of things; look at what it's followed by.
Anything else means the attribute is invalid.
This is an XHTML-style tag with a closing / at the end. Check if it's followed by the closing bracket. If not, then this tag is invalid.
Skip whitespace and see what we arrive at.
Two things are valid here:
'=' means this is attribute type 1, 2, or 3.
'\w' (a word character) means this was attribute type 4.
Anything else we ignore and re-loop. End of tag and invalid stuff will be caught by our checks at the beginning of the loop.
Here are 3 possibilities:
"'" means attribute type 1.
'"' means attribute type 2.
Everything else is the content of attribute type 3.
These are hateful. Look for \s, or >.
If it's ">" it will be caught at the top.
That was attribute type 4.
An illegal character. Find next '>' and return.
The fact that we got here indicates that the tag end was never found. Return invalid tag indication so it gets stripped.
Definition at line 157 of file htmlfilter.php.
References array, tln_findnxreg(), tln_findnxstr(), and tln_skipspace().
Referenced by tln_sanitize().
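The attribute rewriting rules described for tln_getnxtag (types 1 and 2 kept as they are, type 3 escaped and wrapped in double quotes, type 4 turned into attrname="yes") can be sketched as a small normalizer. The function sketch_normalize_attribute and its convention of passing null for a valueless attribute are hypothetical; the real parser works on offsets within the body rather than on pre-split name and value pairs.

```php
<?php
// Hypothetical normalizer for the four attribute forms (not the library's code).
// $raw is the value exactly as written in the tag, or null when the
// attribute has no value at all (type 4).
function sketch_normalize_attribute($name, $raw)
{
    if ($raw === null) {
        return $name . '="yes"';                 // type 4: attrname
    }
    $first = substr($raw, 0, 1);
    if ($first === "'" || $first === '"') {
        return $name . '=' . $raw;               // types 1 and 2: already quoted
    }
    // Type 3: bare value. Escape embedded double quotes, then quote the value.
    return $name . '="' . str_replace('"', '&quot;', $raw) . '"';
}
```

After this step every attribute carries a quoted value, so later checks only have to deal with one shape.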
tln_sanitize ($body, $tag_list, $rm_tags_with_content, $self_closing_tags, $force_tag_closing, $rm_attnames, $bad_attvals, $add_attr_to_tag, $trans_image_path, $block_external_images)
string | $body | The HTML you wish to filter |
array | $tag_list | see description above |
array | $rm_tags_with_content | see description above |
array | $self_closing_tags | see description above |
boolean | $force_tag_closing | see description above |
array | $rm_attnames | see description above |
array | $bad_attvals | see description above |
array | $add_attr_to_tag | see description above |
string | $trans_image_path | |
boolean | $block_external_images |
Normalize rm_tags and rm_tags_with_content.
See if tag_list is a list of tags to remove or tags to allow: false means remove these tags, true means allow these tags.
Take care of netscape's stupid javascript entities like &{alert('boo')};
Take care of <style>
Got to the end of tag we needed to remove.
$rm_tags_with_content
See if this is a self-closing type and change tagtype appropriately.
See if we should skip this tag and any content inside it.
Convert body into div.
This is where we run other checks.
Definition at line 842 of file htmlfilter.php.
References array, tln_body2div(), tln_fixatts(), tln_fixstyle(), tln_getnxtag(), and tln_tagprint().
Referenced by HTMLFilter().
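The allow-or-remove switch on tag_list mentioned in the tln_sanitize notes can be illustrated as follows. This page does not say where the boolean flag is stored, so the sketch assumes it is the first element of the list; the function name sketch_tag_is_kept is hypothetical as well.

```php
<?php
// Hypothetical decision helper (not the library's code).
// Assumption: $tag_list[0] is the mode flag, the remaining entries are lowercase tag names.
function sketch_tag_is_kept($tagname, $tag_list)
{
    // true  = the list names the tags to allow
    // false = the list names the tags to remove
    $allow_mode = array_shift($tag_list);
    $listed = in_array(strtolower($tagname), $tag_list, true);
    return $allow_mode ? $listed : !$listed;
}
```

With array(false, 'script', 'object') the list acts as a blacklist, while array(true, 'p', 'b') acts as a whitelist.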
tln_skipspace ($body, $offset)
This function skips any whitespace from the current position within a string and to the next non-whitespace value.
string | $body | the string |
integer | $offset | the offset within the string where we should start looking for the next non-whitespace character. |
Definition at line 84 of file htmlfilter.php.
Referenced by tln_getnxtag().
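Skipping whitespace from an offset can be done in several ways; the sketch below uses strspn rather than a regular expression. The name sketch_skipspace and the choice to return the new offset are assumptions for illustration only.

```php
<?php
// Hypothetical whitespace skipper (not the library's code).
function sketch_skipspace($body, $offset)
{
    // strspn counts how many consecutive characters from the mask
    // appear in $body starting at $offset.
    return $offset + strspn($body, " \t\r\n\f\v", $offset);
}
```

If the character at $offset is not whitespace, the offset comes back unchanged.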
tln_tagprint ($tagname, $attary, $tagtype)
Useful in cases when you need to filter user input for any cross-site-scripting attempts.
Copyright (C) 2002-2004 by Duke University
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
Author: Konstantin Riabitsev <icon@linux.duke.edu>
Author: Jim Jagielski <jim@jaguNET.com / jimjag@gmail.com>
Version: 1.1 ($Date$)
This function returns the final tag out of the tag name, an array of attributes, and the type of the tag. This function is called by tln_sanitize internally.
string | $tagname | the name of the tag. |
array | $attary | the array of attributes and their values |
integer | $tagtype | The type of the tag (see in comments). |
Definition at line 41 of file htmlfilter.php.
References array.
Referenced by tln_sanitize().
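Assembling a tag from its name, attribute array, and type might look like the sketch below. It is not the library's implementation: the name sketch_tagprint is hypothetical, the numbering of tag types (1 for opening, 2 for closing, 3 for XHTML-style self-closing) is an assumption, and so is the convention that attribute values already include their quotes.

```php
<?php
// Hypothetical tag builder (not the library's code).
// Assumptions: $tagtype is 1 (opening), 2 (closing) or 3 (self-closing),
// and each value in $attary already carries its surrounding quotes.
function sketch_tagprint($tagname, $attary, $tagtype)
{
    if ($tagtype === 2) {
        // Closing tags carry no attributes.
        return '</' . $tagname . '>';
    }
    $out = '<' . $tagname;
    foreach ($attary as $name => $value) {
        $out .= ' ' . $name . '=' . $value;
    }
    return $out . ($tagtype === 3 ? ' />' : '>');
}
```

Keeping output in one place like this means every tag that survives filtering is re-emitted in a single normalized form.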
tln_unspace (&$attvalue)
Kill any tabs, newlines, or carriage returns.
Our friends the makers of the browser with 95% market value decided that it'd be funny to make "java[tab]script" be just as good as "javascript".
string | $attvalue | The attribute value before extraneous spaces removed. |
Definition at line 491 of file htmlfilter.php.
References array.
Referenced by tln_fixatts(), and tln_fixstyle().
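The "java[tab]script" problem mentioned for tln_unspace is easy to demonstrate. The sketch below strips tabs, newlines, and carriage returns from a value before it is checked; the name sketch_unspace is hypothetical and the function returns a new string instead of modifying its argument by reference.

```php
<?php
// Hypothetical whitespace stripper (not the library's code).
function sketch_unspace($value)
{
    // Remove the characters a browser may silently ignore inside a URL scheme.
    return str_replace(array("\t", "\r", "\n"), '', $value);
}
```

Once the value is collapsed this way, a simple comparison against "javascript:" catches the obfuscated form too.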