CXLV. Tidy Functions
Tidy is a binding for the Tidy HTML clean and repair utility, which
allows you not only to clean and otherwise manipulate HTML documents
but also to traverse the document tree.
Tidy is currently available for PHP 4.3.x and PHP 5 as a PECL
extension from
http://pecl.php.net/package/tidy.
Note:
Tidy 1.0 is only for PHP 4.3.x, while Tidy 2.0 is only for PHP 5.
If PEAR is available on your *nix-like
system, you can install the tidy extension with the pear installer using the
following command: pear -v install tidy.
You can always download the tar.gz package and install tidy by hand:
Example 1. tidy install by hand in PHP 4.3.x
gunzip tidy-xxx.tgz
tar -xvf tidy-xxx.tar
cd tidy-xxx
phpize
./configure && make && make install
Windows users can download the extension dll php_tidy.dll
from http://snaps.php.net/win32/PECL_STABLE/.
In PHP 5 you only need to compile PHP with the --with-tidy option.
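After installing by any of the methods above, a quick way to confirm that PHP actually sees the extension is to ask for it at runtime. The sketch below is not part of the official instructions: check_tidy_extension() is a hypothetical helper name, and the code assumes PHP 7 or later for the return type declaration.

```php
<?php
// Hypothetical helper: report whether the tidy extension is loaded
// and, if so, which libtidy release it was built against.
function check_tidy_extension(): string
{
    if (!extension_loaded('tidy')) {
        return 'tidy is not loaded (check php.ini or the --with-tidy build option)';
    }
    // tidy_get_release() returns the release date of the linked libtidy
    return 'tidy is loaded, libtidy release ' . tidy_get_release();
}

echo check_tidy_extension(), "\n";
```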
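The behaviour of these functions is affected by settings in php.ini.
Table 1. Tidy Configuration Options
Name | Default | Changeable | Changelog |
---|---|---|---|
tidy.default_config | "" | PHP_INI_SYSTEM | Available since PHP 5.0.0. |
tidy.clean_output | "0" | PHP_INI_PERDIR | Available since PHP 5.0.0. |
For further details and definitions of the PHP_INI_* constants, see Appendix G.
Here is a short explanation of the configuration directives.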
- tidy.default_config
string
Default path for the tidy configuration file.
- tidy.clean_output
boolean
Turns Tidy's repairing of script output on or off.
Warning
Do not turn on tidy.clean_output if you are generating
non-HTML content such as dynamic images.
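Because tidy.default_config is PHP_INI_SYSTEM and tidy.clean_output is PHP_INI_PERDIR, neither can be changed with ini_set() from within a script, but both can be read at runtime. A minimal sketch, assuming PHP 7 or later; tidy_ini_settings() is a hypothetical helper. When the extension is not loaded, ini_get() returns FALSE for both directives, which the casts turn into an empty string and FALSE.

```php
<?php
// Hypothetical helper: read the two tidy php.ini directives at runtime.
function tidy_ini_settings(): array
{
    return [
        // path of the default Tidy configuration file ("" if unset)
        'tidy.default_config' => (string) ini_get('tidy.default_config'),
        // "0"/"1" in php.ini; cast to a real boolean
        'tidy.clean_output'   => (bool) ini_get('tidy.clean_output'),
    ];
}

var_dump(tidy_ini_settings());
```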
tidyNode properties:
- value - the value of the node (e.g. the HTML text)
- name - the name of the tag (e.g. html, a, etc.)
- type - the type of the node (one of the TIDY_NODETYPE_* constants, e.g. TIDY_NODETYPE_PHP)
- line* - the line where the node starts
- column* - the column where the node starts
- proprietary* - TRUE if the node refers to a proprietary tag
- id - the ID of the tag (one of the TIDY_TAG_* constants, e.g. TIDY_TAG_FRAME)
- attribute - an array with the attributes of the current node, or NULL if there aren't any
- child - an array with the child tidyNodes, or NULL if there aren't any
Note:
The properties marked with * are only available since PHP 5.1.0.
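The properties above are enough to walk a document recursively. This sketch collects every element node together with the line it starts on; list_elements() is a hypothetical helper, not part of the extension, and the code assumes PHP 7.1 or later (for the void return type) in addition to PHP 5.1.0 or later for the line property.

```php
<?php
// Hypothetical helper: append "name (line N)" for every element node
// below $node to $out, using the name, type, line and child properties.
function list_elements(tidyNode $node, array &$out): void
{
    // element nodes are start tags or empty (start-end) tags
    if ($node->type === TIDY_NODETYPE_START || $node->type === TIDY_NODETYPE_STARTEND) {
        $out[] = $node->name . ' (line ' . $node->line . ')';
    }
    // child is NULL for leaf nodes; (array) NULL is an empty array
    foreach ((array) $node->child as $child) {
        list_elements($child, $out);
    }
}

$tidy = tidy_parse_string("<html><head><title>Demo</title></head>\n<body><p>Hi</p></body></html>");
$elements = [];
list_elements($tidy->root(), $elements);
print_r($elements);
```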
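The constants below are defined by this extension, and will only be available
when the extension has either been compiled into PHP or dynamically loaded at runtime.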
Each TIDY_TAG_XXX constant represents an HTML tag. For example,
TIDY_TAG_A represents a <a
href="XX">link</a> tag. Each TIDY_ATTR_XXX constant
represents an HTML attribute. For example, TIDY_ATTR_HREF
would represent the href attribute in the previous example.
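In practice, the id property of an element node is compared against the TIDY_TAG_* constants, while attribute values are read from the node's attribute array, which is keyed by attribute name. A minimal sketch, assuming PHP 7 or later; collect_hrefs() is a hypothetical helper that gathers the href of every <a> tag. The node type is checked first as a precaution, since older PHP versions only define id on tag nodes.

```php
<?php
// Hypothetical helper: return the href values of all <a> tags below $node,
// in document order.
function collect_hrefs(tidyNode $node): array
{
    $hrefs = [];
    if ($node->type === TIDY_NODETYPE_START
        && $node->id === TIDY_TAG_A
        && isset($node->attribute['href'])) {
        $hrefs[] = $node->attribute['href'];
    }
    foreach ((array) $node->child as $child) {
        $hrefs = array_merge($hrefs, collect_hrefs($child));
    }
    return $hrefs;
}

$doc = tidy_parse_string('<p><a href="/one">1</a> and <a href="/two">2</a></p>');
print_r(collect_hrefs($doc->root()));
```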
The following constants are defined:
Table 2. tidy tag constants
constant |
---|
TIDY_TAG_UNKNOWN | TIDY_TAG_A | TIDY_TAG_ABBR | TIDY_TAG_ACRONYM | TIDY_TAG_ALIGN | TIDY_TAG_APPLET | TIDY_TAG_AREA | TIDY_TAG_B | TIDY_TAG_BASE | TIDY_TAG_BASEFONT | TIDY_TAG_BDO | TIDY_TAG_BGSOUND | TIDY_TAG_BIG | TIDY_TAG_BLINK | TIDY_TAG_BLOCKQUOTE | TIDY_TAG_BODY | TIDY_TAG_BR | TIDY_TAG_BUTTON | TIDY_TAG_CAPTION | TIDY_TAG_CENTER | TIDY_TAG_CITE | TIDY_TAG_CODE | TIDY_TAG_COL | TIDY_TAG_COLGROUP | TIDY_TAG_COMMENT | TIDY_TAG_DD | TIDY_TAG_DEL | TIDY_TAG_DFN | TIDY_TAG_DIR | TIDY_TAG_DIV | TIDY_TAG_DL | TIDY_TAG_DT | TIDY_TAG_EM | TIDY_TAG_EMBED | TIDY_TAG_FIELDSET | TIDY_TAG_FONT | TIDY_TAG_FORM | TIDY_TAG_FRAME | TIDY_TAG_FRAMESET | TIDY_TAG_H1 | TIDY_TAG_H2 | TIDY_TAG_H3 | TIDY_TAG_H4 | TIDY_TAG_H5 | TIDY_TAG_H6 | TIDY_TAG_HEAD | TIDY_TAG_HR | TIDY_TAG_HTML | TIDY_TAG_I | TIDY_TAG_IFRAME | TIDY_TAG_ILAYER | TIDY_TAG_IMG | TIDY_TAG_INPUT | TIDY_TAG_INS | TIDY_TAG_ISINDEX | TIDY_TAG_KBD | TIDY_TAG_KEYGEN | TIDY_TAG_LABEL | TIDY_TAG_LAYER | TIDY_TAG_LEGEND | TIDY_TAG_LI | TIDY_TAG_LINK | TIDY_TAG_LISTING | TIDY_TAG_MAP | TIDY_TAG_MARQUEE | TIDY_TAG_MENU | TIDY_TAG_META | TIDY_TAG_MULTICOL | TIDY_TAG_NOBR | TIDY_TAG_NOEMBED | TIDY_TAG_NOFRAMES | TIDY_TAG_NOLAYER | TIDY_TAG_NOSAFE | TIDY_TAG_NOSCRIPT | TIDY_TAG_OBJECT | TIDY_TAG_OL | TIDY_TAG_OPTGROUP | TIDY_TAG_OPTION | TIDY_TAG_P | TIDY_TAG_PARAM | TIDY_TAG_PLAINTEXT | TIDY_TAG_PRE | TIDY_TAG_Q | TIDY_TAG_RP | TIDY_TAG_RT | TIDY_TAG_RTC | TIDY_TAG_RUBY | TIDY_TAG_S | TIDY_TAG_SAMP | TIDY_TAG_SCRIPT | TIDY_TAG_SELECT | TIDY_TAG_SERVER | TIDY_TAG_SERVLET | TIDY_TAG_SMALL | TIDY_TAG_SPACER | TIDY_TAG_SPAN | TIDY_TAG_STRIKE | TIDY_TAG_STRONG | TIDY_TAG_STYLE | TIDY_TAG_SUB | TIDY_TAG_TABLE | TIDY_TAG_TBODY | TIDY_TAG_TD | TIDY_TAG_TEXTAREA | TIDY_TAG_TFOOT | TIDY_TAG_TH | TIDY_TAG_THEAD | TIDY_TAG_TITLE | TIDY_TAG_TR | TIDY_TAG_TT | TIDY_TAG_U | TIDY_TAG_UL | TIDY_TAG_VAR | TIDY_TAG_WBR | TIDY_TAG_XMP |
Table 3. tidy attribute constants
constant |
---|
TIDY_ATTR_UNKNOWN | TIDY_ATTR_ABBR | TIDY_ATTR_ACCEPT | TIDY_ATTR_ACCEPT_CHARSET | TIDY_ATTR_ACCESSKEY | TIDY_ATTR_ACTION | TIDY_ATTR_ADD_DATE | TIDY_ATTR_ALIGN | TIDY_ATTR_ALINK | TIDY_ATTR_ALT | TIDY_ATTR_ARCHIVE | TIDY_ATTR_AXIS | TIDY_ATTR_BACKGROUND | TIDY_ATTR_BGCOLOR | TIDY_ATTR_BGPROPERTIES | TIDY_ATTR_BORDER | TIDY_ATTR_BORDERCOLOR | TIDY_ATTR_BOTTOMMARGIN | TIDY_ATTR_CELLPADDING | TIDY_ATTR_CELLSPACING | TIDY_ATTR_CHAR | TIDY_ATTR_CHAROFF | TIDY_ATTR_CHARSET | TIDY_ATTR_CHECKED | TIDY_ATTR_CITE | TIDY_ATTR_CLASS | TIDY_ATTR_CLASSID | TIDY_ATTR_CLEAR | TIDY_ATTR_CODE | TIDY_ATTR_CODEBASE | TIDY_ATTR_CODETYPE | TIDY_ATTR_COLOR | TIDY_ATTR_COLS | TIDY_ATTR_COLSPAN | TIDY_ATTR_COMPACT | TIDY_ATTR_CONTENT | TIDY_ATTR_COORDS | TIDY_ATTR_DATA | TIDY_ATTR_DATAFLD | TIDY_ATTR_DATAPAGESIZE | TIDY_ATTR_DATASRC | TIDY_ATTR_DATETIME | TIDY_ATTR_DECLARE | TIDY_ATTR_DEFER | TIDY_ATTR_DIR | TIDY_ATTR_DISABLED | TIDY_ATTR_ENCODING | TIDY_ATTR_ENCTYPE | TIDY_ATTR_FACE | TIDY_ATTR_FOR | TIDY_ATTR_FRAME | TIDY_ATTR_FRAMEBORDER | TIDY_ATTR_FRAMESPACING | TIDY_ATTR_GRIDX | TIDY_ATTR_GRIDY | TIDY_ATTR_HEADERS | TIDY_ATTR_HEIGHT | TIDY_ATTR_HREF | TIDY_ATTR_HREFLANG | TIDY_ATTR_HSPACE | TIDY_ATTR_HTTP_EQUIV | TIDY_ATTR_ID | TIDY_ATTR_ISMAP | TIDY_ATTR_LABEL | TIDY_ATTR_LANG | TIDY_ATTR_LANGUAGE | TIDY_ATTR_LAST_MODIFIED | TIDY_ATTR_LAST_VISIT | TIDY_ATTR_LEFTMARGIN | TIDY_ATTR_LINK | TIDY_ATTR_LONGDESC | TIDY_ATTR_LOWSRC | TIDY_ATTR_MARGINHEIGHT | TIDY_ATTR_MARGINWIDTH | TIDY_ATTR_MAXLENGTH | TIDY_ATTR_MEDIA | TIDY_ATTR_METHOD | TIDY_ATTR_MULTIPLE | TIDY_ATTR_NAME | TIDY_ATTR_NOHREF | TIDY_ATTR_NORESIZE | TIDY_ATTR_NOSHADE | TIDY_ATTR_NOWRAP | TIDY_ATTR_OBJECT | TIDY_ATTR_OnAFTERUPDATE | TIDY_ATTR_OnBEFOREUNLOAD | TIDY_ATTR_OnBEFOREUPDATE | TIDY_ATTR_OnBLUR | TIDY_ATTR_OnCHANGE | TIDY_ATTR_OnCLICK | TIDY_ATTR_OnDATAAVAILABLE | TIDY_ATTR_OnDATASETCHANGED | TIDY_ATTR_OnDATASETCOMPLETE | TIDY_ATTR_OnDBLCLICK | TIDY_ATTR_OnERRORUPDATE | TIDY_ATTR_OnFOCUS | TIDY_ATTR_OnKEYDOWN | 
TIDY_ATTR_OnKEYPRESS | TIDY_ATTR_OnKEYUP | TIDY_ATTR_OnLOAD | TIDY_ATTR_OnMOUSEDOWN | TIDY_ATTR_OnMOUSEMOVE | TIDY_ATTR_OnMOUSEOUT | TIDY_ATTR_OnMOUSEOVER | TIDY_ATTR_OnMOUSEUP | TIDY_ATTR_OnRESET | TIDY_ATTR_OnROWENTER | TIDY_ATTR_OnROWEXIT | TIDY_ATTR_OnSELECT | TIDY_ATTR_OnSUBMIT | TIDY_ATTR_OnUNLOAD | TIDY_ATTR_PROFILE | TIDY_ATTR_PROMPT | TIDY_ATTR_RBSPAN | TIDY_ATTR_READONLY | TIDY_ATTR_REL | TIDY_ATTR_REV | TIDY_ATTR_RIGHTMARGIN | TIDY_ATTR_ROWS | TIDY_ATTR_ROWSPAN | TIDY_ATTR_RULES | TIDY_ATTR_SCHEME | TIDY_ATTR_SCOPE | TIDY_ATTR_SCROLLING | TIDY_ATTR_SELECTED | TIDY_ATTR_SHAPE | TIDY_ATTR_SHOWGRID | TIDY_ATTR_SHOWGRIDX | TIDY_ATTR_SHOWGRIDY | TIDY_ATTR_SIZE | TIDY_ATTR_SPAN | TIDY_ATTR_SRC | TIDY_ATTR_STANDBY | TIDY_ATTR_START | TIDY_ATTR_STYLE | TIDY_ATTR_SUMMARY | TIDY_ATTR_TABINDEX | TIDY_ATTR_TARGET | TIDY_ATTR_TEXT | TIDY_ATTR_TITLE | TIDY_ATTR_TOPMARGIN | TIDY_ATTR_TYPE | TIDY_ATTR_USEMAP | TIDY_ATTR_VALIGN | TIDY_ATTR_VALUE | TIDY_ATTR_VALUETYPE | TIDY_ATTR_VERSION | TIDY_ATTR_VLINK | TIDY_ATTR_VSPACE | TIDY_ATTR_WIDTH | TIDY_ATTR_WRAP | TIDY_ATTR_XML_LANG | TIDY_ATTR_XML_SPACE | TIDY_ATTR_XMLNS |
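Instead of copying these tables, the constants can also be listed at runtime: get_defined_constants(true) groups constants by extension, and the tidy constants appear under the 'tidy' key when the extension is loaded. A sketch assuming PHP 7 or later; tidy_constants_with_prefix() is a hypothetical helper that returns an empty array when the extension is missing.

```php
<?php
// Hypothetical helper: return the tidy constants (name => value)
// whose names start with $prefix.
function tidy_constants_with_prefix(string $prefix): array
{
    $groups = get_defined_constants(true);
    // the 'tidy' group only exists when the extension is loaded
    $tidy = $groups['tidy'] ?? [];
    return array_filter(
        $tidy,
        function ($name) use ($prefix) { return strpos($name, $prefix) === 0; },
        ARRAY_FILTER_USE_KEY
    );
}

$tags  = tidy_constants_with_prefix('TIDY_TAG_');
$attrs = tidy_constants_with_prefix('TIDY_ATTR_');
printf("%d tag constants, %d attribute constants\n", count($tags), count($attrs));
```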
Table 4. tidy nodetype constants
constant | description |
---|---|
TIDY_NODETYPE_ROOT | root node |
TIDY_NODETYPE_DOCTYPE | doctype |
TIDY_NODETYPE_COMMENT | HTML comment |
TIDY_NODETYPE_PROCINS | processing instruction |
TIDY_NODETYPE_TEXT | text |
TIDY_NODETYPE_START | start tag |
TIDY_NODETYPE_END | end tag |
TIDY_NODETYPE_STARTEND | empty tag |
TIDY_NODETYPE_CDATA | CDATA |
TIDY_NODETYPE_SECTION | XML section |
TIDY_NODETYPE_ASP | ASP code |
TIDY_NODETYPE_JSTE | JSTE code |
TIDY_NODETYPE_PHP | PHP code |
TIDY_NODETYPE_XMLDECL | XML declaration |
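These node type constants are the values found in a node's type property. As an illustration, the hypothetical count_node_types() helper below (assuming PHP 7 or later) tallies the nodes of each type in a parsed document, keyed by the TIDY_NODETYPE_* value.

```php
<?php
// Hypothetical helper: count nodes per TIDY_NODETYPE_* value,
// including $node itself and all of its descendants.
function count_node_types(tidyNode $node): array
{
    $counts = [$node->type => 1];
    foreach ((array) $node->child as $child) {
        // merge the child's counts into ours
        foreach (count_node_types($child) as $type => $n) {
            $counts[$type] = ($counts[$type] ?? 0) + $n;
        }
    }
    return $counts;
}

$doc = tidy_parse_string('<!-- a comment --><p>Hello <br> world</p>');
$counts = count_node_types($doc->root());
printf("start tags: %d, comments: %d, text nodes: %d\n",
    $counts[TIDY_NODETYPE_START] ?? 0,
    $counts[TIDY_NODETYPE_COMMENT] ?? 0,
    $counts[TIDY_NODETYPE_TEXT] ?? 0);
```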
This simple example shows basic Tidy usage.
Example 2. Basic Tidy usage
<?php
ob_start();
?>
<html>a html document</html>
<?php
$html = ob_get_clean();

// Specify configuration
$config = array(
           'indent'         => true,
           'output-xhtml'   => true,
           'wrap'           => 200);

// Tidy
$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();

// Output
echo $tidy;
?>
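For simple cases, the procedural tidy_repair_string() takes the markup, the configuration array and the encoding in a single call and returns the repaired document as a string, which should give essentially the same output as the parseString()/cleanRepair() pair above. A sketch using the same configuration, assuming PHP 5.4 or later for the short array syntax:

```php
<?php
// Same options as the object-oriented example
$config = [
    'indent'       => true,
    'output-xhtml' => true,
    'wrap'         => 200,
];

// Parse and repair in one call; returns the cleaned document as a string
$clean = tidy_repair_string('<html>a html document</html>', $config, 'utf8');
echo $clean;
```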
Paul Cook
11-Apr-2006 11:25
To get libtidy and PHP 5.0.5 compiled on OS X Tiger, this is what I needed to do:
1) Download and unpack the tidy source.
2) cd tidy-source-dir
3) >> /bin/sh build/gnuauto/setup.sh
4) Then you can configure/make/make install as normal.
The PHP build generated errors because of tidy, so I needed to edit the platform.h file as follows (use your favorite command-line editor):
5) >> sudo emacs /usr/local/include/platform.h
6) Comment out line 508, which was causing the 'duplicate "unsigned"' error in the PHP build.
7) configure/make/make install PHP as normal using --with-tidy=/usr/local
Restart Apache and everything works now. HTH someone.
patatraboum at nospam dot fr
26-Feb-2006 02:13
<?php
//
//The tidy tree of your favorite page!
//For PHP 5 (CGI)
//Thanks to john@php.net
//
$file="http://www.php.net";
//
$cns=get_defined_constants(true);
$tidyCns=array("tags"=>array(),"types"=>array());
foreach($cns["tidy"] as $cKey=>$cVal){
if($cPos=strpos($cKey,$cStr="TAG")) $tidyCns["tags"][$cVal]="$cStr : ".substr($cKey,$cPos+strlen($cStr)+1);
elseif($cPos=strpos($cKey,$cStr="TYPE")) $tidyCns["types"][$cVal]="$cStr : ".substr($cKey,$cPos+strlen($cStr)+1);
}
$tidyNext=array();
//
echo "<html><head><meta http-equiv='Content-Type' content='text/html; charset=windows-1252'><title>Tidy Tree :: $file</title></head>";
echo "<body><pre>";
//
tidyTree(tidy_get_root(tidy_parse_file($file)),0);
//
function tidyTree($tidy,$level){
global $tidyCns,$tidyNext;
$tidyTab=array();
$tidyKeys=array("type","value","id","attribute");
foreach($tidy as $pKey=>$pVal){
if(in_array($pKey,$tidyKeys)) $tidyTab[array_search($pKey,$tidyKeys)]=$pVal;
}
ksort($tidyTab);
foreach($tidyTab as $pKey=>$pVal){
switch($pKey){
case 0 :
if($pVal==TIDY_NODETYPE_TEXT) $value=true; else $value=false; // print the value of text nodes only
echo indent(true,$level).$tidyCns["types"][$pVal]."\n"; break;
case 1 :
if($value){
echo indent(false,$level)."VALUE : ".str_replace("\n","\n".indent(false,$level),$pVal)."\n";
}
break;
case 2 :
echo indent(false,$level).$tidyCns["tags"][$pVal]."\n"; break;
case 3 :
if($pVal!=NULL){
echo indent(false,$level)."ATTRIBUTES : ";
foreach ($pVal as $aKey=>$aVal) echo "$aKey=$aVal "; echo "\n";
}
}
}
if($tidy->hasChildren()){
$level++; $i=0;
$tidyNext[$level]=true;
echo indent(false,$level)."\n";
foreach($tidy->child as $child){
$i++;
if($i==count($tidy->child)) $tidyNext[$level]=false;
tidyTree($child,$level);
}
}
else echo indent(false,$level)."\n";
}
//
function indent($tidyType,$level){
global $tidyNext;
$indent="";
for($i=1;$i<=$level;$i++){
if($i<$level||!$tidyType){
if($tidyNext[$i]) $str="| "; else $str=" ";
}
else $str="+--";
$indent=$indent.$str;
}
return $indent;
}
//
echo "</pre></body></html>";
//
?>
tonygambone at gNOSPAMmail dot com
07-Feb-2006 12:03
Using PHP 5.1.2 on Win32/IIS, I noticed that even with "output-xhtml: yes," tidy was adding the deprecated name attribute to form tags (using the value of the id attribute). Grabbing the latest dll from the snaps link at the top of the page fixed this.
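When Tidy changes markup unexpectedly, as described above, its diagnostics can help show what it did. This hedged sketch assumes PHP 7 or later; tidy_warnings() is a hypothetical helper that returns the messages Tidy collected while parsing, as read with tidy_get_error_buffer(). The exact wording of the messages depends on the libtidy version.

```php
<?php
// Hypothetical helper: return Tidy's parse diagnostics, one message per element.
function tidy_warnings(string $html): array
{
    $tidy = tidy_parse_string($html);
    $buffer = tidy_get_error_buffer($tidy);
    // the buffer is empty (or FALSE) when Tidy had nothing to report
    return $buffer ? explode("\n", trim($buffer)) : [];
}

foreach (tidy_warnings('<form id="f"><p>unclosed <b>bold</p></form>') as $message) {
    echo $message, "\n";
}
```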
tom at expresshosting dot net
24-Aug-2005 01:50
It should be noted that the examples on this page apply ONLY to PHP5. None of the functions in the manual apply to PHP4. The names are the same but arguments are different on some of them (tidy_parse_string).
If you wish to use tidy in PHP 4.3.x you can use the following example instead:
<?php
// assumes output buffering was started earlier with ob_start()
$tidyhtml = ob_get_contents();
if( function_exists( 'tidy_parse_string' ) ) {
tidy_set_encoding('iso-8859-1');
tidy_parse_string($tidyhtml);
tidy_setopt('output-xhtml', TRUE);
tidy_setopt('indent', TRUE);
tidy_setopt('indent-spaces', 2);
tidy_setopt('wrap', 200);
tidy_clean_repair();
$tidyhtml = tidy_get_output();
}
ob_end_clean();
echo $tidyhtml;
?>
Hope that helps somebody.
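For comparison, here is a sketch of how the same buffer cleanup might look with the PHP 5 API, where the options and the encoding are passed to tidy_parse_string() instead of being set with tidy_setopt() and tidy_set_encoding(). repair_buffer() is a hypothetical name, 'latin1' is assumed to be Tidy's encoding name for iso-8859-1, and the code assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical PHP 5+ counterpart of the PHP 4.3.x snippet above.
function repair_buffer(string $html): string
{
    if (!function_exists('tidy_parse_string')) {
        return $html; // extension not available: pass the markup through unchanged
    }
    $config = [
        'output-xhtml'  => true,
        'indent'        => true,
        'indent-spaces' => 2,
        'wrap'          => 200,
    ];
    // options and encoding are arguments here, not separate setter calls
    $tidy = tidy_parse_string($html, $config, 'latin1');
    tidy_clean_repair($tidy);
    return tidy_get_output($tidy);
}

ob_start();
echo '<p>Hello <b>world</p>';
$html = ob_get_clean();
echo repair_buffer($html);
```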
mohan at asix dot com dot my
11-Feb-2005 04:23
To those who need to install libtidy on Mac OS X, here is a guide that worked for me:
If you're on Mac OS X, you'll need to tell the Makefile that you use
ranlib:
$ export RANLIB=ranlib
Change to the directory with the Makefile in it, and run make.
This example uses the GNU make Makefile.
$ cd tidy/build/gmake/
$ make
if [ ! -d ./obj ]; then mkdir ./obj; fi
gcc -o obj/access.o ...
... etc etc etc ...
Install the libs, headers and the tidy executable:
$ sudo make install
If you're on Mac OS X, you'll have to run ranlib again on the installed
lib:
$ sudo ranlib /usr/local/lib/libtidy.a
Jon Dowland (bugs at alcopop dot org)
01-Feb-2005 07:40
Rough installation instructions for debian/testing:
Use Debian's apt package manager to install the required development packages:
$ apt-get install php4-dev php4-pear libtidy-dev
Then use pear to install tidy:
$ pear install tidy
Note: I did /not/ have success installing the tarball locally. Only with this method was the .so put in the correct place.
I also had to add an entry to the php.ini
$ echo extension=tidy.so >> /etc/php4/apache/php.ini
$ apachectl restart
...and you're done.
14-Jan-2005 03:20
Just in case anyone else has been having problems using the tidy extension in PHP 4.3.10, here is a working example:
<?php
$html = '<HTML><HEAD></HEAD><BODY>Hello World</BODY></HTML>';
$config = array('indent'=> TRUE,
'output-xhtml' => TRUE,
'wrap' => 80);
// PHP 4.3.x (tidy 1.0) API: options are set with tidy_setopt(), not passed to tidy_parse_string()
tidy_set_encoding('UTF8');
foreach ($config as $key => $value) {
tidy_setopt($key,$value);
}
tidy_parse_string($html);
tidy_clean_repair();
echo tidy_get_output();
?>
Resultant HTML should be similar to:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
Hello World
</body>
</html>
doodleelephant
16-Nov-2004 09:34
I'm installing PHP 5.0.2 on Red Hat Linux (I forget the version; Enterprise WS 3, I think). I had trouble installing libtidy: configure consistently complained that it could not find 'libtidy'. I finally got a clue about how to install it (in build/gnuauto/readme.txt). This is how I finally got it to install (after lots of trial and error):
First, don't get the binary distribution from tidy.sf.net. It's not what you want. You need the source distribution.
Command by command this is what I did:
=======
wget http://tidy.sourceforge.net/src/tidy_src.tgz
tar -xzf tidy_src.tgz
cd tidy
/bin/sh build/gnuauto/setup.sh
./configure --prefix=/usr
make
make install
cd [php source directory]
./configure --with-tidy=/usr --[other extensions]
make
make install
=======
Tada. Finally, PHP's configure no longer complains about the tidy installation. The info I needed was stuck in that build/gnuauto/readme.txt file in the tidy directory.
Took me a while. Hope my trials can help others save time.
Doodleelephant
bill dot mccuistion at qbopen dot com
30-Oct-2004 01:53
Installing tidy on Fedora Core 2 required three libraries:
tidy...
tidy-devel...
libtidy...
All of which I found at http://rpm.pbone.net
Then, finally, could "./configure --with-tidy"
Hope this helps someone out. This was "REALLY" hard (for me) to figure out, as it wasn't clearly documented anywhere else.