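CLVI. XML Parser Functions
XML (eXtensible Markup Language) is a data format for structured document interchange on the Web. It is a standard defined by the World Wide Web Consortium (W3C). Information about XML and its related technologies can be found at http://www.w3.org/XML/.
This extension provides support for James Clark's expat. This toolkit lets you parse XML documents, but not validate them. It supports three source character encodings that are also supported by PHP itself: US-ASCII, ISO-8859-1 and UTF-8. UTF-16 is not supported.
This extension lets you create XML parsers and define handlers for different XML events. Each XML parser also has a few parameters that you can adjust as needed.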
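These functions are enabled by default and use the bundled expat library. You can disable XML support with --disable-xml. If you compile PHP as a module for Apache 1.3.9 or later, PHP will automatically use the expat library bundled with Apache. If you do not want to use that bundled expat library, run PHP's configure script with --with-expat-dir=DIR, where DIR should point to the base installation directory of expat.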
The Windows version of PHP has built-in support for this extension. You do not need to load any additional extension in order to use these functions.
This extension defines no configuration directives in php.ini.
The constants below are defined by this extension, and are only available when the extension has either been compiled into PHP or dynamically loaded at runtime.
- XML_ERROR_NONE (integer)
- XML_ERROR_NO_MEMORY (integer)
- XML_ERROR_SYNTAX (integer)
- XML_ERROR_NO_ELEMENTS (integer)
- XML_ERROR_INVALID_TOKEN (integer)
- XML_ERROR_UNCLOSED_TOKEN (integer)
- XML_ERROR_PARTIAL_CHAR (integer)
- XML_ERROR_TAG_MISMATCH (integer)
- XML_ERROR_DUPLICATE_ATTRIBUTE (integer)
- XML_ERROR_JUNK_AFTER_DOC_ELEMENT (integer)
- XML_ERROR_PARAM_ENTITY_REF (integer)
- XML_ERROR_UNDEFINED_ENTITY (integer)
- XML_ERROR_RECURSIVE_ENTITY_REF (integer)
- XML_ERROR_ASYNC_ENTITY (integer)
- XML_ERROR_BAD_CHAR_REF (integer)
- XML_ERROR_BINARY_ENTITY_REF (integer)
- XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF (integer)
- XML_ERROR_MISPLACED_XML_PI (integer)
- XML_ERROR_UNKNOWN_ENCODING (integer)
- XML_ERROR_INCORRECT_ENCODING (integer)
- XML_ERROR_UNCLOSED_CDATA_SECTION (integer)
- XML_ERROR_EXTERNAL_ENTITY_HANDLING (integer)
- XML_OPTION_CASE_FOLDING (integer)
- XML_OPTION_TARGET_ENCODING (integer)
- XML_OPTION_SKIP_TAGSTART (integer)
- XML_OPTION_SKIP_WHITE (integer)
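The element handler functions may receive their element names case-folded. Case-folding is defined by the XML standard as "a process applied to a sequence of characters, in which those identified as non-uppercase are replaced by their uppercase equivalents". In other words, for XML, case-folding simply means converting a string to uppercase.
By default, all element names passed to the handler functions are case-folded. This behaviour can be queried and controlled with the xml_parser_get_option() and xml_parser_set_option() functions, respectively.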
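The effect of case-folding can be seen with a small sketch. It assumes PHP 7 or later (for the type declarations and closures); the helper name collectNames() is made up for this illustration and is not part of the extension.

```php
<?php
// Hypothetical helper: returns the element names exactly as the
// start-element handler receives them.
function collectNames(string $xml, bool $caseFolding): array
{
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($parser, $name) {
            // End tags are not needed for this demonstration.
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<doc><Para>text</Para></doc>';
$folded    = collectNames($xml, true);   // names arrive upper-cased
$asWritten = collectNames($xml, false);  // names arrive as written

print_r($folded);
print_r($asWritten);
```

Turning case-folding off is usually the safer choice when element names that differ only in case must be told apart.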
The following constants are defined as XML error codes. xml_parse() itself only reports success or failure; the code is obtained afterwards with xml_get_error_code():
XML_ERROR_NONE, XML_ERROR_NO_MEMORY, XML_ERROR_SYNTAX, XML_ERROR_NO_ELEMENTS, XML_ERROR_INVALID_TOKEN, XML_ERROR_UNCLOSED_TOKEN, XML_ERROR_PARTIAL_CHAR, XML_ERROR_TAG_MISMATCH, XML_ERROR_DUPLICATE_ATTRIBUTE, XML_ERROR_JUNK_AFTER_DOC_ELEMENT, XML_ERROR_PARAM_ENTITY_REF, XML_ERROR_UNDEFINED_ENTITY, XML_ERROR_RECURSIVE_ENTITY_REF, XML_ERROR_ASYNC_ENTITY, XML_ERROR_BAD_CHAR_REF, XML_ERROR_BINARY_ENTITY_REF, XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF, XML_ERROR_MISPLACED_XML_PI, XML_ERROR_UNKNOWN_ENCODING, XML_ERROR_INCORRECT_ENCODING, XML_ERROR_UNCLOSED_CDATA_SECTION, XML_ERROR_EXTERNAL_ENTITY_HANDLING
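A minimal sketch of how an error code might be inspected after a failed parse. xml_parse() returns 0 on failure, and the code is then read with xml_get_error_code(). The input string is a made-up example, and the exact code and message depend on the underlying XML library, so none is shown here.

```php
<?php
$parser = xml_parser_create();

// The closing tag does not match the innermost open tag, so parsing fails.
$ok = xml_parse($parser, '<a><b></a>', true);

$code    = xml_get_error_code($parser);
$message = xml_error_string($code);
$line    = xml_get_current_line_number($parser);

if (!$ok) {
    printf("XML error %d (%s) at line %d\n", $code, $message, $line);
}
```

Comparing $code against the constants listed above is possible, but checking the return value of xml_parse() first is the simpler test.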
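PHP's XML extension supports the Unicode character set through different character encodings. There are two kinds of character encoding: source encoding and target encoding. PHP's internal representation of the document is always UTF-8.
Source encoding is applied when an XML document is parsed. The source encoding can be specified when an XML parser is created (it cannot be changed later in the parser's lifetime). The supported encodings are ISO-8859-1, US-ASCII and UTF-8. The first two are single-byte encodings, which means that each character is represented by a single byte. UTF-8 can encode characters made up of a variable number of bits (up to 21) in one to four bytes. The default source encoding used by PHP is ISO-8859-1.
Target encoding is applied when PHP passes data to the XML handler functions. When an XML parser is created, the target encoding is set to the same encoding as the source encoding, but it may be changed at any time. The target encoding affects character data as well as tag names and processing instruction targets (PI targets).
If the XML parser encounters characters outside the range that its source encoding can represent, it returns an error.
If PHP encounters characters in the parsed XML document that cannot be represented in the current target encoding, those characters are "demoted". Put simply, they are replaced by a question mark.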
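The interplay of source and target encoding can be sketched as follows. The sample text is an assumption chosen for illustration: one character that fits in ISO-8859-1 and one that does not. The bytes are written as hex escapes so the script does not depend on the encoding of the source file.

```php
<?php
// The source encoding is fixed when the parser is created.
$parser = xml_parser_create('UTF-8');

// The target encoding may still be changed afterwards.
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'ISO-8859-1');

$received = '';
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$received) {
        // The handler may be called more than once per text node.
        $received .= $data;
    }
);

// UTF-8 bytes for U+00E9 (fits in ISO-8859-1) and U+20AC (does not).
xml_parse($parser, "<p>\xC3\xA9\xE2\x82\xAC</p>", true);

// Hex dump of what the handler received in the target encoding.
echo bin2hex($received), "\n";
```

The first character should arrive as a single ISO-8859-1 byte, while the second cannot be represented and is demoted to a question mark.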
Here are some example PHP scripts that parse XML documents.
The first example displays the structure of the start elements in a document with indentation.
Example 1. Show XML Element Structure
<?php
$file = "data.xml";
$depth = array();

function startElement($parser, $name, $attrs)
{
    global $depth;
    for ($i = 0; $i < $depth[$parser]; $i++) {
        echo "  ";
    }
    echo "$name\n";
    $depth[$parser]++;
}

function endElement($parser, $name)
{
    global $depth;
    $depth[$parser]--;
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
?>
Example 2. Map XML to HTML
This example maps tags in an XML document directly to HTML tags. Elements not found in the "map array" are ignored. Of course, this example will only work with one specific XML document type.
<?php
$file = "data.xml";
$map_array = array(
    "BOLD"     => "B",
    "EMPHASIS" => "I",
    "LITERAL"  => "TT"
);

function startElement($parser, $name, $attrs)
{
    global $map_array;
    if (isset($map_array[$name])) {
        echo "<$map_array[$name]>";
    }
}

function endElement($parser, $name)
{
    global $map_array;
    if (isset($map_array[$name])) {
        echo "</$map_array[$name]>";
    }
}

function characterData($parser, $data)
{
    echo $data;
}

$xml_parser = xml_parser_create();
// use case-folding so we are sure to find the element names in $map_array
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
?>
This example highlights XML source code. It illustrates how to use an external entity reference handler to include and parse other documents, how PIs can be processed, and one way of determining "trust" for PIs that contain code.
The XML documents that can be used with this example (xmltest.xml and xmltest2.xml) are listed after the example.
Example 3. External Entity Example
<?php
$file = "xmltest.xml";

function trustedFile($file)
{
    // only trust local files owned by ourselves
    // (preg_match() is used here because eregi() was removed in PHP 7)
    if (!preg_match('#^[a-z]+://#i', $file)
        && fileowner($file) == getmyuid()) {
        return true;
    }
    return false;
}

function startElement($parser, $name, $attribs)
{
    echo "&lt;<font color=\"#0000cc\">$name</font>";
    if (count($attribs)) {
        foreach ($attribs as $k => $v) {
            echo " <font color=\"#009900\">$k</font>=\"<font color=\"#990000\">$v</font>\"";
        }
    }
    echo "&gt;";
}

function endElement($parser, $name)
{
    echo "&lt;/<font color=\"#0000cc\">$name</font>&gt;";
}

function characterData($parser, $data)
{
    echo "<b>$data</b>";
}

function PIHandler($parser, $target, $data)
{
    switch (strtolower($target)) {
        case "php":
            global $parser_file;
            // If the parsed document is "trusted", we say it is safe
            // to execute PHP code inside it. If not, display the code
            // instead.
            if (trustedFile($parser_file[$parser])) {
                eval($data);
            } else {
                printf("Untrusted PHP code: <i>%s</i>",
                       htmlspecialchars($data));
            }
            break;
    }
}

function defaultHandler($parser, $data)
{
    if (substr($data, 0, 1) == "&" && substr($data, -1, 1) == ";") {
        printf('<font color="#aa00aa">%s</font>', htmlspecialchars($data));
    } else {
        printf('<font size="-1">%s</font>', htmlspecialchars($data));
    }
}

function externalEntityRefHandler($parser, $openEntityNames, $base, $systemId,
                                  $publicId)
{
    if ($systemId) {
        if (!list($parser, $fp) = new_xml_parser($systemId)) {
            printf("Could not open entity %s at %s\n", $openEntityNames,
                   $systemId);
            return false;
        }
        while ($data = fread($fp, 4096)) {
            if (!xml_parse($parser, $data, feof($fp))) {
                printf("XML error: %s at line %d while parsing entity %s\n",
                       xml_error_string(xml_get_error_code($parser)),
                       xml_get_current_line_number($parser), $openEntityNames);
                xml_parser_free($parser);
                return false;
            }
        }
        xml_parser_free($parser);
        return true;
    }
    return false;
}

function new_xml_parser($file)
{
    global $parser_file;

    $xml_parser = xml_parser_create();
    xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 1);
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");
    xml_set_processing_instruction_handler($xml_parser, "PIHandler");
    xml_set_default_handler($xml_parser, "defaultHandler");
    xml_set_external_entity_ref_handler($xml_parser, "externalEntityRefHandler");

    if (!($fp = @fopen($file, "r"))) {
        return false;
    }
    if (!is_array($parser_file)) {
        settype($parser_file, "array");
    }
    $parser_file[$xml_parser] = $file;
    return array($xml_parser, $fp);
}

if (!(list($xml_parser, $fp) = new_xml_parser($file))) {
    die("could not open XML input");
}

echo "<pre>";
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d\n",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
echo "</pre>";
echo "parse complete\n";
xml_parser_free($xml_parser);
?>
Example 4. xmltest.xml
<?xml version='1.0'?>
<!DOCTYPE chapter SYSTEM "/just/a/test.dtd" [
<!ENTITY plainEntity "FOO entity">
<!ENTITY systemEntity SYSTEM "xmltest2.xml">
]>
<chapter>
<TITLE>Title &plainEntity;</TITLE>
<para>
<informaltable>
<tgroup cols="3">
<tbody>
<row><entry>a1</entry><entry morerows="1">b1</entry><entry>c1</entry></row>
<row><entry>a2</entry><entry>c2</entry></row>
<row><entry>a3</entry><entry>b3</entry><entry>c3</entry></row>
</tbody>
</tgroup>
</informaltable>
</para>
&systemEntity;
<section id="about">
<title>About this Document</title>
<para>
<!-- this is a comment -->
<?php echo 'Hi! This is PHP version ' . phpversion(); ?>
</para>
</section>
</chapter>
The following document is included from xmltest.xml:
Example 5. xmltest2.xml
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY testEnt "test entity">
]>
<foo>
<element attrib="value"/>
&testEnt;
<?php echo "This is some more PHP code being executed."; ?>
</foo>
james at clickmedia dot com
04-May-2006 11:35
Thanks for fixing those extra backslashes for me. I thought the double backslashes were necessary because my code sample didn't seem to display properly in the Preview page when I used single backslashes. I figured someone would eventually correct it for me. :)
Anyways, I have an update to storing HTML within XML. Different servers seem to handle HTML entities differently and I ended up having problems even after changing the trim() function to ignore spaces.
My latest solution is to just enclose the HTML using CDATA blocks. For example:
<mytag><![CDATA[Here is a bunch of <a href="#">HTML code</a>]]></mytag>
This seems to be a more compatible solution than making the trim() function ignore spaces.
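A minimal sketch of the CDATA approach, reusing the sample tag from above. It assumes the character data handler receives the content of CDATA sections, which is how the extension normally behaves when no other handler claims it; the variable names are made up for the example.

```php
<?php
$xml = '<mytag><![CDATA[Here is a bunch of <a href="#">HTML code</a>]]></mytag>';

$content = '';
$parser = xml_parser_create();
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$content) {
        // Concatenate, because the data may arrive in several pieces.
        $content .= $data;
    }
);
xml_parse($parser, $xml, true);

echo $content, "\n";
```

Because the markup inside the CDATA section is never parsed as XML, spaces around the inline HTML stay where they were.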
php dot notes at stoecklin dot net
02-May-2006 01:15
I noticed a tiny flaw at line 4 of James' update of Felix' code. The doubled backslashes add a literal backslash and the characters t, n, r, 0, x and B to the character list, and not tabs, new lines, carriage returns, NUL bytes, and vertical tabs respectively. The rest of the code is reproduced as is:
<?php
function dataHandler($parser, $data) {
//Trims everything except for spaces
if($data = trim($data, "\t\n\r\0\x0B")) {
$index = count($this->data) - 1;
if(isset($this->data[$index]['content'])) {
$this->data[$index]['content'] .= $data;
}
else $this->data[$index]['content'] = $data;
}
}
?>
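The difference between the two character lists can be checked with a short sketch; the sample strings are made up for the illustration.

```php
<?php
$text = "\ttext\t";

// "\t" in double quotes is a single tab character, so the tabs are removed.
$tabsTrimmed = trim($text, "\t\n\r\0\x0B");

// "\\t" is two characters: a backslash and the letter "t".
// Neither matches the tab at either end, so nothing is removed.
$unchanged = trim($text, "\\t");

// The same character list strips the letter "t" from plain text instead.
$lettersLost = trim("text", "\\t");

var_dump($tabsTrimmed, $unchanged, $lettersLost);
```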
james at clickmedia dot com
14-Apr-2006 10:12
Felix's fix for Raphael's code works almost perfectly for me. I'm storing HTML within my XML so any HTML special characters are encoded as HTML entities. The dataHandler() function trims the spaces around HTML entities, which causes problems with inline HTML like span or anchor tags. When I need to print out my inline HTML again, all the text is bunched together with no spaces.
To fix this problem, I just updated Felix's fixed dataHandler function. This shouldn't cause any problems since spaces are pretty harmless in HTML.
<?php
function dataHandler($parser, $data) {
//Trims everything except for spaces
if($data = trim($data, "\\t\\n\\r\\0\\x0B")) {
$index = count($this->data) - 1;
if(isset($this->data[$index]['content'])) {
$this->data[$index]['content'] .= $data;
}
else $this->data[$index]['content'] = $data;
}
}
?>
vankata at mikromaxbg dot com
28-Feb-2006 07:19
Here is an XML parser that creates an array and can evaluate PHP code. If the PHP code outputs a string, that string is added to the array.
Here is the XML
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY testEnt "test entity">
]>
<document>
<foo>
<?php $a = 'foo'; $b = 4; $c = $a.$b; echo "<b>It even evaluates html code $c</b>\n"; ?>
</foo>
<foo>hi this is text
</foo>
<foo> <?php echo $a; ?>
</foo>
</document>
Note that <?php echo $a; ?> will not echo "foo"
You might need to use include('foo.php').
And this is the Parser
<?php
$file = "xmltest2.xml";
$stack = array();
// start_element_handler ( resource parser, string name, array attribs )
function startElement($parser, $name, $attribs)
{
global $stack;
$tag=array("name"=>$name,"attrs"=>$attribs);
array_push($stack,$tag);
}
// end_element_handler ( resource parser, string name )
function endElement($parser, $name)
{
global $stack;
$stack[count($stack)-2]['children'][] = $stack[count($stack)-1];
array_pop($stack);
}
// handler ( resource parser, string data )
function characterData($parser, $data)
{
global $stack,$i;
if(trim($data))
{
$stack[count($stack)-1]['data'] .= $data;
}
}
// handler ( resource parser, string target, string data )
function PIHandler($parser, $target, $data)
{
global $stack,$i;
//PHP EVALUATION
if ((strtolower($target)) == "php") {
global $parser_file;
//eval($data); just doesn't work if you want the
//string added to the array
$text = '<?php '.$data.' ?>';
ob_start();
eval ('?>' . $text);
$text = ob_get_clean();
$stack[count($stack)-1]['data'] .= $text;
}
}
function defaultHandler($parser, $data)
{
}
function externalEntityRefHandler($parser, $openEntityNames, $base, $systemId,
$publicId) {
if ($systemId) {
if (!list($parser, $fp) = new_xml_parser($systemId)) {
printf("Could not open entity %s at %s\n", $openEntityNames,
$systemId);
return false;
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($parser, $data, feof($fp))) {
printf("XML error: %s at line %d while parsing entity %s\n",
xml_error_string(xml_get_error_code($parser)),
xml_get_current_line_number($parser), $openEntityNames);
xml_parser_free($parser);
return false;
}
}
xml_parser_free($parser);
return true;
}
return false;
}
function new_xml_parser($file)
{
global $parser_file;
$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_processing_instruction_handler($xml_parser, "PIHandler");
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
xml_set_default_handler($xml_parser, "defaultHandler");
xml_set_external_entity_ref_handler($xml_parser, "externalEntityRefHandler");
if (!($fp = @fopen($file, "r"))) {
return false;
}
if (!is_array($parser_file)) {
settype($parser_file, "array");
}
$parser_file[$xml_parser] = $file;
return array($xml_parser, $fp);
}
if (!(list($xml_parser, $fp) = new_xml_parser($file))) {
die("could not open XML input");
}
//ERROR
while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d\n",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
}
//END
xml_parser_free($xml_parser);
print("<pre>\n");
print_r($stack);
print("</pre>\n");
?>
forquan
29-Jan-2006 07:45
Here's code that will create an associative array from an xml file. Keys are the tag data and subarrays are formed from attributes and child tags
<?php
$p = new xmlParser();
$p->parse('/*xml file*/');
print_r($p->output);
?>
<?php
class xmlParser{
var $xml_obj = null;
var $output = array();
var $attrs;
function xmlParser(){
$this->xml_obj = xml_parser_create();
xml_set_object($this->xml_obj,$this);
xml_set_character_data_handler($this->xml_obj, 'dataHandler');
xml_set_element_handler($this->xml_obj, "startHandler", "endHandler");
}
function parse($path){
if (!($fp = fopen($path, "r"))) {
die("Cannot open XML data file: $path");
return false;
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($this->xml_obj, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($this->xml_obj)),
xml_get_current_line_number($this->xml_obj)));
xml_parser_free($this->xml_obj);
}
}
return true;
}
function startHandler($parser, $name, $attribs){
$_content = array();
if(!empty($attribs))
$_content['attrs'] = $attribs;
array_push($this->output, $_content);
}
function dataHandler($parser, $data){
if(!empty($data) && $data!="\n") {
$_output_idx = count($this->output) - 1;
$this->output[$_output_idx]['content'] .= $data;
}
}
function endHandler($parser, $name){
if(count($this->output) > 1) {
$_data = array_pop($this->output);
$_output_idx = count($this->output) - 1;
$add = array();
if ($_data['attrs'])
$add['attrs'] = $_data['attrs'];
if ($_data['child'])
$add['child'] = $_data['child'];
$this->output[$_output_idx]['child'][$_data['content']] = $add;
}
}
}
?>
sander at gameqube dot nl
25-Dec-2005 07:07
There is a flaw in aerik's xml2array function. It can extract several different nodes with the same name, but a child element with the same name as its parent will go wrong.
For example:
<root>
<element>
<element>2</element>
</element>
</root>
The value of the first 'element' element will be '<element>2'. This is not easily solved, and personally I didn't find a solution for it. The problem lies in using a regular expression to parse the XML document, so beware of this drawback when using this method.
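For comparison, an event-based parser does not have this limitation, because it tracks nesting itself. The sketch below is one possible approach, not a fix for xml2array; it assumes PHP 7 or later, and the path-keyed $values array is made up for the illustration.

```php
<?php
$xml = '<root><element><element>2</element></element></root>';

$stack  = [];  // names of the currently open elements
$values = [];  // text content keyed by element path

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$stack) {
        $stack[] = $name;
    },
    function ($parser, $name) use (&$stack) {
        array_pop($stack);
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$stack, &$values) {
        $path = '/' . implode('/', $stack);
        $values[$path] = ($values[$path] ?? '') . $data;
    }
);
xml_parse($parser, $xml, true);

print_r($values);
```

The inner and outer elements end up under different paths, so a child sharing its parent's name is not confused with it. Names are upper-cased here because case-folding is on by default.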
orbitphreak at yahoo dot com
15-Dec-2005 02:43
Here's another version of raphael's XMLParser class. The modification allows you to parse a remote (HTTP) XML file into an array without the need of fopen or fsockopen, which are usually disabled by hosting companies.
<?PHP
class XMLParser {
var $xml_url;
var $xml;
var $data;
function XMLParser($xml_url) {
$this->xml_url = $xml_url;
$this->xml = xml_parser_create();
xml_set_object($this->xml, $this);
xml_set_element_handler($this->xml, 'startHandler', 'endHandler');
xml_set_character_data_handler($this->xml, 'dataHandler');
$this->parse($xml_url);
}
function parse($xml_url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $xml_url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);
// The whole document is passed in one call, so flag it as the final chunk.
$parse = xml_parse($this->xml, $data, true);
if (!$parse) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($this->xml)),
xml_get_current_line_number($this->xml)));
xml_parser_free($this->xml
);
}
return true;
}
function startHandler($parser, $name, $attributes) {
$data['name'] = $name;
if ($attributes) { $data['attributes'] = $attributes; }
$this->data[] = $data;
}
function dataHandler($parser, $data) {
if ($data = trim($data)) {
$index = count($this->data) - 1;
// begin multi-line bug fix (use the .= operator)
$this->data[$index]['content'] .= $data;
// end multi-line bug fix
}
}
function endHandler($parser, $name) {
if (count($this->data) > 1) {
$data = array_pop($this->data);
$index = count($this->data) - 1;
$this->data[$index]['child'][] = $data;
}
}
}
?>
Examples on how to use it to parse out geographic data are here:
http://www.digital-seven.net/?option=com_content&task=view&id=69
memandeemail at gmail dot com
13-Dec-2005 02:37
XML TO DB
XML Example:
<?xml version="1.0" encoding="iso-8859-1"?>
<data>
<table name="bairro" primary_key="hash">
<row>
<field name="hash">1</field>
<field name="descr">Centro</field>
</row>
</table>
</data>
<?php
class xmlToDB {
var $parser;
var $cn;
var $table;
var $recordset = array();
var $content;
var $value;
var $field;
function xmlToDB() {
$this->parser = xml_parser_create();
xml_set_object($this->parser,$this);
xml_set_element_handler($this->parser,"tag_open","tag_close");
xml_set_character_data_handler($this->parser,"cdata");
$this->cn = @mysql_pconnect(XOOPS_DB_HOST,XOOPS_DB_USER,XOOPS_DB_PASS) or trigger_error('Erro ao conectar ao servidor de dados',E_USER_ERROR);
@mysql_select_db(XOOPS_DB_NAME,$this->cn) or trigger_error('Erro ao selecionar o banco de dados',E_USER_ERROR);
}
function parse($data) {
@xml_parse($this->parser, $data) or
die(sprintf("Erro de XML: %s na linha %d",
xml_error_string(xml_get_error_code($this->parser)),
xml_get_current_line_number($this->parser)));
}
function tag_open($parser, $tag, $attributes) {
$parser;
switch ($tag) {
case 'DATA': break;
case 'TABLE': {
$this->table = $attributes['NAME'];
mysql_unbuffered_query("TRUNCATE $this->table");
$this->recordset = array();
//[todo]primary key no usada
} break;
case 'ROW': break;
case 'FIELD': {
$this->content = (isset($attributes['CONTENT'])) ? $attributes['CONTENT'] : '';
$this->field = $attributes['NAME'];
}
}
}
function cdata($parser, $cdata) {
$parser;
switch ($this->content) {
case 'base64': {
$this->value = base64_decode($cdata);
} break;
default: {
$this->value = $cdata;
}
}
}
function tag_close($parser, $tag) {
$parser;
switch ($tag) {
case 'DATA': break;
case 'TABLE': break;
case 'ROW': {
$sql = "INSERT INTO $this->table";
$iC=0;
foreach ($this->recordset as $tcaption => $tvalue) {
if ($iC==0) $sql .= "\nSET "; else $sql .= ", ";
$sql .= "`$tcaption` = '";
$sql .= mysql_real_escape_string($tvalue,$this->cn);
$sql .= "'";
$iC++;
}
mysql_unbuffered_query($sql,$this->cn);
} break;
case 'FIELD': {
$this->recordset[$this->field] = $this->value;
}
}
}
}
//require('mainfile.php');
require('config.php');
define('XOOPS_DB_HOST', DB_HOST);
define('XOOPS_DB_USER', DB_USUARIO);
define('XOOPS_DB_PASS', DB_SENHA);
define('XOOPS_DB_NAME', 'xoops');
$xsparse = new xmlToDB();
$file = 'bulkinsert.xml';
$fnum = @fopen($file,'r') or exit();//trigger_error('Erro ao abrir o XML',E_USER_ERROR);
while (($data = fread($fnum,4194304))) {
$xsparse->parse($data);
}
fclose($fnum);
?>
<script>self.close();</script>
aerik at wikidweb dot com
26-Nov-2005 11:24
Thanks Janne and zacheryph, for the fixes. I found the same problem with the regex, and did a pretty major re-write, which you may be interested in. This works with multiple tags having the same name, attributes, self closing tags, and text. It gets the closing tag, if there is one, so you can figure out if you have a self closing tag or a regular tag. I think it does all the basic stuff...
// Function xml2array
// takes an xml string and returns an array of elements
// each element is an associative array, consisting of 'name', 'text',
// possible 'attributes', sub 'elements', and closing tag if there is one.
// If there are attributes, they are also
// an associative array whose key values are the names of the attributes
// and the values are the array values
// Aerik Sylvan, Oct 27 2005
function xml2array ($xml)
{
$xmlary = array ();
if ((strlen ($xml) < 256) && is_file ($xml))
$xml = file_get_contents ($xml);
$ReElements = '/<(\w+)\s*([^\/>]*)\s*(?:\/>|>(.*?)<(\/\s*\1\s*)>)/s';
$ReAttributes = '/(\w+)=(?:"|\')([^"\']*)(?:"|\')/';
preg_match_all ($ReElements, $xml, $elements);
foreach ($elements[1] as $ie => $xx) {
$xmlary[$ie]["name"] = $elements[1][$ie];
if ( $attributes = trim($elements[2][$ie])) {
preg_match_all ($ReAttributes, $attributes, $att);
foreach ($att[1] as $ia => $xx)
// all the attributes for current element are added here
$xmlary[$ie]["attributes"][$att[1][$ia]] = $att[2][$ia];
} // if $attributes
// get text if it's combined with sub elements
$cdend = strpos($elements[3][$ie],"<");
if ($cdend > 0) {
$xmlary[$ie]["text"] = substr($elements[3][$ie],0,$cdend -1);
} // if cdend
if (preg_match ($ReElements, $elements[3][$ie])){
$xmlary[$ie]["elements"] = xml2array ($elements[3][$ie]);
}
else if (isset($elements[3][$ie])){
$xmlary[$ie]["text"] = $elements[3][$ie];
}
$xmlary[$ie]["closetag"] = $elements[4][$ie];
}//foreach ?
return $xmlary;
}
Greg S
18-Nov-2005 12:56
If you need utf8_encode support and configure PHP with --disable-all you will have some trouble. Unfortunately the configure options aren't completely documented. If you need utf8 functions and have everything disabled just recompile PHP with --enable-xml and you should be good to go.
Felix dot Riesterer at gmx dot net
17-Nov-2005 08:57
adrian_popescu at yahoo dot com is right when he states that Raphael's XMLParser class (which I began to love!) has a flaw with content that spans multiple lines. I ran into trouble with entities, because Raphael's class would cut off everything before the last entity, plus the entity itself, leaving only the rest of the string.
I found out that the "dataHandler" function needs to be repaired. Adrian's mending by use of the ".=" operator is not enough. If $this->data[$index]['content'] isn't set yet you might run into error messages. So check it first! Here's my modification:
<?php
// You need the complete XML Parser Class by Raphael
// as this is only a part of it
function dataHandler($parser, $data)
{
if($data = trim($data))
{
$index = count($this->data) -1;
if(isset($this->data[$index]['content']))
$this->data[$index]['content'] .= $data;
else $this->data[$index]['content'] = $data;
}
}
?>
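A small standalone sketch of the same idea, outside the class. The sample document is made up, and how many pieces the text arrives in depends on the underlying XML library, so the chunk list is only printed for inspection.

```php
<?php
$chunks  = [];    // every piece handed to the handler, for inspection
$content = null;  // accumulated text, unset until the first piece arrives

$parser = xml_parser_create();
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$chunks, &$content) {
        $chunks[] = $data;
        // Append when content already exists, otherwise start it.
        $content = isset($content) ? $content . $data : $data;
    }
);
xml_parse($parser, '<name>Fish &amp; Chips</name>', true);

print_r($chunks);
echo $content, "\n";
```

With plain assignment instead of concatenation, only the last piece would survive, which matches the cut-off text described above.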
adrian_popescu at yahoo dot com
16-Nov-2005 03:04
Raphael's XMLParser has a small bug in the dataHandler function: it cannot handle multiple line values correctly. For instance:
<mytag> This is a
very long
line
</mytag>
causes the parser to return this content:
Array
(
[name] => MYTAG
[content] => line
)
whereas one would have expected the
[content] => 'This is a very long line'.
Below is the fixed version:
<?php
class XMLParser {
var $filename;
var $xml;
var $data;
function XMLParser($xml_file)
{
$this->filename = $xml_file;
$this->xml = xml_parser_create();
xml_set_object($this->xml, $this);
xml_set_element_handler($this->xml, 'startHandler', 'endHandler');
xml_set_character_data_handler($this->xml, 'dataHandler');
$this->parse($xml_file);
}
function parse($xml_file)
{
if (!($fp = fopen($xml_file, 'r'))) {
die('Cannot open XML data file: '.$xml_file);
return false;
}
$bytes_to_parse = 512;
while ($data = fread($fp, $bytes_to_parse)) {
$parse = xml_parse($this->xml, $data, feof($fp));
if (!$parse) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($this->xml)),
xml_get_current_line_number($this->xml)));
xml_parser_free($this->xml
);
}
}
return true;
}
function startHandler($parser, $name, $attributes)
{
$data['name'] = $name;
if ($attributes) { $data['attributes'] = $attributes; }
$this->data[] = $data;
}
function dataHandler($parser, $data)
{
if ($data = trim($data)) {
$index = count($this->data) - 1;
// begin multi-line bug fix (use the .= operator)
$this->data[$index]['content'] .= $data;
// end multi-line bug fix
}
}
function endHandler($parser, $name)
{
if (count($this->data) > 1) {
$data = array_pop($this->data);
$index = count($this->data) - 1;
$this->data[$index]['child'][] = $data;
}
}
}
?>
daniel at lorch dot cc
11-Nov-2005 06:02
there has been a lot of discussion about the xml2array-function. keith devens wrote a library called phpxml, which does exactly this, but with the xml extension rather than regular expressions. I found it quite useful: http://keithdevens.com/software/phpxml
janne at consilia dot fi
09-Nov-2005 06:56
In aerik's xml2array() (below), tags with a value of 0 (zero) are not returned.
Change the line
else if ($elements[3][$ie])
to
else if (isset($elements[3][$ie]))
a few lines from the end to fix this.
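The reason is that PHP treats the string "0" as false in a boolean context. A minimal illustration, with a made-up variable:

```php
<?php
$value = "0";

$truthy   = (bool) $value;    // false: "0" is falsy, so if ($value) skips it
$isSet    = isset($value);    // true: the variable exists and is not null
$nonEmpty = ($value !== '');  // true: tells "0" apart from an empty string

var_dump($truthy, $isSet, $nonEmpty);
```

If empty matches should still be skipped, comparing against '' may be a closer fit than isset().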
raphael at schwarzschmid dot de
08-Nov-2005 08:51
Monte's class with its various amendments didn't quite work for me, so here's my version of it:
class XMLParser {
var $filename;
var $xml;
var $data;
function XMLParser($xml_file)
{
$this->filename = $xml_file;
$this->xml = xml_parser_create();
xml_set_object($this->xml, $this);
xml_set_element_handler($this->xml, 'startHandler', 'endHandler');
xml_set_character_data_handler($this->xml, 'dataHandler');
$this->parse($xml_file);
}
function parse($xml_file)
{
if (!($fp = fopen($xml_file, 'r'))) {
die('Cannot open XML data file: '.$xml_file);
return false;
}
$bytes_to_parse = 512;
while ($data = fread($fp, $bytes_to_parse)) {
$parse = xml_parse($this->xml, $data, feof($fp));
if (!$parse) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($this->xml)),
xml_get_current_line_number($this->xml)));
xml_parser_free($this->xml
);
}
}
return true;
}
function startHandler($parser, $name, $attributes)
{
$data['name'] = $name;
if ($attributes) { $data['attributes'] = $attributes; }
$this->data[] = $data;
}
function dataHandler($parser, $data)
{
if ($data = trim($data)) {
$index = count($this->data) - 1;
$this->data[$index]['content'] = $data;
}
}
function endHandler($parser, $name)
{
if (count($this->data) > 1) {
$data = array_pop($this->data);
$index = count($this->data) - 1;
$this->data[$index]['child'][] = $data;
}
}
}
Use like:
$myFile = new XMLParser($path_to_file);
echo $myFile->data[$n]['name'];
foreach ($myFile->data[$n]['attributes'] as $key => $val)
echo $key.'='.$val;
... and so forth.
If somebody would know how to say something like...
$myFile = new XMLParser($path_to_file);
echo $myFile[$n];
foreach ($myFile[$n]['attributes'] as $key => $val)
echo $key.'='.$val;
...instead, I'd be very interested in that!
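One possible way to get that syntax is PHP's built-in ArrayAccess interface. The sketch below is an assumption-laden illustration, not a patch for the class above: ElementList is a hypothetical wrapper around an already parsed data array, and the signatures use the mixed type, so it needs PHP 8.0 or later.

```php
<?php
// Hypothetical wrapper: exposes a parsed element list through array syntax.
class ElementList implements ArrayAccess
{
    private array $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->data[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->data[$offset] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null) {
            $this->data[] = $value;   // supports $list[] = $value
        } else {
            $this->data[$offset] = $value;
        }
    }

    public function offsetUnset(mixed $offset): void
    {
        unset($this->data[$offset]);
    }
}

// Sample data in the shape the parser class produces.
$list = new ElementList([
    ['name' => 'MYTAG', 'attributes' => ['ID' => '7']],
]);

echo $list[0]['name'], "\n";
foreach ($list[0]['attributes'] as $key => $val) {
    echo $key . '=' . $val, "\n";
}
```

The same four methods could be added to the parser class itself, delegating to its data property.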
php dot net at site dot masshole dot us
07-Nov-2005 06:37
Note that Monte's XML parser class below (from 14-Sep-2005) contains a bug; the dataHandler function needs to use concatenation rather than assignment. Here's what it should look like:
function dataHandler($parser, $data){
if(!empty($data)) {
$_output_idx = count($this->output) - 1;
$this->output[$_output_idx]['content'] .= $data;
}
}
alex at whitewhale dot net
05-Nov-2005 05:28
A simple XML parser that accepts an array which maps tags to the HTML format used for output of their values. You can optionally specify an array of PHP functions to pass the value through before it is returned in the specified format:
<?php
class XParser {
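protected $parser=NULL;
protected $current='';
// declared explicitly; both are assigned later in this class
protected $tag_map=array();
public $html='';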
public function __construct($xml, $tag_map) {
$this->parser=xml_parser_create();
xml_set_element_handler($this->parser, array($this, 'begin'), array($this, 'end'));
xml_set_character_data_handler($this->parser, array($this, 'cdata'));
$this->tag_map=$tag_map;
if (!xml_parse($this->parser, $xml)) die(sprintf('XML error: %s at line %d', xml_error_string(xml_get_error_code($this->parser)), xml_get_current_line_number($this->parser)));
}
protected function begin($parser, $name, $attrs) {
$this->current=$this->current.'/'.$name;
}
protected function cdata($parser, $cdata) {
if (isset($this->tag_map[$this->current][1])) foreach($this->tag_map[$this->current][1] as $func) $cdata=call_user_func($func, $cdata);
$this->html.=sprintf($this->tag_map[$this->current][0], $cdata);
}
protected function end($parser, $name) {
$this->current=str_replace('/'.$name.'$', '', $this->current.'$');
}
public function html() {
return $this->html;
}
}
$source='
<books>
<book>
<title>My Book</title>
<description>A good book.</description>
</book>
<book>
<title>Some Other Book</title>
<description>Also a good book.</description>
</book>
</books>
';
$xml=new XParser($source, array(
'/BOOKS/BOOK/TITLE'=>array(
"<b>Title:</b> %s<br>\n", array('strtoupper')
),
'/BOOKS/BOOK/DESCRIPTION'=>array(
"<b>Description:</b> %s<br><br>\n\n"
),
));
echo $xml->html;
?>
Returns:
<b>Title:</b> MY BOOK<br>
<b>Description:</b> A good book.<br><br>
<b>Title:</b> SOME OTHER BOOK<br>
<b>Description:</b> Also a good book.<br><br>
zacheryph at gmail dot com
03-Nov-2005 06:44
@aerik
thanks a lot for this simple to use function.
@adrianus
i noticed this too as soon as i used it but it is easily resolved.
if anyone is using aerik's xml2array() (below) and noticed it doesn't play well with multiple nodes of the same name, here is a fix.
this was only briefly tested. if anyone notices a problem with it please tell me.
<?
// replace $ReElements with the following:
// note: only addition is changing (.*) to (.*?) causing the search
// to stop at the first instance of </$1>
// change
$ReElements = '/<(\w+)\s*([^\/>]*)\s*(?:\/>|>(.*)<\/\s*\\1\s*>)/s';
// to -->
$ReElements = '/<(\w+)\s*([^\/>]*)\s*(?:\/>|>(.*?)<\/\s*\\1\s*>)/s';
?>
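The effect of the change can be seen by running both patterns over two sibling nodes. This is only a sketch with a made-up input string; the back-reference is written as \1 because the pattern sits in a single-quoted string.

```php
<?php
$xml = '<something>text1</something><something>text2</something>';

$greedy = '/<(\w+)\s*([^\/>]*)\s*(?:\/>|>(.*)<\/\s*\1\s*>)/s';
$lazy   = '/<(\w+)\s*([^\/>]*)\s*(?:\/>|>(.*?)<\/\s*\1\s*>)/s';

preg_match_all($greedy, $xml, $g);
preg_match_all($lazy, $xml, $l);

// Group 3 holds the element content captured by each pattern.
print_r($g[3]);
print_r($l[3]);
```

The greedy version runs to the last closing tag and swallows the second node as content of the first, while the lazy version stops at the first closing tag. Neither handles a child that has the same name as its parent.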
adrianus at warmenhoven dot nl
24-Oct-2005 08:38
Previous example has some issues with xml-nodes such as:
<code>
<somethings>
<something>text1</something>
<something>text2</something>
<something>text3</something>
<something>text4</something>
</somethings>
</code>
It will put each subsequent <something> as an element of the previous <something>
aerik at wikidweb dot com
22-Oct-2005 05:30
For anyone else looking for a xml parser not requiring a library, here's a modified version of xml2array, which returns a nested associative array, including attributes and nested nodes. (BTW, thanks to whoever originally wrote these regexes - they're quite clever)
<?
function xml2array ($xml)
{
$xmlary = array ();
if ((strlen ($xml) < 256) && is_file ($xml))
$xml = file_get_contents ($xml);
$ReElements = '/<(\w+)\s*([^\/>]*)\s*(?:\/>|>(.*)<\/\s*\\1\s*>)/s';
$ReAttributes = '/(\w+)=(?:"|\')([^"\']*)(?:"|\')/';
preg_match_all ($ReElements, $xml, $elements);
foreach ($elements[1] as $ie => $xx) {
$xmlary[$ie]["name"] = $elements[1][$ie];
if ( $attributes = trim($elements[2][$ie])) {
preg_match_all ($ReAttributes, $attributes, $att);
foreach ($att[1] as $ia => $xx)
// all the attributes for current element are added here
$xmlary[$ie]["attributes"][$att[1][$ia]] = $att[2][$ia];
} // if $attributes
// get text if it's combined with sub elements
$cdend = strpos($elements[3][$ie],"<");
if ($cdend > 0) {
$xmlary[$ie]["text"] = substr($elements[3][$ie],0,$cdend -1);
} // if cdend
if (preg_match ($ReElements, $elements[3][$ie]))
$xmlary[$ie]["elements"] = xml2array ($elements[3][$ie]);
else if ($elements[3][$ie]){
$xmlary[$ie]["text"] = $elements[3][$ie];
}
}
return $xmlary;
}
?>
Parsing this:
<XML>
<header>
<title> Sample App </title>
<version> v. 1.0</version>
</header>
<window height='220' width='420'>
stuff in the window
</window>
</XML>
returns a print_r of this:
Array
(
[0] => Array
(
[name] => XML
[text] =>
[elements] => Array
(
[0] => Array
(
[name] => header
[text] =>
[elements] => Array
(
[0] => Array
(
[name] => title
[text] => Sample App
)
[1] => Array
(
[name] => version
[text] => v. 1.0
)
)
)
[1] => Array
(
[name] => window
[attributes] => Array
(
[height] => 220
[width] => 420
)
[text] =>
stuff in the window
)
)
)
)
m dot quinton at gmail dot com
13-Oct-2005 03:32
An example of using the "monte" class on a KML (XML) file from Google Earth. KlmObject can be reused because it takes the xmlParser output as its input.
<?php
error_reporting(E_ALL);
$p =& new xmlParser();
$p->parse('your_file.kml');
$tree = $p->GetNodeByPath('KML/FOLDER');
$folder = new KlmFolder($tree);
echo "type : " . $folder->type() . "\n";
echo "name : " . $folder->name() . "\n";
$placemarks = $folder->get_placemarks();
foreach($placemarks as $place){
echo $place->toText() . "\n";
}
class KlmObject {
var $tree;
function type(){
return $this->tree['name'];
}
function name(){
$node = $this->child_by_name('NAME');
return $node['content'];
}
function &childs(){
return $this->tree['child'];
}
function child_by_name($name){
foreach($this->childs() as $key=>$val){
$val = &$this->tree['child'][$key];
if($val['name'] == $name)
return $val;
}
return false;
}
}
class KlmFolder extends KlmObject{
function KlmFolder(&$tree){
$this->tree = &$tree;
}
function get_placemarks(){
$placemarks = array();
foreach($this->childs() as $val){
echo "{$val['name']}\n";
if($val['name'] == 'FOLDER'){
$subfolder = &new KlmFolder($val);
$placemarks = array_merge($placemarks, $subfolder->get_placemarks());
}
if($val['name'] == 'PLACEMARK'){
$placemarks[] = &new KlmPlacemark($val);
}
}
return $placemarks;
}
}
class KlmPlaceMark extends KlmObject{
var $tree;
function KlmPlacemark($tree){
$this->tree = &$tree;
$name = $this->name();
print_r($tree);
}
function longitude(){
return $this->tree['child'][1]['child'][0]['content'];
}
function latitude(){
return $this->tree['child'][1]['child'][1]['content'];
}
function toText(){
return $this->name() . ' : ' . $this-> longitude() . ' - ' . $this-> latitude();
}
}
?>
m dot quinton at gmail dot com
13-Oct-2005 03:26
from monte at NOT-SP-AM dot ohrt dot com :
I needed to modify his class for some KML data extracted from a Google Earth XML file. Sometimes the data spans several lines, so we need to
add this "trim" statement to make it work correctly:
function dataHandler($parser, $data){
$data = trim($data);
if(!empty($data)) {
$_output_idx = count($this->output) - 1;
$this->output[$_output_idx]['content'] = $data;
}
}
jbernau at muc dot de
28-Sep-2005 08:41
I liked the xmlparser class by monte (see next comment) very much. So here is my small addition to it.
usage:
<?php
$p =& new xmlParser();
$p->parse('http://domain.com/rss.xml');
$node = $p->GetNodeByPath("root/leaf/leaf2");
print $node['content'];
print $node['attrs']['my_attr'];
?>
The new member function to class xmlparser:
<?php
function GetNodeByPath($path,$tree = false) {
if ($tree) {
$tree_to_search = $tree;
}
else {
$tree_to_search = $this->output;
}
if ($path == "") {
return null;
}
$arrPath = explode('/',$path);
foreach($tree_to_search as $key => $val) {
if (gettype($val) == "array") {
$nodename = $val['name'];
if ($nodename == $arrPath[0]) {
if (count($arrPath) == 1) {
return $val;
}
array_shift($arrPath);
$new_path = implode("/", $arrPath);
return $this->GetNodeByPath($new_path, $val['child']);
}
}
}
}
?>
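The path lookup can also be tried out without the parser class. The following is only a sketch: find_node_by_path() is a hypothetical standalone helper, and the hand-built $tree assumes the array shape ('name', 'attrs', 'content', 'child') that monte's xmlParser produces in $output.

```php
<?php
// Hypothetical standalone helper; it mirrors the idea of GetNodeByPath without the class.
function find_node_by_path(array $nodes, $path)
{
    if ($path === '') {
        return null;
    }
    $parts = explode('/', $path);
    $head  = array_shift($parts);
    foreach ($nodes as $node) {
        if (is_array($node) && isset($node['name']) && $node['name'] === $head) {
            if (count($parts) === 0) {
                return $node;                  // last path segment: this is the node
            }
            $children = isset($node['child']) ? $node['child'] : array();
            return find_node_by_path($children, implode('/', $parts));
        }
    }
    return null;                               // no node with this name at this level
}

// Hand-built tree in the shape that xmlParser::$output is assumed to have.
$tree = array(
    array(
        'name'  => 'ROOT',
        'child' => array(
            array('name' => 'LEAF', 'child' => array(
                array('name' => 'LEAF2', 'content' => 'hello', 'attrs' => array('MY_ATTR' => '42')),
            )),
        ),
    ),
);

$node = find_node_by_path($tree, 'ROOT/LEAF/LEAF2');
echo $node['content'], "\n";
echo $node['attrs']['MY_ATTR'], "\n";
```

Like the member function, this only follows the first node that matches each path segment. Because case folding is on by default, element names arrive in upper case, so the path has to be upper case too unless XML_OPTION_CASE_FOLDING is switched off.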
monte at NOT-SP-AM dot ohrt dot com
15-Sep-2005 12:48
Here is another simple XML-to-array parser using the expat library. My goal was something that parsed everything (attributes, data, etc.) cleanly and also encapsulated all of its functionality inside a PHP class. You can supply anything fopen() accepts (file, url, etc.)
usage:
<?php
$p =& new xmlParser();
$p->parse('http://domain.com/rss.xml');
print_r($p->output);
?>
Here is the source:
<?php
class xmlParser{
var $xml_obj = null;
var $output = array();
function xmlParser(){
$this->xml_obj = xml_parser_create();
xml_set_object($this->xml_obj,$this);
xml_set_character_data_handler($this->xml_obj, 'dataHandler');
xml_set_element_handler($this->xml_obj, "startHandler", "endHandler");
}
function parse($path){
if (!($fp = fopen($path, "r"))) {
die("Cannot open XML data file: $path");
return false;
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($this->xml_obj, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($this->xml_obj)),
xml_get_current_line_number($this->xml_obj)));
xml_parser_free($this->xml_obj);
}
}
return true;
}
function startHandler($parser, $name, $attribs){
$_content = array('name' => $name);
if(!empty($attribs))
$_content['attrs'] = $attribs;
array_push($this->output, $_content);
}
function dataHandler($parser, $data){
if(!empty($data)) {
$_output_idx = count($this->output) - 1;
$this->output[$_output_idx]['content'] = $data;
}
}
function endHandler($parser, $name){
if(count($this->output) > 1) {
$_data = array_pop($this->output);
$_output_idx = count($this->output) - 1;
$this->output[$_output_idx]['child'][] = $_data;
}
}
}
?>
docvert at holloway dot co dot nz
06-Sep-2005 08:36
I see there's a few requests for a simple XML parser that doesn't depend on external libraries... this function parses a string of XML, creating nested arrays and attributes. It supports the /> syntax, multiple root nodes, whitespace in key=value attributes, and single/double quote attribute values. It doesn't, however, support text() nodes.
<?php
function parseXmlString($xmlString)
{
$exitAfterManyLoops = 0;
$xmlArray = array();
$currentNode = &$xmlArray;
$currentHierarchy = array();
$currentDepth = 0;
while($xmlString != '')
{
$exitAfterManyLoops++;
if($exitAfterManyLoops > 300)
{
print "BREAK";
break;
}
$xmlString = trim(substr($xmlString, strpos($xmlString, '<')));
$thisNodeAscends = (substr($xmlString, 1, 1) == '/');
$thisNodeDescends = (substr($xmlString, strpos($xmlString, '>') - 1, 1) != '/');
$openElement = substr($xmlString, strpos($xmlString, ' ') + 1);
$openElement = substr($openElement, 0, strpos($openElement, '>') );
if(substr($openElement, strlen($openElement) - 1, 1) == "/")
{
$openElement = substr($openElement, 0, strlen($openElement) - 1);
}
if($thisNodeAscends)
{
$currentDepth--;
$currentNode = &$currentHierarchy[$currentDepth];
}
else
{
if($thisNodeDescends)
{
$currentNode[] = array('attributes' => parseXmlAttributesString($openElement), 'children' => array());
$currentHierarchy[$currentDepth] = &$currentNode;
$currentDepth++;
$lastItem = &$currentNode[count($currentNode) - 1];
$currentNode = &$lastItem['children'];
}
else //this node is at the same level
{
$currentNode[] = array('attributes' => parseXmlAttributesString($openElement));
}
}
$xmlString = substr($xmlString, strpos($xmlString, '>') + 1);
}
return $xmlArray;
}
function parseXmlAttributesString($xmlElementString)
{
$exitAfter100Loops = 0;
$xmlElementArray = array();
while($xmlElementString != '')
{
$exitAfter100Loops++;
if($exitAfter100Loops > 100)
{
print "BREAK";
break;
}
$equalsCharacterPos = strpos($xmlElementString, '=');
$key = trim(substr($xmlElementString, 0, $equalsCharacterPos));
$xmlElementString = substr($xmlElementString, $equalsCharacterPos + 1);
$openBracket = substr($xmlElementString, 0, 1);
$xmlElementString = substr($xmlElementString, 1);
$endBracketPos = strpos($xmlElementString, $openBracket);
$value = substr($xmlElementString, 0, $endBracketPos);
$xmlElementString = substr($xmlElementString, $endBracketPos + 1);
if($key)
{
$xmlElementArray[$key]=$value;
}
}
return $xmlElementArray;
}
?>
Steven
06-Sep-2005 01:30
Also, please note that xml2array() does not work on XML such as this...
<?xml version="1.0" encoding="utf-8" ?>
<aws:weather xmlns:aws="http://www.aws.com/aws">
<aws:station id="_____" name="__________" city="_________" state="___" zipcode="______" distance="2.5719" station-type="WeatherBug" />
<aws:station id="_____" name="__________" city="_________" state="___" zipcode="______" distance="3.3379" station-type="WeatherBug" />
</aws:weather>
BinnyVA
24-Aug-2005 04:19
The 'xml2array' functions (see below) have a small problem: they ignore empty-element tags, i.e. the tags that close themselves. An example...
<IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />
If you wish to access the attributes of these tags, use this code before calling the function.
<?php
$data = file_get_contents("atom.xml");
$data = preg_replace('/<(\w+)([^>]*)\/>/s','<\1\2></\1>',$data);
$xml_data = xml2array($data);
?>
I have not modified the function in any way. There are two versions (HansP's and guyemup's) of the same function. This code was made for the older one, guyemup's function, but can be used with the other variation with some slight modifications.
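As a quick check of what that replacement does, here is a small sketch; the sample markup in $data is made up for illustration.

```php
<?php
// Illustrative input containing two empty-element tags.
$data = '<p><IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" /><br/></p>';

// Same replacement as in the note: <tag attrs/> becomes <tag attrs></tag>.
$expanded = preg_replace('/<(\w+)([^>]*)\/>/s', '<\1\2></\1>', $data);

echo $expanded, "\n";
```

The attributes are kept on the opening tag, so a regex-based xml2array() that only understands open/close pairs can then see them.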
HansP
18-Aug-2005 09:55
Standing on the shoulders of guyemup (see below), I developed this simple parser which returns a linear array where each key is the hierarchical node name. Attributes and CDATA are both returned.
<?PHP
function xml2array ($name, $xml, $Echar='.', $Achar='/', $discardempty=true)
{
static $Result, $A, $E, $Discard;
if ((strlen ($xml) < 256) && is_file ($xml))
$xml = file_get_contents ($xml);
if ($name == '') {
$Result = array ();
$A = $Achar;
$E = $Echar;
$Discard = $discardempty;
}
$ReElements = '/<(\w+)\s*([^\/>]*)\s*(?:\/>|>(.*)<\/\s*\\1\s*>)/s';
$ReAttributes = '/(\w+)=(?:"|\')([^"\']*)(?:"|\')/';
preg_match_all ($ReElements, $xml, $elements);
foreach ($elements[1] as $ie => $xx) {
if ( $attributes = trim($elements[2][$ie])) {
preg_match_all ($ReAttributes, $attributes, $att);
foreach ($att[1] as $ia => $xx)
$Result[$name.$E.$elements[1][$ie].$A.$att[1][$ia]] = $att[2][$ia];
}
if (preg_match ($ReElements, $elements[3][$ie]))
xml2array ($name ? $name.$E.$elements[1][$ie] : $elements[1][$ie], $elements[3][$ie]);
else if (!$Discard || $elements[3][$ie])
$Result[$name.$E.$elements[1][$ie]] = $elements[3][$ie];
}
return $Result;
}
?>
Given as input
<XML>
<title> Sample App </title>
<version> v. 1.0</version>
<window>
<height>220</height>
<width>420</width>
</window>
<parameters p1='value 1' p2='value 2'>
<p3 p31='value 1 of p3'> ' value of 3 '</p3>
</parameters>
</XML>
The output is
Array
(
[XML.title] => Sample App
[XML.version] => v. 1.0
[XML.window.height] => 220
[XML.window.width] => 420
[XML.parameters/p1] => value 1
[XML.parameters/p2] => value 2
[XML.parameters.p3/p31] => value 1 of p3
[XML.parameters.p3] => ' value of 3 '
)
The parser is very forgiving and does not check for errors.
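Since the regex approach does not check for errors, one option is to run the input through the expat-based parser first and only continue if it is well-formed. This is a sketch, not part of HansP's function; xml_wellformed_error() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns null if $xml is well-formed, otherwise an error message.
function xml_wellformed_error($xml)
{
    $parser = xml_parser_create();
    $ok = xml_parse($parser, $xml, true);   // true = this is the final (only) chunk
    $error = null;
    if (!$ok) {
        $error = sprintf(
            '%s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        );
    }
    // The parser is released when $parser goes out of scope here.
    return $error;
}

var_dump(xml_wellformed_error('<XML><title>Sample App</title></XML>')); // well-formed
var_dump(xml_wellformed_error('<XML><title>Sample App</XML>'));         // mismatched tag
```

The exact wording of the error message depends on the underlying XML library, so it is better to test for null than to compare message strings.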
pj at netfire dot com dot au
26-Jul-2005 10:55
Here is a very lightweight alternate xml parser object,
it comes in handy if installing the xml parsing package is a problem at an installation.
Thought this might be useful for someone.
<?php
/////////////////////////////////////////////////////////////////////
//
// Module : SimpleXMLParser.php
// DateTime : 26/07/2005 11:32
// Author : Phillip J. Whillier
// Purpose : Very lightweight "simple" XML parser; does not support attributes.
//
/////////////////////////////////////////////////////////////////////
class SimpleXMLParser {
// Find the max number of specific nodes in the XML
function MaxElements($XMLSource, $XMLName) {
$MaxElements = 0;
$XMLTag = "<" . $XMLName . ">";
$Y = $this->instr($XMLSource, $XMLTag);
while($Y>=0) {
$MaxElements = $MaxElements + 1;
$Y = $this->instr($XMLSource, $XMLTag, $Y + strlen($XMLTag));
}
return $MaxElements;
}
// Parse xml to retrieve a specific element
// Instance number is a zero based index.
function Parse($XMLSource, $XMLName, $aInstance = 0, $Default = "") {
$XMLLength = strlen($XMLSource);
$XMLTag = "<" . $XMLName . ">";
$XMLTagEnd = "</" . $XMLName . ">";
$Instance = $aInstance + 1;
/* Find the start of the requested instance... */
$XMLStart = 0;
for($x = 1; $x < $Instance + 1; $x++) {
$Y = $this->instr($XMLSource, $XMLTag, $XMLStart);
if ($Y >= $XMLStart) {
$XMLStart = $Y + strlen($XMLTag);
}
else {
return $Default;
}
}
/* Find the end of the instance... */
$XMLEnd = $XMLStart;
$XMLMatch = 1;
while($XMLMatch) {
$c = substr($XMLSource, $XMLEnd, strlen($XMLTagEnd));
if($c == $XMLTagEnd) {
$XMLMatch = $XMLMatch - 1;
}
else {
if (substr($c, 0, strlen($XMLTag)) == $XMLTag) {
$XMLMatch = $XMLMatch + 1;
}
}
$XMLEnd = $XMLEnd + 1;
if ($XMLEnd == $XMLLength) {
return $Default;
}
}
return substr($XMLSource, $XMLStart, $XMLEnd - $XMLStart - 1);
}
// Helper function for finding substrings
function instr($haystack, $needle, $pos = 0) {
$thispos = strpos($haystack, $needle, $pos);
if ($thispos===false)
$thispos = -1;
return $thispos;
}
}
/////////////////////////////////////////////////////////////////////
// Testing the XML
//
//
TestXML();
function TestXML() {
$xmlObj = new SimpleXMLParser;
$myxml = "<detail><name>myname</name><name>myname2</name>" .
"<name>myname3</name><address><address1>my address</address1>" .
"</address></detail>";
$xml = $xmlObj->Parse($myxml, "detail");
print($xml . "\n");
$maxName = $xmlObj->MaxElements($xml, "name");
echo "maxName=$maxName\n";
$name = $xmlObj->Parse($xml, "name", 0);
echo "name1=$name\n";
$name = $xmlObj->Parse($xml, "name", 1);
echo "name2=$name\n";
$name = $xmlObj->Parse($xml, "name", 2);
echo "name3=$name\n";
print("Address:" . $xmlObj->Parse($xmlObj->Parse($xml, "address") , "address1") . "\n");
for($i=0; $i < $maxName; $i++) {
$name = $xmlObj->Parse($xml, "name", $i);
print("Name:" . $name . "\n");
}
}
?>
guyemup at yahoo dot com - heath
20-Jul-2005 11:24
Expanding further on Vladson's/Alper's code, this will create an array of attributes within the parent node called "{Parent} attributes". Probably isn't the most eloquent of code, but, it works.
function xml2array( $textXml )
{
$regExElements = '/<(\w+)([^>]*)>(.*?)<\/\\1>/s';
$regExAttributes = '/(\w+)="([^"]*)"/';
preg_match_all( $regExElements, $textXml, $matchElements );
foreach ( $matchElements[1] as $keyElements=>$valElements ) {
if ( $matchElements[2][$keyElements] )
{
preg_match_all( $regExAttributes, $matchElements[2][$keyElements], $matchAttributes );
foreach ( $matchAttributes[0] as $keyAttributes=>$valAttributes )
{
$arrayAttributes[ $valElements.' attributes' ][$matchAttributes[1][ $keyAttributes ] ] = $matchAttributes[2][ $keyAttributes ];
}
}
else
{
$arrayAttributes = null;
}
if ( preg_match( $regExElements, $matchElements[3][$keyElements]) ) {
if ( $arrayAttributes )
{
$arrayFinal[ $valElements ][ $valElements.' attributes' ] = $arrayAttributes[ $valElements.' attributes' ];
}
$arrayFinal[ $valElements ][] = xml2array( $matchElements[3][$keyElements] );
}
else
{
$arrayFinal[ $valElements ] = $matchElements[3][ $keyElements ];
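// merge attributes only when this element actually has some
if ( is_array( $arrayAttributes ) ) { $arrayFinal = array_merge( $arrayFinal, $arrayAttributes ); }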
}
}
return $arrayFinal;
}
alper at sabanciuniv dot edu
10-Jul-2005 07:56
vladson has written good code for converting XML to an array. It is a good function, but it does not work properly if the XML elements have attributes. I have changed the code so that it works properly when they do.
The code is:
function xml2array($text) {
$reg_exp = '/<(\w+)[^>]*>(.*?)<\/\\1>/s';
preg_match_all($reg_exp, $text, $match);
foreach ($match[1] as $key=>$val) {
if ( preg_match($reg_exp, $match[2][$key]) ) {
$array[$val][] = xml2array($match[2][$key]);
} else {
$array[$val] = $match[2][$key];
}
}
return $array;
}
Alper Sari
MauricioMassaia
28-Jun-2005 01:10
/*xmlClass
example:
include("xmlClass.php");
$xml = new xmlClass();
$xml->parse("image/image1.xml");
*/
class xmlClass{
//xml
var $xml;
var $path;
var $tagName;
//counter
var $index = 0;
function xmlClass(){
$this->xml = xml_parser_create();
xml_set_object($this->xml,$this);
xml_set_character_data_handler($this->xml, 'dataHandler');
xml_set_element_handler($this->xml, "startHandler", "endHandler");
}
function parse($path){
$this->path = $path;
if (!($fp = fopen($this->path, "r"))) {
die("Cannot open XML data file: $path");
return false;
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($this->xml, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($this->xml)),
xml_get_current_line_number($this->xml)));
xml_parser_free($this->xml);
}
}
return true;
}
function startHandler($parser, $name, $attribs){
//add your commands here
}
function dataHandler($parser, $data){
//add your commands here
}
function endHandler($parser, $name){
$this->tagName = "";
}
}
?>
matt at tux dot appstate dot edu
16-Jun-2005 10:12
$insert = array('tag'=> $tag, 'value' => subvalue($arr_vals));
should read
$insert = array('tag'=> $tag, 'value' => parseAgain($arr_vals));
sorry
matt [at] tux dot appstate dot edu
15-Jun-2005 11:38
I was looking all over for a simple parser. Thanks!
Here is my addition. I wanted something a little easier to parse. Hope it is useful to someone else.
function parseAgain(&$arr_vals) {
while (@$xml_val = array_shift($arr_vals)) {
extract($xml_val);
if ($type == 'close') {
return $new_val;
} elseif ($type == 'cdata') {
continue;
} elseif ($type == 'complete') {
$insert = array('tag'=> $tag, 'value'=> $value);
} else {
$insert = array('tag'=> $tag, 'value' => subvalue($arr_vals));
}
$new_val[] = $insert;
}
return $new_val;
}
Just send it the result from the xml2php function.
Cheers!
lorecarra at postino dot it
02-Jun-2005 10:41
I just wanted to post a correction to the script proposed by mv, a pair of posts below.
That script is quite good and really simple and fast, but i think there is a misunderstanding of the type "complete" in the array that the function xml_parse_into_struct creates.
When the parser meets an open tag FOLLOWED by another open tag, it classifies the first tag as "open" in the structure:
<tagone> <-------- "open" tag
<tagtwo>
When it finds a close tag FOLLOWING another close tag, it classifies the first tag as "close" :
</tagtwo>
</tagone> <--------- "close" tag
Here we are : when the parser finds an open tag, followed by content, followed by a close tag, it classifies the WHOLE thing as "complete":
<tag>hi my name is</tag> <------- "complete" tag
mv's script php2xml echoed "complete" tags as :
<tag attribute="value" />
, which (my opinion!) is wrong.
The rest is classified as "cdata", and can be trashed.
Now, i report a slight modification of mv's script; if my interpretation is wrong, let me know!
<?
$file = "menu_items.xml";
/**
* Get contents from XML file and insert into an array
*
* @param string $file
* @return mixed
*/
function xml2php($file) {
$xml_parser = xml_parser_create();
if (!($fp = fopen($file, "r"))) {
die("unable to open XML");
}
$contents = fread($fp, filesize($file));
fclose($fp);
xml_parse_into_struct($xml_parser, $contents, $arr_vals);
xml_parser_free($xml_parser);
return $arr_vals;
}
/**
* Convert php array to XML
*
* @param mixed $array_haystack
* @return string
*/
function php2xml($array_haystack) {
$xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n";
if ((!empty($array_haystack)) AND (is_array($array_haystack))) {
foreach ($array_haystack as $xml_key => $xml_value) {
switch ($xml_value["type"]) {
case "open":
$xml .= str_repeat("\t", $xml_value["level"] - 1);
$xml .= "<" . strtolower($xml_value["tag"]);
if(isset($xml_value["attributes"])) $xml .= parseattributes($xml_value["attributes"]);
$xml .= ">\n";
break;
case "complete":
$xml .= str_repeat("\t", $xml_value["level"] - 1);
$xml .= "<" . strtolower($xml_value["tag"]);
if(isset($xml_value["attributes"])) $xml .= parseattributes($xml_value["attributes"]);
$xml .= ">" . $xml_value["value"];
$xml .= "</" . strtolower($xml_value["tag"]);
$xml .= ">\n";
break;
case "close":
$xml .= str_repeat("\t", $xml_value["level"] - 1);
$xml .= "</" . strtolower($xml_value["tag"]);
$xml .= ">\n";
break;
default:
break;
}
}
}
return $xml;
}
/**
* Convert php attributes array to XML
*
* @param mixed $attributes
* @return string
*/
function parseattributes($attributes)
{
$xml = "";
foreach ($attributes as $attribute_key => $attribute_value) {
$xml .= sprintf(' %s="%s"', strtolower($attribute_key), $attribute_value);
}
return $xml;
}
/**
* Output content as XML
*
* @param string $content
*/
function output_xml($content) {
header("Content-Type: application/xml; charset=ISO-8859-1");
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
header("Last-Modified: ". gmdate("D, d M Y H:i:s") ." GMT");
header("Cache-Control: no-store, no-cache, must-revalidate");
header("Cache-Control: post-check=0, pre-check=0", false);
header("Pragma: no-cache");
print $content;
}
$arr_xml = xml2php($file);
$xml = php2xml($arr_xml);
output_xml($xml);
?>
Bye,
lorenzo
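The open / complete / close classification described above can be checked directly by dumping the type of each entry that xml_parse_into_struct() produces. This is a minimal sketch; the sample XML is made up and reuses the tag names from the explanation.

```php
<?php
// Illustrative document: one element with a child, the child has only text.
$xml = "<tagone>\n<tagtwo>hi my name is</tagtwo>\n</tagone>";

$parser = xml_parser_create();
xml_parse_into_struct($parser, $xml, $values);

// Print the type, tag name and nesting level of every entry.
foreach ($values as $entry) {
    printf("%-8s %s (level %d)\n", $entry['type'], $entry['tag'], $entry['level']);
}
```

Tag names appear in upper case because case folding is on by default. Entries of type "cdata" show up for the whitespace between child elements, which is why php2xml() can ignore them.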
vladson at pc-labs dot info
02-Jun-2005 05:35
A month ago I was looking for a function like the one I have written now; I hope someone else who is looking for it will find it here...
(it's not like xml_parse_into_struct but it converts xml to array too)
function xml2array($text) {
$reg_exp = '/<(.*?)>(.*?)<\/\\1>/s';
preg_match_all($reg_exp, $text, $match);
foreach ($match[1] as $key=>$val) {
if ( preg_match($reg_exp, $match[2][$key]) ) {
$array[$val][] = xml2array($match[2][$key]);
} else {
$array[$val] = $match[2][$key];
}
}
return $array;
}
rsl at zonapersonal dot com
24-May-2005 10:38
You can find a group of functions to convert XML to an array and use it at www.php-xmla.zonapersonal.com
mv at brazil dot com
25-Apr-2005 08:07
Here is an example of parsing an XML file into a PHP array and turning a PHP array back into XML.
<?
$file = "menu_items.xml";
/**
* Get contents from XML file and insert into an array
*
* @param string $file
* @return mixed
*/
function xml2php($file) {
$xml_parser = xml_parser_create();
if (!($fp = fopen($file, "r"))) {
die("unable to open XML");
}
$contents = fread($fp, filesize($file));
fclose($fp);
xml_parse_into_struct($xml_parser, $contents, $arr_vals);
xml_parser_free($xml_parser);
return $arr_vals;
}
/**
* Convert php array to XML
*
* @param mixed $array_haystack
* @return string
*/
function php2xml($array_haystack) {
$xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n";
if ((!empty($array_haystack)) AND (is_array($array_haystack))) {
foreach ($array_haystack as $xml_key => $xml_value) {
switch ($xml_value["type"]) {
case "open":
$xml .= str_repeat("\t", $xml_value["level"] - 1);
$xml .= "<" . strtolower($xml_value["tag"]);
$xml .= (!isset($xml_value["attributes"]))? ">\n": false;
break;
case "complete":
$xml .= str_repeat("\t", $xml_value["level"] - 1);
$xml .= "<" . strtolower($xml_value["tag"]);
$xml .= (!isset($xml_value["attributes"]))? ">\n": false;
break;
case "close":
$xml .= str_repeat("\t", $xml_value["level"] - 1);
$xml .= "</" . strtolower($xml_value["tag"]);
$xml .= (!isset($xml_value["attributes"]))? ">\n": false;
break;
default:
break;
}
if (isset($xml_value["attributes"])) {
foreach ($xml_value["attributes"] as $atribute_key => $atribute_value) {
$xml .= sprintf(' %s="%s"', strtolower($atribute_key), $atribute_value);
}
$xml .= ($xml_value["type"] == "complete")? " />\n": ">\n";
}
}
}
return $xml;
}
/**
* Output content as XML
*
* @param string $content
*/
function output_xml($content) {
header("Content-Type: application/xml; charset=ISO-8859-1");
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
header("Last-Modified: ". gmdate("D, d M Y H:i:s") ." GMT");
header("Cache-Control: no-store, no-cache, must-revalidate");
header("Cache-Control: post-check=0, pre-check=0", false);
header("Pragma: no-cache");
print $content;
}
$arr_xml = xml2php($file);
$xml = php2xml($arr_xml);
output_xml($xml);
?>
salloum_corp at yahoo dot com
25-Apr-2005 03:23
An example FIX in XMLSimpleParser ..
pille at hbr1 dot com contributed a very very useful parser +regenerator, however, there is a small problem when handling an array.
When adding an object to an array using array_push function,
a new copy of the object is created and stored in the array. Therefore, any change made to the original object will not be
seen in the copy within the array!
$this->current needs to reference the object in the array!
The only change needed is to replace the following lines within the startElement function, in the second if statement:
array_push( $this->current->$tag, $obj );
$this->current =& $obj;
Replace with:
$num=array_push( $this->current->$tag, $obj );
$arrayref =& $this->current->$tag;
$this->current =& $arrayref[$num-1];
and the function will be eventually:
function startElement($parser, $tag, $attributeList) {
if( is_object( $this->current->$tag ) ) {
$obj = $this->current->$tag;
$this->current->$tag = array();
array_push( $this->current->$tag, $obj );
}
if( is_array( $this->current->$tag ) ) {
$obj =& new stdClass;
$obj->_PARENT =& $this->current;
$obj->_ITEMS = 0;
$num=array_push( $this->current->$tag, $obj );
$arrayref =& $this->current->$tag;
$this->current =& $arrayref[$num-1];
}
else {
$this->current->$tag->_PARENT =& $this->current;
$this->current =& $this->current->$tag;
$this->current->_ITEMS = 0;
}
$this->current->_PARENT->_ITEMS ++;
}
simonguada at yahoo dot fr
06-Apr-2005 05:31
to import xml into mysql
$file = "article_2_3032005467.xml";
$feed = array();
$key = "";
$info = "";
function startElement($xml_parser, $name, $attrs) {
global $feed;
}
function endElement($xml_parser, $name) {
global $feed, $info;
$key = $name;
$feed[$key] = $info;
$info = ""; }
function charData($xml_parser, $data ) {
global $info;
$info .= $data; }
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "charData" );
$fp = fopen($file, "r");
while ($data = fread($fp, 8192))
!xml_parse($xml_parser, $data, feof($fp));
xml_parser_free($xml_parser);
$sql= "INSERT INTO `article` ( `";
$j=0;
$i=count($feed);
foreach( $feed as $assoc_index => $value )
{
$j++;
$sql.= strtolower($assoc_index);
if($i>$j) $sql.= "` , `";
if($i<=$j) {$sql.= "` ) VALUES ('";}
}
$h=0;
foreach( $feed as $assoc_index => $value )
{
$h++;
$sql.= utf8_decode(trim(addslashes($value)));
if($i-1>$h) $sql.= "', '";
if($i<=$h) $sql.= "','')";
}
$sql=trim($sql);
echo $sql;
kerxen at caramail dot com
28-Mar-2005 10:04
To use XML with objects, we can use xml_set_object():
class xml {
var $parser;
function xml() // constructor
{
$this->parser = xml_parser_create();
xml_set_object($this->parser, $this);
xml_set_element_handler($this->parser, "tag_open", "tag_close");
xml_set_character_data_handler($this->parser, "cdata");
}
function tag_open($parser, $tag, $attributes)
{
var_dump($parser, $tag, $attributes);
}
function tag_close($parser, $tag)
{
var_dump($parser, $tag);
}
function cdata($parser, $cdata)
{
var_dump($parser, $cdata);
}
function parse($data)
{
xml_parse($this->parser, $data);
}
} // end of class xml
$xml_parser = new xml(); // creation of the object
$xml_parser->parse("<a id='hello World'>PHP</a>");
Have a nice use of this piece of code.
This can be used to share XML files with others sites.
Eddy
http://www.djfrance.net
dma05 at web dot de
24-Mar-2005 03:48
re: cubecode at cubecode dot com
If you read the whole file in one chunk, you should watch out for the memory limit that is set in PHP's configuration. Many web servers only allow a few megabytes of ram to be used by your script, so if you were trying to parse a big file on one of those, your script might run out of memory and won't be able to finish.
Reading the file in small chunks and then concatenating the plaintext-parts might prevent that.
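A sketch of that approach: keep reading in chunks, but have the character data handler append to a buffer instead of overwriting it, and only use the buffer when the element closes. The handler names, the global variables and the sample XML are assumptions made for this illustration.

```php
<?php
// Sketch: collect character data per element, even when it is delivered in pieces.
$buffer = '';        // text collected for the element currently open
$texts  = array();   // finished text, one entry per closed element

function start_el($parser, $name, $attrs)
{
    global $buffer;
    $buffer = '';                 // new element: start with an empty buffer
}

function char_data($parser, $data)
{
    global $buffer;
    $buffer .= $data;             // append, never overwrite: may be called several times
}

function end_el($parser, $name)
{
    global $buffer, $texts;
    $texts[] = array($name, trim($buffer));
    $buffer = '';
}

$parser = xml_parser_create();
xml_set_element_handler($parser, 'start_el', 'end_el');
xml_set_character_data_handler($parser, 'char_data');

// Feed the document in deliberately tiny chunks so a boundary falls inside the text.
$xml = '<TEXT><TSPAN>plain text split across chunks</TSPAN></TEXT>';
$chunks = str_split($xml, 16);
foreach ($chunks as $i => $chunk) {
    $isFinal = ($i === count($chunks) - 1);
    if (!xml_parse($parser, $chunk, $isFinal)) {
        die(xml_error_string(xml_get_error_code($parser)));
    }
}

print_r($texts);
```

This keeps memory use bounded by the chunk size plus the text of one element. It is deliberately simple: mixed content (text before and after a child element) would need a stack of buffers rather than a single one.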
cubecode at cubecode dot com
23-Mar-2005 03:33
In my test on SVG source, with:
while ($data = fread($fp, 4096))
if the 4096-byte boundary falls inside the text of a <TSPAN>, the XML handler cuts the plain text into two parts.
I use instead:
$data = fread($fp, filesize($source));
john at etechdata dot com dot au
08-Mar-2005 06:42
This code uses CURL to connect to a server and post XML info to it and then capture the response from the server. I spent hours trying to find the solution and it's really quite simple. Easy to say AFTER you find the answer. Hope it helps someone.
<?php
// store your XML code into a variable line by line
// this is simply an example. you should replace it with your XML code
$XPost = "<?xml version='1.0' encoding='UTF-8'?>";
$XPost .= "<XMLCodeBody>";
$XPost .= "<MessageInfo>";
$XPost .= "<messageID>8af793f9af34bea0ecd7eff71c94d6</messageID>";
$XPost .= "<messageTimestamp>20040710050758444000+600</messageTimestamp>";
$XPost .= "<timeoutValue>60</timeoutValue>";
$XPost .= "<apiVersion>spxml-3.0</apiVersion>";
$XPost .= "</MessageInfo>";
$XPost .= "</XMLCodeBody>";
$url = "https://www.urltopost.data.to"; // enter the URL to post to here
$ch = curl_init(); // initialize curl handle
curl_setopt($ch, CURLOPT_URL,$url); // set url to post to
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); // return into a variable
curl_setopt($ch, CURLOPT_HEADER, 1); // capture the returned headers
curl_setopt($ch, CURLOPT_TIMEOUT, 4); // times out after 4s
curl_setopt($ch, CURLOPT_POSTFIELDS, $XPost); // add POST fields
$result = curl_exec($ch); // run the whole process
// the headers returned from the server will now be stored in $result
// for now display the result to screen
echo "<pre>";
print_r($result);
echo "</pre>";
// once you have the result you can simply extract the response and use it.
?>
pille at hbr1 dot com
05-Mar-2005 09:52
A simple xml parser +regenerator class without handling attributes:
<?
class XMLSimpleParser {
function XMLSimpleParser($data,$encoding='') {
$this->document = new stdClass();
$this->current =& $this->document;
$xml_parser = xml_parser_create($encoding);
xml_set_object($xml_parser, $this);
xml_set_element_handler($xml_parser, 'startElement', 'endElement');
xml_set_character_data_handler($xml_parser, 'characterData');
xml_parse($xml_parser, $data, true);
$this->encoding = xml_parser_get_option( $xml_parser, XML_OPTION_TARGET_ENCODING );
xml_parser_free($xml_parser);
unset( $this->document->_ITEMS );
unset( $this->current );
}
function startElement($parser, $tag, $attributeList) {
if( is_object( $this->current->$tag ) ) {
$obj = $this->current->$tag;
$this->current->$tag = array();
array_push( $this->current->$tag, $obj );
}
if( is_array( $this->current->$tag ) ) {
$obj =& new stdClass;
$obj->_PARENT =& $this->current;
$obj->_ITEMS = 0;
array_push( $this->current->$tag, &$obj );
$this->current =& $obj;
}
else {
$this->current->$tag->_PARENT =& $this->current;
$this->current =& $this->current->$tag;
$this->current->_ITEMS = 0;
}
$this->current->_PARENT->_ITEMS ++;
}
function endElement($parser, $tag) {
$parent =& $this->current->_PARENT;
if( $this->current->_DATA != '' || $this->current->_ITEMS == 0 ) {
$this->current = $this->current->_DATA;
}
else {
unset( $this->current->_PARENT );
unset( $this->current->_ITEMS );
unset( $this->current->_DATA );
}
$this->current =& $parent;
}
function characterData($parser, $data) {
$this->current->_DATA = trim( $data );
}
function generateXML( $encoding = '' ) {
if( ! $encoding ) $encoding = $this->encoding;
$this->xml = '<?xml version="1.0"';
if( $encoding ) $this->xml .= ' encoding="' . $encoding . '"';
$this->xml .= "?>\n";
$this->xml .= $this->_generateXML( $this->document, 0 );
return $this->xml;
}
function _generateXML( $item, $level ) {
$xml = '';
if( is_object( $item ) ) {
$vars = get_object_vars( $item );
foreach( $vars as $key => $val ) {
if( is_array( $val ) ) {
foreach( $val as $entry ) {
for( $i = 0; $i < $level; $i ++ )
$xml .= "\t";
$xml .= '<' . $key;
if( $xml2 = $this->_generateXML( $entry, $level + 1 ) ) {
$xml .= ">\n" . $xml2;
for( $i = 0; $i < $level; $i ++ )
$xml .= "\t";
$xml .= '</' . $key . '>' . "\n";
}
else {
$xml .= " />\n";
}
}
}
else if( is_object( $val ) ) {
for( $i = 0; $i < $level; $i ++ )
$xml .= "\t";
$xml .= '<' . $key;
if( $xml2 = $this->_generateXML( $val, $level + 1 ) ) {
$xml .= ">\n" . $xml2;
for( $i = 0; $i < $level; $i ++ )
$xml .= "\t";
$xml .= '</' . $key . ">\n";
}
else {
$xml .= " />\n";
}
}
else {
for( $i = 0; $i < $level; $i ++ )
$xml .= "\t";
$xml .=
'<' . $key . '>' .
$val .
'</' . $key . ">\n";
}
}
}
return $xml;
}
}
?>
php at NOSPAM dot stratos-online dot nl
10-Feb-2005 04:48
XML_OPTION_SKIP_WHITE didn't work for me, or I don't fully understand what it is supposed to do.
Either way, if you want to get rid of all excess white space, newlines and tabs between the tags of your formatted XML, the following code snippet might help.
<?php
$buffer = preg_replace('/\>(\n|\r|\r\n| |\t)*\</','><',$buffer);
?>
I'm sure it isn't efficient, but at least it works (for me).
However, if you are parsing CDATA that contains SGML-style tags (< >), I'm sure it will mess them up horribly.
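As far as I can tell, XML_OPTION_SKIP_WHITE only matters for xml_parse_into_struct(), where it drops values that consist purely of whitespace; handlers set with xml_set_character_data_handler() still receive the whitespace. The following is a minimal sketch of that behaviour, not a definitive description. The helper name parse_struct() and the sample document are made up for illustration.

```php
<?php
// Hypothetical helper: returns the xml_parse_into_struct() result for $xml,
// with XML_OPTION_SKIP_WHITE switched on or off.
function parse_struct($xml, $skipWhite)
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, $skipWhite ? 1 : 0);
    $values = array();
    xml_parse_into_struct($parser, $xml, $values);
    return $values;
}

// Sample document with whitespace-only text between the tags.
$xml = "<a>\n  <b>x</b>\n</a>";

$with    = parse_struct($xml, true);
$without = parse_struct($xml, false);

echo count($without), " entries without the option, ", count($with), " with it\n";
```

Comparing the two arrays with print_r() shows which whitespace-only entries the option removes.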
compu_global_hyper_mega_net_2 at yahoo dot com
20-Sep-2004 04:35
The documentation regarding white space was never complete I think.
The XML_OPTION_SKIP_WHITE option doesn't appear to do anything. I want to preserve the newlines in a CDATA section, and setting XML_OPTION_SKIP_WHITE to 0 or false doesn't appear to help: my character data handler gets called once for each line. This should obviously be reflected in the documentation as well. When and how often exactly does the handler get called? Having to build separate test cases is very time-consuming.
Inserting newlines myself in my character data handler is no good either. For text outside actual CDATA sections, long lines are split up into multiple calls to my handler. My handler cannot tell whether a subsequent call happens because the data comes from the next line or because some internal buffer filled up and was 'flushed' to the handler.
This behaviour also needs to be properly documented.
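One common way to cope with the handler being called several times per element is to append every piece to a buffer and only use the buffer when the element closes. The sketch below assumes leaf elements only (the buffer is reset at every start tag); the variable names and the sample document are illustrative, not part of any API.

```php
<?php
// Sketch: collect the text of each element in a buffer and use it only
// when the element closes. All names here are illustrative.
$buffer = '';
$texts  = array();

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler(
    $parser,
    // start tag: begin a fresh buffer
    function ($p, $name, $attrs) use (&$buffer) {
        $buffer = '';
    },
    // end tag: the buffer now holds the complete text of the element
    function ($p, $name) use (&$buffer, &$texts) {
        $texts[$name] = $buffer;
        $buffer = '';
    }
);
xml_set_character_data_handler(
    $parser,
    // may be called many times per element, so always append
    function ($p, $data) use (&$buffer) {
        $buffer .= $data;
    }
);

$xml = "<note><![CDATA[line one\nline two]]></note>";

// Feed the document in two pieces, split in the middle of the CDATA section.
xml_parse($parser, substr($xml, 0, 25), false);
xml_parse($parser, substr($xml, 25), true);

echo $texts['note'], "\n";
```

Because the pieces are concatenated, it no longer matters where the parser decides to split the data; the newline inside the CDATA section arrives as part of the text.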
andrewcare at execulink dot com
02-Jul-2004 07:24
I've been working on a similar tree-based generator (although somewhat simpler), and I thought that it might be helpful to a developer just starting out:
http://www.drmatta.com/new/xml/
Simplified source:
<?
$file = ''; /* set this to the path of the file to be parsed */
$elements = $stack = array();
$count = $depth = 0;
class element{
var $name = '';
var $attributes = array();
var $data = '';
var $depth = 0;
}
function start_element_handler($parser, $name, $attribs){
global $elements, $stack, $count, $depth;
$id = $count;
$element = new element;
$elements[$id] = $element;
$elements[$id]->name = $name;
foreach($attribs as $key => $value) // each() no longer exists in PHP 8
$elements[$id]->attributes[$key] = $value;
$elements[$id]->depth = $depth;
array_push($stack, $id);
$count++;
$depth++;
}
function end_element_handler($parser, $name){
global $stack, $depth;
array_pop($stack);
$depth--;
}
function character_data_handler($parser, $data){
global $elements, $stack;
$elements[$stack[count($stack)-1]]->data .= $data;
}
$xml_parser = xml_parser_create('');
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($xml_parser, "start_element_handler", "end_element_handler");
xml_set_character_data_handler($xml_parser, "character_data_handler");
if(!file_exists($file))
die("\n<p>\"$file\" does not exist.</p>\n</body>\n</html>");
if(!($handle = fopen($file, "r")))
die("<p>Cannot open \"$file\".</p>\n</body>\n</html>");
while($contents = fread($handle, 4096))
xml_parse($xml_parser, $contents, feof($handle));
fclose($handle);
xml_parser_free($xml_parser);
echo "<hr />\n";
$depth = $offset = 0;
foreach(array_keys($elements) as $key_a){
$depth--;
$offset = 0;
if($elements[$key_a]->depth < $depth){
while($elements[$key_a]->depth != (($elements[$key_a - $offset]->depth) - 1) || $offset == 0){
$offset++;
if($elements[$key_a]->depth == (($elements[$key_a - $offset]->depth) - 1))
echo "<dl>\n<dt><strong>Element Closed:</strong></dt>\n<dd>" . $elements[$key_a - $offset]->name . "</dd>\n</dl>\n<hr />\n";
}
$depth--;
}
if($elements[$key_a]->depth == $depth && $depth != 0){
while($elements[$key_a]->depth != $elements[$key_a - $offset]->depth || $offset == 0){
$offset++;
if($elements[$key_a]->depth == $elements[$key_a - $offset]->depth)
echo "<dl>\n<dt><strong>Element Closed:</strong></dt>\n<dd>" . $elements[$key_a - $offset]->name . "</dd>\n</dl>\n<hr />\n";
}
$depth--;
}
$depth++;
echo "<dl>\n<dt><strong>Element:</strong></dt>\n<dd>" . $elements[$key_a]->name . "</dd>\n</dl>\n";
echo "<dl>\n<dt><strong>Attributes:</strong></dt>\n";
if(empty($elements[$key_a]->attributes))
echo "<dd>No attributes specified</dd>\n";
else{
foreach($elements[$key_a]->attributes as $key_b => $value)
echo "<dd>$key_b=\"$value\"</dd>\n";
}
echo "</dl>\n<dl>\n<dt><strong>Data:</strong></dt>\n";
if(trim($elements[$key_a]->data) == '')
echo "<dd>No data specified</dd>\n";
else
echo "<dd>" . $elements[$key_a]->data . "</dd>\n";
echo "</dl>\n<dl>\n<dt><strong>Depth:</strong></dt>\n<dd>" . $elements[$key_a]->depth . "</dd>\n</dl>\n<hr />\n";
$depth++;
}
$depth--;
for($i = $depth; $i >= 0; $i--){
$offset = 0;
$count = count($elements) - 1;
for($j = 0; $j <= $count; $j++){
if($elements[$count - $j]->depth == $depth){
echo "<dl>\n<dt><strong>Element Closed:</strong></dt>\n<dd>" . $elements[$count - $j]->name . "</dd>\n</dl>\n<hr />\n";
break;
}
}
$depth--;
}
?>
A few good tutorials on the subject of parsing XML with PHP:
http://www.zend.com/zend/art/parsing.php
http://www.sitepoint.com/article/560/
talraith at withouthonor dot com
29-Jun-2004 09:11
If you are looking for some heavy duty code to parse or create XML documents, then may I suggest taking a look at a class module I am working on. The module is complete except for support of namespaces and XPath.
The class takes a string of XML code and creates a TRUE object tree. Likewise, you can create a tree in your code and generate an XML document. There are no eval() statements used at all unlike some of the other examples shown here.
I posted this a while ago, but it has since been buried by a number of posts and I believe it to be beneficial to anyone looking to use XML / PHP to see this information.
http://www.withouthonor.com/obj_xml.phps for the source code. Sample usage can be found in my post below.
torsten at jserver dot de
08-Jun-2004 06:43
I expanded the function below a little bit, because I wasn't really happy with the array it created. This version creates an array which has the same structure as the XML tree.
<?php
// names of the elements that need to be put in a list go here
$XML_LIST_ELEMENTS = array( "concert", "song" );
function makeXMLTree($file)
{
// read file
$open_file = fopen($file, "r");
$data = "";
while ($r=fread($open_file,8192) ) {
$data .= $r;
}
// create parser
$parser = xml_parser_create();
xml_parser_set_option($parser,XML_OPTION_CASE_FOLDING,0);
xml_parser_set_option($parser,XML_OPTION_SKIP_WHITE,1);
xml_parse_into_struct($parser,$data,$values,$tags);
xml_parser_free($parser);
// we store our path here
$hash_stack = array();
// this is our target
$ret = array();
foreach ($values as $key => $val) {
switch ($val['type']) {
case 'open':
array_push($hash_stack, $val['tag']);
if (isset($val['attributes']))
$ret = composeArray($ret, $hash_stack, $val['attributes']);
else
$ret = composeArray($ret, $hash_stack);
break;
case 'close':
array_pop($hash_stack);
break;
case 'complete':
array_push($hash_stack, $val['tag']);
$ret = composeArray($ret, $hash_stack, $val['value']);
array_pop($hash_stack);
// handle attributes
if (isset($val['attributes']))
{
foreach ($val['attributes'] as $a_k => $a_v)
{
$hash_stack[] = $val['tag']."_attribute_".$a_k;
$ret = composeArray($ret, $hash_stack, $a_v);
array_pop($hash_stack);
}
}
break;
}
}
return $ret;
}
function &composeArray($array, $elements, $value=array())
{
global $XML_LIST_ELEMENTS;
// get current element
$element = array_shift($elements);
// does the current element refer to a list
if (in_array($element,$XML_LIST_ELEMENTS))
{
// more elements?
if(sizeof($elements) > 0)
{
$array[$element][sizeof($array[$element])-1] = &composeArray($array[$element][sizeof($array[$element])-1], $elements, $value);
}
else // if (is_array($value))
{
$array[$element][sizeof($array[$element])] = $value;
}
}
else
{
// more elements?
if(sizeof($elements) > 0)
{
$array[$element] = &composeArray($array[$element], $elements, $value);
}
else
{
$array[$element] = $value;
}
}
return $array;
}
echo "<pre>";
$res = makeXMLTree($xml_file);
var_dump($res);
echo "</pre>";
?>
juliano at setor4 dot com
26-May-2004 12:56
An update of the function by rcotta at ig dot com dot br. The function below will not overwrite an existing element.
<?
function makeXMLTree($file) {
$open_file = fopen($file, "r");
$data = fread($open_file, filesize($file));
$ret = array();
$parser = xml_parser_create();
xml_parser_set_option($parser,XML_OPTION_CASE_FOLDING,0);
xml_parser_set_option($parser,XML_OPTION_SKIP_WHITE,1);
xml_parse_into_struct($parser,$data,$values,$tags);
xml_parser_free($parser);
$hash_stack = array();
$a=0;
foreach ($values as $key => $val) {
switch ($val['type']) {
case 'open':
array_push($hash_stack, $val['tag']);
break;
case 'close':
array_pop($hash_stack);
break;
case 'complete':
array_push($hash_stack, $val['tag']);
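// uncomment to see what this function is doing
/* echo("\$ret[$a][" . implode("][", $hash_stack) . "] = '{$val['value']}';\n"); */
// walk down $ret[$a] along the current path and store the value there;
// no eval(), so quotes in the data cannot break the generated code
$node =& $ret[$a];
foreach ($hash_stack as $k) {
$node =& $node[$k];
}
$node = isset($val['value']) ? $val['value'] : '';
unset($node);
$a++;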
array_pop($hash_stack);
break;
}
}
return $ret;
}
$res = makeXMLTree($xml_file);
print_r($res);
?>
johnt at divector dot net
13-May-2004 04:40
When I first read this documentation and tried the examples, none of them seemed to work. The contributed ones in the notes were a bit "long-winded" for a simple demonstration of the required functions. So I came up with this; while not perfect, it does execute and gives an idea of the functions and callback functions needed to create an XML parser.
<?PHP
// Variables
$file = "xmldata.xml";
$feed = array();
$key = "";
$info = "";
$in_HEAD = false;
function startElement($xml_parser, $name, $attrs ) {
global $feed, $key, $in_HEAD;
$key = $name;
if( $name == "HEAD" )
$in_HEAD = true; }
function endElement($xml_parser, $name) {
// The Workhorse of the Call Back Functions
// Most of the programming will be put in this function.
global $feed, $key, $info, $in_HEAD;
if( $name == "HEAD" )
$in_HEAD = false;
if($in_HEAD==false)
$key = $name;
elseif( $in_HEAD )
$key = "HEAD_".$name;
$feed[$key] = $info;
$info = ""; }
function charData($xml_parser, $data ) {
// $xml_parser - The resource ID for this parser
// $data - The character data returned by the parser, from the XML file
global $info;
$info .= $data; }
// The Beginning of Execution *******************************************
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "charData" );
$fp = fopen($file, "r");
while ($data = fread($fp, 8192)) {
if (!xml_parse($xml_parser, $data, feof($fp)))
die("XML error: " . xml_error_string(xml_get_error_code($xml_parser)));
}
fclose($fp);
xml_parser_free($xml_parser);
// Start Web page
echo "<HTML>\n";
echo "<HEAD>\n";
echo "<TITLE>".$feed['HEAD_TITLE']."</TITLE>\n";
echo "</HEAD>\n";
echo "<BODY>\n";
echo "<CENTER><H1>".$feed['HEAD_TITLE']."</H1></CENTER>\n";
echo "<HR>\n";
foreach( $feed as $assoc_index => $value )
{
echo "\$assoc_index = $assoc_index<BR> \$value = $value<BR><BR>\n";
}
echo "</BODY>\n";
echo "</HTML>\n";
?>
xmldata.xml:
<?xml version="1.0" encoding="UTF-8"?>
<XML>
<HEAD>
<TITLE>XML Data Demo</TITLE>
<DESCRIPTION>XML Data Demo for testing XML parsers. A Simple demo for demonstrating the PHP Call Back functions.</DESCRIPTION>
</HEAD>
<FUNCLIST>The functions necessary for Parser Creation are: xml_parser_create(); xml_set_element_handler($xml_parser, "startElement", "endElement");xml_set_character_data_handler($xml_parser, "charData" );</FUNCLIST>
<RECAP>The Really neat thing here is allowing the programmer complete control over these call back functions to parse virtually any XML file. In my opinion, an extra variable in the Call Back functions allowing an array to be passed would be better, this would keep globals from being used.</RECAP>
</XML>
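Regarding the wish in the RECAP element to avoid globals: handlers can also be given as array(object, method) callables, so the state can live in an object instead. This is only a sketch of that idea. The class name FeedCollector and its members are hypothetical, and unlike the script above it stores the trimmed text of every element under its plain tag name.

```php
<?php
// Hypothetical collector class: the parse state lives in the object,
// so the handlers need no global variables.
class FeedCollector
{
    public $feed = array();
    private $info = '';

    // Parses a complete document held in a string; returns true on success.
    public function parse($xml)
    {
        $parser = xml_parser_create();
        xml_set_element_handler(
            $parser,
            array($this, 'startElement'),
            array($this, 'endElement')
        );
        xml_set_character_data_handler($parser, array($this, 'charData'));
        return xml_parse($parser, $xml, true) === 1;
    }

    public function startElement($parser, $name, $attrs)
    {
        $this->info = '';
    }

    public function endElement($parser, $name)
    {
        $this->feed[$name] = trim($this->info);
        $this->info = '';
    }

    public function charData($parser, $data)
    {
        $this->info .= $data; // may be called several times per element
    }
}

$collector = new FeedCollector();
$collector->parse('<XML><HEAD><TITLE>XML Data Demo</TITLE></HEAD></XML>');
echo $collector->feed['TITLE'], "\n";
```

Each FeedCollector instance keeps its own state, so two documents can be parsed independently without the handlers interfering with each other.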
moc.oohay@mijito
07-May-2004 01:21
I found a typo in the XMLTag->addChild function. I re-examined the code and changed the function so it is a little cleaner.
Also, as an interesting side-note. I ran the script on a 1 MB XML file. The php.exe memory usage exceeded 50 MB during runtime. I did a print_r($XML_data) dumping into a plain-text file which resulted in a 70 MB text file. However, after I removed all the [spaces] used for formatting and readability the file size was reduced to 5 MB.
This script may not be efficient for very large data sets. ;)
I am very pleased that the script parsed the file without error. A very successful "real world" test.
Repaired addChild:
function addChild($XMLTag_obj) {
$key = $XMLTag_obj->name;
// If this tag *name* is not already a child initialize it.
if ( !isset($this->children[$key][0]) ) {
$this->children[$key][0] = 0;
}
// Get the next array index. This is the next available location to store the child
$index = $this->children[$key][0] + 1;
// Add the child and update the tag count
$this->children[$key][$index] = $XMLTag_obj;
$this->children[$key][0]++;
// Return the Child Tag
return $this->children[$key][$index];
}
otijim at AT at yahoo dot dot dot com
04-May-2004 03:47
After going through all the XML-to-data-structure examples posted here and having problems with all of them, I came up with my own. It's not thoroughly tested but works very well for me.
This example will not give any 'deprecated pass by reference' errors and returns false on malformed XML.
Example of using the Classes:
<?
$myXMLParser = new XMLStructParser;
$cds_XMLTag = $myXMLParser->parse('<cd_list><cd title="Best of PBS"><track number="1">Sesame Street Theme</track></cd></cd_list>');
print $cds_XMLTag->children['cd_list'][1]->children['cd'][1]->children['track'][1]->cdata;
?>
Here are the two class files:
XMLStructParse.phpclass
<?
require_once("XMLTag.phpclass");
class XMLStructParser {
var $index; // Tracks position in the stack
var $obj_data; // Holds the root XMLTag object
var $stack; // Stack to track where we are in the XML hierarchy
// Constructor
function XMLStructParser() {
}
function parse($data) {
// Prepare the Object Array
$this->index = 0;
$this->obj_data = new XMLTag("XML");
// Prepare the Stack
$this->stack[$this->index] = &$this->obj_data;
// Setup the XML Parser
$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);
xml_set_object($xml_parser, $this);
xml_set_element_handler($xml_parser, "tag_open", "tag_close");
xml_set_character_data_handler($xml_parser, "cdata");
// Parse the XML
$parse_results = xml_parse($xml_parser, $data, true); // true: this is the last (and only) piece of data
// Clean up
xml_parser_free($xml_parser);
if ($parse_results) {
return $this->obj_data;
} else {
return false;
}
}
function tag_open($parser, $tag, $attributes) {
// Create the new Child tag
$theTag = new XMLTag($tag);
$theTag->addAttributes($attributes);
// Add the child tag to the current tag. The & is necessary to get a pointer to the object child in the object. Not a Clone of it.
$childTag = &$this->stack[$this->index]->addChild($theTag);
// Make the new child tag now be the current tag
$this->index++;
$this->stack[$this->index] = &$childTag;
}
function cdata($parser, $cdata) {
$this->stack[$this->index]->cdata .= $cdata; // append: the handler may be called more than once per tag
}
function tag_close($parser, $tag) {
$this->index--;
}
}
?>
XMLTag.phpclass
<?
class XMLTag {
var $name; // String
var $cdata; // String
var $children; // Array of children. array['tag'][0] is the tag count. [1] ... [n] reference each child tag object
// Constructor
function XMLTag($name, $cdata="") {
$this->name = $name;
$this->cdata = $cdata;
}
// Adds attributes as children. An attribute is really just a quick way to do a child tag anyway
function addAttributes($attribute_array) {
foreach ($attribute_array as $key => $value) {
// If no children by this name exist then initialize it.
if ( !isset($this->children[$key][0]) ) {
$index = 1;
$this->children[$key][0] = 0;
} else {
$index = $this->children[$key][0] + 1;
}
// Assign the value and increment the tag count
$this->children[$key][$index] = new XMLTag($key, $value);
$this->children[$key][0]++;
}
return;
}
function addChild($XMLTag_obj) {
$key = $XMLTag_obj->name;
// If this tag *name* is not already a child initialize it.
if ( !isset($this->children[$key][0]) ) {
$index = 1;
$this->children[$key][0] = 0;
} else {
$index = $this->children[$key][0] + 1;
}
// Add the child and update the tag count
$this->children[$key][$index] = $XMLTag_obj;
$this->children[$key][0]++;
// Return the Child Tag
return $this->children[$key][$index];
}
}
?>
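The parser above only returns false on malformed XML. If the reason is needed as well, it can be read from the parser before it is freed. A minimal sketch, assuming a hypothetical helper named xml_check(); the exact wording of the message depends on the underlying library, so it should not be relied upon.

```php
<?php
// Hypothetical helper: returns null for well-formed XML,
// otherwise a short description of the first error.
function xml_check($xml)
{
    $parser = xml_parser_create();
    if (xml_parse($parser, $xml, true) === 1) {
        return null;
    }
    // Ask the parser what went wrong and where.
    $code = xml_get_error_code($parser);
    $line = xml_get_current_line_number($parser);
    return sprintf('%s at line %d', xml_error_string($code), $line);
}

var_dump(xml_check('<a><b></b></a>'));  // well-formed document
echo xml_check("<a>\n<b></a>"), "\n";   // mismatched closing tag
```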
nate at adeptisoft dot com
22-Apr-2004 09:51
just a slight modification to info at b1g dot de's wonderful RDFParse class... I have changed "titel" to "title" and added the description.. so output should look like:
[1] => Array
(
[description] => Some story here
[title] => A title
[link] => http://www.urlofchoice.net/
)
-------
<?php
class RDFParser {
var $_item;
var $_url;
function RDFParser($url) {
$this->_url = $url;
}
function ParseRDF() {
$this->_item = array('i' => 0);
$parser = xml_parser_create();
xml_set_object($parser, $this);
xml_set_element_handler($parser, "_startElement", "_endElement");
xml_set_character_data_handler($parser, "_charHandler");
$fp = fopen($this->_url, "r");
while(!feof($fp)) {
$line = fgets($fp, 4096);
xml_parse($parser, $line);
}
fclose($fp);
xml_parser_free($parser);
return($this->_item['items']);
}
function _startElement($parser, $name, $attrs) {
$this->_item['maychar'] = true;
if($name=="ITEM") {
$this->_item['mayparse'] = true;
$this->_item['i']++;
} elseif($name=="TITLE") {
$this->_item['akt'] = "TITLE";
} elseif($name=="LINK") {
$this->_item['akt'] = "LINK";
} elseif($name=="DESCRIPTION") {
$this->_item['akt'] = "DESCRIPTION";
} else {
$this->_item['maychar'] = false;
}
}
function _endElement($parser, $name) {
if($name=="ITEM") {
$this->_item['mayparse'] = false;
} elseif($name=="TITLE" || $name=="LINK" || $name=="DESCRIPTION") {
$this->_item['maychar'] = false;
}
}
function _charHandler($parser, $data) {
if($this->_item['maychar'] && $this->_item['mayparse']) {
if($this->_item['akt']=="TITLE") {
$this->_item['items'][$this->_item['i']]['title'] = $data;
}
if($this->_item['akt']=="LINK") {
$this->_item['items'][$this->_item['i']]['link'] = $data;
}
if($this->_item['akt']=="DESCRIPTION") {
$this->_item['items'][$this->_item['i']]['description'] = $data;
}
}
}
}
?>
chibo at gmx dot de
15-Apr-2004 07:09
TO: jon at gettys dot org (the simple xml parser)
For German-language text, change the function:
function characterData($parser, $data) {
global $obj;
eval($obj->tree.'->data=\''. $data .'\';');
}
to:
function characterData($parser, $data) {
global $obj;
eval($obj->tree.'->data.=\''. $data .'\';');
}
to get the whole value of the element's character data! Otherwise you only get the last piece of the entire string.
Greets,
Chi
askgopal [AT] sify [PERIOD] com
06-Apr-2004 06:47
A simple XML parser that would allow us to retrieve a value of an element using its path.
-- cut here --
<?
$_elements = array();
$_cur_path = '';
function parse_xml_config($file, &$elems)
{
global $_elements;
$e = error_reporting(0);
if (($fp = fopen($file, 'r')) === false) {
error_reporting($e);
return; // $elems is left unchanged
}
$xph = xml_parser_create();
if ($xph) { // a resource before PHP 8, an XMLParser object since
xml_parser_set_option($xph, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xph,
'start_elem_handler', 'end_elem_handler');
while (($data = fread($fp, 4096)))
xml_parse($xph, $data, feof($fp));
xml_parser_free($xph);
}
fclose($fp);
$elems = $_elements;
error_reporting($e);
}
function start_elem_handler($xph, $name, $attrs)
{
global $_elements, $_cur_path;
$e = error_reporting(0);
$_cur_path .= "/$name";
foreach ($attrs as $key => $val) {
$index = "$_cur_path/$key";
if (!isset($_elements[$index])) {
$_elements[$index] = $val;
} else {
// second and later values: collect them in one flat array
if (!is_array($_elements[$index]))
$_elements[$index] = array($_elements[$index]);
array_push($_elements[$index], $val);
}
}
error_reporting($e);
}
function end_elem_handler($xph, $name)
{
global $_elements, $_cur_path;
$_cur_path = dirname($_cur_path);
}
/* main prog */
$config = array();
parse_xml_config('/usr/local/etc/myconfig.xml', $config);
print_r($config);
?>
-- paste --
if the input is:
<config>
<db host="localhost" username="foo" password="bar" db="test"/>
<column name="x" value="x1"/>
<column name="y" value="y1"/>
</config>
the output would be:
Array
(
[/CONFIG/DB/HOST] => localhost
[/CONFIG/DB/USERNAME] => foo
[/CONFIG/DB/PASSWORD] => bar
[/CONFIG/DB/DB] => test
[/CONFIG/COLUMN/NAME] => Array
(
[0] => x
[1] => y
)
[/CONFIG/COLUMN/VALUE] => Array
(
[0] => x1
[1] => y1
)
)
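The same path-based lookup can also be sketched without globals by letting closures carry the state. This is an illustration under assumptions, not the author's code: the function name xml_attribute_paths() is made up, it takes the XML as a string instead of a file name, and it always stores the values of a path as a list, even when there is only one.

```php
<?php
// Hypothetical helper: maps "/ELEMENT/PATH/ATTRIBUTE" to a list of values.
// Returns false if the document is not well-formed.
function xml_attribute_paths($xml)
{
    $paths = array();
    $stack = array();

    $parser = xml_parser_create(); // case folding is on by default
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$paths, &$stack) {
            $stack[] = $name;
            $base = '/' . implode('/', $stack);
            foreach ($attrs as $key => $val) {
                $paths["$base/$key"][] = $val; // always a list of values
            }
        },
        function ($p, $name) use (&$stack) {
            array_pop($stack);
        }
    );

    if (xml_parse($parser, $xml, true) !== 1) {
        return false;
    }
    return $paths;
}

$config = xml_attribute_paths(
    '<config><db host="localhost"/><column name="x"/><column name="y"/></config>'
);
print_r($config['/CONFIG/COLUMN/NAME']);
```

Keeping every entry a list avoids the special case where a path is a string the first time and an array afterwards.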
odders
19-Mar-2004 02:36
I wrote a simple XML parser, mainly to deal with RSS version 2. I found lots of examples on the net, but they were all massive, bloated and hard to manipulate.
Output is sent to an array, which holds arrays containing the data for each item.
Obviously, you will have to modify the code to suit your needs, but there isn't a lot of code there, so that shouldn't be a problem.
<?php
$currentElements = array();
$newsArray = array();
readXml("./news.xml");
echo("<pre>");
print_r($newsArray);
echo("</pre>");
// Reads XML file into formatted html
function readXML($xmlFile)
{
$xmlParser = xml_parser_create();
xml_parser_set_option($xmlParser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($xmlParser, "startElement", "endElement");
xml_set_character_data_handler($xmlParser, "characterData");
$fp = fopen($xmlFile, "r");
while($data = fread($fp, filesize($xmlFile))){
xml_parse($xmlParser, $data, feof($fp));}
xml_parser_free($xmlParser);
}
// Sets the current XML element, and pushes itself onto the element hierarchy
function startElement($parser, $name, $attrs)
{
global $currentElements, $itemCount;
array_push($currentElements, $name);
if($name == "item"){$itemCount += 1;}
}
// Prints XML data; finds highlights and links
function characterData($parser, $data)
{
global $currentElements, $newsArray, $itemCount;
$currentCount = count($currentElements);
$parentElement = $currentElements[$currentCount-2];
$thisElement = $currentElements[$currentCount-1];
if($parentElement == "item"){
$newsArray[$itemCount-1][$thisElement] = $data;}
else{
switch($thisElement){
case "title":
break;
case "link":
break;
case "description":
break;
case "language":
break;
case "item":
break;}}
}
// If the XML element has ended, it is poped off the hierarchy
function endElement($parser, $name)
{
global $currentElements;
$currentCount = count($currentElements);
if($currentElements[$currentCount-1] == $name){
array_pop($currentElements);}
}
?>
talraith at withouthonor dot com
03-Feb-2004 06:27
I have created a class set that both parses XML into an object structure and from that structure creates XML code. It is mostly finished but I thought I would post here as it may help someone out or if someone wants to use it as a base for their own parser. The method for creating the object is original compared to the posts before this one.
The object tree is created by creating separate tag objects for each tag inside the main document object and associating them with each other by way of object references. An index table is created so that each tag is assigned an ID number (in numerical order from 0) and can be accessed directly using that ID number. Each tag has object references to its children. There are no uses of eval() in this code.
The code is too long to post here, so I have made a HTML page that has it: http://www.withouthonor.com/obj_xml.html
Sample code would look something like this:
<?
$xml = new xml_doc($my_xml_code);
$xml->parse();
$root_tag =& $xml->xml_index[0];
$children =& $root_tag->children;
// and so forth
// To create XML code using the object, would be similar to this:
$my_xml = new xml_doc();
$root_tag = $my_xml->CreateTag('ROOTTAG');
$my_xml->CreateTag('CHILDTAG',array(),'',$root_tag);
// The following is used for the CreateTag() method
// string Name (The name of the child tag)
// array Attributes (associative array of attributes for tag)
// string Content (textual data for the child tag)
// int ParentID (Index number for parent tag)
// To generate the XML, use the following method
$out_xml = $my_xml->generate();
?>
condor33NOSPAM at tiscali dot it
10-Dec-2003 03:01
This is a variation to the routine posted here by
jon at gettys dot org to convert an XML file
into a php structure.
I did not find a cleaner method than "eval"
as he asks, but anyway his way is not so bad.
<?php
Class xmlread
{
var $tree = '$this->ogg';
var $ogg ;
var $cnt = 0;
/************
* change_to_array
* is called by startElement
* to check if there is need
* to change element to array
************/
function change_to_array($test,$is_arr) {
if ($test and !$is_arr): //if element is set, change it to array
eval('$tmp = '.$this->tree.';'); //save element to tmp
eval('unset('.$this->tree.');'); //unset element
eval(''.$this->tree.'= array();'); //transform $this->tree in an array
eval('array_push('.$this->tree.',$tmp);');//push old object
//into the array
return true;
endif;
if ($is_arr)
return true;
}
/************
* startElement
************/
function startElement($parser, $name, $attrs)
{
$this->tree = $this->tree."->".$name; //add tag to tree string
//test if element is an array
eval('$is_arr = is_array('.$this->tree.');');
//test if element is set
eval('$test = isset('.$this->tree.');');
//if is already set (and not array)...
//...change it to array
$is_arr = $this->change_to_array($test,$is_arr);
if ($is_arr): //if is an array
$this->cnt = $this->cnt+1; //increase counter
//and set tree-string to add element
$this->tree = $this->tree.'['.$this->cnt.']';
endif;
return true;}
/************
* characterData
************/
function characterData($parser, $data)
{
if (trim($data)!=''):
$data = addslashes($data);
//add data to tree set up by startElement()
eval($this->tree."='".trim($data)."';");
endif;
return true;}
/************
* endElement
************/
function endElement($parser, $name)
{ //cut last ->element
$pos = strrpos($this->tree, ">");
$leng = strlen($this->tree);
$pos1 = ($leng-$pos)+1;
$this->tree = substr($this->tree, 0, -$pos1);
return true;}
/************
* get_data: this is the
* parser
************/
function get_data
($doc,$st_el='startElement',
$end_el='endElement',
$c_data='characterData') {
$this->mioparser = xml_parser_create();
xml_set_object($this->mioparser, $this);
xml_set_element_handler
($this->mioparser, $st_el,$end_el);
xml_set_character_data_handler
($this->mioparser,$c_data);
xml_parser_set_option
($this->mioparser, XML_OPTION_CASE_FOLDING, false);
xml_parse($this->mioparser,$doc);
if (xml_get_error_code($this->mioparser)):
print "<b>XML error at line n. ".
xml_get_current_line_number
($this->mioparser)." -</b> ";
print xml_error_string
(xml_get_error_code($this->mioparser));
endif;
return true; }
function xmlread($doc) {
$xml = file_get_contents($doc); // $doc is the name of the XML file, e.g. 'document.xml'
$this->get_data($xml);
return true; }
} //end of class
?>
chris at hitcatcher dot com
08-Nov-2003 06:48
In regards to jon at gettys dot org's XML object, The data should be TRIM()ed to remove any whitespace that could appear in CDATA entered as :
<xml_tag>
cdata here. cdata here. cdata here. cdata here.
</xml_tag>
So, after applying fred at barron dot com's suggested change to the characterData function, the function should appear as:
function characterData($parser, $data)
{
global $obj;
$data = addslashes($data);
eval($obj->tree."->data.='".trim($data)."';");
}
SIDE NOTE: I'm fairly new to XML, so perhaps it is considered bad form to enter CDATA as I did in my example. Is this true, or is the extra whitespace acceptable for the sake of readability?
ml at csite dot com
02-Jul-2003 11:29
A fix for the fread breaking thing:
$cache = ''; // holds the unparsed tail of the previous chunk
while ($data = fread($fp, 4096)) {
$data = $cache . $data;
$cache = '';
if (!feof($fp)) {
$split = false;
if (preg_match_all("(</?[a-z0-9A-Z]+>)", $data, $regs)) {
$lastTagname = $regs[0][count($regs[0])-1];
for ($i=strlen($data)-strlen($lastTagname); $i>=strlen($lastTagname); $i--) {
if ($lastTagname == substr($data, $i, strlen($lastTagname))) {
$cache = substr($data, $i, strlen($data));
$data = substr($data, 0, $i);
$split = true;
break;
}
}
}
if (!$split) {
// no tag to split at yet: keep everything for the next round
$cache = $data;
$data = '';
}
}
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($xml_parser)), xml_get_current_line_number($xml_parser)));
}
}
panania at 3ringwebs dot com
21-May-2003 06:12
The above example doesn't work when you're parsing a string being returned from a curl operation (why I don't know!) I kept getting undefined offsets at the highest element number in both the start and end element functions. It wasn't the string itself I know, because I substringed it to death with the same results. But I fixed the problem by adding these lines of code...
function defaultHandler($parser, $name) {
global $depth;
@ $depth[$parser]--;
}
xml_set_default_handler($xml_parser, "defaultHandler");
Hope this helps 8-}
fred at barron dot com
23-Apr-2003 08:28
regarding jon at gettys dot org's nice XML to Object code, I've made some useful changes (IMHO) to the characterData function... my minor modifications allow multiple lines of data and it escapes quotes so errors don't occur in the eval...
function characterData($parser, $data)
{
global $obj;
$data = addslashes($data);
eval($obj->tree."->data.='".$data."';");
}
software at serv-a-com dot com
18-Feb-2003 01:10
2. Pre Parser Strings and New Line Delimited Data
One important thing to note at this point is that the xml_parse function requires a string variable. As we all know, you can easily manipulate the content of any string variable.
A better approach to removing newlines than the str_replace() call in my earlier post:
while ($data = fread($fp, 4096)) {
$data = preg_replace("/\n|\r/","",$data); //flarp
if (!xml_parse($xml_parser, $data, feof($fp))) {...
The above works for all three line-ending conventions (\n, \r, \r\n). But it could potentially (or will most likely) damage or scramble data contained in, for example, CDATA areas. As far as I am concerned, end-of-line characters should not be used _within_ XML tags. What seems to be the ultimate solution is to pre-parse the loaded data. This would require checking the position within the XML document and adding or subtracting data (using a temporary variable between the fread calls) based on conditions like "is within tag", "is within CDATA" etc. before feeding it to the parser. This of course opens up a new can of worms (as in: parse data for the parser...). The above procedure would take place between the fread and xml_parse calls, so this method would be compatible with the general usage examples at the top of the page.
3. The Answer to parsing arbitrary XML and Preprocessor Revisited
You can't just feed any XML document to the parser you constructed and assume that it will work! You have to know what kind of methods for storing data are used: for example, is there end-of-line delimited data in the file? Are there any carriage returns in the tags? XML files come formatted in different ways: some are just one long string of characters without any end-of-line markers, others have newlines, carriage returns or both (Windows). They may or may not contain spaces and other whitespace between tags. For this reason it is important to do what I call normalizing the data before feeding it to the parser. You can perform this with regular expressions or plain old str_replace and concatenation. In many cases this can be done to the file itself, sometimes to the string data on the fly (as shown in the example above). But I feel it is important to normalize the data before even calling the function that calls xml_parse. If you have the ability to access all the data before that call, you can convert it to what you feel the data should have been in the first place and avoid many surprises and expensive regular expression substitutions (in a tight spot) while fread'ing the data.
software at serv-a-com dot com
18-Feb-2003 01:09
My previous XML post (software at serv-a-com dot com/22-Jan-2003 03:08) led some visitors to e-mail me with questions about the carriage return stripping issue. I'll try to make the following mumble as brief and easy to understand as possible.
1. Overview of the 4096 fragmentation issue
As you know, the following code freads the file 4096 bytes (that is, 4 KB) at a time. This is perhaps OK for testing expat and figuring out how things work, but it is rather dangerous in a production environment. Data may not be fully understandable because of fread fragmentation, and may be improperly formatted because of the numerous sources (formats) of data contained within (e.g. end-of-line delimited CDATA).
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) { /* handle the parse error */ }
}
Sometimes, to save time, one may want to load it all into one big variable and leave all the worries to expat. I think anything under 500 KB is OK (as long as nobody knows about it). Some may argue that larger variables are acceptable or even necessary because of the magic that takes place while parsing with xml_parse. Our XML parser (expat) works and can be implemented successfully only when we know what type of XML data we are dealing with: its average size, the structure of its general layout, and the data contained within tags. For example, if the tags are followed by a line delimiter such as a newline, we can read the file with fgets and, with minimal effort, make sure that no data is sent to the function that does not end with an end tag. But this requires fair knowledge of how the file stores XML data and tags (and a bit of code between reading the data and xml_parse'ing it).
software at serv-a-com dot com
23-Jan-2003 06:08
use:
while ($data = str_replace("\n","",fread($fp, 4096))){
instead of:
while ($data = fread($fp, 4096)) {
It will save you a headache.
And in response to (simen at bleed dot no 11-Jan-2003 04:27) "If the 4096 byte buffer fills up...":
Please take better care of your data. Don't just shove it into xml_parse(); check and make sure that the tags are not sliced in the middle, and use a temporary variable between fread and xml_parse.
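The "temporary variable between fread and xml_parse" could look roughly like the sketch below. It holds back everything after the last '>' in each chunk and prepends it to the next one, so no call to xml_parse() ends in the middle of a tag or in the middle of the text that follows a tag. The function name feedInChunks() and its parameters are made up for illustration; as far as I know expat itself tolerates input split at arbitrary points, so treat this as one possible approach, not a requirement.

```php
<?php
// Sketch: carry incomplete trailing data over to the next chunk so that
// xml_parse() only receives pieces that end on a '>' boundary.
// feedInChunks() is a hypothetical helper, not part of the XML extension.
function feedInChunks($parser, $fp, int $chunkSize = 4096): bool
{
    $carry = '';
    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false) {
            return false;
        }
        $buffer = $carry . $chunk;
        $cut = strrpos($buffer, '>');
        if ($cut === false) {
            // No tag end in the buffer yet; keep everything for the next round.
            $carry = $buffer;
            continue;
        }
        // Everything after the last '>' waits for the next chunk.
        $carry = substr($buffer, $cut + 1);
        if (!xml_parse($parser, substr($buffer, 0, $cut + 1), false)) {
            return false;
        }
    }
    // Flush what is left and tell the parser that the document is complete.
    return (bool) xml_parse($parser, $carry, true);
}
```

The carry buffer grows as long as no '>' shows up, so a document with one huge text node ends up in memory anyway.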
simen at bleed dot no
12-Jan-2003 07:27
I was experiencing really weird behaviour loading a large XML document (91k), since reading the file through a 4096-byte buffer doesn't take the following into consideration:
<node>this is my value</node>
If the 4096 byte buffer fills up at "my", the handler you registered with xml_set_character_data_handler() receives the string split in two.
The only solution I've found so far is to read the whole document into a variable and then parse.
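A minimal sketch of that "read everything, then parse" approach follows. The helper name parseWholeDocument() and the file name in the usage comment are assumptions made for this example. Note that the character data handler may still be called more than once per element, so the pieces are collected in an array here.

```php
<?php
// Sketch: parse a complete document with a single xml_parse() call.
// parseWholeDocument() is a hypothetical helper, not part of the XML extension.
function parseWholeDocument(string $xml): array
{
    $values = [];
    $parser = xml_parser_create();
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$values) {
            $values[] = $data; // collect every piece of character data
        }
    );
    // The third argument (true) marks this as the final piece of input.
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
    return $values;
}

// Usage (the file name is only an example):
// $values = parseWholeDocument(file_get_contents('large.xml'));
```

This trades memory for simplicity, which is fine for a 91k document but less so for very large files.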
mreilly at ZEROSPAM dot MAC dot COM
15-Nov-2002 02:01
I wanted a way to reference the XML tree by path. I couldn't find exactly what I wanted, but using examples here and on phpbuilder.com came up with this. This results in a nested associative array, so elements can be accessed in the manner:
echo $ary_parsed_file['path']['to']['value'];
<?php
// Display the print_r() output in a readable format
echo '<PRE>';
// Array to store current xml path
$ary_path = array();
// Array to store parsed data
$ary_parsed_file = array();
// Starting level - Set to 0 to display all levels. Set to 1 or higher
// to skip a level that is common to all the fields.
$int_starting_level = 1;
// what are we parsing?
$xml_file = 'label.xml';
// declare the character set - UTF-8 is the default
$type = 'UTF-8';
// create our parser
$xml_parser = xml_parser_create($type);
// set some parser options
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
// this tells PHP what functions to call when it finds an element
// these functions also handle the element's attributes
xml_set_element_handler($xml_parser, 'startElement', 'endElement');
// this tells PHP what function to use on the character data
xml_set_character_data_handler($xml_parser, 'characterData');
if (!($fp = fopen($xml_file, 'r'))) {
    die("Could not open $xml_file for parsing!\n");
}
// loop through the file and parse baby!
while ($data = fread($fp, 4096)) {
    // utf8_encode() assumes ISO-8859-1 input; drop this step if the
    // file is already UTF-8, otherwise it gets encoded twice
    if (!($data = utf8_encode($data))) {
        echo 'ERROR'."\n";
    }
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf( "XML error: %s at line %d\n\n",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
// Display the array
print_r($ary_parsed_file);
// This function is called for every opening XML tag. We
// need to keep track of our path in the XML file, so we
// will use this function to add the tag name to an array
function startElement($parser, $name, $attrs=''){
    // Make sure we can access the path array
    global $ary_path;
    // Push the tag into the array
    array_push($ary_path, $name);
}
// This function is called for every closing XML tag. We
// need to keep track of our path in the XML file, so we
// will use this function to remove the last item of the array.
function endElement($parser, $name){
    // Make sure we can access the path array
    global $ary_path;
    // Pop the tag off the array
    array_pop($ary_path);
}
// This function is called for every data portion found between
// opening and closing tags. We will use it to insert values
// into the array.
function characterData($parser, $data){
    // Make sure we can access the path and parsed file arrays
    // and the starting level value
    global $ary_parsed_file, $ary_path, $int_starting_level;
    // Remove extra white space from the data (so we can tell if it's empty)
    $str_trimmed_data = trim($data);
    // Since this function gets called whether there is text data or not,
    // we need to skip empty data or it overwrites previous legitimate data.
    if ($str_trimmed_data !== '') {
        // Walk down the path by reference, creating one array level per tag.
        // (Starting level can be defined.) No eval() is needed, so quotes
        // in the data are safe.
        $ref =& $ary_parsed_file;
        for ($i = $int_starting_level; $i < count($ary_path); $i++) {
            $ref =& $ref[$ary_path[$i]];
        }
        // Store the value at the end of the path
        $ref = $str_trimmed_data;
    } // if
}
?>
sfaulkner at hoovers dot com
04-Nov-2002 04:29
Building on... This lets you return the value of an element using a slash-separated, XPath-like path. This code would of course need error handling added :-)
function GetElementByName ($xml, $start, $end) {
    $startpos = strpos($xml, $start);
    if ($startpos === false) {
        return false;
    }
    $endpos = strpos($xml, $end);
    $endpos = $endpos+strlen($end);
    $endpos = $endpos-$startpos;
    $endpos = $endpos - strlen($end);
    $tag = substr ($xml, $startpos, $endpos);
    $tag = substr ($tag, strlen($start));
    return $tag;
}
function XPathValue($XPath,$XML) {
    $XPathArray = explode("/",$XPath);
    $node = $XML;
    // Descend one path segment at a time (each() no longer exists in PHP 8)
    foreach ($XPathArray as $value) {
        $node = GetElementByName($node, "<$value>", "</$value>");
    }
    return $node;
}
print XPathValue("Response/Shipment/TotalCharges/Value",$xml);
guy at bhaktiandvedanta dot com
28-Sep-2002 03:01
For a simple XML parser you can use this function. It doesn't require any extensions to run.
<?php
// Extracts content from XML tag
function GetElementByName ($xml, $start, $end) {
    global $pos;
    $startpos = strpos($xml, $start);
    if ($startpos === false) {
        return false;
    }
    $endpos = strpos($xml, $end);
    $endpos = $endpos+strlen($end);
    $pos = $endpos;
    $endpos = $endpos-$startpos;
    $endpos = $endpos - strlen($end);
    $tag = substr ($xml, $startpos, $endpos);
    $tag = substr ($tag, strlen($start));
    return $tag;
}
// Open and read xml file. You can replace this with your xml data.
$file = "data.xml";
$pos = 0;
$Nodes = array();
$data = "";
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}
while ($getline = fread($fp, 4096)) {
    $data = $data . $getline;
}
$count = 0;
// Goes through the XML file and creates an array of all <XML_TAG> tags.
while (($node = GetElementByName($data, "<XML_TAG>", "</XML_TAG>")) !== false) {
    $Nodes[$count] = $node;
    $count++;
    $data = substr($data, $pos);
}
// Gets information from tag siblings.
for ($i=0; $i<$count; $i++) {
    $code = GetElementByName($Nodes[$i], "<Code>", "</Code>");
    $desc = GetElementByName($Nodes[$i], "<Description>", "</Description>");
    $price = GetElementByName($Nodes[$i], "<BasePrice>", "</BasePrice>");
}
?>
Hope this helps! :)
Guy Laor
dmarsh dot NO dot SPAM dot PLEASE at spscc dot ctc dot edu
19-Sep-2002 03:27
Some reference code from an "XML Library" I am working on, which I am folding into an object. Notice the use of define():
Mainly Example 1 and parts of 2 & 3 rewritten as an object:
--- MyXMLWalk.lib.php ---
<?php
if (!defined("PHPXMLWalk")) {
    define("PHPXMLWalk",TRUE);
    class XMLWalk {
        var $p; //short for xml parser;
        var $e; //short for element depth (current indent)
        function prl($x,$i=0) {
            ob_start();
            print_r($x);
            $buf=ob_get_contents();
            ob_end_clean();
            return join("\n".str_repeat(" ",$i),explode("\n",$buf));
        }
        function __construct() {
            $this->p = xml_parser_create();
            $this->e = 0;
            xml_parser_set_option($this->p, XML_OPTION_CASE_FOLDING, true);
            xml_set_element_handler($this->p, array($this, "startElement"), array($this, "endElement"));
            xml_set_character_data_handler($this->p, array($this, "dataElement"));
            register_shutdown_function(array($this, "free")); // make a destructor
        }
        function startElement($parser, $name, $attrs) {
            if (count($attrs)>=1) {
                $x = $this->prl($attrs, $this->e+6);
            } else {
                $x = "";
            }
            print str_repeat(" ",$this->e). "$name $x\n";
            $this->e += 2;
        }
        function dataElement($parser, $data) {
            print str_repeat(" ",$this->e). htmlspecialchars($data, ENT_QUOTES) ."\n";
        }
        function endElement($parser, $name) {
            $this->e -= 2;
        }
        function parse($data, $fp) {
            if (!xml_parse($this->p, $data, feof($fp))) {
                die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($this->p)),
                    xml_get_current_line_number($this->p)));
            }
        }
        function free() {
            xml_parser_free($this->p);
        }
    } // end of class
} // end of define
?>
--- end of file ---
Calling code:
<?php
...
require("MyXMLWalk.lib.php");
$file = "x.xml";
$xme = new XMLWalk;
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}
while ($data = fread($fp, 4096)) {
    $xme->parse($data, $fp);
}
...
?>
jon at gettys dot org
15-Aug-2002 04:59
[Editor's note: see also xml_parse_into_struct().]
Very simple routine to convert an XML file into a PHP structure. $obj->xml contains the resulting PHP structure. I would be interested if someone could suggest a cleaner method than the evals I am using.
<?php
$filename = 'sample.xml';
$obj = new stdClass;
$obj->tree = '$obj->xml';
$obj->xml = new stdClass;
function startElement($parser, $name, $attrs) {
    global $obj;
    // If var already defined, make array
    eval('$test=isset('.$obj->tree.'->'.$name.');');
    if ($test) {
        eval('$tmp='.$obj->tree.'->'.$name.';');
        eval('$arr=is_array('.$obj->tree.'->'.$name.');');
        if (!$arr) {
            eval('unset('.$obj->tree.'->'.$name.');');
            eval($obj->tree.'->'.$name.'[0]=$tmp;');
            $cnt = 1;
        }
        else {
            eval('$cnt=count('.$obj->tree.'->'.$name.');');
        }
        $obj->tree .= '->'.$name."[$cnt]";
    }
    else {
        $obj->tree .= '->'.$name;
    }
    // Create the node explicitly; PHP 8 no longer creates objects implicitly
    eval($obj->tree.'=new stdClass;');
    if (count($attrs)) {
        eval($obj->tree.'->attr=$attrs;');
    }
}
function endElement($parser, $name) {
    global $obj;
    // Strip off last ->
    for($a=strlen($obj->tree);$a>0;$a--) {
        if (substr($obj->tree, $a, 2) == '->') {
            $obj->tree = substr($obj->tree, 0, $a);
            break;
        }
    }
}
function characterData($parser, $data) {
    global $obj;
    // Assign the variable itself so quotes in the data cannot break the eval
    eval($obj->tree.'->data=$data;');
}
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($filename, "r"))) {
    die("could not open XML input");
}
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
print_r($obj->xml);
return 0;
?>
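On the question of a cleaner method than eval: one possibility is to keep the open elements on a stack and attach each node to its parent when its closing tag arrives. The sketch below does that with nested arrays instead of objects. The function name xmlToTree() and the keys 'name', 'attr', 'data' and 'children' are choices made for this example only, not anything defined by the extension.

```php
<?php
// Sketch: build a nested array from XML with a stack of open elements,
// without eval(). xmlToTree() and the array keys are illustrative choices.
function xmlToTree(string $xml): ?array
{
    $stack = [];   // elements that are currently open
    $root = null;  // set when the outermost element closes
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$stack) {
            // Opening tag: push a fresh node.
            $stack[] = ['name' => $name, 'attr' => $attrs, 'data' => '', 'children' => []];
        },
        function ($parser, $name) use (&$stack, &$root) {
            // Closing tag: pop the node and hang it on its parent.
            $node = array_pop($stack);
            if ($stack === []) {
                $root = $node;
            } else {
                $stack[count($stack) - 1]['children'][] = $node;
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$stack) {
            // Append, because the handler can fire several times per element.
            if ($stack !== []) {
                $stack[count($stack) - 1]['data'] .= $data;
            }
        }
    );
    $ok = xml_parse($parser, $xml, true);
    return $ok ? $root : null;
}
```

Repeated sibling elements simply become several entries in 'children', so no special case is needed to turn a single value into an array.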
jason at N0SPAM dot projectexpanse dot com
23-Mar-2002 05:16
In reference to the note made by sam@cwa.co.nz about parsing entities:
I could be wrong, but since it is possible to define your own entities within an XML DTD, the cdata handler function parses these individually to allow for your own implementation of those entities within your cdata handler.
jason at NOSPAM_projectexpanse_NOSPAM dot com
27-Feb-2002 08:11
For newbies wanting a good tutorial on how to actually get started and where to go from this listing of functions, then visit:
http://www.wirelessdevnet.com/channels/wap/features/xmlcast_php.html
It shows an excellent example of how to read the XML data into a class file so you can actually process it, not just display it all pretty-like, like many tutorials on PHP/XML seem to be doing.
hans dot schneider at bbdo-interone dot de
25-Jan-2002 12:43
I had to trim() the data when I passed one large string containing a well-formed XML file to xml_parse. The string was read by cURL, which apparently put a blank at the end of the string. This blank produced an "XML not well-formed" error in xml_parse!
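A small sketch of that workaround, assuming the whole response is already in a string. The helper name parseTrimmed() is made up for this example. By default trim() removes spaces, tabs, newlines, carriage returns, NUL bytes and vertical tabs from both ends, which covers the usual stray characters around a downloaded document.

```php
<?php
// Sketch: strip stray leading/trailing whitespace before a one-shot parse.
// parseTrimmed() is a hypothetical helper, not part of the XML extension.
function parseTrimmed(string $response): bool
{
    $parser = xml_parser_create();
    // true = this is the complete document
    return (bool) xml_parse($parser, trim($response), true);
}
```

If the parse still fails after trimming, xml_get_error_code() and xml_error_string() on the parser are the next things to look at.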
sam at cwa dot co dot nz
28-Sep-2000 10:39
I've discovered some unusual behaviour in this API when ampersand entities are parsed in cdata: for some reason the parser breaks up the section around the entities and calls the handler repeatedly, once for each of the pieces. If you don't allow for this oddity and you are trying to put the cdata into a variable, only the last part will be stored.
You can get around this with a line like:
$foo .= $cdata;
If the handler is called several times from the same tag, it will append them, rather than rewriting the variable each time. If the entire cdata section is returned, it doesn't matter.
This may happen for other entities too, but I haven't investigated.
Took me a while to figure out what was happening; hope this saves someone else the trouble.
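To show the append idea in context, here is a rough sketch that resets a buffer at each opening tag, appends in the character data handler, and stores the result when the closing tag arrives. The function name collectText() is invented for this example, and the sketch only keeps the text of leaf elements, keyed by tag name.

```php
<?php
// Sketch: collect character data per element, however many times the
// handler fires. collectText() is a hypothetical helper name.
function collectText(string $xml): array
{
    $buffer = '';
    $result = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$buffer) {
            $buffer = '';             // new element: start with an empty buffer
        },
        function ($parser, $name) use (&$buffer, &$result) {
            $result[$name] = $buffer; // element closed: the text is complete
            $buffer = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$buffer) {
            $buffer .= $data;         // append, never overwrite
        }
    );
    xml_parse($parser, $xml, true);
    return $result;
}
```

With an input such as `<title>Fish &amp; Chips</title>` the handler may be called for "Fish ", "&" and " Chips" separately, and the buffer puts them back together.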
Daniel dot Rendall at btinternet dot com
08-Jul-1999 01:21
When using the XML parser, make sure you're not using the magic quotes option (e.g. use set_magic_quotes_runtime(0) if it's not the compiled default), otherwise you'll get 'not well-formed' errors when dealing with tags with attributes set in them.