CV. PDF functions
The PDF functions in PHP can create PDF files using the PDFlib
library created by Thomas
Merz.
The documentation in this section is only meant to be an overview
of the available functions in the PDFlib library and should not be
considered an exhaustive reference. Please consult the
documentation included in the source distribution of PDFlib for
the full and detailed explanation of each function here. It
provides a very good overview of what PDFlib is capable of doing
and contains the most up-to-date documentation of all functions.
All of the functions in PDFlib and the PHP module have identical
function names and parameters. You will need to understand some
of the basic concepts of PDF and PostScript to efficiently use
this extension. All lengths and coordinates are measured in
PostScript points. There are generally 72 PostScript points to an
inch, but this depends on the output resolution. Please see the
PDFlib documentation included with the source distribution of
PDFlib for a more thorough explanation of the coordinate system
used.
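As a quick illustration of the point arithmetic mentioned above, the following sketch converts inches to PostScript points at the nominal 72 points per inch. The helper name inches_to_points is made up for this example and is not part of PDFlib.

```php
<?php
// Hypothetical helper (not a PDFlib function): convert inches to
// PostScript points, assuming the nominal 72 points per inch.
function inches_to_points($inches)
{
    return $inches * 72;
}

// US Letter paper is 8.5 x 11 inches.
$width  = inches_to_points(8.5);  // 612 points
$height = inches_to_points(11);   // 792 points
```

The resulting values are the kind of width and height that pdf_begin_page() expects.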
Please note that most of the PDF functions require a
pdfdoc as their first parameter. Please
see the examples below for more
information.
Note:
If you're interested in alternative free PDF generators that do not
utilize external PDF libraries, see
this related FAQ.
Note:
This extension has been moved to PECL as
of PHP 4.3.9.
PDFlib is available for download at http://www.pdflib.com/products/pdflib/index.html, but requires that you purchase
a license for commercial use. The JPEG and TIFF libraries are required to compile
this extension.
Any version of PHP 4 after March 9, 2000 does not support versions
of PDFlib older than 3.0.
PDFlib 3.0 or greater is supported by PHP 3.0.19 and later.
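This PECL extension is not bundled with PHP.
Further information, such as new releases, downloads, source files, maintainer information, and the changelog, can be found here:
http://pecl.php.net/package/pdflib.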
To get these functions to work in PHP < 4.3.9, you have to compile PHP with
--with-pdflib[=DIR]. DIR is the PDFlib
base install directory, defaults to /usr/local.
In addition you can specify the JPEG, TIFF, and PNG libraries for PDFlib to
use, which is optional for PDFlib 4.x.
To do so add to your configure line the options
--with-jpeg-dir[=DIR]
--with-png-dir[=DIR]
--with-tiff-dir[=DIR].
When using version 3.x of PDFlib, you should configure PDFlib
with the option --enable-shared-pdflib.
As of PHP 4.3.9, you must install this extension through PEAR, using the following command:
pear install pdflib.
This extension has no configuration directives defined in php.ini.
Starting with PHP 4.0.5, the PHP extension for PDFlib is
officially supported by PDFlib GmbH. This means that all the
functions described in the PDFlib manual (V3.00 or greater) are
supported by PHP 4 with exactly the same meaning and the same
parameters. Only the return values may differ from the PDFlib
manual, because the PHP convention of returning
FALSE was adopted. For compatibility reasons,
this binding for PDFlib still supports the old functions, but they
should be replaced by their new versions. PDFlib GmbH will not
support any problems arising from the use of these deprecated
functions.
Table 1. Deprecated functions and their replacements
Most of the functions are fairly easy to use. The most difficult part
is probably creating your first PDF document. The following
example should help to get you started.
It creates test.pdf
with one page. The page contains the text "Times Roman outlined" in an
outlined, 30pt font. The text is also underlined.
Example 1. Creating a PDF document with PDFlib
<?php
$pdf = pdf_new();
pdf_open_file($pdf, "test.pdf");
pdf_set_info($pdf, "Author", "Uwe Steinmann");
pdf_set_info($pdf, "Title", "Test for PHP wrapper of PDFlib 2.0");
pdf_set_info($pdf, "Creator", "See Author");
pdf_set_info($pdf, "Subject", "Testing");
pdf_begin_page($pdf, 595, 842);
pdf_add_outline($pdf, "Page 1");
$font = pdf_findfont($pdf, "Times New Roman", "winansi", 1);
pdf_setfont($pdf, $font, 10);
pdf_set_value($pdf, "textrendering", 1);
pdf_show_xy($pdf, "Times Roman outlined", 50, 750);
pdf_moveto($pdf, 50, 740);
pdf_lineto($pdf, 330, 740);
pdf_stroke($pdf);
pdf_end_page($pdf);
pdf_close($pdf);
pdf_delete($pdf);
echo "<A HREF=getpdf.php>finished</A>";
?>
The script getpdf.php just returns the pdf document.
Example 2. Outputting a precalculated PDF
<?php
$filename = "test.pdf"; // the file written by Example 1
$len = filesize($filename);
header("Content-type: application/pdf");
header("Content-Length: $len");
header("Content-Disposition: inline; filename=foo.pdf");
readfile($filename);
?>
The PDFlib distribution contains a more complex example which
creates a page with an analog clock. Here we use the in-memory
creation feature of PDFlib to alleviate the need to use temporary
files. The example was converted to PHP from the PDFlib example.
(The same example is available in the CLibPDF documentation.)
Example 3. pdfclock example from PDFlib distribution
<?php
$radius = 200;
$margin = 20;
$pagecount = 10;
$pdf = pdf_new();
/* an empty file name requests in-memory creation */
if (!pdf_open_file($pdf, "")) {
    echo "error";
    exit;
}
pdf_set_parameter($pdf, "warning", "true");
pdf_set_info($pdf, "Creator", "pdf_clock.php");
pdf_set_info($pdf, "Author", "Uwe Steinmann");
pdf_set_info($pdf, "Title", "Analog Clock");
while ($pagecount-- > 0) {
    pdf_begin_page($pdf, 2 * ($radius + $margin), 2 * ($radius + $margin));
    pdf_set_parameter($pdf, "transition", "wipe");
    pdf_set_value($pdf, "duration", 0.5);
    pdf_translate($pdf, $radius + $margin, $radius + $margin);
    pdf_save($pdf);
    pdf_setrgbcolor($pdf, 0.0, 0.0, 1.0);
    /* minute strokes */
    pdf_setlinewidth($pdf, 2.0);
    for ($alpha = 0; $alpha < 360; $alpha += 6) {
        pdf_rotate($pdf, 6.0);
        pdf_moveto($pdf, $radius, 0.0);
        pdf_lineto($pdf, $radius-$margin/3, 0.0);
        pdf_stroke($pdf);
    }
    pdf_restore($pdf);
    pdf_save($pdf);
    /* 5 minute strokes */
    pdf_setlinewidth($pdf, 3.0);
    for ($alpha = 0; $alpha < 360; $alpha += 30) {
        pdf_rotate($pdf, 30.0);
        pdf_moveto($pdf, $radius, 0.0);
        pdf_lineto($pdf, $radius-$margin, 0.0);
        pdf_stroke($pdf);
    }
    $ltime = getdate();
    /* draw hour hand */
    pdf_save($pdf);
    pdf_rotate($pdf,-(($ltime['minutes']/60.0)+$ltime['hours']-3.0)*30.0);
    pdf_moveto($pdf, -$radius/10, -$radius/20);
    pdf_lineto($pdf, $radius/2, 0.0);
    pdf_lineto($pdf, -$radius/10, $radius/20);
    pdf_closepath($pdf);
    pdf_fill($pdf);
    pdf_restore($pdf);
    /* draw minute hand */
    pdf_save($pdf);
    pdf_rotate($pdf,-(($ltime['seconds']/60.0)+$ltime['minutes']-15.0)*6.0);
    pdf_moveto($pdf, -$radius/10, -$radius/20);
    pdf_lineto($pdf, $radius * 0.8, 0.0);
    pdf_lineto($pdf, -$radius/10, $radius/20);
    pdf_closepath($pdf);
    pdf_fill($pdf);
    pdf_restore($pdf);
    /* draw second hand */
    pdf_setrgbcolor($pdf, 1.0, 0.0, 0.0);
    pdf_setlinewidth($pdf, 2);
    pdf_save($pdf);
    pdf_rotate($pdf, -(($ltime['seconds'] - 15.0) * 6.0));
    pdf_moveto($pdf, -$radius/5, 0.0);
    pdf_lineto($pdf, $radius, 0.0);
    pdf_stroke($pdf);
    pdf_restore($pdf);
    /* draw little circle at center */
    pdf_circle($pdf, 0, 0, $radius/30);
    pdf_fill($pdf);
    pdf_restore($pdf);
    pdf_end_page($pdf);
    # to see some difference
    sleep(1);
}
pdf_close($pdf);
$buf = pdf_get_buffer($pdf);
$len = strlen($buf);
header("Content-type: application/pdf");
header("Content-Length: $len");
header("Content-Disposition: inline; filename=foo.pdf");
echo $buf;
pdf_delete($pdf);
?>
Note:
An alternative PHP module for PDF document creation based on
FastIO's ClibPDF is
available. Please see the ClibPDF
section for details. Note that ClibPDF has a slightly different API
than PDFlib.
phpguy at theos dot me dot uk
01-Mar-2006 09:17
On my system at least (debian stable) the command to install pdflib is not
pear install pdflib
but rather
pecl install pdflib
spingary at yahoo dot com
13-Jan-2006 04:55
I was having trouble streaming inline PDFs using PHP 5.0.2 and Apache 2.0.54.
This is my code:
<?
header("Pragma: public");
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: must-revalidate");
header("Content-type: application/pdf");
header("Content-Length: ".filesize($file));
header("Content-disposition: inline; filename=$file");
header("Accept-Ranges: ".filesize($file));
readfile($file);
exit();
?>
It would work fine in Mozilla Firefox (1.0.7) but with IE (6.0.2800.1106) it would not bring up the Adobe Reader plugin and instead ask me to save it or open it as a PHP file.
Oddly enough, I turned off ZLib.compression and it started working. I guess the compression is confusing IE. I tried leaving out the content-length header thinking maybe it was unmatched filesize (uncompressed number vs actual received compressed size), but then without it it screws up Firefox too.
What I ended up doing was disabling Zlib compression for the PDF output pages using ini_set:
<?
ini_set('zlib.output_compression','Off');
?>
Maybe this will help someone. Will post over in the PDF section as well.
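The workaround from this note might be put together roughly as in the sketch below: switch off zlib output compression for the request, then send the PDF headers and the file. The helpers pdf_inline_headers() and send_pdf_inline() are invented for the illustration; only ini_set(), header(), filesize(), basename() and readfile() are real PHP functions.

```php
<?php
// Hypothetical helper: build the header lines for an inline PDF response.
function pdf_inline_headers($length, $name)
{
    return array(
        "Content-Type: application/pdf",
        "Content-Length: " . $length,
        "Content-Disposition: inline; filename=\"" . $name . "\"",
    );
}

// Hypothetical helper: stream an existing PDF file to the browser.
function send_pdf_inline($file)
{
    // Turn off zlib output compression for this response only; the note
    // above suspects that compressed output is what confuses IE.
    ini_set('zlib.output_compression', 'Off');

    foreach (pdf_inline_headers(filesize($file), basename($file)) as $line) {
        header($line);
    }
    readfile($file);
}
```

Calling send_pdf_inline() has to happen before any other output, since headers cannot be sent once the response body has started.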
davedotmarshallatcspencerltddotcodotuk
08-Nov-2005 08:17
RE: thodge at ipswich dot qld dot gov dot au
I think the line:
preg_match_all(
'/(T[wdcm*])[\s]*(\[([^\]]*)\]|\(([^\)]*)\))[\s]*Tj/si',
$postScriptData,
$matches
);
should read:
preg_match_all(
'/(T[wdcm*])[\s]*(\[([^\]]*)\]|\(([^\)]*)\))[\s]*Tj/si',
$psData,
$matches
);
ontwerp AT zonnet.nl
04-Nov-2005 03:01
I was searching for a low-cost, open-source way to combine static HTML files [as templates] with dynamic output from Perl or PHP routines. Sooner or later I found that the following was the most stable, fastest and most customizable way to produce usable PDFs with nice formatting:
1] Create the HTML page output [Perl -> HTML output, direct HTML output from any application, PHP echo statements, etc.]. [Sort these HTML files locally.]
2] Convert all HTML [including web image links, tables, font formatting, etc.] to [E]PS files with the Perl application html2ps [as mentioned below]:
http://user.it.uu.se/~jan/html2ps.html [Sort all PS files by their future PDF page positions.]
3] Use the free ps2pdf/ps2pdfwr Linux application:
http://www.ps2pdf.com/convert/index.htm [uses Ghostscript, Ghostview libraries and so on]
It has great formatting options such as headers, footers and numbering.
[Sort the PDF files.]
4] Merge all PDF files into one PDF file with pdftk [PDF toolkit], which offers optional compression/encryption, background stamps, etc.
One might ask why I use different scripts:
- The Perl/PHP combination is great: in my experience Perl is faster at some tasks, such as conversion to PS files.
- PS to PDF is quicker than direct PHP to PDF [in my experience!].
- I have total control over every file; whenever I change the HTML files used as templates, I only need an editor or another application [online or offline].
P.S. I had to build an open-source solution for creating simple analysis reports made up of things like:
- a first page [name / title / # / date]
- some static information [introduction, copyright notice, etc.]
- some dynamic information [output from PHP -> database queries] combined
with HTML tags, images, etc.
All of this is mixed [so it is separated into files for transparency]. The three-step approach (data -> HTML, HTML -> PS, PS -> PDF) is also easier and quicker to program or adjust at every step.
Correct me if I'm wrong [mail me].
ing. Valentijn Langendorff
Design & Technologist
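The HTML -> PS -> PDF -> merged PDF chain from this note could be driven from PHP along the lines of the sketch below. It only builds the command lines; the function name build_pipeline_commands() and the file names are placeholders chosen for the example, and html2ps, ps2pdf and pdftk are assumed to be installed and on the PATH.

```php
<?php
// Build (but do not run) the shell commands for the conversion steps.
// Hypothetical helper; file names are placeholders.
function build_pipeline_commands(array $htmlFiles, $mergedPdf)
{
    $commands = array();
    $pdfFiles = array();

    foreach ($htmlFiles as $html) {
        $base = preg_replace('/\.html?$/i', '', $html);
        $ps   = $base . '.ps';
        $pdf  = $base . '.pdf';

        // Step 2: HTML -> PostScript (html2ps writes to standard output).
        $commands[] = 'html2ps ' . escapeshellarg($html) . ' > ' . escapeshellarg($ps);
        // Step 3: PostScript -> PDF.
        $commands[] = 'ps2pdf ' . escapeshellarg($ps) . ' ' . escapeshellarg($pdf);

        $pdfFiles[] = escapeshellarg($pdf);
    }

    // Step 4: concatenate the per-page PDFs in the order given.
    $commands[] = 'pdftk ' . implode(' ', $pdfFiles)
                . ' cat output ' . escapeshellarg($mergedPdf);

    return $commands;
}

$commands = build_pipeline_commands(array('cover.html', 'report.html'), 'final.pdf');
foreach ($commands as $command) {
    echo $command, "\n";   // pass each one to shell_exec() to actually run it
}
```

Keeping the commands as plain strings makes it easy to log them or to rerun a single step when only one template changes.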
ragnar at deulos dot com
08-Oct-2005 10:30
After a whole day spent understanding how PDFlib works, I came to the conclusion that drawing with it is rather laborious: even a single line may need something like four lines of code. So I wrote my own functions to make life easier and the code easier to understand and modify. I also made a function that draws a rectangle with rounded corners and can optionally fill it ;)
You can get it from http://www.deulos.com/pdf_php.php
Feel free to make suggestions or whatever you like ;o)
18-Sep-2005 02:26
some code that can be very helpful for starters.
<?php
// Declare PDF File
$pdf = pdf_new();
PDF_open_file($pdf);
// Set Document Properties
PDF_set_info($pdf, "author", "Alexander Pas");
PDF_set_info($pdf, "title", "PDF by PHP Example");
PDF_set_info($pdf, "creator", "Alexander Pas");
PDF_set_info($pdf, "subject", "Testing Code");
// Get fonts to use
pdf_set_parameter($pdf, "FontOutline", "Arial=arial.ttf"); // get a custom font
$font1 = PDF_findfont($pdf, "Helvetica-Bold", "winansi", 0); // declare default font
$font2 = PDF_findfont($pdf, "Arial", "winansi", 1); // declare custom font & embed into file
/*
You can safely use the following 14 font types (the default fonts)
Courier, Courier-Bold, Courier-Oblique, Courier-BoldOblique
Helvetica, Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique
Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic
Symbol, ZapfDingbats
*/
// make the images
$image1 = PDF_open_image_file($pdf, "gif", "image.gif"); //supported filetypes are: jpeg, tiff, gif, png.
//Make First Page
PDF_begin_page($pdf, 450, 450); // page width and height.
$bookmark = PDF_add_bookmark($pdf, "Front"); // add a top level bookmark.
PDF_setfont($pdf, $font1, 12); // use this font from now on.
PDF_show_xy($pdf, "First Page!", 5, 225); // show this text; coordinates are measured from the bottom-left corner.
pdf_place_image($pdf, $image1, 255, 5, 1); // the last number scales the image.
PDF_end_page($pdf); // End of Page.
//Make Second Page
PDF_begin_page($pdf, 450, 225); // page width and height.
$bookmark1 = PDF_add_bookmark($pdf, "Chapter1", $bookmark); // add a nested bookmark. (can be nested multiple times.)
PDF_setfont($pdf, $font2, 12); // use this font from now on.
PDF_show_xy($pdf, "Chapter1!", 225, 5);
PDF_add_bookmark($pdf, "Chapter1.1", $bookmark1); // add a nested bookmark (already in a nested one).
PDF_setfont($pdf, $font1, 12);
PDF_show_xy($pdf, "Chapter1.1", 225, 5);
PDF_end_page($pdf);
// Finish the PDF File
PDF_close($pdf); // End Of PDF-File.
$output = PDF_get_buffer($pdf); // assemble the file in a variable.
// Output Area
header("Content-type: application/pdf"); //set filetype to pdf.
header("Content-Length: ".strlen($output)); //content length
header("Content-Disposition: attachment; filename=test.pdf"); // you can use inline or attachment.
echo $output; // actual print area!
// Cleanup
PDF_delete($pdf);
?>
thodge at ipswich dot qld dot gov dot au
05-Sep-2005 01:22
Yet another addition to the PDF text extraction code last posted by jorromer. That code only seemed to work for PDF 1.2 (Acrobat 3.x) or below. This pdfExtractText function uses regular expressions to cover cases I have found in PDF 1.3 and 1.4 documents. The code also handles closing brackets in the text stream, which were ignored by the previous version. My regular expression skills are somewhat lacking, so improvements may be possible by a more skilled programmer. I'm sure there are still cases that this function will not handle, but I haven't come across any yet...
<?php
function pdf2string($sourcefile) {
$fp = fopen($sourcefile, 'rb');
$content = fread($fp, filesize($sourcefile));
fclose($fp);
$searchstart = 'stream';
$searchend = 'endstream';
$pdfText = '';
$pos = 0;
$pos2 = 0;
$startpos = 0;
while ($pos !== false && $pos2 !== false) {
$pos = strpos($content, $searchstart, $startpos);
$pos2 = strpos($content, $searchend, $startpos + 1);
if ($pos !== false && $pos2 !== false){
if ($content[$pos] == 0x0d && $content[$pos + 1] == 0x0a) {
$pos += 2;
} else if ($content[$pos] == 0x0a) {
$pos++;
}
if ($content[$pos2 - 2] == 0x0d && $content[$pos2 - 1] == 0x0a) {
$pos2 -= 2;
} else if ($content[$pos2 - 1] == 0x0a) {
$pos2--;
}
$textsection = substr(
$content,
$pos + strlen($searchstart) + 2,
$pos2 - $pos - strlen($searchstart) - 1
);
$data = @gzuncompress($textsection);
$pdfText .= pdfExtractText($data);
$startpos = $pos2 + strlen($searchend) - 1;
}
}
return preg_replace('/(\s)+/', ' ', $pdfText);
}
function pdfExtractText($psData){
if (!is_string($psData)) {
return '';
}
$text = '';
// Handle brackets in the text stream that could be mistaken for
// the end of a text field. I'm sure you can do this as part of the
// regular expression, but my skills aren't good enough yet.
$psData = str_replace('\)', '##ENDBRACKET##', $psData);
$psData = str_replace('\]', '##ENDSBRACKET##', $psData);
preg_match_all(
'/(T[wdcm*])[\s]*(\[([^\]]*)\]|\(([^\)]*)\))[\s]*Tj/si',
$psData,
$matches
);
for ($i = 0; $i < sizeof($matches[0]); $i++) {
if ($matches[3][$i] != '') {
// Run another match over the contents.
preg_match_all('/\(([^)]*)\)/si', $matches[3][$i], $subMatches);
foreach ($subMatches[1] as $subMatch) {
$text .= $subMatch;
}
} else if ($matches[4][$i] != '') {
$text .= ($matches[1][$i] == 'Tc' ? ' ' : '') . $matches[4][$i];
}
}
// Translate special characters and put back brackets.
$trans = array(
'...' => '…',
'\205' => '…',
'\221' => chr(145),
'\222' => chr(146),
'\223' => chr(147),
'\224' => chr(148),
'\226' => '-',
'\267' => '•',
'\(' => '(',
'\[' => '[',
'##ENDBRACKET##' => ')',
'##ENDSBRACKET##' => ']',
chr(133) => '-',
chr(141) => chr(147),
chr(142) => chr(148),
chr(143) => chr(145),
chr(144) => chr(146),
);
$text = strtr($text, $trans);
return $text;
}
?>
29-Aug-2005 12:58
If you want to display the number of pages (for example: page 1 of 3) then the following code could be helpful:
<?php
...
$pdf->begin_page_ext(842,595 , "");
.. add text,images,...
$pdf->suspend_page("");
$pdf->begin_page_ext(842,595 , "");
.. add text,images,...
$pdf->suspend_page("");
... create all pages
$pdf->resume_page("pagenumber 1");
... add number of pages to page 1
$pdf->end_page_ext("");
$pdf->resume_page("pagenumber 2");
... add number of pages to page 2
$pdf->end_page_ext("");
...
?>
jorromer at uchile dot cl -- Krash
08-Jun-2005 01:51
I recently used mattb's code below to extract text from PDF files. I modified it to extract only the text fields.
I hope it helps someone.
Here is the function:
<?php
$text = pdf2string("file.pdf");
echo $text;
function pdf2string($sourcefile){
$fp = fopen($sourcefile, 'rb');
$content = fread($fp, filesize($sourcefile));
fclose($fp);
$searchstart = 'stream';
$searchend = 'endstream';
$pdfdocument = '';
$pos = 0;
$pos2 = 0;
$startpos = 0;
while( $pos !== false && $pos2 !== false ){
$pos = strpos($content, $searchstart, $startpos);
$pos2 = strpos($content, $searchend, $startpos + 1);
if ($pos !== false && $pos2 !== false){
if ($content[$pos]==0x0d && $content[$pos+1]==0x0a) $pos+=2;
else if ($content[$pos]==0x0a) $pos++;
if ($content[$pos2-2]==0x0d && $content[$pos2-1]==0x0a) $pos2-=2;
else if ($content[$pos2-1]==0x0a) $pos2--;
$textsection = substr($content, $pos + strlen($searchstart) + 2, $pos2 - $pos - strlen($searchstart) - 1);
$data = @gzuncompress($textsection);
$data = ExtractText2($data);
$startpos = $pos2 + strlen($searchend) - 1;
if ($data === false){
return -1;}
$pdfdocument .= $data;}}
return $pdfdocument;}
function ExtractText2($postScriptData){
$text = ''; // initialize so that .= and strtr() below always see a string
$sw = true;
$textStart = 0;
$len = strlen($postScriptData);
while ($sw){
$ini = strpos($postScriptData, '(', $textStart);
$end = strpos($postScriptData, ')', $textStart+1);
if (($ini>0) && ($end>$ini)){
$valtext = strpos($postScriptData,'Tj',$end+1);
if ($valtext == $end + 2)
$text .= substr($postScriptData,$ini+1,$end - $ini - 1);}
$textStart = $end + 1;
if ($len<=$textStart) $sw=false;
if (($ini == 0) && ($end == 0)) $sw=false;}
$trans = array("\\341" => "a","\\351" => "e","\\355" => "i","\\363" => "o","\\223" => "","\\224" => "");
$text = strtr($text, $trans);
return $text;
}
?>
jonathan dot beckett at gmail dot com
06-Jun-2005 06:03
After spending ages writing my own PDF to text extraction routine (well... a couple of hours), I realised that you have to interpret the entire stream to have a hope of getting all the characters you really want - so I started digging.
I then discovered that the XPDF project has everything you need to deal with PDFs - Linux and Win32 binaries are available. Most distro's have the RPMs too.
The resultant command is thus;
$result = shell_exec("pdftotext -raw ".$filename." -");
...it works perfectly for content searching purposes.
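A slightly more defensive variant of the command above is sketched here: the file name is passed through escapeshellarg() so that spaces or shell metacharacters in it cannot break the command. The function names pdftotext_command() and pdf_to_text() are made up for this example; pdftotext itself is assumed to be installed.

```php
<?php
// Build the pdftotext command with the file name escaped for the shell.
// "-raw" keeps content-stream order; the trailing "-" sends text to stdout.
function pdftotext_command($filename)
{
    return 'pdftotext -raw ' . escapeshellarg($filename) . ' -';
}

// Hypothetical wrapper: returns the extracted text, or null on failure.
function pdf_to_text($filename)
{
    if (!is_readable($filename)) {
        return null;
    }
    $result = shell_exec(pdftotext_command($filename));
    return is_string($result) ? $result : null;
}
```

Usage would be something like $text = pdf_to_text("manual.pdf"); followed by a stripos() search on the result.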
q
02-Jun-2005 05:24
It seems that the newest Adobe Reader 7 (using PDF 1.6) is no longer fully compatible with PDFs generated with PDFlib <= 5. The solution is to upgrade to PDFlib 6. Unfortunately, this means coughing up some more cash to the authors if you need to get rid of the watermark.
santa at selekcia dot com
19-May-2005 02:53
The pdf2string function used above does not work correctly with all PDFs. There are problems when the PDF uses 0x0D, 0x0A as line separators. A better way is to read the stream length from the /Length entry and to check whether the first two characters are 0x0d alone or 0x0d and 0x0a together.
When I update this code I will send it, but if someone has already changed it, please publish it. It might be better to extend the standard PDF library included with PHP with functionality for post-processing PDFs. That is sometimes useful, for example for working with templates.
Thanks to all the developers extending PHP's functions, and to the base team.
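One possible reading of the /Length suggestion is sketched below. It is not the updated code promised in the note; the function name pdf_streams_by_length() is invented here, and the sketch assumes that /Length is given as a direct integer in plain text, so streams with an indirect length such as "/Length 12 0 R" are simply skipped. Binary stream data that happens to contain the word "stream" could also confuse it.

```php
<?php
// Sketch: return the raw bytes of every stream whose dictionary gives
// /Length as a direct integer.
function pdf_streams_by_length($content)
{
    $streams = array();

    // Find every "stream" keyword plus its end-of-line marker;
    // the lookbehind keeps "endstream" from matching.
    preg_match_all('/(?<!end)stream(\r\n|\n|\r)/', $content, $hits, PREG_OFFSET_CAPTURE);

    foreach ($hits[0] as $hit) {
        list($keyword, $offset) = $hit;

        // The stream dictionary sits between the preceding "obj" and the keyword.
        $objPos = strrpos(substr($content, 0, $offset), 'obj');
        if ($objPos === false) {
            continue;
        }
        $dict = substr($content, $objPos, $offset - $objPos);

        // Accept only a direct integer length, not a reference like "12 0 R".
        if (!preg_match('/\/Length\s+(\d+)\b(?!\s+\d+\s+R\b)/', $dict, $len)) {
            continue;
        }

        // Data starts right after the end-of-line marker, whatever its width.
        $start     = $offset + strlen($keyword);
        $streams[] = substr($content, $start, (int) $len[1]);
    }

    return $streams;
}
```

Each returned string could then be handed to gzuncompress() in the same way the pdf2string functions on this page do.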
webadmin at secretscreen dot com
06-Apr-2005 05:51
I found this info about pdflib scope on a Chinese (I think) site and translated it. I was trying to do pdf_setfont and kept getting the wrong scope error. Turns out it has to be in the Page scope. So pdf_setfont will only work when called between pdf_begin_page and pdf_end_page.
#########################################
When a PDFlib API function is called, the error "Can't - IN 'document' scope" occurs
PDFlib has a concept of "scope": every PDFlib API function may only be called in certain scopes. This error occurs when a function is called outside the scopes permitted for it. Use the list below to check where each API call is made.
Path: from PDF_moveto(), PDF_circle(), PDF_arc(), PDF_arcn() or PDF_rect() up to PDF_stroke(), PDF_closepath_stroke(), PDF_fill(), PDF_fill_stroke(), PDF_closepath_fill_stroke(), PDF_clip() or PDF_endpath()
Page: between PDF_begin_page() and PDF_end_page(), outside Path scope
Template: between PDF_begin_template() and PDF_end_template(), outside Path scope
Pattern: between PDF_begin_pattern() and PDF_end_pattern(), outside Path scope
Font: between PDF_begin_font() and PDF_end_font(), outside Glyph scope
Glyph: between PDF_begin_glyph() and PDF_end_glyph(), outside Path scope
Document: between PDF_open_*() and PDF_close(), outside Page, Template and Pattern scope
Object: between PDF_new() and PDF_delete(), where no other scope applies
Null: outside Object scope
Any: all scopes other than Null
##########################################
Hope this helps others as much as it helped me!!!
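To make the scope rule concrete, here is a minimal sketch that annotates each call with the scope it runs in, using only functions that appear in the examples on this page. The wrapper function scoped_font_example() is made up for the illustration, and it returns null when the PDFlib extension is not loaded so that it can be pasted anywhere safely.

```php
<?php
// Sketch of where pdf_setfont() may be called. Returns the PDF as a
// string, or null when the PDFlib functions are not available.
function scoped_font_example()
{
    if (!function_exists('pdf_new') || !function_exists('pdf_open_file')) {
        return null;
    }

    $pdf = pdf_new();                      // object scope
    pdf_open_file($pdf, "");               // document scope (in-memory output)
    $font = pdf_findfont($pdf, "Helvetica", "winansi", 0);

    pdf_begin_page($pdf, 595, 842);        // page scope begins
    pdf_setfont($pdf, $font, 12);          // allowed: we are inside a page
    pdf_show_xy($pdf, "Hello", 50, 750);
    pdf_end_page($pdf);                    // back to document scope

    pdf_close($pdf);                       // back to object scope
    $buffer = pdf_get_buffer($pdf);
    pdf_delete($pdf);

    return $buffer;
}
```

Moving the pdf_setfont() call above pdf_begin_page() is the situation that produces the scope error described in this note.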
kevin at kevinnading dot com
31-Mar-2005 04:46
Hey people, about the bug with IE not accepting a PDF created via POST: if you can use the GET method instead, it will work fine. Both POST and GET work in Firefox, but only GET seems to work in IE. However, you can use a Content-Disposition of attachment (which requires user interaction) to pop up an open/save dialog box, and then both POST and GET work in IE and Firefox. Hope this helps!
beanjammin dot removethis at gmail dot com
31-Mar-2005 02:32
This was originally posted by mat3582 at NOSPAM dot hotmail dot com on the Session Handling Functions manual page, however as it is pdf specific I hope that moving it here will make it easier for others to find.
I fought this for longer than I'd care to admit after a web server distros switch before discovering my problem was session related and subsequently discovering Mat's post.
// Mats Note:
Outputting a PDF file to an MSIE browser didn't work (MSIE mistook the file for an ActiveX control,
then failed to download) until I added
<?php
ini_set('session.cache_limiter',"0");
?>
to my script. I hope this will help someone else.
// End Mats Note
In addition to Mat's suggestion, the php.ini file can also be edited to add or change the session.cache_limiter setting to 0.
chu61 dot tw at gmail dot com
07-Mar-2005 11:57
How do you find out how many pages a PDF has? I read the PDF specification V1.6 and found this:
PDF uses "page tree nodes" to define the ordering of pages in the document. The tree structure allows PDF applications to quickly open a document containing thousands of pages while using little memory.
If a PDF has 63 pages, the page tree node will look like this...
2 0 obj
<< /Type /Pages
/Kids [ 4 0 R
10 0 R
]
/Count 63 <---- YES, got it
>>
endobj
[P.S.] A PDF may contain more than one page tree node. The right answer is in the root page tree node; a node that has /Count XX together with a /Parent XXX entry is not the root page tree node.
So you must find the node with /Count XX and without a /Parent entry, and you will get the total number of pages in the PDF.
Works for %PDF-1.0 through %PDF-1.5.
Alex from Taipei, Taiwan
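The rule described in this note (take /Count from the page tree node that has no /Parent) might be coded as in the sketch below. The function name pdf_page_count() is invented for the example, and the sketch assumes the object dictionaries are stored as plain text in the file; it will not see objects packed into compressed object streams, and very large files may exceed the regular expression engine's limits.

```php
<?php
// Sketch: find the root page tree node (/Type /Pages, has /Count, no /Parent)
// and return its /Count. Returns null if no such node is found.
function pdf_page_count($content)
{
    // Split the file into the bodies of "N G obj ... endobj" blocks.
    if (!preg_match_all('/\d+\s+\d+\s+obj(.*?)endobj/s', $content, $objects)) {
        return null;
    }

    foreach ($objects[1] as $body) {
        $isPagesNode = preg_match('/\/Type\s*\/Pages\b/', $body);
        $hasParent   = preg_match('/\/Parent\b/', $body);

        if ($isPagesNode && !$hasParent
            && preg_match('/\/Count\s+(\d+)/', $body, $count)) {
            return (int) $count[1];
        }
    }

    return null;
}
```

Typical usage would be pdf_page_count(file_get_contents("file.pdf")), with a null result treated as "could not determine".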
mattb at bluewebstudios dot com
05-Feb-2005 05:44
I recently tested Donatas' code below for the extraction of text from PDF files. After running into a few problems where PDF files were not being read at all, I've modified it somewhat. It still isn't perfect, but should work great for searching. Thanks Donatas.
<?php
$test = pdf2string("<pathtoPDFfile>");
echo "$test";
# Returns a -1 if uncompression failed
function pdf2string($sourcefile)
{
$fp = fopen($sourcefile, 'rb');
$content = fread($fp, filesize($sourcefile));
fclose($fp);
# Locate all text hidden within the stream and endstream tags
$searchstart = 'stream';
$searchend = 'endstream';
$pdfdocument = "";
$pos = 0;
$pos2 = 0;
$startpos = 0;
# Iterate through each stream block
while( $pos !== false && $pos2 !== false )
{
# Grab beginning and end tag locations if they have not yet been parsed
$pos = strpos($content, $searchstart, $startpos);
$pos2 = strpos($content, $searchend, $startpos + 1);
if( $pos !== false && $pos2 !== false )
{
# Extract compressed text from between stream tags and uncompress
$textsection = substr($content, $pos + strlen($searchstart) + 2, $pos2 - $pos - strlen($searchstart) - 1);
$data = @gzuncompress($textsection);
# Clean up text via a special function
$data = ExtractText($data);
# Increase our PDF pointer past the section we just read
$startpos = $pos2 + strlen($searchend) - 1;
if( $data === false ) { return -1; }
$pdfdocument = $pdfdocument . $data;
}
}
return $pdfdocument;
}
function ExtractText($postScriptData)
{
while( (($textStart = strpos($postScriptData, '(', $textStart)) && ($textEnd = strpos($postScriptData, ')', $textStart + 1)) && substr($postScriptData, $textEnd - 1) != '\\') )
{
$plainText .= substr($postScriptData, $textStart + 1, $textEnd - $textStart - 1);
if( substr($postScriptData, $textEnd + 1, 1) == ']' ) // This adds quite some additional spaces between the words
{
$plainText .= ' ';
}
$textStart = $textStart < $textEnd ? $textEnd : $textStart + 1;
}
return stripslashes($plainText);
}
?>
ken at thesmallbox.com
30-Oct-2004 11:13
Please note that these functions have been removed from PHP 5. They are still available through the pdflib PECL module.
14-Aug-2004 02:58
for people who are using PDF_FINDFONT there is a catch..
--------------------------------------------------------
int PDF_findfont(PDF *p, const char *fontname, const char *encoding, int embed)
Deprecated, use PDF_load_font( ).
----
use PDF_load_font instead....
arjen at queek dot nl
15-Jul-2004 10:50
If you prefer an OO approach to the PDF functions, you can use this snippet of code (PHP 5 only, and it does add some overhead). It's just a starting point; extend/improve as you wish...
You can call all pdf_* functions on your object by stripping pdf_ from the function name. Plus, you don't have to pass the PDF resource as the first argument.
For example:
<?php
pdf_show($pdf, $text); // Where $pdf is your pdf-resource
?>
Can become:
<?php
$pdf->show($text); // Where $pdf is your PDF-object
?>
Code:
<?php
class PDF {
private $pdf;
/* public Void __construct(): Constructor */
public function __construct() {
$this->pdf = pdf_new();
}
/* public Mixed __call(): Re-route all function calls to the PHP-functions */
public function __call($function, $arguments) {
// Prepend the pdf resource to the arguments array
array_unshift($arguments, $this->pdf);
// Call the PHP function
return call_user_func_array('pdf_' . $function, $arguments);
}
}
?>
michi (Alt+Q) marel.at
01-Jul-2004 10:10
<?PHP
/* A little helpful function to calculate millimeters to points */
function calcToPt($intMillimeter) {
$intPoints = ($intMillimeter*72)/25.4;
$intPoints = round($intPoints);
return $intPoints;
}
/* For example: Create DIN A4 210x297 mm */
pdf_begin_page( $pdf, calcToPt(210), calcToPt(297)); // 595x842 pt
?>
donatas at spurgius dot com
23-Jun-2004 03:56
I've been looking for a way to extract plain text from PDF documents (needed to search for text inside 'em). Not being able to find one I wrote the needed functions myself. here you go folks.
<?php
function pdf2string ($sourceFile)
{
$textArray = array ();
$objStart = 0;
$fp = fopen ($sourceFile, 'rb');
$content = fread ($fp, filesize ($sourceFile));
fclose ($fp);
$searchTagStart = chr(13).chr(10).'stream';
$searchTagStartLenght = strlen ($searchTagStart);
while ((($objStart = strpos ($content, $searchTagStart, $objStart)) && ($objEnd = strpos ($content, 'endstream', $objStart+1))))
{
$data = substr ($content, $objStart + $searchTagStartLenght + 2, $objEnd - ($objStart + $searchTagStartLenght) - 2);
$data = @gzuncompress ($data);
if ($data !== FALSE && strpos ($data, 'BT') !== FALSE && strpos ($data, 'ET') !== FALSE)
{
$textArray [] = ExtractText ($data);
}
$objStart = $objStart < $objEnd ? $objEnd : $objStart + 1;
}
return $textArray;
}
function ExtractText ($postScriptData)
{
while ((($textStart = strpos ($postScriptData, '(', $textStart)) && ($textEnd = strpos ($postScriptData, ')', $textStart + 1)) && substr ($postScriptData, $textEnd - 1) != '\\'))
{
$plainText .= substr ($postScriptData, $textStart + 1, $textEnd - $textStart - 1);
if (substr ($postScriptData, $textEnd + 1, 1) == ']') //this adds quite some additional spaces between the words
{
$plainText .= ' ';
}
$textStart = $textStart < $textEnd ? $textEnd : $textStart + 1;
}
return stripslashes ($plainText);
}
?>
uwe at steinmann dot cx
13-May-2004 09:25
Those looking for a free replacement of pdflib may consider
pslib at http://pslib.sourceforge.net which produces PostScript but it can be easily turned into PDF by Acrobat Distiller or ghostscript. The API is very similar and even hypertext functions are supported. There
is also a php extension for pslib in PECL, called ps.
james at lanpad dot org
19-Apr-2004 11:36
PDFlib has a free replacement that is also much easier to work with (no more working with coordinates from the bottom left-hand corner!):
http://www.fpdf.org
It's also free for commercial use, and is very usable, unlike the PDFlib extensions.
matic at koncan dot net
12-Jan-2004 10:22
The solution for IE (refresh):
...
$buf = PDF_get_buffer($p);
$len = strlen($buf);
header("Cache-Control: no-store");
header("Cache-Control: no-cache");
header("Cache-Control: must-revalidate");
header("Content-type: application/pdf");
header("Content-Length: $len");
header("Content-Disposition: inline; filename=file.pdf");
print $buf;
PDF_delete($p);
SenorTZ senortz at nospam dot yahoo dot com
28-Jul-2003 09:23
About creating a PDF document based on the content of another document (let's say a text file):
I tried to pass the name of the file whose content I want to read to the PDF-creator page through a link on the sender page, so that the creator page generates a PDF document containing that content. When I referred to the PDF-creator page via the link your_root/create_pdf.php?filename=$your_file_name, the page did not behave well if it had a line like $filename = $_GET["filename"] before creating the PDF document.
I solved this by using a form with a button on the sender page instead of the link. The form has "create_pdf.php" as its action, "post" as its method, and a hidden field containing the "filename" value. It works this way if the PDF-creator page has a line like $filename = $_POST["filename"].
I would like to understand why it works this way and not the other way.
I hope this helps. Here are the pieces of code I used.
Sender page:
print("<form name='to_pdf' action='see_pdf_file.php' method='post'>");
print("<br/><input type='submit' value='PDF'><input type='hidden' name='filename' value='$filename'></form>");
PDF-creator page:
<?php
// NOTE: $filename comes straight from the request; validate it before use.
$filename = $_POST["filename"];
// file_get_contents() opens and closes the file itself, so no fopen()/fclose() is needed.
$file_content = file_get_contents($filename);
//
// Wrap at 72 characters; using "\n" as the break string also keeps the file's own line breaks.
$file_content = wordwrap($file_content, 72, "\n");
$a_row = explode("\n", $file_content);
//
$pdf = pdf_new();
pdf_open_file($pdf, "");
pdf_begin_page($pdf, 595, 842);
pdf_set_font($pdf, "Times-Roman", 16, "host");
pdf_add_outline($pdf, "Page 1");
pdf_set_value($pdf, "textrendering", 1);
pdf_show_xy($pdf, 'The content of the file:', 50, 700);
// foreach visits every line and does not stop at the first empty one.
foreach ($a_row as $row)
{
pdf_continue_text($pdf, $row);
}
pdf_end_page($pdf);
pdf_close($pdf);
//
$data = pdf_get_buffer($pdf);
//
header("Content-type: application/pdf");
header("Content-disposition: inline; filename=test.pdf");
header("Content-length: " . strlen($data));
//
echo $data;
?>
PDFlib and PHP 4.3.1 used.
Thanks.
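The line-splitting step can be isolated and tried without PDFlib. A minimal sketch, assuming a hypothetical helper name text_to_pdf_lines() and a width of 72 characters as in the code above:

```php
<?php
// Hypothetical helper: splits text into lines of at most $width characters.
function text_to_pdf_lines($text, $width = 72)
{
    // Normalise Windows and old Mac line endings first.
    $text = str_replace(array("\r\n", "\r"), "\n", $text);
    // The fourth argument (true) also cuts words longer than $width.
    $wrapped = wordwrap($text, $width, "\n", true);
    return explode("\n", $wrapped);
}

// Existing line breaks in the input are preserved as separate entries.
$lines = text_to_pdf_lines("first line\nsecond line", 72);
```

Each entry of the returned array can then be passed to pdf_continue_text() in a foreach loop.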
bmironov at jonview dot com
25-Jun-2003 06:46
RedHat 9 + Apache 2.0 + PHP 4.3.2 + Oracle 9i + PDFlib 5.0.1 (binary distribution)
It seems to be a working bundle if you do some magic with ./configure:
RedHat 9:
kernel-2.4.20-18.9
Apache 2.0.46:
./configure --enable-so --enable-rewrite=shared --enable-status --enable-mpm=prefork
PHP 4.3.2 (--without-tsrm-pthreads is the important option!):
./configure \
--program-prefix= \
--prefix=/usr \
--exec-prefix=/usr \
--bindir=/usr/bin \
--sbindir=/usr/sbin \
--sysconfdir=/etc \
--datadir=/usr/share \
--includedir=/usr/include \
--libdir=/usr/lib \
--libexecdir=/usr/libexec \
--localstatedir=/var \
--sharedstatedir=/usr/com \
--mandir=/usr/share/man \
--infodir=/usr/share/info \
--with-config-file-path=/etc \
--with-config-file-scan-dir=/etc/php.d \
--without-tsrm-pthreads \
--with-zlib \
--with-gd \
--enable-gd-native-ttf \
--with-ttf \
--without-mysql \
--with-apxs2filter=/usr/local/apache2/bin/apxs \
--with-oci8 \
--enable-sigchild \
--enable-inline-optimization
Oracle9i:
ln -s $ORACLE_HOME/rdbms/public/nzerror.h $ORACLE_HOME/rdbms/demo/nzerror.h
ln -s $ORACLE_HOME/rdbms/public/nzt.h $ORACLE_HOME/rdbms/demo/nzt.h
ln -s $ORACLE_HOME/rdbms/public/ociextp.h $ORACLE_HOME/rdbms/demo/ociextp.h
If you want to use bundled GD-library then:
1) install following packages: libjpeg, libjpeg-devel, libpng, libpng-devel, freetype, freetype-devel, libtiff, libtiff-devel, zlib, zlib-devel
2) ln -s /usr/lib/libjpeg.so.62 /usr/lib/libjpeg.so
ln -s /usr/lib/libpng.so.62 /usr/lib/libpng.so
It seems to be a working combination, because it does NOT give you:
1) error message in Apache's error_log:
Module compiled with module API=20020429, debug=0, thread-safety=0
PHP compiled with module API=20020429, debug=0, thread-safety=1
2) error message in Apache's error_log:
[notice] child pid 12345 exit signal Segmentation fault (11)
3) crashes in MS Internet Explorer: it can show the PDF output from your PHP script via the Acrobat plug-in, with no confusing messages about opening "Adobe Acrobat Control for ActiveX".
Hope it will save you some time.
Good luck,
Boris
matt at nospam dot org
30-Aug-2002 02:11
Adding to my prior note: IE 6 has a strange habit of using GET when refreshing a PDF document, even though the page was originally requested with POST. This may be the root cause of all the trouble listed above regarding POST and PDF.
So, I recommend:
1) using a two-page form/action handler for PDF rendering instead of the standard $PHP_SELF form/self handler, to resolve the problem discussed above;
2) using either GET, or a self-posting form that sets cookies and then redirects to the PDF creation page, instead of POST, so that the parameters reach the page. HTH
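The redirect in option 2 could look roughly like this. It is only a sketch: pdf_page_url() is a hypothetical helper, create_pdf.php and the parameter names are placeholders, and http_build_query() requires PHP 5 or later.

```php
<?php
// Hypothetical helper: builds a GET URL for the PDF-rendering page, so a
// browser refresh re-sends the same parameters.
function pdf_page_url($page, $params)
{
    // The separator is passed explicitly so the ini setting cannot change it.
    return $page . '?' . http_build_query($params, '', '&');
}

$url = pdf_page_url('create_pdf.php', array('filename' => 'report.txt', 'page' => 2));

// In the form handler, redirect instead of rendering the PDF directly:
// header('Location: ' . $url);
// exit;
```

Because every parameter travels in the URL, a refresh that switches from POST to GET still reaches the PDF page with the same values.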
gilbertng at hongkong dot com
11-Jun-2002 06:23
Hope it can help someone:
$pdf = pdf_new();
//pdf_open_file($pdf,"");
if (!pdf_open_file($pdf, "")) {
print "error";
exit;
}
PDF_set_parameter($pdf, "resourcefile", "/usr/local/pdflib/fonts/pdflib.upr");
PDF_set_parameter($pdf,"prefix","/usr/local/pdflib/fonts");
pdf_begin_page($pdf, 595, 842);
pdf_add_outline($pdf, "Page 1");
//pdf_set_font($pdf, "Times-Roman", 30, "host");
// set chinese characters,
$font = pdf_findfont($pdf, "MHei-Medium", "B5pc-H",0);
if ($font) {
pdf_setfont($pdf, $font, 30);
}
pdf_set_value($pdf, "textrendering",0);
pdf_show_xy($pdf, " 100 Roman outlined", 50, 750);
pdf_set_font($pdf, "Times-Roman", 30, "host");
pdf_show_xy($pdf, " Times Roman outlined", 50, 600);
pdf_moveto($pdf, 50, 740);
pdf_lineto($pdf, 330, 740);
pdf_stroke($pdf);
pdf_end_page($pdf);
pdf_close($pdf);
$buf = pdf_get_buffer($pdf);
$len = strlen($buf);
header("Content-type: application/pdf");
header("Content-Length: $len");
header("Content-Disposition: inline; filename=foo.pdf");
print $buf;
pdf_delete($pdf);
chernyshevsky at hotmail dot com
06-May-2002 06:22
If you're wondering how to highlight words inside a PDF file, take a look at this script I've written (doesn't need PDFLib)
http://zeus.jtlnet.com/~conradis/pdfhi.php.txt
It's a whole lot harder than you think. (Rarely has so much code been written that does so little, that's what I say :-) Worth looking at if you want to do searches inside a PDF.
pbierans at lynet dot de
28-Mar-2002 01:56
Load extension, open a PDF, add a font, modify PDF in memory and send
it to browser:
<?php
// no cache headers:
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");
header("Cache-Control: no-store, no-cache, must-revalidate");
header("Cache-Control: post-check=0, pre-check=0", false);
header("Pragma: no-cache");
$ext_name="libpdf_php.so";
// libpdf_php.so is the PDFLIB for SunOS by "PDFlib GmbH"
// visit http://www.pdflib.com
// if the extension is not automatically loaded by Apache
// dl() will try to load it on demand:
// extension_loaded() expects the extension name (assumed here to be "pdf"), not the file name:
if (!extension_loaded("pdf") && !@dl($ext_name))
{
?>
<table width="100%" border="0"><tr><td align="center">
<table style="border: solid #f0f0f0 2px;"><tr>
<td valign="middle" style="padding: 20px; margin: 0px;">
<p style="font-family: arial; font-size: 12px; ">
<b>Sorry,</b><br>
<br>
A PDF can not be generated right now.<br>
The administrator has been informed and will fix this as
soon as possible.<br>
Please try again later.
</p>
</td></tr></table>
</td></tr></table>
<?php
mail('admin@domain.com','Error: PDFLib not found',
"Called by script:\n ".$_SERVER['SCRIPT_FILENAME'].'?'.$_SERVER['QUERY_STRING'],
"From: warnings@domain.com\n");
exit;
} // verify that extension is usable
// unique serial number:
srand(microtime()*10000);
$usnr= gmdate("Ymd-His-").rand(1000,9999).'-';
$pdf_file=$usnr.'result.pdf';
$src_file='source.pdf';
// create pdf object
$pdf = pdf_new();
pdf_open_file($pdf);
pdf_set_parameter($pdf, 'serial', 'if-you-have-one');
// fonts to embed, they are in the folder of this file:
pdf_set_parameter($pdf, 'FontAFM', 'TradeGothic=Tg______.afm');
pdf_set_parameter($pdf, 'FontOutline', 'TradeGothic=Tg______.pfb');
pdf_set_parameter($pdf, 'FontPFM', 'TradeGothic=Tg______.pfm');
// load the source file:
$src_doc =pdf_open_pdi($pdf,$src_file,'', 0);
$src_page =pdf_open_pdi_page($pdf,$src_doc,1,'');
$src_width =pdf_get_pdi_value($pdf,'width' ,$src_doc,$src_page,0);
$src_height=pdf_get_pdi_value($pdf,'height',$src_doc,$src_page,0);
pdf_begin_page($pdf, $src_width, $src_height);
{
// place the sourcefile to the background of the actual page:
pdf_place_pdi_page($pdf,$src_page,0,0,1,1);
pdf_close_pdi_page($pdf,$src_page);
// modify the page:
pdf_set_font($pdf, 'TradeGothic', 8, 'host');
pdf_show_xy($pdf, 'Now: '.gmdate("Y-m-d H:i:s"),50,50);
}
pdf_end_page($pdf);
pdf_close($pdf);
// prepare output:
$pdfdata = pdf_get_buffer($pdf); // to echo the pdf-data
$pdfsize = strlen($pdfdata); // IE requires the datasize
// real datatype headers:
header('Content-type: application/pdf');
header('Content-disposition: attachment; filename="'.$pdf_file.'"');
header('Content-length: '.$pdfsize);
echo $pdfdata;
exit; // keep this one so no #13#10 or #32 will be written
?>
a dot marchand dot nospam at home dot com
02-May-2001 03:42
To continue on the Internet Explorer (Iexplorer, IE) requirements: instead of Content-Length, a simple
header("Accept-Ranges: bytes");
is enough for the getpdf.php file to work correctly. Even Netscape works without error with this modification.
Aurelien