    htmlentities

    (PHP 4, PHP 5, PHP 7)

    htmlentities – Convert all applicable characters to HTML entities

    Description

    string htmlentities
    ( string $string
    [, int $flags = ENT_COMPAT | ENT_HTML401
    [, string $encoding = ini_get("default_charset")
    [, bool $double_encode = TRUE
    ]]] )

    This function is identical to htmlspecialchars() in all
    ways, except with htmlentities(), all characters which
    have HTML character entity equivalents are translated into these entities.

    If you want to decode instead (the reverse) you can use
    html_entity_decode().
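As a quick illustration of the difference described above, the following sketch encodes the same string with both functions. The sample string and the explicit 'UTF-8' argument are choices made for this example only.

```php
<?php
// Sample input (made up for this example): markup plus two characters
// that have named HTML entities.
$str = '<p>Café © 2019</p>';

// htmlspecialchars() only touches &, <, > and, with ENT_COMPAT, the double quote.
$special = htmlspecialchars($str, ENT_COMPAT, 'UTF-8');
echo $special, "\n";
// &lt;p&gt;Café © 2019&lt;/p&gt;

// htmlentities() also converts every character that has a named entity.
$entities = htmlentities($str, ENT_COMPAT, 'UTF-8');
echo $entities, "\n";
// &lt;p&gt;Caf&eacute; &copy; 2019&lt;/p&gt;
```

When the page itself is served as UTF-8, the htmlspecialchars() result is usually sufficient, since the accented characters can be sent as they are.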

    Parameters

    string

    The input string.

    flags

    A bitmask of one or more of the following flags, which specify how to handle quotes,
    invalid code unit sequences and the used document type. The default is
    ENT_COMPAT | ENT_HTML401.

    Available flags constants

    Constant Name     Description
    ENT_COMPAT        Will convert double-quotes and leave single-quotes alone.
    ENT_QUOTES        Will convert both double and single quotes.
    ENT_NOQUOTES      Will leave both double and single quotes unconverted.
    ENT_IGNORE        Silently discard invalid code unit sequences instead of returning
                      an empty string. Using this flag is discouraged as it
                      may have security implications.
    ENT_SUBSTITUTE    Replace invalid code unit sequences with a Unicode Replacement Character
                      U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
    ENT_DISALLOWED    Replace invalid code points for the given document type with a
                      Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD;
                      (otherwise) instead of leaving them as is. This may be useful, for
                      instance, to ensure the well-formedness of XML documents with
                      embedded external content.
    ENT_HTML401       Handle code as HTML 4.01.
    ENT_XML1          Handle code as XML 1.
    ENT_XHTML         Handle code as XHTML.
    ENT_HTML5         Handle code as HTML 5.

    encoding

    An optional argument defining the encoding used when converting characters.

    If omitted, the default value of the encoding varies
    depending on the PHP version in use. In PHP 5.6 and later, the
    default_charset configuration
    option is used as the default value. PHP 5.4 and 5.5 will use
    UTF-8 as the default. Earlier versions of PHP use
    ISO-8859-1.

    Although this argument is technically optional, you are highly encouraged to
    specify the correct value for your code if you are using PHP 5.5 or earlier,
    or if your default_charset
    configuration option may be set incorrectly for the given input.

    The following character sets are supported:

    Supported charsets

    Charset       Aliases                        Description
    ISO-8859-1    ISO8859-1                      Western European, Latin-1.
    ISO-8859-5    ISO8859-5                      Little used Cyrillic charset (Latin/Cyrillic).
    ISO-8859-15   ISO8859-15                     Western European, Latin-9. Adds the Euro sign, French and
                                                 Finnish letters missing in Latin-1 (ISO-8859-1).
    UTF-8                                        ASCII compatible multi-byte 8-bit Unicode.
    cp866         ibm866, 866                    DOS-specific Cyrillic charset.
    cp1251        Windows-1251, win-1251, 1251   Windows-specific Cyrillic charset.
    cp1252        Windows-1252, 1252             Windows specific charset for Western European.
    KOI8-R        koi8-ru, koi8r                 Russian.
    BIG5          950                            Traditional Chinese, mainly used in Taiwan.
    GB2312        936                            Simplified Chinese, national standard character set.
    BIG5-HKSCS                                   Big5 with Hong Kong extensions, Traditional Chinese.
    Shift_JIS     SJIS, SJIS-win, cp932, 932     Japanese
    EUC-JP        EUCJP, eucJP-win               Japanese
    MacRoman                                     Charset that was used by Mac OS.
    ''                                           An empty string activates detection from script encoding
                                                 (Zend multibyte), default_charset and current locale
                                                 (see nl_langinfo() and setlocale()), in this order.
                                                 Not recommended.

    Note:

    Any other character sets are not recognized. The default encoding will be
    used instead and a warning will be emitted.

    double_encode

    When double_encode is turned off, PHP will not
    encode existing HTML entities. The default is to convert everything.
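The effect of double_encode is easiest to see on a string that already contains an entity. This is a minimal sketch; the sample string is invented for the illustration.

```php
<?php
// Sample input: one existing entity (&amp;) and one bare ampersand.
$str = 'Fish &amp; chips & salt';

// Default (double_encode = true): the existing &amp; is encoded a second time.
$twice = htmlentities($str, ENT_QUOTES, 'UTF-8');
echo $twice, "\n";
// Fish &amp;amp; chips &amp; salt

// double_encode = false: the existing entity is left alone,
// while the bare & is still encoded.
$once = htmlentities($str, ENT_QUOTES, 'UTF-8', false);
echo $once, "\n";
// Fish &amp; chips &amp; salt
```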

    Return Values

    Returns the encoded string.

    If the input string contains an invalid code unit
    sequence within the given encoding an empty string
    will be returned, unless either the ENT_IGNORE or
    ENT_SUBSTITUTE flags are set.

    Changelog

    Version   Description
    5.6.0     The default value for the encoding parameter was changed to be the value
              of the default_charset configuration option.
    5.4.0     The default value for the encoding parameter was changed to UTF-8.
    5.4.0     The constants ENT_SUBSTITUTE, ENT_DISALLOWED, ENT_HTML401, ENT_XML1,
              ENT_XHTML and ENT_HTML5 were added.
    5.3.0     The constant ENT_IGNORE was added.
    5.2.3     The double_encode parameter was added.

    Examples

    Example #1 A htmlentities() example


    <?php
    $str = "A 'quote' is <b>bold</b>";

    // Outputs: A 'quote' is &lt;b&gt;bold&lt;/b&gt;
    echo htmlentities($str);

    // Outputs: A &#039;quote&#039; is &lt;b&gt;bold&lt;/b&gt;
    echo htmlentities($str, ENT_QUOTES);
    ?>

    Example #2 Usage of ENT_IGNORE


    <?php
    $str = "\x8F!!!";

    // Outputs an empty string
    echo htmlentities($str, ENT_QUOTES, "UTF-8");

    // Outputs "!!!"
    echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
    ?>
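Since ENT_IGNORE is discouraged, the same input can also be handled with ENT_SUBSTITUTE, which the flags table describes as replacing the invalid sequence with U+FFFD. A minimal sketch, reusing the input from Example #2:

```php
<?php
// Same input as Example #2: \x8F is not a valid UTF-8 sequence.
$str = "\x8F!!!";

// ENT_SUBSTITUTE keeps the rest of the string and puts U+FFFD
// (bytes EF BF BD in UTF-8) where the invalid byte was.
$out = htmlentities($str, ENT_QUOTES | ENT_SUBSTITUTE, "UTF-8");

var_dump($out === "\xEF\xBF\xBD!!!"); // bool(true)
```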

    See Also

    • html_entity_decode() – Convert HTML entities to their corresponding characters
    • get_html_translation_table() – Returns the translation table used by htmlspecialchars and htmlentities
    • htmlspecialchars() – Convert special characters to HTML entities
    • nl2br() – Inserts HTML line breaks before all newlines in a string
    • urlencode() – URL-encodes string

    User Contributed Notes 43 notes


    Sijmen Ruwhof

    8 years ago

    An important note below about using this function to secure your application against Cross Site Scripting (XSS) vulnerabilities.

    When printing user input in an attribute of an HTML tag, the default configuration of htmlentities() doesn't protect you against XSS if single quotes delimit the attribute value. XSS is then possible by injecting a single quote:

    <?php
    $_GET['a'] = "#000' onload='alert(document.cookie)";
    ?>

    XSS possible (insecure):

    <?php
    $href = htmlentities($_GET['a']);
    print "<body bgcolor='$href'>"; # results in: <body bgcolor='#000' onload='alert(document.cookie)'>
    ?>

    Use the ENT_QUOTES flag so that single quotes are encoded as well and this injection is no longer possible:

    <?php
    $href = htmlentities($_GET['a'], ENT_QUOTES);
    print "<body bgcolor='$href'>"; # results in: <body bgcolor='#000&#039; onload=&#039;alert(document.cookie)'>
    ?>

    ENT_QUOTES doesn't protect you against JavaScript evaluation in certain attributes, such as the 'href' attribute of the 'a' tag. When the link below is clicked, the given JavaScript is executed:

    <?php
    $_GET['a'] = 'javascript:alert(document.cookie)';
    $href = htmlentities($_GET['a'], ENT_QUOTES);
    print "<a href='$href'>link</a>"; # results in: <a href='javascript:alert(document.cookie)'>link</a>
    ?>
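One way to close the remaining href gap is to check the URL scheme against an allow-list before encoding. The sketch below is only an illustration: safeHref() is a hypothetical helper, and limiting links to http and https (and rejecting relative URLs) is an assumption that may be too strict for some applications.

```php
<?php
// Hypothetical helper (not part of PHP): returns an attribute-safe URL,
// or '#' when the scheme is missing or not on the allow-list.
function safeHref(string $url): string
{
    // parse_url() returns null for the scheme of a relative URL.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // Assumption: only absolute http/https links are acceptable.
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return '#';
    }

    // Encoding is still needed so that quotes cannot break out of the attribute.
    return htmlentities($url, ENT_QUOTES, 'UTF-8');
}

echo "<a href='" . safeHref('javascript:alert(document.cookie)') . "'>link</a>\n";
// <a href='#'>link</a>

echo "<a href='" . safeHref('https://example.com/?a=1&b=2') . "'>link</a>\n";
// <a href='https://example.com/?a=1&amp;b=2'>link</a>
```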


    q (dot) rendeiro (at) gmail (dot) com

    11 years ago

    I've seen lots of functions to convert all the entities, but I needed to do a fulltext search in a db field that had named entities instead of numeric entities (edited by TinyMCE), so I searched the TinyMCE source and found a string with the value->entity mapping. So, I wrote the following function to encode the user's query with named entities.

    The string I used is different from the original, because I didn't want to convert ' or ". The string is too long, so I had to cut it. To get the original, check the TinyMCE source and search for nbsp or another entity ;)

    <?php

    $entities_unmatched = explode(',', '160,nbsp,161,iexcl,162,cent, [...] ');
    $even = 1;
    foreach ($entities_unmatched as $c) {
        if ($even) {
            $ord = $c;
        } else {
            $entities_table[$ord] = $c;
        }
        $even = 1 - $even;
    }

    function encode_named_entities($str) {
        global $entities_table;

        $encoded_str = '';
        for ($i = 0; $i < strlen($str); $i++) {
            $ent = @$entities_table[ord($str[$i])];
            if ($ent) {
                $encoded_str .= "&$ent;";
            } else {
                $encoded_str .= $str[$i];
            }
        }
        return $encoded_str;
    }

    ?>


    n at erui dot eu

    6 years ago

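    htmlentities() does not encode all Unicode characters. It encodes what it can [all of Latin-1], and the others slip through. &#1033; is the nasty one I use. I searched for a function that encodes everything, but in the end I wrote this. This is as simple as I can get it. Consult an ASCII table to include or omit the characters you want. I'm sure it's not that fast.

    // Unicode-proof htmlentities.
    // Returns 'normal' chars as chars and weirdos as numeric html entities.
    function superentities( $str ) {
        // get rid of existing entities else double-escape
        $str = html_entity_decode(stripslashes($str), ENT_QUOTES, 'UTF-8');
        $ar = preg_split('/(?<!^)(?!$)/u', $str);  // return array of every multi-byte character
        $str2 = '';
        foreach ($ar as $c) {
            // The loop body is missing from the source. Assumed here: printable ASCII other
            // than quotes, & and angle brackets stays as is, the rest becomes a numeric entity.
            $o = ord($c);
            $plain = strlen($c) === 1 && $o >= 32 && $o <= 126 && !in_array($c, array('"', '&', "'", '<', '>'), true);
            $str2 .= $plain ? $c : mb_encode_numericentity($c, array(0x0, 0x10ffff, 0, 0x1fffff), 'UTF-8');
        }
        return $str2;
    }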


    hajo-p

    4 years ago

    With the flag ENT_HTML5, htmlentities() also replaces newline characters such as \n (they come back as the named entity &NewLine;), while htmlspecialchars() is not affected by that.

    If you want to use nl2br() on that string afterwards, you might end up hunting for the problem like I did. This does not apply to other flags such as ENT_XHTML, which confused me.

    Tested this with PHP 5.4 / 5.5 / 5.6-dev with the same results, so it seems that this is an intended "feature".
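A small sketch of the situation described above, assuming the text is meant to go through nl2br() afterwards: encoding with htmlspecialchars(), or with htmlentities() and a non-HTML5 document type, leaves the newline in place. The sample string is made up for this illustration.

```php
<?php
// Sample input with a line break.
$text = "first line\nsecond line";

// The problematic combination from the note: with ENT_HTML5, htmlentities()
// encodes the newline itself, so nl2br() has nothing left to convert.
$html5 = htmlentities($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Two alternatives that keep the "\n" in the result.
$special = htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$html401 = htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo nl2br($special), "\n";
echo nl2br($html401), "\n";
```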


    Waygood

    7 years ago

    When putting values inside comment tags <!-- --> you should replace -- with &#45;&#45; too, as this would end your tag and show the rest of the comment.


    phil at lavin dot me dot uk

    8 years ago

    The following will make a string completely safe for XML:

    <?php
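    function philsXMLClean($strin) {
        $strout = null;

        for ($i = 0; $i < strlen($strin); $i++) {
            // The start of this test is missing from the source; only "($ord >= 127)" survives.
            // Assumed here: control characters (1-31) are handled like bytes >= 127.
            $ord = ord($strin[$i]);

            if (($ord > 0 && $ord < 32) || ($ord >= 127)) {
                $strout .= "&amp;#$ord;";
            } else {
                switch ($strin[$i]) {
                    case '<':
                        $strout .= '&lt;';
                        break;
                    case '>':
                        $strout .= '&gt;';
                        break;
                    case '&':
                        $strout .= '&amp;';
                        break;
                    case '"':
                        $strout .= '&quot;';
                        break;
                    default:
                        $strout .= $strin[$i];
                }
            }
        }

        return $strout;
    }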
    ?>


    realcj at g mail dt com

    12 years ago

    If you are building a loadvars page for Flash and have problems with special chars such as " & ", " ' " etc, you should escape them for Flash:

    Try trace(escape("&")); in Flash's ActionScript to see the escape code for &:

    % = %25
    & = %26
    ' = %27

    <?php
    function flashentities($string) {
        return str_replace(array("&", "'"), array("%26", "%27"), $string);
    }
    ?>

    Those are the two that concerned me. YMMV.


    wd at NOSPAMwd dot it

    6 years ago

    Hi there,

    after many tests, I figured out that:

    - htmlentities() removes characters like "à", "è", etc. when you specify a flag and a charset

    - htmlentities() DOES NOT remove characters like those above when you DO NOT specify anything

    So, let's assume that..

    <?php

    $str = "Hèèèllooo";

    $res_1 = htmlentities($str, ENT_QUOTES, "UTF-8");
    $res_2 = htmlentities($str);

    var_dump($res_1); // Result: string '' (length=0)
    var_dump($res_2); // string 'H&egrave;&egrave;&egrave;llooo' (length=30)

    ?>

    I used this for the content of a comments textarea. Note that the "$res_2" form leaves single/double quotes unconverted. At this point you should use str_replace() to convert those characters, but be careful about the order because..

    <?php

    $str = "'Hèèèllooo'";

    $res_2 = str_replace("'", "&#039;", $str);
    $res_2 = htmlentities($res_2);
    var_dump($res_2); // string '&amp;#039;H&egrave;&egrave;&egrave;llooo&amp;#039;'

    $res_3 = htmlentities($str);
    $res_3 = str_replace("'", "&#039;", $res_3);
    var_dump($res_3); // string '&#039;H&egrave;&egrave;&egrave;llooo&#039;' --> Nice
    ?>

    Hope it helps.

    Regards,
    W.D.
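The empty result for $res_1 above is consistent with the rule in the Return Values section: an empty string comes back when the input is not valid in the given encoding, for instance when the script file is saved as ISO-8859-1 but 'UTF-8' is passed. The sketch below assumes that this is the cause and that the mbstring extension is available; the byte string is written out explicitly so the example does not depend on how the file is saved.

```php
<?php
// "Hèèèllooo" as ISO-8859-1 bytes (è = \xE8); these bytes are not valid UTF-8.
$latin1 = "H\xE8\xE8\xE8llooo";

var_dump(mb_check_encoding($latin1, 'UTF-8'));      // bool(false)
var_dump(htmlentities($latin1, ENT_QUOTES, 'UTF-8')); // string(0) ""

// Convert to UTF-8 first (or pass the real encoding), then encode.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo htmlentities($utf8, ENT_QUOTES, 'UTF-8'), "\n";
// H&egrave;&egrave;&egrave;llooo
```

With the encoding matched to the actual bytes, ENT_QUOTES can be kept and the separate str_replace() step for quotes is no longer needed.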


    robin at robinwinslow dot co dot uk

    7 years ago

    htmlentities seems to have changed at some point between version 5.1.6 and 5.3.3, such that it now returns an empty string for anything containing a pound sign:

    $ php -v
    PHP 5.1.6 (cli) (built: May 22 2008 09:08:44)
    $ php -r "echo htmlentities('£hello', null, 'utf-8');"
    &pound;hello
    $

    $ php -v
    PHP 5.3.3 (cli) (built: Aug 19 2010 12:07:49)
    $ php -r "echo htmlentities('£hello', null, 'utf-8');"
    $

    (Returns an empty string the second time)

    Just a heads up.


    ustimenko dot alexander at gmail dot com

    6 years ago

    For those Spanish (and not only) folks, that want their national letters back after htmlentities :)

    <?php
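    // Originally a protected method; shown here as a plain function.
    function _decodeAccented($encodedValue, $options = array())
    {
        $options += array(
            'quote'     => ENT_NOQUOTES,
            'encoding'  => 'UTF-8',
        );
        // The source is cut off after "'/&\w(acute". The remaining alternatives and the
        // callback are assumptions that follow the stated goal (decode accented letters).
        return preg_replace_callback(
            '/&\w(acute|grave|uml|tilde|circ);/',
            function ($m) use ($options) {
                return html_entity_decode($m[0], $options['quote'], $options['encoding']);
            },
            $encodedValue
        );
    }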
    ?>


    h_guillaume at hotmail dot com

    8 years ago

    I use this function to encode all the xml entities and also all the &something; that are not defined in xml like &trade;
    You can also decode what you encode with my decode function.
    My function works a little like the htmlentities.
    You can also add other string to the array if you want to exclude them from the encoding.

    <?php
    function xml_entity_decode($text, $charset = 'Windows-1252') {
        // Double decode, so if the value was &amp;trade; it will become Trademark
        $text = html_entity_decode($text, ENT_COMPAT, $charset);
        $text = html_entity_decode($text, ENT_COMPAT, $charset);
        return $text;
    }

    function xml_entities($text, $charset = 'Windows-1252') {
        // Debug and Test
        // $text = "test &amp; &trade; &amp;trade; abc &reg; &amp;reg; &#45;";

        // First we encode html characters that are also invalid in xml
        $text = htmlentities($text, ENT_COMPAT, $charset, false);

        // XML character entity array from Wiki
        // Note: &apos; is useless in UTF-8 or in UTF-16
        $arr_xml_special_char = array("&quot;", "&amp;", "&apos;", "&lt;", "&gt;");

        // Building the regex string to exclude all strings with xml special char
        $arr_xml_special_char_regex = "(?";
        foreach ($arr_xml_special_char as $key => $value) {
            $arr_xml_special_char_regex .= "(?!$value)";
        }
        $arr_xml_special_char_regex .= ")";

        // Scan the array for &something_not_xml; syntax
        $pattern = "/$arr_xml_special_char_regex&([a-zA-Z0-9]+;)/";

        // Replace the &something_not_xml; with &amp;something_not_xml;
        $replacement = '&amp;$1';
        return preg_replace($pattern, $replacement, $text);
    }
    ?>


    admin at wapforum dot rs

    7 years ago

    A useful little function to convert the symbols in the different inputs.
    <?php
    function ConvertSimbols($var, $ConvertQuotes = 0) {
        if ($ConvertQuotes > 0) {
            $var = htmlentities($var, ENT_NOQUOTES, 'UTF-8');
            $var = str_replace('\"', '', $var);
            $var = str_replace("\'", '', $var);
        } else {
            $var = htmlentities($var, ENT_QUOTES, 'UTF-8');
        }
        return $var;
    }
    ?>

    Usage with quotes for example message:

    $message = ConvertSimbols($message);

    Usage without quotes for example link:

    $link = ConvertSimbols($link, 1);


    rq

    5 years ago

    For use of html  tags, ampersands, etc. in xml document

    (f.e.

    <xml>

    <xmltag1><span > data 1</span> & data2</xmltag1>

    </xml>

    )

    one can use the CDATA brackets:

    <xmltag1><![CDATA[<span > data 1</span> & data2]]></xmltag1>

    -rq


    Tom Walter

    10 years ago

    Note that as of 5.2.5 it appears that if the input string contains a character that is not valid for the output encoding you've specified, then this function returns null.

    You might expect it to just strip the invalid char, but it doesn't.

    You can strip the chars yourself like so:

    iconv('UTF-8', 'UTF-8//IGNORE', $str);

    You can combine that with htmlentities also:

    $str = htmlentities(iconv('UTF-8', 'UTF-8//IGNORE', $str), ENT_QUOTES, 'UTF-8');

    Should give you a string with htmlentities encoded to utf-8, and any unsupported chars stripped.


    steve at mcdragonsoftware dot com

    7 years ago

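    I'm glad 5.4 has XML support, but many of us are working with older installations, and some of us still have to use PHP 4. If you're like me, you've been frustrated with trying to use htmlentities/htmlspecialchars with XML output. I was hoping to find an option to force numeric encoding; lacking that, I have written my own xmlentities function, which I now offer:

    usage:

    string xmlentities( string $string )

    It will use htmlspecialchars for the valid XML entities amp, quot, lt, gt, (apos) and return the numeric entity for all other non alpha-numeric characters.

    -------------------------------------------

    <?php
    if (!function_exists('xmlentities')) {
        function xmlentities($string) {
            $not_in_list = "A-Z0-9a-z\s_-";
            return preg_replace_callback("/[^$not_in_list]/", 'get_xml_entity_at_index_0', $string);
        }

        function get_xml_entity_at_index_0($CHAR) {
            // Most of this function is missing from the source after "is_string( $CHAR[0] )".
            // The rest is assumed from the description above: htmlspecialchars() for the
            // XML entities, a numeric entity for everything else.
            if (!is_string($CHAR[0]) || strlen($CHAR[0]) > 1) {
                die("get_xml_entity_at_index_0 requires a single character.");
            }
            switch ($CHAR[0]) {
                case "'": case '"': case '&': case '<': case '>':
                    return htmlspecialchars($CHAR[0], ENT_QUOTES);
                default:
                    return numeric_entity_4_char($CHAR[0]);
            }
        }

        function numeric_entity_4_char($char) {
            return "&#" . str_pad(ord($char), 3, '0', STR_PAD_LEFT) . ";";
        }
    }

    ?>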


    keenskelly at gmail dot com

    10 years ago

    Correction to my previous post: the set of ENTITY declarations must be inside a <!DOCTYPE element; also &nbsp; is NOT pre-defined in XML and must be left in the entity list. I also extended the list with the windows 1252 character set using a sample function borrowed from php.net user comments and extended with euro entity which we need for our app. Here is the final code that is in our production app:

    <?php

    // Generate a list of entity declarations from the HTML_ENTITIES set that PHP knows about to dump into the document
    function htmlentities_entities() {
        $output = "<!DOCTYPE html [\n";
        foreach (get_html_translation_table_CP1252(HTML_ENTITIES) as $value) {
            $name = substr($value, 1, strlen($value) - 2);
            switch ($name) {
                // These ones we can skip because they're built into XML
                case 'gt':
                case 'lt':
                case 'quot':
                case 'apos':
                case 'amp':
                    break;
                default:
                    $output .= "<!ENTITY $name \"&$name;\">\n";
            }
        }
        $output .= "]>\n";
        return($output);
    }

    // ref: http://php.net/manual/en/function.get-html-translation-table.php#76564
    function get_html_translation_table_CP1252($type) {
        $trans = get_html_translation_table($type);
        $trans[chr(130)] = '&sbquo;';    // Single Low-9 Quotation Mark
        $trans[chr(131)] = '&fnof;';     // Latin Small Letter F With Hook
        $trans[chr(132)] = '&bdquo;';    // Double Low-9 Quotation Mark
        $trans[chr(133)] = '&hellip;';   // Horizontal Ellipsis
        $trans[chr(134)] = '&dagger;';   // Dagger
        $trans[chr(135)] = '&Dagger;';   // Double Dagger
        $trans[chr(136)] = '&circ;';     // Modifier Letter Circumflex Accent
        $trans[chr(137)] = '&permil;';   // Per Mille Sign
        $trans[chr(138)] = '&Scaron;';   // Latin Capital Letter S With Caron
        $trans[chr(139)] = '&lsaquo;';   // Single Left-Pointing Angle Quotation Mark
        $trans[chr(140)] = '&OElig;';    // Latin Capital Ligature OE
        $trans[chr(145)] = '&lsquo;';    // Left Single Quotation Mark
        $trans[chr(146)] = '&rsquo;';    // Right Single Quotation Mark
        $trans[chr(147)] = '&ldquo;';    // Left Double Quotation Mark
        $trans[chr(148)] = '&rdquo;';    // Right Double Quotation Mark
        $trans[chr(149)] = '&bull;';     // Bullet
        $trans[chr(150)] = '&ndash;';    // En Dash
        $trans[chr(151)] = '&mdash;';    // Em Dash
        $trans[chr(152)] = '&tilde;';    // Small Tilde
        $trans[chr(153)] = '&trade;';    // Trade Mark Sign
        $trans[chr(154)] = '&scaron;';   // Latin Small Letter S With Caron
        $trans[chr(155)] = '&rsaquo;';   // Single Right-Pointing Angle Quotation Mark
        $trans[chr(156)] = '&oelig;';    // Latin Small Ligature OE
        $trans[chr(159)] = '&Yuml;';     // Latin Capital Letter Y With Diaeresis
        $trans['euro'] = '&euro;';       // euro currency symbol
        ksort($trans);
        return $trans;
    }

    ?>

    [EDIT BY danbrown AT php DOT net: The user's original note contained the following text:

    "So here's something fun: if you create an XML document in PHP and use htmlentities() to encode text data, then later want to read and parse the same document with PHP's xml_parse(), unless you include entity declarations into the generated document, the parser will stop on the unknown entities.

    To account for this, I created a small function to take the translation table and turn it into XML <!ENTITY> definitions. I insert this output into the XML document immediately after the <?xml?> line and the parse errors magically vanish"
    ]


    Bassie (:

    15 years ago

    Note that you have to use htmlentities() before any other function that modifies the text, such as nl2br().

    If you use nl2br() first, htmlentities() will change <br /> to &lt;br /&gt;.
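The ordering issue can be seen side by side in a short sketch; the sample string is invented for the illustration.

```php
<?php
// Sample input containing a character that must be encoded and a line break.
$input = "1 < 2\nsecond line";

// Encode first, then add the line breaks: the <br /> tags stay real markup.
$right = nl2br(htmlentities($input, ENT_QUOTES, 'UTF-8'));
echo $right, "\n";
// 1 &lt; 2<br />
// second line

// Wrong order: the <br /> inserted by nl2br() is encoded as visible text.
$wrong = htmlentities(nl2br($input), ENT_QUOTES, 'UTF-8');
echo $wrong, "\n";
// 1 &lt; 2&lt;br /&gt;
// second line
```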


    2962051004 at qq dot com

    2 months ago

    <?php

    $str = <<<EOT
    你好 world
    EOT;

    function ChineseToEntity($str) {
        return preg_replace_callback(
            '/[\x{4e00}-\x{9fa5}]/u', // utf-8
            // '/[\x7f-\xff]+/', // if gb2312
            function ($matches) {
                $json = json_encode(array($matches[0]));
                preg_match('/\[\"(.*)\"\]/', $json, $arr);

                return '&#x' . str_replace('\\u', '', $arr[1]) . ';';
            },
            $str
        );
    }

    echo ChineseToEntity($str);
    // &#x4f60;&#x597d; world


    Jeff

    6 months ago

    There is a feature when writing to XML using an AJAX call to PHP that rarely is mentioned. I struggled for many hours using htmlentities() because what was getting written to my XML document was not as expected. I naturally assumed that I should be converting my strings before writing them to XML to adhere to XML rules on illegal characters. To my surprise, when converting with htmlentities() or htmlspecialchars() and then writing to an XML file, the resulting ampersands get converted afterwards! Consider the following example:

    <?php
    $str = "<b>I am cool</b>";
    $str = htmlentities($str);
    ?>

    When you append $str to an XML element and save() the document, you would expect the XML document's source code to look something like this:

    <ele>&lt;b&gt;I am cool&lt;/b&gt;</ele>

    But that is not what happens. The resulting ampersands get converted by PHP automatically to &amp; and your source code ends up looking like this:

    <ele>&amp;lt;b&amp;gt;I am cool&amp;lt;/b&amp;gt;</ele>

    As you can see, this creates problems when trying to output the XML data back to HTML. It is important to remember that when writing to XML this way, special characters like ">" and "<" are converted automatically by PHP, so there is no need to use htmlentities() in such cases. I assume this feature is in place to aid with passing data through header queries, to avoid reserved characters conflicting with others in a header query (e.g. & or =). Now I understand this may not be the case with older versions of PHP and that this might be a feature of my version (PHP version 5.6.32). With older versions, I assume using htmlentities() or htmlspecialchars() is a must, as stated in previous notes here. Also, I use the charset UTF-8 in my HTML and XML and am not sure whether this also affects the results I get.

    Anyway, I struggled for many hours with using htmlentities() to convert strings for XML writing and saving, when all I had to do was simply not use the function and let PHP convert my strings for me. I hope this helps because I would think I am not the only one who has struggled with this situation.
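The note does not name the API used to write the XML. Assuming it is DOMDocument (the mention of save() suggests so), the behaviour can be reproduced in a few lines: text added as a text node is escaped by the DOM layer on serialization, so encoding it beforehand escapes it twice.

```php
<?php
// Assumption: the XML is written with DOMDocument (requires the dom extension).
$doc = new DOMDocument('1.0', 'UTF-8');

// Raw text goes in unencoded; the serializer escapes < and > itself.
$ele = $doc->createElement('ele');
$ele->appendChild($doc->createTextNode('<b>I am cool</b>'));
$doc->appendChild($ele);
$raw = $doc->saveXML($ele);
echo $raw, "\n";
// <ele>&lt;b&gt;I am cool&lt;/b&gt;</ele>

// Pre-encoding with htmlentities() leads to the doubled &amp; from the note.
$ele2 = $doc->createElement('ele');
$ele2->appendChild($doc->createTextNode(htmlentities('<b>I am cool</b>')));
$double = $doc->saveXML($ele2);
echo $double, "\n";
// <ele>&amp;lt;b&amp;gt;I am cool&amp;lt;/b&amp;gt;</ele>
```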


    edo at edwaa dot com

    13 years ago

    A version of the xml entities function below. This one replaces the "prime" character (′) with which I had difficulties.

    <?php
    // XML Entity Mandatory Escape Characters
    function xmlentities($string) {
        return str_replace(array('&', '"', "'", '<', '>', '′'), array('&amp;', '&quot;', '&apos;', '&lt;', '&gt;', '&apos;'), $string);
    }

    ?>


    chris at ocproducts dot com

    1 year ago

    This function throws a warning on bad input even if ENT_SUBSTITUTE is set, so be prepared for this.


    jake_mcmahon at hotmail dot com

    14 years ago

    This function is particularly useful against XSS (cross-site scripting). XSS makes use of holes in code, whether it be in JavaScript or PHP. XSS often, if not always, relies on injecting HTML special characters to do its evil deeds, so this function in co-operation with your scripts (particularly search or submitting scripts) is a very useful tool in combatting "H4X0rz".


    mzvarik at gmail dot com

    9 years ago

    CZECH entities:

    <?php
    $ent = array(
        'ě' => '&#283;',
        'Ě' => '&#282;',
        'š' => '&#353;',
        'Š' => '&#352;',
        'č' => '&#269;',
        'Č' => '&#268;',
        'ř' => '&#345;',
        'Ř' => '&#344;',
        'ž' => '&#382;',
        'Ž' => '&#381;',
        'ý' => '&#253;',
        'Ý' => '&#221;',
        'á' => '&#225;',
        'Á' => '&#193;',
        'í' => '&#237;',
        'Í' => '&#205;',
        'é' => '&#233;',
        'É' => '&#201;',
        'ú' => '&#250;',
        'Ú' => '&#218;',
        'ů' => '&#367;',
        'Ů' => '&#366;',
        'ď' => '&#271;',
        'Ď' => '&#270;',
        'ť' => '&#357;',
        'Ť' => '&#356;',
        'ň' => '&#328;',
        'Ň' => '&#327;'
    );

    echo strtr('ěščřžýáíéúůďťňĚŠČŘŽÝÁÍÉÚŮĎŤŇ', $ent);
    ?>


    za at byza dot it

    10 years ago

    Trouble when using files with different charset?

    htmlentities and html_entity_decode can be used to translate between charset!

    Sample function:

    <?php
    function utf2latin($text) {
        $text = htmlentities($text, ENT_COMPAT, 'UTF-8');
        return html_entity_decode($text, ENT_COMPAT, 'ISO-8859-1');
    }

    ?>


    gunter [dot] sammet [at] gmail [dot] com

    9 years ago

    Had a heck of a time to get my rss entities right. using htmlentities didn't work and using html_entity_decode didn't work either. Ended up writing a custom function to encode and decode. It might still need some work but I thought to share it because I couldn't find anything on the net. Always open for suggestions to improve it! Here it is:

    <?php
    $entity_custom_from = false;
    $entity_custom_to = false;
    function html_entity_decode_encode_rss($data) {
        // The opening of this function is partly missing from the source; the global
        // declaration and the first half of the condition are assumed.
        global $entity_custom_from, $entity_custom_to;
        if (!is_array($entity_custom_from) || !is_array($entity_custom_to)) {
            $array_position = 0;
            foreach (get_html_translation_table(HTML_ENTITIES) as $key => $value) {
                //print("<br />key: $key, value: $value <br />\n");
                switch ($value) {
                    // These ones we can skip
                    case '&nbsp;':
                        break;
                    case '&gt;':
                    case '&lt;':
                    case '&quot;':
                    case '&apos;':
                    case '&amp;':
                        $entity_custom_from[$array_position] = $key;
                        $entity_custom_to[$array_position] = $value;
                        $array_position++;
                        break;
                    default:
                        $entity_custom_from[$array_position] = $value;
                        $entity_custom_to[$array_position] = $key;
                        $array_position++;
                }
            }
        }
        return str_replace($entity_custom_from, $entity_custom_to, $data);
    }
    ?>


    Wired

    8 years ago

    I needed a simple little function to take a string and convert extended ascii characters into html entities. I couldn't find a function for this so I whipped one up.

    <?php

    function ascii2entities($string) {
        for ($i = 128; $i <= 255; $i++) {
            $entity = htmlentities(chr($i), ENT_QUOTES, 'cp1252');
            $temp = substr($entity, 0, 1);
            $temp .= substr($entity, -1, 1);
            if ($temp != '&;') {
                $string = str_replace(chr($i), '', $string);
            } else {
                $string = str_replace(chr($i), $entity, $string);
            }
        }
        return $string;
    }

    echo ascii2entities("•");
    ?>

    snevi at im dot com dot ve

    10 years ago

    Correction to my previous post, and an improvement to the function (the post was altered by the HTML parser, so the characters did not display as they should):

    <?php
    function XMLEntities($string)
    {
        // The /e modifier no longer exists in PHP 7 and later,
        // so the helper is called through preg_replace_callback().
        return preg_replace_callback(
            '/[^\x09\x0A\x0D\x20-\x7F]/',
            function ($m) { return _privateXMLEntities($m[0]); },
            $string
        );
    }

    function _privateXMLEntities($num)
    {
        // Windows-1252 bytes 128-159 mapped to their Unicode numeric references
        $chars = array(
            128 => '&#8364;', 130 => '&#8218;', 131 => '&#402;',
            132 => '&#8222;', 133 => '&#8230;', 134 => '&#8224;',
            135 => '&#8225;', 136 => '&#710;',  137 => '&#8240;',
            138 => '&#352;',  139 => '&#8249;', 140 => '&#338;',
            142 => '&#381;',  145 => '&#8216;', 146 => '&#8217;',
            147 => '&#8220;', 148 => '&#8221;', 149 => '&#8226;',
            150 => '&#8211;', 151 => '&#8212;', 152 => '&#732;',
            153 => '&#8482;', 154 => '&#353;',  155 => '&#8250;',
            156 => '&#339;',  158 => '&#382;',  159 => '&#376;');
        $num = ord($num);
        // Bytes without a mapping (e.g. 129) fall back to a plain numeric reference
        return isset($chars[$num]) ? $chars[$num] : "&#" . $num . ";";
    }
    ?>

    In the previous post, the program used a foreach loop to correct the hex values that are not rendered, but that adds considerable execution time. So instead we let preg_replace call a function for each match, and create another function that evaluates the ord() of the given character: if it is between 127 and 160, it returns the corrected value so that the browser understands it and the XML does not break.
    (This works with dynamic XML generated from PHP with dynamic data from any source.)

    P.S.: the '&' should appear in this post as a single ampersand character and not as the HTML entity.

    D. Gasser

    11 years ago

    When using UTF-8 as the charset, you have to put UTF-8 in quotes ('UTF-8'); otherwise the value is not recognized.

    drallen at cs dot uwaterloo dot ca

    8 years ago

    A pointer to http://www.php.net/manual/en/function.mb-convert-encoding.php if your intention is to translate *all* characters in a charset to their corresponding HTML entities, not just named characters. Non-named characters will be replaced with HTML numeric encoding. eg:

    $text = mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8");
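
A hedged sketch of a related approach: the 'HTML-ENTITIES' pseudo-encoding is deprecated as of PHP 8.2, and mb_encode_numericentity() is one alternative there. Unlike the call above, it emits numeric references for every non-ASCII character, including those that have a named entity. The helper name to_numeric_entities() and the sample string are assumptions for illustration; the mbstring extension is required.

```php
<?php
// Hypothetical helper: encode every code point above ASCII as &#N;
function to_numeric_entities(string $text): string
{
    // Map format: [start, end, offset, mask] for the code point range to convert.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $map, 'UTF-8');
}

$encoded = to_numeric_entities("ba\u{00F1}o");
echo $encoded, "\n"; // ba&#241;o
```

Plain ASCII input passes through unchanged, so the helper can be applied to strings that mix markup and accented text.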

    daviscabral[arroba]gmail[ponto]com

    12 years ago

    unhtmlentities for all entities:

    <?php

    function unhtmlentities($string)
    {
        $trans_tbl1 = get_html_translation_table(HTML_ENTITIES);
        $trans_tbl2 = array();
        // Build numeric references; ord() assumes a single-byte charset
        foreach ($trans_tbl1 as $ascii => $htmlentitie) {
            $trans_tbl2[$ascii] = '&#' . ord($ascii) . ';';
        }
        $trans_tbl1 = array_flip($trans_tbl1);
        $trans_tbl2 = array_flip($trans_tbl2);
        return strtr(strtr($string, $trans_tbl1), $trans_tbl2);
    }

    ?>

    montana

    9 years ago

    Under what circumstances would someone want an n-tilde [ñ] to be converted into "&Atilde;&plusmn;", as htmlentities does when it reads UTF-8 input as a single-byte charset?
    The correct method of translation should return the accurate NCR for the multibyte Unicode sequence,
    which in this case is &#241;

    <?php

    // Simple task: convert everything from UTF-8 into an NCR (numeric character reference)
    class unicode_replace_entities
    {
        public function UTF8entities($content = "")
        {
            $contents = $this->unicode_string_to_array($content);
            $swap = "";
            $iCount = count($contents);
            for ($o = 0; $o < $iCount; $o++) {
                $contents[$o] = $this->unicode_entity_replace($contents[$o]);
                $swap .= $contents[$o];
            }
            return mb_convert_encoding($swap, "UTF-8"); // not really necessary, but why not.
        }

        public function unicode_string_to_array($string) // adjwilli
        {
            $array = array();
            $strlen = mb_strlen($string, "UTF-8");
            while ($strlen) {
                $array[] = mb_substr($string, 0, 1, "UTF-8");
                $string = mb_substr($string, 1, $strlen, "UTF-8");
                $strlen = mb_strlen($string, "UTF-8");
            }
            return $array;
        }

        public function unicode_entity_replace($c) // m. perez
        {
            $h = ord($c[0]);
            if ($h <= 0x7F) {
                return $c; // plain ASCII
            } else if ($h < 0xC2) {
                return $c; // not a valid UTF-8 lead byte
            }

            if ($h <= 0xDF) {        // 2-byte sequence
                $h = ($h & 0x1F) << 6 | (ord($c[1]) & 0x3F);
            } else if ($h <= 0xEF) { // 3-byte sequence
                $h = ($h & 0x0F) << 12 | (ord($c[1]) & 0x3F) << 6 | (ord($c[2]) & 0x3F);
            } else if ($h <= 0xF4) { // 4-byte sequence
                $h = ($h & 0x07) << 18 | (ord($c[1]) & 0x3F) << 12 | (ord($c[2]) & 0x3F) << 6 | (ord($c[3]) & 0x3F);
            } else {
                return $c;
            }
            return "&#" . $h . ";";
        }
    }

    // utf-8 environment
    $content = "<strong>baño baño baño</strong>日本語 = nihongo da ze.<br />";

    $oUnicodeReplace = new unicode_replace_entities();
    $content = $oUnicodeReplace->UTF8entities($content);
    echo "<br />Result:<br />";
    echo $content;

    $source = htmlentities($content);
    echo "<br />htmlentities of resulting data:<br />";
    echo $source;

    echo "<br /><br />Note: Entities get replaced with 'literals' in textarea FF3<br /><br />";
    echo "<textarea style='width:300px;height:150px;'>";
    echo $content;
    echo "</textarea>";

    echo "<br /><br />For editing NCR's rather than 'literals' in a textarea<br /><br />";
    echo "<textarea style='width:300px;height:150px;'>";
    echo preg_replace("/(&#)+/", "&amp;#", $content);
    echo "</textarea>";

    ?>

    brianhamner at yahoo dot com

    9 years ago

    If you want something simple that actually works, try this. Strips MS word and other entities and returns a clear data string:

    <?php
    // call this function
    function DoHTMLEntities($string)
    {
        $trans_tbl[chr(145)] = '&#8216;';
        $trans_tbl[chr(146)] = '&#8217;';
        $trans_tbl[chr(147)] = '&#8220;';
        $trans_tbl[chr(148)] = '&#8221;';
        $trans_tbl[chr(142)] = '&eacute;';
        $trans_tbl[chr(150)] = '&#8211;';
        $trans_tbl[chr(151)] = '&#8212;';
        return strtr($string, $trans_tbl);
    }

    // insert your string variable here
    $foo = str_replace("\r\n\r\n", "", htmlentities($your_string));
    $foo2 = str_replace("\r\n", " ", $foo);
    $foo3 = str_replace(" & ", "&amp;", $foo2);
    echo DoHTMLEntities($foo3);
    ?>

    galert420 at gmail dot com

    8 years ago

    Croatian entities

    <?php
    $ent = array(
        'Ć' => '&#262;',
        'ć' => '&#263;',
        'Č' => '&#268;',
        'č' => '&#269;',
        'Đ' => '&#272;',
        'đ' => '&#273;',
        'Š' => '&#352;',
        'š' => '&#353;',
        'Ž' => '&#381;',
        'ž' => '&#382;'
    );

    echo strtr('ĆćČčĐđŠšŽž', $ent);
    ?>

    sirarthur at sirarthur dot info

    9 years ago

    When you want to encode special characters but not the HTML tags using this function, you have two options:

    a) Build your own function and replace character by character, e.g.

    <?php
    for ($i = 0; $i < strlen($string); $i++) {
        switch (substr($string, $i, 1)) {
            //..... A VERY HUGE switch here with all characters to encode.
        }
    }
    ?>

    b) Use this function and simply restore the HTML tags afterwards, which gives you a 6-line function as follows:

    <?php
    function keephtml($string)
    {
        $res = htmlentities($string);
        $res = str_replace("&lt;", "<", $res);
        $res = str_replace("&gt;", ">", $res);
        $res = str_replace("&quot;", '"', $res);
        $res = str_replace("&amp;", '&', $res);
        return $res;
    }
    ?>

    Kenneth Kin Lum

    10 years ago

    use htmlspecialchars() if you are passing in a usual ASCII string.  It is faster than htmlentities().

    For example, if you are just doing

    htmlentities('<div style="background: #fff"></div>');

    then you can just use htmlspecialchars().  htmlentities() will look for all possible ways to convert string into html entities, such as &copy; or &eacute; (which is e with an acute accent on top).

    Note that ASCII is just 7 bit, which is 0x00 to 0x7F.  htmlspecialchars() will handle characters inside this range already.  htmlentities() is for the 8-bit Latin-1 (ISO-8859-1) to handle European characters, or for UTF-8 when the 3rd argument is "UTF-8" to handle UTF-8 characters, or other types of encodings using different values for the 3rd argument passed into htmlentities().
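
To make the difference concrete, here is a minimal sketch that runs both functions on the same UTF-8 string. The sample input and variable names are assumptions for illustration only.

```php
<?php
$input = '<b>caf' . "\u{00E9}" . '</b>';

// Only &, <, > and quotes are converted; the accented letter is left alone.
$special = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Every character with a named entity is converted, including the accented letter.
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // &lt;b&gt;café&lt;/b&gt;
echo $entities, "\n"; // &lt;b&gt;caf&eacute;&lt;/b&gt;
```

When the page itself is served as UTF-8, the first form is usually enough, since the accented letter can be sent as is.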

    info at pirandot dot de

    12 years ago

    With magic quotes enabled (magic_quotes_gpc, removed in PHP 5.4), the data returned by a text input field arrives already escaped and can be used in a database query when enclosed in single quotes, e.g.
    <?php
    mysql_query("SELECT * FROM Article WHERE id = '$data'");
    ?>
    But you will get problems when writing this data back into the input field's value,
    <?php
    echo "<input name='data' type='text' value='$data'>";
    ?>
    because HTML codes would be interpreted and escape sequences would cause strange output.

    The following function may help:
    <?php
    function deescape($s, $charset = 'UTF-8')
    {
        //  don't interpret html codes and don't convert quotes
        $s = htmlentities($s, ENT_NOQUOTES, $charset);

        //  delete the inserted backslashes except those for protecting single quotes
        //  (preg_replace_callback stands in for the /e modifier, which PHP 7 no longer has)
        $s = preg_replace_callback(
            "/\\\\([^'])/",
            function ($m) { return "&#" . ord($m[1]) . ";"; },
            $s
        );

        //  delete the backslashes inserted for protecting single quotes
        $s = str_replace("\\'", "&#" . ord("'") . ";", $s);

        return $s;
    }
    ?>
    Try some input like:  a'b"c\d\'e\"f\\g&x#27;h  to test ...

    marktpitman at gmail dot com

    11 years ago

    I just thought I would add that if you're using the default charset, htmlentities will not correctly return the trademark ( ™ ) sign.

    Instead it will return something like this: �

    If you need the trademark symbol, use:

    <?php htmlentities( $html, ENT_QUOTES, "UTF-8" ); ?>
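
A short sketch of that call with a concrete string, assuming the input really is UTF-8 (the sample text is made up for illustration):

```php
<?php
// U+2122 is the trademark sign; in UTF-8 it is a three-byte sequence.
$tm = "Acme\u{2122}";

$encoded = htmlentities($tm, ENT_QUOTES, 'UTF-8');
echo $encoded, "\n"; // Acme&trade;
```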

    php dot net at softmoon-webware dot com

    8 years ago

    <?php
    $HTML_ENTS
    =array("quot", "amp", "apos", "lt", "gt", "nbsp", "iexcl", "cent",
    "pound","curren", "yen", "brvbar", "sect", "uml", "copy", "ordf", "laquo",
    "not", "shy", "reg", "macr", "deg", "plusmn", "sup2", "sup3", "acute",
    "micro", "para", "middot", "cedil", "sup1", "ordm", "raquo", "frac14",
    "frac12", "frac34", "iquest", "Agrave", "Aacute", "Acirc", "Atilde", "Auml",
    "Aring", "AElig", "Ccedil", "Egrave", "Eacute", "Ecirc", "Euml", "Igrave",
    "Iacute", "Icirc", "Iuml", "ETH", "Ntilde", "Ograve", "Oacute", "Ocirc",
    "Otilde", "Ouml", "times", "Oslash", "Ugrave", "Uacute", "Ucirc", "Uuml",
    "Yacute", "THORN", "szlig", "agrave", "aacute", "acirc", "atilde", "auml",
    "aring", "aelig", "ccedil", "egrave", "eacute", "ecirc", "euml", "igrave",
    "iacute", "icirc", "iuml", "eth", "ntilde", "ograve", "oacute", "ocirc",
    "otilde", "ouml", "divide", "oslash", "ugrave", "uacute", "ucirc", "uuml",
    "yacute", "thorn", "yuml", "OElig", "oelig", "Scaron", "scaron", "Yuml",
    "fnof", "circ", "tilde", "Alpha", "Beta", "Gamma", "Delta", "Epsilon",
    "Zeta", "Eta", "Theta", "Iota", "Kappa", "Lambda", "Mu", "Nu", "Xi",
    "Omicron", "Pi", "Rho", "Sigma", "Tau", "Upsilon", "Phi", "Chi", "Psi",
    "Omega", "alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta",
    "theta", "iota", "kappa", "lambda", "mu", "nu", "xi", "omicron", "pi",
    "rho", "sigmaf", "sigma", "tau", "upsilon", "phi", "chi", "psi", "omega",
    "thetasym", "upsih", "piv", "ensp", "emsp", "thinsp", "zwnj", "zwj", "lrm",
    "rlm", "ndash", "mdash", "lsquo", "rsquo", "sbquo", "ldquo", "rdquo",
    "bdquo", "dagger", "Dagger", "bull", "hellip", "permil", "prime", "Prime",
    "lsaquo", "rsaquo", "oline", "frasl", "euro", "image", "weierp", "real",
    "trade", "alefsym", "larr", "uarr", "rarr", "darr", "harr", "crarr", "lArr",
    "uArr", "rArr", "dArr", "hArr", "forall", "part", "exist", "empty", "nabla",
    "isin", "notin", "ni", "prod", "sum", "minus", "lowast", "radic", "prop",
    "infin", "ang", "and", "or", "cap", "cup", "int", "there4", "sim", "cong",
    "asymp", "ne", "equiv", "le", "ge", "sub", "sup", "nsub", "sube", "supe",
    "oplus", "otimes", "perp", "sdot", "lceil", "rceil", "lfloor",
    "rfloor", "lang", "rang", "loz", "spades", "clubs", "hearts", "diams");

    // The selection of tags below is optimized for use with a webmaster's database,
    // --NOT-- to process user POSTs from the World Wide Web
    //  for inclusion on a public page.

    //  NOT included:
    //   form,  input,  select,  option,  label,  optgroup,  textarea,  area,  map,
    //   html,  head,  style,  link,  meta,  base,  body,  isindex,
    //   frame,  frameset,  noframes
    //  (include those above at your wish,  remove those below at your wish)
    $HTML_TAGS=array("a", "abbr", "acronym", "address", "applet", "b", "basefont",
    "bdo", "big", "blockquote", "br", "button", "caption", "center", "cite",
    "code", "col", "colgroup", "dd", "del", "dfn", "dir", "div", "dl", "dt", "em",
    "embed", "fieldset", "font", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "i",
    "iframe", "img", "ins", "kbd", "legend", "li", "menu", "noembed", "noscript",
    "object", "ol", "p", "param", "pre", "q", "s", "samp", "script", "small",
    "span", "strike", "strong", "sub", "sup", "table", "tbody", "td", "tfoot",
    "th", "thead", "title", "tr", "tt", "u", "ul", "var");

    $Xchars = array(
    128 => '&#8364;',
    130 => '&#8218;',
    131 => '&#402;',
    132 => '&#8222;',
    133 => '&#8230;',
    134 => '&#8224;',
    135 => '&#8225;',
    136 => '&#710;',
    137 => '&#8240;',
    138 => '&#352;',
    139 => '&#8249;',
    140 => '&#338;',
    142 => '&#381;',
    145 => '&#8216;',
    146 => '&#8217;',
    147 => '&#8220;',
    148 => '&#8221;',
    149 => '&#8226;',
    150 => '&#8211;',
    151 => '&#8212;',
    152 => '&#732;',
    153 => '&#8482;',
    154 => '&#353;',
    155 => '&#8250;',
    156 => '&#339;',
    158 => '&#382;',
    159 => '&#376;');
    ?>

    wwb at 3dwargamer dot net

    14 years ago

    htmlentities is a very handy function, but it fails to fix one thing which I deal with a lot: Word 'smart' quotes and em dashes.

    The function below replaces the funky double quotes with &quot;, funky single quotes with standard single quotes, and fixes em dashes.

    <?php
    function CleanupSmartQuotes($text)
    {
        // Windows-1252 bytes for curly quotes and the em dash
        $badwordchars = array(
            chr(145),
            chr(146),
            chr(147),
            chr(148),
            chr(151)
        );
        $fixedwordchars = array(
            "'",
            "'",
            '&quot;',
            '&quot;',
            '&mdash;'
        );
        return str_replace($badwordchars, $fixedwordchars, $text);
    }
    ?>

    kindrosker at gmail dot com

    7 years ago

    All Codes list

    array('À'=>'&Agrave;', 'à'=>'&agrave;', 'Á'=>'&Aacute;', 'á'=>'&aacute;', 'Â'=>'&Acirc;', 'â'=>'&acirc;', 'Ã'=>'&Atilde;', 'ã'=>'&atilde;', 'Ä'=>'&Auml;', 'ä'=>'&auml;', 'Å'=>'&Aring;', 'å'=>'&aring;', 'Æ'=>'&AElig;', 'æ'=>'&aelig;', 'Ç'=>'&Ccedil;', 'ç'=>'&ccedil;', 'Ð'=>'&ETH;', 'ð'=>'&eth;', 'È'=>'&Egrave;', 'è'=>'&egrave;', 'É'=>'&Eacute;', 'é'=>'&eacute;', 'Ê'=>'&Ecirc;', 'ê'=>'&ecirc;', 'Ë'=>'&Euml;', 'ë'=>'&euml;', 'Ì'=>'&Igrave;', 'ì'=>'&igrave;', 'Í'=>'&Iacute;', 'í'=>'&iacute;', 'Î'=>'&Icirc;', 'î'=>'&icirc;', 'Ï'=>'&Iuml;', 'ï'=>'&iuml;', 'Ñ'=>'&Ntilde;', 'ñ'=>'&ntilde;', 'Ò'=>'&Ograve;', 'ò'=>'&ograve;', 'Ó'=>'&Oacute;', 'ó'=>'&oacute;', 'Ô'=>'&Ocirc;', 'ô'=>'&ocirc;', 'Õ'=>'&Otilde;', 'õ'=>'&otilde;', 'Ö'=>'&Ouml;', 'ö'=>'&ouml;', 'Ø'=>'&Oslash;', 'ø'=>'&oslash;', 'Œ'=>'&OElig;', 'œ'=>'&oelig;', 'ß'=>'&szlig;', 'Þ'=>'&THORN;', 'þ'=>'&thorn;', 'Ù'=>'&Ugrave;', 'ù'=>'&ugrave;', 'Ú'=>'&Uacute;', 'ú'=>'&uacute;', 'Û'=>'&Ucirc;', 'û'=>'&ucirc;', 'Ü'=>'&Uuml;', 'ü'=>'&uuml;', 'Ý'=>'&Yacute;', 'ý'=>'&yacute;', 'Ÿ'=>'&Yuml;', 'ÿ'=>'&yuml;');

    anonymous

    12 years ago

    This function will encode anything that is not standard ASCII (that is, anything above #127 in the ASCII table).

    <?php
    // allhtmlentities : mainly based on "chars_encode()"  by Tim Burgan <[email protected]> [ http://www.php.net/htmlentities]
    function allhtmlentities($string)
    {
        if (strlen($string) == 0) {
            return $string;
        }
        // ENT_QUOTES is used here; HTML_ENTITIES belongs to get_html_translation_table()
        $string = htmlentities($string, ENT_QUOTES);
        // Split into single bytes
        $string = preg_split("//", $string, -1, PREG_SPLIT_NO_EMPTY);
        $ord = 0;
        for ($i = 0; $i < count($string); $i++) {
            $ord = ord($string[$i]);
            if ($ord > 127) {
                $string[$i] = '&#' . $ord . ';';
            }
        }
        return implode('', $string);
    }
    ?>

    info at bleed dot ws

    13 years ago

    Here is a centralized version of htmlentities() for multibyte strings.

    <?php
    function mb_htmlentities($string)
    {
        $string = htmlentities($string, ENT_COMPAT, mb_internal_encoding());
        return $string;
    }
    ?>

      html_entity_decode

      (PHP 4 >= 4.3.0, PHP 5, PHP 7)

      html_entity_decode: Convert HTML entities to their corresponding characters

      Description

      string html_entity_decode
      ( string $string
      [, int $flags = ENT_COMPAT | ENT_HTML401
      [, string $encoding = ini_get("default_charset")
      ]] )

      html_entity_decode() is the opposite of
      htmlentities(): it converts the HTML entities in
      string to their corresponding characters.

      More precisely, this function decodes all entities (including all numeric
      entities) that a) are necessarily valid for the chosen document type, that is,
      for XML this function does not decode named entities that might be defined
      in some DTD, and b) whose characters are in the character set associated with the chosen
      encoding and are permitted in the chosen document type. All other entities are left as is.

      Parameters

      string

      The input string.

      flags

      A bitmask of one or more of the following flags, which specify how to
      handle quotes and which document type to use. The default is ENT_COMPAT | ENT_HTML401.

      Available flags constants
      Constant Name | Description
      ENT_COMPAT | Will convert double quotes and leave single quotes alone.
      ENT_QUOTES | Will convert both double and single quotes.
      ENT_NOQUOTES | Will leave both double and single quotes unconverted.
      ENT_HTML401 | Handle code as HTML 4.01.
      ENT_XML1 | Handle code as XML 1.
      ENT_XHTML | Handle code as XHTML.
      ENT_HTML5 | Handle code as HTML 5.
      encoding

      An optional argument defining the encoding used when converting characters.

      If omitted, the default value of encoding varies
      depending on the PHP version in use. In PHP 5.6 and later, the
      default_charset configuration option is used as the default value.
      PHP 5.4 and 5.5 use UTF-8 as the default. Earlier versions of PHP
      use ISO-8859-1.

      Although this argument is technically optional, you are highly encouraged
      to specify the correct value for your code if you are using PHP 5.5 or earlier,
      or if your default_charset configuration option
      may be set incorrectly for the given input.

      The following character sets are supported:

      Supported charsets
      Charset | Aliases | Description
      ISO-8859-1 | ISO8859-1 | Western European, Latin-1.
      ISO-8859-5 | ISO8859-5 | Little used Cyrillic charset (Latin/Cyrillic).
      ISO-8859-15 | ISO8859-15 | Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1).
      UTF-8 | | ASCII compatible multi-byte 8-bit Unicode.
      cp866 | ibm866, 866 | DOS-specific Cyrillic charset.
      cp1251 | Windows-1251, win-1251, 1251 | Windows-specific Cyrillic charset.
      cp1252 | Windows-1252, 1252 | Windows-specific charset for Western European.
      KOI8-R | koi8-ru, koi8r | Russian.
      BIG5 | 950 | Traditional Chinese, mainly used in Taiwan.
      GB2312 | 936 | Simplified Chinese, national standard character set.
      BIG5-HKSCS | | Big5 with Hong Kong extensions.
      Shift_JIS | SJIS, SJIS-win, cp932, 932 | Japanese.
      EUC-JP | EUCJP, eucJP-win | Japanese.
      MacRoman | | Charset that was used by Mac OS.
      '' | | An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.

      Note:

      Any other character sets are not recognized. The default encoding will be
      used instead and a warning will be emitted.

      Return Values

      Returns the decoded string.

      Changelog

      Version | Description
      5.6.0 | The default value for the encoding parameter was changed to be the value of the default_charset configuration option.
      5.4.0 | Default encoding changed from ISO-8859-1 to UTF-8.
      5.4.0 | The constants ENT_HTML401, ENT_XML1, ENT_XHTML and ENT_HTML5 were added.

      Examples

      Example #1 Decoding HTML entities

      <?php
      $orig = "I'll \"walk\" the <b>dog</b> now";

      $a = htmlentities($orig);

      $b = html_entity_decode($a);

      echo $a; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now

      echo $b; // I'll "walk" the <b>dog</b> now
      ?>

      Notes

      Note:

      You might wonder why
      trim(html_entity_decode('&nbsp;')); doesn't reduce the string to an empty string.
      The reason is that '&nbsp;' is not converted to the character with
      ASCII code 32 (which is stripped by trim()) but to the character with
      code 160 (0xa0) in the default ISO-8859-1 encoding.
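
The same effect can be observed in UTF-8, where the no-break space decodes to two bytes rather than one. The sketch below is only an illustration; stripping U+00A0 with a regular expression is an assumption about what the caller wants, not something the manual prescribes.

```php
<?php
$decoded = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');

// In UTF-8 the no-break space is the two bytes 0xC2 0xA0, which trim() leaves alone.
$trimmed = trim($decoded);
echo strlen($trimmed), "\n"; // 2

// Assumption: treat U+00A0 as trimmable whitespace as well.
$cleaned = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $decoded);
echo strlen($cleaned), "\n"; // 0
```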

      See Also

      • htmlentities() – Convert all applicable characters to HTML entities
      • htmlspecialchars() – Convert special characters to HTML entities
      • get_html_translation_table() – Returns the translation table used by htmlspecialchars and htmlentities
      • urldecode() – Decodes a URL-encoded string

      User Contributed Notes 19 notes

      Martin

      7 years ago

      If you need something that converts &#2+ entities to UTF-8, this is simple and works:

      <?php

      echo $output;
      ?>
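
The body of the snippet above did not survive on this page. As a stand-in, here is a minimal sketch of one way to turn numeric references into UTF-8; it is not the note author's code. The helper name decode_numeric_entities() and the sample input are assumptions, and mb_chr() requires the mbstring extension (PHP 7.2 or later).

```php
<?php
// Hypothetical helper: decode decimal (&#269;) and hex (&#x5d0;) references to UTF-8.
function decode_numeric_entities(string $text): string
{
    return preg_replace_callback(
        '/&#(x[0-9a-f]+|[0-9]+);/i',
        function (array $m): string {
            $isHex = ($m[1][0] === 'x' || $m[1][0] === 'X');
            $code = $isHex ? (int) hexdec(substr($m[1], 1)) : (int) $m[1];
            $char = mb_chr($code, 'UTF-8');
            // Leave the reference untouched if it is not a valid code point.
            return $char === false ? $m[0] : $char;
        },
        $text
    );
}

$output = decode_numeric_entities('Fovi&#269; &#x5d0;');
echo $output, "\n";
```

Named entities such as &amp; are deliberately left alone here; html_entity_decode() covers those.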

      txnull

      3 years ago

      Use the following to decode all entities:
      <?php html_entity_decode($string, ENT_QUOTES | ENT_XML1, 'UTF-8') ?>

      I've checked these special entities:
      - double quotes (&#34;)
      - single quotes (&#39; and &apos;)
      - non printable chars (e.g. &#13;)
      With other $flags some or all won't be decoded.

      It seems that ENT_XML1 and ENT_XHTML are identical when decoding.

      Benjamin

      5 years ago

      The following function decodes named and numeric HTML entities and works on UTF-8. Requires iconv.

      function decodeHtmlEnt($str) {
          $ret = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
          $p2 = -1;
          for (;;) {
              $p = strpos($ret, '&#', $p2+1);
              if ($p === FALSE)
                  break;
              $p2 = strpos($ret, ';', $p);
              if ($p2 === FALSE)
                  break;

              // Hex reference (&#x...;) or decimal reference (&#...;)
              if (substr($ret, $p+2, 1) == 'x')
                  $char = hexdec(substr($ret, $p+3, $p2-$p-3));
              else
                  $char = intval(substr($ret, $p+2, $p2-$p-2));

              //echo "$char\n";
              // Pack the code point as 4 big-endian bytes and let iconv produce UTF-8
              $newchar = iconv(
                  'UCS-4', 'UTF-8',
                  chr(($char>>24)&0xFF).chr(($char>>16)&0xFF).chr(($char>>8)&0xFF).chr($char&0xFF)
              );
              //echo "$newchar<$p<$p2<<\n";
              $ret = substr_replace($ret, $newchar, $p, 1+$p2-$p);
              // Point at the last byte of the replacement so that an entity
              // starting right after it is still found by the next search
              $p2 = $p + strlen($newchar) - 1;
          }
          return $ret;
      }

      aidan at php dot net

      14 years ago

      This functionality is now implemented in the PEAR package PHP_Compat.

      More information about using this function without upgrading your version of PHP can be found on the below link:

      http://pear.php.net/package/PHP_Compat

      php dot net at c dash ovidiu dot tk

      13 years ago

      Quick & dirty code that translates numeric entities to UTF-8.

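      <?php

      function replace_num_entity($ord)
      {
          $ord = $ord[1];
          if (preg_match('/^x([0-9a-f]+)$/i', $ord, $match)) {
              $ord = hexdec($match[1]);
          } else {
              $ord = intval($ord);
          }

          $no_bytes = 0;
          $byte = array();

          if ($ord < 128) {
              return chr($ord);
          } elseif ($ord < 2048) {
              $no_bytes = 2;
          } elseif ($ord < 65536) {
              $no_bytes = 3;
          } elseif ($ord < 1114112) {
              $no_bytes = 4;
          } else {
              return '';
          }

          // array(mask for the lead byte, marker bits of the lead byte)
          switch ($no_bytes) {
              case 2:
                  $prefix = array(31, 192);
                  break;
              case 3:
                  $prefix = array(15, 224);
                  break;
              case 4:
                  $prefix = array(7, 240);
          }

          // The loop body and the end of this function are missing from this copy
          // of the note; what follows is a reconstruction of standard UTF-8 encoding.
          for ($i = 0; $i < $no_bytes; $i++) {
              // Take 6 bits at a time, lowest first, and mark them as continuation bytes
              $byte[$no_bytes - $i - 1] = (($ord >> (6 * $i)) & 63) | 128;
          }

          $byte[0] = ($byte[0] & $prefix[0]) | $prefix[1];

          $ret = '';
          for ($i = 0; $i < $no_bytes; $i++) {
              $ret .= chr($byte[$i]);
          }
          return $ret;
      }

      $test = 'This is a &#269;&#x5d0; test&#39;';

      echo $test . "<br />\n";
      echo preg_replace_callback('/&#([0-9a-fx]+);/mi', 'replace_num_entity', $test);

      ?>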

      me at richardsnazell dot com

      10 years ago

      I had a problem getting the 'TM' trademark symbol to display correctly in an email subject line. Using html_entity_decode() with different charsets didn't work, but directly replacing the entity with its single-byte equivalent (byte 153, the trademark sign in Windows-1252) did:

      $subject = str_replace('&trade;', chr(153), $subject);

      Daniel A.

      4 months ago

      I wanted to use this function today and I found the documentation, especially about the flags, not particularly helpful.

      Running the code below, for example, failed because the flag I used was the wrong one...

      $string = 'Donna&#039;s Bakery';
      $title = html_entity_decode($string, ENT_HTML401, 'UTF-8');
      echo $title;

      The correct flag to use in this case is ENT_QUOTES.

      My understanding of the flag to use is the one that would correspond to the expected, converted outcome. So, ENT_QUOTES for a character that would be a single or double quote when converted... and so on.

      Please help make the documentation a bit clearer.
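
A small sketch of the behaviour described in this note, using the same string. The variable names are assumptions; the point is that the quote flags and the document-type flags are separate groups that can be combined with `|`.

```php
<?php
$string = 'Donna&#039;s Bakery';

// ENT_HTML401 alone selects a document type but no quote handling,
// so the single-quote reference is left as is.
$unchanged = html_entity_decode($string, ENT_HTML401, 'UTF-8');

// ENT_QUOTES asks for both quote styles to be decoded.
$title = html_entity_decode($string, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $unchanged, "\n"; // Donna&#039;s Bakery
echo $title, "\n";     // Donna's Bakery
```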

      Free at Key dot no

      8 years ago

      Handy function to convert remaining HTML-entities into human readable chars (for entities which do not exist in target charset):

      <?php
      function cleanString($in, $offset = 0)
      {
          $out = trim($in);
          if (!empty($out)) {
              $entity_start = strpos($out, '&', $offset);
              if ($entity_start === false) {
                  // ideal
                  return $out;
              } else {
                  $entity_end = strpos($out, ';', $entity_start);
                  if ($entity_end === false) {
                      return $out;
                  }
                  // too long to be an entity
                  else if ($entity_end > $entity_start + 7) {
                      // and on we go
                      $out = cleanString($out, $entity_start + 1);
                  }
                  // gotcha!
                  else {
                      $clean = substr($out, 0, $entity_start);
                      $subst = substr($out, $entity_start + 1, 1);
                      // &scaron; => "s" / &#353; => "_"
                      $clean .= ($subst != "#") ? $subst : "_";
                      $clean .= substr($out, $entity_end + 1);
                      // and on we go
                      $out = cleanString($clean, $entity_start + 1);
                  }
              }
          }
          return $out;
      }
      ?>


      neurotic dot neu at gmail dot com

      8 years ago

      This is a safe rawurldecode with utf8 detection:

      <?php
      function utf8_rawurldecode($raw_url_encoded)
      {
          $enc = rawurldecode($raw_url_encoded);
          // If the string survives a UTF-8 -> ISO-8859-1 -> UTF-8 round trip,
          // treat it as already being UTF-8.
          if (utf8_encode(utf8_decode($enc)) == $enc)
              return $enc;
          else
              return utf8_encode($enc);
      }
      ?>
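
utf8_encode() and utf8_decode() are deprecated as of PHP 8.2, so here is a rough sketch of the same idea using mbstring instead. The function name is hypothetical, and it assumes the mbstring extension is loaded and that any input that is not valid UTF-8 is ISO-8859-1.

```php
<?php
// Hypothetical variant of utf8_rawurldecode(); requires the mbstring extension.
function utf8_rawurldecode_mb($raw_url_encoded)
{
    $decoded = rawurldecode($raw_url_encoded);
    if (mb_check_encoding($decoded, 'UTF-8')) {
        return $decoded; // already valid UTF-8
    }
    // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    return mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(utf8_rawurldecode_mb('caf%C3%A9')), "\n"; // 636166c3a9 (was already UTF-8)
echo bin2hex(utf8_rawurldecode_mb('caf%E9')), "\n";    // 636166c3a9 (converted from ISO-8859-1)
```

Unlike the round-trip test, this check also accepts UTF-8 text containing characters outside Latin-1.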


      Matt Robinson

      9 years ago

      I wrote in a previous comment that html_entity_decode() only handled about 100 characters. That's not quite true; it only handles entities that exist in the output character set (the third argument). If you want to get ALL HTML entities, make sure you use ENT_QUOTES and set the third argument to 'UTF-8'.

      If you don't want a UTF-8 string, you'll need to convert it afterward with something like utf8_decode(), iconv(), or mb_convert_encoding().

      If you're producing XML, which doesn't recognise most HTML entities:

      When producing a UTF-8 document (the default), use htmlspecialchars(html_entity_decode($string, ENT_QUOTES, 'UTF-8'), ENT_NOQUOTES, 'UTF-8'), because you only need to escape <, > and & unless you're printing inside the XML tags themselves.

      Otherwise, either convert all the named entities to numeric ones, or declare the named entities in the document's DTD. The full list of 252 entities can be found in the HTML 4.01 Spec, or you can cut and paste the function from my site ( http://inanimatt.com/php-convert-entities.php ).
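
The UTF-8 case described above can be sketched as a small helper. The function name is hypothetical; the two calls are the ones given in the note.

```php
<?php
// Hypothetical helper: make HTML-entity-encoded text safe for the text content
// of a UTF-8 XML document.
function html_to_xml_text($html)
{
    // 1. Decode every HTML entity (named and numeric) into real UTF-8 characters.
    $decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    // 2. Re-escape only what XML treats specially in text content: & < >
    return htmlspecialchars($decoded, ENT_NOQUOTES, 'UTF-8');
}

echo html_to_xml_text('Caf&eacute; &amp; bar &lt;b&gt; &copy; &quot;x&quot;'), "\n";
// Café &amp; bar &lt;b&gt; © "x"
```

For attribute values, the second call would need ENT_QUOTES (or ENT_COMPAT) instead of ENT_NOQUOTES so that quotes are escaped too.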


      Victor

      7 years ago

      We were having very peculiar behavior regarding foreign characters such as e-acute.

      However, it was only showing up as a problem when extracting those characters out of our mysql database and when being displayed through a proxy server of ours that handles dns issues.

      As other users have noted, the default character set wasn't what they expected when they left the setting blank.

      When we changed our default_charset to "UTF-8", the problems went away and we no longer needed functions like these to handle foreign characters such as e-acute. Good enough for us!


      jojo

      12 years ago

      This decodes characters encoded by JavaScript's escape() function.
      It is useful when the page uses multibyte characters.

      JavaScript: escape('aaああaa') ..... 'aa%u3042%u3042aa'
      PHP: jsEscape_decode('aa%u3042%u3042aa') ..... 'aaああaa'

      <?php
      function jsEscape_decode($jsEscaped, $outCharCode = 'SJIS')
      {
          $arrMojis = explode("%u", $jsEscaped);
          for ($i = 1; $i < count($arrMojis); $i++)
          {
              $c = substr($arrMojis[$i], 0, 4);
              $cc = mb_convert_encoding(pack('H*', $c), $outCharCode, 'UTF-16');
              $arrMojis[$i] = substr_replace($arrMojis[$i], $cc, 0, 4);
          }
          return implode('', $arrMojis);
      }
      ?>
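
A possible variant that always produces UTF-8 and converts each run of %uXXXX sequences in one go, so that surrogate pairs stay together. This is only a sketch: the function name is hypothetical, it needs the mbstring extension, and it deliberately leaves the plain %XX sequences that escape() also emits untouched.

```php
<?php
// Hypothetical helper: decode the %uXXXX sequences produced by JavaScript's
// escape() into UTF-8. Requires the mbstring extension.
function js_unescape_utf8($escaped)
{
    return preg_replace_callback(
        '/(?:%u[0-9a-fA-F]{4})+/',
        function ($m) {
            // Strip the "%u" markers, leaving a hex string of UTF-16 code units.
            $hex = str_replace('%u', '', $m[0]);
            return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
        },
        $escaped
    );
}

echo js_unescape_utf8('aa%u3042%u3042aa'), "\n"; // aaああaa
```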


      florianborn (at) yahoo (dot) de

      13 years ago

      Note that

      <?php

      echo urlencode(html_entity_decode("&nbsp;"));

      ?>

      will output "%A0" instead of "+".
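
The reason is that &nbsp; decodes to a no-break space (U+00A0), not an ordinary space. A small sketch with the charset passed explicitly; the sample strings are made up.

```php
<?php
// urlencode() percent-encodes the bytes of the no-break space instead of emitting "+".
$latin1 = urlencode(html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1'));
$utf8   = urlencode(html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8'));

echo $latin1, "\n"; // %A0
echo $utf8, "\n";   // %C2%A0

// If an ordinary space is wanted, replace the entity before decoding.
$plain = urlencode(html_entity_decode(str_replace('&nbsp;', ' ', 'a&nbsp;b'), ENT_QUOTES, 'UTF-8'));
echo $plain, "\n";  // a+b
```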


      daniel at brightbyte dot de

      14 years ago

      This function seems to have two limitations (at least in PHP 4.3.8):

      a) it does not work with multibyte character encodings, such as UTF-8
      b) it does not decode numeric entity references

      a) can be solved by using iconv to convert to ISO-8859-1, then decoding the entities, then converting to UTF-8 again. But that's quite ugly and destroys all characters not present in Latin-1.

      b) can be solved rather nicely using the following code:

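      <?php
      function decode_entities($text)
      {
          $text = html_entity_decode($text, ENT_QUOTES, "ISO-8859-1"); #NOTE: UTF-8 does not work!
          # The /e modifier was removed in PHP 7, so callbacks are used instead.
          $text = preg_replace_callback('/&#(\d+);/m', function ($m) { return chr((int) $m[1]); }, $text); #decimal notation
          $text = preg_replace_callback('/&#x([a-f0-9]+);/mi', function ($m) { return chr(hexdec($m[1])); }, $text); #hex notation
          return $text;
      }
      ?>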

      HTH


      jl dot garcia at gmail dot com

      9 years ago

      I created this function to filter all the text that goes in or comes out of the database.

      <?php
      function filter_string($string, $nohtml = '', $save = '')
      {
          if (!empty($nohtml))
          {
              $string = trim($string);
              if (!empty($save)) $string = htmlentities(trim($string), ENT_QUOTES, 'ISO-8859-15');
              else $string = html_entity_decode($string, ENT_QUOTES, 'ISO-8859-15');
          }
          // Note: the mysql_* functions were removed in PHP 7.
          if (!empty($save)) $string = mysql_real_escape_string($string);
          else $string = stripslashes($string);
          return $string;
      }
      ?>


      grvg (at) free (dot) fr

      12 years ago

      Here are the ultimate functions to convert HTML entities to UTF-8.
      The main function is htmlentities2utf8; the others are helper functions.

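      <?php
      // Encode a Unicode code point as a UTF-8 byte sequence.
      function chr_utf8($code)
      {
          if ($code < 0) return false;
          elseif ($code < 128) return chr($code);
          elseif ($code < 2048) return chr(192 | ($code >> 6)) . chr(128 | ($code & 63));
          elseif ($code < 65536) return chr(224 | ($code >> 12)) . chr(128 | (($code >> 6) & 63)) . chr(128 | ($code & 63));
          else return chr(240 | ($code >> 18)) . chr(128 | (($code >> 12) & 63)) . chr(128 | (($code >> 6) & 63)) . chr(128 | ($code & 63));
      }

      // Callback for preg_replace_callback('~&(#(x?))?([^;]+);~', 'html_entity_replace', $str);
      function html_entity_replace($matches)
      {
          if ($matches[2])
          {
              return chr_utf8(hexdec($matches[3]));
          }
          elseif ($matches[1])
          {
              return chr_utf8((int) $matches[3]);
          }
          switch ($matches[3])
          {
              case "nbsp": return chr_utf8(160);
              case "iexcl": return chr_utf8(161);
              case "cent": return chr_utf8(162);
              case "pound": return chr_utf8(163);
              case "curren": return chr_utf8(164);
              case "yen": return chr_utf8(165);
              //... etc with all named HTML entities
          }

          // Unknown name: leave the matched text unchanged instead of deleting it.
          return $matches[0];
      }

      function htmlentities2utf8($string) // because of the html_entity_decode() bug with UTF-8
      {
          $string = preg_replace_callback('~&(#(x?))?([^;]+);~', 'html_entity_replace', $string);
          return $string;
      }
      ?>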
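
On newer PHP versions the long switch over entity names could probably be replaced by PHP's own translation table. A minimal sketch, assuming PHP 5.3.4 or later (for the encoding argument of get_html_translation_table()); the function name is hypothetical and only named and quote entities in that table are handled.

```php
<?php
// Hypothetical alternative to the hand-written switch: build the
// entity => character map from PHP's own table.
function named_entities_to_utf8($string)
{
    // get_html_translation_table() maps character => entity; flip it for decoding.
    $map = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8'));
    // strtr() replaces in a single pass, so decoded text is never decoded twice.
    return strtr($string, $map);
}

echo named_entities_to_utf8('caf&eacute; costs &pound;5'), "\n"; // café costs £5
```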


      marion at figmentthinking dot com

      9 years ago

      I just ran into the:
      Bug #27626 html_entity_decode bug - cannot yet handle MBCS in html_entity_decode()!

      The simple solution, if you're still running PHP 4, is to wrap the html_entity_decode() call in utf8_encode().

      <?php
      $string = '&nbsp;';
      $utf8_encode = utf8_encode(html_entity_decode($string));
      ?>

      By default html_entity_decode() returns the ISO-8859-1 character set, and by default utf8_decode()...

      http://us.php.net/manual/en/function.utf8-decode.php
      "Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1"


      kae at verens dot com

      10 years ago

      The references to 'chr()' in the example unhtmlentities() function should be changed to unichr, using the example unichr() function described in the 'chr' reference ( http://php.net/chr ).

      The reason is that characters such as &#x20AC; (that's the Euro, by the way) have code points above 255, which chr() cannot represent as a single byte.
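
As a rough illustration of the same point on newer PHP versions: the helper name below is hypothetical, and it assumes the mbstring extension and PHP 7.2 or later for mb_chr().

```php
<?php
// chr() only ever returns a single byte, so a code point such as 0x20AC
// needs a multibyte-aware function instead.
function unichr_utf8($codepoint)
{
    return mb_chr($codepoint, 'UTF-8');
}

echo bin2hex(unichr_utf8(0x20AC)), "\n"; // e282ac

// Decoding the numeric entity with an explicit UTF-8 charset gives the same bytes.
echo bin2hex(html_entity_decode('&#x20AC;', ENT_QUOTES, 'UTF-8')), "\n"; // e282ac
```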


      slickriptide at gmail dot com

      2 years ago

      When using this function, it's a good idea to pay attention when it says that leaving the charset parameter empty is "not recommended".

      I had an issue where I was storing text files, with entities converted, into a database. When I retrieved them later and ran

      $text_file = html_entity_decode($text_data);

      the entities were NOT decoded.

      Once I was aware of the problem, I changed the decode call to fully specify all of the parameters:

      $text_file = html_entity_decode($text_data, ENT_COMPAT | ENT_HTML5,'utf-8');

      This converted the entities as expected.
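
One way the flags argument alone can change the outcome is shown in the sketch below. It is not necessarily the cause of the problem described above, and the sample text is made up: &apos; is not an HTML 4.01 entity, so it is only decoded when another document type such as ENT_HTML5 is selected.

```php
<?php
$text_data = 'It&apos;s &quot;done&quot;';

$html401 = html_entity_decode($text_data, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$html5   = html_entity_decode($text_data, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $html401, "\n"; // It&apos;s "done"
echo $html5, "\n";   // It's "done"
```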
