Can anyone tell me what is up with all the random ("&#241;")'s scattered in some posts?

Status
Not open for further replies.

leenix

Senior Member
ECF Veteran
Feb 23, 2011
111
3
Upstate NY USA
Hi all, been here a couple of weeks now and once in a while, sometimes frequently, I see words with things like "&#241;" seemingly randomly thrown into them. Many times this odd interjection seems to be where the '%' symbol should be, but I just noticed this section of a post where I'm sure the symbol shouldn't be:

"With ... never heard of that one. Sounds kind of good though. never thought of chocolate covered jalape&#241;os."

If anyone knows why these seemingly random strings are scattered throughout many posts, please let me know. It really is bugging me that I see these all the time and can't figure out what it's about.
:confused:
Thanks
happy vapes.
 
I've seen them too and i can tell you what they are...


They are called "HTML entities", and are bits of code used to harmlessly display certain symbols in a webpage.

For example, the "&" symbol is &amp; and the copyright symbol is &copy;, which are the easy-to-remember shorthands for the most commonly used symbols.

The numbers that you see here (like the 241 in &#241;) are the decimal representation of the same thing, which is easier for the machine to use, since it's a direct conversion from the number of the cell that holds that particular character in the ANSI and Unicode character tables.
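To make the "number of the cell" idea concrete, here is a small Python sketch. It only uses the standard `html` module; the variable names are made up for illustration and have nothing to do with the forum software.

```python
import html

# A numeric character reference is the character's code point in decimal.
code_point = ord("ñ")              # position of the character in the Unicode table
reference = f"&#{code_point};"     # builds the numeric reference for it

# Numeric and named references decode to the same characters.
decoded_numeric = html.unescape(reference)
decoded_named = html.unescape("&ntilde; &amp; &copy;")

print(code_point, reference, decoded_numeric, decoded_named)
```

So the numeric form and the named form are two spellings of the same character, and a browser turns either one back into the symbol.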



The problem is that the forum server sanitizes some posts twice, most likely when a person has edited their post and resubmitted it. I'll have to PM the admin and let them know.

If you decide to look at the HTML source of pages with this bug, you'll see that after the symbol has been "escaped", the leading ampersand has been "escaped" again (so &#241; becomes &amp;#241;), causing you to see the character code, and not the character.
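A rough way to reproduce the double escaping in Python is sketched below. The `sanitize` and `browser_render` functions are hypothetical stand-ins: nobody in this thread has seen the forum's real code, so treat this as an illustration of the effect, not of the actual implementation.

```python
import html

def sanitize(text: str) -> str:
    """Hypothetical sanitizer: escape HTML metacharacters, then replace
    non-ASCII characters with decimal character references."""
    escaped = html.escape(text)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

def browser_render(markup: str) -> str:
    """Stand-in for the browser, which decodes character references once."""
    return html.unescape(markup)

post = "chocolate covered jalapeños"
once = sanitize(post)    # what the server should store
twice = sanitize(once)   # what is stored if the post is sanitized again

print(browser_render(once))   # the reader sees the accented letter
print(browser_render(twice))  # the reader sees the raw character code
```

The second pass turns the leading `&` into `&amp;`, so the browser decodes that ampersand and then shows the rest of the reference as plain text.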

-[Arsenic] :)
 

dormouse

ECF Guru
ECF Veteran
Oct 31, 2010
12,347
1,611
Pennsylvania
You actually see that in posts? I have seen similar substitutions in URLs. I don't get them in posts. I would guess it's the browser or font you use. You need a font that includes the characters with values above 127 decimal. Those higher numbers include all of the accented characters as well as a number of math and science symbols and some that are just graphical.

chart
The Extended ASCII Chart

Not all fonts have them, especially if you use a special language font, or an Asian font, which may interpret a character > 127 as the first byte of an Asian 2-byte character in some pre-Unicode character encodings.
 
P.S. I forgot to mention that the reason "sanitizing" posts to "escape" characters is necessary is to prevent people from inserting their own HTML and JavaScript into the site.

The < and >, so crucial to making HTML code work, are reduced to &lt; and &gt; respectively, so you can be sure that a spammer won't hijack the site with a ...... ad taking up 100% of the screen and following you wherever you go.
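As a small illustration of that escaping step, here is a Python sketch using the standard library's `html.escape`. The `malicious_post` string is an invented example, and the forum's own sanitizer may well work differently.

```python
import html

# Invented example of markup a spammer might try to post.
malicious_post = '<script>alert("ad")</script>'

# Escaping turns the markup characters into harmless character references.
safe_markup = html.escape(malicious_post)
print(safe_markup)

# A browser shows the escaped text literally instead of running it;
# decoding once gives back exactly what the user typed.
displayed_text = html.unescape(safe_markup)
print(displayed_text)
```

Because the angle brackets never reach the page as real markup, the browser has no tag to execute.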
 
@Dormouse,

It's not an issue with the font. If it were, people would just see a box in place of the character, just as you would if you were to go to a Chinese site, without having a Chinese language pack installed.

I'm a web developer myself, so I'm not just guessing here. The cause is double-sanitized posts, and since so few people use accented characters in their posts, and even fewer of those posts get "twice cooked", you would probably not run into them.

-[Arsenic] :)
 
Piñata test 1....

This is the post i won't edit....

Oh, and @Dormouse, regarding URLs, that's a pretty odd thing for them to have done, unless you mean a form is set to use a GET request when submitting, and the developer has escaped everything outside [A-Z],[a-z] and [0-9]... that would probably do it.

But the correct way for a URL would be to use hex (base 16) instead of decimal (base 10) for those non-safe chars... like you'd see %20 in place of a whitespace character ( )...
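For comparison with the decimal codes above, this is roughly what URL percent-encoding looks like in Python with `urllib.parse.quote`. The example strings are just for illustration; which byte encoding a given site uses for its URLs is an assumption here.

```python
from urllib.parse import quote, unquote

space_form = quote(" ")                               # a space as a hex escape

# By default quote() encodes the text as UTF-8 first, so one accented
# letter can become two escaped bytes.
utf8_form = quote("jalapeño")

# With a single-byte encoding the escape is one byte, and its hex value
# is the same number as the decimal code 241.
latin1_form = quote("jalapeño", encoding="latin-1")

print(space_form, utf8_form, latin1_form)
print(int("F1", 16))           # hex F1 converted to decimal
print(unquote(utf8_form))      # decoding restores the original text
```

So a URL escape and an HTML numeric reference can point at the same character; they just use different bases and different syntax.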

And regarding Asian fonts, yeah, in a very basic font rendering engine you would see it sometimes choke like that, though something more modern usually just substitutes Arial or Helvetica for the missing charsets... and Arial itself, being the font used here, has pretty wide charset support.

Anyway, i hope they will come around to fixing it

It was nice talking to you :)


-[Arsenic]
 
Piñata test 2....

This is the post that should break when i edit it....

Aaaand it looks like it's not breaking yet :2cool:


It's most definitely a serverside issue, so i'm now thinking it's between the database and processing scripts, somewhere in there

It could be also that the user had RichText mode enabled in the forum's text inputs, and that's messed it up... i don't have the time to test that now
 

dormouse

ECF Guru
ECF Veteran
Oct 31, 2010
12,347
1,611
Pennsylvania
I'm an embedded systems programmer and wrote Chinese and Japanese text entry systems in a past life and an assembler game programmer before that. But yeah I am light on HTML stuff or anything with a 4-letter acronym.

Yeah I was thinking of the % numbers for special chars in a url

Just want to see what jalapeño looks like for me

...

Why does it look fine for me with the n with a ~ on top of it?

What does it look like to others?
 

OaklandCA

Super Member
ECF Veteran
Feb 27, 2010
520
11
OaklandCA
> I'm an embedded systems programmer and wrote Chinese and Japanese text entry systems in a past life and an assembler game programmer before that. But yeah I am light on HTML stuff or anything with a 4-letter acronym.
>
> Yeah I was thinking of the % numbers for special chars in a url
>
> Just want to see what jalapeño looks like for me
>
> ...
>
> Why does it look fine for me with the n with a ~ on top of it?
>
> What does it look like to others?

does it work inside of quotes? back to edit
 

wolcen

Ultra Member
ECF Veteran
Verified Member
Mar 9, 2011
1,182
1,302
Boston, MA
www.wolcen.com
Is anyone else seeing the &#241; substitutions? I'm having a hard time even finding a post with that. I'm starting to think either it is occurring only during the handling of certain types of elements in a posting (like OaklandCA tried to test), or it's otherwise being substituted by the browser.
 

OaklandCA

Super Member
ECF Veteran
Feb 27, 2010
520
11
OaklandCA
> In HTML 3.2, if I code Español using Espa&#241ol, the
> validator at The W3C Markup Validation Service does not do what I
> expect.

Then you need to tune your expectations.

> This page validates as HTML 3.2. The semicolon is
> missing after &#241. I think the page is not valid HTML
> 3.2.

The construct Espa&#241ol contains no markup error. In HTML 3.2,
SGML rules apply, so that "The refc [= semicolon] or
RE [= record end, i.e. end of line] can be omitted only if the reference
is not followed by a character that could occur in the reference, or by a
character that could be interpreted as the omitted reference end."
(The SGML standard, clause 9.4.5)
Since the letter "o" cannot be part of the character reference,
the semicolon is optional.
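A quick way to see the same effect in practice is Python's `html.unescape`. Note the assumption: it follows HTML5-style parsing, not the SGML rules quoted above, but it also accepts a numeric reference with no semicolon when the next character cannot be part of the number.

```python
import html

# The letter "o" cannot continue the number, so the reference ends at 241.
no_semicolon = html.unescape("Espa&#241ol")
with_semicolon = html.unescape("Espa&#241;ol")
print(no_semicolon, with_semicolon)

# If a digit follows, the missing semicolon changes the meaning:
# this is read as code 2411, not as code 241 followed by "1".
ambiguous = html.unescape("&#2411")
print(ambiguous == "ñ1")
```

That second case is why leaving the semicolon off is only safe when the following character could not belong to the reference.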
 

OaklandCA

Super Member
ECF Veteran
Feb 27, 2010
520
11
OaklandCA
ahha that explains it....seriously the poster should go to:
http://www.mountaindragon.com/html/iso.htm
and see whether the characters are all legible
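Entity Name | ISO Code | HTML Entity Code | Result
space | &#32; | (n/a) | (space)
double quotes | &#34; | &quot; | "
ampersand | &#38; | &amp; | &
less-than | &#60; | &lt; | <
greater-than | &#62; | &gt; | >
vertical bar | &#124; | (n/a) | |
function | &#131; | (n/a) | ƒ
double comma | &#132; | (n/a) | „
ellipsis | &#133; | (n/a) | …
dagger | &#134; | (n/a) | †
double dagger | &#135; | (n/a) | ‡
circumflex | &#136; | (n/a) | ˆ
per mille | &#137; | (n/a) | ‰
Capital S hacek | &#138; | (n/a) | Š
opening guillemet | &#139; | (n/a) | ‹
Capital O-E ligature | &#140; | &OElig; | Œ
(n/a) | (n/a) | (n/a) | �
open single quote | &#145; | (n/a) | ‘
close single quote | &#146; | (n/a) | ’
open double quotes | &#147; | (n/a) | “
close double quotes | &#148; | (n/a) | ”
bullet | &#149; | &bull; | •
en dash | &#150; | (n/a) | –
em dash | &#151; | (n/a) | —
tilde | &#152; | &tilde; | ˜
trade mark | &#153; | &trade; | ™
small s hacek | &#154; | (n/a) | š
closing guillemet | &#155; | (n/a) | ›
small o-e ligature | &#156; | &oelig; | œ
Capital Y umlaut | &#159; | &Yuml; | Ÿ
non-breaking space | &#160; | &nbsp; | (non-breaking space)
inverted exclamation | &#161; | &iexcl; | ¡
cent | &#162; | &cent; | ¢
pound sterling | &#163; | &pound; | £
currency | &#164; | &curren; | ¤
yen | &#165; | &yen; | ¥
broken bar | &#166; | &brvbar; | ¦
section | &#167; | &sect; | §
umlaut (diaeresis) | &#168; | &uml; | ¨
copyright | &#169; | &copy; | ©
feminine ordinal | &#170; | &ordf; | ª
left angle quote (guillemotleft) | &#171; | &laquo; | «
not sign | &#172; | &not; | ¬
soft hyphen | &#173; | &shy; | (not visible)
registered | &#174; | &reg; | ®
macron | &#175; | &macr; | ¯
degree | &#176; | &deg; | °
plus-minus | &#177; | &plusmn; | ±
superscript 2 | &#178; | &sup2; | ²
superscript 3 | &#179; | &sup3; | ³
acute accent | &#180; | &acute; | ´
micro | &#181; | &micro; | µ
pilcrow | &#182; | &para; | ¶
middle dot | &#183; | &middot; | ·
cedilla | &#184; | &cedil; | ¸
superscript 1 | &#185; | &sup1; | ¹
masculine ordinal | &#186; | &ordm; | º
right angle quote (guillemotright) | &#187; | &raquo; | »
one quarter | &#188; | &frac14; | ¼
one half | &#189; | &frac12; | ½
three quarters | &#190; | &frac34; | ¾
inverted question mark | &#191; | &iquest; | ¿
Capital A grave | &#192; | &Agrave; | À
Capital A acute | &#193; | &Aacute; | Á
Capital A circumflex | &#194; | &Acirc; | Â
Capital A tilde | &#195; | &Atilde; | Ã
Capital A umlaut | &#196; | &Auml; | Ä
Capital A ring | &#197; | &Aring; | Å
Capital A-E ligature | &#198; | &AElig; | Æ
Capital C cedilla | &#199; | &Ccedil; | Ç
Capital E grave | &#200; | &Egrave; | È
Capital E acute | &#201; | &Eacute; | É
Capital E circumflex | &#202; | &Ecirc; | Ê
Capital E umlaut | &#203; | &Euml; | Ë
Capital I grave | &#204; | &Igrave; | Ì
Capital I acute | &#205; | &Iacute; | Í
Capital I circumflex | &#206; | &Icirc; | Î
Capital I umlaut | &#207; | &Iuml; | Ï
Capital ETH Icelandic | &#208; | &ETH; | Ð
Capital N tilde | &#209; | &Ntilde; | Ñ
Capital O grave | &#210; | &Ograve; | Ò
Capital O acute | &#211; | &Oacute; | Ó
Capital O circumflex | &#212; | &Ocirc; | Ô
Capital O tilde | &#213; | &Otilde; | Õ
Capital O umlaut | &#214; | &Ouml; | Ö
multiplication | &#215; | &times; | ×
Capital O slash | &#216; | &Oslash; | Ø
Capital U grave | &#217; | &Ugrave; | Ù
Capital U acute | &#218; | &Uacute; | Ú
Capital U circumflex | &#219; | &Ucirc; | Û
Capital U umlaut | &#220; | &Uuml; | Ü
Capital Y acute | &#221; | &Yacute; | Ý
Capital Thorn Icelandic | &#222; | &THORN; | Þ
small sharp s German | &#223; | &szlig; | ß
small a grave | &#224; | &agrave; | à
small a acute | &#225; | &aacute; | á
small a circumflex | &#226; | &acirc; | â
small a tilde | &#227; | &atilde; | ã
small a umlaut | &#228; | &auml; | ä
small a ring | &#229; | &aring; | å
small a-e ligature | &#230; | &aelig; | æ
small c cedilla | &#231; | &ccedil; | ç
small e grave | &#232; | &egrave; | è
small e acute | &#233; | &eacute; | é
small e circumflex | &#234; | &ecirc; | ê
small e umlaut | &#235; | &euml; | ë
small i grave | &#236; | &igrave; | ì
small i acute | &#237; | &iacute; | í
small i circumflex | &#238; | &icirc; | î
small i umlaut | &#239; | &iuml; | ï
small eth Icelandic | &#240; | &eth; | ð
small n tilde | &#241; | &ntilde; | ñ
small o grave | &#242; | &ograve; | ò
small o acute | &#243; | &oacute; | ó
small o circumflex | &#244; | &ocirc; | ô
small o tilde | &#245; | &otilde; | õ
small o umlaut | &#246; | &ouml; | ö
division | &#247; | &divide; | ÷
small o slash | &#248; | &oslash; | ø
small u grave | &#249; | &ugrave; | ù
small u acute | &#250; | &uacute; | ú
small u circumflex | &#251; | &ucirc; | û
small u umlaut | &#252; | &uuml; | ü
small y acute | &#253; | &yacute; | ý
small thorn Icelandic | &#254; | &thorn; | þ
small y umlaut | &#255; | &yuml; | ÿ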

 