Cyrillic character entity names won't validate.

Question

MidiMagic 579 Nearly a Senior Poster

9 Years Ago

I wrote a web page with one Russian word in it, using the Cyrillic alphabet character entity names (found in a list of all Unicode characters that I have). The page works fine in every browser I tried. But the W3C validator won't validate the page because the Cyrillic character entity names were used. It validates if I use the Unicode numbers instead, but that is such clunky programming. It reminds me of having to type in the ASCII code for every character the program displays in assembly language programming.

Why can't we have the better programming practices? W3C seems to want to deprecate all but 5 of the character entity names. This sounds like going back to the stone age.

assembly html-css web-browser

3 Contributors
14 Replies
516 Views
2 Weeks Discussion Span
Latest Post 9 Years Ago by diafol

All 14 Replies

diafol

9 Years Ago

Have you set encoding to utf-8?

Edit: Sorry, not sure I understood properly. What is it you actually need help with?

Could you give examples?

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Page Title</title>
</head>
<body>
АВДИ
</body>
</html>

Validated OK for me.

Edited 9 Years Ago by diafol

almostbob 866 Retired: passive income ROCKS

9 Years Ago

Try wrapping the Cyrillic in <span lang='ru'>АБВГДЕЖЅ</span>

W3C: declaring language

Edited 9 Years Ago by almostbob

diafol commented: Good info +15

diafol

9 Years Ago

Try a proper editor. Even Notepad accepts them.

Edited 9 Years Ago by diafol

almostbob 866 Retired: passive income ROCKS

9 Years Ago

@diafol:
His source/editor appears functional; the W3C validator picks up a Unicode fault that's hard to find, 'cause it's not really there.

Edited 9 Years Ago by almostbob

diafol · Answer 1 · 2016-02-29T07:24:35+00:00

Btw, the ???? above are A, B, D and a back-to-front N. DaniWeb just ain't up to displaying them in messages.

MidiMagic 579 Nearly a Senior Poster · Answer 2 · 2016-02-29T21:07:56+00:00

I am using the doctype for xhtml. Does that make a difference?

Yes, I am using utf-8.

As an example, it did not recognize the character entity & YAcy; (the backward R) as valid.
It does recognize & #1071; (same character) as valid.
(I inserted a space after the & here to prevent encoding.)

Every browser I tried accepted & YAcy; and displayed the correct character.
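
For anyone stuck on XHTML 1.0, one way around the validator error is to keep writing the entity names in the source and convert them to numeric references before uploading. Below is a minimal Python sketch of that idea. It is not something from the thread: the function name `named_to_numeric` is made up for illustration, and it relies on the standard library's `html.entities.html5` table, which does list `YAcy;`. The five names that XML itself predefines are left alone.

```python
import re
from html.entities import html5

# The five entities predefined by XML itself; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named reference such as &YAcy; (name followed by a semicolon).
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(markup):
    """Replace named character references with decimal numeric ones."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        chars = html5.get(name + ";")
        if chars is None:
            # Unknown name: leave it as written rather than guess.
            return match.group(0)
        # A few names expand to more than one code point, hence the join.
        return "".join("&#{};".format(ord(ch)) for ch in chars)

    return _NAMED_REF.sub(replace, markup)


print(named_to_numeric("<p>&YAcy; &amp; &yacy;</p>"))
# prints <p>&#1071; &amp; &#1103;</p>
```

The source stays readable with names, and only the converted copy has to pass the validator.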

diafol · Answer 3 · 2016-02-29T21:11:07+00:00

OK, any reason why you're using xhtml instead of html5?
Also you should be able to insert the characters directly, not use the HTML encodings - that would be ridiculous for anybody using anything other than English :)

diafol · Answer 4 · 2016-02-29T21:21:24+00:00

This validated:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Page Title</title>
</head>
<body>
АВДИ
</body>
</html>

???? = ABDN again
There was no problem. I didn't try with strict mode.

diafol · Answer 5 · 2016-03-02T21:43:02+00:00

OK, just read back the post again. You're using entity names (the ones starting with &...; like 'yacy'). Why are you using entity names instead of typing the actual characters directly? You can even use something as simple as charmap to do this.

BTW - I get 'yacy' to work and 'YAcy'.

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Page Title</title>
</head>
<body>
&yacy; &YAcy;
</body>
</html>

diafol · Answer 6 · 2016-03-02T21:54:58+00:00

Aha. Yes I get the error message if using XHTML or HTML4. Change to HTML5 if you can. I only just realised what it was that was causing the issue - doh! Does the document need to be validated as XML? If not use HTML5.
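
The likely reason is that the HTML 4 and XHTML 1.0 DTDs only declare the Latin-1, symbol and special-character entity sets, so a Cyrillic name like YAcy is simply undefined there, while HTML5 ships a much longer built-in list. A rough way to check which side a name falls on, sketched in Python with the standard tables (`name2codepoint` for the HTML 4 names, `html5` for the HTML5 list); `entity_support` and its return labels are hypothetical, chosen just for this example.

```python
from html.entities import html5, name2codepoint


def entity_support(name):
    """Report which standard entity table (if any) lists the given name."""
    if name in name2codepoint:
        # Declared by the HTML 4 entity sets, which XHTML 1.0 also uses.
        return "HTML 4 / XHTML 1.0"
    if name + ";" in html5:
        # Only in the larger HTML5 list; an XHTML 1.0 validator rejects it.
        return "HTML5 list only"
    return "unknown"


for name in ("nbsp", "copy", "YAcy", "yacy"):
    print(name, "->", entity_support(name))
```

Names reported as "HTML5 list only" are the ones that need numeric references (or the literal character) in an XHTML 1.0 page.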

MidiMagic 579 Nearly a Senior Poster · Answer 7 · 2016-03-02T23:03:57+00:00

My entire site is xhtml 1.0 strict as required by the webmaster for uniformity. It is the price I gladly pay for free hosting with unlimited storage.

I am still converting some of my old pages (not currently up) from html 4.0 (from when I had Geocities) to xhtml 1.0 so I can put them up (but doing it at my leisure, as new pages have my priority).

My native language is English. I do not have the ability to type in the character directly because I do not speak or write Russian. I put this one word in to explain a connection between the Bible and a historic event.

diafol · Answer 8 · 2016-03-03T00:10:13+00:00

Easy fix. Copy the text on the screen and paste it over the encodings.

MidiMagic 579 Nearly a Senior Poster · Answer 9 · 2016-03-17T00:37:36+00:00

I get little white rectangles when I paste the copied text. My editor can't accept the Cyrillic characters.
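
If the editor can only save ASCII, the numeric references can be generated rather than looked up by hand. A small sketch, assuming Python is available; `to_ascii_markup` is an illustrative name, while the `xmlcharrefreplace` error handler is a standard part of `str.encode`. The Cyrillic letter is written as a `\u` escape so the script itself stays plain ASCII.

```python
def to_ascii_markup(text):
    """Turn every non-ASCII character into a decimal numeric reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# "\u042f" is the Cyrillic capital letter YA (the "backward R").
print(to_ascii_markup("<p>\u042f</p>"))
# prints <p>&#1071;</p>
```

The output can be pasted into the page as-is, since it contains nothing the editor cannot display.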

diafol · Answer 10 · 2016-03-17T15:30:37+00:00

Ok, my last bit on this. Can't believe it's still on-going...

Why you can't just use charmap in Windows I don't know. You are creating a problem where one doesn't exist.
