
The title of this blog is an allusion to Coppola's Apocalypse Now, and eventually I'll be quoting a bit of the Herr-provided narration (those are the pieces Martin Sheen read)...

It all started with a seemingly innocent question the other day. It went something like this (product and component names removed to protect whatever might deserve protecting):

We are hitting an issue where surrogate pair characters do not display correctly on localized builds, but display correctly on English builds. This appears to be because the MS UI Gothic font used in the localized builds doesn’t “automatically” do the correct font linking.  (This can be verified by e.g. opening Wordpad, setting the font to MS UI Gothic, and typing some surrogate pair characters—you just get squares.  If the font is something else, e.g. Arial, the font linking works correctly.)

Is this a known issue with the MS UI Gothic font face? We are currently using one function to obtain the desired font face. Should we be calling a different function instead of this, or in addition to this?

Now as it turns out, there were several different issues going on here.

It starts with the involvement of GDI font linking and Uniscribe font fallback, discussed previously in blogs like Font Linking vs. Font Fallback.

First and foremost was the fact that this was what they call a tester scenario. Because of this, the actual supplementary CJK Extension B characters in question were not ones that are in any version of JIS (including the latest, JIS X 0213), which is why they were seeing notdef glyphs (aka square boxes).

Uniscribe largely stays out of the world of CJK (Chinese, Japanese, and Korean) text, allowing GDI font linking to do most of the work here. Usually this will guarantee that some ideograph will make an appearance, because as long as it is in one of those core CJK fonts, it will be on the screen.

But there is one time when Uniscribe is completely involved and GDI font linking is not -- and that is supplementary characters.
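For anyone who has not had to deal with them directly: supplementary characters are the code points above U+FFFF, which UTF-16 stores as a pair of surrogate code units. A minimal sketch in Python (my choice of language for illustration, nothing to do with the application in question) that splits the code points from this post's sign-off into their surrogate pairs; the helper name is made up:

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low

# The code points from this post's sign-off line.
CODE_POINTS = [0x20087, 0x20089, 0x200CC, 0x2099D, 0x215D7, 0x2298F, 0x241FE, 0x27FB7]

for cp in CODE_POINTS:
    high, low = to_surrogate_pair(cp)
    # Cross-check against Python's own UTF-16 encoder.
    assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    print(f"U+{cp:05X} -> {high:04X} {low:04X}")
```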

And Uniscribe is not quite as sophisticated in its efforts here -- it will see if the current font claims to support the Unicode supplementary ideographic plane (which contains e.g. CJK Extension B). If it does then the font will be used, even if there turn out to be some missing characters.

For the Japanese fonts, such as MS Gothic, MS PGothic, MS Mincho, MS PMincho, and Meiryo, each font is actually pretty much limited to the 300-some CJK Extension B characters in JIS X 0213.

If you pick one of these fonts to display any other random Extension B ideograph, then you will get a square box.

And if you pick a font with no Extension B support at all, then it will pick one font to look in, based on its algorithm and system locale settings -- thus if you choose Arial or Tahoma or Microsoft Sans Serif or Segoe UI, then you will possibly also get an ideograph!
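The selection behavior described above can be sketched as a toy model. This is only an illustration of the description in this post, not Uniscribe's actual code; the `Font` class, the font names, and the coverage sets are all hypothetical:

```python
from dataclasses import dataclass

EXT_B = range(0x20000, 0x2A6E0)  # the CJK Unified Ideographs Extension B block

@dataclass(frozen=True)
class Font:
    name: str
    claims_plane2: bool   # font says it supports the supplementary ideographic plane
    glyphs: frozenset     # code points it really has glyphs for

def font_for(code_point, current, fallback):
    """Return the name of the font that renders the glyph, or None for a notdef box."""
    # A font that claims the plane is used as-is, even when the glyph is missing.
    chosen = current if current.claims_plane2 else fallback
    return chosen.name if code_point in chosen.glyphs else None

# Hypothetical fonts; the coverage sets are made up for illustration.
partial = Font("PartialExtB", True, frozenset({0x2000B}))
latin = Font("LatinOnly", False, frozenset(range(0x20, 0x7F)))
broad = Font("BroadExtB", True, frozenset(EXT_B))

print(font_for(0x20087, partial, broad))  # claims the plane but lacks the glyph: box
print(font_for(0x20087, latin, broad))    # makes no claim, so the fallback font is used
```

The counterintuitive part is the first call: the font with a little Extension B support does worse than the font with none at all.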

Korean does not have Extension B in any of its fonts. Given the general tendency toward de-emphasis of Hanja in South Korea and the virtual illegality of it in North Korea, this is hardly a surprise (though this could change in the future if customer demand drives change here).

And for the most part Chinese has the widest support. Whether one uses the Simplified Chinese SimSun-ExtB font, the Taiwanese style Traditional Chinese fonts MingLiU-ExtB and PMingLiU-ExtB, or the Hong Kong style Traditional Chinese font MingLiU_HKSCS-ExtB, one has a much larger number of ideographs to choose from.

The ranges are of course based on preferred glyphs in the PRC GB18030, Taiwan CNS11643, and Hong Kong HKSCS standards, respectively -- kind of the ultimate exercise of using a code page as a repertoire fence (something I have discussed before).

But the bug did not quite end there.

You see, it seems that the application had its own custom font choosing behavior, which in this case happened to be preferring the newer ClearType Simplified Chinese Microsoft YaHei font.

A font that also has some Extension B in it.

Eight CJK Extension B ideographs, in fact (the eight listed at the end of this post).

So far, these eight characters as a set seem to have no special relationship in China, Taiwan, Hong Kong, Macao, Singapore, Japan, Korea, or Vietnam, those being the major places where ideographs either are in use or have been within the last 1000 years.

If the characters spelled something special, I'd assume it was some kind of Easter Egg in the font (imagine the challenge of coming up with such an egg that relied on eight Unicode characters displayed in code point order -- talk about a fun word challenge in any language!).

I am reminded of a bit from Apocalypse Now where Martin Sheen describes a report about Col Kurtz. Specially modified for the current situation, for the conspiracy theory minded:

Late Summer-Fall 2008:
The proper glyphs for ideographic text in the supplementary
planes show up fine in Vista. Then in November in one font
is noted the presence of eight specific ideographs. Two of
them are in JIS X 0213, three are from a list of Hong Kong
Cantonese, one is from some from China. The number of
Extension B ideographs visible in the application in China
drops off to nothing. Guess they must have picked the
wrong eight characters.

Kind of a stretch obviously. But still fun to write (had I time to really draw this one out it would have been as much fun in my opinion as that Matrix one!).

Whatever the reasons, their presence (due to the Uniscribe design here) can really break Extension B display support if someone is using the cool font with ClearType support.

If I had to guess, I'd wonder whether they were in there as part of an experimental effort at looking at ClearType Extension B support that just never got taken out (why would they? It's not like they are wrong, except in the meta sense of their effect!). But again that is just a guess. Probably more likely than my Apocalypse Font scenario above! ;-)

An interesting situation, in any case....

 

This blog brought to you by 𠂇 𠂉 𠃌 𠦝 𡗗 𢦏 𤇾 𧾷 (U+20087, U+20089, U+200cc, U+2099d, U+215d7, U+2298f, U+241fe, and U+27fb7)

This blog is as off topic as you can get without a prescription from your doctor....

Sometimes when one doesn't get the answer one wants, one can feel somewhat bitter about that fact.

Technical problems with computers can cause a person to be particularly susceptible to that kind of reaction, actually.

Though there can also be more to it, sometimes.

Case in point: a response to a blog from almost two years ago, Vista turns on everything, which explains how you can't turn off the Text Services Framework anymore, like you could in the old days of prior versions.

Admittedly not great news for people who wanted to turn it off for application compatibility reasons.

Anyway, the response that Luke sent on (with no return address):

Useless as ever. You are nothing but a fool. This post is less useful than a broken key. I come here wanting to learn how to turn off advanced text services, and you take up several paragraphs to say "You can't". Don't ever attempt at helping anyone you useless 9 year old.

There is something particularly hateful about these words that really gives me pause.

It could be simple frustration leading to an emotional over-reaction, one that the seemingly anonymous nature of the Internet only encourages.

And some of the words, such as "I come here wanting to learn how to turn off advanced text services" (though clearly the title doesn't even suggest that the blog is about Advanced Text Services at all), tend to suggest a man who found Vista turns on everything via a Google search (as I have to admit so many do).

And someone who found the blog by searching specifically for how to turn off TSF in Vista who read the whole post might get very frustrated about the "waste of time" and all.

And the conclusion of the comment (Don't ever attempt at helping anyone you useless 9 year old) certainly does display a certain amount of impatience and frustration. The kind that makes people lash out in perhaps strange ways that seem vaguely inappropriate.

Though there is something else in those words and others like "you are nothing but a fool", something that does not fit the picture -- it is not just the one blog that has Luke so unhappy with me. There really is something more going on here, running much deeper than a momentary frustration at not solving one single problem.

And then the initial bit of the comment (i.e. useless as ever) really doesn't seem to match here either, and makes no sense in the context of someone who had never been here before and (after the mistake of visiting the one time) would never visit again.

This is someone who doesn't like me, or maybe my online "persona", or maybe after having met me in person. Someone who just really finds no use for me whatsoever.

It's funny, I think that some of the people who hate me the most spend more time dissecting my words for inappropriate meanings to prove their beliefs than the people who are actually fans. This Luke may be one of them, one of the people who just really doesn't care for the taste of my brand of chai.

I used to talk with my friend Liz about this, and I have talked about it with Andrea too - in fact I've had this conversation with both of them long before I even had a Blog, nay before Blog was even a word. And both of them have pointed out that if I wanted to reverse everything I could, but that I speak with a very distinctive voice and would probably have a very hard time changing that since it mirrors the way I think about things.

I gave up three decades ago (in the third grade) trying to please everybody, and have never had cause to think I made the wrong decision back then.

The Blog is perhaps a megaphone, but not one that is changing what I say or how I say it all that much. I use it (and occasionally even abuse it!) in the same way that I would have done in any book or website or email or newsgroup post or presentation or conversation. I can name both people I have maddeningly frustrated and people I have ecstatically delighted. And I think I do serve a "net positive" purpose with what I do -- for myself, for my group, for Windows, for Microsoft, etc.

And of course you do have a choice here -- you could just not read me if you don't like me, or what I say, or both.

So Luke (or whatever your name actually is), if you want to come out from where you are hiding and tell me what your actual concerns with me are then I'd be happy to hear them or even discuss them. Or if you'd rather hide grudges or hatreds behind anonymous venomous messages then I suppose that is okay too.

Though the likelihood of having either influence or impact is much greater in the former approach than the latter. A friendly suggestion. :-)

 

This blog no sponsor, just as this sentence no verb.

Warning: Excessive PASS puns follow; if you don't like that sort of thing, then do not PASS...

It was one of those fun conversations that I find myself in from time to time.

Over IM, I was talking to my friend Rachel (a very smart ASP.NET developer I know who rather delightfully has a properly spelled last name Appel rather than those more fruity types of names out there, if you know what I mean).

We were just kind of PASSing the time.

I asked her (in PASSing) whether she was going to PASS (I was referring to the upcoming conference put on by the PASS, the Professional Association for SQL Server).

She told me that due to the dates in question, she was pretty sure she'd have to PASS this time.

I explained that I was given a PASS for PASS, working in the Ask the Experts area.

She couldn't PASS up the opportunity to ask, "When is PASS?"

The dates were, I explained, "The PASS Summit is November 18th-21st in Seattle."

"Oh, I'll definitely have to PASS, I have client work I have to do then."

"You're going to PASS on PASS?" I asked.

"I think i'll have to PASS on PASS, even if you could get me a PASS to PASS."

"Bummer. I probably shouldn't try to get you a PASS to PASS anyway. People might think it inappropriate -- like I was trying to make a PASS at you or something!"

"Nash, they know me. I could probably get my own PASS to PASS, if I didn't have to PASS on PASS. I do have that client work to do."

"Yeah, plus those boarding PASSes don't pay for themselves."

"That too."

"Okay, as excuses go it is a pretty good one. I'll let it PASS."

It went on a bit longer with PASS puns, though we never made it to the NONE SHALL PASS scene from Monty Python and the Holy Grail, though I suspect that was because I didn't think of it until later.

Though as a point of fact, unlike the black knight, way over 3000 people will PASS, to go to PASS -- and not PASS on PASS.

It is an awesome SQL Server conference and I promise that I have all of the PASS puns out of my system now. I promise. :-)

If you will be there, be sure to look for me on my iBOT with the I'm a PC stickers on each side....

 

This blog brought to you by ⎑ (U+2391, aka PASSIVE-PULL-DOWN-OUTPUT SYMBOL)

Microsoft tends to get criticized, no matter what they (by which I mean we) do.

They (by which I mean customers) hate that the default install leaves out the additional IME, keyboard, font, and code page files (ref: What isn't in the default install for NLS).

But of course someone else (customer again) would be disturbed when we (Microsoft) "fixed" this, concerned that The fonts directory is freaking huge in Vista.

And then of course other people (customers again) are installing more fonts than any human could really want and trying their best at blowing their font cache. Obviously if several hundred fonts is too much then several hundreds of hundreds of them would be stratospheric.

Anyway, when we (Microsoft) remove the instructions to install files conditionally since we (Microsoft) no longer need to install them, people (customers again) point out the problems of the smaller intl.inf.

And then just the other day, someone else noticed you couldn't install code page files anymore, and asked urgently:

...customer has an application that uses iso 8859-7  (WinXP and Win2000); after upgrading to Vista business he sees no option for elot-928 (iso 8859-7 ) and he is saying that the codepage has been removed and has been looking for it and ways to set it on Vista.

Ah, the concern was that someone noticed that the additional code page install option had been removed, and assumed this meant we had removed the code page.

I suppose we could have cluttered the user interface up with the note that you don't have to install the code pages because they are there, but I think the right call was made there.
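For the curious, here is a rough way to check this for yourself. It is a sketch in Python rather than anything official: on Windows it asks the Win32 IsValidCodePage function about 28597 (the Windows code page identifier for ISO 8859-7), and everywhere it round-trips a little Greek text through Python's own iso8859_7 codec, which is a separate implementation and says nothing about what Windows has installed. The helper names are made up:

```python
import sys

ISO_8859_7_CODE_PAGE = 28597  # Windows code page identifier for ISO 8859-7 (Greek)

def windows_code_page_is_valid(code_page):
    """Ask Windows whether a code page identifier is valid (Windows only)."""
    import ctypes
    return bool(ctypes.windll.kernel32.IsValidCodePage(code_page))

def greek_round_trip(text):
    """Encode text with Python's own ISO 8859-7 codec and check that it decodes back."""
    data = text.encode("iso8859_7")
    assert data.decode("iso8859_7") == text
    return data

if sys.platform == "win32":
    print("IsValidCodePage(28597):", windows_code_page_is_valid(ISO_8859_7_CODE_PAGE))

# Capital alpha and capital omega become one byte each.
print(greek_round_trip("\u0391\u03a9"))
```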

But it does seem challenging sometimes to do the work to address a customer concern....

The happy note is that for the application in question the behavior will be better. Though clearly this does not always impact customer behavior. :-)

 

This blog brought to you by 𧾷 (U+27fb7, a CJK Extension B ideograph that is not on any legacy code page)

Lest you have any doubts, I speak here for myself and only for myself, not for Microsoft or for any person, group, or division within Microsoft. This statement is so simple that anyone can get it, right?

Now although I work for Microsoft, for everything I am about to discuss I am just a user of the technologies, not someone who even knows who owns it or works on it. So for this particular blog within the Blog, think of me as an outsider....

At the end of last month I got a mail that many other people got as well. The mail went:



Dear MSN Groups Customer,

As a valued MSN Groups or MSN Communities Web Folders customer, we want to notify you that the MSN Groups service will close on February 21, 2009 and you will have the opportunity to move your group to our new partner service, Multiply. We understand the importance of keeping your group together, so we partnered with Multiply to create a migration process that moves your group to their service to preserve your online community and its history. Read on to find out about how to kick off the automatic migration of your group to Multiply.

We realise this may be unexpected, so before presenting your options we want to briefly share why we've made this decision.

Why?
Because we are dedicated to providing our customers with the most current and user friendly technology available today we made the difficult decision to close the MSN Groups service. This decision is part of an overall investment to update and re-align our online services with Windows Live. In the long term we believe that closing the service is the best way to continue to offer innovative and effective services that help you stay in touch with the people you care about. We plan to launch a new Groups service in the coming weeks, but unlike MSN Groups, Windows Live Groups will focus on offering a place for small groups to collaborate. Multiply is available now, making it your best option today for continuing to share and communicate together online.

Options for moving your group to a new service
We've listed some options and resources below to help you decide what to do with your group.

  • Option 1: Automatically move your group and its data. We have established a partnership with Multiply, an online group and media sharing service so our users can choose to migrate their group to Multiply's service. Choosing this option is free and easy to use: Multiply will move the Group's content on your behalf and invite members to re-join your group in its new location. To begin the migration click here.
  • Option 2: Start again on another service. You can start from scratch and create your group on a different service but we recommend having your Group moved automatically by Multiply. This will enable your Group to transition easily and continue to enjoy the community you have created.
  • Option 3: Start again on Windows Live Groups. To further expand our mix of communications and sharing services, Windows Live will launch a new service this autumn, Windows Live Groups. We plan to launch Windows Live Groups to the public in the coming weeks as a service that helps small groups or clubs collaborate online.

 

Options for MSN Communities Web Folders users
If you use save files to the MSN Communities web folders (also known as "My Web Sites on MSN" or the web folder "My Groups"), these services are part of MSN Groups and will therefore will also be closed on February 21, 2009. We recommend that if you store files online using MSN Communities web folders that you back up these files locally, then upload them to another online storage service such as Windows Live SkyDrive. For more details on how to find and move files saved to your web folders, visit the MSN Groups Resource Center.

Your Next Steps
We have sent this letter to each MSN Groups user, whether member or manager. If you are:

  • A member or user of MSN Groups: Check with your group manager to determine whether they plan to migrate the group.
  • A manager: Visit the MSN Groups Resource Center to learn more about your options and consider soliciting feedback from your group members about what they would prefer to do, when and how. The Resource Center also provides a sample splash page you can use to notify your members that the group will move. If you're ready to move the group now, click here.

 

What to Expect between now and the closing date
Between today and February 21, 2009 the MSN Groups service will remain the same as it is now. We will remove the option to add more storage to your group but other features will remain until the service is shut down and you can use it the same way you do today until the date of closure.

Where can I learn more?
You probably have more questions, and that's why we created a website to address them. Please visit the MSN Groups Resource Center at any time for the most up to date answers to common questions, information about migrating your group to Multiply, contact information for our support staff, and important dates.

Our support staff are equipped to answer your questions and guide you through issues that may arise as you decide what to do with your group. They are ready to help so don't hesitate to contact them at MSN Groups Customer Support with your questions.

We thank you for using our services and regret any inconvenience this may cause.

MSN Groups, Microsoft Corporation



Microsoft respects your privacy. To learn more, please read our online Privacy Statement.

Microsoft Corporation, One Microsoft Way, Redmond, WA 98052


Now the groups I belong to are pretty much limited to the VOLT and WEFT groups, and I don't own any groups myself.

The whole situation seemed eerily familiar, though.

It was several years ago, in the CompuServe forums.

Microsoft had a huge presence there -- for betas, for product support, for product insiders.

Suddenly, they were moving out -- everything was moving to the new (non-replicated) NNTP servers that Microsoft put up.

There was a new Microsoft provided newsgroup reader that was in Beta; it had a code name of Athena, I believe. Though the goddess would undoubtedly smite the folk who thought it was ready to handle the traffic in question, and the users in question (many of whom had never been in a newsgroup, some of whom had never been on the Internet outside of a closed client like CIS).

Several products in beta kept their CIS forums after people made a strong push to explain that this move could risk their product ship dates....

I remember a few months later talking to a product manager I knew who remarked how impressed he was at the level of sophistication of the questions being asked in the new Microsoft newsgroups, as compared with the old CIS forums. I had to break it to him by pointing out that the reason was that the move was so poorly done that most of the customers had gotten lost along the way.

An effective way to improve the sophistication of your audience, that.

Reminds me of an old joke:

A man takes his wife to the doctor because she is ill.

The doctor explains that he hasn't run all the tests yet and it will take him several days to do so. But in the meantime he has narrowed it down to either Alzheimer's Disease or AIDS.

The man is horrified. "What do I do until the test results come back?" he asks, fearfully.

The doctor responds: "That's simple. Take her to the mall and leave her there. If she comes home then don't sleep with her."

Now this joke is truly offensive, yet in its own way this is kind of what was done to a whole bunch of customers.

And given the differences between Athena/all later Microsoft newsgroup clients and the clients that were already out there, many issues with differences in the way the MS clients work still plague the newsgroups community to this day -- phenomena like fully quoting old posts by default, top posting, etc. Microsoft managed to make itself even less popular with a large group of people that really didn't like them much anyway, and they managed to lose a bunch of their own customers too. MVPs like me went from posting thousands of responses a month to low hundreds -- and if I skipped a month, I lost no sleep. And I was not unusual in this regard -- many regularly posting experts disappeared or massively decreased their support due to the real annoyances with the client software. And they never came back.

That product manager I mentioned figured that Microsoft should put up a white paper explaining how to get to the newsgroups, which prompted me to ask him where to put it up. "On the Internet!" he exclaimed, not even realizing the irony of the response....

Now as it turns out, the scuttlebutt of the CIS to newsgroups migration (reportedly) had to do with a limited time offer that Microsoft had to get out of its contract with CompuServe, which they jumped at even though their migration plans were not fully ready. It might be total fiction and I have no evidence that this is the case, but since it kind of explains all of the facts I am willing to take it as the most likely hypothesis of the many I heard.

What am I to make now of this new announcement that the MSN Groups are shutting down, and that what would otherwise be the most obvious intended replacement (Windows Live Groups) is not being provided a migration path like the Multiply option? What's up with that?

Now I am not a group owner, so I can't say whether they informed owners first or if they truly told everyone at the same time and told group members to ask your owners who may not have even heard about the plans. But I think this is probably just stupidity in the mail and not the plan -- I assume they sent an earlier mail to the owners and just didn't mention it in case people got offended that they were not given as much notice.

Primarily, I'm annoyed that they are doing all this before the replacement is ready -- it looks like the CIS thing all over again. And I suspect that a lot of people will get lost in the shuffle, either intentionally because they go somewhere else (perhaps Google Groups, as one person in the VOLT group joked -- I wonder if Google is going to add a migration plan of their own to pick up some of these folk) or unintentionally because they just got lost on the way somewhere.

With MSN Groups and Windows Live Groups apparently targeting two different audiences, and with no replacement coming from Microsoft for some of those who will now be disenfranchised unless they do go to some other company entirely (such as Multiply), this looks a lot more like Microsoft getting out of a market (one they themselves took advantage of given groups like VOLT and WEFT) without being willing to admit why (the non-specific "Because we are dedicated to providing our customers with the most current and user friendly technology available today" implies that Microsoft thinks itself unable to provide those customers with something current or user-friendly? Surely that is not the message they intended here?).

Given that Microsoft did this kind of thing before (in the forum to newsgroup debacle), the whole thing doesn't really even seem all that innovative, to me. More of a "same shit, different group" kind of thing. Or, since the CIS thing happened back in the 90s, a "same shit, different decade/century/millennium" kind of thing. :-(

 

This blog brought to you by 𒁁 (U+12041, aka CUNEIFORM SIGN BAD)

Regular readers might recall a long ago blog entitled New in Vista: What's your name? Who's your daddy?, which talked about the new name-based NLS API functions, intended to wean people off of their use of LCIDs. Because let's face it, LCIDs suck.

Anyway, it turns out that in one case at least, bugs suck more.

Maybe people recall the even earlier blog entitled New in Vista Beta 1: more use of the word 'linguistic', which described (among other things) the NORM_LINGUISTIC_CASING flag -- a flag to do proper casing for Turkic languages.

Turns out there is a problem getting these two features to work together properly....

The bug?

Well, take the following code in C# (it is a Win32 bug not a C# bug, but this lets us look at the managed case and the native one, which is sometimes relevant; plus in this case, more people can test it out themselves!):

using System;
using System.Globalization;
using System.Runtime.InteropServices;

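// Compile with /unsafe: the P/Invoke declarations below take char* parameters.
// 0x08000010 = NORM_LINGUISTIC_CASING (0x08000000) | LINGUISTIC_IGNORECASE (0x00000010).
// Native return values: 1 = CSTR_LESS_THAN, 2 = CSTR_EQUAL, 3 = CSTR_GREATER_THAN.
class  test {
    static unsafe void Main() {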
        Console.WriteLine("Turkish by name (native)");
        Console.WriteLine("131, \u0049 = " + CompareStringEx("tr-TR", 0x08000010, "\u0131", -1, "\u0049", -1, null, null, 0));
        Console.WriteLine("130, \u0069 = " + CompareStringEx("tr-TR", 0x08000010, "\u0130", -1, "\u0069", -1, null, null, 0));
        Console.WriteLine("\u0069, \u0049 = " + CompareStringEx("tr-TR", 0x08000010, "\u0069", -1, "\u0049", -1, null, null, 0));
        Console.WriteLine("English by name (native)");
        Console.WriteLine("131, \u0049 = " + CompareStringEx("en-US", 0x08000010, "\u0131", -1, "\u0049", -1, null, null, 0));
        Console.WriteLine("130, \u0069 = " + CompareStringEx("en-US", 0x08000010, "\u0130", -1, "\u0069", -1, null, null, 0));
        Console.WriteLine("\u0069, \u0049 = " + CompareStringEx("en-US", 0x08000010, "\u0069", -1, "\u0049", -1, null, null, 0));
        Console.WriteLine();
        Console.WriteLine("Turkish by LCID (native)");
        Console.WriteLine("131, \u0049 = " + CompareStringW(0x041f, 0x08000010, "\u0131", -1, "\u0049", -1));
        Console.WriteLine("130, \u0069 = " + CompareStringW(0x041f, 0x08000010, "\u0130", -1, "\u0069", -1));
        Console.WriteLine("\u0069, \u0049 = " + CompareStringW(0x041f, 0x08000010, "\u0069", -1, "\u0049", -1));
        Console.WriteLine("English by LCID (native)");
        Console.WriteLine("131, \u0049 = " + CompareStringW(0x0409, 0x08000010, "\u0131", -1, "\u0049", -1));
        Console.WriteLine("130, \u0069 = " + CompareStringW(0x0409, 0x08000010, "\u0130", -1, "\u0069", -1));
        Console.WriteLine("\u0069, \u0049 = " + CompareStringW(0x0409, 0x08000010, "\u0069", -1, "\u0049", -1));
        Console.WriteLine();
        Console.WriteLine("CultureInfo Turkey");
        CultureInfo ci;
        ci = new CultureInfo("tr-TR");
        Console.WriteLine(ci.CompareInfo.Name);
        Console.WriteLine(ci.CompareInfo.Compare("\u0131", "\u0049", CompareOptions.IgnoreCase));
        Console.WriteLine(ci.CompareInfo.Compare("\u0130", "\u0069", CompareOptions.IgnoreCase));
        Console.WriteLine(ci.CompareInfo.Compare("\u0069", "\u0049", CompareOptions.IgnoreCase));
        Console.WriteLine("CI en-US");
        ci = new CultureInfo("en-US");
        Console.WriteLine(ci.CompareInfo.Name);
        Console.WriteLine(ci.CompareInfo.Compare("\u0131", "\u0049", CompareOptions.IgnoreCase));
        Console.WriteLine(ci.CompareInfo.Compare("\u0130", "\u0069", CompareOptions.IgnoreCase));
        Console.WriteLine(ci.CompareInfo.Compare("\u0069", "\u0049", CompareOptions.IgnoreCase));
    }

    [DllImport("kernel32.dll",CharSet=CharSet.Unicode)]
    static unsafe extern int CompareStringEx(String strLocale, uint dwCmpFlags, String str1, int count1, string str2, int count2,
        char* version, char* reserved, int param );   

    [DllImport("kernel32.dll",CharSet=CharSet.Unicode)]
    static unsafe extern int CompareStringW(uint Locale, uint dwCmpFlags, string lpString1,
        int cchCount1, string lpString2, int cchCount2);
}

The results?

Turkish by name (native)
131, I = 3
130, i = 3
i, I = 2
English by name (native)
131, I = 3
130, i = 3
i, I = 2

Turkish by LCID (native)
131, I = 2
130, i = 2
i, I = 3
English by LCID (native)
131, I = 3
130, i = 3
i, I = 2

CultureInfo Turkey
tr-TR
0
0
1
CI en-US
en-US
1
1
0

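The results that are the bug (in red in the original post) are the three under "Turkish by name (native)", which come back identical to the English results instead of matching the "Turkish by LCID (native)" ones.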

Basically, the NORM_LINGUISTIC_CASING flag feature added in Vista does not work if you use the name-based NLS collation API functions added in Vista.
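If the Turkic casing rules themselves are unfamiliar, here is a toy model of what the flag is supposed to do. It is written in Python with made-up helper names and is in no way how the NLS functions are implemented; it only encodes the rule that in Turkic languages I (U+0049) pairs with ı (U+0131) and İ (U+0130) pairs with i (U+0069):

```python
def fold_case(text, turkic=False):
    """Fold case for comparison; with turkic=True apply the dotted/dotless i rules."""
    if turkic:
        # Turkic pairs: I (U+0049) <-> ı (U+0131) and İ (U+0130) <-> i (U+0069).
        text = text.replace("\u0130", "\u0069").replace("\u0049", "\u0131")
    return text.lower()

def equal_ignoring_case(a, b, turkic=False):
    """Compare two strings ignoring case under the chosen casing rules."""
    return fold_case(a, turkic) == fold_case(b, turkic)

# The same three pairs the C# test above compares.
PAIRS = [("\u0131", "\u0049"), ("\u0130", "\u0069"), ("\u0069", "\u0049")]

for turkic in (True, False):
    print("Turkic rules" if turkic else "Default rules")
    for a, b in PAIRS:
        print(f"  U+{ord(a):04X} vs U+{ord(b):04X}: {equal_ignoring_case(a, b, turkic)}")
```

With the Turkic rules the first two pairs compare equal and the third does not, which is the pattern the LCID-based Turkish results show; with the default rules it is the other way around.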

Not as bad as the whole "IsSortable() == false? Well, sometimes it may be lying...." situation, since in this case at least it was two different people.

However, that just lets the two people feel a little better; it doesn't really do anything for someone hit by the bug.

Thus on a scale of 1 to LAME, as mitigations go, this one is kinda lame. :-)

This is fixed in Windows 7, so I guess that's why we have new versions.

And I mentioned it here, so I guess that's why we have blogs. :-)


This post brought to you by İ (U+0130, a.k.a. LATIN CAPITAL LETTER I WITH DOT ABOVE)

Cheekheon asked via the Contact link:

Hi,

I have a NEC Desktop Computer running Windows XP Home Edition SP 2 with an ATI X300 (RV370) display card.

A few months ago, I did a security update from Microsoft and was advised to update my ATI display driver. So, I updated it accordingly to Version 8.6 (Display driver only).

However, the language of the system boot up display and the Windows Advanced Options Menu changed from English to another European language after the driver update.

The language is similarly changed when booting up a Linux LiveCD although the same LiveCD would display correctly in English when it is being booted up on another computer.

It may be pertinent to note that only the language of the system boot up display has changed.  Windows XP and Linux are still in English after the booting.  The BIOS Setup Utility is also still in English.

I have googled for a solution for the past few months without any success.

I would appreciate very much your expert opinion on what could be the problem and how to resolve it.  Thanks.

Regular readers may already know what to do here to get the situation fixed up....

I described the technique you use indirectly in

and more directly described the problem in

Now I have no idea why any kind of install would change this setting, but it is easy enough to change it back using this technique. :-)

 

This post brought to you by ꃆ (U+a0c6, a.k.a. YI SYLLABLE MUP)

So anyway, Kim's other recent blog, entitled Making a StreamWriter usable even after given garbage characters, highlights an interesting difference in methodology between the way that Windows and .NET handle encodings and code pages.

In Windows (in contrast to the behavior of most NLS API functions, as I have mentioned previously), the WideCharToMultiByte and MultiByteToWideChar functions will use the target buffer up until the point of failure, so that in the case of failure you may be able to do something with the partial results.

Now without a length indication the options of what can be done are more limited, but if nothing else then at least subsequent calls will not be affected by their predecessors.

.NET, on the other hand, has a default behavior here: a failed write to the stream leaves the StreamWriter useless.

The description in Kim's blog did not fully explain the problem, so I'll fill in the blanks. :-)

She said:

For example, on an attempt to write U+DFC9, which is only half of a Unicode character (not a complete surrogate pair) an EncoderFallbackException was thrown

Now we have a stream here, so why is the story over? Isn't the point of the stream thing that you can do it in chunks? Why would this be unrecoverable?

Well, the problem is that U+dfc9 is a low surrogate.

See The basics of supplementary for a glossary update here!

As I mention in Why do the high surrogates have the low numbers? and other places, a surrogate pair is a high surrogate followed by a low surrogate.

A lone high surrogate is recoverable because it is incomplete.

But a lone low surrogate with no preceding high surrogate has no place to go, nothing to do -- it is toast unless you have a fallback plan in place, as Kim mentioned.

Though to be perfectly honest, after situations like that described in The torrents of U+fffd, I would much rather have had the default fallback plan be the U+fffd insertion.

I'm not a fan of the whole U+fffd thing, as I pointed out many times before. But given the huge push to change behavior from "drop illegal sequences" to "replace illegal sequences with the replacement character", I think behavior that did not throw in this case would have made for a better default....

And yes, I know there is a backcompat question here for the behavior, but since behavior was being changed anyway in this "in a service pack" change, there was a good opportunity to take a hard look at changing that default (since even already compiled applications were going to change their behavior!).
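For anyone who wants to poke at the high/low distinction and the two kinds of fallback directly, here is a minimal sketch. The class and method names are made up for illustration, and both encoders are constructed explicitly rather than taken from whatever a StreamWriter happens to use by default, so treat it as an approximation of the situation and not a reproduction of Kim's exact case.

```csharp
using System;
using System.Text;

static class LoneSurrogateSketch
{
    // Reports which half of a surrogate pair (if any) a UTF-16 code unit is.
    public static string Classify(char c)
    {
        if (char.IsHighSurrogate(c)) return "high";  // U+D800..U+DBFF
        if (char.IsLowSurrogate(c)) return "low";    // U+DC00..U+DFFF
        return "not a surrogate";
    }

    // Returns true if a strict UTF-8 encoder (exception fallback) rejects the text.
    public static bool StrictEncoderThrows(string text)
    {
        // The second argument (throwOnInvalidBytes) selects the exception fallback.
        UTF8Encoding strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetBytes(text);
            return false;
        }
        catch (EncoderFallbackException)
        {
            return true;
        }
    }

    // Encodes with U+FFFD substituted for anything that cannot be encoded.
    public static byte[] EncodeWithReplacement(string text)
    {
        Encoding lenient = Encoding.GetEncoding(
            "utf-8",
            new EncoderReplacementFallback("\uFFFD"),
            new DecoderReplacementFallback("\uFFFD"));
        return lenient.GetBytes(text);
    }

    static void Main()
    {
        string text = "a\uDFC9b";  // a lone low surrogate between two letters
        Console.WriteLine(Classify('\uDFC9'));
        Console.WriteLine(StrictEncoderThrows(text));
        Console.WriteLine(BitConverter.ToString(EncodeWithReplacement(text)));
    }
}
```

The strict encoder gives up on the lone low surrogate, while the replacement encoder carries on and emits the UTF-8 bytes for U+fffd in its place, which is the "did not throw" default I was wishing for above.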

 

This post brought to you by (U+fffd, a.k.a. REPLACEMENT CHARACTER)

People have been misusing the word neutral in the whole area of internationalization of Microsoft products for quite some time now, a fact that I have discussed previously in blogs like Neutral? I do not think that word means what you think it means! and How ConvertDefaultLocale sorta broke backward compatibility in Vista, and why and Using full locales rather than the neutral ones? and Behold the Table Driven Text Service, Part 5 (All about the language, baby!).

As these various technologies mentioned in the above blogs tortuously abuse the nature of the word neutral over and over again, one is left wondering how one could remain neutral about the meaning of neutral given how much the meanings conflict!

Though to be honest, there is a competition for most over-used term in this area, and the other contender is the word default, which has been misused/overused in a lot of situations too, as I previously discussed in blogs like The ever-misleadingly incorrect usage of the word DEFAULT and Neutral? I do not think that word means what you think it means! again, and so on (I could cite more but it would just depress me if I did).

And of course the term locale has to be thrown in there too -- has any term been overused as much as it, really? And misused when language was what was really needed? And don't even get me started on culture here. From The ever-misleadingly incorrect usage of the word DEFAULT again and so on, locale and the other two are way overused, and just as often misused as the rest (I'd add more cites but I might collapse in moral despair if I did!).

Then yesterday, my BCL friend and fave and fan Kim Hamilton wrote in her blog about What does the NeutralResourcesLanguageAttribute do?, and managed to avoid most of the difficulties in the above and in what managed code brings in to further confuse the issue. As she mentions:

NeutralResourcesLanguageAttribute marks the neutral culture for an assembly. That sounds self-referential, but a full description would require another blog post. To avoid getting bogged down, think of neutral culture roughly as the default language. (Fingers crossed that Michael Kaplan doesn't flame me for that oversimplification.)

Well Kim won't need to wear the flame retardant britches to lunch tomorrow, though I may enlist her at some point to help flame the people who made it so complicated and weird and terminologically retarded in the first place.

The confusion here obviously predates her attempt to explain things, and at least this simplification is self-consciously aware that it is walking on eggshells.

The fault, such as it is, lies in the shoulders she is standing on here. :-)

Do you notice how .NET now has yet another meaning for neutral that they threw in the ring -- they are using the term neutral to refer to their ultimate fallback (something I have talked about a bit before with the managed/native differences thereof in Random irreverent thoughts about the Ultimate Fallback)?

That would have been a good [over-]use of the word default, to be honest, especially given the strong meaning that neutral has in .NET when it comes to neutral cultures, which are heavily used in resource loading.
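To see the two meanings of neutral sitting side by side, here is a small sketch. The class and method names are hypothetical and the "en-US" value is an arbitrary choice for the example; the attribute and the IsNeutralCulture property are the real ones.

```csharp
using System;
using System.Globalization;
using System.Reflection;
using System.Resources;

// Marks the culture of the resources embedded in the main assembly,
// i.e. the ultimate fallback. "en-US" is an arbitrary choice for this sketch.
[assembly: NeutralResourcesLanguage("en-US")]

static class NeutralSketch
{
    // Reads back the culture name declared by the assembly-level attribute.
    public static string DeclaredFallbackName()
    {
        NeutralResourcesLanguageAttribute attr =
            typeof(NeutralSketch).Assembly
                .GetCustomAttribute<NeutralResourcesLanguageAttribute>();
        return attr == null ? null : attr.CultureName;
    }

    // True for language-only cultures such as "en", false for "en-US".
    public static bool IsNeutral(string cultureName)
    {
        return CultureInfo.GetCultureInfo(cultureName).IsNeutralCulture;
    }

    static void Main()
    {
        string fallback = DeclaredFallbackName();
        Console.WriteLine("Neutral resources language: " + fallback);
        Console.WriteLine("Is that a neutral culture?  " + IsNeutral(fallback));
        Console.WriteLine("Is \"en\" a neutral culture?  " + IsNeutral("en"));
    }
}
```

So the "neutral resources language" can quite happily be a culture that the very same framework says is not neutral, which is rather the point.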

When you consider that the .NET Framework was once so paranoid about using the same term to mean two different things that they introduced Ordinal for binary comparisons, for fear that people would confuse it with the other use of binary in .NET (binary serialization/formatting), the fact that they would overload the word neutral in the same technology does suggest a small lapse in judgment somewhere.

Were the consistent terminology nuts over there napping that day? :-)

Consider the confusing legacy of the term ordinal which, while more consistently used than all of the above terms, has the disadvantage that all of them have the potential to be intuitive, something that ordinal will never have going for it.

They could have called it binary and binaryignorecase and been intuitive!
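For anyone who has not run into them, the ordinal comparisons look something like the sketch below. The class and helper names are mine, invented for illustration; the StringComparison members are the real ones.

```csharp
using System;

static class OrdinalSketch
{
    // Code unit by code unit comparison, no linguistic rules at all.
    public static bool EqualOrdinal(string a, string b)
    {
        return string.Equals(a, b, StringComparison.Ordinal);
    }

    // Same idea, but after culture-independent case folding.
    public static bool EqualOrdinalIgnoreCase(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        Console.WriteLine(EqualOrdinal("i", "I"));                 // different code units
        Console.WriteLine(EqualOrdinalIgnoreCase("i", "I"));       // differ only by case
        Console.WriteLine(EqualOrdinalIgnoreCase("\u0130", "i"));  // no Turkish rules here
    }
}
```

Note that the ordinal ignore-case comparison never applies the Turkish casing rules, so U+0130 and i stay distinct no matter what the current culture is.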

 

This blog brought to you by ⫻ (U+2afb, aka TRIPLE SOLIDUS BINARY RELATION, which is slanted even though not italicized)

Now if you look at all of the following blogs:

The real issue we are talking about (once everyone stops complaining, which can take a while!) is the problem I explain in Who owns English, exactly?.

Of course if we "owned" English (assuming "we" could define who "we" are in this case!), then wouldn't we take all of the following and more:

Angielski
anglais
Anglè
Anglès
angleščina
Anglické
Angličtina
Anglis
anglisli
anglizča
Anglu
anglų
Angol
Engels
Engelsk
engelska
englanti
Engleski
Englezã
Englisch
English
English Hol
Ingelesa
Inggeris
Inggris
İngilis dili
ingilizçä
İngilizce
ingles
Inglés
Inglês
Inglese
inglise keel
TiếngAnh
Αγγλικά
англiйская
англизча
Английски
Английский
Англис
англисū
Англиски
Англия
Англійська
Енглески
Инҝилис дили
Անգլերեն
אנגלית
ענגליש
الإنكليزية
انگليسي آمريكايي
अंग्रेज़ी
ஆங்கிலம்
ภาษาอังกฤษ
ინგლისური
영어
英語
英语

and have opinions about the way English is spelled in other languages?

Perhaps since no one in Great Britain or Australia or Canada or the USA is dictating how the items in this list are to look, people should not spend so much time trying to tell people how their language is to be spelled in English? :-)

You might be living in Iran, and bothered by the English word Farsi.

Or perhaps you are living in the Xinjiang Uyghur Autonomous Region of China and bothered by the English word Uighur.

And so on.

But it might be a good idea to take a deep breath and relax....

 

This blog brought to you by E (U+0045, aka LATIN CAPITAL LETTER E)

Via the Contact link, Alain asked:

Hello Michael,

I ask you about a problem I searched on the net all morning and get no response.

We work à UNESCO (Paris/France) on a multi-lingual database (SQL Server 2005). We actually add Arabic to a English/French/Spanish/Russian thesaurus.

We have Arab people at Alexandria test our application and they complained about not getting response when searching in Arabic with letters having not the same diacritics (e.g. Alif with and without Hamzah).

We use SQL_Latin1_General_CP1_CI_AI, but I tried with Latin1_General_CI_AI and Arabic_CI_AI and got the same result.

My questions : is there a way to add my own collation to a SLQ 2005 server. Or is there a collation just ignoring *all* diacritics for every UNICODE character ? And why does Arabic_CI_AI  not ignore Hamzah on Alif ?

I wonder if I am the only guy around the world searching Arabic text on a SQL Server database. I am not an Arabic reader not speaker, but it seems the the requirment is very basic for Arabic...

Thank you and, please, forgive my poor English

Very good bunch of issues in there that all deserve some coverage! :-)

Starting with the easiest part: SQL collations are terrible and essentially useless in most cases. My words in SQL Server: compatibility collations vs. Window collations are probably the best answer here to explain why not to use them. It is just that given how awful the SQL compatibility collations are for text outside of English, they are pretty much only the default in the US (otherwise SQL Compatibility collations are a bit too retro because Unicode and SQL Collations have nothing to do with each other).

So less than ideal results there are kind of par for the course....

Then there is the fact that Latin1_General_CI_AI and Arabic_CI_AI return the same results. This is actually also expected since both collations use the default table and the only difference between them in SQL Server is how they have different code pages attached to them for non-Unicode columns (1252 for the one, 1256 for the other).

Therefore, this too is expected.

Ok, enough stalling -- let's get to the actual issue -- the incorrect results!

This is a longstanding bug that I have previously described in Is it punctuation, symbol, or diacritic?, which explains the nature of the problem and describes how in some cases NORM_IGNORESYMBOLS will help here when one is dealing with Windows 2000, XP, or Server 2003.

Unfortunately there is no way to set this flag in SQL Server, so in the end there is no collation setting to work around the bug in SQL Server 7.0, 2000, or 2005.

However, Is it punctuation, symbol, or diacritic? explains how Vista and Server 2008 actually fix this longstanding issue, and the cost of fixing eight separate problems with Arabic script collations was just one bug, in Persian (ref: Hello Madda, Hello Father (Iranian style)).

And how does SQL Server get this fix?

Ah, for that you can find the answer in On changing the world, or at least the way people order things in it, which explains that SQL Server 2008 has the absolute latest version of the tables to date when SQL Server shipped, and thus has the fix for this bug in it.

There is, however, no downlevel fix for this problem that has really been around in Windows for as long as Arabic support has been in the product and in SQL Server for as long as Arabic support has been in that product.

Custom collations or any way to modify collations? That is a feature that does not exist in either Windows or SQL Server....
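What an application can do on its own side of the fence is fold the diacritics away before comparing. The sketch below is only an assumption about how one might go about that (it is nothing SQL Server or Windows collation provides, and the class and method names are invented): it decomposes the text and drops the non-spacing marks, which turns Alif with Hamzah or Maddah into a plain Alif.

```csharp
using System;
using System.Globalization;
using System.Text;

static class DiacriticFoldSketch
{
    // Decomposes the text (NFD), drops every non-spacing mark, and recomposes.
    public static string StripNonSpacingMarks(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    static void Main()
    {
        // U+0623 ALEF WITH HAMZA ABOVE, U+0625 ALEF WITH HAMZA BELOW,
        // U+0622 ALEF WITH MADDA ABOVE all fold to U+0627 ALEF.
        foreach (string s in new[] { "\u0623", "\u0625", "\u0622" })
        {
            Console.WriteLine(StripNonSpacingMarks(s) == "\u0627");
        }
    }
}
```

It is a blunt instrument: it strips the marks in every script, not just Arabic, and in some languages that changes the meaning of the word. A folded copy of the text kept for searching only would be the place for something like this, if anywhere.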


This blog brought to you by ب (U+0628, aka ARABIC LETTER BEH)

 In response to About the Fonts folder in Windows, Part 3 (aka What changes in Vista?), Shaun asked in a comment:

I unzipped a large number of font folders into my Windows/Fonts folder and now the unzipped folders are not showing up… my Fonts folder is only showing about 4,500 fonts and there are 65,000 fonts in there somewhere but they can’t be viewed and they’re not installed, just sucking up space and invisible. My “show hidden folders” option is enabled in Folder Options, and I can see the folders when I go into “Install Fonts”, but I can’t delete them!

Any ideas on how to access these folders that are obviously there, but unaccesible?

This comment took me back.

Way back, in fact.

To the Spring and Summer of 2000.

I was in the final stages of a book.

This book:

Internationalization with Visual Basic

Now most of the production was done in Microsoft Word for Windows, and the machines they were running it on were almost all running Microsoft Windows NT 4.0.

And the folks doing the production work were having problems.

It seemed like every chapter would have some characters missing. They would exit programs, log off, and reboot. The symptoms changed each time as the exact characters missing would vary, but invariably something would go wrong.

They were desperate.

Folks were getting more frantic in Indiana (where Sams was located), and the stress was being transferred to Redmond (where I was).

So we had a nice long conversation where we went through the issue with the fragile font cache in NT 4.0 and how easy it was to blow by having too many fonts. And the large number of fonts that the book needed were more than enough to blow the font cache, and blow it huge.

In Windows 2000, a huge push to fix these problems was very successful, but switching the production machines was just not an option -- the IT staff was just not set up to do this. Even if they were, the results in Word were not the same between NT 4.0 and Windows 2000, and I was working on NT 4.0 for the book. Moving to a new platform would mean major reformatting work for the whole book. So if it seemed like there was a competition between them and me to decide who would drag their heels the most on the idea of a Win2000 upgrade, then the appearance probably wasn't far from reality.

So my suggestion was to strip down the fonts to the absolute bare minimum, then add just the necessary fonts for each chapter and take them off the machine when done. And reboot in between each chapter, just in case.

It mostly worked just fine (I say mostly because there were some problems in Chapter 3 that were not caught prior to print), and I swore to move all of my machines to Windows 2000 as soon as the book was completely done.

Now like I said the problem was largely fixed in Windows 2000.

But just because things don't blow up as easily as they did in NT 4.0 doesn't mean that GDI and the Fonts folder are prepared to scale beyond 65,000 -- or even 4,500 -- fonts! :-(

In the words of the folks from In Living Color, Homey don't play that.

But the deletion of the extra fonts is easy enough via an elevated CMD prompt, which should allow the deletion to happen for all of the extraneous font files.

Obviously the situation with my book was what I at the time thought of as quite an extreme case of a machine being overwhelmed by the raw power of typography, but all things considered I am pretty sure that 65,000 fonts would probably top that on any version of Windows (as the folder itself obviously has its own problems scaling to that quantity, beyond whatever problems the underlying infrastructure hits!).

What are the scenarios in which one would really need 65,000 fonts installed?

How many fonts do you have on your machine, and in what version of Windows?

When I think about Windows 7 and Long Zheng's Improvements to fonts in Windows 7 over in I Started Something, I can't help wondering if the new Fonts folder in Windows 7 will scale up to 65,000 fonts. I realized three things:

  • The mere fact that I ask the question here does not really make it a meaningful one, and
  • The mere fact that I ask the question here probably does mean someone will have to try it out if they aren't doing so already -- to have a description of the behavior in the future KB article, if nothing else, and
  • The behavior between now in the Windows 7 PDC build and in the final release of Windows 7 might be improved if this scenario wasn't already being tested....
There is a class of bug that if someone finds it you kind of have to do something about it; this scenario might qualify. So I apologize to whoever is tasked to look further into things. :-)

 

This blog brought to you by T (U+0054, aka LATIN CAPITAL LETTER T)