Worldware Conference Summary – Not as Good as Being There

In March I attended and presented at the first Worldware Conference, which took place in Santa Clara, California in the heart of Silicon Valley. I became really excited about this conference as it proved to be the first to directly target business issues around software internationalization and globalization. Too often in other conferences, the focus is very low level on technical issues, while missing greater business planning and operational issues that affect every organization that looks to build and maintain world-ready products. In fact, that issue had been a long running annoyance for me when attending conferences like Unicode and LocalizationWorld. So I was eager to get involved in Worldware and sat on its board as well.

The conference had outstanding material, and featured various business leaders from well known world software brands. The downside was that the conference was not particularly well attended. There were probably a total of about 70 people there, including speakers, but at least we all got to know one another. Presentations featured executives from companies like EMC, Microsoft, Linden Labs, Oracle, Mozilla, Sun, Adobe, Yahoo!, Intel, various industry consultants and of course me.

Here are a few items from my notes and memory, in no particular order:

  • Don Depalma, of CommonSense Advisory, had some excellent data showing return on investment and overwhelming customer preference for software that was internationalized with locale-sensitive language and formatting support. His numbers were the Holy Grail that managers have been asking for. A big point was that even when end users are perfectly capable of reading, writing and speaking English, they vastly preferred software in their own language, to the point where they made choices and spent more in line with that preference. Don even had data broken down per country. I can’t wait to poach some of these slides.
  • Common points were that i18n is an enabler for localization and ultimately revenues. A way to waste a ton of money is to pursue localization before you’ve properly internationalized.
  • Organizations like Mozilla and Linden Labs are making great use of crowdsourcing to enable new features and localization. So if you have a product which has an emotional type of rabid following, crowdsourcing is a relatively new form of getting help, though it needs its own adaptation for management.
  • Some companies, like EMC, must simultaneously ship for all top tier locales when releasing new products. So globalization isn’t an afterthought.
  • Executives don’t understand internationalization but understand the cascading effect.
  • Invest in internationalization expertise. Too expensive to “wing” it.
  • Empower product teams
  • Create i18n boot camp training
  • Some companies demonstrated that they have built whole organizational frameworks to support internationalization. Particularly Intel and Yahoo! presented how they are using technologies for automatically auditing global readiness. Happy to say Globalyzer got many accolades.
  • There was a lively discussion of Agile (an extremely popular development methodology) as it relates to internationalization. If i18n is built into product development from the start, Agile works great. When Agile cycles and i18n work on existing code are going on simultaneously, the two efforts are very unlikely to synchronize well. There are lots of reasons for this, which would probably make a great future article for this newsletter. This issue came up multiple times, and Tony Jewteshenko gave a whole presentation session on it (but I wasn’t able to attend that one).
  • It’s extremely difficult to take back a language after you release for a particular market. So consider that request for your software in Klingon carefully.
  • How you communicate around the world will empower your organization:
      • Brand recognition
      • Market Share
      • ROI
  • I presented along with Daniel Goldschmidt on how to get an i18n effort going:
      • Technical buyer vs. Management objectives
      • Need to get a good plan for budget approval first, design second
      • Showed Globalyzer 3.0 and scanned some open source code
      • Demonstrated a project plan
      • Daniel broke down i18n projects into a 3 phase approach:
          • Transportation – moving data from A to B
          • Application – doing something with the data (e.g. sorting)
          • User Interfaces
      • Then we both talked about keeping software world-ready and answered questions
  • Kamal Monsour of Monotype Imaging gave a most informative presentation showing intricacies of digital fonts in languages like Arabic and Hindi.
  • I was on a panel along with Ed Watts of Oracle and Mike McKenna from Yahoo! on Assessing and Quantifying efforts. Ed emphasized the role of pseudo-localization. Mike was his usual incredible reservoir of information and experiences both organizationally and on the technical side in supporting i18n. I talked about how we essentially have had to learn to estimate and execute internationalization projects and still make a profit, and that’s why we’ve created tools and methodologies to do so.
  • Aaron Marcus of Aaron Marcus and Associates gave a presentation on cross cultural user-experience design showing many cultural differences, certain scales by which cultures accept power hierarchies and how that shows up in site design.
  • Mike McKenna showed a fabulous presentation on trends in internationalizing which featured several i18n initiatives at Yahoo! As a bonus, I got a Fight Mojibake sticker (ghost characters), which is now on my notebook. In particular, they work to get people enthusiastic and understanding that they are creating products for the world. He also talked about how his team supports i18n with tools like Globalyzer. Thanks Mike.
  • Barbara Burbach of Cisco talked about staffing models, including outsourcing for i18n and l10n. She felt i18n outsourcing for an existing product was a good idea, as it keeps the core development team focused on new features. For new products being internationalized from the beginning, she preferred in house engineering.
  • Tex Texin (i18n Guy) discussed how he has worked with various teams to promote internationalization, and how decisions were often affected. He also gave Globalyzer a nice recommendation. Tex was formerly in charge of internationalization at Yahoo! and NetApp, both of which are Lingoport customers. Thanks Tex.

I’ve missed a ton in this quick summary, as I haven’t managed to master being in two places at once and couldn’t have attended all the sessions.

Enterprise Internationalization and Automation

There are some technology companies where thinking globally has been fundamental to their operations for years and years. I’m referring to companies like IBM, HP, Yahoo, Google and the like. These companies all made significant investments in their global infrastructure, sales teams, products, development and strategic planning. It didn’t happen by accident. And as these companies develop new products or acquire companies, they look to leverage them across that global infrastructure quickly and profitably. Global companies are good prospects for my company in our internationalization products and services business, because they tend to be more experienced in their understanding of engineering challenges, knowing that it takes people, tools, time and money to globalize software so that they can gain the best return on their product distribution and sales infrastructure.

One very potent way to make software globalization fundamental to a company’s mindset is to make internationalization a fully integrated and automated part of software development practices. There are all kinds of tools, checkers and environments to help developers create interfaces, access and transform all kinds of information buried in databases, support coding constructs, manage memory and perform application modeling. With that in mind, we’ve been hard at work with a major new Globalyzer release, clearly aimed at supporting entire development departments and enterprises, automatically using batch processes on servers to monitor internationalization progress as well as on the desktop where issues can be individually examined and fixed. While that has always been our aim, we’re now getting there in more robust ways that track internationalization status over time over multiple programming languages and even over multiple products.

For those non-developers reading this, let me explain what I mean about automation in this context. When engineers create code, they generally all submit their work to a code repository. This repository provides version control so that when multiple engineers are all working together, they can check code in and out and merge together all their changes. Then the code has to be put together and built. This build process usually occurs on some interval, such as nightly or even on a continual basis. During this automated process, you can also automatically check for many other issues like performance, load balancing, and I’m proposing that this is a great time to check on internationalization/localization readiness by running tools on the code automatically as a batch process, which then tracks issues via reports. Now counting issues is one thing, but you can go even further by showing exactly where a problem exists in the code, along with the context of the errant issue. That information can then be brought forward for quick review and fixing.
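For the developers reading this, here is a minimal sketch of what such a batch check could look like. It is not how Globalyzer works internally; the two regular expressions, the `Issue` record and the `scan_source` and `summarize` names are all assumptions made up for illustration, and a real scanner would need to understand each programming language properly.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately naive rules. A real scanner would parse the language.
STRING_LITERAL = re.compile(r'"[^"\n]*[A-Za-z]{2,}[^"\n]*"')
CONCATENATION = re.compile(r'"[^"\n]*"\s*\+|\+\s*"[^"\n]*"')


@dataclass
class Issue:
    path: str
    line_number: int
    kind: str
    context: str  # the offending line, kept so a reviewer can see it in context


def scan_source(path, text):
    """Return one Issue per suspicious line in a single source file."""
    issues = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if CONCATENATION.search(line):
            issues.append(Issue(path, line_number, "string-concatenation", line.strip()))
        elif STRING_LITERAL.search(line):
            issues.append(Issue(path, line_number, "hard-coded-string", line.strip()))
    return issues


def summarize(issues):
    """Count issues per kind, the number a nightly build could track over time."""
    counts = {}
    for issue in issues:
        counts[issue.kind] = counts.get(issue.kind, 0) + 1
    return counts


# Made-up sample input standing in for a file pulled from the repository.
SAMPLE = '''label.setText("Save changes");
message = "Hello, " + userName;
total = price * quantity;
'''

if __name__ == "__main__":
    found = scan_source("Demo.java", SAMPLE)
    for issue in found:
        print(f"{issue.path}:{issue.line_number} [{issue.kind}] {issue.context}")
    print(summarize(found))
```

Run from a nightly build, the counts give you the trend to report on, while the file, line number and context give an engineer somewhere specific to start fixing.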

Two companies that come to mind as doing this very thing are Intel and Yahoo. Michael Kuperstein of Intel, presenting at the WorldWare Conference in March, reported how his team developed their own internationalization toolkit a few years ago and has integrated it into many of their automated build processes. That automation has made internationalization an important and measured component of their ongoing development efforts. By Mike’s own admission, he would have used Globalyzer had he known about it years ago.

Mike McKenna of Yahoo also reported at WorldWare that his globalization team is using automation, in this case Globalyzer, to measure internationalization benchmarks on development teams.

On the localization product side, there are multiple tools for different aspects of managing words. But when it comes to products which support an enterprise in their software internationalization efforts, there is a pretty empty playing field. Aside from some very simple string externalization utilities in a few development environments and frameworks, our Globalyzer is simply the only commercial software I know of that can automatically monitor development over time over a wide range of programming languages, while also stepping entire teams through internationalization fixes in large amounts of code.

I’ve said a few times in my columns that I’ve found it quite powerful to embrace the management principle that whatever gets measured gets done and improved over time. So it follows that one of the most important aspects of any software development undertaking is that you measure the desired outcome at regular intervals. If you just hope that it will all come together in the end, you always end up late and over budget. That is ultimately behind the agile and extreme programming development movements, in that you set more frequent intervals of measurement and goals. But it’s not so easy to track something like internationalization, either as a project where you are refactoring software for new globalization requirements or even for ongoing development. Consider that developers are typically overtasked, and often distributed across time zones and continents. Then factor in that internationalization can be quite subjective to a particular development task. Plus internationalization is a fuzzy thing, in that it is tailored to requirements, technologies and special cases. So development teams grapple with how to handle it, and make their way through the task by brute force – or simply postpone or avoid internationalization whenever possible. Issues get missed, and if you’re lucky, you have an iterative process during localization to fix internationalization bugs, which is a very expensive and time-consuming path. Or worse, development ignores the issues and calls it a localization problem.

I spoke with a company in just that situation last week. They were upset with their localization provider for poor quality, but when we examined some of the issues, there were also extensive internationalization mistakes that were sure to break localization context and execution. These included missed strings and extensive string concatenations. Had they been monitoring these efforts all along, and been clearer on internationalization requirements, they would have had better results and a clean release. The biggest costs to them were poor market entry, customer dissatisfaction and complaints from their distributors and sales teams, which had to overcome a poorly localized release. Now I also feel that as vendors we have some responsibility in taking care of clients and not selling them a solution that risks poor quality and a weak market entry, so some blame also goes to the localization provider. But I hardly know what really happened; I was just there to offer help in picking up the pieces. Clearly that’s an expensive route in many ways.
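To show why string concatenation breaks localization context, here is a small sketch. The message catalog, the `render` helper and the German wording are illustrative assumptions, not anything taken from that client’s code.

```python
# Hypothetical message catalog: each locale gets one complete sentence,
# with named placeholders the translator is free to move around.
TEMPLATES = {
    "en": "{user} sent you {count} files",
    "de": "{user} hat Ihnen {count} Dateien gesendet",  # the verb moves to the end
}


def render(locale, **values):
    """Fill in the whole-sentence template for a locale, falling back to English."""
    template = TEMPLATES.get(locale, TEMPLATES["en"])
    return template.format(**values)


def render_by_concatenation(user, count):
    """Anti-pattern: the English word order is frozen into the code itself.

    A translator only ever sees the fragments " sent you " and " files",
    and has no way to place a verb after the count.
    """
    return user + " sent you " + str(count) + " files"


if __name__ == "__main__":
    print(render("en", user="Anna", count=3))
    print(render("de", user="Anna", count=3))
    print(render_by_concatenation("Anna", 3))
```

The concatenated version can only ever produce the English sentence structure, which is exactly the kind of issue that surfaces late, during localization testing, if nobody scans for it earlier.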

Remember, internationalization is often run by a different crew than localization. Software developers are upstream from localization, and they are sometimes all too disconnected from the final localized product release. Localization is often someone else’s problem and engineers are focused on getting a release out with all its new features. They don’t know what they don’t know, which is only human. That leaves localization teams waving their arms around trying to get the developers to build software right the first time. And those teams likely have no way to measure if the product they are tasked with localizing actually passes internationalization muster until they go through localization testing. Again that’s very late and expensive in a software development process, and more often than not, localization testing tends to be underfunded and vendor dependent. You’re going to have trouble finding everything. So for localization teams, what I’m suggesting is to consider a kind of automated litmus test. When code comes to the localization group, scan the code for internationalization issues, and consider what’s found. The technology is now there to do this in detail and examine each potential issue, quickly and easily. So in the worst case, you can at least have engineering fixing internationalization bugs during the localization process rather than later, when it’s far more expensive.

Again, anything that measures and sheds light on the situation will also result in improvements. So if you want well globalized software, better start measuring how that code is developed, not just what it’s costing to localize it.

P.S. I’m thinking of writing a column on funny ways people fell into the localization business. If you have a good story you’d like to share, please contact me!

Internationalization Management Tips: 10 Mistakes to Avoid

It’s extremely common for us to work with clients who have had a bumpy past with regards to internationalization. Sometimes you have to learn things the hard way, but that is always expensive.

In the past I’ve written about ten tips for managing internationalization projects. Here’s a look at mistakes that I’ve commonly seen repeated on the client side. In our services practice at Lingoport, we often have to counsel our clients through one or more of these sorts of process issues, which is actually a very rewarding part of what we do. While this list is pretty high level, we’ve seen that the processes involved can set up cascading failures that eventually can have a serious impact on a project’s success. Some apply more to internationalization of existing applications; others can apply to development where internationalization is planned in from the point of conception (still kind of a rare thing, but gaining).

So, here are 10 internationalization process mistakes to avoid:

1. Don’t forget what drives internationalization:

Money on the top and bottom lines of your company’s income statement. The point here is that the costs of being late or lousy endure way beyond the benefits of cutting corners on development. Internationalization happens because of a:
a. New customer(s) sale
b. New partnership
c. Strategic initiative backed by marketing, legal and other types of efforts and investments

2. Don’t assume internationalization is just an older software legacy issue.

It comes up surprisingly often that people even in our industry think that internationalization is mainly an issue for older applications. No framework, whether it’s J2EE, .Net, Ruby on Rails, PHP or whatever is new and improved, internationalizes itself. You still need to do all the steps necessary to implement locale and all the associated internationalization practices. Many newer programming platforms do an excellent job of internationalization support, which is great news as you can estimate and execute with a higher degree of accuracy. But you still have plenty of work to do.

3. Don’t assume you can treat internationalization like any other feature improvement when it comes to source control management.

With internationalization, source control can need an extra step of thinking things through. It’s very typical for new feature development and bug fixing to be going on in parallel to internationalization efforts. However, in the process of performing internationalization, you are going to be breaking major pieces of functionality within your application as you make large changes to your database and other application components. In order for respective developers to work on their own tasks and bugs, you typically need to branch code, often with specifically orchestrated code merges.

4. Don’t assume internationalization is just a string externalization exercise.

String externalization is important and highly visible, but the scope of internationalization includes so much more. For example: creating a locale framework, character encoding support, major changes to the database, and refactoring of methods/functions and classes for data input, manipulation and output. How these are all approached varies greatly based on requirements and technologies.

5. Don’t wing it on Locale

Designing how locale will be selected and managed often doesn’t get the amount of thought and planning deserved. How the application interacts with the user, detects or selects locale, and then how it correspondingly behaves is a design process needing input from an experienced architect, product marketing and the development team. This is not an area to be chosen by any one representative by fiat. It’s a whole lot of work to redo locale if it’s executed inadequately for user, business and locale requirements.
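As one small piece of that design, here is a hedged sketch of locale selection for a web application: an explicit user choice wins, then the browser’s Accept-Language header, then a default. The function names, the precedence order and the default are assumptions for illustration; your own requirements may well dictate something different.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    ranked = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        ranked.append((tag.strip().lower(), quality))
    ranked.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep header order
    return [tag for tag, quality in ranked if quality > 0]


def select_locale(supported, explicit_choice=None, accept_language="", default="en-US"):
    """Pick a supported locale: explicit user choice, then browser hints, then default."""
    by_lowercase = {tag.lower(): tag for tag in supported}
    candidates = []
    if explicit_choice:
        candidates.append(explicit_choice.lower())
    candidates.extend(parse_accept_language(accept_language))
    for candidate in candidates:
        if candidate in by_lowercase:
            return by_lowercase[candidate]
        language = candidate.split("-")[0]  # fall back from "fr-ca" to plain "fr"
        if language in by_lowercase:
            return by_lowercase[language]
    return default


if __name__ == "__main__":
    supported = ["en-US", "fr", "de-DE"]
    print(select_locale(supported, accept_language="fr-CA,fr;q=0.8,en;q=0.5"))
```

Note what this sketch leaves out: it will not match a bare "de" request to "de-DE", and it says nothing about where the user’s choice is stored or how formats follow from it. Those are exactly the decisions that need the architect, product marketing and the development team in the same room.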

6. Don’t create your very own internationalization framework

Don’t even do it if you think you know better. We regularly run into clients who have halfway implemented internationalization using their own homegrown methods for string extraction and locale management when there were already well-established methods provided within their programming language framework, or established solutions like ICU. Using these will ensure that your code is far easier to maintain, and you’ll know that thousands of applications have used them successfully before you. No unpleasant surprises.
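As an illustration of leaning on what already exists, here is a sketch using Python’s standard `gettext` module, which stands in here for whatever your own framework provides. The domain name `myapp` and the `locale` directory are assumptions; with `fallback=True`, a missing catalog simply returns the original English strings rather than failing.

```python
import gettext


def load_translator(domain, locale_dir, languages):
    """Load a message catalog with the standard library instead of a homegrown lookup.

    With fallback=True, a missing catalog yields a NullTranslations object
    that hands back the original message unchanged.
    """
    return gettext.translation(
        domain, localedir=locale_dir, languages=languages, fallback=True
    )


if __name__ == "__main__":
    # "myapp" and "locale" are placeholder names for this sketch.
    translator = load_translator("myapp", "locale", ["de"])
    _ = translator.gettext
    print(_("Save changes"))
    # Plural selection is also handled by the library, per catalog rules.
    print(translator.ngettext("%d file", "%d files", 3) % 3)
```

The point is less about this particular module than about the pattern: catalog lookup, fallback and plural handling are solved problems, so there is little reason to rebuild them.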

7. Don’t think that the team internationalizing your software can work without a working build

This seems obvious, but it comes up a lot. Without a working build, the developers can’t smoke test the changes they are making. Even if you provide a dedicated QA person, my own experience is that developers need to be able to compile and run the application themselves to head off problems later. Relying on reconstructing coding errors at a later time is too hard, and makes for unnecessary bug-fixing iterations, lost time and poor quality.

8. Don’t run out of money

Internationalization planning often suffers from underscoping. At Lingoport, we have both software and well-established methodologies for estimating internationalization, as we really don’t want to ever break this rule and have to ask our clients for more funding. The same should hold true for internal efforts. Lapses in funding can cause expensive delays, as new funding takes more time than anyone imagined to get approved. It also reduces management credibility. And chances are, if you need to ask for more money, then you also need more time, which brings you back to the consequences regarding tip #1.

9. Don’t use a half thought-out character encoding strategy

Use Unicode, rather than native encodings. If you have budget and time constraints and you’re only targeting dominant languages in markets like Western Europe, North and South America, you can often get away with ISO Latin 1, but even for Eastern European languages, go Unicode. Then when you do, make sure your encoding works all the way through the application. And don’t forget that if your customers need to support worldwide customers themselves (e.g. enterprise software), they may need you to support Unicode data processing even if the interface remains in English. One more consideration tilting toward Unicode is that programming languages like C# and Java already internally pass strings and data as Unicode, so you might as well think about engineering for the world.
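A quick sketch makes the trade-off concrete. The sample words and the `can_encode` helper are assumptions chosen for illustration; the encodings are the standard codecs that ship with Python.

```python
def can_encode(text, encoding):
    """Report whether every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative sample words, one per market.
SAMPLES = {
    "French": "déjà vu",
    "Polish": "Łódź",
    "Japanese": "日本語",
}

if __name__ == "__main__":
    for language, word in SAMPLES.items():
        results = {
            encoding: can_encode(word, encoding)
            for encoding in ("latin-1", "iso8859-2", "utf-8")
        }
        print(language, results)
```

ISO Latin 1 copes with the French sample but not the Polish or Japanese ones, while UTF-8 handles all three, which is the whole argument for starting with Unicode.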

10. Don’t use your same testing plan, or just rely on localization testing, when your functional testing needs to grow to include internationalization requirements

In our services projects, we always put special emphasis on working through pseudo-localization of not only the interface, but also of test data: sending data using target character sets, locale-altered date/time formats, phone numbers and more, from data input to database, to reports and so on. If your testers are English-only speakers, that’s fine. For example, we have a utility in Globalyzer, PseudoJudo, that puts target-language buffer characters around English strings. You can expand data fields to fit physically longer strings, giving room for translation changes in sizing as well as encoding.
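To give a feel for the idea, here is a minimal pseudo-localization sketch. It is not how PseudoJudo is implemented; the bracket markers, the 30 percent expansion and the function names are assumptions made up for this example.

```python
def pseudo_localize(text, prefix="[日本", suffix="語]", expansion_percent=30, pad_char="~"):
    """Wrap an English string in target-script markers and pad it to simulate growth.

    The English stays readable for testers, while the markers exercise the
    encoding and the padding exercises the layout.
    """
    # Integer ceiling of len(text) * expansion_percent / 100, with at least one pad.
    extra = max(1, (len(text) * expansion_percent + 99) // 100)
    return f"{prefix}{text}{pad_char * extra}{suffix}"


def survived_intact(displayed, prefix="[日本", suffix="語]"):
    """Check that a string came back from the application neither truncated nor corrupted."""
    return displayed.startswith(prefix) and displayed.endswith(suffix)


if __name__ == "__main__":
    sample = pseudo_localize("Cancel order")
    print(sample)
    print(survived_intact(sample))       # round-tripped cleanly
    print(survived_intact(sample[:10]))  # a field that was too short cut it off
```

An English-only tester can then spot problems at a glance: a missing closing marker means truncation, and question marks or boxes where the Japanese characters should be mean an encoding problem somewhere along the path.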

11. Bonus Tip: Don’t assume localization is just someone else’s problem

It’s funny how many of our customers are strictly concerned with software development and don’t actually have anything to do with localization processes. We always work to bring localization into the internationalization effort. We do this by interfacing with localization resources early on, helping them understand the technical requirements and then feeding translators strings that we extract on the front end of projects, so that when internationalization functional testing is done, we are immediately ready to perform linguistic translation testing and ultimately deliver a finished product. This compresses times to global release, while also making for a more fluid process, fewer programming iterations and higher quality.

Unicode and Internationalization Primer for the Uninitiated

Among our friends and clients at Lingoport, we regularly see everything from confusion to a complete lack of awareness of what Unicode is. So for the less- or under-informed, perhaps this article will help. The advent of Unicode is a key underpinning for global software applications and websites so that they can support worldwide language scripts. So it’s a very important standard to be aware of, whether you’re in localization, an engineer or a business manager.

Firstly, Unicode is a character set standard used for displaying and processing language data in computer applications. The Unicode character set is the entire world’s set of characters, including letters, numbers, currencies, symbols and the like, supporting a number of character encodings to make that all happen. Before your eyes glaze over, let me explain what character encoding means. You have to remember that for a computer, all information is represented in zeros and ones (i.e. binary values). So if you think of the letter A in the ASCII standard of zeros and ones, it would look like this: 1000001. That is, a 1, then five zeros and a 1, to make a total of 7 bits. The number this represents (65) is called A’s code point, and the mapping of characters to zeros and ones is called the character encoding. In the early days of computing, unless you did something very special, ASCII (7 bits per character) was how your data got managed. The problem is that ASCII doesn’t leave you enough zeros and ones to represent extended characters, like accents and characters specific to non-English alphabets, such as you find in European languages. You certainly can’t support the complex characters that make up Chinese, Korean and Japanese languages. European languages need 8-bit (single-byte) character encodings, while Chinese, Korean and Japanese need 16-bit (double-byte) ones. One important note on all of these single- and double-byte encodings is that they are supersets of 7-bit ASCII encoding, which means that English code points will always be the same regardless of the encoding.
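If you would like to see those zeros and ones for yourself, here is a small sketch. The `describe` helper is a made-up name for illustration; `ord`, `format` and the codecs are standard Python.

```python
def describe(character):
    """Show a character's code point, its bits, and whether 7-bit ASCII can hold it."""
    code_point = ord(character)
    return {
        "code_point": f"U+{code_point:04X}",
        "binary": format(code_point, "b"),
        "fits_ascii": code_point < 128,
    }


if __name__ == "__main__":
    print(describe("A"))  # the 7-bit pattern 1000001 discussed above
    print(describe("é"))  # needs an eighth bit, so plain ASCII cannot hold it

    # The ASCII range is encoded identically by these single- and double-byte encodings.
    for encoding in ("ascii", "latin-1", "shift_jis", "utf-8"):
        print(encoding, "A".encode(encoding))
```

The last loop illustrates the superset point: the letter A comes out as the same single byte whichever of these encodings you pick.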

The Bad Old Days

In the early computing days, specific single- and double-byte character encodings were developed to support various languages. That was very bad, as it meant that software developers needed to build a version of their application for every language they wanted to support that used a different encoding. You’d have the Japanese version, the Western European language version, the English-only version and so on. You’d end up with a horde of individual software code bases, each needing its own testing, updating and ongoing maintenance and support, which is very expensive, and pretty near impossible for businesses to realistically support without serious divergence among the various language versions over time. You don’t see this problem very often for newly developed applications, but there are plenty of holdovers. We see it typically when a new client has turned over their source code to a particular country partner or marketing agent which was responsible for adapting the code to multiple languages. The worst case I saw was in 2004 when a particular client, who I will leave unmentioned, had a legacy product with 18 separate language versions and no longer had any real idea how the level of functionality varied from language to language. That’s no way to grow a corporate empire!

ISO Latin

A single-byte character set that we often see in applications is ISO Latin 1, which is represented in various encoding standards such as ISO-8859-1 for UNIX, Windows-1252 for Windows and MacRoman on guess what platform. This character set supports characters used in Western European languages such as French, Spanish, German, and U.K. English. Since each character requires only a single byte, this character set provides support for multiple languages, while avoiding the work required to support either Unicode or a double-byte encoding. Trouble is that still leaves out much of the world. For example, to support Eastern European languages you need to use a different character set, often referred to as Latin 2, which provides the characters that are uniquely needed for these languages. There are also separate character sets for Baltic languages, Turkish, Arabic, Hebrew, and on and on. When having to internationalize software for the first time, sometimes companies will start with just supporting ISO Latin 1 if it meets their immediate marketing requirements and deal with the more extensive work of supporting other languages later. The reason is that it’s likely these software applications will need major reworking of the encoding support in their database and functions, methods and classes within their source code to go beyond ISO Latin support, which means more time and more money – often cascading into later releases and foregone revenues. However, if the software company has truly global ambitions, they will need to take that plunge and provide Unicode support. I’ll argue that if companies are supporting global customers, and even not doing a bit of translation/localization for the interface, they still need to support Unicode so they can provide processing of their customer’s global data.

Unicode

We come back to Unicode, which as we mentioned above, is a character set created to enable support of any written language worldwide. Now you might find a language or two lacking Unicode support for its script but that is becoming extremely isolated. For instance, currently Javanese, Loma, and Tai Viet are among scripts not yet supported. Arcane until you need them I suppose. I remember a few years ago when we were developing a multi-lingual site which needed support for Khmer and Armenian, and we were thankful that Unicode had just added their support a few months prior. If you have a marketing requirement for your software to support Japanese or Chinese, think Unicode. That’s because you will need to move to a double-byte encoding at the very least, and as soon as you go through the trouble to do that, you might as well support Unicode and get the added benefit of support for all languages.

UTF-8

Once you’ve chosen to support Unicode, you must decide on the specific character encoding you want to use, which will be dependent on the application requirements and technologies. UTF-8 is one of the commonly used character encodings defined within the Unicode Standard, which uses a single byte for each character unless it needs more, in which case it can expand up to 4 bytes. People sometimes refer to this as a variable-width encoding since the width of the character in bytes varies depending upon the character. The advantage of this character encoding is that all English (ASCII) characters will remain as single bytes, saving data space. This is especially desirable for web content, since the underlying HTML markup will remain in single-byte ASCII. In general, UNIX platforms are optimized for UTF-8 character encoding. Concerning databases, where large amounts of application data are integral to the application, a developer may choose a UTF-8 encoding to save space if most of the data in the database does not need translation and so can remain in English (which requires only a single byte in UTF-8 encoding). Note that some databases do not support UTF-8; at the time of writing, Microsoft’s SQL Server is one of them.
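The variable width is easy to see in a short sketch. The `utf8_width` helper and the sample characters are assumptions picked for illustration.

```python
def utf8_width(character):
    """Number of bytes UTF-8 needs for a single character."""
    return len(character.encode("utf-8"))


if __name__ == "__main__":
    for character in ("A", "é", "日", "😀"):
        print(repr(character), utf8_width(character), "byte(s)")

    # HTML markup stays single-byte; only the non-ASCII content grows.
    page = "<p>日本語</p>"
    print(len(page), "characters,", len(page.encode("utf-8")), "bytes")
```

Notice that the character count and the byte count differ, so any code that sizes buffers or database columns by "length" has to be clear about which of the two it means.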

UTF-16

UTF-16 is another widely adopted encoding within the Unicode standard. It assigns two bytes for each character whether you need them or not. So the letter A is 00000000 01000001, or 9 zeros, a one, followed by 5 zeros and a one. If more than 2 bytes are needed for a character, four bytes can be combined; however, you must adapt your software to be capable of handling this four-byte combination. Java and .Net internally process strings (text and messages) as UTF-16.
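Here is a small sketch of both cases, the plain two-byte character and the four-byte combination (known as a surrogate pair). The helper names are made up for illustration; the big-endian `utf-16-be` codec is used so that the bytes appear in the same order as written above.

```python
def utf16_bits(text):
    """Render the UTF-16 (big-endian) bytes of text as groups of eight bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-16-be"))


def utf16_code_units(text):
    """Count 16-bit units, which is what Java and .NET string lengths report."""
    return len(text.encode("utf-16-be")) // 2


if __name__ == "__main__":
    print(utf16_bits("A"))
    print(utf16_code_units("A"))
    # A character beyond the 16-bit range takes two units, i.e. four bytes.
    print(utf16_code_units("😀"), "😀".encode("utf-16-be").hex())
```

Code that assumes one 16-bit unit per character will split that last character in half, which is the adaptation the paragraph above is warning about.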

For many applications, you can actually support multiple Unicode encodings so that, for example, your data is stored in your database as UTF-8 but is handled within your code as UTF-16, or vice versa. There are various reasons to do this, such as software limitations (different software components supporting different Unicode encodings), storage or performance advantages, etc. But whether that’s a good idea is one of those “it depends” kinds of questions. Implementing can be tricky and clients pay us good money to solve this.

Microsoft’s SQL Server is a bit of a special case, in that it supports UCS-2, which is like UTF-16 but without the 4-byte characters (only the 16-bit characters are supported).

GB 18030

There’s also a special-case character set when it comes to engineering software intended for sale in China (PRC), which is required by the Chinese Government. This character set is GB 18030, and it is actually a superset of Unicode, supporting both simplified and traditional Chinese. Similarly to UTF-16, GB 18030 character encoding allows 4 bytes per character to support characters beyond Unicode’s “basic” (16-bit) range, and in practice supporting UTF-16 (or UTF-8) is considered an acceptable approach to supporting GB 18030 (the UCS-2 encoding just mentioned is not, however).
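As a rough sketch of what that means in practice, text held as Unicode inside the application can be converted to GB 18030 at the boundary and back again without loss. The helper names and sample characters are assumptions; `gb18030` is the codec name that ships with Python.

```python
def gb18030_width(character):
    """Number of bytes GB 18030 needs for a single character."""
    return len(character.encode("gb18030"))


def round_trips(text):
    """Check that Unicode text survives conversion to GB 18030 and back."""
    return text.encode("gb18030").decode("gb18030") == text


if __name__ == "__main__":
    # ASCII, a common Chinese character, and one beyond the 16-bit range (U+20000).
    for character in ("A", "中", "\U00020000"):
        print(f"U+{ord(character):04X}", gb18030_width(character), "byte(s)")
    print(round_trips("中文 and English \U00020000"))
```

The last sample is the interesting one: it sits outside the 16-bit range, so it is exactly the kind of character a UCS-2-only system cannot carry.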

Now all of this considered, a converse question might be: what happens when you try to make your application support complex scripts that need Unicode, and the support isn’t there? Depending upon your system, you get anything from garbled, meaningless gibberish, where data or messages become corrupted characters or weird square boxes, to an application crash that forces a restart. Not good.
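You can reproduce both kinds of failure in a few lines. This is only a sketch; the `misread` helper is a made-up name, and the pairing of UTF-8 with Latin 1 is just one common way for things to go wrong.

```python
def misread(text, written_as="utf-8", read_as="latin-1"):
    """Simulate one component writing text in one encoding and another reading it in a different one."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    # Gibberish: the two bytes of "é" in UTF-8 get shown as two separate characters.
    print(misread("café"))

    # Silent data loss: characters the target encoding lacks are replaced.
    print("日本語".encode("ascii", errors="replace"))

    # Or an outright failure, which in an unprepared application means a crash.
    try:
        "日本語".encode("ascii")
    except UnicodeEncodeError as error:
        print("failed:", error.reason)
```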

If your application supports Unicode, you are ready to take on the world.

I18n JavaScript – the Good, the Bad, and the Ugly

Given JavaScript’s status as the de facto browser client scripting language, and given the international nature of the Internet, it was inevitable that JavaScript and internationalization (i18n) would eventually cross paths. Fortunately, in this day and age of Unicode, character corruption can be avoided if care is taken to make sure JavaScript is using it. Unfortunately, strings are hard coded in JavaScript and locale-specific methods are unpredictable, making localization more difficult.

