Video Recording of LocalizationWorld Presentation: Intro to Internationalization and Localization

Internationalization and Localization experts Adam Asnes, of Lingoport, and Angelika Zerfaß, of zaac, recently presented at LocWorld in Seattle. Their session “Intro to Internationalization and Localization” was moderated by Daniel Goldschmidt, principal consultant and cofounder of RIGI Localization Solutions, and is now available for online viewing.

The one-hour recording of their presentation provides an overview of the areas of internationalization and localization projects where best practices exist, from the concept of internationalization and how it is applied, to project management dos and don’ts, to the tools and technologies used in the field.

The business case why US companies need to internationalize their software in order to sell to the Canadian Government

In his article in the September 2010 issue of MultiLingual, Adam Asnes illustrated how business cases can drive US companies to internationalize their software in order to sell to the Canadian Government, or to sell broadly in Quebec. I liked how he pointed out that companies may adapt their software for sales-driven reasons rather than as part of a broad global marketing initiative, and that such efforts have “different needs-drivers reflected in deadlines, resources and scope” than regular, consistent localization projects.

Adam goes on to describe very well, for techie and salesperson alike (me, for example), what needs to be completed to get the software localization-ready and how Lingoport rocks at helping companies with that process. Here at Milengo, we commonly assist clients with their language support after Lingoport has finished its work. We too notice that clients’ needs for Canadian language support are different when they are deal-based rather than part of a broader sales plan, so I will focus my ideas on that part. I want to use this blog to illustrate some examples of projects we’ve worked on, to give readers ideas on what processes and technology are available and what is doable, and to help stretch your budget when sales lands a big new deal in Canada.

Let’s assume that your company is doing very well and the software you produce is awesome. Sales are booming in North America. The Sales Director got a big contract with the Canadian Government. Big deal and big money. After it’s signed and the champagne has been popped, you’re told that you have 3 months to deliver a Canadian French version, with documentation, since it’s required by law in Canada. And if it’s late, the company will have to pay a fine for every day it’s late, eating into profits and goodwill. So after a big gulp of bubbly, the process begins.

Luckily, you know Lingoport already from Adam’s excellent articles in MultiLingual. His company helps your developers complete the i18n of the software so that it can be localized. He did it on budget and ahead of schedule, just because that’s how Lingoport rolls. Milestone 1 completed. Then you see you have about 10,000 strings for translation as well as help and user manuals, which require about 200,000 words for translation. Oy vey. The volume is too much for your staff in Canada to handle internally within this timeframe. What options can you consider?

Option 1: Have an LSP do the translation for you. Luckily, your sales team collaborated with you closely and the deal was priced to allow for high-quality human translations in Canada. You can create a glossary from the software translation, which forms a bedrock for future updates. Terminology is recorded and used consistently across your software, documentation and customer communication, lowering costs, increasing quality and enhancing the brand experience (a big topic that we’ll go into another time). Sounds good, right? With all those happy French-speaking Canadian customers, it may get you thinking that a more developed localization strategy might not be a bad idea after all.

Option 2: Your sales team did not collaborate with you, and the overall price of the package sold was too low. Your manager is balking at the double-digit figures for the cost of the documentation localization since the budget is not available and you have limited financial resources. Alternatively, perhaps it’s not a priority to have this done with high-quality human translations since this is a one-off deal. Options to consider include:

  • One of Milengo’s customers had some 1.5 million words of help-desk and customer support information that needed to be translated in a month and a half in order to outsource call-center operations. Doable? Yes! Did they have a budget of ~ $500,000? No. To get around this we worked with our partner AsiaOnline to develop a customized, enterprise-level statistical machine translation engine. To make the translations publish-ready, human linguists reviewed the machine translation output to correct errors, fix stylistic problems, etc., so that it looked and felt correct. The overall saving was over 50%.
  • You want to leverage your in-house team of people in Canada, but need to make them more efficient. How about taking the glossary from your UI and using it as a basis within the Google Translator Toolkit? The Google engine will produce a translation for you using your glossary as a reference point, and afterwards, your in-house team can correct the errors and improve the style. Or you can have an LSP like Milengo do it for you. Depending on the nature of the content or corporate culture, it may not be appropriate, but it is an option that you can consider. Google is doing more and more of their own translations this way, and we’ve helped them with correcting the output of their translations using their own toolkit.

Option 3: You can do a mishmash of all of the above. The UI is translated by your in-house staff (i.e., “the humans”) since they are the experts. The documentation is translated by AsiaOnline’s customized statistical machine translation with human post-editing, and Google Translator Toolkit is used for internal communication between Canadian French and English.

Option 4: While this scenario is unlikely since you are internationalizing your software for the first time, if you did already have a French translation, we could leverage that considerably. An adaptation from Continental French to Canadian French can be done. While both are French, there are of course differences, and copy-editors can go through and change terminology and style to make the text feel local, saving considerable time and budget.

There you have it. Of course each option, scenario and client requirement is more complicated and detailed than portrayed here, but hopefully it gets the juices flowing in terms of what can be done.

Post written by Adam Blau, Rebellion Leader at Milengo, a global language services provider.

The Business Why and How of Simship

This article was originally featured in the July/August 2010 issue of MultiLingual Computing Magazine, in Adam Asnes’ Business Side column. 

The subject of managing releases over worldwide markets can be a contentious one, with pros and cons on either side of business and development cases. The concept of simship is that if you are releasing your product to worldwide markets, you do it all at once rather than first releasing to your home market and then following with localized versions later. I can’t say that any one approach is right for all organizations, business situations and products, but I can share with you some of the organizational, procedural and business issues that contribute to successful simship global releases.

When a company commits to product releases that serve a worldwide customer base, there’s a long shadow cast on revenue, marketing, sales teams and of course development practices and testing. It’s a challenging logistical undertaking to release software products in multiple markets, requiring well-integrated planning and practices. It’s no wonder simship is viewed as anything from difficult and impractical to the best thing a company can do. Let’s consider a few of the issues within any organization, starting with the business case.

Internationalization and localization are always in pursuit of a business case, and one exists both for and against simship. That said, the business cases tend to vary based on the global perspective and maturity of the company. The case for simship is strongest among experienced global companies. Their revenues are already global, so delaying releases for localized versions only serves to delay the resulting new release revenues. There may be good reason for adding secondary tiers for some local release schedules, but products really should be internationalized, with a clear path for localization and testing within the development path. In practice this isn’t always the reality, but there’s quite a bit of agreement, and supporting data, on the business case for simship with this class of company.

When companies are relatively new to global markets, they generally tend to put less of an emphasis on simship with new releases, and more of an emphasis on market or business agreements as drivers for their efforts. Perhaps they have a new customer or distributor that must have a localized version. In that case, synchronizing new version development with localization is usually—but not always—an afterthought. This is because the company sees its prime revenues being driven by current product customers. New releases boost sales, renewals and competition, so that connection is strongest where the current customers are. We’d still argue that even under these circumstances, simship should not be pushed aside, as there are gains to be made both for revenues and operations.

Time and Revenue Projections

Attached to initial time to release and revenue opportunities are quarterly and annual growth numbers. If a product is expected to grow sales by percentages outlined and expected in a marketing plan over months, quarters and years, significant delays in turn make those projections difficult, if not impossible, to meet. Delays add up to real dollars.

Now let’s leave the business case behind and look at software development organizations. It is extremely common among both development and localization teams to view localization as a tail-end process. But this is a critically limiting perception if your company is committing itself to serve global customers. Practically, a company shouldn’t build a product with a requirement as major as supporting multiple locales as a tail-end process. Even in cases where legacy code is now being first internationalized for global customers, once that adaptation is complete, from then on localization should be included as an expected part of the development process. That means including requirements for planning, architecture, development implementation, testing and release.

I asked my internationalization colleague Tex Texin to add some words about this. He seconded that as with many other aspects of globalizing applications, development organizations tend to see just the work and delay to releasing their product and not the benefits. And although we work to plan to minimize the pain, there is cost to achieving simship. However, exercising the localized versions often uncovers critical problems in the product core that can require urgent updates, recalls or even the creation of specialized tools to repair customer data in the field. In that context, simship is not only a requirement to be in the international markets and significantly enhance revenue, but is an important part of product testing preventing problems that are costly to repair and damaging to both reputation and future domestic sales.


Simship nearly always seems to be the outcome of an internationalization implementation. So, we have some experience working with legacy code that we are internationalizing and then merging with concurrent new development, building localization proactively into the process.

We find and work with the localizable content embedded in the code first. We gain a clear estimate of localization costs by examining those strings, even while they are still embedded in the code using static source analysis. That’s important because it allows the budget and financing mechanisms of an organization more time to accurately fund the localization. Then we systematically provide externalized strings for localization as we go along in the project, rather than waiting until the end. We also perform static analysis on concurrent new feature development so that when we merge legacy and new code, we minimize the risk of expensive surprises. We build functional internationalization and localization test cases and execute both. The internationalization functional testing can be performed by testers regardless of linguistic proficiency. However, because we have been localizing all along, we are also quickly ready for linguistic testing. The combined processes are extremely effective in finding both functional and linguistic defects that may have passed through if performed as an afterthought.
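As a rough illustration of estimating localization volume while strings are still embedded in code, here is a minimal Python sketch. It is not how Globalyzer or any commercial static analysis tool works; the regular expression, the "contains a space and a letter" heuristic, and the function names are all assumptions made for this example, and a real scan would need to handle each programming language's string syntax properly.

```python
import re

# Matches simple double-quoted literals, allowing escaped characters inside.
# Assumption: real source code needs a proper parser, not a single regex.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')


def find_embedded_strings(source: str) -> list[tuple[int, str]]:
    """Return (line_number, literal) pairs for literals that look user-facing."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in STRING_LITERAL.finditer(line):
            literal = match.group(1)
            # Hypothetical heuristic: text with a space and at least one letter
            # is more likely display text than a key or identifier.
            if " " in literal.strip() and any(ch.isalpha() for ch in literal):
                findings.append((line_number, literal))
    return findings


def count_words(findings: list[tuple[int, str]]) -> int:
    """Rough word count across findings, as one input to a cost estimate."""
    return sum(len(literal.split()) for _, literal in findings)


sample = '''
title = "Save your changes"
key = "user_id"
print("File not found")
'''
findings = find_embedded_strings(sample)
for line_number, literal in findings:
    print(line_number, literal)
print("words to translate:", count_words(findings))
```

Even a crude count like this gives budget holders an early number to react to, which is the point made above; the accuracy comes later from proper tooling.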

Agile Development: It’s one thing to talk about including localization in your internationalization and development process on large-scale efforts, but what about smaller-scale and rapid agile releases? Turns out it’s really no different. I talked to Mike McKenna, globalization manager at Yahoo!, to get some perspective. An extreme example is the release cycles for Flickr, Yahoo!’s photo sharing social network. Flickr sometimes rolls out four to six releases per day, holding the expectation that developers can get immediate access to any translations they may need, which are likely to be small UI changes. Then they pride themselves on directly connecting their developers to users, without intermediaries, to fix issues that may arise from localization or functional changes.

Yahoo! has other software, such as its Open Strategy Platform or Yahoo! Application Platform, which typically have six-week release cycles. In this case, there is a UI freeze before the release sprint so that localization can be integrated into the final release sprint. Developers work with their localization managers and ensure any last-minute tweaks that may become necessary to the UI during the release sprint are well coordinated.

Security: Let’s go back in our time tunnel to the 1990s: Windows 95 was first released in August 1995, its first service pack was released in February 1996 and the second pack in 1998. The localized versions always lagged behind: Microsoft first released the “Enabled” version, which was not localized but could run software in your language. A few months later, Microsoft released the localized version. Today, Microsoft and other companies release security patches on a monthly if not weekly basis. Can you imagine releasing the patch in North America first and only a few months later in the rest of the world? Simship enables the release of security patches and other critical patches on a timely basis to all markets and prevents security glitches.

Internationalization as Enabler

The success of localization and the ability to coordinate simship processes are directly dependent upon the quality of a product’s internationalization as well as the development team’s ongoing internationalization practices. Internationalization is the software development enabler, and without it or without a consistent internationalization benchmark, localization and particularly simship get broken. As the saying goes, garbage in, garbage out. Simship takes a little more planning, time, tools and coordination, but it’s hardly an onerous process. Like a lot of things, your organization has to be aware of the benefits and just do it. Then the actual doing is clearly achievable.

About the Author

Adam Asnes is President and CEO at Lingoport and enjoys investigating how globalization technology affects businesses expanding their worldwide reach. Adam is a sought-after speaker at industry events and a columnist for localization, internationalization and globalization industry publications. He also enjoys cycling and Colorado’s Rocky Mountains.

Lingoport’s Internationalization (I18n) and Localization (L10n) Tools and Consulting Solutions

Founded in 2001, Lingoport provides extensive software localization and internationalization consulting services. Lingoport’s Globalyzer software, a market leading software internationalization tool, helps entire enterprises and development teams to effectively internationalize existing and newly developed source code and to prepare their applications for localization.

For more information on how Lingoport can assist you with all of your internationalization and localization needs, please call 303.444.8020 or complete the quote request form.

What If Internationalization Expectations Exceed Your Budget? – Significantly

Note: This article is featured in the June 2010 issue of MultiLingual Computing Magazine, in Adam Asnes’ Business Side column.

If you’re considering internationalizing a large and complex software product, there’s one thing you should be prepared for: it’s expensive. There’s just no way around it if you want an application that properly presents, inputs, transforms and reports complex data. I’m talking about applications measured in the hundreds of thousands to millions of lines of code. Seriously, you’re just not going to internationalize a sizeable application that you’ve taken years to develop with money just lying around, unless you have a lot of money lying around, which is pretty rare these days. But before we consider what to do about it, let’s consider the main reasons why you may need to internationalize:

Survival –

Your customers are increasingly global, and perhaps they use your product to reach their customers. If you’re not internationalized, you’re limiting their business. The competition and your customers will know this and will eventually eat your company alive. You’d better start finding some money.

A Sale –

There is nothing like an important customer to get an initiative moving. If this sale funds the internationalization effort, it makes things easier, though there will be commitment that will extend beyond any one customer. I’ve written before how changing your encoding will change your company. But if this sale doesn’t pay for the effort, corporate initiative will be needed.

Your company is global –

Perhaps your company is a global brand and you’ve quickly developed or acquired a product that isn’t internationalized. In this case, the decision to internationalize is usually simple. You do it because you already have a global reputation, sales and distribution. If you have to justify ROI, somebody is missing the point, there’s a temporary issue or the product isn’t showing promise.

Strategic Initiative –

This article isn’t going to be about all the strategic benefits of growing global revenues with products that leverage themselves worldwide, because you know all about that, right? But acting on strategy takes foresight, money, expertise and perseverance.

If you have any of the above situations except budget, this article is especially for you.

I’ll repeat a situation I’ve seen many times. My firm, Lingoport, will be called upon for initial consulting as a company is considering internationalization in reaction to a declared strategic objective to gain business outside a home market. They usually have one or two customers asking for just that, but perhaps there isn’t enough initial interest to finance the necessary development and localization. We go back and perform static analysis on the code using our Globalyzer software, counting the embedded strings, locale-limiting methods/functions/classes and programming patterns that will need attention and refactoring, combined with architectural changes to support locale and changes in processing.

Even with automating tasks for batch efforts like string externalization (after analysis), you still have design, engineering and testing cycles that add up to significant expense. At this point we find out just how strong corporate global resolve is. And in some cases that resolve is just not quite ready. It’s not a lost cause by any means. In fact, almost always, it’s just a matter of time and resources, and most come around in future quarters or fiscal years. But there lies the gap for development managers.

Rarely do developers internationalize software just because it would be cool. You do see that kind of initiative for new features, where a developer might get an idea, work on it during odd or even personal time, and voila, present it to his or her company peers. I have yet to see that happen regarding internationalization (write me if you see otherwise). Still, developers and management often know the need to internationalize is there, ready to become a firm requirement any quarter now. They can go on developing new features and updating current code without going near internationalization, which actually increases the scope of the eventual internationalization effort as the code base grows. Or they can take some simple steps to get ready. To use an expression, “When you find yourself in a hole, stop digging.” Here’s a brief list of what you can do:

  • Gather requirements – new locale requirements will go much further than what languages will need to be supported. An architect can be tasked with learning about issues like character encoding and locale frameworks. A product marketing person can learn a bit about use cases and business logic that may alter how the product behaves in new countries. It is all too easy to underestimate the requirements phase. Locale behavior will involve quite a bit more than just string externalization. Start tallying and recording what is found in a centrally available resource, like the company wiki for all to build upon and learn about.
  • Prototype a string retrieval method. Learn about resource files and string IDs and how to make them work. Again, list your results in the company wiki.
  • Do a little reading about Unicode and its various encodings, along with appropriate technologies for their use. It’s not enough to commit to using Unicode. You have to gain some understanding of just what that means.
  • Consider your database schema and how that might change for locale support along with likely changes to character encoding.
  • Consider any third party components or open source you use within your application. Start inquiring about their internationalization support.
  • Consider internationalizing a pilot effort or component of your software if your product architecture will permit it. There’s nothing like learning by doing. And if you decide to take a somewhat different approach later, it probably won’t be too difficult to alter what you’ve already done.
  • Refine your planning – as you learn more, your planning efforts are likely to get clearer. As plans get clearer, they seem less risky and large. You’ll be in a better position to defend expected costs, resources and schedules.
  • Consider application logic. Does your software manage a process that is performed differently around the world?
  • Talk with experts – It’s not prudent to try to reinvent the internationalization process. An experienced expert, who has really been through multiple implementations rather than just advising, can get you prepared faster and cheaper than your internal developers can on their own. I’ve seen companies create their own proprietary approaches that ultimately get in the way of a successful implementation. Initial consultation shouldn’t be a budget buster. Even so, there are free internationalization webinars (we give them and others do too) and excellent conferences available (e.g., Worldware and the Unicode Conferences).
  • Start measuring toward your expected outcome – If you establish internationalization development practices and measure benchmarks, you are likely to see improvements to new development without significant cost in time and money. Static analysis tools like Globalyzer create a systematic approach, but if there’s no budget, then a simple and clear inclusion of practices and expectations can go a long way.
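For the "prototype a string retrieval method" step in the list above, a minimal Python sketch might look like the following. The resource tables, string IDs, locale codes and the get_string function are all hypothetical, invented for illustration; a real prototype would load strings from resource files in whatever format your platform uses.

```python
# Hypothetical in-memory resource tables keyed by locale, then by string ID.
# A real project would load these from external resource files.
RESOURCES = {
    "en": {"greeting": "Hello, {name}!", "save": "Save"},
    "fr-CA": {"greeting": "Bonjour, {name} !"},
}
DEFAULT_LOCALE = "en"


def get_string(string_id: str, locale: str, **params: str) -> str:
    """Look up a string by ID, falling back from 'fr-CA' to 'fr' to the default."""
    candidates = [locale]
    if "-" in locale:
        candidates.append(locale.split("-")[0])  # language-only fallback
    candidates.append(DEFAULT_LOCALE)
    for candidate in candidates:
        template = RESOURCES.get(candidate, {}).get(string_id)
        if template is not None:
            return template.format(**params)
    # Returning the ID itself makes missing strings easy to spot in testing.
    return string_id


print(get_string("greeting", "fr-CA", name="Ana"))
print(get_string("save", "fr-CA"))
print(get_string("missing_id", "fr-CA"))
```

The fallback chain is one design choice among several; the useful part of prototyping is discovering early which choices your application needs, and recording them where the rest of the team can find them.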

If you do at least some of this prior to any funded but highly likely internationalization requirement, you’ll be a tremendous asset to your firm’s globalization efforts. And globalization might just be one of the more significant and company-making undertakings that your firm can embark upon.

Internationalization ROI

Note: This article is scheduled to be featured in the August/September 2009 issue of MultiLingual Computing Magazine, in Adam Asnes’ Business Side column.

It’s easy to get agreement that revenues beyond a company’s home country market are important. If you look at some of the great global US brands, you’ll find that global revenues are 50% or even greater than 65% of their gross. While much has been made of measuring the return on investment for localizing software, what about measuring the very process of making software which is internationalized so that it can be localized and supported worldwide?

There are lots of issues to measure, and they vary in emphasis from the company that is making its first efforts outside its home market to companies that have highly evolved processes for global releases.

First we must consider opportunity costs, support for marketing and sales efforts, competitive pressures and, right down the line, the cost of engineering. Typically ROI calculations come down to hours saved at a particular rate, which is certainly valuable information, and usually those numbers are paramount in analyzing any kind of process change. But if a company is making new efforts or experiencing painful delays in global releases, opportunity costs and major market factors are the deal makers and the stuff that executive-level directives are made of.

Internationalize or Die

This heading may sound dramatic, but it’s quite the case for some of our clients. For instance, we have a client whose software platform is used by third parties in e-commerce efforts. Many of their accounts are well-recognized names in retail and merchandising, who are beginning to look at markets outside the US as important to their brands. While our client is not interested in purchasing localization themselves, if they can’t make their product support data management and presentation in multiple languages and locale-sensitive formats, they will lose their customers to competitors. I asked their senior management what was at stake, and they replied nothing less than their company’s future growth and survival. Given that this is a billion-dollar company, I’d say that’s a pretty big opportunity cost ROI on an internationalization effort.

Opportunity Costs

Internationalization happens because it’s first and foremost a business driver. I have yet to meet the development team that decides to internationalize just because it would be an interesting task. So I think it’s appropriate to first consider business drivers outside of the development process itself.

Perhaps global sales efforts have been taking place with a US English product. Outside of development, there are costs of sales, marketing personnel, supporting distributors, legal and administrative costs to name a few. These all have expensive price tags, which are independent of having an internationalized and localized product. And an internationalized and localized product has been shown to make those representative costs far more effective at producing revenue.

Cost of Delays

In an earlier article and subsequent whitepaper on our site, I outlined the cost of being late. The quick summary is that the marketing team will typically have projected revenues for each market, dependent upon release criteria. If a product is a single quarter late, which is not bad for a large project for some software development teams, the sales teams have just lost a quarter of their year in which to meet those projections. What’s the value of one quarter of sales effort? If those sales efforts are expected to produce increased results over time, how does that roll out and affect market penetration in future years? While these are broadly variable scenarios, I always like to consider the “top end” revenue implications before beginning to count development hour savings. The top end always has far broader consequences, and those opportunity costs get very real, with numbers followed by many zeros, in a competitive world.
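To make the cost-of-delay reasoning concrete, here is a small Python sketch. The quarterly figures are entirely hypothetical, chosen only to show the shape of the calculation, and the model (the whole revenue ramp shifts later while the planning window stays fixed) is a simplifying assumption rather than anything taken from the whitepaper.

```python
def revenue_lost_to_delay(quarterly_projections: list[float], quarters_late: int) -> float:
    """Revenue forgone inside a fixed planning window when the projected
    ramp starts `quarters_late` quarters later than planned.

    Assumption: the ramp keeps its shape and simply shifts, so the quarters
    that fall off the end of the window are the ones lost.
    """
    if quarters_late <= 0:
        return 0.0
    kept = max(0, len(quarterly_projections) - quarters_late)
    on_time_total = sum(quarterly_projections)
    delayed_total = sum(quarterly_projections[:kept])
    return float(on_time_total - delayed_total)


# Hypothetical first-year projection for one new market, growing each quarter.
projection = [100_000, 150_000, 225_000, 300_000]
print(revenue_lost_to_delay(projection, quarters_late=1))
```

Under this model, the quarter that is lost is the last and largest one in the window, not the first and smallest, which is why a delay hurts more when sales are expected to grow over time.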

Cutting Development Costs

Catch Bugs Early

My company, Lingoport, has just released Globalyzer 3.0, which is aimed squarely at supporting entire development organizations. It’s actually the only commercial system of its nature, purpose-built to support a very broad list of programming languages, measuring, filtering, reporting, tracking and even fixing internationalization issues over the development process via its client, server and database components. Companies have products to measure coding quality, security issues, memory management and more. Now we are adding static analysis of internationalization to the source code development process. Remember, if so much revenue is riding on global markets, doesn’t it make sense to actively measure and address software globalization issues, just as much as software security issues? Why not check source for embedded strings, locale-limiting methods/functions and classes, Unicode compliance, font issues, i18n-limiting programming patterns and the like at regular automated intervals rather than waiting until QA or localization? Remember the management principle that if you want to improve anything, measure, track and report it as close to its creation as possible. What gets measured gets done.

Cost per i18n Bug – Case Study with Mature Localization Practices

In working with a new client, which is already quite mature in its localization and internationalization efforts, we had the opportunity to get actual ROI data, based on real internationalization bug fixing costs they had measured over 60 localized products. After cleansing that information of confidential data, they gave me permission to share it, though limited to results from 17 products.

Traditionally, they have been finding internationalization bugs during internal and external localization QA testing efforts, including both pseudo-localization (creating fake translations for testing purposes) and actual localization testing performed by both their organization and vendors. They counted five organizations touched by internationalization errors: Localization Vendor QA, Localization Project Management, Internal Localization QA, Product Development QA and Core Engineering. The process goes something like this:

  1. Internationalization bug is discovered and reported during Localization
  2. Project Manager tracks the bug, may enter or flag it in a bug tracking system
  3. Core Engineering, which likely has moved on to other efforts by now, must assign and fix the bug
  4. Product Development QA must verify the fix and any other issues the fix may have affected
  5. Additional Localization efforts may need to be made for the same issue

This iterative process gets pretty expensive. Remember that a maxim for software development is that the earlier you find and fix bugs, the less expensive they are. Fix a bug before a QA cycle, and you save multiple people from having to process that bug in some way and retest the solution. Need to fix a bug after release? Costs get much worse. This principle is a major contributor to the popularity of moving to agile development cycles, so that you are enhancing and verifying software in smaller, successful, less expensive cycles.

Our client figured on an average of 25 internationalization issue bugs per release, an average of 10 hours spent cumulatively by the five groups per bug, with an average of 60 releases per year over these 17 products. Some products had zero i18n bugs reported, others had over 100. The business case for finding internationalization issues in source code as part of regular automated processes integrated into their build cycle gets very clear at this level. They estimated savings of $420,000 per year, just on reducing localization QA costs. By finding the issues early, total product development savings were calculated to be over $760,000 per year.
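To get a feel for the scale involved, the averages quoted above can simply be multiplied out. The short sketch below does only that; the hourly rate in it is a hypothetical placeholder (the client's actual rates were not shared), so the dollar figure it prints is illustrative and is not meant to reproduce the savings estimates above.

```python
# Averages reported by the client in the case study above.
BUGS_PER_RELEASE = 25
HOURS_PER_BUG = 10
RELEASES_PER_YEAR = 60


def annual_i18n_bug_hours(bugs_per_release, hours_per_bug, releases_per_year):
    """Total hours per year spent by all groups on i18n bugs."""
    return bugs_per_release * hours_per_bug * releases_per_year


def annual_i18n_bug_cost(total_hours, hourly_rate):
    """Yearly cost for a given blended hourly rate (the rate is an assumption)."""
    return total_hours * hourly_rate


hours = annual_i18n_bug_hours(BUGS_PER_RELEASE, HOURS_PER_BUG, RELEASES_PER_YEAR)
print(hours)  # 15000

# HYPOTHETICAL_RATE is illustrative only; the article does not state a rate.
HYPOTHETICAL_RATE = 40
print(annual_i18n_bug_cost(hours, HYPOTHETICAL_RATE))  # 600000
```

Plugging in your own bug counts, hours and blended rate gives a quick first estimate before a more careful ROI study.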

Internationalization ROI Chart

Remember that even maturely localized products still have regular new release cycles, which in turn create the potential for new internationalization issues. Product development never really stops, and teams tend to be more broadly geographically distributed than ever before. That makes measurement tools all the more valuable for localization-savvy companies.

Cost per i18n Bug – Case Study, Product has Never Been Localized

When you consider companies engaging in early globalization efforts, the payback simply multiplies per product, as you can expect the i18n bug count to go way up. Without a tools-based approach to finding and fixing issues, internationalization becomes a heavily iterative process of trial and error. One can write a few scripts, which will take considerable time, research and effort, and still likely produce unreliable results. Then you can pseudo-localize display strings after you’ve found and externalized as many as you can, or populate the database with target encoding data. You would then test, test and test again while you hunt down the issues one by one in the source. This only multiplies the cost per i18n bug. By finding issues first at the source level, you can actually begin to orchestrate their correction, tying directly to each issue’s precise location within hundreds of thousands, or even millions, of lines of code. And that’s an intelligent way to find and remove a needle in a haystack.

The table below illustrates the costs of i18n bug iterations for a single product of about 500,000 lines of code during the first internationalization effort. This table doesn’t include additional costs of researching and implementing various scripts and homemade utilities to help the work get done. It also doesn’t take into account that a tool like ours actually isolates i18n issues, pinpointing them in source, while also facilitating batch externalization of strings – both very tedious and time-consuming activities. Consider that even a simple error message that gets missed using traditional scripts and trial and error may not show up until, at best, late QA efforts that force the error to appear or, worse, after product release. We commonly hear that it takes three or four localization releases to weed out those sorts of issues that get missed so easily. That is why this table lists a higher i18n bug rate for 2 subsequent releases than the table used for the localization-mature company earlier in this article.

Internationalization ROI Chart

Pitfalls and Adjustments

I think it’s fair to say that no tool offers a panacea. The strike against coding quality checkers in general has been complaints about over-reporting errors, often referred to as false positives. It’s true that if you overload a developer with data that is only partially relevant, that data risks being ignored. That is why any enterprise-scalable solution must include dynamic ways to filter results, share those filter controls and track them over time. You also must have flexible detection, so that you can add the unique parameters that invariably crop up and can be quite particular to a specific code base.

New processes may not be greeted with enthusiasm by development teams, which are typically already overtasked and under-resourced, so it’s important to help them understand the value of getting global releases out faster and with higher quality. Automating code checking and reporting during a regular process like a periodic build is an excellent way to track and highlight progress.

Enterprise Internationalization and Automation

There are some technology companies where thinking globally has been fundamental to their operations for years and years. I’m referring to companies like IBM, HP, Yahoo, Google and the like. These companies all made significant investments in their global infrastructure, sales teams, products, development and strategic planning. It didn’t happen by accident. And as these companies develop new products or acquire companies, they look to leverage them across that global infrastructure quickly and profitably. Global companies are good prospects for my company in our internationalization products and services business, because they tend to be more experienced in their understanding of engineering challenges, knowing that it takes people, tools, time and money to globalize software so that they can gain the best return on their product distribution and sales infrastructure.

One very potent way to make software globalization fundamental to a company’s mindset is to make internationalization a fully integrated and automated part of software development practices. There are all kinds of tools, checkers and environments to help developers create interfaces, access and transform all kinds of information buried in databases, support coding constructs, manage memory and perform application modeling. With that in mind, we’ve been hard at work with a major new Globalyzer release, clearly aimed at supporting entire development departments and enterprises, automatically using batch processes on servers to monitor internationalization progress as well as on the desktop where issues can be individually examined and fixed. While that has always been our aim, we’re now getting there in more robust ways that track internationalization status over time over multiple programming languages and even over multiple products.

For those non-developers reading this, let me explain what I mean about automation in this context. When engineers create code, they generally all submit their work to a code repository. This repository provides version control so that when multiple engineers are all working together, they can check code in and out and merge together all their changes. Then the code has to be put together and built. This build process usually occurs on some interval, such as nightly or even on a continual basis. During this automated process, you can also automatically check for many other issues like performance, load balancing, and I’m proposing that this is a great time to check on internationalization/localization readiness by running tools on the code automatically as a batch process, which then tracks issues via reports. Now counting issues is one thing, but you can go even further by showing exactly where a problem exists in the code, along with the context of the errant issue. That information can then be brought forward for quick review and fixing.
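For readers who want to see what such a batch check might look like, here is a deliberately tiny sketch. It is not how Globalyzer works; it only lists every string literal in a piece of Python source using the standard `ast` module, which is roughly the first step of hunting for embedded strings. The function name `find_embedded_strings` and the report format are assumptions made for illustration.

```python
import ast


def find_embedded_strings(source, filename="<source>"):
    """Return (filename, line, text) for every string literal in the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            findings.append((filename, node.lineno, node.value))
    # Sort by line number so the report reads top to bottom.
    return sorted(findings, key=lambda item: item[1])


sample = 'def greet(name):\n    return "Hello, " + name\n'
for path, line, text in find_embedded_strings(sample, "greet.py"):
    print(f"{path}:{line}: embedded string {text!r}")
```

A nightly build could run a check like this over the whole repository and track the count over time. A real scanner would also need to cover many programming languages and filter out strings that never reach the user, which is where most of the hard work lies.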

Two companies that come to mind doing this very thing are Intel and Yahoo. Michael Kuperstein of Intel, presenting at the WorldWare Conference in March, reported how his team developed their own internationalization toolkit a few years ago and has integrated it into many of their automated build processes. That automation has made internationalization an important and measured component of their ongoing development efforts. By Mike’s own admission, he would have used Globalyzer had he known about it years ago.

Mike McKenna of Yahoo also reported at WorldWare that his globalization team is using automation, in this case Globalyzer, to measure internationalization benchmarks on development teams.

On the localization product side, there are multiple tools for different aspects of managing words. But when it comes to products which support an enterprise in their software internationalization efforts, there is a pretty empty playing field. Aside from some very simple string externalization utilities in a few development environments and frameworks, our Globalyzer is simply the only commercial software I know of that can automatically monitor development over time over a wide range of programming languages, while also stepping entire teams through internationalization fixes in large amounts of code.

I’ve said a few times in my columns that I’ve found that it’s quite powerful to embrace the management principle that whatever gets measured gets done and improved over time. So it follows that one of the most important aspects of any software development undertaking is that you measure desired outcomes at regular intervals. If you just hope that it will all come together in the end, you always end up late and over budget. That is ultimately behind the agile and extreme programming development movements, in that you set more frequent intervals for measurement and goals. But it’s not so easy to track something like internationalization, either as a project where you are refactoring software for new globalization requirements or even for ongoing development. Consider that developers are typically overtasked, and often distributed across time zones and continents. Then factor in that internationalization can be quite subjective to a particular development task. Plus internationalization is a fuzzy thing, in that it is tailored to requirements, technologies and special cases. So development teams grapple with how to handle it, and make their way through the task by brute force – or simply postpone or avoid internationalization whenever possible. Issues get missed, and if you’re lucky, you have an iterative process during localization to fix internationalization bugs, which is a very expensive and time-consuming path. Or worse, development ignores the issues and calls it a localization problem.

I spoke with a company in just that situation last week. They were upset with their localization provider for poor quality, but when we examined some of the issues, there were also extensive internationalization mistakes that were sure to break localization context and execution. These included missed strings and extensive string concatenations. Had they been monitoring these efforts all along, and been clearer on internationalization requirements, they would have had better results and a clean release. The biggest costs to them were poor market entry, customer dissatisfaction and complaints from their distributors and sales teams which had to overcome a poorly localized release. Now I also feel that as vendors we have some responsibility in taking care of clients and not selling them a solution that risks poor quality and a weak market entry, so some blame also goes to the localization provider. But I hardly know what really happened, I was just there to offer help in picking up the pieces. Clearly that’s an expensive route in many ways.
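For those wondering why string concatenation in particular breaks localization, a small sketch may help. The message catalog, the German translation and the function names below are all hypothetical and are not taken from that client's code. The point is only that a concatenated sentence hard-codes English word order, while a single message with named placeholders lets the translator reorder it.

```python
# Hypothetical message catalog; in practice this would come from resource files.
CATALOG = {
    "en": {"files_deleted": "{user} deleted {count} files"},
    "de": {"files_deleted": "{user} hat {count} Dateien gelöscht"},
}


def concatenated(user, count):
    """Anti-pattern: English word order is baked into the code."""
    return user + " deleted " + str(count) + " files"


def localized(locale, user, count):
    """One complete, translatable sentence with named placeholders."""
    return CATALOG[locale]["files_deleted"].format(user=user, count=count)


print(concatenated("Ana", 3))     # Ana deleted 3 files
print(localized("de", "Ana", 3))  # Ana hat 3 Dateien gelöscht
```

In the concatenated version a translator only ever sees the fragments " deleted " and " files", with no way to move the verb to the end as German requires. This sketch ignores plural rules, which established libraries handle separately.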

Remember, internationalization is often run by a different crew than localization. Software developers are upstream from localization, and they are sometimes all too disconnected from final localized product releases. Localization is often someone else’s problem and engineers are focused on getting a release out with all its new features. They don’t know what they don’t know, which is only human. That leaves localization teams waving their arms around trying to get the developers to build software right the first time. And those teams likely have no way to measure whether the product they are tasked with localizing actually passes internationalization muster until they go through localization testing. Again, that’s very late and expensive in a software development process, and more often than not, localization testing tends to be underfunded and vendor dependent. You’re going to have trouble finding everything. So for localization teams, what I’m suggesting is to consider a kind of automated litmus test. When code comes to the localization group, scan the code for internationalization issues, and consider what’s found. The technology is now there to do this in detail and examine each potential issue, quickly and easily. So in the worst case, you can at least have engineering fixing internationalization bugs during the localization process rather than later, when it’s far more expensive.

Again, anything that measures and sheds light on the situation will also result in improvements. So if you want well globalized software, better start measuring how that code is developed, not just what it’s costing to localize it.

P.S. I’m thinking of writing a column on funny ways people fell into the localization business. If you have a good story you’d like to share, please contact me!

Corruption! Creating an ìèíèñòð Opportunity

by Adam Asnes, President, Lingoport
As appeared in Multilingual Magazine

Chances are you’ve seen corrupted data, but perhaps didn’t think too much about it unless you’re a localization engineer. Most people see it first in their spam, coming with promises of Euro-Lottery millions or other nefarious offers. The corruption evidence is in the square boxes or random nonsensical characters that fill the subject heading or email body, if you haven’t deleted it already. What’s happening is that somewhere along the way, or in your mail client, the character encoding the message is written in is not being supported. Obviously you wouldn’t feel very confident using a product, site or system that suffers this same issue, so it’s a clear defect. Sometimes you even see it when everything is still all English, most notoriously when somewhere along the way the software system you are using can’t process a simple apostrophe.

Remember that all data on computers ultimately breaks down to zeros and ones. These values are then interpreted to form characters and then strung together as words or symbols. Corruption occurs when the interpretation of the encoded zeros and ones does not form the intended character. For example, the application thinks the encoding of a character is ISO Latin-1 rather than UTF-8 and so displays the wrong character. We have run into several internationalization services customers over the years that have inadvertently corrupted character data buried within large databases. Here’s an example of how bad this can get:

Imagine your company is a world leader for building heavy machinery and construction equipment. You have a massive parts catalog. Over time, an unknown amount of data has experienced character corruption. The characters are no longer humanly readable. They look like gobbledygook. Or, you have a complex online customer management system with a large database of users and corresponding account information with broken character encodings sprinkled throughout.

In each case there are too many occurrences peppered throughout the data to review and manually decipher what the original intent of the content was. You can imagine the panicked conversations when the broken characters are discovered. “Oh σηιτ, look at this! How the φυχκ are we going to fix this!”

Often the instances are too scattered and it’s too difficult to roll back to previous versions of the data, as everything new would be lost, and it may not be known just when the character corruption might have started happening.

The corruption occurs in the first place when some step in the application, or in the process of handling or reviewing data, breaks the encoding. For example, developers may have implemented a web page form that isn’t properly set up to return data in the correct encoding. Another possibility is that someone manually imported new data into the database, but used an editor that is not set up to handle, say, UTF-8 encoding. The culprit might be as innocent as using Notepad incorrectly.
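The mechanics are easy to reproduce. The sketch below is a made-up illustration rather than anything from a client system: it writes a name as UTF-8 bytes and then reads those same bytes as if they were ISO Latin-1, which is the kind of mismatch a misconfigured form or editor introduces.

```python
def misread_utf8_as_latin1(text):
    """Encode text as UTF-8, then wrongly decode the bytes as ISO Latin-1."""
    return text.encode("utf-8").decode("latin-1")


print(misread_utf8_as_latin1("José"))  # JosÃ©
```

The single character é becomes two bytes in UTF-8, and each byte is then shown as its own Latin-1 character. Save that result back to the database and the damage is now stored, not just displayed.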

At this point, this conversation has happened with clients several times a year, and in every case, these clients already happened to be working with us in some capacity, whether on service projects or licensing our Globalyzer software. I suspect the problem isn’t actually all that uncommon. So we finally decided to take some of the advice I’ve been trumpeting in this column and productize some of our solutions. At the time of this writing, we haven’t decided on a product name yet, so we affectionately call this solution The Decombobulator. We’ll probably officially release it as something boring like db Ambassador, but we’ll always call it the Decombobulator internally because it sounds funnier. Check our website to find out if humor or practicality wins out (remember that we are probably the only company using an icon of a toilet plunger as part of an interface and utility names like PseudoJudo). In fact, I encourage you to contact me if you’d like to vote on it or suggest a better name.

So here’s how we solve this problem. The Decombobulator runs on your data or database, reviewing characters at the byte level and reporting the results. It then helps you compare character encoding to the intended encoding and then reports, suggests and helps automate the correction back to what the character was intended to be.

Here’s an example using corrupted names from a database which initially had problems with some cases of extended characters:

Internationalization tools can help prevent character encoding corruption.
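For the simplest kind of damage, UTF-8 text that was misread as ISO Latin-1, the repair idea can be sketched in a few lines. This is not the Decombobulator's actual method, which I am not describing in detail here; the function name is hypothetical and the sketch assumes every corrupted value went through exactly that one misreading.

```python
def repair_latin1_mojibake(text):
    """Undo UTF-8 text that was wrongly decoded as ISO Latin-1.

    Returns the text unchanged when it does not look like that kind of damage.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_latin1_mojibake("JosÃ©"))  # José
print(repair_latin1_mojibake("José"))   # José (already clean, left alone)
```

Real data is rarely this tidy. Values may have been corrupted more than once, or through different encodings in different rows, which is why reporting and human review of suggested fixes matter before anything is written back.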

I’ll add that we’ve seen strings that clients have submitted to their localization vendor which also have the same types of instances of corruption. Often this happens when someone opens a file, just to check that the data is there in the first place, but then saves it again without the proper character encoding settings. The localization firm then has a number of isolated strings, perhaps including past translations, which are now broken.

I’m not illustrating all this as a sales pitch. I somehow doubt we’ll sell very much of the Decombobulator, but for the people that need it, it will be a lifesaver. In fact, much of the development and productization of the Decombobulator happened without my knowledge and even in part against my intentions. One of our team just took it upon himself to take extra time while getting his other work done, to enhance what we had and put it together. I bring this all up because in your business, you likely encounter some problems just like this which are just begging for a repeatable and scalable approach that will make you a savior to your client or coworkers. And if you can repackage it for the benefit of your organization or clientele, you’ve just created a significant differentiating value. That’s what people love to buy, whether it’s you selling your continued employment or cementing a client relationship. This doesn’t mean you learn software development on the side if you’re not a developer. Every process presents its own opportunities.

The economy is rough out there. I won’t bother parroting what you’re no doubt reading. It may be that one of the few bright spots is still the language services and technology industry. I talk to quite a few CEOs of localization companies and they all seem to be reporting that business is holding up, but they are crossing all their fingers and toes that it stays that way. If I were in the automobile or furniture business in the US, I’d be beyond scared. But the fact is that the entire language computing industry directly connects to helping technology firms make more money. Notice I didn’t say save money. While that’s important too, making money always wins. So the way that we differentiate ourselves, for our industry and for our clients and co-workers, is by innovating in ways that get work done faster, better and cheaper, so that someone can sell something more effectively anywhere in the world. And that’s just great business.

Internationalization Management Tips: 10 Mistakes to Avoid

It’s extremely common for us to work with clients who have had a bumpy past with regards to internationalization. Sometimes you have to learn things the hard way, but that is always expensive.

In the past I’ve written about ten tips for managing internationalization projects. Here’s a look at mistakes that I’ve commonly seen repeated on the client side. In our services practice at Lingoport, we often have to counsel our clients through one or more of these sorts of process issues, which is actually a very rewarding part of what we do. While this list is pretty high level, we’ve seen that the processes involved can set up cascading failures that eventually can have a serious impact on a project’s success. Some apply more to internationalization of existing applications; others can apply to development where internationalization is planned in from the point of conception (still kind of a rare thing, but gaining).

So, here are 10 internationalization process mistakes to avoid:

1. Don’t forget what drives internationalization:

Money on the top and bottom lines of your company’s balance sheet. The point here is that the costs of being late or lousy endure way beyond benefits of cutting corners on development. Internationalization happens because of a:
a. New customer(s) sale
b. New partnership
c. Strategic initiative backed by marketing, legal and other types of efforts and investments

2. Don’t assume internationalization is just an older software legacy issue.

It comes up surprisingly often that people even in our industry think that internationalization is mainly an issue for older applications. No framework, whether it’s J2EE, .Net, Ruby on Rails, PHP or whatever is new and improved, internationalizes itself. You still need to do all the steps necessary to implement locale and all the associated internationalization practices. Many newer programming platforms do an excellent job of internationalization support, which is great news as you can estimate and execute with a higher degree of accuracy. But you still have plenty of work to do.

3. Don’t assume you can treat internationalization like any other feature improvement when it comes to source control management.

With internationalization, source control can need an extra step of thinking things through. It’s very typical for new feature development and bug fixing to be going on in parallel with internationalization efforts. However, in the process of performing internationalization, you are going to be breaking major pieces of functionality within your application as you make large changes to your database and other application components. In order for respective developers to work on their own tasks and bugs, you typically need to branch code, often with specifically orchestrated code merges.

4. Don’t assume internationalization is just a string externalization exercise.

String externalization is important and highly visible, but the scope of internationalization includes so much more. For example: creating a locale framework, character encoding support, major changes to the database, and refactoring of methods/functions and classes for data input, manipulation and output. How these are all approached varies greatly based on requirements and technologies.

5. Don’t wing it on Locale

Designing how locale will be selected and managed often doesn’t get the amount of thought and planning deserved. How the application interacts with the user, detects or selects locale, and then how it correspondingly behaves is a design process needing input from an experienced architect, product marketing and the development team. This is not an area to be chosen by any one representative by fiat. It’s a whole lot of work to redo locale if it’s executed inadequately for user, business and locale requirements.

6. Don’t create your very own internationalization framework

Don’t even do it if you think you know better. We regularly run into clients who have half-way implemented internationalization using their own homegrown methods for string extraction and locale management when there were already well-established methods provided within their programming language framework or established solutions like ICU. Using these will ensure that your code is far easier to maintain, and you’ll know that thousands of applications have used them successfully before you. No unpleasant surprises.

7. Don’t think that the team internationalizing your software can work without a working build

This seems obvious, but it comes up a lot. Without a working build, the developers can’t smoke test the changes they are making. Even if you provide a dedicated QA person, my own experience is that developers need to be able to compile and run the software themselves to head off problems later. It’s too hard to rely on reconstructing coding errors at a later time, and doing so makes for unnecessary bug fixing iterations, lost time and poor quality.

8. Don’t run out of money

Internationalization planning often suffers from underscoping. At Lingoport, we have both software and well-established methodologies for estimating internationalization, as we really don’t want to ever break this rule and have to ask our clients for more funding. The same should hold true for internal efforts. Lapses in funding can cause expensive delays, as new funding takes more time than anyone imagined to get approved. It also reduces management credibility. And chances are, if you need to ask for more money, then you also need more time, which brings you back to the consequences described in tip #1.

9. Don’t use a half thought-out character encoding strategy

Use Unicode, rather than native encodings. If you have budget and time constraints and you’re only targeting dominant languages in markets like Western Europe, North and South America, you can often get away with ISO Latin-1, but even for Eastern European languages, go Unicode. Then when you do, make sure your encoding works all the way through the application. And don’t forget that if your customer needs to support worldwide customers themselves (e.g. enterprise software), they may need you to support Unicode data processing even if the interface remains in English. One more consideration tilting toward Unicode is that programming languages like C# and Java already internally pass strings and data as Unicode, so you might as well think about engineering for the world.
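A quick way to see where ISO Latin-1 runs out is to try encoding a few names. The helper below is a hypothetical illustration, with sample strings chosen by me, showing that a German name fits in Latin-1 while a Polish one does not.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode("Müller", "latin-1"))  # True  (German fits ISO Latin-1)
print(can_encode("Łódź", "latin-1"))    # False (Polish does not)
print(can_encode("Łódź", "utf-8"))      # True
```

A check like this on sample customer data is a cheap way to test whether a "Western Europe only" encoding decision will survive first contact with real names and addresses.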

10. Don’t use your same testing plan, or just rely on localization testing, when your functional testing needs to grow to include internationalization requirements

In our services projects, we always put special emphasis on working through pseudo-localization of not only the interface, but also test data: sending data using target character sets, locale-altered date/time formats, phone numbers and more, from data input to database, to reports and so on. If your testers are English-only speakers, that’s fine. For example, we have a utility in Globalyzer, PseudoJudo, that puts target language buffer characters around English strings. You can expand data fields to fit physically longer strings, giving room for translation changes in sizing as well as encoding.

11. Bonus Tip: Don’t assume localization is just someone else’s problem

It’s funny how many of our customers are strictly concerned with software development and don’t actually have anything to do with localization processes. We always work to bring localization into the internationalization effort. We do this by interfacing with localization resources early on, helping them understand the technical requirements and then feeding translators strings that we extract on the front end of projects, so that when internationalization functional testing is done, we are immediately ready to perform linguistic translation testing and ultimately deliver a finished product. This compresses times to global release, while also making for a more fluid process, fewer programming iterations and higher quality.

Understanding Internationalization Stakeholders

by Adam Asnes, President, Lingoport
As appeared in Multilingual Magazine

In pretty much all of our client engagement opportunities at Lingoport, we quickly arrive at a common discrepancy in how people within organizations view the decision process for internationalization and localization. On the one hand you have a VP or CEO saying, “We must have this product ready for such and such market by year end!” and on the other extreme, you might have an engineer plotting out her decision process based on technical, task-oriented details – like locale frameworks, database changes and the like. One mindset is event or strategy driven. The other is focusing on the minutiae of the process. Neither approach is wrong, but I always feel the client is best served when both mindsets come together.

When a company internationalizes its software, it is fundamentally changing its world view, from the status quo of selling what it has for its home market to adapting software to work gracefully in any language or locale. It’s a strategic vision or customer request that brings this about. Or in many cases, a company may have even been localizing product support information for many years while selling its software in an English-only version, and recognizes it needs to correct that weakness. Fortunately for us, internationalization is becoming less of a surprise process as executive understanding of software globalization has been maturing.

Globalization is a hot strategic subject for just about every business conference these days. Competition worldwide is tougher, and overall world demand for software is up, so the globalization impetus is hardly visionary any longer. I like to broadly summarize internationalization drivers as:

      • The boss went to a conference/board meeting/gathering and sees that he/she must move forward more aggressively with supporting global software sales
      • Or, the company has a big new client/partner/joint venture opportunity, but it requires that the software work in another or several languages.
      • A competitor is successfully entering new markets with an internationalized product and the company must catch up to compete
      • Or, the company is already quite global but is purchasing another company which is not, and needs to get the software adapted as quickly as possible.
      • The company has a global view, but developed software quickly and as such, let internationalization go in favor of getting to market quickly. The product has proven successful and it’s time to roll it out.

The same company, just depending upon the business unit or product team, may fit into some level of all these business drivers.

Executive View

The executive team will be concerned about the balance of issues regarding delivery time, marketing, sales and personnel expenses, setting up offices/distributors/partners, legal and tax issues, and more, countered against revenue projections. Internationalization for them is getting the product ready so that it supports revenues, global logistics and strategies. It’s a key part of the deliverable, though clearly a means to a carefully projected outcome.

Engineering View

I have yet to meet the VP of Engineering, or any engineer for that matter, who wakes up one morning and thinks, “Gee, I think I’ll internationalize our software because it would be cool!” Engineering is in general overtasked, shorthanded, time critical and primarily responsive to documented marketing requirements. New feature functionality, on the other hand, is occasionally trail blazed by engineering even before marketing clearly understands a need. For most engineers, internationalization is revisiting development they’ve already done, and breaking it, only to be rebuilt again. That’s seen differently than a new feature.

Engineering will view internationalization as a technical objective and use case, deconstructing it into tactical steps. As a rule, engineers are really smart people, so they go about figuring out how to internationalize their code, but often with little or no previous internationalization experience. So they intensively hit the books and Google. Here at Lingoport, after internationalizing so many applications over so many programming languages, we are still learning with every implementation, but the bank of knowledge has become quite deep. When internationalizing a complex software system for the first time, the engineers will almost certainly mis-scope part of the effort, make some mistakes, endure some poor assumptions and run late. That has the potential to sabotage the plans that the executive team is counting on. This is where, at a minimum, getting some educated advice, tools and assistance can be highly effective in meeting broader market release goals and obligations.

On top of that, engineering time is never free or infinitely available, though sometimes both these conditions are initially assumed. The development team requires salaries and other support. Engineering production also has an important opportunity cost. Does the team work on new features for their current clientele in markets where they are already strong, or do they take a “time out” on new feature development to engage in a full-on internationalization effort? You can rarely have both going on at the same time unless you bring in outside help, with well-coordinated project management and a good source control strategy.

I consider it part of our job, when working with clients, to bring together the executive and engineering criteria, so the strengths of both are considered and all stakeholders are educated and can have a predictable outcome. This makes a foundation for stronger individuals, teams, products and companies.

Unicode and Internationalization Primer for the Uninitiated

Among our friends and clients at Lingoport, we regularly see everything from confusion to a complete lack of awareness of what Unicode is. So for the less- or under-informed, perhaps this article will help. The advent of Unicode is a key underpinning for global software applications and websites so that they can support worldwide language scripts. So it’s a very important standard to be aware of, whether you’re in localization, an engineer or a business manager.

Unicode and Internationalization

Firstly, Unicode is a character set standard used for displaying and processing language data in computer applications. The Unicode character set is the entire world’s set of characters, including letters, numbers, currencies, symbols and the like, supporting a number of character encodings to make that all happen. Before your eyes glaze over, let me explain what character encoding means. You have to remember that for a computer, all information is represented in zeros and ones (i.e. binary values). So if you think of the letter A in the ASCII standard of zeros and ones, it would look like this: 1000001. That is, a 1, then five zeros and a 1, to make a total of 7 bits. This binary representation for A is called A’s code point, and this mapping of zeros and ones to characters is called the character encoding. In the early days of computing, unless you did something very special, ASCII (7 bits per character) was how your data got managed. The problem is that ASCII doesn’t leave you enough zeros and ones to represent extended characters, like accents and characters specific to non-English alphabets, such as you find in European languages. You certainly can’t support the complex characters that make up the Chinese, Korean and Japanese languages. Supporting all of these languages requires 8-bit (single-byte) or 16-bit (double-byte) character encodings. One important note on all of these single- and double-byte encodings is that they are supersets of the 7-bit ASCII encoding, which means that English code points will always be the same regardless of the encoding.
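To make the letter A example concrete, here is a minimal Python sketch. The helper name fits_in_ascii is hypothetical and only there for illustration; the built-ins ord, format and str.encode are standard Python.

```python
# The code point of "A" and its 7-bit ASCII pattern.
code_point = ord("A")               # the number assigned to the character
bits = format(code_point, "07b")    # the same number written as 7 binary digits

# "A" is stored as the same single byte in ASCII, ISO-8859-1 and UTF-8,
# which is what "superset of ASCII" means in practice.
ascii_bytes = "A".encode("ascii")
latin1_bytes = "A".encode("iso-8859-1")
utf8_bytes = "A".encode("utf-8")


def fits_in_ascii(text):
    """Hypothetical helper: True if every character is in the 7-bit range."""
    return all(ord(ch) < 128 for ch in text)


print(code_point, bits)
print(fits_in_ascii("cafe"), fits_in_ascii("café"))
```

The second print shows why 7 bits run out quickly: a single accented letter is already outside the ASCII range.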

The Bad Old Days

In the early computing days, specific single- and double-byte character encodings were developed to support various languages. That was very bad, as it meant that software developers needed to build a version of their application for every language they wanted to support that used a different encoding. You’d have the Japanese version, the Western European language version, the English-only version and so on. You’d end up with a horde of individual software code bases, each needing its own testing, updating and ongoing maintenance and support, which is very expensive, and pretty near impossible for businesses to realistically support without serious divergence among the various language versions over time. You don’t see this problem very often for newly developed applications, but there are plenty of holdovers. We see it typically when a new client has turned over their source code to a particular country partner or marketing agent which was responsible for adapting the code to multiple languages. The worst case I saw was in 2004, when a particular client, whom I will leave unmentioned, had a legacy product with 18 separate language versions and no longer had any real idea how much the functionality varied from language to language. That’s no way to grow a corporate empire!

ISO Latin

A single-byte character set that we often see in applications is ISO Latin 1, which is represented in various encoding standards such as ISO-8859-1 for UNIX, Windows-1252 for Windows and MacRoman on guess what platform. This character set supports characters used in Western European languages such as French, Spanish, German, and U.K. English. Since each character requires only a single byte, this character set provides support for multiple languages, while avoiding the work required to support either Unicode or a double-byte encoding. Trouble is, that still leaves out much of the world. For example, to support Eastern European languages you need to use a different character set, often referred to as Latin 2, which provides the characters that are uniquely needed for these languages. There are also separate character sets for Baltic languages, Turkish, Arabic, Hebrew, and on and on. When having to internationalize software for the first time, companies will sometimes start with just supporting ISO Latin 1 if it meets their immediate marketing requirements, and deal with the more extensive work of supporting other languages later. The reason is that these software applications will likely need major reworking of the encoding support in their database and in the functions, methods and classes within their source code to go beyond ISO Latin support, which means more time and more money – often cascading into later releases and foregone revenues. However, if the software company has truly global ambitions, they will need to take that plunge and provide Unicode support. I’ll argue that if companies are supporting global customers, even without doing a bit of translation/localization for the interface, they still need to support Unicode so they can process their customers’ global data.
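The limits of the single-byte Latin character sets can be probed with a few lines of Python. This is only a sketch: the can_encode helper and the sample words are made up for illustration, while the codec names iso-8859-1, iso-8859-2 and utf-8 are standard Python codecs.

```python
def can_encode(text, encoding):
    """Hypothetical helper: True if every character of text exists in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative sample words, one per language.
samples = {
    "French": "déjà vu",
    "Polish": "żółw",
    "Japanese": "日本語",
}

encodings = ("iso-8859-1", "iso-8859-2", "utf-8")

# support[language][encoding] -> can this encoding hold the sample text?
support = {
    language: {enc: can_encode(text, enc) for enc in encodings}
    for language, text in samples.items()
}

for language, row in support.items():
    print(language, row)
```

The French sample fits in Latin 1, the Polish sample needs Latin 2, and the Japanese sample fits in neither, while the Unicode encoding handles all three.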


We come back to Unicode, which, as we mentioned above, is a character set created to enable support of any written language worldwide. Now you might find a language or two lacking Unicode support for its script, but that is becoming extremely rare. For instance, at the time of writing, Javanese, Loma, and Tai Viet are among the scripts not yet supported. Arcane until you need them, I suppose. I remember a few years ago when we were developing a multilingual site which needed support for Khmer and Armenian, and we were thankful that Unicode had just added their support a few months prior. If you have a marketing requirement for your software to support Japanese or Chinese, think Unicode. That’s because you will need to move to a double-byte encoding at the very least, and as soon as you go through the trouble to do that, you might as well support Unicode and get the added benefit of support for all languages.


Once you’ve chosen to support Unicode, you must decide on the specific character encoding you want to use, which will depend on the application requirements and technologies. UTF-8 is one of the commonly used character encodings defined within the Unicode Standard. It uses a single byte for each character unless it needs more, in which case it can expand up to 4 bytes. People sometimes refer to this as a variable-width encoding, since the width of the character in bytes varies depending upon the character. The advantage of this character encoding is that all English (ASCII) characters will remain as single bytes, saving data space. This is especially desirable for web content, since the underlying HTML markup will remain in single-byte ASCII. In general, UNIX platforms are optimized for UTF-8 character encoding. Concerning databases, where large amounts of application data are integral to the application, a developer may choose a UTF-8 encoding to save space if most of the data in the database does not need translation and so can remain in English (which requires only a single byte per character in UTF-8 encoding). Note that some databases have lacked UTF-8 support; Microsoft’s SQL Server, for one, did not offer it at the time of writing.
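The variable width of UTF-8 is easy to see by counting bytes. In this minimal Python sketch, the utf8_width helper and the sample characters are illustrative choices, not part of any standard API.

```python
def utf8_width(ch):
    """Hypothetical helper: number of bytes UTF-8 uses for one character."""
    return len(ch.encode("utf-8"))


# One character from each width class: ASCII, accented Latin, CJK, emoji.
widths = {ch: utf8_width(ch) for ch in ["A", "é", "日", "😀"]}
print(widths)

# HTML markup stays single-byte; only the non-ASCII content grows.
html = "<p>日本語</p>"
markup_bytes = len("<p></p>".encode("utf-8"))
total_bytes = len(html.encode("utf-8"))
print(markup_bytes, total_bytes)
```

The seven markup characters cost seven bytes, and the three Japanese characters account for the rest.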


UTF-16 is another widely adopted encoding within the Unicode standard. It assigns two bytes for each character whether you need them or not. So the letter A is 00000000 01000001, or 9 zeros, a one, followed by 5 zeros and a one. If more than 2 bytes are needed for a character, four bytes can be combined; however, you must adapt your software to be capable of handling this four-byte combination. Java and .NET internally process strings (text and messages) as UTF-16.
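The two-byte and four-byte cases can be inspected directly. The following Python sketch uses the standard utf-16-be codec (big-endian, no byte order mark); the helper name utf16_code_units and the emoji sample are assumptions made for illustration.

```python
def utf16_code_units(ch):
    """Hypothetical helper: the 16-bit units UTF-16 uses for one character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# "A" takes one 16-bit unit: 00000000 01000001.
a_bits = " ".join(format(b, "08b") for b in "A".encode("utf-16-be"))
print(a_bits)

# A character outside the 16-bit range takes two units (a surrogate pair),
# which is the four-byte combination software has to handle explicitly.
emoji_units = utf16_code_units("😀")
print([hex(u) for u in emoji_units])
```

Code that assumes one 16-bit unit per character will count the emoji as two characters, which is the kind of bug the four-byte case introduces.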

For many applications, you can actually support multiple Unicode encodings so that, for example, your data is stored in your database as UTF-8 but is handled within your code as UTF-16, or vice versa. There are various reasons to do this, such as software limitations (different software components supporting different Unicode encodings), storage or performance advantages, etc. But whether that’s a good idea is one of those “it depends” kinds of questions. Implementing it can be tricky, and clients pay us good money to solve this.
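As a rough illustration of mixing encodings, the sketch below converts text between a UTF-8 storage form and a UTF-16 in-memory form. The function names and the sample string are hypothetical, and a real application would normally rely on its database driver or runtime to do this conversion.

```python
def utf8_to_utf16(stored):
    """Hypothetical helper: convert UTF-8 bytes (storage) to UTF-16 bytes."""
    return stored.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(in_memory):
    """Hypothetical helper: convert UTF-16 bytes back to UTF-8 for storage."""
    return in_memory.decode("utf-16-le").encode("utf-8")


original = "Grüße, 世界"                 # illustrative sample text
stored = original.encode("utf-8")        # as it might sit in a database column
in_memory = utf8_to_utf16(stored)        # as a UTF-16 runtime might hold it
round_trip = utf16_to_utf8(in_memory)

print(len(stored), len(in_memory))
print(round_trip == stored)
```

The conversion is lossless in both directions because both encodings cover the same Unicode characters; what differs is the byte count and which components can read which form.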

Microsoft’s SQL Server is a bit of a special case, in that it supports UCS-2, which is like UTF-16 but without the 4-byte characters (only the 16-bit characters are supported).
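One way to reason about the UCS-2 restriction is to check whether text stays within the 16-bit range. This Python sketch is a simplification: the fits_in_ucs2 helper is hypothetical, and it only tests code point values rather than modeling any particular database.

```python
def fits_in_ucs2(text):
    """Hypothetical helper: True if every character fits in a single 16-bit unit."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(fits_in_ucs2("日本語"))   # common CJK characters are within the 16-bit range
print(fits_in_ucs2("😀"))       # emoji need the 4-byte form that UCS-2 lacks
```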

GB 18030

There’s also a special-case character set when it comes to engineering software intended for sale in China (PRC), which is required by the Chinese Government. This character set is GB 18030, and its encoding covers every Unicode code point, supporting both simplified and traditional Chinese. Similarly to UTF-16, the GB 18030 character encoding allows 4 bytes per character to support characters beyond Unicode’s “basic” (16-bit) range, and in practice supporting UTF-16 (or UTF-8) is considered an acceptable approach to supporting GB 18030 (the UCS-2 encoding just mentioned is not, however).
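Python ships a gb18030 codec, so the byte widths and the round trip with Unicode can be checked in a few lines. The sample characters and the helper name gb18030_width are illustrative assumptions.

```python
def gb18030_width(ch):
    """Hypothetical helper: number of bytes GB 18030 uses for one character."""
    return len(ch.encode("gb18030"))


# An ASCII letter, a common Chinese character, and a character beyond
# the 16-bit range, which needs the four-byte form.
gb_widths = {ch: gb18030_width(ch) for ch in ["A", "中", "😀"]}
print(gb_widths)

# Text survives a Unicode -> GB 18030 -> Unicode round trip unchanged.
text = "简体 繁體 😀"
round_trip = text.encode("gb18030").decode("gb18030")
print(round_trip == text)
```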

Now, all of this considered, a converse question might be: what happens when you try to make your application support complex scripts that need Unicode, and the support isn’t there? Depending upon your system, you get anything from meaningless gibberish, where data or messages become corrupted characters or weird square boxes, to an application crash that forces a restart. Not good.
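Both failure modes, garbled text and an outright error, can be reproduced in a small Python sketch. The sample strings are illustrative; the codecs and the errors="replace" option are standard Python.

```python
# Garbled text: UTF-8 bytes misread as ISO-8859-1 turn one character into two.
garbled = "é".encode("utf-8").decode("iso-8859-1")
print(garbled)

# Data loss: forcing Japanese into ASCII replaces every character with "?".
lossy = "日本語".encode("ascii", errors="replace")
print(lossy)

# Crash: without a fallback, the same conversion raises an exception.
try:
    "日本語".encode("ascii")
    crashed = False
except UnicodeEncodeError:
    crashed = True
print(crashed)
```

Which of the three you see in a real application depends on where the unsupported conversion happens and how errors are handled there.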

If your application supports Unicode, you are ready to take on the world.

The State of Continuous i18n & L10n Survey Results