What If Internationalization Expectations Exceed Your Budget? – Significantly

Note: This article is featured in the June 2010 issue
of MultiLingual Computing Magazine, in Adam Asnes’ Business Side column.

If you’re considering internationalizing a large and complex software product, there’s one thing you should be prepared for: it’s expensive. There’s just no way around it if you want an application that properly presents, inputs, transforms and reports complex data. I’m talking about applications measured in the hundreds of thousands to millions of lines of code. Seriously, you’re just not going to internationalize a sizeable application that you’ve taken years to develop with money just lying around – unless you have a lot of money lying around, which is pretty rare these days. But before we consider what to do about it, let’s consider the main reasons why you may need to internationalize:

Survival –

Your customers are increasingly global, and perhaps they use your product to reach their customers. If you’re not internationalized, you’re limiting their business. The competition and your customers will know this and will eventually eat your company alive. You’d better start finding some money.

A Sale –

There is nothing like an important customer to get an initiative moving. If this sale funds the internationalization effort, it makes things easier, though the commitment will extend beyond any one customer. I’ve written before about how changing your encoding will change your company. But if this sale doesn’t pay for the effort, a corporate initiative will be needed.

Your company is global –

Perhaps your company is a global brand and you’ve quickly developed or acquired a product that isn’t internationalized. In this case, the decision to internationalize is usually simple. You do it because you already have a global reputation, sales and distribution. If you have to justify ROI, then either somebody is missing the point, there’s a temporary issue, or the product isn’t showing promise.

Strategic Initiative –

This article isn’t going to be about all the strategic benefits of growing global revenues with products that leverage themselves worldwide, because you know all about that, right? But acting on strategy takes foresight, money, expertise and perseverance.

If you have any of the above situations except budget, this article is especially for you.

I’ll repeat a situation I’ve seen many times. My firm, Lingoport, will be called upon for initial consulting as a company is considering internationalization in reaction to a declared strategic objective to gain business outside a home market. They usually have one or two customers asking for just that, but perhaps there isn’t enough initial interest to finance the necessary development and localization. We go back and perform static analysis on the code using our Globalyzer software, counting the embedded strings, locale-limiting methods/functions/classes and programming patterns that will need attention and refactoring, combined with architectural changes to support locale and changes in processing.

Even with automating tasks for batch efforts like string externalization (after analysis), you still have design, engineering and testing cycles that add up to significant expense. At this point we find out just how strong corporate global resolve really is. And in some cases that resolve is just not quite ready. It’s not a lost cause by any means. In fact, almost always, it’s just a matter of time and resources, and most come around in future quarters or fiscal years. But therein lies the gap for development managers.

Rarely do developers internationalize software just because it would be cool. You do see that kind of initiative for new features, where a developer might get an idea, work on it during odd or even personal time, and voilà, present it to his or her company peers. I have yet to see that happen with internationalization (write me if you see otherwise). Still, developers and management often know the need to internationalize is there, ready to become a firm requirement any quarter now. They can continue to develop new features and update current code without going near internationalization – but that actually increases the scope of the internationalization effort as the code base grows. Or they can take some simple steps to get ready. To use an expression, “When you find yourself in a hole, stop digging.” Here’s a brief list of what you can do:

  • Gather requirements – new locale requirements go much further than which languages need to be supported. An architect can be tasked with learning about issues like character encoding and locale frameworks. A product marketing person can learn a bit about use cases and business logic that may alter how the product behaves in new countries. It is all too easy to underestimate the requirements phase. Locale behavior involves quite a bit more than just string externalization. Start tallying and recording what is found in a centrally available resource, such as the company wiki, for all to build upon and learn from.
  • Prototype a string retrieval method. Learn about resource files and string IDs and how to make them work. Again, list your results in the company wiki.
  • Do a little reading about Unicode and its various encodings, along with appropriate technologies for their use. It’s not enough to commit to using Unicode. You have to gain some understanding of just what that means.
  • Consider your database schema and how that might change for locale support along with likely changes to character encoding.
  • Consider any third party components or open source you use within your application. Start inquiring about their internationalization support.
  • Consider internationalizing a pilot effort or component of your software if your product architecture will permit it. There’s nothing like learning by doing. And if you decide to take a somewhat different approach later, it probably won’t be too difficult to alter what you’ve already done.
  • Refine your planning – as you learn more, your planning efforts are likely to get clearer. As plans get clearer, they seem less risky and large. You’ll be in a better position to defend expected costs, resources and schedules.
  • Consider application logic. Does your software manage a process that is performed differently around the world?
  • Talk with experts – It’s not prudent to try to reinvent the internationalization process. An experienced expert, who’s really been through multiple implementations rather than just advising, can get you prepared faster and cheaper than relying on your internal developers alone. I’ve seen companies create their own proprietary approaches that ultimately get in the way of a successful implementation. Initial consultation shouldn’t be a budget buster. There are also free internationalization webinars (we give them, and others do too) and excellent conferences available (e.g. Worldware and the Unicode Conferences).
  • Start measuring toward your expected outcome – If you establish internationalization development practices and measure benchmarks, you are likely to see improvements to new development without significant cost in time and money. Static analysis tools like Globalyzer create a systematic approach, but if there’s no budget, then a simple and clear inclusion of practices and expectations can go a long way.
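Several of the items above – resource files, string IDs and a string-retrieval method – can be prototyped in an afternoon. Here is a minimal sketch in Python; the catalog contents, key names and fallback policy are all illustrative, not a prescription:

```python
# Minimal string-retrieval prototype: messages live in per-locale
# catalogs and code looks them up by ID instead of embedding text.
MESSAGES = {
    "en_US": {"greeting": "Welcome", "cart.empty": "Your cart is empty"},
    "fr_FR": {"greeting": "Bienvenue", "cart.empty": "Votre panier est vide"},
}

def get_string(string_id: str, locale: str, fallback: str = "en_US") -> str:
    """Return the message for string_id, falling back to the default locale
    when the requested locale or ID is missing."""
    catalog = MESSAGES.get(locale, MESSAGES[fallback])
    return catalog.get(string_id, MESSAGES[fallback][string_id])

print(get_string("greeting", "fr_FR"))    # Bienvenue
print(get_string("cart.empty", "de_DE"))  # falls back: Your cart is empty
```

Even a toy like this surfaces the real design questions early: what your string IDs look like, how fallback should behave, and where the catalogs live.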

If you do at least some of this before a highly likely internationalization requirement is actually funded, you’ll be a tremendous asset to your firm’s globalization efforts. And globalization might just be one of the more significant and company-making undertakings your firm can embark upon.

Innovation, Rejection and Overcoming Pitfalls

We pay a great deal of attention to innovation and sing its praises. But actually the road to creation, improvement and acceptance is messy and full of pitfalls. Innovation is often hard to recognize and to assign value to, at first. More often than not, its introduction doesn’t live up to everyone’s expectations. But still it leaps forward, gracefully or not. I think it’s worth considering innovation more closely, given my own trials of bringing software to market, as well as watching the current industry public opinion mêlée regarding crowdsourcing.

Innovation promises great leaps forward. It offers hopeful and seemingly wondrous shortcuts and economies to everything it touches. It’s a new way, maybe audaciously conceived, and often tricky to execute. It’s also a fundamental pedestal for all we do. And many of us, if we are perseverant and lucky, are actually in the business of being innovative. But innovation always faces initial rejection. It’s just part of the deal.

There’s the promise of dramatic improvement, the skepticism, disappointment and persistence that we find so addicting. So I think it’s worth the time dissecting that process a bit, so we can all benefit a bit more from understanding the inventor, while bringing ourselves forward in ways we can apply to our professional and personal lives.

Great Leaps and Incremental Improvements

I recently read an article proclaiming that to call something an innovation requires a 10x improvement in a process, expense or service. I rather like the idea of putting a numerical value on innovation, as it sets a target standard to aim for. I can ask: does my product provide that 10x improvement? That’s a demanding figure! However, I don’t think you can discount innovation that isn’t as startling.

Some innovations – think of the printing press and, more recently, the internet – offer astronomical gains in productivity and information access across society. Going to the library to research has become a quaint activity, with power usurped from librarians everywhere. The internet becomes our personal assistant, advertising vehicle and even a translator. That doesn’t mean incremental improvements aren’t important. Actually, I think the two are implicitly married, and that one doesn’t persist towards adoption without the other. Broadly applied innovation has an ecosystem of technologies, users and materials. For example, improvements in virus protection probably don’t have a 10x multiplier on internet use, but they do have a cumulative effect on the browsing behavior of the people who adopt that protection. Think of the distinction in terms of game changing versus solving serious pitfalls. Both are important to success and adoption.

Now it also seems that with innovation you necessarily encounter a sociological resistance that must be overcome to be optimally successful. An example from my mid-’90s past that we’d consider small-minded now: I had to lobby a particular VP to grant internet access to salespeople so they could research customer sites. The establishment fear was that people would spend all day surfing inappropriate sites, taking away from productivity. I can’t imagine an information technology company in that science-focused business applying the same reasoning any longer.

People, particularly from my generation or older, discount social media and blogging, but these are actually fairly effective and potent forms of circulating news. Yes, many may not want all the minutiae that comes with them, but they can be used quite powerfully and personally when used well.

In a more pedestrian example, I often hear about how code analysis tools won’t work, particularly applied to internationalization, even when there’s apparent proof in project and customer success that they do. I consider it a badge of honor that a leading localization company featured in their blog how internationalization tools are a myth. They all but called out my company’s product by name. Yet an open mind and some actual research – or even a phone call – would have shown more of an embrace of the possibility of improvements that actually help the whole industry. People are all too happy to kill off innovation without serious thought or investigation, based on their past experiences. In other words, past attempts were unsuccessful before, so we’ll assume nothing could have changed. The blog post even cited products that have been extinct for years as evidence. It’s a small example, but this is how reactionary thinking plays out in management – thinking that can be damaging in an information industry rooted in advancing technologies and development methods.

Where Innovation Comes From

I haven’t noticed a clear path to an innovation process, but what I do know is that ideas are common, good ideas are rare and good ideas followed with action are rarer still. A dynamic individual may have or come across what many would feel is a good idea, about 4 to 8 times per year – some people much more, some less. Ideas are always fun and exciting to me, but I confess to only following up on a few. The rest of creativity goes into tweaking current projects, or reading and learning and bringing those ideas into everyday activities.

Since there isn’t really any value in a creative or innovative idea without follow-through, there is nothing wrong and everything to gain by running with someone else’s innovative idea or improvement. You just have to keep an open mind to where it may come from.

Big ideas can come from the top down or bottom up. But incremental improvements more typically come from your everyday users or developers living with a product every day.

For instance, an ongoing challenge for us in our Globalyzer product is that when our clients first apply it to perform static analysis on their code, they often end up with what we refer to as false positives. That is, the product will flag internationalization issues, and in particular embedded strings, which may actually be programmatic elements such as debug statements or database queries. We developed rules-based filters and a back-end database to minimize, catch and tag them, but they typically need some adaptation and customization for each code base. That’s fine, to be expected and managed, and even a strength of the system – but what if there were another way?

And in fact a junior programmer/intern at my company, doing a lot of code scanning for service projects, made a simple remark: “What if we compared those strings to an actual dictionary? That would tell us quite a bit about the nature of the string based on content, rather than programmatic rules.” It was a very good idea, and one of our architects adapted it to make it real. By the time you read this, the improvement will have been released in our software. The young programmer is back in school and has moved on, but his good idea is about to become a real part of our product.
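The idea itself is easy to sketch. The fragment below is illustrative only – it is not Globalyzer’s actual implementation, and the tiny word set stands in for a real natural-language dictionary:

```python
import re

# Stand-in for a real natural-language dictionary.
WORDS = {"please", "enter", "your", "name", "cart", "is", "empty"}

def looks_translatable(s: str, threshold: float = 0.5) -> bool:
    """Guess whether an extracted string is user-facing text by checking
    what fraction of its words appear in the dictionary."""
    tokens = re.findall(r"[A-Za-z]+", s.lower())
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in WORDS)
    return hits / len(tokens) >= threshold

print(looks_translatable("Please enter your name"))  # True: real UI text
print(looks_translatable("dbg_conn_pool_sz=%d"))     # False: programmatic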

Innovation Devalues Everything it Touches

By its very nature, innovation puts either a person or process out of work. It wouldn’t be worth anything if it didn’t make someone more productive with less. At the same time, the first rounds of innovation are typically full of pitfalls that need to be overcome.

The immediate case that comes to mind is the current brouhaha over crowdsourcing. In case you haven’t attended LocalizationWorld, read up on industry happenings or participated in numerous LinkedIn discussions: crowdsourcing is either a great evil, the most innovative thing that’s happened in our industry in a while, or something in between. There are complaints about the very concept, the devaluing of translator expertise and what some people feel is an inferior end result produced by enthusiastic but naive volunteers willing to work for accolades alone. Others, notably at Facebook, feel it’s a process that results in faster, cheaper translations at higher quality. It’s not hard to find evidence supporting both sides, and I suppose at the moment final judgment on immediate results may not be the relevant criterion. More likely the industry has something to gain from using these technologies to render translations in context with application pages, rather than the contextless traditional table view. The tools can be applied to more traditional translation resources, while also providing a better linguistic review platform and buy-in from in-country clients and employees – who are, after all, the real stakeholders and judges in a localization effort. But that’s just my understanding of it, and I may be overlooking something. Certainly there’s a long way to go, but I wouldn’t be caught on the side of belittling the persistent follow-through of dedicated people bringing ideas into reality and adding enhancements to overcome pitfalls.

Building a Site for Worldwide Customers

The world is yours, or at least that’s the promise e-commerce offers. Get your products, services or information online and you can gain customers anywhere. It can be challenging, though, to build an active worldwide customer base that buys and comes back for more. It’s a competitive world, and studies have shown over and over that people prefer to buy in the ways they are accustomed to, especially with information in their native language. The first obvious customization is to translate your e-commerce site, but this doesn’t happen with an easy wave of a magic wand. There are steps involved, from business planning to technical adaptation to facilitating the localization process and streamlining updates. In this article, we offer an overview of these considerations and logical steps to help you move forward.

Business Case

While this article mostly covers issues regarding site creation and adaptation, any discussion must include the business drivers as they strongly impact cost and time considerations.

Whether you work at a large or small company, your business case leads your budget and resource allocation in creating sites for global audiences. In most cases, this globalization strategy involves high level management visibility and strategic commitment. There are revenue expectations, distribution issues to sort out, possibly local in-country representation to support and a host of other logistics. All that adds up to plenty of expectations for a return on the investment. Getting a good plan in place, including a strong understanding of the scope of implementation efforts, a technical and process roadmap, as well as some kind of measurement metrics helps you get the right funding and resources to be successful.

The costs of poorly globalizing your e-commerce site certainly include building expensive systems that don’t have the needed functionality for an international customer base. Even worse are the delays in deployment that have rather painful and visible effects on your company’s revenue stream, global aspiration objectives, and ultimately the bottom line.

Process Steps in Creating a Global Site

In creating a site adapted to worldwide customers, there are two major defined steps: Internationalization and Localization. For a site to be localized, giving it the “look and feel” of having been developed in the target market, it must first be internationalized.

Internationalization – One Site, Many Adaptations

To the outside observer, internationalization (i18n) remains a hidden and often unknown attribute, but it is critical to leveraging your success from market to market. When you internationalize your site, you adapt its technology to be capable of not only supporting any language, but also supporting local formats and ways of doing business. Translators and regional stakeholders can alter content and more, but the site itself – what presents and processes information – remains consistent and leveraged for each market.

We often counsel our clients to think in terms of locales, not languages. That’s because you can’t assign local purchasing behaviors to a language. It’s more the other way around: a locale includes the language of the region as well as numerous other issues, including character set support, date/time formatting, forms of payment, data/product sorting, phone/address formatting and more.
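One way to make the locale-versus-language point concrete: French in France and French in Canada share a language but not conventions. The values below are hardcoded purely for illustration – a real application would pull them from a locale library or CLDR data, not a table like this:

```python
# Same language, different locales: formatting conventions diverge.
# Hardcoded illustrative values, not authoritative CLDR data.
LOCALE_CONVENTIONS = {
    "fr_FR": {"currency": "1 234,56 €", "date": "25/12/2010"},
    "fr_CA": {"currency": "1 234,56 $", "date": "2010-12-25"},
    "en_US": {"currency": "$1,234.56",  "date": "12/25/2010"},
}

for locale, conv in LOCALE_CONVENTIONS.items():
    print(locale, conv["currency"], conv["date"])
```

This is why locale, not language, should be the key your site switches on.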

If you are using another company’s e-commerce platform technology for your site, then you must find out exactly how it supports internationalization. If you are building a new site, be aware that some technologies adapt to internationalization and localization demands better than others. The technologies you choose should strike a balance between your current organizational requirements and your business objectives.

If you are adapting your current site to support internationalization, consider these areas in your migration:

Architecture –

The structure of your e-commerce system, including the software itself, the externally visible properties of the user interface, and the relationships between them.

  • Consider your new requirements for international markets, identifying what is missing from your e-commerce site that needs to be added.
  • Likewise, examine what is in your site’s code that needs to be changed to support those markets.

Code Refactoring –

Unless you are developing a new e-commerce site where support for international markets is planned from the beginning, it is likely that the internal structure of your e-commerce site will require modification to improve or change the code to better support international functionality. Typical code refactoring on internationalization projects includes:

  • Extracting embedded strings from the code so that they can be easily accessed for translation
  • Changing locale-limiting functions, methods and classes
  • Marking relevant business logic so that it can be driven by locale requirements
  • Enabling character set support (Unicode) so that extended characters display properly
  • Ensuring character encoding changes to pages, databases and individual coding elements are implemented
  • Abstracting transaction workflow on the site that may need to be dynamically customized to support locale requirements
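String externalization, the first item in the list, is usually the largest chunk of the work. A before-and-after sketch in Python; the resource layout and string IDs are illustrative:

```python
# Before: user-facing text embedded directly in the code,
# invisible to translators.
def payment_error():
    return "Your payment could not be processed."

# After: text lives in external per-locale resources and the code
# references it by string ID. Layout and IDs are illustrative.
RESOURCES = {
    "en": {"checkout.payment_failed": "Your payment could not be processed."},
    "es": {"checkout.payment_failed": "No se pudo procesar su pago."},
}

def payment_error_i18n(locale: str) -> str:
    # Fall back to English if the locale has no catalog.
    return RESOURCES.get(locale, RESOURCES["en"])["checkout.payment_failed"]

print(payment_error_i18n("es"))  # No se pudo procesar su pago.
```

Multiply this pattern by every string in a large code base and the scale of the refactoring effort becomes clear.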

It can be a challenge to identify and fix internationalization issues in an e-commerce Web site application. Fortunately, there is at least one tool that greatly facilitates the process. Globalyzer, a product available from Lingoport, is built to help teams of developers find and fix internationalization issues, and keep software internationalized over time. You can learn more on the Lingoport web site and even sign up for a trial Globalyzer account.

Content Management Systems

Another aspect to take into account during the internationalization phase is the type of tools you are using for developing your content. For Web sites there are plenty of good content management systems (CMS) available; however, there are differences among them that affect support for international markets. If you use one, you want to make sure it is localization-friendly. It must have a way to export the translatable content in a file format that translation tools can use. XLIFF (XML Localization Interchange File Format) or other variations of XML-based formats are good choices. The tool must also be able to merge the translated data back into the right places in the localized content.

The capability to generate “delta” files – which contain only the content “chunks” that need to go through the localization process for translation – is a very efficient way to reduce the costs for localizing updates as your site is modified. It is often helpful to the linguists, though, to provide reference materials or to include the already-translated content around the new translatable chunks, so the translation can be done within a meaningful context.
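Delta generation itself is conceptually simple: compare the previous export of translatable chunks with the current one, and keep only what is new or changed. A sketch, where the chunk IDs and dict-based storage are illustrative:

```python
# Sketch of "delta" extraction: ship only new or changed chunks
# to the localization process. Chunk IDs are illustrative.
def delta_chunks(previous: dict, current: dict) -> dict:
    """Return chunks that are new or whose source text changed."""
    return {cid: text for cid, text in current.items()
            if previous.get(cid) != text}

old = {"home.title": "Welcome", "home.body": "Shop our catalog"}
new = {"home.title": "Welcome", "home.body": "Shop our full catalog",
       "home.promo": "Free shipping"}
print(delta_chunks(old, new))
# {'home.body': 'Shop our full catalog', 'home.promo': 'Free shipping'}
```

The unchanged title never re-enters the translation pipeline, which is exactly where the cost savings come from.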

Some content management systems also allow you to control the granularity of the chunks you create, and to re-use them across the whole published Web site. This allows for even more cost saving in localization.

Content Creation

Whether you are using a content management system or not, how you write the content and design your icons and graphics affects the ultimate localizability of the site. Taking care in how the content is developed saves money during the localization process and results in better international sites. Ultimately, it is much cheaper to create content correctly in the source language than to translate it into many languages for the target markets and then have to fix content issues in each one:

  • Write in simplified English. In creating the source content, write in the active voice, avoiding complex sentence structures. Avoid the use of slang, colloquial expressions, and cultural references. This is even more important if you anticipate having some users from markets that are not covered by your globalization plan. They may end up using machine translation engines to get a gist of the content of your Web site.
  • Reuse text. If you say the same thing at different places, say it the same way, so the translation of the first occurrence can be used for the second one. This leveraging of text can significantly reduce the linguistic fees through the reuse of previously translated content. By all means, avoid minor wording changes as that just means more costs. Content management systems can help you to parse your content into “chunks” that are easily translated while facilitating the reuse of content throughout the site.
  • Icons. Make sure all icons are understandable by your target markets. It is cheaper to have icons that “work everywhere” than to customize icons for each market. Identifying culturally acceptable icons can require a bit of up-front cost in assessing them for your target markets, but it avoids confusing (or worse, offending) your customers. Alternatively, you can design your Web site to easily substitute icons according to the market (e.g. by using style-sheets instead of hard-coding style changes in your pages).
  • Graphics. While it is tempting to have complex graphics with layered text, remember that all text has to be translated. Translating text that is embedded into graphics is more expensive. If you have to use call outs on your graphics, use numbers or letters that are then referenced in the text of the page rather than on the graphic itself. Whatever you do, make sure that you keep the graphic source files for your localization team (not just the collapsed JPG or GIF files).
  • Search Engine Optimization (SEO). In creating the source e-commerce site, great care is taken to optimize search terms so that the site appears readily in search engine matches. Extend your efforts to include SEO for each of your target markets, using appropriate search terms in the metadata as well as the content itself.

Ensuring Internationalization Success

A good internationalization effort should be validated with a careful review of the source site:

  • Consider using pseudo-translation of the content (where text is passed through a small program that converts it into extended characters, so display can be verified) to confirm that all modifiable elements of the site are indeed accessible and can be changed for the various translated versions. Verify that locale-sensitive data can be processed accordingly (date/time/number formats, currency, measurement units, etc.) and that, when needed, locale-specific content can be provided as well (end-user license agreements, privacy and confidentiality statements, 800-type numbers, part numbers, etc.).
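Pseudo-translation tools vary, but the core transformation is straightforward. A minimal sketch, with a deliberately tiny character mapping and an assumed 30% expansion factor (real tools use fuller mappings and locale-specific expansion rates):

```python
# Minimal pseudo-translation: map ASCII letters to accented look-alikes
# and pad the string, so hard-coded text and truncation problems stand
# out during testing. Mapping and expansion factor are illustrative.
ACCENTED = str.maketrans("aeiouAEIOU", "àéîöûÀÉÎÖÛ")

def pseudo_translate(s: str, expansion: float = 0.3) -> str:
    padding = "~" * max(1, int(len(s) * expansion))
    return "[" + s.translate(ACCENTED) + padding + "]"

print(pseudo_translate("Add to cart"))  # [Àdd tö càrt~~~]
```

Any string that appears on screen without brackets and accents was never externalized; any clipped padding reveals a layout that can’t absorb translation growth.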

Success of your site on the international scene comes from a combination of:

  • Good development practices,
  • Well-adapted tools used during the development and maintenance of the site, and
  • Content that is ready for localization, taking into account cultural differences as appropriate.

By following these high-level guidelines, you are better prepared for the localization and translation of your e-commerce site for each of your target markets.

Unicode Primer


When developing software that will be used in multiple languages, it is essential to support a character set that can render any character. Unicode, a standard for representing the text of all the world’s writing systems, fills this void, allowing for full character support and workable software in all languages.

In this 4-minute video, software internationalization expert Adam Asnes illustrates:

  • How character encoding has evolved
  • Why Unicode is essential for commercial software
  • The differences among Unicode encodings, e.g. UTF-8 and UTF-16
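The practical upshot of those encoding differences is easy to demonstrate: the same text occupies different byte counts under each encoding. A quick Python illustration:

```python
# Byte counts for the same text under different Unicode encodings.
# ASCII text is 1 byte/char in UTF-8; accented Latin letters take 2;
# CJK characters take 3 bytes in UTF-8 but only 2 in UTF-16.
for text in ("hello", "héllo", "日本語"):
    print(text, len(text.encode("utf-8")), len(text.encode("utf-16-le")))
# hello 5 10
# héllo 6 10
# 日本語 9 6
```

This is why the choice between UTF-8 and UTF-16 is often driven by which scripts dominate your data, not by correctness – both can represent every character.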



Worldware Conference Summary – Not as Good as Being There

In March I attended and presented at the first Worldware Conference, which took place in Santa Clara, California, in the heart of Silicon Valley. I was really excited about this conference, as it proved to be the first to directly target business issues around software internationalization and globalization. Too often at other conferences the focus is on very low-level technical issues, missing the greater business planning and operational issues that affect every organization looking to build and maintain world-ready products. In fact, that had been a long-running annoyance for me when attending conferences like Unicode and LocalizationWorld. So I was eager to get involved in Worldware and sat on its board as well.

The conference had outstanding material, and featured various business leaders from well known world software brands. The downside was that the conference was not particularly well attended. There were probably a total of about 70 people there, including speakers, but at least we all got to know one another. Presentations featured executives from companies like EMC, Microsoft, Linden Labs, Oracle, Mozilla, Sun, Adobe, Yahoo!, Intel, various industry consultants and of course me.

Here are a few items from my notes and memory, in no particular order:

  • Don DePalma of Common Sense Advisory had some excellent data showing return on investment and overwhelming customer preference for software internationalized with locale-sensitive language and formatting support. His numbers were the Holy Grail that managers have been asking for. A big point was that even when end-users are perfectly capable of reading, writing and speaking English, they vastly prefer software in their own language, to the point where they make choices and spend more in line with that preference. Don had data broken down per country. I can’t wait to poach some of these slides.
  • Common points were that i18n is an enabler for localization and ultimately revenues. A way to waste a ton of money is to pursue localization before you’ve properly internationalized.
  • Organizations like Mozilla and Linden Labs are making great use of crowdsourcing to enable new features and localization. So if you have a product with a rabid, emotional following, crowdsourcing is a relatively new way of getting help, though it needs its own management adaptation.
  • Some companies, like EMC, must simultaneously ship for all top tier locales when releasing new products. So globalization isn’t an afterthought.
  • Executives don’t understand internationalization but understand the cascading effect.
  • Invest in internationalization expertise. Too expensive to “wing” it.
  • Empower product teams
  • Create i18n boot camp training
  • Some companies demonstrated that they have built whole organizational frameworks to support internationalization. Particularly Intel and Yahoo! presented how they are using technologies for automatically auditing global readiness. Happy to say Globalyzer got many accolades.
  • There was a lively discussion of Agile (the extremely popular development methodology) as it relates to internationalization. If i18n is built into product development from the start, Agile works great. When Agile cycles and i18n on existing code go on simultaneously, the two efforts are very unlikely to synchronize well. There are lots of reasons for this, which would probably make a great future article for this newsletter. The issue came up multiple times, and Tony Jewteshenko gave a whole presentation session on it (though I wasn’t able to attend that one).
  • It’s extremely difficult to take back a language after you release for a particular market. So consider that request for your software in Klingon carefully.
  • How you communicate around the world will empower your organization.
  • Brand recognition
  • Market Share
  • ROI
  • I presented along with Daniel Goldschmidt on how to get an i18n effort going
  • Technical buyer vs. management objectives
  • Need to get a good plan for budget approval first, design second
  • Showed Globalyzer 3.0 and scanned some open source code
  • Demonstrated a project plan
  • Daniel broke down i18n projects into a three-phase approach:
  • Transportation – moving data from A to B
  • Application – doing something with the data (e.g. sorting)
  • User Interfaces
  • Then we both talked about keeping software world-ready and answered questions
  • Kamal Monsour of Monotype Imaging gave a most informative presentation showing intricacies of digital fonts in languages like Arabic and Hindi.
  • I was on a panel along with Ed Watts of Oracle and Mike McKenna from Yahoo! on Assessing and Quantifying efforts. Ed emphasized the role of pseudo-localization. Mike was his usual incredible reservoir of information and experiences both organizationally and on the technical side in supporting i18n. I talked about how we essentially have had to learn to estimate and execute internationalization projects and still make a profit, and that’s why we’ve created tools and methodologies to do so.
  • Aaron Marcus of Aaron Marcus and Associates gave a presentation on cross cultural user-experience design showing many cultural differences, certain scales by which cultures accept power hierarchies and how that shows up in site design.
  • Mike McKenna gave a fabulous presentation on trends in internationalization, featuring several i18n initiatives at Yahoo! As a bonus, I got a Fight Mojibake sticker (mojibake being garbled, corrupted character display), which is now on my notebook. In particular, his team works to get people enthusiastic and to understand that they are creating products for the world. He also talked about how his team supports i18n with tools like Globalyzer. Thanks Mike.
  • Barbara Burbach of Cisco talked about staffing models, including outsourcing for i18n and l10n. She felt i18n outsourcing for an existing product was a good idea, as it keeps the core development team focused on new features. For new products being internationalized from the beginning, she preferred in-house engineering.
  • Tex Texin (i18n Guy) discussed how he has worked with various teams to promote internationalization, and how decisions were often affected. He also gave Globalyzer a nice recommendation. Tex was formerly in charge of internationalization at Yahoo! and NetApp, both of which are Lingoport customers. Thanks Tex.

I’ve missed a ton in this quick summary, as I haven’t managed to master being in two places at once and couldn’t have attended all the sessions.

Enterprise Internationalization and Automation

There are some technology companies where thinking globally has been fundamental to their operations for years and years. I’m referring to companies like IBM, HP, Yahoo, Google and the like. These companies all made significant investments in their global infrastructure, sales teams, products, development and strategic planning. It didn’t happen by accident. And as these companies develop new products or acquire companies, they look to leverage them across that global infrastructure quickly and profitably. Global companies are good prospects for my company in our internationalization products and services business, because they tend to be more experienced in their understanding of engineering challenges, knowing that it takes people, tools, time and money to globalize software so that they can gain the best return on their product distribution and sales infrastructure.

One very potent way to make software globalization fundamental to a company’s mindset is to make internationalization a fully integrated and automated part of software development practices. There are all kinds of tools, checkers and environments to help developers create interfaces, access and transform information buried in databases, support coding constructs, manage memory and perform application modeling. With that in mind, we’ve been hard at work on a major new Globalyzer release, clearly aimed at supporting entire development departments and enterprises: batch processes on servers automatically monitor internationalization progress, while on the desktop issues can be individually examined and fixed. While that has always been our aim, we’re now getting there in more robust ways that track internationalization status over time, across multiple programming languages and even multiple products.


For those non-developers reading this, let me explain what I mean by automation in this context. When engineers create code, they generally submit their work to a code repository. This repository provides version control, so that when multiple engineers are working together, they can check code in and out and merge their changes. Then the code has to be put together and built. This build process usually occurs on some interval, such as nightly, or even on a continual basis. During this automated process you can also automatically check for many other issues, like performance and load balancing, and I’m proposing that this is also a great time to check on internationalization/localization readiness: run tools on the code automatically as a batch process, and track the issues found via reports. Counting issues is one thing, but you can go even further by showing exactly where a problem exists in the code, along with the context of the errant issue. That information can then be brought forward for quick review and fixing.
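To make the idea concrete for the developers reading along, here is a deliberately naive sketch of what a batch i18n check might look like: flag hard-coded string literals in source text and report them with line context. This is illustrative only; a real tool like Globalyzer does far deeper, language-aware analysis.

```javascript
// Naive build-time i18n audit sketch: flag double-quoted string
// literals containing letters, so they can be reviewed for
// externalization. (Illustrative only, not how Globalyzer works.)
function findHardcodedStrings(source) {
  const issues = [];
  source.split('\n').forEach((line, i) => {
    const matches = line.match(/"[^"]*[A-Za-z][^"]*"/g) || [];
    matches.forEach((literal) => {
      issues.push({ line: i + 1, literal, context: line.trim() });
    });
  });
  return issues;
}

const sample = 'let greeting = "Hello, world";\nlet n = 42;';
const report = findHardcodedStrings(sample);
console.log(report); // one issue found, on line 1
```

A real batch process would run something like this over every file at build time and diff the issue counts against the previous run, which is exactly the "measured over time" tracking discussed above.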

Two companies that come to mind doing this very thing are Intel and Yahoo. Michael Kuperstein of Intel, presenting at the WorldWare Conference in March, reported how his team developed their own internationalization toolkit a few years ago and have integrated it into many of their automated build processes. That automation has made internationalization an important and measured component of their ongoing development efforts. By Mike’s own admission, he would have used Globalyzer had he known about it years ago.

Mike McKenna of Yahoo also reported at WorldWare that his globalization team is using automation, in this case Globalyzer, to measure internationalization benchmarks on development teams.


On the localization product side, there are multiple tools for different aspects of managing words. But when it comes to products which support an enterprise in their software internationalization efforts, there is a pretty empty playing field. Aside from some very simple string externalization utilities in a few development environments and frameworks, our Globalyzer is simply the only commercial software I know of that can automatically monitor development over time over a wide range of programming languages, while also stepping entire teams through internationalization fixes in large amounts of code.

I’ve said a few times in my columns that it’s quite powerful to embrace the management principle that whatever gets measured gets done and improved over time. So it follows that one of the most important aspects of any software development undertaking is measuring the desired outcome at regular intervals. If you just hope that it will all come together in the end, you always end up late and over budget. That insight is ultimately behind the agile and extreme programming movements, which make the intervals of measurement and goal-setting more frequent. But it’s not so easy to track something like internationalization, whether as a project where you are refactoring software for new globalization requirements or as part of ongoing development. Consider that developers are typically over-tasked, and often distributed across time zones and continents. Then factor in that internationalization can be quite subjective to a particular development task. Plus, internationalization is a fuzzy thing, in that it is tailored to requirements, technologies and special cases. So development teams grapple with how to handle it, and make their way through the task by brute force – or simply postpone or avoid internationalization whenever possible. Issues get missed, and if you’re lucky, you have an iterative process during localization to fix internationalization bugs, which is a very expensive and time-consuming path. Or worse, development ignores the issues and calls it a localization problem.

I spoke with a company in just that situation last week. They were upset with their localization provider for poor quality, but when we examined some of the issues, there were also extensive internationalization mistakes that were sure to break localization context and execution. These included missed strings and extensive string concatenations. Had they been monitoring these efforts all along, and been clearer on internationalization requirements, they would have had better results and a clean release. The biggest costs to them were poor market entry, customer dissatisfaction and complaints from their distributors and sales teams, who had to overcome a poorly localized release. Now, I also feel that as vendors we have some responsibility to take care of clients and not sell them a solution that risks poor quality and a weak market entry, so some blame also goes to the localization provider. But I hardly know what really happened; I was just there to offer help in picking up the pieces. Clearly that’s an expensive route in many ways.
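For readers wondering why string concatenation is such a localization killer, here is a small hypothetical sketch: fragments translated in isolation cannot be reassembled in languages with different word order, while a parameterized message gives the translator the whole sentence. The message catalog and function names here are invented for illustration.

```javascript
// Broken: "You have " and " new messages" would reach translators as
// separate fragments, but many languages need a different word order.
function badStatus(count) {
  return 'You have ' + count + ' new messages';
}

// Better: one externalized, parameterized message per locale, so the
// translator controls the whole sentence. (Hypothetical catalog.)
const messages = {
  'en-US': (count) => `You have ${count} new messages`,
  'de-DE': (count) => `Sie haben ${count} neue Nachrichten`,
};

function goodStatus(locale, count) {
  return messages[locale](count);
}

console.log(goodStatus('de-DE', 3)); // "Sie haben 3 neue Nachrichten"
```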

Remember, internationalization is often run by a different crew than localization. Software developers are upstream from localization, and they are sometimes all too disconnected from the final localized product release. Localization is often someone else’s problem, and engineers are focused on getting a release out with all its new features. They don’t know what they don’t know, which is only human. That leaves localization teams waving their arms around, trying to get the developers to build software right the first time. And those teams likely have no way to measure whether the product they are tasked with localizing actually passes internationalization muster until they go through localization testing. Again, that’s very late and expensive in a software development process, and more often than not, localization testing tends to be underfunded and vendor-dependent. You’re going to have trouble finding everything. So for localization teams, what I’m suggesting is a kind of automated litmus test: when code comes to the localization group, scan it for internationalization issues and consider what’s found. The technology is now there to do this in detail and examine each potential issue, quickly and easily. In the worst case, you can at least have engineering fixing internationalization bugs during the localization process rather than later, when it’s far more expensive.

Again, anything that measures and sheds light on the situation will also result in improvements. So if you want well-globalized software, you’d better start measuring how that code is developed, not just what it’s costing to localize it.

P.S. I’m thinking of writing a column on funny ways people fell into the localization business. If you have a good story you’d like to share, please contact me!

Internationalization Management Tips: 10 Mistakes to Avoid

It’s extremely common for us to work with clients who have had a bumpy past with regards to internationalization. Sometimes you have to learn things the hard way, but that is always expensive.

In the past I’ve written about ten tips for managing internationalization projects. Here’s a look at mistakes that I’ve commonly seen repeated on the client side. In our services practice at Lingoport, we often have to counsel our clients through one or more of these sorts of process issues, which is actually a very rewarding part of what we do. While this list is pretty high level, we’ve seen that these missteps can set up cascading failures that eventually have a serious impact on a project’s success. Some apply more to internationalization of existing applications; others apply to development where internationalization is planned in from the point of conception (still kind of a rare thing, but gaining).

So, here are 10 internationalization process mistakes to avoid:

1. Don’t forget what drives internationalization:

Money, on the top and bottom lines of your company’s income statement. The point here is that the costs of being late or lousy endure far beyond the benefits of cutting corners on development. Internationalization happens because of a:
a. New customer(s) sale
b. New partnership
c. Strategic initiative backed by marketing, legal and other types of efforts and investments

2. Don’t assume internationalization is just an older software legacy issue.

It comes up surprisingly often that people even in our industry think that internationalization is mainly an issue for older applications. No framework, whether it’s J2EE, .Net, Ruby on Rails, PHP or whatever is new and improved, internationalizes itself. You still need to do all the steps necessary to implement locale and all the associated internationalization practices. Many newer programming platforms do an excellent job of internationalization support, which is great news as you can estimate and execute with a higher degree of accuracy. But you still have plenty of work to do.

3. Don’t assume you can treat internationalization like any other feature improvement when it comes to source control management.

With internationalization, source control can require an extra step of thinking things through. It’s very typical for new feature development and bug fixing to go on in parallel with internationalization efforts. However, in the process of performing internationalization, you are going to be breaking major pieces of functionality within your application as you make large changes to your database and other application components. In order for the respective developers to work on their own tasks and bugs, you typically need to branch code, often with specifically orchestrated code merges.

4. Don’t assume internationalization is just a string externalization exercise.

String externalization is important and highly visible, but the scope of internationalization includes so much more. For example: creating a locale framework, character encoding support, major changes to the database, and refactoring of methods/functions and classes for data input, manipulation and output. How these are approached varies greatly based on requirements and technologies.

5. Don’t wing it on Locale

Designing how locale will be selected and managed often doesn’t get the thought and planning it deserves. How the application interacts with the user, detects or selects locale, and then how it correspondingly behaves is a design process needing input from an experienced architect, product marketing and the development team. This is not an area to be chosen by any one representative by fiat. It’s a whole lot of work to redo locale support if it’s executed inadequately for user, business and locale requirements.
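Once locale is treated as a first-class input, formatting should flow from it rather than being hard-coded. As a minimal sketch of the idea (shown here with JavaScript’s built-in Intl API; your platform’s equivalent will differ):

```javascript
// Sketch: locale is selected once (by user choice, browser setting,
// or negotiation) and then drives all locale-sensitive behavior.
function formatPrice(locale, currency, amount) {
  return new Intl.NumberFormat(locale, {
    style: 'currency',
    currency: currency,
  }).format(amount);
}

console.log(formatPrice('en-US', 'USD', 1234.5)); // "$1,234.50"
console.log(formatPrice('de-DE', 'EUR', 1234.5)); // grouping and decimal
                                                  // separators swap for German
```

The design question the paragraph above raises is everything upstream of this call: who picks `locale`, where it is stored, and how it propagates through the application.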

6. Don’t create your very own internationalization framework

Don’t do it even if you think you know better. We regularly run into clients who have halfway implemented internationalization using their own homegrown methods for string extraction and locale management, when there were already well-established methods provided within their programming language framework, or established solutions like ICU. Using these will ensure that your code is far easier to maintain, and you’ll know that thousands of applications have used them successfully before you. No unpleasant surprises.

7. Don’t think that the team internationalizing your software can work without a working build

This seems obvious, but it comes up a lot. Without a working build, developers can’t smoke test the changes they are making. Even if you provide a dedicated QA person, my own experience is that developers need to be able to compile and run the application themselves to head off problems later. It’s too hard to reconstruct coding errors at a later time, and relying on that makes for unnecessary bug-fixing iterations, lost time and poor quality.

8. Don’t run out of money

Internationalization planning often suffers from underscoping. At Lingoport, we have both software and well-established methodologies for estimating internationalization, as we really don’t want to ever break this rule and have to ask our clients for more funding. The same should hold true for internal efforts. Lapses in funding can cause expensive delays, as new funding takes more time than anyone imagined to get approved. It also reduces management credibility. And chances are, if you need to ask for more money, then you also need more time, which brings you back to the consequences in tip #1.

9. Don’t use a half thought-out character encoding strategy

Use Unicode rather than native encodings. If you have budget and time constraints and you’re only targeting dominant languages in markets like Western Europe and North and South America, you can often get away with ISO Latin-1, but even for Eastern European languages, go Unicode. And when you do, make sure your encoding works all the way through the application. Don’t forget that if your customers need to support worldwide customers themselves (e.g. enterprise software), they may need you to support Unicode data processing even if the interface remains in English. One more consideration tilting toward Unicode is that programming languages like C# and Java already pass strings and data internally as Unicode, so you might as well engineer for the world.

10. Don’t use your same testing plan, or just rely on localization testing, when your functional testing needs to grow to include internationalization requirements

In our services projects, we always put special emphasis on working through pseudo-localization of not only the interface, but also on sending test data using target character sets, locale-altered date/time formats, phone numbers and more, from data input to database, to reports and so on. If your testers are English-only speakers, that’s fine. For example, Globalyzer includes a utility, PseudoJudo, that surrounds English strings with target-language buffer characters. It can also expand data fields to fit physically longer strings, giving room for translation changes in sizing as well as encoding.
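To show the idea behind pseudo-localization (this is a toy sketch of the concept, not PseudoJudo’s actual implementation): pad each English string with target-script buffer characters and bracket markers, so encoding breakage and layout overflow surface during ordinary English-language functional testing.

```javascript
// Toy pseudo-localization: surround a string with CJK buffer
// characters sized relative to the original, plus visible brackets
// so truncated or unexternalized strings stand out on screen.
function pseudoLocalize(s, padRatio = 0.3) {
  const pad = '\u65E5'.repeat(Math.ceil(s.length * padRatio)); // 日
  return `[${pad}${s}${pad}]`;
}

console.log(pseudoLocalize('Save changes'));
// "[日日日日Save changes日日日日]"
```

If the brackets never appear in the running UI, the string was never externalized; if the CJK padding renders as boxes or gibberish, the encoding path is broken; if the text is clipped, the layout has no room for translation growth.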

11. Bonus Tip: Don’t assume localization is just someone else’s problem

It’s funny how many of our customers are strictly concerned with software development and don’t actually have anything to do with localization processes. We always work to bring localization into the internationalization effort. We do this by involving localization resources early on, helping them understand the technical requirements, and then feeding translators strings that we extract on the front end of projects, so that when internationalization functional testing is done, we are immediately ready to perform linguistic translation testing and ultimately deliver a finished product. This compresses time to global release, while also making for a more fluid process, fewer programming iterations and higher quality.

Unicode and Internationalization Primer for the Uninitiated

Among our friends and clients at Lingoport, we regularly see ranges of confusion, to complete lack of awareness of what Unicode is. So for the less- or under-informed, perhaps this article will help. The advent of Unicode is a key underpinning for global software applications and websites so that they can support worldwide language scripts. So it’s a very important standard to be aware of, whether you’re in localization, an engineer or a business manager.

Unicode and Internationalization

Firstly, Unicode is a character set standard used for displaying and processing language data in computer applications. The Unicode character set covers the entire world’s set of characters, including letters, numbers, currencies, symbols and the like, and supports a number of character encodings to make that all happen. Before your eyes glaze over, let me explain what character encoding means. Remember that for a computer, all information is represented in zeros and ones (i.e. binary values). The letter A in the ASCII standard looks like this: 1000001. That is, a 1, then five zeros, then a 1, for a total of 7 bits. The numeric value this represents is called A’s code point, and the mapping of zeros and ones to characters is called the character encoding. In the early days of computing, unless you did something very special, ASCII (7 bits per character) was how your data got managed. The problem is that ASCII doesn’t leave you enough zeros and ones to represent extended characters, like accents and characters specific to non-English alphabets, such as you find in European languages. You certainly can’t support the complex characters that make up the Chinese, Korean and Japanese writing systems. These languages require 8-bit (single-byte) or 16-bit (double-byte) character encodings. One important note: nearly all of these single- and double-byte encodings are supersets of 7-bit ASCII, which means that English code points will generally be the same regardless of the encoding.
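If you want to see this for yourself, two lines of JavaScript show A’s code point and its 7-bit binary form:

```javascript
// The letter A: code point 65, which is 1000001 in binary (7 bits).
const a = 'A'.codePointAt(0);
console.log(a);             // 65
console.log(a.toString(2)); // "1000001"
```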

The Bad Old Days

In the early computing days, specific single- and double-byte character encodings were developed to support various languages. That was very bad, as it meant that software developers needed to build a version of their application for every language they wanted to support that used a different encoding. You’d have the Japanese version, the Western European language version, the English-only version and so on. You’d end up with a horde of individual code bases, each needing its own testing, updating, ongoing maintenance and support, which is very expensive, and pretty near impossible for businesses to realistically sustain without serious divergence among the various language versions over time. You don’t see this problem very often in newly developed applications, but there are plenty of holdovers. We see it typically when a new client has turned over their source code to a particular country partner or marketing agent responsible for adapting the code to multiple languages. The worst case I saw was in 2004, when a particular client, who I will leave unmentioned, had a legacy product with 18 separate language versions and no longer had any real idea how much functionality varied from language to language. That’s no way to grow a corporate empire!

ISO Latin

A single-byte character set that we often see in applications is ISO Latin 1, which is represented in various encoding standards such as ISO-8859-1 for UNIX, Windows-1252 for Windows and MacRoman on guess what platform. This character set supports characters used in Western European languages such as French, Spanish, German, and U.K. English. Since each character requires only a single byte, this character set provides support for multiple languages, while avoiding the work required to support either Unicode or a double-byte encoding. Trouble is that still leaves out much of the world. For example, to support Eastern European languages you need to use a different character set, often referred to as Latin 2, which provides the characters that are uniquely needed for these languages. There are also separate character sets for Baltic languages, Turkish, Arabic, Hebrew, and on and on. When having to internationalize software for the first time, sometimes companies will start with just supporting ISO Latin 1 if it meets their immediate marketing requirements and deal with the more extensive work of supporting other languages later. The reason is that it’s likely these software applications will need major reworking of the encoding support in their database and functions, methods and classes within their source code to go beyond ISO Latin support, which means more time and more money – often cascading into later releases and foregone revenues. However, if the software company has truly global ambitions, they will need to take that plunge and provide Unicode support. I’ll argue that if companies are supporting global customers, and even not doing a bit of translation/localization for the interface, they still need to support Unicode so they can provide processing of their customer’s global data.


We come back to Unicode, which, as mentioned above, is a character set created to support any written language worldwide. You might still find a language or two whose script lacks Unicode support, but such cases are becoming extremely rare. For instance, Javanese, Loma and Tai Viet are currently among the scripts not yet supported. Arcane until you need them, I suppose. I remember a few years ago, when we were developing a multilingual site that needed support for Khmer and Armenian, being thankful that Unicode had added their support just a few months prior. If you have a marketing requirement for your software to support Japanese or Chinese, think Unicode. That’s because you will need to move to a double-byte encoding at the very least, and once you go through the trouble to do that, you might as well support Unicode and get the added benefit of support for all languages.


Once you’ve chosen to support Unicode, you must decide on the specific character encoding you want to use, which will be dependent on the application requirements and technologies. UTF-8 is one of the commonly used character encodings defined within the Unicode Standard, which uses a single byte for each character unless it needs more, in which case it can expand up to 4 bytes. People sometimes refer to this as a variable-width encoding since the width of the character in bytes varies depending upon the character. The advantage of this character encoding is that all English (ASCII) characters will remain as single-bytes, saving data space. This is especially desirable for web content, since the underlying HTML markup will remain in single-byte ASCII. In general, UNIX platforms are optimized for UTF-8 character encoding. Concerning databases, where large amounts of application data are integral to the application, a developer may choose a UTF-8 encoding to save space if most of the data in the database does not need translation and so can remain in English (which requires only a single byte in UTF-8 encoding). Note that some databases will not support UTF-8, specifically Microsoft’s SQL Server.
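A quick, concrete illustration of UTF-8’s variable width, using JavaScript’s TextEncoder (which always emits UTF-8 bytes):

```javascript
// UTF-8 is variable-width: ASCII stays one byte per character,
// other characters take two to four bytes.
const utf8 = new TextEncoder();
console.log(utf8.encode('A').length);  // 1 byte  (ASCII)
console.log(utf8.encode('é').length);  // 2 bytes (accented Latin)
console.log(utf8.encode('中').length); // 3 bytes (CJK)
console.log(utf8.encode('😀').length); // 4 bytes (beyond the 16-bit range)
```

This is exactly why UTF-8 is attractive when most stored data is English: that data costs no more space than it did in ASCII.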


UTF-16 is another widely adopted encoding within the Unicode standard. It assigns two bytes to each character whether you need them or not. So the letter A is 00000000 01000001, or 9 zeros, a one, 5 more zeros, and a one. If more than 2 bytes are needed for a character, four bytes can be combined (a surrogate pair), though your software must be capable of handling these four-byte combinations. Java and .Net internally process strings (text and messages) as UTF-16.
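JavaScript strings are also UTF-16 internally, which makes the two-byte units and surrogate pairs easy to observe:

```javascript
// JavaScript string length counts UTF-16 code units, not characters.
console.log('A'.length);       // 1 code unit
console.log('😀'.length);      // 2 code units: a surrogate pair
console.log([...'😀'].length); // 1 actual character (code point)

// The 16-bit UTF-16 form of 'A': nine zeros, a one, five zeros, a one.
console.log('A'.charCodeAt(0).toString(2).padStart(16, '0'));
// "0000000001000001"
```

The `length` vs. code point mismatch is precisely the four-byte-combination handling the paragraph above warns about.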

For many applications, you can actually support multiple Unicode encodings, so that, for example, your data is stored in your database as UTF-8 but handled within your code as UTF-16, or vice versa. There are various reasons to do this, such as software limitations (different components supporting different Unicode encodings), storage or performance advantages, and so on. But whether that’s a good idea is one of those “it depends” kinds of questions. Implementation can be tricky, and clients pay us good money to solve this.

Microsoft’s SQL Server is a bit of a special case, in that it supports UCS-2, which is like UTF-16 but without the 4-byte characters (only the 16-bit characters are supported).

GB 18030

There’s also a special-case character set when it comes to engineering software intended for sale in China (PRC), where it is required by the Chinese government. This character set is GB 18030, and it is actually a superset of Unicode, supporting both simplified and traditional Chinese. Similarly to UTF-16, the GB 18030 encoding allows 4 bytes per character to support characters beyond Unicode’s “basic” (16-bit) range, and in practice supporting UTF-16 (or UTF-8) is considered an acceptable approach to supporting GB 18030 (the UCS-2 encoding just mentioned, however, is not).

Now, all of this considered, a converse question might be: what happens when you try to make your application support complex scripts that need Unicode, and the support isn’t there? Depending on your system, you get anything from garbled, meaningless gibberish, where data or messages become corrupted characters or weird square boxes, to an application crash that forces a restart. Not good.

If your application supports Unicode, you are ready to take on the world.


I18n JavaScript – the Good, the Bad, and the Ugly

i18n JavaScript: Given JavaScript’s status as the de facto browser client scripting language, and given the international nature of the Internet, it was inevitable that JavaScript and internationalization (i18n) would eventually cross paths. Fortunately, in this day and age of Unicode, character corruption can be avoided if care is taken to make sure JavaScript is using it. Unfortunately, strings are often hard-coded in JavaScript, and locale-specific methods behave unpredictably across environments, making localization more difficult.
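A small illustration of that unpredictability (this is my own sketch, not an excerpt from the white paper): without an explicit locale, JavaScript’s formatting methods fall back on the runtime’s default, so the same code prints different output on different machines. Pinning the locale makes it deterministic.

```javascript
// Unpredictable: output depends on the machine's default locale.
const d = new Date(Date.UTC(2010, 5, 1)); // June 1, 2010 UTC
console.log(d.toLocaleDateString()); // varies by environment

// Deterministic: locale (and time zone) made explicit.
console.log(
  new Intl.DateTimeFormat('en-US', { timeZone: 'UTC' }).format(d)
); // "6/1/2010"
```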

To continue reading, and to see how JavaScript strings and data formatting can be supported by your selected locale, please fill out the form below. A brief preview:

Assuming currentLocale is set to English (US), the resulting code block should look like this:



Enter Your Information to Download the White Paper
