Understanding Internationalization Stakeholders

by Adam Asnes, President, Lingoport
As appeared in Multilingual Magazine

In pretty much all of our client engagement opportunities at Lingoport, we quickly arrive at a common discrepancy in how people within organizations view the decision process for internationalization and localization. On the one hand you have a VP or CEO saying, “We must have this product ready for such and such market by year end!” and on the other extreme, you might have an engineer plotting out her decision process based on technical, task-oriented details, like locale frameworks, database changes and the like. One mindset is event- or strategy-driven. The other focuses on the minutiae of the process. Neither approach is wrong, but I always feel the client is best served when both mindsets come together.

When a company internationalizes its software, it is fundamentally changing its world view, from the status quo of selling what it has for its home market to adapting software to work gracefully in any language or locale. It’s a strategic vision or customer request that brings this about. Or in many cases, a company may even have been localizing product support information, yet selling its software in an English-only version for many years, and recognizes it needs to correct that weakness. Fortunately for us, internationalization is becoming less of a surprise process as executive understanding of software globalization has been maturing.

Globalization is a hot strategic subject for just about every business conference these days. Competition worldwide is tougher, and overall world demand for software is up, so the globalization impetus is hardly visionary any longer. I like to broadly summarize internationalization drivers as:

  • The boss went to a conference/board meeting/gathering and sees that he/she must move forward more aggressively with supporting global software sales.
  • Or, the company has a big new client/partner/joint venture opportunity, but it requires that the software work in one or several other languages.
  • A competitor is successfully entering new markets with an internationalized product and the company must catch up to compete.
  • Or, the company is already quite global but is purchasing another company which is not, and needs to get the software adapted as quickly as possible.
  • The company has a global view, but developed software quickly and as such, let internationalization go in favor of getting to market quickly. The product has proven successful and it’s time to roll it out.

The same company, just depending upon the business unit or product team, may fit into some level of all these business drivers.

Executive View

The executive team will be concerned about the balance of issues regarding delivery time, marketing, sales and personnel expenses, setting up offices/distributors/partners, legal and tax issues, and more, weighed against revenue projections. Internationalization for them is getting the product ready so that it supports revenues, global logistics and strategies. It’s a key part of the deliverable, though clearly a means to a carefully projected outcome.

Engineering View

I have yet to meet the VP of Engineering, or any engineer for that matter, who wakes up one morning and thinks, “Gee, I think I’ll internationalize our software because it would be cool!” Engineering is in general overtasked, shorthanded, time critical and primarily responsive to documented marketing requirements. New feature functionality, on the other hand, is occasionally trail blazed by engineering even before marketing clearly understands a need. For most engineers, internationalization means revisiting development they’ve already done and breaking it, only to rebuild it again. That’s seen differently than a new feature.

Engineering will view internationalization as a technical objective and use case, deconstructing it into tactical steps. As a rule, engineers are really smart people, so they go about figuring out how to internationalize their code, but often with no or limited previous internationalization experience. So they intensively hit the books and Google. Here at Lingoport, after internationalizing so many applications over so many programming languages, we are still learning with every implementation, but the bank of knowledge has become quite deep. When internationalizing a complex software system for the first time, the engineers will almost certainly mis-scope part of the effort, make some mistakes, endure some poor assumptions and run late. That has the potential to sabotage the plans that the executive team is counting on. This is where, at a minimum, getting some educated advice, tools and assistance can be highly effective in meeting broader market release goals and obligations.

On top of that, engineering time is never free or infinitely available, though sometimes both these conditions are initially assumed. The development team requires salaries and other support. Engineering production also has an important opportunity cost. Does the team work on new features for their current clientele in markets where they are already strong, or do they take a “time out” on new feature development to engage in a full-on internationalization effort? You can rarely have both going on at the same time unless you bring in outside help, with well-coordinated project management and a good source control strategy.

I consider it part of our job, when working with clients, to bring together the executive and engineering criteria, so the strengths of both are considered and all stakeholders are educated and can have a predictable outcome. This makes a foundation for stronger individuals, teams, products and companies.

Unicode and Internationalization Primer for the Uninitiated

Among our friends and clients at Lingoport, we regularly see everything from confusion to a complete lack of awareness of what Unicode is. So for the less- or under-informed, perhaps this article will help. The advent of Unicode is a key underpinning for global software applications and websites so that they can support worldwide language scripts. So it’s a very important standard to be aware of, whether you’re in localization, an engineer or a business manager.

Unicode and Internationalization

Firstly, Unicode is a character set standard used for displaying and processing language data in computer applications. The Unicode character set is the entire world’s set of characters, including letters, numbers, currencies, symbols and the like, supporting a number of character encodings to make that all happen. Before your eyes glaze over, let me explain what character encoding means. You have to remember that for a computer, all information is represented in zeros and ones (i.e. binary values). So if you think of the letter A in the ASCII standard of zeros and ones it would look like this: 1000001. That is, a 1 then five zeros and a 1 to make a total of 7 bits. The number behind this binary representation (65 in decimal) is called A’s code point, and the mapping of characters to zeros and ones is called the character encoding.

In the early days of computing, unless you did something very special, ASCII (7 bits per character) was how your data got managed. The problem is that ASCII doesn’t leave you enough zeros and ones to represent extended characters, like accents and characters specific to non-English alphabets, such as you find in European languages. You certainly can’t support the complex characters that make up Chinese, Korean and Japanese languages. These languages require 8-bit (single-byte) or 16-bit (double-byte) character encodings. One important note on all of these single- and double-byte encodings is that they are a superset of 7-bit ASCII encoding, which means that English code points will always be the same regardless of the encoding.
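To make the zeros and ones concrete, here is a minimal sketch in Python (a language chosen purely for illustration; the article itself names none). It prints the 7-bit pattern for A and shows that an accented letter cannot be squeezed into ASCII, while a single-byte encoding can hold it.

```python
# The code point of "A" is 65, which is 1000001 as a 7-bit binary value.
code_point = ord("A")
bits = format(code_point, "07b")
print(code_point, bits)

# ASCII has no room for accented letters, so encoding one fails.
try:
    "é".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)

# A single-byte (8-bit) encoding such as ISO-8859-1 holds it in one byte,
# and plain English text keeps the same byte values it has in ASCII.
latin1_bytes = "é".encode("iso-8859-1")
same_for_english = "A".encode("iso-8859-1") == "A".encode("ascii")
print(latin1_bytes, same_for_english)
```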

The Bad Old Days

In the early computing days, specific single- and double-byte character encodings were developed to support various languages. That was very bad, as it meant that software developers needed to build a version of their application for every language they wanted to support that used a different encoding. You’d have the Japanese version, the Western European language version, the English-only version and so on. You’d end up with a horde of individual software code bases, each needing its own testing, updating and ongoing maintenance and support, which is very expensive, and pretty near impossible for businesses to realistically support without serious divergence among the various language versions over time. You don’t see this problem very often for newly developed applications, but there are plenty of holdovers. We see it typically when a new client has turned over their source code to a particular country partner or marketing agent which was responsible for adapting the code to multiple languages. The worst case I saw was in 2004 when a particular client, whom I will leave unmentioned, had a legacy product with 18 separate language versions and no longer had any real idea how functionality varied from language to language. That’s no way to grow a corporate empire!

ISO Latin

A single-byte character set that we often see in applications is ISO Latin 1, which is represented, with some platform-specific differences, in encodings such as ISO-8859-1 for UNIX, Windows-1252 for Windows and MacRoman on guess what platform. This character set supports characters used in Western European languages such as French, Spanish, German, and U.K. English. Since each character requires only a single byte, this character set provides support for multiple languages, while avoiding the work required to support either Unicode or a double-byte encoding. Trouble is that still leaves out much of the world. For example, to support Eastern European languages you need to use a different character set, often referred to as Latin 2, which provides the characters that are uniquely needed for these languages. There are also separate character sets for Baltic languages, Turkish, Arabic, Hebrew, and on and on.

When having to internationalize software for the first time, sometimes companies will start with just supporting ISO Latin 1 if it meets their immediate marketing requirements and deal with the more extensive work of supporting other languages later. The reason is that it’s likely these software applications will need major reworking of the encoding support in their database and in the functions, methods and classes within their source code to go beyond ISO Latin support, which means more time and more money, often cascading into later releases and foregone revenues. However, if the software company has truly global ambitions, they will need to take that plunge and provide Unicode support. I’ll argue that if companies are supporting global customers, even without doing a bit of translation/localization for the interface, they still need to support Unicode so they can provide processing of their customers’ global data.
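The limits of one-character-set-per-region can be seen in a short sketch. This Python example (illustrative only) decodes the same stored byte under Latin 1 and Latin 2 and gets two different letters, then shows that a Hungarian letter has no slot at all in Latin 1.

```python
# The same stored byte means different letters under different
# single-byte character sets.
raw = bytes([0xF5])
as_latin1 = raw.decode("iso-8859-1")  # Western European (Latin 1)
as_latin2 = raw.decode("iso-8859-2")  # Eastern European (Latin 2)
print(as_latin1, as_latin2)

# Latin 1 has no slot for some Eastern European letters.
try:
    "\u0151".encode("iso-8859-1")  # Hungarian o with double acute
    fits_in_latin1 = True
except UnicodeEncodeError:
    fits_in_latin1 = False
print(fits_in_latin1)
```

An application that only records the bytes, without recording which character set they belong to, cannot tell these two readings apart.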


We come back to Unicode, which as we mentioned above, is a character set created to enable support of any written language worldwide. Now you might find a language or two lacking Unicode support for its script, but that is becoming extremely rare. For instance, at the time of writing Javanese, Loma, and Tai Viet are among the scripts not yet supported. Arcane until you need them, I suppose. I remember a few years ago when we were developing a multilingual site which needed support for Khmer and Armenian, and we were thankful that Unicode had just added their support a few months prior. If you have a marketing requirement for your software to support Japanese or Chinese, think Unicode. That’s because you will need to move to a double-byte encoding at the very least, and as soon as you go through the trouble to do that, you might as well support Unicode and get the added benefit of support for all languages.


Once you’ve chosen to support Unicode, you must decide on the specific character encoding you want to use, which will be dependent on the application requirements and technologies. UTF-8 is one of the commonly used character encodings defined within the Unicode Standard, which uses a single byte for each character unless it needs more, in which case it can expand up to 4 bytes. People sometimes refer to this as a variable-width encoding since the width of the character in bytes varies depending upon the character. The advantage of this character encoding is that all English (ASCII) characters will remain as single bytes, saving data space. This is especially desirable for web content, since the underlying HTML markup will remain in single-byte ASCII. In general, UNIX platforms are optimized for UTF-8 character encoding. Concerning databases, where large amounts of application data are integral to the application, a developer may choose a UTF-8 encoding to save space if most of the data in the database does not need translation and so can remain in English (which requires only a single byte in UTF-8 encoding). Note that at the time of writing some databases do not support UTF-8, specifically Microsoft’s SQL Server.
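The variable width of UTF-8 is easy to check. The following Python sketch (sample characters chosen for illustration) counts the bytes UTF-8 uses for characters from different scripts, and shows that HTML markup around non-English text stays one byte per character.

```python
# One byte for ASCII, more bytes only where a character needs them.
samples = ["A", "é", "日", "😀"]
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
print(utf8_lengths)

# The markup stays single-byte; only the Japanese text expands.
html = "<p>日本</p>"
html_bytes = len(html.encode("utf-8"))
print(html_bytes)  # 7 bytes of markup + 2 characters * 3 bytes
```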


UTF-16 is another widely adopted encoding within the Unicode standard. It assigns two bytes for each character whether you need it or not. So the letter A is 00000000 01000001 or 9 zeros, a one, followed by 5 zeros and a one. If more than 2 bytes are needed for a character, four bytes can be combined; however, you must adapt your software to be capable of handling this four-byte combination. Java and .NET internally process strings (text and messages) as UTF-16.
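As a quick check of the bit pattern above, this Python sketch encodes A as UTF-16 and then encodes a character that needs the four-byte combination. Big-endian byte order is assumed here only so the bytes print in the same order as the text writes them.

```python
# "A" in UTF-16 is two bytes: a zero byte followed by the ASCII value.
a_bytes = "A".encode("utf-16-be")
a_bits = " ".join(format(b, "08b") for b in a_bytes)
print(a_bits)

# A character beyond the 16-bit range is stored as two 2-byte units
# (a surrogate pair), so software must treat the four bytes as one character.
emoji_bytes = "😀".encode("utf-16-be")
print(len(emoji_bytes), emoji_bytes.hex())
```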

For many applications, you can actually support multiple Unicode encodings so that for example your data is stored in your database as UTF-8 but is handled within your code as UTF-16, or vice versa. There are various reasons to do this, such as software limitations (different software components supporting different Unicode encodings), storage or performance advantages, etc. But whether that’s a good idea is one of those “it depends” kinds of questions. Implementing can be tricky and clients pay us good money to solve this.
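A rough sketch of the storage side of that trade-off, in Python: the helper name `transcode` and the sample strings are assumptions made for illustration. It converts stored bytes from one Unicode encoding to another and compares how many bytes English and Japanese text take in each.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert stored bytes from one encoding to another via decoded text."""
    return data.decode(source).encode(target)

# Data stored as UTF-8, handed to a component that expects UTF-16.
stored = "Grüße".encode("utf-8")
for_component = transcode(stored, "utf-8", "utf-16-le")

# English is smaller in UTF-8; Japanese is smaller in UTF-16.
texts = {"english": "Hello", "japanese": "こんにちは"}
sizes = {
    (name, enc): len(text.encode(enc))
    for name, text in texts.items()
    for enc in ("utf-8", "utf-16-le")
}
print(sizes)
```

Which encoding wins depends on what the data mostly contains, which is one reason the answer is "it depends".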

Microsoft’s SQL Server is a bit of a special case, in that it supports UCS-2, which is like UTF-16 but without the 4-byte characters (only the 16-bit characters are supported).
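One way to see the practical difference between UCS-2 and UTF-16 is to test whether text stays within the 16-bit range. This is a minimal Python sketch; the function name `fits_in_ucs2` is hypothetical and not part of any database API.

```python
def fits_in_ucs2(text: str) -> bool:
    """True if every character is in the 16-bit range that UCS-2 can hold."""
    return all(ord(ch) <= 0xFFFF for ch in text)

print(fits_in_ucs2("日本語"))        # common Japanese characters
print(fits_in_ucs2("\U0002000B"))   # a rarer Han character beyond 16 bits
```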

GB 18030

There’s also a special-case character set when it comes to engineering for software intended for sale in China (PRC), which is required by the Chinese government. This character set is GB 18030, and it is actually a superset of Unicode, supporting both simplified and traditional Chinese. Similarly to UTF-16, GB 18030 character encoding allows 4 bytes per character to support characters beyond Unicode’s “basic” (16-bit) range, and in practice supporting UTF-16 (or UTF-8) is considered an acceptable approach to supporting GB 18030 (the UCS-2 encoding just mentioned is not, however).
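The relationship between Unicode text and GB 18030 bytes can be sketched with Python's built-in `gb18030` codec. The sample characters are chosen for illustration; the example shows a round trip and the varying byte widths.

```python
# Unicode text converts to GB 18030 bytes and back without loss.
text = "中文 \U0002000B"
encoded = text.encode("gb18030")
round_trip = encoded.decode("gb18030")

# Width varies: ASCII is one byte, common Chinese characters are two,
# and characters beyond the 16-bit range take four.
gb_lengths = {ch: len(ch.encode("gb18030")) for ch in ["A", "中", "\U0002000B"]}
print(round_trip == text, gb_lengths)
```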

Now all of this considered, a converse question might be, what happens when you try to make your application support complex scripts that need Unicode, and the support isn’t there? Depending upon your system, you get anything from meaningless gibberish, where data or messages become corrupted characters or weird square boxes, to an application crash that forces a restart. Not good.
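Both kinds of corruption are easy to reproduce. In this Python sketch (illustrative only), text written as UTF-8 is read back as if it were ISO Latin 1, and Japanese text is forced through an ASCII-only path.

```python
# Bytes written as UTF-8 but read as ISO Latin 1 turn into gibberish.
original = "café"
stored = original.encode("utf-8")
garbled = stored.decode("iso-8859-1")
print(garbled)

# Forcing text through an encoding that lacks its characters loses them.
lost = "日本".encode("ascii", errors="replace")
print(lost)
```

Note that nothing raises an error in either case; the data is silently damaged, which is why these problems often surface late.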

If your application supports Unicode, you are ready to take on the world.

I18n JavaScript – the Good, the Bad, and the Ugly

Given JavaScript’s status as the de facto browser client scripting language, and given the international nature of the Internet, it was inevitable that JavaScript and internationalization (i18n) would eventually cross paths. Fortunately, in this day and age of Unicode, character corruption can be avoided if care is taken to make sure JavaScript is using it. Unfortunately, strings are often hard coded in JavaScript and locale-specific methods are unpredictable, making localization more difficult.

Internationalization Engineering Planning: Secret Sauce

by Adam Asnes for Multilingual Computing

Just recently I got a call out of the blue from a colleague who leads his own internal internationalization (i18n) team at a well known software company, with many leading commercial products. The discussion particularly related to best practices and turning information into actual plans. I suppose the art of planning is kind of a “secret sauce” for any type of engineering. And i18n has its own special ingredients which need to be blended with their own puree of painful lessons. Seriously, i18n is dangerous stuff to estimate.

Here are a few reasons:

  • Requirements are notoriously easy to underestimate. People start just considering string management and then realize that’s just a small part of the full scope (see my other articles).
  • Code bases are typically very large and often you have limited history or connection to the people who wrote it.
  • Different programming languages, web servers, databases and platforms involve optimizing all kinds of encoding issues.
  • Internationalization issues aren’t easy to uncover and they are hidden in the code.
  • There may be all kinds of programming logic that will need to be rewritten as it just won’t work for multiple locales.
  • Architectural elements that need to be added, like locale operations or database changes, touch large amounts of the code, and tend to break everything.
  • The development team isn’t going to sit on their hands while the internationalization effort goes on – so you have two parallel coding efforts, one of which breaks everything (see prior item).

Any one of these issues has enough excitement to warrant an article on its own (and I may just take that path in the future), but it’s probably good to start on a high level describing some of the process with a few example questions and answers. What locales are being targeted and when? You can lump some aspects of target markets together by encoding. For instance, ISO-Latin 1 for Western European languages, Unicode for Asian languages. From there, you need a good idea of what the product in question actually does. How will the user need to set locale? Are address formats, phone numbers, dates, times, currencies, numerical units managed in particular ways? What are the various application tiers? How is data flowing from one part of the code to another?

Regarding those application tiers, are there whole sections of code that are out of scope? Could there be inherent danger in making them out of scope? What programming languages are used? There are drastic differences in how internationalization is handled among programming languages. Java and C# tend to be among the easiest with regard to i18n. PHP has gotten a lot better, but used to have no i18n framework. JavaScript is just a pain, as the very nature of how it’s used typically inspires all kinds of concatenation. C and C++ are typically difficult as there is just so much more involved with character set support, memory management and hosts of nasties like pointer arithmetic. On top of that, ANSI C/C++ is different than Microsoft C/C++. Microsoft products in many cases have their own special constraints. For instance, with regard to databases, Oracle will support ISO-Latin, UTF-8 and UTF-16 encodings. Yet Microsoft SQL Server is ISO-Latin or UCS-2 only (which happens to be nearly the same as UTF-16). The list of issues as they pertain to technologies goes on and on, but you get the idea.

You can break down planning a project in terms of:

  1. What’s not in the code that needs to be added?
  2. What’s in the code that needs to be changed?
  3. When does it need to be completed?
  4. Should parts of the effort be phased?
  5. What’s the budget?

The first question has everything to do with marketing, usage and technology requirements. If you miss requirements you will be late, and build something less desirable than imagined. What’s not in the code is broadly an architectural issue including everything from locale selection and operations, to the method of resource files being used. This takes good smart leadership which has been through i18n planning and construction efforts multiple times.

Question two is all about detective work. How are you going to find all the strings, methods and classes, programmatic logic patterns and more that have to be changed, yet lie buried in those hundreds of thousands to millions of lines of code? You can look at the interface and start to make guesses (the old way), or really count the issues, while locating and verifying them all with powerful diagnostic software tools. With the right software, you can relatively quickly list all internationalization issues, view them, confirm their location, and even figure the costs of translating the embedded strings.

Questions three and four are all about marketing plans and revenue expectations. Often there’s a lot riding on target dates, with advertising, sales and customers waiting. Plus you need to factor in ample time for testing. In a perfect world, you would internationalize to the fullest scope possible, but budget and timing reality may mean a phased approach with some aspects left out of scope, depending upon application needs, customer requirements and locale targets.

Question five, the budget, is often not fully known until the plan comes together. There may be a loose number assigned, but the specifics are a result of planning activities. Nevertheless, money is like oxygen. You’ll need a consistent supply if the project is to get finished without interruption.

Next comes the artistic finesse part. You have to put it all together into a plan. It takes experience to convert your data regarding requirements, architecture and code refactoring into a plan that optimizes the tasks, engineering team, schedule and costs. You could try applying hard metrics for this, like X number of issues means Y time, but often this is only a place to start. You have to plan for “surprises” and variations. Experience shows you where those tend to occur.

I suppose the chief service value that people buy from a quality i18n firm is that experience and its effect on risk, efficiency, time and expenses. Clients only looking to buy an hourly rate are missing the point.

Internationalization Tips for Successful Globalization

by Adam Asnes for ClientSide News

There are two kinds of software internationalization you can refer to – built in to the product from the start, and performed on existing code. The kind of internationalization (i18n) this article invokes isn’t the sort that’s designed into a product right from conception. That is less common, though the pull of global markets is changing that tide. Few application development teams have historically had the opportunity to incorporate world market foresight. They had to produce a product to market for the most immediate business requirements. So then most internationalization happens on existing code because someone sells something, a global company buys another company, or a strategic initiative has taken form. Suddenly there is a new requirement for software to work in any number of new languages and locales. Business requirements drive technical schedules first, rather than involving a creative path of inventing new cool functionality or products from the ground up.

I’m tempted to just write Don’t Panic, carry a towel and avoid Vogon poetry – and while you’re at it, Unicode’s pretty good stuff. I’m being flippant because internationalization efforts tend to each have their own unique challenges when you get into the details. I’ll instead provide this article as a series of i18n process tips that apply across the board. In general Internationalization (i18n) is messy, full of exceptions, and generally not considered optimally from a development perspective. Maybe that should be tip one.

Tip One: Internationalization is ugly. Expect that from the start. You are reverse engineering basic logic of how your software inputs, stores, retrieves, transforms and displays data. You are adding user interaction functionality that your product wasn’t originally designed to do. It’s rarely just about embedded strings. There are a lot of things that can go wrong. It’s a lot of work. In some cases you can run into weird stuff from areas such as compilers, middleware, database connectivity, and even low level operating system issues.

Tip Two: Get the big picture questions handled quickly. That is, what are the high level requirements, how much time do you have, how much time do you need and how much budget can you get? Be prepared to ask for what you need in the CFO’s or CEO’s language.

Tip Three: Remember what’s driving this – Revenue. Internationalizing a complex application is a big new requirement. Don’t underestimate. Being late will cause delays in revenue, stall marketing and sales investments and make you very unpopular. Do it poorly and rushed, and your product will be shabby for the very new customers you seek.

Tip Four: Do some good research or get help identifying requirements. For instance, consider language only as one aspect of a locale. English is a language. Yet England is a different locale, with different expected behavior than the States. Consider numerical formats, dates, times, postal addresses, phone numbers, paper size, currencies and more. Then add the specifics that your application may need, like any possible customizations of workflow, locale selection and more. Consider what the optimal character encoding implementation strategy is for your computer platforms, application tiers, programming languages, database requirements, etc.
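To illustrate why language alone does not define a locale, here is a minimal Python sketch. The `CONVENTIONS` table and the two helper functions are hand-written assumptions for illustration only; a real project would normally take this data from a locale library rather than maintain it by hand.

```python
from datetime import date

# Hypothetical, hand-written conventions for three locales.
CONVENTIONS = {
    "en_US": {"date": "{m:02d}/{d:02d}/{y}", "group": ",", "decimal": "."},
    "en_GB": {"date": "{d:02d}/{m:02d}/{y}", "group": ",", "decimal": "."},
    "de_DE": {"date": "{d:02d}.{m:02d}.{y}", "group": ".", "decimal": ","},
}

def format_date(value: date, locale: str) -> str:
    """Format a date using the pattern assumed for the locale."""
    pattern = CONVENTIONS[locale]["date"]
    return pattern.format(d=value.day, m=value.month, y=value.year)

def format_number(value: float, locale: str) -> str:
    """Format a number with the locale's grouping and decimal separators."""
    rules = CONVENTIONS[locale]
    whole, fraction = f"{value:,.2f}".split(".")
    return whole.replace(",", rules["group"]) + rules["decimal"] + fraction

day = date(2024, 3, 7)
for loc in CONVENTIONS:
    print(loc, format_date(day, loc), format_number(1234567.891, loc))
```

The two English locales share a language and number format but disagree on the date, which is exactly the kind of requirement that gets missed when only translation is considered.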

Tip Five: Get some good code intelligence. Tools like our Globalyzer software let you comb through your source and identify all kinds of internationalization issues right up front. It’s way better to get a good inventory of what you need to inspect and change, rather than hunting through your myriad lines of code trying to anticipate all kinds of variable conditions using grep, and then trial and error your way through the boatloads of issues you’ll miss.

We are just adding a new capability to Globalyzer, a leading software internationalization tool, called Diagnostics. It will give you summary information on internationalization readiness and issues found in your code. It’s fully functional even with just a trial Globalyzer license. No excuses, it’s free to use all you want.

Tip Six: Prepare for nests of difficulties depending upon your programming language(s), database and third party products. Programming languages rate differently in terms of difficulty to internationalize. For instance C and C++ are harder, with many hundreds of potential issues, compared to Java and C#, which have quite a bit of internationalization baked in. But Java and C# don’t internationalize themselves. You have to use their frameworks, which are very capable. The good thing is that when a programming language has well designed internationalization capability, the work goes faster.

Tip Seven: Third party products can cause some challenges. They are not always built for your new internationalization needs. For instance, a couple of years ago we worked on a product that used a third party product for displaying animations in a kids’ game. At first glance, you wouldn’t think it would be an issue, as there was no text being processed or displayed. But when we looked at things more closely, user name and file path info was being passed into the animation tool, which in this case could very well involve wide characters (e.g. Chinese). But the particular version of the animation product could not support this, and so it would always crash. The fix took time and some inventiveness.

Another example involved a third party product that generated a spreadsheet view. While data within the cells was handling Kanji just fine, tabs were corrupting. The third party product provider had declared their product Unicode compliant, but in practice it wasn’t done all the way through. The choice became to find a better third party product to replace this one, or get the spreadsheet provider to fix their product, which they may or may not want to do on your schedule.

Tip Eight: Remember your i18n fundamentals. Don’t embed strings or concatenate them. Watch out for sorting. A and Z are not the beginning and end of all alphabets – some languages don’t use the concept of alphabets. Don’t hardcode fonts. Remember your interface geometry will need to expand. Use functions, methods or classes that adapt to locale needs. Use locale-adapting sorting or let your database perform sorting for you whenever possible.
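Two of these fundamentals can be sketched briefly in Python. The `MESSAGES` table, its keys and the `translate` helper are hypothetical stand-ins for external resource files. The first part shows whole-sentence templates with named placeholders in place of concatenation, so translators can reorder the pieces; the second shows why plain code-point sorting is not alphabetical order.

```python
# Hypothetical resource bundles; in practice these live in external files.
MESSAGES = {
    "en": {"files_deleted": "{user} deleted {count} files."},
    "de": {"files_deleted": "{user} hat {count} Dateien gelöscht."},
    "ja": {"files_deleted": "{user}さんが{count}個のファイルを削除しました。"},
}

def translate(locale: str, key: str, **values) -> str:
    """Look up a whole-sentence template and fill its named placeholders."""
    return MESSAGES[locale][key].format(**values)

print(translate("de", "files_deleted", user="Anna", count=3))

# Sorting by raw code point puts the German word "Ärger" after "Zebra".
words = ["Zebra", "Ärger", "Apfel"]
naive_order = sorted(words)
print(naive_order)
```

A locale-aware collation routine, or the database's own collation, would place "Ärger" near "Apfel" for German users.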

You can automate aspects of repetitive tasks like string externalization using Globalyzer. It makes that tedious job go much faster.

Tip Nine: Account for merging code with parallel feature developments. This can be tricky, as your new feature development cycles could be quite different from your internationalization milestones. In most cases, be prepared to branch the code for internationalization efforts.

Tip Ten: Use Pseudo Localization (PseudoJudo in Globalyzer) to perform many internationalization functional tests before you localize. That means you add pad characters from target locales to the beginning and end of strings, and stretch the whole string based on target requirements. You’ll then be able to see how those strings behave in your display and moving through application tiers, without your engineers needing to understand the target language.
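The idea can be sketched in a few lines of Python. This is a simplified illustration, not how PseudoJudo works internally; the function name, pad characters and 30% expansion factor are all assumptions.

```python
def pseudo_localize(text: str, pad_start: str = "[日本",
                    pad_end: str = "語]", expansion: float = 0.3) -> str:
    """Wrap a string in target-locale pad characters and stretch it."""
    extra = round(len(text) * expansion)  # filler to simulate longer translations
    return f"{pad_start}{text}{'~' * extra}{pad_end}"

print(pseudo_localize("Save file"))
```

If the padded string shows up truncated, garbled or missing its brackets anywhere in the interface or data flow, you have found an internationalization bug without needing a translator. A fuller version would also need to leave format placeholders inside the string untouched.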

Bonus Tip Eleven: Plan for QA to take longer than it did when your app was just monolingual. Remember, you have internationalization functional testing and bug fixing, with new testing cases, and then, should you be localizing, you have linguistic testing.

Lingoport’s Internationalization Approach

You’ve just received a request to prepare your software for sales opportunities in China, Japan and Germany. Your code base is large, maybe you don’t even know how large, but it’s had years of development. The question is how do you tackle the problem and successfully internationalize your code without expensive surprises and delays? Regardless of the size of your code base or what technologies you use, several key actions must be performed in order to create a product that works elegantly anywhere in the world. This document summarizes those actions and how Lingoport’s Globalyzer software, a leading software internationalization tool, enables seamless internationalization of code and long term maintenance.

Planning and Requirements

Internationalization projects can be strategic, tactical or both, depending upon the impetus to perform the effort. Whether internationalization is being pursued as an immediate response to a client opportunity or as a long planned effort to reach new clients in foreign lands can determine the pace, phasing and scope of internationalization. The easiest markets to internationalize for are countries with locale requirements which can be supported using ISO-Latin 1 character sets. These include Western European countries, the Americas, Australia and more. Bi-directional languages, such as Arabic and Hebrew have their own challenges. It can get one step more complicated to support Eastern European locales and further challenging to support “double-byte” languages such as Japanese, Chinese and Korean, using Unicode (though Unicode support should become part of your eventual, if not immediate internationalization plans). The right phasing will depend on a company’s opportunities, technologies and limitations.

Locale support requirements will also affect application logic and formatting. This includes I18n issues such as phone numbers, addresses, dates, times, sorting orders, units of measurement, currencies and more.
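
To make the formatting point concrete, here is a minimal Python sketch. The CONVENTIONS table and both helper functions are hypothetical names invented for this illustration, and the three locales are hard-coded by hand; a production system would normally draw these rules from a maintained locale library rather than a table like this one.

```python
from datetime import date

# Hypothetical, hand-built table for illustration only.
CONVENTIONS = {
    "en-US": {"date": "%m/%d/%Y", "group": ",", "decimal": "."},
    "de-DE": {"date": "%d.%m.%Y", "group": ".", "decimal": ","},
    "ja-JP": {"date": "%Y/%m/%d", "group": ",", "decimal": "."},
}


def format_date(value, locale_tag):
    """Format a date with the numeric pattern assumed for the locale."""
    return value.strftime(CONVENTIONS[locale_tag]["date"])


def format_number(value, locale_tag):
    """Format a number with two decimals and locale-specific separators."""
    rules = CONVENTIONS[locale_tag]
    text = f"{value:,.2f}"  # always "," for groups and "." for decimals
    # Swap both separators in a single pass so they cannot clobber each other.
    swap = str.maketrans({",": rules["group"], ".": rules["decimal"]})
    return text.translate(swap)
```

The same date, March 5, 2024, comes out as 03/05/2024 for en-US and 05.03.2024 for de-DE, which is exactly the kind of ambiguity that a fixed format buried in code creates.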

Locale selection and application behavior also need to be defined. For example, is the user’s locale being selected based on the browser setting, or based on account preferences? Does a user need to access or enter data in more than one language?

Technologies from programming languages to databases and third party products will have their differences in how they support locale and character sets.

Creating an internationalization architectural document and project plan gives the development team a clear roadmap while accounting for requirements and technologies. It also provides a resource that can be followed throughout a product’s lifecycle.

Database Refactoring

Often the first area we will address is migrating the database to the chosen encoding and multi-locale schema. This usually has far reaching implications for many software applications, touching upon how data is stored and retrieved.
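
As one illustration of what a multi-locale schema can look like, the sketch below keeps translatable text in its own table keyed by locale. It uses an in-memory SQLite database purely for demonstration; the table names, column names and fallback rule are assumptions made for this example, not a schema we would prescribe for any particular product.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one product in three locales."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (id INTEGER PRIMARY KEY, sku TEXT NOT NULL);
        CREATE TABLE product_text (
            product_id INTEGER NOT NULL REFERENCES product(id),
            locale     TEXT NOT NULL,
            name       TEXT NOT NULL,
            PRIMARY KEY (product_id, locale)
        );
    """)
    conn.execute("INSERT INTO product (id, sku) VALUES (1, 'KB-100')")
    conn.executemany(
        "INSERT INTO product_text (product_id, locale, name) VALUES (?, ?, ?)",
        [(1, "en-US", "Keyboard"), (1, "de-DE", "Tastatur"),
         (1, "ja-JP", "キーボード")],
    )
    return conn


def product_name(conn, product_id, locale, fallback="en-US"):
    """Return the name in the requested locale, else in the fallback locale."""
    row = conn.execute(
        "SELECT name FROM product_text "
        "WHERE product_id = ? AND locale IN (?, ?) "
        "ORDER BY CASE WHEN locale = ? THEN 0 ELSE 1 END LIMIT 1",
        (product_id, locale, fallback, locale),
    ).fetchone()
    return row[0] if row else None
```

Keeping the locale in the key means a new language is added as rows rather than as new columns, and the lookup can fall back to a default language when a translation is missing.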

String Identification and Externalization

Strings that are embedded in source code and will be seen by a product user, in most business cases, will have to be extracted from the code so they may be translated, and then the corresponding string must be presented to the user depending upon locale selection. However, there are lots of strings in your software that are really debug statements, database queries and the like, which will never be seen by a user, much less ever need to be translated. You have to sift through your code for what you need, and eliminate what you don’t. Then you have the process of externalizing all the strings. That’s slow and tedious work without the right tools and process.

Refactoring of Locale-limiting Methods/Functions and Web Page Encoding

Chances are that all through your code there are methods/functions and pages that won’t properly support your locale requirements. Issues can include support for character encoding, date/time/number fixed formats and the like. These have to be identified and fixed.

Third Party Products

Often software can include the use of third party products that may be used for anything from data input/output, graphics, reporting and more. Third party products need to be researched for any character corruption or locale limitations they may cause, and then rectified. This area can particularly cause surprises, as support isn’t always as claimed.


You need a way to test your application, without requiring your engineers to speak all your target languages. You need a plan and set of procedures to simulate supporting your new locale requirements.

Lingoport’s Approach

Lingoport offers knowledgeable internationalization architects and engineers, and is also the developer of Globalyzer, software for analyzing and performing internationalization efforts. By combining strong analysts with Globalyzer, a leading software internationalization tool, you can attack internationalization challenges by optimizing internationalization architecture together with comprehensively analyzing the internationalization issues buried in your code.

Analysis, Architecture and Planning

Our first step is to meet with your team, including product managers, marketing staff, developers and management to evaluate and develop requirements and plans. Simultaneously, we analyze your source code using Globalyzer, giving us a clear count of internationalization issues that will have to be rectified. We can then apply our metrics, both architectural changes needed and Globalyzer measurements, to accurately estimate internationalization development tasks.


During development construction, we actively use Globalyzer to speed up finding and fixing issues in code, including a wide range of programming languages and even database scripts. Our engineers have strong successful experience internationalizing all kinds of software, which makes the work move along well. We can also parallelize our work with your development team using Globalyzer’s client/server architecture to help us coordinate our efforts together.


During testing we use Globalyzer’s PseudoJudo to “pad” strings in resource files with target locale characters, enabling developers to test that all UI strings have been externalized, characters are properly rendered, fonts work properly and UI layouts expand as needed based on language requirements. We work with your team to make sure testing goes smoothly so that your product works exactly as expected.
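
For readers who have not seen string padding in action, here is a generic pseudo-localization sketch. It is not PseudoJudo’s actual implementation; the accent mapping, the 30% expansion factor, the bracket markers and the function name are all assumptions chosen for illustration.

```python
import re

# Assumed mapping of plain vowels to accented look-alikes.
ACCENTS = str.maketrans({
    "a": "à", "e": "é", "i": "î", "o": "ö", "u": "ü",
    "A": "À", "E": "É", "I": "Î", "O": "Ö", "U": "Ü",
})
# Matches {name}-style placeholders so they are left untouched.
PLACEHOLDER = re.compile(r"(\{[^{}]*\})")


def pseudo_localize(text, expansion=0.3, pad_char="日"):
    """Accent the vowels, pad the string and wrap it in brackets."""
    parts = PLACEHOLDER.split(text)
    # With a capturing group, the odd-numbered parts are the placeholders.
    accented = "".join(
        part if index % 2 else part.translate(ACCENTS)
        for index, part in enumerate(parts)
    )
    padding = pad_char * max(1, round(len(text) * expansion))
    return "[" + accented + padding + "]"
```

Any string that still shows up in the UI without accents and brackets was never externalized, a clipped closing bracket points to a layout that cannot expand, and a padding character rendered as a box or question mark points to a font or encoding problem.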

Ongoing Internationalization Support

To support internationalization as an ongoing requirement for all new product development, Globalyzer can be used in command-line mode as an automated process, measuring and reporting on any new internationalization issues that may be inadvertently introduced into code. Furthermore, our internationalization architectural documents serve as an important design reference for locale support for your product lifecycle.

Please contact us to discuss your next project.

Product Tip: Finding and Externalizing Strings in Large Amounts of Code

The Plunger Button: Sucking Strings from Software

It’s a point of cavalier pride that we figure Globalyzer, a leading software internationalization tool, is the only commercial software that features a toilet plunger in its interface. Obviously that flies in the face of internationalization (i18n) convention regarding use of culturally sensitive images in software. But if you’ve ever had to find and externalize strings without Globalyzer, you understand the metaphor pretty quickly.

Finding and externalizing strings in large amounts of code without Globalyzer is repetitive, tedious, error-prone and really not very fun at all. It can cause a serious distraction from other product critical feature development and bog teams down.

The Plunger dates back to us sitting around and trying to think what image makes sense for string externalization. At first, the Plunger was a funny joke. Next thing we knew, we were paying a graphic artist to draw it up, along with the rest of our buttons. We all still get a chuckle out of it. But enough about us, here’s why that button is so important and how to make it work best for you:

One of the important productive contributions that Globalyzer can make to internationalizing existing code is accurately finding and externalizing interface messages, otherwise known as strings. For any readers that might not be familiar with what strings are and why they are a pain, here’s a simple explanation: strings are messages, words (and I’ll lump in images) that are part of the interface of a product. If these words, messages and images are left in source code, they present a technical challenge for a translator to implement a translation without breaking the code.

Plus, even if you do successfully translate without first extracting the strings, and you happen to be really lucky or talented and not break the code, then you end up with a whole new version of your code to support.

Years ago, it was more common to see companies make this mistake. Now we still see it as a legacy of companies having distributors or agents manage adapting their products for various locales. We do still see companies not realizing that as multi-locale data comes in and must be processed by their applications, things break regarding data storage and manipulation, in addition to just display issues, but that’s another story for another day.

Finding strings buried in tens of thousands, hundreds of thousands or even millions of lines of code is challenging. Significant effort went into solving that problem, and we continually optimize Globalyzer’s string detection. It’s important to distinguish actual interface messages from programmatic strings such as database queries or debug statements. So Globalyzer lets you build special rules around string detection, in addition to providing many default detection and filtering capabilities.
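
To show why filtering matters, here is a deliberately crude sketch of string detection using regular expressions. It is far simpler than Globalyzer’s rule system and is not how Globalyzer works internally; the three patterns and the function name are assumptions for illustration, and a real scanner also has to cope with comments, multi-line literals and each language’s quoting rules.

```python
import re

# Double-quoted literal, allowing backslash escapes inside it.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')
# Literals that start like SQL are assumed to be queries, not UI text.
SQL_PREFIX = re.compile(r"^\s*(SELECT|INSERT|UPDATE|DELETE)\b", re.IGNORECASE)
# Lines that call a logger or debug helper are assumed not to be user-facing.
DEBUG_CALL = re.compile(r"\b(log|logger|debug)\s*[.(]", re.IGNORECASE)


def find_candidate_strings(source):
    """Return (line_number, text) pairs that look like user-facing strings."""
    candidates = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        if DEBUG_CALL.search(line):
            continue
        for match in STRING_LITERAL.finditer(line):
            text = match.group(1)
            if not text.strip() or SQL_PREFIX.match(text):
                continue
            candidates.append((line_number, text))
    return candidates
```

Even on a four-line sample containing a label, a debug statement, a query and a button caption, only two of the four literals are worth sending to a translator, which gives a sense of how much noise an unfiltered search produces on a large code base.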

Once you’ve found the strings, you need to put them in a separate file (e.g. properties, resources, .resx), and in place of each one, put a function call in the code that says exactly where that string is and tells the application to go get it. That’s where the Plunger comes in. Globalyzer’s GIDE interface lets you visually inspect all the strings detected. You can move from string to string, while also linking a source code view. When you are ready, you simply select the string and hit the Plunger button. The string is sucked out of the code, the command to get the string is put in its place, and Globalyzer generates and tracks numeric key values managing that string. All the string “bookkeeping” is done for you. Plus you can optionally insert a comment including the original string so you can see it in the context of your code.
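
In code terms, the result of externalizing a string looks roughly like the sketch below. The RESOURCES dictionary stands in for per-locale resource files, and the key values, the get_string function and the missing-key marker are hypothetical; the code Globalyzer actually generates depends on your language and resource format.

```python
# Stand-in for per-locale resource files (.properties, .resx and so on).
RESOURCES = {
    "en": {"1001": "Your session has expired.", "1002": "Save changes"},
    "de": {"1001": "Ihre Sitzung ist abgelaufen."},
}


def get_string(key, locale, fallback_locale="en"):
    """Look a key up in the locale's bundle, then in the fallback bundle."""
    for candidate in (locale, fallback_locale):
        bundle = RESOURCES.get(candidate, {})
        if key in bundle:
            return bundle[key]
    return "??" + key + "??"  # visible marker for a missing key


# Before externalization the code read:
#     message = "Your session has expired."
def session_expired_message(locale):
    # Original string: "Your session has expired."
    return get_string("1001", locale)
```

The source code now carries only a key, so translators work on the resource files and never touch the code, and the optional comment keeps the original wording visible to developers.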

Extracting Multiple Strings

Once you really get going on string externalization, you can use the multiple extraction Plunger button, shown above. You still need to visually inspect strings using Globalyzer’s GIDE to make sure that they aren’t concatenated or something you don’t want to externalize. However, this little button lets you externalize and automatically manage hundreds to thousands of strings at a time. Using Globalyzer, we’ve had customer development teams tell us that they could now find and extract in an afternoon, what had previously taken 6 weeks or more (plus costing release delays and the loss of hair), when they were doing it on their own, even when using simple utilities in their preferred IDE.

Even if you think you’ve already found and extracted all the strings in your source code, chances are good some have slipped through. In fact Lingoport is often hired to find and fix string issues in code that has been globalized previously. It’s just hard to find it all without a system like Globalyzer, and so strings sneak through, resulting in users seeing things like error messages in a language they don’t have command over. The result is a damaged perception of the product, plus a possible call to support.

Plunger Caveat

It’s important to remember that you still have to fix any string concatenation before extracting strings into resource, properties or resx files. Globalyzer provides help for that too.
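
Here is a small sketch of why concatenation has to go first. The message, the template table and both function names are invented for this example, and plural handling is ignored to keep it short.

```python
def build_message_concatenated(count, folder):
    """English word order is baked into the code; fragments cannot be reordered."""
    return "Found " + str(count) + " files in " + folder


# One complete sentence per locale, with named placeholders that each
# translation is free to reorder.
TEMPLATES = {
    "en": "Found {count} files in {folder}",
    "de": "{count} Dateien in {folder} gefunden",
    "ja": "{folder}に{count}個のファイルが見つかりました",
}


def build_message(locale, count, folder):
    """Fill the locale's full-sentence template."""
    return TEMPLATES[locale].format(count=count, folder=folder)
```

If the fragments "Found " and " files in " were externalized separately, a translator would see two meaningless scraps and could not move the verb to the end for German or put the folder first for Japanese. A single template per sentence avoids that.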

String Extraction Supported Programming Languages

Globalyzer 2.3 supports string extraction for Java, JSP, HTML, C#, ASPX, ASP, C/C++, PHP and Delphi. If you’re using something else, we can provide custom string externalization extensions to Globalyzer and do so in a timely and cost-efficient manner.

Internationalization Primer: How Helping Your Client Solve Coding Issues Can Give You a Competitive Advantage

By Adam Asnes for GALAxy: The Globalization and Localization Association (GALA) Newsletter – written for localization companies who are GALA members.

While recent industry headlines have been dominated by merger mania, I think the long term story for GALA companies is really about how to provide better service, products and returns for our customers. That’s how we compete for and keep customers. Within software localization, the functional emphasis is typically on words – word counts, what they cost, when they will be received, translation memories, translation quality, localization engineering and delivery milestones. But for our company, we get involved months, if not years, before our clients are ready to localize. This article aims to show that you can put internationalization to work as a repeatable and successful activity to differentiate your company further as a problem-solver, helping clients get to market faster and more efficiently.

Why Internationalization is Important

Internationalizing applications can be an extremely painful activity for software development organizations. If they do it poorly, they can expect a pretty weak localized product…and guess who gets blamed for that! There are many issues for development teams to consider regarding locale requirements when they create applications. If they are internationalizing existing code, the difficulty is compounded by actually having to find and fix all the issues buried in hundreds of thousands to millions of lines of code. Consequently, our customers tell us things like, “this is actually much harder to figure out and do than we thought.” Internationalization causes long delays in development and that means big delays for localization projects. Plus, companies usually do it wrong the first few times, and have to learn through painful lessons which initially seem like the localization company’s fault – not a good experience for your company to be associated with. I’d wager that many of you have lost customers because clients blamed localization issues on you, which were actually their own internationalization issues.

On the positive side, wouldn’t you want a new and earlier way to be involved with the development managers, product managers, VPs and CEOs of your clients? Internationalization is a significant undertaking for many companies. When it’s a new process, internationalization always involves executive decision making. It is not unheard of for our small company to make presentations to the board members of large, publicly traded companies as part of budget planning efforts and global decision making. We think that’s pretty cool! We have unique products and services that make the internationalization effort both scalable and repeatable for development teams, even if they are spread out around the globe. That makes us a strategic bridge for companies going global.

Internationalization 101

You can skip this part if you have a technical background, but it always surprises me that there is still the need to define internationalization within our industry. Though clients often confuse how they use the words internationalization and localization, whenever I talk to them, they are generally pretty clear on the differences in the processes, even if they do throw the wrong terms around. Yet I meet many localization sales people and executive staff that actually don’t understand what internationalization is at all. It’s simply a problem that they have never dealt with. Perhaps there’s more than a touch of “eyes glazing over in boredom” when they see technical articles about the subject; but you really don’t have to make major technical leaps to understand the issues. Simply put, internationalization is all of the planning and execution that needs to be included in the development of software that lets the software support languages and locale formatting (like numerical formats, dates, times, currencies, postal addresses and more). Applications not only have to be capable of displaying any language, they have to correctly allow the input, storage, processing and retrieval of that multilingual/multi-locale data. It mostly breaks down to engineering for a few categories of issues which include:

Character Encoding

Every character you see on the screen corresponds to a set of zeros and ones which get “interpreted” into what you read on the screen. How an application supports character encoding determines whether it will actually work in Chinese, Japanese, French, German, etc. This is where terms like Unicode or ISO-Latin apply. The right character encoding strategy isn’t always obvious and will depend on a balance of marketing requirements, technical requirements and development budget, especially if the code already exists rather than starting from scratch.
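
For the technically curious, this short sketch shows how the same text takes a different number of bytes depending on the encoding, and how some encodings simply cannot represent some characters. The function names are made up for this example; the encodings are the standard ones built into Python.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def describe(text):
    """Summarize how much space text needs under a few common encodings."""
    return {
        "characters": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),
        "fits_latin1": can_encode(text, "latin-1"),
    }
```

A German word such as "Grüße" fits in ISO-Latin 1 and needs a couple of extra bytes in UTF-8, while a Japanese word such as "日本語" cannot be stored in ISO-Latin 1 at all. That difference is why the encoding decision drives so much of the rest of the project.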

String, Images and Resource Management

Every message presented and ultimately translated in an application is referred to in software terms as a string. An important and time consuming part of internationalization involves finding all the user-facing messages (but can also include things like interface sizing), extracting them from the source code, and placing them in some kind of repository files (or database) appropriate to the software architecture. That way you can work on translating the words without breaking the source code. With the right engineering those words can be replaced with any language that the application is supporting. Additionally, string management includes issues like sorting, string concatenation and the like. You’ll also want to identify and manage any images that are embedded in the code (just like strings) so that they may be localized as necessary.

Locale-limiting Functions

Each programming language has its own set of functions or methods that do things like limit the way a date is interpreted, or how many bytes a character can contain. There are hundreds of these sneaky little things in C/C++ and there are dependencies based on your character encoding choice (e.g. Unicode UTF-8). Other programming languages such as Java and C# have fewer of these issues, but still have their own possible pitfalls. These functions need to be found and replaced with others that support the locale requirements that will be needed.
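
A classic example is code that assumes one character equals one byte. The sketch below, written in Python rather than C for brevity, contrasts a byte-based truncation with one that respects character boundaries; both function names are hypothetical.

```python
def truncate_bytes_naive(text, max_bytes):
    """Cut the UTF-8 bytes at a fixed length, as a byte-oriented routine would.

    The result may end in the middle of a multi-byte character.
    """
    return text.encode("utf-8")[:max_bytes]


def truncate_utf8_safe(text, max_bytes):
    """Cut to at most max_bytes of UTF-8 without splitting a character."""
    clipped = text.encode("utf-8")[:max_bytes]
    # The input was valid UTF-8, so only the tail can be incomplete;
    # errors="ignore" drops that partial trailing character.
    return clipped.decode("utf-8", errors="ignore")
```

With English text the two behave identically, which is why this kind of bug survives for years and only surfaces as corrupted characters once Japanese or Chinese data arrives.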

Locale-limiting Programming Patterns

Programmers may do many of the right things in terms of extracting strings, using functions that support “wide” characters and the like, but it’s still easy to get in trouble. Think of programming patterns as logic created for a specific application, which doesn’t work once you include issues around multiple locales. Programmatic sorting logic is a good example; a typical developer would sort by alphabetical order rather than by character brush stroke. Programming patterns can be a big nasty area to re-engineer, and it takes experienced examination and planning to manage.
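
The sorting problem is easy to demonstrate. In the sketch below, the default sort orders strings by raw character code, and the second function applies a rough accent- and case-insensitive key. That key is only a crude stand-in, an assumption made for illustration: real collation rules differ by locale (Swedish, for example, sorts ä after z) and need a proper collation library.

```python
import unicodedata


def folded_key(text):
    """Build a rough sort key: strip accents, then ignore case."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def sort_naive(words):
    """Sort by raw code point, which is what a default sort does."""
    return sorted(words)


def sort_folded(words):
    """Sort with the rough accent- and case-insensitive key."""
    return sorted(words, key=folded_key)
```

Given the words zebra, Äpfel, apple and Zulu, the naive sort puts Zulu first and Äpfel last, because uppercase letters and accented letters sit in different code ranges. The folded sort returns Äpfel, apple, zebra, Zulu, which is closer to what an English or German reader expects but still not correct for every locale.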

Locale Operators

Simply determine how the software will detect what locale it needs to support and how it will behave under the circumstances. For instance, does the user manually choose the locale, or does the application check the operating system setting?
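
One possible policy can be sketched as follows: an explicit account preference wins, then the browser’s Accept-Language header is consulted, and a default is used if nothing matches. The function names, the priority order and the language-only fallback are assumptions for this example; your own requirements may call for a different order.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        tag, _, params = part.strip().partition(";")
        if not tag:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Sort by descending quality, then by original position.
        weighted.append((-quality, position, tag.strip()))
    return [tag for _, _, tag in sorted(weighted)]


def select_locale(supported, account_preference=None,
                  accept_language="", default="en-US"):
    """Pick a supported locale: account setting, then browser, then default."""
    candidates = []
    if account_preference:
        candidates.append(account_preference)
    candidates.extend(parse_accept_language(accept_language))
    by_lower = {tag.lower(): tag for tag in supported}
    for candidate in candidates:
        lowered = candidate.lower()
        if lowered in by_lower:
            return by_lower[lowered]
        # Fall back on the language alone, e.g. "de-AT" matches "de-DE".
        language = lowered.split("-")[0]
        for key, original in by_lower.items():
            if key.split("-")[0] == language:
                return original
    return default
```

Writing the policy down as a single function forces the questions the requirements need to answer: which source wins, what happens with a regional variant you do not support, and what the user sees when nothing matches.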

Third Party Product Limitations

Most software makes use of other application components. These can include databases, reporting mechanisms (e.g. Crystal Reports), email generators and more. Often these components have their own internationalization support issues, which can create their own challenges for the software developer.

Localizing When the Client Hasn’t Internationalized

Another comment I hear from localization companies is that they have localized applications that weren’t internationalized, even working on translating strings that were buried in the code. I have to say this is a poor practice that should be avoided. I have had software companies come to us quite bitter about localization companies that were just doing what they were told in this regard. Chances are very high the software is going to break. In addition, making the interface translatable is just one part of the internationalization effort. If, by sheer luck, the application still works, they will not be able to leverage the translation when they go to a new version. There is no way this is going to have a happy ending in the long run. One way to help a customer in this situation is to suggest that they check their code, for example by running it through our Globalyzer software. This will give them a very clear inventory of what they need to fix. They can use Globalyzer to save 40% to 60% of time and resources to get the internationalization done, or they can hire us to do it for them.

How can you use all this to make a difference?

When your client says they are not ready for localization, that’s your signal to ask them if they are working on internationalization. If they still say no, find out if they have plans of going global with their software. The earlier they start thinking about internationalization and putting practices in place, the less painful the transition will be. If the client is experienced with localization, ask them if they are interested in learning about products that help them perform and verify internationalization so localization is made easier. You are doing them a service to bring it up and discuss it either way. This discussion can establish you as a strategic partner rather than another tactical translation company. Use internationalization to help you get to know your client’s organization – from Product Manager, to the VP of Development, to the VP of Marketing, to the CEO. Don’t try to talk techie if you’re not qualified. But discussing the concept can lead to opportunities and help you build a strong relationship. When it comes to the technical side, work with an internationalization expert who performs well both technically and professionally. Of course I’d like you to contact Lingoport, as we do a great job of partnering with localization companies, and, just as importantly, we have products and a well developed methodology that make internationalization far more efficient and complete.