标签云

微信群

扫码加入我们

WeChat QR Code

I'm developing a multilanguage software. As far as the application code goes, localizability is not an issue. We can use language specific resources and have all kinds of tools that work well with them.But what is the best approach in defining a multilanguage database schema? Let's say we have a lot of tables (100 or more), and each table can have multiple columns that can be localized (most of nvarchar columns should be localizable). For instance one of the tables might hold product information:CREATE TABLE T_PRODUCT (NAMENVARCHAR(50),DESCRIPTION NTEXT,PRICE NUMBER(18, 2))I can think of three approaches to support multilingual text in NAME and DESCRIPTION columns:Separate column for each languageWhen we add a new language to the system, we must create additional columns to store the translated text, like this:CREATE TABLE T_PRODUCT (NAME_ENNVARCHAR(50),NAME_DENVARCHAR(50),NAME_SPNVARCHAR(50),DESCRIPTION_EN NTEXT,DESCRIPTION_DE NTEXT,DESCRIPTION_SP NTEXT,PRICENUMBER(18,2))Translation table with columns for each languageInstead of storing translated text, only a foreign key to the translations table is stored. The translations table contains a column for each language.CREATE TABLE T_PRODUCT (NAME_FKint,DESCRIPTION_FK int,PRICENUMBER(18, 2))CREATE TABLE T_TRANSLATION (TRANSLATION_ID,TEXT_EN NTEXT,TEXT_DE NTEXT,TEXT_SP NTEXT)Translation tables with rows for each languageInstead of storing translated text, only a foreign key to the translations table is stored. The translations table contains only a key, and a separate table contains a row for each translation to a language.CREATE TABLE T_PRODUCT (NAME_FKint,DESCRIPTION_FK int,PRICENUMBER(18, 2))CREATE TABLE T_TRANSLATION (TRANSLATION_ID)CREATE TABLE T_TRANSLATION_ENTRY (TRANSLATION_FK,LANGUAGE_FK,TRANSLATED_TEXT NTEXT)CREATE TABLE T_TRANSLATION_LANGUAGE (LANGUAGE_ID,LANGUAGE_CODE CHAR(2))There are pros and cons to each solution, and I would like to know what are your experiences with these approaches, what do you recommend and how would you go about designing a multilanguage database schema.


Check codeproject.com/KB/aspnet/… stackoverflow.com/questions/3077305/… stackoverflow.com/questions/929410/…

2019年10月20日16分42秒

You can check this link: gsdesign.ro/blog/multilanguage-database-design-approach although reading the comments is very helpful

2019年10月20日16分42秒

LANGUAGE_CODE are natural key, avoid LANGUAGE_ID.

2019年10月20日16分42秒

I already seen/used the 2. and 3., I don't recommend them, you easily end up with orphaned rows. SunWiKung design looks better IMO.

2019年10月20日16分42秒

I prefer SunWuKungs design, which coincidentally is what we have implemented.However, you need to consider collations.In Sql Server at least, each column has a collation property, which determines things like case sensitivity, equivalence (or not) of accented characters, and other language-specific considerations.Whether you use language-specific collations or not depends on your overall application design, but if you get it wrong, it'll be hard to change later.If you need language-specific collations, then you'll need a column per language, not a row per language.

2019年10月20日16分42秒

This option is similar to my option nr 1 but better. It is still hard to maintain and requires creating new tables for new languages, so I'd be reluctant to implement it.

2019年10月20日16分42秒

it doesn't require a new table for a new language - you simply add a new row to the appropriate _tr table with your new language, you only need to create a new _tr table if you create a new translatable table

2019年10月20日16分42秒

i beleive that this is a good method. other methods require tons of left joins and when you are joining multiple tables that each of them have translation like 3 level deep, and each one has 3 fields you need 3*3 9 left joins only for translations.. other wise 3. Also it is easier to add constraints etc and i beleive searching is more resonable.

2019年10月20日16分42秒

When T_PRODUCT has 1 million rows, T_PRODUCT_tr would have 2 million.Would it reduce sql efficiency much?

2019年10月20日16分42秒

Mithril Either way you have 2 million rows. At least you don't need joins with this method.

2019年10月20日16分42秒

Good detailed response, many thanks. But what do you think about the collation issues in the Method 5 solution.It seems this is not the best way when you needed to sort or to filter the translated text in the multilingual environment with different collations. And in such case the Method 2 (which you "ostracized" so quickly :) ) could be a better option with slight modifications indicating target collation for each localized column.

2019年10月20日16分42秒

Eugene Evdokimov: Yes, but "ORDER BY" is always going to be a problem, because you can't specify it as a variable. My approach would be to save the collation name in the language table, and have this in the userinfo. Then, on each SQL-Statement you can say ORDER BY COLUMN_NAME {#collation}, and then you can do a replace in your dal (cmd.CommandText = cmd.CommandText.Replace("{#COLLATION}", auth.user.language.collation) . Alternatively, you can sort in your application code, e.g. using LINQ. This would also take some processing load off your database. For reports, the report sorts anyway.

2019年10月20日16分42秒

o.o This must be the longest SO answer I've seen, and I saw people make whole programs in answers. You're good.

2019年10月20日16分42秒

Can totally agree SunWuKung's solution is the best

2019年10月20日16分42秒

I agree, though personally I'd have a localised table for each main table, to allow foreign keys to be implemented.

2019年10月20日16分42秒

Although the third option is the most clean and sound implementation of the problem it is more complex then first one. I think displaying, editing, reporting the general version needs so much extra effort that it does not always acceptable. I have implemented both solutions, the simpler was enough when the users needed a read-only (sometimes missing) translation of the "main" application language.

2019年10月20日16分42秒

What if the product table contains several translated fields ? When retrieving products, you will have to do one additional join per translated field, which will result in severe performance issues. There is as well (IMO) additional complexity for insert/update/delete. The single advantage of this is the lower number of tables. I would go for the method proposed by SunWuKung : I think it's a good balance between performance, complexity, and maintenance issues.

2019年10月21日16分42秒

rics- I agree, well what do you suggest to ... ?

2019年10月20日16分42秒

Adam- I am confused, maybe I misunderstood. You suggested the third one, right? Please explain it in more detail how are relations between those tables gonna be ? You mean we have to implement Translation and TranslationEntry tables for each tables in DB ?

2019年10月20日16分42秒

this is good. but how would you search ( for example product_name ) ?

2019年10月20日16分42秒

Did you have a live example somewhere of your sample ? Did you get any problems by using it ?

2019年10月20日16分42秒

Sure, I have multilingual real estate project, we support 4 languages. The search is a bit complicated, but its fast. Of course in large projects it might be slower than it needs to be. In small or medium projects its ok.

2019年10月20日16分42秒

Same remark as Adam Davis's answer.

2019年10月20日16分42秒

What purpose does the Translation table or the TranslationItem.translationitemid column serve?

2019年10月21日16分42秒

I know that localization is a much broader topic and I am aware of the issues that you bring to my attention, but currently I am looking for an answer for a very specific problem of schema design. I assume that new languages will be added incrementally and each will be translated almost completely.

2019年10月20日16分42秒

Sure. I'd call the ProductItem table something like ProductTexts or ProductL10n though. Makes more sense.

2019年10月20日16分42秒