Properly Matching Character Sets and Collations in MySQL

Man works on character set and collation changes on his tablet.

Have you ever come across a scenario where your data looks good until you import it into MySQL? The data you’ve imported into MySQL is suddenly displayed as “???”, letting you wonder what’s wrong in your database configuration. The reason of this problem is a character set or collation mismatch – when worked with properly, character sets and collations can make our work with MySQL a dream, or a nightmare.

Collations and character sets have their place – they let MySQL decide where from we acquired data and how best to work with it internally; since we can choose from a whole sea of charsets and collations within the RDBMS, properly matching them for our use case is crucial.

What are Character Sets and Collations?

Put Simply, character sets define what characters are considered to be “legal” in a certain string, while collations are a set of rules that tell MySQL how to sort characters within the string. That’s why when we input, for example, data from a Chinese, Taiwanese, or Russian source without fully preparing for it, we’ll see “????” instead of our character string – our database won’t be prepared.

Character sets and collations aren’t unique to MySQL, though – they are relevant to every database management system we find ourselves using, including Oracle, Sybase, TimescaleDB, or, you’ve guessed it, any flavor of MySQL as well.

A proper match of character sets and collations is crucial if we want to work with data from “exotic” (not english) sources. Some of the character sets and collations considered “default” in MySQL are as follows:

  • The default character set is latin1 with a collation of latin1_swedish_ci. This character set is a fit if we want to work with Swedish characters (MySQL is built by Swedish people), or Russian characters.
  • If we use Chinese characters, the collation that would be a fit would be big5_chinese_ci with a charset of big5.
  • For general use cases, we should use a character set of utf8mb4 with a collation of utf8mb4_general_ci (or utf8mb4_unicode_ci.) The reason behind this is that utf8mb4 is considered the “true” form of utf8 since it can support 4 bytes in a character while utf8 only supports 3. For more “corner” use cases, there’s the utf8mb4_unicode_ci variation,
  • In the case of Hebrew characters, we should use a Hebrew charset with a collation of hebrew_general_ci.  You get the point – there are multiple character sets and collations we are able to choose to employ for a concrete use case. For a complete list, connect to MySQL and issue a SHOW CHARACTER SET query.

Some of the results derived by that query are shown below:

example of character sets

MySQL will even provide a description of the character set and collation, allowing you to make easier decisions.

Properly Matching Character Sets and Collations

As you can tell by now, there are numerous character sets and collations we are able to choose from. Properly matching them is a skill every database administrator gets hold of with experience and with the help of MySQL (see above) – there inevitably will be cases where we mismatch character sets and collations, and get to know of it after the fact, but that’s not a very big issue – and it allows us to learn as well.

Some of you might know that some character sets consume more space on the disk than others do and once we have a lot of data that could become an issue, but here’s a lifehack: to overcome such an issue, we can issue a SHOW CHARACTER SET query and look at the “Maxlen” column (see above), or if we are dealing with UTF8, closely observe a part of the collation that is in use – e.g. “utf8mb4” will use 4 bytes instead of 3, etc.

With all these character sets and collations, properly matching them all might be a challenge if we are not educated on how databases work – even with the query given above, MySQL doesn’t tell us all that much if we’re not aware of the basics of charsets and collations. Don’t worry much, though – the description, though very short and to the point, should become a good starting point.

There’s no need to get too in depth on this one – look at the language descriptors after the charset and decide from there. Before deciding, though, you should use when to use different character sets and how to choose.

When to Use Different Character Sets?

People mostly use different character sets (i.e. not the default ones provided by MySQL) when they’re working with data from sources that MySQL would consider “exotic” (i.e. not English or Swedish of nature.) The closest definition on when you should switch to a different character set is “whenever you feel like your data belongs to an “exotic” country.

Think of the countries people from Europe are vacationing in: Turkey, Dubai, Spain, Portugal, etc. There are countries that are different beasts in and of themselves, amongst them is China, Taiwan, Japan, South Korea, Russia and a couple of others (we mean those countries that talk in completely different alphabets than the one we are used to.)

If you are not sure on when to make the switch and to what character set, see the documentation for some more explanation, but the explanation above should put you closer towards your goal.

Going Beyond Character Sets and Collations

Once you’ve properly matched character sets and collations, there a couple more things you need to be aware of for your database to perform at the very best of its capability – you need to keep a very close eye on your database performance and availability as well.

Not even talking about security – everyone knows that after COVID, the number of data breaches have continued to rise and rise and no one wants their database infrastructure to become the next target.

For that, consider employing database monitoring inside of your database infrastructure – database monitoring tools such as the ones provided by dbWatch or database vendors themselves will let you see how your database is doing from availability, performance, or security points of view all the time. The problem is, though, that database management software solutions provided by database vendors (think MariaDB, Percona, TimescaleDB, Redis, and the like), are generally expensive and you’re paying more for the name than software capabilities.

Discover how dbWatch seamlessly supports diverse character sets—start your 90-day free trial today and experience the difference firsthand!

Can a database management solution help future-proof your business?

Future proofing is part and parcel of any business strategy. You need to plan for your business to succeed. A noteworthy solution is investing in a flexible and reliable database monitoring tool. By investing into it, your staff is equipped with the means to increase the productivity and your business adapts to the changing landscape of your IT infrastructure. However, the process of future proofing does not end with just getting the right tool. The application of the tool and how it can add value to your business will merit your business longevity and success.

In this article, we will discuss what a database monitoring tool is, why future proofing matters and the applications of a data monitoring tool to your business are.

Tools For Growing Businesses

Future proofing plans are indispensable when a business is continually growing. As management plots the course for their company’s direction, the alignment of everyone and everything being on the same page becomes more important than ever.

A pragmatic approach for future proofing is equipping your IT staff with the right tools. In terms of database management, this enables them to handle the growing number of databases dispersed into numerous clustered instances.

As your business grows, your databases and servers will need to handle the additional strain.You can opt to upgrade your hardware resources or optimize your databases by horizontally or vertically scaling them. Either method will cope with the increase in demand but implementing them would be costly.

Under a data management tool, you get a more cost-effective solution when applying either or both methods. Financially, you will not needlessly implement hardware upgrades since it monitors the memory of your database and provides real-time health checks. Time-wise, it saves you the additional steps in cascading database optimizations and monitor all your database because you can do it in a single window.Won’t it be great if you can reap the rewards of your business venture without worrying about possible future headaches?

Flexible Database Management Software

Business growth is unpredictable. You are not 100% certain what your exact needs for your IT systems will be five years down the line. In that regard, flexibility of another key attribute when searching for the correct database monitoring tool.

Future proofing reassures you that your business is prepared to what adversities may come. With a flexible database monitoring tool, you can grow your business the way you want.

Some database monitoring solutions are designed to work across multiple platforms and versions. This is great for businesses who are currently using multiple technologies or adopting a new database management systems or license.

Flexible tools that can work capable of working across multiple platforms and varying sizes of servers can be beneficial for your long-term business goals. Not only is your Return of Investment larger when you invest in a flexible database monitoring tool, you get the advantage of:

  1. Not needing to purchase different database tools.
  2. Preventing additional cost to train your staff in handling the tool.
  3. Decreasing the period of learning as your staff familiarizes themselves with the tool.

Evidently, there are numerous advantages and it can be package neatly into one database monitoring tool.

Application of Future-proofing Growing, scaling and monitoring

There are two ways to of scaling. One of them is horizontal scaling while the other is vertical scaling. Traditionally, vertical scaling is a preferred method since it involves focusing on a single component and improving upon it. But times change and business are more attune to horizontal scaling because it’s more cost effective and flexible than its counterpart.

 

Whether you choose to implement vertical scaling or horizontal scaling, one thing is for certain – your databases will continue to expand. This is where a database monitoring tool comes in handy. For your monitoring solutions, the tool hides the complexity of deep diving into health checks and status displays. You don’t need to be a veteran DBA to understand this tool; the tool does the work for you. Even an amateur DBA can perform database monitoring without the need to repeatedly rely on SQL scripting.

When a downtime scenario happens, a database monitoring tool will be your reliable partner. It gives pinpoint accuracy on what and where the problem occurred. Even if your databases are not the cause of the issue, you can eliminate your databases as the possible suspect and move on in analyzing your servers. It saves steps in analyzing the root cause and saves your staff time. Potentially, you can recuperate losses faster brought about by a downtime and reassure your customers that they mean more as they go about the business.

But what if your database is the issue? Automation is our key word. Automating backups and real time analysis becomes your go-to options. As mentioned earlier, you get a real-time analysis of problems encountered. If your database is no longer viable, you have backups to be restored in multiple instances with only a press of a button.

Cost-benefit with hybrid environments

Moving into cloud-based environment is a good alternative for future proofing. Scaling becomes more convenient and quickly adapt to business demands. But, there’s a catch for adapting a cloud-based environment; it’s a pay-per-usage model. Unchecked resource utilization and vendor’s discretion on provisioning cloud databases will eventually be the bane of your business.

But this should not discourage you from integrating cloud to your business. Cloud services has their own unique advantages and by correctly utilizing them, you gain more value in the long run.

 

Database monitoring is one key ingredient in tracking your cost. By knowing your actual utilization of resources, cost can be efficiently reduced. Insights into acquiring additional resources or trimming down resources through cost-benefit analysis are unlock in a strategic level. While, database performance monitoring, even if it’s inside a cloud platform, benefits DBAs on the technical level.

You can proportionally balance the utilization of both cloud and on-premise. It is mostly up to the business on how they will hybridize their IT structure but with a database monitoring tool, these challenges are dealt with.

Aside from savings in your IT infrastructure, you also save on auxiliary costs for hybrid environments. You won’t need to shoulder the incremental costs of procuring additional licenses nor do you need to retrain your DBAs to familiarize with cloud platform databases. You save them hours of exerting extra effort so they can proceed with more relevant tasks. It’s a win for you and for them.

Multiple platforms in a single view

Database platforms offer differing advantages and disadvantages in their structure and performance. One platform could partition its layers into multiple instances while the others follow one instance and subdivided into schemas. There are also platforms that prevents cross database scripting while other openly embrace this feature. Whatever the case maybe, when a business decides to adopt one of the platforms and maintain an old one, this burdens the DBA even more.

Imagine a business deciding to transition to Postgres while retaining their MSSQL legacy database. The DBA will be forced to learn both database structure, limitation of their SQL languages and maintain both databases into two separate windows. But what if the business wants to try out Oracle then MySQL? This becomes problematic as you add more database platforms to your business.

 

Luckily, a database monitoring tool handles this job problem efficiently. The tool helps the DBA in monitoring all database across all platforms. Even with someone whose database knowledge is not proficient with SQL scripting or with the database management software, he or she can utilize the tool’s features to a great degree.

Without the need of accessing multiple database management software, a database monitoring tool can access several heterogeneous databases and consolidates their statuses into one window. This makes it easier to do health checks to your databases, optimize functions and stored procedures, and monitor the uptime of your databases.

Monitoring an Oracle, for instance, will be a lot easier since the tool hides the complexity of the standard DBA procedure of doing health checks and optimizations. Not only that, you can save up more in terms of licenses.

Usually, business tend to conclude that additional licenses are their only option to answer their growing demand in storing data. The alternative of hiring or directing their staff to gather utilization information on their databases is no different. It costs time and money. But, with a database monitoring tool, you won’t needlessly purchase additional licenses nor allocate significant manpower to the task. The tool does it for you.

Ultimately, your DBAs needs are met so as your business needs.

Customization creates clear communication

A report can only convey much information as needed. Having a very lengthy and thorough report can bombard your readers with unwanted information and even discourage them from continuing.

Customization is another aspect that is essential for a database monitoring tool. By customizing your dashboard, you can get the overview of database’s performance. When generating reports, you only provide the most important information management seeks. No need to have lengthy emails explaining the issue at hand. A visual graph will summarize it for you. This line of communication delivers transparency between your managers and DBAs.

 

Customizing your dashboard and reports brings salient and accurate information for management to decide on their future strategy. In addition, with visual facts at their disposal, DBAs can confidently and easily execute solutions.

Recommendations

To sum it all up, it’s very reassuring with a database monitoring tool at your disposal when you are future proofing your business. It creates that freedom of choice, convenience and cost savings. However, do not just settle for any database management tool. You need to also assess your company’s needs and what value the tool offers. To help you with your decision process, below are questions that you might want to consider:

  1. – Is this tool compatible with the database management system I am currently using?
  2. – Will this tool still be compatible even if I change my database management system?
  3. – Does this tool support major database management systems such as Oracle and Microsoft?
  4. – With this tool, can my staff perform in that same level of efficiency even if my business rapidly grows?

You may not have the answers now, but there will be a point in time you will need to answer them. It’s very hard to decide on what database monitoring tool to use. There are several products out there in the market. But starting now with a brand like dbWatch can help you go a long way.

Alternative Storage Engines in MySQL

alternative storages engines with databases visual

If you’ve been a frequent user of MySQL now or in the past, you know that there are multiple storage engines MySQL allows us to choose from. The most frequent is InnoDB which is known as a “high performance” storage engine. If InnoDB is optimized properly via the parameters provided by MySQL in my.cnf, the storage engine can do wonders.

MyISAM is considered more obsolete though, but it has its own use cases including providing an exact count of the rows in a specific table and storing database files in a recoverable format since the files provided by MyISAM are not bound to any other files such as ibdata1 provided by InnoDB.

alternative storages engines with databases visual

What Alternative Storage Engines are Provided by MySQL?

InnoDB is the default storage engine provided by MySQL – that’s a given. However, there are a bunch of other storage engines provided by the RDBMS, including the following:

Percona XtraDB 

This storage engine is only available on Percona’s infrastructure (i.e., only for people using Percona Server). It is still essentially meant to be a fit for those who want InnoDB capabilities on their infrastructure.

It’s considered an advanced version of InnoDB because it’s designed for scale on any hardware, and it has a couple of other features designed for high performance and capability. It’s also fully backward compatible with InnoDB, so no worries if you want to switch back towards InnoDB after you’ve tried the storage engine and disliked it.

MyISAM

This storage engine was the default storage engine until MySQL 5 rolled around so it’s old as a piece of bread, and most database administrators advise against using it. In fact, the newest version of MySQL considers this storage engine to be obsolete as it’s not reliable, and also prone to crashes due to how it’s built.

However, it has a couple of upsides – those include the fact that all files relevant to MyISAM are stored in. MYD (.MYData) files alongside MYI (.MYIndex) files an d they are not dependent on any one file to work, unlike InnoDB is dependent on ibdata1.

The storage engine is also renowned for its capability to actually store row count of tables inside of its own metadata (that’s why InnoDB doesn’t show the row count and MyISAM does), but that’s really the extent of the usefulness of this storage engine. Users of MyISAM will face frequent crashes, data corruption, and other issues, which means that this storage engine is not a fit for those who are looking for performance capabilities neither now, or anytime in the future. It’s useful if we want to play around with it though!

MEMORY

As the name suggests, this storage engine stores all of its data inside of the memory. The storage engine is designed to be a special-purpose storage engine as its all of its contents stored in memory, unlike other storage engines. Such an approach has a glaring flaw – if our server is shut down, all of the data is gone, but on the other hand, the storage engine is very suitable as a temporary working realm or as a read-only cache for data.

CSV

This storage engine is known for its capability to store data in .CSV files together with comma-separated values. There’s not much that can be said here: .csv files usually consist of data values that are separated by a comma (“,”), and that’s the primary reason the storage engine is called that way. Such a storage engine isn’t used very frequently, but it can be useful for some corner-cases of project development, for example, for bulk data imports, or simply to make the job easier for an accountant working with Excel – that way files can be imported straight into MySQL without any additional hassle.

ARCHIVE

This storage engine, as the name already suggests, is suitable for archival operations. The primary use case of this storage engine is to store large amounts of data that isn’t accessed frequently – for such a use case, this storage engine is golden because it’s footprint on the disk is almost non-existent allowing DBAs and server admins to focus on other tasks rather than scaling up the server simply to support an archive of data.

FEDERATED 

This storage engine allows us to access data in a remote database without using replication or clustering in the fashion that if we query a local table with a FEDERATED storage engine, MySQL automatically pulls data from other tables running the same engine. As easy as that! One thing to note, though, is that the FEDERATED storage engine would not usually be enabled by default, but it can be enabled by running MySQL after using the –federated option.

BLACKHOLE

If there’s a storage engine in MySQL that makes you question the existence of the universe, it’s BLACKHOLE. Literally – the storage engine is called after a black hole in the universe in the fashion that it accepts data, but never stores it inside of itself. You might question yourself and wonder what it’s use case is, but MySQL explains it in a simple fashion that probably doesn’t warrant further explanation at all – such a storage engine should be used performance bottlenecks in MySQL that are not related to storage engines, or used for running triggers on data that shouldn’t be kept.

EXAMPLE

Last but not least, there’s an example in the MySQL infrastructure letting you get a grasp of how storage engines are built in the first place. That’s the job of the EXAMPLE storage engine – this storage engine doesn’t store data (or, better put, it doesn’t have any functionality at all), but rather, it’s intended to become an example on how to build or improve on the functionality of storage engines yourself. If you want an example that depicts how to build storage engines in the MySQL source code, look no further! Glance at the MySQL documentation and start building one yourself.

Exploring MySQL Beyond Storage Engines

Should you want to explore MySQL beyond it’s storage engine capabilities, be advised that it’s a dangerous world out there in the sense that there’s quite a bit of information that might not be true – to get up to date information, consider reading up on blogs such as the one provided by MariaDB, Percona, and other database vendors.

Of course, the dbWatch blog is a very good place to start – it will provide you information on various kinds of database management systems aside from MySQL, and provide you information on how to best solve the issues related to any database management system you elect to use.

Grab a free trial of dbWatch to see how your databases fare against issues targeting them free!