A more recent version of this article can be found here.
When we discuss data volumes in electronic discovery, we typically speak in terms of kilobytes, megabytes, gigabytes and sometimes even petabytes, at least for very large matters. What do we mean by these terms? How many bytes are in a kilobyte, a megabyte or a gigabyte?
The question is trickier than you might think. Are we talking about multiples of 1,000 or 1,024? Is a kilobyte 1,000 bytes or 1,024 bytes? Is a gigabyte 1,000 megabytes or 1,024 megabytes? I have heard both measures used by people I respect. Which is right?
Let’s start by checking the Sedona Conference Glossary. I am not saying this is the bible, but there are a lot of smart people involved with Sedona. I have always found their materials helpful and exceedingly well written.
In the Glossary, the Working Group on Electronic Document Retention and Production (WG1) defines a kilobyte as a unit of 1,024 bytes. They then go on to define a megabyte as 1,024 kilobytes (1,048,576 bytes) and a gigabyte as 1,024 megabytes (1,073,741,824 bytes).
Is that right? The authors don’t cite any authority for their definitions. (To be fair, none of the glossary terms are referenced.) They may have had specific authority in mind or it may simply represent the consensus of the group at that time. (A lot of people believe this definition is correct.) Either way, I think we need to look a little farther before we reach a conclusion. Here is why.
The Metric System
The metric system was an outgrowth of the work done in 1875 by the International Bureau of Weights and Measures (“BIPM” for the French version), which itself was set up by the “Metre Convention.” At the time, 17 countries banded together by treaty in an attempt to create measurement standards. Today, at least 51 countries have signed on to the treaty, including the United States. See, Le Système international d’unités (8th ed. 2006) (English translation at 95).
The group almost immediately began work ratifying definitions of the meter and kilogram, both measures that had been used in France and elsewhere for over a hundred years. That work led to the International System of Units (SI), which was ratified in 1960. It is often called the metric system, with expanding and contracting units built around the power of 10 (base 10).
Following this history, let’s move on to some firm ground. The prefixes “kilo,” “mega” and “giga” are a central part of the SI. Each prefix is defined as a power of 10. Under the SI, kilo means 10^3, mega means 10^6, and giga means 10^9. Under the SI, one gigabyte is 1 billion bytes. No ifs, ands or buts.
If kilo means 1,000, where did all this 1,024 business come in? We need to go back a bit to find out. Like, to the ’60s when I was still wearing tie dye t-shirts and playing in rock bands.
Early Days of Computing
In the early days of computing, computer professionals needed a way to describe numbers that were growing by the minute. As most of us know, computers are binary creatures, using combinations of 1 and 0 for all of their calculations. A bit is a single binary digit that can be either a 1 or a 0. A byte consists of 8 bits and is the smallest unit in computing associated with a letter or other character.
As the number of bytes used for programs or data grew larger, computer scientists needed a way to express these larger amounts easily. Out of convenience, they reached for decimal prefixes from the metric system to aggregate byte values. They borrowed the term kilobyte for units of 1,024 bytes, and then megabytes and gigabytes for the larger groupings. The idea caught on and people started using the terms to describe binary values based on a divisor of 1,024.
While a bit odd, this misuse of the metric prefixes didn’t matter very much, at least early on. With the two values being relatively close, it seemed simpler to give 1,024 a metric label than invent another name. Since the volumes they were talking about were low, who cared? The differential between the metric and binary approaches was more a theoretical than practical problem in the early days.
By the late 1990s, volumes had increased to the point where the differential mattered. The key point to understand is that the difference compounds with each step up the prefix ladder, because every step multiplies in another factor of 1,000/1,024. For example, the SI kilobyte is nearly 98% of the binary kilobyte, the SI megabyte is under 96% of a binary megabyte, and the SI gigabyte is just over 93% of a binary gigabyte. That meant that a 300 gigabyte hard disk, sized in decimal units, would show as containing only 279 gigabytes on a system that counts in binary units.
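The percentages above are easy to check. The following is a minimal Python sketch of that arithmetic; the function and variable names are illustrative only and do not come from any standard or library.

```python
# Ratio of each decimal (SI) unit to its binary counterpart.
# Names here are hypothetical, chosen only for this illustration.
PREFIXES = ["kilo", "mega", "giga", "tera"]


def si_to_binary_ratio(step: int) -> float:
    """Return 1000**step / 1024**step, where step 1 = kilo, 2 = mega, ..."""
    return (1000 ** step) / (1024 ** step)


def decimal_gb_as_binary_gb(decimal_gb: float) -> float:
    """Convert a capacity quoted in decimal gigabytes to binary gigabytes."""
    return decimal_gb * 1000 ** 3 / 1024 ** 3


for step, name in enumerate(PREFIXES, start=1):
    print(f"{name}byte: {si_to_binary_ratio(step):.2%} of the binary unit")

print(f"300 GB drive = {decimal_gb_as_binary_gb(300):.1f} binary gigabytes")
```

Each step up the ladder multiplies the ratio by another 1,000/1,024, which is why the gap widens from roughly 2% at the kilobyte level to roughly 7% at the gigabyte level.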
Different people were now using different measurements for the same or similar things. Memory makers, for example, used the binary system to calculate memory size. In contrast, hard drive manufacturers used the decimal system to express bytes. Remember CD-ROMs? They were measured using the binary system. Today, DVDs are measured using the decimal system. Computer clock speeds are expressed in kilohertz, which means a thousand hertz. And so on.
From the beginning, the Windows operating system has expressed gigabytes in terms of the binary calculation. Byte counts are converted to gigabytes by dividing by 1,024 at each step, so a gigabyte in Windows is 1,073,741,824 bytes.
But, whoa, it gets trickier. Apple takes a different approach. Like other hardware manufacturers, they report hard drive size using the decimal version of the gigabyte. In earlier versions of the Mac OS, however, disk size was reported using binary gigabytes. That changed with Mac OS 10.6, called Snow Leopard. Now, the OS reports storage capacity based on decimal calculations. For the first time, a 200 gigabyte hard drive will show 200 gigabytes of storage. Go figure.
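To see how the same drive can show two different numbers, here is a small Python sketch that formats a byte count under each convention. It is an illustration only, not how any particular operating system implements its display; the function name `format_size` and the unit labels are assumptions made for the example.

```python
def format_size(num_bytes: int, binary: bool = False) -> str:
    """Format a byte count using decimal (1,000) or binary (1,024) steps.

    Hypothetical helper for illustration; the binary branch uses the IEC
    labels (KiB, MiB, GiB) so the two conventions cannot be confused.
    """
    base = 1024 if binary else 1000
    units = ["B", "KiB", "MiB", "GiB", "TiB"] if binary else ["B", "kB", "MB", "GB", "TB"]
    value = float(num_bytes)
    # Step up one unit at a time until the value fits below the base.
    for unit in units[:-1]:
        if value < base:
            return f"{value:.2f} {unit}"
        value /= base
    return f"{value:.2f} {units[-1]}"


drive = 200_000_000_000  # a drive sold as "200 gigabytes"
print(format_size(drive))               # decimal convention
print(format_size(drive, binary=True))  # binary convention
```

Under the decimal convention the drive reads as 200.00 GB; under the binary convention the same bytes read as 186.26 GiB.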
For what it’s worth, some components of the Linux kernel measure capacity using decimal units as well.
Naturally, consumers and non-geeks managed to get confused by all of this and lawsuits followed. Class actions were brought against Seagate and Western Digital, two of the largest hard drive manufacturers in the world. While they maintained that the decimal divisor was the correct one for measuring disk size, they ended up settling with a refund to the consumer class.
The Move to Standardize
In the mid-1990s, people started suggesting that we standardize on terms and stop this confusion. After a lot of discussion and false starts, the International Union of Pure and Applied Chemistry proposed the use of specific terms for storage values expressed in metric terms. Three more groups, the Institute of Electrical and Electronics Engineers (IEEE), the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), quickly joined the band.
In December 1998, the IEC, one of the leading international standards organizations, came up with new terms for binary multiples in an attempt to distinguish them from the metric terms. They suggested that the proper terms for binary calculations based on 1,024 as the divisor are kibi, mebi, gibi and the like.
This approach was picked up in the United States by the National Institute of Standards and Technology (NIST), which offered the following chart to describe the binary measures.
To ease confusion, here is a chart from Wikipedia showing the relationship between the metric units and the binary ones.
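The relationship between the two sets of prefixes can also be tabulated with a short Python sketch. The prefix names and symbols are the standard SI and IEC ones; the variable and function names are illustrative only.

```python
# (SI name, SI symbol, IEC name, IEC symbol) for each prefix step.
PREFIX_PAIRS = [
    ("kilo", "k", "kibi", "Ki"),
    ("mega", "M", "mebi", "Mi"),
    ("giga", "G", "gibi", "Gi"),
    ("tera", "T", "tebi", "Ti"),
    ("peta", "P", "pebi", "Pi"),
]


def prefix_rows():
    """Build rows pairing each decimal unit with its binary counterpart."""
    rows = []
    for step, (si_name, si_sym, iec_name, iec_sym) in enumerate(PREFIX_PAIRS, start=1):
        decimal_value = 10 ** (3 * step)   # 10^3, 10^6, 10^9, ...
        binary_value = 2 ** (10 * step)    # 2^10, 2^20, 2^30, ...
        rows.append((f"{si_name}byte ({si_sym}B)", decimal_value,
                     f"{iec_name}byte ({iec_sym}B)", binary_value))
    return rows


for si_label, dec, iec_label, bin_value in prefix_rows():
    print(f"{si_label:<16} {dec:>22,}   {iec_label:<16} {bin_value:>24,}")
```

Each row shows a decimal unit and its byte count alongside the binary unit and its byte count, so the widening gap is visible at a glance.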
This movement has gained steam to the point where every major standards body is in agreement that a gigabyte is 1 billion bytes (10^9) and the corresponding gibibyte represents 1,073,741,824 bytes, based on the binary power 2^30. The organizations that accept this include the IEC, IEEE, ISO and NIST, all mentioned above.
Harkening back to our discussion on the International System of Units, the International Bureau of Weights and Measures (BIPM) now expressly prohibits the use of SI prefixes to denote binary multiples. Instead, they suggest adoption of the IEC prefixes for binary units. (See, Le Système international d’unités, page 121.)
I am not aware of any recognized standards organization, except perhaps the Sedona Conference, proposing that the prefixes kilo, mega and giga mean anything other than multiples of 1,000.
So Which Is It: 1,024 or 1,000?
So what is a gigabyte? Is it 1,024 megabytes as many of our techno geeks claim? Or is it 1,000 megabytes? At the least, we now have a basis to address the questions with a little broader perspective.
Maybe the answer is, “It can be whatever you want it to be.” I was the frog footman in our production of Alice in Wonderland back at Kecoughtan High School. I will never forget this dialog between Humpty Dumpty and Alice, excerpted from Through the Looking-Glass, by Lewis Carroll:
“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”
“The question is,” said Alice, “whether you can make words mean so many different things.”
“The question is, which is to be master—that’s all.”
Does that work for e-discovery? I suppose it could if everyone agreed that 1,024 should be the measure. A kilobyte means 1,024 bytes because that is what we chose it to mean–or, more appropriately, because that is what we have been calling it for years.
I have spoken with a number of technical guys I respect about this topic. They are adamant. “A gigabyte is 1,024 megabytes,” they say with fervor. “That’s the way it’s always been.”
Maybe they are right. As one pointed out to me, “Every console and network application out there uses binary multiples. Even Windows shows binary gigabytes.” Another person suggested that file systems store data in blocks that are better tracked in binary multiples. That one flew right by me but some of you may understand it. Others just go off what they learned when they got started.
With all due respect for differing opinions, I side with NIST and the other international standards bodies. The prefix kilo means 1,000 and that is that. It makes no sense to mix and match definitions depending on how the wind is blowing that day. Mega means 1 million and giga means 1 billion.
Certainly disclosure is central to this discussion. At Catalyst, we have followed the definitions used by the SI for as long as I can remember. We disclose that fact prominently on our price sheets and on our support site and explain that it is the accepted international standard. If others use different definitions, that is certainly their prerogative, just as it was for Humpty Dumpty. It is primarily a matter of disclosure but consistency and standards should factor into the discussion as well.
The problem was significant enough to lead the international standards bodies to create new titles for binary multiples: kibibyte, mebibyte and gibibyte. These sound a bit silly, which perhaps caused people not to use them as substitutes for their metric counterparts. Maybe the problem is a matter of familiarity; we just aren’t used to them. I remember seeing the first elevated tail lights in cars and thinking they looked strange. Now, they look quite normal. Perhaps it would be the same for kibibytes and gibibytes. Or perhaps they aren’t needed in the first place.
How many bytes in a gigabyte? The answer seems simple and straightforward to me. There are 1 billion bytes in a gigabyte, 1 million in a megabyte and 1,000 in a kilobyte. Kilo means 1,000 whether measuring bytes, meters or grams. These are metric figures and they should remain constant across the board. It is as simple as that.
About John Tredennick
A nationally known trial lawyer and longtime litigation partner at Holland & Hart, John founded Catalyst in 2000. Over the past four decades he has written or edited eight books and countless articles on legal technology topics, including two American Bar Association best sellers on using computers in litigation technology, a book (supplemented annually) on deposition techniques and several other widely-read books on legal analytics and technology. He served as Chair of the ABA’s Law Practice Section and edited its flagship magazine for six years. John’s legal and technology acumen has earned him numerous awards, including being named by the American Lawyer as one of the top six “E-Discovery Trailblazers,” being named to the FastCase 50 as a legal visionary and being named one of the “Top 100 Global Technology Leaders” by London Citytech magazine. He has also been named the Ernst & Young Entrepreneur of the Year for Technology in the Rocky Mountain Region, and Top Technology Entrepreneur by the Colorado Software and Internet Association. John regularly speaks on legal technology to audiences across the globe. In his spare time, you will find him competing on the national equestrian show jumping circuit or playing drums and singing in a classic rock jam band.