How best to structure product data and metadata is a challenge facing many companies today. Take, for example, “Memory Organization,” a complex attribute of memory ICs. Memory cells are organized into rows and columns, and a single chip can support two different organizations of its cells. The attribute can therefore have two values.
Memory Organization: 256 K x 16 bit; 512 K x 8 bit
What is the optimal architecture for a complex attribute like the one above? The old adage of “Keep It Simple” persists and leads many to capture “Memory Organization” in one text field.
Solution A:
Memory Organization: 256 K x 16 bit; 512 K x 8 bit
Solution A may simplify the initial process of data modeling and capture, but over time it proves inflexible and virtually guarantees inconsistent and unnormalized data. Manufacturers detail specification information in different ways, and Solution A does not provide a structure for how the information should be captured and stored. Additionally, because all of the elements of the attribute are captured in one field, any manipulation of the data has to take place manually.
An alternative model might look like the following:
Solution B:
Memory Organization No. of Units 1: 256 K
Memory Organization Unit Size 1: 16 bit
Memory Organization No. of Units 2: 512 K
Memory Organization Unit Size 2: 8 bit
(Concatenated) Memory Organization: 256 K x 16 bit; 512 K x 8 bit
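As a minimal sketch of how the Solution B fields could be represented, the Python below stores each element as a value plus a unit of measure and derives the concatenated display string from them. The class and function names (`Quantity`, `MemoryOrganization`, `concatenate`) are hypothetical and chosen only for illustration; they do not describe any particular product data system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """One element of an attribute value: a number and its unit of measure."""
    value: int
    unit: str  # e.g. "K" or "bit"

    def __str__(self) -> str:
        return f"{self.value} {self.unit}"


@dataclass(frozen=True)
class MemoryOrganization:
    """One attribute value, composed of two elements (hypothetical names)."""
    no_of_units: Quantity
    unit_size: Quantity

    def __str__(self) -> str:
        return f"{self.no_of_units} x {self.unit_size}"


def concatenate(organizations) -> str:
    """Build the display string from the structured elements."""
    return "; ".join(str(org) for org in organizations)


# The two organizations from the example above.
organizations = [
    MemoryOrganization(Quantity(256, "K"), Quantity(16, "bit")),
    MemoryOrganization(Quantity(512, "K"), Quantity(8, "bit")),
]

display_value = concatenate(organizations)
print(display_value)  # 256 K x 16 bit; 512 K x 8 bit
```

In this sketch the concatenated string is generated from the elements rather than typed by hand, so it cannot drift out of step with them.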
Solution B treats the attribute as having two values, each composed of two elements: number of units and unit size. It is the superior solution for a number of critical reasons:
1. Consistency: Name, value and unit of measure are broken out for each element of each attribute value. Limiting what is found in each field ensures consistency. Where appropriate, restricted values can be defined for the value and unit of measure fields to further promote normalized data that can be effectively maintained over time.
2. Search: Each “Memory Organization” can appear separately in the dropdown list of a faceted search menu, reducing the number of unique values a customer must sort through. As search technologies advance and navigation attributes employ searchable textboxes, a customer will later be able to search on the element of the attribute that is of most interest. Perhaps the customer is concerned with finding a 32-bit organization and less worried about the number of units. Solution B enables more effective search for both internal and external users and can easily feed increasingly sophisticated systems.
3. Flexibility: Solution B is a flexible structure that can be manipulated in a number of additional ways. Conversion and other mathematical operations on values and units are possible. Elements can be added or removed as the data needs change. Perhaps the test conditions of a particular attribute value were captured, as in “120 V at 50 A,” and it is later determined that only the voltage is of interest. If the entire string were captured in a single text field, there would be no simple way of deleting the “Test Condition.” If, however, the model parses out each element of the attribute value into “Voltage” and “Test Condition,” the latter is easily deleted or altered across an entire data set.
4. Integration: A data model that fully defines all of its constituent elements and their relation to one another is a model that can more easily integrate with other systems both internal and external to the organization.
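The search and flexibility points above can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the part identifiers and the second part's values are invented, the field names are hypothetical, and the "K" prefix is taken to mean 1,024, as is common for memory devices.

```python
# Hypothetical catalog records; part identifiers are invented for illustration.
parts = [
    {"part": "PART-A", "organizations": [
        {"no_of_units": 256, "prefix": "K", "unit_size": 16, "unit": "bit"},
        {"no_of_units": 512, "prefix": "K", "unit_size": 8, "unit": "bit"},
    ]},
    {"part": "PART-B", "organizations": [
        {"no_of_units": 128, "prefix": "K", "unit_size": 32, "unit": "bit"},
    ]},
]

# Assumption: binary prefixes (K = 1,024), the usual convention for memory.
PREFIX_MULTIPLIER = {"K": 1024, "M": 1024 ** 2}


def find_by_unit_size(parts, bits):
    """Search: return parts offering at least one organization of this width."""
    return [
        p["part"] for p in parts
        if any(org["unit_size"] == bits for org in p["organizations"])
    ]


def total_bits(org):
    """Flexibility: arithmetic on elements is possible once they are separate."""
    return org["no_of_units"] * PREFIX_MULTIPLIER[org["prefix"]] * org["unit_size"]


# Flexibility: drop one element ("Test Condition") across a whole data set.
readings = [{"voltage": "120 V", "test_condition": "50 A"}]
voltage_only = [
    {name: value for name, value in r.items() if name != "test_condition"}
    for r in readings
]

print(find_by_unit_size(parts, 32))
print([total_bits(org) for org in parts[0]["organizations"]])
print(voltage_only)
```

Under the stated prefix assumption, both organizations of the first part come to the same total of 4,194,304 bits. Solution A's single text field would not allow that check without first parsing the string.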
Gina Bulatovic
Sr. Data Solutions Consultant