What Are Data Objects?

Data Objects

Data sets are made up of data objects. A data object represents an entity.

Examples:

  • sales database: customers, store items, sales
  • medical database: patients, treatments
  • university database: students, professors, courses

Data objects are also called samples, examples, instances, data points, objects, or tuples. Data objects are described by attributes.

Database rows -> data objects; columns -> attributes.

Attribute (also called a dimension, feature, or variable): a data field representing a characteristic or feature of a data object.
E.g., customer_ID, name, address
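
The row/column mapping can be sketched in a few lines of Python. This is only an illustration: the table contents are made up, and only the attribute names customer_ID, name, and address come from the text above.

```python
# Hypothetical customer table: each dict is one data object (a row),
# and each key is one attribute (a column).
customers = [
    {"customer_ID": 1, "name": "Ada", "address": "12 Elm St"},
    {"customer_ID": 2, "name": "Bao", "address": "7 Oak Ave"},
]

# The attributes are the keys shared by every data object.
attributes = list(customers[0].keys())

# One attribute viewed across all data objects corresponds to one column.
names = [row["name"] for row in customers]

print(attributes)  # ['customer_ID', 'name', 'address']
print(names)       # ['Ada', 'Bao']
```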


Attribute Types:

  • Nominal
  • Binary
  • Ordinal
  • Numeric: quantitative
    • Interval-scaled
    • Ratio-scaled

Nominal: categories, states, or “names of things”

  • Hair_color = {auburn, black, blond, brown, grey, red, white}
  • marital status, occupation, ID numbers, zip codes

Binary

  • Nominal attribute with only 2 states (0 and 1)
    • Symmetric binary: both outcomes equally important
      • e.g., gender
    • Asymmetric binary: outcomes not equally important.
      • e.g., medical test (positive vs. negative)
      • Convention: assign 1 to most important outcome (e.g., HIV positive)
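
A minimal sketch of the coding convention for an asymmetric binary attribute, assuming a made-up list of test results (the labels and values are hypothetical):

```python
# Hypothetical test results for five patients.
results = ["negative", "positive", "negative", "negative", "positive"]

# Asymmetric binary attribute: by convention the more important
# outcome ("positive") is coded as 1 and the other outcome as 0.
encoded = [1 if r == "positive" else 0 for r in results]

print(encoded)       # [0, 1, 0, 0, 1]
print(sum(encoded))  # 2, the number of positive results
```

With this coding, summing the attribute counts the important outcome directly.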

Ordinal

  • Values have a meaningful order (ranking) but magnitude between successive values is not known.
  • Size = {small, medium, large}, grades, army rankings

Numeric

  • Quantity (integer or real-valued), either interval-scaled or ratio-scaled

Interval

  • Measured on a scale of equal-sized units
  • Values have order
    • E.g., temperature in °C or °F, calendar dates
  • No true zero-point

Ratio

  • Inherent zero-point
  • We can speak of one value as a multiple of another (10 K is twice as high as 5 K).
    • E.g., temperature in Kelvin, length, counts, monetary quantities
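
The difference between the scales comes down to which operations give meaningful results. The sketch below walks through one example per scale; the variable names, the explicit size ranking, and the sample values are assumptions made for illustration.

```python
# Nominal: only equality tests are meaningful.
hair_a, hair_b = "black", "brown"
same_hair = hair_a == hair_b  # False

# Ordinal: order is meaningful, but the gap between levels is not.
# The explicit ranking list is an assumption made for this example.
size_order = ["small", "medium", "large"]

def size_rank(size):
    """Return the position of a size label in the assumed ranking."""
    return size_order.index(size)

medium_before_large = size_rank("medium") < size_rank("large")  # True

# Interval: differences are meaningful, ratios are not.
# 20 °C is 10 degrees warmer than 10 °C, but it is not "twice as hot",
# because 0 °C is not a true zero point.
celsius_low, celsius_high = 10.0, 20.0
difference = celsius_high - celsius_low  # 10.0

# Ratio: Kelvin has a true zero point, so ratios are meaningful.
def to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

kelvin_ratio = to_kelvin(celsius_high) / to_kelvin(celsius_low)
kelvin_pair_ratio = 10.0 / 5.0  # 10 K really is twice 5 K

print(same_hair, medium_before_large, difference)  # False True 10.0
print(round(kelvin_ratio, 3))                      # 1.035, not 2
print(kelvin_pair_ratio)                           # 2.0
```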

Discrete vs. Continuous Attributes

Discrete Attribute

  • Has only a finite or countably infinite set of values
    • E.g., zip codes, profession, or the set of words in a collection of documents
  • Sometimes, represented as integer variables
  • Note: Binary attributes are a special case of discrete attributes

Continuous Attribute

  • Has real numbers as attribute values
    • E.g., temperature, height, or weight
  • Practically, real values can only be measured and represented using a finite number of digits
  • Continuous attributes are typically represented as floating-point variables
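
The point about finite digits can be seen directly in Python, whose float type is a binary floating-point number. This is a small sketch; the sample values are hypothetical.

```python
import math

# Continuous values are stored with finite precision, so exact
# equality tests on floats can fail in surprising ways.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True

# Typical representations (sample values are made up):
zip_code = "90210"  # discrete: a label, stored as text
word_count = 42     # discrete: stored as an integer
height_m = 1.75     # continuous: stored as a float
```

Comparing continuous attributes with a tolerance, as math.isclose does, is usually safer than testing for exact equality.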

Types of Data Sets

Record

  • Relational records
  • Data matrix, e.g., numerical matrix, crosstabs
  • Document data: text documents represented as term-frequency vectors
  • Transaction data
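
As an illustration of document data, the sketch below turns two short documents into term-frequency vectors. The documents are made up, and splitting on whitespace is a simplifying assumption; real text usually needs more careful tokenization.

```python
from collections import Counter

# Hypothetical two-document collection.
documents = [
    "data mining finds patterns in data",
    "graph mining finds patterns in graphs",
]

# Simplifying assumption: tokens are separated by whitespace.
tokenized = [doc.split() for doc in documents]

# The vocabulary fixes the order of the vector components.
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# One term-frequency vector per document: component i counts how often
# vocabulary[i] occurs in that document.
vectors = []
for tokens in tokenized:
    counts = Counter(tokens)
    vectors.append([counts[word] for word in vocabulary])

print(vocabulary)
# ['data', 'finds', 'graph', 'graphs', 'in', 'mining', 'patterns']
print(vectors[0])  # [2, 1, 0, 0, 1, 1, 1]
print(vectors[1])  # [0, 1, 1, 1, 1, 1, 1]
```

Stacking these vectors row by row gives a data matrix in which each document is a data object and each vocabulary term is an attribute.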

Graph and network

  • World Wide Web
  • Social or information networks
  • Molecular structures
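
Graph data is often stored as an adjacency list. The following is a minimal sketch for a made-up, undirected social network; the node names and edges are hypothetical.

```python
# Hypothetical friendship network given as a list of undirected edges.
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]

# Adjacency list: map each node to the set of its neighbours.
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

# The degree of a node is its number of neighbours.
degree = {node: len(neighbours) for node, neighbours in graph.items()}

print(sorted(graph["cat"]))  # ['ann', 'bob', 'dan']
print(degree["cat"])         # 3
```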

Ordered

  • Video data: sequence of images
  • Temporal data: time-series
  • Sequential data: transaction sequences
  • Genetic sequence data
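
In ordered data, the position of each record carries information. A small sketch with a made-up time series, assuming one reading per day:

```python
from datetime import date

# Hypothetical daily readings, deliberately listed out of order.
readings = [
    (date(2024, 1, 3), 12.0),
    (date(2024, 1, 1), 10.0),
    (date(2024, 1, 2), 11.5),
]

# Tuples sort by their first element, so this orders the series by date.
series = sorted(readings)

# Day-to-day changes only make sense once the order is fixed.
changes = [later[1] - earlier[1] for earlier, later in zip(series, series[1:])]

print([d.isoformat() for d, _ in series])
# ['2024-01-01', '2024-01-02', '2024-01-03']
print(changes)  # [1.5, 0.5]
```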

Spatial, image, and multimedia

  • Spatial data: maps
  • Image data
  • Video data