Session 2.1: What Is Structured Data


Session 1 of 10

Search engines read HTML. They parse headings, paragraphs, lists, and links. They extract text, follow anchors, and build an index. But HTML marks up how content is organized for human readers; it does not say what the facts in that content mean. A paragraph that says "Acme Corp was founded in 2003 by Jane Smith in Austin, Texas" contains four distinct facts: an organization name, a founding date, a founder, and a location. A human reader understands that instantly. A search engine has to guess.

That guessing process is called inference. Google's natural language processing is extraordinarily good at it. But inference is probabilistic. It can be wrong. And when it is wrong, the consequences compound: your entity gets confused with another, your Knowledge Panel shows incorrect information, or your business simply never appears for the queries that matter most.

Structured data eliminates the guessing. Instead of hoping that Google correctly infers "Jane Smith founded Acme Corp," you declare it explicitly in a machine-readable format. That declaration is not a suggestion. It is a statement of fact, encoded in a vocabulary that search engines have agreed to understand.

Unstructured vs. Structured Data

The difference is not about content quality. A beautifully written about page with rich detail about your company is still unstructured data from a machine's perspective. The machine has to parse natural language, resolve ambiguity, and infer relationships. Structured data skips all of that.

graph LR
    A["Web Page<br/>(HTML text)"] -->|NLP Inference| B["Google's Understanding<br/>(probabilistic)"]
    A -->|Structured Data| C["Explicit Declaration<br/>(deterministic)"]
    B -->|"May be correct"| D["Entity Graph"]
    C -->|"Is correct"| D
    style A fill:#222221,stroke:#c8a882,color:#ede9e3
    style B fill:#222221,stroke:#8a8478,color:#ede9e3
    style C fill:#222221,stroke:#6b8f71,color:#ede9e3
    style D fill:#222221,stroke:#c8a882,color:#ede9e3

The diagram above shows the two paths from your web page to Google's entity graph. The top path relies on inference. The bottom path uses structured data. Both can arrive at the same destination, but the structured data path is deterministic. You control what gets understood.
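To make the two paths concrete, here is a minimal Python sketch. The regex "inference" is a deliberately naive stand-in (an assumption for illustration, nothing like Google's actual NLP), and the names PATTERN, infer_facts, and DECLARED are hypothetical. It only shows that a guess depends on wording, while a declaration is a direct lookup.

```python
import json
import re

SENTENCE = "Acme Corp was founded in 2003 by Jane Smith in Austin, Texas"

# Hypothetical, deliberately naive "inference": guess the facts from word order.
PATTERN = re.compile(
    r"(?P<org>.+?) was founded in (?P<year>\d{4}) "
    r"by (?P<founder>.+?) in (?P<place>.+)"
)


def infer_facts(text):
    """Return the guessed facts, or None when the wording does not match."""
    match = PATTERN.search(text)
    return match.groupdict() if match else None


# The same facts, declared explicitly as JSON-LD.
DECLARED = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme Corp",
  "founder": {"@type": "Person", "name": "Jane Smith"},
  "foundingDate": "2003"
}
""")

REWORDED = "Jane Smith started Acme Corp in Austin back in 2003"

print(infer_facts(SENTENCE))         # the guess succeeds for this exact phrasing
print(infer_facts(REWORDED))         # None: same facts, different wording
print(DECLARED["founder"]["name"])   # Jane Smith, regardless of page wording
```

The reworded sentence states the same facts, yet the naive extractor returns nothing. The declared version is unaffected by how the visible copy is phrased.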

What Structured Data Actually Looks Like

There are three formats for adding structured data to a web page: Microdata (HTML attributes), RDFa (another set of HTML attributes), and JSON-LD (a JSON object embedded in a script tag). Google recommends JSON-LD. It is the cleanest of the three, the easiest to maintain, and it does not require modifying your visible HTML.

Here is a minimal example:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme Corp",
  "founder": {
    "@type": "Person",
    "name": "Jane Smith"
  },
  "foundingDate": "2003",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Austin",
    "addressRegion": "TX"
  }
}
</script>

That block sits in the <head> or <body> of your HTML. It is invisible to users. But it tells Google, explicitly, that Acme Corp is an Organization, founded in 2003 by Jane Smith, located in Austin, TX. No inference required.
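Because the block is plain JSON inside a script tag, any program can read it back without interpreting the surrounding prose. The sketch below uses only Python's standard library; the class name JsonLdExtractor, the helper extract_jsonld, and the sample page are assumptions made for illustration, not part of any official tooling.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.blocks


# Hypothetical sample page that reuses the Acme Corp example.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Acme Corp", "foundingDate": "2003",
 "founder": {"@type": "Person", "name": "Jane Smith"}}
</script>
</head><body><p>Welcome to Acme Corp.</p></body></html>
"""

blocks = extract_jsonld(SAMPLE_HTML)
print(blocks[0]["@type"], blocks[0]["name"], blocks[0]["founder"]["name"])
```

Note that the extractor never looks at the paragraph text. Every fact it returns comes from the declaration.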

Declaration vs. Inference: Why Both Matter

Structured data does not replace good content. It supplements it. Google uses both signals. If your structured data says one thing and your page content says another, Google has little reason to trust either. The two must align.

Aspect	Inference (from HTML)	Declaration (JSON-LD)
Format	Natural language in HTML	Machine-readable JSON
Accuracy	Probabilistic	Deterministic
Control	Limited	Full
Maintenance	Edit page content	Edit JSON block
Rich results eligible	Rarely	Yes, when valid
Entity graph impact	Indirect	Direct

Think of it this way: your page content is your argument. Your structured data is your sworn testimony. Both are evidence. But testimony, given in a structured format the court (Google) expects, carries more procedural weight.
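The alignment requirement in this section can be spot-checked mechanically. The following is a rough sketch, not how Google evaluates consistency: it assumes a simple case-insensitive substring match, and the function names iter_string_values and values_missing_from_page are hypothetical.

```python
def iter_string_values(node, path=""):
    """Yield (path, value) for every string value, skipping @context and @type."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("@"):
                continue
            child_path = f"{path}.{key}" if path else key
            yield from iter_string_values(value, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_string_values(item, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node


def values_missing_from_page(jsonld, visible_text):
    """Return declared values that do not appear verbatim in the visible text."""
    haystack = visible_text.casefold()
    return [
        (path, value)
        for path, value in iter_string_values(jsonld)
        if value.casefold() not in haystack
    ]


VISIBLE_TEXT = "Acme Corp was founded in 2003 by Jane Smith in Austin, Texas."

JSONLD = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Acme Corp",
    "founder": {"@type": "Person", "name": "Jane Smith"},
    "foundingDate": "2003",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Austin",
        "addressRegion": "TX",
    },
}

print(values_missing_from_page(JSONLD, VISIBLE_TEXT))
# [('address.addressRegion', 'TX')]
```

The flagged value is not a contradiction: the page says "Texas" where the markup says "TX". A check this simple only produces a list for a human to review, which is usually enough to catch markup that has drifted away from the page copy.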

The Schema.org Vocabulary

Structured data needs a shared vocabulary. If every website invented its own property names, search engines would be back to guessing. That shared vocabulary is Schema.org, a collaborative project launched in 2011 by Google, Microsoft (Bing), and Yahoo, with Yandex joining later that year.

Schema.org defines types (Organization, Person, Product, Event, etc.) and properties (name, url, founder, address, etc.). When you write "@type": "Organization", you are using Schema.org's vocabulary. When you write "founder", you are using a property that Schema.org has defined for the Organization type.

The vocabulary is large. Schema.org defines over 800 types and well over a thousand properties. You will not use most of them. For entity authority work, the types that matter most are Organization, Person, LocalBusiness, Product, Service, FAQPage, and HowTo. We will cover each in dedicated sessions.
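A shared vocabulary only helps if you spell its terms exactly. As an illustration, the sketch below checks a JSON-LD object against a tiny hand-copied subset of Schema.org. KNOWN_PROPERTIES is an assumption made for this example and covers only the properties used in this session. It is not the real vocabulary, and the function name unknown_properties is hypothetical.

```python
# Hypothetical, hand-copied subset of Schema.org covering only this session's
# example. The real vocabulary is far larger.
KNOWN_PROPERTIES = {
    "Organization": {"name", "url", "founder", "foundingDate", "address"},
    "Person": {"name"},
    "PostalAddress": {"addressLocality", "addressRegion"},
}


def unknown_properties(node):
    """Return (type, property) pairs that are not in the subset above."""
    problems = []
    if isinstance(node, dict):
        node_type = node.get("@type")
        allowed = KNOWN_PROPERTIES.get(node_type, set())
        for key, value in node.items():
            if key.startswith("@"):
                continue  # @context and @type are JSON-LD keywords, not properties
            if key not in allowed:
                problems.append((node_type, key))
            problems.extend(unknown_properties(value))
    elif isinstance(node, list):
        for item in node:
            problems.extend(unknown_properties(item))
    return problems


MARKUP_WITH_TYPO = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Acme Corp",
    "foundedDate": "2003",  # typo: the Schema.org property is foundingDate
    "founder": {"@type": "Person", "name": "Jane Smith"},
}

print(unknown_properties(MARKUP_WITH_TYPO))
# [('Organization', 'foundedDate')]
```

A misspelled property is still valid JSON, so nothing breaks visibly. The fact is simply never understood, which is why validating against the vocabulary matters.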

Key concept: Structured data is not metadata. Metadata describes the page. Structured data describes the entity. Your meta description tells Google what the page is about. Your JSON-LD tells Google what the entity IS.

Who Benefits From Structured Data

Every entity that wants to be correctly understood by machines. That includes businesses, people, products, events, recipes, articles, and more. But for this course, we focus on the entity authority use case: establishing your organization or personal brand as a recognized, disambiguated entity in Google's Knowledge Graph.

If you completed Module 1's NAP audit, you already have the raw data. Structured data is how you formalize that data in a way machines can consume without ambiguity.
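As a sketch of that formalization step, the function below turns a NAP record into a JSON-LD script tag. The record's field names, the sample street address and phone number, and the function name nap_to_jsonld are all hypothetical placeholders; adapt them to however you stored your audit data.

```python
import json


def nap_to_jsonld(nap):
    """Build an Organization JSON-LD script tag from a name/address/phone record."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": nap["name"],
        "telephone": nap["phone"],
        "address": {
            "@type": "PostalAddress",
            "streetAddress": nap["street"],
            "addressLocality": nap["city"],
            "addressRegion": nap["region"],
            "postalCode": nap["postal_code"],
        },
    }
    # Escape "</" so a stray "</script>" inside a value cannot end the tag early.
    # "\/" is a legal JSON escape for "/", so the output remains valid JSON.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values for illustration only.
SAMPLE_NAP = {
    "name": "Acme Corp",
    "street": "100 Example Street",
    "city": "Austin",
    "region": "TX",
    "postal_code": "78701",
    "phone": "+1-512-555-0100",
}

print(nap_to_jsonld(SAMPLE_NAP))
```

Generating the block from one source record keeps the markup consistent with the NAP data you publish elsewhere, which is the point of the audit.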

What Structured Data Cannot Do

It cannot fix bad content. It cannot create authority where none exists. It cannot trick Google into displaying a Knowledge Panel for an entity that has no real-world presence. Google explicitly states that structured data must reflect the content visible on the page. Misleading structured data can result in manual actions (penalties).

It also cannot guarantee rich results. Google uses structured data as an input, not a command. Having valid FAQ schema does not mean Google will show FAQ rich results for your page. It means your page is eligible. The decision is Google's.

Assignment

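Pick any business website (yours or a competitor's). View the page source and search for application/ld+json to find JSON-LD blocks. To spot the other two formats, also search for itemscope or itemtype (Microdata) and for typeof or vocab (RDFa).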

1. Does the site use structured data? If so, which format (JSON-LD, Microdata, RDFa)?
2. What types are declared (Organization, LocalBusiness, Product, etc.)?
3. List three facts about the entity that are explicitly declared in the structured data.
4. List three facts that are present in the visible page content but NOT in the structured data. These represent missed opportunities.
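If you would rather script the first question than search the source by hand, the sketch below is one possible heuristic. It assumes that the presence of certain attributes is enough to identify each format, which can misfire on unusual pages. The names FormatDetector, detect_formats, and PAGE are hypothetical.

```python
from html.parser import HTMLParser


class FormatDetector(HTMLParser):
    """Heuristically records which structured data formats a page uses."""

    def __init__(self):
        super().__init__()
        self.formats = set()

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.formats.add("JSON-LD")
        if names & {"itemscope", "itemtype", "itemprop"}:
            self.formats.add("Microdata")
        if names & {"typeof", "vocab"}:
            self.formats.add("RDFa")


def detect_formats(html):
    detector = FormatDetector()
    detector.feed(html)
    return detector.formats


# Hypothetical page that mixes JSON-LD and Microdata.
PAGE = """
<html><head>
<script type="application/ld+json">{"@type": "Organization"}</script>
</head><body>
<div itemscope itemtype="https://schema.org/Organization">
  <span itemprop="name">Acme Corp</span>
</div>
</body></html>
"""

print(sorted(detect_formats(PAGE)))
# ['JSON-LD', 'Microdata']
```

Paste a saved copy of the page source in place of PAGE to run it against the site you picked. The remaining questions still require reading the markup yourself.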
