URL redirects
Status: Planned
Tasks: #78–81 in PROGRESS.md
Tier: 3 (Compliance & quality; SEO dependency)
Goal
Preserve link equity and customer experience when product URLs change or products are removed. Automatically handle the most common cases, use analytics data to identify what actually matters, and surface anything ambiguous for admin review.
Why it matters
Product slugs in Berrypod are generated from product titles via Slug.slugify(title). When a provider renames a product, the next sync generates a new slug and the old URL becomes a 404. These old URLs may be:
- Indexed by Google (losing SEO rank)
- Shared on social media, in emails, in newsletters
- Bookmarked by returning customers
Most redirect implementations just provide a manual table. The insight here is that we already have analytics data recording which paths have had real human traffic — so we can separate 404s that matter (broken real URLs) from noise (bot scanners, /wp-admin probes, etc.) without any manual work.
Three layers
Layer 1: Automatic redirect creation on slug change
The most common case. When a product's title changes during sync, the slug changes, and the old /products/old-slug URL breaks. We detect this automatically in upsert_product/2.
Hook point: lib/berrypod/products.ex:421–425 — the product -> branch in upsert_product/2 where update_product(product, attrs) is called. At this point we have product.slug (old) and can compute the new slug from attrs[:title].
product ->
  old_slug = product.slug

  case update_product(product, attrs) do
    {:ok, updated_product} ->
      # Compare against the persisted slug, so no separately computed
      # new_slug variable is needed here.
      if old_slug != updated_product.slug do
        Redirects.create_auto(%{
          from_path: "/products/#{old_slug}",
          to_path: "/products/#{updated_product.slug}",
          source: :auto_slug_change
        })
      end

      {:ok, updated_product, :updated}

    error ->
      error
  end
create_auto/1 uses on_conflict: :nothing on the from_path unique index — safe to call repeatedly if sync runs multiple times.
Layer 2: A redirects table checked early in the Plug pipeline
One table, one Plug, all redirect types flow through the same path.
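Plug position: Added to the :browser pipeline in router.ex. The plug looks up the request path, responds with the stored status code (301 by default) and halts if a redirect exists, and otherwise passes the connection through. Phoenix runs pipeline plugs only after a route has matched, so this catches stale /products/:slug and /collections/:slug URLs; redirecting a path that matches no route at all would require placing the plug in the endpoint instead.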
# router.ex
pipeline :browser do
  ...
  plug BerrypodWeb.Plugs.Redirects
  ...
end

defmodule BerrypodWeb.Plugs.Redirects do
  @behaviour Plug

  import Plug.Conn
  alias Berrypod.Redirects

  @impl true
  def init(opts), do: opts

  @impl true
  def call(%{request_path: path} = conn, _opts) do
    case Redirects.lookup(path) do
      {:ok, redirect} ->
        Redirects.increment_hit_count(redirect)

        conn
        |> put_resp_header("location", redirect.to_path)
        |> send_resp(redirect.status_code, "")
        |> halt()

      :not_found ->
        conn
    end
  end
end
Caching: The redirect lookup is on the hot path for every request. Use ETS for an in-memory cache, populated on app start and invalidated on any redirect create/update/delete.
# On app start, load all redirects into ETS
Redirects.warm_cache()
# On redirect change, invalidate
Redirects.invalidate_cache(from_path)
The ETS table maps from_path (binary) → {to_path, status_code}. Cache miss falls through to DB. Given redirects are rare and mostly set-and-forget, the cache hit rate should be near 100% after warmup.
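The cache shape described above can be sketched with nothing but the standard library. This is a minimal illustration, not the planned Redirects module: the module name RedirectCacheSketch, the table name, and the choice to pass rows in as tuples are all assumptions. Where this sketch returns :not_found, the real lookup would fall through to the database.

```elixir
defmodule RedirectCacheSketch do
  # Hypothetical table name, used only for this sketch.
  @table :redirect_cache_sketch

  # Create the table if needed, then load rows shaped as
  # {from_path, to_path, status_code}.
  def warm(rows) do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
    end

    :ets.delete_all_objects(@table)
    :ets.insert(@table, Enum.map(rows, fn {from, to, status} -> {from, {to, status}} end))
    :ok
  end

  # Cache hit returns the destination and status code; a miss returns :not_found.
  def lookup(path) do
    case :ets.lookup(@table, path) do
      [{^path, {to_path, status_code}}] -> {:ok, %{to_path: to_path, status_code: status_code}}
      [] -> :not_found
    end
  end

  # Drop a single entry after a redirect is created, updated, or deleted.
  def invalidate(from_path) do
    :ets.delete(@table, from_path)
    :ok
  end
end
```

An ETS table is destroyed when its owning process exits, so in the application the table would need to be created by a long-lived process (for example one started under the supervision tree) rather than by whichever process first calls warm/1.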
Layer 3: Analytics-powered 404 monitoring
When a 404 fires, most hits are bots and scanners. The signal that distinguishes a real broken URL from noise is analytics history: if a path appears in events with prior real pageviews, it was a genuine product page.
404 handler hook: The existing error.ex LiveView renders 404s. Add a side-effect: when a 404 fires on a path matching /products/:slug or /collections/:slug, query analytics and potentially auto-resolve.
defp maybe_log_broken_url(path) do
  prior_hits = Analytics.count_pageviews_for_path(path)

  if prior_hits > 0 do
    BrokenUrls.record(%{
      path: path,
      prior_analytics_hits: prior_hits
    })

    attempt_auto_resolve(path, prior_hits)
  end
end
Auto-resolution attempt:
For /products/:slug 404s, extract the slug and run it through the FTS5 search index to find the most likely current product:
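defp attempt_auto_resolve("/products/" <> old_slug, _hits) do
  query = String.replace(old_slug, "-", " ")

  case Search.search_products(query, limit: 1) do
    # Raw FTS5 BM25 scores are negative and more negative is better, so a
    # confident match scores below the threshold (assumes score is the raw value).
    [%{score: score, slug: new_slug}] when score < @confidence_threshold ->
      Redirects.create_auto(%{
        from_path: "/products/#{old_slug}",
        to_path: "/products/#{new_slug}",
        source: :analytics_detected,
        confidence: score
      })

    _ ->
      # No confident match - leave in broken_urls for admin review
      :ok
  end
end

# Other paths (e.g. /collections/:slug) have no auto-resolution yet
defp attempt_auto_resolve(_path, _hits), do: :ok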
The @confidence_threshold needs tuning — FTS5 BM25 scores are negative (more negative = better match). Start conservative; it's better to leave something for manual review than to auto-redirect to the wrong product.
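Because raw BM25 scores are negative and lower is better, a confidence check on raw scores has to compare with "less than", which is easy to get backwards. A minimal sketch of that check follows; the module name and the default threshold of -5.0 are placeholders for illustration, not tuned values, and the sketch assumes the search layer returns the raw FTS5 score rather than a normalised one.

```elixir
defmodule ConfidenceSketch do
  # Returns true when a raw BM25 score is a strong enough match.
  # The default threshold is a hypothetical starting point, not a tuned value.
  def confident?(score, threshold \\ -5.0) when is_number(score) do
    score < threshold
  end
end
```

Making the threshold more negative makes auto-resolution more conservative, which matches the advice to start strict and loosen later.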
For deleted products with no match, the redirect target defaults to the product's last known category collection page if that's inferable (from the path or broken_url record), otherwise falls back to /.
Schemas
redirects table
create table(:redirects, primary_key: false) do
  add :id, :binary_id, primary_key: true
  add :from_path, :string, null: false       # "/products/old-classic-tee"
  add :to_path, :string, null: false         # "/products/classic-tee-v2" or "/"
  add :status_code, :integer, default: 301   # 301 permanent, 302 temporary
  add :source, :string, null: false          # "auto_slug_change" | "analytics_detected" | "admin"
  add :confidence, :float                    # FTS5 match score for analytics_detected, nil otherwise
  add :hit_count, :integer, default: 0       # incremented each time this redirect fires
  timestamps()
end

create unique_index(:redirects, [:from_path])
create index(:redirects, [:source])
broken_urls table
create table(:broken_urls, primary_key: false) do
  add :id, :binary_id, primary_key: true
  add :path, :string, null: false
  add :prior_analytics_hits, :integer, default: 0   # pageviews before the 404 started
  add :recent_404_count, :integer, default: 1       # 404s since it broke
  add :first_seen_at, :utc_datetime, null: false
  add :last_seen_at, :utc_datetime, null: false
  add :status, :string, default: "pending"          # "pending" | "resolved" | "ignored"
  add :resolved_redirect_id, :binary_id             # FK to redirects when resolved
  timestamps()
end

create unique_index(:broken_urls, [:path])
create index(:broken_urls, [:status])
create index(:broken_urls, [:prior_analytics_hits])  # sort by impact
Admin UI
Route: /admin/redirects
Tab 1: Active redirects
Table of all redirects with columns: from path, to path, source (badge: auto/detected/manual), hit count, created at. Delete button to remove. Edit to change destination.
Sources:
- auto_slug_change: created automatically when sync detected a slug change. Trust these.
- analytics_detected: created from analytics + FTS5 match. Show confidence score. Worth reviewing.
- admin: manually created.
Tab 2: Broken URLs (pending review)
Table sorted by prior_analytics_hits descending — highest impact broken URLs at the top.
Columns: path, prior traffic (from analytics), 404s since breaking, first seen.
Each row has a quick action: enter a redirect destination and save, or mark as ignored (e.g. it's a legitimate 404 from a product intentionally removed).
Pre-filled suggestion from FTS5 search (same logic as auto-resolution, just surfaced for human confirmation rather than applied automatically).
Tab 3: Dead links
See below — dead link monitoring surfaces here alongside redirects, since they're two sides of the same problem.
Tab 4: Create redirect
Simple form: from path, to path, status code (301/302). For manual one-off redirects (external links, social posts, etc.).
Data flow
Provider renames product
↓
ProductSyncWorker → upsert_product/2
↓
old_slug != new_slug detected
↓
Redirects.create_auto({from: /products/old, to: /products/new})
→ ETS cache invalidated
─────
Customer visits /products/old-slug
↓
BerrypodWeb.Plugs.Redirects checks ETS cache
↓ hit
301 → /products/new-slug
hit_count incremented
─────
Bot/customer visits an unknown broken URL
↓
Plug: no redirect found → pass through
↓
Router: no match → 404 LiveView
↓
Analytics.count_pageviews_for_path(path)
↓
0 hits → likely a bot, discard silently
> 0 hits → real broken URL
↓
BrokenUrls.record(path, prior_hits)
↓
Attempt FTS5 auto-resolve
↓ confident match
Redirects.create_auto({..., source: :analytics_detected})
↓ no match
Left in broken_urls for admin review
─────
Admin opens /admin/redirects → broken URLs tab
↓
Sees sorted list of broken URLs by prior traffic
↓
Enters destination → creates redirect
↓
ETS cache warmed → Plug now catches future requests
Dead link monitoring
Redirects fix incoming broken URLs. Dead link monitoring fixes outgoing broken links in your own content — nav links, footer links, social URLs, announcement bar targets, rich text content, product descriptions. Two sides of the same problem.
Why Berrypod can do this better than external tools
External link checkers (Ahrefs, Screaming Frog, etc.) crawl your site periodically from the outside. They can't know why a link broke or when it's about to break. Berrypod knows:
- Exactly which URLs are valid (it owns the router and the DB)
- When products are deleted or renamed (sync events)
- Where every admin-configured link is stored (settings keys)
This means internal links can be validated instantly and without any HTTP request — just check the router and DB. External links need an async HTTP HEAD check via Oban.
Sources of links in Berrypod
| Source | Type | When to check |
|---|---|---|
| Nav/footer links (settings) | Internal or external | On save + when referenced product changes |
| Social links (settings) | External | On save + weekly Oban job |
| Announcement bar target URL (settings) | Internal or external | On save |
| Rich text content (future page editor) | Internal or external | On save + when referenced product changes |
| Product descriptions (synced from providers) | Potentially external | After each sync |
| Contact page email | Not a URL | Format validation only |
Note: Links rendered from DB data (product cards, collection listings) are safe by construction — you only render a link if the product/collection exists. The risk is entirely in user-entered free-text URLs stored in settings or content.
Two-phase validation
Phase 1: Internal links — instant router + DB check
defmodule Berrypod.LinkValidator do
  alias Berrypod.Products

  def validate(url) when is_binary(url) do
    uri = URI.parse(url)

    cond do
      # External URL: queue for async check
      uri.host != nil -> {:external, url}
      # Internal: check router match
      true -> validate_internal(uri.path)
    end
  end

  defp validate_internal("/products/" <> slug) do
    case Products.get_product_by_slug(slug) do
      %{visible: true, status: "active"} -> :ok
      %{visible: false} -> {:dead, :product_hidden}
      nil -> {:dead, :product_not_found}
      # Visible but not active; this reason atom is an assumption
      _other -> {:dead, :product_inactive}
    end
  end

  defp validate_internal("/collections/" <> slug) do
    if Products.category_exists?(slug), do: :ok, else: {:dead, :category_not_found}
  end

  defp validate_internal(path) do
    # Check against router for known static paths
    case Phoenix.Router.route_info(BerrypodWeb.Router, "GET", path, "") do
      :error -> {:dead, :no_route}
      _match -> :ok
    end
  end
end
Phase 2: External links — async Oban job
defmodule Berrypod.Workers.ExternalLinkCheckWorker do
  use Oban.Worker, queue: :default, max_attempts: 2

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"url" => url, "source_key" => source_key}}) do
    case Req.head(url, receive_timeout: 10_000, redirect: true) do
      {:ok, %{status: status}} when status < 400 -> :ok
      {:ok, %{status: status}} -> record_dead_link(url, source_key, status)
      {:error, _} -> record_dead_link(url, source_key, :unreachable)
    end
  end

  # record_dead_link/3 is not shown; it would update the matching stored_links row.
end
Rate limiting: one check per URL per 24 hours. Don't hammer external servers.
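One way to enforce the 24-hour limit is to compare the stored last_checked_at timestamp against the current time before enqueueing a check. The sketch below assumes that approach; the module name and function are hypothetical, and Oban's own job uniqueness options may be an alternative worth evaluating.

```elixir
defmodule LinkCheckThrottleSketch do
  # Minimum gap between two checks of the same URL, in seconds (24 hours).
  @min_interval_seconds 24 * 60 * 60

  # A link that has never been checked is always due.
  def due?(nil, _now), do: true

  # Otherwise it is due once at least 24 hours have passed since the last check.
  def due?(%DateTime{} = last_checked_at, %DateTime{} = now) do
    DateTime.diff(now, last_checked_at, :second) >= @min_interval_seconds
  end
end
```

Passing the current time in as an argument, rather than calling DateTime.utc_now/0 inside the function, keeps the check deterministic in tests.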
Event-driven invalidation
The smart part. Rather than only checking periodically, hook into the events that cause dead links:
On product deleted/made invisible:
# After Products.delete_product/1 or hiding a product
DeadLinks.scan_stored_links_for_path("/products/#{old_slug}")
# Finds any nav/footer/content links pointing to that path → flags them
On product slug change: The redirect is created automatically (existing plan). Additionally:
# Stored links pointing to the old slug are now stale
# Flag them with a "link moved" status + the new destination
DeadLinks.flag_moved_links("/products/#{old_slug}", "/products/#{new_slug}")
# Admin sees: "Your footer links to /products/old-name — this moved to /products/new-name. Update it?"
This is more actionable than just "link is broken" — it tells you where it moved to.
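The "moved" flagging can be pictured as a pure transformation over stored links. The sketch below works on plain maps with the status and moved_to fields from the stored_links schema; the module name is hypothetical, and the real DeadLinks.flag_moved_links/2 would presumably run an update query instead of mapping over a list.

```elixir
defmodule MovedLinksSketch do
  # Mark every stored link that points at old_path as "moved" and record
  # where it moved to. Links pointing elsewhere are returned unchanged.
  def flag_moved(links, old_path, new_path) do
    Enum.map(links, fn
      %{url: ^old_path} = link -> %{link | status: "moved", moved_to: new_path}
      link -> link
    end)
  end
end
```

Each input map must already carry status and moved_to keys, because the update syntax used here only replaces existing keys.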
On admin saves any content with URLs: Validate immediately. Internal links checked synchronously (fast). External links enqueued for async check.
Schema
create table(:stored_links, primary_key: false) do
  add :id, :binary_id, primary_key: true
  add :url, :string, null: false          # the full URL or path
  add :source_key, :string, null: false   # e.g. "settings.footer_link_1", "nav.about"
  add :link_type, :string, null: false    # "internal" or "external"
  add :status, :string, default: "ok"     # "ok" | "dead" | "moved" | "unchecked"
  add :http_status, :integer              # last HTTP status for external links
  add :dead_reason, :string               # "product_not_found", "no_route", "unreachable", etc.
  add :moved_to, :string                  # when status is "moved", the new destination
  add :last_checked_at, :utc_datetime
  timestamps()
end

create unique_index(:stored_links, [:url, :source_key])
create index(:stored_links, [:status])
create index(:stored_links, [:link_type])
Admin UI: Dead links tab
Table of all dead/moved/unchecked stored links, sorted by status (dead first, then moved, then unchecked).
Columns: source (where the link is — "Footer", "Nav", "Announcement bar"), URL, status badge, last checked, action.
Actions:
- Dead: "Edit" (opens the relevant settings section pre-focused on that field) — or "Ignore" if intentional
- Moved: "Update link" one-click to replace old URL with the new destination in the source setting
- Unchecked: "Check now" to trigger immediate validation
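The ordering of the dead links table (dead first, then moved, then unchecked, with healthy links hidden) could be expressed as a small priority map. This is an illustrative sketch on plain maps with a hypothetical module name; in practice the ordering might equally be done in the database query.

```elixir
defmodule DeadLinkSortSketch do
  # Lower number sorts first. "ok" is absent, so healthy links are filtered out.
  @priority %{"dead" => 0, "moved" => 1, "unchecked" => 2}

  def for_review(links) do
    links
    |> Enum.filter(&Map.has_key?(@priority, &1.status))
    |> Enum.sort_by(&Map.fetch!(@priority, &1.status))
  end
end
```

Enum.sort_by/2 is stable, so links with the same status keep their incoming order.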
Dashboard integration: a small badge on the admin dashboard card ("3 dead links") to draw attention without being annoying. Cleared when all are resolved or ignored.
Weekly Oban cron job
Re-check all external links stored in stored_links. Internal links don't need periodic re-checking — they're validated on demand and on data-change events, which is more efficient.
# In Oban crontab
{"0 3 * * 1", Berrypod.Workers.WeeklyExternalLinkCheckWorker}
The weekly job enqueues one ExternalLinkCheckWorker job per external stored link, with rate limiting.
What it deliberately doesn't do
- Doesn't crawl rendered HTML — too fragile, too slow. We work from structured data (settings keys, content blocks), not parsed HTML.
- Doesn't check links in transactional emails — those are templates, not user content.
- Doesn't validate email addresses — format check only, not SMTP validation (too invasive).
- Doesn't check links in product images — image URLs are managed by the Media pipeline, not free-text.
Relationship to redirect system
| Problem | Solution |
|---|---|
| Visitor hits a broken URL | Redirect — 301 to new location |
| Your own content links to a broken URL | Dead link fix — update the link in your content |
| Product renamed — old URL works | Redirect created automatically |
| Product renamed — your nav still says old URL | Dead link flagged as "moved" with suggestion |
They complement each other. The redirect preserves SEO and visitor experience for external links you can't control (social posts, other websites linking to you). The dead link monitor fixes links you can control — your own navigation, content, and settings.
Implementation notes
Slug change detection is safe to add with no behaviour change for products that don't change slug. The on_conflict: :nothing insert ensures idempotency across repeated syncs.
The FTS5 confidence threshold should be tuned conservatively at first. An incorrect auto-redirect (wrong product) is worse than no redirect. Admin review catches the gaps.
ETS cache invalidation needs to happen on: redirect created, updated, deleted. Simple GenServer or :persistent_term approach — at the scale of a single-tenant shop, the full redirect table easily fits in memory.
Redirect chains (A → B → C) should be detected and flattened on creation. If a new redirect's to_path is itself an existing from_path, follow it and set the new redirect's to_path to the final destination. Avoids multi-hop redirects.
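Chain flattening can be sketched over a plain map of from_path to to_path. The module below is an illustration under that assumption, with a hypothetical name; it also rewrites existing redirects that point at the new from_path and rejects loops, which the note above does not spell out but which follows from wanting single-hop redirects.

```elixir
defmodule RedirectChainSketch do
  # redirects is a map of from_path => to_path that is already flat
  # (no to_path is itself a from_path).
  def add(redirects, from_path, to_path) do
    case resolve(redirects, to_path, MapSet.new([from_path])) do
      {:ok, final} ->
        updated =
          redirects
          # Existing redirects that pointed at from_path now skip straight to final.
          |> Map.new(fn {from, to} -> {from, if(to == from_path, do: final, else: to)} end)
          |> Map.put(from_path, final)

        {:ok, updated}

      {:error, :loop} ->
        {:error, :loop}
    end
  end

  # Follow existing redirects until reaching a path that is not redirected.
  # The seen set detects loops, including a redirect back to from_path.
  defp resolve(redirects, path, seen) do
    cond do
      MapSet.member?(seen, path) ->
        {:error, :loop}

      Map.has_key?(redirects, path) ->
        resolve(redirects, Map.fetch!(redirects, path), MapSet.put(seen, path))

      true ->
        {:ok, path}
    end
  end
end
```

With an existing A→B redirect, adding B→C yields A→C and B→C, so every stored redirect resolves in one hop.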
Status code guidance:
- 301 Permanent: use for slug changes and deleted products. Tells Google to update its index.
- 302 Temporary: only for sales/temporary campaigns. Tells Google to keep the original URL indexed.
Files to create/modify
- Migration: redirects and broken_urls tables
- lib/berrypod/redirects/redirect.ex: schema
- lib/berrypod/redirects/broken_url.ex: schema
- lib/berrypod/redirects.ex: context module with lookup/1, create_auto/1, create_manual/1, warm_cache/0, invalidate_cache/1, increment_hit_count/1, list_broken_urls/0, record_broken_url/2
- lib/berrypod_web/plugs/redirects.ex: new Plug
- lib/berrypod/products.ex: slug change detection in upsert_product/2
- lib/berrypod_web/live/shop/error.ex: hook analytics query on 404
- lib/berrypod_web/live/admin/redirects_live.ex: new LiveView (3 tabs)
- Router: /admin/redirects route, ETS cache warm on startup
- Admin nav: new sidebar link
Tests
- upsert_product/2 with title change creates redirect automatically
- upsert_product/2 with no title change does not create redirect
- Redirect Plug: matching path → 301, no match → passthrough
- Redirect Plug: ETS cache hit (no DB call)
- 404 handler: path with analytics history → broken_url record created
- 404 handler: path with no analytics history → nothing recorded
- FTS5 auto-resolve: confident match → redirect created; no match → broken_url pending
- Redirect chain flattening: A→B, new B→C → stored as A→C
- hit_count incremented on each redirect fire