Search

The main user task is locating existing microformatted data on the public Web. This will eventually require ranking multiple results to find the most relevant. It may also require a more specific query language for working with particular fields (“Advanced Search”). A secondary task is reusing this information once found.

Query Interface

A one-line input form would be ideal, but query terms alone won’t specify which microformat types to search. Not knowing the type also complicates handling of compound microformats: should hCards nested within hCalendar events be returned as ‘naked’ hCards?

  • text Terms: should be case-folded and matched anywhere, in any text node.
    • Çelik, Argent, web-2.0, “Web 2.0”, smith +sunnyvale, yankees -NY -“New York”
  • h* Directives: should choose a schema-appropriate range of shortcuts for common elements of popular microformats
    • Tantek hCard — where name-of-spec should force type matching
    • region:CA, postal-code:94040 — literal matching of class names from the specs
    • Tantek zip:9*, Yankees in:CA, party by:Rohit — meta-matching across colloquialisms for “all location-related stuff” or “organizer or participant”
  • xq Directives: should be applied to some (?) XML representation of all the stored microformats
    • //region="CA", //given-name="Khare*" (?). Based on automatic application of miniML transformation rules, with dictionaries for all the common µf terms (XMDP).
  • CSS Selectors (?): would it be designer-friendly to use CSS3 selectors?
    • (.vcard .fn):foo == fn:foo. Basically, a roll-your-own directive: use CSS to specify what will match.
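
The text terms and h* directives listed above could be tokenized roughly as follows. This is only a sketch: `parse_query`, `Token`, `ALIASES`, and `SPEC_NAMES` are hypothetical names, and the alias expansions (for example `zip` to `postal-code`) are guesses at what the colloquial shortcuts might map to, not something taken from a spec. Wildcards such as `9*` are passed through untouched for a later matching stage.

```python
import re
from typing import NamedTuple

# Hypothetical mapping from colloquial shortcuts to microformat class names.
ALIASES = {
    "zip": ["postal-code"],
    "in": ["locality", "region", "country-name"],
    "by": ["organizer", "attendee"],
}

# Spec names that force type matching when used as a bare term (assumed set).
SPEC_NAMES = {"hcard", "hcalendar", "hreview"}


class Token(NamedTuple):
    kind: str           # "term", "field", or "type"
    value: str          # case-folded text to match
    sign: str = ""      # "+" (required), "-" (excluded), or ""
    fields: tuple = ()  # class names to search, for "field" tokens


# Optional sign, optional "name:" prefix, then a quoted phrase or a bare word.
_TOKEN_RE = re.compile(
    r'([+-]?)(?:([A-Za-z][\w-]*):)?(?:["“]([^"”]*)["”]|(\S+))'
)


def parse_query(query: str) -> list:
    """Split a one-line query into term, field, and type tokens."""
    tokens = []
    for m in _TOKEN_RE.finditer(query):
        sign, field, phrase, word = m.groups()
        value = (phrase if phrase is not None else word).casefold()
        if field:
            name = field.casefold()
            # Expand a colloquial shortcut, else match the class name literally.
            fields = tuple(ALIASES.get(name, [name]))
            tokens.append(Token("field", value, sign, fields))
        elif phrase is None and value in SPEC_NAMES:
            tokens.append(Token("type", value, sign))
        else:
            tokens.append(Token("term", value, sign))
    return tokens
```

For instance, `parse_query('Tantek hCard zip:9*')` yields a plain term, a type constraint, and a field token that searches `postal-code`. The xq and CSS forms are deliberately left out, since their syntax is still an open question.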

Result Interface

Start with a standardized card UI element.

Submission

Client-side: Miffy

Server-side: UFP (?)

This could easily be delayed; make it work with interactively submitted content first.

Crawling

How to schedule regular refreshes? How to avoid duplicating all of EVDB and Upcoming, say?
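
One possible answer to the refresh question is adaptive polling: revisit a page sooner if its content changed last time, and later if it did not. The sketch below assumes that approach; `RefreshQueue` and its interval defaults are hypothetical, and it does not address the duplication question at all.

```python
import hashlib
import heapq


class RefreshQueue:
    """Priority queue of URLs ordered by next-due time (seconds)."""

    def __init__(self, min_interval=3600, max_interval=30 * 86400):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self._heap = []    # entries are (due_time, url)
        self._state = {}   # url -> (current interval, last content digest)

    def add(self, url, now):
        """Register a new URL; it is due immediately."""
        if url not in self._state:
            self._state[url] = (self.min_interval, None)
            heapq.heappush(self._heap, (now, url))

    def pop_due(self, now):
        """Remove and return every URL whose due time has passed."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due

    def record_fetch(self, url, body, now):
        """Reschedule a URL after fetching it; return True if it changed."""
        interval, last_digest = self._state[url]
        digest = hashlib.sha256(body).hexdigest()
        changed = digest != last_digest
        if last_digest is not None:
            if changed:
                # Changed since last visit: check back sooner.
                interval = max(self.min_interval, interval // 2)
            else:
                # Unchanged: back off.
                interval = min(self.max_interval, interval * 2)
        self._state[url] = (interval, digest)
        heapq.heappush(self._heap, (now + interval, url))
        return changed
```

Hashing the raw page body is the crudest possible change detector; hashing only the extracted microformat data would avoid refreshing on unrelated page edits.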

Deployment Issues

If we invest in the Microsearch scenario, it’s worth asking what it will take to fund it all the way through to deployment.

Scalability

Is dbxml truly sufficient in the short term? What about concurrency?

Privacy

how to “unsubmit”?

is downloading images still a risk in this context? Probably not.

Security

… scrubbing & sterilization …
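
As one illustration of what scrubbing might involve, the sketch below re-emits submitted markup through an allow-list, using Python's standard `html.parser`. The allowed tags, attributes, and URL schemes are assumptions chosen for the example, and `Scrubber` and `scrub` are hypothetical names. A real deployment would need a far more careful policy than this.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allow-lists; everything else is dropped.
ALLOWED_TAGS = {"a", "abbr", "address", "div", "span", "p", "ul", "ol", "li", "br"}
ALLOWED_ATTRS = {"class", "title", "href", "rel"}
SAFE_SCHEMES = ("http://", "https://", "mailto:")
VOID_TAGS = {"br"}                 # tags that take no closing tag
DROP_CONTENT = {"script", "style"}  # tags whose contents are discarded too


class Scrubber(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip_depth = 0  # > 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # drops javascript:, data:, and relative links
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif not self._skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))


def scrub(markup: str) -> str:
    """Return markup with only allow-listed tags and attributes kept."""
    parser = Scrubber()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Keeping `class` is essential here, since the class names are the microformat. Comments are dropped because the parser's default comment handler emits nothing.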

Spam

no, PageRank-of-the-source-page is no protection :(

user accounts with strong passwords? (blacklists…)

should we burn a unique-key into every IPL, so we can at least distinguish contributors?

Recap: Goals

Searcher’s goals:

  1. Find people, events, reviews — the sorts of things microformats have been invented for
  2. Re-use the search results easily (e.g. in .vcf or other formats)
  3. Explore the world of microformats
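
The .vcf re-use mentioned in goal 2 could look something like the sketch below, which serializes an hCard result as a vCard 3.0 file. It assumes a search result arrives as a flat dict keyed by hCard class names; that shape and the function name `hcard_to_vcf` are hypothetical. Only a handful of properties are covered, and long lines are not folded.

```python
def _escape(text: str) -> str:
    """Escape a text value following the vCard 3.0 rules (RFC 2426)."""
    return (text.replace("\\", "\\\\")
                .replace(";", "\\;")
                .replace(",", "\\,")
                .replace("\n", "\\n"))


def hcard_to_vcf(card: dict) -> str:
    """Serialize a flat hCard dict (assumed shape) as vCard 3.0 text."""
    def get(key):
        return _escape(card.get(key, ""))

    # N is structured: family; given; additional; prefix; suffix.
    name_keys = ("family-name", "given-name", "additional-name",
                 "honorific-prefix", "honorific-suffix")
    # ADR is structured: PO box; extended; street; locality; region; code; country.
    adr_keys = ("post-office-box", "extended-address", "street-address",
                "locality", "region", "postal-code", "country-name")

    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        "FN:" + get("fn"),
        "N:" + ";".join(get(k) for k in name_keys),
    ]
    if any(card.get(k) for k in adr_keys):
        lines.append("ADR:" + ";".join(get(k) for k in adr_keys))
    if card.get("org"):
        lines.append("ORG:" + get("org"))
    lines.append("END:VCARD")
    # vCard lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"
```

A result card UI could then offer the returned text as a download with a `.vcf` extension.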

Author’s goals:

  1. Test out their markup
  2. Attract readers for their content

Our goals:

  1. Promote microformats by “showing off” how much is out there already
  2. Give microformatted data chunks their own addresses for further automation/remixing

Microformats.org community goals:

  1. Gather all the “test cases” together
  2. Cross-check specs with actual practice