How does Google read Flash?

As Daniel identifies in his recent post, Google is indexing increasing quantities of Flash content and returning that content further up the search results. This raises the question of what web designers can do to see how Google will experience their Flash content.

Indexing Flash is more challenging than indexing web pages for two reasons. First, with its clearly defined structure, XHTML provides a rich description of content: it identifies headings, paragraphs and lists, and provides a way to assign metadata such as title and description. It is by definition a markup language. Flash is not: it is a multimedia platform focussed on interaction and experience, and it provides fewer semantic cues to search engine robots. Second, Flash supports storing entire applications within a single file, so multiple pages of content share a single URL and present as a single indexable resource.
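To make the idea of "semantic cues" concrete, here is a minimal Python sketch of how little work it takes to read headings and a title straight out of markup. The sample page, the class name and the choice of tags are all invented for illustration; this is not how any search engine actually works.

```python
from html.parser import HTMLParser


class SemanticCueCollector(HTMLParser):
    """Collects the text of title and heading elements from markup."""

    CUE_TAGS = {"title", "h1", "h2", "h3"}  # illustrative choice of tags

    def __init__(self):
        super().__init__()
        self.cues = []        # list of (tag, text) pairs, in document order
        self._current = None  # cue tag we are currently inside, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.CUE_TAGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.cues.append((tag, "".join(self._buffer).strip()))
            self._current = None


def collect_cues(markup):
    """Return the (tag, text) cues found in a string of markup."""
    parser = SemanticCueCollector()
    parser.feed(markup)
    parser.close()
    return parser.cues


# Hypothetical sample page, made up for this example.
SAMPLE = """<html><head><title>Bike hire in Leeds</title></head>
<body><h1>Hire a bike</h1><p>Daily and weekly rates.</p>
<h2>Opening hours</h2><p>9am to 5pm.</p></body></html>"""

if __name__ == "__main__":
    for tag, text in collect_cues(SAMPLE):
        print(tag, "->", text)
```

A SWF file offers no equivalent of these tags, which is why a crawler has to fall back on collecting whatever text it encounters.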

According to a Google announcement last year:

“We’ve developed an algorithm that explores Flash files in the same way that a person would, by clicking buttons, entering input, and so on. Our algorithm remembers all of the text that it encounters along the way, and that content is then available to be indexed.”

Unsurprisingly, the details of this algorithm are not public, although Google suggests that it is based on Adobe’s Searchable SWF library. While Adobe says that content developers do not need to do anything in order to benefit from the improved indexing of SWF files, it does provide a utility called the Search Engine SDK, which was originally designed by Macromedia to “provide search engines with the means to search and index Macromedia Flash movies”.

The tool is available for free, but it’s quite hidden on the Adobe website and you do need to sign up to the Adobe Player Licensing Program. The best way to get it is via this link: Follow the prompts to sign up and verify your email and you’ll receive a download link.

For anyone not familiar with command-line applications, here’s how to use the tool on Windows:

  1. Extract the file somewhere on your computer.
  2. Open the extracted folder and copy the swf2html.exe file from the ‘windows’ directory to a new directory such as C:\swf
  3. Copy the SWF file you want to convert into the same directory.
  4. Fire up a terminal window (in Vista, click the Start button, type ‘cmd’ in the search field and press Enter)
  5. Switch to the directory containing the tool:
    cd c:\swf
  6. Enter the following (where myswf is the name of your Flash file):
    swf2html myswf.swf

This prints the HTML to the screen. If you’d like to save the HTML to a file (for viewing in your browser), simply redirect the output as follows:

swf2html myswf.swf > myhtml.html
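If you have more than a couple of SWF files to check, the same steps can be scripted. The Python sketch below is only a rough starting point: it assumes swf2html is on your PATH (or in the current directory) and uses nothing beyond the invocation shown above, a single SWF path with the HTML arriving on standard output. The function names and the choice to name each output after its SWF are my own.

```python
import subprocess
from pathlib import Path


def build_command(swf_path, tool="swf2html"):
    """Build the same command line as typed by hand: swf2html myswf.swf"""
    return [tool, str(swf_path)]


def convert_swf(swf_path, html_path=None, tool="swf2html"):
    """Run the tool on one SWF and save its standard output as an HTML file.

    Assumption: the tool writes the HTML to stdout, as the manual steps
    suggest. The output is written as raw bytes because its encoding is
    not something this sketch knows.
    """
    swf_path = Path(swf_path)
    html_path = Path(html_path) if html_path else swf_path.with_suffix(".html")
    result = subprocess.run(
        build_command(swf_path, tool),
        capture_output=True,  # collect stdout instead of printing it
        check=True,           # raise an error if the tool exits with a failure
    )
    html_path.write_bytes(result.stdout)
    return html_path


def convert_folder(folder, tool="swf2html"):
    """Convert every .swf file in a folder, returning the HTML paths written."""
    return [convert_swf(p, tool=tool) for p in sorted(Path(folder).glob("*.swf"))]


if __name__ == "__main__":
    # Convert whatever SWF files sit in the current directory.
    for written in convert_folder("."):
        print("wrote", written)
```

Run it from the directory that holds your SWF files; each one ends up with an HTML file of the same name next to it.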

While the tool won’t show you exactly what will appear in search engines, a quick look at the output should give you some idea whether all of the important content is visible as text, and conversely whether unimportant text could be hidden (for example, by converting it to images within the Flash).
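That "quick look" can also be done mechanically. The following Python sketch strips the tags from a saved HTML file and reports which of your important phrases are missing and which unwanted phrases are exposed. It makes no assumptions about the exact markup swf2html produces beyond it being HTML; the phrase lists and the sample markup are invented for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gathers all character data found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the text content of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def check_phrases(html, must_appear=(), must_not_appear=()):
    """Compare the extracted text against two phrase lists (case-insensitive).

    Returns (missing, exposed): wanted phrases that were not found, and
    unwanted phrases that were.
    """
    text = visible_text(html).lower()
    missing = [p for p in must_appear if p.lower() not in text]
    exposed = [p for p in must_not_appear if p.lower() in text]
    return missing, exposed


if __name__ == "__main__":
    # Hypothetical stand-in for a file saved from the tool's output.
    sample = "<html><body><p>Welcome to   Acme Bikes</p><p>Internal build 42</p></body></html>"
    missing, exposed = check_phrases(
        sample,
        must_appear=["Acme Bikes", "Opening hours"],
        must_not_appear=["Internal build"],
    )
    print("missing:", missing)
    print("exposed:", exposed)
```

To check a real file, read it in with open("myhtml.html").read() and pass the result to check_phrases along with the phrases that matter for your site.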