Saturday, September 24, 2011

Just how much data must TA 1 mine?

Here's a quote from the initial introduction to TA 1 technologies:
TA1 performers will develop automated and semi-automated operator support tools and techniques for the systematic and methodical use of social media at data scale and in a timely fashion 
Since I have been working on TA 2 test systems with just 5k users, which produce about 185k posts per fake year at a posting rate p of 0.1 posts/day, I wondered what the real world has in store for SMiSC. That is, what is "social media at data scale and in a timely fashion"?
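As a rough check on that TA 2 figure, the expected yearly volume is simply users × p × days. Here is a minimal sketch, assuming every one of the 5k users posts at the same constant average rate p = 0.1 posts/day over a 365-day year (the function name is hypothetical):

```python
def expected_posts(users: int, posts_per_day: float, days: int = 365) -> float:
    """Expected post count if every user posts at a constant average rate."""
    return users * posts_per_day * days

# 5k test users at p = 0.1 posts/day over a 365-day year
print(expected_posts(5_000, 0.1))  # 182500.0
```

That works out to 182,500 posts, in the same ballpark as the ~185k seen in the test system.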

Well, here is "data scale," according to By The Numbers: Twitter Vs. Facebook Vs. Google Buzz:
Updates/Posts
  • Facebook status updates: 700 per second
  • Twitter tweets: 600 per second
  • Buzz posts: 55 per second
That adds up to 1,355 updates per second to be discriminated, categorized, aggregated, and reported on. A "timely fashion" implies that it is okay to be "behind" by some time, but eventually the system must process everything. I figure the maximum-delay requirement is meant to ensure a report on any new or significant meme within our leaders' decision-making cycle, so that they cannot be outfoxed by a rapidly spreading strategic message.
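To put the combined rate in daily terms, and to show why being "behind" is only acceptable if the system can eventually catch up, here is a back-of-the-envelope sketch. The per-service rates are the ones listed above; the 1,000 updates/second processing capacity in the example is a hypothetical number chosen purely for illustration, and the function names are likewise made up.

```python
# Per-second update rates from the "By The Numbers" comparison above
RATES_PER_SECOND = {"Facebook": 700, "Twitter": 600, "Buzz": 55}
SECONDS_PER_DAY = 24 * 60 * 60


def total_rate(rates: dict[str, int]) -> int:
    """Combined arrival rate across all services (updates/second)."""
    return sum(rates.values())


def backlog_after(arrival_rate: float, capacity: float, seconds: float) -> float:
    """Unprocessed updates after `seconds`, starting from an empty queue.

    If capacity >= arrival rate the queue never grows; otherwise it grows
    linearly at (arrival_rate - capacity) updates per second.
    """
    return max(0.0, float((arrival_rate - capacity) * seconds))


rate = total_rate(RATES_PER_SECOND)
print(rate)                    # 1355
print(rate * SECONDS_PER_DAY)  # 117072000
# Hypothetical system that can only handle 1,000 updates/second:
print(backlog_after(rate, 1_000, SECONDS_PER_DAY))  # 30672000.0
```

In other words, roughly 117 million updates a day, and a system that falls short of the arrival rate by even a few hundred updates per second piles up tens of millions of unprocessed items per day. A bounded lag is tolerable; a sustained capacity shortfall is not.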

Yikes.

Here are current statistics for Facebook alone: FB stats
Twitter doesn't seem to have a similar page.
Couldn't find one for Google+ either.

Friday, September 2, 2011

DARPA's SMiSC Industry Day


On August 2, DARPA held the first SMiSC industry day. It was hosted by Systems Planning Corporation in Arlington, from 10:00 AM to 5:30 PM. The day was divided into two parts. The morning session was an introduction to SMiSC and the BAA process. The second half of the day consisted of one-on-one sessions between attendees and Dr. Rand Walzman. Nothing of importance has been reported about this day yet. Despite this, I can offer two bits of information.

First, I noticed that the only documents posted by Dr. Rand Walzman are on the IRB (Institutional Review Board) process. This leads me to guess that TA2 is important and that they recognize the complications of doing human studies. Second, we were able to obtain a list of attendees. Analyzing it, we can see several things. There were 122 attendees, representing roughly 76 different corporations, agencies, and educational institutions. Sixty of these were companies: 30 primarily defense companies, and the others non-defense firms. The remainder were academic institutions or government agencies. The leading specialties for the defense companies were simulation, security, and enterprise solutions. The non-defense firms focused on social networks, linguistics, and data mining.

What we see, at least from this list, is that the military is behind the curve when it comes to using social networks as a source for data mining and developing ways to monitor their content. This would shock many reporters. Indeed, civilian companies have been monitoring the Internet for years to develop not just market strategy but PR and political strategies as well. It will be interesting to see how the military applies the technology in the coming years.