How Webbot works

Also known as 'time monk', the author holds a patent on computer-assisted reading technology which allows reading from computer screens at up to 2000 words per minute. Reaching into other areas of hidden potential within human language use, he has been developing a system of software internet agents (like those search engines use) and other proprietary processing methods to predict future events. The software project, begun in 1997, captures near-real-time changes in language patterns within internet discussions. Then, employing radical linguistic techniques of his own devising, he develops a model which anticipates future events with some seeming accuracy. The processing has, at its core, a method of assigning emotional values to complex content and time carry-values to predict changes in future behavior based on how people are using language now.

ALTA (ALTA stands for Asymmetric Linguistic Trend Analysis)

Process and practice

We are not alone in forecasting the future; all humans do it to some degree. Just a quick search of the internet will provide dozens of forms of future forecasting. Some use astrology, some use other methods.

We employ a technique based on radical linguistics to reduce extracts from readings of dynamic postings on the internet into an archetypical database. With this database of archetypical language, we calculate the rate of change of the language. The forecasts of the future are derived from these calculations. Our calculations are based on a system of associations between words and numeric values for the emotional responses those words evoke. These 'emotional impact indicators' are also of our own devising. They are attached to a database of over 300/three hundred thousand words. This database of linked words/phrases and emotions is our lexicon, from which the future forecasting is derived.
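The lexicon-and-rate-of-change idea above can be sketched very roughly. This is a toy illustration only, assuming nothing about the real lexicon or software: the words, values, and function names here are all invented for the example.

```python
# Illustrative sketch only: a toy lexicon mapping words to emotional
# values, and a rate-of-change calculation over sampling periods.
# All words and values are hypothetical stand-ins, not the real lexicon.

LEXICON = {
    "fear": -0.8,
    "elation": 0.9,
    "confusion": -0.4,
    "release": 0.3,
}

def emotional_sum(words):
    """Sum the emotional values of the words found in the lexicon."""
    return sum(LEXICON.get(w.lower(), 0.0) for w in words)

def rate_of_change(samples):
    """Differences between consecutive emotional sums -- the 'rate of
    change of the language' from which forecasts are said to derive."""
    sums = [emotional_sum(s) for s in samples]
    return [b - a for a, b in zip(sums, sums[1:])]

periods = [
    ["elation", "release"],          # earlier sample of posted text
    ["fear", "confusion", "fear"],   # later sample
]
print([round(x, 2) for x in rate_of_change(periods)])  # -> [-3.2]
```

The shift from a positive to a negative emotional sum between samples is the kind of movement a forecast would be read from, in this toy framing.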

We call our future viewing the ALTA reports, for 'Asymmetric Linguistic Trend Analysis'. The ALTA reports are available by subscription.

In the beginning.....

The beginning of all of our processing is the word. Or words, rather - actually excessively large amounts of them. Totals of words beyond all reason. These are then distilled down into a thick, syrupy mass and placed in an inadequate visual display, and from there interpretation proceeds.

Our interpretations of the data sets that we accumulate are presented in the form of a series of reports which detail the interpretations of the changes in language and what we think that they may mean.

Please note that our interpretations are provided as entertainment only. We are to be held harmless for the universe placing substance behind our words. Or not, as it so chooses.

The interpretations provide a broad brush view of the future over the next few years. The broad view of the future is based on set theory and provides a collection of linguistic clues which can be used to forecast developing trends.

Some of our subscribers use these forecast interpretations to develop models of their own futures in our collective and changing planetary future. Some use the forecasts for trading purposes. Others for wild entertainment of the mostly implausible and highly improbable kind.

Even by our own rigorous standards, our forecasts are proving out better than mere chance would allow. Our track record is being tested with each new report series. So far, so good. We have a very high rate of returning subscribers (over 90%) which is likely an indication of needs being met.

Concepts
The changing nature of our lexicon was an unexpected part of our processing. It requires constant tuning, as the 'emotional quantifiers' placed on the words change over time, as does usage. Ever notice how some words and phrases get 'burned out' by excessive popular usage? As though they had no real emotional legs, and by the time the mainstream media grabs hold of them, or now, by the time they have been round the net a few times, their emotional 'cachet' has diminished and they fade away? Well, Igor and I sure notice such linguistic flows, as we then have to adjust whatever emotional values we had placed on a word or phrase whose emotional 'tonality' has been altered. Such tuning of the lexicon is quite tedious and, as may be imagined, requires a significant level of feedback and testing to even come close to obtaining a decent emotional representation within the confines of numbers.

Speaking of numbers, when the interpretation refers to an entity or lexical structure {note: a lexical structure is a collection of descriptor sets} as being 'fully populated', or 'complete and correct' in being populated, it means that the current data set has a representation within it for a required number of the contexts which go into the entity's definition. As an instance, there are nearly 20/twenty thousand contexts within the Bushista entity definition. These are composed of nearly 50/fifty thousand words. But not all of the contexts will be found by the spyders at any given point within the processing, and so an arbitrary percentage has been set at 64/sixty-four percent for the Bushista entity in order to consider it 'minimally constituted'. This is an important distinction, as when the times and emotional waves within the populace change, it is possible for entities to also change. We have several large changes within entities already showing for 2007. So far these include our Markets entity, specifically the subset for 'global markets', which apparently undergoes a large-scale redefinition over the course of this coming Summer {June solstice to September equinox}.
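The 'minimally constituted' test described above amounts to a coverage check against a threshold. A minimal sketch follows; the 64% figure comes from the Bushista example in the text, but the data structures and context names are invented for illustration.

```python
# Hedged sketch of the 'minimally constituted' test: an entity qualifies
# when the fraction of its defining contexts actually located by the
# spyders meets an arbitrary threshold (64% in the Bushista example).
# Context names and set sizes below are illustrative stand-ins.

MIN_CONSTITUTED_THRESHOLD = 0.64

def is_minimally_constituted(required_contexts, found_contexts,
                             threshold=MIN_CONSTITUTED_THRESHOLD):
    """True when at least `threshold` of the entity's required contexts
    have a representation in the current data set."""
    located = required_contexts & found_contexts
    return len(located) / len(required_contexts) >= threshold

required = {f"context_{i}" for i in range(100)}   # stand-in for ~20,000
found = {f"context_{i}" for i in range(70)}       # 70% located this pass
print(is_minimally_constituted(required, found))  # -> True
```

With only half the contexts located, the same entity would fall below the 64% bar and not yet count as minimally constituted.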

Some of the entities involved within our modelspace are detailed in the table below. The entities are actually rather numerous, but many are either small, and thus rarely reported upon, or primarily used as 'supporting confirmation' for movements shown in larger entities. Some entities are so large, such as the Terra entity, that the modelspace has to be dumped and reloaded to clear space for its display. This necessitates a rather tedious process for lookups on the cross-links to other, not currently loaded, entities.



Some particulars of interest include that our data sets are constructed on a lunar month calendar. We use the sidereal lunar month as opposed to the synodic, since humans inherently seem to be synchronized, at least linguistically, to the 27.321-day month. The ALTA report series is built around a modelspace which extends out 1.618 lunar years from the incept date. Our lunar year contains 13 lunar, sidereal months for a 'year' length of 355.173 days. Our modelspace thus extends out to 575 days from the first date of processing for details on that ALTA series, for a total of 1.57 solar years of 365.25 days each. Not that any of this will be on the test later. But it sometimes does surface in interpretation of particular event/condition predictions.
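The calendar arithmetic above checks out, and can be reproduced directly. Only the figures stated in the text are used here; the rounding is mine.

```python
# The modelspace calendar arithmetic from the text, as a quick check.

SIDEREAL_MONTH = 27.321           # days; the month humans seem tuned to
LUNAR_YEAR = 13 * SIDEREAL_MONTH  # 13 sidereal months per lunar 'year'
MODELSPACE = 1.618 * LUNAR_YEAR   # modelspace horizon in days
SOLAR_YEAR = 365.25

print(round(LUNAR_YEAR, 3))               # -> 355.173
print(round(MODELSPACE))                  # -> 575
print(round(MODELSPACE / SOLAR_YEAR, 2))  # -> 1.57
```

So 13 sidereal months give the stated 355.173-day lunar year, and 1.618 of those years is the stated 575-day, 1.57-solar-year modelspace horizon.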


Meta Data layers...
Meta data layers are linguistic concepts which meet the criteria of 'dominating' the modelspace and being 'fully populated'. By 'dominating modelspace' we are referring to these lexical structures appearing in all major entities and some minor ones. Further, the lexical structure must also be contributing to the 'emotional tension direction', in that it must represent the majority emotional 'force' of the emotional sums at that point in processing. In other words, the lexical structure of a meta data layer, within an entity, must be sympathetic to the general trend of the emotional tension of the entity as a whole, either negative or positive. Also, the meta data layer will be 'fully populated' in a 'correct and complete' fashion. This is geek-speak for having all of the elements within the context at the core of the meta data layer be represented within the entity. Thus, if there are 2000/two-thousand primary element descriptors to 'duality', then the 'duality' aspect is considered to be participating within the formation of a meta data layer in any given entity if it meets the 2/two criteria of 'domination', by its impact on the entity's emotional summations, as well as having all elements of the context for the meta data layer be found serendipitously within the entity.

Meta data layers are not visible within our display. These are dominating aspects/attribute sets which are 'active' or 'rising' within a number of entities, yet the lexical sets are based on the same core set. Further definition for a meta data layer requires that the lexical structure relate back to one of the 'prime emotional sets' within the lexicon. These are sets which, as a rule, contain lots of bespoke emotion words for aspects and even supporting attribute details. Such words would be 'fear', 'elation', 'confusion', and other words or phrases which can broadly be thought of as describing either an emotion, or a mental state *resulting* from emotional impacts to the human system. The meta data layers are composed of these sorts of aspect/attributes in their descriptor sets. Further, the same lexical structure, arranged in the same hierarchical fashion must be on a 'rise' in emotional summations within many of the larger entities in order to be considered as a candidate for meta data status.
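The candidacy tests described above (domination of the emotional sums in the same direction, 'complete and correct' population, and a rise across many entities) can be sketched as a simple check. Everything here is invented for illustration; the real Prolog implementation is not public, and the thresholds and data layout are assumptions of mine.

```python
# Hedged sketch of meta-data-layer candidacy. For each entity the layer
# appears in, it must (1) dominate: carry the majority of the entity's
# emotional force in the same direction, and (2) be fully populated:
# every core descriptor element represented in the entity. It must also
# be rising in at least `min_rising` entities. All numbers are invented.

def is_meta_layer_candidate(layer_elements, entities, min_rising=3):
    rising = 0
    for e in entities:
        same_sign = (e["layer_sum"] >= 0) == (e["entity_sum"] >= 0)
        dominating = same_sign and abs(e["layer_sum"]) > abs(e["entity_sum"]) / 2
        populated = layer_elements <= e["elements"]  # subset: complete & correct
        if not (dominating and populated):
            return False
        rising += e["layer_rising"]
    return rising >= min_rising

entities = [
    {"layer_sum": -5.0, "entity_sum": -8.0,
     "elements": {"fear", "duality", "constraint"}, "layer_rising": True},
    {"layer_sum": -3.0, "entity_sum": -4.0,
     "elements": {"fear", "duality", "constraint", "rage"}, "layer_rising": True},
    {"layer_sum": -2.0, "entity_sum": -3.0,
     "elements": {"fear", "duality", "constraint"}, "layer_rising": True},
]
print(is_meta_layer_candidate({"fear", "duality"}, entities))  # -> True
```

If any single element of the layer's core context were missing from any entity, the 'fully populated' test would fail and candidacy would be refused, matching the all-or-nothing reading of 'complete and correct' above.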

When we note that a meta data layer is 'going vertical' within an entity, it is a representation that the meta data layer is now driving all of the emotional summations in its primary direction. The meta data layer first appears within an entity and then is filled, or populated over time. It then starts cross linking over to other entities which magnifies its impact on the emotional sums within each entity that comes under its influence, as well as also gaining in influence within entities in which it already exists.

When an entity has 1/one or more meta data layers vertical, it means that the lexical structures which thereafter accrue to that entity are interpreted as though being modified by the meta data layers.



The above data table was a snapshot of the meta data layers as of February 2007. The table below shows the state of the meta data layers as they exist at the time of this writing in October 2007 {ed note: from 0508 data set early processing}.




Emotional Tension Values - Release versus Building
At a very core level, all of the bespoke emotion aspects can be separated into 2/two fundamental types. Obviously, since humans are involved, some overlap occurs and the separation is not pristine. The separating criterion is how the emotion can be characterized, which is to say: does this emotion build tension within an individual human body, or does it release it? Does the emotion twist the stomach? Or exercise the voice? Face? Rest of body? If the emotion causes the body to expel anything from any orifice, it is likely a [release] emotion and has summations in the lexicon to that effect. If the response of the body/mind to the emotion is the increase in muscular tension associated with 'holding the breath', then the emotion will fall into the [building tension] category.
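The two-way split above, overlap included, can be sketched as a classifier. The category scheme comes from the text; the word lists are illustrative stand-ins, not the lexicon.

```python
# Sketch of the release-versus-building split. Words appearing in both
# sets (e.g. grief, per the text) classify as 'both'. Word lists are
# hypothetical examples, not the actual lexicon contents.

RELEASE = {"rage", "grief", "elation", "laughter"}        # expelling
BUILDING = {"frustration", "grief", "dread", "anxiety"}   # tension held

def tension_type(emotion):
    """Return 'release', 'building', 'both', or 'unknown'."""
    e = emotion.lower()
    if e in RELEASE and e in BUILDING:
        return "both"
    if e in RELEASE:
        return "release"
    if e in BUILDING:
        return "building"
    return "unknown"

print(tension_type("rage"))         # -> release
print(tension_type("frustration"))  # -> building
print(tension_type("grief"))        # -> both
```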

Some emotions, such as grief, are of both types in that they bring complex emotional constructs to the surface. Some, such as 'rage', are clearly 'expressive' of emotion, and thus are 'expelling' in nature. However, it is worth noting that every expressive or release emotion also has a complementary form among the building emotions. As an example, we have the relationship between 'rage' and 'frustration'. In the case of 'rage', the emotion is expressed...flashing teeth, furious fists, screaming frenzy...that sort of thing. In the case of 'frustration', the emotion is of a 'building' nature which induces a knotted stomach, tight lips, tense face, tight muscles, and other familiar frustration body reactions. The building emotions such as 'frustration' are all part of a gradient of emotional containment which eventually bursts and spills over into 'expressive' or 'release' emotions. In this example the range can be thought of as beginning with 'constraints on behavior (from the outside)' and going into 'frustration', which then flows through various forms of 'gritting teeth' kinds of building anger until ultimately 'rage' is encountered and the emotional summations instantly change over to 'release states'.

A general 'release period' is expected to consist of an overall downward trend line composed of various steps of rising, then falling below the previous low levels, and onward until all of the pent-up emotional states have been expressed. It is not unusual to have many building periods of some significance within a release period, as is illustrated within the months of May through July of 2007 on the chart above. As may be expected, a building emotional trend line should be composed of a similar, if reversed, set of steps creating a general upward movement. The chart above shows the next large building period as a very distinct exception to our usual fare in that it has only 2/two release events within it, and they are both very small relative to the totality of the build. Highly unusual. Unique so far, in point of fact. The other building portions of the trend line illustrated, which is to say prior to March 8th 2007 and after January 20th 2008, show repeated release episodes within the overall trend.
-------------------------------------------------------------------------------

Process
The process for predicting the future from dynamic linguistics extracted from the internet began in 1997. Well, we have actually been gnawing on most of the concepts employed since about 1983. In 1994, we developed what we call the 'language model' for storage of data in large quantities for fast recovery when using SQL databases. This extension of set theory as applied to data storage led to the examination of language in a new way. With the advent of the internet, the ability to test some of the 'language model' theories led to a series of programs being developed. This software is primarily written in Prolog and implemented using the LPA (Logic Programming Associates) version of the language, as it offered pure Prolog for the PC, as well as superior 'word concept' recognition and regex handling.
Our process begins with internet software agents which read in vast quantities of text from the public, and commonly accessible areas of the internet. We hunt for any of the words which are used as 'descriptors' within our process to define a context. Please see graphic below this discussion. These 'descriptors' are representative words or phrases which are used to define the basic 'idea' or 'concepts' of interest at that point.

Note we do NOT use 'conscious expressions' located on the internet, nor simple word-counting techniques. We do not read emails. We only access publicly posted texts on the internet, seeking not so much what is there as what has recently changed there. So, as a rule, we concentrate our software agents on forums and other community-based sites.

What we do employ is a series of programming steps which reduce the text read down to 4/four digit hexadecimal integers which are themselves held in sets of 'inter-linked linguistic discoveries'. These sets of hexadecimal integers are then aggregated along with information, in a general sense, as to where the text was located. The text returned is aggregated through further processing, producing a very large SQL-based database which is accessed by our Prolog processing software. Once the data starts rolling in, the processing starts by associating the 'descriptor' with a whole group of 'values'. Together these form an 'aspect/attribute' coupling. In turn, the aspect/attribute couplings are gathered into sets, and then humans examine these for various timing and manifestation clues.
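One plausible reading of 'reducing text to 4/four digit hexadecimal integers' is hashing each descriptor hit into a 16-bit token and recording where it was found. This is a guess of mine for illustration: the text does not say how the integers are actually derived, and the hashing scheme, function names, and sample data below are all invented.

```python
# Illustrative guess only: descriptor words hashed to 4-hex-digit
# (16-bit) tokens and grouped with a coarse source location. The real
# derivation of the hexadecimal integers is not described in the text.
import hashlib

def to_hex_token(word):
    """Map a word to a stable 4-digit hexadecimal token (0000-ffff)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return f"{int.from_bytes(digest[:2], 'big'):04x}"

def aggregate(texts_with_sources, descriptors):
    """Keep only descriptor hits, stored as word -> (token, source) --
    a loose stand-in for the 'inter-linked linguistic discoveries'."""
    found = {}
    for text, source in texts_with_sources:
        for word in text.lower().split():
            if word in descriptors:
                found[word] = (to_hex_token(word), source)
    return found

hits = aggregate([("markets show fear and release", "forum_a")],
                 {"fear", "release", "markets"})
print(sorted(hits))  # -> ['fear', 'markets', 'release']
```

Only the three descriptor words survive the reduction; non-descriptor words like 'show' and 'and' are discarded, which matches the hunt-for-descriptors step described earlier.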

The process of data gathering can take up to 3/three weeks to begin filling the databases for processing. In total, in any given series, the data gathering will continue for an additional 3/three to 4/four weeks.
Once the interpretation is begun, a series of reports is prepared to present our findings in an entertaining and informative manner.