Sunday, January 25, 2009
the pattern itself has many more benefits, and performance scalability is only one of them (and in many cases not the most important one). the principle of processing a flow of information in atomic, semantically closed steps is a pattern unix itself was built on.
if you have ever seen how easily even complex processes can be built from pretty simple and atomic executables, you can imagine what pipelining means in terms of software architecture (see also http://en.wikipedia.org/wiki/Pipeline_(Unix))
unix (and nowadays many other similar infrastructures) passes an abstract stream of data through the pipeline, so the flow of information carries little semantics; it is only an abstract stream of bytes. the usage of pipelines can be improved if more common semantics become part of the overall pipeline, meaning each step / operation knows more about what is expected to flow through than just raw data.
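the composition of atomic steps can be sketched in a few lines of Python (a toy illustration; the step names and sample data are made up):

```python
# a pipeline is just a sequence of atomic steps; each step takes the
# output of the previous one (the "abstract stream of data")
def to_lines(data):
    # split the raw stream into lines
    return data.splitlines()

def grep(keyword):
    def step(lines):
        # keep only lines containing the keyword
        return [l for l in lines if keyword in l]
    return step

def count(lines):
    # terminal step: count the surviving lines
    return len(lines)

def run_pipeline(data, steps):
    # feed the output of each step into the next one
    for step in steps:
        data = step(data)
    return data

result = run_pipeline("error: a\ninfo: b\nerror: c",
                      [to_lines, grep("error"), count])
print(result)  # -> 2
```

each step knows nothing about the others; the power comes purely from composing them, exactly as with unix executables connected by `|`.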
XML pipelining intends to use XML as the information flow. this ensures that data processing / transformation can be done with fewer low-level byte stream operations. processing can be done with declarative languages like xpath, xquery, xslt, ... which again reduces the complexity of information access and transformation.
xml itself can express an endless amount of user data using different data models. a pipeline defined for a subset of data models can again reduce the complexity and therefore improve the benefit of a particular pipeline infrastructure.
you can imagine pipelines dedicated to transforming content created against a DITA data model into endless distribution formats. the semantics of a particular pipeline step that can be reused in several pipes are much richer than on the "general purpose XML" level.
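to make this concrete, here is a minimal Python sketch (standard library only; the sample is a simplified DITA-like fragment, not a validated DITA topic) of a pipeline step that relies on the data model instead of raw bytes:

```python
import xml.etree.ElementTree as ET

# a tiny DITA-like topic: the step below can rely on every topic
# having a title and a body, so the mapping is purely declarative
TOPIC = "<topic><title>Install</title><body><p>Plug it in.</p></body></topic>"

def parse(text):
    # first step: turn the byte stream into an XML tree exactly once
    return ET.fromstring(text)

def to_html(topic):
    # distribution step: map the known data model to HTML
    title = topic.findtext("title")
    paras = "".join(f"<p>{p.text}</p>" for p in topic.findall("body/p"))
    return f"<html><h1>{title}</h1>{paras}</html>"

html = to_html(parse(TOPIC))
print(html)
```

because the step addresses `title` and `body/p` semantically, the same step works for every document conforming to the model, no matter how the bytes are arranged.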
if we look at the second term "SOA" within the mentioned book title we have to distinguish:
- pipelining to orchestrate dedicated services (macro pipelining)
because each business process can be expressed using the pipeline paradigm, it is natural to implement SOA orchestration using the pipeline pattern.
therefore you have to define the sequence / control flow of services and the corresponding message transformations.
languages like BPEL provide a model to express this kind of pipeline.
this layer often requires persistence of the pipeline state because the execution of such processes can take anywhere between hours and years.
this layer often requires human steps, meaning the complete pipeline cannot be executed by a machine without human interaction. languages like BPEL4People are extensions covering this standard requirement.
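the two characteristics of this layer, persistent state and human tasks, can be sketched as a toy Python state machine (not BPEL; all step names are made up): the state is serialized between steps and execution suspends at a human task:

```python
import json

# a macro pipeline sketch: the state is persisted after every step so
# the process can be suspended (for hours or years) and resumed later
STEPS = ["receive_order", "approve_order", "ship_order"]
HUMAN_STEPS = {"approve_order"}      # steps requiring human interaction

def run(state):
    while state["step"] < len(STEPS):
        step = STEPS[state["step"]]
        if step in HUMAN_STEPS and not state.get("approved"):
            return state             # suspend: wait for a human decision
        state["log"].append(step)
        state["step"] += 1
    return state

state = {"step": 0, "log": []}
state = run(state)                   # stops before the human step
saved = json.dumps(state)            # persist while waiting

state = json.loads(saved)
state["approved"] = True             # the human interaction happened
state = run(state)
print(state["log"])  # -> ['receive_order', 'approve_order', 'ship_order']
```

real engines add compensation, correlation and fault handling on top, which is exactly where the complexity mentioned below comes from.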
there are many frameworks out there trying to provide an easy-to-start infrastructure. nevertheless, the usage and complexity of current implementations must still not be underestimated.
- pipelining to solve one or more dedicated business steps (micro pipelining)
within each business process a given amount of data conforming to specification A must be transformed into data conforming to specification B.
e.g. extracting order data from an ERP system and adding sums for particular product groups before the result is rendered to HTML for later display to the person in charge.
those operations can of course also be defined as a sequence of steps in which the input data is transformed into output data step by step. those definitions are mainly derived from business rules.
this layer does not require human interaction or persistence and can therefore be implemented on fully automated frameworks. using XML as the data backbone results in "xml pipelining", which combines most of the advantages required for "micro pipelining".
languages like xproc, xpl, ... and their corresponding implementations can be used in this area.
in general a micro pipeline transforms one input of a macro pipeline step into the corresponding output(s).
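the order example above can be sketched as a micro pipeline in Python (standard library only; element and attribute names are assumptions, not a real ERP schema):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# hypothetical ERP export (data conforming to "specification A")
ORDERS = """<orders>
  <order group="printers" amount="120.0"/>
  <order group="printers" amount="80.0"/>
  <order group="scanners" amount="50.0"/>
</orders>"""

def parse(text):
    # step 1: byte stream -> XML tree
    return ET.fromstring(text)

def sum_per_group(tree):
    # step 2: aggregate the amounts per product group
    sums = defaultdict(float)
    for order in tree.findall("order"):
        sums[order.get("group")] += float(order.get("amount"))
    return sums

def render_html(sums):
    # step 3: render the result ("specification B") for display
    rows = "".join(f"<tr><td>{g}</td><td>{v}</td></tr>"
                   for g, v in sorted(sums.items()))
    return f"<table>{rows}</table>"

html = render_html(sum_per_group(parse(ORDERS)))
print(html)
```

each step is a pure input-to-output transformation, so the same chain could equally be expressed declaratively in xproc or xslt.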
pipelining is one of the most powerful paradigms we have for today's common IT problems. but this pattern is neither new nor magic; it's more of a "back to the roots...".
Sunday, January 18, 2009
Thursday, January 15, 2009
Wednesday, January 14, 2009
- from application to solution
=> the customer is able to configure a solution based on provided services and the corresponding orchestration and configuration
- from static deployment to dynamic deployment
=> the customer is able to update a purchased component via an online connection. new solution features can be added, and the configuration can be changed on demand.
- from function oriented usage to process oriented usage
=> the application functions are embedded in business tasks reflecting the business processes of different user groups. different user groups are therefore faced with different application behavior.
this in particular means that the information must be designed to provide
- size on demand
each information product can be configured to fit one particular product installation. the product configuration can change over time
- update on demand
information updates can be provided as fast as possible using "online channels". on the other hand, content must also be available when no "online channel" is available.
- workflow related content mapping
information must not only map to a particular function of the product, as is already done with e.g. context-sensitive online help. the information must be mapped and aligned with the workflow / business process in which the production customer intends to use the product
if we look into the information deployment process (the other parts of the information lifecycle will be covered in subsequent blog posts), one of the most interesting answers to this question is the usage of RIA frameworks for that purpose.
the most promising application for information deployment is "Adobe AIR".
- most of the required features are part of the design focus or already implemented
- Adobe has information products in mind (using RoboHelp for creating online help for AIR).
- the DITA user community has started development of a corresponding plugin to create Adobe AIR help from content created using the DITA architecture (see http://tech.groups.yahoo.com/group/dita-users/message/12821)
the next step would be a gap analysis: which features are missing in Adobe AIR, and which business goals are therefore not fulfilled on the currently available platform. i will do this during the next few weeks... let's see what the results are.
Tuesday, January 13, 2009
one domain often faced with this term is the world of xml and related "standards". lots of them are out there, some of them really stable, useful and even interoperable (e.g. xml 1.0, xslt 1.0, xpath 1.0).
by the way, just using xml does not guarantee interoperability for your data. that is only achieved if application behavior is addressed by a related standard. xml related standards that try to achieve this (e.g. svg) often fail, or they are difficult to use because they miss essential features the specific user domain requires, and the corresponding tool vendors / application providers add them in a tool specific way. or the standard is too complex to implement a 100% compliant application (e.g. xlink).
DITA for example, a new OASIS standard / information architecture mainly covering techdoc related features, faces those issues more and more. this standard has customization in mind, meaning specialization to specific needs is part of the design, but there are of course still limitations, and there are good reasons for those limitations in general.
the initial standard was not feature complete (meaning essential requirements were missing from the user's point of view), and therefore vendors / consultants / end users added specific non-compliant features for their specific needs, which often results in missing the goal of interoperability.
why is DITA still successful?
to understand this you have to consider two things:
- keep in mind that just using xml does not solve your interoperability goals without any additional effort
- keep in mind that fully interoperable data is not always what you need. regular business cases often work well with an interoperable subset or predefined transformations on demand.
and that is the key insight if you think about organization specific information models.
Sunday, January 11, 2009
"Freebase is an open, shared database that contains structured information on millions of topics in hundreds of categories. This information is compiled from open datasets like Wikipedia, MusicBrainz, the Securities and Exchange Commission, and the CIA World Fact Book, as well as contributions from our user community."
this means that applications like freebase use already existing information and try to add additional semantics to it by combining and extracting information and context, or in this case let users add additional semantics without modifying the source of the information.
the tool thinkbase uses freebase to provide a visual graph of information and the corresponding link dependencies.
- MailMark uses an xml database (Mark Logic) as the backbone, building the application just on XQuery
here the semantics come from information aggregation and combination. in this application no additional user interaction is possible
Saturday, January 10, 2009
on the other hand more and more companies start creating certain types of information (user documentation, online help, service information) using a more semantically rich information architecture as provided by dita.
that opens up success for databases with a native xml read / maintain and search feature set. they are able to provide additional value to already existing information created without knowledge of its future use.
a good summary of technologies in this area is provided by Kurt Cagle: "Analysis 2009: XForms and XML-enabled clients gain traction with XQuery databases"
based on my personal experience most wiki projects seen in reality fail silently, meaning they start with more or less enthusiasm but end up as either
- content silos with outdated, hard-to-find information chunks
- an unused part of the company's intranet / IT infrastructure
- projects driven by only a handful of contributors and users
by the way, there are wiki projects out there (internet -> wikipedia, intranet) which are successful.
what makes them successful?
in my personal point of view, each successful "information process" requires at least
- definition of a common information lifecycle
- who has to create which kind of information?
- which criteria must be fulfilled to define an information object as usable?
- which kind of subject matter expert must be involved for which kind of information?
- and a common information taxonomy
- what kind of information must be maintained
- what kind of common classification do we use
- best practices for structuring the information
- and people who create, maintain and use the information
- training is required
- the advantages and usage of the information must be part of a common understanding => people must see a personal benefit in using and maintaining the information
the most successful wiki project, Wikipedia, provides all the mentioned guidelines in an open and collaborative way (http://en.wikipedia.org/wiki/Wikipedia:About#Contributing_to_Wikipedia)
one thing that does not work is to set up a wiki platform and post a link to all potential users without any additional hard work.
always remember: providing information is nothing more but also nothing less than hard work. the more value an information product must provide, the more hard work is required to create it.
Tuesday, January 06, 2009
the list shows two things:
- there are lots of services out there, many of them usable free of charge
- the stability of usage is a huge problem
- a few listed services have moved or been removed completely
- a few service definitions changed often without providing sufficient version management
the main reason for that success is the corresponding visibility and, based on that, the opportunity to get budget. the main characteristic of such terms is that there is no formal definition of what the essence / definition of the term really is, but on the other hand everybody seems to have a clear and complete understanding and definition of the term / buzzword.
the second characteristic of such terms is that a common trend is associated with them.
and last but not least the life cycles of such trends are pretty similar: approx. half a year until everybody is aware of it (through publications, blog posts, articles), one year of highest awareness incl. associated investments, and in the end the trend is replaced by the next one.
that looks pretty similar to the fashion industry, and in my point of view there are not too many structural differences between a new fashion trend and an IT trend.
just a small list of buzzwords from the last few years:
- SOA (Service Oriented Architecture)
- xxx 2.0 (esp. Web 2.0)
- Semantic Web
- SaaS (Software as a Service)
- PaaS (Platform as a Service)
- Cloud computing
- consistent access to required information at the right time at the right place
- get rid of increasing IT complexity
- get rid of proprietary vendor driven information silos
- reduce Total cost of ownership for hosting the available information within a company
- improve collaboration between different business groups
- improve adaptability to changing business requirements
- usage of dedicated and well defined services for business automation
- pay for usage of a defined service level instead of paying for hardware / software and the corresponding maintenance (what really matters is the service that automates a certain business step)
- an architecture that adapts fast and in a controlled way to changed business requirements (changed SLAs) and not to changed IT requirements
Sunday, January 04, 2009
this list can help
Saturday, January 03, 2009
Why should one deal with pipeline languages, in particular XML based pipeline languages, in the field of technical documentation?
Two theses as justification
Thesis 1 – benefit of information
The benefit of information units increases with the number of processes that are applied to them.
If a company creates and delivers paper manuals in only one language for very different product groups, the information they contain is relatively easy to create and manage, but its benefit for the company is very small. The importance and value of the information grows with every additional user of the information (an additional online help, language variants, product variants, usage of the information in the product interface, ...).
To maximize the benefit, the number of users of a piece of information must therefore be maximized within the constraints of the specific use case. Every usage of information is based on establishing a process for that usage (creating an instructional text in German, reusing dedicated information modules of one language, creating a variant within an existing information module, publishing an online help, ...). Since every process increases the complexity of the overall process, the effort across the overall information lifecycle grows with every additional process, i.e. with every additional usage of the information.
Thesis 2 – processes on information
Processes on information units are for the most part identical within a company and even across companies. Conversely, this means that the technical documentation industry is dealing with the consequences of "marginal" differences. The differences essentially lie in different information sources (type, storage, format, ...) and in the information products to be delivered (company specific style guides, formats to be delivered, ...).
The multitude of individual and specific processes is largely due to the missing decomposition of the processes and the missing cross-company standardization of process components.
The information components needed for each customer document must be identified and provided based on variable input parameters and finally assembled. The customer document is then enriched, i.e. it receives one or more indexes with defined requirements, a glossary, a TOC, and so on. Finally it is transformed into HTML, PDF or other formats. Decomposing these partial steps further leads, for each of them, to a large portion of identical steps and a small number of specific ones.
Key to success
To maximize the benefit of one's information sustainably, the processes must be consistently decomposed into their atomic components, thereby maximizing the reuse of existing process components (and the underlying knowledge about them). This way, the effort and complexity of using information units can be kept small in relation to the benefit.
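The decomposition described above can be sketched as a chain of mostly generic, reusable steps with a single specific one (a toy Python illustration; all names and data are made up):

```python
# generic, reusable steps shared across customers
def collect(components, params):
    # identify the components matching the variable input parameters
    return [c for c in components if c["lang"] == params["lang"]]

def assemble(selected):
    # build the customer document from the selected components
    return {"body": [c["text"] for c in selected]}

def add_toc(doc):
    # enrichment step: derive a table of contents from the body
    doc["toc"] = [t.split(".")[0] for t in doc["body"]]
    return doc

# the only customer specific step: the delivered output format
def publish_text(doc):
    return "\n".join(doc["toc"]) + "\n---\n" + "\n".join(doc["body"])

components = [
    {"lang": "en", "text": "Safety. Read this first."},
    {"lang": "de", "text": "Sicherheit. Zuerst lesen."},
]
manual = publish_text(add_toc(assemble(collect(components, {"lang": "en"}))))
print(manual)
```

Only the final publishing step differs between customers; collection, assembly and enrichment are identical process components that can be standardized and reused.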
"SMILA (SeMantic Information Logistics Architecture) is an extensible framework
for building search solutions to access unstructured information in the enterprise.
Besides providing essential infrastructure components and services, SMILA also delivers
ready-to-use add-on components, like connectors to most relevant data sources."
initiated by the German company empolis, this project seems promising in solving a common problem when dealing with today's information overflow:
- identification of and access to information relevant for a given business task / process
- integration of "unstructured" information into the corresponding business process