Monday, April 23, 2012

Open Source: data management and transformation library

Stumbled upon the following post: http://flowingdata.com/2012/04/23/miso-an-open-source-toolkit-for-data-visualisation/

Relational data (i.e. data that can be stored in a table or matrix) is "old style" but still a common use case in today's web applications.

A new JavaScript library, the "Miso Project", is starting to implement components that simplify the management and transformation of this kind of data (and will be extended to cover visualization use cases). This means you can easily manage relational data on the client side, which can be very handy in certain scenarios. Essentially, it is like a client-side database with a corresponding query syntax.

One of the most common patterns we've found while building JavaScript-based interactive content is the need to handle a variety of data sources such as JSON files, CSVs, remote APIs and Google Spreadsheets. Dataset simplifies this part of the process by providing a set of powerful tools to import those sources and work with the data. Once data is in a Dataset, it becomes simple to select, group, and calculate properties of, the data. Additionally, Dataset makes it easy to work with real-time and changing data, which pose one of the more complex challenges to data visualization work.
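
To give an impression of what that looks like in code, here is a minimal sketch along the lines of the Dataset documentation; the file data/sales.csv and the column amount are made-up examples, so treat the details as an assumption rather than a verbatim recipe:

var ds = new Miso.Dataset({
  url : "data/sales.csv",   // made-up example file
  delimiter : ","           // parse the response as CSV
});

ds.fetch({
  success : function() {
    // "this" is the dataset once the data has been loaded and parsed
    console.log("loaded " + this.length + " rows");

    // iterate over the parsed rows, like a client-side table scan
    this.each(function(row) {
      if (row.amount > 1000) {
        console.log("big deal: " + row.amount);
      }
    });

    // simple aggregate over a column
    console.log("max amount: " + this.max("amount"));
  },
  error : function() {
    console.log("could not load the data");
  }
});

The same fetch/callback pattern applies to the other importers mentioned above (JSON files, remote APIs, Google Spreadsheets), so switching the data source does not change how you query the data afterwards.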

If you have to develop, e.g., a simple, standalone HTML application without a permanent server backend, this library will help you without adding too much complexity to your (implementation) infrastructure.

Sunday, April 08, 2012

QR Codes in your documents

QR codes are one method to ease the exchange of information between classical media and mobile devices (a common usage is a direct link to the corresponding web page in paper-based catalogs, manuals, ...).

But how do you create those QR codes without too much complexity?

The Google Chart API provides:
  • a chart wizard to create QR codes and the corresponding styling
  • an infographics API that creates static images based on a posted chart definition (URL)
Voilà: you now have an easy-to-use backend for creating QR codes from, e.g., XSL-T.
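
For example, the following request returns a 150x150 pixel PNG containing a QR code that points to http://example.com (the target is just a placeholder): cht selects the chart type, chs the image size, and chl carries the URL-encoded payload.

https://chart.googleapis.com/chart?cht=qr&chs=150x150&chl=http%3A%2F%2Fexample.com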

The following code is taken from "QR Codes in DITA Output", which shows how to create QR codes for PDF output of the DITA-OT using XSL-FO:

<!-- Insert a QR code for every xref marked with outputclass="qrcode" -->
<xsl:template match="*[contains(@class,' topic/xref ')]
                      [contains(@outputclass, 'qrcode')]">
  <fo:external-graphic>
    <!-- the xref target text becomes the chl payload of the Chart API call -->
    <xsl:attribute name="src">
      <xsl:value-of select="concat('https://chart.googleapis.com/chart?cht=qr&amp;chs=100x100&amp;chl=', .)"/>
    </xsl:attribute>
  </fo:external-graphic>
</xsl:template>
See: http://ditanauts.org/2012/03/14/qr-codes-in-dita-ouput/

Sample, generated with the request http://chart.apis.google.com/chart?chs=200x100&cht=qr&chl=http%3A%2F%2Ftrent-intovalue.blogspot.de%2F2006%2F05%2Ftrent-definition.html:

[QR Code Sample image]


How to preserve the value of big data over time....

Today the buzzword "big data" is getting more and more popular. It is a nice label for a common observation: "the amount of data, and its valuable usage, are becoming important".

There are countless offerings and products out there which promise to help you store and analyze that data. But one of the major issues with data is not its current usage; it is the maintenance of the information over time.

The "Web of Data" is one common example. It is the biggest data store we currently faced with. Pretty simple to access and analyze. So far so good. But there is one maintenance of this data (required?). Collect 100 links to resources on the web today. than 24 month later try access them...how many of those links still work, and if they work the resulting information still using the same semantic as it was once you build up the link?

The "Web of Data" currently decided not to maintain data just provide them now, enrich them and just replace them with different semantic...The Web Wayback machine (http://archive.org/web/web.php) is an approach to help individual users to keep their individual value of data for some scenarios.

Now think about the corporate information you are collecting right now. The speed and adoption rate of this data will increase, and new demands to enrich it will appear. Have you ever thought about how to ensure that all that data can be adapted to new needs? Based on my personal experience, more than 60% of the overall costs of IT projects dealing with information in a certain domain of an organization are related to data migration: adapting data to the new tools that maintain it, converting data between different data models and formats, and ensuring the quality of the data and its usage in existing business processes.

What does this mean for each IT project dealing with data?

  • Initial load is important
    You always have to define how to get the data you need for the initial start (not only during the regular operation of your business process) and how to verify that this data is valid for your future needs.
  • Expandability of your data might be important
    You can stick with static data models and tools (e.g. classical relational data models), or choose more flexible approaches like typed graphs of data, where content following different models can coexist more easily.
  • Adaptability of your IT systems might be important
    What happens to your existing data once the model is extended or changed? Do not only take care of the data itself; also take the relations to the data into account. Today you may only access a specific level of your data; a few years later, some use case may require access to an individual step, or may introduce an additional level that does not exist yet.
  • Ensure the maintenance of your data
    Do not "use" any data that has no value in your primary business process. The usage of information requires the correctness of data, and your data will never be correct if the process creating it does not derive any value from the data itself; such data will simply end up partially incorrect and incomplete.
It is, and will remain, the most expensive IT task in your organization: "how to preserve the value of big data over time...."