re: Amazon Redshift Leaves On-Premises Opening, Says ParAccel
Amazon finally responded to my questions today, but no interviews are possible until the release. Here's what AWS had to say about loading Redshift & where to handle the BI:
What are the options for getting data up into Redshift, and what latency is introduced?
For data that is already in AWS, we offer direct, parallel loading from Amazon S3 and Amazon DynamoDB. We also enable easy integration with other data sources within AWS by way of the AWS Data Pipeline.
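(My own illustration, not part of Amazon's reply.) The parallel load from S3 is done with Redshift's COPY command, which reads every object under a common key prefix. Below is a rough Python sketch that only builds the SQL text. The table name, bucket, and role ARN are made-up placeholders, and the IAM_ROLE authorization clause is my assumption about how you would authorize the load today; Amazon's answer does not mention it.

```python
def build_copy_sql(table, s3_prefix, iam_role_arn, delimiter="|"):
    """Return a Redshift COPY statement that loads all objects under s3_prefix.

    `table` is interpolated as-is, so it must be a trusted identifier.
    """
    def lit(value):
        # Render a SQL string literal, doubling any embedded single quotes.
        return "'" + value.replace("'", "''") + "'"

    return (
        f"COPY {table} "
        f"FROM {lit(s3_prefix)} "
        f"IAM_ROLE {lit(iam_role_arn)} "
        f"DELIMITER {lit(delimiter)};"
    )


if __name__ == "__main__":
    # Hypothetical table, bucket, and role; substitute your own.
    print(build_copy_sql(
        "sales",
        "s3://example-bucket/sales/part-",
        "arn:aws:iam::123456789012:role/ExampleRedshiftLoad",
    ))
```

Splitting the input into several files under one prefix is what lets the cluster load them in parallel rather than through a single stream.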
For data that is on premise, Amazon S3 is a great option to get data to Amazon Redshift. Choices here include pushing multiple files in parallel to Amazon S3 across a network, doing so over an AWS Direct Connect link to ensure dedicated bandwidth, using AWS Storage Gateway, or using AWS Import/Export, where you ship drives to AWS, removing bandwidth considerations. S3 has a high network cross-section to absorb input traffic, so latency really turns into a question of how many threads can be used to push data. Generally, this can be set up to saturate on network bandwidth out of the source node and, as such, should not require meaningful additional time relative to moving data within an internal network.
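(Again my own sketch, not Amazon's.) The "how many threads" point can be pictured as a plain thread pool that pushes files concurrently. The function names here are hypothetical, and `fake_upload` is a stand-in that only returns the destination key; in practice you would swap in a real S3 upload call from whatever SDK you use.

```python
from concurrent.futures import ThreadPoolExecutor


def push_in_parallel(paths, upload_one, threads=8):
    """Call upload_one(path) for every path using a pool of worker threads.

    Results come back in the same order as `paths`; an exception raised by
    any upload is re-raised here.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(upload_one, paths))


def fake_upload(path):
    # Stand-in for a real S3 PUT: just report where the file would land.
    return f"s3://example-bucket/{path}"


if __name__ == "__main__":
    files = ["part-0000.csv", "part-0001.csv", "part-0002.csv"]
    for dest in push_in_parallel(files, fake_upload, threads=4):
        print(dest)
```

Raising `threads` helps only until the outbound link on the source machine is full, which is the saturation point the answer above describes.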
AWS also has numerous partners who can help customers with their on-premise to AWS data movement strategy and Amazon Redshift will be developing integrations with leading ETL vendors to make this process even simpler for customers than it is today.
What are the options for where and how to do analysis (BI) against Redshift?
Customers can use standard PostgreSQL drivers over ODBC/JDBC connections to connect their existing SQL-based BI tools to Amazon Redshift. MicroStrategy and Jaspersoft have already certified Amazon Redshift.
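(My illustration, not Amazon's.) In practice "standard PostgreSQL drivers" means a BI tool is pointed at the cluster with an ordinary PostgreSQL-style connection string. A small sketch of what those settings might look like follows; the hostname, database, and user are invented, and 5439 is the default Redshift port as far as I know, so check your own cluster's endpoint.

```python
def redshift_jdbc_url(host, database, port=5439):
    """Build a JDBC URL in the form the PostgreSQL JDBC driver expects."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def redshift_libpq_dsn(host, database, user, port=5439):
    """Build a keyword/value DSN as used by libpq-based PostgreSQL clients."""
    return f"host={host} port={port} dbname={database} user={user}"


if __name__ == "__main__":
    # Hypothetical cluster endpoint and credentials.
    host = "example-cluster.abc123.us-east-1.redshift.amazonaws.com"
    print(redshift_jdbc_url(host, "analytics"))
    print(redshift_libpq_dsn(host, "analytics", "bi_reader"))
```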
Customers may choose to run their BI software in the cloud or on premise based on their preferences. Performance is very dependent on the specifics of queries, rendering, data set, result set cache sharing, network traffic and concurrency, and is likely to vary from one customer to another. While network traffic between the BI node and Amazon Redshift may sometimes be a factor arguing for colocation in the cloud, there is little data to suggest that this will dominate other considerations. For example, in our booth at re:Invent, MicroStrategy demonstrated integration from Amazon Redshift in the AWS cloud to MicroStrategy in their own private cloud to a tablet in our booth with very fast response times throughout.