Big Data: "One Size Does Not Fit All!"

This blog post elaborates on the three Big Data solutions from SAS®. d-Wise Technologies can guide you through an assessment process to determine whether Big Data is right for you and, if it is, use our SAS implementation expertise to get you up and running quickly.

In the context of SAS technologies, the term "big data" means one of two things: handling large volumes of data and performing statistical analysis on them in near real time, or distributing a large processing problem across multiple servers. SAS offers a grid solution for the latter and two solutions for the former: In-Database and In-Memory. Here is a brief description of each to help you decide which solution is best for your organization.

The In-Database solution hands off computing work to the underlying DBMS, which returns summarized results. SAS provides direct in-database support for about half a dozen Base SAS procedures. In addition, the solution supports custom user-defined functions (UDFs) for scoring acceleration and statistical procedures. In short, SAS translates the core work of a procedure into complex SQL queries, the DBMS executes them, and the result sets return to SAS for further processing.

Examples of external database management systems supported by the SAS In-Database solution are Teradata (http://teradata.com), Greenplum (http://www.greenplum.com), Exadata (http://www.oracle.com/us/products/database/exadata/overview/index.html), and Netezza (http://www-01.ibm.com/software/data/netezza).
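To make the pushdown idea concrete, here is a minimal Python sketch that uses SQLite (from the standard library) as a stand-in for an external DBMS such as Teradata. It is not how SAS generates its SQL; the table `visits`, its columns, and the function names are hypothetical. Both functions compute the same per-site count and mean, but the in-database version sends a single GROUP BY query and receives only one row per site instead of every detail row.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database standing in for the DBMS (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (site TEXT, dose REAL)")
    rows = [("A", 10.0), ("A", 20.0), ("B", 5.0), ("B", 15.0), ("B", 25.0)]
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    return conn


def summarize_client_side(conn: sqlite3.Connection) -> dict:
    """No pushdown: pull every detail row to the client and aggregate there."""
    totals = {}
    for site, dose in conn.execute("SELECT site, dose FROM visits"):
        n, s = totals.get(site, (0, 0.0))
        totals[site] = (n + 1, s + dose)
    return {site: (n, s / n) for site, (n, s) in sorted(totals.items())}


def summarize_in_database(conn: sqlite3.Connection) -> dict:
    """Pushdown: the DBMS does the aggregation; only summary rows come back."""
    query = "SELECT site, COUNT(*), AVG(dose) FROM visits GROUP BY site ORDER BY site"
    return {site: (n, mean) for site, n, mean in conn.execute(query)}


if __name__ == "__main__":
    conn = build_demo_db()
    print(summarize_in_database(conn))
    print(summarize_client_side(conn))
```

With millions of rows, the difference is the volume of data moved from the database to the client, which is the cost the In-Database approach is designed to avoid.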

The In-Memory solution performs work in the RAM of a multi-core, multi-processor compute server. In particular, SAS Visual Analytics Explorer (VAE) and SAS High-Performance Analytics (HPA) use Hadoop as high-performance file storage from which large data sets are loaded into a SAS LASR server running on blade servers. Under this topology, performance scales nearly linearly with the available memory and CPU resources. Although this solution currently requires special hardware, in the near future it will run on standard multi-core SMP PCs.
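Near-linear scaling follows from each node working only on the slice of data held in its own memory and returning a small partial result. The sketch below simulates that pattern sequentially in Python; it does not use SAS LASR or Hadoop APIs, and the round-robin partitioning, the `Partial` type, and the function names are hypothetical. Assuming the data divides evenly, doubling the number of workers halves the slice each one scans.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Partial:
    """Partial aggregate computed by one worker over its in-memory slice."""
    count: int
    total: float


def partition(data: Sequence[float], n_workers: int) -> List[List[float]]:
    """Distribute rows round-robin across n_workers (hypothetical scheme)."""
    slices: List[List[float]] = [[] for _ in range(n_workers)]
    for i, value in enumerate(data):
        slices[i % n_workers].append(value)
    return slices


def worker_aggregate(slice_: List[float]) -> Partial:
    """Each worker scans only its own slice, which stays resident in RAM."""
    return Partial(count=len(slice_), total=sum(slice_))


def merge(partials: List[Partial]) -> float:
    """The coordinator combines the small partial results into a global mean."""
    count = sum(p.count for p in partials)
    total = sum(p.total for p in partials)
    return total / count


if __name__ == "__main__":
    data = [float(x) for x in range(1, 101)]
    slices = partition(data, n_workers=4)
    # On real hardware each call would run in parallel on a separate node.
    partials = [worker_aggregate(s) for s in slices]
    print(merge(partials))
```

Only the `Partial` records travel back to the coordinator, so the merge step stays cheap regardless of how many rows each node holds.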

The Grid solution distributes the computing workload of SAS/BI products across multiple servers using SAS/tkGrid. It uses a shared disk store as the central repository that all grid nodes access, and for truly large data this store can become a bottleneck. For CPU-bound SAS jobs, however, the solution is ideal: it provides load balancing and failover, and it scales easily because all you do is add another node.
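The load-balancing and failover behavior described above can be illustrated with a toy dispatcher. This is not the SAS/tkGrid scheduling algorithm; the least-loaded placement policy, the integer job costs, and the `Grid` class are assumptions made for illustration. It shows why scaling out is simple (a new node just joins the pool) and how jobs from a failed node are redistributed. It deliberately leaves out the shared disk store, which is the component that can become the bottleneck for very large data.

```python
from typing import Dict, List, Tuple


class Grid:
    """Toy dispatcher (hypothetical): assigns each job to the least-loaded live node."""

    def __init__(self, nodes: List[str]) -> None:
        # node name -> list of (job name, cost) currently assigned to it
        self.assignments: Dict[str, List[Tuple[str, int]]] = {n: [] for n in nodes}

    def load(self, node: str) -> int:
        """Total cost of the jobs assigned to a node."""
        return sum(cost for _, cost in self.assignments[node])

    def add_node(self, node: str) -> None:
        """Scaling out: a new node simply joins the pool."""
        self.assignments.setdefault(node, [])

    def submit(self, job: str, cost: int) -> str:
        """Place the job on the node with the smallest load (ties broken by name)."""
        node = min(self.assignments, key=lambda n: (self.load(n), n))
        self.assignments[node].append((job, cost))
        return node

    def fail(self, node: str) -> None:
        """Failover: remove the node and resubmit its jobs to the surviving nodes."""
        orphaned = self.assignments.pop(node)
        for job, cost in orphaned:
            self.submit(job, cost)


if __name__ == "__main__":
    grid = Grid(["node1", "node2"])
    for i, cost in enumerate([5, 3, 2, 4]):
        print(f"job{i} ->", grid.submit(f"job{i}", cost))
    grid.fail("node1")
    grid.add_node("node3")
    print("job4 ->", grid.submit("job4", 1))
```

A production scheduler would also track node health and job state, but the two properties the paragraph highlights, balancing across nodes and surviving the loss of one, are both visible here.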

Are you ready for Big Data?

Find out more about d-Wise’s Big Data Readiness Assessment, the latest in our suite of strategic assessment and analysis services. Much of our approach comes from performing similar data-oriented assessments in healthcare and drug development, specifically in the highly regulated clinical trial process, where we work with highly sensitive and proprietary data.
