I need to do a large amount of data analysis on a database. Could anyone recommend an interactive application for data analysis?
The requirements are:
1. Able to handle unexpected requirements quickly.
2. Able to perform further computations on results interactively.
3. Makes even a large number of complex computations easy to handle.
What would you experts recommend?
Thanks in advance.
MATLAB, Wolfram, and other math packages will be suitable for this. There are also such things as SAS and SPSS (now owned by IBM, I think). These are all very expensive, but what you are asking for is not simple stuff. I deal with real-time data analytics using many engineering techniques, including things like Kalman filters, statistical sampling, Monte Carlo methods, etc., in order to extract meaningful information from masses of data (millions of data points per day). We are in the process of working up machine learning techniques (I'm starting to learn the R language for that), including neural networks, fuzzy logic, and simulated annealing methods, in order to better understand how our systems behave (supporting millions of concurrent users world-wide). So, what I am saying is that there is no quick path to this. Complexity != simple, or easy! Also, if you don't understand the underlying math, you will never be able to know whether you are getting valid information out of your data sets.
FWIW, I spent a few years writing real-time risk analysis software for the options trading industry in Chicago, and that (lots of 3rd order differential equations applied to massive data flows) was simple compared to what I am doing now!
Thank you for your reply. You have given me a lot of information :-), and I think you must be a professional in large-scale data analysis. I'm glad to meet an expert.
I know a little about SPSS and SAS. They may be too expensive for me and too difficult to understand.
BTW, they seem better suited to "tell me what happened" than to "let me find the reason"; I mean they hide many details. (That is my guess, and it may not be correct.)
Some of my computation goals are:
a. Select the 10 best-selling categories.
b. As a further computation based on the result of a., select the top 20 products from each category.
c. As a further comparison with last year, based on the result of a., select the categories that newly appeared on this year's list and the ones that disappeared from it.
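The three chained goals above can be sketched as follows. This is only a minimal illustration in plain Python, which the thread itself does not name as a candidate tool; the row layout `(category, product, year, value)`, the sample figures, and the function names are all hypothetical, and the list sizes are shrunk from 10 and 20 so the sample stays readable.

```python
from collections import defaultdict

# Hypothetical sample rows: (category, product, year, sales value).
SALES = [
    ("toys", "robot", 2023, 50), ("toys", "kite", 2023, 30),
    ("books", "novel", 2023, 60), ("books", "atlas", 2023, 10),
    ("games", "chess", 2023, 20),
    ("toys", "robot", 2024, 40), ("toys", "kite", 2024, 70),
    ("books", "novel", 2024, 15),
    ("garden", "spade", 2024, 90), ("garden", "hose", 2024, 25),
    ("games", "chess", 2024, 5),
]


def top_categories(rows, year, n):
    """Goal a: the n categories with the highest total sales in a year."""
    totals = defaultdict(float)
    for category, _product, row_year, value in rows:
        if row_year == year:
            totals[category] += value
    # Highest total first; ties are broken by name so the result is stable.
    return sorted(totals, key=lambda c: (-totals[c], c))[:n]


def top_products(rows, year, categories, k):
    """Goal b: for each category from goal a, its k best-selling products."""
    totals = defaultdict(lambda: defaultdict(float))
    for category, product, row_year, value in rows:
        if row_year == year and category in categories:
            totals[category][product] += value
    return {
        c: sorted(totals[c], key=lambda p: (-totals[c][p], p))[:k]
        for c in categories
    }


def compare_lists(this_year, last_year):
    """Goal c: categories new to this year's list, and ones that dropped off."""
    new = sorted(set(this_year) - set(last_year))
    gone = sorted(set(last_year) - set(this_year))
    return new, gone


best_2024 = top_categories(SALES, 2024, 2)
best_2023 = top_categories(SALES, 2023, 2)
print(best_2024)                                # ['garden', 'toys']
print(top_products(SALES, 2024, best_2024, 1))  # {'garden': ['spade'], 'toys': ['kite']}
print(compare_lists(best_2024, best_2023))      # (['garden'], ['books'])
```

Each step takes the previous step's result as an ordinary value, which is what makes "further computation on results" straightforward in an interactive session.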
Could you narrow down the range of tools based on the examples above, to ones that are not too expensive?
BTW, I know Excel well and a little VBA, but Excel is not a good choice because my data is updated daily, while Excel is better suited to fixed data.
All my goals are commercial and functional. There will probably be no mathematics such as "linear regression".
I'm so grateful for your help.
I'm willing to pay for the right one, including SPSS if it is useful.
I know it's hard to make a suggestion or a choice without a concrete example to discuss.
So I will give a concrete example below (written with my friend's help, as it was somewhat difficult). Please suggest how to use SPSS or any other solution to solve the same problem (the more detail, the better).
For example, I need to find the products whose annual sales value ranks among the top 100 in every year.
MSSQL data structure (fields of the sales table): productID, time, value
The SQL solution is as below:
WITH sales1 AS (
    SELECT productID, YEAR(time) AS year, SUM(value) AS value1
    FROM sales
    GROUP BY productID, YEAR(time)
)
SELECT productID
FROM (
    SELECT productID
    FROM (
        SELECT productID,
               RANK() OVER (PARTITION BY year ORDER BY value1 DESC) rankorder
        FROM sales1
    ) T1
    WHERE rankorder <= 100
) T2
GROUP BY productID
HAVING COUNT(*) = (SELECT COUNT(DISTINCT year) FROM sales1)
Any suggestions, please?
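One possible way to explore the same question step by step outside the database is a short script. The sketch below, in plain Python, mirrors the three stages of the query above: yearly totals per product, a top-N cut within each year, and a final filter for products that pass in every year. It assumes the rows of the sales table have already been fetched into memory as `(productID, time, value)` tuples; the function name and the sample rows are hypothetical, and no particular database driver is implied.

```python
from collections import defaultdict
from datetime import date


def products_in_top_n_every_year(rows, n=100):
    """Return productIDs whose yearly sales total ranks in the top n every year."""
    # Stage 1 (the sales1 CTE): total sales per product and year.
    by_year = defaultdict(lambda: defaultdict(float))
    for product_id, time, value in rows:
        by_year[time.year][product_id] += value

    # Stage 2 (RANK() ... <= n): a product has rank <= n exactly when its
    # total is at least the n-th largest total of that year, so tied
    # products are kept together just as RANK() would keep them.
    hits = defaultdict(int)
    for totals in by_year.values():
        ordered = sorted(totals.values(), reverse=True)
        cutoff = ordered[min(n, len(ordered)) - 1]
        for product_id, total in totals.items():
            if total >= cutoff:
                hits[product_id] += 1

    # Stage 3 (HAVING COUNT(*) = number of distinct years).
    return sorted(p for p, count in hits.items() if count == len(by_year))


# Hypothetical sample data; n is shrunk to 2 to keep it readable.
SAMPLE = [
    ("A", date(2023, 3, 1), 100), ("B", date(2023, 5, 9), 80),
    ("C", date(2023, 7, 4), 10),
    ("A", date(2024, 1, 15), 30), ("A", date(2024, 6, 2), 20),
    ("B", date(2024, 2, 20), 40), ("C", date(2024, 8, 8), 90),
]

print(products_in_top_n_every_year(SAMPLE, n=2))  # ['A']
```

Because each stage leaves an ordinary dictionary behind, intermediate results such as the yearly totals can be inspected or reused for a follow-up question without rewriting the whole query.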