I need tech advice.
Dec. 20th, 2011 10:04 am

I have large CSVs I need to analyze. Some have over 100,000 lines, some have over 100 columns. So a spreadsheet is not going to cut it. The first rows are always column titles. I need to choose a database back end and preferably a good GUI front end (unless they come together, which I'm now very doubtful of). I'd spend $40 for something that gets the job done. I'd spend $80 for something that has everything I want with a good, intuitive UI and documentation. But free is better if it's not too much of a hassle.
I'm running a Windows laptop with 4GB of memory. I don't need client-server, multiple users, optimization for read-only access, or anything that web developers choose a database for. But I do need to crunch a lot of data without using more memory than I have. So it seems SQLite > PostgreSQL > MySQL. But I also need a facility to import the CSVs, and with those 100+ columns, I want a wizard that will create the table and use the column names in the first row. It seems there are only two SQLite tools that do that, except one has that particular functionality currently broken, and the other is very clunky and costs more than I want to pay for clunky.
LibreOffice Base's documentation regarding importing CSVs is also on the fritz, and as near as I can tell, the functionality itself went missing after a reorganization of the UI. But OpenOffice/LibreOffice (if it works) is a fourth option for the back end, I suppose.
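For what it's worth, the import wizard being asked for here is only a few lines of Python against the `sqlite3` module in the standard library. A minimal sketch (file names, table name, and the all-`TEXT` column typing are assumptions; it also assumes column names don't contain double quotes):

```python
import csv
import sqlite3

def import_csv(db_path, csv_path, table):
    """Create a SQLite table using the CSV's first row as column
    names, then bulk-insert the remaining rows."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row = column titles
        cols = ", ".join(f'"{c}" TEXT' for c in header)
        placeholders = ", ".join("?" * len(header))
        con = sqlite3.connect(db_path)
        con.execute(f'CREATE TABLE "{table}" ({cols})')
        # executemany consumes the rest of the reader lazily, so the
        # whole file never has to sit in memory at once.
        con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        con.commit()
        con.close()
```

Declaring every column as `TEXT` sidesteps type guessing; SQLite's dynamic typing still lets you `CAST` in queries later.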
Learn ...
Date: 2011-12-20 03:12 pm (UTC)

Re: Learn ...
Date: 2011-12-20 03:35 pm (UTC)

How much ...
Date: 2011-12-20 05:36 pm (UTC)

If you are mostly interested in focusing on the calculations themselves, then I would prefer using Python to express the calculation, rather than SQL to re-express the data.
Re: How much ...
Date: 2011-12-20 08:56 pm (UTC)

Someone at Google+ made a suggestion that looks good to me, but you're not the only person to suggest Python, either. If Python can handle several of these tables simultaneously in memory on my machine, then it might be worth learning a few lines of it. I know some SQL and wrote a proc that people kept trying to replace for years, except whatever they wrote ran more slowly. But I never used it much, and it's been many years since I had the month where it was 20% of my job.
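For scale, here is a sketch of what "handle several of these tables in memory" looks like in plain Python, standard library only (the file path and join key below are hypothetical):

```python
import csv

def load_table(path):
    """Read an entire CSV into memory as a list of dicts,
    keyed by the column names in the first row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def index_by(rows, key):
    """Index a loaded table by one column so another table can be
    joined against it with plain dict lookups."""
    return {row[key]: row for row in rows}
```

A 100,000-row, 100-column table of short text fields comes to roughly a few hundred MB as CPython dicts, so a couple of them should fit in 4GB; that is a rough estimate, not a measurement, so it's worth loading one table and checking before committing to this route.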
I'm good ...
Date: 2011-12-20 09:48 pm (UTC)

Python is a useful language to have in the toolbox, but if this is a 'production' job and not a learning opportunity, use what you are fluent in.
Re: I'm good ...
Date: 2011-12-20 10:02 pm (UTC)

This is data used once to determine some constants; it will not be part of the application. It is also provided by other people. I think keeping the CSVs as backup makes sense, because if I accidentally do something to my database, I still have the original data in its original condition. The database will hold both the original and the derived data.