ATOMS : csv_readwrite details

csv_readwrite

(2008/9294 downloads)
fast dedicated scilab functions to read and write csv files
Details
Version
0.4-1
Most recent version: 0.6.1
Author(s)
Allan CORNET
Michael Baudin
Entity
Scilab - DIGITEO
Package maintainers
Michael Baudin
Allan Cornet
Category
License
Supported Scilab Versions
>= 5.3
Creation Date
4th of May 2011
ATOMS packaging system
Available on
How To Install
atomsInstall('csv_readwrite')
Description
Purpose
-------

The purpose of this module is to read and write Comma Separated Values (CSV) data files. The goal of this toolbox is to improve the flexibility, consistency and speed of CSV reading and writing with respect to Scilab built-in write_csv and read_csv functions. On some large data files, we observed a 100x improvement of the speed.

Features
--------

 * csv_default : Get or set defaults for csv files.
 * csv_getToolboxPath : Returns the path to the current module.
 * csv_read : Read comma-separated value file
 * csv_stringtodouble : Convert a matrix of strings to a matrix of doubles.
 * csv_textscan : Read comma-separated value in a matrix of strings
 * csv_write : Write comma-separated value file

To compare speed:

with optimized functions:

    stacksize('max');
    M = ones(1000, 1000);
    tic();
    csv_write(M, TMPDIR + "/csv_write_1.csv");
    toc()
    tic();
    r = csv_read(TMPDIR + "/csv_write_1.csv")
    toc()

with default scilab functions (be patient):

    stacksize('max');
    M = ones(1000, 1000);
    tic();
    write_csv(M, TMPDIR + "/csv_write_1.csv");
    toc()
    tic();
    r = read_csv(TMPDIR + "/csv_write_1.csv")
    toc()
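
Before timing large files, it can help to check on a small matrix that a write followed by a read returns the same values. The following is a minimal sketch, assuming the module is installed and loaded, that the default separator and decimal settings are left unchanged, and that csv_read returns doubles by default (as the 0.4 release notes indicate). The file name is only an illustration.

```scilab
// Small write-read round trip (illustrative sketch, not part of the toolbox).
M = [1 2 3; 4 5 6];

// Hypothetical file name inside Scilab's temporary directory.
filename = TMPDIR + "/csv_roundtrip_example.csv";

// Write the matrix with the current csv_default settings.
csv_write(M, filename);

// Read it back; the result is assumed to be a matrix of doubles.
R = csv_read(filename);

// Display whether every entry survived the cycle.
disp(and(R == M));
```

Small integer values are used on purpose, so that the comparison does not depend on the output precision.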
Files (6)
[530.40 Ko] csv_readwrite_0.4-1.bin.windows.zip
Windows version (i686)
Automatically generated by the ATOMS compilation chain

[544.93 Ko] csv_readwrite_0.4-1.bin.x64.windows.zip
Windows version (x64)
Automatically generated by the ATOMS compilation chain

[121.13 Ko] csv_readwrite_0.4-1.bin.x86_64.linux.tar.gz
Linux version (x86_64)
Automatically generated by the ATOMS compilation chain

[117.69 Ko] csv_readwrite_0.4-1.bin.i686.linux.tar.gz
Linux version (i686)
Automatically generated by the ATOMS compilation chain

[110.52 Ko] csv_readwrite_0.4-1.bin.x86_64.darwin.tar.gz
MacOSX version
Automatically generated by the ATOMS compilation chain

News (0)
Comments (5)
Comment from Allan Cornet -- 4th of May 2011, 04:25:33 PM    
csv_readwrite (0.4)
   * This version requires Scilab 5.3.2
   * csv_stringtodouble manages %i format for complex numbers.
   * csv_read manages regexp to remove comments in files.
   * Fixed ticket #299: extends format to digit in csv_default and csv_write
   * Fixed ticket #294: default conversion moved as 'double'
   * Fixed ticket #270: added licence header to all files.
   * Added documentation for csv_getToolboxPath function.
   * Fixed ticket #274: The help and tests of csv_default were wrong.
   * Fixed ticket #276: The output csv_default() calling sequence
     was inconsistent with the names of the fields.
   * Fixed ticket #275: The default precision was insufficient.
   * Fixed ticket #277: The help of csv_write was wrong.
   * Fixed ticket #245: csv_stringtodouble failed on some special cases.
   * Fixed ticket #242: The description of csv_write in the help was wrong.
   * Fixed ticket #194: csv_read may fail on large files.
   * Added examples in csv_read.
   * Added examples in csv_write.
   * Improved the csv_textscan help.
   * Improved the help of csv_stringtodouble.
   * Separated tests for csv_read and csv_write.
   * Added tests to check write-read cycles.
   * Added tests for csv_write and the comment option.
   * Improved the unit test for csv_textscan.
   * Fixed ticket #281: The substitute option did not work in csv_read.
   * Fixed ticket #298: The text_scan function did not extract
     the correct range.
   * Fixed ticket #350: The csv_stringtodouble function always returned
     complex doubles.
   * Fixed ticket #351: The csv_read function always returns complex entries.
   * Fixed ticket #352: The csv_textscan function always returned complex matrices.
   * Fixed ticket #297: The csv_textscan function did not take range as a row matrix.
   * Fixed ticket #353: The csv_read function did not manage the range.
   * Added non regression test for ticket #360.
Comment from Guillaume Azema -- 29th of July 2011, 03:36:53 PM    
Hello,

Thank you for this toolbox which seems powerful.

I have been testing it, and there are a few things that are not working:

  - I tried to call the function as follows:
M = rand(3,3);
csv_write(M, "test.txt", precision="%.3g");
--> not working.
I have to do: csv_write(M, "test.txt", [], [], "%.3g");

  - I tried to write a column file with different formats for each column, with no
success.
csv_write(M, "test.txt", [], [], "%.3g %.4g %.5g");
I found a workaround using:
csv_write(msprintf("%.3g %.4g %.5g\n", M), "test.txt");
But I don't know whether using the msprintf function is efficient.

So i decided to try to benchmark different methods to write csv files:
path = "D:\DONNEES\PROGRAM\Celestlab\csv_readwrite_0.4\";

//path = "H:\LOGICIELS\SCILAB\DEVELOPPEMENT\tmp\";

N = 80000;
M = rand(N,3);

tic;
csv_write(M, path + "test1.txt", precision="%.3g");
toc

tic;
fprintfMat(path + "test2.txt" , M, "%.3g");
toc

tic;
csv_write(msprintf("%.3g %.3g %.3g\n", M), path + "test3.txt");
toc

tic;
mputl(msprintf("%.3g %.3g %.3g\n", M), path + "test4.txt");
toc

fd = mopen(path + "test5.txt","w");
tic;
mfprintf(fd,"%.3g %.3g %.3g\n", M);
toc
mclose(fd);


Results on a local hard drive :
  file    method                   time (s)
  test1   csv_write                0.625
  test2   fprintfMat               0.297
  test3   csv_write + msprintf     0.516
  test4   mputl + msprintf         0.453
  test5   mfprintf                 0.625

Results on a network hard drive :
  file    method                   time (s)
  test1   csv_write                3.359
  test2   fprintfMat               1.078
  test3   csv_write + msprintf     1.265
  test4   mputl + msprintf         1.297
  test5   mfprintf                 119.481

You can see that, for some reason, csv_write is about 3 times slower than the other methods in that case.
And mfprintf is a whopping 100 times slower than all the other methods.
(Already reported in bug http://bugzilla.scilab.org/show_bug.cgi?id=8262)

Any idea why that is?
Comment from Allan Cornet -- 1st of August 2011, 08:01:12 AM    
Thanks for your tests

Please report trouble here: http://forge.scilab.org/index.php/p/csv-readwrite/issues/

A network drive will always be slower than a local hard drive; 3x is a good speed.

Allan
Comment from Guillaume Azema -- 1st of August 2011, 09:32:47 AM    
I meant it's 3 times slower than another way of writing files (fprintfMat, mputl) on a
network drive.
But on a local drive, it's only slightly slower.
Comment from Allan Cornet -- 1st of August 2011, 09:47:55 AM    
Please open a ticket for this issue; this is not the place for it!

fprintfMat and csv_write use the same internal functions, so there is no reason ...

Allan