ATOMS : csv_readwrite details

csv_readwrite

fast dedicated scilab functions to read and write csv files
(1596 downloads for this version - 23824 downloads for all versions)
Details
Version: 0.4 (a more recent valid version with binaries for Scilab 5.3 exists: 0.5)
Authors: Allan CORNET, Michael Baudin
Owner Organization: Scilab - DIGITEO
Maintainers: Michael Baudin, Allan Cornet
Category:
License:
Creation Date: May 4, 2011
Source created on: Scilab 5.3.x
Binaries available on Scilab 5.3.x: Linux 32-bit, Windows 32-bit, Windows 64-bit, MacOSX, Linux 64-bit
Install command:
--> atomsInstall("csv_readwrite")
Description
Purpose
-------

The purpose of this module is to read and write Comma Separated Values (CSV)
data files. The goal of this toolbox is to improve the flexibility, consistency
and speed of CSV reading and writing with respect to the Scilab built-in
write_csv and read_csv functions.

On some large data files, we observed a 100x speed improvement.

Features
--------

 * csv_default : Get or set the defaults for CSV files.
 * csv_getToolboxPath : Return the path to the current module.
 * csv_read : Read a comma-separated values file.
 * csv_stringtodouble : Convert a matrix of strings to a matrix of doubles.
 * csv_textscan : Read comma-separated values into a matrix of strings.
 * csv_write : Write a comma-separated values file.
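The round trip below is a minimal sketch of how csv_write and csv_read fit together. It assumes the toolbox is installed and loaded, and relies only on the two-argument calling sequences used in the speed comparison on this page, with the default comma separator and the default 'double' conversion; the file name roundtrip.csv is hypothetical.

```scilab
// Minimal round-trip sketch (assumes csv_readwrite is installed and loaded).
// The values are exactly representable in decimal, so writing them and
// reading them back should not change them.
M = [1 2 3; 4.5 -5 6.25];

// Hypothetical file name in Scilab's temporary directory
filename = TMPDIR + "/roundtrip.csv";

// Write with the default separator and precision
csv_write(M, filename);

// Read back; the default conversion returns doubles (see ticket #294)
r = csv_read(filename);

// Largest absolute difference between the written and the read values
disp(max(abs(r - M)));
```

When a low precision is requested on write, the comparison should use a tolerance instead of exact equality.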



To compare speed:

With the optimized functions of this toolbox:
stacksize('max');
M = ones(1000, 1000);
tic();
csv_write(M, TMPDIR + "/csv_write_1.csv");
toc()

tic();
// The semicolon keeps the display of r out of the measured time
r = csv_read(TMPDIR + "/csv_write_1.csv");
toc()


With the default Scilab functions (be patient):
stacksize('max');
M = ones(1000, 1000);
tic();
write_csv(M, TMPDIR + "/csv_write_1.csv");
toc()

tic();
r = read_csv(TMPDIR + "/csv_write_1.csv");
toc()
            
Files (6)
[149.65 kB] Source code archive

[121.35 kB] Linux 32-bit binary for Scilab 5.3.x
Update of the Description file.

[542.95 kB] Windows 32-bit binary for Scilab 5.3.x
Update of the Description file.

[557.83 kB] Windows 64-bit binary for Scilab 5.3.x
Update of the Description file.

[113.17 kB] MacOSX binary for Scilab 5.3.x
MacOSX version
Automatically generated by the ATOMS compilation chain

[125.46 kB] Linux 64-bit binary for Scilab 5.3.x
Update of the Description file.
News (0)
Comments (5)
Comment from Allan Cornet -- May 4, 2011, 04:25:33 PM    
csv_readwrite (0.4)
   * This version requires Scilab 5.3.2.
   * csv_stringtodouble supports the %i format for complex numbers.
   * csv_read can use a regular expression to remove comments from files.
   * Fixed ticket #299: extended the format to digits in csv_default and csv_write.
   * Fixed ticket #294: the default conversion is now 'double'.
   * Fixed ticket #270: added licence header to all files.
   * Added documentation for csv_getToolboxPath function.
   * Fixed ticket #274: The help and tests of csv_default were wrong.
   * Fixed ticket #276: The output csv_default() calling sequence
     was inconsistent with the names of the fields.
   * Fixed ticket #275: The default precision was insufficient.
   * Fixed ticket #277: The help of csv_write was wrong.
   * Fixed ticket #245: csv_stringtodouble failed on some special cases.
   * Fixed ticket #242: The description of csv_write in the help was wrong.
   * Fixed ticket #194: csv_read may fail on large files.
   * Added examples in csv_read.
   * Added examples in csv_write.
   * Improved the csv_textscan help.
   * Improved the help of csv_stringtodouble.
   * Separated tests for csv_read and csv_write.
   * Added tests to check write-read cycles.
   * Added tests for csv_write and the comment option.
   * Improved the unit test for csv_textscan.
   * Fixed ticket #281: The substitute option did not work in csv_read.
   * Fixed ticket #298: The text_scan function did not extract
     the correct range.
   * Fixed ticket #350: The csv_stringtodouble function always returned
     complex doubles.
   * Fixed ticket #351: The csv_read function always returned complex entries.
   * Fixed ticket #352: The csv_textscan function always returned complex matrices.
   * Fixed ticket #297: The csv_textscan function did not take range as a row matrix.
   * Fixed ticket #353: The csv_read function did not manage the range.
   * Added non regression test for ticket #360.
Comment from Guillaume Azema -- July 29, 2011, 03:36:53 PM    
Hello,

Thank you for this toolbox, which seems powerful.

I have been testing it, and there are a few things that do not work:

  - I tried to call the function as follows:
M = rand(3,3);
csv_write(M, "test.txt", precision="%.3g");
--> not working.
I have to do: csv_write(M, "test.txt", [], [], "%.3g");

  - I tried to write a column file with a different format for each column, without success:
csv_write(M, "test.txt", [], [], "%.3g %.4g %.5g");
I found a workaround using:
csv_write(msprintf("%.3g %.4g %.5g\n", M), "test.txt");
But I don't know whether using the msprintf function is efficient.
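The msprintf workaround can also produce a genuine CSV file without going through csv_write: format each row with msprintf, using commas instead of spaces between the per-column directives, and write the resulting lines with mputl. This is a sketch using only core Scilab functions; the matrix values and the file name per_column.csv are illustrative assumptions, and it makes no claim about which approach is faster.

```scilab
// Sketch: one format per column, comma-separated, using core Scilab only.
M = [1.23456 2.34567 3.45678; 4.56789 5.67891 6.78912];

// msprintf applies the format row by row; the trailing "\n" makes it
// return one string per row of M
txt = msprintf("%.3g,%.4g,%.5g\n", M);

// Hypothetical output file in Scilab's temporary directory
filename = TMPDIR + "/per_column.csv";
mputl(txt, filename);

// Display the written lines
disp(mgetl(filename));
```

With these values, the file should contain the two lines 1.23,2.346,3.4568 and 4.57,5.679,6.7891.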

So I decided to benchmark different methods of writing CSV files:
path = "D:\DONNEES\PROGRAM\Celestlab\csv_readwrite_0.4\";

//path = "H:\LOGICIELS\SCILAB\DEVELOPPEMENT\tmp\";

N = 80000;
M = rand(N,3);

tic;
csv_write(M, path + "test1.txt", [], [], "%.3g");
toc

tic;
fprintfMat(path + "test2.txt" , M, "%.3g");
toc

tic;
csv_write(msprintf("%.3g %.3g %.3g\n", M), path + "test3.txt");
toc

tic;
mputl(msprintf("%.3g %.3g %.3g\n", M), path + "test4.txt");
toc

fd = mopen(path + "test5.txt","w");
tic;
mfprintf(fd,"%.3g %.3g %.3g\n", M);
toc
mclose(fd);


Results (elapsed time in seconds, as returned by toc):

Method                          Local hard drive   Network hard drive
csv_write, precision "%.3g"     0.625              3.359
fprintfMat                      0.297              1.078
csv_write(msprintf(...))        0.516              1.265
mputl(msprintf(...))            0.453              1.297
mfprintf                        0.625              119.481

You can see that, for some reason, csv_write is about 3 times slower in that case,
and mfprintf is a whopping 100 times slower than all the other methods
(already reported in bug http://bugzilla.scilab.org/show_bug.cgi?id=8262).

Any idea why that is?
Comment from Allan Cornet -- August 1, 2011, 08:01:12 AM    
Thanks for your tests

Please report issues here: http://forge.scilab.org/index.php/p/csv-readwrite/issues/

A network drive will always be slower than a local hard drive; 3x is a good speed.

Allan
Comment from Guillaume Azema -- August 1, 2011, 09:32:47 AM    
I meant that it is 3 times slower than other ways of writing files (fprintfMat, mputl) on a network drive.
But on a local drive, it is only slightly slower.
Comment from Allan Cornet -- August 1, 2011, 09:47:55 AM    
Please open a ticket; this is not the place for it!

fprintfMat and csv_write use the same internal functions, so there is no reason ...

Allan